Reshape Geometry

Explode Parcel Polygon Into Parcel Edges

This step defines a function called explode_to_lines() that converts polygon boundaries into individual line segments(LineString). It then retains retains the original attributes of each polygon, creating a new GeoDataFrame (parcel_seg) with the exploded line segments. After generating this GeoDataFrame, it further resets the index by grouping the line segments based on the parcel_id column, using cumcount() to ensure each segment has a unique index within its respective group.

Code

# Function to explode Polygons into individual boundary line segments
def explode_to_lines(gdf):
    # Create a list to store new rows
    line_list = []

    for index, row in gdf.iterrows():
        # Get the exterior boundary of the polygon
        exterior = row['geometry'].exterior
        # Convert the boundary into LineString segments
        lines = [LineString([exterior.coords[i], exterior.coords[i + 1]]) 
                 for i in range(len(exterior.coords) - 1)]
        
        # Create new rows for each line segment, retaining the original attributes
        for line in lines:
            new_row = row.copy()
            new_row['geometry'] = line
            line_list.append(new_row)
    
    # Use pd.concat to generate the final GeoDataFrame
    line_gdf = pd.concat(line_list, axis=1).T
    line_gdf = gpd.GeoDataFrame(line_gdf, geometry='geometry', crs=gdf.crs)
    
    return line_gdf

# Call the function to explode the line segments
parcel_seg = explode_to_lines(parcel)

# Reset the index by group
parcel_seg['new_index'] = parcel_seg.groupby('parcel_id').cumcount()
parcel_seg.set_index('new_index', inplace=True)
parcel_seg.index.name = None

	Prop_ID	GEO_ID	parcel_id	parcel_addr	landuse	landuse_spec	parcel_label	geometry	Found_Match	match_road_address
0	51	NaN	326302	117 E BELKNAP ST	NaN	C1	NaN	LINESTRING (-10835051.6694 3863246.022100002, ...	True	E Belknap St
1	51	NaN	326302	117 E BELKNAP ST	NaN	C1	NaN	LINESTRING (-10835069.5268 3863276.837999999, ...	True	E Belknap St
2	51	NaN	326302	117 E BELKNAP ST	NaN	C1	NaN	LINESTRING (-10835051.8949 3863286.881099999, ...	True	E Belknap St
3	51	NaN	326302	117 E BELKNAP ST	NaN	C1	NaN	LINESTRING (-10835035.1922 3863255.495399996, ...	True	E Belknap St
4	7123337	NaN	513461	1416 MICAH WAY	R	A	NaN	LINESTRING (-10822668.8036 3885115.729199999, ...	True	Micah Way

Geometry Correction

In some cases, the exterior boundaries of polygons contain curved geometries that are split into multiple line segments during the explode step. The goal is to merge such segments into a single, continuous curve if they are connected and their turning angle is below a specified threshold. The following process outlines how to perform this merging:

01.Calculate the Angle Between Two Lines

Develop a function to compute the angle between adjacent line segments. This will help determine if the segments are nearly collinear.

02.Check if Two Segments are Connected

Create a function to check whether two segments share a common endpoint, indicating they are adjacent and can be considered for merging.

03.Re-generate Segment Index Based on Turning Points

Develop a function to re-index segments based on turning points (where the angle exceeds the threshold), marking where merging should stop.

04.Iterative Merging of Connected Segments

For each segment, check its connectivity and angle with the next segment:

If two connected segments have an angle below the threshold, merge them.
Continue to the next segment and repeat the merging process until a turning point is encountered (i.e., the angle is above the threshold).

05.Divide the curve into two segments

This step calculates the number of edges for each parcel (parcel_id). For parcels with exactly three edges, it identifies curved segments where the angle between start and end tangents exceeds 30 degrees (angle_threshold=30). These curves are split into two sections at the midpoint of the angle change (angle_fraction=0.5). The modified segments are then combined with the unchanged segments to produce the final result.:

Calculate edge counts: Count the number of edges per parcel_id using groupby.size(). Map this count to a new column edge_num.
Identify which curve need to be changed: Focus on segments where edge_num == 3 for processing potential curve segments.
Create tangent lines of start and end points for each edge and calculate the angle between them.
Create the half split point(angle_fraction=0.5) for each curve and process the curve line.
Combine those changed edges into unchanged edges

The detailed Python implementation is shown below:

01.Calculate the Angle Between Two Lines

This step defines two functions, fun_bearing_ra() and calculate_angle_difference(), for analyzing the orientation and angle between geometric line segments.This approach provides a robust way to measure angular differences between line segments and effectively addresses the issue of angle bias caused by different vector directions, which is essential for identifying whether two adjacent segments are nearly parallel and can be merged.

fun_bearing_ra(geom)
This function computes the bearing of a given geometry (geom) using its first and last coordinates. The bearing is calculated with atan2 and returned as a radian value.

Code

# Function to calculate the bearing of a geometry
def fun_bearing_ra(geom):
    coords = np.array(geom.coords)
    # Use the first and last coordinates to calculate the bearing
    x1, y1 = coords[0]
    x2, y2 = coords[-1]
    
    # Calculate the bearing using atan2
    bearing = math.atan2(y2 - y1, x2 - x1)
    
    return bearing

calculate_angle_difference(line1, line2)
This function calculates the angle difference between two line segments (line1 and line2) by comparing their bearings using fun_bearing_ra(). It effectively solves the issue of angle bias caused by different vector orientations by normalizing the angle difference to a consistent range of -π to π radians.
The angle difference is constrained within the range of -π to π radians to ensure it accurately reflects the shortest angular distance between the two segments.
This approach ensures that the measured angle remains accurate even if the segments point in opposite directions. By returning the smaller of the two possible angles (the angle itself or its supplement) to guarantee that the angle difference is always ≤ 90 degrees.

Code

# Fuction to calculate the angle based on the bearing
def calculate_angle_difference(line1, line2):
    bearing1 = fun_bearing_ra(line1)
    bearing2 = fun_bearing_ra(line2)
    # Calculate the absolute angle difference and ensure it is <= 180 degrees
    delta_theta = bearing2 - bearing1
    
    # Ensure the angle is between -π and π
    if delta_theta > math.pi:
        delta_theta -= 2 * math.pi
    elif delta_theta < -math.pi:
        delta_theta += 2 * math.pi
    
    # Convert the angle to degrees
    angle_between_degrees = math.degrees(abs(delta_theta))
    
    # Return the smaller angle difference (angle or its supplement)
    return min(angle_between_degrees, 180 - angle_between_degrees)

02.Check if Two Segments are Connected

The are_segments_connected() function determines if two line segments (line1 and line2) share a common point, indicating that they are connected. It achieves this by comparing the coordinates of the start and end points of both segments using NumPy arrays. If any of the four combinations of start or end points match (i.e., start-to-start, start-to-end, end-to-start, or end-to-end), the function returns True, indicating that the segments are connected. Otherwise, it returns False. This function is essential for identifying adjacency between line segments, which serves as a prerequisite for merging operations.

Code

# Check if two segments share a common point (i.e., their start or end point is the same)
def are_segments_connected(line1, line2):
    coords1 = np.array(line1.coords)
    coords2 = np.array(line2.coords)
    
    # Check if the start or end points of the segments are the same
    if np.all(coords1[0] == coords2[0]) or np.all(coords1[0] == coords2[-1]) or \
       np.all(coords1[-1] == coords2[0]) or np.all(coords1[-1] == coords2[-1]):
        return True
    return False

03.Re-generate Segment Index Based on Turning Points

This step processes parcel_seg GeoDataFrame by identifying turning points and reordering the index in each segments group. The re-indexing is critical for ensuring that the subsequent merging logic functions correctly. For example, if the segments are not ordered correctly, some segments that meet the merging criteria might not be combined as expected (e.g., some polygons are drawn starting from the middle of a segment rather than at a corner).
This process realigns the segments to reflect their actual geometric sequence, which is essential for correct segment merging in subsequent operations.It consists of two main functions:

reorder_segments_by_turning_point(segments, turning_point_index)
This function reorders the index in selected segments list, starting from a specified turning point.

Code

# Function to reorder segments based on the turning point
def reorder_segments_by_turning_point(segments, turning_point_index):
    # Reorder segments starting from the identified turning point
    reordered_segments = segments[turning_point_index:] + segments[:turning_point_index]
    return reordered_segments

process_parcel_segments(parcel_seg)
This main function processes each group of line segments based on parcel_id by calculating the angle differences between adjacent segments using calculate_angle_difference(). If an angle difference exceeds the defined threshold (15 degrees), it is marked as a turning point. The function then reorders segments, starting from the first turning point, to preserve logical continuity in the line sequence. The reordered segments are updated in original parcel_seg GeoDataframe, along with their original index and new index.

Code

# Main function: Process each parcel_id group and return a new GeoDataFrame
def process_parcel_segments(parcel_seg):
    merged_segments = []  # List to store the reordered segments

    # Group the parcel segments by parcel_id and process each group
    for object_id, group in parcel_seg.groupby('parcel_id'):
        segments = group['geometry'].tolist()  # Get the list of line segments for the current group
        original_indices = group.index.tolist()  # Preserve the original indices
        turning_points = []

        # Loop through all adjacent segments to calculate angle differences
        for i in range(1, len(segments)):
            if are_segments_connected(segments[i-1], segments[i]):
                angle_diff = calculate_angle_difference(segments[i-1], segments[i])
                if angle_diff > 15:  # If angle difference is greater than 15 degrees, mark it as a turning point
                    turning_points.append(i)

        # If there are turning points, reorder the segments starting from the first turning point
        if turning_points:
            turning_point_index = turning_points[0]
            reordered_segments = reorder_segments_by_turning_point(segments, turning_point_index)
            reordered_original_indices = reorder_segments_by_turning_point(original_indices, turning_point_index)
        else:
            # If no turning points, retain the original order
            reordered_segments = segments
            reordered_original_indices = original_indices

        # Store the reordered segments and their attributes
        for j, (line, original_index) in enumerate(zip(reordered_segments, reordered_original_indices)):
            row = group.iloc[0].copy()  # Copy the first row's attributes
            row['geometry'] = line
            row['original_index'] = original_index  # Preserve the original index
            row['new_index'] = j  # Assign the new index based on the reordered list
            merged_segments.append(row)

    # Create a new GeoDataFrame for the reordered segments
    updated_gdf = gpd.GeoDataFrame(merged_segments, columns=parcel_seg.columns.tolist() + ['original_index', 'new_index'])
    updated_gdf = updated_gdf.reset_index(drop=True)

    return updated_gdf

# Run the main function and get the new GeoDataFrame
updated_parcel_seg = process_parcel_segments(parcel_seg)
parcel_seg = updated_parcel_seg

Total number of parcel segments before geometry correction: 37141

04.Iterative Merging of Connected Segments

This step processes parcel_seg by merging line segments within each parcel group (OBJECTID) based on their connectivity and angle difference(15 degrees), which is to combine multiple curved segments into a single continuous curve.
Specifically, if segments[i-1] shares a common point with segments[i] and the angle difference is less than 15 degrees, they are merged into one. However, the merging process does not continue by comparing the merged segment with segments[i+1]. Instead, the next comparison is always made between segments[i] (the last unmerged segment) and segments[i+1]. If segments[i] and segments[i+1] also satisfy the merging criteria, then the previously merged segment is combined with segments[i+1], forming a new, longer segment. This step-by-step approach ensures that each newly created segment only extends when consecutive segments continue to meet the merging conditions.

Code

# Group parcel_seg by parcel_id and process each group
merged_segments = []

for object_id, group in parcel_seg.groupby('parcel_id'):
    # Get the list of geometries in the current group
    segments = group.geometry.tolist()
    # Start with the first segment
    merged_lines = [segments[0]]  # Start with the first segment
    
    for i in range(1, len(segments)):
        connected = False
        
        # Always compare the current segment with the previous one
        if are_segments_connected(segments[i-1], segments[i]):
            # Calculate the angle difference between the current segment and the previous one
            angle_diff = calculate_angle_difference(segments[i-1], segments[i])
            
            # If the angle difference is less than 15 degrees, merge the adjacent line segments
            if angle_diff < 15:
                # Merge the current and previous segments
                merged_result = linemerge([merged_lines[-1], segments[i]])
                
                # Check if the result is a MultiLineString, if so, skip the merge
                if isinstance(merged_result, LineString):
                    merged_lines[-1] = merged_result
                    connected = True
                else:
                    # Skip the merge if it's a MultiLineString
                    continue
        
        # If no connected segment is found or the angle difference is too large, add the current segment as a new one
        if not connected:
            merged_lines.append(segments[i])
    
    # Keep the merged results and add other attributes
    for line in merged_lines:
        row = group.iloc[0].copy()  # Copy the first attribute row from the group
        row['geometry'] = line
        merged_segments.append(row)

# Create a new GeoDataFrame from the merged line segments
parcel_seg = gpd.GeoDataFrame(merged_segments, columns=parcel_seg.columns)

# Check for MultiLineString geometries and explode them into LineString
exploded_segments = []

for index, row in parcel_seg.iterrows():
    geom = row['geometry']
    
    if isinstance(geom, MultiLineString):
        # Explode the MultiLineString into individual LineStrings
        for line in geom:
            new_row = row.copy()
            new_row['geometry'] = line
            exploded_segments.append(new_row)
    else:
        # Keep the original LineString geometries
        exploded_segments.append(row)

# Create a new GeoDataFrame from the exploded segments
parcel_seg = gpd.GeoDataFrame(exploded_segments, columns=parcel_seg.columns)
# extract useful columns
parcel_seg.drop(columns=['original_index', 'new_index'], inplace=True)
# Reset the index of the final GeoDataFrame
parcel_seg = parcel_seg.reset_index(drop=True)

Total number of parcel segments after geometry correction: 20346

05.Divide the curve into two segments

Divide curves with three edges into two segments at the midpoint of the angle change (>30°) and combine with unchanged segments for final output.

Code

edge_counts = parcel_seg.groupby('parcel_id').size()
parcel_seg['edge_num'] = parcel_seg['parcel_id'].map(edge_counts)

# Function to create tangent lines at both ends of a line segment
def create_tangents(line):
    coords = list(line.coords)
    if len(coords) < 2:
        return None, None  # Skip invalid geometries
    
    # Create tangents at the start and end of the line segment
    start_tangent = LineString([coords[0], coords[1]])
    end_tangent = LineString([coords[-2], coords[-1]])
    
    return start_tangent, end_tangent

# Function to filter curve segments based on angle difference of tangents > 30 degrees
def filter_curve_segments(parcel_seg, angle_threshold=30):
    filtered_segments = []
    non_filtered_segments = []
    
    for idx, row in parcel_seg.iterrows():
        line = row['geometry']
        start_tangent, end_tangent = create_tangents(line)
        
        if start_tangent and end_tangent:
            angle_diff = calculate_angle_difference(start_tangent, end_tangent)
            row_dict = row.to_dict()  # Convert the entire row to a dictionary
            row_dict['index'] = idx  # Preserve the original index
            
            if angle_diff > angle_threshold:
                # Add the entire row to the filtered list
                filtered_segments.append(row_dict)
            else:
                # Add the entire row to the non-filtered list
                non_filtered_segments.append(row_dict)
    
    # Create DataFrames with the filtered and non-filtered results if data exists
    if filtered_segments:
        filtered_df = pd.DataFrame(filtered_segments).set_index('index')
        filtered_gdf = gpd.GeoDataFrame(filtered_df, crs=parcel_seg.crs, geometry=filtered_df['geometry'])
    else:
        # Initialize an empty GeoDataFrame with the same structure if no data
        filtered_gdf = gpd.GeoDataFrame(columns=parcel_seg.columns, crs=parcel_seg.crs)
    
    if non_filtered_segments:
        non_filtered_df = pd.DataFrame(non_filtered_segments).set_index('index')
        non_filtered_gdf = gpd.GeoDataFrame(non_filtered_df, crs=parcel_seg.crs, geometry=non_filtered_df['geometry'])
    else:
        # Initialize an empty GeoDataFrame with the same structure if no data
        non_filtered_gdf = gpd.GeoDataFrame(columns=parcel_seg.columns, crs=parcel_seg.crs)
    
    return filtered_gdf, non_filtered_gdf

# Call the function to filter curve segments and create two GeoDataFrames
filtered_parcel_seg, non_filtered_parcel_seg = filter_curve_segments(parcel_seg[parcel_seg['edge_num'] == 3])

# %%
# Function to create tangent lines and reverse the line if necessary
def create_tangents_with_reversal(line):
    coords = list(line.coords)
    if len(coords) < 2:
        return None, None  # Skip invalid geometries
    
    # Find the points with the smallest and largest y-coordinate (latitude)
    if coords[0][1] < coords[-1][1]:  # If the first point's y is smaller, it's the start point
        start_point = coords[0]
        end_point = coords[-1]
    else:  # Otherwise, the last point is the start point
        start_point = coords[-1]
        end_point = coords[0]

    # Reverse the line if start_point is not the same as coords[0]
    if start_point != coords[0]:
        coords.reverse()  # Reverse the order of coordinates
    
    # Now create tangents based on the (possibly reversed) coordinates
    start_tangent = LineString([coords[0], coords[1]])  # Tangent from the first to the second point
    end_tangent = LineString([coords[-2], coords[-1]])  # Tangent from the second last to the last point

    return start_tangent, end_tangent, LineString(coords)  # Return the tangents and the (possibly reversed) line

# Function to calculate the split point based on the half rule
def calculate_split_point(line, start_tangent, end_tangent, angle_diff, angle_fraction=0.5):
    coords = list(line.coords)

    # Iterate through the line and find the point where the angle difference is approximately half
    for i in range(1, len(coords) - 1):
        intermediate_tangent = LineString([coords[i - 1], coords[i]])
        current_angle_diff = calculate_angle_difference(start_tangent, intermediate_tangent)
        
        if current_angle_diff >= angle_diff * angle_fraction:
            return coords[i]  # Return the split point

    return coords[-1]  # If no point found, return the endpoint

# Function to process each segment in filtered_parcel_seg
def process_filtered_parcel_seg(filtered_parcel_seg, angle_threshold=30, angle_fraction=0.5):
    new_data = []
    
    for idx, row in filtered_parcel_seg.iterrows():
        line = row['geometry']
        
        # Apply the tangent and reversal function
        start_tangent, end_tangent, adjusted_line = create_tangents_with_reversal(line)
        
        if start_tangent and end_tangent:
            angle_diff = calculate_angle_difference(start_tangent, end_tangent)
            
            if angle_diff > angle_threshold:
                # Calculate the split point based on the angle difference and fraction
                split_point = calculate_split_point(adjusted_line, start_tangent, end_tangent, angle_diff, angle_fraction)
                
                # Add split point to row's data
                row_dict = row.to_dict()
                row_dict['split_point'] = Point(split_point)  # Store the split point as geometry
                row_dict['index'] = idx  # Store the original index
                
                new_data.append(row_dict)
            else:
                # If no split needed, just keep the original row
                row_dict = row.to_dict()
                row_dict['split_point'] = None  # No split point, store None
                row_dict['index'] = idx  # Store the original index
                
                new_data.append(row_dict)

    # Convert the processed data back into a GeoDataFrame
    new_df = pd.DataFrame(new_data).set_index('index')  # Use original index
    new_gdf = gpd.GeoDataFrame(new_df, crs=parcel_seg.crs, geometry='split_point')
    
    return new_gdf

# Check if filtered_parcel_seg is non-empty before processing
if not filtered_parcel_seg.empty:
    # Call the function to process the filtered_parcel_seg
    processed_parcel_seg = process_filtered_parcel_seg(filtered_parcel_seg)
else:
    # Handle the case where filtered_parcel_seg is empty
    processed_parcel_seg = gpd.GeoDataFrame(columns=filtered_parcel_seg.columns, crs=parcel_seg.crs)

# %%
# Function to split filtered_parcel_seg using points from processed_parcel_seg
def split_lines_with_points(filtered_parcel_seg, processed_parcel_seg):
    split_segments = []

    for idx, row in filtered_parcel_seg.iterrows():
        line = row['geometry']
        split_point_geom = processed_parcel_seg.loc[idx, 'split_point']  # Get the corresponding point geometry from split_point column
        
        if isinstance(split_point_geom, Point):
            # Check if the split point is on the line
            if line.contains(split_point_geom):
                # If the point is on the line, use it directly for splitting
                split_lines = split(line, split_point_geom)
            else:
                # If the point is not on the line, find the closest point on the line
                projected_distance = line.project(split_point_geom)
                nearest_point = line.interpolate(projected_distance)
                split_lines = split(line, nearest_point)
            
            # Handle GeometryCollection by extracting valid LineString geometries
            if isinstance(split_lines, GeometryCollection):
                split_segments.extend([{
                    **row.to_dict(), 'geometry': geom
                } for geom in split_lines.geoms if isinstance(geom, LineString)])
                continue  # Skip to the next iteration

        # If no valid split point or GeometryCollection, add the original row
        split_segments.append(row.to_dict())
    
    # Convert split_segments to a GeoDataFrame and return
    split_gdf = gpd.GeoDataFrame(split_segments, crs=parcel_seg.crs, geometry='geometry')
    return split_gdf

# Check if both filtered_parcel_seg and processed_parcel_seg are non-empty before processing
if not filtered_parcel_seg.empty and not processed_parcel_seg.empty:
    # Call the function to split lines based on points
    split_parcel_seg = split_lines_with_points(filtered_parcel_seg, processed_parcel_seg)
else:
    # Handle the case where one or both GeoDataFrames are empty
    split_parcel_seg = gpd.GeoDataFrame(columns=filtered_parcel_seg.columns, crs=parcel_seg.crs)

# Function to combine split_parcel_seg and non_filtered_parcel_seg, ensuring parcel_id proximity
def combine_parcel_segs(split_parcel_seg, non_filtered_parcel_seg):
    # Ensure both datasets contain the 'parcel_id' column
    if 'parcel_id' not in split_parcel_seg.columns or 'parcel_id' not in non_filtered_parcel_seg.columns:
        raise ValueError("Both datasets must contain the 'parcel_id' column.")
    
    # Convert parcel_id to string to avoid type errors during sorting
    split_parcel_seg['parcel_id'] = split_parcel_seg['parcel_id'].astype(str)
    non_filtered_parcel_seg['parcel_id'] = non_filtered_parcel_seg['parcel_id'].astype(str)
    
    # Concatenate the two GeoDataFrames and ensure 'crs' and 'geometry' are set
    combined_parcel_seg = gpd.GeoDataFrame(
        pd.concat([split_parcel_seg, non_filtered_parcel_seg], ignore_index=True),
        crs=parcel_seg.crs,  # Use the crs from one of the input GeoDataFrames
        geometry='geometry'  # Ensure the geometry column is correctly set
    )
    
    # Sort by 'parcel_id' to ensure similar parcel_id are together
    combined_parcel_seg_sorted = combined_parcel_seg.sort_values(by='parcel_id')
    
    return combined_parcel_seg_sorted

# Check if both split_parcel_seg and non_filtered_parcel_seg are non-empty before processing
if not split_parcel_seg.empty and not non_filtered_parcel_seg.empty:
    # Call the function to combine the datasets
    reconstr_seg = combine_parcel_segs(split_parcel_seg, non_filtered_parcel_seg)
else:
    # Handle the case where one or both GeoDataFrames are empty
    reconstr_seg = gpd.GeoDataFrame(columns=split_parcel_seg.columns, crs=parcel_seg.crs)

# %%
# Check if reconstr_seg is non-empty before concatenating
if not reconstr_seg.empty:
    parcel_seg = pd.concat([parcel_seg[parcel_seg['edge_num'] != 3], reconstr_seg], ignore_index=True).reset_index(drop=True)

parcel_seg = parcel_seg.drop(columns=['edge_num'])

Total number of parcel segments after geometry correction: 20367