This step defines a function called explode_to_lines() that converts polygon boundaries into individual line segments(LineString). It then retains retains the original attributes of each polygon, creating a new GeoDataFrame (parcel_seg) with the exploded line segments. After generating this GeoDataFrame, it further resets the index by grouping the line segments based on the parcel_id column, using cumcount() to ensure each segment has a unique index within its respective group.
Code
# Function to explode Polygons into individual boundary line segmentsdef explode_to_lines(gdf):# Create a list to store new rows line_list = []for index, row in gdf.iterrows():# Get the exterior boundary of the polygon exterior = row['geometry'].exterior# Convert the boundary into LineString segments lines = [LineString([exterior.coords[i], exterior.coords[i +1]]) for i inrange(len(exterior.coords) -1)]# Create new rows for each line segment, retaining the original attributesfor line in lines: new_row = row.copy() new_row['geometry'] = line line_list.append(new_row)# Use pd.concat to generate the final GeoDataFrame line_gdf = pd.concat(line_list, axis=1).T line_gdf = gpd.GeoDataFrame(line_gdf, geometry='geometry', crs=gdf.crs)return line_gdf# Call the function to explode the line segmentsparcel_seg = explode_to_lines(parcel)# Reset the index by groupparcel_seg['new_index'] = parcel_seg.groupby('parcel_id').cumcount()parcel_seg.set_index('new_index', inplace=True)parcel_seg.index.name =None
Prop_ID
GEO_ID
parcel_id
parcel_addr
landuse
landuse_spec
parcel_label
geometry
Found_Match
match_road_address
0
51
NaN
326302
117 E BELKNAP ST
NaN
C1
NaN
LINESTRING (-10835051.6694 3863246.022100002, ...
True
E Belknap St
1
51
NaN
326302
117 E BELKNAP ST
NaN
C1
NaN
LINESTRING (-10835069.5268 3863276.837999999, ...
True
E Belknap St
2
51
NaN
326302
117 E BELKNAP ST
NaN
C1
NaN
LINESTRING (-10835051.8949 3863286.881099999, ...
True
E Belknap St
3
51
NaN
326302
117 E BELKNAP ST
NaN
C1
NaN
LINESTRING (-10835035.1922 3863255.495399996, ...
True
E Belknap St
4
7123337
NaN
513461
1416 MICAH WAY
R
A
NaN
LINESTRING (-10822668.8036 3885115.729199999, ...
True
Micah Way
Geometry Correction
In some cases, the exterior boundaries of polygons contain curved geometries that are split into multiple line segments during the explode step. The goal is to merge such segments into a single, continuous curve if they are connected and their turning angle is below a specified threshold. The following process outlines how to perform this merging:
01.Calculate the Angle Between Two Lines
Develop a function to compute the angle between adjacent line segments. This will help determine if the segments are nearly collinear.
02.Check if Two Segments are Connected
Create a function to check whether two segments share a common endpoint, indicating they are adjacent and can be considered for merging.
03.Re-generate Segment Index Based on Turning Points
Develop a function to re-index segments based on turning points (where the angle exceeds the threshold), marking where merging should stop.
04.Iterative Merging of Connected Segments
For each segment, check its connectivity and angle with the next segment:
If two connected segments have an angle below the threshold, merge them.
Continue to the next segment and repeat the merging process until a turning point is encountered (i.e., the angle is above the threshold).
05.Divide the curve into two segments
This step calculates the number of edges for each parcel (parcel_id). For parcels with exactly three edges, it identifies curved segments where the angle between start and end tangents exceeds 30 degrees (angle_threshold=30). These curves are split into two sections at the midpoint of the angle change (angle_fraction=0.5). The modified segments are then combined with the unchanged segments to produce the final result.:
Calculate edge counts: Count the number of edges per parcel_id using groupby.size(). Map this count to a new column edge_num.
Identify which curve need to be changed: Focus on segments where edge_num == 3 for processing potential curve segments.
Create tangent lines of start and end points for each edge and calculate the angle between them.
Create the half split point(angle_fraction=0.5) for each curve and process the curve line.
Combine those changed edges into unchanged edges
The detailed Python implementation is shown below:
01.Calculate the Angle Between Two Lines
This step defines two functions, fun_bearing_ra() and calculate_angle_difference(), for analyzing the orientation and angle between geometric line segments.This approach provides a robust way to measure angular differences between line segments and effectively addresses the issue of angle bias caused by different vector directions, which is essential for identifying whether two adjacent segments are nearly parallel and can be merged.
fun_bearing_ra(geom)
This function computes the bearing of a given geometry (geom) using its first and last coordinates. The bearing is calculated with atan2 and returned as a radian value.
Code
# Function to calculate the bearing of a geometrydef fun_bearing_ra(geom): coords = np.array(geom.coords)# Use the first and last coordinates to calculate the bearing x1, y1 = coords[0] x2, y2 = coords[-1]# Calculate the bearing using atan2 bearing = math.atan2(y2 - y1, x2 - x1)return bearing
calculate_angle_difference(line1, line2)
This function calculates the angle difference between two line segments (line1 and line2) by comparing their bearings using fun_bearing_ra(). It effectively solves the issue of angle bias caused by different vector orientations by normalizing the angle difference to a consistent range of -π to π radians.
The angle difference is constrained within the range of -π to π radians to ensure it accurately reflects the shortest angular distance between the two segments.
This approach ensures that the measured angle remains accurate even if the segments point in opposite directions. By returning the smaller of the two possible angles (the angle itself or its supplement) to guarantee that the angle difference is always ≤ 90 degrees.
Code
# Fuction to calculate the angle based on the bearingdef calculate_angle_difference(line1, line2): bearing1 = fun_bearing_ra(line1) bearing2 = fun_bearing_ra(line2)# Calculate the absolute angle difference and ensure it is <= 180 degrees delta_theta = bearing2 - bearing1# Ensure the angle is between -π and πif delta_theta > math.pi: delta_theta -=2* math.pielif delta_theta <-math.pi: delta_theta +=2* math.pi# Convert the angle to degrees angle_between_degrees = math.degrees(abs(delta_theta))# Return the smaller angle difference (angle or its supplement)returnmin(angle_between_degrees, 180- angle_between_degrees)
02.Check if Two Segments are Connected
The are_segments_connected() function determines if two line segments (line1 and line2) share a common point, indicating that they are connected. It achieves this by comparing the coordinates of the start and end points of both segments using NumPy arrays. If any of the four combinations of start or end points match (i.e., start-to-start, start-to-end, end-to-start, or end-to-end), the function returns True, indicating that the segments are connected. Otherwise, it returns False. This function is essential for identifying adjacency between line segments, which serves as a prerequisite for merging operations.
Code
# Check if two segments share a common point (i.e., their start or end point is the same)def are_segments_connected(line1, line2): coords1 = np.array(line1.coords) coords2 = np.array(line2.coords)# Check if the start or end points of the segments are the sameif np.all(coords1[0] == coords2[0]) or np.all(coords1[0] == coords2[-1]) or\ np.all(coords1[-1] == coords2[0]) or np.all(coords1[-1] == coords2[-1]):returnTruereturnFalse
03.Re-generate Segment Index Based on Turning Points
This step processes parcel_seg GeoDataFrame by identifying turning points and reordering the index in each segments group. The re-indexing is critical for ensuring that the subsequent merging logic functions correctly. For example, if the segments are not ordered correctly, some segments that meet the merging criteria might not be combined as expected (e.g., some polygons are drawn starting from the middle of a segment rather than at a corner).
This process realigns the segments to reflect their actual geometric sequence, which is essential for correct segment merging in subsequent operations.It consists of two main functions:
reorder_segments_by_turning_point(segments, turning_point_index)
This function reorders the index in selected segments list, starting from a specified turning point.
Code
# Function to reorder segments based on the turning pointdef reorder_segments_by_turning_point(segments, turning_point_index):# Reorder segments starting from the identified turning point reordered_segments = segments[turning_point_index:] + segments[:turning_point_index]return reordered_segments
process_parcel_segments(parcel_seg)
This main function processes each group of line segments based on parcel_id by calculating the angle differences between adjacent segments using calculate_angle_difference(). If an angle difference exceeds the defined threshold (15 degrees), it is marked as a turning point. The function then reorders segments, starting from the first turning point, to preserve logical continuity in the line sequence. The reordered segments are updated in original parcel_seg GeoDataframe, along with their original index and new index.
Code
# Main function: Process each parcel_id group and return a new GeoDataFramedef process_parcel_segments(parcel_seg): merged_segments = [] # List to store the reordered segments# Group the parcel segments by parcel_id and process each groupfor object_id, group in parcel_seg.groupby('parcel_id'): segments = group['geometry'].tolist() # Get the list of line segments for the current group original_indices = group.index.tolist() # Preserve the original indices turning_points = []# Loop through all adjacent segments to calculate angle differencesfor i inrange(1, len(segments)):if are_segments_connected(segments[i-1], segments[i]): angle_diff = calculate_angle_difference(segments[i-1], segments[i])if angle_diff >15: # If angle difference is greater than 15 degrees, mark it as a turning point turning_points.append(i)# If there are turning points, reorder the segments starting from the first turning pointif turning_points: turning_point_index = turning_points[0] reordered_segments = reorder_segments_by_turning_point(segments, turning_point_index) reordered_original_indices = reorder_segments_by_turning_point(original_indices, turning_point_index)else:# If no turning points, retain the original order reordered_segments = segments reordered_original_indices = original_indices# Store the reordered segments and their attributesfor j, (line, original_index) inenumerate(zip(reordered_segments, reordered_original_indices)): row = group.iloc[0].copy() # Copy the first row's attributes row['geometry'] = line row['original_index'] = original_index # Preserve the original index row['new_index'] = j # Assign the new index based on the reordered list merged_segments.append(row)# Create a new GeoDataFrame for the reordered segments updated_gdf = gpd.GeoDataFrame(merged_segments, columns=parcel_seg.columns.tolist() + ['original_index', 'new_index']) updated_gdf = updated_gdf.reset_index(drop=True)return updated_gdf# Run the main function and get the new GeoDataFrameupdated_parcel_seg = process_parcel_segments(parcel_seg)parcel_seg = updated_parcel_seg
Total number of parcel segments before geometry correction: 37141
04.Iterative Merging of Connected Segments
This step processes parcel_seg by merging line segments within each parcel group (OBJECTID) based on their connectivity and angle difference(15 degrees), which is to combine multiple curved segments into a single continuous curve.
Specifically, if segments[i-1] shares a common point with segments[i] and the angle difference is less than 15 degrees, they are merged into one. However, the merging process does not continue by comparing the merged segment with segments[i+1]. Instead, the next comparison is always made between segments[i] (the last unmerged segment) and segments[i+1]. If segments[i] and segments[i+1] also satisfy the merging criteria, then the previously merged segment is combined with segments[i+1], forming a new, longer segment. This step-by-step approach ensures that each newly created segment only extends when consecutive segments continue to meet the merging conditions.
Code
# Group parcel_seg by parcel_id and process each groupmerged_segments = []for object_id, group in parcel_seg.groupby('parcel_id'):# Get the list of geometries in the current group segments = group.geometry.tolist()# Start with the first segment merged_lines = [segments[0]] # Start with the first segmentfor i inrange(1, len(segments)): connected =False# Always compare the current segment with the previous oneif are_segments_connected(segments[i-1], segments[i]):# Calculate the angle difference between the current segment and the previous one angle_diff = calculate_angle_difference(segments[i-1], segments[i])# If the angle difference is less than 15 degrees, merge the adjacent line segmentsif angle_diff <15:# Merge the current and previous segments merged_result = linemerge([merged_lines[-1], segments[i]])# Check if the result is a MultiLineString, if so, skip the mergeifisinstance(merged_result, LineString): merged_lines[-1] = merged_result connected =Trueelse:# Skip the merge if it's a MultiLineStringcontinue# If no connected segment is found or the angle difference is too large, add the current segment as a new oneifnot connected: merged_lines.append(segments[i])# Keep the merged results and add other attributesfor line in merged_lines: row = group.iloc[0].copy() # Copy the first attribute row from the group row['geometry'] = line merged_segments.append(row)# Create a new GeoDataFrame from the merged line segmentsparcel_seg = gpd.GeoDataFrame(merged_segments, columns=parcel_seg.columns)# Check for MultiLineString geometries and explode them into LineStringexploded_segments = []for index, row in parcel_seg.iterrows(): geom = row['geometry']ifisinstance(geom, MultiLineString):# Explode the MultiLineString into individual LineStringsfor line in geom: new_row = row.copy() new_row['geometry'] = line exploded_segments.append(new_row)else:# Keep the original LineString geometries exploded_segments.append(row)# Create a new GeoDataFrame from the exploded segmentsparcel_seg = gpd.GeoDataFrame(exploded_segments, columns=parcel_seg.columns)# extract useful columnsparcel_seg.drop(columns=['original_index', 'new_index'], inplace=True)# Reset the index of the final GeoDataFrameparcel_seg = parcel_seg.reset_index(drop=True)
Total number of parcel segments after geometry correction: 20346
05.Divide the curve into two segments
Divide curves with three edges into two segments at the midpoint of the angle change (>30°) and combine with unchanged segments for final output.
Code
edge_counts = parcel_seg.groupby('parcel_id').size()parcel_seg['edge_num'] = parcel_seg['parcel_id'].map(edge_counts)# Function to create tangent lines at both ends of a line segmentdef create_tangents(line): coords =list(line.coords)iflen(coords) <2:returnNone, None# Skip invalid geometries# Create tangents at the start and end of the line segment start_tangent = LineString([coords[0], coords[1]]) end_tangent = LineString([coords[-2], coords[-1]])return start_tangent, end_tangent# Function to filter curve segments based on angle difference of tangents > 30 degreesdef filter_curve_segments(parcel_seg, angle_threshold=30): filtered_segments = [] non_filtered_segments = []for idx, row in parcel_seg.iterrows(): line = row['geometry'] start_tangent, end_tangent = create_tangents(line)if start_tangent and end_tangent: angle_diff = calculate_angle_difference(start_tangent, end_tangent) row_dict = row.to_dict() # Convert the entire row to a dictionary row_dict['index'] = idx # Preserve the original indexif angle_diff > angle_threshold:# Add the entire row to the filtered list filtered_segments.append(row_dict)else:# Add the entire row to the non-filtered list non_filtered_segments.append(row_dict)# Create DataFrames with the filtered and non-filtered results if data existsif filtered_segments: filtered_df = pd.DataFrame(filtered_segments).set_index('index') filtered_gdf = gpd.GeoDataFrame(filtered_df, crs=parcel_seg.crs, geometry=filtered_df['geometry'])else:# Initialize an empty GeoDataFrame with the same structure if no data filtered_gdf = gpd.GeoDataFrame(columns=parcel_seg.columns, crs=parcel_seg.crs)if non_filtered_segments: non_filtered_df = pd.DataFrame(non_filtered_segments).set_index('index') non_filtered_gdf = gpd.GeoDataFrame(non_filtered_df, crs=parcel_seg.crs, geometry=non_filtered_df['geometry'])else:# Initialize an empty GeoDataFrame with the same structure if no data non_filtered_gdf = gpd.GeoDataFrame(columns=parcel_seg.columns, crs=parcel_seg.crs)return filtered_gdf, non_filtered_gdf# Call the function to filter curve segments and create two GeoDataFramesfiltered_parcel_seg, non_filtered_parcel_seg = filter_curve_segments(parcel_seg[parcel_seg['edge_num'] ==3])# %%# Function to create tangent lines and reverse the line if necessarydef create_tangents_with_reversal(line): coords =list(line.coords)iflen(coords) <2:returnNone, None# Skip invalid geometries# Find the points with the smallest and largest y-coordinate (latitude)if coords[0][1] < coords[-1][1]: # If the first point's y is smaller, it's the start point start_point = coords[0] end_point = coords[-1]else: # Otherwise, the last point is the start point start_point = coords[-1] end_point = coords[0]# Reverse the line if start_point is not the same as coords[0]if start_point != coords[0]: coords.reverse() # Reverse the order of coordinates# Now create tangents based on the (possibly reversed) coordinates start_tangent = LineString([coords[0], coords[1]]) # Tangent from the first to the second point end_tangent = LineString([coords[-2], coords[-1]]) # Tangent from the second last to the last pointreturn start_tangent, end_tangent, LineString(coords) # Return the tangents and the (possibly reversed) line# Function to calculate the split point based on the half ruledef calculate_split_point(line, start_tangent, end_tangent, angle_diff, angle_fraction=0.5): coords =list(line.coords)# Iterate through the line and find the point where the angle difference is approximately halffor i inrange(1, len(coords) -1): intermediate_tangent = LineString([coords[i -1], coords[i]]) current_angle_diff = calculate_angle_difference(start_tangent, intermediate_tangent)if current_angle_diff >= angle_diff * angle_fraction:return coords[i] # Return the split pointreturn coords[-1] # If no point found, return the endpoint# Function to process each segment in filtered_parcel_segdef process_filtered_parcel_seg(filtered_parcel_seg, angle_threshold=30, angle_fraction=0.5): new_data = []for idx, row in filtered_parcel_seg.iterrows(): line = row['geometry']# Apply the tangent and reversal function start_tangent, end_tangent, adjusted_line = create_tangents_with_reversal(line)if start_tangent and end_tangent: angle_diff = calculate_angle_difference(start_tangent, end_tangent)if angle_diff > angle_threshold:# Calculate the split point based on the angle difference and fraction split_point = calculate_split_point(adjusted_line, start_tangent, end_tangent, angle_diff, angle_fraction)# Add split point to row's data row_dict = row.to_dict() row_dict['split_point'] = Point(split_point) # Store the split point as geometry row_dict['index'] = idx # Store the original index new_data.append(row_dict)else:# If no split needed, just keep the original row row_dict = row.to_dict() row_dict['split_point'] =None# No split point, store None row_dict['index'] = idx # Store the original index new_data.append(row_dict)# Convert the processed data back into a GeoDataFrame new_df = pd.DataFrame(new_data).set_index('index') # Use original index new_gdf = gpd.GeoDataFrame(new_df, crs=parcel_seg.crs, geometry='split_point')return new_gdf# Check if filtered_parcel_seg is non-empty before processingifnot filtered_parcel_seg.empty:# Call the function to process the filtered_parcel_seg processed_parcel_seg = process_filtered_parcel_seg(filtered_parcel_seg)else:# Handle the case where filtered_parcel_seg is empty processed_parcel_seg = gpd.GeoDataFrame(columns=filtered_parcel_seg.columns, crs=parcel_seg.crs)# %%# Function to split filtered_parcel_seg using points from processed_parcel_segdef split_lines_with_points(filtered_parcel_seg, processed_parcel_seg): split_segments = []for idx, row in filtered_parcel_seg.iterrows(): line = row['geometry'] split_point_geom = processed_parcel_seg.loc[idx, 'split_point'] # Get the corresponding point geometry from split_point columnifisinstance(split_point_geom, Point):# Check if the split point is on the lineif line.contains(split_point_geom):# If the point is on the line, use it directly for splitting split_lines = split(line, split_point_geom)else:# If the point is not on the line, find the closest point on the line projected_distance = line.project(split_point_geom) nearest_point = line.interpolate(projected_distance) split_lines = split(line, nearest_point)# Handle GeometryCollection by extracting valid LineString geometriesifisinstance(split_lines, GeometryCollection): split_segments.extend([{**row.to_dict(), 'geometry': geom } for geom in split_lines.geoms ifisinstance(geom, LineString)])continue# Skip to the next iteration# If no valid split point or GeometryCollection, add the original row split_segments.append(row.to_dict())# Convert split_segments to a GeoDataFrame and return split_gdf = gpd.GeoDataFrame(split_segments, crs=parcel_seg.crs, geometry='geometry')return split_gdf# Check if both filtered_parcel_seg and processed_parcel_seg are non-empty before processingifnot filtered_parcel_seg.empty andnot processed_parcel_seg.empty:# Call the function to split lines based on points split_parcel_seg = split_lines_with_points(filtered_parcel_seg, processed_parcel_seg)else:# Handle the case where one or both GeoDataFrames are empty split_parcel_seg = gpd.GeoDataFrame(columns=filtered_parcel_seg.columns, crs=parcel_seg.crs)# Function to combine split_parcel_seg and non_filtered_parcel_seg, ensuring parcel_id proximitydef combine_parcel_segs(split_parcel_seg, non_filtered_parcel_seg):# Ensure both datasets contain the 'parcel_id' columnif'parcel_id'notin split_parcel_seg.columns or'parcel_id'notin non_filtered_parcel_seg.columns:raiseValueError("Both datasets must contain the 'parcel_id' column.")# Convert parcel_id to string to avoid type errors during sorting split_parcel_seg['parcel_id'] = split_parcel_seg['parcel_id'].astype(str) non_filtered_parcel_seg['parcel_id'] = non_filtered_parcel_seg['parcel_id'].astype(str)# Concatenate the two GeoDataFrames and ensure 'crs' and 'geometry' are set combined_parcel_seg = gpd.GeoDataFrame( pd.concat([split_parcel_seg, non_filtered_parcel_seg], ignore_index=True), crs=parcel_seg.crs, # Use the crs from one of the input GeoDataFrames geometry='geometry'# Ensure the geometry column is correctly set )# Sort by 'parcel_id' to ensure similar parcel_id are together combined_parcel_seg_sorted = combined_parcel_seg.sort_values(by='parcel_id')return combined_parcel_seg_sorted# Check if both split_parcel_seg and non_filtered_parcel_seg are non-empty before processingifnot split_parcel_seg.empty andnot non_filtered_parcel_seg.empty:# Call the function to combine the datasets reconstr_seg = combine_parcel_segs(split_parcel_seg, non_filtered_parcel_seg)else:# Handle the case where one or both GeoDataFrames are empty reconstr_seg = gpd.GeoDataFrame(columns=split_parcel_seg.columns, crs=parcel_seg.crs)# %%# Check if reconstr_seg is non-empty before concatenatingifnot reconstr_seg.empty: parcel_seg = pd.concat([parcel_seg[parcel_seg['edge_num'] !=3], reconstr_seg], ignore_index=True).reset_index(drop=True)parcel_seg = parcel_seg.drop(columns=['edge_num'])
Total number of parcel segments after geometry correction: 20367