Merge all classified parcel segments into a single GeoDataFrame.
01.Regular Inside Parcel
02.Regular Corner Parcel
03.Special Parcel
04.Jagged Parcel
05.Curve Parcel
06.Cul De Sac Parcel
07.No Match Address Parcel
08.No Address Parcel
09.Duplicated Address Parcel
Code
# List of GeoDataFrames to combine
parcel_list = [
    regular_insid_parcel,
    regular_corner_parcel,
    special_parcel,
    jagged_parcel,
    curve_parcel,
    cul_de_sac_parcel,
    no_match_address_parcel,
    no_address_parcel,
    duplicated_address_parcel,
]

# Concatenate all GeoDataFrames in the list and set 'crs' and 'geometry' explicitly
combined_parcel = gpd.GeoDataFrame(
    pd.concat(parcel_list, ignore_index=True),
    crs=parcel_seg.crs,   # Use the CRS of parcel_seg
    geometry='geometry',  # Ensure the geometry column is correctly set
)
combined_parcel['parcel_id'] = combined_parcel['parcel_id'].astype(str)

# Sort by 'parcel_id' so that the edges of each parcel are adjacent
combined_parcel = combined_parcel.sort_values(by='parcel_id').reset_index(drop=True)
Parcels labeled as Cul-de-sac, Curve, Special, No Match Address, and No Address are further subdivided into “standard” and “other” types to assign a confidence level. These rules are based on visual inspection and observed patterns. The criteria differ slightly by parcel label, but the main factors are whether both the front and rear sides are present and how many edges the parcel has: a parcel is “standard” only if it has exactly 4 edges (4 or 5 for Special parcels) and both a front and a rear edge; otherwise it is “other”.
Step 1: Assign Standard and Other Types for Selected Parcel Labels
Code
# Calculate the number of edges for each parcel group
edge_counts = combined_parcel.groupby('parcel_id').size()
combined_parcel['num_edges'] = combined_parcel['parcel_id'].map(edge_counts)

# for cul_de_sac parcel:
def update_label_cul_de_sac(group):
    if group['num_edges'].iloc[0] == 4:
        if 'front' in group['side'].values and 'rear' in group['side'].values:
            group['parcel_label'] = 'cul_de_sac parcel_standard'
        else:
            group['parcel_label'] = 'cul_de_sac parcel_other'
    else:
        # Directly change label if 'num_edges' is not equal to 4
        group['parcel_label'] = 'cul_de_sac parcel_other'
    return group

# Apply the function to each group and update the main DataFrame
updated_parcel_cul_de_sac = (
    combined_parcel[combined_parcel['parcel_label'] == 'cul_de_sac parcel']
    .groupby('parcel_id', group_keys=False)
    .apply(update_label_cul_de_sac)
)
combined_parcel.update(updated_parcel_cul_de_sac)

# for curve parcel:
def update_label_curve(group):
    if group['num_edges'].iloc[0] == 4:
        if 'front' in group['side'].values and 'rear' in group['side'].values:
            group['parcel_label'] = 'curve parcel_standard'
        else:
            group['parcel_label'] = 'curve parcel_other'
    else:
        # Directly change label if 'num_edges' is not equal to 4
        group['parcel_label'] = 'curve parcel_other'
    return group

# Apply the function to each group and update the main DataFrame
updated_parcel_curve = (
    combined_parcel[combined_parcel['parcel_label'] == 'curve parcel']
    .groupby('parcel_id', group_keys=False)
    .apply(update_label_curve)
)
combined_parcel.update(updated_parcel_curve)

# for no match address parcel:
def update_label_nomatch(group):
    if group['num_edges'].iloc[0] == 4:
        if 'front' in group['side'].values and 'rear' in group['side'].values:
            group['parcel_label'] = 'no_match_address_standard'
        else:
            group['parcel_label'] = 'no_match_address_other'
    else:
        # Directly change label if 'num_edges' is not equal to 4
        group['parcel_label'] = 'no_match_address_other'
    return group

# Apply the function to each group and update the main DataFrame
updated_parcel_nomatch = (
    combined_parcel[combined_parcel['parcel_label'] == 'no_match_address']
    .groupby('parcel_id', group_keys=False)
    .apply(update_label_nomatch)
)
combined_parcel.update(updated_parcel_nomatch)

# for no address parcel:
def update_label_noaddress(group):
    if group['num_edges'].iloc[0] == 4:
        if 'front' in group['side'].values and 'rear' in group['side'].values:
            group['parcel_label'] = 'no_address_parcel_standard'
        else:
            group['parcel_label'] = 'no_address_parcel_other'
    else:
        # Directly change label if 'num_edges' is not equal to 4
        group['parcel_label'] = 'no_address_parcel_other'
    return group

# Apply the function to each group and update the main DataFrame
updated_parcel_noaddress = (
    combined_parcel[combined_parcel['parcel_label'] == 'parcel without address']
    .groupby('parcel_id', group_keys=False)
    .apply(update_label_noaddress)
)
combined_parcel.update(updated_parcel_noaddress)

# for special parcel:
def update_label_special(group):
    if group['num_edges'].iloc[0] in [4, 5]:
        if 'front' in group['side'].values and 'rear' in group['side'].values:
            group['parcel_label'] = 'special parcel_standard'
        else:
            group['parcel_label'] = 'special parcel_other'
    else:
        # Directly change label if 'num_edges' is neither 4 nor 5
        group['parcel_label'] = 'special parcel_other'
    return group

# Apply the function to each group and update the main DataFrame
updated_parcel_special = (
    combined_parcel[combined_parcel['parcel_label'] == 'special parcel']
    .groupby('parcel_id', group_keys=False)
    .apply(update_label_special)
)
combined_parcel.update(updated_parcel_special)
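The five functions above apply one rule with different label strings and edge counts. As a rough illustration only, the sketch below expresses that rule once with plain pandas. The helper name `split_standard_other`, its `allowed_edges` parameter, and the toy rows are assumptions made for this example and are not part of the notebook.

```python
import pandas as pd

def split_standard_other(edges, base_label, standard_label, other_label,
                         allowed_edges=(4,)):
    """Return a copy of `edges` with `base_label` parcels relabelled.

    Hypothetical helper: a parcel becomes `standard_label` when its edge
    count is in `allowed_edges` and it has both a 'front' and a 'rear'
    edge; otherwise it becomes `other_label`.
    """
    out = edges.copy()
    parcel_ids = out["parcel_id"]
    # Edges per parcel, computed the same way as `num_edges` in the notebook.
    num_edges = parcel_ids.map(out.groupby("parcel_id").size())
    # True on every row of a parcel that has at least one front / rear edge.
    has_front = out["side"].eq("front").groupby(parcel_ids).transform("any")
    has_rear = out["side"].eq("rear").groupby(parcel_ids).transform("any")

    is_target = out["parcel_label"].eq(base_label)
    is_standard = num_edges.isin(list(allowed_edges)) & has_front & has_rear
    out.loc[is_target & is_standard, "parcel_label"] = standard_label
    out.loc[is_target & ~is_standard, "parcel_label"] = other_label
    return out

# Toy edge table (made-up rows): parcel "a" has front and rear, parcel "b"
# has no rear edge, parcel "c" is a five-edge special parcel.
toy = pd.DataFrame({
    "parcel_id": ["a"] * 4 + ["b"] * 4 + ["c"] * 5,
    "side": ["front", "rear", "Interior side", "Interior side",
             "front", "Interior side", "Interior side", "Interior side",
             "front", "rear", "Interior side", "Interior side", "Interior side"],
    "parcel_label": ["curve parcel"] * 8 + ["special parcel"] * 5,
})
toy = split_standard_other(toy, "curve parcel",
                           "curve parcel_standard", "curve parcel_other")
toy = split_standard_other(toy, "special parcel",
                           "special parcel_standard", "special parcel_other",
                           allowed_edges=(4, 5))
labels = toy.groupby("parcel_id")["parcel_label"].first()
print(labels)
```

Because the masks are computed for the whole table at once, this variant avoids the per-group `apply` and the follow-up `update`, at the cost of being less explicit than one function per label.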
Step 2: Create the Confidence Area to Mark Which Parcels Are High Confidence
Code
parcel_label_summary = combined_parcel.groupby('parcel_id')['parcel_label'].first().reset_index()
# Rename the columns for clarity
parcel_label_summary.columns = ['parcel_id', 'unique_parcel_labels']

confidence_area = parcel[['parcel_id', 'parcel_addr', 'landuse', 'parcel_label', 'geometry']].copy()
confidence_area['parcel_id'] = confidence_area['parcel_id'].astype(str)
confidence_area = confidence_area.merge(parcel_label_summary, on='parcel_id', how='left')
confidence_area['parcel_label'] = confidence_area['unique_parcel_labels']
confidence_area = confidence_area.drop(columns=['unique_parcel_labels'])

confidence_area['confidence_level'] = np.where(
    confidence_area['parcel_label'].isin([
        'regular inside parcel',
        'regular corner parcel',
        'special parcel_standard',
        'curve parcel_standard',
        'cul_de_sac parcel_standard',
        'no_match_address_standard',
        'no_address_parcel_standard',
    ]),
    'Yes', 'No'
)

# Calculate the area (in square meters, given a meter-based CRS) and convert it to acres
confidence_area['area_acre'] = confidence_area['geometry'].area * 0.000247105
confidence_area.to_crs(epsg=4326, inplace=True)
confidence_level
Yes 4528
No 348
Name: count, dtype: int64
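The tally above has the shape of a `value_counts()` result, but the call that produced it is not shown. The sketch below is therefore an assumption about how such a tally can be obtained, run on a made-up four-row stand-in for `confidence_area`. Only the list of confident labels is taken from the code above.

```python
import pandas as pd

# Labels treated as high confidence, copied from the np.where(...) call above.
CONFIDENT_LABELS = [
    "regular inside parcel",
    "regular corner parcel",
    "special parcel_standard",
    "curve parcel_standard",
    "cul_de_sac parcel_standard",
    "no_match_address_standard",
    "no_address_parcel_standard",
]

# Toy stand-in for confidence_area; these rows are invented for illustration.
toy = pd.DataFrame({
    "parcel_id": ["1", "2", "3", "4"],
    "parcel_label": [
        "regular inside parcel",
        "curve parcel_other",
        "special parcel_standard",
        None,  # a parcel that found no label in the left merge
    ],
})

# Same rule as the notebook: 'Yes' if the label is in the list, else 'No'.
toy["confidence_level"] = (
    toy["parcel_label"].isin(CONFIDENT_LABELS).map({True: "Yes", False: "No"})
)
counts = toy["confidence_level"].value_counts()
print(counts)
```

Note that a parcel with a missing label falls into the 'No' group, since a missing value is never a member of the label list.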
Create a centroid point for each parcel and append it as the last row of that parcel's group of edges
Code
def add_centroids_to_combined_parcel(confidence_area, combined_parcel):
    # Step 1: Calculate the centroid for each geometry and add it to a new column 'centroid_geometry'
    confidence_area['centroid_geometry'] = confidence_area['geometry'].centroid

    # Step 2: Group by 'parcel_id' and get the centroid for each group as a DataFrame
    centroids_by_parcel = (
        confidence_area.groupby('parcel_id')['centroid_geometry']
        .apply(lambda x: x.iloc[-1])
        .reset_index()
    )

    # Step 3: Build one centroid row per parcel found in combined_parcel
    rows_to_add = []  # List to store new rows to be added
    for _, row in centroids_by_parcel.iterrows():
        parcel_id = row['parcel_id']
        centroid_geometry = row['centroid_geometry']

        # Get rows in combined_parcel that match the parcel_id
        parcel_group = combined_parcel[combined_parcel['parcel_id'] == parcel_id]

        # Add centroid row at the end of the group
        if not parcel_group.empty:
            # Copy the group's last row and replace its geometry with the centroid;
            # the other columns keep the values of that last row
            new_row = parcel_group.iloc[-1].copy()
            new_row['geometry'] = centroid_geometry
            new_row['side'] = 'centroid'  # Set the 'side' column value to 'centroid'
            rows_to_add.append(new_row)  # Add new row to list

    # Step 4: Use pd.concat to add all new rows to combined_parcel
    combined_parcel = pd.concat([combined_parcel, pd.DataFrame(rows_to_add)], ignore_index=True)

    # Step 5: Create a helper column and sort by 'parcel_id' and 'is_centroid'
    # so that centroid rows are at the end of each group
    combined_parcel['is_centroid'] = combined_parcel['side'] == 'centroid'
    combined_parcel = combined_parcel.sort_values(
        by=['parcel_id', 'is_centroid'], ascending=[True, True]
    ).reset_index(drop=True)

    # Step 6: Drop the helper column
    combined_parcel = combined_parcel.drop(columns=['is_centroid'])
    return combined_parcel

# Use the function to directly update combined_parcel
combined_parcel = add_centroids_to_combined_parcel(confidence_area, combined_parcel)

# If the CRS is not set or is incorrect, set it to the correct one (e.g., EPSG:3857)
combined_parcel.set_crs(epsg=3857, inplace=True, allow_override=True)
# Convert to EPSG:4326
combined_parcel = combined_parcel.to_crs(epsg=4326)
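The row-by-row loop in the function above can also be written without `iterrows`. The following is a minimal sketch of that idea in plain pandas, not the notebook's method: `append_centroid_rows` is a hypothetical name, the geometry column is stood in for by coordinate tuples so the example runs without geopandas, and the toy rows are invented apart from the two centroid coordinates shown in the output table.

```python
import pandas as pd

def append_centroid_rows(edges, centroids):
    """Append one 'centroid' row per parcel, placed after that parcel's edges.

    `centroids` maps parcel_id -> centroid geometry (here a coordinate tuple).
    As in the notebook, each new row is a copy of the parcel's last edge row
    with `geometry` and `side` replaced.
    """
    last_rows = edges.groupby("parcel_id").tail(1).copy()
    last_rows["geometry"] = last_rows["parcel_id"].map(centroids)
    last_rows["side"] = "centroid"
    # Skip parcels for which no centroid is available.
    last_rows = last_rows.dropna(subset=["geometry"])

    out = pd.concat([edges, last_rows], ignore_index=True)
    # Helper column so that centroid rows sort after the edges of each parcel.
    out["is_centroid"] = out["side"].eq("centroid")
    out = out.sort_values(["parcel_id", "is_centroid"]).reset_index(drop=True)
    return out.drop(columns="is_centroid")

# Toy edge table; geometries are placeholder strings, not real LineStrings.
edges = pd.DataFrame({
    "parcel_id": ["1000"] * 4 + ["999"] * 2,
    "side": ["Interior side", "rear", "front", "Interior side", "front", "rear"],
    "geometry": ["edge"] * 6,
})
centroids = {"1000": (-97.28921, 32.78615), "999": (-97.28990, 32.78585)}

with_centroids = append_centroid_rows(edges, centroids)
print(with_centroids)
```

Since `parcel_id` is a string column, "1000" sorts before "999", which matches the order of the two parcels in the output table.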
| | Prop_ID | GEO_ID | parcel_id | parcel_addr | landuse | landuse_spec | parcel_label | Found_Match | match_road_address | shape_index | 50_threshold | num_edges | angle_difference | shared_side | parcel_bearing | road_bearing | angle | distance_to_road | side | geometry |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 03027805 | NaN | 1000 | 3924 EARL ST | R | A | regular inside parcel | 1 | Earl St | 1.192563 | 0 | 4 | 0.790292 | 1 | 1.560608 | 0.011927 | 88.732853 | 28.968721 | Interior side | LINESTRING (-97.28932 32.78599, -97.28931 32.7... |
| 1 | 03027805 | NaN | 1000 | 3924 EARL ST | R | A | regular inside parcel | 1 | Earl St | 1.192563 | 0 | 4 | 0.790292 | 1 | -3.138026 | 0.011927 | 0.479009 | 50.845007 | rear | LINESTRING (-97.28912 32.78599, -97.28930 32.7... |
| 2 | 03027805 | NaN | 1000 | 3924 EARL ST | R | A | regular inside parcel | 1 | Earl St | 1.192563 | 0 | 4 | 0.790292 | 0 | -0.001866 | 0.011927 | 0.790292 | 7.329586 | front | LINESTRING (-97.28931 32.78632, -97.28911 32.7... |
| 3 | 03027805 | NaN | 1000 | 3924 EARL ST | R | A | regular inside parcel | 1 | Earl St | 1.192563 | 0 | 4 | 0.790292 | 1 | -1.589407 | 0.011927 | 88.250309 | 29.215929 | Interior side | LINESTRING (-97.28911 32.78632, -97.28912 32.7... |
| 4 | 03027805 | NaN | 1000 | 3924 EARL ST | R | A | regular inside parcel | 1 | Earl St | 1.192563 | 0 | 4 | 0.790292 | 1 | -1.589407 | 0.011927 | 88.250309 | 29.215929 | centroid | POINT (-97.28921 32.78615) |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 25235 | 00381004 | NaN | 999 | 1712 N BEACH ST | NaN | F1 | regular inside parcel | 1 | N Beach St | 1.361853 | 1 | 4 | 0.044117 | 0 | 1.572441 | 1.573211 | 0.044117 | 12.197149 | front | LINESTRING (-97.29050 32.78570, -97.29050 32.7... |
| 25236 | 00381004 | NaN | 999 | 1712 N BEACH ST | NaN | F1 | regular inside parcel | 1 | N Beach St | 1.361853 | 1 | 4 | 0.044117 | 1 | -0.000172 | 1.573211 | 89.851809 | 78.928026 | Interior side | LINESTRING (-97.29050 32.78599, -97.29029 32.7... |
| 25237 | 00381004 | NaN | 999 | 1712 N BEACH ST | NaN | F1 | regular inside parcel | 1 | N Beach St | 1.361853 | 1 | 4 | 0.044117 | 1 | 3.141498 | 1.573211 | 89.856205 | 79.090904 | Interior side | LINESTRING (-97.28930 32.78570, -97.28951 32.7... |
| 25238 | 00381004 | NaN | 999 | 1712 N BEACH ST | NaN | F1 | regular inside parcel | 1 | N Beach St | 1.361853 | 1 | 4 | 0.044117 | 1 | -1.558978 | 1.573211 | 0.538781 | 145.821717 | rear | LINESTRING (-97.28930 32.78599, -97.28930 32.7... |
| 25239 | 00381004 | NaN | 999 | 1712 N BEACH ST | NaN | F1 | regular inside parcel | 1 | N Beach St | 1.361853 | 1 | 4 | 0.044117 | 1 | -1.558978 | 1.573211 | 0.538781 | 145.821717 | centroid | POINT (-97.28990 32.78585) |

25240 rows × 20 columns
Data Key Column Explanation
Prop_ID
A unique identifier for each property, sourced directly from the data portal.
parcel_id
A unique identifier assigned to each parcel, defined as a sequential number starting from 1 and unique within this dataset.
parcel_addr
The physical address associated with each parcel.
landuse
The current land use type for each parcel. Example codes include:
- R: Residential
- P: Public
- C: Commercial
landuse_spec
A more detailed land use code for each parcel.
parcel_label
The classification label that we assigned to each parcel.
side
The classification label for each parcel edge, which can be one of the following:
- Front
- Rear
- Exterior Side
- Interior Side
- Centroid: a point marking where the parcel is
geometry
The geometry of each parcel edge as a LineString (centroid rows hold a Point instead). CRS: EPSG:4326 (WGS 84).
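Because edge rows and centroid rows share one table, most downstream use starts by filtering on `side`. The sketch below shows one way to do that with plain pandas. It uses only three of the columns, with values copied from parcel 1000 in the output table, and the variable names are chosen for this example.

```python
import pandas as pd

# The five rows of parcel 1000 from the table, reduced to three columns.
rows = pd.DataFrame({
    "parcel_id": ["1000"] * 5,
    "side": ["Interior side", "rear", "front", "Interior side", "centroid"],
    "distance_to_road": [28.968721, 50.845007, 7.329586, 29.215929, 29.215929],
})

# Separate the real parcel edges from the appended centroid rows.
is_centroid = rows["side"].eq("centroid")
edge_rows = rows[~is_centroid]
centroid_rows = rows[is_centroid]

# Edges per parcel, and the front edge of each parcel.
edges_per_parcel = edge_rows.groupby("parcel_id").size()
front_rows = edge_rows[edge_rows["side"].eq("front")]

print(edges_per_parcel)
print(front_rows)
```

Each centroid row is a copy of the parcel's last edge row with only `side` and `geometry` replaced, so its edge-level attributes such as `distance_to_road` repeat that edge's values and should not be read as properties of the centroid.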