Generate Results

Data Integration

Merge all classified parcel segments into a single GeoDataFrame.

  • 01.Regular Inside Parcel
  • 02.Regular Corner Parcel
  • 03.Special Parcel
  • 04.Jagged Parcel
  • 05.Curve Parcel
  • 06.Cul De Sac Parcel
  • 07.No Match Address Parcel
  • 08.No Address Parcel
  • 09.Duplicated Address Parcel
Code
# List of GeoDataFrames to combine
parcel_list = [
    regular_insid_parcel, regular_corner_parcel, special_parcel, jagged_parcel, curve_parcel,
    cul_de_sac_parcel, no_match_address_parcel, no_address_parcel, duplicated_address_parcel
]

# Concatenate all GeoDataFrames in the list and ensure 'crs' and 'geometry' are set
combined_parcel = gpd.GeoDataFrame(
    pd.concat(parcel_list, ignore_index=True),
    crs=parcel_seg.crs,  # Reuse the CRS of parcel_seg (assumed to match the GeoDataFrames in the list)
    geometry='geometry'  # Ensure the geometry column is correctly set
)

combined_parcel['parcel_id'] = combined_parcel['parcel_id'].astype(str)
# Sort by 'parcel_id' (compared as strings) so that rows of the same parcel are adjacent
combined_parcel = combined_parcel.sort_values(by='parcel_id').reset_index(drop=True)
Prop_ID GEO_ID parcel_id parcel_addr landuse landuse_spec parcel_label Found_Match match_road_address shape_index 50_threshold num_edges angle_difference shared_side parcel_bearing road_bearing angle distance_to_road side geometry
0 03027805 NaN 1000 3924 EARL ST R A regular inside parcel 1 Earl St 1.192563 False 4.0 0.790292 True 1.560608 0.011927 88.732853 28.968721 Interior side LINESTRING (-97.28932 32.78599, -97.28931 32.7...
1 03027805 NaN 1000 3924 EARL ST R A regular inside parcel 1 Earl St 1.192563 False 4.0 0.790292 True -3.138026 0.011927 0.479009 50.845007 rear LINESTRING (-97.28912 32.78599, -97.28930 32.7...
2 03027805 NaN 1000 3924 EARL ST R A regular inside parcel 1 Earl St 1.192563 False 4.0 0.790292 False -0.001866 0.011927 0.790292 7.329586 front LINESTRING (-97.28931 32.78632, -97.28911 32.7...
3 03027805 NaN 1000 3924 EARL ST R A regular inside parcel 1 Earl St 1.192563 False 4.0 0.790292 True -1.589407 0.011927 88.250309 29.215929 Interior side LINESTRING (-97.28911 32.78632, -97.28912 32.7...
4 03027783 NaN 1001 3932 EARL ST R A regular inside parcel 1 Earl St 1.195293 False 4.0 1.184516 True 1.560690 0.011927 88.737543 29.672111 Interior side LINESTRING (-97.28892 32.78599, -97.28892 32.7...
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
20362 04723295 NaN 998 3920 EARL ST R A regular inside parcel 1 Earl St 1.214266 False 4.0 1.325428 True 3.136724 0.011927 0.962359 50.593869 rear LINESTRING (-97.28932 32.78599, -97.28950 32.7...
20363 00381004 NaN 999 1712 N BEACH ST NaN F1 regular inside parcel 1 N Beach St 1.361853 True 4.0 0.044117 False 1.572441 1.573211 0.044117 12.197149 front LINESTRING (-97.29050 32.78570, -97.29050 32.7...
20364 00381004 NaN 999 1712 N BEACH ST NaN F1 regular inside parcel 1 N Beach St 1.361853 True 4.0 0.044117 True -0.000172 1.573211 89.851809 78.928026 Interior side LINESTRING (-97.29050 32.78599, -97.29029 32.7...
20365 00381004 NaN 999 1712 N BEACH ST NaN F1 regular inside parcel 1 N Beach St 1.361853 True 4.0 0.044117 True 3.141498 1.573211 89.856205 79.090904 Interior side LINESTRING (-97.28930 32.78570, -97.28951 32.7...
20366 00381004 NaN 999 1712 N BEACH ST NaN F1 regular inside parcel 1 N Beach St 1.361853 True 4.0 0.044117 True -1.558978 1.573211 0.538781 145.821717 rear LINESTRING (-97.28930 32.78599, -97.28930 32.7...

20367 rows × 20 columns

parcel_label
regular inside parcel     14576
regular corner parcel      2768
special parcel             1320
jagged parcel               913
duplicated address          270
curve parcel                253
cul_de_sac parcel           185
parcel without address       78
no_match_address              4
Name: count, dtype: int64
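
The counts above sum to 20,367, the number of rows, so they count edge segments rather than parcels. A minimal sketch of counting distinct parcels per label instead, using a small made-up frame that only borrows the `parcel_id` and `parcel_label` column names (the variable names below are illustrative, not from the notebook):

```python
import pandas as pd

# Toy stand-in for combined_parcel: one row per parcel edge (values are made up).
edges = pd.DataFrame({
    "parcel_id": ["1000", "1000", "1000", "1000", "1001", "1001", "1001", "2000", "2000"],
    "parcel_label": ["regular inside parcel"] * 7 + ["jagged parcel"] * 2,
})

# Rows per label: this counts edges, like value_counts() on the full frame.
edge_counts_by_label = edges["parcel_label"].value_counts()

# Distinct parcels per label: count unique parcel_id values within each label.
parcel_counts_by_label = edges.groupby("parcel_label")["parcel_id"].nunique()

print(edge_counts_by_label)
print(parcel_counts_by_label)
```

In the toy data, seven edge rows carry the regular inside label but they belong to only two parcels.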

Granular Classification for Marking Confidence

Parcels labeled as Cul-de-sac, Curve, Special, No Match Address, and No Address are further subdivided into “standard” and “other” types to assign a confidence level. These rules are based on visual inspection and observed patterns. The main factors are whether both the front and rear sides are present and the number of edges in the parcel: a parcel is “standard” when it has exactly four edges (four or five for Special parcels) and both a front and a rear side, and “other” in every remaining case.

  • 01. Regular Inside Parcel (confidence level = ‘Yes’)
  • 02. Regular Corner Parcel (confidence level = ‘Yes’)
  • 03. Special Parcel Standard (confidence level = ‘Yes’)
  • 04. Special Parcel Other (confidence level = ‘No’)
  • 05. Jagged Parcel (confidence level = ‘No’)
  • 06. Curve Parcel Standard (confidence level = ‘Yes’)
  • 07. Curve Parcel Other (confidence level = ‘No’)
  • 08. Cul De Sac Parcel Standard (confidence level = ‘Yes’)
  • 09. Cul De Sac Parcel Other (confidence level = ‘No’)
  • 10. No Match Address Parcel Standard (confidence level = ‘Yes’)
  • 11. No Match Address Parcel Other (confidence level = ‘No’)
  • 12. No Address Parcel Standard (confidence level = ‘Yes’)
  • 13. No Address Parcel Other (confidence level = ‘No’)
  • 14. Duplicated Address Parcel (confidence level = ‘No’)

Step 1: Assign Standard and Other Types for Selected Parcel Labels

Code
# Calculate the number of edges for each parcel group
edge_counts = combined_parcel.groupby('parcel_id').size()
combined_parcel['num_edges'] = combined_parcel['parcel_id'].map(edge_counts)

# for cul_de_sac parcel:
def update_label_cul_de_sac(group):
    if group['num_edges'].iloc[0] == 4:
        if 'front' in group['side'].values and 'rear' in group['side'].values:
            group['parcel_label'] = 'cul_de_sac parcel_standard'
        else:
            group['parcel_label'] = 'cul_de_sac parcel_other'
    else:
        # Directly change label if 'num_edges' is not equal to 4
        group['parcel_label'] = 'cul_de_sac parcel_other'
    return group

# Apply the function to each group and update the main DataFrame
updated_parcel_cul_de_sac = combined_parcel[combined_parcel['parcel_label'] == 'cul_de_sac parcel'].groupby('parcel_id', group_keys=False).apply(update_label_cul_de_sac)
combined_parcel.update(updated_parcel_cul_de_sac)

# for curve parcel:
def update_label_curve(group):
    if group['num_edges'].iloc[0] == 4:
        if 'front' in group['side'].values and 'rear' in group['side'].values:
            group['parcel_label'] = 'curve parcel_standard'
        else:
            group['parcel_label'] = 'curve parcel_other'
    else:
        # Directly change label if 'num_edges' is not equal to 4
        group['parcel_label'] = 'curve parcel_other'
    return group

# Apply the function to each group and update the main DataFrame
updated_parcel_curve = combined_parcel[combined_parcel['parcel_label'] == 'curve parcel'].groupby('parcel_id', group_keys=False).apply(update_label_curve)
combined_parcel.update(updated_parcel_curve)

# for no match address parcel:
def update_label_nomatch(group):
    if group['num_edges'].iloc[0] == 4:
        if 'front' in group['side'].values and 'rear' in group['side'].values:
            group['parcel_label'] = 'no_match_address_standard'
        else:
            group['parcel_label'] = 'no_match_address_other'
    else:
        # Directly change label if 'num_edges' is not equal to 4
        group['parcel_label'] = 'no_match_address_other'
    return group

# Apply the function to each group and update the main DataFrame
updated_parcel_nomatch = combined_parcel[combined_parcel['parcel_label'] == 'no_match_address'].groupby('parcel_id', group_keys=False).apply(update_label_nomatch)
combined_parcel.update(updated_parcel_nomatch)

# for no address parcel
def update_label_noaddress(group):
    if group['num_edges'].iloc[0] == 4:
        if 'front' in group['side'].values and 'rear' in group['side'].values:
            group['parcel_label'] = 'no_address_parcel_standard'
        else:
            group['parcel_label'] = 'no_address_parcel_other'
    else:
        # Directly change label if 'num_edges' is not equal to 4
        group['parcel_label'] = 'no_address_parcel_other'
    return group

# Apply the function to each group and update the main DataFrame
updated_parcel_noaddress = combined_parcel[combined_parcel['parcel_label'] == 'parcel without address'].groupby('parcel_id', group_keys=False).apply(update_label_noaddress)
combined_parcel.update(updated_parcel_noaddress)

# for special parcel
def update_label_special(group):
    if group['num_edges'].iloc[0] in [4, 5]:
        if 'front' in group['side'].values and 'rear' in group['side'].values:
            group['parcel_label'] = 'special parcel_standard'
        else:
            group['parcel_label'] = 'special parcel_other'
    else:
        # Directly change label if 'num_edges' is neither 4 nor 5
        group['parcel_label'] = 'special parcel_other'
    return group

# Apply the function to each group and update the main DataFrame
updated_parcel_special = combined_parcel[combined_parcel['parcel_label'] == 'special parcel'].groupby('parcel_id', group_keys=False).apply(update_label_special)
combined_parcel.update(updated_parcel_special)
parcel_label
regular inside parcel         14576
regular corner parcel          2768
jagged parcel                   913
special parcel_standard         693
special parcel_other            627
duplicated address              270
cul_de_sac parcel_other         173
curve parcel_other              157
curve parcel_standard            96
no_address_parcel_standard       60
no_address_parcel_other          18
cul_de_sac parcel_standard       12
no_match_address_standard         4
Name: count, dtype: int64
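
The five functions above differ only in the base label, the allowed edge counts, and the output label prefix. A minimal sketch of one table-driven helper that applies the same rules without `groupby().apply()`; the names `SPLIT_RULES` and `split_standard_other` are hypothetical, and the toy frame is made up for illustration:

```python
import pandas as pd

# Hypothetical rule table: base label -> (edge counts treated as standard, new label prefix).
# The entries mirror the five functions in the notebook.
SPLIT_RULES = {
    "cul_de_sac parcel": ({4}, "cul_de_sac parcel"),
    "curve parcel": ({4}, "curve parcel"),
    "no_match_address": ({4}, "no_match_address"),
    "parcel without address": ({4}, "no_address_parcel"),
    "special parcel": ({4, 5}, "special parcel"),
}

def split_standard_other(df, rules=SPLIT_RULES):
    """Return a copy of df with the selected labels split into _standard and _other."""
    out = df.copy()
    # Per-parcel flags, broadcast back to every edge row of the parcel.
    has_front = (out["side"] == "front").groupby(out["parcel_id"]).transform("any")
    has_rear = (out["side"] == "rear").groupby(out["parcel_id"]).transform("any")
    for base, (allowed_edges, prefix) in rules.items():
        selected = out["parcel_label"] == base
        standard = selected & out["num_edges"].isin(allowed_edges) & has_front & has_rear
        out.loc[standard, "parcel_label"] = f"{prefix}_standard"
        out.loc[selected & ~standard, "parcel_label"] = f"{prefix}_other"
    return out

# Toy data: parcel 1 has front and rear, parcel 2 lacks a rear, parcel 3 has five edges.
toy = pd.DataFrame({
    "parcel_id": ["1"] * 4 + ["2"] * 4 + ["3"] * 5,
    "parcel_label": ["curve parcel"] * 8 + ["special parcel"] * 5,
    "side": ["front", "rear", "Interior side", "Interior side",
             "front", "Interior side", "Interior side", "Interior side",
             "front", "rear", "Interior side", "Interior side", "Interior side"],
})
toy["num_edges"] = toy["parcel_id"].map(toy.groupby("parcel_id").size())

result = split_standard_other(toy)
print(result.groupby("parcel_id")["parcel_label"].first())
```

Keeping the rules in one dictionary makes the single difference between the labels (Special parcels also accept five edges) visible at a glance.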

Step 2: Create the Confidence Area to Mark Which Parcels Are Confident

Code
parcel_label_summary = combined_parcel.groupby('parcel_id')['parcel_label'].first().reset_index()
# Rename the columns for clarity
parcel_label_summary.columns = ['parcel_id', 'unique_parcel_labels']

confidence_area = parcel[['parcel_id','parcel_addr','landuse','parcel_label','geometry']].copy()
confidence_area['parcel_id'] = confidence_area['parcel_id'].astype(str)
confidence_area = confidence_area.merge(parcel_label_summary, on='parcel_id', how='left')
confidence_area['parcel_label'] = confidence_area['unique_parcel_labels']
confidence_area = confidence_area.drop(columns=['unique_parcel_labels'])


confidence_area['confidence_level'] = np.where(
    confidence_area['parcel_label'].isin(['regular inside parcel', 'regular corner parcel', 'special parcel_standard','curve parcel_standard','cul_de_sac parcel_standard','no_match_address_standard','no_address_parcel_standard']),
    'Yes', 'No'
)

# Calculate the area in square meters (requires a projected CRS in meters) and convert it to acres
confidence_area['area_acre'] = confidence_area['geometry'].area * 0.000247105
confidence_area.to_crs(epsg=4326, inplace=True)
confidence_level
Yes    4528
No      348
Name: count, dtype: int64
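
The factor 0.000247105 in the cell above converts square meters to acres. A small sketch of where it comes from, assuming the areas really are in square meters (that is, that `parcel` is in a projected CRS with meter units when `.area` is called, which this notebook does not state); `SQM_PER_ACRE` and `sqm_to_acres` are illustrative names:

```python
# One international acre is defined as exactly 4046.8564224 square meters.
SQM_PER_ACRE = 4046.8564224

def sqm_to_acres(area_sqm):
    """Convert an area in square meters to acres."""
    return area_sqm / SQM_PER_ACRE

# The reciprocal is the multiplier used in the notebook, up to rounding.
acres_per_sqm = 1 / SQM_PER_ACRE
print(f"{acres_per_sqm:.9f}")
print(sqm_to_acres(10_000))  # one hectare expressed in acres
```

If the geometry were in degrees (for example EPSG:4326) at that point, the resulting areas would not be square meters and the conversion would not be meaningful.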

Create a centroid point for each parcel and append it as the last row of that parcel's group

Code
def add_centroids_to_combined_parcel(confidence_area, combined_parcel):
    # Step 1: Calculate the centroid for each geometry and add it to a new column 'centroid_geometry'
    confidence_area['centroid_geometry'] = confidence_area['geometry'].centroid
    # Step 2: Group by 'parcel_id' and get the centroid for each group as a DataFrame
    centroids_by_parcel = confidence_area.groupby('parcel_id')['centroid_geometry'].apply(lambda x: x.iloc[-1]).reset_index()
    # Step 3: Add centroid data to the last row of the corresponding group in combined_parcel
    rows_to_add = []  # List to store new rows to be added
    for _, row in centroids_by_parcel.iterrows():
        parcel_id = row['parcel_id']
        centroid_geometry = row['centroid_geometry']
        # Get rows in combined_parcel that match the parcel_id
        parcel_group = combined_parcel[combined_parcel['parcel_id'] == parcel_id]
        # Add centroid row at the end of the group
        if not parcel_group.empty:
            # Create a new row by copying the group's last row, then overwrite geometry and side
            new_row = parcel_group.iloc[-1].copy()
            new_row['geometry'] = centroid_geometry
            new_row['side'] = 'centroid'  # Set the 'side' column value to 'centroid'
            rows_to_add.append(new_row)  # Add new row to list
    # Use pd.concat to add all new rows to combined_parcel
    combined_parcel = pd.concat([combined_parcel, pd.DataFrame(rows_to_add)], ignore_index=True)
    # Create a helper column to ensure centroid rows appear at the end of each group
    combined_parcel['is_centroid'] = combined_parcel['side'] == 'centroid'
    # Sort by 'parcel_id' and 'is_centroid' so that centroid rows are at the end of each group
    combined_parcel = combined_parcel.sort_values(by=['parcel_id', 'is_centroid'], ascending=[True, True]).reset_index(drop=True)
    # Drop the helper column
    combined_parcel = combined_parcel.drop(columns=['is_centroid'])
    return combined_parcel

# Use the function to directly update combined_parcel
combined_parcel = add_centroids_to_combined_parcel(confidence_area, combined_parcel)
# Label the data as EPSG:3857; allow_override replaces any existing CRS label without reprojecting
combined_parcel.set_crs(epsg=3857, inplace=True, allow_override=True)
# Convert to EPSG:4326
combined_parcel = combined_parcel.to_crs(epsg=4326)
Prop_ID GEO_ID parcel_id parcel_addr landuse landuse_spec parcel_label Found_Match match_road_address shape_index 50_threshold num_edges angle_difference shared_side parcel_bearing road_bearing angle distance_to_road side geometry
0 03027805 NaN 1000 3924 EARL ST R A regular inside parcel 1 Earl St 1.192563 0 4 0.790292 1 1.560608 0.011927 88.732853 28.968721 Interior side LINESTRING (-97.28932 32.78599, -97.28931 32.7...
1 03027805 NaN 1000 3924 EARL ST R A regular inside parcel 1 Earl St 1.192563 0 4 0.790292 1 -3.138026 0.011927 0.479009 50.845007 rear LINESTRING (-97.28912 32.78599, -97.28930 32.7...
2 03027805 NaN 1000 3924 EARL ST R A regular inside parcel 1 Earl St 1.192563 0 4 0.790292 0 -0.001866 0.011927 0.790292 7.329586 front LINESTRING (-97.28931 32.78632, -97.28911 32.7...
3 03027805 NaN 1000 3924 EARL ST R A regular inside parcel 1 Earl St 1.192563 0 4 0.790292 1 -1.589407 0.011927 88.250309 29.215929 Interior side LINESTRING (-97.28911 32.78632, -97.28912 32.7...
4 03027805 NaN 1000 3924 EARL ST R A regular inside parcel 1 Earl St 1.192563 0 4 0.790292 1 -1.589407 0.011927 88.250309 29.215929 centroid POINT (-97.28921 32.78615)
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
25235 00381004 NaN 999 1712 N BEACH ST NaN F1 regular inside parcel 1 N Beach St 1.361853 1 4 0.044117 0 1.572441 1.573211 0.044117 12.197149 front LINESTRING (-97.29050 32.78570, -97.29050 32.7...
25236 00381004 NaN 999 1712 N BEACH ST NaN F1 regular inside parcel 1 N Beach St 1.361853 1 4 0.044117 1 -0.000172 1.573211 89.851809 78.928026 Interior side LINESTRING (-97.29050 32.78599, -97.29029 32.7...
25237 00381004 NaN 999 1712 N BEACH ST NaN F1 regular inside parcel 1 N Beach St 1.361853 1 4 0.044117 1 3.141498 1.573211 89.856205 79.090904 Interior side LINESTRING (-97.28930 32.78570, -97.28951 32.7...
25238 00381004 NaN 999 1712 N BEACH ST NaN F1 regular inside parcel 1 N Beach St 1.361853 1 4 0.044117 1 -1.558978 1.573211 0.538781 145.821717 rear LINESTRING (-97.28930 32.78599, -97.28930 32.7...
25239 00381004 NaN 999 1712 N BEACH ST NaN F1 regular inside parcel 1 N Beach St 1.361853 1 4 0.044117 1 -1.558978 1.573211 0.538781 145.821717 centroid POINT (-97.28990 32.78585)

25240 rows × 20 columns
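
The row-by-row loop in `add_centroids_to_combined_parcel` can also be written with grouped operations. A minimal pandas-only sketch of the same append-and-sort idea, using made-up data and placeholder strings in place of real geometries; `edges`, `centroids`, and `with_centroids` are illustrative names:

```python
import pandas as pd

# Toy stand-ins (made-up values); geometry holds placeholder strings, not real shapes.
edges = pd.DataFrame({
    "parcel_id": ["1000", "1000", "999", "999"],
    "side": ["front", "rear", "front", "rear"],
    "geometry": ["line-a", "line-b", "line-c", "line-d"],
})
# Lookup from parcel_id to its centroid (placeholder values).
centroids = pd.Series({"1000": "point-1000", "999": "point-999"})

# Copy the last edge row of every parcel, then overwrite side and geometry.
centroid_rows = edges.groupby("parcel_id").tail(1).copy()
centroid_rows["side"] = "centroid"
centroid_rows["geometry"] = centroid_rows["parcel_id"].map(centroids)

# Append and sort so each centroid row follows its parcel's edges.
# False sorts before True, so is_centroid pushes the new row to the end of its group.
with_centroids = (
    pd.concat([edges, centroid_rows], ignore_index=True)
    .assign(is_centroid=lambda d: d["side"] == "centroid")
    .sort_values(["parcel_id", "is_centroid"])
    .drop(columns="is_centroid")
    .reset_index(drop=True)
)
print(with_centroids)
```

As in the notebook, every other column of the centroid row is copied from the parcel's last edge. One difference to keep in mind: a parcel with edges but no entry in the lookup would get a missing geometry here, so the lookup should cover every `parcel_id`.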

Key Column Explanations

Prop_ID

A unique identifier for each property, sourced directly from the data portal.

parcel_id

A unique identifier assigned to each parcel, defined as a sequential number starting from 1 and unique within this dataset.

parcel_addr

The physical address associated with each parcel.

landuse

The current land use type for each parcel. Example codes include:
- R: Residential
- P: Public
- C: Commercial

landuse_spec

A more detailed land use code for each parcel. More details are available here

parcel_label

The classification label that we assigned to each parcel.

side

The classification label for each parcel edge, which can be one of the following:
- Front
- Rear
- Exterior Side
- Interior Side
- Centroid: a point marking where the parcel is located

geometry

The geometry of each parcel edge as a LineString; centroid rows hold a Point instead. CRS: EPSG:4326 (WGS 84).


Data Statistics

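Parcels with confidence level ‘Yes’ are the “regular inside parcel” and “regular corner parcel” types, together with the “standard” subtypes of the special, curve, cul-de-sac, no match address, and no address parcels. The confidence percentage is calculated using the formula below: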

\(\text{Confidence Percentage} = \left( \frac{\text{Number of 'Yes'}}{\text{Total Number of Parcels}} \right) \times 100\)

Where:

  • Number of ‘Yes’: The count of parcels classified with ‘Yes’ in the confidence_level column.
  • Total Number of Parcels: The total count of parcels in the dataset.
Confidence Percentage: 92.86%
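
As a worked check of the formula, the sketch below plugs in the `confidence_level` counts reported in Step 2 (4528 ‘Yes’, 348 ‘No’); the variable names are illustrative:

```python
# Counts taken from the confidence_level output in Step 2 of this notebook.
counts = {"Yes": 4528, "No": 348}

total_parcels = sum(counts.values())
confidence_percentage = counts["Yes"] / total_parcels * 100

print(f"Confidence Percentage: {confidence_percentage:.2f}%")
```

With 4876 parcels in total, this reproduces the 92.86% shown above.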