Generate Results

Data Integration

Merge all classified parcel segments into a single GeoDataFrame.

01. Regular Inside Parcel
02. Regular Corner Parcel
03. Special Parcel
04. Jagged Parcel
05. Curve Parcel
06. Cul De Sac Parcel
07. No Match Address Parcel
08. No Address Parcel
09. Duplicated Address Parcel


import geopandas as gpd
import numpy as np
import pandas as pd

# List of GeoDataFrames to combine
parcel_list = [
    regular_insid_parcel, regular_corner_parcel, special_parcel, jagged_parcel, curve_parcel,
    cul_de_sac_parcel, no_match_address_parcel, no_address_parcel, duplicated_address_parcel
]

# Concatenate all GeoDataFrames and explicitly set the CRS and the active geometry column
combined_parcel = gpd.GeoDataFrame(
    pd.concat(parcel_list, ignore_index=True),
    crs=parcel_seg.crs,  # Reuse the CRS of the source segment layer (parcel_seg)
    geometry='geometry'  # Use the 'geometry' column as the active geometry
)

# Store parcel_id as a string so it matches the string keys used in later merges
combined_parcel['parcel_id'] = combined_parcel['parcel_id'].astype(str)
# Sort by parcel_id so all edges of the same parcel are adjacent
# (string IDs sort lexicographically, so '1000' comes before '999')
combined_parcel = combined_parcel.sort_values(by='parcel_id').reset_index(drop=True)

	Prop_ID	GEO_ID	parcel_id	parcel_addr	landuse	landuse_spec	parcel_label	Found_Match	match_road_address	shape_index	50_threshold	num_edges	angle_difference	shared_side	parcel_bearing	road_bearing	angle	distance_to_road	side	geometry
0	03027805	NaN	1000	3924 EARL ST	R	A	regular inside parcel	1	Earl St	1.192563	False	4.0	0.790292	True	1.560608	0.011927	88.732853	28.968721	Interior side	LINESTRING (-97.28932 32.78599, -97.28931 32.7...
1	03027805	NaN	1000	3924 EARL ST	R	A	regular inside parcel	1	Earl St	1.192563	False	4.0	0.790292	True	-3.138026	0.011927	0.479009	50.845007	rear	LINESTRING (-97.28912 32.78599, -97.28930 32.7...
2	03027805	NaN	1000	3924 EARL ST	R	A	regular inside parcel	1	Earl St	1.192563	False	4.0	0.790292	False	-0.001866	0.011927	0.790292	7.329586	front	LINESTRING (-97.28931 32.78632, -97.28911 32.7...
3	03027805	NaN	1000	3924 EARL ST	R	A	regular inside parcel	1	Earl St	1.192563	False	4.0	0.790292	True	-1.589407	0.011927	88.250309	29.215929	Interior side	LINESTRING (-97.28911 32.78632, -97.28912 32.7...
4	03027783	NaN	1001	3932 EARL ST	R	A	regular inside parcel	1	Earl St	1.195293	False	4.0	1.184516	True	1.560690	0.011927	88.737543	29.672111	Interior side	LINESTRING (-97.28892 32.78599, -97.28892 32.7...
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
20362	04723295	NaN	998	3920 EARL ST	R	A	regular inside parcel	1	Earl St	1.214266	False	4.0	1.325428	True	3.136724	0.011927	0.962359	50.593869	rear	LINESTRING (-97.28932 32.78599, -97.28950 32.7...
20363	00381004	NaN	999	1712 N BEACH ST	NaN	F1	regular inside parcel	1	N Beach St	1.361853	True	4.0	0.044117	False	1.572441	1.573211	0.044117	12.197149	front	LINESTRING (-97.29050 32.78570, -97.29050 32.7...
20364	00381004	NaN	999	1712 N BEACH ST	NaN	F1	regular inside parcel	1	N Beach St	1.361853	True	4.0	0.044117	True	-0.000172	1.573211	89.851809	78.928026	Interior side	LINESTRING (-97.29050 32.78599, -97.29029 32.7...
20365	00381004	NaN	999	1712 N BEACH ST	NaN	F1	regular inside parcel	1	N Beach St	1.361853	True	4.0	0.044117	True	3.141498	1.573211	89.856205	79.090904	Interior side	LINESTRING (-97.28930 32.78570, -97.28951 32.7...
20366	00381004	NaN	999	1712 N BEACH ST	NaN	F1	regular inside parcel	1	N Beach St	1.361853	True	4.0	0.044117	True	-1.558978	1.573211	0.538781	145.821717	rear	LINESTRING (-97.28930 32.78599, -97.28930 32.7...

20367 rows × 20 columns

parcel_label
regular inside parcel     14576
regular corner parcel      2768
special parcel             1320
jagged parcel               913
duplicated address          270
curve parcel                253
cul_de_sac parcel           185
parcel without address       78
no_match_address              4
Name: count, dtype: int64
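The counts above sum to 20,367, the number of rows in `combined_parcel`, so they count parcel edges rather than parcels. A minimal sketch, on hypothetical toy data, of a summary that reports both edge and parcel counts and also flags any `parcel_id` whose edges carry more than one label (the later `groupby('parcel_id')['parcel_label'].first()` step assumes one label per parcel). `label_summary` is a hypothetical helper, not part of the original notebook:

```python
import pandas as pd


def label_summary(edges: pd.DataFrame) -> dict:
    """Summarize an edge-level table that has 'parcel_id' and 'parcel_label' columns."""
    labels_per_parcel = edges.groupby("parcel_id")["parcel_label"].nunique()
    return {
        # One row per edge, so this counts edges
        "edges_per_label": edges["parcel_label"].value_counts().to_dict(),
        # Distinct parcel_ids per label, so this counts parcels
        "parcels_per_label": edges.groupby("parcel_label")["parcel_id"].nunique().to_dict(),
        # Parcels whose edges disagree on the label
        "conflicting_parcels": labels_per_parcel[labels_per_parcel > 1].index.tolist(),
    }


# Hypothetical toy data: parcels '1' and '2' have 4 edges each, parcel '3' has 5
toy = pd.DataFrame({
    "parcel_id": ["1"] * 4 + ["2"] * 4 + ["3"] * 5,
    "parcel_label": ["regular inside parcel"] * 8 + ["special parcel"] * 5,
})
summary = label_summary(toy)
print(summary["edges_per_label"])
print(summary["parcels_per_label"])
print(summary["conflicting_parcels"])
```

On the real data, an empty `conflicting_parcels` list would confirm that taking the first label per parcel is safe.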

Granular Classification for Marking Confidence

Parcels labeled Cul-de-sac, Curve, Special, No Match Address, and No Address are further split into “standard” and “other” subtypes, which determine their confidence level. These rules are based on visual inspection and observed patterns. The criteria are nearly identical for each label: a parcel is “standard” only if it has both a front and a rear edge and the expected number of edges (exactly 4, or 4 or 5 for Special parcels); otherwise it is “other”.

01. Regular Inside Parcel (confidence level = ‘Yes’)
02. Regular Corner Parcel (confidence level = ‘Yes’)
03. Special Parcel Standard (confidence level = ‘Yes’)
04. Special Parcel Other (confidence level = ‘No’)
05. Jagged Parcel (confidence level = ‘No’)
06. Curve Parcel Standard (confidence level = ‘Yes’)
07. Curve Parcel Other (confidence level = ‘No’)
08. Cul De Sac Parcel Standard (confidence level = ‘Yes’)
09. Cul De Sac Parcel Other (confidence level = ‘No’)
10. No Match Address Parcel Standard (confidence level = ‘Yes’)
11. No Match Address Parcel Other (confidence level = ‘No’)
12. No Address Parcel Standard (confidence level = ‘Yes’)
13. No Address Parcel Other (confidence level = ‘No’)
14. Duplicated Address Parcel (confidence level = ‘No’)

Step 1: Assign Standard and Other Types to Selected Parcel Labels


# Calculate the number of edges for each parcel group
edge_counts = combined_parcel.groupby('parcel_id').size()
combined_parcel['num_edges'] = combined_parcel['parcel_id'].map(edge_counts)

# for cul_de_sac parcel:
def update_label_cul_de_sac(group):
    if group['num_edges'].iloc[0] == 4:
        if 'front' in group['side'].values and 'rear' in group['side'].values:
            group['parcel_label'] = 'cul_de_sac parcel_standard'
        else:
            group['parcel_label'] = 'cul_de_sac parcel_other'
    else:
        # Directly change label if 'num_edges' is not equal to 4
        group['parcel_label'] = 'cul_de_sac parcel_other'
    return group

# Apply the function to each group and update the main DataFrame
updated_parcel_cul_de_sac = combined_parcel[combined_parcel['parcel_label'] == 'cul_de_sac parcel'].groupby('parcel_id', group_keys=False).apply(update_label_cul_de_sac)
combined_parcel.update(updated_parcel_cul_de_sac)

# for curve parcel:
def update_label_curve(group):
    if group['num_edges'].iloc[0] == 4:
        if 'front' in group['side'].values and 'rear' in group['side'].values:
            group['parcel_label'] = 'curve parcel_standard'
        else:
            group['parcel_label'] = 'curve parcel_other'
    else:
        # Directly change label if 'num_edges' is not equal to 4
        group['parcel_label'] = 'curve parcel_other'
    return group

# Apply the function to each group and update the main DataFrame
updated_parcel_curve = combined_parcel[combined_parcel['parcel_label'] == 'curve parcel'].groupby('parcel_id', group_keys=False).apply(update_label_curve)
combined_parcel.update(updated_parcel_curve)

# for no match address parcel:
def update_label_nomatch(group):
    if group['num_edges'].iloc[0] == 4:
        if 'front' in group['side'].values and 'rear' in group['side'].values:
            group['parcel_label'] = 'no_match_address_standard'
        else:
            group['parcel_label'] = 'no_match_address_other'
    else:
        # Directly change label if 'num_edges' is not equal to 4
        group['parcel_label'] = 'no_match_address_other'
    return group

# Apply the function to each group and update the main DataFrame
updated_parcel_nomatch = combined_parcel[combined_parcel['parcel_label'] == 'no_match_address'].groupby('parcel_id', group_keys=False).apply(update_label_nomatch)
combined_parcel.update(updated_parcel_nomatch)

# for no address parcel
def update_label_noaddress(group):
    if group['num_edges'].iloc[0] == 4:
        if 'front' in group['side'].values and 'rear' in group['side'].values:
            group['parcel_label'] = 'no_address_parcel_standard'
        else:
            group['parcel_label'] = 'no_address_parcel_other'
    else:
        # Directly change label if 'num_edges' is not equal to 4
        group['parcel_label'] = 'no_address_parcel_other'
    return group

# Apply the function to each group and update the main DataFrame
updated_parcel_noaddress = combined_parcel[combined_parcel['parcel_label'] == 'parcel without address'].groupby('parcel_id', group_keys=False).apply(update_label_noaddress)
combined_parcel.update(updated_parcel_noaddress)

# for special parcel
def update_label_special(group):
    if group['num_edges'].iloc[0] in [4, 5]:
        if 'front' in group['side'].values and 'rear' in group['side'].values:
            group['parcel_label'] = 'special parcel_standard'
        else:
            group['parcel_label'] = 'special parcel_other'
    else:
        # Directly change label if 'num_edges' is neither 4 nor 5
        group['parcel_label'] = 'special parcel_other'
    return group

# Apply the function to each group and update the main DataFrame
updated_parcel_special = combined_parcel[combined_parcel['parcel_label'] == 'special parcel'].groupby('parcel_id', group_keys=False).apply(update_label_special)
combined_parcel.update(updated_parcel_special)

parcel_label
regular inside parcel         14576
regular corner parcel          2768
jagged parcel                   913
special parcel_standard         693
special parcel_other            627
duplicated address              270
cul_de_sac parcel_other         173
curve parcel_other              157
curve parcel_standard            96
no_address_parcel_standard       60
no_address_parcel_other          18
cul_de_sac parcel_standard       12
no_match_address_standard         4
Name: count, dtype: int64
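The five functions in Step 1 differ only in the label names and the allowed edge counts. A minimal sketch of the same rule driven by one lookup table, assuming an edge table with `parcel_id`, `parcel_label`, and `side` columns as in `combined_parcel`. `SUBTYPE_RULES` and `assign_subtypes` are hypothetical names, and this is a refactoring suggestion, not the code that produced the counts above:

```python
import pandas as pd

# Base label -> (standard label, other label, allowed edge counts), copied from the Step 1 functions
SUBTYPE_RULES = {
    "cul_de_sac parcel": ("cul_de_sac parcel_standard", "cul_de_sac parcel_other", {4}),
    "curve parcel": ("curve parcel_standard", "curve parcel_other", {4}),
    "no_match_address": ("no_match_address_standard", "no_match_address_other", {4}),
    "parcel without address": ("no_address_parcel_standard", "no_address_parcel_other", {4}),
    "special parcel": ("special parcel_standard", "special parcel_other", {4, 5}),
}


def assign_subtypes(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of an edge table with selected labels split into standard/other."""
    df = df.copy()
    by_parcel = df["parcel_id"]
    # Per-row parcel attributes: edge count, and whether the parcel has a front / rear edge
    n_edges = df.groupby("parcel_id")["side"].transform("size")
    has_front = df["side"].eq("front").groupby(by_parcel).transform("any")
    has_rear = df["side"].eq("rear").groupby(by_parcel).transform("any")
    has_front_and_rear = has_front & has_rear
    for base, (standard, other, allowed_edges) in SUBTYPE_RULES.items():
        is_base = df["parcel_label"] == base
        is_standard = n_edges.isin(allowed_edges) & has_front_and_rear
        # Only the parcel_label column is written; all other columns are left untouched
        df.loc[is_base & is_standard, "parcel_label"] = standard
        df.loc[is_base & ~is_standard, "parcel_label"] = other
    return df


# Hypothetical toy data: a 4-edge curve parcel, a 3-edge curve parcel, a 5-edge special parcel
toy = pd.DataFrame({
    "parcel_id": ["1"] * 4 + ["2"] * 3 + ["3"] * 5,
    "parcel_label": ["curve parcel"] * 7 + ["special parcel"] * 5,
    "side": ["front", "rear", "Interior side", "Interior side",
             "front", "rear", "Interior side",
             "front", "rear", "Interior side", "Interior side", "Exterior side"],
})
result = assign_subtypes(toy)
print(result.groupby("parcel_id")["parcel_label"].first().to_dict())
# {'1': 'curve parcel_standard', '2': 'curve parcel_other', '3': 'special parcel_standard'}
```

Keeping the rules as data means a new label needs only one more dictionary entry, and assigning through `.loc` on a single column avoids calling `DataFrame.update` on the whole frame.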

Step 2: Create the Confidence Area Layer and Mark Each Parcel's Confidence


parcel_label_summary = combined_parcel.groupby('parcel_id')['parcel_label'].first().reset_index()
# Rename the columns for clarity
parcel_label_summary.columns = ['parcel_id', 'unique_parcel_labels']

confidence_area = parcel[['parcel_id','parcel_addr','landuse','parcel_label','geometry']].copy()
confidence_area['parcel_id'] = confidence_area['parcel_id'].astype(str)
confidence_area = confidence_area.merge(parcel_label_summary, on='parcel_id', how='left')
confidence_area['parcel_label'] = confidence_area['unique_parcel_labels']
confidence_area = confidence_area.drop(columns=['unique_parcel_labels'])


confidence_area['confidence_level'] = np.where(
    confidence_area['parcel_label'].isin(['regular inside parcel', 'regular corner parcel', 'special parcel_standard','curve parcel_standard','cul_de_sac parcel_standard','no_match_address_standard','no_address_parcel_standard']),
    'Yes', 'No'
)

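# Compute the area in square meters and convert it to acres (1 m² ≈ 0.000247105 acres).
# geometry.area is in the layer's CRS units, so this assumes a projected CRS in meters;
# Web Mercator (EPSG:3857) areas are inflated away from the equator.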
confidence_area['area_acre'] = confidence_area['geometry'].area * 0.000247105
confidence_area.to_crs(epsg=4326, inplace=True)

confidence_level
Yes    4528
No      348
Name: count, dtype: int64
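Two details of Step 2 can be checked in isolation: the label-to-confidence mapping and the square-meter-to-acre factor (1 international acre is exactly 4046.8564224 m², so 1 m² ≈ 0.000247105 acres). A minimal sketch on hypothetical toy values, assuming areas have already been computed in square meters in a metric projected CRS; `add_confidence_and_acreage` and the `area_sqm` column are illustrative names, not part of the original notebook:

```python
import numpy as np
import pandas as pd

# Labels that Step 2 marks as confident (copied from the np.where call above)
CONFIDENT_LABELS = {
    "regular inside parcel", "regular corner parcel", "special parcel_standard",
    "curve parcel_standard", "cul_de_sac parcel_standard",
    "no_match_address_standard", "no_address_parcel_standard",
}
# Exact definition of the international acre in square meters
SQ_METERS_PER_ACRE = 4046.8564224


def add_confidence_and_acreage(parcels: pd.DataFrame) -> pd.DataFrame:
    """Add 'confidence_level' and 'area_acre' to a table with 'parcel_label' and 'area_sqm'."""
    out = parcels.copy()
    out["confidence_level"] = np.where(out["parcel_label"].isin(CONFIDENT_LABELS), "Yes", "No")
    out["area_acre"] = out["area_sqm"] / SQ_METERS_PER_ACRE
    return out


# Hypothetical toy parcels; area_sqm stands in for geometry.area in a metric CRS
toy = pd.DataFrame({
    "parcel_label": ["regular inside parcel", "curve parcel_other", "special parcel_standard"],
    "area_sqm": [4046.8564224, 1000.0, 2023.4282112],
})
result = add_confidence_and_acreage(toy)
print(result[["parcel_label", "confidence_level", "area_acre"]])
```

Dividing by the exact acre definition gives the same result as multiplying by 0.000247105 to within about 0.0002%.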

Create a centroid point for each parcel and append it after that parcel's edges


def add_centroids_to_combined_parcel(confidence_area, combined_parcel):
    # Step 1: Compute each parcel polygon's centroid in a new 'centroid_geometry' column.
    # The centroid is in confidence_area's current CRS, which should match combined_parcel's CRS.
    confidence_area['centroid_geometry'] = confidence_area['geometry'].centroid
    # Step 2: Take one centroid per parcel_id (the last one if a parcel_id repeats)
    centroids_by_parcel = confidence_area.groupby('parcel_id')['centroid_geometry'].apply(lambda x: x.iloc[-1]).reset_index()
    # Step 3: Build one centroid row for each parcel that has edges in combined_parcel
    rows_to_add = []  # New centroid rows to append
    for _, row in centroids_by_parcel.iterrows():
        parcel_id = row['parcel_id']
        centroid_geometry = row['centroid_geometry']
        # Get the edge rows in combined_parcel for this parcel_id
        parcel_group = combined_parcel[combined_parcel['parcel_id'] == parcel_id]
        # Parcels without edge rows are skipped
        if not parcel_group.empty:
            # Copy the parcel's last edge row (all attributes, including its bearings and distance),
            # then replace the geometry with the centroid and mark the side as 'centroid'
            new_row = parcel_group.iloc[-1].copy()
            new_row['geometry'] = centroid_geometry
            new_row['side'] = 'centroid'
            rows_to_add.append(new_row)
    # Step 4: Append all centroid rows at once with pd.concat
    combined_parcel = pd.concat([combined_parcel, pd.DataFrame(rows_to_add)], ignore_index=True)
    # Step 5: Sort by 'parcel_id' and a helper flag so each centroid row comes last in its group
    combined_parcel['is_centroid'] = combined_parcel['side'] == 'centroid'
    combined_parcel = combined_parcel.sort_values(by=['parcel_id', 'is_centroid'], ascending=[True, True]).reset_index(drop=True)
    # Step 6: Drop the helper column
    combined_parcel = combined_parcel.drop(columns=['is_centroid'])
    return combined_parcel

# Add the centroid rows to combined_parcel
combined_parcel = add_centroids_to_combined_parcel(confidence_area, combined_parcel)
# set_crs only assigns a CRS label without transforming coordinates; with allow_override=True
# it relabels the data as EPSG:3857, which is correct only if the coordinates are already in EPSG:3857
combined_parcel.set_crs(epsg=3857, inplace=True, allow_override=True)
# Reproject to WGS 84 longitude/latitude (EPSG:4326)
combined_parcel = combined_parcel.to_crs(epsg=4326)

	Prop_ID	GEO_ID	parcel_id	parcel_addr	landuse	landuse_spec	parcel_label	Found_Match	match_road_address	shape_index	50_threshold	num_edges	angle_difference	shared_side	parcel_bearing	road_bearing	angle	distance_to_road	side	geometry
0	03027805	NaN	1000	3924 EARL ST	R	A	regular inside parcel	1	Earl St	1.192563	0	4	0.790292	1	1.560608	0.011927	88.732853	28.968721	Interior side	LINESTRING (-97.28932 32.78599, -97.28931 32.7...
1	03027805	NaN	1000	3924 EARL ST	R	A	regular inside parcel	1	Earl St	1.192563	0	4	0.790292	1	-3.138026	0.011927	0.479009	50.845007	rear	LINESTRING (-97.28912 32.78599, -97.28930 32.7...
2	03027805	NaN	1000	3924 EARL ST	R	A	regular inside parcel	1	Earl St	1.192563	0	4	0.790292	0	-0.001866	0.011927	0.790292	7.329586	front	LINESTRING (-97.28931 32.78632, -97.28911 32.7...
3	03027805	NaN	1000	3924 EARL ST	R	A	regular inside parcel	1	Earl St	1.192563	0	4	0.790292	1	-1.589407	0.011927	88.250309	29.215929	Interior side	LINESTRING (-97.28911 32.78632, -97.28912 32.7...
4	03027805	NaN	1000	3924 EARL ST	R	A	regular inside parcel	1	Earl St	1.192563	0	4	0.790292	1	-1.589407	0.011927	88.250309	29.215929	centroid	POINT (-97.28921 32.78615)
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
25235	00381004	NaN	999	1712 N BEACH ST	NaN	F1	regular inside parcel	1	N Beach St	1.361853	1	4	0.044117	0	1.572441	1.573211	0.044117	12.197149	front	LINESTRING (-97.29050 32.78570, -97.29050 32.7...
25236	00381004	NaN	999	1712 N BEACH ST	NaN	F1	regular inside parcel	1	N Beach St	1.361853	1	4	0.044117	1	-0.000172	1.573211	89.851809	78.928026	Interior side	LINESTRING (-97.29050 32.78599, -97.29029 32.7...
25237	00381004	NaN	999	1712 N BEACH ST	NaN	F1	regular inside parcel	1	N Beach St	1.361853	1	4	0.044117	1	3.141498	1.573211	89.856205	79.090904	Interior side	LINESTRING (-97.28930 32.78570, -97.28951 32.7...
25238	00381004	NaN	999	1712 N BEACH ST	NaN	F1	regular inside parcel	1	N Beach St	1.361853	1	4	0.044117	1	-1.558978	1.573211	0.538781	145.821717	rear	LINESTRING (-97.28930 32.78599, -97.28930 32.7...
25239	00381004	NaN	999	1712 N BEACH ST	NaN	F1	regular inside parcel	1	N Beach St	1.361853	1	4	0.044117	1	-1.558978	1.573211	0.538781	145.821717	centroid	POINT (-97.28990 32.78585)

25240 rows × 20 columns
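The loop in `add_centroids_to_combined_parcel` filters the whole edge table once per parcel. A minimal vectorized sketch of the same idea, under stated assumptions: the centroids are already computed (for example with `GeoSeries.centroid`) and expressed in the same CRS as the edge rows, and `append_centroid_rows` is a hypothetical helper. The toy data uses strings as stand-ins for LineString and Point geometries so the logic can be run with pandas alone:

```python
import pandas as pd


def append_centroid_rows(edges: pd.DataFrame, centroids: pd.DataFrame) -> pd.DataFrame:
    """Append one 'centroid' row after each parcel's edge rows.

    edges: one row per edge, with at least 'parcel_id', 'side', and 'geometry'.
    centroids: one row per parcel, with 'parcel_id' and 'centroid_geometry'.
    """
    # As in the loop version, each new row copies the attributes of the parcel's last edge
    last_edges = edges.drop_duplicates(subset="parcel_id", keep="last")
    # An inner merge skips parcels that have no edges (the loop's empty-group check)
    new_rows = last_edges.merge(
        centroids[["parcel_id", "centroid_geometry"]], on="parcel_id", how="inner"
    )
    new_rows["geometry"] = new_rows.pop("centroid_geometry")
    new_rows["side"] = "centroid"
    combined = pd.concat([edges, new_rows[edges.columns]], ignore_index=True)
    # Centroid rows were appended after all edges, so a stable sort on parcel_id
    # keeps each parcel's edge order and places its centroid row last
    return combined.sort_values("parcel_id", kind="stable").reset_index(drop=True)


# Hypothetical toy data; parcel '3' has a centroid but no edges, so it is skipped
edges = pd.DataFrame({
    "parcel_id": ["1", "1", "2", "2"],
    "side": ["front", "rear", "front", "rear"],
    "geometry": ["edge-a", "edge-b", "edge-c", "edge-d"],
})
centroids = pd.DataFrame({
    "parcel_id": ["1", "2", "3"],
    "centroid_geometry": ["point-1", "point-2", "point-3"],
})
result = append_centroid_rows(edges, centroids)
print(result)
```

With real GeoDataFrames, both inputs should share a CRS before the merge, because merging and concatenating never reproject geometries.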

Key Column Descriptions

Prop_ID

A unique identifier for each property, sourced directly from the data portal.

parcel_id

A sequential identifier, starting from 1, assigned to each parcel. It is unique within this dataset and stored as a string.

parcel_addr

The physical address associated with each parcel.

landuse

The current land use type for each parcel. Example codes include:
- R: Residential
- P: Public
- C: Commercial

landuse_spec

The detailed current land use code for each parcel (e.g., A, F1). More details can be found here.

parcel_label

The classification label assigned to each parcel by this workflow, including the standard and other subtypes from Step 1.

side

The classification label for each parcel edge, which can be one of the following:
- Front
- Rear
- Exterior Side
- Interior Side
- Centroid (a point row marking the parcel's location rather than an edge)

geometry

The geometry of each row: a LineString for parcel edges and a Point for centroid rows, in EPSG:4326 (WGS 84).


Data Statistics

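Confidence = Yes is assigned to the “regular inside parcel” and “regular corner parcel” labels and to all “standard” subtypes (Special, Curve, Cul-de-sac, No Match Address, and No Address); every other label is Confidence = No. The confidence percentage is calculated using the formula below: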

\(\text{Confidence Percentage} = \left( \frac{\text{Number of 'Yes'}}{\text{Total Number of Parcels}} \right) \times 100\)

Where:

Number of ‘Yes’: The count of parcels classified with ‘Yes’ in the confidence_level column.
Total Number of Parcels: The total count of parcels in the dataset (one row per parcel in confidence_area, not one per edge).

Confidence Percentage: 92.86%
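As an arithmetic check, the formula applied to the Step 2 counts gives 4528 / (4528 + 348) × 100 = 4528 / 4876 × 100 ≈ 92.86%. A minimal sketch of the same calculation; it assumes a parcel-level `confidence_level` column (one row per parcel, as in `confidence_area`) rather than the edge-level table, where parcels with more edges would be over-weighted. `confidence_percentage` is a hypothetical helper:

```python
import pandas as pd


def confidence_percentage(confidence_level: pd.Series) -> float:
    """Percentage of parcels whose confidence_level is 'Yes'."""
    return float((confidence_level == "Yes").mean() * 100)


# One value per parcel, rebuilt from the counts reported in Step 2 (4528 'Yes', 348 'No')
levels = pd.Series(["Yes"] * 4528 + ["No"] * 348)
print(f"Confidence Percentage: {confidence_percentage(levels):.2f}%")
# Confidence Percentage: 92.86%
```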