Data Preprocessing

Data Standardlizing and Preprocessing

The Data Standardlizing and Preprocessing process focuses on unifying the columns’ name, labelling those parcels without address, removing duplicate geometries and ensuring the integrity of the Tarrant Parcels and Road datasets. Due to data quality issues, some geometries in both datasets have identical attributes but are repeated. This process involves identifying and eliminating these redundant geometries while ensuring that key information, such as address names, is retained. After this process, the datasets are transformed into a consistent format and prepared for further spatial analysis.

In the selected parcels of Tarrant County, there are some duplicated geometries(741/5311), which usually share the same address attributes.
Figure 1: An example of duplicated parcels, created by Houpu Li
In the road centerline dataset of Tarrant County, there are some duplicated geometries(1812/41202), which usually share the different address attributes.
Figure 2: An example of duplicated roads, created by Houpu Li

01.Preprocessing for Parcels

This section details the data cleaning process for the tarrant_parcel dataset, including CRS transformation, removal of duplicate geometries, standardization of column names (handling cases where OBJECTID and OBJECTID_1 refer to the same attribute), labeling parcels with missing address information, and resetting the index.

Note:
In the original dataset, both Prop_ID and OBJECTID columns are present and serve as parcel identifiers. However, after inspection, Prop_ID is found to be truly unique, while OBJECTID contains about 10% duplicate values. And the official documentation from TxGIO, there is no explanation for OBJECTID. For more details, see here. From this section through step 6: Label Parcel Edges, we rename OBJECTID (or OBJECTID_1) to parcel_id for data analysis and processing. In step 7: Generate Results, we reconstruct the parcel_id by assigning new sequential values starting from 1, since OBJECTID is inaccurate and redundant compared to the unique Prop_ID.

Code

# Tarrant Parcel:

'''read the parcel and standardlizing the column name'''
# define file path
parcel_path = r'Tarrant_County/ParcelView_Tarrant.zip'
# standlize the column names
parcel_cols = {'OBJECTID': 'parcel_id', 'OBJECTID_1': 'parcel_id', 'SITUS_ADDR': 'parcel_addr', 'STAT_LAND_': 'landuse_spec'}
# read the data
parcel = gpd.read_file(parcel_path)
# rename the columns
parcel.rename(columns=lambda x: parcel_cols.get(x, x), inplace=True)

'''parcel preprocessing'''
# Define a function to extract only the road name part (before the first comma)
def optimize_road_name(situs_addr):
    if pd.isna(situs_addr) or situs_addr.strip() == ', ,':
        return None
    else:
        return situs_addr.split(',')[0].strip()
# Apply the function to the 'SITUS_ADDR' column
parcel['parcel_addr'] = parcel['parcel_addr'].apply(optimize_road_name)
parcel['parcel_addr'] = parcel['parcel_addr'].replace(r'^\s*$', None, regex=True)

'''extract residential area based on specifical landuse'''
parcel['landuse'] = parcel['landuse_spec'].apply(lambda x: 'R' if isinstance(x, str) and x[0] in ['A', 'B'] else None)

'''read the parcel data and data cleanning, tips: steps 3 and 4 are necessary to remove duplicate geometries and ensure that the remaining rows contain the required address information.'''
# Step 1: Transfer the CRS to 4326
parcel = parcel.to_crs(4326)
# Step 2: Create a column to indicate whether 'parcel_addr' or 'landuse' has a value (True/False)
parcel['has_info'] = (~parcel['parcel_addr'].isna()) | (~parcel['landuse'].isna())
# Step 3: Sort the rows by 'has_info' in descending order to prioritize rows with parcel_addr or landuse values
parcel = parcel.sort_values(by='has_info', ascending=False)
# Step 4: Drop duplicates based on geometry, keeping the first occurrence (which now has priority rows at the top)
parcel = parcel.drop_duplicates(subset='geometry')
# Step 5: Drop the 'has_info' column as it's no longer needed
parcel = parcel.drop(columns=['has_info'])
# Step 6: Initialize 'parcel_labeled' column with None values
parcel['parcel_label'] = None
parcel.loc[parcel['parcel_addr'].isna(), 'parcel_label'] = 'parcel without address'
parcel = parcel.reset_index(drop=True)
# Step 7: Extracted the useful columns
parcel = parcel[['Prop_ID','GEO_ID','parcel_id','parcel_addr','landuse','landuse_spec','parcel_label','geometry']]
# Step 8: Group the duplicate parcel_id values and add a suffix.
parcel.loc[parcel['parcel_id'].duplicated(keep=False), 'parcel_id'] = (
    parcel.loc[parcel['parcel_id'].duplicated(keep=False), 'parcel_id'].astype(str) + '_' +
    parcel.loc[parcel['parcel_id'].duplicated(keep=False)].groupby('parcel_id').cumcount().add(1).astype(str)
)

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 682424 entries, 0 to 682423
Data columns (total 8 columns):
 #   Column        Non-Null Count   Dtype  
---  ------        --------------   -----  
 0   Prop_ID       678845 non-null  float64
 1   GEO_ID        0 non-null       float64
 2   parcel_id     682424 non-null  int64  
 3   parcel_addr   678836 non-null  object 
 4   landuse       581550 non-null  object 
 5   landuse_spec  678845 non-null  object 
 6   parcel_label  3588 non-null    object 
 7   geometry      682423 non-null  object 
dtypes: float64(2), int64(1), object(5)
memory usage: 41.7+ MB
None

	Prop_ID	GEO_ID	parcel_id	parcel_addr	landuse	landuse_spec	parcel_label	geometry
0	51.0	NaN	326302	117 E BELKNAP ST	NaN	C1	NaN	POLYGON ((-97.33292518846756 32.75814832311077...
1	7123337.0	NaN	513461	1416 MICAH WAY	R	A	NaN	POLYGON ((-97.22168801237416 32.92320942667897...
2	7123353.0	NaN	513472	1412 MICAH WAY	R	A	NaN	POLYGON ((-97.2218496040224 32.92250941167325,...
3	7123361.0	NaN	513474	1410 MICAH WAY	R	A	NaN	POLYGON ((-97.22206715622127 32.92215834717135...
4	7123388.0	NaN	513475	1408 MICAH WAY	R	A	NaN	POLYGON ((-97.22262514925158 32.92230780820663...

02.Preprocessing for Roads Centerline

The code performs data cleaning on the tarrant_road dataset by transforming the CRS, removing duplicate geometries, and resetting the index.

Code

# Tarrant Roads:

# define file path
road_path = r'Tarrant_County/tl_2023_48439_roads_Tarrant.zip'
# standlize the column names
road_cols = {'LINEARID': 'road_id', 'FULLNAME': 'road_addr'}
# read the data
road = gpd.read_file(road_path)
# rename the columns
road.rename(columns=lambda x: road_cols.get(x, x), inplace=True)

'''read the road data and data cleanning'''
# Step 1: Transfer the CRS to 4326
road = road.to_crs(4326)
# Step 2: Create a column to indicate whether 'road_addr' has a value (True/False)
road['has_info'] = ~road['road_addr'].isna()
# Step 3: Sort the rows by 'has_info' in descending order to prioritize rows with Situs_Addr or RP values
road = road.sort_values(by='has_info', ascending=False)
# Step 4: Drop duplicates based on geometry, keeping the first occurrence (which now has priority rows at the top)
road = road.drop_duplicates(subset='geometry')
# Step 5: Drop the 'has_info' column as it's no longer needed
road = road.drop(columns=['has_info'])
road = road.reset_index(drop=True)
road = road[['road_id','road_addr','geometry']]

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 40289 entries, 0 to 40288
Data columns (total 3 columns):
 #   Column     Non-Null Count  Dtype 
---  ------     --------------  ----- 
 0   road_id    40289 non-null  int64 
 1   road_addr  35145 non-null  object
 2   geometry   40289 non-null  object
dtypes: int64(1), object(2)
memory usage: 944.4+ KB
None

	road_id	road_addr	geometry
0	1102638482847	4e 4w Rmp	LINESTRING (-97.0400735653666 32.8903365536895...
1	1101928845786	Shady Oak Dr	LINESTRING (-97.08646849800373 32.949245671442...
2	1105598242853	Dove Creek Cir	LINESTRING (-97.09455850067135 32.946642680548...
3	1101928841512	Brookwood Dr	LINESTRING (-97.09230449331861 32.953235682141...
4	1101928820527	Dublin Cir	LINESTRING (-97.09074749003082 32.956201682184...

Attribute Management & Geometry Simplification

The code performs a geometry explosion to split multi-part geometries (e.g., MultiPolygon or MultiLineString) into individual geometries (e.g., Polygon or LineString). After exploding, it resets the index to maintain a clean, sequential index for further data processing.

Code

# Reset the index to maintain a clean sequential index after the explosion
parcel = parcel.explode(index_parts=False).reset_index(drop=True)
road = road.explode(index_parts=False).reset_index(drop=True)

Geometry Extraction

In this step, we decompose each road line into individual segments, which helps facilitate accurate spatial matching in later analysis. In other words, splitting the road lines into smaller segments allows us to precisely assign each parcel boundary to its nearest road segment, improving spatial alignment and analytical precision.

Code

# Initialize lists to store line segments and corresponding addresses
line_strings = []
addrs = []
linear_ids = []

# Iterate over rows in road
for idx, row in road.iterrows():
    line = row['geometry']  # Assume this is a LineString geometry
    addr = row['road_addr']
    linear_id = row['road_id']
    
    if line.is_valid and isinstance(line, LineString):
        for i in range(len(line.coords) - 1):
            current_line = LineString([line.coords[i], line.coords[i + 1]])
            line_strings.append(current_line)
            addrs.append(addr)
            linear_ids.append(linear_id)
    else:
        print(f"Invalid or non-LineString geometry detected: {line}")
        
# Create GeoDataFrame for the split road segments
road_seg = gpd.GeoDataFrame({'geometry': line_strings, 'road_addr': addrs, 'road_id': linear_ids}, crs=road.crs)
road_seg = road_seg.to_crs(3857)

Address Matching

This step is crucial because the two datasets are collected from different sources, which may lead to slight variations in the recorded addresses. Although both the road and parcel datasets contain address information, discrepancies can arise. For example:
- Road address: Kamyrn Rd / Houpu Road
- Parcel address: 1230 Kamyrn Road / 1230 Hpu Rood
As shown above, minor differences such as abbreviations or misspellings can make direct matching challenging. To address this issue, we use the fuzzywuzzy and rtree libraries to implement a robust matching method that links nearst road addresses to their corresponding parcel segments. - fuzzywuzzy: A library used for fuzzy string matching, which helps identify similar but non-identical addresses by calculating a similarity score. - rtree: A spatial indexing library used to efficiently find and match spatial features based on their proximity, improving the accuracy of address-to-parcel assignments.

01.Coordinate Transformation

This step converts the coordinate reference systems (CRS) of both extracted_parcel and extracted_road to a projected CRS (EPSG: 3857) to ensure spatial consistency for subsequent spatial operations.

Code

# transfer the crs to projected crs
parcel = parcel.to_crs(3857)
road = road.to_crs(parcel.crs)

02.Spatial Filtering Using cKDtree

To match the address names between road segments and parcels, we first need to identify the nearest road. However, the nearest road may not always have the exact same name as the parcel’s address. To improve the matching accuracy, we identify the n nearest roads for each parcel (in this case, n = 50). This approach provides multiple candidate road segments for each parcel, increasing the chances of finding a correct match in subsequent processes. The code calculates the centroid coordinates (x, y) of both road segments and parcel geometries. It then uses the cKDTree from scipy to find the 50 nearest roads for each parcel based on their centroid locations. The results will update in the extracted_parcel GeoDataFrame, adding Nearest_Road columns to indicate the closest road names for each parcel. Finally, the temporary x and y columns are dropped to maintain a clean dataset. This process enables efficient spatial address matching between parcels and road segments for further analysis.

Code

# Find the centroid coordinate for each road and parcel
road_seg['x'] = road_seg.geometry.apply(lambda geom: geom.centroid.x)
road_seg['y'] = road_seg.geometry.apply(lambda geom: geom.centroid.y)

parcel['x'] = parcel.geometry.apply(lambda geom: geom.centroid.x)
parcel['y'] = parcel.geometry.apply(lambda geom: geom.centroid.y)

# find the nearest road by using cKDTree
n = 50
tree = cKDTree(road_seg[['x', 'y']])
distances, indices = tree.query(parcel[['x', 'y']], k=n)  # find the nearest n roads

# Create a temporary DataFrame to store the nearest road names
nearest_road_names = pd.DataFrame({
    f'Nearest_Road_{i+1}_Address': road_seg.iloc[indices[:, i]].road_addr.values
    for i in range(n)
})

# Concatenate the new columns with the original DataFrame
parcel = pd.concat([parcel, nearest_road_names], axis=1)
# Drop the x, y column
parcel = parcel.drop(columns=['x', 'y'])

	parcel_id	parcel_addr	landuse_spec	parcel_label	geometry	Nearest_Road_1_FULLNAME	Nearest_Road_2_FULLNAME	Nearest_Road_3_FULLNAME	Nearest_Road_4_FULLNAME	Nearest_Road_5_FULLNAME	Nearest_Road_6_FULLNAME	Nearest_Road_7_FULLNAME	Nearest_Road_8_FULLNAME	Nearest_Road_9_FULLNAME	Nearest_Road_10_FULLNAME	Nearest_Road_11_FULLNAME	Nearest_Road_12_FULLNAME	Nearest_Road_13_FULLNAME	Nearest_Road_14_FULLNAME	Nearest_Road_15_FULLNAME	Nearest_Road_16_FULLNAME	Nearest_Road_17_FULLNAME	Nearest_Road_18_FULLNAME	Nearest_Road_19_FULLNAME	Nearest_Road_20_FULLNAME	Nearest_Road_21_FULLNAME	Nearest_Road_22_FULLNAME	Nearest_Road_23_FULLNAME	Nearest_Road_24_FULLNAME	Nearest_Road_25_FULLNAME	Nearest_Road_26_FULLNAME	Nearest_Road_27_FULLNAME	Nearest_Road_28_FULLNAME	Nearest_Road_29_FULLNAME	Nearest_Road_30_FULLNAME	Nearest_Road_31_FULLNAME	Nearest_Road_32_FULLNAME	Nearest_Road_33_FULLNAME	Nearest_Road_34_FULLNAME	Nearest_Road_35_FULLNAME	Nearest_Road_36_FULLNAME	Nearest_Road_37_FULLNAME	Nearest_Road_38_FULLNAME	Nearest_Road_39_FULLNAME	Nearest_Road_40_FULLNAME	Nearest_Road_41_FULLNAME	Nearest_Road_42_FULLNAME	Nearest_Road_43_FULLNAME	Nearest_Road_44_FULLNAME	Nearest_Road_45_FULLNAME	Nearest_Road_46_FULLNAME	Nearest_Road_47_FULLNAME	Nearest_Road_48_FULLNAME	Nearest_Road_49_FULLNAME	Nearest_Road_50_FULLNAME
0	400151	2717 E BELKNAP ST	C	NaN	POLYGON ((-10832192.750619514 3864892.01523430...	Blandin St	US Hwy 377	E Belknap St	Marshall St	Blandin St	Grace Ave	Noble Ave	Noble Ave	Grace Ave	Noble Ave	E Belknap St	US Hwy 377	Noble Ave	Blandin St	Blandin St	Grace Ave	Marshall St	Noble Ave	US Hwy 377	E Belknap St	Grace Ave	N Sylvania Ave	E Belknap St	US Hwy 377	N Sylvania Ave	Noble Ave	N Sylvania Ave	N Sylvania Ave	Grace Ave	Emma St	E Belknap St	US Hwy 377	N Sylvania Ave	Blandin St	Blandin St	Plumwood St	Marshall St	E Belknap St	US Hwy 377	N Sylvania Ave	E Belknap St	US Hwy 377	Noble Ave	N Sylvania Ave	Pittsburg Pl	Race St	Pittsburg Pl	N Judkins St	Race St	Juanita St
1	403412	3000 MARIGOLD AVE	R	NaN	POLYGON ((-10831742.722703427 3866447.98916702...	N Chandler Dr	Marigold Ave	N Chandler Dr	Primrose Ave	Primrose Ave	Marigold Ave	N Chandler Dr	N Chandler Dr	Primrose Ave	Marigold Ave	Honeysuckle Ave	N Chandler Dr	Primrose Ave	Primrose Ave	N Chandler Dr	Marigold Ave	N Riverside Dr	Honeysuckle Ave	Honeysuckle Ave	Primrose Ave	N Chandler Dr	N Riverside Dr	Blandin St	Yucca Ave	Primrose Ave	Yucca Ave	N Riverside Dr	Blandin St	Primrose Ave	Primrose Ave	N Chandler Dr	Blandin St	Honeysuckle Ave	Marigold Ave	Marigold Ave	N Riverside Dr	Primrose Ave	Carnation Ave	Yucca Ave	N Riverside Dr	N Riverside Dr	Blandin St	Blandin St	Carnation Ave	N Chandler Dr	Yucca Ave	N Chandler Dr	Honeysuckle Ave	Clary Ave	Honeysuckle Ave
2	403410	3012 MARIGOLD AVE	R	NaN	POLYGON ((-10831697.814370392 3866447.98820623...	Marigold Ave	N Chandler Dr	Primrose Ave	N Chandler Dr	Marigold Ave	Primrose Ave	N Chandler Dr	Primrose Ave	N Chandler Dr	Marigold Ave	N Riverside Dr	N Chandler Dr	Honeysuckle Ave	Primrose Ave	Honeysuckle Ave	N Riverside Dr	N Chandler Dr	N Riverside Dr	Yucca Ave	Marigold Ave	N Chandler Dr	Primrose Ave	Primrose Ave	Honeysuckle Ave	Marigold Ave	N Riverside Dr	Honeysuckle Ave	Yucca Ave	Primrose Ave	N Riverside Dr	N Riverside Dr	N Chandler Dr	Primrose Ave	Blandin St	Carnation Ave	Blandin St	Honeysuckle Ave	Primrose Ave	Bolton St	Primrose Ave	Primrose Ave	Bolton St	Yucca Ave	Primrose Ave	Bolton St	Blandin St	N Chandler Dr	N Riverside Dr	Marigold Ave	Yucca Ave
3	403409	3016 MARIGOLD AVE	R	NaN	POLYGON ((-10831675.720805798 3866447.98629809...	Marigold Ave	Marigold Ave	Primrose Ave	N Chandler Dr	N Chandler Dr	Primrose Ave	N Riverside Dr	N Chandler Dr	Primrose Ave	N Chandler Dr	N Riverside Dr	Honeysuckle Ave	Marigold Ave	N Chandler Dr	Honeysuckle Ave	N Riverside Dr	Primrose Ave	N Chandler Dr	Marigold Ave	Yucca Ave	Primrose Ave	Honeysuckle Ave	N Riverside Dr	N Chandler Dr	Primrose Ave	N Riverside Dr	N Riverside Dr	Honeysuckle Ave	Marigold Ave	Yucca Ave	Bolton St	Honeysuckle Ave	Primrose Ave	N Chandler Dr	Primrose Ave	Primrose Ave	Bolton St	Carnation Ave	Bolton St	Yucca Ave	Primrose Ave	Blandin St	Blandin St	N Riverside Dr	Primrose Ave	N Chandler Dr	Primrose Ave	Blandin St	Clary Ave	N Chandler Dr
4	403408	3020 MARIGOLD AVE	R	NaN	POLYGON ((-10831652.897093678 3866449.42460290...	Marigold Ave	Marigold Ave	Primrose Ave	Primrose Ave	N Riverside Dr	N Chandler Dr	N Chandler Dr	N Riverside Dr	N Chandler Dr	Primrose Ave	Honeysuckle Ave	N Chandler Dr	N Riverside Dr	Honeysuckle Ave	N Chandler Dr	Marigold Ave	Marigold Ave	Primrose Ave	Honeysuckle Ave	Yucca Ave	N Chandler Dr	Primrose Ave	N Riverside Dr	N Chandler Dr	N Riverside Dr	N Riverside Dr	Bolton St	Honeysuckle Ave	Primrose Ave	Primrose Ave	Bolton St	Bolton St	Primrose Ave	Yucca Ave	Yucca Ave	Honeysuckle Ave	Marigold Ave	N Chandler Dr	Carnation Ave	Primrose Ave	N Riverside Dr	Primrose Ave	Blandin St	N Chandler Dr	Carnation Ave	Bolton St	Blandin St	Clary Ave	Primrose Ave	Clary Ave

03.Field Similarity Matching Using Fuzzywuzzy

This code performs address matching between parcel addresses (Situs_Addr) and the nearest road segments name by using fuzzy string matching. It first removes spaces and converts both addresses and road names to lowercase for consistency. Then, it identifies the top n nearest roads for each parcel and calculates a similarity score using the fuzz.partial_ratio function from the fuzzywuzzy library. The code keeps track of the best match if the similarity score exceeds a predefined threshold (50%). If a match is found, new columns are created to store the matched road segment and its original format. The results are then merged back into the main extracted_parcel GeoDataFrame, and any parcels without a match are labeled as no_match_address and print the count of parcels that could not be matched. This step helps refine address matching by allowing for minor variations and typos in the address data, thereby improving overall matching accuracy between parcels and roads segments.

Code

# The function to find the match address between the n nearst roads segments and parcels
def check_and_extract_match_info(row):
    # Remove spaces from parcel_addr
    parcel_addr = row['parcel_addr'].replace(' ', '').lower()
    
    # Dynamically generate a list of the nearest n road names, check if they are not NaN
    road_names = [row[f'Nearest_Road_{i+1}_Address'].replace(' ', '').lower() 
                  if pd.notna(row[f'Nearest_Road_{i+1}_Address']) else '' 
                  for i in range(n)]
    
    # Define a similarity threshold (e.g., 50%)
    threshold = 50
    best_match = None
    best_similarity = 0
    
    # Check each road name and record match information
    for road in road_names:
        if road:  # Only proceed if the road name is not empty
            # Calculate the similarity score using fuzz.partial_ratio
            similarity = fuzz.partial_ratio(parcel_addr, road)
        
            # Keep track of the best match
            if similarity > best_similarity and similarity >= threshold:
                best_similarity = similarity
                best_match = road
    
    if best_match:
        match_segment = best_match  # Matched road segment
        original_road = row[f'Nearest_Road_{road_names.index(best_match) + 1}_Address']  # Original road name with spaces
        return pd.Series([True, match_segment, original_road])
    
    return pd.Series([False, None, None])  # Return False and None if no match found

# Step 1: Ensure 'parcel_addr' has no NaN values before applying the function
parcel_clean = parcel.loc[parcel['parcel_addr'].notna()].copy()
# Step 2: Apply the check_and_extract_match_info function to add new columns
parcel_clean[['Found_Match', 'match_segment', 'match_road_address']] = parcel_clean.apply(check_and_extract_match_info, axis=1)
# Step 3: Merge the newly created columns back into the original 'parcel' DataFrame
parcel = parcel.merge(parcel_clean[['Found_Match', 'match_segment', 'match_road_address']], 
                                          left_index=True, right_index=True, 
                                          how='left')
parcel.loc[parcel['Found_Match'] == False, 'parcel_label'] = 'no_match_address'
# Step 4: Count how many rows have 'Found_Match' == False
len(parcel[parcel['Found_Match'] == False])

	parcel_id	parcel_addr	landuse_spec	parcel_label	geometry	Nearest_Road_1_FULLNAME	Nearest_Road_2_FULLNAME	Nearest_Road_3_FULLNAME	Nearest_Road_4_FULLNAME	Nearest_Road_5_FULLNAME	Nearest_Road_6_FULLNAME	Nearest_Road_7_FULLNAME	Nearest_Road_8_FULLNAME	Nearest_Road_9_FULLNAME	Nearest_Road_10_FULLNAME	Nearest_Road_11_FULLNAME	Nearest_Road_12_FULLNAME	Nearest_Road_13_FULLNAME	Nearest_Road_14_FULLNAME	Nearest_Road_15_FULLNAME	Nearest_Road_16_FULLNAME	Nearest_Road_17_FULLNAME	Nearest_Road_18_FULLNAME	Nearest_Road_19_FULLNAME	Nearest_Road_20_FULLNAME	Nearest_Road_21_FULLNAME	Nearest_Road_22_FULLNAME	Nearest_Road_23_FULLNAME	Nearest_Road_24_FULLNAME	Nearest_Road_25_FULLNAME	Nearest_Road_26_FULLNAME	Nearest_Road_27_FULLNAME	Nearest_Road_28_FULLNAME	Nearest_Road_29_FULLNAME	Nearest_Road_30_FULLNAME	Nearest_Road_31_FULLNAME	Nearest_Road_32_FULLNAME	Nearest_Road_33_FULLNAME	Nearest_Road_34_FULLNAME	Nearest_Road_35_FULLNAME	Nearest_Road_36_FULLNAME	Nearest_Road_37_FULLNAME	Nearest_Road_38_FULLNAME	Nearest_Road_39_FULLNAME	Nearest_Road_40_FULLNAME	Nearest_Road_41_FULLNAME	Nearest_Road_42_FULLNAME	Nearest_Road_43_FULLNAME	Nearest_Road_44_FULLNAME	Nearest_Road_45_FULLNAME	Nearest_Road_46_FULLNAME	Nearest_Road_47_FULLNAME	Nearest_Road_48_FULLNAME	Nearest_Road_49_FULLNAME	Nearest_Road_50_FULLNAME	Found_Match	match_segment	match_road_address
0	400151	2717 E BELKNAP ST	C	NaN	POLYGON ((-10832192.750619514 3864892.01523430...	Blandin St	US Hwy 377	E Belknap St	Marshall St	Blandin St	Grace Ave	Noble Ave	Noble Ave	Grace Ave	Noble Ave	E Belknap St	US Hwy 377	Noble Ave	Blandin St	Blandin St	Grace Ave	Marshall St	Noble Ave	US Hwy 377	E Belknap St	Grace Ave	N Sylvania Ave	E Belknap St	US Hwy 377	N Sylvania Ave	Noble Ave	N Sylvania Ave	N Sylvania Ave	Grace Ave	Emma St	E Belknap St	US Hwy 377	N Sylvania Ave	Blandin St	Blandin St	Plumwood St	Marshall St	E Belknap St	US Hwy 377	N Sylvania Ave	E Belknap St	US Hwy 377	Noble Ave	N Sylvania Ave	Pittsburg Pl	Race St	Pittsburg Pl	N Judkins St	Race St	Juanita St	True	ebelknapst	E Belknap St
1	403412	3000 MARIGOLD AVE	R	NaN	POLYGON ((-10831742.722703427 3866447.98916702...	N Chandler Dr	Marigold Ave	N Chandler Dr	Primrose Ave	Primrose Ave	Marigold Ave	N Chandler Dr	N Chandler Dr	Primrose Ave	Marigold Ave	Honeysuckle Ave	N Chandler Dr	Primrose Ave	Primrose Ave	N Chandler Dr	Marigold Ave	N Riverside Dr	Honeysuckle Ave	Honeysuckle Ave	Primrose Ave	N Chandler Dr	N Riverside Dr	Blandin St	Yucca Ave	Primrose Ave	Yucca Ave	N Riverside Dr	Blandin St	Primrose Ave	Primrose Ave	N Chandler Dr	Blandin St	Honeysuckle Ave	Marigold Ave	Marigold Ave	N Riverside Dr	Primrose Ave	Carnation Ave	Yucca Ave	N Riverside Dr	N Riverside Dr	Blandin St	Blandin St	Carnation Ave	N Chandler Dr	Yucca Ave	N Chandler Dr	Honeysuckle Ave	Clary Ave	Honeysuckle Ave	True	marigoldave	Marigold Ave

04.Extracted Useful Columns

Ultimately, to simplify the GeoDataframe, we need to re-extract and update the data based on the relevant columns to retain only the necessary information.

Code

# Extract the useful columns
parcel = parcel[['Prop_ID','GEO_ID','parcel_id','parcel_addr','landuse','landuse_spec','parcel_label','geometry','Found_Match','match_road_address']]

	Prop_ID	GEO_ID	parcel_id	parcel_addr	landuse	landuse_spec	parcel_label	geometry	Found_Match	match_road_address
0	51	NaN	326302	117 E BELKNAP ST	NaN	C1	NaN	POLYGON ((-10835051.6694 3863246.022100002, -1...	True	E Belknap St
1	7123337	NaN	513461	1416 MICAH WAY	R	A	NaN	POLYGON ((-10822668.8036 3885115.729199999, -1...	True	Micah Way
2	7123353	NaN	513472	1412 MICAH WAY	R	A	NaN	POLYGON ((-10822686.7919 3885022.895000002, -1...	True	Micah Way
3	7123361	NaN	513474	1410 MICAH WAY	R	A	NaN	POLYGON ((-10822711.0097 3884976.338, -1082272...	True	Micah Way
4	7123388	NaN	513475	1408 MICAH WAY	R	A	NaN	POLYGON ((-10822773.1252 3884996.159000003, -1...	True	Micah Way