Pros and cons of two approaches

Which approach is better (if any), and why?

# Approach 1:
# autos = autos.rename({'dateCrawled': 'date_crawled', 
#                         'offerType': 'offer_type',
#                         'vehicleType': 'vehicle_type',
#                         'yearOfRegistration': 'registration_year',
#                         'monthOfRegistration': 'registration_month', 
#                         'powerPS': 'power_ps',
#                         'fuelType': 'fuel_type', 
#                         'notRepairedDamage': 'unrepaired_damage',
#                         'dateCreated': 'ad_created',
#                         'nrOfPictures': 'num_of_pictures', 
#                         'postalCode': 'postal_code', 
#                         'lastSeen': 'last_seen_date'}, axis=1)
# autos.columns

# Approach 2:
def clean_col(col):
    # Map each camelCase column name to its snake_case replacement
    col = col.replace("yearOfRegistration", "registration_year")
    col = col.replace("monthOfRegistration", "registration_month")
    col = col.replace("notRepairedDamage", "unrepaired_damage")
    col = col.replace("dateCreated", "ad_created")
    col = col.replace("dateCrawled", "date_Crawled")
    col = col.replace("offerType", "offer_Type")
    col = col.replace("vehicleType", "vehicle_Type")
    col = col.replace("powerPS", "power_PS")
    col = col.replace("fuelType", "fuel_Type")
    col = col.replace("nrOfPictures", "num_of_pictures")
    col = col.replace("postalCode", "postal_Code")
    col = col.replace("lastSeen", "last_Seen")
    col = col.lower()  # lowercase, e.g. "date_Crawled" -> "date_crawled"
    return col

new_columns = []

for c in autos.columns:
    clean_c = clean_col(c)
    new_columns.append(clean_c)
    
autos.columns = new_columns

print(autos.columns)
autos.head()
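If a function-based approach is preferred, one way to make it reusable is to convert camelCase to snake_case with a regular expression instead of listing every column by hand. This is a minimal sketch, assuming the column names are plain camelCase; `camel_to_snake` is a hypothetical helper name. It only changes the case style, so semantic renames such as yearOfRegistration to registration_year would still need an explicit mapping.

```python
import re

import pandas as pd


def camel_to_snake(name: str) -> str:
    """Insert an underscore between a lowercase letter or digit and a
    following uppercase letter, then lowercase the whole name."""
    return re.sub(r"([a-z0-9])([A-Z])", r"\1_\2", name).lower()


# Hypothetical toy frame using a few of the original column names.
autos = pd.DataFrame(columns=["dateCrawled", "powerPS", "nrOfPictures", "price"])

# DataFrame.rename also accepts a function and applies it to every column label.
autos = autos.rename(columns=camel_to_snake)
print(list(autos.columns))
# ['date_crawled', 'power_ps', 'nr_of_pictures', 'price']
```

Because the helper knows nothing about this particular dataset, it can be reused on other camelCase-named files.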

Hi @drill_n_bass,

I’m definitely for the first approach. The second one is much longer and more cumbersome: you create a very case-specific function that you are not going to use many times in your project (and reuse should be the main reason for creating a function), you repeat the same col = col.replace(...) line over and over, and then you lowercase the names, add a for-loop, and re-assign the column names. All in all, why go through all these steps and “reinvent the wheel” when there is already an existing, elegant way (the first approach) to rename the columns of a DataFrame? :blush:
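A further point in favour of the first approach is that the mapping is plain data, and recent pandas versions let DataFrame.rename check it. By default, keys that don't match any column are silently ignored, so a typo in the mapping goes unnoticed; passing errors="raise" turns it into a KeyError. This is a minimal sketch on a hypothetical toy frame with only a subset of the columns:

```python
import pandas as pd

# Hypothetical toy frame with a few of the original column names.
autos = pd.DataFrame(columns=["dateCrawled", "offerType", "lastSeen"])

mapping = {
    "dateCrawled": "date_crawled",
    "offerType": "offer_type",
    "lastSeen": "last_seen_date",
}

# Every key exists in autos.columns, so the rename succeeds.
autos = autos.rename(columns=mapping, errors="raise")
print(list(autos.columns))  # ['date_crawled', 'offer_type', 'last_seen_date']

# A misspelled key ("offerTyp") is ignored by default,
# but with errors="raise" pandas reports it instead.
typo_frame = pd.DataFrame(columns=["offerType"])
try:
    typo_frame.rename(columns={"offerTyp": "offer_type"}, errors="raise")
except KeyError as exc:
    print("rename failed:", exc)
```

With the chained str.replace approach, a misspelled search string simply never matches, and nothing flags it.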


OK :)
Thank you for the feedback! :slight_smile:
