Data Cleaning with Pandas - Correcting Bad Values

Using the map function as the tutorial does has some deal-breaking flaws: with a large number of unique values it becomes too tedious and can easily lead to human error. Is this really an “industry standard” way of correcting this type of data?

# List the distinct values currently in the column
print(laptops["os"].unique())

mapping_dict = {
    'Android': 'Android',
    'Chrome OS': 'Chrome OS',
    'Linux': 'Linux',
    'Mac OS': 'macOS',
    'No OS': 'No OS',
    'Windows': 'Windows',
    'macOS': 'macOS'
}

# Any value missing from mapping_dict becomes NaN, so every value must be listed
laptops["os"] = laptops["os"].map(mapping_dict)

Providing a full mapping dictionary is unnecessarily tedious. As you can see, many keys map to themselves, so no replacement actually occurs; they only have to be there to prevent NaN outputs.
pandas.Series.str.replace would be a more flexible option: you can either replace one string at a time and chain calls if you need more, or provide a single regex pattern that replaces every string matching it with the same value.
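
A minimal sketch of both approaches, assuming a small made-up DataFrame in place of the tutorial's laptops data (the sample values and the names fixed_literal and fixed_regex are only for illustration):

```python
import pandas as pd

# Hypothetical sample standing in for the tutorial's laptops["os"] column.
laptops = pd.DataFrame(
    {"os": ["Windows", "Mac OS", "macOS", "Linux", "No OS", "Mac OS"]}
)

# Option 1: replace one literal string; chain more .str.replace calls if needed.
fixed_literal = laptops["os"].str.replace("Mac OS", "macOS", regex=False)

# Option 2: a single case-insensitive regex that collapses every spelling
# variant ("Mac OS", "macOS", ...) to the same value.
fixed_regex = laptops["os"].str.replace(
    r"^mac\s?os$", "macOS", case=False, regex=True
)

print(fixed_regex.unique())
```

Values that do not match the pattern pass through unchanged, so only the entries that need correcting have to be spelled out.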
