Data cleaning basics - storage sort

Hi,
While working through the data cleaning basics mission, I got stuck on a question that I don't know how to answer or find the logic for.

One of the additional questions asks for the laptop with the highest storage capacity, but I can't work out how to sort values that contain both numbers and letters.

Screen Link:

https://app.dataquest.io/m/293/data-cleaning-basics/13/next-steps

Hello, it seems you provided the wrong link. Kindly update it.

Hi,
What you have to do is get rid of the letters (use .str[-2:] to select the last two characters) and keep only the numbers. The challenge is that although most values are in GB, some are in TB.
What I did was replace 'GB' with '' and convert to float the rows where the storage space is measured in GB, and for the rows in TB additionally multiply by 1024.

# Rows measured in GB: strip the unit and convert to float.
is_gb = laptops['storage_space'].str[-2:] == 'GB'
laptops.loc[is_gb, 'storage_space'] = laptops.loc[is_gb, 'storage_space'].str.replace('GB', '').astype(float)

# Rows measured in TB: strip the unit, convert to float, and scale to GB.
is_tb = laptops['storage_space'].str[-2:] == 'TB'
laptops.loc[is_tb, 'storage_space'] = laptops.loc[is_tb, 'storage_space'].str.replace('TB', '').astype(float) * 1024

laptop_max_storage = laptops[laptops['storage_space'] == laptops['storage_space'].max()]
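
A variation on the same idea, as a rough sketch: instead of handling GB and TB in two separate passes, a single regular expression can split each value into a number and a unit, and a lookup table can convert everything to GB. The sample data, the `storage_gb` column, and the `unit_to_gb` mapping below are assumptions made up for illustration; only the `storage_space` column name comes from the code above.

```python
import pandas as pd

# Hypothetical sample data; the real `laptops` DataFrame comes from the course dataset.
laptops = pd.DataFrame({
    "model_name": ["A", "B", "C", "D"],
    "storage_space": ["256GB", "1TB", "512GB", "2TB"],
})

# Split each value into its numeric part and its unit (named groups become columns).
parts = laptops["storage_space"].str.extract(
    r"^\s*(?P<amount>\d+(?:\.\d+)?)\s*(?P<unit>GB|TB)"
)

# Map each unit to a multiplier so every value ends up in GB.
unit_to_gb = {"GB": 1, "TB": 1024}
laptops["storage_gb"] = parts["amount"].astype(float) * parts["unit"].map(unit_to_gb)

# Select the row(s) with the largest numeric storage.
laptop_max_storage = laptops[laptops["storage_gb"] == laptops["storage_gb"].max()]
print(laptop_max_storage)
```

Keeping the converted values in a separate numeric column leaves the original strings untouched, so the step can be re-run safely.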

Good idea :slight_smile:
Below is another approach. I filter out every value of 64 or higher, which leaves only the drives measured in TB (there are no drives smaller than 64 GB at the moment, so anything below 64 must be in TB :wink:).

By the way, the threshold can be set even lower if an error occurs (for example, if some older laptop with 32 GB appears).

# 3. Which laptop has the most storage space?

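# Keep only the rows whose leading number is below 64, i.e. the TB drives.
# expand=False makes str.extract return a Series, so it can be used as a row mask.
storage_number = laptops['storage'].str.extract(r'(\d+)', expand=False).astype(float)
clean_storage = laptops[storage_number < 64]

# Note: max() compares the strings alphabetically, which is only reliable while the TB values are single digits.
clean_storage_max = laptops[laptops["storage"] == clean_storage['storage'].max()]
clean_storage_max_the_one = clean_storage_max.iloc[0]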
print('\n')
print('Answer 3:')
print('\n')
print(clean_storage_max_the_one['manufacturer':'model_name'])
print(clean_storage_max_the_one['storage'])
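
The threshold idea can also be applied numerically, so that the comparison does not depend on string ordering. This is only a sketch under the same assumption as above (any leading number below 64 is a TB value); the sample rows and the `TB_THRESHOLD` and `storage_gb` names are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data using the `storage` column name from the code above.
laptops = pd.DataFrame({
    "manufacturer": ["X", "Y", "Z"],
    "model_name": ["m1", "m2", "m3"],
    "storage": ["128GB SSD", "1TB HDD", "2TB HDD"],
})

# First number in each string, as a float Series (expand=False returns a Series).
number = laptops["storage"].str.extract(r"(\d+)", expand=False).astype(float)

# Heuristic: values below the threshold are assumed to be TB and are scaled to GB.
TB_THRESHOLD = 64
storage_gb = number.where(number >= TB_THRESHOLD, number * 1024)

# idxmax returns the index label of the largest value.
largest = laptops.loc[storage_gb.idxmax()]
print(largest["manufacturer":"model_name"])
print(largest["storage"])
```

As with the original filter, the heuristic breaks if a genuine drive smaller than 64 GB appears, in which case the threshold would need lowering.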
