# Data cleaning basics - storage sort

Hi,
While working through the data cleaning basics, I got stuck on a question that I can't find the logic for.

One of the additional questions asks for the laptop with the highest storage capacity, but I can't work out how to sort values that contain both numbers and letters.

https://app.dataquest.io/m/293/data-cleaning-basics/13/next-steps

Hi,
What you have to do is get rid of the letters and keep only the numbers (`.str[-2:]` selects the last two characters, which hold the unit). The challenge is that although most values are in GB, some are in TB.
What I did was replace 'GB' with '' and convert to float for the rows where the storage space is measured in GB. For the rows in TB I did the same with 'TB' and additionally multiplied by 1024.

```
# Rows whose value ends in 'GB': strip the unit and convert to float
is_gb = laptops['storage_space'].str[-2:] == 'GB'
laptops.loc[is_gb, 'storage_space'] = laptops.loc[is_gb, 'storage_space'].str.replace('GB', '').astype(float)

# Rows whose value ends in 'TB': strip the unit, convert to float, and scale to GB
is_tb = laptops['storage_space'].str[-2:] == 'TB'
laptops.loc[is_tb, 'storage_space'] = laptops.loc[is_tb, 'storage_space'].str.replace('TB', '').astype(float) * 1024

# Laptop(s) with the largest storage
laptop_max_storage = laptops[laptops['storage_space'] == laptops['storage_space'].max()]
```
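If you want to try the idea on a small example first, here is a minimal sketch that does the same conversion in one pass with a regex instead of slicing. The toy DataFrame and the `storage_gb` column name are made up for illustration, and it assumes every value is a number followed by `GB` or `TB`.

```python
import pandas as pd

# Toy data, invented for illustration (not the real laptops dataset).
laptops = pd.DataFrame({
    'model': ['A', 'B', 'C', 'D'],
    'storage_space': ['128GB', '1TB', '512GB', '2TB'],
})

# Split each value into its number and its unit.
parts = laptops['storage_space'].str.extract(
    r'(?P<size>\d+(?:\.\d+)?)\s*(?P<unit>GB|TB)'
)

# Convert everything to GB (1 TB = 1024 GB, as in the code above).
multiplier = parts['unit'].map({'GB': 1, 'TB': 1024})
laptops['storage_gb'] = parts['size'].astype(float) * multiplier

# Rows with the largest storage.
laptop_max_storage = laptops[laptops['storage_gb'] == laptops['storage_gb'].max()]
print(laptop_max_storage)
```

Writing the result to a new column keeps the original strings around, which makes it easier to check the conversion by eye.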

Good idea.
Here is another approach: I filter out every value of 64 or higher (there are no smaller drives at the moment, so anything below 64 must be in TB).

By the way, the threshold can be set even lower if an error occurs (for example, if an older laptop with 32 GB appears).
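To see the threshold idea in isolation, here is a small sketch on made-up values. The Series and the `threshold` name are hypothetical, and it assumes each value starts with its number and that at least one TB drive is present.

```python
import pandas as pd

# Made-up values for illustration only.
storage = pd.Series(['256GB', '1TB', '64GB', '2TB', '500GB'])

# Pull out the leading number; expand=False returns a Series.
numbers = storage.str.extract(r'(\d+)', expand=False).astype(float)

# Anything below the threshold is assumed to be a TB value.
threshold = 64
is_tb = numbers < threshold

# Among the TB values, keep the one(s) with the largest number.
largest = storage[is_tb][numbers[is_tb] == numbers[is_tb].max()]
print(largest.tolist())
```

Comparing the extracted numbers rather than the original strings avoids surprises from alphabetical ordering.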

```
# 3. Which laptop has the most storage space?

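# Keep only the rows whose leading number is below 64, i.e. the TB drives.
# expand=False makes str.extract return a Series, so it works as a row mask.
clean_storage = laptops[
    laptops['storage'].str.extract(r'(\d+)', expand=False).astype(float) < 64
]  # "< 64" filters just the TB storages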

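# Note: max() compares the values as strings here, so it only picks the
# right row while the TB numbers are single digits.
clean_storage_max = laptops[laptops["storage"] == clean_storage['storage'].max()]
# Take the first matching row
clean_storage_max_the_one = clean_storage_max.iloc[0]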
print('\n')