Pandas: Data Cleaning Basics - Optional Questions in #13

I just wanted to share my approach to the optional questions at the end of the lesson and run it by the community to see whether I arrived at the correct results. I'm not too sure about the third question on storage space. Thanks in advance :slight_smile:

Screen Link:

Are laptops made by Apple more expensive than those made by other manufacturers?

mean_price_by_manufacturer = {}
median_price_by_manufacturer = {}
manufacturers = laptops["manufacturer"].unique()
for m in manufacturers:
    selected_rows = laptops[laptops["manufacturer"] == m]
    mean = selected_rows["price_euros"].mean()
    median = selected_rows["price_euros"].median()
    mean_price_by_manufacturer[m] = mean
    median_price_by_manufacturer[m] = median

Although Apple’s prices are higher than those of most manufacturers, Apple still comes out cheaper than MSI, Microsoft, Razer, Google and LG on both mean and median price (and cheaper than Huawei and Samsung on median price only):

{'Apple': 1564.1985714285713,
 'HP': 1067.7748540145985,
 'Acer': 626.7758252427185,
 'Asus': 1104.1693670886077,
 'Dell': 1186.06898989899,
 'Lenovo': 1086.3844444444446,
 'Chuwi': 314.2966666666667,
 'MSI': 1728.9081481481483,
 'Microsoft': 1612.3083333333334,
 'Toshiba': 1267.8125,
 'Huawei': 1424.0,
 'Xiaomi': 1133.4625,
 'Vero': 217.425,
 'Razer': 3346.1428571428573,
 'Mediacom': 295.0,
 'Samsung': 1413.4444444444443,
 'Google': 1677.6666666666667,
 'Fujitsu': 729.0,
 'LG': 2099.0}

{'Apple': 1339.69,
 'HP': 966.5,
 'Acer': 559.0,
 'Asus': 1012.5,
 'Dell': 985.0,
 'Lenovo': 899.0,
 'Chuwi': 248.9,
 'MSI': 1599.0,
 'Microsoft': 1569.5,
 'Toshiba': 1211.5,
 'Huawei': 1424.0,
 'Xiaomi': 1099.45,
 'Vero': 206.85000000000002,
 'Razer': 2899.0,
 'Mediacom': 265.0,
 'Samsung': 1649.0,
 'Google': 1559.0,
 'Fujitsu': 739.0,
 'LG': 2099.0}
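
For anyone who wants to cross-check numbers like the two dictionaries above, here is a minimal sketch of the same comparison done with a single groupby instead of a loop. The rows below are made-up toy data, not the lesson's dataset; only the column names `manufacturer` and `price_euros` are taken from the code above.

```python
import pandas as pd

# Hypothetical toy rows standing in for the lesson's `laptops` DataFrame.
laptops = pd.DataFrame({
    "manufacturer": ["Apple", "Apple", "Acer", "Acer", "Razer"],
    "price_euros": [1300.0, 1500.0, 400.0, 600.0, 2900.0],
})

# One groupby call computes both statistics per manufacturer.
price_stats = (
    laptops.groupby("manufacturer")["price_euros"]
    .agg(["mean", "median"])
    .sort_values("median", ascending=False)
)

# Manufacturers whose median price is higher than Apple's.
apple_median = price_stats.loc["Apple", "median"]
pricier_than_apple = price_stats[price_stats["median"] > apple_median]

print(price_stats)
print(list(pricier_than_apple.index))
```

Sorting by one of the statistics makes it easier to see where Apple sits than scanning two unsorted dictionaries.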

What is the best value laptop with a screen size of 15" or more?

large_screen_laptops = laptops.loc[laptops["screen_size_inches"] >= 15]
lowest_price = large_screen_laptops["price_euros"].min()
best_value_laptop = large_screen_laptops.loc[large_screen_laptops["price_euros"] == lowest_price]
best_value_laptop_name = best_value_laptop["model_name"]
290	Chromebook C910-C2ST
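
A slightly shorter way to get the same kind of answer is `idxmin`, which returns the index label of the cheapest row directly. This is only a sketch on made-up rows (the model names and prices are hypothetical), and it keeps the same assumption as the code above that "best value" simply means "lowest price".

```python
import pandas as pd

# Hypothetical toy rows; column names follow the lesson's `laptops` DataFrame.
laptops = pd.DataFrame({
    "model_name": ["Model A", "Model B", "Model C", "Model D"],
    "screen_size_inches": [13.3, 15.6, 17.3, 15.6],
    "price_euros": [150.0, 450.0, 300.0, 999.0],
})

# Keep only laptops with a screen of 15 inches or more.
large_screen_laptops = laptops[laptops["screen_size_inches"] >= 15]

# idxmin gives the index label of the lowest price (the first one if there is a tie).
cheapest_idx = large_screen_laptops["price_euros"].idxmin()
best_value_laptop = large_screen_laptops.loc[cheapest_idx, ["model_name", "price_euros"]]

print(best_value_laptop)
```

Note that `idxmin` returns a single row, whereas filtering on `== lowest_price` returns every laptop tied at the lowest price.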

Which laptop has the most storage space?

unique_storage = laptops["storage"].unique()
laptops["storage_size"] = laptops["storage"].str.split().str[0] 
storage_size_counts = laptops["storage_size"].value_counts()
laptops["storage_size"] = laptops["storage_size"].str.replace("1TB","1000GB").str.replace("2TB","2000GB")
storage_size_counts_2 = laptops["storage_size"].value_counts()
laptops["storage_size"] = laptops["storage_size"].str.replace('GB','')
storage_size_counts_3 = laptops["storage_size"].value_counts()
laptops["storage_size"] = laptops["storage_size"].astype(float)
laptops.rename({"storage_size": "storage_size_gb"}, axis=1, inplace=True)
most_storage = laptops["storage_size_gb"].max()
most_storage_laptops = laptops.loc[laptops["storage_size_gb"] == most_storage]
sorted_by_ram = most_storage_laptops.sort_values("ram_gb", ascending=False)
laptop_with_most_storage = sorted_by_ram.iloc[0]["model_name"]

To arrive at the answer below, I created a new column to store the storage size in GB for each laptop by cleaning the data, converting all values to GB, removing non-numeric values, converting the dtype to float etc. I then extracted all of the laptops with the largest storage size (2TB) and sorted by RAM to identify the device with 16GB of RAM. Is this the correct solution?

'Inspiron 5567'

Hey, Colleen!
Thanks for sharing your research!
A dummy question: did you complete it locally on your computer or in the same window as the last exercise?
I'm trying to find the least annoying option here))

Hi Darya! No worries at all – I completed this within Dataquest itself (within that window I shared)…haven’t done much of the work locally at all, tbh :slight_smile:


Ok, thank you, Colleen!


About the question ‘Which laptop has the most storage space?’:

When I ask for unique values in column ‘storage’ I get the following:



Some laptops have both SSD and HDD drives.
To get the total storage space I think you should somehow combine these.

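Furthermore, to put the TB and GB values on the same scale, I did the following (note that this expresses every size in MB, so 1GB becomes 1000 and 1TB becomes 1000000):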

laptops["storage"] = laptops["storage"].str.replace("GB","000").str.replace("TB","000000")

This saves you the step of removing “GB” later on.
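
One way the "combine these" step could look is sketched below. It is not the lesson's official solution: the sample strings are made up, the `total_storage_gb` helper is hypothetical, and it assumes that each drive is written as a number followed by GB or TB and that 1TB counts as 1000GB.

```python
import re

import pandas as pd

# Hypothetical sample values; the real column may contain other formats.
storage = pd.Series([
    "256GB SSD",
    "1TB HDD",
    "128GB SSD + 1TB HDD",
    "512GB SSD + 2TB HDD",
])


def total_storage_gb(value):
    """Add up every '<number>GB' or '<number>TB' part of a storage string, in GB."""
    total = 0.0
    for amount, unit in re.findall(r"(\d+(?:\.\d+)?)\s*(GB|TB)", value):
        # Assumption: 1TB is treated as 1000GB.
        total += float(amount) * (1000 if unit == "TB" else 1)
    return total


storage_gb = storage.apply(total_storage_gb)
print(storage_gb.tolist())
```

Pulling out each number together with its unit avoids relying on the order of the string replacements, and it handles laptops with one drive and two drives in the same way.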

Hi, here is my solution to the largest-storage part:

# Converting TB to GB
laptops["storage"] = laptops["storage"].str.replace("TB", "000").str.replace("GB", "")
# Renaming storage
laptops.rename({"storage": "storage_gb"}, axis=1, inplace=True)
laptops["string1"] = laptops["storage_gb"].str.split("+").str[0]
laptops["string2"] = (laptops["storage_gb"].str.split("+").str[1]).fillna("0")
laptops["string1"] = laptops["string1"].str.split().str[0]
laptops["string2"] = laptops["string2"].str.split().str[0]
laptops["string1"] = laptops["string1"].astype(int)
laptops["string2"] = laptops["string2"].astype(int)
laptops["final_storage"] = laptops["string1"] + laptops["string2"]
laptops.sort_values("final_storage", ascending=False).iloc[0][["manufacturer", "model_name", "final_storage"]]

manufacturer                MSI
model_name       GS73VR Stealth
final_storage              2512
Name: 894, dtype: object