Solution for extra challenge questions 293-12

#Solution for extra challenge questions
'''
Our data set is ready for some analysis, but there are still some data cleaning tasks left! Here are your next steps:

  • Convert the price_euros column to a numeric dtype.
  • Extract the screen resolution from the screen column.
  • Extract the processor speed from the cpu column.

Here are some questions you might like to answer in your own time by analyzing the cleaned data:

  • Are laptops made by Apple more expensive than those made by other manufacturers?
  • What is the best value laptop with a screen size of 15" or more?
  • Which laptop has the most storage space?
'''
#weight cleaning
#assumes pandas is imported as pd and laptops is the DataFrame from the previous screens
laptops["weight"] = laptops["weight"].str.replace(r"(kg).*", "", regex=True)
laptops = laptops.astype({"weight": float})
laptops.rename(columns={"weight": "weight_kg"}, inplace=True)

laptops.to_csv("laptops_cleaned.csv")

laptops_temp = laptops.copy() #workaround for the answer checker: laptops is modified below for further analysis

#challenge questions from the next screen

#price column cleaning
#assumes the comma in price_euros is a decimal separator (e.g. "1339,69"); deleting it instead would inflate every price 100x
laptops["price_euros"] = laptops["price_euros"].str.replace(",", ".", regex=False)
laptops = laptops.astype({"price_euros": float})

#Extract the screen resolution from the screen column.
laptops["resolution"] = laptops["screen"].str.split(" ").str[-1]

#Extract the processor speed from the cpu column.
laptops["speed"] = laptops["cpu"].str.split(" ").str[-1]

#Which laptop has the most storage space?
laptops.info()
storage = laptops["storage"].str.split(" ").str[0].str.extract(r"(\d+)(\w+)")
storage_unit = storage[1].unique()
storage = storage.astype({0: "int32"})
storage.loc[storage[1] == "TB", 0] *= 1024
storage.loc[storage[1] == "TB", 1] = "GB"
laptops["storage_gb"] = storage[0]

most_storage = laptops.loc[laptops["storage_gb"] == laptops["storage_gb"].max(), :]

#What is the best value laptop with a screen size of 15" or more?
#"best value" is taken here to mean the lowest price; >= is used so that 15" screens are included

laptop_screen_large = laptops[laptops["screen_size_inches"] >= 15]

laptop_screen_large_best_value = laptop_screen_large[laptop_screen_large["price_euros"] == laptop_screen_large["price_euros"].min()]

#Are laptops made by Apple more expensive than those made by other manufacturers?

manufacturers = laptops["manufacturer"].unique()
manufacturers_mean_price = {}
for manufacturer in manufacturers:
    #mean price of the rows that belong to this manufacturer
    selected_manufacturer = laptops[laptops["manufacturer"] == manufacturer]
    selected_manufacturer_mean_price = selected_manufacturer["price_euros"].mean()
    manufacturers_mean_price[manufacturer] = selected_manufacturer_mean_price

manufacturers_mean_price_series = pd.Series(manufacturers_mean_price)
manufacturers_mean_price_series.index.name = "manufacturer"
manufacturers_mean_price_series = manufacturers_mean_price_series.rename("mean")
manufacturers_mean_price_series = manufacturers_mean_price_series.sort_values(ascending=False)

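#Razer has the highest mean price; Apple is sixth in this listing
#(these figures come from a run that deleted the comma from price_euros, so they are 100x too large if that comma is a decimal separator)

'''
Manufacturer
Razer        334614.285714
LG           209900.000000
MSI          172890.814815
Google       167766.666667
Microsoft    161230.833333
Apple        156419.857143
'''

laptops = laptops_temp.copy() #workaround for the answer checker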
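
The size of the mean prices above depends on how the comma in price_euros is handled. Here is a minimal sketch on made-up values (the sample strings are an assumption, not rows from the real data set) comparing the two replacement strings:

```python
import pandas as pd

# Made-up sample values; the real price_euros column may be formatted differently.
prices = pd.Series(["1339,69", "898,94", "575,00"])

# Deleting the comma drops the decimal point and scales every value by 100.
stripped = prices.str.replace(",", "", regex=False).astype(float)

# Swapping the comma for a dot keeps the decimal part.
converted = prices.str.replace(",", ".", regex=False).astype(float)

print(stripped.tolist())
print(converted.tolist())
```

The ranking of manufacturers is the same either way, because every price is scaled by the same factor; only the absolute numbers change.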

I'm finding it difficult to understand how you answered the question about the laptop with the most storage. Can you explain, please?

#Which laptop has the most storage space?
laptops.info()
storage = laptops["storage"].str.split(" ").str[0].str.extract(r"(\d+)(\w+)")

In the line above, I first split the storage string on spaces and took the first piece, which holds the capacity. For example:

[128GB, Flash, Storage]

At index 0 I get 128GB. Now I want 128 and GB separately, and in some cases the unit is TB instead.
So I used a regular expression with two capture groups to extract them.
In r"(\d+)(\w+)", (\d+) extracts 128 and (\w+) extracts GB.
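
To see the split and the two capture groups in action, here is a small sketch on made-up storage strings (the sample values are assumptions for illustration):

```python
import pandas as pd

# Made-up examples in the style of the storage column.
storage_raw = pd.Series(["128GB Flash Storage", "1TB HDD", "256GB SSD + 1TB HDD"])

# Keep only the first space-separated token, e.g. "128GB".
first_token = storage_raw.str.split(" ").str[0]

# Two capture groups give two columns: 0 holds the digits, 1 holds the unit.
parts = first_token.str.extract(r"(\d+)(\w+)")

print(parts)
```

Note that only the first token is used, so for a laptop with two drives the second drive is not counted.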

storage_unit = storage[1].unique()

After that I checked which unique unit values exist, and found that GB and TB are the only units.

storage = storage.astype({0: "int32"})
I converted column 0 to integer so I could do arithmetic on it.

storage.loc[storage[1] == "TB", 0] *= 1024
storage.loc[storage[1] == "TB", 1] = "GB"
laptops["storage_gb"] = storage[0]

This converts every TB value into GB.

most_storage = laptops.loc[laptops["storage_gb"] == laptops["storage_gb"].max(), :]

The line above gives me the laptop (or laptops, if several are tied) with the most storage space.
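
Comparing against max() keeps every row that ties for the largest value. A short sketch on a toy frame (the model names and sizes are made up) shows the difference from idxmax(), which returns only the first match:

```python
import pandas as pd

# Toy data: two models share the largest storage.
toy = pd.DataFrame({
    "model": ["A", "B", "C"],
    "storage_gb": [512, 2048, 2048],
})

# Boolean mask: all rows equal to the maximum are kept.
most_storage = toy.loc[toy["storage_gb"] == toy["storage_gb"].max(), :]

# idxmax: only the label of the first maximum is returned.
first_only = toy.loc[toy["storage_gb"].idxmax()]

print(most_storage)
print(first_only["model"])
```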

Hi, I think the first challenge question is "Are laptops made by Apple more expensive than those made by other manufacturers?"

Are you analyzing the price based on storage? Does that mean you grouped the laptops by brand and by storage, and then compared the average price among the manufacturers?

No, I grouped the laptops only by manufacturer.
I cleaned the storage column just to make it consistent, because it contained both TB and GB values. I converted everything to GB so the column can be used for analysis.
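
For reference, grouping by manufacturer alone can also be written with groupby instead of a loop. This is a sketch on made-up prices, not the original solution:

```python
import pandas as pd

# Made-up rows; only the column names follow the laptops data set.
toy = pd.DataFrame({
    "manufacturer": ["Apple", "Apple", "Razer", "HP", "HP"],
    "price_euros": [1300.0, 1500.0, 3000.0, 600.0, 800.0],
})

# Mean price per manufacturer, highest first.
mean_price = (
    toy.groupby("manufacturer")["price_euros"]
    .mean()
    .sort_values(ascending=False)
    .rename("mean")
)

# Apple compared with all other manufacturers pooled together.
apple_mean = mean_price["Apple"]
others_mean = toy.loc[toy["manufacturer"] != "Apple", "price_euros"].mean()

print(mean_price)
print(apple_mean, others_mean)
```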

Maybe I answered the challenge questions in a different order, which caused the confusion.

Hi! Yes, I see that the answers are in a different order. Could you explain the code below? I don't understand how you used the column for the calculation after you changed its datatype. Thank you in advance!

After splitting with the regular expression I had a DataFrame with two columns: column 0, which I converted to int32, and column 1, which stays as object (the unit strings). I multiplied column 0 by 1024 using vectorized multiplication, but only in the rows where the unit is TB, so I selected those rows with a condition. storage.loc[storage[1] == "TB", 0] *= 1024 means: select column 0 in the rows where column 1 equals "TB", and multiply those values by 1024.
The *= is shorthand notation: a += 1 is the same as a = a + 1.

In the second column I replaced TB with GB.

Finally, I created a column called laptops["storage_gb"] for the analysis.
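
The conditional multiplication can be tried on a tiny frame. This sketch uses made-up values and names the mask is_tb purely for illustration:

```python
import pandas as pd

# Column 0 holds the number, column 1 the unit, as after str.extract and astype.
storage = pd.DataFrame({0: [128, 1, 256, 2], 1: ["GB", "TB", "GB", "TB"]})

# True only in the rows whose unit is TB.
is_tb = storage[1] == "TB"

# Multiply column 0 by 1024 in the TB rows; the GB rows are untouched.
storage.loc[is_tb, 0] *= 1024

# Relabel the same rows so the unit column is consistent.
storage.loc[is_tb, 1] = "GB"

print(storage)
```

The mask is computed once, before either assignment, so relabelling the units does not affect which rows were multiplied.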

You can get more involvement from the community if you format your post a little more clearly. :slight_smile:

Can you please check out the technical question guidelines here?
