Hi:)
Below are my answers to the additional questions from this exercise (13/14). I did them on the last page where I could, which is here (exercise 12/14).
I’m open to suggestions (what could be done better, more simply, etc.). I also posted some additional questions at the end of this post. I would be very happy if someone could shed some light on those topics.
Base code from exercise 293-12:
laptops['weight'] = (laptops['weight']
                     .str.replace('kg', '')
                     .str.replace('s', '')
                     .astype(float)
                     )
laptops.rename({"weight":'weight_kg'}, axis=1, inplace=True)
Below are the additional DQ tasks (I don’t know why they were on the 13/14 page):
# Convert the price_euros column to a numeric dtype.
laptops['price_euros'] = (laptops['price_euros']
.str.replace(',','.')
.astype(float)
)
# Extract the screen resolution from the screen column.
laptops["screen resolution"] = (laptops["screen"].str.split().str[-1])
# Extract the processor speed from the cpu column.
laptops["processor speed"] = (laptops["cpu"].str.split().str[-1])
# Last part of the exercise 12/14 page (it has to come after the 13/14 tasks so that the saved file includes them)
dtypes = laptops.dtypes
laptops.to_csv('laptops_cleaned.csv',index=False)
“Here are some questions you might like to answer in your own time by analyzing the cleaned data”:
# 1. Are laptops made by Apple more expensive than those made by other manufacturers?
# Answer: That's a tricky one! The correct answer is YES. No need to write any code to check it.
# The idea of programming is to simplify things. Obvious things like: do birds fly? Is the flame hot?
# Are laptops made by Apple more expensive than those made by other manufacturers and will they not have usb ports? - those things are from the same "obvious" category. :)
# Let's suppose that the brand isn't called Apple (at this point you can see how much I like this company) but worthless-trendy-garbage (aka WTG), compared with good laptops. The answer is:
# Identify laptops from wtg and good ones:
wtg = laptops[laptops['manufacturer'] == "Apple"]
good_laptops = laptops[laptops['manufacturer'] != "Apple"]
# Find the minimum, maximum and average prices for both objects written above:
# Call the aggregation methods directly instead of indexing describe() by position,
# which depends on the order of its rows and is deprecated in recent pandas versions.
wtg_av = wtg['price_euros'].mean()
wtg_min = wtg['price_euros'].min()
wtg_max = wtg['price_euros'].max()
good_laptops_av = good_laptops['price_euros'].mean()
good_laptops_min = good_laptops['price_euros'].min()
good_laptops_max = good_laptops['price_euros'].max()
# Print the answer for the first additional mission:
print("Answer 1:")
print('\n')
print("WTG average price is:", int(wtg_av), "EUR")
print('"Good laptops" average price is:', int(good_laptops_av), "EUR")
(print('"Good laptops" are cheaper by', int(wtg_av)
- int(good_laptops_av), "EUR", "on average"))
print('\n')
print("WTG maximum price is:", int(wtg_max), "EUR")
print("'Good laptops' max price is:", int(good_laptops_max), "EUR")
# An additional "if statement" is needed,
# because - surprisingly - there are good laptops more expensive than WTG.
# There is some logical reason for that, for sure..
if (int(wtg_max) - int(good_laptops_max)) > 0:
    (print('"Good laptops" are cheaper by', int(wtg_max)
           - int(good_laptops_max), "EUR", "on maximum price range")
     )
if (int(wtg_max) - int(good_laptops_max)) < 0:
    (print('WTG are cheaper by', int(good_laptops_max)
           - int(wtg_max), "EUR", "on maximum price range")
     )
# the exception proves the rule on maximum price range and with penguins - penguins don't fly.
print('\n')
print("WTG minimum price is:", int(wtg_min), "EUR")
print("'Good laptops' minimum price is:", int(good_laptops_min), "EUR")
(print('"Good laptops" are cheaper by', int(wtg_min)
- int(good_laptops_min), "EUR", "on minimum price range"))
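A possibly simpler way to get the same three numbers for both groups is a single groupby. This is only a sketch: the `sample` DataFrame and its prices are made up for illustration, and the real `laptops` DataFrame from the exercise would take its place.

```python
import pandas as pd

# Hypothetical sample rows; the real `laptops` DataFrame comes from the exercise.
sample = pd.DataFrame({
    "manufacturer": ["Apple", "Apple", "Dell", "HP", "Lenovo"],
    "price_euros": [1500.0, 2500.0, 400.0, 900.0, 3100.0],
})

# Keep the label "Apple" where it applies and call every other brand "Other".
brand_group = sample["manufacturer"].where(sample["manufacturer"] == "Apple", "Other")

# One row per group, one column per statistic.
price_summary = sample.groupby(brand_group)["price_euros"].agg(["mean", "min", "max"])
print(price_summary)
```

The differences between the groups can then be read from one small table, for example `price_summary.loc["Apple", "mean"] - price_summary.loc["Other", "mean"]`.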
# 2. What is the best value laptop with a screen size of 15" or more?
pc_15_inch_mask = laptops[laptops['screen_size_inches'] >= 15.0]
pc_15_inch_mask_sorted = pc_15_inch_mask.sort_values('price_euros')
sorted_pc_15 = pc_15_inch_mask_sorted.iloc[0]
top_pc_15plus_name = sorted_pc_15['model_name']
# Print the answer for the second additional mission:
print('==========================================================')
print('\n')
print("Answer 2:")
print('\n')
print('top laptop with a screen size of 15" or more is:', top_pc_15plus_name)
print('\n')
print('full data for best 15":', '\n', '\n', sorted_pc_15)
# 3. Which laptop has the most storage space?
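# Keep the rows whose first number in the storage column is below 64.
# "< 64" is meant to keep just the TB storages; note that an entry starting
# with a small GB number (for example "32GB ...") would also pass this filter.
clean_storage = (laptops[laptops['storage']
                 .str.extract(r'(\d+)', expand=False).astype(float) < 64]
                 )
# Caution: .max() on strings compares them alphabetically, not by capacity.
clean_storage_max = laptops[laptops["storage"] == clean_storage['storage'].max()]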
clean_storage_max_the_one = clean_storage_max.iloc[0]
print('==========================================================')
print('\n')
print('Answer 3:')
print('\n')
print(clean_storage_max_the_one['manufacturer':'model_name'])
print(clean_storage_max_the_one['storage'])
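Because the storage values are strings, comparing them directly may not rank them by capacity. One possible alternative is to convert each value to a number of gigabytes first. This is a rough sketch: the sample strings, the helper name `storage_to_gb` and the assumption that 1TB counts as 1000GB are all mine, and the real column may contain formats this pattern does not cover.

```python
import re
import pandas as pd

# Hypothetical storage strings; the real column may use other formats.
sample = pd.DataFrame({
    "model_name": ["A", "B", "C", "D"],
    "storage": ["256GB SSD", "1TB HDD", "128GB SSD + 1TB HDD", "32GB Flash Storage"],
})

def storage_to_gb(text):
    """Sum every '<number>GB' or '<number>TB' part of a storage string, in GB."""
    total = 0.0
    for amount, unit in re.findall(r"(\d+(?:\.\d+)?)\s*(GB|TB)", text):
        # Assumption: 1 TB is treated as 1000 GB.
        total += float(amount) * (1000 if unit == "TB" else 1)
    return total

sample["storage_gb"] = sample["storage"].apply(storage_to_gb)

# idxmax returns the index label of the row with the largest numeric capacity.
largest = sample.loc[sample["storage_gb"].idxmax()]
print(largest["model_name"], largest["storage"])
```

With a numeric column in place, ties or a top-N list can be inspected with `sort_values("storage_gb", ascending=False)`.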
Additional questions to DQ community:
question 1:
What’s the difference between Series.str.rsplit and Series.str.split? At one point I tried to use this method. But I was defeated…
I thought that the code:
laptops["screen resolution"] = (laptops["screen"].str.split().str[-1])
is equivalent to:
laptops["screen resolution"] = (laptops["screen"].str.rsplit().str[1])
…but it’s not. Why? Isn’t “rsplit” the reverse of “split”?
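A small experiment that may help with this question, using a made-up value in the style of the screen column. As far as I understand the pandas string methods, rsplit does not reverse the order of the pieces; it only starts cutting from the right, which matters once the number of splits is limited with `n`.

```python
import pandas as pd

# Made-up sample value in the style of the screen column.
screen = pd.Series(["IPS Panel Retina Display 2560x1600"])

# With no limit, split and rsplit produce the same list of pieces.
all_left = screen.str.split()
all_right = screen.str.rsplit()
print(all_left[0] == all_right[0])

# Position 1 is the second piece counted from the left in both cases.
second_from_left = screen.str.rsplit().str[1][0]
print(second_from_left)

# With n=1, rsplit makes a single cut starting from the right...
last_piece = screen.str.rsplit(n=1).str[-1][0]
print(last_piece)

# ...while split with n=1 makes a single cut starting from the left.
rest_after_first = screen.str.split(n=1).str[-1][0]
print(rest_after_first)
```

So `.str.rsplit().str[1]` picks the second word, while `.str.split().str[-1]` and `.str.rsplit(n=1).str[-1]` both pick the last one.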
question 2:
Why doesn’t Series.str.split() remove the part that goes to the new column from the old one?
For example, if screen is: IPS Panel Retina Display 2560x1600 - and I want to move just “2560x1600”. After using this method, the screen column still contains “2560x1600”.
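One way this could be handled, as a sketch on a made-up row: the string methods return new objects and leave the original column untouched, so the old column only changes if it is assigned again. Here `expand=True` turns the pieces into separate columns, which can then be written back.

```python
import pandas as pd

# Made-up sample row; the column names follow the post.
sample = pd.DataFrame({"screen": ["IPS Panel Retina Display 2560x1600"]})

# One cut from the right, expanded into two columns: the rest and the last piece.
parts = sample["screen"].str.rsplit(n=1, expand=True)

# Assign both pieces explicitly; without this the screen column stays as it was.
sample["screen"] = parts[0]
sample["screen resolution"] = parts[1]
print(sample)
```

Note that a value consisting of a single word would leave the second column empty for that row, so this assumes every screen entry has at least two pieces.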