Data Cleaning Basics Challenge Question

Hello,
I had three questions on Page 12 of the “Data Cleaning Basics” Mission.

URL: Learn data science with Python and R projects

Question 1: How would someone know that some values use ‘kgs’ as the unit and not just ‘kg’? I ran value_counts() and all I saw was ‘kg’. Is there a method to find the unique units?

Question 2: My original code was:
laptops.loc[:,'weight']=laptops.loc[:,'weight'].str.replace('kg','').str.replace('kgs','').astype(float)

This raised an error. When I flipped the ‘kgs’ and ‘kg,’ everything worked fine. Is there a reason why it needs to be in that order to work?

Question 3: For the df.to_csv() method, what happens if we don’t do index=False when saving this new csv file?

Did you try
print(laptops['weight'].unique())

Your first replace('kg','') removes 'kg' from every row, which means that '4kgs' is now '4s'.
Hence, replace('kgs','') finds nothing left to match and makes no changes to the data. Next, when .astype(float) reaches the string '4s', an error is raised.
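
To make the ordering issue concrete, here is a minimal sketch on a small made-up Series (the values are illustrative, not the real laptops data). It passes regex=False so that each pattern is treated as a literal string.

```python
import pandas as pd

# Hypothetical sample values standing in for laptops['weight']
weights = pd.Series(['2.2kg', '4kgs', '1.37kg'])

# Removing 'kg' first leaves a stray 's' on the 'kgs' row
kg_first = weights.str.replace('kg', '', regex=False)
print(kg_first.tolist())

# Converting that result to float fails because of the leftover 's'
try:
    kg_first.astype(float)
    conversion_failed = False
except ValueError:
    conversion_failed = True

# Removing the longer unit 'kgs' first, then 'kg', leaves only numbers
kgs_first = (
    weights.str.replace('kgs', '', regex=False)
           .str.replace('kg', '', regex=False)
)
cleaned = kgs_first.astype(float)
print(cleaned.tolist())
```

The general idea is to strip the longer pattern before the shorter one whenever one unit string contains the other.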

As explained in the Learn section:

By default, pandas will save the index labels as a column in the CSV file. Our dataset has integer labels that don’t contain any data, so we don’t need to save the index.

You can run laptops.to_csv('laptops_cleaned.csv', index=False) on your local machine, then run it again without index=False, and compare the two csv files.
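
If you would rather see the difference without writing any files, here is a small sketch. It relies on to_csv() returning the CSV text as a string when no path is given. The DataFrame and its weight_kg column are made up for illustration.

```python
import pandas as pd

# Hypothetical two-row frame, not the real laptops data
df = pd.DataFrame({'weight_kg': [2.2, 4.0]})

# With no path, to_csv() returns the CSV content as a string
with_index = df.to_csv()
without_index = df.to_csv(index=False)

print(with_index)     # has an extra, unnamed first column holding 0, 1
print(without_index)  # only the weight_kg column
```

If a file saved with the index is later loaded with pd.read_csv(), that extra column usually shows up as "Unnamed: 0" unless you pass index_col=0.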

Hope it's clear now.

Hi @dash.debasmita,

Thanks for the reply.

For Question 1, I did do that, but it shows the number and the unit together. I didn’t know if there was a way to see just the unique units. I know there was a “kgs” because the page told me. If it hadn’t (let’s say we were looking at this for a job), how would one have found that one “kgs” in a sea of “kg”?

For Question 2, thank you! That was bothering me so much!

For Question 3, how do I save the cleaned csv to my computer? Or run it on my computer? That’s what I wanted to do last night but I didn’t know how, so I was just playing around with it in (I believe it’s called) the script editor. I like NOT doing what the page tells me sometimes because I like to see the error that occurs. I know that sounds weird, but it helps me learn :grinning:!

I don’t know of any built-in method for this, but you can extract the units into a temp Series and then check for unique values.

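temp = laptops['weight'].astype(str)
# Keep everything from the first 'k' onward (the unit); empty string if there is no 'k'
temp = temp.apply(lambda x: x[x.find('k'):] if 'k' in x else '')
temp.value_counts()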
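
Another option, if you don't want to assume the unit starts with 'k', is to pull out the trailing letters with a regular expression. This is only a sketch on made-up values. The pattern assumes each unit is a run of letters at the end of the string.

```python
import pandas as pd

# Hypothetical sample values standing in for laptops['weight']
weights = pd.Series(['2.2kg', '4kgs', '1.37kg', '2.2kg'])

# Capture the run of letters at the end of each string (the unit)
units = weights.str.extract(r'([A-Za-z]+)\s*$', expand=False)

# Count how often each distinct unit appears
unit_counts = units.value_counts()
print(unit_counts)
```

Any row that doesn't end in letters comes back as NaN, and value_counts() skips those by default.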

You can use Jupyter Notebook on your local machine.

Hi @dash.debasmita,
Thank you for your help.