Data Cleaning Challenge: key error "weight"?

Screen Link: Learn data science with Python and R projects

My Code:

import pandas as pd
laptops = pd.read_csv('laptops.csv', encoding='Latin-1')

def clean_w(weight):
    laptops["weight"] = laptops["weight"].str.replace("kg","").astype(float)
    laptops["weight"] = laptops["weight"].str.replace("kgs","").astype(float)
    return weight

new_weight = []
for row in laptops["weight"].unique():
    clean_weight = clean_w(row)
    new_weight.append(clean_weight)
    
weight_update = new_weight
laptops.rename({"weight":"weight_kg"},axis = 1, inplace = True)
    
df.to_csv('laptops.csv', index = False)

What I expected to happen:
Nice work

What actually happened:

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
/dataquest/system/env/python3/lib/python3.8/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
   2645             try:
-> 2646                 return self._engine.get_loc(key)
   2647             except KeyError:

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 'weight'

During handling of the above exception, another exception occurred:

KeyError                                  Traceback (most recent call last)
<ipython-input-1-934527a212af> in <module>
     49 
     50 new_weight = []
---> 51 for row in laptops["weight"].unique():
     52     clean_weight = clean_w(row)
     53     new_weight.append(clean_weight)

/dataquest/system/env/python3/lib/python3.8/site-packages/pandas/core/frame.py in __getitem__(self, key)
   2798             if self.columns.nlevels > 1:
   2799                 return self._getitem_multilevel(key)
-> 2800             indexer = self.columns.get_loc(key)
   2801             if is_integer(indexer):
   2802                 indexer = [indexer]

/dataquest/system/env/python3/lib/python3.8/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
   2646                 return self._engine.get_loc(key)
   2647             except KeyError:
-> 2648                 return self._engine.get_loc(self._maybe_cast_indexer(key))
   2649         indexer = self.get_indexer([key], method=method, tolerance=tolerance)
   2650         if indexer.ndim > 1 or indexer.size > 1:

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 'weight'
  1. Is it necessary to create a def?
  2. Can I split the replace into two statements, one to replace kg with an empty string and another to replace kgs with an empty string?
  3. Why doesn’t it need to loop over the data using the “for” syntax?
  4. What does the key error "weight" actually mean?

Hi, ipngasi.
These two lines are enough (kgs is removed first so that no stray s is left behind, and the conversion to float comes last):
laptops["weight"] = laptops["weight"].str.replace("kgs","")
laptops["weight"] = laptops["weight"].str.replace("kg","").astype(float)
Once you have studied regular expressions, it becomes one line.
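
As a rough illustration of the one-line regular-expression version mentioned above (a sketch only: the sample values are made up, and regex=True is passed explicitly because the default for that parameter has changed across pandas versions):

```python
import pandas as pd

# Hypothetical sample values standing in for the laptops "weight" column.
weights = pd.Series(["1.37kg", "2.2kg", "4kgs"])

# "kgs?" matches "kg" followed by an optional "s",
# so a single call strips both suffixes before the float conversion.
cleaned = weights.str.replace(r"kgs?", "", regex=True).astype(float)

print(cleaned.tolist())
```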

Remove all of the following; it is a redundant complication:

new_weight = []
for row in laptops["weight"].unique(): - this does not iterate over the rows of the weight column; unique() returns a shorter array holding only the distinct values, and this is the line that raises the error
clean_weight = clean_w(row)
new_weight.append(clean_weight)
weight_update = new_weight

Hi @ipngasi, your code is actually pretty close. However, I don’t think a for loop is the expected way to solve this problem: pandas exists largely so that we can stop writing for loops and use vectorized methods instead.

When we use:

laptops["weight"] = laptops["weight"].str.replace("kg","").astype(float)

the str part in laptops['weight'].str.replace(...) does exactly what a for loop would do: it accesses each string in the weight column and applies the replace method to that particular string (see the Series page of the pandas 1.2.4 documentation).
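
To make the comparison concrete, here is a small sketch on made-up values (not the real dataset) showing that an explicit for loop and the .str accessor produce the same strings. regex=False is passed so that "kg" is treated as a literal substring regardless of the pandas version:

```python
import pandas as pd

# Hypothetical sample values standing in for the laptops "weight" column.
weights = pd.Series(["1.37kg", "2.2kg", "1.8kg"])

# Explicit loop: call Python's str.replace on each value in turn.
looped = []
for value in weights:
    looped.append(value.replace("kg", ""))

# Vectorized: .str.replace applies the same replacement to every element.
vectorized = weights.str.replace("kg", "", regex=False)

print(looped == vectorized.tolist())
```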

We could also go with:

laptops["weight"] = laptops["weight"].str.replace("kgs","").str.replace("kg","").astype(float)

I think the error you’re getting lies in the last line, as the original csv already has the name “laptops.csv”:

df.to_csv('laptops.csv', index = False)

Maybe try saving the CSV under a different name, like:

df.to_csv('laptops_cleaned.csv', index = False)

  1. Is it necessary to create a def?
    No need.
  2. Can I split the replace into two statements, one to replace kg with an empty string and another to replace kgs with an empty string?
    Yes.
  3. Why doesn’t it need to loop over the data using the “for” syntax?
    Because of the “str” accessor, as mentioned above and in topic 8 of the class.
  4. What does the key error "weight" actually mean?
    Try saving the CSV with a different name.

Also, try refreshing the page, deleting cookies, or using another browser. Then go to the first screen of the lesson and run your code on each screen again, and on screen 12/14 remember to give a different name to the file you’re saving. That should solve the error!
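
On question 4 in general terms: pandas raises KeyError: 'weight' whenever the DataFrame has no column with exactly that label at the moment of the lookup. The sketch below uses a made-up frame (laptops_demo is a hypothetical name) and is not a diagnosis of the exact cause in the original code. It only shows one way the label can go missing, namely after the rename used in the question, and how to check the column labels before indexing:

```python
import pandas as pd

# Hypothetical stand-in for the laptops DataFrame.
laptops_demo = pd.DataFrame({"weight": [1.37, 4.0]})

# Same rename call as in the question: the "weight" label no longer exists afterwards.
laptops_demo.rename({"weight": "weight_kg"}, axis=1, inplace=True)

# Checking the labels first avoids the exception.
has_weight = "weight" in laptops_demo.columns
print(list(laptops_demo.columns), has_weight)

# Looking up the old label raises KeyError.
try:
    laptops_demo["weight"]
    raised = False
except KeyError as err:
    raised = True
    print("KeyError:", err)
```

Printing laptops.columns right before the failing line is a quick way to see which labels actually exist at that point.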

Thank you for your answer; I understand it now. And thank you for reminding me why we use pandas: it can do the job of the for loop.


I don’t understand why it keeps showing an error. Do I have to use one line instead?

@squallmengxin

When ...str.replace('kg', '') runs, the kg in each value is replaced with the empty string '', so a value that ends in kgs is left with a trailing s.

The code that then tries to change the data type to float fails because of that leftover s.

Try adding ...str.replace('s', '') before the conversion.
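
A minimal sketch of the effect described above, using made-up values rather than the real column (regex=False is passed so the patterns are treated as literal substrings in any pandas version):

```python
import pandas as pd

# Hypothetical sample: one value ends in "kg", the other in "kgs".
weights = pd.Series(["1.37kg", "4kgs"])

# Removing "kg" first leaves the "s" from "kgs" behind.
after_kg = weights.str.replace("kg", "", regex=False)
print(after_kg.tolist())

# Converting now would fail, because "4s" is not a valid float.
try:
    after_kg.astype(float)
    conversion_failed = False
except ValueError:
    conversion_failed = True

# Removing the leftover "s" makes the conversion work.
fixed = after_kg.str.replace("s", "", regex=False).astype(float)

# Alternative: remove the longer suffix "kgs" before "kg".
alt = (
    weights.str.replace("kgs", "", regex=False)
    .str.replace("kg", "", regex=False)
    .astype(float)
)
print(fixed.tolist(), alt.tolist())
```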

@monorienaghogho Thank you so much. I understand now.
