Data Cleaning Basics: 12. Challenge: Clean a String Column help

Screen Link:
https://app.dataquest.io/c/54/m/293/data-cleaning-basics/12/challenge-clean-a-string-column

My Code:

laptops["weight"] = laptops["weight"].str.replace("kgs", "")
laptops["weight"] = laptops["weight"].str.replace("kg", "").astype(float)
laptops.rename({"weight": "weight_kg"}, axis=1, inplace=True)
laptops.to_csv('laptops.csv',index=False)

What I expected to happen:
I expected this to strip the substrings "kg" and "kgs" from the entries in the "weight" column and convert the values to floats, then rename the column from "weight" to "weight_kg" and save the dataframe to a new CSV file.

What actually happened:
When I first pressed the Run Code button, the code ran without errors. After I pressed the Submit button, though, I got an error similar to the one below, except that it was KeyError: 'weight' for the "weight" column. It has since somehow changed to KeyError: 'screen_size' for the "screen_size" column. I have no idea how or why, because I never had to change the "screen_size" column myself; I was only shown how to clean it in the examples. My best guess at this point is that I overwrote the original "laptops.csv": I meant to save to a new file but forgot to change the CSV file name.

I have since tried running the solution code, but I get another error message, KeyError: 'weight'.

Error:

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
/dataquest/system/env/python3/lib/python3.8/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
   2645             try:
-> 2646                 return self._engine.get_loc(key)
   2647             except KeyError:

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 'screen_size'

During handling of the above exception, another exception occurred:

KeyError                                  Traceback (most recent call last)
<ipython-input-1-298eaae3ab86> in <module>
     11 laptops.columns = [clean_col(c) for c in laptops.columns]
     12 
---> 13 laptops["screen_size"] = laptops["screen_size"].str.replace('"','').astype(float)
     14 laptops.rename({"screen_size": "screen_size_inches"}, axis=1, inplace=True)
     15 laptops["ram"] = laptops["ram"].str.replace('GB','').astype(float)

/dataquest/system/env/python3/lib/python3.8/site-packages/pandas/core/frame.py in __getitem__(self, key)
   2798             if self.columns.nlevels > 1:
   2799                 return self._getitem_multilevel(key)
-> 2800             indexer = self.columns.get_loc(key)
   2801             if is_integer(indexer):
   2802                 indexer = [indexer]

/dataquest/system/env/python3/lib/python3.8/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
   2646                 return self._engine.get_loc(key)
   2647             except KeyError:
-> 2648                 return self._engine.get_loc(self._maybe_cast_indexer(key))
   2649         indexer = self.get_indexer([key], method=method, tolerance=tolerance)
   2650         if indexer.ndim > 1 or indexer.size > 1:

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 'screen_size'

Key error when I run the solution code:

KeyError                                  Traceback (most recent call last)
/dataquest/system/env/python3/lib/python3.8/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
   2645             try:
-> 2646                 return self._engine.get_loc(key)
   2647             except KeyError:

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 'weight'

During handling of the above exception, another exception occurred:

KeyError                                  Traceback (most recent call last)
in
      4 #laptops.to_csv('laptops_cleaned.csv',index=False)
      5 
----> 6 laptops["weight"] = laptops["weight"].str.replace("kgs","").str.replace("kg","").astype(float)
      7 laptops.rename({"weight": "weight_kg"}, axis=1, inplace=True)
      8 laptops.to_csv('laptops_cleaned.csv',index=False)

/dataquest/system/env/python3/lib/python3.8/site-packages/pandas/core/frame.py in __getitem__(self, key)
   2798             if self.columns.nlevels > 1:
   2799                 return self._getitem_multilevel(key)
-> 2800             indexer = self.columns.get_loc(key)
   2801             if is_integer(indexer):
   2802                 indexer = [indexer]

/dataquest/system/env/python3/lib/python3.8/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
   2646                 return self._engine.get_loc(key)
   2647             except KeyError:
-> 2648                 return self._engine.get_loc(self._maybe_cast_indexer(key))
   2649         indexer = self.get_indexer([key], method=method, tolerance=tolerance)
   2650         if indexer.ndim > 1 or indexer.size > 1:

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 'weight'

Hi @ilondire05

I believe the dataset files are shared across the screens in a lesson, so any change saved on one screen is carried over to the later screens. It can probably affect the earlier ones too.
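Here is a minimal sketch of how that could produce a KeyError. It is not the platform's actual setup: the two-row dataset and the temporary directory are made up for illustration, and only the column names and cleaning steps come from the code in the question.

```python
import os
import tempfile

import pandas as pd

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical stand-in for the lesson's laptops.csv (made-up rows).
    path = os.path.join(tmp, "laptops.csv")
    pd.DataFrame({"weight": ["1.37kg", "2.2kgs"]}).to_csv(path, index=False)

    # Clean and rename, as in the question.
    laptops = pd.read_csv(path)
    laptops["weight"] = (
        laptops["weight"]
        .str.replace("kgs", "", regex=False)
        .str.replace("kg", "", regex=False)
        .astype(float)
    )
    laptops = laptops.rename(columns={"weight": "weight_kg"})

    # Saving back to the same file name overwrites the source data.
    laptops.to_csv(path, index=False)

    # Any code that reloads the file and expects "weight" now fails.
    reloaded = pd.read_csv(path)
    columns_after = list(reloaded.columns)
    try:
        reloaded["weight"]
        error_raised = False
    except KeyError:
        error_raised = True

print(columns_after, error_raised)
```

Once the file on disk holds the renamed column, every later run that starts from that file no longer finds the original name.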

When I checked the lessons, the 3rd instruction says

  3. Use the DataFrame.to_csv() method to save the laptops dataframe to a CSV file laptops_cleaned.csv without index labels.

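I can see that you saved your cleaned dataframe to laptops.csv instead of laptops_cleaned.csv. That would have overwritten the original dataset, which would explain why columns such as "weight" can no longer be found.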

Could you please try changing the name of your output csv file? I hope this sorts out your issue.
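For reference, this is a small sketch of saving to a separate file so that the source data stays untouched. The sample rows, the temporary directory and the clean_weight helper are hypothetical; the column names and the laptops_cleaned.csv file name come from the lesson.

```python
import os
import tempfile

import pandas as pd


def clean_weight(df):
    """Return a copy with 'weight' converted to float and renamed to 'weight_kg'.

    Hypothetical helper, not part of the lesson code.
    """
    cleaned = df.copy()
    cleaned["weight"] = (
        cleaned["weight"]
        .str.replace("kgs", "", regex=False)  # strip "kgs" before "kg"
        .str.replace("kg", "", regex=False)
        .astype(float)
    )
    return cleaned.rename(columns={"weight": "weight_kg"})


with tempfile.TemporaryDirectory() as tmp:
    source = os.path.join(tmp, "laptops.csv")
    target = os.path.join(tmp, "laptops_cleaned.csv")

    # Made-up rows standing in for the real dataset.
    pd.DataFrame({"weight": ["1.37kg", "4kgs"]}).to_csv(source, index=False)

    laptops = clean_weight(pd.read_csv(source))
    laptops.to_csv(target, index=False)  # new file name, source untouched

    source_columns = list(pd.read_csv(source).columns)
    target_columns = list(pd.read_csv(target).columns)
    weights = laptops["weight_kg"].tolist()

print(source_columns, target_columns, weights)
```

Because the output goes to a different file, rerunning the code from the start still finds the original "weight" column.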