Clean And Analyze Employee Exit Surveys - Combine the Data

Screen Link:

Error:

KeyError                                  Traceback (most recent call last)
/dataquest/system/env/python3/lib/python3.4/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
   2524             try:
-> 2525                 return self._engine.get_loc(key)
   2526             except KeyError:

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 'institute_service'

During handling of the above exception, another exception occurred:

KeyError                                  Traceback (most recent call last)
<ipython-input-105-15e24cf5f58b> in <module>()
----> 1 combined_updated['institute_service'].value_counts()

/dataquest/system/env/python3/lib/python3.4/site-packages/pandas/core/frame.py in __getitem__(self, key)
   2137             return self._getitem_multilevel(key)
   2138         else:
-> 2139             return self._getitem_column(key)
   2140 
   2141     def _getitem_column(self, key):

/dataquest/system/env/python3/lib/python3.4/site-packages/pandas/core/frame.py in _getitem_column(self, key)
   2144         # get column
   2145         if self.columns.is_unique:
-> 2146             return self._get_item_cache(key)
   2147 
   2148         # duplicate columns & possible reduce dimensionality

/dataquest/system/env/python3/lib/python3.4/site-packages/pandas/core/generic.py in _get_item_cache(self, item)
   1840         res = cache.get(item)
   1841         if res is None:
-> 1842             values = self._data.get(item)
   1843             res = self._box_item_values(item, values)
   1844             cache[item] = res

/dataquest/system/env/python3/lib/python3.4/site-packages/pandas/core/internals.py in get(self, item, fastpath)
   3841 
   3842             if not isna(item):
-> 3843                 loc = self.items.get_loc(item)
   3844             else:
   3845                 indexer = np.arange(len(self.items))[isna(self.items)]

/dataquest/system/env/python3/lib/python3.4/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
   2525                 return self._engine.get_loc(key)
   2526             except KeyError:
-> 2527                 return self._engine.get_loc(self._maybe_cast_indexer(key))
   2528 
   2529         indexer = self.get_indexer([key], method=method, tolerance=tolerance)

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 'institute_service'

I know I'm getting this error because the institute_service column has been dropped: combined['institute_service'].value_counts().sum() returned only 273 non-null values. When I go back to the top of my Jupyter notebook and insert tafe_survey_updated['institute_service'].value_counts().sum(), it returns 596. I can't figure out why the values don't add up, or why the column gets dropped later on, after I concatenate the two dataframes.

employee-exit-survey.ipynb (198.1 KB)

Hi @jaydenlsf47,

I opened your Jupyter notebook on my computer, ran it as it was, and got no error.

When you concatenated both dataframes in the code cell [85], no values of institute_service were actually dropped. Instead, the institute_service column of one dataframe was stacked upon the same column of the second dataframe, as expected.

To prove it, you can insert this piece of code:

print(dete_resignations_up['institute_service'].value_counts().sum())
print(tafe_resignations_up['institute_service'].value_counts().sum())

right before your code cell [85], to count the values in this column in each dataframe before concatenation. Then, right after the code cell [85], add this code to check the resulting count:

print(combined['institute_service'].value_counts().sum())

Finally, run the whole notebook from the beginning and compare the three outputs. You will see that the count for combined equals the sum of the counts for the two dataframes.
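If it helps to see the same check in isolation, here is a minimal sketch on toy data. The two small dataframes and their values are made up for illustration and only stand in for dete_resignations_up and tafe_resignations_up; the real columns in your notebook may hold different types.

```python
import numpy as np
import pandas as pd

# Hypothetical toy stand-ins for dete_resignations_up and tafe_resignations_up.
dete = pd.DataFrame({"institute_service": [7.0, np.nan, 3.0], "institute": "DETE"})
tafe = pd.DataFrame({"institute_service": [1.0, 2.0, np.nan, 5.0], "institute": "TAFE"})

# Stack the rows of both dataframes; columns with the same name line up.
combined = pd.concat([dete, tafe], ignore_index=True)

# value_counts() ignores NaN by default, so its sum is the non-null count.
dete_count = dete["institute_service"].value_counts().sum()
tafe_count = tafe["institute_service"].value_counts().sum()
combined_count = combined["institute_service"].value_counts().sum()

print(dete_count, tafe_count, combined_count)

# The column should still exist after concatenation.
print("institute_service" in combined.columns)
```

If the combined count in your notebook is lower than the sum of the two, a likely cause is that some cells were run out of order, which is why rerunning everything from the top is worth trying first.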