Splitting funded_month column

Screen Link:
https://app.dataquest.io/c/33/m/167/guided-project%3A-analyzing-startup-fundraising-deals-from-crunchbase/2/selecting-data-types

My Code:

for chunk in chunk_iter:
    split_month = chunk['funded_month'].str.split("-")[1]
    chunk['funded_month'] = split_month
    print(split_month)

What I expected to happen:
I want to extract only the second part of the split (the month), but I cannot get that second element for every row in the chunk.

What actually happened:
Indexing with [1] only gives me a single item from the split result. I cannot get the second element (index 1) extracted from every row in every chunk.

Error:

ValueError                                Traceback (most recent call last)
<ipython-input-68-6632dab0f6fc> in <module>
     22 for chunk in chunk_iter:
     23     split_month = chunk['funded_month'].str.split("-")[1]
---> 24     chunk['funded_month'] = split_month
     25     print(split_month)
     26     #chunk['funded_month'] = pd.to_numeric(clean_term)

/dataquest/system/env/python3/lib/python3.8/site-packages/pandas/core/frame.py in __setitem__(self, key, value)
   2936         else:
   2937             # set column
-> 2938             self._set_item(key, value)
   2939 
   2940     def _setitem_slice(self, key, value):

/dataquest/system/env/python3/lib/python3.8/site-packages/pandas/core/frame.py in _set_item(self, key, value)
   2998 
   2999         self._ensure_valid_index(value)
-> 3000         value = self._sanitize_column(key, value)
   3001         NDFrame._set_item(self, key, value)
   3002 

/dataquest/system/env/python3/lib/python3.8/site-packages/pandas/core/frame.py in _sanitize_column(self, key, value, broadcast)
   3634 
   3635             # turn me into an ndarray
-> 3636             value = sanitize_index(value, self.index, copy=False)
   3637             if not isinstance(value, (np.ndarray, Index)):
   3638                 if isinstance(value, list) and len(value) > 0:

/dataquest/system/env/python3/lib/python3.8/site-packages/pandas/core/internals/construction.py in sanitize_index(data, index, copy)
    609 
    610     if len(data) != len(index):
--> 611         raise ValueError("Length of values does not match length of index")
    612 
    613     if isinstance(data, ABCIndexClass) and not copy:

ValueError: Length of values does not match length of index

Ahh I figured it out!

for chunk in chunk_iter:
    split_month = chunk['funded_month'].str.split("-").str[1]
    chunk['funded_month'] = pd.to_numeric(split_month)

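I just needed to add '.str[1]' to capture the month from each split string. Without it, the plain [1] on the split result looks up the row labeled 1 (a single list) instead of taking the second element of every row, which is why the length of the values did not match the length of the index.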
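For anyone who wants to try this outside the project, here is a minimal sketch of the same fix. The sample rows and the tiny chunk size are made up for illustration (the real project reads the Crunchbase CSV), and `csv_text` and `months` are names I introduced, not part of the course code.

```python
import io
import pandas as pd

# Hypothetical sample data standing in for the Crunchbase CSV.
csv_text = "company,funded_month\nA,2012-06\nB,2013-11\nC,2008-01\nD,2010-09\n"

months = []
# chunksize=2 is only to force more than one chunk in this small example.
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=2):
    # Each row becomes a list such as ['2012', '06'].
    parts = chunk["funded_month"].str.split("-")
    # parts[1] would select the row labeled 1, not the month of each row.
    # parts.str[1] indexes into every row's list instead.
    chunk["funded_month"] = pd.to_numeric(parts.str[1])
    months.extend(chunk["funded_month"].tolist())

print(months)
```

As far as I can tell, `str.split("-", expand=True)[1]` should give the same column, since `expand=True` returns a DataFrame and `[1]` then selects its second column.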
