ASSISTANCE REQUIRED (Using apply to transform strings)

Screen Link: < Learn data science with Python and R projects>

In the mission above, we are asked to convert the element to a string inside the function we create:

def extract_last_word(element):
    return str(element).split()[-1]
    
merged['Currency Apply']=merged['CurrencyUnit'].apply(extract_last_word)

My QUESTION/CONFUSION

Why is there a need to convert the element to a string before splitting, especially since the column in question, merged['CurrencyUnit'], is already an object/string type?

17 CurrencyUnit 145 non-null object

When I decide not to convert it, as below, I get an error message:

def extract_last_word(element):
    return element.split()[-1]
    
merged['Currency Apply']=merged['CurrencyUnit'].apply(extract_last_word)

ERROR MESSAGE

AttributeError                            Traceback (most recent call last)
<ipython-input-1-943dab601ea3> in <module>
      9     return element.split()[-1]
     10 
---> 11 merged['Currency Apply']=merged['CurrencyUnit'].apply(extract_last_word)
     12 
     13 merged['Currency Apply'].head()

/dataquest/system/env/python3/lib/python3.8/site-packages/pandas/core/series.py in apply(self, func, convert_dtype, args, **kwds)
   3846             else:
   3847                 values = self.astype(object).values
-> 3848                 mapped = lib.map_infer(values, f, convert=convert_dtype)
   3849 
   3850         if len(mapped) and isinstance(mapped[0], Series):

pandas/_libs/lib.pyx in pandas._libs.lib.map_infer()

<ipython-input-1-943dab601ea3> in extract_last_word(element)
      7 ##INITIAL CODE ENDS
      8 def extract_last_word(element):
----> 9     return element.split()[-1]
     10 
     11 merged['Currency Apply']=merged['CurrencyUnit'].apply(extract_last_word)

AttributeError: 'float' object has no attribute 'split' 

But the column merged['CurrencyUnit'] is not a float.


This is a very good question and I can see why it is causing you confusion. It is true that merged['CurrencyUnit'] has the object dtype, which usually means strings, but be careful with that assumption: an object column can hold any Python object, so the series isn’t always “pure strings.”

As luck would have it, the error message is telling us the truth…there’s a float in there somewhere! But where?! And how can we find it?

Here are a few things I tried before figuring out why you are getting this error:

merged['CurrencyUnit'].value_counts(dropna=False)
merged['CurrencyUnit'][merged['CurrencyUnit'].isnull()]
type(merged['CurrencyUnit'][37])

Try running each of these lines separately and work out what the output is telling us about the data. If this doesn’t answer your question, let me know.
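If it helps to see the idea away from the mission data, here is a minimal sketch on a made-up Series. The values and the name `currency` are assumptions for illustration only, not the real contents of merged['CurrencyUnit'].

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for merged['CurrencyUnit']: two strings and one missing value.
# dtype=object is set explicitly so the example matches the column in the question.
currency = pd.Series(["Afghan afghani", np.nan, "Euro"], dtype=object, name="CurrencyUnit")

# The dtype describes the container, not each element inside it.
print(currency.dtype)

# With dropna=False, missing values get their own row in the counts.
print(currency.value_counts(dropna=False))

# Boolean indexing with isnull() shows only the rows that are missing.
print(currency[currency.isnull()])

# Mapping type over the elements shows which Python types the column really holds.
print(currency.map(type).value_counts())
```

The last line is an extra check that is not in the list above: it counts the element types directly, so a stray float stands out next to the str entries.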

It did solve my issue, especially identifying the float element within the CurrencyUnit column that causes the error, hence the need to convert the element to a string.

Does it mean that when the data was read using pd.read_csv, pandas couldn’t properly read some of the elements in 'CurrencyUnit' and so assigned NaN to them?

But isn’t it strange that NaN has a type of float?

Below is an extract I recall from a previous pandas mission:
There is also a type we haven’t seen before, object, which is used for columns that have data that doesn’t fit into any other dtypes. This is almost always used for columns containing string values.

When we import data, pandas will attempt to guess the correct dtype for each column.

It’s not that pandas couldn’t read the data properly or that a mistake was made while reading it, but rather that the data doesn’t exist for that observation/row in the first place! In other words, it’s missing data to begin with. This is very common with datasets. You will run into this CONSTANTLY when working with data. You will see Null, None, na, and NaN often and it just simply means we don’t have a value for that data for one reason or another. Datasets are rarely “perfect!”
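As a rough sketch of how this happens at read time, the snippet below reads a tiny made-up CSV in which one row simply has nothing in the CurrencyUnit field. The CSV text, the "Nowhere" row, and the name `df` are all invented for illustration.

```python
import io

import pandas as pd

# Hypothetical CSV: the second data row has an empty CurrencyUnit field.
csv_text = (
    "Country,CurrencyUnit\n"
    "Afghanistan,Afghan afghani\n"
    "Nowhere,\n"
    "France,Euro\n"
)

df = pd.read_csv(io.StringIO(csv_text))

# The empty field was read without any error; pandas marks it as missing (NaN).
print(df)
print(df["CurrencyUnit"].isnull().sum())
```

Nothing went wrong during the read: the value was never there, and NaN is how pandas records that.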

Yes! It is strange, isn’t it? Try reading a bit of this Wiki to find out that it gets even more strange:

In computing, NaN (/næn/), standing for Not a Number

So, it’s “not a number” but Python classifies it as a float! Very strange indeed! Let’s just say that in the world of NaN, intuition can get you into trouble. It just takes time and practice before you get comfortable working with (and around) them.
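You can check this in plain Python, without pandas. A small sketch (the variable name `nan` is just for illustration):

```python
import math

nan = float("nan")

print(type(nan))        # <class 'float'>
print(nan == nan)       # False: NaN is not equal to anything, including itself
print(math.isnan(nan))  # True: the reliable way to test for NaN
```

Because NaN never compares equal to itself, use math.isnan (or pd.isnull on pandas data) rather than == when looking for it.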

While this is true, even in this case because our column does contain string values, it doesn’t fully explain why you were getting this particular error…now we know it’s because of those missing values (NaN), which Python (counter-intuitively) considers to be floats.
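One thing worth knowing about the str(element) fix: it avoids the error, but it turns each missing value into the text 'nan'. The sketch below compares it with two alternatives that keep the value missing. The data and the helper extract_last_word_keep_nan are made up for illustration and are not part of the mission's solution.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for merged['CurrencyUnit'].
currency = pd.Series(["Afghan afghani", np.nan, "Euro"], dtype=object)


def extract_last_word(element):
    # The mission's approach: str(NaN) is the text 'nan', so split() works on it.
    return str(element).split()[-1]


def extract_last_word_keep_nan(element):
    # Hypothetical variant: pass missing values through untouched.
    if pd.isnull(element):
        return element
    return element.split()[-1]


via_str = currency.apply(extract_last_word)
via_guard = currency.apply(extract_last_word_keep_nan)

# The vectorized string accessor also skips missing values and leaves them as NaN.
via_accessor = currency.str.split().str[-1]

print(via_str.tolist())
print(via_guard.tolist())
print(via_accessor.tolist())
```

Which behavior you want depends on the next step: if you later count or filter missing values with isnull(), the text 'nan' will no longer be detected as missing.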
