Unable to submit string challenge

Screen Link:
https://app.dataquest.io/m/346/working-with-strings-in-pandas/11/challenge-clean-a-string-column-aggregate-the-data-and-plot-the-results

My Code:

import numpy as np
merged['IncomeGroup']=merged['IncomeGroup'].str.replace("income", "").str.replace(":","").str.upper()
pv_incomes=merged.pivot_table(index='IncomeGroup',values='Happiness Score',aggfunc=np.mean)
pv_incomes.plot(kind='bar',rot=30,ylim=(0,10))

What I expected to happen:
Successful submission

What actually happened: 

Your 1st plot doesn’t match what we expected.


It reports an error in pv_incomes, which I guess is the reason for the wrong plot.

If it’s showing you an error, then that’s also something you need to share so that others can help you out accordingly.

I would also suggest that, before you share the error here, you spend some time looking at

  • what the error says
    • Google the error if you have to, to understand what it’s about
  • where it points to in your code
    • that can help you debug the problem on your own
  • whether you have followed the instructions accurately

I checked; my output matches the platform’s completely. The error only says that the plot doesn’t match.

Aah, ok. Understood.

The issue is that the outcome of

merged['IncomeGroup']=merged['IncomeGroup'].str.replace("income", "").str.replace(":","").str.upper()

ends up with an additional space. So you get HIGH  OECD and HIGH  NONOECD, both of which have a double space between their words. You need to look into removing that extra space.

To remove that double space I used .replace("double_space", " "), and it was fine then. Without replacing the double space with a single space, I kept getting the same error.

Using .replace(":","") replaces the colon with one space, which causes the double-space issue.

No, that’s not the reason.

For the value High income: OECD, there is a space between High and income, and there’s a space between : and OECD. Your code above was not adding any spaces. It was just removing income and : (not replacing with any space). But you weren’t accounting for the leftover spaces.
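To make the leftover spaces visible, here is a minimal sketch that runs the same chain on the five IncomeGroup labels quoted in this thread. The standalone Series named raw is an assumption standing in for merged['IncomeGroup'], and regex=False is passed only so the behavior is the same across pandas versions.

```python
import pandas as pd

# Stand-in for merged['IncomeGroup'] (assumption: these five labels only).
raw = pd.Series([
    "Upper middle income",
    "Lower middle income",
    "High income: OECD",
    "Low income",
    "High income: nonOECD",
])

# Same chain as in the original post: substrings are deleted, nothing is inserted.
cleaned = (
    raw.str.replace("income", "", regex=False)
       .str.replace(":", "", regex=False)
       .str.upper()
)

# repr() shows the spaces that were on either side of the removed text.
for value in cleaned:
    print(repr(value))
```

The printed values show a double space inside the two HIGH labels and a trailing space on the other three.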

Your approach of .replace("double_space", " ") was one way to solve this. Another would have been to add a space before income in .replace("income", "").

So, it would have been .replace(" income", "").
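Both fixes can be compared side by side. This is only a sketch on a small stand-in Series (raw is an assumption, not the course data); the second variant uses a regular expression to collapse whitespace, which is one possible way to do the double-space replacement.

```python
import pandas as pd

# Stand-in for merged['IncomeGroup'] (assumption).
raw = pd.Series(["High income: OECD", "Low income", "High income: nonOECD"])

# Fix 1: remove the preceding space together with "income".
fix_space = (
    raw.str.replace(" income", "", regex=False)
       .str.replace(":", "", regex=False)
       .str.upper()
)

# Fix 2: keep the original chain, then collapse runs of whitespace
# into one space and strip whatever is left at the ends.
fix_collapse = (
    raw.str.replace("income", "", regex=False)
       .str.replace(":", "", regex=False)
       .str.replace(r"\s+", " ", regex=True)
       .str.strip()
       .str.upper()
)

print(fix_space.tolist())
print(fix_collapse.tolist())
```

On these labels both variants give the same result, with single spaces between words and no trailing whitespace.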

Got it. Thanks a lot.

This perfectly explains the caution given in the screen Working with Strings In Pandas:

Make sure to remove the whitespace at the end of the strings

I initially thought it meant to use Series.str.strip, but I get it now.
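As a small illustration of why Series.str.strip alone would not be enough here: it only trims the ends of each string, so an interior double space survives. The two sample values are assumptions chosen to mirror the labels discussed above.

```python
import pandas as pd

# One value with a trailing space, one with an interior double space.
values = pd.Series(["LOW ", "HIGH  OECD"])

# str.strip removes leading and trailing whitespace only.
stripped = values.str.strip()

for value in stripped:
    print(repr(value))
```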

I agree. We are giving the “vectorized string methods” a spin while solving the exercise, but in practice the solution won’t be as simple as just removing “income” while normalizing the series.

A better real-world solution would be to create a map through a dict {}…

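import matplotlib.pyplot as plt

# Map each raw label to its normalized form. The name `dict` is avoided
# because it would shadow the built-in type.
income_map = {'Upper middle income': 'UPPER MIDDLE',
              'Lower middle income': 'LOWER MIDDLE',
              'High income: OECD': 'HIGH OECD',
              'Low income': 'LOW',
              'High income: nonOECD': 'HIGH NONOECD'
              }
# Assign the result back instead of using inplace=True on a column selection.
merged['IncomeGroup'] = merged['IncomeGroup'].replace(income_map)
merged['IncomeGroup'] = merged['IncomeGroup'].str.strip()

pv_incomes = merged.pivot_table(index='IncomeGroup', values='Happiness Score')

pv_incomes.plot(kind='bar', rot=30, ylim=(0, 10))
plt.show()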

It would be even better if we could import the map from a file or table into a dataframe and convert it into a dict to normalize a particular series, which is what usually happens in practice.

import pandas as pd

# read the CSV file; it can have the columns (local_value, Mapped_value)
data = pd.read_csv("MAP.csv")

# drop rows with null values to avoid errors
data = data.dropna()

# convert to a dict keyed by local_value, with Mapped_value as the values
data_dict = data.set_index('local_value')['Mapped_value'].to_dict()

# display
data_dict
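A minimal end-to-end sketch of that idea follows. The CSV content is an in-memory stand-in for MAP.csv (its rows are assumptions built from the labels in this thread), and falling back to the original value for unmapped labels is just one possible policy.

```python
import io
import pandas as pd

# In-memory stand-in for MAP.csv; column names follow the comment above.
csv_text = """local_value,Mapped_value
Upper middle income,UPPER MIDDLE
Lower middle income,LOWER MIDDLE
High income: OECD,HIGH OECD
Low income,LOW
High income: nonOECD,HIGH NONOECD
"""

# Build a flat {local_value: Mapped_value} dict from the two columns.
mapping = (
    pd.read_csv(io.StringIO(csv_text))
      .dropna()
      .set_index("local_value")["Mapped_value"]
      .to_dict()
)

# Hypothetical series to normalize, including one label missing from the map.
groups = pd.Series(["Low income", "High income: OECD", "Unknown group"])

# Series.map gives NaN for keys not in the dict, so fall back to the original.
normalized = groups.map(mapping).fillna(groups)
print(normalized.tolist())
```

Selecting the Mapped_value column before calling to_dict matters: calling to_dict on the whole frame would return a nested dict keyed by column name rather than a flat label-to-label map.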