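import pandas as pd
participants = pd.read_csv('participants.csv')
participants['name'] = participants['name'].str.title()
# size_replacement_table is assumed to be defined earlier (provided by the exercise).
def change_size(val):
    for key in size_replacement_table:
        if val in size_replacement_table[key]:
            return key  # the standard size whose list contains val
    # no match: falls through and implicitly returns None
participants['t-shirt'] = participants['t-shirt'].apply(change_size)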
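The loop in change_size is a reverse lookup: given a raw value, find the standard size whose list contains it. A minimal sketch of an equivalent approach inverts the table once into a plain dict, so each lookup is a single dictionary access. The table contents below are hypothetical (the exercise's real size_replacement_table isn't shown in this thread), and unlike the loop above, this version returns the original value instead of None when there is no match:

```python
# Hypothetical table: each standard size maps to the raw spellings it replaces.
size_replacement_table = {
    "S": ["small", "s", "sm"],
    "M": ["medium", "m", "med"],
    "L": ["large", "l", "lg"],
}

# Invert the table once: raw spelling -> standard size.
raw_to_size = {
    raw: size
    for size, raws in size_replacement_table.items()
    for raw in raws
}


def change_size(val):
    # Fall back to the original value when it is not in the table.
    return raw_to_size.get(val, val)


print([change_size(v) for v in ["small", "lg", "med", "xl"]])
# ['S', 'L', 'M', 'xl']
```

With pandas, this would be applied the same way as in the question, e.g. participants['t-shirt'].apply(change_size).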
What I expected to happen:
I thought that the above code would yield the correct answer for the practice problem.
What actually happened:
Inspecting the variable editor suggests that the DataFrame my code created is identical to the expected answer. I would be very grateful if someone could point out the issue with my code. I have racked my brains but can't seem to come up with a suitable answer. Thank you so much.
Sometimes you have to watch out for edge cases that come up when cleaning data.
For example, you use str.title() to ensure that the first letter of both the first and the last name is capitalized. In theory, that seems fine.
However, you might have a name like -
Your code will change the above to Parry Ben-Aharon. That second A (after the hyphen) should not be capitalized.
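The reason is that Python's str.title() starts a new "word" after any non-letter character, including hyphens and apostrophes; pandas' Series.str.title() applies that same method to each element. A quick check, using hypothetical lowercase inputs since the raw values from participants.csv aren't shown here:

```python
# str.title() upper-cases the first letter after every non-letter character,
# so letters following '-' or "'" are capitalized as well.
names = ["parry ben-aharon", "markus o'growgane"]

titled = [name.title() for name in names]
print(titled)
# ['Parry Ben-Aharon', "Markus O'Growgane"]
```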
That's the kind of edge case present in this data. Some other examples to be careful of:
That said, the above is based on Dataquest's implementation, which, in my opinion, is incorrect.
Names like Markus O'growgane really do have that g capitalized (O'Growgane); see, for example, https://en.wikipedia.org/wiki/Liam_O’Brien
In my opinion, the output you get is better than the one DQ currently expects. For now, though, that's not what the exercise checks for, so try modifying your implementation to match it.
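One way to match the expected output, assuming (from the two examples in this thread) that DQ capitalizes only the first letter of each space-separated word and lower-cases the rest, is the standard library's string.capwords(). This is a sketch under that assumption, not DQ's reference solution, and capitalize_words is a hypothetical helper name:

```python
import string


def capitalize_words(name):
    # string.capwords() splits on whitespace, applies str.capitalize() to each
    # piece (first character upper-case, the rest lower-case) and re-joins the
    # pieces with single spaces, so letters after '-' or "'" stay lower-case.
    return string.capwords(name)


# Hypothetical lowercase inputs, for illustration only.
names = ["parry ben-aharon", "markus o'growgane"]
print([capitalize_words(n) for n in names])
# ['Parry Ben-aharon', "Markus O'growgane"]

# Applied to the DataFrame from the question (not run here):
# participants['name'] = participants['name'].apply(capitalize_words)
```

One caveat: str.capitalize() lower-cases everything after the first character, so a name like McDonald would become Mcdonald. It's worth checking whether the expected answer contains names like that before relying on this approach.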
@Sahil, I think this should be looked into, based on what is considered the correct way to write such names.
Thank you so much! I’ll try to keep edge cases in mind in my future analyses.