Hi, the first guided project in the Fundamentals Course of the Data Science path (Profitable App Profiles) suggests removing non-English entries from the Google data set by detecting characters that fall outside the ASCII range 0-127 (the standard code table).
But as a native Dutch speaker, I wonder whether that would really filter out all the non-English app names. Dutch uses hardly any accented characters (which sit in the extended code table, 128 and upwards), and I suspect the same is true of more languages than just Dutch.
Not really looking for a solution, just saying. But I wonder whether this approach would be acceptable in a real project! Or am I missing something?
Thanks,
Annemarie