The is_english() function for strings originally looks like this…
for character in string:
    if ord(character) > 127:
This does not work well: it incorrectly returns False for names with emojis or other symbols, because those characters are outside of the ASCII range. So the following print statements return False when we want them to return True…
print(is_english('Docs To Go™ Free Office Suite'))
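To see which character trips the check, here is a minimal sketch. It assumes the original function ends with return False inside the if and return True after the loop, since those lines are not shown above; the non_ascii list is only a helper added for illustration.

```python
def is_english(string):
    # Assumed reconstruction: reject the name as soon as any
    # character falls outside the ASCII range (code point > 127).
    for character in string:
        if ord(character) > 127:
            return False
    return True

name = 'Docs To Go™ Free Office Suite'

# Illustration only: collect each non-ASCII character with its code point.
non_ascii = [(ch, ord(ch)) for ch in name if ord(ch) > 127]

print(non_ascii)         # [('™', 8482)]
print(is_english(name))  # False
```

The single ™ symbol is enough to make the whole name fail.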
The suggested fix is to count the characters outside of the range and basically make an exception if there are fewer than three. But why can’t I just indent the return True so it is inside of the for loop? That seems to give me the desired results…
for char in string:
    if ord(char) > 127:
I was able to explore this a little more by cleaning the data with both my function and the function in the solution.
I ran both functions on the data set separately and created a separate list showing the excluded names.
Both functions gave back a very similar number of rows, but the one in the solution looks like it did a slightly better job.
I don’t fully understand why there is a difference, but I figure it has something to do with what is "inside" or "outside" of the range of ASCII characters, whatever that means.
Hi @davidriasp. When you put the return True line in the loop, the function only checks the first character. If that character is not ASCII, the function returns False. If it is, the function returns True right away without checking any other characters. There are a few apps that have a non-ASCII first character, like
'🔥 Football Wallpapers 4K | Full HD Backgrounds 😍' and
'⋆Solitaire⋆', that will end up getting rejected.
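Here is a rough sketch of that behavior. The name is_english_early_return is made up for this example, and 'A爱奇艺' is a hypothetical app name used only to show the other side of the problem.

```python
def is_english_early_return(string):
    for char in string:
        if ord(char) > 127:
            return False
        # Indented inside the loop, so this runs on the very first
        # iteration and the remaining characters are never checked.
        # (An empty string skips the loop and returns None.)
        return True

# Passes only because the first character, 'D', is ASCII.
print(is_english_early_return('Docs To Go™ Free Office Suite'))  # True

# Rejected because the first character is not ASCII.
print(is_english_early_return('🔥 Football Wallpapers 4K | Full HD Backgrounds 😍'))  # False
print(is_english_early_return('⋆Solitaire⋆'))  # False

# Hypothetical name: accepted even though most of it is non-ASCII.
print(is_english_early_return('A爱奇艺'))  # True
```

So the examples from your post return True only because they start with an ASCII letter, not because the whole name was checked.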
Counting the non-ASCII characters has its downsides too, as it allows apps like
'乗換NAVITIME Timetable & Route Search in Japan Tokyo' (only 2 non-ASCII characters). Neither method is going to be perfect, so you can use your best judgment in your project.
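For comparison, a minimal sketch of the counting idea. The function name and the max_non_ascii parameter are assumptions for illustration, with the default of 2 taken from the "fewer than three" wording in the question; the actual solution may use a different cutoff. '爱奇艺' is a hypothetical name.

```python
def is_english_counting(string, max_non_ascii=2):
    # Count every character outside the ASCII range, then compare
    # the total against the allowed number (an assumed threshold).
    non_ascii = sum(1 for char in string if ord(char) > 127)
    return non_ascii <= max_non_ascii

print(is_english_counting('Docs To Go™ Free Office Suite'))  # True
print(is_english_counting('⋆Solitaire⋆'))                    # True
print(is_english_counting('🔥 Football Wallpapers 4K | Full HD Backgrounds 😍'))  # True

# Only two non-ASCII characters, so this one slips through as well.
print(is_english_counting('乗換NAVITIME Timetable & Route Search in Japan Tokyo'))  # True

# Hypothetical name with three non-ASCII characters: rejected.
print(is_english_counting('爱奇艺'))  # False
```

Raising or lowering max_non_ascii trades one kind of mistake for the other, which is why neither approach is perfect.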
Always easy and great explanations @april.g! Thank you so much!