Guided Project: Profitable App Profiles for the App Store and Google Play Markets - Step 7 of 14

On step 7 of 14 - Removing non-English apps:

  1. I ran the is_english function from the solution to detect whether an app is English or non-English. I wonder why the third string, which is clearly non-English, was recognized as English (‘爱奇艺PPS -《欢乐颂2》电视剧热播’).

  2. When I ran the code that builds the lists of English apps, it didn’t seem to work: I still get the same number of rows as in the android_clean and ios data sets (9659 rows for Android vs. the expected 9614; 7197 rows for iOS vs. the expected 6183).

android_english = []
ios_english = []

for app in android_clean:
    name = app[0]
    if app_english(name):
        android_english.append(app)
        
for app in ios:
    name = app[1]
    if app_english(name):
        ios_english.append(app)
        
explore_data(android_english, 0, 3, True)
print('\n')
explore_data(ios_english, 0, 3, True)

Anything I’m missing in the code?

Thanks!

Hi @ccontan. Could you include the code for your app_english() function as well?

My app_english() function:

def app_english(string):
    non_ascii = 0

    for character in string:
    if ord(character) > 127: 
        non_ascii += 1
    
    if non_ascii > 3:
        return False 
    else: 
        return True

Code to exclude the non-English apps from the data sets:

android_english = []
ios_english = []

for app in android_clean:
    name = app[0]
    if app_english(name):
    android_english.append(app)
    
for app in ios:
    name = app[1]
    if app_english(name):
        ios_english.append(app)
    
explore_data(android_english, 0, 3, True)
print('\n')
explore_data(ios_english, 0, 3, True)

I’m not able to replicate the error. When I copied and pasted your code, I got an indentation error. After fixing the indentation, it ran fine and I got the expected counts for the English Android and iOS apps. If you could upload a copy of your .ipynb file, I can have a look at it and see what else might be going on.
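A quick way to check whether the filter is actually removing anything is to compare list lengths before and after and print the names that were dropped. The sketch below is self-contained and uses a tiny made-up list of rows (hypothetical data, with the app name at index 0 as in android_clean) together with the corrected counting logic:

```python
def app_english(string):
    # Count every non-ASCII character first, then decide after the loop.
    non_ascii = 0
    for character in string:
        if ord(character) > 127:
            non_ascii += 1
    return non_ascii <= 3

# Hypothetical rows for illustration; only the name column (index 0) is used.
rows = [
    ['Instagram', 'Social'],
    ['爱奇艺PPS -《欢乐颂2》电视剧热播', 'Entertainment'],
    ['Instachat 😜', 'Social'],
]

english_rows = [row for row in rows if app_english(row[0])]
dropped = [row[0] for row in rows if not app_english(row[0])]

# If the two lengths are equal on the real data, nothing was filtered out.
print(len(rows), '->', len(english_rows))
print('Dropped:', dropped)
```

If the "dropped" list comes back empty on the real data sets, the filtering function is the first thing to inspect.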

Basics.ipynb (22.0 KB)

Attached.

Thanks for sharing your notebook! I was able to spot the problem.

def app_english(string):
    non_ascii = 0
    
    for character in string:
        if ord(character) > 127: 
            non_ascii += 1
        
        if non_ascii > 3:
            return False 
        else: 
            return True 

The if/else part of your function is indented inside the loop. That means that on the first iteration, the function checks the first character and increments non_ascii if needed, then immediately evaluates the if/else. Since non_ascii can only be 0 or 1 at that point, the else branch runs and the function returns True without looking at any of the other characters. That is why ‘爱奇艺PPS -《欢乐颂2》电视剧热播’ was classified as English, and why no rows were removed from android_clean or ios.
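To see the early return in action, here is a minimal reproduction. app_english_buggy is a hypothetical name for a copy of the notebook version; the second line simply counts the non-ASCII characters the function should have seen:

```python
def app_english_buggy(string):
    # The if/else sits inside the loop, so the function returns
    # after examining only the first character.
    non_ascii = 0
    for character in string:
        if ord(character) > 127:
            non_ascii += 1
        if non_ascii > 3:
            return False
        else:
            return True

name = '爱奇艺PPS -《欢乐颂2》电视剧热播'
# How many characters are actually non-ASCII across the whole name.
non_ascii_total = sum(1 for character in name if ord(character) > 127)

print(app_english_buggy(name))  # True: only '爱' was checked
print(non_ascii_total)          # well above the threshold of 3
```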

To fix this, move the if/else out of the loop so it runs only after every character has been counted:

def app_english(string):
    non_ascii = 0
    
    for character in string:
        if ord(character) > 127: 
            non_ascii += 1
        
    if non_ascii > 3:
        return False 
    else: 
        return True
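As a side note, the same logic can be written more compactly by counting with sum() and returning the comparison directly. This is just an equivalent sketch, not the solution's code; the sample names below are chosen for illustration:

```python
def app_english(string):
    # Count characters outside the ASCII range (code points above 127).
    non_ascii = sum(1 for character in string if ord(character) > 127)
    # Allow up to 3 such characters (e.g. an emoji or ™) before calling it non-English.
    return non_ascii <= 3

samples = [
    'Instagram',
    '爱奇艺PPS -《欢乐颂2》电视剧热播',
    'Docs To Go™ Free Office Suite',
    'Instachat 😜',
]
for sample in samples:
    print(sample, app_english(sample))
```

This prints True, False, True, True. Because the counting finishes before the comparison, there is no loop body left for the return to be accidentally indented into.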