Returning Number of English Apps

Screen Link:
https://app.dataquest.io/m/350/guided-project%3A-profitable-app-profiles-for-the-app-store-and-google-play-markets/7/removing-non-english-apps-part-two

My Code:

def english(string):
    non_english = []
    for character in string:
        if ord(character) > 127:
            non_english.append(character)
        
    if len(non_english) > 3:
        return 'False'
    else:
        return 'True'
    
english('爱奇艺PPS -《欢乐颂2》电视剧热播')

android_english = []
ios_english = []

for app in ios:
    app_name = app[1]
    if english(app_name):
        ios_english.append(app)
        
for app in android_clean:
    app_name = app[0]
    if english(app_name):
        android_english.append(app)
        
explore_data(ios_english, 0, 3, True)
print ('\n')
explore_data(android_english, 0, 3, True)

What I expected to happen:
Return the number of English apps in the non-duplicate list

What actually happened:
Returned the same number of apps in the non-duplicate list

Number of rows: 7197
Number of columns: 16

Number of rows: 9659
Number of columns: 13

Why does this not return the same answer as the solution (6183 and 9614)? It is identical to the solution code, except that I named my variables ‘app_name’ instead of ‘name’ and function ‘english’ instead of ‘is_english’.

Please provide the explore_data() code.

def explore_data(dataset, start, end, rows_and_columns=False):
    dataset_slice = dataset[start:end]    
    for row in dataset_slice:
        print(row)
        print('\n') 
        
    if rows_and_columns:
        print('Number of rows:', len(dataset))
        print('Number of columns:', len(dataset[0]))
        
print(android_header)
print('\n')
explore_data(android, 0, 3, True)

Could you kindly upload your notebook so that I can take a look? The error doesn’t seem to come from these snippets.

Basics.ipynb (109.4 KB)


This is where the problem was. Change your english() to the below.

I will try to explain why it gave the wrong result.

def english(string):
    non_ascii = 0
    
    for character in string:
        if ord(character) > 127:
            non_ascii += 1
    
    if non_ascii > 3:
        return False
    else:
        return True
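
A likely reason the original version kept every row: it returned the strings 'True' and 'False' rather than the booleans True and False. Any non-empty string is truthy in Python, so `if english(app_name):` passed for every app. A minimal sketch of that behaviour (the name `english_str` is made up here for illustration, it is not from the notebook):

```python
def english_str(string):
    """Same logic as the original post: returns the *strings* 'True' / 'False'."""
    non_ascii = 0
    for character in string:
        if ord(character) > 127:
            non_ascii += 1
    return 'False' if non_ascii > 3 else 'True'

result = english_str('爱奇艺PPS -《欢乐颂2》电视剧热播')
print(result)        # the string 'False'
print(bool(result))  # True, because a non-empty string is truthy

kept = []
if result:           # this branch runs even though result is 'False'
    kept.append('爱奇艺PPS -《欢乐颂2》电视剧热播')
print(len(kept))     # 1
```

Returning the booleans False and True, as in the function above, makes the `if` test behave as intended.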

I ran these two using pandas DataFrame and both worked. They gave the correct results. I do not know why your code did not work for this instance.

def english(string):
    non_english = []
    for character in string:
        if ord(character) > 127:
            non_english.append(character)
        
    if len(non_english) > 3:
        return 'False'
    else:
        return 'True'

import pandas as pd
ios_a = pd.DataFrame(ios, columns=ios_header)
ios_a.head(2)

ios_a['confirm'] = ios_a['track_name'].apply(english)
ios_a['confirm'].value_counts()

def english(string):
    non_ascii = 0
    
    for character in string:
        if ord(character) > 127:
            non_ascii += 1
    
    if non_ascii > 3:
        return False
    else:
        return True
    
import pandas as pd
ios_a = pd.DataFrame(ios, columns=ios_header)
ios_a.head(2)
    
ios_a['confirm'] = ios_a['track_name'].apply(english)
ios_a['confirm'].value_counts()
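
One possible explanation for why both versions looked correct in the pandas check: `value_counts()` tallies distinct values, so the strings 'True' and 'False' are counted separately, whereas an `if` test treats both strings as truthy. A small sketch with made-up sample names, using `collections.Counter` as a stand-in for `value_counts()` (an assumption, so the example runs without the datasets):

```python
from collections import Counter

def english_str(string):
    """String-returning version from the original post (renamed for illustration)."""
    non_ascii = sum(1 for ch in string if ord(ch) > 127)
    return 'False' if non_ascii > 3 else 'True'

names = [
    'Instagram',
    '爱奇艺PPS -《欢乐颂2》电视剧热播',
    'Docs To Go™ Free Office Suite',
]

# Tallying the labels looks right: two 'True', one 'False'.
tally = Counter(english_str(n) for n in names)
print(tally)

# Filtering with `if` keeps everything, because both labels are truthy.
kept = [n for n in names if english_str(n)]
print(len(kept))  # 3
```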

Hello Everyone,
I wrote the code below. It seems to work fine, including for emojis and other characters, so I think it can be used as well. Please advise if it is not sufficient.

## English Apps

def is_english(string):
    for character in string:
        if ord(character) > 127:
            return False
        else:
            return True

print(is_english('Instagram'))
print(is_english('爱奇艺PPS -《欢乐颂2》电视剧热播'))
print(is_english('Docs To Go™ Free Office Suite'))
print(is_english('Instachat 😜'))

@s.cook20
The problem might be in the body of your function. What if, instead of making non_english an empty list, you set it to zero, i.e. non_english = 0?
Then every time there is a non-English character, you increment non_english by one. This counts the number of non-ASCII characters in the name.
Have a look:

def english(string):
    non_english = 0
    for character in string:
        if ord(character) > 127:
            non_english += 1

With the code above, you can then add the condition: if more than three characters fall outside the ASCII range, the whole name is treated as non-English (return False); otherwise return True.
Have a look:

    if non_english > 3:
        return False

    return True

With this, you can call your function with any string as the argument and it should give the correct result.
For example:

print(english('爱奇艺PPS -《欢乐颂2》电视剧热播'))

False

@deodattatijare
But with this

def is_english(string):
    for character in string:
        if ord(character) > 127:
            return False

    return True

print(is_english('Instagram'))
print(is_english('爱奇艺PPS -《欢乐颂2》电视剧热播'))
print(is_english('Docs To Go™ Free Office Suite'))
print(is_english('Instachat 😜'))

alone, many English apps will be left out. For example, an English app like Instachat 😜 will be classified as non-English just because of that single emoji.
This is why we need a rule where an app is excluded as non-English only if more than three of its characters fall outside the ASCII range. This way we don't lose English apps whose names contain up to three emojis or other special characters.
Have a look:

def english(string):
    non_ascii = 0
    for character in string:
        if ord(character) > 127:
            non_ascii += 1

    if non_ascii > 3:
        return False

    return True

But it works with ‘if else’. It’s different from the suggested solution: here the else is kept as part of the if condition. I have attached a code snippet with the output.
This could replace the screen 7 code (removing non-English apps, part two). Please correct me if I am missing any issues that I may face later in the project.


@deodattatijare
The reason 'instachat😄' evaluates to True is the ‘if else’. Both branches contain a return, so the function exits on the very first character it checks. The first character of 'instachat😄', the letter ‘i’, falls in the ASCII range, so the whole name is evaluated as True and the rest of the characters, including the non-ASCII emoji, are never checked.
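
To see the first-character effect in isolation, here is a small sketch (the function name `is_english_if_else` and the sample strings are made up for illustration). The result depends only on where the non-ASCII character sits, not on how many there are:

```python
def is_english_if_else(string):
    for character in string:
        if ord(character) > 127:
            return False
        else:
            return True  # returns on the very first character

first = is_english_if_else('Instachat 😜')   # starts with 'I'  -> True
second = is_english_if_else('😜 Instachat')  # starts with emoji -> False
third = is_english_if_else('PPS爱奇艺')       # starts with 'P'  -> True
fourth = is_english_if_else('爱奇艺PPS')      # starts with '爱' -> False
print(first, second, third, fourth)
```

The last two names contain the same characters, yet only the one that starts with an ASCII letter is accepted.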


I get it now. The suggested code is better, as it actually scans all the characters in the string before giving a result.
Thanks for explaining.


Can someone have a look at this piece of code?
I tested it and it seems to work, but it is quite different in syntax from the solution. Am I missing something?

def is_english(string):
    count = 0
    for char in string:
        if ord(char) > 127 :
            count = count +1
            if count > 3 :
                return False
    
    return True

Thanks a lot for feedback.
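
As far as I can tell, this version should give the same answers as the count-then-check approach; it just stops as soon as the fourth non-ASCII character is found. A quick way to check that, sketched with made-up sample names and illustrative function names (`is_english_early_exit`, `is_english_count_all`):

```python
def is_english_early_exit(string):
    count = 0
    for char in string:
        if ord(char) > 127:
            count += 1
            if count > 3:      # stop at the fourth non-ASCII character
                return False
    return True

def is_english_count_all(string):
    non_ascii = 0
    for character in string:
        if ord(character) > 127:
            non_ascii += 1
    return non_ascii <= 3      # decide only after scanning everything

samples = [
    'Instagram',
    'Docs To Go™ Free Office Suite',
    'Instachat 😜',
    '爱奇艺PPS -《欢乐颂2》电视剧热播',
    '',
    '😜😜😜',
    '😜😜😜😜',
]

early = [is_english_early_exit(s) for s in samples]
full = [is_english_count_all(s) for s in samples]
print(early == full)  # True for these samples
```

The only practical difference is that the early-exit version does slightly less work on long non-English names.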