Difference between length of English apps in my project and the solution

Screen Link: https://app.dataquest.io/m/350/guided-project%3A-profitable-app-profiles-for-the-app-store-and-google-play-markets/7/removing-non-english-apps-part-two

My Code:

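def check_english(word):
    # Count characters outside the ASCII range (code points above 127).
    count = 0
    for character in word:
        if ord(character) > 127:
            count += 1
            # More than 3 non-ASCII characters: treat the name as non-English.
            if count > 3:
                return False
    return True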
print(check_english('Instagram'))
print(check_english('爱奇艺PPS -《欢乐颂2》电视剧热播'))
print(check_english('Docs To Go™ Free Office Suite'))
print(check_english('Instachat 😜'))
>>True
>>False
>>True
>>True

english_ios_apps = []
english_android_apps = []
# Skip the first row of ios_apps; the app name is in column 1.
for row in ios_apps[1:]:
    name = row[1]
    if check_english(name):
        english_ios_apps.append(row)
# In android_clean the app name is in column 0.
for row in android_clean:
    name = row[0]
    if check_english(name):
        english_android_apps.append(row)
explore(english_android_apps, 0, 3, True)
print('\n')
explore(english_ios_apps, 0, 3, True)

What I expected to happen:

Number of rows: 9614
Number of columns: 13

Number of rows: 6183
Number of columns: 16

What actually happened:

Number of rows: 9502
Number of columns: 13

Number of rows: 6100
Number of columns: 16

As you can see, the results differ. I want to know whether the data set I am using is the same one used in the solution notebook. Thanks in advance for any answer.

hi @biadboze

I tried your code with re-downloaded datasets. It gave me 9614 and 6180 records for the Android and Apple datasets respectively.

Something in the previous steps of the project instructions might have been done differently, leading to the difference.

For the previous steps I have the same results as the solution notebook. What is the size of your data sets?

hi @biadboze

You can try renaming the current datasets as backups and downloading the datasets used in the solution notebook (helpful post).

Then re-run the kernel. If you still get the same mismatch, your code needs tweaking; otherwise the dataset was the problem.

I downloaded the datasets from the solution notebook and the results match now, so the problem was my datasets. Thanks for your help.

The problem is not really the data set. I had a similar experience. The condition if count > 3 shouldn’t be nested inside the conditional statement in the for loop. It should be outside the for loop.
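For anyone who wants to compare the two placements directly, here is a minimal sketch. The function names check_inside_loop and check_after_loop are made up for this illustration and are not from the project. As far as I can tell, both versions return the same answer for any string, because the count can only grow; the inside-the-loop version just stops scanning earlier.

```python
def check_inside_loop(word):
    """Return False as soon as more than 3 non-ASCII characters are seen."""
    count = 0
    for character in word:
        if ord(character) > 127:
            count += 1
            if count > 3:
                return False
    return True


def check_after_loop(word):
    """Count every non-ASCII character first, then compare once at the end."""
    count = 0
    for character in word:
        if ord(character) > 127:
            count += 1
    return count <= 3


samples = [
    'Instagram',
    '爱奇艺PPS -《欢乐颂2》电视剧热播',
    'Docs To Go™ Free Office Suite',
    'Instachat 😜',
]
for name in samples:
    # Print both results side by side so any disagreement is easy to spot.
    print(name, check_inside_loop(name), check_after_loop(name))
```

If the two columns agree on your own app names as well, the placement of the check is probably not what changes the row counts.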

hi @samuelabidemi2

Great catch! I might have modified @biadboze’s code while copying it into my notebook.

Didn’t see the indentation at all 🙁

Yeah, that’s true. That’s one of those things about coding. All the best, man!

The problem wasn’t the code. Do you understand the code?

Hi! How do I rename the current data set and download the new one?

My code has the correct indentation, but I still have the same problem as you. I don’t think it is because of the indentation.

Following are the links to download the datasets:

[Google_dataset](https://www.kaggle.com/lava18/google-play-store-apps)
[iOS_dataset](https://www.kaggle.com/ramamet4/app-store-apple-data-set-10k-apps)

You can simply rename the file or you can use the rename command.
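If you prefer to do the rename from a notebook cell, something like the sketch below should work. The helper name backup_dataset and the two file names are assumptions on my part; use whatever your downloaded files are actually called.

```python
from pathlib import Path


def backup_dataset(path):
    """Rename path to the same name plus '.bak' and return the new Path."""
    source = Path(path)
    backup = source.with_name(source.name + '.bak')
    source.rename(backup)
    return backup


# Assumed file names; adjust them to match the files in your project folder.
for file_name in ['AppleStore.csv', 'googleplaystore.csv']:
    if Path(file_name).exists():
        print('Backed up to', backup_dataset(file_name))
```

After that you can place the freshly downloaded files under the original names and re-run the notebook.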

Kindly backtrack and check whether the previous data cleaning steps are correct. You can do this by checking the number of rows after each step and matching it against the solution provided.
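A rough sketch of such a check is below. The helper names count_rows and compare_with_solution are hypothetical, and the toy list only stands in for a real dataset; for the real check you would pass your own lists together with the counts shown in the solution notebook (for example 9614 and 6183 after the English filter, as quoted earlier in this thread).

```python
def count_rows(dataset, has_header=False):
    """Number of data rows in a list of lists, not counting a header row."""
    return len(dataset) - 1 if has_header else len(dataset)


def compare_with_solution(label, dataset, expected, has_header=False):
    """Print the actual vs. expected row count and return True if they match."""
    actual = count_rows(dataset, has_header)
    status = 'OK' if actual == expected else 'MISMATCH'
    print(f'{label}: {actual} rows, expected {expected} ({status})')
    return actual == expected


# Toy data standing in for a real dataset (one header row plus two apps).
toy = [['name', 'rating'], ['App A', 4.5], ['App B', 3.9]]
compare_with_solution('toy data', toy, expected=2, has_header=True)
```

Running this after each cleaning step should show the first step where your count stops matching the solution.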

Hi @candiceliu93 @samuelabidemi2 and @biadboze

I re-tried this code with some variations. I will just note the differences here; please do your own testing and let me know your findings as well.

I am a DQ student too, and I may well be wrong in my understanding of the code or its workflow. Making multiple mistakes helps us reinforce a new concept much more than practicing it once, IMHO.

You can try to match your code with one of the variations, or perhaps a different one, to understand whether we all diverge only at this part of the notebook or in some code prior to it, as @pablajaspreet94 has also suggested.

Ascii_Check_App_List.ipynb (10.8 KB)

Hi All,
I had a similar issue. When I ran the code to get the English-only apps, the total numbers of apps I got were:
Google Play store: 9659
iOS App store: 7197

The solution notebook provided shows -
Google Play store: 9614
iOS App store: 6183

I checked my code and compared it with the code provided in the solution notebook, and it turns out there is a slight error in the function ‘is_english’ in the notebook.

When I changed it as shown below, the total numbers of apps started matching the ones in the solution notebook:
[image: modified ‘is_english’ function]

If you think about that code, you don’t need the ‘else’ statement used in the solution notebook. If the condition if non_ascii > 3 is false, it will just carry on with the ‘for’ loop.
This, along with the correct indentation of ‘return True’ in the function ‘is_english’, corrected my issue.
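For readers who cannot see the screenshot, the change described here might look roughly like the sketch below. It is a reconstruction from the description only (no ‘else’, the check inside the loop, ‘return True’ at function level) and is not necessarily the exact code that was posted.

```python
def is_english(string):
    non_ascii = 0
    for character in string:
        if ord(character) > 127:
            non_ascii += 1
            # No else branch: while the limit is not exceeded,
            # the loop simply moves on to the next character.
            if non_ascii > 3:
                return False
    # Reached only after the whole string has been scanned.
    return True


print(is_english('Instagram'))
print(is_english('爱奇艺PPS -《欢乐颂2》电视剧热播'))
print(is_english('Docs To Go™ Free Office Suite'))
print(is_english('Instachat 😜'))
```

The point about indentation is that ‘return True’ has to sit outside the ‘for’ loop; if it were indented into the loop body, the function would return after looking at only the first character.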

Please feel free to let me know if my understanding above is wrong, as I am a beginner myself.

Thanks.

Anish

I had this same problem and had been racking my brain about what I could possibly have done wrong until I saw your comment. Thank you!

Thank you for solving this. I couldn’t work out why the provided solution wasn’t working, and this helped!