Function to remove lines from data set using Pandas and some conditions

My Code:

def is_english(dataframe):
    for app in dataframe['App']:
        number_of_letter = 0
        for letter in app:
            if ord(letter) > 127:
                number_of_letter += 1
        if number_of_letter > 3:
            dataframe = dataframe.drop(dataframe[dataframe['App'] == app].index, inplace = True) 
            return dataframe
        return dataframe

Hi, everyone. I am trying to use the Pandas library to clean and analyze the App Store and Google Play data sets (the same data sets from the first Guided Project in the Python module). I created a function (called “is_english”) to identify “non-English” characters in the app names and remove those apps from the data set. In my code, I consider any app whose name has more than 3 characters with a code point (the value returned by ord()) above 127 to be a non-English app. The expression ‘App’ refers to the ‘App’ column of the data set.

I don’t know whether the built-in function “ord()” can be used in this case, or whether the line dataframe.drop(dataframe[dataframe['App'] == app].index, inplace = True) is correct. The function doesn’t seem to work: it goes directly to the last return dataframe. Can anyone help me?

It is usually not a good idea to loop through a dataframe. If you are using pandas, you should write a function that handles a single value and then use apply to run it on the whole column. Like this:

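import numpy as np

def is_en(string):
    # Count the characters outside the ASCII range (code point above 127)
    count = 0
    for character in string:
        if ord(character) > 127:
            count += 1
    # More than 3 such characters: treat the name as non-English
    if count > 3:
        string = np.nan
    return string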

# Applying the function 
ios['track_name'] = ios['track_name'].apply(is_en)
android['App'] = android['App'].apply(is_en)

I also did this using pandas, you can see how I did the whole project here.

Thanks Otavio. I didn’t know about the apply function. Much much easier to work on the projects from now on. Super cool.

You’re welcome!

I’d recommend you finish all the pandas courses before recreating this project with pandas.

New tools require new approaches. It makes no sense to deal with a dataframe using the methods you used on lists.

Yeah, you’re right. It’s because I’m too curious, so I tried to recreate the project after the first pandas course. But I’ll do as you say.
