How do I filter out all app titles written in non-Roman scripts (Chinese, Arabic letters, etc.)?

I tried it with the isascii() function, but that gives me far too many false alerts.

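Code:

```python
def isRoman(dataset, title_column):
    non_roman_index = []
    for counter, value in enumerate(dataset):
        # str.isascii() (Python 3.7+) is False if ANY character is outside
        # ASCII, so titles with emoji, ™ or accented letters are flagged too
        if not value[title_column].isascii():
            non_roman_index.append(counter)

    print(non_roman_index)


isRoman(apps_data_apple, 1)
isRoman(apps_data_google, 1)
```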

I want a function that checks whether an app's title uses letters other than our usual alphabet: a, b, c… (Roman letters).

Thanks for any help,
Philipp


Hi @phibaar1

Try this out on only the first 10 records of each dataset (dataset[:10]). You can run each of the two changes below in a separate code cell.

  1. Print the ‘value’ variable inside the “if” statement.

  2. Change ‘enumerate(dataset)’ to ‘dataset.iterrows()’.

Let us know what you observe and whether you can understand the difference. Also let me know if you would prefer a straightforward answer.

Happy Learning!

Why is ‘dataset.iterrows()’ better than ‘enumerate(dataset)’?

The dataset is a list. That's why.