Delete $ in 'price' column

Screen Link:

My Code:

free_eng_apple = []
free_eng_google = []
for row in eng_apple:
    price = float(row[4])
    if price == 0.0:
        free_eng_apple.append(row)
print(free_eng_apple[:5])

for row in eng_google:
    price = float(row[7])
    if price == 0.0:
        free_eng_google.append(row)
print(free_eng_google[:5])

What I expected to happen:

To isolate the free apps.

What actually happened:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-12-d7e005ad9fad> in <module>
     16 
     17 for row in eng_google:
---> 18     price = float(row[7])
     19     if price == 0.0:
     20         free_eng_google.append(row)

ValueError: could not convert string to float: '$4.99'
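The error happens because float() only accepts strings that look like plain numbers, so the leading '$' in '$4.99' makes it fail. A minimal sketch of a defensive helper (`to_float_or_none` is a hypothetical name, not part of the original code) that reports such values instead of stopping the loop:

```python
def to_float_or_none(value):
    """Convert a price string to float; return None if it isn't a plain number."""
    try:
        return float(value)
    except ValueError:
        # e.g. '$4.99' cannot be parsed by float()
        return None


print(to_float_or_none('4.99'))   # 4.99
print(to_float_or_none('$4.99'))  # None
```

Rows where this returns None can then be inspected or cleaned separately.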

I guess you could simply compare the strings, like below, without converting the price column to float:

android_final = []
ios_final = []

for app in android_english:
    price = app[7]
    if price == '0':
        android_final.append(app)
        
for app in ios_english:
    price = app[4]
    if price == '0.0':
        ios_final.append(app)
        
print(len(android_final))
print(len(ios_final))

However, there could still be a problem if there’s a value like $0.0: it would not be counted as a free app, since it is not ‘0.0’. Is there any way to simply delete $, €, or any other currency sign?
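Even without regular expressions, one simple sketch is to keep only the digits and the decimal point. This assumes prices use '.' as the decimal separator and contain no thousands separators; `strip_currency` is a hypothetical helper name:

```python
def strip_currency(price):
    """Keep only the characters 0-9 and '.', dropping $, € or any other symbol."""
    return ''.join(ch for ch in price if ch in '0123456789.')


for raw in ['$0.0', '€2.99', '0']:
    print(raw, '->', strip_currency(raw))
```

After this, float(strip_currency('$0.0')) gives 0.0, so '$0.0' would be recognized as free.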

Thank you!

Hi @summerale,

About your concern that a value like $0.0 would not be counted as a free app:

No, there are no such cases. Here is how you can check:

free_list = []

for app in android_english:
    price_category = app[6]  # "Free" or "Paid"
    price = app[7]
    if price_category == 'Free' and price not in free_list:
        free_list.append(price)
        
print(free_list)

Output: ['0']

The code above shows that all the Android apps marked as “Free” have the price ‘0’, not ‘0.0’.
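The same check can be written with a set, which keeps each distinct value only once without the `not in` test. A minimal sketch on a few made-up rows standing in for `android_english` (only indexes 6 and 7 matter here):

```python
# Hypothetical sample rows: index 6 is the type ("Free"/"Paid"), index 7 the price
android_sample = [
    ['App A', '', '', '', '', '', 'Free', '0'],
    ['App B', '', '', '', '', '', 'Paid', '$4.99'],
    ['App C', '', '', '', '', '', 'Free', '0'],
]

# Set comprehension: distinct prices among apps marked "Free"
free_prices = {app[7] for app in android_sample if app[6] == 'Free'}
print(free_prices)  # {'0'}
```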

Now let’s check the converse: is it always true that the apps with the price 0 are marked “Free”?

zero_categories_list = []

for app in android_english:
    price_category = app[6]  # "Free" or "Paid"
    price = app[7]
    if price == '0' and price_category not in zero_categories_list:
        zero_categories_list.append(price_category)
        
print(zero_categories_list)

Output: ['Free', 'NaN']
So the apps with the price 0 are marked “Free”, apart from rows where the type itself is missing (‘NaN’).
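To see how many rows fall into each category (including the 'NaN' type), `collections.Counter` can count them in one pass. Again a sketch on made-up rows, not the real dataset:

```python
from collections import Counter

# Hypothetical rows: index 6 is the type, index 7 the price
android_sample = [
    ['App A', '', '', '', '', '', 'Free', '0'],
    ['App B', '', '', '', '', '', 'NaN', '0'],
    ['App C', '', '', '', '', '', 'Free', '0'],
    ['App D', '', '', '', '', '', 'Paid', '$0.99'],
]

# Count the types of all apps whose price string is exactly '0'
zero_price_types = Counter(app[6] for app in android_sample if app[7] == '0')
print(zero_price_types)  # Counter({'Free': 2, 'NaN': 1})
```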

Finally, let’s check your hypothesis about the price $0.0 and the price category of such apps:

zero_zero_categories_list = []

for app in android_english:
    price_category = app[6]  # "Free" or "Paid"
    price = app[7]
    if price == '$0.0' and price_category not in zero_zero_categories_list:
        zero_zero_categories_list.append(price_category)
        
print(zero_zero_categories_list)

Output: []
Hence, there are no such cases.
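The check above only looks for the exact string '$0.0'. A broader sketch lists every distinct price that float() cannot parse, which would reveal any currency symbol at all (the sample rows are hypothetical):

```python
# Hypothetical rows: index 7 is the price
android_sample = [
    ['App A', '', '', '', '', '', 'Free', '0'],
    ['App B', '', '', '', '', '', 'Paid', '$4.99'],
    ['App C', '', '', '', '', '', 'Paid', '$0.99'],
]

non_numeric_prices = set()
for app in android_sample:
    try:
        float(app[7])
    except ValueError:
        # Anything float() rejects, e.g. a price with a currency sign
        non_numeric_prices.add(app[7])

print(sorted(non_numeric_prices))  # ['$0.99', '$4.99']
```

Note that float() also accepts strings such as 'NaN' and 'inf', so those would not show up in this list.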


Dear @Elena_Kosourova,

Thank you so much for your time and comment! Maybe it wasn’t clear: I meant it as a “what if”. What if there were such cases, and how could we get rid of them?


Hi @summerale,

Ah, OK, now I understand: you want to delete any other symbols from the prices (like currency signs) and keep only the numbers themselves (like 0, 2.5, 34.9, etc.).

To do this, you can use a regular expression: a sequence of characters that defines a search pattern, used to check whether a string contains that pattern. You will learn about regular expressions later in this DQ course. An even more efficient option is the pandas library, which you will also learn in future courses :slightly_smiling_face:

Anyway, let’s consider using regular expressions. First, we’ll import Python’s built-in re module for regular expression operations:

import re

Next, we’ll compile the regular expression pattern to search for:

pattern = re.compile(r'\d*\.?\d+')  # raw string, so the backslashes reach the regex engine unchanged

You can decipher the meaning of each symbol in this pattern here. In short, this pattern finds any integer or decimal number in a string. Once again, everything will become much clearer after the course on regular expressions.
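A quick way to see what the pattern matches is to run findall() on a few sample strings (the strings are made up for illustration). Roughly, \d* matches zero or more digits, \.? an optional decimal point, and \d+ one or more digits:

```python
import re

# r'' (raw string) keeps the backslashes intact for the regex engine
pattern = re.compile(r'\d*\.?\d+')

for price in ['$4.99', '€0.0', '0', 'Free', '1,299.00']:
    print(repr(price), '->', pattern.findall(price))
```

The last two samples show the limits of always taking the first match: a string with no number gives an empty list ([]), and a thousands separator splits one price into two matches (['1', '299.00']).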

Then, we use a for loop to find this pattern in each price, extract it, and assign the cleaned value back to app[7]:

for app in android_english:
    price = app[7]
    cleaned_price = pattern.findall(price)  # e.g. '$4.99' -> ['4.99']
    if cleaned_price:  # skip prices that contain no number at all
        app[7] = cleaned_price[0]

Why do we use cleaned_price[0] when assigning the value back to app[7]? Because findall() returns a list of all matches; in our case it contains only one item, the cleaned version of price. We take this single value (a string) from the list and assign it back to app[7].
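Since only the first number is needed, an alternative sketch uses pattern.search(), which returns a match object for the first match, or None when there is none, so there is no list to index into. `clean_price` is a hypothetical helper name:

```python
import re

pattern = re.compile(r'\d*\.?\d+')


def clean_price(price):
    """Return the first number found in price as a string, or None."""
    match = pattern.search(price)
    return match.group() if match else None


print(clean_price('$4.99'))  # 4.99
print(clean_price('Free'))   # None
```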

Now, if you want to check the resulting values of prices in android_english:

for app in android_english:
    print(app[7])

and you’ll get a long list of all the updated values of price.
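Once the numbers are extracted, float() no longer fails, so the original goal of isolating free apps works again. A sketch on hypothetical rows that combines the extraction with the free-app filter:

```python
import re

pattern = re.compile(r'\d*\.?\d+')

# Hypothetical rows: index 7 is the price
android_sample = [
    ['App A', '', '', '', '', '', 'Free', '0'],
    ['App B', '', '', '', '', '', 'Paid', '$4.99'],
    ['App C', '', '', '', '', '', 'Free', '$0.0'],
]

free_apps = []
for app in android_sample:
    numbers = pattern.findall(app[7])
    # Keep the app only if a number was found and it equals zero
    if numbers and float(numbers[0]) == 0.0:
        free_apps.append(app)

print([app[0] for app in free_apps])  # ['App A', 'App C']
```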


@Elena_Kosourova, thank you so much for the comment! Thanks to it, I learned that regular expressions are what’s needed here, and I got a small taste of them. I will bookmark this thread and come back after the lessons. Thank you so much again! :slight_smile:

You are welcome @summerale.

A question:

Why is the price “0” in Android but “0.0” in iOS? When I tried to use float in Android, it counted 0 apps.

Thank you!

Susanna

Hi @susannamendoza,

Because that is how the price is written in each dataset: the values are stored as strings, not floats :slight_smile:
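Because the values are strings, ‘0’ and ‘0.0’ never compare equal, even though they convert to the same float. A short sketch of comparing numbers instead of strings (`is_free` is a hypothetical helper name):

```python
print('0' == '0.0')                # False: different strings
print(float('0') == float('0.0'))  # True: both are 0.0


def is_free(price):
    """Treat any price string that parses to zero as free."""
    return float(price) == 0.0


print(is_free('0'), is_free('0.0'), is_free('2.99'))  # True True False
```

This only works once the prices contain no currency symbols, since float('$4.99') still raises ValueError.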

Thank you so much for the answer. The reason was simpler than I thought… :face_with_hand_over_mouth:

I still don’t get why we used ‘0.0’ for the iOS dataset.
Isn’t the string ‘0’ different from ‘0.0’? I don’t see ‘0.0’ in the iOS columns, I only see ‘0’.
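One way to see exactly which strings a column contains (rather than how a spreadsheet program displays them, which may drop trailing zeros) is to count the distinct raw values and print them with repr(). A sketch assuming the iOS rows are lists with the price at index 4, as in the code above; the sample rows and values are made up:

```python
from collections import Counter

# Hypothetical iOS rows: index 4 is the price string as read from the CSV
ios_sample = [
    ['1', 'App X', '100', 'USD', '0.0'],
    ['2', 'App Y', '200', 'USD', '3.99'],
    ['3', 'App Z', '300', 'USD', '0.0'],
]

price_counts = Counter(app[4] for app in ios_sample)
for price, count in price_counts.most_common():
    # repr() shows the exact string, quotes included
    print(repr(price), count)
```

Running the same loop over the real ios_english list would show which price strings actually occur.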

Thanks