Help with data type converstion

Hi, i am doing my second guided project. I am doing two projects. This is my first one i am working on and its an independent project to help me learn.

Basically I am just trying to calculate the avg of all the rows. The problem is that the data in the csv file is actually a string. ( There are over 1000 rows). Below is how the data looks in excel.

191.9p
196.9p
196.9p

Here is my code:

#Create an empty varaible and poulate that by using a for loop
diesel_price = 0

for row in fuel_data [1:]:
    avg_diesel = float(row[2])
    diesel_price +=1
    avg_diesel_sum = diesel_price + avg_diesel

avg_diesel_price = diesel_price / len(fuel_data[1:])

Here is the error:

~\AppData\Local\Temp/ipykernel_9000/722415982.py in
3
4 for row in fuel_data [1:]:
----> 5 avg_diesel = float(row[2])
6 diesel_price +=1
7 avg_diesel_sum = diesel_price + avg_diesel

ValueError: could not convert string to float: ‘191.9p’

I am just looking for guidance here please. Is there a way to convert this in python? or is it better to actual convert these values in excel?

Thank you for any help.

Hi @Frankie ,

You can use the replace() to remove the ‘p’.

Click on triangle bullet to view a quick example below:

example
x =  '91.9p' 
y = x.replace('p', "")   # replace the 'p' with a nothing " " using the quotation marks
z = float(y) # now that the p has been removed you can convert to float

x
str (<class 'str'>)
'91.9p'
y
str (<class 'str'>)
'91.9'
z
float (<class 'float'>)
91.9

Try this:

for row in fuel_data [1:]:
    row[2] = row[2].replace("p", "")
    avg_diesel = float(row[2])
    diesel_price +=1
    avg_diesel_sum = diesel_price + avg_diesel

avg_diesel_price = diesel_price / len(fuel_data[1:])
3 Likes

Hi thanks for this, this is very useful to know and makes sense. I have actually change the data in excel. (Removed the p)

But im still getting an error

diesel_price = 0

for row in fuel_data [1:]:
    avg_diesel = float(row[2])
    diesel_price_sum =  diesel_price + avg_diesel
    

avg_diesel_price = diesel_price_sum / len(fuel_data[1:])
 4 for row in fuel_data [1:]:
  5     avg_diesel = (row[2])

----> 6 diesel_price_sum = diesel_price + avg_diesel
7
8

TypeError: unsupported operand type(s) for +: ‘int’ and ‘str’

Not sure why though?

–actually is it because i need to declare a variable for these?

diesel_price_sum , diesel_price, avg_diesel

Thanks

1 Like

You can try printing the type so you can find out which variable is still a string. It is kicking an error because it is trying to add an int + str

You can use print(type(variable_name)) to find out the variable type.

print(type(diesel_price_sum))

3 Likes

Thank you, i found the culprit! but all the data apart from the header is now a float, so im confused why its still a string?

print(type(diesel_price_sum))

<class 'float'>

In [260]:

print(type(diesel_price))

<class 'int'>

In [261]:

print(type(avg_diesel))

<class 'str'>
1 Like

Did you convert line[5] to a float?

1 Like

Yeah it pings another error lol

ValueError                                Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_9000/4005590499.py in <module>
      3 
      4 for row in fuel_data [1:]:
----> 5     avg_diesel = float (row[2])
      6     diesel_price_sum =  diesel_price +  avg_diesel
      7 

ValueError: could not convert string to float: ''
1 Like

Can you copy and post the entire code?

2 Likes

Sure, ive just started so thats it really:

opened_file = open('ShellPrices.csv')
from csv import reader
read_file = reader(opened_file)
# Below we create a list we can use
fuel_data = list(read_file)


#Create an empty varaible and poulate that by using a for loop
diesel_price = 0

for row in fuel_data [1:]:
    avg_diesel = float (row[2])
    diesel_price_sum =  diesel_price +  avg_diesel
    

avg_diesel_price = diesel_price_sum / len(fuel_data[1:])


1 Like

Thank you,

You may have to check your row to make sure no strings are still there. This may be why avg_diesel is not converting to a float.

You can use try/except block to check if the row[2] contains a string:

for row in fuel_data:
        
    try:
        result = float(row[2])
          
    except ValueError:
        print('Value is not convertible to float')
        print(row[2])

After you run this, did you have any values that did not convert?

2 Likes

Hi, thanks for this, interesting

Value is not convertible to float
Diesel
Value is not convertible to float

Value is not convertible to float

Value is not convertible to float

Value is not convertible to float

1 Like

Hi am really sorry, it must be the dataset, i copied it from the web. When i trimmed it down it worked. so its not the code. There must be a corrupted cell ( maybe it was empty cell) I’ve been through it all.

Thanks for the help and advice!

1 Like

Sounds good. If you only had (6) not converting, you could just change value to 0 for those (6) from dataset.

If you want to share dataset, we may can help identify the issue…

try this code to change row[2] value to 0:

for row in fuel_data:
        
    try:
        result = float(row[2])
          
    except ValueError:
 
        row[2] = 0
        print(row[2])
3 Likes

Thanks Casandra,

There were quite alot of empty rows - about 15. So i just removed them from excel and it seems to work :slight_smile:
Thanks for all your help

3 Likes