Mission 294-7 : Guided Project - Exploring eBay Car Sales Data

While I was working on this guided project, I encountered this error which I couldn’t understand.

My Code:

brand_mean_prices = {}

for brand in popular_brands:
    brand_only = autos[autos['brand'] == brand]
    mean_price = brand_only['price'].mean()
    brand_mean_prices[brand] = int(mean_price)

brand_mean_prices

What I expected to happen: I thought my code would run smoothly, since I had already converted the values of the price column to integers.

What actually happened:

TypeError                                 Traceback (most recent call last)
/dataquest/system/env/python3/lib/python3.4/site-packages/pandas/core/nanops.py in f(values, axis, skipna, **kwds)
    127                 else:
--> 128                     result = alt(values, axis=axis, skipna=skipna, **kwds)
    129             except Exception:

/dataquest/system/env/python3/lib/python3.4/site-packages/pandas/core/nanops.py in nanmean(values, axis, skipna)
    355     count = _get_counts(mask, axis, dtype=dtype_count)
--> 356     the_sum = _ensure_numeric(values.sum(axis, dtype=dtype_sum))
    357 

/dataquest/system/env/python3/lib/python3.4/site-packages/numpy/core/_methods.py in _sum(a, axis, dtype, out, keepdims)
     31 def _sum(a, axis=None, dtype=None, out=None, keepdims=False):
---> 32     return umr_sum(a, axis, dtype, out, keepdims)
     33 

TypeError: Can't convert 'int' object to str implicitly

During handling of the above exception, another exception occurred:

TypeError                                 Traceback (most recent call last)
<ipython-input-36-5fe60424104b> in <module>()
      3 for brand in popular_brands:
      4     brand_only = autos[autos['brand'] == brand]
----> 5     mean_price = brand_only['price'].mean()
      6     brand_mean_prices[brand] = int(mean_price)
      7 

/dataquest/system/env/python3/lib/python3.4/site-packages/pandas/core/generic.py in stat_func(self, axis, skipna, level, numeric_only, **kwargs)
   7313                                       skipna=skipna)
   7314         return self._reduce(f, name, axis=axis, skipna=skipna,
-> 7315                             numeric_only=numeric_only)
   7316 
   7317     return set_function_name(stat_func, name, cls)

/dataquest/system/env/python3/lib/python3.4/site-packages/pandas/core/series.py in _reduce(self, op, name, axis, skipna, numeric_only, filter_type, **kwds)
   2575                                           'numeric_only.'.format(name))
   2576             with np.errstate(all='ignore'):
-> 2577                 return op(delegate, skipna=skipna, **kwds)
   2578 
   2579         return delegate._reduce(op=op, name=name, axis=axis, skipna=skipna,

/dataquest/system/env/python3/lib/python3.4/site-packages/pandas/core/nanops.py in _f(*args, **kwargs)
     75             try:
     76                 with np.errstate(invalid='ignore'):
---> 77                     return f(*args, **kwargs)
     78             except ValueError as e:
     79                 # we want to transform an object array

/dataquest/system/env/python3/lib/python3.4/site-packages/pandas/core/nanops.py in f(values, axis, skipna, **kwds)
    129             except Exception:
    130                 try:
--> 131                     result = alt(values, axis=axis, skipna=skipna, **kwds)
    132                 except ValueError as e:
    133                     # we want to transform an object array

/dataquest/system/env/python3/lib/python3.4/site-packages/pandas/core/nanops.py in nanmean(values, axis, skipna)
    354         dtype_count = dtype
    355     count = _get_counts(mask, axis, dtype=dtype_count)
--> 356     the_sum = _ensure_numeric(values.sum(axis, dtype=dtype_sum))
    357 
    358     if axis is not None and getattr(the_sum, 'ndim', False):

/dataquest/system/env/python3/lib/python3.4/site-packages/numpy/core/_methods.py in _sum(a, axis, dtype, out, keepdims)
     30 
     31 def _sum(a, axis=None, dtype=None, out=None, keepdims=False):
---> 32     return umr_sum(a, axis, dtype, out, keepdims)
     33 
     34 def _prod(a, axis=None, dtype=None, out=None, keepdims=False):

TypeError: Can't convert 'int' object to str implicitly

As I mentioned above, I had already converted the type of the price column, so that shouldn't be what is causing this error. It seems that Python has a problem with the last line of code in the for loop ( brand_mean_prices[brand] = int(mean_price) ), but the code looks fine to me.

I've uploaded the whole code file here.
eBay Car Sales_Data Cleaning_idenk9725.ipynb (203.8 KB)

Hi @idenk9725. This snippet of code isn’t causing an issue for me. The error message is pointing to the mean_price = brand_only['price'].mean() line but I can’t tell why. Could you upload a copy of your .ipynb file so I can see what might be going on?
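One way to narrow this down is to look at what the column actually holds before calling .mean(). The sketch below uses a small made-up Series, not the real autos data, as a stand-in for a price column that ended up holding strings and NaN; the values are assumptions for illustration only.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for a 'price' column that no longer holds integers.
# dtype=object is set explicitly so the example behaves the same across
# pandas versions.
price = pd.Series(["2016-03-26", np.nan, "2016-03-12"], name="price", dtype=object)

# An integer column would report int64 here; this one reports object.
print(price.dtype)

# Count the Python type of each value to see what is really stored.
kinds = price.map(lambda value: type(value).__name__).value_counts()
print(kinds)

# A quick yes/no check that can be run before calling .mean().
is_numeric = pd.api.types.is_numeric_dtype(price)
print(is_numeric)
```

If the dtype is object and the values are strings, the conversion to integers was most likely undone by a later cell, and that cell is the place to look.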

Hey @april.g,

thanks to the solution file and the shared project by @bhavya, I was able to figure out what the problem was.

While replacing the original price column with the cleaned one (extra characters removed and the type converted from string to integer, as suggested in the guidelines), I mistakenly wrote this:

autos["price"] = autos[autos["price"].between(1,351000)]

which put the wrong values into the price column.

The right code should be:

autos = autos[autos["price"].between(1,350000)]

So the original problem is solved, but I really wonder what the ‘wrong’ code actually does. To me it seems to work like a join statement in SQL, but I’m not sure.

I uploaded the whole code file in the original post. Please scroll down to the bottom of the file until you find the section ‘Comparing the right code and the mistake’. The last three lines of code at the bottom show the result of running the ‘wrong’ code, with comments.

Thank you for reaching out to help me!

Ah, I’m glad you were able to figure out what the problem was!

What happened is that autos[autos["price"].between(1,351000)] returns a dataframe with all the columns of autos but only the rows whose price is between the two values. autos['price'] is a single column, so when you assign a whole dataframe to it, only one column fits (the first one). The NaN values most likely show up in that column for the rows that didn’t have a price between 1 and 351000. That would also explain the TypeError: the price column no longer held integers, so .mean() failed when it tried to sum the values. That’s my best guess at what’s going on, anyway.
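Here is a minimal sketch of the first part of that explanation, using a tiny made-up dataframe. The rows and the dateCrawled column are assumptions for illustration, not the real dataset. It shows that the boolean filter returns a whole dataframe rather than a single column, and that reassigning autos itself keeps price numeric.

```python
import pandas as pd

# Hypothetical three-row stand-in for the autos dataframe.
autos = pd.DataFrame({
    "dateCrawled": ["2016-03-26", "2016-04-04", "2016-03-12"],
    "price": [5000, 0, 8500],
    "brand": ["bmw", "audi", "bmw"],
})

# Boolean mask: True where the price is within the accepted range.
in_range = autos["price"].between(1, 351000)

# Indexing with the mask keeps every column and drops the rows
# where the mask is False.
filtered = autos[in_range]
print(type(filtered).__name__)  # DataFrame
print(filtered.shape)           # (2, 3)

# Reassigning the whole dataframe filters the rows and leaves the
# price column as integers, so .mean() works.
autos = autos[in_range]
mean_price = autos["price"].mean()
print(mean_price)               # 6750.0
```

Because the right-hand side is a dataframe, it belongs on the left as autos, not as autos["price"].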

I never expected that Python would actually save a dataframe into one column without raising any error message. I think it was a good lesson to learn.
Your explanation has been a great help for me. :grin: Thanks a lot!