Exploring rows in DF (Ebay Car Sale Data)

Screen Link: https://app.dataquest.io/m/294/guided-project%3A-exploring-ebay-car-sales-data/4/exploring-the-odometer-and-price-columns

My Code:

for row in autos:
    if autos['price']==0:
        print(row)

What I expected to happen:
I’m looking for the rows with ‘price’==0

What actually happened:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

Please help.

You probably want to look up the column on row rather than on autos, since the loop hands you one item from autos in each iteration.

So I think the condition should be
if row['price']==0:

Let me know if this helps.

I'm afraid not:
[screenshot]

In the referenced DataFrame, I am looking for the rows whose 'price' value == 0.
I know they are there from the output of

autos["price"].value_counts().head()

[screenshot]

Now I want to investigate these rows ('price' == 0) more deeply, so I would like to extract a new DataFrame containing only those rows. I can access them one by one, but I want all of them together in a table.

Please help.

Hello @yiiiija,

Have you tried

autos[autos.price == 0]

?


This returns a dataframe.
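
To see what that expression does step by step, here is a minimal sketch. The small autos frame below is made-up stand-in data, not the real eBay dataset, and the names mask and price0 are only for illustration.

```python
import pandas as pd

# Hypothetical stand-in for the real `autos` DataFrame (made-up values).
autos = pd.DataFrame({
    "name": ["Golf", "Corsa", "Polo", "Astra"],
    "price": [5000, 0, 1200, 0],
})

# Comparing a column to a scalar gives one True/False per row.
mask = autos["price"] == 0

# Indexing with that boolean Series keeps only the rows marked True.
price0 = autos[mask]

print(mask.tolist())          # [False, True, False, True]
print(price0.index.tolist())  # [1, 3]
```

autos.price and autos["price"] select the same column here; the bracket form also works for column names that are not valid Python identifiers.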

Excellent! So clear. Many thanks.

However, in any case, I would like to understand how to get that with a for loop.
Can you kindly help?

@yiiiija May I ask why you would want to do that with a for loop?
A for loop would be inefficient here, since pandas provides boolean-mask filtering on the DataFrame itself, which returns the filtered rows directly.

Just because I'm learning, and I want to try different approaches to organize my thinking better :)

OK.

To check the original code, try:

for row in autos:
    print(type(row), row)

Iterating over a DataFrame directly yields its column names, and they're all strings! Hence the TypeError: string indices must be integers. With row being a string, row['price'] tries to index that string with 'price' rather than with an integer:

<class 'str'> date_crawled
<class 'str'> name
<class 'str'> seller
<class 'str'> offer_type
<class 'str'> price
<class 'str'> ab_test
<class 'str'> vehicle_type
<class 'str'> registration_year
<class 'str'> gearbox
<class 'str'> power_ps
<class 'str'> model
<class 'str'> odometer_km
<class 'str'> registration_month
<class 'str'> fuel_type
<class 'str'> brand
<class 'str'> unrepaired_damage
<class 'str'> ad_created
<class 'str'> nr_of_pictures
<class 'str'> postalcode
<class 'str'> last_seen
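
Going back to the ValueError from the very first snippet: a minimal sketch of where it comes from, assuming a tiny made-up autos frame. The comparison autos['price']==0 produces a whole Series of booleans, and an if statement needs exactly one True or False.

```python
import pandas as pd

# Hypothetical toy data standing in for the real `autos` DataFrame.
autos = pd.DataFrame({"price": [5000, 0, 1200]})

comparison = autos["price"] == 0   # a Series of booleans, not a single bool
print(type(comparison).__name__)   # Series

error_message = None
try:
    if comparison:                 # Python needs one True/False here
        pass
except ValueError as err:
    error_message = str(err)
    print("ValueError:", error_message)

# Reducing the Series to a single boolean makes the test unambiguous.
print(comparison.any())  # True: at least one price is 0
print(comparison.all())  # False: not every price is 0
```

So .any() and .all() answer questions about the column as a whole; they do not select rows.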


To access each row of a DataFrame in a loop, you can use iterrows(), which yields an (index, row) pair on every iteration; each column value of the row is then available by name:

for idx, row in autos.iterrows():
    print(row.price)


That prints every value in the 'price' column. What we want to get out is each entire row with 'price' == 0 (using a loop).

Try

pd.DataFrame([row for idx, row in autos.iterrows() if row.price == 0])


The above first uses a list comprehension to build a list of the rows matching the condition, then converts that list to a DataFrame with the pd.DataFrame() constructor.
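
The two steps can be looked at separately. This is only a sketch on made-up data; the autos frame and the names matches and price0 are assumptions for illustration.

```python
import pandas as pd

# Hypothetical stand-in for the real `autos` DataFrame (made-up values).
autos = pd.DataFrame(
    {"name": ["Golf", "Corsa", "Polo", "Astra"], "price": [5000, 0, 1200, 0]}
)

# Step 1: the comprehension collects each matching row as a Series.
matches = [row for idx, row in autos.iterrows() if row["price"] == 0]
print(len(matches), type(matches[0]).__name__)  # 2 Series

# Step 2: the constructor stacks those Series into a table.
# Each Series keeps its original index label in .name,
# so the labels 1 and 3 carry over to the new DataFrame.
price0 = pd.DataFrame(matches)
print(price0.index.tolist())    # [1, 3]
print(price0.columns.tolist())  # ['name', 'price']
```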

That's nice again! You are brilliant.

However, I'm looking for a loop, and it's blowing my mind… Is it impossible?

The list comprehension form,

[row for idx, row in autos.iterrows() if row.price == 0]

is shorthand for a for loop.

The one-liner above is equivalent to the expanded form below:

autos_price0_list = []

for idx, row in autos.iterrows():
    if row.price == 0:
        autos_price0_list.append(row)

pd.DataFrame(autos_price0_list)

Thank you. So, do you mean I cannot get a table using a for loop, like the one we get when using

autos[autos.price == 0]

All three variations of the code I discussed return a DataFrame.
To check this, use the built-in type():

autos_price0_df = autos[autos.price == 0]

print(type(autos_price0_df))

Output should be <class 'pandas.core.frame.DataFrame'>

autos_price0_list = []

for idx, row in autos.iterrows():
    if row.price == 0:
        autos_price0_list.append(row)

autos_price0_from_for = pd.DataFrame(autos_price0_list)

print(type(autos_price0_from_for))

Output should be <class 'pandas.core.frame.DataFrame'>

autos_price0_handy = pd.DataFrame([row for idx, row in autos.iterrows() if row.price == 0])

print(type(autos_price0_handy))

Output should be <class 'pandas.core.frame.DataFrame'>
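
Beyond the type, you can also check that the three variations pick out the same rows. The sketch below uses a small made-up autos frame, and the names by_mask, by_loop and by_comprehension are only for illustration. It compares index labels and cell values rather than calling DataFrame.equals, because iterrows() does not preserve column dtypes across a row, so the dtypes of the rebuilt frames may not match the original exactly.

```python
import pandas as pd

# Hypothetical stand-in for the real `autos` DataFrame (made-up values).
autos = pd.DataFrame(
    {"name": ["Golf", "Corsa", "Polo", "Astra"], "price": [5000, 0, 1200, 0]}
)

# Variation 1: boolean mask.
by_mask = autos[autos["price"] == 0]

# Variation 2: explicit for loop.
rows = []
for idx, row in autos.iterrows():
    if row["price"] == 0:
        rows.append(row)
by_loop = pd.DataFrame(rows)

# Variation 3: list comprehension.
by_comprehension = pd.DataFrame(
    [row for idx, row in autos.iterrows() if row["price"] == 0]
)

# Compare row labels and cell values against the mask result.
for result in (by_loop, by_comprehension):
    same_labels = result.index.tolist() == by_mask.index.tolist()
    same_values = result.values.tolist() == by_mask.values.tolist()
    print(same_labels, same_values)  # True True
```

One difference to keep in mind: if no row matches, the loop versions end up calling pd.DataFrame([]), which gives an empty DataFrame with no columns, while the mask version keeps the original columns.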