Guided project-ebay used cars- how to put 2 series into dataframe?

Hi!I am stuck at the last challenge question, can someone help on it? Thank you so much!

I used the df.groupby to calculate average price so that I can compare the average price difference based on damage or not damage cars in the same model cars.

Now I have 2 series, how can I make these 2 series into a datagram so I can read it easily? thank you!!

Screen Link: https://app.dataquest.io/m/294/guided-project%3A-exploring-ebay-car-sales-data/8/storing-aggregate-data-in-a-dataframe

My Code:

withdamage=autos['unrepaired_damage']=='Yes'
damagecars=autos.loc[withdamage,:]
damagecars.groupby('Brand/Model')['price'].mean()
nodamage=autos['unrepaired_damage']=='No'
nodamagecars=autos.loc[nodamage,:]
nodamagecars.groupby('Brand/Model')['price'].mean()

Hi @candiceliu93
One way i think you can do that is by concatenating both series

withdamage=autos['unrepaired_damage']=='Yes'
damagecars=autos.loc[withdamage,:]
dc = damagecars.groupby('Brand/Model')['price'].mean()
nodamage=autos['unrepaired_damage']=='No'
nodamagecars=autos.loc[nodamage,:]
ndc = nodamagecars.groupby('Brand/Model')['price'].mean()

df = pd.concat([dc, ndc], ignore_index = True)

Good luck

HI! thank you for helping. I tried your code. the output shows mix dc and ndc into one columns. I was thinking a dataframe showing 3 colomuns, brand/model, dc mean price, ndc mean price… not sure if it is possible.

Hi again!
Well it is possible to do it. You need to do a left merge on both dataframes

withdamage = autos[autos['unrepaired_damage'] == 'ja']
withdamage = withdamage[['brand', 'price']]
dc = withdamage.groupby('brand')['price'].mean().reset_index()
dc.rename(columns = {'price' : 'dc mean price'}, inplace = True)

nodamage = autos[autos['unrepaired_damage'] == 'nein']
nodamage = nodamage[['brand', 'price']]
ndc = nodamage.groupby('brand')['price'].mean().reset_index()
ndc.rename(columns = {'price' : 'ndc mean price'}, inplace = True)

df = pd.merge(dc, ndc, how='left', left_on='brand', right_on='brand')

And the result would be this

imagen

Hope you understand, good luck!

Thank you for sharing the merge function.
df = pd.merge(dc, ndc, how='left', left_on='brand', right_on='brand') can you help explian why you put left_on and right_ on?

I try the merge function. Error shows “ValueError: can not merge DataFrame with instance of type <class ‘pandas.core.series.Series’>”. :cry:
Thank you for helping!!

please see my code.

damagecars=autos.loc[withdamage,:]
dc=damagecars.groupby('Brand/Model')['price'].mean() 
nodamage=autos['unrepaired_damage']=='No'
nodamagecars=autos.loc[nodamage,:]
ndc=nodamagecars.groupby('Brand/Model')['price'].mean()

dc.rename(columns={'price':'dc mean price'},inplace=True)
ndc.rename(columns={'price':'ndc mean price'},inplace=True)
df=pd.merge(dc,ndc,how='left',left_on='Brand/Model',right_on='Brand/Model')```

Hi,I think i see why i have error showing. I did not put reset_index() when i used the group by.
seems that reset_index() is used to change series into dataframe, can i use for any series? I was thinking of using the pd.series and pd.DataFrame constructors, but it did not work. can you help explaining what are the difference??

Thank you so much!

Hi sorry for not writing an explanation in my first answer

it’s not that reset_index() change the series into a dataframe. It simply reset the index of the DF and use the default. When you call groupby pandas takes the column used to group and uses it as the index of that series/DF, so with reset_index() we avoid that in order to not transform the DF into a series in this case, in that way you end with a DF that have index col, brands col and the mean price col instead of brand col as index and mean price col. And yes you can use it in any series, or any DF. Even in DF with multiple index you can use it to remove one index.

Now in merge we use left_on and right_on to specify the column, columns or index that will we used to join the dataframes. They should be the same in order to not lose any row of information. However that also depends of the method of joining, it can be left, right. I don’t remember in wich course they explain this but i think you are close to it so don’t worry for now.

The difference between pd.DataFrame and pd.Series is that the first one is used when you have two dimensional, mutable data like dictionarys. While pd.Series is used with only one dimensional data like arrays. I think neither are used to create a new DF from two DFs or two Series.

There’s a way to transform a Series into a DF with pandas.Series.to_frame you can read more about it here.

Hope i make myself clear, good luck!

For the reset_index(), if i remove it on my code, and try to print the type, it is said series, but if I add reset_index(), it shows dataframe. does it mean that it takes the column values that we used with groupby(columns) as index in the dataframe?

for the pd.merge(), if we do left join, can we only put left_on="join key'' and ignore the right_on(). like df=pd.merge(dc,ndc,how='left',left_on='Brand/Model')

Understand the pd.DataFrame and pd.Series. thank you for your explanation!!

groupby(coulums) take the specified column as index, reset_index() tells it to forget any index and use the default one wich is 0,1,2,3…
In your case groupby transform your dataframe into a series because it only have two columns, brand and price. But if you apply groupby in a df with multiple columns it will stay as a df

No, you need to specify both left_on and right_on, because if you don’t then pandas will not know wich column in the right will be used to merge. The method specied in the parameter how is the type of merge

  • left: Only use the keys or values that are in the left frame. That means if there’s a value in the left that not appears in the right, the result would be a NaN. But if there’s a value in the right that not exists in the left, it will be ignored
    imagen This is a left join
  • right: Use the keys or values that are in the right frame
    imagen A right join
  • outer: Use union of keys in both dataframes
    imagen Outer join is also known as full join
  • inner: Use intersection of keys in both dataframes
    imagen Inner Join

In this case i used left join after i checked that both dfs had the same values, but it would have been easier to use an outer join

Here you find a good explanation about the difference between every method https://www.analyticsvidhya.com/blog/2020/02/joins-in-pandas-master-the-different-types-of-joins-in-python/

Happy to be helpful