Calculating Total Questions Per Year

Basics-Copy1 (2).ipynb (272.2 KB)

Screen Link:

https://app.dataquest.io/m/469/guided-project%3A-popular-data-science-questions/10/just-a-fad

My Code:

yearly = tps.groupby('Year').agg({"Deep Learning": ['sum', 'size']})
yearly.columns = ["Deep Learning Questions", "Total Questions"]
yearly["Deep Learning Rate"] = yearly["Deep Learning Questions"]\
                               /yearly["Total Questions"]
yearly.reset_index(inplace=True)
print(yearly)

What I expected to happen:

A table of annual deep learning questions, total questions, and deep learning rate.

What actually happened:

No error, but Total Questions is 1 for every year.

Please see attached file for complete code.

@vroomvroom

I think you already grouped the questions by year when you built tps.

The pivot table there uses aggfunc='size' to count the questions in each category for each year:

year = all_quests["CreationDate"].dt.year
all_quests['Year'] = year
all_quests["category"] = all_quests["Tags"].apply(categorize)
tps = all_quests.pivot_table(index=all_quests['Year'],
                             columns=all_quests['category'],
                             aggfunc='size')
print(tps)

So when you called

yearly = tps.groupby('Year').agg({"Deep Learning": ['sum', 'size']})
yearly.columns = ["Deep Learning Questions", "Total Questions"]

tps already has exactly one row per year, so size is 1 for every year, and sum just returns the same Deep Learning counts you got from the first block of code.
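Here is a rough sketch of the effect on made-up data, plus one possible way around it. The five-row all_quests frame and the is_dl helper column are hypothetical, and I build tps with groupby/unstack instead of your exact pivot_table call, assuming the result has the same shape (one row per year, one column per category).

```python
import pandas as pd

# Hypothetical stand-in for all_quests: one row per question.
all_quests = pd.DataFrame({
    "Year": [2018, 2018, 2018, 2019, 2019],
    "category": ["Deep Learning", "Other", "Other",
                 "Deep Learning", "Deep Learning"],
})

# Already aggregated: one row per year, one column per category.
tps = all_quests.groupby(["Year", "category"]).size().unstack(fill_value=0)

# Grouping tps by Year again: each group holds a single row,
# so 'size' is 1 for every year and 'sum' repeats the existing count.
wrong = tps.groupby("Year").agg({"Deep Learning": ["sum", "size"]})
wrong.columns = ["Deep Learning Questions", "Total Questions"]

# One possible fix: aggregate the un-pivoted frame, where each row
# is one question, so 'size' really counts questions per year.
flagged = all_quests.assign(is_dl=all_quests["category"].eq("Deep Learning"))
yearly = flagged.groupby("Year")["is_dl"].agg(["sum", "size"])
yearly.columns = ["Deep Learning Questions", "Total Questions"]
yearly["Deep Learning Rate"] = (yearly["Deep Learning Questions"]
                                / yearly["Total Questions"])
yearly = yearly.reset_index()
print(yearly)
```

If you would rather keep working from tps, summing across its columns with tps.sum(axis=1) should also give the total per year, as long as every question falls into exactly one category.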

Cheers!

Can I get some guidance on this please? I’ve been reworking step 4, have reread the section on Data Aggregation, and have also compared my work with the solution. Here is what I have:

import numpy as np

all_quests['Year'] = year
all_quests["category"] = all_quests["Tags"].apply(categorize)
all_quests = all_quests[all_quests["CreationDate"].dt.year < 2020]
yearly = all_quests.groupby("Year")
questions_year = yearly(tps["Deep Learning"]).count()
all_quests.pivot_table(["Deep Learning", "Total Questions",
                        "Deep Learning Rate"], "Year",
                      aggfunc=np.sum, margins=True)
#yearly.columns = ["Deep Learning Questions", "Total Questions"]
#yearly["Deep Learning Rate"] = yearly["Deep Learning Questions"]\
 #                              /yearly["Total Questions"]
#yearly.reset_index(inplace=True)
print(all_quests)