Iteration question, data visualisation on Traffic Dataset

Screen Link:
https://app.dataquest.io/m/523/pandas-visualizations-and-grid-charts/9/comparing-graphs

My Code:

days = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday']
traffic_per_day = {}
for i, day in zip(range(0, 135, 27), days):
    each_day_traffic = traffic[i:i+27]
    traffic_per_day[day] = each_day_traffic

for day in days:
    plt.plot(traffic_per_day[day]["Hour (Coded)"],
             traffic_per_day[day]["Slowness in traffic (%)"],
             label=day)

plt.legend()
plt.show()


Hi:
for day in days:
    plt.plot(traffic_per_day[day]["Hour (Coded)"],
             traffic_per_day[day]["Slowness in traffic (%)"],
             label=day)

  1. I don’t understand why we need traffic_per_day[day] twice.

  2. Why do we connect the dictionary and the label using a chained index instead of x="Hour (Coded)"? I tried that and got an error.

for i, day in zip(range(0, 135, 27), days):
    each_day_traffic = traffic[i:i+27]
    traffic_per_day[day] = each_day_traffic

How does Python know that each block of 27 values belongs to the right day, e.g. that 0:27 is Monday?

Thank you for the help, let me know if you have any questions.


Since traffic_per_day is a dictionary, traffic_per_day[day] is how we access the data stored in the dictionary using the key [day]. Since we want to graph the data based on the day, we need to access the data stored in the dictionary twice for each day; once for the x values (values in the "Hour (Coded)" column) and once for the y values (values in the "Slowness in traffic (%)" column) that we want to plot.

I think it’s possible that a lot of your confusion would be cleared up if you had a more complete understanding of what the dictionary traffic_per_day is storing and how it was created. I’ll try to help you with that. You may also want to refer to screen #2 of this mission where they explain the original Traffic Behavior Dataset. The first line on that screen has some important information:

Our dataset describes the urban traffic in the city São Paulo from December 14, 2009 to December
18, 2009 — from Monday to Friday.

As well as this little snippet a little further down:

The data was registered from 7:00 to 20:00 every 30 minutes. The Hour (Coded) column has
values from 1 to 27

This means that the entire dataset consists of 135 observations: 27 for each day, Monday to Friday (i.e. 5 * 27 = 135). Therefore the first 27 rows of traffic correspond to Monday 14 December 2009, the next 27 rows are for Tuesday, the following 27 are for Wednesday, and so on.

Rather than use a pandas dataframe with the data for each day stacked on top of each other, we use a dictionary to “separate” the data by day. This is the beauty of the dictionary data structure; it allows us to organize our data by day so that we can explicitly access the data based on the day of the week.

I’m not sure I entirely understand your second question, but I suspect it relates to what I just explained above. Note that when we use traffic_per_day[day]["Hour (Coded)"] we are accessing a column ("Hour (Coded)") of a dataframe that’s stored in a dictionary (traffic_per_day) using a key ([day]).
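
If it helps, here is a minimal sketch of that two-step lookup. I'm assuming a stand-in traffic dataframe with made-up slowness values (only the shape, 5 days of 27 rows, matches the real dataset), so treat it as an illustration rather than the mission's actual data.

```python
import pandas as pd

# Stand-in for the real `traffic` dataframe: 5 days x 27 half-hour slots.
# The slowness values are invented purely for illustration.
traffic = pd.DataFrame({
    "Hour (Coded)": list(range(1, 28)) * 5,
    "Slowness in traffic (%)": [float(i % 10) for i in range(135)],
})

days = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday"]
traffic_per_day = {}
for i, day in zip(range(0, 135, 27), days):
    traffic_per_day[day] = traffic[i:i + 27]

# Step 1: the dictionary key gives back a whole dataframe for that day.
monday = traffic_per_day["Monday"]
print(type(monday).__name__, monday.shape)

# Step 2: the column name gives back one column (a Series) of that dataframe.
hours = monday["Hour (Coded)"]
print(type(hours).__name__, len(hours))

# The chained form does both steps in a single expression.
same_hours = traffic_per_day["Monday"]["Hour (Coded)"]
print(hours.equals(same_hours))
```

So the "chained index" is really just two ordinary lookups written back to back: first into the dictionary, then into the dataframe that comes out of it.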

It’s not so much that “python knows” but rather that it’s how the data was organized in the original dataframe (traffic). That’s just how the data was originally recorded: 27 consecutive observations for Monday, then 27 for Tuesday, etc…
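
To see the pairing for yourself, here is a small sketch in plain Python (no dataframe needed) that prints which rows the loop assigns to each day. The names ROWS_PER_DAY, TOTAL_ROWS and row_ranges are just ones I made up for the illustration.

```python
days = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday"]
ROWS_PER_DAY = 27   # 7:00 to 20:00 in 30-minute steps
TOTAL_ROWS = 135    # 5 days * 27 rows

# zip() pairs the 1st start row with the 1st day name, the 2nd with the 2nd, ...
# so the day a block gets depends only on its position in the `days` list.
row_ranges = {}
for start, day in zip(range(0, TOTAL_ROWS, ROWS_PER_DAY), days):
    row_ranges[day] = (start, start + ROWS_PER_DAY)

for day, (start, stop) in row_ranges.items():
    # A slice [start:stop] includes start but excludes stop.
    print(f"{day}: rows {start} to {stop - 1}")
```

If the rows in traffic had been recorded in a different order, the loop would still run happily but the labels would be wrong, which is why knowing how the dataset is laid out matters here.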

I hope this clears a few things up and if it doesn’t, let me know and we can try something else. :sunglasses:


Hi,

Thank you for the explanation, much appreciate it.

I understood most of it now. I think not understanding the dataset was the major reason for the confusion.

My only remaining question was about traffic_per_day[day]["Hour (Coded)"]. I thought "Hour (Coded)" was a label rather than a column, because on the previous page it was used as a label in the code. I thought the two approaches worked the same way, except that the code below draws one graph per day, whereas with traffic_per_day[day]["Hour (Coded)"] and traffic_per_day[day]["Slowness in traffic (%)"] all the days are shown in one graph.

I attached the code for your reference.
for day in days:
    traffic_per_day[day].plot.line(x="Hour (Coded)",
                                   y="Slowness in traffic (%)")
    plt.title(day)
    plt.ylim([0, 25])
    plt.show()

Thanks for your help.

Oh, I get it now… I see what you were asking. And yes, I can see how that would be confusing! The method df.plot.line() is called on a pandas dataframe, so its arguments (x= and y=) expect the names of columns from the dataframe it is being called on. With the function plt.plot(), on the other hand, we need to explicitly pass the column data itself as arguments in order for a plot to be generated. You can think of the method version as being specific to a particular dataframe and the function version as being more generic. For example, using plt.plot() we could pass data from different dataframes to be graphed, whereas with df.plot.line() we must pass column names from the df we are calling it on.
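
Here is a rough side-by-side sketch of the two styles. The dataframes df_a and df_b and their column names are made up for the example, and I've set a non-interactive backend only so it runs without a display; you wouldn't need that line in a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Two small made-up dataframes, purely for illustration.
df_a = pd.DataFrame({"hour": [1, 2, 3], "slowness": [4.0, 6.5, 5.0]})
df_b = pd.DataFrame({"hour": [1, 2, 3], "slowness": [7.0, 8.5, 9.0]})

# Method version: called on ONE dataframe, so x and y are column NAMES.
ax_method = df_a.plot.line(x="hour", y="slowness")

# Function version: x and y are the DATA itself, so each call can take
# its columns from a different dataframe and draw on the same figure.
plt.figure()
plt.plot(df_a["hour"], df_a["slowness"], label="df_a")
plt.plot(df_b["hour"], df_b["slowness"], label="df_b")
plt.legend()
```

That is also why the plt.plot() loop in the mission puts all five days on one graph: every call adds another line to the current figure until plt.show() is called.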

I hope this clarifies things a bit more for you and that I haven’t added to the confusion! :sunglasses:


Understood now, thank you very much :+1:
