# Iteration question: data visualisation on the Traffic Dataset

My Code:

    import matplotlib.pyplot as plt

    # `traffic` is the dataframe loaded earlier in the mission
    days = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday']
    traffic_per_day = {}
    for i, day in zip(range(0, 135, 27), days):
        each_day_traffic = traffic[i:i+27]  # the 27 rows starting at row i
        traffic_per_day[day] = each_day_traffic

    # Draw one line per day on the same axes
    for day in days:
        plt.plot(traffic_per_day[day]["Hour (Coded)"],
                 traffic_per_day[day]["Slowness in traffic (%)"],
                 label=day)
    plt.legend()
    plt.show()


Hi,

    for day in days:
        plt.plot(traffic_per_day[day]["Hour (Coded)"],
                 traffic_per_day[day]["Slowness in traffic (%)"],
                 label=day)

1. I don’t understand why we need `traffic_per_day[day]` twice.

2. Why do we connect the dictionary and the column label with chained indexing instead of using `x="Hour (Coded)"`? I tried that and got an error.

    for i, day in zip(range(0, 135, 27), days):
        each_day_traffic = traffic[i:i+27]
        traffic_per_day[day] = each_day_traffic

How does Python know that each block of 27 values belongs to the right day, for example that `0:27` is Monday?

Thank you for the help. Let me know if you have any questions.


Since `traffic_per_day` is a dictionary, `traffic_per_day[day]` is how we access the data stored under the key `day`. Because we want to graph the data for each day, we need to access the dictionary twice per day: once for the x values (the `"Hour (Coded)"` column) and once for the y values (the `"Slowness in traffic (%)"` column) that we want to plot.

I think a lot of the confusion would clear up with a fuller picture of what the dictionary `traffic_per_day` stores and how it was created, so I’ll try to help with that. You may also want to revisit screen #2 of this mission, which explains the original Traffic Behavior Dataset. The first line on that screen has some important information:

Our dataset describes the urban traffic in the city São Paulo from December 14, 2009 to December
18, 2009 — from Monday to Friday.

And this snippet a little further down:

The data was registered from 7:00 to 20:00 every 30 minutes. The `Hour (Coded)` column has
values from `1` to `27`

This means that the entire dataset consists of 135 observations: 27 for each day, Monday to Friday (i.e. 5 * 27 = 135). Therefore the first 27 rows of `traffic` correspond to Monday 14 December 2009, the next 27 rows are for Tuesday, the following 27 are for Wednesday, and so on.

Rather than use a pandas dataframe with the data for each day stacked on top of each other, we use a dictionary to “separate” the data by day. This is the beauty of the dictionary data structure; it allows us to organize our data by day so that we can explicitly access the data based on the day of the week.

I’m not sure I entirely understand your second question, but I suspect it relates to what I just explained above. Note that when we use `traffic_per_day[day]["Hour (Coded)"]` we are accessing a column (`"Hour (Coded)"`) of a dataframe that’s stored in a dictionary (`traffic_per_day`) under a key (`day`).
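If it helps, here is a minimal sketch of that chained lookup split into its two steps. The tiny `monday` dataframe and its values are made up for illustration (3 rows instead of the real 27); only the column names come from the mission.

```python
import pandas as pd

# Hypothetical miniature of one day's data (made-up values, 3 rows instead of 27)
monday = pd.DataFrame({
    "Hour (Coded)": [1, 2, 3],
    "Slowness in traffic (%)": [4.1, 6.6, 8.7],
})
traffic_per_day = {"Monday": monday}

day = "Monday"
day_df = traffic_per_day[day]    # step 1: dictionary lookup gives a DataFrame
hours = day_df["Hour (Coded)"]   # step 2: column lookup gives a Series

# The chained form does both steps in a single expression
same_hours = traffic_per_day[day]["Hour (Coded)"]

print(type(day_df).__name__)     # DataFrame
print(type(hours).__name__)      # Series
print(hours.equals(same_hours))  # True
```

So `"Hour (Coded)"` is still a column name here; it is just applied to whatever dataframe the dictionary hands back for that day.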

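As for your last question, it’s not so much that “python knows” but rather that it’s how the data was organized in the original dataframe (`traffic`). That’s just how the data was originally recorded: 27 consecutive observations for Monday, then 27 for Tuesday, and so on. The loop relies on that order: `range(0, 135, 27)` produces the starting rows 0, 27, 54, 81 and 108, `zip` pairs each of them with the next name in `days`, and `traffic[i:i+27]` takes the 27 rows beginning at that position.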
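To see the pairing on its own, here is a small sketch that uses a plain list of 135 row numbers in place of the real `traffic` dataframe. The names `rows` and `rows_per_day` are stand-ins I made up for illustration, not part of the course code.

```python
days = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday']

# Stand-in for the 135 rows of `traffic` (hypothetical data: just row numbers)
rows = list(range(135))

# range(0, 135, 27) yields the starting row of each day: 0, 27, 54, 81, 108
pairs = list(zip(range(0, 135, 27), days))

rows_per_day = {}
for start, day in pairs:
    rows_per_day[day] = rows[start:start + 27]  # 27 consecutive rows

print(pairs)
print(rows_per_day['Monday'][0], rows_per_day['Monday'][-1])    # 0 26
print(rows_per_day['Tuesday'][0], rows_per_day['Tuesday'][-1])  # 27 53
```

Nothing in the loop checks which day a row belongs to. If the rows in `traffic` were shuffled, the same code would run without complaint and put the wrong rows under each day.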

I hope this clears a few things up and if it doesn’t, let me know and we can try something else.


Hi,

Thank you for the explanation, much appreciated.

I understand most of it now. I think not understanding the dataset was the main reason for my confusion.

My only remaining question was about `traffic_per_day[day]["Hour (Coded)"]`: I thought `"Hour (Coded)"` was a label rather than a column, because in the code on the previous page it was used like a label. I thought the two versions work the same way, except that the code below draws one graph per day, whereas `plt.plot(traffic_per_day[day]["Hour (Coded)"], traffic_per_day[day]["Slowness in traffic (%)"])` shows all the days in one graph.

I’ve attached the code for your reference.

    for day in days:
        traffic_per_day[day].plot.line(x="Hour (Coded)",
                                       y="Slowness in traffic (%)")
        plt.title(day)
        plt.ylim([0, 25])
        plt.show()

Oh, I get it now. I see what you were asking, and yes, I can see how that would be confusing! The method `df.plot.line()` is called on a pandas dataframe, so its `x=` and `y=` arguments expect the names of columns in that dataframe. The function `plt.plot()`, by contrast, knows nothing about the dataframe, so we have to pass the column data itself as arguments for a plot to be generated. In both versions `"Hour (Coded)"` names the same column; the only difference is who does the lookup.

You can think of the method version as being specific to a particular dataframe and the function version as being more generic. For example, with `plt.plot()` we could pass data from different dataframes to be graphed, whereas with `df.plot.line()` we must pass column names from the `df` we are calling it on.
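Here is a rough side-by-side sketch of the two styles. The dataframe `df` and its values are made up for illustration, and the `Agg` backend line is only there so the snippet runs without opening a window; neither comes from the mission.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical data, not the course dataset
df = pd.DataFrame({
    "Hour (Coded)": [1, 2, 3, 4],
    "Slowness in traffic (%)": [4.1, 6.6, 8.7, 9.2],
})

# Method version: pass column *names*; pandas looks them up in `df`
ax_method = df.plot.line(x="Hour (Coded)", y="Slowness in traffic (%)")
method_line = ax_method.get_lines()[0]

# Function version: pass the column *data* itself
plt.figure()
(function_line,) = plt.plot(df["Hour (Coded)"],
                            df["Slowness in traffic (%)"],
                            label="function version")
plt.legend()

# Both approaches end up drawing the same points
print(list(method_line.get_xdata()) == list(function_line.get_xdata()))
print(list(method_line.get_ydata()) == list(function_line.get_ydata()))
```

This is also why calling `plt.plot()` several times inside one loop before `plt.show()` puts every day on the same graph, while each call to `traffic_per_day[day].plot.line()` starts a new figure for that day.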