Data is not displaying correctly in Python

Screen Link:

My Code:

header_row = ['City', 'Event', 'Date', 'Injuries']

list_of_list = [['Fairfax', 'Protest', '09 May 2020', '4'], ['Herndon', 'Disturbance', '09 May 2020', '2'], ['Reston', 'Protest', '09 May 2020', '4'], ['Vienna', 'Gathering', '10 May 2020', '3'], ['McLean', 'Gathering', '11 May 2020', '4'], ['Falls Church', 'Disturbance', '15 May 2020', '3']]

import datetime as dt

dates, injuries = [], []
for row in list_of_list:
    current_date = dt.datetime.strptime(row[2], '%d %b %Y')
    hurt = int(row[3])
    dates.append(current_date)
    injuries.append(hurt)

import matplotlib.pyplot as plt
plt.style.use('seaborn')
fig, ax = plt.subplots()
ax.plot(dates, injuries, c='red')

ax.set_title('Daily Injuries, May 2020', fontsize=24)
ax.set_xlabel('', fontsize=16)
fig.autofmt_xdate()
ax.set_ylabel("Injuries", fontsize=16)
ax.tick_params(axis='both', which='major', labelsize=16)

plt.show()

What I expected to happen:

What actually happened: My dataset contains individual entries (rows) for every event. So if 10 things happened on 9 May, I will have 10 different entries (rows) for 9 May. When I display the data, I get these vertical lines on days that had several events. I wonder if there’s a way to combine days with several events, so that the visualization reflects the total number of events for that day.

For instance, in the above dataset, three events occurred on 09 May. Therefore, the visualization has a vertical line for 9 May. It looks odd, and the visualization doesn’t showcase the total number for that day, because of the vertical line.

[image|614x500](upload://mCXQiX1NE5EskWsusaLkXQWYJ6G.png) 

As the documentation says, plot() draws the data as lines: it takes one point from your data, connects it to the next point with a line, and so on.

To plot the number of entries that correspond to a particular value, a bar plot is a better fit.

Each vertical bar sits at a date on the x-axis, and its height, read against the y-axis, shows the number of events that took place on that date.
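
To make the idea concrete, here is a minimal sketch of such a bar plot. It assumes the data has already been combined to one value per date; the totals below are summed by hand from the sample rows in the question (09 May: 4 + 2 + 4 = 10), and `draw_bar_chart` is just a name made up for this illustration.

```python
import datetime as dt

import matplotlib.pyplot as plt

# One entry per date. The totals are summed by hand from the sample rows
# in the question (09 May: 4 + 2 + 4 = 10).
dates = [dt.datetime(2020, 5, 9), dt.datetime(2020, 5, 10),
         dt.datetime(2020, 5, 11), dt.datetime(2020, 5, 15)]
total_injuries = [10, 3, 4, 3]


def draw_bar_chart(dates, totals):
    """Draw one bar per date; the bar height is that date's total."""
    fig, ax = plt.subplots()
    ax.bar(dates, totals, color='red')
    ax.set_title('Daily Injuries, May 2020', fontsize=24)
    ax.set_ylabel('Injuries', fontsize=16)
    fig.autofmt_xdate()
    return fig, ax


# Usage:
#   fig, ax = draw_bar_chart(dates, total_injuries)
#   plt.show()
```

Because each date appears only once, there is nothing for a line to connect vertically, and 09 May shows up as a single bar of height 10.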

An example picked at random from Google Images to give you an idea -

NOTE:
You might have to change how your data is stored for the x and y axes of your chart. Make sure that for a particular date (x-axis value), the corresponding y-axis value is the number of events that occurred on that date.

If I print out your injuries and dates lists, I get the following -

[4, 2, 4, 3, 4, 3]

[datetime.datetime(2020, 5, 9, 0, 0), datetime.datetime(2020, 5, 9, 0, 0), datetime.datetime(2020, 5, 9, 0, 0), datetime.datetime(2020, 5, 10, 0, 0), datetime.datetime(2020, 5, 11, 0, 0), datetime.datetime(2020, 5, 15, 0, 0)]

Your dates list contains 2020-05-09 three times, and the corresponding values in injuries are 4, 2, and 4.

Those three points for 2020-05-09 are already in your plot. They are just connected by a line, because that’s what plot() does.

So, if you plan to display this as a bar plot showing that 2020-05-09 has a total of 4 + 2 + 4 = 10 injuries, you will have to modify your injuries and dates lists accordingly.

If you have gone through the content in some of the initial courses, then a frequency distribution might make things easier for you. But you can approach this in different ways as well.

Is there a way to visualize a frequency distribution using plot()?

You can plot a frequency distribution using plot() but it would still have your points connected with lines.

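To give you a comparison, this is what you get if you use plot() with your frequency distribution -

[image: line plot of the frequency distribution]

And this is what you get if you create a bar plot from it -

[image: bar plot of the frequency distribution]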

This is great. My only other question is: How do I combine all of the events which happened on 9 May?

As I said, one way is to create a frequency distribution.

The course content teaches us to create a dictionary for this, and you can do that here. You will need one extra step, though: converting the dictionary's keys and values into lists so that you can plot them. This might help with that step - https://stackoverflow.com/questions/16010869/plot-a-bar-using-matplotlib-using-a-dictionary
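
A rough sketch of the dictionary route, using the sample rows from the question. The name `injuries_by_date` is made up for this example, and it sums the injuries per date; if you want the number of events instead, add 1 rather than `hurt`.

```python
import datetime as dt

# Sample rows from the question: City, Event, Date, Injuries.
list_of_list = [['Fairfax', 'Protest', '09 May 2020', '4'],
                ['Herndon', 'Disturbance', '09 May 2020', '2'],
                ['Reston', 'Protest', '09 May 2020', '4'],
                ['Vienna', 'Gathering', '10 May 2020', '3'],
                ['McLean', 'Gathering', '11 May 2020', '4'],
                ['Falls Church', 'Disturbance', '15 May 2020', '3']]

# Build the frequency distribution: one key per date, summed injuries as value.
injuries_by_date = {}
for row in list_of_list:
    current_date = dt.datetime.strptime(row[2], '%d %b %Y')
    hurt = int(row[3])
    if current_date in injuries_by_date:
        injuries_by_date[current_date] += hurt
    else:
        injuries_by_date[current_date] = hurt

# The extra step: turn keys and values into lists that can be plotted.
# Dictionaries keep insertion order (Python 3.7+), so the lists stay aligned.
dates = list(injuries_by_date.keys())
injuries = list(injuries_by_date.values())

print(injuries)  # [10, 3, 4, 3]
```

The resulting dates and injuries lists can be passed to the plotting code in place of the original ones.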

Or you can continue to work with your current code -

  • If you already have a particular date added to dates then how do you update the value in injuries corresponding to that date if that date reappears as you iterate/loop through the data?
  • If you don’t already have a particular date added to dates, then you add it to dates and add a corresponding value in injuries. That’s what you are already doing.

The first point above is essentially how you would create a dictionary for the frequency distribution. But you just modify that logic so that it is applicable to these two separate lists instead of a single dictionary. Take your time to solve this and learn from this. Good luck!
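
If you get stuck, here is one possible way to apply the two points above to the two lists directly. Treat it as a sketch rather than the only answer; the variable `position` is just a name chosen for this example.

```python
import datetime as dt

# Sample rows from the question: City, Event, Date, Injuries.
list_of_list = [['Fairfax', 'Protest', '09 May 2020', '4'],
                ['Herndon', 'Disturbance', '09 May 2020', '2'],
                ['Reston', 'Protest', '09 May 2020', '4'],
                ['Vienna', 'Gathering', '10 May 2020', '3'],
                ['McLean', 'Gathering', '11 May 2020', '4'],
                ['Falls Church', 'Disturbance', '15 May 2020', '3']]

dates, injuries = [], []
for row in list_of_list:
    current_date = dt.datetime.strptime(row[2], '%d %b %Y')
    hurt = int(row[3])
    if current_date in dates:
        # Date seen before: update the value at the same position in injuries.
        position = dates.index(current_date)
        injuries[position] += hurt
    else:
        # New date: add it, together with its first value.
        dates.append(current_date)
        injuries.append(hurt)

print(injuries)  # [10, 3, 4, 3]
```

For a large dataset the dictionary is the faster choice, since `in` and `index()` on a list scan it from the start each time, but for a few hundred rows either approach is fine.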

Thank you very much for the thoughtful responses and patience. Cheers.