Guided Project: Visualizing Earnings Based On College Majors - 3/6 Histograms

No notes in the learning and I can’t decipher the solution - can someone please explain the solution regarding histograms, slide 3 of 6, in this guided project?

  • Why are there 2 different for loops (both shown below)?
  • What is range (1,5) and (4,8) referring to in each of them?
    - I assume it’s the columns, but can you explain the notation?
    - I assume also r-3 in the second for loop relates to columns but can you explain this?

cols = ["Sample_size", "Median", "Employed", "Full_time", "ShareWomen", "Unemployment_rate", "Men", "Women"]

fig = plt.figure(figsize=(5,12))
for r in range(1,5):
    ax = fig.add_subplot(4,1,r)
    ax = recent_grads[cols[r]].plot(kind='hist', rot=40)

fig = plt.figure(figsize=(5,12))
for r in range(4,8):
    ax = fig.add_subplot(4,1,r-3)
    ax = recent_grads[cols[r]].plot(kind='hist', rot=40)

Hi Simon. I looked at my project, and I didn’t do it this way. I just generated a separate histogram for each one, like recent_grads['Sample_size'].plot(kind='hist'), in separate cells. I only mention that to let you know that it doesn’t have to be done a certain way. The solution provided by Dataquest gives one alternative that saves retyping the same thing over and over again.

So let’s go through the given solution so that you can use that information on a future project.

  1. There are probably 2 loops because we’re generating a total of 8 graphs, and they’re breaking it up.
  2. range() is a built-in function that generates integers. You can check out this site for more information and the different usages for range(), including the different ways it’s used for loops.
    • range(1, 5) generates the integers 1, 2, 3, 4 (the stop value, 5, is not included). The first time the loop runs, r=1, so it will create the 1st subplot and get the column at index 1 from the cols list. Then the 2nd time around, r=2, and so on. When the loop finishes, the figure will be complete and you’ll have 4 histograms. (I’m thinking maybe it should be cols[r-1] though…)
    • add_subplot(4, 1, r-3) is because we’re creating a new figure with 4 subplots, so we can’t use something like (4, 1, 5): only positions 1 through 4 exist. In the second loop r runs from 4 through 7 to pick the columns, so we subtract 3 to turn it into a subplot position from 1 through 4.
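To make the notation concrete, here is a small trace of which subplot position and which column each value of r selects in the two loops from the question. It only uses the cols list from the post; the helper name trace_loop is made up for illustration and is not part of the Dataquest solution.

```python
cols = ["Sample_size", "Median", "Employed", "Full_time",
        "ShareWomen", "Unemployment_rate", "Men", "Women"]


def trace_loop(start, stop, offset):
    """Return (subplot position, column name) for each r in range(start, stop).

    offset is the number subtracted from r inside add_subplot(4, 1, r - offset).
    Hypothetical helper, only meant for inspecting the loops.
    """
    return [(r - offset, cols[r]) for r in range(start, stop)]


first = trace_loop(1, 5, 0)   # first loop:  add_subplot(4, 1, r)
second = trace_loop(4, 8, 3)  # second loop: add_subplot(4, 1, r - 3)

for position, name in first + second:
    print(position, name)
```

Both loops fill subplot positions 1 to 4, which is all a 4-row, 1-column figure has room for. The trace also shows that "ShareWomen" lands in both figures while "Sample_size" (index 0) is never picked.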

I hope that helps!

Hi April,

Thanks for coming back - I did do it the basic way, just wanted to understand the loop way.

Anyway, some comments:

  • Why break it up, why not just run 8 hist’s in one go if it’s possible? I did try this, but the graphs weren’t very legible, so I assume that’s the reason.
  • range(1,5) isn’t the same logic as range(4,8)? Looks like it should read (5,9) instead but that doesn’t work obvs…
  • I think I understand r-3 …

It looks like you were trying to link a site? It hasn’t come through if you were.

Simon

Ha, you’re right, I forgot to put the link. Here it is.

  • I can’t say for certain, but it was probably broken up to limit the size of the figure. To add more plots, you have to adjust figsize accordingly to accommodate them all. You could probably adjust it to make a 4x2 grid of plots and make figsize=(12,12) or something, but that will make the loop more complicated. (There’s another project where we do that, but I can’t remember which one it is!).

  • While I was writing out my post the same thing about the range() part of the code occurred to me. Something is off about it. I think it’s an error: the last graph of the 1st set is the same as the 1st graph of the 2nd set, so there’s an overlap. It should be as you said, (1, 5) and then (5, 9), paired with cols[r-1] so the lookup stays within indices 0 to 7 of cols. In that case, instead of r-3 it should be r-4.
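For the range() fix, a minimal sketch of what the corrected loops could look like. This is one possible correction, not the official solution. It assumes the column lookup becomes cols[r-1], because cols[r] with range(5, 9) would run past the end of the list. The random DataFrame is a stand-in for the real recent_grads, and title= is added only to label each plot.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

cols = ["Sample_size", "Median", "Employed", "Full_time",
        "ShareWomen", "Unemployment_rate", "Men", "Women"]

# Stand-in data (assumption): random numbers instead of the real recent_grads.
rng = np.random.default_rng(0)
recent_grads = pd.DataFrame({c: rng.random(50) for c in cols})

# First figure: r = 1..4 gives subplot positions 1..4 and column indices 0..3.
fig1 = plt.figure(figsize=(5, 12))
for r in range(1, 5):
    ax = fig1.add_subplot(4, 1, r)
    recent_grads[cols[r - 1]].plot(kind="hist", ax=ax, rot=40, title=cols[r - 1])

# Second figure: r = 5..8 gives positions 1..4 (r - 4) and column indices 4..7.
fig2 = plt.figure(figsize=(5, 12))
for r in range(5, 9):
    ax = fig2.add_subplot(4, 1, r - 4)
    recent_grads[cols[r - 1]].plot(kind="hist", ax=ax, rot=40, title=cols[r - 1])
```

With this pairing each of the 8 columns is plotted exactly once, 4 per figure.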
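For the 4x2 grid idea, here is a rough sketch of how all 8 histograms might go into one figure. It swaps the add_subplot calls for plt.subplots, which returns the whole grid of axes at once, so no index arithmetic is needed. As before, the random DataFrame is only a stand-in for recent_grads, and figsize=(12, 12) is just the guess from the post above.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

cols = ["Sample_size", "Median", "Employed", "Full_time",
        "ShareWomen", "Unemployment_rate", "Men", "Women"]

# Stand-in data (assumption): random numbers instead of the real recent_grads.
rng = np.random.default_rng(0)
recent_grads = pd.DataFrame({c: rng.random(50) for c in cols})

# One figure with 4 rows and 2 columns of axes.
fig, axes = plt.subplots(4, 2, figsize=(12, 12))

# axes is a 4x2 array; .flat walks it row by row, so a single loop
# pairs each of the 8 axes with one column name.
for ax, col in zip(axes.flat, cols):
    recent_grads[col].plot(kind="hist", ax=ax, rot=40, title=col)

fig.tight_layout()  # reduce overlap between titles and tick labels
```

Passing ax=ax tells pandas which axes to draw on, so the loop no longer depends on which subplot was created last.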

That’s great, thanks again