Confusing statement in the 3rd chapter (Pandas, Histograms) of the guided project in the course 'Exploratory Data Visualization'

https://app.dataquest.io/m/146/guided-project%3A-visualizing-earnings-based-on-college-majors/3/pandas-histograms

The statement I am referring to is here:

“If you’ve looked at the documentation for Series.plot(), you’ll notice there is no way to control the binning strategy for histograms. Luckily, we can control the binning strategy of a histogram using Series.hist() which contains parameters specific to customizing histograms.”

The code given to achieve the results is:
recent_grads['Sample_size'].hist(bins=25, range=(0,5000))

But I am getting the same result with the following code:
recent_grads['Sample_size'].plot(bins=20, range=(0, 5000), kind='hist')
which seems to contradict the statement above.

According to the statement, Series.plot() offers no way to control the binning strategy, yet the line of code above works fine.

Can anyone explain why this is so?

These are just redundancies in the pandas plotting API, which is built on top of matplotlib: there are multiple ways for a user to achieve the same result. With df.plot, you can change the chart type dynamically through the kind parameter. That is harder if you use df.hist.
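To check the redundancy directly, here is a rough sketch that draws the same histogram both ways and compares the bar heights. The data is a made-up stand-in for recent_grads['Sample_size'] (I don't have the course dataset at hand), and the Agg backend is only there so it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no window is needed
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Made-up stand-in for recent_grads['Sample_size']; NOT the course dataset.
rng = np.random.default_rng(0)
sample_size = pd.Series(rng.integers(0, 5000, size=200), name="Sample_size")

fig, (ax1, ax2) = plt.subplots(1, 2)

# Same bins and range passed through both entry points.
sample_size.hist(bins=25, range=(0, 5000), ax=ax1)
sample_size.plot(kind="hist", bins=25, range=(0, 5000), ax=ax2)

# Each bar of a histogram is a Rectangle patch on its Axes.
heights_hist = [p.get_height() for p in ax1.patches]
heights_plot = [p.get_height() for p in ax2.patches]

print(len(heights_hist), len(heights_plot))
print(heights_hist == heights_plot)
```

If both calls really share the same binning, the two lists of heights should match bar for bar. Note that the bins value has to be the same in both calls for the plots to be identical.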

I’m not sure about the binning point. It could simply be an author error, or the API may have been updated so that it now accepts an input it did not accept in the past. I find the fastest way to get an answer is to just run the code and see whether an error comes up. It’s unfortunate that the matplotlib docs rely on *args and **kwargs, which don’t really say what inputs are possible, but that’s the settled solution for now for an ever-changing API with too many inputs, and for compatibility and easy doc updates in libraries that depend on matplotlib (e.g. pandas df.plot).
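To illustrate the **kwargs point, here is a toy sketch in plain Python. The hist and plot functions below are hypothetical stand-ins I wrote for this post, not the real pandas or matplotlib code; they only show how a wrapper whose signature never mentions bins or range can still accept them by forwarding keyword arguments.

```python
def hist(values, bins=10, range=None):
    """Hypothetical histogram helper: returns the count in each bin."""
    # 'range' shadows the builtin on purpose, to mirror the real parameter name.
    lo, hi = range if range is not None else (min(values), max(values))
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        if lo <= v <= hi:
            # The top edge belongs to the last bin.
            idx = min(int((v - lo) / width), bins - 1)
            counts[idx] += 1
    return counts


def plot(values, kind="hist", **kwargs):
    """Hypothetical wrapper: the signature lists neither bins nor range,
    yet both are accepted because **kwargs is forwarded unchanged."""
    handlers = {"hist": hist}
    return handlers[kind](values, **kwargs)


data = [1, 2, 2, 3, 8, 9]
direct = hist(data, bins=3, range=(0, 9))
via_wrapper = plot(data, kind="hist", bins=3, range=(0, 9))
print(direct, via_wrapper)
```

Reading only the signature of plot, you would conclude there is no way to control the binning, which is roughly the situation with docs that list just **kwargs. Running the call is what reveals that the arguments get through.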
