Why are my graphs overlapped (python)

Screen Link:

My Code:

fig = plt.figure.Figure(figsize=(5,12))

cols=["Sample_size","Median","Employed","Full_time","ShareWomen","Unemployment_rate", "Men", "Women"]

for i in range(0,8):
    ax=fig.add_subplot(8,1,i+1)
    ax=recent_grads[cols[i]].plot(kind="hist")

What actually happened:

Does anyone know how to solve this issue? I wanted to create 8 different subplots

Hi @lauratangwt:

I have not done the mission myself but this article may help you.

Refer to the sample solution.

We’re using r for a couple different things. We can’t use range(0,5) because the fig.add_subplot(4, 1, r) can’t accept 0 as a position in the subplot grid. I think this is the reason why the range started at 1. The problem though is that means we won’t graph the histogram for Sample_size because that’s col[0].
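To make the indexing point concrete, here is a minimal sketch (it uses empty axes with titles only, not the real recent_grads data) of why the loop variable and the subplot position differ by one. Positions in add_subplot start at 1, so a loop over range(0, 4) can still cover cols[0] as long as the position is i + 1.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

cols = ["Sample_size", "Median", "Employed", "Full_time"]
fig = plt.figure(figsize=(5, 12))

# Position 0 is rejected: valid positions run from 1 to rows * cols.
try:
    fig.add_subplot(4, 1, 0)
    zero_rejected = False
except ValueError:
    zero_rejected = True

titles = []
for i in range(0, 4):
    ax = fig.add_subplot(4, 1, i + 1)  # position is 1-based
    ax.set_title(cols[i])              # column index is 0-based
    titles.append(ax.get_title())

print(zero_rejected, titles)
```

Both the sample solution's range(0,4) with r+1 and a range(1,5) with cols[r-1] reach the same four subplots.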

Hi, thanks for replying! But I still don’t get it, because the sample solution used range(0,4) successfully and did not cause the graphs to overlap.

In addition, what was taught to us was using range as well.

AttributeError: 'function' object has no attribute 'Figure'
Are you sure your code runs? I pasted it into the DQ jupyter and got this.

Your code and the solution are different. You used plt.figure.Figure, which should not even work. The solution uses plt.figure().

Hi Hanqi,

Yes, it works on my DQ Jupyter. Regarding plt.figure.Figure, I asked on the community forum before why plt.figure() fails on my DQ Jupyter. Here is the link: Plotting Code not running (python)

I was advised to use plt.figure.Figure().

@lauratangwt

I suspect you are doing import matplotlib as plt instead of import matplotlib.pyplot as plt.
You are using the same alias to represent the library 1 level higher than it is normally named in the docs, making use and discussion difficult.
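A quick way to check which object an alias points at is sketched below. It assumes nothing beyond matplotlib itself. The name figure is a submodule under matplotlib but a function under matplotlib.pyplot, which is why plt.figure.Figure only works when plt is the top-level package.

```python
import inspect

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.figure         # submodule that defines the Figure class
import matplotlib.pyplot as plt  # the alias used throughout the docs

# Under the top-level package, "figure" is a module containing the Figure class.
top_level_is_module = inspect.ismodule(matplotlib.figure)

# Under pyplot, "figure" is a function, so it has no Figure attribute.
pyplot_is_function = inspect.isfunction(plt.figure)
has_figure_class = hasattr(plt.figure, "Figure")

print(top_level_is_module, pyplot_is_function, has_figure_class)
```

If the third value is False in your notebook, plt is the pyplot alias and plt.figure.Figure will fail with the AttributeError quoted earlier.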

That Stack Overflow answer has 2 likes and is 7 years old (I’m not sure it was correct even then), which is reason enough to take it with a grain of salt. I once deleted local files that were not tracked by git (I thought git could not touch untracked files) by following Stack Overflow blindly without understanding the code, so be wary of copying code from SO.

When you do plt.figure.Figure() with plt representing matplotlib, you are actually creating a figure with https://matplotlib.org/api/_as_gen/matplotlib.figure.Figure.html#matplotlib.figure.Figure.add_subplot. This is a different way of creating a figure from using the usual pyplot wrapper interface that the solution uses: https://matplotlib.org/3.3.3/api/_as_gen/matplotlib.pyplot.figure.html.
Figure is a class in a figure.py file in the former, and figure is a function in the pyplot.py file in the latter.

By using Figure() directly, you are using matplotlib in an Object-Oriented manner, while pyplot is using it in state-based manner. You can read about these 2 paradigms in https://realpython.com/python-matplotlib-guide/.

The issue with instantiating the Figure class directly is that the figure lacks the figure manager that helps track state (e.g. the current axes), which matplotlib and pandas depend on.

For example, instead of using plt.show() (with plt properly aliasing pyplot), if you tried figure.show() (where figure = matplotlib.figure.Figure()), it would raise an error. https://github.com/matplotlib/matplotlib/blob/master/lib/matplotlib/figure.py#L2395

If the figure was not created using `~.pyplot.figure`, it will lack a `~.backend_bases.FigureManagerBase`, and this method will raise an AttributeError.
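The docstring's warning can be reproduced with a small sketch, assuming a non-interactive Agg backend so it runs outside a notebook. A Figure built directly from the class has no manager on its canvas, and calling show() on it raises AttributeError.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
from matplotlib.figure import Figure

# A figure created from the class directly, bypassing pyplot.
bare_fig = Figure(figsize=(5, 12))

# pyplot never registered this figure, so its canvas has no manager.
manager = bare_fig.canvas.manager

try:
    bare_fig.show()
    show_error = None
except AttributeError as exc:
    show_error = exc

print(manager, type(show_error).__name__)
```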

The same limitation may surface in other scenarios that we don’t know about yet.

For the below explanations, do

from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"

to display all variables in a cell without print (you can assign matplotlib output to _ if you don’t want output to be polluted by useless objects during normal development, in this case, they are not useless but show us what’s going on).

Also add from matplotlib import _pylab_helpers to help investigate the current state in matplotlib. You can find out which functions are imported from where by searching (Ctrl+F) the matplotlib source code on GitHub (or the matplotlib docs for a nicer UI). This skill is transferable to other libraries like pandas, sklearn or any other Python library.
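As a rough illustration of what _pylab_helpers exposes, the sketch below asks Gcf for the active figure manager before and after a pyplot figure exists. Note that _pylab_helpers is a private module, so treating its behaviour as stable across matplotlib versions is an assumption.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
from matplotlib import _pylab_helpers  # private module; may change between versions

plt.close("all")  # start from a clean pyplot state

# With no pyplot-managed figures, there is no active manager.
before = _pylab_helpers.Gcf.get_active()

# plt.figure() registers the new figure with a manager.
fig = plt.figure()
manager = _pylab_helpers.Gcf.get_active()
tracked_fig = manager.canvas.figure

print(before, tracked_fig is fig)
```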

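import matplotlib.figure
import matplotlib.pyplot as plt
from matplotlib import _pylab_helpers

cols = ["Sample_size", "Median", "Employed", "Full_time"]
fig = matplotlib.figure.Figure(figsize=(5, 12))  # bare Figure, not managed by pyplot

for i in range(0, 4):
    ax = fig.add_subplot(4, 1, i + 1)
    ax  # 1. the axes just added to the bare figure

    recent_grads[cols[i]].plot(kind="hist")  # 2. the axes pandas actually drew on
    plt.gca()  # 3. the current axes according to pyplot

    figManager = _pylab_helpers.Gcf.get_active()
    figManager.canvas.figure  # the figure that pyplot is tracking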

Matplotlib plots by getting a figure, then finding the correct axes in the figure, and drawing artist objects in the axes. The first 3 lines of output in each loop are:

  1. ax, the new axes created by add_subplot
  2. the return value of the recent_grads .... line, which is the axes (currently in focus) that pandas drew on
  3. plt.gca(), the current axes. pandas Series.plot calls plt.gca() automatically when no axes is passed to the ax= parameter (you can confirm this by tracing through def plot_series --> def _gca starting from https://pandas.pydata.org/pandas-docs/version/0.24.2/reference/api/pandas.Series.plot.html)

We can see that the subplot axes was created properly, but it is different from the axes pandas was using. The figure manager did not register that a new axes was created, so pandas kept plotting in the axes returned by plt.gca(). Note how the subsequent plots go into the same axes: they have the same address, 0x7f0012037C18 in the photo (this changes every run, so just compare whether addresses are the same or different). That’s why your plots overlap.

They are also overlapping in this screenshot, but because the x-axis scales are so different, the plot from the 1st loop sits in the tiny bottom-left corner of the 2nd loop’s plot. figManager.canvas.figure shows what’s in the figure as of the current loop.
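The mismatch can be reproduced without pandas or the dataset. The sketch below, which assumes a clean pyplot state and the Agg backend, compares each axes added to a bare Figure with what plt.gca() returns, since plt.gca() is what pandas falls back to when no ax= is given.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
from matplotlib.figure import Figure

plt.close("all")  # start from a clean pyplot state
bare_fig = Figure(figsize=(5, 12))  # not registered with pyplot

gca_ids = set()
same_as_new_axes = []
for i in range(4):
    ax = bare_fig.add_subplot(4, 1, i + 1)
    current = plt.gca()  # creates a pyplot figure on the first call, reuses it afterwards
    gca_ids.add(id(current))
    same_as_new_axes.append(ax is current)

# One shared "current" axes across all loops, never one of the new subplots.
print(len(gca_ids), same_as_new_axes, plt.gcf() is bare_fig)
```

Every histogram drawn through plt.gca() in such a loop would therefore land on the same single axes, while the four subplots in bare_fig stay empty.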

I tried passing the ax to the ax= parameter of series.plot, but the axes showed empty space with no graphs, so maybe this is the wrong way to do it. I don’t know how to attach axes created by add_subplot on an object-oriented figure to the figure manager. Judging by the above warning in the source code, it seems we should not create figures the OOP way but use plt.figure() instead.
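For anyone who wants to retry the ax= route, here is a hedged sketch. It uses a small made-up DataFrame as a stand-in for recent_grads (the values are invented for illustration) and checks for drawn bars by counting patches on each axes. Since pyplot does not know about a bare Figure, the sketch renders it explicitly with savefig rather than relying on automatic display; whether that explains the empty output described above is an assumption, not something confirmed in this thread.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import pandas as pd
from matplotlib.figure import Figure

# Hypothetical stand-in for recent_grads; the numbers are invented.
demo = pd.DataFrame({
    "Sample_size": [36, 7, 3, 16, 289, 17],
    "Median": [110000, 75000, 73000, 70000, 65000, 65000],
})
cols = ["Sample_size", "Median"]

fig = Figure(figsize=(5, 6))  # bare Figure, not managed by pyplot
bar_counts = []
for i, col in enumerate(cols):
    ax = fig.add_subplot(len(cols), 1, i + 1)
    returned = demo[col].plot(kind="hist", ax=ax)  # draw on this specific axes
    bar_counts.append(len(returned.patches))       # histogram bars are patches

# Render explicitly, because pyplot will not display an unmanaged figure.
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
png_bytes = buffer.getvalue()

print(bar_counts, len(png_bytes) > 0)
```

In a notebook, putting fig on the last line of a cell is another way to display such a figure.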

To see how it works properly with plt.figure(),

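import matplotlib.pyplot as plt
from matplotlib import _pylab_helpers

cols = ["Sample_size", "Median", "Employed", "Full_time"]
fig = plt.figure(figsize=(5, 12))  # figure created and managed by pyplot

for r in range(0, 4):
    ax = fig.add_subplot(4, 1, r + 1)
    ax  # 1. the axes just added to the figure
    recent_grads[cols[r]].plot(kind='hist', rot=30)  # 2. the axes pandas drew on
    plt.gca()  # 3. the current axes according to pyplot

    figManager = _pylab_helpers.Gcf.get_active()
    figManager.canvas.figure  # the figure that pyplot is tracking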

See how all 3 axes now refer to the same place (so the axes pandas.plot uses is the same axes produced by the subplot added to the figure). In each loop a new axes is properly added and a new histogram from a dataframe column is drawn in it, rather than everything being drawn in the same old single axes.
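The same check can be run without the dataset. This sketch, assuming a clean pyplot state and the Agg backend, confirms that each axes added to a pyplot-managed figure becomes the one plt.gca() returns.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

plt.close("all")  # start from a clean pyplot state
fig = plt.figure(figsize=(5, 12))  # registered with pyplot

matches = []
for r in range(4):
    ax = fig.add_subplot(4, 1, r + 1)
    matches.append(ax is plt.gca())  # the new subplot is now the current axes

print(matches, plt.gcf() is fig)
```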

The conclusion is: do not touch the OOP API until you need it and know which objects you want to manipulate. Otherwise, doing everything through the plt interface is sufficient for most purposes. Code runs sequentially anyway, so it only needs to work with 1 axes in 1 figure at any time, and we care about how the final plot looks rather than which specific objects were used to create it. It is also a more convenient API, since it fetches the current figure and axes for you (rather than you coding explicitly to find them) when you want to edit formatting details like labels, ticks, etc.
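As a small illustration of that convenience, the sketch below formats each subplot purely through plt calls, which act on whatever axes is current. The values are made up for illustration and are not taken from recent_grads.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

plt.close("all")  # start from a clean pyplot state

# Hypothetical values standing in for the recent_grads columns.
demo_values = {
    "Sample_size": [36, 7, 3, 16, 289],
    "Median": [110000, 75000, 73000, 70000, 65000],
    "Employed": [1976, 640, 648, 758, 25694],
    "Full_time": [1849, 556, 558, 1069, 23170],
}

fig = plt.figure(figsize=(5, 12))
for r, name in enumerate(demo_values):
    fig.add_subplot(4, 1, r + 1)  # becomes the current axes
    plt.hist(demo_values[name])   # drawn on the current axes
    plt.title(name)               # no axes object needed
    plt.xlabel("value")

titles = [ax.get_title() for ax in fig.axes]
print(titles)
```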

Even the lead developer is lamenting the docs, so don’t sweat it. I’m just doing this to learn how OOP design can go bad.


Wow!!! THANK YOU SO MUCH @hanqi, you’re a life saver! appreciate it loads! :smiley: