What are the differences between using df.plot(x='', y='', kind="scatter") and ax.scatter() to create a scatter plot?

Hi guys,

I’ve been trying to learn Python systematically. Would you mind helping me understand the two snippets below? Are they different, or do they do the same thing? Thanks a lot.

Code 1:
fig, ax = plt.subplots()
ax.scatter(norm_reviews['Fandango_Ratingvalue'], norm_reviews['RT_user_norm'])

Code 2:
recent_grads.plot(x='Sample_size', y='Median', kind='scatter')


Hello @florah3979

There are various ways of generating plots:

  1. using matplotlib.pyplot.plot
  2. using pd.DataFrame.plot
  3. using pd.Series.plot
  4. etc

Pandas builds its plotting methods on top of matplotlib, which is its default plotting backend. Therefore pd.DataFrame.plot and pd.Series.plot are convenient shortcuts for generating plots.
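To make the "shortcut" idea concrete, here is a minimal sketch. The DataFrame and its column names are made up for illustration, and the Agg backend is selected only so the snippet runs without a display. It shows that the pandas call hands back an ordinary matplotlib Axes, so you can keep customizing the plot with matplotlib afterwards.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no window is needed
import matplotlib.pyplot as plt
import pandas as pd

# Made-up data, only for illustration
df = pd.DataFrame({"a": [1, 2, 3, 4], "b": [2.0, 3.5, 1.0, 4.2]})

fig, ax = plt.subplots()

# Ask pandas to draw on the Axes we created ourselves
returned = df.plot(x="a", y="b", kind="scatter", ax=ax)

# The object pandas returns is a matplotlib Axes, so the usual methods apply
returned.set_title("Sample data")
print(type(returned))
print(returned is ax)
```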


The plots generated by your two snippets will be different, since in code 1 you are using the norm_reviews dataframe and in code 2 you are using the recent_grads dataframe.
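If you point both approaches at the same data, they should draw the same points. Below is a rough sketch of that check. The DataFrame and the column names x_col and y_col are hypothetical, not taken from the course datasets. One small difference you may notice is that the pandas version labels the axes with the column names, while plain ax.scatter() leaves them blank.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no window is needed
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical data standing in for a real dataset
df = pd.DataFrame({"x_col": [1, 2, 3, 4, 5], "y_col": [5, 3, 4, 1, 2]})

fig, (ax_mpl, ax_pd) = plt.subplots(1, 2)

# Approach 1: matplotlib directly
ax_mpl.scatter(df["x_col"], df["y_col"])

# Approach 2: the pandas shortcut, drawn on the second Axes
df.plot(x="x_col", y="y_col", kind="scatter", ax=ax_pd)

# Compare the plotted coordinates of the two scatter collections
pts_mpl = np.asarray(ax_mpl.collections[0].get_offsets())
pts_pd = np.asarray(ax_pd.collections[0].get_offsets())
same_points = np.allclose(pts_mpl, pts_pd)

print(same_points)
print(repr(ax_mpl.get_xlabel()), repr(ax_pd.get_xlabel()))
```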


You can learn more about visualization techniques here:
1. https://www.dataquest.io/blog/tag/data-visualization/
2. https://pandas.pydata.org/docs/user_guide/visualization.html
3. https://pandas.pydata.org/pandas-docs/version/0.13/visualization.html

Hi @florah3979,

I am also going through these chapters now, so I'll try to answer this with my limited knowledge.

In short, I think they are two ways of reaching the same result. Where they differ is in how much control they give you, because they are separate methods.

With df.plot() you plot straight from the DataFrame's columns, and the kind argument picks the plot type (scatter plot, bar plot, etc.). ax.scatter(), on the other hand, is specifically for scatter plots and offers more control over this type of plot.

If you have seen the instructions related to histograms in the guided project you will understand more.

recent_grads['Sample_size'].plot(kind='hist')

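Written like this, the histogram uses the default number of bins (as far as I know, plot() also accepts bins as a keyword if you want to change it). At the same time

recent_grads['Sample_size'].hist(bins=25, range=(0,5000))

hist() is built specifically for histograms and takes histogram options such as bins and range directly.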
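Here is a small sketch comparing the two histogram calls. The numbers are made up and are not the real recent_grads data. As far as I can tell from the pandas docs, bins and range can be passed to both plot(kind='hist') and hist(), so with the same settings both should draw the same number of bars.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no window is needed
import matplotlib.pyplot as plt
import pandas as pd

# Made-up sample sizes, only for illustration
sample = pd.Series([10, 250, 900, 1500, 2200, 3100, 4800], name="Sample_size")

fig, (ax_plot, ax_hist) = plt.subplots(1, 2)

# General-purpose entry point: extra keywords are forwarded to the histogram
sample.plot(kind="hist", bins=25, range=(0, 5000), ax=ax_plot)

# Histogram-specific method with the same settings
sample.hist(bins=25, range=(0, 5000), ax=ax_hist)

# Each bar is a Rectangle patch on its Axes, so counting patches counts bins
n_bars_plot = len(ax_plot.patches)
n_bars_hist = len(ax_hist.patches)
print(n_bars_plot, n_bars_hist)
```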

Hope this helps.

As @info.victoromondi suggested, you can go through the wonderful content he has generously shared with us to understand this better. Then you can teach me :wink:
Happy learning.
