# Trouble When Creating Bars - Bar Plots and Scatter Plots

I was going through the lesson “Creating Bars”, where we begin to learn how to create bar plots, and I’m having a hard time understanding a particular line of code. If someone could explain it to me, that would be super helpful!!

``````
import matplotlib.pyplot as plt
from numpy import arange
num_cols = ['RT_user_norm', 'Metacritic_user_nom', 'IMDB_norm', 'Fandango_Ratingvalue', 'Fandango_Stars']

# Heights: the first row of the selected columns, as a NumPy array
bar_heights = norm_reviews[num_cols].iloc[0].values
# Positions of the five bars along the x-axis
bar_positions = arange(5) + 0.75

fig, ax = plt.subplots()
``````

The code above is used to create a bar plot. I have trouble understanding this line:

``````
bar_heights = norm_reviews[num_cols].iloc[0].values
``````

Given that `norm_reviews` is the sliced dataframe and `num_cols` are the selected columns, what do the `iloc[0]` and `values` portions actually do for the bar heights?

Thank you so much!

The bar graph we’re creating on this screen plots the information in the first row (`.iloc[0]`). Using `.values` takes the information in that row and turns it into an array that matplotlib can use to plot the height of each bar. (You can see the result in the variable inspector for `bar_heights`.) Just for fun, you can change it to `.iloc[1]` (or whichever row you like) and see it make a bar plot for the next row.
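To see the two steps separately, here is a minimal sketch using a tiny made-up dataframe in place of `norm_reviews` (the name `toy_reviews` and all of its numbers are hypothetical, not the lesson's data):

```python
import pandas as pd

# Hypothetical stand-in for norm_reviews; the ratings are invented.
toy_reviews = pd.DataFrame({
    "RT_user_norm": [4.3, 4.0],
    "Metacritic_user_nom": [3.55, 3.75],
    "IMDB_norm": [3.9, 3.55],
    "Fandango_Ratingvalue": [4.5, 4.5],
    "Fandango_Stars": [5.0, 5.0],
})
num_cols = list(toy_reviews.columns)

# .iloc[0] selects the first row by position and returns a Series
# whose index labels are the column names.
first_row = toy_reviews[num_cols].iloc[0]

# .values drops the labels and keeps only the numbers as a NumPy array.
bar_heights = first_row.values

print(type(first_row).__name__)    # Series
print(type(bar_heights).__name__)  # ndarray
print(bar_heights.shape)           # (5,)
```

Swapping `.iloc[0]` for `.iloc[1]` in this sketch would pick the second made-up row instead.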

Thanks again @april.g!!

Hey @april.g, I experimented without `.values` and it returned the same bar chart, working perfectly fine. I even submitted the result and it was accepted by the code checker. Any comment on why it would be good coding practice to use `.values` to return an array? Thanks!

Yes, that's correct: it returned the same bar plot for me too. @Sahil @april.g, can you let us know whether this is the correct way to code it?
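One likely reason both versions draw the same chart is that a pandas Series can be converted to a NumPy array, so code that converts its inputs ends up with the same numbers either way. Here is a minimal sketch of that equivalence, using a made-up row in place of `norm_reviews[num_cols].iloc[0]` (the values are hypothetical, and this only shows the conversion, not matplotlib's internals):

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for norm_reviews[num_cols].iloc[0]; values are invented.
row = pd.Series(
    [4.3, 3.55, 3.9, 4.5, 5.0],
    index=["RT_user_norm", "Metacritic_user_nom", "IMDB_norm",
           "Fandango_Ratingvalue", "Fandango_Stars"],
)

as_array = np.asarray(row)  # conversion an array-consuming function can apply to a Series
explicit = row.values       # the explicit conversion used in the lesson
modern = row.to_numpy()     # method form of the same conversion

print(np.array_equal(as_array, explicit))  # True
print(np.array_equal(explicit, modern))    # True
```

Converting explicitly makes it obvious that only the numbers, not the index labels, are being handed to the plotting call. The pandas documentation recommends `.to_numpy()` over `.values` for this.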

Generally, using `.values` can be considered good practice, because it returns an `ndarray`, which is often faster to work with than a pandas data structure. Here are my observations on the performance difference; the timings are below. When the data is large, pandas was more efficient for arithmetic operations on the data. When the data is small, the ndarray was faster. For indexing, however, the ndarray was far faster than the pandas data structure regardless of data size.

Jupyter QtConsole 4.6.0

Python 3.7.6 (default, Jan 8 2020, 13:42:34)

IPython 7.12.0 -- An enhanced Interactive Python. Type '?' for help.

In : import numpy as np

...: import pandas as pd

In : ndarray_100m = np.random.randint(0, 9, 100000000)

...: series_100m = pd.Series(ndarray_100m)

...: series_ndarray_100m = series_100m.values

...: ndarray_10k = np.random.randint(0, 9, 10000)

...: series_10k = pd.Series(ndarray_10k)

...: series_ndarray_10k = series_10k.values

In : %timeit series_100m ** 1.61803398875

735 ms ± 9.45 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In : %timeit series_ndarray_100m ** 1.61803398875

2.61 s ± 30.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In : %timeit series_10k ** 1.61803398875

385 µs ± 10 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In : %timeit series_ndarray_10k ** 1.61803398875

216 µs ± 1.81 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In : %timeit series_100m

46.9 µs ± 22.3 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)

In : %timeit series_ndarray_100m

168 ns ± 2.4 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

In : %timeit series_10k

16.6 µs ± 3.15 µs per loop (mean ± std. dev. of 7 runs, 100000 loops each)

In : %timeit series_ndarray_10k

169 ns ± 2.11 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
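For anyone who wants to try a scaled-down version of the indexing comparison outside IPython, here is a rough sketch using the standard `timeit` module. The element position (5000), the seed, and the repeat count are my own assumptions rather than the settings used in the session above, and the timings will differ from machine to machine:

```python
import timeit

import numpy as np
import pandas as pd

# Same idea as the 10k case above: random integers in [0, 9).
rng = np.random.default_rng(0)
arr = rng.integers(0, 9, 10_000)
ser = pd.Series(arr)

# Both containers hold the same element at the same position.
same_value = ser.iloc[5000] == arr[5000]

# Time 10,000 single-element lookups on each container.
series_time = timeit.timeit(lambda: ser.iloc[5000], number=10_000)
array_time = timeit.timeit(lambda: arr[5000], number=10_000)

print(f"same value: {same_value}")
print(f"Series lookup:  {series_time / 10_000 * 1e9:.0f} ns per access")
print(f"ndarray lookup: {array_time / 10_000 * 1e9:.0f} ns per access")
```

For a five-bar plot like the one in the lesson, a difference of this size is unlikely to be noticeable; it starts to matter when the lookup sits inside a loop.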

Here is a good article on performance comparison:

Best,
Sahil
