Why do we need to filter my columns again?

Screen Link:

My Code:

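import matplotlib.pyplot as plt
import numpy as np

# norm_reviews is the DataFrame created earlier in the lesson
num_cols = ['RT_user_norm', 'Metacritic_user_nom', 'IMDB_norm', 'Fandango_Ratingvalue', 'Fandango_Stars']

# Ratings of the first film, one value per column in num_cols
bar_heights = norm_reviews[num_cols].iloc[0].values
# x positions of the five bars
bar_positions = np.arange(5) + 0.75

fig, ax = plt.subplots()
ax.bar(bar_positions, bar_heights, 0.5)
plt.show()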

What I expected to happen:

The code is correct, but why do we need bar_heights = norm_reviews[num_cols].iloc[0].values?
Can you please explain what this line does? norm_reviews was made from those columns, so why do we need to filter by them again? And why does iloc[0] give the first row and not, say, the first element of the num_cols list?

I would recommend that you go through the instructions in Step 2 again to see how norm_reviews was created.

The documentation for iloc (especially the examples) should help clarify what iloc is used for. If it’s not clear from the documentation feel free to ask more questions.
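As a rough illustration of what the iloc documentation describes, here is a minimal sketch. The DataFrame toy, its column names, and its values are made up for this example and only stand in for norm_reviews. It shows that the first element of the column list is just a column name, while iloc[0] on the filtered DataFrame is the first row.

```python
import pandas as pd

# Hypothetical toy data standing in for norm_reviews (values are made up).
toy = pd.DataFrame({
    "FILM": ["Film A", "Film B"],
    "score_x": [4.3, 4.0],
    "score_y": [3.55, 3.75],
})
cols = ["score_x", "score_y"]

first_name = cols[0]              # first element of the list: a column name (a string)
first_row = toy[cols].iloc[0]     # first row of the selected columns: a Series
first_col = toy[cols].iloc[:, 0]  # first column, selected by integer position

print(first_name)
print(first_row.values)
print(first_col.values)
```

On a DataFrame, a single integer passed to iloc selects a row by position. Selecting a column by position needs the two-part form iloc[:, 0].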

Hi @the_doctor

I have asked myself the same questions as @malickke2 has. So, based on your comments, it is correct to say that for

bar_heights = norm_reviews[num_cols].iloc[0].values

norm_reviews is a DataFrame (a 2d structure: rows/columns) created from the original DataFrame called reviews.

num_cols is a list specifying which columns we want from norm_reviews. Alternatively, we could have written the list inline, as norm_reviews[['RT_user_norm', 'Metacritic_user_nom', 'IMDB_norm', 'Fandango_Ratingvalue', 'Fandango_Stars']], but that would make the code less readable.

.iloc[0] indicates the integer position we want, in this case position 0. So it is the first element of each column in num_cols, which coincides with the first row of the num_cols selection (?) <- not sure.

.values returns the values as an array, given the description above.

Finally, we assign those values to bar_heights.

What I don’t understand (see below) is why a) returns the name of the movie along with the values whereas b) returns only the values?

a) bar_heights = norm_reviews.loc[0].values

b) bar_heights = norm_reviews[num_cols].iloc[0].values

I appreciate any feedback or tips.

That’s correct!

Yes, iloc is used to get the data from the dataframe at the specified position. So, iloc[0] would be the data at position 0 of the dataframe.

To phrase this in a better way, it would be the first row of norm_reviews for the columns specified in num_cols.

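For a): because you are indexing norm_reviews directly in that code, without selecting num_cols first, every column of the row is returned. The first column of norm_reviews is FILM, so that's why the movie name is returned as well.

The above should answer your b) part too: there, num_cols is selected before iloc[0], so only those rating columns are left in the row.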
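To make the difference between a) and b) concrete, here is a small sketch. Again, toy and its columns are hypothetical stand-ins for norm_reviews, with a default integer index so that loc[0] and iloc[0] point at the same row.

```python
import pandas as pd

# Hypothetical toy data standing in for norm_reviews (values are made up).
toy = pd.DataFrame({
    "FILM": ["Film A", "Film B"],
    "score_x": [4.3, 4.0],
    "score_y": [3.55, 3.75],
})
cols = ["score_x", "score_y"]

row_all = toy.loc[0].values         # like a): every column of row 0, FILM included
row_num = toy[cols].iloc[0].values  # like b): only the columns listed in cols

print(row_all)
print(row_num)
```

Because row_all mixes a string with numbers, it is not a purely numeric array, whereas row_num contains only floats and can be used directly as bar heights.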
