Data Visualization, Bar Charts, Screen 3

geom_bar(stat = "identity")

‘In the code above, we specify stat = "identity" within the geom_bar() layer. This is because, by default, using geom_bar() creates a bar graph where the height of the bars corresponds to the number of values in the specified y-variable. Using stat = "identity" overrides the default behavior and creates bars equal to the value of the y-variable, the average.’

Please elaborate on the above. I did not understand this concept.

Hi @sharathnandalike. This is a good question. The function argument stat = "identity" is confusing. By default, geom_bar() uses stat = "count": ggplot2 groups the rows by each distinct value of the x-variable and counts the number of observations in each group. For example, if we are talking about movie ratings, one bar would represent all movies with a rating of 4.0 and another bar all movies with a rating of 4.5. (Splitting a continuous variable into ranges, or "bins", is what a histogram does; a bar chart counts exact x values.)

We can see the default counting behavior in ggplot2 when we create a bar chart of movie ratings, like this:

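library(readr)    # read_csv()
library(ggplot2)  # ggplot(), aes(), geom_bar()
reviews <- read_csv("movie_reviews.csv")
# One bar per distinct Rating value; bar height = number of rows with that value
ggplot(data = reviews) +
  aes(x = Rating) +
  geom_bar()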

This results in the following bar chart:

In the bar chart above, we see that ggplot2 did the work for us: it grouped the data by Rating value and counted the number of rows in each group. This is because geom_bar() uses stat_count() by default, which, as stated in the documentation, counts the number of cases at each x position.
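To make the default counting concrete, here is a minimal sketch on a tiny made-up data frame (toy_reviews is hypothetical and is not the real movie_reviews.csv). It compares the counts that dplyr's count() gives with the bar heights that geom_bar() computes, which layer_data() lets us inspect:

```r
library(dplyr)
library(ggplot2)

# Toy data, made up for illustration only (not the course dataset)
toy_reviews <- tibble(Rating = c(3.5, 4.0, 4.0, 4.5, 4.5, 4.5))

# Count the rows for each distinct Rating value by hand
rating_counts <- count(toy_reviews, Rating)
print(rating_counts)

# Default geom_bar(): ggplot2 does the same counting for us
p_count <- ggplot(data = toy_reviews) +
  aes(x = Rating) +
  geom_bar()

# layer_data() returns the values ggplot2 computed for the layer;
# the count column holds the bar heights
print(layer_data(p_count)$count)
```

The two printed sets of counts should match, which is all that the default stat is doing.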

For screen 3, we don't want ggplot2 to count the observations for us, because we have already summarized the data ourselves as the average rating score by site:
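A summary like that could be built roughly as follows. This is only a sketch: the column names Rating_Site and Rating, the name avg_rating, and the toy values are assumptions for illustration, not the actual course data.

```r
library(dplyr)

# Toy data, made up for illustration; column names are assumed
toy_reviews <- tibble(
  Rating_Site = c("Site A", "Site A", "Site B", "Site B"),
  Rating      = c(4.0, 3.0, 2.5, 3.5)
)

# Collapse to one row per site, holding that site's average rating
avg_by_site <- toy_reviews %>%
  group_by(Rating_Site) %>%
  summarize(avg_rating = mean(Rating))

print(avg_by_site)
```

After this step there is exactly one row per site, so counting rows would give a bar of height 1 for every site, which is not what we want to plot.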

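Instead, we want ggplot2 to use the values exactly as they appear in the summarized dataframe. That is what stat = "identity" means: leave the data as is and use the y-variable directly as the bar height. Because of this, we also need to map a y aesthetic. In this case, the "identity" is the computed average rating value for each review site: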
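Here is a minimal sketch of that plot, again using a hypothetical summarized data frame (avg_by_site, with assumed column names and made-up values). It also shows geom_col(), which ggplot2 provides as a shorthand for geom_bar(stat = "identity"):

```r
library(dplyr)
library(ggplot2)

# Hypothetical summarized data frame; values are made up for illustration
avg_by_site <- tibble(
  Rating_Site = c("Site A", "Site B"),
  avg_rating  = c(3.5, 3.0)
)

# stat = "identity": bar height is taken directly from avg_rating
p_identity <- ggplot(data = avg_by_site) +
  aes(x = Rating_Site, y = avg_rating) +
  geom_bar(stat = "identity")

# geom_col() draws the same chart without needing the stat argument
p_col <- ggplot(data = avg_by_site) +
  aes(x = Rating_Site, y = avg_rating) +
  geom_col()

# Bar heights computed by ggplot2 are the averages themselves
print(layer_data(p_identity)$y)
```

If you drop stat = "identity" here while keeping the y aesthetic, geom_bar() will raise an error, because the default count stat computes y itself and does not accept one.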

I hope this helps. Please let me know if you have any follow-up questions.