geom_bar(stat = "identity")
‘In the code above, we specify stat = "identity" within the geom_bar() layer. This is because, by default, using geom_bar() creates a bar graph where the height of the bars corresponds to the number of values in the specified y-variable. Using stat = "identity" overrides the default behavior and creates bars equal to the value of the y-variable, the average.’
Please elaborate on the above. I did not understand this concept.
Hi @sharathnandalike. This is a good question. The function argument stat = "identity" is confusing. By default, when a bar chart is created with geom_bar(), ggplot2 splits the data up into “bins” and then counts the number of observations in each “bin”. For a bar chart, each bin is one distinct value of the x variable (unlike a histogram, where a bin covers a range of values). For example, if we are talking about movie ratings, one bin would hold all the movies with a rating of exactly 4.5.
We can see the default binning and counting behavior in ggplot2 when we create a bar chart of movie ratings, like this:
library(readr)
library(ggplot2)
reviews <- read_csv("movie_reviews.csv")
ggplot(data = reviews) +
  aes(x = Rating) +
  geom_bar()
This results in the following bar chart:
In the bar chart above, we see that ggplot2 did the work of splitting the data up into bins and counting the number of Rating scores that fall into each bin.
This happens because geom_bar() uses stat_count() by default: it counts the number of cases at each x position, as stated in the documentation.
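To make the counting step concrete, here is a minimal sketch. It assumes a tiny made-up data frame (toy, with a Rating column) rather than the real movie_reviews.csv, and uses layer_data() from ggplot2 to look at the values computed for the bars.

```r
library(ggplot2)

# Hypothetical toy data standing in for the movie reviews (not the real file)
toy <- data.frame(Rating = c(3.5, 4.0, 4.0, 4.5, 4.5, 4.5))

# Default geom_bar(): stat_count() tallies the rows at each distinct x value
p <- ggplot(data = toy) +
  aes(x = Rating) +
  geom_bar()

# Pull out the counts that ggplot2 computed for the bar heights
counts <- layer_data(p)$count
print(counts)
```

Each bar height here is the number of rows that share a Rating value, not any value stored in the data itself.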
For screen 3, we don’t want to split the data into bins and “count” the observations, because we have already summarized the data as the average rating score by site:
Instead, we want ggplot2 to use each value in the summarized dataframe exactly as it is (its “identity”) for the bar height. In this case, that value is the computed average rating for each review site:
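A minimal sketch of that idea, assuming a made-up summary table: the data frame avg_by_site, its columns Site and avg_rating, and the numbers in it are all hypothetical and only illustrate the shape of the summarized data from the lesson.

```r
library(ggplot2)

# Hypothetical summary table: one row per site with a precomputed average
avg_by_site <- data.frame(
  Site = c("Site A", "Site B", "Site C"),
  avg_rating = c(3.1, 3.8, 4.2)
)

# stat = "identity" tells geom_bar() to skip counting and use y as given
p <- ggplot(data = avg_by_site) +
  aes(x = Site, y = avg_rating) +
  geom_bar(stat = "identity")

# The bar heights are the avg_rating values, unchanged
heights <- layer_data(p)$y
print(heights)
```

Note that stat = "identity" needs a y aesthetic, since there is no count to fall back on. ggplot2 also provides geom_col(), which draws the same kind of chart without the stat argument.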
I hope this helps. Please let me know if you have any follow-up questions.