
Guided Project 2 - Book profitability

Hello!

https://app.dataquest.io/m/498/guided-project%3A-creating-an-efficient-data-analysis-workflow

library(dplyr)
library(forcats)
library(ggplot2)

reviews <- read.csv("book_reviews.csv", stringsAsFactors = TRUE)

summary(reviews)
sapply(reviews, typeof)
sapply(reviews, function(x) sum(is.na(x)))
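Because `read.csv()` is called with `stringsAsFactors = TRUE`, the text columns become factors, and `typeof()` reports their storage mode (`"integer"`) rather than their class. Below is a minimal sketch on a small hypothetical data frame (toy values, not the project's `book_reviews.csv`). It compares `typeof()`, `class()` and a per-column count of missing values:

```r
# Hypothetical toy data standing in for book_reviews.csv
toy <- data.frame(
  book  = c("R For Dummies", "Secrets Of R For Advanced Students", NA),
  price = c(15.99, 50.00, 29.99),
  stringsAsFactors = TRUE
)

# typeof() shows the storage mode: factors are stored as integer codes
storage <- sapply(toy, typeof)

# class() shows how R actually treats each column
classes <- sapply(toy, class)

# Missing values per column (is.na() on a data frame gives a logical matrix)
n_missing <- colSums(is.na(toy))

print(storage)    # book: "integer", price: "double"
print(classes)    # book: "factor",  price: "numeric"
print(n_missing)  # book: 1, price: 0
```

For factor columns, `class()` is usually the more informative check.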

library(naniar)
gg_miss_var(reviews, state)
gg_miss_var(reviews, book)

reviews <- na.omit(reviews)
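`na.omit()` drops every row that has a missing value in any column, so it is worth checking how much data that removes. Here is a small sketch on hypothetical toy data using base R's `complete.cases()`:

```r
# Hypothetical toy data with scattered missing values
toy <- data.frame(
  state  = c("TX", "California", NA, "NY", "Florida"),
  review = c("Good", NA, "Poor", "Great", "Excellent"),
  price  = c(15.99, 50.00, 29.99, 19.99, 39.99)
)

# TRUE for rows with no missing value in any column
keep <- complete.cases(toy)
cat("Rows kept:", sum(keep), "of", nrow(toy), "\n")
cat("Share dropped:", round(1 - mean(keep), 2), "\n")

# na.omit() keeps exactly the complete rows
clean <- na.omit(toy)
stopifnot(nrow(clean) == sum(keep))
```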

unique(reviews$state)

reviews %>%
  mutate(state = fct_recode(state, Texas = "TX", California = "CA",
                            Florida = "FL", `New York` = "NY")) %>%
  # Order review levels from worst to best so as.numeric() gives a 1-5 score
  mutate(review = fct_relevel(review, "Poor", "Fair", "Good", "Great", "Excellent"),
         rev_num = as.numeric(review),
         high_review = rev_num > 3) -> reviews
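Both steps in the recoding above depend on how factor levels are handled. Abbreviations must map onto the existing full names, and the level order determines the numeric score. The following base-R sketch shows the same two ideas on hypothetical sample values. The scale order used here (Poor < Fair < Good < Great < Excellent) is an assumption, so check it against the project instructions:

```r
# Hypothetical raw values, mixing abbreviations and full state names
state <- c("TX", "Texas", "CA", "New York", "FL", "NY")

# Lookup table: abbreviation -> full name
full_names <- c(TX = "Texas", CA = "California", FL = "Florida", NY = "New York")

# Replace abbreviations and leave values that are already full names untouched
state_clean <- ifelse(state %in% names(full_names), full_names[state], state)
print(state_clean)

# Hypothetical review labels
review <- c("Good", "Excellent", "Poor", "Great", "Fair", "Great")

# Assumed ordinal scale, worst to best
scale_levels <- c("Poor", "Fair", "Good", "Great", "Excellent")
review_f <- factor(review, levels = scale_levels)

# as.numeric() on a factor returns each value's position in the level order
rev_num <- as.numeric(review_f)
high_review <- rev_num > 3  # TRUE for "Great" and "Excellent"

print(rev_num)      # 3 5 1 4 2 4
print(high_review)  # FALSE TRUE FALSE TRUE FALSE TRUE
```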
     
# Total revenue per book (each sale weighted by its price)
ggplot(reviews, aes(fct_reorder(book, price, .fun = sum))) +
  geom_bar(aes(weight = price)) +
  scale_y_continuous(labels = scales::label_dollar()) +
  labs(x = "Book", y = "Revenue")

reviews %>% group_by(book) %>%
  summarise(retail = unique(price),            # list price (one per book)
            total_rev = sum(price),
            av_rating = mean(rev_num),
            prop_high_rev = mean(high_review), # share of reviews rated 4 or 5
            n_sold = n()) -> r


# One panel per summary statistic, one bar per book
tidyr::pivot_longer(r, cols = -book) %>%
  ggplot() +
  geom_col(aes(name, value, fill = book), position = "dodge") +
  facet_wrap(~ name, scales = "free")
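In the revenue bar chart, each sale is weighted by its price, so bar height is total revenue per book. By contrast, `n()` in the summary counts copies sold. The two measures can rank books differently, as the notes below show: "Secrets of R" earns the most, while "Fundamentals of R" sells the most. Here is a base-R sketch of the distinction on hypothetical toy data:

```r
# Hypothetical toy sales: one row per copy sold
sales <- data.frame(
  book  = c("A", "A", "B", "B", "B", "C"),
  price = c(50, 50, 15.99, 15.99, 15.99, 25)
)

# Total revenue per book: sum of price over every sale of that book
revenue <- tapply(sales$price, sales$book, sum)

# Copies sold per book: number of rows for that book
units <- table(sales$book)

print(sort(revenue, decreasing = TRUE))  # A earns the most
print(units)                             # B sells the most copies
```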

#book that generated the most revenue: "Secrets of R"
#most sold: "Fundamentals of R"
#highest share of high reviews: "Fundamentals of R"

#raise the price?

#some tests

chisq.test(table(reviews$book, reviews$high_review))
chisq.test(table(reviews$state))
library(rstatix)
anova_test(reviews, rev_num ~ book)

#no significant association between book and high reviews
#no significant difference in mean rating between book titles
#no significant difference in sales across states

# :)
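`rev_num` is an ordinal score, so a one-way ANOVA on it assumes the steps between rating levels are equal. A rank-based Kruskal-Wallis test (base R's `kruskal.test()`) is a common cross-check. The sketch below swaps it in on simulated ratings, not the project data. It also shows that `chisq.test()` on a single vector of counts is, by default, a goodness-of-fit test against equal proportions. That is what the state test above checks. The state counts are hypothetical:

```r
# Simulated ratings (1 = Poor ... 5 = Excellent) for three hypothetical books
set.seed(42)
ratings <- data.frame(
  book    = factor(rep(c("A", "B", "C"), each = 30)),
  rev_num = sample(1:5, 90, replace = TRUE)
)

# Kruskal-Wallis: do the rating distributions differ between books?
kw <- kruskal.test(rev_num ~ book, data = ratings)
print(kw$p.value)

# Goodness-of-fit: are sales spread evenly across states?
state_counts <- c(California = 25, Florida = 22, `New York` = 27, Texas = 26)
gof <- chisq.test(state_counts)  # default p: equal proportions for each state
print(gof$expected)              # 25 for every state
print(gof$p.value)
```

A non-significant result here means the data give no evidence of a difference. It does not prove the absence of an effect.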

analysis.pdf (167.7 KB)


Hi @teorems

Thanks for submitting a project, and welcome to the DQ community! I haven’t worked with R, but I am glad you uploaded a PDF version of the project.

Are you looking for feedback on the project? If so, may I suggest a few things before other DQ community members do? Let me know.

Hello there. Of course, you’re welcome! Sorry about the PDF: it is just the “knitted” output of the script and doesn’t really do justice to RStudio’s wonderful productivity features! It was just a quick-and-dirty take on the project. I’m not a beginner, but I have free access to the platform, so I’m skimming through the material for fun. Let me know what you think.


Hi @teorems

Okay. I am actually glad I asked before posting anything about the project. Since you are already familiar with R, I will leave it be.

The PDF let me simply open the document and follow the workflow without installing or configuring anything specifically for R.

Hope you enjoy your explorations with DQ! 🙂