Hi, I’m working on the Guided Project: Creating An Efficient Data Analysis Workflow and am trying to remove rows with missing data from ‘book_reviews.csv’ (FYI, all the missing values are in the ‘review’ column).
Complete solution to the Guided Project is here: https://github.com/dataquestio/solutions/blob/master/Mission498Solutions.Rmd
According to the solution provided (see the URL above), the code for removing missing data is the following:
complete_reviews = reviews %>% filter(!is.na(review))
But when I run the command, it doesn’t work.
It gives me exactly the same output/dataset as the original ‘book_reviews.csv’ (i.e. none of the rows with missing data are removed).
However, when I run the following code, it does the trick.
complete_reviews <- reviews %>% filter(review != "NA")
Dataquest’s solution doesn’t seem to work because the ‘review’ column is a ‘character’ variable, so the missing values, which were coded as ‘NA’, are treated as the literal string "NA" rather than as true missing values. As a result, is.na() returns FALSE for them.
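To illustrate what I think is happening, here is a minimal sketch on a small made-up data frame (the rows below are hypothetical stand-ins, not the real contents of ‘book_reviews.csv’). It contains one literal string "NA" and one true missing value, so the two filters can be compared side by side:

```r
library(dplyr)

# Hypothetical stand-in for book_reviews.csv (made-up rows):
# row B holds the literal string "NA", row C holds a true missing value.
reviews <- data.frame(
  book   = c("A", "B", "C", "D"),
  review = c("Good", "NA", NA, "Poor"),
  stringsAsFactors = FALSE
)

# is.na() flags only the true missing value, not the string "NA".
is.na(reviews$review)

# Removes row C only; row B (the string "NA") is kept.
kept_by_is_na <- reviews %>% filter(!is.na(review))

# Removes row B, and also row C: NA != "NA" evaluates to NA,
# and filter() drops rows where the condition is NA.
kept_by_compare <- reviews %>% filter(review != "NA")
```

If the real data contains only the string form, the first filter would leave the dataset unchanged, which matches what I’m seeing.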
My question is:
If I still want to use the is.na() function to clean my data, what would the correct R code be?
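One possibility I can think of, as a rough sketch only and assuming the missing values really are stored as the string "NA", is to convert them to true missing values with dplyr’s na_if() first and then filter with is.na(). The data frame below is again a made-up stand-in, not the real file:

```r
library(dplyr)

# Hypothetical stand-in for book_reviews.csv (made-up rows),
# with missing reviews stored as the literal string "NA".
reviews <- data.frame(
  book   = c("A", "B", "C"),
  review = c("Good", "NA", "Poor"),
  stringsAsFactors = FALSE
)

complete_reviews <- reviews %>%
  # Turn the string "NA" into a true missing value.
  mutate(review = na_if(review, "NA")) %>%
  # Now is.na() recognises it, so the row is removed.
  filter(!is.na(review))
```

I’m not sure whether this is the recommended approach, or whether it would be better to handle it when reading the file instead (for example, through the na argument of readr’s read_csv()).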
Many thanks in advance.