Guided Project R: Creating An Efficient Data Analysis Workflow question

Dear all,

I am having trouble interpreting the column types in the dataset. I ran the code below:

colnames(book_reviews_db)

for (name in colnames(book_reviews_db)) {
  # typeof() returns the internal storage type of the column
  type <- typeof(book_reviews_db[[name]])
  print(type)
}

And the result I got was:
[1] "integer"
[1] "integer"
[1] "integer"
[1] "double"

I am puzzled because the first three columns contain character strings, so I was expecting a character type.
Any idea where my mistake could be? I also tried the class() function, and I get the “factor” class for the first three columns.

Thank you in advance.

Hi @ElisaDellAglio,

I just checked the dataset mentioned in screen 1 and it’s showing that the first 3 columns are strings:

> library(tidyverse)
> reviews <- read_csv("book_reviews.csv")
Parsed with column specification:
cols(
  book = col_character(),
  review = col_character(),
  state = col_character(),
  price = col_double()
)
> for (c in colnames(reviews)) {
+     print(typeof(reviews[[c]]))
+ }
[1] "character"
[1] "character"
[1] "character"
[1] "double"

Can you please recheck it with this dataset?
book_reviews.csv (82.4 KB)

Best,
Sahil

Dear Sahil,

That’s the file. I tried to re-open it in another document and this is what happens:

code:

library(tidyverse)
review <- read.csv("book_reviews.csv")
glimpse(review)

The result of the glimpse function is:

Observations: 2,000
Variables: 4
$ book   <fct> R Made Easy, R For Dummies, R Made Easy, R Made Easy, Secret…
$ review <fct> Excellent, Fair, Excellent, Poor, Great, NA, Great, Poor, Fa…
$ state  <fct> TX, NY, NY, FL, Texas, California, Florida, CA, CA, Texas, N…
$ price  <dbl> 19.99, 15.99, 19.99, 19.99, 50.00, 19.99, 19.99, 19.99, 29.9…

The first three columns are reported as fct (factor) and the last one as dbl (double).

Thank you in advance for any additional help!
Elisa

Observations: 2,000
Variables: 4
$ book   <fct> R Made Easy, R For Dummies, R Made Easy, R Made Easy, Secret…
$ review <fct> Excellent, Fair, Excellent, Poor, Great, NA, Great, Poor, Fa…
$ state  <fct> TX, NY, NY, FL, Texas, California, Florida, CA, CA, Texas, N…
$ price  <dbl> 19.99, 15.99, 19.99, 19.99, 50.00, 19.99, 19.99, 19.99, 29.9…

I finally solved it by converting the factors to characters, after discovering what “factor” actually means.
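
For anyone who lands here with the same question, here is a minimal sketch of what seems to have happened. It assumes an R version older than 4.0.0, where read.csv() turned text columns into factors by default, while read_csv() from readr keeps them as character. The two-row CSV text and the variable names are made up for illustration and are not the real book_reviews.csv.

```r
# Hypothetical two-row stand-in for book_reviews.csv
csv_text <- "book,state,price\nR Made Easy,TX,19.99\nR For Dummies,NY,15.99"

# stringsAsFactors = TRUE reproduces the read.csv() default before R 4.0.0
as_factor <- read.csv(text = csv_text, stringsAsFactors = TRUE)
class(as_factor$book)   # "factor"
typeof(as_factor$book)  # "integer": a factor stores integer codes
levels(as_factor$book)  # the text labels those codes point to

# stringsAsFactors = FALSE keeps the text columns as character vectors
as_text <- read.csv(text = csv_text, stringsAsFactors = FALSE)
typeof(as_text$book)    # "character"
```

So typeof() was reporting the integer codes behind the factor, and class() was reporting the factor itself. Passing stringsAsFactors = FALSE to read.csv(), or using read_csv() as Sahil did, should avoid the conversion step altogether.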

Hi @ElisaDellAglio,

While running this code:

complete_reviews <- complete_reviews %>%
  mutate(
    state = case_when(
      complete_reviews$state == "California" ~ "CA",
      complete_reviews$state == "New York" ~ "NY",
      complete_reviews$state == "Texas" ~ "TX",
      complete_reviews$state == "Florida" ~ "FL",
      TRUE ~ state # ignore cases where it's already a postal code
    )
  )

I get this error:
Error: Problem with mutate() input state.
x must be a character vector, not a factor object.
i Input state is case_when(...).
Run rlang::last_error() to see where the error occurred

I just wanted to know whether this is a package installation problem or whether I need to explicitly convert the factor to character.
I ask because the solution file does not do any type conversion.

Hi! I think you need to explicitly convert the column to character; otherwise, as the error message says, the command cannot run.
I found out how to do it online! I will post my solution at the end of my holiday.

Hi everyone,

I found a convenient way to convert factor columns to character on Stack Overflow: https://stackoverflow.com/questions/2851015/convert-data-frame-columns-from-factors-to-characters/2853231#2853231
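
As a rough base R sketch of the general idea (not necessarily the exact code from that answer), the snippet below converts every factor column of a data frame to character and leaves the other columns alone. The small reviews data frame is invented for illustration.

```r
# Hypothetical stand-in for the data as loaded by read.csv() before R 4.0.0
reviews <- data.frame(
  book   = c("R Made Easy", "R For Dummies"),
  review = c("Excellent", "Fair"),
  state  = c("TX", "NY"),
  price  = c(19.99, 15.99),
  stringsAsFactors = TRUE
)

# Find the factor columns, then replace each one with its character version
is_fct <- sapply(reviews, is.factor)
reviews[is_fct] <- lapply(reviews[is_fct], as.character)

sapply(reviews, typeof)  # book, review, state: "character"; price: "double"
```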

Hope this is helpful!
