336-5 R Working with Data Frames - Tibble subsetting


Hey everyone, I’ve recently started working on the Data Analyst in R path and was hoping you could help me shed some light on a doubt that has been bugging me.

My doubt has to do with a task in the Working with Data Frames section, namely Tibble subsetting.

I expected unemployment_subset2 <- recent_grads$Unemployment_rate[c(1,3,5)] to return the same result as the accepted solution, unemployment_subset <- recent_grads[c(1,3,5), "Unemployment_rate"]. It doesn't, and I really don't understand why. Any ideas?

Hello @MatteoBrivio. The values returned by your code are correct and match the values returned by the solution code. The difference is the container: the solution code returns a data frame (a one-column tibble), whereas your code returns a vector. Here’s some code to demonstrate this:

college_majors <- recent_grads$Major

unemployment_subset <- recent_grads[c(1,3,5), "Unemployment_rate"]
#  	Unemployment_rate 
#  1 	0.018380527 
#  2 	0.024096386 
#  3 	0.061097712

unemployment_subset2 <- recent_grads$Unemployment_rate[c(1,3,5)]
# num [1:3] 0.0184 0.0241 0.0611

identical(unemployment_subset$Unemployment_rate, unemployment_subset2)
#[1] TRUE (this means the values in each are identical)

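typeof(unemployment_subset)
#[1] "list"

typeof(unemployment_subset2)
#[1] "double"

class(unemployment_subset)
#[1] "tbl_df"     "tbl"        "data.frame"

class(unemployment_subset2)
#[1] "numeric"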

I included the output of each command as comments with #. So unemployment_subset has type list and class data.frame, and because we are dealing with tibbles here, it also carries the special classes tbl_df and tbl. unemployment_subset2, by contrast, is a plain numeric (double) vector. A data frame is a list of equal-length vectors, one per column!
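If you don't have recent_grads loaded, here is a minimal sketch of the same idea in base R only. The toy_grads data frame is a hypothetical stand-in: rows 1, 3 and 5 reuse the rates shown above, while rows 2 and 4 and the major names are made up. Because a plain data.frame would simplify a single column to a vector, the sketch passes drop = FALSE to imitate what a tibble does by default.

```r
# Hypothetical stand-in for recent_grads (rows 2 and 4 are made-up values).
toy_grads <- data.frame(
  Major = c("A", "B", "C", "D", "E"),
  Unemployment_rate = c(0.018380527, 0.117241379, 0.024096386,
                        0.050125313, 0.061097712)
)

# A data frame is a list whose elements are its columns.
is.list(toy_grads)   # TRUE
length(toy_grads)    # 2, the number of columns

# drop = FALSE keeps the result a one-column data frame,
# mimicking the tibble behaviour of [rows, "column"].
as_frame <- toy_grads[c(1, 3, 5), "Unemployment_rate", drop = FALSE]

# $ and [[ ]] both pull the column out as a plain numeric vector.
as_vector  <- toy_grads$Unemployment_rate[c(1, 3, 5)]
as_vector2 <- toy_grads[["Unemployment_rate"]][c(1, 3, 5)]

class(as_frame)    # "data.frame"
class(as_vector)   # "numeric"

# Same values, different containers.
identical(as_frame, as_vector)                    # FALSE
identical(as_frame$Unemployment_rate, as_vector)  # TRUE
identical(as_vector, as_vector2)                  # TRUE
```

So if a checker expects a data frame, use the [rows, "column"] form on the tibble; if you want the bare vector, use $ or [[ ]] and then index into it.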

I hope this helps. Please let me know if you have any questions. Best,
