Mode, Screen 2 - Exercise

In the exercise, the instruction is to write code for a function, but the same thing may be done as shown in Learn, i.e. table(houses$`Land Slope`).

Why do we have to write a function?

Hello @sharathnandalike,

This is an excellent question.

A function is needed here because we have a sequence of three instructions that we want to reuse several times. To calculate the mode, after calling table(houses$`Land Slope`) we still have to sort the output and then take the first element.
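As a rough sketch of that point, here are the three steps done by hand with table(), then wrapped in a function so they can be reused on another column. The data frame houses_demo, its values, and the name mode_from_table() are made up for illustration and are not part of the course material.

```r
# Toy stand-in for the course's `houses` data frame (values are made up)
houses_demo <- data.frame(
  `Land Slope` = c("Gtl", "Gtl", "Mod", "Gtl", "Sev"),
  `Roof Style` = c("Hip", "Gable", "Gable", "Gable", "Hip"),
  check.names = FALSE  # keep the spaces in the column names
)

# The three steps written out by hand for one column
freq   <- table(houses_demo$`Land Slope`)   # 1. frequency table
sorted <- sort(freq, decreasing = TRUE)     # 2. most frequent value first
names(sorted)[1]                            # 3. take the first element

# Wrapped in a function (hypothetical name), each reuse is a single call
mode_from_table <- function(x) {
  names(sort(table(x), decreasing = TRUE))[1]
}

slope_mode <- mode_from_table(houses_demo$`Land Slope`)
roof_mode  <- mode_from_table(houses_demo$`Roof Style`)
```

Without the function, the three lines would have to be copied for every column we want the mode of.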

Hi John,

I presume the function compute_mode() gives the frequency distribution table for a vector. table() does the same thing for a vector. With table() we can also order the output and then take the first element.

The use of table() was not mentioned in previous missions. Excuse me if I am wrong. Please elaborate on table().

Second, I did not understand the first line of the code: counts_df <- tibble(vector) %>%.
We normally start from the original data frame, i.e. houses here. Why have we used tibble(vector) instead?

Hey @sharathnandalike.

Actually, the compute_mode() function does more than that:

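  • Compute the frequency table (first 3 lines of the function body)
  • Arrange the frequency table in descending order (4th line)
  • Take the first element (last line)

library(dplyr)  # supplies %>%, group_by(), summarise(), arrange() and re-exports tibble()

compute_mode <- function(vector) {
    counts_df  <-  tibble(vector) %>%       # wrap the vector in a one-column tibble
        group_by(vector) %>%                # one group per distinct value
        summarise(frequency = n()) %>%      # count the rows in each group
        arrange(desc(frequency))            # most frequent value first

    counts_df$vector[1]                     # the first value is the mode
}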
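A small usage sketch may help here. It repeats the function so the snippet runs on its own and calls it on a made-up vector (the values are illustrative, not taken from the houses data).

```r
library(dplyr)

compute_mode <- function(vector) {
  counts_df <- tibble(vector) %>%
    group_by(vector) %>%
    summarise(frequency = n()) %>%
    arrange(desc(frequency))

  counts_df$vector[1]
}

# Made-up values: "Gtl" appears three times, the others once each
slopes <- c("Gtl", "Mod", "Gtl", "Sev", "Gtl")

slopes_mode <- compute_mode(slopes)
slopes_mode
```

Because the function takes a plain vector, the same call works on any column, for example compute_mode(houses$`Land Slope`).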

If you want to compute the mode using the table() function, you have to write names(sort(-table(x)))[1], which, I’m afraid, can be hard to understand.
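To make the one-liner less opaque, here it is unpacked into its pieces on a made-up vector. The intermediate variable names are only for illustration.

```r
x <- c("Gtl", "Mod", "Gtl", "Sev", "Gtl", "Mod")  # made-up values

counts  <- table(x)        # counts: Gtl 3, Mod 2, Sev 1
negated <- -counts         # negating turns the largest count into the smallest number
ordered <- sort(negated)   # sort() is ascending, so the most frequent value comes first
table_mode <- names(ordered)[1]

# The result comes from names(), so it is always a character string,
# even when the input is numeric
numeric_mode <- names(sort(-table(c(3, 3, 7))))[1]
is.character(numeric_mode)
```

The negation is just a way to get descending order; sort(table(x), decreasing = TRUE) does the same job.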

The best way to compute the mode, if we don’t want to use the split-apply-combine workflow, is to use the following function which, I’m afraid, can be very hard to digest.

Mode <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}
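Walking through the function one call at a time on a made-up vector may make it easier to digest. The intermediate variable names are only for illustration.

```r
Mode <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

x <- c("Gtl", "Mod", "Gtl", "Sev", "Gtl")  # made-up values

ux        <- unique(x)            # distinct values: "Gtl" "Mod" "Sev"
positions <- match(x, ux)         # index of each element in ux: 1 2 1 3 1
counts    <- tabulate(positions)  # how often each index occurs: 3 1 1
winner    <- which.max(counts)    # index of the largest count: 1
ux[winner]                        # the value at that index: "Gtl"

x_mode <- Mode(x)
```

Unlike the table() version, this returns a value of the same type as the input, so a numeric vector gives a numeric mode.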

We have taught table() and used it in several previous courses of our R path (the most recent is stats1), but not in this course. We’ll add a little reminder. Thank you.

Interesting question!

In R, as you may know, there is a difference between a vector and a data frame. The dplyr operations we want to perform here expect a data frame/tibble, not a vector. Since a distribution is a vector, it has to be converted to a tibble first, and this is the role of the tibble() function.
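A short sketch of the difference, using a made-up vector named slopes (not from the course data):

```r
library(dplyr)

slopes <- c("Gtl", "Mod", "Gtl", "Sev", "Gtl")  # made-up values

vector_is_df <- is.data.frame(slopes)      # a bare vector is not a data frame
slopes_tbl   <- tibble(slopes)             # one-column tibble; the column is named "slopes"
tibble_is_df <- is.data.frame(slopes_tbl)  # a tibble is a data frame

# group_by() has no method for a bare character vector, so this call fails
attempt <- tryCatch(group_by(slopes), error = function(e) "error")

# On the tibble, the same dplyr verbs work as expected
freq_tbl <- slopes_tbl %>%
  group_by(slopes) %>%
  summarise(frequency = n())
```

This is also why the column inside compute_mode() is called vector: tibble() names the column after the argument passed to it.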

Regards,
John.