Question with for loops/vector syntax

Hello! I’m working through the R course module “Working with Control Structures.” I have a couple of questions under the section “Storing For-Loop Output in Objects.” While I understand the concept of for loops, as well as if-else statements, what’s tripping me up is a much more basic nuance about vectors. In the screenshots below, I am wondering two things:

  1. Why is it necessary to create an “empty vector”–vector <-- c ( )–first? Why can’t this step be skipped over and instead the vector written out with its components?

  2. In the Intro to R programming course, I must not have fully grasped the concept of vectors, because I can’t figure out why/how it would be necessary to write out “total_goals” as the first component in parentheses. Total goals hasn’t yet been defined, has it? I thought that it would be defined ONLY once the new elements–sums of “home_goals” and “away_goals” (I believe total_goals is a typo, there)–are calculated. Essentially, I’m wondering why it doesn’t read: total_goals <-- c (home_goals + away_goals).

Would really love it if someone has a different way of breaking this down for me–none of this comes especially naturally so alternative ways of explaining are super helpful. Thank you!

-Ben

Hello @ben.d.harris16. Great questions!

To your first question: in some situations a vector is created ahead of time for speed/performance (usually by pre-allocating it to its final length). That is not the reason here. In this case, the empty vector total_goals needs to be initialized before (outside of) the for-loop so that the loop has an existing vector to modify (i.e. append to). If we do not initialize the vector outside of the for-loop, then R will be unable to complete even the first iteration inside the for-loop. Why? Because total_goals does not exist yet and therefore cannot be appended to. This relates to your second question…

As covered in this screen from an earlier mission, in order to append elements to a vector we must provide the name of the vector to append to as the first argument to c(), and the item(s) to append as the subsequent arguments.
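Here is a minimal sketch of that append pattern on its own, outside of any loop. The values 4, 1 and 6 are just the totals that show up later in this reply, typed in by hand, and the name overwritten is made up for the comparison.

```r
# c() with no arguments returns NULL, which behaves like an empty vector
total_goals <- c()
length(total_goals)  # 0

# Appending: the existing vector goes first, the new value after it
total_goals <- c(total_goals, 4)
total_goals <- c(total_goals, 1)
total_goals <- c(total_goals, 6)
total_goals  # 4 1 6

# Leaving the existing vector out replaces it instead of extending it
overwritten <- c(4)
overwritten <- c(1)
overwritten <- c(6)
overwritten  # 6
```

The only difference between the two halves is whether the vector's own name appears inside c().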

To your question:

Total goals hasn’t yet been defined, has it?

The screenshot you provided above from the course shows that the empty vector needs to be created with total_goals <- c(), but it’s not very clear that this code needs to appear above the for-loop.

To your question:

Essentially, I’m wondering why it doesn’t read: total_goals <-- c (home_goals + away_goals).

Let’s see what happens if we try in RStudio:

library(readr)
scores <- read_csv("scores.csv")

# No existing vector is passed to c(), so each iteration replaces total_goals
for (i in 1:nrow(scores)) {
  total_goals <- c(scores$home_goals[i] + scores$away_goals[i])
}

The result is:

> total_goals
[1] 1

total_goals is a vector of length 1 with a value of 1. And if you look in the “Environment” window, you will see that i has a value of 59L. This means the only result kept is the total goals from the last row/match in the dataset, row 59 (the final value that i took in this loop). Each iteration overwrote the result of the one before it, so only the value from the 59th match, the final iteration of the for-loop, is left.

To really break this down, let’s see what happens if we simulate what the for-loop does by explicitly spelling out the code for the first three iterations. Here we change the value of i at each iteration like we do in a for-loop:

i <- 1
total_goals <- c(scores$home_goals[i] + scores$away_goals[i])
total_goals
i <- 2
total_goals <- c(scores$home_goals[i] + scores$away_goals[i])
total_goals
i <- 3
total_goals <- c(scores$home_goals[i] + scores$away_goals[i])
total_goals

The result is this:

> i <- 1
> total_goals <- c(scores$home_goals[i] + scores$away_goals[i])
> total_goals
[1] 4
> i <- 2
> total_goals <- c(scores$home_goals[i] + scores$away_goals[i])
> total_goals
[1] 1
> i <- 3
> total_goals <- c(scores$home_goals[i] + scores$away_goals[i])
> total_goals
[1] 6

The value of total_goals is returned each time, but the vector is always of length 1. Values are never appended to it; instead, the value from the previous iteration is overwritten. So what happens if we update the syntax to provide the name of the vector to append to as the first function argument/input?

library(readr)
scores <- read_csv("scores.csv")

total_goals <- c()

i <- 1
total_goals <- c(total_goals, scores$home_goals[i] + scores$away_goals[i])
total_goals
i <- 2
total_goals <- c(total_goals, scores$home_goals[i] + scores$away_goals[i])
total_goals
i <- 3
total_goals <- c(total_goals, scores$home_goals[i] + scores$away_goals[i])
total_goals

We see in the RStudio console that the length of total_goals grows by one with each iteration!

> library(readr)
> scores <- read_csv("scores.csv")
> 
> total_goals <- c()
> 
> i <- 1
> total_goals <- c(total_goals, scores$home_goals[i] + scores$away_goals[i])
> total_goals
[1] 4
> i <- 2
> total_goals <- c(total_goals, scores$home_goals[i] + scores$away_goals[i])
> total_goals
[1] 4 1
> i <- 3
> total_goals <- c(total_goals, scores$home_goals[i] + scores$away_goals[i])
> total_goals
[1] 4 1 6

If you look at the top three rows of the scores dataframe you will see that the total goals shown here match those rows!
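Putting the same steps back into an actual for-loop might look like the sketch below. Since scores.csv isn't available here, the data frame is a hypothetical three-row stand-in with made-up goal counts chosen to give the same totals (4, 1, 6). The second loop is an assumption on my part about what the "speed/performance" version could look like: it pre-allocates the vector to its final length and fills it by position. The name total_goals_prealloc is invented for the comparison.

```r
# Hypothetical stand-in for scores.csv: three matches with made-up goal counts
scores <- data.frame(
  home_goals = c(3, 0, 4),
  away_goals = c(1, 1, 2)
)

# Version 1: create the empty vector first, then append inside the loop
total_goals <- c()
for (i in seq_len(nrow(scores))) {
  total_goals <- c(total_goals, scores$home_goals[i] + scores$away_goals[i])
}
total_goals  # 4 1 6

# Version 2: pre-allocate a numeric vector of the final length and fill it by index
total_goals_prealloc <- numeric(nrow(scores))
for (i in seq_len(nrow(scores))) {
  total_goals_prealloc[i] <- scores$home_goals[i] + scores$away_goals[i]
}
total_goals_prealloc  # 4 1 6
```

seq_len(nrow(scores)) does the same job as 1:nrow(scores) here, but it also behaves sensibly if the data frame has zero rows. Both versions need the vector to exist before the loop starts; they differ only in whether it starts empty or at full length.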

Finally, what would happen if we ran the code above but did not create the empty vector first? We’d get the following error at each iteration because R can’t modify the total_goals vector if it does not exist:

Error: object 'total_goals' not found
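If you'd like to reproduce that error without it stopping your script, one option is to wrap the append in tryCatch() and capture the message. This is only a sketch: the variable msg is made up for the demo, and the exact wording of the message may differ slightly between R versions.

```r
# Make sure no total_goals object is left over from earlier code
if (exists("total_goals", inherits = FALSE)) rm(total_goals)

# Try to append without creating the vector first and capture the error message
msg <- tryCatch(
  {
    total_goals <- c(total_goals, 4)
    "no error"
  },
  error = function(e) conditionMessage(e)
)
msg  # mentions that total_goals was not found

# With the empty vector in place, the very same line works
total_goals <- c()
total_goals <- c(total_goals, 4)
total_goals  # 4
```

The right-hand side of the assignment is evaluated first, so R has to look up total_goals before it can assign anything, and that lookup is what fails.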

I hope this helps!!! Please let me know of any further questions.

Hi Casey,

This is very helpful. Excellent in-depth explanation, thank you for writing out such a detailed response to my question(s). So it essentially sounds like the name of the vector will always be the first function argument/input in order to retain all the existing elements–otherwise, the vector will be created anew (erased) every time an element is appended?

Thank you again for your help.

Best,
Ben

Casey,

One quick follow up–because vectorized functions (what I’m currently working on), unlike for-loops, don’t work element-by-element, does that make it unnecessary therefore to initialize an empty vector–making it redundant, for instance, in the below, to write “tied_matches <- c()”?

Thanks again,
Ben

Hey @ben.d.harris16, that is correct. Unless you specify an object to append to (as the first function argument), R will create/overwrite the object.

Please pardon the slow reply, I’ve been out of the office for the holidays.

Hi @ben.d.harris16. In this case, the dplyr package does some of the work for you and eliminates the need to initialize an empty vector. This is also an advantage of using the purrr package for iteration; with purrr and the map() family of functions, there is no need to initialize a vector because the function does it for you under the hood.
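As a rough illustration (not the course's exact code, which isn't reproduced in this thread), here is how the no-empty-vector idea can look with plain vectorized operations. The data frame is a hypothetical stand-in with made-up goal counts, and I'm assuming tied_matches means a TRUE/FALSE flag per match. The purrr part only runs if purrr is installed.

```r
# Hypothetical stand-in data: three matches with made-up goal counts
scores <- data.frame(
  home_goals = c(3, 0, 2),
  away_goals = c(1, 1, 2)
)

# Vectorized operations work on whole columns at once and return a complete
# vector, so there is nothing to initialize beforehand
total_goals <- scores$home_goals + scores$away_goals
tied_matches <- scores$home_goals == scores$away_goals
total_goals   # 4 1 4
tied_matches  # FALSE FALSE TRUE

# Optional: map2_dbl() from purrr builds the output vector for you as well
if (requireNamespace("purrr", quietly = TRUE)) {
  total_goals_map <- purrr::map2_dbl(
    scores$home_goals,
    scores$away_goals,
    function(home, away) home + away
  )
}
```

In each case the result is created in a single assignment, which is why writing tied_matches <- c() beforehand would be redundant.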