How do you know whether you should put something outside a for loop or not?

In the guided project "Profitable App Profiles", the instructions say to compute the average number of installs outside the inner loop of the nested loop.

  1. How do you know that you should put the avg_n_installs formula outside the loop instead of within (like my code shows)?
  2. Why does putting avg_n_installs inside the for loop give me a zero division error?
  3. Why does the code I have below only print the ‘ART_AND_DESIGN’ category continuously without showing me other categories?

My Code:

google_play_categories = freq_table(free_google_play,1) #category column

for category in google_play_categories:
    total = 0   #sum of installs specific to genre
    len_category = 0    #number of apps specific to genre
    
    for app in free_google_play:
        category_app = app[1]
        if category_app == category:
            n_installs = app[5]
            n_installs = n_installs.replace('+','')
            n_installs = n_installs.replace(',','')
            total += float(n_installs)
            len_category += 1            
        avg_n_installs = total/len_category
        print(category, ':', avg_n_installs)

Think about the overall logic of your code and focus on what you are trying to calculate. How and where are total and len_category changing? Given how and where those two variables update, does computing avg_n_installs inside the inner loop give you an accurate average, compared with computing it outside?

Which variable do you think causes the zero division error? Given your current code, why would that variable be 0?

You have a for loop inside another for loop. For each iteration of the outer loop, the inner loop runs through all of its iterations. So category doesn’t change until the inner loop has finished going through every app.
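To see that ordering in action, here is a minimal sketch using made-up toy data (the names `categories`, `apps`, and `visits` are placeholders for illustration, not the project's dataset). It records every (category, app) pair in the order the loops visit them:

```python
# Hypothetical toy data, not the project's real dataset.
categories = ['A', 'B']
apps = ['app1', 'app2', 'app3']

visits = []
for category in categories:      # outer loop: one pass per category
    for app in apps:             # inner loop: runs fully for each category
        visits.append((category, app))

for pair in visits:
    print(pair)
```

The first three pairs all have category 'A'; 'B' only shows up once the inner loop has gone through every app. With thousands of apps, printing inside the inner loop would show the first category thousands of times before the second one ever appears.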

Try to answer the above questions first. Let me know if you're still stuck.

total and len_category are reset to 0 in the outer loop, so I thought putting avg_n_installs inside the inner loop, as shown, would give the same accurate average. The correct way is to put avg_n_installs outside the inner for loop, but I still don’t understand why, since that also gives me the same values for total and len_category.

I think I’m confused because I don’t really understand how the nested for loop runs.

google_play_categories = freq_table(free_google_play,1) #category column

for category in google_play_categories:
    total = 0   #sum of installs specific to genre
    len_category = 0    #number of apps specific to genre
    
    for app in free_google_play:
        category_app = app[1]
        if category_app == category:
            n_installs = app[5]
            n_installs = n_installs.replace('+','')
            n_installs = n_installs.replace(',','')
            total += float(n_installs)
            len_category += 1            
    avg_n_installs = total/len_category
    print(category, ':', avg_n_installs)

[quote="the_doctor, post:2, topic:559230"]
Which variable do you think causes the zero division error? Given your current code, why would that variable be `0`?
[/quote]

Is it because at some point while the inner for loop is running, the category_app == category condition is not met, so len_category remains 0?

You are doing well so far, don’t worry.

Calculating the average inside the inner loop means that for every iteration of that loop, your average will update to a new value.

It’s only in the last iteration where you will get the desired average value.

Look at a much simpler example -

a = [[1, 2, 3], [4, 5, 6]]

for i in a:
    total = 0
    len_list = 0
    for j in i:
        total += j
        len_list += 1
        avg = total/len_list
        print(avg)

For the above, what would happen in the first iteration of the outer loop?

  • i = [1, 2, 3]
  • total = 0, len_list = 0
  • inner loop
    • j = 1
      • total = 0 + 1 = 1
      • len_list = 0 + 1 = 1
      • avg = 1/1 = 1
      • print 1
    • j = 2
      • total = 1 + 2 = 3
      • len_list = 1 + 1 = 2
      • avg = 3/2 = 1.5
      • print 1.5
    • j = 3
      • total = 3 + 3 = 6
      • len_list = 2 + 1 = 3
      • avg = 6/3 = 2
      • print 2

Only at the final iteration do you get the correct average.

So, if you had a lot more iterations, let’s say 100, you would be printing those average values 99 times before you got to the correct one after the 100th iteration.

So, what’s an alternative to the above?

Simple. You just calculate the average after the inner loop completes. So, you will be calculating and printing the average only once, after total and len_list have finished updating -

a = [[1, 2, 3], [4, 5, 6]]

for i in a:
    total = 0
    len_list = 0
    for j in i:
        total += j
        len_list += 1
    avg = total/len_list
    print(avg)
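If it helps to see both behaviours side by side, here is a small sketch on the same list `a`. The helper `running_averages` is a hypothetical name introduced just for this illustration; it collects the average after each element, which is what the inner-loop version prints one at a time:

```python
def running_averages(values):
    """Return the average after each element, as the inner-loop version computes it."""
    total = 0
    len_list = 0
    averages = []
    for value in values:
        total += value
        len_list += 1
        averages.append(total / len_list)
    return averages

a = [[1, 2, 3], [4, 5, 6]]
for i in a:
    steps = running_averages(i)
    # Only the last entry is the average of the whole list.
    print(steps, '-> final average:', steps[-1])
```

This prints `[1.0, 1.5, 2.0] -> final average: 2.0` and then `[4.0, 4.5, 5.0] -> final average: 5.0`. Everything before the last entry is an intermediate value, which is why the division belongs after the inner loop.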

That is correct.

For some iterations of your inner loop, before len_category has been updated at least once, your if condition is False, so len_category is still 0 when the division runs. This is also a side effect of trying to calculate the average on every iteration. If you let total and len_category finish updating before calculating the average, you won’t run into this error (unless your if condition is always False, which I think is unlikely in this scenario).
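Here is a rough sketch that reproduces the error on a tiny, made-up dataset. The rows are simplified to `[category, installs]`, and the function names `average_inside` and `average_after` are hypothetical, chosen only for this example. The optional guard in `average_after` covers the unlikely case where no app matches at all:

```python
# Hypothetical rows: [category, installs]; not the project's real data.
apps = [['ART', '1,000+'], ['GAME', '500+'], ['ART', '3,000+']]

def average_inside(category):
    """Average computed on every inner iteration (the failing pattern)."""
    total = 0
    len_category = 0
    for app in apps:
        if app[0] == category:
            total += float(app[1].replace('+', '').replace(',', ''))
            len_category += 1
        avg = total / len_category  # runs even when nothing has matched yet
    return avg

def average_after(category):
    """Average computed once, after the loop has finished."""
    total = 0
    len_category = 0
    for app in apps:
        if app[0] == category:
            total += float(app[1].replace('+', '').replace(',', ''))
            len_category += 1
    if len_category == 0:  # guard for a category with no matching apps
        return None
    return total / len_category

print(average_after('ART'))
print(average_after('GAME'))
try:
    average_inside('GAME')  # first row is 'ART', so len_category is still 0
except ZeroDivisionError as error:
    print('ZeroDivisionError:', error)
```

Note that `average_inside('ART')` does not fail here, because the very first row matches. That mirrors what you saw: the first category printed fine, and the error only showed up for a category whose apps are not at the top of the dataset.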

Thank you so so much for explaining this to me by laying out the simpler example, I understand it now. thank you soo much!!