I’m trying to loop through a dataframe conditionally. I know that there are more efficient ways to do this, but I’m doing it this way so that I can test its speed compared to other methods.
I need to loop through the dataframe ‘cps’ and, whenever the value in the ‘union’ column is equal to ‘Union’, add the value in the ‘wage’ column to a running total. At the end I need to divide that total by the number of rows that matched. Basically, I need to get the average wage of everyone who was in a union.
Dataframe columns are:
wage, educ, race, sex, hispanic, south, married, exper, union, age, sector
Here’s what I have so far:
def avg_union_wage_loop(x):
    count = 0
    wageSum = 0
    for row in x['union']:
        if row['union'] == 'Union':
            wageSum = wageSum + x.iloc[row['wage']]
            count = count + 1
    avg = wageSum/count
    return avg
This line throws an error:
wageSum = wageSum + x.iloc[row['wage']]
and the error that I get is:
string indices must be integers
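For what it's worth, here is a minimal sketch that seems to reproduce the same kind of error. It assumes a tiny made-up dataframe called demo (the values are invented for illustration and are not from cps). Iterating over a single column yields the cell values themselves, which here are plain strings, so indexing one of them with a column name fails:

```python
import pandas as pd

# Hypothetical stand-in for cps; the values are invented for illustration.
demo = pd.DataFrame({
    "wage": [10.0, 20.0, 30.0],
    "union": ["Union", "Not", "Union"],
})

# Iterating over one column yields its values, not whole rows.
values = list(demo["union"])
first = values[0]

caught = None
try:
    first["union"]  # indexing a str with a str, as row['union'] does in the loop
except TypeError as exc:
    caught = exc

print(type(first).__name__, "->", caught)
```

If that is what is happening, then each row in the loop is just a string such as 'Union', not a row of the dataframe.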
I’m not entirely sure what to do next. I guess I’m stuck on how to reference the exact row and column I need in order to get the wage value and add it to a running sum.
Any help would be appreciated.
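One possible way to write the loop, as a minimal sketch rather than a definitive answer: it swaps in DataFrame.itertuples so that each iteration yields a whole row with the columns available as attributes. The function name avg_union_wage_itertuples and the sample dataframe are assumptions made up for illustration; only the column names wage and union come from the list above.

```python
import pandas as pd


def avg_union_wage_itertuples(df):
    """Average wage over rows whose 'union' value is 'Union' (explicit loop)."""
    wage_sum = 0.0
    count = 0
    # itertuples yields one namedtuple per row; columns become attributes.
    for row in df.itertuples(index=False):
        if row.union == "Union":
            wage_sum += row.wage
            count += 1
    # Avoid dividing by zero when no row matches.
    return wage_sum / count if count else float("nan")


# Hypothetical sample data, invented for illustration (not the real cps).
sample = pd.DataFrame({
    "wage": [10.0, 20.0, 30.0],
    "union": ["Union", "Not", "Union"],
})

result = avg_union_wage_itertuples(sample)

# Vectorized version of the same calculation, useful as a cross-check
# and as the baseline for a speed comparison.
expected = sample.loc[sample["union"] == "Union", "wage"].mean()
print(result, expected)
```

The loop version should give the same number as the vectorized one, which makes it easy to check correctness before timing the two.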