TypeError: string indices must be integers error when trying to set the 468-8 function in the opposite direction

Screen Link: https://app.dataquest.io/m/468/business-metrics/8/churn-rate

My Code:

def count_customers(yearmonth):
    row_counter = 0
# here I extract the month from the function's input
    m = int(str(yearmonth)[4:])
    for r in subs:
# now, I extract the start and end months from each row and compare those numbers to the inputted month
        r_end_m = r["end_date"].dt.month
        r_start_m = r["start_date"].dt.month
        if (r_start_m < m) & (m < r_end_m):
            row_counter += 1
        else:
            pass
    return row_counter
# the row_counter would output the number of rows that meet the criteria requested on the mission screen

churn["total_customers"] = churn["yearmonth"].apply(count_customers)

What I expected to happen:

So, I know the function given in the solution creates a date from the yearmonth data and applies it as a date to compare on the subs DataFrame through vectors :nerd_face:, but initially, I thought of the function in a different manner and would kindly ask if you could help me understand the error Traceback that I’m getting when looping through the subs DataFrame:

What actually happened:

TypeError                                 Traceback (most recent call last)
<ipython-input-1-ee6816adda68> in <module>()
     13     return row_counter
     14 
---> 15 churn["total_customers"] = churn["yearmonth"].apply(count_customers)
     16 
     17 # def get_customers(yearmonth):

/dataquest/system/env/python3/lib/python3.4/site-packages/pandas/core/series.py in apply(self, func, convert_dtype, args, **kwds)
   2549             else:
   2550                 values = self.asobject
-> 2551                 mapped = lib.map_infer(values, f, convert=convert_dtype)
   2552 
   2553         if len(mapped) and isinstance(mapped[0], Series):

pandas/_libs/src/inference.pyx in pandas._libs.lib.map_infer()

<ipython-input-1-ee6816adda68> in count_customers(yearmonth)
      5     m = int(str(yearmonth)[4:])
      6     for r in subs:
----> 7         r_end_m = r["end_date"].dt.month
      8         r_start_m = r["start_date"].dt.month
      9         if (r_start_m < m) & (m < r_end_m):

TypeError: string indices must be integers

subs is a DataFrame, and you can’t get its rows by iterating over it directly like this.
Iterating over a DataFrame yields its column labels, so on each pass r is a string such as "start_date". The expression r["end_date"] then tries to index a string with another string, and Python rejects it because string indices must be integers, as in r[2].
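A minimal sketch of what the loop variable actually holds, using a made-up two-row frame in place of subs (subs_demo and its dates are hypothetical; only the column names come from the question):

```python
import pandas as pd

# Hypothetical stand-in for subs; the dates are made up for illustration.
subs_demo = pd.DataFrame({
    "start_date": pd.to_datetime(["2011-01-05", "2011-02-10"]),
    "end_date": pd.to_datetime(["2011-03-05", "2011-06-10"]),
})

# Iterating over a DataFrame yields its column labels, not its rows.
labels = [r for r in subs_demo]
print(labels)

# Indexing one of those label strings with another string raises TypeError.
try:
    labels[0]["end_date"]
    raised = None
except TypeError as exc:
    raised = type(exc).__name__
print(raised)
```

Each r in the loop is just a column name, which is why the traceback points at r["end_date"].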
To loop over rows instead, use DataFrame.iterrows, a generator that yields both the index and the row (as a Series):

import pandas as pd

df = pd.DataFrame({'c1': [10, 11, 12], 'c2': [100, 110, 120]})

for index, row in df.iterrows():
    print(row['c1'], row['c2'])

Output:
10 100
11 110
12 120
Alternatively, df.iteritems() iterates over columns rather than rows. To make it iterate over rows, transpose the DataFrame with .T, which swaps rows and columns (a reflection over the diagonal), so that df.T.iteritems() effectively walks the original DataFrame row by row. Note that iteritems() was removed in pandas 2.0; on current versions, df.items() does the same job.

Example:

for index, row in df.T.iteritems():
    print(row['c1'], row['c2'])

Thank you for your feedback. The approach does not work, though: it turns out that this way I’m leaving out information from the subs DataFrame. Still, I learned that I couldn’t iterate over a DataFrame like that, thanks :+1:
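One likely source of the lost information is that comparing month numbers alone ignores the year. The sketch below compares full dates instead, once row by row with iterrows and once with a vectorized mask. It is only an illustration under stated assumptions: subs_demo and its dates are made up, the function names are hypothetical, and the rule "started before the first day of the month and ends on or after it" is an assumed criterion, not necessarily the one the mission asks for.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the thread.
subs_demo = pd.DataFrame({
    "start_date": pd.to_datetime(["2011-01-05", "2011-02-10", "2012-01-15"]),
    "end_date": pd.to_datetime(["2011-03-05", "2012-06-10", "2012-02-15"]),
})

def first_of_month(yearmonth):
    # Split an integer such as 201103 into year 2011 and month 3.
    year, month = divmod(int(yearmonth), 100)
    return pd.Timestamp(year=year, month=month, day=1)

def count_customers_rowwise(yearmonth):
    date = first_of_month(yearmonth)
    counter = 0
    for _, row in subs_demo.iterrows():
        # row is a Series, so row["start_date"] is a single Timestamp.
        if row["start_date"] < date <= row["end_date"]:
            counter += 1
    return counter

def count_customers_vectorized(yearmonth):
    date = first_of_month(yearmonth)
    # Build a boolean mask over all rows at once, then count the True values.
    mask = (subs_demo["start_date"] < date) & (date <= subs_demo["end_date"])
    return int(mask.sum())

print(count_customers_rowwise(201202), count_customers_vectorized(201202))
```

With this sample data, a month-only comparison would also count the first row for 201202, even though that subscription ended in 2011.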


@jithins123, I think you can help @estebanalfaroorozco out; I’m a novice in this world. Could you please have a look here?