Hello community! Hope you're all doing well.

I'm working on a cohort model at my job. Basically, I've got a table where:

- each row is an acquisition month, and the first non-null value is the number of new customers acquired in that month
- each column is a subsequent month, and tells me how many customers from the corresponding row's cohort are still there and ordering.

With some sample data, it looks like the following:

```
import numpy as np
import pandas as pd

matrix = [
    [100, 75, 50, 40, 30],
    [np.nan, 100, 90, 70, 30],
    [np.nan, np.nan, 100, 70, 50],
    [np.nan, np.nan, np.nan, 100, 90],
    [np.nan, np.nan, np.nan, np.nan, 100],
]
temp_df = pd.DataFrame(data=matrix, columns=range(5))
```

which gives a lower-triangular table: each cohort's starting size (100) on the diagonal and NaN above it.

What I'd like is to calculate each cohort's retention rate relative to its acquisition month, i.e. divide every value in a row by the row's first non-null value. I find this a trivial task in Excel, but I'm finding it surprisingly challenging in Python.

I came up with something by doing the following:

```
def retention(x, row):
    # anchor = the first non-null value in the row (the cohort's starting size)
    anchor = np.nan
    for value in row:
        if pd.notnull(value):
            anchor = value
            break
    return x / anchor

first_row = temp_df.iloc[0, :]
new_row = first_row.apply(retention, row=first_row)
```

This works, but do I really need to apply it to every single row and collect each result into a new DataFrame?

It looks super clunky :-/
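For reference, I suspect a vectorized version might look something like this (just a sketch — `anchors` and `retention_df` are names I made up, and it assumes the first non-null value in each row is that cohort's starting size):

```python
import numpy as np
import pandas as pd

matrix = [
    [100, 75, 50, 40, 30],
    [np.nan, 100, 90, 70, 30],
    [np.nan, np.nan, 100, 70, 50],
    [np.nan, np.nan, np.nan, 100, 90],
    [np.nan, np.nan, np.nan, np.nan, 100],
]
temp_df = pd.DataFrame(data=matrix, columns=range(5))

# first non-null value in each row = size of that month's cohort
anchors = temp_df.apply(lambda r: r.dropna().iloc[0], axis=1)

# divide each row by its own anchor; NaNs stay NaN
retention_df = temp_df.div(anchors, axis=0)
```

This keeps everything in one DataFrame instead of rebuilding it row by row, but I'm not sure it's the idiomatic way.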

Thanks for your help

Nick