Lost 3 churn month.....? Create a new df

Thank you for helping in advance!
Screen Link: https://app.dataquest.io/m/468/business-metrics/7/date-wrangling
The code below is from the chapter. I don’t understand how line 3 is written. It tries to combine year and month into one column to create a new df. I assume yearmonths here is the df. Why can we use a for loop inside [ ], when we never told Python which part is the value and which is the column name?

Later, we created a new df called churn with churn = pd.DataFrame({"yearmonth": yearmonths}). In this code, what does the part inside { } mean? Why do we give the value (the whole dataset, yearmonths) to the key "yearmonth"?

years = list(range(2011,2015))
months = list(range(1,13))
yearmonths = [y*100+m for y in years for m in months]
print(yearmonths)


Hi @candiceliu93,

yearmonths is not the df. It’s a list of combined years and months that is then converted to a df named churn, so churn is the df of combined years and months. As for why we can use for loops inside [], it’s a Python shortcut called a list comprehension that lets programmers generate a list of iterated values in a single line. For example, say I want to create a list containing the numbers 0 to 5 in ascending order. Rather than using:

my_list = []
for i in range(6):
    my_list.append(i)

to create this list, it’s much easier for me to use:

my_list = [i for i in range(6)]

The list comprehension builds the same list as the loop version but is shorter to write and easier to remember. And for your final question, the one about the code inside { }:

In churn = pd.DataFrame({"yearmonth": yearmonths}), the {} create a Python dictionary, and pandas uses it to name the columns of your dataframe. When a dataframe is created from a plain list, pandas uses an integer as the default column name. The dictionary lets us set this name to whatever we want. The effect is similar to calling df.rename() on an existing dataframe, except that the value on the right of the colon (yearmonths) is the list itself and the value on the left ("yearmonth") is our desired column name.

Oh, almost forgot: we don’t give the whole dataset to the key "yearmonth". It’s just a single column in the dataset that’s assigned to that key. The dataset itself is named churn.
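To make the difference concrete, here is a minimal sketch that builds the dataframe both ways, using the yearmonths list from the question. The name unnamed is only for illustration and is not from the chapter.

```python
import pandas as pd

years = list(range(2011, 2015))
months = list(range(1, 13))
yearmonths = [y * 100 + m for y in years for m in months]

# Bare list: pandas falls back to the integer label 0 for the column.
unnamed = pd.DataFrame(yearmonths)

# Dictionary: the key becomes the column name, the list becomes the column's values.
churn = pd.DataFrame({"yearmonth": yearmonths})

print(list(unnamed.columns))
print(list(churn.columns))
print(churn.head(3))
```

A dictionary with more keys would produce more columns, one per key, as long as all the lists have the same length.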

I hope this helps! And if you still need further explanation feel free to ask.


Thank you! You explained it well, and I got it. Yes, we learned list comprehensions in the previous chapter, and I had almost forgotten how to use them. But here we used 2 for loops inside [ ]. I didn’t know we could use 2 for loops at once.

As for the use of { }: we created a column called "yearmonth" and put the value (the list) into this column to create a dataframe. Am I correct?

Thank you!!


Hey, @candiceliu93,

I’m glad I could help. Yes, we can use two loops inside [], and more than two as well. Using two loops inside [] is similar to writing a nested for loop: the first for is the outer loop and the second is the inner loop. For example, rather than using:

my_list = []
for i in range(5):
    for j in range(5):
        my_list.append(j)

I would just use:

my_list = [ j for i in range(5) for j in range(5)]
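Applied to the code from the original question, the same equivalence can be sketched as follows. The name yearmonths_loop is only for illustration. Multiplying the year by 100 shifts it two digits to the left, so adding the month gives a single number such as 201101 for January 2011.

```python
years = list(range(2011, 2015))
months = list(range(1, 13))

# Nested-loop version: the outer loop walks the years, the inner loop the months.
yearmonths_loop = []
for y in years:
    for m in months:
        yearmonths_loop.append(y * 100 + m)

# List comprehension from the chapter: the for clauses appear in the same order.
yearmonths = [y * 100 + m for y in years for m in months]

print(yearmonths_loop == yearmonths)
print(yearmonths[:3], yearmonths[-1])
```

Both versions produce 48 values, one for each month of 2011 through 2014.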

The one-line version is much more convenient. And as for your question about { }: essentially, yes. That’s pretty much what we are doing.

Glad I could help out!