What is the use of pd.concat here?

Screen Link:
https://app.dataquest.io/m/164/processing-dataframes-in-chunks/4/batch-processing

My Code:

import pandas as pd

lifespans = []
chunk_iter = pd.read_csv("moma.csv", chunksize=250, dtype={"ConstituentBeginDate": "float", "ConstituentEndDate": "float"})
for chunk in chunk_iter:
    diff = chunk['ConstituentEndDate'] - chunk['ConstituentBeginDate']
    lifespans.append(diff)
lifespans_dist = lifespans
print(lifespans_dist)

I used lifespans_dist = lifespans instead of lifespans_dist = pd.concat(lifespans).

What is the difference between calling the appended list and using pd.concat on the appended list?

What is the purpose of this?

From the pandas docs on pd.concat: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.concat.html
Concatenate pandas objects along a particular axis with optional set logic along the other axes.
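
To make that concrete, here is a minimal sketch of the difference between the plain list and the result of pd.concat. The two small Series and their values are made up for illustration and only stand in for the per-chunk diff results; they are not taken from moma.csv.

```python
import pandas as pd

# Two small Series standing in for the per-chunk `diff` results (made-up values).
chunk_diffs = [
    pd.Series([70.0, 55.0], index=[0, 1]),
    pd.Series([81.0, 62.0, 49.0], index=[2, 3, 4]),
]

as_list = chunk_diffs               # still a Python list holding 2 Series
combined = pd.concat(chunk_diffs)   # one Series holding all 5 values

print(type(as_list).__name__, len(as_list))     # list 2
print(type(combined).__name__, len(combined))   # Series 5

# Series methods work on the combined result, but not on the list.
print(combined.mean())              # 63.4
print(hasattr(as_list, "mean"))     # False
```

So the list does contain all of the data, but only as separate pieces. pd.concat is what turns those pieces into a single pandas object that you can call methods on.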

What does “calling a list” mean? Do you mean calling the list constructor, list()?

Here is a good place to get an in-depth overview of Python: https://store.lerner.co.il/free-courses

Hi @hanqi,

Since each diff is appended to the lifespans list, I would say there is no need to concatenate, because all of the data is already in the lifespans list. Correct me if I’m wrong.

Never mind, the answer is on the next slide:

“We can use the pandas.concat() function to combine all of the chunks at the end”.
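
For anyone who wants to try this without the course dataset, here is a rough sketch of the same chunked flow ending in pd.concat. The CSV text is a made-up, in-memory stand-in for moma.csv (only the two column names match the exercise), and chunksize=2 is chosen just so the tiny sample splits into several chunks.

```python
import io
import pandas as pd

# Tiny in-memory stand-in for moma.csv (made-up rows, same two column names).
csv_text = """ConstituentBeginDate,ConstituentEndDate
1900,1970
1920,1985
1881,1973
1930,1992
1950,2001
"""

lifespans = []
chunk_iter = pd.read_csv(
    io.StringIO(csv_text),
    chunksize=2,  # small on purpose, so 5 rows become 3 chunks
    dtype={"ConstituentBeginDate": "float", "ConstituentEndDate": "float"},
)
for chunk in chunk_iter:
    # Each diff is a Series covering only the rows in this chunk.
    lifespans.append(chunk["ConstituentEndDate"] - chunk["ConstituentBeginDate"])

# Combine the per-chunk Series into one Series covering every row.
lifespans_dist = pd.concat(lifespans)

print(len(lifespans))            # 3
print(lifespans_dist.tolist())   # [70.0, 65.0, 92.0, 62.0, 51.0]
```

After the concat, lifespans_dist is a single Series, so calls such as lifespans_dist.value_counts() or lifespans_dist.describe() work on the whole dataset at once.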

Anyway thanks for your help!