Screen Link:
https://app.dataquest.io/m/164/processing-dataframes-in-chunks/4/batch-processing
My Code:
import pandas as pd

lifespans = []
chunk_iter = pd.read_csv("moma.csv", chunksize=250, dtype={"ConstituentBeginDate": "float", "ConstituentEndDate": "float"})
for chunk in chunk_iter:
    diff = chunk['ConstituentEndDate'] - chunk['ConstituentBeginDate']
    # One Series of lifespans per 250-row chunk
    lifespans.append(diff)
lifespans_dist = lifespans
print(lifespans_dist)
I used lifespans_dist = lifespans instead of lifespans_dist = pd.concat(lifespans).
What is the difference between calling the appended list and using pd.concat on the appended list?
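To make the difference concrete, here is a minimal sketch that runs the same loop on a tiny made-up CSV read through io.StringIO. The data is a hypothetical stand-in for moma.csv with only the two date columns, and chunksize=2 is used instead of 250 so the five rows split into several chunks:

```python
import io

import pandas as pd

# Hypothetical stand-in for moma.csv: only the two columns the loop uses,
# with made-up years. Five rows and chunksize=2 give three chunks.
csv_text = """ConstituentBeginDate,ConstituentEndDate
1881,1973
1904,1989
1898,
1928,1987
1887,1968
"""

lifespans = []
chunk_iter = pd.read_csv(
    io.StringIO(csv_text),
    chunksize=2,
    dtype={"ConstituentBeginDate": "float", "ConstituentEndDate": "float"},
)
for chunk in chunk_iter:
    diff = chunk["ConstituentEndDate"] - chunk["ConstituentBeginDate"]
    lifespans.append(diff)

# lifespans is a plain Python list holding one Series per chunk,
# not one Series holding every row.
print(type(lifespans))                         # <class 'list'>
print(len(lifespans))                          # 3 (number of chunks, not rows)
print([type(s).__name__ for s in lifespans])   # ['Series', 'Series', 'Series']
print([len(s) for s in lifespans])             # [2, 2, 1]
```

So every row's lifespan is in the list, but spread across separate Series objects, one per chunk.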
hanqi
May 16, 2020, 6:12am
#2
What is the purpose of this?
From the pandas docs on pd.concat: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.concat.html
Concatenate pandas objects along a particular axis with optional set logic along the other axes.
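A small sketch of what that means for a list of Series. The two Series below are hypothetical per-chunk results (not real MoMA data), indexed the way consecutive read_csv chunks would be:

```python
import pandas as pd

# Two hypothetical per-chunk results; the second chunk continues
# the row numbering of the first, as read_csv chunks do.
chunk_a = pd.Series([90.0, 80.0], index=[0, 1])
chunk_b = pd.Series([float("nan"), 70.0], index=[2, 3])

# pd.concat stacks the pieces along axis 0 (the default) into ONE Series.
combined = pd.concat([chunk_a, chunk_b])
print(type(combined).__name__)   # Series
print(combined.index.tolist())   # [0, 1, 2, 3]

# Because the result is a Series, Series methods work across all chunks.
# mean() skips NaN by default.
print(combined.mean())           # 80.0

# A plain list of Series has no such methods.
print(hasattr([chunk_a, chunk_b], "mean"))  # False
```

pd.concat keeps each piece's index labels; passing ignore_index=True would renumber the result 0 to n-1 instead.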
What does “calling a list” mean? Do you mean calling the list constructor list()?
Here is a good place to get an in-depth overview of Python: https://store.lerner.co.il/free-courses
Hi @hanqi,
Since I append each diff to the lifespans list, I would say there is no need to concatenate, because all of the data is already in the lifespans list. Correct me if I’m wrong?
Never mind, the answer is on the next screen: “We can use the pandas.concat() function to combine all of the chunks at the end”.
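Putting it together, a sketch of the loop with pd.concat at the end. It again uses a made-up in-memory CSV as a hypothetical stand-in for moma.csv (chunksize=2 only so the tiny file splits into chunks), and checks the result against reading the whole file at once:

```python
import io

import pandas as pd

# Hypothetical stand-in for moma.csv (made-up years) so the sketch runs anywhere.
csv_text = """ConstituentBeginDate,ConstituentEndDate
1881,1973
1904,1989
1898,
1928,1987
1887,1968
"""
dtypes = {"ConstituentBeginDate": "float", "ConstituentEndDate": "float"}

lifespans = []
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=2, dtype=dtypes):
    # Each chunk is a small DataFrame; keep only its per-row result.
    lifespans.append(chunk["ConstituentEndDate"] - chunk["ConstituentBeginDate"])

# Combine the per-chunk Series once, at the end, into a single Series.
lifespans_dist = pd.concat(lifespans)

# Sanity check: same values and labels as reading the whole file in one go
# (Series.equals treats NaNs in the same position as equal).
full = pd.read_csv(io.StringIO(csv_text), dtype=dtypes)
expected = full["ConstituentEndDate"] - full["ConstituentBeginDate"]
print(lifespans_dist.equals(expected))  # True
print(lifespans_dist.describe())
```

Concatenating once at the end, rather than inside the loop, avoids repeatedly copying an ever-growing Series on every chunk.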
Anyway thanks for your help!