Can someone explain how `sample = pd.DataFrame()` and `sample = sample.append(data_collected)` work?

Screen Link:
https://app.dataquest.io/m/283/sampling/10/cluster-sampling

My Code:

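import pandas as pd

# `wnba` is the WNBA DataFrame from this mission; its 'Team' column holds each row's team.
# Pick 4 teams at random to act as the clusters.
clusters = pd.Series(wnba['Team'].unique()).sample(4, random_state=0)

# Start from an empty DataFrame.
sample = pd.DataFrame()

for cluster in clusters:
    data_collected = wnba[wnba['Team'] == cluster]  # all rows for this team
    sample = sample.append(data_collected)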

I'm really confused about initializing an empty DataFrame. How does the `append` in the last line of the loop work? Thanks!
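For readers on pandas 2.0 or later, where `DataFrame.append` no longer exists, here is a minimal sketch of the same cluster-sampling loop. It assumes a small made-up stand-in for `wnba` (the names and team codes are hypothetical), collects each team's rows in a Python list, and combines them once with `pd.concat`. The `isin` line at the end is an alternative that selects the same rows without a loop.

```python
import pandas as pd

# Hypothetical stand-in for the mission's `wnba` DataFrame (made-up rows).
wnba = pd.DataFrame({
    "Name": ["P1", "P2", "P3", "P4", "P5", "P6"],
    "Team": ["ATL", "ATL", "CHI", "CON", "DAL", "DAL"],
})

# Randomly pick 2 teams (clusters); the mission picks 4 from the real data.
clusters = pd.Series(wnba["Team"].unique()).sample(2, random_state=0)

# Collect each cluster's rows in a plain Python list...
pieces = []
for cluster in clusters:
    pieces.append(wnba[wnba["Team"] == cluster])  # list.append mutates the list in place

# ...then combine them into one DataFrame with a single pd.concat call.
sample = pd.concat(pieces)

# Alternative: keep every row whose team is one of the chosen clusters.
# (Row order can differ: the loop version orders rows cluster by cluster.)
sample_isin = wnba[wnba["Team"].isin(clusters)]

print(sample)
```

Building a list and concatenating once also avoids copying the whole growing DataFrame on every iteration, which is what repeated `sample = sample.append(...)` calls do.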


The `append` method appends the DataFrame `data_collected` to `sample` and returns the combined result as a new DataFrame. Take this dummy example:

import pandas as pd

df = pd.DataFrame()      # Creates an empty DataFrame with no columns and no rows.
print(df)
# Empty DataFrame
# Columns: []
# Index: []

# Suppose df_1 is:
df_1 = pd.DataFrame({"a": [0, 1, 1], "b": [0, 1, 2]})
print(df_1)
#    a  b
# 0  0  0
# 1  1  1
# 2  1  2

# Add all rows of `df_1` that have `1` in column `a` to the empty DataFrame `df`.
# Note: DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0.
df = df.append(df_1[df_1["a"] == 1])
print(df)
#    a  b
# 1  1  1
# 2  1  2
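Since `DataFrame.append` is gone in pandas 2.0+, a rough equivalent uses `pd.concat`, which takes a list of frames and likewise returns a new DataFrame. This sketch reuses `df_1` from the example; `df_2` is a hypothetical second frame added only to show stacking more than one piece.

```python
import pandas as pd

df_1 = pd.DataFrame({"a": [0, 1, 1], "b": [0, 1, 2]})
# Hypothetical extra rows, only to show appending more than one frame.
df_2 = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

# Rows of df_1 that have 1 in column "a" (same filter as above).
rows_with_a_1 = df_1[df_1["a"] == 1]

# pd.concat stacks the frames vertically and returns a NEW DataFrame;
# df_1 and df_2 themselves are left unchanged.
df = pd.concat([rows_with_a_1, df_2])
print(df)  # original index labels are kept: 1, 2, 0, 1

# ignore_index=True renumbers the rows 0..n-1 instead.
df_renumbered = pd.concat([rows_with_a_1, df_2], ignore_index=True)
print(df_renumbered)
```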

Thank you, that makes more sense! I think I need to read more about Python data structures. Sometimes I'm confused about when to use a DataFrame, Series, list, dictionary, etc.


Can someone explain the mechanics behind creating the object `df`? In other words, why does the code below perform the append:

…and why this version does not:

import pandas as pd

df = pd.DataFrame()      # Creates an empty DataFrame with no columns and no rows.
print("df before:", df)

# Suppose df_1 is:
df_1 = pd.DataFrame({"a": [0, 1, 1], "b": [0, 1, 2]})
print("df_1:", df_1)

df.append(df_1[df_1["a"] == 1])  # here the result is not assigned to anything
print("df after:", df)

I sort of get it, but I don't fully understand it.
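The difference comes from what each method returns. Python's `list.append` changes the list in place and returns `None`, while `DataFrame.append` built and returned a new DataFrame and left the original untouched; if that returned object is not assigned to a name, it is simply discarded, so `df` stays empty. A minimal sketch of the same behaviour, using `pd.concat` because `DataFrame.append` no longer exists in pandas 2.0+:

```python
import pandas as pd

# A plain Python list is modified in place; list.append returns None.
nums = [1, 2]
result = nums.append(3)
print(nums, result)  # [1, 2, 3] None

df_1 = pd.DataFrame({"a": [0, 1, 1], "b": [0, 1, 2]})
df = df_1[df_1["a"] == 0]  # start with one row

# Result NOT assigned: pd.concat builds a new DataFrame, which is thrown away.
pd.concat([df, df_1[df_1["a"] == 1]])
print(len(df))  # 1

# Result assigned back: the name df now refers to the new, bigger DataFrame.
df = pd.concat([df, df_1[df_1["a"] == 1]])
print(len(df))  # 3
```

So `df = df.append(...)` works because the assignment rebinds `df` to the new object; the bare `df.append(...)` call creates that object and then drops it.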