Why double brackets?

Screen Link: https://app.dataquest.io/m/20/logistic-regression/7/predict-labels

Quick question: why does admissions[["gpa"]] need double brackets when the admissions["admit"] column uses single brackets?


In this case, the difference comes down to the data type each expression returns and what fit() expects as input.

print(type(admissions[["gpa"]]))

This prints:

<class 'pandas.core.frame.DataFrame'>

It’s a DataFrame.

A single pair of brackets returns something different:

print(type(admissions["gpa"]))

This prints:

<class 'pandas.core.series.Series'>

This matters because the two objects also have different shapes:

print(admissions["gpa"].shape)

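This prints (644,).

By contrast,

print(admissions[["gpa"]].shape)

prints (644, 1).

Notice the 1 in the second position: the DataFrame is two-dimensional (644 rows, 1 column), while the Series is one-dimensional.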
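If you want to try these checks without the lesson's dataset, here is a minimal sketch. It builds a small hypothetical stand-in for the admissions DataFrame (only the column names come from the lesson; the values are made up) and prints the type, shape, and number of dimensions for both kinds of brackets.

```python
import pandas as pd

# Hypothetical stand-in for the admissions data (the real data has 644 rows).
admissions = pd.DataFrame({
    "admit": [0, 1, 1, 0, 1],
    "gpa": [3.1, 3.8, 3.5, 2.9, 3.9],
})

single = admissions["gpa"]      # a single label -> Series
double = admissions[["gpa"]]    # a list of labels -> DataFrame

print(type(single).__name__, single.shape, single.ndim)  # Series (5,) 1
print(type(double).__name__, double.shape, double.ndim)  # DataFrame (5, 1) 2
```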

Why is this relevant? The documentation for fit() specifies the shapes it expects as input:

X: {array-like, sparse matrix} of shape (n_samples, n_features)

y: array-like of shape (n_samples,)

Compare the two shapes. X is expected to be two-dimensional, (n_samples, n_features). In our case we have only one feature, which is the same 1 we saw above. If we used more than one column as features, that number would be larger.

Double brackets pass a list of column names, so pandas returns a DataFrame even when the list holds a single column. A DataFrame's shape is always (n_rows, n_cols).
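The (n_rows, n_cols) rule is easier to see with more than one column. This sketch uses hypothetical data with a second column, "gre", included purely for illustration. It also shows two other ways to turn a single column into a one-column 2D object: Series.to_frame() and NumPy's reshape(-1, 1).

```python
import pandas as pd

# Hypothetical data; the "gre" column is illustrative only.
admissions = pd.DataFrame({
    "admit": [0, 1, 1, 0],
    "gpa": [3.1, 3.8, 3.5, 2.9],
    "gre": [600, 720, 680, 540],
})

# A list with two labels gives a DataFrame with two columns.
two_features = admissions[["gpa", "gre"]]
print(two_features.shape)  # (4, 2)

# Alternatives to double brackets for a single column:
as_frame = admissions["gpa"].to_frame()                 # DataFrame, shape (4, 1)
as_array = admissions["gpa"].to_numpy().reshape(-1, 1)  # ndarray, shape (4, 1)
print(as_frame.shape, as_array.shape)  # (4, 1) (4, 1)
```

Double brackets are usually the most readable choice because they keep the column name, which the to_numpy() route drops.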

Single brackets return a Series, whose shape is (n_rows,). That one-dimensional shape is exactly what fit() expects for y, and it is what admissions["admit"] returns.
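Putting X and y together, here is a minimal sketch. It assumes the lesson's fit() is scikit-learn's LogisticRegression.fit, which is consistent with the screen being about logistic regression but is not confirmed by the thread. The toy data is hypothetical. A one-column DataFrame works as X, while a Series passed as X is rejected as a 1D array.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Hypothetical toy data; the column names mirror the admissions example.
admissions = pd.DataFrame({
    "admit": [0, 0, 1, 0, 1, 1],
    "gpa": [2.8, 3.0, 3.6, 3.1, 3.8, 3.9],
})

model = LogisticRegression()

# X: one-column DataFrame, shape (n_samples, n_features) = (6, 1)
# y: Series, shape (n_samples,) = (6,)
model.fit(admissions[["gpa"]], admissions["admit"])
coef_shape = model.coef_.shape
print(coef_shape)  # (1, 1): binary problem, one feature

# Passing a Series as X gives a 1D array, which fit() rejects.
one_d_error = None
try:
    model.fit(admissions["gpa"], admissions["admit"])
except ValueError as err:
    one_d_error = err
print(type(one_d_error).__name__)  # ValueError
```

scikit-learn's error message for the 1D case suggests reshaping with reshape(-1, 1), which produces the same (n_samples, 1) shape that double brackets give you.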


Ah, ok… Thanks so much for your thorough explanation! It wasn’t immediately apparent. That makes sense now.


Great question; I always had the same question in mind…

Thanks for answering this very clearly. I was tearing my hair out trying to figure out why it was necessary for the X component, but not the y component. I guess this is why we read the documentation. :sweat_smile:
