Why are Double Brackets used in model fit method - ML Python?



My Code:

model.fit(admissions[["gpa"]], admissions["admit"])

What I expected to happen:

I am confused about why two sets of brackets are required for the input features, whereas one set is enough for the target variable.


Hi @danieldominey,

I read somewhere that the developers who wrote the modelling code wanted to account for multivariate models, i.e. models with more than one input feature.
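A minimal sketch of why the list-of-columns form generalizes, assuming scikit-learn's `LogisticRegression` as the model and a small made-up admissions table. The `gre` column is a hypothetical second feature used only for illustration; it is not part of the original post.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Made-up data; "gre" is a hypothetical second feature for illustration only
admissions = pd.DataFrame({
    "gpa": [3.2, 3.8, 2.9, 3.5, 3.9, 2.7],
    "gre": [520, 700, 480, 640, 760, 400],
    "admit": [0, 1, 0, 1, 1, 0],
})

model = LogisticRegression(max_iter=1000)

# One feature: a one-element list of column names gives a (6, 1) DataFrame
model.fit(admissions[["gpa"]], admissions["admit"])
print(model.coef_.shape)  # (1, 1): one coefficient per feature

# Two features: just extend the list; the fit call itself does not change
model.fit(admissions[["gpa", "gre"]], admissions["admit"])
print(model.coef_.shape)  # (1, 2)
```

Because the features are always passed as a list of columns, going from one feature to many is only a matter of adding names to that list.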

But there’s a more logical explanation for this. The feature input is always expected in a 2D array format, in which each row represents a sample and each column represents a feature (in scikit-learn, for example, `X` has shape `(n_samples, n_features)`). This holds regardless of which ML algorithm you are using.
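To see this requirement in action, here is a small sketch, assuming scikit-learn's `LogisticRegression` and made-up GPA values. The helper `fits` is hypothetical, written only for this demonstration: it reports whether the model accepts a given feature array.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

gpa_1d = np.array([3.2, 3.8, 2.9, 3.5])          # shape (4,): samples or features?
gpa_2d = np.array([[3.2], [3.8], [2.9], [3.5]])  # shape (4, 1): 4 samples, 1 feature
admit = np.array([0, 1, 0, 1])                   # target stays 1D, shape (4,)


def fits(X, y):
    """Hypothetical helper: True if the model accepts X, False on ValueError."""
    try:
        LogisticRegression().fit(X, y)
        return True
    except ValueError as err:
        # scikit-learn raises ValueError when X is not 2D
        print("Rejected:", str(err).splitlines()[0])
        return False


print(fits(gpa_1d, admit))  # the 1D feature array is rejected
print(fits(gpa_2d, admit))  # the same values as one column are accepted
```

The values are identical in both calls; only the shape of `X` decides whether `fit` accepts it.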

The logic can be understood like this:

  • np.array([1, 2, 3]): we can’t tell whether this is 3 samples with 1 feature or 1 sample with 3 features.

  • np.array([[1], [2], [3]]) is read as 3 samples with 1 feature.

  • np.array([[1, 1], [1, 2], [1, 3]]) is read as 3 samples with 2 features.
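The three arrays from the list above can be inspected directly with NumPy; `ndim` and `shape` make the ambiguity of the first one visible.

```python
import numpy as np

a = np.array([1, 2, 3])                 # only one axis: samples or features?
b = np.array([[1], [2], [3]])           # 3 rows (samples) x 1 column (feature)
c = np.array([[1, 1], [1, 2], [1, 3]])  # 3 rows (samples) x 2 columns (features)

for name, arr in [("a", a), ("b", b), ("c", c)]:
    print(name, arr.ndim, arr.shape)
# a 1 (3,)
# b 2 (3, 1)
# c 2 (3, 2)
```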

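So even when we use only one feature, the data has to be in a 2D format so that the model interprets it correctly. That is what the double brackets do in pandas: admissions[["gpa"]] selects a list of columns and returns a one-column DataFrame (2D), whereas admissions["gpa"] returns a Series (1D). The target, on the other hand, is expected as a 1D array with one value per sample, which is why single brackets are enough for admissions["admit"].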
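Applied to the original question, here is a sketch with a small made-up admissions DataFrame (the values are illustrative only). It shows that double brackets and an explicit `reshape(-1, 1)` on the single-bracket Series both produce the same 2D, one-column shape.

```python
import numpy as np
import pandas as pd

# Made-up stand-in for the admissions data
admissions = pd.DataFrame({"gpa": [3.2, 3.8, 2.9], "admit": [0, 1, 0]})

single = admissions["gpa"]    # one column name -> Series, 1D
double = admissions[["gpa"]]  # list of column names -> DataFrame, 2D
print(type(single).__name__, single.shape)  # Series (3,)
print(type(double).__name__, double.shape)  # DataFrame (3, 1)

# Alternative starting from the 1D Series: reshape into one column.
# -1 tells NumPy to infer the number of rows from the data.
reshaped = single.to_numpy().reshape(-1, 1)
print(reshaped.shape)                                # (3, 1)
print(np.array_equal(reshaped, double.to_numpy()))   # True
```

Double brackets are usually the more readable choice with pandas, while `reshape(-1, 1)` is the common fix when you already have a plain 1D NumPy array.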

If you have an appetite for more technical detail, you may refer to the answer post on Stack Overflow here. That answer links to two other posts that explain it in more depth.

In addition to @Rucha’s detailed answer, I would suggest going through an existing post similar to yours, "Why double brackets?", and I encourage you to use the community’s search feature (although I understand it can be tricky to find similar questions at times)!