Questions on Regression

Hi all, I have a few doubts. When applying min-max normalization to a dataset for regression:

  1. Should we apply normalization on the whole dataset or just on the train alone?
  2. Should we normalize the target variable also?
  3. Should we normalize the target variable and predictor variable in both the train and test set?
  4. If applying a power transform, should we apply it to both the predictor and target variables, and should it be applied to both the train and test sets?
  1. Both, but fit the scaler on the training set only and reuse its learned parameters on the test set (fitting on the whole dataset leaks test-set information into training). The model you build on train produces parameters that fit the scaled values from the training set; if you suddenly feed it unscaled values from the test set, the predictions will be very off. Same reason why, when using pre-trained neural networks, people import the preprocessor from the library and run the inputs through the same preprocessor before passing them to the model for prediction.
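A minimal sketch of the fit-on-train-only pattern using scikit-learn's MinMaxScaler (the toy data and variable names here are mine, just for illustration):

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

rng = np.random.default_rng(0)
X_train = rng.uniform(0, 100, size=(80, 2))
X_test = rng.uniform(0, 100, size=(20, 2))

# Fit the scaler on the training set ONLY, then reuse its learned
# per-feature min/max to transform the test set.
scaler = MinMaxScaler()
X_train_s = scaler.fit_transform(X_train)
X_test_s = scaler.transform(X_test)  # transform, NOT fit_transform

# Training values land exactly in [0, 1]; test values can fall slightly
# outside that range if they exceed the training min/max, which is fine.
```

The key point is that `scaler.transform` on the test set applies the training set's min/max, so test inputs are on the same scale the model's parameters were learned on.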

  2. https://stats.stackexchange.com/questions/111467/is-it-necessary-to-scale-the-target-value-in-addition-to-scaling-features-for-re Some say no, some say yes, but all of those answers seem to be in the context of fitting a linear regression by gradient descent (an iterative method). That is different from solving a linear regression analytically (directly solving the closed-form matrix equation, when it is not too big and is feasible in time), so conclusions may vary: https://stats.stackexchange.com/questions/23128/solving-for-regression-parameters-in-closed-form-vs-gradient-descent
    Also, I find it hard to find any discussion of normalization (e.g. min-max) for regression; most sources I can find talk about standardization only.
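If you do scale the target, the part that trips people up is remembering to invert the scaling on the predictions. A sketch (toy data and names are mine, not from the thread):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import MinMaxScaler

rng = np.random.default_rng(1)
X = rng.uniform(0, 10, size=(100, 1))
y = 3.0 * X[:, 0] + rng.normal(0, 0.1, size=100)

# Min-max scale the target (in a real workflow, fit on train targets only).
y_scaler = MinMaxScaler()
y_s = y_scaler.fit_transform(y.reshape(-1, 1))

model = LinearRegression().fit(X, y_s)

# Predictions come out on the scaled scale; map them back with
# inverse_transform before reporting, or the numbers are meaningless.
y_pred = y_scaler.inverse_transform(model.predict(X))
```

Note that for ordinary least squares solved in closed form (with an intercept), min-max scaling y and inverting afterwards gives the same predictions as fitting on raw y, since an affine map of the target just rescales the coefficients; that is one reason the scale-the-target debate mostly matters for iterative or regularized methods.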

  3. Same answer as 1: process the test set the same way as the train set. For predictor vs. target, see the answers in 2.

  4. Transforms can be applied to X, or y, or both. Do whatever it takes to make a non-linear relationship look linear so that linear regression can be used. Apply the same transforms to both train and test.
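As a sketch of "make it look linear": if the true relationship is exponential, a log transform of the target turns it into a straight line that linear regression can fit exactly (the constants 2.0 and 0.7 below are made up for the example):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(2)
x = rng.uniform(1, 5, size=200)
y = 2.0 * np.exp(0.7 * x)  # y = a * exp(b*x): non-linear in x

# log(y) = log(a) + b*x is linear in x, so fit on the transformed target.
model = LinearRegression().fit(x.reshape(-1, 1), np.log(y))

# Recover the original parameters; apply the SAME log transform to any
# test targets, and map predictions back with exp afterwards.
b_hat = model.coef_[0]          # estimate of b
a_hat = np.exp(model.intercept_)  # estimate of a
```

The same discipline as answer 1 applies: whatever transform (log, Box-Cox, etc.) you fit or choose on train, apply it unchanged to test, and invert it on the predictions.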