Is it really necessary at this point in the course to make us hard-code 14 new variables?!

Hello everyone,
I just have something to point out about the Overfitting part of the course. By this point we should all be coding a lot better. When they want to show us that variance increases alongside the number of features, I was expecting just a simple loop to collect all the results in a list or a dictionary, for example:

Loop approach
features = ["cylinders", "displacement", "horsepower", "weight",
            "acceleration", "model year", "origin"]

mses = []
variances = []

# One call per feature count: features[:n] selects the first n features,
# so the range covers one through seven features (14 values in total).
for n in range(1, len(features) + 1):
    mse, variance = train_and_test(features[:n])
    mses.append(mse)
    variances.append(variance)

But instead Dataquest wants us to hard-code 14 variables for the variance and the squared error. Is it really necessary? It is just too much to write and declare, in my opinion. Does somebody have another reason why it should be like that?

PS: in the next exercise we are asked to do exactly the same thing by defining 14 more variables. I just skipped it because I got bored by such a programming approach.

Thanks!


It can depend on what needs to be done later. If you wanted to work with any of those pairs from variances and mses, you would have to access them at some point.

If you weren’t carrying out any operations on those in bulk, then you would either have to use mses[1] and variances[1] every time, or store those two into new variables and use those variables.

But, even so, your approach offers some modularity and can be adapted to different cases (more features, or a different dataset). So it does seem better to me regardless.
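To illustrate that point: once the results live in two parallel lists, any (MSE, variance) pair can be accessed together with zip, with no hardcoded names. This is a minimal sketch; the numbers here are made up for illustration, since train_and_test comes from the course's own notebook.

```python
# Hypothetical results for the first three feature counts
# (illustrative values only, not real course output).
mses = [24.0, 21.3, 18.7]
variances = [36.7, 39.0, 42.5]

# zip pairs each MSE with its variance; enumerate tracks the
# number of features that produced the pair.
for n, (mse, var) in enumerate(zip(mses, variances), start=1):
    print(f"{n} feature(s): mse={mse}, variance={var}")
```

Working with zip like this is exactly the kind of "operation in bulk" that the fourteen standalone variables make impossible.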

Modular courses developed by a team of content creators tend to face such problems. You can provide them feedback about this through the Contact Us page in the top-right.

Hello again :sweat_smile:,
It seems you are the only one answering my questions all the time. Thanks for that. When doing exercises I always try to optimize the code, but sometimes Dataquest makes me feel that I am writing not very readable code. It happens all the time when there are big loops in the answers, where a list comprehension or a vectorized operation to join two columns is faster and shorter.
Anyway, this was just a post to let people know that there are many more ways of solving the same challenge in Python. It would be nice to have a mission called “tips and tricks to write code in a more pythonic way”.


Just wanted to revive this thread, as I have spent a good hour or so trying to solve this “nicely” and nothing seems to work… Unfortunately it doesn’t look like f-strings work in the hosted Python editor, so my question now is: why doesn’t something like this work?

features = ['cylinders', 'displacement', 'horsepower', 
           'weight', 'acceleration', 'model year', 'origin']
numbers_mse = ['one_mse', 'two_mse', 'three_mse', 'four_mse',
               'five_mse', 'six_mse', 'seven_mse']
numbers_var = ['one_var', 'two_var', 'three_var', 'four_var',
               'five_var', 'six_var', 'seven_var']

# Note: this overwrites the strings *inside* the lists; it does not
# create variables named one_mse, two_mse, and so on. Unpacking the
# returned pair also avoids calling train_and_test twice per step.
for i in range(len(numbers_mse)):
    numbers_mse[i], numbers_var[i] = train_and_test(features[:i + 1])

i.e., is there any way to make the for loop create a new variable from the lists numbers_mse and numbers_var and assign the function's result to it, instead of overwriting the strings inside numbers_mse and numbers_var?

Again, I know that in the real world one would instead create a dataframe or dict, but I'm just trying to find the ‘elegant’ solution to this specific problem, haha.
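For reference, the dict version mentioned here is short. A sketch, again with a stub in place of the course's train_and_test helper:

```python
def train_and_test(cols):
    # Stub standing in for the course helper, which returns
    # an (mse, variance) pair for the selected columns.
    return (len(cols) * 1.0, len(cols) * 2.0)

features = ['cylinders', 'displacement', 'horsepower',
            'weight', 'acceleration', 'model year', 'origin']

# One dict comprehension replaces the fourteen hardcoded variables:
# each feature count maps to its (mse, variance) pair.
results = {n: train_and_test(features[:n]) for n in range(1, len(features) + 1)}

print(results[7])  # the pair for all seven features
```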

The elegant solution is to not do what the course suggests in the first place. Hardcoding the values was the wrong approach, and it’s not a good idea to create variables from strings. Still, if you want to, you can use something like this:

features = ['cylinders', 'displacement', 'horsepower', 
           'weight', 'acceleration', 'model year', 'origin']
numbers_mse = ['one_mse', 'two_mse', 'three_mse', 'four_mse',
               'five_mse', 'six_mse', 'seven_mse']
numbers_var = ['one_var', 'two_var', 'three_var', 'four_var',
               'five_var', 'six_var', 'seven_var']

num_of_features = len(features)
create_vars_template = "{mse}, {var} = train_and_test(features[:{idx}])"
for idx in range(num_of_features):
    # Build a line of source code such as:
    #     one_mse, one_var = train_and_test(features[:1])
    to_run = create_vars_template.format(mse=numbers_mse[idx],
                                         var=numbers_var[idx],
                                         idx=idx + 1)
    print(to_run)
    exec(to_run)  # actually creates the two variables
    # Evaluate the left-hand side ("one_mse, one_var") to print the pair.
    print(eval(to_run.split("=")[0]))

Unfortunately, for some reason, we’re doing something that makes this not work for answer checking, even though it is correct.
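An alternative that avoids building source strings entirely: globals() returns the module's variable table as a plain dict, so assigning into it creates the named variables directly. A sketch under the same caveat as above, with a stub in place of the course helper:

```python
def train_and_test(cols):
    # Stub for the course helper; returns an (mse, variance) pair.
    return (len(cols) * 1.0, len(cols) * 2.0)

features = ['cylinders', 'displacement', 'horsepower',
            'weight', 'acceleration', 'model year', 'origin']
names = ['one', 'two', 'three', 'four', 'five', 'six', 'seven']

# Writing into globals() creates module-level variables like one_mse
# and one_var without exec() or string-built statements.
for idx, name in enumerate(names, start=1):
    mse, var = train_and_test(features[:idx])
    globals()[name + '_mse'] = mse
    globals()[name + '_var'] = var

print(seven_mse, seven_var)
```

It carries the same caveat as the exec approach: creating variables dynamically is almost never the right design, and a dict keyed by feature count is the cleaner home for these results.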
