Guided project: building a spam filter

Screen Link:

My Code:

training_test_index = round(len(data_randomized)*0.8)
training_set = data_randomized[:training_test_index].reset_index(drop=True)
test_set = data_randomized[training_test_index:].reset_index(drop=True)
print(training_set.shape)
print(test_set.shape)
print(training_test_index)

The result puts the first 4458 rows into training_set and the last 1114 rows into test_set. From what I understand, it should put the last 4458 rows into the test set, shouldn't it?

Best,
Jessie

Hi again, Jessie,

We want 80% of the data in the training set and 20% in the test set.
training_test_index is 4458, since len(data_randomized) is 5572 and round(5572 * 0.8) = round(4457.6) = 4458.

For the training set, we select all the rows of data_randomized up to but not including position training_test_index, i.e. 4458 (the slice :training_test_index does this). Since indexing in Python starts from 0, that covers positions 0 through 4457, which gives 4458 rows for the training set.

For the test set, we select all the rows of data_randomized starting from and including position training_test_index, i.e. 4458 (the slice training_test_index: does this). Hence the test set gets the remaining 5572 - 4458 = 1114 rows, which is the 20% we wanted.
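
To check this yourself, here is a small sketch of the same split on a stand-in DataFrame. The real dataset is not reproduced here: the toy data_randomized below and its row_id column are assumptions made up for illustration, with only the row count (5572) taken from this thread.

```python
import pandas as pd

# Hypothetical stand-in for data_randomized: 5572 rows, each tagged with
# its original position so we can see which rows end up in which set.
data_randomized = pd.DataFrame({"row_id": range(5572)})

# 80% of 5572 is 4457.6, which rounds to 4458.
training_test_index = round(len(data_randomized) * 0.8)

# [:n] keeps positions 0 .. n-1; [n:] keeps positions n .. end.
training_set = data_randomized[:training_test_index].reset_index(drop=True)
test_set = data_randomized[training_test_index:].reset_index(drop=True)

print(training_set.shape)               # (4458, 1)
print(test_set.shape)                   # (1114, 1)
print(training_set["row_id"].iloc[-1])  # 4457, the last training row
print(test_set["row_id"].iloc[0])       # 4458, the first test row
```

The two slices never overlap and together cover every row, so the first 4458 rows form the training set and the last 1114 rows form the test set.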