Hi! I don't understand the solution code that takes 80% of the data set for training and the remaining 20% for testing. Could you help explain the training/test split part? We need to take 4458 rows from the randomized data set as the training set, then take the rest as the test set, but I can't understand how data_randomized[:training_test_index] takes out 4458 rows. (I thought df[:4458] meant that we take all rows and column 4458 from df.) And how does data_randomized[training_test_index:] mean "take the rest of the data"? Thank you!!
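To make the row-versus-column question concrete, here is a small sketch on a made-up 5-row frame. The name `toy` and its values are assumptions for illustration only and are not part of the project data. As far as I can tell, a bare slice inside `[]` on a DataFrame picks rows by position, much like slicing a Python list, while picking a column needs a label or `.iloc`.

```python
import pandas as pd

# Hypothetical 5-row frame; names and values are made up for illustration
toy = pd.DataFrame({
    "Label": ["ham", "spam", "ham", "ham", "spam"],
    "SMS": ["a", "b", "c", "d", "e"],
})

# A bare slice inside [] selects ROWS by position, like slicing a list
first_three = toy[:3]   # rows at positions 0, 1, 2 (the stop position is excluded)
the_rest = toy[3:]      # rows from position 3 through the end

# Picking a column needs a column label or an explicit column position
sms_by_label = toy["SMS"]
sms_by_position = toy.iloc[:, 1]   # all rows, column at position 1

print(first_three.shape)  # (3, 2)
print(the_rest.shape)     # (2, 2)
```

If that reading is right, df[:4458] would be the first 4458 rows with all columns, not a single column.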
Screen Link: https://app.dataquest.io/m/433/guided-project%3A-building-a-spam-filter-with-naive-bayes/2/training-and-test-set
My Code:
# Randomize the dataset
data_randomized = sms_spam.sample(frac=1, random_state=1)
# Calculate index for split
training_test_index = round(len(data_randomized) * 0.8)
# Training/Test split
training_set = data_randomized[:training_test_index].reset_index(drop=True)
test_set = data_randomized[training_test_index:].reset_index(drop=True)
print(training_set.shape)
print(test_set.shape)
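
One way to check that the two slices really cover every row once is to run the same steps on a tiny frame. This is only a sketch: `toy`, `shuffled`, `split_at`, `train` and `test` are hypothetical names, and the 10-row frame stands in for sms_spam, which is not loaded here.

```python
import pandas as pd

# Hypothetical 10-row frame; the real sms_spam data is not loaded here
toy = pd.DataFrame({"SMS": [f"msg {i}" for i in range(10)]})
shuffled = toy.sample(frac=1, random_state=1)  # same rows, random order

split_at = round(len(shuffled) * 0.8)          # 8 for 10 rows

train = shuffled[:split_at].reset_index(drop=True)  # first 8 rows by position
test = shuffled[split_at:].reset_index(drop=True)   # remaining 2 rows

# Both pieces together contain every original row exactly once
combined = sorted(train["SMS"].tolist() + test["SMS"].tolist())
print(train.shape, test.shape)                  # (8, 1) (2, 1)
print(combined == sorted(toy["SMS"].tolist()))  # True
```

The `[:split_at]` slice stops just before position `split_at`, and `[split_at:]` starts exactly there, so no row is dropped or counted twice.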