How to take 80% of the data from the dataset?

Hi! I don't understand the solution code for taking 80% of the data from the dataset and then the remaining 20%. Could you help explain the training/test split part? We need to take 4458 rows from the randomized dataset as the training set, then take the rest as the test set. But I can't understand how data_randomized[:training_test_index] takes those 4458 rows out.
(For df[:4458], I thought it meant that we took all the rows and column 4458 from df.)
And how does data_randomized[training_test_index:] mean "take the rest of the data"? Thank you!!

Screen Link: https://app.dataquest.io/m/433/guided-project%3A-building-a-spam-filter-with-naive-bayes/2/training-and-test-set

My Code:


# Randomize the dataset
data_randomized = sms_spam.sample(frac=1, random_state=1)

# Calculate index for split
training_test_index = round(len(data_randomized) * 0.8)

# Training/Test split
training_set = data_randomized[:training_test_index].reset_index(drop=True)
test_set = data_randomized[training_test_index:].reset_index(drop=True)

print(training_set.shape)
print(test_set.shape)

If you have the following list -

a = [1, 2, 3, 4, 5, 6, 7, 8, 9]
  • What is a[:3]?
  • What is a[3:5]?
  • What is a[5:8]?
  • What is a[5:]?

Print the above out if you need to, but the indexing concept is the same as in the code you shared.
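
For anyone reading later, here is a minimal sketch of what those slices return, followed by the same 80/20 idea applied to the list. The names split_index, training and test are made up for this illustration and are not from the guided project. A pandas DataFrame treats a plain slice such as df[:4458] the same way: it selects rows by position (rows 0 through 4457), not a column.

```python
a = [1, 2, 3, 4, 5, 6, 7, 8, 9]

# A slice a[start:stop] includes position `start` and excludes position `stop`.
# Leaving out `start` means "from the beginning"; leaving out `stop` means "to the end".
first_three = a[:3]   # positions 0, 1, 2
middle = a[3:5]       # positions 3, 4
later = a[5:8]        # positions 5, 6, 7
rest = a[5:]          # position 5 through the end

print(first_three)    # [1, 2, 3]
print(middle)         # [4, 5]
print(later)          # [6, 7, 8]
print(rest)           # [6, 7, 8, 9]

# The same idea as the training/test split, on the small list.
# split_index, training and test are hypothetical names for this sketch.
split_index = round(len(a) * 0.8)   # round(7.2) gives 7
training = a[:split_index]          # the first 7 items
test = a[split_index:]              # everything after them

print(training)       # [1, 2, 3, 4, 5, 6, 7]
print(test)           # [8, 9]
```

Because the first slice stops just before split_index and the second one starts at split_index, every item lands in exactly one of the two pieces, with nothing duplicated or dropped.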

I got it!! I wasn't thinking about it the right way… thank you!
