# How to take 80% of the data from the dataset?

Hi! I don't understand the solution code for taking 80% of the data set as the training set and the remaining 20% as the test set. Could you help explain the training/test split part? We need to take 4458 rows from the randomized data set as the training set, then take the rest as the test set, but I can't understand how `data_randomized[:training_test_index]` takes 4458 rows out.
(`df[:4458]`: I thought this meant taking all rows and column 4458 from df.)
And how does `data_randomized[training_test_index:]` mean "take the rest of the data"? Thank you!!

My Code:

``````
# Randomize the dataset
data_randomized = sms_spam.sample(frac=1, random_state=1)

# Calculate index for split
training_test_index = round(len(data_randomized) * 0.8)

# Training/Test split
training_set = data_randomized[:training_test_index].reset_index(drop=True)
test_set = data_randomized[training_test_index:].reset_index(drop=True)

print(training_set.shape)
print(test_set.shape)
``````

Suppose you have the following list:

``````
a = [1, 2, 3, 4, 5, 6, 7, 8, 9]
``````
• What is `a[:3]`?
• What is `a[3:5]`?
• What is `a[5:8]`?
• What is `a[5:]`?
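For anyone who wants to check their answers, here is a minimal sketch of the four slices. The variable names (`first_three`, `middle`, `later`, `rest`) are made up for illustration and are not part of the original code.

```python
a = [1, 2, 3, 4, 5, 6, 7, 8, 9]

# A slice a[start:stop] keeps positions start, start+1, ..., stop-1.
# Leaving out start means "from the beginning"; leaving out stop means "to the end".
first_three = a[:3]   # positions 0, 1, 2
middle = a[3:5]       # positions 3, 4
later = a[5:8]        # positions 5, 6, 7
rest = a[5:]          # position 5 through the end

print(first_three)  # [1, 2, 3]
print(middle)       # [4, 5]
print(later)        # [6, 7, 8]
print(rest)         # [6, 7, 8, 9]

# a[:n] and a[n:] together cover the whole list, with no overlap and nothing lost.
print(a[:5] + a[5:] == a)  # True
```

The last line is the key idea behind a training/test split: cutting at one index with `[:n]` and `[n:]` gives two pieces that together make up the original data.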

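Print the above out if you need to; the indexing concept is the same as in the code you shared. On a DataFrame, a slice inside `[]` selects rows by position, so `df[:4458]` is the first 4458 rows and `df[4458:]` is every row after them.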
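To see the same behaviour on a DataFrame, here is a small sketch using a 10-row stand-in for `sms_spam`. The `toy` DataFrame and its `Label` and `SMS` column names are assumptions made up for this example, not the real dataset.

```python
import pandas as pd

# Hypothetical 10-row stand-in for sms_spam (2 columns).
toy = pd.DataFrame({
    "Label": ["ham", "spam"] * 5,
    "SMS": [f"message {i}" for i in range(10)],
})

# Shuffle the rows, then compute the cut point at 80% of the row count.
data_randomized = toy.sample(frac=1, random_state=1)
training_test_index = round(len(data_randomized) * 0.8)  # 8 for 10 rows

# A slice inside [] works on rows: the first 8 rows, then everything after them.
training_set = data_randomized[:training_test_index].reset_index(drop=True)
test_set = data_randomized[training_test_index:].reset_index(drop=True)

print(training_set.shape)  # (8, 2)
print(test_set.shape)      # (2, 2)

# Selecting columns by position needs .iloc with a second slice instead.
first_column = data_randomized.iloc[:, :1]
print(first_column.shape)  # (10, 1)
```

Both pieces keep every column; only the number of rows changes, which is why the two shapes share the same second value.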

I got it!! I was not thinking about it correctly… thank you!
