Guided Project: Building a Spam Filter with Naive Bayes - iterrows()?

```
correct = 0
total = test_set.shape[0]

for row in test_set.iterrows():
    row = row[1]
    if row['Label'] == row['predicted']:
        correct += 1

print('Correct:', correct)
print('Incorrect:', total - correct)
print('Accuracy:', correct/total)
```
So I have this code at the end of the project that I do not fully understand. The iterrows() part is quite unclear, and so is row = row[1]. I understand that it lets us iterate over DataFrame rows as (index, Series) pairs, but what is its main purpose? I would appreciate an explanation. Thanks

Hi! I have the same doubt. Can someone please clarify? Thanks

@probot @shamikthebest

iterrows() is a pandas DataFrame method that, as the documentation states, lets us:

Iterate over DataFrame rows as (index, Series) pairs.

So, row in that for loop will be the pair of (index, row content) from test_set. They should have used a better variable name here instead of row, but that’s what it essentially represents - the pair (index, content of the row).

The first row of test_set is -

So, from this pair we need to extract the content of the actual row. That’s why they use indexing:

row = row[1]

On the first iteration, row[1] gives us the content of the first row of test_set, and row[0] would have given us the index of that row.

To be more specific, row[1] gives us a Series that represents the first row of test_set. Just as an entire column of a DataFrame is a Series, an individual row of a DataFrame is also a Series.
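To make the (index, Series) pair concrete, here is a minimal sketch on a made-up DataFrame. The name `toy`, its index labels, and its contents are hypothetical stand-ins for test_set, which is not shown in this thread.

```python
import pandas as pd

# Hypothetical stand-in for test_set; the real data is not shown in this thread.
toy = pd.DataFrame(
    {"Label": ["ham", "spam", "ham"], "predicted": ["ham", "ham", "ham"]},
    index=[10, 11, 12],
)

# Take only the first item that iterrows() yields.
pair = next(toy.iterrows())

print(type(pair))     # a tuple holding (index, Series)
print(pair[0])        # the index label of the first row
print(type(pair[1]))  # a pandas Series with the content of that row
print(pair[1]["Label"], pair[1]["predicted"])
```

Since `pair[1]` is a Series, its values can be looked up by column name, which is what the project code does with `row['Label']` and `row['predicted']`.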

And then they just check if the Label matches with the predicted value or not.
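As a rough sketch of that check, the loop below unpacks each pair directly in the `for` header, which avoids reassigning `row`, and then compares the result with a column-wise comparison. The DataFrame `toy` and the name `correct_vectorized` are made up for illustration; only the column names come from the project code.

```python
import pandas as pd

# Hypothetical stand-in for test_set with the two columns used in the project.
toy = pd.DataFrame({
    "Label": ["ham", "spam", "ham", "spam"],
    "predicted": ["ham", "ham", "ham", "spam"],
})

# Loop version: unpack each (index, Series) pair and ignore the index.
correct = 0
for _, row in toy.iterrows():
    if row["Label"] == row["predicted"]:
        correct += 1

# Column-wise version: compare the two columns element by element
# and count how many comparisons are True.
correct_vectorized = int((toy["Label"] == toy["predicted"]).sum())

total = toy.shape[0]
print("Correct:", correct)
print("Incorrect:", total - correct)
print("Accuracy:", correct / total)
```

Both approaches count the same rows. The loop mirrors the project code, while the column-wise comparison is usually shorter and faster on large DataFrames.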


Thanks, now I get it.

This might be a stupid question, but why doesn’t something like this work:

for row in test:
    if row[0] == row[2]:
        correct += 1

Because the loop above, assuming you mean test_set and not test in this case, iterates over the DataFrame itself, and that yields the column names as strings rather than the rows.

So, row[0] and row[2] would just be the 0th and 2nd character in the column name string. For example, for the column name Label, you will be checking if "L" == "b".

You can easily add print statements to your code and check what the values represent to understand this better.
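For instance, a quick check along these lines shows what a plain `for` loop over a DataFrame yields. The DataFrame `toy` and its `SMS` column are assumptions made for illustration, since the real columns of test_set are not listed in this thread.

```python
import pandas as pd

# Hypothetical stand-in for test_set; the SMS column is an assumption.
toy = pd.DataFrame({
    "Label": ["ham", "spam"],
    "SMS": ["see you at 5", "WINNER! claim now"],
    "predicted": ["ham", "spam"],
})

# Iterating over the DataFrame itself walks through the column names,
# so item[0] and item[2] are single characters of each name.
for item in toy:
    print(repr(item), "->", repr(item[0]), repr(item[2]))

# The same column names, collected into a list.
column_names = list(toy)
print(column_names)
```

None of the values printed here come from the rows, which is why the comparison in that loop never looks at the labels or predictions.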