Solution Error: Working with Missing Data Exercise 6

Screen Link:

My Code:
col_labels = ['v_number', 'vehicle_missing', 'cause_missing']

vc_null_data = []

for v in range(1,6):
    v_col = 'vehicle_{}'.format(v)
    c_col = 'cause_vehicle_{}'.format(v)
    v_null = (mvc[v_col].isnull().sum() & mvc[c_col].notnull().sum())
    c_null = (mvc[c_col].isnull().sum() & mvc[v_col].notnull().sum())
    vc_null_data.append([v, v_null, c_null])
vc_null_df = pd.DataFrame(vc_null_data, columns = col_labels)

What I expected to happen:
I get a table with the missing vehicle nulls and cause nulls.

What actually happened:

The actual results do not match the expected results.

I entered the solution exactly as expected and I am uncertain why my results do not match the expected results.


Hi @ryan.pikulski,

There is an issue with your code. In step 2 of the mission screen, you have to count the number of rows where the v_col column is null and the c_col column is not null:

v_null = (mvc[v_col].isnull() & mvc[c_col].notnull()).sum()

So the sum() call has to be placed after the closing parenthesis, so that it applies to the combined condition rather than to each condition separately.

The same applies to calculating c_null.

Hi ryan.pikulski

As per the instructions in Step 2:

Count the number of rows where the v_col column is null and the c_col column is not null. Assign the result to v_null

So we have to get the rows where v_col is null and c_col is not null.
Then count the rows that satisfy the above condition.
v_null = (mvc[v_col].isnull() & mvc[c_col].notnull()).sum()

Likewise, we have to find c_null as well.

Your code instead counts the rows for which mvc[v_col].isnull() is true, separately counts the rows for which mvc[c_col].notnull() is true, and then applies & to those two integer counts. On integers, & is a bitwise AND, so the result is not the number of rows that satisfy both conditions.
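To make the difference concrete, here is a minimal sketch on a made-up six-row table. The DataFrame `toy` and its values are assumptions for illustration only; they are not taken from the real mvc dataset.

```python
import pandas as pd

# Hypothetical stand-in for mvc; values are invented for illustration.
toy = pd.DataFrame({
    "vehicle_1": [None, None, None, None, "Sedan", "Taxi"],
    "cause_vehicle_1": ["A", "B", None, None, "C", None],
})

# Original approach: sum each condition first, then apply & to the two counts.
v_isnull_count = toy["vehicle_1"].isnull().sum()          # 4 rows have a null vehicle
c_notnull_count = toy["cause_vehicle_1"].notnull().sum()  # 3 rows have a non-null cause
wrong = v_isnull_count & c_notnull_count                  # bitwise AND of 4 and 3

# Corrected approach: combine the conditions row by row, then count the True values.
right = (toy["vehicle_1"].isnull() & toy["cause_vehicle_1"].notnull()).sum()

print(wrong)  # 0, because 0b100 & 0b011 == 0
print(right)  # 2, the rows where vehicle is null and cause is not
```

Only the first two rows satisfy both conditions, which the row-wise version counts correctly, while the bitwise AND of the two totals happens to give 0 here.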

Hope it's clear now.

I see now. This makes sense. As I understand the statement, this returns two new Series objects and we are counting the series returned, not the individual conditions that create each series.

It returns one new Series object
mvc[v_col].isnull() & mvc[c_col].notnull()
containing boolean values. We then apply sum() to this object, which counts only its True values, and a value is True only in the rows where both conditions are satisfied.
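A small sketch of what that single boolean Series looks like may help. The four-row `toy` DataFrame below is a hypothetical example, not the real mvc data.

```python
import pandas as pd

# Hypothetical four-row example; not the real mvc data.
toy = pd.DataFrame({
    "vehicle_2": ["Sedan", None, None, "Van"],
    "cause_vehicle_2": ["Unspecified", "Unspecified", None, None],
})

# & combines the two boolean Series element-wise into one boolean Series.
mask = toy["vehicle_2"].isnull() & toy["cause_vehicle_2"].notnull()
print(mask.tolist())  # [False, True, False, False]

# sum() treats True as 1 and False as 0, so it counts the matching rows.
print(mask.sum())     # 1

# The same mask can also be used to inspect the matching rows directly.
print(toy[mask])
```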


Since our objective is to find the total NaNs across all 10 vehicle columns, we are instructed to:

  1. Count the number of rows where the v_col column is null and the c_col column is not null
  2. Count the number of rows where the c_col column is null and the v_col column is not null

But shouldn’t we also count the number of rows where both the v_col column and the c_col column are null?

Hi @Datom,

In that particular task it’s not required, but for additional practice you can of course calculate them as well. Note, though, that rows where both v_col and c_col are null are counted in neither v_null nor c_null, because each of those conditions requires the other column to be non-null.
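For that extra practice, one possible sketch is shown below. The name `both_null` and the toy data are assumptions made for illustration; the exercise itself only asks for v_null and c_null.

```python
import pandas as pd

# Hypothetical five-row example; not the real mvc data.
toy = pd.DataFrame({
    "vehicle_3": ["Sedan", None, None, "Van", None],
    "cause_vehicle_3": ["Unspecified", "Unspecified", None, None, None],
})

v_missing = toy["vehicle_3"].isnull()
c_missing = toy["cause_vehicle_3"].isnull()

v_null = (v_missing & ~c_missing).sum()     # vehicle null, cause present
c_null = (c_missing & ~v_missing).sum()     # cause null, vehicle present
both_null = (v_missing & c_missing).sum()   # both null (the extra count)

print(v_null, c_null, both_null)  # 1 1 2

# On this toy data, the three cases do not overlap: every null vehicle is
# either "vehicle only" or "both", so the counts add up to the column total.
print(v_null + both_null == v_missing.sum())  # True
```

Under these assumptions, adding `both_null` to either count recovers the total number of nulls in the corresponding column.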