How to figure out which columns prevent unique values in the DBN column?

Hi everyone,

I’m on the second lesson of the Data Cleaning Project Walkthrough, where we apply different techniques to combine the NYC high school datasets.

I have a question regarding this lesson.

How do we figure out which columns prevent the DBN column from having unique values? For example:

  1. In the class_size dataset, the CORE COURSE (MS CORE and 9-12 ONLY) and CORE SUBJECT (MS CORE and 9-12 ONLY) columns prevent the DBN column from being unique.
  2. The same is true of the demographics dataset, where the schoolyear column prevents DBN from being unique.
  3. In the graduation dataset, the Cohort and Demographic columns cause the problem.
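One way to find these columns programmatically, sketched below under the assumption that each dataset is loaded as a pandas DataFrame with a `DBN` column: keep only the rows whose DBN appears more than once, then count the distinct values of every other column within each DBN group. Any column that has more than one distinct value inside some group is a candidate for what breaks uniqueness. The function name `columns_varying_within_key` and the toy `demographics` rows are hypothetical, for illustration only; they are not from the lesson.

```python
import pandas as pd


def columns_varying_within_key(df: pd.DataFrame, key: str = "DBN") -> list:
    """Return the columns whose values differ among rows sharing the same key.

    Hypothetical helper: only groups where the key is duplicated are inspected,
    and a column is reported if at least one such group has >1 distinct value.
    Note that nunique() ignores NaN, so a NaN/value mix is not flagged.
    """
    # Keep only rows whose key appears more than once.
    dupes = df[df.duplicated(subset=key, keep=False)]
    if dupes.empty:
        return []
    # Distinct non-null values per column, within each key group.
    distinct_counts = dupes.groupby(key).nunique()
    # True for a column if any group has more than one distinct value in it.
    varies = (distinct_counts > 1).any()
    return list(varies[varies].index)


# Hypothetical toy rows shaped like the demographics data.
demographics = pd.DataFrame({
    "DBN": ["01M015", "01M015", "01M019", "01M019"],
    "schoolyear": [20052006, 20062007, 20052006, 20062007],
    "Name": ["School A", "School A", "School B", "School B"],
})
print(columns_varying_within_key(demographics))  # ['schoolyear']

# After choosing a column, keep a single value of it and re-check uniqueness.
filtered = demographics[demographics["schoolyear"] == 20062007]
print(filtered["DBN"].is_unique)  # True
```

If several columns vary within the same groups, as CORE COURSE and CORE SUBJECT do in class_size, all of them are listed, so pick one to filter on (or aggregate over) and run the check again until `is_unique` is `True`.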

Here is the reference link related to my question.

Could someone please explain this a bit? Much appreciated.
Thank you!

Screenshot of the class_size dataset:

Screenshot of the demographics dataset:

Screenshot of the graduation dataset:

Maybe you could post a screenshot or a direct link to your notebook; not everybody has access to the premium account.