Hi all! I have trouble in understanding the below code. I understood that we use str. contains() to select the rows we want from a column, but why we can returns all rows and columns by using this codemerged_national_accounts = merged[national_accounts] In the merged dataset we dont have columns called national_accounts, so why we put this inside[ ]? Thank you in advance.
Hi @candiceliu93, I am new to DQ so maybe I am not the best to answer this but I will try!
I think your confusion might be coming from picking up on a pattern and drawing a false conclusion: merged['SpecialNotes'] looks a lot like: merged[national_accounts] and 'SpecialNotes' is a column in merged…so that must mean national_accounts is a column too, right? Not necessarily…the fact that 'SpecialNotes' is inside '' and national_accounts is not, is a hint that they might not be the same kind of thing.
What happens when you try type(merged_accounts)…what kind of object is it? Like I said, I’m still very new to DQ and programming in general but I think merged_accounts is a list of boolean values (ie a bunch of True/False) which you then use as a mask on your dataset to select specific rows based on the T/F values inside merged_accounts.
Both merged['SpecialNotes'] and merged[national_accounts] are ways of selecting specific parts of your larger dataset (merged) but they go about it in very different ways; one selects an entire column whereas the other selects particular rows based on a boolean list.
Does this make sense? Does it help? I really hope so! If not, I’m sure a more experience DQ user will be along to help you out.