At work I’ve been tasked with finding the differences between two large data sets. Essentially, I need to run the two data sets against one another and return any rows that differ. Normally I’d solve this with a combination of VLOOKUP statements, but these data sets are too large to handle in Excel.
The data sets are pretty simple, just large: two columns, one labeled “ID” and the other labeled “Value,” for the sake of simplicity.
I was thinking this is a job for Python, and that I could probably solve it with some sort of nested list comprehension. Perhaps I could compare the two data sets and then throw each row that doesn’t match into its own list of lists?
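Something like this is what I have in mind — just a rough sketch with made-up sample data, since I haven’t written the real thing yet:

```python
# Hypothetical sample data: each data set as a list of (ID, Value) tuples.
data_a = [(1, "apple"), (2, "banana"), (3, "cherry")]
data_b = [(1, "apple"), (2, "blueberry"), (4, "date")]

# Rows that appear in one data set but not in the other.
only_in_a = [row for row in data_a if row not in data_b]
only_in_b = [row for row in data_b if row not in data_a]

differences = [only_in_a, only_in_b]  # the "list of lists" of mismatches
```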
I think I’ve seen this done in a Python lesson before; I just have a few questions.
Is this the best way to go about solving this problem? Is it efficient? I may have to do this again with other data sets.
Would a nested list comprehension compare every single row of data set A to every single row of data set B? These data sets may or may not be sorted, so I have to make sure I’m comparing every row. For example, it’s not enough to compare row 1 to row 1 and row 2 to row 2; I have to compare row 1 of data set A to every row of data set B.
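To make that last question concrete, here’s the kind of all-pairs comparison I mean, plus a variant that converts one side to a `set` first, which I’ve read makes each membership test roughly constant time (all names and sample rows are my own, not from real data):

```python
# Sample rows as (ID, Value) tuples, deliberately unsorted.
rows_a = [(3, "cherry"), (1, "apple"), (2, "banana")]
rows_b = [(2, "banana"), (4, "date"), (1, "apple")]

# Nested comparison: each row of A is checked against every row of B,
# regardless of the order either data set happens to be in.
only_in_a = [row_a for row_a in rows_a
             if not any(row_a == row_b for row_b in rows_b)]

# Possibly faster alternative: build a set of B's rows once, so each
# membership test doesn't have to scan all of B.
b_rows = set(rows_b)
only_in_a_fast = [row for row in rows_a if row not in b_rows]
```

Both versions should find the same mismatches; I’m mainly unsure which approach scales better for data sets this size.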
If this post isn’t appropriate for this forum, I understand. I thought this might be a good place to ask, though, because this is one of the first times I’ve applied what I’ve learned here to a real-world scenario.