How can I debug when the solution is a large table?

Screen Link:
https://app.dataquest.io/m/369/advanced-regular-expressions/8/extracting-domains-from-urls

My Code:

pattern = r"((?<=:\/\/)[^\?\/]+)"
test_urls_clean = pd.Series(test_urls.str.extract(pattern)[0])
domains = pd.Series(hn['url'].str.extract(pattern)[0], name='url')
top_domains = domains.value_counts().head(5)

What actually happened:
I can’t tell which rows of my solution are different from the correct solution. The difference is not clear. How can I see the whole dataframe of differences?

This is indeed difficult to do, and unfortunately, as far as I know, it is not something the courses here cover.

The only way that I have managed to figure such things out is by using their solution to our advantage.

For example, you can compute domains using their solution and, say, domains2 using your solution. Then you use basic boolean operations to find the indices where the two don’t match.

You can then use those indices to see where your regex fails for a particular URL.
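
Here is a minimal sketch of that comparison. The sample URLs are made up for illustration (in the exercise you would use hn['url']), and reference_pattern is only an assumed stand-in for the course solution, so substitute whatever the answer tab actually shows.

```python
import pandas as pd

# Hypothetical sample data; in the exercise this would be hn['url'].
urls = pd.Series([
    "https://www.amazon.com/dp/B00",
    "http://readthisthing.com#",
    "https://github.com/pandas-dev/pandas",
])

my_pattern = r"((?<=:\/\/)[^\?\/]+)"         # pattern from the question
reference_pattern = r"https?://([\w\-\.]+)"  # assumed stand-in for the course solution

# expand=False returns a Series because each pattern has one capture group.
domains = urls.str.extract(reference_pattern, expand=False)
domains2 = urls.str.extract(my_pattern, expand=False)

# Boolean mask of rows where the two results disagree.
# NaN != NaN is True, so rows where both are missing are excluded explicitly.
mismatch = domains.ne(domains2) & ~(domains.isna() & domains2.isna())

# Put the URL and both results side by side for the mismatching rows only.
diff = pd.DataFrame({
    "url": urls[mismatch],
    "expected": domains[mismatch],
    "mine": domains2[mismatch],
})

# to_string() prints every row instead of a truncated preview.
print(diff.to_string())
```

Because diff keeps the original index, you can look up any mismatching row in the full dataset afterwards.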

For example, yours is likely failing on the following URL: http://readthisthing.com#.

If you follow the suggested approach above, you will find that your solution results in readthisthing.com#, but the actual solution is meant to be readthisthing.com.
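
You can reproduce this on the single URL with the standard re module. The adjusted pattern below is just one possible tweak (an assumption on my part, not the course solution): it adds # to the characters the match stops at. It may still differ from the expected answer on other URLs.

```python
import re

url = "http://readthisthing.com#"

original = r"((?<=:\/\/)[^\?\/]+)"   # pattern from the question
adjusted = r"((?<=:\/\/)[^\?\/#]+)"  # assumption: also stop at '#'

# Both patterns match here, so .group(1) is safe for this URL.
original_result = re.search(original, url).group(1)
adjusted_result = re.search(adjusted, url).group(1)

print(original_result)  # readthisthing.com#
print(adjusted_result)  # readthisthing.com
```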

Thank you @the_doctor. That helped me through exercise 8, but now I have a similar issue in exercise 9, and I can’t see my mistake.