I am trying to build my skills by applying what I am learning here in Dataquest to personal projects. I want to read in two txt files, one of IPv4 addresses and the other of domains. I would like to create a graph that shows the correlation between the two files. Is this possible, or do I have to convert the addresses to strings?
Would it be easier to convert the text docs to a csv and read them in with Pandas? Any help or pointers would be greatly appreciated.
Bruno,
Thank you for replying. I may be getting ahead of myself. I would like to read in a text file of domains (domains.txt) and create a bar graph of how many times each domain appears. I understand that unique() or value_counts() would give me this information, but I would like to attempt this on data I have collected myself.
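A minimal sketch of that first step, assuming domains.txt holds one domain per line. The function names `count_domains` and `plot_top_domains` are made up for illustration, and `Counter` from the standard library stands in for what value_counts() does on a Pandas Series. The plotting part assumes matplotlib is installed.

```python
from collections import Counter
from pathlib import Path


def count_domains(path):
    """Count how often each domain appears in a one-domain-per-line text file."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    # Normalise case and skip blank lines so "Google.com" and "google.com" match.
    domains = [line.strip().lower() for line in lines if line.strip()]
    return Counter(domains)


def plot_top_domains(counts, top_n=20):
    """Draw a bar chart of the most frequent domains (requires matplotlib)."""
    import matplotlib.pyplot as plt  # imported here so counting works without it

    if not counts:
        raise ValueError("no domains to plot")
    # most_common() returns (domain, count) pairs sorted by count, largest first.
    labels, values = zip(*counts.most_common(top_n))
    fig, ax = plt.subplots()
    ax.bar(labels, values)
    ax.set_xlabel("Domain")
    ax.set_ylabel("Occurrences")
    ax.tick_params(axis="x", rotation=90)
    fig.tight_layout()
    return fig
```

Calling `plot_top_domains(count_domains("domains.txt"))` and then `matplotlib.pyplot.show()` should display the chart. Limiting the plot to the top 20 domains is an arbitrary choice to keep the axis labels readable.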
Down the road, I would like to take a CSV that contains domains (e.g. google.com) and IPv4 addresses (e.g. 8.8.8.8) and create a graph that shows the relationships between the two.
I hope this is clearer. Again, thank you for replying.
It’s not clear to me what it is that you’re having trouble with. I see two possible implied questions, but I can’t tell whether either of them is the question you want to ask.
One question is “Programmatically, what steps, libraries and functions do I use to accomplish this?”; the other is “What is a good way to visualize this data?”
To both questions I want to reply that it depends on the data. IPv4 addresses should be treated as categorical variables. How many of them are there per domain? There could be just a few, but most likely there are hundreds or thousands, I would say. In the latter case, visualizing them “directly” isn’t really an option.
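One way to answer the "how many per domain" question before choosing a plot is sketched below. It assumes a CSV with header columns named `domain` and `ip`; those column names and the function names are hypothetical. The standard-library `ipaddress` module parses each address, so the addresses do not have to stay as plain strings, and rows that are not valid IPv4 addresses are skipped.

```python
import csv
import ipaddress
from collections import defaultdict


def ips_per_domain(csv_path, domain_col="domain", ip_col="ip"):
    """Map each domain to the set of distinct IPv4 addresses seen with it."""
    mapping = defaultdict(set)
    with open(csv_path, newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle):
            domain = (row.get(domain_col) or "").strip().lower()
            if not domain:
                continue
            try:
                # Parsing validates the address; invalid rows are skipped.
                address = ipaddress.IPv4Address((row.get(ip_col) or "").strip())
            except ValueError:
                continue
            mapping[domain].add(address)
    return mapping


def summarize(mapping):
    """Return (domain, distinct IP count) pairs, largest count first."""
    return sorted(
        ((domain, len(ips)) for domain, ips in mapping.items()),
        key=lambda pair: pair[1],
        reverse=True,
    )
```

If the counts turn out small, a bar chart of `summarize(...)` may be enough. If they run into the hundreds or thousands, plotting the count per domain rather than each individual address is probably more readable.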
It’s hard to help without seeing the data. And I personally haven’t understood where your difficulties lie.