I need some guidance/help to explore the White House Visitors data.
One of the questions in the optional practice in the Dates and Times mission was to try to explain why the minimum and maximum appointment lengths were longer than we might expect. My guess is that because some visitors come from overseas, they stay longer in the residential areas of the White House. To test this, I was thinking of taking the top 10 percent of appointment lengths (those at or above the 90th percentile) and building a frequency table of where those visits took place. If most of them are in residential areas, that would support my hypothesis.
I'd welcome feedback on my proposed workflow or suggestions for alternative analyses. I'd also like to hear people's thoughts on why the minimum length is longer than we might expect.
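A minimal pandas sketch of that workflow might look like the following. It assumes the data has appointment start/end columns named APPT_START_DATE and APPT_END_DATE and a location column named MEETING_LOC; those column names, the helper tail_location_counts, and the small toy DataFrame are all hypothetical, so check them against the actual dataset before relying on this.

```python
import pandas as pd


def tail_location_counts(df, start_col="APPT_START_DATE", end_col="APPT_END_DATE",
                         loc_col="MEETING_LOC", q=0.9, upper=True):
    """Count meeting locations for appointments in one tail of the length distribution.

    Hypothetical helper: the default column names are assumptions about the
    visitor data, so adjust them to match the file you loaded.
    upper=True keeps appointments at or above the q-th quantile of length
    (q=0.9 -> roughly the longest 10%); upper=False keeps those at or below it.
    """
    # Unparseable or missing dates become NaT instead of raising an error.
    start = pd.to_datetime(df[start_col], errors="coerce")
    end = pd.to_datetime(df[end_col], errors="coerce")
    # Appointment length in hours; NaT rows give NaN, which quantile() skips
    # and which never satisfy the comparisons below.
    length_hours = (end - start).dt.total_seconds() / 3600
    cutoff = length_hours.quantile(q)
    mask = (length_hours >= cutoff) if upper else (length_hours <= cutoff)
    # Frequency table of locations for the selected appointments.
    return df.loc[mask, loc_col].value_counts()


# Made-up toy data for illustration only (not real visitor records).
toy = pd.DataFrame({
    "APPT_START_DATE": ["2015-01-01 09:00"] * 10,
    "APPT_END_DATE": [
        "2015-01-01 10:00", "2015-01-01 10:00", "2015-01-01 11:00",
        "2015-01-01 11:00", "2015-01-01 12:00", "2015-01-01 12:00",
        "2015-01-01 13:00", "2015-01-01 14:00", "2015-01-03 09:00",
        "2015-01-04 09:00",
    ],
    "MEETING_LOC": ["OEOB", "OEOB", "WH", "OEOB", "WH",
                    "OEOB", "WH", "OEOB", "RESIDENCE", "RESIDENCE"],
})

print(tail_location_counts(toy))                      # longest ~10% of appointments
print(tail_location_counts(toy, q=0.1, upper=False))  # shortest ~10% of appointments
```

Calling it with q=0.1 and upper=False builds the same kind of table for the shortest appointments, which may also help with the question about the minimum length.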
I think your plan for testing your hypothesis is solid.
One thing I'd like to draw your attention to is that data isn't the only way to validate a hypothesis. You can also do background research to inform it, and then use the data to tie things together. This is an especially useful skill if you find yourself working in an unfamiliar vertical.
For instance, you might find yourself working as a data scientist on a marketing team. If you don't have marketing experience, there will be times when the answers you seek can't be found solely in the data; you might need to learn more about the subject matter and then circle back to the data to see whether it confirms what you learned.
In this case, there may be resources online that tell you whether it's routine for overseas visitors to stay in the White House residence.
I hope this helps you understand some of the possibilities in answering this question.