day_ymd=day.set_index(['year','month','day']) holidays_midnight_ymd=holidays_midnight.set_index(['year','month','day']) holiday_join=day_ymd.join(holidays_midnight_ymd, how = 'left', lsuffix='_day',rsuffix='_night')
What I expected to happen:
I expected there to be a list of holidays similar to this list
night['holiday'].unique() array(['None', 'Columbus Day', 'Veterans Day', 'Thanksgiving Day', 'Christmas Day', 'New Years Day', 'Washingtons Birthday', 'Memorial Day', 'Independence Day', 'State Fair', 'Labor Day', 'Martin Luther King Jr Day'], dtype=object)
in the holiday_night column afterward, rather than entirely NaN’s, and I expected there to be about 24*(5 to 7) values for each holiday since previously each holiday had around 5 to 7 counts, all at night. I also expected there to be some NaNs in the column, since the holidays table only contained days on which there was a holiday, and the days table also contained days on which there was not a holiday. I was planning to make a further cut after that point for days on which there was a holiday, and do some analysis about the weather, traffic, time of day, and other factors. Clearly holidays do not occur only at night.
What actually happened:
holiday_join['holiday_day'].unique() array(['None'], dtype=object) holiday_join['holiday_night'].unique() array([nan], dtype=object)
I’m working on the I-94 project, with some additional exploration after completing the certificate, to make it ready for community review. It will be my first project I submit for community review. I’m trying to analyze the day/night holiday patterns. Only the holidays are only listed at midnight in this data set. I could probably split it by day and night using a for loop, or just consider the combined data set, and I may do that if I can’t get this to work. Either would be an okay approach.
Right now I have a dataframe with all daytime data with an index given by year, month, and day (numbers), which is called day_ymd. I also have a dataframe with all holiday data, where the holiday data is only given at midnight, in a dataframe called holidays_midnight_ymd. Again, the index is a three layer index with year, month, day in that order. The day_ymd data frame has many entries for every year,month,day set because there are many hours in a day. The holidays_midnight_ymd dataframe has unique indices.
I am doing a left join from day_ymd on holidays_midnight_ymd because I want there to be an entry for every hour in the day corresponding to a holiday, if one exists. I set the left suffix to “_day” and the right suffix to “_night”, because the left data is drawn from daytime data and the right data is drawn from midnight data.
I expected the columns with the suffix of “_night” to have non nan values. In particular, if the index in the holiday_join table is 12,11,22 for Thanskgiving, I expected all 24 entries associated with the 24 hours of that day to have data that could be matched in the holidays_midnight_ymd table and that would appear in the holiday_night column, because the key 12,11,22 matches the key for Thanksgiving for all 24 hours of the day in the day_ymd table and it’s a left join from that table. Instead I got NaN’s for all columns in the right table. I’m not sure why. That definitely shouldn’t have happened.
Can anyone explain what I did wrong, whether it was programming or conceptual?