On page 11/13 (frequency tables and continuous variables) of Frequency distributions there is this quote:
“A similar reasoning applies when we read grouped frequency tables. If we had an interval of (180, 190] for a continuous variable, 180 and 190 are not the real limits. Instead, the real limits are given by the interval (179.5, 190.5], with 179.5 being the lower real limit of 180, and 190.5 the upper real limit of 190.”
Does this mean that pandas, when using value_counts, treats the real limits of an interval as 179.5 (instead of 180) and 190.5 (instead of 190), to use the same numbers as the quote?
I’m not sure I follow here – any chance you can give an example?
Carolina, I think the confusion here is around grouped frequency tables. Basically, because measurements of a continuous variable are always rounded to some extent, they’re saying that anything above 179.5 is considered 180 (for the purpose of this specific group).
Those groups apply only to a grouped frequency table. With value_counts, a grouped frequency table counts every value that falls inside each interval, while a regular frequency table counts each distinct value on its own.
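Here is a small sketch of the difference, in case it helps. The heights and bin edges are made up for illustration, and I'm building the groups with pd.cut rather than whatever the course uses. As far as I can tell, pandas bins on the literal edges you give it, so (180, 190] really means greater than 180 and up to 190. The real limits are a way of reading the table, not something pandas applies for you.

```python
import pandas as pd

# Made-up heights for illustration (not from the course dataset)
heights = pd.Series([179.6, 180.0, 180.4, 185.0, 190.0, 190.4])

# Grouped frequency table: right-closed intervals (170, 180], (180, 190], (190, 200]
groups = pd.cut(heights, bins=[170, 180, 190, 200])
table = groups.value_counts(sort=False)  # sort=False keeps the intervals in order
print(table)

# pandas uses the literal edges: 179.6 and 180.0 land in (170, 180],
# and 190.4 lands in (190, 200]
literal_counts = table.tolist()

# Counting by hand with the "real limits" reading of (180, 190], i.e. (179.5, 190.5]
within_real_limits = int(((heights > 179.5) & (heights <= 190.5)).sum())
print(within_real_limits)
```

So if you wanted the counts to follow the real limits, you would have to pass 179.5 and 190.5 as the bin edges yourself.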
Hopefully this makes sense. I’m no data scientist (yet)!
Yes, now that I read it again it makes sense. I really didn’t catch that it was just for frequency tables.