
Inefficient code to count values in the intervals

I am referring to 307.5 “The mode - Special cases” and 285.10 “Frequency distributions - Readability for grouped frequency tables”.

The code provided uses explicit loops over values and intervals, although the same result can be achieved with pandas.cut() and pandas.Series.value_counts().

The example is from 307:

intervals = pd.interval_range(start = 0, end = 800000, freq = 100000)
gr_freq_table = pd.Series([0,0,0,0,0,0,0,0], index = intervals)

for value in houses['SalePrice']:
    for interval in intervals:
        if value in interval:
            gr_freq_table.loc[interval] += 1
            break

print(gr_freq_table)

is the same as

intervals = pd.interval_range(start = 0, end = 800000, freq = 100000)
print(pd.cut(houses['SalePrice'], bins=intervals).value_counts().sort_index())
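
To check the equivalence without the course dataset, here is a minimal sketch. The `sale_prices` values are made up for illustration and only stand in for houses['SalePrice']; everything else follows the two snippets above.

```python
import pandas as pd

# Hypothetical stand-in for houses['SalePrice'] (values invented for this sketch)
sale_prices = pd.Series([34900, 105000, 129500, 187500, 215000, 299800, 410000, 755000])

# Eight right-closed intervals: (0, 100000], (100000, 200000], ..., (700000, 800000]
intervals = pd.interval_range(start=0, end=800000, freq=100000)

# Version 1: explicit loops, as in the lesson
loop_counts = pd.Series([0] * len(intervals), index=intervals)
for value in sale_prices:
    for interval in intervals:
        if value in interval:
            loop_counts.loc[interval] += 1
            break

# Version 2: pd.cut() labels each value with its interval, value_counts() tallies the labels
cut_counts = pd.cut(sale_prices, bins=intervals).value_counts().sort_index()

# Compare the counts position by position
same = (loop_counts.to_numpy() == cut_counts.to_numpy()).all()
print(loop_counts)
print(cut_counts)
print(same)
```

Both versions keep intervals with zero values, because pd.cut() returns a categorical Series and value_counts() reports every category, including empty ones.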

Hi @pavel.s.sokolov, when we teach, we try to keep the level of abstraction as low as possible, and using pd.cut() adds an extra layer of abstraction. A lower level of abstraction makes the content easier to understand, so we usually prioritize simpler code over shorter or more efficient code that may be harder to follow because of the extra layers.

However, thanks for your feedback, we’ll try to see if we can find a place in that lesson to include the pd.cut() version too.


Hi @alex,
I understand your point about taking smaller steps in learning. However, I think that layers of abstraction provide a more useful mental model, because they hide unimportant details.

For example, consider the student’s learning path: by the time one gets to this module, s/he has already had to grasp the principles of 1) the “split-apply-combine” sequence (a whole module is devoted to thoroughly explaining DataFrame.groupby(), and SQL follows the same logic) and 2) “avoid explicit loops in numpy/pandas” (SQL hides loops completely). Actually, I found this solution by chance: the “loop approach” just felt foreign to me, so I started investigating.

pd.cut() is analogous in its behaviour to DataFrame.groupby(): it assigns each value to a bin much as groupby() assigns each row to a group. I do not think this is too wide a step. Of course, this hypothesis should be tested on real students.
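
To illustrate the analogy, here is a rough sketch that feeds the bins from pd.cut() straight into groupby(). The `sale_prices` values are invented for illustration and are not the course data; `observed=False` is passed so that empty intervals are kept in the result.

```python
import pandas as pd

# Hypothetical stand-in for houses['SalePrice'] (values invented for this sketch)
sale_prices = pd.Series(
    [34900, 105000, 129500, 187500, 215000, 299800, 410000, 755000],
    name="SalePrice",
)
intervals = pd.interval_range(start=0, end=800000, freq=100000)

# Split: label every value with the interval it falls into
bins = pd.cut(sale_prices, bins=intervals)

# Apply + combine: count the values in each interval, keeping empty intervals
grouped_counts = sale_prices.groupby(bins, observed=False).count()

# The value_counts() route, for comparison
value_counts = bins.value_counts().sort_index()

print(grouped_counts)
print(grouped_counts.tolist() == value_counts.tolist())
```

Seen this way, the grouped frequency table is a split-apply-combine operation where pd.cut() supplies the grouping key.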


Thanks @pavel.s.sokolov, you make some good points. We’ll see if we can incorporate pd.cut() into that lesson.