
Inefficient code for counting values in intervals

I am referring to 307.5 “The mode - Special cases” and 285.10 “Frequency distributions - Readability for grouped frequency tables”.

The code provided uses explicit loops over both the values and the intervals. However, the same result can be achieved with `pandas.cut()` and `pandas.Series.value_counts()`.

The example is from 307:

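```python
import pandas as pd

# houses is the DataFrame of house sales loaded earlier in the lesson
intervals = pd.interval_range(start=0, end=800000, freq=100000)
gr_freq_table = pd.Series(0, index=intervals)  # one zero counter per interval

# For every price, find the interval that contains it and increment its counter
for value in houses['SalePrice']:
    for interval in intervals:
        if value in interval:
            gr_freq_table.loc[interval] += 1
            break

print(gr_freq_table)
```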

is equivalent to:

```python
intervals = pd.interval_range(start=0, end=800000, freq=100000)
print(pd.cut(houses['SalePrice'], bins=intervals).value_counts().sort_index())
```
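To check the equivalence of the two snippets without the lesson's `houses` dataset, here is a minimal sketch that runs both approaches on randomly generated prices. The `sale_price` Series is a hypothetical stand-in for `houses['SalePrice']`, not real data; all generated values fall inside the (0, 800000] range covered by the intervals.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for houses['SalePrice']: random prices, not the real dataset
rng = np.random.default_rng(0)
sale_price = pd.Series(rng.integers(1, 800_000, size=1_000), name="SalePrice")

intervals = pd.interval_range(start=0, end=800_000, freq=100_000)

# Loop-based version, as in the lesson
loop_counts = pd.Series(0, index=intervals)
for value in sale_price:
    for interval in intervals:
        if value in interval:
            loop_counts.loc[interval] += 1
            break

# Vectorized version: pd.cut labels each value with its interval,
# value_counts tallies the labels, sort_index restores interval order
cut_counts = pd.cut(sale_price, bins=intervals).value_counts().sort_index()

# Compare only the counts; the index types and Series names can differ
print((loop_counts.to_numpy() == cut_counts.to_numpy()).all())
```

Because `pd.cut()` returns a categorical, `value_counts()` also reports intervals with zero matches, so both results have one entry per interval.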

Hi @pavel.s.sokolov, when we teach, we try to keep the level of abstraction as low as possible, and using `pd.cut()` adds an extra layer of abstraction. A lower level of abstraction makes the content easier to understand, so we usually prioritize simpler code over shorter, more efficient code that may be harder to follow because of the extra layers.

Thanks for your feedback, though; we'll see if we can find a place in that lesson to include the `pd.cut()` version too.


Hi @alex,
I understand your point about learning in small steps. However, I think that layers of abstraction provide a more useful mental model: they hide unimportant details.

For example, consider the student's learning path. By the time students reach this module, they have already had to grasp the principles of 1) the "split-apply-combine" sequence (a whole module is devoted to explaining `pd.groupby` thoroughly, and SQL follows the same logic) and 2) "avoid explicit loops in numpy/pandas" (SQL hides loops completely). In fact, I found this solution by chance: the loop approach just felt foreign to me, so I started investigating.

`pd.cut()` is analogous in its behaviour to `pd.groupby()`. I do not think this is too wide a step. Of course, this hypothesis would need to be tested with real students.
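The analogy can be made concrete with a minimal sketch: the intervals assigned by `pd.cut()` act as group labels, and grouping by them and counting gives the same frequency table as `value_counts()`. The `prices` values below are hypothetical, chosen only for illustration; `observed=False` keeps empty intervals in the result.

```python
import pandas as pd

# Hypothetical sample prices, for illustration only
prices = pd.Series([50_000, 120_000, 180_000, 250_000, 260_000, 720_000])
intervals = pd.interval_range(start=0, end=800_000, freq=100_000)

# Split: pd.cut labels each price with the interval it falls into
bins = pd.cut(prices, bins=intervals)

# Apply + combine: group by those labels and count the members of each group
counts = prices.groupby(bins, observed=False).size()
print(counts)

# Same counts as the value_counts() approach
print(counts.tolist() == bins.value_counts().sort_index().tolist())
```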


Thanks @pavel.s.sokolov, you make some good points. We'll see if we can incorporate `pd.cut()` into that lesson.