Frequency Distributions 10. Readability for Grouped Frequency Tables Alternate Answer

Alternate solution for:

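import pandas as pd
from itertools import product

wnba = pd.read_csv('wnba.csv')

bins = 10

def interval_wbins(start, end, bins):
    # Split start..end into `bins` equal-width intervals (right-closed by default).
    return pd.interval_range(start=start, end=end, freq=(end - start) / bins)

intervals = interval_wbins(0, 600, bins)

# Start every interval's count at zero.
gr_freq_table_10 = pd.Series(0, index=intervals)

def if_in_interval(value, interval):
    # 1 if the value falls inside the interval, otherwise 0.
    return 1 if value in interval else 0

# Check every (value, interval) pair and add the result to that interval's count.
for value, interval in product(wnba['PTS'], gr_freq_table_10.index):
    gr_freq_table_10.loc[interval] += if_in_interval(value, interval)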

In this case, product() creates an iterable of every possible (value, interval) pair. The if_in_interval() function checks whether the value is in the interval: it returns 1 if it is and 0 if it isn’t. Either way, the result is added to that interval’s count in the Series on every iteration.
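To make the pairing concrete, here is a small sketch using only the standard library. The values and the (left, right) tuples are made up for illustration and stand in for the PTS column and the intervals; they are not taken from wnba.csv.

```python
from itertools import product

# Made-up stand-ins for the PTS values and the interval bounds.
values = [5, 70, 130]
bounds = [(0, 60), (60, 120), (120, 180)]

# product() yields every (value, bound) pair, values varying slowest.
pairs = list(product(values, bounds))

# The same pairs written as the nested loops that product() replaces.
nested = [(v, b) for v in values for b in bounds]

print(len(pairs))       # number of pairs = len(values) * len(bounds)
print(pairs == nested)  # both approaches visit the pairs in the same order
```

So every value is checked against every interval, which is simple but means the loop does len(values) * len(bounds) checks.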

product() is there to get rid of the nested for-loops, and the user-defined function if_in_interval() is an alternative to breaking out of a for-loop. I think you could skip the function if you wanted to, but you would have to make the line a lot longer or break it up.
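As a rough sketch of what skipping the function could look like: the membership test can be written inline and converted to 0 or 1 with int(). The points Series below is made-up data standing in for wnba['PTS'], and the name table is just for this example.

```python
import pandas as pd
from itertools import product

# Made-up points totals standing in for wnba['PTS'].
points = pd.Series([5, 60, 61, 250, 599])

# Ten right-closed intervals: (0, 60], (60, 120], ..., (540, 600].
intervals = pd.interval_range(start=0, end=600, freq=60)
table = pd.Series(0, index=intervals)

for value, interval in product(points, table.index):
    # `value in interval` is True or False; int() turns that into 1 or 0.
    table.loc[interval] += int(value in interval)

print(table)
```

Because the intervals are closed on the right, 60 lands in the first interval and 61 in the second.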

The interval_wbins() function is entirely unnecessary. I just made it in case I wanted to try a different number of “bins” for making the intervals.
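For example, trying a different number of bins might look like the sketch below. The periods= comparison is my own addition: pd.interval_range also accepts periods, which should give the same edges here without computing the width by hand.

```python
import pandas as pd

def interval_wbins(start, end, bins):
    # Same helper as in the post: equal-width intervals over start..end.
    return pd.interval_range(start=start, end=end, freq=(end - start) / bins)

five = interval_wbins(0, 600, 5)
twenty = interval_wbins(0, 600, 20)

# Alternative (not from the post): let pandas work out the width from periods.
five_periods = pd.interval_range(start=0, end=600, periods=5)

print(len(five), len(twenty))
print(five.left.tolist(), five_periods.left.tolist())
```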

I initially thought that I could create the entire Series gr_freq_table_10 in one go with comprehensions or for-loops, but it didn’t work out. I like creating entire DataFrames out of comprehensions, but I couldn’t figure out a simple way to add up all the occurrences of the values in relation to the chosen intervals without using the method above. Either way, I think it looks a lot nicer than nested for-loops and breaking.
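For what it's worth, here are two possible ways to build the Series in one go, offered as a sketch rather than the "right" answer. The first is a plain comprehension that counts matches per interval; the second hands the intervals to pd.cut() and counts the resulting categories. The points Series is made-up data standing in for wnba['PTS'], and the names table_comp and table_cut are just for this example.

```python
import pandas as pd

# Made-up points totals standing in for wnba['PTS'].
points = pd.Series([5, 60, 61, 250, 599])
intervals = pd.interval_range(start=0, end=600, freq=60)

# Option 1: one comprehension, counting the values that fall in each interval.
table_comp = pd.Series(
    [sum(value in interval for value in points) for interval in intervals],
    index=intervals,
)

# Option 2: pd.cut() assigns each value to its interval; value_counts()
# then counts per interval, and sort_index() restores the interval order.
table_cut = pd.cut(points, bins=intervals).value_counts().sort_index()

print(table_comp)
print(table_cut)
```

Both give the same counts on this sample. The pd.cut() version avoids checking every value against every interval in Python, so it should scale better on larger columns.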
