Alternate solution for:
from itertools import product
import pandas as pd

wnba = pd.read_csv('wnba.csv')
bins = 10

def interval_wbins(start, end, bins):
    # Split the range from start to end into `bins` equal-width intervals
    return pd.interval_range(start=start, end=end, freq=(end - start) / bins)

intervals = interval_wbins(0, 600, bins)
gr_freq_table_10 = pd.Series([0 for i in range(bins)], index=intervals)

def if_in_interval(value, interval):
    # 1 if the value falls inside the interval, otherwise 0
    return 1 if value in interval else 0

for value, interval in product(wnba['PTS'], gr_freq_table_10.index):
    gr_freq_table_10.loc[interval] += if_in_interval(value, interval)
In this case, product() creates an iterable of every possible (value, interval) pair. The if_in_interval() function just checks whether the value is in the interval: if it is, it returns 1, and if it isn’t, it returns 0. Either way, the output is added to that interval’s count in the Series on every iteration over the product.
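To make the pairing concrete, here is a minimal sketch on toy data. The `points` list and the two small intervals are made up for illustration and stand in for wnba['PTS'] and the real bins.

```python
from itertools import product

import pandas as pd

# Hypothetical toy values standing in for wnba['PTS']
points = [5, 15, 25]
# Two intervals, closed on the right by default: (0, 10] and (10, 20]
intervals = pd.interval_range(start=0, end=20, freq=10)

# product() pairs every value with every interval: 3 values x 2 intervals
pairs = list(product(points, intervals))

# Keep only the pairs where the value actually falls inside the interval
hits = [(value, interval) for value, interval in pairs if value in interval]
```

Only the pairs in `hits` would add 1 to a count. Every other pair adds 0, which is why no break is needed.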
I used product() to get rid of the nested for-loops, and the user-defined function if_in_interval() is an alternative to breaking out of a for-loop. I think you could skip the function if you wanted to, but you would have to make the line a lot longer or break it up.
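One possible way to skip the helper is to convert the result of `value in interval` with int(), since True becomes 1 and False becomes 0. This is only a sketch on made-up data (the `points` values and the `freq_table` name are hypothetical), not the solution posted above.

```python
from itertools import product

import pandas as pd

# Hypothetical toy values standing in for wnba['PTS']
points = [5, 15, 18, 25]
intervals = pd.interval_range(start=0, end=30, freq=10)
freq_table = pd.Series(0, index=intervals)

for value, interval in product(points, freq_table.index):
    # int(True) is 1 and int(False) is 0, so no helper function is needed
    freq_table.loc[interval] += int(value in interval)
```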
The interval_wbins() function is entirely unnecessary. I just made it in case I wanted to try a different number of “bins” for making the intervals.
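For what it’s worth, pd.interval_range() also accepts a `periods` argument, so the same number of equal-width intervals can probably be requested directly without a wrapper. A small sketch, assuming the same 0 to 600 range:

```python
import pandas as pd

bins = 10
# Ask for `bins` equal-width intervals between 0 and 600 instead of computing freq
intervals = pd.interval_range(start=0, end=600, periods=bins)
```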
I initially thought that I could create the entire Series gr_freq_table_10 in one go with comprehensions or for-loops, but it didn’t work out. I like creating entire DataFrames out of comprehensions, but I couldn’t figure out a simple way to add up all the occurrences of the values in relation to the chosen intervals without using the method above. Either way, I think it looks a lot nicer than nested for-loops and breaking.
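For anyone curious, here is a sketch of two ways the whole Series might be built in one expression. The first is a comprehension that sums the membership checks per interval, and the second swaps in pd.cut() with value_counts(), which is a different technique from the loop above. The `points` data and variable names are made up for illustration.

```python
import pandas as pd

# Hypothetical toy values standing in for wnba['PTS']; 40 falls outside every bin
points = pd.Series([5, 15, 18, 25, 40])
intervals = pd.interval_range(start=0, end=30, freq=10)

# Comprehension: count, for each interval, how many values fall inside it
by_comprehension = pd.Series(
    [sum(value in interval for value in points) for interval in intervals],
    index=intervals,
)

# pd.cut() assigns each value to its interval (NaN if outside all of them),
# and value_counts() tallies the assignments while skipping the NaN
by_cut = pd.cut(points, bins=intervals).value_counts().sort_index()
```

The comprehension still checks every value against every interval, so it does the same amount of work as the product() loop. It just hides the loop inside sum().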