The mission on Frequency Distributions has an example of categorizing a numeric column by adding a new column of labels:

```
def make_pts_ordinal(row):
    if row['PTS'] <= 20:
        return 'very few points'
    if (20 < row['PTS'] <= 80):
        return 'few points'
    if (80 < row['PTS'] <= 150):
        return 'many, but below average'
    if (150 < row['PTS'] <= 300):
        return 'average number of points'
    if (300 < row['PTS'] <= 450):
        return 'more than average'
    else:
        return 'much more than average'
```

I was looking for something more Pythonic. You want to be able to say

```
df[ newColName ] = label( df[ colName ] , markers, labels ) # where, in this case,
# markers would be [20, 80, 150, 300, 450] and
# labels would be ['very few points', 'few points', ... etc ]
```

That’s what follows. If you know of something in one of the standard libraries that does this, please share; I’d any day prefer something that already exists. First, a function that returns the interval number:

```
import math

def interval(num_list, num):
    """ sorted list of numbers, number --> int (how many markers are <= number) """
    # [5,10,15], 5 --> 1
    # [5,10,15], 4 --> 0
    # [5,10,15], 15 --> 3
    # [], x --> ValueError
    # NaN never compares equal to anything, so math.isnan is the only check needed
    if math.isnan(num):
        return None
    if len(num_list) == 0:
        raise ValueError('Input list cannot be empty')
    # Binary search over the half-open range [lo, hi)
    lo, hi = 0, len(num_list)
    while lo < hi:
        mid = (lo + hi) // 2
        if num < num_list[mid]:
            hi = mid        # the answer is at or to the left of mid
        else:
            lo = mid + 1    # num_list[mid] <= num, so the answer is to the right
    # A value equal to a marker lands in the interval above that marker
    return lo
```
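On the standard-library question: the `bisect` module looks like it already covers this. As far as I can tell, `bisect.bisect_right` gives the same numbers as the examples in the comments above (`[5,10,15], 5 --> 1`). The wrapper name `interval_bisect` is made up for this sketch; it only adds the NaN and empty-list handling on top.

```python
import bisect
import math

def interval_bisect(markers, num):
    """Hypothetical stand-in for interval(): how many markers are <= num."""
    if math.isnan(num):
        return None
    if len(markers) == 0:
        raise ValueError('Input list cannot be empty')
    # bisect_right returns the insertion point to the right of any equal entries,
    # which for a sorted list is the count of markers <= num
    return bisect.bisect_right(markers, num)

print(interval_bisect([5, 10, 15], 5))    # 1
print(interval_bisect([5, 10, 15], 4))    # 0
print(interval_bisect([5, 10, 15], 15))   # 3

# bisect_left keeps a value equal to a marker in the lower interval instead,
# which is the `<=` convention make_pts_ordinal uses (20 is 'very few points')
print(bisect.bisect_left([20, 80, 150, 300, 450], 20))   # 0
```

Which of the two to use depends on where a value that sits exactly on a marker should go.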

And now a labeler that takes a pd.Series, a marker list and a label list, and returns a series of labels:

```
import numpy as np

def label(series, markers, labels):
    """ pd.Series numeric, list of numbers of len N, list of len N+1 --> pd.Series """
    # NaN entries get the placeholder 'invalid' instead of an interval label
    return series.apply(
        lambda x: labels[interval(markers, x)] if not np.isnan(x) else 'invalid'
    )
```
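For something that already exists, pandas' `pd.cut` seems to do the whole labeling step in one call. The sketch below is only one way to wire it up: `label_with_cut` and the sample values are made up for illustration, and the markers are padded with `-inf`/`+inf` so that N markers produce N+1 bins.

```python
import numpy as np
import pandas as pd

def label_with_cut(series, markers, labels):
    """Hypothetical pd.cut-based variant of label(); NaN stays NaN."""
    # Pad with -inf/+inf so N markers give N+1 bins, one per label
    bins = [-np.inf] + list(markers) + [np.inf]
    # right=True closes each bin on the right: (20, 80] means 20 < x <= 80
    return pd.cut(series, bins=bins, labels=labels, right=True)

pts_labels = ['very few points', 'few points', 'many, but below average',
              'average number of points', 'more than average',
              'much more than average']
pts = pd.Series([20, 21, 450, 451, np.nan])   # made-up sample values
result = label_with_cut(pts, [20, 80, 150, 300, 450], pts_labels)
print(result)
```

Two differences from `label` are worth knowing: missing values come back as NaN rather than `'invalid'`, and with `right=True` a value equal to a marker falls in the lower bin, the same as the `<=` comparisons in `make_pts_ordinal`. The result is also an ordered categorical, so the labels keep their scale order.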

And then, for our specific case:

```
pts_order = ['very few points', 'few points', 'many, but below average',
             'average number of points', 'more than average', 'much more than average']
wnba['PTS_ordinal_scale'] = label(wnba['PTS'], [20, 80, 150, 300, 450], pts_order)
```

```
import matplotlib.pyplot as plt

# Bar chart of how many players fall under each label
wnba['PTS_ordinal_scale'].value_counts().plot.bar(rot=30)
plt.xticks(ha='right')
plt.show()
```
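One thing to watch with the plot: `value_counts()` sorts by frequency, so the bars will not follow the ordinal scale. A minimal sketch of putting the counts back in scale order with `reindex`, using a made-up series in place of `wnba['PTS_ordinal_scale']`:

```python
import pandas as pd

pts_order = ['very few points', 'few points', 'many, but below average',
             'average number of points', 'more than average',
             'much more than average']

# Made-up labels standing in for wnba['PTS_ordinal_scale']
sample = pd.Series(['few points', 'few points', 'very few points',
                    'more than average'])

# reindex puts the counts in scale order; labels that never occur become 0
counts = sample.value_counts().reindex(pts_order, fill_value=0)
print(counts)
```

Calling `counts.plot.bar(rot=30)` on the result should then draw the bars from 'very few points' up to 'much more than average'.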