Given a dataframe of aggregate data and a list of categories of interest, this function reports the top 5 entities (by default), ranked by the number of categories in which each one places within the top N (default N=5). "Aggregate" here means that each entity (e.g. a player) should have exactly one row. If an entity has multiple rows, say one per year, the report will not be meaningful.
Features:
- The function automatically figures out the “Name” column (the non-numeric column with the largest number of unique elements). You can always specify it explicitly instead.
- No cleaning is necessary for numeric columns that were read in as strings (because of outliers, etc.).
- Find stars or rogues: by default, it fetches entities with the highest values in the categories of interest, but the report_max flag can be set to False to fetch the “least bad” instead (for example, the best players going by turnovers would have the lowest values in the TO column).
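For anyone who wants the idea without opening the notebook, here is a minimal sketch of how the features above could fit together. It is not the notebook's actual code: apart from `report_max`, every name here (`guess_name_column`, `report_top_entities`, `top_n`, `n_report`, the 50% threshold, the toy data) is a hypothetical choice for illustration, and ties are broken alphabetically just to keep the output deterministic.

```python
import pandas as pd


def guess_name_column(df):
    """Guess the entity column: the non-numeric column with the most unique values.

    Assumption: a column counts as "numeric" if at least half of its values
    parse as numbers, so numeric columns stored as strings are skipped too.
    """
    candidates = []
    for col in df.columns:
        numeric_share = pd.to_numeric(df[col], errors="coerce").notna().mean()
        if numeric_share < 0.5:
            candidates.append(col)
    return max(candidates, key=lambda col: df[col].nunique())


def report_top_entities(df, categories, name_col=None, top_n=5,
                        n_report=5, report_max=True):
    """Rank entities by how many categories they place in the top `top_n` of."""
    df = df.reset_index(drop=True)  # one row per entity, unique integer index
    name_col = name_col or guess_name_column(df)
    counts = pd.Series(0, index=df.index)

    for cat in categories:
        # Coerce strings such as "n/a" to NaN; NaN never makes the top N.
        values = pd.to_numeric(df[cat], errors="coerce")
        best = values.nlargest(top_n) if report_max else values.nsmallest(top_n)
        counts.loc[best.index] += 1

    report = pd.DataFrame({name_col: df[name_col],
                           "categories_in_top_n": counts})
    report = report[report["categories_in_top_n"] > 0]
    # Most categories first; alphabetical order breaks ties.
    report = report.sort_values(["categories_in_top_n", name_col],
                                ascending=[False, True])
    return report.head(n_report).reset_index(drop=True)


# Made-up toy data (not the WNBA dataset), with messy string columns.
toy = pd.DataFrame({
    "Name": ["A", "B", "C", "D"],
    "Team": ["X", "X", "Y", "Y"],
    "PTS": ["10", "20", "30", "n/a"],
    "AST": [5, 1, 4, 3],
    "TO": ["3", "1", "2", "4"],
})

stars = report_top_entities(toy, ["PTS", "AST"], top_n=2)
fewest_turnovers = report_top_entities(toy, ["TO"], top_n=2, report_max=False)
print(stars)
print(fewest_turnovers)
```

Setting `report_max=False` simply swaps `nlargest` for `nsmallest`, which is all that is needed for "lower is better" columns such as TO.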
Any tips/comments welcome. This is not really a guided project. It uses the WNBA dataset, but I thought it might be of interest/use to someone at some point.
report_wnba_stars.ipynb (33.0 KB)