Find the stars in your (aggregate) dataset

Given a dataframe of aggregate data and a list of categories of interest, report the top 5 (default) entities, ranked by the number of categories in which they place within the top N (default N=5). "Aggregate" here means that each entity (e.g. a player) has exactly one row; if an entity has multiple rows, say one per year, the report will not be meaningful.
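The counting idea above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the function name `report_stars`, its parameters, and the toy data are all made up for the example, and ties at the cutoff are broken arbitrarily by `nlargest`.

```python
from collections import Counter

import pandas as pd


def report_stars(df, name_col, categories, n=5, top=5):
    """Hypothetical helper: count how often each entity is in a category's top n."""
    appearances = Counter()
    for cat in categories:
        # Names of the n entities with the highest values in this category
        leaders = df.nlargest(n, cat)[name_col]
        appearances.update(leaders)
    # Entities with the most top-n appearances come first
    return appearances.most_common(top)


# Toy data (made up), one row per entity
players = pd.DataFrame({
    "Name": ["A", "B", "C", "D"],
    "PTS": [30, 25, 10, 5],
    "REB": [12, 3, 11, 2],
    "AST": [9, 8, 1, 2],
})

result = report_stars(players, "Name", ["PTS", "REB", "AST"], n=2, top=2)
print(result)  # [('A', 3), ('B', 2)]
```

Here A is in the top 2 of all three categories and B in two of them, so they lead the report.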

Features:

  • The function automatically identifies the “Name” column (the non-numeric column with the largest number of unique values); it can also be specified explicitly.
  • No cleaning is necessary for numeric columns that were read in as strings (because of outliers, etc.).
  • Find stars or rogues: by default, it fetches the entities with the highest values in the categories of interest, but the report_max flag can be set to False to fetch the “least bad”, i.e. those with the lowest values (for example, the best players going by turnovers have the lowest values in the TO column).
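A rough sketch of how these three features might look in pandas. The helper names (`guess_name_column`, `coerce_numeric`, `leaders`) and the toy data are hypothetical, and the sketch assumes the categories of interest are excluded before guessing the name column, since they may still be stored as strings at that point; the notebook may well handle this differently.

```python
import pandas as pd


def guess_name_column(df, categories):
    """Hypothetical: non-numeric column (outside the categories) with most unique values."""
    candidates = df.drop(columns=categories).select_dtypes(exclude="number")
    return candidates.nunique().idxmax()


def coerce_numeric(df, columns):
    """Return a copy with the given columns made numeric; unparseable cells become NaN."""
    out = df.copy()
    for col in columns:
        out[col] = pd.to_numeric(out[col], errors="coerce")
    return out


def leaders(df, name_col, category, n=5, report_max=True):
    """Names of the n entities with the highest (or lowest) values in a category."""
    pick = df.nlargest if report_max else df.nsmallest
    return list(pick(n, category)[name_col])


# Toy data (made up): TO was read in as strings because of the "-" entry
raw = pd.DataFrame({
    "Name": ["A", "B", "C", "D"],
    "Team": ["X", "X", "Y", "Y"],
    "TO": ["12", "7", "-", "3"],
})

name_col = guess_name_column(raw, ["TO"])
clean = coerce_numeric(raw, ["TO"])
fewest_turnovers = leaders(clean, name_col, "TO", n=2, report_max=False)
print(name_col, fewest_turnovers)  # Name ['D', 'B']
```

With `report_max=False` the lowest turnover counts win, and the unparseable "-" entry becomes NaN and is not selected.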

Any tips/comments welcome. This is not really a guided project; it uses the WNBA dataset, but I thought it might be of interest/use to someone at some point.

report_wnba_stars.ipynb (33.0 KB)



FYI, to align an embedded image in your Jupyter notebook (an image inserted using Edit > Insert Image, so it no longer depends on an external file):

In edit mode, before any changes for alignment, the cell will look something like `![Capture.PNG](attachment:Capture.PNG)`, where Capture.PNG is the file that has been attached.

Change the [Capture.PNG] to something a CSS style block can refer to (meaningfully), e.g. `![aligned_image](attachment:Capture.PNG)`.

Then, in an earlier code cell (not markdown), you need a style block, for example via the `%%html` cell magic:

    %%html
    <style>
    img[alt=aligned_image] {
        float : left;
    }
    </style>

And that’s it…



Thanks a lot for this suggestion, it’s exactly what I was looking for recently!

Glad that helped 🙂 Took a while to find it because it was scattered over multiple Stack Overflow posts…
