Visualize all columns (vertical bars)

In case it proves useful: calling df_bar_all( df_name ) plots every column of a dataframe. I also have a question (see the picture below): if anyone knows how to reduce the precision of the x-axis labels, please share.

Numeric columns : 10 bins
Non-numeric cols : only plotted if they have fewer unique values than nmax, with one bar per unique value.

import pandas as pd
import matplotlib.pyplot as plt

def df_bar_all( df, nmax=15 ) :
    """dataframe,int --> nothing - just plots"""
    # nmax : if not a numeric column, then only plot if less than nmax unique values
    for col in df.columns :
        if pd.api.types.is_numeric_dtype( df[col] ) :
            (df[col].value_counts(bins = 10, normalize = True).sort_index() * 100 ).plot(kind='bar', title=col, rot=30)
            plt.xticks( ha='right')
            plt.show()
        elif len( df[col].unique() ) < nmax :
            labels = list( df[col].unique() )
            lbl_d = { label : i for i, label in enumerate( labels ) }  # build the label -> index map once, else you're looking up index within a loop - bad!
            (df[col].apply( lambda x : lbl_d[x] ).value_counts(
                bins=len( labels ),
                normalize=True).sort_index()*100).plot(kind='bar',rot=30 , title=col)
            plt.xticks( range( len( labels ) ), labels, ha='right' )
            plt.show()

Eg : (https://www.kaggle.com/jinxbe/wnba-player-stats-2017)

import pandas as pd
wnba = pd.read_csv('WNBA Stats.csv')
df_bar_all( wnba )

You should be able to use round() when defining those xticks to limit them to a specific number of decimal places. I think that should be one way to do it without much issue.
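A minimal sketch of that idea, assuming the numeric columns are binned with value_counts(bins=...) as in the function above. The index of the result holds pandas Interval objects, so one option is to rebuild the labels from each interval's rounded edges before plotting. The helper name binned_pct and its decimals parameter are made up for illustration, and the sample series is synthetic.

```python
import pandas as pd

def binned_pct(series, bins=10, decimals=1):
    """Percent of values per bin, with bin labels rounded to `decimals` places."""
    pct = series.value_counts(bins=bins, normalize=True).sort_index() * 100
    # pct.index is an IntervalIndex; replace it with short string labels
    # built from the rounded left and right edge of each interval.
    pct.index = [
        f"({iv.left:.{decimals}f}, {iv.right:.{decimals}f}]" for iv in pct.index
    ]
    return pct

# Synthetic example data (not from the WNBA file)
pct = binned_pct(pd.Series(range(101)), bins=4, decimals=1)
print(pct)
```

The returned Series can then be plotted the same way as before, e.g. pct.plot(kind='bar', rot=30). Note that a very small decimals value can make two neighbouring bins show the same label.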


My apologies - the case of non-numeric columns was much-ado-over-nothing. Turns out Series.plot is much smarter than I thought and does labels by default. So, you just have to do

(df[col].value_counts(normalize=True).sort_index()*100).plot(kind='bar',rot=30,title=col)
plt.xticks( ha='right')

So this is all the function needs to be :

def df_bar_all( df, nmax=15 ) :
    """dataframe,int --> nothing - just plots"""
    # nmax : if not a numeric column, then only plot if less than nmax unique values
    for col in df.columns :
        if pd.api.types.is_numeric_dtype( df[col] ) :
            (df[col].value_counts(bins = 10, normalize = True).sort_index() * 100 ).plot(kind='bar', title=col, rot=30)
            to_plot = True
        elif len( df[col].unique() ) < nmax :
            (df[col].value_counts(normalize=True).sort_index()*100).plot(kind='bar',rot=30,title=col)
            to_plot = True
        else :
            to_plot = False
        if to_plot :
            plt.xticks( ha='right')
            plt.ylabel( '%' )
            plt.show()
