Hi @adrianzchmn.
Great that you took the time and the effort to revise your project.
Regarding the histograms: what I was trying to say is that using a fixed bin size for every variable can be a good or a not-so-good idea, depending on the data. Obviously, copy-pasting the code 8 times and just changing the variable and bin size is not a good idea either. You want the plots generated in a loop (as in your implementation). What I would probably try is to wrap the body of your loop in a function and call that function for each variable with its own bin size. In your case, maybe something like this (I just made up the bin sizes for demonstration, so please don't use them in your actual analysis):
import matplotlib.pyplot as plt
import seaborn as sns

cols = [
    "Sample_size",
    "Median",
    "Employed",
    "Full_time",
    "ShareWomen",
    "Unemployment_rate",
    "Men",
    "Women",
]
# Number of bins for each column (same order and length as cols)
bin_sizes = [4, 8, 6, 12, 4, 12, 7, 5]

# Define a plotting function
def plot_hist(df, col, bin_size):
    """Plot a histogram for the supplied variable with the given number of bins."""
    plt.figure(figsize=(10, 5))  # plt.plot() has no figsize; plt.figure() creates the figure
    sns.histplot(data=df, x=col, bins=bin_size)
    sns.despine(left=True, bottom=True)
    plt.title(col, fontweight="bold", fontsize=16)
    plt.show()

# zip() pairs each column name with its bin count, so both can be unpacked directly
for col, bin_size in zip(cols, bin_sizes):
    # recent_grads is the DataFrame from your project
    plot_hist(recent_grads, col, bin_size)
This way you can have custom bin sizes with almost the same amount of code.
Maybe this helps for future projects.
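Since whether a given bin count suits a variable depends on its distribution, one way to get a data-driven starting point is NumPy's bin-estimation rules. A minimal sketch: the helper suggest_bin_counts and the random demo data are hypothetical, while np.histogram_bin_edges with a rule name such as "auto" or "sturges" is NumPy's own function.

```python
import numpy as np
import pandas as pd

def suggest_bin_counts(df, cols, rule="auto"):
    """Hypothetical helper: return {column: number of bins} from a NumPy bin rule."""
    counts = {}
    for col in cols:
        values = df[col].dropna().to_numpy()  # drop NaN so the range is finite
        edges = np.histogram_bin_edges(values, bins=rule)
        counts[col] = len(edges) - 1  # n edges delimit n - 1 bins
    return counts

# Hypothetical random data; in the project you would pass recent_grads instead
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "Median": rng.normal(40000, 10000, size=170),
    "ShareWomen": rng.uniform(0, 1, size=170),
})
bin_counts = suggest_bin_counts(demo, ["Median", "ShareWomen"])
print(bin_counts)
```

You could then loop over bin_counts.items() and call plot_hist for each pair. Alternatively, sns.histplot accepts a rule name directly (e.g. bins="auto"), because seaborn passes bins on to numpy.histogram_bin_edges.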
BTW: The part fig = plt.subplots(0,8, figsize =(10,5)) in your code doesn't really work, because you create a new figure and overwrite the fig variable with every pass of the for-loop. So you just get a separate plot for every variable, not 8 subplots in 1. I don't think you actually need subplots here, so you can just use plt.figure(figsize=(10,5)) to create one figure per variable.
Best
htw