I’m starting to think about code optimization for speed in general, and am curious whether anyone has resources on conceptual guidelines. I’d also welcome feedback on the two approaches to the same task below. I timed both, and mine was negligibly faster, but I’m wondering what tradeoffs each approach makes. Which would be faster if the dataset were 1000x bigger, and why? Thanks!
Approach 1:

```python
import math

# `income` is a pandas DataFrame; `high_income` is the column whose entropy we want.
possibilities = income['high_income'].unique()
outcomes = len(income['high_income'])
entropies = []
for e in possibilities:
    # Count how many rows fall into this category.
    occurrences = len(income[income['high_income'] == e])
    ratio = occurrences / outcomes
    P = ratio * math.log(ratio, len(possibilities))
    entropies.append(P)
income_entropy = -sum(entropies)
```
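For context on the scaling question, here is a sketch of a vectorized variant of Approach 1 (assuming `income` is a pandas DataFrame with a `high_income` column and that numpy/pandas are available; the synthetic frame below is only a stand-in so the snippet runs on its own). It lets `value_counts` compute every category's ratio in one pass instead of re-filtering the DataFrame once per unique value, which is where the loop version's cost grows with the number of categories:

```python
import numpy as np
import pandas as pd

# Stand-in for the real dataset; replace with your actual `income` DataFrame.
income = pd.DataFrame({"high_income": np.random.randint(0, 2, size=100)})

# One pass over the column: ratio of rows in each category.
ratios = income["high_income"].value_counts(normalize=True)

# Same base-k logarithm as the loop version (k = number of unique values).
k = len(ratios)
income_entropy = -(ratios * np.log(ratios) / np.log(k)).sum()
```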
Approach 2:

```python
import math

# Binary case only: probability of each class, then Shannon entropy in base 2.
prob_0 = income[income["high_income"] == 0].shape[0] / income.shape[0]
prob_1 = income[income["high_income"] == 1].shape[0] / income.shape[0]
income_entropy = -(prob_0 * math.log(prob_0, 2) + prob_1 * math.log(prob_1, 2))
```
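To answer the 1000x question empirically rather than by intuition, a rough harness like the one below can be rerun at increasing sizes. It is only a sketch: `compute_entropy_loop` and `compute_entropy_filter` are my own wrappers around the two snippets above, and the frame is synthetic random data rather than the real `income` dataset.

```python
import math
import timeit

import numpy as np
import pandas as pd


def compute_entropy_loop(income):
    # Approach 1: one filter pass per unique value.
    possibilities = income["high_income"].unique()
    outcomes = len(income["high_income"])
    entropies = []
    for e in possibilities:
        occurrences = len(income[income["high_income"] == e])
        ratio = occurrences / outcomes
        entropies.append(ratio * math.log(ratio, len(possibilities)))
    return -sum(entropies)


def compute_entropy_filter(income):
    # Approach 2: one explicit filter per class (binary column only).
    prob_0 = income[income["high_income"] == 0].shape[0] / income.shape[0]
    prob_1 = income[income["high_income"] == 1].shape[0] / income.shape[0]
    return -(prob_0 * math.log(prob_0, 2) + prob_1 * math.log(prob_1, 2))


# Synthetic stand-in for the real dataset; scale n up to simulate "1000x bigger".
for n in (10_000, 10_000_000):
    income = pd.DataFrame({"high_income": np.random.randint(0, 2, size=n)})
    t1 = timeit.timeit(lambda: compute_entropy_loop(income), number=10)
    t2 = timeit.timeit(lambda: compute_entropy_filter(income), number=10)
    print(f"n={n}: loop={t1:.4f}s  filter={t2:.4f}s")
```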