Measures of variability - range of sales price

Screen Link: Learn data science with Python and R projects

My Code:

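# String aggregator names give columns 'min' and 'max' regardless of pandas/NumPy version
# (the 'amin'/'amax' labels produced by np.min/np.max are version-dependent)
group_by_year = houses.groupby('Yr Sold')['SalePrice'].agg(['min', 'max'])
ranges = (group_by_year['max'] - group_by_year['min']).reset_index()
ranges.rename(columns={0: 'Range'}, inplace=True)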

My code returned this data frame:

   Yr Sold   Range
0     2006  590000
1     2007  715700
2     2008  601900
3     2009  575100
4     2010  598868

I thought it was much simpler and nicer to just return a dataframe with each year and its respective range of sales prices. I understood DataQuest’s solution, so I created a function and submitted a dictionary with the ranges, just as instructed. BUT, let’s say I was working independently and had chosen the code above (a dataframe instead of a dictionary and a function): would that be wrong? Is DQ instructing us to use a function and return a dictionary for teaching purposes, or because that approach is better in this case for some reason?

Hey @MHnt1026e3a37

No, you wouldn’t be wrong. And it’s not a question of right or wrong. For this particular example:

  • we have only 5 data points with two details each (the year and its range), so a dictionary is enough to hold them. That is probably why DQ suggested a dictionary structure.
  • the focus of the lesson was on the concepts rather than on the presentation of results; perhaps that’s why a dict object was used.

Generally, the choice of data structure depends on various factors, one of which is memory utilization: for a small amount of data, we would rather not use more memory than necessary. This article may help somewhat on this topic: “Which Python Data Structure Should You Use?” by Yasmine Hejazi, Towards Data Science (although it does not mention dataframes).
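
To make the comparison concrete, here is a minimal sketch of both approaches side by side. The tiny `houses` frame and its values are made up for illustration, and `find_range` is a hypothetical helper name, not necessarily what DQ's solution uses. The point is only that both structures carry the same information and convert into each other easily.

```python
import pandas as pd

# Hypothetical stand-in for the course's `houses` data; the values are made up.
houses = pd.DataFrame({
    "Yr Sold": [2006, 2006, 2007, 2007, 2008],
    "SalePrice": [100000, 250000, 90000, 300000, 150000],
})


def find_range(values):
    """Return the range (max - min) of a sequence of numbers."""
    return max(values) - min(values)


# Dictionary approach: one {year: range} entry per year.
range_by_year = {
    int(year): int(find_range(group["SalePrice"]))
    for year, group in houses.groupby("Yr Sold")
}

# DataFrame approach: one row per year with a 'Range' column.
ranges = (
    houses.groupby("Yr Sold")["SalePrice"]
    .agg(lambda prices: prices.max() - prices.min())
    .reset_index(name="Range")
)

# Converting the DataFrame result back into a dictionary.
as_dict = dict(zip(ranges["Yr Sold"].tolist(), ranges["Range"].tolist()))

print(range_by_year)
print(ranges)
```

If you plan to keep filtering, sorting, or plotting the result, the dataframe is usually the more convenient form; if you only need to look up a range by year, the dictionary does the job.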

Hope this helps.