https://app.dataquest.io/m/307/the-mode/4/the-mode-for-discrete-variables
I’m confused as to why the SalePrice of houses is considered continuous. Aren’t the values in this variable limited to currency values, i.e., cents?
Continuous variables, also called measurement variables, are variables whose values remain meaningful when expressed as fractions. For example: weight, height, price.
Discrete variables, on the other hand, lose their meaning when expressed as fractions. For example, you cannot have 1.5 persons.
Well you couldn’t say half a cent either, right?
I understand why the distinction was made in the context of the next exercise (finding the mode of price). There isn’t an infinite number of prices, but there are far too many to meaningfully come up with a mode. It still doesn’t mean that price is continuous, in my understanding.
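To see why the mode is uninformative for price, here is a minimal sketch using Python's standard `statistics.multimode`. The prices and bedroom counts below are made-up illustrative values, not rows from the course dataset. When every price is distinct, every price ties for "most common", while a small discrete variable has a clear single mode.

```python
from collections import Counter
from statistics import multimode

# Hypothetical sale prices in dollars (made-up values, not from the course dataset).
sale_prices = [212350, 187900, 143500, 265000, 198750, 131200, 224999, 176400, 158300, 301000]
# Hypothetical bedroom counts for the same ten houses.
bedrooms = [3, 3, 3, 3, 4, 1, 3, 3, 2, 2]

def mode_summary(values):
    """Return every mode and the number of times the most common value occurs."""
    top_count = max(Counter(values).values())
    return multimode(values), top_count

price_modes, price_top = mode_summary(sale_prices)
bedroom_modes, bedroom_top = mode_summary(bedrooms)

# Every price occurs once, so every price is a "mode": the statistic tells us nothing.
print(len(price_modes), price_top)   # 10 1
# The bedroom counts have a clear single mode.
print(bedroom_modes, bedroom_top)    # [3] 6
```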
Price is a continuous variable because it can be scaled up and down. Its lowest unit of measurement is, for example, a cent. Other continuous variables also have lowest units of measurement; it’s not just price.
Hm, I thought the definition of continuous was that there could be infinitely many values between any two values. I guess I’m having trouble applying that to price, even though I see how you could discuss partial-cent prices and round to cents when forced to. Someone can have a height between 170.01 cm and 170.02 cm, but you can’t have a price between $1.01 and $1.02. What am I missing here?
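The "infinitely many values in between" definition can be made concrete with a short sketch. It uses Python's `fractions.Fraction` as a stand-in for numbers on a continuum (an assumption: rationals are dense like the reals, which is the property that matters here) and represents the whole-cent view as integer cents. The helper names are hypothetical.

```python
from fractions import Fraction

def midpoint(a, b):
    """Exact midpoint of two rational amounts."""
    return (a + b) / 2

def cents_between(low_cents, high_cents):
    """Whole-cent amounts strictly between two whole-cent amounts."""
    return list(range(low_cents + 1, high_cents))

low, high = Fraction("1.01"), Fraction("1.02")

# Treating price as a number on a continuum, there is always a value in between...
mid = midpoint(low, high)
print(mid, float(mid))          # 203/200 1.015

# ...and another between that one and $1.01, and so on without end.
print(midpoint(low, mid))       # 81/80

# Restricting price to whole cents, $1.01 and $1.02 have nothing in between.
print(cents_between(101, 102))  # []
```

Which of these two models fits "price" is exactly the question the rest of the thread debates.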
You can have a height between 170.01 cm and 170.02 cm because the lowest unit of measurement allows this. When a variable can take meaningful fractional values, it is generally continuous. We won’t be having similar discussions about discrete variables.
That’s what I was trying to say, you can have height between 170.01cm and 170.02cm, but you cannot have price between $1.01 and $1.02.
Age is a continuous variable, like price. Why?
Age is measured in years (time), and its lowest unit could be, for example, the microsecond.
Age is not expressed as 9.333 years though. But you can convert a person’s age to other units of time.
Again the issue I’m having is that in practice, a partial cent doesn’t exist. You could have infinitely smaller units than microseconds to measure time, for example femtoseconds are relevant when working with certain kinds of lasers.
It’s not the lack of a unit of measurement smaller than one cent that’s troubling me; it’s that something smaller than one cent isn’t meaningful. Currency conversion will almost certainly result in a partial amount of the smallest denomination, but in the end the amount must be fitted to that smallest denomination. So when working with price, there will always be values with no possible intermediates between them. It seems like there may be a special distinction with price: it can be treated as continuous, but in practice must be worked with as discrete.
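The "fractional until forced to fit the smallest denomination" step described above can be sketched with Python's `decimal` module. The amount, exchange rate, and the choice of banker's rounding (`ROUND_HALF_EVEN`) are assumptions for illustration; real institutions may use a different rounding rule.

```python
from decimal import Decimal, ROUND_HALF_EVEN

CENT = Decimal("0.01")

def convert(amount, rate):
    """Convert an amount at a given exchange rate, keeping full precision."""
    return amount * rate

def settle(amount):
    """Round to whole cents for an actual payment (rounding mode is an assumption)."""
    return amount.quantize(CENT, rounding=ROUND_HALF_EVEN)

# Hypothetical amount and exchange rate, chosen only for illustration.
amount_usd = Decimal("19.99")
rate = Decimal("0.9137")

raw = convert(amount_usd, rate)
print(raw)          # 18.264863  (fractional cents exist during the calculation)
print(settle(raw))  # 18.26      (forced onto the whole-cent grid at payment time)
```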
@grannysmithcrabapple everything will look like a discrete variable at its smallest unit of measurement.
Please take my words for it, as someone who has worked applying lean and Six Sigma methodologies.
These are the most important things you look at: Is the fractional value meaningful? Can it be expressed in other scales? Does it have reliable units of measurement?
“Please take my words for it,” isn’t an adequate answer from a moderator on a paid data science forum. I’ve addressed all of the points you’ve made. I am looking for a real answer.
A great part of my interest in data science is the transparency it allows, so that we can minimize, in general, how often we have to take someone else’s word for something.
I found this discussion that seemed to confirm my suspicions that while technically discrete, price can be treated as continuous.
I also found this forum and this one. It seems that this has been a point of contention for others.
@grannysmithcrabapple I was trying to draw on my professional experience to suggest that you could trust my judgement. I apologize if you took offense. I’m glad you found some resources on your own and that your doubts are now cleared up.
Cheers!
I was avoiding the technical details. This graph is a high-level overview of the hypothesis tests we choose for continuous and discrete variables.
We do regression analysis with price, and according to this table, regression is only done with continuous variables.
Hi @grannysmithcrabapple, I think you raise a good point: it’s not that clear why SalePrice is continuous.
If you think inside the framework of coins and banknotes, you can say there are no intermediate values between $1.01 and $1.02.
However, if you think about banking transactions, I could probably transfer you $1.015 or $1.01567 if I wanted to; I don’t see any theoretical barrier to this. So the price variable must be continuous.
Another example: suppose I have $1 and must pay it out entirely and equally to 3 people, so each one would get $0.3333… repeating forever. I can never pay it out completely and equally: even if I use 1000 decimal places, something is always left over. So price, by its very nature, is continuous. For human convenience, though, we have set a boundary such as the cent. And perhaps no one will complain that I keep the leftover $0.01.
As others have said, price is continuous because you can conceptualize an infinite number of fractional values. The practical aspect of that isn’t relevant. For example, height is continuous, but we couldn’t practically measure someone’s height to infinite decimal places. The data available to us would be practically discrete, but conceptually continuous.
I greatly appreciate all the responses on this topic. I’ll address some of what’s come up.
@Sahil I saw someone use the same example to come to the opposite conclusion. I’ve linked it above, but will paste it here:
Simple way to think about this: you, me, and your best friend need to split $10, in the form of a 10 dollar bill.
Well, we first notice that a SINGLE 10 dollar bill isn’t at all divisible if the money is to keep its value.
So we exchange the $10 for ten $1 bills. Now, a dollar to you, a dollar to me, a dollar to your friend… repeat… repeat.
Now we have a dollar left over. Again, a dollar isn’t divisible, so we go and get four quarters. A quarter to you, a quarter to me, a quarter to your friend.
We now need to split a quarter. Recognizing that splitting the quarter into two dimes and a nickel won’t solve our inequity problem, we just go ahead and cash the quarter in for 25 pennies.
A penny to you, a penny to me, a penny to your friend… repeat… repeat… repeat some more until we’re left with a single penny. In an attempt to remain equitable, we decide to break the penny up into… uh oh.
The penny is indivisible: it is the smallest atomic unit of currency in the US currency system. A continuous distribution should have an infinite number of values between $0.00 and $0.01. Money does not have this property; there is always an indivisible smallest unit of currency. As such, money is a discrete quantity.
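The discrete reading of the same example corresponds to holding money as whole cents (integers), a common way of storing currency. In this minimal sketch, the hypothetical helper `split_cents` uses Python's built-in `divmod`: each person gets a whole number of cents, and the indivisible remainder is reported explicitly.

```python
def split_cents(total_cents, people):
    """Split an amount held as whole cents; return each share and the indivisible remainder."""
    share, remainder = divmod(total_cents, people)
    return share, remainder

share, remainder = split_cents(10_00, 3)  # $10.00 expressed as 1000 cents
print(f"each gets ${share / 100:.2f}, {remainder} cent(s) left over")
# each gets $3.33, 1 cent(s) left over
```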
I’m still not clear on what to make of this.
@alex Banking transactions are exactly the topic that made the most sense to me for understanding price as continuous, in particular exchanging currencies. Values will always be fractional; it is only when they’re made to exist in the real world that they’re rounded off to the smallest denomination. But they do exist fractionally before that point.
@monorienaghogho I don’t know why you were “avoiding the technical details” on a technical question. If price is continuous because of this hypothesis testing, that would be a useful thing to include in the course. As it stands, in the ‘Variables in Statistics’ section of ‘Statistics Fundamentals’, these terms are defined as:
Treating a variable as one or the other based on the kinds of analysis typically performed is a different way of defining a variable.
It’d be worth exploring further what the implications are of treating a variable as discrete or continuous for our purposes. As an example, another excerpt (linked above):
It really doesn’t make any difference when the steps for discrete data are small. Most economists treat price as a continuous variable even though it can’t take fractional values. The variability is so great that it swamps the small steps, so you don’t get any more information from it being discrete.
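The "variability swamps the small steps" point can be checked with a quick simulation. The lognormal price distribution and its parameters are assumptions chosen only to give a realistic spread; the sketch compares the standard deviation of the continuous draws with that of the same values rounded to whole cents.

```python
import random
import statistics

random.seed(42)
# Hypothetical prices drawn from a right-skewed distribution (median roughly $163,000).
raw = [random.lognormvariate(12, 0.4) for _ in range(10_000)]
# The same prices forced onto the whole-cent grid.
cents = [round(p, 2) for p in raw]

raw_sd = statistics.stdev(raw)
cents_sd = statistics.stdev(cents)
largest_change = max(abs(r - c) for r, c in zip(raw, cents))

print(f"sd, continuous:   {raw_sd:,.4f}")
print(f"sd, whole cents:  {cents_sd:,.4f}")
print(f"largest rounding change: ${largest_change:.4f}")
```

Rounding moves each value by at most half a cent, which is negligible next to a spread of tens of thousands of dollars, so the two summaries agree to many significant digits.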
What this says to me is that at some point, practically, a discrete variable can be treated as continuous.
Ultimately, the following explanation wrapped things up for me (also linked above). It explains that price must be treated as continuous until it is forced to be discrete in the real world; otherwise rounding errors accumulate.
However, while money is traditionally rounded to discrete values after a calculation has been made using it as a continuous variable, there are instances where you want the more accurate information. For example, if you are producing some liquid product in huge quantities, you want to know your unit cost of production to multiple decimal places. Consider the example of gasoline prices: the quoted price often involves tenths of a cent. This is possible because you’re selling non-discrete amounts. Also, by increasing the minimum order size, you can charge a per-unit price that is not as constrained. Then there are foreign exchange rates, which make the numbers look even more odd. Ultimately, some fractions of the smallest denomination get rounded in a transaction, but if one bank handled enough international transactions, it could accumulate quite an amount from all those rounding errors over time.
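The accumulation of rounding errors described above can be sketched with Python's `decimal` module. The per-gallon price of $3.499, the number of sales, and the `ROUND_HALF_UP` mode are hypothetical choices echoing the gasoline example; the sketch compares rounding every transaction to cents against rounding only the final total.

```python
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")

def to_cents(x):
    """Round to whole cents (rounding mode is an assumption)."""
    return x.quantize(CENT, rounding=ROUND_HALF_UP)

# Hypothetical: 10,000 one-gallon sales at a quoted price of $3.499 per gallon.
price_per_gallon = Decimal("3.499")
n_sales = 10_000

rounded_each_time = sum(to_cents(price_per_gallon) for _ in range(n_sales))
rounded_once = to_cents(price_per_gallon * n_sales)

print(rounded_each_time)                 # 35000.00
print(rounded_once)                      # 34990.00
print(rounded_each_time - rounded_once)  # 10.00
```

Each individual rounding is under a cent, but treating price as continuous until the end avoids $10.00 of drift over 10,000 sales.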
Thanks again everyone for the contributions.
Loved all the details you have added to your post, and thank you for leading a discussion on this topic. To add some more input: the penny is indivisible because it wouldn’t be economical for a country to create a coin worth less than a penny. It is not impossible; if the cost of minting coins were not an issue, we could have ever-smaller coins that divide a penny further. But because of cost, and due to the lack of practical need, the government hasn’t created anything below a penny. So the limitation here is set by the government for economic reasons, not because it can’t be done. If we remove these external factors, price is continuous.
An example of this is Bitcoin. Because Bitcoin doesn’t need to print paper money or coins, it can use as many decimal places as the system supports (currently eight, down to one satoshi, or 0.00000001 BTC).
Ah, Bitcoin, that’s a great example. Thanks, this helps a lot.
It might be helpful to think about the categories ‘continuous’ and ‘discrete’ as made up constructs to help us better extract meaning from data, rather than some inherent property of a variable. In the comments here people have presented pretty good arguments to categorize ‘price’ both ways. While the philosophical exercise of trying to identify its TRUE category does have value, I think it might be best to just say “it depends”. When monorienaghogho is doing a regression analysis, price is continuous. When three people are deciding how to divide a 10 dollar bill, price is discrete.
I think the purpose of this classification is to help us decide what methods to use when working with a variable. In general, continuous distributions are better viewed with a KDE (kernel density estimate) plot, and discrete distributions work well with histograms or bar graphs. Thoughts?
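The method-choice idea can be sketched with the data behind the two plot types, using only the standard library. The data and helper names are hypothetical: a discrete variable gets one count per distinct value (bar-chart data), while a continuous one is grouped into equal-width bins (histogram data). An actual KDE plot would typically come from a plotting library such as seaborn's `kdeplot`, which is not shown here.

```python
from collections import Counter

def bar_counts(values):
    """Discrete view: one bar per distinct value."""
    return dict(sorted(Counter(values).items()))

def histogram_counts(values, bin_width):
    """Continuous view: group values into equal-width bins starting at the minimum."""
    start = min(values)
    counts = Counter(int((v - start) // bin_width) for v in values)
    # Include empty bins so the bins cover the whole range without gaps.
    return {start + i * bin_width: counts[i] for i in range(max(counts) + 1)}

bedrooms = [2, 3, 3, 4, 3, 2, 5, 3]  # hypothetical discrete data
prices = [131_200, 143_500, 158_300, 176_400, 187_900, 198_750, 212_350, 301_000]  # hypothetical

print(bar_counts(bedrooms))              # {2: 2, 3: 4, 4: 1, 5: 1}
print(histogram_counts(prices, 50_000))  # {131200: 4, 181200: 3, 231200: 0, 281200: 1}
```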