hi @johnedwardferreira5

Nope, we are not talking about cookies, we are talking about ice-creams and toppings!

Let’s just ignore the GP question for now, and only focus on this example.

Frank answer: nope, this is incorrect. You just gave the observed values as the expected values.

First, the row-totals and the column-totals are what we call the marginal distribution. If we break down our observed values by marginal distribution, we get:

- Total no. of ice-creams = 73
- Total no. of chocolate ice-creams (out of 73) = 40 (**regardless** of toppings)
- Total no. of vanilla ice-creams (out of 73) = 33 (**regardless** of toppings)
- Total no. of ice-creams with choco-chips as toppings = 38 (**regardless** of flavor)
- Total no. of ice-creams with dry-fruits as toppings = 35 (**regardless** of flavor)

Second, the cross-section of flavor with toppings is what we call the joint distribution. Again breaking down the observed values, we have:

- Total no. of ice-creams = 73
- Total no. of Chocolate Flavored ice-creams **with** Choco-chips as toppings = 25
- Total no. of Chocolate Flavored ice-creams **with** dry-fruits as toppings = 15
- and so on
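If it helps, here is a small Python sketch of how the marginal totals fall out of the joint counts. The two vanilla cells (13 and 20) are not stated above; I am assuming them from the totals (38 - 25 and 35 - 15), and the variable names are just for illustration.

```python
# Joint (cell) counts. The vanilla cells are an assumption derived from
# the totals in this post: 38 - 25 = 13 and 35 - 15 = 20.
observed = {
    "chocolate": {"choco_chip": 25, "dry_fruit": 15},
    "vanilla": {"choco_chip": 13, "dry_fruit": 20},
}

# Row margins: total per flavor, regardless of topping.
row_totals = {flavor: sum(cells.values()) for flavor, cells in observed.items()}

# Column margins: total per topping, regardless of flavor.
col_totals = {}
for cells in observed.values():
    for topping, count in cells.items():
        col_totals[topping] = col_totals.get(topping, 0) + count

grand_total = sum(row_totals.values())

print(row_totals)   # per-flavor totals
print(col_totals)   # per-topping totals
print(grand_total)  # all ice-creams
```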

When we talk about the Chi-squared test, what we want to know is: given an observed marginal distribution, what is the expected joint distribution? (Right now we are only talking about using the Chi-squared test for homogeneity. If this word is foreign to you, please ignore it; we can take it up in a later post!)

Allow me to shorten the names before I try to simplify this.

Chocolate = CH

Vanilla = VA

Choco-Chip = CC

Dry-Fruits = DF

From the observed values, we can calculate the overall probability of an ice-cream being CH out of all ice-creams, and likewise for CC. So let’s take it this way:

- P(CH) = 40/73 = 54.8%
- P(CC) = 38/73 = 52.1%

If we **expect** our distribution to be homogeneous, what we mean is that 52.1% of 54.8% of **Total ice-creams** should be CH + CC (Chocolate flavor and Choco-chip toppings).

This means the expected no. of CH and CC ice-creams out of the total is

= 52.1\% * 54.8\% * 73

OR = (\frac {38}{73}) * (\frac {40}{73}) * 73

which then reduces to **Expected Value for Chocolate ice-cream and Choco-Chip** = \frac {38 * 40}{73} = 20.822 (I rounded it to 21)
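The same arithmetic can be checked with a few lines of Python. This is only a sketch of the calculation above; the variable names are mine.

```python
total = 73             # all ice-creams
chocolate_total = 40   # row margin for CH
choco_chip_total = 38  # column margin for CC

p_ch = chocolate_total / total   # P(CH)
p_cc = choco_chip_total / total  # P(CC)

# Long form: P(CC) * P(CH) * total
expected_long = p_cc * p_ch * total

# Short form after cancelling one factor of 73: (38 * 40) / 73
expected_short = choco_chip_total * chocolate_total / total

print(round(p_ch, 3), round(p_cc, 3))
print(round(expected_long, 3), round(expected_short, 3))
```

Both forms give the same number, which is why the shortcut "row total times column total, divided by grand total" is usually what you see quoted.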

Similarly, Expected value for **Vanilla and Choco-chip** = 52.1\% * (100 - 54.8)\% * 73

OR = \frac {38}{73} * \frac {33}{73} * 73

= \frac {38 * 33}{73}

= 17.178 (I rounded it to 17)

Note that the total no. of ice-creams with Choco-chips still remains 38 (21 + 17)! The marginal distribution is always the same for both observed and expected values.
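To see the margins being preserved, here is a minimal sketch that builds the whole expected table from the row and column totals alone and then adds the cells back up. It uses the short names CH, VA, CC, DF from above; the dictionary layout is just my choice for illustration.

```python
row_totals = {"CH": 40, "VA": 33}
col_totals = {"CC": 38, "DF": 35}
total = 73

# Expected count for each cell = row total * column total / grand total
expected = {
    (flavor, topping): r * c / total
    for flavor, r in row_totals.items()
    for topping, c in col_totals.items()
}

for cell, value in expected.items():
    print(cell, round(value, 3))

# Adding the expected cells back up reproduces the observed margins.
cc_sum = expected[("CH", "CC")] + expected[("VA", "CC")]
ch_sum = expected[("CH", "CC")] + expected[("CH", "DF")]
print(round(cc_sum, 6), round(ch_sum, 6))
```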

If you have understood the essence of expected values and can proceed on your own, great! You may ignore the section below.

Following this calculation, we calculate the expected values for all variables (flavor) & attributes (toppings) to get the Expected values table.

Once we complete that, we can then use an X^2 test to understand how our observed values compare with the expected values.

X_c^2 = \sum \frac {(O_i - E_i)^2}{E_i}

c = degrees of freedom

O_i = observed values

E_i = expected values
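As a rough sketch, the sum in the formula could be evaluated for this example like so. Again, the vanilla observed cells (13 and 20) are my assumption, derived from the totals (38 - 25 and 35 - 15), and this only computes the statistic itself.

```python
row_totals = [40, 33]  # Chocolate, Vanilla
col_totals = [38, 35]  # Choco-chip, Dry-fruits
total = 73

# Observed counts; the second row is assumed from the margins.
observed = [
    [25, 15],
    [13, 20],
]

# Expected counts from the margins: row total * column total / grand total
expected = [[r * c / total for c in col_totals] for r in row_totals]

# X^2 = sum over all cells of (O - E)^2 / E
chi_squared = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

print(round(chi_squared, 3))
```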

Degrees of freedom for a Chi-Squared test are calculated as:

Total no. of columns without margins (Column-Total) = 2 (Col)

Total no. of rows without margins (Row-Total) = 2 (Row)

c = (Col - 1) * (Row - 1) = (2-1) * (2 -1) = 1
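If you want this as a tiny helper, something like the following would do; `degrees_of_freedom` is just a name I made up for the sketch.

```python
def degrees_of_freedom(n_rows: int, n_cols: int) -> int:
    """Degrees of freedom for a table, with margin rows/columns excluded."""
    return (n_rows - 1) * (n_cols - 1)


# 2 flavors x 2 toppings
dof_2x2 = degrees_of_freedom(2, 2)
print(dof_2x2)
```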

This 1 degree of freedom implies that, if we have the marginal distribution and only **one** joint observation, we can still derive the entire observed and expected values table. To elaborate:

Observed Table with Margins and One observation:

| Flavor\Toppings | Choco-chip | Dry-fruits | Row-Total |
|---|---|---|---|
| Chocolate | 25 | ? | 40 |
| Vanilla | ? | ? | 33 |
| Col-Total | 38 | 35 | 73 |
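Filling in the question marks is just subtraction from the margins. A minimal sketch, with variable names of my own choosing:

```python
row_totals = {"Chocolate": 40, "Vanilla": 33}
col_totals = {"Choco-chip": 38, "Dry-fruits": 35}

ch_cc = 25  # the single known joint observation

# Everything else follows from the margins.
ch_df = row_totals["Chocolate"] - ch_cc   # rest of the Chocolate row
va_cc = col_totals["Choco-chip"] - ch_cc  # rest of the Choco-chip column
va_df = row_totals["Vanilla"] - va_cc     # rest of the Vanilla row

print(ch_df, va_cc, va_df)
```

The last cell can be reached either through its row or through its column, and both routes agree, which is a handy sanity check.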

For a 3 column and 2 row table, c = (3-1) * (2-1) = 2

Observed Table with Margins and Two observations:

| Flavor\Toppings | Choco-chip | Dry-fruits | Oreo-cookie | Row-Total |
|---|---|---|---|---|
| Chocolate | 25 | 15 | ? | 60 |
| Vanilla | ? | ? | ? | 45 |
| Col-Total | 38 | 35 | 32 | 105 |
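The same idea works for any table size. Below is a sketch of a general fill-in routine, assuming the known cells are the first (rows - 1) x (cols - 1) block of the table; `fill_table` is a hypothetical helper, not something from a library.

```python
def fill_table(row_totals, col_totals, known):
    """Complete a table from its margins.

    `known` holds the first (rows - 1) x (cols - 1) cells, row by row.
    """
    n_rows, n_cols = len(row_totals), len(col_totals)
    table = [list(row) for row in known]

    # Last cell of each known row comes from that row's total.
    for i, row in enumerate(table):
        row.append(row_totals[i] - sum(row))

    # The whole last row comes from the column totals.
    last_row = [
        col_totals[j] - sum(table[i][j] for i in range(n_rows - 1))
        for j in range(n_cols)
    ]
    table.append(last_row)
    return table


# 2 flavors x 3 toppings, with only two cells observed (25 and 15).
filled = fill_table([60, 45], [38, 35, 32], [[25, 15]])
for row in filled:
    print(row)
```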

I hope this helps you somewhat. Do let me know in case you want me to complete the X^2 test calc as well. And in case of newer or further doubts - 404 error. Just kidding! Do let me know your thoughts.