TLDR: Should we be taking degrees of freedom into account when performing multi-category chi-squared tests?

Degrees of freedom were originally introduced on https://app.dataquest.io/m/99/chi-squared-tests/8/degrees-of-freedom

When we first used `chisquare()` we were using one-dimensional data, such as on https://app.dataquest.io/m/99/chi-squared-tests/10/using-scipy, so the `scipy.stats.chisquare()` default degrees of freedom of `k - 1 - ddof` made sense (`k` being the number of observed frequencies, and `ddof` defaulting to `0`).

From what I can tell (https://en.wikipedia.org/wiki/Chi-squared_test#Example_chi-squared_test_for_categorical_data), when using multi-category chi-squared tests we need to adjust the degrees of freedom to be `(num_rows - 1)(num_cols - 1)`.
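To make that formula concrete, here is a minimal sketch of how I understand it. The helper name `contingency_dof` is my own (hypothetical, not from SciPy or the missions), and it assumes the crosstab is passed without its `All` margins.

```python
def contingency_dof(table):
    """Degrees of freedom for a crosstab given as a list of rows (margins excluded)."""
    num_rows = len(table)
    num_cols = len(table[0])
    return (num_rows - 1) * (num_cols - 1)


# A 2x2 crosstab has (2 - 1) * (2 - 1) degrees of freedom.
dof_2x2 = contingency_dof([[1, 2], [3, 4]])

# A 3x4 crosstab has (3 - 1) * (4 - 1) degrees of freedom.
dof_3x4 = contingency_dof([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])

print(dof_2x2, dof_3x4)
```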

When we do the examples on https://app.dataquest.io/m/100/multi-category-chi-squared-tests/4/finding-statistical-significance we treat the observed values as if they have 3 degrees of freedom, but using a 2x2 crosstab I would have expected that we should be adjusting the degrees of freedom down to `(2 rows - 1)(2 cols - 1) = 1*1 = 1` and therefore setting the `ddof` parameter of `scipy.stats.chisquare()` to `2`, which, given we have 4 pieces of data, would give us `k - 1 - ddof = 4 - 1 - 2 = 1`, the correct 1 degree of freedom.
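The arithmetic I have in mind can be sketched as below. The function name `ddof_for_crosstab` is hypothetical, and the sketch assumes the crosstab cells are flattened into a single sequence before being passed to `scipy.stats.chisquare()`, as in the missions.

```python
def ddof_for_crosstab(num_rows, num_cols):
    """ddof that makes k - 1 - ddof equal (num_rows - 1) * (num_cols - 1)."""
    k = num_rows * num_cols                        # number of flattened cells
    target_dof = (num_rows - 1) * (num_cols - 1)   # contingency-table degrees of freedom
    return (k - 1) - target_dof


ddof_2x2 = ddof_for_crosstab(2, 2)
ddof_3x4 = ddof_for_crosstab(3, 4)
print(ddof_2x2, ddof_3x4)
```

If this reasoning is right, the adjustment simplifies to `num_rows + num_cols - 2`, so it grows with the size of the table.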

I am trying to understand the impact of degrees-of-freedom and why we didn’t need to take it into account when doing the missions or the Jeopardy guided project.

Using the following contrived example, the choice of `ddof` makes a considerable difference to the p-value:

```
b    one  two  All
a
bar    7    5   12
foo   13    6   19
All   20   11   31
```

```
# Observed
o_one_bar = 7
o_two_bar = 5
o_one_foo = 13
o_two_foo = 6
observed = (o_one_bar, o_two_bar, o_one_foo, o_two_foo)
print("Observed:", observed)
# Totals
t_all = sum(observed)
t_one = o_one_bar + o_one_foo
t_two = o_two_bar + o_two_foo
t_bar = o_one_bar + o_two_bar
t_foo = o_one_foo + o_two_foo
# expected
e_one_bar = t_one*t_bar/t_all
e_two_bar = t_two*t_bar/t_all
e_one_foo = t_one*t_foo/t_all
e_two_foo = t_two*t_foo/t_all
expected = (e_one_bar, e_two_bar, e_one_foo, e_two_foo)
print("Expected:", expected)
```

Observed: (7, 5, 13, 6)

Expected: (7.741935483870968, 4.258064516129032, 12.258064516129032, 6.741935483870968)

```
from scipy import stats

# Default ddof=0, so the p-value uses k - 1 = 3 degrees of freedom
print(stats.chisquare(observed, expected))
```

Power_divergenceResult(statistic=0.32693381180223313, pvalue=0.9548858632175412)

```
# ddof=2, so the p-value uses k - 1 - 2 = 1 degree of freedom
print(stats.chisquare(observed, expected, ddof=2))
```

Power_divergenceResult(statistic=0.32693381180223313, pvalue=0.5674701732069024)
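As a cross-check on the contrived example, here is a minimal sketch that assumes `scipy.stats.chi2_contingency` (which the missions do not use) is a fair comparison, since it derives the degrees of freedom from the shape of the crosstab. I pass `correction=False` to switch off the Yates continuity correction that SciPy applies by default to 2x2 tables, so the statistic stays comparable with the `chisquare()` calls above. The variable names are my own.

```python
from scipy import stats

# Same contrived crosstab as above, without the "All" margins.
table = [[7, 5],
         [13, 6]]

# chi2_contingency computes the expected counts and degrees of freedom itself.
chi2_stat, p_value, dof, expected_table = stats.chi2_contingency(table, correction=False)

# p-values for the same statistic under 3 and 1 degrees of freedom.
p_df3 = stats.chi2.sf(chi2_stat, 3)
p_df1 = stats.chi2.sf(chi2_stat, 1)

print("statistic:", chi2_stat)
print("dof reported by chi2_contingency:", dof)
print("p-value from chi2_contingency:", p_value)
print("p-value with 3 degrees of freedom:", p_df3)
print("p-value with 1 degree of freedom:", p_df1)
```

If my understanding is correct, the degrees of freedom reported here should be 1 and the p-value should agree with the `ddof=2` result above rather than with the default call.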