I would do some exploration to understand the entities and their relationships.

My guesses:

- Each player can have multiple sessions.
- Each session can include multiple levels, and a level can span multiple sessions.
- A lower level has to be completed before a higher level is started.
- A level has to be started before it reaches one of its mutually exclusive end states, fail or complete.
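A quick way to test the last guess is to look for (player, level) pairs whose first fail/complete event comes before their first start event, or that have no start event at all. This is only a sketch on toy data: the tuple layout and the `"start"`/`"fail"`/`"complete"` labels are my assumptions about how the events are stored.

```python
from datetime import datetime

# Toy rows: (player_id, level_number, event_type, event_datetime).
# The layout and the event_type labels are assumed, not taken from the real data.
events = [
    (1, 1, "start",    datetime(2024, 1, 1, 10, 0)),
    (1, 1, "fail",     datetime(2024, 1, 1, 10, 5)),
    (2, 1, "complete", datetime(2024, 1, 1, 9, 0)),
    (2, 1, "start",    datetime(2024, 1, 1, 9, 30)),
]

first_start = {}    # earliest start per (player, level)
first_outcome = {}  # earliest fail or complete per (player, level)
for player, level, kind, ts in events:
    target = first_start if kind == "start" else first_outcome
    key = (player, level)
    if key not in target or ts < target[key]:
        target[key] = ts

# Pairs that reached an end state without a start, or before their first start
violations = sorted(
    key for key, ts in first_outcome.items()
    if key not in first_start or ts < first_start[key]
)
print(violations)
```

If the list is non-empty on the real data, the guess is wrong (or the logging is lossy) and the fail percentages need a different denominator.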

Since the question is about comparing levels, `level_number` is the main unit of analysis. Drilling in, you can analyze at the player level or the session level ("level" here in the general English sense, not the game level).

For each `level_number`, find the percentage of distinct players who started the level and failed it. This directly answers the question. You can go one step further and count in how many distinct sessions (regardless of player) each level was started and failed, which gives more granularity than the percentage of players failing each level. One relevant question is whether the start event gets saved again with a new `event_datetime` when a player restarts a session.

Whether you analyze at the player level only or expand to sessions depends on which you think is the better proxy for difficulty, assuming you want to use the abstract concept of difficulty as a proxy for "likely to fail". It could be that one player is just particularly bad at a level and needs 10 attempts where most people need 2, which inflates the level's apparent difficulty in a session-based count. So you can get into philosophy here about the means versus the ends: do you measure the result (fail/complete) or the journey (number of sessions attempted)?
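To make the player-versus-session choice concrete, here is a rough sketch that computes the fail percentage per `level_number` with either unit. The field names `player_id`, `session_id` and `event_type`, and the event labels, are assumptions on my part; only `level_number` comes from the question.

```python
from collections import defaultdict

# Toy event log; field names other than level_number are assumed.
events = [
    {"player_id": 1, "session_id": 10, "level_number": 1, "event_type": "start"},
    {"player_id": 1, "session_id": 10, "level_number": 1, "event_type": "complete"},
    {"player_id": 2, "session_id": 20, "level_number": 1, "event_type": "start"},
    {"player_id": 2, "session_id": 20, "level_number": 1, "event_type": "fail"},
    {"player_id": 3, "session_id": 30, "level_number": 1, "event_type": "start"},
    {"player_id": 3, "session_id": 30, "level_number": 1, "event_type": "fail"},
    {"player_id": 3, "session_id": 31, "level_number": 1, "event_type": "start"},
    {"player_id": 3, "session_id": 31, "level_number": 1, "event_type": "fail"},
    {"player_id": 1, "session_id": 11, "level_number": 2, "event_type": "start"},
    {"player_id": 1, "session_id": 11, "level_number": 2, "event_type": "fail"},
    {"player_id": 2, "session_id": 21, "level_number": 2, "event_type": "start"},
    {"player_id": 2, "session_id": 21, "level_number": 2, "event_type": "complete"},
]

def fail_pct(events, unit):
    """Percent of distinct units (players or sessions) that started a level and failed it."""
    started = defaultdict(set)
    failed = defaultdict(set)
    for e in events:
        if e["event_type"] == "start":
            started[e["level_number"]].add(e[unit])
        elif e["event_type"] == "fail":
            failed[e["level_number"]].add(e[unit])
    # Only count failures from units that also have a start event for that level
    return {
        level: 100 * len(failed[level] & units) / len(units)
        for level, units in sorted(started.items())
    }

by_player = fail_pct(events, "player_id")
by_session = fail_pct(events, "session_id")
print(by_player)
print(by_session)
```

In the toy data, player 3 fails level 1 in two separate sessions, so the session-based percentage for level 1 comes out higher than the player-based one. That is exactly the inflation effect to keep in mind when picking the unit.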

On statistical significance, you can run a one-tailed binomial test to see how likely the observed fail percentage of each level is, with the null hypothesis being that every player is equally likely to fail any level. See https://www.youtube.com/watch?v=J8jNoF-K8E8&t=676s&ab_channel=StatQuestwithJoshStarmer
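As a sketch of what that test could look like, the snippet below computes the exact upper-tail binomial probability per level using only the standard library. The counts are made up, and using the pooled fail rate across all levels as the null probability is my assumption about how to encode "equally likely to fail any level".

```python
from math import comb

def binom_upper_tail(k, n, p):
    """Exact P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

# Hypothetical counts per level_number: (players who started, players who failed)
counts = {1: (200, 40), 2: (180, 60), 3: (150, 30)}

# Assumed null: every level shares the pooled fail probability
total_started = sum(n for n, _ in counts.values())
total_failed = sum(k for _, k in counts.values())
p_null = total_failed / total_started

# One-tailed p-value: chance of seeing at least this many fails under the null
p_values = {
    level: binom_upper_tail(k, n, p_null) for level, (n, k) in counts.items()
}
for level, p_value in p_values.items():
    print(level, round(p_value, 4))
```

A small p-value for a level suggests it fails more often than the pooled rate would explain. For large counts the direct sum can overflow floats, so a library routine such as `scipy.stats.binomtest` is the safer choice on real data.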

You can also take the two levels with the highest fail percentages and test whether their difference is significant, with something like the test for two percentages described at https://www.stat.berkeley.edu/~stark/SticiGui/Text/percentageTests.htm#:~:text=To%20test%20at%20approximate%20significance,Z%20>%20z1−α.&text=Because%20the%20null%20hypothesis%20specifies,both%20sample%20percentages%20is%20p.
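A minimal sketch of such a comparison is the pooled two-proportion z test below. The counts are invented, and the choice of a one-sided alternative (level A fails more often than level B) is my assumption.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z(fail_a, n_a, fail_b, n_b):
    """Pooled z test for H0: both levels share the same fail probability.

    Returns (z, one-sided p-value for 'A fails more often than B').
    """
    p_a, p_b = fail_a / n_a, fail_b / n_b
    pooled = (fail_a + fail_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    p_value = 1 - NormalDist().cdf(z)
    return z, p_value

# Hypothetical counts: 60 of 180 players failed level A, 40 of 200 failed level B
z, p_value = two_proportion_z(60, 180, 40, 200)
print(round(z, 2), round(p_value, 4))
```

The normal approximation behind this test needs reasonably large counts of both fails and completions in each level, so treat the result with caution for rarely played levels.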

You probably should not stop at the top 2 levels, though; consider all levels. You may also want to check whether the levels are independent, since independence is an assumption of these statistical tests.

Hope this helps you get started. Please share how you end up doing it.