Understanding Majority Voting

Lesson link: Learn data science with Python and R projects

Hi! I don’t really get the concept of majority voting, specifically the last sentence:

Imagine this is an election and each criterion is a voter. For each app, we’re going to count the number of votes for each result, and the majority will be declared the winner. Note that since we have three criteria and two possible values for each criterion, ties are impossible.
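To make the "election" concrete, here is a minimal sketch in plain Python, assuming each app's votes arrive as a list of 0/1 values, one per criterion. The `majority_vote` helper is hypothetical and not part of the lesson: it counts the votes with `collections.Counter`, returns the value with the most votes, and raises an error if the top two counts are equal.

```python
from collections import Counter


def majority_vote(votes):
    """Return the value with the most votes; raise ValueError on a tie.

    Hypothetical helper for illustration; votes is a list such as [1, 0, 1].
    """
    counts = Counter(votes).most_common()
    # A tie means the two most common values received the same number of votes
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        raise ValueError(f"tie between {counts[0][0]} and {counts[1][0]}")
    return counts[0][0]


print(majority_vote([1, 0, 1]))  # two votes for 1, one for 0 → 1
print(majority_vote([0, 0, 0]))  # unanimous 0 → 0
```

With exactly three 0/1 votes, the `ValueError` branch can never trigger, which is what the lesson means by "ties are impossible".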

I’ve taken this verbatim from the lesson. I’m lost because I don’t understand the last bit about ties being impossible. Specifically, ties between what?

If you think about it, each criterion will have a value of either 1 or 0. Now, since we are not comparing between criteria, we have no need to consider ties.

If we look at it from the perspective of the apps themselves, we are simply using the mode operation to find out which apps have all three criteria set to 1. Since we expect a number of apps for which all three criteria are 1, it is likely that many of the Result column values will be 1, so the possibility of a tie there must be irrelevant.

I also don’t understand the relevance of majority voting to the exercise in this lesson. If you check the solution, the code for it has been included, but it’s not used:

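# Code for majority voting
# ---
criteria = ["price_criterion", "genre_criterion", "category_criterion"]
# mode() returns a DataFrame (one column per tied mode); with three binary
# votes there is never a tie, so column 0 holds each app's winning vote
affordable_apps["Result"] = affordable_apps[criteria].mode(axis="columns")[0]
# ---
# Raise each price to at least the mean price of its affordability group
# (cheap_mean and reasonable_mean are computed earlier in the lesson)
def new_price(row):
    if row["affordability"] == "cheap":
        return round(max(row["Price"], cheap_mean), 2)
    else:
        return round(max(row["Price"], reasonable_mean), 2)

affordable_apps["New Price"] = affordable_apps.apply(new_price, axis="columns")

# Strip "+" and "," from values like "10,000+" before converting to int;
# regex=True is required in pandas >= 2.0, where the default is regex=False
affordable_apps["Installs"] = affordable_apps["Installs"].str.replace("[+,]", "", regex=True).astype(int)

# Impact = price increase per install times the number of installs
affordable_apps["Impact"] = (affordable_apps["New Price"] - affordable_apps["Price"]) * affordable_apps["Installs"]

total_impact = affordable_apps["Impact"].sum()
print(total_impact)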
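Here is a minimal self-contained sketch of the same price-and-impact steps on a tiny made-up DataFrame, so the arithmetic can be checked by hand. The three apps, their prices and install counts, and the values of `cheap_mean` and `reasonable_mean` are all hypothetical; in the lesson they come from the real dataset.

```python
import pandas as pd

# Hypothetical stand-ins for the lesson's data and group means
cheap_mean = 1.50
reasonable_mean = 4.00
affordable_apps = pd.DataFrame({
    "Price": [0.99, 2.49, 3.00],
    "affordability": ["cheap", "cheap", "reasonable"],
    "Installs": ["10,000+", "500+", "1,000+"],
})


def new_price(row):
    # Raise the price to the group mean, but never lower it
    mean = cheap_mean if row["affordability"] == "cheap" else reasonable_mean
    return round(max(row["Price"], mean), 2)


affordable_apps["New Price"] = affordable_apps.apply(new_price, axis="columns")

# "10,000+" -> 10000; regex=True makes "[+,]" a character class, not a literal
affordable_apps["Installs"] = (
    affordable_apps["Installs"].str.replace("[+,]", "", regex=True).astype(int)
)

# Extra revenue per app: price increase times number of installs
affordable_apps["Impact"] = (
    affordable_apps["New Price"] - affordable_apps["Price"]
) * affordable_apps["Installs"]

print(affordable_apps)
print(affordable_apps["Impact"].sum())
```

By hand: the first app goes from 0.99 to 1.50 over 10,000 installs (about 5,100), the second is already above the cheap mean (0), and the third goes from 3.00 to 4.00 over 1,000 installs (1,000), for a total of roughly 6,100.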

I’ve continued with the lesson assuming that the majority voting example is meant to find the number and percentage of apps that would be affected by the new price, and that the total impact value estimates the increase in revenue after the price increase.
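If the goal is the number and percentage of apps that the majority vote would affect, one way to get them from the `Result` column is sketched below. This assumes `Result` holds 1 for "increase the price" and 0 otherwise; the five values are hypothetical.

```python
import pandas as pd

# Hypothetical Result column: 1 = the majority voted to increase the price
result = pd.Series([1, 1, 1, 0, 1], name="Result")

to_increase = result == 1              # boolean mask, one entry per app
n_impacted = int(to_increase.sum())    # each True counts as 1
pct_impacted = to_increase.mean() * 100

print(f"{n_impacted} apps ({pct_impacted:.1f}%) would get a price increase")
```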

Is this assumption correct?

We’re trying to answer the following question:

Should we increase the price of this app?

Let’s take a look at what the criteria look like.

>>> print(affordable_apps[criteria].head())
   price_criterion  genre_criterion  category_criterion
0              1.0                0                   1
1              1.0                1                   1
2              1.0                1                   1
3              0.0                0                   0
4              1.0                1                   1
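The five rows printed above can be rebuilt by hand to see what `mode(axis='columns')` does to them. This sketch uses only those printed values, with the column names from the lesson's `criteria` list. `DataFrame.mode` returns a DataFrame with one column per tied mode, so the sketch takes column `0`, which is safe here because three binary votes never tie.

```python
import pandas as pd

criteria = ["price_criterion", "genre_criterion", "category_criterion"]
# The five rows from the printed head() above
votes = pd.DataFrame({
    "price_criterion": [1.0, 1.0, 1.0, 0.0, 1.0],
    "genre_criterion": [0, 1, 1, 0, 1],
    "category_criterion": [1, 1, 1, 0, 1],
})

# Row-wise mode: the most common vote for each app
modes = votes[criteria].mode(axis="columns")
print(modes.shape)  # a single mode column, because no row has a tie

result = modes[0]
print(result.tolist())
```

Only the app at index 3 (votes 0, 0, 0) ends up with a mode of 0; the other four get 1.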

A 1 in column x means that criterion x suggests increasing the price. A 0 means it suggests leaving the price unchanged.

Let’s look at the first app (the one at index 0). Both the price and the category criteria suggest the price should increase, while the genre criterion suggests we shouldn’t modify it. It’s two against one: the majority voted that we should increase the price.

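That’s why ties are impossible here: with three criteria and only two possible values, one value must get at least two of the three votes. If there were four criteria, we could get a tie (two votes for each side).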
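The "ties are impossible" claim can be checked by brute force: list every possible combination of 0/1 votes and count how many end in a tie. This is a sketch in plain Python, and `count_ties` is a hypothetical helper. With three voters no combination ties; with four voters the 2-2 splits appear.

```python
from itertools import product


def count_ties(n_voters):
    """Count 0/1 vote combinations in which both values get the same number of votes."""
    ties = 0
    for votes in product([0, 1], repeat=n_voters):
        if votes.count(1) == votes.count(0):
            ties += 1
    return ties


for n in (3, 4):
    print(f"{n} voters: {count_ties(n)} ties out of {2 ** n} combinations")
```

More generally, with two possible values, any odd number of voters rules out a tie, because the votes cannot split evenly.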

Nope. The mode doesn’t look for apps with all three criteria set to 1; it tells us the most common vote. In the first app we saw above, the mode is 1 (two votes out of three), so we should increase the price of that app.
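The difference between "most common vote" and "all three criteria are 1" shows up on that first app, whose votes are 1, 0, 1. Here is a quick check using Python's standard `statistics.mode` in place of pandas, purely for brevity:

```python
from statistics import mode

first_app = [1, 0, 1]  # price, genre, category votes for the app at index 0

majority = mode(first_app)                  # most common vote
unanimous = all(v == 1 for v in first_app)  # what "all three set to 1" would test

print(majority)   # 1 -> increase the price
print(unanimous)  # False -> a unanimity rule would leave the price alone
```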


Does this clear up your confusion?


Cheers @Bruno!! That’s the second bit of help today. Much appreciated!! It’s as clear as clear water :smiley:
