Subtracting N elements

Screen Link: <!https://app.dataquest.io/m/479/sorting-algorithms/7/find-equal-pairs-of-indexes>

In Sorting Algorithms mission 7 they present an optimization.

No optimization:

def find_pair(values):
    N = len(values)
    for i in range(N):
        for j in range(i, N):
            if i != j and values[i] == values[j]:
                return (i, j)
    return None

With optimization:

def find_pair(values):
    N = len(values)
    count = 0
    for i in range(N):
        for j in range(i + 1, N):
            count += 1
            if values[i] == values[j]:
                return (i, j)
    print(count)
    return None

They ask you to figure out exactly how many times the if statement is executed (worst case) in the optimized function, with help of the following function.

((N**2)/2) + (N/2)

The thing here that I can’t fully understand yet, is the following. Why do I have to subtract N in the formula above to get the exact amout that the if-statement is executed in the optimized function?

@nilanbais

Imagine you have this list: [1, 2, 3].

For No optimization, the following comparison takes place:

1 - [1, 2, 3]   # 3
2 - [2, 3]      # 2
3 - [3]         # 1
Total comparison = 6

With optimization:

1 - [2, 3]  # 2
2 - [3]     # 1
Total comparison = 3

The difference is the length of the list which is 3.

When you have the following list: [1, 2, 3, 4]

Total comparison for no optimization = 10
Total comparison for optimization = 6

The difference between the two is again the length of the original list.

For one with optimization, you subtract the length of the original list from the one with no optimization.

At each index i, the interpreter creates a new range of integers.

There’s no need to create a range of integers simply to access the index.

Use enumerate.

enumerate returns a tuple (index, value).

def find_pair(values):
    N = len(values)
    count = 0
    for i, v in enumerate(values):
        j = i + 1
        while j < N: 
            count += 1
            if values[i] == values[j]:
                return (i, j)
            j += 1
    print(count, N)
    return None

From the DQ solution,

def find_pair(values):
    N = len(values)
    count = 0
    for i in range(N):
        for j in range(i + 1, N):
            count += 1
            if values[i] == values[j]:
                return (i, j)
    print(count, N)
    return None

The function above prints 499500 1000 where count = 499500 and number of elements N = 1000.

The optimization shown above is minimal because time complexity of the function is still bounded above by O(N^2).

from collections import defaultdict
def find_pair(values):
    location = defaultdict(lambda: "not present")
    count = 0 
    for i, v in enumerate(values):
        count += 1 
        if location[v] != "not present":
            return (location[v], i)
        else:
            location[v] = i
    print(count, len(values))
    return None 

The function above prints 1000 1000 where count = 1000 and number of elements N = 1000.

By using a hash table (dictionary in Python) to store previously known information, we can speed up the time complexity of algorithm from O(N^2) to O(N).