Guided Project: Investigating Airplane Accidents - Solution needed

Screen Link: https://app.dataquest.io/m/214/guided-project%3A-investigating-airplane-accidents/2/linear-and-log-time-algorithms

Here we are asked to write an algorithm having linear and logrithmic time complexities , but given that the string can be anywhere (in any row/column) we are bound to search in the whole dataset . Also we cannot sort the dataset since it will get distorted.

1 Like

I’m interested in the solution for this too.

For the linear algorithm my guess was:

lax_code=[]
for row in aviation_list:
    if 'LAX94LA336' in ''.join(row):
            lax_code.append(row)

but no clue for the logarithmic one.

2 Likes

Hi @jfpsmatos, @aji.dandvate,

As per the instruction:

Try writing a log(n) time algorithm that searches AviationData.txt for the string LAX94LA336.

points out that we only need to search it in the text file, We can use the binary search algorithm:

def find(L, target):
    start = 0
    end = len(L) - 1
    while start <= end:
        middle = (start + end)// 2
        midpoint = L[middle]
        if midpoint > target:
            end = middle - 1
        elif midpoint < target:
            start = middle + 1
        else:
            return midpoint

with open("AviationData.txt", "r") as f:
    aviation_data = f.read()
    aviation_data = sorted(aviation_data.split())
    print(find(aviation_data, 'LAX94LA336'))

If the values are sorted, then binary search algorithm is log(n) complexity.

Best,
Sahil

3 Likes

Hi @jfpsmatos,
I think your solution still has a quadratic complexity because ‘in’ operator within Python still has to perform search and compare each letter in the joined row.
Based on the string content ‘LAX94LA336’, we can say it corresponds to ‘Accident Number’ column. Knowing this makes our search linear:

lax_code = []
for row in aviation_list:
    if row[2]=='LAX94LA336':
        lax_code.append(row)

Accessing row[2] is constant.
Regards,
Said