Functions Practice Problem - Code Efficiency

Hi, I’m quite new to programming, but I’m aware there’s this thing called “efficient code”. For this practice problem, I’ve come up with two solutions. My concern is: which one is more efficient? I don’t yet have a full grasp of how the computer stores and processes these things.

Also, any tips to keep in mind when thinking about the efficiency of your code? For instance, one of the lessons here at Dataquest tackled using multiple ‘if’ statements: if we use several separate ‘if’s instead of an ‘if’/‘elif’ chain, the computer evaluates every condition, even after the first one is satisfied.
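For instance, a toy example of what I think the lesson meant (values made up by me):

x = 5

# separate 'if' statements: both conditions get evaluated,
# so both lines print
if x > 0:
    print('positive')
if x > 1:
    print('greater than one')

# 'if'/'elif' chain: evaluation stops at the first match,
# so only 'positive' prints
if x > 0:
    print('positive')
elif x > 1:
    print('greater than one')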

My two solutions for this practice problem are shown below.

def check_unique_values(csv_filename, col_name):
    from csv import reader
    file = open(csv_filename)
    data = list(reader(file))
    file.close()
    column = data[0]   # header row
    rows = data[1:]    # data rows

    if col_name not in column:
        return None

    # count how many times each value in the target column appears
    # (note: column.index(col_name) is recomputed for every row)
    information = dict()
    for i in rows:
        information[i[column.index(col_name)]] = information.get(i[column.index(col_name)], 0) + 1

    # unique if there are as many distinct values as there are rows
    if len(information) == len(data) - 1:
        return True
    else:
        return False

print(check_unique_values('users.csv', 'email'))

def check_unique_values(csv_filename, col_name):
    from csv import reader
    file = open(csv_filename)
    data = list(reader(file))
    file.close()
    column = data[0]   # header row
    rows = data[1:]    # data rows

    information = dict()  # collects every value from the target column
    for x in rows:
        if col_name in column and col_name not in information:
            information[col_name] = [x[column.index(col_name)]]
        elif col_name in column and col_name in information:
            information[col_name].append(x[column.index(col_name)])
        else:
            return None

    # unique if the set of values is as large as the number of rows
    # (.get avoids a KeyError when the file has no data rows)
    if len(set(information.get(col_name, []))) == (len(data) - 1):
        return True
    else:
        return False

You can answer this question yourself, and in an (arguably) better way than approaching it from a theoretical point of view: test it!

Run each of the functions 1000 times, measure how long each run takes, store the results, and analyze them (maximum, minimum, and average execution times, outliers, and so on).
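Here’s a minimal sketch of what I mean, reusing your check_unique_values and users.csv (the run count of 1000 is just a placeholder):

from time import perf_counter

timings = []
for _ in range(1000):
    start = perf_counter()
    check_unique_values('users.csv', 'email')
    timings.append(perf_counter() - start)  # elapsed seconds for this run

print('min:', min(timings))
print('max:', max(timings))
print('avg:', sum(timings) / len(timings))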

Running 1000 times was just an example; try to get a feel for the right number of runs as you experiment. The larger, the better, up to a certain point, because you’ll want to move on with your life and not run it a virtually infinite number of times.

What you should vary is the size of the data you’re running the functions on, because one of them might perform better on a small amount of data while the other performs better on a large amount of data.
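To vary the size, you can generate test files yourself. A rough sketch (make_test_csv, the column names, and the row counts are all just my made-up example):

from csv import writer

def make_test_csv(filename, n_rows):
    # write a header plus n_rows data rows; every email is unique here,
    # so duplicate a few manually if you want to test the False case
    with open(filename, 'w', newline='') as f:
        w = writer(f)
        w.writerow(['id', 'email'])
        for i in range(n_rows):
            w.writerow([i, 'user{}@example.com'.format(i)])

for n in (100, 10000, 1000000):
    make_test_csv('users_{}.csv'.format(n), n)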

By the end of the second Python course, you should have enough knowledge to execute on my suggestion above.

We have a course on this: Algorithm Complexity, but it may be too early for you to tackle it at this point.


Hi Bruno, thanks for the response. I remember reading an article a while back about measuring execution time, but I didn’t pay much attention then. Anyway, thanks for the insight; it’s good that these kinds of things are covered at Dataquest. I guess I shouldn’t rush things.

That depends on which path you’re on. The course I showed you above is part of the “Data Engineering in Python” path and I think it is the fourth course in the path.

I plan to tackle the whole path anyway. Hopefully.
