Optimizing inner loops for a faster solution

I have a question about optimizing loops. As you know, iterrows() is very slow and cannot be used on big datasets, and I cannot find a way to express an inner loop with apply() or some other approach. Please help me find a better looping solution. My code is below; maybe you can suggest a way to vectorize it. By the way, I also need the IDs of the teachers.

listEmpty = []
dictionaryTeacher = {}

# Outer loop: one pass per teacher
for i, ex in teacher.iterrows():
    lat1 = ex['Latitude']
    lon1 = ex['Longitude']
    Id = i

    # Inner loop: distance from this teacher to every venue
    for b, c in ven.iterrows():
        lat2 = c['Latitude']
        lon2 = c['Longitude']
        nameVen = c['Name']
        listEmpty.append((distanceCalculator(float(lat1), float(lon1), float(lat2), float(lon2)), Id, b))

    # Sort by distance and keep the closest venue as (distance, teacher id, venue id)
    listEmpty.sort()
    demian = listEmpty[0]
    dictionaryTeacher[ex['Name']] = demian

    # Reset for the next teacher
    listEmpty = []
    demian = []

Hey @ozankavcu, welcome to the community. When you are calculating something using an external function like your distanceCalculator(), you have limited options in vectorization, as that function will have to be run on each row in your dataframe.

I’m going to simplify your example code to a generalized example and then show you how to use apply() instead of iterrows(). In this example, we:

  • iterate over every row in a dataframe using iterrows()
  • extract some variables from the row
  • use those variables with some fixed variables to call a function
  • append the result of that function to a list.

result_list = []

for i, row in df.iterrows():
    a = row['a']
    b = row['b']
   
    result = run_function(0, 1, a, b)
    result_list.append(result)

In order to use apply, we need a function that takes a row as input and returns the final value. Let’s rewrite the logic in a function that meets this specification:

def wrap_run_function(row):
    a = row['a']
    b = row['b']
   
    result = run_function(0, 1, a, b)
    return result

Then we can use apply to apply that function to every row:

result_series = df.apply(wrap_run_function, axis=1)

The result will be a pandas Series containing all the values. This may or may not be faster, since apply() with axis=1 still calls the function once per row, but it’s certainly cleaner code.
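Applied to the teacher/venue problem from the question, the same pattern might look roughly like the sketch below. Note the assumptions: `distance_calculator` is a hypothetical stand-in (plain Euclidean distance on the raw coordinates), because the original `distanceCalculator()` was not shown, and the two small dataframes are made-up sample data that only reuse the column names from the question.

```python
import math
import pandas as pd

# Hypothetical stand-in for distanceCalculator(); the real one was not shown.
def distance_calculator(lat1, lon1, lat2, lon2):
    return math.hypot(lat1 - lat2, lon1 - lon2)

# Made-up sample data using the column names from the question.
teacher = pd.DataFrame(
    {"Name": ["Ann", "Bob"], "Latitude": [0.0, 10.0], "Longitude": [0.0, 10.0]},
    index=[101, 102],
)
ven = pd.DataFrame(
    {"Name": ["V1", "V2"], "Latitude": [9.0, 1.0], "Longitude": [9.0, 1.0]},
    index=[7, 8],
)

def nearest_venue(teacher_row):
    # Distance from this teacher to every venue (the inner loop becomes an apply).
    distances = ven.apply(
        lambda v: distance_calculator(
            float(teacher_row["Latitude"]), float(teacher_row["Longitude"]),
            float(v["Latitude"]), float(v["Longitude"]),
        ),
        axis=1,
    )
    venue_id = distances.idxmin()
    # Same tuple layout as in the question: (distance, teacher id, venue id).
    return (distances.loc[venue_id], teacher_row.name, venue_id)

# The outer loop becomes an apply over teachers; the result is a Series of tuples.
nearest = teacher.apply(nearest_venue, axis=1)
dictionary_teacher = dict(zip(teacher["Name"], nearest))
```

Here `teacher_row.name` carries the teacher's index label, which is how the teacher ID survives the move away from iterrows(). This still runs the distance function once per teacher/venue pair, so it tidies the code rather than changing how much work is done.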

A final note: it also looks like you’re calculating the distance between two points, and that may well be the slow part of your code, depending on how it’s implemented. If speed matters, consider reimplementing that function’s logic in a vectorized way using pandas or numpy; you will likely see a speed improvement. In my experience with geodata, some algorithms are computationally heavy, and in that case you can either reimplement them yourself or find a library that has done the work for you.
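As a rough illustration of what such a vectorized version could look like, here is a minimal sketch using NumPy broadcasting. It assumes that `distanceCalculator()` computes a great-circle (haversine) distance in kilometres, which the question does not confirm; the function name `nearest_venues`, the Earth radius constant, the output column names, and the sample coordinates are all made up for illustration.

```python
import numpy as np
import pandas as pd

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumption, adjust to match your function

def nearest_venues(teacher, ven):
    # Teacher coordinates become a column (n, 1) and venue coordinates a row (1, m),
    # so broadcasting produces an (n, m) grid covering every teacher/venue pair.
    lat1 = np.radians(teacher["Latitude"].to_numpy(dtype=float))[:, None]
    lon1 = np.radians(teacher["Longitude"].to_numpy(dtype=float))[:, None]
    lat2 = np.radians(ven["Latitude"].to_numpy(dtype=float))[None, :]
    lon2 = np.radians(ven["Longitude"].to_numpy(dtype=float))[None, :]

    # Haversine formula evaluated for all pairs at once.
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    dist = 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(np.clip(a, 0.0, 1.0)))

    # Position of the closest venue for each teacher.
    closest = dist.argmin(axis=1)
    return pd.DataFrame(
        {
            "teacher_name": teacher["Name"].to_numpy(),
            "venue_id": ven.index.to_numpy()[closest],
            "venue_name": ven["Name"].to_numpy()[closest],
            "distance_km": dist[np.arange(len(teacher)), closest],
        },
        index=teacher.index.rename("teacher_id"),  # keeps the teacher IDs
    )

# Made-up sample data using the column names from the question.
teacher = pd.DataFrame(
    {"Name": ["Ann", "Bob"], "Latitude": [52.52, 48.14], "Longitude": [13.40, 11.58]},
    index=[101, 102],
)
ven = pd.DataFrame(
    {"Name": ["Hamburg", "Vienna"], "Latitude": [53.55, 48.21], "Longitude": [9.99, 16.37]},
    index=[7, 8],
)
result = nearest_venues(teacher, ven)
```

The full distance grid holds one float per teacher/venue pair, so for very large inputs it may be necessary to process the teachers in chunks to keep memory use in check.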
