Question on threading.Thread.join()

I am working through this lesson on threading.Thread.join():
https://app.dataquest.io/m/169/i%2Fo-bound-programs/7/joining-threads

Does anyone know whether the following two versions are equivalent?

Version A:

import threading

def task(team):
    print(team)
    
teams = [1,2,3,4,5,6,7,8,9,10]
threads = []
for team in teams:
    thread = threading.Thread(target=task, args=(team,))
    thread.start()        
    for thread in threads:   # <-- join after all threads have started
        thread.join()        

Version B:

import threading

def task(team):
    print(team)
    
teams = [1,2,3,4,5,6,7,8,9,10]
threads = []
for team in teams:
    thread = threading.Thread(target=task, args=(team,))
    thread.start()        
    thread.join()      # <--- join immediately

Hi @scoodood,

What you want to compare is:

  1. joining straight after starting each thread
  2. joining all threads after starting all threads

Right?

I am asking because your code is not doing that. In Version A the second for loop is nested inside the first one, and the threads are never added to the list, so that inner loop has nothing to iterate over and never calls join().
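To check what the nested-loop version actually does, here is a minimal sketch based on the code as posted. The counter `join_calls` and the inner variable name `t` are additions for illustration only, and the team list is shortened to keep the output small.

```python
import threading

def task(team):
    print(team)

teams = [1, 2, 3]
threads = []       # never appended to, as in the posted code
join_calls = 0     # hypothetical counter, added only to observe the behaviour

for team in teams:
    thread = threading.Thread(target=task, args=(team,))
    thread.start()
    for t in threads:      # threads is empty, so this body never runs
        t.join()
        join_calls += 1

print("threads in list:", len(threads))
print("join() calls:", join_calls)
```

Both counts come out as 0: because nothing is ever appended to `threads`, the inner loop has nothing to join.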

I will answer your question assuming that the two versions are the following:

Version A

import threading
import time

def task(team):
    time.sleep(0.5)
    print(team)
    
teams = [1,2,3,4,5,6,7,8,9,10]
threads = []
for team in teams:
    print('starting thread {}'.format(team))
    thread = threading.Thread(target=task, args=(team,))
    thread.start()        
    thread.join()      # <--- join immediately

Version B

import threading
import time

def task(team):
    time.sleep(0.5)
    print(team)

teams = [1,2,3,4,5,6,7,8,9,10]
threads = []
for team in teams:
    thread = threading.Thread(target=task, args=(team,))
    thread.start()        
    threads.append(thread)

for thread in threads:
    thread.join()

I added a sleep() so that you can see the difference when you execute the code.

In version A the threads do not run in parallel. When you call join(), the program (main thread) stops and waits for that thread to finish. So if you call join() right after start(), the next iteration of the for loop only runs once that thread is done. With 10 teams and a 0.5 second sleep in each task, the whole loop takes about 5 seconds.

Here is a diagram of what happens in version A:

You can see that at each join(), the main thread will stop and wait for that thread to finish before continuing.
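If you want to measure this yourself, here is a rough sketch that times the join-immediately pattern. The helper `run_join_immediately`, the extra `delay` parameter, and the shorter team list are assumptions made for the example, not part of the lesson code.

```python
import threading
import time

def task(team, delay):
    # Stand-in for an I/O-bound task: just sleep for `delay` seconds.
    time.sleep(delay)

def run_join_immediately(teams, delay):
    """Start one thread per team and join it straight away."""
    start = time.perf_counter()
    for team in teams:
        thread = threading.Thread(target=task, args=(team, delay))
        thread.start()
        thread.join()   # main thread blocks here until this thread is done
    return time.perf_counter() - start

elapsed_sequential = run_join_immediately([1, 2, 3, 4], 0.1)
print("join immediately: {:.2f}s".format(elapsed_sequential))
```

With 4 teams and a 0.1 second delay, the elapsed time should be roughly 0.4 seconds, because each thread has to finish before the next one starts.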

In version B all threads are already running when the first for loop ends. The main thread then waits for the first thread to finish (the other threads keep running, only the main thread is blocked), then for the second, and so on. Because all the sleeps overlap, the whole program takes about 0.5 seconds.

Here is a diagram of what happens in version B:

In this case, all threads start and the main thread first blocks when we join() the first thread. Once that thread finishes, the main thread runs briefly to join() the second one and blocks again until it finishes, and so on. It only resumes normal execution once the last thread has finished.
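The same kind of timing check can be sketched for the start-all-then-join-all pattern. As before, the helper `run_join_after_all`, the `delay` parameter, and the shorter team list are assumptions made for the example.

```python
import threading
import time

def task(team, delay):
    # Stand-in for an I/O-bound task: just sleep for `delay` seconds.
    time.sleep(delay)

def run_join_after_all(teams, delay):
    """Start every thread first, then join them all."""
    start = time.perf_counter()
    threads = []
    for team in teams:
        thread = threading.Thread(target=task, args=(team, delay))
        thread.start()
        threads.append(thread)   # keep a reference so we can join later
    for thread in threads:
        thread.join()            # waits for each thread in turn
    return time.perf_counter() - start

elapsed_parallel = run_join_after_all([1, 2, 3, 4], 0.2)
print("join after all started: {:.2f}s".format(elapsed_parallel))
```

Here the sleeps overlap, so the elapsed time should stay close to a single delay (about 0.2 seconds) rather than growing with the number of teams.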

I hope this helps. Let me know if you have any other questions.


Hi @Francois,
Sorry for my typo. Yes, you are right about what I intended with Version A and Version B. Your explanation is superb! Thanks.
