Consider a simple example of this with the following data:
original = [
    ["1234", "something"],
    ["", "something"],
    ["1000", "something"],
    ["", "something"],
    ["1002", "something"],
]
You first create the id_set, which would be {1234, 1000, 1002} in this case.
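One way the id_set could be built from the sample data is sketched below. It assumes the ids are stored in the set as integers (matching the set shown above) and that an empty string means the row has no id yet.

```python
original = [
    ["1234", "something"],
    ["", "something"],
    ["1000", "something"],
    ["", "something"],
    ["1002", "something"],
]

# Collect the ids that are already present, skipping the empty ones.
# Converting to int is an assumption, so that `cur_id in id_set`
# compares integers with integers.
id_set = {int(row[0]) for row in original if row[0]}

print(id_set == {1234, 1000, 1002})  # True
```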
Then you have the following code, without the update at the end, as you suggest:
cur_id = 1000
for row in original:
    if not row[0]:
        while cur_id in id_set:
            cur_id += 1
        row[0] = str(cur_id)
For the 2nd row, we hit the if condition.
We then check if cur_id is in id_set or not. It is. So, we update cur_id by 1. Now cur_id is not in id_set.
We move onto the next code line and set that id for that row.
We continue with the loop. We reach the 4th row now (since it’s empty).
We hit the while loop and we check if cur_id is in id_set or not.
It’s not. Our cur_id is 1001, which is not in id_set. So, we don’t update cur_id.
We end up assigning row[0] the value 1001.
However, we assigned that same value to our 2nd row previously. So, we end up with duplicate values here.
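The walkthrough above can be reproduced with a short, self-contained sketch. The way id_set is built here is an assumption (integer ids, empty strings skipped); the loop is the one under discussion.

```python
original = [
    ["1234", "something"],
    ["", "something"],
    ["1000", "something"],
    ["", "something"],
    ["1002", "something"],
]
id_set = {int(row[0]) for row in original if row[0]}  # {1234, 1000, 1002}

cur_id = 1000
for row in original:
    if not row[0]:
        while cur_id in id_set:
            cur_id += 1
        # cur_id is neither advanced nor recorded in id_set here,
        # so the next empty row picks the same value again.
        row[0] = str(cur_id)

ids = [row[0] for row in original]
print(ids)  # ['1234', '1001', '1000', '1001', '1002']
```

The 2nd and 4th rows both end up with "1001".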
The source of the confusion here, in my view, is that id_set is not being updated in this code. Updating cur_id twice makes it a tad harder to understand.
Alternative Solution 1
I do think that the provided solution is a bit complex, and it shouldn’t have to be.
I think a better solution would be:
cur_id = 1000
for row in rows[1:]:
    if not row[0]:
        while cur_id in id_set:
            cur_id += 1
        id_set.add(cur_id)
        row[0] = str(cur_id)
I added id_set.add(cur_id) to the above and removed the second update to cur_id.
The above change makes it clear that we have a new id assigned to a row and removes the confusion of having to update cur_id again.
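As a quick check, here is a sketch of this version run on the sample data. The header row and the way id_set is built are assumptions added for illustration, since the loop iterates over rows[1:].

```python
# Hypothetical data: the sample rows plus a header row, because the loop skips rows[0].
rows = [
    ["id", "name"],
    ["1234", "something"],
    ["", "something"],
    ["1000", "something"],
    ["", "something"],
    ["1002", "something"],
]
id_set = {int(row[0]) for row in rows[1:] if row[0]}

cur_id = 1000
for row in rows[1:]:
    if not row[0]:
        while cur_id in id_set:
            cur_id += 1
        id_set.add(cur_id)    # record the id so it is never handed out twice
        row[0] = str(cur_id)

ids = [row[0] for row in rows[1:]]
print(ids)  # ['1234', '1001', '1000', '1003', '1002']
```

Because 1001 is added to id_set when the 2nd data row is filled, the while loop moves past it (and past 1002) for the 4th row, so every id is unique.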
Alternative Solution 2
In my view, an even more logical approach would be:
cur_id = 1000
for row in rows[1:]:
    if not row[0]:
        if cur_id not in id_set:
            id_set.add(cur_id)
            row[0] = str(cur_id)
        cur_id += 1
The above has a more logical breakdown:
1. Iterate through the rows.
2. Check if there is an id or not.
3. If there is no id, check if the current id exists in the set of existing ids.
4. If the current id is not in the set of existing ids, then
   4.1 Add the current id to the set of existing ids.
   4.2 Assign the current id to the row.
5. Update the current id by 1.
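I am unsure whether there is additional context to these problems, or a ceiling on how much Python knowledge one is expected to have to solve this. But I think my solution shouldn’t be problematic. The only really different thing here is the not in part, which shouldn’t be complicated to understand. One caveat: because this uses if rather than while, an empty row that is reached while cur_id is already in id_set is skipped and stays empty, so as written it only works when that cannot happen.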
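Tracing Alternative Solution 2 on the sample data is a useful sanity check. The sketch below makes a few assumptions: a header row is added because the loop uses rows[1:], id_set holds integers, and cur_id += 1 sits inside the if not row[0] block (the indentation is my reading of the steps above).

```python
# Hypothetical data: the sample rows plus a header row.
rows = [
    ["id", "name"],
    ["1234", "something"],
    ["", "something"],
    ["1000", "something"],
    ["", "something"],
    ["1002", "something"],
]
id_set = {int(row[0]) for row in rows[1:] if row[0]}

cur_id = 1000
for row in rows[1:]:
    if not row[0]:
        if cur_id not in id_set:
            id_set.add(cur_id)
            row[0] = str(cur_id)
        # Assumed placement: advance once per empty row,
        # whether or not an id was assigned to it.
        cur_id += 1

ids = [row[0] for row in rows[1:]]
print(ids)  # ['1234', '', '1000', '1001', '1002']
```

Under these assumptions the 2nd data row is reached while cur_id is 1000, which is already taken, so it stays empty. The while loop from Alternative Solution 1 avoids this by advancing cur_id until a free value is found.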
Hopefully this helps.