Hi! I am working on the first guided project and am cleaning the data sets. I see that I have a choice between modifying the original data or making a duplicate, and I can see why modifying the original may not be best practice. If I create the duplicate as an empty list, I can use .append() and leave the original data set unchanged.
https://bit.ly/3sIetCz
app_data = [['A', 'B', 'C', 'D'], [1, 2, 3, 4], [1, 2, 3, 4], [1, 2, 3], [1, 2, 3, 4]]
new_data_set = []

def refiner(data_set):
    for row in data_set:
        if len(row) == len(data_set[0]):
            new_data_set.append(row)

print(len(app_data))
print(len(new_data_set))
refiner(app_data)
print(len(app_data))
print(len(new_data_set))
and the output:
5
0
5
4
But if I try to create the duplicate by assigning the original to a new name, the assignment only creates a second reference to the same list (aliasing) rather than a copy, so changes made through the new name also change the original data set.
https://bit.ly/3bTJrRk
app_data = [['A', 'B', 'C', 'D'], [1, 2, 3, 4], [1, 2, 3, 4], [1, 2, 3], [1, 2, 3, 4]]
new_data_set = []
new_data_set = app_data

def refiner(data_set):
    for row in data_set:
        if len(row) != len(data_set[0]):
            row_index = data_set.index(row)
            del data_set[row_index]

print(len(app_data))
print(len(new_data_set))
refiner(new_data_set)
print(len(app_data))
print(len(new_data_set))
and the output:
5
5
4
4
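For context, here is a minimal sketch of what I think is happening (the variable names here are my own, not from the project):

```python
# Assigning a list to a new name does not copy it; both names
# point at the same list object in memory.
original = [[1, 2, 3, 4], [1, 2, 3]]
alias = original             # second reference to the same object
alias.remove([1, 2, 3])      # mutating through the alias...
print(len(original))         # ...also shrinks the original: prints 1

# A shallow copy creates a new outer list, so deleting rows from
# the copy leaves the original untouched.
original = [[1, 2, 3, 4], [1, 2, 3]]
shallow = original.copy()    # or list(original), or original[:]
shallow.remove([1, 2, 3])
print(len(original))         # original still has 2 rows: prints 2
print(len(shallow))          # the copy has 1 row: prints 1
```

If I understand correctly, .copy() is shallow: only the outer list is duplicated, and the inner row lists are still shared between the two, so the extra memory is roughly just one more list of references.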
So my questions are: 1) what is the effect on memory if I use the first method, and 2) if that effect is large, is there a workaround for the second method that doesn't end up changing the original data?
Thanks everyone. Hope you’re having a great week.