Data cleaning general question

Hi! After learning all the data cleaning skills in Python, I still run into problems when I clean real-life datasets.

Below are some questions that I can't work out how to solve with what I learned in this course.

Let's say we have 8 columns: "customer, salesperson, end Jan 18, end Jan 19, end Feb 18, end Feb 19, 18 TTl, 19 TTL".

How can I replace all the values in the year 18 columns with N/A?

My idea:

  1. Use df.filter(regex='18$', axis=1) to select all the year 18 columns. But how do I include 18 TTl as well?
  2. After selecting the columns I want, I need to replace the values, but so far we have only learned series.str.replace(), and it would take a long time to do that for each column. Is there a way to do it all at once?
  3. After changing all the values in the year 18 columns, I want to rename those columns from year 18 to 20. We learned how to do this in the course, but we still have to type out each name, like {end Jan 18: end Jan 20, ...}. In this case I simply want to change 18 to 20 in the column names. Is there a way in Python to do this like Excel's Find/Replace function?

Thank you!!

I would suggest going with your idea 2: wrap the series.str.replace() call in a for loop over the year 18 columns.
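
For reference, here is a minimal sketch of all three steps from the question. It assumes the columns hold numbers and that the names are exactly the ones listed; the sample values are made up for illustration. Note that series.str.replace() only works on string columns, so this sketch assigns NaN directly instead, and it uses the regex \b18\b (18 as a whole word) so that 18 TTl is matched too.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({
    "customer": ["A", "B"],
    "salesperson": ["X", "Y"],
    "end Jan 18": [1, 2],
    "end Jan 19": [3, 4],
    "end Feb 18": [5, 6],
    "end Feb 19": [7, 8],
    "18 TTl": [6, 8],
    "19 TTL": [10, 12],
})

# 1. Select every column whose name contains "18" as a whole word.
#    This catches both "end Jan 18" and "18 TTl".
cols_18 = list(df.filter(regex=r"\b18\b", axis=1).columns)

# 2. Replace all values in those columns at once.
df[cols_18] = np.nan

# 3. Find/replace inside the column names, similar to Excel's Find/Replace.
df.columns = df.columns.str.replace(r"\b18\b", "20", regex=True)

print(df.columns.tolist())
```

If you prefer the loop, `for col in cols_18: df[col] = np.nan` should give the same result for step 2.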
