Top ten ways to clean your data

Misspelled words, stubborn trailing spaces, unwanted prefixes, improper cases, and nonprinting characters make a bad first impression. And that is not even a complete list of ways your data can get dirty. Roll up your sleeves.
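
As a rough illustration of a few of the problems listed above (trailing spaces, unwanted prefixes, improper case, nonprinting characters), here is a minimal Python sketch. The `clean_text` helper and the `"ID-"` prefix are made up for the example, and misspellings are not handled, since those usually need a lookup table or manual review.

```python
import re
import unicodedata

# Hypothetical prefix to strip; chosen only for illustration.
UNWANTED_PREFIX = "ID-"


def clean_text(value: str, prefix: str = UNWANTED_PREFIX) -> str:
    """Apply a few common text-cleaning steps to a single value."""
    # Collapse runs of whitespace (tabs, newlines, non-breaking spaces) into one space.
    value = re.sub(r"\s+", " ", value)
    # Drop remaining nonprinting characters: Unicode categories starting
    # with "C" (control, format, and similar).
    value = "".join(ch for ch in value if unicodedata.category(ch)[0] != "C")
    # Remove leading and trailing spaces.
    value = value.strip()
    # Remove the unwanted prefix if it is present.
    if prefix and value.startswith(prefix):
        value = value[len(prefix):].strip()
    # Normalize case. str.title() is a blunt tool (it mishandles names
    # such as "O'Neil"), so treat this step as a placeholder.
    return value.title()


if __name__ == "__main__":
    print(clean_text("  ID-jOHN\x07 smith\u00a0 "))
```

The order matters a little: whitespace is normalized before control characters are dropped so that tabs and newlines become spaces instead of gluing words together.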

A&E.

Hi @Edelberth ,

Thanks for the post. I will definitely bookmark this list.

Do you think the frustration and time spent cleaning data is one of the top reasons for the popularity of data engineering?

Yep @Casandra_Hayward

> Do you think the frustration and time spent cleaning data is one of the top reasons for the popularity of data engineering?

If I understand the question, you are asking whether I think the effort behind data cleaning is the reason people do not like this phase much, right?

A couple of months ago I shared a personal project that extracted the content of the ads from hispasonic, a musical instruments site.

There I had to go through a million steps to get the data right. It was hard, it is true, but I enjoyed it a lot because I learned things that mean I no longer have to depend on a ready-made dataset.

So if people get tired or bored, I can only say that I feel sorry for them. As I understand it, a Data Analyst is someone who can be self-sufficient in both data capture and analysis, so the more complete your skills, the better, right?

I hope I have not spoken too much, but the truth is that you asked about something that connects with a few of my projects and that I also want to write about.

On another note, I am glad that what I shared is useful to you.

Thank you.

A&E :wink:

I agree. I used to clean sales and financial data as part of our end-of-month closing activities. In accounting we called these reconciliations, but they can be considered data cleaning tasks.

I could also write all day about how errors and discoveries in data cleaning led to improved processes.
