Data Cleaning Basics - encoding meaning

Hi, can you help explain what encoding='Latin-1' means in pandas.read_csv()?

Thank you!

pandas.read_csv accepts an encoding option so it can read files saved in different standard character encodings.

latin_1 is for Western European languages.
You can find more formats here.
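A minimal sketch of what the encoding option does, assuming a small in-memory CSV (the city/country data is made up for illustration) that was saved as Latin-1. Reading it with the matching encoding works, while reading the same bytes as UTF-8 fails, because 'ü' is stored in Latin-1 as the single byte 0xFC, which is not valid UTF-8:

```python
import io

import pandas as pd

# Hypothetical CSV content, written out using the Latin-1 (ISO-8859-1) encoding
raw = "city,country\nZürich,Schweiz\nSão Paulo,Brasil\n".encode("latin_1")

# Tell pandas which encoding the bytes use so it decodes them into the right characters
df = pd.read_csv(io.BytesIO(raw), encoding="latin_1")
print(df)

# The same bytes read as UTF-8 raise an error: byte 0xFC ('ü' in Latin-1) is invalid UTF-8
utf8_failed = False
try:
    pd.read_csv(io.BytesIO(raw), encoding="utf-8")
except UnicodeDecodeError as err:
    utf8_failed = True
    print("UTF-8 failed:", err)
```

This UnicodeDecodeError is the usual sign that a file was not saved as UTF-8 and needs an explicit encoding argument.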


@candiceliu93 Hey,

We are telling pandas that the file's encoding is "Latin-1". Most of the time, we use UTF-8.

An encoding is just the rule a computer uses to turn the characters of our language into bytes and back. As you know, to a computer everything is ones and zeroes.
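A small illustration of that idea in plain Python (the word "café" is just an example): the same text becomes different bytes, and therefore different ones and zeroes, depending on the encoding used.

```python
text = "café"

# UTF-8 stores 'é' (U+00E9) as two bytes, 0xC3 0xA9
utf8_bytes = text.encode("utf-8")

# Latin-1 stores 'é' as a single byte, 0xE9
latin1_bytes = text.encode("latin_1")

print(list(utf8_bytes))
print(list(latin1_bytes))

# Show the raw bits of each Latin-1 byte
print(" ".join(f"{b:08b}" for b in latin1_bytes))

# Decoding with the matching encoding recovers the original text
assert utf8_bytes.decode("utf-8") == text
assert latin1_bytes.decode("latin_1") == text
```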


Thank you! So we select the encoding type based on the language of the dataset? Have I understood that correctly?


Thank you for sharing the resource! It's helpful!

@candiceliu93 Yes, but there can be several encoding types for the same language, so it also depends on which encoding the sender/source used.
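To make that concrete, here is a sketch using three real Python codecs that can all handle Western European text (the sample string is made up). UTF-8 and Windows-1252 (cp1252) both represent it, but with different bytes; Latin-1 cannot encode the euro sign at all; and decoding UTF-8 bytes as cp1252 silently produces garbled text instead of an error:

```python
text = "café 5€"

# Two encodings that both cover this text, producing different bytes
utf8_bytes = text.encode("utf-8")
cp1252_bytes = text.encode("cp1252")
print(utf8_bytes)
print(cp1252_bytes)

# Latin-1 is also "Western European", but it has no euro sign
latin1_ok = True
try:
    text.encode("latin_1")
except UnicodeEncodeError:
    latin1_ok = False

# Decoding with the wrong encoding may not fail; it can just give wrong characters
garbled = utf8_bytes.decode("cp1252")
print(garbled)
```

This is why knowing the language alone is not enough: you need the encoding the source actually used.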

If I get a dataset, can I find the encoding type online, or do I have to find out from the data source?
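A minimal sketch of one common fallback, not something stated in this thread: when the source's documentation doesn't say, try a short list of candidate encodings in order and keep the first one that decodes the bytes without error. The function name guess_encoding and the candidate list are assumptions for illustration. Latin-1 goes last because it accepts every possible byte, so it never fails, which also means a successful decode does not prove the guess is right.

```python
def guess_encoding(raw: bytes, candidates=("utf-8", "cp1252", "latin_1")) -> str:
    """Hypothetical helper: return the first candidate encoding that decodes raw cleanly."""
    for enc in candidates:
        try:
            raw.decode(enc)
            return enc
        except UnicodeDecodeError:
            # This encoding can't represent these bytes; try the next one
            continue
    raise ValueError("none of the candidate encodings could decode the data")


print(guess_encoding("café".encode("utf-8")))
print(guess_encoding("café 5€".encode("cp1252")))
print(guess_encoding(b"\x81abc"))
```

In practice you would read the file's bytes (e.g. with open(path, "rb")), pass them to a helper like this, and then hand the result to pandas.read_csv(path, encoding=...). Checking with the data source is still the more reliable option.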