Challenge Opening Data Sets

I am really enjoying my first week in Dataquest. Thanks for making it easy to understand.
I am having a challenge opening the datasets in mission one.
I have tried adding encoding='utf8' to open('AppleStore.csv'), but I am still getting this error: UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 920: character maps to <undefined>
Please help, I’m stuck!


Hey @mlamborj, welcome to the DQ community.

When your file is not encoded as UTF-8, you should try other encodings.

Two possible values for the encoding argument are Latin-1 and Windows-1251.

with open('AppleStore.csv', encoding='Latin-1') as f:
    contents = f.read()  # or pass f to csv.reader, etc.
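
If you would rather not try encodings one at a time by hand, a small loop can do it. This is only a sketch: the helper name first_working_encoding and the candidate list are my own assumptions, not part of any library. Keep in mind that Latin-1 maps every byte value to some character, so it never raises an error and belongs last in the list, and a decode that succeeds is not proof that the text is right.

```python
def first_working_encoding(path, candidates=("utf-8", "cp1252", "latin-1")):
    """Return the first candidate encoding that decodes the whole file.

    Hypothetical helper for illustration; returns None if every candidate fails.
    """
    for name in candidates:
        try:
            with open(path, encoding=name) as f:
                f.read()  # force a full decode so any error surfaces here
        except UnicodeDecodeError:
            continue  # this candidate cannot decode the file, try the next
        return name
    return None
```

You could then call first_working_encoding('AppleStore.csv') and pass the result to open().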

You can also install the chardet module, which tries to detect a file's encoding.

Using pip:

pip install chardet

Using conda:

conda install chardet

When we do not know the encoding type for a particular file, we can do the following:

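import chardet
import pandas as pd

def find_encoding(fname):
    """Guess the encoding of a file from its raw bytes using chardet."""
    # Read in binary mode so no decoding is attempted yet
    with open(fname, 'rb') as f:
        raw = f.read()
    result = chardet.detect(raw)  # dict with 'encoding' and 'confidence' keys
    return result['encoding']

my_encoding = find_encoding('myfile.csv')
df = pd.read_csv('myfile.csv', encoding=my_encoding)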

Or, in your example:

my_encoding = find_encoding('AppleStore.csv')
with open('AppleStore.csv', encoding=my_encoding) as f:
    contents = f.read()
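
If detection is inconclusive, it can also help to look at the raw bytes near the position reported in the error. The sketch below uses a hypothetical helper, bytes_around, and assumes the reported position is a byte offset from the start of the file. That should hold for an error this early in the file, but may not for errors further in, where the position can be relative to the chunk being decoded.

```python
def bytes_around(path, position, width=20):
    """Return the raw bytes surrounding a byte offset (hypothetical helper)."""
    start = max(0, position - width)
    # Binary mode: nothing is decoded, so this cannot raise UnicodeDecodeError
    with open(path, "rb") as f:
        f.seek(start)
        return f.read(position - start + width)
```

For the error above, print(bytes_around('AppleStore.csv', 920)) would show the bytes on either side of the offending 0x81.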

I have tried utf8, Latin-1, and Windows-1251 with no success. I even added errors='ignore' to the open() call, but no luck. I should add that I am working in a locally installed Jupyter notebook running Python 3.7.4 on Windows.

OK, give me a few minutes and let me try it out.

Using your find_encoding function, I was able to detect the file encoding as utf-8. It turns out I was writing utf8 without the hyphen!
Thanks for your help.

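Good catch. Typos are bound to happen. One note: Python normally accepts utf8 as an alias for utf-8, so it is worth double-checking that the encoding argument was actually inside the open() call that failed.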
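
For anyone who lands here later, a quick way to see how Python interprets an encoding name is codecs.lookup from the standard library. In CPython, utf8 and utf-8 resolve to the same codec, so if the spelling seemed to make a difference, the encoding argument may not have reached open() in the failing cell. The 'charmap' in the original error message usually points to the Windows default codec being used instead. The helper name canonical_name below is just an illustration.

```python
import codecs
import locale

def canonical_name(label):
    """Return the codec name Python resolves an encoding label to."""
    return codecs.lookup(label).name

# Compare how different spellings are resolved
for label in ("utf8", "utf-8", "Latin-1", "Windows-1251"):
    print(label, "->", canonical_name(label))

# Encoding that open() falls back to when no encoding argument is given
print(locale.getpreferredencoding(False))
```

If the last line prints something like cp1252, any open() call without an explicit encoding will use that codec.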

