Hi!
I am really enjoying my first week in Dataquest. Thanks for making it easy to understand.
I am having trouble opening the datasets in mission one.
I have tried adding encoding='utf8' to open('AppleStore.csv'), but I am still getting this error: UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 920: character maps to <undefined>
Please help, I’m stuck!
Hey @mlamborj, welcome to the DQ community.
When your file is not encoded as UTF-8, you should try other encodings. Some possible encoding values are Latin-1 and Windows-1251.
with open('AppleStore.csv', encoding='Latin-1') as f:
    # do something, e.g. read the whole file
    contents = f.read()
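If you would rather try several encodings in turn than edit the call by hand, a small loop does it. This is a minimal sketch: the function name read_with_fallback and the candidate list are illustrative choices, not part of the mission. Latin-1 goes last on purpose, because it maps every possible byte to some character and therefore never raises UnicodeDecodeError, even when the text it produces is garbled.

```python
def read_with_fallback(path, encodings=('utf-8', 'cp1252', 'latin-1')):
    """Return (text, encoding) for the first encoding that decodes the file without error.

    Hypothetical helper: the candidate order is an assumption, with Latin-1 last
    because it accepts any byte sequence.
    """
    last_error = None
    for enc in encodings:
        try:
            with open(path, encoding=enc) as f:
                return f.read(), enc
        except UnicodeDecodeError as err:
            # remember the failure and move on to the next candidate
            last_error = err
    raise ValueError('none of {} could decode {}'.format(encodings, path)) from last_error
```

For example, read_with_fallback('AppleStore.csv') would return the file's text together with the name of the encoding that worked, which you can then pass to open() or pd.read_csv() directly.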
You can also install the chardet module, using either pip:
pip install chardet
or conda:
conda install chardet
When we do not know the encoding of a particular file, we can detect it:
import chardet
import pandas as pd

def find_encoding(fname):
    # read the raw bytes and let chardet guess their encoding
    with open(fname, 'rb') as f:
        raw = f.read()
    result = chardet.detect(raw)
    return result['encoding']

my_encoding = find_encoding('myfile.csv')
df = pd.read_csv('myfile.csv', encoding=my_encoding)
or, in your example:
my_encoding = find_encoding('AppleStore.csv')
with open('AppleStore.csv', encoding=my_encoding) as f:
    # do something, e.g. read the whole file
    contents = f.read()
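chardet's answer is a statistical guess (detect() returns a 'confidence' value alongside 'encoding'), and because find_encoding reads the whole file it can be slow on large datasets. If the only question is "is this file valid UTF-8?", the standard library can answer it exactly. A minimal sketch, with a hypothetical helper name:

```python
import codecs

def utf8_variant(path):
    """Return 'utf-8-sig' or 'utf-8' if the file is valid UTF-8, else None."""
    with open(path, 'rb') as f:
        raw = f.read()
    try:
        raw.decode('utf-8')  # strict decoding: any invalid byte raises
    except UnicodeDecodeError:
        return None
    # a leading byte-order mark is best stripped by the 'utf-8-sig' codec
    return 'utf-8-sig' if raw.startswith(codecs.BOM_UTF8) else 'utf-8'
```

Plain ASCII files also come back as 'utf-8', which is correct since ASCII is a subset of UTF-8. A result of None is the point at which guessing with chardet, or trying other encodings, makes sense.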
@alvinctk, I have tried utf8, Latin-1 and Windows-1251 with no success. I even added errors='ignore' to the open() function, but no luck. I should also mention that I am working in a locally installed Jupyter Notebook running Python 3.7.4 on Windows.
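The wording of the original error is a useful clue on Windows. 'charmap' is the codec Python uses for single-byte code pages such as cp1252, and in Python 3.7 open() falls back to locale.getpreferredencoding(False) when no encoding is given, which is a code page like cp1252 on many Western-locale Windows machines. The sketch below reproduces the same kind of error with two bytes; the helper name decode_attempts is hypothetical, and 'Á' is just an arbitrary character whose UTF-8 form contains byte 0x81.

```python
import locale

def decode_attempts(raw, encodings=('utf-8', 'cp1252', 'latin-1')):
    """Decode the same bytes with each encoding and record the outcome."""
    outcomes = {}
    for enc in encodings:
        try:
            outcomes[enc] = raw.decode(enc)
        except UnicodeDecodeError as err:
            # err.start is the offending byte's index, like "position 920" in the error above
            bad_byte = raw[err.start:err.start + 1]
            outcomes[enc] = 'error at byte {}: {!r}'.format(err.start, bad_byte)
    return outcomes

# The encoding open() falls back to on this machine when none is passed
print('default:', locale.getpreferredencoding(False))

# 'Á' (U+00C1) is stored as the two bytes b'\xc3\x81' in UTF-8
raw = '\u00c1'.encode('utf-8')
for enc, outcome in decode_attempts(raw).items():
    print(enc, '->', repr(outcome))
```

Here UTF-8 recovers 'Á', cp1252 fails on byte 0x81 as in the thread, and Latin-1 never fails but returns 'Ã' followed by an invisible control character, because it maps every byte to some character. That is also why a 'charmap' error that persists even with encoding='Latin-1' suggests the open() call that was edited is not the one actually reading the file.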
OK, give me a few minutes; let me try it out.
Using your find_encoding function, I was able to detect the file encoding as utf-8. It turns out I was writing utf8 without the hyphen!
Thanks for your help.
Good catch. Typos are bound to happen.
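One thing worth checking before blaming the spelling, though: Python's codec registry treats 'utf8', 'UTF8', 'utf-8' and 'utf_8' as names for the same codec, so encoding='utf8' and encoding='utf-8' should behave identically in open(), and the spelling alone is unlikely to explain a 'charmap' error. The helper below is a hypothetical wrapper around the real codecs.lookup() function that shows which codec each label resolves to.

```python
import codecs

def canonical_codec_name(label):
    """Return the name Python's codec registry resolves an encoding label to."""
    return codecs.lookup(label).name

for label in ('utf8', 'UTF8', 'utf-8', 'utf_8', 'Latin-1'):
    print(label, '->', canonical_codec_name(label))
```

All four UTF-8 spellings resolve to the same 'utf-8' codec, while 'Latin-1' resolves to 'iso8859-1'.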
Hi everyone! I'm a complete newbie in data science, and I've found the first project quite complex. I have a simple question: why do we use Jupyter Notebook instead of running Python programs directly?