290-2 Can't open file with numpy.genfromtxt()

I’m currently working through the Boolean Indexing with NumPy mission of the Data Analyst path (https://app.dataquest.io/m/290/boolean-indexing-with-numpy), but I get the error below in Jupyter Notebook when I try to open CSV files with numpy.genfromtxt().

import numpy as np
import pandas as pd

hn = np.genfromtxt("Hacker News.csv", encoding="utf8", delimiter=",")

I’m using jupyter because I find it better to organize what I learn there than going back to the missions whenever I have a doubt or have to refresh something I’ve forgotten. The np.genfromtxt() worked just fine in the dataquest website, but not in jupyter. There I got this error message:

ValueError: Some errors were detected !
    Line #4 (got 3 columns instead of 7)
    Line #22 (got 2 columns instead of 7)
    Line #27 (got 3 columns instead of 7)
    Line #39 (got 8 columns instead of 7)
    Line #46 (got 3 columns instead of 7)
    Line #49 (got 3 columns instead of 7)
    Line #56 (got 3 columns instead of 7)

Can someone tell me why?

Something I also found interesting is that I could open the file just fine with the Python standard csv library (as shown in the missions in Step 1 of the Data Analyst path), but not with numpy. I also managed to open the file with pandas (read_csv), and it worked. So the problem really seems to be something with numpy.

Can anyone tell me the difference between reading CSV files with the standard csv library, numpy, and pandas? Also, how do I open CSV files with numpy in Jupyter without getting this error message?

Hey, Célio.

Something at the beginning of your question is keeping me from understanding the whole thing.

You say you are on screen 290.2, presumably working locally, but then you mention the file Hacker News.csv, which has nothing to do with this mission.

Can you please clarify the following?

  1. What mission and screen does this question refer to?
  2. What exact file are you using locally and how did you obtain it?

Hello, Bruno. Thanks for your reply. I wrote the right URL in my first message. I’m trying to work with the Hacker News posts file because the taxi data is too big, and I don’t need a huge data set just to make annotations in my Jupyter notebook. I decided to use the Hacker News file because I had already downloaded it for the guided project at the end of Step 1 anyway.

To clarify, I’m indeed at Step 2 - Introduction to Numpy and Pandas of the Data Analyst path, but I was experimenting with the Hacker News file for my question above. However, I think I’d get the same error message regardless of which file I’m using.

Thanks for your help.

I can’t tell you why exactly unless you indicate what file you used. But I can explain the gist of it.

Here’s a file to experiment with: cxf.csv (839 Bytes)

It looks like this:

id,title,url,num_points,num_comments,author,created_at
12296411,Ask HN: How to improve my personal website?,,2,6,ahmedbaracat,8/16/2016 9:55
11337617,"Shims, Jigs and Other Woodworking Concepts to Conquer Technical Debt",http://firstround.com/review/shims-jigs-and-other-woodworking-concepts-to-conquer-technical-debt/,34,7,zt,3/22/2016 16:18
10379326,That self-appendectomy,http://www.southpolestation.com/trivia/igy1/appendix.html,91,10,jimsojim,10/13/2015 9:30
11370829,Crate raises $4M seed round for its next-gen SQL database,http://techcrunch.com/2016/03/15/crate-raises-4m-seed-round-for-its-next-gen-sql-database/,3,1,hitekker,3/27/2016 18:08
11665197,Advertising Cannot Maintain the Internet. Heres the Secret Sauce Solution,http://evonomics.com/advertising-cannot-maintain-internet-heres-solution/,2,1,dredmorbius,5/10/2016 4:46

Let’s see what happens when trying the same thing with this file:

>>> from numpy import genfromtxt
>>> genfromtxt("cxf.csv", encoding="UTF-8", delimiter=",")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/bruno/anaconda3/lib/python3.7/site-packages/numpy/lib/npyio.py", line 2089, in genfromtxt
    raise ValueError(errmsg)
ValueError: Some errors were detected !
    Line #3 (got 8 columns instead of 7)

Now take a closer look at the third line:

11337617,"Shims, Jigs and Other Woodworking Concepts to Conquer Technical Debt",http://firstround.com/review/shims-jigs-and-other-woodworking-concepts-to-conquer-technical-debt/,34,7,zt,3/22/2016 16:18

How many commas are there? If you have k commas, how many columns does this result in?
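
If you want to check your count, here is a minimal sketch that uses only the standard library. It compares a plain split on every comma with what csv.reader does for the same line. The variable names are just for illustration, and I am assuming (based on the traceback above) that genfromtxt counts columns the way the plain split does, without treating the quoted title as a single field.

```python
import csv

# The third line of cxf.csv, copied from the post above.
line = (
    '11337617,"Shims, Jigs and Other Woodworking Concepts to Conquer Technical Debt",'
    'http://firstround.com/review/shims-jigs-and-other-woodworking-concepts-to-conquer-technical-debt/,'
    '34,7,zt,3/22/2016 16:18'
)

# Splitting on every comma ignores the quotes around the title,
# so the comma inside the title creates an extra field.
naive_fields = line.split(",")

# csv.reader keeps a quoted field together, even if it contains commas.
quoted_fields = next(csv.reader([line]))

print("commas:", line.count(","))
print("fields after a plain split:", len(naive_fields))
print("fields according to csv.reader:", len(quoted_fields))
```

The plain split finds one more field than csv.reader because of the comma inside the quoted title. That would also explain why the csv library and pandas read your file without complaint, since both handle quoted fields.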