I downloaded the Hacker News dataset to practice in Jupyter Notebook. On the 7th page of the mission,
when you try to get the frequency count for tags, it doesn’t work in Jupyter (it says that DataFrame has no attribute value_counts()), but it works fine on the website.
Could somebody tell me how to fix it in Jupyter?
Thanks in advance
You have to use value_counts() on a Series object!
In the screen you’re talking about, we see that the step_1 object is a Series. It was derived by isolating the title column in the original hn file you read in at the start of that mission.
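A minimal sketch of the Series/DataFrame distinction, using a tiny hypothetical stand-in for hn (the real one comes from the mission's CSV): selecting one column label gives a Series, which has value_counts(), while selecting with a list of labels gives a one-column DataFrame.

```python
import pandas as pd

# Hypothetical stand-in for the Hacker News DataFrame (only 'title' matters here)
hn = pd.DataFrame({"title": ["Intro to Rust [pdf]", "Scaling Postgres [video]",
                             "Intro to Rust [pdf]", "Show HN: my project"]})

titles = hn["title"]        # single label -> Series, which has value_counts()
as_frame = hn[["title"]]    # list of labels -> one-column DataFrame

print(type(titles).__name__, type(as_frame).__name__)  # Series DataFrame
print(titles.value_counts())
```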
Thanks a lot, @blueberrypudding85, I completely forgot that))
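pattern = r"\[(\w+)\]"                # capture the tag text between square brackets
step_1 = titles.str.extract(pattern)  # returns a DataFrame by default on pandas >= 0.23
tag_freq = step_1.value_counts()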
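To see what that snippet actually hands to value_counts(), here is a sketch that runs the same extraction on hypothetical titles and checks the result. It assumes pandas 0.23 or newer, and it uses the escaped pattern r"\[(\w+)\]" on the assumption that the backslashes in the posted pattern were lost to forum formatting.

```python
import pandas as pd

# Hypothetical titles; the real ones come from hn['title']
titles = pd.Series(["Intro to Rust [pdf]", "Scaling Postgres [video]",
                    "Why pandas? [pdf]", "Show HN: my project"])

pattern = r"\[(\w+)\]"                # one capture group: the text inside [...]
step_1 = titles.str.extract(pattern)  # expand defaults to True from pandas 0.23 on

print(type(step_1))             # a DataFrame, not a Series
print(step_1.columns.tolist())  # the unnamed group becomes column label 0
```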
I think I was too quick to say it’s over)
Is there anyone who can help?
I can’t run this in Jupyter, while it runs fine on the website.
@Bruno could you have a look?
Thanks in advance
What’s the error you’re getting exactly?
‘DataFrame’ object has no attribute ‘value_counts’ @blueberrypudding85
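A side note for anyone reading this on a newer install: DataFrame gained its own value_counts() method in pandas 1.1.0, so this exact AttributeError may not reproduce there. On a one-column DataFrame it counts unique rows instead of working like the Series method, so selecting the column first is still the clearer route. A sketch with hypothetical titles, assuming pandas >= 1.1:

```python
import pandas as pd

titles = pd.Series(["Intro to Rust [pdf]", "Scaling Postgres [video]",
                    "Why pandas? [pdf]", "Show HN: my project"])
step_1 = titles.str.extract(r"\[(\w+)\]")  # one-column DataFrame

# pandas >= 1.1: counts unique rows of the DataFrame (all-NaN rows are dropped by default)
row_counts = step_1.value_counts()

# Selecting the column first gives the familiar per-tag Series counts
tag_counts = step_1[0].value_counts()

print(row_counts.tolist(), tag_counts.tolist())
```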
Would it be possible to upload your current work so far?
If you followed everything else in the mission it’s strange that you would still be getting that error.
That looks weird. My best guess is that some of the code prior to that might not have been set up correctly.
What did you define titles as before? Have you tried restarting the kernel and then running everything in order again?
titles = hn['title']
And I’ve tried restarting it multiple times; it still gives me an error.
Let’s try some troubleshooting.
What do you get when you print it? What do you get when you do:
Since step_1 seems to be the offending variable, it makes sense to start by investigating it.
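One way to investigate it is to check both the type and the shape. A sketch with hypothetical titles, assuming pandas 0.23 or newer: a Series reports a one-element shape tuple, while a one-column DataFrame reports two dimensions.

```python
import pandas as pd

titles = pd.Series(["Intro to Rust [pdf]", "Scaling Postgres [video]", "Show HN: my project"])
step_1 = titles.str.extract(r"\[(\w+)\]")

print(type(titles), titles.shape)  # Series    -> (3,)
print(type(step_1), step_1.shape)  # DataFrame -> (3, 1)
print(step_1.columns)              # the column label(s) you could index with
```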
This is what it gives me:
That’s how we know it’s not a Series. If it were a Series, its shape would have only one dimension, like (n,).
Try printing step_1 to see what its column is, and then index it again using that column:
step_1b = step_1['name_of_col_example']
And then do the value_counts() on step_1b.
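For a pattern with a single unnamed capture group, the column str.extract() creates is labelled with the integer 0, so 'name_of_col_example' would be 0 here. A sketch with hypothetical titles, assuming pandas 0.23 or newer:

```python
import pandas as pd

titles = pd.Series(["Intro to Rust [pdf]", "Scaling Postgres [video]", "Why pandas? [pdf]"])
step_1 = titles.str.extract(r"\[(\w+)\]")  # one-column DataFrame

step_1b = step_1[0]            # unnamed group -> integer label 0, not the string "0"
# step_1b = step_1.iloc[:, 0]  # positional alternative that ignores the label
tag_freq = step_1b.value_counts()
print(tag_freq)
```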
Well, when I print(step_1), it gives me:
and so on
I did as you advised, but nothing worked))
It is really weird)) I hope someone from the lead team can explain to me (to us) why this is happening)
Whenever I print the type of the variable, it says Series, but as soon as I include the str.extract call, it turns everything into a DataFrame)
My current understanding is that str.extract always returns a DataFrame (even if only 1 capturing group is used, which produces only 1 column and would make sense to store as a Series) unless expand=False is passed.
More examples: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.str.extract.html
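The linked docs describe the expand=False case roughly like this: one capture group gives a Series, several groups still give a DataFrame, and a named group sets the column label. A small sketch with hypothetical titles (the two-group and named-group patterns are made up for illustration):

```python
import pandas as pd

titles = pd.Series(["Intro to Rust [pdf]", "Scaling Postgres [video]", "Show HN: my project"])

one_group = titles.str.extract(r"\[(\w+)\]", expand=False)         # Series
two_groups = titles.str.extract(r"(\w+).*\[(\w+)\]", expand=False)  # still a DataFrame
named = titles.str.extract(r"\[(?P<tag>\w+)\]")                    # DataFrame, column 'tag'

print(type(one_group).__name__, type(two_groups).__name__, named.columns.tolist())
```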
“Whenever I print the type of the variable, it says Series, but as soon as I include the str.extract call, it turns everything into a DataFrame”
I have never seen this before; maybe you were printing the wrong variable? In a method chain like A.B.C, if you want to test the effect of C, print A.B itself rather than recreating A.B some other way.
You can use the help function to read about the return type.
Or use the type function to check for the object type.
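Putting both suggestions together, a sketch with hypothetical titles: check the type at each link of the chain, and read the documented default for expand. Using inspect.signature here is just one way to read that default without scrolling through the full help() text.

```python
import inspect
import pandas as pd

titles = pd.Series(["Intro to Rust [pdf]", "Show HN: my project"])

# Check the type at each link of the chain: titles -> .str.extract(...) -> .value_counts()
extracted = titles.str.extract(r"\[(\w+)\]")
print(type(titles).__name__, type(extracted).__name__)

# help(pd.Series.str.extract) prints the full docstring, including the return type
expand_default = inspect.signature(pd.Series.str.extract).parameters["expand"].default
print(pd.__version__, expand_default)
```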
At this juncture I’d be forced to conclude that there was a problem much earlier with your previous steps, perhaps as early as when you first read in the data.
It would help more if you uploaded the .ipynb file, perhaps to a GitHub repo just for sharing purposes. At this point I’m really curious as to what is going on too!! And are you also positive that the file is being read in correctly? When you read the CSV file in via pd.read_csv() and then preview it in the Jupyter notebook, does it look the same as the DataFrame you preview on the DQ site?
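A quick way to compare the two previews, sketched with a tiny hypothetical CSV held in memory (the column names besides title are made up; in the notebook you would pass the mission's file path to pd.read_csv() instead):

```python
import io
import pandas as pd

# Hypothetical CSV text standing in for the Hacker News file
csv_text = """id,title,num_points
1,Intro to Rust [pdf],12
2,Show HN: my project,7
"""
hn = pd.read_csv(io.StringIO(csv_text))

print(hn.shape)             # (rows, columns): compare with the site's preview
print(hn.columns.tolist())  # is there really a 'title' column?
print(hn.head())
```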
Our platform is using pandas version 0.22.0. In this version, the expand parameter of str.extract() is set to False by default, and when it is False, it returns a Series if there is only 1 capture group.
From 0.23.0, pandas has set the expand parameter to True by default, and when expand is True, it returns a DataFrame.
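Since the default differs between versions, passing expand explicitly makes the result type the same on the platform and on a newer local install. A sketch with hypothetical titles:

```python
import pandas as pd

print(pd.__version__)  # e.g. 0.22.0 on the platform vs. something newer locally

titles = pd.Series(["Intro to Rust [pdf]", "Scaling Postgres [video]",
                    "Why pandas? [pdf]", "Show HN: my project"])

# expand=False: one capture group -> Series on old and new versions alike
tag_freq = titles.str.extract(r"\[(\w+)\]", expand=False).value_counts()
print(tag_freq)
```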
Hope this clarifies it.
I just want to thank everyone who found the time to help me) Finally, I was able to proceed further.
The issue was that I didn’t set the parameter expand=False; when I did, it resolved the issue)