Regular expression basics

You have to use value_counts() on a Series object!

In the screen you’re talking about, we see that the step_1 object is a Series. It was derived by isolating the title column in the original hn file you read in at the start of that mission.

Thanks a lot @blueberrypudding85, I completely forgot that))

pattern = r'\[(\w+)\]'
step_1 = titles.str.extract(pattern)
tag_freq = step_1.value_counts()

I think I spoke too soon when I said it was over)

Is there anyone who can help?

I can’t run this in Jupyter, although it runs on the website.
@Bruno could you have a look?

Thanks in advance

What’s the error you’re getting exactly?

'DataFrame' object has no attribute 'value_counts' @blueberrypudding85

Would it be possible to upload your current work so far?

If you followed everything else in the mission it’s strange that you would still be getting that error.

That looks weird. My best guess is that some of the code prior to that might not have been set up correctly.

What did you define titles as before? Have you tried restarting the kernel and then running everything in order again?

titles = hn['title']
and I tried restarting it multiple times, but it still gives me an error

Let’s try troubleshooting step_1 then.

What do you get when you print it? What do you get when you do:

print(type(step_1))
print(step_1.shape)

Since step_1 seems to be the offending variable it makes sense to start by investigating this.

print(type(step_1))
print(step_1.shape)

<class 'pandas.core.frame.DataFrame'>
(20099, 1)

this is what it gives me)

<class 'pandas.core.frame.DataFrame'>
(20099, 1)

That’s how we know it’s not a Series. If it were a Series, its shape would be (20099,).

Try printing step_1 to see what its column is, and then index it again using that column.

For instance:

step_1b = step_1['name_of_col_example']

And then do the value_counts with step_1b
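A minimal sketch of that check-and-select step. The sample titles are made up for illustration and stand in for hn['title']; expand=True is passed only to reproduce the DataFrame result discussed in this thread.

```python
import pandas as pd

# Made-up titles standing in for hn['title'] (an assumption for illustration).
titles = pd.Series([
    "Show HN: something [video]",
    "A paper on parsing [pdf]",
    "No tag here",
    "Another paper [pdf]",
])

# expand=True forces a DataFrame, reproducing the situation in this thread.
step_1 = titles.str.extract(r"\[(\w+)\]", expand=True)
print(type(step_1).__name__, step_1.shape)    # DataFrame (4, 1)

# With an unnamed capture group, the single column is labelled with the integer 0.
step_1b = step_1[0]
print(type(step_1b).__name__, step_1b.shape)  # Series (4,)

# value_counts() on the Series counts each tag and ignores NaN rows.
tag_freq = step_1b.value_counts()
print(tag_freq)
```

Note that the column label here is the integer 0, not the string '0', so step_1['0'] would raise a KeyError.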

Well, when I print(step_1)
it gives me:
     0
0  NaN
1  NaN
2  NaN
3  NaN
4  NaN

and so on

I did as you advised, but nothing worked))
It is really weird)) I hope someone from the lead team can explain to me, to us, why this is happening)

Whenever I print the type and the variable, it gives me Series, but as soon as I include the str.extract command it turns everything into a DataFrame)

My current understanding is that str.extract always returns a DataFrame unless expand=False is passed, even if only one capturing group is used (which produces a single column that it would make sense to store as a Series).

More examples: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.str.extract.html
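To make the difference concrete, here is a small sketch comparing the two settings of expand with a single capture group. The sample titles and the variable names as_frame and as_series are hypothetical.

```python
import pandas as pd

# Hypothetical sample data; any Series of strings would do.
titles = pd.Series(["Intro to regex [pdf]", "No tag", "Demo [video]"])
pattern = r"\[(\w+)\]"  # one capture group: the word inside square brackets

as_frame = titles.str.extract(pattern, expand=True)    # one-column DataFrame
as_series = titles.str.extract(pattern, expand=False)  # plain Series

print(type(as_frame).__name__, as_frame.shape)
print(type(as_series).__name__, as_series.shape)
```

With more than one capture group, a DataFrame comes back regardless of expand, since there is one column per group.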

Whenever I print the type and the variable, it gives me Series, but as soon as I include the str.extract command it turns everything into a DataFrame

I have never seen this before. Maybe you were printing the wrong variable? In a method chain A.B.C, print A.B if you want to test the effect of C, rather than creating your own A.B in another way.

You can use the help function to read about the return type.

help(pd.Series.str.extract)

Or use the type function to check for the object type.

type(titles.str.extract(r'[ab](\d)'))

At this juncture I’d be forced to conclude that there was a problem much earlier with your previous steps, perhaps as early as when you first read in the data.

It would help more if you uploaded the .ipynb file, perhaps to a GitHub repo just for sharing purposes. At this point I’m really curious as to what is going on too!! Are you also positive that the file is being read in correctly? When you read the CSV file via pd.read_csv() and then preview it in the Jupyter notebook, does it look the same as the dataframe you preview on the DQ site?

Hi @Lakrau,

Our platform uses pandas version 0.22.0. In this version, the expand parameter of str.extract() defaults to False, and when expand is False, it returns a Series if there is only one capture group.

https://pandas.pydata.org/pandas-docs/version/0.22/generated/pandas.Series.str.extract.html

From 0.23.0 onward, pandas sets the expand parameter to True by default, and when expand is True, it returns a DataFrame.

https://pandas.pydata.org/pandas-docs/version/0.23/generated/pandas.Series.str.extract.html
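Given that change in the default, one way to get the same result on the platform and in a local Jupyter install is to pass expand explicitly. A minimal sketch, with made-up titles standing in for hn['title']:

```python
import pandas as pd

# Check which pandas version the local environment is running.
print(pd.__version__)

# Made-up titles standing in for hn['title'] (an assumption for illustration).
titles = pd.Series([
    "Guide to something [pdf]",
    "Talk recording [video]",
    "Plain title",
    "Another guide [pdf]",
])

# Passing expand=False explicitly avoids relying on the version-dependent default,
# so a single capture group yields a Series.
tags = titles.str.extract(r"\[(\w+)\]", expand=False)
tag_freq = tags.value_counts()
print(tag_freq)
```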

Hope this clarifies it.

Best,
Sahil

I just want to thank everyone who took the time to help me) Finally, I was able to proceed.
The issue was that I hadn’t set the parameter expand=False; when I did, it resolved the issue)

Thanks everyone!

The more you know!

This is a PERFECT explanation! Thank you!!!
