Question related to Working With Strings In Pandas(10)

Hi there,

I’m working on step 10 of the ‘Working With Strings In Pandas’ mission (Extracting More Than One Group of Patterns from a Series).
The task here is to extract the years from the ‘IESurvey’ column using a regex pattern.

Part of my code:
pattern = r"(?P<First_Year>[1-2][0-9]{3})/?(?P<Second_Year>[0-9]{2})?"
years = merged['IESurvey'].str.extractall(pattern)

The result is a ‘years’ dataframe containing only the rows that match the pattern.

I’ve started to wonder how I could replace the original data in the ‘IESurvey’ column of the ‘merged’ dataframe with the values extracted by the pattern (i.e. the years from the ‘First_Year’ column).

I’ve tried the naive solution:
merged['IESurvey'] = years['First_Year']

and get this error:
TypeError: incompatible index of inserted column with frame index

Can anyone help me with this?

The error occurs because you’re trying to assign a column that has a MultiIndex to a dataframe that has a simple (single-level) index. You don’t need to do this to answer this question.
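
To see the mismatch for yourself, here is a minimal sketch. The three sample strings are made up for illustration (the real contents of ‘IESurvey’ aren’t shown in this thread), so treat it as a demonstration of the index shapes rather than of the mission data:

```python
import pandas as pd

# Hypothetical stand-in for the 'merged' dataframe from the mission.
merged = pd.DataFrame({"IESurvey": ["2004/05", "2012", "no survey"]})

pattern = r"(?P<First_Year>[1-2][0-9]{3})/?(?P<Second_Year>[0-9]{2})?"
years = merged["IESurvey"].str.extractall(pattern)

# 'merged' has an ordinary single-level index.
print(merged.index.nlevels)

# extractall() adds a second index level called 'match', because one row
# of the original column can produce several matches.
print(years.index.nlevels)
print(list(years.index.names))
```

Since ‘years’ is indexed by (original row label, match number) and ‘merged’ only by row label, pandas can’t line the two up in a plain column assignment.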

Yeah, of course I don’t need this to complete the task, but I was wondering how I could put the data collected with the .extractall() method back into the original dataframe.

Let’s say - like in this example - I’ve extracted the dates from the ‘IESurvey’ column and now I want to replace the original strings with those dates (in the rows they were extracted from, obviously), leaving the rest of the column unchanged.
The `.extractall()` method creates a kind of subset that is smaller than the original dataframe (this is the reason for the TypeError - I get it). The question is how to do it anyway.

Does anyone have any idea?
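
One possible way to do this, as a sketch rather than the mission’s intended answer. It assumes you only want the first match per row, and it uses the same made-up sample strings in place of the real ‘IESurvey’ data. Option 1 avoids the problem by using .str.extract(), which keeps the original index; option 2 keeps .extractall() and drops the extra ‘match’ level before writing back:

```python
import pandas as pd

# Hypothetical stand-in for the 'merged' dataframe from the mission.
merged = pd.DataFrame({"IESurvey": ["2004/05", "2012", "no survey"]})
pattern = r"(?P<First_Year>[1-2][0-9]{3})/?(?P<Second_Year>[0-9]{2})?"

# Option 1: str.extract() returns one row per original row (NaN where
# nothing matched), so the result aligns with 'merged' directly.
first_year = merged["IESurvey"].str.extract(pattern)["First_Year"]
option_1 = first_year.fillna(merged["IESurvey"])  # keep unmatched rows as-is

# Option 2: keep extractall(), select match number 0 and drop the
# 'match' level so the index is the original row labels again.
years = merged["IESurvey"].str.extractall(pattern)
first_match = years.xs(0, level="match")["First_Year"]
option_2 = merged["IESurvey"].copy()
option_2.loc[first_match.index] = first_match  # overwrite matched rows only

# Either result can be assigned back, because both share merged's index.
merged["IESurvey"] = option_1
print(merged)
```

If a row can contain more than one match and you need all of them, a single column can’t hold them directly, so you would have to decide how to combine them first.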