Help using regex with number ranges and decimals

Screen Link:
https://app.dataquest.io/m/348/guided-project%3A-clean-and-analyze-employee-exit-surveys/9/clean-the-service-column

The series contains the number of years in different patterns, e.g. 3-5, 4.0, less than 8, and more than 12.

When I checked GitHub, I saw the code below:

combined_updated['institute_service_up'] = combined_updated['institute_service'].astype('str').str.extract(r'(\d+)')

I understand that str.extract(r'(\d+)') returns digits. If the value is 3-5, for instance, does this return 3 or 5 for that cell?
Also, if I have 4.2, does the regex extract 4 or 2?

Straight from the documentation (emphasis is mine):

Series.str.extract(self, pat, flags=0, expand=True)

Extract capture groups in the regex pat as columns in a DataFrame.

For each subject string in the Series, extract groups from the first match of regular expression pat.

I’ll let you answer your questions now. Let me know if you need more help with this.

Ah, so the first digit. Thank you.

Also, when I ran this solution code from GitHub
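
To be a little more precise, \d+ matches a whole run of consecutive digits, so the first match can be longer than a single digit. Here is a minimal sketch to check this; the sample values are made up to mirror the patterns mentioned in the question, they are not taken from the survey data:

```python
import pandas as pd

# Hypothetical sample values mirroring the patterns described in the question
service = pd.Series(["3-5", "4.2", "less than 8", "more than 12"])

# Only the FIRST match of the pattern is used for each string.
# expand=False returns a Series when there is a single capture group.
first_match = service.str.extract(r"(\d+)", expand=False)

print(first_match.tolist())
# ['3', '4', '8', '12']

# The extracted values are still strings; convert them to numbers afterwards
as_float = first_match.astype("float")
print(as_float.tolist())
# [3.0, 4.0, 8.0, 12.0]
```

So 3-5 gives 3, 4.2 gives 4 (the match stops at the decimal point), and more than 12 gives 12 rather than 1.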

combined_updated['institute_service_up'] = combined_updated['institute_service'].astype('str').str.extract(r'(\d+)')

combined_updated['institute_service_up'] = combined_updated['institute_service_up'].astype('float')

combined_updated['institute_service_up'].value_counts()

I got this warning:

/dataquest/system/env/python3/lib/python3.4/site-packages/ipykernel/__main__.py:1: FutureWarning: currently extract(expand=None) means expand=False (return Index/Series/DataFrame) but in a future version of pandas this will be changed to expand=True (return DataFrame)
  if __name__ == '__main__':

Please, what does this mean?

What this means has nothing to do with the content in the mission, per se. I suspect you won’t care to go into the details of this, but if you really want to know, please ask it in a separate topic.
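
For anyone landing here later, a short version: going by the text of the FutureWarning itself, the code still runs, and the message only says that the default for expand is going to change. A minimal sketch of passing expand explicitly, which I would expect to avoid the warning; the DataFrame below is a small hypothetical stand-in, not the real survey data:

```python
import pandas as pd

# Hypothetical stand-in for the combined_updated DataFrame from the project
combined_updated = pd.DataFrame(
    {"institute_service": ["3-5", 4.0, "Less than 1 year", "More than 20 years"]}
)

as_str = combined_updated["institute_service"].astype("str")

# expand=False: a single capture group comes back as a Series
as_series = as_str.str.extract(r"(\d+)", expand=False)

# expand=True: the result is always a DataFrame, one column per capture group
as_frame = as_str.str.extract(r"(\d+)", expand=True)

print(type(as_series).__name__)  # Series
print(type(as_frame).__name__)   # DataFrame

# Assigning the Series back to a column works the same way as in the solution code
combined_updated["institute_service_up"] = as_series.astype("float")
print(combined_updated["institute_service_up"].tolist())
# [3.0, 4.0, 1.0, 20.0]
```

Stating expand=False keeps the Series behaviour the solution code relies on, whichever default the installed pandas version uses.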

Thanks!