Clean and Analyze Exit Surveys

My Code:

combined_updated['institute_service_up'] = combined_updated['institute_service'].astype('str').str.extract(r"(\d+)")
combined_updated['institute_service_up'] = combined_updated['institute_service_up'].astype('float')


What I expected to happen:
I’m confused about how this code is able to extract the lower end of the range. Why is this the case?

How was I supposed to figure out that extract(r'(\d+)') was the proper syntax to use? I used this link: 6.2. re — Regular expression operations — Python 3.4.10 documentation, but could only find information on matching digits with \d. Is there an alternative, within the framework of what I’ve learned in the modules, for solving this issue? Had it not been for the answer key, I don’t think I would have found this specific syntax through my own research. Any pointers on how to work this out independently?

Also, what’s the difference between 'str' or 'float' (with quotation marks) and str or float (without them)?

What actually happened:

:man_facepalming: That’s actually what I’m trying to figure out :joy:

tl;dr: I’m just curious about how the syntax works, about 'str' and 'float' vs. str and float, and about why the lower end of the range is extracted given the code I provided. I’m curious to see what the community thinks. Thanks :pray:

Hey @kylemoorman1

what’s the difference between 'str' or 'float' (with quotation marks) and str or float (without them)?

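Not entirely sure about this, but as far as I can tell 'str' (with quotes) is just a string that pandas reads as the name of a dtype, while str (without quotes) is the built-in Python type itself. astype() should accept either one.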
This stack overflow post might help.

I tried this and got similar results:

>>> type("str"), type(str)
(<class 'str'>, <class 'type'>)
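
To connect that back to astype(), here is a minimal sketch. The Series is made up for illustration and is not the survey data. As far as I understand, astype() accepts either the dtype name as a string or the Python type itself, so both spellings should end up with the same result:

```python
import pandas as pd

# Hypothetical sample values, not taken from the exit surveys.
s = pd.Series(["1", "2.5", "10"])

# Passing the dtype name as a string.
by_name = s.astype("float")

# Passing the built-in Python type itself.
by_type = s.astype(float)

# Both conversions produce the same float64 Series.
print(by_name.equals(by_type))
print(by_name.dtype, by_type.dtype)
```

So in the original code, astype('str') and astype(str) should be interchangeable; the quoted form is just a name that pandas looks up.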

Not entirely sure what exactly your question is about \d+. Can you please explain (again)?

Hi @Rucha ,

If you refer to the very top of my original post, the code is essentially using the vectorized “extract” method in order to pull digits (one or more consecutive digits) from the “institute_service” column. What I don’t understand is:

  1. Why does this syntax work? I’m thinking maybe I’ll learn why in my current module about regular expressions.

  2. Some values appear as a range before I apply my extract code. I noticed that the lower number in the range is extracted instead of the higher number. For instance, if value_counts shows 5-6 before I apply extract, why does value_counts afterwards count all of those rows as 5 years? Why does extract pull out the lower number in the range?
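
Both points can be sketched with a small example. The values below are made up to resemble the column, so treat them as an assumption rather than the real data. In the pattern (\d+), \d matches a single digit, + means "one or more of the preceding item", and the parentheses mark the group that extract returns. Series.str.extract only returns the first match in each string, and the search runs left to right, so for a range like 5-6 the first run of digits it meets is the lower number:

```python
import pandas as pd

# Made-up sample values resembling the 'institute_service' column.
s = pd.Series(["5-6", "11-20", "Less than 1 year", "More than 20 years", "3.0"])

# extract returns only the FIRST match of the pattern in each string.
# expand=False gives back a Series instead of a one-column DataFrame.
first = s.str.extract(r"(\d+)", expand=False)
print(first.tolist())

# findall returns EVERY match, which shows the upper end was matched too;
# extract simply never gets that far.
every = s.str.findall(r"\d+")
print(every.tolist())
```

If the upper end were wanted instead, one option would be to take the last item of each findall list, but that is a different choice from the one the answer key makes.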

I’m also curious how I could have deduced this syntax on my own. Please let me know what specifically is confusing about my question. All I can do is re-explain, so please tell me which part is unclear.
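
One way to work this out independently, sketched here with Python’s built-in re module on a single made-up string, is to start from the piece the documentation does describe (\d) and add one element at a time, checking what each version matches:

```python
import re

# Hypothetical test string in the same shape as a range value.
text = "11-20"

# \d on its own matches exactly one digit.
one_digit = re.search(r"\d", text).group()
print(one_digit)

# Adding + ("one or more") matches a whole run of digits.
digit_run = re.search(r"\d+", text).group()
print(digit_run)

# search stops at the first match; findall lists all of them.
all_runs = re.findall(r"\d+", text)
print(all_runs)
```

The + quantifier is described on the same re documentation page as \d, so testing small patterns like this against a few sample values is usually enough to build the final pattern step by step.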

Thanks :pray: