So based on my quick Google search:
str.extract(r'(\d+)') --> \d and \D represent ANY ONE digit / non-digit character, respectively. Digits are [0-9].
+ represents one or more (1+), e.g., [0-9]+ matches one or more digits such as '123', '000'.
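To see what each pattern actually matches, here is a small sketch using Python's built-in re module. The sample strings and the first_match helper are made up for illustration and are not taken from the real dataset.

```python
import re

# Made-up sample values, similar in shape to the column being discussed.
samples = ["More than 20 years", "123", "000", "Less than 1 year"]

def first_match(pattern, text):
    """Return the first substring matching pattern, or None if there is none."""
    m = re.search(pattern, text)
    return m.group(0) if m else None

# \d+ : one or more digit characters
digits = [first_match(r"\d+", s) for s in samples]
# \D+ : one or more NON-digit characters
non_digits = [first_match(r"\D+", s) for s in samples]

print(digits)      # ['20', '123', '000', '1']
print(non_digits)  # ['More than ', None, None, 'Less than ']
```

Lowercase \d only ever matches digits; the non-digit text is what uppercase \D picks up.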
So if \d represents a digit or non-digit character, why does it primarily pull out the digits from the “institute_service” columns?
E.g., when the column value is “More than 20 years”, this function extracted 20 instead of the non-digit characters.
Also, what happened to the NaN values when I used this function? They just disappeared from the updated columns once I parsed the column values through this function.
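A quick way to check the NaN question is to count missing values before and after the extraction. This is only a sketch on a made-up Series (the values below are assumptions, not the real “institute_service” data), so the real column may behave differently if other steps are applied afterwards.

```python
import pandas as pd

# Hypothetical sample data with one missing value.
s = pd.Series(["More than 20 years", "3", None, "Less than 1 year"])

missing_before = s.isna().sum()

# With one capture group, str.extract returns a DataFrame with a single column named 0.
extracted = s.str.extract(r"(\d+)")

missing_after = extracted[0].isna().sum()

print(extracted)
print(missing_before, missing_after)
```

Comparing the two counts on the real column should show whether the missing values are dropped by str.extract itself or by a later step.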