
(7/10) Cleaning and Preparing Data in Python

Screen Link:


My Code:
test_data = ["1912", "1929", "1913-1923",
             "(1951)", "1994", "1934",
             "c. 1915", "1995", "c. 1912",
             "(1988)", "2002", "1957-1959",
             "c. 1955.", "c. 1970's", 
             "C. 1990-1999"]

bad_chars = ["(",")","c","C",".","s","'", " "]

def strip_characters(string):
    for char in bad_chars:
        fstring = string.replace(char, "")
    return fstring

stripped_test_data = []
for date in test_data:
    dates = strip_characters(date)
    stripped_test_data.append(dates)

What I expected to happen:
I expected every date to come back with all the characters in bad_chars removed (for example, "(1951)" becoming "1951").

What actually happened:
stripped_test_data did not contain the expected values. Only the spaces were removed:

- actual + expected

  ['1912',
   '1929',
   '1913-1923',
-  '(1951)',
+  '1951',
   '1994',
   '1934',
-  'c.1915',
+  '1915',
   '1995',
-  'c.1912',
-  '(1988)',
+  '1912',
+  '1988',
   '2002',
   '1957-1959',
-  'c.1955.',
-  "c.1970's",
-  'C.1990-1999']
+  '1955',
+  '1970',
+  '1990-1999']
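A quick way to see where these values come from is to print fstring on every pass. The helper below, strip_characters_traced, is a hypothetical copy of the original function with a print added (it is not part of the lesson code). Run on "c. 1915", it suggests that each pass starts again from the unchanged input, so only the last replacement (the space) survives.

```python
bad_chars = ["(", ")", "c", "C", ".", "s", "'", " "]

def strip_characters_traced(string):
    # Hypothetical debugging copy of strip_characters: same logic, plus a print
    for char in bad_chars:
        # replace() is always called on the ORIGINAL string, never on fstring
        fstring = string.replace(char, "")
        print(repr(char), "->", repr(fstring))
    return fstring

result = strip_characters_traced("c. 1915")
# Prints:
# '(' -> 'c. 1915'
# ')' -> 'c. 1915'
# 'c' -> '. 1915'
# 'C' -> 'c. 1915'
# '.' -> 'c 1915'
# 's' -> 'c. 1915'
# "'" -> 'c. 1915'
# ' ' -> 'c.1915'
```

The "c" pass does remove the "c", but the very next pass throws that work away because it starts from "c. 1915" again.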

I think the problem is in the strip_characters function. When I assign the result back to the variable "string" (string = string.replace(char, "")) instead of to fstring, the code works just fine. Why is that?

Hi @ShamsulHoqueKhan,

You are right about where the problem is. In this function:

def strip_characters(string):
    for char in bad_chars:
        fstring = string.replace(char, "")
    return fstring

On every iteration, the loop calls replace() on the original string, and string itself is never updated. Each result is assigned to fstring, which is simply overwritten on the next iteration. So when you return fstring, you get only the result of the last iteration: the original string with just the last item in bad_chars (the space " ") removed. That's why all the "c. 19xx" values in your result only have the space taken out.

To make each replacement build on the previous one, assign the result back to the same variable with string = string.replace(char, ""). That way string is updated in every iteration, and all the bad_chars are removed by the time the loop ends.
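Here is a minimal sketch of the corrected function run against the test data from the question. The second function, strip_characters_translate, is an alternative I'm swapping in for comparison (it is not the lesson's approach): it uses the built-in str.maketrans/str.translate to delete all the bad characters in one pass.

```python
test_data = ["1912", "1929", "1913-1923",
             "(1951)", "1994", "1934",
             "c. 1915", "1995", "c. 1912",
             "(1988)", "2002", "1957-1959",
             "c. 1955.", "c. 1970's",
             "C. 1990-1999"]

bad_chars = ["(", ")", "c", "C", ".", "s", "'", " "]

def strip_characters(string):
    # Reassign to the same name so each replace() works on the previous result
    for char in bad_chars:
        string = string.replace(char, "")
    return string

# Alternative: a translation table whose third argument lists characters to delete
DELETE_TABLE = str.maketrans("", "", "".join(bad_chars))

def strip_characters_translate(string):
    # Removes every character in bad_chars in a single pass
    return string.translate(DELETE_TABLE)

stripped_test_data = [strip_characters(date) for date in test_data]
print(stripped_test_data)
# ['1912', '1929', '1913-1923', '1951', '1994', '1934', '1915', '1995',
#  '1912', '1988', '2002', '1957-1959', '1955', '1970', '1990-1999']
```

Both versions produce the same list here; the loop version matches the lesson's style, while translate() avoids building a new string for every character in bad_chars.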

Hope this helps! :grinning:
