Help with some code?

Can anyone please explain this code to me? I have tried to understand it, with no success. It is from Step 1, Course 2.

Thank you so much.

test_data = ["1912", "1929", "1913-1923",
             "(1951)", "1994", "1934",
             "c. 1915", "1995", "c. 1912",
             "(1988)", "2002", "1957-1959",
             "c. 1955.", "c. 1970's",
             "C. 1990-1999"]

bad_chars = ["(", ")", "c", "C", ".", "s", "'", " "]

def strip_characters(string):
    for char in bad_chars:
        string = string.replace(char, "")
    return string

stripped_test_data = ['1912', '1929', '1913-1923',
                      '1951', '1994', '1934',
                      '1915', '1995', '1912',
                      '1988', '2002', '1957-1959',
                      '1955', '1970', '1990-1999']

def process_date(date):
    if "-" in date:
        split_date = date.split("-")
        date_one = split_date[0]
        date_two = split_date[1]
        date = (int(date_one) + int(date_two)) / 2
        date = round(date)
    else:
        date = int(date)
    return date

processed_test_data = []

for d in stripped_test_data:
    date = process_date(d)
    processed_test_data.append(date)

for row in moma:
    date = row[6]
    date = strip_characters(date)
    date = process_date(date)
    row[6] = date

Can you be more specific about what exactly you did not understand?

Hello,

test_data is a list of the date strings we want to clean, and they come in several different formats. The task is to remove all the extra characters so that we’re left with either a range or a single year.

bad_chars is a list of the characters we want to remove from the strings stored in the test_data list.

strip_characters is a function that uses a for loop to iterate over the bad_chars list and replaces each of the bad characters with an empty string.

def strip_characters(string):  # the function accepts a string argument
    for char in bad_chars:  # iterates over every bad character (char) in the bad_chars list
        string = string.replace(char, "")  # replaces the bad character with an empty string and assigns the result back to string
    return string

stripped_test_data = []  # initializes an empty list
for string in test_data:  # iterates over every string in the test_data list
    string = strip_characters(string)  # applies the function to the string and assigns the cleaned string back to string
    stripped_test_data.append(string)  # appends the cleaned string to the stripped_test_data list

stripped_test_data is a list that contains the cleaned strings, that is, the output of passing each string in test_data through the strip_characters function.
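To see what the loop inside strip_characters does to a single string, here is a small sketch that records the string after every replacement that changes it. trace_strip is a hypothetical helper written only for this illustration; it is not part of the exercise.

```python
bad_chars = ["(", ")", "c", "C", ".", "s", "'", " "]

def strip_characters(string):
    for char in bad_chars:
        string = string.replace(char, "")
    return string

def trace_strip(string):
    """Hypothetical helper: list (char, result) for each replacement that changes the string."""
    steps = []
    for char in bad_chars:
        new_string = string.replace(char, "")
        if new_string != string:
            steps.append((char, new_string))
        string = new_string
    return steps

steps = trace_strip("c. 1970's")
for char, result in steps:
    print(repr(char), "->", repr(result))
# 'c' -> ". 1970's"
# '.' -> " 1970's"
# 's' -> " 1970'"
# "'" -> ' 1970'
# ' ' -> '1970'
```

Each pass removes one kind of character, so by the end of the loop only the digits (and a dash, if the value is a range) are left.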

However, stripped_test_data still contains two different kinds of values:

  • some are ranges of years, e.g. 1913-1923
  • some are a single year, e.g. 1912

The process_date function ensures that:

  • Where there is a single year, we’ll keep it.
  • Where there is a year range, we’ll average the two years.

def process_date(date):  # defines process_date; the parameter date is the cleaned string
    if '-' in date:  # checks whether the date is a range by looking for a dash in it
        date = date.split('-')  # splits the string into two strings, before and after the dash
        date_one = int(date[0])  # converts the first year to the integer type
        date_two = int(date[1])  # converts the second year to the integer type
        # averages the two years, then rounds with round(), so values like 1964.5 become 1964
        # (Python 3 rounds halves to the nearest even integer)
        date = round((date_one + date_two) / 2)
    else:  # if the date is not a range
        date = int(date)  # converts the value to an integer type

    return date
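A quick way to check the two branches is to call the function on one single year and on a couple of ranges. This is only a sketch that reuses the function from the thread; the last two lines are there to show how Python 3's round() treats an average that ends in .5.

```python
def process_date(date):
    if "-" in date:
        # A range: split on the dash and average the two years.
        date_one, date_two = date.split("-")
        date = round((int(date_one) + int(date_two)) / 2)
    else:
        # A single year: just convert it to an integer.
        date = int(date)
    return date

print(process_date("1912"))       # 1912 (single year, kept as is)
print(process_date("1913-1923"))  # 1918 (average of 1913 and 1923)
print(process_date("1990-1999"))  # 1994 (the average is 1994.5)

# round() in Python 3 sends a .5 to the nearest even integer:
print(round(1994.5))  # 1994
print(round(1995.5))  # 1996
```

So an average ending in .5 is not always rounded down; it goes to whichever neighbouring integer is even.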

We now use the process_date function above to process the dates in stripped_test_data and append the processed dates to a new list, processed_test_data.

processed_test_data = []  # creates a new list to store the processed data
for string in stripped_test_data:  # loops over the stripped_test_data list
    string = process_date(string)  # applies the process_date() function to each string in the list
    processed_test_data.append(string)  # appends each processed date to the processed_test_data list

We can now iterate over the moma list of lists and use the functions already created to clean the data in our date column.

for row in moma:  # iterates over the moma list of lists
    date = row[6]  # assigns the value from the Date column (index 6) to a variable (date)
    date = strip_characters(date)  # uses strip_characters() to remove any bad characters
    date = process_date(date)  # uses process_date() to convert the date to an integer
    row[6] = date  # assigns the stripped and processed value back to the row

Let me know if you understand it now.