Same code works on Dataquest but gives an error in Jupyter Notebook

My Code:

def clean_and_convert(date):
    # check that we don't have an empty string
    if date != "":
        # move the rest of the function inside
        # the if statement
        date = date.replace("(", "")
        date = date.replace(")", "")
        date = int(date)
    return date

for row in moma:
    begin_date = row[3]
    end_date = row[4]
    birth_date = clean_and_convert(begin_date)
    death_date = clean_and_convert(end_date)
    row[3] = birth_date
    row[4] = death_date
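To see what the function does on its own, here is a minimal sketch that calls clean_and_convert on a few values. The values are hypothetical examples of the kind of strings the date columns might hold. They are not taken from the dataset.

```python
def clean_and_convert(date):
    # leave empty strings unchanged; otherwise strip parentheses and convert
    if date != "":
        date = date.replace("(", "")
        date = date.replace(")", "")
        date = int(date)
    return date

# hypothetical sample values
print(clean_and_convert("(1940)"))   # 1940
print(clean_and_convert("(0)"))      # 0
print(repr(clean_and_convert("")))   # ''
```

The if statement only protects against empty strings. Any other non-numeric text still reaches int().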

What I expected to happen:
With the if statement, the code worked as instructed on Dataquest. When I ran the same code in Jupyter Notebook, however, it raised an error.
I also tried converting with date = int(float(date)), and it gave a similar error.
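A quick check suggests why switching to float() did not help. Both int() and float() reject text that is not a number, so neither can convert a value such as the column name 'BeginDate'. The sketch below simply tries both on that string.

```python
# Try both conversions on a non-numeric string and collect the errors
errors = []
for convert in (int, float):
    try:
        convert("BeginDate")
    except ValueError as exc:
        errors.append(f"{convert.__name__}: {exc}")

for message in errors:
    print(message)
```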

What actually happened:

ValueError                                Traceback (most recent call last)
Input In [9], in <cell line: 8>()
      9 begin_date = row[3]
     10 end_date = row[4]
---> 11 birth_date = clean_and_convert(begin_date)
     12 death_date = clean_and_convert(end_date)
     13 row[3] = birth_date

Input In [9], in clean_and_convert(date)
      3     date = date.replace('(', '') # Move the rest of the function inside the if statement
      4     date = date.replace(')', '')
----> 5     date = int(date)
      6 return date

ValueError: invalid literal for int() with base 10: 'BeginDate'
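The message shows that int() received the string 'BeginDate', which looks like a column name rather than a date. Below is a minimal reproduction under one assumption: moma is a list of lists whose first element is still the header row. The miniature moma and its values are hypothetical.

```python
def clean_and_convert(date):
    if date != "":
        date = date.replace("(", "")
        date = date.replace(")", "")
        date = int(date)
    return date

# Hypothetical miniature moma: the header row is still the first element
moma = [
    ["Title", "Artist", "Nationality", "BeginDate", "EndDate"],
    ["Sample Work", "Sample Artist", "(American)", "(1940)", "(2010)"],
]

error_message = None
try:
    for row in moma:
        row[3] = clean_and_convert(row[3])
        row[4] = clean_and_convert(row[4])
except ValueError as exc:
    error_message = str(exc)

# Fails on the very first row, before any data row is converted
print(error_message)
```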

While the submitted result is correct on Dataquest, I am curious why Jupyter gives the error. Thank you!

The DQ interface also stores data from previous screens: whatever you've done on earlier screens usually carries over to the current one. So I just wanted to check whether you have those previous steps and the dataset loaded in your Jupyter notebook.
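As an illustration of what "carried over" can mean here, the sketch below assumes an earlier screen read the CSV with Python's csv module and then sliced off the header. The thread does not show those earlier steps or name the file. An in-memory string with hypothetical rows therefore stands in for the real CSV.

```python
import csv
import io

# Stand-in for the real CSV file (hypothetical rows; the actual filename
# is not given in this thread, so none is assumed)
csv_text = """Title,Artist,Nationality,BeginDate,EndDate
Sample Work,Sample Artist,(American),(1940),(2010)
"""

moma = list(csv.reader(io.StringIO(csv_text)))
print(moma[0])   # the header row is the first element after reading
moma = moma[1:]  # the kind of step an earlier screen may already have run
print(len(moma))
```

If a step like the last slice is missing in a fresh notebook, the header row reaches the conversion loop.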


Hello, yes, I also ran all the code from the previous slides up to this point in Jupyter, and then did the same in PyCharm. Both gave me the same error despite the if statement.

Thank you

Hi @snicks.pnht

Based on the error, it looks like the first row, which probably contains the column names, is still present in moma and is the first record being read. The value being converted is the header text 'BeginDate', not a date. Can you try starting the for loop from the second row instead?
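A minimal sketch of the suggested fix follows, using a hypothetical miniature moma with the header still in place. moma[1:] creates a new outer list, but the inner row lists are the same objects. Assigning to row[3] and row[4] inside the loop therefore still updates the rows stored in moma.

```python
def clean_and_convert(date):
    if date != "":
        date = date.replace("(", "")
        date = date.replace(")", "")
        date = int(date)
    return date

# Hypothetical data: header first, then two data rows (one with an empty EndDate)
moma = [
    ["Title", "Artist", "Nationality", "BeginDate", "EndDate"],
    ["Sample Work", "Sample Artist", "(American)", "(1940)", "(2010)"],
    ["Another Work", "Another Artist", "(French)", "(1900)", ""],
]

# Start from the second row so the header is never converted
for row in moma[1:]:
    row[3] = clean_and_convert(row[3])
    row[4] = clean_and_convert(row[4])

print(moma[1][3:5])  # [1940, 2010]
print(moma[2][3:5])  # [1900, '']
```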

For future queries, please attach a link to the course screen and/or add course tags in the topic section. Please also attach your Jupyter notebook, because it helps the community help you better.



Yes! That is it, thank you. When I worked through the steps on Dataquest, the interface had already removed the header, so the code worked as intended. When I moved the exact same code to other tools, I forgot to exclude the header manually.
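One notebook-specific caveat: cells can be re-run. Running moma = moma[1:] a second time silently drops a real data row. The sketch below makes header removal safe to repeat by checking the first cell before slicing. It assumes the header's first column is "Title". The helper drop_header is hypothetical and does not come from the course.

```python
# Hypothetical data with the header still attached
moma = [
    ["Title", "Artist", "Nationality", "BeginDate", "EndDate"],
    ["Sample Work", "Sample Artist", "(American)", "(1940)", "(2010)"],
]

def drop_header(rows, first_column="Title"):
    """Return rows without the header row; leave them as-is if it is already gone."""
    if rows and rows[0] and rows[0][0] == first_column:
        return rows[1:]
    return rows

moma = drop_header(moma)
moma = drop_header(moma)  # re-running the cell does not drop a data row
print(len(moma))  # 1
```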
