Not able to convert column to date - 'str' object cannot be interpreted as an integer

Hi, I am trying to clean up this column by converting it to a date, but I have not been able to get it to work.

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from datetime import datetime as dt

%matplotlib inline

This is part of the info for the columns.

S/N                                                              44 non-null int64
Date                                                             44 non-null object
Time of Reporting                                                44 non-null object
Month                                                            44 non-null int64

When I try this:

fraud['Date'] = pd.to_datetime(fraud['Date'], format='%Y%m%d')

I get the following error message:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
C:\ProgramData\Anaconda3_64\lib\site-packages\pandas\core\indexes\base.py in get_value(self, series, key)
   4380             try:
-> 4381                 return libindex.get_value_box(s, key)
   4382             except IndexError:

pandas/_libs/index.pyx in pandas._libs.index.get_value_box()

pandas/_libs/index.pyx in pandas._libs.index.get_value_at()

pandas/_libs/util.pxd in pandas._libs.util.get_value_at()

pandas/_libs/util.pxd in pandas._libs.util.validate_indexer()

TypeError: 'str' object cannot be interpreted as an integer

During handling of the above exception, another exception occurred:

KeyError                                  Traceback (most recent call last)
<ipython-input-19-b32e0288d98d> in <module>
----> 1 fraud['Date'] = pd.to_datetime(fraud['Date'], format='%Y%m%d')

C:\ProgramData\Anaconda3_64\lib\site-packages\pandas\core\series.py in __getitem__(self, key)
    866         key = com.apply_if_callable(key, self)
    867         try:
--> 868             result = self.index.get_value(self, key)
    869 
    870             if not is_scalar(result):

C:\ProgramData\Anaconda3_64\lib\site-packages\pandas\core\indexes\category.py in get_value(self, series, key)
    452 
    453         # we might be a positional inexer
--> 454         return super(CategoricalIndex, self).get_value(series, key)
    455 
    456     def _can_reindex(self, indexer):

C:\ProgramData\Anaconda3_64\lib\site-packages\pandas\core\indexes\base.py in get_value(self, series, key)
   4387                     raise InvalidIndexError(key)
   4388                 else:
-> 4389                     raise e1
   4390             except Exception:  # pragma: no cover
   4391                 raise e1

C:\ProgramData\Anaconda3_64\lib\site-packages\pandas\core\indexes\base.py in get_value(self, series, key)
   4373         try:
   4374             return self._engine.get_value(s, k,
-> 4375                                           tz=getattr(series.dtype, 'tz', None))
   4376         except KeyError as e1:
   4377             if len(self) > 0 and (self.holds_integer() or self.is_boolean()):

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_value()

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_value()

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas/_libs/index_class_helper.pxi in pandas._libs.index.Int8Engine._check_type()

KeyError: 'Date'

I tried googling this problem but I am not sure what is causing it. I am using a PC for this exercise.
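One thing the traceback itself suggests: the failing frame is `Series.__getitem__` in `series.py`, and the final error is `KeyError: 'Date'`. That would be consistent with `fraud` being a Series (or otherwise not having a `Date` label) at the time of the lookup, rather than with a date-parsing problem. A minimal sketch of that failure mode, where `fraud_like` is a hypothetical stand-in and not the real data:

```python
import pandas as pd

# Hypothetical Series standing in for `fraud`; the real object is not shown in the thread.
fraud_like = pd.Series([10, 20], index=["a", "b"])

# Looking up a label that is not in the index raises KeyError,
# before pd.to_datetime would ever be called.
try:
    fraud_like["Date"]
    lookup_failed = False
except KeyError:
    lookup_failed = True

# Quick checks worth running on the real object.
is_frame = isinstance(fraud_like, pd.DataFrame)
print(type(fraud_like).__name__, lookup_failed, is_frame)
```

Running `type(fraud)` and `fraud.index` on the real object would show whether this is what is happening.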

Hello, can we see some rows of the fraud DataFrame and also the fraud.columns array?

The error is here.

Hi, does this help?
Sorry, I am masking some of the data here. I am still learning how to anonymize data.

Hello @willx,

One of the errors in your code is that you put the wrong format. Either change the format value to format='%Y-%m-%d' or do not include the format parameter.
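To illustrate the point about the format string, here is a minimal sketch. The sample values are hypothetical, since the real contents of the Date column are not shown. Note that a mismatched format normally surfaces as a `ValueError` from `pd.to_datetime`, not as the `KeyError: 'Date'` in the traceback above, so the format may not be the only issue.

```python
import pandas as pd

# Hypothetical sample values written as YYYY-MM-DD.
dates = pd.Series(["2019-01-15", "2019-02-03"])

# '%Y%m%d' expects strings like '20190115', so it does not match these values.
try:
    pd.to_datetime(dates, format="%Y%m%d")
    mismatch_raised = False
except ValueError:
    mismatch_raised = True

# A format that matches the strings parses cleanly.
parsed = pd.to_datetime(dates, format="%Y-%m-%d")
print(mismatch_raised, parsed.dt.month.tolist())
```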

Let me know if this helps.

Oh hi, thanks for replying @info.victoromondi @doyinsomoye

Tried this

Tried this also:
format='%Y/%m/%d'

I even tried escaping the / with \

and also removing the format. I am still hitting the same problem.
This is a snapshot from the Excel file I am reading from:


Have you tried converting the column to an int type and trying again?

I tried that and got the same error as well. I am importing from Excel. Could that be the reason?

That’s really weird. I get these types of errors when there’s a typo in the column name I try to refer to. I would check the fraud.columns attribute to see if there’s a space character or something like that…
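A minimal sketch of that check, assuming a header with a trailing space (the frame below is made up for illustration, not taken from the uploaded file):

```python
import pandas as pd

# Hypothetical frame whose 'Date' header carries a trailing space,
# as can happen with spreadsheet imports.
fraud = pd.DataFrame({"S/N": [1, 2], "Date ": ["2019-01-15", "2019-02-03"]})

# repr() makes stray whitespace in the column names visible.
raw_names = [repr(name) for name in fraud.columns]
print(raw_names)

# Strip surrounding whitespace from every column name, then convert.
fraud.columns = fraud.columns.str.strip()
fraud["Date"] = pd.to_datetime(fraud["Date"])
print(fraud.dtypes)
```

If the names print without any extra characters, the column name is probably not the culprit.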


@ksenia.kustanovich that’s true, I advised him to do that here.

fraud_tidy_dt.xlsx (12.2 KB)

Sorry, can you see this file? I have removed all the other columns to focus on the date column, since I am having problems anonymizing the data.

rgds

@willx,

Check out what I did with the file you uploaded:

DQ.ipynb (7.3 KB)

BTW, are these all the values you have in the Date column?


Hi all, I converted the file to a CSV file, read from that, and it works.
Earlier I had problems reading from the CSV file because of a tokenizing error, which is why I tried xlsx.
I guess the Excel file contains some hidden characters or formatting that prevents the string column from being read. Thank you so much, guys, for your help and time.
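For anyone landing here later, a rough sketch of the CSV route. The inline text below is a hypothetical stand-in for the exported file (the real values are not shown in this thread), and `parse_dates` is just one way to get a datetime column at read time:

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for the exported file.
csv_text = "S/N,Date\n1,2019-01-15\n2,2019-02-03\n"

# parse_dates asks read_csv to convert the named column while loading.
fraud = pd.read_csv(io.StringIO(csv_text), parse_dates=["Date"])

# Confirm the object is a DataFrame and the column has a datetime dtype.
print(type(fraud).__name__)
print(fraud.dtypes)
```

With a real file, the `io.StringIO(...)` argument would be replaced by the path to the CSV.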