Working with datetime in pandas

Hi guys
I’m working again in a personal project and i have some files in excel that i open in pandas to transform them into dataframes. There’s a column that is a date in day-month-year hour format as you can see in the image


So i use the next code to first transform the hour 24:00 into 00:00 because pandas only works from 00:00 to 23:00, then i transform them into datetime type because when i create the dataframe pandas don’t recognize the formart and says that it’s a object type. And do other stuff like extracting the day and month and combining them together

`cs19['Fecha'] = cs19['Fecha'].str.replace('24:00', '00:00')
 cs19['Fecha'] = cs19['Fecha'].astype('datetime64[ns]')
 cs19['day'] = cs19['Fecha'].dt.month
 cs19['month'] = cs19['Fecha']
 cs19['Monthday'] = (cs19['month'] * 100) + cs19['day']`

It works so far but then i found something really weird, the first 12 days have this format year-day-month hour but then at day 13 it changes into year-month-day hour and later changes again into the first one


At this point i don’t know what to do. I’ve been reading but i’m not getting any clue and to be honest this is one of my biggest flaws. I don’t know how to work well with datetimes

I appreciate any help
Thanks in advance!

Specify format of your dates. To that you can use pd.to_datetime

replace it with this

df["Fecha"] = pd.to_datetime(df.Fecha, format="%d-%m-%Y %H:%M")

It should work.

1 Like

I’m wondering how it’s possible that your 288 index of 2019-01-13 got 13 as the month (i assume this is so by looking at last column showing 1301).

Do you have a small csv to replicate this?

import pandas as pd
import io

df = pd.read_csv(io.StringIO(temp))

This is 1 way to demonstrate your problem on a small scale.

How did you read in the data?
What were the parameter settings?

Here is a good summary of some issues with pandas which depends on dateutil for date parsing and the problems with dayfirst, yearfirst, default american date format parsing, british date format that people expect as the default, their non-strict behavior (meaning no Error raised, but falling back to a other ways of parsing). Using format parameter with pd.to_datetime is good for both specifying exactly how you want it parsed, and for errors to be raised if you have bad data that doesn’t match your specified format. I expect index 288 to raise error if you used format


@DishinGoyani Thank you so much! it worked. However the date wasn’t transformed into the format specified in pd.to_datetime. I mean if i understand correctly, the date format should have change right?

@hanqi I tried to replicate your code but to be honest i don’t understand it. However the post linked it’s really useful
I use this to read the data i don’t know if i could have save time and problem with another parameter

Thank you both for your help!

Hello, @alegiraldo666 format we specify in the pd.to_datetime is for input data (input date format). Not for output.