Datetime format for Guided Project Clean and Analyze Employee Exit Surveys

Hi, how to convert month and date ? For example, from May-12 to 05/2012 ?

My code: dete_resignations[“cease_date”].value_counts()
Actual output:
2012 126
2013 74
Jan-14 22
Dec-13 17
Jun-13 14
Sep-13 11
Nov-13 9
Jul-13 9
Oct-13 6
Aug-13 4
May-12 2
May-13 2
Sep-10 1
Jul-12 1
2010 1
Jul-06 1
Name: cease_date, dtype: int64

I was expecting to come out the output like below.
expected output:
2012 126
2013 74
01/2014 22
12/2013 17
06/2013 14
09/2013 11
11/2013 9
07/2013 9
10/2013 6
08/2013 4
05/2012 2
05/2013 2
09/2010 1
07/2006 1
07/2012 1
2010 1
Name: cease_date, dtype: int64

Hi @mohngelay.nl

Sharing your code along with the question might help the community to help you with understanding where the code needs tweaking.
Meanwhile, I would suggest to look at these posts and try to modify your code to achieve the results you are looking for (there are many posts in the community, you have to use specific terms though!):

In case these posts don’t help much, do share your code. I hope I am able to look at it before other members jump on it! :stuck_out_tongue_winking_eye:

Hi Rucha,
Thank you! It is the fifth step for the guided project (Clean And Analyze Employee Exit Surveys) and below is my code, actual output, and expected output.

Code:
dete_resignations[“cease_date”] = dete_resignations[“cease_date”].str.split(’-’).str[-1]
dete_resignations[“cease_date”] = dete_resignations[“cease_date”].astype(“float”)

dete_resignations[“cease_date”].value_counts()

Actual output:
2012.0 126
2013.0 74
13.0 72
14.0 22
12.0 3
2010.0 1
6.0 1
10.0 1

I expected to get below output:

2013.0 146
2012.0 129
2014.0 22
2010.0 2
2006.0 1

I wanted to know why it happen and where to fix to get the expected output.

Like @Rucha said, please provide a mission link and format your code appropriately as per these guidelines so that we can better assist you.

1 Like

It is the fifth step for the guided project (Clean And Analyze Employee Exit Surveys) and below is my code, actual output, and expected output.

https://app.dataquest.io/m/348/guided-project%3A-clean-and-analyze-employee-exit-surveys/5/verify-the-data

My code:

dete_resignations[“cease_date”] = dete_resignations[“cease_date”].str.split(’-’).str[-1]
dete_resignations[“cease_date”] = dete_resignations[“cease_date”].astype(“float”)

dete_resignations[“cease_date”].value_counts()

Actual output:
2012.0 126
2013.0 74
13.0 72
14.0 22
12.0 3
2010.0 1
6.0 1
10.0 1

I expected to get below output:

2013.0 146
2012.0 129
2014.0 22
2010.0 2
2006.0 1

I wanted to know why it happen and where to fix to get the expected output. Thank you so much!

1 Like

hi @mohngelay.nl

At the onset! This is not a solution and you may need to take it, step by step, and figure out what works best. The reason being, there is a mismatch between the “cease date” column for your data frame and my project work.

So I am gonna take the help of a dummy series. Try the below-given codes in separate cells, observe the results, and in case you get more confused do highlight your concerns.

The code for test_series is:
test_series = pd.Series(["less than 1 year", "Apr-05", "01/2013", "05-Jun", "10/2014", 2015, 2016, 1, 10])

Let’s break the series by data values:

  • I guess you are familiar that “Apr-05” does not mean Apr-2005, it means “04-05” i.e. 4 to 5 years since the column mentions the data is in years.
  • We can change month values to a number using a dictionary and applying a replace code on the series. I did this using a dictionary and then calling that dictionary in the replace method.

Something like this:

dict_repl = {value_to_be_changed : new_value}
series.replace(dict_repl, inplace = True)

So for these values, we can either choose, 4 or 5 as years data i.e. we can either extract 4 or 5.
To do that we can use the split method with "-" which you have done. The [-1] will extract the last value in the resultant list.

test_series.astype(str).str.split("-").str[-1]

  • Value like “01/2013” represent month/year. Here we can either use "/" to split and follow as above or we can create a pattern to extract the year.
    A pattern like this - r"([2][0-9]{3})". This means the year starts with 2, the next three digits could be any number between 0 and 9 (both inclusive).

test_series.astype(str).str.split("/").str[-1] OR
test_series.astype(str).str.extract(r"([2][0-9]{3})")

  • Now we come to values with multiple words and a number somewhere in between. Do observe the outputs of the above codes for these kinds of values. Also, try the below code.

test_series.astype(str).str.extract(r"(\d+)")

  • Regarding the absolute numeric values such as 2016, 10, etc. Try the above codes but without including this code part - ".astype(str)"
1 Like

@masterryan.prof
please highlight anything misleading or additional steps to help @mohngelay.nl better!
Thanks.

Hi @Rucha,
Thanks so much! it is very helpful.

1 Like

Hi @masterryan.prof,
I would love to know if you have other approaches. Thanks!

1 Like

Hi I personally also have not done the project so I can’t really comment on it.

hi @mohngelay.nl

Thanks for marking a solution to this topic. These are just a few ways you can treat the series. If you search in the community with the mission tag, you will get different and perhaps efficient codes to wrangle such data.

Also, you can experiment and try chaining the commands or defining a function in case you encounter mixed values in a series/ array, etc.

Happy Learning :slight_smile:

1 Like