348-9 Guided Project: Clean And Analyze Employee Exit Surveys - Combining Datasets

Hi all!

I'm looking for some guidance regarding the guided project Clean And Analyze Employee Exit Surveys, specifically step 8 (combining the datasets). There's probably no right or wrong answer, but my institute_service values differ quite significantly from the ones provided by DQ:

NaN 164
Less than 1 year 77
3-4 72
1-2 68
11-20 49
More than 20 years 48
5-6 36
7-10 30
5.0 30
3.0 28

My results:

Less than 1 year 73
1-2 64
3-4 63
5-6 33
11-20 26
5.0 23
1.0 22
7-10 21
0.0 20
3.0 20
6.0 17
4.0 16
9.0 14
2.0 14
7.0 13
More than 20 years 10
8.0 8
13.0 8
20.0 7
15.0 7
14.0 6
10.0 6
12.0 6
17.0 6
22.0 6
16.0 5
18.0 5
11.0 4
24.0 4
23.0 4
21.0 3
19.0 3
32.0 3
39.0 3
25.0 2
26.0 2
30.0 2
28.0 2
36.0 2
38.0 1
49.0 1
42.0 1
41.0 1
29.0 1
35.0 1
27.0 1
33.0 1
31.0 1
34.0 1

My combination looked as follows.
I first decided which columns to keep for the combined dataset:
cols_to_keep = ["dissatisfied", "employment_status", "gender", "cease_date","institute","position", "separationtype","institute_service","age"]
dete_resignations_up = dete_resignations_up[cols_to_keep]
tafe_resignations_up = tafe_resignations_up[cols_to_keep]

Then combined like so:
combined = pd.concat([tafe_resignations_up, dete_resignations_up], axis=0)
combined_updated = combined.dropna(subset=["dissatisfied"]) # make sure dissatisfied has no empty values

I'm unsure if concat() was the right option here. How did you combine the datasets? Did you use merge()? If so, on which column?

Many thanks in advance, and sorry for the long paragraph!

I opened the solution notebook and the results listed there are actually the same as what you have here. I'm not sure why screen 9 shows those particular values. Judging from the solution notebook, it looks like you're on the right track!
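One thing that may be worth checking (this is a guess on my part, not something the solution notebook confirms): the list on the screen includes a NaN row while yours does not. value_counts() leaves out missing values by default, so passing dropna=False is one way to see how many there are. A minimal sketch with a made-up Series standing in for the institute_service column:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for combined["institute_service"]; the values are invented
service = pd.Series(["1-2", "3-4", np.nan, "1-2", np.nan, np.nan])

# Default behaviour: missing values are left out of the counts
without_nan = service.value_counts()

# dropna=False adds a NaN row to the result
with_nan = service.value_counts(dropna=False)

print(without_nan)
print(with_nan)
```

This would only explain the NaN row, not the other differences in the counts.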

I found this answer on Stack Overflow to be helpful in looking at the differences between the different ways to combine datasets. I would think that here pd.concat() is used because we’re able to create the 2 dataframes with the same columns and just need to stack one on top of the other.
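As a rough illustration of the stacking idea, here is a small sketch. The two tiny dataframes are invented stand-ins for dete_resignations_up and tafe_resignations_up, and I added ignore_index=True (which the original code does not use) just to get a clean index:

```python
import pandas as pd

# Hypothetical miniature versions of the two cleaned dataframes (same columns in both)
dete = pd.DataFrame({
    "institute": ["DETE", "DETE"],
    "institute_service": [5.0, 3.0],
    "dissatisfied": [True, False],
})
tafe = pd.DataFrame({
    "institute": ["TAFE", "TAFE"],
    "institute_service": ["1-2", "3-4"],
    "dissatisfied": [False, None],
})

# axis=0 stacks the rows of one dataframe on top of the other
combined = pd.concat([tafe, dete], axis=0, ignore_index=True)

# Drop rows where "dissatisfied" is missing, as in the original post
combined_updated = combined.dropna(subset=["dissatisfied"])

print(combined.shape)
print(combined_updated.shape)
```

merge() would fit better if the two dataframes described the same employees and shared a key column to join on. Here each row is a different person, so there is nothing to match on and stacking is the natural choice.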


Awesome! Thank you so much for your input. Nice to know I'm not going in the wrong direction :-)
