GP: Creating a kaggle workflow (Cabin error)

On step 2 of the ‘Creating a kaggle workflow’ guided project, I’ve created a function that applies several functions to the train and holdout data frames.

My Code:

def preprocessing(df):
    df = process_missing(df)
    df = process_age(df)
    df = process_fare(df)
    df = process_titles(df)
    df = process_cabin(df)
    columns = ['Age_categories', 'Fare_categories', 'Title', 'Cabin_type', 'Sex']
    for col in columns:
        df = create_dummies(df, col)
    return df

train = preprocessing(train)
holdout = preprocessing(holdout)

However, when I run the above function, a long error message pops up that includes the following text (I’ve included only part of the error message since I can’t copy and paste the whole thing).

KeyError                                  Traceback (most recent call last)
<ipython-input-4-5b9da2c53e03> in <module>()
     29     return df
---> 31 train = pre_process(train)
     32 holdout = pre_process(holdout)

<ipython-input-4-5b9da2c53e03> in pre_process(df)
     21     df = process_fare(df)
     22     df = process_titles(df)
---> 23     df = process_cabin(df)
     25     for col in ["Age_categories","Fare_categories",

<ipython-input-3-85f0c13ce57d> in process_cabin(df)
     47     train process_cabin(train)
     48     """
---> 49     df["Cabin_type"] = df["Cabin"].str[0]
     50     df["Cabin_type"] = df["Cabin_type"].fillna("Unknown")
     51     df = df.drop('Cabin',axis=1)

KeyError: 'Cabin'

I can’t figure out what is wrong with the code. I’ve also copied and pasted the code from the solution notebook, and the same KeyError message pops up.

Any help would be very much appreciated.


Hello @Roya, can you also share your notebook?

Hello @Roya,

The error is saying that there is no column named Cabin in the dataframe.

Please confirm whether your dataframe has a Cabin column by running print(df.columns).
You may have renamed or deleted it from the dataframe.
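A minimal sketch of that check, using a tiny made-up DataFrame as a stand-in for the real Titanic data (the values are hypothetical). `"Cabin" in df.columns` tells you whether the column exists before `df["Cabin"]` is evaluated, and accessing a missing column reproduces the same `KeyError`.

```python
import pandas as pd

# Tiny stand-in for the Titanic data (hypothetical values, for illustration only)
df = pd.DataFrame({"Name": ["A", "B"], "Cabin": ["C85", None]})
print(df.columns)

# Simulate the Cabin column having been dropped earlier
dropped = df.drop("Cabin", axis=1)

print("Cabin" in df.columns)       # True
print("Cabin" in dropped.columns)  # False

try:
    dropped["Cabin"]
except KeyError as err:
    # pandas raises KeyError when a column label does not exist
    print("KeyError:", err)
```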

Hi @DishinGoyani,

The dataframe does have the Cabin column before I run the preprocessing function:

Index(['PassengerId', 'Survived', 'Pclass', 'Name', 'Sex', 'Age', 'SibSp', 'Parch', 'Ticket', 'Fare', 'Cabin', 'Embarked'], dtype='object')

After I run the function (which gives the error message), I get the following columns:
Index(['PassengerId', 'Survived', 'Pclass', 'Name', 'Sex', 'Age', 'SibSp', 'Parch', 'Ticket', 'Fare', 'Embarked', 'Age_categories', 'Fare_categories', 'Title', 'Cabin_type', 'Age_categories_Missing', 'Age_categories_Infant', 'Age_categories_Child', 'Age_categories_Teenager', 'Age_categories_Young Adult', 'Age_categories_Adult', 'Age_categories_Senior', 'Fare_categories_0-12', 'Fare_categories_12-50', 'Fare_categories_50-100', 'Fare_categories_100+', 'Title_Master', 'Title_Miss', 'Title_Mr', 'Title_Mrs', 'Title_Officer', 'Title_Royalty', 'Cabin_type_A', 'Cabin_type_B', 'Cabin_type_C', 'Cabin_type_D', 'Cabin_type_E', 'Cabin_type_F', 'Cabin_type_G', 'Cabin_type_T', 'Cabin_type_Unknown', 'Sex_female', 'Sex_male'], dtype='object')
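Comparing the two column lists programmatically makes the missing column easier to spot. A small sketch using pandas `Index.difference`; the "after" list here is abbreviated from the output above, keeping only the first few generated columns.

```python
import pandas as pd

# Columns before running preprocessing (from the post above)
before = pd.Index(["PassengerId", "Survived", "Pclass", "Name", "Sex", "Age",
                   "SibSp", "Parch", "Ticket", "Fare", "Cabin", "Embarked"])
# Columns after running it (abbreviated: dummy columns omitted)
after = pd.Index(["PassengerId", "Survived", "Pclass", "Name", "Sex", "Age",
                  "SibSp", "Parch", "Ticket", "Fare", "Embarked",
                  "Age_categories", "Fare_categories", "Title", "Cabin_type"])

# Labels present before but missing afterwards
removed = before.difference(after)
print(list(removed))  # ['Cabin']
```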

Hi @Roya,

Have you tried using the df.loc[:, "Cabin"] syntax in place of df["Cabin"]? And have you made sure that you are looking for Cabin in the columns, not the rows? Also, if this doesn’t work, could you please include a link to the last page of the Creating a kaggle workflow guided project you are working on? That way we can take a proper look at your code and help you find a solution.
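For what it's worth, a quick sketch suggests `.loc` would not avoid this particular error: label-based access with `df.loc[:, "Cabin"]` also raises a `KeyError` when the column is absent, just like `df["Cabin"]`. The frame below is a hypothetical one that has already lost its Cabin column.

```python
import pandas as pd

# Hypothetical frame that no longer has a Cabin column
df = pd.DataFrame({"Name": ["A", "B"], "Cabin_type": ["C", "Unknown"]})

def raises_key_error(access):
    """Return True if calling access() raises KeyError."""
    try:
        access()
    except KeyError:
        return True
    return False

print(raises_key_error(lambda: df["Cabin"]))         # True
print(raises_key_error(lambda: df.loc[:, "Cabin"]))  # True
```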

Thanks in advance!


I checked further, and it appears that process_cabin removes the original Cabin column from df.
See the second-to-last line of the function, df = df.drop('Cabin',axis=1):

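def process_cabin(df):
    """Process the Cabin column into pre-defined 'bins'

    train = process_cabin(train)
    """
    # Keep only the deck letter, e.g. "C85" -> "C"
    df["Cabin_type"] = df["Cabin"].str[0]
    # Passengers with no recorded cabin become "Unknown"
    df["Cabin_type"] = df["Cabin_type"].fillna("Unknown")
    # The original Cabin column is dropped here
    df = df.drop('Cabin', axis=1)
    return df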

So if you run the preprocessing function twice on the same data frame, the second run raises the error, because the first run has already removed the Cabin column.
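One way to make re-running harmless is sketched below. It is a variant of process_cabin with an added guard that skips the conversion when Cabin is already gone, plus a `df.copy()` so the caller's frame is not modified. Both changes are our own additions, not part of the guided project's solution, and the data is a hypothetical stand-in.

```python
import pandas as pd

def process_cabin(df):
    """Same steps as the guided project's process_cabin, plus a guard
    (our addition) so that a second call does not raise KeyError."""
    if "Cabin" not in df.columns:
        # Cabin was already converted on an earlier run; nothing to do
        return df
    df = df.copy()  # avoid adding Cabin_type to the caller's frame
    df["Cabin_type"] = df["Cabin"].str[0]
    df["Cabin_type"] = df["Cabin_type"].fillna("Unknown")
    df = df.drop("Cabin", axis=1)
    return df

# Hypothetical stand-in for the raw training data
raw_train = pd.DataFrame({"Cabin": ["C85", None, "E46"]})

once = process_cabin(raw_train)
twice = process_cabin(once)  # no KeyError on the second call
print(twice["Cabin_type"].tolist())  # ['C', 'Unknown', 'E']
```

Another option that leaves the functions unchanged is to keep untouched copies of train and holdout (or re-read the CSV files) and always call preprocessing on those, rather than on frames that have already been processed.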


@DishinGoyani, you’re right! It works now. I must have kept running it twice by accident.

Yes, I saw that the function removed the Cabin column, but I couldn’t figure out why it still gave an error. Amazing, thanks so much for taking the time to answer my question.


Glad it helps! Please consider liking the post or marking it as the solution if you found it helpful.

GUIDELINE #2: Accept and mark answer as Solution

If you find a reply that answers your question satisfactorily, please mark it as Solution. Doing so will help:

  • Other learners who are searching for the same problem find the solution faster
  • The Learning Assistant program: by marking the answer as the solution, you directly help the person who helped you.