GP: Creating a kaggle workflow (Cabin error)

Hi,
On step 2 of the ‘Creating a kaggle workflow’ guided project, I’ve created a function that applies several processing functions to the train and holdout dataframes.

My Code:

def preprocessing(df):
    df = process_missing(df)
    df = process_age(df)
    df = process_fare(df)
    df = process_titles(df)
    df = process_cabin(df)
    
    columns = ['Age_categories', 'Fare_categories', 'Title', 'Cabin_type', 'Sex']
    
    for col in columns:
        df = create_dummies(df, col)
    
    return df

train = preprocessing(train)
holdout = preprocessing(holdout)

However, when I run the above function, a long error message pops up that includes the following text (I’ve included only part of the error message since I can’t copy and paste the whole thing).

KeyError                                  Traceback (most recent call last)
<ipython-input-4-5b9da2c53e03> in <module>()
     29     return df
     30 
---> 31 train = pre_process(train)
     32 holdout = pre_process(holdout)

<ipython-input-4-5b9da2c53e03> in pre_process(df)
     21     df = process_fare(df)
     22     df = process_titles(df)
---> 23     df = process_cabin(df)
     24 
     25     for col in ["Age_categories","Fare_categories",

<ipython-input-3-85f0c13ce57d> in process_cabin(df)
     47     train process_cabin(train)
     48     """
---> 49     df["Cabin_type"] = df["Cabin"].str[0]
     50     df["Cabin_type"] = df["Cabin_type"].fillna("Unknown")
     51     df = df.drop('Cabin',axis=1)

KeyError: 'Cabin'

I can’t figure out what is wrong with the code. I’ve also copied and pasted the code from the solution notebook and the same KeyError message pops up (https://github.com/dataquestio/solutions/blob/master/Mission188Solution.ipynb).

Any help would be very much appreciated.


Hello @Roya, can you also share your notebook?

Hello @Roya,

The error is saying that there is no column named Cabin in the dataframe.

Please confirm that your dataframe has a Cabin column by running print(df.columns).
You may have renamed or deleted it from the dataframe.
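
A minimal sketch of that check, using a small made-up dataframe as a stand-in for train (the two columns and their values are assumptions for illustration only):

```python
import pandas as pd

# Made-up stand-in for the real train dataframe
df = pd.DataFrame({"PassengerId": [1, 2], "Cabin": ["C85", None]})

# List the column names and test for Cabin directly
print(df.columns.tolist())
has_cabin = "Cabin" in df.columns  # membership test against the column index

# After the column is dropped, the same test fails
df_dropped = df.drop("Cabin", axis=1)
has_cabin_after_drop = "Cabin" in df_dropped.columns
```

If the membership test is False, any df["Cabin"] lookup will raise KeyError: 'Cabin'.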

Hi @DishinGoyani,

The dataframe does have the Cabin column before I run the preprocessing function:

Index(['PassengerId', 'Survived', 'Pclass', 'Name', 'Sex', 'Age', 'SibSp', 'Parch', 'Ticket', 'Fare', 'Cabin', 'Embarked'], dtype='object')

After I run the function (which gives the error message), I get the following columns:
Index(['PassengerId', 'Survived', 'Pclass', 'Name', 'Sex', 'Age', 'SibSp', 'Parch', 'Ticket', 'Fare', 'Embarked', 'Age_categories', 'Fare_categories', 'Title', 'Cabin_type', 'Age_categories_Missing', 'Age_categories_Infant', 'Age_categories_Child', 'Age_categories_Teenager', 'Age_categories_Young Adult', 'Age_categories_Adult', 'Age_categories_Senior', 'Fare_categories_0-12', 'Fare_categories_12-50', 'Fare_categories_50-100', 'Fare_categories_100+', 'Title_Master', 'Title_Miss', 'Title_Mr', 'Title_Mrs', 'Title_Officer', 'Title_Royalty', 'Cabin_type_A', 'Cabin_type_B', 'Cabin_type_C', 'Cabin_type_D', 'Cabin_type_E', 'Cabin_type_F', 'Cabin_type_G', 'Cabin_type_T', 'Cabin_type_Unknown', 'Sex_female', 'Sex_male'], dtype='object')

Hi @Roya,

Have you tried using the df.loc[:, "Cabin"] syntax in place of df["Cabin"]? And have you made sure that you are looking for Cabin in the columns, not the rows? Also, if this doesn’t work, could you please include a link to the last page of the Creating a kaggle workflow guided project you are working on? That way we’d be able to take a proper look at your code and help you figure out a solution.

Thanks in advance!


I checked further, and it appears that process_cabin removes the original Cabin column from df.
See the second-to-last line of the function, df = df.drop('Cabin',axis=1):

def process_cabin(df):
    """Process the Cabin column into pre-defined 'bins' 

    Usage
    ------

    train = process_cabin(train)
    """
    df["Cabin_type"] = df["Cabin"].str[0]
    df["Cabin_type"] = df["Cabin_type"].fillna("Unknown")
    df = df.drop('Cabin',axis=1)
    return df

So if you run the preprocessing function twice on the same dataframe, the second run raises the error, because the Cabin column was already removed on the first run.
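
Here is a small sketch that reproduces this, assuming a tiny made-up dataframe in place of the Titanic data. process_cabin has the same logic as the function above; process_cabin_safe is a hypothetical variant (not part of the guided project) that simply skips the work when Cabin is already gone.

```python
import pandas as pd

def process_cabin(df):
    """Same logic as above: derive Cabin_type, then drop Cabin."""
    df["Cabin_type"] = df["Cabin"].str[0]
    df["Cabin_type"] = df["Cabin_type"].fillna("Unknown")
    df = df.drop("Cabin", axis=1)
    return df

def process_cabin_safe(df):
    """Hypothetical variant: return df unchanged if Cabin was already processed."""
    if "Cabin" not in df.columns:
        return df
    return process_cabin(df.copy())

# Made-up stand-in for the real train dataframe
train = pd.DataFrame({"Cabin": ["C85", None, "E46"]})

train = process_cabin(train)       # first run works and drops Cabin
try:
    train = process_cabin(train)   # second run: Cabin no longer exists
    second_run_error = None
except KeyError as err:
    second_run_error = err         # KeyError: 'Cabin'

train = process_cabin_safe(train)  # re-running the guarded version is harmless
cabin_types = train["Cabin_type"].tolist()
```

In a notebook, another option is to re-read the original data in the same cell that calls the preprocessing function, so every run starts from a dataframe that still has the Cabin column.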


@DishinGoyani you’re right! It works now. I must have run it twice by accident.

Yes, I saw that the function removed the Cabin column, but I couldn’t figure out why it still gave an error. Amazing, thanks so much for taking the time to answer my question.


Glad it helped! Please consider liking the post or marking it as a solution if you found it helpful. :slightly_smiling_face:

GUIDELINE #2: Accept and mark answer as Solution

If you find a reply that answers your question satisfactorily, please mark it as Solution. Doing so will help:

  • Other learners, who are searching for the same problem, find the solution faster
  • The Learning Assistant program: by marking the answer as solution, you can directly help the person who helped you.