Select rows with condition

I’m trying to clean a dataframe and I can’t remember how to select rows with a condition.

As you can see in the image below, I have a column ‘Lot Cond. Code’. Values can be ‘B’, ‘E’, ‘S’, ‘D’, ‘X’. I want to select rows of code ‘B’, ‘S’ and ‘X’. How do i do that??

I remember that i can use this code:

df[df[‘Lot Cond. Code’] == ‘B’]

but that just select rows with ‘B’ value, how do I also select rows with ‘S’ and ‘X’?? I want to drop all columns that doesn’t meet the criteria.

TIA!

image

You can use Series.isin() for this.

It’s great that you are working on your own projects. It can be challenging to do so by yourself. One small advice I would give you is to get comfortable with googling for such problems. Most of the times you will also find helpful Stackoverflow posts that provide the best possible solution. One such example related to your post - https://stackoverflow.com/questions/36410075/select-rows-from-a-dataframe-based-on-multiple-values-in-a-column-in-pandas

Good luck!

1 Like

Thanks!! I hadn’t seen that post in stackoverflow but did try Series.isin() but returned an empty DF. Now I tried with just one condition to eliminate the possibility of wrongly using Series.isin() and also returns an empty df, so that means that I’m having a different issue than I originally thought…

If I use this code:

df[df[['Sale Title Type'] == 'S']

I get an empty DF when there are rows with that value in the column as you can see here:

copart['Sale Title Type'].value_counts()

 S    1210
 B     991
 N     718
 U     304
 M     118
 X      79
 Q      72
 O      20
 C      16
 D      13
Name: Sale Title Type, dtype: int64

What could be causing this?? I thought that maybe the column type was wrong, so I checked that and the type for column ‘Sale Title Type’ is ‘object’.

copart.dtypes

Yard name                        object
Sale Date DD/MM/YYYY             object
Lot number                        int64
Vehicle Type                     object
Year                              int64
Make                             object
Model Group                      object
Model Detail                     object
Body Style                       object
Color                            object
Damage Description               object
Secondary Damage                 object
Sale Title State                 object
Sale Title Type                  object
Has Keys-Yes or No               object
Lot Cond. Code                   object
VIN                              object
Odometer                        float64
Lot ACV                         float64
 Repair cost                    float64
Engine                           object
Drive                            object
Transmission                     object
Fuel Type                        object
Runs/Drives                      object
Sale Status                      object
High Bid =non-vix Sealed=Vix     object
Special Note                     object
Location Postcode                object
Location country                 object
Make-an-Offer Eligible           object
Buy-It-Now Price                float64
Last Updated Time                object
dtype: object

Thank you for your time!.

It might be better if you shared the exact code you used.

Your dataframe is copart but you refer to df above. And the above would likely throw an error because there’s an additional [ in there as well.

It would also be helpful if you shared the code for your isin() use.

Unfortunately, since this is your own project, any specific help is difficult to provide without the notebook and the data, so you might have to experiment with the suggestions.

Yes, sorry for not being clear. I tried to copy all the code:

copart['Sale Title Type'].value_counts()

 S    1210
 B     991
 N     718
 U     304
 M     118
 X      79
 Q      72
 O      20
 C      16
 D      13
Name: Sale Title Type, dtype: int64

copart[copart['Sale Title Type'] == 'S']

|Yard name|Sale Date DD/MM/YYYY|Lot number|Vehicle Type|Year|Make|Model Group|Model Detail|Body Style|Color|Damage Description|Secondary Damage|Sale Title State|Sale Title Type|Has Keys-Yes or No|Lot Cond. Code|VIN|Odometer|Lot ACV|Repair cost|Engine|Drive|Transmission|Fuel Type|Runs/Drives|Sale Status|High Bid =non-vix Sealed=Vix|Special Note|Location Postcode|Location country|Make-an-Offer Eligible|Buy-It-Now Price|Last Updated Time|

copart.dtypes

Yard name                        object
Sale Date DD/MM/YYYY             object
Lot number                        int64
Vehicle Type                     object
Year                              int64
Make                             object
Model Group                      object
Model Detail                     object
Body Style                       object
Color                            object
Damage Description               object
Secondary Damage                 object
Sale Title State                 object
Sale Title Type                  object
Has Keys-Yes or No               object
Lot Cond. Code                   object
VIN                              object
Odometer                        float64
Lot ACV                         float64
 Repair cost                    float64
Engine                           object
Drive                            object
Transmission                     object
Fuel Type                        object
Runs/Drives                      object
Sale Status                      object
High Bid =non-vix Sealed=Vix     object
Special Note                     object
Location Postcode                object
Location country                 object
Make-an-Offer Eligible           object
Buy-It-Now Price                float64
Last Updated Time                object
dtype: object

Based on what you have provided so far, there is nothing that seems out of the ordinary to explain why this might be happening.

The only thing I can think of is that the S you have in your dataset might not be exactly the same S as normal characters. Perhaps the dataset was encoded in a way that some characters are affected by it.

Trying to load the dataset with a specific encoding might help here, but I can’t be sure since I haven’t faced such an issue before.

Or there is some empty space around that S.

That’s the best I can think of right now.

If you want, share your Notebook and dataset via a Github repo link. I can try to test it out at some point, but can’t guarantee anything yet.

I got it! There was a whitespace before the S
There were whitespaces on other columns so I did this:

copart_obj = copart.select_dtypes(['object'])
copart[copart_obj.columns] = copart_obj.apply(lambda x: x.str.strip())

And everything is working perfect now.

Thank you for your inputs!

1 Like