Alternative filtering in Pandas with 'query'

In the Dataquest community, everyone is somehow familiar
One of the most common tasks when working with data is filtering and selecting data.

We are all used to using Boolean filtering syntax

dataframe = dataframe[dataframe['a'] == 0]

with Pandas. Someone is just beginning to get acquainted with it and use it in work, someone constantly uses it.
Что делать, если нам нужно применить несколько фильтров к различным столбцам, неизвестным заранее? For example, for multi-column data filtering on a dashboard.
You can use a loop by iterating through the columns and values to which the filters will be applied

for column, value in coupe_list:
    data = data[data[column] == value]

Or you can refer to the high-level query method.

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.query.html

filter = " & ".join(["{0} == {1}".format(elem[0], elem[1]) for elem in coupe_list]
dataframe = dataframe.query(filter )

or rewrite the first example

dataframe = dataframe.query('a == 0')

I hope the query method will be as useful to you as it is to me :slight_smile:

4 Likes

Hey @moriturus7.

I agree, query is very useful. I’m actually teaching it in an upcoming “Data Cleaning: Advanced” course that should be released in the coming months, and have started using it a bit more in my own analysis work.

I find that in some cases it is beautifully expressive and succinct, but in other, more complex cases it can be confusing and hard to read. As a result, I’m tending to go back and forth a little. I am a bit torn about the inconsistency of switching between two different patterns, but this is mostly a stylistic concern.

Comparing

for column, value in coupe_list:
    data = data[data[column] == value]

to

filter = " & ".join(["{0} == {1}.format(elem[0], elem[1]) for elem in coupe_list]
dataframe = dataframe.query(filter )

I prefer the first a lot - it’s extremely clear what’s happening in the first, where the second is very difficult to understand. Maybe a nice compromise might be

for column, value in coupe_list:
    data = data.query('column == value')
3 Likes

Hey, joshdq

I’ll wait for your course. :slight_smile:

Yes, my example may not be clear to an inexperienced user, its only advantage is the flexibility gained in just a couple of lines.

Note that in your example, the syntax should be:

data = data.query ("@column == @value")

column and value are variables.

You should also keep in mind that the query is quite slow, as it is a high-level shell compared to other filtering methods, and using the query inside the loop can greatly affect performance