Python

How to use variable in a query in pandas – Data Analysis

How to use variable in a query in pandas – Data Analysis

Suppose you want to reference a variable in a query in pandas package in Python. This seems to be a straightforward task but it becomes daunting sometimes. Let’s discuss it with examples in the article below.

Let’s create a sample dataframe having 3 columns and 4 rows. This dataframe is used for demonstration purpose.

import pandas as pd
df = pd.DataFrame({"col1" : range(1,5),
                   "col2" : ['A A','B B','A A','B B'],
                   "col3" : ['A A','A A','B B','B B']
                   })

Filter a value A A in column col2

In order to do reference of a variable in query, you need to use @.

Mention Value Explicitly

newdf = df.query("col2 == 'A A'")

Reference Method

myval1 = 'A A'
newdf = df.query("col2 == @myval1")

Instead of filtering value we are referring the column which we want to use for filtering.

myvar1 = 'col2'
newdf2 = df.query("{0} == 'A A'".format(myvar1))

{0} takes a value of variable myvar1.

"{0} == 'A A'".format(myvar1)
"col2 == 'A A'"

Incase you want to pass multiple columns as

Incase you want to pass multiple columns as variables in query. Here we are using columns col2 and col3.

myvar1 = 'col2'
myvar2 = 'col3'
newdf2 = df.query("{0} == 'A A' & {1} == 'B B'".format(myvar1, myvar2))

"{0} == 'A A' & {1} == 'B B'".format(myvar1, myvar2) is equivalent to "col2 == 'A A' & col3 == 'B B'"

How to Handle Space in Column Name

Let’s rename column col2 by including a space in between for illustration purpose.

df.rename(columns={'col2':'col 2'}, inplace = True)

By using backticks “ you can pass a column which contains space.

myvar1 = '`col 2`'
newdf = df.query("{0} == 'A A'".format(myvar1))
Suggested Articles

Leave a Reply

Your email address will not be published. Required fields are marked *