Suppose you want to reference a variable in a query in pandas package in Python. This seems to be a straightforward task but it becomes daunting sometimes. Let’s discuss it with examples in the article below.
Let’s create a sample dataframe having 3 columns and 4 rows. This dataframe is used for demonstration purpose.
import pandas as pd
df = pd.DataFrame({"col1" : range(1,5),
"col2" : ['A A','B B','A A','B B'],
"col3" : ['A A','A A','B B','B B']
})
Filter a value A A
in column col2
In order to do reference of a variable in query, you need to use @
.
Mention Value Explicitly
newdf = df.query("col2 == 'A A'")
Reference Method
myval1 = 'A A'
newdf = df.query("col2 == @myval1")
Instead of filtering value we are referring the column which we want to use for filtering.
myvar1 = 'col2'
newdf2 = df.query("{0} == 'A A'".format(myvar1))
{0} takes a value of variable myvar1.
"{0} == 'A A'".format(myvar1)
"col2 == 'A A'"
Incase you want to pass multiple columns as
Incase you want to pass multiple columns as variables in query. Here we are using columns col2
and col3
.
myvar1 = 'col2'
myvar2 = 'col3'
newdf2 = df.query("{0} == 'A A' & {1} == 'B B'".format(myvar1, myvar2))
"{0} == 'A A' & {1} == 'B B'".format(myvar1, myvar2)
is equivalent to "col2 == 'A A' & col3 == 'B B'"
How to Handle Space in Column Name
Let’s rename column col2
by including a space in between for illustration purpose.
df.rename(columns={'col2':'col 2'}, inplace = True)
By using backticks “ you can pass a column which contains space.
myvar1 = '`col 2`'
newdf = df.query("{0} == 'A A'".format(myvar1))