How to Drop Columns from Pandas Dataframe - Data Analysis

In this tutorial, we will cover how to remove one or more columns from a pandas dataframe. Pandas is a Python package that provides many functions for data analysis.

Syntax to Drop Columns

import pandas as pd
new_df = df.drop(['column_name1','column_name2'], axis=1)

In pandas, the drop() function is used to remove column(s) from a dataframe. axis=1 tells pandas to apply the function to columns instead of rows.
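As a minimal sketch of what the axis argument does, the snippet below builds a small dataframe and drops a column in two ways: with axis=1 and with the columns= keyword, which recent pandas versions accept as an equivalent spelling. The dataframe and its column names 'a', 'b' and 'c' are made up for illustration only.

```python
import pandas as pd

# Hypothetical three-column dataframe, used only for illustration
df = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6]})

# axis=1 means the labels in the list refer to columns
by_axis = df.drop(["a"], axis=1)

# columns= is an equivalent, more explicit way to say the same thing
by_keyword = df.drop(columns=["a"])

print(by_axis.columns.tolist())    # ['b', 'c']
print(by_axis.equals(by_keyword))  # True
```

Either form works; columns= can be easier to read because the intent is visible without remembering what axis=1 means.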

Drop Columns from Pandas Dataframe in Python

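Let’s create a sample dataframe for the examples in this tutorial. The code below creates a dataframe with 6 rows and 4 columns named ‘A’ through ‘D’. The values are random, so your numbers will differ from the output shown in this tutorial.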

import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randn(6, 4), columns=list('ABCD'))

The following code removes column ‘A’ from the dataframe named ‘df’ and stores the result in a new dataframe named ‘newdf’.

newdf = df.drop(['A'], axis=1)

          B         C         D
0 -1.656038  1.655995 -1.413243
1  0.710933 -1.335381  0.832619
2 -0.411327  0.098119  0.768447
3 -0.093217  1.077528  0.196891
4  0.302687  0.125881 -0.665159
5 -0.692847 -1.463154 -0.707779

#Check columns in newdf after dropping column A
newdf.columns

# Output
# Index(['B', 'C', 'D'], dtype='object')
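It may help to see that drop() returns a new dataframe and leaves the original untouched. The sketch below repeats the example above and then shows the inplace=True option, which modifies the dataframe itself instead of returning a copy.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(6, 4), columns=list("ABCD"))

# drop() returns a new dataframe; df still has all four columns
newdf = df.drop(["A"], axis=1)
print(df.columns.tolist())     # ['A', 'B', 'C', 'D']
print(newdf.columns.tolist())  # ['B', 'C', 'D']

# inplace=True changes df itself and returns None
df.drop(["A"], axis=1, inplace=True)
print(df.columns.tolist())     # ['B', 'C', 'D']
```

Assigning the result to a new name, as in the first form, is usually the safer choice because the original data stays available.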

Remove Multiple Columns in Python

You can put all the columns you want to remove in a list and pass it to the drop() function.

Method I

df2 = df.drop(['B','C'], axis=1)

Method II

cols = ['B','C']
df2 = df.drop(cols, axis=1)
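When the list may contain a name that is not in the dataframe, drop() raises a KeyError by default. The sketch below shows this and the errors='ignore' option, which skips missing names. The column name 'Q' is a made-up example of a name that does not exist in df.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(6, 4), columns=list("ABCD"))

cols = ["B", "Q"]  # 'Q' is hypothetical and not a column of df

# Default behaviour: a missing label raises KeyError
try:
    df.drop(cols, axis=1)
except KeyError as exc:
    print("KeyError:", exc)

# errors='ignore' drops the labels that exist and skips the rest
df2 = df.drop(cols, axis=1, errors="ignore")
print(df2.columns.tolist())  # ['A', 'C', 'D']
```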

Dropping Columns by Column Position

You can find the name of the first column with df.columns[0]. Indexing in Python starts from 0.

df.drop(df.columns[0], axis=1)

To drop multiple columns by position (for example, the first and third columns), specify the positions in a list such as [0,2].

cols = [0,2]
df.drop(df.columns[cols], axis=1)
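The position-based approach first looks up the column names at the given positions and then drops by name. The sketch below makes that intermediate step visible. It also shows an alternative that selects the remaining positions with iloc; this is an illustration, not the only way to do it.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(6, 4), columns=list("ABCD"))

cols = [0, 2]

# df.columns[cols] turns positions into column labels
labels = df.columns[cols]
print(labels.tolist())  # ['A', 'C']

dropped = df.drop(labels, axis=1)
print(dropped.columns.tolist())  # ['B', 'D']

# Alternative: keep every position that is not in cols
keep = [i for i in range(df.shape[1]) if i not in cols]
by_position = df.iloc[:, keep]
print(by_position.columns.tolist())  # ['B', 'D']
```

The iloc form works purely on positions, which can matter if a dataframe has repeated column names, since dropping by label removes every column carrying that label.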

Dropping Columns by Name Pattern

df = pd.DataFrame({"X1":range(1,6),"X_2":range(2,7),"YX":range(3,8),"Y_1":range(2,7),"Z":range(5,10)})


   X1  X_2  YX  Y_1  Z
0   1    2   3    2  5
1   2    3   4    3  6
2   3    4   5    4  7
3   4    5   6    5  8
4   5    6   7    6  9

Dropping Columns Starting with ‘X’

df.loc[:,~df.columns.str.contains('^X')]

How It Works

^X is a regular expression that matches column names beginning with the letter ‘X’.
df.columns.str.contains('^X') returns the array [True, True, False, False, False]: True where the condition is met, False otherwise.
The ~ sign negates the condition.
df.loc[ ] is used to select the columns for which the negated condition is True.

It can also be written as:

df.drop(df.columns[df.columns.str.contains('^X')], axis=1)
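To make the steps above concrete, the sketch below prints the boolean mask and the columns that remain after filtering. It also shows str.startswith as a possible alternative when no regular expression is needed.

```python
import pandas as pd

df = pd.DataFrame({"X1": range(1, 6), "X_2": range(2, 7), "YX": range(3, 8),
                   "Y_1": range(2, 7), "Z": range(5, 10)})

# True for column names that begin with 'X'
mask = df.columns.str.contains("^X")
print(mask.tolist())  # [True, True, False, False, False]

# ~mask flips the flags, so only the other columns are kept
result = df.loc[:, ~mask]
print(result.columns.tolist())  # ['YX', 'Y_1', 'Z']

# Alternative without a regular expression
alt = df.loc[:, ~df.columns.str.startswith("X")]
print(alt.columns.tolist())  # ['YX', 'Y_1', 'Z']
```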

Other Examples

#Removing columns whose name contains string 'X'
df.loc[:,~df.columns.str.contains('X')]

#Removing columns whose name contains either 'X' or 'Y'
df.loc[:,~df.columns.str.contains('X|Y')]

#Removing columns whose name ends with string 'X'
df.loc[:,~df.columns.str.contains('X$')]
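As a quick check of the three patterns above, the sketch below applies each one to the sample dataframe and prints the columns that remain.

```python
import pandas as pd

df = pd.DataFrame({"X1": range(1, 6), "X_2": range(2, 7), "YX": range(3, 8),
                   "Y_1": range(2, 7), "Z": range(5, 10)})

# Name contains 'X' anywhere: X1, X_2 and YX are removed
no_x = df.loc[:, ~df.columns.str.contains("X")]
print(no_x.columns.tolist())  # ['Y_1', 'Z']

# Name contains 'X' or 'Y': only Z is left
no_x_or_y = df.loc[:, ~df.columns.str.contains("X|Y")]
print(no_x_or_y.columns.tolist())  # ['Z']

# Name ends with 'X': only YX is removed
no_x_end = df.loc[:, ~df.columns.str.contains("X$")]
print(no_x_end.columns.tolist())  # ['X1', 'X_2', 'Y_1', 'Z']
```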

Dropping Columns with Missing Values Greater than 50%

df = pd.DataFrame({'A':[1,3,np.nan,5,np.nan],
                   'B':[4,np.nan,np.nan,5,np.nan]
                   })

The share of missing values in each column can be calculated as the mean of the True/False flags returned by isnull(). Multiply it by 100 to get a percentage.

cols = df.columns[df.isnull().mean()>0.5]
df.drop(cols, axis=1)
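The sketch below shows the intermediate values for this example, so it is easier to see why only column B is dropped. The variable name missing_share is introduced here for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1, 3, np.nan, 5, np.nan],
                   "B": [4, np.nan, np.nan, 5, np.nan]})

# isnull() gives True/False per cell; the column mean is the missing share
missing_share = df.isnull().mean()
print(missing_share)  # A is 0.4 (2 of 5 missing), B is 0.6 (3 of 5 missing)

# Only columns with more than 50% missing values are selected
cols = df.columns[missing_share > 0.5]
print(cols.tolist())  # ['B']

result = df.drop(cols, axis=1)
print(result.columns.tolist())  # ['A']
```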
