In this tutorial, we will cover how to remove one or more columns from a pandas dataframe. Pandas is a Python package that provides data structures and functions for data analysis.
Syntax to Drop Columns
import pandas as pd
new_df = df.drop(['column_name1','column_name2'], axis=1)
In pandas, the drop() function removes one or more columns from a dataframe. axis=1 tells pandas to apply the function to columns instead of rows. drop() returns a new dataframe and leaves df unchanged.
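As a quick illustration of the syntax above, here is a minimal sketch on a small made-up dataframe. The names demo, qty, price, and note are hypothetical and not part of this tutorial. It shows that drop() returns a new dataframe without touching the original, and that pandas 0.21 and later also accept a columns= keyword in place of axis=1.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration
demo = pd.DataFrame({"qty": [1, 2], "price": [9.5, 3.0], "note": ["a", "b"]})

# Drop two columns by label; axis=1 means "look for these labels in the columns"
trimmed = demo.drop(["price", "note"], axis=1)

# Equivalent call using the columns= keyword instead of axis=1
trimmed_kw = demo.drop(columns=["price", "note"])

print(list(trimmed.columns))        # ['qty']
print(list(demo.columns))           # ['qty', 'price', 'note'] (original unchanged)
print(trimmed.equals(trimmed_kw))   # True
```

Assigning the result to a new name, as done here, keeps the original dataframe available for later steps.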
Drop Columns from Pandas Dataframe in Python
Let’s create a sample dataframe to use in the examples in this tutorial. The code below creates a dataframe with 6 rows and 4 columns named ‘A’ through ‘D’.
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randn(6, 4), columns=list('ABCD'))
The following code removes column ‘A’ from the dataframe named ‘df’ and stores the result in a new dataframe named ‘newdf’. Because the data is random, the values you see will differ.
newdf = df.drop(['A'], axis=1)
          B         C         D
0 -1.656038  1.655995 -1.413243
1  0.710933 -1.335381  0.832619
2 -0.411327  0.098119  0.768447
3 -0.093217  1.077528  0.196891
4  0.302687  0.125881 -0.665159
5 -0.692847 -1.463154 -0.707779
#Check columns in newdf after dropping column A
newdf.columns
# Output
# Index(['B', 'C', 'D'], dtype='object')
Remove Multiple Columns in Python
You can specify all the columns you want to remove in a list and pass it to the drop() function.
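One detail worth knowing when passing a list: by default, drop() raises a KeyError if any label in the list is not found. The sketch below uses a small hypothetical dataframe (demo, with a deliberately missing label 'Q') to show the default behavior and the errors='ignore' option, which skips labels that do not exist.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration
demo = pd.DataFrame({"B": [1, 2], "C": [3, 4], "D": [5, 6]})

# 'Q' is not a column, so the default errors='raise' fails with KeyError
try:
    demo.drop(["B", "Q"], axis=1)
    raised = False
except KeyError:
    raised = True

# errors='ignore' drops the labels that exist and silently skips the rest
kept = demo.drop(["B", "Q"], axis=1, errors="ignore")

print(raised)               # True
print(list(kept.columns))   # ['C', 'D']
```

Note that errors='ignore' can also hide typos in column names, so it is probably best reserved for cases where a column is genuinely optional.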
Method I
df2 = df.drop(['B','C'], axis=1)
Method II
cols = ['B','C']
df2 = df.drop(cols, axis=1)
Dropping Columns by Column Position
You can find the name of the first column with df.columns[0]. Indexing in Python starts from 0.
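To make the position-to-name lookup concrete, here is a small sketch on hypothetical dataframes (demo and dup are made-up names). It shows what df.columns returns for a single position and for a list of positions. It also shows a caveat: drop() works on labels, so if a column name is duplicated, every column sharing that name is removed.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration
demo = pd.DataFrame([[1, 2, 3], [4, 5, 6]], columns=["A", "B", "C"])

first_name = demo.columns[0]           # a single label
picked = list(demo.columns[[0, 2]])    # labels at positions 0 and 2

print(first_name)   # A
print(picked)       # ['A', 'C']

# Caveat: with duplicated names, dropping "the first column" by its label
# removes every column that carries that label
dup = pd.DataFrame([[1, 2, 3], [4, 5, 6]], columns=["A", "B", "A"])
after_drop = dup.drop(dup.columns[0], axis=1)

print(list(after_drop.columns))   # ['B']
```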
df.drop(df.columns[0], axis=1)
To drop multiple columns by position (for example, the first and third columns), specify the positions in a list such as [0,2].
cols = [0,2]
df.drop(df.columns[cols], axis=1)
Dropping Columns by Name Pattern
df = pd.DataFrame({"X1":range(1,6),"X_2":range(2,7),"YX":range(3,8),"Y_1":range(2,7),"Z":range(5,10)})
   X1  X_2  YX  Y_1  Z
0   1    2   3    2  5
1   2    3   4    3  6
2   3    4   5    4  7
3   4    5   6    5  8
4   5    6   7    6  9
Dropping Columns Starting with ‘X’
df.loc[:,~df.columns.str.contains('^X')]
How does it work?
- ^X is a regular expression that matches names beginning with the letter ‘X’.
- df.columns.str.contains('^X') returns the array [True, True, False, False, False]: True where the condition is met, False otherwise.
- The ~ sign negates the condition.
- df.loc[ ] is used to select columns.
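The steps above can be checked one at a time. The sketch below rebuilds the sample dataframe from this section and prints the intermediate mask. The names mask, kept, and same are introduced here only for illustration. The last lines show str.startswith('X') as a non-regex way to build the same mask, which may be easier to read when no pattern matching is needed.

```python
import pandas as pd

df = pd.DataFrame({"X1": range(1, 6), "X_2": range(2, 7), "YX": range(3, 8),
                   "Y_1": range(2, 7), "Z": range(5, 10)})

# Step 1: boolean mask, True for names that begin with 'X'
mask = df.columns.str.contains("^X")
print(mask.tolist())         # [True, True, False, False, False]

# Step 2: negate the mask and use it to select the remaining columns
kept = df.loc[:, ~mask]
print(list(kept.columns))    # ['YX', 'Y_1', 'Z']

# A non-regex alternative that yields the same mask for this pattern
same = df.columns.str.startswith("X")
print((mask == same).all())  # True
```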
The same selection can also be written with drop():
df.drop(df.columns[df.columns.str.contains('^X')], axis=1)
Other Examples
#Removing columns whose name contains the string 'X'
df.loc[:,~df.columns.str.contains('X')]
#Removing columns whose name contains either 'X' or 'Y'
df.loc[:,~df.columns.str.contains('X|Y')]
#Removing columns whose name ends with 'X'
df.loc[:,~df.columns.str.contains('X$')]
Dropping Columns with Missing Values Greater than 50%
df = pd.DataFrame({'A':[1,3,np.nan,5,np.nan],
                   'B':[4,np.nan,np.nan,5,np.nan]
                   })
The share of missing values in each column can be calculated as the mean of the True/False indicators returned by isnull().
cols = df.columns[df.isnull().mean()>0.5]
df.drop(cols, axis=1)
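To see why only one column is removed, the sketch below repeats the example and prints the intermediate share of missing values per column. The names missing_share and result are introduced here for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1, 3, np.nan, 5, np.nan],
                   "B": [4, np.nan, np.nan, 5, np.nan]})

# isnull() gives True/False per cell; the column mean is the fraction missing
missing_share = df.isnull().mean()
print(missing_share.to_dict())   # {'A': 0.4, 'B': 0.6}

# Keep only the names of columns where more than half the values are missing
cols = df.columns[missing_share > 0.5]
result = df.drop(cols, axis=1)
print(list(result.columns))      # ['A']
```

Column ‘A’ has 2 of 5 values missing (40%), which is below the threshold, so it is kept. Column ‘B’ has 3 of 5 missing (60%) and is dropped.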