How to Drop Columns from Pandas Dataframe – Data Analysis

In this tutorial, we will cover how to remove one or more columns from a pandas dataframe. pandas is a Python package that provides many functions for data analysis.

Syntax to Drop Columns

import pandas as pd
new_df = df.drop(['column_name1','column_name2'], axis=1)

In pandas, the drop( ) function removes column(s) from a dataframe. axis=1 tells pandas to apply the operation to columns instead of rows. drop( ) returns a new dataframe and leaves the original one unchanged.
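
As a minimal sketch of the syntax above, the same call can also be written with the columns= keyword of drop( ), which avoids having to remember the axis number. The small frame and the column name 'keep_me' are made up for illustration.

```python
import pandas as pd

# Hypothetical sample frame; the column names are illustrative only.
df = pd.DataFrame({
    "column_name1": [1, 2],
    "column_name2": [3, 4],
    "keep_me": [5, 6],
})

# Labels plus axis=1 ...
new_df = df.drop(["column_name1", "column_name2"], axis=1)

# ... gives the same result as naming the columns via columns=.
same_df = df.drop(columns=["column_name1", "column_name2"])

print(list(new_df.columns))    # ['keep_me']
print(new_df.equals(same_df))  # True
```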

Drop Columns from Pandas Dataframe in Python

Let’s create a sample dataframe to use in the examples in this tutorial. The code below creates a dataframe with 6 rows of random numbers and 4 columns named ‘A’ through ‘D’.

import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randn(6, 4), columns=list('ABCD'))

The following code removes column ‘A’ from the dataframe named ‘df’ and stores the result in a new dataframe named ‘newdf’.

newdf = df.drop(['A'], axis=1)
newdf

# Output
          B         C         D
0 -1.656038  1.655995 -1.413243
1  0.710933 -1.335381  0.832619
2 -0.411327  0.098119  0.768447
3 -0.093217  1.077528  0.196891
4  0.302687  0.125881 -0.665159
5 -0.692847 -1.463154 -0.707779

# Check columns in newdf after dropping column A
newdf.columns

# Output
# Index(['B', 'C', 'D'], dtype='object')
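
One point worth checking is that drop( ) does not modify the original dataframe. The sketch below uses a separate frame named demo (a name chosen here for illustration) so that it does not interfere with df in the rest of the tutorial.

```python
import numpy as np
import pandas as pd

# 'demo' is an illustrative stand-in for the tutorial's df.
demo = pd.DataFrame(np.random.randn(6, 4), columns=list("ABCD"))

dropped = demo.drop(["A"], axis=1)

# The original frame still has all four columns.
print(list(demo.columns))     # ['A', 'B', 'C', 'D']
print(list(dropped.columns))  # ['B', 'C', 'D']

# To change the original name, reassign the result to it.
demo = demo.drop(["A"], axis=1)
print(list(demo.columns))     # ['B', 'C', 'D']
```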

Remove Multiple Columns in Python

You can put all the columns you want to remove in a list and pass that list to the drop( ) function.

Method I

df2 = df.drop(['B','C'], axis=1)

Method II

cols = ['B','C']
df2 = df.drop(cols, axis=1)
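
If the list may contain a name that is not in the dataframe, drop( ) raises a KeyError by default. A minimal sketch of how the errors='ignore' option of drop( ) handles that case follows; the frame demo and the missing column 'Z' are assumptions made for the example.

```python
import pandas as pd

# Illustrative frame; 'Z' is deliberately not one of its columns.
demo = pd.DataFrame({"A": [1, 2], "B": [3, 4], "C": [5, 6]})
cols = ["B", "C", "Z"]

# Default behaviour: a missing label raises KeyError.
try:
    demo.drop(cols, axis=1)
    raised = False
except KeyError:
    raised = True
print(raised)  # True

# errors='ignore' drops the labels that exist and skips the rest.
df2 = demo.drop(cols, axis=1, errors="ignore")
print(list(df2.columns))  # ['A']
```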

Dropping Columns by Column Position

You can get the name of the first column with df.columns[0]. Indexing in Python starts from 0, so passing that name to drop( ) removes the first column.

df.drop(df.columns[0], axis=1)

To drop multiple columns by position (here the first and third columns), specify the positions in a list, [0,2].

cols = [0,2]
df.drop(df.columns[cols], axis=1)
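
To make the position lookup concrete, here is a small sketch on a frame with fixed values (the frame demo is made up for illustration). It shows which names the positions resolve to before they are dropped.

```python
import pandas as pd

# Illustrative frame with four columns in a known order.
demo = pd.DataFrame({"A": [1, 2], "B": [3, 4], "C": [5, 6], "D": [7, 8]})

cols = [0, 2]

# Positions are first translated into column names ...
print(list(demo.columns[cols]))   # ['A', 'C']

# ... and those names are what drop() actually removes.
by_position = demo.drop(demo.columns[cols], axis=1)
print(list(by_position.columns))  # ['B', 'D']
```

Because the positions are converted to names first, a dataframe with duplicate column names would lose every column that shares a dropped name, not only the one at the given position.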

Dropping Columns by Name Pattern

df = pd.DataFrame({"X1":range(1,6),"X_2":range(2,7),"YX":range(3,8),"Y_1":range(2,7),"Z":range(5,10)})

   X1  X_2  YX  Y_1  Z
0   1    2   3    2  5
1   2    3   4    3  6
2   3    4   5    4  7
3   4    5   6    5  8
4   5    6   7    6  9

Dropping Columns Starting with ‘X’

df.loc[:,~df.columns.str.contains('^X')]

How does it work?

  1. ^X is a regular expression that matches column names beginning with the letter ‘X’.
  2. df.columns.str.contains('^X') returns the array [True, True, False, False, False]: True where the name matches, False otherwise.
  3. The ~ sign negates the condition.
  4. df.loc[ ] then selects the columns for which the negated condition is True, that is, the columns that do not start with ‘X’.

It can also be written as:

df.drop(df.columns[df.columns.str.contains('^X')], axis=1)
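
The steps above can be traced one at a time, as in this sketch. It rebuilds the same sample dataframe and, as an alternative not used in the original, also shows str.startswith, which does plain prefix matching without a regular expression.

```python
import pandas as pd

df = pd.DataFrame({"X1": range(1, 6), "X_2": range(2, 7), "YX": range(3, 8),
                   "Y_1": range(2, 7), "Z": range(5, 10)})

# Step 2: boolean mask, True where the name starts with 'X'.
mask = df.columns.str.contains("^X")
print(mask.tolist())        # [True, True, False, False, False]

# Steps 3 and 4: negate the mask and select the remaining columns.
kept = df.loc[:, ~mask]
print(list(kept.columns))   # ['YX', 'Y_1', 'Z']

# Alternative: prefix match without regex gives the same result.
kept_prefix = df.loc[:, ~df.columns.str.startswith("X")]
print(kept.equals(kept_prefix))  # True
```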

Other Examples

# Remove columns whose names contain the string 'X'
df.loc[:,~df.columns.str.contains('X')]

# Remove columns whose names contain either 'X' or 'Y'
df.loc[:,~df.columns.str.contains('X|Y')]

# Remove columns whose names end with the string 'X'
df.loc[:,~df.columns.str.contains('X$')]
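
Since the three examples differ only in the pattern, they can be wrapped in a small helper. The function name drop_matching is hypothetical and not part of pandas; the sketch simply reuses the df.loc[ ] expression above and prints which columns survive each pattern.

```python
import pandas as pd

df = pd.DataFrame({"X1": range(1, 6), "X_2": range(2, 7), "YX": range(3, 8),
                   "Y_1": range(2, 7), "Z": range(5, 10)})


def drop_matching(frame, pattern):
    """Hypothetical helper: keep only columns whose names do not match the regex."""
    return frame.loc[:, ~frame.columns.str.contains(pattern)]


no_x = drop_matching(df, "X")         # names containing 'X'
no_x_or_y = drop_matching(df, "X|Y")  # names containing 'X' or 'Y'
no_x_end = drop_matching(df, "X$")    # names ending with 'X'

print(list(no_x.columns))       # ['Y_1', 'Z']
print(list(no_x_or_y.columns))  # ['Z']
print(list(no_x_end.columns))   # ['X1', 'X_2', 'Y_1', 'Z']
```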

Dropping Columns with Missing Values Greater than 50%

df = pd.DataFrame({'A':[1,3,np.nan,5,np.nan],
                   'B':[4,np.nan,np.nan,5,np.nan]
                   })

The share of missing values in each column can be calculated as the mean of df.isnull() for that column, since True counts as 1 and False as 0.

cols = df.columns[df.isnull().mean()>0.5]
df.drop(cols, axis=1)
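
For the sample dataframe above, the calculation works out as follows. This sketch only adds intermediate variables (missing_share and result are names chosen here) so that each step can be inspected.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1, 3, np.nan, 5, np.nan],
                   "B": [4, np.nan, np.nan, 5, np.nan]})

# Share of missing values per column: A has 2 of 5, B has 3 of 5.
missing_share = df.isnull().mean()
print(missing_share.tolist())  # [0.4, 0.6]

# Only column B exceeds the 50% threshold, so only B is dropped.
cols = df.columns[missing_share > 0.5]
result = df.drop(cols, axis=1)
print(list(result.columns))    # ['A']
```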