How to rename columns in Pandas Dataframe – Data Analysis

In this tutorial, we cover various methods to rename columns in a pandas dataframe in Python. Renaming columns is one of the most common data wrangling tasks. If you do not come from a programming background and have worked only in Excel spreadsheets, it may not feel as easy in Python, since in MS Excel you rename a column by simply typing the new name in the cell. If you come from a database background, it is similar to an ALIAS in SQL. Python has a popular data manipulation package called pandas which simplifies these kinds of data operations.

2 Methods to rename columns in Pandas

Pandas offers two simple ways to rename columns: the rename() function and direct assignment to df.columns.

The first step is to install the pandas package if it is not already installed. You can check whether it is installed on your machine by running !pip show pandas in the IPython console. If it is not installed, install it with the command !pip install pandas.
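If you prefer to check from Python itself instead of pip, the short sketch below is one way to do it. The variable name pandas_installed is only illustrative.

```python
import importlib.util

# find_spec returns None when the package cannot be found
pandas_installed = importlib.util.find_spec("pandas") is not None

if pandas_installed:
    import pandas as pd
    # Show which version is available
    print(pd.__version__)
else:
    print("pandas is not installed")
```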

Import Dataset for practice

To import the dataset, we use the read_csv() function from the pandas package.

import pandas as pd
df = pd.read_csv("https://raw.githubusercontent.com/JackyP/testing/master/datasets/nycflights.csv", usecols=range(1,17))
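The usecols=range(1,17) option keeps the columns at positions 1 to 16 and skips the column at position 0. The sketch below shows the same idea on a tiny in-memory CSV. The text in csv_text and the column name row_id are made up for illustration and are not taken from the flights file.

```python
import io
import pandas as pd

# Made-up CSV text; the first column is an extra id we do not want
csv_text = "row_id,year,month,day\n1,2013,1,1\n2,2013,1,2\n"

# Keep only the columns at positions 1, 2 and 3
sample = pd.read_csv(io.StringIO(csv_text), usecols=range(1, 4))

print(list(sample.columns))
```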

To see the column names of a data frame, run the command below:

df.columns
Index(['year', 'month', 'day', 'dep_time', 'dep_delay', 'arr_time',
       'arr_delay', 'carrier', 'tailnum', 'flight', 'origin', 'dest',
       'air_time', 'distance', 'hour', 'minute'],
      dtype='object')

Method I : rename() function

Suppose you want to replace the column name year with years. The code below creates a new dataframe named df2 with the new column name and the same values.

df2 = df.rename(columns={'year':'years'})

If you want to make the change in the same dataframe df, use the option inplace = True.

df.rename(columns={'year':'years'}, inplace = True)

The default is inplace = False, so you need to set this option to True explicitly. To rename multiple columns, add more old-name: new-name pairs to the dictionary, separated by commas.

df.rename(columns={'year':'years', 'month':'months' }, inplace = True)
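One thing to watch with rename() is that a key which matches no column is skipped without any warning. The sketch below uses a small made-up dataframe in place of the flights data to show this, together with the errors="raise" option that reports such typos.

```python
import pandas as pd

# Small made-up frame standing in for the flights data
df = pd.DataFrame({"year": [2013, 2013], "month": [1, 2], "day": [1, 1]})

# rename() returns a new dataframe; df keeps its original names
renamed = df.rename(columns={"year": "years", "month": "months"})

# A misspelled key matches nothing and is ignored by default
unchanged = df.rename(columns={"yaer": "years"})

# errors="raise" turns the same typo into a KeyError
try:
    df.rename(columns={"yaer": "years"}, errors="raise")
    typo_caught = False
except KeyError:
    typo_caught = True
```

If you rename many columns at once, errors="raise" is a cheap way to make sure every name in the dictionary actually exists.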

Method II : dataframe.columns = [list]

You can also assign a list of new column names to df.columns. The list must contain a name for every column, in order, including the columns you are not changing. In the example below we rename the year and month columns.

df.columns = ['years', 'months', 'day', 'dep_time', 'dep_delay', 'arr_time',
       'arr_delay', 'carrier', 'tailnum', 'flight', 'origin', 'dest',
       'air_time', 'distance', 'hour', 'minute']
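Because this method replaces all names at once, the list has to be exactly as long as the number of columns. A minimal sketch on a made-up three-column dataframe shows what happens when it is not:

```python
import pandas as pd

# Made-up frame with three columns
df = pd.DataFrame({"year": [2013], "month": [1], "day": [1]})

try:
    # Only two names for three columns
    df.columns = ["years", "months"]
    mismatch_rejected = False
except ValueError:
    mismatch_rejected = True

# A list with one name per column works
df.columns = ["years", "months", "day"]
```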

Rename columns having pattern

Suppose you want to rename the columns that have an underscore '_' in their names by removing the underscore.

df.columns = df.columns.str.replace('_' , '')

The new column names are as follows. Notice that no column name contains an underscore.

Index(['year', 'month', 'day', 'deptime', 'depdelay', 'arrtime', 'arrdelay',
       'carrier', 'tailnum', 'flight', 'origin', 'dest', 'airtime', 'distance',
       'hour', 'minute'],
      dtype='object')
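Pattern-based renaming can also be done by passing a function to rename(), which is applied to every column name. The sketch below is one possible variation on a made-up dataframe. The regex=False argument is added here so that the underscore is treated as plain text.

```python
import pandas as pd

# Made-up frame with a few of the flight column names
df = pd.DataFrame({"dep_time": [517], "arr_delay": [11], "carrier": ["UA"]})

# Same idea as above, with the pattern treated as literal text
no_underscore = df.columns.str.replace("_", "", regex=False)

# rename() accepts a function that receives each column name
upper = df.rename(columns=str.upper)
title = df.rename(columns=lambda name: name.replace("_", " ").title())
```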

Rename columns by Position

To change the name of a column by position (for example, renaming the first column), use the code below. df.columns[0] refers to the first column.

df.rename(columns={ df.columns[0]: "Col1" }, inplace = True)
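The same trick extends to several positions at once. In the sketch below, new_names_by_position is a hypothetical dictionary from position to new name, and the dataframe is made up for illustration.

```python
import pandas as pd

# Made-up frame with four columns
df = pd.DataFrame({"year": [2013], "month": [1], "day": [1], "carrier": ["UA"]})

# Hypothetical mapping: position of the column -> new name
new_names_by_position = {0: "Col1", 2: "Col3"}

# Translate positions into current names, then rename as usual
mapping = {df.columns[pos]: name for pos, name in new_names_by_position.items()}
df = df.rename(columns=mapping)
```

Note that the renaming still happens by name, so if two columns share the same name, both are renamed.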

Rename columns in sequence

To name the columns with a sequence of numbers, you can build the names with a list comprehension.

df.columns=["Col"+str(i) for i in range(1, 17)]

In the code below, df.shape[1] returns the number of columns in the dataframe, so the code works without hard-coding 17. We need to add 1 because range(1,17) returns 1, 2, 3 through 16 (excluding 17).

df.columns=["Col"+str(i) for i in range(1, df.shape[1] + 1)]
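Once the columns are numbered, the original names are gone, so it can help to keep a lookup of old and new names. The sketch below is one way to do that on a made-up dataframe. The variable old_to_new is only illustrative.

```python
import pandas as pd

# Made-up frame with three columns
df = pd.DataFrame({"year": [2013], "month": [1], "day": [1]})

# Build Col1, Col2, ... for however many columns there are
new_names = [f"Col{i}" for i in range(1, df.shape[1] + 1)]

# Remember which original name each new name replaced
old_to_new = dict(zip(df.columns, new_names))

df.columns = new_names
```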

Add prefix / suffix in column names

To add text before or after the existing column names, use the add_prefix() and add_suffix() methods. Running both lines below one after the other adds both the prefix and the suffix to every name.

df = df.add_prefix('V_')
df = df.add_suffix('_V')
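To see the effect of each method separately, the sketch below applies them to a made-up two-column dataframe. Both methods return a new dataframe and leave the original unchanged.

```python
import pandas as pd

# Made-up frame with two columns
df = pd.DataFrame({"year": [2013], "month": [1]})

prefixed = df.add_prefix("V_")   # V_year, V_month
suffixed = df.add_suffix("_V")   # year_V, month_V

# Chaining both gives names with prefix and suffix
both = df.add_prefix("V_").add_suffix("_V")
```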

How to access columns having space in names

For demonstration purposes, we can add spaces to some column names with df.columns = df.columns.str.replace('_' , ' '). You can then access such a column using the syntax df["columnname"].

df["arr delay"]
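Bracket access is needed here because dot access such as df.arr delay is not valid Python. The sketch below, on a made-up dataframe, also shows backtick quoting inside query(), which is another place where a name with a space needs special handling.

```python
import pandas as pd

# Made-up frame with two of the flight columns
df = pd.DataFrame({"arr_delay": [11, 20], "carrier": ["UA", "AA"]})

# Put a space in the column names, as in the text above
df.columns = df.columns.str.replace("_", " ", regex=False)

# Bracket access works with any column name
delays = df["arr delay"]

# Inside query(), wrap a name containing a space in backticks
late = df.query("`arr delay` > 15")
```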

How to change row names

With the index option, you can rename rows (the index). In the code below, we change the row names 0 and 1 to 'First' and 'Second' in dataframe df by passing a dictionary with the previous row names as keys and the new row names as values.

df.rename(index={0:'First',1:'Second'}, inplace=True)
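After renaming, the new row names can be used with loc to pick rows. The sketch below shows this on a made-up three-row dataframe. Rows that are not in the dictionary keep their old names.

```python
import pandas as pd

# Made-up frame with the default row names 0, 1, 2
df = pd.DataFrame({"year": [2013, 2013, 2013], "month": [1, 2, 3]})

# Rename the first two rows; row 2 is left as it is
df = df.rename(index={0: "First", 1: "Second"})

# Select a row by its new name
first_row = df.loc["First"]
```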