This tutorial covers how list comprehension works in Python. It includes many examples which would help you to familiarize the concept and you should be able to implement it in your live project at the end of this lesson.
What is list comprehension?
Python is an object oriented programming language. Almost everything in them is treated consistently as an object. Python also features functional programming which is very similar to mathematical way of approaching problem where you assign inputs in a function and you get the same output with same input value. Given a function f(x) = x2
, f(x) will always return the same result with the same x value. The function has no “side-effect” which means an operation has no effect on a variable/object that is outside the intended usage. “Side-effect” refers to leaks in your code which can modify a mutable data structure or variable.
Functional programming is also good for parallel computing as there is no shared data or access to the same variable.
List comprehension is a part of functional programming which provides a crisp way to create lists without writing a for loop.
In the image above, the for clause
iterates through each item of list. if clause
filters list and returns only those items where filter condition meets. if clause is optional so you can ignore it if you don’t have conditional statement.
[i**3 for i in [1,2,3,4] if i>2]
means take item one by one from list [1,2,3,4] iteratively and then check if it is greater than 2. If yes, it takes cube of it. Otherwise ignore the value if it is less than or equal to 2. Later it creates a list of cube of values 3 and 4. **Output :** [27, 64]
List Comprehension vs. For Loop vs. Lambda + map()
All these three have different programming styles of iterating through each element of list but they serve the same purpose or return the same output. There are some differences between them as shown below.
- List comprehension is more readable than For Loop and Lambda function.
List Comprehension
[i**2 for i in range(2,10)]
For Loop
sqr = []
for i in range(2,10):
sqr.append(i**2)
sqr
Lambda + Map
list(map(lambda i: i**2, range(2, 10)))
Output
[4, 9, 16, 25, 36, 49, 64, 81]
List comprehension is performing a loop operation and then combines items to a list in just a single line of code. It is more understandable and clearer than for loop and lambda.
range(2,10)
returns 2 through 9 (excluding 10).
*2
refers to square (number raised to power of 2).sqr = []
creates empty list.append( )
function stores output of each repetition of sequence (i.e. square value) in for loop.
map( )
applies the lambda function to each item of iterable (list). Wrap it in list( )
to generate list as output
- List comprehension is slightly faster than For Loop and Lambda function.
Here we are measuring the code execution time of these 3 methods. We are calculating square of values from 1 through 10 millions one by one only when a value is an even number (divisible by 2).
If statement if x%2 == 0
checks whether a number is even or not. For example, 5 % 2
returns 1 which is remainder. If remainder is 0, it means it is an even number.
List Comprehension
l1= [x**2 for x in range(1, 10**7) if x % 2 == 0]
# Processing Time : 3.96 seconds
For Loop
sqr = []
for x in range(1, 10**7):
if x%2 == 0:
sqr.append(x**2)
# Processing Time : 5.46 seconds
Lambda + Map()
l0 = list(map(lambda x: x**2, filter(lambda x: x%2 == 0, range(1, 10**7))))
# Processing Time : 5.32 seconds
filter(lambda x: x%2 == 0, range(1, 10**7))
returns even numbers from 1 through (10 raised to power 7) as filter() function is used to subset items from the list. Roughly you can think of filter() as WHERE
clause of SQL.
List Comprehension : IF-ELSE
Here we are telling python to convert text of each item of list to uppercase letters if length of string is greater than 4. Otherwise, convert text to lowercase. upper( )
converts string to uppercase. Similarly, you can use lower( )
function for transforming string to lowercase.
List Comprehension
mylist = ['Dave', 'Micheal', 'Deeps']
[x.upper() if len(x)>4 else x.lower() for x in mylist]
For Loop
k = []
for x in mylist:
if len(x) > 4:
k.append(x.upper())
else:
k.append(x.lower())
k
Filtering Dictionary using List Comprehension
Suppose you have a dictionary and you want to select particular keys and specific values. In short you want to apply conditional statement IF
to subset or filter dictionary.
d = {'a': [1,2,1], 'b': [3,4,1], 'c': [5,6,2]}
Filter dictionary by values
Here we are selecting values where b is greater than 2.
[x for x in d['b'] if x >2]
Output
[(3, 4)]
Filter dictionary by multiple conditions
Here we are applying condition – select only those values where a is equal to 1 and b is greater than 1.
[(x,y) for x, y in zip(d['a'],d['b']) if x == 1 and y > 1]
Output
[(1, 3)]
In the above program, x refers to d[‘a’] and y refers to d[‘b’].
Filter dictionary where all values are greater than 1
all( )
checks condition(s) for all values in the list. It returns True/False.
k
refers to keys and v
refers to values of dictionary.
[(k,v) for k,v in d.items() if all(x > 1 for x in v) ]
Output
[('c', [5, 6, 2])]
Only key c
contains values which all are greater than 1. Similarly we have any( )
function which returns True if any of the value meets condition.
[(k,v) for k,v in d.items() if any(x > 2 for x in v) ]
Output
[('b', [3, 4, 1]), ('c', [5, 6, 2])]
Using List Comprehension on Pandas DataFrame
In real-world, we generally have data stored in either CSV or relational databases. We generally convert it to pandas dataframe and then we do data cleaning and manipulation. Hence it is important to learn how to use list comprehension on dataframe. Suppose you have a dataframe for employees having their names and current and previous salary. Let’s create a pandas dataframe for illustration.
import pandas as pd
df = pd.DataFrame({'name': ['Sandy', 'Sam', 'Wright', 'Atul'],
'prevsalary': [71, 65, 64, 90],
'nextsalary': [75, 80, 61, 89]})
df
name prevsalary nextsalary
0 Sandy 71 75
1 Sam 65 80
2 Wright 64 61
3 Atul 90 89
We need to create a column called ‘Flag’ which will have value either ‘High Bracket’ or ‘Low Bracket’. If values of column prevsalary
is greater than 70, it should be labelled as ‘High Bracket’. Else assign it as ‘Low Bracket’.
df['Flag'] = ["High Bracket" if x > 70 else "Low Bracket" for x in df['prevsalary']]
The above line of code produced a new column in the df dataframe.
name prevsalary nextsalary Flag
0 Sandy 71 75 High Bracket
1 Sam 65 80 Low Bracket
2 Wright 64 61 Low Bracket
3 Atul 90 89 High Bracket
How to perform operation on multiple columns?
Suppose you want to compare two variables (columns) prevsalary
and nextsalary
. If prevsalary > nextsalary, categorize it as “Increase”. Else populate “Decrease” value.
df['Flag2'] = ["Increase" if x > y else "Decrease" for (x,y) in zip(df['nextsalary'],df['prevsalary'])]
name prevsalary nextsalary Flag Flag2
0 Sandy 71 75 High Bracket Increase
1 Sam 65 80 Low Bracket Increase
2 Wright 64 61 Low Bracket Decrease
3 Atul 90 89 High Bracket Decrease
Here we are comparing two variables x and y in list comprehension
x = df['nextsalary'], y = df['prevsalary']
Convert character variable to integer
Suppose you have a column which contains numeric values but the column is defined as a character variable (object column type). You need to convert it to integer. It is a common problem when you read data from some online portal or some columns of table are stored in char or varchar format in relational database. Let’s create a dataframe for example.
import pandas as pd
df = pd.DataFrame({"col1" : ['1', '2', '3']})
df['col2'] = [int(x) for x in df['col1']]
See the output below. col2
has column type integer.
col1 object
col2 int64
This can be solved without list comprehension as pandas has built-in function for the same df['col2'] = df['col1'].astype(int)
Nested List Comprehension
It is equivalent to multiple for-loop. The standard syntax of nested list comprehension is shown below
Suppose you have a list of lists and you want to select only the odd numbers. Any number not divisible by 2 is an odd number.
List Comprehension
mat = [[1,2], [3,4], [5,6]]
[x for row in mat for x in row if x%2==1]
Multiple For-Loop
b = []
for row in mat:
for x in row:
if x%2 == 1:
b.append(x)
b
If you observe the syntax above and compare them, you would find them very similar.
Output
[1, 3, 5]
List Comprehension on dictionaries within list
Suppose you have a list which contains multiple dictionaries and you want to apply filtering on values and selecting specific keys.
mylist = [{'a': 1, 'b': 2}, {'a': 3, 'b': 4}, {'a': 5, 'b': 6}]
In the following code, we are selecting only key a
and all the values against this key where it is greater than 1.
[i['a'] for i in mylist if 'a' in i if i['a'] > 1]
Output
[3, 5]
How to create tuples from lists
Idea is to prepare all the possible combinations by applying multiple loops with list comprehension.
l1 = ['a','b']
l2 = ['c','d']
[(x,y) for x in l1 for y in l2]
Output
[('a', 'c'), ('a', 'd'), ('b', 'c'), ('b', 'd')]
Split Sentences into words with list comprehension
In text mining, it is one of the initial data cleaning step to break sentences into words. Before splitting, make sure all words are converted into either lower or uppercase to avoid multiple words for same content.
text = ["Life is beautiful", "No need to overthink", "Meditation help in overcoming depression"]
[word for sentence in text for word in sentence.lower().split(' ')]
['life',
'is',
'beautiful',
'no',
'need',
'to',
'overthink',
'meditation',
'help',
'in',
'overcoming',
'depression']
How it works?
for sentence in text
means looping through each sentence of listtext
text[0].lower().split(' ')
converts to lowercase and then separate sentence into words and returns['life', 'is', 'beautiful']
Exercises for practice
The following are the exercises which would help you to gain hands-on experience. Solve and paste your solution in the comment box below.
- Remove these words
is, in, to, no
fromtext
list. Desired output should be as follows –
'life',
'beautiful',
'need',
'overthink',
'meditation',
'help',
'overcoming',
'depression'
- Find matching numbers in the lists.
x = [51, 24, 32, 41]
y = [42, 32, 41, 50]
Output should be [32, 41]
- If element of list is between 30 and 45, make it 1 else 0
x = [51, 24, 32, 41]
Output : [0, 0, 1, 1]