Python Data Structures – Data Analysis

This post explains the data structures used in Python, along with examples.

Python has several built-in data structures. The main ones are as follows:

String

A string is a sequence of characters.

How to create a string in Python?

You can create a string using single or double quotes.


mystring = "Hello Python3.6"
print(mystring)

# Output
# Hello Python3.6

Note : You can place one kind of quote inside a string delimited by the other kind. Triple quotes (''' or """) also define strings and let them span multiple lines.


mystring = 'Hello"Python"'
print(mystring)

# Output
# Hello"Python"

You can use the syntax below to get the first character.


mystring = 'Hi How are you?'
mystring[0]

# Output
# 'H'

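mystring[0] refers to the first character because indexing in Python starts from 0. Similarly, mystring[1] refers to the second character. To pull the last character, use -1 as the index.

mystring[-1]

# Output
# '?'

To extract the first word of a string, split it on spaces and take the first item.

mystring.split(' ')[0]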

# Output
# 'Hi'

  1. mystring.split(' ') tells Python to use a space as the delimiter. Output : ['Hi', 'How', 'are', 'you?']
  2. mystring.split(' ')[0] tells Python to pick the first word of the string.
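
The indexing and splitting ideas above extend naturally to slices. The following is a minimal sketch, reusing the same example string; the variable names first_two, last_four, words and last_word are only illustrative.

```python
mystring = 'Hi How are you?'

# Slicing uses [start:stop]; the character at position stop is excluded.
first_two = mystring[0:2]      # 'Hi'
last_four = mystring[-4:]      # 'you?'

# split() with no argument splits on any run of whitespace.
words = mystring.split()       # ['Hi', 'How', 'are', 'you?']
last_word = words[-1]          # 'you?'

print(first_two, last_four, words, last_word)
```

Calling split() without an argument is often a safer default than split(' ') when the text may contain repeated spaces or tabs.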

List

Unlike a string, a list can contain objects of different types, such as integers, floats and strings.

x = [142, 124, 234, 345, 465]
y = ['A', 'C', 'E', 'M']
z = ['AA', 44, 5.1, 'KK']

We can extract list items using indexes. The index starts at 0 and ends at (number of elements - 1).

Syntax : list[start : stop : step]

  1. start : position of the first item to include.
  2. stop : position at which to stop; the item at this position is not included.
  3. step : increment between positions.
k = [124, 225, 305, 246, 259]
k[0] # returns 124
k[1] # returns 225
k[-1] # returns 259

k[0] picks the first element of the list. A negative index tells Python to count from the right, so k[-1] selects the last element.

To select multiple elements from a list, use slicing :

k[:3] # returns [124, 225, 305]
k[0:3] # also returns [124, 225, 305]
k[::-1] # reverses the whole list and returns [259, 246, 305, 225, 124]
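
To make the roles of start, stop and step more concrete, here is a small sketch using the same list k. The variable names middle, every_other and last_two are illustrative only.

```python
k = [124, 225, 305, 246, 259]

# start=1, stop=4: positions 1, 2 and 3 (position 4 is excluded)
middle = k[1:4]        # [225, 305, 246]

# step=2: every second element, starting from position 0
every_other = k[::2]   # [124, 305, 259]

# negative start: the last two elements
last_two = k[-2:]      # [246, 259]

print(middle, every_other, last_two)
```

Slicing always returns a new list, so k itself is not changed by any of these expressions.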

sorted(list) returns a new list arranged in ascending order; the original list is left unchanged.

sorted(list, reverse=True) returns a new list in descending order.

sorted(k) # returns [124, 225, 246, 259, 305]
sorted(k, reverse=True) # returns [305, 259, 246, 225, 124]
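
It may help to contrast sorted() with the list method sort(), which reorders the list in place. The sketch below also shows the key argument; the fruits list is a made-up example.

```python
k = [124, 225, 305, 246, 259]

# sorted() returns a new list and leaves k untouched.
ascending = sorted(k)
print(ascending)   # [124, 225, 246, 259, 305]
print(k)           # [124, 225, 305, 246, 259]

# list.sort() changes the list itself and returns None.
k.sort(reverse=True)
print(k)           # [305, 259, 246, 225, 124]

# key= controls what is compared; here, the length of each string.
fruits = ['pear', 'fig', 'banana']   # illustrative data
by_length = sorted(fruits, key=len)
print(by_length)   # ['fig', 'pear', 'banana']
```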

The program below adds 5 to each element of the list x defined earlier. The len() function counts the number of elements in a list; in this case it returns 5. range(5) then yields the indexes 0, 1, 2, 3, 4.

for i in range(len(x)):
    x[i] += 5
print(x)

# Output
# [147, 129, 239, 350, 470]
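
A list comprehension is one common alternative to looping over indexes. This is only a sketch of that alternative, assuming x holds the values defined earlier; the name x_plus_5 is illustrative.

```python
x = [142, 124, 234, 345, 465]

# Build a new list with 5 added to each element; x itself is not modified.
x_plus_5 = [value + 5 for value in x]
print(x_plus_5)   # [147, 129, 239, 350, 470]

# enumerate() gives both the index and the value when in-place updates are needed.
for i, value in enumerate(x):
    x[i] = value + 5
print(x)          # [147, 129, 239, 350, 470]
```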

The '+' operator concatenates two lists.

X = [1, 2, 3]
Y = [4, 5, 6]
Z = X + Y
print(Z)

# Output
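# [1, 2, 3, 4, 5, 6]

To add the values of two lists element by element, use NumPy's np.add(). The result is a NumPy array, not a list.

import numpy as np

X = [1, 2, 3]
Y = [4, 5, 6]
Z = np.add(X, Y)
print(Z)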

# Output
# [5 7 9]

Similarly, you can use np.multiply(X, Y) to multiply the values of two lists element by element.

The '*' operator repeats a list N times.
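
If NumPy is not available, the same element-wise arithmetic can be sketched in plain Python with zip(), which pairs up items from the two lists. The names sums and products are illustrative.

```python
X = [1, 2, 3]
Y = [4, 5, 6]

# zip(X, Y) yields the pairs (1, 4), (2, 5), (3, 6)
sums = [a + b for a, b in zip(X, Y)]
products = [a * b for a, b in zip(X, Y)]

print(sums)       # [5, 7, 9]
print(products)   # [4, 10, 18]
```

Unlike np.add(), this returns ordinary Python lists. For large numeric data, NumPy is usually the faster choice.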

X = [1, 2, 3]
Z = X * 3
print(Z)

# Output
# [1, 2, 3, 1, 2, 3, 1, 2, 3]

Note : The '+' and '*' operators work the same way on lists of strings.

Suppose you need to replace the third value with a different value.

X = [1, 2, 3]
X[2]=5
print(X)

# Output
# [1, 2, 5]

We can add an item to the end of a list using the append method.

X = ['AA', 'BB', 'CC']
X.append('DD')
print(X)

# Result
# ['AA', 'BB', 'CC', 'DD']

Similarly, we can remove an item from a list using the remove method.

X = ['AA', 'BB', 'CC']
X.remove('BB')
print(X)

# Result : ['AA', 'CC']
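
A few related list methods are worth knowing alongside append and remove. The sketch below uses a list with a repeated value to show that remove deletes only the first match; the values 'ZZ' and 'QQ' are made up for illustration.

```python
X = ['AA', 'BB', 'CC', 'BB']

# remove() deletes only the first matching item.
X.remove('BB')
print(X)            # ['AA', 'CC', 'BB']

# pop() removes and returns the last item.
last = X.pop()
print(last, X)      # BB ['AA', 'CC']

# insert() adds an item at a given position.
X.insert(1, 'ZZ')
print(X)            # ['AA', 'ZZ', 'CC']

# remove() raises ValueError when the item is not in the list.
try:
    X.remove('QQ')
    removed_missing = True
except ValueError:
    removed_missing = False
print(removed_missing)   # False
```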

Tuple

Like a list, a tuple can contain mixed data. But a tuple is immutable: it cannot be changed once created, whereas a list can be modified.

Another difference is that a tuple is written inside parentheses ( ), whereas a list is written inside square brackets [ ].

mytuple = (123,223,323)
City = ('Delhi','Mumbai','Bangalore')
for i in City:
    print(i)

# Output
# Delhi
# Mumbai
# Bangalore

Run the following code and check the error.

X = (1, 2, 3)
X[2] = 5

TypeError: 'tuple' object does not support item assignment
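
Since a tuple cannot be modified, the usual workaround is to build a new tuple. The following is a small sketch of that idea, together with tuple unpacking; the names message, Y, a, b and c are illustrative.

```python
X = (1, 2, 3)

# Item assignment fails; catch the error to inspect its message.
try:
    X[2] = 5
except TypeError as err:
    message = str(err)
print(message)

# Build a new tuple instead: the first two items of X plus a new third item.
Y = X[:2] + (5,)
print(Y)          # (1, 2, 5)

# Unpacking assigns each item of the tuple to its own variable.
a, b, c = Y
print(a, b, c)    # 1 2 5
```

Note the trailing comma in (5,): without it, Python treats (5) as the integer 5 rather than a one-item tuple.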

Dictionary

A dictionary works like an address book in which you find a person's address by searching for their name. In this example, the name of a person is the key and the address is the value. Keys must be unique, while values need not be: with duplicate keys you could not tell which value belongs to a key. Keys can be of any immutable (hashable) type, such as strings, numbers or tuples.

It is defined in curly braces {}. Each key is followed by a colon (:) and then its value.

teams = {'Dave' : ['teamA','teamAA', 'teamAB'],
         'Tim'  : ['teamB','teamBB','teamBC'],
         'Babita' : ['teamC','teamCB','teamCC']
        }
  • teams.keys() returns a view of the keys; list(teams.keys()) gives ['Dave', 'Tim', 'Babita']
  • teams.values() returns a view of the values; list(teams.values()) gives [['teamA', 'teamAA', 'teamAB'], ['teamB', 'teamBB', 'teamBC'], ['teamC', 'teamCB', 'teamCC']]
  • teams.items() returns a view of (key, value) pairs.
teams['Dave']

# Output
# ['teamA', 'teamAA', 'teamAB']
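
The lookups above can be combined with a loop over items(). This is a minimal sketch using the same teams dictionary; 'Zara' is a hypothetical name that is deliberately not in the dictionary.

```python
teams = {'Dave': ['teamA', 'teamAA', 'teamAB'],
         'Tim': ['teamB', 'teamBB', 'teamBC'],
         'Babita': ['teamC', 'teamCB', 'teamCC']}

# items() yields (key, value) pairs, which can be unpacked in the loop.
team_counts = {}
for name, team_list in teams.items():
    team_counts[name] = len(team_list)
print(team_counts)   # {'Dave': 3, 'Tim': 3, 'Babita': 3}

# teams['Zara'] would raise KeyError; get() returns a default instead.
zara_teams = teams.get('Zara', [])   # 'Zara' is a hypothetical missing key
print(zara_teams)    # []

# The in operator checks whether a key exists.
print('Tim' in teams)   # True
```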

In the code below, we remove 'Babita' from the teams dict.

del teams['Babita']

# Output
# {'Dave': ['teamA', 'teamAA', 'teamAB'],
# 'Tim': ['teamB', 'teamBB', 'teamBC']}

Here we add one more key named 'Deep' with the value 'team D'.

teams['Deep'] = 'team D'

# Output
# {'Dave': ['teamA', 'teamAA', 'teamAB'],
# 'Tim': ['teamB', 'teamBB', 'teamBC'],
# 'Deep': 'team D'}

You can also create a dictionary as shown below.

d={}
d['a'] = 1
d['b'] = 2
print(d)

# Output
# {'a': 1, 'b': 2}

Suppose you have keys and values stored in two separate lists. You can zip them together to create a dictionary.

keys = ['a', 'b', 'c']
values = [1, 2, 3]
d1 = dict(zip(keys, values))
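
To see what the zip approach produces, and how a dictionary comprehension can build on it, consider the sketch below. The names squares and inverted are illustrative.

```python
keys = ['a', 'b', 'c']
values = [1, 2, 3]

# zip pairs each key with the value at the same position.
d1 = dict(zip(keys, values))
print(d1)         # {'a': 1, 'b': 2, 'c': 3}

# A dictionary comprehension builds a new dictionary from an existing one.
squares = {k: v * v for k, v in d1.items()}
print(squares)    # {'a': 1, 'b': 4, 'c': 9}

# Swapping keys and values works here because the values are unique.
inverted = {v: k for k, v in d1.items()}
print(inverted)   # {1: 'a', 2: 'b', 3: 'c'}
```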

Sets

Sets are unordered collections of unique objects. They are mainly used to check whether an object is present in the set and to compute mathematical operations such as intersection, union and difference.

X = set(['A', 'B', 'C'])
'A' in X

Result : True

'D' in X

Result : False

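X.add('D')      # X is now {'A', 'B', 'C', 'D'}
X.remove('C')   # X is now {'A', 'B', 'D'}
Y = X.copy()    # Y is an independent copy of X
Y & X           # intersection of Y and X, returns {'A', 'B', 'D'}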
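
The mathematical operations mentioned above can be sketched with two small sets. The sets A and B and the ratings list here are made-up examples.

```python
A = {'A', 'B', 'C'}
B = {'B', 'C', 'D'}

union = A | B                  # items in either set
intersection = A & B           # items in both sets
difference = A - B             # items in A but not in B
symmetric_difference = A ^ B   # items in exactly one of the sets

print(union, intersection, difference, symmetric_difference)

# Converting a list to a set drops duplicate values.
ratings = [1, 2, 2, 3, 3, 3]   # illustrative data
unique_ratings = set(ratings)
print(unique_ratings)
```

Because sets are unordered, the order in which items are printed may differ from the order in which they were written.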

The examples below show operations on data structures that are commonly used in real-world work.

x = [1, 2, 3, 4]
y = [2, 3, 6, 5]

list(set(x) & set(y))
list(set(x) | set(y))

The & symbol refers to the 'and' condition (intersection), which returns the items common to both lists. The | symbol refers to the 'or' condition (union), which returns the items present in either list.
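
Because sets are unordered, the lists produced this way come back in no guaranteed order. One way to get a predictable result, sketched here with the same x and y, is to sort the output. The names common, combined and only_in_x are illustrative.

```python
x = [1, 2, 3, 4]
y = [2, 3, 6, 5]

# sorted() accepts a set and returns an ordered list.
common = sorted(set(x) & set(y))
combined = sorted(set(x) | set(y))
print(common)      # [2, 3]
print(combined)    # [1, 2, 3, 4, 5, 6]

# Items of x that are not in y, keeping the original order of x.
y_set = set(y)
only_in_x = [item for item in x if item not in y_set]
print(only_in_x)   # [1, 4]
```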

x = [1, 2, 3, 4]
3 in x

# Output
# True

all returns True only when all the items exist, whereas any returns True when at least one of the items exists.

all(i in x for i in [1,6])

# Output
# False

any(i in x for i in [1,6])

# Output
# True
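
When all() returns False, it is often useful to know which items were missing. The sketch below shows one way to find out, along with a set-based equivalent of the all() check; the names required, missing and is_subset are illustrative.

```python
x = [1, 2, 3, 4]
required = [1, 6]

# Collect the required items that are not present in x.
missing = [i for i in required if i not in x]
print(missing)      # [6]

# issubset() gives the same answer as all(i in x for i in required).
is_subset = set(required).issubset(x)
print(is_subset)    # False

# Edge case: with nothing to check, all() is True and any() is False.
print(all([]), any([]))   # True False
```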