Data Structures and Algorithms

Background: On a computer, we want to search for any item among huge numbers of data items quickly, spend less time sorting large data sets, and store data as required in different data structures such as queues, stacks, linked lists, etc.

Such problems can be solved by learning data structures and algorithms.

We know that memory is a precious part of a computer system. Nowadays, with multiprogramming, main memory (generally called RAM) holds many processes at once. To manage that memory effectively, we need solid concepts of data structures and algorithms.

3.1 Data Structures and Abstract Data Types:

A data structure is a way of organizing data in main memory so that it can be processed efficiently.

Abstract data types: In a computer, we can create our own data types using the fundamental data types supported by the processor (CPU) of the computer system. An abstract data type (ADT) has an interface and data members, but from the user's perspective we do not care about the implementation.

For many years I wondered why the programmer never sees that implementation at all! But that was a gap in my knowledge: in an ADT the user deliberately does not see the implementation details; the user works only through the interface.

We can take the stack and the queue as ADTs: we define the values they can hold and the operations (behaviors) permitted on those data members.

 

Stacks and Queues:

A queue has limited concepts to learn. We put data in at the tail and take data out from the head. Inserting is called the enqueue operation and removing is called the dequeue operation.

There is also the deque (double-ended queue), in which insertion and deletion can be done from both ends.

There is also the circular queue to understand: the last location is connected back to the first, so the queue wraps around.

Queue follows First In First Out (FIFO) principle.

In computers, the queue data structure is used in CPU scheduling, print spooling, etc.

Stack:

The stack is a very useful data structure, with several uses inside the computer. For example, web browser history, context switching by the OS, and recursion in programming all use a stack. The principle of the stack is Last In, First Out (LIFO).

 

Using a stack, we can convert an infix expression to postfix.

For example, for z = (A+B)-(C+D):

Symbol | Stack | Expression
-------|-------|-----------
(      | (     |
A      | (     | A
+      | (+    | A
B      | (+    | AB
)      |       | AB+
-      | -     | AB+
(      | -(    | AB+
C      | -(    | AB+C
+      | -(+   | AB+C
D      | -(+   | AB+CD
)      | -     | AB+CD+
(end)  |       | AB+CD+-

 

 

Second example, z= (A/(B+C))*(D+E)

Symbol | Stack | Expression
-------|-------|-----------
(      | (     |
A      | (     | A
/      | (/    | A
(      | (/(   | A
B      | (/(   | AB
+      | (/(+  | AB
C      | (/(+  | ABC
)      | (/    | ABC+
)      |       | ABC+/
*      | *     | ABC+/
(      | *(    | ABC+/
D      | *(    | ABC+/D
+      | *(+   | ABC+/D
E      | *(+   | ABC+/DE
)      | *     | ABC+/DE+
(end)  |       | ABC+/DE+*

 

Queue

The basic principle of a queue is first in, first out: data is inserted at the rear end and removed from the front. We need front and rear pointers to track those positions in the queue.

The operations supported in a queue are enqueue and dequeue. Enqueue inserts a data item, whereas dequeue removes one.

There are single (linear) queues, circular queues, deques, and doubly linked queues.

Single queue: We insert data items at the rear and remove them from the front. After dequeue operations, we must manually shift the remaining items toward the front to reuse the freed space.

Circular queue: The last location wraps around to the first, so after dequeues new items can be inserted into the freed slots at the front if there is space. This prevents memory locations from going unused.

Double-ended queue, or deque: We can insert and remove at either end. There are two restricted forms: an insertion-restricted deque allows insertion at one end only but deletion at both; a deletion-restricted deque allows deletion at one end only but insertion at both.

Doubly linked queue: Each node has two pointers, one to the previous item and one to the next.

A few examples of queues used in computers: CPU scheduling and printer spooling.

 

Lists:

Linked lists: Each node contains a data value together with a pointer to the next node. The list has a first node and a last node: the last node's pointer is null, and a head pointer holds the address of the first node.

Queues: We discussed already.

Trees: Trees are non-linear data structures. In a linear data structure, each data item is followed by exactly one other. In a non-linear data structure, each data item can be followed by one or more items and can be preceded by one or more items.

Binary search trees: A subset of tree data structures where these rules are enforced: each node has at most two children, every value in the left subtree is less than the node's value, and every value in the right subtree is greater.

Recursion: A section of a program calls itself, either directly or via another program section, until a condition is met. Recursion uses the stack to keep a record of the pending calls.

There are direct and indirect recursion.

Also there is tail and non-tail recursion.

Let's reconsider the factorial program.

Now, I am going to write the tail-recursive version:

            void fact(int n, int result = 1) {
                if (n == 0 || n == 1) {
                    cout << "factorial is: " << result << endl;
                    return;
                }
                fact(n - 1, result * n);
            }

 

The recursive call is the last operation in the function.

 

Non-tail recursion:

Some operation is performed after the recursive call returns. This leads to a growing call stack.

 

 

#include <iostream>
using namespace std;

int fact(int n) {
    if (n == 0 || n == 1)
        return 1;
    return n * fact(n - 1);
}

int main() {
    cout << "fact is: " << fact(4) << endl;
    return 0;
}

 

The statement return n * fact(n - 1); must multiply after the recursive call returns, which makes this non-tail recursion.

 

 

Divide and conquer: The problem is divided into smaller sub-problems. Each sub-problem is solved independently, and the results are later combined into the overall answer. Merge sort and quick sort are examples of divide and conquer.

 

Dynamic programming: It is similar to divide and conquer. However, when identical sub-problems overlap, their previously computed solutions are reused, so calculation effort is reduced. The key phrase is "overlapping sub-problems": each is solved only once. For example, while calculating fib(4), fib(3) is already calculated, and we reuse that result. This means some extra memory is required.

 

Greedy methods: At each step, the choice that looks best right now is taken; the method does not look ahead. Suppose there are four different coins: 10, 7, 5, and 1, and our goal is to reach 19. The greedy method chooses 10, 7, 1, 1 (four coins), but the best choice would be 7, 7, 5, which needs only three coins.

 

In trees, there are three terms:

Depth: The depth of a node varies by node. It is the number of edges from the root to that node. The root's depth is 0 (NOT 1).

Height: The height of a node is the maximum number of edges from that node down to a leaf. The height of the root gives the height of the tree.

Degree: The degree of a node is the number of children it has; the degree of a tree is the maximum degree of any node in it.

We can calculate the depth, height, and degree of a single node and of the whole tree.

 

Depth-First Search and Breadth-First Search:

In depth-first search, the traversal goes deeper and deeper until the target is found. If it is not found at a leaf, the search backtracks to the most recent branch point and goes down again; when that subtree is exhausted, it backtracks further and moves on to the next child. It uses a stack for backtracking.

 

So, if the depth of the tree (equal to its height) is d and the branching factor is b,

the total space complexity is O(d) (only the current path is stored) and the time complexity is O(b^d).

 

In Breadth-First Search:

The traversal visits nodes level by level, keeping discovered nodes in a queue. If the target is not found at the current level, the next level is searched.

About complexity:

If d = depth of the tree and b = branching factor, then

Space complexity = O(b^d)

Time complexity = O(b^d)

 

Minimum spanning tree:

It is a subset of the edges of a connected, weighted, undirected graph that connects all the vertices with the minimum total edge weight. The number of edges is V-1 (V is the number of vertices). Prim's and Kruskal's algorithms are used to build an MST.

Directed Acyclic Graphs:

A directed graph in which, once you move away from a node, there is no way to return to it by following the edge directions; in other words, it contains no cycles.

 

Hashing:

Suppose we have a function f(x), where x is the value provided to the function. The function processes it and provides us an output.

In hashing, this function is called the hash function. It takes a key (or a combination of keys) from the data, and from it we obtain an output:

hashfunction(key) = value

One example of a hash function is the modulo operation. For example, suppose we have 10 buckets to store the data.

For roll number 15, we are going to store the student's details: roll, name, sex, address, phone number.

We must find the bucket where those details will be stored.

Using the modulo hash function: roll mod 10 = 15 mod 10 = 5, so roll 15's details are stored in bucket 5.

Similarly, roll 23's details are stored in bucket 23 mod 10 = 3.

Again, roll 25 hashes to 25 mod 10 = 5, but bucket 5 is already occupied. There are several options to resolve this collision:

 

a)     Linked list (separate chaining): bucket 5 holds a linked list, i.e. roll 15's details followed by roll 25's details.

b)    Next place (linear probing, a form of open addressing): bucket 6 is empty, so roll 25's details are placed in bucket 6. If bucket 6 were also occupied, we would move on to bucket 7, and so on.

c)     Quadratic probing: probe buckets (h + i²) mod 10 for i = 0, 1, 2, …. For this example:

15 mod 10 = 5, so bucket 5 + 0² = 5

25 mod 10 = 5, occupied, so (5 + 1²) mod 10 = 6

35 mod 10 = 5, occupied; (5 + 1²) mod 10 = 6 also occupied; then (5 + 2²) mod 10 = 9

Sorting

There are various types of sorting:

a) Bubble sort: Adjacent values are compared from left to right. For ascending order, if the left value is greater than the next right value, the two are swapped; the comparison then advances one step to the right, and whole passes are repeated.

Thus it performs about n*n comparisons; in the worst case it is O(n²).

b) Insertion sort: Starting from the second position, each value is compared against the values to its left, which are always kept in order; any larger left-side values are shifted one place to the right and the value is inserted into its proper place. The number of comparisons is at most (n-1) + (n-2) + … + 1, so the worst case is O(n²).

c) Selection sort:

The leftmost element is assumed smallest and compared with each remaining element; whenever a smaller value is found, it becomes the new candidate. The candidate is then swapped into the leftmost position. Next, the second position is processed the same way to find the second smallest, and so on. The number of comparisons decreases from (n-1) to (n-2) down to 1, so the worst case is O(n²).

d) Merge sort

Here, the numbers are divided until single elements remain, then merged back together in sorted order. In the worst case it is O(n log n).

e) Quick sort

We pick a pivot; values smaller than the pivot are placed to its left and values greater than it to its right, and then the same procedure is applied to each side. In the worst case its complexity is O(n²), though on average it is O(n log n).

f) Heap sort: We heapify here: the maximum value is moved to the top of the heap, then swapped with the last leaf and detached from the heap, and heapifying the remainder is repeated until the array is sorted.

g) Radix sort: First, all numbers are given an equal number of digits; if they are not equal, padding (zeros) is added on the left. Then the numbers are ordered by their 1's place, then by their 10's place, and so on, repeating once for each digit position.