Friday 23 November 2012

Advanced Data Structures using ‘C’ - Minimum Spanning Trees

Advanced Data Structures using ‘C’ Unit 2

Unit 2 Minimum Spanning Trees
and Algorithms
Structure:
2.1 Introduction
Objective
2.2 Spanning Trees
2.2.1 Minimum spanning trees
2.2.2 Why minimum spanning trees
Self Assessment Questions
2.3 How to find minimum Spanning Tree?
2.3.1 Lemma
2.4 Kruskal's Algorithm
2.5 Prim's algorithm
Self Assessment Questions
2.6 Finding Shortest Paths using BFS
2.7 Relaxation
2.8 Dijkstra Algorithm
2.9 Bellman-Ford Algorithm
Self Assessment Questions
2.10 Single-source shortest paths in Directed Acyclic Graph (DAG)
2.11 Floyd Warshall and Variants
2.11.1 Transitive Hull
2.11.2 MiniMax Distance
2.11.3 MaxiMin Distance
2.11.4 Safest Path
Self Assessment Questions
2.12 Other Graphs
2.12.1 Graph Transpose Problem
Advanced Data Structures using ‘C’ Unit 2
Sikkim Manipal University Page No.: 54
2.12.2 Euler Cycle
2.12.3 Euler Path
2.13 Topological Sort Problem
Self Assessment Questions
2.14 Strongly Connected Components problem
2.15 Summary
2.16 Terminal Questions
2.1 Introduction
Tree is one of the most efficient data structure used in a computer program.
There are many types of tree. Binary tree is a tree that always have two
branches, Red-Black-Trees, 2-3-4 Trees, AVL Trees, etc. A well balanced
tree can be used to design a good searching algorithm.
A Tree is an undirected graph that contains no cycles and is connected.
Many trees are what is called rooted, where there is a notion of the "top"
node, which is called the root. Thus, each node has one parent, which is the
adjacent node which is closer to the root, and may have any number of
children, which are the rest of the nodes adjacent to it. The tree above was
drawn as a rooted tree.
Advanced Data Structures using ‘C’ Unit 2
Sikkim Manipal University Page No.: 55
Objectives:
By the end of this unit the readers are expected to have the complete
understanding of
 Fundamental concepts of Spanning Trees
 Various algorithms related to spanning trees
 Floyd Warshall and their variations
 Other graphs of interest
2.2 Spanning trees
A spanning tree of a graph is just a subgraph that contains all the vertices
and is a tree. A graph may have many spanning trees; for instance the
complete graph on four vertices
has sixteen spanning trees:
Advanced Data Structures using ‘C’ Unit 2
Sikkim Manipal University Page No.: 56
2.2.1 Minimum spanning trees
Now suppose the edges of the graph have weights or lengths. The weight of a tree is just the sum of weights of its edges. Obviously, different trees have different lengths. The problem: how to find the minimum length spanning tree?
2.2.2 Why minimum Spanning Trees?
The standard application is to a problem like phone network design. You have a business with several offices; you want to lease phone lines to connect them up with each other; and the phone company charges different amounts of money to connect different pairs of cities. You want a set of lines that connects all your offices with a minimum total cost. It should be a spanning tree, since if a network isn't a tree you can always remove some edges and save money. A less obvious application is that the minimum spanning tree can be used to approximately solve the traveling salesman problem. A convenient formal way of defining this problem is to find the shortest path that visits each point at least once. Note that if you have a path visiting all points exactly once, it's a special kind of tree. For instance, in the example above, twelve of sixteen spanning trees are actually paths. If you have a path visiting some vertices more than once, you can always drop some edges to get a tree. So in general the MST weight is less than the TSP weight, because it's a minimization over a strictly larger set. On the other hand, if you draw a path tracing around the minimum spanning tree, you trace each edge twice and visit all points, so the TSP weight is less than twice the MST weight. Therefore this tour is within a factor of two of optimal.
Advanced Data Structures using ‘C’ Unit 2
Sikkim Manipal University Page No.: 57
Self Assessment Questions
1. What is a Minimum Spanning Tree? Explain its properties.
2. How many Minimum Spanning tress will a complete graph have six vertices?
3. Discuss the significance of Minimum Spanning Tree.
2.3 How to find minimum Spanning Tree?
The stupid method is to list all spanning trees, and find minimum of list. We already know how to find minima... But there are far too many trees for this to be efficient. It's also not really an algorithm, because you'd still need to know how to list all the trees. A better idea is to find some key property of the MST that lets us be sure that some edge is part of it, and use this property to build up the MST one edge at a time. For simplicity, we assume that there is a unique minimum spanning tree. You can get ideas like this to work without this assumption but it becomes harder to state your theorems or write your algorithms precisely.
2.3.1 Lemma
Let X be any subset of the vertices of G, and let edge e be the smallest edge connecting X to G-X. Then e is part of the minimum spanning tree.
Proof: Suppose you have a tree T not containing e; then I want to show that T is not the MST. Let e = (u, v), with u in X and v not in X. Then because T is a spanning tree it contains a unique path from u to v, which together with e forms a cycle in G. This path has to include another edge f connecting X to G-X. T+e-f is another spanning tree (it has the same number of edges, and remains connected since you can replace any path containing f by one going the other way around the cycle). It has smaller weight than t since e
Advanced Data Structures using ‘C’ Unit 2
Sikkim Manipal University Page No.: 58
has smaller weight than f. So T was not minimum, which is what we wanted to prove.
2.4 Kruskal's Algorithm
We'll start with Kruskal's algorithm, which is easiest to understand and probably the best one for solving problems by hand. Kruskal's algorithm: sort the edges of G in increasing order by length keep a subgraph S of G, initially empty for each edge e in sorted order if the endpoints of e are disconnected in S add e to S return S Note that, whenever you add an edge (u, v), it's always the smallest connecting the part of S reachable from u with the rest of G, so by the lemma it must be part of the MST. This algorithm is known as a greedy algorithm, because it chooses at each step the cheapest edge to add to S. You should be very careful when trying to use greedy algorithms to solve other problems, since it usually doesn't work. E.g. if you want to find a shortest path from a to b, it might be a bad idea to keep taking the shortest edges. The greedy idea only works in Kruskal's algorithm because of the key property we proved. Analysis: The line testing whether two endpoints are disconnected looks like it should be slow (linear time per iteration, or O(mn) total). The slowest part turns out to be the sorting step, which takes O(m log n) time.
Advanced Data Structures using ‘C’ Unit 2
Sikkim Manipal University Page No.: 59
2.5 Prim's algorithm
Rather than build a subgraph one edge at a time, Prime's algorithm builds a tree one vertex at a time. Prim's algorithm: let T be a single vertex x while (T has fewer than n vertices) { find the smallest edge connecting T to G-T add it to T } Example of Prim's algorithm in C language: /* usedp=>how many points already used p->array of structures, consisting x,y,& used/not used this problem is to get the MST of graph with n vertices which weight of an edge is the distance between 2 points */
usedp=p[0].used=1; /* select arbitrary point as starting point */ while (usedp<n) { small=-1.0; for (i=0;i<n;i++) if (p[i].used) for (j=0;j<n;j++) if (!p[j].used) { length=sqrt(pow(p[i].x-p[j].x,2) + pow(p[i].y-p[j].y,2)); if (small==-1.0 || length<small) { small=length; smallp=j; } } minLength+=small; p[smallp].used=1; usedp++ ; }
Advanced Data Structures using ‘C’ Unit 2
Sikkim Manipal University Page No.: 60
Since each edge added is the smallest connecting T to G-T, the lemma we proved shows that we only add edges that should be part of the MST. Again, it looks like the loop has a slow step in it. But again, some data structures can be used to speed this up. The idea is to use a heap to remember, for each vertex, the smallest edge connecting T with that vertex. Prim with heaps: make a heap of values (vertex,edge,weight(edge)) initially (v,-,infinity) for each vertex let tree T be empty while (T has fewer than n vertices) { let (v,e,weight(e)) have the smallest weight in the heap remove (v,e,weight(e)) from the heap add v and e to T for each edge f=(u,v) if u is not already in T find value (u,g,weight(g)) in heap if weight(f) < weight(g) replace (u,g,weight(g)) with (u,f,weight(f)) }
Analysis: We perform n steps in which we remove the smallest element in the heap, and at most 2m steps in which we examine an edge f=(u,v). For each of those steps, we might replace a value on the heap, reducing it's weight. (You also have to find the right value on the heap, but that can be done easily enough by keeping a pointer from the vertices to the corresponding values.) I haven't described how to reduce the weight of an element of a binary heap, but it's easy to do in O(log n) time. Alternately by using a more complicated data structure known as a Fibonacci heap, you can reduce the weight of an element in constant time. The result is a total time bound of O(m + n log n).
Advanced Data Structures using ‘C’ Unit 2
Sikkim Manipal University Page No.: 61
Self Assessment Questions
1. Explain the process of finding a minimum spanning tree.
2. State and prove the Kruskal’s Algorithm.
3. What is the underlying principle of Prime Algorithm? Explain
2.6 Finding Shortest Paths using BFS
This is only applicable to the graph with unweighted edges. Simply do BFS from start node to end node, and stop the search when it encounters the first occurrence of end node.
2.7 Relaxation
The relaxation process updates the costs of all the vertices, v, connected to a vertex, u, if we could improve the best estimate of the shortest path to v by including (u,v) in the path to v. The relaxation procedure proceeds as follows: initialize_single_source(Graph G,Node s) for each vertex v in Vertices(G) G.d[v]:=infinity G.pi[v]:=nil G.d[s]:=0; This sets up the graph so that each node has no predecessor (pi[v] = nil) and the estimates of the cost (distance) of each node from the source (d[v]) are infinite, except for the source node itself (d[s] = 0). Note that we have also introduced a further way to store a graph (or part of a graph - as this structure can only store a spanning tree), the predecessor sub-graph - the list of predecessors of each node, pi[j], 1 <= j <= |V|. The edges in the predecessor sub-graph are (pi[v],v).
Advanced Data Structures using ‘C’ Unit 2
Sikkim Manipal University Page No.: 62
The relaxation procedure checks whether the current best estimate of the shortest distance to v (d[v]) can be improved by going through u (i.e. by making u the predecessor of v): relax(Node u,Node v,double w[][]) if d[v] > d[u] + w[u,v] then d[v]:=d[u] + w[u,v] pi[v]:=u
2.8 Dijkstra Algorithm
Dijkstra's algorithm (invented by Edsger W. Dijkstra) solves the problem of finding the shortest path from a point in a graph (the source) to a destination. It turns out that one can find the shortest paths from a given source to all points in a graph in the same time, hence this problem is called the Single-source shortest paths problem. This problem is related to the spanning tree. The graph representing all the paths from one vertex to all the others must be a spanning tree - it must include all vertices. There will also be no cycles as a cycle would define more than one path from the selected vertex to at least one other vertex. For a graph, G=(V,E) where V is a set of vertices and E is a set of edges. Dijkstra's algorithm keeps two sets of vertices: S (the set of vertices whose shortest paths from the source have already been determined) and V-S (the remaining vertices). The other data structures needed are: d (array of best estimates of shortest path to each vertex) & pi (an array of predecessors for each vertex) The basic mode of operation is:
1. Initialise d and pi,
2. Set S to empty,
3. While there are still vertices in V-S,
Advanced Data Structures using ‘C’ Unit 2
Sikkim Manipal University Page No.: 63
i. Sort the vertices in V-S according to the current best estimate of their distance from source,
ii. Add u, the closest vertex in V-S, to S,
iii. Relax all the vertices still in V-S connected to u
Dijkstra Algorithm: DIJKSTRA(Graph G,Node s) initialize_single_source(G,s) S:={ 0 } /* Make S empty */ Q:=Vertices(G) /* Put the vertices in a PQ */ while not Empty(Q) u:=ExtractMin(Q); AddNode(S,u); /* Add u to S */ for each vertex v which is Adjacent with u relax(u,v,w)
2.9 Bellman-Ford Algorithm
It is a more generalized single-source shortest path algorithm which can find the shortest path in a graph with negative weighted edges. If there is no negative cycle in the graph, this algorithm will updates each d[v] with the shortest path from s to v, fill up the predecessor list "pi", and return TRUE. However, if there is a negative cycle in the given graph, this algorithm will return FALSE. BELLMAN_FORD(Graph G,double w[ ][ ],Node s) initialize_single_source(G,s) for i=1 to |V[G]|-1 for each edge (u,v) in E[G] relax(u,v,w) for each edge (u,v) in E[G] if d[v] > d[u] + w(u, v) then
Advanced Data Structures using ‘C’ Unit 2
Sikkim Manipal University Page No.: 64
return FALSE return TRUE Self Assessment Questions
1. What is the significance of Relaxation process? How does it proceed?
2. State Dijkstra Algorithm. In what situations this is considered to be useful?
3. What is the Bellman-Ford Algorithm? State and explain.
2.10 Single-source shortest paths in Directed Acyclic Graph (DAG)
There exist a more efficient algorithm ~ O(V+E) ~ for solving Single-source shortest path problem for a Directed Acyclic Graph (DAG). So if you know for sure that your graph is a DAG, you may want to consider this algorithm instead of using Djikstra. DAG_SHORTEST_PATHS(Graph G,double w[][],Node s) topologically sort the vertices of G // O(V+E) initialize_single_source(G,s) for each vertex u taken in topologically sorted order for each vertex v which is Adjacent with u relax(u,v,w) A sample application of this DAG_SHORTEST_PATHS algorithm (as given in CLR book) is to solve critical path problem, i.e. finding the longest path through a DAG, for example: calculating the fastest time to complete a complex task consisting of smaller tasks when you know the time needed to complete each small task and the precedence order of tasks.
Advanced Data Structures using ‘C’ Unit 2
Sikkim Manipal University Page No.: 65
2.11 Floyd Warshall and Variants
Given a directed graph, the Floyd-Warshall All Pairs Shortest Paths algorithm computes the shortest paths between each pair of nodes in O(n^3). Given: w : edge weights d : distance matrix p : predecessor matrix w[i][j] = length of direct edge between i and j d[i][j] = length of shortest path between i and j p[i][j] = on a shortest path from i to j, p[i][j] is the last node before j, i.e. i -> ... -> p[i][j] -> j
Initialization
for (i=0; i<n; i++) for (j=0; j<n; j++) { d[i][j] = w[i][j]; p[i][j] = i; } for (i=0; i<n; i++) d[i][i] = 0;
The Algorithm
for (k=0;k<n;k++) /* k -> is the intermediate point */ for (i=0;i<n;i++) /* start from i */ for (j=0;j<n;j++) /* reaching j */ /* if i-->k + k-->j is smaller than the original i-->j */ if (d[i][k] + d[k][j] < d[i][j]) { /* then reduce i-->j distance to the smaller one i->k->j */ d[i][j] = d[i][k]+d[k][j]; /* and update the predecessor matrix */
Advanced Data Structures using ‘C’ Unit 2
Sikkim Manipal University Page No.: 66
p[i][j] = p[k][j]; } Note: In the k-th iteration of the outer loop, we try to improve the currently known shortest paths by considering k as an intermediate node. Therefore, after the k-th iteration we know those shortest paths that only contain intermediate nodes from the set {0, 1, 2,...,k}. After all n iterations we know the real shortest paths.
Constructing a Shortest Path
print_path (int i, int j) { if (i!=j) print_path(i,p[i][j]); print(j); }
2.11.1 Transitive Hull
Given a directed graph, the Floyd-Warshall algorithm can compute the Transitive Hull in O(n^3). Transitive means, if i can reach k and k can reach j then i can reach j. Transitive Hull means, for all vertices, compute its reachability. w : adjacency matrix d : transitive hull w[i][j] = edge between i and j (0=no edge, 1=edge) d[i][j] = 1 if and only if j is reachable from i
Initialization
for (i=0; i<n; i++) for (j=0; j<n; j++) d[i][j] = w[i][j];
Advanced Data Structures using ‘C’ Unit 2
Sikkim Manipal University Page No.: 67
for (i=0; i<n; i++) d[i][i] = 1;
The Algorithm
for (k=0; k<n; k++) for (i=0; i<n; i++) for (j=0; j<n; j++) /* d[i][j] is true if d[i][j] already true or if we can use k as intermediate vertex to reach j from i, otherwise, d[i][j] is false */ d[i][j] = d[i][j] || (d[i][k] && d[k][j]);
2.11.2 MiniMax Distance
Given a directed graph with edge lengths, the Floyd-Warshall algorithm can compute the minimax distance between each pair of nodes in O(n^3). For example of a minimax problem, refer to the Valladolid OJ problem below. w : edge weights d : minimax distance matrix p : predecessor matrix w[i][j] = length of direct edge between i and j d[i][j] = length of minimax path between i and j
Initialization
for (i=0; i<n; i++) for (j=0; j<n; j++) d[i][j] = w[i][j]; for (i=0; i<n; i++) d[i][i] = 0;
The Algorithm
for (k=0; k<n; k++) for (i=0; i<n; i++)
Advanced Data Structures using ‘C’ Unit 2
Sikkim Manipal University Page No.: 68
for (j=0; j<n; j++) d[i][j] = min(d[i][j], max(d[i][k], d[k][j]));
2.11.3 MaxiMin Distance
You can also compute the maximin distance with the Floyd-Warshall algorithm. Maximin is the reverse of minimax. Again, look at Valladolid OJ problem given below to understand maximin. w : edge weights d : maximin distance matrix p : predecessor matrix w[i][j] = length of direct edge between i and j d[i][j] = length of maximin path between i and j
Initialization
for (i=0; i<n; i++) for (j=0; j<n; j++) d[i][j] = w[i][j]; for (i=0; i<n; i++) d[i][i] = 0;
The Algorithm
for (k=0; k<n; k++) for (i=0; i<n; i++) for (j=0; j<n; j++) d[i][j] = max(d[i][j], min(d[i][k], d[k][j]));
2.11.4 Safest Path
Given a directed graph where the edges are labeled with survival probabilities, you can compute the safest path between two nodes (i.e. the path that maximizes the product of probabilities along the path) with Floyd Warshall.
w : edge weights p : probability matrix
Advanced Data Structures using ‘C’ Unit 2
Sikkim Manipal University Page No.: 69
w[i][j] = survival probability of edge between i and j p[i][j] = survival probability of safest path between i and j
Initialization
for (i=0; i<n; i++) for (j=0; j<n; j++) p[i][j] = w[i][j]; for (i=0; i<n; i++) p[i][i] = 1;
The Algorithm
for (k=0; k<n; k++) for (i=0; i<n; i++) for (j=0; j<n; j++) p[i][j] = max(p[i][j], p[i][k] * p[k][j]); Self Assessment Questions
1. Explain Single-source shortest paths in Directed Acyclic Graph (DAG).
2. List and briefly explain the variants of Floyd Warshall.
2.12 Other Graphs
2.12.1 Graph Transpose Problem
Input: directed graph G = (V,E) Output: graph GT = (V,ET), where ET = {(v,u) in VxV : (u,v) in E}. i.e. GT is G with all its edges reversed. Describe efficient algorithms for computing GT from G, for both the adjacency-list and adjacency-matrix representations of G. Analyze the running times of your algorithms.
Advanced Data Structures using ‘C’ Unit 2
Sikkim Manipal University Page No.: 70
Using Adjacency List representation, array B is the new array of Adjacency
List GT
for (i=1;i<=p;i++)
B[i] = nil;
for (i=1;i<=p;i++)
repeat {
append i to the end of linked list B[A[i]];
get next A[i];
} until A[i] = nil;
Advanced Data Structures using ‘C’ Unit 2
Sikkim Manipal University Page No.: 71
Total Time = O(V+E) Using Adjacency Matrix representation, D[ ][ ] is the new adj matrix for GT for (i=1;i<=p;i++) for (j=1;j<=p;j++) D[j][i] = C[i][j]; Total time = O(V2)
2.12.2 Euler Cycle
Input: Connected, directed graph G = (V,E) Output: A cycle that traverses every edge of G exactly once, although it may visit a vertex more than once. Theorem: A directed graph possesses an Eulerian cycle iff 1) It is connected 2) For all {v} in {V} indegree(v) = outdegree(v)
2.12.3 Euler Path
Input: Connected, directed graph G = (V,E) Output: A path from v1 to v2, that traverses every edge of G exactly once, although it may visit a vertex more than once. Theorem: A directed graph possesses an Eulerian path iff 1) It is connected 2) For all {v} in {V} indegree(v) = outdegree(v) with the possible exception of two vertices v1,v2 in which case, a) indegree(v1) = outdegree(v2) + 1 b) indegree(v2) = outdegree(v1) - 1 Not yet known: Finding O(E)-time algorithm to find an Euler tour of G if exist, using merging of edge-disjoint cycles
Advanced Data Structures using ‘C’ Unit 2
Sikkim Manipal University Page No.: 72
2.13 Topological Sort Problem
Input: A directed acyclic graph (DAG) G = (V,E) Output: A linear ordering of all vertices in V such that if G contains an edge (u,v), then u appears before v in the ordering. If drawn on a diagram, Topological Sort can be seen as a vertices along horizontal line, where all directed edges go from left to right. Note: A directed graph G is acylic if and only if a DFS of G yields no back edge. Topological-Sort(G) 1. call DFS(G) to compute finishing times f[v] for each vertex v 2. as each vertex is finished, insert it onto the front of a linked list 3. return the linked list of vertices Topological-Sort runs in O(V+E) ~~ due to DFS Self Assessment Questions
1. What is Graph Transpose Problem? Discuss.
2. Explain Eulerian Cycle & Eulerian Path with suitable example.
2.14 Strongly Connected Components problem
Input: A directed graph G = (V,E) Output: All strongly connected components of G, where in strongly connected component, all pair of vertices u and v in that component, we have u ~~> v and v ~~> u, i.e. u and v are reachable from each other. Strongly-Connected-Components(G) Step 1: call DFS(G) to compute finishing times f[u] for each vertex u ~~ O(V+E) Step 2: compute GT, inversing all edges in G ~~ O(V+E) using adjacency list
Advanced Data Structures using ‘C’ Unit 2
Sikkim Manipal University Page No.: 73
Step 3: call DFS(GT), but in the main loop of DFS, consider the vertices in order of decreasing f[u] as computed in step 1 ~~ O(V+E) Step 4: output the vertices of each tree in the depth-first forest of step 3 as a separate strongly connected component Strongly-Connected-Components runs in O(3(V+E)) ~~ O(V+E)
2.15 Summary
A spanning tree of a graph is just a subgraph that contains all the vertices and is a tree. Minimum Spanning tree is a useful method to solve practical network, business and other problems. Kruskal’s algorithm, Prim’s algorithm are used to find the minimum spanning tree. Dijkstra’s algorithm solves the problem of finding the shortes path from a point in a graph to a destination. Bellman-Ford algorithm is a most generalized single source shortest path algorithm to find the shortest path in a graph even with negative weights 2.16 Terminal Questions
1. We can sort a given set of numbers by first building a binary search tree containing these numbers (by repeatedly using Tree-Insert) and then printing the numbers by an inorder tree traversal. What are the worst-case and best-case running times for this sorting algorithm?
2. Show that if a node in a binary search tree has two children, then its successor has no left child, and its predecessor has no right child.
3. Is the operation of deletion "commutative" in the sense that deleting x and then y from a binary search tree leaves the same tree as deleting y and then x? Argue why it is or give a counterexample.
4. Write a procedure that will count the leaves of a binary tree. (Hint – think recursively)

No comments:

Post a Comment