Pregel: Distributed Graph Processing Framework
Motivation:-
Efficient processing of large graph is facing following problems.
- Poor locality of memory access
- Very little work per vertex
- Changing degree of parallelism over the course of execution
No scalable general purpose system available for implementing
arbitrary graph algorithm over arbitrary graph representation in
large-scale distributed environment.
Algorithm implementation to process large graph can be done by one of
the following options.
1) Designing custom distributed infrastructure which require
considerable implementation effort for each new algorithm or graph
representation.
2) Using available distributed computing platform which are not
always well suited for graph processing like MapReduce.
3) Use of single computer graph algorithm like BGL, LEDA, NetworkX,
JDSL which limit the scalability
4) Using existing parallel graph system like BGL & CGMgraph but
they do not address fault tolerances or other distributed system
issue.
Proposed Solution:-
Valiant’s Bulk Synchronous Parallel Model
Vertex-Centric approach to solve problem
Pragel computations consist of sequences of iterations called super
Supersteps.
During each Superstep S following operations can be performed,
- It can compute user defined function for each vertex V
- Each V can read message, sent it while S-1 Superstep.
- It can send message to V that will be received in S+1 Superstep
- It can modify state of V & outgoing edge. It can also change graph topology
Note: Messages are typically sent along outgoing edge, but message
can be sent to any vertex whose identifier is known.
Computation Model:-
Input: Directed Graph
Each vertex has Vertex-Identifier and modifiable user defined value.
Each directed edge has Source Vertices, modifiable user defined value
and target vertex identifier.
Pregel computation consists of,
- Input
- Supersteps separated by global synchronization points
- Algorithm termination
- Output
Algorithm Termination: It terminates when every vertex voting
for halt.
Example,
Consider Superstep 0 when all vertexes are active. A vertex
deactivates itself by voting for halt. If halted vertex receives any
messages then it again activate. To go again in deactivate stage,
vertex must vote for halt again. Algorithm terminates when every
vertex vote for halt.
Vertex & edge can be added & removed during computations.
Advantage over MapReduce:-
- Graph algorithm can be written as series of chained MapReduce invocation. This has bad performance & usability. Pregel overcome this problems.
- Pregel keeps vertices & edges on machine where it performs computation & only use network for message passing. But heavy network bandwidth is used in MapReduce.
- MapReduce is functional type programming, so expressing graph algorithm as chained MapReduce require passing entire state of the graph from one stage to next stage producing large overhead on communication & associated serializability.
- There is a need to co-ordinate the steps of chained MapReduce which adds programming complexity. That is avoided by Pregel because of Bulk Synchronization Model.
Sample Problem Solved using Pregel:-
Here dotted lines are messages. Dark vertices have voted to halt. In
each step supersteps, vertices send maximum vertex value to their
neighbor vertices. If vertex itself is bigger than its neighbor
vertex then it votes for halt. Algorithm terminates when every vertex
halt.
References:-
[1] G. Malewicz, M. H. Austern, A. J. C. Bik, J. C. Dehnert,I. Horn,
N. Leiser, and G. zajkowski. Pregel: A System for Large-Scale Graph
Processing. In SIGMOD, 2011.
I'm very pleased to discover this site. I need to to thank you for ones time just for this wonderful read!!
ReplyDeleteOracle Training in Chennai | Certification | Online Training Course | Oracle Training in Bangalore | Certification | Online Training Course | Oracle Training in Hyderabad | Certification | Online Training Course | Oracle Training in Online | Oracle Certification Online Training Course | Hadoop Training in Chennai | Certification | Big Data Online Training Course