Question: How do you write and run your own Apache Giraph code (Computation / InputFormat / OutputFormat)?
Answer:
As far as I know, there are two ways.
Way 1:
1. Write your custom computation code.
2. Copy that file into the “giraph-examples” folder of the Apache Giraph source code.
3. Recompile the whole Giraph source code with Maven.
4. Use the Giraph jar to run your code, similar to the example given on the Giraph quick start page.
Problems:
- It is a very slow process.
- A Hadoop pseudo-distributed mode setup is required.
- Input and output files are in HDFS.
- You need to recompile the Giraph source code with Maven every time.
Way 2:
1. Use Eclipse or another IDE.
2. Add the jar files from Apache Hadoop's lib folder and Apache Giraph's lib folder to the build path.
3. Write your custom InputFormat/OutputFormat/Computation Java code.
4. Write your Giraph runner Java code (the Giraph job runner file).
5. Run and test your code with a single click.
Advantages:
- Faster than Way 1.
- No Hadoop setup is required.
- Input and output files are on the local file system only.
- In the development phase, we make a lot of changes to our code and need fast results on a sample test input file, so Way 2 works better in this case.
Question:
How can we debug in those two cases?
Answer:
We have already discussed two ways to write and run Giraph code.
For Way 1, Semih and his team at Stanford University have developed a tool called Graft, which is now part of the Apache Giraph project.
For Way 2, I am not sure whether Graft also works. If not, then we can build another debugger that works for Way 2.
Question:
How should we approach building a debugger for Way 2?
Answer:
The basic idea is to trace the state of the vertices, edges, and messages in each superstep, store them in JSON format, and plot them using a custom graph visualization program.
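To make the idea concrete, here is a minimal sketch of what such a per-superstep trace could look like. It is plain Java with no Giraph dependency. The class names (`VertexTrace`, `SuperstepTrace`), the JSON field names, and the choice of `long` ids with `double` values are all assumptions made for illustration; they are not the JSON format used by Graft or by my visualization program.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical record: the state of one vertex in one superstep. */
final class VertexTrace {
    final long vertexId;
    final double value;
    final Map<Long, Double> edges;   // target vertex id -> edge weight
    final List<Double> messages;     // messages received in this superstep

    VertexTrace(long vertexId, double value,
                Map<Long, Double> edges, List<Double> messages) {
        this.vertexId = vertexId;
        this.value = value;
        this.edges = new TreeMap<>(edges);          // copy, sorted by target id
        this.messages = new ArrayList<>(messages);  // copy, keeps arrival order
    }

    /** Hand-built JSON; assumes all doubles are finite (NaN/Infinity are not valid JSON). */
    String toJson() {
        StringBuilder sb = new StringBuilder();
        sb.append("{\"id\":").append(vertexId)
          .append(",\"value\":").append(value)
          .append(",\"edges\":[");
        boolean first = true;
        for (Map.Entry<Long, Double> e : edges.entrySet()) {
            if (!first) sb.append(',');
            sb.append("{\"target\":").append(e.getKey())
              .append(",\"weight\":").append(e.getValue()).append('}');
            first = false;
        }
        sb.append("],\"messages\":[");
        for (int i = 0; i < messages.size(); i++) {
            if (i > 0) sb.append(',');
            sb.append(messages.get(i));
        }
        return sb.append("]}").toString();
    }
}

/** Groups vertex traces by superstep and renders them as one JSON document. */
final class SuperstepTrace {
    private final Map<Long, List<VertexTrace>> bySuperstep = new TreeMap<>();

    void record(long superstep, VertexTrace trace) {
        bySuperstep.computeIfAbsent(superstep, k -> new ArrayList<>()).add(trace);
    }

    String toJson() {
        StringBuilder sb = new StringBuilder("{\"supersteps\":[");
        boolean first = true;
        for (Map.Entry<Long, List<VertexTrace>> e : bySuperstep.entrySet()) {
            if (!first) sb.append(',');
            sb.append("{\"superstep\":").append(e.getKey()).append(",\"vertices\":[");
            List<VertexTrace> vertices = e.getValue();
            for (int i = 0; i < vertices.size(); i++) {
                if (i > 0) sb.append(',');
                sb.append(vertices.get(i).toJson());
            }
            sb.append("]}");
            first = false;
        }
        return sb.append("]}").toString();
    }
}
```

Grouping by superstep keeps the output close to what a visualizer needs: it can step through the `supersteps` array and redraw the graph from each entry's `vertices` list.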
Question:
How much progress have I made on building a debugger for Way 2?
Answer:
I have partially developed a graph visualization program, which is inspired by Graft. I have defined my own JSON format and use it to plot the graph for each corresponding superstep.
Question:
Where am I stuck?
Answer:
I am not able to figure out how to trace the program. I must store the trace in JSON format and visualize it.
One option is to write my own trace method, which the user must call, passing the current vertex and message status as parameters.
The other option is to change the original Java source code of the standard files (like BasicComputation) and repackage them in a jar, so that the user must use my modified jar files instead of the original giraph-lib jars.
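The first option, a trace method that the user calls, could look roughly like the sketch below. `ComputeTracer` and its `trace` method are hypothetical names, not part of Giraph. The arguments are taken as plain `Object`s so the helper stays independent of Giraph's Writable types. In a real computation they would presumably come from `getSuperstep()`, `vertex.getId()`, `vertex.getValue()`, and the `messages` iterable passed to `compute()`. Whether that iterable can safely be read a second time after tracing is not addressed here.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.Collections;

/** Hypothetical helper that the user calls at the start of compute(); not a Giraph class. */
final class ComputeTracer {
    private final Appendable out;

    ComputeTracer(Appendable out) {
        this.out = out;
    }

    /**
     * Appends one JSON object per call, one object per line.
     * Synchronized on the assumption that compute() may run on several threads.
     */
    synchronized void trace(long superstep, Object vertexId,
                            Object vertexValue, Iterable<?> messages) {
        StringBuilder line = new StringBuilder();
        line.append("{\"superstep\":").append(superstep)
            .append(",\"vertex\":").append(quote(vertexId))
            .append(",\"value\":").append(quote(vertexValue))
            .append(",\"messages\":[");
        boolean first = true;
        for (Object m : messages) {
            if (!first) line.append(',');
            line.append(quote(m));
            first = false;
        }
        line.append("]}\n");
        try {
            out.append(line);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Renders a value as a JSON string via toString(); escapes only backslashes and quotes. */
    private static String quote(Object o) {
        String s = String.valueOf(o).replace("\\", "\\\\").replace("\"", "\\\"");
        return "\"" + s + "\"";
    }
}

final class ComputeTracerDemo {
    public static void main(String[] args) {
        StringWriter sink = new StringWriter();
        ComputeTracer tracer = new ComputeTracer(sink);
        // Stand-ins for the values a real compute() call would supply.
        tracer.trace(0, 1L, 0.0, Collections.emptyList());
        tracer.trace(1, 2L, 1.5, Arrays.asList(1.5, 4.0));
        System.out.print(sink);
    }
}
```

The trade-off matches the two options above: this helper needs no modified Giraph jars, but every computation has to remember to call it, whereas a patched BasicComputation would trace automatically at the cost of shipping custom jars.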
Hi Nishant, can you share sample code to run it from Eclipse: the custom InputFormat/OutputFormat/Computation Java code and also the GiraphRunner code? It would be very helpful for me. Can you guide me on that? Please help me, I have been stuck on this for the last 15 days.
I will make a video and send it to you.