Monday 9 March 2015

Write-up I sent to Semih at Stanford University about my Giraph Tester project.


Question: How do you write and run your own Apache Giraph code (Computation / InputFormat / OutputFormat)?
Answer: There are two ways that I know of.
Way 1:
1. Write your custom computation code.
2. Copy that file into the “giraph-examples” folder of the Apache Giraph source code.
3. Recompile the whole Giraph source code with Maven.
4. Use the Giraph jar to run your code, similar to the example given on the Giraph quick start page.
Problems:
Very slow process.
Requires a Hadoop pseudo-distributed mode setup.
Input and output files are in HDFS.
Every time you change your code, you need to recompile the Giraph source code with Maven.

Way 2:
1. Use Eclipse or another IDE.
2. Add the jar files from Apache Hadoop's lib folder and Apache Giraph's lib folder to the build path.
3. Write your custom InputFormat/OutputFormat/Computation Java code.
4. Write your Giraph runner Java code (the Giraph job runner file).
5. Run and test your code with a single click.
Advantages:
Faster than Way 1.
No Hadoop setup required.
Input and output files are on the local file system only.

In the development phase, we make a lot of changes to our code and need fast results on our sample test input file, so Way 2 works better in this case.

Question: How can we debug in these two cases?
Answer: We have already discussed two ways to write and run Giraph code.
For Way 1, Semih and his team at Stanford University have developed a tool called Graft, which is now part of the Apache Giraph project.
For Way 2, I am not sure whether Graft also works. If not, then we can build another debugger that works for Way 2.


Question: How to approach building a debugger for Way 2?
Answer: The basic idea is to trace the state of the vertices, edges and messages in each superstep, store them in JSON format, and plot them using a custom graph visualization program.
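To make the idea concrete, here is a minimal sketch of a per-superstep snapshot serialized to JSON with plain Java. The class names (VertexSnapshot, SuperstepTrace), the field names in the JSON, and the choice of long ids with double values are all assumptions made for illustration. This is not my actual JSON format and not anything provided by Giraph.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical state of one vertex in one superstep (illustrative only). */
final class VertexSnapshot {
    final long id;
    final double value;
    final List<Long> outEdges;     // ids of the target vertices
    final List<Double> inMessages; // messages received in this superstep

    VertexSnapshot(long id, double value, List<Long> outEdges, List<Double> inMessages) {
        this.id = id;
        this.value = value;
        this.outEdges = outEdges;
        this.inMessages = inMessages;
    }

    /** JSON has no Infinity/NaN literal, so non-finite values are written as null. */
    static String num(double d) {
        return Double.isFinite(d) ? Double.toString(d) : "null";
    }

    String toJson() {
        String edges = outEdges.stream().map(String::valueOf)
                .collect(Collectors.joining(",", "[", "]"));
        String messages = inMessages.stream().map(VertexSnapshot::num)
                .collect(Collectors.joining(",", "[", "]"));
        return "{\"id\":" + id + ",\"value\":" + num(value)
                + ",\"edges\":" + edges + ",\"messages\":" + messages + "}";
    }
}

/** All vertex snapshots collected for a single superstep. */
final class SuperstepTrace {
    final long superstep;
    private final List<VertexSnapshot> vertices = new ArrayList<>();

    SuperstepTrace(long superstep) {
        this.superstep = superstep;
    }

    void add(VertexSnapshot v) {
        vertices.add(v);
    }

    String toJson() {
        String body = vertices.stream().map(VertexSnapshot::toJson)
                .collect(Collectors.joining(","));
        return "{\"superstep\":" + superstep + ",\"vertices\":[" + body + "]}";
    }
}
```

A visualization program could then read one such object per superstep and redraw the graph from it. A real tool would probably use a JSON library instead of string concatenation.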


Question: How much progress have I made on building a debugger for Way 2?
Answer: I have partially developed a graph visualization program, which is inspired by Graft. I have defined my own JSON format and use it to plot the graph for each corresponding superstep.


Question: Where am I stuck?
Answer: I have not been able to figure out how to trace the program. I must store the trace in JSON format and visualize it.
I can either write my own trace method, which the user needs to call, passing in the current vertex and message state as parameters.
Or I can change the original Java source code of the standard classes (like BasicComputation) and repackage them in a jar, so the user must use my modified jar files instead of the original Giraph lib jars.
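The first option (a trace method that the user calls explicitly) could look roughly like the sketch below. TraceRecorder and its methods are hypothetical names, and the long/double types are an assumption. Nothing here comes from Giraph or Graft. The idea is that the user's compute() method would call trace() once per vertex with plain values copied out of the vertex and its messages, and the runner would call writeTo() after the job finishes.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper: user code calls trace() explicitly from compute(). */
final class TraceRecorder {
    // superstep -> one JSON record per traced vertex, in call order
    private final Map<Long, List<String>> bySuperstep = new TreeMap<>();

    /** Synchronized in case compute() is invoked from more than one thread. */
    synchronized void trace(long superstep, long vertexId, double value,
                            Iterable<Double> messages) {
        StringBuilder msgs = new StringBuilder("[");
        for (Double m : messages) {
            if (msgs.length() > 1) {
                msgs.append(',');
            }
            msgs.append(num(m));
        }
        msgs.append(']');
        String record = "{\"id\":" + vertexId + ",\"value\":" + num(value)
                + ",\"messages\":" + msgs + "}";
        bySuperstep.computeIfAbsent(superstep, k -> new ArrayList<>()).add(record);
    }

    /** Returns a JSON array with one object per superstep, in ascending order. */
    synchronized String toJson() {
        StringBuilder sb = new StringBuilder("[");
        for (Map.Entry<Long, List<String>> e : bySuperstep.entrySet()) {
            if (sb.length() > 1) {
                sb.append(',');
            }
            sb.append("{\"superstep\":").append(e.getKey())
              .append(",\"vertices\":[")
              .append(String.join(",", e.getValue()))
              .append("]}");
        }
        return sb.append(']').toString();
    }

    /** Stores the whole trace in a local file for the visualization program. */
    void writeTo(Path file) throws IOException {
        Files.write(file, toJson().getBytes(StandardCharsets.UTF_8));
    }

    // JSON has no Infinity/NaN literal, so non-finite values become null.
    private static String num(double d) {
        return Double.isFinite(d) ? Double.toString(d) : "null";
    }
}
```

The trade-off between the two options is visible here: this helper needs no modified jars, but the user has to remember to call trace() in every compute() method, whereas a patched BasicComputation could record the state automatically.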

2 comments:

  1. Hi Nishant. Can you share sample code to run it from Eclipse: the custom InputFormat/OutputFormat/Computation Java code and also the GiraphRunner code for it? It would be very helpful for me. Can you guide me on that? Please help me, bro, I have been stuck on this for the last 15 days.

  2. I will make a video and send it to you.
