needdle/git-dl

Git-Deep-Learning tool

This utility project provides a git command that turns the change history of a Git repository into Protobuf structures for TensorFlow Fold.

If a Git diff hunk comes from a source code file (determined by a file extension that srcML recognises), it is converted into a flattened abstract syntax tree and stored as a binary Protobuf structure.

Installation

First install the following prerequisites.

Dependencies (prerequisites)

Then perform the following commands to build the tool:

make
sudo make install

Usage

  1. Display detailed log information of Git
git dl log
  1. Turn the log information into a protobuf representation
git dl log | gitlog a

where a.pb contains the protobuf information in binary.

  1. Show the binary protobuf in nested textual format
cat a.pb | protoc -I. --decode=fast.Log git.proto
  1. Concatenate two protobuf binary files into a single one, removing the duplicates
catlog a-0.pb a-1.pb a.pb
cat a.pb | protoc -I. --decode=fast.Log git.proto

where a-0.pb and a-1.pb are the input logs and a.pb is the merged log with duplicates removed.
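The merge-with-deduplication idea can be illustrated with a minimal Python sketch. It does not use the real protobuf schema: the entries are plain dictionaries, and the assumption that two entries are duplicates when they share the same `id` field is made here for illustration only, since this README does not say how catlog compares entries.

```python
def merge_unique(*logs):
    """Concatenate several logs, keeping only the first copy of each entry.

    Each log is a list of dicts with an "id" key (a hypothetical stand-in
    for whatever key catlog really uses to detect duplicates).
    """
    seen = set()
    merged = []
    for log in logs:
        for entry in log:
            key = entry["id"]
            if key not in seen:  # skip entries already taken from an earlier log
                seen.add(key)
                merged.append(entry)
    return merged


if __name__ == "__main__":
    log_0 = [{"id": "c1"}, {"id": "c2"}]
    log_1 = [{"id": "c2"}, {"id": "c3"}]
    print([e["id"] for e in merge_unique(log_0, log_1)])
```

Keeping first-seen order means that merging chunks in file order reproduces the order of the original log.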

  1. Split the log evenly into M / ((M+N-1)/N) jobs, where M is the number of log entries and N is the number of split files; under integer division, (M+N-1)/N is M/N rounded up, i.e. the number of entries per file
git dl log | gitlog a $N

where $N is the number of split files; in other words, the command produces the series of files

a-0.pb, ..., a-($N-1).pb

so that merging them with catlog a-*.pb a.pb gives the same result as the single file a.pb produced by the earlier gitlog a command.
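The split arithmetic can be sketched in Python as follows. This is one reading of the formula above, assuming integer division throughout; `chunk_size` and `split_entries` are hypothetical helper names, not part of gitlog, and the entries are plain strings rather than protobuf records.

```python
def chunk_size(m: int, n: int) -> int:
    """Entries per split file: (m + n - 1) // n, i.e. m / n rounded up."""
    if m < 0 or n <= 0:
        raise ValueError("need m >= 0 and n >= 1")
    return (m + n - 1) // n


def split_entries(entries, n):
    """Cut entries into consecutive chunks of chunk_size(len(entries), n)."""
    size = chunk_size(len(entries), n)
    if size == 0:  # empty log: nothing to split
        return []
    return [entries[i:i + size] for i in range(0, len(entries), size)]


if __name__ == "__main__":
    entries = [f"commit-{i}" for i in range(10)]
    chunks = split_entries(entries, 4)
    print([len(c) for c in chunks])  # [3, 3, 3, 1]
    # Concatenating the chunks in order gives back the original log.
    assert [e for c in chunks for e in c] == entries
```

Under this reading the last chunk may be shorter than the others, and some inputs yield fewer than N chunks (M=9, N=4 gives three chunks of 3). How gitlog itself handles those cases is not stated here.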

  1. Parsing larger repositories can take a substantial amount of time. To speed it up, you can use GNU Parallel to run the process with the following commands:
git dl log | gitlog -p a $N
parallel --plus "cat {} | gitlog {/.log/}" ::: a-*.log
catlog a-*.pb a.pb
rm -f a-*.log a-*.pb

where -p tells gitlog to split the log into a-*.log files first; the parallel command then produces the corresponding a-*.pb files in parallel.

The above command sequence is also available as a single command:

git dl pb $file $num_threads

where $file is the filename for the protobuf output and $num_threads is the number of threads for parallel processing.
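To make the per-chunk step of the manual sequence concrete, here is a minimal Python sketch of what the GNU Parallel line expands to: for each a-*.log file, one `cat FILE | gitlog PREFIX` pipeline, where PREFIX is the file name with `.log` removed. The helper names `chunk_command` and `convert_chunks` are hypothetical, and the sketch only mirrors the commands shown above; it is not how `git dl pb` is implemented.

```python
import shlex
import subprocess
from concurrent.futures import ThreadPoolExecutor


def chunk_command(log_file: str) -> str:
    """Shell pipeline for one chunk, e.g. 'cat a-3.log | gitlog a-3'."""
    if not log_file.endswith(".log"):
        raise ValueError(f"expected a .log file, got {log_file!r}")
    prefix = log_file[: -len(".log")]
    return f"cat {shlex.quote(log_file)} | gitlog {shlex.quote(prefix)}"


def convert_chunks(log_files, workers, run=None):
    """Run one pipeline per chunk, at most `workers` at a time.

    `run` is injectable so the sketch can be exercised without gitlog
    installed; by default each command is passed to the shell.
    """
    if run is None:
        def run(cmd):
            subprocess.run(cmd, shell=True, check=True)
    commands = [chunk_command(f) for f in sorted(log_files)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(run, commands))  # list() waits and re-raises any failure
    return commands


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    convert_chunks(["a-0.log", "a-1.log"], workers=2, run=print)
```

Threads are sufficient here because each worker only waits on an external process; the merge (`catlog a-*.pb a.pb`) and cleanup steps still run afterwards as in the sequence above.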

  1. Load the Git log from the protobuf binary into Python, prepared for the TensorFlow Fold library.
python src/fold.py a.pb

where a.pb is the protobuf file created by the previous commands.

  1. Slice the Git log for a particular commit and a particular file into a protobuf binary, prepared for the TensorFlow Fold library.
git dl slice $commit $path a

where $commit is the hash of the commit, $path is the path to a file in the changeset, and a.pb is the protobuf file that the command creates.
