CoCoSUM

1. Environment

OS: Ubuntu 16.04.6 LTS / 18.04.4 LTS

$ conda create -n CoCoSUM python=3.6
$ conda activate CoCoSUM
$ conda install numpy pytorch-gpu=1.3.1
$ pip install torch-scatter==1.4.0  
$ pip install torch-sparse==0.4.3  
$ pip install torch-cluster==1.4.5 
$ pip install torch-geometric==1.3.2
$ pip install git+https://github.com/casics/spiral.git
$ pip install nltk==3.2.5  ipdb==0.13.3 javalang==0.12.0 networkx==2.3 BeautifulSoup4
$ pip install lxml
$ conda install natsort==7.0.1 pydot==1.4.1 
$ pip install tensorflow-gpu==2.0.0 tensorflow-hub==0.7.0 decamelize==0.1.2 
$ sudo apt install tofrodos

Docker

docker pull enshi/cocosum:v1
docker run --runtime=nvidia -itd --name=cocosum_env --ipc=host --net=host -v $HOME/sciteamdrive2:/sciteamdrive2 enshi/cocosum:v0
# the above command will return the  [container id]
docker start  [container id]
docker exec -it   [container id] /bin/bash

you should download srcml 0.9.5 and install it using:

$ sudo dpkg -i srcML-Ubuntu[your version]

2. Quick start

It will spend so many time in data processing. Thus, we provide two version open source code: all code based and mini data based.

All data

From scratch

If all the requirements are met, just run:

$ fromdos  ./pipeline.sh
$ bash ./pipeline.sh

It will

download the Code Search Net Raw Data and save in data/csn/java.
get code, summary, sbt ,and uml & save in csn/data/code, csn/data/summary, csn/data/sbt, and csn/data/uml respectively.
Build vocabulary and dataset & save in csn/data/raw_vocabulary and csn/data/dataset
train and evaluate the model.

More details you can see here

Obtain Dataset and Train a Model

step 1: Data Preparation

It will spend so many time in full data processing. Thus, we provide a processed dataset. you can:

$ wget http://157.55.87.47:13111/data.tar.bz2
$ tar -jxvf data.tar.bz2
$ mv data/csn/* CodeSum/data/csn/
$ cd ./CodeSum/data/csn/
$ cp dlen100_clen12_slen435_dvoc10000_cvoc10000_svoc10000_dataset.pkl  ./dataset
$ cp cfp1_csi1_cfd0_clc1.pkl ./summary
$ cd ../../..

step 2: Training a model

You can either load a trained model from ./model/model.pth, or train it from scratch by:

$ cd CodeSum
$ python runGruUml.py -modeltype=uml  -dty=ASTAttGRU_AttTwoChannelTrans -dataset_path "./data/csn/"

step 3: Automatic Evaluation

Please specify model_file and prediction_path according to your file's location

$ python evaluate.py -modeltype=uml -model_root_path="./model/" -model_file='model.pth' -prediction_path='./predict'

Mini data

Similar to All data. If all the requirements are met, just run:

$ cd CodeSum
$ fromdos  ./pipeline_mini_dataset.sh
$ bash ./pipeline_mini_dataset.sh

It will take about 5 min.

The log of data processing will save in ./datapreprocess/CodeSearchNet/log
The model will save in ./model
we provide four metrics: BlEU-4, ROUGe, Cider, and Meteor.

[Optional] data format: pickle2jsonl

We provide a script for data format transformation from pickle to json/jsonl.

cd pickle2jsonl
python pickle2jsonl.py --filename {PATH_TO_PICKLE_FILE} --outdir {JSON_OUTPUT_DIR}

RQ1

bash script/RQ1.sh

If you encounter problems running the script, do fromdos script/RQ1.sh and try again.

RQ2

bash script/RQ2.sh

RQ3

See jupyter notebook here.

RQ4

bash script/RQ4.sh

Name		Name	Last commit message	Last commit date
Latest commit History 1 Commit
RQ		RQ
UmlEmbeddingTools		UmlEmbeddingTools
appendix		appendix
clsname_sum_intersect		clsname_sum_intersect
code_summarization		code_summarization
csn_mini_data		csn_mini_data
data/csn		data/csn
datapreprocess		datapreprocess
human-evaluation		human-evaluation
model		model
models		models
output		output
pickle2jsonl		pickle2jsonl
script		script
update_script		update_script
util		util
.gitignore		.gitignore
README.md		README.md
evaluate.py		evaluate.py
explore_data.ipynb		explore_data.ipynb
hybrid_result.ipynb		hybrid_result.ipynb
loadModel.py		loadModel.py
pipeline.sh		pipeline.sh
pipeline_mini_dataset.sh		pipeline_mini_dataset.sh
runGruUml.py		runGruUml.py
runSeqModel.py		runSeqModel.py
tf2.yaml		tf2.yaml
vsm_test.ipynb		vsm_test.ipynb

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

CoCoSUM

1. Environment

2. Quick start

All data

From scratch

Obtain Dataset and Train a Model

step 1: Data Preparation

step 2: Training a model

step 3: Automatic Evaluation

Mini data

[Optional] data format: pickle2jsonl

RQ1

RQ2

RQ3

RQ4

About

Uh oh!

Releases

Packages

Languages

DeepSoftwareAnalytics/cocosum

Folders and files

Latest commit

History

Repository files navigation

CoCoSUM

1. Environment

2. Quick start

All data

From scratch

Obtain Dataset and Train a Model

step 1: Data Preparation

step 2: Training a model

step 3: Automatic Evaluation

Mini data

[Optional] data format: pickle2jsonl

RQ1

RQ2

RQ3

RQ4

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages