Merge pull request #11 from deepcurator/development
Merge for Milestone 7
Showing 6,505 changed files with 1,096,776 additions and 44,860 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,21 +1,18 @@ | ||
# Code2graph | ||
|
||
The code2graph is a Python module that aims to transform the source code related to Deep Learning Architectures and methodologies into RDF graphs. In code2graph, the building blocks of the pipeline are implemented with a flexible architecture. | ||
The code2graph is a module in project [Deep Code Curator](https://github.com/deepcurator/DCC) (DCC) which aims to extract the information from scientific publications and the corresponding source code related to Deep Learning architectures and methodologies. | ||
The code2graph is a sub-module in DCC ([Deep Code Curator](https://github.com/deepcurator/DCC)) which aims to extract semantic information from the text, images, code and equations accompanying scientific DL papers. The purpose of code2graph is to build a pipeline of methodologies to extract Resource Description Framework (RDF) graphs, particularly from the code repositories related to DL publications. The figure below illustrates the current architecture. | ||
|
||
Currently, two methodologies are included in code2graph. | ||
1. Computation-based Approach, see [graphHandler.py](https://github.uci.edu/AICPS/code2graph/blob/master/core/graphHandler.py). | ||
2. The Lightweight Approach, see [graphlightweight.py](https://github.uci.edu/AICPS/code2graph/blob/master/core/graphlightweight.py). | ||
![](https://github.com/louisccc/DCC/blob/master/src/code2graph/figs/architecture.jpg?raw=true) | ||
|
||
Computation-based Approach (MNist) | The Lightweight Approach (VGG) | ||
:-------------------------:|:-------------------------: | ||
![](https://github.uci.edu/AICPS/code2graph/blob/master/figs/Sample_Output_0.png?raw=true) | ![](https://github.uci.edu/AICPS/code2graph/blob/master/figs/Sample_Output_1_.png?raw=true) | ||
Two methodologies are studied in code2graph. | ||
1. The Computational Graph-Based Approach ([graphHandler.py](https://github.com/deepcurator/DCC/blob/master/src/code2graph/core/graphHandler.py)) | ||
2. The Lightweight Approach ([graphlightweight.py](https://github.com/deepcurator/DCC/blob/master/src/code2graph/core/graphlightweight.py)) | ||
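To give a feel for what turning source code into graph triples can look like, here is a minimal sketch that uses only Python's standard `ast` module. It is not the implementation in `graphlightweight.py`; the function name `call_triples`, the `"calls"` predicate, and the sample program are all hypothetical and chosen for illustration only.

```python
import ast


def call_triples(source):
    """Return (caller, "calls", callee) triples for functions defined in `source`.

    Illustrative only: a real pipeline would resolve names, modules and types.
    """
    tree = ast.parse(source)
    triples = []
    for node in ast.walk(tree):
        if not isinstance(node, ast.FunctionDef):
            continue
        # Look at every call expression nested inside this function.
        for inner in ast.walk(node):
            if not isinstance(inner, ast.Call):
                continue
            func = inner.func
            if isinstance(func, ast.Name):         # plain call: relu(h)
                callee = func.id
            elif isinstance(func, ast.Attribute):  # attribute call: tf.matmul(a, b)
                callee = func.attr
            else:
                continue
            triples.append((node.name, "calls", callee))
    return triples


# Hypothetical input program, not taken from the repository.
SAMPLE = """
def build_model(x):
    h = dense(x)
    return relu(h)
"""

triples = call_triples(SAMPLE)
for subject, predicate, obj in sorted(triples):
    print(subject, predicate, obj)
```

Triples of this shape could then be loaded into an RDF library such as rdflib, which the package list names for RDF graph construction.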
|
||
The following figure illustrates the current pipeline architecture of code2graph: | ||
![](https://github.uci.edu/AICPS/code2graph/blob/master/figs/architecture.jpg?raw=true) | ||
You can find details from [Technical Report on Code2Graph](http://cecs.uci.edu/files/2019/05/TR-19-01.pdf). A sample visualization of the graphs generated from both methods is shown below: (using [fashion MNIST program example](https://github.com/deepcurator/DCC/blob/master/src/code2graph/test/fashion_mnist/testGraph_extensive.py)) | ||
|
||
To understand the pipeline of code2graph better, you can refer to | ||
- [Deep Code Curator - Technical Report on Code2Graph](http://cecs.uci.edu/files/2019/05/TR-19-01.pdf) | ||
Computational Graph-based Approach (MNist) | The Lightweight Approach (MNist) | ||
:-------------------------:|:-------------------------: | ||
<img src="https://github.com/louisccc/DCC/blob/master/src/code2graph/figs/Sample_Output_0.png?raw=true">|<img src="https://github.com/louisccc/DCC/blob/master/src/code2graph/figs/Sample_Output_1_.png?raw=true" width="850"> | ||
|
||
## Software Dependencies | ||
|
||
|
@@ -26,11 +23,10 @@ To understand the pipeline of code2graph better, you can refer to | |
|
||
## Installation Guide | ||
|
||
Step 1: Clone the git repository by running one of the commands shown in the following snippets. | ||
Step 1: Clone the git repository by running the command below. | ||
|
||
```shell | ||
git clone https://github.uci.edu/AICPS/code2graph.git | ||
git clone [email protected]:AICPS/code2graph.git | ||
git clone https://github.com/deepcurator/DCC.git | ||
``` | ||
|
||
Step 2: Create a python virtual environment using your favorite package management system (conda, virtualenv, etc). | ||
|
@@ -48,31 +44,47 @@ Step 3: Install the required packages to your virtual environment. | |
```shell | ||
pip install -r requirements.txt | ||
``` | ||
|
||
## Package Dependencies | ||
|
||
* jupyter==1.0.0 => Jupyter notebook. | ||
* jupyter-console==5.0.0 => Jupyter notebook. | ||
* ipython==5.3.0 => Jupyter notebook. | ||
* pyvis==0.1.6.0 => RDF graph visualization. | ||
* astor==0.7.1 => AST manipulation and printing. | ||
* beautifulsoup4==4.7.1 => Webscraping. | ||
* Keras==2.2.4 => Compile Keras projects. | ||
* tensorflow==1.13.1 => Compile tensorflow projects. | ||
* matplotlib==3.0.2 | ||
* networkx==2.2 | ||
* rdflib==4.2.2 => RDF graph construction. | ||
* requests==2.21.0 => Webscraping. | ||
* scikit-learn==0.20.2 | ||
* selenium==3.141.0 => Webscraping. | ||
* urllib3==1.24.1 => Webscraping. | ||
* wget==3.2 => Webscraping. | ||
* lxml==4.3.4 => Webscraping. | ||
* showast==0.2.4 => Visualizing AST. | ||
* autopep8==1.4.4 => Preprocess data. | ||
* apscheduler==3.6.1 => Scheduler for web crawler. | ||
|
||
## Usage Examples | ||
### Running Computation-Based Approach | ||
Under Construction, or you can also refer to the [notebook](testScript/computational_graph_based.ipynb). | ||
Refer to the [notebook](testScript/computational_graph_based.ipynb). | ||
|
||
### Running Lightweight Approach | ||
Run the following command, or you can also refer to the [notebook](testScript/light_weight.ipynb). | ||
|
||
```shell | ||
python script_run_lightweight_method.py -ipt [PATH_TO_CODE] -opt [N [N ...]] --arg | ||
``` | ||
-ipt: Path to the directory that contains the source code of your machine learning model. | ||
Refer to the [notebook](testScript/light_weight.ipynb). | ||
|
||
-opt: Types of output: 1 = call graph, 2 = call trees, 3 = RDF graphs, 4 = TensorFlow sequences. | ||
## Dataset | ||
|
||
--arg: Show arguments on graph (Hidden by default). | ||
--url: Show url/is_type relations on graph (Hidden by default). | ||
Using our script, we scraped around 600 papers from the paperswithcode.com website. Of these, 120 have a TensorFlow implementation. We ran the lightweight method on those TensorFlow repositories, and it succeeded on half of them. You can download the RDF graphs and triples we generated [here](https://osf.io/zrusg/?view_only=f6ed10613af94c6d8050796a30f1568b). | ||
|
||
### Running Webscraper for Paperswithcode website | ||
|
||
```shell | ||
python script_scrape_paperswithcode.py -cd [PATH_TO_CHROMEDRIVER] | ||
python script_service_pwc_scraper.py -cd [PATH_TO_CHROMEDRIVER] -sp [SAVE_PATH] | ||
``` | ||
|
||
-cd: Path to ChromeDriver. To get the ChromeDriver compatible with your browser go to the following website - http://chromedriver.chromium.org/downloads and download the ChromeDriver for the version of Chrome you are using. | ||
-cd: Path to ChromeDriver. To get the ChromeDriver compatible with your browser go to the following website - [ChromeDriver](http://chromedriver.chromium.org/downloads) and download the ChromeDriver for the version of Chrome you are using. | ||
|
||
### Running The Summary File Extractor | ||
### Running Computation-Based Approach | ||
-sp: The script will save the scraped data in this path. |
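The `-cd` and `-sp` flags described above could be parsed along the following lines. This is a sketch that assumes the script uses `argparse`; the actual argument handling in `script_service_pwc_scraper.py` is not shown in this diff, and the destination names `chromedriver` and `save_path` as well as the example paths are hypothetical.

```python
import argparse


def build_parser():
    """Build a parser for the two documented scraper flags (illustrative only)."""
    parser = argparse.ArgumentParser(
        description="Scrape paperswithcode.com (argument-parsing sketch)."
    )
    # -cd: path to the ChromeDriver binary matching the installed Chrome version.
    parser.add_argument("-cd", dest="chromedriver", required=True,
                        help="Path to ChromeDriver.")
    # -sp: directory in which the scraped data is saved.
    parser.add_argument("-sp", dest="save_path", required=True,
                        help="Path where the scraped data is saved.")
    return parser


# Parse an explicit argument list so the sketch runs without a command line.
args = build_parser().parse_args(["-cd", "/opt/chromedriver", "-sp", "./pwc_data"])
print(args.chromedriver, args.save_path)
```

Marking both flags as required is a design choice made for this sketch; the real script may supply defaults instead.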
File renamed without changes.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,14 @@ | ||
[PWCScraper_email] | ||
# modify to sender gmail address | ||
email_address = [email protected] | ||
# modify to sender gmail password | ||
password = sender_password | ||
# modify to recipient gmail address (delimited by comma) | ||
recipients = [email protected],[email protected] | ||
|
||
[Database] | ||
user = code2graph | ||
password = testing123 | ||
host = 127.0.0.1 | ||
port = 5432 | ||
database = code2graph |
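A configuration file with this layout can be read with Python's standard `configparser`. The sketch below assumes the scraper consumes the file in roughly this way, which the diff does not confirm; the helper name `load_settings` and the `example.com` addresses are placeholders, since the real addresses are redacted in the source.

```python
import configparser

# Same sections and keys as the config file above; the email addresses are
# placeholders, not the values used by the project.
CONFIG_TEXT = """
[PWCScraper_email]
email_address = sender@example.com
password = sender_password
recipients = first@example.com,second@example.com

[Database]
user = code2graph
password = testing123
host = 127.0.0.1
port = 5432
database = code2graph
"""


def load_settings(text):
    """Parse the INI text and return the values a scraper would likely need."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    email = parser["PWCScraper_email"]
    db = parser["Database"]
    return {
        "sender": email["email_address"],
        # Recipients are comma-delimited, as the config comment states.
        "recipients": [r.strip() for r in email["recipients"].split(",") if r.strip()],
        "db": {
            "user": db["user"],
            "host": db["host"],
            "port": db.getint("port"),  # converted from string to int
            "database": db["database"],
        },
    }


settings = load_settings(CONFIG_TEXT)
print(settings["recipients"], settings["db"]["port"])
```

To read from disk instead, `parser.read(path)` can replace `read_string`.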