Python framework for artificial text detection: NLP approaches to compare natural texts against those generated by neural networks.
The project description is available at:
We use Poetry as an enhanced dependency resolver:

```bash
make poetry-download
poetry install --no-dev
```

To create datasets for the subsequent classification step, it is necessary to collect them first. There are two available ways to do it:
- Via Data Version Control (DVC). Get in touch with @msaidov in order to get access to the private Google Drive;
- Via dataset generation. For reference, one dataset of 20,000 samples was processed with an MT model on a V100 GPU in about 30 minutes.
For the DVC option, install the Google Drive extension:

```bash
poetry add "dvc[gdrive]"
```

Then, run `dvc pull`. It will download the preprocessed translation datasets from Google Drive.
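DVC also exposes a Python API for reading tracked files programmatically. The snippet below is a minimal sketch of that idea; the `resources/data/tatoeba.bin` path is a hypothetical example, not the repository's actual layout:

```python
# A minimal sketch: reading a DVC-tracked file via the dvc.api Python API.
# The path below is a hypothetical placeholder; check the repo for the
# actual layout of the tracked translation datasets.
import dvc.api

with dvc.api.open('resources/data/tatoeba.bin', mode='rb') as f:
    raw = f.read()
print(f'read {len(raw)} bytes')
```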
To generate translations before running the artificial text detection pipeline, install the detection module from the cloned repo or from PyPI (TODO):
```bash
pip install -e .
```

Then, run the generation script:

```bash
python detection/data/generate.py --dataset_name='tatoeba' --size=20000 --device='cuda:0'
```
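Conceptually, the generation step translates source sentences with an MT model and pairs the outputs with the human-written references. The snippet below is a rough sketch of that idea using the `datasets` and `transformers` libraries; the ru-en pair, the Helsinki-NLP model, the 100-sample slice, and `device=0` are illustrative assumptions, not the script's actual defaults:

```python
# A rough sketch of the generation step (not the actual generate.py code).
# Assumptions: the ru-en pair, the Helsinki-NLP model, a 100-sample slice,
# and device=0 (a CUDA GPU).
from datasets import load_dataset
from transformers import pipeline

dataset = load_dataset('tatoeba', lang1='en', lang2='ru', split='train')
translator = pipeline('translation', model='Helsinki-NLP/opus-mt-ru-en', device=0)

pairs = dataset.select(range(100))['translation']
sources = [pair['ru'] for pair in pairs]
natural = [pair['en'] for pair in pairs]  # human-written references
generated = [out['translation_text'] for out in translator(sources)]
# `natural` vs. `generated` now form a binary dataset for the detector.
```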
To run the artificial text detection classifier, execute the pipeline:

```bash
python detection/old.py
```
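For intuition, the detection task itself is binary classification (natural vs. generated text). The sketch below shows a deliberately simple TF-IDF baseline with scikit-learn; it is an illustrative stand-in, not the repository's actual pipeline, and the placeholder texts should be replaced with the datasets produced above:

```python
# A toy sketch of the detection task as binary classification.
# scikit-learn and the TF-IDF baseline are illustrative choices only;
# replace the placeholder lists with real natural/generated datasets.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

natural = ['A sentence written by a person.'] * 50        # label 0
generated = ['A sentence produced by an MT model.'] * 50  # label 1
texts = natural + generated
labels = [0] * len(natural) + [1] * len(generated)

X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.2, random_state=42)

vectorizer = TfidfVectorizer()
clf = LogisticRegression(max_iter=1000)
clf.fit(vectorizer.fit_transform(X_train), y_train)

preds = clf.predict(vectorizer.transform(X_test))
print(f'detection accuracy: {accuracy_score(y_test, preds):.3f}')
```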