Fail to reproduce the work #32

Open
muyuhuatang opened this issue Apr 5, 2021 · 4 comments

@muyuhuatang

Could you please check the implementation steps you provided in the README file?

I followed your instructions but found it very hard to reproduce this work. Some errors come up, such as a version inconsistency between allennlp and transformers, which then leads to an error like:

subprocess.CalledProcessError: Command 'allennlp train training_config/classifier.jsonnet --include-package dont_stop_pretraining -s model_logs\citation_intent_base' returned non-zero exit status 1.

Or did I simply get some steps wrong in my setup? It is really confusing and frustrating.

@muyuhuatang

May I ask which allennlp version this project uses? I tried 2.2.0 and 0.9.0, but both lead to errors.
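
When comparing environments in a thread like this, it can help to report the exact installed versions alongside the error. A minimal sketch, assuming Python 3.8+ (for `importlib.metadata`); the helper name `installed_versions` is hypothetical, and the package list is simply the two libraries mentioned above:

```python
from importlib import metadata


def installed_versions(packages):
    """Map each package name to its installed version, or None if it is missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The package is not installed in the active environment.
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions(["allennlp", "transformers"]).items():
        print(f"{name}: {version or 'not installed'}")
```

Running this inside the conda environment used for training shows which versions were actually resolved, which may differ from what was requested.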

@coxep

coxep commented Apr 18, 2021

I tried using the pinned version (specified in environment.yml), and that also failed with the error shared above. Please provide a working environment.yml.

@gmarcial44

I think there might be an issue with the datasets that are publicly available?

ClientError: An error occurred (403) when calling the HeadObject operation: Forbidden
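
One way to check whether a given file is publicly readable is to send an HTTP HEAD request to its URL and look at the status code. The following is a minimal sketch using only the standard library, not the code path that produced the `ClientError` above; the helper name `head_status` and its `opener` parameter (there so the function can be exercised without network access) are hypothetical:

```python
import urllib.error
import urllib.request


def head_status(url, opener=urllib.request.urlopen, timeout=10):
    """Return the HTTP status code of a HEAD request to `url`."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with opener(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # Error statuses such as 403 Forbidden or 404 Not Found are raised
        # as HTTPError; report the code instead of propagating the exception.
        return err.code
```

A result of 403 for a dataset URL would suggest that the object is not publicly readable (or does not exist, since some buckets answer 403 in both cases), rather than a problem in the local setup.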

@wise-east

@gmarcial44 are you using the latest-allennlp branch? If so, I was able to get around this issue by replacing the contents of the environments.datasets.py file with the following:

NER_DATASETS = {
    "ncbi": {
        "data_dir": "/home/suching/scibert/data/ner/NCBI-disease/",
    },
    "sciie": {
        "data_dir": "/home/suching/scibert/data/ner/sciie/"
    },
    "jnlpba": {
        "data_dir": "/home/suching/scibert/data/ner/JNLPBA/"
    },
    "bc5cdr": {
        "data_dir": "/home/suching/scibert/data/ner/bc5cdr/"
    }
}



CLASSIFICATION_DATASETS = {
    "chemprot": {
        "data_dir": "https://s3-us-west-2.amazonaws.com/allennlp/dont_stop_pretraining/data/chemprot/",
        "dataset_size": 4169
    },
    "rct-20k": {
        "data_dir": "https://s3-us-west-2.amazonaws.com/allennlp/dont_stop_pretraining/data/rct-20k/",
        "dataset_size": 180040
    },
    "rct-sample": {
        "data_dir": "https://s3-us-west-2.amazonaws.com/allennlp/dont_stop_pretraining/data/rct-sample/",
        "dataset_size": 500
    },
    "citation_intent": {
        "data_dir": "https://s3-us-west-2.amazonaws.com/allennlp/dont_stop_pretraining/data/citation_intent/",
        "dataset_size": 1688
    },
    "sciie": {
        "data_dir": "https://s3-us-west-2.amazonaws.com/allennlp/dont_stop_pretraining/data/sciie/",
        "dataset_size": 3219
    },
    "ag": {
        "data_dir": "https://s3-us-west-2.amazonaws.com/allennlp/dont_stop_pretraining/data/ag/",
        "dataset_size": 115000
    },
    "hyperpartisan_news": {
        "data_dir": "https://s3-us-west-2.amazonaws.com/allennlp/dont_stop_pretraining/data/hyperpartisan_news/",
        "dataset_size": 500
    },
    "imdb": {
        "data_dir": "https://s3-us-west-2.amazonaws.com/allennlp/dont_stop_pretraining/data/imdb/",
        "dataset_size": 20000
    },
    "amazon": {
        "data_dir": "https://s3-us-west-2.amazonaws.com/allennlp/dont_stop_pretraining/data/amazon/",
        "dataset_size": 115251
    }
}


DATASETS = {"NER": NER_DATASETS, "CLASSIFICATION": CLASSIFICATION_DATASETS}
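
To illustrate how a mapping like this can be consumed, here is a minimal sketch that resolves the location of one file inside a dataset's `data_dir`. The helper `dataset_file` is hypothetical and not part of the repository, and the file name `train.jsonl` used in the note below is an assumption about how the data is laid out:

```python
# Trimmed copy of the mapping above, so that the sketch runs on its own.
DATASETS = {
    "CLASSIFICATION": {
        "citation_intent": {
            "data_dir": "https://s3-us-west-2.amazonaws.com/allennlp/dont_stop_pretraining/data/citation_intent/",
            "dataset_size": 1688,
        },
    },
}


def dataset_file(task, name, filename):
    """Build the location of `filename` inside the data_dir of a dataset."""
    try:
        entry = DATASETS[task][name]
    except KeyError:
        raise KeyError(f"unknown dataset {task}/{name}") from None
    # data_dir may or may not end with a slash; normalise before joining.
    return entry["data_dir"].rstrip("/") + "/" + filename
```

For example, `dataset_file("CLASSIFICATION", "citation_intent", "train.jsonl")` yields the `citation_intent` directory URL followed by `train.jsonl`, assuming a file with that name exists there.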


4 participants