
hf_olmo needs datasets dependency #822


Open: wants to merge 1 commit into base: main

Conversation

IanMagnusson
Contributor

When I run the following hf_olmo inference instructions, it looks like there is a missing dependency on datasets:

!pip install ai2-olmo
from hf_olmo import OLMoForCausalLM

olmo = OLMoForCausalLM.from_pretrained("allenai/OLMo-1B", revision="step20000-tokens84B")
---------------------------------------------------------------------------
ModuleNotFoundError                       Traceback (most recent call last)
Cell In[2], line 1
----> 1 from hf_olmo import OLMoForCausalLM
      2 from transformers import AutoTokenizer

File /opt/miniconda3/envs/DataDecide-hf-olmo-test1/lib/python3.10/site-packages/hf_olmo/__init__.py:1
----> 1 from .configuration_olmo import OLMoConfig
      2 from .modeling_olmo import OLMoForCausalLM
      3 from .tokenization_olmo_fast import OLMoTokenizerFast

File /opt/miniconda3/envs/DataDecide-hf-olmo-test1/lib/python3.10/site-packages/hf_olmo/configuration_olmo.py:8
      5 from transformers import AutoConfig, PretrainedConfig
      6 from transformers.utils import logging
----> 8 from olmo.config import ModelConfig
      9 from olmo.exceptions import OLMoConfigurationError
     11 logger = logging.get_logger(__name__)

File /opt/miniconda3/envs/DataDecide-hf-olmo-test1/lib/python3.10/site-packages/olmo/__init__.py:1
----> 1 from .config import *
      2 from .model import *
      3 from .tokenizer import *

File /opt/miniconda3/envs/DataDecide-hf-olmo-test1/lib/python3.10/site-packages/olmo/config.py:29
     27 from .aliases import PathOrStr
     28 from .exceptions import OLMoConfigurationError
---> 29 from .util import StrEnum
     31 __all__ = [
     32     "ActivationType",
     33     "ActivationCheckpointingStrategy",
   (...)
     59     "CheckpointType",
     60 ]
     62 C = TypeVar("C", bound="BaseConfig")

File /opt/miniconda3/envs/DataDecide-hf-olmo-test1/lib/python3.10/site-packages/olmo/util.py:21
     19 import boto3
     20 import botocore.exceptions as boto_exceptions
---> 21 import datasets
     22 import requests
     23 import rich

ModuleNotFoundError: No module named 'datasets'

Could we add this as a dependency of ai2-olmo so that people following the instructions avoid this error? Alternatively, we could instruct people to pip install ai2-olmo[train], which seems to include this dependency. I'm hosting a set of new models that need hf_olmo, so I'm trying to figure out what to write about setup in the model cards. Thanks!
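
In the meantime, a model card could include a small preflight check along these lines. This is only a sketch: the `missing_modules` helper and the `REQUIRED` list are hypothetical names of my own, not part of ai2-olmo or hf_olmo, and the list contains only the one module named in the traceback above. It uses the standard library's `importlib.util.find_spec` to look for each module without importing it.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    # find_spec returns None when a top-level module is not installed.
    return [name for name in names if importlib.util.find_spec(name) is None]


# Assumption: "datasets" is the only missing module, as reported in the traceback.
REQUIRED = ["datasets"]

if __name__ == "__main__":
    missing = missing_modules(REQUIRED)
    if missing:
        print("Missing modules: " + ", ".join(missing))
        print("Try: pip install " + " ".join(missing))
    else:
        print("All required modules found.")
```

Running this before `from hf_olmo import OLMoForCausalLM` would give a short, actionable message in place of the long traceback, though it does not replace declaring the dependency in the package itself.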
