Hume is BBN's machine reading system, developed with support from the DARPA World Modelers and Causal Exploration programs.
For this release, we provide both a streaming implementation and a batching implementation that integrate with TwoSix's Kafka pipeline. In batching mode, because we have no reliable way to know that all messages have been received, we use a parameter to set a timeout. Streaming can be viewed as a special case of batching in the sense that 1) the batch size is very small (e.g., it could be 1), and 2) the system keeps listening for Kafka messages.
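As a concrete illustration of the batching semantics, here is a minimal sketch in Python (not Hume's actual code) of a timeout-bounded Kafka batch consumer, using the kafka-python client; the topic name and broker address are hypothetical placeholders:

import time
from kafka import KafkaConsumer

# Hypothetical topic and broker; Hume's real values come from its config file.
consumer = KafkaConsumer("cdr-topic", bootstrap_servers="localhost:9092")

def collect_batch(kafka_timeout_seconds):
    """Listen for a fixed window, then assume all messages have arrived."""
    cdrs = []
    deadline = time.time() + kafka_timeout_seconds
    while time.time() < deadline:
        # poll() returns an empty dict if nothing arrives within timeout_ms.
        for records in consumer.poll(timeout_ms=1000).values():
            cdrs.extend(record.value for record in records)
    return cdrs  # the collected batch is then handed off for processing

Streaming mode would differ only in that the loop never terminates: a mini batch is dispatched for processing whenever mini_batch_size messages have arrived or mini_batch_wait_time seconds have elapsed, whichever comes first.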
To run Hume, you need a machine with at least 2 CPU cores and 16GB RAM. We also assume it is a Linux machine with Docker installed.
Please download the following files to your local machine (please email Haoling if you need access), then unpack the archives and load the Docker image:
cx-dependency.tar.gz
2a03df7162a6.tar.gz
docker load --input 2a03df7162a6.tar.gz
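The dependency archive also needs to be unpacked. A sketch, assuming cx-dependency.tar.gz extracts to $PWD/causeex_dependency (the directory mounted into the container by the docker run commands below; adjust the path if your archive unpacks under a different name):

tar -xzf cx-dependency.tar.gz  # assumed to produce $PWD/causeex_dependency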
We also provide an example config file. It extends https://github.com/twosixlabs-dart/python-kafka-consumer/blob/master/pyconsumer/resources/env/test.json and includes the extra parameters that Hume requires.
CDR_retrieval: # URI for retrieving CDRs.
DART_upload: # URI for submitting RDF triples back to DART.
hume.domain: "CAUSEEX" # Don't change this.
hume.num_of_vcpus: # Number of CPU cores available to you. It needs to be at least 2; count only physical cores, not SMT cores.
hume.tmp_dir: # Don't change this.
hume.batching.kafka_timeout: # In batching mode, how many seconds Hume should listen to the Kafka message queue. After the timeout, the system assumes that all messages have been received and starts processing.
hume.batching.maximum_num_of_cdrs_for_processing: # A debugging switch for batching mode. If null, all Kafka messages are processed; if an integer is specified, Hume processes at most that many messages.
hume.streaming.mini_batch_size: # In streaming mode, the size of a mini batch (mini-batching improves efficiency).
hume.streaming.mini_batch_wait_time: # In streaming mode, even if fewer than mini_batch_size documents have been received, the system starts processing after mini_batch_wait_time seconds.
hume.laptop_mode: false # If set to true, the system limits memory usage and runs some components in a simplified setting to improve efficiency when running on a laptop.
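Putting these parameters together, here is a minimal sketch of what config.json might look like. The URIs and numeric values are illustrative placeholders, and keys inherited from the TwoSix test.json above are omitted, so treat this as a shape reference rather than a complete working config:

{
  "CDR_retrieval": "http://dart-host:8090/cdrs",
  "DART_upload": "http://dart-host:8091/upload",
  "hume.domain": "CAUSEEX",
  "hume.num_of_vcpus": 4,
  "hume.tmp_dir": "<keep the value shipped in the example config>",
  "hume.batching.kafka_timeout": 300,
  "hume.batching.maximum_num_of_cdrs_for_processing": null,
  "hume.streaming.mini_batch_size": 10,
  "hume.streaming.mini_batch_wait_time": 60,
  "hume.laptop_mode": false
}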
After you finish editing, name the file config.json and put it under $PWD/runtime.
Then you can run the system in batch, streaming, or laptop mode:
For batch mode,
docker run --rm --net=host --mount type=bind,source=$PWD/runtime,target=/extra --mount type=bind,source=$PWD/causeex_dependency,target=/dependencies --name hume_batch_mode 2a03df7162a6 /usr/local/envs/py3-jni/bin/python3 /wm_rootfs/git/Hume/src/python/dart_integration/batching_processing.py
For streaming mode,
docker run --rm --net=host --mount type=bind,source=$PWD/runtime,target=/extra --mount type=bind,source=$PWD/causeex_dependency,target=/dependencies --name hume_stream_mode 2a03df7162a6 /usr/local/envs/py3-jni/bin/python3 /wm_rootfs/git/Hume/src/python/dart_integration/streaming_processing.py
For running on a laptop, please run the above command for either "batch mode" or "streaming mode" with hume.laptop_mode set to true.
This work was supported by DARPA/I2O and U.S. Air Force Research Laboratory Contract No. FA8650-17-C-7716 under the Causal Exploration program, and DARPA/I2O and U.S. Army Research Office Contract No. W911NF-18-C-0003 under the World Modelers program. The views, opinions, and/or findings expressed are those of the author(s) and should not be interpreted as representing the official views or policies of the Department of Defense or the U.S. Government. This document does not contain technology or technical data controlled under either the U.S. International Traffic in Arms Regulations or the U.S. Export Administration Regulations.