Skip to content

HarvardPL/datalog-smmt-cvc4-impl

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

225 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

datalog-smmt-cvc4-impl

Prototype SMT solver supporting Datalog as a monotonic theory.

The canonical version of this tool is in the Zenodo artifact for the POPL'23 paper "From SMT to ASP: Solver-Based Approaches to Solving Datalog Synthesis-as-Rule-Selection Problems" by Aaron Bembenek, Michael Greenberg, and Stephen Chong.

This repository includes a (modified) copy of the SMT solver CVC4, which has its own license (see CVC4-1.8/COPYING).

Project structure

  • CVC4-1.8/: a version of the SMT solver CVC4 extended with our theory of Datalog
  • dlog/: the Datalog source code that will be compiled and linked into the SMT solver
  • examples/: a couple example SMT-LIB programs using Datalog-based theories
  • scripts/: various scripts for building different parts of the system
    • build_cvc4.sh: build CVC4; needs to be run whenever you change the CVC4 source code or if you build a new Datalog library
    • build_dlog.sh: compile the Datalog code into a library
    • build_monosat.sh: build the MonoSAT system (to compare against)
  • benchmarks/: material for evaluating the correctness and efficiency of our system, and comparing it to other systems
    • reach/: scripts for benchmarking graph reachability
    • worlds/: scripts for benchmarking galaxy generation
    • synthesis/: scripts for benchmarking Datalog synthesis

Booting up the system

The project is set-up to run in a Docker container; we have some docker-compose configurations for convenience. If you use the container, the only dependencies should be Docker and docker-compose.

I'd recommend booting up the container like this:

docker-compose run eval

Using this command, the CVC4-1.8/, dlog/, scripts/, and tests/ directories will be mounted into the container, so that you can edit files in these directories locally (outside the container). In what follows, I will assume that you have booted up the container this way; all commands should be issued from inside the home directory of the container.

To build our modified version of CVC4, we first need to compile our Datalog code, and then we can build the SMT solver itself:

scripts/build_dlog.sh && scripts/build_cvc4.sh

If you want to compare against MonoSAT, you can build it with this command:

scripts/build_monosat.sh

Running the SMT solver

SMT-LIB

We have extended SMT-LIB with a new predicate dlog, which is in the form:

(dlog n relname i_1 ... i_m b_1 ... b_n)

where:

  • n is a natural number specifying the number of boolean parameters
  • relname is a string specifing the name of the Datalog relation
  • the i_j are integer arguments to the Datalog relation
  • the b_k are boolean parameters

So, assuming we have a Datalog theory defining graph reachability (which we do), we can say that node 1 is reachable from node 0 under the edge assignments e0_0, e0_1, e1_0, and e1_1:

(set-logic ALL)
(declare-fun e0_0 () Bool)
(declare-fun e0_1 () Bool)
(declare-fun e1_0 () Bool)
(declare-fun e1_1 () Bool)
(assert (dlog 4 "reach" 0 1 e0_0 e0_1 e1_0 e1_1))
(check-sat)

If you saved this program in foo.smt2, you can run it by using the command:

cvc4 foo.smt2

Python API

The Datalog predicate is also available through the Python API as pycvc4.kinds.DatalogPred; see the function test_datalog in tests/reach/bm.py for example usage.

Options

We currently support the following options:

  • datalog-why-provenance: use "why" provenance in conflict justifications (default=true)
  • datalog-why-not-provenance: use "why not" provenance in conflict justifications (default=true)
  • datalog-parallelism=N: use N threads in the Datalog solver (default=1)
  • datalog-buffer-size=N: wait for N boolean parameters to become assigned between theory checks (default=5)
  • datalog-theory-propagation: propagate Datalog theory atoms (default=true)

Any boolean option <x> also has an opposite option no-<x>; for example, we can run the command:

cvc4 foo.smt2 --no-datalog-why-not-provenance --datalog-parallelism=4 

Current Datalog theories

Graph reachability

  • Predicate: reach/2
    • Denotes that there is a path in the graph from the vertex identified by the first argument to the vertex identified by the second one
    • If a graph has n vertices, they are numbered from 0 to n-1
    • Parameter arity: equal to the square of the number of vertices in the graph
    • Each boolean parameter represents the presence of an edge: if there are n vertices in the graph, then there is a correspondence between the edge from i to j and the i * n + jth (0-indexed) parameter

Galactic map generation

This theory is used to generate a map for a hypothetical spaced-based game, where there are a bunch of worlds distributed across a pool of solar systems, with each world varying in its benevolence, its level of technology, and its level of resources. There is a set of rules that determine whether two worlds are in contact with each other and, if so, whether they trade with each other and whether they are at war with each other. There are 100 candidate worlds; a choice of worlds can be constrained using the following predicates. For all predicates, parameter i corresponds to world i.

  • Predicate: Worlds.Trades/1
    • True if the world corresponding to the argument trades with some world
  • Predicate: Worlds.AtWar/1
    • True if the world corresponding to the argument is at war with some world
  • Predicate: Worlds.Trade/0
    • True if there is some world that trades
  • Predicate: Worlds.War/0
    • True if there is some world that is at war
  • Predicate: Worlds.Smuggling/0
    • True if there is some world that trades with a world it is at war with
  • Predicate: Worlds.IntraPlanetaryStrife/0
    • True if there is some world that is at war with itself

Note: Because of a bug in Souffle's provenance mechanism, the nullary predicates listed above actually need to be unary predicates; the argument should always be 0.

Adding a new Datalog theory

Writing the program

First, you need to write the program and store it in the dlog/souffle/ directory. Nonnegative integers are the only type of term we allow. You then need to add the program to dlog/souffle/CMakeLists.txt. See dlog/souffle/reach.dl for an example.

Adding the SMT theory

Now that you have the compiled Datalog code, you need to add some C++ glue code to the CVC4 source code that tells our "theory of Datalog" implementation how to interact with the new Datalog program. First, register the program in cvc4/src/theory/datalog/souffle_solver.cpp (using the name from dlog/souffle/CMakeLists.txt).

Then you need to implement a class representing the Datalog program and classes for each output relation in the program. See the existing implementations in cvc4/src/theory/datalog/relations/ for guidance. Once you've done this, register the output relations in the function DatalogRelation::initialize() in cvc4/src/theory/datalog/datalog_relation.cpp.

At this point, rebuild CVC4 (scripts/build_cvc4.sh). Fingers crossed, your new theory should work!

About

Prototype SMT solver supporting Datalog as a monotonic theory

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors