Skip to content
This repository was archived by the owner on Dec 30, 2024. It is now read-only.
/ guide Public archive

Enhanced Reasoning Tautological Proof Engine guided by Tree-of-Thoughts LLM Search

Notifications You must be signed in to change notification settings

kevbuh/guide

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Guide

System Overview

This was a side project done at CUPLV. With the release of ChatGPT o1, we have decided to halt research on this project.

Enhanced Reasoning Tautological Proof Engine guided by Tree-of-Thoughts LLM Search

Paper: https://docs.google.com/document/d/1liiAKZFuTx96SoPN0SpGxQ_fUkkf4cPevdCTqcdylPo/edit?usp=sharing

Google Drive: https://drive.google.com/drive/folders/1jO0-AwkdevsX-YyOQKRhMk6_uAEuWx_H?usp=sharing

Setup

python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
# Add these lines into a .env file
# ANTHROPIC_API_KEY=<your-api-key-here>
# OPENAI_API_KEY=<your-api-key-here>

Proof Engine

Creates a proof using a guided tree-of-thoughts

python3 guide/proof_engine.py

Output

SOLVING: '(x and x) or (x and x)'
LLM: gpt-3.5-turbo
PARAMS: T=5, B=3, K=5, early_stop=False, pure_llm=False
ENGINE: symbolic
METHOD: tree of thoughts
----------------------------
TREE DEPTH:1/5
TREE DEPTH:2/5
pruning 4 leaves...
TREE DEPTH:3/5
FULLY REDUCED EXPR, PROOF DONE.
REMOVING NON-UNIQUE PROOF...
pruning 4 leaves...
TREE DEPTH:4/5
FULLY REDUCED EXPR, PROOF DONE.
REMOVING NON-UNIQUE PROOF...
REMOVING NON-UNIQUE PROOF...
FULLY REDUCED EXPR, PROOF DONE.
REMOVING NON-UNIQUE PROOF...
TREE DEPTH:5/5
----------FINAL PROOF----------
Proof:
(x and x) or (x and x)                Idempotent Law
≡ (x and x or x)                      Idempotent Law
≡ x
----------FINAL PROOF #2----------
Proof:
(x and x) or (x and x)                Distributive Law
≡ ((x and x or x) and (x and x or x)) Idempotent Law
≡ (x or x)                            Idempotent Law
≡ x
----------FINAL PROOF #3----------
Proof:
(x and x) or (x and x)                Idempotent Law
≡ (x and x or x)                      Idempotent Law
≡ (x and x)                           Idempotent Law
≡ x

CLI Arguments

usage: proof_engine.py [-h] [--expr EXPR] [--num_steps NUM_STEPS] [--debug]
                       [--verbose] [--cot] [--model MODEL] [--T T] [--B B]
                       [--K K] [--early_stop] [--pure_llm] [--unique] [--ckpt]
                       [--random] [--greedy]

options:
  -h, --help            show this help message and exit
  --expr EXPR           The expression to evaluate
  --num_steps NUM_STEPS
                        Number of proof steps
  --debug               Print debug statements
  --verbose             Print out states at each step
  --cot                 Use Chain of Thought
  --model MODEL         Specify a model to use
  --T T                 ToT tree depth
  --B B                 ToT branching factor
  --K K                 ToT max number of nodes per level
  --early_stop          Return on first proof found
  --pure_llm            Evaluate all expressions through llm instead of symbolic engine
  --unique              Select unique steps at each node
  --ckpt                Resume from last q in guide/ckpt.txt
  --random              Randomly select expression
  --greedy              Greedy select shortest expression

Synthetic Mirror

python3 guide/symbolic_mirror.py
(1)
(a or 1)                            Identity Law 1
(a or (a or 1))                     Identity Law 1
((a or a) or (a or 1))              Idempotent Law OR
((a or a) or (a or (a or 1)))       Identity Law 1
(((a or a and b) or a) or (a or (a or 1))) Absorption Law OR
(((a or b and a) or a) or (a or (a or 1))) Commutative Law AND
((((a or b and a) or a) or a) or (a or 1)) Associative Law OR
((((a or b and a) or a) or a) or (a or (a or 1))) Identity Law 1
((((a or b and a) or a) or a and 1) or (a or (a or 1))) Identity Law 2
(((((a or b and a) or a) or a and 1) or a) or (a or 1)) Associative Law OR

Symbolic Logic Reference

Notation

$\land: \text{AND}$

$\lor: \text{OR}$

$\lnot : \text{NOT}$

$\equiv: \text{Equivalence}$

$\Rightarrow: \text{Implication}$

$\Leftrightarrow: \text{Bi-Conditional}$

Laws

  1. Commutative Law:

    • $A \land B = B \land A$
    • $A \lor B = B \lor A$
  2. Associative Law:

    • $(A \land B) \land C = A \land (B \land C)$
    • $(A \lor B) \lor C = A \lor (B \lor C)$
  3. Distributive Law:

    • $A \land (B \lor C) = (A \land B) \lor (A \land C)$
    • $A \lor (B \land C) = (A \lor B) \land (A \lor C)$
  4. Identity Law:

    • $A \land 1 = A$; $A \land 0 = 0$
    • $A \lor 0 = A$; $A \lor 1 = 1$
  5. Negation Law:

    • $A \land \lnot A = 0$
    • $A \lor \lnot A = 1$
  6. Idempotent Law:

    • $A \land A = A$
    • $A \lor A = A$
  7. Absorption Law:

    • $A \land (A \lor B) = A$
    • $A \lor (A \land B) = A$
  8. De Morgan's Theorem:

    • $\lnot (A \land B) = \lnot A \lor \lnot B$
    • $\lnot (A \lor B) = \lnot A \land \lnot B$
  9. Double Negation (Involution) Law:

    • $\lnot (\lnot A) = A$
  10. Implication:

    • $A \Rightarrow B = \lnot A \lor B$

Roadmap

  • Proof engine
    • use the engine to collect all the logical prompts
    • and feed it to the LM
    • complete loop from llm choice back to prompt engine
    • track entire history of selections
      • expressions
      • laws it chose
    • prune identical proofs
  • Tree of thoughts
    • integrate original repo
    • option for GPT-3.5 Turbo or Claude-3 Haiku
    • use value evaluation and prune
    • stop when found first proof
    • add expr_history for more accurate value ratings
    • Flag to choose all llm or use symbolic engine
  • Symbolic engine
    • expression to AST representation
    • implement common laws on the AST
    • simplification engine
    • complete implication and bi-conditional laws
    • fix expression bugs (below in bug tracker)
  • Write testcases
    • simplify() (Simplification engine)
    • is_reduced()
    • proof_engine()
  • Experiments
    • remove already chosen options
    • save state into file so you can resume process later
    • Set up experiment suite to track model performance
    • look at entire tree
    • test with other prompts
    • choose law with respect to the probabilities
    • prune based off of >75 score
    • How variable are the results in terms of temp
    • How LLM performance scales with number of variables or length of expression
    • Compare LLM performance vs
      • Random choice
      • Greedy
      • simulated annealing ( prob inverse of complexity)
      • other heuristics
    • Find patterns of shortest vs longest proofs
    • Try different prompts with complexity measure like number of variables, depth, etc
    • Generate new problems by negating known unsatisfiable expressions
    • top-p
      • [0.1, 0.25, 0.5]
    • temperature tests
      • [0, 0.25, 0.5, 0.75]
    • Compare shortest vs longest proofs
    • How variable are the results in terms of temp
    • Does LLM give us insights
      • Helps with larger and larger formulas?
    • Incrementally increase size of formula
    • How often does it succeed
    • Find short proofs
    • Successful solve = repeat experiment X times with depth and time out and see how often its solved
    • Shortest proof == bfs or dfs? vs LLM solution
    • Give complexity measure like number of variables, depth, simulated annealing -> prob inverse of complexity
    • Is it just doing any better than random guesses?
  • Complex expressions
    • CK's
    • "(a > b) > ((b > c) > (a > c))"
    • Common Tautologies 19-25
      • "(P and (P > Q)) > Q"
      • "((P > Q) and not Q) > not P"
      • "((P > Q) and (Q > R)) > (P > R)"
      • "((P or Q) and not P ) > Q"
      • "(P > C) > not P"
        • This isn't a tautology?
      • "((P > Q) and (R > S)) > ((P or R) > (Q or S))"
      • "(P > Q) > ((P or R) > (Q or R))"
    • SATLIB - Benchmark Problems
    • Unsatisfiable formulas and negate

About

Enhanced Reasoning Tautological Proof Engine guided by Tree-of-Thoughts LLM Search

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published