Add a "HOWITWORKS.md" to explain how ProofOfThought works #1
Open — ttulttul wants to merge 8 commits into DebarghaG:main from ttulttul:main
Commits
* bcf809b: Added a 'how it works'
* c7d6698: Added an introductory paragraph and changed the data structure's code…
* 7043af2: More formatting fixes
* efaca28: Add a title
* d7293ec: Formatting in the intro paragraph, and removing some fluff.
* c17cec9: Added a section on limitations and security considerations.
* 0eddafc: Prefer structured responses when calling the LLM if that capability e…
* fc4b433: Adding a couple of files I neglected to commit
## How `ProofOfThought` Works

`ProofOfThought` addresses one of the most significant challenges in modern artificial intelligence: the tendency of Large Language Models (LLMs) to "hallucinate", generating plausible but *logically flawed* answers. Its core innovation is a hybrid reasoning system that combines the flexible, human-like understanding of LLMs with the rigorous, verifiable certainty of a formal theorem prover. Instead of simply trusting an LLM's direct answer, the project gives the LLM a more structured goal: act as a programmer that translates the user's question into a symbolic logic program. That program is then executed by the Z3 theorem prover, a trusted engine that guarantees the final `True` or `False` result is not a guess but a conclusion derived from logically consistent facts and rules. The most inventive aspect is the automated self-correction loop: if the generated logic program fails, the system feeds the error back to the LLM, enabling it to refine its reasoning until it produces a program that is both syntactically correct and logically sound.

Here's how it works:
### 1. High-Level User Query

A user starts by posing a question in natural language to the `ProofOfThought` class, which is the main high-level API for the system.

```python
from openai import OpenAI
from z3dsl.reasoning import ProofOfThought

client = OpenAI()
pot = ProofOfThought(llm_client=client)
result = pot.query("Would Nancy Pelosi publicly denounce abortion?")
```
### 2. Program Generation via LLM

Instead of answering the question directly, the system uses an LLM (e.g. `GPT-5`) to act as a programmer.

* **Prompting**: The user's question is embedded into a highly detailed prompt template. This prompt instructs the LLM on how to decompose the question and convert it into a JSON-based Domain-Specific Language (DSL) that represents the problem's logic.
* **Logical Decomposition**: The LLM breaks down the question. For "Would Nancy Pelosi publicly denounce abortion?", it reasons that:
    1. Nancy Pelosi is a person and abortion is an issue.
    2. A known fact is that Nancy Pelosi supports abortion rights.
    3. A general rule is that people do not publicly denounce things they support.
* **JSON Program Creation**: The LLM translates this reasoning into a JSON program. This program defines sorts (types like `Person`), constants (`nancy_pelosi`), functions (`supports`), a knowledge base (facts and rules), and a single verification condition to test the original question.
```javascript
// Example generated JSON from the project's documentation
{
  "sorts": [...],
  "functions": [...],
  "constants": {
    "persons": {"sort": "Person", "members": ["nancy_pelosi"]},
    "issues": {"sort": "Issue", "members": ["abortion"]}
  },
  "knowledge_base": [
    {"assertion": "supports(nancy_pelosi, abortion)"},
    {"assertion": "ForAll([p, i], Implies(supports(p, i), Not(publicly_denounce(p, i))))"}
  ],
  "verifications": [
    {"name": "Pelosi Denounce Abortion", "constraint": "publicly_denounce(nancy_pelosi, abortion)"}
  ],
  "actions": ["verify_conditions"]
}
```
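The exact schema is defined by the project's prompt template, but a first sanity check on the LLM's output is cheap to do before handing anything to Z3. The sketch below is a hypothetical illustration, not the project's validator; the key list is inferred from the example program above.

```python
import json

# Top-level keys inferred from the example program above; the project's
# actual DSL schema may differ (hypothetical illustration only).
REQUIRED_KEYS = {"sorts", "functions", "constants",
                 "knowledge_base", "verifications", "actions"}

def validate_program(text: str) -> dict:
    """Parse an LLM-generated program, raising an error whose message
    could be fed back to the model for self-correction."""
    program = json.loads(text)  # raises json.JSONDecodeError on invalid JSON
    missing = REQUIRED_KEYS - program.keys()
    if missing:
        raise ValueError(f"missing top-level keys: {sorted(missing)}")
    return program
```

A check like this catches malformed output early, so the feedback loop in step 4 can report a precise structural error rather than an opaque solver failure.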
### 3. Z3 Theorem Proving

* **Interpretation**: The generated JSON program is passed to the `Z3JSONInterpreter`. This component reads the program and translates the DSL into commands and assertions for the Z3 theorem prover.
* **Execution**: The Z3 solver loads the facts from the `"knowledge_base"` and then checks whether the `"constraint"` from the `"verifications"` section can be true. The result is one of the following:
    * **`UNSAT` (Unsatisfiable)**: The constraint contradicts the knowledge base. The system interprets this as `False`.
    * **`SAT` (Satisfiable)**: The constraint is logically consistent with the knowledge base. The system interprets this as `True`.
    * **`UNKNOWN`**: The solver could not decide within its limits (typically a timeout).

In the Nancy Pelosi example, Z3 finds that `publicly_denounce(nancy_pelosi, abortion)` contradicts the knowledge base, yielding `UNSAT` and a final answer of `False`.
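Z3 does the real work here, but the contradiction is small enough to see in miniature. The sketch below is plain Python, not the project's code: it grounds the single rule over the known facts by hand and checks the verification constraint against the result.

```python
# Facts from the knowledge base, as (predicate, subject, object) triples.
facts = {("supports", "nancy_pelosi", "abortion")}

# Rule: ForAll([p, i], Implies(supports(p, i), Not(publicly_denounce(p, i)))).
# Grounding it over the facts yields the atoms the rule forces to be false.
forced_false = {("publicly_denounce", person, issue)
                for (pred, person, issue) in facts if pred == "supports"}

# The constraint from the "verifications" section.
constraint = ("publicly_denounce", "nancy_pelosi", "abortion")

# Asserting the constraint alongside the knowledge base is contradictory,
# which is what Z3 reports as UNSAT, so the final answer is False.
answer = constraint not in forced_false
print(answer)  # False
```

A real program can of course contain quantifiers and functions that no simple grounding handles, which is exactly why the system delegates to a full SMT solver instead of ad-hoc logic like this.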
### 4. Self-Correction and Error Handling

A key feature of the project is its automatic retry mechanism. If any step fails, the system attempts to correct itself.

* **JSON or Z3 Errors**: If the LLM generates invalid JSON, or if the Z3 interpreter fails (e.g. due to a logical error in the program), the error message is captured.
* **Feedback Loop**: The system initiates another call to the LLM. This time, it provides the original question, the faulty program, and the specific error trace. It then asks the LLM to "fix the JSON accordingly."
* **Retry Attempts**: This feedback loop continues for a configurable number of attempts (`max_attempts`), allowing the LLM to iteratively debug and correct its own generated code.
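The control flow above can be sketched as follows. `generate_program` and `run_z3` are hypothetical stand-ins for the LLM call and the `Z3JSONInterpreter`; only the retry structure reflects the description, not the project's actual function names.

```python
def solve_with_retries(question, generate_program, run_z3, max_attempts=3):
    """Generate a program, execute it, and feed any error back to the LLM."""
    feedback = None  # the first attempt has no error to report
    for _ in range(max_attempts):
        # Hypothetical LLM call; on retries it also receives the faulty
        # program and the error trace, and is asked to fix the JSON.
        program = generate_program(question, feedback)
        try:
            return run_z3(program)  # hypothetical interpreter/solver call
        except Exception as exc:
            feedback = (program, str(exc))
    raise RuntimeError(f"no valid program after {max_attempts} attempts")
```

Keeping the loop bounded by `max_attempts` prevents a persistently confused model from burning API calls indefinitely.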
### 5. Architecture and Evaluation

The project is structured into two main layers, making it both powerful and easy to use:

* **High-level API (`z3dsl.reasoning`)**: Provides the simple `ProofOfThought` and `EvaluationPipeline` classes for end-users, abstracting away the complex underlying logic.
* **Low-level DSL (`z3dsl`)**: Consists of the interpreter, verifier, and program generator that handle the core workflow of converting questions to JSON and executing them with Z3.

The system also includes an `EvaluationPipeline` to benchmark performance on datasets like StrategyQA, automatically calculating metrics such as accuracy, precision, and recall.
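For a boolean benchmark like StrategyQA, those metrics reduce to simple counts over predicted and gold answers. A minimal sketch (not the pipeline's actual code):

```python
def boolean_metrics(preds, golds):
    """Accuracy, precision, and recall for parallel True/False answer lists."""
    tp = sum(p and g for p, g in zip(preds, golds))      # true positives
    fp = sum(p and not g for p, g in zip(preds, golds))  # false positives
    fn = sum(not p and g for p, g in zip(preds, golds))  # false negatives
    accuracy = sum(p == g for p, g in zip(preds, golds)) / len(golds)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall
```

One wrinkle specific to this system: `UNKNOWN` results from Z3 are neither `True` nor `False`, so a real pipeline must decide whether to count them as abstentions or errors before computing these numbers.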
Curious: in a large system, how can we make sure the LLM doesn't introduce biases or fallacies into this logical decomposition? For smaller examples we can indeed read and check all the decompositions.
This question is more for the repo owner than the PR author.
I've added a section in an attempt to answer your question thoughtfully.