Computational Biology-ID: Information‑Field Greedy Algorithms

This repository contains the official implementation of the algorithms described in:

Validation of the Real-Imaginary Duality Principle in Core Challenges of Computational Biology: From Sequencing by Hybridization to RNA Inverse Folding Hongkui Liu, Kai Huang (2026)

The work introduces a unified Real‑Imaginary Duality Principle that solves two NP‑hard problems – Sequencing by Hybridization (SBH) and RNA inverse folding – exactly and in linear time, without training data or parameter tuning.

🔬 Core Algorithm Entry Point

The sole entry point for the exact, linear‑time SBH assembler is:

SBH/sbh_greedy_assembler.py

This script implements the information‑field greedy algorithm as described in Section 2.1 of the paper:

Virtual space: De Bruijn graph topology (legal overlaps)
Real space: observed k‑mer multiplicities ("fuel")
Dynamics: at each step, choose the legitimate successor with the highest remaining fuel
Automatically discards isolated error k‑mers

All other SBH scripts in the SBH/ directory are supplementary validation or extended variants (lookahead, full GL dynamics, error‑position scans, complexity benchmarks). They are not required to reproduce the core linear‑time results.

📁 Repository Structure

ComputationalBiology-ID/
├── SBH/                                   # DNA assembly (SBH) algorithms
│   ├── sbh_greedy_assembler.py            # ★ CORE ALGORITHM ★ (linear-time greedy)
│   ├── sbh_lookahead_greedy.py            # Lookahead variant (diffusion term)
│   ├── sbh_full_gl_dynamics.py            # Full GL dynamics with backtracking
│   ├── sbh_error_scan.py                  # Error rate vs. coverage scan (Table 1)
│   ├── sbh_error_position_scan.py         # 0.1% error-position determinism (seeds)
│   ├── complexity_race.py                 # Linear vs. exponential runtime (Fig. 1)
│   └── ... (logs, test outputs)
├── RNA_inverse_folding/                   # RNA secondary structure design
│   ├── rna_inverse_folding.py             # Free-energy gradient flow solver
│   └── Eterna100_Solved_Log.txt           # 96/99 puzzles solved (97.0%)
└── README.md                              # Updated documentation

🧬 SBH: Linear‑Time DNA Assembly

Run the core algorithm

cd SBH
python sbh_greedy_assembler.py

Expected output (perfect spectrum, 5000 bp, k=11)

Assembled 5000 bp in 4.8 seconds
Coverage: 100.00%
Reconstruction matches reference exactly.

Reproduce main results from the paper

Experiment	Script	Paper reference
Error‑position determinism (0.1% error)	`sbh_error_position_scan.py`	Section 2.2, Table 1
Graceful degradation (2% error → 83.5% coverage)	`sbh_lookahead_greedy.py`	Section 2.2, Table 1
Complexity comparison (greedy vs. backtracking)	`complexity_race.py`	Section 2.3, Fig. 1
Full GL dynamics (backtracking + 2‑step lookahead)	`sbh_full_gl_dynamics.py`	Supplementary

All scripts automatically simulate a random 5,000 bp genome, inject errors (if applicable), run the assembler, and report coverage, reconstructed length, and longest match.

🧬 RNA Inverse Folding: Eterna100 Benchmark

cd RNA_inverse_folding
python rna_inverse_folding.py

The solver achieves 97.0% success rate (96/99 puzzles) in 10 seconds total (25.5 ms per puzzle) – surpassing all known fully automated algorithms.

No pre‑training, no deep learning, no sequence databases.
Uses only a simplified Turner‑like energy model and a free‑energy gradient flow.
Unsolved puzzles (Still Life, The Turtle, Snowflake Necklace) are likely undesignable (see Section 2.6 and [5]).

See Eterna100_Solved_Log.txt for the complete list of solved puzzles.

License

This project is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License (CC BY-NC-SA 4.0).

You are free to share and adapt the material under the following terms:

Attribution – You must give appropriate credit, provide a link to the license, and indicate if changes were made.
NonCommercial – You may not use the material for commercial purposes.
ShareAlike – If you remix, transform, or build upon the material, you must distribute your contributions under the same license.

For commercial use, please contact the authors.

🧪 Reproducibility

Random seeds are fixed where needed (e.g., sbh_error_position_scan.py uses seeds 0‑9).
All experiments run on a single CPU core; no GPU required.
Python dependencies: torch (only for supplementary scripts), matplotlib, numpy, difflib (standard library).

Install required packages:

pip install torch matplotlib numpy   # optional for supplementary scripts

For core SBH and RNA folding, only the Python standard library is needed.

Name		Name	Last commit message	Last commit date
Latest commit History 21 Commits
RNA_inverse_folding		RNA_inverse_folding
SBH		SBH
applications		applications
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Computational Biology-ID: Information‑Field Greedy Algorithms

🔬 Core Algorithm Entry Point

📁 Repository Structure

🧬 SBH: Linear‑Time DNA Assembly

Run the core algorithm

Expected output (perfect spectrum, 5000 bp, k=11)

Reproduce main results from the paper

🧬 RNA Inverse Folding: Eterna100 Benchmark

License

🧪 Reproducibility

About

Uh oh!

Releases 2

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Computational Biology-ID: Information‑Field Greedy Algorithms

🔬 Core Algorithm Entry Point

📁 Repository Structure

🧬 SBH: Linear‑Time DNA Assembly

Run the core algorithm

Expected output (perfect spectrum, 5000 bp, k=11)

Reproduce main results from the paper

🧬 RNA Inverse Folding: Eterna100 Benchmark

License

🧪 Reproducibility

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases 2

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Expected output (perfect spectrum, 5000 bp, k=11)

Packages