droidsaw-common is the workspace's shared algorithm crate. It holds the format-agnostic middle of the decompiler pipeline — dominators, SSA construction (Braun et al. CC 2013), region structuring — plus the finding model and byte-encoding primitives every bundle parser depends on. No I/O, no bytecode knowledge. Pure Rust, BSD-3-Clause.
Common hosts algorithms; bundles host instructions. DEX and Hermes consume the same dominator pass, the same SSA builder, and the same region structurer. A correctness gauge here applies to both.
Algorithms are generic over what the format crate owns:
- Graph algorithms take
impl Graph(src/graph/mod.rs:79). The trait is four methods —entry,nodes,successors,predecessors— and nothing else. No opcode, no instruction type, no operand semantics. - SSA construction takes
impl SsaCfgplus a register typeR(src/ssa/mod.rs). The caller drivesread_variable/write_variableat each operand use — common does not interpret instructions. - Region structuring takes
impl StmtBackend(src/region/mod.rs). The backend providesextract_condition,emit_block_ops,build_if,build_while,build_try_catch, etc. Common owns the tree shape and the descent order; the backend owns every per-language string.
What never lives here:
- Format parsers (DEX, HBC, AXML, ZIP, MUTF-8 string tables).
- Opcode tables and per-format encyclopedia data.
- I/O (one scoped exception —
diagwrites panic reproducer bundles from apanic::set_hookhandler).
| Module | Purpose |
|---|---|
graph |
RPO, Cooper-Harvey-Kennedy dominators, post-dominators with synthetic exit, natural loops, reducibility, irreducible-SCC detection |
ssa |
Two-phase iterative Braun on-the-fly SSA name resolution (Builder::read_variable / write_variable / seal_phis) |
region |
Region IR (RegionNode<BId, C>), structurer (build_region_tree), lowerer (lower_region), depth cap MAX_REGION_DEPTH = 512 |
signature |
Recognizer engine for inversion-driven decompilation (per-opcode fingerprints live in bundle crates) |
encoding |
Little-endian readers, ULEB128 / SLEB128, MUTF-8, base64, UTF-8 safe truncation. Every reader is Result<_, EncodingError> and bounds-checks before slice access |
entropy |
Shannon entropy + windowed entropy_profile (Encrypted / Compressed / PackedPayload / Plaintext) |
strings |
ASCII run extraction and entropy classification (Text < 3.5 < Structured < 4.2 < Mixed < 4.8 < HighEntropy < 5.5 ≤ RandomLooking) |
finding |
Finding, Severity (Critical < High < Medium < Low < Info), Layer, GaugeClass (Cryptographic / Semantic / Representational) |
analysis |
TaintSource, TaintSink, TaintFinding — the cross-layer taint vocabulary |
provenance |
Provenance, SdkId, AssetFormat — types-only carriers for cross-detector suppression |
crypto |
KeyMaterial, CryptoExtractor trait — shared shape for batch-GCD / ROCA across layers |
budget |
ParseBudget (memory + steps + wall-clock), BudgetExhausted |
guard |
bound_count — slice-length cross-check before Vec::with_capacity(n) at every parse-time count read |
diag |
Panic-hook reproducer bundles, per-thread stage dumps (DROIDSAW_CRASH_DIR, DROIDSAW_DUMP) |
threat_model |
Acquisition metadata (RFC-3339 timestamps via chrono) |
telemetry |
SilentSkipCounters — observability for silent-skip code paths |
Three production algorithms in this crate run with a textbook differential oracle behind a fuzz target. The oracle is the sole detector for the silent-wrong-answer-that-does-not-crash bug class — panic-fuzz and ASAN cannot observe semantically wrong output.
| Production | Oracle | Reference | Fuzz target |
|---|---|---|---|
Cooper-Harvey-Kennedy dominators (src/graph/dominators.rs) |
O(N³) reachability-based (src/graph/naive_oracle.rs) |
SAS 2001 | fuzz/fuzz_targets/dominators_differential.rs |
Braun on-the-fly SSA (src/ssa/mod.rs) |
CFRWZ post-pass via dominance frontiers (src/ssa/naive_oracle.rs) |
TOPLAS 1991 | fuzz/fuzz_targets/ssa_differential.rs |
build_region_tree recursive descent (src/region/mod.rs) |
Cifuentes reducible-CFG structurer (src/region/naive_oracle.rs) |
Cifuentes 1994, ch. 6 | fuzz/fuzz_targets/region_differential.rs |
The pattern is the same in each case: production and oracle share no code, use different algorithm families, and the fuzz target asserts equivalence on every libFuzzer-generated input. Precedent and lineage are documented in the module-level doc-comments — the dominator oracle was added after a real silent-wrong-dominators incident that escaped clippy, ASAN, and panic-fuzz (src/graph/naive_oracle.rs:18-21).
A fourth fuzz target, fuzz_entropy_profile, exercises entropy_profile for panic-freedom on arbitrary byte input.
proptest cases co-located with the algorithms they cover:
tests/graph_proptest.rs— 7 proptests onGraph: RPO length equals reachable count; RPO is topological on non-back edges; entry dominates every reachable node; dominance is reflexive, antisymmetric, transitive; dominators are deterministic.tests/proptest_invariants.rs—Graph::successors/predecessorssymmetry; SSA phi-arg count matches predecessor count; SSA phi-arg vars are defined at the corresponding predecessor.tests/region_invariants.rs—Sequencenever directly contains anotherSequence; loop headers stay inside their loop body; block partition covers the scope (no loss, no duplication except self-loop headers).tests/base64_proptest.rs—base64_decoderejectslen % 4 == 1and accepts every other length class.tests/region_depth_cap.rs,tests/region_builder.rs—MAX_REGION_DEPTH = 512enforcement on adversarial nesting.tests/finding_global_cap.rs,tests/keymaterial_modulus_cap.rs— finding-count and RSA-modulus caps.
Verify locally:
cargo test -p droidsaw-commonTier-1 proof bodies live in proofs/, gated #[cfg(kani)] and invisible to normal builds. 17 harnesses across 7 files:
| File | Harnesses | Property |
|---|---|---|
leb128_roundtrip.rs |
4 | write_uleb128 ∘ read_uleb128 ≡ id for every u32; symmetric for i32; encoded length bounded |
mutf8_codepoint.rs |
4 | decode_one_codepoint roundtrips through Rust stdlib's UTF-8 / UTF-16 encoder (independent oracle — different author, different bit-extraction strategy) |
agaps.rs |
5 | Negative proofs: adversarial gaps in the current verification suite (cumulative amplification across bound_count calls, etc.) |
base64_capacity.rs |
1 | base64_decode capacity hint never overflows and is bounded by input length |
base64_decode_quartet.rs |
1 | Base64 quartet decode totality |
is_string_byte_ascii_subset.rs |
1 | ASCII subset predicate correctness |
read_lp.rs |
1 | Length-prefixed string read never reads past end |
Harnesses target spec invariants the lint floor (clippy::unwrap_used + clippy::indexing_slicing + clippy::arithmetic_side_effects) cannot already prove.
16 theorems in the droidsaw-lean workspace crate cover the structural facts about graph algorithms that the type system cannot enforce:
- Dominators —
Dominates.refl,Dominates.entry,Dominates.trans,Dominates.antisymm,Dominates.total,immediate_dom_unique. - Post-dominators —
PostDominates.refl,PostDominates.antisymm,immediate_postdom_unique; soundness of replacing the oldu32::MAXsentinel withPostDom::Node(N) | PostDom::Exit. - Paths — 6 path lemmas about vertex membership and path composition used by the dominator proofs.
- Lattice —
propagation_monotonic, the Kleene-ascending-chain result that underwrites dataflow fixed-point iteration.
The crate root re-exports the load-bearing identities (src/lib.rs:77-101):
pub use graph::{
Graph, NaturalLoop, PostDom, back_edges, dominates, dominators,
dominators_with_rpo, is_reducible, natural_loops, post_dominators,
post_dominators_with_virtual_exit, reverse_post_order, /* … */
};
pub use ssa::{ /* via module: Builder, SsaCfg, … */ };
pub use region::{ /* via module: RegionNode, StmtBackend, RegionError, … */ };
pub use encoding::{
EncodingError, base64_decode, decode_mutf8, decode_one_codepoint,
read_sleb128, read_uleb128, truncate_str, u16_le, u32_le, u64_le,
};
pub use finding::{
Confidence, Finding, FindingProvenance, GaugeClass, Layer, Severity, Source, /* … */
};Every public type implements serde::Serialize + serde::Deserialize so it can cross the SQLite / NDJSON export boundary.
The crate root pins the lint discipline for the entire workspace's algorithm core (src/lib.rs:3-32):
#![cfg_attr(not(test), deny(
clippy::unwrap_used,
clippy::expect_used,
clippy::panic,
clippy::unreachable,
clippy::todo,
clippy::arithmetic_side_effects,
clippy::indexing_slicing,
clippy::string_slice,
clippy::cast_lossless,
clippy::cast_possible_truncation,
clippy::cast_sign_loss,
clippy::cast_precision_loss,
clippy::cast_possible_wrap,
/* … */
))]
#![deny(missing_docs)]
#![deny(unsafe_code)]Three modules carry a scoped #![allow(clippy::arithmetic_side_effects, reason = "…")] with a discharged proof obligation in the reason string: encoding (LEB128 / base64 / MUTF-8 / ASCII case-map), region (DFS cursors, label counter, depth cap), diag (Hinnant date math, dump-stage counters). Every other site uses checked_* / ? discipline.
BSD-3-Clause.