Skip to content

droidsaw/droidsaw-common

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

11 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

droidsaw-common

droidsaw-common is the workspace's shared algorithm crate. It holds the format-agnostic middle of the decompiler pipeline — dominators, SSA construction (Braun et al. CC 2013), region structuring — plus the finding model and byte-encoding primitives every bundle parser depends on. No I/O, no bytecode knowledge. Pure Rust, BSD-3-Clause.

Common hosts algorithms; bundles host instructions. DEX and Hermes consume the same dominator pass, the same SSA builder, and the same region structurer. A correctness gauge here applies to both.

The trait boundary

Algorithms are generic over what the format crate owns:

  • Graph algorithms take impl Graph (src/graph/mod.rs:79). The trait is four methods — entry, nodes, successors, predecessors — and nothing else. No opcode, no instruction type, no operand semantics.
  • SSA construction takes impl SsaCfg plus a register type R (src/ssa/mod.rs). The caller drives read_variable / write_variable at each operand use — common does not interpret instructions.
  • Region structuring takes impl StmtBackend (src/region/mod.rs). The backend provides extract_condition, emit_block_ops, build_if, build_while, build_try_catch, etc. Common owns the tree shape and the descent order; the backend owns every per-language string.

What never lives here:

  • Format parsers (DEX, HBC, AXML, ZIP, MUTF-8 string tables).
  • Opcode tables and per-format encyclopedia data.
  • I/O (one scoped exception — diag writes panic reproducer bundles from a panic::set_hook handler).

Modules

Module Purpose
graph RPO, Cooper-Harvey-Kennedy dominators, post-dominators with synthetic exit, natural loops, reducibility, irreducible-SCC detection
ssa Two-phase iterative Braun on-the-fly SSA name resolution (Builder::read_variable / write_variable / seal_phis)
region Region IR (RegionNode<BId, C>), structurer (build_region_tree), lowerer (lower_region), depth cap MAX_REGION_DEPTH = 512
signature Recognizer engine for inversion-driven decompilation (per-opcode fingerprints live in bundle crates)
encoding Little-endian readers, ULEB128 / SLEB128, MUTF-8, base64, UTF-8 safe truncation. Every reader is Result<_, EncodingError> and bounds-checks before slice access
entropy Shannon entropy + windowed entropy_profile (Encrypted / Compressed / PackedPayload / Plaintext)
strings ASCII run extraction and entropy classification (Text < 3.5 < Structured < 4.2 < Mixed < 4.8 < HighEntropy < 5.5 ≤ RandomLooking)
finding Finding, Severity (Critical < High < Medium < Low < Info), Layer, GaugeClass (Cryptographic / Semantic / Representational)
analysis TaintSource, TaintSink, TaintFinding — the cross-layer taint vocabulary
provenance Provenance, SdkId, AssetFormat — types-only carriers for cross-detector suppression
crypto KeyMaterial, CryptoExtractor trait — shared shape for batch-GCD / ROCA across layers
budget ParseBudget (memory + steps + wall-clock), BudgetExhausted
guard bound_count — slice-length cross-check before Vec::with_capacity(n) at every parse-time count read
diag Panic-hook reproducer bundles, per-thread stage dumps (DROIDSAW_CRASH_DIR, DROIDSAW_DUMP)
threat_model Acquisition metadata (RFC-3339 timestamps via chrono)
telemetry SilentSkipCounters — observability for silent-skip code paths

Layered-oracle differentials

Three production algorithms in this crate run with a textbook differential oracle behind a fuzz target. The oracle is the sole detector for the silent-wrong-answer-that-does-not-crash bug class — panic-fuzz and ASAN cannot observe semantically wrong output.

Production Oracle Reference Fuzz target
Cooper-Harvey-Kennedy dominators (src/graph/dominators.rs) O(N³) reachability-based (src/graph/naive_oracle.rs) SAS 2001 fuzz/fuzz_targets/dominators_differential.rs
Braun on-the-fly SSA (src/ssa/mod.rs) CFRWZ post-pass via dominance frontiers (src/ssa/naive_oracle.rs) TOPLAS 1991 fuzz/fuzz_targets/ssa_differential.rs
build_region_tree recursive descent (src/region/mod.rs) Cifuentes reducible-CFG structurer (src/region/naive_oracle.rs) Cifuentes 1994, ch. 6 fuzz/fuzz_targets/region_differential.rs

The pattern is the same in each case: production and oracle share no code, use different algorithm families, and the fuzz target asserts equivalence on every libFuzzer-generated input. Precedent and lineage are documented in the module-level doc-comments — the dominator oracle was added after a real silent-wrong-dominators incident that escaped clippy, ASAN, and panic-fuzz (src/graph/naive_oracle.rs:18-21).

A fourth fuzz target, fuzz_entropy_profile, exercises entropy_profile for panic-freedom on arbitrary byte input.

Property tests

proptest cases co-located with the algorithms they cover:

  • tests/graph_proptest.rs — 7 proptests on Graph: RPO length equals reachable count; RPO is topological on non-back edges; entry dominates every reachable node; dominance is reflexive, antisymmetric, transitive; dominators are deterministic.
  • tests/proptest_invariants.rsGraph::successors / predecessors symmetry; SSA phi-arg count matches predecessor count; SSA phi-arg vars are defined at the corresponding predecessor.
  • tests/region_invariants.rsSequence never directly contains another Sequence; loop headers stay inside their loop body; block partition covers the scope (no loss, no duplication except self-loop headers).
  • tests/base64_proptest.rsbase64_decode rejects len % 4 == 1 and accepts every other length class.
  • tests/region_depth_cap.rs, tests/region_builder.rsMAX_REGION_DEPTH = 512 enforcement on adversarial nesting.
  • tests/finding_global_cap.rs, tests/keymaterial_modulus_cap.rs — finding-count and RSA-modulus caps.

Verify locally:

cargo test -p droidsaw-common

Kani proofs

Tier-1 proof bodies live in proofs/, gated #[cfg(kani)] and invisible to normal builds. 17 harnesses across 7 files:

File Harnesses Property
leb128_roundtrip.rs 4 write_uleb128 ∘ read_uleb128 ≡ id for every u32; symmetric for i32; encoded length bounded
mutf8_codepoint.rs 4 decode_one_codepoint roundtrips through Rust stdlib's UTF-8 / UTF-16 encoder (independent oracle — different author, different bit-extraction strategy)
agaps.rs 5 Negative proofs: adversarial gaps in the current verification suite (cumulative amplification across bound_count calls, etc.)
base64_capacity.rs 1 base64_decode capacity hint never overflows and is bounded by input length
base64_decode_quartet.rs 1 Base64 quartet decode totality
is_string_byte_ascii_subset.rs 1 ASCII subset predicate correctness
read_lp.rs 1 Length-prefixed string read never reads past end

Harnesses target spec invariants the lint floor (clippy::unwrap_used + clippy::indexing_slicing + clippy::arithmetic_side_effects) cannot already prove.

Lean proofs

16 theorems in the droidsaw-lean workspace crate cover the structural facts about graph algorithms that the type system cannot enforce:

  • DominatorsDominates.refl, Dominates.entry, Dominates.trans, Dominates.antisymm, Dominates.total, immediate_dom_unique.
  • Post-dominatorsPostDominates.refl, PostDominates.antisymm, immediate_postdom_unique; soundness of replacing the old u32::MAX sentinel with PostDom::Node(N) | PostDom::Exit.
  • Paths — 6 path lemmas about vertex membership and path composition used by the dominator proofs.
  • Latticepropagation_monotonic, the Kleene-ascending-chain result that underwrites dataflow fixed-point iteration.

Public surface

The crate root re-exports the load-bearing identities (src/lib.rs:77-101):

pub use graph::{
    Graph, NaturalLoop, PostDom, back_edges, dominates, dominators,
    dominators_with_rpo, is_reducible, natural_loops, post_dominators,
    post_dominators_with_virtual_exit, reverse_post_order, /* … */
};
pub use ssa::{ /* via module: Builder, SsaCfg, … */ };
pub use region::{ /* via module: RegionNode, StmtBackend, RegionError, … */ };
pub use encoding::{
    EncodingError, base64_decode, decode_mutf8, decode_one_codepoint,
    read_sleb128, read_uleb128, truncate_str, u16_le, u32_le, u64_le,
};
pub use finding::{
    Confidence, Finding, FindingProvenance, GaugeClass, Layer, Severity, Source, /* … */
};

Every public type implements serde::Serialize + serde::Deserialize so it can cross the SQLite / NDJSON export boundary.

Compile-time floor

The crate root pins the lint discipline for the entire workspace's algorithm core (src/lib.rs:3-32):

#![cfg_attr(not(test), deny(
    clippy::unwrap_used,
    clippy::expect_used,
    clippy::panic,
    clippy::unreachable,
    clippy::todo,
    clippy::arithmetic_side_effects,
    clippy::indexing_slicing,
    clippy::string_slice,
    clippy::cast_lossless,
    clippy::cast_possible_truncation,
    clippy::cast_sign_loss,
    clippy::cast_precision_loss,
    clippy::cast_possible_wrap,
    /* … */
))]
#![deny(missing_docs)]
#![deny(unsafe_code)]

Three modules carry a scoped #![allow(clippy::arithmetic_side_effects, reason = "…")] with a discharged proof obligation in the reason string: encoding (LEB128 / base64 / MUTF-8 / ASCII case-map), region (DFS cursors, label counter, depth cap), diag (Hinnant date math, dump-stage counters). Every other site uses checked_* / ? discipline.

License

BSD-3-Clause.

About

Format-agnostic algorithm core. Dominators (CHK), SSA (Braun), region structuring, with layered-oracle differential fuzz on all three. Plus encoding, entropy, and the Finding model. Pure Rust.

Topics

Resources

License

Stars

Watchers

Forks

Packages

 
 
 

Contributors

Languages