pedrovergueiro/dNaty

dNATY


Evolutionary AI Model Compression

Up to 86% fewer FLOPs · accuracy kept · 18 benchmark datasets · no GPU required

Python 3.10+ · PyTorch 2.0+ · License: BSL-1.1

Compress a PyTorch model with one function call. dNATY uses multi-objective evolutionary search (NSGA-II) guided by episodic memory to find smaller, faster architectures automatically, on a standard CPU.

```shell
pip install dnaty
```

Website · Docs · Benchmarks · Changelog


Quickstart

```python
import torch.nn as nn
from dnaty import compress
from dnaty.experiments.fast_dataset import FastDataset

# 1. Your model: any nn.Module with Linear layers
model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(784, 512), nn.ReLU(),
    nn.Linear(512, 256), nn.ReLU(),
    nn.Linear(256, 128), nn.ReLU(),
    nn.Linear(128, 10),
)

# 2. Load dataset (cached in RAM: zero I/O across generations)
ds = FastDataset("MNIST", device="cpu", train_subset=10_000)

# 3. Compress
result = compress(model, ds, target_flops=0.5, n_generations=30)

print(result.summary())
# CompressResult | arch=[301, 153, 128] | FLOPs -46.5% (1,133,056 → 605,802)
#   | params -46.5% (567K → 303K) | acc=0.9859
```

The compressed model is a regular nn.Module — drop it into your existing pipeline:

```python
import dnaty  # needed for dnaty.load() below

result.model                  # nn.Module, ready for inference
result.accuracy               # 0.9859
result.flops_reduction_pct    # 46.5
result.arch                   # [301, 153, 128]  ← hidden layer sizes found

# Save / reload
result.save("compressed.pt")
result = dnaty.load("compressed.pt")

# Export to ONNX for edge deployment (no PyTorch needed on the device)
result.export_onnx("model.onnx", input_shape=(784,))

# Measure real CPU latency on your machine
print(result.benchmark_latency((784,)))   # p50/p95/p99 ms + fps
```
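result.benchmark_latency reports p50/p95/p99 latency in milliseconds plus throughput. The sketch below shows one common way to compute such figures for any Python callable: warm up, time each call with time.perf_counter, then take percentiles of the samples. It illustrates that assumed approach and is not dNATY's implementation; latency_percentiles is a hypothetical name, and fps here is derived from the mean latency of single calls.

```python
import statistics
import time
from typing import Callable, Dict


def latency_percentiles(fn: Callable[[], object], n_warmup: int = 10, n_runs: int = 200) -> Dict[str, float]:
    """Time fn() repeatedly; return p50/p95/p99 latency in ms and throughput (hypothetical helper)."""
    # Warm-up calls let caches and lazy initialisation settle before timing starts.
    for _ in range(n_warmup):
        fn()

    samples_ms = []
    for _ in range(n_runs):
        start = time.perf_counter()
        fn()
        samples_ms.append((time.perf_counter() - start) * 1000.0)

    # With n=100, statistics.quantiles returns the 99 cut points p1..p99.
    cuts = statistics.quantiles(samples_ms, n=100, method="inclusive")
    mean_ms = statistics.fmean(samples_ms)
    return {
        "p50_ms": cuts[49],
        "p95_ms": cuts[94],
        "p99_ms": cuts[98],
        "fps": 1000.0 / max(mean_ms, 1e-9),  # guard against a zero mean on coarse clocks
    }


# Usage: time a stand-in workload; in practice fn would run one forward pass.
stats = latency_percentiles(lambda: sum(range(10_000)), n_runs=100)
print({k: round(v, 3) for k, v in stats.items()})
```

Tail percentiles (p95/p99) matter more than the mean for edge deployments, because occasional slow calls are what break real-time budgets.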

Why dNATY?

The problem: most models ship larger than they need to be. That means slower inference, higher cloud bills, and models too heavy for edge devices (cameras, drones, robots, industrial boxes). Shrinking them by hand takes days of trial and error, with no guarantee that you found the best size/accuracy trade-off.

What you get with dNATY:

  • Smaller, cheaper models — 23–86% fewer FLOPs across 18 benchmark datasets, accuracy kept
  • No GPU — the search runs on CPU in minutes, so it works in CI and on the hardware you already have
  • No manual architecture design — point it at a model + dataset, get a deployable nn.Module back
  • One function call: compress(model, dataset), then export to .pt / .onnx

"Why not just TensorRT or TFLite?" — wrong layer.

Runtimes optimize execution of a fixed architecture. dNATY optimizes the architecture itself, upstream of any runtime. Rather than choosing between them, you chain them: compress() → export_onnx() → load into TensorRT / TFLite / ONNX Runtime. The savings stack.

Versus other compression techniques

| Method | What it does | Catch |
|---|---|---|
| Quantization | Lower-precision weights (fp32→int8) | Same architecture & op count. Stack it on top of dNATY. |
| Pruning | Zeroes individual weights | Needs sparse runtimes to actually run faster; manual tuning |
| Distillation | Trains a small student model | You design the student + write the training loop |
| DARTS | Gradient-based architecture search | Needs a GPU + hours of config |
| Random NAS | Random architecture sampling | No memory: re-tries bad ideas |
| dNATY | Evolves a smaller architecture, memory-guided | CPU-only, one call |

The engine is episodic memory-guided evolutionary search: operators that helped in past generations get sampled more often, so it converges 1.6× faster than random NAS — no gradients, no GPU.
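One way to picture "operators that helped get sampled more often" is a credit-weighted sampler. The sketch below is an illustration under stated assumptions, not dNATY's episodic memory: the class name OperatorMemory, the operator names, and the exponential-moving-average credit rule are all hypothetical. Each operator's score tracks how often it recently improved a child over its parent, and a floor keeps every operator in play so the search can still explore.

```python
import random
from collections import Counter
from typing import Dict, List


class OperatorMemory:
    """Credit-weighted operator sampler (hypothetical sketch, not dNATY's episodic memory)."""

    def __init__(self, operators: List[str], decay: float = 0.9, floor: float = 0.1, seed: int = 0):
        self.operators = list(operators)
        self.decay = decay  # closer to 1.0 means older outcomes are remembered longer
        self.floor = floor  # minimum sampling weight, so no operator is ever ruled out
        self.score: Dict[str, float] = {op: 1.0 for op in self.operators}  # optimistic start
        self.rng = random.Random(seed)  # seeded for reproducible runs

    def sample(self) -> str:
        """Pick an operator with probability proportional to score + floor."""
        weights = [self.score[op] + self.floor for op in self.operators]
        return self.rng.choices(self.operators, weights=weights, k=1)[0]

    def record(self, op: str, improved: bool) -> None:
        """Update the operator's credit as an exponential moving average of successes."""
        reward = 1.0 if improved else 0.0
        self.score[op] = self.decay * self.score[op] + (1.0 - self.decay) * reward


# Toy run: pretend only "shrink_layer" ever produces a better child.
memory = OperatorMemory(["shrink_layer", "grow_layer", "drop_layer"], seed=42)
counts: Counter = Counter()
for _ in range(300):
    op = memory.sample()
    counts[op] += 1
    memory.record(op, improved=(op == "shrink_layer"))
print(counts.most_common())
```

Compared with uniform random sampling, the useful operator ends up drawn far more often, which is the intuition behind memory-guided search re-trying bad ideas less than random NAS.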


Measured results

All numbers measured on a standard desktop CPU, validation accuracy on a held-out 20% split, reproducible from scripts in this repo. Full tables, configs, and caveats: dnaty.org/benchmarks.

13 public datasets (n_generations=30, n_pop=15) — top rows:

| Dataset | Samples | FLOPs ↓ | Val acc | Domain |
|---|---:|---:|---:|---|
| Electrical Fault Detect | 12,001 | −86.0% | 99.04% | smart grid sensors |
| Dry Bean Quality | 13,611 | −83.4% | 92.43% | agricultural IoT |
| Predictive Maint. (AI4I) | 10,000 | −83.1% | 96.70% | factory IoT |
| Breast Cancer (UCI) | 569 | −72.6% | 100.0% | clinical tabular |
| Credit Card Fraud (full) | 284,807 | −64.0% | 99.96% | financial anomaly |
| Network Intrusion (NSL-KDD) | 31,490 | −56.3% | 99.46% | edge security |
| HAR Sensors (UCI) | 10,299 | −46.8% | 99.17% | wearables · robotics |
| MNIST (full 70K) | 70,000 | −41.8% | 98.68% | vision · digits |

5 market-grade synthetic domains (n_generations=15, n_pop=12, deterministic feature-correlated labels):

| Dataset | Samples | FLOPs ↓ | Val acc |
|---|---:|---:|---:|
| Telecom Churn Prediction | 35,000 | −64.6% | 99.94% |
| IoT Sensor Anomaly Detection | 50,000 | −61.4% | 99.18% |
| Financial Fraud Detection | 100,000 | −60.3% | 99.31% |
| E-commerce Purchase Propensity | 80,000 | −49.9% | 98.03% |
| Healthcare Risk Stratification | 25,000 | −23.3% | 93.74% |

Compression scales with how oversized the model is — dNATY finds the right size, it doesn't force a fixed cut. Lean models get small cuts (that's correct Pareto behavior, not a bug).
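NSGA-II keeps the candidates that no other candidate beats on every objective at once (the non-dominated, or Pareto, front). If shrinking a lean model further always costs accuracy, the larger variant stays on the front, which is the sense in which a small cut is correct Pareto behavior. Below is a minimal sketch of the first-front filter on (FLOPs, validation error) with made-up candidates; full NSGA-II also ranks later fronts and uses crowding distance, both omitted here, and none of this is dNATY's evolver code.

```python
from typing import List, Tuple

# (name, flops, val_error): both objectives are minimised.
Candidate = Tuple[str, float, float]


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if a is no worse than b on both objectives and strictly better on at least one."""
    _, flops_a, err_a = a
    _, flops_b, err_b = b
    no_worse = flops_a <= flops_b and err_a <= err_b
    strictly_better = flops_a < flops_b or err_a < err_b
    return no_worse and strictly_better


def pareto_front(cands: List[Candidate]) -> List[Candidate]:
    """Keep candidates that no other candidate dominates (NSGA-II's first front)."""
    return [c for c in cands if not any(dominates(other, c) for other in cands)]


# Made-up candidates for illustration only.
cands = [
    ("a", 1000.0, 0.012),   # large and accurate
    ("b", 500.0, 0.014),    # half the FLOPs, slightly worse
    ("c", 100.0, 0.060),    # tiny but clearly less accurate
    ("d", 1200.0, 0.013),   # more FLOPs and worse error than "a": dominated
]
front = pareto_front(cands)
print([name for name, _, _ in front])  # ['a', 'b', 'c']
```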

Continual learning (Split-MNIST, 5 tasks, 3 seeds)

| Method | Backward Transfer (BWT) | Note |
|---|---:|---|
| dNATY (balanced replay) | −0.145 | 6.9× less forgetting |
| EWC | −0.999 | near-total forgetting |
| MLP (no CL) | −0.998 | baseline |
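Backward transfer is commonly defined in the continual-learning literature as BWT = 1/(T−1) · Σ_{i=1}^{T−1} (R[T,i] − R[i,i]), where R[j,i] is the accuracy on task i after training on task j. Assuming the table above follows that definition, the sketch below computes it from an accuracy matrix. The function name and the 3-task matrix are hypothetical, chosen only to show the arithmetic.

```python
from typing import Sequence


def backward_transfer(acc: Sequence[Sequence[float]]) -> float:
    """Average change in accuracy on earlier tasks after training on the final task.

    acc[j][i] is the accuracy on task i measured right after training on task j
    (a T x T matrix, 0-indexed). Negative values mean later tasks overwrote
    what was learned earlier; values near -1 mean almost total forgetting.
    """
    T = len(acc)
    if T < 2:
        raise ValueError("BWT needs at least two tasks")
    final_row = acc[T - 1]
    return sum(final_row[i] - acc[i][i] for i in range(T - 1)) / (T - 1)


# Hypothetical 3-task accuracy matrix (entries above the diagonal are unused).
acc = [
    [0.99, 0.00, 0.00],
    [0.90, 0.98, 0.00],
    [0.85, 0.88, 0.97],
]
bwt = backward_transfer(acc)
print(round(bwt, 3))  # -0.12
```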

CPU latency comparison

Reproduce: python scripts/prove_it.py (NAS vs random) · python scripts/benchmark_market_real.py (market datasets)


API at a glance

| You want to… | Use |
|---|---|
| Compress a tabular/sensor MLP | compress(model, data, target_flops=0.5) |
| Compress a small CNN trained from scratch | compress_cnn(model, loader) (early access, CIFAR-scale classification) |
| Compress the head of a pretrained backbone | compress_with_backbone(resnet, loader, finetune_backbone=True) |
| Thin out conv layers too | prune_conv_channels(model, amount=0.3) |
| Deploy without PyTorch on the device | result.export_onnx("m.onnx", input_shape=...) |
| Save / reload | result.save("m.pt") / dnaty.load("m.pt") |
| Detect data drift in production | DriftDetector().fit(X_train) + ProductionTracker(model, detector) |
| Profile compute before deciding | count_flops(model, input_shape) / flops_by_layer(...) |
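To see where the FLOPs figures in the Quickstart output can come from, the sketch below counts 2 FLOPs (one multiply, one add) per weight of each Linear layer, ignoring biases and activations. Under that assumed convention, a 784→512→256→128→10 MLP comes to 1,133,056 FLOPs and the arch=[301, 153, 128] found by the search comes to 605,802, matching the summary line. dense_flops and dense_params are hypothetical helpers, not dnaty.count_flops, whose exact counting rules may differ.

```python
from typing import Sequence


def dense_flops(layer_sizes: Sequence[int]) -> int:
    """FLOPs of a plain MLP at 2 per weight (multiply + add); biases and activations ignored.

    layer_sizes = [input, hidden..., output]. This counting convention is an
    assumption for illustration and may differ from dnaty.count_flops.
    """
    return sum(2 * fan_in * fan_out for fan_in, fan_out in zip(layer_sizes, layer_sizes[1:]))


def dense_params(layer_sizes: Sequence[int]) -> int:
    """Trainable parameters of the same MLP: weights plus one bias per output unit."""
    return sum(fan_in * fan_out + fan_out for fan_in, fan_out in zip(layer_sizes, layer_sizes[1:]))


before = dense_flops([784, 512, 256, 128, 10])
after = dense_flops([784, 301, 153, 128, 10])   # arch=[301, 153, 128] from the Quickstart output
print(before, after, f"{100 * (1 - after / before):.1f}% fewer")
# 1133056 605802 46.5% fewer
```

Running this kind of count on the original model before calling compress gives a quick sense of how much headroom there is.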

Supported backbones for compress_with_backbone: ResNet, MobileNetV2/V3, EfficientNet, VGG, DenseNet, ViT, and custom models with an fc/classifier/head attribute.

Full reference with copy-paste recipes: dnaty.org/docs

Example — pretrained backbone for edge deployment

```python
import torchvision.models as tv
import dnaty

backbone = tv.mobilenet_v2(weights="IMAGENET1K_V1")
dnaty.prune_conv_channels(backbone, amount=0.2)          # optional: thin convs first

# train_loader: your own torch.utils.data.DataLoader yielding (images, labels) batches
result = dnaty.compress_with_backbone(
    backbone, train_loader,
    target_flops=0.4,
    finetune_backbone=True, finetune_epochs=10,
)
result.export_onnx("mobilenet_edge.onnx", input_shape=(3, 224, 224))
```

Deterministic results

```python
result = compress(model, ds, target_flops=0.5, n_generations=30, seed=42)
# Same seed → identical result. The pytest suite gates every release on this.
```

Scope, stated plainly

Strong: MLPs on tabular/sensor data; classifier heads on frozen CNN/ViT backbones; CPU-only environments.

Not yet: full convolutional NAS end-to-end (under development; convs are handled by structural pruning today); transformer/LLM compression; models that are already minimal (no fat means little or no cut, and the library warns you when the model would need to grow).

No comparison against OFA or MnasNet is claimed — those target full conv search spaces on GPUs; dNATY targets CPU-only workflows on a different problem slice.


Installation

```shell
pip install dnaty                # stable (recommended)
pip install dnaty==1.1.6         # pin to this release
pip install git+https://github.com/pedrovergueiro/dNaty  # latest from source
```

Requirements: Python 3.10+, PyTorch 2.0+, NumPy 1.24+

```shell
pip install "dnaty[dev]"   # adds pytest, matplotlib, jupyter (quotes stop zsh from globbing the brackets)
```

Project structure

```text
dNaty/
├── dnaty/
│   ├── compress.py              # public API: compress, compress_cnn,
│   │                            #   compress_with_backbone, prune_conv_channels
│   ├── result.py                # CompressResult + load(): save/export/latency
│   ├── evolution/evolver.py     # DnatyEvolver / CnnEvolver: NSGA-II search
│   ├── core/                    # DynamicMLP, DynamicCNN, Individual, episodic memory
│   ├── operators/               # structural mutation operators (dense + conv)
│   ├── training/local_train.py  # fast local trainer
│   ├── monitoring/              # DriftDetector, ProductionTracker
│   ├── utils/flops_counter.py   # count_flops, flops_by_layer
│   └── experiments/fast_dataset.py  # zero-I/O MNIST/FashionMNIST/CIFAR10 loader
├── scripts/                     # prove_it.py, benchmark_market_real.py, ...
└── tests/                       # pytest suite (57 tests): gates every release
```

Hosted version

Prefer not to run it locally? dnaty.org hosts the same engine with a web UI and REST API: upload a CSV, get a compressed model back. Free tier: one training run per day, no credit card required.


Citation

```bibtex
@software{vergueiro_dnaty_2026,
  author  = {Vergueiro, Pedro},
  title   = {dNaty: Dynamic Neuro-Adaptive sYstem with evoluTionarY Learning},
  year    = {2026},
  url     = {https://github.com/pedrovergueiro/dNaty},
  version = {1.1.6},
  license = {BSL-1.1}
}
```

License

Business Source License 1.1 — free for research, academic work, and personal projects. Commercial use requires a license: dnaty.org/commercial · pedrol.vergueiro@gmail.com