COCOV: Continuous Collaborative Verification

A drift-aware, prototype-based framework for continual face verification with selective human-in-the-loop adaptation, evaluated across five backbone encoders and three datasets.

COCOV maintains bounded identity memory through prototype assignment, insertion, and merging while using reviewer-mediated escalation for uncertain observations. Built on a fixed deep face encoder, the framework targets continual verification under non-stationary appearance conditions where identity representations must evolve incrementally without retraining the underlying feature extractor.

Results Summary

On VGGFace2, COCOV outperforms all baselines for each of the five encoders (30 independent runs, mean ± std). The table lists the Static Enrollment baseline for comparison:

Encoder	Static AUC	COCOV AUC	COCOV EER
FaceNet	0.9816	0.9843 ± 0.0020	4.28 ± 0.52%
ArcFace-R50	0.9783	0.9837 ± 0.0017	3.87 ± 0.35%
ArcFace-R100	0.9788	0.9841 ± 0.0016	3.89 ± 0.36%
MobileFaceNet	0.9789	0.9840 ± 0.0017	3.94 ± 0.37%
AdaFace	0.9778	0.9831 ± 0.0017	3.94 ± 0.37%

Datasets

Dataset	Role	Identities	Ordering
VGGFace2-HQ	Primary evaluation	3,000 subset	Filename
CACD	Cross-dataset (cross-age)	500 (capped)	Year
FG-NET	Cross-dataset (age-ordered)	82	Age label

Dataset access:

VGGFace2: https://www.robots.ox.ac.uk/~vgg/data/vggface2/
CACD: https://bcsiriuschen.github.io/CARC/
FG-NET: https://yanweifu.github.io/FG_NET_data/FGNET.zip

Encoders

Five backbone encoders are evaluated. All are frozen throughout the experiments; no fine-tuning is performed.

Code	Encoder	Backbone	Loss	Training Data	Input
ENC01_FACENET	FaceNet	InceptionResNetV1	Triplet	VGGFace2	160×160
ENC02_ARCFACE_R50	ArcFace-R50	IResNet-50	ArcFace	WebFace600K	112×112
ENC03_ARCFACE_R100	ArcFace-R100	IResNet-100	ArcFace	Glint360K	112×112
ENC04_MOBILEFACENET	MobileFaceNet	MobileFaceNet	ArcFace	WebFace600K	112×112
ENC05_ADAFACE	AdaFace	IResNet-101	AdaFace	MS1MV3	112×112

See ENCODERS.md for installation and weight acquisition instructions per encoder.

Project Structure

cocov/
├── config/
│   └── config.yaml              # All paths, hyperparameters, calibration settings
├── data/
│   ├── dataset.py               # VGGFace2Dataset, CACDDataset, FGNETDataset
│   ├── embeddings.py            # EmbeddingCache with per-encoder subdirectories
│   └── stream.py                # Sequential verification stream construction
├── models/
│   ├── encoder.py               # Multi-backbone encoder registry (5 encoders)
│   └── identity_memory.py       # Prototype memory with assignment/insertion/merging
├── methods/
│   ├── base.py                  # Abstract BaseVerificationMethod
│   ├── static.py                # Static Enrollment baseline
│   ├── ols.py                   # Naive OLS Expansion baseline
│   ├── replay.py                # Replay Dual Memory baseline
│   ├── buffer.py                # Fixed Buffer Averaging baseline
│   └── cocov.py                 # COCOV framework
├── verification/
│   ├── verifier.py              # Similarity and drift computation
│   └── metrics.py               # AUC, EER, TAR@FAR, update counts
├── calibration/
│   └── calibrate.py             # Threshold calibration via grid search
├── experiments/
│   ├── run_experiment.py        # Main experiment runner (30 runs)
│   ├── ablation.py              # Ablation study (6 configurations)
│   └── cross_dataset.py         # Cross-dataset evaluation (CACD, FG-NET)
├── analysis/
│   └── generate_results.py      # Figure and LaTeX table generation
├── scripts/
│   ├── run_encoder.sh           # Single-encoder pipeline (4 steps)
│   ├── run_all_backbones.sh     # Full pipeline across all 5 encoders
│   ├── patch_config.py          # Config patching utility
│   └── extract_embeddings.py    # Per-encoder embedding extraction
├── webapp/
│   └── main.py                  # Reviewer escalation web interface
└── tests/
    ├── test_dataset.py
    ├── test_memory.py
    ├── test_metrics.py
    └── test_verifier.py

Installation

git clone https://github.com/davidwaf/cocov.git
cd cocov
pip install -r requirements.txt

Tested on:

Python 3.13
PyTorch 2.7.1 with CUDA 12.8
Ubuntu 24.04
NVIDIA RTX 1000 Ada Generation (6 GB VRAM)

For encoder-specific dependencies (ArcFace, AdaFace, MobileFaceNet), see ENCODERS.md.

Reproducing Experiments

Quickstart — single encoder

The recommended entry point is scripts/run_encoder.sh, which runs the complete four-step pipeline for one encoder:

cd /path/to/cocov
bash scripts/run_encoder.sh facenet

Valid encoder names: facenet, arcface_r50, arcface_r100, mobilefacenet, adaface

This script:

1. Patches config/config.yaml with the encoder's settings
2. Extracts and caches embeddings for VGGFace2, CACD, and FG-NET
3. Runs calibration and the main experiment (30 runs)
4. Runs the cross-dataset evaluation and the ablation study

Results are written to /opt/data/cocov/results/ENC0X_*/, with one directory per encoder code (see Encoders).

Step-by-step

1 — Configure paths

Edit config/config.yaml to set your dataset and output paths:

paths:
  vggface2_root: "/path/to/VGGface2_None_norm_512_true_bygfpgan"
  cacd_root:     "/path/to/cacd/cacd_split"
  fgnet_root:    "/path/to/FGNET/images"
  embeddings_dir: "/path/to/embeddings"
  results_dir:   "/path/to/results"

2 — Extract embeddings

python scripts/extract_embeddings.py \
    --encoder facenet \
    --config config/config.yaml

Embeddings are cached under {embeddings_dir}/{encoder}/. Subsequent runs load from cache automatically.
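The load-from-cache behaviour can be illustrated with a minimal sketch. The helper names `cache_path` and `load_or_compute`, the per-dataset file name, and the JSON storage format are all assumptions made for illustration; the actual `EmbeddingCache` in data/embeddings.py may use a different layout and file format.

```python
import json
from pathlib import Path
from typing import Callable, List

Embeddings = List[List[float]]


def cache_path(embeddings_dir: str, encoder: str, dataset: str) -> Path:
    """Hypothetical layout: {embeddings_dir}/{encoder}/{dataset}.json"""
    return Path(embeddings_dir) / encoder / f"{dataset}.json"


def load_or_compute(path: Path, compute: Callable[[], Embeddings]) -> Embeddings:
    """Return cached embeddings if the file exists; otherwise compute and store them."""
    if path.exists():
        return json.loads(path.read_text())
    embeddings = compute()
    # Create the per-encoder subdirectory on first use.
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(embeddings))
    return embeddings
```

Keeping one subdirectory per encoder means that switching encoders never reuses embeddings produced by a different backbone.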

3 — Run calibration and main experiment

python experiments/run_experiment.py --config config/config.yaml

4 — Run ablation study

python experiments/ablation.py --config config/config.yaml

5 — Run cross-dataset evaluation

python experiments/cross_dataset.py --config config/config.yaml

6 — Generate figures and tables

python analysis/generate_results.py \
    --results_dir /opt/data/cocov/results \
    --output_dir  /opt/data/cocov/analysis

Outputs:

analysis/
├── figures/
│   ├── fig1_main_auc_grouped.pdf
│   ├── fig2_main_eer_grouped.pdf
│   ├── fig3_cocov_vs_static_scatter.pdf
│   ├── fig4_ablation_heatmap.pdf
│   ├── fig5_ablation_bars.pdf
│   ├── fig6_cross_dataset_auc.pdf
│   ├── fig6_cross_dataset_eer.pdf
│   ├── fig8_updates_bar.pdf
│   ├── fig9_radar.pdf
│   └── fig10_cocov_gain.pdf
└── tables/
    ├── tab1_main_results.tex
    ├── tab2_cross_dataset.tex
    ├── tab3_ablation.tex
    └── tab4_encoder_summary.tex

All figures are saved as both .pdf (for LaTeX) and .png (for preview). LaTeX tables can be included directly with \input{}.

Reviewer Web Application

The reviewer interface handles escalated observations that fall outside automatic acceptance bounds.

python webapp/main.py

Open http://localhost:8000

The interface presents the probe image, enrolled reference, similarity score, and drift value. The reviewer selects: confirm, assign, create, or reject. All decisions are logged to /opt/data/logs/reviewer_log.json.

In experiments, reviewer responses are simulated using ground-truth labels, providing an upper bound on collaborative performance under ideal reviewer conditions.
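A simulated reviewer of this kind might look like the sketch below. It is not the repository's implementation: the `ReviewRecord` fields and the `simulate_review` function are hypothetical, and only the confirm and reject decisions are modelled because the exact semantics of assign and create are not specified here.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class ReviewRecord:
    """Hypothetical log entry; the real reviewer_log.json schema may differ."""
    claimed_identity: str
    similarity: float
    drift: float
    decision: str


def simulate_review(claimed_identity: str, true_identity: str,
                    similarity: float, drift: float) -> ReviewRecord:
    """Oracle reviewer: the decision depends only on the ground-truth label."""
    decision = "confirm" if claimed_identity == true_identity else "reject"
    return ReviewRecord(claimed_identity, similarity, drift, decision)


# Example: an escalated genuine observation is confirmed by the oracle.
record = simulate_review("id_0007", "id_0007", similarity=0.58, drift=0.31)
print(json.dumps(asdict(record)))
```

Because the oracle never errs, results obtained this way bound what a human reviewer could contribute rather than predicting it.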

Evaluated Methods

Method	Memory	Updates	Supervision
Static Enrollment	Fixed	None	None
Naive OLS Expansion	Unbounded	Per sample	None
Replay Dual Memory	Bounded	Scheduled	None
Fixed Buffer Averaging	Fixed-size	Per sample	None
COCOV	Bounded	Drift-gated	Conditional
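To make "drift-gated" updates and "conditional" supervision concrete, here is a small sketch of one plausible decision rule. The `Action` enum, the `gate` function, and the exact rule are assumptions made for illustration (in particular, which observations are escalated); the thresholds correspond to `verification_threshold` and `drift_threshold`, and how similarity and drift are computed is left to verification/verifier.py.

```python
from enum import Enum


class Action(Enum):
    REJECT = "reject"      # similarity too low: verification fails
    UPDATE = "update"      # accepted and stable: memory may be updated
    ESCALATE = "escalate"  # accepted but drifting: defer to the reviewer


def gate(similarity: float, drift: float,
         verification_threshold: float, drift_threshold: float) -> Action:
    """Assumed rule: accept on similarity, then allow updates only under low drift."""
    if similarity < verification_threshold:
        return Action.REJECT
    if drift <= drift_threshold:
        return Action.UPDATE
    return Action.ESCALATE
```

Under this rule the reviewer is consulted only for accepted observations whose drift exceeds the bound, which is what keeps supervision conditional rather than per-sample.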

Ablation Configurations

Configuration	Component Disabled
COCOV-Full	None (reference configuration)
COCOV-NoDrift	Drift gate
COCOV-NoMerge	Prototype merging
COCOV-NoReviewer	Reviewer escalation
COCOV-Unbounded	Memory bound (K_max)
COCOV-SinglePrototype	Multiple prototypes

Key ablation findings (across all 5 encoders):

Disabling the drift gate causes the largest drop (−0.006 to −0.010 AUC), making it the most critical component.
Restricting each identity to a single prototype causes the second-largest drop (−0.003 to −0.008 AUC).
Disabling merging slightly improves AUC (+0.0008), which suggests that merging is conservative and that removing it allows more prototype diversity.
Removing the memory bound has negligible impact, because the drift gate implicitly limits update frequency.

COCOV Hyperparameters

All hyperparameters are calibrated per-encoder on a held-out calibration set of 200 identities not used in evaluation.

Parameter	Symbol	Description
`assign_threshold`	ρ_assign	Max cosine distance for assignment
`new_threshold`	ρ_new	Min cosine distance for new prototype
`merge_threshold`	ρ_merge	Min cosine similarity for merging
`momentum`	γ	Momentum for prototype assignment update
`max_prototypes`	K_max	Max prototypes per identity
`verification_threshold`	τ_ver	Min similarity for acceptance
`drift_threshold`	τ_Δ	Max drift for update eligibility
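The interaction of the memory hyperparameters can be sketched as follows. This is an illustrative reading of the table, not the code in models/identity_memory.py: the function names, the momentum convention (γ weights the old prototype), merging by normalised sum, and the choice to leave memory unchanged when the distance falls between the two thresholds are all assumptions.

```python
import math
from typing import List

Vector = List[float]


def normalize(v: Vector) -> Vector:
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n > 0 else list(v)


def cosine(a: Vector, b: Vector) -> float:
    # Inputs are assumed to be unit vectors, so the dot product is the cosine.
    return sum(x * y for x, y in zip(a, b))


def merge_closest(protos: List[Vector], min_sim: float) -> bool:
    """Merge the most similar pair if its similarity is at least min_sim."""
    best = None
    for i in range(len(protos)):
        for j in range(i + 1, len(protos)):
            s = cosine(protos[i], protos[j])
            if best is None or s > best[0]:
                best = (s, i, j)
    if best is None or best[0] < min_sim:
        return False
    _, i, j = best
    protos[i] = normalize([a + b for a, b in zip(protos[i], protos[j])])
    del protos[j]
    return True


def update_prototypes(prototypes: List[Vector], x: Vector, assign_threshold: float,
                      new_threshold: float, merge_threshold: float,
                      momentum: float, max_prototypes: int) -> List[Vector]:
    """Return the identity's prototypes after observing embedding x."""
    x = normalize(x)
    protos = [list(p) for p in prototypes]
    if not protos:
        return [x]
    sims = [cosine(p, x) for p in protos]
    k = max(range(len(sims)), key=sims.__getitem__)
    distance = 1.0 - sims[k]
    if distance <= assign_threshold:
        # Assignment: momentum update of the nearest prototype.
        protos[k] = normalize([momentum * p + (1.0 - momentum) * xi
                               for p, xi in zip(protos[k], x)])
    elif distance >= new_threshold:
        # Insertion: the observation is far from every prototype.
        protos.append(x)
    # Merging: collapse near-duplicates, then enforce the K_max bound.
    while merge_closest(protos, merge_threshold):
        pass
    while len(protos) > max_prototypes:
        merge_closest(protos, float("-inf"))
    return protos
```

Note that `assign_threshold` and `new_threshold` are cosine distances while `merge_threshold` is a cosine similarity, matching the units given in the table.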

Metrics

Metric	Description
AUC	Area under the ROC curve
EER	Equal error rate
TAR@FAR=1%	True acceptance rate at 1% false acceptance rate
Updates	Total prototype update operations per run

Results are reported as mean ± std across independent runs (30 for VGGFace2, 5 for CACD and FG-NET).
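For readers unfamiliar with these metrics, the following sketch computes them from lists of genuine and impostor similarity scores using only the standard library. The function names are illustrative and the thresholds are restricted to observed score values; verification/metrics.py may use a different implementation (for example, interpolated ROC curves).

```python
from typing import Sequence, Tuple


def rates(threshold: float, genuine: Sequence[float],
          impostor: Sequence[float]) -> Tuple[float, float]:
    """Return (FAR, TAR) when scores >= threshold are accepted."""
    far = sum(s >= threshold for s in impostor) / len(impostor)
    tar = sum(s >= threshold for s in genuine) / len(genuine)
    return far, tar


def auc(genuine: Sequence[float], impostor: Sequence[float]) -> float:
    """Probability that a genuine score outranks an impostor score (ties count half)."""
    wins = sum((g > i) + 0.5 * (g == i) for g in genuine for i in impostor)
    return wins / (len(genuine) * len(impostor))


def equal_error_rate(genuine: Sequence[float], impostor: Sequence[float]) -> float:
    """Error rate at the observed threshold where FAR and FRR are closest."""
    best_gap, eer = float("inf"), 1.0
    for t in sorted(set(genuine) | set(impostor)):
        far, tar = rates(t, genuine, impostor)
        frr = 1.0 - tar
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2.0
    return eer


def tar_at_far(genuine: Sequence[float], impostor: Sequence[float],
               max_far: float = 0.01) -> float:
    """Highest TAR among observed thresholds whose FAR does not exceed max_far."""
    best = 0.0
    for t in sorted(set(genuine) | set(impostor)):
        far, tar = rates(t, genuine, impostor)
        if far <= max_far:
            best = max(best, tar)
    return best


# Toy scores, chosen only to exercise the functions.
genuine_scores = [0.9, 0.8, 0.7, 0.4]
impostor_scores = [0.1, 0.2, 0.3, 0.5]
print(auc(genuine_scores, impostor_scores),
      equal_error_rate(genuine_scores, impostor_scores),
      tar_at_far(genuine_scores, impostor_scores))
```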

Hardware

Experiments conducted on:

GPU: NVIDIA RTX 1000 Ada Generation (6 GB VRAM)
CPU: Intel i9, 32 cores
RAM: 64 GB
OS: Ubuntu 24.04

Embedding extraction uses GPU. Verification, calibration, and evaluation operate on CPU from pre-extracted embeddings.

Tests

pytest tests/ -v

License

MIT License. See LICENSE for details.
