Author(s): [Cris, ChatGPT, collaborators]
Date: Aug 2025
Status: Exploratory notes, draft structure
- Entropy is well formalized (Shannon entropy, statistical mechanics). It measures unpredictability or randomness in a distribution.
- But entropy is not structure: high entropy often corresponds to noise (white noise, random strings), which contains little useful information.
- Human systems (data, puzzles, memory, search, computation) are often valued for their organization: the speed, efficiency, and reliability of retrieving what you want.
- Goal: Develop a complementary metric to entropy that quantifies organization, defined operationally as “how efficiently can you locate an element?”
- Entropy: uncertainty/randomness in a source.
- Organization (proposed): efficiency of retrieval.
- Intuition: A perfectly sorted deck is highly organized; a shuffled deck is not. A hashed or indexed system can be “super-organized.”
Key principle:

$$\mathrm{OI} = \frac{\log_2 n}{\mathbb{E}[S]}, \qquad \mathrm{OI}_H = \frac{H(P)}{\mathbb{E}[S]}$$

where $\mathbb{E}[S]$ is the expected number of steps (probes/comparisons) to locate a queried item, and

- $n$ = number of items.
  - Sorted binary search $\to \mathrm{OI} \approx 1$.
  - Linear scan $\to \mathrm{OI} \ll 1$.
  - Hash/indexing $\to \mathrm{OI} > 1$.
- $H(P)$ = Shannon entropy of the query distribution.
  - If $\mathrm{OI}_H \approx 1$, the system is near the theoretical optimum.
  - If $\mathrm{OI}_H > 1$, non-comparison structures are in play.
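A minimal sketch of both indices in Python, assuming $\mathbb{E}[S]$ is measured or modeled elsewhere (function and variable names are illustrative, not fixed notation):

```python
import math

def shannon_entropy(probs):
    """Shannon entropy H(P) in bits of a query distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def oi(expected_steps, n):
    """OI = log2(n) / E[S]: optimal comparison depth over observed expected steps."""
    return math.log2(n) / expected_steps

def oi_entropy(expected_steps, query_probs):
    """OI_H = H(P) / E[S]: entropy-aware variant for non-uniform queries."""
    return shannon_entropy(query_probs) / expected_steps

# 52-card deck, uniform queries
print(oi(26.5, 52))   # linear scan:   ~0.22
print(oi(5.7, 52))    # binary search: ~1.00
```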
Borrowing from NDCG/MRR, a step-discounted variant:

$$\mathrm{OI}_{\mathrm{disc}} = \frac{1}{\log_2\!\big(1 + \mathbb{E}[S]\big)}$$

- Always in (0, 1].
- Captures “diminishing returns” of extra steps.
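The same discounted form as a one-liner (a sketch; $\mathbb{E}[S] \ge 1$ is assumed so the value stays in (0, 1]):

```python
import math

def oi_discounted(expected_steps):
    """OI_disc = 1 / log2(1 + E[S]): NDCG/MRR-style step discounting, in (0, 1]."""
    return 1.0 / math.log2(1.0 + expected_steps)

print(oi_discounted(1.0))    # single-probe lookup        -> 1.000
print(oi_discounted(26.5))   # linear scan over 52 items  -> ~0.209
```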
Often we want to normalize to the unit interval [0,1] so that:
- Perfect disorder (linear scan, no structure) → 0
- Optimal comparison-based organization (binary search or entropy bound) → 1
Let $\mathbb{E}_{\mathrm{lin}} = \tfrac{n+1}{2}$ (expected steps of an unstructured linear scan) and $\mathbb{E}_{\mathrm{opt}} = \log_2 n$ (optimal comparison-based steps).

Normalized form:

$$\mathrm{OI}_{\mathrm{norm}} = \frac{1/\mathbb{E}[S] \;-\; 1/\mathbb{E}_{\mathrm{lin}}}{1/\mathbb{E}_{\mathrm{opt}} \;-\; 1/\mathbb{E}_{\mathrm{lin}}}$$

Entropy-aware version: replace $\mathbb{E}_{\mathrm{opt}} = \log_2 n$ with $H(P)$:

$$\mathrm{OI}_{\mathrm{norm},H} = \frac{1/\mathbb{E}[S] \;-\; 1/\mathbb{E}_{\mathrm{lin}}}{1/H(P) \;-\; 1/\mathbb{E}_{\mathrm{lin}}}$$
- Perfect disorder → score 0
- Optimal comparison → score 1
- Auxiliary structures (hashing, direct addressing) may exceed 1 (unless capped).
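A sketch of the normalized score, folding the entropy-aware substitution and the optional cap into parameters (this combined signature and the parameter names are illustrative):

```python
import math

def oi_normalized(expected_steps, n, entropy_bits=None, cap=False):
    """Normalized OI in reciprocal-step space: 0 = linear scan, 1 = optimal comparison search.

    If entropy_bits is given, it replaces log2(n) as the comparison optimum;
    cap=True clamps super-organized structures (hashing, direct addressing) to 1.0.
    """
    e_lin = (n + 1) / 2                                      # expected steps of an unstructured scan
    e_opt = entropy_bits if entropy_bits is not None else math.log2(n)
    score = (1 / expected_steps - 1 / e_lin) / (1 / e_opt - 1 / e_lin)
    return min(score, 1.0) if cap else score
```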
52-card deck (uniform queries):
| Scenario | $\mathbb{E}[S]$ (expected steps) | $\mathrm{OI}$ | $\mathrm{OI}_{\mathrm{norm}}$ | $\mathrm{OI}_{\mathrm{disc}}$ |
|---|---|---|---|---|
| Unsorted pile | ~26.5 | 0.215 | 0.000 | 0.209 |
| Sorted (binary) | ~5.7 | 1.000 | 1.000 | 0.364 |
| Hashed index | ~1.2 | 4.750 | 5.778 | 0.879 |
If you want a bounded score, use the capped normalization $\min(\mathrm{OI}_{\mathrm{norm}}, 1)$, which maps the hashed row to 1.000.
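The table can be reproduced in a few lines (a sketch; the expected-step values 26.5, $\log_2 52$, and 1.2 are taken directly from the scenarios above):

```python
import math

n = 52
e_lin, e_opt = (n + 1) / 2, math.log2(n)           # 26.5 and ~5.70 steps

oi      = lambda e: e_opt / e
oi_norm = lambda e: (1 / e - 1 / e_lin) / (1 / e_opt - 1 / e_lin)
oi_disc = lambda e: 1 / math.log2(1 + e)

for label, e in [("Unsorted pile", e_lin), ("Sorted (binary)", e_opt), ("Hashed index", 1.2)]:
    print(f"{label:16s} E[S]={e:5.2f}  OI={oi(e):.3f}  OI_norm={oi_norm(e):.3f}  "
          f"OI_disc={oi_disc(e):.3f}  capped={min(oi_norm(e), 1.0):.3f}")
```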
Skewed distribution:
If 50% of queries target one item, $H(P)$ drops well below $\log_2 n$; for 52 cards with the remaining queries uniform over the other 51, $H(P) \approx 3.84$ bits versus $\log_2 52 \approx 5.70$.

Effect on OI: for a fixed structure, $\mathrm{OI}_H = H(P)/\mathbb{E}[S]$ falls, signalling a mismatch with the query distribution; a structure that places the popular item where it is found first lowers $\mathbb{E}[S]$ and pushes $\mathrm{OI}_H$ back toward 1.
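A rough numeric check of the skew example, assuming the remaining 50% of queries are uniform over the other 51 cards and treating expected comparison counts as idealized (fractional) values:

```python
import math

n = 52
probs = [0.5] + [0.5 / (n - 1)] * (n - 1)   # assumed skew: half of all queries hit one card
H = -sum(p * math.log2(p) for p in probs)   # ~3.84 bits, down from log2(52) ~ 5.70

# Structure that ignores the skew: still ~log2(n) comparisons per query.
print(H / math.log2(n))                     # OI_H ~ 0.67: same tree, judged less organized

# Structure that probes the popular card first, then binary-searches the rest.
e_skew_aware = 1 + 0.5 * math.log2(n - 1)   # ~3.84 expected steps
print(H / e_skew_aware)                     # OI_H ~ 1.0 under this idealization
```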
Example: 52 cards, sorted by suits. OI drops from 1.0 to ~0.984 with one misfile.
Balanced buckets maximize organization.
A single inversion can break naive binary search; a robust fallback (e.g., a local scan around the failed probe) reduces this to a misfile-like penalty.
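A toy simulation of this scenario (assumptions: the misfile is a single adjacent swap in an otherwise sorted deck, and the fallback scans outward from the failed probe; the exact penalty therefore differs somewhat from the ~0.984 figure quoted above):

```python
import math, random

def steps_to_find(cards, target):
    """Binary search with an outward-scan fallback; returns the number of probes used."""
    lo, hi, steps = 0, len(cards) - 1, 0
    while lo <= hi:
        mid = (lo + hi) // 2
        steps += 1
        if cards[mid] == target:
            return steps
        lo, hi = (mid + 1, hi) if cards[mid] < target else (lo, mid - 1)
    # The card is misfiled: scan outward from where the search gave up.
    pos = min(max(lo, 0), len(cards) - 1)
    for idx in sorted(range(len(cards)), key=lambda j: abs(j - pos)):
        steps += 1
        if cards[idx] == target:
            return steps
    return steps

n = 52
sorted_deck = list(range(n))
e_sorted = sum(steps_to_find(sorted_deck, t) for t in range(n)) / n

misfiled = list(range(n))
i = random.randrange(n - 1)
misfiled[i], misfiled[i + 1] = misfiled[i + 1], misfiled[i]   # one misfile: adjacent swap
e_misfiled = sum(steps_to_find(misfiled, t) for t in range(n)) / n

# OI is proportional to 1 / E[S] for fixed n, so the misfile penalty is the step ratio.
print(f"E[S] sorted = {e_sorted:.2f}, misfiled = {e_misfiled:.2f}, "
      f"OI ratio = {e_sorted / e_misfiled:.3f}")
```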
- Not a metric: it’s an index, not a distance; the triangle inequality doesn’t apply.
- Monotonicity: fewer expected steps $\to$ higher OI.
- Scale dependence: larger systems allow higher OI (matches intuition: a library can be “more organized” than a deck).
- Boundedness:
  - Comparison-only $\to$ OI $\le 1$.
  - With hashing/indexing $\to$ OI can scale as $\log n$.
- Differential sensitivity: small perturbations change OI smoothly, $\Delta \mathrm{OI} = O(1/n)$.
- Optimal search trees: expected cost bounded by the entropy $H(P)$.
- Huffman coding analogy: code length vs. search depth.
- IR metrics: NDCG/MRR parallel the step-discounting.
- Complexity measures: effective complexity, logical depth, statistical complexity.
- Normalization choice: bounded vs unbounded.
- Handling skewed query distributions.
- Including maintenance/update costs.
- Multi-level organization (hierarchies, caches).
- Biological analogy: SNP-like mutations as structural perturbations.
- Applications: IR, database indexing, cognitive models, evolution.
Entropy quantifies randomness; organization quantifies efficiency of access.
Our metrics ($\mathrm{OI}$, $\mathrm{OI}_H$, $\mathrm{OI}_{\mathrm{norm}}$, $\mathrm{OI}_{\mathrm{disc}}$) reward efficient retrieval rather than
noise, and their scaling separates comparison-only from “super-organized” structures.
This sketches the outline of a future theory of organization: a counterpart
to information theory, capturing order and efficiency rather than randomness.