Structropy - Towards a Metric of Organization

Draft Research Notes

Author(s): [Cris, ChatGPT, collaborators]
Date: Aug 2025
Status: Exploratory notes, draft structure


1. Motivation

  • Entropy is well formalized (Shannon entropy, statistical mechanics). It measures unpredictability or randomness in a distribution.
  • But entropy is not structure: high entropy often corresponds to noise (white noise, random strings), which contains little useful information.
  • Human systems (data, puzzles, memory, search, computation) are often valued for their organization: the speed, efficiency, and reliability of retrieving what you want.
  • Goal: Develop a complementary metric to entropy that quantifies organization, defined operationally as “how efficiently can you locate an element?”

2. Conceptual Framework

  • Entropy: uncertainty/randomness in a source.
  • Organization (proposed): efficiency of retrieval.
  • Intuition: A perfectly sorted deck is highly organized; a shuffled deck is not. A hashed or indexed system can be “super-organized.”

Key principle:

$$ \text{Organization} = \frac{\text{Baseline cost}}{\mathbb{E}[T]} $$

where $\mathbb{E}[T]$ = expected search steps.


3. Candidate Metrics

3.1 Organization Index vs. Sorted Baseline

$$ \mathrm{OI}_{\log} = \frac{\log_2 n}{\mathbb{E}[T]} $$

  • $n$ = number of items.
  • Sorted binary search $\to$ 1.
  • Linear scan $\to \ll 1$.
  • Hash/indexing $\to > 1$.
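
A minimal sketch (Python) checking these regimes against the definition; the expected step counts below are the standard idealized values for each structure, not measurements:

```python
import math

def oi_log(n: int, expected_steps: float) -> float:
    """Organization index relative to the sorted (binary-search) baseline."""
    return math.log2(n) / expected_steps

n = 52
print(oi_log(n, (n + 1) / 2))    # linear scan, uniform queries: ~0.215
print(oi_log(n, math.log2(n)))   # binary search on a sorted deck: 1.0
print(oi_log(n, 1.2))            # hashed index (assumed ~1.2 probes): ~4.75
```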

3.2 Entropy-Aware Organization

$$ \mathrm{OI}_H = \frac{H(P)}{\mathbb{E}[T]} $$

  • $H(P)$ = Shannon entropy of query distribution.
  • If $\mathrm{OI}_H \approx 1$, system is near theoretical optimum.
  • If $\mathrm{OI}_H > 1$, non-comparison structures are in play.
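
A sketch of the entropy-aware index; the query distribution `p` is an assumed input, and the uniform case recovers $\mathrm{OI}_{\log}$:

```python
import math

def shannon_entropy(p: list[float]) -> float:
    """Shannon entropy of a query distribution, in bits."""
    return -sum(q * math.log2(q) for q in p if q > 0)

def oi_h(p: list[float], expected_steps: float) -> float:
    """Entropy-aware organization index: H(P) / E[T]."""
    return shannon_entropy(p) / expected_steps

# Uniform queries over 52 items served by binary search: OI_H = 1.0
print(oi_h([1 / 52] * 52, math.log2(52)))
```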

3.3 Step-Discounted Organization

Borrowing from NDCG/MRR:

$$ \mathrm{OI}_{\text{SDG}} = \mathbb{E}\!\left[\frac{1}{\log_2(1+T)}\right] $$

  • Always in (0,1].
  • Captures “diminishing returns” of extra steps.
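
A sketch separating the exact expectation from a plug-in shortcut (function names here are illustrative). Since $1/\log_2(1+t)$ is convex, Jensen's inequality makes the exact form larger whenever $T$ varies across queries:

```python
import math

def oi_sdg(p, steps):
    """Step-discounted organization: E[1 / log2(1 + T)] over the query distribution."""
    return sum(q / math.log2(1 + t) for q, t in zip(p, steps))

def oi_sdg_plugin(expected_steps: float) -> float:
    """Plug-in variant 1 / log2(1 + E[T])."""
    return 1 / math.log2(1 + expected_steps)

# Linear scan of 52 items, uniform queries: item i is found after i steps.
n = 52
print(oi_sdg([1 / n] * n, range(1, n + 1)))  # exact expectation: ~0.25
print(oi_sdg_plugin((n + 1) / 2))            # plug-in form: ~0.209 (matches the table in §4)
```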

3.4 Normalization of the Organization Index

Often we want to normalize to the unit interval [0,1] so that:

  • Perfect disorder (linear scan, no structure) → 0
  • Optimal comparison-based organization (binary search or entropy bound) → 1

Let $\mathbb{E}[T]_{\mathrm{rand}}$ = expected steps in a fully unorganized system (e.g. $(n+1)/2$ for linear scan under uniform queries).

Normalized form:

$$ \mathrm{OI}^{\mathrm{norm}}_{\log} = \frac{\tfrac{\log_2 n}{\mathbb{E}[T]} - \tfrac{\log_2 n}{\mathbb{E}[T]_{\mathrm{rand}}}} {1 - \tfrac{\log_2 n}{\mathbb{E}[T]_{\mathrm{rand}}}} $$

Entropy-aware version:

$$ \mathrm{OI}^{\mathrm{norm}}_H = \frac{\tfrac{H(P)}{\mathbb{E}[T]} - \tfrac{H(P)}{\mathbb{E}[T]_{\mathrm{rand}}}} {1 - \tfrac{H(P)}{\mathbb{E}[T]_{\mathrm{rand}}}} $$

  • Perfect disorder → score 0
  • Optimal comparison → score 1
  • Auxiliary structures (hashing, direct addressing) may exceed 1 (unless capped).
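
A sketch of the normalized index against the linear-scan baseline, reusing the idealized step counts from §3.1:

```python
import math

def oi_norm_log(n: int, expected_steps: float, expected_steps_rand: float) -> float:
    """Normalized OI: 0 at the unorganized baseline, 1 at the binary-search bound."""
    oi = math.log2(n) / expected_steps
    oi_rand = math.log2(n) / expected_steps_rand
    return (oi - oi_rand) / (1 - oi_rand)

n = 52
rand = (n + 1) / 2                          # linear scan under uniform queries
print(oi_norm_log(n, rand, rand))           # 0.0 (perfect disorder)
print(oi_norm_log(n, math.log2(n), rand))   # 1.0 (sorted, binary search)
print(min(1.0, oi_norm_log(n, 1.2, rand)))  # hashed index, capped to 1.0
```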

4. Toy Examples

52-card deck (uniform queries):

| Scenario | $\mathbb{E}[T]$ | $\mathrm{OI}_{\log}$ | $\mathrm{OI}^{\text{norm}}_{\log}$ | $\mathrm{OI}_{\text{SDG}}$ |
| --- | --- | --- | --- | --- |
| Unsorted pile | ~26.5 | 0.215 | 0.000 | 0.209 |
| Sorted (binary) | ~5.7 | 1.000 | 1.000 | 0.364 |
| Hashed index | ~1.2 | 4.750 | 5.778 | 0.879 |

If you want a bounded score, use the capped normalization

$$ \mathrm{OI}^{\text{norm,capped}}_{\log} = \min\!\left(1, \mathrm{OI}^{\text{norm}}_{\log}\right), $$

which maps the hashed row to 1.000.
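
The whole table can be regenerated from the definitions; in this sketch the SDG column uses the plug-in form $1/\log_2(1+\mathbb{E}[T])$ from the §3.3 sketch, which is the form the figures above match:

```python
import math

n = 52
rows = {"Unsorted pile": (n + 1) / 2, "Sorted (binary)": math.log2(n), "Hashed index": 1.2}
oi_rand = math.log2(n) / ((n + 1) / 2)  # OI of the unorganized baseline

for name, et in rows.items():
    oi = math.log2(n) / et
    oi_norm = (oi - oi_rand) / (1 - oi_rand)
    sdg = 1 / math.log2(1 + et)
    print(f"{name:16s} E[T]={et:5.2f}  OI={oi:.3f}  OI_norm={oi_norm:.3f}  SDG={sdg:.3f}")
```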

Skewed distribution:
If 50% of queries target one item (with the remaining mass uniform over the other 51 items), then $H(P)\approx 3.84$ bits, and a structure tuned to this distribution can reach $\mathbb{E}[T]\approx 3.84 \Rightarrow \mathrm{OI}_H \approx 1$.
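
A quick check of the entropy figure, assuming the remaining query mass is uniform over the other 51 items:

```python
import math

p = [0.5] + [0.5 / 51] * 51           # 50% on one item, rest uniform
h = -sum(q * math.log2(q) for q in p)
print(h)                              # ~3.84 bits
print(h / 3.84)                       # OI_H with E[T] = 3.84: ~1.0
```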


5. Differential Organization

5.1 Misfiled Item

$$ \Delta \mathbb{E}[T] \approx \frac{1+\log_2(s+1)}{n} $$

where $s$ is the size of the bucket searched to recover the misfiled item; only the $1/n$ fraction of queries targeting that item pays the extra cost.

Effect on OI: since $\mathrm{OI}_{\log} = \log_2 n / \mathbb{E}[T]$, a first-order expansion around the sorted optimum $\mathbb{E}[T] \approx \log_2 n$ gives

$$ \Delta \mathrm{OI} \approx -\frac{\log_2 n}{(\log_2 n)^2} \cdot \frac{1+\log_2(s+1)}{n} = -\frac{1+\log_2(s+1)}{n \log_2 n} $$

Example: 52 cards, sorted by suits. OI drops from 1.0 to ~0.984 with one misfile.
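
A sketch of that worked example, assuming $s = 13$ is the size of the suit bucket holding the misfiled card:

```python
import math

n, s = 52, 13                            # deck size; assumed suit-bucket size
delta_et = (1 + math.log2(s + 1)) / n    # extra expected steps from one misfile
delta_oi = -delta_et / math.log2(n)      # first-order change in OI_log
print(1.0 + delta_oi)                    # ~0.984
```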

5.2 Reshaping Buckets

$$ \Delta \mathbb{E}[T] \approx \frac{1}{n \ln 2} \left(\frac{1}{n_b} - \frac{1}{n_a}\right) $$

Balanced buckets maximize organization.
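
A direct numeric evaluation of the approximation; reading $n_b$ and $n_a$ as the bucket's size before and after the change is an assumption here, not stated in the notes:

```python
import math

def delta_et_reshape(n: int, n_b: int, n_a: int) -> float:
    """Approximate change in E[T] when one bucket's size goes from n_b to n_a
    (before/after reading of the subscripts is assumed)."""
    return (1 / (n * math.log(2))) * (1 / n_b - 1 / n_a)

# Shrinking an oversized 26-card bucket toward a balanced 13:
print(delta_et_reshape(52, 26, 13))  # negative: fewer expected steps, higher OI
```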

5.3 Within-Bucket Disorder

A single inversion can break binary search outright; a robust fallback reduces the damage to a misfile-like penalty.


6. Properties of the Organization Index

  • Not a metric: It’s an index, not a distance; triangle inequality doesn’t apply.
  • Monotonicity: Fewer expected steps $\to$ higher OI.
  • Scale dependence: Larger systems allow higher OI (matches intuition: a library can be “more organized” than a deck).
  • Boundedness:
    • Comparison-only $\to$ OI $\le 1$.
    • With hashing/indexing $\to$ OI can scale as $\log n$.
  • Differential sensitivity: Small perturbations change OI smoothly, $\Delta \mathrm{OI} = O(1/n)$.

7. Connections to Literature

  • Optimal search trees: costs bounded by entropy $H(P)$.
  • Huffman coding analogy: code length vs. search depth.
  • IR metrics: NDCG/MRR parallel step-discounting.
  • Complexity measures: effective complexity, logical depth, statistical complexity.

8. Open Questions & Future Work

  • Normalization choice: bounded vs unbounded.
  • Handling skewed query distributions.
  • Including maintenance/update costs.
  • Multi-level organization (hierarchies, caches).
  • Biological analogy: SNP-like mutations as structural perturbations.
  • Applications: IR, database indexing, cognitive models, evolution.

9. Conclusion

Entropy quantifies randomness; organization quantifies efficiency of access.
Our metrics ($\mathrm{OI}_{\log}$, $\mathrm{OI}_H$, $\mathrm{OI}_{\text{SDG}}$) provide complementary perspectives. Differential analysis shows robustness under
noise, and scaling separates comparison-only from “super-organized” structures.

This sketches the outline of a future theory of organization: a counterpart
to information theory, capturing order and efficiency rather than randomness.
