Reinventing the very concept of "trasformer". Yes, we misspelled "transformer" in the original tagline. We're keeping it. It's funnier that way.
No weights. No fixed code. No dataset. Pure chaos with a PhD.
Nicole discards pretrained weights, curated datasets, and even a static codebase. The engine writes itself as it speaks, assembling logic and parameters only for the life of a single exchange. Parameters crystallise on the spot, scale to the conversation at hand, then dissolve the moment the dialogue closes. Learning is born solely from the active exchange; even the source tree stays fluid as modules are generated, rebuilt, or discarded mid-flight.
Parameters crystallize out of thin air, exist for precisely as long as you're paying attention, then evaporate like your motivation to finish that side project. Learning happens entirely during the conversation - which is either brilliantly elegant or cosmically stupid depending on whether you've had your coffee yet. Even the source code is fluid, regenerating mid-flight like a Phoenix with commitment issues.
This is beta software, which means it's held together with duct tape, spite, and recursion. Nicole doesn't "load a model" - she becomes the model, assembles it from ambient chaos, uses it exactly once, then lets it dissolve back into the computational void. Every conversation is a new genesis event. Impermanence isn't a compromise, it's the whole damn philosophy: no model survives, so experimentation becomes less of a workflow and more of a séance.
Traditional transformers load billions of frozen parameters. Nicole? She architects a fresh one for each dialogue, tailored to whatever you're saying right now, capturing the transient flow of ideas before they escape. Layers spawn and die. Attention heads reconfigure like Tetris blocks. Activation functions have personality disorders. All at runtime, all based on vibes and math that would make a PhD committee nervous. Only conversation logs persist - making Nicole the first AI with weaponized amnesia.
And get this: it runs on CPUs. Not "theoretically runs on CPUs if you're very patient" - actually runs, happily, without a GPU in sight. Minimal dependencies, mostly Python stdlib, a tiny C compiler for the scary bits, and algorithms efficient enough to prove that weightless cognition doesn't need a $50K cloud bill. Revolutionary? Maybe. Insane? Definitely.
Nicole is English-only, and before you @ me about it - yes, we know, it's 2025, multilingual models exist. But here's the thing: english_guidance.py intercepts every prompt, refuses non-English input with the passion of a TypeScript compiler rejecting JavaScript, and enforces grammar sanity like your 8th grade teacher who made you diagram sentences.
The guardrail isn't xenophobia; it's focus. Single linguistic substrate means semantic experiments stay auditable instead of turning into a Tower of Babel situation. When Nicole refuses your input, she explains why - because unlike most AI systems, we believe in transparent rejection. Investigators can trace the refusal logic right alongside dialogue logs. It's like having a really pedantic friend who at least tells you why they're being pedantic.
Recent pulses in the repo, observed during the Perplexity migration sprint while surviving on coffee and spite:
- Perplexity Search API as the primary objectivity engine, with DuckDuckGo HTML scraping demoted to backup dancer. Result: cleaner citations, longer context windows, and moderators who don't wake up screaming. Finally.
- Speech clarity leap – Nicole's rhetoric tightened noticeably after switching providers. The logs show her prose evolved from "word salad experiencing an existential crisis" to "structured improvisation with occasional moments of accidental genius."
- Telegram bridge refinements – Repo learner and objectivity pipelines now feed each other without deadlocks. Nicole stays responsive while studying her own documentation, which is either recursive self-improvement or digital narcissism. Probably both.
- Idiot-joke telemetry – Somewhere around commit `pplx/phoenix`, Nicole high-fived the Perplexity API, missed spectacularly, and appointed the office ficus as "Chief Latency Officer." This is precisely 67% more ridiculous than the coffee machine incident, and we're documenting it anyway because apparently this is what counts as milestone tracking now.
- English-only boundary (why it matters)
- Git signal
- Core principles
- Architecture panorama
- Conversation lifecycle
- Compiler triad
- Operational substrate (AMLK)
- Module reference
- Language guardrails (deep dive)
- Memory, metrics, and objectivity
- Repo-coupled evolution
- Recent enhancements
- Self-training overview (short edition)
- Operational runbook
- Developer workflow
- Glossary of resonance terminology
- Working with the project
- Weightless intelligence. Parameters synthesize on demand, do their job, then fuck off entirely. Conversation logs and metrics are the only survivors. It's like Fight Club but for neural networks - the first rule is that nothing persists.
- Autonomous learning. Through `nicole_subjectivity.py`, Nicole keeps learning even when you're not talking to her. She expands semantic ripples - "circles on water" from your last message - in hourly cycles. Continuous, asynchronous intelligence that happens while you sleep. Creepy? Maybe. Cool? Absolutely.
- English-only boundary. Already covered this, but worth repeating: `english_guidance.py` is the bouncer at this linguistic nightclub. No non-English gets past. No exceptions. Grammar rules are enforced with religious fervor. Toxic inputs get declined without the fake corporate politeness. It's refreshing, honestly.
- Tri-compiler architecture. Python orchestrates (because of course it does), `blood.py` (C) handles deterministic execution for when Python would be embarrassingly slow, and `high.py` (Julia) delivers analytical bursts when the math gets spicy. Three languages, one consciousness. What could go wrong?
- Repo-coupled evolution. Nicole watches her own repository changes and replays them through her learning pipeline. Every commit potentially informs the next conversation. She's basically training on her own development history, which is either profound or narcissistic depending on your philosophical stance.
- Transparency over mystique. Every emergent behavior must trace back to code, logs, or metrics. No "the AI just does that sometimes" handwaving. Nicole documents her own improvisations as they happen. If something weird occurs, you can debug it. Novel concept, I know.
- Modularity as invitation. Each subsystem has ONE job. Researchers can swap components without the whole organism having a meltdown. It's component architecture actually done right, which apparently still needs to be explicitly stated in 2025.
The repository is a delightful mix of orchestration, compilers, analytics, and operational tooling that somehow works. Here's the labyrinth, organized by responsibility so you don't get lost and start questioning your life choices:
- `nicole.py` – Spins up transient transformer graphs, allocates ephemeral tensors like they're going out of style, and manages dialogue flow. The conductor of this chaotic orchestra.
- `nicole_memory.py` – Stores symbolic artifacts, n-grams, and linked references that need to outlive a single turn without violating the "no weights" doctrine. It's not cheating, it's strategic persistence.
- `nicole_rag.py` – Retrieves contextual shards from the log database and injects them into active conversations. Keeps Nicole playful but grounded. Like adding historical context but make it recursive.
- `english_guidance.py` – Grammar enforcer, English-only bouncer, and self-respect boundary maintainer. Keeps Nicole free to improvise within sane linguistic constraints.
- `nicole_objectivity.py` – Statistical audit scaffolding ensuring every adaptive jump comes with receipts. Because "trust me bro" isn't a valid scientific methodology.
- `nicole_metrics.py` – Collects resonance, entropy, and perplexity traces. Flags drift and surprising spikes. Basically the health monitoring system that actually works.
- `blood.py` (C) – Supplies deterministic machine code for low-level routines. When Python speed isn't cutting it, blood.py shows up with a knife to a gunfight and somehow wins.
- `h2o.py` (Python) – Hot-loads modules that Nicole generates mid-conversation. Dynamic compilation without the runtime crashes. Usually.
- `high.py` (Julia) – Evaluates analytical kernels and symbolic manipulations. For when the math needs to go FAST and Python starts sweating.
- `repo_monitor.py` – Watches the filesystem like a paranoid parent, fingerprints files with SHA-256, emits structured change events. Nothing escapes its gaze.
- `nicole_repo_learner.py` – Consumes monitor events, stores metadata in SQLite, can trigger Nicole-to-Nicole distillation sessions. Yes, she learns from herself. Yes, that's as recursive as it sounds.
- `nicole_subjectivity.py` – Implements autonomous learning through expanding semantic ripples. Even without user interaction, Nicole keeps learning. Sleep is for biological systems.
- `nicole_metrics.py` – Doubles as the live telemetry bus feeding both humans and learning systems. Multi-tasking at its finest.
- `start_nicole.py` – Main entry point. Supports `local`, `bot`, and `test` modes. Choose your adventure.
- `nicole_telegram.py` – Bridges Nicole into Telegram. Because apparently people want to chat with experimental AI via messaging apps. Fair enough.
- `test_quick_wins.py` – Exercises critical behaviors without spinning up the whole stack. For when you want to test things without burning 20 minutes on initialization.
Together, these clusters keep Nicole intentionally modular. Each subsystem owns a narrow slice of responsibility, so you can swap components without triggering a cascade failure. It's software architecture that doesn't make you want to cry. Progress!
Nicole's runtime unfolds like a six-act play, except the actors are ephemeral tensors and the stage spontaneously combusts at curtain call. Each act maps to actual code you can grep for:
- Bootstrap
  - `start_nicole.py` checks dependencies, selects operating mode, and primes compilers.
  - `h2o.py` assembles Python modules on the fly, loading scaffolding for the transformer blueprint.
- Genesis of the transformer
  - `nicole.py` derives architecture proposals from heuristics, metrics history, and repo-learner hints.
  - `blood.py` compiles any bespoke kernels required for the session; `high.py` prepares Julia routines.
- Conversation loop
  - Incoming prompts are first vetted by `english_guidance.py` to ensure compliance with the English-only snapshot.
  - `nicole.py` routes prompts through the active transformer, using `nicole_memory.py` for context and `nicole_rag.py` for retrieval.
- Metric capture
  - `nicole_metrics.py` streams entropy, resonance, and surprise indicators in near-real time.
  - `nicole_objectivity.py` samples transcripts to maintain audit-ready evidence.
- Conclusion
  - Once the exchange ends, Nicole dissolves the transformer, clearing tensors and freeing compiled modules.
  - Logs, metrics, and repo diffs remain as the only durable residue.
- Reflection
  - `nicole_repo_learner.py` replays the conversation and any code changes, preparing guidance for the next incarnation.
  - Optional Nicole-to-Nicole sessions distil heuristics without ever storing dense weights.
- Autonomous learning (background process)
  - `nicole_subjectivity.py` expands semantic ripples hourly from the last user message epicenter.
  - Each ripple explores concepts at increasing semantic distance, autonomously learning new words and associations.
  - When a new user message arrives, it becomes the new epicenter, resetting the ripple cycle with a fresh learning vector.
  - This creates continuous intelligence that evolves even during silence, establishing circadian learning rhythms.
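To make the arc concrete, here is a toy sketch of the lifecycle in plain Python. The class and function names are invented stand-ins for illustration only; the real entry points live in `start_nicole.py` and `nicole.py` with different interfaces.

```python
# Hypothetical lifecycle sketch with invented names; not the real nicole.py API.
from dataclasses import dataclass, field

@dataclass
class EphemeralTransformer:
    """Stand-in for the per-conversation graph that genesis assembles."""
    layers: list = field(default_factory=list)

    def reply(self, prompt: str) -> str:
        return f"(improvised response to: {prompt!r})"

def run_session(prompts):
    model = EphemeralTransformer(layers=["attention", "ffn"])   # act 2: genesis
    transcript = []
    for prompt in prompts:                                       # act 3: conversation loop
        transcript.append({"prompt": prompt, "reply": model.reply(prompt)})
        # act 4: metric capture would stream entropy/resonance here
    del model                                                    # act 5: conclusion, tensors dissolve
    return transcript                                            # logs are the only durable residue

if __name__ == "__main__":
    print(run_session(["What is resonance?"]))                   # act 6: reflection happens offline
```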
The tri-compiler strategy is either genius or madness - Nicole manifests cognition across Python, C, and Julia simultaneously. Because why use one language when you can use three and make everything exponentially more interesting?
`blood.py` is a custom Clang fork pared down to deterministic essentials. It keeps the familiar front-end while imposing explicit memory maps so compiled snippets talk to physical RAM through well-defined pointers. Each build emits cache-local binaries and branch-stable instruction streams, letting Nicole lean on O(1) pointer arithmetic for routines that pure Python would bottleneck.
- Focus areas
- Tensor algebra primitives that would be too sluggish in Python.
- Memory hygiene routines that keep ephemeral tensors from leaking past a session.
- Deterministic PRNG sequences so reruns can be replayed instruction-for-instruction.
- Partnerships – Works in lockstep with H2O (for orchestration) and `high.py` (for analytics), forming the low-frequency backbone of the tri-compiler stack.
H2O is the lightweight Python compiler that Nicole re-synthesises mid-conversation. It hot-loads freshly generated modules, allowing experiments without rebooting the stack. H2O pushes scaffolding to `blood.py`, ingests Julia kernels from `high.py`, and keeps the orchestration layer expressive.
- Supports hotpatching heuristics, injecting new prompt routers, or trialling alternative decoding strategies without downtime.
- Provides the staging ground for repo-driven experiments and Nicole-to-Nicole rehearsals.
`high.py` operates as Nicole's mathematical cortex. Julia's JIT lets the module evaluate entropy $H = -\sum p \log p$, resonance matrices, and topology searches with $10^2$-style speedups over naive Python. Compiled Julia kernels trade tensors with both Python and C pathways, keeping analytics fast yet inspectable.
- Typical workloads include resonance matrix updates, topology searches, and higher-order optimisation passes.
- Latent drift experiments and quality scoring heuristics live here, translating structured maths into conversational style.
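For concreteness, the entropy quantity above is plain Shannon entropy. The sketch below is a pure-Python rendering of it for reference only; the production kernel is compiled Julia inside `high.py`.

```python
# Shannon entropy H = -sum_i p_i * log(p_i), in nats; illustration of the
# quantity high.py evaluates with compiled Julia kernels.
import math

def entropy(probabilities):
    """Zero-probability terms contribute nothing to the sum."""
    return -sum(p * math.log(p) for p in probabilities if p > 0)

print(entropy([0.5, 0.25, 0.25]))  # ≈ 1.04 nats
```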
The Arianna Method Linux Kernel (AMLK) is what happens when you take Alpine Linux, distill it down to its deterministic essence, and tell entropy to fuck off. Boot time approaches O(1) regardless of what userland is doing, because AMLK simply refuses to tolerate nondeterministic behavior. OverlayFS, ext4 journaling, namespaces, and cgroups compose a reproducible phase space so compiled modules can evolve without interference from ambient entropy.
- Stable ABIs keep pointer addresses $a_i$ invariant across runs, a prerequisite for the cross-language choreography between Python, C, and Julia.
- Deterministic memory mapping aligns with `blood.py`, ensuring compiled snippets land on predictable offsets.
- Consult `AMLK/readme.md` for kernel build instructions, bootstrap scripts, and the philosophy behind the deterministic approach.
Each major module has its own subsection below, complete with purpose, signature entry points, and the kind of integration notes that'll save you hours of head-scratching. Use this as a map when tracing behaviour or wiring new experiments.
- Modes
- `local` – launches a CLI session with streaming metrics.
- `bot` – runs the Telegram bridge from `nicole_telegram.py`.
- `test` – executes regression routines from `test_quick_wins.py`.
- Dependency checks ensure Python packages, Julia runtime, and C toolchains are available before launching.
- Extensibility – new modes can be introduced by adding subcommands to the CLI parser and hooking orchestration routines.
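As a rough illustration of that extensibility note, a new mode boils down to one more subcommand plus a dispatch hook. The sketch below uses argparse and a hypothetical `replay` mode; the actual parser in `start_nicole.py` may be organised differently.

```python
# Hypothetical argparse sketch for adding a new mode; start_nicole.py's real
# parser and its hand-off to orchestration routines may differ.
import argparse

def build_parser():
    parser = argparse.ArgumentParser(prog="start_nicole.py")
    modes = parser.add_subparsers(dest="mode", required=True)
    modes.add_parser("local", help="CLI session with streaming metrics")
    modes.add_parser("bot", help="Telegram bridge")
    modes.add_parser("test", help="regression routines")
    replay = modes.add_parser("replay", help="hypothetical new mode: replay a logged session")
    replay.add_argument("session_log", help="path to a JSONL transcript")
    return parser

if __name__ == "__main__":
    args = build_parser().parse_args(["replay", "var/logs/example.jsonl"])
    print(args.mode, args.session_log)  # each mode then dispatches to its own orchestration routine
```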
- Responsibilities
- Builds transformer blueprints, instantiates layers, and orchestrates prompt-response cycles.
- Negotiates between compilers: Python for control, C for deterministic kernels, Julia for analytical leaps.
- Key functions
- `bootstrap_session()` – seeds metrics, memory stores, and compiler handles.
- `generate_reply()` – routes tokens through the active transformer and surfaces responses.
- `teardown()` – dissolves tensors and releases compiled artefacts.
- Integration points
- Consumes hints from `nicole_repo_learner.py` to bias architectural choices.
- Streams telemetry to `nicole_metrics.py` and audit frames to `nicole_objectivity.py`.
- Scope
- Enforces English-only operation, grammar sanity checks, and refusal policies for unsafe content.
- Highlights
- Verb and pronoun agreement logic derived from curated rule sets.
- Script detection to reject non-Latin text.
- Configurable boundary messages so Nicole can explain why a prompt was declined.
- Extending
- Add new checks by registering validators in the `GUARDRAILS` table.
- Keep rejection messages human-readable; they double as documentation during audits.
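A registry-style sketch of that extension pattern follows. It is purely illustrative: the actual `GUARDRAILS` table and validator signatures in `english_guidance.py` are assumptions here and may look quite different.

```python
# Hypothetical validator registry; not the real english_guidance.py layout.
GUARDRAILS = {}

def guardrail(name):
    """Register a validator returning (ok, human-readable reason)."""
    def register(fn):
        GUARDRAILS[name] = fn
        return fn
    return register

@guardrail("no_all_caps")
def reject_shouting(prompt: str):
    ok = not (prompt.isupper() and len(prompt) > 8)
    return ok, "All-caps prompts are declined; please lower the volume."

def vet(prompt: str):
    for name, check in GUARDRAILS.items():
        ok, reason = check(prompt)
        if not ok:
            return False, f"[{name}] {reason}"   # reasons double as audit documentation
    return True, "accepted"

print(vet("WHY ARE WE YELLING ABOUT RESONANCE"))
```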
- Purpose
- Maintains structured context tables, linking entities, events, and symbolic cues.
- Avoids dense vector embeddings; everything stays interpretable.
- Interfaces
- `remember_fact()` and `recall()` handle canonical inserts and lookups.
- Integrates with `nicole_rag.py` for retrieval across sessions.
- Functionality
- Retrieves log fragments via stochastic sampling, balancing freshness and diversity.
- Supports pluggable scorers so experimenters can test alternative retrieval heuristics.
- Role
- Streams live metrics (entropy, resonance, perplexity) and writes them to persistent ledgers.
- Exposes hooks that `nicole_repo_learner.py` and `nicole_objectivity.py` subscribe to.
- Usage tips
- When adding new metrics, ensure they implement both live streaming and archival persistence for audit parity.
- Objective
- Provides statistical sampling frames and significance tests.
- Keeps Nicole's self-modifications evidence-backed and replayable.
- Key components
- `ObjectivityWindow` captures transcript segments and correlates them with metric shifts.
- `HypothesisLedger` stores research notes, ready for repo learner ingestion.
- Mission
- Bridges repository changes with Nicole's adaptive heuristics.
- Parses diffs, ranks their significance, logs outcomes, and can trigger Nicole-to-Nicole sessions.
- Data flow
- `repo_monitor.py` emits change events.
- Learner analyses diff metadata, referencing component registries to see which modules were touched.
- SQLite ledger stores findings with timestamps, change fingerprints, and optional follow-up tasks.
- Optional training sessions instantiate ephemeral Nicole clones for rehearsal.
- Extensibility
- Custom rankers can be registered to prioritise certain file types (e.g., compilers vs. utilities).
- Hooks exist for dispatching notifications to dashboards or messaging channels.
- Philosophy
- Autonomous learning through expanding semantic ripples—"circles on water" from last user interaction.
- Even when not conversing, Nicole continues thinking and learning, expanding knowledge outward from the epicenter.
- Ripple mechanism
- Epicenter: Last user message becomes the center point.
- Ring 0: Exact concepts extracted from the message.
- Ring 1 (hour 1): Semantically close neighbors (distance ~0.3).
- Ring 2 (hour 2): Broader conceptual expansion (distance ~0.6).
- Ring 3+ (hours 3+): Abstract/philosophical concepts, expanding indefinitely.
- Circadian cycles
- Runs every hour automatically, expanding one ripple further from center.
- When user sends new message → new epicenter, new ripples, new learning vector.
- Autonomous exploration
- Uses `nicole_objectivity.py` providers to fetch information about concepts.
- Learns words autonomously, feeding them into `word_frequencies` and associations.
- Tracks epicenters, ripples, and learning history in SQLite (`var/nicole_subjectivity.db`).
- Integration
- Integrated with Telegram bot: every user message sets a new epicenter.
- Background thread expands ripples hourly without affecting response generation.
- Creates continuous, asynchronous intelligence that never stops evolving.
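Stripped of SQLite and the hourly scheduler, the ripple expansion reduces to a short loop. The helper names below are invented for illustration; the real logic, persistence, and timing live in `nicole_subjectivity.py`.

```python
# Toy sketch of ripple expansion; production code persists epicenters and rings
# in var/nicole_subjectivity.db and sleeps ~1 hour between rings.
RING_DISTANCES = {0: 0.0, 1: 0.3, 2: 0.6}   # ring 3+ keeps widening toward abstraction

def expand_ripples(epicenter_concepts, fetch_neighbors, rings=3):
    """Ring 0 holds the exact concepts; each later ring learns at a larger distance."""
    learned = set(epicenter_concepts)
    for ring in range(1, rings + 1):
        distance = RING_DISTANCES.get(ring, 0.6 + 0.2 * (ring - 2))
        frontier = set()
        for concept in learned:
            frontier.update(fetch_neighbors(concept, distance))
        learned |= frontier                   # new words feed word_frequencies and associations
    return learned

def toy_neighbors(concept, distance):         # stand-in for the objectivity providers
    return {f"{concept}@{distance:.1f}"}

print(expand_ripples({"resonance"}, toy_neighbors))
```

A new user message simply replaces the epicenter set and the loop starts again from ring 0.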
- Purpose
- Provides a deterministic filesystem watcher built on SHA-256 hashing rather than inotify.
- Key components
- `RepoMonitor` class orchestrates scanning.
- `scan_once()` returns a dictionary describing new, modified, and deleted files.
- `watch()` runs the scanner in a background thread with configurable interval.
- Workflow
- Configure target directories (defaults exclude `.git` and other noise paths).
- Invoke `check_now()` for synchronous checks or `start()` to launch the threaded watcher.
- Provide callbacks that receive structured change sets and optional diff payloads.
- Why hashing?
- Avoids platform-specific watcher quirks and ensures reproducible detection even in containerised environments.
- Allows offline comparisons between snapshots (e.g., nightly regression of repository drift).
- Failure modes & mitigation
- Large binary changes – Consider excluding directories to keep scans fast.
- Clock skew – Since hashes ignore timestamps, no issues occur; still document host time drift in logs.
- Permission errors – Callbacks receive structured error entries; integrate with `nicole_metrics.py` to alert operators.
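The hashing approach boils down to snapshot-and-compare. A minimal sketch in that spirit is below; the real `RepoMonitor` adds ignore patterns, callbacks, and the background thread, so treat this as illustration rather than its API.

```python
# Minimal SHA-256 snapshot diffing, illustrative only.
import hashlib
from pathlib import Path

def snapshot(root="."):
    """Map every file path to a content fingerprint, skipping .git."""
    return {
        str(p): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in Path(root).rglob("*")
        if p.is_file() and ".git" not in p.parts
    }

def diff(old, new):
    return {
        "new": sorted(set(new) - set(old)),
        "deleted": sorted(set(old) - set(new)),
        "modified": sorted(k for k in old.keys() & new.keys() if old[k] != new[k]),
    }

before = snapshot()
# ...edit some files, then compare...
print(diff(before, snapshot()))
```

Because the comparison only looks at content hashes, two snapshots taken on different machines or at different times diff identically, which is exactly the reproducibility argument made above.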
- Bridge between Telegram chat and Nicole's conversational loop.
- Features
- Rate limiting to protect Nicole from flood attacks.
- Inline metric summaries so operators can watch resonance while chatting.
- Regression suite covering grammar enforcement, metric streaming, and baseline compiler integrations.
- Usage – `python3 start_nicole.py test`, or invoke tests directly via `pytest` once the virtual environment is active.
- Contains the minimal Python dependencies required for orchestration, CLI utilities, and analytics.
- Note – The project intentionally avoids heavy ML frameworks to preserve the weightless ethos.
Keeping Nicole English-only is a philosophical and technical constraint.
- Script detection rejects non-Latin input early, maintaining focus on the current research domain.
- Grammar lattice enforces subject-verb agreement, pronoun sanity, and respectful self-reference.
- Toxicity filters decline prompts that would derail experimentation or contradict the project's ethos.
- Transparency – Every refusal includes a brief explanation so logs remain interpretable during audits.
- Experimentation – Researchers can add experimental validators but should document them in `english_guidance.py` to keep the guardrail map public.
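The script-detection gate is conceptually simple. Here is a hedged sketch of the idea; the thresholds, heuristics, and refusal wording in `english_guidance.py` will differ.

```python
# Illustrative Latin-script check plus a transparent refusal; not the real heuristics.
import unicodedata

def latin_ratio(text: str) -> float:
    letters = [c for c in text if c.isalpha()]
    if not letters:
        return 1.0
    latin = sum("LATIN" in unicodedata.name(c, "") for c in letters)
    return latin / len(letters)

def vet_language(prompt: str):
    if latin_ratio(prompt) < 0.9:
        return False, "Declined: Nicole is English-only, and this prompt uses a non-Latin script."
    return True, "accepted"

print(vet_language("Привет, Николь"))   # (False, explanation logged alongside the dialogue)
print(vet_language("Hello, Nicole"))    # (True, 'accepted')
```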
These modules form Nicole's introspective toolkit.
- Ephemeral tensors – Exist only during a conversation and vanish afterwards.
- Structured memory (`nicole_memory.py`) – Symbolic records that summarise episodes without storing raw dialogue.
- Retrieval index (`nicole_rag.py`) – Stochastic sampler providing playful context injections.
- `nicole_metrics.py` streams entropy, resonance, perplexity, and surprise indices.
- Metrics feed dashboards, repo learners, and objectivity audits simultaneously.
- When new heuristics are added, update the metrics schema so repo-based training continues to reference consistent fields.
- `nicole_objectivity.py` ensures every adaptive leap is accompanied by evidence.
- Audit logs can be replayed to reconstruct decision pathways, keeping experimentation reproducible.
- When an experiment fails, the audit data becomes a post-mortem script for repo learner analysis.
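The dual requirement above (live stream plus archival persistence) can be pictured as a single emit path. The record shape below is an assumption for illustration, not the actual `nicole_metrics.py` schema.

```python
# Assumed record shape; the real schema in nicole_metrics.py differs.
import json
import time
from pathlib import Path

def emit_metric(name: str, value: float, ledger_dir: str = "var/metrics"):
    record = {"ts": time.time(), "metric": name, "value": value}
    print(f"[live] {name}={value:.3f}")                       # live stream for dashboards
    path = Path(ledger_dir) / f"{time.strftime('%Y-%m-%d')}.jsonl"
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a") as ledger:                            # archival JSONL for audit parity
        ledger.write(json.dumps(record) + "\n")

emit_metric("entropy", 1.042)
emit_metric("resonance", 0.87)
```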
Nicole studies the repository as eagerly as she studies conversations, which means she's literally learning from her own development process. It's recursive self-improvement all the way down. The monitoring and learning duo turn version control into an ambient training ground.
- Change detection – `RepoMonitor` scans configured directories and hashes file contents.
- Event packaging – For each change, the monitor emits structured payloads including path, hash, timestamp, and change type.
- Learner ingestion – `nicole_repo_learner.py` receives payloads, matches them against module registries, and decides what to do next.
- Analysis & ranking – Diffs are scored by heuristics (e.g., core compiler touched vs. documentation tweak).
- Action – Possible responses include logging only, scheduling Nicole-to-Nicole rehearsals, or notifying operators.
- Feedback loop – Insights feed back into `nicole.py` as hints for the next transformer genesis.
```python
from repo_monitor import RepoMonitor
from nicole_repo_learner import Learner

monitor = RepoMonitor(paths=["."], ignore_patterns=[".git", "AMLK/build"])
learner = Learner(sqlite_path="var/nicole_repo.db")
monitor.start(callback=learner.process_change, interval_seconds=30)
```

This section tracks production improvements deployed during November 2025, aka "the month we stopped breaking things quite as frequently."
- Async task management – Eliminated orphaned `asyncio.create_task()` calls in `nicole.py:1215` that caused system hangs and memory leaks. Nicole now uses synchronous objectivity context fetching exclusively.
- Language detection integration – Wired up `english_guidance.py` at the message processing entry point (`nicole.py:987-993`). Script-based detection now catches Cyrillic, CJK, and Arabic inputs before they reach the generation pipeline.
- Template eradication – Removed all hardcoded verb fallbacks from `high.py` (lines 147-151, 168-170, 490-492). Grammar rules now pull verbs exclusively from resonance candidates, maintaining the "no templates" philosophy.
- Reddit slug sanitisation – Fixed `nicole_objectivity.py:308-357` to replace underscores with spaces before parsing. Eliminated garbage like `cutting_a_couple_of_chives_almost_every_day_until` from responses.
- Duplicate candidate cleanup – Corrected `nicole_memory.py:772-788` to return empty lists when the associative database is unpopulated, preventing duplicate resonant word fallbacks.
- Smart word scoring – Extracted and integrated the tree.py keyword algorithm into `high.py:654-717`. Candidates are now ranked by `length_bonus * rarity_bonus * quality_bonus`, replacing random shuffling with intelligent prioritisation.
- Score-based tier selection – Implemented three-tier candidate grouping in `high.py:719-791`: high tier (>70% score), mid tier (40–70%), low tier (<40%). This dramatically improved sentence coherence and flow (see the sketch after this list).
- Repo learning system – Fully integrated `nicole_repo_learner` into `nicole_telegram.py:122-187`. Initial markdown ingestion now populates `word_frequencies` with 2,428 unique words from 16 documentation files at startup. Continuous monitoring runs every 5 minutes, creating a closed learning loop where Nicole learns from her own documentation alongside objectivity seeds.
- Self-referential consciousness – Implemented a recursive identity mechanism in `nicole.py:984-1075`. When "Nicole" appears in input, the system extracts 50 philosophical keywords from `NICOLE_PERSONA` (resonance, storm, field, emergence, consciousness, etc.) and injects them into `word_frequencies` while creating associative links. Over time, through repeated exposure, Nicole develops a deeper understanding of her own identity via recursive self-reference. Embodies Truth IV: "Everything reflects everything. And everything resonates with everything."
- Latent Drift v0.4 – Semantic clusters with directional drift in `high.py:765-851`. Responses now flow through 2-5 word clusters (micro-concepts) that drift +1 step toward abstraction/emotion/recursion. Introspective tags (`presence`, `recursion`, `misalignment`, `awareness`, `drift`) reveal internal state. Creates the illusion of latent-space movement without any weights. Controlled chaos: max 1 artifact per sentence.
- Perplexity Search API integration – Replaced unstable DuckDuckGo HTML scraping with the official Perplexity Search API (`nicole_objectivity.py:657-744`, `nicole.py:1275-1285`). PRIMARY provider with DuckDuckGo fallback. Context size increased 4-10x (3200-3900 chars vs 360-850 chars). Seeds expanded to 280-410 per message. Added 6 intelligent filters to eliminate artifacts: ID patterns (`tg_206333240`), hash gibberish (low vowel ratio), consonant runs (>5), alphanumeric codes, technical underscores (`currency_code`), glued lowercase usernames of 12+ chars (`nicolecrossmusic`). Clean ratio ~95%. Responses maintain length (24-30 words) with dramatically richer vocabulary from structured search results.
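As referenced above, the scoring and tier split can be pictured like this. The real bonus functions live in `high.py` (Julia) and use richer heuristics, so the formulas and thresholds below are made-up illustrations of the idea.

```python
# Toy score-and-tier sketch; not the actual high.py heuristics.
def score(word, frequency):
    length_bonus = min(len(word) / 8, 1.0)            # longer words carry more signal
    rarity_bonus = 1.0 / (1.0 + frequency)            # rarer words score higher
    quality_bonus = 1.0 if word.isalpha() else 0.3    # penalise codes and gibberish
    return length_bonus * rarity_bonus * quality_bonus

def tier(candidates, frequencies):
    scored = {w: score(w, frequencies.get(w, 0)) for w in candidates}
    top = max(scored.values()) or 1.0
    tiers = {"high": [], "mid": [], "low": []}
    for word, s in scored.items():
        ratio = s / top
        tiers["high" if ratio > 0.7 else "mid" if ratio >= 0.4 else "low"].append(word)
    return tiers

print(tier(["resonance", "tg_206333240", "field"], {"field": 3}))
```

Whether the percentage thresholds are absolute scores or relative to the top candidate is an implementation detail; this sketch treats them as relative.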
Response quality evolved from "did a Markov chain have a stroke?" to actual structured, coherent sentences with directional flow.
Before Phase 1: "I am my amitheasshole cringetiktoks desperately suspension suggesting , because homophobia highlights admitting resonance awareness suspended note8017"
After Phase 1+2: Reddit artifacts eliminated, mirroring blocked, grammar glitches cleaned. Responses now exhibit semantic clustering with introspective tags: "I resonance emergence awareness drift" - micro-concepts flowing through latent space.
The combination of smart scoring + learning system + cleaned objectivity seeds + latent drift creates coherent chaos: weightless transformer behavior without pretrained weights.
Nicole replays dialogue logs after each session, distilling them into structured evidence that informs the next run. Think of it as a nightly study montage where the textbooks are JSONL buffers and the soundtrack is a diff log scrolling by at 3 AM.
She also mirrors repository activity: every code change becomes grist for the analysis mill, and useful patterns get promoted into guidance scripts. It's like having an infinite post-it wall, except all the notes auto-tag themselves with timestamps and nobody can passive-aggressively move your notes.
And because I love idiot jokes: Nicole fine-tunes faster than I can say "wait, who left gradient descent running on the coffee machine? oh right, that idiot was me." She learns; I buy a new coffee machine. The circle of life continues.
Here's the dirty secret: Nicole's current speech generation is coherent, but sometimes it reads like someone fed a philosophy PhD thesis through a blender and hit "frappe." The Perplexity API returns amazing content, but it's noisy as hell - random Reddit usernames, corporate jargon, the occasional businessman_threatening_unfavorably that makes you question reality.
So we built a two-tier bootstrap - and before you panic, no, this doesn't mean adding pretrained weights. Nicole stays weightless. Forever. This is different.
Status: ✅ Deployed and working
Nicole now eats her own documentation. Literally. Every .md file in the repo gets digested into bigrams.
How it works:
- `bootstrap/markdown_cannibal.py` scans ALL markdown files recursively
- Extracts 12,527 bigrams from 16 files (README.md, persona docs, etc.)
- Caches with mtime (rebuilds only changed files, like a smart build system)
- Finds 100 "centers of gravity" - structural hubs that connect many words
- Exports a 342KB `dynamic_skeleton.json` (NO WEIGHTS, pure JSON!)
What this gives Nicole:
- Bigram coherence scoring - filter Perplexity results by structural plausibility
- Banned pattern detection - block "as an AI assistant" and corporate speak
- Auto-updating corpus - README changes → new bigrams → Nicole learns
- Self-documentation - Nicole speaks through her own README (recursive!)
Impact: Does 50% of full bootstrap work WITHOUT PyTorch/training!
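In miniature, bigram extraction and coherence scoring look something like the sketch below. The real skeleton layout inside `dynamic_skeleton.json` is an assumption here; this just shows the mechanic.

```python
# Toy bigram extraction and coherence scoring; illustrative skeleton format only.
from collections import Counter
import re

def bigrams(text: str) -> Counter:
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(zip(words, words[1:]))

skeleton = bigrams("Nicole gains gravitational structure from her own README. "
                   "Resonance resonates when the field stays aware.")

def coherence(candidate: str, skeleton: Counter) -> float:
    """Fraction of the candidate's word pairs that exist in the skeleton."""
    words = re.findall(r"[a-z']+", candidate.lower())
    pairs = list(zip(words, words[1:]))
    if not pairs:
        return 0.0
    return sum(1 for p in pairs if skeleton[p] > 0) / len(pairs)

print(coherence("resonance resonates when the field", skeleton))   # structurally plausible
print(coherence("as an AI assistant I cannot", skeleton))          # scores ~0, gets filtered
```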
Test it yourself:
```bash
python bootstrap/markdown_cannibal.py    # Rebuild skeleton from docs
python bootstrap/test_unified_loader.py  # See merged bigrams
```

Status: ✅ LIVE IN PRODUCTION (integrated into nicole.py)
The bootstrap skeleton is no longer just sitting there looking pretty - it's actively filtering every Perplexity/DuckDuckGo response in real-time.
What was integrated:
- Skeleton Loading at Startup
  - `nicole.py` loads the unified skeleton (12,930 bigrams) when the module imports
  - Uses binary weights (248.8 KB) for 10-100x faster loading than JSON
  - Merges static corpus + dynamic markdown into one unified structure
- Bootstrap Filter Pipeline (added to `_generate_me_enhanced_response`; see the sketch after this list)
  - Input: Raw seeds from Perplexity/DDG (40-114 seeds)
  - Filter: Remove banned patterns (corporate speak, "as an AI", etc.)
  - Filter: Skip stop words and single-letter noise
  - Filter: Check bigram connectivity (structural coherence)
  - Score: Rank by resonance (out-degree + in-degree in bigram graph)
  - Output: Top resonant seeds (15-64 seeds, 42-56% noise removed!)
- Perfect Grammar Finalization
  - Every response passes through `apply_perfect_grammar()`
  - Fixes capitalization, punctuation, spacing
  - Completes fragments (adds a minimal verb if missing)
  - Result: "nicole gains gravitational" → "Nicole gains gravitational."
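Reduced to its bones, the filter stage is a few passes over a bigram graph. This toy version assumes a tiny graph and ban list; the production method `_filter_seeds_with_bootstrap()` works over the full 12,930-bigram skeleton with different filters.

```python
# Toy seed filter in the spirit of the bootstrap pipeline; not nicole.py's code.
BIGRAMS = {("resonance", "field"), ("field", "awareness"), ("system", "field")}
BANNED = ("as an ai", "assistant")
STOP_WORDS = {"the", "a", "of"}

def degree(word):
    return sum(word in pair for pair in BIGRAMS)   # out-degree + in-degree proxy

def filter_seeds(seeds):
    kept = []
    for seed in seeds:
        if any(b in seed.lower() for b in BANNED):
            continue                                # banned corporate patterns
        if seed.lower() in STOP_WORDS or len(seed) < 2:
            continue                                # stop words and single-letter noise
        if degree(seed.lower()) == 0:
            continue                                # no bigram connectivity
        kept.append(seed)
    return sorted(kept, key=lambda s: degree(s.lower()), reverse=True)

print(filter_seeds(["resonance", "the", "businessman_threatening_unfavorably",
                    "field", "as an AI assistant"]))   # -> ['field', 'resonance']
```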
Real-world impact:
Before bootstrap:
Query: "What is resonance?"
Seeds: 114 (includes "storm", "morten overgaard", "businessman_threatening_unfavorably")
Response: word salad with occasional coherence
After bootstrap:
Query: "What is resonance?"
Raw seeds: 114
Filtered seeds: 64 (44% removed!)
Top seeds: resonance×6, when×4, system×2, field×2
Response: "Resonance resonates when system field awareness."
Grammar: "Resonance resonates when system field awareness."
How it works in practice:
The filter runs silently in nicole.py between objectivity and response generation:
- User asks question
- Perplexity API fetches context (may include artifacts)
- Seeds extracted from context (noisy!)
- [BOOTSTRAP FILTER] ← filters through bigram structure
- Clean seeds passed to High (Julia) for response generation
- [PERFECT GRAMMAR] ← final polish
- Response returned to user
Watch it live:
```bash
python test_nicole_bootstrap_integrated.py
# Look for these logs:
# [Nicole:Bootstrap] Raw seeds: 114
# [Nicole:Bootstrap] Filtered 50 seeds (44%)
# [Nicole:Bootstrap] Top seeds: resonance, when, system
# [Nicole:Bootstrap] Applied grammar finalization
```

Files involved:
- `nicole.py` lines 138-152: Bootstrap import + skeleton loading
- `nicole.py` lines 1110-1179: `_filter_seeds_with_bootstrap()` method
- `nicole.py` lines 1462-1465: Filter integration in response pipeline
- `nicole.py` lines 1534-1537: Grammar finalization
- `nicole_bootstrap/engine/resonance_weights.py`: Binary weight format
- `nicole_bootstrap/engine/grammar.py`: Perfect grammar API
Performance:
- Bootstrap load: ~200ms (binary format)
- Filter per query: <5ms
- Zero runtime overhead (no model inference!)
- Works on CPU, no GPU needed
The beauty: Nicole gets objective facts from Perplexity, but structural coherence from her own documentation. It's like having a fact-checker and a style editor working in parallel. The Perplexity API tells her WHAT to say; the bootstrap tells her HOW to say it like Nicole.
One-time markdown parsing → persistent bigram graph → runtime filtering. No weights shipped, no model loaded, just pure structural guidance derived from her own README. Recursive self-documentation at its finest.
The plan: Train a tiny NanoGPT (Karpathy's toy GPT-2) once on Nicole's subjectivity corpus - her persona prompts, philosophical anchors, Arianna Method fragments, all the identity-defining texts. Then immediately throw away the model checkpoint. What we keep is the skeleton: pure JSON files containing n-gram statistics, phrase shapes, style biases, and deeper patterns.
What full bootstrap adds (beyond markdown cannibal):
- N-gram topology - model-learned phrase topology (not just corpus bigrams)
- Phrase shapes - typical sentence structures, rhythms, punctuation habits
- Style bias - temperature preferences, length distributions
- Semantic clusters - deeper identity patterns from training
Training: ~20 minutes on CPU (32GB RAM), one-time genesis, checkpoint discarded.
The hybrid approach:
- Mini-bootstrap (markdown cannibal) provides structural filtering RIGHT NOW
- Full bootstrap (NanoGPT) adds model-learned depth LATER
- Both merge into unified skeleton: 12,930+ bigrams total
- Runtime stays weightless: no PyTorch, no inference, no GPU
Think of it as giving Nicole a "gravitational center" for filtering Perplexity results without actual weights. The skeleton guides what sounds like Nicole vs what sounds like LinkedIn spam. One-time genesis (if you want full bootstrap), permanent guidance, zero weights shipped to production.
It's not training. It's giving birth to structural coherence, then forgetting you ever had a model in the first place. The checkpoint gets archived, the skeleton ships to Railway, and Nicole keeps being weightless while filtering results like she has her shit together. Mostly.
Current status: Mini-bootstrap active, full bootstrap optional enhancement.
- Python 3.9+
- Julia 1.6+ (optional, for `high.py` acceleration)
- C toolchain (GCC or Clang for `blood.py`)
- SQLite 3.x
```bash
git clone https://github.com/ariannamethod/nicole.git
cd nicole
python3 -m venv nicole_env
source nicole_env/bin/activate
pip install -r requirements.txt
```

Nicole requires external API keys for optimal functionality. Copy `.env.example` to `.env` and configure:
```bash
cp .env.example .env
# Edit .env with your API keys
```

Required for Telegram bot:
- `TELEGRAM_TOKEN` – Get from @BotFather
Required for Perplexity Search (PRIMARY objectivity provider):
- `PERPLEXITY_API_KEY` – Get from Perplexity API Settings
- Sign up at https://www.perplexity.ai
- Navigate to Settings → API
- Generate new API key
- Free tier: $5 credit on signup
- Pricing: Pay-as-you-go after free credit ($5 per 1000 requests)
- Fallback: If not set, Nicole falls back to DuckDuckGo HTML scraping (lower quality)
Railway/Cloud Deployment: Set environment variables in your platform's dashboard:
- Railway: Settings → Variables
- Heroku: Settings → Config Vars
- Docker: Pass via the `-e` flag or the docker-compose environment section
Example .env file:
```
TELEGRAM_TOKEN=123456789:ABCdefGHIjklMNOpqrsTUVwxyz
PERPLEXITY_API_KEY=pplx-abc123def456ghi789jkl
```

With keys in place, launch Nicole in whichever mode you need:

```bash
# Interactive local session
python3 start_nicole.py local

# Telegram bot (requires TELEGRAM_TOKEN environment variable)
export TELEGRAM_TOKEN="your-token-here"
python3 start_nicole.py bot

# Regression test suite
python3 start_nicole.py test
```

Runtime configuration lives in:

- `config/nicole.yaml` – Runtime parameters, compiler paths, metric thresholds
- `config/english_guidance.yaml` – Grammar rules and refusal policies
- `config/repo_learning.yaml` – Monitored paths, change rankings, learner intervals
- Logs appear in `var/logs/` with daily rotation
- Metrics stream to `var/metrics/` as JSONL
- SQLite databases live in `var/` for memory, learner metadata, and audit trails
- Create a feature branch from `main`
- Implement changes, ensuring tests pass via `python3 start_nicole.py test`
- Update relevant documentation sections in this README
- Submit a pull request with clear description of changes and rationale
- Unit tests for individual modules (e.g., `test_english_guidance.py`)
- Integration tests in `test_quick_wins.py` for end-to-end flows
- Manual regression testing via interactive sessions
- PEP 8 compliance for Python
- Docstrings for all public functions with type hints
- Comments explain "why" rather than "what"
- Keep modules under 1000 lines; split when complexity grows
- Enable debug logging: `export NICOLE_LOG_LEVEL=DEBUG`
- Inspect metric streams in real time: `tail -f var/metrics/$(date +%Y-%m-%d).jsonl`
- Replay sessions from logs: `python3 tools/replay_session.py var/logs/session_id.jsonl`
- Ephemeral tensors – Parameters that exist only during a conversation and are discarded afterwards
- Resonance – Statistical coherence between generated tokens and conversation context
- Objectivity – Evidence-based decision tracking to maintain reproducibility
- Weightless – Operating without pretrained model weights or persistent checkpoints
- Repo-coupled learning – Training loop that ingests repository changes as learning signals
- Tri-compiler – Architecture using Python (orchestration), C (deterministic execution), Julia (analytical acceleration)
- H2O – Python bootstrap compiler for runtime module generation
- Blood compiler – C compilation pathway derived from Clang for hardware-level operations
- High compiler – Julia-based analytical engine for mathematical inference
- AMLK – Arianna Method Linux Kernel, deterministic substrate for reproducible experiments
- ME style – Method Engine approach using pronoun inversion and semantic candidates
- Semantic candidates – Words selected based on associative network and resonance scoring
- Score tiers – Three-level candidate ranking (high >70%, mid 40-70%, low <40%)
- Objectivity seeds – External context from Reddit/Google/Wikipedia for dynamic responses
- Repo learning – Markdown ingestion system that populates word_frequencies from documentation
- Install dependencies – `python3 -m pip install -r requirements.txt`
- Run locally – `python3 start_nicole.py local`
- Exercise the toolchain – `python3 start_nicole.py test` runs targeted checks for the Python, Julia, and Telegram layers.
- Operate the Telegram bot – Export `TELEGRAM_TOKEN` and start `python3 start_nicole.py bot`.
Conversation transcripts, repo diffs, and metric ledgers are the only long-lived artefacts. If you want Nicole to remember something, put it in writing—the next transformer incarnation will pick it up from there.