Yet Another Memory Layer, inspired by Cognitive Science, designed for Cyber Waifu
See docs/ARCHITECTURE.md for implementation details.
Plast Mem is built around self-hosting and does not try to steer you towards a website with a 'Pricing' tab.
Written in Rust, it is packaged as a single binary (or Docker image) and requires only a connection to an LLM service (such as llama.cpp or Ollama) and a ParadeDB database.
Conversations flow continuously, but human memory segments them into discrete episodes. Plast Mem uses Event Segmentation Theory to detect natural boundaries—topic shifts, time gaps, or message accumulation—and creates episodic memories at these boundaries.
Messages are accumulated in a queue and processed in batches. A single LLM call segments conversations into coherent episodes, each with a title, summary, and surprise level.
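The boundary conditions above can be sketched as a simple queue check. This is an illustrative sketch, not Plast Mem's actual implementation: the struct shape and the thresholds (20 messages, a 30-minute gap) are assumptions; topic-shift detection is left to the LLM segmentation call and is omitted here.

```rust
use std::time::{Duration, SystemTime};

/// A queued chat message (illustrative shape).
struct QueuedMessage {
    text: String,
    at: SystemTime,
}

/// Decide whether the queue should be flushed into the segmentation
/// LLM call. Thresholds are illustrative, not Plast Mem's defaults.
fn should_segment(queue: &[QueuedMessage], now: SystemTime) -> bool {
    const MAX_BATCH: usize = 20; // message accumulation boundary
    const MAX_GAP: Duration = Duration::from_secs(30 * 60); // time-gap boundary

    if queue.len() >= MAX_BATCH {
        return true;
    }
    match queue.last() {
        Some(last) => now
            .duration_since(last.at)
            .map(|gap| gap >= MAX_GAP)
            .unwrap_or(false),
        None => false,
    }
}
```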
Plast Mem implements two complementary memory layers inspired by cognitive science:
Episodic Memory captures "what happened"—discrete conversation events with temporal boundaries. Each episode stores the original messages, an LLM-generated summary, and FSRS parameters for decay modeling.
Semantic Memory captures "what is known"—durable facts and behavioral guidelines extracted from episodes. Facts are categorized into 8 types (identity, preference, interest, personality, relationship, experience, goal, guideline) and use temporal validity instead of decay.
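The two layers can be pictured as data shapes. Field and type names below are illustrative (see docs/ARCHITECTURE.md for the real schema); what is grounded in the description above is the split itself: episodes carry messages, a summary, and FSRS parameters, while facts carry one of the 8 categories and temporal validity instead of decay.

```rust
use std::time::SystemTime;

/// Episodic layer: "what happened" (illustrative field names).
struct Episode {
    messages: Vec<String>,
    summary: String,
    surprise: f64,
    stability: f64,  // FSRS decay parameters
    difficulty: f64,
}

/// The 8 semantic fact categories.
enum FactKind {
    Identity,
    Preference,
    Interest,
    Personality,
    Relationship,
    Experience,
    Goal,
    Guideline,
}

/// Semantic layer: "what is known"; temporal validity, no decay.
struct Fact {
    kind: FactKind,
    content: String,
    valid_from: SystemTime,
    invalidated_at: Option<SystemTime>,
}

impl Fact {
    /// A fact stays valid until explicitly invalidated.
    fn is_valid(&self) -> bool {
        self.invalidated_at.is_none()
    }
}
```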
The Consolidation Pipeline (inspired by CLS theory) runs offline to extract semantic facts from unconsolidated episodes. When 3+ episodes accumulate or a flashbulb memory (surprise ≥ 0.85) occurs, an LLM processes the episodes against existing knowledge and performs new/reinforce/update/invalidate actions.
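The trigger condition is simple enough to state as code. The two thresholds (3 pending episodes, surprise ≥ 0.85) come from the description above; the function shape is an assumption for illustration.

```rust
/// CLS-style trigger: run consolidation once 3 or more episodes are
/// pending, or immediately when any pending episode is a flashbulb
/// memory (surprise >= 0.85). Input: surprise levels of the
/// unconsolidated episodes.
fn should_consolidate(pending_surprise: &[f64]) -> bool {
    pending_surprise.len() >= 3 || pending_surprise.iter().any(|&s| s >= 0.85)
}
```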
Memory retrieval combines multiple signals for relevance:
- BM25 full-text search on summaries and keywords
- Vector similarity via embeddings (cosine distance)
- Reciprocal Rank Fusion (RRF) to merge keyword and semantic scores
- FSRS retrievability re-ranking for episodic memories (decay modeling)
The search returns the most relevant memories from both episodic and semantic layers, formatted as markdown for LLM consumption.
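The RRF step merges the BM25 and vector rankings without needing to calibrate their incomparable scores: each document earns 1/(k + rank) from every list it appears in. A minimal sketch, assuming two ranked lists of string ids and the k = 60 constant from the original RRF paper:

```rust
use std::collections::HashMap;

/// Reciprocal Rank Fusion: merge two ranked lists of document ids.
/// Each document scores sum(1 / (k + rank)) over the lists it appears
/// in; ranks are 1-based. Returns ids sorted by fused score, best first.
fn rrf_fuse(bm25: &[&str], vector: &[&str], k: f64) -> Vec<(String, f64)> {
    let mut scores: HashMap<String, f64> = HashMap::new();
    for list in [bm25, vector] {
        for (i, id) in list.iter().enumerate() {
            *scores.entry((*id).to_string()).or_insert(0.0) += 1.0 / (k + (i + 1) as f64);
        }
    }
    let mut fused: Vec<(String, f64)> = scores.into_iter().collect();
    fused.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
    fused
}
```

A document ranked second in both lists beats one ranked first in only one, which is the behavior that makes RRF a robust merge for hybrid search.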
FSRS (Free Spaced Repetition Scheduler) determines when an episodic memory should be forgotten.
Surprise-based initialization: Episodes with high surprise (significant information gain) receive a stability boost, making them decay slower.
Review mechanism: Retrieval records candidate memories for review. When the conversation is later segmented, an LLM evaluates each memory's relevance (Again/Hard/Good/Easy) and updates FSRS parameters (stability, difficulty) accordingly. Semantic memories do not use FSRS—they remain valid until explicitly contradicted and invalidated.
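The decay model can be sketched with the published FSRS-4.5 forgetting curve; whether Plast Mem uses exactly these constants is an assumption, as is the surprise scaling in `initial_stability`.

```rust
/// Retrievability under the FSRS-4.5 power forgetting curve
/// (decay = -0.5, factor = 19/81, chosen so that R = 0.9 exactly
/// when the elapsed time equals the stability S).
fn retrievability(elapsed_days: f64, stability: f64) -> f64 {
    const DECAY: f64 = -0.5;
    const FACTOR: f64 = 19.0 / 81.0;
    (1.0 + FACTOR * elapsed_days / stability).powf(DECAY)
}

/// Illustrative surprise boost: high-surprise episodes start with
/// higher stability and therefore decay slower. The linear scaling
/// here is an assumption, not Plast Mem's actual initialization.
fn initial_stability(base: f64, surprise: f64) -> f64 {
    base * (1.0 + 2.0 * surprise)
}
```

During retrieval re-ranking, `retrievability` would be multiplied into (or fused with) the relevance score so that decayed episodes sink in the results.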
We have not yet released version 0.1.0 because the core functionality is incomplete. However, you are welcome to join us in developing it! See CONTRIBUTING.md for guidelines.
While introducing graph structures can be beneficial, it also dramatically increases complexity. Plast Mem is written in Rust, which makes it hard to reuse existing solutions like Graphiti directly.
Overall, I think the convenience of self-hosting is more important.
No, but I might draw inspiration from some of it - or I might not.
For locally running embedding models, we recommend Qwen3-Embedding-0.6B: its output dimensionality meets the requirements and it delivers high-quality embeddings.
For other embedding models, ensure they output vectors of at least 1024 dimensions and support MRL (Matryoshka Representation Learning), like OpenAI's text-embedding-3-small.
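MRL matters because it lets a higher-dimensional embedding be truncated to the store's fixed dimension while remaining usable for cosine similarity. A minimal sketch of that truncation step (assuming the model was MRL-trained; the function itself is illustrative, not part of Plast Mem's API):

```rust
/// Matryoshka (MRL) truncation: keep the first `dim` components and
/// re-normalize to unit length so cosine similarity stays meaningful.
fn mrl_truncate(embedding: &[f32], dim: usize) -> Vec<f32> {
    let mut v: Vec<f32> = embedding[..dim.min(embedding.len())].to_vec();
    let norm = v.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm > 0.0 {
        for x in &mut v {
            *x /= norm;
        }
    }
    v
}
```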
For chat models, no recommendations are currently available, as further testing is still required.
This project is inspired by the design of: