semantic-caching

Here are 25 public repositories matching this topic...

Hyperion-HQ / Hyperion

Ultra-low-latency LLM gateway with microsecond caching, dynamic routing, budgets, analytics, and forecasting.

Updated Apr 2, 2026
Go

zzbright1998 / SentenceKV

Official implementation of "SentenceKV: Efficient LLM Inference via Sentence-Level Semantic KV Caching" (COLM 2025). A novel KV cache compression method that organizes the cache at the sentence level using semantic similarity.

natural-language-processing transformers memory-efficiency efficient-inference inference-optimization kv-cache llm semantic-caching colm2025

Updated Sep 29, 2025
Python

renswickd / semantic-prompt-cache

This app leverages Semantic Caching to minimize inference latency and reduce API costs by reusing semantically similar prompt responses.

optimization ttl-cache rag mistral-api semantic-caching

Updated Jul 4, 2025
Python

AzureManagedRedis / semantic-caching-demo-and-calculator

Semantic caching demo with real-time streaming and a cost & sizing calculator, powered by Azure Managed Redis and Azure OpenAI.

demo azure-managed-redis semantic-caching cost-modeling

Updated Nov 12, 2025
Python

AP3008 / Janus

Rust Local Token Compression Proxy for coding agents, built solo for GenAI Genesis 2026. 🏆 1st Google Sustainability Hack

rust redis local proxy-server tui tokio deduplication ratatui axum-framework token-compression semantic-caching

Updated Mar 16, 2026
Rust

redislabsdev / langcache-customer-data-eval

Evaluate how a semantic cache performs on your dataset by computing key KPIs over a threshold sweep and producing plots and CSVs.

redis evaluation vector-database semantic-caching

Updated Mar 11, 2026
Python

Clement-Okolo / Semantic-Cache

Semantic caching for LLM responses using Redis Vector DB, LangChain, and HuggingFace embeddings. Parses PDFs, generates FAQs with Groq, and serves similarity-based answers without redundant LLM calls.

chunking long-term-memory vector-database semantic-caching llamacloud live-caching batch-caching

Updated Feb 28, 2026
Jupyter Notebook

sensoris / semcache-python

Python library for the Semcache API

python ai openai llm anthropic semantic-caching

Updated Jun 9, 2025
Python

awesome-pro / smartmemo

Semantic memory and caching for LLM agents with classifier-validated equivalence instead of naive cosine thresholds.

python machine-learning sqlite pytorch embeddings developer-tools ai-agents cost-optimization faiss vector-search sentence-transformers semantic-memory llm llmops semantic-cache semantic-caching

Updated May 20, 2026
Python

developertogo / velo-sentinel

Production-grade Java 25 Virtual Thread inference gateway bridging NVIDIA Triton → Dynamo with Earliest Deadline First (EDF) priority queuing, adaptive batching, and async shadow validation.

redis distributed-systems grpc priority-queues load-balancing model-serving triton-inference-server virtual-threads inference-gateway semantic-caching nvidia-dynamo disaggregated-serving

Updated May 9, 2026
Java

manishklach / semantic-kv-control-plane

A systems research platform for semantic KV-cache orchestration, topology-aware memory placement, distributed prefix reuse, and rack-scale inference memory simulation.

Updated May 25, 2026
Python

Chief-Strategist-J / llm-observability-platform

High-performance LLM observability and evaluation platform with automated instrumentation, stateful chat orchestration, semantic vector memory caching, and scheduled Temporal workers for cost anomaly detection.

python go clickhouse semantic-search temporal rag opentelemetry vector-database llm-observability semantic-caching llmops-prompt-engineering

Updated May 26, 2026
Python

sensoris / semcache-node

Node SDK for the Semcache API

node js openai llm semantic-caching

Updated Jun 18, 2025
JavaScript

maichanks / llm-cost-optimizer

LLM cost monitoring and optimization toolkit

redis monitoring budget cost-optimization llm openrouter prompt-compression semantic-caching token-tracking ai-cost openclaw api-cost-management

Updated Mar 16, 2026
JavaScript

nunoferna / aegis-llm

LLMOps API Gateway in Go. Optimizes GenAI workloads with Qdrant semantic caching, Redis rate-limiting, and OpenTelemetry metrics.

docker kubernetes redis golang api-gateway proxy rate-limiting gemini openai cloud-native helm-chart opentelemetry qdrant llm anthropic semantic-caching

Updated Mar 15, 2026
Go

arpon-kapuria / betterdb-rag-observability

📊 A FastAPI RAG pipeline exploring Redis/Valkey observability with BetterDB — semantic caching, rate limiting, and latency attribution with MCP-powered debugging.

rate-limiting observability latency-analysis llmops rag-pipeline semantic-caching

Updated May 23, 2026
Python

rj41-w2 / Rehan_Portfolio

A high-performance, open-source portfolio built with React 19, featuring an intelligent AI chat assistant with multi-LLM failover, semantic caching, and a 3D-integrated UI.

portfolio portfolio-website firebase-auth open-source-project developer-portfolio gemini-api ai-chatbot vite-template llm groq-api semantic-caching

Updated May 13, 2026
JavaScript

Maanik23 / agentic-content-pipeline

Multi-agent content pipeline with LangGraph, FastAPI, and Redis semantic caching

python redis multi-agent ai-agents content-pipeline fastapi llm langgraph semantic-caching

Updated Apr 4, 2026
Python

TokenLao6 / tokaify-gateway

A lightweight, high-performance API gateway for large language models that can significantly reduce token costs through intelligent routing and semantic caching.

api-gateway openai claude cost-optimization llm semantic-caching

Updated May 13, 2026

graz-dev / redis-rag-semantic-cache

Simple RAG implementation with semantic caching using Redis and Langchain

redis rag llms semantic-caching

Updated Nov 21, 2025
Python
