Claude Code plugin marketplace — 40+ installable reference skills across vLLM/SGLang inference, Kubernetes & Harvester, GPU host bring-up, observability, security, and agent workflows.
/plugin marketplace add air-gapped/skills
/plugin install <plugin>@air-gapped-marketplace
Plugins are either single-skill (e.g. jinja-expert, helm, keda) or grouped suites (e.g. vllm — bundles all 14 vLLM reference skills into one plugin). See .claude-plugin/marketplace.json for the full list.
Each plugin is versioned as 0.YYYYMMDD.N, where YYYYMMDD is the UTC date of the most recent content change across its member skills and N is the number of unique commits touching any member skill directory. Run /plugin update to pick up new versions.
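As a rough illustration of the version format, the sketch below composes and parses a `0.YYYYMMDD.N` string. The function names `plugin_version` and `parse_plugin_version` are hypothetical and are not part of this repository's tooling; the sketch only assumes the format described above.

```python
from datetime import date, datetime, timedelta, timezone


def plugin_version(last_change: datetime, commit_count: int) -> str:
    """Compose 0.YYYYMMDD.N (hypothetical helper, not the repo's actual script).

    last_change must be timezone-aware; it is converted to UTC before the
    date part is taken, so a late-evening local commit can land on the next day.
    """
    if last_change.tzinfo is None:
        raise ValueError("last_change must be timezone-aware")
    if commit_count < 0:
        raise ValueError("commit_count must be non-negative")
    utc_day = last_change.astimezone(timezone.utc).strftime("%Y%m%d")
    return f"0.{utc_day}.{commit_count}"


def parse_plugin_version(version: str) -> tuple[date, int]:
    """Split a 0.YYYYMMDD.N string into (UTC date, commit count)."""
    major, day, count = version.split(".")
    if major != "0":
        raise ValueError(f"unexpected major component: {major!r}")
    return datetime.strptime(day, "%Y%m%d").date(), int(count)


if __name__ == "__main__":
    # 23:30 at UTC-2 is already the next calendar day in UTC.
    changed = datetime(2026, 5, 1, 23, 30, tzinfo=timezone(timedelta(hours=-2)))
    print(plugin_version(changed, 7))
    print(parse_plugin_version("0.20260502.7"))
```

Because the date component is zero-padded and fixed-width, comparing the parsed `(date, count)` tuples orders versions chronologically, with the commit count breaking ties within a day.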
| Skill | Description |
|---|---|
| aiperf | NVIDIA AIPerf — vendor-neutral generative-AI inference benchmarking (genai-perf successor). Covers aiperf profile with concurrency / request-rate / fixed-schedule trace replay / user-centric / multi-run confidence, 15 endpoint types (chat,… |
| ansible-idrac-9-10 | Run and debug dellemc.openmanage Ansible playbooks against Dell PowerEdge iDRAC 9 (14G–16G) and iDRAC 10 (17G — R670, R770, R870, R970, XE9780, XE9785). Covers the iDRAC 10 / iDRAC 9 ≥ 7.30.10.50 BasicAuthState: Unadvertised default that… |
| argo-cd-apps | Author and maintain Argo CD Application and ApplicationSet manifests as a GitOps consumer (publisher), targeting Argo CD v3.3 / v3.4 (May 2026). Covers source types (Helm, Kustomize, OCI, multi-source, plugin), sync policies + options + waves +… |
| autoresearch | Karpathy-pattern autoresearch — autonomous hill-climbing over a measurable metric, deep multi-agent research, or research-then-optimize. Three modes: Optimize (keep/discard ratchet), Research (STORM multi-perspective), Improve. |
| baml-expert | BAML (Boundary ML) expert for projects defining LLM calls as typed functions in .baml files with a generated Python client. Use whenever the repo contains baml_src/, baml_client/, baml-cli commands, or imports from baml_py / baml_client. Covers… |
| confluence-best-practices | Advise on USING Confluence well, not operating it: make the structural call — is this a space, a page, or a child page? — diagnose why a wiki is a dread (can't find anything, content rots, duplicates, hidden by permissions, unreadable), and… |
| gpu-host-tuning | Audit AND tune Linux/GPU inference hosts — read-only host snapshot |
| harvester-upgrade | Plan and run a controlled, COMMUNITY-edition Harvester HCI upgrade off an EOL line up to latest stable — the no-skip minor ladder (1.5→1.6→1.7→1.8; embedded RKE2/KubeVirt/Longhorn/SLE-Micro ride along), gated at each hop on first upgrading the… |
| helm | This skill should be used when authoring or maintaining Helm charts — creating charts, writing templates and _helpers.tpl, values.yaml patterns, Chart.yaml, values.schema.json, helm-docs, and library charts. Covers Helm 4 (SSA, WASM, OCI digest),… |
| jinja-expert | Author, read, and debug Jinja2 templates across the three places Jinja lives in 2026 — HuggingFace chat_template.jinja (rendered by apply_chat_template for vLLM / sglang), Ansible playbooks + .j2 files, and Jinja-adjacent Kubernetes workflows… |
| jira-best-practices | Advise on USING Jira well, not operating it: make the structural call — is this an epic, a story, a task, or a sub-task? — and diagnose why a Jira is a dread, then recommend the lean fix. Adapt to the organisation's OWN hierarchy names, conventions,… |
| jira-cli | Drive Atlassian Jira from the terminal with the jira CLI (jira-cli, v1.7.0) against ANY Jira — Cloud or on-premise/Data Center. Covers the full command surface (issue / epic / sprint / board / project / release), the non-interactive automation… |
| jira-confluence-mcp | Install, configure, secure, and troubleshoot the mcp-atlassian MCP server (sooperset/mcp-atlassian) that connects an agent to Jira/Confluence — including AIR-GAPPED setup (mirror the prebuilt image by digest; no PyPI/git mirror) and internal-CA /… |
| k8s-components-checker | Survey an RKE2 community cluster against an embedded compatibility registry of 19 stack components and produce a verdict for upgrade-readiness, drift-review, and version-skew questions. Components: RKE2, Rancher, Harvester, Cilium, Tetragon,… |
| keda | Configure, operate, and master KEDA (Kubernetes Event-driven Autoscaling) — ScaledObject, ScaledJob, TriggerAuthentication CRDs, 70+ scalers, HPA behavior tuning, scale-to-zero, the KEDA HTTP Add-on, production hardening, multi-trigger semantics,… |
| keycloak-iam | Operate, configure, deploy, secure, and integrate with Keycloak (open-source IAM) — the modern Quarkus distribution (24.x–26.6.x), the Keycloak Operator with Keycloak and KeycloakRealmImport CRDs, and realm/client/identity-provider configuration. |
| lmcache-mp | LMCache multiprocess (MP) mode — standalone LMCache server in its own pod/process that vLLM connects to over ZMQ. Gives process isolation, no GIL contention on the inference path, one cache shared by multiple vLLM pods per node, and CPU-memory… |
| makefile-best-practices | Makefile best practices, patterns, and templates for GNU Make 4.x — dependency graphs, task-runner workflows, parallel-safe recipes, self-documenting help targets, and language-specific patterns (Go, Python, Node, Docker, Helm, POSIX). |
| nvidia-datacenter-bringup | Bring up NVIDIA HGX/DGX datacenter GPU hosts on Ubuntu 24.04 LTS — air-gapped or connected, Secure Boot enabled. Covers B300/B200/H100/A100/L40S/L4 driver+fabricmanager+NVLSM+DOCA-OFED install order and exact package set from NVIDIA CUDA repo + DOCA… |
| nvidia-nixl | NVIDIA Inference Xfer Library (NIXL) operator + developer reference. Point-to-point KV-cache and tensor transport for distributed inference (Dynamo, vLLM, SGLang). Covers the agent API (full Python reference; C++/Rust via upstream pointers), all 13… |
| open-webui-embeddings | Wire HuggingFace embedding + reranker models (BGE-M3, BGE-Reranker-v2-m3, etc.) into Open WebUI's RAG pipeline via LiteLLM proxying HuggingFace Text Embeddings Inference (TEI). Covers the exact wire shapes Open WebUI sends (URL auto-append on embed… |
| open-webui-valkey-websocket | Deploy Open WebUI multi-pod with WebSockets and Valkey/Redis Sentinel at 1000+ user scale on Kubernetes. Centerpiece is the structural Socket.IO+Redis frame-amplification bug (#23733) that cripples multi-pod streaming, and the maintainer-endorsed… |
| openshift-app | Package applications for OpenShift deployment: container images (UBI, arbitrary UID, multi-stage builds), packaging formats (Helm, Kustomize, Operators, OLM v1), CI/CD (Tekton, ArgoCD, Shipwright, Conforma), security (SCC, PSA, supply chain, image… |
| patch | Generate candidate fixes for verified security findings. Consumes TRIAGE.json (preferred), VULN-FINDINGS.json, or an execution-harness results directory. Static-analysis input gets a per-finding patch subagent + an independent reviewer and is… |
| prometheus-mimir-grafana | Query Prometheus and Grafana Mimir, write and debug PromQL, and build or fix Grafana dashboards — for agents solving problems from metrics. Covers the Prometheus HTTP API (/api/v1/query, query_range, series, labels, metadata), Mimir… |
| rancher-upgrade | Plan and sequence COMMUNITY-edition Rancher upgrades across air-gapped multi-cluster fleets — a management/"hosting" Rancher cluster plus the downstream RKE2/K3s clusters it provisions. Covers the community release model (2.11→2.14,… |
| secure-boot-cert-rotation | Triage and remediate the Microsoft Secure Boot 2011→2023 UEFI certificate rotation (CAs expiring June/October 2026) across Dell PowerEdge / iDRAC9 bare metal, Ubuntu/Linux servers, and Harvester HCI / KubeVirt guest VMs. Establishes the load-bearing… |
| sglang-hicache | SGLang HiCache (hierarchical KV cache) — three-tier prefix cache: GPU HBM (L1) → pinned host DRAM (L2) → distributed L3 (Mooncake / 3FS / NIXL / AIBrix / EIC / SiMM / file / LMCache). Covers --enable-hierarchical-cache, all --hicache-* flags,… |
| sglang-model-gateway | SGLang Model Gateway (sgl-model-gateway, formerly sgl-router) — Rust router fronting vLLM and SGLang inference workers on Kubernetes. Covers first-class vLLM gRPC backend plus HTTP transparent-proxy for vanilla vLLM, the policy set (six… |
| skill-improver | Autoresearch loop for Claude Code skills — greedy keep/discard hill climbing on a 10-dimension quality rubric, with blind subagent validation for self-scoring bias, plus a freshen mode that probes external references (release notes, docs,… |
| threat-model | Build a threat model for a target codebase. Three modes: "interview" walks an application owner through the four-question framework and produces a threat model from their answers; "bootstrap" derives a threat model from the code plus past… |
| transformers-config-tokenizers-expert | Preflight reference for HuggingFace snapshots — what vLLM, sglang, and transformers.generate see at runtime. Covers config-file precedence (tokenizer.json, tokenizer_config.json, generation_config.json, chat_template.jinja), transformers v5… |
| triage | Triage a batch of raw security findings. Verify each is real, collapse duplicates, re-rank by derived exploitability, and tag with an owner. Takes a directory or file of scanner output and writes TRIAGE.json + TRIAGE.md sorted by what actually needs… |
| vllm-benchmarking | Run production vLLM benchmarks — vllm bench (serve, throughput, latency, sweep, startup, mm-processor), request-rate vs max-concurrency semantics, TTFT/TPOT/ITL/E2EL percentiles, goodput SLO measurement, prefix-cache workloads, air-gapped… |
| vllm-caching | vLLM tiered KV cache configuration for production H100/H200 clusters. Native CPU offload, LMCache (CPU+NVMe+GDS), NixlConnector (disaggregated prefill), MooncakeConnector (RDMA), MultiConnector composition. Version gates, sizing math (flag total… |
| vllm-chat-templates | vLLM chat-template (prompt-side Jinja) operator reference. Template resolution precedence (--chat-template → AutoProcessor → tokenizer default → bundled fallback), chat_template_kwargs allowlist silently dropping… |
| vllm-configuration | Configure vLLM completely — YAML config file format, CLI arg precedence, full VLLM_/HF_/TRANSFORMERS_* env-var catalog, end-to-end recipe for air-gapped environments (internal HF mirrors, hf-mirror.com, ModelScope, HF_HUB_OFFLINE with pre-seeded… |
| vllm-deployment | Use this skill when authoring, reviewing, or fixing a vLLM Kubernetes manifest, Docker/Podman pod, or OpenShift ServingRuntime — even when the user does not say "vllm". Triggers on: lab cluster performance practices, cache mount + survival across… |
| vllm-gemma-4-31b | Operating-point reference for serving Gemma 4 31B on vLLM — TP sizing, max_model_len, max_num_seqs, gpu_memory_utilization, kv_cache_dtype, EAGLE3 spec-dec, chat_template choice. |
| vllm-input-modalities | vLLM non-chat inference surfaces — text embeddings (/v1/embeddings, /v2/embed), reranking/scoring (/rerank, /score), speech-to-text (/v1/audio/transcriptions, /v1/audio/translations), document OCR via VLMs. Covers 2026 --runner pooling… |
| vllm-nvidia-hardware | NVIDIA AI-hardware + vLLM-platform reference covering Hopper (H100/H200), Blackwell (B100/B200/B300) and Blackwell Ultra, Grace-Blackwell superchips and NVL72 racks (GB200, GB300), Vera Rubin (R100/R300) with VR200 NVL144 and Kyber NVL576, Dell… |
| vllm-observability | Observe production vLLM — /metrics Prometheus surface (V1 engine), SLO-driven alerting on TTFT/ITL/queue/KV/preemption/aborts/corrupted-logits, shipping Grafana dashboards in examples/observability/, OTLP tracing with --otlp-traces-endpoint… |
| vllm-omni | vLLM-Omni output-side multimodal generation — image (FLUX.1/2, Qwen-Image, GLM-Image, BAGEL, SD3.5, HunyuanImage-3.0), video (Wan2.1/2.2, LTX-2, HunyuanVideo-1.5), TTS (Qwen3-TTS, CosyVoice3, Voxtral-TTS), any-to-any omni (Qwen3-Omni, Qwen2.5-Omni,… |
| vllm-performance-tuning | vLLM performance-tuning operator reference — tuning workflow (baseline → bottleneck → knob → re-bench), fused-MoE kernel autotune (benchmark_moe.py generates E=N,N=M,device_name=X.json configs), DeepEP all-to-all + expert parallelism + EPLB,… |
| vllm-quantization | vLLM datacenter-GPU quantization — picking, configuring, troubleshooting NVFP4, FP8, MXFP4, MXFP8, AWQ, GPTQ, INT8, compressed-tensors, modelopt, quark on H100/H200/B200/B300/GB200/GB300. 29 --quantization flag values, KV-cache dtypes (fp8_e4m3,… |
| vllm-reasoning-parsers | vLLM reasoning-parser operator + developer reference. --reasoning-parser CLI wiring, ReasoningParser contract (non-streaming extract_reasoning + per-delta extract_reasoning_streaming), is_reasoning_end xgrammar gating,… |
| vllm-speculative-decoding | Pick, configure, tune, monitor vLLM speculative decoding in production. Eleven SpeculativeMethod options (ngram, ngram_gpu, medusa, mlp_speculator, draft_model, suffix, eagle, eagle3, dflash, mtp, extract_hidden_states), --speculative-config JSON… |
| vllm-tool-parsers | vLLM tool-calling operator reference — picking --tool-call-parser per model family, writing custom parsers via --tool-parser-plugin, navigating vLLM source + GitHub tracker to debug any specific tool-call question. Pointer map, not source… |
| vuln-scan | Static source-code vulnerability scan. Reads a target directory (and THREAT_MODEL.md if present), spawns parallel review subagents per focus area, and writes VULN-FINDINGS.json + .md for /triage to consume. Read-only — no building, running, or… |
MIT licensed.