skills

Claude Code plugin marketplace — 40+ installable reference skills across vLLM/SGLang inference, Kubernetes & Harvester, GPU host bring-up, observability, security, and agent workflows.

Install

/plugin marketplace add air-gapped/skills
/plugin install <plugin>@air-gapped-marketplace

Plugins are either single-skill (e.g. jinja-expert, helm, keda) or grouped suites (e.g. vllm — bundles all 14 vLLM reference skills into one plugin). See .claude-plugin/marketplace.json for the full list.

Each plugin is versioned as 0.YYYYMMDD.N, where YYYYMMDD is the UTC date of the most recent content change across its member skills and N is the number of unique commits touching any member skill directory. Run /plugin update to pick up new versions.
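
To make the version format concrete, here is a minimal Python sketch that builds and parses a 0.YYYYMMDD.N string. The helper names `plugin_version` and `parse_plugin_version` are hypothetical and are not taken from this repository's scripts. The inputs (timestamp of the latest content change, commit count) are assumed to be computed elsewhere, for example from git history.

```python
from datetime import datetime, timedelta, timezone


def plugin_version(last_change: datetime, commit_count: int) -> str:
    """Format 0.YYYYMMDD.N from the latest content change and a commit count.

    Hypothetical helper: the timestamp is converted to UTC before the date
    is taken, matching the "UTC date" wording of the scheme.
    """
    if last_change.tzinfo is None:
        raise ValueError("last_change must be timezone-aware")
    if commit_count < 0:
        raise ValueError("commit_count must be non-negative")
    date_part = last_change.astimezone(timezone.utc).strftime("%Y%m%d")
    return f"0.{date_part}.{commit_count}"


def parse_plugin_version(version: str) -> tuple[int, int, int]:
    """Split a 0.YYYYMMDD.N string into integers that compare in release order."""
    major, date_part, count = version.split(".")
    if major != "0" or len(date_part) != 8:
        raise ValueError(f"not a 0.YYYYMMDD.N version: {version!r}")
    datetime.strptime(date_part, "%Y%m%d")  # raises ValueError on an invalid date
    return int(major), int(date_part), int(count)


# A change made late on 1 May in UTC-5 falls on 2 May in UTC.
local_time = datetime(2026, 5, 1, 23, 30, tzinfo=timezone(timedelta(hours=-5)))
print(plugin_version(local_time, 12))  # 0.20260502.12
```

Comparing the parsed tuples orders versions first by date and then by commit count. This is only an illustration of the scheme as described; how the marketplace itself compares versions is not stated here.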

Skill	Description
`aiperf`	NVIDIA AIPerf — vendor-neutral generative-AI inference benchmarking (genai-perf successor). Covers `aiperf profile` with concurrency / request-rate / fixed-schedule trace replay / user-centric / multi-run confidence, 15 endpoint types (chat,…
`ansible-idrac-9-10`	Run and debug `dellemc.openmanage` Ansible playbooks against Dell PowerEdge iDRAC 9 (14G–16G) and iDRAC 10 (17G — R670, R770, R870, R970, XE9780, XE9785). Covers the iDRAC 10 / iDRAC 9 ≥ 7.30.10.50 `BasicAuthState: Unadvertised` default that…
`argo-cd-apps`	Author and maintain Argo CD `Application` and `ApplicationSet` manifests as a GitOps consumer (publisher), targeting Argo CD v3.3 / v3.4 (May 2026). Covers source types (Helm, Kustomize, OCI, multi-source, plugin), sync policies + options + waves +…
`autoresearch`	Karpathy-pattern autoresearch — autonomous hill-climbing over a measurable metric, deep multi-agent research, or research-then-optimize. Three modes: Optimize (keep/discard ratchet), Research (STORM multi-perspective), Improve.
`baml-expert`	BAML (Boundary ML) expert for projects defining LLM calls as typed functions in .baml files with a generated Python client. Use whenever the repo contains baml_src/, baml_client/, baml-cli commands, or imports from baml_py / baml_client. Covers…
`confluence-best-practices`	Advise on USING Confluence well, not operating it: make the structural call — is this a space, a page, or a child page? — diagnose why a wiki is a dread (can't find anything, content rots, duplicates, hidden by permissions, unreadable), and…
`gpu-host-tuning`	Audit AND tune Linux/GPU inference hosts — read-only host snapshot
`harvester-upgrade`	Plan and run a controlled, COMMUNITY-edition Harvester HCI upgrade off an EOL line up to latest stable — the no-skip minor ladder (1.5→1.6→1.7→1.8; embedded RKE2/KubeVirt/Longhorn/SLE-Micro ride along), gated at each hop on first upgrading the…
`helm`	This skill should be used when authoring or maintaining Helm charts — creating charts, writing templates and _helpers.tpl, values.yaml patterns, Chart.yaml, values.schema.json, helm-docs, and library charts. Covers Helm 4 (SSA, WASM, OCI digest),…
`jinja-expert`	Author, read, and debug Jinja2 templates across the three places Jinja lives in 2026 — HuggingFace `chat_template.jinja` (rendered by `apply_chat_template` for vLLM / sglang), Ansible playbooks + `.j2` files, and Jinja-adjacent Kubernetes workflows…
`jira-best-practices`	Advise on USING Jira well, not operating it: make the structural call — is this an epic, a story, a task, or a sub-task? — and diagnose why a Jira is a dread, then recommend the lean fix. Adapt to the organisation's OWN hierarchy names, conventions,…
`jira-cli`	Drive Atlassian Jira from the terminal with the `jira` CLI (jira-cli, v1.7.0) against ANY Jira — Cloud or on-premise/Data Center. Covers the full command surface (issue / epic / sprint / board / project / release), the non-interactive automation…
`jira-confluence-mcp`	Install, configure, secure, and troubleshoot the mcp-atlassian MCP server (sooperset/mcp-atlassian) that connects an agent to Jira/Confluence — including AIR-GAPPED setup (mirror the prebuilt image by digest; no PyPI/git mirror) and internal-CA /…
`k8s-components-checker`	Survey an RKE2 community cluster against an embedded compatibility registry of 19 stack components and produce a verdict for upgrade-readiness, drift-review, and version-skew questions. Components: RKE2, Rancher, Harvester, Cilium, Tetragon,…
`keda`	Configure, operate, and master KEDA (Kubernetes Event-driven Autoscaling) — ScaledObject, ScaledJob, TriggerAuthentication CRDs, 70+ scalers, HPA behavior tuning, scale-to-zero, the KEDA HTTP Add-on, production hardening, multi-trigger semantics,…
`keycloak-iam`	Operate, configure, deploy, secure, and integrate with Keycloak (open-source IAM) — the modern Quarkus distribution (24.x–26.6.x), the Keycloak Operator with `Keycloak` and `KeycloakRealmImport` CRDs, and realm/client/identity-provider configuration.
`lmcache-mp`	LMCache multiprocess (MP) mode — standalone LMCache server in its own pod/process that vLLM connects to over ZMQ. Gives process isolation, no GIL contention on the inference path, one cache shared by multiple vLLM pods per node, and CPU-memory…
`makefile-best-practices`	Makefile best practices, patterns, and templates for GNU Make 4.x — dependency graphs, task-runner workflows, parallel-safe recipes, self-documenting help targets, and language-specific patterns (Go, Python, Node, Docker, Helm, POSIX).
`nvidia-datacenter-bringup`	Bring up NVIDIA HGX/DGX datacenter GPU hosts on Ubuntu 24.04 LTS — air-gapped or connected, Secure Boot enabled. Covers B300/B200/H100/A100/L40S/L4 driver+fabricmanager+NVLSM+DOCA-OFED install order and exact package set from NVIDIA CUDA repo + DOCA…
`nvidia-nixl`	NVIDIA Inference Xfer Library (NIXL) operator + developer reference. Point-to-point KV-cache and tensor transport for distributed inference (Dynamo, vLLM, SGLang). Covers the agent API (full Python reference; C++/Rust via upstream pointers), all 13…
`open-webui-embeddings`	Wire HuggingFace embedding + reranker models (BGE-M3, BGE-Reranker-v2-m3, etc.) into Open WebUI's RAG pipeline via LiteLLM proxying HuggingFace Text Embeddings Inference (TEI). Covers the exact wire shapes Open WebUI sends (URL auto-append on embed…
`open-webui-valkey-websocket`	Deploy Open WebUI multi-pod with WebSockets and Valkey/Redis Sentinel at 1000+ user scale on Kubernetes. Centerpiece is the structural Socket.IO+Redis frame-amplification bug (#23733) that cripples multi-pod streaming, and the maintainer-endorsed…
`openshift-app`	Package applications for OpenShift deployment: container images (UBI, arbitrary UID, multi-stage builds), packaging formats (Helm, Kustomize, Operators, OLM v1), CI/CD (Tekton, ArgoCD, Shipwright, Conforma), security (SCC, PSA, supply chain, image…
`patch`	Generate candidate fixes for verified security findings. Consumes TRIAGE.json (preferred), VULN-FINDINGS.json, or an execution-harness results directory. Static-analysis input gets a per-finding patch subagent + an independent reviewer and is…
`prometheus-mimir-grafana`	Query Prometheus and Grafana Mimir, write and debug PromQL, and build or fix Grafana dashboards — for agents solving problems from metrics. Covers the Prometheus HTTP API (`/api/v1/query`, `query_range`, `series`, `labels`, `metadata`), Mimir…
`rancher-upgrade`	Plan and sequence COMMUNITY-edition Rancher upgrades across air-gapped multi-cluster fleets — a management/"hosting" Rancher cluster plus the downstream RKE2/K3s clusters it provisions. Covers the community release model (2.11→2.14,…
`secure-boot-cert-rotation`	Triage and remediate the Microsoft Secure Boot 2011→2023 UEFI certificate rotation (CAs expiring June/October 2026) across Dell PowerEdge / iDRAC9 bare metal, Ubuntu/Linux servers, and Harvester HCI / KubeVirt guest VMs. Establishes the load-bearing…
`sglang-hicache`	SGLang HiCache (hierarchical KV cache) — three-tier prefix cache: GPU HBM (L1) → pinned host DRAM (L2) → distributed L3 (Mooncake / 3FS / NIXL / AIBrix / EIC / SiMM / file / LMCache). Covers `--enable-hierarchical-cache`, all `--hicache-*` flags,…
`sglang-model-gateway`	SGLang Model Gateway (`sgl-model-gateway`, formerly `sgl-router`) — Rust router fronting vLLM and SGLang inference workers on Kubernetes. Covers first-class vLLM gRPC backend plus HTTP transparent-proxy for vanilla vLLM, the policy set (six…
`skill-improver`	Autoresearch loop for Claude Code skills — greedy keep/discard hill climbing on a 10-dimension quality rubric, with blind subagent validation for self-scoring bias, plus a `freshen` mode that probes external references (release notes, docs,…
`threat-model`	Build a threat model for a target codebase. Three modes: "interview" walks an application owner through the four-question framework and produces a threat model from their answers; "bootstrap" derives a threat model from the code plus past…
`transformers-config-tokenizers-expert`	Preflight reference for HuggingFace snapshots — what vLLM, sglang, and transformers.generate see at runtime. Covers config-file precedence (tokenizer.json, tokenizer_config.json, generation_config.json, chat_template.jinja), transformers v5…
`triage`	Triage a batch of raw security findings. Verify each is real, collapse duplicates, re-rank by derived exploitability, and tag with an owner. Takes a directory or file of scanner output and writes TRIAGE.json + TRIAGE.md sorted by what actually needs…
`vllm-benchmarking`	Run production vLLM benchmarks — `vllm bench` (serve, throughput, latency, sweep, startup, mm-processor), request-rate vs max-concurrency semantics, TTFT/TPOT/ITL/E2EL percentiles, goodput SLO measurement, prefix-cache workloads, air-gapped…
`vllm-caching`	vLLM tiered KV cache configuration for production H100/H200 clusters. Native CPU offload, LMCache (CPU+NVMe+GDS), NixlConnector (disaggregated prefill), MooncakeConnector (RDMA), MultiConnector composition. Version gates, sizing math (flag total…
`vllm-chat-templates`	vLLM chat-template (prompt-side Jinja) operator reference. Template resolution precedence (`--chat-template` → AutoProcessor → tokenizer default → bundled fallback), `chat_template_kwargs` allowlist silently dropping…
`vllm-configuration`	Configure vLLM completely — YAML config file format, CLI arg precedence, full VLLM_/HF_/TRANSFORMERS_* env-var catalog, end-to-end recipe for air-gapped environments (internal HF mirrors, hf-mirror.com, ModelScope, HF_HUB_OFFLINE with pre-seeded…
`vllm-deployment`	Use this skill when authoring, reviewing, or fixing a vLLM Kubernetes manifest, Docker/Podman pod, or OpenShift ServingRuntime — even when the user does not say "vllm". Triggers on: lab cluster performance practices, cache mount + survival across…
`vllm-gemma-4-31b`	Operating-point reference for serving Gemma 4 31B on vLLM — TP sizing, max_model_len, max_num_seqs, gpu_memory_utilization, kv_cache_dtype, EAGLE3 spec-dec, chat_template choice.
`vllm-input-modalities`	vLLM non-chat inference surfaces — text embeddings (`/v1/embeddings`, `/v2/embed`), reranking/scoring (`/rerank`, `/score`), speech-to-text (`/v1/audio/transcriptions`, `/v1/audio/translations`), document OCR via VLMs. Covers 2026 `--runner pooling`…
`vllm-nvidia-hardware`	NVIDIA AI-hardware + vLLM-platform reference covering Hopper (H100/H200), Blackwell (B100/B200/B300) and Blackwell Ultra, Grace-Blackwell superchips and NVL72 racks (GB200, GB300), Vera Rubin (R100/R300) with VR200 NVL144 and Kyber NVL576, Dell…
`vllm-observability`	Observe production vLLM — `/metrics` Prometheus surface (V1 engine), SLO-driven alerting on TTFT/ITL/queue/KV/preemption/aborts/corrupted-logits, shipping Grafana dashboards in `examples/observability/`, OTLP tracing with `--otlp-traces-endpoint`…
`vllm-omni`	vLLM-Omni output-side multimodal generation — image (FLUX.1/2, Qwen-Image, GLM-Image, BAGEL, SD3.5, HunyuanImage-3.0), video (Wan2.1/2.2, LTX-2, HunyuanVideo-1.5), TTS (Qwen3-TTS, CosyVoice3, Voxtral-TTS), any-to-any omni (Qwen3-Omni, Qwen2.5-Omni,…
`vllm-performance-tuning`	vLLM performance-tuning operator reference — tuning workflow (baseline → bottleneck → knob → re-bench), fused-MoE kernel autotune (`benchmark_moe.py` generates `E=N,N=M,device_name=X.json` configs), DeepEP all-to-all + expert parallelism + EPLB,…
`vllm-quantization`	vLLM datacenter-GPU quantization — picking, configuring, troubleshooting NVFP4, FP8, MXFP4, MXFP8, AWQ, GPTQ, INT8, compressed-tensors, modelopt, quark on H100/H200/B200/B300/GB200/GB300. 29 `--quantization` flag values, KV-cache dtypes (fp8_e4m3,…
`vllm-reasoning-parsers`	vLLM reasoning-parser operator + developer reference. `--reasoning-parser` CLI wiring, `ReasoningParser` contract (non-streaming `extract_reasoning` + per-delta `extract_reasoning_streaming`), `is_reasoning_end` xgrammar gating,…
`vllm-speculative-decoding`	Pick, configure, tune, monitor vLLM speculative decoding in production. Eleven SpeculativeMethod options (ngram, ngram_gpu, medusa, mlp_speculator, draft_model, suffix, eagle, eagle3, dflash, mtp, extract_hidden_states), `--speculative-config` JSON…
`vllm-tool-parsers`	vLLM tool-calling operator reference — picking `--tool-call-parser` per model family, writing custom parsers via `--tool-parser-plugin`, navigating vLLM source + GitHub tracker to debug any specific tool-call question. Pointer map, not source…
`vuln-scan`	Static source-code vulnerability scan. Reads a target directory (and THREAT_MODEL.md if present), spawns parallel review subagents per focus area, and writes VULN-FINDINGS.json + .md for /triage to consume. Read-only — no building, running, or…

MIT licensed.
