kvcache

Here are 15 public repositories matching this topic...

Zefan-Cai / R-KV

[NeurIPS 2025] R-KV: Redundancy-aware KV Cache Compression for Reasoning Models

llm kvcache reasoning-models

Updated Jun 23, 2026
Python

ovg-project / kvcached

Virtualized Elastic KV Cache for Dynamic GPU Sharing and Beyond

serverless inference-engine llm llm-serving vllm llm-inference ollama llm-framework sglang kvcache gpu-sharing kvcached gpu-mutiplexing kvcache-optimization elastic-kvcache online-offline-coserve

Updated Jun 12, 2026
Python

ModelEngine-Group / unified-cache-management

Persist and reuse the KV cache to speed up your LLM.

gpu cuda nfs torch ssd dram hbm ucm npu ascend llm vllm deepseek kvcache

Updated Jun 27, 2026
Python

NoakLiu / PiKV

PiKV: KV Cache Management System for Mixture of Experts [Efficient ML System]

distributed-systems parallel-computing moe mixture-model management-system mixture-of-experts mlsystem kv-cache kvcache

Updated Jun 12, 2026
Python

SiO-2 / kvcloak

Official implementation of "Shadow in the Cache: Unveiling and Mitigating Privacy Risks of KV-cache in LLM Inference" (NDSS 2026)

privacy llm kvcache ndss-2026

Updated Feb 28, 2026
Python

lizixi-0x2F / March

High-Performance KV Cache Sharing Library

optimization high-performance-computing kv-cache llm vllm kvcache

Updated Apr 2, 2026
Python

xxrjun / gb200-kvcache-offload-study

An empirical benchmarking study of LLM inference with KV cache offloading, using vLLM and LMCache on NVIDIA GB200 with high-bandwidth NVLink-C2C.

offloading blackwell kvcache gb200

Updated Dec 20, 2025
Python

llmsresearch / kvcompress

KV-cache compression for LLMs: reference implementations of TurboAngle and TurboQuant codecs with Triton GPU kernels

kvcache kvcache-compression turboquant turboangle

Updated Apr 5, 2026
Python

amitshekhariitbhu / turboquant-experiment

KV cache with PagedAttention vs. PagedAttention + TurboQuant: experiments across token sizes comparing memory, latency, and accuracy.

inference large-language-models llm llms llm-inference kvcache kvcache-optimization kvcache-compression turboquant

Updated Mar 26, 2026
Python

muyuuuu / LLM-Inference

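Instead of scrolling on your phone after work in the evening, learn something. Series 2: hand-write an LLM inference framework from scratch, completing local single-GPU deployment and GPU inference optimization of the Qwen3-4B model; if GPU memory is insufficient, Qwen3-0.5B can be used instead.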

triton sampling llm-inference flash-attention kvcache qwen3 page-attention

Updated Feb 23, 2026
Python

NazmulTakbir / FlexiCache

[MLSys-26] FlexiCache: Leveraging Temporal Stability of Attention Heads for Efficient KV Cache Management

vllm llm-inference kvcache kvcache-optimization

Updated Mar 9, 2026
Python

nihilistau / Position_Is_Arithmetic

Prime Power Transformer: A Number-Theoretic Architecture for Compute

machine-learning ai machine-learning-algorithms distributed-computing transformers attention-mechanism distributed-ledger attention-model attention-is-all-you-need attention-mechanisms transformer-architecture mathematical-proof llm kvcache kvcache-optimization kvcache-compression

Updated Jun 28, 2026
Python

ZhengtongYan / PQCache

[SIGMOD 2025] PQCache: Product Quantization-based KVCache for Long Context LLM Inference

database reproducibility sigmod llm kvcache

Updated Feb 3, 2025
Python

s7a9 / C2KV

C2KV: Compressed and Composable KV Cache Reuse for Efficient LLM Inference

llm kvcache

Updated Jun 24, 2026
Python

Norinatiring463 / loop-engineering

Design and audit autonomous agent loops using a tested framework for triggers, verification, guardrails, and human escalation.

automation deep-dive devtools inference multi-agent ai-engineering ai-agent prompt-engineering anthropic agentic-framework agentic-rag kvcache agentic-ai agent-memory coding-agent context-engineering agent-harness loop-engineering

Updated Jun 28, 2026
Python
