Resident-Claim KV Cache Behavior
A semantic contract for how inference runtimes should behave when resident KV blocks and active KV pressure compete for the same memory pool.
Plain-language summary
This work defines a contract for how an inference runtime should behave when resident key-value cache blocks and active key-value pressure compete for the same memory pool. In simpler terms: when the system cannot hold everything at once, it should still behave predictably.
Technical summary
The paper defines a semantic conformance contract for resident claims over KV cache blocks, specifying the runtime behavior expected under constrained cache pressure. It defines the contract before any implementation, so that different runtimes can reason about resident behavior consistently.
Why it matters
AI systems are increasingly shaped by runtime constraints such as memory pressure. If behavior under that pressure is unclear, those systems become harder to reason about, optimize, and trust.
Contract before implementation
Status note: this manuscript has been submitted but is not yet published. I will add a link to the public paper here once one exists.
The work starts with a deliberately narrow question: what should an inference runtime promise when resident KV blocks and active KV pressure cannot both fit comfortably in the same pool?
The point is not to hide constraints. The point is to make behavior under constraint legible enough that systems can be tested, compared, and trusted.
The pressure case
Modern inference stacks are shaped by memory pressure. When that pressure becomes concrete, the runtime has to decide which blocks remain resident, which blocks move, and what invariants callers can depend on.
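To make the pressure case concrete, here is a minimal Python sketch of a fixed-capacity pool. It is a toy illustration, not the contract from the paper: the names ToyKVPool, PoolExhausted, allocate, in_pool, and moved_out are hypothetical, and the two rules it enforces (claimed blocks are never moved, and unmet demand fails loudly) are assumptions chosen only to show what a caller-visible invariant might look like.

```python
from collections import OrderedDict


class PoolExhausted(Exception):
    """Raised when demand cannot be met without breaking a resident claim."""


class ToyKVPool:
    """Toy fixed-capacity block pool. Every name here is hypothetical."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        # block id -> True if the block holds a resident claim.
        # Insertion order doubles as age, oldest first.
        self.in_pool = OrderedDict()
        # Unclaimed blocks pushed out under pressure, in the order they moved.
        self.moved_out = []

    def allocate(self, block_id: str, resident: bool = False) -> None:
        if block_id in self.in_pool:
            raise ValueError(f"block {block_id!r} is already in the pool")
        while len(self.in_pool) >= self.capacity:
            # Assumed rule 1: only unclaimed blocks may be moved, oldest first.
            victim = next(
                (b for b, claimed in self.in_pool.items() if not claimed), None
            )
            if victim is None:
                # Assumed rule 2: fail loudly rather than silently drop a claim.
                raise PoolExhausted(f"no room for {block_id!r}")
            del self.in_pool[victim]
            self.moved_out.append(victim)
        self.in_pool[block_id] = resident


if __name__ == "__main__":
    pool = ToyKVPool(capacity=3)
    pool.allocate("system-prompt", resident=True)
    for request in ("req-1", "req-2", "req-3"):
        pool.allocate(request)
    print(list(pool.in_pool))  # blocks still in the pool
    print(pool.moved_out)      # blocks moved out under pressure
```

Under these assumed rules a caller can depend on two things: a block that was allocated with a resident claim stays in the pool, and an allocation that cannot be satisfied raises PoolExhausted instead of quietly evicting claimed data.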
This paper treats resident claims as a semantic contract rather than an implementation trick. That distinction matters because it separates the behavior a runtime should expose from the specific allocator or backend that happens to provide it.
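One way to picture the split between exposed behavior and backend is a single conformance check run against two unrelated implementations. The Python sketch below is an assumption-laden illustration, not the paper's interface: ResidentClaims, SetBackend, SlotBackend, check_conformance, and the specific properties being checked are all hypothetical.

```python
from typing import Protocol


class ResidentClaims(Protocol):
    """Hypothetical caller-facing surface, independent of any allocator."""

    def claim(self, block_id: str) -> bool: ...
    def is_resident(self, block_id: str) -> bool: ...
    def release(self, block_id: str) -> None: ...


class SetBackend:
    """Tracks claims in a set."""

    def __init__(self, capacity: int) -> None:
        self._capacity = capacity
        self._claimed = set()

    def claim(self, block_id: str) -> bool:
        if block_id in self._claimed:
            return True
        if len(self._claimed) >= self._capacity:
            return False  # explicit refusal, nothing else is disturbed
        self._claimed.add(block_id)
        return True

    def is_resident(self, block_id: str) -> bool:
        return block_id in self._claimed

    def release(self, block_id: str) -> None:
        self._claimed.discard(block_id)


class SlotBackend:
    """Tracks claims in a fixed array of slots."""

    def __init__(self, capacity: int) -> None:
        self._slots = [None] * capacity

    def claim(self, block_id: str) -> bool:
        if block_id in self._slots:
            return True
        for i, slot in enumerate(self._slots):
            if slot is None:
                self._slots[i] = block_id
                return True
        return False

    def is_resident(self, block_id: str) -> bool:
        return block_id in self._slots

    def release(self, block_id: str) -> None:
        for i, slot in enumerate(self._slots):
            if slot == block_id:
                self._slots[i] = None


def check_conformance(cache: ResidentClaims, capacity: int) -> None:
    """Assumed properties only; expects capacity >= 1."""
    ids = [f"b{i}" for i in range(capacity)]
    for block_id in ids:
        assert cache.claim(block_id)
    # A claim beyond capacity is refused, and accepted claims survive the refusal.
    assert not cache.claim("overflow")
    assert all(cache.is_resident(block_id) for block_id in ids)
    assert not cache.is_resident("overflow")
    # Releasing a claim frees room for a new one.
    cache.release(ids[0])
    assert not cache.is_resident(ids[0])
    assert cache.claim("overflow")


if __name__ == "__main__":
    for backend in (SetBackend(4), SlotBackend(4)):
        check_conformance(backend, capacity=4)
        print(type(backend).__name__, "conforms")
```

The check never inspects how either backend stores its claims. That is the design choice this sketch is meant to show: the test depends only on behavior a caller can observe, so the storage strategy can change without the test changing.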
Follow-up
For the story around the paper, see "The Dream Is Large. The Claim Is Small.", a field note on why this small contract feels like a first possible contribution to open-source AI/ML.
The next step is implementation work, but I am keeping that off the public site until the submitted paper has a public home.