Fail-Closed Lowering of Resident KV Claims

A Resident KV paper on when runtime primitives, adapters, or patches can be treated as evidence for accepted future-KV obligations.

Lowering, not a scoreboard

The second Resident KV paper is now available as an arXiv preprint: Fail-Closed Lowering of Resident KV Claims onto LLM Serving Runtimes.

The question is not whether modern serving runtimes have useful KV-cache features. They do. Priority fields, TTL-like duration, offload paths, block events, active no-evict modes, and KV-aware routing are real systems surfaces.

The question is more specific:

When can one of those surfaces be treated as evidence that a runtime has accepted responsibility for future reusable KV?

That is the lowering problem.

Feature names are not enough. A runtime can expose priority, offload, events, and routing without accepting a future-reuse claim. The paper asks what has to be true before a runtime primitive, adapter, or patch can be counted as ResidentClaim conformance.

What has to bind

The first Resident KV paper defined the contract. This paper studies what it takes to lower that contract onto real runtime surfaces.

A ResidentClaim is an accepted future-reuse object for KV state: stronger than a cache hint, and only meaningful if the runtime can say what it accepted, when the state is useful, and what happened when pressure arrived.

The central boundary is obligation-level. A conforming path has to bind behavior to an accepted claim identity, a materialization predicate, ordered lifecycle events, and claim-scoped outcomes.
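A minimal sketch of what "binding" could look like as data, assuming a simplified evidence record. The names `LifecycleEvent`, `ClaimEvidence`, and `missing_obligations` are hypothetical illustrations, not the paper's descriptor format or the checker's API.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class LifecycleEvent:
    seq: int                  # position in the runtime's event order (assumed)
    claim_id: Optional[str]   # None means the event is not tied to a claim
    kind: str                 # e.g. "accepted", "offloaded", "restored"


@dataclass
class ClaimEvidence:
    """Hypothetical record of what a lowering path actually bound."""
    claim_id: Optional[str] = None
    materialization_predicate: Optional[str] = None
    events: List[LifecycleEvent] = field(default_factory=list)
    outcome_claim_id: Optional[str] = None


def missing_obligations(ev: ClaimEvidence) -> List[str]:
    """Return the obligations this evidence fails to bind."""
    missing = []
    if not ev.claim_id:
        missing.append("claim_identity")
    if not ev.materialization_predicate:
        missing.append("materialization_predicate")
    # Only events carrying the accepted claim identity count.
    seqs = [e.seq for e in ev.events if ev.claim_id and e.claim_id == ev.claim_id]
    strictly_ordered = all(a < b for a, b in zip(seqs, seqs[1:]))
    if not seqs or not strictly_ordered:
        missing.append("ordered_lifecycle_events")
    if ev.outcome_claim_id is None or ev.outcome_claim_id != ev.claim_id:
        missing.append("claim_scoped_outcome")
    return missing


bound = ClaimEvidence(
    claim_id="claim-1",
    materialization_predicate="prefix tokens materialized on worker",
    events=[
        LifecycleEvent(1, "claim-1", "accepted"),
        LifecycleEvent(2, "claim-1", "offloaded"),
    ],
    outcome_claim_id="claim-1",
)
print(missing_obligations(bound))            # nothing missing
print(missing_obligations(ClaimEvidence()))  # all four obligations missing
```

The point of the sketch is that an empty record fails on every obligation: nothing is assumed to be bound unless the evidence says so.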

Each of those terms rules out a specific shortcut.

If bytes move to host memory, that does not by itself show that a claimed future-reuse object can be restored before it is reused. If a block is removed, that does not by itself show claim harm. If a router sends a request to a worker with related cached KV, that does not by itself show that route cost, placement, and later reuse were attributed to an accepted claim.

The paper's discipline is fail-closed: ambiguity, missing identity, missing order, fallback recompute, generic counters, and storage-only evidence do not get upgraded into semantic conformance.
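The fail-closed rule can be sketched as a predicate that only returns true when every required fact is explicitly established and every disqualifier is explicitly ruled out. The fact names and the function `counts_as_conformance` are assumptions made for illustration; the paper's checker may structure this differently.

```python
from typing import Mapping, Optional

# Facts that must be positively shown (hypothetical names).
REQUIRED_TRUE = (
    "claim_identity_bound",
    "materialization_predicate_bound",
    "lifecycle_ordered",
    "outcome_claim_scoped",
)

# Conditions that must be positively ruled out (hypothetical names).
MUST_BE_RULED_OUT = (
    "ambiguous",
    "fallback_recompute",
    "generic_counters_only",
    "storage_only",
)


def counts_as_conformance(obs: Mapping[str, Optional[bool]]) -> bool:
    """Fail closed: a missing or None observation is treated as not proven."""
    for name in REQUIRED_TRUE:
        if obs.get(name) is not True:
            return False
    for name in MUST_BE_RULED_OUT:
        if obs.get(name) is not False:
            return False
    return True


full = {name: True for name in REQUIRED_TRUE}
full.update({name: False for name in MUST_BE_RULED_OUT})

# Same evidence, but nobody checked whether the reuse was a fallback recompute.
unchecked = dict(full)
del unchecked["fallback_recompute"]

print(counts_as_conformance(full))
print(counts_as_conformance(unchecked))
```

The design choice worth noticing is the use of `is not True` and `is not False`: an absent key never defaults in the runtime's favor.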

What the paper adds

The paper contributes a fail-closed lowering relation, a checker, a descriptor format, and a bad-lowering suite. The checker classifies anchored evidence by ResidentClaim obligations rather than by whether a feature name sounds nearby.

That produces five evidence labels:

Native conformance: the backend itself supplies the obligations.
Adapter-scoped evidence: a trusted join or hook supplies a bounded obligation boundary.
Approximation substrate: useful primitives exist, but key obligations are missing.
Rejected mapping: the proposed lowering confuses a nearby feature with the contract.
Unknown evidence: the public evidence is inconclusive.
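As a rough illustration of how the five labels relate to one another, the sketch below assigns a label from a few boolean inputs. The decision order, the parameter names, and the function `label_row` are assumptions for this example; they are not the checker's actual rules.

```python
from enum import Enum
from typing import Optional, Set


class EvidenceLabel(Enum):
    NATIVE_CONFORMANCE = "native conformance"
    ADAPTER_SCOPED = "adapter-scoped evidence"
    APPROXIMATION_SUBSTRATE = "approximation substrate"
    REJECTED_MAPPING = "rejected mapping"
    UNKNOWN = "unknown evidence"


def label_row(
    *,
    conclusive: bool,
    confuses_feature_with_contract: bool,
    missing_obligations: Set[str],
    supplied_by: Optional[str],  # "backend", "adapter", or None (assumed values)
) -> EvidenceLabel:
    """Toy labeling: anything not positively established falls to a weaker label."""
    if not conclusive:
        return EvidenceLabel.UNKNOWN
    if confuses_feature_with_contract:
        return EvidenceLabel.REJECTED_MAPPING
    if missing_obligations:
        return EvidenceLabel.APPROXIMATION_SUBSTRATE
    if supplied_by == "backend":
        return EvidenceLabel.NATIVE_CONFORMANCE
    if supplied_by == "adapter":
        return EvidenceLabel.ADAPTER_SCOPED
    # No identified supplier for the obligations: do not upgrade.
    return EvidenceLabel.UNKNOWN


row = label_row(
    conclusive=True,
    confuses_feature_with_contract=False,
    missing_obligations={"claim_scoped_outcome"},
    supplied_by="backend",
)
print(row.value)
```

In this toy ordering, a useful primitive with a missing obligation lands on "approximation substrate" even when the backend itself supplies everything else.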

The important part is not the label vocabulary. The important part is that positive and negative rows both have to pay rent in evidence.

How to read the runtime rows

The runtime rows are case studies in evidence boundaries. They are not product judgments.

In the audited descriptors and evidence, public TensorRT-LLM, SGLang/HiCache, and Dynamo-style routing expose strong substrates and selected adapter positives. The paper does not treat those public surfaces as native ResidentClaim conformance.

That distinction matters. A priority field can be useful without proving claim-scoped priority influence. A storage tier can be useful without proving restore-before-reuse and claim-scoped restoration failure. A routing score can be useful without proving accepted-claim placement and reuse attribution.

Calling those rows approximate or adapter-scoped is not a dismissal. It is the point of the audit boundary.

The local vLLM witness

The positive systems witness is deliberately scoped: a local patched vLLM connector/scheduler-boundary mechanism at backend-patch depth.

In that witness, claim metadata flows through real in-process offload/load behavior. Under controlled same-claim restoration failure, the failure reaches vLLM's invalid-KV-load path and becomes an ordered scheduler-boundary fail-closed active outcome.

That is useful because it makes the missing lifecycle/outcome semantics concrete. It is not a claim of upstream vLLM support, native conformance, or production performance.

The supporting artifact is here: resident-kv-lowering-artifact. It contains the checker, descriptors, generated matrix, bad-lowering counterexamples, and selected runtime evidence used to audit the lowering claims.

What would move a row

The useful review question is not whether a runtime has a feature with a nearby name. It is whether there is a native path, event, invariant, or artifact that supplies the missing obligations at the asserted depth.

For example:

Is there accepted claim identity?
Is the reusable state tied to a materialization predicate?
Are lifecycle transitions ordered before the outcome they explain?
Are failure, refusal, harm, demotion, expiry, placement, or reuse attributed to the claim rather than only to a request, block, worker, or counter?
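The last two questions can be made concrete with a small check: an outcome counts as explained only if it carries a claim identity and every lifecycle transition for that claim precedes it. The `Record` type and `outcome_is_explained` function are hypothetical, and integer sequence numbers stand in for whatever ordering a real runtime exposes.

```python
from typing import List, NamedTuple, Optional


class Record(NamedTuple):
    seq: int                          # assumed total order over runtime events
    kind: str                         # e.g. "accepted", "offloaded", "restore_failed"
    claim_id: Optional[str]           # None: not attributed to any claim
    request_id: Optional[str] = None  # request-level attribution alone is not enough


def outcome_is_explained(transitions: List[Record], outcome: Record) -> bool:
    """True only if the outcome is claim-scoped and preceded by that claim's transitions."""
    if outcome.claim_id is None:
        # Attributed only to a request, block, worker, or counter.
        return False
    own = [t for t in transitions if t.claim_id == outcome.claim_id]
    if not own:
        return False
    return all(t.seq < outcome.seq for t in own)


transitions = [
    Record(1, "accepted", "claim-1"),
    Record(2, "offloaded", "claim-1"),
]
claim_scoped = Record(3, "restore_failed", "claim-1")
request_only = Record(3, "restore_failed", None, request_id="req-9")

print(outcome_is_explained(transitions, claim_scoped))
print(outcome_is_explained(transitions, request_only))
```

A failure recorded only against `req-9` is real information, but under this check it does not answer the attribution question.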

If the answer to all of these is yes for a row the paper classifies more conservatively, that is exactly the kind of correction this page invites.

What it does not claim

This is not a KV eviction algorithm, a production offload evaluation, a throughput result, or a serving benchmark.

It does not claim native or upstream vLLM ResidentClaim support. It does not claim native TensorRT-LLM, SGLang/HiCache, or Dynamo conformance. It does not claim scheduler-native pre-admission support. It does not establish how these runtimes behave outside the audited evidence. It does not claim existing runtimes lack useful KV machinery.

The bounded result is still useful:

Feature primitives are necessary substrates, but accepted future-KV responsibility requires evidence of specific obligations.

That is the paper's job. It keeps the useful runtime machinery visible while refusing to let nearby feature names do the work of a contract.