Layer 5 of 9

Inference engine. Tokens drip one matmul at a time.

Sampling turns logits into language.

Each forward pass emits logits: raw, unnormalized scores over the vocabulary. The sampling policy decides whether to pick the top-scoring token greedily or to explore by drawing from the distribution.

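Temperature reshapes probability mass: logits are divided by it before the softmax, so values below 1 sharpen the distribution and values above 1 flatten it. Nucleus (top-p) sampling trims the unlikely tail by keeping only the smallest set of tokens whose cumulative probability reaches p.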
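The two knobs above can be sketched in a few lines of plain Python. This is a minimal illustration, not Deadwood's implementation: the function names (`softmax_with_temperature`, `nucleus_filter`, `sample_token`, `greedy_token`) and the default `top_p=0.9` are assumptions made for the example.

```python
import math
import random


def softmax_with_temperature(logits, temperature=1.0):
    """Turn logits into probabilities after dividing by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive; use greedy_token for argmax")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def nucleus_filter(probs, top_p=0.9):
    """Keep the smallest high-probability set reaching top_p, then renormalize."""
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in ranked:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    mass = sum(probs[i] for i in kept)
    return {i: probs[i] / mass for i in kept}


def sample_token(logits, temperature=1.0, top_p=0.9, rng=random):
    """Explore: draw one token ID from the temperature-scaled nucleus."""
    nucleus = nucleus_filter(softmax_with_temperature(logits, temperature), top_p)
    ids = list(nucleus)
    weights = [nucleus[i] for i in ids]
    return rng.choices(ids, weights=weights, k=1)[0]


def greedy_token(logits):
    """Exploit: always return the highest-scoring token ID."""
    return max(range(len(logits)), key=lambda i: logits[i])
```

Passing a seeded `random.Random` as `rng` makes draws reproducible, which helps when comparing knob settings side by side.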

What this layer does

Pipeline

Detokenization stitches IDs back into Unicode; streaming APIs flush partial strings for responsive UX.
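One wrinkle in streaming is that a single character can span several tokens when the tokenizer works at the byte level. The sketch below shows one way to flush only complete characters, using Python's standard incremental UTF-8 decoder. It assumes each token arrives as a `bytes` object; `stream_text` is a hypothetical helper, not part of any Deadwood API.

```python
import codecs


def stream_text(token_bytes_iter):
    """Yield text chunks from per-token byte strings without splitting characters."""
    decoder = codecs.getincrementaldecoder("utf-8")(errors="replace")
    for piece in token_bytes_iter:
        # The decoder buffers trailing bytes of an incomplete character
        # and returns them once the rest of the character arrives.
        text = decoder.decode(piece)
        if text:
            yield text
    # Flush whatever is left when the stream ends.
    tail = decoder.decode(b"", final=True)
    if tail:
        yield tail
```

For example, feeding it the UTF-8 bytes of "héllo" one byte at a time yields "h", then "é" only after both of its bytes have arrived, then the remaining letters.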

The problem without Deadwood

Without custodianship, your team inherits every sharp edge below.

  • Rebuild KV caches per framework fork.
  • Calibrate sampling knobs without telemetry.
  • Guard against degenerate repetition manually.
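To give a sense of what guarding against repetition by hand involves, here is a minimal sketch of one common technique, no-repeat n-gram blocking: before each sampling step, ban any token that would recreate an n-gram already present in the output. The helper names and the default `n=3` are assumptions for illustration only.

```python
def banned_next_tokens(generated, n=3):
    """Return token IDs that would repeat an n-gram already in `generated`."""
    if n < 1:
        raise ValueError("n must be at least 1")
    if len(generated) < n:
        return set()  # no complete n-gram exists yet
    # The last n-1 tokens form the prefix of the n-gram about to be completed.
    prefix = tuple(generated[len(generated) - (n - 1):])
    banned = set()
    for start in range(len(generated) - n + 1):
        if tuple(generated[start:start + n - 1]) == prefix:
            banned.add(generated[start + n - 1])
    return banned


def mask_logits(logits, banned):
    """Set banned token scores to -inf so they can never be sampled."""
    return [float("-inf") if i in banned else x for i, x in enumerate(logits)]
```

With `generated = [5, 6, 7, 5, 6]` and `n=3`, the trigram (5, 6, 7) has already appeared, so token 7 is banned for the next step.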

Typical DIY cost

Timeline: 2–4 weeks of hardening
Budget: $25k–$90k in engineering time
Expertise: Serving engineers + UX writers

Deadwood's solution

Opinionated APIs wire custodied data, runners, and proofs together — no boilerplate archaeology.

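from deadwood import InferenceRuntime

# `finetuned` is assumed to be a model handle produced by an earlier layer.
runtime = InferenceRuntime(
    model=finetuned,
    decoding="nucleus",  # nucleus (top-p) sampling
    temperature=0.65,    # below 1.0, so sharper than the raw distribution
)

# Print chunks as they arrive; flush so partial output shows immediately.
for chunk in runtime.stream("Summarize risk..."):
    print(chunk, end="", flush=True)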

How Deadwood custodies this layer

InferenceRuntime shares kernels with Optimization custodians — sampling stability inherits batching decisions automatically.

Next steps

Continue the tour

Follow how custody chains into Optimization & serving.

Next: Optimization & serving
