Layer 5 of 9

Inference engine. Tokens drip out one forward pass at a time.

Sampling turns logits into language.

Each forward pass emits logits: raw, unnormalized scores over the whole vocabulary. A sampling policy then decides whether to pick the top-scoring token greedily or to explore by drawing from the resulting distribution.
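Here is a minimal, dependency-free sketch of the two policies in plain Python. Softmax turns raw logits into probabilities, greedy decoding takes the argmax, and ancestral sampling draws a token in proportion to its probability. The toy logits and the function names `softmax`, `greedy`, and `sample` are illustrative assumptions, not part of Deadwood's API.

```python
import math
import random

def softmax(logits):
    """Convert raw logits into probabilities (subtract the max for numerical stability)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def greedy(logits):
    """Greedy decoding: always pick the highest-scoring token ID."""
    return max(range(len(logits)), key=lambda i: logits[i])

def sample(logits, rng=random):
    """Ancestral sampling: draw a token ID with probability softmax(logits)[id]."""
    probs = softmax(logits)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

logits = [2.0, 1.0, 0.1, -1.0]           # toy 4-token vocabulary
print(greedy(logits))                     # 0, every time
print(sample(logits, random.Random(0)))   # usually 0, occasionally 1, 2, or 3
```

Greedy decoding is deterministic but prone to bland or looping output. Sampling trades some of that determinism for diversity.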

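Temperature divides the logits before the softmax. Values below 1 sharpen the distribution toward the top tokens, and values above 1 flatten it. Nucleus (top-p) sampling keeps only the smallest set of tokens whose cumulative probability reaches p, which trims the unlikely tail.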
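The following sketch shows temperature scaling followed by nucleus (top-p) filtering in plain Python. The `top_p=0.9` default is an assumed, commonly used value. The `temperature=0.65` default mirrors the setting in the `InferenceRuntime` example on this page. The function names are hypothetical, and this is not Deadwood's implementation.

```python
import math
import random

def softmax(logits, temperature=1.0):
    """Divide logits by the temperature, then normalize. T < 1 sharpens, T > 1 flattens."""
    if temperature <= 0:
        raise ValueError("temperature must be positive; use greedy decoding for T = 0")
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def nucleus_filter(probs, top_p):
    """Keep the smallest high-probability set whose mass reaches top_p, renormalized."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    mass = sum(probs[i] for i in kept)
    return {i: probs[i] / mass for i in kept}

def sample_nucleus(logits, temperature=0.65, top_p=0.9, rng=random):
    """Temperature-scale, trim to the nucleus, then sample from what is left."""
    nucleus = nucleus_filter(softmax(logits, temperature), top_p)
    ids = list(nucleus)
    return rng.choices(ids, weights=[nucleus[i] for i in ids], k=1)[0]

logits = [2.0, 1.0, 0.1, -1.0]
print(sample_nucleus(logits))  # 0 or 1: at T=0.65, tokens 2 and 3 fall outside the 0.9 nucleus
```

With these toy logits, T = 0.65 gives roughly 0.78 and 0.17 to the top two tokens. Their combined mass already exceeds 0.9, so the tail is never sampled.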


What this layer does

Pipeline

Detokenization maps token IDs back to bytes and stitches them into Unicode text. Streaming APIs flush partial strings as they are decoded, so the UI stays responsive.
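Streaming has one subtlety. A byte-level tokenizer can split a single multi-byte UTF-8 character across two tokens, so flushing each token's bytes as soon as it arrives would emit broken text. The sketch below uses Python's standard incremental UTF-8 decoder (`codecs.getincrementaldecoder`) to hold back incomplete sequences until they can be printed. The `VOCAB` table and `stream_detokenize` are hypothetical stand-ins for a real tokenizer.

```python
import codecs

# Hypothetical byte-level vocabulary: token ID -> raw bytes.
# "é" (U+00E9) is two UTF-8 bytes, deliberately split across tokens 1 and 2.
VOCAB = {0: b"Caf", 1: b"\xc3", 2: b"\xa9", 3: b"!"}

def stream_detokenize(token_ids, vocab=VOCAB):
    """Yield printable text chunks as token IDs arrive, buffering partial characters."""
    decoder = codecs.getincrementaldecoder("utf-8")()
    for tid in token_ids:
        chunk = decoder.decode(vocab[tid])
        if chunk:  # empty while waiting for the rest of a multi-byte character
            yield chunk
    tail = decoder.decode(b"", final=True)  # raises UnicodeDecodeError if cut mid-character
    if tail:
        yield tail

print(list(stream_detokenize([0, 1, 2, 3])))  # ['Caf', 'é', '!']
```

Token 1 on its own produces no output. The decoder waits for token 2 to complete the character, so the reader never sees a replacement glyph mid-stream.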

The problem without Deadwood

Without custodianship, your team inherits every sharp edge below.

Rebuild KV caches per framework fork.
Calibrate sampling knobs without telemetry.
Guard against degenerate repetition manually.
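One widely used guard against degenerate repetition is the repetition penalty introduced with the CTRL model. Before sampling, it scales down the logits of tokens that have already been generated. The sketch below applies that technique to a plain list of logits. `apply_repetition_penalty` and its `penalty=1.2` default are illustrative assumptions, and the sketch does not describe how Deadwood itself handles repetition.

```python
def apply_repetition_penalty(logits, generated_ids, penalty=1.2):
    """CTRL-style penalty: make already-generated tokens less likely.

    Positive logits are divided by `penalty` and negative ones are multiplied by it,
    so a penalized token always moves toward lower probability.
    """
    if penalty < 1.0:
        raise ValueError("penalty should be >= 1.0")
    out = list(logits)
    for tid in set(generated_ids):  # each seen token is penalized once
        out[tid] = out[tid] / penalty if out[tid] > 0 else out[tid] * penalty
    return out

logits = [2.4, 1.0, -0.5]
print(apply_repetition_penalty(logits, generated_ids=[0, 2, 0]))  # [2.0, 1.0, -0.6]
```

Negative logits have to be multiplied rather than divided. Dividing a negative logit by a penalty above 1 would move it toward zero and make the repeated token more likely.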

Typical DIY cost

Timeline: 2–4 weeks hardening
Budget: $25k–$90k in engineering time
Expertise: Serving engineers + UX writers

Deadwood's solution

Opinionated APIs wire custodied data, runners, and proofs together, with no boilerplate archaeology.

from deadwood import InferenceRuntime

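# Assumption: `finetuned` is the fine-tuned model handle carried over from the
# previous layer (Training algorithm); it is not defined in this snippet.
runtime = InferenceRuntime(
    model=finetuned,
    decoding="nucleus",   # nucleus (top-p) sampling
    temperature=0.65,     # below 1.0 sharpens the distribution
)

# Print text as it streams in; flush=True makes each partial chunk appear immediately.
for chunk in runtime.stream("Summarize risk..."):
    print(chunk, end="", flush=True)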

How Deadwood custodies this layer

InferenceRuntime shares kernels with the Optimization custodians, so sampling stability automatically inherits their batching decisions.

Next steps

Continue the tour

Follow how custody chains into Optimization & serving.

Next: Optimization & serving

Run a workload

Provision runners and metered jobs: describe the outcome, not every knob.

Start a job

Talk to custodians

White-glove onboarding for regulated teams and bespoke stacks.

Schedule a demo

← Previous: Training algorithm