Layer 5 of 9
Sampling turns logits into language.
Each forward pass emits logits — raw scores over vocabulary. Sampling policies decide whether to pick greedily or explore.
Temperature reshapes probability mass; nucleus sampling trims unlikely tails.
Detokenization stitches IDs back into Unicode; streaming APIs flush partial strings for responsive UX.
Without custodianship, your team inherits every sharp edge below.
Typical DIY cost
Opinionated APIs wire custodied data, runners, and proofs together — no boilerplate archaeology.
from deadwood import InferenceRuntime
runtime = InferenceRuntime(
model=finetuned,
decoding="nucleus",
temperature=0.65,
)
for chunk in runtime.stream("Summarize risk..."):
print(chunk, end="")InferenceRuntime shares kernels with Optimization custodians — sampling stability inherits batching decisions automatically.