Layer 9 of 9

Feedback & fine-tuning. Users teach better than static datasets.

Close the loop or drift wins.

Supervised fine-tuning ingests curated corrections; RLHF shapes rewards from human rankers.

DPO aligns preferences without explicit reward models; online loops ingest streaming signals.
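To make the "no explicit reward model" point concrete, here is a minimal sketch of the standard per-pair DPO objective in plain Python. The function names and the `beta=0.1` default are illustrative assumptions, not Deadwood's API; the inputs are summed log-probabilities of the chosen and rejected responses under the policy being trained and under a frozen reference model.

```python
import math

def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-pair DPO loss: -log(sigmoid(beta * (chosen_logratio - rejected_logratio)))."""
    # The implicit reward of a response is beta * (log pi - log pi_ref),
    # so no separate reward model is trained.
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_logratio - rejected_logratio)
    # -log(sigmoid(m)) equals softplus(-m).
    return softplus(-margin)
```

When the policy prefers the chosen response more strongly than the reference does, the margin is positive and the loss falls below log 2; when it prefers the rejected one, the loss rises above it.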

What this layer does

Feedback mechanics

Each loop feeds custodied datasets — consent tags propagate forward.
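One way to picture consent tags propagating forward is the sketch below. It is an assumption about how such propagation could work, not Deadwood's implementation: `Record`, `derive`, and `usable_for` are hypothetical names, and the rule that a derived record keeps only the consent shared by all of its parents is a conservative choice made for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Record:
    text: str
    consent: frozenset  # purposes the user agreed to, e.g. {"training"}

def derive(parents, text):
    """Build a derived record whose consent is the intersection of its parents' tags."""
    if not parents:
        raise ValueError("a derived record needs at least one parent")
    consent = frozenset.intersection(*(p.consent for p in parents))
    return Record(text=text, consent=consent)

def usable_for(records, purpose):
    """Keep only the records whose consent tags include the given purpose."""
    return [r for r in records if purpose in r.consent]
```

Under this rule, a preference pair built from one record tagged for training and one tagged only for analytics would not be usable for training.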

Deadwood sequences retraining jobs through Training and Evaluation custodians automatically.

The problem without Deadwood

Without custodianship, your team inherits every sharp edge below.

  • Manually export chat logs into spreadsheets.
  • Risk leaking PII into reward datasets.
  • Lose reproducibility when reward hacks are patched offline without a versioned record.
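The PII risk in the list above is easy to underestimate. The sketch below shows the kind of hand-rolled scrubbing a DIY pipeline often starts with. The patterns and the `redact` helper are illustrative assumptions; they catch only simple email and phone formats, and they are not what Deadwood's `redact-pii` setting does internally.

```python
import re

# Deliberately simple patterns: real PII detection needs far broader coverage.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE = re.compile(r"\+?\d[\d\s().-]{7,}\d")

def redact(text):
    """Replace email addresses and phone-like numbers with placeholder tokens."""
    text = EMAIL.sub("[EMAIL]", text)
    return PHONE.sub("[PHONE]", text)
```

Names, addresses, and account numbers pass straight through a filter like this, which is how PII ends up in reward datasets.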

Typical DIY cost

Timeline: rolling 2–4 week cycles
Budget: $50k–$200k per iteration
Expertise: human raters + alignment researchers

Deadwood's solution

Opinionated APIs wire custodied data, runners, and proofs together — no boilerplate archaeology.

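from deadwood import FeedbackLoop

# `finetuned` and `interactions` are assumed to be defined earlier:
# the fine-tuned model and the logged user interactions to learn from.
loop = FeedbackLoop(
    model=finetuned,
    strategy="dpo",        # preference optimization, no explicit reward model
    privacy="redact-pii",  # scrub PII before signals reach training data
)

loop.ingest(interactions)                # add new feedback signals to the loop
loop.schedule_retrain(cadence="weekly")  # queue retraining on a weekly cadence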

How Deadwood custodies this layer

FeedbackLoop sanitizes signals, aligns them with manifests, and opens tickets only when evaluation gates pass.
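As a rough illustration of gating, the sketch below opens a ticket only when every gated metric clears its threshold. All names, the ticket shape, and the "missing metric fails the gate" rule are assumptions made for this example rather than Deadwood's actual behavior.

```python
def gates_pass(metrics, thresholds):
    """Return True only if every gated metric is present and meets its minimum."""
    return all(
        metrics.get(name, float("-inf")) >= minimum
        for name, minimum in thresholds.items()
    )

def maybe_open_ticket(candidate_id, metrics, thresholds):
    """Return a ticket dict when all gates pass, otherwise None."""
    if not gates_pass(metrics, thresholds):
        return None
    return {
        "candidate": candidate_id,
        "status": "ready-for-review",
        "metrics": dict(metrics),  # copy so later edits do not alter the ticket
    }
```

Treating a missing metric as a failure keeps a candidate from slipping through just because one evaluation never ran.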

Custody means your policy stack stays coherent — no rogue adapters trained on unapproved logs.
