Layer 9 of 9

Feedback & fine-tuning. Users teach better than static datasets.

Close the loop or drift wins.

Supervised fine-tuning ingests curated corrections; RLHF learns a reward signal from human rankings.
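To make the ranking step concrete, here is a minimal sketch of how one rater's ranking is commonly expanded into pairwise comparisons and scored with a pairwise logistic loss. The function names and sample data are illustrative assumptions, not part of Deadwood's API.

```python
import math
from itertools import combinations


def ranking_to_pairs(ranked_responses):
    """Expand one rater's ranking (best first) into (chosen, rejected) pairs."""
    # combinations() preserves input order, so the first item of every pair
    # is ranked above the second.
    return list(combinations(ranked_responses, 2))


def pairwise_reward_loss(r_chosen, r_rejected):
    """Pairwise logistic loss: -log(sigmoid(r_chosen - r_rejected))."""
    margin = r_chosen - r_rejected
    # Numerically stable softplus(-margin).
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# Hypothetical ranking of three candidate answers, best first.
pairs = ranking_to_pairs(["answer_a", "answer_b", "answer_c"])
```

The loss shrinks as the reward model scores the chosen response further above the rejected one, which is what lets rankings act as a training signal.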

DPO optimizes directly on preference pairs without an explicit reward model; online loops ingest streaming signals.
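For readers who want to see why DPO needs no separate reward model, the per-pair objective can be sketched as below. It uses only log-probabilities from the policy and a frozen reference model. The function name, argument names, and the default beta are assumptions chosen for illustration.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-pair DPO loss: -log(sigmoid(beta * (policy margin - reference margin)))."""
    policy_margin = policy_chosen_logp - policy_rejected_logp
    ref_margin = ref_chosen_logp - ref_rejected_logp
    logits = beta * (policy_margin - ref_margin)
    # Numerically stable softplus(-logits), equal to -log(sigmoid(logits)).
    if logits >= 0:
        return math.log1p(math.exp(-logits))
    return -logits + math.log1p(math.exp(logits))
```

When the policy and the reference agree, the loss is log 2; it falls as the policy favors the chosen response more strongly than the reference does.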

What this layer does

Feedback mechanics

Each loop feeds custodied datasets — consent tags propagate forward.
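One way to picture consent tags propagating forward: a derived record inherits only the purposes that every one of its parents allows. The sketch below is an assumption about how such propagation could work, not Deadwood's implementation; `Record`, `derive`, and `allowed_for` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    text: str
    consent: frozenset  # purposes the user agreed to, e.g. {"training"}


def derive(parents, text):
    """Build a derived record; needs at least one parent.

    The result carries the intersection of its parents' consent tags, so a
    derived record can never gain a purpose that a parent lacked.
    """
    consent = frozenset.intersection(*(p.consent for p in parents))
    return Record(text=text, consent=consent)


def allowed_for(records, purpose):
    """Keep only the records whose consent covers the given purpose."""
    return [r for r in records if purpose in r.consent]


a = Record("hi", frozenset({"training", "evaluation"}))
b = Record("yo", frozenset({"evaluation"}))
c = derive([a, b], "hi yo")  # consent narrows to {"evaluation"}
```

Taking the intersection is the conservative choice: a merged or summarized record is usable for training only if every source was.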

Deadwood sequences retraining jobs through Training and Evaluation custodians automatically.

The problem without Deadwood

Without custodianship, your team inherits every sharp edge below.

Manually export chat logs into spreadsheets.
Risk leaking PII into reward datasets.
Lose reproducibility when reward tweaks happen offline and untracked.
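To illustrate the PII risk in the list above, here is a deliberately small redaction pass of the kind a DIY pipeline might start with. The patterns are simplified assumptions that catch only common email and phone shapes; production redaction needs far broader coverage.

```python
import re

# Simplified, illustrative patterns; they will miss many real-world formats.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def redact(text):
    """Replace email addresses and phone-like digit runs with placeholders."""
    text = EMAIL.sub("[EMAIL]", text)
    return PHONE.sub("[PHONE]", text)


sample = "Mail jane.doe@example.com or call +1 (555) 123-4567."
cleaned = redact(sample)
```

Anything a pass like this misses flows straight into the reward dataset, which is why hand-rolled redaction is one of the sharper edges.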

Typical DIY cost

Timeline: rolling 2–4 week cycles
Budget: $50k–$200k per iteration
Expertise: Human raters + alignment researchers

Deadwood's solution

Opinionated APIs wire custodied data, runners, and proofs together — no boilerplate archaeology.

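from deadwood import FeedbackLoop

# `finetuned` (the model to update) and `interactions` (logged user
# feedback) are assumed to be defined earlier in your pipeline.
loop = FeedbackLoop(
    model=finetuned,
    strategy="dpo",        # optimize on preference pairs, no reward model
    privacy="redact-pii",  # scrub PII before signals are stored
)

loop.ingest(interactions)                # feed in the feedback signals
loop.schedule_retrain(cadence="weekly")  # retrain on a weekly cadence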

How Deadwood custodies this layer

FeedbackLoop sanitizes signals, aligns them with manifests, and opens tickets only when evaluation gates pass.
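The gating idea can be sketched as follows: a ticket is created only if every gated metric meets its threshold. This is a hypothetical illustration. The metric names, thresholds, and ticket fields are assumptions, and the page does not specify what a Deadwood ticket contains.

```python
def gates_pass(metrics, thresholds):
    """Return True only if every gated metric meets its minimum."""
    # A metric missing from `metrics` counts as a failure.
    return all(
        metrics.get(name, float("-inf")) >= minimum
        for name, minimum in thresholds.items()
    )


def maybe_open_ticket(candidate, metrics, thresholds):
    """Open a review ticket for a retrained candidate, or return None."""
    if not gates_pass(metrics, thresholds):
        return None
    return {
        "candidate": candidate,
        "metrics": dict(metrics),
        "status": "awaiting-review",
    }


# Hypothetical gate configuration.
thresholds = {"helpfulness": 0.80, "safety": 0.98}
ticket = maybe_open_ticket("run-1", {"helpfulness": 0.82, "safety": 0.99}, thresholds)
blocked = maybe_open_ticket("run-2", {"helpfulness": 0.90}, thresholds)
```

Treating a missing metric as a failure keeps an incomplete evaluation from slipping through the gate.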

Custody means your policy stack stays coherent — no rogue adapters trained on unapproved logs.

Next steps

Continue the tour

Return home for the full stack narrative.

Run a workload

Provision runners and metered jobs — describe the outcome, not every knob.

Talk to custodians

White-glove onboarding for regulated teams and bespoke stacks.
