Layer 9 of 9
Close the loop or drift wins.
Supervised fine-tuning ingests curated corrections; RLHF trains a reward model on human rankings and optimizes the policy against it.
DPO optimizes directly on preference pairs, with no explicit reward model; online loops ingest streaming signals.
Each loop feeds custodied datasets — consent tags propagate forward.
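Of the strategies above, DPO is the easiest to show in a few lines. What follows is a minimal sketch of the standard per-pair DPO objective, not Deadwood's implementation. The function name, argument names, and the default `beta` are illustrative assumptions. The inputs are summed log-probabilities of the preferred ("chosen") and dispreferred ("rejected") responses under the policy being trained and under a frozen reference model.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-pair DPO loss: -log(sigmoid(beta * margin)).

    The margin measures how much more the policy favors the chosen
    response over the rejected one, relative to the reference model.
    All names and the default beta are illustrative assumptions.
    """
    margin = ((policy_chosen_logp - ref_chosen_logp)
              - (policy_rejected_logp - ref_rejected_logp))
    z = beta * margin
    # -log(sigmoid(z)) equals softplus(-z); this form avoids overflow
    # for large |z|.
    return max(-z, 0.0) + math.log1p(math.exp(-abs(z)))


# A policy identical to the reference has zero margin, so the loss is log(2).
baseline = dpo_loss(-2.0, -2.0, -2.0, -2.0)

# A policy that has shifted toward the chosen response scores a lower loss.
improved = dpo_loss(-1.0, -3.0, -2.0, -2.0)
```

The reference model's log-probabilities do the job a learned reward model does in RLHF, which is why no separate reward model appears.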
Deadwood sequences retraining jobs through Training and Evaluation custodians automatically.
Without custodianship, your team inherits every sharp edge below.
Typical DIY cost
Opinionated APIs wire custodied data, runners, and proofs together — no boilerplate archaeology.
from deadwood import FeedbackLoop

# `finetuned` (the model to update) and `interactions` (logged feedback
# signals) are assumed to be defined earlier.
loop = FeedbackLoop(
    model=finetuned,
    strategy="dpo",
    privacy="redact-pii",
)
loop.ingest(interactions)
loop.schedule_retrain(cadence="weekly")

FeedbackLoop sanitizes signals, aligns them with manifests, and opens tickets only when evaluation gates pass.
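To make the sanitize-then-gate idea concrete, here is a rough, self-contained sketch. It is not how FeedbackLoop works internally. The helper names `redact_pii` and `gates_pass`, the email-only pattern, and the metric names are all hypothetical, and real PII redaction covers far more than email addresses.

```python
import re

# Illustrative pattern only: it catches simple email addresses and nothing else.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def redact_pii(text):
    """Replace email addresses with a placeholder before the text is stored."""
    return EMAIL.sub("[EMAIL]", text)


def gates_pass(metrics, thresholds):
    """Return True only if every gated metric meets its minimum value.

    A metric missing from `metrics` counts as a failure.
    """
    return all(
        metrics.get(name, float("-inf")) >= floor
        for name, floor in thresholds.items()
    )


clean = redact_pii("Contact ann@example.com about the refund.")
ready = gates_pass({"accuracy": 0.91, "safety": 0.99},
                   {"accuracy": 0.90, "safety": 0.98})
```

Treating a missing metric as a failure keeps the gate closed by default, so a retrained model cannot move forward just because an evaluation did not run.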
Custody means your policy stack stays coherent — no rogue adapters trained on unapproved logs.