Layer 2 of 9
Pick structure before you touch GPUs.
Transformers route information across tokens using learned attention maps — effectively programmable memory bandwidth.
Encoder stacks summarize context; decoder stacks generate tokens autoregressively; hybrids mix both.
Each token emits queries that attend to keys elsewhere in the sequence. Softmax yields differentiable routing.
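To make the routing idea concrete, here is a minimal sketch of single-head scaled dot-product attention in plain Python. It is an illustration under stated assumptions, not Deadwood's implementation: the function names are hypothetical, the toy vectors stand in for learned query/key/value projections, and the optional causal flag shows the decoder-style restriction to earlier positions.

```python
import math

def softmax(row):
    # Subtract the max before exponentiating for numerical stability.
    m = max(row)
    exps = [math.exp(s - m) for s in row]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values, causal=False):
    """Single-head scaled dot-product attention over lists of vectors.

    Returns (outputs, weights); weights[i][j] is how much token i
    attends to token j.
    """
    d_k = len(keys[0])
    outputs, all_weights = [], []
    for i, q in enumerate(queries):
        scores = []
        for j, k in enumerate(keys):
            if causal and j > i:
                # Decoder-style mask: token i cannot see later tokens.
                scores.append(float("-inf"))
            else:
                dot = sum(a * b for a, b in zip(q, k))
                scores.append(dot / math.sqrt(d_k))
        weights = softmax(scores)  # differentiable routing weights
        all_weights.append(weights)
        # Each output is a weighted mix of the value vectors.
        outputs.append([
            sum(w * v[c] for w, v in zip(weights, values))
            for c in range(len(values[0]))
        ])
    return outputs, all_weights

# Toy vectors (assumption): in a real model these come from learned projections.
queries = keys = values = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = attention(queries, keys, values, causal=True)
```

Each row of `weights` sums to 1, so every output is a convex combination of value vectors. With `causal=True` the first token can only attend to itself.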
BERT-style models predict masked tokens; GPT-style models predict the next token; T5 frames tasks as text-to-text.
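The three training framings differ mainly in how inputs and targets are built from the same text. The sketch below is a simplified illustration: the helper names and the `[MASK]` placeholder are assumptions, and the masking rule is reduced to a single probability rather than the full recipe a production pipeline would use.

```python
import random

MASK = "[MASK]"  # placeholder token name (assumption for this sketch)

def next_token_example(tokens):
    """GPT-style: the target at position i is token i + 1,
    predicted from tokens 0..i."""
    return tokens[:-1], tokens[1:]

def masked_example(tokens, mask_prob=0.15, seed=0):
    """BERT-style: hide some tokens and predict the originals
    at exactly those positions."""
    rng = random.Random(seed)
    inputs, targets = [], {}
    for i, tok in enumerate(tokens):
        if rng.random() < mask_prob:
            inputs.append(MASK)
            targets[i] = tok
        else:
            inputs.append(tok)
    return inputs, targets

def text_to_text_example(task, text, answer):
    """T5-style: the task is named in the input string and the
    answer is also plain text."""
    return f"{task}: {text}", answer

tokens = ["pick", "structure", "before", "you", "touch", "gpus"]
gpt_inputs, gpt_targets = next_token_example(tokens)
bert_inputs, bert_targets = masked_example(tokens, mask_prob=0.5)
t5_input, t5_target = text_to_text_example(
    "translate English to German", "hello", "hallo"
)
```

The same token list yields shifted targets for next-token prediction, position-indexed targets for masked prediction, and a string pair for the text-to-text framing.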
Parameter counts (7B, 70B, etc.) only size the weight tensors inside a blueprint template; you still have to choose the template responsibly.
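As a rough illustration of where a headline parameter count comes from, the sketch below uses the common back-of-the-envelope estimate of about 12 x d_model^2 weights per transformer layer (4 x d_model^2 for attention, 8 x d_model^2 for a feed-forward block with 4x expansion), plus the token embedding table. The function name is hypothetical, and the estimate ignores biases, layer norms, and architecture variants such as gated feed-forward layers, so treat the result as an approximation.

```python
def estimate_params(n_layers, d_model, vocab_size):
    """Approximate parameter count for a standard decoder-only transformer.

    Assumes multi-head attention with d_model-sized projections and a
    feed-forward hidden size of 4 * d_model; ignores biases and norms.
    """
    attention = 4 * d_model * d_model      # Q, K, V, and output projections
    feed_forward = 8 * d_model * d_model   # two matrices of d_model x 4*d_model
    embeddings = vocab_size * d_model
    return n_layers * (attention + feed_forward) + embeddings

# Published GPT-3 175B shape: 96 layers, d_model 12288, vocabulary 50257.
gpt3_estimate = estimate_params(n_layers=96, d_model=12288, vocab_size=50257)
print(f"{gpt3_estimate / 1e9:.1f}B parameters (estimate)")
```

Two models with the same headline count can still differ widely in depth, width, and head layout, which is why the count alone does not pick a structure.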
Without custodianship, your team inherits every sharp edge below.
Typical DIY cost
Opinionated APIs wire custodied data, runners, and proofs together — no boilerplate archaeology.
from deadwood import ArchitectureLab

lab = ArchitectureLab()
blueprint = lab.select(
    modality="text",
    context_window=8192,
    inference_budget="a100-40gb",
)
blueprint.summary()  # depth · heads · MoE flags

ArchitectureLab encodes hardware envelopes: memory ceilings, tensor parallel widths, and Snowflake-side preprocessing assumptions.
Instead of guessing head counts, you declare throughput targets and Deadwood snaps to reviewed templates.
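The memory-ceiling part of a hardware envelope can be sketched with simple arithmetic. This is not Deadwood's logic: the function names and the 80% headroom factor are assumptions for illustration, and the check counts weight memory only, leaving out activations and the KV cache.

```python
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "bf16": 2, "int8": 1}

def weight_memory_gb(n_params, dtype="fp16"):
    """Memory needed to hold the weights alone, in decimal gigabytes."""
    return n_params * BYTES_PER_PARAM[dtype] / 1e9

def fits_on_device(n_params, device_memory_gb, dtype="fp16", headroom=0.8):
    """True if the weights fit within a fraction of device memory.

    headroom is an assumed safety margin that leaves room for
    activations and the KV cache, which this sketch does not model.
    """
    return weight_memory_gb(n_params, dtype) <= device_memory_gb * headroom

for label, n_params in [("7B", 7e9), ("70B", 70e9)]:
    needed = weight_memory_gb(n_params)
    verdict = "fits" if fits_on_device(n_params, 40) else "does not fit"
    print(f"{label}: {needed:.0f} GB in fp16, {verdict} on a 40 GB device")
```

Under these assumptions a 7B model in fp16 needs about 14 GB and fits on a single 40 GB A100, while a 70B model needs about 140 GB and would have to be sharded or quantized further.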