Layer 2 of 9

Architecture: attention turns sequences into programs.

Pick structure before you touch GPUs.

Transformers route information across tokens using learned attention maps — effectively programmable memory bandwidth.

Encoder stacks summarize context; decoder stacks generate tokens autoregressively; hybrids mix both.


What this layer does

Why attention works

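Each token emits a query that is scored against the keys of the other tokens in the sequence. A softmax over those scores produces weights that mix the corresponding values, so the routing is differentiable and can be learned by gradient descent.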
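A minimal sketch of that routing for a single head, in plain Python with no masking and no learned projections. The function names and toy vectors are illustrative assumptions, not part of Deadwood's API.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention for one head.

    Returns the mixed outputs and the attention weights (one row per query).
    """
    d_k = len(keys[0])
    outputs, weights = [], []
    for q in queries:
        # Score this query against every key, scaled by sqrt(d_k).
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys
        ]
        row = softmax(scores)
        weights.append(row)
        # Each output is a weighted average of the value vectors.
        outputs.append(
            [sum(w * v[c] for w, v in zip(row, values)) for c in range(len(values[0]))]
        )
    return outputs, weights


# Toy example: one query that aligns with the first and third keys.
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[10.0, 0.0], [0.0, 10.0], [5.0, 5.0]]
queries = [[4.0, 0.0]]

outputs, weights = attention(queries, keys, values)
print(weights[0])
print(outputs[0])
```

The weights in each row sum to 1, and the query puts most of its weight on the keys it aligns with. Real models add learned query, key, and value projections and run many heads in parallel.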

BERT-style models predict masked tokens; GPT-style models predict the next token; T5 frames tasks as text-to-text.
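The three objectives differ mainly in how inputs and targets are built from the same text. The sketch below shows one way to construct them; the helper names, the `[MASK]` string, and the toy sentence are assumptions for illustration.

```python
def masked_lm_example(tokens, mask_positions, mask_token="[MASK]"):
    """BERT-style: hide chosen positions, predict the originals."""
    inputs = [mask_token if i in mask_positions else t for i, t in enumerate(tokens)]
    targets = {i: tokens[i] for i in mask_positions}
    return inputs, targets


def next_token_example(tokens):
    """GPT-style: every position predicts the token that follows it."""
    return tokens[:-1], tokens[1:]


def text_to_text_example(task_prefix, text, answer):
    """T5-style: the task is named in the input string, the target is text."""
    return f"{task_prefix}: {text}", answer


def causal_mask(n):
    """True where position i may attend to position j (j <= i)."""
    return [[j <= i for j in range(n)] for i in range(n)]


tokens = ["the", "cat", "sat", "down"]

print(masked_lm_example(tokens, {1}))
print(next_token_example(tokens))
print(text_to_text_example("summarize", "the cat sat down", "cat sat"))
print(causal_mask(3))
```

The causal mask is what keeps a decoder autoregressive: a position never sees tokens to its right. An encoder trained on masked tokens attends in both directions instead.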

Parameter counts (7B, 70B, etc.) give the total number of learned weights, not the shape of the model. Depth, width, and head count are blueprint choices you still must make responsibly.
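To see how shape drives the headline number, here is a rough estimate for a decoder-only Transformer. It assumes a 4x MLP expansion and tied input/output embeddings, and it ignores biases and layer norms, so treat it as a back-of-envelope sketch rather than an exact count for any particular model.

```python
def estimate_params(layers, d_model, vocab_size):
    """Approximate parameter count for a decoder-only Transformer.

    Per layer: 4 * d^2 for the attention projections (Q, K, V, output)
    plus 8 * d^2 for an MLP with 4x expansion, i.e. 12 * d^2 in total.
    Embeddings add vocab_size * d_model (assumed tied with the output head).
    """
    per_layer = 12 * d_model * d_model
    embeddings = vocab_size * d_model
    return layers * per_layer + embeddings


# Hypothetical shape in the "7B" range.
total = estimate_params(layers=32, d_model=4096, vocab_size=32000)
print(f"{total / 1e9:.2f}B parameters")
```

Because the per-layer term grows with the square of the width, doubling `d_model` roughly quadruples the count while doubling `layers` only doubles it. Two models with the same label can therefore have quite different depth and width.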

The problem without Deadwood

Without custodianship, your team inherits every sharp edge below.

Survey dozens of papers comparing widths, depths, and RoPE vs. sinusoidal positional encodings.
Prototype tiny Transformer forks to benchmark throughput.
Negotiate licensing for proprietary modifications.

Typical DIY cost

Timeline: 6–12 weeks experimentation
Budget: $80k–$250k research time
Expertise: Research engineers + GPU profiling

Deadwood's solution

Opinionated APIs wire custodied data, runners, and proofs together — no boilerplate archaeology.

from deadwood import ArchitectureLab

lab = ArchitectureLab()

blueprint = lab.select(
    modality="text",
    context_window=8192,
    inference_budget="a100-40gb",
)

blueprint.summary()  # depth · heads · MoE flags

How Deadwood custodies this layer

ArchitectureLab encodes hardware envelopes — memory ceilings, tensor parallel widths, and Snowflake-side preprocessing assumptions.

Instead of guessing head counts, you declare throughput targets and Deadwood snaps to reviewed templates.
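One way to picture a memory ceiling check: estimate weight and KV-cache bytes for each candidate template, then keep the largest one that fits. This is a sketch under stated assumptions. The `Template` class, the two example shapes, the 16-bit values, and the 90% headroom are all hypothetical and do not describe Deadwood's reviewed templates or its selection logic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Template:
    name: str
    params: int      # total learned weights
    layers: int
    kv_heads: int
    head_dim: int


def weight_bytes(t, bytes_per_value=2):
    """Memory for the weights at 16-bit precision by default."""
    return t.params * bytes_per_value


def kv_cache_bytes(t, context_window, batch=1, bytes_per_value=2):
    """Keys and values for every layer, head, and position in the window."""
    return 2 * t.layers * t.kv_heads * t.head_dim * context_window * batch * bytes_per_value


def fits(t, context_window, memory_bytes, headroom=0.9):
    """Leave some memory free for activations and runtime overhead."""
    needed = weight_bytes(t) + kv_cache_bytes(t, context_window)
    return needed <= memory_bytes * headroom


def snap(templates, context_window, memory_bytes):
    """Pick the largest template that fits, or None if nothing does."""
    candidates = [t for t in templates if fits(t, context_window, memory_bytes)]
    return max(candidates, key=lambda t: t.params) if candidates else None


# Hypothetical templates, for illustration only.
small = Template("small-7b", 7_000_000_000, layers=32, kv_heads=32, head_dim=128)
large = Template("large-70b", 70_000_000_000, layers=80, kv_heads=8, head_dim=128)

chosen = snap([small, large], context_window=8192, memory_bytes=40 * 10**9)
print(chosen.name if chosen else "no template fits")
```

With an 8192-token window on a single 40 GB device, the larger template is ruled out by its weights alone, before the KV cache is counted. A fuller check would also account for tensor parallelism and batch size.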

Next steps

Continue the tour

Follow how custody chains into Weights.

Next: Weights

Run a workload

Provision runners and metered jobs — describe the outcome, not every knob.

Start a job

Talk to custodians

White-glove onboarding for regulated teams and bespoke stacks.

Schedule a demo
