The Engineering Handbook · v1
Building Like Claxx
The handbook for shipping software that doesn’t haunt you. Twenty-six principles, eleven sections, one bias toward evidence — drawn from years of finding out the hard way.
Extends the 12-Factor App where experience taught us more than the original twelve. Read top to bottom for the worldview. Jump in by topic for reference. Steal whatever’s useful.
Software is built by people who, given a deadline, will make exactly the trade-offs the system around them rewards. The system either protects them from those trade-offs — or it doesn’t.
This handbook is the system we wish we’d had when we started. Many of the principles below are extensions of the 12-Factor App. Others were paid for in incidents we’d rather not relive. None of them are theory.
Each principle has the same shape: the belief — what we actually think is true about how software gets built; in practice — what we do as a result of believing it; and anti-pattern — what going wrong looks like when nobody’s looking. The rules are rigid on purpose. The prose isn’t.
A claim without a receipt is indistinguishable from a guess. The discipline of producing evidence is the same discipline that produces correct work — you cannot fake a passing test you didn’t run, but you can write any words you want about it. We bias our processes toward verifiable truth over the appearance of it.
In practice
- Every “done” claim links to the artifact that proves it — a CI run, a signed report, a console log, a screen recording.
- A status update that asserts an outcome without linking to the underlying evidence is treated as opinion, not as record.
- When evidence can’t be produced (manual checks, eyeballed behavior), the constraint is named explicitly and the procedure is reproducible by another person.
Anti-pattern
Standup: “Migration ran clean, we’re good.” Two days later, the rollback — because nobody had the run log on hand to check the claim before the team built on top of it. Confidence without receipts is just vibes wearing a tie.
No author audits their own output. The reviewer is always someone — or something — with no stake in the original answer. The bias of authorship is universal; the cost of a second look is small.
In practice
- Code authors don’t approve their own pull requests.
- The engineer who built a feature is not the sole sign-off on its acceptance.
- Automated reviewers don’t validate their own output; the reviewer is operationally separate from the producer.
Anti-pattern
“I tested it myself, looks good.” The author has lived inside the change for hours, knows exactly which inputs to give it, and is psychologically primed to see green. A fresh pair of eyes — or a tool that doesn’t know what you intended — finds bugs the author cannot. This isn’t an insult to the author. It’s how brains work.
Any single safeguard will eventually be misconfigured, bypassed, or revoked. We design assuming that. The same constraint enforced at three layers is not redundancy — it is how we survive the single failure that’s already in our future.
In practice
- Critical safety properties (data integrity, access control, irreversibility) are enforced at no fewer than two independent layers.
- Removing a redundant safeguard requires written justification that the remaining layers are sufficient — and a re-audit of those layers in the same change.
- “We have a check for that” isn’t enough. We have three, and we know which three.
Anti-pattern
One config flag, one engineer who remembered to set it, one Friday afternoon push. The flag flipped to false during an unrelated refactor in May. Nobody noticed until June — in production, on a customer demo. Single-point safeguards eventually flip.
The first miss is a learning moment. The second is a missing control. We don’t rely on remembering, on documentation, or on willpower. In the moment of a deadline, willpower loses every time.
In practice
- A documented rule violated more than once gets a mechanical check.
- New process expectations that depend on someone remembering something include the mechanical reminder before being declared in effect.
- “We talked about it in retro” is not a control. The hook is the control.
Anti-pattern
“Just don’t commit secrets, OK?” — said in three retros over six months. The fourth retro is a postmortem about the secret in a public commit. Discipline is a leaky bucket; the hook is plumbing.
Mass-blocking on subjective issues stalls work without improving outcomes. Hard blocks are reserved for actions you can’t take back. Everything else is ratcheted toward better, never enforced as an emergency.
In practice
- Hard blocks (preventing commit, push, merge, or deploy) are reserved for actions that cannot be undone: data loss, force-push, leaked secrets, broken history.
- Subjective concerns (style preferences, architectural taste, optional improvements) are raised as warnings or non-blocking comments.
- Adding a new hard gate requires naming the irreversible thing it protects against.
Anti-pattern
The linter that blocks every pull request for “prefer arrow functions.” The team learns to silence the linter — then learns to silence the security check that lives in the same pipeline. The boy who cried block.
The cost of a defect compounds with every minute it lives undetected. Errors at write-time are cheaper than errors at push, which are cheaper than errors at review, which are cheaper than errors in production. We invest at the front of that curve.
In practice
- Validation that can run in the editor, runs in the editor.
- Validation that can run at commit doesn’t wait for push.
- Validation that can run at push doesn’t wait for CI.
- Every deploy to a shared environment runs a smoke probe before reporting success.
Anti-pattern
The bug that took thirty seconds to write and two days to find — because the type-check ran in CI but not in the editor. The engineer’s IDE was green. The build was red four hours later. The fix was one character.
Local tooling exists to make the right thing easy in the moment of writing it. Centralized tooling exists to make the wrong thing impossible to ship. They run the same rules at different cadences and are not interchangeable.
In practice
- Local tooling is advisory and auto-fixes when it can.
- Centralized tooling is authoritative and blocking.
- Local and central run the same underlying rules; divergence is a defect, not a preference.
Anti-pattern
A pre-commit hook that finds three errors the local linter never mentioned. Engineers learn that “passes locally” doesn’t predict anything — so they stop checking locally. The system has trained them into the worst of both worlds.
Default to scoping checks to the diff. Pay the cost of a whole-system pass only when the rules themselves changed. Engineers shouldn’t pay full cost for work prior commits already paid for — doing so trains them to find ways around the system.
In practice
- Validation cost scales with the size of the change, not the size of the codebase.
- When the rules themselves change (lint config, formatter config, style guide), they’re re-applied repo-wide in the same change that introduces them.
- “Full repo pass” is the exception, not the default.
Anti-pattern
A four-line pull request that runs the entire integration suite for nineteen minutes. The engineer learns to batch unrelated changes into one PR to amortize the wait — defeating every other principle in this handbook in a single move.
Once shipped, history is read-only. We correct forward, never erase. Lost work is the worst kind of failure because it’s invisible to everyone except the person who lost it — and rewriting history is the same failure in slow motion.
In practice
- Force-push to shared branches is prohibited except as the only path to recovery, with explicit authorization recorded.
- Editing commits shared with others requires coordination with affected teammates before the rewrite.
- Migrations, audit logs, changelogs, and anything other systems index by identity are append-only after first publication.
Anti-pattern
The “quick history cleanup” that takes a teammate’s three-day branch with it. Their only copy was on the remote. The rewrite ran on a Friday at 5:47 p.m. Nobody noticed until Monday.
Don’t run subjective review on a broken build. Don’t ask for human attention before machine attention has had its turn. Cheap signals filter the work that deserves expensive ones — the wrong order means we burn the most expensive resource on problems the cheapest one would have caught.
In practice
- Formatters, linters, and type checks complete and pass before test suites run.
- Automated checks complete and pass before human review is requested.
- Expensive resources — engineer attention, integration environments, automated reviewers — aren’t spent on work that hasn’t cleared the cheap gates.
Anti-pattern
A senior engineer spends forty minutes reviewing a pull request whose first failing test, had it run, would have made the entire approach moot. The review happened because the assigner forgot to wait for green.
New active integrations start off. Activation requires intent, an owner, and a documented reason. Surface area is a tax paid in cognitive load and operational complexity — not a free gift to the future.
In practice
- New active integrations are inactive by default when introduced.
- Activation requires explicit configuration, an identified owner, and a documented justification.
- “Active integration” means anything that runs against the project on activation — agents, hooks, third-party services, automation. It does not mean every transitive dependency; supply-chain hygiene is a different problem with different tools.
Anti-pattern
A repo with eleven enabled integrations — three of which haven’t been touched in two years, and one of which is silently exfiltrating telemetry the team didn’t know it shipped with. Surface area you don’t audit, you don’t own.
Every active integration has a documented purpose and a named owner. No drive-by additions. The reason field is what turns a permission into a contract.
In practice
- Every active integration has a recorded purpose and a named owner.
- The owner is accountable for the capability’s continued necessity and security posture.
- “Why” is a required field, not a nice-to-have.
Anti-pattern
The integration that’s “definitely needed for something” but nobody can name the something. It’s been there longer than anyone currently on the team. Removing it feels risky; keeping it is also risky. We picked the risk we didn’t have to defend.
VIIDocumentation as contract
If a pattern matters, the machine checks it. Humans drift; tools forget nothing. We pair every “you should” with a check that fires when you didn’t.
In practice
- A pattern that appears in a coding standard, style guide, or playbook has a mechanical check that fires when the rule is violated.
- A new pattern added to documentation ships with its corresponding enforcement in the same change.
- “Please remember to” is a fragrance, not a control.
Anti-pattern
The style guide says X. Every code review enforces Y, because that’s what the senior engineer happens to prefer this week. New engineers learn that the doc is decorative — the real rule lives in someone’s head, and changes weekly.
Every enforcement we add began as a real bug, a real review comment, or a real incident. We don’t enforce what we wish were true. We enforce what proved itself necessary.
In practice
- New automated rules trace to a real defect, a real review comment, or a real incident.
- Every rule has a documented case it would have prevented.
- Existing rules that haven’t detected a violation in a defined review period are candidates for evaluation and possible removal.
Anti-pattern
A rule added because someone read a blog post about it. Six months later it has blocked thirty pull requests and prevented zero bugs. The cost of false positives is paid every day; the benefit was hypothetical from day one.
Names and diffs show what; commit messages, design docs, and recorded decisions explain why. The why survives refactoring, language migrations, and team turnover.
In practice
- Code comments explain why a non-obvious decision was made; they don’t paraphrase what the code already shows.
- Commit messages explain the motivation for the change — not the diff.
- Architectural and design decisions are recorded in a durable, indexed format outside of conversation history.
Anti-pattern
A two-paragraph comment narrating exactly what the loop below does, line by line, in English. The code gets refactored a year later; the comment doesn’t. The comment is now a fossil exhibit — accurately describing a creature that no longer exists.
Standards live in one place; every tool, environment, and team member reads from it. Differences between surfaces are bugs to close, not preferences to indulge.
In practice
- Each standard, pattern, configuration, and rule has exactly one canonical location.
- All other places where the standard appears are derivative — generated from, symlinked to, or referencing the canonical source.
- Differences between canonical and derivative are defects to resolve, not preferences to negotiate.
Anti-pattern
The same rule, slightly different, in four places: the README, the contributing guide, the IDE config, and the CI workflow. Three of them are wrong. Nobody knows which is canonical — because nobody declared one.
A check that didn’t run is failed, not missing. A reviewer who couldn’t reach an answer says so loudly. Ambiguity in the user-facing state is itself a failure mode — silence trains people to ignore the system rather than trust it.
In practice
- Required checks that couldn’t run are reported as failed, with a description of why.
- Pending states resolve to explicit failure or success within a defined timeout.
- Status displays distinguish “passed”, “failed”, “did not run”, and “in progress” — they don’t collapse those into one green check.
Anti-pattern
The pipeline that shows a green check because the security scan was skipped (“no relevant files in the diff,” said the config). Six months and four merges later, the security scan hasn’t run on the new auth flow at all. The green meant “I didn’t look” — but it looked like “I looked and approved.”
A “looks good” is bound to a specific state of the work, not to a person’s general opinion. When the work changes, the approval expires. Treating approvals as transferable creates a polite fiction that protects no one.
In practice
- Code review approvals are bound to a specific revision of the change.
- Any modification to a reviewed branch dismisses prior approvals on that branch.
- Re-approval is required after any change to the reviewed code, regardless of how trivial the change appears.
Anti-pattern
An approval from Tuesday on a 200-line pull request. By Friday, the pull request is 600 lines and a refactor of the auth boundary. The original reviewer never saw the new 400. The merge happens with their name on it.
Working code that no consumer has seen and accepted is not done. We define “done” as demonstrated and approved in an environment as close to production as practical. This inverts the traditional shipping order: consumer acceptance happens before code review, not after.
A reviewer reading the code should be certain that the form of the work is correct — the only remaining question is whether the implementation is correct. Reviewing the polish on a feature the user will reject is wasted attention.
In practice
- A change that affects user-visible behavior doesn’t enter code review until it’s been deployed to an environment resembling production and the consumer has accepted the form and behavior.
- The shipping order for product-facing work is: development → QA → preview deploy → consumer acceptance → code review → merge. Each gate blocks the next.
- For internal changes (refactors, infrastructure, tooling), the acceptance role is held by the team that owns the affected code.
- Code review threads don’t include “is this what the user wanted?” — that question is closed before review begins.
Anti-pattern
The team spends a week polishing the implementation of a feature. The product owner looks at the staging build and says, “Oh — that’s not what I had in mind at all.” The week was real; the rework is also real. Code review reviewed the wrong question.
It is an admission that a behavior claim was made about a system the speaker cannot fully reproduce. Local environments are development tools, not authoritative records of correctness. The minimum standard for “it works” is “it works in a shared environment whose differences from production are documented and small.”
In practice
- “Works locally” is not acceptable as evidence in any review, status update, or post-incident report.
- Behavior claims are made about — and reproduced in — a shared environment whose differences from production are documented.
- When a finding doesn’t reproduce in a shared environment, the burden of proof is on the original claim, not on the reviewer.
Anti-pattern
“Works on my machine, must be a CI flake.” The CI was right; the local environment had a stale dependency tree and the wrong env file. The shared environment is the arbiter. The laptop is not.
XIEnvironments and artifacts
A built artifact is immutable. Behavior changes require new artifacts. Configuration may differ between deploys; the artifact may not. This separates what the code does from where it runs.
In practice
- A build artifact, once produced, is immutable. Behavior changes require a new build.
- The same artifact deployed to staging and production is byte-identical.
- Hotfixes that bypass the build pipeline are prohibited except as recovery from active incidents, with explicit post-hoc reconciliation.
Anti-pattern
The staging build that passed all checks. The production build that was actually a different commit, because someone “just rebuilt with the latest.” The bug was in the latest. We tested one binary and shipped another.
Code shall not contain environment-specific values. Everything that varies between deploys is externalized — to environment variables, secrets stores, or configuration services. The codebase is identical in every environment; only the configuration changes.
In practice
- Code doesn’t contain environment-specific values (hostnames, credentials, third-party endpoints, feature toggles).
- Configuration is provided through environment variables, secrets stores, or configuration services — not through code constants or conditional environment branches.
- A new configurable value is documented at the configuration layer in the same change that introduces it.
Anti-pattern
if (env === "prod") { url = "..." } blocks scattered across the codebase. A staging change accidentally hits a production endpoint, because someone updated four of the five branches. The bug is in the one they forgot.
Databases, queues, caches, blob stores, and external APIs are referenced by handle resolved from configuration at runtime. Code shall not assume which provider supplies a backing service.
In practice
- Backing services are referenced by handle resolved from configuration at runtime.
- Code doesn’t encode assumptions about which provider supplies a backing service.
- Swapping a backing service for another of the same kind is a config change, not a code change.
Anti-pattern
A direct import of a specific cloud SDK in business logic. Two years later, “we should move off this vendor” turns into a six-month rewrite — because the assumption leaked into eighty files.
Application processes shall not store state across requests in process memory or local disk. State required across requests must live in a named backing service. A process must be safely killable at any time without data loss.
In practice
- Application processes don’t store state across requests in memory, local disk, or other ephemeral storage.
- State required across requests persists to a named backing service.
- Any process can be killed at any time without data loss.
Anti-pattern
An in-memory cache that “speeds things up” — until the process restarts and three users see each other’s data. The cache was state; the framework didn’t notice. You found out from a support ticket.
Application code emits structured logs as events to stdout. Routing, aggregation, retention, and search are the platform’s concern, not the application’s.
In practice
- Application code emits structured logs as events to
stdout (errors to stderr).
- Code doesn’t open named log files, manage rotation, or implement application-level retention.
- “Logs become someone else’s problem once they leave the process” is a feature, not a limitation.
Anti-pattern
An app that writes to /var/log/app.log and rotates it itself. Eventually the disk fills, the app crashes, and the logs that would have told you why are exactly the ones that filled the disk.
Development, staging, and production shall differ in the smallest practical number of dimensions. Each remaining difference must be documented with the reason it cannot be removed.
In practice
- Differences between development, staging, and production environments are minimized.
- Each remaining difference is documented with its reason and a removal plan.
- “It’s different in production” is a debt entry, not a permanent state.
Anti-pattern
Local uses one database; production uses another; “they’re basically the same SQL.” The migration that worked locally fails on the stricter mode in production. The “basically” was doing all the work.
How to operationalize
For each principle above, three things must exist or it drifts back into aspiration:
- A check. Automated where possible, a documented manual procedure where not.
- An owner. A team or named role accountable for the principle remaining in effect.
- A measurement. A way to know whether it’s being followed over time.
Principles without all three become the polite fiction we wrote them to prevent. The handbook above is the what — the harness (the per-codebase wiring of checks, owners, and measurements) is the how. The how lives close to the code, never in the principle.
The handbook itself follows its own rules. If a principle here can’t be reduced to a check and an owner in your stack, that’s a finding worth surfacing — not an instruction to ignore the principle. Adapt the mechanism. Don’t adapt the intent.