Wasted Testing Comes from Picking the Wrong Layer: Use the Smallest Layer That Proves the Behavior, Mock the Seam Not the System

The biggest waste in testing isn't too few tests — it's writing a pile of tests that pass easily yet never cover the real failure modes. They make the coverage number look good and turn CI all green, but when a bug actually ships, not one of them catches it — because what they test isn't the place that goes wrong.

What I want to take apart here are two judgments that decide whether a test carries signal or is pure dead weight: which layer the test should sit at, and what to mock and what to leave real. Both share one underlying principle: a test should touch the behavior that can genuinely fail, neither bypassing it (layer too low, or too much mocking) nor hauling up the entire world to reach it (layer too high, or too little mocking).

Pick the right layer: use the smallest layer that proves the behavior

The two most common test-layering mistakes are symmetric. One is defaulting all backend tests into host-level integration tests — slow, heavy, hauling up the entire system on every little change. The other is the opposite: over-mocking where an integration test was warranted, so the test stays green while the contract quietly drifts. The root of both is the same: not asking "what is the smallest layer needed to prove this behavior?"

The backend tests are layered by purpose rather than dumped into one generic test bucket. Under apps/api/tests/:

| Layer | Purpose |
| --- | --- |
| `Architecture/` | Boundary rules (Api/Worker separation, BuildingBlocks independence, contract-layer purity) |
| `Hosts/` | Host-level integration behavior (startup, middleware, auth, infrastructure wiring) |
| `Modules/` | Module-focused endpoint, job, and service tests |
| `ApiContract/` | Externally-visible HTTP contract drift/regression checks |

Layer-selection principle: pick the smallest layer that proves the behavior, without bypassing important boundaries. That sentence has two halves, both required. "Smallest layer" is for speed and stability; "without bypassing important boundaries" is so you don't accidentally test away the very thing you wanted to verify.

Architecture tests protect project and dependency boundaries — e.g. enforcing Api/Worker separation, BuildingBlocks not depending on concrete infrastructure, contract-layer purity.
Host tests are used only when the behavior depends on app startup, middleware, auth, or infrastructure wiring; they currently use WebApplicationFactory, Testcontainers for PostgreSQL/RabbitMQ, and config overrides. Note the word "depends": if the behavior doesn't depend on these, a host test is a sledgehammer to crack a nut.
Module tests cover a module's own endpoints, job logic, DB behavior, and consumers. They are named after the module's owning namespace and do not depend on unrelated modules.
Contract tests are for externally-visible HTTP contract stability, especially important when frontend consumers or generated clients depend on a stable shape.
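To make the first of these concrete: an architecture test is, at heart, an assertion over the dependency graph. The sketch below is a TypeScript illustration of that idea, not the repo's actual backend test code. The `references` map and the `Infrastructure.Postgres` project name are hypothetical; Api, Worker, and BuildingBlocks are taken from the boundary rules above.

```typescript
// Hypothetical project-reference graph. A real architecture test would derive
// this from project files or compiled assemblies instead of a literal.
const references: Record<string, string[]> = {
  Api: ["BuildingBlocks", "Infrastructure.Postgres"],
  Worker: ["BuildingBlocks", "Infrastructure.Postgres"],
  BuildingBlocks: [],
  "Infrastructure.Postgres": ["BuildingBlocks"],
};

// Each rule names a direct reference that must never exist.
interface ForbiddenEdge {
  from: string;
  to: string;
  reason: string;
}

const rules: ForbiddenEdge[] = [
  { from: "Api", to: "Worker", reason: "Api/Worker separation" },
  { from: "Worker", to: "Api", reason: "Api/Worker separation" },
  { from: "BuildingBlocks", to: "Infrastructure.Postgres", reason: "BuildingBlocks independence" },
];

// Returns one message per violated rule; an empty array means the boundaries hold.
// Only direct references are checked here, not transitive ones.
function findViolations(graph: Record<string, string[]>, forbidden: ForbiddenEdge[]): string[] {
  const violations: string[] = [];
  for (const rule of forbidden) {
    const deps = graph[rule.from] || [];
    if (deps.indexOf(rule.to) !== -1) {
      violations.push(`${rule.from} -> ${rule.to} breaks ${rule.reason}`);
    }
  }
  return violations;
}

const boundaryViolations = findViolations(references, rules);
console.log(boundaryViolations.length === 0 ? "boundaries hold" : boundaryViolations.join("\n"));
```

A check like this needs no host, no database, and no HTTP, which is why boundary rules get their own layer instead of riding along inside integration tests.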

The frontend uses Vitest for unit/component tests and Playwright for end-to-end.

Unit tests are usually colocated with the source file.
Component tests use React Testing Library patterns.
E2E tests go in apps/web/e2e/.
Tests verify observable behavior and prefer accessible queries.

Vitest config: environment jsdom, setup file src/test/setup.ts, and coverage thresholds of 80% lines / 80% functions / 80% statements / 75% branches. Patterns already in use include describe/it and vi.spyOn/vi.mock/vi.stubEnv, with assertions against the rendered UI or public function behavior.
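For orientation, those settings could look roughly like this in a Vitest config. This is a sketch, not the repo's actual file. The object is written as a plain literal so the snippet stands alone; in a real `vitest.config.ts` it would be passed as the `test` field of `defineConfig` from `vitest/config`. Nesting the numbers under `coverage.thresholds` assumes a recent Vitest version (1.x or later).

```typescript
// Sketch of the `test` section of a Vitest config, using the values quoted above.
// In vitest.config.ts this object would be wrapped as:
//   export default defineConfig({ test: vitestTestOptions })
const vitestTestOptions = {
  environment: "jsdom",
  setupFiles: ["src/test/setup.ts"],
  coverage: {
    // Assumption: thresholds live under `coverage.thresholds` (Vitest 1.x and later).
    // Values are percentages; the run fails if coverage drops below any of them.
    thresholds: { lines: 80, functions: 80, statements: 80, branches: 75 },
  },
};

console.log(JSON.stringify(vitestTestOptions.coverage.thresholds));
```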

Interaction style prefers testing user-observable behavior: query by role, label, or visible text, prefer accessible selectors over brittle DOM traversal, and assert results rather than internal implementation details.

E2E uses Playwright (Chromium + Firefox), reuses auth state via global-setup.ts, and captures trace/screenshots/video on failure or retry. E2E is for: login/logout, auth redirects, multi-page workflows, browser/runtime integration behavior. But E2E can't replace focused unit/component tests — it's too slow and heavy to prove a piece of pure logic with.
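A Playwright setup matching that description might be shaped as follows. Treat it as a sketch under assumptions. The browsers, the `global-setup.ts` file name, and the e2e directory come from the text above. The exact `trace`, `screenshot`, and `video` values, the `retries` count, and the `storageState` path are guesses chosen for illustration. The object is a plain literal so the snippet stands alone; a real `playwright.config.ts` would pass it to `defineConfig` from `@playwright/test`.

```typescript
// Sketch of a Playwright config, assuming the config file lives in apps/web.
// In playwright.config.ts this object would be wrapped as:
//   export default defineConfig(playwrightOptions)
const playwrightOptions = {
  testDir: "./e2e",
  globalSetup: "./e2e/global-setup.ts", // assumed location of global-setup.ts
  retries: 1, // assumption: one retry, so "on retry" artifacts can be produced
  use: {
    storageState: "./e2e/.auth/state.json", // hypothetical file written by global setup
    trace: "on-first-retry",
    screenshot: "only-on-failure",
    video: "retain-on-failure",
  },
  projects: [
    { name: "chromium", use: { browserName: "chromium" } },
    { name: "firefox", use: { browserName: "firefox" } },
  ],
};

console.log(playwrightOptions.projects.map((p) => p.name).join(", "));
```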

Mocking: mock the seam, not the entire system

The mocking tradeoff is where tests most easily go off the rails, because it has two opposite pitfalls. Too little mocking and you haul up the entire real system to test a narrow piece of logic — slow and brittle; too much mocking and you mock away the very contract boundary you meant to verify — the test stays green while the contract drifts.

The repo uses mocks, fakes, and real infrastructure selectively, following roughly three strategies:

Frontend unit tests mock heavily at browser/network/module boundaries.
Backend job/service tests, for narrow logic, often use fakes or in-memory dependencies.
Host-level backend tests prefer real infrastructure via Testcontainers, rather than mocking out the whole system.

The key to the judgment is distinguishing the seam from the system. A seam is a narrow, well-defined collaboration point — a fetch call, a command dispatch, an HTTP client. Mocking that kind of seam is reasonable, because the logic you're testing is on this side of the seam, and substituting the other side with a fake doesn't affect what you're verifying.
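A fetch seam can be sketched in a few lines. The example below is framework-free so it runs on its own: it injects the fetch function as a parameter, where the repo's tests would reach for vi.spyOn(globalThis, "fetch") instead. The `getSchedule` wrapper, the `/api/schedules/` URL, and the payload fields are hypothetical.

```typescript
// Minimal slice of the Fetch API that the wrapper relies on.
interface ResponseLike {
  ok: boolean;
  status: number;
  json(): Promise<unknown>;
}
type FetchLike = (url: string) => Promise<ResponseLike>;

interface SchedulePayload {
  id: string;
  cron: string;
}

// Hypothetical API wrapper. The logic on this side of the seam is URL
// construction and mapping non-2xx responses to errors.
async function getSchedule(id: string, fetchImpl: FetchLike): Promise<SchedulePayload> {
  const res = await fetchImpl(`/api/schedules/${encodeURIComponent(id)}`);
  if (!res.ok) {
    throw new Error(`getSchedule failed with status ${res.status}`);
  }
  return (await res.json()) as SchedulePayload;
}

// Stub for the far side of the seam: records requested URLs and returns a canned response.
function stubFetch(status: number, body: unknown): { impl: FetchLike; calls: string[] } {
  const calls: string[] = [];
  const impl: FetchLike = (url) => {
    calls.push(url);
    return Promise.resolve({
      ok: status >= 200 && status < 300,
      status,
      json: () => Promise.resolve(body),
    });
  };
  return { impl, calls };
}

const okStub = stubFetch(200, { id: "a/b", cron: "0 9 * * *" });
const failStub = stubFetch(503, {});

// Happy path: the id is URL-encoded and the parsed body is returned.
const okResult = getSchedule("a/b", okStub.impl);
// Failure path: a non-2xx status surfaces as a rejected promise.
const failResult = getSchedule("a/b", failStub.impl).then(
  () => "resolved",
  (err: Error) => err.message,
);
```

Everything being verified (encoding, error mapping) lives in `getSchedule`. The stub replaces only the network, so nothing the test cares about has been mocked away.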

When mocking fits: testing the frontend's API wrapper around fetch; testing a single hook/component with an expensive external collaborator; testing backend job logic where the dispatch side effect can be captured by a fake dispatcher; testing an HTTP client or AI/external provider at a narrow seam. For example vi.spyOn(globalThis, "fetch"), FakeCommandDispatcher, MockHttpMessageHandler.
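The fake-dispatcher case looks like this in miniature. It is a TypeScript rendering of the pattern only: the repo's FakeCommandDispatcher is backend code, and the command type, the schedule fields, and `runDueSchedules` below are all hypothetical names chosen for the sketch.

```typescript
// A command and the narrow seam it crosses. Names are illustrative.
interface SendEmailCommand {
  type: "SendEmail";
  scheduleId: string;
}
interface CommandDispatcher {
  dispatch(command: SendEmailCommand): void;
}

// Fake for the far side of the seam: it only records what was dispatched.
class FakeCommandDispatcher implements CommandDispatcher {
  readonly dispatched: SendEmailCommand[] = [];
  dispatch(command: SendEmailCommand): void {
    this.dispatched.push(command);
  }
}

interface EmailSchedule {
  id: string;
  enabled: boolean;
  nextRunUtc: string; // ISO-8601 instant, always UTC
}

// Job logic under test: dispatch one command per enabled schedule that is due.
function runDueSchedules(schedules: EmailSchedule[], nowUtc: string, dispatcher: CommandDispatcher): void {
  const now = Date.parse(nowUtc);
  for (const s of schedules) {
    if (s.enabled && Date.parse(s.nextRunUtc) <= now) {
      dispatcher.dispatch({ type: "SendEmail", scheduleId: s.id });
    }
  }
}

// Scenario: one schedule is due, one is disabled, one is a second in the future.
const fakeDispatcher = new FakeCommandDispatcher();
runDueSchedules(
  [
    { id: "due", enabled: true, nextRunUtc: "2024-03-10T09:00:00Z" },
    { id: "disabled", enabled: false, nextRunUtc: "2024-03-10T09:00:00Z" },
    { id: "future", enabled: true, nextRunUtc: "2024-03-10T09:00:01Z" },
  ],
  "2024-03-10T09:00:00Z",
  fakeDispatcher,
);
console.log(fakeDispatcher.dispatched.map((c) => c.scheduleId)); // only "due" is dispatched
```

The fake captures the side effect without standing up a queue. Whether the command actually reaches a consumer over RabbitMQ is a different behavior, and it belongs to a layer that uses the real broker.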

When the test's value is in the wiring itself, mocking the system removes the very thing under test; here you must use real dependencies.

When real dependencies fit: API host startup and middleware, PostgreSQL and RabbitMQ connections, auth and cookie behavior, queue/health/readiness behavior. These behaviors only appear when components are truly wired together; testing "DB round-trip semantics" with a mock database avoids precisely the bug that only surfaces on a real round-trip.

The core sentence: mock the seam, not the entire system. Prefer fakes for narrow command/event capture; prefer a real integration boundary for bugs that historically only appeared when components were wired together. What to avoid is the kind of mock that "makes the test pass while hiding contract drift" — it gives you a green light while mocking the signal away with it.
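To see what contract drift looks like when something does catch it, here is a minimal shape check. It is an illustration only, and the repo's ApiContract tests may work quite differently. The payloads and the recorded shape string are hypothetical.

```typescript
// Describes the structure of a JSON value: property names and primitive types, recursively.
// Arrays are described by their first element; empty arrays are described as "[]".
function shapeOf(value: unknown): string {
  if (value === null) return "null";
  if (Array.isArray(value)) {
    return value.length === 0 ? "[]" : `[${shapeOf(value[0])}]`;
  }
  if (typeof value === "object") {
    const obj = value as Record<string, unknown>;
    const fields = Object.keys(obj)
      .sort()
      .map((key) => `${key}:${shapeOf(obj[key])}`);
    return `{${fields.join(",")}}`;
  }
  return typeof value;
}

// Recorded contract for a hypothetical endpoint payload.
const recordedShape = "{cron:string,id:string,recipients:[string]}";

// The payload as produced today, and the same payload after an unannounced rename.
const currentPayload = { id: "s1", cron: "0 9 * * *", recipients: ["a@example.com"] };
const driftedPayload = { id: "s1", cronExpression: "0 9 * * *", recipients: ["a@example.com"] };

console.log(shapeOf(currentPayload) === recordedShape); // true: contract holds
console.log(shapeOf(driftedPayload) === recordedShape); // false: drift detected
```

A frontend test that mocks the response with the old `cron` field would keep passing after the rename. Only a check that runs against the real handler's output sees the difference.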

Test data: nearby, minimal, explicit

Test data is usually created inline nearby, doing only the minimal setup the scenario needs. This convention defends against a sneaky loss of readability: when test data hides inside an opaque buildEverything() helper, the reader can't tell what the scenario's key values are — time zone? boundary value? all swallowed by the helper.

Backend tests often seed the EF context directly (e.g. seeding a schedule straight into EmailDbContext).
Frontend tests construct small, focused payloads inline.
E2E relies on configured synthetic/dev credentials.
Shared external test libraries are documented at the repo level, but many automated backend tests stay self-contained via Testcontainers.

Guidance: prefer small, scenario-specific seeds over a giant fixture graph; keep seed values explicit so the test intent is obvious at a glance; and when time logic matters, explicitly include time-zone-sensitive values — time-zone bugs almost always trace back to some time value silently assumed to be in local time. For credentials, use synthetic or documented dev/test users, and don't hardcode production secrets into tests.
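An inline seed that follows this guidance might read like the sketch below. The schedule id, the field names, and the `localDateIn` helper are hypothetical. The point is that the time zone and the boundary instant sit in the test itself, where a reader can see them.

```typescript
// Returns the calendar date (YYYY-MM-DD) that `instant` falls on in `timeZone`.
function localDateIn(timeZone: string, instant: Date): string {
  const parts = new Intl.DateTimeFormat("en-US", {
    timeZone,
    year: "numeric",
    month: "2-digit",
    day: "2-digit",
  }).formatToParts(instant);
  const pick = (type: string): string => {
    for (const part of parts) {
      if (part.type === type) return part.value;
    }
    return "";
  };
  return `${pick("year")}-${pick("month")}-${pick("day")}`;
}

// Inline, explicit seed: every value that drives the scenario is visible here.
const seed = {
  scheduleId: "daily-digest", // hypothetical
  timeZone: "America/Los_Angeles", // stated, never implied by the machine's locale
  sendOnLocalDate: "2024-03-09",
  nowUtc: new Date("2024-03-10T01:30:00Z"), // already March 10 in UTC, still March 9 in Los Angeles
};

const dueInScheduleZone = localDateIn(seed.timeZone, seed.nowUtc) === seed.sendOnLocalDate;
const dueIfAssumedUtc = localDateIn("UTC", seed.nowUtc) === seed.sendOnLocalDate;
console.log({ dueInScheduleZone, dueIfAssumedUtc });
```

The chosen instant straddles a date boundary on purpose: code that silently treats the value as UTC (or as the CI machine's local time) gets a different answer, so the test fails for exactly the bug it targets.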

Quality gates

Tests are part of the quality gate, not just a local confidence check. Current automated gates include: frontend unit tests and coverage, frontend build and lint, backend test workflows, backend architecture checks, API contract checks, and secret scanning.

When reviewing tests, ask: is this test sitting at the right layer? Does it actually touch the behavior that can fail? Has it avoided brittle selectors or over-mocking? If the contract changed, were the contract/integration tests updated together?

Counter-examples

// Counter-example 1 (frontend): brittle DOM traversal, goes red the moment the product changes structure
const btn = container.querySelectorAll('div')[3].children[0] // ❌
fireEvent.click(btn)

// Correct: accessible query
fireEvent.click(screen.getByRole('button', { name: 'Submit' })) // ✅

// Counter-example 2 (backend): stuffing simple module logic into a host integration test, slow and heavy
public class EmailScheduleTests : IClassFixture<CustomWebApplicationFactory> // ❌ sledgehammer for a nut
{
    // verifies only a piece of pure job logic, yet hauls up the entire host + containers
}
// Correct: a module test + FakeCommandDispatcher capturing the dispatch is enough

// Counter-example 3: mocking away the very boundary you meant to verify
var fakeDb = new Mock<IDbThing>(); // ❌ this bug only appears on a real DB round-trip
// Correct: for DB round-trip semantics, use real PostgreSQL via Testcontainers

// Counter-example 4: test data hidden in an opaque helper, the scenario's key values invisible
const data = buildEverything() // ❌ time zone? boundary value? none visible
// Correct: inline explicit seed, clear intent (including time-zone-sensitive values)

Other things to avoid: don't default every backend test into a host integration test; don't change boundaries by routing around architecture tests; don't reach through another module's internals when testing a module; don't change an endpoint's shape and forget the contract test; don't move frontend tests off to a distant global directory; don't test implementation details when user-visible assertions are available; don't let flaky tests linger without fixing the root cause.

Putting it into practice

First ask "where's the smallest layer": if a module test can prove it, don't go host; if a unit test can prove it, don't go E2E.
Mock the seam, not the system: use fakes/in-memory deps for narrow logic; when the wiring itself is the value, use real Testcontainers deps.
Keep test data nearby, minimal, and explicit: a small seed beats a giant fixture; for time logic, always write the time zone explicitly.
Assert user-visible behavior: prefer role/label/visible text, stay away from brittle DOM traversal.
Treat coverage and architecture/contract tests as a gate, not decoration: when a contract changes, update contract and integration tests together; when something goes flaky, find the root cause, don't let it camp between red and green.

The transferable layer

Strip away the specific tooling of Vitest and Testcontainers, and the genuinely transferable insight of testing is: a test's value depends on its distance from "the behavior that can actually fail." Layer too low, mock too hard, and the test stands beside the failing behavior rather than on it; layer too high, real dependencies for everything, and the test is slow and brittle, until nobody wants to run it.

Before writing each test, instead of first thinking "how do I make it pass," first locate the behavior that can go wrong, then ask: what is the smallest layer that proves this behavior holds without bypassing it? Seat the test precisely at that layer, mock only the seam leading to it, and coverage turns from a number into a real safety net.