An await That Resolves to undefined: The Mechanism Ledger of Runtime Boundaries, Silent Fallback, and "Shape Constraints"

We hold an almost religious assumption about async/await: as long as the inner Promise resolves with some value, the outer await receives that value. The assumption is correct at the spec level and, in our experience, has never misbehaved on V8/JSC, so we never treat "await x yields the value x resolved with" as a conclusion that has preconditions.

But that equation rests on an implicit premise: the underlying microtask scheduling, the Promise implementation, and the cancellation/lifecycle machinery the framework imposes at this call site all agree with one another. When the queryFn boundary simultaneously stacks react-query v5's signal cancellation, the dev-mode observer lifecycle, and Hermes's async implementation on RN 0.81, the premise breaks: inside the queryFn body, an await whose inner Promise resolved with the correct value evaluates to undefined. This post isn't about concluding "Hermes has a bug." It is the complete mechanism ledger of the failure: the runtime boundary, the silent fallback, and why the two "shape constraints" it forces on us are transferable.

The symptom: every inner layer succeeds, but the boundary is undefined

The stack is Expo SDK 56 → RN 0.81 → Hermes → @tanstack/react-query@5.99.x. The web build is unaffected: it runs on V8/JSC, where async/await in queryFn behaves exactly as specified. Mobile, however, hit a deeply weird regression: the "Sign in with Microsoft" button vanished for no apparent reason.

The button's visibility is decided by a providers endpoint, and queryFn was written like this at the time:

// Wrong (mobile): uses await in the queryFn body
queryOptions({
  queryKey: ['auth', 'providers'],
  queryFn: async ({ signal }) => {
    try {
      const r = await authClient.providers({ signal }) // ← r may be undefined
      return r.providers
    } catch {
      return [...FALLBACK_PROVIDERS]
    }
  },
})

The facts we traced are deeply counterintuitive:

authClient.providers is an async function; it awaits apiClient.get(...), which in turn awaits fetch(...);
the entire inner chain fully succeeds: fetch returns status=200, the body parses to { providers: ["credentials", "microsoft"] }, and authClient.providers does return that object;
but the await in the queryFn body resolves to undefined. So r.providers throws a TypeError, the catch swallows it, the query returns FALLBACK_PROVIDERS, and the SSO button is gone.
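The last step is worth seeing in isolation, because it explains why nothing surfaced. A minimal sketch, assuming a stub (`brokenBoundaryCall`, hypothetical) that resolves to `undefined` to stand in for the bad boundary; it does not reproduce the Hermes behavior, it only shows how a bare `catch` turns the resulting TypeError into a plausible-looking provider list. The `FALLBACK_PROVIDERS` value here is also an assumption (the "credentials-only" case).

```typescript
// Hypothetical stand-in: the real FALLBACK_PROVIDERS lives in the app.
const FALLBACK_PROVIDERS: readonly string[] = ['credentials']

type ProvidersResponse = { providers: string[] }

// Simulates the broken boundary: the awaited value arrives as undefined even
// though the real client produced { providers: [...] }. This is a stub, not
// a reproduction of the Hermes behavior.
function brokenBoundaryCall(): Promise<ProvidersResponse> {
  return Promise.resolve(undefined as unknown as ProvidersResponse)
}

async function queryFnWithSilentCatch(): Promise<string[]> {
  try {
    const r = await brokenBoundaryCall()
    return r.providers // throws TypeError because r is undefined
  } catch {
    return [...FALLBACK_PROVIDERS] // swallowed: no log, no rethrow, no telemetry
  }
}

queryFnWithSilentCatch().then((providers) => {
  // The caller sees a plausible provider list; nothing records the TypeError.
  console.log('providers:', providers)
})
```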

Swap the queryFn body for a .then chain — same authClient.providers call, same signal, everything else unchanged — and the bug disappears.
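At the spec level the two shapes are interchangeable, which is what makes this result so strange. A minimal sketch, assuming a hypothetical async stub `fakeProviders` in place of authClient.providers, shows that on a spec-compliant engine such as V8 both forms yield the same array:

```typescript
type ProvidersResponse = { providers: string[] }

// Hypothetical stub for authClient.providers: an async function, like the real one.
async function fakeProviders(_opts: { signal: AbortSignal }): Promise<ProvidersResponse> {
  return { providers: ['credentials', 'microsoft'] }
}

// Shape A: await in the queryFn body (the shape that broke on Hermes).
const queryFnAwait = async ({ signal }: { signal: AbortSignal }) => {
  const r = await fakeProviders({ signal })
  return r.providers
}

// Shape B: a plain .then chain (the shape that works on Hermes).
const queryFnThen = ({ signal }: { signal: AbortSignal }) =>
  fakeProviders({ signal }).then((r) => r.providers)

const { signal } = new AbortController()
Promise.all([queryFnAwait({ signal }), queryFnThen({ signal })]).then(([a, b]) => {
  // On a spec-compliant engine these always match.
  console.log(JSON.stringify(a) === JSON.stringify(b)) // true
})
```

Because the spec makes both shapes equivalent, a difference between them on one runtime points at the runtime or the call site, not at the business logic.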

The mechanism: a coordination failure on one specific runtime boundary

To be honest, the root cause isn't fully nailed down. The observable facts: the failure happens only at the queryFn function boundary, and only when all three of the following are present at once:

react-query v5's signal-driven cancellation;
react-query's dev-mode observer lifecycle;
Hermes's implementation of async functions / microtask scheduling on RN 0.81.

Together, the three appear to make the await on this boundary resolve to undefined, even though the inner Promise clearly resolved with the correct value. The key evidence is how narrowly the failure is scoped: all the inner async/awaits (authClient, apiClient, fetch) work fine, and the failure pins only to the one queryFn layer that react-query calls directly. That suggests the problem isn't in Hermes's async implementation by itself, but at the intersection of Hermes's async and the cancellation/observation machinery react-query imposes at this call site: another case of "three mechanisms, each individually correct, whose shared premise lapses at their intersection."
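To establish the "inner succeeds, outer is undefined" signature in your own app, record what the client call returns at its own return site and compare it with what the queryFn body's await receives. A minimal sketch: `instrumentedProviders` and `innerResults` are hypothetical stand-ins for adding a record step inside the real authClient.providers, and the await in the queryFn body is left untouched so the boundary under test keeps its original shape. On V8 the warning branch never fires.

```typescript
type ProvidersResponse = { providers: string[] }

// Every value the client layer returns is recorded at its own return site.
const innerResults: ProvidersResponse[] = []

// Stand-in for authClient.providers; the real one awaits apiClient.get / fetch.
async function instrumentedProviders(_opts: { signal: AbortSignal }): Promise<ProvidersResponse> {
  const result = { providers: ['credentials', 'microsoft'] }
  innerResults.push(result)
  return result
}

// Same shape as the failing queryFn: the await sits directly in the queryFn body.
const probedQueryFn = async ({ signal }: { signal: AbortSignal }) => {
  const r = await instrumentedProviders({ signal })
  const inner = innerResults[innerResults.length - 1]
  if (inner !== undefined && r === undefined) {
    // The signature described above: the client returned a value, the await saw undefined.
    console.warn('[probe] client returned a value but the queryFn await saw undefined', inner)
  }
  return r
}
```

Keep the probe inside the queryFn body itself; moving the await into a helper changes the very boundary you are trying to observe.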

This also points directly at the fix shape: write the queryFn body as a .then chain, which keeps the async-function machinery (the implicit wrapper Promise and the resumption microtask after each await) off this boundary, and the problem consistently disappears.

Shape constraint 1: queryFn uses .then, and multi-step control flow moves outward

// Single fetch + fail-soft fallback
queryFn: ({ signal }) =>
  authClient.providers({ signal })
    .then((r) => r.providers)
    .catch(() => [...FALLBACK_PROVIDERS]),

// Single fetch + throw on error
queryFn: ({ signal }) =>
  authClient.me({ signal }).then((dto) => normaliseUser(dto)),

// Chained fetch
queryFn: ({ signal }) =>
  authClient.me({ signal })
    .then((user) => sitesClient.listForUser(user.id, { signal }))
    .then((sites) => sites.map(toViewModel)),

// Parallel fetch
queryFn: ({ signal }) =>
  Promise.all([
    sitesClient.list({ signal }),
    timeWindowClient.current({ signal }),
  ]).then(([sites, window]) => composeDashboard(sites, window)),

This constraint governs only the function passed to queryFn. The client layer it calls internally (authClient, apiClient, custom fetchers) can use async/await freely; that path is fine. This matters: the constraint isn't "disable async across the whole stack," it's pinned precisely to the boundary that breaks.

If you want async/await to express multi-step control flow, extract that logic into a standalone function outside queryFn; the helper can be async:

async function loadDashboard(signal: AbortSignal) {
  const user = await authClient.me({ signal })
  const sites = await sitesClient.listForUser(user.id, { signal })
  return { user, sites }
}

queryOptions({
  queryKey: DASHBOARD_QUERY_KEY,
  queryFn: ({ signal }) => loadDashboard(signal),
})

queryFn itself stays one line, returning that async function's Promise. Moving the awaits across the queryFn boundary into a standalone function appears to avoid the problem while keeping both readability and correctness. Don't refactor the queryFn body back to async/await for "readability": on mobile this isn't a style choice, it's a correctness constraint.
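A lint rule can keep this constraint from quietly eroding in review. A minimal sketch, assuming ESLint's flat config format: it uses ESLint's core `no-restricted-syntax` rule with AST selectors to reject async functions written inline on a property named `queryFn`. The `queryFnShapeGuard` name, file globs, and messages are assumptions, and the selectors only see inline functions; a queryFn passed by reference (for example `queryFn: fetchThing`) slips through.

```typescript
// Flat-config entry for eslint.config.ts (assumes ESLint's flat config format).
// no-restricted-syntax is a core ESLint rule; the selectors are esquery
// selectors over ESTree nodes, which typescript-eslint's parser also produces.
const queryFnShapeGuard = {
  files: ['**/*.ts', '**/*.tsx'], // assumption: scope this to the mobile app's sources
  rules: {
    'no-restricted-syntax': [
      'error',
      {
        // queryFn: async ({ signal }) => { ... }
        selector: "Property[key.name='queryFn'] > ArrowFunctionExpression[async=true]",
        message: 'queryFn must not be async on mobile (Hermes); return a .then chain or call an async helper.',
      },
      {
        // queryFn: async function () { ... }   and   async queryFn() { ... }
        selector: "Property[key.name='queryFn'] > FunctionExpression[async=true]",
        message: 'queryFn must not be async on mobile (Hermes); return a .then chain or call an async helper.',
      },
    ],
  },
}
```

Add it to the exported config array alongside your existing entries. In flat config, a later entry's options for the same rule replace earlier ones, so if `no-restricted-syntax` is already configured elsewhere, merge these selectors into that single entry.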

Shape constraint 2: a fail-soft catch must never be completely silent

This regression took so long to debug because half the root cause is the runtime boundary above, and the other half is that the fallback is completely silent:

} catch {
  return [...FALLBACK_PROVIDERS];   // no log, no telemetry
}

It silently translates a network/contract failure into a UI feature-flag flip ("the Microsoft button just isn't there"). Falling back isn't bad design in itself: rendering the login page as "credentials-only" when the providers endpoint is unreachable is reasonable. What's bad is that the fallback leaves no trace, so a wrong base URL, an ATS rejection, and a DNS problem all disguise themselves as the same harmless state. So from now on, every queryFn's fail-soft fallback must:

Log the error at least once (console.warn in dev, a real logger in prod), with the query key and the original error;
Comment clearly why this query is allowed to fall back ("providers is a deployment-time switch; if the backend is unreachable, treat it as no extra provider"), not just what it falls back to.

queryFn: ({ signal }) =>
  authClient.providers({ signal })
    .then((r) => r.providers)
    .catch((err) => {
      // providers is a deployment-time list. When the providers endpoint is unreachable,
      // we render the login page as "credentials-only login" — but we always surface the
      // failure, so a wrong base URL / ATS rejection / DNS issue doesn't disguise itself
      // as "Microsoft SSO is disabled."
      console.warn("[useAuthProvidersQuery] providers fetch failed", { queryKey: ["auth", "providers"], err });
      return [...FALLBACK_PROVIDERS];
    }),
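Because the same two rules recur in every fail-soft query, they can be factored into a helper. A minimal sketch, assuming a hypothetical `QueryLogger` interface and `failSoft` helper (neither is a react-query API): the handler always logs the query key and the original error before returning a fresh copy of the fallback.

```typescript
// Hypothetical logger contract: console-backed in dev, a real logger in prod.
interface QueryLogger {
  warn(message: string, context: { queryKey: readonly unknown[]; error: unknown }): void
}

const consoleLogger: QueryLogger = {
  warn: (message, context) => console.warn(message, context),
}

// Builds a .catch handler that always leaves a trace (query key + original
// error) before degrading. Returning a copy keeps callers from mutating the
// shared fallback array.
function failSoft<T>(
  queryKey: readonly unknown[],
  fallback: readonly T[],
  logger: QueryLogger = consoleLogger,
): (error: unknown) => T[] {
  return (error) => {
    logger.warn(`[query ${JSON.stringify(queryKey)}] fell back after error`, { queryKey, error })
    return [...fallback]
  }
}

// Demo: a hypothetical client call that rejects, plus a logger that records calls.
const FALLBACK_PROVIDERS: readonly string[] = ['credentials']
const logged: unknown[] = []
const recordingLogger: QueryLogger = {
  warn: (_message, context) => {
    logged.push(context.error)
  },
}

function failingProviders(): Promise<{ providers: string[] }> {
  return Promise.reject(new TypeError('Network request failed'))
}

const demo = failingProviders()
  .then((r) => r.providers)
  // The reason this query may fall back still belongs in a comment here.
  .catch(failSoft(['auth', 'providers'], FALLBACK_PROVIDERS, recordingLogger))
```

The helper guarantees the trace; the reason each query is allowed to fall back still has to be written by hand at the call site, since no helper can know it.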

Lock both constraints in with a regression guard

The test runs queryFn directly with a mocked client:

| Case | Assertion |
| --- | --- |
| Client resolves the expected payload | `queryFn` returns the mapped data |
| Client rejects with an `ApiError` | `queryFn` either re-throws, or returns the documented fallback and the warning logger was called |
| `signal` already aborted | `queryFn` re-throws `AbortError` or returns the fallback; it never silently returns `undefined` |

The third case specifically guards the regression that prompted this post — it turns "never silently return undefined" from a verbal agreement into a red/green assertion.
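A minimal sketch of those three cases as a self-contained script, assuming the queryFn is factored as a hypothetical `makeProvidersQueryFn(client, warn)` so both the client and the logger can be mocked; `ApiError` is a stand-in class. A real suite would express the same checks in Jest or Vitest. The harness runs on Node, so it uses async/await freely; the .then constraint applies only to the queryFn it exercises.

```typescript
// Stand-in for the app's ApiError.
class ApiError extends Error {
  readonly status: number
  constructor(status: number) {
    super(`API error ${status}`)
    this.name = 'ApiError'
    this.status = status
  }
}

type ProvidersClient = {
  providers(opts: { signal: AbortSignal }): Promise<{ providers: string[] }>
}

const FALLBACK_PROVIDERS: readonly string[] = ['credentials']

// Hypothetical factoring of the queryFn under test: client and warn are injected.
function makeProvidersQueryFn(client: ProvidersClient, warn: (...args: unknown[]) => void) {
  return ({ signal }: { signal: AbortSignal }) =>
    client
      .providers({ signal })
      .then((r) => r.providers)
      .catch((err: unknown) => {
        warn('[useAuthProvidersQuery] providers fetch failed', err)
        return [...FALLBACK_PROVIDERS]
      })
}

function check(condition: boolean, message: string): void {
  if (!condition) throw new Error(message)
}

async function runCases(): Promise<number> {
  let passed = 0

  // Case 1: client resolves the expected payload -> mapped data, no warning.
  {
    const warnings: unknown[][] = []
    const queryFn = makeProvidersQueryFn(
      { providers: async () => ({ providers: ['credentials', 'microsoft'] }) },
      (...args) => warnings.push(args),
    )
    const data = await queryFn({ signal: new AbortController().signal })
    check(data.join(',') === 'credentials,microsoft', 'case 1: mapped data')
    check(warnings.length === 0, 'case 1: no warning')
    passed++
  }

  // Case 2: client rejects with ApiError -> documented fallback, warning logged.
  {
    const warnings: unknown[][] = []
    const queryFn = makeProvidersQueryFn(
      { providers: () => Promise.reject(new ApiError(503)) },
      (...args) => warnings.push(args),
    )
    const data = await queryFn({ signal: new AbortController().signal })
    check(data.join(',') === 'credentials', 'case 2: documented fallback')
    check(warnings.length === 1 && warnings[0][1] instanceof ApiError, 'case 2: original error logged')
    passed++
  }

  // Case 3: signal already aborted -> fallback (never a silent undefined).
  {
    const warnings: unknown[][] = []
    const controller = new AbortController()
    controller.abort()
    const queryFn = makeProvidersQueryFn(
      {
        // Mimics fetch: an aborted signal rejects with an AbortError.
        providers: ({ signal }) =>
          signal.aborted
            ? Promise.reject(Object.assign(new Error('The operation was aborted.'), { name: 'AbortError' }))
            : Promise.resolve({ providers: ['credentials', 'microsoft'] }),
      },
      (...args) => warnings.push(args),
    )
    const data = await queryFn({ signal: controller.signal })
    check(data !== undefined, 'case 3: never silently undefined')
    check(data.join(',') === 'credentials' && warnings.length === 1, 'case 3: fallback with warning')
    passed++
  }

  return passed
}

const done = runCases()
done.then((n) => console.log(`${n} cases passed`))
```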

The transferable layer

Setting aside the specifics of Hermes and react-query, this case offers two transferable lessons.

A "universally correct" language feature can fail on one specific runtime × framework boundary. async/await is impeccable at the spec level, but in practice its behavior depends on how the runtime implements the underlying scheduling. When you find that on some boundary "the inner layer succeeds but the outer one is empty," don't start by suspecting the business logic; first suspect the boundary itself. Swapping it for a more primitive Promise shape is a cheap bisection.

A silent fail-soft swallows the "diagnostic signal" too. Fault tolerance and observability are two different things: you can choose to degrade gracefully, but the degraded path must leave a trace. A catch without even a log re-encodes "the system is broken" as "the feature is just like this" — the most expensive disguise to debug. When designing any fallback, ask: if this fallback were wrongly triggered, would I have any way to know?