The assumption hiding in your schema
Every time you add a source_locale column to a translatable table, you're making a quiet architectural decision: one language is real, the rest are derived. It feels natural — especially if your team thinks in English first — but that assumption has a way of rotting your system from the inside.
A developer recently shipped a bilingual LMS using exactly this pattern: a source_locale field on every translatable entity, an MT pipeline reading source rows and writing overlay rows. It worked. Users used it. And then the edge cases started arriving.
> "The architecture modeled one language as source and the rest as overlay... It worked. Students used it." Then reality intervened.
The problem isn't that source-first pipelines fail immediately. They fail gradually, in ways that are hard to debug because the asymmetry is baked into your schema rather than visible in your business logic.
What goes wrong with source-first models
Here's a minimal version of the pattern that causes trouble:
```sql
CREATE TABLE articles (
  id UUID PRIMARY KEY,
  source_locale VARCHAR(10) NOT NULL, -- e.g. 'en'
  title TEXT NOT NULL,
  body TEXT NOT NULL
);

CREATE TABLE article_translations (
  article_id UUID REFERENCES articles(id),
  locale VARCHAR(10) NOT NULL,
  title TEXT,
  body TEXT,
  PRIMARY KEY (article_id, locale)
);
```

This looks reasonable until:
- A French-speaking author creates content in French first. Now `source_locale = 'fr'` and English is the overlay, but your MT pipeline, your admin UI, your fallback logic, and your CMS all assume English is the base.
- A bilingual team edits both locales independently. Which one is authoritative now? Your schema can't represent that.
- You add a third language and need to know which version is "freshest", but `article_translations` has no timestamp relative to the source, so you can't detect drift.
The deeper issue: your schema has encoded a workflow assumption as a data constraint. When the workflow changes, the data model fights you.
The symmetric alternative
The fix is to stop modeling languages as source + overlays, and start modeling them as peers with explicit relationships.
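```sql
CREATE TABLE content_items (
  id UUID PRIMARY KEY,
  content_type VARCHAR(50) NOT NULL, -- 'article', 'lesson', etc.
  created_at TIMESTAMPTZ DEFAULT now()
  -- NO source_locale here
);

CREATE TABLE content_locales (
  id UUID PRIMARY KEY,
  content_item_id UUID NOT NULL REFERENCES content_items(id),
  locale VARCHAR(10) NOT NULL,
  title TEXT NOT NULL,
  body TEXT NOT NULL,
  derived_from_locale UUID REFERENCES content_locales(id), -- nullable, self-referential
  translation_status VARCHAR(20) NOT NULL, -- e.g. 'original', 'machine', 'reviewed'
  authored_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
```

The key differences: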
- No `source_locale` on the parent. The content item is just an identity anchor, not a language-carrying entity.
- `derived_from_locale` is nullable and self-referential. Any locale can be the origin of any other locale. French can derive from English, or English from French, or neither from the other.
- `translation_status` tracks provenance. You can detect machine-translated content, human-reviewed content, and stale content that was translated from an older version.
This maps directly to how Drupal's symmetric translation model works: shared structural identity, locale-specific text stored as peers. The insight isn't new — but most application-layer implementations still reach for the simpler, asymmetric pattern because it's easier to start with.
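Because `derived_from_locale` can point from any variant to any other, "which language is the origin?" becomes a question about a small graph rather than a column lookup. A minimal sketch, assuming rows have been loaded into memory with the hypothetical shape below (field names mirror the columns above; `originOf` is illustrative, not from the original post): walk `derivedFromLocale` until it is null, and stop with an error if bad data creates a cycle or a dangling link.

```typescript
// Hypothetical in-memory shape of a content_locales row.
type TranslationStatus = "original" | "machine" | "reviewed" | "stale";

interface LocaleVariant {
  id: string;
  locale: string;
  derivedFromLocale: string | null; // id of the variant this one was translated from
  translationStatus: TranslationStatus;
}

// Follow derivedFromLocale links back to the variant nothing was derived from.
// Throws if a link is dangling or the chain loops back on itself.
function originOf(start: LocaleVariant, byId: Map<string, LocaleVariant>): LocaleVariant {
  const seen = new Set<string>();
  let current = start;
  while (current.derivedFromLocale !== null) {
    if (seen.has(current.id)) {
      throw new Error(`derivation cycle at ${current.id}`);
    }
    seen.add(current.id);
    const parent = byId.get(current.derivedFromLocale);
    if (parent === undefined) {
      throw new Error(`dangling derived_from_locale on ${current.id}`);
    }
    current = parent;
  }
  return current;
}

// Example: French is the origin, English was machine-translated from it,
// and German was translated from the English version.
const fr: LocaleVariant = { id: "v-fr", locale: "fr", derivedFromLocale: null, translationStatus: "original" };
const en: LocaleVariant = { id: "v-en", locale: "en", derivedFromLocale: "v-fr", translationStatus: "machine" };
const de: LocaleVariant = { id: "v-de", locale: "de", derivedFromLocale: "v-en", translationStatus: "reviewed" };
const byId = new Map<string, LocaleVariant>([[fr.id, fr], [en.id, en], [de.id, de]]);

console.log(originOf(de, byId).locale); // → fr
```

Nothing in the walk privileges a particular language: if the English row had been written first, the same function would return English.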
Relationships, not hierarchies
Once you stop encoding hierarchy in your schema, you unlock better queries. Want to find all content where the French version is the authoritative origin?
```sql
SELECT ci.id, cl.title, cl.locale
FROM content_items ci
JOIN content_locales cl ON cl.content_item_id = ci.id
WHERE cl.locale = 'fr'
  AND cl.derived_from_locale IS NULL
  AND cl.translation_status = 'original';
```

Want to find stale translations, meaning locales that were derived from a version that has since been updated?
```sql
-- A derived variant is stale if its parent was edited after it was written
SELECT child.locale, child.content_item_id
FROM content_locales child
JOIN content_locales parent ON parent.id = child.derived_from_locale
WHERE parent.authored_at > child.authored_at;
```

This kind of query is impossible in the overlay model unless you've added extra timestamp tracking, which most teams don't, because the source-first assumption implies the source is always fresh.
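The same check is easy to run outside the database, for example in a background job that flags drift. A minimal sketch, assuming each loaded row carries its `authoredAt` timestamp and that editing a variant bumps that timestamp (the SQL query relies on the same assumption); `findStaleTranslations` is a hypothetical helper, not part of any library.

```typescript
// Hypothetical in-memory shape of a content_locales row.
interface LocaleRow {
  id: string;
  contentItemId: string;
  locale: string;
  derivedFromLocale: string | null; // id of the parent variant, if any
  authoredAt: Date; // assumed to be bumped on every edit
}

// Mirrors: JOIN content_locales parent ON parent.id = child.derived_from_locale
//          WHERE parent.authored_at > child.authored_at
function findStaleTranslations(rows: LocaleRow[]): LocaleRow[] {
  const byId = new Map<string, LocaleRow>();
  for (const row of rows) byId.set(row.id, row);

  return rows.filter((child) => {
    if (child.derivedFromLocale === null) return false; // originals have no parent to drift from
    const parent = byId.get(child.derivedFromLocale);
    // Like the inner join, skip children whose parent row is missing.
    return parent !== undefined && parent.authoredAt.getTime() > child.authoredAt.getTime();
  });
}

const rows: LocaleRow[] = [
  { id: "en-1", contentItemId: "c1", locale: "en", derivedFromLocale: null,
    authoredAt: new Date("2024-03-10T09:00:00Z") },
  // Translated before the English original was last edited: stale.
  { id: "fr-1", contentItemId: "c1", locale: "fr", derivedFromLocale: "en-1",
    authoredAt: new Date("2024-03-01T09:00:00Z") },
  // Translated after the last English edit: fresh.
  { id: "de-1", contentItemId: "c1", locale: "de", derivedFromLocale: "en-1",
    authoredAt: new Date("2024-03-12T09:00:00Z") },
];

console.log(findStaleTranslations(rows).map((r) => r.locale)); // → [ 'fr' ]
```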
The pipeline problem
Schema asymmetry infects your MT pipeline too. If your translation pipeline hardcodes "read from en, write to everything else," you've built English supremacy into your infrastructure. Transifex explicitly documents workflows where a non-English language is the source and English is the translated target — because for many teams, that's just reality.
A symmetric pipeline treats locale selection as a runtime decision, not a compile-time constant:
```typescript
// `db` and `TranslationEngine` stand in for your own data layer and MT client.
async function requestTranslation(
  contentLocaleId: string,
  targetLocale: string,
  engine: TranslationEngine
): Promise<string> {
  const source = await db.contentLocales.findById(contentLocaleId);
  // No assumption about which locale this is
  const translated = await engine.translate(source.body, {
    from: source.locale, // whichever locale the row holds (option names assumed)
    to: targetLocale,
  });
  return translated;
}
```

Notice: `requestTranslation` doesn't know or care whether `contentLocaleId` is the "source" language. Any locale can be translated from. Any locale can be the origin.
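To see the symmetry end to end, here is a self-contained sketch that also persists the result. Everything in it is hypothetical: the engine interface is reduced to a single `translate(text, from, to)` method, an in-memory `Map` stands in for the database, and `translateVariant` is an illustrative name. The point it shows is that the new row records the variant it was actually translated from and is marked `machine`, so French-to-English goes through exactly the same path as English-to-French.

```typescript
// Hypothetical engine interface: any MT provider could sit behind it.
interface TranslationEngine {
  translate(text: string, from: string, to: string): Promise<string>;
}

interface LocaleVariant {
  id: string;
  contentItemId: string;
  locale: string;
  body: string;
  derivedFromLocale: string | null;
  translationStatus: "original" | "machine" | "reviewed";
}

// In-memory stand-in for the content_locales table.
const store = new Map<string, LocaleVariant>();

async function translateVariant(
  sourceId: string,
  targetLocale: string,
  engine: TranslationEngine
): Promise<LocaleVariant> {
  const source = store.get(sourceId);
  if (source === undefined) throw new Error(`unknown variant ${sourceId}`);

  // The source locale is read from the row, never hardcoded.
  const body = await engine.translate(source.body, source.locale, targetLocale);
  const derived: LocaleVariant = {
    id: `${source.contentItemId}:${targetLocale}`,
    contentItemId: source.contentItemId,
    locale: targetLocale,
    body,
    derivedFromLocale: source.id, // provenance points at the actual parent
    translationStatus: "machine",
  };
  store.set(derived.id, derived);
  return derived;
}

// Fake engine for demonstration: tags the text instead of translating it.
const fakeEngine: TranslationEngine = {
  async translate(text, from, to) {
    return `[${from}->${to}] ${text}`;
  },
};

store.set("c1:fr", {
  id: "c1:fr", contentItemId: "c1", locale: "fr", body: "Bonjour",
  derivedFromLocale: null, translationStatus: "original",
});

translateVariant("c1:fr", "en", fakeEngine).then((v) => console.log(v.body));
// → [fr->en] Bonjour
```

Chaining works the same way: translating the English row into German produces a row whose `derivedFromLocale` is the English variant, not some global "source".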
The fallback trap
One place developers usually re-introduce asymmetry is fallback behavior: "if French isn't available, show English." That's fine as a product policy — but it should be encoded as product logic, not inferred from your data model.
```typescript
// Minimal types assumed for this snippet.
interface LocaleVariant { locale: string; title: string; body: string; }
interface ContentItem { id: string; locales: LocaleVariant[]; }

const FALLBACK_CHAIN: Record<string, string[]> = {
  'fr-CA': ['fr', 'en'],
  'pt-BR': ['pt', 'en'],
  'zh-TW': ['zh-HK', 'zh', 'en'],
};

function resolveContent(item: ContentItem, preferredLocale: string): LocaleVariant | null {
  // Always try the requested locale first, then its configured fallbacks.
  const chain = [preferredLocale, ...(FALLBACK_CHAIN[preferredLocale] ?? ['en'])];
  for (const locale of chain) {
    const variant = item.locales.find(l => l.locale === locale);
    if (variant) return variant;
  }
  return null; // explicit: no content, not a silent English default
}
```

Explicit fallback chains are visible, testable, and changeable. Silent English defaults are none of those things.
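Because the chains are plain data, they can be checked in a unit test or at startup. A minimal sketch, assuming the application keeps a list of locales it actually ships (`SUPPORTED` and `validateFallbackChains` are hypothetical names): flag chains that mention unsupported locales, repeat a locale, or list the requested locale as its own fallback.

```typescript
// Hypothetical list of locales the product actually ships.
const SUPPORTED = new Set(["en", "fr", "fr-CA", "pt", "pt-BR", "zh", "zh-HK", "zh-TW"]);

const FALLBACK_CHAIN: Record<string, string[]> = {
  "fr-CA": ["fr", "en"],
  "pt-BR": ["pt", "en"],
  "zh-TW": ["zh-HK", "zh", "en"],
};

// Returns human-readable problems; an empty array means the config is consistent.
function validateFallbackChains(
  chains: Record<string, string[]>,
  supported: Set<string>
): string[] {
  const problems: string[] = [];
  for (const [locale, chain] of Object.entries(chains)) {
    if (!supported.has(locale)) problems.push(`${locale}: not a supported locale`);
    const seen = new Set<string>();
    for (const fallback of chain) {
      if (fallback === locale) problems.push(`${locale}: lists itself as a fallback`);
      if (!supported.has(fallback)) problems.push(`${locale}: unsupported fallback ${fallback}`);
      if (seen.has(fallback)) problems.push(`${locale}: repeats ${fallback}`);
      seen.add(fallback);
    }
  }
  return problems;
}

console.log(validateFallbackChains(FALLBACK_CHAIN, SUPPORTED)); // → []
console.log(validateFallbackChains({ "es-MX": ["es", "es", "en"] }, SUPPORTED));
```

Running a check like this in CI means a typo in a locale code fails the build instead of silently routing users to English.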
Why this matters
Most developers working on multilingual systems inherit the source-first model because it's the path of least resistance, and because most documentation, tutorials, and ORM plugins assume it. But the assumption encodes English-first thinking into infrastructure that may outlast the team that built it.
The practical next steps if you're building or refactoring a multilingual system:
- Audit your schema for `source_locale` fields or translation tables that reference a "base" record without explicit relationship tracking.
- Add `derived_from_locale` and `translation_status` to your locale variants so you can detect drift and provenance.
- Decouple your MT pipeline from any hardcoded source locale — make source selection a parameter, not a constant.
- Encode fallback logic explicitly in application code or configuration, not implicitly in which language you happened to store in the parent table.
- Add CI parity checks — verify that all required locales exist, that structural fields match across variants, and that no locales are silently stale.
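The parity checks in the last item can start as a small script that fails the build when it finds problems. A minimal sketch, assuming each variant exposes a hypothetical `structure` array (for example, the ordered list of section or block identifiers) that should be identical across locales, plus the `derivedFrom` and `authoredAt` provenance fields discussed above; `checkParity` and all field names are illustrative.

```typescript
interface Variant {
  id: string;
  locale: string;
  structure: string[]; // hypothetical: ordered block/section ids shared across locales
  derivedFrom: string | null; // id of the parent variant, if translated
  authoredAt: Date;
}

interface Item {
  id: string;
  variants: Variant[];
}

function checkParity(items: Item[], requiredLocales: string[]): string[] {
  const problems: string[] = [];
  for (const item of items) {
    const byLocale = new Map<string, Variant>();
    const byId = new Map<string, Variant>();
    for (const v of item.variants) {
      byLocale.set(v.locale, v);
      byId.set(v.id, v);
    }

    // 1. Every required locale exists.
    for (const locale of requiredLocales) {
      if (!byLocale.has(locale)) problems.push(`${item.id}: missing ${locale}`);
    }

    // 2. Structural fields match across variants (compared against the first one).
    const reference = item.variants[0];
    for (const v of item.variants) {
      if (reference && JSON.stringify(v.structure) !== JSON.stringify(reference.structure)) {
        problems.push(`${item.id}: ${v.locale} structure differs from ${reference.locale}`);
      }
    }

    // 3. No derived variant is older than the variant it came from.
    for (const v of item.variants) {
      const parent = v.derivedFrom === null ? undefined : byId.get(v.derivedFrom);
      if (parent && parent.authoredAt.getTime() > v.authoredAt.getTime()) {
        problems.push(`${item.id}: ${v.locale} is stale relative to ${parent.locale}`);
      }
    }
  }
  return problems;
}

const items: Item[] = [{
  id: "lesson-1",
  variants: [
    { id: "a", locale: "fr", structure: ["intro", "quiz"], derivedFrom: null,
      authoredAt: new Date("2024-05-02T00:00:00Z") },
    { id: "b", locale: "en", structure: ["intro"], derivedFrom: "a",
      authoredAt: new Date("2024-05-01T00:00:00Z") },
  ],
}];

const problems = checkParity(items, ["en", "fr", "de"]);
problems.forEach((p) => console.log(p));
// → lesson-1: missing de
// → lesson-1: en structure differs from fr
// → lesson-1: en is stale relative to fr
```

In CI, a non-empty result would fail the job, for example by throwing once the report has been printed.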
The content itself doesn't have a source language. Your architecture shouldn't either.

