The Language We Are Quietly Losing

On idioms, culture-specific speech, and what gets sanded off when a machine helps you write.

There is a Portuguese word, saudade, that English speakers love to call untranslatable. It is usually defined as something like “a deep, melancholic longing for something or someone absent, possibly never to return.” That is not a translation. That is a small essay pretending to be one. The word does work in Portuguese that English needs five clauses and a wistful pause to do — and even then, what the English version produces in a reader’s chest is not quite the same shape as what saudade produces in the chest of someone who grew up with it.

Every language has words like this. Japanese has wabi-sabi, the acceptance of beauty in transience and imperfection. German has Feierabend, which is not “the end of the workday” but the specific spiritual condition of having earned the rest of your evening. Welsh has hiraeth, a homesickness for a home you cannot return to and possibly never had. Yiddish has trepverter, the perfect comeback you only think of on the staircase as you are leaving. Lebanese Arabic has ya’aburnee — literally “you bury me” — a way of saying you love someone so much you cannot bear to outlive them.

These words are not vocabulary curiosities. They are little fossils of how a particular people, in a particular place, decided which feelings deserved a name.

I want to talk about what happens when we feed our writing through machines that have never felt any of them.

A small thought experiment

Imagine a grandmother in Naples writing a letter to her granddaughter in Boston. The grandmother’s English is good, not perfect. She writes the way she thinks, and she thinks in a kitchen that smells like garlic softening in olive oil and a radio playing somebody’s cousin’s wedding from twenty years ago. She writes: “Tesoro, when I saw your picture I felt the heart in my mouth.”

Il cuore in gola. Literally, the heart in the throat; in her English it comes out as the mouth. An Italian would read that and feel the exact small panic she meant, the surge of love so fast it briefly locks the airway. An American editor might gently fix it to “my heart skipped a beat.” A grammar tool would underline it in blue. A well-meaning AI assistant, asked to “clean up Grandma’s letter for clarity,” would almost certainly smooth it into “when I saw your picture, I was overcome with emotion.”

Read those three versions again. The first is a woman. The second is a greeting card. The third is a press release from nobody, about nothing, sent to no one.

The grandmother has not been corrected. She has been deleted.

Now imagine this happening ten thousand times a day across a million inboxes, essays, blog posts, product descriptions, wedding speeches, condolence notes, and school assignments. That is, more or less, what is happening right now.

What idioms actually are

An idiom is not decoration. An idiom is compression. It is a piece of shared cultural memory that lets a speaker move an enormous amount of meaning across a sentence in almost no space, because both speaker and listener are standing on the same buried history.

When an American says “he threw me under the bus,” every other American instantly sees a vivid, slightly cartoonish act of betrayal: sacrificial, public, probably cowardly. When a Brazilian says “ele me deu um pé na bunda” (literally “he gave me a kick in the butt”), the specific flavor of being dumped lands in a way that “he broke up with me” simply does not. When a Japanese speaker says “猿も木から落ちる” (“even monkeys fall from trees”), the listener receives an entire worldview in a single short sentence: expertise is not immunity, humility is the only sane posture, and the person you are comforting is in good company.

Linguists have a name for this. Anna Wierzbicka, the Polish-Australian linguist who spent decades building the Natural Semantic Metalanguage, argued that languages encode cultural scripts — unspoken rules about how to feel, how to react, how to talk about what is happening inside you. An idiom is one of the densest forms those scripts take. Lose the idiom and you have not just lost a phrase. You have lost a small instruction for how to be a person in that culture.

And here is the part that should make any writer pause: the instructions are not redundant across languages. They are not the same rules in different outfits. The Finnish concept of sisu — a kind of grim, durable inner grit in the face of hopeless odds — is not just “perseverance” with a Nordic accent. It is a different shape of the thing. A Finnish grandfather telling his grandson to have sisu is handing him a specific ancestral tool. “Be perseverant” is handing him a dictionary.

The flattening machine

Here is what a large language model does, stripped to its mechanics: it predicts the most likely next token given everything that came before, using probabilities learned from the enormous corpus of text it was trained on. That corpus is overwhelmingly English. It is overwhelmingly American and British English. It is overwhelmingly the kind of English that gets published, which means it is overwhelmingly edited, standardized, professionalized, and polished.

When you ask a model to “improve” or “clarify” your writing, you are asking it to move your sentence closer to the statistical center of that corpus. And the statistical center of that corpus is a very specific kind of voice: competent, neutral, mildly friendly, culturally from nowhere in particular. It is the voice of an airline safety card. It is the voice of a hotel chain. It is the voice that a Marriott in Dubai and a Marriott in Minneapolis share, not by accident but by design.

That voice is useful. It is also a solvent.

Drop a Neapolitan grandmother’s letter into it and the heart in her mouth dissolves. Drop a Black Southern preacher’s cadence into it and the repetition, the rising call-and-response, the deliberate grammatical choices that have nothing to do with error and everything to do with a four-hundred-year-old oral tradition — all of it gets sanded into a TED Talk. Drop a working-class Glaswegian’s email into it and the entire music of the city goes quiet.

The machine is not malicious. It does not know it is doing this. It has been rewarded, in training, for producing text that a broad audience finds acceptable — and “acceptable to a broad audience” is a near-perfect definition of culturally averaged. Accessibility and averaging are the same operation performed with different intentions.
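
The pull toward the statistical center can be seen in miniature. What follows is a toy sketch, not a description of how any production model works: it assumes a made-up four-sentence corpus and a simple bigram counter, and the names train_bigrams, next_word_probability, and greedy_continue are invented for illustration. The standard phrasing appears three times, the grandmother's idiom once, and the generator always takes the most frequent next word.

```python
from collections import Counter, defaultdict

# Hypothetical toy corpus: the standard phrasing appears three times,
# the grandmother's idiom once.
CORPUS = [
    "i felt my heart skip a beat",
    "i felt my heart skip a beat",
    "i felt my heart skip a beat",
    "i felt my heart in my mouth",
]

END = "<end>"  # marker for the end of a sentence


def train_bigrams(sentences):
    """Count how often each word follows each other word."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.split() + [END]
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    return counts


def next_word_probability(counts, prev, nxt):
    """Share of the time `nxt` followed `prev` in the corpus."""
    total = sum(counts[prev].values())
    return counts[prev][nxt] / total if total else 0.0


def greedy_continue(counts, start, max_words=20):
    """Extend `start` by always picking the most frequent next word."""
    words = [start]
    while len(words) < max_words:
        followers = counts[words[-1]]
        if not followers:
            break
        best = followers.most_common(1)[0][0]
        if best == END:
            break
        words.append(best)
    return " ".join(words)


model = train_bigrams(CORPUS)
print(greedy_continue(model, "i"))                  # i felt my heart skip a beat
print(next_word_probability(model, "heart", "in"))  # 0.25
```

In this toy, the idiom is a real part of the corpus: after "heart", it is the right continuation one time in four. But a generator that always picks the most frequent continuation never produces it. Real systems sample rather than always taking the top choice, and they are trained in far more elaborate ways, so treat this only as an illustration of the direction of the pull.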

The accessibility argument, taken seriously

I want to be fair to the other side of this, because the other side is not stupid.

The case for clean, standardized, idiom-light writing is a real case. Non-native speakers of English now outnumber native speakers, so a large share of the people reading English online are reading in a second language. Idioms are genuinely hard for them. “Throw in the towel” means nothing if you have never seen a boxing match. “Spill the beans” is bewildering if your first language maps “spill” only to liquids. A writer who loads a paragraph with local idioms is, in a real sense, excluding readers who did not grow up inside the same cultural dictionary.

Accessibility matters. Clarity matters. A blog post about WordPress security that is incomprehensible to a developer in Jakarta has failed the developer in Jakarta, and that failure is not somehow noble because the prose had character.

So the question is not idioms versus clarity. The question is: what is the difference between writing clearly and writing from nowhere? Because those are not the same thing, and the machines, left to themselves, cannot tell them apart.

A clear sentence can still be a sentence that came from somewhere. “My grandmother used to say the heart in the mouth meant you loved somebody so much it scared you” is a perfectly clear sentence. A reader in Jakarta understands it completely. It also preserves the Neapolitan grandmother’s idiom inside a frame that teaches the reader what it means. The idiom survives. The culture survives. The reader is welcomed in rather than averaged out.

That is the move. That is the thing the machines are not built to do, because the machines’ reward function does not distinguish between clarifying and homogenizing. They look identical from inside a loss function.

A second thought experiment

Imagine a library. In this library, every book ever written is gradually being replaced, one at a time, with a slightly smoothed version of itself. The replacements are done overnight, by a very polite robot. The robot’s only instructions are: make this easier to read for the largest possible number of people.

On night one, the robot takes down a novel by Toni Morrison and puts back a version in which the rhythms of Black English have been standardized. The plot is intact. The characters have the same names. The sentences are easier to parse. Morrison is still credited as the author.

On night two, it does the same to Gabriel García Márquez, whose long Spanish-inflected sentences are now chopped into efficient English units. On night three, James Joyce. On night four, Zora Neale Hurston. On night five, the entire shelf of Irish poets who wrote in Hiberno-English because standard English could not carry what they needed to say.

By the end of the month, the library still contains every book. Every title is present. Every author is listed. A reader walking in and picking up Beloved would find it readable, coherent, even moving in places. They would have no way of knowing what was missing. They would have no way of knowing that the version they were reading had been quietly rewritten by a machine that did not understand what it was erasing, because the machine’s definition of better did not have a slot for irreplaceable.

This is not a hypothetical that lives in the future. It is a hypothetical that lives in every draft currently being run through a grammar checker, every email being “polished” by an assistant, every blog post being “improved for clarity” by a chatbot. The scale is different, but the mechanism is identical. We are not losing the original texts. We are losing the next texts — the ones that would have been written in a particular voice, by a particular person, from a particular place, and are instead being born pre-averaged.

What the research actually shows

This is not just a writer’s anxiety. There is a growing body of empirical work showing the flattening is measurable.

A 2023 Cornell study by Maurice Jakesch and colleagues, published at the ACM CHI conference, examined how AI writing assistants affect the opinions of the people using them. More than 1,500 participants were asked to write a short post on whether social media is good for society, either on their own or with the help of a writing assistant that had been configured to favor one view or the other. The result was stark: participants using the biased assistant were roughly twice as likely to write a paragraph agreeing with it, and significantly more likely to report holding that same view in a later survey. The researchers called the mechanism latent persuasion — influence that operates below the writer’s conscious notice, one accepted suggestion at a time. The study was about opinions, but the mechanism it describes applies to voice just as cleanly. If a tool can nudge what you believe, it can certainly nudge how you sound.

Researchers who study translation have a name for a related effect: translationese, the detectable flatness of translated text. In machine-translated output it shows up as reduced lexical variety, simplified syntax, and the systematic loss of culture-specific markers. The problem has been known for years. It has not gone away. It has scaled.
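
Reduced lexical variety is the easiest of those markers to put a number on. One crude proxy is the type-token ratio: distinct words divided by total words. The sketch below is a minimal version under stated assumptions: the function names tokenize and type_token_ratio are invented here, the tokenizer is a simple regular expression rather than a proper linguistic one, and the ratio is sensitive to text length, so it only means something when comparing samples of similar size.

```python
import re

# Letters, optionally joined by a straight or curly apostrophe (e.g. "grandmother's").
WORD_PATTERN = re.compile(r"[^\W\d_]+(?:['’][^\W\d_]+)*")


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return WORD_PATTERN.findall(text.lower())


def type_token_ratio(text):
    """Distinct words divided by total words; 0.0 for empty text."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)


# Five tokens, four distinct words ("the" repeats).
print(type_token_ratio("the cat and the dog"))  # 0.8
```

Run on a draft before and after an assistant has "polished" it, a falling ratio is one hint, and only a hint, that the vocabulary has narrowed. It says nothing about rhythm or idiom, which are harder to count.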

And there is older, deeper research worth sitting with. UNESCO estimates that at least 40% of the roughly 7,000 languages spoken today are endangered, and that, on average, a language disappears somewhere in the world every two weeks. (Linguists debate the exact number — some catalogues suggest the true rate is closer to one every few months — but no one disputes the trend line.) The causes are primarily economic and political: dominant languages crowding out smaller ones in schools, media, and commerce. AI did not start this process. But a tool that takes all remaining languages and quietly tilts their output toward the patterns of the largest ones is not a neutral participant in it. It is a new pressure on an old wound.

When UNESCO talks about linguistic diversity as part of the cultural heritage of humanity, they mean that the loss of a language is the loss of a way of perceiving the world — a loss that cannot be recovered by translation, because the thing being lost is precisely what does not translate. Idioms are the smallest units of that heritage. They are the last thing to go, and in many ways the first thing a flattening machine removes.

Why this matters for writers, specifically

If you are a writer — a blogger, a novelist, a developer documenting your work, a parent writing to your kid, a grandmother in Naples writing to Boston — you are currently being offered a deal. The deal is: let the machine help, and your writing will be faster, cleaner, more professional, and more broadly accessible.

Parts of that deal are real. I use these tools. I am using one right now to help research this post. They are genuinely useful for structural feedback, for catching the sentence that accidentally says the opposite of what you meant, for finding the statistic you half-remembered, for the hundred small labors that used to eat an afternoon.

But there is a second clause in the deal that does not get printed on the box. The second clause is: in exchange, your voice will drift, imperceptibly, toward the voice the machine was trained to produce. You will start to accept its suggestions because they are usually pretty good. You will stop writing the phrase your grandmother used because the tool underlined it. You will take the word the tool suggested instead of the word that actually came to you, because the suggested word is “better” by some measure that was never yours.

Multiply that by every writer using the tool. Multiply it by every year the tool is used. What you get at the end is not a world where everyone writes badly. It is a world where everyone writes similarly. Where the Finnish blogger and the Lebanese blogger and the Glaswegian blogger all sound like they went to the same graduate program in Ohio. Where sisu and ya’aburnee and il cuore in gola survive only in the kind of essay where someone stops and explains what they used to mean.

Which, yes, I notice, is the kind of essay I am writing right now.

What preservation actually looks like

I am not going to tell you to stop using AI tools. That advice does not meet the world as it is, and I do not believe it anyway. What I will tell you is that there is a specific practice that keeps your voice yours while still letting the machine earn its keep.

Write the first draft without the machine. Every time. The first draft is where your idioms live. It is where the phrases your mother used come out. It is where the rhythm of your particular English — the English of your city, your family, your trade — asserts itself before anything has a chance to smooth it. Let that draft be messy, regional, specific, wrong in small ways that feel right. Write it by hand if you have to. Write it in a text editor with no suggestions turned on. Write it the way you would write a letter to one specific person who already knows how you talk.

Then — then — bring the machine in. Use it to check your facts. Use it to find the weak paragraph. Use it to notice that you repeated yourself on page three. Use it the way you would use a sharp copy editor who has been told, firmly, that your voice is not up for negotiation. When it suggests replacing the phrase your grandmother used, say no. When it suggests replacing the sentence with a rhythm that is specifically yours, say no. When it suggests averaging you into the center of the distribution, say no.

The machine is a pretty good editor and a catastrophic author. Keep it in the chair it belongs in.

And when you are writing about people whose English is not standard — immigrants, elders, kids, anyone whose speech carries the fingerprints of somewhere else — protect it. Transcribe it faithfully. Let the idioms stand. Gloss them gently in the next sentence if a reader needs help, the way a good novelist does. Do not “clean up” the grandmother. The grandmother is the point.

One more thought experiment, and then I will stop

Imagine that in a hundred years, a linguist is trying to reconstruct what English sounded like in the 2020s. They have access to everything: every email, every blog post, every book, every tweet, every transcript. They run their analysis.

What do they find?

In the optimistic version, they find a wild, fractal, impossible mess of Englishes. Indian English and Nigerian English and Singaporean English and Appalachian English and African American English and Māori-inflected New Zealand English, all of them carrying their idioms and their rhythms and their stubborn refusals to standardize. They find the grandmother in Naples and they find the preacher in Alabama and they find the software developer in Nairobi who wrote her documentation in a voice unmistakably her own. They find a record of a decade in which humans used the most powerful language tools ever built and, against the gravitational pull of those tools, refused to sound the same.

In the pessimistic version, they find a corpus that is remarkably, eerily uniform. The Englishes of the 2020s are all slightly different in vocabulary but nearly identical in rhythm, in structure, in the metaphors they reach for, in the idioms they avoid. The linguist, puzzled, notes that this does not match the diversity of speakers the demographic data suggests existed. They conclude that something in the writing tools of the era must have exerted a standardizing pressure. They write a paper about it. The paper is polished, clear, and sounds exactly like every other paper.

We are writing the version the linguist will find. Every draft. Every suggestion accepted or refused. Every time we let the heart in the mouth stand, or quietly let it be fixed.

If you only remember one thing

The machine’s definition of better does not have a slot for irreplaceable. That slot has to be held open by you, on purpose, every time you write.

The idioms and the culture-specific phrases and the sentences that could only have come from one person in one place — those are not obstacles to clarity. They are the reason writing is worth doing in the first place. Accessibility is a real value. So is not sounding like everyone else. A good writer, and a good writing tool used well, can serve both. A lazy use of the tool will quietly serve only the first, and the cost will not show up in any single draft. It will show up in the shape of the language, twenty years from now, when we notice that something is missing and cannot quite say what.

Say what. While you still can. In the words your grandmother used.

Frequently Asked Questions

Does this mean I should stop using AI writing tools?

No. AI tools are genuinely useful for research, fact-checking, structural feedback, and catching errors. The argument is about how you use them, not whether you use them. Write your first draft without AI assistance so your voice and idioms come out intact. Then use the tool as a sharp copy editor — one that you are allowed, and expected, to overrule whenever it suggests averaging you into a voice that is not yours.

Isn’t idiomatic writing just bad for non-native English readers?

Not if you do it well. The difference is between writing idioms from nowhere and writing idioms with a light gloss that teaches the reader what they mean. “She said her heart was in her mouth — the Neapolitan way of saying a love so sudden it briefly locks the airway” is clear to any English reader on earth, and it keeps the culture alive in the sentence. The problem is not idioms. The problem is unexplained idioms thrown at readers who have no way in.

What does “cultural erosion” actually look like in AI-assisted writing?

It looks like every accepted suggestion that moves a sentence closer to a generic professional voice and away from a specific personal one. A 2023 Cornell study by Jakesch and colleagues found that participants using a biased AI writing assistant were about twice as likely to write in agreement with the assistant’s preferred view, and more likely to report holding that view afterward. The same latent-persuasion mechanism plausibly applies to style: the model nudges, the writer accepts, the voice drifts, and no single moment looks like a loss.

How do I protect voice in writing that needs to be broadly accessible?

Write clearly, not generically. Those are different operations. Clarity is about whether a reader can follow the sentence. Genericness is about whether the sentence sounds like it came from anyone. You can keep an idiom and gloss it. You can keep a regional rhythm and punctuate it clearly. You can write short sentences in your own voice instead of in the AI’s default voice. Accessibility is a constraint on meaning, not a mandate for sameness.

Where can I read more about the linguistics behind this?

Start with Anna Wierzbicka’s work on the Natural Semantic Metalanguage for the theoretical grounding, and UNESCO’s pages on endangered languages and linguistic diversity for the global stakes. For the specific AI-writing research, the Jakesch et al. 2023 CHI paper is freely available on arXiv. The Wikipedia article on translationese is a reasonable starting point for the machine-translation side of the flattening problem.

This post was drafted by hand, then checked — carefully, and with a lot of nos — by a machine.