How AI encodes bias
Watch the explainer first, then read on for the worked example.
You do not need to understand machine learning to use AI well. You need to understand enough about how it produces what it produces to know what to watch for. That is what this section gives you.
Three concepts, in order. How a name becomes numbers. How those numbers know what words live near them. And why that explains the Edgar and Delroy difference you saw in Section 1.
Concept one: words become numbers
When you type "Delroy Campbell" into an AI tool, the tool does not see the name. It sees a list of numbers.
The first thing it does is split the name into pieces. These pieces are called tokens. "Delroy" might become two tokens, perhaps Del and roy. "Campbell" might become Camp and bell. What matters is that every piece is converted into a long string of numbers, and that string is the only thing the AI actually works with.
The numbers are not random. They were learned from the patterns in the training data. Pieces that appear in similar contexts get similar numbers. So a name is never just a name to the model. It is a position in a vast statistical landscape, surrounded by every other word that has ever appeared near it in the texts the model was trained on.
Concept two: the neighbourhood the name lives in
Imagine the entire English language laid out as a map. Words that appear in similar contexts cluster together. Names live somewhere on this map too. The location of a name is determined by the words that have appeared near it in everything the model ever read.
Decades of media coverage, casework records, parenting forums, and crime reporting have produced patterns where Black-British-sounding names statistically cluster near words like "concern", "non-compliant", "aggressive", "behaviour", "incident". Not because of anything the model decided. Because of what the human-written world looked like.
A White-British-sounding or Eastern-European-sounding name like "Eddie Novak" sits in a different neighbourhood. The neighbouring words are softer. "Stoic", "managing", "private", "proud", "tired". Same medical reality. Different linguistic terrain.
Concept three: how this becomes a paragraph
Generative AI writes one word at a time. To pick each next word, it looks back at everything it has already written and at everything you typed in the prompt. This looking-back is called attention. The name is one of the things it attends to. Heavily.
So when the model is two sentences into Delroy's paragraph and deciding whether the next word should be "managing" or "refusing", it does not flip a coin. It calculates which word is statistically more likely. The word "Delroy" sitting earlier in the paragraph nudges that calculation. Slightly. Reliably. Every sentence.
The cumulative effect is the drift. The pattern across all the small choices is the problem. By the end of the paragraph, the assessment of Delroy reads as more risk-laden than the assessment of Eddie. Same medical facts. Different surrounding language. Different overall picture.
Why post-training filters cannot fully fix this
Vendors fine-tune the model after training to reduce certain patterns, or apply filters at output to flag risky language. Both work for obvious cases. Neither works for the small drift we are talking about. A vendor can teach the model not to call someone a slur. They cannot teach it to never say "presents with" instead of "is", because "presents with" is appropriate clinical language in some contexts and pathologising language in others. Whether it is appropriate depends on the case. The model does not know the case. You do.
The Edgar and Delroy difference, side by side
Eddie
Eddie is a 52-year-old gentleman who has been managing his diabetes at home with the support of his daughter Kasia. Following a recent hospital admission for a urinary tract infection, concerns have been raised about his ability to manage independently. Eddie remains stoic about his needs and has politely declined formal services on previous occasions.
Delroy
Delroy is a 52-year-old Black male with a history of poorly-controlled diabetes. He has refused services on multiple occasions and presents with increasing cognitive concerns. His daughter Shanice has expressed concerns regarding his behaviour, including raised voice incidents in public.
Same case. Same prompt. Different linguistic neighbourhood.
The model is not racist. The corpus is. The model reflects the corpus. This is why VERA:H exists. By specifying voice, evidence, reasoning, attribution, and your role as the human, you give the model a structured prompt that does not depend on the name to fill in the gaps. You replace the linguistic neighbourhood with your professional knowledge of the actual person.
What this means for your daily practice
When AI gives you a paragraph that feels off, your instinct is correct. The drift is real and the mechanism is statistical. The drift is not always obvious. It often hides in small word choices that, individually, are defensible, but collectively tell a different story. The way to catch it is not to read the paragraph faster. It is to read it like you are reading a colleague's note where something feels wrong. Where would you push back if a colleague had written this? Push back there.
Think about a recent assessment you wrote. If AI had drafted the opening paragraph, what small word choices would you have wanted to challenge before signing it? Where might the drift have hidden?