Hidden Unicode characters are invisible formatting symbols embedded inside text. They often enter software systems through copy-paste operations, rich text editors, AI tools, or PDF documents. Although invisible to the human eye, these characters can break JSON parsing, database queries, authentication logic, and string comparisons.
Hidden Unicode characters belong to Unicode general categories such as format characters (Cf) and space separators (Zs). They are designed for layout, encoding metadata, or typographic control, not for visible display.
When one of these characters lands between JSON tokens, the parser rejects it, which makes hidden characters a common cause of the "Unexpected token" errors developers frequently encounter.
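As a minimal sketch of that failure, the snippet below parses two strings that look identical on screen. The helper name `tryParse` is hypothetical and exists only for this illustration; the only real API used is `JSON.parse`.

```javascript
// Try to parse JSON text and report the outcome instead of throwing.
// `tryParse` is a hypothetical helper name used only for this illustration.
function tryParse(text) {
  try {
    return { ok: true, value: JSON.parse(text) };
  } catch (err) {
    return { ok: false, error: err.name };
  }
}

const clean = '{"username": "admin"}';
// Same text, but with a Zero Width Space (U+200B) right after the opening brace.
const contaminated = '{\u200B"username": "admin"}';

console.log(tryParse(clean).ok);            // true
console.log(tryParse(contaminated).ok);     // false
console.log(tryParse(contaminated).error);  // SyntaxError
```

Note that the parser only rejects the character when it sits between tokens. Inside a quoted string value it is accepted, so the JSON parses cleanly and the contamination travels on into your data.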
Zero Width Space (U+200B): An invisible character that marks a line-break opportunity. It has zero visual width but is not valid JSON whitespace.
Non-Breaking Space (U+00A0): Looks identical to a regular space but prevents line breaks. It causes subtle string-mismatch bugs because it does not compare equal to a regular space (U+0020).
Byte Order Mark (U+FEFF): Originally used to indicate byte order at the start of a UTF-encoded stream. When it appears in the middle of text, it causes parsing failures.
Zero Width Joiner (U+200D): Used in emoji rendering and script composition. It can appear unexpectedly when copying formatted text.
Example of visually correct JSON:
{
  "username": "admin"
}
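If a Zero Width Space (U+200B) sits inside the key, for example between "user" and "name", the string looks the same on screen but holds an extra code point:
user\u200Bname
The string comparison:
if (input === "username")
evaluates to false without raising any error, so the failure is silent.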
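The mismatch becomes easier to see once the string is escaped before logging. The following is a small sketch; `revealHidden` is a hypothetical helper name, and the choice to escape everything outside printable ASCII is an assumption that suits identifiers and keys rather than free-form text.

```javascript
// Escape every UTF-16 code unit outside printable ASCII as \uXXXX
// so that invisible characters show up in logs.
// `revealHidden` is a hypothetical helper name.
function revealHidden(text) {
  return text.replace(/[^\x20-\x7E]/g, (ch) =>
    "\\u" + ch.charCodeAt(0).toString(16).toUpperCase().padStart(4, "0")
  );
}

const input = "user\u200Bname"; // contains a Zero Width Space

console.log(input === "username"); // false
console.log(input.length);         // 9, one more than "username".length
console.log(revealHidden(input));  // user\u200Bname
```

Comparing `length` against the expected value is often the quickest first check when two strings look equal but do not compare equal.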
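// Detect zero-width characters.
// No "g" flag here: with it, test() keeps state in lastIndex
// and can return false on a repeated call against the same regex.
const hidden = /[\u200B-\u200D\uFEFF]/;
if (hidden.test(input)) {
  console.log("Hidden characters found");
}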
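A yes/no check does not say which character was found or where. One possible extension, sketched below, reports the position and name of each hit. The function name `findHidden` and the lookup table `HIDDEN_NAMES` are hypothetical, and the table covers only the code points discussed in this article.

```javascript
// Code points discussed in this article; extend the table for your own data.
// `HIDDEN_NAMES` and `findHidden` are hypothetical names for this sketch.
const HIDDEN_NAMES = new Map([
  [0x00a0, "NO-BREAK SPACE"],
  [0x200b, "ZERO WIDTH SPACE"],
  [0x200c, "ZERO WIDTH NON-JOINER"],
  [0x200d, "ZERO WIDTH JOINER"],
  [0xfeff, "ZERO WIDTH NO-BREAK SPACE (BOM)"],
]);

// Return one entry per hidden character, with its index in the string.
function findHidden(text) {
  const hits = [];
  for (let i = 0; i < text.length; i++) {
    const code = text.charCodeAt(i);
    if (HIDDEN_NAMES.has(code)) {
      hits.push({
        index: i,
        codePoint: "U+" + code.toString(16).toUpperCase().padStart(4, "0"),
        name: HIDDEN_NAMES.get(code),
      });
    }
  }
  return hits;
}

console.log(findHidden("user\u200Bname"));
```

All of the listed code points fit in a single UTF-16 code unit, which is why iterating with `charCodeAt` is sufficient here.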
// Remove zero-width characters and turn non-breaking spaces into regular spaces.
function sanitize(input) {
  return input
    .replace(/[\u200B-\u200D\uFEFF]/g, '')
    .replace(/\u00A0/g, ' ');
}
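As a usage sketch, sanitizing before a comparison makes the earlier "username" check pass. The helper `matchesKey` is a hypothetical name, and `sanitize` is repeated here only so the snippet runs on its own.

```javascript
// Same logic as the sanitize function above, repeated so this snippet runs on its own.
function sanitize(input) {
  return input
    .replace(/[\u200B-\u200D\uFEFF]/g, "")
    .replace(/\u00A0/g, " ");
}

// Hypothetical helper: compare an identifier from user input with an expected value.
function matchesKey(input, expected) {
  return sanitize(input) === expected;
}

console.log(matchesKey("user\u200Bname", "username")); // true
console.log(matchesKey("\uFEFFusername", "username")); // true
console.log(sanitize("first\u00A0name"));              // first name
```

This kind of stripping is best limited to identifiers, keys, and similar machine-read fields. In free-form text the Zero Width Joiner and Non-Joiner carry meaning: removing them splits joined emoji sequences and can change how some scripts render.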
Unicode enables global software, but its invisible formatting characters introduce subtle bugs. As AI-generated content increases, the likelihood of hidden Unicode contamination rises.
Hidden Unicode characters are legitimate parts of the Unicode standard, but when introduced into structured data, they become silent failure points. Understanding code points like U+200B, U+00A0, and U+FEFF protects your applications from invisible corruption.