Unicode Characters That Break Code

Many developers encounter mysterious bugs when copying text from documentation, Slack messages, Word documents, Notion pages, or AI tools like ChatGPT. The real cause is often invisible Unicode characters.

Most of these characters are invisible on screen, and the rest look exactly like an ordinary space, yet they can break JSON parsing, API requests, database queries, authentication tokens, and application logic.

Below is a developer reference for common Unicode characters that silently cause bugs.

U+200B Zero Width Space

The Zero Width Space is an invisible character that marks a possible line break between words without displaying a space. It often appears when copying text from messaging apps or rich text editors.

Why it breaks code:

Invalid JSON input
Unexpected string mismatches
API validation errors

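Hello\u200BWorld

This is written with a JavaScript escape so the hidden character is visible. On screen the string renders as "HelloWorld", but the U+200B between the two words makes it unequal to the same text typed by hand.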


// Remove every U+200B; replace() returns a new string, so keep the result
const cleaned = text.replace(/\u200B/g, "");
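Because U+200B has no width, the quickest way to confirm it is there is to list the code points in the string. The sketch below uses only built-in JavaScript; `listNonAscii` is a hypothetical helper name, not part of any library.

```javascript
// Hypothetical helper: report every non-ASCII code point with its index,
// formatted as U+XXXX so invisible characters become visible.
function listNonAscii(text) {
  const found = [];
  let index = 0;
  for (const ch of text) { // for...of iterates by code point
    const cp = ch.codePointAt(0);
    if (cp > 0x7f) {
      const hex = cp.toString(16).toUpperCase().padStart(4, "0");
      found.push({ index, codePoint: "U+" + hex });
    }
    index += ch.length; // UTF-16 index, matching text.indexOf()
  }
  return found;
}

const pasted = "Hello\u200BWorld";
console.log(pasted === "HelloWorld"); // false
console.log(pasted.length);           // 11
console.log(listNonAscii(pasted));    // [ { index: 5, codePoint: 'U+200B' } ]
```

The same helper works for every character in this guide, since all of them sit outside the ASCII range.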

U+200C Zero Width Non-Joiner

The Zero Width Non-Joiner prevents adjacent characters from joining in scripts such as Persian. However, it can accidentally end up in copied code or identifiers.

Why it breaks code:

Mismatched variable names
Database query failures
Unexpected authentication errors


const cleaned = text.replace(/\u200C/g, "");
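A typical symptom is a key copied from a document with a hidden U+200C, so a lookup by the hand-typed name quietly returns `undefined`. This sketch assumes a plain object built from pasted configuration; `settings` and `cleanKeys` are hypothetical names chosen for illustration.

```javascript
// Hypothetical settings object whose key was pasted with a hidden U+200C
const settings = { "api\u200Ckey": "secret" };

console.log(settings.apikey);                 // undefined
console.log(Object.keys(settings)[0].length); // 7, not 6

// Rebuild the object with U+200C removed from every key
function cleanKeys(obj) {
  return Object.fromEntries(
    Object.entries(obj).map(([key, value]) => [key.replace(/\u200C/g, ""), value])
  );
}

console.log(cleanKeys(settings).apikey);      // secret
```

JavaScript also permits U+200C and U+200D inside identifiers, so a pasted variable name carrying one is a different identifier; the mismatch surfaces as a ReferenceError or an undefined value rather than a syntax error.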

U+200D Zero Width Joiner

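The Zero Width Joiner is used in typography and emoji composition; for example, it combines the woman and laptop emoji into a single "woman technologist" emoji. While useful in rendering, it can appear accidentally in copied text. Because emoji sequences depend on it, removing every U+200D also splits them into separate emoji.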

Why it breaks code:

Invisible characters inside tokens
Unexpected string comparisons
Hidden formatting issues


// Also splits emoji sequences that rely on U+200D
const cleaned = text.replace(/\u200D/g, "");
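If text may contain emoji, one possible heuristic is to keep U+200D only when it sits between two pictographic characters, skipping a variation selector or skin-tone modifier before it. `stripStrayZwj` is a hypothetical name, and the rule is a simplification of the full emoji ZWJ sequence definition, so treat it as a starting point rather than a complete solution. It relies on Unicode property escapes (`\p{...}` with the `u` flag), available in modern JavaScript engines.

```javascript
// Heuristic sketch: drop U+200D unless it joins two pictographic characters
const PICTOGRAPHIC = /\p{Extended_Pictographic}/u;
const MODIFIER = /[\uFE0F\u{1F3FB}-\u{1F3FF}]/u; // variation selector, skin tones

function stripStrayZwj(text) {
  const chars = Array.from(text); // split by code point, not UTF-16 unit
  return chars
    .filter((ch, i) => {
      if (ch !== "\u200D") return true;
      // Look past a variation selector or skin-tone modifier before the ZWJ
      let j = i - 1;
      while (j >= 0 && MODIFIER.test(chars[j])) j--;
      const before = j >= 0 && PICTOGRAPHIC.test(chars[j]);
      const after = i + 1 < chars.length && PICTOGRAPHIC.test(chars[i + 1]);
      return before && after;
    })
    .join("");
}

// "woman technologist": woman + ZWJ + laptop
const technologist = "\u{1F469}\u200D\u{1F4BB}";

console.log(stripStrayZwj("to\u200Dken"));         // token
console.log(stripStrayZwj(technologist).length);   // 5 (ZWJ kept)
```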

U+2060 Word Joiner

The Word Joiner prevents line breaks between characters. Like other invisible characters, it may appear when copying text from formatted documents.

Why it breaks code:

Survives trim() and other whitespace cleanup
String mismatch bugs
Invisible formatting characters


const cleaned = text.replace(/\u2060/g, "");
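The word joiner slips past common input cleanup because JavaScript does not classify it as whitespace. The checks below use only built-in methods; `value` is an arbitrary example string.

```javascript
// U+2060 is a format character, not whitespace, so the usual cleanup misses it
const value = "\u2060admin";

console.log(value.trim() === "admin"); // false: trim() leaves U+2060 in place
console.log(value.trim().length);      // 6
console.log(/\s/.test("\u2060"));       // false: \s does not match it

// The same is true of U+200B, U+200C, and U+200D
console.log(/\s/.test("\u200B"));       // false
```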

U+00A0 Non-Breaking Space

The Non-Breaking Space looks identical to a regular space (U+0020) but is a different character, so comparisons and splits on a normal space miss it. It often appears when copying text from HTML pages, where &nbsp; is common, or from formatted editors.

Why it breaks code:

Whitespace mismatch in strings
Database search failures
Form validation errors


// Replace every U+00A0 with an ordinary space rather than deleting it
const cleaned = text.replace(/\u00A0/g, " ");
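When form input may come from web pages, one option is to map every Unicode space separator to an ordinary space before comparing or splitting. A minimal sketch, assuming that every space-separator character (Unicode category Zs) should count as a plain space; `city` and `normalizeSpaces` are hypothetical names.

```javascript
// Hypothetical form input copied from a web page: "New York" with U+00A0
const city = "New\u00A0York";

console.log(city === "New York");    // false
console.log(city.split(" ").length); // 1: split(" ") does not see U+00A0

// Map every Unicode space separator (category Zs) to an ordinary space.
// Zs covers U+00A0 as well as U+202F (narrow no-break space) and U+3000.
function normalizeSpaces(text) {
  return text.replace(/\p{Zs}/gu, " ");
}

console.log(normalizeSpaces(city) === "New York"); // true
```

Using `\p{Zs}` instead of `\s` leaves tabs and newlines alone, which matters when the input spans several lines.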

U+FEFF Byte Order Mark

The Byte Order Mark (BOM) is sometimes inserted at the beginning of UTF-8 files. If it is present at the start of JSON text, JavaScript's JSON.parse throws a SyntaxError immediately.

Why it breaks code:

JSON.parse errors
Unexpected token errors
API request failures


// Removes every U+FEFF; use /^\uFEFF/ to strip only a leading BOM
const cleaned = text.replace(/\uFEFF/g, "");
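The parsing failure is easy to reproduce, and a common fix is to strip a single leading BOM before calling `JSON.parse`. This is a sketch; `raw` stands in for file contents and `parseJson` is a hypothetical helper name.

```javascript
// A file saved as "UTF-8 with BOM" starts with U+FEFF. Node's
// fs.readFileSync(path, "utf8") keeps that character in the returned string.
const raw = '\uFEFF{"ok": true}';

try {
  JSON.parse(raw);
} catch (err) {
  console.log(err instanceof SyntaxError); // true
}

// Strip a single leading BOM before parsing; leave the rest untouched
function parseJson(text) {
  const body = text.charCodeAt(0) === 0xfeff ? text.slice(1) : text;
  return JSON.parse(body);
}

console.log(parseJson(raw).ok); // true
```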

U+2028 Line Separator

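The Unicode Line Separator is a line terminator in JavaScript but not whitespace in JSON. JSON.parse rejects it between tokens, because JSON whitespace is limited to space, tab, line feed, and carriage return. Inside JSON string values it is valid, yet JavaScript engines older than ES2019 rejected it inside string literals, so JSON pasted into JavaScript source (for example an inline script block or a JSONP response) could throw a SyntaxError.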


// Deletes U+2028; replace with "\n" instead to keep the line break
const cleaned = text.replace(/\u2028/g, "");
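`JSON.stringify` leaves U+2028 and U+2029 unescaped, so JSON inlined into a `<script>` tag can still trip older engines. A widely used workaround is to escape both characters after serializing. The sketch assumes that use case; `toScriptSafeJson` is a hypothetical name.

```javascript
// JSON.stringify emits U+2028 and U+2029 unescaped. That is valid JSON, but
// engines older than ES2019 reject these characters inside string literals.
function toScriptSafeJson(value) {
  return JSON.stringify(value)
    .replace(/\u2028/g, "\\u2028") // emit the six-character escape instead
    .replace(/\u2029/g, "\\u2029");
}

const data = { note: "line one\u2028line two" };
const json = toScriptSafeJson(data);

console.log(json.includes("\u2028"));             // false
console.log(JSON.parse(json).note === data.note); // true: round-trips unchanged
```

Escaping is safe here because `JSON.stringify` only ever places these characters inside string values, where `\u2028` and `\u2029` escapes decode back to the original characters.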

U+2029 Paragraph Separator

The Paragraph Separator behaves similarly to the Line Separator and may break JavaScript or JSON when inserted into text data.


// Deletes U+2029; replace with "\n" instead to keep the paragraph break
const cleaned = text.replace(/\u2029/g, "");
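Outside of JSON, U+2029 (like U+2028) counts as a line terminator in JavaScript, which can surprise regular expressions and line-splitting code that only looks for `\n`. A sketch using built-in behavior; `text` is an arbitrary example and `splitLines` is a hypothetical helper.

```javascript
// JavaScript treats U+2028 and U+2029 as line terminators, so the regex
// dot refuses to match them unless the s (dotAll) flag is set.
const text = "total:\u2029 42";

console.log(/total:.*42/.test(text));  // false
console.log(/total:.*42/s.test(text)); // true

// Split on every line terminator JavaScript recognizes, not just \n
function splitLines(input) {
  return input.split(/\r\n|[\n\r\u2028\u2029]/);
}

console.log(splitLines("a\u2029b\nc")); // [ 'a', 'b', 'c' ]
```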

Use a Unicode Cleaner Tool

Manually searching for invisible characters can be extremely difficult. A dedicated cleaning tool helps detect and remove hidden Unicode characters instantly.

Unicode Cleaner detects characters like:

Zero Width Space (U+200B)
Non-Breaking Space (U+00A0)
Byte Order Mark (U+FEFF)
Other invisible Unicode characters
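If you prefer to clean text in your own code, the fixes from each section can be combined into one pass. This is a minimal sketch, not the Unicode Cleaner tool's actual implementation; `cleanInvisible` is a hypothetical name, and its choices (drop zero-width characters, turn U+00A0 into a space, turn U+2028 and U+2029 into `\n`) are assumptions you may want to change, for example to keep U+200D inside emoji.

```javascript
// Characters this sketch removes outright: U+200B, U+200C, U+200D, U+2060, U+FEFF
const ZERO_WIDTH = /[\u200B-\u200D\u2060\uFEFF]/g;

function cleanInvisible(text) {
  const counts = {};
  // Tally each hidden character before changing anything
  for (const ch of text) {
    if (/[\u200B-\u200D\u2060\uFEFF\u00A0\u2028\u2029]/.test(ch)) {
      const hex = ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0");
      const key = "U+" + hex;
      counts[key] = (counts[key] || 0) + 1;
    }
  }
  const cleaned = text
    .replace(ZERO_WIDTH, "")
    .replace(/\u00A0/g, " ")
    .replace(/[\u2028\u2029]/g, "\n");
  return { cleaned, counts };
}

const result = cleanInvisible("\uFEFFuser\u200B_id:\u00A042");
console.log(result.cleaned); // user_id: 42
console.log(result.counts);  // { 'U+FEFF': 1, 'U+200B': 1, 'U+00A0': 1 }
```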

Try the tools:

Unicode Cleaner
ChatGPT Cleaner
JSON Formatter & Cleaner
