Unicode Characters That Break Code

Many developers encounter mysterious bugs when copying text from documentation, Slack messages, Word documents, Notion pages, or AI tools like ChatGPT. The real cause is often invisible Unicode characters.

These characters are not visible on screen but can break JSON parsing, APIs, database queries, authentication tokens, and application logic.
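Because these characters cannot be seen, a first step is often to make them visible. The following is a minimal sketch, not a definitive implementation: `findInvisible` and `INVISIBLE` are hypothetical names, and the character set is limited to the characters covered in this reference.

```javascript
// Hypothetical helper: report the position and code point of each
// invisible character covered in this reference.
const INVISIBLE = /[\u200B\u200C\u200D\u2060\u00A0\uFEFF\u2028\u2029]/g;

function findInvisible(text) {
  const hits = [];
  for (const match of text.matchAll(INVISIBLE)) {
    const hex = match[0]
      .codePointAt(0)
      .toString(16)
      .toUpperCase()
      .padStart(4, "0");
    hits.push({ index: match.index, codePoint: `U+${hex}` });
  }
  return hits;
}

// Escapes are used so the characters are visible in the source.
console.log(findInvisible("Hello\u200BWorld\u00A0!"));
// [ { index: 5, codePoint: 'U+200B' }, { index: 11, codePoint: 'U+00A0' } ]
```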

Below is a developer reference for common Unicode characters that silently cause bugs.

U+200B Zero Width Space

The Zero Width Space is an invisible character that marks a place where a line may break without displaying a space. It often appears when copying text from messaging apps or rich text editors.

Why it breaks code:

  • Invalid JSON input
  • Unexpected string mismatches
  • API validation errors

Hello​World

The invisible character between the words is U+200B.


text.replace(/\u200B/g, "")
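The mismatch described above can be reproduced in a few lines. This is a small sketch with made-up sample strings; it also shows that `\s` does not match U+200B in JavaScript, so whitespace-based cleanup does not catch it.

```javascript
// Escape used so the character is visible in the source.
const pasted = "Hello\u200BWorld";
const expected = "HelloWorld";

console.log(pasted === expected); // false
console.log(pasted.length);       // 11, not 10
console.log(/\s/.test(pasted));   // false: \s does not match U+200B

// A leading Zero Width Space makes otherwise valid JSON unparseable.
let jsonFailed = false;
try {
  JSON.parse('\u200B{"ok":true}');
} catch (err) {
  jsonFailed = err instanceof SyntaxError;
}
console.log(jsonFailed); // true

console.log(pasted.replace(/\u200B/g, "") === expected); // true
```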

U+200C Zero Width Non-Joiner

The Zero Width Non-Joiner prevents adjacent characters from joining in certain writing systems, such as Persian. However, it can accidentally appear in copied code or identifiers.

Why it breaks code:

  • Variable names mismatch
  • Database query failures
  • Unexpected authentication errors

text.replace(/\u200C/g, "")
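As an illustration of the name mismatch, here is a sketch with a hypothetical record and a key assumed to have been pasted from a formatted document. The two keys look identical on screen but are different strings.

```javascript
// Hypothetical record and pasted key, for illustration only.
const row = { user_id: 42 };
const pastedKey = "user\u200C_id"; // contains a Zero Width Non-Joiner

console.log(pastedKey in row);  // false: the lookup silently misses
console.log(row[pastedKey]);    // undefined

const cleanedKey = pastedKey.replace(/\u200C/g, "");
console.log(row[cleanedKey]);   // 42
```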

U+200D Zero Width Joiner

The Zero Width Joiner asks adjacent characters to be joined into a single form. It is used in scripts such as Arabic and Devanagari and in emoji sequences. While useful in language rendering, it can appear accidentally in copied text.

Why it breaks code:

  • Invisible characters inside tokens
  • Unexpected string comparisons
  • Hidden formatting issues

text.replace(/\u200D/g, "")
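Since the Zero Width Joiner is also what holds emoji sequences together, stripping it is safest on machine-readable values such as tokens rather than on user-facing text. The sketch below uses made-up sample values to show both cases.

```javascript
// Woman + ZWJ + laptop: typically rendered as a single emoji.
const technologist = "\u{1F469}\u200D\u{1F4BB}";
const stripped = technologist.replace(/\u200D/g, "");

console.log([...technologist].length); // 3 code points
console.log([...stripped].length);     // 2 code points: two separate emoji

// For a token (hypothetical value), removing the joiner is the desired fix.
const pastedToken = "abc\u200D123";
console.log(pastedToken === "abc123");                         // false
console.log(pastedToken.replace(/\u200D/g, "") === "abc123");  // true
```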

U+2060 Word Joiner

The Word Joiner is a zero-width character that prevents a line break at its position. Like other invisible characters, it may appear when copying text from formatted documents.

Why it breaks code:

  • Unexpected whitespace handling
  • String mismatch bugs
  • Invisible formatting characters

text.replace(/\u2060/g, "")
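One reason the Word Joiner survives ordinary cleanup is that JavaScript does not classify it as whitespace, so `trim()` leaves it in place. A short sketch, using a hypothetical form value:

```javascript
// Hypothetical form value with a trailing Word Joiner.
const field = "admin\u2060";

console.log(field.trim() === "admin"); // false: trim() does not remove U+2060
console.log(field.trim().length);      // 6
console.log(/\s/.test(field));         // false: not matched by \s

console.log(field.replace(/\u2060/g, "") === "admin"); // true
```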

U+00A0 Non-Breaking Space

The Non-Breaking Space looks identical to a regular space but is a different code point, so comparisons and splits that expect a regular space (U+0020) do not match it. It often appears when copying text from HTML pages or formatted editors.

Why it breaks code:

  • Whitespace mismatch in strings
  • Database search failures
  • Form validation errors

text.replace(/\u00A0/g, " ")
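The difference from a regular space shows up in comparisons and splits. The sample string below is made up for illustration; note that `\s` does match U+00A0 in JavaScript, while a literal space does not.

```javascript
// Escape used so the character is visible in the source.
const city = "New\u00A0York";

console.log(city === "New York");      // false
console.log(city.split(" ").length);   // 1: a literal space does not match
console.log(city.split(/\s/).length);  // 2: \s matches U+00A0

console.log(city.replace(/\u00A0/g, " ") === "New York"); // true
```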

U+FEFF Byte Order Mark

The Byte Order Mark (BOM) is inserted at the beginning of UTF-8 files by some editors and tools. JSON does not accept it as whitespace, so a JSON document that starts with a BOM fails to parse at the first character.

Why it breaks code:

  • JSON.parse errors
  • Unexpected token errors
  • API request failures

text.replace(/\uFEFF/g, "")
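A narrower variant of the fix is to strip only a single leading BOM before parsing. The sketch below assumes the raw text has already been decoded to a string; `parseJsonLenient` is a hypothetical name, not part of any library.

```javascript
// Hypothetical helper: drop one leading BOM, then parse.
function parseJsonLenient(raw) {
  const body = raw.charCodeAt(0) === 0xfeff ? raw.slice(1) : raw;
  return JSON.parse(body);
}

const fromFile = '\uFEFF{"ok":true}';

let bomRejected = false;
try {
  JSON.parse(fromFile);
} catch (err) {
  bomRejected = err instanceof SyntaxError;
}
console.log(bomRejected);                   // true
console.log(parseJsonLenient(fromFile).ok); // true
```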

U+2028 Line Separator

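JavaScript treats the Unicode Line Separator as a line terminator, but JSON does not accept it as whitespace. Before ES2019 it could not appear unescaped inside a JavaScript string literal, even though JSON allows it inside strings, so JSON embedded in JavaScript source could fail with a syntax error. It also confuses line-based processing, because code that splits on \n does not see it as a line break.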


text.replace(/\u2028/g, "")
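The inconsistent treatment can be observed directly. In this sketch (sample strings are made up), the same character is ignored by a split on "\n", treated as a line terminator by a regular expression, accepted inside a JSON string, and rejected as JSON whitespace.

```javascript
const text = "first\u2028second";

console.log(text.split("\n").length); // 1: not a "\n" line break
console.log(/^.+$/.test(text));       // false: "." does not cross U+2028

// Allowed inside a JSON string...
console.log(JSON.parse('"a\u2028b"').length); // 3

// ...but not as whitespace between or after tokens.
let lsRejected = false;
try {
  JSON.parse('{"a":1}\u2028');
} catch (err) {
  lsRejected = err instanceof SyntaxError;
}
console.log(lsRejected); // true
```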

U+2029 Paragraph Separator

The Paragraph Separator behaves like the Line Separator: JavaScript treats it as a line terminator, and JSON allows it inside strings but not as whitespace between tokens.


text.replace(/\u2029/g, "")
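When the separators are part of the data and should be kept, an alternative to deleting them is to escape them in the serialized JSON. This is a sketch under that assumption; `escapeLineSeparators` is a hypothetical name, and it handles only these two characters, not other concerns of embedding JSON in a page.

```javascript
// Hypothetical helper: serialize to JSON, then escape U+2028 and U+2029
// so the output contains no raw line or paragraph separators.
function escapeLineSeparators(value) {
  return JSON.stringify(value)
    .replace(/\u2028/g, "\\u2028")
    .replace(/\u2029/g, "\\u2029");
}

const serialized = escapeLineSeparators({ note: "a\u2028b\u2029c" });

console.log(/[\u2028\u2029]/.test(serialized)); // false: no raw separators left
console.log(JSON.parse(serialized).note === "a\u2028b\u2029c"); // true: data preserved
```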

Use a Unicode Cleaner Tool

Manually searching for invisible characters can be extremely difficult. A dedicated cleaning tool helps detect and remove hidden Unicode characters instantly.

Unicode Cleaner detects characters like:

  • Zero Width Space (U+200B)
  • Non-Breaking Space (U+00A0)
  • Byte Order Mark (U+FEFF)
  • Other invisible Unicode characters
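For cases where the cleanup has to happen in code, the individual replacements in this reference can be combined into one function. This is a minimal sketch, not how any particular tool works: `cleanInvisible` is a hypothetical name, and it covers only the eight characters listed in this reference.

```javascript
// Hypothetical helper combining the per-character fixes from this reference.
function cleanInvisible(text) {
  return text
    // Remove zero-width characters, the BOM, and line/paragraph separators.
    .replace(/[\u200B\u200C\u200D\u2060\uFEFF\u2028\u2029]/g, "")
    // Convert non-breaking spaces to regular spaces.
    .replace(/\u00A0/g, " ");
}

const dirty = "\uFEFFHello\u200B\u00A0World\u2060";
console.log(cleanInvisible(dirty) === "Hello World"); // true
```

Because it removes U+200C and U+200D, a function like this is better suited to tokens, identifiers, and JSON payloads than to user-facing text that may contain emoji sequences or scripts that rely on joiners.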

