Unicode in JavaScript

Escape sequences, template literals, and string methods — everything you need to work with Unicode characters in JavaScript.


Escape Formats in JavaScript

JavaScript supports three escape formats for embedding Unicode characters in strings. Each covers a different range of the Unicode codespace, and choosing the right one depends on whether you're working with basic ASCII, BMP characters, or the full Unicode range.

\xXX — ASCII / Latin-1 Only

The \x escape takes exactly two hex digits and covers codepoints U+0000 through U+00FF (the first 256 characters). This is limited to ASCII and Latin-1 Supplement characters:

const copyright = "\xA9";  // © Copyright Sign (U+00A9)
const pound = "\xA3";      // £ Pound Sign (U+00A3)
const degree = "\xB0";     // ° Degree Sign (U+00B0)

You cannot use \x for characters above U+00FF. For most symbol work, you'll need one of the next two formats.

\uXXXX — BMP Characters

The \u escape with exactly four hex digits covers the entire Basic Multilingual Plane — U+0000 through U+FFFF. This includes the vast majority of commonly used symbols:

const checkMark = "\u2713";   // ✓ Check Mark (U+2713)
const arrow = "\u2192";       // → Rightwards Arrow (U+2192)
const euro = "\u20AC";        // € Euro Sign (U+20AC)
const infinity = "\u221E";    // ∞ Infinity (U+221E)

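A single \uXXXX escape cannot represent a character outside the BMP (codepoints above U+FFFF), such as most emoji and many historic scripts. With this format, such a character has to be written as two escapes that form a surrogate pair, for example "\uD83D\uDE00" for U+1F600.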

\u{XXXXX} — Full Unicode Range (ES6+)

The \u{} escape, introduced in ES2015 (ES6), accepts one or more hex digits (six are enough for any codepoint) and covers the entire Unicode codespace, U+0000 through U+10FFFF:

const grin = "\u{1F600}";        // 😀 Grinning Face (U+1F600)
const rocket = "\u{1F680}";      // 🚀 Rocket (U+1F680)
const check = "\u{2713}";        // ✓ Check Mark (U+2713)
const mathA = "\u{1D49C}";       // 𝒜 Mathematical Script Capital A (U+1D49C)

This is the most versatile format and the one recommended for new code. It works for BMP characters too — "\u{2713}" is identical to "\u2713".
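
As a quick check of how the three formats relate, the sketch below compares the same characters written in each format that can express them. The variable names are illustrative only.

```javascript
// U+00A9 fits in all three formats, so every spelling yields the same string.
const copyrightForms = ["\xA9", "\u00A9", "\u{A9}"];
const allSameCopyright = copyrightForms.every((s) => s === "\u{A9}");

// Outside the BMP, \u{...} is equivalent to an explicit surrogate pair
// written as two \uXXXX escapes.
const grinBraced = "\u{1F600}";
const grinSurrogates = "\uD83D\uDE00";

console.log(allSameCopyright);               // true
console.log(grinBraced === grinSurrogates);  // true
```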

Surrogate Pairs and String Length

JavaScript strings are sequences of UTF-16 code units, not Unicode codepoints. Characters in the BMP (U+0000 to U+FFFF) take one code unit. Characters outside the BMP (U+10000 and above) require two code units — a surrogate pair.

This means the .length property counts code units, not characters:

"A".length;          // 1 — BMP character, one code unit
"✓".length;          // 1 — BMP character (U+2713)
"😀".length;         // 2 — above BMP, surrogate pair
"\u{1F600}".length;  // 2 — same emoji, same result

This is one of the most common pitfalls in JavaScript Unicode handling. A string containing a single non-BMP emoji such as 😀 has a .length of 2, which breaks naive character counting, truncation, and iteration.
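
To make the surrogate pair arithmetic concrete, here is a minimal sketch of how a codepoint above U+FFFF maps to two UTF-16 code units. The helper name toSurrogatePair is hypothetical, not a built-in.

```javascript
// Split a codepoint above U+FFFF into its UTF-16 surrogate pair.
// toSurrogatePair is a hypothetical helper name used for illustration.
function toSurrogatePair(codePoint) {
  if (codePoint < 0x10000 || codePoint > 0x10FFFF) {
    throw new RangeError("Expected a codepoint from U+10000 to U+10FFFF");
  }
  const offset = codePoint - 0x10000;     // leaves a 20-bit value
  const high = 0xD800 + (offset >> 10);   // top 10 bits -> high surrogate
  const low = 0xDC00 + (offset & 0x3FF);  // bottom 10 bits -> low surrogate
  return [high, low];
}

const [high, low] = toSurrogatePair(0x1F600);
console.log(high.toString(16), low.toString(16));             // d83d de00
console.log(String.fromCharCode(high, low) === "\u{1F600}");  // true
```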

Correct Iteration and Counting

To iterate over actual characters (codepoints) rather than code units, use for...of or Array.from():

// Incorrect: iterates over code units
const emoji = "😀🚀";
for (let i = 0; i < emoji.length; i++) {
  console.log(emoji[i]);  // Logs 4 lone surrogates, not 2 emoji
}

// Correct: iterates over codepoints
for (const char of "😀🚀") {
  console.log(char);  // Logs 😀, then 🚀
}

// Correct character count
Array.from("😀🚀").length;  // 2 (not 4)

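Use Array.from(str).length instead of str.length whenever your strings may contain characters outside the BMP. Note that this counts codepoints: a user-perceived character built from several codepoints (a flag, a family emoji joined with zero-width joiners, a letter plus a combining accent) still counts as more than one.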
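
The same idea applies to truncation. The sketch below cuts a string at a codepoint boundary instead of a code unit boundary; truncateByCodePoints is a hypothetical helper name, and it assumes that limiting by codepoints is what you want.

```javascript
// Truncate to at most maxCodePoints codepoints (not code units).
// truncateByCodePoints is a hypothetical helper, not a built-in.
function truncateByCodePoints(str, maxCodePoints) {
  const codePoints = Array.from(str);  // one array entry per codepoint
  if (codePoints.length <= maxCodePoints) return str;
  return codePoints.slice(0, maxCodePoints).join("");
}

const text = "Hi 😀🚀";
// slice() works on code units, so this cut lands inside the first emoji
// and leaves a lone high surrogate at the end.
console.log(text.slice(0, 4) === "Hi \uD83D");  // true
console.log(truncateByCodePoints(text, 4));     // Hi 😀
```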

Practical Patterns

Template Literals

ES6 template literals support Unicode escapes directly, making them ideal for building strings with symbols:

const status = `\u{2713} Complete`;
const label = `Price: \u{20AC}29.99`;
const mood = `Feeling ${isHappy ? '\u{1F600}' : '\u{1F614}'}`;

You can also insert symbols as plain characters in template literals — no escape needed if your source file is UTF-8:

const status = `✓ Complete`;
const label = `Price: €29.99`;
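
One related detail worth checking: escapes are processed in ordinary template literals but kept as written when the literal is tagged with String.raw. The sketch below shows both; the isHappy flag is a placeholder value chosen for illustration.

```javascript
const isHappy = true;  // placeholder flag, assumed for this example

// Ordinary template literal: the escape becomes the character.
const cooked = `\u{2713} Complete`;
const mood = `Feeling ${isHappy ? "\u{1F600}" : "\u{1F614}"}`;

// String.raw keeps the backslash sequence as literal text.
const raw = String.raw`\u{2713} Complete`;

console.log(cooked.length);  // 10 (one check mark, a space, "Complete")
console.log(raw.length);     // 17 (the 8 characters of the escape are kept)
console.log(mood === "Feeling 😀");  // true
```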

String.fromCodePoint()

Use String.fromCodePoint() to create characters from their numeric codepoint at runtime. This is especially useful when you have codepoints stored as data:

String.fromCodePoint(0x2713);   // "✓"
String.fromCodePoint(0x1F600);  // "😀"
String.fromCodePoint(0x2190, 0x2192);  // "←→"

// From a variable
const cp = 0x2605;
const star = String.fromCodePoint(cp);  // "★"

Avoid the older String.fromCharCode() for non-BMP characters. It treats each argument as a 16-bit code unit, so a codepoint above U+FFFF is silently truncated: String.fromCharCode(0x1F600) returns U+F600, not 😀.
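
When codepoints arrive as data, it can help to validate them before conversion. The following is a minimal sketch that assumes the data uses "U+XXXX" labels; both that format and the helper name parseCodePointLabel are assumptions made for this example.

```javascript
// Convert a "U+XXXX" label to its character, or return null if invalid.
// parseCodePointLabel and the label format are assumptions for illustration.
function parseCodePointLabel(label) {
  const match = /^U\+([0-9A-F]{1,6})$/i.exec(label);
  if (!match) return null;
  const codePoint = parseInt(match[1], 16);
  // String.fromCodePoint throws a RangeError above 0x10FFFF, so check first.
  if (codePoint > 0x10FFFF) return null;
  return String.fromCodePoint(codePoint);
}

const stored = ["U+2713", "U+1F680", "U+110000", "rocket"];
const symbols = stored.map(parseCodePointLabel);
console.log(symbols);  // [ '✓', '🚀', null, null ]
```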

Codepoint Inspection

To get the codepoint of a character, use codePointAt():

"✓".codePointAt(0);          // 10003 (0x2713)
"✓".codePointAt(0).toString(16); // "2713"
"😀".codePointAt(0);          // 128512 (0x1F600)
"😀".codePointAt(0).toString(16); // "1f600"

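Combining codePointAt() with codepoint-aware iteration gives a simple way to list every codepoint in a string in U+XXXX notation. This is a sketch; describeCodePoints is a hypothetical helper name.

```javascript
// Return the U+XXXX label for each codepoint in the string.
// describeCodePoints is a hypothetical helper used for illustration.
function describeCodePoints(str) {
  // Array.from walks the string by codepoint, so each ch is a whole character.
  return Array.from(str, (ch) =>
    "U+" + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0")
  );
}

console.log(describeCodePoints("A✓😀"));  // [ 'U+0041', 'U+2713', 'U+1F600' ]

// Indexing into the middle of a surrogate pair returns only the low surrogate.
console.log("😀".codePointAt(1).toString(16));  // de00
```
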
Regex Unicode Properties (ES2018+)

Modern JavaScript regex supports Unicode property escapes with the \p{} syntax (requires the u or v flag):

// Match any symbol character
const symbolRegex = /\p{Symbol}/u;
symbolRegex.test("✓");  // true
symbolRegex.test("A");  // false

// Match any currency symbol
const currencyRegex = /\p{Currency_Symbol}/u;
currencyRegex.test("€");  // true
currencyRegex.test("£");  // true

// Match any math symbol
const mathRegex = /\p{Math_Symbol}/u;
mathRegex.test("∞");  // true

// Remove all symbols from a string
const cleaned = "Hello ✓ World →".replace(/\p{Symbol}/gu, "");
// "Hello  World "

Unicode property escapes are powerful for validation, filtering, and text processing when you need to identify characters by category rather than by specific codepoint.
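
As one example of category-based processing, the sketch below tallies the symbols in a string by general category. The function name countByCategory and the shape of the result object are assumptions for illustration.

```javascript
// Count currency, math, and other symbols in a string.
// countByCategory is a hypothetical helper, not a built-in.
function countByCategory(str) {
  const counts = { currency: 0, math: 0, otherSymbol: 0 };
  for (const ch of str) {  // for...of iterates by codepoint
    if (/\p{Currency_Symbol}/u.test(ch)) counts.currency++;
    else if (/\p{Math_Symbol}/u.test(ch)) counts.math++;
    else if (/\p{Symbol}/u.test(ch)) counts.otherSymbol++;
  }
  return counts;
}

// "+" and "→" are math symbols, "✓" falls under the remaining symbol categories.
console.log(countByCategory("€5 + £3 → ✓"));
// { currency: 2, math: 2, otherSymbol: 1 }
```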

JSON Encoding

When working with JSON, Unicode handling has specific requirements. JSON supports \uXXXX escapes but not the ES6 \u{XXXXX} syntax. Non-BMP characters may appear literally in the JSON text, but when written as escapes they must be represented as UTF-16 surrogate pairs:

// JavaScript object
const data = { icon: "🚀" };

// JSON.stringify emits a well-formed non-BMP character literally
// (only lone surrogates are escaped, as of ES2019)
JSON.stringify(data);
// '{"icon":"🚀"}'

// Manual JSON: must use surrogate pair
// 🚀 (U+1F680) = \uD83D\uDE80 in JSON
const json = '{"icon": "\\uD83D\\uDE80"}';

In practice, JSON.stringify() and JSON.parse() handle all of this transparently. The surrogate pair detail only matters if you're constructing or parsing JSON strings manually, or debugging raw JSON that contains non-BMP characters.
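
If you do need ASCII-only JSON, for instance for a transport that mishandles UTF-8, one possible approach is to post-process the output of JSON.stringify(). This is a sketch under that assumption; stringifyAscii is a hypothetical helper name, and it expects a value that JSON.stringify() can serialize to a string.

```javascript
// Serialize to JSON, then escape every non-ASCII code unit as \uXXXX.
// stringifyAscii is a hypothetical helper used for illustration.
function stringifyAscii(value) {
  // No u flag on purpose: the regex then matches individual code units,
  // so each half of a surrogate pair gets its own \uXXXX escape.
  return JSON.stringify(value).replace(/[\u0080-\uFFFF]/g, (unit) =>
    "\\u" + unit.charCodeAt(0).toString(16).toUpperCase().padStart(4, "0")
  );
}

const ascii = stringifyAscii({ icon: "🚀" });
console.log(ascii);                            // {"icon":"\uD83D\uDE80"}
console.log(JSON.parse(ascii).icon === "🚀");  // true
```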

Copying JS Escapes from Symbolwise

Symbolwise provides pre-formatted JavaScript escape sequences for every symbol in its database. Here's how to get them:

1. Search for a symbol on the Symbolwise home page by name, keyword, or codepoint (e.g., "check mark" or "U+2713").
2. Select the JavaScript format from the format picker on any symbol card. The JS escape (e.g., "\u{2713}") appears, ready to copy.
3. Click to copy. The value is placed on your clipboard for immediate use in your code.
4. Open the detail page for a full encoding reference, including all three JS escape formats alongside HTML, CSS, React JSX, and Markdown.

Symbolwise uses the modern \u{} format by default for the JavaScript copy option, since it works for all codepoints. The symbol detail page also shows the \uXXXX format and surrogate pair representation where applicable.
