Unicode in JavaScript
Escape sequences, template literals, and string methods — everything you need to work with Unicode characters in JavaScript.
Escape Formats in JavaScript
JavaScript supports three escape formats for embedding Unicode characters in strings. Each covers a different range of the Unicode codespace, and choosing the right one depends on whether you're working with basic ASCII, BMP characters, or the full Unicode range.
\xXX — ASCII / Latin-1 Only
The \x escape takes exactly two hex digits and covers codepoints U+0000 through U+00FF (the first 256 characters). This is limited to ASCII and Latin-1 Supplement characters:
const copyright = "\xA9"; // © Copyright Sign (U+00A9)
const pound = "\xA3"; // £ Pound Sign (U+00A3)
const degree = "\xB0"; // ° Degree Sign (U+00B0)
You cannot use \x for characters above U+00FF. For most symbol work, you'll need one of the next two formats.
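Because \x always consumes exactly two hex digits, writing a longer value does not raise an error. The extra digits silently become ordinary text. Here is a short illustration of that pitfall (notACheck is just an example name):

```javascript
// "\x2713" is NOT ✓: \x reads only "27" (an apostrophe, U+0027),
// and the remaining "13" is kept as literal text.
const notACheck = "\x2713";
console.log(notACheck);        // logs '13
console.log(notACheck.length); // logs 3

// Within U+0000..U+00FF, \x produces the same string as the other forms.
console.log("\xA9" === "\u00A9" && "\xA9" === "\u{A9}"); // logs true
```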
\uXXXX — BMP Characters
The \u escape with exactly four hex digits covers the entire Basic Multilingual Plane — U+0000 through U+FFFF. This includes the vast majority of commonly used symbols:
const checkMark = "\u2713"; // ✓ Check Mark (U+2713)
const arrow = "\u2192"; // → Rightwards Arrow (U+2192)
const euro = "\u20AC"; // € Euro Sign (U+20AC)
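const infinity = "\u221E"; // ∞ Infinity (U+221E)
A single \uXXXX escape cannot reach characters outside the BMP (codepoints above U+FFFF), such as most emoji and many historic scripts. To write them in this format, you need two escapes that form a surrogate pair, for example "\uD83D\uDE00" for 😀.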
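The same fixed-width rule applies to \u. Extra digits do not cause an error; they become literal text. The sketch below shows that mistake and the surrogate-pair spelling that four-digit escapes require for non-BMP characters (the variable names are illustrative):

```javascript
// \u reads exactly four hex digits, so "\u1F600" is U+1F60
// followed by the digit "0", not the emoji U+1F600.
const notAnEmoji = "\u1F600";
console.log(notAnEmoji.length);             // logs 2
console.log(notAnEmoji === "\u1F60" + "0"); // logs true

// With four-digit escapes, a non-BMP character is written as its surrogate pair.
const grinPair = "\uD83D\uDE00";
console.log(grinPair === "😀");        // logs true
console.log(grinPair === "\u{1F600}"); // logs true
```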
\u{XXXXX} — Full Unicode Range (ES6+)
The \u{} escape, introduced in ES2015 (ES6), accepts a hexadecimal codepoint of one to six significant digits (leading zeros are allowed) and covers the entire Unicode codespace, U+0000 through U+10FFFF:
const grin = "\u{1F600}"; // 😀 Grinning Face (U+1F600)
const rocket = "\u{1F680}"; // 🚀 Rocket (U+1F680)
const check = "\u{2713}"; // ✓ Check Mark (U+2713)
const mathA = "\u{1D49C}"; // 𝒜 Mathematical Script Capital A (U+1D49C)
This is the most versatile format and the one recommended for new code. It also works for BMP characters: "\u{2713}" is identical to "\u2713".
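Escapes in source code are decoded by the parser. However, escape text sometimes arrives as data, for example a config value that literally contains the characters \u{2713}. Here is a minimal sketch of decoding such text at runtime. It assumes only the brace form needs handling, decodeBraceEscapes is a hypothetical name, and it mirrors the parser's upper limit of U+10FFFF.

```javascript
// Hypothetical helper: decode literal \u{...} escape text into characters.
// Only needed for strings that actually contain a backslash, "u", and braces;
// escapes written in source-code literals are already decoded by the parser.
function decodeBraceEscapes(text) {
  return text.replace(/\\u\{([0-9A-Fa-f]+)\}/g, (match, hex) => {
    const cp = parseInt(hex, 16);
    // Same limit the parser enforces for \u{}: nothing above U+10FFFF.
    if (cp > 0x10ffff) {
      throw new RangeError(`Codepoint out of range: ${match}`);
    }
    return String.fromCodePoint(cp);
  });
}

const raw = String.raw`\u{2713} done \u{1F680}`; // backslashes kept as text
console.log(decodeBraceEscapes(raw)); // logs ✓ done 🚀
```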
Surrogate Pairs and String Length
JavaScript strings are sequences of UTF-16 code units, not Unicode codepoints. Characters in the BMP (U+0000 to U+FFFF) take one code unit. Characters outside the BMP (U+10000 and above) require two code units — a surrogate pair.
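The two code units in a pair are computed arithmetically. UTF-16 subtracts 0x10000 from the codepoint and splits the remaining 20 bits into two halves. The high surrogate is 0xD800 plus the top 10 bits, and the low surrogate is 0xDC00 plus the bottom 10 bits. Below is a small sketch of that arithmetic, checked against what the engine stores for 😀. toSurrogatePair is a hypothetical helper.

```javascript
// Sketch of the UTF-16 rule for one codepoint in U+10000..U+10FFFF.
function toSurrogatePair(cp) {
  if (cp < 0x10000 || cp > 0x10ffff) {
    throw new RangeError("Only U+10000..U+10FFFF is encoded as a surrogate pair");
  }
  const offset = cp - 0x10000;           // a 20-bit value
  const high = 0xd800 + (offset >> 10);  // top 10 bits -> high surrogate
  const low = 0xdc00 + (offset & 0x3ff); // bottom 10 bits -> low surrogate
  return [high, low];
}

const [high, low] = toSurrogatePair(0x1f600); // 😀
console.log(high.toString(16), low.toString(16)); // logs d83d de00
console.log("\u{1F600}".charCodeAt(0) === high && "\u{1F600}".charCodeAt(1) === low); // logs true
```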
This means the .length property counts code units, not characters:
"A".length; // 1 — BMP character, one code unit
"✓".length; // 1 — BMP character (U+2713)
"😀".length; // 2 — above BMP, surrogate pair
"\u{1F600}".length; // 2 — same emoji, same result
This is one of the most common pitfalls in JavaScript Unicode handling. A string containing a single non-BMP emoji such as 😀 has a .length of 2, which breaks naive character counting, truncation, and iteration.
Correct Iteration and Counting
To iterate over codepoints rather than UTF-16 code units, use for...of or Array.from(). Both use the string iterator, which keeps surrogate pairs together:
// Incorrect: iterates over code units
for (let i = 0; i < "😀🚀".length; i++) {
console.log("😀🚀"[i]); // Logs 4 lone surrogate halves, not 2 emoji
}
// Correct: iterates over codepoints
for (const char of "😀🚀") {
console.log(char); // Logs 😀, then 🚀
}
// Correct character count
Array.from("😀🚀").length; // 2 (not 4)
Use Array.from(str).length instead of str.length whenever your strings may contain characters outside the BMP.
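Array.from counts codepoints, which is still not always what a reader sees as one character. An emoji with a skin-tone modifier, or a letter followed by a combining accent, spans several codepoints. If you need user-perceived characters (grapheme clusters), one option is Intl.Segmenter. The following is a sketch rather than a prescription, assuming a runtime that provides Intl.Segmenter, and countGraphemes is a hypothetical helper name.

```javascript
// Assumes a runtime that provides Intl.Segmenter (current browsers, Node.js 16+).
function countGraphemes(text) {
  const segmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });
  return [...segmenter.segment(text)].length;
}

const thumbs = "\u{1F44D}\u{1F3FD}"; // 👍🏽: thumbs up + skin-tone modifier
console.log(thumbs.length);             // logs 4 (code units)
console.log(Array.from(thumbs).length); // logs 2 (codepoints)
console.log(countGraphemes(thumbs));    // logs 1 (user-perceived character)
```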
Practical Patterns
Template Literals
ES6 template literals support Unicode escapes directly, making them ideal for building strings with symbols:
const status = `\u{2713} Complete`;
const label = `Price: \u{20AC}29.99`;
const isHappy = true; // example flag for the ternary below
const mood = `Feeling ${isHappy ? '\u{1F600}' : '\u{1F614}'}`;
You can also insert symbols as plain characters in template literals. No escape is needed if your source file is saved as UTF-8:
const status = `✓ Complete`;
const label = `Price: €29.99`;
String.fromCodePoint()
Use String.fromCodePoint() to create characters from their numeric codepoint at runtime. This is especially useful when you have codepoints stored as data:
String.fromCodePoint(0x2713); // "✓"
String.fromCodePoint(0x1F600); // "😀"
String.fromCodePoint(0x2190, 0x2192); // "←→"
// From a variable
const cp = 0x2605;
const star = String.fromCodePoint(cp); // "★"
Avoid the older String.fromCharCode() for non-BMP characters. It treats each argument as a single 16-bit code unit, so String.fromCharCode(0x1F600) silently drops the high bits and returns "\uF600" instead of 😀.
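Codepoints stored as data are often written in U+XXXX notation, such as U+2713. Here is a minimal sketch of turning that notation into a character. charFromNotation is a hypothetical helper. The last lines contrast it with the 16-bit truncation that makes fromCharCode unreliable for non-BMP values.

```javascript
// Hypothetical helper: convert "U+1F680"-style notation into the character.
function charFromNotation(notation) {
  const match = /^U\+([0-9A-F]{1,6})$/i.exec(notation.trim());
  if (!match) {
    throw new SyntaxError(`Expected U+XXXX notation, got: ${notation}`);
  }
  // String.fromCodePoint throws a RangeError above U+10FFFF.
  return String.fromCodePoint(parseInt(match[1], 16));
}

console.log(charFromNotation("U+2713"));  // logs ✓
console.log(charFromNotation("U+1F680")); // logs 🚀

// fromCharCode keeps only the low 16 bits, so 0x1F680 becomes U+F680.
console.log(String.fromCharCode(0x1f680) === "\uF680"); // logs true
// Passing both surrogate halves explicitly does work with fromCharCode.
console.log(String.fromCharCode(0xd83d, 0xde80) === "\u{1F680}"); // logs true
```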
Codepoint Inspection
To get the codepoint of a character, use codePointAt():
"✓".codePointAt(0); // 10003 (0x2713)
"✓".codePointAt(0).toString(16); // "2713"
"😀".codePointAt(0); // 128512 (0x1F600)
"😀".codePointAt(0).toString(16); // "1f600"
Regex Unicode Properties (ES2018+)
Modern JavaScript regex supports Unicode property escapes with the \p{} syntax (requires the u or v flag):
// Match any symbol character
const symbolRegex = /\p{Symbol}/u;
symbolRegex.test("✓"); // true
symbolRegex.test("A"); // false
// Match any currency symbol
const currencyRegex = /\p{Currency_Symbol}/u;
currencyRegex.test("€"); // true
currencyRegex.test("£"); // true
// Match any math symbol
const mathRegex = /\p{Math_Symbol}/u;
mathRegex.test("∞"); // true
// Remove all symbols from a string
const cleaned = "Hello ✓ World →".replace(/\p{Symbol}/gu, "");
// "Hello  World " (the spaces around each removed symbol remain)
Unicode property escapes are powerful for validation, filtering, and text processing when you need to identify characters by category rather than by specific codepoint.
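The u flag matters even without \p{}, because it makes the regex engine treat a surrogate pair as one character. The sketch below shows that behavior. It also defines a hypothetical stripSymbols helper that tidies up the whitespace left behind when symbols are removed.

```javascript
// Without the u flag, "." matches one UTF-16 code unit,
// so a single non-BMP character such as 😀 does not match /^.$/.
console.log(/^.$/.test("\u{1F600}"));  // logs false
console.log(/^.$/u.test("\u{1F600}")); // logs true

// Hypothetical helper: remove symbols, then tidy the leftover whitespace.
function stripSymbols(text) {
  return text
    .replace(/\p{Symbol}/gu, "") // drop every General_Category=Symbol character
    .replace(/\s+/g, " ")        // collapse runs of whitespace
    .trim();
}

console.log(stripSymbols("Hello ✓ World →")); // logs Hello World
```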
JSON Encoding
When working with JSON, Unicode handling has specific requirements. JSON strings may contain any Unicode character directly, and they support \uXXXX escapes but not the ES6 \u{XXXXX} syntax. So if you escape a non-BMP character in JSON, you must write it as a UTF-16 surrogate pair:
// JavaScript object
const data = { icon: "🚀" };
// JSON.stringify keeps non-BMP characters as-is (only lone surrogates are escaped)
JSON.stringify(data);
// '{"icon":"🚀"}'
// Manual JSON: must use surrogate pair
// 🚀 (U+1F680) = \uD83D\uDE80 in JSON
const json = '{"icon": "\\uD83D\\uDE80"}';
In practice, JSON.stringify() and JSON.parse() handle all of this transparently. The surrogate pair detail only matters if you're constructing or parsing JSON strings manually, or debugging raw JSON that contains non-BMP characters.
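JSON.stringify() leaves non-ASCII characters unescaped. If a consumer only accepts ASCII JSON, you can escape the rest yourself. Here is a minimal sketch, assuming a hypothetical toAsciiJson helper that post-processes the output of JSON.stringify. Because it escapes UTF-16 code units one at a time, non-BMP characters come out as the surrogate pair that JSON requires.

```javascript
// Hypothetical helper: ASCII-only JSON. Every code unit from U+0080 to U+FFFF
// (surrogate halves included) becomes a \uXXXX escape. This is safe because
// non-ASCII text can only appear inside JSON strings, where such escapes are valid.
function toAsciiJson(value) {
  return JSON.stringify(value).replace(/[\u0080-\uFFFF]/g, (unit) =>
    "\\u" + unit.charCodeAt(0).toString(16).toUpperCase().padStart(4, "0")
  );
}

const ascii = toAsciiJson({ icon: "🚀" });
console.log(ascii);                           // logs {"icon":"\uD83D\uDE80"}
console.log(JSON.parse(ascii).icon === "🚀"); // logs true
```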
Copying JS Escapes from Symbolwise
Symbolwise provides pre-formatted JavaScript escape sequences for every symbol in its database. Here's how to get them:
- Search for a symbol on the Symbolwise home page — by name, keyword, or codepoint (e.g., "check mark" or "U+2713").
- Select the JavaScript format from the format picker on any symbol card. The JS escape (e.g., "\u{2713}") appears, ready to copy.
- Click to copy. The value is placed on your clipboard for immediate use in your code.
- Open the detail page for a full encoding reference — including all three JS escape formats alongside HTML, CSS, React JSX, and Markdown.
Symbolwise uses the modern \u{} format by default for the JavaScript copy option, since it works for all codepoints. The symbol detail page also shows the \uXXXX format and surrogate pair representation where applicable.
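The following is a rough sketch of how the three JavaScript escape formats relate for a single character. It is not Symbolwise's implementation, and jsEscapeForms is a hypothetical name. It returns null for \x when the codepoint is above U+00FF, and it spells out the surrogate pair for \uXXXX when the codepoint is above U+FFFF.

```javascript
// Hypothetical helper: build the three JS escape forms for the first
// codepoint of a string. null marks a form that cannot express it.
function jsEscapeForms(symbol) {
  const cp = symbol.codePointAt(0);
  const ch = String.fromCodePoint(cp); // first codepoint only
  const hex = (n, width) => n.toString(16).toUpperCase().padStart(width, "0");

  // One \uXXXX per UTF-16 code unit: a surrogate pair when cp > 0xFFFF.
  const units = [];
  for (let i = 0; i < ch.length; i++) {
    units.push("\\u" + hex(ch.charCodeAt(i), 4));
  }

  return {
    x: cp <= 0xff ? "\\x" + hex(cp, 2) : null,
    u: units.join(""),
    braces: "\\u{" + hex(cp, 1) + "}",
  };
}

console.log(jsEscapeForms("©").x);      // logs \xA9
console.log(jsEscapeForms("🚀").u);     // logs \uD83D\uDE80
console.log(jsEscapeForms("🚀").braces); // logs \u{1F680}
```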
Further Reading
Continue exploring Unicode in web development:
- HTML Entities for Symbols — Named, decimal, and hex HTML entities with copy-ready examples.
- Using Symbols in CSS — CSS escape syntax, the content property, and pseudo-element patterns.
- What Is Unicode? — The foundational guide to codepoints, planes, and the Unicode standard.
- Symbols in Code Editors — Tips for inserting and displaying Unicode in VS Code, JetBrains, and other editors.