What Is Unicode? Codepoints, Planes, and Why Every Symbol Has a Number
The universal standard that assigns a unique number to every character, symbol, and emoji — and why it matters for every developer.
What Is Unicode?
Unicode is a universal character encoding standard maintained by the Unicode Consortium. Its goal is straightforward: assign a unique number to every character used in human writing, technical notation, and digital communication — regardless of platform, language, or program.
Before Unicode, dozens of competing encoding systems existed. ASCII covered 128 characters (enough for English), while ISO 8859 variants handled European languages, and separate standards existed for Chinese, Japanese, Korean, Arabic, and many others. Sending text between systems often produced garbled output because the same byte value could mean different characters in different encodings.
Unicode solved this by creating a single, comprehensive mapping. Today the standard defines over 150,000 characters (154,998 as of Unicode 16.0) covering 168 modern and historic scripts as well as thousands of symbols, technical characters, and the full emoji set. Every character gets one permanent number, so text written on one system is read as the same characters on any other system that supports Unicode.
When you search for a symbol on Symbolwise, you're searching this universal catalog. Every result shows the character's official Unicode name and unique identifier.
Codepoints: Every Character's Unique Address
A codepoint is the unique number assigned to a character in the Unicode standard. Codepoints are written in the format U+ followed by four to six hexadecimal digits. For example:
| Symbol | Name | Codepoint |
|---|---|---|
| A | Latin Capital Letter A | U+0041 |
| € | Euro Sign | U+20AC |
| ✓ | Check Mark | U+2713 |
| → | Rightwards Arrow | U+2192 |
| π | Greek Small Letter Pi | U+03C0 |
| 😀 | Grinning Face | U+1F600 |
The U+ prefix distinguishes Unicode codepoints from regular hexadecimal numbers. Characters in the basic range use four digits (U+0041), while characters outside the Basic Multilingual Plane use five or six digits (U+1F600).
Codepoints are abstract identifiers — they define which character, not how it looks on screen. The visual appearance (glyph) depends on the font. The same codepoint can render differently across fonts and operating systems, but it always represents the same character.
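As a minimal sketch, the U+ notation from the table above can be reproduced in Python with the built-in `ord` and `chr` functions. The helper names `to_codepoint` and `from_codepoint` are illustrative only and are not part of any library.

```python
def to_codepoint(char: str) -> str:
    """Format a single character as U+XXXX, padded to at least four hex digits."""
    return f"U+{ord(char):04X}"


def from_codepoint(label: str) -> str:
    """Turn a label such as 'U+20AC' back into the character it names."""
    return chr(int(label[2:], 16))


# Print each symbol from the table next to its codepoint label.
for symbol in ["A", "€", "✓", "→", "π", "😀"]:
    print(symbol, to_codepoint(symbol))
```

The `04X` format pads short values with zeros but lets larger values grow, which is why a BMP character prints with four digits and an emoji with five.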
Planes and Blocks: How Unicode Is Organized
Unicode organizes its codespace, which runs from U+0000 to U+10FFFF, into 17 planes of 65,536 codepoints each. That gives 17 × 65,536 = 1,114,112 possible codepoints, of which around 150,000 are currently assigned to characters.
| Plane | Name | Range | Contents |
|---|---|---|---|
| 0 | Basic Multilingual Plane (BMP) | U+0000–U+FFFF | Most modern scripts, common symbols, CJK |
| 1 | Supplementary Multilingual Plane (SMP) | U+10000–U+1FFFF | Emoji, historic scripts, musical notation |
| 2 | Supplementary Ideographic Plane (SIP) | U+20000–U+2FFFF | Rare CJK ideographs |
| 3 | Tertiary Ideographic Plane (TIP) | U+30000–U+3FFFF | Additional CJK ideographs |
| 4–13 | Unassigned | U+40000–U+DFFFF | Reserved for future use |
| 14 | Supplementary Special-purpose Plane | U+E0000–U+EFFFF | Tag characters, variation selectors |
| 15–16 | Private Use Planes | U+F0000–U+10FFFF | Application-defined characters |
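Because every plane holds exactly 65,536 (0x10000) codepoints, the plane number is simply the codepoint divided by that size. The sketch below assumes a hypothetical helper called `plane_of`; it is not a standard-library function.

```python
PLANE_SIZE = 0x10000        # 65,536 codepoints per plane
PLANE_COUNT = 17
MAX_CODEPOINT = 0x10FFFF    # last codepoint of plane 16


def plane_of(codepoint: int) -> int:
    """Return the plane number (0 to 16) that contains the codepoint."""
    if not 0 <= codepoint <= MAX_CODEPOINT:
        raise ValueError(f"outside the Unicode codespace: {codepoint:#x}")
    return codepoint >> 16  # same as codepoint // PLANE_SIZE


# The total size of the codespace follows from the two constants.
total_codepoints = PLANE_SIZE * PLANE_COUNT

print(total_codepoints)
print(plane_of(ord("A")), plane_of(ord("😀")))
```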
Within each plane, characters are grouped into blocks — named, contiguous ranges of codepoints. For example, the Arrows block (U+2190–U+21FF) contains 112 arrow characters, and the Mathematical Operators block (U+2200–U+22FF) contains 256 math symbols. Browse all blocks in the Symbolwise Unicode blocks browser.
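Block sizes follow from the range endpoints. Python's `unicodedata` module exposes character names but not block names, so the sketch below uses a small hand-written `BLOCKS` table holding only the two ranges mentioned above. The table and the helpers `block_of` and `block_size` are assumptions made for illustration.

```python
import unicodedata
from typing import Optional

# Only the two blocks named in the text; a real table would list every block.
BLOCKS = {
    "Arrows": (0x2190, 0x21FF),
    "Mathematical Operators": (0x2200, 0x22FF),
}


def block_of(char: str) -> Optional[str]:
    """Return the block name from BLOCKS that contains char, or None."""
    codepoint = ord(char)
    for name, (start, end) in BLOCKS.items():
        if start <= codepoint <= end:
            return name
    return None


def block_size(name: str) -> int:
    """Number of codepoints in the block, counting both endpoints."""
    start, end = BLOCKS[name]
    return end - start + 1


for char in "→∑":
    print(char, unicodedata.name(char), block_of(char))
```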
Encoding: How Codepoints Become Bytes
A codepoint is an abstract number. To store or transmit a character, computers need an encoding that converts codepoints into sequences of bytes. The three main Unicode encodings each make different tradeoffs:
| Encoding | Bytes per Character | Key Trait | Common Use |
|---|---|---|---|
| UTF-8 | 1–4 | ASCII-compatible, variable-width | Web, APIs, files, databases |
| UTF-16 | 2 or 4 | Fixed 2 bytes for BMP, surrogate pairs for others | JavaScript strings, Windows APIs, Java |
| UTF-32 | 4 (fixed) | Constant width, simple indexing | Internal processing, some databases |
UTF-8 is the dominant encoding on the web. It uses one byte for ASCII characters (U+0000–U+007F), two bytes for U+0080–U+07FF (Latin extensions, Greek, Cyrillic, Hebrew, and Arabic), three bytes for the rest of the BMP (including CJK and most symbols, such as € and →), and four bytes for characters outside the BMP, like emoji. Because ASCII text is valid UTF-8, the encoding is backwards-compatible with decades of existing software.
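The variable width is easy to observe by encoding a few characters and counting the bytes. This is a small sketch using Python's built-in `str.encode`; the helper name `utf8_length` is made up for the example.

```python
def utf8_length(char: str) -> int:
    """Number of bytes the character occupies when encoded as UTF-8."""
    return len(char.encode("utf-8"))


# One sample character per UTF-8 width, from one byte up to four.
for char in ["A", "é", "€", "中", "😀"]:
    encoded = char.encode("utf-8")
    print(char, f"U+{ord(char):04X}", utf8_length(char), encoded.hex(" "))

# ASCII text produces identical bytes in ASCII and in UTF-8.
ascii_compatible = "hello".encode("ascii") == "hello".encode("utf-8")
```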
UTF-16 uses two bytes for any character in the Basic Multilingual Plane and four bytes (two "surrogate" values) for characters above U+FFFF. This is the encoding used internally by JavaScript strings, which is why emoji and other supplementary characters behave differently when you measure string length in JavaScript.
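The surrogate calculation can be sketched in a few lines. Python strings are indexed by codepoint, so the example counts UTF-16 code units by encoding explicitly; that count is the figure a JavaScript `length` property would report. The helper names `surrogate_pair` and `utf16_code_units` are hypothetical.

```python
from typing import Tuple


def surrogate_pair(codepoint: int) -> Tuple[int, int]:
    """Split a codepoint above U+FFFF into its high and low surrogates."""
    if not 0x10000 <= codepoint <= 0x10FFFF:
        raise ValueError("only supplementary codepoints use surrogate pairs")
    offset = codepoint - 0x10000          # a 20-bit value
    high = 0xD800 + (offset >> 10)        # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)       # bottom 10 bits
    return high, low


def utf16_code_units(text: str) -> int:
    """Count 16-bit code units in the UTF-16 encoding of text."""
    return len(text.encode("utf-16-le")) // 2


high, low = surrogate_pair(ord("😀"))
print(f"{high:04X} {low:04X}")
print(len("😀"), utf16_code_units("😀"))
```

One codepoint, two code units: that gap is the source of most off-by-one surprises with emoji in UTF-16-based string APIs.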
UTF-32 uses a fixed four bytes per character, making random access simple but storage expensive. It's primarily used for internal processing where constant-width characters simplify algorithms.
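With a fixed width, the character at position n always starts at byte 4 × n, so no scanning is needed. The sketch below illustrates this with little-endian UTF-32 and a hypothetical helper, `char_at_utf32`, and compares the storage cost of the three encodings for one short string.

```python
def char_at_utf32(data: bytes, index: int) -> str:
    """Read the character at index from little-endian UTF-32 bytes."""
    start = index * 4
    return data[start:start + 4].decode("utf-32-le")


text = "a€😀"
utf32_data = text.encode("utf-32-le")

# Byte counts for the same three characters in each encoding.
sizes = {
    "utf-8": len(text.encode("utf-8")),
    "utf-16": len(text.encode("utf-16-le")),
    "utf-32": len(utf32_data),
}

print(sizes)
print(char_at_utf32(utf32_data, 2))
```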
Why Unicode Matters for Developers
Understanding Unicode is practical knowledge for any developer who works with text — which is almost everyone. Here's why it matters:
- Correct rendering — Setting the right encoding in your HTML, database, and file headers ensures characters display correctly for all users worldwide.
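- String handling — In JavaScript, Java, and other languages whose strings are sequences of UTF-16 code units, length and indexing count code units, so each character outside the BMP occupies two positions. Python 3 strings index by codepoint, but a single visible character built from combining marks still spans several codepoints. Knowing about surrogate pairs and codepoint iteration prevents subtle bugs.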
- Security — Homoglyph attacks use visually similar Unicode characters (like Cyrillic а vs. Latin a) to spoof domain names and usernames. Unicode normalization makes equivalent sequences compare equal, but it does not merge lookalikes from different scripts, so secure comparison also needs a confusable-character check such as the one described in Unicode Technical Standard #39.
- Accessibility and internationalization — Supporting the full Unicode range means your application works for users in any language, including right-to-left scripts, combining characters, and complex text layouts.
Whether you need to embed an HTML entity, find and copy a specific symbol, or understand why a character isn't rendering, Unicode knowledge gives you the foundation to solve the problem.
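To make the security point concrete, here is a rough sketch using Python's standard `unicodedata` module. It shows that normalization unifies two spellings of the same character but leaves a Cyrillic lookalike distinct from its Latin twin. The `scripts_used` helper is a crude heuristic invented for this example (it reads the first word of each character name) and is no substitute for a proper confusables check.

```python
import unicodedata

composed = "\u00e9"       # é as one codepoint
decomposed = "e\u0301"    # e followed by a combining acute accent
cyrillic_a = "\u0430"     # looks like Latin a in most fonts
latin_a = "a"


def same_text(left: str, right: str) -> bool:
    """Compare two strings after NFC normalization."""
    nfc_left = unicodedata.normalize("NFC", left)
    nfc_right = unicodedata.normalize("NFC", right)
    return nfc_left == nfc_right


def scripts_used(text: str) -> set:
    """Heuristic: collect the first word of each letter's Unicode name."""
    return {
        unicodedata.name(char, "UNKNOWN").split()[0]
        for char in text
        if char.isalpha()
    }


print(composed == decomposed, same_text(composed, decomposed))
print(same_text(cyrillic_a, latin_a))
print(scripts_used("p\u0430ypal"))  # second letter is the Cyrillic lookalike
```

A string that mixes scripts inside one word, as the last line does, is a common warning sign worth flagging for review.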
Explore Unicode on Symbolwise
Symbolwise makes the Unicode standard accessible and practical. Here's how to start exploring:
- Search — Search by name or codepoint to find any of the 150,000+ indexed characters. Type "check mark" or "U+2713" and get instant results.
- Browse blocks — Open the Unicode blocks browser to explore character ranges from Basic Latin to Supplemental Arrows and beyond.
- Copy in any format — Every symbol can be copied as plain text, HTML entity, CSS escape, JavaScript escape, React JSX, or Markdown.
- Learn more — Read how to find and copy any Unicode symbol for a step-by-step tutorial, or see Unicode blocks explained for a deeper look at how character ranges work.
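For a sense of what those copy formats look like, the sketch below derives three of them from a character's codepoint. It assumes the common conventions (a hexadecimal HTML character reference, a CSS backslash escape, and a JavaScript `\u` escape) and is not a description of how Symbolwise generates its output. The function name `escape_formats` is hypothetical.

```python
def escape_formats(char: str) -> dict:
    """Build common escape sequences for a single character."""
    codepoint = ord(char)
    if codepoint <= 0xFFFF:
        js_escape = f"\\u{codepoint:04X}"       # four-digit form for the BMP
    else:
        js_escape = f"\\u{{{codepoint:X}}}"     # braced form above U+FFFF
    return {
        "html": f"&#x{codepoint:X};",           # hexadecimal character reference
        "css": f"\\{codepoint:X}",              # add a space if a hex digit follows
        "javascript": js_escape,
    }


for char in ["✓", "😀"]:
    print(char, escape_formats(char))
```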