UTF-8 Validator — Check Encoding, Free Online

Name: UTF-8 Validator
Availability: InStock
Author: ZTools

Validate UTF-8 encoding of text or hex bytes. Detect invalid sequences, BOM, mixed encodings. Browser-only.

About UTF-8 Validator

UTF-8 is the dominant text encoding on the modern web — backwards-compatible with ASCII, supports all Unicode characters, variable-width (1-4 bytes per code point). Invalid UTF-8 sequences cause "mojibake" (garbled text) when interpreted with a different encoding, or outright errors in strict parsers (Python, Rust, JSON parsers). The ZTools UTF-8 Validator checks input bytes (paste hex or upload binary) for valid UTF-8 sequencing, flags invalid bytes, detects byte-order marks, and identifies likely alternative encodings (Latin-1, Windows-1252, GB2312).

Use cases

Debug "mojibake" in scraped text. Text shows "Ã©" instead of "é". Validator reveals the bytes are valid UTF-8 but were decoded as Latin-1.
Validate user uploads. CSV upload with mixed encodings. Validator detects which rows have invalid UTF-8.
Check for BOM (Byte Order Mark). Some tools add a UTF-8 BOM (0xEF 0xBB 0xBF). Most parsers handle it; some choke. Validator flags it.
Verify a binary file is text. Random binary won't parse as valid UTF-8. Validator confirms or refutes.

How it works

Paste text or hex bytes. Auto-detect: if hex chars only, treat as bytes; otherwise treat as already-decoded text.
Validate. Walk bytes. UTF-8 rules: ASCII (0xxxxxxx), 2-byte (110xxxxx 10xxxxxx), 3-byte, 4-byte. Each continuation byte starts with 10.
Flag issues. Invalid lead bytes, missing continuation bytes, overlong encodings, invalid surrogates.
Display. Valid: ✓ + decoded text. Invalid: byte position + suggested encoding (often Latin-1 / Windows-1252).

Examples

Input: Hex "C3 A9" (é in UTF-8)
Output: Valid. Decodes to "é".

Input: Hex "E9" (é in Latin-1)
Output: Invalid UTF-8 (orphan continuation byte). Likely Latin-1 / Windows-1252.

Input: Hex "EF BB BF E2 9C 93"
Output: Valid UTF-8 with BOM (EF BB BF) prefix. Decodes to "✓".

Frequently asked questions

What's mojibake?

Garbled text caused by encoding/decoding mismatches. "Ã©" is "é" decoded as Latin-1 then re-encoded as UTF-8. Validator helps trace the chain.

BOM — should I keep it?

For UTF-8, BOM is optional and most modern parsers handle it. Some (Python's csv module) don't. Strip if unsure.

Overlong encoding?

Older UTF-8 spec allowed encoding ASCII as multi-byte (e.g. 0x2F as C0 AF). Modern strict UTF-8 forbids — security risk (path traversal). Validator flags.

Privacy?

All in browser.

Pro tips

For "mojibake" debugging, the chain is usually: source encoding → mis-decoded as X → re-encoded as Y. Validator helps identify each step.
Always strip BOM when concatenating UTF-8 files — multiple BOMs in one file confuse parsers.
For user uploads, validate before processing. Reject invalid UTF-8 at the boundary, not deep in the pipeline.
For round-trip safety, use modern strict UTF-8 (no overlong, no surrogates) — the modern spec.

Reviewed by Ahsan Mahmood · Last updated 2026-05-06 · Part of ZTools.

For the full, formatted version of this page, please enable JavaScript and reload https://ztools.zaions.com/utf8-validator.