UTF-8 to UTF-16 Mapping

UTF-8 to UTF-16

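UTF-8 Pattern   UTF-16 High Byte   UTF-16 Low Byte

0tuvwxyz        00000000           0tuvwxyz

110pqrst        -                  -
10uvwxyz        00000pqr           stuvwxyz

1110jklm        -                  -
10npqrst        -                  -
10uvwxyz        jklmnpqr           stuvwxyz

11110efg        -                  -
10hijklm        110110ab           cdjklmnp
10npqrst        -                  -
10uvwxyz        110111qr           stuvwxyz

Each row is one UTF-8 byte. A dash (-) marks a byte position at which no UTF-16 code unit is produced. In the four-byte case the bits abcd are not copied from the input: abcd = efghi - 1, which is the subtraction of 0x10000 that UTF-16 applies before splitting a code point into a surrogate pair.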
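The bit patterns in the table can be checked with a short byte-at-a-time sketch in Python. This is only an illustration of the mapping, not the SIMD parallel bit stream method of the paper cited below. The function name utf8_to_utf16_units is hypothetical, and the code assumes the input is already well-formed UTF-8 (no validation of continuation bytes, overlong forms, or truncated sequences).

```python
def utf8_to_utf16_units(data: bytes) -> list:
    """Map well-formed UTF-8 bytes to a list of 16-bit UTF-16 code units.

    Assumption: the input is valid UTF-8; continuation bytes are not checked.
    """
    units = []
    i = 0
    while i < len(data):
        b0 = data[i]
        if b0 < 0x80:
            # 0tuvwxyz -> 00000000 0tuvwxyz
            units.append(b0)
            i += 1
        elif (b0 >> 5) == 0b110:
            # 110pqrst 10uvwxyz -> 00000pqr stuvwxyz
            b1 = data[i + 1]
            units.append(((b0 & 0x1F) << 6) | (b1 & 0x3F))
            i += 2
        elif (b0 >> 4) == 0b1110:
            # 1110jklm 10npqrst 10uvwxyz -> jklmnpqr stuvwxyz
            b1, b2 = data[i + 1], data[i + 2]
            units.append(((b0 & 0x0F) << 12) | ((b1 & 0x3F) << 6) | (b2 & 0x3F))
            i += 3
        elif (b0 >> 3) == 0b11110:
            # 11110efg 10hijklm 10npqrst 10uvwxyz -> surrogate pair
            b1, b2, b3 = data[i + 1], data[i + 2], data[i + 3]
            efghi = ((b0 & 0x07) << 2) | ((b1 >> 4) & 0x03)
            abcd = efghi - 1                  # subtract 0x10000 from the code point
            jklm = b1 & 0x0F
            np = (b2 >> 4) & 0x03
            qrst = b2 & 0x0F
            uvwxyz = b3 & 0x3F
            # 110110ab cdjklmnp
            units.append(0xD800 | (abcd << 6) | (jklm << 2) | np)
            # 110111qr stuvwxyz
            units.append(0xDC00 | (qrst << 6) | uvwxyz)
            i += 4
        else:
            raise ValueError("unexpected lead byte 0x%02X at offset %d" % (b0, i))
    return units
```

For example, the UTF-8 bytes F0 9F 98 80 (U+1F600) map to the surrogate pair 0xD83D, 0xDE00, and the result for any valid string can be compared against Python's own "utf-16-be" encoder.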

Reference: Cameron, R. “A Case Study in SIMD Text Processing with Parallel Bit Streams”, School of Computing Science, SFU, 2008
