UTF-8 to UTF-16 Mapping
Posted on May 14th, 2008 by hengdu
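| UTF-8 Pattern | UTF-16 High Byte | UTF-16 Low Byte |
|---|---|---|
| 0tuvwxyz | 00000000 | 0tuvwxyz |
| 110pqrst 10uvwxyz | 00000pqr | stuvwxyz |
| 1110jklm 10npqrst 10uvwxyz | jklmnpqr | stuvwxyz |
| 11110efg 10hijklm 10npqrst 10uvwxyz | 110110ab 110111qr | cdjklmnp stuvwxyz |

Each letter stands for one bit. In the four-byte case the output is a surrogate pair of two 16-bit code units: 110110ab cdjklmnp followed by 110111qr stuvwxyz, where abcd = efghi - 1.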
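The bit patterns in the table can be checked with a small scalar sketch. This is not the parallel bit stream SIMD method from the referenced paper, only a byte-at-a-time illustration of the same mapping. The function name `utf8_to_utf16_units` is hypothetical, and the code assumes well-formed UTF-8 input (continuation bytes are not validated).

```python
from typing import List


def utf8_to_utf16_units(data: bytes) -> List[int]:
    """Map well-formed UTF-8 bytes to a list of 16-bit UTF-16 code units.

    Assumption: the input is valid UTF-8; continuation bytes are not checked.
    """
    units = []
    i = 0
    while i < len(data):
        b0 = data[i]
        if b0 < 0x80:
            # 0tuvwxyz -> 00000000 0tuvwxyz
            units.append(b0)
            i += 1
        elif b0 >> 5 == 0b110:
            # 110pqrst 10uvwxyz -> 00000pqr stuvwxyz
            units.append(((b0 & 0x1F) << 6) | (data[i + 1] & 0x3F))
            i += 2
        elif b0 >> 4 == 0b1110:
            # 1110jklm 10npqrst 10uvwxyz -> jklmnpqr stuvwxyz
            units.append(((b0 & 0x0F) << 12)
                         | ((data[i + 1] & 0x3F) << 6)
                         | (data[i + 2] & 0x3F))
            i += 3
        elif b0 >> 3 == 0b11110:
            # 11110efg 10hijklm 10npqrst 10uvwxyz
            efghi = ((b0 & 0x07) << 2) | ((data[i + 1] >> 4) & 0x03)
            abcd = efghi - 1
            jklm = data[i + 1] & 0x0F
            npqrst = data[i + 2] & 0x3F
            uvwxyz = data[i + 3] & 0x3F
            # First code unit (high surrogate): 110110ab cdjklmnp
            high = 0xD800 | (abcd << 6) | (jklm << 2) | (npqrst >> 4)
            # Second code unit (low surrogate): 110111qr stuvwxyz
            low = 0xDC00 | ((npqrst & 0x0F) << 6) | uvwxyz
            units.extend([high, low])
            i += 4
        else:
            raise ValueError(f"unexpected lead byte 0x{b0:02x} at offset {i}")
    return units
```

For example, `utf8_to_utf16_units("€".encode("utf-8"))` exercises the three-byte row, and a character outside the Basic Multilingual Plane such as U+1F600 exercises the surrogate-pair row.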
Reference: Cameron, R. “A Case Study in SIMD Text Processing with Parallel Bit Streams”, School of Computing Science, SFU, 2008