Don't worry, it's not rocket science... it's
brain surgery!
- Japanese Encoding
- Shift JIS - the most popular internal code in Japan (used on Mac
and Windows).
- EUC-JIS - a coding standard very popular on UNIX-based systems
and in PC software.
- 7-Bit JIS - includes New-JIS, Old-JIS and NEC-JIS. New-JIS is
similar to ISO-2022.
- ISO-2022-JP - an emerging international Internet standard for
encoding Japanese text.
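All of the Japanese encodings above are built on the JIS X 0208 character table, so a 2-byte JIS code (two bytes in 0x21-0x7E, as used between the escape sequences of 7-Bit JIS) can be turned into EUC-JIS or Shift JIS with plain arithmetic. The Python sketch below shows the commonly used conversion. The function names are hypothetical, and it handles only 2-byte JIS X 0208 codes (no half-width katakana, no vendor extensions).

```python
def jis_to_euc(j1: int, j2: int) -> bytes:
    """EUC-JIS: set the high bit of both JIS bytes (JIS code + 0x8080)."""
    return bytes([j1 | 0x80, j2 | 0x80])


def jis_to_sjis(j1: int, j2: int) -> bytes:
    """Shift JIS: two consecutive JIS rows share one lead byte."""
    if not (0x21 <= j1 <= 0x7E and 0x21 <= j2 <= 0x7E):
        raise ValueError("not a 2-byte JIS X 0208 code")
    s1 = ((j1 - 0x21) >> 1) + 0x81
    if s1 > 0x9F:        # skip 0xA0-0xDF, used for single-byte half-width katakana
        s1 += 0x40
    if j1 & 1:           # odd row: trail byte 0x40-0x9E, skipping 0x7F
        s2 = j2 + 0x1F
        if s2 >= 0x7F:
            s2 += 1
    else:                # even row: trail byte 0x9F-0xFC
        s2 = j2 + 0x7E
    return bytes([s1, s2])


# "日" is JIS 0x467C (row 38, cell 92).
print(jis_to_euc(0x46, 0x7C).hex())   # c6fc
print(jis_to_sjis(0x46, 0x7C).hex())  # 93fa
```

Python's built-in `euc_jp` and `shift_jis` codecs are a convenient way to spot-check the results.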
- Chinese Encoding
- Big5 - commonly used in Taiwan and Hong Kong for traditional Chinese
writing.
- GB - commonly used in China and Singapore for simplified Chinese
writing.
- HZ - a 7-bit Internet convention for encoding GB text, popular
in newsgroups and e-mail.
- ISO-2022-GB - an emerging international Internet standard
for encoding Chinese text.
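HZ can be sketched in a few lines of Python, assuming the input uses only ASCII and GB2312 characters: each run of GB2312 characters is wrapped in "~{" ... "~}", the high bit of every GB byte is cleared so the text stays 7-bit, and a literal "~" is doubled. `hz_encode` is a hypothetical helper, and it ignores HZ's line-length and "~" + newline conventions.

```python
def hz_encode(text: str) -> bytes:
    """Hypothetical HZ encoder for ASCII + GB2312 text."""
    out = bytearray()
    in_gb = False
    for ch in text:
        if ord(ch) < 0x80:
            if in_gb:
                out += b"~}"             # leave GB mode
                in_gb = False
            out += b"~~" if ch == "~" else ch.encode("ascii")
        else:
            gb = ch.encode("gb2312")     # 2 bytes, each 0xA1-0xFE; raises if not in GB2312
            if not in_gb:
                out += b"~{"             # enter GB mode
                in_gb = True
            out += bytes(b & 0x7F for b in gb)  # clear the high bit
    if in_gb:
        out += b"~}"                     # always end in ASCII mode
    return bytes(out)


print(hz_encode("中文 ~ HZ"))  # b'~{VPND~} ~~ HZ'
```

Python also ships an `hz` codec, so the output can be checked with `bytes.decode("hz")`.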
- Korean Encoding
- KSC5601 - the most popular internal code used in Korea.
- ISO-2022-KR - an emerging international Internet standard
for encoding Korean text.
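ISO-2022-KR stays 7-bit by writing a one-time designator, ESC $ ) C, at the start of the text and then switching between ASCII and KS C 5601 with the SO (0x0E) and SI (0x0F) control codes. A minimal Python sketch, assuming every non-ASCII character is in KS C 5601 (the `euc_kr` codec is used only to look up its code); `iso2022kr_encode` is a hypothetical name.

```python
SO, SI = b"\x0e", b"\x0f"
HEADER = b"\x1b$)C"   # designates KS C 5601 as G1; written once at the start


def iso2022kr_encode(text: str) -> bytes:
    out = bytearray(HEADER)
    shifted = False
    for ch in text:
        if ord(ch) < 0x80:
            if shifted:
                out += SI              # back to ASCII
                shifted = False
            out += ch.encode("ascii")
        else:
            ksc = ch.encode("euc_kr")  # KS C 5601 code with the high bit set
            if not shifted:
                out += SO              # switch to KS C 5601
                shifted = True
            out += bytes(b & 0x7F for b in ksc)
    if shifted:
        out += SI                      # end in ASCII mode
    return bytes(out)


print(iso2022kr_encode("한"))  # b'\x1b$)C\x0eGQ\x0f'
```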
- DBCS (e.g. Shift-JIS, GBK, KSC, Big5)
- DBCS stands for Double-Byte Character Set. Which codeset it refers
to depends on the language version of Windows: Shift-JIS on Japanese
Windows, KSC on Korean (Hangul) Windows, and so on.
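Because a DBCS string mixes 1-byte and 2-byte characters, software has to check each byte to know where a character starts (Win32 offers `IsDBCSLeadByte` for this). A portable Python sketch for Shift-JIS, assuming the usual lead-byte ranges 0x81-0x9F and 0xE0-0xFC; `split_sjis` is a hypothetical helper.

```python
def split_sjis(data: bytes) -> list[bytes]:
    """Split Shift-JIS bytes into one bytes object per character."""
    chars = []
    i = 0
    while i < len(data):
        b = data[i]
        if 0x81 <= b <= 0x9F or 0xE0 <= b <= 0xFC:   # lead byte: 2-byte character
            if i + 1 >= len(data):
                raise ValueError("truncated double-byte character")
            chars.append(data[i:i + 2])
            i += 2
        else:   # ASCII or half-width katakana (0xA1-0xDF): 1-byte character
            chars.append(data[i:i + 1])
            i += 1
    return chars


data = "日本ｱA".encode("shift_jis")
print(len(data), len(split_sjis(data)))  # 6 4
```

Scanning from the start matters: a trail byte such as 0x5C (the second byte of 表 in Shift-JIS) looks like an ASCII backslash when read on its own.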
- EUC-JP
- A codeset that mixes 1-byte and 2-byte characters. It is based on
JIS X 0208 and conforms to ISO-2022; the lead byte of a 2-byte
character is in the range 0xA1-0xFE. Half-width katakana is
represented as a 2-byte code.
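Classifying EUC-JP bytes is simpler than Shift-JIS because every byte of a multibyte character has its high bit set. The Python sketch below also recognizes the SS3 (0x8F) prefix for 3-byte JIS X 0212 characters, which the description above does not mention; `classify_eucjp` is a hypothetical name.

```python
def classify_eucjp(data: bytes) -> list[tuple[str, bytes]]:
    """Label each EUC-JP character by the character set it comes from."""
    result = []
    i = 0
    while i < len(data):
        b = data[i]
        if b < 0x80:
            kind, size = "ASCII", 1
        elif b == 0x8E:                 # SS2 + 1 byte: half-width katakana
            kind, size = "half-width katakana", 2
        elif b == 0x8F:                 # SS3 + 2 bytes: JIS X 0212
            kind, size = "JIS X 0212", 3
        elif 0xA1 <= b <= 0xFE:         # lead byte of a JIS X 0208 character
            kind, size = "JIS X 0208", 2
        else:
            raise ValueError(f"invalid EUC-JP lead byte 0x{b:02X}")
        if i + size > len(data):
            raise ValueError("truncated character")
        result.append((kind, data[i:i + size]))
        i += size
    return result


for kind, raw in classify_eucjp("Aあｱ".encode("euc_jp")):
    print(kind, raw.hex())
# ASCII 41
# JIS X 0208 a4a2
# half-width katakana 8eb1
```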
- JIS (ISO-2022-JP)
- A 7-bit multibyte codeset based on JIS X 0208 and conforming to
ISO-2022. It does not support half-width katakana.
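ISO-2022-JP keeps everything in 7 bits by switching character sets with escape sequences: ESC $ B selects JIS X 0208, ESC ( B returns to ASCII, and the text must end in ASCII mode. A minimal Python sketch; it finds each JIS code by clearing the high bits of the EUC-JP bytes and rejects anything outside JIS X 0208, including half-width katakana. `iso2022jp_encode` is a hypothetical name, and it never emits the older ESC $ @ or JIS-Roman ESC ( J sequences that real ISO-2022-JP text may contain.

```python
ESC_JIS_X0208 = b"\x1b$B"   # switch to JIS X 0208
ESC_ASCII = b"\x1b(B"       # switch back to ASCII


def iso2022jp_encode(text: str) -> bytes:
    out = bytearray()
    in_jis = False
    for ch in text:
        if ord(ch) < 0x80:
            if in_jis:
                out += ESC_ASCII
                in_jis = False
            out += ch.encode("ascii")
        else:
            euc = ch.encode("euc_jp")
            if len(euc) != 2 or euc[0] == 0x8E:   # 0x8E = half-width katakana
                raise ValueError(f"{ch!r} is not in JIS X 0208")
            if not in_jis:
                out += ESC_JIS_X0208
                in_jis = True
            out += bytes(b & 0x7F for b in euc)   # strip the EUC high bit
    if in_jis:
        out += ESC_ASCII                          # text must end in ASCII mode
    return bytes(out)


print(iso2022jp_encode("あ"))  # b'\x1b$B$"\x1b(B'
```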
- Unicode (UCS-2)
- This codeset corresponds to Plane 0 (the Basic Multilingual Plane)
of ISO-10646. It is defined by Unicode 1.x and can accommodate about
65,000 characters.
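Because UCS-2 is a fixed 16-bit encoding, a string fits only if every character lies in the Basic Multilingual Plane. A small Python check, assuming big-endian byte order with no byte-order mark; `to_ucs2` is a hypothetical helper.

```python
def to_ucs2(text: str) -> bytes:
    """Encode text as big-endian UCS-2: exactly 2 bytes per character."""
    for ch in text:
        cp = ord(ch)
        if cp > 0xFFFF:
            raise ValueError(f"U+{cp:04X} is outside the BMP; UCS-2 cannot hold it")
        if 0xD800 <= cp <= 0xDFFF:
            raise ValueError(f"U+{cp:04X} is a surrogate code point")
    # Within the BMP, UTF-16BE and UCS-2 produce identical bytes.
    return text.encode("utf-16-be")


print(to_ucs2("日本").hex())  # 65e5672c
```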
- Unicode (UTF-16)
- This codeset is a newer form of Unicode defined by Unicode 2.x.
- It partly abandons the fixed 16-bit width: characters outside the
Basic Multilingual Plane are written as a surrogate pair of two 16-bit
units, so it can accommodate about 1 million characters. Unicode (UCS-2)
is a subset of UTF-16.
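The surrogate-pair arithmetic that lets UTF-16 go beyond 16 bits is short: subtract 0x10000, then split the remaining 20 bits into two 10-bit halves. A Python sketch using U+20BB7 (𠮷) as the example; the function names are hypothetical.

```python
def to_utf16_units(cp: int) -> list[int]:
    """Return the UTF-16 code units for one code point."""
    if not 0 <= cp <= 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"U+{cp:04X} cannot be encoded in UTF-16")
    if cp <= 0xFFFF:
        return [cp]                          # BMP: identical to UCS-2
    v = cp - 0x10000                         # 20 bits remain
    return [0xD800 + (v >> 10), 0xDC00 + (v & 0x3FF)]


def from_surrogates(high: int, low: int) -> int:
    """Combine a high and a low surrogate back into a code point."""
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


print([f"{u:04X}" for u in to_utf16_units(0x20BB7)])  # ['D842', 'DFB7']
```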
- UTF-7
- This is a 7-bit serialized format, which can be safely used with
older e-mail routers.
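UTF-7 carries non-ASCII text by writing its UTF-16 form in base64 between "+" and "-". The Python sketch below is deliberately simplified: it passes every ASCII character through directly (RFC 2152 recommends base64-encoding some of them, such as "~" and "\"), writes "+" as "+-", and always closes a base64 run with "-". `utf7_encode` is a hypothetical name; Python's own `utf-7` codec can decode its output.

```python
import base64


def utf7_encode(text: str) -> bytes:
    """Simplified UTF-7: ASCII passes through, '+' becomes '+-', and each
    run of non-ASCII characters becomes '+' + base64(UTF-16BE) + '-'."""
    out = bytearray()
    run = []

    def flush() -> None:
        if run:
            b64 = base64.b64encode("".join(run).encode("utf-16-be"))
            out.extend(b"+" + b64.rstrip(b"=") + b"-")  # UTF-7 drops '=' padding
            run.clear()

    for ch in text:
        if ord(ch) < 0x80:
            flush()
            out.extend(b"+-" if ch == "+" else ch.encode("ascii"))
        else:
            run.append(ch)
    flush()
    return bytes(out)


print(utf7_encode("Hi あ"))  # b'Hi +MEI-'
```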
- UTF-8
- A multibyte format derived from UCS-4 as defined in ISO-10646. It
can accommodate about 2 billion characters, and a character takes
from 1 to 6 bytes. Characters 0 to 127 have the same code points as
US-ASCII. The roughly 1 million characters of UTF-16 can be converted
to and from UTF-8 without loss. (RFC 3629 later restricted UTF-8 to
this UTF-16 range, so a character now takes at most 4 bytes.)
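The 1-to-6-byte layout follows from the bit patterns: the lead byte announces the length (110xxxxx, 1110xxxx, and so on up to 1111110x) and every following byte is 10xxxxxx. A Python sketch of the original ISO-10646 / RFC 2279 scheme for a single UCS-4 value; `utf8_encode_cp` is a hypothetical name.

```python
def utf8_encode_cp(cp: int) -> bytes:
    """Encode one UCS-4 value with the original 1-to-6-byte UTF-8 scheme."""
    if not 0 <= cp <= 0x7FFFFFFF:
        raise ValueError("outside the 31-bit UCS-4 range")
    if cp < 0x80:
        return bytes([cp])                  # identical to US-ASCII
    # (exclusive upper limit, total bytes, lead-byte marker)
    for limit, n, lead in ((0x800, 2, 0xC0), (0x10000, 3, 0xE0),
                           (0x200000, 4, 0xF0), (0x4000000, 5, 0xF8),
                           (0x80000000, 6, 0xFC)):
        if cp < limit:
            break
    tail = []
    for _ in range(n - 1):
        tail.append(0x80 | (cp & 0x3F))     # continuation byte 10xxxxxx
        cp >>= 6
    return bytes([lead | cp] + tail[::-1])  # remaining high bits go in the lead byte


print([len(utf8_encode_cp(cp)) for cp in (0x41, 0xE9, 0x65E5, 0x20BB7, 0x7FFFFFFF)])
# [1, 2, 3, 4, 6]
```

For code points up to U+10FFFF (excluding surrogates) the result matches Python's `str.encode("utf-8")`; the 5- and 6-byte forms are no longer valid after RFC 3629.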
- UCS-4
- This is a fixed-width 32-bit format defined by ISO-10646.
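With a fixed 32-bit width, the byte length is always four times the character count, even for characters that need a surrogate pair in UTF-16. A quick Python illustration using the built-in `utf-32-be` codec (big-endian, no byte-order mark); `bytes.hex` with a separator needs Python 3.8 or later.

```python
text = "A日\U00020BB7"
data = text.encode("utf-32-be")   # UCS-4: 4 bytes per character, big-endian, no BOM
print(len(text), len(data))       # 3 12
print(data.hex(" ", 4))           # 00000041 000065e5 00020bb7
```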
- Java Source
- This format is used in Java source files.
- This is basically ASCII, but each character above 127 is written
as a \uXXXX escape of 4 hexadecimal digits.
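A Python sketch of the escaping step, similar in spirit to what the JDK's `native2ascii` tool did. Java \uXXXX escapes name UTF-16 code units, so a character outside the BMP becomes two escapes (a surrogate pair). `to_java_source` is a hypothetical name; it emits lowercase hex (Java accepts either case) and does not protect backslash-u sequences that are already in the text.

```python
def to_java_source(text: str) -> str:
    """Escape every non-ASCII character as \\uXXXX (UTF-16 code units)."""
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 0x80:
            out.append(ch)                    # plain ASCII stays as-is
        elif cp <= 0xFFFF:
            out.append(f"\\u{cp:04x}")
        else:                                 # supplementary: escape both surrogates
            v = cp - 0x10000
            out.append(f"\\u{0xD800 + (v >> 10):04x}\\u{0xDC00 + (v & 0x3FF):04x}")
    return "".join(out)


print(to_java_source('String s = "日本";'))  # String s = "\u65e5\u672c";
```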
- Unicode 3.0 with Language Tags
- This is a newer form of Unicode with language tags, defined by
Unicode 3.0.
- You can find details of the format from the Unicode Consortium.
- When you publish your data, you must replace the Unicode language
tags with XML language tags (xml:lang) or similar markup.
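In Unicode 3.0 language tags, U+E0001 LANGUAGE TAG is followed by a copy of the ASCII tag text shifted into U+E0020-U+E007E, and U+E007F CANCEL TAG ends the tagging. The Python sketch below turns tagged runs into xml:lang markup, as the note above asks before publishing. The helper names and the choice of a `<span>` element are assumptions; adapt the markup to your own XML vocabulary.

```python
import html
from typing import Optional

LANGUAGE_TAG = 0xE0001   # U+E0001 LANGUAGE TAG
CANCEL_TAG = 0xE007F     # U+E007F CANCEL TAG


def make_tag(lang: str) -> str:
    """Build a language tag: U+E0001, then each ASCII character shifted by 0xE0000."""
    return chr(LANGUAGE_TAG) + "".join(chr(0xE0000 + ord(c)) for c in lang)


def split_tagged(text: str) -> list[tuple[Optional[str], str]]:
    """Split text into (language, run) pairs, dropping the tag characters."""
    segments = []
    lang: Optional[str] = None
    buf = []
    i = 0
    while i < len(text):
        cp = ord(text[i])
        if cp in (LANGUAGE_TAG, CANCEL_TAG):
            if buf:
                segments.append((lang, "".join(buf)))
                buf = []
            i += 1
            name = []
            while i < len(text) and 0xE0020 <= ord(text[i]) <= 0xE007E:
                name.append(chr(ord(text[i]) - 0xE0000))   # back to ASCII
                i += 1
            lang = "".join(name) if cp == LANGUAGE_TAG else None
        else:
            buf.append(text[i])
            i += 1
    if buf:
        segments.append((lang, "".join(buf)))
    return segments


def tags_to_xml(text: str) -> str:
    """Replace Unicode language tags with <span xml:lang="..."> markup."""
    parts = []
    for lang, run in split_tagged(text):
        body = html.escape(run, quote=False)
        if lang:
            parts.append(f'<span xml:lang="{html.escape(lang)}">{body}</span>')
        else:
            parts.append(body)
    return "".join(parts)


tagged = make_tag("ja") + "日本" + chr(CANCEL_TAG) + " and " + make_tag("en") + "English"
print(tags_to_xml(tagged))
# <span xml:lang="ja">日本</span> and <span xml:lang="en">English</span>
```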
- Cho-Kanji TRON code
- This is a new TRON code adopted by the BTRON OS "Cho-Kanji".
- You can find details of the format from Personal Media Inc.
- ISO-2022-ESC B (TM spec)
- This format conforms fully to ISO-2022, and all characters in it are
represented in 7 bits.
- Shift-Mojikyo(TM spec)
- The character code is Unicode, except that Mojikyo characters are
assigned to the private use area.
- The Mojikyo TrueType font must be installed.
- More information is available at Mojikyo Net.
- Unicode + (&M;)Mojikyo Tag
- This format adds tags that specify a Mojikyo number to Unicode text.
- The tag format is as follows:
- &Mnnnnnn;
- It starts with '&M' and ends with ';'.
- Exactly 6 decimal digits appear between '&M' and ';'.
- Numbers with fewer than 6 digits (below 100000) are padded with
leading '0's, e.g. &M001234;.
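A Python sketch for writing and finding &Mnnnnnn; tags. The helper names are hypothetical, and the numbers in the example are placeholders, not real Mojikyo character numbers.

```python
import re

M_TAG = re.compile(r"&M([0-9]{6});")   # [0-9] rather than \d: ASCII digits only


def make_m_tag(number: int) -> str:
    """Format a Mojikyo number as &Mnnnnnn; with zero padding to 6 digits."""
    if not 0 <= number <= 999_999:
        raise ValueError("the number must fit in 6 decimal digits")
    return f"&M{number:06d};"


def find_m_tags(text: str) -> list[int]:
    """Return the Mojikyo numbers of all &M tags in text, in order."""
    return [int(m.group(1)) for m in M_TAG.finditer(text)]


print(make_m_tag(1234))                       # &M001234;
print(find_m_tags("A&M001234;B&M012345;C"))  # [1234, 12345]
```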
- DBCS + (&M;)Mojikyo Tag
- This format adds tags that specify a Mojikyo number to DBCS text.
- For the tag format, see Unicode + (&M;)Mojikyo Tag.
- Here DBCS means Shift-JIS, Big5 and so on.
- Unicode + (@;)Mojikyo Tag
- This format adds tags that specify a Mojikyo number to Unicode text.
- The tag format is as follows:
- @nnnnnn;
- It starts with '@' and ends with ';'.
- Exactly 6 decimal digits appear between '@' and ';'.
- Numbers with fewer than 6 digits (below 100000) are padded with
leading '0's, e.g. @001234;.
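Since the &M and @ styles carry the same six-digit number, converting between them is a single regular-expression substitution. A Python sketch with hypothetical helper names; note that ordinary text that happens to contain "@" followed by six digits and ";" would also be converted.

```python
import re

M_TAG = re.compile(r"&M([0-9]{6});")
AT_TAG = re.compile(r"@([0-9]{6});")


def m_to_at(text: str) -> str:
    """Rewrite &Mnnnnnn; tags as @nnnnnn; tags."""
    return M_TAG.sub(r"@\1;", text)


def at_to_m(text: str) -> str:
    """Rewrite @nnnnnn; tags as &Mnnnnnn; tags."""
    return AT_TAG.sub(r"&M\1;", text)


print(m_to_at("A&M001234;B"))  # A@001234;B
```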
- DBCS + (@;)Mojikyo Tag
- This format adds tags that specify a Mojikyo number to DBCS text.
- For the tag format, see Unicode + (@;)Mojikyo Tag.
- Here DBCS means Shift-JIS, Big5 and so on.