DBCS
From Wikipedia, the free encyclopedia
DBCS stands for Double Byte Character Set. This term has two basic meanings:
- In CJK computing, the term "DBCS" traditionally means a character set in which every graphic character not representable by an accompanying SBCS is encoded in two bytes; Han characters would generally comprise most of these two-byte characters.
- The term "DBCS" can also mean a character set in which all characters (including all control characters) are encoded in two bytes.
DBCS also stands for Delivery Bar Code Sorter. It refers to a set of sorting machines used in the mail sorting industry, primarily by the USPS. The latest generation of these machines in the field is the DBCS6. The DBCS7 is scheduled for deployment in the field by July 2008.
DBCS’s in CJK computing
In CJK computing, the term DBCS traditionally refers to a character set where each graphic character is encoded in two bytes. The DBCS always has lead bytes with the most significant bit set (i.e., being 1), and is always paired up with a single-byte character-set (SBCS). Furthermore, for the practical reason of maintaining compatibility with unmodified, off-the-shelf software, the SBCS is associated with halfwidth characters and the DBCS with fullwidth characters.
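The lead-byte rule described above can be sketched in a few lines of Python. This is only an illustration under a simplifying assumption: every byte with the most significant bit set is treated as a lead byte. Real code pages use narrower lead-byte ranges (Shift-JIS, for example, also has single-byte halfwidth katakana with the high bit set), and the helper name split_dbcs is hypothetical.

```python
def split_dbcs(data: bytes) -> list:
    """Split a mixed SBCS/DBCS byte string into one chunk per character.

    Simplifying assumption: any byte with the high bit set is a lead byte
    that starts a two-byte character; all other bytes are single-byte.
    """
    chunks = []
    i = 0
    while i < len(data):
        # High bit set (0x80) marks a lead byte under this assumption.
        width = 2 if data[i] & 0x80 else 1
        if i + width > len(data):
            raise ValueError("truncated double-byte character")
        chunks.append(data[i:i + width])
        i += width
    return chunks


# One halfwidth Latin letter followed by two fullwidth characters.
sample = "Aあ中".encode("shift_jis")
chunks = split_dbcs(sample)
print([len(c) for c in chunks])  # [1, 2, 2]
```

Note that the second byte of a two-byte character is consumed without being inspected, which is why a scanner of this kind must always start from a known character boundary.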
Sometimes, the use of the term "DBCS" can imply an underlying structure that does not comply with ISO 2022; i.e., "DBCS" can sometimes mean a double-byte encoding that is specifically not EUC.
Note that this meaning of DBCS is different from what some consider correct usage today: Some might insist that these character sets be properly called either MBCS’s or variable-width encodings. Nevertheless, the term “MBCS” is not a traditional term and one should not expect the term “MBCS” to be understood; “DBCS” is the correct traditional term to describe these character sets.
Controversy
Some people use DBCS to mean the UTF-16 Unicode encoding, while other people use the term DBCS to mean older (pre-Unicode) code pages that use more than one byte per character. Shift-JIS, GB2312 and Big5 are a few code pages that can contain more than one byte per character, but even using the term DBCS for these code pages is incorrect terminology because these code pages are really MBCS (MultiByte Character Sets). Some IBM mainframes do have true DBCS code pages, which contain only the double byte portion of a multibyte code page.
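The difference between the two usages can be seen by counting bytes per character. The following sketch uses Python's built-in codecs; the helper byte_widths and the sample string are chosen purely for illustration.

```python
def byte_widths(text, encoding):
    """Return the number of bytes each character occupies in `encoding`."""
    return [len(ch.encode(encoding)) for ch in text]


text = "A中"  # one ASCII letter, one Han character

# Legacy East Asian code pages: one byte for ASCII, two for the Han character.
for enc in ("shift_jis", "gb2312", "big5"):
    print(enc, byte_widths(text, enc))  # [1, 2] for each

# UTF-16 (little-endian, no byte order mark): two bytes for both characters.
print("utf-16-le", byte_widths(text, "utf-16-le"))  # [2, 2]
```

In the legacy code pages the width varies from character to character, which is the sense in which they are multibyte rather than strictly double-byte.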
If a business uses the term "DBCS enablement" for software internationalization, it is using ambiguous terminology. The business either means that it wants to write software for East Asian markets using older, code-page-based technology, or that it plans to use Unicode. Usually "Unicode enablement" means internationalizing software by using Unicode, and "DBCS enablement" means internationalizing software by using the mutually incompatible code pages of the various East Asian countries. Since Unicode supports all the major languages in East Asia, unlike many other code pages, it is generally easier to enable and maintain software that uses Unicode. DBCS (non-Unicode) enablement is usually only desired when much older operating systems or applications do not support Unicode.
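The incompatibility between code pages can be illustrated with Python's built-in codecs. This is a minimal sketch, not a recommendation of any particular API; the helper encodable and the sample strings are hypothetical.

```python
def encodable(text, encoding):
    """Return True if every character of `text` exists in `encoding`."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


legacy_encodings = ("shift_jis", "gb2312", "big5")

# The same character is stored as different bytes in each legacy code page.
legacy = {enc: "中".encode(enc) for enc in legacy_encodings}
for enc, raw in legacy.items():
    print(enc, raw.hex())

# Text mixing Chinese, Japanese and Korean fits none of these code pages,
# but it can be represented in a Unicode encoding such as UTF-8.
mixed = "中あ한"
for enc in legacy_encodings + ("utf-8",):
    print(enc, encodable(mixed, enc))
```

Because each legacy code page assigns its own byte values, data must be tagged with the code page it was written in, whereas a single Unicode encoding covers all three scripts.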