BKUNICOD.RVW 980921

Addison-Wesley Publishing Co.
P.O. Box 520
26 Prince Andrew Place
Don Mills, Ontario
M3C 2T8
416-447-5101
fax: 416-443-0948
or
1 Jacob Way
Reading, MA 01867-9984
800-527-5210
617-944-3700
5851 Guion Road
Indianapolis, IN 46254
800-447-2226
or
Unicode, Inc.
1965 Charleston Road
Mountain View, CA 94043
(415) 961-4189
Fax: (415) 966-1637
"The Unicode Standard", U$32.95/C$42.95
steve@unicode.org unicode-inc@unicode.org rick_mcgowan@next.com

In the dim and distant past, the late, and generally unlamented, SUZY information system was born in Vancouver. Rather an oddball as far as online services went, one "feature" was that the programmer had tried to allow for the use of all of the IBM graphics characters. This led to an entirely new field of "smiley" or "emoticon" (emotional icon) endeavours. Instead of the usual sideways happy face of the colon, hyphen and right parenthesis, ":-)", we were able to use the "Ctrl-A" alternative of the IBM PC character set. Having a decimal value of one, this character is an upright happy face. This allowed other expansions, such as Ctrl-A and the right square bracket, which looks like a face and a telephone handset, and was used (usually in the "chat" modes) for "I am on the phone."

"How nice," I hear you mutter between clenched teeth. "Can we now get on with the review?"

Patience, stout nerds. This *is* the review.

As SUZY users, particularly those who had been introduced to computer communications on the system, moved on to other services or local bulletin boards, they were usually quite shocked to find that their favourite symbols no longer worked. The little heart (Ctrl-C) would kill a message on a VAX. Fidonet users might find that the cute tagline they had formed from graphics characters completely disappeared when they sent the message through an Internet gateway.

ASCII (the American Standard Code for Information Interchange) is widely, and mistakenly, believed to define two hundred and fifty-six characters. It doesn't. Furthermore, of the hundred and twenty-eight characters it does define, many are "control" rather than printable characters. (The "card suit" symbols on the IBM PC graphics set are defined as "end of text", "end of transmission", "enquiry" and "acknowledgement" under the real ASCII standard.) In addition, many believe ASCII to be a universal standard; also not true. An octet with the decimal value thirty-five, for example, is the number sign (sometimes called an "octothorpe") in the United States, but a pound sign (the British currency) in the British variant of the code. As with most fields of computer endeavour, the nice thing about standards is that there are so many to choose from. Many vary only slightly -- but they vary.

The point is that there are a number of symbols which we commonly know, but which cannot be consistently displayed on terminals or printers. Certain terminals will have certain "international" character sets, but not all are identical. Accents and other phonetic modifiers may be difficult to handle: entire character sets are given over strictly to accented characters. (In Canada we are acutely aware of the problems, with "French" keyboards used at many sites. On one, I was having difficulty finding some necessary punctuation marks for network addressing, and asked a Francophone programmer for help. "Who knows," he growled, "I never use the ____ things!")

Unicode seeks to address this problem. It covers not only the variations on the Latin alphabet but also Greek, Cyrillic, Hebrew and other alphabets. It also includes punctuation, diacriticals, mathematical and scientific symbols and miscellaneous graphics. Asian ideographs are also assigned codes. A repertoire this large will not, of course, fit into a seven-bit code, and Unicode is based on a sixteen-bit address space.

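For those who would rather see the problem than take my word for it, here is a minimal Python sketch of the points above. It is an illustration only: the function names and the little glyph table are made up for this example, and the table is typed in by hand on the assumption that it matches the IBM PC graphics set, since Python's own "cp437" codec treats those low values as plain control characters. The three eight-bit codecs were chosen arbitrarily from those Python ships with.

```python
import unicodedata

# What ASCII says these values mean (control functions, not pictures).
ASCII_CONTROL_NAMES = {
    1: "start of heading",
    3: "end of text",
    4: "end of transmission",
    5: "enquiry",
    6: "acknowledge",
}

# Hand-written assumption: the glyphs the IBM PC draws for the same values.
IBM_PC_GLYPHS = {
    1: "\u263A",  # upright happy face
    3: "\u2665",  # heart
    4: "\u2666",  # diamond
    5: "\u2663",  # club
    6: "\u2660",  # spade
}


def describe_low_byte(value):
    """Show one value as ASCII defines it and as the IBM PC displays it."""
    glyph = IBM_PC_GLYPHS[value]
    return (f"{value}: ASCII '{ASCII_CONTROL_NAMES[value]}', "
            f"IBM PC {unicodedata.name(glyph)}")


def decode_everywhere(octet, codecs=("latin-1", "cp437", "cp1251")):
    """Decode a single octet under several eight-bit character sets."""
    return {name: bytes([octet]).decode(name) for name in codecs}


def utf16_units(text):
    """Return the sixteen-bit values used for text in big-endian UTF-16."""
    data = text.encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big")
            for i in range(0, len(data), 2)]


if __name__ == "__main__":
    for value in sorted(IBM_PC_GLYPHS):
        print(describe_low_byte(value))
    # One octet, three different characters depending on the character set.
    print(decode_everywhere(0xE9))
    # Latin, Cyrillic and Hebrew letters, one sixteen-bit value apiece.
    print([hex(unit) for unit in utf16_units("A\u0416\u05D0")])
```

The octet 0xE9 comes out as an accented "e", a Greek theta or a Cyrillic letter depending on which table is consulted, which is the tagline-mangling problem in miniature. With a single sixteen-bit code, each of the three letters in the last line gets one value of its own and no table needs to be guessed.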
The book gives some background and plans (chapter one), and general principles and rules for conformance (chapter two). To comment on these in any meaningful way would be to rewrite these chapters. This is technical material, though not the same technology that computer types are used to. Some background study in linguistics would be a good idea, although it is not strictly necessary in order to understand and use the Unicode standard: there is a wealth of symbols, punctuation marks and typesetting codes to which Unicode gives standardized access. On the other hand, any application which used the standard in a significant way would likely require a linguistics background in any case.

The bulk of the two volumes is, of course, taken up with the actual code charts. (Volume two, in fact, is almost completely concerned with Han ideographs. In spite of the recent widespread use of the English alphabet, these are still the standard written form for Chinese, Japanese and Korean: CJK in Unicode terminology.) The charts are augmented with verbal definitions of the symbols, and with cross references to similar forms.

The Unicode standard is recent. In comparative terms its current usage is negligible. However, it is the de facto standard for broadly based international character sets. With the recent rejection of the proposed ISO thirty-two bit standard, and the recasting of that standard to follow Unicode's lead, Unicode is a significant factor in the development of any international applications.

copyright Robert M. Slade, 1993 BKUNICOD.RVW 980921

(Postscriptum - Unicode Inc. maintains an FTP site at unicode.org (192.195.185.2). Some of the mapping tables and the Han cross reference lists are available there. Some tables are also available on IBM PC or Mac compatible floppy disks.)

======================
DECUS Canada Communications, Desktop, Education and Security group newsletters
Editor and/or reviewer ROBERTS@decus.ca, RSlade@sfu.ca, Rob Slade at 1:153/733
Author "Robert Slade's Guide to Computer Viruses" (Oct. '94) Springer-Verlag