From paj@uk.co.gec-mrc Wed Jul 27 16:20:26 1994
From: paj@uk.co.gec-mrc (Paul Johnson)
Newsgroups: comp.std.misc,comp.misc
Subject: Software, System and Applications Standards
Date: 15 Jul 94 11:48:07 GMT
Organization: GEC-Marconi Research Centre, Great Baddow, UK
X-Newsreader: TIN [version 1.2 PL1]

Software, System and Applications Standards
===========================================

by Paul Johnson.

Version: 1.3
Date: 94/06/30

The copyright in this document is the property of GEC-Marconi
Limited.  This document may be copied freely, provided that no charge
is made for the information.  All copies must include this copyright
statement.

0. Abstract
===========

This collection of notes on standards provides information about open
systems computing standards that may be encountered.  The intention
is to fill the gap caused by the general lack of accessible summary
information on technological standards, which makes it difficult to
find out what standards actually exist in any given area, or how
widely they are actually used.

This report will be of particular use to those who are confronted
with the need to comply with open systems standards, but who do not
have the background knowledge that enables them to identify sources
of information on these standards.  It also forms a useful overview
of the open systems standards that are relevant in particular areas,
and assesses the impact of these standards on GEC-Marconi businesses.

The report is mainly concerned with official standards promulgated by
international organisations such as ISO and CCITT.  However it also
includes a number of de-facto and proprietary standards where those
have had significant market impact.

The UK government is legally required to specify open systems when
inviting bids for information systems.  Other large customers may
well follow suit.  The appropriate standards have been collected
together under the name ``GOSIP''.  This is a ``meta-standard'' which
specifies the options that may be used by government departments in
various areas of information technology.  The idea is to ensure that
applications used by government departments can work together.  The
ISO and CCITT standards mandated by GOSIP are included in this
document.

The Internet is a world-wide collection of academic and commercial
networks which grew out of the DARPA-Net.  The Internet Activities
Board (IAB) is charged with monitoring and co-ordinating the
evolution of Internet protocols.  These protocols may be proposed by
anyone through the issuing of a ``Request For Comment'' (RFC), which
documents a proposed standard.  Some of these protocols have become
world-wide standards and are described in this document.

0.1. Changes Since Last Version
-------------------------------

Ethernet information updated.  X Windows updated.  Minor correction
to Unix history.  Correction to Internet protocols.

1. Introduction
===============

1.1. Why it is on the Net
-------------------------

This document was originally developed for use within GEC-Marconi.
As an experiment, we are making it available over the Net.  We hope
to achieve the following goals (not necessarily in this order):

1: Get feedback, improvements and updates from readers, thereby
   increasing our competitiveness.

2: Garner kudos and respect from the rest of the Net.

3: Help other people.

4: Some of the information in this document came from FAQs made
   available through the generosity of people on the Net.  We would
   like to reciprocate.

1.2. Overview
-------------
This collection of notes on standards provides information about open
systems computing standards that may be encountered.  The intention
is to fill the gap caused by the general lack of accessible summary
information on technological standards, which makes it difficult to
find out what standards actually exist in any given area, or how
widely they are actually used.

The report is mainly concerned with official software standards
promulgated by international organisations such as ISO and CCITT.
However it also includes a number of de-facto and proprietary
standards where those have had significant market impact, and
includes some hardware standards which have significant impact on
software design.

One of the issues in a report of this nature is its scope.  There
are hundreds of open standards, and thousands of commercial ones.
Some of the commercial standards are declared ``open'' by the vendor
(e.g. SPARC and NFS) but are considered to be proprietary by other
bodies (e.g. the UK Government).  In some cases the common standard
used by almost everyone is closed and proprietary, despite the
existence of open standards that cover the same ground (e.g. MS-DOS
vs. Unix).  Covering every standard in existence would be an
expensive and futile exercise.  This document tries to cover the
standards that are most frequently referred to or used today.

The areas of technology covered in this document are:

Meta-Standards:
   There are so many competing standards for so many areas of
   technology that we need standards for which standards we use.
   These are ``meta-standards''.

Data Specification Standards:
   These are languages used for specifying data protocols and file
   formats.

Hardware Standards:
   These include standards that may include a software element
   (e.g. the Ethernet access protocol), but where this software
   would usually be supplied on ROM or as a device driver.

Software Standards:
   This covers operating systems and related areas.  Standards in
   this section would usually be used directly by applications
   software.

Application Standards:
   This covers communications protocols related to specific
   application areas.

Data Exchange Formats:
   These standards cover the storage and transfer of specific types
   of data.

Document Formats:
   This covers markup languages used to describe the format of
   documents.

Internet Services:
   This section could have been included under ``Application
   Standards'', but the Internet Activities Board have defined a
   group of open standards which are completely unrelated to the ISO
   and CCITT world, and this separation is reflected here.

1.3. Structure
--------------

Each of the standards in this document is described under the
following headings:

Origin
   The name of the organisation responsible for the standard, or the
   name of the people who originally invented it.

Standard Status
   The current version of the standard, along with a brief
   indication of the extent to which it is actually used and its
   probable future.

Purpose
   A short description of the problem which this standard is meant
   to solve.

Outline
   A technical overview of the standard, covering the major features
   and highlighting any particular advantages or disadvantages.

References
   Pointers to further information.  Official standards are not
   cited here, since they appear in the Origin and Standard Status
   sections.

2. Meta-Standards
=================

2.1. OSI
--------

Open Systems Interconnection.

-- Origin --

ISO/CCITT.

-- Standard Status --

ISO seven-layer reference model for OSI, issued 1978.  Standard ISO
7498.
OSI protocols are not generally used outside areas where their use is
enforced by regulations (e.g. GOSIP).  OSI protocols are not tested
widely before standardisation and are not based on existing practice.
A number of important techniques such as RPC and stateless protocols
were developed after the model was standardised.

-- Purpose --

The overall aim is to define the architecture for networking machines
in order to provide universal homogeneous connectivity in a
heterogeneous environment, regardless of distance, operating system,
etc.  It seeks to do this by defining a generalised network
architecture, as a layered model, such that functions are mapped to
specific layers, in order that standards developed for each layer
will be well defined as to function and as to how they interface with
the next-higher and/or next-lower layer.

-- Outline --

This is an abstract 7-layer network model, with the lower layers
covering the physical (electrical, optical) connections, signalling
conventions, error handling, routing from sender to receiver, etc.,
and the upper layers providing high-level services for network filing
functions, network services directory, network management etc.

The layering is important in that it decomposes an extremely
complicated problem into a set of manageable subproblems (both from
the point of view of thinking about it, and of producing
implementations), and it allows the layer interfaces to be defined on
functional grounds, leaving each layer totally independent of the
others (thus allowing independent modification of the layers).

OSI is therefore a conceptual model which defines the architecture
for networking hardware and software.  A number of standards are
possible at each of the seven layers, and OSI ``profiles'' will
define particular sets of standards for particular functions,
e.g. MAP.

The seven layers are:

+-------+--------------+
| LAYER | FUNCTION     |
+-------+--------------+
|   7   | Application  |
|   6   | Presentation |
|   5   | Session      |
|   4   | Transport    |
|   3   | Network      |
|   2   | Data Link    |
|   1   | Physical     |
+-------+--------------+

Very briefly, the functions of these layers are as follows.

1: The *Physical Layer* provides the physical means of communication
   - the wires, modems, connectors, signalling conventions - between
   two nodes.

2: The *Data Link Layer*, using the physical layer, furnishes and
   manages the communication link between any 2 nodes on the physical
   medium.  The IEEE 802 specification (to which Ethernet belongs)
   divides this layer into 2 sub-layers:

   a: Logical Link Control (LLC), responsible for transfer,
      sequencing, error checking and addressing, and

   b: Medium Access Control (MAC), responsible for managing access
      to the physical medium.

3: The *Network Layer*, using the data link layer, provides routing
   from point to point within a network of arbitrary complexity.

4: The *Transport Layer*, using the network layer, provides a
   network-wide, error-free data transfer service.  It manages
   network access, message segmentation, error correction, and
   control over data exchanges.

5: The *Session Layer* is the first of the three upper
   application-oriented layers.  Using the transport layer, the
   session layer sets up and manages the dialogue between application
   processes to ensure structured and synchronised exchange at
   transport layer level.
6: The *Presentation Layer* addresses the problem of differences in
   data encoding formats between nodes, using the notions of an
   abstract syntax, which defines a neutral (machine-independent)
   notation for the formal description of data types and values, and
   a transfer syntax, which expresses the data in abstract syntax
   form in a neutral bit-level format for transmission.  The syntax
   used is data-dependent, so the presentation layer selects the
   syntax according to the data type, and handles the translation to
   and from both abstract and transfer syntax.

7: The *Application Layer* is the interface to the user's
   application program.  There is therefore no single standard, but
   a set of procedures, one for each application category --- e.g.
   for file transfer and management, for directory services, for
   network management, for message handling, and for remote terminal
   access.

Note that each layer in one node appears to speak directly to its
peer in another node - the lower layers are invisible wherever you
look from, despite the fact that all data actually goes via layer 1.
Some abbreviated protocol stacks are also in use, where two or more
layers are combined for efficiency.
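The layering idea can be sketched in a few lines of C.  The fragment
below is purely illustrative - the layer names, the ``|'' separator
and the function are inventions of this document, not part of any OSI
standard - but it shows the essential mechanism: each layer treats
everything handed down from above as opaque data and simply prepends
its own header.

    #include <stdio.h>
    #include <string.h>

    /* Illustrative only: each layer wraps the unit handed down from
       the layer above in its own header before passing it on. */
    static void wrap(char *pdu, const char *header)
    {
        char tmp[256];
        sprintf(tmp, "%s|%s", header, pdu);
        strcpy(pdu, tmp);
    }

    int main(void)
    {
        char pdu[256] = "HELLO";   /* application data from layer 7 */
        wrap(pdu, "TRANSPORT");    /* layer 4 adds its header */
        wrap(pdu, "NETWORK");      /* layer 3 adds its header */
        wrap(pdu, "DATALINK");     /* layer 2 adds its header */
        puts(pdu);   /* prints DATALINK|NETWORK|TRANSPORT|HELLO */
        return 0;
    }

The receiving stack strips the headers off again in the reverse
order, which is why each layer appears to talk directly to its peer.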
A list of all OSI related standards and their current status is
printed twice a year in the ACM SIGCOMM journal Computer
Communication Review.

2.2. GOSIP
----------

The UK Government Open Systems Interconnection Profile.

-- Origin --

The UK Government.  Other countries have defined their own
equivalents, so bids for foreign contracts should consult the
customer's local standards.  The US standard is also called GOSIP.
The following refers only to the UK GOSIP.

-- Standard Status --

"GOSIP: Government Open Systems Interconnection Profile Version 4",
issued in 1991.  Available in two versions: "Supplier Set" and
"Purchaser Set".  Updates for 1992 and 1993 are also available.
These are also available as "Electronic GOSIP", presumably the same
documents in a machine-readable format.

A small booklet "Essential Guide to GOSIP" is available for free.
This is the place to start.  All this information is available from
CCTA Publications.  Telephone +44 071 217 3331, and ask for the
"Essential Guide to GOSIP" and the "CCTA Publications List".

-- Purpose --

From the "Essential Guide to GOSIP":

* Simplify planning and procurement of OSI-based communications
  systems.
* Define appropriate options within the base OSI standards.
* Facilitate precise specification of the requirements of an
  administrative user community.
* Ensure that applications can interwork effectively between
  independently purchased systems.
* Provide a basis for controlling change.
* Reduce the risks associated with future procurements.
* Specify the OSI implementation requirements of government bodies
  as a guide to product developers.
* Stimulate the development of conformance and interoperability
  testing.

-- Outline --

GOSIP is divided into a number of sections.  These sections are
called ``subprofiles'' and represent building blocks that can be
selected to meet particular system requirements.  GOSIP consists of
four subprofile sets, each of which contains a number of subprofiles,
each of which is made up of one or more standards.  Customers should
specify which particular subprofiles they require the product to
conform to.  The four subprofile sets are:

GOSIP-S: The supporting services.  Deals with communications
   management and ancillary issues.  Covers naming and addressing,
   security and OSI management.

GOSIP-F: The format of information.  Deals with the syntax for data
   or document interchange.  Covers structured processable data and
   document formats (EDI and ODA) and character repertoires.

GOSIP-A: The application services.  Deals with the top three OSI
   layers.  Covers X.400 messaging, X.500 directory services, FTAM
   file services and terminal services.

GOSIP-T: The underlying transport infrastructure.  Deals with the
   lower four layers of the OSI model.  Covers LANs, WANs and
   relaying systems.

The text of the US GOSIP is available by FTP from Imperial College.

2.3. IAB Official Protocol Standards
------------------------------------

IAB stands for the Internet Activities Board.

-- Origin --

Internet Activities Board.  The Internet has grown from the initial
DARPA-Net to become a world-wide network used by 100,000 companies
and universities (a very rough estimate).  The Internet appears to
be growing exponentially at about 10% per month.

-- Standard Status --

Updated regularly as new standards are adopted.  At the time of
writing the most recent version in the Imperial College repository
was RFC 1250, issued on 1st August 1991.  This is probably out of
date by now.  See your local FTP archive for the latest information.

-- Purpose --

A list of the standards adopted by the IAB for use on the Internet.

-- Outline --

Most standards organisations work by forming a committee to create a
standard from scratch.  Committee members are expected to be experts
in their field and to release drafts for comment.

The IAB takes a very different view.  In their process, anyone can
invent a protocol.  They encourage these inventors to document their
protocols as a ``Request For Comment'', or RFC.  This is done even
when there is no intention that the protocol be adopted as a
standard.  Such protocols are termed ``experimental''.

Protocols that are intended to become standards are first designated
``proposed''.  They are then implemented and tested by several
groups, and a revised RFC may be issued as a result.  Once the
protocol has become stable it may become a ``draft standard'' and
will normally become an IAB standard about 6 months later.

Anyone can submit a document for publication as an RFC.  It will
then be assigned a number and ``published'' (made available on
public FTP repositories).  Once this has been done the RFC can never
be revised.  If changes become necessary then a new RFC number is
issued and the old one is tagged as obsolete.  Not every RFC is
intended as a standard.  Other material covered includes
documentation conventions, comments on past RFCs and technical notes
on various aspects of the Internet.

As well as the progression from experimental to standard protocols,
the IAB also designates protocols as ``Required'', ``Recommended'',
``Elective'' and ``Not Recommended''.

RFC 1250 "IAB Official Protocol Standards" defines this
standardisation process and also lists the current state of all RFCs
that have not been superseded.  RFC 1250 was released in August 1991,
and will probably have been superseded by now.  Consult a current
index for the latest version.

The US Department of Defence has adopted a number of IAB standards
for its own use.
These are:

+-------------------------------+--------+------+--------------+
| Standard                      | Abbrev | RFC  | DoD Number   |
+-------------------------------+--------+------+--------------+
| Internet Protocol             | IP     | 791  | MIL-STD-1777 |
| Transmission Control Protocol | TCP    | 793  | MIL-STD-1778 |
| File Transfer Protocol        | FTP    | 765  | MIL-STD-1780 |
| Simple Mail Transfer Protocol | SMTP   | 821  | MIL-STD-1781 |
| Telnet Protocol and Options   | TELNET | 854  | MIL-STD-1782 |
+-------------------------------+--------+------+--------------+

Note that these MIL-STDs are now somewhat out of date.  The Gateway
Requirements (RFC-1009) and Host Requirements (RFC-1122, RFC-1123)
take precedence over both earlier RFCs and the MIL-STDs.

All current RFCs are available on-line from the Network Information
Centre (NIC) repository in the USA.  For details, send email to the
NIC mail server with the message body ``help: ways_to_get_rfcs''.
The NIC repository is mirrored at Imperial College, but this may be
slightly out of date.

-- Update --

Since this report was completed, the IAB have signed a memorandum of
agreement with ISO, under which the Internet will migrate towards
X.400 addresses.

3. Data Specification Standards
===============================

3.1. ASN.1
----------

-- Origin --

ISO.

-- Standard Status --

Full ISO standard.  ISO 8824 defines the symbolic data description
technique.  ISO 8825 defines rules for encoding such a data
description (the Basic Encoding Rules, or BER).

-- Purpose --

To define a symbolic, human-readable data description technique,
which defines the Type, Meaning and Structure of data, and a set of
rules for encoding such a data description, usually for transmission
across an OSI network.  Note that the symbolic data description
defines Syntax and Grammar, but not Usage, so ASN.1 can be used in
many applications.

-- Outline --

The basis of ASN.1 is a set of Descriptive Statements which define
its use in any particular application.  The Type and Meaning of data
is identified by tagging each data item with an Identifier, while
the Structure is conveyed by the manner and order in which it is
sent.  The Descriptive Statements (which are themselves symbolic)
define how the Identifiers and Structure are to be interpreted.

Each application that specifies use of ASN.1 has to specify its own
set of Descriptive Statements.  Standards that use it include the
Protocol Data Unit encoding in various OSI protocols, such as FTAM
(File Transfer, Access and Management), MMS (Manufacturing Message
Specification), and the CCITT X.400 MHS (Message Handling System,
CCITT recommendation X.409).

-- References --

ComCentre Communique, April 1989.  Further information in "Reading
Abstract Syntax Notation One" by Ralph Purdue of ComCentre, available
from ComCentre.
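To make the encoding rules concrete, here is a minimal C sketch of
how BER represents a small non-negative INTEGER as a tag-length-value
triple.  This is our own illustration rather than anything from ISO
8825; a real encoder must also handle negative values, long-form
lengths and constructed types.

    /* Encode a small non-negative integer using the BER INTEGER
       tag (0x02).  Returns the number of bytes written.  Sketch
       only: long-form lengths and negative values are omitted. */
    unsigned int ber_encode_uint(unsigned long v, unsigned char *out)
    {
        unsigned char tmp[sizeof(unsigned long) + 1];
        unsigned int n = 0, i;

        do {                        /* content octets, low byte first */
            tmp[n++] = (unsigned char)(v & 0xFF);
            v >>= 8;
        } while (v != 0);
        if (tmp[n - 1] & 0x80)      /* top bit set would look negative, */
            tmp[n++] = 0x00;        /* so prepend a zero octet */

        out[0] = 0x02;              /* tag: universal INTEGER */
        out[1] = (unsigned char)n;  /* length (short form) */
        for (i = 0; i < n; i++)     /* content, reversed to big-endian */
            out[2 + i] = tmp[n - 1 - i];
        return 2 + n;
    }

For example, the value 300 encodes as the octets 02 02 01 2C:
INTEGER, two content octets, value 0x012C.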
3.2. XDR (eXternal Data Representation)
---------------------------------------

-- Origin --

SUN Microsystems Inc.

-- Standard Status --

De-facto commercial standard, specification in the public domain.
Available as a software library to be built into application
programs (e.g. from GEC Software).

-- Purpose --

While ASCII files are portable, most binary files are not.  XDR
defines a self-descriptive format for binary files, so that binary
files encoded with XDR can be machine-independent.  Data portability
is of increasing concern, and pressure to provide portability is
likely to increase.  The choices for binary files are to adopt XDR,
or to add knowledge of the source machine into binary files and
translate appropriately on receipt (within the application).  The
latter is more efficient, but is of course limited to the supported
applications and machines.

-- Outline --

XDR specifies a self-descriptive format (keyword+value) for binary
files, so that such files can be machine-architecture independent.
To make use of the XDR standard, programs that write and read binary
files must be modified so that they write and read XDR formatted
data.  This can be done either by implementing the specification
directly in the application program, or by incorporating a pre-built
library of XDR-conformant utilities in the application.

-- Limitations --

Since XDR files contain self-descriptive data they will be larger
than the corresponding non-portable binary files.  This size penalty
will be content dependent.  XDR files will take longer to read and
write than their non-portable equivalents, as a result of the extra
processing and the larger file sizes.  Note also that transmission
of some binary files is not trivial (in particular stream files from
C).

-- References --

"The SUN Network File System: Design, Implementation and Experience"
by R. Sandberg.  SUN document.
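As an illustration, here is a small sketch using the XDR routines
from the Sun RPC library (header and library names vary slightly
between systems, so treat the details as assumptions).  The same
filter routine is used for both encoding and decoding, so writer and
reader cannot drift apart:

    #include <stdio.h>
    #include <rpc/rpc.h>    /* Sun RPC library, including XDR */

    /* One filter routine serialises or deserialises the record,
       depending on how the XDR stream was created. */
    static bool_t xdr_sample(XDR *xdrs, int *count, double *value)
    {
        return xdr_int(xdrs, count) && xdr_double(xdrs, value);
    }

    int main(void)
    {
        char buf[64];        /* holds the machine-independent bytes */
        XDR stream;
        int count = 42, count2;
        double value = 2.5, value2;

        xdrmem_create(&stream, buf, sizeof buf, XDR_ENCODE);
        if (!xdr_sample(&stream, &count, &value))
            return 1;

        /* Any machine with an XDR library can now decode buf. */
        xdrmem_create(&stream, buf, sizeof buf, XDR_DECODE);
        if (!xdr_sample(&stream, &count2, &value2))
            return 1;
        printf("%d %g\n", count2, value2);
        return 0;
    }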
4. Hardware Standards
=====================

4.1. SPARC
----------

-- Origin --

SUN Microsystems Inc.

-- Standard Status --

De-facto standard, specification in the public domain.
Manufacturers include SUN and Hitachi.

-- Purpose --

To provide an industry standard RISC processor architecture.  SPARC
stands for Scalable Processor ARChitecture.  The idea is that
designers can make a range of chips which can all run the same
software.  The number of on-chip registers may change, but the
instruction set and register addresses used by programs will always
stay the same.

-- Outline --

RISC architectures attempt to maximise the effective speed of a
computer.  In a conventional CPU, much of the instruction set is
used only rarely.  These instructions provide little performance
gain because of their rare use, but they cause a performance loss
because the extra decoding hardware needed for them slows down the
machine for all instructions.  RISC designers analyse the tradeoff
between the gain when an instruction is used and the loss when it is
not used, and hence can make a rational decision about each
instruction.

The SPARC architecture introduces a rolling set of register windows.
Each function has 8 ``local'' registers, 8 ``in'' registers and 8
``out'' registers.  When a function call is made, the new function
gets a new set of ``local'' and ``out'' registers, and the old
``out'' registers become the new ``in'' registers.  In this way
function arguments can be passed through the ``in'' and ``out''
registers without the need to transfer them via a stack frame.  In
addition the chip also has 8 ``global'' registers which can always
be accessed.  A SPARC chip may have between 6 and 32 register
windows, giving between 104 and 520 on-chip registers.  Once all the
registers have filled up, the CPU has to start saving windows on the
stack.

The SUN manual "A RISC Tutorial", which describes the SPARC
architecture, also mentions ``tagged arithmetic'', but does not
explain what this is or how it contributes to improved performance.

The instruction set includes floating point instructions, and
current implementations include on-chip FPUs.

SPARC processors are mainly used in SUN workstations and their
emulators.  Their efficiency and future-proof design may also make
them suitable choices for embedded applications, particularly those
where equipment must be maintained and upgraded for long periods in
the future.  The SPARC architecture does not include a definition of
the memory management unit, and this makes it more suitable for
embedded applications than devices such as the 80486, where an MMU
is included on-chip.

-- References --

"A RISC Tutorial", Sun Microsystems, Part Number 800-1795-10,
Revision A, May 1988.

4.2. IBM PC and 80x86 CPUs
--------------------------

-- Origin --

IBM and Intel.

-- Standard Status --

De-facto industry standard.  The 80x86 chips are now made by Intel
and AMD, and ``clone'' PCs and accessories are made by hundreds of
different manufacturers.  Neither of these is an open standard; in
fact both Intel and IBM have tried to keep them closed.  These
attempts have been largely unsuccessful, and this has led in large
part to the downfall of IBM.

-- Purpose --

Originally, to make money for IBM and Intel.  Neither company
expected these products to become important standards.

-- Outline --

Technically these are two different standards: one is a CPU chip,
and the other is a computer architecture that uses it.  In practice
80x86 chips are almost never used outside the PC architecture, and
so they will be considered together.

The term 80x86 is used to cover the following devices: 8086, 80186,
80286, 80386, 80486 and Pentium (sometimes called the '586: Intel
gave it a name after an American court ruled that numbers cannot be
trade-marked).  All these chips share a common architecture and are
upwardly compatible from left to right.  They are also (at the
assembly-language level) upwardly compatible with the 8 bit 8080 and
Z80 CPUs.

Of these chips, the most commonly used outside IBM PCs is the 80186.
This is effectively the same as an 8086 except for a number of
on-chip IO and support devices.  It is intended for embedded
applications.

The 80x86 range is notable for two things:

1: Its non-orthogonal register set.  Most CPUs provide a large
   number of registers which can all be used in the same way.  On
   the 68000 an add instruction can apply to any pair of the 8 data
   registers, or to a data register and a memory location.
   Similarly, any of the 8 address registers can be used to index
   data.  The 80x86 does not provide this.  Instructions are tied to
   particular registers.  This creates problems for optimising
   compilers.

2: The segmented memory architecture, which was originally invented
   to allow upward compatibility with the 8080 and Z80.  The
   original 8086 had a 20 bit address bus, giving 1Mb of addressable
   memory (this was thought to be generous given the expected
   lifetime of the architecture).  Addresses are all 16 bit, with
   the extra 4 bits coming from ``segment registers''.  These are
   also 16 bit registers, but their contents are shifted 4 bits to
   the left before being added to the address computed by the rest
   of the CPU.  There are separate segment registers for code, stack
   and data.  This causes problems for C compilers because they have
   to provide the programmer with a flat address space for pointer
   arithmetic and comparison.  Further problems have been caused in
   the later chips by the expansion to a full 32 bit address bus
   (requiring an extra 12 bits of address to come from somewhere).
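The segment arithmetic is easily stated in code.  The following
function (our illustration, not Intel's) shows how a real-mode 8086
forms a physical address, and why many different segment:offset
pairs alias the same location:

    /* Real-mode 8086 address formation: the 16 bit segment value is
       shifted left 4 bits and added to the 16 bit offset, giving a
       20 bit physical address (hence the 1Mb limit). */
    unsigned long phys_addr(unsigned int seg, unsigned int off)
    {
        return (((unsigned long)seg << 4) + off) & 0xFFFFFUL;
    }

For example, 0x1234:0x0005 and 0x1000:0x2345 both yield the physical
address 0x12345.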
The IBM PC was originally produced by a small team within IBM who
believed that the company should try to cash in on the growth in the
microcomputer market.  This was not expected to be more than a
sideline to IBM's main business, and so the development was done on
a small budget.  This led the development team to buy in their CPU
and operating system rather than develop them from scratch.  This
allowed competitors to build their own ``compatible'' PC clones.
Had IBM retained control of either of these two items then the clone
market would never have developed, and the commercial history of
computers in the 1980s would have been very different.

The 80186 CPU is useful in embedded applications, although it does
not provide the power of more modern chips.  The rest of the range
are normally encountered in IBM PC motherboards.  These can be used
either in stand-alone PCs or in embedded control applications.  The
wide availability of these boards, as well as of IO devices and
development software, makes this a good choice for many projects.
However consideration should be given to the extra programming
problems caused by the 80x86 and IBM PC architectures.  A small
saving on hardware may be swamped by the increase in coding time.

See also the section on MS-DOS and Windows.

4.3. Ethernet
-------------

-- Origin --

In 1976, Metcalfe and Boggs of Xerox PARC published the first
description of Ethernet.  This was later codified into IEEE 802.3,
which defines the Ethernet still in use today.

-- Standard Status --

The standard is now well established and in wide use.  IEEE 802.3
defines speeds up to 20 Mbit/sec, but the de-facto standard is 10
Mbit/sec.  Ethernet cards are available for IBM PCs.  Unix
workstations are usually fitted with Ethernet interfaces as
standard.  A number of packet switches exist for routing data
between different subnets.  Ethernet is starting to look a little
slow, especially for large networks.

-- Purpose --

A LAN for office and light industrial use.

-- Outline --

Ethernet uses Carrier Sense Multiple Access with Collision Detection
(abbreviated CSMA/CD).  All nodes are connected to a common co-axial
cable.  When a station wishes to transmit, it first listens to see
if any other station is transmitting.  If it detects no other
station then it goes ahead, otherwise it waits (this is the CSMA
part of the name).  If two stations start transmitting
simultaneously then hardware in both nodes detects this and stops
the transmission.  The two nodes then wait for a random interval
before retrying (this is the CD part of the name).
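The retry interval is chosen by the ``truncated binary exponential
backoff'' algorithm.  On real controllers this is done in hardware,
but it is simple enough to sketch in C (the constants follow the
published IEEE 802.3 algorithm; the slot-time handling is left
abstract):

    #include <stdlib.h>

    /* After the n-th successive collision a station waits a random
       number of slot times between 0 and 2^k - 1, where
       k = min(n, 10).  A station gives up after 16 failed
       attempts. */
    int backoff_slots(int collisions)
    {
        int k = collisions < 10 ? collisions : 10;
        return rand() % (1 << k);
    }

The randomisation is what breaks the tie between the colliding
stations; doubling the range on each collision adapts the retry rate
to the load on the cable.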
Although data is transmitted over the Ethernet at 10 Mbit/sec,
theoretical studies predict that actual utilisation can only be
about 30% of this.  However according to a report from DEC [1], in
practice utilisation can be as high as 95%.  Above this the number
of collisions rises rapidly and the entire network grinds to a halt.

Since Ethernet is a bus-based LAN, the offered load is roughly
proportional to the number of nodes.  The usual solution to an
overloaded network is to split the LAN into a number of subnets, and
to use intelligent routers to switch messages from one subnet to
another.  High speed fibre optic links are often used to transmit
between subnets.

In the OSI architecture, Ethernet covers layer 1 (physical) and part
of layer 2 (data link).

Ethernet is ubiquitous and cheap.  However it is not suitable for
real-time communication because of the unpredictable transmission
delay.  Since every item of data is transmitted to every node, a
single dishonest node can intercept all transmitted data and may
also be able to masquerade as another node.  These problems can be
overcome by adding cryptographic protocols on top of the Ethernet
standard, at some cost in performance.  This may be important in
some secure applications.

[1] Dave Boggs, Jeff Mogul and Chris Kent.  DEC Western Research Lab
tech report 88-4, "Measured Capacity of an Ethernet: Myths and
Reality".  This report is available as a PostScript file by
anonymous FTP from gatekeeper.dec.com.

5. Software Standards
=====================

5.1. Unix
---------

-- Origin --

The original version of Unix was written on a spare PDP-7 by Ken
Thompson to support a video game.  Since then it has had a long and
complicated history.  The story includes startup companies that
became industry giants, university programmers who rewrote the whole
thing for fun, several incompatible versions, and a number of
standards wars.

-- Standard Status --

The major Unix standard is POSIX (Portable Operating System
Interface), as defined in the IEEE 1003 family of standards.  This
won the standards war of the late 1980s and is now being adopted
industry-wide.  The Unix trademark is now (as of 14 October 1993)
owned by X/Open, the Unix vendor club.

Unix is now the dominant vendor-independent operating system.  As a
result it is frequently specified for large networked systems.

-- Purpose --

POSIX is an attempt to provide a single standard for all
implementations of Unix.  However it is not tied to Unix.  A vendor
of a different operating system could provide the set of shells and
utilities specified in 1003.2 and then claim to be POSIX-compliant.

-- Outline --

Some of the information in this section is taken from the
comp.unix.questions FAQ maintained by Ted Timar.

POSIX standard numbers are of the form 1003.x.  The following values
of x have been allocated, although not all of these documents have
been released:

 0: Open Systems Environment
 1: System Application Program Interface (C language system calls;
    a small example follows this list)
 2: Shell and Utilities
 3: Test Methods
 4: Real-Time Systems Interfaces
 5: Ada Language Binding
 6: Security
 7: System Administration (including printing)
 8: Transparent File Access
 9: FORTRAN Language Binding
10: Supercomputing
12: Protocol-Independent Interfaces
13: Real-Time Profiles
15: Supercomputing Batch Interfaces
16: C-Language Bindings
17: Directory Services
19: FORTRAN 90 Language Binding
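To give a flavour of the 1003.1 C interface, here is a minimal
directory-listing program.  The opendir/readdir calls shown are part
of the standardised API; everything else about the program is merely
illustrative:

    #include <stdio.h>
    #include <dirent.h>    /* POSIX.1 directory access */

    /* List the names in the current directory using the POSIX.1
       directory-reading interface. */
    int main(void)
    {
        DIR *d = opendir(".");
        struct dirent *entry;

        if (d == NULL) {
            perror("opendir");
            return 1;
        }
        while ((entry = readdir(d)) != NULL)
            printf("%s\n", entry->d_name);
        closedir(d);
        return 0;
    }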
As always, vendors are caught between the need to demonstrate that
their products are standard and the need to show that they are
better than anyone else's.  This leads to various non-standard
extensions.  However this should not be too great a problem, since
vendors usually indicate which areas of their product are not part
of POSIX.

A claim that a product is ``POSIX conformant'' does not imply that
everything in the IEEE 1003.x document set is supported.  However a
vendor must produce a POSIX conformance document.  This can be
inspected to see if the product conforms in the areas of interest.

X/Open (a vendor consortium) have produced a series of Portability
Guides known as the XPG series.  They include:

XPG2: Published in 1987, with a strong System V influence.  The
   volumes are:

   1: Commands and Utilities
   2: System Calls and Libraries
   3: Terminal Interfaces, IPC, Internationalization
   4: Programming Languages (C & COBOL)
   5: Data Management (ISAM & SQL)

XPG3 and XPG4 were published in 1989 and 1992 respectively.  There
were huge changes between XPG2 and XPG3 to align with POSIX.1 and to
align partially with the C Standard.  Many of the XPG2 interfaces
and headers were withdrawn.  There were equally drastic changes
between XPG3 and XPG4, including alignment with POSIX.2 and FIPS
151-2, full alignment with the C Standard, and the addition of wide
character interfaces.  Therefore anyone working to these standards
should ensure that they have the latest version, and that anything
purchased as conforming to these standards also conforms to the
latest version.

5.2. X Windows
--------------

-- Origin --

Massachusetts Institute of Technology.

-- Standard Status --

The standard is defined by a ``sample implementation'' from MIT.  In
fact this is the commonest implementation in use, because MIT
distributes the source code for free.  X Windows has now been
adopted as part of POSIX.  Various vendors have modified the sample
implementation and distributed the results without source code, but
no-one seems to have rewritten it.  Such an effort is unlikely: it
would probably cost several million pounds, and the resulting
product would have to compete with the free MIT sample
implementation.

The current version is known as X11 revision 6, or just X11R6.  New
revisions are released by MIT every year or two.  All revisions are
upwardly compatible.  It seems unlikely that there will be an X12 in
the near future.  The release of vendor-modified versions of X
always lags behind the MIT release.

Versions of the X Windows server are available for PCs, and X
Windows is available on various PC versions of Unix.

-- Purpose --

To provide a standard GUI for networked systems, especially Unix.
The original intention was merely to manage the ``real estate'' of
the screen.  However the basic API (known as Xlib) has been
supplemented by a range of GUI libraries which are generally
considered to be part of X Windows.

X Windows works transparently over a network.  A user can run a
program on one machine and interact with it on another.  A single
application can control windows on several machines.

-- Outline --

X11 is available under Unix, Ultrix and VAX/VMS.  It is presented to
application programmers through Xlib, a C procedure library.  The
network communication is hidden underneath Xlib, and other X
libraries are built on top of it.

X11 is based on a client-server model.  For each physical display
there is a controlling server.  Client processes communicate with
the servers via a reliable duplex byte stream with a block stream
protocol layered on top.  Where client and server are on the same
machine the stream is based on a local IPC mechanism; otherwise a
network connection is used.  Client-server connections can be
one-to-one, one-to-many, many-to-one, or many-to-many.

X Windows supports one or more physical screens, with the windows
arranged in a strict hierarchy.  Each screen has a root window
covering the display screen, covered partially or completely by
child windows, which in turn may have their own children.  There is
usually at least one window per application program, and an
application can create a tree of arbitrary depth on each screen.  A
child may extend beyond its parent window, but output to the child
is clipped to the parent window boundaries.  At each level in the
hierarchy, one window is deemed to be dominant (i.e. obscures the
others).

Each window has a border (which may be zero pixels wide), and can
have a background colour (if it does not, it is transparent,
i.e. windows behind it show through).  X does not take
responsibility for the contents of windows: if a window is obscured
and then exposed, X will ask the application to repaint part or all
of the window.  X does, however, provide for storage of graphic
objects called pixmaps (bitmaps if only using 1 pixel plane) at the
workstation.  The application can also elect to have the X server
store obscured window areas as pixmaps so that it can repaint them
itself.

X also fails to define any system for the user to manage windows.
Primitive functions for moving and resizing windows are provided by
Xlib, but the user needs an application called a ``window manager''
to control these things.
As far as X is concerned the window manager is just another
application, and has no special status or privilege.  Users can
therefore pick their own window manager according to taste.  A range
of such programs is distributed with X.

Many X functions return an ID which allows the application to refer
to objects stored on the X server.  These can be of type Window,
Font, Pixmap, Bitmap, Cursor, or Graphic Context.  Fonts and cursors
are normally shared automatically between applications.  Most calls
to Xlib operate asynchronously, but synchronisation can be forced by
calls to XSync (e.g. to wait for a return value).

The X Toolkit ``Xt'' is layered on Xlib.  It acts as a basic GUI
library, defining simple buttons and sliders and providing a
standard protocol for applications to communicate with the window
manager.  GUI objects such as buttons and sliders are known as
``widgets'' in X terminology (short for ``window gadgets'').  Xt
widgets are defined in a strongly object-oriented way, with function
pointers used to provide a common interface to the various widgets.
The result is flexible but complex.  Application programmers are
shielded from this complexity, but programmers writing new widgets
are not.

A number of widget libraries have been produced by extending Xt.
These include the Athena widgets, Open Look and Motif.  Athena is
included in the MIT distribution, but does not appear to be commonly
used.  The standards war between Open Look and Motif has now been
won by Motif; both are commercially available libraries.  At one
point SUN made a bid to corner the standard window system market
with NeWS (Network Extensible Window System), but SUN now supports
both NeWS and X.  NeWS has lost the standards war and seems destined
to fade into obscurity.

A further level of abstraction can be layered on top of Xt: the UIMS
(User Interface Management System).  This allows the user interface
to be separated from the application by providing high-level
extensible tools for building user interfaces, with obvious
advantages for portability.  While X itself is ``value-free'',
i.e. it does not enforce a particular style or look-and-feel on user
interfaces, the particular tool used will tend to result in
stylistic similarities in user interfaces developed with it.  Most
UIMSs are tied to one particular GUI library.  The favourite
candidate for the de-facto GUI standard currently seems to be the
Open Software Foundation library ``Motif''.

Some vendors have released ``X terminals''.  These are small
computers, usually with big screens and an Ethernet interface, that
are designed to run the X server.  They often contain special
graphics hardware and an X server modified to take advantage of
this.  X terminals can provide a cost-effective alternative to Unix
workstations, but they cause a serious increase in network traffic.

Increasing numbers of Unix applications are being released with X
Windows support.  Some will not run on anything else.

GOSIP specifies ``Virtual Terminal'' support (ISO 9040, 9041).  This
provides form-based data entry facilities.  It is not a substitute
for X Windows in more complicated activities.
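To give a flavour of programming at the Xlib level, here is a
minimal client (illustrative only; error handling is omitted).  It
connects to the display named by the DISPLAY environment variable,
creates and maps a window, and exits on a key press:

    #include <X11/Xlib.h>

    int main(void)
    {
        Display *dpy = XOpenDisplay(NULL);   /* connect to $DISPLAY */
        int scr;
        Window win;
        XEvent ev;

        if (dpy == NULL)
            return 1;
        scr = DefaultScreen(dpy);
        win = XCreateSimpleWindow(dpy, RootWindow(dpy, scr),
                                  10, 10, 200, 100, 1,
                                  BlackPixel(dpy, scr),
                                  WhitePixel(dpy, scr));
        XSelectInput(dpy, win, ExposureMask | KeyPressMask);
        XMapWindow(dpy, win);                /* make it visible */
        for (;;) {
            XNextEvent(dpy, &ev);            /* blocks on the queue */
            if (ev.type == KeyPress)
                break;
        }
        XCloseDisplay(dpy);
        return 0;
    }

Note that the client never repaints the window here; a real
application would watch for Expose events and redraw, as described
above.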
5.3. Virtual Terminal (ISO 9040, 9041)
--------------------------------------

-- Origin --

ISO.

-- Standard Status --

ISO standards 9040, 9041.

-- Purpose --

Standard for interactive operations over a network.

-- Outline --

From the comp.protocols.iso FAQ:

The Virtual Terminal (VT) service and protocol specified in ISO 9040
and ISO 9041 allow a host application to control a terminal with
screen and keyboard, and similar devices like printers.  In
addition, not only application-terminal, but also the less common
application-application and terminal-terminal communication is
supported.  Today, only the Basic Class VT, which covers
character-oriented terminals, has been specified.  This service is
comparable to DoD Telnet and the old CCITT X.3/X.28/X.29 PAD
protocol, but much more powerful.  It also includes control of
cursor movement, colors, character sets and attributes, access
rights, synchronization, multiple pages, facility negotiation, etc.

This means that the huge number of classic terminal type definitions
(e.g. in UNIX termcap or terminfo) are unnecessary at each host in
the net, as the VT protocol includes the corresponding commands for
one abstract virtual terminal that only have to be converted by the
local implementation to the actual terminal control sequences.
Consequently, the use of VT means not every host needs to know every
type of terminal.

As with most ISO standards that require general consensus amongst
participating members, the OSI VT has many optional capabilities,
two modes of operation and an almost infinite number of
implementation-specific options.  Profiles may help in reducing the
optionality present (e.g., there exists a Telnet profile for VT).
But it is doubtful that the OSI VT can completely put an end to the
``m * n'' terminal incompatibility problem that exists in a
heterogeneous computer network.

-- References --

"comp.protocols.iso FAQ" by Markus Kuhn.

5.4. MS-DOS & MS-Windows
------------------------

-- Origin --

Microsoft.  MS-DOS was originally written as a ``Quick and Dirty
Operating System'' (QDOS) in about twelve weeks by a programmer
named Tim Paterson (who is said to have regretted it ever since).
Paterson's employer sold it to Microsoft (then known mostly for its
CP/M implementation of Basic), who renamed it ``MS-DOS'' and
licensed it to IBM for their new PC.  The rest is history.

-- Standard Status --

De-facto operating system and windowing system for IBM PCs.

-- Purpose --

MS-DOS seems to have been licensed by IBM because they had to have
something quickly.  Windows was created by Microsoft in response to
the Apple Mac.  Apple provided a user-friendly GUI, and this enabled
them to start invading the market share of IBM and Microsoft.

-- Outline --

MS-DOS provides interrupt handling, a simple file system (the
original version lacked a directory hierarchy) and a simple command
line interpreter.  Its memory allocation system could only handle
640 Kbytes, which caused problems until add-on programs were
developed to handle extra memory.  Even today incompatibilities
between different memory models can cause problems for PC users.

Windows is an MS-DOS compatible operating system which provides a
GUI front end and very limited multi-tasking.  The current version
is 3.1.

One of the major innovations of Windows (at least for the IBM PC
world) was ``Object Linking and Embedding'', or OLE.  This allows
objects from different applications to be placed in a single
document.  A diagram and a spreadsheet can be included in a report.
The spreadsheet figures can be linked to a bar chart in such a way
that when the spreadsheet is changed the bar chart is updated
automatically.
At the time of writing, leaks have been appearing in the trade press
concerning ``Chicago'', the code-name for Windows 4.  It appears
that Windows 4 will be an object-oriented OS, upward-compatible with
Windows 3 but otherwise completely divorced from MS-DOS.

Most business applications software sold today works under
MS-Windows.  Customers are likely to insist that bespoke software
bought from us can interwork with their off-the-shelf packages.
Software developers can develop new packages which are actually a
mixture of standard third-party packages and small amounts of custom
software.  Such packages tend to be cheaper and more flexible than
software developed from scratch.

5.5. Windows NT
---------------

NT stands for ``New Technology''.

-- Origin --

Microsoft.

-- Standard Status --

Proprietary standard.  Microsoft hope that this will become the
de-facto standard to replace MS-Windows.  Its chief competitors are
Unix with X Windows, and OS/2 from IBM (proprietary, and unlikely to
have a long-term future).

-- Purpose --

The heir apparent to the Windows crown.  Microsoft hopes to wean
users from MS-Windows by a combination of more features, true
multi-tasking and upward compatibility.

-- Outline --

From the user's point of view, Windows NT seems to be a bigger
MS-Windows.  NT needs at least an 80486 CPU with about 16 Mbytes of
RAM and a correspondingly huge hard disk.  Microsoft expect that it
will be used for large network disk servers, while individual users
continue to run MS-Windows.  As the average PC grows in power, users
will migrate to NT.  Also, Windows NT is not tied to the 80x86
architecture.  By way of comparison, a Unix with X Windows will run
reasonably well on an 80386 CPU with 8 Mbytes of RAM.

5.6. CORBA (Common Object Request Broker Architecture)
------------------------------------------------------

-- Origin --

Object Management Group (OMG).

-- Standard Status --

Object Management Architecture Guide published.

-- Purpose --

The Object Management Group (OMG) is an international software
industry consortium with two primary aims:

* Promotion of the object-oriented approach to software engineering
  in general.

* Development of common models and a common interface for the
  development and use of large-scale distributed applications (open
  distributed processing) using object-oriented methodology.

-- Outline --

The following text is from the comp.object FAQ.  The extract was
written by Richard Soley, OMG technical director.

In late 1990 the OMG published its Object Management Architecture
(OMA) Guide document.  This document outlines a single terminology
for object-oriented languages, systems, databases and application
frameworks; an abstract framework for object-oriented systems; a set
of both technical and architectural goals; and an architecture
(reference model) for distributed applications using object-oriented
techniques.
To fill out this reference model, four areas of standardisation have
been identified:

* The Object Request Broker, or key communications element, for
  handling distribution of messages between application objects in
  a highly interoperable manner;

* The Object Model, or single design-portability abstract model for
  communicating with OMG-conforming object-oriented systems;

* The Object Services, which will provide the main functions for
  realising basic object functionality using the Object Request
  Broker - the logical modelling and physical storage of objects;
  and

* The Common Facilities, which will comprise facilities useful in
  many application domains, made available through OMA-compliant
  class interfaces.

The OMG adoption cycle includes Requests for Information and
Proposals, requesting detailed technical and commercial availability
information from OMG members about existing products to fill
particular parts of the reference model architecture.  After review
of the responses by the Technical and Business committees, the OMG
Board of Directors makes a final determination for technology
adoption.  Adopted specifications are available on a fee-free basis
to members and non-members alike.

In late 1991 OMG adopted its first interface technology, for the
Object Request Broker portion of the reference model.  This
technology, adopted from a joint proposal (named "CORBA") of
Hewlett-Packard, NCR Corp., HyperDesk Corp., Digital Equipment
Corp., Sun Microsystems and Object Design Inc., includes both static
and dynamic interfaces to an inter-application request handling
software "bus".

Unlike other organisations, the OMG itself does not and will not
develop or sell software of any kind.  Instead, it selects and
promulgates software interfaces; products which offer these
interfaces continue to be developed and offered by commercial
companies.

Implementations of CORBA 1.1 are available from Hewlett-Packard
(with HP Distributed Smalltalk), HyperDesk (runs on several common
architectures), IBM (System Object Model, AIX & OS/2 only) and Sun.

The OMG is basically a vendor club (although it has open
membership).  CORBA has not yet been recognised by ANSI, ISO or
CCITT, but that is the obvious next stage.  In the mean time, CORBA
constitutes the only open standard in the area.

-- References --

"comp.object Frequently Asked Questions" by Bob Hathaway.  October
1993.  Available by FTP from the Imperial College repository.

5.7. SQL: Structured Query Language
-----------------------------------

-- Origin --

IBM.

-- Standard Status --

ISO standard 9075, released in 1989 and ``updated'' in 1992.  The
update tripled the size of the standard.  Implementations exist for
INGRES, ORACLE and many others, but they are not 100% compliant with
the standard and, due to extensions beyond the standard, not
mutually compatible.

-- Purpose --

To provide database-independence for users and applications.

-- Outline --

SQL is a ``language'' for communicating with databases.  A user at a
terminal can prepare an SQL script and execute it in batch mode, or
a program can generate SQL statements to speak to the database.  SQL
provides commands for:

* Data insertion, modification, and deletion
* Query
* Data definition
* Access control
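A short illustrative script shows one statement from each category
(the table and column names are invented for the example):

    -- Data definition
    CREATE TABLE staff (
        name   CHAR(40) NOT NULL,
        title  CHAR(40),
        site   CHAR(20)
    );

    -- Data insertion
    INSERT INTO staff (name, title, site)
    VALUES ('Paul Johnson', 'Research Scientist', 'Great Baddow');

    -- Query
    SELECT name, title FROM staff WHERE site = 'Great Baddow';

    -- Access control
    GRANT SELECT ON staff TO PUBLIC;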
-- Limitations --

There are many areas not covered by SQL, including the data
dictionary, forms, foreign keys, primary keys, and referential
integrity.  In addition, not all implementations are 100% compliant,
and some (all?) extend beyond the standard, so portability is
limited.  Both of these problems may reduce with time as both the
standard and the implementations grow.

6. Application Standards
========================

6.1. X.400 Message Handling System
----------------------------------

-- Origin --

CCITT.

-- Standard Status --

X.400 standard issued November 1988.  The term ``X.400'' is often
used to indicate a collection of standards in the range X.400 --
X.420.  X.400 itself is actually just the system and service
overview.

-- Purpose --

To provide a standard for handling electronic mail on a
store-and-forward basis.  The content of the email is not
interpreted or altered by the system except in certain specific
situations, for instance where character set conversion is
necessary.

-- Outline --

A Message Handling System (MHS) is split up into the following
components:

UA: User Agent.  An application program which interacts with a human
   who is reading or sending electronic mail.

AU: Access Unit.  This allows indirect access to the system, for
   instance by automatically printing out messages and handing them
   over for physical delivery.

MTS: Message Transfer System.  The system responsible for storing
   and forwarding electronic mail.  This in turn is split up into:

   MTA: Message Transfer Authority.  The subsystem responsible for
      moving messages towards their destination.

   MS: Message Store.  The subsystem responsible for holding
      messages until they are either forwarded by an MTA or deleted
      by an AU or UA.

On top of this system a range of services can be built.  The X.420
standard defines the Inter-Personal Message (IPM) system.  This
provides user-to-user electronic mail.  Addressing and routing is
done by the X.500 directory services.

This standard is included in the UK GOSIP.  It is the most widely
accepted official standard, but it is not the most widely used: that
is the RFC 822 Internet Text Messages standard.

6.2. X.500 Directory Services
-----------------------------

-- Origin --

CCITT.

-- Standard Status --

X.500 standard issued November 1988.  As with ``X.400'', the term
``X.500'' is often used to indicate a collection of standards in the
range X.500 -- X.521.

-- Purpose --

Directory services are necessary for two reasons:

1: To isolate users of the network from frequent changes to its
   structure.

2: To provide a more user-friendly view of the network.

The standard specifies how electronic directories of people and
services should be organised.  The specification includes the ways
in which different organisations can arrange for their directories
to work together, and methods of authentication for access and
modification.

-- Outline --

A typical X.500 address will look like this:

    C="GB"
    O="GEC"
    OU="Marconi Research Centre"
    T="Research Scientist"
    CN="Paul Johnson"

The various attribute names are specified in X.520; the list above
is only a sample.  ``C'' stands for ``country'', ``O'' for
``organisation'', ``OU'' for ``organisational unit'', ``T'' for
``title'' and ``CN'' for ``common name''.  Note that CN="Laser
Printer" would be equally valid: directories cover services as well
as human beings.

The X.500 directory is organised as a tree with individuals and
services at the leaves.  At higher level nodes, various
organisations are given authority to manage their own local
namespaces.  At the top level are the various countries, known by
their two-letter ISO codes (e.g. Great Britain = GB).  Below this
are organisations, organisational units and people.  A national
authority is responsible for allocating names and aliases to
organisations.
Each organisation is responsible for the names of its organisational
units, and so on.  The levels in the tree are mapped on to the X.520
attribute names.

Tree-structured databases such as this suffer from efficiency
problems.  A search for a research scientist at GEC-Marconi in Great
Britain can be performed quickly because the geographical area is
known.  A search for a research scientist named Paul Johnson would
have to be sent world-wide.  To avoid this the standard allows
different hierarchies to be used in a ``Yellow Pages'' service.  For
instance a professional organisation such as the IEE could maintain
an X.500 directory of members organised by title and professional
area rather than by employer.  Entries in such a directory would all
be ``aliases'': entries which actually point to real entries
elsewhere.

X.500 is the most widely accepted official standard in this area.
However it is not the most widely used system.  See the section on
Internet Services for more information.

6.3. FTAM (ISO 8571)
--------------------

-- Origin --

ISO/OSI.

-- Standard Status --

ISO standard 8571, BS 7090.  Published in 1989.

-- Purpose --

To provide a transparent, network-wide, file transfer and management
service.

-- Outline --

FTAM is an OSI layer 7 standard which forms part of the UK GOSIP.
It provides network-wide file transfer and access, but does not
actually constitute a file system.  This leads to a dichotomy
between files held on the local file system (which can be accessed
by other application programs) and files available through FTAM
(which have to be transferred to the local machine before they can
be accessed).

In theory a file system such as NFS (q.v.) can make remote files as
accessible as local ones.  In practice this is only feasible for
local area networks.  For wide area networks it makes more sense to
down-load a file to the local network before working on it.
Therefore FTAM should be used for managing files on a WAN, and NFS
or an equivalent should be used on a LAN.  FTAM is part of GOSIP,
but it is not clear whether it must also be used for local area
networks.

6.4. NFS: Network File System
-----------------------------

-- Origin --

SUN Microsystems Inc.

-- Standard Status --

De-facto standard, specification and source code in the public
domain.  Implementations exist for SUN, VAX/ULTRIX, VAX/VMS (server
side only), Apollo, and IBM PC (client side only).

-- Purpose --

To provide a transparent network-wide file system, i.e. to provide
network-wide access to files (and directories) without the user or
program having to know where the files reside.  It should work in
mixed networks (provided that each machine supports NFS).  NFS is
designed to be portable to other machine architectures and operating
systems.  In addition, NFS aims to allow clients and servers to
recover from machine or network failures.

-- Outline --

NFS is implemented on top of a Remote Procedure Call (RPC) package
to simplify protocol definition and implementation, and uses the
eXternal Data Representation (XDR) to describe protocols in a
machine and system independent way.  To make NFS transparent to
applications, the generic filesystem operations are separated from
specific filesystem implementations.  The generic filesystem
supports two kinds of operation: operations on the filesystem (using
the Virtual File System, VFS), and operations on the files within
the filesystem (using the Virtual Node, vnode).

NFS consists of three components:

1: The NFS protocol.  This uses the SUN RPC mechanism.  The RPC
   mechanism is synchronous, so it behaves exactly like a local
   procedure call, which makes it easy to use.  In addition, the
   protocol is stateless, i.e. each procedure call contains all of
   the required information in the call parameters, so there is no
   state history to be maintained (or re-established after a crash).
   This means that neither client nor server has to deal with crash
   recovery (a sketch of this appears after this list).

   NFS is transport independent - it currently uses the DARPA User
   Datagram Protocol (UDP) and Internet Protocol (IP), but could
   switch to others without altering the higher level protocols.
   The NFS protocol and RPC utilise the SUN XDR specification.  The
   NFS protocol supports directory and file operations including
   create and delete directory, rename, create, look up and remove
   a file, read, write and truncate a file, read from directory,
   change file attributes, etc.

2: The server side.  Because the NFS server is stateless, when
   servicing a request it must commit all modified data to stable
   storage *before* returning results.  This includes the data
   directly modified, and any consequential changes (e.g. directory
   changes resulting from a file change).

3: The client side.  For compatibility with existing UNIX
   applications, NFS uses a UNIX-style pathname.  However, the
   host-name lookup and file address binding are done once per
   filesystem via the mount command, which means that files are not
   available to the client until the mount is completed.  The VFS
   and vnode interfaces hide the differences between file systems
   from the applications, making NFS transparent to different
   filesystems.
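Statelessness is easiest to see in the shape of the requests
themselves.  The following C fragment is a caricature, not the real
NFS protocol definition: the point is that every read names the file
handle, offset and count explicitly, so the server needs no memory
of any earlier call.

    /* Illustrative only -- not the actual NFS protocol types.
       Every request is self-contained, so the server keeps no
       per-client state and a crashed server can simply be
       retried. */
    struct read_args {
        unsigned char fhandle[32];   /* opaque handle naming the file */
        unsigned long offset;        /* where to start reading */
        unsigned long count;         /* how many bytes to return */
    };

Contrast this with a stateful design, where the server would have to
remember an open-file table and a current seek position for every
client.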
The RPC mechanism is synchronous, so it behaves like a local procedure call, which makes it easy to use. In addition, the protocol is stateless: each procedure call contains all of the required information in its parameters, so there is no state history to be maintained (or re-established after a crash). This means that neither client nor server has to deal with crash recovery. NFS is transport independent: it currently uses the DARPA User Datagram Protocol (UDP) and Internet Protocol (IP), but could switch to others without altering the higher level protocols. The NFS protocol and RPC utilise the SUN XDR specification. The NFS protocol supports directory and file operations including create and delete directory, rename, create, look up and remove a file, read, write and truncate a file, read from directory, change file attributes, etc.

2: The server side. Because the NFS server is stateless, when servicing a request it must commit all modified data to stable storage before returning results. This includes the data directly modified, and any consequential changes (e.g. directory changes resulting from a file change).

3: The client side. For compatibility with existing UNIX applications, NFS uses a UNIX-style pathname. However, the host-name lookup and file address binding are done once per filesystem via the mount command, which means that files are not available to the client until the mount is completed. The VFS and vnode interfaces hide the differences between file systems from the applications, making NFS transparent to different filesystems.

Note that NFS does not itself support file locking. Instead SUN provides a separate file and record locking mechanism based on RPC. Because file locking is inherently stateful, there is also a status monitor which allows the lock manager to unlock files after a crash.

Of possibly greater significance is the fact that concurrent write access is not restricted by NFS: file modifications are locked at the inode level, which prevents two processes intermixing data from a single write. However, since NFS does not maintain locks between requests, and a write may span several RPC requests, two clients can intermix data on long writes. This follows Unix practice, which likewise does not serialise concurrent writers.

At present NFS is the nearest thing to an open standard for a file system that can work transparently over a network (FTAM is not capable of this). It is also a de-facto standard with a large number of installations world-wide. As such it is probably the system of choice for any large networked installation.
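The statelessness of the protocol is its key design decision, and is easy to illustrate. The following toy model in Python (invented for this report; real NFS uses SUN RPC and XDR, and the file handle format here is imaginary) shows a read operation in which every request is self-contained:

   # Sketch of the stateless idea behind the NFS protocol. Every request
   # carries all the information needed to service it, so the server
   # keeps no per-client state and crash recovery is trivial.

   FILES = {0x01: b"The quick brown fox jumps over the lazy dog\n"}

   def nfs_read(fhandle, offset, count):
       """Service a read: the (fhandle, offset, count) triple is complete
       in itself; there is no notion of an 'open file' on the server."""
       data = FILES[fhandle]
       return data[offset:offset + count]

   # Two reads of one logical transfer. The server could crash and
   # restart between them without the client noticing anything but delay.
   print(nfs_read(0x01, 0, 19))
   print(nfs_read(0x01, 19, 25))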
-- Limitations --

While NFS allows access to files in a mixed environment, this is only really useful if the files themselves are ``portable''. NFS is therefore relevant to ASCII files, and to binary files in standard formats (e.g. XDR). It is not relevant to task images, program object code, or non-standard binaries. This limit would not normally apply in a single-vendor network, where we might expect all file types to be compatible. Note also the point above about the lack of concurrent write control.

-- References --

"The SUN Network File System", Russel Sandberg, SUN, 1986.

7. Data Exchange Formats
========================

7.1. Graphics Interchange Format
--------------------------------

-- Origin --

CompuServe, a commercial electronic conferencing service.

-- Standard Status --

Used to be the de-facto standard, but is now being replaced by JPEG.

-- Purpose --

Developed as a device-independent method of storing pictures.

-- Outline --

Pictures suitable for GIF encoding must use a palette of not more than 256 colours. This palette is stored with the picture. The picture is stored as a series of 8 bit indices into the palette, compressed with the LZW algorithm. Apart from the palette quantisation, GIF is a lossless picture compression method. A 1024x768 pixel picture with 256 colours takes about 660 Kbytes (uncompressed, the pixel data alone would occupy 768 Kbytes at one byte per pixel).

Note that converting a GIF picture to JPEG is a bad idea: the dithering required for palette quantisation in GIF looks like fine detail to JPEG.

7.2. JPEG
---------

JPEG is pronounced ``jay-peg''.

-- Origin --

The Joint Photographic Experts Group, a sub-committee of ISO.

-- Standard Status --

ISO standard, DIS 10918. One of the options for the compression algorithm (Q-coding) is patented.

-- Purpose --

A standard file format and compression algorithm for full colour pictures.

-- Outline --

The best brief introduction to JPEG is to be found in the comp.compression FAQ. The following information is quoted from the FAQ.

JPEG works on either full-colour or gray-scale images; it does not handle bi-level (black and white) images, at least not efficiently. It doesn't handle colourmapped images either; you have to pre-expand those into an unmapped full-colour representation. JPEG works best on ``continuous tone'' images, usually those of natural real-world scenes. It does not work so well on non-realistic images, such as cartoons or line drawings, which have many sudden jumps in colour values. Standards for compressing bi-level (1-bit-per-pixel) images and motion pictures are being worked on by other committees, named JBIG and MPEG respectively.

Regular JPEG is ``lossy'', meaning that the image you get out of decompression isn't quite identical to what you originally put in. The algorithm achieves much of its compression by exploiting known limitations of the human eye, notably the fact that small colour details aren't perceived as well as small details of light-and-dark. Thus, JPEG is intended for compressing images that will be looked at by humans. If you plan to machine-analyse your images, the small errors introduced by JPEG may be a problem for you, even if they are invisible to the eye. The JPEG standard includes a separate lossless mode, but it is not widely used and does not give nearly as much compression as the lossy mode.

Note that JPEG is not suitable for binary images such as documents; JBIG should be used instead. Any high-volume use of JPEG will require dedicated hardware. Such hardware is available, either as a chipset or as expansion boards for IBM PCs. See the comp.compression FAQ for a list of devices and boards.
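The trade-off between quality and file size is easy to demonstrate. The sketch below uses the Pillow imaging library for Python, a modern tool chosen purely for illustration; it is no part of the JPEG standard, and the file names and quality values are arbitrary:

   # Sketch of the lossy quality/size trade-off in JPEG (illustrative
   # only; requires the Pillow library).
   import os
   from PIL import Image

   img = Image.new("RGB", (256, 256))
   img.putdata([(x, y, (x + y) // 2)            # a smooth colour ramp:
                for y in range(256)             # ideal JPEG material
                for x in range(256)])

   for quality in (95, 75, 25):
       name = "ramp_q%d.jpg" % quality
       img.save(name, quality=quality)          # higher quality = bigger file
       print(name, os.path.getsize(name), "bytes")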
-- References --

"comp.compression Frequently Asked Questions" by Jean-loup Gailly. Available from the Imperial College repository. October 1993. Contains a good introduction to many aspects of data compression, along with references to standard text books and current research. The following references are taken from its reference list.

"The JPEG Still Picture Compression Standard" by Gregory K. Wallace, Communications of the ACM, April 1991 (vol. 34 no. 4), pp. 30-44. A good technical introduction to JPEG. Adjacent articles in that issue discuss MPEG motion picture compression, applications of JPEG, and related topics.

"The Data Compression Book" by Mark Nelson. This book provides excellent introductions to many data compression methods including JPEG, plus sample source code in C. The JPEG-related source code is far from industrial-strength, but it is a pretty good learning tool.

"JPEG Still Image Data Compression Standard" by William B. Pennebaker and Joan L. Mitchell. Published by Van Nostrand Reinhold, 1993, ISBN 0-442-01272-1. 650 pages, price $59.95. This book includes the complete text of the ISO JPEG standards, DIS 10918-1 and draft DIS 10918-2. Review by Tom Lane: ``This is by far the most complete exposition of JPEG in existence. It's written by two people who know what they are talking about: both serve on the ISO JPEG standards committee. If you want to know how JPEG works or why it works that way, this is the book to have.'' There are a number of errors in the first printing; an errata list is available at ftp.uu.net: graphics/jpeg/pm.errata. At last report, all were fixed in the second printing.

7.3. JBIG Binary Image Compression
----------------------------------

-- Origin --

The Joint Bi-level Image Experts Group, an experts group of ISO and CCITT (JTC1/SC2/WG9 and SGVIII).

-- Standard Status --

Under development. Parts of the proposed standard are patented.

-- Purpose --

To provide a system for compressing binary images (like faxes). This will replace the current group 3 and 4 fax algorithms. The main characteristics of the algorithm are:

* JBIG will be lossless: images will not be changed by the encoding and decoding processes.

* Images can be encoded and decoded sequentially: there is no need for either end to store the entire image.

JBIG works best on bi-level images (like faxes). It also works well on Gray-coded grey scale images of up to about six bits per pixel; this is done by applying JBIG to the bit planes individually. For more bits per pixel, lossless JPEG usually provides better performance. Anything beyond six bits per pixel is usually noise anyway, and so can be ignored.

-- Outline --

The following text is taken from the Usenet comp.compression Frequently Asked Questions (see References), section 74. This extract was written by Hank van Bekkem.

The JBIG parameter P specifies the number of bits per pixel in the image. Its allowable range is 1 through 255, but starting at about P=8, compression will be more efficient using other algorithms. On the other hand, medical images such as chest X-rays are often stored with 12 bits per pixel, while no distortion is allowed, so JBIG can certainly be of use in this area. To limit the number of bit changes between adjacent decimal values (e.g. 127 and 128), it is wise to use Gray coding before compressing multi-level images with JBIG. JBIG then compresses the image on a bitplane basis, so the rest of this text assumes bi-level pixels.

Progressive coding is a way to send an image gradually to a receiver instead of all at once. During sending, more detail is sent, and the receiver can build the image from low to high detail. JBIG uses discrete steps of detail by successively doubling the resolution. The sender computes a number of resolution layers D, and transmits these starting at the lowest resolution layer. Resolution reduction uses pixels in the high resolution layer and some already computed low resolution pixels as an index into a lookup table. The contents of this table can be specified by the user.

This is the obvious standard for any kind of electronic document storage and transmission system. The patented algorithm at its heart is a cause for some worry: any application of this standard would require a patent license from IBM.
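The Gray-coding step mentioned above is simple to sketch. In the following Python fragment (hypothetical helper functions written for this report, not taken from the standard text), adjacent grey levels always differ in exactly one bit, so each bit plane handed to the JBIG coder changes slowly across smooth image regions:

   def gray_code(n):
       """Convert a binary sample value to its Gray-coded equivalent."""
       return n ^ (n >> 1)

   def bit_planes(samples, bits):
       """Split Gray-coded samples into 'bits' separate bi-level planes."""
       coded = [gray_code(s) for s in samples]
       return [[(c >> plane) & 1 for c in coded] for plane in range(bits)]

   # In plain binary, 127 -> 128 flips eight bits at once; Gray-coded,
   # it flips only one.
   print(bin(gray_code(127)), bin(gray_code(128)))
   for plane in bit_planes([126, 127, 128, 129], bits=8):
       print(plane)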
-- References --

"comp.compression Frequently Asked Questions" by Jean-loup Gailly. Available from the Imperial College repository. October 1993.

"Progressive Bi-level Image Compression, Revision 4.1", ISO/IEC JTC1/SC2/WG9, CD 11544, September 16, 1991.

"An overview of the basic principles of the Q-coder adaptive binary arithmetic coder", W.B. Pennebaker, J.L. Mitchell, G.G. Langdon, R.B. Arps, IBM Journal of Research and Development, Vol. 32, No. 6, November 1988, pp. 717-726. (This describes the patented algorithm. See also the other articles about the Q-coder in the same issue.)

7.4. MPEG
---------

-- Origin --

The Moving Pictures Experts Group, a part of ISO.

-- Standard Status --

In January 1992 a Committee Draft of MPEG phase I was released (colloquially called MPEG-I). Its exact name is ISO CD 11172. MPEG-II is presently being developed, and will probably be released some time in 1994. MPEG-I chips are available from a number of suppliers. Some run at data rates of up to 4 Mbits/sec, allowing higher quality video than pure MPEG-I.

-- Purpose --

To define a standard for compressed digital video and audio.

-- Outline --

MPEG-I defines a system requiring about 1.5 Mbits/sec for video with a mono sound track. Frame sizes and rates differ for the American and European standards (to fit in with the American NTSC and European PAL and SECAM analogue video standards). The European standard transmits 288 lines of 352 pixels at 50 fields per second; the fields are then interlaced to give 25 frames per second. 1.5 Mbits/sec was chosen as a target figure for MPEG-I because that is the data rate provided by CD and DAT. MPEG-II will transmit ``entertainment'' quality video and sound at about 4 Mbits/sec.

An introduction to MPEG, along with a regularly updated list of chips and boards, can be found in the Usenet comp.compression Frequently Asked Questions.

MPEG-I will be the standard for medium-quality video in such applications as CD-Interactive and video-phones. Chips are available which work at higher data rates than specified in MPEG-I, although these will not comply with MPEG-II. These chips are aimed at the cable TV and video-conferencing markets.

-- References --

"comp.compression Frequently Asked Questions" by Jean-loup Gailly. Available from the Imperial College repository. October 1993. Contains a brief description of the MPEG algorithm and a list of devices which implement it.

7.5. u-Law and A-Law (G.711)
----------------------------

The ``u'' in ``u-Law'' is actually the Greek letter ``mu''.

-- Origin --

CCITT.

-- Standard Status --

Standard G.711. Fairly widely used. u-Law is used in North America and Japan, and is often implemented on Unix workstations. A-Law is used in the rest of the world, including on international telephone routes.

-- Purpose --

To provide a simple logarithmic compression scheme for audio data.

-- Outline --

G.711 is a lossy compression scheme which compacts linear sound samples of 13 bits (A-Law) or 14 bits (u-Law) down to 8 bits. Like all lossy compression schemes it is designed around imperfections in human perception: a loud sound will ``drown out'' a quiet one, so a compression scheme can afford to add random noise when the signal is loud, provided that it keeps the noise down when the signal is quiet. A log-law compression scheme quantises the input data on a logarithmic scale instead of a linear one. This provides precision (and hence low noise) at low values, while the quantisation errors (and hence the random noise) increase at higher levels, where they are drowned out by the signal.
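The idea can be sketched with the textbook companding formula. Note that G.711 itself specifies a segmented piecewise-linear approximation of this curve, so the Python below is an illustration of the principle rather than a conforming implementation:

   import math

   MU = 255.0

   def ulaw_compress(x):
       """Compand a sample in [-1.0, 1.0] onto a logarithmic scale."""
       sign = -1.0 if x < 0 else 1.0
       return sign * math.log(1.0 + MU * abs(x)) / math.log(1.0 + MU)

   def ulaw_expand(y):
       """Inverse companding: recover an approximation of the sample."""
       sign = -1.0 if y < 0 else 1.0
       return sign * ((1.0 + MU) ** abs(y) - 1.0) / MU

   # Quiet signals get fine quantisation steps; loud ones get coarse
   # steps, whose errors are masked by the signal itself.
   for x in (0.01, 0.1, 0.5, 1.0):
       y = round(ulaw_compress(x) * 127) / 127.0    # quantise to 8 bits
       print("%5.2f -> %7.4f (error %.5f)"
             % (x, ulaw_expand(y), abs(x - ulaw_expand(y))))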
This is probably the simplest compression scheme for sound. It also provides reasonable quality and a reasonable amount (25-35%) of compression.

-- References --

"FAQ: Audio File Formats" by Guido van Rossum.

8. Document Formats
===================

8.1. SGML: Standard Generalised Markup Language
-----------------------------------------------

-- Origin --

The publishing industry; the Association of American Publishers (AAP) was among the early proponents.

-- Standard Status --

ISO standard 8879.

-- Purpose --

To define a standard way of describing the purpose of individual pieces of data in a text document, in order that the meaning and structure of the text can be extracted by automatic programs. For example, titles, paragraph headings, notes etc. should be identified as such. Data marked up with SGML can therefore be regarded as a very simple sort of database rather than as a simple sequential text file, which adds considerably to its value. In particular, it becomes possible to port the data between different publishing systems without loss of structure.

-- Outline --

Historically, document data contained procedural markup, which conveys how the data will appear in printed form. Commands for indenting, listing, titling, font selection and so on fall into this class, in products such as RUNOFF. Apart from making many global changes difficult, this ties the data to a particular interpreter because the commands are not universal.

SGML is different: it is a descriptive markup language, which labels the data with prescribed categories. The decision to print second-level paragraph headings in double-height underlined Gill Sans is not embedded in the data, but is defined by a separate mapping in a post-processor. (DSSSL, the Document Style Semantics and Specification Language, is currently being defined as a companion standard to SGML to standardise the workings of such post-processors.) Thus the content of the data and the form of its presentation are separated, which makes the data portable between different systems.

In addition, the user can define different document types, with different allowable elements and structures, using DTDs (Document Type Definitions). Thus memos, letters, instruction sheets, amendment sheets etc. can be defined in terms of content and organisation, and can be produced to any standard output format simply by modifying the post-processing instructions. It is, of course, possible for other programs to interrogate such data, since the combination of DTD and document data is self-descriptive. Non-character data can be implicitly embedded in an SGML document simply by storing it in a separate file and embedding a reference to it in the text file.

Note that the Office Document Architecture (ODA), ISO standard 8613, is to some extent competitive with SGML, though it focusses more on the interchange of formatted documents. ODA is specified as part of GOSIP.
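To make the descriptive-markup idea concrete, here is a hypothetical DTD fragment (invented for this report, not taken from any published DTD) defining a trivial ``memo'' document type, followed by a document marked up against it. Nothing in the markup says how a title is to be printed; that decision belongs entirely to the post-processor:

   <!-- Hypothetical DTD fragment: a memo is a title then paragraphs -->
   <!ELEMENT memo  - - (title, para+)>
   <!ELEMENT title - - (#PCDATA)>
   <!ELEMENT para  - - (#PCDATA)>

   <memo>
   <title>Standards survey</title>
   <para>The markup says what each element is, not how to print it.</para>
   </memo>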
9. Internet Services
====================

9.1. TCP/IP
-----------

-- Origin --

This is actually two different standards. RFC 791 defines the Internet Protocol (IP). RFC 793 defines the Transmission Control Protocol (TCP). There is also a User Datagram Protocol (UDP), defined in RFC 768, for packets where the delivery and ordering of the packets are handled by the client software. This is used by the Network File System and the Sun Remote Procedure Call (RPC) library.

IP as defined in RFC 791 has been amended by RFCs 950 (Subnet Extension), 919 (Broadcast Datagrams) and 922 (Broadcast Datagrams with Subnets). There are also a number of mappings between IP and various other network protocols, including Ethernet and X.25.

-- Standard Status --

IP and its amendments are all Required Protocols. TCP is a Recommended Protocol. In practice it would be a very unusual Internet node that did not support TCP. The mappings from IP to other network protocols are Elective (``if you are going to do something like this, you should do exactly this'').

-- Outline --

IP provides a basic packet switching protocol. Packets are not acknowledged, and may be delivered out of order. A checksum is included with each packet, but there is no error correction facility.

TCP is intended to be a highly reliable host-to-host protocol between hosts in packet-switched networks. It is built on top of IP and provides a connection-oriented end-to-end link between pairs of processes running on different host machines. The protocol includes methods for connection between numbered ``ports'' on the two hosts, flow control, automatic retransmission of lost data, and the transmission of precedence and security information. The connection between a port number and an application process is handled by the operating system on the host machine.

This is the de-facto world standard upon which higher-level services are built. Any system which needs to communicate with other systems across a WAN should support this. Numerous third-party implementations are available, including packet switches and routers.

9.2. ARPA Internet Text Messages (RFC 822)
------------------------------------------

-- Origin --

RFC 724, "Proposed official standard for the format of ARPA Network messages", was written by D. Crocker, K.T. Pogran, J. Vittal and D.A. Henderson and released on 12 May 1977. A modified form was adopted as a standard (RFC 733, 21st November 1977). D. Crocker wrote a revised version (RFC 822, 13th August 1982) which has now been adopted as the standard.

-- Standard Status --

Recommended: all Internet sites should support this.

-- Purpose --

To provide a minimum standard for electronic mail with a framework for future expansion.

-- Outline --

RFC 822 is designed to require a little and permit a lot. A message is divided into a ``header'' (a sequence of fields in a format which can be parsed by machine) and a ``body'' (the text of the message). RFC 822 describes a syntax for header fields and lists a set of header fields which must be included. Other headers may be added by various applications. These are permitted by the standard, but apart from the basic syntax of header fields RFC 822 does not specify anything about them. Examples of such headers include ``X-Face'' (a compressed image of the sender), ``X-Mailer'' (the name of the application program used to compose the message) and ``X-Automatic-Reply'' (indicating that the message was generated by some kind of automatic process).

Internet email is increasingly being used as a vehicle for other services, including file transfer, remote job submission, electronic conferencing and software distribution. The general idea is to package some kind of executable script in an email message and send it to an address on the remote machine. Mail to this address is delivered to an automatic server program which performs the appropriate function, packages the results into another message, and mails them back.
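The header/body division is simple enough to sketch. The following Python fragment is illustrative only: the message shown is invented, and the parser ignores the folded continuation lines that RFC 822 permits in real headers.

   MESSAGE = """\
   From: paj@gec-mrc.co.uk
   To: someone@example.com
   Subject: Standards survey
   X-Mailer: hypothetical-mailer 1.0

   The body starts after the first blank line and is plain text.
   """

   def parse(message):
       """Split a message into a dictionary of header fields and a body."""
       head, _, body = message.partition("\n\n")
       fields = {}
       for line in head.splitlines():
           name, _, value = line.partition(":")
           fields[name.strip()] = value.strip()
       return fields, body

   fields, body = parse(MESSAGE)
   print(fields["Subject"])        # -> Standards survey
   print(body)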
Internet electronic mail addresses are of the form:

   alias@site.domain

The ``alias'' part is the name of the recipient. It is largely up to the destination machine to resolve this; most machines allow users to be addressed by a range of aliases, usually including one of the form ``Fore-name.Surname''. The ``site'' part is usually the name of the organisation where the recipient has an account. In some cases it may also be divided by periods. The ``domain'' part allows hierarchical subdivision of the ``site'' namespace. Common domains include:

   +-------+----------------------------+
   | com   | commercial                 |
   | edu   | US academic                |
   | gov   | US government              |
   | mil   | US military                |
   | org   | US non-profit organisation |
   | co.uk | UK commercial              |
   | ac.uk | UK academic                |
   +-------+----------------------------+

USA sites do not usually append a national domain name, reflecting the American origins of the Internet. Other countries have their own domain names, usually based on the ISO two-letter country code (the UK is an exception to this).

The UK Joint Networking Team is responsible for JANET, and specifies a similar email standard to the IAB. The most noticeable difference is that the domain components to the right of the ``@'' sign are reversed, so that in the UK paj@gec-mrc.co.uk becomes paj@uk.co.gec-mrc (the sketch at the end of this section shows the transformation). This occasionally causes problems.

UK Government departments, and commercial organisations which do most of their business with the government, will want X.400 mail. The rest of the world will want RFC 822 mail. A number of organisations will want both. RFC 1148 proposes a mapping between the two standards.
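The JANET ``big-endian'' domain ordering mentioned above amounts to a simple component reversal, sketched here as a toy transformation in Python (not a real gateway implementation, which must also cope with sites known under different names on each network):

   def janet_form(address):
       """Reverse the domain components to the right of the '@' sign."""
       local, _, domain = address.partition("@")
       return local + "@" + ".".join(reversed(domain.split(".")))

   print(janet_form("paj@gec-mrc.co.uk"))   # -> paj@uk.co.gec-mrc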
9.3. X.400 - Internet Email Mapping (RFC 1148)
----------------------------------------------

-- Origin --

RFC 987, "Mapping between X.400 and RFC 822" by S.E. Kille, was released on 1st June 1986, and updated by RFC 1026 (1st September 1987) and RFC 1138 (1st December 1989). The latest version is RFC 1148, "Mapping between X.400(1988) / ISO 10021 and RFC 822" (1st March 1990), which contains minor clarifications to RFC 1138. The work which led to RFC 1138 was partly sponsored by the Joint Networking Team.

-- Standard Status --

Listed as an IAB Elective standard. The IAB defines an ``Elective'' standard as ``if you are going to do something like this then you should do exactly this''.

-- Purpose --

To define a mapping between the X.400 and RFC 822 electronic mail standards. The design goals were:

1: The specification should be pragmatic. There should not be a requirement for complex mappings for ``academic'' reasons. Complex mappings should not be required to support trivial additional functionality.

2: Subject to (1), functionality across a gateway should be as high as possible.

3: It is always a bad idea to lose information as a result of any transformation. Hence, it is a bad idea for a gateway to discard information in the objects it processes. This includes requested services which cannot be fully mapped.

4: All mail gateways actually operate at exactly one level above the layer on which they conceptually operate. This implies that the gateway must be cognisant not only of the semantics of objects at the gateway level, but also of higher level semantics. If meaningful transformation of the objects that the gateway operates on is to occur, then the gateway needs to understand more than the objects themselves.

5: The specification should be reversible. That is, a double transformation should bring you back to where you started.

-- Outline --

RFC 1148 defines a ``gateway'': in electronic mail terminology, a component that performs protocol mappings. Unfortunately RFC 822 and X.400 do not map on to each other well. Services in X.400 which are not defined in RFC 822 are mapped on to extension headers to avoid information loss, but there is no guarantee that RFC 822 mailers will do anything with these. For instance X.400 has a service to set an expiry date on messages. This is mapped on to a new header (``Expiry-Date:'') which RFC 822 systems will ignore unless specially programmed to process it. In general the only RFC 822 mailer likely to recognise these fields is another RFC 822 - X.400 mail gateway. In the other direction, RFC 822 headers are either mapped onto standard X.400 services or jammed into an extension service, ``RFC 822 Header Field''.

The biggest problem in the mapping is addresses. RFC 822 addresses are of the form "user@site.domain". X.400 addresses are a sequence of attribute-value pairs (see the section on X.500 for more information). To solve this problem RFC 1148 defines the following:

* A mapping between the ``user'' part of the RFC 822 address and the ``PersonalName'' attributes of the X.400 address.

* A system of ``associations'' between the ``site'' and ``domain'' parts of the RFC 822 address and various other attributes.

The imperfection of the RFC 822 - X.400 mapping is a regular annoyance for those who must transmit their email through such gateways. This situation is unlikely to improve.

9.4. Distributed Electronic Conferencing (RFC 1036)
---------------------------------------------------

-- Origin --

RFC 850 by M.R. Horton, released on 1st June 1983. Obsoleted by RFC 1036.

-- Standard Status --

RFC 1036, "Standard for interchange of USENET messages", by M.R. Horton and R. Adams. Released 1st December 1987. This is not recognised as a standard by the Internet Activities Board. Despite this it is the de-facto world standard for electronic message broadcasting (as opposed to the point-to-point electronic mail standards of X.400 and RFC 822).

-- Purpose --

To define a standard header format for broadcast messages on electronic conferences such as USENET.

-- Outline --

RFC 1036 extends the RFC 822 electronic mail standard by adding a number of extra fields. Any message conforming to RFC 1036 also conforms to RFC 822.

USENET is a world-wide distributed electronic conference system. It is divided into a hierarchy of ``newsgroups'', each one of which has a particular topic. The USENET distribution mechanism uses a tree structure. Each node in the network is connected by some transport mechanism to a small number of neighbouring nodes. When a node receives a message from one of its neighbours, it stores that message on disk and forwards a copy to all its other neighbours, who in turn forward it to their neighbours. In this way any message ``posted'' to a USENET group will spread through the network.

Although the USENET message format is specified in an Internet RFC, USENET distribution is not tied to the Internet. USENET articles can be transmitted by the same means as any other data, including packet switching networks, high speed modems, and a floppy disk carried from one site to another.

There is no equivalent to RFC 1036 in the X.400 world, but something could certainly be defined if a project required it. This could then be submitted to CCITT as the basis for a standard. However it would be difficult to avoid the problem of mapping between the two systems that bedevils X.400 - RFC 822 gateways.
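The store-and-forward flooding described above can be sketched in a few lines of Python. The topology and message-ID below are invented; note that real news systems also use the message-ID to discard articles they have already seen, which is what stops an article circulating forever when the topology contains loops:

   NEIGHBOURS = {"mrc":    ["gec", "ulcc"],     # hypothetical topology
                 "gec":    ["mrc"],
                 "ulcc":   ["mrc", "nsfnet"],
                 "nsfnet": ["ulcc"]}

   seen = {}   # node -> set of message-ids stored on disk there

   def post(node, msg_id, from_node=None):
       """Store the article, then forward it to every other neighbour."""
       if msg_id in seen.setdefault(node, set()):
           return                               # already stored: stop here
       seen[node].add(msg_id)
       for n in NEIGHBOURS[node]:
           if n != from_node:
               post(n, msg_id, from_node=node)

   post("mrc", "<1994Jul15.114807@gec-mrc>")
   print(sorted(seen))    # every node has received the article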
--
Paul Johnson (paj@gec-mrc.co.uk)            | Tel: +44 245 473331 ext 3245
--------------------------------------------+----------------------------------
You are lost in a twisty maze of little     | GEC-Marconi Research is not
standards, all different.                   | responsible for my opinions