From paj@uk.co.gec-mrc Wed Jul 27 16:20:26 1994
From: paj@uk.co.gec-mrc (Paul Johnson)
Newsgroups: comp.std.misc,comp.misc
Subject: Software, System and Applications Standards
Date: 15 Jul 94 11:48:07 GMT
Organization: GEC-Marconi Research Centre, Great Baddow, UK
X-Newsreader: TIN [version 1.2 PL1]

Software, System and Applications Standards
===========================================

by Paul Johnson.

Version: 1.3
Date: 94/06/30

The copyright in this document is the property of GEC-Marconi
Limited.  This document may be copied freely, provided that no charge
is made for the information.  All copies must include this copyright
statement.

0. Abstract
===========

This collection of notes on standards provides information about open
systems computing standards that may be encountered.  The intention
is to fill the gap caused by the general lack of accessible summary
information on technological standards, which makes it difficult to
find out what standards actually exist in any given area, or how
widely they are actually used.

This report will be of particular use to those who are confronted
with the need to comply with open systems standards, but who do not
have the background knowledge that enables them to identify sources
of information on these standards.  It also forms a useful overview
of the open systems standards that are relevant in particular areas,
and assesses the impact of these standards on GEC-Marconi businesses.

The report is mainly concerned with official standards promulgated by
international organisations such as ISO and CCITT.  However it also
includes a number of de-facto and proprietary standards where those
have had significant market impact.

The UK government is legally required to specify open systems when
inviting bids for information systems.  Other large customers may
well follow suit.  The appropriate standards have been collected
together under the name ``GOSIP''.  This is a ``meta-standard'' which
specifies the options that may be used by government departments in
various areas of information technology.  The idea is to ensure that
applications used by government departments can work together.  The
ISO and CCITT standards mandated by GOSIP are included in this
document.

The Internet is a world-wide collection of academic and commercial
networks which grew out of the DARPA-Net.  The Internet Activities
Board (IAB) is charged with monitoring and co-ordinating the
evolution of Internet protocols.  These protocols may be proposed by
anyone through the issuing of a ``Request For Comment'' (RFC), which
documents a proposed standard.  Some of these protocols have become
world-wide standards and are described in this document.

0.1. Changes Since Last Version
-------------------------------

Ethernet information updated.  X Windows updated.  Minor correction
to Unix history.  Correction to Internet protocols.

1. Introduction
===============

1.1. Why it is on the Net
-------------------------

This document was originally developed for use within GEC-Marconi.
As an experiment, we are making it available over the Net.  We hope
to achieve the following goals (not necessarily in this order):

1: Get feedback, improvements and updates from readers, thereby
   increasing our competitiveness.

2: Garner kudos and respect from the rest of the Net.

3: Help other people.

4: Some of the information in this document came from FAQs made
   available through the generosity of people on the Net.  We would
   like to reciprocate.

1.2. Overview
-------------
This collection of notes on standards provides information about open
systems computing standards that may be encountered.  The intention
is to fill the gap caused by the general lack of accessible summary
information on technological standards, which makes it difficult to
find out what standards actually exist in any given area, or how
widely they are actually used.

The report is mainly concerned with official software standards
promulgated by international organisations such as ISO and CCITT.
However it also includes a number of de-facto and proprietary
standards where those have had significant market impact, and
includes some hardware standards which have significant impact on
software design.

One of the issues in a report of this nature is its scope.  There
are hundreds of open standards, and thousands of commercial ones.
Some of the commercial standards are declared ``open'' by the vendor
(e.g. SPARC and NFS) but are considered to be proprietary by other
bodies (e.g. the UK Government).  In some cases the common standard
used by almost everyone is closed and proprietary, despite the
existence of open standards that cover the same ground (e.g. MS-DOS
vs. Unix).  Covering every standard in existence would be an
expensive and futile exercise.  This document tries to cover the
standards that are most frequently referred to or used today.

The areas of technology covered in this document are:

Meta-Standards:
   There are so many competing standards for so many areas of
   technology that we need standards for which standards we use.
   These are ``meta-standards''.

Data Specification Standards:
   These are languages used for specifying data protocols and file
   formats.

Hardware Standards:
   These include standards that may include a software element
   (e.g. the Ethernet access protocol), but where this software
   would usually be supplied on ROM or as a device driver.

Software Standards:
   This covers operating systems and related areas.  Standards in
   this section would usually be used directly by applications
   software.

Application Standards:
   This covers communications protocols related to specific
   application areas.

Data Exchange Formats:
   These standards cover the storage and transfer of specific types
   of data.

Document Formats:
   This covers markup languages used to describe the format of
   documents.

Internet Services:
   This section could have been included under ``Application
   Standards'', but the Internet Activities Board have defined a
   group of open standards which are completely unrelated to the ISO
   and CCITT world, and this separation is reflected here.

1.3. Structure
--------------

Each of the standards in this document is described under the
following headings:

Origin
   The name of the organisation responsible for the standard, or the
   name of the people who originally invented it.

Standard Status
   The current version of the standard, along with a brief
   indication of the extent to which it is actually used and its
   probable future.

Purpose
   A short description of the problem which this standard is meant
   to solve.

Outline
   A technical overview of the standard, covering the major features
   and highlighting any particular advantages or disadvantages.

References
   Pointers to further information.  Official standards are not
   cited here, since they appear in the Origin and Standard Status
   sections.

2. Meta-Standards
=================

2.1. OSI
--------

Open Systems Interconnection.

-- Origin --

ISO/CCITT.

-- Standard Status --

ISO seven-layer reference model for OSI, issued 1978.  Standard ISO
7498.
OSI protocols are not generally used outside areas where their use is
enforced by regulations (e.g. GOSIP).  OSI protocols are not tested
widely before standardisation and are not based on existing practice.
A number of important techniques such as RPC and stateless protocols
were developed after the model was standardised.

-- Purpose --

The overall aim is to define the architecture for networking machines
in order to provide universal homogeneous connectivity in a
heterogeneous environment, regardless of distance, operating system,
etc.  It seeks to do this by defining a generalised network
architecture, as a layered model, such that functions are mapped to
specific layers, in order that standards developed for each layer
will be well defined as to function and as to how they interface with
the next-higher and/or next-lower layer.

-- Outline --

This is an abstract 7-layer network model, with the lower layers
covering the physical (electrical, optical) connections, signalling
conventions, error handling, routing from sender to receiver, etc.,
and the upper layers providing high-level services for network filing
functions, network services directory, network management etc.

The layering is important in that it decomposes an extremely
complicated problem into a set of manageable subproblems (both from
the point of view of thinking about it, and of producing
implementations), and it allows the layer interfaces to be defined on
functional grounds, leaving each layer totally independent of the
others (thus allowing independent modification of the layers).

OSI is therefore a conceptual model which defines the architecture
for networking hardware and software.  A number of standards are
possible at each of the seven layers, and OSI ``profiles'' will
define particular sets of standards for particular functions,
e.g. MAP.

The seven layers are:

+-------+--------------+
| LAYER | FUNCTION     |
+-------+--------------+
|   7   | Application  |
|   6   | Presentation |
|   5   | Session      |
|   4   | Transport    |
|   3   | Network      |
|   2   | Data Link    |
|   1   | Physical     |
+-------+--------------+

Very briefly, the functions of these layers are as follows.

1: The *Physical Layer* provides the physical means of communication
   - the wires, modems, connectors, signalling conventions - between
   two nodes.

2: The *Data Link Layer*, using the physical layer, furnishes and
   manages the communication link between any 2 nodes on the physical
   medium.  The IEEE 802 specification (to which Ethernet belongs)
   divides this layer into 2 sub-layers:

   a: Logical Link Control (LLC), responsible for transfer,
      sequencing, error checking and addressing, and

   b: Medium Access Control (MAC), responsible for managing access
      to the physical medium.

3: The *Network Layer*, using the data link layer, provides routing
   from point to point within a network of arbitrary complexity.

4: The *Transport Layer*, using the network layer, provides a
   network-wide, error-free data transfer service.  It manages
   network access, message segmentation, error correction, and
   control over data exchanges.

5: The *Session Layer* is the first of the three upper
   application-oriented layers.  Using the transport layer, the
   session layer sets up and manages the dialogue between application
   processes to ensure structured and synchronised exchange at
   transport layer level.
6: The *Presentation Layer* addresses the problem of differences in
   data encoding formats between nodes, using the notions of an
   abstract syntax, which defines a neutral (machine-independent)
   notation for the formal description of data types and values, and
   a transfer syntax, which expresses the data in abstract syntax
   form in a neutral bit-level format for transmission.  The syntax
   used is data-dependent, so the presentation layer selects the
   syntax according to the data type, and handles the translation to
   and from both abstract and transfer syntax.

7: The *Application Layer* is the interface to the user's
   application program.  There is therefore no single standard, but
   a set of procedures, one for each application category --- e.g.
   for file transfer and management, for directory services, for
   network management, for message handling, and for remote terminal
   access.

Note that each layer in one node appears to speak directly to its
peer in another node - the lower layers are invisible wherever you
look from, despite the fact that all data actually goes via layer 1.
Some abbreviated protocol stacks are also in use, where two or more
layers are combined for efficiency.
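The layering idea can be sketched in a few lines of C.  The fragment
below is purely illustrative - the layer names, the ``|'' separator
and the function are inventions of this document, not part of any OSI
standard - but it shows the essential mechanism: each layer treats
everything handed down from above as opaque data and simply prepends
its own header.

    #include <stdio.h>
    #include <string.h>

    /* Illustrative only: each layer wraps the unit handed down from
       the layer above in its own header before passing it on. */
    static void wrap(char *pdu, const char *header)
    {
        char tmp[256];
        sprintf(tmp, "%s|%s", header, pdu);
        strcpy(pdu, tmp);
    }

    int main(void)
    {
        char pdu[256] = "HELLO";   /* application data from layer 7 */
        wrap(pdu, "TRANSPORT");    /* layer 4 adds its header */
        wrap(pdu, "NETWORK");      /* layer 3 adds its header */
        wrap(pdu, "DATALINK");     /* layer 2 adds its header */
        puts(pdu);   /* prints DATALINK|NETWORK|TRANSPORT|HELLO */
        return 0;
    }

The receiving stack strips the headers off again in the reverse
order, which is why each layer appears to talk directly to its peer.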
A list of all OSI related standards and their current status is
printed twice a year in the ACM SIGCOMM journal Computer
Communication Review.

2.2. GOSIP
----------

The UK Government Open Systems Interconnection Profile.

-- Origin --

The UK Government.  Other countries have defined their own
equivalents, so bids for foreign contracts should consult the
customer's local standards.  The US standard is also called GOSIP.
The following refers only to the UK GOSIP.

-- Standard Status --

"GOSIP: Government Open Systems Interconnection Profile Version 4",
issued in 1991.  Available in two versions: "Supplier Set" and
"Purchaser Set".  Updates for 1992 and 1993 are also available.
These are also available as "Electronic GOSIP", presumably the same
documents in a machine-readable format.

A small booklet "Essential Guide to GOSIP" is available for free.
This is the place to start.  All this information is available from
CCTA Publications.  Telephone +44 071 217 3331, and ask for the
"Essential Guide to GOSIP" and the "CCTA Publications List".

-- Purpose --

From the "Essential Guide to GOSIP":

* Simplify planning and procurement of OSI-based communications
  systems.
* Define appropriate options within the base OSI standards.
* Facilitate precise specification of the requirements of an
  administrative user community.
* Ensure that applications can interwork effectively between
  independently purchased systems.
* Provide a basis for controlling change.
* Reduce the risks associated with future procurements.
* Specify the OSI implementation requirements of government bodies
  as a guide to product developers.
* Stimulate the development of conformance and interoperability
  testing.

-- Outline --

GOSIP is divided into a number of sections.  These sections are
called ``subprofiles'' and represent building blocks that can be
selected to meet particular system requirements.  GOSIP consists of
four subprofile sets, each of which contains a number of subprofiles,
each of which is made up of one or more standards.  Customers should
specify which particular subprofiles they require the product to
conform to.  The four subprofile sets are:

GOSIP-S: The supporting services.  Deals with communications
   management and ancillary issues.  Covers naming and addressing,
   security and OSI management.

GOSIP-F: The format of information.  Deals with the syntax for data
   or document interchange.  Covers structured processable data and
   document formats (EDI and ODA) and character repertoires.

GOSIP-A: The application services.  Deals with the top three OSI
   layers.  Covers X.400 messaging, X.500 directory services, FTAM
   file services and terminal services.

GOSIP-T: The underlying transport infrastructure.  Deals with the
   lower four layers of the OSI model.  Covers LANs, WANs and
   relaying systems.

The text of the US GOSIP is available by FTP from Imperial College.

2.3. IAB Official Protocol Standards
------------------------------------

IAB stands for the Internet Activities Board.

-- Origin --

Internet Activities Board.  The Internet has grown from the initial
DARPA-Net to become a world-wide network used by 100,000 companies
and universities (a very rough estimate).  The Internet appears to
be growing exponentially at about 10% per month.

-- Standard Status --

Updated regularly as new standards are adopted.  At the time of
writing the most recent version in the Imperial College repository
was RFC 1250, issued on 1st August 1991.  This is probably out of
date by now.  See your local FTP archive for the latest information.

-- Purpose --

A list of the standards adopted by the IAB for use on the Internet.

-- Outline --

Most standards organisations work by forming a committee to create a
standard from scratch.  Committee members are expected to be experts
in their field and to release drafts for comment.

The IAB takes a very different view.  In their process, anyone can
invent a protocol.  They encourage these inventors to document their
protocols as a ``Request For Comment'', or RFC.  This is done even
when there is no intention that the protocol be adopted as a
standard.  Such protocols are termed ``experimental''.

Protocols that are intended to become standards are first designated
``proposed''.  They are then implemented and tested by several
groups, and a revised RFC may be issued as a result.  Once the
protocol has become stable it may become a ``draft standard'' and
will normally become an IAB standard about 6 months later.

Anyone can submit a document for publication as an RFC.  It will
then be assigned a number and ``published'' (made available on
public FTP repositories).  Once this has been done the RFC can never
be revised.  If changes become necessary then a new RFC number is
issued and the old one is tagged as obsolete.  Not every RFC is
intended as a standard.  Other material covered includes
documentation conventions, comments on past RFCs and technical notes
on various aspects of the Internet.

As well as the progression from experimental to standard protocols,
the IAB also designates protocols as ``Required'', ``Recommended'',
``Elective'' and ``Not Recommended''.

RFC 1250 "IAB Official Protocol Standards" defines this
standardisation process and also lists the current state of all RFCs
that have not been superseded.  RFC 1250 was released in August 1991,
and will probably have been superseded by now.  Consult a current
index for the latest version.

The US Department of Defence has adopted a number of IAB standards
for its own use.
These are:

+-------------------------------+--------+------+--------------+
| Standard                      | Abbrev | RFC  | DoD Number   |
+-------------------------------+--------+------+--------------+
| Internet Protocol             | IP     | 791  | MIL-STD-1777 |
| Transmission Control Protocol | TCP    | 793  | MIL-STD-1778 |
| File Transfer Protocol        | FTP    | 765  | MIL-STD-1780 |
| Simple Mail Transfer Protocol | SMTP   | 821  | MIL-STD-1781 |
| Telnet Protocol and Options   | TELNET | 854  | MIL-STD-1782 |
+-------------------------------+--------+------+--------------+

Note that these MIL-STDs are now somewhat out of date.  The Gateway
Requirements (RFC-1009) and Host Requirements (RFC-1122, RFC-1123)
take precedence over both earlier RFCs and the MIL-STDs.

All current RFCs are available on-line from the Network Information
Centre (NIC) repository in the USA.  For details, send email to the
NIC mail server with the message body ``help: ways_to_get_rfcs''.
The NIC repository is mirrored at Imperial College, but this may be
slightly out of date.

-- Update --

Since this report was completed, the IAB have signed a memorandum of
agreement with ISO, under which the Internet will migrate towards
X.400 addresses.

3. Data Specification Standards
===============================

3.1. ASN.1
----------

-- Origin --

ISO.

-- Standard Status --

Full ISO standard.  ISO 8824 defines the symbolic data description
technique.  ISO 8825 defines rules for encoding such a data
description (the Basic Encoding Rules, or BER).

-- Purpose --

To define a symbolic, human-readable data description technique,
which defines the Type, Meaning and Structure of data, and a set of
rules for encoding such a data description, usually for transmission
across an OSI network.  Note that the symbolic data description
defines Syntax and Grammar, but not Usage, so ASN.1 can be used in
many applications.

-- Outline --

The basis of ASN.1 is a set of Descriptive Statements which define
its use in any particular application.  The Type and Meaning of data
is identified by tagging each data item with an Identifier, while
the Structure is conveyed by the manner and order in which it is
sent.  The Descriptive Statements (which are themselves symbolic)
define how the Identifiers and Structure are to be interpreted.

Each application that specifies use of ASN.1 has to specify its own
set of Descriptive Statements.  Standards that use it include the
Protocol Data Unit encoding in various OSI protocols, such as FTAM
(File Transfer, Access and Management), MMS (Manufacturing Message
Specification), and the CCITT X.400 MHS (Message Handling System,
CCITT recommendation X.409).

-- References --

ComCentre Communique, April 1989.  Further information in "Reading
Abstract Syntax Notation One" by Ralph Purdue of ComCentre, available
from ComCentre.
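To make the encoding rules concrete, here is a minimal C sketch of
how BER represents a small non-negative INTEGER as a tag-length-value
triple.  This is our own illustration rather than anything from ISO
8825; a real encoder must also handle negative values, long-form
lengths and constructed types.

    /* Encode a small non-negative integer using the BER INTEGER
       tag (0x02).  Returns the number of bytes written.  Sketch
       only: long-form lengths and negative values are omitted. */
    unsigned int ber_encode_uint(unsigned long v, unsigned char *out)
    {
        unsigned char tmp[sizeof(unsigned long) + 1];
        unsigned int n = 0, i;

        do {                        /* content octets, low byte first */
            tmp[n++] = (unsigned char)(v & 0xFF);
            v >>= 8;
        } while (v != 0);
        if (tmp[n - 1] & 0x80)      /* top bit set would look negative, */
            tmp[n++] = 0x00;        /* so prepend a zero octet */

        out[0] = 0x02;              /* tag: universal INTEGER */
        out[1] = (unsigned char)n;  /* length (short form) */
        for (i = 0; i < n; i++)     /* content, reversed to big-endian */
            out[2 + i] = tmp[n - 1 - i];
        return 2 + n;
    }

For example, the value 300 encodes as the octets 02 02 01 2C:
INTEGER, two content octets, value 0x012C.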
3.2. XDR (eXternal Data Representation)
---------------------------------------

-- Origin --

SUN Microsystems Inc.

-- Standard Status --

De-facto commercial standard, specification in the public domain.
Available as a software library to be built into application
programs (e.g. from GEC Software).

-- Purpose --

While ASCII files are portable, most binary files are not.  XDR
defines a self-descriptive format for binary files, so that binary
files encoded with XDR can be machine-independent.  Data portability
is of increasing concern, and pressure to provide portability is
likely to increase.  The choices for binary files are to adopt XDR,
or to add knowledge of the source machine into binary files and
translate appropriately on receipt (within the application).  The
latter is more efficient, but is of course limited to the supported
applications and machines.

-- Outline --

XDR specifies a self-descriptive format (keyword+value) for binary
files, so that such files can be machine-architecture independent.
To make use of the XDR standard, programs that write and read binary
files must be modified so that they write and read XDR formatted
data.  This can be done either by implementing the specification
directly in the application program, or by incorporating a pre-built
library of XDR-conformant utilities in the application.

-- Limitations --

Since XDR files contain self-descriptive data they will be larger
than the corresponding non-portable binary files.  This size penalty
will be content dependent.  XDR files will take longer to read and
write than their non-portable equivalents, as a result of the extra
processing and the larger file sizes.  Note also that transmission
of some binary files is not trivial (in particular stream files from
C).

-- References --

"The SUN Network File System: Design, Implementation and Experience"
by R. Sandberg.  SUN document.
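As an illustration, here is a small sketch using the XDR routines
from the Sun RPC library (header and library names vary slightly
between systems, so treat the details as assumptions).  The same
filter routine is used for both encoding and decoding, so writer and
reader cannot drift apart:

    #include <stdio.h>
    #include <rpc/rpc.h>    /* Sun RPC library, including XDR */

    /* One filter routine serialises or deserialises the record,
       depending on how the XDR stream was created. */
    static bool_t xdr_sample(XDR *xdrs, int *count, double *value)
    {
        return xdr_int(xdrs, count) && xdr_double(xdrs, value);
    }

    int main(void)
    {
        char buf[64];        /* holds the machine-independent bytes */
        XDR stream;
        int count = 42, count2;
        double value = 2.5, value2;

        xdrmem_create(&stream, buf, sizeof buf, XDR_ENCODE);
        if (!xdr_sample(&stream, &count, &value))
            return 1;

        /* Any machine with an XDR library can now decode buf. */
        xdrmem_create(&stream, buf, sizeof buf, XDR_DECODE);
        if (!xdr_sample(&stream, &count2, &value2))
            return 1;
        printf("%d %g\n", count2, value2);
        return 0;
    }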
4. Hardware Standards
=====================

4.1. SPARC
----------

-- Origin --

SUN Microsystems Inc.

-- Standard Status --

De-facto standard, specification in the public domain.
Manufacturers include SUN and Hitachi.

-- Purpose --

To provide an industry standard RISC processor architecture.  SPARC
stands for Scalable Processor ARChitecture.  The idea is that
designers can make a range of chips which can all run the same
software.  The number of on-chip registers may change, but the
instruction set and register addresses used by programs will always
stay the same.

-- Outline --

RISC architectures attempt to maximise the effective speed of a
computer.  In a conventional CPU, much of the instruction set is
used only rarely.  These instructions provide little performance
gain because of their rare use, but they cause a performance loss
because the extra decoding hardware needed for them slows down the
machine for all instructions.  RISC designers analyse the tradeoff
between the gain when an instruction is used and the loss when it is
not used, and hence can make a rational decision about each
instruction.

The SPARC architecture introduces a rolling set of register windows.
Each function has 8 ``local'' registers, 8 ``in'' registers and 8
``out'' registers.  When a function call is made, the new function
gets a new set of ``local'' and ``out'' registers, and the old
``out'' registers become the new ``in'' registers.  In this way
function arguments can be passed through the ``in'' and ``out''
registers without the need to transfer them via a stack frame.  In
addition the chip also has 8 ``global'' registers which can always
be accessed.  A SPARC chip may have between 6 and 32 register
windows, giving between 104 and 520 on-chip registers.  Once all the
registers have filled up, the CPU has to start saving windows on the
stack.

The SUN manual "A RISC Tutorial", which describes the SPARC
architecture, also mentions ``tagged arithmetic'', but does not
explain what this is or how it contributes to improved performance.

The instruction set includes floating point instructions, and
current implementations include on-chip FPUs.

SPARC processors are mainly used in SUN workstations and their
emulators.  Their efficiency and future-proof design may also make
them suitable choices for embedded applications, particularly those
where equipment must be maintained and upgraded for long periods in
the future.  The SPARC architecture does not include a definition of
the memory management unit, and this makes it more suitable for
embedded applications than devices such as the 80486, where an MMU
is included on-chip.

-- References --

"A RISC Tutorial", Sun Microsystems, Part Number 800-1795-10,
Revision A, May 1988.

4.2. IBM PC and 80x86 CPUs
--------------------------

-- Origin --

IBM and Intel.

-- Standard Status --

De-facto industry standard.  The 80x86 chips are now made by Intel
and AMD, and ``clone'' PCs and accessories are made by hundreds of
different manufacturers.  Neither of these is an open standard; in
fact both Intel and IBM have tried to keep them closed.  These
attempts have been largely unsuccessful, and this has led in large
part to the downfall of IBM.

-- Purpose --

Originally, to make money for IBM and Intel.  Neither company
expected these products to become important standards.

-- Outline --

Technically these are two different standards: one is a CPU chip,
and the other is a computer architecture that uses it.  In practice
80x86 chips are almost never used outside the PC architecture, and
so they will be considered together.

The term 80x86 is used to cover the following devices: 8086, 80186,
80286, 80386, 80486 and Pentium (sometimes called the '586: Intel
gave it a name after an American court ruled that numbers cannot be
trade-marked).  All these chips share a common architecture and are
upwardly compatible from left to right.  They are also (at the
assembly-language level) upwardly compatible with the 8 bit 8080 and
Z80 CPUs.

Of these chips, the most commonly used outside IBM PCs is the 80186.
This is effectively the same as an 8086 except for a number of
on-chip IO and support devices.  It is intended for embedded
applications.

The 80x86 range is notable for two things:

1: Its non-orthogonal register set.  Most CPUs provide a large
   number of registers which can all be used in the same way.  On
   the 68000 an add instruction can apply to any pair of the 8 data
   registers, or to a data register and a memory location.
   Similarly, any of the 8 address registers can be used to index
   data.  The 80x86 does not provide this.  Instructions are tied to
   particular registers.  This creates problems for optimising
   compilers.

2: The segmented memory architecture, which was originally invented
   to allow upward compatibility with the 8080 and Z80.  The
   original 8086 had a 20 bit address bus, giving 1Mb of addressable
   memory (this was thought to be generous given the expected
   lifetime of the architecture).  Addresses are all 16 bit, with
   the extra 4 bits coming from ``segment registers''.  These are
   also 16 bit registers, but their contents are shifted 4 bits to
   the left before being added to the address computed by the rest
   of the CPU.  There are separate segment registers for code, stack
   and data.  This causes problems for C compilers because they have
   to provide the programmer with a flat address space for pointer
   arithmetic and comparison.  Further problems have been caused in
   the later chips by the expansion to a full 32 bit address bus
   (requiring an extra 12 bits of address to come from somewhere).
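The segment arithmetic is easily stated in code.  The following
function (our illustration, not Intel's) shows how a real-mode 8086
forms a physical address, and why many different segment:offset
pairs alias the same location:

    /* Real-mode 8086 address formation: the 16 bit segment value is
       shifted left 4 bits and added to the 16 bit offset, giving a
       20 bit physical address (hence the 1Mb limit). */
    unsigned long phys_addr(unsigned int seg, unsigned int off)
    {
        return (((unsigned long)seg << 4) + off) & 0xFFFFFUL;
    }

For example, 0x1234:0x0005 and 0x1000:0x2345 both yield the physical
address 0x12345.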
The IBM PC was originally produced by a small team within IBM who
believed that the company should try to cash in on the growth in the
microcomputer market.  This was not expected to be more than a
sideline to IBM's main business, and so the development was done on
a small budget.  This led the development team to buy in their CPU
and operating system rather than develop them from scratch.  This
allowed competitors to build their own ``compatible'' PC clones.
Had IBM retained control of either of these two items then the clone
market would never have developed, and the commercial history of
computers in the 1980s would have been very different.

The 80186 CPU is useful in embedded applications, although it does
not provide the power of more modern chips.  The rest of the range
are normally encountered in IBM PC motherboards.  These can be used
either in stand-alone PCs or in embedded control applications.  The
wide availability of these boards, as well as of IO devices and
development software, makes this a good choice for many projects.
However consideration should be given to the extra programming
problems caused by the 80x86 and IBM PC architectures.  A small
saving on hardware may be swamped by the increase in coding time.

See also the section on MS-DOS and Windows.

4.3. Ethernet
-------------

-- Origin --

In 1976, Metcalfe and Boggs of Xerox PARC published the first
description of Ethernet.  This was later codified into IEEE 802.3,
which defines the Ethernet still in use today.

-- Standard Status --

The standard is now well established and in wide use.  IEEE 802.3
defines speeds up to 20 Mbit/sec, but the de-facto standard is 10
Mbit/sec.  Ethernet cards are available for IBM PCs.  Unix
workstations are usually fitted with Ethernet interfaces as
standard.  A number of packet switches exist for routing data
between different subnets.  Ethernet is starting to look a little
slow, especially for large networks.

-- Purpose --

A LAN for office and light industrial use.

-- Outline --

Ethernet uses Carrier Sense Multiple Access with Collision Detection
(abbreviated CSMA/CD).  All nodes are connected to a common co-axial
cable.  When a station wishes to transmit, it first listens to see
if any other station is transmitting.  If it detects no other
station then it goes ahead, otherwise it waits (this is the CSMA
part of the name).  If two stations start transmitting
simultaneously then hardware in both nodes detects this and stops
the transmission.  The two nodes then wait for a random interval
before retrying (this is the CD part of the name).
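The retry interval is chosen by the ``truncated binary exponential
backoff'' algorithm.  On real controllers this is done in hardware,
but it is simple enough to sketch in C (the constants follow the
published IEEE 802.3 algorithm; the slot-time handling is left
abstract):

    #include <stdlib.h>

    /* After the n-th successive collision a station waits a random
       number of slot times between 0 and 2^k - 1, where
       k = min(n, 10).  A station gives up after 16 failed
       attempts. */
    int backoff_slots(int collisions)
    {
        int k = collisions < 10 ? collisions : 10;
        return rand() % (1 << k);
    }

The randomisation is what breaks the tie between the colliding
stations; doubling the range on each collision adapts the retry rate
to the load on the cable.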
Although data is transmitted over the Ethernet at 10 Mbit/sec,
theoretical studies predict that actual utilisation can only be
about 30% of this.  However according to a report from DEC [1], in
practice utilisation can be as high as 95%.  Above this the number
of collisions rises rapidly and the entire network grinds to a halt.

Since Ethernet is a bus-based LAN, the offered load is roughly
proportional to the number of nodes.  The usual solution to an
overloaded network is to split the LAN into a number of subnets, and
to use intelligent routers to switch messages from one subnet to
another.  High speed fibre optic links are often used to transmit
between subnets.

In the OSI architecture, Ethernet covers layer 1 (physical) and part
of layer 2 (data link).

Ethernet is ubiquitous and cheap.  However it is not suitable for
real-time communication because of the unpredictable transmission
delay.  Since every item of data is transmitted to every node, a
single dishonest node can intercept all transmitted data and may
also be able to masquerade as another node.  These problems can be
overcome by adding cryptographic protocols on top of the Ethernet
standard, at some cost in performance.  This may be important in
some secure applications.

[1] Dave Boggs, Jeff Mogul and Chris Kent.  DEC Western Research Lab
tech report 88-4, "Measured Capacity of an Ethernet: Myths and
Reality".  This report is available as a PostScript file by
anonymous FTP from gatekeeper.dec.com.

5. Software Standards
=====================

5.1. Unix
---------

-- Origin --

The original version of Unix was written on a spare PDP-7 by Ken
Thompson to support a video game.  Since then it has had a long and
complicated history.  The story includes startup companies that
became industry giants, university programmers who rewrote the whole
thing for fun, several incompatible versions, and a number of
standards wars.

-- Standard Status --

The major Unix standard is POSIX (Portable Operating System
Interface), as defined in the IEEE 1003 family of standards.  This
won the standards war of the late 1980s and is now being adopted
industry-wide.  The Unix trademark is now (as of 14 October 1993)
owned by X/Open, the Unix vendor club.

Unix is now the dominant vendor-independent operating system.  As a
result it is frequently specified for large networked systems.

-- Purpose --

POSIX is an attempt to provide a single standard for all
implementations of Unix.  However it is not tied to Unix.  A vendor
of a different operating system could provide the set of shells and
utilities specified in 1003.2 and then claim to be POSIX-compliant.

-- Outline --

Some of the information in this section is taken from the
comp.unix.questions FAQ maintained by Ted Timar.

POSIX standard numbers are of the form 1003.x.  The following values
of x have been allocated, although not all of these documents have
been released:

 0: Open Systems Environment
 1: System Application Program Interface (C language system calls;
    a small example follows this list)
 2: Shell and Utilities
 3: Test Methods
 4: Real-Time Systems Interfaces
 5: Ada Language Binding
 6: Security
 7: System Administration (including printing)
 8: Transparent File Access
 9: FORTRAN Language Binding
10: Supercomputing
12: Protocol-Independent Interfaces
13: Real-Time Profiles
15: Supercomputing Batch Interfaces
16: C-Language Bindings
17: Directory Services
19: FORTRAN 90 Language Binding
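To give a flavour of the 1003.1 C interface, here is a minimal
directory-listing program.  The opendir/readdir calls shown are part
of the standardised API; everything else about the program is merely
illustrative:

    #include <stdio.h>
    #include <dirent.h>    /* POSIX.1 directory access */

    /* List the names in the current directory using the POSIX.1
       directory-reading interface. */
    int main(void)
    {
        DIR *d = opendir(".");
        struct dirent *entry;

        if (d == NULL) {
            perror("opendir");
            return 1;
        }
        while ((entry = readdir(d)) != NULL)
            printf("%s\n", entry->d_name);
        closedir(d);
        return 0;
    }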
As always, vendors are caught between the need to demonstrate that
their products are standard and the need to show that they are
better than anyone else's.  This leads to various non-standard
extensions.  However this should not be too great a problem, since
vendors usually indicate which areas of their product are not part
of POSIX.

A claim that a product is ``POSIX conformant'' does not imply that
everything in the IEEE 1003.x document set is supported.  However a
vendor must produce a POSIX conformance document.  This can be
inspected to see if the product conforms in the areas of interest.

X/Open (a vendor consortium) have produced a series of Portability
Guides known as the XPG series.  They include:

XPG2: Published in 1987, with a strong System V influence.  The
   volumes are:

   1: Commands and Utilities
   2: System Calls and Libraries
   3: Terminal Interfaces, IPC, Internationalization
   4: Programming Languages (C & COBOL)
   5: Data Management (ISAM & SQL)

XPG3 and XPG4 were published in 1989 and 1992 respectively.  There
were huge changes between XPG2 and XPG3 to align with POSIX.1 and to
align partially with the C Standard.  Many of the XPG2 interfaces
and headers were withdrawn.  There were equally drastic changes
between XPG3 and XPG4, including alignment with POSIX.2 and FIPS
151-2, full alignment with the C Standard, and the addition of wide
character interfaces.  Therefore anyone working to these standards
should ensure that they have the latest version, and that anything
purchased as conforming to these standards also conforms to the
latest version.

5.2. X Windows
--------------

-- Origin --

Massachusetts Institute of Technology.

-- Standard Status --

The standard is defined by a ``sample implementation'' from MIT.  In
fact this is the commonest implementation in use, because MIT
distributes the source code for free.  X Windows has now been
adopted as part of POSIX.  Various vendors have modified the sample
implementation and distributed the results without source code, but
no-one seems to have rewritten it.  Such an effort is unlikely: it
would probably cost several million pounds, and the resulting
product would have to compete with the free MIT sample
implementation.

The current version is known as X11 revision 6, or just X11R6.  New
revisions are released by MIT every year or two.  All revisions are
upwardly compatible.  It seems unlikely that there will be an X12 in
the near future.  The release of vendor-modified versions of X
always lags behind the MIT release.

Versions of the X Windows server are available for PCs, and X
Windows is available on various PC versions of Unix.

-- Purpose --

To provide a standard GUI for networked systems, especially Unix.
The original intention was merely to manage the ``real estate'' of
the screen.  However the basic API (known as Xlib) has been
supplemented by a range of GUI libraries which are generally
considered to be part of X Windows.

X Windows works transparently over a network.  A user can run a
program on one machine and interact with it on another.  A single
application can control windows on several machines.

-- Outline --

X11 is available under Unix, Ultrix and VAX/VMS.  It is presented to
application programmers through Xlib, a C procedure library.  The
network communication is hidden underneath Xlib, and other X
libraries are built on top of it.

X11 is based on a client-server model.  For each physical display
there is a controlling server.  Client processes communicate with
the servers via a reliable duplex byte stream with a block stream
protocol layered on top.  Where client and server are on the same
machine the stream is based on a local IPC mechanism; otherwise a
network connection is used.  Client-server connections can be
one-to-one, one-to-many, many-to-one, or many-to-many.

X Windows supports one or more physical screens, with the windows
arranged in a strict hierarchy.  Each screen has a root window
covering the display screen, covered partially or completely by
child windows, which in turn may have their own children.  There is
usually at least one window per application program, and an
application can create a tree of arbitrary depth on each screen.  A
child may extend beyond its parent window, but output to the child
is clipped to the parent window boundaries.  At each level in the
hierarchy, one window is deemed to be dominant (i.e. obscures the
others).

Each window has a border (which may be zero pixels wide), and can
have a background colour (if it does not, it is transparent,
i.e. windows behind it show through).  X does not take
responsibility for the contents of windows: if a window is obscured
and then exposed, X will ask the application to repaint part or all
of the window.  X does, however, provide for storage of graphic
objects called pixmaps (bitmaps if only using 1 pixel plane) at the
workstation.  The application can also elect to have the X server
store obscured window areas as pixmaps so that it can repaint them
itself.

X also fails to define any system for the user to manage windows.
Primitive functions for moving and resizing windows are provided by
Xlib, but the user needs an application called a ``window manager''
to control these things.
As far as X is concerned the window manager is just another
application, and has no special status or privilege.  Users can
therefore pick their own window manager according to taste.  A range
of such programs is distributed with X.

Many X functions return an ID which allows the application to refer
to objects stored on the X server.  These can be of type Window,
Font, Pixmap, Bitmap, Cursor, or Graphic Context.  Fonts and cursors
are normally shared automatically between applications.  Most calls
to Xlib operate asynchronously, but synchronisation can be forced by
calls to XSync (e.g. to wait for a return value).

The X Toolkit ``Xt'' is layered on Xlib.  It acts as a basic GUI
library, defining simple buttons and sliders and providing a
standard protocol for applications to communicate with the window
manager.  GUI objects such as buttons and sliders are known as
``widgets'' in X terminology (short for ``window gadgets'').  Xt
widgets are defined in a strongly object-oriented way, with function
pointers used to provide a common interface to the various widgets.
The result is flexible but complex.  Application programmers are
shielded from this complexity, but programmers writing new widgets
are not.

A number of widget libraries have been produced by extending Xt.
These include the Athena widgets, Open Look and Motif.  Athena is
included in the MIT distribution, but does not appear to be commonly
used.  The standards war between Open Look and Motif has now been
won by Motif; both are commercially available libraries.  At one
point SUN made a bid to corner the standard window system market
with NeWS (Network Extensible Window System), but SUN now supports
both NeWS and X.  NeWS has lost the standards war and seems destined
to fade into obscurity.

A further level of abstraction can be layered on top of Xt: the UIMS
(User Interface Management System).  This allows the user interface
to be separated from the application by providing high-level
extensible tools for building user interfaces, with obvious
advantages for portability.  While X itself is ``value-free'',
i.e. it does not enforce a particular style or look-and-feel on user
interfaces, the particular tool used will tend to result in
stylistic similarities in user interfaces developed with it.  Most
UIMSs are tied to one particular GUI library.  The favourite
candidate for the de-facto GUI standard currently seems to be the
Open Software Foundation library ``Motif''.

Some vendors have released ``X terminals''.  These are small
computers, usually with big screens and an Ethernet interface, that
are designed to run the X server.  They often contain special
graphics hardware and an X server modified to take advantage of
this.  X terminals can provide a cost-effective alternative to Unix
workstations, but they cause a serious increase in network traffic.

Increasing numbers of Unix applications are being released with X
Windows support.  Some will not run on anything else.

GOSIP specifies ``Virtual Terminal'' support (ISO 9040, 9041).  This
provides form-based data entry facilities.  It is not a substitute
for X Windows in more complicated activities.
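To give a flavour of programming at the Xlib level, here is a
minimal client (illustrative only; error handling is omitted).  It
connects to the display named by the DISPLAY environment variable,
creates and maps a window, and exits on a key press:

    #include <X11/Xlib.h>

    int main(void)
    {
        Display *dpy = XOpenDisplay(NULL);   /* connect to $DISPLAY */
        int scr;
        Window win;
        XEvent ev;

        if (dpy == NULL)
            return 1;
        scr = DefaultScreen(dpy);
        win = XCreateSimpleWindow(dpy, RootWindow(dpy, scr),
                                  10, 10, 200, 100, 1,
                                  BlackPixel(dpy, scr),
                                  WhitePixel(dpy, scr));
        XSelectInput(dpy, win, ExposureMask | KeyPressMask);
        XMapWindow(dpy, win);                /* make it visible */
        for (;;) {
            XNextEvent(dpy, &ev);            /* blocks on the queue */
            if (ev.type == KeyPress)
                break;
        }
        XCloseDisplay(dpy);
        return 0;
    }

Note that the client never repaints the window here; a real
application would watch for Expose events and redraw, as described
above.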
5.3. Virtual Terminal (ISO 9040, 9041)
--------------------------------------

-- Origin --

ISO.

-- Standard Status --

ISO standards 9040, 9041.

-- Purpose --

Standard for interactive operations over a network.

-- Outline --

From the comp.protocols.iso FAQ:

The Virtual Terminal (VT) service and protocol specified in ISO 9040
and ISO 9041 allow a host application to control a terminal with
screen and keyboard, and similar devices like printers.  In
addition, not only application-terminal, but also the less common
application-application and terminal-terminal communication is
supported.  Today, only the Basic Class VT, which covers
character-oriented terminals, has been specified.  This service is
comparable to DoD Telnet and the old CCITT X.3/X.28/X.29 PAD
protocol, but much more powerful.  It also includes control of
cursor movement, colors, character sets and attributes, access
rights, synchronization, multiple pages, facility negotiation, etc.

This means that the huge number of classic terminal type definitions
(e.g. in UNIX termcap or terminfo) are unnecessary at each host in
the net, as the VT protocol includes the corresponding commands for
one abstract virtual terminal that only have to be converted by the
local implementation to the actual terminal control sequences.
Consequently, the use of VT means not every host needs to know every
type of terminal.

As with most ISO standards that require general consensus amongst
participating members, the OSI VT has many optional capabilities,
two modes of operation and an almost infinite number of
implementation-specific options.  Profiles may help in reducing the
optionality present (e.g., there exists a Telnet profile for VT).
But it is doubtful that the OSI VT can completely put an end to the
``m * n'' terminal incompatibility problem that exists in a
heterogeneous computer network.

-- References --

"comp.protocols.iso FAQ" by Markus Kuhn.

5.4. MS-DOS & MS-Windows
------------------------

-- Origin --

Microsoft.  MS-DOS was originally written as a ``Quick and Dirty
Operating System'' (QDOS) in about twelve weeks by a programmer
named Tim Paterson (who is said to have regretted it ever since).
Paterson's employer sold it to Microsoft (then known mostly for its
CP/M implementation of Basic), who renamed it ``MS-DOS'' and
licensed it to IBM for their new PC.  The rest is history.

-- Standard Status --

De-facto operating system and windowing system for IBM PCs.

-- Purpose --

MS-DOS seems to have been licensed by IBM because they had to have
something quickly.  Windows was created by Microsoft in response to
the Apple Mac.  Apple provided a user-friendly GUI, and this enabled
them to start invading the market share of IBM and Microsoft.

-- Outline --

MS-DOS provides interrupt handling, a simple file system (the
original version lacked a directory hierarchy) and a simple command
line interpreter.  Its memory allocation system could only handle
640 Kbytes, which caused problems until add-on programs were
developed to handle extra memory.  Even today incompatibilities
between different memory models can cause problems for PC users.

Windows is an MS-DOS compatible operating system which provides a
GUI front end and very limited multi-tasking.  The current version
is 3.1.

One of the major innovations of Windows (at least for the IBM PC
world) was ``Object Linking and Embedding'', or OLE.  This allows
objects from different applications to be placed in a single
document.  A diagram and a spreadsheet can be included in a report.
The spreadsheet figures can be linked to a bar chart in such a way
that when the spreadsheet is changed the bar chart is updated
automatically.
At the time of writing, leaks have been appearing in the trade press
concerning ``Chicago'', the code-name for Windows 4.  It appears
that Windows 4 will be an object-oriented OS, upward-compatible with
Windows 3 but otherwise completely divorced from MS-DOS.

Most business applications software sold today works under
MS-Windows.  Customers are likely to insist that bespoke software
bought from us can interwork with their off-the-shelf packages.
Software developers can develop new packages which are actually a
mixture of standard third-party packages and small amounts of custom
software.  Such packages tend to be cheaper and more flexible than
software developed from scratch.

5.5. Windows NT
---------------

NT stands for ``New Technology''.

-- Origin --

Microsoft.

-- Standard Status --

Proprietary standard.  Microsoft hope that this will become the
de-facto standard to replace MS-Windows.  Its chief competitors are
Unix with X Windows, and OS/2 from IBM (proprietary, and unlikely to
have a long-term future).

-- Purpose --

The heir apparent to the Windows crown.  Microsoft hopes to wean
users from MS-Windows by a combination of more features, true
multi-tasking and upward compatibility.

-- Outline --

From the user's point of view, Windows NT seems to be a bigger
MS-Windows.  NT needs at least an 80486 CPU with about 16 Mbytes of
RAM and a correspondingly huge hard disk.  Microsoft expect that it
will be used for large network disk servers, while individual users
continue to run MS-Windows.  As the average PC grows in power, users
will migrate to NT.  Also, Windows NT is not tied to the 80x86
architecture.  By way of comparison, a Unix with X Windows will run
reasonably well on an 80386 CPU with 8 Mbytes of RAM.

5.6. CORBA (Common Object Request Broker Architecture)
------------------------------------------------------

-- Origin --

Object Management Group (OMG).

-- Standard Status --

Object Management Architecture Guide published.

-- Purpose --

The Object Management Group (OMG) is an international software
industry consortium with two primary aims:

* Promotion of the object-oriented approach to software engineering
  in general.

* Development of common models and a common interface for the
  development and use of large-scale distributed applications (open
  distributed processing) using object-oriented methodology.

-- Outline --

The following text is from the comp.object FAQ.  The extract was
written by Richard Soley, OMG technical director.

In late 1990 the OMG published its Object Management Architecture
(OMA) Guide document.  This document outlines a single terminology
for object-oriented languages, systems, databases and application
frameworks; an abstract framework for object-oriented systems; a set
of both technical and architectural goals; and an architecture
(reference model) for distributed applications using object-oriented
techniques.
To fill out this reference model, four areas of standardisation have
been identified:

* The Object Request Broker, or key communications element, for
  handling distribution of messages between application objects in
  a highly interoperable manner;

* The Object Model, or single design-portability abstract model for
  communicating with OMG-conforming object-oriented systems;

* The Object Services, which will provide the main functions for
  realising basic object functionality using the Object Request
  Broker - the logical modelling and physical storage of objects;
  and

* The Common Facilities, which will comprise facilities useful in
  many application domains, made available through OMA-compliant
  class interfaces.

The OMG adoption cycle includes Requests for Information and
Proposals, requesting detailed technical and commercial availability
information from OMG members about existing products to fill
particular parts of the reference model architecture.  After review
of the responses by the Technical and Business committees, the OMG
Board of Directors makes a final determination for technology
adoption.  Adopted specifications are available on a fee-free basis
to members and non-members alike.

In late 1991 OMG adopted its first interface technology, for the
Object Request Broker portion of the reference model.  This
technology, adopted from a joint proposal (named "CORBA") of
Hewlett-Packard, NCR Corp., HyperDesk Corp., Digital Equipment
Corp., Sun Microsystems and Object Design Inc., includes both static
and dynamic interfaces to an inter-application request handling
software "bus".

Unlike other organisations, the OMG itself does not and will not
develop or sell software of any kind.  Instead, it selects and
promulgates software interfaces; products which offer these
interfaces continue to be developed and offered by commercial
companies.

Implementations of CORBA 1.1 are available from Hewlett-Packard
(with HP Distributed Smalltalk), HyperDesk (runs on several common
architectures), IBM (System Object Model, AIX & OS/2 only) and Sun.

The OMG is basically a vendor club (although it has open
membership).  CORBA has not yet been recognised by ANSI, ISO or
CCITT, but that is the obvious next stage.  In the mean time, CORBA
constitutes the only open standard in the area.

-- References --

"comp.object Frequently Asked Questions" by Bob Hathaway.  October
1993.  Available by FTP from the Imperial College repository.

5.7. SQL: Structured Query Language
-----------------------------------

-- Origin --

IBM.

-- Standard Status --

ISO standard 9075, released in 1989 and ``updated'' in 1992.  The
update tripled the size of the standard.  Implementations exist for
INGRES, ORACLE and many others, but they are not 100% compliant with
the standard and, due to extensions beyond the standard, not
mutually compatible.

-- Purpose --

To provide database-independence for users and applications.

-- Outline --

SQL is a ``language'' for communicating with databases.  A user at a
terminal can prepare an SQL script and execute it in batch mode, or
a program can generate SQL statements to speak to the database.  SQL
provides commands for:

* Data insertion, modification, and deletion
* Query
* Data definition
* Access control
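A short illustrative script shows one statement from each category
(the table and column names are invented for the example):

    -- Data definition
    CREATE TABLE staff (
        name   CHAR(40) NOT NULL,
        title  CHAR(40),
        site   CHAR(20)
    );

    -- Data insertion
    INSERT INTO staff (name, title, site)
    VALUES ('Paul Johnson', 'Research Scientist', 'Great Baddow');

    -- Query
    SELECT name, title FROM staff WHERE site = 'Great Baddow';

    -- Access control
    GRANT SELECT ON staff TO PUBLIC;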
-- Limitations --

There are many areas not covered by SQL, including the data
dictionary, forms, foreign keys, primary keys, and referential
integrity.  In addition, not all implementations are 100% compliant,
and some (all?) extend beyond the standard, so portability is
limited.  Both of these problems may reduce with time as both the
standard and the implementations grow.

6. Application Standards
========================

6.1. X.400 Message Handling System
----------------------------------

-- Origin --

CCITT.

-- Standard Status --

X.400 standard issued November 1988.  The term ``X.400'' is often
used to indicate a collection of standards in the range X.400 --
X.420.  X.400 itself is actually just the system and service
overview.

-- Purpose --

To provide a standard for handling electronic mail on a
store-and-forward basis.  The content of the email is not
interpreted or altered by the system except in certain specific
situations, for instance where character set conversion is
necessary.

-- Outline --

A Message Handling System (MHS) is split up into the following
components:

UA: User Agent.  An application program which interacts with a human
   who is reading or sending electronic mail.

AU: Access Unit.  This allows indirect access to the system, for
   instance by automatically printing out messages and handing them
   over for physical delivery.

MTS: Message Transfer System.  The system responsible for storing
   and forwarding electronic mail.  This in turn is split up into:

   MTA: Message Transfer Authority.  The subsystem responsible for
      moving messages towards their destination.

   MS: Message Store.  The subsystem responsible for holding
      messages until they are either forwarded by an MTA or deleted
      by an AU or UA.

On top of this system a range of services can be built.  The X.420
standard defines the Inter-Personal Message (IPM) system.  This
provides user-to-user electronic mail.  Addressing and routing is
done by the X.500 directory services.

This standard is included in the UK GOSIP.  It is the most widely
accepted official standard, but it is not the most widely used: that
is the RFC 822 Internet Text Messages standard.

6.2. X.500 Directory Services
-----------------------------

-- Origin --

CCITT.

-- Standard Status --

X.500 standard issued November 1988.  As with ``X.400'', the term
``X.500'' is often used to indicate a collection of standards in the
range X.500 -- X.521.

-- Purpose --

Directory services are necessary for two reasons:

1: To isolate users of the network from frequent changes to its
   structure.

2: To provide a more user-friendly view of the network.

The standard specifies how electronic directories of people and
services should be organised.  The specification includes the ways
in which different organisations can arrange for their directories
to work together, and methods of authentication for access and
modification.

-- Outline --

A typical X.500 address will look like this:

    C="GB"
    O="GEC"
    OU="Marconi Research Centre"
    T="Research Scientist"
    CN="Paul Johnson"

The various attribute names are specified in X.520; the list above
is only a sample.  ``C'' stands for ``country'', ``O'' for
``organisation'', ``OU'' for ``organisational unit'', ``T'' for
``title'' and ``CN'' for ``common name''.  Note that CN="Laser
Printer" would be equally valid: directories cover services as well
as human beings.

The X.500 directory is organised as a tree with individuals and
services at the leaves.  At higher level nodes, various
organisations are given authority to manage their own local
namespaces.  At the top level are the various countries, known by
their two-letter ISO codes (e.g. Great Britain = GB).  Below this
are organisations, organisational units and people.  A national
authority is responsible for allocating names and aliases to
organisations.
Each organisation is responsible for the names of its organisational
units, and so on.  The levels in the tree are mapped on to the X.520
attribute names.

Tree-structured databases such as this suffer from efficiency
problems.  A search for a research scientist at GEC-Marconi in Great
Britain can be performed quickly because the geographical area is
known.  A search for a research scientist named Paul Johnson would
have to be sent world-wide.  To avoid this the standard allows
different hierarchies to be used in a ``Yellow Pages'' service.  For
instance a professional organisation such as the IEE could maintain
an X.500 directory of members organised by title and professional
area rather than by employer.  Entries in such a directory would all
be ``aliases'': entries which actually point to real entries
elsewhere.

X.500 is the most widely accepted official standard in this area.
However it is not the most widely used system.  See the section on
Internet Services for more information.

6.3. FTAM (ISO 8571)
--------------------

-- Origin --

ISO/OSI.

-- Standard Status --

ISO standard 8571, BS 7090.  Published in 1989.

-- Purpose --

To provide a transparent, network-wide, file transfer and management
service.

-- Outline --

FTAM is an OSI layer 7 standard which forms part of the UK GOSIP.
It provides network-wide file transfer and access, but does not
actually constitute a file system.  This leads to a dichotomy
between files held on the local file system (which can be accessed
by other application programs) and files available through FTAM
(which have to be transferred to the local machine before they can
be accessed).

In theory a file system such as NFS (q.v.) can make remote files as
accessible as local ones.  In practice this is only feasible for
local area networks.  For wide area networks it makes more sense to
down-load a file to the local network before working on it.
Therefore FTAM should be used for managing files on a WAN, and NFS
or an equivalent should be used on a LAN.  FTAM is part of GOSIP,
but it is not clear whether it must also be used for local area
networks.

6.4. NFS: Network File System
-----------------------------

-- Origin --

SUN Microsystems Inc.

-- Standard Status --

De-facto standard, specification and source code in the public
domain.  Implementations exist for SUN, VAX/ULTRIX, VAX/VMS (server
side only), Apollo, and IBM PC (client side only).

-- Purpose --

To provide a transparent network-wide file system, i.e. to provide
network-wide access to files (and directories) without the user or
program having to know where the files reside.  It should work in
mixed networks (provided that each machine supports NFS).  NFS is
designed to be portable to other machine architectures and operating
systems.  In addition, NFS aims to allow clients and servers to
recover from machine or network failures.

-- Outline --

NFS is implemented on top of a Remote Procedure Call (RPC) package
to simplify protocol definition and implementation, and uses the
eXternal Data Representation (XDR) to describe protocols in a
machine and system independent way.  To make NFS transparent to
applications, the generic filesystem operations are separated from
specific filesystem implementations.  The generic filesystem
supports two kinds of operation: operations on the filesystem (using
the Virtual File System, VFS), and operations on the files within
the filesystem (using the Virtual Node, vnode).

NFS consists of three components:

1: The NFS protocol.  This uses the SUN RPC mechanism.  The RPC
   mechanism is synchronous, so it behaves exactly like a local
   procedure call, which makes it easy to use.  In addition, the
   protocol is stateless, i.e. each procedure call contains all of
   the required information in the call parameters, so there is no
   state history to be maintained (or re-established after a crash).
   This means that neither client nor server has to deal with crash
   recovery (a sketch of this appears after this list).

   NFS is transport independent - it currently uses the DARPA User
   Datagram Protocol (UDP) and Internet Protocol (IP), but could
   switch to others without altering the higher level protocols.
   The NFS protocol and RPC utilise the SUN XDR specification.  The
   NFS protocol supports directory and file operations including
   create and delete directory, rename, create, look up and remove
   a file, read, write and truncate a file, read from directory,
   change file attributes, etc.

2: The server side.  Because the NFS server is stateless, when
   servicing a request it must commit all modified data to stable
   storage *before* returning results.  This includes the data
   directly modified, and any consequential changes (e.g. directory
   changes resulting from a file change).

3: The client side.  For compatibility with existing UNIX
   applications, NFS uses a UNIX-style pathname.  However, the
   host-name lookup and file address binding are done once per
   filesystem via the mount command, which means that files are not
   available to the client until the mount is completed.  The VFS
   and vnode interfaces hide the differences between file systems
   from the applications, making NFS transparent to different
   filesystems.
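Statelessness is easiest to see in the shape of the requests
themselves.  The following C fragment is a caricature, not the real
NFS protocol definition: the point is that every read names the file
handle, offset and count explicitly, so the server needs no memory
of any earlier call.

    /* Illustrative only -- not the actual NFS protocol types.
       Every request is self-contained, so the server keeps no
       per-client state and a crashed server can simply be
       retried. */
    struct read_args {
        unsigned char fhandle[32];   /* opaque handle naming the file */
        unsigned long offset;        /* where to start reading */
        unsigned long count;         /* how many bytes to return */
    };

Contrast this with a stateful design, where the server would have to
remember an open-file table and a current seek position for every
client.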
The RPC mechanism is synchronous, so it behaves like a local procedure call, which makes it easy to use. In addition, the protocol is stateless: each procedure call contains all of the required information in its parameters, so there is no state history to be maintained (or re-established after a crash). This means that neither client nor server has to deal with crash recovery. NFS is transport independent: it currently uses the DARPA User Datagram Protocol (UDP) and Internet Protocol (IP), but could switch to others without altering the higher level protocols. The NFS protocol and RPC utilise the SUN XDR specification. The NFS protocol supports directory and file operations including create and delete directory, rename, create, look up and remove a file, read, write and truncate a file, read from directory, change file attributes, etc.

2: The server side. Because the NFS server is stateless, when servicing a request it must commit all modified data to stable storage before returning results. This includes the data directly modified, and any consequential changes (e.g. directory changes resulting from a file change).

3: The client side. For compatibility with existing UNIX applications, NFS uses a UNIX-style pathname. However, the host-name lookup and file address binding are done once per filesystem via the mount command, which means that files are not available to the client until the mount is completed. The VFS and vnode interfaces hide the differences between file systems from the applications, making NFS transparent to different filesystems.

Note that NFS does not itself support file locking. Instead SUN provides a separate file and record locking mechanism based on RPC. Because file locking is inherently stateful, there is also a status monitor which allows the lock manager to unlock files after a crash.

Of possibly greater significance is the fact that concurrent write access is not restricted by NFS: file modifications are locked at the inode level, which prevents two processes intermixing data from a single write. However, since NFS does not maintain locks between requests, and a write may span several RPC requests, two clients can intermix data on long writes. This follows Unix practice, which likewise does not serialise concurrent writers.

At present NFS is the nearest thing to an open standard for a file system that can work transparently over a network (FTAM is not capable of this). It is also a de-facto standard with a large number of installations world-wide. As such it is probably the system of choice for any large networked installation.
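The statelessness of the protocol is its key design decision, and is easy to illustrate. The following toy model in Python (invented for this report; real NFS uses SUN RPC and XDR, and the file handle format here is imaginary) shows a read operation in which every request is self-contained:

   # Sketch of the stateless idea behind the NFS protocol. Every request
   # carries all the information needed to service it, so the server
   # keeps no per-client state and crash recovery is trivial.

   FILES = {0x01: b"The quick brown fox jumps over the lazy dog\n"}

   def nfs_read(fhandle, offset, count):
       """Service a read: the (fhandle, offset, count) triple is complete
       in itself; there is no notion of an 'open file' on the server."""
       data = FILES[fhandle]
       return data[offset:offset + count]

   # Two reads of one logical transfer. The server could crash and
   # restart between them without the client noticing anything but delay.
   print(nfs_read(0x01, 0, 19))
   print(nfs_read(0x01, 19, 25))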
-- Limitations --

While NFS allows access to files in a mixed environment, this is only really useful if the files themselves are ``portable''. NFS is therefore relevant to ASCII files, and to binary files in standard formats (e.g. XDR). It is not relevant to task images, program object code, or non-standard binaries. This limit would not normally apply in a single-vendor network, where we might expect all file types to be compatible. Note also the point above about the lack of concurrent write control.

-- References --

"The SUN Network File System", Russel Sandberg, SUN, 1986.

7. Data Exchange Formats
========================

7.1. Graphics Interchange Format
--------------------------------

-- Origin --

CompuServe, a commercial electronic conferencing service.

-- Standard Status --

Used to be the de-facto standard, but is now being replaced by JPEG.

-- Purpose --

Developed as a device-independent method of storing pictures.

-- Outline --

Pictures suitable for GIF encoding must use a palette of not more than 256 colours. This palette is stored with the picture. The picture is stored as a series of 8 bit indices into the palette, compressed with the LZW algorithm. Apart from the palette quantisation, GIF is a lossless picture compression method. A 1024x768 pixel picture with 256 colours takes about 660 Kbytes (uncompressed, the pixel data alone would occupy 768 Kbytes at one byte per pixel).

Note that converting a GIF picture to JPEG is a bad idea: the dithering required for palette quantisation in GIF looks like fine detail to JPEG.

7.2. JPEG
---------

JPEG is pronounced ``jay-peg''.

-- Origin --

The Joint Photographic Experts Group, a sub-committee of ISO.

-- Standard Status --

ISO standard, DIS 10918. One of the options for the compression algorithm (Q-coding) is patented.

-- Purpose --

A standard file format and compression algorithm for full colour pictures.

-- Outline --

The best brief introduction to JPEG is to be found in the comp.compression FAQ. The following information is quoted from the FAQ.

JPEG works on either full-colour or gray-scale images; it does not handle bi-level (black and white) images, at least not efficiently. It doesn't handle colourmapped images either; you have to pre-expand those into an unmapped full-colour representation. JPEG works best on ``continuous tone'' images, usually those of natural real-world scenes. It does not work so well on non-realistic images, such as cartoons or line drawings, which have many sudden jumps in colour values. Standards for compressing bi-level (1-bit-per-pixel) images and motion pictures are being worked on by other committees, named JBIG and MPEG respectively.

Regular JPEG is ``lossy'', meaning that the image you get out of decompression isn't quite identical to what you originally put in. The algorithm achieves much of its compression by exploiting known limitations of the human eye, notably the fact that small colour details aren't perceived as well as small details of light-and-dark. Thus, JPEG is intended for compressing images that will be looked at by humans. If you plan to machine-analyse your images, the small errors introduced by JPEG may be a problem for you, even if they are invisible to the eye. The JPEG standard includes a separate lossless mode, but it is not widely used and does not give nearly as much compression as the lossy mode.

Note that JPEG is not suitable for binary images such as documents; JBIG should be used instead. Any high-volume use of JPEG will require dedicated hardware. Such hardware is available, either as a chipset or as expansion boards for IBM PCs. See the comp.compression FAQ for a list of devices and boards.
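The trade-off between quality and file size is easy to demonstrate. The sketch below uses the Pillow imaging library for Python, a modern tool chosen purely for illustration; it is no part of the JPEG standard, and the file names and quality values are arbitrary:

   # Sketch of the lossy quality/size trade-off in JPEG (illustrative
   # only; requires the Pillow library).
   import os
   from PIL import Image

   img = Image.new("RGB", (256, 256))
   img.putdata([(x, y, (x + y) // 2)            # a smooth colour ramp:
                for y in range(256)             # ideal JPEG material
                for x in range(256)])

   for quality in (95, 75, 25):
       name = "ramp_q%d.jpg" % quality
       img.save(name, quality=quality)          # higher quality = bigger file
       print(name, os.path.getsize(name), "bytes")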
-- References --

"comp.compression Frequently Asked Questions" by Jean-loup Gailly. Available from the Imperial College repository. October 1993. Contains a good introduction to many aspects of data compression, along with references to standard text books and current research. The following references are taken from its reference list.

"The JPEG Still Picture Compression Standard" by Gregory K. Wallace, Communications of the ACM, April 1991 (vol. 34 no. 4), pp. 30-44. A good technical introduction to JPEG. Adjacent articles in that issue discuss MPEG motion picture compression, applications of JPEG, and related topics.

"The Data Compression Book" by Mark Nelson. This book provides excellent introductions to many data compression methods including JPEG, plus sample source code in C. The JPEG-related source code is far from industrial-strength, but it is a pretty good learning tool.

"JPEG Still Image Data Compression Standard" by William B. Pennebaker and Joan L. Mitchell. Published by Van Nostrand Reinhold, 1993, ISBN 0-442-01272-1. 650 pages, price $59.95. This book includes the complete text of the ISO JPEG standards, DIS 10918-1 and draft DIS 10918-2. Review by Tom Lane: ``This is by far the most complete exposition of JPEG in existence. It's written by two people who know what they are talking about: both serve on the ISO JPEG standards committee. If you want to know how JPEG works or why it works that way, this is the book to have.'' There are a number of errors in the first printing; an errata list is available at ftp.uu.net: graphics/jpeg/pm.errata. At last report, all were fixed in the second printing.

7.3. JBIG Binary Image Compression
----------------------------------

-- Origin --

The Joint Bi-level Image Experts Group, an experts group of ISO and CCITT (JTC1/SC2/WG9 and SGVIII).

-- Standard Status --

Under development. Parts of the proposed standard are patented.

-- Purpose --

To provide a system for compressing binary images (like faxes). This will replace the current group 3 and 4 fax algorithms. The main characteristics of the algorithm are:

* JBIG will be lossless: images will not be changed by the encoding and decoding processes.

* Images can be encoded and decoded sequentially: there is no need for either end to store the entire image.

JBIG works best on bi-level images (like faxes). It also works well on Gray-coded grey scale images of up to about six bits per pixel; this is done by applying JBIG to the bit planes individually. For more bits per pixel, lossless JPEG usually provides better performance. Anything beyond six bits per pixel is usually noise anyway, and so can be ignored.

-- Outline --

The following text is taken from the Usenet comp.compression Frequently Asked Questions (see References), section 74. This extract was written by Hank van Bekkem.

The JBIG parameter P specifies the number of bits per pixel in the image. Its allowable range is 1 through 255, but starting at about P=8, compression will be more efficient using other algorithms. On the other hand, medical images such as chest X-rays are often stored with 12 bits per pixel, while no distortion is allowed, so JBIG can certainly be of use in this area. To limit the number of bit changes between adjacent decimal values (e.g. 127 and 128), it is wise to use Gray coding before compressing multi-level images with JBIG. JBIG then compresses the image on a bitplane basis, so the rest of this text assumes bi-level pixels.

Progressive coding is a way to send an image gradually to a receiver instead of all at once. During sending, more detail is sent, and the receiver can build the image from low to high detail. JBIG uses discrete steps of detail by successively doubling the resolution. The sender computes a number of resolution layers D, and transmits these starting at the lowest resolution layer. Resolution reduction uses pixels in the high resolution layer and some already computed low resolution pixels as an index into a lookup table. The contents of this table can be specified by the user.

This is the obvious standard for any kind of electronic document storage and transmission system. The patented algorithm at its heart is a cause for some worry: any application of this standard would require a patent license from IBM.
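The Gray-coding step mentioned above is simple to sketch. In the following Python fragment (hypothetical helper functions written for this report, not taken from the standard text), adjacent grey levels always differ in exactly one bit, so each bit plane handed to the JBIG coder changes slowly across smooth image regions:

   def gray_code(n):
       """Convert a binary sample value to its Gray-coded equivalent."""
       return n ^ (n >> 1)

   def bit_planes(samples, bits):
       """Split Gray-coded samples into 'bits' separate bi-level planes."""
       coded = [gray_code(s) for s in samples]
       return [[(c >> plane) & 1 for c in coded] for plane in range(bits)]

   # In plain binary, 127 -> 128 flips eight bits at once; Gray-coded,
   # it flips only one.
   print(bin(gray_code(127)), bin(gray_code(128)))
   for plane in bit_planes([126, 127, 128, 129], bits=8):
       print(plane)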
-- References --

"comp.compression Frequently Asked Questions" by Jean-loup Gailly. Available from the Imperial College repository. October 1993.

"Progressive Bi-level Image Compression, Revision 4.1", ISO/IEC JTC1/SC2/WG9, CD 11544, September 16, 1991.

"An overview of the basic principles of the Q-coder adaptive binary arithmetic coder", W.B. Pennebaker, J.L. Mitchell, G.G. Langdon, R.B. Arps, IBM Journal of Research and Development, Vol. 32, No. 6, November 1988, pp. 717-726. (This describes the patented algorithm. See also the other articles about the Q-coder in the same issue.)

7.4. MPEG
---------

-- Origin --

The Moving Pictures Experts Group, a part of ISO.

-- Standard Status --

In January 1992 a Committee Draft of MPEG phase I was released (colloquially called MPEG-I). Its exact name is ISO CD 11172. MPEG-II is presently being developed, and will probably be released some time in 1994. MPEG-I chips are available from a number of suppliers. Some run at data rates of up to 4 Mbits/sec, allowing higher quality video than pure MPEG-I.

-- Purpose --

To define a standard for compressed digital video and audio.

-- Outline --

MPEG-I defines a system requiring about 1.5 Mbits/sec for video with a mono sound track. Frame sizes and rates differ for the American and European standards (to fit in with the American NTSC and European PAL and SECAM analogue video standards). The European standard transmits 288 lines of 352 pixels at 50 fields per second; the fields are then interlaced to give 25 frames per second. 1.5 Mbits/sec was chosen as a target figure for MPEG-I because that is the data rate provided by CD and DAT. MPEG-II will transmit ``entertainment'' quality video and sound at about 4 Mbits/sec.

An introduction to MPEG, along with a regularly updated list of chips and boards, can be found in the Usenet comp.compression Frequently Asked Questions.

MPEG-I will be the standard for medium-quality video in such applications as CD-Interactive and video-phones. Chips are available which work at higher data rates than specified in MPEG-I, although these will not comply with MPEG-II. These chips are aimed at the cable TV and video-conferencing markets.

-- References --

"comp.compression Frequently Asked Questions" by Jean-loup Gailly. Available from the Imperial College repository. October 1993. Contains a brief description of the MPEG algorithm and a list of devices which implement it.

7.5. u-Law and A-Law (G.711)
----------------------------

The ``u'' in ``u-Law'' is actually the Greek letter ``mu''.

-- Origin --

CCITT.

-- Standard Status --

Standard G.711. Fairly widely used. u-Law is used in North America and Japan, and is often implemented on Unix workstations. A-Law is used in the rest of the world, including on international telephone routes.

-- Purpose --

To provide a simple logarithmic compression scheme for audio data.

-- Outline --

G.711 is a lossy compression scheme which compacts linear sound samples of 13 bits (A-Law) or 14 bits (u-Law) down to 8 bits. Like all lossy compression schemes it is designed around imperfections in human perception: a loud sound will ``drown out'' a quiet one, so a compression scheme can afford to add random noise when the signal is loud, provided that it keeps the noise down when the signal is quiet. A log-law compression scheme quantises the input data on a logarithmic scale instead of a linear one. This provides precision (and hence low noise) at low values, while the quantisation errors (and hence the random noise) increase at higher levels, where they are drowned out by the signal.
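The idea can be sketched with the textbook companding formula. Note that G.711 itself specifies a segmented piecewise-linear approximation of this curve, so the Python below is an illustration of the principle rather than a conforming implementation:

   import math

   MU = 255.0

   def ulaw_compress(x):
       """Compand a sample in [-1.0, 1.0] onto a logarithmic scale."""
       sign = -1.0 if x < 0 else 1.0
       return sign * math.log(1.0 + MU * abs(x)) / math.log(1.0 + MU)

   def ulaw_expand(y):
       """Inverse companding: recover an approximation of the sample."""
       sign = -1.0 if y < 0 else 1.0
       return sign * ((1.0 + MU) ** abs(y) - 1.0) / MU

   # Quiet signals get fine quantisation steps; loud ones get coarse
   # steps, whose errors are masked by the signal itself.
   for x in (0.01, 0.1, 0.5, 1.0):
       y = round(ulaw_compress(x) * 127) / 127.0    # quantise to 8 bits
       print("%5.2f -> %7.4f (error %.5f)"
             % (x, ulaw_expand(y), abs(x - ulaw_expand(y))))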
This is probably the simplest compression scheme for sound. It also provides reasonable quality and a reasonable amount (25-35%) of compression.

-- References --

"FAQ: Audio File Formats" by Guido van Rossum.

8. Document Formats
===================

8.1. SGML: Standard Generalised Markup Language
-----------------------------------------------

-- Origin --

The publishing industry; the Association of American Publishers (AAP) was among the early proponents.

-- Standard Status --

ISO standard 8879.

-- Purpose --

To define a standard way of describing the purpose of individual pieces of data in a text document, in order that the meaning and structure of the text can be extracted by automatic programs. For example, titles, paragraph headings, notes etc. should be identified as such. Data marked up with SGML can therefore be regarded as a very simple sort of database rather than as a simple sequential text file, which adds considerably to its value. In particular, it becomes possible to port the data between different publishing systems without loss of structure.

-- Outline --

Historically, document data contained procedural markup, which conveys how the data will appear in printed form. Commands for indenting, listing, titling, font selection and so on fall into this class, in products such as RUNOFF. Apart from making many global changes difficult, this ties the data to a particular interpreter because the commands are not universal.

SGML is different: it is a descriptive markup language, which labels the data with prescribed categories. The decision to print second-level paragraph headings in double-height underlined Gill Sans is not embedded in the data, but is defined by a separate mapping in a post-processor. (DSSSL, the Document Style Semantics and Specification Language, is currently being defined as a companion standard to SGML to standardise the workings of such post-processors.) Thus the content of the data and the form of its presentation are separated, which makes the data portable between different systems.

In addition, the user can define different document types, with different allowable elements and structures, using DTDs (Document Type Definitions). Thus memos, letters, instruction sheets, amendment sheets etc. can be defined in terms of content and organisation, and can be produced to any standard output format simply by modifying the post-processing instructions. It is, of course, possible for other programs to interrogate such data, since the combination of DTD and document data is self-descriptive. Non-character data can be implicitly embedded in an SGML document simply by storing it in a separate file and embedding a reference to it in the text file.

Note that the Office Document Architecture (ODA), ISO standard 8613, is to some extent competitive with SGML, though it focusses more on the interchange of formatted documents. ODA is specified as part of GOSIP.
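To make the descriptive-markup idea concrete, here is a hypothetical DTD fragment (invented for this report, not taken from any published DTD) defining a trivial ``memo'' document type, followed by a document marked up against it. Nothing in the markup says how a title is to be printed; that decision belongs entirely to the post-processor:

   <!-- Hypothetical DTD fragment: a memo is a title then paragraphs -->
   <!ELEMENT memo  - - (title, para+)>
   <!ELEMENT title - - (#PCDATA)>
   <!ELEMENT para  - - (#PCDATA)>

   <memo>
   <title>Standards survey</title>
   <para>The markup says what each element is, not how to print it.</para>
   </memo>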
9. Internet Services
====================

9.1. TCP/IP
-----------

-- Origin --

This is actually two different standards. RFC 791 defines the Internet Protocol (IP). RFC 793 defines the Transmission Control Protocol (TCP). There is also a User Datagram Protocol (UDP), defined in RFC 768, for packets where the delivery and ordering of the packets are handled by the client software. This is used by the Network File System and the Sun Remote Procedure Call (RPC) library.

IP as defined in RFC 791 has been amended by RFCs 950 (Subnet Extension), 919 (Broadcast Datagrams) and 922 (Broadcast Datagrams with Subnets). There are also a number of mappings between IP and various other network protocols, including Ethernet and X.25.

-- Standard Status --

IP and its amendments are all Required Protocols. TCP is a Recommended Protocol. In practice it would be a very unusual Internet node that did not support TCP. The mappings from IP to other network protocols are Elective (``if you are going to do something like this, you should do exactly this'').

-- Outline --

IP provides a basic packet switching protocol. Packets are not acknowledged, and may be delivered out of order. A checksum is included with each packet, but there is no error correction facility.

TCP is intended to be a highly reliable host-to-host protocol between hosts in packet-switched networks. It is built on top of IP and provides a connection-oriented end-to-end link between pairs of processes running on different host machines. The protocol includes methods for connection between numbered ``ports'' on the two hosts, flow control, automatic retransmission of lost data, and the transmission of precedence and security information. The connection between a port number and an application process is handled by the operating system on the host machine.

This is the de-facto world standard upon which higher-level services are built. Any system which needs to communicate with other systems across a WAN should support this. Numerous third-party implementations are available, including packet switches and routers.

9.2. ARPA Internet Text Messages (RFC 822)
------------------------------------------

-- Origin --

RFC 724, "Proposed official standard for the format of ARPA Network messages", was written by D. Crocker, K.T. Pogran, J. Vittal and D.A. Henderson and released on 12 May 1977. A modified form was adopted as a standard (RFC 733, 21st November 1977). D. Crocker wrote a revised version (RFC 822, 13th August 1982) which has now been adopted as the standard.

-- Standard Status --

Recommended: all Internet sites should support this.

-- Purpose --

To provide a minimum standard for electronic mail with a framework for future expansion.

-- Outline --

RFC 822 is designed to require a little and permit a lot. A message is divided into a ``header'' (a sequence of fields in a format which can be parsed by machine) and a ``body'' (the text of the message). RFC 822 describes a syntax for header fields and lists a set of header fields which must be included. Other headers may be added by various applications. These are permitted by the standard, but apart from the basic syntax of header fields RFC 822 does not specify anything about them. Examples of such headers include ``X-Face'' (a compressed image of the sender), ``X-Mailer'' (the name of the application program used to compose the message) and ``X-Automatic-Reply'' (indicating that the message was generated by some kind of automatic process).

Internet email is increasingly being used as a vehicle for other services, including file transfer, remote job submission, electronic conferencing and software distribution. The general idea is to package some kind of executable script in an email message and send it to an address on the remote machine. Mail to this address is delivered to an automatic server program which performs the appropriate function, packages the results into another message, and mails them back.
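The header/body division is simple enough to sketch. The following Python fragment is illustrative only: the message shown is invented, and the parser ignores the folded continuation lines that RFC 822 permits in real headers.

   MESSAGE = """\
   From: paj@gec-mrc.co.uk
   To: someone@example.com
   Subject: Standards survey
   X-Mailer: hypothetical-mailer 1.0

   The body starts after the first blank line and is plain text.
   """

   def parse(message):
       """Split a message into a dictionary of header fields and a body."""
       head, _, body = message.partition("\n\n")
       fields = {}
       for line in head.splitlines():
           name, _, value = line.partition(":")
           fields[name.strip()] = value.strip()
       return fields, body

   fields, body = parse(MESSAGE)
   print(fields["Subject"])        # -> Standards survey
   print(body)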
Internet electronic mail addresses are of the form:

   alias@site.domain

The ``alias'' part is the name of the recipient. It is largely up to the destination machine to resolve this; most machines allow users to be addressed by a range of aliases, usually including one of the form ``Fore-name.Surname''. The ``site'' part is usually the name of the organisation where the recipient has an account. In some cases it may also be divided by periods. The ``domain'' part allows hierarchical subdivision of the ``site'' namespace. Common domains include:

   +-------+----------------------------+
   | com   | commercial                 |
   | edu   | US academic                |
   | gov   | US government              |
   | mil   | US military                |
   | org   | US non-profit organisation |
   | co.uk | UK commercial              |
   | ac.uk | UK academic                |
   +-------+----------------------------+

USA sites do not usually append a national domain name, reflecting the American origins of the Internet. Other countries have their own domain names, usually based on the ISO two-letter country code (the UK is an exception to this).

The UK Joint Networking Team is responsible for JANET, and specifies a similar email standard to the IAB. The most noticeable difference is that the domain components to the right of the ``@'' sign are reversed, so that in the UK paj@gec-mrc.co.uk becomes paj@uk.co.gec-mrc (the sketch at the end of this section shows the transformation). This occasionally causes problems.

UK Government departments, and commercial organisations which do most of their business with the government, will want X.400 mail. The rest of the world will want RFC 822 mail. A number of organisations will want both. RFC 1148 proposes a mapping between the two standards.
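The JANET ``big-endian'' domain ordering mentioned above amounts to a simple component reversal, sketched here as a toy transformation in Python (not a real gateway implementation, which must also cope with sites known under different names on each network):

   def janet_form(address):
       """Reverse the domain components to the right of the '@' sign."""
       local, _, domain = address.partition("@")
       return local + "@" + ".".join(reversed(domain.split(".")))

   print(janet_form("paj@gec-mrc.co.uk"))   # -> paj@uk.co.gec-mrc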
9.3. X.400 - Internet Email Mapping (RFC 1148)
----------------------------------------------

-- Origin --

RFC 987, "Mapping between X.400 and RFC 822" by S.E. Kille, was released on 1st June 1986, and updated by RFC 1026 (1st September 1987) and RFC 1138 (1st December 1989). The latest version is RFC 1148, "Mapping between X.400(1988) / ISO 10021 and RFC 822" (1st March 1990), which contains minor clarifications to RFC 1138. The work which led to RFC 1138 was partly sponsored by the Joint Networking Team.

-- Standard Status --

Listed as an IAB Elective standard. The IAB defines an ``Elective'' standard as ``if you are going to do something like this then you should do exactly this''.

-- Purpose --

To define a mapping between the X.400 and RFC 822 electronic mail standards. The design goals were:

1: The specification should be pragmatic. There should not be a requirement for complex mappings for ``academic'' reasons. Complex mappings should not be required to support trivial additional functionality.

2: Subject to (1), functionality across a gateway should be as high as possible.

3: It is always a bad idea to lose information as a result of any transformation. Hence, it is a bad idea for a gateway to discard information in the objects it processes. This includes requested services which cannot be fully mapped.

4: All mail gateways actually operate at exactly one level above the layer on which they conceptually operate. This implies that the gateway must be cognisant not only of the semantics of objects at the gateway level, but also of higher level semantics. If meaningful transformation of the objects that the gateway operates on is to occur, then the gateway needs to understand more than the objects themselves.

5: The specification should be reversible. That is, a double transformation should bring you back to where you started.

-- Outline --

RFC 1148 defines a ``gateway'': in electronic mail terminology, a component that performs protocol mappings. Unfortunately RFC 822 and X.400 do not map on to each other well. Services in X.400 which are not defined in RFC 822 are mapped on to extension headers to avoid information loss, but there is no guarantee that RFC 822 mailers will do anything with these. For instance X.400 has a service to set an expiry date on messages. This is mapped on to a new header (``Expiry-Date:'') which RFC 822 systems will ignore unless specially programmed to process it. In general the only RFC 822 mailer likely to recognise these fields is another RFC 822 - X.400 mail gateway. In the other direction, RFC 822 headers are either mapped onto standard X.400 services or jammed into an extension service, ``RFC 822 Header Field''.

The biggest problem in the mapping is addresses. RFC 822 addresses are of the form "user@site.domain". X.400 addresses are a sequence of attribute-value pairs (see the section on X.500 for more information). To solve this problem RFC 1148 defines the following:

* A mapping between the ``user'' part of the RFC 822 address and the ``PersonalName'' attributes of the X.400 address.

* A system of ``associations'' between the ``site'' and ``domain'' parts of the RFC 822 address and various other attributes.

The imperfection of the RFC 822 - X.400 mapping is a regular annoyance for those who must transmit their email through such gateways. This situation is unlikely to improve.

9.4. Distributed Electronic Conferencing (RFC 1036)
---------------------------------------------------

-- Origin --

RFC 850 by M.R. Horton, released on 1st June 1983. Obsoleted by RFC 1036.

-- Standard Status --

RFC 1036, "Standard for interchange of USENET messages", by M.R. Horton and R. Adams. Released 1st December 1987. This is not recognised as a standard by the Internet Activities Board. Despite this it is the de-facto world standard for electronic message broadcasting (as opposed to the point-to-point electronic mail standards of X.400 and RFC 822).

-- Purpose --

To define a standard header format for broadcast messages on electronic conferences such as USENET.

-- Outline --

RFC 1036 extends the RFC 822 electronic mail standard by adding a number of extra fields. Any message conforming to RFC 1036 also conforms to RFC 822.

USENET is a world-wide distributed electronic conference system. It is divided into a hierarchy of ``newsgroups'', each one of which has a particular topic. The USENET distribution mechanism uses a tree structure. Each node in the network is connected by some transport mechanism to a small number of neighbouring nodes. When a node receives a message from one of its neighbours, it stores that message on disk and forwards a copy to all its other neighbours, who in turn forward it to their neighbours. In this way any message ``posted'' to a USENET group will spread through the network.

Although the USENET message format is specified in an Internet RFC, USENET distribution is not tied to the Internet. USENET articles can be transmitted by the same means as any other data, including packet switching networks, high speed modems, and a floppy disk carried from one site to another.

There is no equivalent to RFC 1036 in the X.400 world, but something could certainly be defined if a project required it. This could then be submitted to CCITT as the basis for a standard. However it would be difficult to avoid the problem of mapping between the two systems that bedevils X.400 - RFC 822 gateways.
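The store-and-forward flooding described above can be sketched in a few lines of Python. The topology and message-ID below are invented; note that real news systems also use the message-ID to discard articles they have already seen, which is what stops an article circulating forever when the topology contains loops:

   NEIGHBOURS = {"mrc":    ["gec", "ulcc"],     # hypothetical topology
                 "gec":    ["mrc"],
                 "ulcc":   ["mrc", "nsfnet"],
                 "nsfnet": ["ulcc"]}

   seen = {}   # node -> set of message-ids stored on disk there

   def post(node, msg_id, from_node=None):
       """Store the article, then forward it to every other neighbour."""
       if msg_id in seen.setdefault(node, set()):
           return                               # already stored: stop here
       seen[node].add(msg_id)
       for n in NEIGHBOURS[node]:
           if n != from_node:
               post(n, msg_id, from_node=node)

   post("mrc", "<1994Jul15.114807@gec-mrc>")
   print(sorted(seen))    # every node has received the article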
--
Paul Johnson (paj@gec-mrc.co.uk)            | Tel: +44 245 473331 ext 3245
--------------------------------------------+----------------------------------
You are lost in a twisty maze of little     | GEC-Marconi Research is not
standards, all different.                   | responsible for my opinions