----------------------------------------------------------------------------
                 METADISK (tm) DRIVER TECHNICAL DESCRIPTION
SunFLASH Vol 22 #9                                             October 1990
----------------------------------------------------------------------------

Revision 2.2
David Taber

Abstract
--------

This document provides a technical description of the MetaDisk (TM) Driver
software. The MetaDisk, which is unbundled Sun software, allows physical
partitions to be grouped into larger logical partitions. By creating
virtual devices that conform to standard UNIX device semantics, the
MetaDisk operates transparently to the operating system and applications
software.

The MetaDisk allows mirroring (or shadowing) of data across physical
partitions controlled by a single CPU. It also allows "meta-partitions"
to span physical devices, so that files can be larger than a single disk.

The MetaDisk comprises kernel code, a daemon, several utilities, a
configuration data file, and a status database. This document describes
the concepts and features of these modules as of July 1990 (version 1 FCS).

Notice
------

This document was specially prepared for Sun customers, and is not
standard Sun documentation. The information herein is intended to assist
Sun customers in understanding and evaluating the SPARCserver Manager
product, and is not to be considered as authoritative or as a substitute
for the product documentation and Sun technical support. The information
in this document is believed to be correct, but it may contain errors and
omissions and is subject to change without notice.

This document is Copyright 1990 by Sun Microsystems, Inc. All rights
reserved. No part of this work may be reproduced in any form or by any
means - graphic, electronic, or mechanical - including photocopying,
recording, taping, or storage in an information retrieval system, without
the prior written consent of the Copyright owner (Sun Microsystems, Inc.).

Metadisk is a trademark of Sun Microsystems. UNIX is a registered
trademark of AT&T.

Introduction and Organization
-----------------------------

This document is intended to provide technical information to Sun
customers, to help guide their planning and system integration efforts.
This document contains more information than standard Sun documentation,
and was specially written to be more readily usable than an internal or
external specification.

This document refers to the Metadisk (TM) implementation sold as part of
SPARCserver Manager Release 1. It is anticipated that there will be
significant additions to the Metadisk functionality in future releases, so
customers requiring "future features" information should set up a
non-disclosure briefing via their local Sun sales representative.

This document is organized into the following sections:

  Product Description -- An overview of the product functions and
      features

  Architecture -- A high-level definition of the constituent modules,
      operations, and data flows

  Theory of Operation -- Underlying principles of configuring and using
      a Metapartition

  Features, Specifications, and Limitations -- Expected feature-set,
      performance, and operational bounds

Product Description
-------------------

The MetaDisk (tm) Driver is part of a larger Sun product, the SPARCserver
Manager (tm), or SSM. SSM is an unbundled software product designed for
high-end server and graphics workstation customers desiring enhanced
availability, capacity, and management of mass storage. SSM provides
three major functions: disk mirroring, partition spanning, and enhanced
system administration tools. SSM will run on any Sun-4 system running
SunOS 4.0.3 or greater, and is transparent to OS and applications
software.

The MetaDisk Driver significantly enhances the functionality of mass
storage by allowing an arbitrary number of physical disk partitions to be
treated as a single logical device, called a meta-partition.
A meta-partition may comprise IPI, SMD, or SCSI disks that are "local"
(directly connected) to a CPU. A meta-partition can have any combination
of features active, and a system can have several meta-partitions active
simultaneously. Because the MetaDisk is implemented as kernel software
running in the host CPU, it is more flexible than firmware
implementations, and can be retrofitted into existing SPARC systems at
low cost.

The MetaDisk software has two major operational modes. Once a
meta-partition is configured, it can provide:

1. Disk mirroring for high-availability storage.

   A partition's data (file system or "raw device") is mirrored onto a
   second partition by redundant writing. Reads are alternated between
   the mirrored partitions, and can be concurrent for higher throughput.
   As long as there is no hardware problem, the kernel and applications
   software will operate normally, with a nominal lessening of
   throughput. In the event of a disk or controller failure, the
   MetaDisk allows applications software to continue operating normally
   by using the redundant copy of the data as "instant backup."

   On a disk or controller failure, the following occurs:

   (a) The operating system logs an alarm message, but applications are
       not sent an error message.

   (b) The MetaDisk driver automatically switches to single-disk
       storage. The Metadisk's state database is updated to reflect
       the degraded configuration, and status utilities will indicate
       which disk has been taken off line.

   (c) All applications continue, with somewhat degraded disk
       throughput.

   (d) At some point, the system administrator selects a partition to
       replace the failed disk or controller.

   (e) The system administrator edits the MetaDisk configuration file,
       and runs software utilities.

   (f) The utilities bring the new disk's data "up to date" and
       automatically restart mirroring as soon as the contents are
       synchronized.

   Throughout this process, applications can still read and write data
   to all non-failed disks.
   Once this process is complete, the Metadisk state database is updated
   to reflect the normalized configuration.

   Disk mirroring can also be used to support on-line backup activities,
   by allowing a "snapshot" (mirrored) copy of the data to be backed up
   (with dump, tar, or other utilities) without interfering with users'
   disk read and write activities. Of course, during the backup, system
   throughput will diminish as the backup software consumes system
   resources.

   Disk mirroring is usually used to increase data availability, but it
   can enhance read performance as well. For example, the /export file
   systems (which are often read-only) in an NFS server can be mirrored.

2. Partition spanning for large files.

   By joining a number of partitions into a single logical device, the
   MetaDisk allows files (and file systems) to span disks. Any
   combination of partitions can be joined together, creating
   meta-partitions of up to 2 GB. When using the UNIX file system,
   individual files can grow as large as 2 GB. The file system tends to
   spread accesses throughout the meta-partition, so in multi-user
   systems with multiple disk controllers, reads and writes can be
   concurrent for higher throughput.

   In the event of a disk or controller failure, the "spanning" mode of
   MetaDisk magnifies the unavailability of data. Typically, users that
   need high data availability, as well as large files, should use
   MetaDisk's mirroring mode to supplement the spanning mode.

The MetaDisk can be used by any application, database, or file system on
the local machine, or (via NFS or RFS) can serve data to remote systems.
Use of metadisk services does not require any special privileges:
metadisk can be used like any standard disk drive. Installation,
configuration, and control of the MetaDisk requires Super User
privileges.

SSM's other functionality is the IO-administration tools. IOadmin
provides a consistent, user-friendly interface to storage resources,
based on SunView windows, menus and buttons.
These tools can be used to administer Sun systems from any Sun
workstation on the network. IOadmin operation is completely transparent
to the end-user, and facilitates three major functions: Device
Monitoring, File System Monitoring, and Performance Monitoring.

Architecture
------------

The MetaDisk driver is precisely that: a driver. It is a virtual disk
driver that is "above" the standard disk drivers, interposing two levels
of indirection for disk I/O.

The metadisk's operational features are provided by a UNIX device driver
linked into the kernel. As a driver, it must conform to the semantics
and behavior of UNIX devices. Any function that is "un-device-like" must
be performed by utilities, files, tools, or shell scripts.

The MetaDisk driver is a pseudo device in that it is not associated with
any physical device hardware. It drives other drivers, and re-directs
read and write requests to the proper driver. Because of this, using
certain utilities (notably, fsck and dkinfo) with a meta-partition
(instead of the "real" partition) may produce unexpected or erroneous
results. Consequently, understanding the MetaDisk architecture is
important to using it properly.

The MetaDisk architecture is entirely a software architecture. There are
no requirements or assumptions about the hardware per se, other than the
fact that the storage devices must be disks (and should be winchester
disks with caching controllers). Figure 1 shows the software
architecture of the MetaDisk driver.

                               Figure 1
                 MetaDisk Driver Software Architecture
                          (all CPU resident)

               +---------------------------------+
               |  Applications / User Processes  |
               +---------------------------------+
                       |                 |
          +------------------------------------------+
          |             SunOS 4.0 / 4.1              |
          |                  kernel                  |
          |------------------------------------------|
          |          VNODE          |                |
          |  File System Interface  |      Raw       |
          |                         |     Device     |
          |-------------------------+    Interface   |
          |    NFS    |     UFS     |                |
          +-----------+-------------+----------------+
                |          |   \           /   |
                |          |    \         /    |
          ethernet ---     |     +-------+     |
                           |     |  MDD  |     |
                           |     +-------+     |
                           |      /     \      |
                           |     /       \     |
                       +-------+           +-------+
                       | SCSI  |           |  IPI  |
                       +-------+           +-------+

The operating system kernel is the standard SunOS 4.x, with the bdev and
cdev tables expanded to include meta-partitions. Below the kernel is the
VNODE file system interface, and the standard file systems supplied with
SunOS. The UNIX file system can be mounted on any disk (SCSI, SMD, or
IPI), but for simplicity the figure shows only SCSI.

For raw-device access, the kernel uses the /dev devices (bdevsw/cdevsw)
as a key interface. Again, the interface could be to any disk (SCSI,
SMD, or IPI), but for simplicity, the figure shows only SCSI and IPI.

The MetaDisk acts as a standard UNIX device, so it "poses" as a disk that
can be used by the file system or raw device access. Meta-devices have
standard /dev entries, following the nomenclature /dev/md0, /dev/md1, and
/dev/md2; partitions (minor device numbers) are defined as a-h (0-7).
The /usr/sys/sun4/conf/ file is expanded to include 24 pseudo-devices,
and the MetaDisk driver is linked into a new kernel (via "make"). Once
installed, Metadisk resides in CPU memory at all times.

Figure 1 shows the relationship among the drivers and the file system.
Of course, below the disk drivers there may be multiple controllers and
disk drives, each mounted as separate file systems.
As with any UNIX device, MetaDisk supports each of these partitions (file
systems) as an independent data stream, so users reading or writing to
non-meta-partitions will have their data sent directly to the underlying
drivers. Only data stored within meta-partitions is actually accessed
via the MetaDisk driver. Depending on the file system (and raw device)
configuration, a given disk drive may contain several "standard"
partitions, or several meta-partitions, or a mixture.

It is important to note that UNIX will not prevent any underlying (real)
partition from being accessed directly by users with root privileges,
even if the partition is currently in use by Metadisk. "Going around"
the metadisk in this fashion is very dangerous, and can lead to system
panics: proper training of super-users is, as always, critical.

Mirroring (or concatenation) can be done using partitions on a single
disk, between partitions on different disks, or between disks on
different controllers. While some of the allowable MetaDisk
configurations may be of limited usefulness (for example, mirroring to a
different partition on the same disk), the software does not stop the
Super User from setting them up. This flexibility allows experimentation
and "configuration tuning" that would not be possible in a hardware or
firmware implementation.

The MetaDisk software architecture and implementation make very few
assumptions about the disk and controller hardware, so virtually any
standard hardware configuration can be used with the MetaDisk. The disks
don't have to be the same geometry, or even the same technology. You can
mirror a 1 GB 8" IPI disk partition onto a 300 MB 5" SCSI partition
concatenated with a 700 MB 9" SMD partition, no problem. (The
performance may not be the greatest, but it will work.) The disks can be
on the same controller, or different controllers. However, we cannot
guarantee compatibility with unsupported (third-party) disks and
controllers.

The Metadisk and Optical disks--The Metadisk was designed with winchester
disks in mind: disks with very unusual geometry or performance
characteristics (such as Erasable Optical or floppy disks) are likely to
cause performance or functional anomalies. In any case, the Metadisk is
not tested or certified to work with non-winchester drives, and any use
with such equipment is at the customer's own risk. If a customer wishes
to experiment with mirroring optical drives, the following should be
noted:

  * IF the optical disk behaves like a standard disk drive, and IF its
    driver is well written (particularly with regard to error handling,
    time-outs, and buffer management), the Metadisk would conceptually be
    able to mirror data blocks onto optical disks. Performance would be
    no worse than the underlying speed of the optical drives.

  * Mirroring data blocks is not enough to be useful in several
    applications. Mirroring data blocks would allow two WORM drives to
    mirror each other, or two erasables to mirror each other, but can NOT
    allow a WORM to mirror an erasable, or a WORM to mirror a winchester
    magnetic disk. The structures of file systems (e.g., UFS, PC-DOS, or
    System V) and databases cannot handle the write-once characteristic
    of WORM drives.

  * Erasable opticals can do block-mirrors of winchesters, but the
    performance penalty is huge. Since Metadisk will not signal the OS
    that a write is complete until both copies of the write are complete,
    the worst-performing drive dominates performance. Erasables are
    particularly slow when writing--an order of magnitude slower than
    winchesters. So an erasable optical mirror of a winchester is not
    likely to be acceptable.

  * Erasable opticals are typically in automated "jukeboxes" that allow
    huge capacities at low cost. Mirroring does not comprehend the
    existence or operation of these devices (as MDD was designed for use
    only with fixed-platter winchesters), so several disastrous
    consequences can arise if disk platters are interchanged.
    Coordination and effective use of these jukeboxes is a large software
    task far outside the scope of mirroring.

  * Typically, an interest in mirroring from a winchester to an optical
    drive signifies some other, larger requirement, such as "file
    archiving." Before experimenting with optical mirroring, identify
    and understand the larger requirement.

Internal Interfaces--The Metadisk system comprises several elements not
shown in Figure 1. Figure 2 depicts the relationship among the internal
elements of the Metadisk system, and the flow of control information
among them:

                               FIGURE 2
                     Metadisk Internal Interfaces

      S.User --------> Metadisk Configuration File ("Metatab")
                                  |
                                  |
                                  v
      S.User <-------> Metadisk Utilities
                        ^         ^
                        |         |
                        |         |
                        |         v
                        |     Metadriver
                        |      ^     ^
                        |      |     |
                        |      |     +-----> Metadaemon
                        |      |
                        v      v
                     Metadisk State Database

External Interfaces--The MetaDisk driver itself has no user interface.
It is "used" by reading or writing raw blocks to meta-partitions in the
/dev directory, or by mounting a file system on them. Data is directed
into each mirrored and/or concatenated entity via file naming:

  * For raw device access, using /dev/rmd0a (a mirrored metapartition,
    for example) selects data mirroring. Data written to a standard disk
    device (e.g., /dev/id010c) is not mirrored.

  * For file system access, mounting a mirrored metapartition (e.g.,
    mount /dev/md0a /mirror) will allow the user to select data for
    mirroring by naming the file /mirror/foo. Symbolic links can be used
    to selectively include or exclude files in a file system from
    mirroring, to conserve disk space. (Symbolic links are equally
    useful for concatenated metapartitions.)

The Metadisk configuration is defined in a file (the "metatab") that
describes each mirrored and/or concatenated metapartition. The metatab
is created using the administrator's editor of choice and, like fstab, is
a list of "intentions" that are acted upon with utilities.

The MetaDisk is configured and controlled using several utilities that
can only be run by the Super User's shell or scripts (see Table 1).

                                Table 1
                       MetaDisk User Interfaces

User        Does What               Affects What
----        ---------               ------------
Super User  runs metascript         prepares files and devices for
                                    kernel build
Super User  builds kernel           enables metadisk
Super User  edits metatab           defines intended metadisk config
Super User  runs metastat           prints status report, which is
                                    "empty"
Super User  runs metainit           starts metadisk normal operation;
                                    mirroring/spanning can now be used
End user    reads/writes files      stored data goes through
            in a meta-partition     MetaDisk driver
Super User  runs metastat           makes sure everything is OK
Super User  runs metadetach         disconnects specified metapartition
                                    from its mirror...as part of an
                                    on-line backup process
Super User  runs metattach          reconnects specified metapartition
                                    to its mirror
Super User  runs metasynch          resynchronizes the specified
                                    mirrored pair, while it is available
                                    for read and write use
Super User  runs 8 other utilities  for unusual admin or maintenance
                                    situations

Options:

  * affect specific devices or all metapartitions
  * change read-error threshold (per mirror)
  * change read algorithm (per mirror)
  * allow mirror writes in parallel, or sequence
  * resynchronize mirrors automatically or manually
  * specify resynchronization source ("the good copy")
  * control resynchronization rate

Porting your application to Metadisk--Users and application writers may
wish to use and control metadisk features from within their software, to
provide "turn-key" features for their customers. While we generally
encourage this, it is important to note that THE ADMINISTRATIVE EXTERNAL
INTERFACES TO METADISK WILL BE EXPANDED IN FUTURE RELEASES. Sun will
attempt to make these changes in an upward-compatible way, but cannot
guarantee that software or scripts written to control metadisk release 1
will work without modification in future releases.

Integration with Databases--Metadisk is compatible with all known
applications, file systems, and databases. Some DBMSs have mirroring
built in; others do not. In evaluating database mirroring alternatives,
consider the following:

Metadisk's mirroring has very good performance, as it is done at
logical-block level, as a device driver. The performance under normal
conditions is nearly as good as hardware-based mirroring (which is done
entirely by the controller), but SSM is much more flexible than hardware
implementations. A database's mirroring will actually be somewhat SLOWER
than SSM's (as it is "farther away" from the hardware)... so why would
you want to use a database's mirroring?

A *good* implementation of mirroring within a database is tightly coupled
to the database log and transaction monitor. Consequently, the
database's mirroring can "know" much more about the context of the data
being mirrored, and mirror resynchronization (after a system crash) can
be much faster than SSM's mirroring. In addition, the tighter
integration of mirroring can close certain loopholes and race conditions,
leading to higher data integrity than would otherwise be possible. A
good implementation of mirroring within a database can also maintain
replication consistency as well as or better than metadisk.

It is critical that you ask your database vendor about the features
mentioned in the paragraph above. If they do NOT have ALL these
features, it is probably best to use metadisk mirroring. If they DO have
ALL these features, it is best to use the database's mirroring. Database
mirroring will only cover the database--if you have other critical data
on the same server, you can use mirroring provided by BOTH the DBMS and
metadisk.

Of special interest is the metasynch utility, which is used to
resynchronize mirrored partitions.
Whenever the data on a pair of mirrored partitions do not match (for
example, after a system crash, a disk crash, or an intentional halt of
the mirroring function), they are said to be out of synchronization.
This utility allows the "good" copy of the data to be used by
applications, while simultaneously bringing the "bad" partition up to
date. Because this utility uses specially allocated MetaDisk write
buffers, it is able to run "in the background," and resynchronize the
data while users are "on line" with full data integrity. Note that fsck
and database recovery operations can run in parallel with
resynchronization, so the users will perceive no change in system
recovery time. While the synchronization is in progress, the system is
fully usable with a slight degradation in performance. Once the
synchronization is complete, normal mirroring operation (and performance)
are restored automatically.

If the system has been shut down cleanly (using metafastboot or
metafasthalt), all mirrored data are known to be in synchronization.
Consequently, the reboot sequence skips the resync and fsck processes and
the system is fully operational within a few moments of the boot.

A further area of interest is replication consistency. Replication
consistency is a critical issue for the integrity of mirrored data after
compound errors or failures have occurred. When a disk crashes, its data
become "stale"; if the mirroring software does not recognize this after a
system crash or additional disk crash, the stale data could be used as
the source of resynchronization. This situation can occur more often
than you'd expect, as system problems are often correlated in time and
tend to "cluster" in rapid succession...and the results can be complete
data corruption. The Metadisk architecture includes a state database to
ensure that automatic resynchronization is always correct.
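
To illustrate why recording the state of each copy matters, here is a
minimal sketch in C of choosing a resynchronization source. It is not
the Metadisk state database format or its code, neither of which is
described in this document. All names (md_copy_state, md_mirror_state,
md_mark_stale, md_pick_resync_source) are hypothetical, and the sketch
assumes that a stale mark, once recorded, survives a reboot.

```c
#include <assert.h>

/* Hypothetical per-copy states, as a state database might record them. */
enum md_copy_state { MD_OK, MD_STALE };

/* Hypothetical record for one mirrored pair. */
struct md_mirror_state {
    enum md_copy_state copy[2];
};

#define MD_NO_SOURCE (-1)

/* Record that one copy has failed and no longer holds current data. */
void md_mark_stale(struct md_mirror_state *m, int copy)
{
    assert(copy == 0 || copy == 1);
    m->copy[copy] = MD_STALE;
}

/*
 * Choose the copy to resynchronize FROM.  A copy marked stale is never
 * chosen.  Returns 0 or 1, or MD_NO_SOURCE when neither copy can be
 * trusted, in which case the decision is left to the administrator.
 */
int md_pick_resync_source(const struct md_mirror_state *m)
{
    if (m->copy[0] == MD_OK)
        return 0;
    if (m->copy[1] == MD_OK)
        return 1;
    return MD_NO_SOURCE;
}
```

If copy 0 is marked stale and the system then crashes, a later call to
md_pick_resync_source() returns 1 rather than the stale copy. Without the
recorded mark, nothing would distinguish the two copies after the reboot.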

Replication consistency is guaranteed across power failures and system
failures, but in the first release is NOT guaranteed in case of a system
disk failure. In the case of a system disk crash, metadisk will not
attempt to automatically resynchronize, and human intervention is
required.

Theory of Operation
-------------------

The MetaDisk driver and associated utilities operate within the standard
UNIX paradigms to present a device which acts as a disk partition. This
pseudo device, which actually has no hardware associated with it,
redirects all open(), read(), write(), bread(), bwrite(), and many
ioctl() operations to the appropriate hardware device drivers.

For file system or raw device reads and writes, the MetaDriver introduces
two additional levels of mapping above the real devices by distributing
the block numbers of a meta-partition across the real partitions. The
file system's or raw device's block number is taken as input to the
MetaDisk driver, which then translates the meta-block number to
device-specific logical block numbers. The only difference between
"mirroring" mode and "spanning" mode is the translation algorithm used by
the MetaDisk. The mapping of cylinder, track, and sector to logical
block numbers is handled, as always, by the disk controller microcode and
driver software. The remapping of bad sectors is handled by controllers
and driver software (in the case of SCSI and IPI) or by the format
utility (in the case of SMD), and is not affected by the presence or
operation of the MetaDisk driver.

In mirroring mode, the MetaDisk driver duplicates all writes to both
"component" partitions. Normally, the MetaDisk dispatches the second
write as soon as the first write has been dispatched; it does not wait
for the first write to complete. Optionally, the MetaDisk will wait for
the first write to complete before issuing the second write command.
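
The two dispatch orders can be modeled with a small state record, shown
here as a hedged sketch in C. It is a bookkeeping model only, not driver
code: it performs no I/O, and the names (md_write_mode, md_mirror_write,
md_write_start, md_write_done) are hypothetical. It also encodes the rule
stated earlier that a write is not reported complete until both copies
have been written.

```c
#include <assert.h>

/* Hypothetical names for the two write policies. */
enum md_write_mode { MD_PARALLEL, MD_SEQUENTIAL };

/* Bookkeeping for one logical write to a mirrored pair. */
struct md_mirror_write {
    enum md_write_mode mode;
    int issued[2];   /* 1 once the write to copy i has been dispatched */
    int done[2];     /* 1 once the write to copy i has completed */
};

/* Begin a mirrored write: copy 0 always, copy 1 too if in parallel mode. */
void md_write_start(struct md_mirror_write *w, enum md_write_mode mode)
{
    w->mode = mode;
    w->issued[0] = 1;
    w->issued[1] = (mode == MD_PARALLEL);
    w->done[0] = 0;
    w->done[1] = 0;
}

/*
 * Record completion of the write to one copy.  Returns 1 when the
 * mirrored write as a whole may be reported complete, otherwise 0.
 */
int md_write_done(struct md_mirror_write *w, int copy)
{
    assert(copy == 0 || copy == 1);
    assert(w->issued[copy]);           /* cannot finish an unissued write */
    w->done[copy] = 1;
    if (w->mode == MD_SEQUENTIAL && copy == 0)
        w->issued[1] = 1;              /* second write is dispatched now */
    return w->done[0] && w->done[1];
}
```

In parallel mode both writes are outstanding at once, so the elapsed time
is governed by the slower device. In sequential mode the two write times
add, in exchange for never having both copies in flight together.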

The MetaDisk does not issue the write-completion signal back to the
operating system until both writes have been completed.

Reads, however, are handled differently. Since both copies of the data
are logically identical at all times, the reads can be distributed to
both drives to increase parallelism. The current read algorithm
alternates the read requests across both disks. Performance measurements
indicate that this increases read throughput, but since the degree of
increase is highly dependent on the particular access patterns that
occur, it is very difficult to accurately characterize the improvement in
benchmark tests. Optionally, the MetaDisk can be configured to read from
one disk only, or to read using an algorithm minimizing head movement for
sequential I/O.

When mirroring mode encounters a disk or controller failure, the failed
element is no longer used by metadisk for reading or writing. The
metastat utility will show that the disk has been taken "off line" (i.e.,
not used by metadisk; the disk is still "on line" in terms of the device
interface). The metadisk will not use the failed element until the
system administrator starts the resynchronization process.

In spanning mode, the MetaDisk driver logically joins several physical
partitions. Consequently, the "device" (or file system) is spread across
all the constituent partitions, with the logical block number of each
partition appended to its predecessor. Spanning mode uses monotonically
increasing logical block numbers, and does NOT include any algorithms to
"stripe" the data across the partitions. Reads and writes are
distributed across the partitions, according to the locality of reference
(logical block number) of the data. The UNIX file system tends to fill
cylinder groups evenly, so a file system that has been in use for several
weeks will tend to be distributed across all the partitions spanned by a
meta-partition.
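
The spanning translation just described can be sketched in a few lines
of C. This is an illustration under stated assumptions, not the driver's
source: the structure and function names (md_component, md_target,
md_span_translate) are hypothetical, the dev field is a plain integer
rather than a real SunOS device number, and sizes are counted in logical
blocks.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical description of one real partition in a concatenation. */
struct md_component {
    int  dev;       /* illustrative device id, not a real dev_t */
    long nblocks;   /* partition size in logical blocks */
};

/* Result of translating one meta-block number. */
struct md_target {
    int  index;     /* component holding the block, or -1 if out of range */
    long blkno;     /* logical block number within that component */
};

/*
 * Translate a meta-partition block number.  Components are laid end to
 * end: the first c[0].nblocks blocks map to component 0, the next
 * c[1].nblocks to component 1, and so on.  There is no striping.
 */
struct md_target md_span_translate(const struct md_component *c, size_t n,
                                   long metablk)
{
    struct md_target t = { -1, 0 };
    size_t i;

    assert(c != NULL || n == 0);
    if (metablk < 0)
        return t;
    for (i = 0; i < n; i++) {
        if (metablk < c[i].nblocks) {
            t.index = (int)i;
            t.blkno = metablk;
            return t;
        }
        metablk -= c[i].nblocks;   /* skip past this component */
    }
    return t;                      /* beyond the end of the meta-partition */
}
```

With three components of 100, 50, and 200 blocks, meta-block 120 maps to
block 20 of the second component. Because consecutive meta-blocks stay on
one component until it is exhausted, concurrency comes only from
different requests landing in different regions, not from any one
transfer being split.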

Consequently, in heavily-loaded, multi-user activity, MetaDisk's spanning
mode can increase concurrency of seeks and accesses, and may increase
throughput somewhat. Of course, the degree of increase is highly
dependent on the particular access patterns that occur, and is therefore
very difficult to characterize in benchmark tests.

In spanning mode, any error from the disk or controller is "passed up" to
the originating process without interception. As a failure of any of the
devices in a concatenation will make *all* the data stored in that
concatenation unusable, it is critical to mirror concatenations, or at
least make frequent backup tapes.

A meta-partition can consist of several real partitions, spread across
several physical disks. Differences in disk geometry and performance are
irrelevant to the MetaDisk driver. It is important to note that, in
mirroring mode, the two copies of the data will almost always be mapped
onto different physical sectors of the respective disks. While this is
normal, it can cause confusion if the user (or Super User) is trying to
do something that "expects" an actual single disk drive (this is very
unusual).

Several meta-partitions can operate concurrently; in fact, this is the
expected mode of operation. Each meta-partition is independent, and each
may have mirroring, concatenation, and resynchronization features in
operation at any time.

The utilities used with the MetaDisk driver are straightforward, and are
explained in the previous section. The metasynch utility requires
further explanation, however, because it interacts closely with the
MetaDisk driver itself. When mirrored disks are out of synchronization,
and in general during recovery from hardware failures, special conditions
must be handled properly for recovery to succeed. The metasynch utility
is a special block-copying utility, similar (in concept) to dd.
Metasynch operates on the MetaDisk buffers directly, however, so it is
much more economical than a dd copy.
Further, because these buffers are managed by the MetaDisk driver,
consistency of the mirror is assured. In this way, users may read and
write to a "resynchronizing" mirrored pair, and the writes are consistent
even if the system load is such that resynchronization takes an hour or
more. Great care has been taken to avoid race conditions and
"consistency holes"; so far, no Metadisk bugs have been reported, and no
customer has lost any data due to Metadisk (indeed, many have been saved
by mirroring protection).

The MetaDisk and associated utilities are not difficult to understand,
given knowledge of UNIX devices. However, the interaction between the
file system and meta-partitions is not obvious, particularly at "recovery
time" or during re-boot. Under normal circumstances, the file system
(and any UNIX utilities) operate normally with any meta-partition.
However, when a hardware failure has occurred, care must be taken to
apply the right utilities or actions to the *right devices*. For
example, a mirrored pair of partitions is actually three entities: the
"primary copy", the "secondary copy", and the mirrored pair. Since
mirroring produces two copies that are logically identical *but not
physically identical*, running the file system check utility (fsck)
against the mirrored pair can cause erroneous, or even destructive,
results. Fsck must be run against the constituent (real) devices, and to
do so the mirroring must be temporarily deactivated.

To simplify use of the MetaDisk, we have created a number of system
recovery utilities to handle almost all types of "crash recovery." The
utilities, with special scripts run during the boot sequence, require
manual intervention only in the case of compound hardware failures.

The Metadisk is part of SPARCserver Manager, which contains a series of
window-based system administration tools that greatly simplify the human
interface for controlling the Metadisk.
For further information about SPARCserver Manager, refer to the User's
Guide (part number 800-383-810, catalog number SMG-1.0-X-X-9).

Features and Specifications
---------------------------

Mirroring mode:

  * Mirroring of data between a pair of partitions (which can be
    concatenated, up to 2 GB each)
  * Maximum of 8 pairs of partitions concurrently

Spanning mode:

  * Logical partitions that span a set of partitions
  * Maximum of 4 partitions (2 GB total) spanned per concatenation
  * Maximum of 24 spans concurrently

General:

  * Maximum usable size of a metapartition is 2 GB (this is an OS
    limitation, not metadisk's)
  * Works with any Sun-supported disk and controller
  * Partition sizes ranging from 10 to 1000 MB
  * Runs on Sun-4 CPUs
  * Runs with SunOS 4.x
  * Fully compatible with UFS and raw device interfaces
  * Transparent to hardware and software operation; file system or
    controller functions do not interfere with MetaDisk operation

Code characteristics:

  * The MetaDisk driver and all utilities are written entirely in C
  * MetaDisk kernel code is ~ 50 kB object
  * The utilities total ~ 750 kB object
  * Metatab is a short ASCII file (typically 100-500 bytes)

Performance goals (nominal, highly dependent on system load):

  * Mirroring throughput: ~90% of "standard" I/O. Performance range is
    50 - 135%, increasing with read intensity. With file system,
    NetISAM, or NSE use, users have noticed no performance change.
  * Spanning throughput: ~85% of "standard" I/O. Performance range is
    70 - 100%, increasing with randomness of seek patterns.
  * System crash recovery: 30-45 mins/GB of data, with human activity
    (note that system is on-line, supporting users as soon as normal
    boot process is complete)
  * CPU utilization: 5 - 15% under normal conditions, increasing with
    I/O load. With file system use, users have noticed no performance
    change. During resynchronization, an additional 5 - 15% of CPU
    cycles may be consumed, depending on I/O load and number of
    synchronizations in progress. Users rarely notice that the resync
    is going on.
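
As a worked example of the crash-recovery figure above, the following
small C helper turns the nominal 30-45 minutes per GB into a range for a
given amount of mirrored data. It is a planning aid only, and it assumes
that recovery time scales linearly with data size, which the nominal
figure suggests but which actual system load may not bear out. The names
md_minutes and md_recovery_estimate are hypothetical.

```c
#include <assert.h>

/* Estimated recovery time range, in minutes. */
struct md_minutes {
    double low;
    double high;
};

/*
 * Apply the nominal 30-45 minutes/GB figure quoted above to a given
 * amount of mirrored data.  Linear scaling is an assumption.
 */
struct md_minutes md_recovery_estimate(double gigabytes)
{
    struct md_minutes m;

    assert(gigabytes >= 0.0);
    m.low  = gigabytes * 30.0;
    m.high = gigabytes * 45.0;
    return m;
}
```

For a full 2 GB metapartition this gives roughly 60 to 90 minutes, during
which the system remains on line.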

Limitations:

  * MetaDisk Release One can operate only within a single CPU; that is,
    it can use only directly connected disk devices. Remote mirroring
    is NOT supported.
  * MetaDisk can operate only with disk devices; while it does not
    interfere with tape and communication devices, it cannot use them
    in any way.
  * Meta-partitions operate as standard UNIX devices; any behavior
    atypical of a UNIX device must be handled by utilities, user
    application code, or human intervention.
  * MetaDisk and its associated utilities will work with any Sun disk
    or controller. While MetaDisk's design does not preclude the use
    of third-party devices, there are no assurances that MetaDisk will
    work with hardware not provided by Sun.
  * MetaDisk Release One mirrors in pairs of partitions; that is, each
    mirrored entity consists of a single "primary" (mirrored-from) and
    "secondary" (mirrored-to) pair only.
  * MetaDisk Release One and its associated utilities are building
    blocks for operational systems with high data availability; they
    are intended to help manage a disk or controller failure only. For
    turnkey high-availability systems, it is assumed that the user will
    supplement Metadisk with (1) a knowledgeable system administrator,
    or (2) shell-scripts or application software elements.
  * MetaDisk and its associated utilities have NO features or functions
    that support dual-ported disks or CPU hardware "failover"
    mechanisms of any kind.
  * MetaDisk Release One runs only on Sun-4 SPARC machines. It does not
    run on SPARCstation-1, 1+, SLC, IPC, or other desktop systems at
    this time.
  * MetaDisk Release One does not operate with the system disk (i.e.,
    boot, swap, /, or /usr). All other partitions and file systems can
    be mirrored or concatenated. This limitation is not a big deal,
    but if you absolutely must have this feature, there are several
    Catalyst products available now.
  * MetaDisk Release One does not include algorithms for striping files
    for increased performance.
  * MetaDisk Release One is internally synchronous at all times; that
    is, both writes must be complete before the I/O completion signal
    is returned to the OS. There is no provision for allowing one of
    the copies to "fall behind" the other, for relaxed data integrity,
    or for "first arrived" read algorithms.
  * MetaDisk Release One resynchronizes in units of a partition; that
    is, resynchronization involves copying all of the blocks in a
    partition to its mirror. The first release does not copy only
    updated data...it copies all of it.

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Sunflash is an electronic mail news service from Sun Microsystems, Ft.
Lauderdale, Florida, USA. It is targeted at Sun Users and Customers.
For additional information about SunFlash send mail to
info-sunflash@sunvice.East.Sun.COM

SunFlash is distributed via a hierarchy of aliases. Try to address
change requests to the owner of the alias that you belong to. If you
want to be added to the SunFlash alias, please contact the systems
engineers at your local Sun office and/or send mail to
sunflash-request@sunvice.East.Sun.COM.

"All prices, availability, and other statements relating to Sun or third
party products are valid in the U.S. only. Please contact your local
Sales Representative for details of pricing and product availability in
your region."

Address comments to the SunFlash editor (John McLaughlin) at
sun!sunvice!flash or flash@sunvice.East.Sun.COM. (305) 776-7770.