From: STAR::COUGHLAN "Tom" 3-FEB-1997 11:14:24.16
To: EVERHART
CC: COUGHLAN
Subj: Some inputs for the High Level Design

Glenn,

Friday I offered to give you an outline of the material that you should put in a design document for wide review. Here is a start. This is more fleshed-in than an outline, because I think the introductory sections of your current document need to provide more context. Please don't view this text as a mandate for what you have to write, though. I don't care if you use any of it as-is. I do hope it suggests a level of detail and information content that will help an outside reader.

High-Level Design Specification for Multipath Failover

Introduction

This document describes a proposed method for supporting failover between multiple paths that may exist to peripheral devices in an OpenVMS system.

The goal of this document is to facilitate a wide review of the proposed concepts by all who may be impacted, or may be able to provide valuable input. To facilitate this, the design is presented at a high level, and as succinctly as possible. Additional design details are available from the author. Interested parties may also wish to refer to the PS and IR (and PP?) in the following locations...

Problem Statement

OpenVMS has long supported failover between multiple paths to an MSCP storage device. The paths may include multiple local paths, as well as an MSCP-served path. Since this type of multipath failover is specific to MSCP devices, it was implemented in the MSCP class drivers.

There are three developments that make this type of multipath support inadequate for the future:

1. SCSI clusters provide a configuration in which it is desirable to be able to switch between a local SCSI path and an MSCP-served path.

2. Multi-ported SCSI devices are beginning to become available, and are an important component in high availability systems.
It is expected that all future parallel SCSI RAID controllers will have multiple ports, and all FC devices will be multi-ported.

3. It is expected that the QIO server and the MSCP server will be run simultaneously on some systems in the cluster. This creates a need to switch between two different served paths to a device.

(A diagram here would be helpful.)

In light of these new developments, it is clear that a mechanism for multipath switching that is more general than the current MSCP-only approach is required.

Design Challenges

There are two problems that have to be solved: path switching and naming. (explain, as in Sec. 7 of your document...)

The Hardware Environment

SCSI-3 and Fibre Channel are expected to eventually provide a non-volatile WWN for each storage unit. This will be accomplished via something like an Ethernet ROM in each discrete drive, and in each RAID controller. The WWN for a virtual RAID storage device is expected to be a concatenation of the controller's ROM ID, a timestamp, and an incarnation number, stored with the unit's metadata.

A multi-ported device is required to return the same WWN over all of its ports. This means that the WWN can be used as the mechanism for detecting multiple paths, and as the basis for a path-independent device name.

Multi-port SCSI devices are expected to be implemented in a variety of ways. At least the following three characteristics are expected:

1. A particular logical storage unit is available for reads and writes on one path at a time. The device stays on that path until the host performs an explicit operation to cause a switch. It is highly desirable for such a device to be capable of answering Inquiry commands on either path at any time, so that the host can detect and monitor all the possible paths. It is also desirable for this type of device to provide Unit Attention....

2. A particular logical storage unit is capable of being active on any path, but only one path may be active at any time.

3.
A particular logical storage unit is capable of simultaneous access on any port.

The Hardware Reality

HSZ40/50/70 controllers: There is firmware in field test now that supports multi-path failover on the HSZ40/50/70 controllers. This firmware provides the following functions: (...)

HSZ80 controller: The HSZ80 is expected later this year. It is expected to support the same functions as the HSZ40/50/70 controllers. There is a possibility that it may support real WWNs. (Let's ask Steve)

HSZ22 controller: The HSZ22 is expected later this year. It supports fully simultaneous access on its two host ports. Its naming characteristics are TBD.

The HSG controller: The Fibre Channel controller is expected late this year. It is expected to support WWNs.

Seagate Fibre Channel drives: One-path-at-a-time, real WWNs.

Design Goals (Glenn, this is partial)

Highest priority:
   Failover for non-foreign disks, including system, shadow, quorum, dump.
   Support for HSZ* is the highest priority.
   Alpha only.

Medium priority:
   SCSI to MSCP switching
   MSCP to QIO server switching

Low priority:
   Tapes
   Foreign disks
   Changes to make I/O intercepts more efficient..

Not planned:
   VAX support, other than as described above.

Design Overview:

This design assumes that all paths to a device will have the same OpenVMS device name. The method for ensuring this will be discussed later.

When a system detects a device name for the first time, it checks to see if there is already a UCB for it. If not, it creates one. This will be the "primary" UCB for the life of the system. All I/Os will be issued to this UCB, and from here they may be redirected.

If a system detects that there is already a UCB for a device name, it creates a new UCB, but this UCB is put in a special place....

(Show a diagram of the UCBs, their ports, paths to remote systems, etc.)

When a system detects an error on a path....

Describe the role of the switching execlet, in general terms.

Describe the role of the switch server, in general terms.
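
As a rough illustration of the UCB bookkeeping outlined in the Design Overview, here is a minimal Python sketch. It is not OpenVMS code: the `Ucb` and `UcbRegistry` classes, the `secondaries` list, and the device and path names are all hypothetical. In particular, the text does not say where a non-primary UCB is kept, so the list used here is only a placeholder.

```python
from dataclasses import dataclass, field


@dataclass
class Ucb:
    """Stand-in for a unit control block; the fields are illustrative only."""
    device_name: str
    path: str                                  # hypothetical path label
    secondaries: list = field(default_factory=list)


class UcbRegistry:
    """Tracks one primary UCB per device name, plus any later-found paths."""

    def __init__(self):
        self._primary = {}                     # device name -> primary Ucb

    def register_path(self, device_name, path):
        """Record a newly detected path and return the UCB created for it."""
        ucb = Ucb(device_name, path)
        primary = self._primary.get(device_name)
        if primary is None:
            # First sighting of this name: this UCB stays primary for the
            # life of the system, and all I/O is issued to it.
            self._primary[device_name] = ucb
        else:
            # Name already known: set the new UCB aside. Where it really
            # lives is not specified in the text; this list is an assumption.
            primary.secondaries.append(ucb)
        return ucb

    def primary_for(self, device_name):
        """Return the UCB to which all I/O for this device name is issued."""
        return self._primary[device_name]


# Hypothetical device and path names, for illustration only.
registry = UcbRegistry()
first = registry.register_path("$1$DKA100", "PKA0")
second = registry.register_path("$1$DKA100", "PKB0")
```

After the two calls, `registry.primary_for("$1$DKA100")` returns the UCB created for the first path, and the second UCB is reachable only through it. Redirection of I/O on a path error is deliberately left out, since the text above leaves it open.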

Mention the policy that will be used to prevent switching if it will cause a node to lose access. Mention that booting from the wrong path can defeat this policy...

The Naming Problem

Briefly explain the grand WWN - alias - locking - config. file solution. Say this is too hard for now, so we will ask for a parameter to be added to the HSZ...

Detailed Design

.....
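
The Hardware Environment section notes that a multi-ported device returns the same WWN on every port, so the WWN can be used to detect multiple paths. A minimal Python sketch of that idea follows, assuming discovery yields (path, WWN) pairs. The function names, path labels, and WWN strings are hypothetical, and how the WWN is actually obtained from a device is outside the sketch.

```python
from collections import defaultdict


def group_paths_by_wwn(discovered):
    """Map each WWN to the list of paths on which it was reported.

    `discovered` is an iterable of (path, wwn) pairs; both values are
    plain strings here, standing in for whatever discovery returns.
    """
    paths = defaultdict(list)
    for path, wwn in discovered:
        paths[wwn].append(path)
    return dict(paths)


def multipath_units(discovered):
    """Return only the WWNs that were seen on more than one path."""
    grouped = group_paths_by_wwn(discovered)
    return {wwn: found for wwn, found in grouped.items() if len(found) > 1}


# Hypothetical discovery results: two ports report the same unit.
seen = [("PKA0", "wwn-0001"), ("PKB0", "wwn-0001"), ("PKA0", "wwn-0002")]
multi = multipath_units(seen)
```

Because the grouping key is the WWN and not the path, the same key could also serve as the basis for a path-independent device name, which is the second use of the WWN mentioned in that section.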