Folks -

The following assumes some familiarity with the SCSI architecture document. These are topics which need to be further investigated to fill in details in the architecture. This is a design activity and will require that you think about issues of performance, maintainability, usefulness to customers, etc.

The topics here may in some cases need only a couple of days, or in other cases maybe a couple of weeks to investigate. Please think about the topics and how long you expect to need for them. The investigations need to produce more detailed text to go into the architecture at the next level down in detail. My hope is that most of these will be relatively short since there is plenty of additional detail to go beyond this.

We will parcel these out at or about the Thursday meeting; the idea of passing the list around is that people might have favorite topics or want clarification or further definition. The time before then will give you some chance to seek such. More details of the interfaces at various levels of SCSI and the interfaces for common routines will need to be worked out, but these questions need to come first. The times needed will be negotiated after selections.

glenn

-----------------------------

SCSI architectural issues needing investigation reports

1. Support of SDTR/WDTR and LUNs. SDTR/WDTR are ID wide, but one gets inquiry data per LUN. How should SDTR be treated (pref. for smart adapters) in terms of the various enabled bits? Ditto WDTR. Force all to be the same? Switch on reselect? Disallow certain configurations? What? The question is what operations should be performed, and when, to handle these negotiations correctly and in a fashion that supports most devices. (There are comments in existing code about SDTR and these issues which will help.)
My guess: week        Sue. 1wk

2.
The architecture document proposes a super-SCDRP containing command buffers as well as state information and possibly custom packing calls to port drivers to handle very unique packing (i.e., not just copy SCSI command into buffer) as well as some means of telling where the CMD buffer should be located. Are there any hidden issues with doing this that would act as problems? Any reasons such a proposal might negatively impact function or performance?
My guess: 3 wks        Sue, Glenn. 3 wks

3. How should SCSI data structures be linked together? (Remember SCSI3 is likely to mean larger IDs, LUN numbers, and maybe wider constants.) Moving from one data structure to another is frequent and we need to be sure a scheme in the architecture can handle growth and be efficient.
My guess: week        Jim. by 9/22

4. Is a single level selector sufficient for matching device "SCSI IQ" or peculiarities within class level? Or are there examples where additional capabilities lists should be maintained per device? Suggest forms for these to take if needed. Can one create a single number (or a single number per VMS function) as suggested in the architecture to be a valid representation of a SCSI IQ, or must more dimensions be used? If so, what? (Can a SCSI device be an idiot savant?)
My guess: 2 weeks        Rick. Start in 1wk on all 3

5. What parts of flow control can be reasonably handled at class startio and what needs to be done at port level? Would it be more advantageous to just have flow control all handled at port level? The object is to avoid resource exhaustion and strive for some I/O fairness. This involves questions of whether the class busy bits are adequate, how should queue depth and queue full status interact, and at what level, and whether one should (try to) use mode pages to tell how full a TCQ queue is and adapt to the hardware. How to handle the switch to single command vs. TCQ mode is an issue too. (Should one issue bus device reset or some such to stop long operations?)
Also an issue: multi-initiator busses. If one can setmode-control quotas one might adapt total quotas so the queues would retain room even though >1 initiator is using the bus. (Since the queue manager is to be part of only those port drivers for "dumb" ports, the current flow control scheme resident in it needs to be revisited.)
My guess: 2-3 wks        Jim. 1+ wks

6. What do SCSI control chips supply by way of bus quality metrics? Is there any common information that can be captured and made available to users in some fashion about this, or are the control chips so different that basically no common information is conceivable? Bus quality metrics are desirable for diagnosis of field problems, possibly for field tuning of parts of SCSI, and for determining when path failover might be needed...IF it is feasible to obtain any such thing in a reasonable way.
My guess: 2 wks        Rick. Start in 1wk on all 3

7. At what level can RESET be handled? Is it possible to move such handling all (or mostly: some code would be common with packack handling) down to the top level port code? In general can more specific rules of thumb be given about which errors should be handled at low level vs. handling in class code? The architecture document proposes a rule that states you handle errors in class code unless knowledge of the error condition is complete at lower levels. Can we be more specific about various types of errors? The current use of mount verify to respond to SCSI bus reset makes its use for path failover difficult and imposes performance penalties which have no business being present; handling RESET should be done somewhere within the SCSI subsystem envelope. Since it is a bus-wide condition it would seem logical to do so in the bus level code (i.e., in the port driver.) Question is, is this feasible, and what issues arise from doing such?
My guess: 2-3 weeks        Buzzy. 3 wks.

8. What SCSI knobs and switches should be made controllable for starters?
(There's a good deal in the documents about things to control, but also how does one set profiles?)
My guess: week+        Rick. Start in 1wk on all 3

9. What is needed for a driver disconnect capability? (How should one idle a device? What about long I/O operations?) (Involves disconnect, reconnect, and possibly driver unload as a further option.)
My guess: week+        Grace. 1.5 wks

10. How should VMS locate port drivers and initialize SCSI subsystems?
My guess: 1-2 wks        Grace. 1.5 wks

11. Is there a better way to pass I/O to port level than the current svapte/bcnt/boff one? What general memory management routines are needed to translate addresses?
My guess: 2 weeks        Buzzy. 4 days

12. How should AEN be handled? (Target mode too.) Best to put any of it into port code (beyond the interrupt recognition)? Should a new class driver be present?
My guess: 2 weeks ++        Tom. 2 wks (actually 1wk but busy with Japanese next week after)

13. Time-out. Can anything more general be done than having the timeout set by class level function dispatch? Adapt to devices perhaps? Again how might one store and init a profile in clusters?
My guess: 2 weeks+        Jim. 2wks +

14. When should errors be logged and in what form? Can some generic rules of thumb (testable!) be given more than are in the current document for this? [rnote: ring buffers etc.]
Mary. 3.5 wks.

Statuses, 9/19/95

Sue S. - Sent me 1st draft of SDTR info

Jim D. - 240 lines written so far (5-6 pages!) on flow control

Rick - Needs adapter documents. Leaning toward using diagnostic page SCSI commands to do bus metrics.

Buzzy - Reviewing use of mem mgt. and is finding map buffers straightforward. Designing interface for code to build scatter/gather lists.

Grace - 1 page written so far re research on locating port driver. (I had discussions with her after mtg & referred her to Sue to discuss some common issues betw. autoconfig. and SCSI connection setup.)

Tom - Starting to write up AEN & target mode.
Suggests a follow-on study of whether anything in SCSI 3 will invalidate the target mode implementation we have. (Group discussion was that target mode is what we have, not really AEN.)

Mary - Looked over port drivers. Looking at class driver error reporting now. Finding lots of inconsistency. (In discussions with her last evening I told her she's finding exactly the kind of inconsistency we need to remove, which I gather & hope helped her get clear on what we need.)

----------------------

It is mentioned that a means to allow a port driver to delay a very short time & be recalled is needed. No queue mgr in scsi2common means this will be needed. Mention to Jim.

Statuses 9/21/95

Sue S - still writing. SDTR doc nearly done. Looking at sources re data structures

Jim D - will send me something. However, interruption rate and scsi retrospective interfere; may need to offload some info.

Rick L - going OK. Skeletons entered in note file

Buzzy R - Going well. Writeup on mem mgt in note file. Thinking about reset.

Grace W - still investigating. Has enough basic data.

Tom G - Hope to have some writing done by Friday. Nikon CLD a major distraction.

Mary Y - Looked over class drivers now. Thinking about the issues.

Marge S - putting more comments into port driver book; new draft in a few days.

---------------------------------

Statuses 9/26/1995

Grace - done one study; doing the second.

Marge Sherwood - making some progress on the port driver book, though that's #3 on her priority list now.

Jim Dunham - Updated flow control text some in response to my handwritten notes asking for more normative (as opposed to descriptive of existing code) text. Jim is more willing to discuss such needs verbally than is put down on paper here. However Jim announced he's taking a job in the cluster I/O group. I asked Rick Lord to check over Jim's text and need to get back for discussions.

Sue S.
- SDTR writeup done; still studying the issues in condensing more port driver inputs into a single call & data structure.

Rick L - Done his 3 writeups. Has worked on a SCSI mode program which needs to get checked into Ghost somehow; may need a review. I will look it over with him.

Buzzy R. - reviewing drivers for reset handling, but has been involved with qlogic rathole (and has the code in hand).

Tom G. - Japanese board arrived along with Tom Y., from DEC Japan, but the board did not work. Attempting to get another rush shipped in from Japan.

I told the group that some CLDs will be coming soon (some possibly as early as tomorrow) and that when their current studies are done we need to cross review. In addition Dave Fairbanks needs a code review of PKSdriver code after about 10/10; this code supports scsi clusters, for Ghost.

I have a slight extension of Sue's pkcdriver fix that permits disabling SDTR to any device on SCSI busses A-D running on my workstation. Sue has a copy of the code, in case it might be useful to let selected sites get around SDTR problem devices.

-----------------------------------------

Status 10/3/95

Grace - still working. Spent considerable time looking for a timing problem that led to bus reset problems on TLaser.

Jim Dunham left the group. I've asked Rick to investigate flow control issues that Jim was working.

Sue S. - Modified her SDTR writeup and has studied data structures some but is being bogged down by CLD following (including some mkdriver work).

Rick L - Has done some CLD things but also is working on a paper on flow control, expects to have it by Friday.

Buzzy R - Qlogic fire drill has tied him up these few days, but he now has sent code to Qlogic to test. He still has some cld issues to examine (script code) and is working over Reset. His work will need to be cross checked with flow control work (and vice versa) to ensure the interactions between the two are right.
Tom G - A second board from Japan is due in today, and he is testing vax mode sense code and investigating a problem with floppy format being broken by the mode sense code. (This problem and the fix will need to be dealt with in AXP also.) Fix is imminent. (last minute: the new board is in, and this one works.)

On the whole CLD following has tied up a LOT of group time this period and I'm trying to reset the balance so architecture gets moving again without stalling CLDs. I asked Sue to handle the CLD dispatching, which she was willing to do, with Pat's encouragement. The initial bunch of CLDs has been dispatched in one fashion or another but there are still some problems reported from within DEC and some new CLD issues that are taking some time from various group members.

DJ Brown needs to take over the office G. Wang is in now. Rick has no problem doubling up with Grace, nor has she any problems, so that DJ can be next to Bill Clogher's office (allowing more room for test equipment and possible sharing of same). However we need to do something about phones. There is a second phone line that could be hooked up to an instrument in Rick's office, but phone numbers would need to be moved. This seems to be the major problem in doing such a move.

I have been adding new text to the architecture document and harmonizing it, and am beginning to see a way of delegating some more of the definition of routine interfaces, but doing so will require that I have the framework ready.

--------------------------------

Status as of after 10/5/95 meeting

Grace - Still writing architecture material (on init). Has been given job now (10/5) of doing a quick study on support for 2048 byte sectors on CD (for Burns) per R. Critz's wishes.

Sue - Updated SDTR document and is reading PKSdriver re studying issues of data structures for architecture. However she is bogged with rush of CLDs; I may try to reassign some of her architecture stuff if she can't get to it.
Mary - Illness has meant not much happened on architecture or anything else; still plugging. Seems SHR sent her some mail re how many TLZ07's do we want?

Rick - making GOOD progress on architecture text. Also has dk audio and been reviewing scsi modesense stuff for Tom G.

Tom G - Board from Japan works. Found bugs in PKC and JKdriver. One test program works, other not; Tom Yashimora is trying to see what is still off... Tom is not too busy but working on a DK floppy format problem (cld). Also writing project plan for vax modesense stuff.

Bill C - Testing mode sense for Tom G (vax code version). Working with a Gryphon boot problem here; net install is messed up, but Bill has worked around it.

Buzzy - Working on some Qlogic QARs and issues. One from Philips in the Netherlands is a gate for a $10M sale; looks like maybe a SCSI bus problem. Buzzy has expressed optimism today (10/5) re getting the reset stuff done in another week or so and will have some text re bus device reset. The OSF folks have proposed a somewhat different qlogic chip bugfix than at first and this may make a bit more development work necessary to support it.
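
--------------------------------

For reference on topic 5 (flow control): a minimal sketch, in C, of one way queue depth and queue full status could interact if the bookkeeping were kept per device at port level. This is an illustration only, not the scheme in the existing queue manager or in anyone's writeup. The names (flow_ctl, flow_try_issue, flow_queue_full, flow_complete) and the constants are made up for the example, and the policy (clamp the ceiling on queue full, grow it slowly after a run of good completions) is just one assumption among several possible.

```c
#include <assert.h>

/* Hypothetical per-device flow-control state; every name here is illustrative. */
typedef struct {
    int max_depth;    /* current ceiling on outstanding tagged commands */
    int outstanding;  /* commands issued to the device and not yet completed */
    int good_streak;  /* completions seen since the last QUEUE FULL */
} flow_ctl;

/* Assumed tuning values, not taken from any existing driver. */
enum { MIN_DEPTH = 1, HARD_LIMIT = 64, GROW_AFTER = 16 };

/* Start with an initial ceiling, clamped to the allowed range. */
static void flow_init(flow_ctl *f, int initial_depth)
{
    if (initial_depth < MIN_DEPTH)  initial_depth = MIN_DEPTH;
    if (initial_depth > HARD_LIMIT) initial_depth = HARD_LIMIT;
    f->max_depth = initial_depth;
    f->outstanding = 0;
    f->good_streak = 0;
}

/* Returns 1 if a command may be sent now, 0 if it must stay queued in the host. */
static int flow_try_issue(flow_ctl *f)
{
    if (f->outstanding >= f->max_depth)
        return 0;
    f->outstanding++;
    return 1;
}

/* The device answered QUEUE FULL: the command was not accepted, so it is no
 * longer outstanding, and the ceiling drops to what the device actually holds. */
static void flow_queue_full(flow_ctl *f)
{
    assert(f->outstanding > 0);
    f->outstanding--;
    f->max_depth = (f->outstanding > MIN_DEPTH) ? f->outstanding : MIN_DEPTH;
    f->good_streak = 0;
}

/* A command completed normally: after GROW_AFTER good completions in a row,
 * probe upward by one in case the device (or another initiator) freed room. */
static void flow_complete(flow_ctl *f)
{
    assert(f->outstanding > 0);
    f->outstanding--;
    if (++f->good_streak >= GROW_AFTER && f->max_depth < HARD_LIMIT) {
        f->max_depth++;
        f->good_streak = 0;
    }
}
```

Dropping the ceiling to the number of commands the device is actually holding also gives some slack on a multi-initiator bus, since a queue full there may reflect the other initiator's load rather than ours. Whether this belongs at class startio or at port level is exactly the open question in topic 5.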