From:	SMTP%"EVERHART@arisia.gce.com" 13-JUN-1995 11:16:31.04
To:	EVERHART
CC:	
Subj:	re: drive failures

Date: Mon, 12 Jun 1995 16:06:08 -0400 (EDT)
From: EVERHART@arisia.gce.com
Subject: re: drive failures
To: VMSSIG@arisia.gce.com
Errors-to: open-vms-sig-owner@DECUS.Org
Warnings-to: open-vms-sig-owner@DECUS.Org
Message-id: <950612160609.5f@arisia.gce.com>
Content-transfer-encoding: 7BIT
Comments: Send OPEN-VMS-SIG subscribe/unsubscribe requests to mailserv@DECUS.Org

It is possible to set up a logging system which can log all writes
to a disk. I wrote a virtual disk flavor a while back that would let
you do this. What you do is take a BACKUP/PHYSICAL of the disk, then
erase the log, then restart things. The log can then be played back
on top of that backup to reconstitute the disk up to any moment
desired. (You can use the utility I put on the F94 sigtapes to
extract files from the BACKUP/PHYSICAL in case you need such, so
you lose no generality.)
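
To make the replay idea concrete, here is a minimal sketch in Python.
It is not the virtual disk driver's code or its log format; the names
(LoggedWrite, replay, BLOCK_SIZE) and the record layout are made up
for illustration, and it assumes the log is kept in the order the
writes were issued.

```python
from dataclasses import dataclass
from typing import List

BLOCK_SIZE = 512  # bytes per disk block (assumed for this sketch)


@dataclass
class LoggedWrite:
    timestamp: float  # when the write was issued
    block: int        # starting logical block number
    data: bytes       # payload written at that block


def replay(base_image: bytes, log: List[LoggedWrite], until: float) -> bytes:
    """Rebuild the disk contents as of time `until`.

    `base_image` stands in for the physical backup taken just before the
    log was erased; `log` holds every write made since, oldest first.
    """
    disk = bytearray(base_image)
    for w in log:
        if w.timestamp > until:
            break  # everything after this point is newer than we want
        offset = w.block * BLOCK_SIZE
        if offset + len(w.data) > len(disk):
            raise ValueError("logged write falls outside the disk image")
        disk[offset:offset + len(w.data)] = w.data
    return bytes(disk)
```

Picking a different `until` against the same backup and log gives
the disk as it stood at that moment, which is the whole point of
keeping the log.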

Similar things can be done with shadowing. Bear Systems (818 341 0403)
sells a remote shadowing facility that can shadow your disk,
potentially over a net, and which allows disk access to get ahead
of the shadow (it journals, so you don't lose data). Basically
you get the sort of function mentioned (and I think a few more
functions).
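
The general idea of letting the disk run ahead of a journaled shadow
can be sketched like this in Python. This is only an illustration of
the concept, not how the Bear Systems product works; the class and
method names are invented for the example.

```python
from collections import deque


class JournaledShadow:
    """Toy model: writes hit the primary at once and reach the shadow later."""

    def __init__(self, nblocks: int):
        self.primary = [b""] * nblocks
        self.shadow = [b""] * nblocks
        self.journal = deque()  # writes not yet applied to the shadow

    def write(self, block: int, data: bytes) -> None:
        # The write completes against the primary without waiting for the
        # (possibly remote, possibly slow) shadow; the journal remembers it.
        self.primary[block] = data
        self.journal.append((block, data))

    def drain(self, max_writes=None) -> int:
        """Apply journaled writes to the shadow in order; return how many."""
        applied = 0
        while self.journal and (max_writes is None or applied < max_writes):
            block, data = self.journal.popleft()
            self.shadow[block] = data
            applied += 1
        return applied
```

As long as the journal survives, the shadow can always be brought up
to date, even if it lags the primary for a while.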

If your disk is just going momentarily offline and not coming back,
try setting the mount verification timeout (the MVTIMEOUT system
parameter) higher. Failing that, on VAX, it's possibly worth trying
to use the intercept driver in [vax94a.acorn] (jgdriver I believe)
to force error retries on VAX SCSI disks. (Actually it can force
extra retries on any disks, SCSI or not.) Sometimes such retries can
make an operation succeed. It helped me...
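
The retry idea itself is simple enough to sketch in Python. This is
not the intercept driver's code (that lives at driver level on VAX);
the function name and the use of OSError as the "I/O failed" signal
are assumptions made just for the example.

```python
def with_retries(operation, extra_retries=3):
    """Call `operation`; if it fails with an I/O error, try it again.

    The operation is attempted at most 1 + extra_retries times. If every
    attempt fails, the last error is re-raised so the caller still sees it.
    """
    last_error = None
    for _attempt in range(1 + extra_retries):
        try:
            return operation()
        except OSError as err:
            last_error = err  # remember the failure and go around again
    raise last_error
```

A transient error that clears on the second or third try never
reaches the caller; a hard failure still shows up once the retries
run out.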

Unfortunately the intercept driver is only for VAX...I never ported
that one to Alpha...and the improvements may be less on Alpha.

If your disk is just going down hard, you need to back the thing
up, take it out of the cabinet, and get a replacement FAST. Stuff
like a bearing going can give such behavior. Worn bearings do NOT
fix themselves...and similarly for lots of other error conditions
that take a drive down hard and keep it down...

glenn
Everhart@arisia.gce.com
