From:	SMTP%"leichter@lrw.com"  3-MAR-1995 08:33:04.14
To:	EVERHART
CC:	
Subj:	RE: Boehm GC for OpenVMS AXP

From: Jerry Leichter <leichter@lrw.com>
X-Newsgroups: comp.os.vms
Subject: RE: Boehm GC for OpenVMS AXP
Message-ID: <9503030354.AA04261@uu3.psi.com>
Date: Thu,  2 Mar 95 22:53:06 EDT
Organization: Info-Vax<==>Comp.Os.Vms Gateway
X-Gateway-Source-Info: Mailing List
Lines: 93
To: Info-VAX@Mvb.Saic.Com

	I am trying to port the Boehm conservative collector to OpenVMS with
	the help of my local VMS guru.  However, I have one requirement that
	has him (my guru) stumped--  I need to determine the starting and
	ending (virtual) addresses of the program's static data segment at run
	time.

	The Boehm collector does this on Unix by simply referencing the global
	symbols "etext" and "end".  On DOS-derivative systems, a data
	structure in the process prefix is consulted.  VMS lacks the former,
	and considerable digging through manuals has not made it immediately
	obvious how to emulate the latter, although I *have* convinced myself
	that the information *is* there to be had.

	In case it's not clear exactly what I'm after, the following excerpt
	from a .MAP file should illustrate:

	Psect Name      Module Name       Base     End           Length
	----------      -----------       ----     ---           ------
	$LINK$                          00010000 000100EF 000000F0 (    240.)
	                VMS_OS_DEP      00010000 000100EF 000000F0 (    240.)
	$LITERAL$                       000100F0 00010192 000000A3 (    163.)
	                VMS_OS_DEP      000100F0 00010192 000000A3 (    163.)
	$DATA$                          00020000 00020003 00000004 (      4.)
	^^^^^^                          ^^^^^^^^ ^^^^^^^^ ^^^^^^^^
	                VMS_OS_DEP      00020000 00020003 00000004 (      4.)
	$CODE$                          00030000 000301EF 000001F0 (    496.)
	                VMS_OS_DEP      00030000 000301EF 000001F0 (    496.)

	The information underscored with ^^^^'s is exactly what I need-- but I
	need to dynamically determine it at runtime.  How?

	PS> If you were about to respond with a suggestion that I parse the
	    .MAP file at run time, please use the time instead to contemplate
	    the possibility of a career change...  Maybe something in sales...

	PPS> For the humor-challenged: B^)

I can answer your question as you asked it - but it won't do you any good.
The VMS memory layout algorithms are very different from those used in Unix.
In Unix - well, traditional Unix, anyway; modern Unixes have started to
become more elaborate - memory consists of a text (code) segment, followed by
a data segment, and then a stack segment, so the idea of etext and end made
some sense.  The memory of a VMS program is much more complex.  Your link
map shows you four segments, but there can be an arbitrary number; even within
one image, code and data can be interspersed arbitrarily; and once you factor
in shareable images - which just about every program is going to include -
you have *multiple* images, each with its own multiple segments, one after
another.  There is no such thing as *the* static data segment in a VMS
process.  Given an address at run-time, there is no good way to determine
what's actually at that address.  I suppose you could check to see if the
memory was writeable - code would never be - but that's about as far as you
can go.
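To make the contrast concrete, here is a toy model in C of a process address space. It uses the four psects from the link map above plus a hypothetical shareable image called CRTL, whose addresses are invented. A Unix-style single range around the main program's $DATA$ misses the shareable image's data. Widening the range to cover both data segments pulls in code instead. The only per-address question the model can answer is whether the address falls in a writable segment.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model of P0 space: several images, each contributing several
   psects.  MAIN's entries come from the link map; CRTL is hypothetical
   and its addresses are invented for illustration. */
typedef struct {
    const char *image;    /* owning image */
    const char *psect;    /* psect name */
    uintptr_t   base;     /* first byte */
    uintptr_t   end;      /* last byte, inclusive, as in a .MAP file */
    bool        writable;
} segment;

static const segment layout[] = {
    { "MAIN", "$LINK$",    0x10000, 0x100EF, false },
    { "MAIN", "$LITERAL$", 0x100F0, 0x10192, false },
    { "MAIN", "$DATA$",    0x20000, 0x20003, true  },
    { "MAIN", "$CODE$",    0x30000, 0x301EF, false },
    { "CRTL", "CODE",      0x40000, 0x4FFFF, false },
    { "CRTL", "DATA",      0x50000, 0x51FFF, true  },
};
#define NSEG (sizeof layout / sizeof layout[0])

/* Unix-style test: one contiguous [start, end] range for "the" data. */
static bool in_single_range(uintptr_t a, uintptr_t start, uintptr_t end)
{
    assert(start <= end);
    return a >= start && a <= end;
}

/* Roughly all you can say about an address here: does it lie inside
   some writable segment?  Unmapped addresses count as not writable. */
static bool in_writable_segment(uintptr_t a)
{
    for (size_t i = 0; i < NSEG; i++)
        if (a >= layout[i].base && a <= layout[i].end)
            return layout[i].writable;
    return false;
}
```

For example, 0x50010 (CRTL's data) is outside MAIN's $DATA$ range but is writable. 0x30010 lies inside the widened range 0x20000-0x51FFF but is code.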

That said:  The Linker sorts psects by psect attributes - there's a table in
the Linker manual showing the order in which psects with different attribute
combinations will occur in memory - and, within any group with the same
attributes, alphabetically by psect name.  Hence, to determine the address
range of $DATA$, you need only declare a pair of psects with the same
psect attributes as $DATA$ - the link map will show you what those are -
and with appropriate names.  $DATA and $DATA$$ should do the trick for the
psect names.  However, your example is too simple:  If you had used
the "common" external model, or specified a psect with one of the other
models, user variables would have been in psects other than $DATA$.  Again,
you could probably find them all between a psect named "$" and one named
"z" repeated 31 times (31 characters being the maximum psect name length).
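Here is a rough C simulation of the bracketing trick. It assumes the simplified placement rule described above: psects are grouped by attributes, modeled here as a single hypothetical integer, and sorted by name with ASCII strcmp() within a group. Alignment, clustering, and the Linker manual's real attribute table are all ignored, and the lengths are invented.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Simplified model of Linker psect placement. */
typedef struct {
    const char *name;
    int         attr_group;  /* hypothetical: equal value = same attributes */
    uint32_t    length;
    uint32_t    base;        /* filled in by place_psects() */
} psect;

/* Order by attribute group first, then alphabetically by name
   (assumes plain ASCII collation). */
static int psect_order(const void *pa, const void *pb)
{
    const psect *a = pa, *b = pb;
    if (a->attr_group != b->attr_group)
        return a->attr_group < b->attr_group ? -1 : 1;
    return strcmp(a->name, b->name);
}

/* Sort the psects and assign consecutive addresses from 'origin'. */
static void place_psects(psect *p, size_t n, uint32_t origin)
{
    assert(p != NULL || n == 0);
    qsort(p, n, sizeof *p, psect_order);
    uint32_t next = origin;
    for (size_t i = 0; i < n; i++) {
        p[i].base = next;
        next += p[i].length;
    }
}

static const psect *find_psect(const psect *p, size_t n, const char *name)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(p[i].name, name) == 0)
            return &p[i];
    return NULL;
}
```

Because "$DATA" sorts before "$DATA$", which sorts before "$DATA$$", placing those three (4 bytes each, one group) together with $CODE$ in a later group at 0x20000 yields $DATA at 0x20000, $DATA$ at 0x20004, $DATA$$ at 0x20008 and $CODE$ at 0x2000C. The addresses of the two sentinels therefore bracket $DATA$.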

But, again, beware - the memory bracketed by these values is *not* all the
memory that you must consider as the GC's root set.  Any shareable image the
program is linked against can contain its own private data segment.  For that
matter, the C RTL itself contains some private pointers to memory it has
malloc'ed itself - as part of file data, for example.  Here's an
example:  If you fopen() a file, the C RTL will malloc() some memory for it,
including a buffer.  Even if you now destroy all your FILE * pointers to the
file, the C RTL still knows it's there, so that when the program terminates it
can flush the buffer.  The root for the data structure that links all the
files known to the RTL is in a data segment owned by the RTL, nowhere near the
$DATA$ segment from the main program.  There's really no way to find out where
this data might fall, since you have no control over the layout of any
shareable images.
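The fopen() example can be sketched in portable C. The rtl_* names are hypothetical stand-ins for a run-time library's internals, not the real C RTL. reachable_from() is only a one-level conservative check, not a collector.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical library that, like the C RTL's list of open files,
   keeps the only lasting pointer to its blocks in private static data. */
typedef struct rtl_file {
    struct rtl_file *next;
    char             buffer[512];
} rtl_file;

static rtl_file *rtl_open_files;   /* the library's private root */

static rtl_file *rtl_open(void)
{
    rtl_file *f = calloc(1, sizeof *f);
    if (f != NULL) {
        f->next = rtl_open_files;
        rtl_open_files = f;
    }
    return f;
}

typedef struct { const void *lo; size_t bytes; } root_range;

/* Does any word in the root ranges hold exactly 'target'?  A real
   collector would also trace from what it finds and accept interior
   pointers.  Assumes pointers and uintptr_t share size and
   representation, as on flat-address machines. */
static bool reachable_from(const root_range *roots, size_t nroots,
                           const void *target)
{
    assert(roots != NULL || nroots == 0);
    for (size_t r = 0; r < nroots; r++) {
        const unsigned char *p = roots[r].lo;
        size_t nwords = roots[r].bytes / sizeof(uintptr_t);
        for (size_t i = 0; i < nwords; i++) {
            uintptr_t word;
            memcpy(&word, p + i * sizeof word, sizeof word); /* alias-safe */
            if (word == (uintptr_t)target)
                return true;
        }
    }
    return false;
}
```

Suppose the program drops its only pointer to a block returned by rtl_open(). A scan of the program's own variable then finds nothing. Adding the library's private rtl_open_files to the root ranges finds the block, so a collector that misses that variable would free a live buffer.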

In summary:  I'd love to have a working Boehm-style GC for use in VMS, and in
fact have looked at doing one myself in the past.  Unfortunately, it's just
not very easy to do.  In fact, the following might be the best approach:
Grab control before main() - LIB$INITIALIZE lets you do that - and scan P0
space, a page at a time, up to the highest allocated page (use $GETJPI's
FREP0VA item to determine where that is).  Use PROBEW to determine if each
page is writeable.  Classify all writeable pages as the root set.  The set
will be somewhat larger than it might be, but this should be pretty close.
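The page scan might look roughly like the C sketch below, which rests on several assumptions:

- probe_fn stands in for PROBEW.
- The frep0va argument stands in for the value a real implementation would get from $GETJPI's FREP0VA item.
- The page size is taken to be 8 KB.
- sim_probe() and its table are an invented protection map, so the code can run anywhere.

Adjacent writable pages are coalesced, so the root set comes out as a short list of ranges rather than one entry per page.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed page size: 8 KB, as on current Alpha systems.  A real
   implementation should ask the system rather than hard-code this. */
#define ALPHA_PAGE_SIZE 8192u

/* Stand-in for PROBEW: true if the page containing va is writable. */
typedef bool (*probe_fn)(uintptr_t va);

typedef struct { uintptr_t lo, hi; } va_range;   /* half-open [lo, hi) */

/* Scan [start, frep0va) a page at a time and coalesce runs of writable
   pages into ranges.  Returns the number of ranges written to out,
   stopping early if out (capacity max) fills up. */
static size_t collect_writable(uintptr_t start, uintptr_t frep0va,
                               probe_fn probe, va_range *out, size_t max)
{
    assert(start % ALPHA_PAGE_SIZE == 0 && frep0va % ALPHA_PAGE_SIZE == 0);
    size_t n = 0;
    bool in_run = false;
    uintptr_t va;
    for (va = start; va < frep0va; va += ALPHA_PAGE_SIZE) {
        bool writable = probe(va);
        if (writable && !in_run) {          /* a writable run starts here */
            if (n == max)
                return n;                   /* no room for another range */
            out[n].lo = va;
            in_run = true;
        } else if (!writable && in_run) {   /* the current run ends here */
            out[n++].hi = va;
            in_run = false;
        }
    }
    if (in_run)                             /* run reaches frep0va */
        out[n++].hi = va;
    return n;
}

/* Invented protection map so the sketch runs anywhere: page i above
   SIM_BASE is writable when sim_writable[i] is true. */
#define SIM_BASE 0x10000u
static const bool sim_writable[] = { false, false, true, true, false, true };

static bool sim_probe(uintptr_t va)
{
    size_t npages = sizeof sim_writable / sizeof sim_writable[0];
    if (va < SIM_BASE)
        return false;
    size_t i = (va - SIM_BASE) / ALPHA_PAGE_SIZE;
    return i < npages && sim_writable[i];
}
```

With the simulated map, scanning six pages from SIM_BASE yields two ranges: pages 2-3 and page 5.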

Be aware that not all memory in a VMS process is allocated through malloc()!
RTL routines can and will allocate memory, and doing your own malloc() won't
affect them in the slightest.  There are *many* difficulties here....

							-- Jerry
