1 INTRODUCTION

TECO-C is intended to run under VAX/VMS and XENIX. It runs under VMS, and almost runs under XENIX. "Almost" means that I didn't finish implementing the XENIX system-dependent functions before the deadline for submissions to the DECUS Languages and Tools SIG Tape. As of 7-Dec-1987, it was close to working under XENIX.

TECO-C implements TECO as described in the PDP-11 TECO User's Guide (DEC Order Number DEC-11-UTECA-B-D). It was written in C so the author could move comfortably from a VAX 11/780 to various other machines, including MicroVaxes, which cannot execute TECO-11 because they don't support PDP-11 compatibility mode.

2 COMPILING AND LINKING

The target environment is selected in the file "ZPort.h". This file should ensure that exactly one of the following three identifiers is defined: vax11c, XENIX or UNKNOWN. In the system-dependent code, these identifiers are used to conditionally compile code for one of the three environments. The VAX-11 C compiler defines the symbol "vax11c" automatically, so it is not explicitly defined in ZPort.h.

Under VMS, if the debugging system is turned on (see DEBUGGING), execute VDBGBLD.COM. Execute VSYMBS.COM to set up the "L" symbol, and then use it to link the program.

Under XENIX, compile by executing XBUILD.CSH.

3 RUNNING TECO-C

TECO-C's command line differs from the one used to invoke TECO-11. The primary reason is that TECO-C was originally implemented on a CPM-68K system, where using the command line parsing performed by C was easier than implementing DEC-like command line parsing myself. The style used by TECO-C is natural for C.
The format is

     tecoc [-c] [-d data] [-m] [-p] [-r] [filename]

where the switches have the following meanings:

     -d   precedes data when the -p switch is used
     -m   do not remember the last file edited
     -p   execute a file containing TECO commands
     -r   open the file read-only

4 CODE CONVENTIONS

Most data is passed between functions through global variables, not argument lists. There is a reason: the basic parsing algorithm uses the characters in the command string as indices into a table of functions. This makes for very fast command parsing, but it means that all the functions have to modify global values, because no arguments are passed in. In other words, there were going to be 130 or so un-modular functions anyway, so I gave up on modularity. This explanation does not account for some of the complications in the search code, like the global variable SrcTyp. Oh, well.

Here's a brief list of some of the conventions followed by the code:

  1. System-independent global variables are declared alphabetically in file "TECOC.C".

  2. System-dependent global variables are declared alphabetically in file "ZINIT.C".

  3. All structures are defined in files "DefTeco.h" and "ZFiles.h".

  4. Types are expressed using identifiers defined in the "ZPort.h" file. For instance, functions which do not return a value are defined as "VOID". TECO-C should compile and link on different machines by changing only the environment definitions in the "ZPort.h" file.

  5. Each function is in its own file, and the name of the file is the same as the name of the function.

  6. Names are limited to 6 characters, to accommodate compilers that don't support very many characters of uniqueness. For instance, the XENIX C compiler only sees the first 7 characters.

     Upper and lower case are mixed in names. An uppercase letter indicates a new word.
     For example, "EBfEnd" stands for "edit buffer end". If you need to know what a variable name means, look at the definition of the variable near the top of the code. The expanded version of the abbreviated name appears in the comment on the same line as the variable definition. The limit of 6 letters in variable names is not followed in system-dependent code.

     Related names are built consistently. For instance, "EBfBeg" and "EBfEnd" are the beginning and end of the edit buffer. If you see a variable named "BBfBeg", you can assume that it is the beginning of some other buffer, and that a "BBfEnd" exists which is the end of that buffer.

  7. Functions are declared even when they don't strictly need to be. Some compilers (and lint) require this. It helps the reader, as he/she can be sure that when they see a call to a function, a description of it appears in a comment on the declaration line at the top of the code. It also means that an alphabetized list of "who this function calls" can be found near the beginning of each module.

  8. Each file has a consistent format, which is:

       1. a comment describing the function
       2. include directives
       3. declarations of called functions, in alphabetical order
       4. external variable declarations, in alphabetical order
       5. the function declaration
       6. local variable definitions, in alphabetical order
       7. code

5 DEBUGGING

A debugging system is built into the TECO-C code. It is conditionally compiled into the code by turning on or off an identifier defined in the "DefTeco.h" file. It uses the control-P command and is VERY helpful.

Reading source files that contain the debugging code can be hazardous to your sanity. A macro is provided which will strip the debugging code out of a source file to produce a readable listing file. Of course, it's a TECO macro, so you have to have TECO running.
To use it, mung DELDBG and give it the file name, including the ".C". It will produce a file with a .LIS file type.

It is often useful to compare the execution of TECO-C with TECO-11. Put a test command string into a file. Use DEFINE/USER_MODE to redirect the output of TECO-C to a file and execute the macro with TECO-C. Then do the same thing with TECO-11. Use the DIFFERENCES command to compare the two output files. They should be 100 percent identical.

6 TOP LEVEL EXECUTION AND COMMAND PARSING

The top level of TECO-C is very simple: after initializing, a loop is entered which reads a command string from the user, executes it, and then loops back to read another command string. If the user executes a command which causes TECO-C to exit, the program is exited directly via a call to the ZAbort function. TECO-C never exits by "falling out the bottom" of the main function.

The main loop calls ExeCSt to execute the command string. ExeCSt contains the top-level parsing code.

The parser uses each character of the command string as an index into a table of functions. The table contains one entry for each of the 128 characters in the ASCII collating sequence. Each function is responsible for "consuming" its command so that when it returns, the command string pointer points to the next command.

Each command function returns a value indicating the success or failure of the command's execution. When a command fails, it calls the ErrMsg function to display an error message. It then returns FAILURE to its caller. When a command succeeds, it returns SUCCESS. When FAILURE is returned to the command parser (in EXECST.C), the command string stops executing and the user is prompted for a new string.

6.1 Command Modifiers (CmdMod)

TECO allows command modifiers and numeric arguments, which may precede some commands. These are implemented in a way that maintains the basic "jump table" idea. For instance, when an at-sign (@) modifier is encountered in a command string, the at-sign command function (ExeAtS) is called. The only thing ExeAtS does is set a flag indicating that an at-sign has been encountered. Commands which are affected by an at-sign modifier look at this flag and behave accordingly.

The flags are kept in the global variable CmdMod. A bit in CmdMod is reserved for each command modifier. The modifiers are "@", ":" and "::". Of course, once a flag has been set, it must be cleared. With this parsing algorithm, the only way to do that is to make every command function explicitly reset CmdMod before a successful return. This is not too bad: clearing all the flags in CmdMod is done with one statement: "CmdMod = '\0';".

Numeric arguments are handled using the expression stack. As the elements of an expression are encountered in a command string, they are pushed onto the expression stack. Commands which handle numeric arguments check to see if the expression stack contains a value. The special case of numeric arguments is "m,n". This is easily handled: the "m" part is encountered and causes the value to be pushed onto the expression stack. The comma causes the ExeCom function to move the value into a special "m-argument" global variable (MArgmt), clear the expression stack and set another flag in CmdMod indicating that the "m" part of an "m,n" pair is defined. Then the "n" is encountered and pushed onto the stack. Commands which can take "m,n" pairs check the flag in CmdMod.

Care must be taken that the expression stack and the flags in CmdMod are cleared at the right times. It is the responsibility of each command function to leave CmdMod and EStTop with the proper values before successfully returning.
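The jump-table parsing and the CmdMod convention described above can be sketched in C. This is a self-contained toy, not TECO-C's source: the command-string pointer CBfPtr, the table name CmdTbl, the helper ExeCln, the flag values ATSIGN and COLON, and the toy command bound to "T" (which only records the modifiers that preceded it and is not TECO's real T command) are assumptions made for illustration. ExeCSt, ExeAtS, CmdMod, SUCCESS and FAILURE are names taken from the text, but the bodies shown here are hypothetical.

```c
#include <stdio.h>
#include <string.h>

#define SUCCESS 1
#define FAILURE 0

/* Hypothetical bit assignments for the modifier flags kept in CmdMod. */
#define ATSIGN 0x01 /* an "@" has been seen */
#define COLON  0x02 /* a ":" has been seen */

typedef int (*CmdFn)(void);

static const char *CBfPtr;   /* hypothetical: points at the next command char */
static unsigned char CmdMod; /* command modifier flags */
static char Output[64];      /* records what the toy command did */

/* "@" and ":" only set a flag and consume their character. */
static int ExeAtS(void) { CmdMod |= ATSIGN; CBfPtr++; return SUCCESS; }
static int ExeCln(void) { CmdMod |= COLON;  CBfPtr++; return SUCCESS; }

/* Toy stand-in for a real command (not TECO's T): it records which
   modifiers preceded it, then clears CmdMod before returning SUCCESS. */
static int ExeT(void)
{
    if (strlen(Output) + 3 >= sizeof Output)
        return FAILURE;
    if (CmdMod & COLON)  strcat(Output, ":");
    if (CmdMod & ATSIGN) strcat(Output, "@");
    strcat(Output, "T");
    CmdMod = '\0';
    CBfPtr++;
    return SUCCESS;
}

/* Stand-in for ErrMsg: report the bad character and fail. */
static int ExeIll(void)
{
    printf("?ILL illegal command '%c'\n", *CBfPtr);
    return FAILURE;
}

/* One entry per ASCII character; empty entries mean "illegal command". */
static CmdFn CmdTbl[128] = { ['@'] = ExeAtS, [':'] = ExeCln, ['T'] = ExeT };

/* Execute a command string by using each character as a table index.
   Execution stops at the first command that returns FAILURE. */
static int ExeCSt(const char *cmd)
{
    CBfPtr = cmd;
    CmdMod = '\0';   /* modifiers are cleared before each command string */
    while (*CBfPtr != '\0') {
        unsigned char c = (unsigned char)*CBfPtr;
        CmdFn fn = (c < 128 && CmdTbl[c] != NULL) ? CmdTbl[c] : ExeIll;
        if (fn() == FAILURE)
            return FAILURE;
    }
    return SUCCESS;
}
```

For example, ExeCSt("@:T") leaves ":@T" in Output, showing that the modifiers are order-independent flags, and ExeCSt("T?T") executes the first T, stops at the illegal "?" and returns FAILURE without executing the final T.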
The rules for leaving CmdMod and EStTop are:

  1. If the command returns FAILURE, don't bother clearing CmdMod or EStTop. They will be cleared before the next command string is executed.

  2. If the command leaves a value on the expression stack, do not clear EStTop before returning SUCCESS. If the command calls GetNmA, do not clear EStTop, as GetNmA does it for you. Otherwise, clear EStTop before returning SUCCESS.

  3. If a command does not consume the flags in CmdMod, leave them alone. For instance, ExeDgt should not clear CmdMod, because the MARGIS bit may be set.

7 SEARCHING

The combination of the need for a fast search and the need to handle all the features of search commands has produced code which can be a real pain to follow. This section attempts to explain how things got the way they are. The following discussion explains the code in a bottom-up fashion, to follow the way the code evolved in the author's twisted mind.

The basic idea is to scan a contiguous edit buffer for a search string. The steps are:

  1. Scan the edit buffer for a character which matches the first character in the search string. If you reach the end of the edit buffer without matching, the search fails.

  2. When you find a matching character in the edit buffer, try to match successive characters in the search string with the characters which follow the found character in the edit buffer. If they all match, the search succeeds. If one doesn't, go back to step 1.

The need to handle all the features of TECO's search commands has buried these steps deep within some confusing code.

The 17 "match constructs" are indicated in the search string by the characters ^X, ^S, ^N and ^Ex, where "x" can be several other characters.
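The two basic steps listed above can be sketched in C over a contiguous buffer. The sketch makes simplifying assumptions that are not taken from TECO-C: the names BasSrc and MatchOne are hypothetical, the search runs forwards only and is case-sensitive, and the only match construct handled is ^X (control-X, 0x18), which accepts any character. An empty search string simply fails here, whereas TECO would reuse the previous search argument.

```c
#include <stddef.h>

#define CTRL_X 0x18 /* ^X: accept any character in this position */

/* Does search-string character s match buffer character b? */
static int MatchOne(char s, char b)
{
    return s == CTRL_X || s == b;
}

/* Hypothetical BasSrc: scan buf[0..len) for pat[0..plen).
   Returns the index of the first match, or -1 if the search fails. */
static long BasSrc(const char *buf, size_t len, const char *pat, size_t plen)
{
    size_t i, j;

    if (plen == 0 || plen > len)
        return -1;
    for (i = 0; i + plen <= len; i++) {
        /* Step 1: find a character matching the first search character. */
        if (!MatchOne(pat[0], buf[i]))
            continue;
        /* Step 2: compare the characters that follow the found one. */
        for (j = 1; j < plen && MatchOne(pat[j], buf[i + j]); j++)
            ;
        if (j == plen)
            return (long)i;   /* they all matched: success */
        /* one didn't match: go back to step 1 */
    }
    return -1;                /* reached the end without matching */
}
```

For instance, BasSrc("a test string", 13, "s\x18r", 3) returns 7: the "s" at index 4 passes step 1 but fails step 2, so scanning resumes until the "s" of "str".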
A ^X in the search string means that any character is to be accepted as a match in place of the ^X. If a character is not ^X, ^S, ^N or ^E, then the character itself is the match construct. An example: the search string "a^Xb" contains 3 match constructs: a, ^X and b.

When searching backwards, only the search for the first match construct in the search string is done in a backwards direction. When the character is found, the characters following it are compared in a forward direction to the edit buffer characters. This means that once the first match construct has been found, a single piece of code can be used to compare successive characters in the search string with successive characters in the edit buffer, regardless of whether the search is forwards or backwards. The lowest level search is now:

  1. Scan the edit buffer (forwards or backwards) for a character which matches the first match construct in the search string. If you reach the end of the edit buffer without matching, the search fails.

  2. When you find a matching character in the edit buffer, try to match successive match constructs in the search string with the characters which follow the found character in the edit buffer. If they all match, the search succeeds. If one doesn't, go back to step 1.

To keep things straight, and in order to have a reference for later discussion, the following hierarchy chart of "who calls who" is presented.
                   search command functions
                              |
                              V
                           Search
                              |
                              V
                           SrcLop
                              |
                              V
                           SSerch
                              |
              +---------------+---------------+
              |               |               |
              V               V               V
           ZFrSrc--+       BakSrc--+       CMatch--+
             ^     |         ^     |         ^     |
             +-----+         +-----+         +-----+

SSerch implements the lowest level search described above. ZFrSrc searches forwards in the edit buffer for characters which match the first character in the search string. BakSrc does the same thing, but searches backwards. SSerch calls one of these two functions and then executes a loop which calls CMatch to compare successive match constructs in the search string to characters following the found character in the edit buffer. The reason that ZFrSrc, BakSrc and CMatch call themselves is to handle some of the more esoteric match constructs.

Another complication is case dependence. Case dependence in TECO is controlled by the search mode flag (see ^X). The variable SMFlag holds the value of the search mode flag, and is used by ZFrSrc, BakSrc and CMatch.

ZFrSrc is system-dependent. It contains a VAX/VMS-specific version which uses the LIB$SCANC run-time library routine to access the SCANC instruction. The SCANC instruction looks like it was designed to handle TECO's match constructs. I couldn't resist using it. Of course, there is a straightforward C implementation of the algorithm for non-VAX/VMS environments.

Further complications of the basic search algorithm arise because of the following capabilities of TECO searches:

  1. If there is no text argument, use the previous search argument.
  2. If colon modified, return success/failure and no error message.
  3. If the search fails inside a loop and a semicolon follows the search command, exit the loop without displaying an error message.
  4. Handle an optional repeat count.
  5. If the search verification flag is non-zero, verify the search based on the value of the flag.
  6. If the corresponding bit of the ED flag is set, move dot by one on multiple searches.
  7. If the corresponding bit of the ED flag is set, don't move after a failing search.
  8. Be fast.

TECO's search commands are:

   1. S - search
   2. N - paged search
   3. FS - search and replace
   4. FN - paged search and replace
   5. _ (underscore) - paged search using yanks
   6. E_ - paged search using yanks, but ignore the yank protection bit in the ED flag
   7. F_ - paged search and replace using yanks
   8. FD - search and delete
   9. FK - search and delete up to the found string
  10. FB - bounded search
  11. FC - bounded search and replace

8 MEMORY MANAGEMENT

8.1 The Edit Buffer And Input Buffer

TECO-C uses a different form of edit buffer memory management than TECO-11. Here's why.

TECO-11 keeps the edit buffer in a continuous block of memory. This allows rapid movement through the edit buffer (by just maintaining a pointer to the current spot) and makes searches very straightforward. Insertion and deletion of text is expensive, because each insertion or deletion requires moving the text following the spot where the insertion or deletion occurs in order to maintain a continuous block of memory. This gets to be a real pain when a video editing capability is added to TECO, because in video mode text is added/deleted one character at a time very rapidly.

TECO-C's edit buffer is also a continuous piece of memory, but there is a gap at the "current spot" in the edit buffer. When the user moves around the edit buffer, the gap is moved by shuffling text from one side of the gap to the other. This means that moving around the text buffer is slower than for TECO-11's scheme, but text insertion and deletion is very fast. Searches are similar because most searches start at the current spot and go forwards or backwards, so a continuous piece of memory is searched.
This scheme was chosen so that the implementation of a video editing shell around TECOC would be efficient.

Memory for the edit buffer is allocated when TECO-C is initialized. Suppose the allocated memory starts at address 3000. The following diagrams illustrate the memory management variables and the values they might have for several situations:

Empty edit buffer:

     EBfBeg = 3000            (edit buffer beginning)
     GapBeg = 3000            (gap beginning)
     GapEnd = 13000           (gap end)
     EBfEnd = 13000           (edit buffer end)

Buffer contains "test", character pointer is before the first 't':

     EBfBeg = 3000            (edit buffer beginning)
     GapBeg = 3000            (gap beginning)
     GapEnd = 12996           (gap end)
              12997   't'
              12998   'e'
              12999   's'
     EBfEnd = 13000   't'     (edit buffer end)

Buffer contains "test", character pointer is after the last 't':

     EBfBeg = 3000    't'     (edit buffer beginning)
              3001    'e'
              3002    's'
              3003    't'
     GapBeg = 3004            (gap beginning)
     GapEnd = 13000           (gap end)
     EBfEnd = 13000           (edit buffer end)

Buffer contains "test", character pointer is after the 'e':

     EBfBeg = 3000    't'     (edit buffer beginning)
              3001    'e'
     GapBeg = 3002            (gap beginning)
     GapEnd = 12998           (gap end)
              12999   's'
     EBfEnd = 13000   't'     (edit buffer end)

When an insertion command is executed, the text is inserted at GapBeg, and GapBeg is advanced past it. When a deletion command is executed, GapEnd is incremented for a forward delete or GapBeg is decremented for a backwards delete. When the character pointer is moved forwards, the gap is moved forwards by copying text from the end of the gap to the beginning. When the character pointer is moved backwards, the gap is moved backwards by copying text from the area just before the gap to the area at the end of the gap.

A special case arises when a bounded search (FB or FC) is executed and the bounded text area includes the edit buffer gap. In this case, the gap is temporarily moved so that the search can proceed over a continuous memory area.

That covers the basic edit buffer gap management.
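The gap operations described above can be sketched in C. The four pointer names come from the text and follow the diagrams' convention that GapEnd and EBfEnd point at the last byte of the gap and of the buffer. The function names (GapInit, GapIns, GapDelF, GapDelB, GapFwd, GapBak) and their return conventions are hypothetical, and error handling is reduced to returning 0 when an operation doesn't fit; TECO-C's actual routines are not shown.

```c
#include <stdlib.h>
#include <string.h>

/* Pointers named as in the text.  GapEnd and EBfEnd point AT the last
   byte of the gap and of the buffer (inclusive), as in the diagrams. */
static char *EBfBeg, *GapBeg, *GapEnd, *EBfEnd;

/* Hypothetical: allocate an empty edit buffer of size bytes. */
static int GapInit(size_t size)
{
    if (size == 0 || (EBfBeg = malloc(size)) == NULL)
        return 0;
    GapBeg = EBfBeg;                    /* empty: the gap is everything */
    GapEnd = EBfEnd = EBfBeg + size - 1;
    return 1;
}

/* Insert one character at the character pointer (GapBeg). */
static int GapIns(char c)
{
    if (GapBeg > GapEnd)                /* no gap left */
        return 0;
    *GapBeg++ = c;
    return 1;
}

/* Forward delete: swallow n characters after the gap into it. */
static int GapDelF(size_t n)
{
    if ((size_t)(EBfEnd - GapEnd) < n)
        return 0;
    GapEnd += n;
    return 1;
}

/* Backward delete: swallow n characters before the gap into it. */
static int GapDelB(size_t n)
{
    if ((size_t)(GapBeg - EBfBeg) < n)
        return 0;
    GapBeg -= n;
    return 1;
}

/* Move the pointer forward n characters: copy text from just after
   the gap to the beginning of the gap. */
static int GapFwd(size_t n)
{
    if ((size_t)(EBfEnd - GapEnd) < n)
        return 0;
    memmove(GapBeg, GapEnd + 1, n);
    GapBeg += n;
    GapEnd += n;
    return 1;
}

/* Move the pointer backward n characters: copy text from just before
   the gap to the end of the gap. */
static int GapBak(size_t n)
{
    if ((size_t)(GapBeg - EBfBeg) < n)
        return 0;
    GapBeg -= n;
    GapEnd -= n;
    memmove(GapEnd + 1, GapBeg, n);
    return 1;
}
```

Starting from an empty 10-byte buffer, inserting "test" and then calling GapBak(2) produces the same layout as the "after the 'e'" diagram: GapBeg is 2 bytes past EBfBeg, and 's' and 't' sit just after GapEnd, with 't' at EBfEnd.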
Following the end of the edit buffer (EBfEnd) is the current input stream buffer. Since file input commands always cause text to be appended to the end of the edit buffer, this is natural. Thus, no separate input buffer is needed: text is input directly into the edit buffer. This makes the code a little confusing, but it avoids the problems of having a separate input buffer. When you have a separate input buffer, you have to deal with the question of how large the buffer should be and what to do when it's too small. It's also faster and saves some memory.

8.2 Q-registers

Each q-register is represented by a structure containing three fields: one to hold the numeric part and two to point to the beginning and end of the memory holding the text part. If the text part of the q-register is empty, then the pointer to the beginning of the text is NULL.

There are 36 global q-registers: 1 for each letter from A to Z and 1 for each digit from 0 to 9. These q-registers are accessible from any macro level. There are 36 local q-registers for each macro level. The names for local q-registers are preceded by a period. Thus the command "1xa" copies a line into global q-register "a", while the command "1x.a" copies a line into the local q-register ".a". Storage for the data structure defining local q-registers is not allocated until a local q-register is first used. This saves space and time, because local q-registers are likely to be used rarely, and doing things this way avoids allocating and freeing memory every time a macro is executed.

9 STACKS

9.1 Expression Stack

Expressions are evaluated using the expression stack. Consider the command string QA+50=$$. When the command string is executed, the value of QA is pushed on the expression stack, then the operator "+" is pushed on the expression stack, and then the value "50" is pushed on the expression stack.
Whenever a full expression that can be reduced is on the expression stack, it is reduced. For the above example, the stack is reduced when the value "50" is pushed.

The expression stack is implemented in the following variables:

     EStack   the stack of operators and operands
     EStTop   index of the top element in EStack
     EStBot   index of the current "bottom" of the stack in EStack

An expression can include a macro invocation. For example, the command QA+M3=$$ causes the value of "QA" to be pushed on the expression stack, then the "+" is pushed, and then the macro contained in q-register 3 is executed. The macro in q-register 3 returns a value to be used in the expression. When the macro is entered, a new expression stack "bottom" is established. Things would get very confusing if the macro were to look at the expression stack or return with several elements on the expression stack.

9.2 Loop Stack

The loop stack holds the loop count and the address of the first command in the loop. For example, in a command such as 5<...>$$, the loop stack contains the loop count (5) and the address of the first command in the loop.

When the ">" at the end of the loop is encountered, the loop count is decremented. If the loop count is still greater than zero after it has been decremented, then the command string pointer is reset to point to the first character in the loop.

The loop stack is implemented in the following variables:

     LStack   loop counts and addresses
     LStTop   index of the top element in LStack
     LStBot   index of the current "bottom" of the stack in LStack

The loop stack needs a "bottom" for the same reason the expression stack needs one: macros. Consider a command string of the form 4<...>$$ whose loop begins with an S command and later contains the command M7. When the "<" is encountered, the loop count (4) and the address of the first character in the loop (S) are placed on the loop stack.
Command execution continues, and the "M7" command is encountered. Suppose that q-register 7 contains the erroneous command string 10>DL>$$. When the ">" command is encountered in the macro, TECO expects the loop stack to contain a loop count and an address for the first character in the loop. In this example, there is no matching "<" command in the macro which would have set up the loop stack. It would be very bad if TECO were to think that the loop count was 4 and the first command in the loop was "S". In this situation, what TECO should do is generate the error message "BNI > not in iteration". In order to implement this, the variable LStBot is adjusted each time a macro is entered or exited. LStBot represents the bottom of the loop stack for the current macro level.

9.3 Macro Stack

The macro stack is used when a macro is entered. All important values are pushed onto the stack before a macro is entered and popped off the stack when the macro is exited. The stack is also used by the EI command, which means it's used when executing initialization files and mung files.

10 HELP

Help is invoked by typing "HELP" followed by a carriage return. HELP is the only TECO command that is not terminated by double escapes.

When HELP is typed, interactive help mode is entered, so that a user can browse through a help tree, as he can from DCL. In TECO-C, access is provided to only two libraries: the library specific to TECO-C (pointed to by logical name TEC$INIT) and the system help library. To get help on TECO-C, just say "HELP", with or without arguments. To get help from the system library, say "HELP/S". I find this easier to use than TECO-11's syntax.

The TECO-C help library is generated from TECOC.HLP, which is generated from TECOC.RNH. See file TECOC.RNH for a description of how to do it.
This help library is far broader than the library for TECO-11, but much of it has yet to be filled in.

The help library also contains the verbose error messages which are displayed when the help flag (EH) is set to 3. For systems other than VMS, the ZHelp function displays verbose text contained in static memory (see ZHelp.c).

11 FILE INPUT

File input is relatively simple: there are no linked list data structures to keep track of, and most file input goes directly to the end of the edit buffer.

Each input call reads a record directly to the end of the edit buffer. After each input call, nothing needs to be moved; the pointer to the end of the edit buffer is simply adjusted to point to the end of the new record. The pointer to the end of the edit buffer serves two purposes: it points to the end of the edit buffer and to the beginning of the input buffer.

Memory is shared between the edit buffer and the input buffer. When the edit buffer is empty, it can be made smaller by shrinking the edit buffer gap in order to make the input buffer larger. Obviously, if the edit buffer needs to be expanded, the input buffer can suffer before more memory is actually requested from the operating system. This is easily achieved by moving the pointer to the "end-of-the-edit-buffer"/"beginning-of-the-input-buffer".

There are several kinds of file input. The EP and ER$ commands provide a complete secondary input stream which can be open at the same time as the primary stream (two input files at once). The EI command reads and executes files containing TECO commands, and is used to execute the initialization file, if one exists. The EQq command, if implemented, reads the entire contents of a file directly into a Q-register.

The way input is terminated is not standard.
For A, Y and P commands, a form feed or end-of-file "terminates" the read. For n:A commands, a form feed, end-of-line or end-of-file "terminates" each read. For EI commands, two escapes or end-of-file "terminate" the read. The input code must "save" the portion of an input record following a special character and yield the saved text when the next command for the file is executed.

Normally, text is read from the current input stream directly to the end of the edit buffer. When the input stream is switched via an EP or ER$ command, the obvious switching of file descriptors happens, and any text that's "leftover" from the last read is explicitly saved elsewhere. Note that this happens VERY rarely, so a malloc/free is acceptable.

For the EQq and EI commands, the memory following the edit buffer is used as a temporary input buffer. After the file is read, the text is copied to a Q-register in the case of EQq and to a separate buffer in the case of EI.

12 PORTING TO A NEW ENVIRONMENT

  1. Copy the source files to the target machine. The AMODEM.C program may be of some help.

  2. Edit ZPort.h to cause the UNKNOWN identifier to be on, others off.

  3. Edit the existing build files to produce a new command procedure that will compile/link on your machine.

  4. Rewrite the system-dependent code (those files that start with a "Z"). You should do this in roughly the following order: ZInit, ZTrmnl, ZAbort, ZDspCh, ZAlloc, ZRaloc, ZFree, ZChin. This will give you a TECO with everything but file I/O. You can run it, add text to the edit buffer, delete it, search, use expressions and the = sign command (a calculator). Then do file input: ZOpInp, ZRdLin, ZIClos. Then do file output: ZOpout, ZWrite, ZOClos, ZOClDe. Use the test macros (*tst*.tec) to test how everything works (see TESTING). The remaining "Z" files are helpful but not essential.
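Steps 2 and 4 can be illustrated with a sketch of ZPort.h-style environment selection plus a few early system-dependent functions for the UNKNOWN environment. The selection logic follows the rules given under COMPILING AND LINKING (exactly one of vax11c, XENIX or UNKNOWN; vax11c supplied by the compiler; VOID for functions that return nothing). The signatures and bodies of ZDspCh, ZAlloc and ZFree shown here are assumptions, not TECO-C's actual code, and the #error check relies on standard C preprocessor features that a 1987 compiler may lack.

```c
#include <stdio.h>
#include <stdlib.h>

/* --- ZPort.h-style environment selection (sketch) ---
   vax11c is predefined by the VAX-11 C compiler.  For a new port,
   UNKNOWN is turned on and XENIX is left off. */
#ifndef vax11c
/* #define XENIX */
#define UNKNOWN
#endif

/* Insist on exactly one environment identifier. */
#if defined(vax11c) + defined(XENIX) + defined(UNKNOWN) != 1
#error "define exactly one of vax11c, XENIX or UNKNOWN"
#endif

#define VOID void   /* type of functions that return no value */

/* --- Hypothetical versions of early "Z" functions --- */

/* Display one character on the terminal. */
VOID ZDspCh(char Charac)
{
#if defined(vax11c)
    (void)Charac;   /* VMS terminal output would go here */
#elif defined(XENIX)
    (void)Charac;   /* XENIX terminal output would go here */
#else               /* UNKNOWN: portable stdio fallback */
    putchar(Charac);
    fflush(stdout);
#endif
}

/* Allocate and free memory; plain malloc/free suffice for UNKNOWN. */
char *ZAlloc(size_t MemSiz)
{
    return malloc(MemSiz);
}

VOID ZFree(char *pointer)
{
    free(pointer);
}
```

Keeping every environment test inside the Z functions means the rest of the code never mentions vax11c, XENIX or UNKNOWN, which is what lets a port touch only ZPort.h and the "Z" files.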
13 TESTING

Test macros are contained in files named TSTxxx.TEC, where xxx is some kind of indication as to what is tested. For instance, TSTQR.TEC tests q-registers. The test macros do not test all the functions provided by TECO. They were originally used to verify that TECO-C performs exactly the same as TECO-11 under the VMS operating system. When I needed to test a chunk of code, I sometimes did it the right way and wrote a macro.