This manual is for Incrementalist version 0.2.0.
Incrementalist is a system for incrementally parsing Common Lisp code that has been developed in the context of the Climacs editor for Common Lisp code and extracted into its own system.
Incrementalist uses the Cluffer library to represent its buffers. We briefly describe the essential aspects of that library below. For detailed information on how it works, see the dedicated documentation.
Cluffer proposes two distinct protocols, namely the edit protocol and the update protocol.
We use a special version of the Common Lisp reader, i.e., Eclector, to parse the contents of a buffer. We use a special version of the reader for the following reasons:
As mentioned in Representation of the Editor Buffer, the general control structure for buffer modifications and incremental updates was designed with the following goals:
Notice that it was not a goal that editing operations use as little computational power as possible.
Input events can be divided into two categories:
When an event in the first category occurs, the following chain of events is triggered:
warning: There seem to be cases where the syntax of one buffer depends not only on its own associated buffer, but also on the contents of other buffers. It is not a big problem if the dependency is only on the contents of other buffers, but if the dependency is also on the result of the syntax analysis of other buffers, then one syntax update might invalidate another. In that case, it might be necessary to loop until all analyses are complete. This can become very complicated because there can now be circular dependencies so that the entire editor gets caught in an infinite loop.
Finally, visible views are repainted using whatever combination they want of the buffer contents and the result of the syntax update. The syntax update uses the time stamps of lines in the buffer and of the previous syntax update to compute an up-to-date representation of the buffer. This computation is done incrementally as much as possible.
We call the data structure for storing the (modified) return value of the reader together with start and end location a wad. It contains the following slots:
The representation of a wad is shown in Figure 4.1.
A location in the buffer is considered a top-level location if and only if, when the buffer is parsed by a number of consecutive calls to read, when this location is reached, the reader is in its initial state with no recursive pending invocations. Similarly A wad is considered a top-level wad if it is the result of an immediate call to read, as opposed to of a recursive call.
Let the initial character of some wad be the first non-whitespace character encountered during the call to the reader that produced this wad. Similarly, let the final character of some wad be the last character encountered during the call to the reader that produced this wad, excluding any look-ahead character that could be un-read before the wad was returned.
The value of the start-line slot for a wad w is computed as follows:
The value of the slot height of some wad w is the number of lines between the start line of w and the final character of w. If w starts and ends on the same line, then the value of this slot is 0.
The value of the slot start-column is the absolute column number of the initial character of this wad. A value of 0 means the leftmost column.
The value of the slot end-column is the absolute column number of the final character of the wad.
Incrementalist maintains a sequence 1 of top-level wads. This sequence is organized as two ordinary Common Lisp lists, called the prefix and the suffix. Given a top-level location L in the buffer, the prefix contains a list of the top-level wad that precede L and the suffix contains a list of the top-level wads that follow L. The top-level wads in the prefix occur in reverse order compared to order in which they appear in the buffer. The top-level wads in the suffix occur in the same order as they appear in the buffer. the location L is typically immediately before or immediately after the top-level expression in which the cursor of the current view is located, but that is not a requirement. Figure 4.2 illustrates the prefix and the suffix of a buffer with five top-level expressions.
Either the prefix or the suffix or both may be the empty list. The location L may be moved. It suffices 2 to pop an element off of one of the lists and push it onto the other.
To illustrate the above data structures, we use the following example:
... 34 (f 10) 35 36 (let ((x 1) 37 (y 2)) 38 (g (h x) 39 (i y) 40 (j x y))) 41 42 (f 20) ...
Each line is preceded by the absolute line number. If the wad starting at line 36 is a member of the prefix or if it is the first element of the suffix, it would be represented like this:
36 04 (let ((x 1) (y 2)) (g (h x) (i y) (j x y))) 00 01 ((x 1) (y 2)) 00 00 (x 1) 01 00 (y 2) 02 02 (g (h x) (i y) (j x y)) 00 00 (h x) 01 00 (i y) 02 00 (j x y)
Since column numbers are uninteresting for our illustration, we show only line numbers. Furthermore, we present a list as a table for a more compact presentation.
Occasionally, some top-level wads need to be moved from the prefix to the suffix or from the suffix to the prefix. There could be several reasons for such moves:
These adjustments are always accomplished by repeatedly moving a single top-level wad.
To move a single top-level wad P from the prefix to the suffix, the following actions are executed:
To move a single top-level wad P from the suffix to the prefix, the following actions are executed:
We illustrate this process by showing four possible top-level locations in the example buffer. If all three top-level wads are located in the suffix, we have the following situation:
prefix ... suffix 34 00 (f 10) 02 04 (let ((x 1) (y 2)) (g (h x) (i y) (j x y))) 06 00 (f 20) ...
In the example, we do not show the children of the top-level wad.
If the prefix contains the first top-level expression and the suffix the other two, we have the following situation:
prefix ... 34 00 (f 10) suffix 36 04 (let ((x 1) (y 2)) (g (h x) (i y) (j x y))) 06 00 (f 20) ...
If the prefix contains the first two top-level expressions and the suffix the remaining one, we have the following situation:
prefix ... 34 00 (f 10) 36 04 (let ((x 1) (y 2)) (g (h x) (i y) (j x y))) suffix 42 00 (f 20) ...
Finally, if the prefix contains all three top-level expressions, we have the following situation:
prefix ... 34 00 (f 10) 36 04 (let ((x 1) (y 2)) (g (h x) (i y) (j x y))) 42 00 (f 20) suffix ...
Modifications to the buffer are reported at the granularity of entire lines. The following operations are possible:
Several different lines may be modified between two incremental updates, and in different ways. The first step in an incremental update step is to invalidate wads that are no longer known to be correct after these modifications. This step modifies the data structure described in Prefix and Suffix in the following way:
The order of the wads in the list of residual wads is the same as the order of the wads in the buffer. The slot start-line of each wad in the list is the absolute line number of the initial character of that wad.
Suppose, for example, that the buffer contents in our running example was modified so that line 37 was altered in some way, and a line was inserted between the lines 39 and 40. As a result of this update, we need to represent the following wads:
... 34 (f 10) 35 36 (x 1) 37 38 (h x) 39 (i y) 40 41 (j x y) 42 43 (f 20) ...
In other words, we need to obtain the following representation:
prefix ... 34 00 (f 10) residual 36 00 (x 1) 38 00 (h x) 39 00 (i y) 41 00 (j x y) suffix 43 00 (f 20) ...
While the list of residual wads is being constructed, its elements are in the reverse order. Only when all buffer updates have been processed is the list of residual wads reversed to obtain the final representation.
All line modifications are reported in increasing order of line number. Before the first modification is processed, the prefix and the suffix are positioned as indicated above, and the list of residual wads is initialized to the empty list.
The following actions are taken, depending on the position of the modified line with respect to the suffix, and on the nature of the modification:
Modifications potentially apply to elements of the suffix. When such an element needs to be taken apart, we try to salvage as many as possible of its descendants. We do this by moving the element to a worklist organized as a stack represented as an ordinary Common Lisp list. The top of the stack is taken apart by popping it from the stack and pushing its children. This process goes on until either the top element has no children, or it is no longer affected by a modification to the buffer, in which case it is moved to the list of residual wads.
Let us see how we process the modifications in our running example.
Line 37 has been altered, so our first task is to adjust the prefix and the suffix so that the prefix contains the last wad that is unaffected by the modifications. This adjustment results in the following situation:
prefix ... 34 00 (f 10) residue worklist suffix 36 04 (let ((x 1) (y 2)) (g (h x) (i y) (j x y))) 06 00 (f 20) ...
The first wad of the suffix is affected by the fact that line 37 has been modified. We must move the children of that wad to the worklist. In doing so, we make the start-line of the children reflect the absolute line number, and we also make the start-line of the next wad of the suffix also reflect the absolute line number. We obtain the following situation:
prefix ... 34 00 (f 10) residue worklist 36 01 ((x 1) (y 2)) 38 02 (g (h x) (i y) (j x y)) suffix 42 00 (f 20) ...
The first element of the worklist is affected by the modification of line 37. We therefore remove it from the worklist, and add its children to the top of the worklist. In doing so, we make the start-line of those children reflect absolute line numbers. We obtain the following situation:
prefix ... 34 00 (f 10) residue worklist 36 00 (x 1) 37 00 (y 2) 38 02 (g (h x) (i y) (j x y)) suffix 42 00 (f 20) ...
The first element of the worklist is unaffected by the modification, because it precedes the modified line entirely. We therefore move it to the residue list. We now have the following situation:
prefix ... 34 00 (f 10) residue 36 00 (x 1) worklist 37 00 (y 2) 38 02 (g (h x) (i y) (j x y)) suffix 42 00 (f 20) ...
The first wad of the top of the worklist is affected by the modification. It has no children, so we pop it off the worklist.
prefix ... 34 00 (f 10) residue 36 00 (x 1) worklist 38 02 (g (h x) (i y) (j x y)) suffix 42 00 (f 20) ...
The modification of line 37 is now entirely processed. We know this because the first wad on the worklist occurs beyond the modified line in the buffer. We therefore start processing the line inserted between the existing lines 39 and 40. The first item on the worklist is affected by this insertion. We therefore remove it from the worklist and push its children instead. In doing so, we make the start-line slot those children reflect the absolute line number. We obtain the following result:
prefix ... 34 00 (f 10) residue 36 00 (x 1) worklist 38 00 (h x) 39 00 (i y) 40 00 (j x y) suffix 42 00 (f 20) ...
The first element of the worklist is unaffected by the insertion because it precedes the inserted line entirely. We therefore move it to the residue list. We now have the following situation:
prefix ... 34 00 (f 10) residue 36 00 (x 1) 38 00 (h x) worklist 39 00 (i y) 40 00 (j x y) suffix 42 00 (f 20) ...
Once again, the first element of the worklist is unaffected by the insertion because it precedes the inserted line entirely. We therefore move it to the residue list. We now have the following situation:
prefix ... 34 00 (f 10) residue 36 00 (x 1) 38 00 (h x) 39 00 (i y) worklist 40 00 (j x y) suffix 42 00 (f 20) ...
The first element of the worklist is affected by the insertion, in that it must have its line number incremented. In fact, every element of the worklist and also the first element of the suffix must have their line numbers incremented. Furthermore, this update finishes the processing of the inserted line. We now have the following situation:
prefix ... 34 00 (f 10) residue 36 00 (x 1) 38 00 (h x) 39 00 (i y) worklist 41 00 (j x y) suffix 43 00 (f 20) ...
With no more buffer modifications to process, we terminate the procedure by moving remaining wads from the worklist to the residue list. The final situation is shown here:
prefix ... 34 00 (f 10) residue 36 00 (x 1) 38 00 (h x) 39 00 (i y) 41 00 (j x y) worklist suffix 43 00 (f 20) ...
Once the cache has been processed so that only wads that are known to be valid remain, the new buffer contents must be fully parsed so that its complete structure is reflected in the cache.
Conceptually, we obtain a complete cache by applying read repeatedly from the beginning of the buffer, until all top-level wad have been found. But doing it this way essentially for every keystroke would be too slow. In this section we explain how the partially invalidated cache is used to make this process sufficiently fast.
Jump to: | E P S T U W |
---|
Index Entry | Section | ||
---|---|---|---|
| |||
E | |||
edit protocol: | Representation of the Editor Buffer | ||
| |||
P | |||
prefix: | Prefix and Suffix | ||
| |||
S | |||
suffix: | Prefix and Suffix | ||
| |||
T | |||
top-level location: | WADs | ||
top-level wad: | WADs | ||
| |||
U | |||
update protocol: | Representation of the Editor Buffer | ||
| |||
W | |||
wad: | WADs | ||
|
Jump to: | E P S T U W |
---|