Eclector

Jan Moringen (scymtym on IRC)

Agenda

Goal: Introduce and demonstrate Eclector, mainly from a user's perspective.

  1. Introduction and project history
  2. Features (roughly based on structure of documentation)
    • Errors and error recovery (with demo)
    • Customizing ordinary reader features (with demo)
    • Skipped inputs and lower-level entry-points (with demo)
    • Parse results (with demo)
  3. Performance
  4. Problems and future work

Introduction

Introduction: What is the Common Lisp Reader?

The Common Lisp reader is not a single object or concept but a rather loose collection of things described in chapter 2 of the specification.

Conceptually                  Technically
Characters, syntax types      set-syntax-from-char
Readtable, variables          *readtable*, *read-eval*, *read-base*, …
Reader algorithm              read, read-preserving-whitespace, read-from-string
Token interpretation          ? (parse-integer belongs to numbers)
Standard macros               get-macro-character, get-dispatch-macro-character, …

Control and customization of some aspects through readtables and variables.

Project History

  • Started by Robert Strandh as the reader for SICL (old goal: conforming reader implementation).
  • Extracted from SICL (and named) by Robert Strandh (new goal: portable and extensible reader implementation)
  • Completed, made extensible and now maintained by Jan Moringen

So, the name …

The Eclector was the main ship used by Yondu's clan of Ravagers serving as a port to the M-ships. https://marvelcinematicuniverse.fandom.com/wiki/Eclector

I don't know how people could miss such an obvious reference. I mean, with beach talking about comics all the time and everything.

But seriously: Eclector, Eclector.

Project Goals

Create an implementation of the Common Lisp reader algorithm and surrounding machinery that

  1. conforms to the specification
  2. provides excellent, translatable, error messages for all possible syntax errors (and allows user-defined macros to do the same)
  3. can recover from all possible syntax errors (which does not mean it can magically fix invalid code)
  4. is highly extensible and customizable (beyond the standard mechanisms such as reader macros and syntax types)
  5. supports source location tracking and parse result construction
  6. achieves acceptable performance for most clients and use-cases (possible exceptions: reading huge amounts of numeric data)

Non-goal: support for loading an entire file (more on that later)

Project Anatomy

Implementing a specification

Implementing a specification is great! … but our favorite one doesn't cover everything. For example, which of the following are valid?

  • #S (foo)
    A proper list of a structure type name and initargs must follow #S.
  • #C (1 2)
  • #C#|foo|#(1 2)
  • #C#+true(1 2)
  • #C#.(list 1 2)
  • `#',foo
    Unquote is illegal in the function reader macro.

Size (generated using David A. Wheeler's 'SLOCCount'.)
code directory 4,124 lines
test directory 2,930 lines
cost to make $ 433,634
Tests
Very close to 100 % branch coverage (there are still bugs, but regressions are extremely unlikely).
Documentation
texinfo-based with homegrown syntax highlighting.

Clients

SICL
Uses Eclector as the Common Lisp reader
Clasp
Uses Eclector as the Common Lisp reader
yitzi et al's common-lisp-jupyter
Completion
GrammaTech's software evolution library and s-expression diff
Uses Eclector to construct syntax trees
Shinmera's staple documentation system
Example source highlighting and cross-referencing
Second Climacs (future)
Highlighting, syntax checking, basis for static analysis etc.

Side note: There is clear demand for "Eclector but at the s-expression syntax level".

Errors and Error Recovery

Errors: Conditions

At the character level, there are many possible ways of violating the specified Common Lisp syntax:

  • (1 2 3
    While reading list, expected the character ) when input ended.
  • ::foo
    A symbol token must not start with two package markers as in ::name.
  • #\Hyper
    Unrecognized character name: "Hyper"
  • #10R89b76
    The character b is not a digit in base 10.
  • `(:foo ,)
    An object must follow a unquote.

To produce good error messages for as many of those as possible, Eclector can specifically detect around 99 kinds of syntax errors for which it has corresponding condition types. Such as:
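One way to see which condition type a given input triggers is to catch the error and inspect it. The following is a minimal sketch, assuming Eclector is already loaded (for example via Quicklisp); the helper name syntax-error-for is hypothetical and not part of Eclector.

```lisp
;; Assumption: Eclector is already loaded, e.g. via (ql:quickload "eclector").
;; SYNTAX-ERROR-FOR is a hypothetical helper, not part of Eclector.
(defun syntax-error-for (string)
  "Read STRING with Eclector. Return the first error signaled, or NIL
if reading succeeds."
  (handler-case
      (progn (eclector.reader:read-from-string string) nil)
    (error (condition) condition)))

;; Print the input, the condition type and the condition report for
;; the inputs shown above.
(dolist (input '("(1 2 3" "::foo" "#\\Hyper" "#10R89b76" "`(:foo ,)"))
  (let ((condition (syntax-error-for input)))
    (format t "~S~%  ~S~%  ~A~%" input (type-of condition) condition)))
```

Only the first error per input is reported here because handler-case unwinds the stack; continuing after an error requires the recovery mechanism.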

Errors: Recovery

For applications such as

  • editors
  • IDEs
  • syntax checkers
  • static analyzers

it is important to continue processing source code after encountering errors.

(defun foo (x y)
  (+ x #b0020101 (code-char #\Retrn)	'(,(frob::bar y)) #1#)

Reported problems:

  • The character 2 is not a digit in base 2.
  • Unrecognized character name: "Retrn"
  • Avoid tab.
  • Unquote not inside backquote.
  • Do not use unexported symbols.
  • Reference to undefined label #1#.
  • While reading list, expected the character ) when input ended.

The eclector.reader:recover restart (and convenience function of the same name) can be used to recover and continue reading after most syntax errors.

Errors: Recovery Example (1)

The simplest way of making a recovering reader is via the eclector.reader:recover convenience function:
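A minimal sketch of such a recovering reader, assuming Eclector is already loaded and using the same input string as the next example. The wrapper name read-recovering is hypothetical; eclector.reader:recover accepts the condition as an optional argument, so it can be installed directly as a handler.

```lisp
;; Assumption: Eclector is already loaded, e.g. via (ql:quickload "eclector").
;; READ-RECOVERING is a hypothetical wrapper, not part of Eclector.
(defun read-recovering (string)
  "Read STRING with Eclector, recovering from every syntax error by
invoking the ECLECTOR.READER:RECOVER restart."
  (handler-bind ((error #'eclector.reader:recover))
    (eclector.reader:read-from-string string)))

(print (read-recovering "`(::foo ,"))
```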


(ECLECTOR.READER:QUASIQUOTE (:FOO (ECLECTOR.READER:UNQUOTE NIL)))

Errors: Recovery Example (2)

For each error, this prints the error message, prints the restart description and invokes eclector.reader:recover:

(handler-bind ((error (lambda (condition)
                        (let ((restart (find-restart 'eclector.reader:recover)))
                          (format t "Recovering from error:~%~2@T~A~%using~%~2@T~A~2%"
                                  condition restart))
                        (eclector.reader:recover))))
  (print (eclector.reader:read-from-string "`(::foo ,")))
Recovering from error:
  A symbol token must not start with two package markers as in ::name.
using
  Treat the character as if it had been escaped.

Recovering from error:
  While reading unquote, expected an object when input ended.
using
  Use NIL in place of the missing object.

Recovering from error:
  While reading list, expected the character ) when input ended.
using
  Return a list of the already read elements.


(ECLECTOR.READER:QUASIQUOTE (:FOO (ECLECTOR.READER:UNQUOTE NIL)))

Errors: Acclimation Example

Condition and restart reports use the Acclimation library:

(let* ((language (make-instance 'acclimation:german))
       (acclimation:*locale* (make-instance 'acclimation:locale :language language)))
  (handler-bind ((error (lambda (condition)
                          (let ((restart (find-restart 'eclector.reader:recover)))
                            (format t "Behandle Fehler~%~2@T~A~%durch~%~2@T~A~2%"
                                    condition restart))
                          (eclector.reader:recover))))
    (eclector.reader:read-from-string "`(::foo ,")))
Behandle Fehler
  Ein Symbol darf nicht mit zwei Paketmarkierungen beginnen wie bei ::name.
durch
  Behandle die Zeichen als maskiert.

Behandle Fehler
  Beim Lesen eines Antizitats wurde ein Objekt erwartet als die Eingabe endete.
durch
  Verwende NIL anstelle des fehlenden Objekts.

Behandle Fehler
  Beim Lesen einer Liste wurde das Zeichen ) erwartet als die Eingabe endete.
durch
  Erstelle eine Liste bestehend aus den bereits gelesenen Elementen.

Background image: Public Domain, https://en.wikipedia.org/w/index.php?curid=33285421

Demo: Linting and Checking

  • Console linter
  • McCLIM-based graphical linter

Customizing Ordinary Reader Features

Customization: Architecture

Architecture Idea 1

Express all operations performed by the reader as a set of protocols in which each generic function accepts a client parameter.

Examples

(defgeneric eclector.reader:interpret-symbol-token (client input-stream
                                                    token
                                                    position-package-marker-1
                                                    position-package-marker-2))

(defgeneric eclector.reader:evaluate-feature-expression (client feature-expression))
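To illustrate how the client parameter enables customization, here is a minimal sketch in which a client class treats one extra feature as present. It assumes Eclector is already loaded and that the client is supplied by binding eclector.base:*client*. The names demo-client, read-with-demo-client and the feature :demo-feature are hypothetical.

```lisp
;; Assumption: Eclector is already loaded, e.g. via (ql:quickload "eclector").
;; DEMO-CLIENT, READ-WITH-DEMO-CLIENT and :DEMO-FEATURE are hypothetical names.
(defclass demo-client () ())

;; For this client only, the feature expression :DEMO-FEATURE is true
;; regardless of the contents of *FEATURES*. All other feature
;; expressions still go through the default method.
(defmethod eclector.reader:evaluate-feature-expression
    ((client demo-client) (feature-expression (eql :demo-feature)))
  t)

(defun read-with-demo-client (string)
  "Read STRING with Eclector using an instance of DEMO-CLIENT."
  (let ((eclector.base:*client* (make-instance 'demo-client)))
    (eclector.reader:read-from-string string)))

(print (read-with-demo-client "(1 #+demo-feature 2 3)"))
```

Because the behavior is attached to the client class, other code that reads with a different client is unaffected.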

Customization: Architecture

read-call-sequence-ordinary-customization.png

Customization: Sandboxing

Threats and extension points for corresponding mitigation:

Read-time evaluation
eclector.reader:evaluate-expression
Structure constructors
eclector.reader:make-structure-instance
Uncontrolled interning
eclector.reader:interpret-symbol

Demo: Simple Sandboxed Reader

Writing a simple sandboxed reader using the mentioned methods:

Read-time evaluation
eclector.reader:evaluate-expression
Structure constructors
eclector.reader:make-structure-instance
Uncontrolled interning
eclector.reader:interpret-symbol
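A minimal sketch of what such a sandboxed reader might look like, not the code from the demo. It assumes Eclector is already loaded and that the client is supplied by binding eclector.base:*client*. The names sandbox-client and sandboxed-read-from-string are hypothetical, and the policy (refuse evaluation, refuse structure literals, never intern new symbols) is only one possible choice.

```lisp
;; Assumption: Eclector is already loaded, e.g. via (ql:quickload "eclector").
;; SANDBOX-CLIENT and SANDBOXED-READ-FROM-STRING are hypothetical names.
(defclass sandbox-client () ())

;; Threat: read-time evaluation via #. is refused.
(defmethod eclector.reader:evaluate-expression
    ((client sandbox-client) expression)
  (error "Refusing to evaluate ~S at read time." expression))

;; Threat: structure literals via #S are refused.
(defmethod eclector.reader:make-structure-instance
    ((client sandbox-client) name initargs)
  (declare (ignore initargs))
  (error "Refusing to construct an instance of structure ~S." name))

;; Threat: uncontrolled interning. Only symbols that already exist
;; are accepted; nothing is ever interned.
(defmethod eclector.reader:interpret-symbol
    ((client sandbox-client) input-stream package-indicator symbol-name internp)
  (declare (ignore input-stream internp))
  (let ((package (case package-indicator
                   (:current *package*)
                   (:keyword (find-package "KEYWORD"))
                   (t (or (find-package package-indicator)
                          (error "No package ~S." package-indicator))))))
    (multiple-value-bind (symbol status) (find-symbol symbol-name package)
      (if status
          symbol
          (error "Refusing to intern ~A in ~A."
                 symbol-name (package-name package))))))

(defun sandboxed-read-from-string (string)
  "Read STRING with Eclector using an instance of SANDBOX-CLIENT."
  (let ((eclector.base:*client* (make-instance 'sandbox-client)))
    (eclector.reader:read-from-string string)))
```

With this sketch, reading "(car :foo)" succeeds as long as the keyword already exists, while "#.(+ 1 2)" signals an error instead of evaluating the form.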

Skipped Input and Lower-level Entry Points

Skipped Input: Architecture

Architecture Idea 2

Notify the client when non-objects are encountered in the input and provide a read-style function that does not skip over them.

Examples

(defgeneric eclector.reader:note-skipped-input (client input-stream reason))

(defgeneric eclector.reader:call-as-top-level-read
    (client thunk input-stream eof-error-p eof-value preserve-whitespace-p))

(defgeneric eclector.reader:read-maybe-nothing
    (client input-stream eof-error-p eof-value))
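As an illustration of the notification protocol, the following minimal sketch records the reason for every piece of skipped input. It assumes Eclector is already loaded and that the client is supplied by binding eclector.base:*client*. The names noting-client, reasons and skipped-input-reasons are hypothetical.

```lisp
;; Assumption: Eclector is already loaded, e.g. via (ql:quickload "eclector").
;; NOTING-CLIENT, REASONS and SKIPPED-INPUT-REASONS are hypothetical names.
(defclass noting-client ()
  ((reasons :initform '() :accessor reasons)))

;; Called by the reader whenever it skips input that does not
;; correspond to an object, such as a comment.
(defmethod eclector.reader:note-skipped-input
    ((client noting-client) input-stream reason)
  (declare (ignore input-stream))
  (push reason (reasons client)))

(defun skipped-input-reasons (string)
  "Read one object from STRING and return the reasons for all input
skipped while doing so, in the order of occurrence."
  (let* ((client (make-instance 'noting-client))
         (eclector.base:*client* client))
    (eclector.reader:read-from-string string)
    (reverse (reasons client))))

(print (skipped-input-reasons "(list 1 #|foo|# \"bar\")"))
```

A syntax highlighter could additionally query the position of input-stream inside the method to locate the skipped text.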

Skipped Input: Architecture

read-call-sequence-ordinary-skipped-input.png

Demo: Syntax Highlighting

Before

(list 1 #|foo|# "bar"

Call

(highlight-code "<code class=\"src src-lisp\">(list 1 #|foo|# \"bar\"</code>")

After

(list 1 #|foo|# "bar"
While reading list, expected the character ) when input ended.

Used in this presentation and will be used in the Eclector manual.

Demo: Read-time Conditionals

  • Read-time conditional checker
  • Read-time conditional visualizer

Parse Results

Parse Results: Concrete and Abstract Syntax Trees

Concrete Syntax Tree
cst.png
Abstract Syntax Tree
ast.png

Parse Results: Architecture

Architecture Idea 3

As a second way of using Eclector, weave the construction of source locations and parse results (for objects and skipped input) into the normal reader execution.

Examples
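A minimal sketch of a parse result client, assuming Eclector is already loaded. It relies on the eclector.parse-result:parse-result-client class, the eclector.parse-result:make-expression-result generic function and the eclector.parse-result:read entry point. The names parse-client and parse-from-string as well as the property-list representation of results are hypothetical choices.

```lisp
;; Assumption: Eclector is already loaded, e.g. via (ql:quickload "eclector").
;; PARSE-CLIENT and PARSE-FROM-STRING are hypothetical names.
(defclass parse-client (eclector.parse-result:parse-result-client)
  ())

;; Called for every object that has been read. RESULT is the object,
;; CHILDREN are the parse results of its sub-expressions and SOURCE
;; describes the source location.
(defmethod eclector.parse-result:make-expression-result
    ((client parse-client) result children source)
  (list :result result :source source :children children))

(defun parse-from-string (string)
  "Read one object from STRING and return its parse result."
  (with-input-from-string (stream string)
    (eclector.parse-result:read (make-instance 'parse-client) stream)))

(print (parse-from-string "(1 #|comment|# \"string\")"))
```

Since this sketch defines no method for skipped input, the comment does not show up in the result tree.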

Parse Results: Architecture

read-call-sequence-parse-result.png

Demos: Applications of Parse Result

  • Symbol spelling linter
  • Parser thingy

Performance

Performance: Experiment

Test

(time (loop :repeat 10000
            :do (read-from-string "(1 (2 3) #+sbcl #1=\"foo\" `(,#1#))")))

SBCL 2.0.6.debian:

Evaluation took:
  0.020 seconds of real time
  0.020502 seconds of total run time (0.020502 user, 0.000000 system)
  105.00% CPU
  62,665,929 processor cycles
  3,995,504 bytes consed


Eclector master:

Evaluation took:
  0.150 seconds of real time
  0.149348 seconds of total run time (0.149348 user, 0.000000 system)
  [ Run times consist of 0.011 seconds GC time, and 0.139 seconds non-GC time. ]
  99.33% CPU
  52 lambdas converted
  449,065,278 processor cycles
  86,892,784 bytes consed


Performance: Analysis

eclector-read-from-string-flamegraph.png
  • eclector.reader:fixup and friends are a big chunk (and probably account for much of the consing as well, because of hash-table shenanigans)
  • The rest of the profile is relatively flat, but that little green thing is basically everywhere …

Future Work

Future Work: Float Construction

Current float construction is naive:

(let ((magnitude (* (+ (funcall decimal-mantissa)
                       (/ (funcall fraction-numerator)
                          fraction-denominator))
                    (if exponentp
                        (expt 10 (* exponent-sign (funcall exponent)))
                        1))))
  (return-from interpret-token
    (* sign (coerce magnitude type))))
  • Could use a proper algorithm.
  • But better let client decide:

    (defgeneric make-float
        (client type
         sign decimal-mantissa fraction-numerator fraction-denominator
         exponent-sign exponent))
    
  • Or both.

Future Work: Symbol and Package Protocol

Problem
Custom symbol and package representations do not work with some of the builtin behavior:
  • Checking basic syntax of feature expression tests
  • Checking basic syntax of structure literals
Solution(?)
A protocol through which Eclector can handle user-defined symbol and package representations:

Future Work: Extensions

Existing and novel extensions:

  • ::()
  • Rational float syntax
  • Thousand separator
  • hash-table literals
  • Unicode character names

Future Work: Multi-column Error Locations

Future Work: Extensible Quasiquotation

Future Work: Performance

Future Work: org-export?

Thank You for Your Attention!

Eclector Resources:

This presentation
https://techfak.de/~jmoringe/presentation-eclector/slides.html
IRC Channel
#sicl on freenode
Code
https://github.com/s-expressionists/eclector
Documentation
https://s-expressionists.github.io/Eclector/