From LAP to Lisp...

Bytecode and miscellaneous thoughts on the Emacs Runtime

(enter "s" to see presenter text for slides)

Review

  • Bytecode in Emacs is pervasive
  • Gives speedup of 2 to 3 times in computation-bound code
  • Is in reverse Polish prefix (data first, then operator)
  • A lot of the operations are either Lisp or Emacs functions
  • Demonstrated how to turn bytecode to LAP
  • But how we would go the other way?

Wither Bytecode?

  • Other programming languages
    • Common Lisp — is sort of being done
    • Others — huge Elisp code base migration
  • JITs
    • libjit, lightning — works off of Bytecode
    • JVM, LLVM — needs to replicate Lisp environment; garbage collector?
    • V8, SpiderMonkey — pipcet has gc integrated

A Personal History

  • Interest in beefing up the runtime environment in programming languages for debugging
  • Did it for bash, GNU Make, Ruby, ...
  • Perl and Python also have decompilation at runtime
  • Can decompilation be applied to bytecode?
  • Yes, but there is a lot of work to do

A Tale of Two Interpreters part 1: Lisp

plus functionp

ELISP> (functionp 5)
nil
ELISP> (functionp (lambda(x) x))
t
      

plus symbol-function and funcall


ELISP> (symbol-function (lambda(x) x))
(lambda (x) x)
ELISP> (setq rocky-identity (lambda(x) x))
(lambda (x) x)
ELISP> (symbol-function rocky-identity)
(lambda (x) x)
ELISP> (funcall rocky-identity 5)
5 (#o5, #x5, ?\C-e)
		

A Tale of Two Interpreters part 2: Bytecode

plus byte-compile and aref

ELISP> (byte-compile rocky-identity)
#[(x)
  "^H\207"
  [x]
  1]
ELISP> (aref rocky-identity 2)  ;; just the byte string
  "^H\207"
ELISP> (functionp rocky-identity)
t
ELISP> (funcall rocky-identity 5)
5 (#o5, #x5, ?\C-e)
    
plus disassemble

    ELISP> (disassemble rocky-identity)
byte code:
  args: (x)
0	varref	  x
1	return
    

An Emacs Error from the Lisp Interpreter


(fib-bug 3)
Debugger entered--Lisp error: (arith-error)
/(1 0)
(cond ((< n 1) 1) ((= fib-bug-count 0) (/ n 0)) (t (+ (fib-bug (1- n)) (fib-bug (- n 2)))))
fib-bug(1) ;; A link to the source code
(+ (fib-bug (1- n)) (fib-bug (- n 2)))
(cond ((< n 1) 1) ((= fib-bug-count 0) (/ n 0)) (t (+ (fib-bug (1- n)) (fib-bug (- n 2)))))
fib-bug(2) ;; A link to the source code
...
		

An Emacs Lisp Error from the Bytecode Interpreter


(fib-bug 3)
Debugger entered--Lisp error: (arith-error)
fib-bug(1)
fib-bug(2)
fib-bug(3)
eval((fib-bug 3) nil)
elisp--eval-last-sexp(nil)
...
		

Location, Location, Location...

What Emacs concepts exist for storing a location that might be useful here?

marks?
Marks associate a buffer and offset, not a file.

What Emacs concepts exist for storing a location in a file?

  • Docstrings
  • Info nodes

symbol-function Revisited


ELISP> (load-file "/usr/share/emacs/25.2/lisp/files.elc")
t
ELISP> (symbol-function 'insert-file)
#[257 "\300^A\301\";"\207
  [insert-file-1 insert-file-contents]
  4
  ("/usr/share/emacs/25.2/lisp/files.elc" . 154863)
  "*fInsert file: "]
		  

Note the "doc-string" field.

More symbol-function results:


ELISP> (symbol-function 'calendar)
#[(&optional arg)
  ("/usr/share/emacs/25.2/lisp/calendar/calendar.elc" . 40335)
  nil 3
  ("/usr/share/emacs/25.2/lisp/calendar/calendar.elc" . 38677)
  "P"]
      

The first pointer is to bytecode; the second is to docstring.

symbol-function autoload


ELISP> (autoload 'realgud:remake "realgud.el"
      	"Debugger for GNU (re)make" t)
realgud:remake
ELISP> (symbol-function 'realgud:remake)
(autoload "realgud.el" "Debugger for GNU (re)make" t nil)
	  	  

symbol-function defalias


ELISP> (defalias 'remake 'realgud:remake)
remake
ELISP> (symbol-function 'remake)
realgud:remake
		    

symbol-function C function


ELISP> (symbol-function 'end-of-line)
#<subr end-of-line>
        

Tracking Bytecode offsets

An Emacs Lisp Decompiler

Parse to Text

With "binary_op" rule

With "expr" rule where possible

With "binary_expr" rule

Finishing up

Parse at offset 3

Parse at offset 3 Display


(/ a (/ b c))
     -------
				

Parse at offset 4

Parse at offset 4 Text display


(/ a (/ b c))
-------------
		  

A Decompiler Pipeline

Adding Locations to Bytecode


0         1
01234567890123456789
(/ a (/ b c))

("tmp", "divide.el"
  '(0 . (3 . 3))
   (1 . (8 . 8))
   (2 . (10 . 10))
   (3 . (5 . 11))
   (4 . (0 . 12)))
				

Suggested Projects

  • Bytecode Manual
    • bugs: typos, grammar typos, thinkos
    • expand optimization section
    • links back to Emacs/Elisp manual
  • Decompiler:
    • Earley algorithm in Emacs Lisp
    • Fill out grammar
    • Complete the "massage instructions" phase
  • Emacs Hacking
    • Add run-time support for getting offset or adding a "line number table"
    • Bytecode lint checker

Links...












- fin