Tim Disney

Sweetening syntactic abstractions

Version 3.0 of Sweet.js has just been released! 🎉

To set expectations, keep in mind that I consider Sweet to still be an experiment and under heavy work. The version number is a reflection of semver (we've made breaking changes) not project maturity. Since the big redesign last year tons of progress has been made, but I expect at least one or two more breaking changes (major version bumps) before things start to get baked and really ready to use. That said, if you are excited about the idea of true syntactic abstractions in JavaScript please dive in!

So what's new?

Custom Operators

Custom operators are back! In the old pre-1.0 days we had the ability to define new operators with custom precedence and associativity but it was dropped in the redesign.

Operators are defined with the new operator keyword:

operator >>= left 1 = (left, right) => {
  return #`${left}.then(${right})`;
}

fetch('/foo.json') >>= resp => { return resp.json() }
                   >>= json => { return processJson(json) }

That expands to:

fetch("/foo.json").then(resp => {
  return resp.json();
}).then(json => {
  return processJson(json);
});
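The `left` in the operator definition is the associativity: a chain of `>>=` groups from the left, which is why the expansion above nests as `(fetch(...).then(...)).then(...)`. Here is a small plain-JavaScript sketch of that grouping. It is not how Sweet implements operators; `bindThen`, `foldLeft`, and `foldRight` are hypothetical helpers that only build strings so the grouping is visible.

```javascript
// Hypothetical helpers, not part of Sweet.js. They build source strings so the
// grouping chosen by each associativity is easy to see.
const bindThen = (left, right) => `${left}.then(${right})`;

// Left-associative: a >>= f >>= g groups as (a >>= f) >>= g.
function foldLeft(operands, combine) {
  return operands
    .slice(1)
    .reduce((acc, operand) => combine(acc, operand), operands[0]);
}

// Right-associative: a >>= f >>= g groups as a >>= (f >>= g).
function foldRight(operands, combine) {
  return operands
    .slice(0, -1)
    .reduceRight(
      (acc, operand) => combine(operand, acc),
      operands[operands.length - 1]
    );
}

const operands = ["fetch('/foo.json')", "parse", "processJson"];
const leftGrouped = foldLeft(operands, bindThen);
const rightGrouped = foldRight(operands, bindThen);
console.log(leftGrouped);  // fetch('/foo.json').then(parse).then(processJson)
console.log(rightGrouped); // fetch('/foo.json').then(parse.then(processJson))
```

Only the left-grouped form is a sensible promise chain, which is presumably why the example declares `>>=` as `left`.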

The implementation of custom operators is pretty experimental at the moment but give it a whirl and let us know if you run into any problems. More details are in the tutorial.

Note: the technical underpinnings for custom operators come out of Jon Rafkind's dissertation on the Honu language.

Modules

We've been steadily adding ES module support over the past few releases. The interaction between macros and modules is fairly complex so this is an ongoing process.

We currently have the ability to import macros from another module:

// foo.js
'lang sweet.js';
export syntax m = // ...

// main.js
'lang sweet.js';
import { m } from './foo';

m // ...

Note the use of the 'lang sweet.js' directives. These directives are currently required in any module that uses macros. They allow the Sweet compiler to avoid needlessly expanding modules that don't contain any macros. At present the directive is "just" an optimization but soon we'll be using it for some pretty cool stuff.

You can now also import modules into compiletime code (macro definitions) by using the for syntax form of import:

// log.js
'lang sweet.js';

export function log(msg) {
  console.log(msg);
}

// main.js
'lang sweet.js';
import { log } from './log.js' for syntax;

syntax m = ctx => {
  log('doing some Sweet things');
  // ...
}

We're taking the Racket approach of dividing everything up into phases. Runtime syntax is in phase 0 and compiletime macro definitions are in phase 1. Importing for syntax allows you to phase shift your code around. Phases greater than phase 1 happen when you import for syntax a macro that uses another macro that was imported for syntax. This gives rise to an infinite "tower of phases" which sounds complicated but turns out to be pretty straightforward in practice.

Still to come are better support for implicit runtime imports, finer grain support for phases that let you import for a specific phase, and an equivalent to Racket's begin-for-syntax.

Note: the technical underpinnings of modules and macros come out of the Racket approach set forward by Matthew Flatt in his "You want it when?" paper.

Readtables

While macros allow you to extend how syntax is parsed, sometimes you also need to extend how source text is lexed. The lexing extension approach we are taking is called readtables and @gabejohnson has been doing some amazing design and implementation work. Sweet now uses readtables internally and will soon be exposing them to users.

Internals and helpers

During expansion Sweet constructs several intermediate representations of syntax that can be manipulated and eventually turned into a Shift AST. The exact representation we want to use is under flux but unfortunately it is exposed to macro authors inside macro definitions. Exposing what should be internal details is bad so to move away from that Sweet now provides a helper library for macro authors:

import * as H from 'sweet.js/helpers' for syntax;

syntax m = ctx => {
  let v = ctx.next().value;
  if (H.isIdentifier(v, 'foo')) {
    return H.fromString(v, 'bar');
  }
  return H.fromString(v, 'baz');
}
m foo; // expands to 'bar'

Macro authors should only use the helper library to inspect and manipulate syntax objects rather than rely on the current representation of syntax. Eventually we will document and freeze an intermediate syntax representation but until then just use the helpers.

What's next?

The current plan is to get Sweet to a solid and stable place where we can start building declarative conveniences on top of its foundation. In particular, the current macro definition syntax is intentionally low-level and not convenient to work in. We've got some ideas about what this might look like but first we're going to make sure the base is solid.

If any of this excites you, please jump in! We'd love to have you!

Announcing Sweet.js 1.0

Sweet.js, the hygienic macro system for JavaScript, just got a shiny new release: the magical 1.0! 🎉

If you’ve been following the development of Sweet for a while you might have noticed a dearth of activity. Part of that was life (I recently finished grad school and started a new job) but part of that seeming lack of activity was actually a giant rewrite in progress.

Yes I know, that’s a thing you should never do.

But I swear it made sense this time. The collection of bad decisions I made early in development had finally made forward progress all but impossible. And so, Sweet has been completely rewritten.

So what’s good about the rewrite? Lots of stuff, but there are two things I want to call out specifically here.

First, a proper parser is now integrated in the expansion pipeline. Previously, Sweet had a half-baked pseudo-parser that kinda-sorta built up an AST during expansion and then immediately flattened it back to tokens to let our crazy fork of esprima build up a proper AST before doing codegen.

Yeah, totes reasonable.

This insane pipeline had tons of bugs especially related to proper ES2015 support.

Now we have a proper parser modeled after the Shift Parser and producing a Shift AST with Babel as an optional backend for great ES2015 support everywhere. It’s amazing.

Second, Sweet now has much more reasonable macro binding forms. Previously, you could define a macro in two ways, the recursive and non-recursive declaration forms:

// recursive form
macro foo { /* ... */ }

// non-recursive form
let foo = macro { /* ... */ }

The recursive form binds the macro name inside the macro definition while the non-recursive form does not. This was gross and confusing because standard let in ES2015 does not work like this at all. Now, Sweet uses syntax and syntaxrec:

// non-recursive form
syntax foo = function (ctx) { /* ... */ }
// recursive form
syntaxrec foo = function (ctx) { /* ... */ }

These are more symmetric and fit in better as a compiletime extension of the var/let/const binding forms of ES2015.

While all these changes are great, there are a few items from the pre-1.0 days that have not been re-implemented yet. Custom operators and infix macros in particular are not currently supported. However, the foundation provided by the rewrite will make adding these features back pretty straightforward so expect them to be available soon.

So should you dive in with the new Sweet? Well, to be honest it really depends on how risk averse you are. Since so much has changed with the rewrite there are bound to be bugs we still need to shake out. Maybe don’t go running Sweet over your production code just yet.

But if you’d like to help out by putting Sweet through its paces we would love to have you! The best way to get started is to familiarize yourself with all the new syntax by reading the tutorial. If you’ve got questions or need help, head on over to gitter or #sweet.js on irc.mozilla.org.

Have fun sweetening your code!

Hygiene in sweetjs

The most important feature of sweet.js is hygiene. Hygiene prevents variable names inside of macros from clashing with variables in the surrounding code. It's what gives macros the power to actually be syntactic abstractions by hiding implementation details and allowing you to use a hygienic macro anywhere in your code.

For hygiene to work sweet.js must rename variables. Recently several people have asked me why sweet.js renames all the variables. Wouldn't it be better and cleaner to only rename the variables that macros introduce?

The tl;dr is "because hygiene" but let's unpack that a little.

Hygiene Part 1 (Binding)

The part of hygiene most people intuitively grok is keeping track of the variable bindings that a macro introduces. For example, the swap macro creates a tmp variable that should only be bound inside of the macro:

macro swap {
  rule { ($a, $b) } => {
    var tmp = $a;
    $a = $b;
    $b = tmp;
  }
}

var tmp = 10;
var b = 20;
swap (tmp, b)

Hygiene keeps the two tmp bindings distinct by renaming them both:

var tmp$1 = 10;
var b$2 = 20;

var tmp$3 = tmp$1;
tmp$1 = b$2;
b$2 = tmp$3;
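To see why the two tmp bindings have to stay distinct, here is a plain-JavaScript sketch comparing the expansion with no renaming at all against the renamed expansion shown above. The wrapper functions `swapWithoutHygiene` and `swapWithHygiene` are just illustrative names so both versions can run side by side.

```javascript
// Expansion with no renaming at all: the macro's tmp and the user's tmp
// collapse into a single variable.
function swapWithoutHygiene() {
  var tmp = 10;
  var b = 20;

  var tmp = tmp; // redeclaring with var is allowed, so tmp is assigned to itself
  tmp = b;
  b = tmp;
  return [tmp, b];
}

// The hygienic expansion from above, with every binding renamed.
function swapWithHygiene() {
  var tmp$1 = 10;
  var b$2 = 20;

  var tmp$3 = tmp$1;
  tmp$1 = b$2;
  b$2 = tmp$3;
  return [tmp$1, b$2];
}

console.log(swapWithoutHygiene()); // 20 and 20: the original value of tmp is lost
console.log(swapWithHygiene());    // 20 and 10: the values are actually swapped
```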

This is the point where most people say "why bother renaming the variables outside of the macro?" Can't you just rename the bindings created by the macro? Wouldn't it be cleaner for the expansion to just be something like:

var tmp = 10;
var b = 20;

var tmp$1 = tmp;
tmp = b;
b = tmp$1;

Hygiene Part 2 (Reference)

The complication comes in with variable references. The body of a macro can contain references to bindings declared outside of the macro and those references must be consistent no matter the context in which the macro is invoked.

Some code to clarify. Let's say you have a macro that uses a random number function:

var random = function(seed) { /* ... */ }
let m = macro {
    rule {()} => {
        var n = random(42);
        // ...
    }
}

This macro needs to refer to random in any context that it gets invoked. But its context could have a different binding to random!

function foo() {
    var random = 42;
    m ()
}

Hygiene needs to keep the two random bindings different. So sweet.js will expand this into something like:

var random$1 = function(seed$4) { /* ... */ }
function foo() {
    var random$2 = 42;
    var n$3 = random$1(42);
    // ...
}

Note that there is no way for hygiene to do this if it only renamed identifiers inside of macros since both random bindings were declared outside of the macro. Hygiene is necessarily a whole program transformation.
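You can watch the capture happen in plain JavaScript. The sketch below pastes the macro body into foo by hand, once without renaming and once with the renamed bindings from the expansion above. The body of random is a stand-in (the original elides it), and `fooNaive` / `fooHygienic` are illustrative names.

```javascript
// Stand-in body; the post elides what random actually does.
var random = function (seed) { return seed + 1; };

// Naive expansion of m() inside foo: the macro's reference to random is
// captured by foo's local binding, which is a number.
function fooNaive() {
  var random = 42;
  var n = random(42); // throws TypeError: random is not a function
  return n;
}

// Hygienic expansion: both random bindings are renamed, so the macro body
// still reaches the outer function.
var random$1 = function (seed$4) { return seed$4 + 1; };
function fooHygienic() {
  var random$2 = 42;
  var n$3 = random$1(42);
  return n$3;
}

try {
  fooNaive();
} catch (err) {
  console.log(err instanceof TypeError); // true
}
console.log(fooHygienic()); // 43
```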

(ps if this sort of feels like a closure you're on to something: one of the early techniques that led to modern hygiene algorithms was called syntactic closures)

Strictly speaking the hygiene algorithm is still conservative. Variable bindings declared outside of a macro that are never referenced by a macro don't really need to be renamed. However, modifying the hygiene algorithm to only rename exactly what needs to be renamed seems pretty difficult (especially to do so efficiently). If anyone knows techniques for this definitely let me know (or even better submit a pull request).

How to read macros

In my last post I gave a little overview of sweet.js, the hygienic macro system I built over the summer. Today I want to write a little bit about what makes sweet.js possible and why we haven't really seen a macro system for JavaScript before now. I gave hints at some of this in my intern talk but now we can finally do a deep dive!

Basics

First, let's take a look at compilers 101:

[Figure: Parser Pipeline]

The traditional way you build a compiler front end is to first write a lexer and a parser. Code (a string) is fed to the lexer which produces an array of tokens which gets fed to the parser which produces an Abstract Syntax Tree (AST).

For example,

if (foo > bar) {
    alert("w00t!");
}

gets lexed into something like:

["if", "(", "foo", ">", "bar", ")", "{" ...]
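A toy lexer that produces an array like this can be sketched in a few lines of plain JavaScript. This is only an illustration, not esprima's or sweet.js's lexer: the `lex` function and its regular expression are assumptions that handle identifiers, integers, double-quoted strings, and single-character punctuators, and ignore comments, multi-character operators, and regex literals.

```javascript
// Toy lexer (illustrative only). Each alternative matches one kind of token:
// identifier/keyword | integer | double-quoted string | single punctuator.
const TOKEN_PATTERN =
  /[A-Za-z_$][A-Za-z0-9_$]*|\d+|"(?:[^"\\]|\\.)*"|[^\sA-Za-z0-9_$]/g;

function lex(source) {
  // match() with a global regex returns every match in order and skips the
  // whitespace between them, which is exactly the "throw away whitespace" job.
  return source.match(TOKEN_PATTERN) || [];
}

const tokens = lex('if (foo > bar) {\n    alert("w00t!");\n}');
console.log(tokens);
// if ( foo > bar ) { alert ( "w00t!" ) ; }   (one array element per token)
```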

The lexer is basically responsible for throwing away unnecessary whitespace and grouping identifiers, strings, and numbers into discrete chunks (i.e. tokens). The array of tokens is then parsed into an AST that might look something like this:

// output of esprima
{
    "type": "Program",
    "body": [
        {
            "type": "IfStatement",
            "test": {
                "type": "BinaryExpression",
                "operator": ">",
                "left": {
                    "type": "Identifier",
                    "name": "foo"
                },
                "right": {
                    "type": "Identifier",
                    "name": "bar"
                }
            },
            "consequent": {
                "type": "BlockStatement",
                "body": [{
                        "type": "ExpressionStatement",
                        "expression": {
                            "type": "CallExpression",
                            "callee": {
                                "type": "Identifier",
                                "name": "alert"
                            },
                            "arguments": [{
                                    "type": "Literal",
                                    "value": "w00t!",
                                    "raw": "\"w00t!\""
                                }]
                        }
                    }]
            },
            "alternate": null
        }
    ]
}

The AST gives you the structure necessary to do code optimization/generation etc.

So where can we fit macros into this picture? Which representation is best for macros to do their stuff?

Well, by the time we get to an AST it's too late since parsers only understand a fixed grammar (well, technically there is research on adaptive/extensible grammars but that way leads to madness!). Obviously the raw code as a string is too unstructured for macros so how about the array of tokens produced by the lexer?

Tokens are fine for cpp #define style macros but we want moar power! And, as it turns out, just normal tokens aren't going to cut it for us. Consider this simple macro that provides a concise way to define functions:

macro def {
    case $name $params $body => {
        function $name $params $body
    }
}
def add(a, b) {
    return a + b;
}

which should be expanded into:

function add(a, b) {
    return a + b;
}

Critically, note that the macro needs to match $params with (a, b) and $body with { return a + b; }. However, we don't have enough structure to do this with just the token array ["def", "add", "(", "a", ",", "b", ...]: we need to match the delimiters ((), {}, and []) first!

If you remember your compilers class (or went to wikipedia), delimiter matching is what separates context-free languages (what parsers recognize) from regular languages (what lexers recognize).

This is one of the reasons why macros are big in the lisp family of languages (scheme, racket, clojure, etc.). S-expressions (with (all (those (parentheses)))) are already fully delimited so it becomes almost trivial to do delimiter matching. Some people say this is due to homoiconicity but as Dave Herman pointed out, homoiconicity isn't really the point. It's not that the lisp family is homoiconic but rather that the nature of s-expressions makes it easy to implement read which is necessary for macros.

read is the crucial function that gives a little bit more structure to the array of tokens by matching up all those delimiters. Now instead of just a flat array of tokens we are going to get a read tree:

["def", "add", {
    type: "()",
    innerTokens: ["a", ",", "b"]
    }, {
    type: "{}",
    innerTokens: ["return", "a", "+", "b"]
}]

Note this doesn't have all the structure of a full AST, it just knows about tokens and delimiters not expressions and statements. So now our def macro pattern variables will match up like so:

$params -> {
    type: "()",
    innerTokens: ["a", ",", "b"]
}
$body -> {
    type: "{}",
    innerTokens: ["return", "a", "+", "b"]
}
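Here is a minimal sketch of what read could look like in plain JavaScript, assuming the tokens are plain strings and using the same { type, innerTokens } shape as the read tree above. It is not the sweet.js implementation, and it sidesteps the / problem discussed in the next section; `readUntil` and the `CLOSERS` table are illustrative names.

```javascript
// Maps each opening delimiter to its closing partner.
const CLOSERS = new Map([["(", ")"], ["{", "}"], ["[", "]"]]);
const CLOSING = new Set(CLOSERS.values());

function read(tokens) {
  let pos = 0;

  // Collect tokens until `closer` is seen (or the input ends when closer is null),
  // recursing whenever an opening delimiter starts a nested group.
  function readUntil(closer) {
    const out = [];
    while (pos < tokens.length) {
      const tok = tokens[pos++];
      if (tok === closer) return out;
      if (CLOSERS.has(tok)) {
        const close = CLOSERS.get(tok);
        out.push({ type: tok + close, innerTokens: readUntil(close) });
      } else if (CLOSING.has(tok)) {
        throw new SyntaxError("unexpected " + tok);
      } else {
        out.push(tok);
      }
    }
    if (closer !== null) throw new SyntaxError("missing " + closer);
    return out;
  }

  return readUntil(null);
}

const tree = read(["def", "add", "(", "a", ",", "b", ")",
                   "{", "return", "a", "+", "b", "}"]);
console.log(JSON.stringify(tree, null, 2));
```

With the tree in hand, matching $params and $body is just a matter of picking out the third and fourth elements.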

Great! Now our pipeline looks something like:

[Figure: Pipeline With Read]

The Devil in the Details

Ok, so all well and good but then why haven't we seen people implement read and build a macro system for JavaScript before now?

It turns out that there's this really annoying token (for potential macro implementers) in JavaScript: /.

Here's the problem: depending on context, / can mean two different things, the divide operator or the beginning of a regular expression literal.

10 / 2              // 5
/foo/.test("foo")   // true

(Well technically I guess / can also mean the start of a comment but this is always easy to figure out (since // always means line comment))

So how do we disambiguate between divide and regex? It turns out that the way a normal parser (like esprima for example) does it is by running the lexer and parser together and resolving the ambiguity via the current parsing context. In other words, as the parser is working through each production, it calls out to the lexer with a flag saying what context it is in. Depending on that context the lexer will either lex / as a divide token or as a regular expression literal.

But, we can't use the parsing context in read because we don't have any parsing context yet!

So, we somehow need to separate the lexer/reader from the parser.

Now you might think we could get away with just leaving / as an ambiguous token (say a divOrRegex token for example) to be handled by the parser once all the macros have been expanded away, but consider this code fragment we might want to read:

... { /foo}bar/ ...
// as a token array this would be
// [... "{", "/", "foo", "}", "bar", "/", ...]

Remember that the entire point of read is to do delimiter matching, so should we match the } with the opening { or as part of a regular expression literal (remember /}/ is a valid regex that matches a single })? It completely depends on our interpretation of /!

Therefore, in our lexer/reader we must disambiguate the meaning of / without the help of the parser. So how do we do that?

This is the hard technical problem that Paul Stansifer (he also designed the Rust macro system) solved this summer, unlocking the power of JavaScript macros for us all!

The basic idea is when you see a / as you are reading, just look back a couple of tokens and a small fixed set of tokens will determine unambiguously if / should be a divide or the start of a regex literal. To figure out exactly how far back and which tokens to look for requires working through all the different cases in the JavaScript grammar which is hard but done!

A snippet of this algorithm goes something like:

if tok is /
    if tok-1 is )
        look back to matching (
        if identifier before ( in "if" "while"
                                  "for" "with"
            tok is start of regex literal
        else
            tok is divide
    ...

For example, if we have:

if (foo + 24 > bar) /baz/.test("foo")

When we see the / we note that the previous token was ) so we find its matching ( and note that the token before that was if so / must be the beginning of a regular expression.

What's really cool here is that when we need to disambiguate / we've already been reading up to that point so (foo + 24 > bar) is a single token (the () token with inner tokens foo, +, 24, >, and bar) and checking the token before the parens is literally as simple as tokens[idx-2] === "if". By creating the read tree as we go along we don't need to carry lookbehind state in a complicated way; in fact, in the worst case, we only have to look back 5 tokens.
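As a rough JavaScript sketch, the one branch of the algorithm shown in the snippet above might look like this. It assumes tokens are strings and that a parenthesized group has already been read into a { type: "()", innerTokens } object; `classifySlash` is a hypothetical name, and every case the snippet elides returns "unknown" here rather than guessing.

```javascript
// Keywords whose parenthesized head is followed by a statement, so a / after
// the closing paren has to start a regex literal.
const PAREN_KEYWORDS = new Set(["if", "while", "for", "with"]);

// tokens: the read tree built so far; idx: the position of the "/" in question.
function classifySlash(tokens, idx) {
  const prev = tokens[idx - 1];
  const prevIsParens =
    prev !== null && typeof prev === "object" && prev.type === "()";

  if (prevIsParens) {
    // The whole (...) group is a single element, so the token before the
    // parens is simply two positions back from the slash.
    return PAREN_KEYWORDS.has(tokens[idx - 2]) ? "regex" : "divide";
  }
  return "unknown"; // remaining cases of the real algorithm are not covered
}

const cond = { type: "()", innerTokens: ["foo", "+", "24", ">", "bar"] };
const afterIf = classifySlash(["if", cond, "/"], 2);
const afterCall = classifySlash(["f", { type: "()", innerTokens: ["x"] }, "/"], 2);
console.log(afterIf);   // regex
console.log(afterCall); // divide
```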

If you want to read more about how this works, I've got the entire algorithm pseudo-coded up here and the actual JavaScript implementation in these relatively short two functions.

Hygienic Macros for JavaScript

Been slow to embloggen this (start of the quarter etc.) but my summer intern project was released a little while ago at sweetjs.org. Sweet.js is a hygienic macro compiler for JavaScript that takes JavaScript written with macros and produces normal JavaScript you can run in your browser or on node. It's an experiment to see if macros can work well for JavaScript.

Macros allow you to extend the syntax of JavaScript by writing macro definitions that perform syntax transformations, thus allowing you to do cool things like add sweet syntax for var destructuring or even haskell-like do notation.

The idea is to provide a middle ground between no syntax extension and having to build an entire compile-to-js language like CoffeeScript, which can't compose with other syntax extensions.

I talk a bit about the motivation and design in this presentation I gave at the end of my internship.

The language that most directly inspires sweet.js is, of course, scheme. I'm not really much of a schemer myself but have always admired the fancy work going on in that world and macros are pretty fancy. At the start of the summer I was still somewhat of a macro newbie but being at Mozilla was fantastic since I was able to draw on and learn from two well-versed scheme macrologists: Dave Herman (whose PhD thesis was on macros) and Paul Stansifer (who has been developing the Rust macro system).

Sweet.js is still in very early stages, lots of bugs and missing features, but I think it shows some real promise. Let me know what you think!