Hi folks! Been a while–just started my (paying) job a few weeks ago, and haven’t had much time to work on Sunshower. I took some time last weekend to implement expression parsing for the Pascal frontend and work on the IR (intermediate representation), so we’re still making some good progress.

You can grab the source over at https://github.com/sunshower-io/updraft

Building

It’s simple to build Updraft. It doesn’t have any external dependencies, so just clone that repo and run go build -o ucc main.go. None of the backends are fleshed out or integrated yet, but you can see some cool things about the structure of your programs.

Supported syntax structures

Right now, things are pretty basic (hahaha, nah, they’re Pascal, which is almost as bad). Pascal’s pretty easy to parse, and so it makes for a good straw-man language. It also supports most of the features that our backends and IRs will require, so using it to punch through the entire compiler stack won’t leave us in the lurch with more modern languages. Right now, the Pascal front-end supports:

Comments

{this is a comment}

{this
   is 
another
comment}

Primitive Types

{string}
str := 'string';
{integers (64 bit)}
a := 432123421341232515;

{floating-point}
c := 2.001e-6;


Compound statements

BEGIN
{ statement list }
END.

Assignments

a := 1;
b := a + a;

Arithmetic Expressions


c := 4 * 2 + 16 / 90 + (200 * (300 + 400) + 500);
a := 2 * (c - 16);

But what we’re demoing this week is some cool trace features.

Feature 1: Pluggable front-ends

While only Pascal’s currently supported, if you look into UCC (Updraft Compiler Collection)’s entry-point, you’ll see that, by using Convection, our Go dependency-injection framework, we only load the compiler toolset for a given language:

func (t *TraceConfiguration) TraceCommand(
        cmd *root.RootCommand,
        c compiler.Compiler, // yaaay.  Injected =)
        cfg *root.RootConfiguration,
) {
    //etc.
}

which keeps startup fast and memory-consumption lean. Eventually, we might support dynamically-linkable front-ends, but since Updraft is intended to be embedded, I doubt it’ll become a problem.

Feature 2: Tracing

UCC contains some pretty legit tracing facilities already, including:

Unresolved symbols:
{test.pas} 
{hello world}

BEGIN
        a := 1;
        b := 200;
        c := 1 * 20 + a;
        e := 1;
        d := c + a + e + nothere;


END.

When run with -st (-s -> summary, t -> tokens) (–only available in trace mode)(actual unresolved symbols during compilation will be reported after lexical analysis)

Results in:

ERROR{type:'syntax', token: TOKEN{text: nothere, line: 8, col: 17, type: Token('IDENTIFIER'), value: '%!s(<nil>)'}

                   3 source lines.
                   0 syntax errors.
0 seconds total parsing time

Symbol table dumps

Adding the -y (./ucc trace -e interpreter -f example.pas -l pascal -sty)
flag for sYmbols on the same file results in:

ERROR{type:'syntax', token: TOKEN{text: nothere, line: 8, col: 17, type: Token('IDENTIFIER'), value: '%!s(<nil>)'}
===== SYMBOL TABLE =====
Identifier          Line Numbers
----------          ------------
a                    004 006 008 
b                    005 
c                    006 008 
d                    008 
e                    007 008 
nothere              008 


                   3 source lines.
                   0 syntax errors.
0 seconds total parsing time

Note that nothere still got a symbol-table entry. That’s because some languages allow forward declarations, and so we’ll report it to the lexer, but subsequent phases could opt to ignore them.

IR dumps

The latest cool feature is the ability to dump the entire IR into JSON (or other formats if you choose). Optimization passes will rewrite this structure, and you can dump the IR out after any number of optimization passes you like. The reducer-interpreter backend will support tracing the IR as it is dynamically rewritten during program execution.

The IR of the above program can be shown (in conjunction with everything else) via:
./ucc trace -e interpreter -f example.pas -l pascal -styp --format json

Which results in everything above, plus:

{
 "node":{
    "type":"compound", 
    "depth":1,
    "value":"<nil>",
    "text":"a",
    "value":"%!s(<nil>)",
    "line":4,
    "position":1,
    "children":[
     {
      "node":{
         "type":"Assign", 
         "depth":6,
         "value":"<nil>",
         "text":"a",
         "value":"%!s(<nil>)",
         "line":4,
         "position":1,
         "children":[
          {
           "node":{
              "type":"var", 
              "depth":11,
              "value":"<nil>",
              "text":":=",
              "value":"%!s(<nil>)",
              "line":4,
              "position":3
           }
          },
          {
           "node":{
              "type":"int64", 
              "depth":11,
              "value":"1",
              "text":"",
              "value":"%!s(int64=1)",
              "line":4,
              "position":6
           }
          }
         ]
      }
     },
     {
      "node":{
         "type":"Assign", 
         "depth":6,
         "value":"<nil>",
         "text":"b",
         "value":"%!s(<nil>)",
         "line":5,
         "position":1,
         "children":[
          {
           "node":{
              "type":"var", 
              "depth":11,
              "value":"<nil>",
              "text":":=",
              "value":"%!s(<nil>)",
              "line":5,
              "position":3
           }
          },
          {
           "node":{
              "type":"int64", 
              "depth":11,
              "value":"200",
              "text":"",
              "value":"%!s(int64=200)",
              "line":5,
              "position":6
           }
          }
         ]
      }
     },
     {
      "node":{
         "type":"Assign", 
         "depth":6,
         "value":"<nil>",
         "text":"c",
         "value":"%!s(<nil>)",
         "line":6,
         "position":1,
         "children":[
          {
           "node":{
              "type":"var", 
              "depth":11,
              "value":"<nil>",
              "text":":=",
              "value":"%!s(<nil>)",
              "line":6,
              "position":3
           }
          },
          {
           "node":{
              "type":"+", 
              "depth":11,
              "value":"<nil>",
              "text":"+",
              "value":"%!s(<nil>)",
              "line":6,
              "position":13,
              "children":[
               {
                "node":{
                   "type":"*", 
                   "depth":16,
                   "value":"<nil>",
                   "text":"*",
                   "value":"%!s(<nil>)",
                   "line":6,
                   "position":8,
                   "children":[
                    {
                     "node":{
                        "type":"int64", 
                        "depth":21,
                        "value":"1",
                        "text":"",
                        "value":"%!s(int64=1)",
                        "line":6,
                        "position":6
                     }
                    },
                    {
                     "node":{
                        "type":"int64", 
                        "depth":21,
                        "value":"20",
                        "text":"",
                        "value":"%!s(int64=20)",
                        "line":6,
                        "position":10
                     }
                    }
                   ]
                }
               },
               {
                "node":{
                   "type":"var", 
                   "depth":16,
                   "value":"<nil>",
                   "text":"a",
                   "value":"%!s(<nil>)",
                   "line":6,
                   "position":15
                }
               }
              ]
           }
          }
         ]
      }
     },
     {
      "node":{
         "type":"Assign", 
         "depth":6,
         "value":"<nil>",
         "text":"e",
         "value":"%!s(<nil>)",
         "line":7,
         "position":1,
         "children":[
          {
           "node":{
              "type":"var", 
              "depth":11,
              "value":"<nil>",
              "text":":=",
              "value":"%!s(<nil>)",
              "line":7,
              "position":3
           }
          },
          {
           "node":{
              "type":"int64", 
              "depth":11,
              "value":"1",
              "text":"",
              "value":"%!s(int64=1)",
              "line":7,
              "position":6
           }
          }
         ]
      }
     },
     {
      "node":{
         "type":"Assign", 
         "depth":6,
         "value":"<nil>",
         "text":"d",
         "value":"%!s(<nil>)",
         "line":8,
         "position":1,
         "children":[
          {
           "node":{
              "type":"var", 
              "depth":11,
              "value":"<nil>",
              "text":":=",
              "value":"%!s(<nil>)",
              "line":8,
              "position":3
           }
          },
          {
           "node":{
              "type":"+", 
              "depth":11,
              "value":"<nil>",
              "text":"+",
              "value":"%!s(<nil>)",
              "line":8,
              "position":16,
              "children":[
               {
                "node":{
                   "type":"+", 
                   "depth":16,
                   "value":"<nil>",
                   "text":"+",
                   "value":"%!s(<nil>)",
                   "line":8,
                   "position":12,
                   "children":[
                    {
                     "node":{
                        "type":"+", 
                        "depth":21,
                        "value":"<nil>",
                        "text":"+",
                        "value":"%!s(<nil>)",
                        "line":8,
                        "position":8,
                        "children":[
                         {
                          "node":{
                             "type":"var", 
                             "depth":26,
                             "value":"<nil>",
                             "text":"c",
                             "value":"%!s(<nil>)",
                             "line":8,
                             "position":6
                          }
                         },
                         {
                          "node":{
                             "type":"var", 
                             "depth":26,
                             "value":"<nil>",
                             "text":"a",
                             "value":"%!s(<nil>)",
                             "line":8,
                             "position":10
                          }
                         }
                        ]
                     }
                    },
                    {
                     "node":{
                        "type":"var", 
                        "depth":21,
                        "value":"<nil>",
                        "text":"e",
                        "value":"%!s(<nil>)",
                        "line":8,
                        "position":14
                     }
                    }
                   ]
                }
               },
               {
                "node":{
                   "type":"var", 
                   "depth":16,
                   "value":"<nil>",
                   "text":"nothere",
                   "value":"%!s(<nil>)",
                   "line":8,
                   "position":17
                }
               }
              ]
           }
          }
         ]
      }
     }
    ]
 }
}

One of the cooler features of UCC is the ability to dynamically insert or substitute IR nodes. For instance, an early language I plan on supporting is Groovy, and it would be relatively easy to append additional nodes that convey type information about a given definition even if that type information isn’t available at compile-time. Non-compatible type-reassignments could remove that type-information, while covariant assignments could specialize it further.

Leave a Reply