GSoC 2014: Improved type checker for SPAD compiler

Objectives

The project aims to deliver a new type checker for Spad language. Several improvements over current type checker are planned - i.e.:
  • introduce better type inference,
  • introduce new language constructs (if time allows),
  • produce understandable diagnostic messages (currently the compiler dumps AST as s-expression if it encounters an error),
  • eliminate well known bugs in the type system,
  • find new type errors in FriCAS library code. 
The type checker will be written completely in Spad. Additionally, new implementation will enable me to continue my work with the code generator and formalize the type system.

Techniques employed will involve bottom-up tree walking, graph-like structures for type information management, aggressive type pruning.

Deliverables

The project should deliver a partial implementation of type checker for Spad language. It'll be a set of source files with accompanying documentation. The code will have a single entry point - type checker function - which will accept parsed source file and infer types as needed. The result of the type checker will be a type annotated AST (with "quot; and "@" operators) ready to be fed into existing Spad compiler, or an error message stating (as clearly as possible) why the type checking failed. The new type checker should:
  • work on existing Spad code in FriCAS library without massive source changes,
  • handle most commonly used program structures,
  • detect as many type errors as possible before passing the code to original compiler,
  • catch errors that go undetected with current type checker,
  • show that in some cases it doesn't require user delivered annotations, that are otherwise necessary,
  • make the compilation process faster.
Given research nature of the project some goals are consciously declared vague.

Timeline

Phase #0: March 21st ~ May 18th

I've already begun the work - I'm trying to build some basic tools that I'll use to implement new type checker. The plan is to enter GSoC with all necessary tools ready to build the type checker, i.e.:
  • [done] abstract syntax tree representation
  • [done] generic AST walkers
  • [done] interface to global database (daase)
  • [done] compilation context / environment representation
  • [done] interface to call current compiler
  • [done] pretty printers.
The sources are available here.

Phase #1: May 19th ~ June 27th

I'm going to build an unoptimized bottom-up type checker that will support simple language constructs.

May 19th ~ May 30th

  • support following types:
    • [donebuiltin basic datatypes (String, Integer, DoubleFloat, Symbol),
    • [donemapping type (aka function type),
    • [donefunctors with types as arguments,
    • [donedomain signatures,
    • [left out] category signatures.
  • type inference for:
    • [doneconditional expression (if-then-else),
    • [donefunction application,
    • [donevalue assignment,
    • [doneexpression sequence,
    • [donereturn statement.
  • [done] [newhandle (partially) argument-less functor definition.

June 2nd ~ June 13th

  • type inference for:
    • [donefunction arguments and result,
    • [done] recursive function calls.
  • support for:
    • [donenew function definition,
    • [done] type annotation and conversion.
  • [done] [newhandle type as functor arguments.

June 16th ~ June 27th

  • [donesupport for import statement,
  • [done] [newerror statement,
  • [done] [new] simple loop statement (while .. repeat),
  • [left out] testing and writing documentation.

Phase #2: June 28th ~ August 18th

This phase will be spent writing support for previously skipped language constructs. If any extra time is left I'll optimize signature lookups (e.g. by environment trimming) and other performance critical functions.

June 30th ~ July 11th

  • [done[new] logger infrastructure for debugging,
  • [done[newcheck if parser nodes are well formed for records and unions,
  • [done[new] pass annotated tree to original compiler,
  • support for composite types:
    • [donerecords,
    • [done] tagless unions,
    • [left out] tagged unions.
  • [left out] compiler database modifications,
  • [partially] new domain definition,
  • [left out] new category definition.

July 14th ~ July 25th

  • [doneloop statements,
  • [done[new] segment iterator in loops,
  • [donelist comprehension expressions,
  • [done] union variant checking with case operator,
  • [left out] conditional expressions on types (has, is, pretend operators / predicates)

July 28th ~ August 8th

  • [left out] allow expressions as functor arguments,
  • support for less commonly used but useful constructs:
    • [doneanonymous functions,
    • [done] aggregate access,
    • [done] [new] arbitrary list construction.
  • [done] [new] subtyping relation for built-in types.
  • [partially] [new] domain type evaluator.

August 11th ~ August 18th

  • [new] subtyping for:
    • [done] assignment,
    • [done] function application,
    • [done] function return type.
  • [done] [new] local function definition / declaration.
  • Testing, more testing, even more testing... and writing documentation.

Weekly reports

Phase #1

Week 1

This week was mostly spent on the design of type checker data structures. I decided to use FlexibleArray domain to implement an array of annotated nodes. By it I mean original node delivered by parser with some extra data:
  • indices to parent and children nodes (if any),
  • a list of inferred types associated with the node,
  • a rule that indicates how type checker should process this node,
  • compiler's environment and some other bookkeeping information.
The type checker will be preceded by syntax check phase, which will:
  • determine symbols visibility (lexical scope, shadowing),
  • construct compiler's environment for each node (variable introduction),
  • propagate environment according to the rules (sequence vs. conditional expressions),
  • lookup for symbols' types.
When syntax check phase succeeds an array of nodes is constructed and ready to be passed to type checker.

Week 2

Implemented:
  • basic structures for the type checker - i.e. representation of AST nodes with types and an array of type variables,
  • added missing wrappers to access original compilation environment,
  • query environment for variable / function symbols,
  • conversion of AST nodes back to s-expressions understood by original compiler,
  • create structures for type checker covering basic SPAD expressions.

Week 3

Implemented:
  • add marker nodes to AST to represent references and type variables,
  • simple type unification,
  • partial type inference for basic SPAD expressions,
  • support extra SPAD expressions. 

Week 4

Implemented:
  • AST node equality check,
  • proper type unification with occurs check,
  • substitutions list merge,
  • messy but (quite complete) working type inference.

Week 5

The type checker was restructured to improve readibility and provablity.

I introduced a notion of algorithm meta-step. Between each two consecutive steps a progress is calculated (number of unfinished nodes, number of type variables and type variants). Thus the algorithm can avoid inifite looping. Type inference works very well, but a problem emerge where the same type is represented in two ways, which makes unification fail for such case. 

Summary

[June 23th ~ June 27th]

Phase #2

Week 1

[June 30th ~ July 4th]

Week 2

[July 7th ~ July 11th]

Week 3

[July 14th ~ July 18th]

Week 4

[July 21st ~ July 25th]

Week 5

[July 28th ~ August 1st]

Week 6

[August 4th ~ August 8th]

Summary

[August 11th ~ August 15th]
Comments