ObjectivesThe project aims to deliver a new type checker for Spad language. Several improvements over current type checker are planned - i.e.: - introduce better type inference,
- introduce new language constructs (if time allows),
- produce understandable diagnostic messages (currently the compiler dumps AST as s-expression if it encounters an error),
- eliminate well known bugs in the type system,
- find new type errors in FriCAS library code.
The type checker will be written completely in Spad. Additionally, new implementation will enable me to continue my work with the code generator and formalize the type system. Techniques employed will involve bottom-up tree walking, graph-like structures for type information management, aggressive type pruning. DeliverablesThe project should deliver a partial implementation of type checker for Spad language. It'll be a set of source files with accompanying documentation. The code will have a single entry point - type checker function - which will accept parsed source file and infer types as needed. The result of the type checker will be a type annotated AST (with "quot; and "@" operators) ready to be fed into existing Spad compiler, or an error message stating (as clearly as possible) why the type checking failed. The new type checker should: - work on existing Spad code in FriCAS library without massive source changes,
- handle most commonly used program structures,
- detect as many type errors as possible before passing the code to original compiler,
- catch errors that go undetected with current type checker,
- show that in some cases it doesn't require user delivered annotations, that are otherwise necessary,
- make the compilation process faster.
Given research nature of the project some goals are consciously declared vague. TimelinePhase #0: March 21st ~ May 18thI've already begun the work - I'm trying to build some basic tools that I'll use to implement new type checker. The plan is to enter GSoC with all necessary tools ready to build the type checker, i.e.: - [done]
abstract syntax tree representation - [done]
generic AST walkers - [done]
interface to global database (daase) - [done]
compilation context / environment representation - [done]
interface to call current compiler - [done]
pretty printers.
The sources are available here. Phase #1: May 19th ~ June 27thI'm going to build an unoptimized bottom-up type checker that will support simple language constructs. May 19th ~ May 30th- support following types:
- [done]
builtin basic datatypes (String, Integer, DoubleFloat, Symbol), - [done]
mapping type (aka function type), - [done]
functors with types as arguments, - [done]
domain signatures, - [left out] category signatures.
- type inference for:
- [done]
conditional expression (if-then-else), - [done]
function application, - [done]
value assignment, - [done]
expression sequence, - [done]
return statement.
- [done] [new]
handle (partially) argument-less functor definition.
June 2nd ~ June 13th- type inference for:
- [done]
function arguments and result, - [done]
recursive function calls.
- support for:
- [done]
new function definition, - [done]
type annotation and conversion.
- [done] [new]
handle type as functor arguments.
June 16th ~ June 27th- [done]
support for import statement, - [done] [new]
error statement, - [done] [new]
simple loop statement (while .. repeat), - [left out] testing and writing documentation.
Phase #2: June 28th ~ August 18thThis phase will be spent writing support for previously skipped language constructs. If any extra time is left I'll optimize signature lookups (e.g. by environment trimming) and other performance critical functions. June 30th ~ July 11th- [done] [new]
logger infrastructure for debugging, - [done] [new]
check if parser nodes are well formed for records and unions, - [done] [new]
pass annotated tree to original compiler, - support for composite types:
- [done]
records, - [done]
tagless unions, - [left out] tagged unions.
- [left out] compiler database modifications,
- [partially]
new domain definition, - [left out] new category definition.
July 14th ~ July 25th- [done]
loop statements, - [done] [new]
segment iterator in loops, - [done]
list comprehension expressions, - [done]
union variant checking with case operator, - [left out] conditional expressions on types (has, is, pretend operators / predicates)
July 28th ~ August 8th- [left out] allow expressions as functor arguments,
- support for less commonly used but useful constructs:
- [done]
anonymous functions, - [done]
aggregate access, - [done] [new]
arbitrary list construction.
- [done] [new]
subtyping relation for built-in types. - [partially] [new]
domain type evaluator.
August 11th ~ August 18th- [new] subtyping for:
- [done]
assignment, - [done]
function application, - [done]
function return type.
- [done] [new]
local function definition / declaration. - Testing, more testing, even more testing... and writing documentation.
Weekly reportsPhase #1Week 1This week was mostly spent on the design of type checker data structures. I decided to use FlexibleArray domain to implement an array of annotated nodes. By it I mean original node delivered by parser with some extra data: - indices to parent and children nodes (if any),
- a list of inferred types associated with the node,
- a rule that indicates how type checker should process this node,
- compiler's environment and some other bookkeeping information.
The type checker will be preceded by syntax check phase, which will: - determine symbols visibility (lexical scope, shadowing),
- construct compiler's environment for each node (variable introduction),
- propagate environment according to the rules (sequence vs. conditional expressions),
- lookup for symbols' types.
When syntax check phase succeeds an array of nodes is constructed and ready to be passed to type checker. - basic structures for the type checker - i.e. representation of AST nodes with types and an array of type variables,
- added missing wrappers to access original compilation environment,
- query environment for variable / function symbols,
- conversion of AST nodes back to s-expressions understood by original compiler,
- create structures for type checker covering basic SPAD expressions.
Week 3Implemented: - add marker nodes to AST to represent references and type variables,
- simple type unification,
- partial type inference for basic SPAD expressions,
- support extra SPAD expressions.
Week 4Implemented: - AST node equality check,
- proper type unification with occurs check,
- substitutions list merge,
- messy but (quite complete) working type inference.
Week 5The type checker was restructured to improve readibility and provablity.
I introduced a notion of algorithm meta-step. Between each two consecutive steps a progress is calculated (number of unfinished nodes, number of type variables and type variants). Thus the algorithm can avoid inifite looping. Type inference works very well, but a problem emerge where the same type is represented in two ways, which makes unification fail for such case. Summary[June 23th ~ June 27th] Phase #2Week 1[June 30th ~ July 4th] Week 2[July 7th ~ July 11th] Week 3[July 14th ~ July 18th] Week 4[July 21st ~ July 25th] Week 5[July 28th ~ August 1st] Week 6[August 4th ~ August 8th] Summary[August 11th ~ August 15th] |