(thesis 3/4) An interpreter for Spatial

Posted on August 16, 2017 by Ruben Fiszel

About

This post is the part III out of IV of my master thesis at the DAWN lab, Stanford, under Prof. Kunle and Prof. Odersky supervision. The central themes of this thesis are sensor fusion and spatial, an hardware description language (Verilog is also one, but tedious).

This part is about the spatial language and the development of its interpreter.

An interpreter for Spatial

Spatial: A Hardware Description Language

Building applications is only made possible thanks to the many layers of abstractions that start fundamentally at a rudimentary level. It is easy to forget how much of an exceptional feat of engineering is running an application.

An Hardware vs Software abstraction layers overview

An Hardware vs Software abstraction layers overview

An Hardware Description Language (HDL) is used to describe the circuits on which applications run on. A Software programming language describe the applications themselves (imperative languages focus on the how, and functional programming languages on the what). Fundamentally, their purpose is different. But with a sufficient level of abstraction, they share many similarities.

c := a + b would translate in software by an instruction to store in the memory (stack or heap) the sum of a and b, stored themselves somewhere else in memory. In hardware, depending on whether c represents a wire, a register, a memory location in SRAM or DRAM, the circuit is changed. However, from the user perspective, the source code looks the same. One could think that it would be possible to write Hardware exactly the same way as Software, but this is delusional. Some concepts are tied to the intrinsic nature of Hardware and hopelessly inexpressible in the world of Software. A DSL that would abstract away those differences would result in a great loss of control for the user. Nevertheless, with the right level of abstraction it is possible to at least bridge the gap to a level satisfying for both the Software and Hardware engineers. This is the motivation behind Spatial.

Spatial is hardware description language (HDL) born out of the difficulties and complexity of designing Hardware. An HDL compiles to Register-Transfer Level (RTL), an equivalent to assembly in the software world. Then the RTL is synthesized as Hardware (either as Application-specific integrated circuit (ASIC) or as a bitstream reconfiguration data). The current alternatives for HDLs available are Verilog, VHDL, HLS, Chisel and many others. What sets apart Spatial from the crowd is that Spatial has a higher level of abstraction by leveraging parallel patterns, abstracting control flows as language constructs and automatic timing and banking. Spatial targets “spatial architectures” constituted currently of the Field-Programmable Gate Array (FPGA) and a Coarse Grain Reconfigurable Arrays (CGRA) developed also by the lab, Plasticine. Chisel is actually the target language of Spatial for the FPGA backend. Parallel patterns and control flows are detailed in Part IV.

Spatial is at the same time a language, an Intermediate Representation (IR) and a compiler. The Spatial language is embedded in Scala as a domain specific language (DSL). The compiler is built around Argon as a set custom defined traversals, transformers and codegens. The Spatial compiler is referred to as “the staging compiler” to differentiate it from scalac, the Scala compiler.

Argon

Spatial is built on top of Argon, a fork of Lightweight Modular Staging (LMS). Argon and LMS are Scala libraries that enable staged programming (also called staged meta-programming). Thanks to Argon, language designers can specify a DSL and a custom compiler. In this DSL, users can write and run meta-programs and more specifically program generators: programs that generate other programs.

Argon is:

Staged type

A staged type is a type that belongs to the specified DSL and has a staging annotation. Only instances of a staged type will be transformed into the initial IR.

Indeed, for a type to be considered a staged type, it must inherit from MetaAny and have an existing typeclass instance of Type. The justification behind the dual proof of membership is that the Type context bound is more elegant to work with in most cases. Nevertheless, it suffers that it is impossible to specialize methods such that they treat differently staged and unstaged types. Only inheritance can guarantee correct dispatching of methods according to whether the argument is staged or not. Implementing the typeclass and dual proof of membership was among the contributions of this work to Argon.

trait MetaAny
trait Type[A]

case class Staged() extends MetaAny
case class Unstaged()

implicit object StagedInstance extends Type[Staged]

object Attempt1 {
	//equivalent to def equal(x: Any, y: Any) = 
	def equal[A, B](x: A, y: B) = 
		1

	def equal[A: Type, B: Type](x: A, y: B) = 
		2	
}

object Attempt2 {
	def equal(x: Any, y: Any) = 
		1

	def equal(x: MetaAny, y: MetaAny) = 
		2
}

Attempt1.equal(Unstaged(), Unstaged())
//return error: ambiguous reference to overloaded definition
Attempt1.equal(Staged(), Staged())
//return error: ambiguous reference to overloaded definition

Attempt2.equal(Unstaged(), Unstaged())
//return 1 as expected
Attempt2.equal(Staged(), Staged())
//return 2 as expected
Example of inheritance solving correct dispatching

Example of inheritance solving correct dispatching

IR

The IR in Argon is based on a “sea-of-nodes” representation of the object-program. The “sea-of-nodes” representation is a graph containing all kind of dependencies and anti-dependencies between the nodes of the IR. The IR is the data manipulated by the staging compiler and is transformed after each pass. The IR nodes are staged Exp values. Exp is an algebraic data type (ADT), more precisely a sum type of the following form:

graph TD Exp Dyn Bound Sym Const Param Exp-->Dyn Dyn-->Bound Dyn-->Sym Exp-->Const Const-->Param

Transformer and traversal

A Traversal is a pass of the compiler that traverses (iterates through) the entire IR and applies an arbitrary function. It can either be used to check if the IR is well-formed or to gather some logs and stats about the current state of the IR. Since the IR is a “sea-of-nodes”, it has to be linearized first by a scheduler as a sequence of nodes. Codegen is defined as a traversal.

A Transformer is a Traversal that not only traverses the IR but also transforms it into a new IR.

Language virtualization

Using Scala macros, some of the primitive syntax constructions and keywords of Scala are made interoperable with Spatial staged types. The following parts are currently virtualized (where cond is an Argon.Boolean):

Below, a and b are Any:

Source Context

All usage of the DSL in the meta-program is accompanied with an implicit macro expansion of a SourceContext object. That object is passed along in the IR such that all IR nodes have an associated SourceContext. That object contains context information such as the line number, the position in the line, the method name, and the content of the line on which the DSL node at the origin is located. This is how the interpreter can display the surrounding context of each interpreted instruction.

Meta-expansion

Since DSLs are embedded in Scala, it is possible to use the Scala language as a meta-programming tool directly. The construction of the IR is done in an imperative manner and only staged types are visible to the staging compiler.

List.tabulate(100)(_ => Reg[Int])

will be meta-expanded as the creation of 100 Registers.

For the same reason, when named Scala functions are defined and called inside a Spatial program, the function call is not staged but inlined during meta-expansion.

def f(x: argon.Int) = 
	//very long body
	
val b: argon.Boolean = ...

//thanks to language virtualization, this is syntaxic sugar for
//ifThenElse(b, f(0), f(1)) where ifThenElse is an argon defined
//function
if (b) 
	f(0)
else 
	f(1)

is expanded into

val b: argon.Boolean = ...

if (b)
	//very long body involving 0
else 
	//very long body involving 1

Codegen

graph TD subgraph Scala Compile Time: 1st stage prog(Meta-Program) scalac end subgraph Scala Runtime: 2nd stage exec(Executable Meta-Program) expanded(expanded DSL nodes) OIR(raw IR) subgraph passes transformer IR(IR) end codegen target(Chisel Object Program) chisel[Chisel Compiler] target2(Verilog Program) verilog[Verilog Compiler] target3(Hardware Design) end prog-->scalac scalac-->exec exec-->|meta-expansion|expanded expanded-->|staging|OIR OIR-->transformer IR-->transformer transformer-->IR IR-->codegen codegen-->target target-->chisel chisel-->target2 target2-->verilog verilog-->target3

After the IR has been transformed through all the passes, it is sufficiently refined to be processed by the codegen. The codegen is implemented as a traversal which, after linearization by scheduling, visits each item of a sequence of pair of Sym and Def. Each pair is transformed according to the Def as a string in the format of the target language and written to the output file. Def nodes have versatile meaning since they encompass the full range of the language. Language designers add Def nodes to their language in a modular manner. For Spatial, each kind of data type have an associated set of Defs which are defined in their own modules and mixed-in incrementally to the compiler. For instance, argon.Boolean have among others Def nodes that can be simplified as:

Staging compiler flow

The full work flow of program staging through Argon is as follows: The meta-program is first compiled by scalac as an “executable meta-program”. When this executable is run, it starts meta-expansion and as a result, constructs an initial IR. That initial IR goes through the different transformers which correspond to the passes of the staging compiler. Once the IR is sufficiently refined by having been through all the passes, it is codegen in the target language.

Simulation in Spatial

Synthesizing takes time, many days in some instances. It is beneficial for users to have access to an early proof of correctness of the program’s logic. This justifies the existence of a simulation mode. Before the development of the interpreter, the simulation mode was a codegen whose target was a Scala program of the simulated circuit logic. The resulting Scala program is self-contained and reproduces the execution of the hardware design, but only to some extent. To mirror exactly the execution of the design, it is required to write a cycle accurate simulator. It is possible but not simple, especially writing it in a codegen form. Furthermore, a cycle-accurate simulator already exists: Synopsys Verilog Compiler Simulator (VCS). However, VCS takes Verilog as input. Hence, it cannot leverage the richer information from the DSL and the debugging cannot be enhanced with Spatial annotations (for instance with SourceContext). Finally, writing a compiler is more complex than writing an equivalent interpreter.

Benefits of the interpreter

graph TD subgraph Scala Compile Time: 1st stage prog(Meta-Program) scalac end subgraph Scala Runtime: 2nd stage exec(Executable Meta-Program) expanded(expanded DSL nodes) OIR(raw IR) subgraph passes transformer IR(IR) end interpreter end prog-->scalac scalac-->exec exec-->|meta -expansion|expanded expanded-->|staging|OIR OIR-->transformer IR-->transformer transformer-->IR IR-->interpreter

Building an interpreter for Spatial was a requirement of having a Spatial integration in scala-flow. Furthermore, it is a requirement to integrate a Spatial simulator into any external library. It also benefits the Spatial ecosystem as a whole. Indeed, an interpreter encourages the user to have more interactions with the language and working in increasing complexity iterations thanks to fast feedback since the work flow involves less steps, is faster to launch and is more tightly integrated with Spatial (the interpreter has access to SourceContext, among others). The interpreter is not yet cycle-accurate, but this is planned as future work.

Interpreter

The interpreter is implemented as an alternative to codegen. The largest benefit of this approach is that the interpreter sees an IR that has already been processed and can mirror closely the codegen and the intended evaluation of the generated code. Moreover, if one of the passes fails or throw an error, then running the interpreter will also halt at that error.

Screenshot of the interpreter in action

Screenshot of the interpreter in action

Usage

Any Spatial application can be run using the CLI flag --interpreter. If used in combination with the flag -v (for “verbose”), each instruction interpretation will display the full state of the interpreter. If used without any verbosity flag, then only the name and number of the instruction is displayed at each step. Finally, if the flag -q (for “quiet”) is used, then nothing is displayed during the interpreter execution. At all verbosity levels, the state of the interpreter includes the result in the output bus, if any, is displayed after the last instruction has been interpreted.

Debugging nodes

The Argon DSL has been extended with the static methods breakpoint() and exit(). The method breakpoint() pauses the interpreter and displays its internal state. A key must be pressed to resume the interpreter evaluation.

The method exit() stops the evaluation of the interpreter.

Interpreter stream

In addition to standard applications being able to run as-is, applications that rely on streams have been given some specific attention in order to ease their usage with the interpreter. Indeed, being able to run a Spatial application in the same runtime as the application’s compilation gives the means to do in-memory transfer between the meta-program (or the larger surrounding program), to the object program to the input streams. This is made easy by using the following pattern:

An example is provided below:

trait StreamInOutAdd extends SpatialStream {

  @virtualize def spatial() = {
    val in  = StreamIn[Int](In1)
    val out = StreamOut[Int](Out1)    
    Accel(*) {
      out := in + 4
      breakpoint
    }
  }
}


object StreamInOutAddInterpreter extends StreamInOutAdd
	with SpatialStreamInterpreter {

  val outs = List(Out1)

  val inputs = collection.immutable.Map[Bus, List[MetaAny[_]]](
    (In1 -> List[Int](1, 2, 3, 6))
  )

}

object StreamInOutAddCompiler extends StreamInOutAdd
	with SpatialStreamCompiler

In-memory transfer was essential in integrating the spatial interpreter with scala-flow.

Implementation

The IR given to the interpreter follows a static single assignment (SSA) form. The interpreter is implemented as a traversal which, after linearization by scheduling, visits a sequence of pair of Sym and Def. The interpreter core is a central memory that contains the values of all symbols, and an auxiliary memory that contains the values of the temporary bounds. The pair of Sym and Def is processed by evaluating the Def node through modular extensions of the interpreter that mirror the modular partitioning of the IR itself. Once evaluated, the result is stored in the central memory with index the Sym. An eval function is used as an auxiliary method for the evaluation of the nodes. It takes as argument an Exp and can be simplified as:

Here is a very simplified example

val a: argon.Int = ...
val b = 2 + a
b * 4 * b

//becomes roughly after staging and linearization
Seq(
	(x1, Add(Const(2), a)),
	(x2, Mult(x1, Const(4))),
	(x3, Mult(x2, x1))
)

//let's assume the central memory starts with:
a <- 1
//the interpreter will evaluates the seq as
x1 <- eval(2) + eval(a) = 2 + 1 = 3
x2 <- eval(x1) * eval(4) = 3 * 4 = 12
x3 <- eval(x2) * eval(x1) = 12 * 3 = 46

Blocks and control flows handling rely on node-defined traversals of their inner body. Loops with various parallelizing factors are handled using an interpreter-specific scheduler.

The currently implemented modules of the Spatial IR for the interpreter are:

Conclusion

The addition of an interpreter to Argon and Spatial improves the whole ecosystem and offer new possibilities. Maintenance and extension of the simulator will be easier to write in an interpreter form, especially if a cycle-accurate simulator is developed. It is hoped that the interpreter will prove itself useful in the workflow of all app developers and become a core element of Spatial.

Part III

Continue here to read the section about the hardware implementation of the RBPF in Spatial (IV/IV)

[1] D. Koeplinger, C. Delimitrou, R. Prabhakar, C. Kozyrakis, Y. Zhang, and K. Olukotun, “Automatic generation of efficient accelerators for reconfigurable hardware,” in Proceedings of the 43rd International Symposium on Computer Architecture, 2016, pp. 115–127.

Licence Creative Commons