FPGA core in CλaSH

Let's build matrix multiplication as an FPGA core in the CλaSH language. Clash is essentially Haskell: it reuses the GHC compiler front end to generate VHDL, Verilog or SystemVerilog source that vendor tools (Xilinx Vivado, for example) can synthesize into a digital circuit. Both Clash and GHC are free and open source.

Multiplication of two 3x3 matrices using 16-bit signed arithmetic can be defined as:

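-- A 3x3 matrix of 16-bit signed numbers
type M3x3 = Vec 3 (Vec 3 (Signed 16))

matrixMultiply3x3 :: M3x3 -> M3x3 -> M3x3
matrixMultiply3x3 a b = fmap (mulLine (transpose b)) a
    where
        -- Dot product of one row of a with every column of b
        mulLine cols row = fmap (fold (+) . zipWith (*) row) cols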
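To sanity-check the definition without an FPGA, or even without Clash installed, the same function can be modelled in plain Haskell. This is a sketch under two assumptions: lists stand in for Vec 3, and Int16 from Data.Int stands in for Signed 16. Both wrap around on overflow, so the results should match what the core computes. identity3 is a helper added only for testing.

```haskell
import Data.Int (Int16)
import Data.List (transpose)

-- Plain-Haskell model of the Clash core: lists stand in for Vec 3,
-- and Int16 stands in for Signed 16 (both wrap around on overflow).
type M3x3 = [[Int16]]

matrixMultiply3x3 :: M3x3 -> M3x3 -> M3x3
matrixMultiply3x3 a b = fmap (mulLine (transpose b)) a
  where
    -- Dot product of one row of a with every column of b.
    mulLine cols row = fmap (sum . zipWith (*) row) cols

-- Test helper: the 3x3 identity matrix.
identity3 :: M3x3
identity3 = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
```

Multiplying any matrix by identity3 returns it unchanged, and an entry such as 300 * 300 wraps to 24464 instead of 90000, which is worth keeping in mind when choosing 16-bit arithmetic.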

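Great, an FPGA core! Instead of a conventional processor executing instructions, we have a digital circuit that performs the whole multiplication at once. The definition contains no registers, so it becomes purely combinational logic (27 multiplications and 18 additions), and the time to completion depends only on how long signals take to propagate through the logic gates.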

To test and use this core we need to get data in and out of the FPGA. The core has 2 * 3 * 3 * 16 = 288 input bits and 3 * 3 * 16 = 144 output bits. Wiring all of them directly to the chip's pins is not an option, so we need bus infrastructure to move data in and out of the FPGA.
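The bit counts, and the number of 32-bit bus words they would occupy, are easy to double-check. A minimal sketch, assuming a 32-bit bus word with the last word zero-padded; busWordBits, matrixBits and wordsNeeded are illustrative names:

```haskell
-- Assumed width of one bus word.
busWordBits :: Int
busWordBits = 32

-- Number of bits in a rows-by-cols matrix of width-bit elements.
matrixBits :: Int -> Int -> Int -> Int
matrixBits rows cols width = rows * cols * width

-- Round a bit count up to whole bus words (last word padded).
wordsNeeded :: Int -> Int
wordsNeeded bits = (bits + busWordBits - 1) `div` busWordBits

inputBits, outputBits :: Int
inputBits  = 2 * matrixBits 3 3 16 -- two operand matrices
outputBits = matrixBits 3 3 16     -- one result matrix
```

The two operands fit exactly into 9 words, while the 144-bit result needs 4.5 words and therefore occupies 5.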

RedPitaya already uses such a bus. I am using RedPitaya because it is the FPGA board I have, and it comes with the FPGA sources. It is built around a Xilinx Zynq SoC that combines an FPGA with an ARM Cortex-A9. The ARM side runs Debian with 512 MB of RAM, and the FPGA has approximately 28k logic cells and 18k LUTs.

Lambdaya-bus library

The lambdaya-bus library provides both a bus core for the FPGA and client code for communicating with that core from an application.

On the FPGA side we need a function that takes a core as its argument and builds a bus around it. We aim for something with a signature like this:

core2bus :: (Signal a -> Signal b) -> Signal BusIn -> Signal BusOut

That is, a function that consumes a core and provides a bus interface. We are close to this, except, ahem, that we have to deal with metaprogramming. For this to work, the compiler must know how large types a and b are and how to divide them into 32-bit chunks.

simpleBus :: Signal BusIn -> Signal BusOut
simpleBus = busBuild $(bTQ matrixMul3x3sig) matrixMul3x3sig

This is how, in the library's current state, one defines the transformation from core to bus. $(bTQ matrixMul3x3sig) is a Template Haskell splice that works like a macro, helping to deduce the proper types for serialization.
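To make the 32-bit chunking concrete, here is a plain-Haskell sketch of one possible way to serialize Signed 16 values (modelled as Int16) into bus words. The packing order (first value in the low half, odd counts padded with zero) and the names packWords and unpackWords are hypothetical; this is not necessarily the layout that bTQ derives.

```haskell
import Data.Bits (shiftL, shiftR, (.&.), (.|.))
import Data.Int (Int16)
import Data.Word (Word16, Word32)

-- Hypothetical serializer: packs 16-bit values two per 32-bit bus word,
-- first value in the low half, padding an odd count with a zero half.
packWords :: [Int16] -> [Word32]
packWords (x : y : rest) = (lo x .|. (lo y `shiftL` 16)) : packWords rest
packWords [x]            = [lo x]
packWords []             = []

-- Reinterpret a signed 16-bit value as raw bits, widened to 32 bits.
lo :: Int16 -> Word32
lo v = fromIntegral (fromIntegral v :: Word16)

-- Inverse of packWords; n is the number of 16-bit values to recover.
unpackWords :: Int -> [Word32] -> [Int16]
unpackWords n ws = take n (concatMap halves ws)
  where
    halves w = [toI16 (w .&. 0xFFFF), toI16 (w `shiftR` 16)]
    toI16 h = fromIntegral (fromIntegral h :: Word16)
```

With this layout the 18 operand values of the matrix core pack into 9 words and the 9 result values into 5, with the upper half of the last result word unused.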

The client side is similar. From the example:

multiply :: (M3x3,M3x3) -> NetworkFpgaSetGet M3x3
multiply = callCore $(bTQ matrixMul3x3sig) 5 0 5 0

The magic numbers 5 0 5 0 are the pages and starting offsets for writing to and reading from the bus. A short kick-start tutorial is available on GitHub together with the example presented here.
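How a page and offset become a bus address is not spelled out here, so the following is only a guess that happens to reproduce the example address 0x40500000 used in the bus section for page 5, offset 0: a fixed base address, a 1 MiB region per page, and offsets counted in 32-bit words of 4 bytes. baseAddress, pageSize, wordAddress and wordAddresses are hypothetical names; the lambdaya-bus sources are the authority on the real scheme.

```haskell
import Data.Word (Word32)

-- Hypothetical address scheme: base + page * 1 MiB + offset * 4 bytes.
baseAddress, pageSize :: Word32
baseAddress = 0x40000000
pageSize    = 0x00100000

-- Byte address of the word at (page, offset), under the assumptions above.
wordAddress :: Word32 -> Word32 -> Word32
wordAddress page offset = baseAddress + page * pageSize + offset * 4

-- Addresses of n consecutive words starting at (page, offset).
wordAddresses :: Word32 -> Word32 -> Int -> [Word32]
wordAddresses page offset n =
  [wordAddress page (offset + fromIntegral i) | i <- [0 .. n - 1]]
```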

The same code can also run directly on the ARM, without networking overhead. In that case the FpgaSetGet typeclass writes directly to the bus.

Porting this library to another FPGA requires implementing FpgaSetGet on the client side. The class has 4 functions, but only 2 of them (one set and one get) are required. On the FPGA side, a bus of type Signal BusIn -> Signal BusOut is needed.
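The "four functions, two required" structure maps naturally onto Haskell default methods. Below is a hypothetical sketch of that pattern: BusAccess, setWord, getWord, setWords, getWords and MockBus are invented names, not the actual FpgaSetGet API, and stepping addresses by 4 bytes per word is an assumption. A port only supplies the single-word set and get; the multi-word versions come for free.

```haskell
import Control.Monad (zipWithM_)
import Data.IORef (IORef, modifyIORef, newIORef, readIORef)
import Data.Maybe (fromMaybe)
import Data.Word (Word32)

-- Hypothetical class in the spirit of FpgaSetGet: only the single-word
-- set/get must be implemented; the multi-word versions have defaults.
class BusAccess h where
  setWord :: h -> Word32 -> Word32 -> IO ()
  getWord :: h -> Word32 -> IO Word32

  -- Write consecutive words, assuming 4-byte address steps.
  setWords :: h -> Word32 -> [Word32] -> IO ()
  setWords h addr ws = zipWithM_ (setWord h) [addr, addr + 4 ..] ws

  -- Read n consecutive words, assuming 4-byte address steps.
  getWords :: h -> Word32 -> Int -> IO [Word32]
  getWords h addr n = mapM (getWord h) (take n [addr, addr + 4 ..])
  {-# MINIMAL setWord, getWord #-}

-- In-memory mock bus for testing without hardware: newest write first;
-- addresses never written read back as 0.
newtype MockBus = MockBus (IORef [(Word32, Word32)])

instance BusAccess MockBus where
  setWord (MockBus ref) addr v = modifyIORef ref ((addr, v) :)
  getWord (MockBus ref) addr = fromMaybe 0 . lookup addr <$> readIORef ref

newMockBus :: IO MockBus
newMockBus = MockBus <$> newIORef []
```

A real port would implement setWord and getWord over memory-mapped registers or a network connection, and everything built on the multi-word calls would keep working unchanged.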

FPGA bus

The Verilog implementation uses extra bits as signals that the user must check explicitly to find out whether data is available, and similarly for reads: when there is a read request on the bus, the user must explicitly set a bit (low or high, depending on the documentation) to indicate that data has been placed on the bus. Haskell lets us express this in its rich type system with the same overhead, more expressiveness and fewer opportunities to get it wrong.

To tell whether data is available, we use the type Maybe d. A value of this type is either Just x when the data is valid (for example Just 5), or Nothing when no data is available.
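Where Maybe meets a Verilog-style interface, the valid-bit polarity only has to be decided once, at the boundary. A minimal sketch with hypothetical names (Polarity, fromValidBit, toValidBit); note that when converting back, the data wire must still carry some value, so the caller supplies a filler for the Nothing case. In hardware, Clash lays out a Maybe d as a constructor bit next to the bits of d, which is why the overhead matches an explicit valid bit.

```haskell
-- Hypothetical boundary helpers: convert a Verilog-style (valid bit, data)
-- pair into Maybe and back, deciding the valid-bit polarity in one place.
data Polarity = ActiveHigh | ActiveLow
  deriving (Eq, Show)

-- Signal level that means "data is valid" for a given polarity.
assertedLevel :: Polarity -> Bool
assertedLevel ActiveHigh = True
assertedLevel ActiveLow  = False

fromValidBit :: Polarity -> Bool -> d -> Maybe d
fromValidBit pol bit d
  | bit == assertedLevel pol = Just d
  | otherwise                = Nothing

-- The filler value is driven on the data wire when there is no valid data.
toValidBit :: Polarity -> d -> Maybe d -> (Bool, d)
toValidBit pol _ (Just d)      = (assertedLevel pol, d)
toValidBit pol filler Nothing  = (not (assertedLevel pol), filler)
```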

The simple bus we are implementing in Clash then takes inputs of type:

type BusIn = Maybe (FullAddress,ReadWrite,FullDataIn)

This provides an interface similar to the Verilog one, except that information about data availability is encoded in the type. A value is either Nothing or, for example, Just (0x40500000,Write,0x10). This makes reasoning about and checking whether data is available simpler and less error prone. We cannot skip the availability check, because the address and data inside Maybe (FullAddress,ReadWrite,FullDataIn) cannot be used without unwrapping the Maybe first; code that tries will not compile. This approach eliminates a whole class of errors, such as mixing up polarity or forgetting the check altogether.

It is similar when writing data to the bus:

type BusOut = Maybe FullDataOut

That is, data from the core is either valid, for example Just 42, or not valid, in which case it is Nothing. Working with such a type expresses intent much more clearly.
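To see how the two types work together, here is a toy bus slave written as a step function of shape state -> input -> (state, output), the shape Clash's mealy turns into a Signal BusIn -> Signal BusOut function. So that the sketch runs with plain GHC, mapAccumL over a list of cycles plays the role of mealy. The 32-bit widths, the Read constructor, the zero default for unwritten addresses and the answer-in-the-same-cycle behaviour are assumptions, and the association-list register file is for simulation only, not synthesis.

```haskell
import Data.List (mapAccumL)
import Data.Maybe (fromMaybe)
import Data.Word (Word32)

-- Plain-Haskell stand-ins for the bus types; widths are assumed.
type FullAddress = Word32
type FullDataIn  = Word32
type FullDataOut = Word32
data ReadWrite = Read | Write deriving (Eq, Show)

type BusIn  = Maybe (FullAddress, ReadWrite, FullDataIn)
type BusOut = Maybe FullDataOut

-- Simulation-only register file: newest write first.
type Regs = [(FullAddress, Word32)]

-- One clock cycle of a toy bus slave. Pattern matching on the Maybe
-- forces the "no request" case to be handled explicitly.
busStep :: Regs -> BusIn -> (Regs, BusOut)
busStep regs Nothing                 = (regs, Nothing)
busStep regs (Just (addr, Write, d)) = ((addr, d) : regs, Nothing)
busStep regs (Just (addr, Read, _))  = (regs, Just (fromMaybe 0 (lookup addr regs)))

-- Run a sequence of cycles, standing in for mealy busStep [].
simulate :: [BusIn] -> [BusOut]
simulate = snd . mapAccumL busStep []
```

Writing 0x10 to 0x40500000 and reading it back two cycles later yields Just 0x10 on the output, while idle cycles and writes yield Nothing.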

Conclusion

The main pitch for Clash is that it enables more ambitious projects. The abstractions Haskell offers might seem scary at first, but one develops a feeling for how cores are generated, just as an experienced C++ programmer gets a grasp of what machine code to expect. Most, if not all, Haskell abstractions come for free in Clash and allow describing both intent and circuitry, not just circuitry.

Since Haskell is one of the few languages that can run effectively both on an FPGA and on a PC, we can nicely mix the FPGA and client sides, and I hope further improvements to the lambdaya-bus library will simplify this approach even more. At version 0.0.0.2, lambdaya-bus is just a proof of concept that can be demonstrated to work.