Remote FPGA call
FPGA core in CλaSH
Let's make matrix multiplication into an FPGA core using the language CλaSH. Clash is essentially the Haskell programming language, with the GHC compiler modified to generate Verilog, VHDL or SystemVerilog source that synthesis tools (Xilinx Vivado, for example) can turn into a digital circuit. Both Clash and GHC are free and open source.
Multiplication of two 3x3 matrices using 16-bit arithmetic can be defined as:
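import Clash.Prelude

type M3x3 = Vec 3 (Vec 3 (Signed 16))

matrixMultiply3x3 :: M3x3 -> M3x3 -> M3x3
matrixMultiply3x3 a b = fmap (mulLine (traverse id b)) a
  where
    -- traverse id b transposes b, so cols holds the columns of b;
    -- each result row is the dot product of one row of a with every column
    mulLine cols row = fmap (fold (+) . zipWith (*) row) cols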
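The Clash version needs the Clash toolchain, so below is a minimal plain-GHC model of the same computation that can be loaded into GHCi and checked before synthesis. It is a sketch under one assumption: lists of Int16 stand in for Vec 3 (Vec 3 (Signed 16)), which means the model loses the compile-time 3x3 size guarantee that Vec provides.

```haskell
import Data.Int (Int16)
import Data.List (transpose)

-- Plain-GHC model of the Clash core: lists stand in for Vec 3 (Vec 3 (Signed 16)).
-- Int16, like Clash's Signed 16, wraps around on overflow.
type M3x3 = [[Int16]]

matrixMultiply3x3 :: M3x3 -> M3x3 -> M3x3
matrixMultiply3x3 a b = map mulLine a
  where
    cols = transpose b                               -- columns of b
    mulLine row = map (sum . zipWith (*) row) cols   -- one row of the result
```

For example, multiplying [[1,2,3],[4,5,6],[7,8,9]] by itself gives [[30,36,42],[66,81,96],[102,126,150]], while 200 * 200 does not fit in 16 bits and wraps to -25536, which is worth keeping in mind when choosing the word width of the core.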
Great: an FPGA core! Instead of instructions executed one after another by a conventional processor, we have a digital circuit that computes the whole product as combinational logic. The time to completion depends only on how long signals take to propagate through the logic gates.
To test and use this core, we need to get data in and out of the FPGA. The core has 2 * 3 * 3 * 16 = 288 input bits and 3 * 3 * 16 = 144 output bits. Wiring all of them directly to the pins of the chip is not an option, so a bus infrastructure for moving data in and out of the FPGA is required.
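Assuming the bus moves data in 32-bit words, the bit counts above translate into bus words as follows; the 144-bit result does not fill a whole number of words, so the last word is only half used.

```latex
\begin{aligned}
\text{input}  &= 2 \times (3 \times 3) \times 16 = 288\ \text{bits} = 9 \times 32\ \text{bits},\\
\text{output} &= (3 \times 3) \times 16 = 144\ \text{bits}, \qquad \lceil 144 / 32 \rceil = 5\ \text{words}.
\end{aligned}
```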
RedPitaya already uses such a bus. I am using RedPitaya because it is the FPGA board I have, and it comes with the sources for its FPGA design. It is built around a Xilinx Zynq SoC, which combines an FPGA and an ARM Cortex-A9. The ARM runs Debian with 512 MB of RAM, and the FPGA has approximately 28k logic cells and 18k LUTs.
Lambdaya-bus library
The lambdaya-bus library provides both a bus core for the FPGA and client code for communicating with that core from an application.
On the FPGA side, we need a function that takes a core as an argument and creates a bus. We aim for something with this signature:
core2bus :: (Signal a -> Signal b) -> Signal BusIn -> Signal BusOut
That is, a function that consumes a core and provides a bus interface.
We are close to this, except that, ahem, we have to deal with metaprogramming. For this to work, the compiler must know how large the types a and b are and how to divide them into 32-bit chunks.
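To make the chunking concrete, here is a minimal sketch of serializing 16-bit matrix elements into 32-bit bus words and back. The layout (two values per word, first value in the low half, zero padding in an odd last word) is an assumption for illustration; it is not necessarily the layout that lambdaya-bus derives, and packWords/unpackWords are hypothetical names.

```haskell
import Data.Bits (shiftL, shiftR, (.&.), (.|.))
import Data.Int (Int16)
import Data.Word (Word16, Word32)

-- Hypothetical layout: two 16-bit values per 32-bit word, first value in the low half;
-- an odd count leaves the upper half of the last word as zero padding.
packWords :: [Int16] -> [Word32]
packWords [] = []
packWords [x] = [toWord x]
packWords (x : y : rest) = (toWord x .|. (toWord y `shiftL` 16)) : packWords rest

-- Reinterpret the 16 bits of an Int16 as unsigned, zero-extended to 32 bits.
toWord :: Int16 -> Word32
toWord v = fromIntegral (fromIntegral v :: Word16)

-- Inverse of packWords: recover the first n 16-bit values.
unpackWords :: Int -> [Word32] -> [Int16]
unpackWords n ws = take n (concatMap halves ws)
  where
    halves w = [fromHalf w, fromHalf (w `shiftR` 16)]
    fromHalf w = fromIntegral (fromIntegral (w .&. 0xFFFF) :: Word16)
```

With this layout the 18 input elements of two matrices pack into 9 words, and the 9 result elements need 5 words, the last one half empty.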
simpleBus :: Signal BusIn -> Signal BusOut
simpleBus = busBuild $(bTQ matrixMul3x3sig) matrixMul3x3sig
This is how, at the current state of the library, one defines the transformation from core to bus. $(bTQ matrixMul3x3sig) is a Template Haskell splice that works like a macro, helping to deduce the proper types for serialization.
It is similar on the client side. From the example:
multiply :: (M3x3,M3x3) -> NetworkFpgaSetGet M3x3
multiply = callCore $(bTQ matrixMul3x3sig) 5 0 5 0
The magic numbers 5 0 5 0 are the page and starting offset for writing to and reading from the bus. A short kick-start tutorial is available on GitHub together with the example presented here.
The same code can also run directly on the ARM, without the networking overhead. In this case, the FpgaSetGet typeclass instance writes directly to the bus.
Porting this library to another FPGA requires implementing FpgaSetGet on the client side. The class has 4 functions, but only 2 (one set and one get) are required. On the FPGA side, a bus of type Signal BusIn -> Signal BusOut is needed.
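The "4 functions, only 2 required" design is a common Haskell pattern: the extra methods get default definitions written in terms of the required ones. The sketch below illustrates that pattern with a hypothetical class BusAccess (setWord, getWord, setWords, getWords are invented names, not the actual FpgaSetGet methods), byte addressing with a 4-byte stride as an assumption, and an in-memory MockBus standing in for a real bus.

```haskell
import Data.IORef (IORef, modifyIORef, newIORef, readIORef)
import Data.Maybe (fromMaybe)
import Data.Word (Word32)

-- Hypothetical stand-in for a bus-access class; NOT the real FpgaSetGet API.
class Monad m => BusAccess m where
  {-# MINIMAL setWord, getWord #-}
  setWord :: Word32 -> Word32 -> m ()   -- required: write one word at an address
  getWord :: Word32 -> m Word32         -- required: read one word from an address
  -- Optional block transfers, defaulting to word-by-word loops (assumed 4-byte stride).
  setWords :: Word32 -> [Word32] -> m ()
  setWords base ws = sequence_ [setWord (base + 4 * i) w | (i, w) <- zip [0 ..] ws]
  getWords :: Word32 -> Int -> m [Word32]
  getWords base n = mapM (\i -> getWord (base + 4 * fromIntegral i)) [0 .. n - 1]

-- In-memory mock backend: (address, value) pairs, newest write first.
newtype MockBus a = MockBus { runMockBus :: IORef [(Word32, Word32)] -> IO a }

instance Functor MockBus where
  fmap f (MockBus g) = MockBus (fmap f . g)

instance Applicative MockBus where
  pure x = MockBus (\_ -> pure x)
  MockBus f <*> MockBus x = MockBus (\r -> f r <*> x r)

instance Monad MockBus where
  MockBus m >>= k = MockBus (\r -> m r >>= \a -> runMockBus (k a) r)

-- Only the two required methods are implemented; the block transfers come for free.
instance BusAccess MockBus where
  setWord addr w = MockBus (\r -> modifyIORef r ((addr, w) :))
  getWord addr = MockBus (\r -> fromMaybe 0 . lookup addr <$> readIORef r)

-- Run a MockBus action against a fresh, empty register space.
runFresh :: MockBus a -> IO a
runFresh action = newIORef [] >>= runMockBus action
```

For instance, runFresh (setWords 0x100 [1, 2, 3] >> getWords 0x100 3) returns [1, 2, 3] even though MockBus only defines single-word access. A port to new hardware would, under this pattern, replace MockBus with a type whose two required methods touch the real bus.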
FPGA bus
The Verilog implementation uses extra bits as signals that the user must explicitly check to find out whether data is available, and similarly when reading: when there is a read request on the bus, the user must explicitly set a bit (low or high, depending on the documentation) to indicate that data has been written to the bus. Haskell's rich type system can express the same thing at the same hardware cost (Clash represents a Maybe value as a one-bit constructor tag next to the payload, which is essentially the Verilog valid bit), with more expressiveness and fewer ways to get it wrong.
To tell whether data is available, use the type Maybe d. A value of type Maybe d (for example, data) is either valid, such as Just 5, or not available, in which case it is Nothing.
The input side of the simple bus we are implementing in Clash is then:
type BusIn = Maybe (FullAddress,ReadWrite,FullDataIn)
Compared to Verilog, a similar interface is provided, except that the information about data availability is encoded in the type. A value can be Nothing or, for example, Just (0x40500000,Write,0x10).
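This makes reasoning about and checking whether data is available simpler and less error prone. We cannot silently skip the availability check: code that uses the address, command or data without first unwrapping the Maybe (FullAddress,ReadWrite,FullDataIn) will not compile, and GHC warns about a forgotten Nothing case when incomplete-pattern warnings are enabled (they are part of -Wall). This approach eliminates errors such as mixing up the polarity of a valid bit or forgetting to check it at all.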
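As a minimal illustration of that check, here is a stateless responder written against the BusIn and BusOut shapes from the text. The concrete types (Word32 for addresses and data, a two-constructor ReadWrite) and the read-only ID register at 0x40500000 holding 0xC1A5 are assumptions made up for this sketch, not lambdaya-bus definitions.

```haskell
{-# OPTIONS_GHC -Wincomplete-patterns #-}
import Data.Word (Word32)

-- Assumed concrete types; only the BusIn/BusOut shapes come from the text.
type FullAddress = Word32
type FullDataIn = Word32
type FullDataOut = Word32
data ReadWrite = Read | Write deriving (Show, Eq)

type BusIn = Maybe (FullAddress, ReadWrite, FullDataIn)
type BusOut = Maybe FullDataOut

-- Hypothetical read-only ID register.
idAddress :: FullAddress
idAddress = 0x40500000

idValue :: FullDataOut
idValue = 0xC1A5

-- The address and command are reachable only by matching Just; deleting the
-- Nothing clause makes GHC warn about a non-exhaustive match.
idResponder :: BusIn -> BusOut
idResponder Nothing = Nothing
idResponder (Just (addr, Read, _))
  | addr == idAddress = Just idValue
idResponder (Just _) = Nothing
```

Writing idResponder (addr, Read, _) = ... directly, as if the transaction were always present, is a type error, which is exactly the kind of mistake a bare valid bit would let through.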
It is similar when writing data to the bus:
type BusOut = Maybe FullDataOut
That is, data from the core is either valid, for example Just 42, or not valid, in which case it is Nothing. Working with such a type expresses intent much more clearly.
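To see BusIn and BusOut working together over time, here is a sketch of a small register-file slave simulated cycle by cycle. The step function has the s -> i -> (s, o) shape that Clash's mealy takes, and a list of samples stands in for a simulated Signal. The concrete types, the register file and the names busStep/simulateBus are assumptions for illustration; this model also answers a read in the same cycle, whereas a design with registered outputs would typically answer a cycle later.

```haskell
import Data.List (mapAccumL)
import Data.Maybe (fromMaybe)
import Data.Word (Word32)

-- Assumed concrete types; only the BusIn/BusOut shapes come from the text.
type FullAddress = Word32
type FullDataIn = Word32
type FullDataOut = Word32
data ReadWrite = Read | Write deriving (Show, Eq)

type BusIn = Maybe (FullAddress, ReadWrite, FullDataIn)
type BusOut = Maybe FullDataOut

-- Register contents, newest write first so lookup sees the latest value.
type Regs = [(FullAddress, FullDataIn)]

-- One clock cycle: idle cycles and writes produce Nothing, reads produce Just.
-- Reading an address that was never written returns 0.
busStep :: Regs -> BusIn -> (Regs, BusOut)
busStep regs Nothing = (regs, Nothing)
busStep regs (Just (addr, Write, d)) = ((addr, d) : regs, Nothing)
busStep regs (Just (addr, Read, _)) = (regs, Just (fromMaybe 0 (lookup addr regs)))

-- Run busStep over a list of input samples, starting from empty registers.
simulateBus :: [BusIn] -> [BusOut]
simulateBus = snd . mapAccumL busStep []
```

For example, simulateBus [Nothing, Just (0x40500000, Write, 0x10), Nothing, Just (0x40500000, Read, 0)] yields [Nothing, Nothing, Nothing, Just 0x10]: the output is Just only on the cycle where there is valid data to return.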
Conclusion
The main pitch for Clash is that it enables more ambitious projects. The abstractions Haskell provides might seem scary at first, but one gets a feeling for how cores are generated, just as an experienced C++ programmer gets a grasp of what machine code to expect. Most, if not all, Haskell abstractions come for free in Clash and allow describing both intent and circuitry, not just circuitry.
Since Haskell is one of the few languages that can effectively target both the FPGA (through Clash) and the PC, we can nicely mix FPGA and client-side code, and I hope further improvements to the lambdaya-bus library will simplify this approach even more. At this stage, lambdaya-bus version 0.0.0.2 is just a proof of concept: something that can be demonstrated to work.