My first: LLVM program

I recently started reading through Stephen Diehl’s ‘Implementing a JIT Compiled Language with Haskell and LLVM’. I thought it might be fun to write a quick experience report from my getting started with LLVM and remembering how to build and link stuff manually.

Setup

I’m using Ubuntu under WSL on Windows 10. This works nicely with VS Code’s remoting system - I can use the Linux terminal directly inside VS Code on my Windows machine.

VSCode running LLVM inside WSL

(Side note: I forked the Kaleidoscope repository to get it to compile under Stackage LTS 14.27. In any case, you won’t need any Haskell code to follow this particular article!)

To get started, let’s install LLVM 9 and accoutrements.

sudo apt install llvm-9 llvm-9-dev clang-9

A trivial LLVM program

Let’s write a simple program that prints the character ? to stdout. To do this, we can use the putchar function provided by libc.

program.ll

declare i32 @putchar(i32)

define void @main() { 
    call i32 @putchar(i32 63) 
    ret void
}

We’ve just written our first program in the LLVM IR!

Hopefully it’s fairly clear what’s happening here.

  • First we declare that we have access to some function putchar that takes a 32-bit integer and returns another 32-bit integer.
  • Second we define a new function main that calls putchar with the argument 63 (the decimal ASCII code for ?).

Running the LLVM code

LLVM comes with a handy interpeter lli which lets us run our program directly. This is useful for testing that it works as expected without having to compile it.

$ lli-9 program.ll
?

Looks like our program is working! Now let’s see how we can compile it down to a proper exectuble.

(Note: on my machine, it also prints an error could not mmap JIT marker. It’s not clear exactly why.)

IR formats: text and bitcode

There are in fact two formats of LLVM IR. What we’ve written above uses the text representation (extension .ll). There is another binary representation (extension .bc) known as ‘bitcode’.

Let’s see quickly how we can convert between the two formats. First, we can use llvm-as turn text into bitcode:

$ ls
program.ll

$ llvm-as-9 program.ll

$ ls
program.bc program.ll

And to go the other way, from bitcode to text, we use llvm-dis:

$ ls
program.bc

$ llvm-dis-9 program.bc

$ ls
program.bc program.ll

You might have noticed that these programs are called the ‘LLVM assembler’ and ‘LLVM disassembler’. This is a bit of a misnomer, since there is no actual native assembly involved at any point - only different formats of the LLVM IR.

Converting IR to assembly

Next, we might want to convert our program from (platform-independent) LLVM IR to (native) assembly. For this, we can use llc.

By default, llc compiles to textual assembly code (.s). For example:

$ llc program.ll

$ cat program.s
        .text
        .file   "program.ll"
        .globl  main                    # -- Begin function main
        .p2align        4, 0x90
        .type   main,@function
main:                                   # @main
        .cfi_startproc
# %bb.0:
        pushq   %rax
        .cfi_def_cfa_offset 16
        movl    $63, %edi
        callq   putchar
        popq    %rax
        .cfi_def_cfa_offset 8
        retq
.Lfunc_end0:
        .size   main, .Lfunc_end0-main
        .cfi_endproc
                                        # -- End function

        .section        ".note.GNU-stack","",@progbits

Object files

llc can also be used to output object files (.o). These contain essentially the same content as the textual assembly, but in binary format. In fact, the difference between .s and .o files is essentially the same as the difference between .ll and .bc files.

To do this, use llc --filetype=obj <filename>:

$ llc-9 --filetype=obj program.ll

$ ls
program.ll program.o

Target triples

Of course, there are many different platforms with many different kinds of assembly. llc automatically determines which to use. To see which platform is being targeted by default, run llc --version.

$ llc-9 --version
LLVM (http://llvm.org/):
  LLVM version 9.0.0
  
  Optimized build.
  Default target: x86_64-pc-linux-gnu
  Host CPU: znver1

  Registered Targets:
    aarch64    - AArch64 (little endian)
    aarch64_32 - AArch64 (little endian ILP32)
    aarch64_be - AArch64 (big endian)
[...]

You can, however, also configure the target assembly by changing parts of the target triplet. You can override the whole thing at once using the -mtriple option, or set the architecture using the -mcpu/-march options. For example, let’s compile for 64-bit WebAssembly:

$ llc-9 --march wasm64 program.ll

$ cat program.s
	.text
	.file	"program.ll"
	.section	.text.main,"",@
	.globl	main                    # -- Begin function main
	.type	main,@function
main:                                   # @main
	.functype	main () -> ()
# %bb.0:
	i32.const	63
	i32.call	putchar
	drop
                                        # fallthrough-return-void
	end_function
.Lfunc_end0:
	.size	main, .Lfunc_end0-main
                                        # -- End function

	.functype	putchar (i32) -> (i32)

Linking the assembly

Though we have now compiled our LLVM IR to machine code in the form of object files (.o), these cannot be executed. They must be linked first to produce executables.

There are a few different ways we can do this.

Linking with GCC

We can use GCC to do the linking for us. Though GCC is often just seen as a ‘C compiler’, it’s rather more magical than that and can accept all kinds of weird inputs (including, among other things: Go (!), D and Ada).

In our case, we can actually give GCC the textual assembly:

$ gcc -o program program.s

or the binary object file:

$ gcc -o program program.o

In both cases, it will figure out how to do the right thing to produce an executable binary.

$ ./program
?

Linking with Clang

It’s not just GCC that we can use to do our linking. LLVM has a closely associated compiler, Clang, which provides very similar functionality. Lucky for us (in this case) it works exactly the same way as gcc.

$ clang-9 -o program program.s
[or]
$ clang-9 -o program program.o

$ ./program
?

Clang can also go one better. Thanks to its close link with the LLVM project, it can also consume LLVM IR directly:

$ clang-9 -o program program.ll

$ ./program
?

Manually linking

If you want to see what’s going on inside GCC during linking, you can pass the --verbose option (-v for short). Here’s what I get. The GCC linker is actually called ld, but you’ll notice that a program called collect2 is being called instead. It turns out that this is mostly a complicated alias for ld.

$ gcc --verbose -o program program.s | grep crt
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/7/lto-wrapper
OFFLOAD_TARGET_NAMES=nvptx-none
OFFLOAD_TARGET_DEFAULT=1
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Ubuntu 7.5.0-3ubuntu1~18.04' --with-bugurl=file:///usr/share/doc/gcc-7/README.Bugs --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++ --prefix=/usr --with-gcc-major-version-only --program-suffix=-7 --program-prefix=x86_64-linux-gnu- --enable-shared --enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --libdir=/usr/lib --enable-nls --enable-bootstrap --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new --enable-gnu-unique-object --disable-vtable-verify --enable-libmpx --enable-plugin --enable-default-pie --with-system-zlib --with-target-system-zlib --enable-objc-gc=auto --enable-multiarch --disable-werror --with-arch-32=i686 --with-abi=m64 --with-multilib-list=m32,m64,mx32 --enable-multilib --with-tune=generic --enable-offload-targets=nvptx-none --without-cuda-driver --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu
Thread model: posix
gcc version 7.5.0 (Ubuntu 7.5.0-3ubuntu1~18.04) 
COLLECT_GCC_OPTIONS='-v' '-o' 'program' '-mtune=generic' '-march=x86-64'
 as -v --64 -o /tmp/ccpdizFX.o program.s
GNU assembler version 2.30 (x86_64-linux-gnu) using BFD version (GNU Binutils for Ubuntu) 2.30
COMPILER_PATH=/usr/lib/gcc/x86_64-linux-gnu/7/:/usr/lib/gcc/x86_64-linux-gnu/7/:/usr/lib/gcc/x86_64-linux-gnu/:/usr/lib/gcc/x86_64-linux-gnu/7/:/usr/lib/gcc/x86_64-linux-gnu/
LIBRARY_PATH=/usr/lib/gcc/x86_64-linux-gnu/7/:/usr/lib/gcc/x86_64-linux-gnu/7/../../../x86_64-linux-gnu/:/usr/lib/gcc/x86_64-linux-gnu/7/../../../../lib/:/lib/x86_64-linux-gnu/:/lib/../lib/:/usr/lib/x86_64-linux-gnu/:/usr/lib/../lib/:/usr/lib/gcc/x86_64-linux-gnu/7/../../../:/lib/:/usr/lib/
COLLECT_GCC_OPTIONS='-v' '-o' 'program' '-mtune=generic' '-march=x86-64'
 /usr/lib/gcc/x86_64-linux-gnu/7/collect2 -plugin /usr/lib/gcc/x86_64-linux-gnu/7/liblto_plugin.so -plugin-opt=/usr/lib/gcc/x86_64-linux-gnu/7/lto-wrapper -plugin-opt=-fresolution=/tmp/ccBjbrLv.res -plugin-opt=-pass-through=-lgcc -plugin-opt=-pass-through=-lgcc_s -plugin-opt=-pass-through=-lc -plugin-opt=-pass-through=-lgcc -plugin-opt=-pass-through=-lgcc_s --build-id --eh-frame-hdr -m elf_x86_64 --hash-style=gnu --as-needed -dynamic-linker /lib64/ld-linux-x86-64.so.2 -pie -z now -z relro -o program /usr/lib/gcc/x86_64-linux-gnu/7/../../../x86_64-linux-gnu/Scrt1.o /usr/lib/gcc/x86_64-linux-gnu/7/../../../x86_64-linux-gnu/crti.o /usr/lib/gcc/x86_64-linux-gnu/7/crtbeginS.o -L/usr/lib/gcc/x86_64-linux-gnu/7 -L/usr/lib/gcc/x86_64-linux-gnu/7/../../../x86_64-linux-gnu -L/usr/lib/gcc/x86_64-linux-gnu/7/../../../../lib -L/lib/x86_64-linux-gnu -L/lib/../lib -L/usr/lib/x86_64-linux-gnu -L/usr/lib/../lib -L/usr/lib/gcc/x86_64-linux-gnu/7/../../.. /tmp/ccpdizFX.o -lgcc --push-state --as-needed -lgcc_s --pop-state -lc -lgcc --push-state --as-needed -lgcc_s --pop-state /usr/lib/gcc/x86_64-linux-gnu/7/crtendS.o /usr/lib/gcc/x86_64-linux-gnu/7/../../../x86_64-linux-gnu/crtn.o
COLLECT_GCC_OPTIONS='-v' '-o' 'program' '-mtune=generic' '-march=x86-64'
ross@golem-windows:/mnt/d/Development/Haskell/kaleidoscope/src/chapter1$ gcc --verbose -o program program.s
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/7/lto-wrapper
OFFLOAD_TARGET_NAMES=nvptx-none
OFFLOAD_TARGET_DEFAULT=1
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Ubuntu 7.5.0-3ubuntu1~18.04' --with-bugurl=file:///usr/share/doc/gcc-7/README.Bugs --enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++ --prefix=/usr --with-gcc-major-version-only --program-suffix=-7 --program-prefix=x86_64-linux-gnu- --enable-shared --enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --libdir=/usr/lib --enable-nls --enable-bootstrap --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new --enable-gnu-unique-object --disable-vtable-verify --enable-libmpx --enable-plugin --enable-default-pie --with-system-zlib --with-target-system-zlib --enable-objc-gc=auto --enable-multiarch --disable-werror --with-arch-32=i686 --with-abi=m64 --with-multilib-list=m32,m64,mx32 --enable-multilib --with-tune=generic --enable-offload-targets=nvptx-none --without-cuda-driver --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu
Thread model: posix
gcc version 7.5.0 (Ubuntu 7.5.0-3ubuntu1~18.04) 
COLLECT_GCC_OPTIONS='-v' '-o' 'program' '-mtune=generic' '-march=x86-64'
 as -v --64 -o /tmp/ccY5JS3x.o program.s
GNU assembler version 2.30 (x86_64-linux-gnu) using BFD version (GNU Binutils for Ubuntu) 2.30
COMPILER_PATH=/usr/lib/gcc/x86_64-linux-gnu/7/:/usr/lib/gcc/x86_64-linux-gnu/7/:/usr/lib/gcc/x86_64-linux-gnu/:/usr/lib/gcc/x86_64-linux-gnu/7/:/usr/lib/gcc/x86_64-linux-gnu/
LIBRARY_PATH=/usr/lib/gcc/x86_64-linux-gnu/7/:/usr/lib/gcc/x86_64-linux-gnu/7/../../../x86_64-linux-gnu/:/usr/lib/gcc/x86_64-linux-gnu/7/../../../../lib/:/lib/x86_64-linux-gnu/:/lib/../lib/:/usr/lib/x86_64-linux-gnu/:/usr/lib/../lib/:/usr/lib/gcc/x86_64-linux-gnu/7/../../../:/lib/:/usr/lib/
COLLECT_GCC_OPTIONS='-v' '-o' 'program' '-mtune=generic' '-march=x86-64'
 /usr/lib/gcc/x86_64-linux-gnu/7/collect2 -plugin /usr/lib/gcc/x86_64-linux-gnu/7/liblto_plugin.so -plugin-opt=/usr/lib/gcc/x86_64-linux-gnu/7/lto-wrapper -plugin-opt=-fresolution=/tmp/ccFWGnuo.res -plugin-opt=-pass-through=-lgcc -plugin-opt=-pass-through=-lgcc_s -plugin-opt=-pass-through=-lc -plugin-opt=-pass-through=-lgcc -plugin-opt=-pass-through=-lgcc_s --build-id --eh-frame-hdr -m elf_x86_64 --hash-style=gnu --as-needed -dynamic-linker /lib64/ld-linux-x86-64.so.2 -pie -z now -z relro -o program /usr/lib/gcc/x86_64-linux-gnu/7/../../../x86_64-linux-gnu/Scrt1.o /usr/lib/gcc/x86_64-linux-gnu/7/../../../x86_64-linux-gnu/crti.o /usr/lib/gcc/x86_64-linux-gnu/7/crtbeginS.o -L/usr/lib/gcc/x86_64-linux-gnu/7 -L/usr/lib/gcc/x86_64-linux-gnu/7/../../../x86_64-linux-gnu -L/usr/lib/gcc/x86_64-linux-gnu/7/../../../../lib -L/lib/x86_64-linux-gnu -L/lib/../lib -L/usr/lib/x86_64-linux-gnu -L/usr/lib/../lib -L/usr/lib/gcc/x86_64-linux-gnu/7/../../.. /tmp/ccY5JS3x.o -lgcc --push-state --as-needed -lgcc_s --pop-state -lc -lgcc --push-state --as-needed -lgcc_s --pop-state /usr/lib/gcc/x86_64-linux-gnu/7/crtendS.o /usr/lib/gcc/x86_64-linux-gnu/7/../../../x86_64-linux-gnu/crtn.o
COLLECT_GCC_OPTIONS='-v' '-o' 'program' '-mtune=generic' '-march=x86-64'

Once upon a time, linking your object file was perhaps as simple as calling:

$ ld -o output /lib/crt0.o program.o -lc

This tells ld to how to link your object file (program.o):

  • /lib/crt0.o: against crt0.o (the C runtime library)
  • -lc against libc (the C standard library)

Unfortunately, a bit of cruft has stacked up since then. Here’s the smallest command I managed to get working.

$ ld -dynamic-linker /lib64/ld-linux-x86-64.so.2 -o program /usr/lib/gcc/x86_64-linux-gnu/7/../../../x86_64-linux-gnu/Scrt1.o /usr/lib/gcc/x86_64-linux-gnu/7/../../../x86_64-linux-gnu/crti.o /usr/lib/gcc/x86_64-linux-gnu/7/crtbeginS.o -L/usr/lib/gcc/x86_64-linux-gnu/7 -L/usr/lib/gcc/x86_64-linux-gnu/7/../../../x86_64-linux-gnu -L/usr/lib/gcc/x86_64-linux-gnu/7/../../../../lib -L/lib/x86_64-linux-gnu -L/lib/../lib -L/usr/lib/x86_64-linux-gnu -L/usr/lib/../lib -L/usr/lib/gcc/x86_64-linux-gnu/7/../../.. program.o -lc /usr/lib/gcc/x86_64-linux-gnu/7/crtendS.o /usr/lib/gcc/x86_64-linux-gnu/7/../../../x86_64-linux-gnu/crtn.o

Yikes! Probably best to just use gcc.

Clang provides a similar --verbose option. Here’s the output on my system:

$ clang-9 -v -o program program.oclang version 9.0.0-2~ubuntu18.04.2 (tags/RELEASE_900/final)
Target: x86_64-pc-linux-gnu
Thread model: posix
InstalledDir: /usr/bin
Found candidate GCC installation: /usr/bin/../lib/gcc/x86_64-linux-gnu/7
Found candidate GCC installation: /usr/bin/../lib/gcc/x86_64-linux-gnu/7.5.0
Found candidate GCC installation: /usr/bin/../lib/gcc/x86_64-linux-gnu/8
Found candidate GCC installation: /usr/lib/gcc/x86_64-linux-gnu/7
Found candidate GCC installation: /usr/lib/gcc/x86_64-linux-gnu/7.5.0
Found candidate GCC installation: /usr/lib/gcc/x86_64-linux-gnu/8
Selected GCC installation: /usr/bin/../lib/gcc/x86_64-linux-gnu/7.5.0
Candidate multilib: .;@m64
Selected multilib: .;@m64
 "/usr/bin/ld" -z relro --hash-style=gnu --build-id --eh-frame-hdr -m elf_x86_64 -dynamic-linker /lib64/ld-linux-x86-64.so.2 -o program /usr/bin/../lib/gcc/x86_64-linux-gnu/7.5.0/../../../x86_64-linux-gnu/crt1.o /usr/bin/../lib/gcc/x86_64-linux-gnu/7.5.0/../../../x86_64-linux-gnu/crti.o /usr/bin/../lib/gcc/x86_64-linux-gnu/7.5.0/crtbegin.o -L/usr/bin/../lib/gcc/x86_64-linux-gnu/7.5.0 -L/usr/bin/../lib/gcc/x86_64-linux-gnu/7.5.0/../../../x86_64-linux-gnu -L/lib/x86_64-linux-gnu -L/lib/../lib64 -L/usr/lib/x86_64-linux-gnu -L/usr/bin/../lib/gcc/x86_64-linux-gnu/7.5.0/../../.. -L/usr/lib/llvm-9/bin/../lib -L/lib -L/usr/lib program.o -lgcc --as-needed -lgcc_s --no-as-needed -lc -lc -lgcc --as-needed -lgcc_s --no-as-needed /usr/bin/../lib/gcc/x86_64-linux-gnu/7.5.0/crtend.o /usr/bin/../lib/gcc/x86_64-linux-gnu/7.5.0/../../../x86_64-linux-gnu/crtn.o

Amusingly, despite LLVM having its own linker lld, it seems that Clang actually just calls the GNU linker ld under the hood. lld is more or less undocumented, so I didn’t try to get it working.

Conclusion

So there you go - a quick introduction to writing and compiling a program in LLVM IR. You can also find more detailed docs on the LLVM website.

Cheatsheet

File formats

  • .ll - LLVM IR (textual)
  • .bc - LLVM IR (binary)
  • .s - native assembly, not linked (textual)
  • .o - native assembly, not linked (binary)

Free Trade: Composable Smart Contracts

I’ve finally got around to publishing my masters thesis in computer science from the University of Bristol.

Quick link: Free Trade: Composable Smart Contracts

Talking about contracts

In 2000, Simon Peyton Jones et al. proposed a new combinator language for describing financial contracts. In fact, they appear to have thought this was such a good idea that Peyton Jones and Eber went on to redesign the language and write a book chapter about it.

This expressive domain-specific language was designed to improve on the cumbersome representations of financial contracts used in traditional IT systems. The ultimate goal? It should be possible for financial domain experts to author and analyse contracts without waiting for custom implementation by programmers. For example, you can write a European:

european :: Date -> Contract -> Contract
european t u = when (at t) (u ` or ` zero )

Or an American option:

american :: (Date, Date) -> Contract -> Contract
american (t1,t2) u = anytime (between t1 t2) u

In just a line or two of code - and these constructs can themselves be embedded and reused in other contracts. More excitingly, we can have contracts that dynamically change over time in response to outside stimuli. Here’s one that simply scales up and down with respect to some observable volatilityIndex.

c7 :: Currency -> Contract
c7 k = scale volatilityIndex (one k)

Smart contracts

In July 2015, the Ethereum network was launched. Inspired by Bitcoin’s distributed ‘blockchain’ ledger, Ethereum introduced a critical new feature: smart contracts. In addition to currency transfers, Ethereum allowed network participants to author and deploy actual programs to the blockchain. These smart contracts are now being experimentally applied in a number of areas: prediction markets, stable currencies and so on. Here’s a trivial example: a contract which stores an unsigned integer. Any network participant can set or get the number.

pragma solidity ^0.4.23;

contract SimpleStorage {
  uint storedData;
  
  function set( uint x) public {
    storedData = x;
  }
  
  function get() public constant returns ( uint ) {
    return storedData;
  }
}

More complex contracts can be used for transferring ETH, the Ethereum currency, and representing structures such as auctions, voting systems and ownership ledgers.

Implementing Merchant

I was interested to see if the two technologies - a declarative contract language and smart contracts - could be combined in a useful way. After all, wouldn’t it be useful to be able to write financial smart contracts in a totally declarative way, with no room for errors? It’s all to easy to make a nasty mistake when you try to write contracts by hand.

It turned out that there had been some work in this area by the authors of Findel, but I wanted to prototype an implementation with more advanced features.

I re-implemented the composing contracts language as a free monadic deep embedding in Haskell, then wrote a compiler from the contract language to the Solidity smart contract language. A user can write a contract that looks like the Haskell examples above and compile it directly to Solidity or to a deployable smart contract package.

Of course, that isn’t quite enough: we need some way of deploying and managing these contracts on an Ethereum network. I also prototyped a Đapp for deploying, proposing and accepting contracts. Users can upload a JSON contract description, deploy it to the network and then activate it by proposing it to another user.

Evaluation

Is this a viable method for writing Ethereum financial contracts? Well, not today. Ethereum has a few key limitations that make financial contracts quite hard to replicate.

First, it cannot trigger execution of a contract autonomously. That means that contracts cannot perform an action in response to some event. A user must trigger them, at which point the contract can verify that the event has occurred and execute.

Second, there is no widely accepted mechanism for debt enforcement. My prototype implementation simply uses a signed integer balance, not an ERC-20/223 token or ETH itself, but a real world financial market needs debt to function.

The current compiler is not optimising enough for real-world use. In particular, the need to store and test the entire history of observable values is very inefficient and will quickly hit the transaction gas limit.

If you want to know more, I would of course highly recommend reading the full thesis at ResearchGate or Internet Archive 😉. Get in touch or create a GitHub issue if you have any questions.

Loan Shark

As part of the computer science course at the University of Bristol, I built a game over the span of a year with a team of six people.

Loan Shark is an asymmetric, networked multiplayer VR diving game. Several players work as a team to retrieve treasure from a wreck at the bottom of the ocean while an angry shark - controlled in VR - hunts them down. Loan Shark features numerous graphical effects, integrated voice chat, motion-captured animation, procedurally generated mazes and support for the Myo gesture control armband.

Logos (by me!)

System.out.println("Hello, world");

It took a while (and trying more than a few static site generators), but now I have a working website! Expect to see some content here soon.