A simple compiler In my last post I played a little with Harpy to generate code from what looked like assembly language. So the next logical step is to make a compiler. And I mean a real machine-code compiler, not some wimpy byte-code nonsense. To make life simple (it is a simple compiler, after all) I'm going to start by generating code that uses the stack all the time. So all operands will live on the stack and all operations on them will push and pop the stack. Of course, the code generated from such a compiler will make any serious compiler writer weep. The CodeGen type in Harpy is the monad used during code generation. It allows you to keep some extra information around besides the generated code; it gives you access to a state monad. I will use the state to keep track of the current stack depth. (We will see why soon.)
type StackDepth = Int type Gen = CodeGen () StackDepth () addDepth :: StackDepth -> Gen addDepth i = do d <- getState setState (d+i)The addDepth function changes the current stack depth by grabbing the old one, adding the argument and storing it back. The getState and setState functions don't generate any code, they just manipulate the state available in the CodeGen monad. With that out of the way, let's implement code generation for addition.
gadd :: Gen gadd = do pop ebx pop eax add eax ebx push eax addDepth (-1)It pops the two operands off the stack, adds them, and pushes the result. The net effect on the stack is that it has one word less on it (I count my stack depth in words), so there's also a call to addDepth. Subtraction and multiplication are very similar.
gsub :: Gen gsub = do pop ebx pop eax sub eax ebx push eax addDepth (-1) gmul :: Gen gmul = do pop ebx pop eax imul InPlace eax ebx push eax addDepth (-1)On the i386 (signed) division and remainder is computed with the idiv instruction. It divides the EDX:EAX 64 bit number with the given operand. So we must convert a 32 signed number to a 64 bit signed number, this is simply done by moving EAX to EDX and then shifting right 31 steps. This will copy the sign bit into every bit of EDX. Depending of if we want the quotient or remainder we need to push EAX or EDX.
gquot :: Gen gquot = do gquotRem push eax addDepth (-1) grem :: Gen grem = do gquotRem push edx addDepth (-1) gquotRem :: Gen gquotRem = do pop ebx pop eax mov edx eax sar edx (31 :: Word8) idiv ebxTo put a constant on the stack we simply push it and increment the remembered stack depth.
gconst :: Int -> Gen gconst c = do push (fromIntegral c :: Word32) addDepth 1OK, so now for something more interesting. Assuming we are generating code for a function, we also want to access the arguments to the function. Where are the arguments? Well, according to the IA32 calling conventions the caller pushes the arguments on the stack, so we'll follow those. First we have a bunch things on the stack, how many is kept track of in the stack depth in the CodeGen monad, and then the arguments follow in order, pushed right-to-left. So to get an argument we compute the offset, convert it to a byte offset, and push that word on the stack.
-- Get the Nth argument gargN :: Int -> Gen gargN n = do d <- getState let o = 4 * (d + n) mov eax (Disp (fromIntegral o), esp) push eax addDepth 1When generating code for a function we should not clobber any callee-save registers, so to be on the safe side we save all used registers on function entry and restore them on function exit. On function exit we also return the result in EAX.
savedRegs = [ebx, edx] -- Push all register we'll be using, except eax. gprologue :: Gen gprologue = do mapM_ push savedRegs -- Pop return value to eax, restore regs, and return gepilogue :: Gen gepilogue = do pop eax mapM_ pop (reverse savedRegs) retOK, so that was a lot of stuff, let's put it together for a test.
testGen = conv_Word32ToWord32 () (length savedRegs + 1) $ do gprologue gargN 0 gconst 1 gadd gepilogue main = do test <- testGen print (test 10)The testGen function generates the prologue, push argument, push 1, add, and the epilogue. The conv_Word32ToWord32 (from my previous post) converts the machine code to a Haskell function. We also have to give the start value of the stack depth. The stack initially contains the return address and the saved registers, so that's the number we pass. Running this gives 11, as it should. OK, so let's actually write a compiler and not just a code generator. Here is a data type for integer expressions.
data Exp = Con Int | Arg Int | BinOp BOp Exp Exp deriving (Show) data BOp = Add | Sub | Mul | Quot | Rem deriving (Show)We have constants, arguments (variables), and a few binary operators. It's all easily translated to machine code.
translate :: Exp -> Gen translate (Con c) = gconst c translate (Arg n) = gargN n translate (BinOp op x y) = do translate x; translate y; binop op where binop Add = gadd binop Sub = gsub binop Mul = gmul binop Quot = gquot binop Rem = gremFor simplicity, let's compile only functions of one argument for now.
compileIOW32 :: (Exp -> Exp) -> IO (Word32 -> Word32) compileIOW32 f = conv_Word32ToWord32 () (length savedRegs + 1) $ do gprologue translate (f (Arg 0)) gepilogueThis function takes an Exp->Exp function, by giving this function the argument Arg 0 we get an expression to translate. We tack on the usual prologue and epilogue. So let's try it.
main = do let fun x = BinOp Add x (Con 1) test <- compileIOW32 fun print (test 10)Which prints 11. But yuck, writing BinOp etc. isn't nice. Let's make some instances.
instance Eq Exp instance Ord Exp instance Num Exp where x + y = BinOp Add x y x - y = BinOp Sub x y x * y = BinOp Mul x y fromInteger i = Con (fromInteger i) instance Enum Exp instance Real Exp instance Integral Exp where quot x y = BinOp Quot x y rem x y = BinOp Rem x yAnd give it a whirl:
main = do let fun x = (x+1) * x `quot` 2 test <- compileIOW32 fun print (test 10)And this prints 55, as expected. Not bad, a compiler from (a tiny subset of) Haskell functions to machine code in a few pages. But I do admit being embarrassed about generating such incredibly poor code. But there's always future blog posts to rectify that.