Technical Musings

Thoughts, Ideas, and Experimentation

View on GitHub
19 September 2020

A simple Parser Combinator library in Idris

by Timmy Jose

Monadic parsing is a form of parsing that combines lexing, parsing, and often evaluation in the same recursive process. Languages with strong algebraic data types (and thereby good pattern-matching) are excellent candidates for monadic parsing. Interestingly, Rust has a popular parser-combinator library called nom, but using it is not as clean as in ML-like languages due to the syntax used by Rust (not to mention involving the less-than-great macro system of Rust).

This is a simple exercise in implementing one of my favourite chapters in any programming textbook, the chapter on Monadic Parsing in the book, “Programming in Haskell” (2nd Edition) by Professor Graham Hutton. He has written quite a few papers on the subject which are well worth reading as well. For instance, Monadic Parser Combinators, and Monadic Parsing in Haskell to mention a couple.

The basic idea here is to build up a relatively powerful parser-combinator library bottom-up - whereby we define a custom parser type, implement primitive parsers using the Monad andAlternative macghinery, and progressively build up more and more powerful parsers.

Let’s start off with the basic parser implementation:


module ParsCom

import Data.Strings

%default total

{-
  Note that we get some parsers for free - 
   
   * A parser that always succeeds - `pure` from the Applicative interface
   * A parser that is equivalent to epsilon (no choice) - `empty` from the Alternative interface
   * sequencing by using `do` from the Monad interface
   * Choice by using the `(<|>)` operator from the Alternative interface
   * Repetitions by using the `some` and `many` functions (that we define ourselves since the Alternative
     interface in Idris, unlike Haskell, does not seem to have them)
-}

||| The Parser data type - takes a string argument 
||| and produces a result and the unconsumed string
public export
data Parser : Type -> Type where
  MkParser : (String -> List (a, String)) -> Parser a

||| Helper function to apply a parser
export
parse : Parser a -> String -> List (a, String)
parse (MkParser p) inp = p inp

||| A parser that consumes a single character
export
item : Parser Char
item = MkParser (\inp => case (unpack inp) of
                            [] => []
                            (c :: cs) => [(c, pack cs)])

--- making the Parser types a monad so that we can 
--- use the `do` notation

public export
Functor Parser where
  -- fmap : (a -> b) -> Parser a -> Parser b
   map f p = MkParser (\inp => case parse p inp of
                                    [(v, out)] => [(f v, out)]
                                    _ => [])

public export
Applicative Parser where
  -- pure : a -> Parser a
  pure v = MkParser (\inp => [(v, inp)])

  -- (<*>) : Parser (a -> b) -> Parser a -> Parser b
  pf <*> p = MkParser (\inp => case parse pf inp of
                                  [(f, out)] => parse (map f p) out
                                  _ => [])

public export
Monad Parser where
  -- (>>=) : Parser a -> (a -> Parser b) -> Parser b
  p >>= f = MkParser (\inp => case parse p inp of
                                  [(v, out)] => parse (f v) out
                                  _ => [])

-- make the Parser an instance of the Alternative class so that 
-- we can use choice and epsilon (no choice) operators

public export
Alternative Parser where
  -- empty : Parser a
  empty = MkParser (\_ => [])

  -- (<|>) : Parser a -> Parser a -> Parser a
  p <|> q = MkParser (\inp => case parse p inp of
                                [(v, out)] => [(v, out)]
                                _ => parse q inp )

-- primitive parsers

||| A parser that matches against the supplied predicate
sat : (Char -> Bool) -> Parser Char
sat p = do x <- item
           if p x then pure x else empty

||| A parser that consumes a single character matching the supplied character
char : Char -> Parser Char
char c = sat (== c)

||| A Parser that matches against a digit character
digit : Parser Char
digit = sat isDigit

||| A Parser that matches against a lowercase letter
lower : Parser Char
lower = sat isLower

||| A Parser that matches against an uppercase letter
upper : Parser Char
upper = sat isUpper

||| A Parser that matches against an alphabetic character
alpha : Parser Char
alpha = sat isAlpha

||| A Parser that matches against an alphanumeric character
alphanum : Parser Char
alphanum = sat isAlphaNum

-- derived parsers

mutual
  export
  partial
  some : Parser a -> Parser (List a)
  some p = pure (::) <*> p <*> many p

  partial
  many : Parser a -> Parser (List a)
  many p = some p <|> pure []

||| A Parser that matches against whitespace
export
partial
space : Parser ()
space = do many (sat isSpace)
           pure ()

||| A Parser for a string of alphanumeric characters
partial
ident : Parser String
ident = do c <- item
           cs <- many alpha
           pure $ pack (c :: cs)

||| A Parser for an Nat value
partial
nat : Parser Nat
nat = do ns <- some digit
         pure (stringToNatOrZ (pack ns))

||| A Parser for an Int value
partial
int : Parser Int
int = do char '-'
         ns <- nat
         pure (-(cast ns))
      <|> do ns <- nat
             pure (cast ns)

||| A Parser that matches the supplied string
export
partial
string : String -> Parser String
string "" = pure ""
string cs = let (c :: cs) = unpack cs in
                do char c
                   string (pack cs)
                   pure $ pack (c :: cs)

-- higher-level parsers

||| A Parser that takes another Parser and handles spacing before and after
export
partial
token : Parser a -> Parser a
token p = do space
             v <- p
             space
             pure v

||| A Parser for an identifier
export
partial
identifier : Parser String
identifier = token ident

||| A Parser for a natural number
export
partial
natural : Parser Nat
natural = token nat

||| A Parser for an integer
export
partial
integer : Parser Int
integer = token int

||| A Parser for a string
export
partial
symbol : String -> Parser String
symbol cs = token (string cs)

That’s it! Well, of course, it’s not going to be as efficient or generic as Parsec, for instance, but it is excellent for edificational purposes.

Now let’s use this mini-library to define a parser-cum-evaluator for basic arithmetic expressions. Note that the only error handling provided is using the Maybe type family. For any industrial strength parser-combinator library, error handling would be, I imagine, one of the most important aspects of the library.


module ArithExpr

import ParsCom

{-
  EBNF grammar for arithmetic expressions:

  expr ::= term (+ expr | - expr |  epsilon)
  term ::= factor (* term | / term | epsilon)
  factor ::= ( expr ) | integer
  integer ::= -2 | -1 | 0 | 1 | 2 | 3 ...

-}

mutual
  partial
  expr : Parser Int
  expr = do t <- term
            do symbol "+"
               e <- expr 
               pure (t + e) 
             <|> do symbol "-"
                    e <- expr
                    pure (t - e)
             <|> pure t

  partial
  term : Parser Int
  term = do f <- factor
            do symbol "*" 
               t <- term 
               pure (f * t)
             <|> do symbol "/" 
                    t <- term
                    pure (if t == 0 then 0 else f `div` t) 
             <|> pure f

  partial
  factor : Parser Int
  factor = do symbol "("
              e <- expr
              symbol ")"
              pure e
           <|> integer

export
partial
eval : String -> Maybe Int
eval inp = case parse expr inp of
                [(v, "")] => Just v
                _ => Nothing

And finally, the runner:


module ParsMain

import ArithExpr

partial
main : IO ()
main = do e <- getLine
          case eval e of 
               Just v => putStrLn (show v)
               Nothing => putStrLn "Invalid input"

Let’s take it for a spin!


~/dev/timmyjose.github.io/testing_ground/nzo$ idris2 -o ParsMain ParsMain.idr && rlwrap ./build/exec/ParsMain
1 + 2 * 3
7
~/dev/timmyjose.github.io/testing_ground/nzo$ idris2 -o ParsMain ParsMain.idr && rlwrap ./build/exec/ParsMain
(1 + 2) * 3
9
~/dev/timmyjose.github.io/testing_ground/nzo$ idris2 -o ParsMain ParsMain.idr && rlwrap ./build/exec/ParsMain
(12 / 2) * 3 + 3 * (12 / (1 + 3))
27
~/dev/timmyjose.github.io/testing_ground/nzo$ idris2 -o ParsMain ParsMain.idr && rlwrap ./build/exec/ParsMain
(1 + 2
Invalid input
~/dev/timmyjose.github.io/testing_ground/nzo$ idris2 -o ParsMain ParsMain.idr && rlwrap ./build/exec/ParsMain
12 / 0
0
~/dev/timmyjose.github.io/testing_ground/nzo$ idris2 -o ParsMain ParsMain.idr && rlwrap ./build/exec/ParsMain
12 - (100 / 10) / (20 / 5)
10
~/dev/timmyjose.github.io/testing_ground/nzo$ idris2 -o ParsMain ParsMain.idr && rlwrap ./build/exec/ParsMain
(100 / 20 / 4)
20

Some notes:

Compared to the Haskell version, there were a couple of minor adjustments I had to make:

1). Idris StringS are different from Haskell StringS, and hence all the packing and unpacking business.

2). The Alternative interface in Idris doesn’t have the some and many repetition functions as Haskell’s Alternative typeclass does, and hence I had to handroll my own version.

3). Again, owing to the in-progress nature of Idris, evaluating even simple expressions in the REPL quickly became extremely (and I mean extremely) slow. I had to finally resort to compiling the code and running it from ParsMain. There is quite a lot of scope for performance improvement in Idris in this regard.

4). The totality checking of Idris really does help in many tangible ways - unlike in Haskell, it forces you to be more deeply aware of your own code, and to be able to justify why your code is total and covering (or not).

Overall, I quite enjoyed writing it, perhaps even more so than the Haskell version! :-)

< > Home