
Haskell Parsers


jaspervdj - Erasing "expected" messages in Parsec. Introduction: Parsec is an industrial-strength parser library.

I think one of its main advantages is that it allows you to generate really good error messages. However, this sometimes requires some non-obvious tricks. In this blog post, I describe one of those.

The BNF Converter.

Parsing CSV by feeding BNF to Haskell's Parsec module. In a previous post, I provided a general overview of Backus-Naur Form (BNF) and an example set of BNF rules for comma-separated values (CSVs).

Today we'll see just how easy it is to translate those rules into Haskell code that will parse a CSV file. To work this bit o' magic, we first need a proper spellbook. So, we go and grab Parsec off the shelf. Parsec is a parsing module that comes distributed with the Glasgow Haskell Compiler (GHC). In order to make use of this module, we have to import it:

    import Text.ParserCombinators.Parsec

Now, we'll proceed to convert each line of the CSV BNF we presented earlier into a Haskell function.

    csv-file = { row }

which indicates that a csv-file is composed of many rows.

    csv_file = many row

Easy, huh? Next comes the definition of row.

    row = field-list, eol

Now the Haskell:

    row = do result <- field_list
             eol
             return result

The rule for field_list is even more involved:

    field-list = field, [ ",", field-list ]

    field_list = do first_cell <- field
                    ...
                      Nothing -> return []
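The excerpt above cuts off in the middle of field_list, so here is a minimal, self-contained sketch of how the three BNF rules might be completed. It is not the post's own code: the body of field_list, the definitions of field and eol, and the parseCsv wrapper are all assumptions made for illustration (in particular, fields are assumed to contain no commas, quotes or newlines, and rows are assumed to end in a single '\n').

```haskell
import Text.ParserCombinators.Parsec

-- csv-file = { row }
csv_file :: Parser [[String]]
csv_file = many row

-- row = field-list, eol
row :: Parser [String]
row = do
  result <- field_list
  _ <- eol
  return result

-- field-list = field, [ ",", field-list ]
-- Assumed completion: the optional tail is tried with optionMaybe.
field_list :: Parser [String]
field_list = do
  first_cell <- field
  rest <- optionMaybe (char ',' >> field_list)
  case rest of
    Just cells -> return (first_cell : cells)
    Nothing    -> return [first_cell]

-- Assumption: a field is any run of characters other than ',' and '\n'.
field :: Parser String
field = many (noneOf ",\n")

-- Assumption: every row is terminated by a single '\n'.
eol :: Parser Char
eol = char '\n'

-- Hypothetical wrapper: parse a whole input and require end of input.
parseCsv :: String -> Either ParseError [[String]]
parseCsv = parse (csv_file <* eof) "(csv)"
```

With these assumptions, parseCsv should accept input in which every row ends with a newline and reject a final row that lacks one, since row insists on eol.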

Parsing with Haskell and Attoparsec.

From Hask Till Dawn: Testing attoparsec parsers with hspec. Almost all Haskellers end up, some day, having to write a parser.

But then, that’s not really a problem because writing parsers in Haskell isn’t really annoying, like it tends to be elsewhere. Of special interest to us is attoparsec, a very fast parser combinator library. It lets you combine small, simple parsers to express how data should be extracted from that specific format you’re working with. For example, suppose you want to parse something of the form |<any char>| where <any char> can be… well, any character.
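As a rough sketch of what a parser for the |<any char>| form could look like, assuming the attoparsec package and its Data.Attoparsec.Text module (the names barred and parseBarred are hypothetical, chosen only for this illustration):

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Attoparsec.Text (Parser, anyChar, char, parseOnly)
import Data.Text (Text)

-- Three small parsers glued together: an opening '|', any single
-- character, and a closing '|'. Only the middle result is kept.
barred :: Parser Char
barred = char '|' *> anyChar <* char '|'

-- Hypothetical helper: run the parser on a Text value.
parseBarred :: Text -> Either String Char
parseBarred = parseOnly barred
```

The OverloadedStrings pragma is what allows Text literals to be written directly, so that a call such as parseBarred "|a|" type-checks without an explicit conversion from String.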

We obviously only care about that precise character sitting there; once the input is processed, we don’t really care about these | anymore. Here we go, we have our parser. This parser will fail if any of the 3 smaller parsers I’m using fail. Let’s now see our parser in action, by loading it in ghci and trying to feed it various inputs. First, we want to be able to type in Text values directly without using conversion functions from/to Strings. Why do we care about this?

Parsing Log Files in Haskell.

A major upgrade to attoparsec: more speed, more power. I’m pleased to introduce the third generation of my attoparsec parsing library.

With a major change to its internals, it is both faster and more powerful than previous versions, while remaining backwards compatible. Let’s start with a speed comparison between the hand-written C code that powers Node.js’s HTTP parser and an idiomatic Haskell parser that uses attoparsec. There are good reasons to take these numbers with a fistful of salt, so imagine huge error bars, warning signs, and whatnot, but they’re still interesting. A little explanation is in order for why there are two entries for http-parser. The “null” driver consists of a series of empty callbacks, and represents the best possible performance we can get. Meanwhile, the attoparsec parser is of course tiny: a few dozen lines of code, instead of a few thousand. To be clear, you really shouldn’t treat comparing the two as anything other than a fast-and-loose exercise.

Attoparsec: Fast combinator parsing for bytestrings and text.

A Gentle Introduction to Parsec, blog.barrucadu.

It seems to me that there aren’t many step-by-step introductions to parsec, where you build up a parser as you go.

This is especially the case for applicative parsec, which is a shame as applicative functors are nice. So, I wrote one. Today, we are going to learn how to use applicatives and parsec to parse a CSV file. We’ll start off with a very basic one where there can be no commas or escape characters in the fields, then add support for quoted fields which can contain any character, and then we’ll add support for special escape characters (numeric literals and the like). Finally, I’ll leave two small exercises that you might want to work on, just to check that you managed to get everything.

The Basic Parser

> import Text.Parsec
> import Control.Applicative ((<$), (<*), (*>), liftA)
> import Data.Char (chr)
>
> parseCSV :: String -> Either ParseError [[String]]
> parseCSV = ...

Great things have small beginnings.

> parseCSV = parse csvp ""

See, no variables! Oh.

Quoted Cells

Exercises
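The excerpt shows parseCSV but not csvp itself. One possible applicative-style definition for the very basic case (no commas or escape characters inside fields) is sketched below. The bodies of csvp, linep and cellp are assumptions, not the post's own code, and linep and cellp are hypothetical names.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

parseCSV :: String -> Either ParseError [[String]]
parseCSV = parse csvp ""

-- Assumed definition: zero or more lines, each ended by a newline,
-- followed by end of input.
csvp :: Parser [[String]]
csvp = linep `endBy` newline <* eof

-- A line is one or more cells separated by commas.
linep :: Parser [String]
linep = cellp `sepBy1` char ','

-- In this basic version a cell may not contain ',' or '\n'.
cellp :: Parser String
cellp = many (noneOf ",\n")
```

No do-notation or named intermediate results are needed here. The combinators endBy, sepBy1 and <* carry all of the structure, which is the point of the "no variables" remark.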