
TagSoup


Drinking TagSoup by Example, by Neil Mitchell. TagSoup is a library for extracting information out of unstructured HTML code, sometimes known as tag-soup.

Drinking TagSoup by Example

The HTML does not have to be well formed, or render properly within any particular framework. This library is for situations where the author of the HTML is not cooperating with the person trying to extract the information, but is also not trying to hide the information. This document gives two particular examples, and two more may be found in the Example file from the darcs repository. The examples we give are:

- Obtaining the Hit Count from Haskell.org
- Obtaining a list of Simon Peyton-Jones' latest papers
- A brief overview of some other examples

The initial version of this library was written in Javascript and has been used for various commercial projects involving screen scraping.
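As a rough illustration of this kind of extraction, here is a minimal sketch that assumes the tagsoup package and its Text.HTML.TagSoup module. The input string and the name firstBoldText are made up for this example and are not taken from the original document.

```haskell
import Text.HTML.TagSoup

-- Hypothetical, deliberately malformed input: <b> and the first <p> are never closed.
sampleHtml :: String
sampleHtml = "<html><body><p>Visitors: <b>1,234<p>Thanks for reading"

-- Return the text that immediately follows the first <b> tag, if there is one.
-- 'sections' yields every suffix of the tag list that starts with a matching tag.
firstBoldText :: String -> Maybe String
firstBoldText html =
  case sections (~== "<b>") (parseTags html) of
    ((_ : TagText txt : _) : _) -> Just txt
    _                           -> Nothing
```

In GHCi, firstBoldText sampleHtml should pick out the text after the unclosed <b> tag, since the parser makes no attempt to balance tags.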

This library was written without knowledge of the Java version of TagSoup.

Acknowledgements: Thanks to Mike Dodds for persuading me to write this up as a library.

In Haskell, how do you extract strings from an XML document?

I use optional parameters in my TagSoup library, but it seems not to be a commonly known trick, as someone recently asked if the relevant line was a syntax error.

tagsoup

So, here is how to pass optional parameters to a Haskell function.

Optional Parameters in Other Languages

Optional parameters are available in a number of other languages, and come in a variety of flavours. Ada and Visual Basic both provide named and positional optional parameters. For example, given the definition:

    Sub Foo(Optional b As Boolean = True, Optional i As Integer = 0, Optional s As String = "Hello")

we can make the calls:

    Call Foo(s := "Goodbye", b := False)
    Call Foo(False, 1)

In the first case we give named parameters; in the second we give all the parameters up to a certain position. In some languages, such as GP+, you can say which parameters should take their default values:

    Call Foo(_, 42, _)

Optional Parameters in Haskell

Haskell doesn't have built-in optional parameters, but using the record syntax, it is simple to encode named optional parameters.
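A minimal sketch of the record-based encoding, reusing the three parameters and defaults from the Visual Basic definition of Foo. The names FooOptions, defaultFoo, optB, optI, optS and foo are hypothetical and chosen only for this illustration; they are not part of TagSoup.

```haskell
-- Record holding the three optional parameters of the hypothetical Foo.
data FooOptions = FooOptions
  { optB :: Bool
  , optI :: Int
  , optS :: String
  } deriving (Show, Eq)

-- Default values, mirroring the Visual Basic definition.
defaultFoo :: FooOptions
defaultFoo = FooOptions { optB = True, optI = 0, optS = "Hello" }

-- A stand-in function that just renders the options it received.
foo :: FooOptions -> String
foo opts = show (optB opts) ++ " " ++ show (optI opts) ++ " " ++ optS opts

-- Counterpart of the named call: override s and b, keep the default i.
example1 :: String
example1 = foo defaultFoo { optS = "Goodbye", optB = False }

-- Counterpart of the positional call Foo(False, 1): override b and i.
example2 :: String
example2 = foo defaultFoo { optB = False, optI = 1 }
```

Record update binds more tightly than function application, so foo defaultFoo { ... } parses as foo (defaultFoo { ... }). That is why such a call needs no parentheses, and probably why it can look like a syntax error at first glance.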

Text.HTML.TagSoup. data Tag str: A single HTML element.

Text.HTML.TagSoup

A whole document is represented by a list of Tag. There is no requirement for TagOpen and TagClose to match.

type Row = Int
The row/line of a position, starting at 1.

type Attribute str = (str, str)
An HTML attribute id="name" generates ("id","name").
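To make the representation concrete, the sketch below builds a small document by hand and reads attributes back out of it. It assumes the tagsoup package; the names doc and idsOf are hypothetical helpers written for this example.

```haskell
import Text.HTML.TagSoup

-- A hand-built document. The <p> tag is opened but never closed,
-- which is allowed because a document is just a list of Tag.
doc :: [Tag String]
doc =
  [ TagOpen "p" [("id", "name")]
  , TagText "Hello"
  , TagOpen "br" []
  ]

-- Collect the value of every "id" attribute found on an opening tag.
-- Tags and attributes that do not match the patterns are skipped.
idsOf :: [Tag String] -> [String]
idsOf tags = [ v | TagOpen _ attrs <- tags, ("id", v) <- attrs ]
```

The same function works on parser output, for example idsOf (parseTags "<p id=\"name\">Hello<br>"), because parseTags produces the same list-of-Tag representation.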