boilerpipe - Boilerplate Removal and Fulltext Extraction from HTML pages

The boilerpipe library provides algorithms to detect and remove the surplus "clutter" (boilerplate, templates) around the main textual content of a web page. The library already provides specific strategies for common tasks (for example, news article extraction) and can easily be extended for individual problem settings. Extracting content is very fast (milliseconds), needs only the input document (no global or site-level information is required), and is usually quite accurate. Boilerpipe is a Java library written by Christian Kohlschütter. The algorithms used by the library are based on (and extend) some concepts of the paper "Boilerplate Detection using Shallow Text Features" by Christian Kohlschütter et al., presented at WSDM 2010, the Third ACM International Conference on Web Search and Data Mining, in New York City, NY, USA.
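The description above says boilerpipe works from the input document alone, using shallow text features. As a rough illustration only, and not boilerpipe's actual implementation or API, the following Python sketch splits a page into text blocks and keeps the blocks that are long enough and have a low link density (the fraction of a block's words that sit inside `<a>` links). The tag lists, the `min_words=10` and `max_link_density=0.33` thresholds, and the names `TextBlock`, `BlockSegmenter` and `extract_main_text` are hypothetical choices made for this example.

```python
from dataclasses import dataclass, field
from html.parser import HTMLParser

# Hypothetical, simplified tag lists chosen for this sketch.
BLOCK_TAGS = {"p", "div", "li", "ul", "ol", "h1", "h2", "h3", "h4", "h5", "h6",
              "td", "tr", "table", "br", "section", "article", "header",
              "footer", "nav", "aside"}
SKIP_TAGS = {"script", "style", "title"}


@dataclass
class TextBlock:
    """A run of text between two block-level tags."""
    words: list = field(default_factory=list)
    linked_words: int = 0

    @property
    def num_words(self) -> int:
        return len(self.words)

    @property
    def link_density(self) -> float:
        # Fraction of this block's words that appear inside <a> elements.
        return self.linked_words / self.num_words if self.words else 0.0

    @property
    def text(self) -> str:
        return " ".join(self.words)


class BlockSegmenter(HTMLParser):
    """Splits an HTML page into flat text blocks at block-level tags."""

    def __init__(self):
        super().__init__()
        self.blocks = []
        self.current = TextBlock()
        self.anchor_depth = 0  # > 0 while inside an <a> element
        self.skip_depth = 0    # > 0 while inside <script>, <style> or <title>

    def _flush(self):
        # Close the current block; empty blocks are dropped.
        if self.current.words:
            self.blocks.append(self.current)
        self.current = TextBlock()

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
        elif tag == "a":
            self.anchor_depth += 1
        elif tag in BLOCK_TAGS:
            self._flush()

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag == "a":
            self.anchor_depth = max(0, self.anchor_depth - 1)
        elif tag in BLOCK_TAGS:
            self._flush()

    def handle_data(self, data):
        if self.skip_depth:
            return
        words = data.split()
        self.current.words.extend(words)
        if self.anchor_depth:
            self.current.linked_words += len(words)

    def close(self):
        super().close()  # processes any buffered text first
        self._flush()


def extract_main_text(html: str, min_words: int = 10,
                      max_link_density: float = 0.33) -> str:
    """Keeps blocks that are long enough and not dominated by links."""
    parser = BlockSegmenter()
    parser.feed(html)
    parser.close()
    kept = [b.text for b in parser.blocks
            if b.num_words >= min_words and b.link_density <= max_link_density]
    return "\n".join(kept)
```

On a page with a navigation bar of links, one article paragraph and a row of "Share" links, only the paragraph passes both tests. The real library's extractors are considerably more refined than this single threshold rule; the sketch is only meant to make the idea of shallow text features concrete.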

Teaching With YouTube: 197 Digital Channels For Learning

Users are free to customize the highlighting and configure their favorite styles.

Content Assist: An Xtext editor proposes valid code completions at any place in the document, helping your users with the syntactical details of your language.

Validation and Quick Fixes: Xtext has outstanding support for static analysis and validation of your models.

Advanced Java Integration: If your language targets the JVM, you'll love the Java support Xtext provides.

Integration with other Eclipse tools: Xtext provides a rich API to work with resources.

More IDE Features: Xtext's advanced Eclipse integration goes far beyond the editor.

NetLogo Home Page

NetLogo is a multi-agent programmable modeling environment.

Speech by Adrian Tan at NTU convocation ceremony

Life and How to Survive It. Below is a speech to the graduating class of 2008 at the NTU convocation ceremony last week by Adrian Tan, a litigation lawyer and the author of The Teenage Textbook.

I must say thank you to the faculty and staff of the Wee Kim Wee School of Communication and Information for inviting me to give your convocation address. It's a wonderful honour and a privilege for me to speak here for ten minutes without fear of contradiction, defamation or retaliation. I say this as a Singaporean and more so as a husband. My wife is a wonderful person and perfect in every way except one.

A successful Git branching model

Why Git? For a thorough discussion of the pros and cons of Git compared to centralized source code control systems, see the web. But with Git, these actions (branching and merging) are extremely cheap and simple, and they are considered one of the core parts of your daily workflow, really. Enough about the tools; let's move on to the development model.

10 Questions That Create Success

Think again. Pictures of dead presidents have never made anybody happy. And how can you be successful if you're not happy? And buying things with all that money isn't much better.

How To Be More Interesting (In 10 Simple Steps)

My Algorithm for Beating Procrastination

List of VC firms and Angel Investors in India

Back in June 2011, I made this list for my own reference and thought I would share it with everyone.

I am not giving the email IDs for obvious reasons; you can always find the email IDs or contact forms on their sites anyway (or on LinkedIn). However, if you are serious about raising capital, I would strongly advise you to contact the VC firms through a referral connection. I have added a few remarks next to them, but do your own research. Feel free to drop me an email, though, if you have something specific to ask. Please note that this list is not complete in any way; it is something I created for my own reference. Feel free to add to the wiki spreadsheet below along with your remarks, but please avoid removing content unless it is verbose or wrong.

Effective Sketches at Skills Matter

I presented my "Effective Sketches" talk last night at Skills Matter, where we looked at how to produce effective diagrammatic representations of software systems and why they are useful. In other words, we looked at how to draw boxes and lines. :-) Here are the links to the slides and video. If you missed the talk, I'll be doing a slightly longer version at the upcoming Software Architect 2011 conference, which takes place in London during October.

blog-rants - steveyegge2

I started writing an internal blog at in summer 2004.

We spend our whole careers moving up so fast that we're unable to hone any specific skills.

Terry Jones - Back of the envelope calculations with The Rule of 72

The Rule of 72 deserves to be better known among technical people. It's a widely known financial rule of thumb used for understanding and calculating interest rates. But others, including computer scientists and start-up founders, are often concerned with growth rates.

Mind Maps/Thinking Maps/Graphic Organizers