
Bot


Using the Python wikipediabot. WikiProject India. WP:IND and WP:INDIA redirect here.

WikiProject India

For the notice board for India-related topics, please see WP:INB. For the essay on Independent sources, please see WP:INDY.

The India WikiProject is a group dedicated to improving Wikipedia's coverage of topics related to the Republic of India and the history of the Indian subcontinent. The goals of this group are threefold: to provide guidelines and recommendations for articles that describe aspects of India; to improve Wikipedia's coverage of India by creating, expanding, and maintaining factual articles; and to serve as the point of discussion for issues related to India in Wikipedia. To achieve these goals, this project group has developed features to help manage its creative work.

Participation

The details of WikiProject India members can be found at Wikipedia:WikiProject India/Members.

Departments

Departments are groups of members who do specific work.

Coverage

Guide

Startups Wiki: Ask YC Archive. This page is an archive of quality Hacker News "Ask YC" posts grouped by subject.

Startups Wiki: Ask YC Archive

"Quality" means posts that a) are generally relevant to startups and b) contain a decent amount of useful discussion or advice. All posts on this page have been reviewed manually. Within groups (and sub-groups), stories are sorted in descending date order because newer stories are more timely (and often have more comments). For a more up-to-date listing (which picks up where this one left off), check out the Ask HN Archive. When referencing this page, don't copy and paste the links to table of contents sections, as they are numerically designated and thus subject to change. Financial.

Nutch.

Features

The fetcher ("robot" or "web crawler") has been written from scratch specifically for this project.

Nutch

History

Nutch originated with Doug Cutting, creator of both Lucene and Hadoop, and Mike Cafarella. In June 2003, a successful 100-million-page demonstration system was developed. To meet the multimachine processing needs of the crawl and index tasks, the Nutch project has also implemented a MapReduce facility and a distributed file system. In January 2005, Nutch joined the Apache Incubator, from which it graduated to become a subproject of Lucene in June of that same year.

Releases

Advantages

Advantages of Nutch over a simple fetcher include[2]: a highly scalable and relatively feature-rich crawler; features like politeness, which obeys robots.txt rules; robustness and scalability: Nutch can run on a cluster of up to 100 machines; quality: crawling can be biased to fetch "important" pages first.

Scalability

The ClueWeb09 dataset (used in e.g.