Searchbots - build your own search robot

Searching the invisible web (deep web, hidden web): WebLens sear Locate deep web resources by adding the word "database" to a regular search engine query. The terms invisible web, hidden web, and deep web all refer to the same thing: a massive storehouse of online data that regular search engines don't capture. That's because terabytes of information are buried in databases and other research resources.
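
The query tip in the entry above can be sketched in a few lines of Python. This is only an illustration: the function names, the `https://search.example/search` address, and the `q` parameter are placeholders invented here, not part of WebLens or any particular search engine.

```python
from urllib.parse import urlencode


def deep_web_query(topic: str, keyword: str = "database") -> str:
    """Append the keyword to the topic unless it is already present."""
    words = topic.split()
    if keyword.lower() not in (w.lower() for w in words):
        words.append(keyword)
    return " ".join(words)


def search_url(base_url: str, topic: str) -> str:
    """Build a search URL; base_url and the 'q' parameter are placeholders."""
    return f"{base_url}?{urlencode({'q': deep_web_query(topic)})}"


if __name__ == "__main__":
    # Hypothetical engine address, used only to show the shape of the URL.
    print(search_url("https://search.example/search", "plant genome"))
```

The keyword check is case-insensitive, so a query that already mentions a database is left unchanged.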

Distributed web crawling Distributed web crawling is a distributed computing technique whereby Internet search engines employ many computers to index the Internet via web crawling. Such systems may allow for users to voluntarily offer their own computing and bandwidth resources towards crawling web pages. By spreading the load of these tasks across many computers, costs that would otherwise be spent on maintaining large computing clusters are avoided.[1]
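
One common way to spread crawl load across many machines is to hash each URL's hostname to pick a worker. The sketch below assumes that scheme; the entry above does not say how any real system assigns pages, and `assign_worker` and `partition` are hypothetical names chosen for illustration.

```python
import hashlib
from collections import defaultdict
from urllib.parse import urlsplit


def assign_worker(url: str, n_workers: int) -> int:
    """Map a URL to a worker index using a stable hash of its hostname."""
    host = urlsplit(url).hostname or ""
    digest = hashlib.sha256(host.encode("utf-8")).digest()
    # Use the first 8 bytes as an integer so the result is stable across runs.
    return int.from_bytes(digest[:8], "big") % n_workers


def partition(urls, n_workers):
    """Group URLs into per-worker crawl queues keyed by worker index."""
    queues = defaultdict(list)
    for url in urls:
        queues[assign_worker(url, n_workers)].append(url)
    return dict(queues)


if __name__ == "__main__":
    seeds = [
        "https://example.org/a",
        "https://example.org/b",
        "https://example.net/index.html",
    ]
    for worker, queue in sorted(partition(seeds, 4).items()):
        print(worker, queue)
```

Keying on the hostname rather than the full URL keeps every page of a site on the same worker, which makes per-site politeness limits easier to enforce.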

DeepDyve Explores The Invisible Web As web search engines have improved over the years, there’s been less attention paid to an “inconvenient truth” about the indexes of our favorite information finding tools—namely, that search engines still miss the lion’s share of information available on the web. This so-called “deep web” remains largely impenetrable to search engines for a variety of reasons, and for many types of queries that’s just fine. But if you’re a serious searcher, looking for the best information possible, you can’t afford to overlook this vast “hidden” store of information. And that’s a challenge, because search tools that probe the deep web are for the most part either obscure or fee-based. That’s changing, thanks to a company formerly known as Infovell and now called DeepDyve.

Sciencenet Sciencenet is a distributed search engine at KIT – Liebel-Lab for scientific knowledge. The Sciencenet software (YaCy) is based on p2p technology developed by Michael Christen in collaboration with Liebel-Lab at KIT. Background: Scientific knowledge is spread across many databases, research institutes, educational websites and literature repositories.

So, You Want A Searchable Database, Huh? I am asked this question time and time again, "How do I set up a searchable database?" There are actually a few different ways -- some are harder than others. Here's a quick look at the three main ways.

Nonprofit Common Crawl Offers a Database of the Entire Web, For Free, and Could Open Up Google to New Competition Google famously started out as little more than a more efficient algorithm for ranking Web pages. But the company also built its success on crawling the Web—using software that visits every page in order to build up a vast index of online content. A nonprofit called Common Crawl is now using its own Web crawler and making a giant copy of the Web that it makes accessible to anyone. The organization offers up over five billion Web pages, available for free so that researchers and entrepreneurs can try things otherwise possible only for those with access to resources on the scale of Google’s.

Research Beyond Google: 119 Authoritative, Invisible, and Compre Got a research paper or thesis to write for school or an online class? Want to research using the Internet? Good luck. There’s a lot of junk out there — outdated pages, broken links, and inaccurate information. Using Google or Wikipedia may lead you to some results, but you can’t always be sure of accuracy. And what’s more, you’ll only be searching a fraction of all of the resources available to you.

All Music, All But Invisible - Search Engine Watch (SEW) The All Music Guide is one of the most comprehensive, extensively cross-linked and easy to use musical resources on the web. It's also, unfortunately, largely invisible to search engines. If you're looking for music information, the All Music Guide is an exceptional resource, packed with high quality information that's difficult to find elsewhere. The All Music Guide is known for its extensive, detailed, and critical biographies of thousands of performers, as well as thorough discographies. This information alone makes it one of the most comprehensive, authoritative sources for music information on the web.

80 How-To Sites Worth Bookmarking Sitting on my dining room table, I currently have half a dozen projects in various states of doneness. Some involve vivisected computer parts, others will eventually be wearable and a few are just cool things I’ve run across on the internet. I like doing things myself; I think the DIY bug is one of the best communicable diseases in the lifehack community. These eighty sites are the places I turn to when I’m trying to figure out how to accomplish any particular goal. Any time I’m facing a new project, I start searching for how-tos that will help me figure out how other people did similar things and how likely I am to finish the project with all ten fingers still intact. I’ve broken them up into a few different categories, just to help you narrow down what you might be looking for.

WebLens Narrow results by making your query words a phrase. To do this, enclose them in double quotation marks, as in "grand canyon tours". For details, see Search Basics.

Blekko Blekko, trademarked as blekko (lowercase),[2] is a company that provides a web search engine with the stated goal of providing better search results than those offered by Google Search, with results gathered from a set of 3 billion trusted webpages and excluding such sites as content farms. The company's site, launched to the public on November 1, 2010, uses slashtags to provide results for common searches. Blekko also offers a downloadable search bar. History: The company was co-founded in 2007 by Rich Skrenta, who had created Newhoo, which was acquired by Netscape and renamed as the Open Directory Project.[3] Skrenta "is still remembered most for unleashing the Elk Cloner virus on the world".[4] Blekko has raised $24 million in venture capital from such individuals as Netscape founder Marc Andreessen and Ron Conway, as well as from U.S.

99 Resources to Research & Mine the Invisible Web - College Degr College researchers often need more than Google and Wikipedia to get the job done. To find what you're looking for, it may be necessary to tap into the invisible web, the sites that don't get indexed by broad search engines. The following resources were designed to help you do just that, offering specialized search engines, directories, and more places to find the complex and obscure.

100 Amazing How-To Sites to Teach Yourself Anything Posted by Site Administrator in Online Learning May 7th, 2009 Learning new skills and expanding your knowledge doesn’t have to cost you an arm and a leg. There are loads of free resources on the Web that can help you find instructional videos, tutorials and classes to learn a wide variety of skills from fixing basic car problems to speaking another language.