Deep Web Research 2012

Bots, Blogs and News Aggregators is a keynote presentation that I have been delivering over the last several years, and much of my information comes from the extensive research that I have completed over the years into the "invisible" or what I like to call the "deep" web. The Deep Web covers somewhere in the vicinity of one trillion or more pages of information located through the world wide web in various files and formats that the current search engines on the Internet either cannot find or have difficulty accessing. At the time of this writing, the current search engines find hundreds of billions of pages. In the last several years, some of the more comprehensive search engines have written algorithms to search the deeper portions of the world wide web by attempting to find files such as .pdf, .doc, .xls, .ppt, .ps and others. This Deep Web Research 2012 report and guide is divided into the following sections: Bot Research

Recommended Gateway Sites for the Deep Web
And Specialized and Limited-Area Search Engines

This portion of the Internet consists of information that requires interaction to display, such as dynamically created pages, real-time information and databases.

General Gateways | Humanities | Social Sciences | Science and Technology | Health Sciences | Business and Government | Reference, Popular Culture | Other

General Gateways:

Invisible Web Directory (highly recommended)
An excellent gateway to some of the best research-oriented invisible web resources available.

Resource Discovery Network
A well-annotated listing of Deep Web resources.
ALTIS - Hospitality, Leisure, Sport and Tourism
Artifact - Arts and Creative Industries
BIOME - Health and Life Sciences
EEVL - Engineering, Mathematics and Computing
GEsource - Geography and Environment
Humbul - Humanities
PSIgate - Physical Sciences
SOSIG - Social Sciences, Business and Law

Flipper

Google Chrome Tips

How to use Google for Hacking. | Arrow Webzine
Google serves almost 80 percent of all search queries on the Internet, making it the most popular search engine. However, Google makes it possible to reach not only publicly available information resources but also some of the most confidential information that should never have been revealed. In this post I will show how to use Google for exploiting security vulnerabilities within websites. The following are some of the hacks that can be accomplished using Google.
1. There exist many security cameras used for monitoring places like parking lots, college campuses, road traffic etc. which can be hacked using Google so that you can view the images captured by those cameras in real time. inurl:”viewerframe? Click on any of the search results (Top 5 recommended) and you will gain access to the live camera, which has full controls and works in real time. intitle:”Live View / – AXIS”
2. filetype:xls inurl:”email.xls”
3. “?
4.

11 Unknown Ways Of Using Google Search - Curious Mob
Thinking what more is there to know about Google search? I mean, it's Google search after all: type whatever you want to search, press enter and everything in the world related to your topic is displayed in front of your eyes. But believe it or not, the search engine has plenty of tricks up its sleeve. Here’s an overview of 11 Google Tricks That Will Change the Way You Search: 11. One well-known, simple trick is that searching a phrase in quotes will yield only pages with the same words in the same order as what’s in the quotes.

Invisible Web Gets Deeper
By Danny Sullivan
From The Search Engine Report, Aug. 2, 2000
I've written before about the "invisible web," information that search engines cannot or refuse to index because it is locked up within databases. Now a new survey has made an attempt to measure how much information exists outside of the search engines' reach. The company behind the survey is also offering up a solution for those who want to tap into this "hidden" material. The study, conducted by search company BrightPlanet, estimates that the inaccessible part of the web is about 500 times larger than what search engines already provide access to. That sounds terrible, but as I've commented numerous times before, the size of a search engine does not necessarily equate to its relevancy or usefulness. For example, assume you wanted to do a trademark search against databases in various parts of the world. To date, meta search tools like this have been few and far between. Don't expect a web-based version of LexiBot to be coming.

Verification Handbook for Investigative Reporting
Craig Silverman is the founder of Emergent, a real-time rumor tracker and debunker. He was a fellow with the Tow Center for Digital Journalism at Columbia University, and is a leading expert on media errors, accuracy and verification. Craig is also the founder and editor of Regret the Error, a blog about media accuracy and the discipline of verification that is now a part of the Poynter Institute. He edited the Verification Handbook, previously served as director of content for Spundge, and helped launch OpenFile, an online local news startup that delivered community-driven reporting in six Canadian cities. Craig is also the former managing editor of PBS MediaShift and has been a columnist for The Globe And Mail, Toronto Star, and Columbia Journalism Review. He tweets at @craigsilverman.
Rina Tsubaki leads and manages the "Verification Handbook" and "Emergency Journalism" initiatives at the European Journalism Centre in the Netherlands.

Invisible Web: What it is, Why it exists, How to find it, and Its inherent ambiguity

What is the "Invisible Web", a.k.a. the "Deep Web"?
The "visible web" is what you can find using general web search engines. It's also what you see in almost all subject directories. The "invisible web" is what you cannot find using these types of tools. The first version of this web page was written in 2000, when this topic was new and baffling to many web searchers. Since then, search engines' crawlers and indexing programs have overcome many of the technical barriers that made it impossible for them to find "invisible" web pages. These types of pages used to be invisible but can now be found in most search engine results: Pages in non-HTML formats (pdf, Word, Excel, PowerPoint), now converted into HTML.

Why isn't everything visible?
There are still some hurdles search engine crawlers cannot leap. The Contents of Searchable Databases.

How to Find the Invisible Web
Simply think "databases" and keep your eyes open. Examples:
plane crash database
languages database
toxic chemicals database

How to search like a spy: Google's secret hacks revealed
The National Security Agency just declassified a hefty 643-page research manual called Untangling the Web: A Guide to Internet Research (PDF) that, at least at first, doesn't appear all that interesting. That is, except for one section on page 73: "Google Hacking." "Say you're a cyberspy for the NSA and you want sensitive inside information on companies in South Africa," explains Kim Zetter at Wired. "What do you do?" Well, you could type the following advanced search into Google, "filetype:xls site:za confidential", to uncover a trove of seemingly private spreadsheets. These are just two examples of the numerous private files that are inadvertently uploaded to the Internet, and can be accessed if you know the right Google search terms. Here are a few more: Pretty neat, huh? And even if keyboard espionage isn't really your thing, the document contains a number of practical tips anyone can use to become a better Googler:
* Repeating a word will help you find more relevant hits.
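The advanced search quoted above is just a plain search term combined with two operators, `filetype:` and `site:`. A minimal sketch of how such a query string might be assembled and URL-encoded follows; the helper names `build_query` and `search_url` are hypothetical, and the base search URL is an assumption rather than anything taken from the manual.

```python
from urllib.parse import urlencode


def build_query(terms, filetype=None, site=None):
    """Compose a search string from plain terms plus optional operators.

    Hypothetical helper: it only concatenates text, it does not run a search.
    """
    parts = []
    if filetype:
        parts.append(f"filetype:{filetype}")  # restrict results to one file extension
    if site:
        parts.append(f"site:{site}")          # restrict results to one domain or TLD
    parts.extend(terms)
    return " ".join(parts)


def search_url(query, base="https://www.google.com/search"):
    """URL-encode the query as the q parameter (the base URL is an assumption)."""
    return f"{base}?{urlencode({'q': query})}"


query = build_query(["confidential"], filetype="xls", site="za")
print(query)
print(search_url(query))
```

The same operators are handy for checking what your own domain exposes, for example by passing your own domain as `site`.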

The Ultimate Guide to the Invisible Web
Search engines are, in a sense, the heartbeat of the internet; “Googling” has become a part of everyday speech and is even recognized by Merriam-Webster as a grammatically correct verb. It’s a common misconception, however, that Googling a search term will reveal every site out there that addresses your search. Typical search engines like Google, Yahoo, or Bing actually access only a tiny fraction (estimated at 0.03%) of the internet. "As much as 90 percent of the internet is only accessible through deep web websites."

So where’s the rest?

So what is the Deep Web, exactly?

Search Engines and the Surface Web
Understanding how surface pages are indexed by search engines can help you understand what the Deep Web is all about. Over time, advancing technology made it profitable for search engines to do a more thorough job of indexing site content.

How is the Deep Web Invisible to Search Engines?
Some examples of other Deep Web content include:

Reasons a Page is Invisible
Too many parameters

Art

Pathways | Finding | Effective searching | Being Digital | Open University Library Services
When you select a pathway, you will see a number of activities on a particular theme. Pathways allow you to develop a deeper understanding of a topic. You can work through the activities in your chosen pathway in any order. Activities will open in a new tab or window. The icon next to each activity helps you to identify the format used (e.g. activity, video, audio, or external resource).

Viewing all pathways
This is a list of all the pathways available.

Assess your skills
Assess your familiarity and confidence with online tools and environments and find out which activities can help you develop your skills further.

Avoiding plagiarism
Learn to recognise what plagiarism is, the forms it can take and how to avoid it by developing your skills.

Communicating online
How can you ensure your interactions with others online are appropriate and effective?

Effective searching

Exploring your information landscape

Keeping up-to-date

Using

The Invisible Web: A Beginners Guide to the Web You Don't See
By Wendy Boswell
Updated June 02, 2016.

What is the Invisible Web?
The term "invisible web" mainly refers to the vast repository of information that search engines and directories don't have direct access to, like databases. Unlike pages on the visible Web (that is, the Web that you can access from search engines and directories), information in databases is generally inaccessible to the software spiders and crawlers that create search engine indexes.

How Big is the Invisible Web?
The Invisible Web is estimated to be literally thousands of times larger than the Web content found with general search engine queries. The major search engines - Google, Yahoo, Bing - don't bring back all the "hidden" content in a typical search, simply because they can't see that content without specialized search parameters and/or search expertise.

Why Is It Called "The Invisible Web"?
Spiders meander throughout the Web, indexing the addresses of pages they discover.

Humanities
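The spider behaviour described above can be illustrated with a toy example. This is only a sketch under stated assumptions: the `LINKS` map is a made-up four-page site, and real crawlers fetch and parse pages over HTTP rather than reading a dictionary. It shows why a page that exists only as the answer to a database query never enters the index: no static link leads to it.

```python
from collections import deque

# Hypothetical toy site: each page maps to the pages it links to.
LINKS = {
    "/home": ["/about", "/search-form"],
    "/about": ["/home"],
    "/search-form": [],            # a form; results appear only after a query is submitted
    "/results?q=plane+crash": [],  # database-backed page; no static link points to it
}


def crawl(start, links):
    """Follow links breadth-first and return pages in the order they are discovered."""
    seen = [start]
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in seen:
                seen.append(target)
                queue.append(target)
    return seen


indexed = crawl("/home", LINKS)
hidden = [page for page in LINKS if page not in indexed]
print(indexed)  # ['/home', '/about', '/search-form']
print(hidden)   # ['/results?q=plane+crash']
```

The spider reaches the search form but has no way to know which queries to submit, so the results page stays "invisible".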

6 common misconceptions when doing advanced Google Searching
As librarians we are often called upon to teach not just library databases but also Google and Google Scholar. Unlike teaching other search tools, teaching Google is often tricky: with library databases we can get insider access through our friendly product support representative, but as librarians we have no more and no less insight than anyone else into Google, which is legendary for being secretive. Still, given that Google has become synonymous with search, we should be decently good at teaching it. I've noticed, though, that when people teach Google, particularly advanced searching of Google, they often fall prey to 2 main types of errors. The first type of error involves not keeping up to date: given the rapid speed at which Google changes, we often end up teaching things that no longer work. The second type of error is perhaps more common to us librarians. Also, the typical Google search brings back an estimated count of results. e.g. The 6 are 1. About tilde (~) About plus operator (+) 2. 3. 4. 5. 6. 7.