
Deep Web Research 2012

Bots, Blogs and News Aggregators is a keynote presentation that I have been delivering over the last several years, and much of my information comes from the extensive research that I have completed into the "invisible" or what I like to call the "deep" web. The Deep Web covers somewhere in the vicinity of 1 trillion plus pages of information located throughout the World Wide Web in various files and formats that the current search engines on the Internet either cannot find or have difficulty accessing. At the time of this writing, the current search engines find hundreds of billions of pages. In the last several years, some of the more comprehensive search engines have written algorithms to search the deeper portions of the World Wide Web by attempting to find files such as .pdf, .doc, .xls, .ppt, .ps and others. This Deep Web Research 2012 report and guide is divided into several sections, beginning with Bot Research.
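The file-format searching described above can be tried by hand with a Google-style `filetype:` operator. The sketch below is a minimal illustration, assuming the engine accepts `filetype:EXT` in the query and takes the query in a `q` URL parameter; the function names `filetype_queries` and `search_url` and the example topic are hypothetical, and engines differ in which operators they honor.

```python
from urllib.parse import urlencode

# File formats named in the paragraph above; illustrative, not exhaustive.
DEEP_FORMATS = ("pdf", "doc", "xls", "ppt", "ps")


def filetype_queries(topic: str, formats=DEEP_FORMATS) -> list[str]:
    """Build one query per file format using a Google-style filetype: operator.

    Assumption: the target engine understands `filetype:EXT`.
    """
    return [f"{topic} filetype:{ext}" for ext in formats]


def search_url(query: str, base: str = "https://www.google.com/search") -> str:
    """Encode a query into a search URL (assumes the engine reads a `q` parameter)."""
    return f"{base}?{urlencode({'q': query})}"


if __name__ == "__main__":
    # Hypothetical topic; prints one search URL per file format.
    for q in filetype_queries("deep web research"):
        print(search_url(q))
```

Running it prints one URL per format, the first being `https://www.google.com/search?q=deep+web+research+filetype%3Apdf`. Issuing one query per format, rather than combining them, keeps each format's results separate for review.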

Recommended Gateway Sites for the Deep Web and Specialized and Limited-Area Search Engines
This portion of the Internet consists of information that requires interaction to display, such as dynamically created pages, real-time information and databases.
General Gateways | Humanities | Social Sciences | Science and Technology | Health Sciences | Business and Government | Reference, Popular Culture | Other
General Gateways:
Invisible Web Directory (highly recommended): An excellent gateway to some of the best research-oriented invisible web resources available.
Resource Discovery Network: A well-annotated listing of Deep Web resources.
ALTIS - Hospitality, Leisure, Sport and Tourism
Artifact - Arts and Creative Industries
BIOME - Health and Life Sciences
EEVL - Engineering, Mathematics and Computing
GEsource - Geography and Environment
Humbul - Humanities
PSIgate - Physical Sciences
SOSIG - Social Sciences, Business and Law

Flipper

Google Chrome Tips

The Deep Web and Open Source Intelligence (OSINT): Two Peas in a Pod
At BrightPlanet we often throw around the acronym OSINT and talk about open source intelligence, but what is it, what, if anything, does it have to do with the Deep Web, and how is it being used? We answer those questions in this post.
What is OSINT?
For the purposes of this post, we’ll keep the definition of OSINT at a high level. If you want to dig deeper into OSINT, check out a past post of ours focused on OSINT. The term OSINT was first used, and still is used, by government agencies to refer to any unclassified, publicly available information.
The Deep Web and OSINT
If you are looking to get your hands on open source intelligence, look no further than the Deep Web. If you want to go beyond the Surface Web information found by Google search, you need to use a Deep Web harvest to get more open source intelligence. We often use the example of grants.gov.
Using OSINT
Organizations often have a handle on the data inside their organization, but what about outside of it?

How to use Google for Hacking | Arrow Webzine
Google serves almost 80 percent of all search queries on the Internet, making it the most popular search engine. However, Google makes it possible to reach not only publicly available information resources but also some of the most confidential information, which should never have been revealed. In this post I will show how to use Google for exploiting security vulnerabilities within websites. The following are some of the hacks that can be accomplished using Google.
1. There are many security cameras used for monitoring places like parking lots, college campuses and road traffic, which can be hacked using Google so that you can view the images captured by those cameras in real time. inurl:”viewerframe? Click on any of the search results (the top 5 are recommended) and you will gain access to the live camera, which has full controls and works in real time. intitle:”Live View / – AXIS”
2. filetype:xls inurl:”email.xls”

11 Unknown Ways Of Using Google Search - Curious Mob
Wondering what more there is to know about Google search? I mean, it’s Google search after all: type whatever you want to search, press enter, and everything in the world related to your topic is displayed in front of your eyes. But believe it or not, the search engine has plenty of tricks up its sleeve. Here’s an overview of 11 Google Tricks That Will Change the Way You Search:
11. One well-known, simple trick: searching for a phrase in quotes yields only pages with the same words in the same order as in the quotes.

25 Sneaky Online Tools and Gadgets to Help You Spy on Your Competitors
Even before you entered the world of “business”, you were watching your competition. Whether it was in a classroom or on a sports team, you not only wanted to keep up, you wanted to know where the marker was set so you could go one step further. It was about finding new opportunities and setting new goals based on someone you aspired to beat. Now that search is so important and detailed, and the Internet has grown so extensively, you have tons of different factors to consider when spying on your competition. This is where marketing tools come into play. In many cases, tools that help you monitor your own web performance can also help you gather data on your competition.
1. This is a very simple and easy-to-use tool that will send reports right to your inbox. Best ways to use this tool: get competitors’ backlinks; monitor social (or other website) mentions of your company; monitor keyword mentions. Price: Free
3. This is a tool that’s all about Twitter.
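The quoted-phrase trick from the Curious Mob list amounts to a contiguous-sequence match, as opposed to an unquoted search that only needs each word to appear somewhere on the page. The sketch below models that difference under simplifying assumptions: the lowercase word tokenizer and the function names `matches_phrase` and `matches_all_words` are hypothetical, and real engines also apply stemming, synonyms and ranking that this ignores.

```python
import re


def tokenize(text: str) -> list[str]:
    """Lowercase word tokens; a simplifying assumption, not any engine's real tokenizer."""
    return re.findall(r"[a-z0-9']+", text.lower())


def matches_phrase(document: str, phrase: str) -> bool:
    """Quoted-style match: the phrase's words appear consecutively and in order."""
    doc, target = tokenize(document), tokenize(phrase)
    n = len(target)
    if n == 0:
        return False
    # Slide a window of the phrase's length across the document's tokens.
    return any(doc[i:i + n] == target for i in range(len(doc) - n + 1))


def matches_all_words(document: str, phrase: str) -> bool:
    """Unquoted-style match: every word is present, in any order or position."""
    return set(tokenize(phrase)) <= set(tokenize(document))


if __name__ == "__main__":
    doc = "The deep web holds databases that search engines miss."
    print(matches_phrase(doc, "search engines"))     # True
    print(matches_phrase(doc, "engines search"))     # False
    print(matches_all_words(doc, "engines search"))  # True
```

The last two lines show why quoting narrows results: the same two words match as a loose word set but fail as an ordered phrase.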

Invisible Web Gets Deeper
By Danny Sullivan
From The Search Engine Report, Aug. 2, 2000
I've written before about the "invisible web," information that search engines cannot or refuse to index because it is locked up within databases. Now a new survey has attempted to measure how much information exists outside the search engines' reach. The company behind the survey is also offering a solution for those who want to tap into this "hidden" material. The study, conducted by search company BrightPlanet, estimates that the inaccessible part of the web is about 500 times larger than what search engines already provide access to. That sounds terrible, but as I've commented numerous times before, the size of a search engine does not necessarily equate to its relevancy or usefulness. For example, assume you wanted to do a trademark search against databases in various parts of the world. To date, meta search tools like this have been few and far between. Don't expect a web-based version of LexiBot to be coming.

Verification Handbook for Investigative Reporting
Craig Silverman is the founder of Emergent, a real-time rumor tracker and debunker. He was a fellow with the Tow Center for Digital Journalism at Columbia University, and is a leading expert on media errors, accuracy and verification. Craig is also the founder and editor of Regret the Error, a blog about media accuracy and the discipline of verification that is now a part of the Poynter Institute. He edited the Verification Handbook, previously served as director of content for Spundge, and helped launch OpenFile, an online local news startup that delivered community-driven reporting in six Canadian cities. Craig is also the former managing editor of PBS MediaShift and has been a columnist for The Globe and Mail, Toronto Star, and Columbia Journalism Review. He tweets at @craigsilverman. Rina Tsubaki leads and manages the "Verification Handbook" and "Emergency Journalism" initiatives at the European Journalism Centre in the Netherlands.

Research Beyond Google: 119 Authoritative, Invisible, and Compre
Got a research paper or thesis to write for school or an online class? Want to research using the Internet? Good luck. There’s a lot of junk out there: outdated pages, broken links, and inaccurate information. Using Google or Wikipedia may lead you to some results, but you can’t always be sure of their accuracy. What’s more, you’ll only be searching a fraction of all of the resources available to you. Google, the largest search database on the planet, currently has around 50 billion web pages indexed. Do you think your local or university librarian uses Google?
Topics Covered in this Article
Deep Web Search Engines | Art | Books Online | Business | Consumer | Economic and Job Data | Finance and Investing | General Research | Government Data | International | Law and Politics | Library of Congress | Medical and Health | STEM | Transportation
Deep Web Search Engines
To get started, try using a search engine that specializes in scouring the invisible web for results.

Invisible Web: What it is, Why it exists, How to find it, and Its inherent ambiguity
What is the "Invisible Web", a.k.a. the "Deep Web"?
The "visible web" is what you can find using general web search engines. It's also what you see in almost all subject directories. The "invisible web" is what you cannot find using these types of tools. The first version of this web page was written in 2000, when this topic was new and baffling to many web searchers. Since then, search engines' crawlers and indexing programs have overcome many of the technical barriers that made it impossible for them to find "invisible" web pages. These types of pages used to be invisible but can now be found in most search engine results: Pages in non-HTML formats (pdf, Word, Excel, PowerPoint), now converted into HTML.
Why isn't everything visible?
There are still some hurdles search engine crawlers cannot leap. The Contents of Searchable Databases.
How to Find the Invisible Web
Simply think "databases" and keep your eyes open. Examples: plane crash database, languages database, toxic chemicals database.

How to search like a spy: Google's secret hacks revealed
The National Security Agency just declassified a hefty 643-page research manual called Untangling the Web: A Guide to Internet Research (PDF) that, at least at first, doesn't appear all that interesting. That is, except for one section on page 73: "Google Hacking." "Say you're a cyberspy for the NSA and you want sensitive inside information on companies in South Africa," explains Kim Zetter at Wired. "What do you do?" Well, you could type the following advanced search into Google — "filetype:xls site:za confidential" — to uncover a trove of seemingly private spreadsheets. These are just two examples of the numerous private files that are inadvertently uploaded to the Internet, and can be accessed if you know the right Google search terms. Here are a few more: Pretty neat, huh? And even if keyboard espionage isn't really your thing, the document contains a number of practical tips anyone can use to become a better Googler:
* Repeating a word will help you find more relevant hits.

Deep Web Search Engines
Where to start a deep web search is easy. You hit Google.com, and when you brick wall it, you go to scholar.google.com, which is the academic database of Google. After you brick wall there, your true deep web search begins. To all the 35F and 35G’s out there at Fort Huachuca and elsewhere, you will find some useful links here to hone in on your AO. If you find a bad link, leave it in a comment below.
Last updated July 12, 2016 – updated reverse image lookup.
Multi Search Engines
Deeperweb.com – (broken as of Sept 2016, hopefully not dead) This is my favorite search engine.
Surfwax – They have a 2011 interface for RSS and a 2009 interface I think is better.
www.findsmarter.com – You can filter the search by domain extension or by topic, which is quite neat.
Cluster Analysis Engine
TouchGraph – A brilliant clustering tool that shows you relationships in your search results using a damn spiffy visualization.
Yippy.com – A useful, non-graphical clustering of results.
Speciality Deep Web Engines
General
