
Deep Web Research 2012

Bots, Blogs and News Aggregators is a keynote presentation that I have been delivering over the last several years, and much of my information comes from the extensive research that I have completed over the years into the "invisible" or what I like to call the "deep" web. The Deep Web covers somewhere in the vicinity of 1 trillion plus pages of information located through the world wide web in various files and formats that the current search engines on the Internet either cannot find or have difficulty accessing. The current search engines find hundreds of billions of pages at the time of this writing. In the last several years, some of the more comprehensive search engines have written algorithms to search the deeper portions of the world wide web by attempting to find files such as .pdf, .doc, .xls, .ppt, .ps and others. This Deep Web Research 2012 report and guide is divided into the following sections: Bot Research

Google Chrome Tips 11 Unknown Ways Of Using Google Search - Curious Mob Thinking what more is there to know about Google search? I mean, it’s Google search after all: type whatever you want to search, press enter, and everything in the world related to your topic is displayed in front of your eyes. But believe it or not, the search engine has plenty of tricks up its sleeve. Here’s an overview of 11 Google Tricks That Will Change the Way You Search: 11. One well-known, simple trick is to put a phrase in quotes, which yields only pages with the same words in the same order as what’s in the quotes.

Verification Handbook for Investigative Reporting Craig Silverman is the founder of Emergent, a real-time rumor tracker and debunker. He was a fellow with the Tow Center for Digital Journalism at Columbia University, and is a leading expert on media errors, accuracy and verification. Craig is also the founder and editor of Regret the Error, a blog about media accuracy and the discipline of verification that is now a part of the Poynter Institute. He edited the Verification Handbook, previously served as director of content for Spundge, and helped launch OpenFile, an online local news startup that delivered community-driven reporting in six Canadian cities. Craig is also the former managing editor of PBS MediaShift and has been a columnist for The Globe And Mail, Toronto Star, and Columbia Journalism Review. He tweets at @craigsilverman. Rina Tsubaki leads and manages the "Verification Handbook" and "Emergency Journalism" initiatives at the European Journalism Centre in the Netherlands.

How to search like a spy: Google's secret hacks revealed The National Security Agency just declassified a hefty 643-page research manual called Untangling the Web: A Guide to Internet Research (PDF) that, at least at first, doesn't appear all that interesting. That is, except for one section on page 73: "Google Hacking." "Say you're a cyberspy for the NSA and you want sensitive inside information on companies in South Africa," explains Kim Zetter at Wired. "What do you do?" Well, you could type the following advanced search into Google — "filetype:xls site:za confidential" — to uncover a trove of seemingly private spreadsheets. These are just two examples of the numerous private files that are inadvertently uploaded to the Internet, and can be accessed if you know the right Google search terms. Here are a few more: Pretty neat, huh? And even if keyboard espionage isn't really your thing, the document contains a number of practical tips anyone can use to become a better Googler: * Repeating a word will help you find more relevant hits.
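
The advanced search quoted above is a plain string of operators followed by keywords, so it can be assembled programmatically. The following is a minimal sketch, not anything taken from the NSA manual: the helper names `build_query` and `search_url` are hypothetical, and the `https://www.google.com/search?q=...` URL form is assumed to be the ordinary public search address.

```python
from urllib.parse import urlencode


def build_query(terms, filetype=None, site=None, phrase=None):
    """Join search operators and keywords into one query string.

    Hypothetical helper: operators come first, then an optional
    exact phrase in quotes, then the plain keywords.
    """
    parts = []
    if filetype:
        parts.append(f"filetype:{filetype}")
    if site:
        parts.append(f"site:{site}")
    if phrase:
        parts.append(f'"{phrase}"')
    parts.extend(terms)
    return " ".join(parts)


def search_url(query):
    """URL-encode the query for the (assumed) public search address."""
    return "https://www.google.com/search?" + urlencode({"q": query})


# Rebuild the example from the article: spreadsheets on .za sites
# that contain the word "confidential".
query = build_query(["confidential"], filetype="xls", site="za")
print(query)
print(search_url(query))
```

Keeping the operators as named arguments makes it easy to swap in another file type or domain without retyping the whole query.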

Pathways | Finding | Effective searching | Being Digital | Open University Library Services When you select a pathway, you will see a number of activities on a particular theme. Pathways allow you to develop a deeper understanding of a topic. You can work through the activities in your chosen pathway in any order. Activities will open in a new tab or window. The icon next to each activity helps you to identify the format used (e.g. activity, video, audio, or external resource). Viewing all pathways This is a list of all the pathways available. Assess your skills Assess your familiarity and confidence with online tools and environments and find out which activities can help you develop your skills further. Start pathway Avoiding plagiarism Learn to recognise what plagiarism is, the forms it can take and how to avoid it by developing your skills. Start pathway Communicating online How can you ensure your interactions with others online are appropriate and effective? Start pathway Effective searching Start pathway Exploring your information landscape Start pathway Keeping up-to-date Using

6 common misconceptions when doing advanced Google Searching As librarians we are often called upon to teach not just library databases but also Google and Google Scholar. Unlike teaching other search tools, teaching Google is often tricky: with library databases we can get insider access through our friendly product support representative, but as librarians we have no more and no less insight into Google, which is legendary for being secretive. Still, given that Google has become synonymous with search, we should be decently good at teaching it. I've noticed, though, that when people teach Google, particularly advanced searching of Google, they often fall prey to 2 main types of errors. The first type of error involves not keeping up to date; given the rapid speed at which Google changes, we often end up teaching things that no longer work. The second type of error is perhaps more common to us librarians. Also, the typical Google search brings back an estimated count of results. The 6 are: 1. About tilde (~) About plus operator (+)

How To Extract Google Results Into a Spreadsheet Irina Shamaeva recently posted a link to this page which talks about converting your Google search results to an RSS feed. I decided to explore taking this a step further: if I can convert it to RSS, can I then import these results into a spreadsheet? With assistance from Aaron Lintz and David Galley (to bounce ideas off of), I took a look at Excel and Google Docs to do this. First Things First – You need to use a Google Custom Search Engine (Google CSE) for this – either create your own or use one created by someone else (like my basic one). You need to complete the steps in this article to set up a Google CSE API key. STEP 1 – Find the CSE id of your search engine Find the unique identifier of your Google CSE of choice. STEP 2 – Plug your CSE id, Google CSE API key, and search term into the Google CSE API URL The basic URL is My example URL is (excluding the API key as this is purely for my use) Summary
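
The two steps above amount to filling three values into a request URL and flattening the response into rows. The URLs from the original article are missing from this excerpt, so the sketch below assumes Google's Custom Search JSON API endpoint (`https://www.googleapis.com/customsearch/v1` with the `key`, `cx`, `q` and `start` parameters), which may not be the exact URL the author used. The function names and the sample payload are hypothetical; only the `items`, `title`, `link` and `snippet` field names follow that API's response format.

```python
import csv
import io
from urllib.parse import urlencode

# Assumed endpoint: Google's Custom Search JSON API.
CSE_ENDPOINT = "https://www.googleapis.com/customsearch/v1"


def cse_url(api_key, cse_id, term, start=1):
    """Plug the API key, CSE id and search term into the request URL."""
    params = {"key": api_key, "cx": cse_id, "q": term, "start": start}
    return CSE_ENDPOINT + "?" + urlencode(params)


def rows_from_response(payload):
    """Turn a decoded JSON response into (title, link, snippet) rows."""
    return [
        (item.get("title", ""), item.get("link", ""), item.get("snippet", ""))
        for item in payload.get("items", [])
    ]


def to_csv(rows):
    """Render the rows as CSV text that a spreadsheet can import."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["title", "link", "snippet"])
    writer.writerows(rows)
    return buffer.getvalue()


# Placeholder credentials; substitute your own key and CSE id.
print(cse_url("YOUR_API_KEY", "YOUR_CSE_ID", "python developer"))

# Made-up payload in the shape the API returns, for illustration only.
sample = {"items": [{"title": "Example", "link": "https://example.com", "snippet": "A result"}]}
print(to_csv(rows_from_response(sample)))
```

Fetching the URL and decoding the JSON is left out on purpose, since it needs a real API key; any HTTP client can supply the `payload` dictionary.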

Skills for Online Searching - ipl2 A+ Research & Writing Learn how search syntax works Search syntax is a set of rules describing how users can query the database being searched. Sophisticated syntax makes for a better search, one where the items retrieved are mostly relevant to the searcher's need and important items are not missed. Boolean logic Boolean logic allows the use of AND, OR and NOT to search for items containing both terms, either term, or a term only if not accompanied by another term. Wildcards and truncation This involves substituting symbols for certain letters of a word so that the search engine will retrieve items with any letter in that spot in the word. Phrase searching Many concepts are represented by a phrase rather than a single word. Proximity This allows the user to find documents only if the search terms appear near each other, within so many words or paragraphs, or adjacent to each other. Capitalization Field searching All database records are divided up into fields. Make sure you know what content you're searching
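
Boolean logic and truncation are easy to see on a toy collection. The sketch below is an illustration only, not how any particular database works: the three sample documents, the inverted-index layout, and the use of `*` as the truncation symbol are all assumptions made for the example. AND, OR and NOT map directly onto set intersection, union and difference.

```python
import re
from fnmatch import fnmatch

# Tiny made-up collection: document id -> text.
DOCS = {
    1: "Deep web search engines index hidden pages",
    2: "Boolean logic improves library searching",
    3: "Search engines rank web pages",
}


def tokenize(text):
    """Lowercase the text and split it into words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Map each word to the set of document ids that contain it."""
    index = {}
    for doc_id, text in docs.items():
        for word in tokenize(text):
            index.setdefault(word, set()).add(doc_id)
    return index


def lookup(index, pattern):
    """Ids of documents with a word matching the pattern ('*' truncates)."""
    hits = set()
    for word, ids in index.items():
        if fnmatch(word, pattern.lower()):
            hits |= ids
    return hits


index = build_index(DOCS)
both = lookup(index, "search") & lookup(index, "web")        # search AND web
either = lookup(index, "boolean") | lookup(index, "hidden")  # boolean OR hidden
without = lookup(index, "search") - lookup(index, "deep")    # search NOT deep
truncated = lookup(index, "search*")                         # search, searching
print(both, either, without, truncated)
```

Note that the truncated query also picks up document 2, which contains "searching" but not "search"; this is the broader recall that truncation is meant to provide.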

Live Training – Search Education – Google With these webinars, you can improve your own search skills and learn how to bring search literacy to your school. Browse the archive of past trainings, and make sure to follow us on Google+ to stay up to speed on the latest tips and trainings from Google. Even better search results: Getting to know Google search for education Google makes it simple to find the information you need, but there are strategies for finding higher quality sources even more easily. Learn the basics of predictive search, a method for drawing on what you know about what you need to find it faster, including successful word choice and using the filters on the left-hand side of the screen to uncover information you never dreamed was possible. Power searching: Advanced Google search for education When you realize that the information you want will be a presentation or PDF, what can you do? Beyond the First Five Links Looking for new ways to motivate students to look beyond the first five links in a search engine?

How to Search Twitter Like a Superstar [The Free Guide] Every second, on average, around 6,000 tweets are sent on Twitter, which translates to over 500 million tweets per day! Did you know you could search every single one of them? (Plus the multi-million profiles attached to them!). Twitter has an amazing, yet somewhat little-known Twitter Advanced Search tool to help you find exactly what you’re looking for. Looking to find your next customers? Want to measure the happiness of your current customers? Twitter Advanced Search is a goldmine for marketers and small business owners. Let’s get into it. First things first … how to find Twitter’s Advanced Search! There are a few different ways you can search on Twitter: You can use Twitter’s website toolbar search field. The web search page. The mobile app search (on Twitter’s iOS or Android apps). All of these options are great if you’re looking to quickly dive into a certain topic or hashtag. Navigating Twitter Advanced Search At first glance, the Advanced Search page may appear a little overwhelming.

Every Google Search Operator You’ll Ever Need - EMA Boston I consider myself a bit of a Boolean geek. I fell in love with Boolean search in college, but it wasn’t until I got hold of Lexis-Nexis after college that I realized the power of search. When Google came around many moons later, I was disappointed that I couldn’t use the same Boolean operators that I could elsewhere; Google wants the experience to be as simple as possible, and, let’s face it, Boolean search strings can be pretty overwhelming. But then again, so can long Google search strings. As it turns out, Google uses many of the same search operators that other Boolean systems do; it just changes the terms around in places. So, to help us all out, I’ve compiled as thorough a comparison as possible of Google and “traditional” Boolean notation, as represented by the Cision search tool we use here at HB. About Todd Van Hoosear
