Apache Tika - Apache Tika

State of Adversarial Stylometry: can you change your prose style?

Today at the Chaos Communication Congress in Berlin (28C3), Sadia Afroz and Michael Brennan presented a talk called "Deceiving Authorship Detection," about research from Drexel University on "adversarial stylometry": identifying the authors of texts who don't want to be identified, and evading that identification. Stylometry has made great and well-publicized advances in recent years (it made the news with scandals like "Gay Girl in Damascus"), but typically it has been used against authors who have not taken active, computer-assisted countermeasures to disguise their distinctive "voice" in prose. As part of the presentation, the Drexel team released Anonymouth, a free/open tool that partially automates the process of evading authorship detection. The tool is still a rough alpha and requires human intervention to oversee the texts it produces, but it is still an exciting step for adversarial stylometry tools. The research comes from Drexel's Privacy, Security and Automation Lab.
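A common stylometric signal is how often an author uses everyday function words. The sketch below is a generic, minimal illustration, not the Drexel team's method or Anonymouth's: it assumes a tiny hypothetical ten-word feature list, builds a relative-frequency profile for each text, and picks the candidate author whose profile is closest by cosine similarity.

```python
from collections import Counter
import math
import re

# Hypothetical, deliberately tiny feature list; real systems use far more features.
FUNCTION_WORDS = ["the", "of", "and", "to", "a", "in", "that", "is", "it", "but"]


def profile(text):
    """Relative frequency of each function word in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    total = len(words) or 1  # avoid dividing by zero on empty text
    return [counts[w] / total for w in FUNCTION_WORDS]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def most_likely_author(unknown, samples):
    """Return the name whose sample text has the profile closest to `unknown`."""
    target = profile(unknown)
    return max(samples, key=lambda name: cosine(profile(samples[name]), target))
```

In this toy model, evasion means rewriting a text so that its profile drifts away from the author's known samples, which is the kind of change an evasion tool has to help a writer make.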

Adam Parrish · Getting data from the web

Python: hidden details

In the interest of brevity, we’ve skipped over some fairly important details of Python. Here’s our chance to play catch-up.

Other kinds of loops; loop control

The for loop is far and away the most common loop in Python, but there’s another kind of loop that you’ll encounter frequently: the while loop.

    >>> i = 0
    >>> while i < 10:
    ...     i += 1
    ...     print(i)
    ...
    1
    2
    3
    4
    5
    6
    7
    8
    9
    10

Python also has two loop control statements. The first is continue:

    >>> i = 0
    >>> while i < 10:
    ...     i += 1
    ...     if i % 2 == 1:
    ...         continue
    ...     print(i)
    ...
    2
    4
    6
    8
    10

The continue statement causes Python to skip back to the top of the loop; the remaining statements in the loop body aren't executed for that iteration. Finally, we have break, which causes Python to drop out of the loop altogether.

    >>> i = 0
    >>> while i < 10:
    ...     i += 1
    ...     if i > 5:
    ...         break
    ...     print(i)
    ...
    1
    2
    3
    4
    5

Here, as soon as i reaches a value greater than 5, the break statement is executed and Python stops running the loop.

Mastering Google Analytics Custom Variables

I’ve got a stack of posts that I want to write, and realized that they all deal with custom variables. So, to make sure that we’re all on the same page when it comes to custom vars, here’s my guide to mastering Google Analytics custom variables.

For those of you who have not used custom variables, CVs are a way for you to insert custom data into Google Analytics. There are 4 parts to a custom variable: a name, a value, an index, and a scope.

Name & Value: Custom variables are name-value pairs of data. Google Analytics shows you a list of all the custom variable names and then lets you drill down into any name to see all of its values. For example, clicking on a name such as “Year” gives a list of all the values recorded for it. Custom variables can also be used in custom reports and advanced segments.

Index or Slot: The index is a way to organize your custom variables. You can technically have more than 5 custom variables, but we need to discuss the next concept, scope, and how it impacts the index.
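To make the four parts concrete, here is a minimal Python sketch that models a custom variable as a record and validates it. It is not Google's API. The five-slot limit comes from the text; the scope codes (1 = visitor, 2 = session, 3 = page, with page as the default) are assumed to follow the classic ga.js `_setCustomVar` call, and the 128-byte cap on name plus value is also an assumption to check against the official documentation. `CustomVar` and `set_var` are hypothetical names.

```python
from dataclasses import dataclass

# Scope codes assumed to match classic ga.js _setCustomVar.
VISITOR, SESSION, PAGE = 1, 2, 3
MAX_SLOTS = 5             # standard slot count mentioned in the text
MAX_COMBINED_BYTES = 128  # assumed limit on len(name) + len(value), in bytes


@dataclass
class CustomVar:
    index: int      # slot number, 1..MAX_SLOTS
    name: str
    value: str
    scope: int = PAGE  # assumed default scope

    def validate(self):
        """Raise ValueError if the variable breaks a slot, scope, or length rule."""
        if not 1 <= self.index <= MAX_SLOTS:
            raise ValueError(f"index must be between 1 and {MAX_SLOTS}")
        if self.scope not in (VISITOR, SESSION, PAGE):
            raise ValueError("scope must be 1 (visitor), 2 (session) or 3 (page)")
        if len(self.name.encode()) + len(self.value.encode()) > MAX_COMBINED_BYTES:
            raise ValueError("name + value exceed the assumed byte limit")
        return self


def set_var(slots, var):
    """Store a validated variable in its slot; reusing an index replaces the old one."""
    slots[var.index] = var.validate()
    return slots
```

Because each index behaves like a slot in this model, two variables that share an index overwrite each other, which is why the index has to be planned together with scope.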

Build a Better Sub-$200 Linux PC

No one who expected the languid economy to have fully revived by now can be cheered by the way things have gone this summer; the volatile stock market alone has been a constant dispenser of heartache. So if you need a computer, even just a small one for basic, everyday tasks, you may have put it off because of the uncertainty currently surrounding, well, everything. But it’s possible to build a PC yourself for an obscenely low cash outlay, less than you'd spend on pretty much any complete system on the market. In fact, you can do it for as little as $200. And no, that’s not a typo. We first proved this last year, back when it looked like the economy’s most turbulent days were behind it. The answer to the first question was a no-brainer: absolutely. It was also obvious that our new desktop would be superior in terms of performance. As for whether we could spend a lot less this year than we could in 2010...

PathFinding.js

Click within the white grid and drag your mouse to draw obstacles. Drag the green node to set the start position and the red node to set the end position. Choose an algorithm from the right-hand panel, then click Start Search in the lower-right corner to start the animation. The available algorithms include Breadth-First-Search, Best-First-Search, Dijkstra, Jump Point Search, Orthogonal Jump Point Search, and Trace.
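Of the algorithms listed, breadth-first search is the simplest to sketch. This minimal Python version is not PathFinding.js's own API; it assumes a hypothetical grid representation (a list of lists where 1 marks an obstacle and 0 a free cell), moves in four directions, and returns the path as a list of (row, col) coordinates, or None if the end is unreachable.

```python
from collections import deque


def bfs_path(grid, start, end):
    """Shortest 4-connected path from start to end, or None if there is none."""
    rows, cols = len(grid), len(grid[0])
    parent = {start: None}  # also serves as the visited set
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == end:
            # Walk the parent links back to the start, then reverse.
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        r, c = node
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if (0 <= nr < rows and 0 <= nc < cols
                    and grid[nr][nc] == 0 and (nr, nc) not in parent):
                parent[(nr, nc)] = node
                queue.append((nr, nc))
    return None
```

On a grid where every move costs the same, BFS finds a path with the fewest steps. Dijkstra generalizes the same idea to weighted moves, and Jump Point Search prunes many of the nodes BFS would expand on such uniform-cost grids.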

Free HTML5 and JavaScript Charting Library – Flotr2

Flotr2 is a framework-independent, pure JavaScript charting library for drawing HTML5 graphs and charts. It supports many graph types, such as lines, bars, candles, pies, and bubbles. Flotr2 is supported on all major browsers, including IE6, and the library supports plugins. License: open source.

Creating and assigning roles

Last updated June 13, 2012. Created on May 13, 2012. Edited by kwseldman, Itangalo.

Hi, it’s Boss. I know you’ve had a lot to do with the website, working weekends and such, so I thought you should get some help. Our intern will help you for the two weeks she is here. When we talked to our consultants, they said it is possible to introduce new levels of permissions somehow; they particularly stressed that it would be a bad idea to give an intern access to settings on the site. Anyway, if you could create an editor permission level and make our intern an editor, it'd be great. //Boss
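The boss's request boils down to role-based permissions: users are given roles, and roles grant permissions. A minimal Python sketch of that model follows. It is not Drupal's API; the role names and permission strings are hypothetical, loosely modeled on Drupal-style permission names, and `user_can` is an illustrative helper.

```python
# Hypothetical role -> permission mapping (illustrative names only).
ROLE_PERMISSIONS = {
    "editor": {
        "access content",
        "create article content",
        "edit any article content",
    },
    "administrator": {
        "access content",
        "create article content",
        "edit any article content",
        "administer site configuration",
    },
}


def user_can(roles, permission):
    """A user holds a permission if any of their roles grants it."""
    return any(permission in ROLE_PERMISSIONS.get(role, set()) for role in roles)


intern_roles = ["editor"]
```

With only the hypothetical `editor` role, the intern can edit content but cannot reach `administer site configuration`, which is the separation the consultants recommended.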

How We Built A Data Center With Commodity Hardware And FOSS

Who are we and what do we do? We are a startup named QuickoLabs, based in Bangalore, India. Our product, SearchEnabler, is on-demand SEO software that crawls and analyzes users’ websites to provide recommendations, helping them improve their website ranking in search engine results. Our goal is to make SEO easy, affordable and measurable for start-ups and small businesses. To realize this goal, we wanted to keep operating costs to a minimum without compromising on product capability.

Our Data Needs: Today our infrastructure holds more than 8TB of data collected from the web and processes nearly 250 GB of data every day. Our infrastructure currently manages:
- 2 application servers
- 5 Cassandra nodes
- 4 task trackers
- 9 data nodes

Cost-Benefit Analysis Of Having Our Own Infrastructure: We have to do a lot of web crawling and data processing to provide metrics and analytics to our customers.
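To put the stated figures in perspective, here is a quick back-of-the-envelope calculation. It assumes decimal units (1 GB = 1,000 MB), a perfectly even load over 24 hours, and an even split of the daily work across the 9 data nodes; the post states none of these.

```python
SECONDS_PER_DAY = 24 * 60 * 60

daily_gb = 250   # "nearly 250 GB" processed per day (from the post)
data_nodes = 9   # data nodes listed in the post

# Average sustained rate in MB/s, assuming decimal units and an even 24-hour load.
avg_mb_per_s = daily_gb * 1000 / SECONDS_PER_DAY

# Daily share per data node, assuming the work is spread evenly (an assumption).
per_node_gb_per_day = daily_gb / data_nodes

print(f"{avg_mb_per_s:.2f} MB/s on average")
print(f"{per_node_gb_per_day:.1f} GB per data node per day")
```

Under those assumptions this prints 2.89 MB/s on average and 27.8 GB per data node per day.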