Research at Google

Google publishes hundreds of research papers each year. Publishing is important to us; it enables us to collaborate and share ideas with, as well as learn from, the broader scientific community. Submissions are often made stronger by the fact that ideas have been tested through real product implementation by the time of publication. We believe the formal structures of publishing today are changing; in computer science especially, there are multiple ways of disseminating information. We encourage publication both in conventional scientific venues and through other venues such as industry forums, standards bodies, and open source software and product feature releases. Open Source: We understand the value of a collaborative ecosystem and love open-source software. Product and Feature Launches: With every launch, we're publishing progress and pushing functionality. Industry Standards: Our researchers are often helping to define not just today's products but also tomorrow's.

Ontology (Computer Science) - definition in Encyclopedia of Database Systems. Ontology, by Tom Gruber, in the Encyclopedia of Database Systems, Ling Liu and M. Tamer Özsu (Eds.), Springer-Verlag, 2009. Synonyms: computational ontology, semantic data model, ontological engineering. Definition: In the context of computer and information sciences, an ontology defines a set of representational primitives with which to model a domain of knowledge or discourse. Historical Background: The term "ontology" comes from the field of philosophy that is concerned with the study of being or existence. The term was adopted by early Artificial Intelligence (AI) researchers, who recognized the applicability of the work from mathematical logic [6] and argued that AI researchers could create new ontologies as computational models that enable certain kinds of automated reasoning [5]. In the early 1990s, an effort to create interoperability standards identified a technology stack that called out the ontology layer as a standard component of knowledge systems [8].

IBM Redbooks

State of Mobile Security 2012. The number of smartphone and tablet owners in the world will skyrocket to one billion in the next few years, according to Forrester. As the mobile economy gains momentum, it continues to capture the attention of malware writers. Mobile security continues to be a global issue, with ‘Toll Fraud,’ a type of malware designed for profit, emerging as the lead threat. Over the past year, Lookout estimates that millions of people were affected by malware worldwide, with millions of dollars stolen from consumers. But numbers don’t tell the full story. Highlights: Mobile malware is a profitable business. Methodology: The State of Mobile Security 2012 findings are based on data collected and analyzed by Lookout’s Mobile Threat Network, which includes application data from a variety of global sources, including official application markets, alternative application sources, and mobile devices, to form the largest mobile application dataset in the world.

dblp: DBLP Computer Science Bibliography - Welcome

Dictionary of Algorithms and Data Structures. This web site is hosted by the Software and Systems Division, Information Technology Laboratory, NIST. Development of this dictionary started in 1998 under the editorship of Paul E. Black. After 20 years, DADS needs to move. If you are interested in taking over DADS, please contact Paul Black. This is a dictionary of algorithms, algorithmic techniques, data structures, archetypal problems, and related definitions. Don't use this site to cheat. Currently we do not include algorithms particular to business data processing, communications, operating systems or distributed algorithms, programming languages, AI, graphics, or numerical analysis: it is tough enough covering "general" algorithms and data structures. Some terms with a leading variable, such as n-way, m-dimensional, or p-branching, are under k-. We thank those who contributed definitions as well as many others who offered suggestions and corrections.

Publications | LinkedIn Data Team. Distributed data systems are used in a variety of settings such as online serving, offline analytics, data transport, and search, among other use cases. They let organizations scale out their workloads using cost-effective commodity hardware, while retaining key properties like fault tolerance and scalability. At LinkedIn we have built a number of such systems. A key pattern we observe is that even though they may serve different purposes, they tend to have a lot of common functionality and tend to use common building blocks in their architectures. One such building block that is just beginning to receive attention is cluster management, which addresses the complexity of handling a dynamic, large-scale system with many servers. All of this shared complexity, which we see in all of our systems, motivates us to build a cluster management framework, Helix, to solve these problems once in a general way.

Fast String Searching With Suffix Trees. "I think that I shall never see / A poem lovely as a tree. / Poems are made by fools like me, / But only God can make a tree." - Joyce Kilmer. "A tree's a tree. How many more do you need to look at?" The problem: Matching string sequences is a problem that computer programmers face on a regular basis. Imagine that you've just been hired as a programmer working on a DNA sequencing project. It is obvious at this point that a brute force string search is going to be terribly inefficient. The intuitive solution: Since the database that you are testing against is invariant, preprocessing it to simplify the search seems like a good idea. Figure 1: The Suffix Trie Representing "BANANAS". Figure 1 shows a suffix trie for the word BANANAS. The second point is what makes the suffix trie such a nice construct. Remarkable as this might seem, it means I could determine if the word BANANAS was in the collected works of William Shakespeare by performing just seven character comparisons.
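The suffix trie idea in the excerpt above can be sketched in a few lines of Python. This is a minimal illustration, not the article's own code: the nested-dict representation and the names build_suffix_trie and contains are assumptions made for this example. It inserts every suffix naively, which takes quadratic time and space, so it shows only the lookup property (one comparison per pattern character), not an efficient suffix tree construction.

```python
def build_suffix_trie(text):
    """Insert every suffix of `text` into a nested-dict trie (naive, O(n^2))."""
    root = {}
    for start in range(len(text)):
        node = root
        for ch in text[start:]:
            # Follow the existing edge for ch, or create it if missing.
            node = node.setdefault(ch, {})
    return root


def contains(trie, pattern):
    """Return True if `pattern` occurs anywhere in the indexed text.

    Walks one edge per pattern character, so the cost depends on the
    pattern length rather than on the length of the indexed text.
    """
    node = trie
    for ch in pattern:
        if ch not in node:
            return False
        node = node[ch]
    return True


trie = build_suffix_trie("BANANAS")
print(contains(trie, "NAN"))  # True
print(contains(trie, "NAB"))  # False
```

Because every substring of the text is a prefix of some suffix, walking the trie from the root answers the substring question directly; once the index is built, looking up BANANAS costs seven edge traversals regardless of how long the indexed text is.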

General Purpose Hash Function Algorithms - By Arash Partow. Description: Hash functions are, by definition and implementation, pseudo random number generators (PRNGs). From this generalization it's generally accepted that the performance of hash functions, and also comparisons between hash functions, can be assessed by treating hash functions as PRNGs. Analysis techniques such as the Poisson distribution can be used to analyze the collision rates of different hash functions for different groups of data. In general there is a theoretical hash function, known as the perfect hash function, for any group of data. The problem is that there are so many permutations of types of data, some highly random, others containing high degrees of patterning, that it's difficult to generalize a hash function for all data types or even for specific data types. Data Distribution: This is the measure of how well the hash function distributes the hash values of elements within a set of data. The hash functions in this essay are known as simple hash functions.
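The distribution check mentioned in the excerpt above can be sketched as follows. This is a hedged illustration rather than the essay's method: it uses the widely known DJB2-style string hash as a stand-in for "a simple hash function", and the helper names bucket_histogram and poisson_expected, the synthetic keys, and the bucket count are all assumptions chosen for this example. The idea is to count how many buckets receive 0, 1, 2, ... keys and compare those counts with what a Poisson distribution predicts for an ideally uniform hash.

```python
import math
from collections import Counter


def djb2(key):
    """DJB2-style string hash, truncated to 32 bits (illustrative stand-in)."""
    h = 5381
    for ch in key:
        h = ((h << 5) + h + ord(ch)) & 0xFFFFFFFF  # h * 33 + ord(ch)
    return h


def bucket_histogram(keys, num_buckets):
    """Map bucket load k -> number of buckets holding exactly k keys."""
    loads = Counter(djb2(k) % num_buckets for k in keys)
    hist = Counter(loads.values())
    hist[0] = num_buckets - len(loads)  # buckets that received no key
    return hist


def poisson_expected(num_keys, num_buckets, k):
    """Expected number of buckets with exactly k keys under a uniform hash."""
    lam = num_keys / num_buckets
    return num_buckets * math.exp(-lam) * lam ** k / math.factorial(k)


# Hypothetical test data: 1000 synthetic keys hashed into 1000 buckets.
keys = [f"key{i}" for i in range(1000)]
hist = bucket_histogram(keys, 1000)
for k in range(4):
    print(k, hist.get(k, 0), round(poisson_expected(1000, 1000, k), 1))
```

Each printed row shows a bucket load, the observed number of buckets with that load, and the Poisson expectation. Large gaps between the two columns suggest the hash distributes this particular key set poorly; a single data set says little about other kinds of data, which is the essay's point about patterned inputs.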

Information Retrieval, Intelligence, Integrated Optimization and Scientific Marketing