
Distributed computing
"Distributed Information Processing" redirects here. For the computer company, see DIP Research. Distributed computing is a field of computer science that studies distributed systems. A distributed system is a software system in which components located on networked computers communicate and coordinate their actions by passing messages.[1] The components interact with each other in order to achieve a common goal. Three significant characteristics of distributed systems are: concurrency of components, lack of a global clock, and independent failure of components.[1] Examples of distributed systems vary from SOA-based systems to massively multiplayer online games to peer-to-peer applications. A computer program that runs in a distributed system is called a distributed program, and distributed programming is the process of writing such programs.[2] There are many alternatives for the message passing mechanism, including RPC-like connectors and message queues. Introduction[edit] History[edit] Related:  Doc

Zooniverse (citizen science project) Zooniverse is a citizen science web portal owned and operated by the Citizen Science Alliance. The organization grew from the original Galaxy Zoo project and now hosts dozens of projects which allow volunteers to participate in scientific research. Unlike many early internet-based citizen science projects (such as SETI@home), which used spare computer processing power to analyse data (an approach known as volunteer computing), Zooniverse projects require the active participation of human volunteers to complete research tasks. Projects have been drawn from disciplines including astronomy, ecology, cell biology, humanities, and climate science.[3]

MapReduce Overview: MapReduce is a framework for processing parallelizable problems across huge datasets using a large number of computers (nodes), collectively referred to as a cluster (if all nodes are on the same local network and use similar hardware) or a grid (if the nodes are shared across geographically and administratively distributed systems, and use more heterogeneous hardware). Processing can occur on data stored either in a filesystem (unstructured) or in a database (structured). MapReduce can take advantage of locality of data, processing it on or near the storage assets in order to reduce the distance over which it must be transmitted. "Map" step: Each worker node applies the map() function to the local data and writes the output to temporary storage. A master node ensures that only one copy of redundant input data is processed. MapReduce allows for distributed processing of the map and reduction operations. Logical view: Map(k1,v1) → list(k2,v2)
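The logical view above can be illustrated with the classic word-count example, run on a single machine. This is a sketch of the programming model only, not any framework's API; `map_fn`, `reduce_fn`, and `mapreduce` are illustrative names, and the in-memory grouping stands in for the distributed shuffle step.

```python
# Single-machine sketch of the MapReduce logical view:
#   map(k1, v1) -> list(k2, v2);  reduce(k2, list(v2)) -> (k2, result)
from collections import defaultdict

def map_fn(doc_id, text):
    # emit an intermediate (word, 1) pair for every word in the document
    return [(word, 1) for word in text.split()]

def reduce_fn(word, counts):
    # combine all values emitted for one key
    return (word, sum(counts))

def mapreduce(docs):
    # shuffle step: group intermediate pairs by key before reducing
    groups = defaultdict(list)
    for doc_id, text in docs.items():
        for k2, v2 in map_fn(doc_id, text):
            groups[k2].append(v2)
    return dict(reduce_fn(k, vs) for k, vs in groups.items())

docs = {"d1": "to be or not to be", "d2": "to do"}
print(mapreduce(docs))  # {'to': 3, 'be': 2, 'or': 1, 'not': 1, 'do': 1}
```

In a real cluster, the map calls run in parallel on the nodes holding each document, and the shuffle moves each key's values to the node that reduces them.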

Server-side scripting Dynamic web page: example of server-side scripting (PHP and MySQL). Server-side scripting is a technique used in web development in which scripts running on a web server produce a response customized for each user's (client's) request to the website. The alternative is for the web server itself to deliver a static web page. Scripts can be written in any of a number of available server-side scripting languages. Server-side scripting is often used to provide a customized interface for the user. Programs that run on a user's local computer without ever sending or receiving data over a network are not considered clients, and so the operations of such programs are not considered client-side operations. History: Netscape introduced an implementation of JavaScript for server-side scripting with Netscape Enterprise Server, first released in December 1995 (soon after releasing JavaScript for browsers).[1][2]
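A minimal sketch of the idea, using Python's standard library rather than PHP: the script runs on the server and builds a different response for each request, instead of the server returning a static file. The handler class and the `name` query parameter are illustrative choices, not part of any referenced site.

```python
# Server-side scripting sketch: the response is computed per request,
# customized by the client's query string, rather than served from a
# static file.
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import urlparse, parse_qs

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        # read the client's query string and customize the page for it
        params = parse_qs(urlparse(self.path).query)
        name = params.get("name", ["guest"])[0]
        body = ("<html><body>Hello, %s!</body></html>" % name).encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.end_headers()
        self.wfile.write(body)

# To serve:  HTTPServer(("localhost", 8000), Handler).serve_forever()
```

Requesting `/?name=Ada` then yields a page greeting Ada, while `/` falls back to the default; the same URL can produce different pages for different clients, which is the defining property of a dynamic page.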

YaCy: a free-software peer-to-peer search engine to replace Google. This is my discovery of the day, which I owe to Twitter and more particularly to @glenux. I had never heard of YaCy before, even though it has existed since 2006. Reading YaCy's presentation, there is plenty to be excited about. Its technical characteristics are what appeal to me most: a YaCy instance can store more than 20 million documents. Peer-to-peer index sharing: YaCy implements an index-sharing system akin to a peer-to-peer (P2P) mechanism. YaCy is composed of four modules: a web crawler (the process that traverses the web pages to be indexed), an indexing engine, a database, and a user interface. The embedded database is specific to YaCy and uses an AVL-tree structure.

Google File System Google File System (GFS or GoogleFS) is a proprietary distributed file system developed by Google for its own use. It is designed to provide efficient, reliable access to data using large clusters of commodity hardware. A new version of the Google File System is codenamed Colossus.[2] Design: Google File System is designed for system-to-system interaction, not for user-to-system interaction. A GFS cluster consists of multiple nodes. The Master server does not usually store the actual chunks; rather, it stores all the metadata associated with the chunks, such as the tables mapping the 64-bit labels to chunk locations and to the files they make up, the locations of the copies of each chunk, which processes are reading or writing a particular chunk, and whether a chunk is being "snapshotted" for replication (usually at the instigation of the Master server when, due to node failures, the number of copies of a chunk has fallen beneath the set number).
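The master's role can be sketched as a small bookkeeping structure. This is an illustrative data layout only, not Google's implementation: the class names, the replica lists, and the replication target of 3 are all assumptions made for the example.

```python
# Sketch of the kind of metadata a GFS-style master tracks: 64-bit chunk
# handles mapped to replica locations, and files mapped to the chunks that
# make them up. Illustrative layout, not Google's implementation.
from dataclasses import dataclass, field

REPLICATION_TARGET = 3  # assumed target number of copies per chunk

@dataclass
class ChunkInfo:
    handle: int                                    # 64-bit chunk label
    replicas: list = field(default_factory=list)   # chunkserver addresses

@dataclass
class Master:
    chunks: dict = field(default_factory=dict)     # handle -> ChunkInfo
    files: dict = field(default_factory=dict)      # path -> [chunk handles]

    def under_replicated(self):
        # chunks whose copy count fell below target, e.g. after node failures;
        # these are the chunks the master would schedule for re-replication
        return [c.handle for c in self.chunks.values()
                if len(c.replicas) < REPLICATION_TARGET]

m = Master()
m.chunks[7] = ChunkInfo(7, replicas=["cs1", "cs2"])   # one copy lost
m.files["/logs/a"] = [7]
print(m.under_replicated())  # [7]
```

Because only this metadata lives on the master, clients contact it briefly to learn chunk locations, then read and write chunk data directly from the chunkservers.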

Bandwidth throttling Bandwidth throttling is the intentional slowing of Internet service by an Internet service provider. It is a reactive measure employed in communication networks in an apparent attempt to regulate network traffic and minimize bandwidth congestion. Bandwidth throttling can occur at different locations on the network. On a local area network (LAN), a sysadmin may employ bandwidth throttling to help limit network congestion and server crashes. Operation: Clients make requests to servers, which respond by sending the required data. To keep a server from being overwhelmed by requests, a server administrator may implement bandwidth throttling to control the number of requests the server responds to within a specified period of time. Application: Users whose bandwidth may be throttled are typically those who constantly download and upload torrents, or who watch a lot of online videos.
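Limiting the number of requests served per client within a time window, as described above, can be sketched as a simple sliding-window limiter. The class and parameter names are illustrative, and real servers typically apply this per connection or per IP at the network layer rather than in application code.

```python
# Sketch of server-side request throttling: allow at most `limit` requests
# per client within a sliding time window. Illustrative, single-process only.
import time
from collections import defaultdict

class RequestThrottle:
    def __init__(self, limit, window):
        self.limit = limit                 # max requests per window
        self.window = window               # window length in seconds
        self.history = defaultdict(list)   # client id -> request timestamps

    def allow(self, client, now=None):
        """Return True if this request may be served, False if throttled."""
        now = time.monotonic() if now is None else now
        # keep only timestamps that are still inside the window
        recent = [t for t in self.history[client] if now - t < self.window]
        self.history[client] = recent
        if len(recent) < self.limit:
            self.history[client].append(now)
            return True
        return False

throttle = RequestThrottle(limit=2, window=60.0)
print(throttle.allow("10.0.0.5", now=0.0))   # True
print(throttle.allow("10.0.0.5", now=1.0))   # True
print(throttle.allow("10.0.0.5", now=2.0))   # False: third request in window
```

Once the window slides past the earlier requests, the same client is allowed again, which gives bursty clients service without letting any one client monopolize the server.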

Virtual private server A virtual private server within a host. A virtual private server (VPS) is a virtual machine sold as a service by an Internet hosting service. A VPS runs its own copy of an operating system, and customers have superuser-level access to that operating system instance, so they can install almost any software that runs on that OS. For many purposes a VPS is functionally equivalent to a dedicated physical server and, being software-defined, can be created and configured much more easily. VPSs are priced much lower than an equivalent physical server, but as they share the underlying physical hardware with other VPSs, performance may be lower and may depend on the workload of other instances on the same hardware node. Virtualization: The force driving server virtualization is similar to that which led to the development of time-sharing and multiprogramming in the past. Hosting: With unmanaged or self-managed hosting, customers are left to administer their own server instances.

Folding@home The project has pioneered the use of graphics processing units (GPUs), PlayStation 3s, the Message Passing Interface (used for computing on multi-core processors), and some Sony Xperia smartphones for distributed computing and scientific research. The project uses a statistical simulation methodology that is a paradigm shift from traditional computing methods.[5] As part of the client–server model network architecture, the volunteered machines each receive pieces of a simulation (work units), complete them, and return them to the project's database servers, where the units are compiled into an overall simulation. Volunteers can track their contributions on the Folding@home website, which makes volunteers' participation competitive and encourages long-term involvement. Folding@home is one of the world's fastest computing systems, with a speed of approximately 100 petaFLOPS.

Web application A web application or web app is any software that runs in a web browser. It is created in a browser-supported programming language (such as the combination of JavaScript, HTML and CSS) and relies on a web browser to render the application.[1][2][3] History: In earlier computing models, e.g. client–server, the load for the application was shared between code on the server and code installed on each client locally. In contrast, web applications use web documents written in standard formats such as HTML and JavaScript, which are supported by a variety of web browsers. In 1995 Netscape introduced a client-side scripting language called JavaScript, allowing programmers to add dynamic elements to the user interface that ran on the client side. In 2005 the term Ajax was coined, and applications like Gmail started to make their client sides more and more interactive. Structure: Some view a web application as a two-tier architecture.

GPUGRID GPUGRID is a distributed computing project hosted by Pompeu Fabra University and running on the Berkeley Open Infrastructure for Network Computing (BOINC) software platform. It performs full-atom molecular biology simulations that are designed to run on Nvidia's CUDA-compatible graphics processing units. See also: List of distributed computing projects.

Welcome to Hive!