
Google File System
Google File System (GFS) is a proprietary distributed file system developed by Google for its own applications. It does not appear to be publicly available, and it is built on GPL code (ext3 and Linux).
Design
GFS was designed to meet the data-storage needs of Google's applications, in particular everything related to its web search activity. It is optimized for handling large files (up to several gigabytes) and for the operations most common in Google's applications: files are very rarely deleted or rewritten, and most accesses cover large regions of a file and consist mainly of reads or of appends at the end of the file (record append). GFS was therefore designed to speed up these operations.
Files
How To Set Up A Loadbalanced High-Availability Apache Cluster
This is a "copy & paste" HowTo! The easiest way to follow this tutorial is to use a command line client/SSH client (like PuTTY for Windows) and simply copy and paste the commands (except where you have to provide your own information like IP addresses, hostnames, passwords, ...). This helps to avoid typos.
Version 1.0
Author: Falko Timme <ft [at] falkotimme [dot] com>
Last edited 04/26/2006
This tutorial shows how to set up a two-node Apache web server cluster that provides high availability. The advantage of using a load balancer compared to using round robin DNS is that it takes care of the load on the web server nodes and tries to direct requests to the node with the least load, and it also takes care of connections/sessions. For this setup we need four nodes (two Apache nodes and two load balancer nodes) and five IP addresses: one for each node and one virtual IP address that will be shared by the load balancer nodes and used for incoming HTTP requests. I will use the following setup here:
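The concrete hostnames and addresses from the original tutorial are not reproduced in this excerpt. Purely as an illustration (every hostname and 192.168.0.x address below is a placeholder, not a value from the tutorial), a five-address layout could look like this:

    loadb1.example.com       192.168.0.103   (load balancer 1)
    loadb2.example.com       192.168.0.104   (load balancer 2)
    webserver1.example.com   192.168.0.105   (Apache node 1)
    webserver2.example.com   192.168.0.106   (Apache node 2)
    virtual IP               192.168.0.110   (shared by both load balancers; all incoming HTTP requests go here)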
Because Hadoop isn’t perfect: 8 ways to replace HDFS
Hadoop is on its way to becoming the de facto platform for the next generation of data-based applications, but it’s not without flaws. Ironically, one of Hadoop’s biggest shortcomings now is also one of its biggest strengths going forward: the Hadoop Distributed File System. Within the Apache Software Foundation, HDFS is always improving in terms of performance and availability. But if the growing number of options for replacing HDFS signifies anything, it’s that HDFS isn’t quite where it needs to be.
Cassandra (DataStax): not a file system at all but an open source, NoSQL key-value store, Cassandra has become a viable alternative to HDFS for web applications that rely on fast data access.
Ceph: an open source, multi-pronged storage system that was recently commercialized by a startup called Inktank.
Dispersed Storage Network (Cleversafe)
Isilon (EMC)
Lustre
MapR File System
NetApp Open Solution for Hadoop
Loadbalancing / failover with IPVS and keepalived
Introduction
Correct failover and load balancing are crucial for a high-availability environment. With a proper setup we can eliminate single points of failure in case of a server crash. I use the Linux kernel’s support for load balancing (IPVS), since it is a well-documented and scalable method. What I want to achieve here is a fully redundant architecture, so that if one server fails, the other takes over. For such an architecture you need at least 4 servers or, in case you want to NAT your internal servers, 2.
Schema
The basic topology with 4 servers: the router does simple NAT, but you can remove the router and use load balancers with public IPs if you have your own dedicated range of IPs.
Loadbalancers in general
We need a minimal installation of an Ubuntu/CentOS Linux server.
Loadbalancer1
We should configure keepalived on Loadbalancer1 accordingly (the original configuration listing is not included in this excerpt; a sketch follows below).
Loadbalancer2
For Loadbalancer2 we should set the configuration accordingly.
Testing setup
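The configuration listings themselves are not part of this excerpt. As a rough sketch only (the interface name, password, forwarding mode and every IP address below are assumptions, not values from the original post), a minimal /etc/keepalived/keepalived.conf for Loadbalancer1 might look like this:

    # Hypothetical keepalived.conf for Loadbalancer1.
    # Loadbalancer2 would use the same file with "state BACKUP" and a lower priority.

    vrrp_instance VI_1 {
        state MASTER                 # on Loadbalancer2: BACKUP
        interface eth0               # assumed interface name
        virtual_router_id 51
        priority 150                 # on Loadbalancer2: e.g. 100
        advert_int 1
        authentication {
            auth_type PASS
            auth_pass examplepw      # placeholder password, same on both load balancers
        }
        virtual_ipaddress {
            192.168.0.110            # placeholder virtual IP shared by the load balancers
        }
    }

    virtual_server 192.168.0.110 80 {
        delay_loop 10
        lb_algo wlc                  # weighted least-connection scheduling
        lb_kind NAT                  # assumed forwarding mode; DR is also common
        protocol TCP

        real_server 192.168.0.105 80 {   # placeholder web server 1
            weight 1
            TCP_CHECK {
                connect_timeout 3
            }
        }
        real_server 192.168.0.106 80 {   # placeholder web server 2
            weight 1
            TCP_CHECK {
                connect_timeout 3
            }
        }
    }

With a layout like this, the vrrp_instance block keeps the virtual IP on whichever load balancer is currently master, while the virtual_server block tells IPVS how to spread incoming connections across the real servers.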
HDFS Architecture Guide
Introduction
The Hadoop Distributed File System (HDFS) is a distributed file system designed to run on commodity hardware. It has many similarities with existing distributed file systems. However, the differences from other distributed file systems are significant.
Assumptions and Goals
Hardware Failure: Hardware failure is the norm rather than the exception.
Streaming Data Access: Applications that run on HDFS need streaming access to their data sets.
Large Data Sets: Applications that run on HDFS have large data sets.
Simple Coherency Model: HDFS applications need a write-once-read-many access model for files (a short command-line illustration follows this excerpt).
“Moving Computation is Cheaper than Moving Data”: A computation requested by an application is much more efficient if it is executed near the data it operates on.
Portability Across Heterogeneous Hardware and Software Platforms: HDFS has been designed to be easily portable from one platform to another.
NameNode and DataNodes
HDFS has a master/slave architecture.
The File System Namespace
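To illustrate the write-once-read-many model mentioned above, here is a minimal sketch using the standard hdfs dfs command-line shell. The paths and file names are made up, and it assumes a configured HDFS client is on the PATH; whether appends are allowed also depends on the HDFS version and configuration.

    # Hypothetical paths; assumes an installed, configured "hdfs" client.
    hdfs dfs -mkdir -p /user/alice/logs
    hdfs dfs -put access.log /user/alice/logs/               # write the file once
    hdfs dfs -cat /user/alice/logs/access.log | head -n 20   # stream it back for reading
    # Files are not edited in place; at most, new data may be appended at the end
    # (if the cluster's HDFS version and configuration permit appends):
    hdfs dfs -appendToFile today.log /user/alice/logs/access.log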
ClusterControl for MySQL Cluster and Galera: database management tools from Severalnines
Stop Managing. Start Automating. Automate backups, health checks, failover and recovery using ClusterControl. ClusterControl automates tasks at any stage of the database cluster lifecycle, including: online backup scheduling, configuration management, online software upgrades, DB node failover and recovery, and workload simulation.
Hot backup of clusters
Hot backups are important for high availability; they can run without blocking the application. You can now schedule backups with ease, view the backups that you have taken, and restore them.
Configuration Management
ClusterControl applies your configuration changes across the entire cluster, and will orchestrate a rolling restart if required. Eliminate configuration drift by discovering and importing local configuration changes.
Online Software Upgrades and Live Patching
Upload the binaries for the new version of the database software that you want to upgrade to.
Workload Simulation
Benchmark SQL queries and analyze performance.