Start page – collectd – The system statistics collection daemon

Enomaly: Elastic / Cloud Computing Platform: Community

etsy/statsd

RabbitMQ - Messaging that just works

Monitoring at Spotify: The Story So Far | Labs
This is the first in a two-part series about Monitoring at Spotify. In this part, I’ll be discussing our history, the challenges we faced, and how they were approached. Operational monitoring at Spotify started its life as a combination of two systems: Zabbix and a homegrown RRD-backed graphing system named “sitemon”, which used Munin for collection. In late 2013, we were starting to put more emphasis on self-service and distributed operational responsibility. We tried to bandage up what we could: our Chief Architect hacked together an in-memory sitemon replacement that could hold roughly one month's worth of metrics under the current load.
Alerting as a service
Alerting was the first problem we took a stab at. We considered developing Zabbix further. We found inspiration from attending Monitorama EU, where we stumbled upon Riemann. We built a library on top of Riemann called Lyceum.
Graphing
We went a few rounds here. The difficulties in sharding and rebalancing Graphite became prohibitive.

Cloud computing
Cloud computing metaphor: For a user, the network elements representing the provider-rendered services are invisible, as if obscured by a cloud.
Cloud computing is a computing term or metaphor that evolved in the late 1990s, based on utility and consumption of computer resources. Cloud computing involves application systems which are executed within the cloud and operated through internet-enabled devices. Pure cloud computing does not rely on the use of cloud storage, as it will be removed upon the user's download action. Clouds can be classified as public, private and hybrid.[1][2]
Overview
Cloud computing[3] relies on sharing of resources to achieve coherence and economies of scale, similar to a utility (like the electricity grid) over a network.[2] At the foundation of cloud computing is the broader concept of converged infrastructure and shared services. Cloud computing, or in simpler shorthand just "the cloud", also focuses on maximizing the effectiveness of the shared resources.

Monitoring at Spotify: Introducing Heroic | Labs
This is the second part in a series about Monitoring at Spotify. In the previous post I discussed our history of operational monitoring. In this part I’ll be presenting Heroic, our scalable time series database, which is now free software. Heroic is our in-house time series database. We are aware that Elasticsearch has a bad reputation for data safety, so we guard against total failures by having the ability to completely rebuild the index rapidly from our data pipeline or Cassandra. A key feature of Heroic is global federation. Every host in our infrastructure is running ffwd, which is an agent responsible for receiving and forwarding metrics. This setup allows us to rapidly experiment with our service topology. In the backend everything is stored exactly as it was provided to the agent. In using Heroic, we’ve been able to build custom dashboards and alerting systems that make use of the same interface. All parts of Heroic are now free software; feel free to grab the code on GitHub.

Cloud.com CEO Sheng Liang Discusses Open-Source Cloud Computing & Asia
Cloud.com CEO Sheng Liang was the lead developer on Sun Microsystems' original Java Virtual Machine (JVM) team. Today he is a co-founder and CEO of Cloud.com, based in Cupertino, CA. The company delivers an open-source platform for both Public and Private Clouds, and will be discussing all this at the upcoming Cloud Expo in New York, June 6-9. Here are a few things we discussed in a recent interview...
1. We believe datacenter operators will build Cloud infrastructure using best-of-breed open source software. Beyond the infrastructure control layer, Cloud.com additionally develops proprietary software products that enable our customers to run successful Cloud Computing businesses.
2. The Java Virtual Machine (JVM) experience showed me how a new computing paradigm can be developed and accepted by mainstream IT.
3. Cloud.com customers are datacenter operators.
4. Public Cloud operations like Amazon EC2 have a larger market share than Private Cloud today.

Linux Performance Analysis in 60,000 Milliseconds
You log in to a Linux server with a performance issue: what do you check in the first minute? At Netflix we have a massive EC2 Linux cloud, and numerous performance analysis tools to monitor and investigate its performance. These include Atlas for cloud-wide monitoring, and Vector for on-demand instance analysis. While those tools help us solve most issues, we sometimes need to log in to an instance and run some standard Linux performance tools. In this post, the Netflix Performance Engineering team will show you the first 60 seconds of an optimized performance investigation at the command line, using standard Linux tools you should have available. In 60 seconds you can get a high-level idea of system resource usage and running processes by running the following ten commands.
    uptime
    dmesg | tail
    vmstat 1
    mpstat -P ALL 1
    pidstat 1
    iostat -xz 1
    free -m
    sar -n DEV 1
    sar -n TCP,ETCP 1
    top
Some of these commands require the sysstat package to be installed.
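
The excerpt notes that some of the ten commands need an extra package, so it can help to know which executables are missing before starting the checklist. What follows is a minimal sketch, assuming Python 3 on the target host; it is not part of the Netflix post, and the names `CHECKLIST`, `required_binaries` and `missing_binaries` are hypothetical helpers introduced only for illustration.

```python
import shlex
import shutil

# The ten commands from the checklist, in the order the excerpt lists them.
CHECKLIST = [
    "uptime",
    "dmesg | tail",
    "vmstat 1",
    "mpstat -P ALL 1",
    "pidstat 1",
    "iostat -xz 1",
    "free -m",
    "sar -n DEV 1",
    "sar -n TCP,ETCP 1",
    "top",
]


def required_binaries(commands):
    """Return the distinct executables the commands invoke, in first-seen order."""
    seen = []
    for command in commands:
        # A pipeline such as "dmesg | tail" invokes one executable per stage.
        for stage in command.split("|"):
            binary = shlex.split(stage)[0]
            if binary not in seen:
                seen.append(binary)
    return seen


def missing_binaries(commands):
    """Return the required executables that cannot be found on PATH."""
    return [b for b in required_binaries(commands) if shutil.which(b) is None]


if __name__ == "__main__":
    missing = missing_binaries(CHECKLIST)
    if missing:
        # On many distributions mpstat, pidstat, iostat and sar ship in sysstat.
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All checklist tools are available.")
```

The sketch only checks availability and does not run the commands, since several of them (for example `vmstat 1` and `top`) keep printing until interrupted.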

OpenStack Open Source Cloud Computing Software

VMware Infrastructure (vSphere) Java API
