
DSM (Distributed Shared Memory)



Non-Uniform Memory Access

Non-uniform memory access (NUMA) is a computer memory design used in multiprocessing, where the memory access time depends on the memory location relative to the processor. Under NUMA, a processor can access its own local memory faster than non-local memory (memory local to another processor or memory shared between processors). The benefits of NUMA are limited to particular workloads, notably on servers where the data are often associated strongly with certain tasks or users.[1] NUMA architectures logically follow in scaling from symmetric multiprocessing (SMP) architectures. They were developed commercially during the 1990s by Burroughs (later Unisys), Convex Computer (later Hewlett-Packard), Honeywell Information Systems Italy (HISI) (later Groupe Bull), Silicon Graphics (later Silicon Graphics International), Sequent Computer Systems (later IBM), Data General (later EMC), and Digital (later Compaq, now HP).
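To make the local-versus-remote distinction concrete, the sketch below uses the Linux libnuma library (libnuma is not mentioned in the text above; it is used here only as a convenient illustration) to place an allocation on a specific NUMA node. CPUs on that node reach the buffer over the fast local path, while CPUs on other nodes would go through the interconnect.

/* Sketch: node-local allocation on a Linux NUMA system using libnuma.
   Assumes libnuma is installed; compile with: cc numa_demo.c -lnuma */
#include <numa.h>
#include <stdio.h>
#include <string.h>

int main(void) {
    if (numa_available() < 0) {
        fprintf(stderr, "NUMA is not available on this system\n");
        return 1;
    }

    size_t size = 1 << 20;                      /* 1 MiB */

    /* Allocate memory backed by pages on node 0, local to node 0's CPUs. */
    void *buf = numa_alloc_onnode(size, 0);
    if (buf == NULL) {
        fprintf(stderr, "allocation failed\n");
        return 1;
    }
    memset(buf, 0, size);                       /* touch the pages so they are placed */

    printf("highest NUMA node: %d; allocated %zu bytes on node 0\n",
           numa_max_node(), size);

    /* A thread running on node 0 accesses buf locally; a thread on another
       node reaches it through the interconnect (the slower, remote path). */
    numa_free(buf, size);
    return 0;
}

Tools such as numactl can achieve the same placement without code changes, for example: numactl --cpunodebind=0 --membind=0 ./app.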

Distributed memory

In computer science, distributed memory refers to a multiple-processor computer system in which each processor has its own private memory. Computational tasks can only operate on local data, and if remote data is required, the computational task must communicate with one or more remote processors. In contrast, a shared memory multiprocessor offers a single memory space used by all processors. Processors do not have to be aware of where data resides, except that there may be performance penalties and that race conditions are to be avoided.
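A minimal sketch of this model is shown below, using MPI (a message-passing library that is not named in this passage, though it appears later in the Kerrighed section): the data is split into per-process blocks, each process computes on its private block, and rank 0 obtains the remote partial results only through explicit messages.

/* Sketch: distributed memory with explicit message passing (MPI).
   Each rank owns a private block of data; remote values travel only
   via messages. Compile with mpicc, run with mpirun -np 4 ./a.out */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int rank, nprocs;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    /* The data is distributed: each process holds only its own block. */
    int local[4];
    for (int i = 0; i < 4; i++)
        local[i] = rank * 4 + i;
    int local_sum = local[0] + local[1] + local[2] + local[3];

    if (rank != 0) {
        /* Rank 0 cannot read this memory directly, so send the result. */
        MPI_Send(&local_sum, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
    } else {
        int total = local_sum;
        for (int src = 1; src < nprocs; src++) {
            int remote_sum;
            MPI_Recv(&remote_sum, 1, MPI_INT, src, 0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            total += remote_sum;
        }
        printf("global sum = %d\n", total);
    }

    MPI_Finalize();
    return 0;
}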

Architecture

In a distributed memory system there is typically a processor, a memory, and some form of interconnection that allows programs on each processor to interact with each other.

Programming distributed memory machines

The key issue in programming distributed memory systems is how to distribute the data over the memories.

LinuxPMI

LinuxPMI (Linux Process Migration Infrastructure) is a Linux kernel extension for multi-system-image clustering (in contrast to a single-system image). The project is a continuation of the abandoned openMosix clustering project; openMosix is the predecessor of LinuxPMI.

Function

How LinuxPMI works is perhaps best described by an example: if one node in the cluster has more work than it can handle, its processes can migrate transparently to less loaded nodes. This is in many ways similar in principle to how a multi-user operating system manages workload on a multi-CPU system; however, LinuxPMI can have machines (nodes) with several CPUs. In short, it means that ten computers are able to work as one large computer; however, there is no master machine, so each machine can also be used as an individual workstation.

openMosix

History

openMosix was considered stable on the Linux 2.4.x kernel for the x86 architecture, but porting to the Linux 2.6 kernel remained in the alpha stage. Support for the 64-bit AMD64 architecture only started with the 2.6 version. As of March 1, 2008, the openMosix read-only source code is still hosted at SourceForge.[3] The LinuxPMI project is continuing development of the former openMosix code.

Live CDs

Several Linux Live CDs ship with openMosix.

OpenSSI

OpenSSI allows a cluster of individual computers (nodes) to be treated as one large system. Processes running on any node have full access to the resources of all nodes. Processes can be migrated from node to node automatically to balance system utilization. Inbound network connections can be directed to the least loaded node available.

OpenSSI is designed to be used for both high-performance and high-availability clusters. It is possible to create an OpenSSI cluster with no single point of failure: for example, the file system can be mirrored between two nodes, so that if one node crashes, processes accessing the file fail over to the other node. Alternatively, the cluster can be designed in such a manner that every node has direct access to the file system.

Features: single process space

OpenSSI provides a single process space: every process is visible from every node and can be managed from any node using the normal Linux commands (ps, kill, renice and so on).

Kerrighed

Kerrighed is an open-source single-system image (SSI) cluster software project. The project started in October 1998 in the Paris research group of the French National Institute for Research in Computer Science and Control (INRIA). Since 2006, the project has been mainly developed by the Kerlabs company.

Background

Kerrighed is implemented as an extension to the GNU/Linux operating system. It helps scientific applications such as numerical simulations use more computing power. Such applications may use OpenMP, the Message Passing Interface (MPI), and/or a POSIX multithreaded programming model.[1] Kerrighed provides several features, such as distributed shared memory with a sequential consistency model, process migration from one cluster node to another, and, to a limited extent, checkpointing.

References

1. Morin, Christine. "[Clusters_sig] Kerrighed Linux-based SSI for clusters."

External links: Main Page - Kerrighed; Kerlabs; XtreemOS, a Linux-based operating system to support virtual organizations for next-generation Grids.
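As a rough illustration of the shared-memory style such systems aim to support, here is a generic OpenMP sketch (not code from Kerrighed): all threads operate on one address space, and on an SSI cluster with DSM the intent is that a program like this can spread over several nodes without being rewritten for message passing.

/* Sketch: a shared-memory parallel loop with OpenMP.
   Compile with: cc -fopenmp omp_demo.c */
#include <omp.h>
#include <stdio.h>

int main(void) {
    const int n = 1000000;
    double sum = 0.0;

    /* All threads read and write the same address space; the reduction
       clause makes the concurrent updates to sum safe. */
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < n; i++)
        sum += 1.0 / (i + 1);

    printf("threads available: %d, partial harmonic sum: %f\n",
           omp_get_max_threads(), sum);
    return 0;
}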

Cache-only memory architecture

Cache-only memory architecture (COMA) is a computer memory organization for use in multiprocessors in which the local memories (typically DRAM) at each node are used as cache. This is in contrast to using the local memories as actual main memory, as in NUMA organizations. In NUMA, each address in the global address space is typically assigned a fixed home node; when processors access some data, a copy is made in their local cache, but space remains allocated in the home node. With COMA, by contrast, there is no home, and an access from a remote node may cause that data to migrate.

Compared to NUMA, this reduces the number of redundant copies and may allow more efficient use of the memory resources. On the other hand, it raises the questions of how to locate a datum that has no fixed home and what to do when a local memory fills up; a large body of research has explored these issues.

Shared memory

In computing, shared memory is memory that may be simultaneously accessed by multiple programs with an intent to provide communication among them or to avoid redundant copies. Shared memory is an efficient means of passing data between programs. Depending on context, programs may run on a single processor or on multiple separate processors.

In hardware

In computer hardware, shared memory refers to a (typically large) block of random access memory (RAM) that can be accessed by several different central processing units (CPUs) in a multiple-processor computer system.

A shared memory system is relatively easy to program since all processors share a single view of the data, and communication between processors can be as fast as memory accesses to the same location. The drawback is that the CPU-to-memory connection becomes a bottleneck as the number of processors sharing it grows.

In software

In computer software, shared memory is either a method of inter-process communication (IPC), i.e. a way of exchanging data between programs running at the same time, or a method of conserving memory space by directing accesses to what would ordinarily be copies of a piece of data to a single instance instead. Support on UNIX platforms includes the System V and POSIX shared memory interfaces.
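A minimal sketch of the IPC use of shared memory on a POSIX system follows; the object name /demo_shm and the message text are placeholders chosen for the example. A child process writes into a region created with shm_open and mapped with mmap, and the parent reads the result without any copying through the kernel.

/* Sketch: POSIX shared memory as an IPC mechanism between two processes.
   The object name /demo_shm is a placeholder. Link with -lrt on older glibc. */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void) {
    const char *name = "/demo_shm";
    const size_t size = 4096;

    int fd = shm_open(name, O_CREAT | O_RDWR, 0600);
    if (fd == -1) { perror("shm_open"); return 1; }
    if (ftruncate(fd, size) == -1) { perror("ftruncate"); return 1; }

    /* Map the object into this process's address space. */
    char *region = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (region == MAP_FAILED) { perror("mmap"); return 1; }

    if (fork() == 0) {                 /* child writes into the shared region */
        strcpy(region, "hello from the child");
        _exit(0);
    }
    wait(NULL);                        /* parent waits, then sees the write */
    printf("parent read: %s\n", region);

    munmap(region, size);
    shm_unlink(name);                  /* remove the named object */
    return 0;
}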

Distributed shared memory

Software DSM systems can be implemented in an operating system (OS) or as a programming library, and can be thought of as extensions of the underlying virtual memory architecture. When implemented in the OS, such systems are transparent to the developer, which means that the underlying distributed memory is completely hidden from the users. In contrast, software DSM systems implemented at the library or language level are not transparent, and developers usually have to program them differently. However, these systems offer a more portable approach to DSM system implementation. Software DSM systems also have the flexibility to organize the shared memory region in different ways.

The page-based approach organizes shared memory into pages of fixed size. In contrast, the object-based approach organizes the shared memory region as an abstract space for storing shareable objects of variable sizes.
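A very rough sketch of the page-based approach is shown below: a user-level library maps the shared region with no access permissions and catches the resulting page faults, granting access (and, in a real system, fetching the page's current contents from a remote node) before the faulting instruction retries. The fetch_page_from_remote() helper is a hypothetical placeholder, not part of any particular DSM system, and a production implementation would also track ownership and write permissions per page.

/* Sketch: the local skeleton of a page-based software DSM.
   The shared region is mapped with no access rights, so the first touch of
   each page faults; the handler grants access and "fetches" the page.
   fetch_page_from_remote() is a hypothetical placeholder for the network
   transfer a real DSM system would perform. */
#define _GNU_SOURCE
#include <signal.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static char *shared_base;
static size_t shared_size;
static long page_size;

/* Placeholder: a real system would ask the page's current owner for its
   contents over the network and copy the bytes into place. */
static void fetch_page_from_remote(void *page) {
    memset(page, 0, (size_t)page_size);
}

static void fault_handler(int sig, siginfo_t *info, void *ctx) {
    (void)sig; (void)ctx;
    char *addr = (char *)info->si_addr;
    if (addr < shared_base || addr >= shared_base + shared_size)
        _exit(1);                      /* fault outside our region: give up */
    char *page = (char *)((uintptr_t)addr & ~(uintptr_t)(page_size - 1));
    mprotect(page, (size_t)page_size, PROT_READ | PROT_WRITE);
    fetch_page_from_remote(page);      /* the faulting access then retries */
}

int main(void) {
    page_size = sysconf(_SC_PAGESIZE);
    shared_size = 16 * (size_t)page_size;

    /* Reserve the region with no access so every first touch traps. */
    shared_base = mmap(NULL, shared_size, PROT_NONE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (shared_base == MAP_FAILED) { perror("mmap"); return 1; }

    struct sigaction sa = {0};
    sa.sa_flags = SA_SIGINFO;
    sa.sa_sigaction = fault_handler;
    sigaction(SIGSEGV, &sa, NULL);

    shared_base[123] = 42;             /* first touch: faults, page is fetched */
    printf("byte 123 = %d\n", shared_base[123]);
    return 0;
}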

Memory coherence

In multiprocessor (or multicore) systems, two or more processing elements work at the same time, so it is possible that they simultaneously access the same memory location. Provided none of them changes the data in this location, they can share it indefinitely and cache it as they please. But as soon as one updates the location, the others might work on an out-of-date copy that, for example, resides in their local cache. Consequently, some scheme is required to notify all the processing elements of changes to shared values; such a scheme is known as a "memory coherence protocol", and if such a protocol is employed the system is said to have a "coherent memory". The exact nature and meaning of memory coherency is determined by the consistency model that the coherence protocol implements. In order to write correct concurrent programs, programmers must be aware of the exact consistency model employed by their systems.
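As a small illustration of why the consistency model matters to programmers, the generic sketch below (not tied to any system described here) uses C11 atomics: with release/acquire ordering the consumer is guaranteed to see the payload once it sees the flag, whereas with relaxed ordering it could observe the flag and still read a stale payload.

/* Sketch: publishing a value to another thread under an explicit
   memory-ordering (consistency) contract, using C11 atomics and pthreads.
   Compile with: cc -pthread consistency_demo.c */
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

static int payload;                    /* ordinary shared data    */
static atomic_int ready;               /* flag used to publish it */

static void *producer(void *arg) {
    (void)arg;
    payload = 42;
    /* Release: the write to payload becomes visible before the flag does. */
    atomic_store_explicit(&ready, 1, memory_order_release);
    return NULL;
}

static void *consumer(void *arg) {
    (void)arg;
    /* Acquire: once the flag is seen as 1, the payload write is visible too. */
    while (!atomic_load_explicit(&ready, memory_order_acquire))
        ;                              /* spin until published */
    printf("payload = %d\n", payload); /* prints 42, never a stale value */
    return NULL;
}

int main(void) {
    pthread_t p, c;
    pthread_create(&c, NULL, consumer, NULL);
    pthread_create(&p, NULL, producer, NULL);
    pthread_join(p, NULL);
    pthread_join(c, NULL);
    return 0;
}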

Cache coherence

Figure: Multiple Caches of Shared Resource.

When clients in a system maintain caches of a common memory resource, problems may arise with inconsistent data. This is particularly true of CPUs in a multiprocessing system. Referring to the "Multiple Caches of Shared Resource" figure, if the top client has a copy of a memory block from a previous read and the bottom client changes that memory block, the top client could be left with an invalid cache of memory without any notification of the change. Cache coherence is intended to manage such conflicts and maintain consistency between cache and memory.

Overview

In a shared-memory multiprocessor system with a separate cache memory for each processor, it is possible to have many copies of any one instruction operand: one copy in the main memory and one in each cache memory.

There are three distinct levels of cache coherence.[2] In both level 2 behavior and level 3 behavior, a program can observe stale data.