
Computer architecture


陳雲濤的部落格: [Notes] Parallel Computing Ch.1 — Terminology and Comparisons. Cache Coherency Primer. This article is a translation of "Cache coherency primer" by Fabian "ryg" Giesen, a programmer at RAD Game Tools; it was originally published on his blog and is shared on InfoQ China with the author's permission.

Cache Coherency Primer

This series has two parts; this is the first. I plan to write some articles about organizing data in multi-core scenarios. I wrote the first one, but quickly realized there was a large amount of background material I needed to cover first. In this article, I attempt to lay out that background.

Caches. This is a quick primer on CPU caches. On (most) modern CPUs, all memory accesses go through layers of caches. The CPU's load/store (and instruction fetch) units normally cannot even access memory directly; this is dictated by the physical design, as the CPU has no pins connected directly to memory. Caches are divided into "lines", each corresponding to a block of memory of 32 bytes (older ARM, 1990s/early-2000s x86 and PowerPC), 64 bytes (newer ARM and x86), or 128 bytes (newer Power ISA machines). When the CPU sees an instruction that reads memory, it passes the address to the level-1 data cache (or L1D$ for short, playing on the fact that "cache" and "cash" are pronounced alike).

If we only had to deal with reads, things would be simple, because every cache level obeys what I will call the basic invariant: at any point in time, the contents of a cache line at any cache level are identical to the contents of the corresponding memory. Once we allow writes, things get a bit more complicated. Write-back mode is the trickier case. The write-back invariant: once all dirty lines have been written back, the contents of a cache line at any cache level are identical to the contents of the corresponding memory. In other words, in write-back mode we drop the "at any point in time" qualifier and replace it with a weaker condition: either the cache line's contents match memory (if the line is clean), or the line's contents will eventually be written back to memory (if the line is dirty). Write-through mode is simpler, but write-back has its advantages: it filters out repeated writes to the same address, and when most lines operate in write-back mode, the system can often write a large block of memory in one go instead of in small pieces, which is more efficient.
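The two invariants can be made concrete with a minimal C sketch of a single cache line supporting write-through and write-back stores. The names (`CacheLine`, `store_write_through`, `store_write_back`) are invented for this illustration; real hardware implements this in logic, not software.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define LINE_SIZE 64

/* Toy backing "main memory" covering exactly one cache line. */
static uint8_t memory[LINE_SIZE];

typedef struct {
    uint8_t data[LINE_SIZE];
    bool dirty;             /* set on a write-back-mode store */
} CacheLine;

/* Write-through: every store updates the line AND memory, so the
   basic invariant (line contents == memory) holds at all times. */
static void store_write_through(CacheLine *line, int off, uint8_t v) {
    line->data[off] = v;
    memory[off] = v;
}

/* Write-back: the store only marks the line dirty; memory stays
   stale until the line is written back (e.g. on eviction). */
static void store_write_back(CacheLine *line, int off, uint8_t v) {
    line->data[off] = v;
    line->dirty = true;
}

static void write_back(CacheLine *line) {
    if (line->dirty) {
        memcpy(memory, line->data, LINE_SIZE);  /* one bulk write */
        line->dirty = false;
    }
}
```

Note how several write-back stores to the same offset cost only one bulk memory write at write-back time — exactly the filtering advantage described above.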

Some (mostly older) CPUs use only write-through mode, some use only write-back mode, and some use write-through for the L1 cache and write-back for the L2 cache.

Coherency protocols. As long as only a single CPU core is active in the system, everything just works. Parallel Computing Ch.2 Notes: Directory-based Protocol. This series is my reading notes on the book Parallel Programming in C with MPI and OpenMP!

Parallel Computing Ch.2 Notes: Directory-based Protocol

Recap from the previous post: there are two kinds of multiprocessors, the Centralized Multiprocessor and the Distributed Multiprocessor. A Centralized Multiprocessor is a direct extension of a uniprocessor: the CPUs hang directly off the bus, and every CPU reaches memory at the same speed. In this architecture, the mechanism for handling cache coherence is the Write Invalidate Protocol, whose operation was covered in detail in the previous post.

Symmetric multiprocessor system. Multiprocessing. According to some online dictionaries, a multiprocessor is a computer system having two or more processing units (multiple processors), each sharing main memory and peripherals, in order to process programs simultaneously.[4][5] A 2009 textbook defined a multiprocessor system similarly, but noted that the processors may share "some or all of the system's memory and I/O facilities"; it also gave "tightly coupled system" as a synonymous term.[6] In Flynn's taxonomy, multiprocessors as defined above are MIMD machines.[10][11] As they are normally construed to be tightly coupled (shared memory), multiprocessors are not the entire class of MIMD machines, which also contains message-passing multicomputer systems.[10]
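The Write Invalidate Protocol mentioned above can be sketched as a tiny per-core state machine, under the simplifying assumptions of a single cache line and an atomic bus broadcast. The state names follow common textbook usage, but `bus_read`/`bus_write` are hypothetical names for this sketch, not a real protocol implementation.

```c
#include <assert.h>

#define NCORES 4

/* Per-core state of one cache line. */
typedef enum { INVALID, SHARED, MODIFIED } LineState;

static LineState state[NCORES];   /* all start INVALID (zero) */

/* A read pulls the line in as SHARED (unless this core holds it). */
static void bus_read(int core) {
    if (state[core] == INVALID)
        state[core] = SHARED;
}

/* Under write-invalidate, a write broadcasts an invalidation on the
   bus: every other copy is discarded, and the writer's copy becomes
   MODIFIED -- the only valid copy in the system. */
static void bus_write(int core) {
    for (int i = 0; i < NCORES; i++)
        if (i != core)
            state[i] = INVALID;
    state[core] = MODIFIED;
}
```

The key property is visible in the write path: after any write, at most one core holds a valid copy, so no stale data can be read from another cache.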

Multiprocessing

Memory management unit. A memory management unit (MMU), sometimes called a paged memory management unit (PMMU), is a computer hardware unit through which all memory references are passed, primarily to translate virtual memory addresses to physical addresses.

Memory management unit

It is usually implemented as part of the central processing unit (CPU), but it can also take the form of a separate integrated circuit.

Overview. (Schematic of the operation of an MMU.[1]:186 ff.) Page table entries. Most MMUs use an in-memory table of items called a "page table", containing one "page table entry" (PTE) per page, to map virtual page numbers to physical page numbers in main memory. Sometimes a PTE prohibits access to a virtual page, perhaps because no physical random-access memory has been allocated to that virtual page.

Cache coherence. An illustration (not reproduced here) shows multiple caches of some memory acting as a shared resource: if one client has a copy of a memory block from a previous read and another client changes that block, the first client can be left with an invalid cached copy without any notification of the change.
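The page-table lookup described above can be sketched as follows, assuming 4 KiB pages and a single-level table (real MMUs use multi-level tables, hardware page walkers, and richer PTE formats; all names here are invented for illustration):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SHIFT 12                    /* 4 KiB pages */
#define PAGE_SIZE  (1u << PAGE_SHIFT)
#define NPAGES     16                    /* tiny virtual address space */

/* One page table entry: a physical frame number plus a valid bit.
   A real PTE also carries protection bits (read/write/execute),
   which is how an MMU can prohibit access to a virtual page. */
typedef struct {
    uint32_t frame;
    bool valid;
} PTE;

static PTE page_table[NPAGES];

/* Translate a virtual address: index the table by virtual page
   number, then re-attach the page offset. Returns -1 to signal a
   page fault (no valid mapping). */
static int64_t translate(uint32_t vaddr) {
    uint32_t vpn = vaddr >> PAGE_SHIFT;          /* virtual page no. */
    uint32_t off = vaddr & (PAGE_SIZE - 1);      /* offset in page   */
    if (vpn >= NPAGES || !page_table[vpn].valid)
        return -1;
    return ((int64_t)page_table[vpn].frame << PAGE_SHIFT) | off;
}
```

Only the page number is translated; the offset within the page passes through unchanged, which is why pages must be power-of-two sized.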

Cache coherence

Cache coherence is intended to manage such conflicts and maintain consistency between the caches and memory.

Overview. In a shared-memory multiprocessor system with a separate cache memory for each processor, it is possible to have many copies of any one instruction operand: one copy in main memory and one in each cache memory. When one copy of an operand is changed, the other copies of the operand must be changed as well. There are three distinct levels of cache coherence.[2]

Definition. Coherence defines the behavior of reads and writes to the same memory location.

Coherency mechanisms. The main mechanisms are directory-based coherence, snooping, and snarfing.

CPU cache. When the processor needs to read from or write to a location in main memory, it first checks whether a copy of that data is in the cache.
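Of those mechanisms, the directory-based one can be sketched with a per-block record of which cores hold a copy: on a write, only the recorded sharers need invalidating, so no bus-wide broadcast is required. This is a toy model with hypothetical names, not a real protocol implementation.

```c
#include <assert.h>
#include <stdint.h>

#define NCORES 8

/* Directory entry for one memory block: one bit per core recording
   who holds a copy, plus the current exclusive owner (or -1). */
typedef struct {
    uint8_t sharers;    /* bit i set => core i has a copy */
    int owner;          /* core with an exclusive copy, or -1 */
} DirEntry;

/* A read adds the core to the sharer set; the line becomes shared. */
static void dir_read(DirEntry *e, int core) {
    e->sharers |= (uint8_t)(1u << core);
    e->owner = -1;
}

/* On a write, the directory knows exactly which caches hold copies,
   so it invalidates only those, then records the writer as owner. */
static void dir_write(DirEntry *e, int core) {
    e->sharers = (uint8_t)(1u << core);  /* all other copies dropped */
    e->owner = core;
}
```

This is the structural contrast with snooping: the bookkeeping lives in a directory at the memory side rather than in every cache watching a shared bus.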

CPU cache

If so, the processor immediately reads from or writes to the cache, which is much faster than reading from or writing to main memory. Most modern desktop and server CPUs have at least three independent caches: an instruction cache to speed up executable instruction fetches, a data cache to speed up data fetches and stores, and a translation lookaside buffer (TLB) used to speed up virtual-to-physical address translation for both executable instructions and data. The data cache is usually organized as a hierarchy of cache levels (L1, L2, etc.; see also multi-level caches below). The TLB, however, is part of the memory management unit (MMU) and is not directly related to the CPU caches. 牛的大腦 – caching.
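Checking "whether a copy of that data is in the cache" comes down to splitting the address into an offset, a set index, and a tag; a minimal sketch for a direct-mapped cache follows. The sizes are illustrative (64-byte lines, 64 sets), and real L1 caches are usually set-associative rather than direct-mapped.

```c
#include <assert.h>
#include <stdint.h>

#define OFFSET_BITS 6   /* 64-byte cache line */
#define INDEX_BITS  6   /* 64 sets            */

/* Which set the address maps to: the bits just above the offset. */
static uint32_t cache_index(uint32_t addr) {
    return (addr >> OFFSET_BITS) & ((1u << INDEX_BITS) - 1);
}

/* The tag: the remaining high bits, stored alongside the line and
   compared on lookup to decide hit or miss. */
static uint32_t cache_tag(uint32_t addr) {
    return addr >> (OFFSET_BITS + INDEX_BITS);
}
```

Two addresses that differ only in their tag bits map to the same set, which in a direct-mapped cache means they evict each other — the source of conflict misses.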