关于CPU缓存的一些记录

2020-03-16 cpu cache Comments Word Count: 1354words Read Count: 3minutes

CPU 缓存的学习：

基础的一点概念：

CPU 一般三级缓存（L1 ,L2, L3）:

L1: 缓存指令和数据缓存，L2,L3 不分这些。

L1, L2 是每个CPU 各自有各种的， L3 是所有CPU共享的内存。

涉及到的缓存命中。

Cache Line: 就是 CPU每次加载的 64 Bytes 的内存。

然后将内存放到缓存里的方式是 CPU Associativity

CPU 缓存的一点看法。

看到了有些关于在代码方面可以帮助使用CPU 缓存的地方。

Example 1: Memory accesses and performance

How much faster do you expect Loop 2 to run, compared Loop 1?

int[] arr = new int[64 * 1024 * 1024];
// Loop 1
for (int i = 0; i < arr.Length; i++) arr[i] *= 3;
// Loop 2
for (int i = 0; i < arr.Length; i += 16) arr[i] *= 3;

在 loop 1 和loop 2 在遍历操作上，大概是loop2 的修改操作是loop 1 的 6%. 但是从实际的运行时间上来看，两者耗时相差并不大！。

The reason why the loops take the same amount of time has to do with memory. The running time of these loops is dominated by the memory accesses to the array, not by the integer multiplications. And, as I’ll explain on Example 2, the hardware will perform the same main memory accesses for the two loops.

从我理解来看就是说是在主存里读取赋值耗时很少。所以就算是少了94%的便利操作，也并没有影响多少的时间。

java:

        int[] arr = new int[128 * 1024 * 1024];
        double time = System.currentTimeMillis();
        // Loop 1
        for (int i = 0; i < arr.length; i++) arr[i] *= 3;
        double end = System.currentTimeMillis();
        System.out.println(end- time);
        // Loop 2
        for (int i = 0; i < arr.length; i += 16) arr[i] *= 3;
        double end1 = System.currentTimeMillis();

        System.out.println(end1 - end);
#104.0 ms
#88.0 ms

php:

$list = range(0,1024*64*1024);
$start = microtime(true);
for($index =0; $index < count($list);$index++ ){
    $list[$index] *=3;
}
$end = microtime(true);
$every_time = $end - $start;
echo $every_time.PHP_EOL;
$end = microtime(true);
for($index =0 ;$index < count($list) ;$index += 16 ){
    $list[$index] *=3;
}
$gap_time = microtime(true)- $end;
echo $gap_time.PHP_EOL;
echo round($every_time / $gap_time).PHP_EOL;
#4.4176740646362 s
#0.53316903114319 s
#8  times

其实一开始用PHP测试，因为文章说语言其实影响不大，结果，有点诧异，用Java了发现和得出的结论一致，后来才想起来， PHP的数组底层是用的hash 所以耗时就和你的操作次数比例相同。

然后用Java 来对 step 进行处理，会发现，1，2，4，8，16 ，32，64，128 这几个耗时 1-16 几乎是没多大变化的，但是往后就会时间和正常的读取修改操作比例成正比。

这是为什么呢：

Notice that while step is in the range from 1 to 16, the running time of the for-loop hardly changes. But from 16 onwards, the running time is halved each time we double the step.

The reason behind this is that today’s CPUs do not access memory byte by byte. Instead, they fetch memory in chunks of (typically) 64 bytes, called cache lines. When you read a particular memory location, the entire cache line is fetched from the main memory into the cache. And, accessing other values from the same cache line is cheap!

Since 16 ints take up 64 bytes (one cache line), for-loops with a step between 1 and 16 have to touch the same number of cache lines: all of the cache lines in the array. But once the step is 32, we’ll only touch roughly every other cache line, and once it is 64, only every fourth.

Understanding of cache lines can be important for certain types of program optimizations. For example, alignment of data may determine whether an operation touches one or two cache lines. As we saw in the example above, this can easily mean that in the misaligned case, the operation will be twice slower.

看了好一会才理解过来，因为CPU的缓存通常是按照块(64 bytes) 取的, （哦！！！所以叫做cache lines ，一直没理解到 cache line 这个名词，被line 的汉译给影响了。。。）所以当你的 for loop去遍历处理这个数组的时候，一次处理 cpu 去拿这个数组连续的内存块（cache line） 64 byte = 16 int （正好你的数组的 16 个int）所以你for loop 一次读取内存，cpu 缓存数组的16个int 耗时的正好是读取操作。计算很快。所以 1-16 step 取值耗时没多大变化。

所以数据对齐也很重要。这样决定你是一条cache line 还是两条，这样是否double 了时间，也解释了电脑上 1-16 step 模拟的时候有跳动。

Example 2: Instruction-level parallelism

int steps = 256 * 1024 * 1024;
int[] a = new int[2];

// Loop 1
for (int i=0; i<steps; i++) { a[0]++; a[0]++; }

// Loop 2
for (int i=0; i<steps; i++) { a[0]++; a[1]++; }

还有一个指令级的并行

Loop1 的俩a[0] ++ 后一个依赖前一个，而Loop2 的这个不需要，所以 Loop2 更快。这是在C#上的，但是我在Java 上测试，发现 Loop1 几乎快是Loop2 的一半了，应该是将 a[0]++ 转换成了 a[0]+= 2;

随便写写记录到这儿，有时间再继续看看还是很多东西的，了解下底层的原理和处理问题的想法还是很有意思的。：

Permalink: https://z8n24.top/2020/03/cpu_cache/

License：CC BY 4.0 CN

z8n24

A Computer Engineer

关于CPU缓存的一些记录

z8n24A Computer Engineer