Posts

Showing posts from January 2017

[LKML] false sharing: detection and solution in Linux

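The 4.10 kernel includes an interesting perf patch that improves the detection of cache contention [1], especially the identification of false sharing. When false sharing occurs, a piece of code that was expected to speed up through multi-core parallelism often ends up running slower than it would on a single core. For example, consider the following program:

    struct foo {
        int x;
        int y;
    };

    static struct foo f;

    /* The two following functions are running concurrently: */

    int sum_a(void)
    {
        int s = 0;
        int i;

        for (i = 0; i < 1000000; ++i)
            s += f.x;
        return s;
    }

    void inc_b(void)
    {
        int i;

        for (i = 0; i < 1000000; ++i)
            ++f.y;
    }

Start two threads so that sum_a() and inc_b() run on different CPUs. At first glance they read and write different addresses, so they should be able to run independently. However, the cache coherence mechanism works at cache-line granularity, so every time sum_a() reads f.x, the CPU may find that its copy of the cache line holding f.x is no longer valid (because inc_b() has dirtied the same line by updating f.y). It then has to spend time re-reading the line, even though the data it syncs in is something sum_a() never uses. [2]

Perf c2c is a tool that Red Hat engineers spent quite a long time developing [2]. It was recently merged into 4.10 [1] and makes this behavior easy to observe. One of the engineers on that team wrote an article that explains, step by step, how to use it [3].

Case study: RCU performance in the kernel was also once hurt by false sharing, and the fix was… to make the percpu data cache aligned:

    commit 11bbb235c26f93b7c69e441452e44adbf6ed6996
    Author: Paul E. McKenney < pau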
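To make the "cache aligned" idea concrete, here is a minimal user-space sketch that applies it to the struct foo example from this post. It is not the kernel patch itself. The name foo_padded, the helper same_cache_line(), and the 64-byte CACHE_LINE_SIZE are all assumptions made for illustration; 64 bytes is common on x86-64, but check your own target.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 64-byte cache lines. Adjust for your hardware. */
#define CACHE_LINE_SIZE 64

/* Hypothetical padded variant of struct foo: each field is aligned to
 * the start of its own cache line, so a write to y no longer
 * invalidates the line that holds x. */
struct foo_padded {
    alignas(CACHE_LINE_SIZE) int x;
    alignas(CACHE_LINE_SIZE) int y;
};

/* Compile-time check that the two fields are at least one line apart. */
static_assert(offsetof(struct foo_padded, y) >= CACHE_LINE_SIZE,
              "x and y must not share a cache line");

static struct foo_padded fp;

/* Illustrative helper: returns 1 if both addresses fall in the same
 * CACHE_LINE_SIZE-sized block, 0 otherwise. */
int same_cache_line(const void *a, const void *b)
{
    return (uintptr_t)a / CACHE_LINE_SIZE == (uintptr_t)b / CACHE_LINE_SIZE;
}

/* Same loops as in the post, now operating on the padded struct. */
int sum_a(void)
{
    int s = 0;
    int i;

    for (i = 0; i < 1000000; ++i)
        s += fp.x;
    return s;
}

void inc_b(void)
{
    int i;

    for (i = 0; i < 1000000; ++i)
        ++fp.y;
}
```

The trade-off is memory: under the 64-byte assumption the struct grows from 8 bytes to 128. Inside the kernel, the comparable tool is the `____cacheline_aligned` family of annotations rather than C11 `alignas`.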

[tips] Optimize shell scripts

I recently noticed two tips for optimizing shell scripts. The first is GNU Parallel, which has long been designed for exactly this purpose. For example, suppose you have the following shell script, which takes 3 seconds to run:

    #!/usr/bin/env sh
    #Job A
    sleep 1
    echo 'A done'
    #Job B
    sleep 1
    echo 'B done'
    #Job C
    sleep 1
    echo 'C done'
    echo 'All Done'

    $ time ./seq.sh
    A done
    B done
    C done
    All Done

    real    0m3.006s
    user    0m0.003s
    sys     0m0.002s

You can use GNU Parallel to run Jobs A, B, and C simultaneously:

    $ time (echo 'sleep 1;echo A done'; echo 'sleep 1; echo B done'; echo 'sleep 1; echo C done') | parallel; echo 'All Done'
    A done
    B done
    C done

    real    0m1.206s
    user    0m0.070s
    sys     0m0.063s
    All Done

The second tip relies on a similar idea but uses an old tool you might not think of at first: a Makefile!

    A :
    	@sleep 1
    	@echo 'A done'
    B :
    	@sleep 1