Memory Barrier in Lock API

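Modern CPUs and compilers come with all sorts of powerful optimizations (speculative execution, ALU/LSU pipelines, caches...), so a program does not actually execute in the sequential order the programmer sees at the source code level. Normally this is not a problem, because the CPU usually only reorders "independent" instructions, but once we are on a multi-core system, things are no longer so obvious.

Today I want to talk about a classic use of memory barriers, as... well, a way of organizing my notes from reading CPU specs.

Consider this situation: we want to design a pair of lock/unlock sync APIs as the resource synchronization mechanism for multithreaded code. With no pthread or other OS system calls available, how would we implement it?

Here is a simple approach: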

struct L {
        volatile atomic_t flag;
};

void lock_init(struct L *L)
{
        L->flag = 0;
}

void lock(struct L *L)
{
        while (1) {
                // if L->flag == 1, atomic_test_and_set does nothing and returns 0
                // else it sets the flag to 1 and returns 1
                if (atomic_test_and_set(L->flag, 1)) {
                        break;
                }
        }
}

void unlock(struct L *L)
{
        atomic_set(L->flag, 0);
}

void task0(void)
{
        while (1) {
                lock(&list_lock);
                do_something(&shared_list);
                unlock(&list_lock);
        }
}

void task1(void)
{
        while (1) {
                lock(&list_lock);
                do_other(&shared_list);
                unlock(&list_lock);
        }
}

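Here we assume that both setting and reading L->flag are atomic, and the flag is also declared volatile, so we need not worry about compiler optimizations causing a stale value held in a register to be read. At first glance this seems to meet our basic requirements, but an implementation like this is likely to fail intermittently.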
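For readers who want something they can compile, the two helpers could be sketched with C11 <stdatomic.h> roughly as follows. This is only an assumption about what atomic_t, atomic_test_and_set, and atomic_set might look like; the post does not define them. The sketch takes a pointer to the flag (the listing above passes L->flag directly, as a macro-based API might), and it deliberately uses relaxed ordering to model operations that are atomic but impose no ordering on the surrounding memory accesses.

```c
#include <stdatomic.h>

/* Hypothetical stand-in for the article's atomic_t type. */
typedef atomic_int atomic_t;

/*
 * Returns 1 if *v was 0 and has now been set to val.
 * Returns 0 (and changes nothing) if *v was already non-zero.
 * memory_order_relaxed: atomic, but no ordering guarantee for other accesses.
 */
static inline int atomic_test_and_set(atomic_t *v, int val)
{
        int expected = 0;
        return atomic_compare_exchange_strong_explicit(
                v, &expected, val,
                memory_order_relaxed, memory_order_relaxed);
}

/* Atomically stores val into *v, again without any ordering guarantee. */
static inline void atomic_set(atomic_t *v, int val)
{
        atomic_store_explicit(v, val, memory_order_relaxed);
}
```

With these definitions the lock is still mutually exclusive on the flag itself; what is missing is any guarantee about when the accesses to the protected data become visible relative to the flag.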

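The normal flow is:

1. task0 acquires the lock
2. task0 starts modifying shared_list
3. After finishing its changes, task0 calls unlock
4. task1 sees that the lock is available and grabs it
5. task1 modifies the already-updated shared_list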

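But if the CPU is capable of out-of-order execution, it may well turn into:

1. task0 acquires the lock
2. task0 starts modifying shared_list
3. After finishing its changes, task0 calls unlock
4. At the time of step 3, task0's changes to shared_list have not yet taken effect
5. task1 sees that the lock is available and grabs it
6. task1 modifies a half-updated shared_list
7. An error, or a crash... and the programmer debugs for a week without finding the cause... -_-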

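This is where memory barriers come in. Memory barriers have different properties and categories depending on the CPU, but in general you can think of one as guaranteeing that instructions (in particular memory accesses) before the barrier complete before those after it. So we only need to add memory barrier instructions around lock/unlock to guarantee that modifications to the resource happen before the flag is changed, and that the flag is changed before the resource is read or written. Like this: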

void lock(struct L *L)
{
        while (1) {
                // if L->flag == 1, atomic_test_and_set does nothing and returns 0
                // else it sets the flag to 1 and returns 1
                if (atomic_test_and_set(L->flag, 1)) {
                        memory_barrier(); // required before accessing the protected resource
                        break;
                }
        }
}

void unlock(struct L *L)
{
        memory_barrier(); // make sure accesses to the protected resource have completed
        atomic_set(L->flag, 0);
        memory_barrier(); // make sure L->flag is updated before waking up another task
}
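memory_barrier() is left undefined in the listing above, since the actual instruction depends on the CPU. One portable way to sketch it, assuming a C11 toolchain, is a sequentially consistent fence from <stdatomic.h>. The publish/try_consume pair and the payload/ready variables are hypothetical names made up for this illustration; they follow the same pattern as unlock followed by lock: write the data, barrier, set the flag; then see the flag, barrier, read the data.

```c
#include <stdatomic.h>

/* Assumed definition: a full fence via C11, not a specific CPU instruction. */
static inline void memory_barrier(void)
{
        atomic_thread_fence(memory_order_seq_cst);
}

static int payload;       /* ordinary shared data (hypothetical) */
static atomic_int ready;  /* flag announcing that payload is valid */

/* Writer side: the data must be visible before the flag is. */
static void publish(int value)
{
        payload = value;
        memory_barrier();  /* payload write is ordered before the flag write */
        atomic_store_explicit(&ready, 1, memory_order_relaxed);
}

/* Reader side: returns 1 and fills *out if data is available, else 0. */
static int try_consume(int *out)
{
        if (!atomic_load_explicit(&ready, memory_order_relaxed))
                return 0;
        memory_barrier();  /* flag read is ordered before the payload read */
        *out = payload;
        return 1;
}
```

On real hardware a full fence is often stronger (and slower) than strictly needed; many architectures provide lighter barriers for specific cases, so treat this as a conservative default rather than a tuned choice.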

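Feels like a hassle, doesn't it? Well, the first time I saw that a memory barrier was needed, I also found it very counterintuitive: it looked as if we had to add extra hints just to keep the CPU happy, or else the behavior would differ from the original logic... But in order to make good use of all those powerful CPU optimizations, we just have to keep watching for possible abstraction leaks~ Fortunately, in the vast majority of places where a memory barrier is needed, the OS or a library has already taken care of it for us (cache operations, sync APIs, MMU, interrupts). But if you are the one responsible for porting that part, or you need to implement a similar mechanism (a custom synchronization mechanism)... well, then... good luck to us all~ :-)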
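As one example of letting a library place the barriers, here is a minimal sketch of the same lock written with C11 atomic_flag and acquire/release ordering. It is not the implementation from this post; the names struct spinlock, spin_init, spin_trylock, spin_lock, and spin_unlock are made up for illustration. Acquire on the test-and-set plays the role of the barrier after taking the lock, and release on the clear plays the role of the barrier before dropping it.

```c
#include <stdatomic.h>

/* Hypothetical lock type for this sketch. */
struct spinlock {
        atomic_flag flag;
};

static inline void spin_init(struct spinlock *l)
{
        atomic_flag_clear_explicit(&l->flag, memory_order_relaxed);
}

/* Returns 1 if the lock was taken, 0 if someone else holds it. */
static inline int spin_trylock(struct spinlock *l)
{
        /* acquire: later accesses cannot move before this operation */
        return !atomic_flag_test_and_set_explicit(&l->flag,
                                                  memory_order_acquire);
}

static inline void spin_lock(struct spinlock *l)
{
        while (!spin_trylock(l)) {
                /* busy-wait until the holder releases the lock */
        }
}

static inline void spin_unlock(struct spinlock *l)
{
        /* release: earlier accesses cannot move after this operation */
        atomic_flag_clear_explicit(&l->flag, memory_order_release);
}
```

The compiler then emits whatever barrier instructions the target CPU needs, which may be cheaper than a full barrier on both sides.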

軟體學徒forever
