
Featured Post

Introduction to BPF Performance Tools

I have used BPF performance tools for about half a year. The tools are really useful and let me investigate the whole system from a new angle. When COSCUP 2020 came around, I did not hesitate to give a talk about them. I would like to share the slides here again to promote these useful tools. Enjoy! (ps: the video is here)
Recent Posts

Building Abstractions with Procedures

I recently reread one of my favorite textbooks, Structure and Interpretation of Computer Programs (SICP). I would like to summarize some insights I got from the book and examine them against my previous experience. So the content will be some personal random notes, and not for everyone. :-) Here are some from Chapter 1, Building Abstractions with Procedures.
* Why use a non-mainstream programming language? Because Scheme has very simple syntax and semantics flexible enough to implement a broad range of applications easily. That matters so much for a computer science student: learning the main ideas by constructing the infrastructure themselves, so that they won't get lost among the fancy features coming up every day. I think this is especially useful as programmers mature. They will find out that language features are not so important; the important thing is to deeply understand the ideas behind the complex components. If we can construct toy examples, that will be huge

Book Review: BPF Performance Tools

First things first: I give this book a Highly Recommended. The author has long been famous for his work on systems performance, including the marvelous DTrace Toolkit, flame graphs, and heat maps... I have also followed his wonderful website for several years. When the author announced his plan to publish a book about BPF, I waited for the release date, and it really did not disappoint. :-) This book contains 4 parts. The first 2 parts are the meat. Part I describes the technologies that will be used in Part II: what BPF is, what BCC is, what bpftrace is, and most importantly, what mechanisms work under the hood. It is essential to know enough about what is underneath to use the high-level tools well, so that you do not frequently run into the corners and limitations of those tools. Although the author suggests readers can skim Chapter 2, Technology Background, I highly recommend that you do not. Part II starts the real topic of this book: performance analysis. It starts fr

[LKML] false sharing: detection and solution in Linux

The 4.10 kernel has an interesting perf patch that improves the detection of cache contention [1], especially for identifying false sharing. When false sharing occurs, a piece of code that was expected to speed up through multi-core parallelism often ends up running slower than on a single core. For example, consider the following program:

struct foo {
    int x;
    int y;
};
static struct foo f;

/* The two following functions are running concurrently: */
int sum_a(void)
{
    int s = 0;
    int i;
    for (i = 0; i < 1000000; ++i)
        s += f.x;
    return s;
}

void inc_b(void)
{
    int i;
    for (i = 0; i < 1000000; ++i)
        ++f.y;
}

If we start two threads so that sum_a() and inc_b() run on different CPUs, at first glance they read and write different addresses and should be able to execute independently. However, the cache coherence mechanism works at cache-line granularity, so every time sum_a() reads f.x, the CPU may find that the cache line holding f.x is dirty (because inc_b() has updated f.y) and has to spend time re-reading it, even though the data brought in is never used by sum_a() at all. [2]

Perf c2c is a tool that Red Hat engineers have been developing for quite a while [2]; it was recently merged into 4.10 [1] and makes it convenient to observe this kind of behavior. An engineer on that team wrote a step-by-step article on how to use it [3].

Case study: RCU performance in the kernel was once hurt by false sharing as well, and the fix was simply to make the per-cpu data cache aligned: commit 11bbb235c26f93b7c69e441452e44adbf6ed6996 Author: Paul E. McKenney < pau
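To make that fix concrete, here is a minimal user-space sketch (not the kernel patch itself) of the usual remedy: align the fields so that each lands on its own cache line, the same idea as the kernel's ____cacheline_aligned_in_smp annotation. The 64-byte line size and the struct names below are assumptions for illustration.

/* Minimal sketch: keep concurrently accessed fields on separate cache lines.
 * Assumes a 64-byte cache line; adjust CACHE_LINE for your CPU. */
#include <stdalign.h>
#include <stdio.h>

#define CACHE_LINE 64

/* Original layout: x and y share one cache line, so writes to y keep
 * invalidating the line that readers of x need. */
struct foo_shared {
    int x;
    int y;
};

/* Padded layout: each field gets its own cache line, so sum_a() and
 * inc_b() no longer bounce the same line between CPUs. */
struct foo_padded {
    alignas(CACHE_LINE) int x;
    alignas(CACHE_LINE) int y;
};

int main(void)
{
    printf("shared layout: %zu bytes\n", sizeof(struct foo_shared));
    printf("padded layout: %zu bytes\n", sizeof(struct foo_padded));
    return 0;
}

The cost is memory: the padded struct grows from 8 to 128 bytes here, which is why this trick is usually reserved for data that is actually contended.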

[tips] Optimize shell scripts

I have recently noticed two tips for optimizing shell scripts. The first is GNU Parallel, which has been designed for exactly this purpose for a long time. For example, suppose you have the following shell script, which takes 3 seconds to run:

#!/usr/bin/env sh
#Job A
sleep 1
echo 'A done'
#Job B
sleep 1
echo 'B done'
#Job C
sleep 1
echo 'C done'
echo 'All Done'

$ time ./seq.sh
A done
B done
C done
All Done

real    0m3.006s
user    0m0.003s
sys     0m0.002s

You can use GNU Parallel to run Job A, B, and C simultaneously:

$ time (echo 'sleep 1;echo A done'; echo 'sleep 1; echo B done'; echo 'sleep 1; echo C done') | parallel; echo 'All Done'
A done
B done
C done

real    0m1.206s
user    0m0.070s
sys     0m0.063s
All Done

The second tip builds on a similar idea but uses an old tool you might not think of at first: Makefile!!

A :
	@sleep 1
	@echo 'A done'
B :
	@sleep 1
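The preview is cut off above, so here is a minimal sketch of how the Makefile version might look; the file name par.mk and the target layout are my assumptions, and the parallelism comes from make's -j flag rather than from the Makefile itself (recipe lines must start with a tab).

# par.mk: each job is its own target, so `make -f par.mk -j3` runs them concurrently.
.PHONY: all A B C

all: A B C
	@echo 'All Done'

A:
	@sleep 1
	@echo 'A done'

B:
	@sleep 1
	@echo 'B done'

C:
	@sleep 1
	@echo 'C done'

Running `time make -f par.mk -j3` should finish in roughly 1 second, just like the GNU Parallel version, while the same make without -j still takes about 3 seconds.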

[LKML] enhanced printk for NMI in Linux

A quick note on a printk() enhancement that the kernel recently gained for the NMI (non-maskable interrupt) case. As we all know, if an NMI fires while some core happens to hold the printk lock, the NMI handler must not call printk(), or it will deadlock. SUSE engineer Petr Mladek submitted a patch [1] that lets printk() keep working in NMI context. The basic idea: when an NMI occurs, printk_nmi_enter() is called and printk internally switches to an alternative printk function, so the log is pushed into a separate buffer; after leaving the NMI context, printk_nmi_exit() switches back to the original printk implementation and schedules an irq worker, which later copies the NMI buffer into __log_buf. The idea is simple, but the implementation has many tricky spots; please read the patch description for the details, or... just keep calling printk() in any context and be done with it XD [1] https://patchwork.kernel.org/patch/8899761/
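Here is a minimal user-space sketch of the switching idea (all names such as my_printk, nmi_buf and flush_nmi_buf are made up for illustration; this is not the kernel code): a function pointer selects between the normal path and a lock-free buffer, and a deferred flush later copies that buffer into the main log, playing the role of the irq worker.

/* User-space sketch of the printk-in-NMI idea described above. */
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

static char nmi_buf[1024];          /* separate buffer used in "NMI" context */
static size_t nmi_len;

static int vprintk_default(const char *fmt, va_list ap)
{
    return vprintf(fmt, ap);        /* normal path: may take locks */
}

static int vprintk_nmi(const char *fmt, va_list ap)
{
    /* lock-free path: just append into the dedicated buffer */
    int n = vsnprintf(nmi_buf + nmi_len, sizeof(nmi_buf) - nmi_len, fmt, ap);
    if (n > 0)
        nmi_len += (size_t)n;
    return n;
}

/* which implementation my_printk() currently forwards to */
static int (*printk_func)(const char *, va_list) = vprintk_default;

static int my_printk(const char *fmt, ...)
{
    va_list ap;
    int ret;
    va_start(ap, fmt);
    ret = printk_func(fmt, ap);
    va_end(ap);
    return ret;
}

static void my_printk_nmi_enter(void) { printk_func = vprintk_nmi; }
static void my_printk_nmi_exit(void)  { printk_func = vprintk_default; }

/* stands in for the irq worker that copies the NMI buffer into __log_buf */
static void flush_nmi_buf(void)
{
    fwrite(nmi_buf, 1, nmi_len, stdout);
    nmi_len = 0;
}

int main(void)
{
    my_printk("normal message\n");
    my_printk_nmi_enter();          /* pretend an NMI arrived */
    my_printk("message from NMI context\n");
    my_printk_nmi_exit();
    flush_nmi_buf();                /* deferred copy into the main log */
    return 0;
}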

[LKML] Linus's PDCA - how to use BUG_ON()

[BUG_ON()] It's not a "let's check that everybody did things right", it's a "this is a major design rule in this core code". "good BUG_ON() in solid code that has been around forever" and "bad BUG_ON() checking something that we know we might be getting wrong". -- Linus Torvalds

Shortly after releasing 4.8, Linus Torvalds found that the kernel panicked easily when CONFIG_DEBUG_VM was enabled. [1] The reason was that in the VM code, VM_BUG_ON() used BUG_ON() when it detected a certain abnormal condition. According to Johannes Weiner, who made that change, his code only uses BUG_ON() under CONFIG_DEBUG_VM, so he believed ordinary users would not be affected. Moreover, he argued, if an abnormal condition has already been hit, wouldn't refusing to halt the system with BUG_ON() only make its behavior even more unpredictable and harder to debug? [2]

Linus quickly replied with a mail explaining the few occasions when BUG_ON() is appropriate:

1. The developer's own test environment: error handling has not been thought through yet, but you believe a certain situation should never happen. This normally belongs only in the [RFC] stage, roughly the unit-testing phase of development. Even with CONFIG_DEBUG_VM enabled, such a BUG_ON() should not be let in, because users who turn on that option want more information, not a dead kernel. This kind of BUG_ON() should be removed after some testing.
2.
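To make Linus's distinction concrete, here is a small user-space sketch. The BUG_ON()/WARN_ON_ONCE() macros below are simplified stand-ins (not the kernel's implementations), and lookup_bad()/lookup_good() are hypothetical helpers: the "bad" pattern kills everything on a condition we might simply be getting wrong, while the alternative warns once and recovers.

/* Simplified stand-ins for the kernel macros, for illustration only. */
#include <stdio.h>
#include <stdlib.h>

#define BUG_ON(cond)                                                   \
    do {                                                               \
        if (cond) {                                                    \
            fprintf(stderr, "BUG at %s:%d\n", __FILE__, __LINE__);     \
            abort();             /* the whole "kernel" dies */         \
        }                                                              \
    } while (0)

/* GNU C statement expression, in the spirit of the kernel's own macro */
#define WARN_ON_ONCE(cond)                                             \
    ({                                                                 \
        static int warned;                                             \
        int __c = !!(cond);                                            \
        if (__c && !warned) {                                          \
            warned = 1;                                                \
            fprintf(stderr, "WARNING at %s:%d\n", __FILE__, __LINE__); \
        }                                                              \
        __c;                                                           \
    })

/* "bad" BUG_ON(): an invariant we are not actually sure about */
static int lookup_bad(int *table, int idx)
{
    BUG_ON(idx < 0);            /* one bad caller brings everything down */
    return table[idx];
}

/* preferred: warn once, then handle the error and keep running */
static int lookup_good(int *table, int idx)
{
    if (WARN_ON_ONCE(idx < 0))
        return -1;              /* degrade gracefully instead of dying */
    return table[idx];
}

int main(void)
{
    int table[4] = {10, 20, 30, 40};
    printf("good path, bad index:  %d\n", lookup_good(table, -1));
    printf("good path, valid index: %d\n", lookup_good(table, 2));
    printf("bad-path helper on valid input: %d\n", lookup_bad(table, 2));
    /* lookup_bad(table, -1) would abort the whole program here */
    return 0;
}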