How to write a software watchdog in Python?

A software watchdog is essential for monitoring the health of a system, especially one that runs for a very long time, such as a server program. The basic idea is simple: build a trustworthy process or thread that monitors the process or thread doing the real work.

However, after searching for a while, I could not find a satisfactory solution in Python, so I decided to write one myself. In this article, I will share how I implemented a software watchdog in Python. Before we go on, you may want to pull my code from GitHub to play with.

The first thing to decide is which model the watchdog should run under. There are two choices: a thread or a process. Running as a thread simplifies communication between the watchdog and the worker thread. The problem is that if the worker thread is holding some resource, such as a lock, the watchdog might deadlock. Moreover, if the worker thread is trapped in a loop, there is no graceful way to kill it. So my choice is obvious: make both the watchdog and the worker processes.

This raises a question: how does the worker process kick the watchdog to tell it that it is still alive? There are many IPC mechanisms to choose from. I use a pipe. (Never use signals to communicate with processes we do not control.) The following simple test shows that it really works when the worker hangs:

$ python pywdt.py
[WORKER] running
[WDT] check timeout
[WORKER] running
[WDT] check timeout
[WORKER] running
[WDT] check timeout
[WORKER] running
[WDT] check timeout
[WORKER] running
[WDT] check timeout
[WORKER] running
[WDT] check timeout
[WDT] timeout
[WDT] stop checkers
[WORKER] running
[WDT] kill worker
[WDT] restart
[WORKER] running
[WDT] check timeout
[WORKER] running
[WDT] check timeout
[WORKER] running
[WDT] check timeout
[WDT] stop checkers
[WORKER] running
[WDT] kill worker
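
To make the pipe-based kick concrete, here is a minimal sketch of the same idea. It is not the code from pywdt.py: the names `WORKER_SRC`, `run_once`, and `watchdog` are made up for illustration, the worker is a throwaway script that kicks three times and then hangs, and the watchdog is assumed to be the parent process reading the worker's stdout as the pipe. It relies on `select` working on pipes, so it assumes a POSIX system.

```python
import os
import select
import subprocess
import sys

# Hypothetical worker: kicks the watchdog three times, then hangs.
WORKER_SRC = """
import sys, time
for _ in range(3):
    sys.stdout.write("k")   # one byte on the pipe = one kick
    sys.stdout.flush()
    time.sleep(0.1)
time.sleep(60)              # simulated hang: alive, but no more kicks
"""


def run_once(timeout):
    """Start the worker and return how many kicks arrived before it went silent."""
    proc = subprocess.Popen([sys.executable, "-c", WORKER_SRC],
                            stdout=subprocess.PIPE)
    fd = proc.stdout.fileno()
    kicks = 0
    try:
        while True:
            # Wait for the next kick, but no longer than `timeout` seconds.
            ready, _, _ = select.select([fd], [], [], timeout)
            if not ready:
                break               # timeout: treat the worker as hung
            data = os.read(fd, 64)
            if not data:
                break               # EOF: the worker exited on its own
            kicks += len(data)
    finally:
        proc.kill()                 # kill the worker whether hung or finished
        proc.wait()
        proc.stdout.close()
    return kicks


def watchdog(timeout=1.0, restarts=2):
    """Run the worker, restarting it `restarts` times; return kicks per run."""
    return [run_once(timeout) for _ in range(restarts + 1)]


if __name__ == "__main__":
    print(watchdog(timeout=1.0, restarts=1))
```

Because the worker is a separate process, the watchdog can always kill it, even when it is stuck in a loop, which is exactly what a watchdog thread could not do. A real watchdog would restart indefinitely and log each step. The fixed restart count here only keeps the sketch short.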

Comments

mars, September 6, 2012, 11:14 AM
I created a software watchdog in C++/POSIX based on the same idea. Please check https://github.com/reborn2266/CppWDT