A software watchdog is essential for monitoring a system's status, especially for programs that run for very long periods, such as servers. The basic idea is simple: construct a trustworthy process or thread to monitor the real working process or thread.
However, after googling for a while, I could not find a satisfactory solution in Python, so I decided to write one myself. In this article, I share how I implemented a software watchdog in Python. Before we go on, you may want to pull my code from GitHub to play with.
The first thing to decide is which model the watchdog should run under. There are two choices: a thread or a process. As a thread, the watchdog can communicate with the working thread more simply. The problem is that if the working thread holds a resource the watchdog also needs, such as a lock, the watchdog may deadlock. Moreover, if the working thread is trapped in a loop, there is no graceful way to kill it. My choice is obvious: make both the watchdog and the worker processes.
That raises a question: how does the working process kick the watchdog to tell it that it is still alive? There are many IPC mechanisms to choose from. I use a pipe to get this done. (Never use signals to communicate with processes we cannot control.) The following is a simple test showing that it really works when the worker hangs:
$ python pywdt.py
[WORKER] running
[WDT] check timeout
[WORKER] running
[WDT] check timeout
[WORKER] running
[WDT] check timeout
[WORKER] running
[WDT] check timeout
[WORKER] running
[WDT] check timeout
[WORKER] running
[WDT] check timeout
[WDT] timeout
[WDT] stop checkers
[WORKER] running
[WDT] kill worker
[WDT] restart
[WORKER] running
[WDT] check timeout
[WORKER] running
[WDT] check timeout
[WORKER] running
[WDT] check timeout
[WDT] stop checkers
[WORKER] running
[WDT] kill worker
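To make the idea concrete, here is a minimal sketch of a pipe-based watchdog. It is not the code from pywdt.py; it is only an illustration under a few assumptions: a POSIX system (it calls select on a pipe), the worker started as a child process, and the worker's stdout pipe used as the kick channel. The names supervise and HANGING_WORKER are hypothetical, and so are the timeout and restart limit.

```python
import os
import select
import subprocess
import sys

# Hypothetical worker for the demo: kicks three times, then hangs.
HANGING_WORKER = """
import sys, time
for _ in range(3):
    sys.stdout.write("k")   # one byte on the pipe is one kick
    sys.stdout.flush()
    time.sleep(0.1)
time.sleep(60)              # simulate a hang: no more kicks
"""


def supervise(worker_code, timeout=1.0, max_restarts=1):
    """Run the worker and restart it when kicks stop; return the restart count."""
    restarts = 0
    while True:
        worker = subprocess.Popen(
            [sys.executable, "-c", worker_code], stdout=subprocess.PIPE
        )
        fd = worker.stdout.fileno()
        timed_out = False
        try:
            while True:
                # Wait for the next kick, but no longer than `timeout` seconds.
                ready, _, _ = select.select([fd], [], [], timeout)
                if not ready:
                    print("[WDT] timeout")
                    timed_out = True
                    break
                if not os.read(fd, 1024):
                    # EOF: the worker closed the pipe, so it exited on its own.
                    break
        finally:
            print("[WDT] kill worker")
            worker.kill()
            worker.wait()
            worker.stdout.close()
        if not timed_out or restarts >= max_restarts:
            return restarts
        restarts += 1
        print("[WDT] restart")


if __name__ == "__main__":
    supervise(HANGING_WORKER)
```

Because the worker is a separate process, the watchdog can always kill it, even when it is stuck in a loop. A real watchdog would probably restart without a limit; max_restarts only exists so that the demo terminates.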
I created a software watchdog in C++/POSIX based on the same idea. Please check https://github.com/reborn2266/CppWDT