nginx: worker process is shutting down

2017-12-21 10:31:41 Source: https://joychou.org/operations/debug-nginx-worker-process-is Author: JoyChou's Blog

0x00 Background

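If you reload nginx while it is handling a large number of requests, some nginx worker processes are left in the "shutting down" state. This is how nginx applies a new configuration gracefully: requests that are still in progress keep being served, and once all of them have finished, the shutting-down worker exits on its own.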


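However, some as-yet-unknown bug can keep a shutting-down nginx worker from exiting for a long time, and the memory it holds is never released. If each worker uses 200 MB and a single reload leaves 20 such workers behind, that adds up to about 4 GB of memory. This bug deserves attention.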
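The arithmetic behind that estimate, written as a small C helper. The function names are hypothetical, and the idea that stranded workers from several reloads pile up is an assumption for illustration; the 200 MB and 20-worker figures are the example numbers from the paragraph above.

```c
#include <stdio.h>

/* Back-of-the-envelope estimate of memory pinned by "shutting down"
 * workers that never exit. Hypothetical helper, not part of nginx. */
unsigned long stuck_worker_mb(unsigned long reloads,
                              unsigned long workers_per_reload,
                              unsigned long mb_per_worker)
{
    /* Assumption: each reload strands one full generation of old
     * workers, and stranded generations accumulate across reloads. */
    return reloads * workers_per_reload * mb_per_worker;
}

/* Print the estimate for the example figures (20 workers, 200 MB each). */
void print_estimate(unsigned long reloads)
{
    unsigned long mb = stuck_worker_mb(reloads, 20, 200);
    printf("%lu reload(s): %lu MB (~%.1f GiB)\n", reloads, mb, mb / 1024.0);
}
```

`stuck_worker_mb(1, 20, 200)` returns 4000, and `print_estimate(1)` prints `1 reload(s): 4000 MB (~3.9 GiB)`, which is the "about 4 GB" above.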


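A worker stays in the shutting-down state mainly because some of its timers have not been cleared yet. Possible causes include:

Requests that were still in progress when shutdown began have not finished yet; just wait for them to complete (for example, a user downloading a very large file through the old worker). Usually each such timer corresponds to one connection.
Timers armed by other modules have not been cleared (possibly a bug; relatively rare).
Calls to ngx.sleep() in lua-nginx-module.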
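To make the exit condition concrete, here is a simplified model in C. It is a sketch, not nginx's code: `model_timer` and `worker_can_exit()` are hypothetical names, and the `cancelable` flag is modeled on the field of the same name visible in the gdb dump in section 0x02 (assumption: an exiting worker may ignore timers flagged cancelable, but must wait for all others). A timer belonging to an in-flight download or a pending ngx.sleep() is the non-cancelable kind, so it keeps the worker alive.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a pending timer; not ngx_event_t. */
typedef struct {
    const char *owner;   /* who armed it, e.g. "request" or "ngx.sleep" */
    bool cancelable;     /* true if a shutting-down worker may just drop it */
} model_timer;

/* Model of the check a gracefully stopping worker makes before exiting:
 * it may leave only once no non-cancelable timer remains. */
bool worker_can_exit(const model_timer *timers, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (!timers[i].cancelable) {
            return false;   /* still waiting on this timer */
        }
    }
    return true;            /* nothing left that blocks the exit */
}
```

With a single `{"ngx.sleep", false}` timer pending, `worker_can_exit()` returns false, which is the stuck "shutting down" state; with no timers at all it returns true.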
0x01 Debugging
Attach gdb to the shutting-down process
Load the gdb script
Run dump_timer

Command: gdb -p 21643 -command=shut.sh (21643 is the PID of the shutting-down worker)


gdb script shut.sh:


# dump active timers
define dump_timer
    dump_timer_iter ngx_event_timer_rbtree.root
end
define dump_timer_iter
    # NOTE: don't set $node = $arg0: gdb convenience variables are global,
    # so $node would be overwritten by the recursive dump_timer_iter calls
    if $arg0 != ngx_event_timer_rbtree.sentinel
        # timer node ($arg0) -> the ngx_event_t that embeds it ($ev)
        set $ev = (ngx_event_t *) ((char *) $arg0 - (long) &((ngx_event_t *) 0x0)->timer)
        printf "set $ev = (ngx_event_t *) %p\n", $ev
        p *$ev
        printf "\n"
        dump_timer_iter $arg0->left
        dump_timer_iter $arg0->right
    end
end
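The key trick in the script is the `container_of` idiom: each tree node pointer points at the `timer` member embedded inside an `ngx_event_t`, so subtracting that member's offset recovers the event. The dump in section 0x02 shows this directly: the first event's left child is `0x7fafc3833600` and the next event is printed at `0x7fafc38335d8`, a difference of 0x28, the `timer` offset in that build. The C sketch below shows the same arithmetic with `offsetof()` plus a pre-order walk like `dump_timer_iter`. `node_t`, `event_t`, `event_from_timer()` and `dump_timers()` are hypothetical, cut-down stand-ins, not nginx's definitions (the real node also has `parent`, `color` and `data` fields).

```c
#include <stddef.h>
#include <stdio.h>

/* Cut-down tree node: only what the walk needs. */
typedef struct node_s node_t;
struct node_s {
    unsigned long long key;   /* timer expiry, in milliseconds */
    node_t *left;
    node_t *right;
};

/* Cut-down event with the tree node embedded, as in ngx_event_t. */
typedef struct {
    void *data;
    const char *handler_name; /* stands in for the handler pointer */
    node_t timer;
} event_t;

/* Same arithmetic as the gdb line
 *   (ngx_event_t *) ((char *) $arg0 - (long) &((ngx_event_t *) 0x0)->timer)
 * written with offsetof(), the portable spelling. */
event_t *event_from_timer(node_t *node)
{
    return (event_t *) ((char *) node - offsetof(event_t, timer));
}

/* Pre-order walk like dump_timer_iter: print the node's event, then
 * recurse left and right; the sentinel marks an empty child.
 * Returns the number of events visited. */
size_t dump_timers(node_t *node, node_t *sentinel)
{
    if (node == sentinel) {
        return 0;
    }
    event_t *ev = event_from_timer(node);
    printf("ev=%p key=%llu handler=%s\n", (void *) ev, node->key, ev->handler_name);
    return 1 + dump_timers(node->left, sentinel)
             + dump_timers(node->right, sentinel);
}
```

Like the gdb version, the recursion visits every pending timer exactly once, so the count it returns is the number of timers still holding the worker.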
0x02 Reproduction

To reproduce the shutting-down state, I added ngx.sleep(20) to the Lua code (the argument is in seconds).


To generate a large volume of requests, I used ab: ab -n 1000000 -c 50 https://joychou.org/etc/passwd


After reloading nginx, I attached gdb to a shutting-down process. The output shows that the handler of the uncleared timers is ngx_http_lua_sleep_handler, i.e. they were created by ngx.sleep.


(gdb) dump_timer
set $ev = (ngx_event_t *) 0x7fafc6f2c048
$1 = {data = 0x7fafc6f2bff8, write = 0, accept = 0, instance = 0, active = 0, disabled = 0,
ready = 0, oneshot = 0, complete = 0, eof = 0, error = 0, timedout = 0, timer_set = 1,
delayed = 0, deferred_accept = 0, pending_eof = 0, posted = 0, closed = 0, channel = 0,
resolver = 0, cancelable = 0, available = 0, handler = 0x4dfbd8 <ngx_http_lua_sleep_handler>,
index = 0, log = 0x7fafc37eecf0, timer = {key = 1513751173066, left = 0x7fafc3833600,
right = 0x7fafc36ea240, parent = 0x0, color = 0 '\000', data = 0 '\000'}, queue = {
prev = 0x0, next = 0x0}}
set $ev = (ngx_event_t *) 0x7fafc38335d8
$2 = {data = 0x7fafc3833588, write = 0, accept = 0, instance = 0, active = 0, disabled = 0,
ready = 0, oneshot = 0, complete = 0, eof = 0, error = 0, timedout = 0, timer_set = 1,
delayed = 0, deferred_accept = 0, pending_eof = 0, posted = 0, closed = 0, channel = 0,
resolver = 0, cancelable = 0, available = 0, handler = 0x4dfbd8 <ngx_http_lua_sleep_handler>,
index = 0, log = 0x7fafc37f4080, timer = {key = 1513751172873, left = 0x7fafc36c9820,
right = 0x7fafc36c48e0, parent = 0x7fafc6f2c070, color = 0 '\000', data = 0 '\000'}, queue = {
prev = 0x0, next = 0x0}}
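Reading the `key` field in this dump: when nginx arms a timer it stores the absolute expiry time in milliseconds (`ngx_current_msec` plus the timeout), and it compares keys by the sign of their unsigned difference so that counter wrap-around does not break the ordering. The C helper below sketches that calculation; the name `timer_remaining_ms()` is hypothetical. Applied to the two keys above, the left child (1513751172873) fires 193 ms before its parent (1513751173066), consistent with earlier expiries sitting in the left subtree. To see how long a timer still has to run, you can compare its key with the worker's clock, e.g. `p ngx_current_msec` in the same gdb session.

```c
#include <stdint.h>

/* Milliseconds until a timer with the given key fires, relative to the
 * current millisecond clock; negative means the timer is overdue.
 * The unsigned subtraction wraps, and the signed reinterpretation keeps
 * the result correct across counter wrap-around (converting an
 * out-of-range value to int64_t is implementation-defined in C, but
 * wraps as two's complement on GCC and Clang). */
int64_t timer_remaining_ms(uint64_t key, uint64_t now_msec)
{
    return (int64_t) (key - now_msec);
}
```

For example, `timer_remaining_ms(1513751173066, 1513751172873)` returns 193.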
0x03 References
https://github.com/alibaba/tengine/issues/783


