The Performance Impact of KPTI (the Meltdown Fix) on Different Kinds of Applications

2018-02-11 14:06:18 | Source: https://blog.gslin.org/archives/2018/02/10/8136/不同性質的應用程式對-k | Author: Gea-Suan Lin's BLOG


Brendan Gregg of Netflix has written up his tests of KPTI's performance impact: "KPTI/KAISER Meltdown Initial Performance Regressions".


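Unlike others who only ran broad tests, his main goal is to map measurable numbers to the likely overhead, so that people who have not yet applied the patch can use those measurements to estimate the performance hit they should expect.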


He puts the conclusions up front:


To understand the KPTI overhead, there are at least five factors at play. In summary:



Syscall rate: there are overheads relative to the syscall rate, although high rates are needed for this to be noticeable. At 50k syscalls/sec per CPU the overhead may be 2%, and climbs as the syscall rate increases. At my employer (Netflix), high rates are unusual in cloud, with some exceptions (databases).
Context switches: these add overheads similar to the syscall rate, and I think the context switch rate can simply be added to the syscall rate for the following estimations.
Page fault rate: adds a little more overhead as well, for high rates.
Working set size (hot data): more than 10 Mbytes will cost additional overhead due to TLB flushing. This can turn a 1% overhead (syscall cycles alone) into a 7% overhead. This overhead can be reduced by A) pcid, available in Linux 4.14, and B) Huge pages.
Cache access pattern: the overheads are exacerbated by certain access patterns that switch from caching well to caching a little less well. Worst case, this can add an additional 10% overhead, taking (say) the 7% overhead to 17%.
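The first two factors can be combined with a little arithmetic. Below is a minimal Python sketch, not anything from Gregg's post: it adds the context switch rate to the syscall rate, as the quote suggests, normalizes to a per-CPU figure, and compares it with the one reference point the summary gives (about 2% overhead at 50k syscalls/sec per CPU). The function names and the example numbers are hypothetical, and the sketch deliberately does not try to predict an overhead percentage, since the quote gives only that single point.

```python
# Reference point quoted above: roughly 2% overhead at 50k syscalls/sec per CPU.
REFERENCE_RATE_PER_CPU = 50_000


def combined_rate_per_cpu(syscalls_per_sec, context_switches_per_sec, cpu_count):
    """Turn system-wide rates into one combined per-CPU rate.

    Follows the quoted suggestion that the context switch rate can simply
    be added to the syscall rate for estimation purposes.
    """
    if cpu_count <= 0:
        raise ValueError("cpu_count must be positive")
    return (syscalls_per_sec + context_switches_per_sec) / cpu_count


def exceeds_reference(rate_per_cpu):
    """True if the rate is at or above the 50k/sec per-CPU reference point."""
    return rate_per_cpu >= REFERENCE_RATE_PER_CPU


if __name__ == "__main__":
    # Hypothetical system-wide numbers, for illustration only.
    rate = combined_rate_per_cpu(1_200_000, 400_000, cpu_count=16)
    print(rate, exceeds_reference(rate))
```

A rate well below the reference point suggests that syscall-related overhead is unlikely to be noticeable; a rate above it is a cue to look at Gregg's charts for that range.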

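The key point is that he gives a way to measure each factor. Take the first one, syscall rate: he runs sudo perf stat -e raw_syscalls:sys_enter -a -I 1000 (-a counts on all CPUs, -I 1000 prints a count every 1000 ms) to get the number of syscalls, and from that derives the chart below, where the X axis is thousands of calls per second and the Y axis is the performance loss: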



This gives the whole organization (i.e. Netflix) a way to assess the impact internally.
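To turn the perf stat interval output into a per-CPU rate that can be compared with such a chart, something like the following Python sketch could be used. It assumes the usual interval layout of perf stat (a timestamp, a count with thousands separators, then the event name); the exact layout can vary with perf version and locale, so treat the regular expression as an assumption. The sample text is made up for illustration and is not a real measurement. Because -a counts across all CPUs, the sketch divides by the CPU count.

```python
import re

# Assumed line layout: "<timestamp>  <count with commas>  raw_syscalls:sys_enter"
LINE_RE = re.compile(r"^\s*(\d+\.\d+)\s+([\d,]+)\s+raw_syscalls:sys_enter")

# Made-up sample in the assumed format (not real data).
SAMPLE = """\
#           time             counts unit events
     1.000100000            800,000      raw_syscalls:sys_enter
     2.000200000            400,000      raw_syscalls:sys_enter
"""


def parse_syscall_counts(perf_output):
    """Extract the per-interval syscall counts from perf stat -I output."""
    counts = []
    for line in perf_output.splitlines():
        match = LINE_RE.match(line)
        if match:
            counts.append(int(match.group(2).replace(",", "")))
    return counts


def mean_rate_per_cpu(counts, cpu_count, interval_sec=1.0):
    """Average syscalls per second per CPU over all intervals."""
    if not counts:
        raise ValueError("no raw_syscalls:sys_enter lines found")
    if cpu_count <= 0:
        raise ValueError("cpu_count must be positive")
    return sum(counts) / len(counts) / interval_sec / cpu_count


if __name__ == "__main__":
    counts = parse_syscall_counts(SAMPLE)
    print(mean_rate_per_cpu(counts, cpu_count=8))
```

Note that perf stat writes its counters to stderr by default, so the output has to be captured from there (for example with 2> in the shell) before it is fed to a parser like this.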

