wenchao-hao/lmkd

Commit Graph

Author	SHA1	Message	Date
Suren Baghdasaryan	11221d4062	lmkd: Add ro.lmk.filecache_min_kb property for min filecache watermark We see many cases when device keeps thrashing despite lmkd kills. This happens because killed processes do not free enough filecache to fit the current workingset completely. To prevent such cases, introduce ro.lmk.filecache_min_kb property to specify min filecache size in KB that should be reached after thrashing is detected. Lmkd will keep killing background processes until this filecache size limit is satisfied. Bug: 193293513 Signed-off-by: Suren Baghdasaryan <surenb@google.com> Change-Id: I49ca4cd2f33b27fdbc432d9ce6944b1a1794b749	2021-07-15 11:05:09 -07:00
Suren Baghdasaryan	940e7cf8bd	lmkd: Include total GPU memory usage in killinfo reports /sys/fs/bpf/map_gpu_mem_gpu_mem_total_map BPF map exposes total GPU allocations size. Include this value into killinfo reports to track GPU allocation size at the time of the kill. Bug: 189366037 Test: lmkd_unit_test Signed-off-by: Suren Baghdasaryan <surenb@google.com> Change-Id: Icc1ed8ab2593530fa293ff9c82f6c8dc400485f5	2021-06-03 15:56:52 -07:00
Vova Sharaienko	a92b76b54d	lmkd: reroute atoms logging to AMS - Added new lmkd message for clients to subscribe LMK_ASYNC_EVENT_STAT - Added support to write kill & mem stats information via data socket to be read & parsed on the AMS Java side for future logging to statsd Bug: 184698933 Test: lmkd_unit_test - test check_for_oom tests lmkd message send to AMS Test: statsd_testdrive 51 54 to inspect statsd logged atoms data Change-Id: Id682a438c87b3e4503261d26461f6cee641d86c4 Merged-In: Id682a438c87b3e4503261d26461f6cee641d86c4	2021-05-11 00:00:56 +00:00
Suren Baghdasaryan	5263aa7800	lmkd: Do not treat RSS=0 as a sign of a process being dead With kernel SPLIT_RSS_COUNTING feature it is possible for a valid process to report RSS of 0 size when reading /proc/pid/statm. This happens because split RSS accounting aggregates per-thread counters asynchronously and depending on the timing of the read, reported value can be inaccurate and occasionally be 0. lmkd currently treats processes reporting RSS of 0 as dead and removes them from the list of processes being tracked. This might lead to a valid process becoming unkillable. Change lmkd to stop treating RSS of 0 as a sign of a dead process. Bug: 160199622 Test: set ro.lmk.kill_heaviest_task=true and hack kernel to report RSS=0 Signed-off-by: Suren Baghdasaryan <surenb@google.com> Change-Id: Ia311d2f98649c92d1a487657f94ea51f57813b73	2021-04-29 15:33:06 -07:00
Suren Baghdasaryan	9f1be12b9a	lmkd: Handle cases when proc_get_name() might return NULL proc_get_name() can return NULL if the corresponding process has died or open fails with ENOMEM due to memory shortages. Ensure such cases are handled without NULL pointer access. Bug: 186157675 Signed-off-by: Suren Baghdasaryan <surenb@google.com> Change-Id: I05b288e3808bec0bdb73db32de02ba3a322ca6e1	2021-04-23 21:18:18 +00:00
Suren Baghdasaryan	0142b3c166	lmkd: Allow lmkd to kill perceptible apps during heavy thrashing Occasionally a system can get into heavy file cache thrashing situation and become unresponsive. In these situations we observe lmkd wakeups, however it does not kill because all non-perceptible apps are already killed and the system manages to reclaim enough memory to stay above min watermark. Add ro.lmk.thrashing_limit_critical property which when breached will allow lmkd to kill perceptible apps. The property represents the percentage of refaulted workingset pages as a fraction of overall file cache size. By default it is disabled. Bug: 181778155 Test: thrashing.py 500 10 200 Signed-off-by: Suren Baghdasaryan <surenb@google.com> Change-Id: Icb38ef6c90adaa4f5c956593b6ea0c4febc91dc0	2021-03-25 17:00:09 -07:00
Josh Gao	84623bef7b	Switch to Bionic's pidfd wrappers. Bug: http://b/172518739 Test: treehugger Change-Id: Ib6cac8f31ec64343c6eec6b82dac52888890c688	2021-03-18 17:16:08 -07:00
Suren Baghdasaryan	858e8c6373	lmkd: choose the heaviest task when killing perceptible processes When killing a task at or lower than oom_score_adj PERCEPTIBLE_APP_ADJ choose the heaviest task among the ones at that level to try minimizing the number of required kills. Because killing a perceptible app will affect user experience anyway, it makes sense to choose the one that will release the most memory and therefore no more kills might be necessary. Bug: 181778155 Test: running thrashing.py script Signed-off-by: Suren Baghdasaryan <surenb@google.com> Change-Id: I775ff774430b6fde4d619ede794825dbae59fd8e	2021-03-05 17:45:30 +00:00
Suren Baghdasaryan	236781873f	lmkd: fix log message reporting the breached watermark Wrong condition causes reporting low watermark breach when min watermark is breached and vice versa. Fix the condition to make reporting correct. Bug: 181778155 Signed-off-by: Suren Baghdasaryan <surenb@google.com> Change-Id: I684141c38f961fce99d17cfb3a83706fcd84ea10	2021-03-05 17:45:10 +00:00
Ioannis Ilkos	282437fbbe	Reorder swap field in killinfo Some tools might parse killinfo entries based on the field order. Move the newly added swap field to the end to ensure compatibility. Test: build Change-Id: Id6dad850beba6835f061da95e84190d00a1b26a0	2021-03-04 17:50:05 +00:00
Ioannis Ilkos	4884890305	Log killed process swap size We already log the rss size for the process. Given lmkd strategies also consider low swap, it will be beneficial to record the swap size too. Test: build, manual test Change-Id: I923f733f7a3aa77fc5968827693b0fc085819174	2021-02-26 17:05:35 +00:00
Chris Morin	74b4df95b4	Replace mentions of "oom_adj" with "oom_score_adj" Some log messages mention "oom_adj" instead of "oom_score_adj" when referring to oom_score_adj. This is confusing because "oom_adj" is a separate value which was supplanted by oom_score_adj, but can still be used. Test: trigger memory pressure and view logs Change-Id: I23825083cecfff6bd32bfb39c6dac1f2b17a72a7	2021-02-26 00:07:16 -08:00
Suren Baghdasaryan	dc60f9717b	lmkd: Handle workingset_refault vmstat field change in 5.9 kernel Linux kernel 5.9 changed some vmstat fields including workingset_refault which affects lmkd operation. Update vmstat parsing to handle both old (workingset_refault) and new (workingset_refault_file) names for that field. Bug: 175617952 Test: lmkd_unit_test Signed-off-by: Suren Baghdasaryan <surenb@google.com> Change-Id: I8f9b3d027ca96154f07e7252902a5aa04cf05a9f	2020-12-14 13:38:48 -08:00
Suren Baghdasaryan	9f8d3dec72	lmkd: Remove unused workingset_refault parsing from zoneinfo workingset_refault field in zoneinfo is currently being parsed but is not used. Instead the same field in vmstat is being used to capture the number of file-backed workingset refaults. Remove the unused field parsing code. Bug: 175617952 Test: lmkd_unit_test Signed-off-by: Suren Baghdasaryan <surenb@google.com> Change-Id: I79641a833c252cf50ac08c0c7d17c8294236d82d	2020-12-14 13:10:59 -08:00
Suren Baghdasaryan	3cc1f13044	lmkd: report kill reason, and meminfo details to statsd for each kill Information like free memory and swap as well as kill reason would be useful for understanding regressions in the number of lmk kills in the field. Bug: 168117803 Change-Id: Ic46aed3c85b880b32ac5ad61b55f90e0d33517c7 Test: statsd_testdrive 51, load with lmk_unit_test	2020-09-11 14:24:23 +01:00
Martin Liu	589b5752ee	lmkd: fix possible long stall state If the first PSI event triggers a kill, lmkd won't resume polling immediately after the process has died. Instead, it will wait until the next PSI event to resume the polling which is too late when the device is under memory pressure. This happens if data communication with AMS happens after previous polling window expired, in which case paused handler gets reset and polling does not resume after the kill. Fix this by changing pause handler reset logic. Bug: 167562248 Test: memory pressure test Signed-off-by: Martin Liu <liumartin@google.com> Signed-off-by: Suren Baghdasaryan <surenb@google.com> Change-Id: I10c65c85b718a656e3d8991bf09948b96da895cb	2020-09-04 00:04:57 +08:00
Martin Liu	c3108416e7	lmkd: avoid division by zero because of file_base_lru It seems file_base_lru can be zero. Avoid the division by zero by adding 1. Bug: 167660459 Test: boot Signed-off-by: Martin Liu <liumartin@google.com> Change-Id: If19dbbaafe6cd28a9d5b7f8a002f3cd33daab5e7	2020-09-04 00:03:49 +08:00
Martin Liu	1f72f5fa4b	lmkd: adjust thrashing detection strategy When a device is thrashing the file cache, workingset refaults can grow slowly for various reasons. The current thrashing detection mechanism could reset the thrashing counter frequently as it relies on the presence of reclaim activity, however refaults can keep increasing even when the device is not actively reclaiming. In addition, the thrashing counter gets reset when conditions require a kill but lmkd could not find an eligible process to be killed. This is problematic because when this happens thrashing is being ignored. Use fixed 1 sec periods to aggregate the thrashing counter. Also we need to keep monitoring the thrashing counter while retrying as someone could release the memory to mitigate the thrashing. If the thrashing counter is greater than the limit at the end of the 1 sec period this means lmkd failed to find an eligible process to kill. In this case we store accumulated thrashing in case a new eligible process appears until accumulated thrashing is less than the limit or we miss an entire 1 sec window. Bug: 163134367 Test: heavy loading launch Signed-off-by: Martin Liu <liumartin@google.com> Change-Id: Ie9f4121ea604179c0ad510cc8430e7a6aec6e6b2	2020-08-28 13:04:42 +08:00
Ioannis Ilkos	279268a07f	Emit swap size in the killed process' statsd atoms Changes: - We are already reading /proc/pid/status to resolve the tgid. While we are at it, also parse RSS and swap values. - Use the RSS and swap values for non memcg builds when creating the statsd outputs - Given we already read RSS, remove the separate read of /proc/pid/statm that used to get tasksize. Bug: 163116785 Test: manual, out/host/linux-x86/bin/statsd_testdrive 51 Change-Id: I9d98b9ffe8be0b014bb09174ec9532382cae1f38	2020-08-12 20:24:56 +01:00
Suren Baghdasaryan	d7b4fcb8a5	lmkd: Add lmkd wakeup information into killinfo logs Oftentimes while investigating bugreports it's unclear whether lmkd was active between kills. To provide visibility into lmkd activity adding the following fields into killinfo reports: MsSinceEvent - number of msecs since the last PSI/vmpressure event MsSincePrevWakeup - number of msecs since the previous wakeup WakeupsSinceEvent - number of wakeups since the last PSI/vmpressure event SkippedWakeups - number of wakeups that were skipped due to an incomplete kill Bug: 162034541 Test: lmkd_unit_test Signed-off-by: Suren Baghdasaryan <surenb@google.com> Change-Id: I0356c27515132ff0dd309b59a8bf907acbd67cd8	2020-07-24 19:31:03 +00:00
Suren Baghdasaryan	7d1f4f0047	lmkd: Set default kill timeout to limit waits for uninterruptible processes When lmkd tries to kill a process in uninterruptible sleep state, it may need to wait for a long time. To prevent this set the default kill timeout to 100ms which should work for majority of the devices. Bug: 160295034 Test: lmkd_unit_test Signed-off-by: Suren Baghdasaryan <surenb@google.com> Change-Id: Ia280dc095df9ca8494278e0a75b976ed93fc04ae	2020-07-08 11:41:14 -07:00
Martin Liu	3185c2d096	lmkd: Fix do not kill perceptible apps due to low swap if above min wmark Fix code logic to obey our intention of not killing perceptible apps due to low swap if above min wmark. Bug: 155709603 Test: boot Signed-off-by: Martin Liu <liumartin@google.com> Merged-In: Ifc09c2a1fe7e21faa096988f471644f63951d81c Change-Id: Ifc09c2a1fe7e21faa096988f471644f63951d81c	2020-06-02 23:43:49 +08:00
Suren Baghdasaryan	48135c4cba	lmkd: Do not kill perceptible apps due to low swap if above min wmark Prevent kills of perceptible apps due to swap shortages unless system free memory is below the min watermark. This prevents kills of important apps when the system is recovering from the memory pressure. Bug: 155709603 Test: memory stress test with multiple foreground apps Signed-off-by: Suren Baghdasaryan <surenb@google.com> Change-Id: I6beb4b55f8b4f7bc22818b5a7bdfa3adc6cd31c1	2020-05-20 12:22:07 -07:00
Suren Baghdasaryan	fb1f592602	lmkd: Set the default free swap threshold to 10% for all devices Lower the min swap threshold to 10% for all devices to limit kills while swap still has enough space. Bug: 155709603 Test: memory stress test with multiple foreground apps Signed-off-by: Suren Baghdasaryan <surenb@google.com> Merged-In: I443486763c034ed0603ea52b81c060c3969af9a5 Change-Id: I443486763c034ed0603ea52b81c060c3969af9a5	2020-05-20 12:21:34 -07:00
Suren Baghdasaryan	5c039b53d8	lmkd: Fix min_score_adj to exclude killing foreground processes In the cases when foreground processes should not be killed min_score_adjust should be set above PERCEPTIBLE_APP_ADJ to prevent such kills. Bug: 155709603 Test: memory stress test with multiple foreground apps Signed-off-by: Suren Baghdasaryan <surenb@google.com> Merged-In: If187654b8001ce843ec6085ccd2042d75a986dae Change-Id: If187654b8001ce843ec6085ccd2042d75a986dae	2020-05-20 12:21:10 -07:00
Suren Baghdasaryan	ed715a3424	lmkd: Remove unused variables and fix type mismatches Fix compilation warnings by removing unused variables and add typecasting whenever mixed type comparisons are performed. Signed-off-by: Suren Baghdasaryan <surenb@google.com> Change-Id: I7f0839d803a6bf6532f077208ce54aba761dc9fe	2020-05-11 19:52:52 -07:00
Suren Baghdasaryan	1d0ebeaa9c	lmkd: Add property re-initialization support Add --reinit command-line option to allow updating lmkd properties. For example to enable debug logging in the running lmkd process user should issue: setprop ro.lmk.debug true lmkd --reinit Bug: 155149944 Test: lmkd_unit_test after resetting lmkd properties Signed-off-by: Suren Baghdasaryan <surenb@google.com> Change-Id: Ic60331f3368f5a7fdfe09ad7d47c7ccf0a497685	2020-05-06 15:05:04 -07:00
Suren Baghdasaryan	03dccf35a1	lmkd: enable ro.lmk.kill_timeout_ms to be used with kill notifications Allow lmkd to stop waiting for a kill notification if a kill takes longer than ro.lmk.kill_timeout_ms. Bug: 147315292 Test: lmkd_unit_test with ro.lmk.kill_timeout_ms set to 100 Signed-off-by: Suren Baghdasaryan <surenb@google.com> Change-Id: Ia3eed3448fd6928a5e634c2737044722048b3578	2020-04-29 15:11:36 -07:00
Suren Baghdasaryan	9ca5334683	lmkd: polling code cleanup - Remove unused POLLING_STOP state - Simplify POLLING_DO_NOT_CHANGE state handling - Correct last_poll_tm assignment logic Bug: 147315292 Test: lmkd_unit_test Signed-off-by: Suren Baghdasaryan <surenb@google.com> Change-Id: If0674eda954a25f0f6c9188501ff77db8ba0813b	2020-04-29 15:11:15 -07:00
Suren Baghdasaryan	51ee4c505f	lmkd: add kill when swap utilization is too high When non-swappable allocations cause memory pressure swap will not be depleted, however a high percentage of the swappable memory will be pushed into swap. Detect this condition and kill a process when swap utilization is too high while under memory pressure. Introduce ro.lmk.swap_util_max property to represent max percentage of the overall swappable memory that can be swapped under memory pressure without triggering a kill. ro.lmk.swap_util_max is set to 100 by default which disables kills due to the swap utilization. Bug: 147315292 Test: ION memory hogger with ro.lmk.swap_util_max set to 95 Signed-off-by: Suren Baghdasaryan <surenb@google.com> Change-Id: I6dbf124bb24b220d136e8f16b3dae0c0c30d32ca	2020-04-29 15:11:02 -07:00
Muhammad Qureshi	ed8fe8465a	Use generated code for logging events to statsd Use the autogenerated libstatslog_lmkd to send events to statsd The logging schema for statsd is changing as part of statsd becoming a Mainline module in R. The autogenerated code will handle the schema change. Bug: 145887874 Test: m -j Test: atest android.cts.statsd.atom.UidAtomTests#testLmkKillOccurred Change-Id: Ibae4cd822807369a799d5c1f6a9c51272e38a074	2020-01-13 12:16:47 -08:00
Suren Baghdasaryan	36baf44179	lmkd: Restrict lmkd unsolicited notifications only to subscribed recipients lmkd unsolicited notifications can cause lmkd to block if clients are not consuming them. Fix that by sending notifications to only subscribed clients. Introduce LMK_SUBSCRIBE command to allow lmkd clients to subscribe to event notifications. The only asynchronous event currently supported is LMK_ASYNC_EVENT_KILL. Bug: 146597855 Test: fill up send buffer using lmkd_unit_test Test: confirm lmkd does not block after the fix Change-Id: I014159aa55b59081f4b9ed53ecd160a49c0682bb Signed-off-by: Suren Baghdasaryan <surenb@google.com>	2019-12-23 12:35:29 -08:00
Tom Cherry	43f3d2b190	Build lmkd as C++ Bug: 145669697 Test: build Change-Id: I4fb2a9a900c8a6915ee84cc3d82434596301b24b	2019-12-13 08:40:30 -08:00
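
Several of the commits above touch the same thrashing signal: dc60f9717b reads the refault counter from vmstat under either its old or new name, c3108416e7 adds 1 to the file LRU size to avoid a division by zero, and 0142b3c166 describes the thrashing limit as the percentage of refaulted workingset pages relative to the file cache size. The following is a minimal Python sketch of how those pieces could fit together. It is not lmkd's actual C++ code; the function names `parse_vmstat`, `workingset_refault` and `thrashing_percent` are hypothetical, and the exact arithmetic lmkd uses may differ.

```python
def parse_vmstat(text):
    """Parse /proc/vmstat-style "name value" lines into a dict of ints."""
    fields = {}
    for line in text.splitlines():
        parts = line.split()
        # Keep only well-formed "name <non-negative integer>" lines.
        if len(parts) == 2 and parts[1].isdigit():
            fields[parts[0]] = int(parts[1])
    return fields


def workingset_refault(fields):
    """Return the file refault counter under either field name.

    Kernels from 5.9 on report workingset_refault_file; older kernels
    report workingset_refault. The new name is tried first.
    """
    for name in ("workingset_refault_file", "workingset_refault"):
        if name in fields:
            return fields[name]
    return None  # neither name is present


def thrashing_percent(refault_delta, file_lru_pages):
    """Refaulted pages as an integer percentage of the file cache size.

    Adding 1 to the divisor guards against a zero-sized file LRU
    (assumption: this mirrors the "+1" fix described in c3108416e7).
    """
    return refault_delta * 100 // (file_lru_pages + 1)
```

For instance, `thrashing_percent(50, 99)` evaluates to 50, and a zero-sized file LRU no longer raises an exception. A caller would compare the result against a limit such as the one configured through ro.lmk.thrashing_limit_critical.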

83 Commits