wenchao-hao/lmkd - lmkd

Commit Graph

Author	SHA1	Message	Date
Suren Baghdasaryan	ab906fb0ce	lmkd: Change critical thrashing limit to 3x of normal one As a result of experiments, the default relation between critical and normal thrashing limits has been shown to be insufficient. Increase the relation from 2x to 3x. Bug: 194316048 Change-Id: I19877e0df56be07f3f503688f408f5f91f4b1e67 Signed-off-by: Suren Baghdasaryan <surenb@google.com>	2023-10-05 15:07:04 +00:00
Kalesh Singh	f6f744fcc9	Merge "lmkd: Remove uses of hardcoded 4k PAGE_SIZE macro" into main	2023-08-09 00:18:34 +00:00
Kalesh Singh	5d397582ac	lmkd: Remove uses of hardcoded 4k PAGE_SIZE macro Use getpagesize() to query the real page size instead. Bug: 294618124 Test: m Change-Id: If9046f36412a54ba08b94cf3b43cd7bf9c1f26b5	2023-08-08 15:58:16 -07:00
Suren Baghdasaryan	4d8791b1f1	lmkd: check pgrefill vmstat when deciding active reclaim In rare cases it's possible that pgscan is not changing because inactive LRU is empty and can't be refilled from the active LRU due to all pages being hot. In such conditions pgscan_kswapd/pgscan_direct will not change while pgrefill will be increasing due to active LRU being scanned. Lmkd would incorrectly treat this situation as if no reclaim activity happened. Change lmkd to check pgrefill as well to detect such conditions. Bug: 288383787 Change-Id: I6b49607429e2f673bba2645ccddff1a141afbcd1 Signed-off-by: Suren Baghdasaryan <surenb@google.com>	2023-07-28 20:48:15 +00:00
Lee George Thomas	1847e9d7ab	Add a configuration to delay monitor initialization To save CPU cycles during boot for low resource device a new configuration is added to delay initialization of monitoring until boot is complete. Bug: 288566858 Test: Build, boot and verified boot logs to confirm the behavior. Merged-In: I17cfbf4c7f83bc80dd92a99dfb0254a7e20289be Change-Id: I17cfbf4c7f83bc80dd92a99dfb0254a7e20289be	2023-07-19 19:46:12 +00:00
Suren Baghdasaryan	5860e852f8	lmkd: remove unused LMK_STAT_STATE_CHANGED notification The LmkStateChanged atom was historically used to mark lmk activity and trigger additional stats polling. For more than a year this has not been used at all (as statsd supported event-based triggering). Remove unnecessary functionality. Bug: 278174420 Signed-off-by: Suren Baghdasaryan <surenb@google.com> Change-Id: I9f7f56711fabb751cf7a57ea7279759bcc4a3dff	2023-05-19 14:08:10 -07:00
Kameron Lutes	556740ef04	lmkd: Send Actual OOM Score to lmkd_free_memory_before_kill_hook Previously the min_oom score of the candidate search was sent to lmkd_free_memory_before_kill_hook. This is incorrect as the hook expects the actual oom score of the process. Bug: b/273670531 Test: cq Change-Id: Id72c8b39f9c745a8f20fde15266857cb2d2222bf	2023-03-22 00:33:30 +00:00
Suren Baghdasaryan	495db5c643	lmkd: measure free swap as available and easily reclaimable memory In the case of ZRAM, SwapFree does not represent the actual free swap amount because swap space is taken from the free memory or reclaimed. Therefore use free memory and easily reclaimable memory as an approximation of how much free swap system can use. Use SwapFree as a measure of how much swap space the system will consider using. Min of those two measurements is used to decide how much usable swap the system still has. Bug: 238495258 Signed-off-by: Suren Baghdasaryan <surenb@google.com> Change-Id: Ia7b0cc4a744d14c0d6e52603795917cf5824ea15	2022-10-04 12:53:23 -07:00
Suren Baghdasaryan	ba9ea6e3d6	lmkd: Fix UAF caused by calling pid_remove() from the watchdog thread pid_remove() is not a thread-safe function and can be called only from the main thread. Calling it from the watchdog_callback() executed in the context of the watchdog thread can cause a use-after-free failure if the same record is being used by the main thread. Fix this issue by marking the process record as invalid instead of destroying it. Later on invalid records will be cleared in kill_one_process() called from the context of the main thread. Fixes: `f8727745f9` ("lmkd: Remove process record after it is killed by lmkd watchdog") Bug: 248448498 Signed-off-by: Suren Baghdasaryan <surenb@google.com> Change-Id: I0c7776aea1518c17f0a29904a44b7fe8f33980ca	2022-09-27 14:30:34 -07:00
Suren Baghdasaryan	c555ec6eeb	lmkd: Remove process record after it is killed by lmkd watchdog After lmkd watchdog kills a process, its record should be removed. Bug: 243567425 Signed-off-by: Suren Baghdasaryan <surenb@google.com> Change-Id: I70bb2a432df8088ebc9865fbc36b065738248d19	2022-08-23 15:28:53 -07:00
Ioannis Ilkos	b9d0592bba	Remove kill_one_process tracepoint from lmkd We already emit a richer slice (including pid and oom score) so there's no reason for the additional print event Bug: 195085238 Change-Id: I1140f0287934e5f0abdeeb64554a249c4c940def	2022-08-04 14:45:24 +01:00
Ken Chen	310fa3ab1b	Merge "Rename gpu_mem.o to gpuMem.o"	2022-07-21 13:43:50 +00:00
Ken Chen	4c18e91f22	Rename gpu_mem.o to gpuMem.o Underscore character may cause bpf prog/map naming collision. For example, x.o with map y_z and x_y.o with map z both result in x_y_z prog/map name, which should be prevented during compile-time. aosp/2147825 will prohibit underscore character in bpf source name (source name derives the obj name). Existing bpf modules with underscore characters in source name need to be updated accordingly. Bug: 236706995 Test: adb root; adb shell ls -l /sys/fs/bpf/ | grep gpuMem Change-Id: I213624e59ce1bca6ee7c22028504f2d51e9c50df	2022-07-08 19:09:50 +08:00
Yuming Han	ed8fc168e6	lmkd: Fix incorrect behavior on Go devices when use_minfree_levels is TRUE lmkd behaves incorrectly on Go devices because min_score_adj is unused when use_minfree_levels is set to TRUE. Bug: 237947900 Signed-off-by: Yuming Han <yuming.han@unisoc.com> Change-Id: I717561cd9e5f4d1a2ca60d9fc84adcd6e129420a	2022-07-07 18:24:22 +00:00
Yuming Han	79f58c012d	lmkd: Fixed data overflow on ARM Both pgscan_kswapd and pgscan_direct are defined as unsigned long, so overflow issues occur on ARM kernel space. Just check whether their values changed. Signed-off-by: Yuming Han <yuming.han@unisoc.com> Change-Id: I73b27855ede9ca729208775e982660bae967ab92	2022-06-29 16:06:18 +08:00
Suren Baghdasaryan	caebcddf9f	lmkd: Fix the size of vmstat_field_names array The size of the vmstat_field_names array should correspond to the number of elements in vmstat_field enum (VS_FIELD_COUNT). Bug: 227769256 Reported-by: Yuming Han <yuming.han@unisoc.com> Signed-off-by: Suren Baghdasaryan <surenb@google.com> Change-Id: Icac2810c4efca2a07cefba6e220165ef4f194867	2022-06-22 23:10:47 +00:00
Kameron Lutes	e9769f7cf3	lmkd: Fix potential null dereference in hook call If hooks are enabled in LMKD and kill_info is not supplied to kill_one_process, there will be a null dereference on kill_info. This change validates ki before dereferencing. Bug: b/210075795 Test: cq Change-Id: Ie81ca9bdb73a71f16dc5682c8721a557b8b094fb Merged-In: Ie81ca9bdb73a71f16dc5682c8721a557b8b094fb	2022-06-22 03:40:33 +00:00
Kameron Lutes	2cce3066da	lmkd: Add hooks to LMKD Adds several hooks to LMKD that can be overridden by the vendor. This allows for device specific control of LMKD when necessary. Bug: b/210075795 Test: cq Change-Id: Ib231743183134b05148d45d681765860da6274ae (cherry picked from commit 2c1248381a52fc520c6cd1acfaee80818eaa9ee1) Merged-In: Ib231743183134b05148d45d681765860da6274ae	2022-06-22 03:35:23 +00:00
Yongqin Liu	bf819b5593	lmkd: fix the cgroup attribute name to MemCgroupEventControl The name was CgroupEventControl before, which is not the one defined in system/core/libprocessgroup/profiles/task_profiles.json. This causes lmkd to crash on some setups, such as 4.19q + AOSP master. Bug: 230642311 Test: boot the 4.19q + AOSP master setup with hikey960 board Signed-off-by: Yongqin Liu <yongqin.liu@linaro.org> Change-Id: I87b1ea2040f21c52d549db58692fc8a2b114f8e6	2022-04-29 18:50:20 +08:00
전윤재	f70a8a260f	lmkd: Fix a comparison operation with uninitialized variable. Prevent comparing uninitialized poll_start_tm with curr_tm in call_handler(). The bug this caused was fixed by commit d816ab, but the underlying issue is not fixed yet and may cause problems later if other operations are added to this if block. Change-Id: Id13318297a2cbf2f9784134a2ccd648cc221e8c4 Signed-off-by: Yoonjae Jeon <yj213.jeon@samsung.com>	2022-04-25 00:51:47 +00:00
Bart Van Assche	759943643f	lmkd: Add support for cgroups v2 memcg hierarchy Use the /sys/fs/cgroup/uid_%u/pid_%u path instead of /dev/memcg/apps/uid_%u/pid_%u if the memcg controller is mounted in the v2 cgroup hierarchy. Skip the code that refers to memcg attributes that only exist in the cgroup v1 hierarchy when using the v2 memcg. Complain if it is attempted to use the old kill strategy in combination with memcg v2 since only the new strategy is compatible with the v2 cgroup hierarchy. Bug: 213617178 Test: Tested lmkd inside the Cuttlefish emulator. Triggered an Test: out-of-memory condition as follows: Test: i=0; while [ $i -lt 16 ]; do dd if=/dev/zero of=/dev/null bs=1G count=1 & ((i++)) done Test: That caused the following output to appear in logcat: Test: 02-03 18:13:02.772 241 241 I lowmemorykiller: Kill 'com.android.packageinstaller' (3031), uid 10022, oom_score_adj 975 to free 29348kB rss, 21232kB swap; reason: min watermark is breached and swap is low (135016kB < 150384kB) Test: 02-03 18:13:02.772 241 241 I killinfo: [3031,10022,975,0,29348,3,73644,98460,8,1000,33448,40220,1503848,135016,81720,1387040,27268,29520,50348,103164,20176,57664,0,0,0,519,103,5,0,21232,30016,0,2] Test: From the kernel log: Test: [ 302.834958] Out of memory: Killed process 3017 (ADB-JDWP Connec) total-vm:13522856kB, anon-rss:0kB, file-rss:0kB, shmem-rss:560kB, UID:10052 pgtables:1008kB oom_score_adj:975 Test: [ 303.223702] Out of memory: Killed process 2859 (HeapTaskDaemon) total-vm:13534452kB, anon-rss:0kB, file-rss:0kB, shmem-rss:560kB, UID:10051 pgtables:1060kB oom_score_adj:965 Test: [ 303.478833] Out of memory: Killed process 2816 (Signal Catcher) total-vm:13524108kB, anon-rss:0kB, file-rss:0kB, shmem-rss:564kB, UID:10073 pgtables:1016kB oom_score_adj:965 Test: [ 304.823796] Out of memory: Killed process 2438 (ReferenceQueueD) total-vm:13529180kB, anon-rss:0kB, file-rss:0kB, shmem-rss:568kB, UID:10015 pgtables:1056kB oom_score_adj:955 Test: [ 305.226728] Out of memory: Killed process 3126 (DefaultDispatch) total-vm:13532596kB, anon-rss:0kB, file-rss:0kB, shmem-rss:528kB, UID:10019 pgtables:1064kB oom_score_adj:945 Test: [ 305.935615] Out of memory: Killed process 2637 (Jit thread pool) total-vm:13523084kB, anon-rss:0kB, file-rss:0kB, shmem-rss:0kB, UID:10024 pgtables:1036kB oom_score_adj:935 Test: [ 307.055895] Out of memory: Killed process 2063 (HeapTaskDaemon) total-vm:13755876kB, anon-rss:0kB, file-rss:0kB, shmem-rss:11600kB, UID:10074 pgtables:1552kB oom_score_adj:0 Test: [ 307.398512] Out of memory: Killed process 2298 (rs.media.module) total-vm:13560404kB, anon-rss:12924kB, file-rss:0kB, shmem-rss:720kB, UID:10076 pgtables:1156kB oom_score_adj:-700 Test: [ 309.888679] Out of memory: Killed process 1745 (droid.bluetooth) total-vm:13720424kB, anon-rss:11084kB, file-rss:0kB, shmem-rss:208kB, UID:1002 pgtables:1220kB oom_score_adj:-700 Test: [ 311.050133] Out of memory: Killed process 1759 (ndroid.systemui) total-vm:13852268kB, anon-rss:26468kB, file-rss:0kB, shmem-rss:6988kB, UID:10075 pgtables:1628kB oom_score_adj:-800 Test: [ 313.520057] Out of memory: Killed process 2000 (m.android.phone) total-vm:13593404kB, anon-rss:14600kB, file-rss:236kB, shmem-rss:340kB, UID:1001 pgtables:1312kB oom_score_adj:-800 Test: [ 314.337744] Out of memory: Killed process 1911 (rkstack.process) total-vm:13540420kB, anon-rss:8232kB, file-rss:0kB, shmem-rss:180kB, UID:1073 pgtables:1128kB oom_score_adj:-800 Test: [ 314.745542] Out of memory: Killed process 1984 (com.android.se) total-vm:13525236kB, anon-rss:6944kB, file-rss:0kB, shmem-rss:188kB, UID:1068 pgtables:1012kB oom_score_adj:-800 Test: [ 315.007212] Out of memory: Killed process 677 (system_server) total-vm:15049836kB, anon-rss:1856kB, file-rss:0kB, shmem-rss:240kB, UID:1000 pgtables:2084kB oom_score_adj:-900 Change-Id: I78821bcd332af7b3f642f037faa2df15a937dc26 Signed-off-by: Bart Van Assche <bvanassche@google.com>	2022-03-29 13:38:04 -07:00
Bart Van Assche	b4d26bb22b	lmkd: Look up cgroup attribute paths instead of hardcoding these Retrieve cgroup attribute paths from task_profiles.json instead of hardcoding these paths. Bug: 213617178 Test: Tested lmkd in Cuttlefish. Change-Id: I03f40ac8ccd4635f21432214e1acf997c505d1e9 Signed-off-by: Bart Van Assche <bvanassche@google.com>	2022-03-24 15:58:02 -07:00
Bart Van Assche	5ebc4e8f51	lmkd: Fix a potential buffer overflow Prevent that the statement that writes '\0' past the read data can write past the end of the buffer. Bug: 213617178 Test: Compile-tested only. Change-Id: I6922c343a6bcb52dce0b5cf54f09b2850e9dfde2 Signed-off-by: Bart Van Assche <bvanassche@google.com>	2022-02-23 17:05:11 +00:00
Bart Van Assche	545067957c	lmkd: Use std::array<> and remove the ARRAY_SIZE() definition Using ARRAY_SIZE() on a pointer yields 1 while applying .size() to a pointer triggers a compiler error. Hence use .size() instead of ARRAY_SIZE(). Bug: 213617178 Test: Compile-tested only. Change-Id: Ie0f9740f59470c943f8d62b9475f7f987ed8707b Signed-off-by: Bart Van Assche <bvanassche@google.com>	2022-02-23 17:05:11 +00:00
Bart Van Assche	80a3dba57a	lmkd: Use std::min() and std::max() instead of defining min() and max() macros std::min() and std::max() check whether the argument types match but min() and max() not. Hence switch to std::min() and std::max(). Bug: 213617178 Test: Compile-tested only. Change-Id: Iaf1f63360c9360938db56d485dbd1e504129c52d Signed-off-by: Bart Van Assche <bvanassche@google.com>	2022-02-23 17:05:10 +00:00
Suren Baghdasaryan	014dd7156e	lmkd: Log psi averages in killinfo reports Add psi memory, io and cpu 10sec averages into killinfo reports. Bug: 205182133 Signed-off-by: Suren Baghdasaryan <surenb@google.com> Change-Id: Ifbcf47cab291fb20dbf0b5d32f1965f4e6462b49	2022-02-21 20:30:50 -08:00
Suren Baghdasaryan	5ae47a9563	lmkd: Allow killing perceptible apps when recorded stall is too high When system is under heavy memory pressure the system might be able to keep free memory above the min watermark avoiding perceptible app kills. In such situation system might end up using all its cpu capacity on memory reclaim and not doing productive work. To detect this condition, check memory full stall and compare it with the new ro.lmk.stall_limit_critical tunable representing the stall threshold. When the recorded level is over ro.lmk.stall_limit_critical, lmkd will be allowed to kill perceptible apps. ro.lmk.stall_limit_critical represents the max memory full stall in % that is allowed before perceptible apps will get killed. By default it is set to 100%, which effectively disables the feature. Currently system stall is measured based on psi memory stall 10s average value, however this definition might change in the future if better metrics are developed. Setting ro.lmk.stall_limit_critical to 5 means the system should be fully stalled (no productive work is done) for 5% of the 10sec period, resulting in 0.5 sec loss due to the stall. Bug: 205182133 Signed-off-by: Suren Baghdasaryan <surenb@google.com> Change-Id: I9713e30d82641d86d1b7edb5e1ba2971b935c898	2022-02-21 20:27:45 -08:00
Suren Baghdasaryan	2bdf7f0c74	lmkd: Set task profiles to the entire process set_process_group_and_prio sets task profiles for each thread in the process separately. This can be avoided by setting task profiles to the entire process using its pid. Bug: 215557553 Signed-off-by: Suren Baghdasaryan <surenb@google.com> Change-Id: I9c1917172019a42809385f6c9c084b8cb343b520	2022-01-24 13:43:42 -08:00
Suren Baghdasaryan	af1b0e0627	lmkd: Implement watchdog thread To detect lmkd being stuck on a syscall for prolonged period of time, introduce a watchdog thread which gets set when lmkd starts handling of events and is reset after handling is done. If it takes more than the timeout period (2 sec) to handle an event, watchdog wakes up and kills the least important process to prevent mounting memory pressure caused by lmkd lockup. After a kill, watchdog will wait for the reset for another timeout period and kill again. This repeats until lmkd unlocks and resets the watchdog. Bug: 201671997 Test: induce random sleep in lmkd main handler and observe watchdog kills Signed-off-by: Suren Baghdasaryan <surenb@google.com> Change-Id: I56a55834582e11c06cc6cf9da3bc7380e634b301	2022-01-06 18:14:25 +00:00
Suren Baghdasaryan	7c3addb2a1	lmkd: Use process_mrelease to reap the target process from a thread process_mrelease syscall can be used to expedite memory release of a process after it was killed. This allows memory to be released without the target process being scheduled, therefore does not depend on target's priority or the CPU it's running on. However process_mrelease syscall can take considerable time. Blocking lmkd main thread during that time can cause memory pressure events being missed while lmkd is busy reaping previous target's memory. For this reason reaping should be done in a separate thread. This way lmkd main thread can keep monitoring memory pressure while memory is being released. Introduce Reaper class which maintains a pool of threads to perform process killing and reaping. The main thread submits a request to the Reaper to kill and reap the process without blocking. If all the threads in the pool are busy at the time the next kill is needed, the kill is performed by the main thread without reaping. Bug: 130172058 Bug: 189803002 Signed-off-by: Suren Baghdasaryan <surenb@google.com> Change-Id: If7b10fdd1838bdfeea3fed3031565feffe0b52be	2022-01-06 18:14:14 +00:00
liuhailong	cf8af501dc	lmkd: Fix lowmem_minfree out of bounds The lmkd daemon launches before system_server. If lowmem_targets_size has not been initialized by system_server, its value is zero. If lmkd receives a psi event before system_server starts and debug_process_killing is on, lmkd crashes here. Bug: 209090314 Signed-off-by: liuhailong <liuhailong@oppo.com> Change-Id: I0736a882ed1ff5eee2b07676ae590a2cb2a7721c	2021-12-04 17:59:24 +08:00
Suren Baghdasaryan	6e6d14b387	lmkd: fix low swap threshold failing to update after reinit lmkd calculates low swap threshold using total available swap and ro.lmk.swap_free_low_percentage property. A wrong assumption is made that both these values are constant and therefore the threshold can be calculated once and reused later. However ro.lmk.swap_free_low_percentage can be changed by the user and lmkd --reinit issued to reapply new configuration. If that happens low swap threshold will not be updated. Fix this by calculating the threshold whenever it is used. The overhead of that calculation is negligible. Bug: 203161607 Test: setprop ro.lmk.swap_free_low_percentage <new value>; lmkd --reinit Signed-off-by: Suren Baghdasaryan <surenb@google.com> Change-Id: Idff50655a75d006ea86d9ab10ca54c375c4bea46	2021-10-28 13:30:08 -07:00
Wei Wang	730e7a9248	lmkd: move to foreground cpuset before killing Test: Build and boot Bug: 199797672 Signed-off-by: Wei Wang <wvw@google.com> Change-Id: Id475625e0d892fb7111a2cf054d1b57d17003d5a	2021-09-30 23:24:11 -07:00
Wei Wang	0162e0361f	lmkd: use fd cache for cgroup migration Test: Build Bug: 199797672 Signed-off-by: Wei Wang <wvw@google.com> Change-Id: Ie7a9eb9676c58309f1407c5f8cc59b302f107d38	2021-09-21 14:38:49 -07:00
Wei Wang	0195bcdba7	lmkd: migrate process to FOREGROUND sched group before kill BG group may have settings such as cpu.shares impacting reclaim performance. Let us migrate task to foreground sched group similarly to cpuset group. Test: Build Bug: 199797672 Signed-off-by: Wei Wang <wvw@google.com> Change-Id: I75ee9f3486a2c76e65267a98e39edff96a5e1673	2021-09-13 19:07:26 -07:00
Suren Baghdasaryan	0e64eadc21	lmkd: Do not re-initialize lmkd when persistent properties are loaded When a device boots, lmkd starts before persistent properties are loaded, therefore if experiments set any flags, the corresponding persistent properties will trigger change notifications when they are first loaded during boot. In order to prevent lmkd from re-initializing on every property load, mark persistent property change by setting lmkd.reinit to 0 and delay lmkd re-initialization until sys.boot_completed=1 when all properties are set and only one re-initialization will capture them all. On devices with no experiment flags being set lmkd.reinit will be undefined at the boot completion time and re-initialization will not be triggered at all. Bug: 194316048 Test: adb shell device_config put lmkd_native thrashing_limit_critical 350 Test: adb shell device_config put lmkd_native thrashing_limit 100 Test: adb reboot; adb -b all logcat | grep lmkd Signed-off-by: Suren Baghdasaryan <surenb@google.com> Change-Id: Iba34fc719a18d58b890549c7415bec869d471901	2021-09-01 00:56:35 -07:00
Suren Baghdasaryan	d0a800402c	lmkd: Add support for persist.device_config.lmkd_native.* properties Allow persist.device_config.lmkd_native.* to override ro.lmk.* properties to enable experiments with lmkd configuration properties. Experiments will be able to set appropriate persist.device_config.lmkd_native.<name> property which will issue "lmkd --reinit" command to reinitialize lmkd with new parameters. Bug: 194316048 Test: adb shell device_config put lmkd_native thrashing_limit_critical 350 Signed-off-by: Suren Baghdasaryan <surenb@google.com> Change-Id: Ia48fd51eab126d307a1604530b642e86cf250688	2021-08-31 09:20:46 -07:00
Suren Baghdasaryan	39b54809fb	lmkd: Add thrashing and max_thrashing into killinfo reports Due to the increased importance of thrashing limits, include current and max thrashing levels into killinfo reports. Bug: 195979894 Test: lmkd_unit_test Signed-off-by: Suren Baghdasaryan <surenb@google.com> Change-Id: I36f947e45e03a4d845d18881e137e4b242aacb65	2021-08-09 15:10:46 -07:00
George Burgess IV	e849f1414e	lmkd: fix potential NULL pointer dereference `ki` appears to be potentially NULL. Output bogus values if it is. Caught by the static analyzer: > system/memory/lmkd/lmkd.cpp:2171:66: warning: Access to field 'kill_reason' results in a dereference of a null pointer (loaded from variable 'ki') [clang-analyzer-core.NullDereference] Bug: None Test: TreeHugger Change-Id: Iae26855528e1f7fec8f1455e06c7e813a732dc75	2021-08-05 06:59:42 +00:00
Suren Baghdasaryan	34928bb817	lmkd: Add a tracepoint for each kill with kill parameters Add a trace for each kill that includes pid, kill reason, oom_adj_score, min_oom_score and max_thrashing statistics at the time of the kill. Bug: 195085238 Test: generate kills while tracing and observe the new tracepoints Signed-off-by: Suren Baghdasaryan <surenb@google.com> Change-Id: Ic2014adc08f5e5dd4aacd415970332618bd15250	2021-07-30 12:59:15 -07:00
Suren Baghdasaryan	e16047516d	lmkd: Add current and max thrashing levels in LMK_MEMORY_STATS reports Thrashing threshold tuning requires collecting thrashing level data from the field and correlating these levels with other indications of device being non-responsive. Include current and max thrashing levels in the lmkd kill reports. Max thrashing level captures the highest level seen since the last kill report. Bug: 194433891 Signed-off-by: Suren Baghdasaryan <surenb@google.com> Change-Id: I8a34dc41e7f03668bfad4ac2cbcb5d2570a10752 Merged-In: I8a34dc41e7f03668bfad4ac2cbcb5d2570a10752	2021-07-23 19:11:36 +00:00
Suren Baghdasaryan	1ef4718aed	Revert "lmkd: Disable critical thrashing limit by default" This reverts commit `e1ffef4e36`. Reason for revert: Restore the default thrashing limits to prevent unresponsive devices. Bug: 194199500 Change-Id: I15be5b3d67a71b68bca6dea9c2d5b4aa54d6c260 Merged-In: I15be5b3d67a71b68bca6dea9c2d5b4aa54d6c260	2021-07-23 12:01:55 -07:00
Suren Baghdasaryan	c1171394a3	lmkd: Disable critical thrashing limit by default Critical thrashing limit determines the balance between how much thrashing should be tolerated before killing a perceptible app. This threshold might differ between devices, therefore we disable critical thrashing limit by default allowing each device to set it individually. This is done to prevent excessive kills of perceptible apps. Bug: 194199500 Signed-off-by: Suren Baghdasaryan <surenb@google.com> Change-Id: Idd1715564c3727b09fcb0a109ab3d6bae9d0b99a	2021-07-20 18:12:22 +00:00
Suren Baghdasaryan	11221d4062	lmkd: Add ro.lmk.filecache_min_kb property for min filecache watermark We see many cases when device keeps thrashing despite lmkd kills. This happens because killed processes do not free enough filecache to fit the current workingset completely. To prevent such cases, introduce ro.lmk.filecache_min_kb property to specify min filecache size in KB that should be reached after thrashing is detected. Lmkd will keep killing background processes until this filecache size limit is satisfied. Bug: 193293513 Signed-off-by: Suren Baghdasaryan <surenb@google.com> Change-Id: I49ca4cd2f33b27fdbc432d9ce6944b1a1794b749	2021-07-15 11:05:09 -07:00
Suren Baghdasaryan	940e7cf8bd	lmkd: Include total GPU memory usage in killinfo reports /sys/fs/bpf/map_gpu_mem_gpu_mem_total_map BPF map exposes total GPU allocations size. Include this value into killinfo reports to track GPU allocation size at the time of the kill. Bug: 189366037 Test: lmkd_unit_test Signed-off-by: Suren Baghdasaryan <surenb@google.com> Change-Id: Icc1ed8ab2593530fa293ff9c82f6c8dc400485f5	2021-06-03 15:56:52 -07:00
Vova Sharaienko	a92b76b54d	lmkd: reroute atoms logging to AMS - Added new lmkd message for clients to subscribe LMK_ASYNC_EVENT_STAT - Added support to write kill & mem stats information via data socket to be read & parsed on the AMS Java side for future logging to statsd Bug: 184698933 Test: lmkd_unit_test - test check_for_oom tests lmkd message send to AMS Test: statsd_testdrive 51 54 to inspect statsd logged atoms data Change-Id: Id682a438c87b3e4503261d26461f6cee641d86c4 Merged-In: Id682a438c87b3e4503261d26461f6cee641d86c4	2021-05-11 00:00:56 +00:00
Suren Baghdasaryan	5263aa7800	lmkd: Do not treat RSS=0 as a sign of a process being dead With kernel SPLIT_RSS_COUNTING feature it is possible for a valid process to report RSS of 0 size when reading /proc/pid/statm. This happens because split RSS accounting aggregates per-thread counters asynchronously and depending on the timing of the read, reported value can be inaccurate and occasionally be 0. lmkd currently treats processes reporting RSS of 0 as dead and removes them from the list of processes being tracked. This might lead to a valid process becoming unkillable. Change lmkd to stop treating RSS of 0 as a sign of a dead process. Bug: 160199622 Test: set ro.lmk.kill_heaviest_task=true and hack kernel to report RSS=0 Signed-off-by: Suren Baghdasaryan <surenb@google.com> Change-Id: Ia311d2f98649c92d1a487657f94ea51f57813b73	2021-04-29 15:33:06 -07:00
Suren Baghdasaryan	9f1be12b9a	lmkd: Handle cases when proc_get_name() might return NULL proc_get_name() can return NULL if the corresponding process has died or open fails with ENOMEM due to memory shortages. Ensure such cases are handled without NULL pointer access. Bug: 186157675 Signed-off-by: Suren Baghdasaryan <surenb@google.com> Change-Id: I05b288e3808bec0bdb73db32de02ba3a322ca6e1	2021-04-23 21:18:18 +00:00
Suren Baghdasaryan	0142b3c166	lmkd: Allow lmkd to kill perceptible apps during heavy thrashing Occasionally a system can get into heavy file cache thrashing situation and become unresponsive. In these situations we observe lmkd wakeups, however it does not kill because all non-perceptible apps are already killed and the system manages to reclaim enough memory to stay above min watermark. Add ro.lmk.thrashing_limit_critical property which when breached will allow lmkd to kill perceptible apps. The property represents the percentage of refaulted workingset pages as a fraction of overall file cache size. By default it is disabled. Bug: 181778155 Test: thrashing.py 500 10 200 Signed-off-by: Suren Baghdasaryan <surenb@google.com> Change-Id: Icb38ef6c90adaa4f5c956593b6ea0c4febc91dc0	2021-03-25 17:00:09 -07:00
Josh Gao	84623bef7b	Switch to Bionic's pidfd wrappers. Bug: http://b/172518739 Test: treehugger Change-Id: Ib6cac8f31ec64343c6eec6b82dac52888890c688	2021-03-18 17:16:08 -07:00
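
Commits 495db5c643 and 6e6d14b387 in the table describe two pieces of swap arithmetic: usable swap is the minimum of SwapFree and the sum of free plus easily reclaimable memory, and the low-swap threshold is derived from total swap and ro.lmk.swap_free_low_percentage each time it is needed. A minimal C++ sketch of that arithmetic follows. The helper names (usable_swap_kb, low_swap_threshold_kb, swap_is_low), the kB units, and what counts as "easily reclaimable" are assumptions made for illustration; this is not lmkd's actual code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical helper: low-swap threshold in kB. It is recomputed from its
// inputs on every call instead of being cached, so a changed percentage
// takes effect immediately (the behavior described in 6e6d14b387).
constexpr int64_t low_swap_threshold_kb(int64_t total_swap_kb, int low_percent) {
    return total_swap_kb * low_percent / 100;
}

// Hypothetical helper: usable swap in kB, following the description in
// 495db5c643. With ZRAM, swap space is taken from free or reclaimed memory,
// so SwapFree alone overstates what can actually be used.
constexpr int64_t usable_swap_kb(int64_t swap_free_kb, int64_t free_mem_kb,
                                 int64_t easily_reclaimable_kb) {
    return std::min(swap_free_kb, free_mem_kb + easily_reclaimable_kb);
}

// Hypothetical helper: true when usable swap has dropped below the threshold.
inline bool swap_is_low(int64_t swap_free_kb, int64_t free_mem_kb,
                        int64_t easily_reclaimable_kb, int64_t total_swap_kb,
                        int low_percent) {
    assert(low_percent >= 0 && low_percent <= 100);
    return usable_swap_kb(swap_free_kb, free_mem_kb, easily_reclaimable_kb) <
           low_swap_threshold_kb(total_swap_kb, low_percent);
}
```

As a rough cross-check, the logcat output quoted in 759943643f reports "swap is low (135016kB < 150384kB)". Assuming the killinfo values 1503848 and 135016 are total swap and SwapFree, and assuming a 10% setting, low_swap_threshold_kb(1503848, 10) gives 150384, which is consistent with that log line when free memory does not limit the estimate.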


76 Commits