77 Commits

Author SHA1 Message Date
Suren Baghdasaryan 0ac96fcb1a lmkd: Kill cached apps when thrashing or below low watermark
With Android U removing AMS kills, lmkd has additional duty to kill
cached apps which previously were killed by AMS. The former logic is
not proactive enough and leads to too many cached apps contributing
to memory pressure.
Implement additional logic to kill cached apps (excluding previous
foreground apps) when low watermark is breached or when device is
thrashing.

Bug: 300660611
Change-Id: I356eac1fe6d44dad292a7ea2fadee69a5be61479
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
2023-10-10 15:52:24 -07:00
Suren Baghdasaryan ab906fb0ce lmkd: Change critical thrashing limit to 3x of normal one
As a result of experiments, the default relation between critical and
normal thrashing limits has been shown to be insufficient. Increase the
relation from 2x to 3x.

Bug: 194316048
Change-Id: I19877e0df56be07f3f503688f408f5f91f4b1e67
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
2023-10-05 15:07:04 +00:00
Kalesh Singh f6f744fcc9 Merge "lmkd: Remove uses of hardcoded 4k PAGE_SIZE macro" into main 2023-08-09 00:18:34 +00:00
Kalesh Singh 5d397582ac lmkd: Remove uses of hardcoded 4k PAGE_SIZE macro
Use getpagesize() to query the real page size instead.

Bug: 294618124
Test: m
Change-Id: If9046f36412a54ba08b94cf3b43cd7bf9c1f26b5
2023-08-08 15:58:16 -07:00
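As a rough illustration of the change above, a page count can be converted to kilobytes using the page size reported at run time rather than a compile-time 4 KiB constant. The helper name `pages_to_kb` is hypothetical and is not taken from lmkd.

```cpp
#include <cstdint>
#include <unistd.h>

// Hypothetical helper (not from lmkd): convert a page count to KiB using
// the page size reported at run time instead of a hardcoded 4096.
int64_t pages_to_kb(int64_t pages) {
    const int64_t page_kb = getpagesize() / 1024;
    return pages * page_kb;
}
```

On a 16 KiB-page kernel the same call scales by 16 instead of 4, which is the case a hardcoded macro gets wrong.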
Suren Baghdasaryan 4d8791b1f1 lmkd: check pgrefill vmstat when deciding active reclaim
In rare cases it's possible that pgscan is not changing because inactive
LRU is empty and can't be refilled from the active LRU due to all
pages being hot. In such conditions pgscan_kswapd/pgscan_direct will
not change while pgrefill will be increasing due to active LRU being
scanned. Lmkd would incorrectly treat this situation as if no reclaim
activity happened.
Change lmkd to check pgrefill as well to detect such conditions.

Bug: 288383787
Change-Id: I6b49607429e2f673bba2645ccddff1a141afbcd1
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
2023-07-28 20:48:15 +00:00
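The check described in the commit above could look roughly like the sketch below. The struct and function names are assumptions made for illustration; only the counter names (pgscan_kswapd, pgscan_direct, pgrefill) come from the commit message.

```cpp
#include <cstdint>

// Hypothetical snapshot of the /proc/vmstat counters named in the commit.
struct VmstatSnapshot {
    int64_t pgscan_kswapd;
    int64_t pgscan_direct;
    int64_t pgrefill;
};

// Reclaim is considered active if either scan counter moved, or if only
// the active LRU was scanned (pgrefill moved while pgscan_* stayed flat).
bool reclaim_in_progress(const VmstatSnapshot& prev, const VmstatSnapshot& cur) {
    return cur.pgscan_kswapd != prev.pgscan_kswapd ||
           cur.pgscan_direct != prev.pgscan_direct ||
           cur.pgrefill != prev.pgrefill;
}
```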
Lee George Thomas 1847e9d7ab Add a configuration to delay monitor initialization
To save CPU cycles during boot on low-resource devices, a new
configuration is added to delay initialization of monitoring until boot
is complete.

Bug: 288566858
Test: Build, boot and verified boot logs to confirm the behavior.
Merged-In: I17cfbf4c7f83bc80dd92a99dfb0254a7e20289be

Change-Id: I17cfbf4c7f83bc80dd92a99dfb0254a7e20289be
2023-07-19 19:46:12 +00:00
Suren Baghdasaryan 5860e852f8 lmkd: remove unused LMK_STAT_STATE_CHANGED notification
The LmkStateChanged atom was historically used to mark lmk activity
and trigger additional stats polling. For more than a year this has
not been used at all (as statsd supported event-based triggering).
Remove unnecessary functionality.

Bug: 278174420
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Change-Id: I9f7f56711fabb751cf7a57ea7279759bcc4a3dff
2023-05-19 14:08:10 -07:00
Kameron Lutes 556740ef04 lmkd: Send Actual OOM Score to lmkd_free_memory_before_kill_hook
Previously the min_oom score of the candidate search was sent to
lmkd_free_memory_before_kill_hook. This is incorrect as the hook expects
the actual oom score of the process.

Bug: b/273670531
Test: cq
Change-Id: Id72c8b39f9c745a8f20fde15266857cb2d2222bf
2023-03-22 00:33:30 +00:00
Suren Baghdasaryan 495db5c643 lmkd: measure free swap as available and easily reclaimable memory
In the case of ZRAM, SwapFree does not represent the actual free swap
amount because swap space is taken from the free memory or reclaimed.
Therefore use free memory and easily reclaimable memory as an
approximation of how much free swap system can use. Use SwapFree as
a measure of how much swap space the system will consider using. Min
of those two measurements is used to decide how much usable swap the
system still has.

Bug: 238495258
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Change-Id: Ia7b0cc4a744d14c0d6e52603795917cf5824ea15
2022-10-04 12:53:23 -07:00
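The "min of those two measurements" rule from the commit above can be sketched as follows. The function and parameter names are hypothetical, and how lmkd actually derives "easily reclaimable" memory is not shown in the commit message, so it is passed in as a plain input here.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical helper (names are assumptions): usable swap is limited both
// by what the system is willing to swap (SwapFree) and by the memory that
// ZRAM could actually take (free + easily reclaimable memory).
int64_t usable_swap_kb(int64_t swap_free_kb, int64_t free_mem_kb,
                       int64_t easily_reclaimable_kb) {
    return std::min(swap_free_kb, free_mem_kb + easily_reclaimable_kb);
}
```

For example, with 1000 kB of SwapFree but only 200 kB free and 300 kB easily reclaimable, this sketch reports 500 kB of usable swap.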
Suren Baghdasaryan ba9ea6e3d6 lmkd: Fix UAF caused by calling pid_remove() from the watchdog thread
pid_remove() is not a thread-safe function and can be called only from
the main thread. Calling it from the watchdog_callback() executed in the
context of the watchdog thread can cause a use-after-free failure if the
same record is being used by the main thread. Fix this issue by marking
the process record as invalid instead of destroying it. Later on invalid
records will be cleared in kill_one_process() called from the context of
the main thread.

Fixes: f8727745f9 ("lmkd: Remove process record after it is killed by lmkd watchdog")
Bug: 248448498
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Change-Id: I0c7776aea1518c17f0a29904a44b7fe8f33980ca
2022-09-27 14:30:34 -07:00
Suren Baghdasaryan c555ec6eeb lmkd: Remove process record after it is killed by lmkd watchdog
After lmkd watchdog kills a process, its record should be removed.

Bug: 243567425
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Change-Id: I70bb2a432df8088ebc9865fbc36b065738248d19
2022-08-23 15:28:53 -07:00
Ioannis Ilkos b9d0592bba Remove kill_one_process tracepoint from lmkd
We already emit a richer slice (including pid and oom score) so there's
no reason for the additional print event

Bug: 195085238
Change-Id: I1140f0287934e5f0abdeeb64554a249c4c940def
2022-08-04 14:45:24 +01:00
Ken Chen 310fa3ab1b Merge "Rename gpu_mem.o to gpuMem.o" 2022-07-21 13:43:50 +00:00
Ken Chen 4c18e91f22 Rename gpu_mem.o to gpuMem.o
Underscore character may cause bpf prog/map naming collision. For
example, x.o with map y_z and x_y.o with map z both result in x_y_z
prog/map name, which should be prevented during compile-time.

aosp/2147825 will prohibit underscore character in bpf source name
(source name derives the obj name). Existing bpf modules with underscore
characters in source name need to be updated accordingly.

Bug: 236706995
Test: adb root; adb shell ls -l /sys/fs/bpf/ | grep gpuMem
Change-Id: I213624e59ce1bca6ee7c22028504f2d51e9c50df
2022-07-08 19:09:50 +08:00
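The collision described in the commit above can be reproduced with a small sketch. It assumes the pinned map name is formed as "map_" + object name without ".o" + "_" + map name, which is consistent with the /sys/fs/bpf/map_gpu_mem_gpu_mem_total_map path mentioned elsewhere in this log; the function itself is hypothetical and is not the bpf loader's code.

```cpp
#include <string>

// Assumed naming scheme: "map_" + <object name without .o> + "_" + <map name>.
// Hypothetical helper for illustration only.
std::string pinned_map_name(const std::string& obj_file, const std::string& map_name) {
    std::string obj = obj_file;
    const std::string suffix = ".o";
    if (obj.size() >= suffix.size() &&
        obj.compare(obj.size() - suffix.size(), suffix.size(), suffix) == 0) {
        obj.erase(obj.size() - suffix.size());  // strip the ".o" extension
    }
    return "map_" + obj + "_" + map_name;
}
```

Under this assumption, `pinned_map_name("x.o", "y_z")` and `pinned_map_name("x_y.o", "z")` produce the same string, which is the ambiguity that banning underscores in source names removes.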
Yuming Han ed8fc168e6 lmkd : Fixed running wrong for Go devices when use_minfree_levels is TRUE
lmkd misbehaves on Go devices because min_score_adj is unused
when use_minfree_levels is set to TRUE.

Bug: 237947900
Signed-off-by: Yuming Han <yuming.han@unisoc.com>
Change-Id: I717561cd9e5f4d1a2ca60d9fc84adcd6e129420a
2022-07-07 18:24:22 +00:00
Yuming Han 79f58c012d lmkd: Fixed data overflow on ARM
Both pgscan_kswapd and pgscan_direct are defined as unsigned long, so
overflow issues can occur in ARM kernel space. Just check whether
their values changed.

Signed-off-by: Yuming Han <yuming.han@unisoc.com>
Change-Id: I73b27855ede9ca729208775e982660bae967ab92
2022-06-29 16:06:18 +08:00
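The commit message above does not show the exact expression that failed, so the following is only a sketch of the general idea: once a 32-bit counter wraps, an ordered comparison misses the change while an inequality check still sees it. `uint32_t` stands in for a 32-bit `unsigned long`, and both function names are hypothetical.

```cpp
#include <cstdint>

// uint32_t stands in for a 32-bit unsigned long kernel counter.
// Ordered comparison: misses activity once the counter wraps around.
bool counter_advanced_naive(uint32_t prev, uint32_t cur) {
    return cur > prev;
}

// Inequality check: any change, including a wrap, counts as activity.
bool counter_changed(uint32_t prev, uint32_t cur) {
    return cur != prev;
}
```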
Suren Baghdasaryan caebcddf9f lmkd: Fix the size of vmstat_field_names array
The size of the vmstat_field_names array should correspond to the number
of elements in vmstat_field enum (VS_FIELD_COUNT).

Bug: 227769256
Reported-by: Yuming Han <yuming.han@unisoc.com>
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Change-Id: Icac2810c4efca2a07cefba6e220165ef4f194867
2022-06-22 23:10:47 +00:00
Kameron Lutes e9769f7cf3 lmkd: Fix potential null dereference in hook call
If hooks are enabled in LMKD and kill_info is not supplied to
kill_one_process, there will be a null dereference on kill_info. This
change validates ki before dereferencing.

Bug: b/210075795
Test: cq
Change-Id: Ie81ca9bdb73a71f16dc5682c8721a557b8b094fb
Merged-In: Ie81ca9bdb73a71f16dc5682c8721a557b8b094fb
2022-06-22 03:40:33 +00:00
Kameron Lutes 2cce3066da lmkd: Add hooks to LMKD
Adds several hooks to LMKD that can be overridden by the vendor. This
allows for device specific control of LMKD when necessary.

Bug: b/210075795
Test: cq

Change-Id: Ib231743183134b05148d45d681765860da6274ae
(cherry picked from commit 2c1248381a52fc520c6cd1acfaee80818eaa9ee1)
Merged-In: Ib231743183134b05148d45d681765860da6274ae
2022-06-22 03:35:23 +00:00
Yongqin Liu bf819b5593 lmkd: fix the cgroup attribute name to MemCgroupEventControl
It was CgroupEventControl before, which is not the name that is
defined in system/core/libprocessgroup/profiles/task_profiles.json.

This causes lmkd to crash on some setups, such as 4.19q + AOSP master.

Bug: 230642311

Test: boot the 4.19q + AOSP master setup with hikey960 board

Signed-off-by: Yongqin Liu <yongqin.liu@linaro.org>
Change-Id: I87b1ea2040f21c52d549db58692fc8a2b114f8e6
2022-04-29 18:50:20 +08:00
전윤재 f70a8a260f lmkd: Fix a comparison operation with uninitialized variable.
Prevent comparing uninitialized poll_start_tm with curr_tm in call_handler().
The bug caused by this has been fixed by commit d816ab, but the
underlying problem is not fixed yet and may cause problems later
if other operations are added to this if block.

Change-Id: Id13318297a2cbf2f9784134a2ccd648cc221e8c4
Signed-off-by: Yoonjae Jeon <yj213.jeon@samsung.com>
2022-04-25 00:51:47 +00:00
Bart Van Assche 759943643f lmkd: Add support for cgroups v2 memcg hierarchy
Use the /sys/fs/cgroup/uid_%u/pid_%u path instead of
/dev/memcg/apps/uid_%u/pid_%u if the memcg controller is mounted in the
v2 cgroup hierarchy. Skip the code that refers to memcg attributes that
only exist in the cgroup v1 hierarchy when using the v2 memcg. Complain
if the old kill strategy is used in combination with memcg v2, since
only the new strategy is compatible with the v2 cgroup hierarchy.

Bug: 213617178
Test: Tested lmkd inside the Cuttlefish emulator. Triggered an
Test: out-of-memory condition as follows:
Test: i=0; while [ $i -lt 16 ]; do dd if=/dev/zero of=/dev/null bs=1G count=1 & ((i++)) done
Test: That caused the following output to appear in logcat:
Test: 02-03 18:13:02.772   241   241 I lowmemorykiller: Kill 'com.android.packageinstaller' (3031), uid 10022, oom_score_adj 975 to free 29348kB rss, 21232kB swap; reason: min watermark is breached and swap is low (135016kB < 150384kB)
Test: 02-03 18:13:02.772   241   241 I killinfo: [3031,10022,975,0,29348,3,73644,98460,8,1000,33448,40220,1503848,135016,81720,1387040,27268,29520,50348,103164,20176,57664,0,0,0,519,103,5,0,21232,30016,0,2]
Test: From the kernel log:
Test: [  302.834958] Out of memory: Killed process 3017 (ADB-JDWP Connec) total-vm:13522856kB, anon-rss:0kB, file-rss:0kB, shmem-rss:560kB, UID:10052 pgtables:1008kB oom_score_adj:975
Test: [  303.223702] Out of memory: Killed process 2859 (HeapTaskDaemon) total-vm:13534452kB, anon-rss:0kB, file-rss:0kB, shmem-rss:560kB, UID:10051 pgtables:1060kB oom_score_adj:965
Test: [  303.478833] Out of memory: Killed process 2816 (Signal Catcher) total-vm:13524108kB, anon-rss:0kB, file-rss:0kB, shmem-rss:564kB, UID:10073 pgtables:1016kB oom_score_adj:965
Test: [  304.823796] Out of memory: Killed process 2438 (ReferenceQueueD) total-vm:13529180kB, anon-rss:0kB, file-rss:0kB, shmem-rss:568kB, UID:10015 pgtables:1056kB oom_score_adj:955
Test: [  305.226728] Out of memory: Killed process 3126 (DefaultDispatch) total-vm:13532596kB, anon-rss:0kB, file-rss:0kB, shmem-rss:528kB, UID:10019 pgtables:1064kB oom_score_adj:945
Test: [  305.935615] Out of memory: Killed process 2637 (Jit thread pool) total-vm:13523084kB, anon-rss:0kB, file-rss:0kB, shmem-rss:0kB, UID:10024 pgtables:1036kB oom_score_adj:935
Test: [  307.055895] Out of memory: Killed process 2063 (HeapTaskDaemon) total-vm:13755876kB, anon-rss:0kB, file-rss:0kB, shmem-rss:11600kB, UID:10074 pgtables:1552kB oom_score_adj:0
Test: [  307.398512] Out of memory: Killed process 2298 (rs.media.module) total-vm:13560404kB, anon-rss:12924kB, file-rss:0kB, shmem-rss:720kB, UID:10076 pgtables:1156kB oom_score_adj:-700
Test: [  309.888679] Out of memory: Killed process 1745 (droid.bluetooth) total-vm:13720424kB, anon-rss:11084kB, file-rss:0kB, shmem-rss:208kB, UID:1002 pgtables:1220kB oom_score_adj:-700
Test: [  311.050133] Out of memory: Killed process 1759 (ndroid.systemui) total-vm:13852268kB, anon-rss:26468kB, file-rss:0kB, shmem-rss:6988kB, UID:10075 pgtables:1628kB oom_score_adj:-800
Test: [  313.520057] Out of memory: Killed process 2000 (m.android.phone) total-vm:13593404kB, anon-rss:14600kB, file-rss:236kB, shmem-rss:340kB, UID:1001 pgtables:1312kB oom_score_adj:-800
Test: [  314.337744] Out of memory: Killed process 1911 (rkstack.process) total-vm:13540420kB, anon-rss:8232kB, file-rss:0kB, shmem-rss:180kB, UID:1073 pgtables:1128kB oom_score_adj:-800
Test: [  314.745542] Out of memory: Killed process 1984 (com.android.se) total-vm:13525236kB, anon-rss:6944kB, file-rss:0kB, shmem-rss:188kB, UID:1068 pgtables:1012kB oom_score_adj:-800
Test: [  315.007212] Out of memory: Killed process 677 (system_server) total-vm:15049836kB, anon-rss:1856kB, file-rss:0kB, shmem-rss:240kB, UID:1000 pgtables:2084kB oom_score_adj:-900
Change-Id: I78821bcd332af7b3f642f037faa2df15a937dc26
Signed-off-by: Bart Van Assche <bvanassche@google.com>
2022-03-29 13:38:04 -07:00
Bart Van Assche b4d26bb22b lmkd: Look up cgroup attribute paths instead of hardcoding these
Retrieve cgroup attribute paths from task_profiles.json instead of
hardcoding these paths.

Bug: 213617178
Test: Tested lmkd in Cuttlefish.
Change-Id: I03f40ac8ccd4635f21432214e1acf997c505d1e9
Signed-off-by: Bart Van Assche <bvanassche@google.com>
2022-03-24 15:58:02 -07:00
Bart Van Assche 5ebc4e8f51 lmkd: Fix a potential buffer overflow
Prevent the statement that writes '\0' after the data that was read
from writing past the end of the buffer.

Bug: 213617178
Test: Compile-tested only.
Change-Id: I6922c343a6bcb52dce0b5cf54f09b2850e9dfde2
Signed-off-by: Bart Van Assche <bvanassche@google.com>
2022-02-23 17:05:11 +00:00
Bart Van Assche 545067957c lmkd: Use std::array<> and remove the ARRAY_SIZE() definition
Using ARRAY_SIZE() on a pointer yields 1 while applying .size() to a
pointer triggers a compiler error. Hence use .size() instead of
ARRAY_SIZE().

Bug: 213617178
Test: Compile-tested only.
Change-Id: Ie0f9740f59470c943f8d62b9475f7f987ed8707b
Signed-off-by: Bart Van Assche <bvanassche@google.com>
2022-02-23 17:05:11 +00:00
Bart Van Assche 80a3dba57a lmkd: Use std::min() and std::max() instead of defining min() and max() macros
std::min() and std::max() check whether the argument types match but
min() and max() do not. Hence switch to std::min() and std::max().

Bug: 213617178
Test: Compile-tested only.
Change-Id: Iaf1f63360c9360938db56d485dbd1e504129c52d
Signed-off-by: Bart Van Assche <bvanassche@google.com>
2022-02-23 17:05:10 +00:00
Suren Baghdasaryan 014dd7156e lmkd: Log psi averages in killinfo reports
Add psi memory, io and cpu 10sec averages into killinfo reports.

Bug: 205182133
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Change-Id: Ifbcf47cab291fb20dbf0b5d32f1965f4e6462b49
2022-02-21 20:30:50 -08:00
Suren Baghdasaryan 5ae47a9563 lmkd: Allow killing perceptible apps when recorded stall is too high
When system is under heavy memory pressure the system might be able
to keep free memory above the min watermark avoiding perceptible app
kills. In such situation system might end up using all its cpu
capacity on memory reclaim and not doing productive work. To detect
this condition, check memory full stall and compare it with the new
ro.lmk.stall_limit_critical tunable representing the stall threshold.
When the recorded level is over ro.lmk.stall_limit_critical, lmkd will
be allowed to kill perceptible apps. ro.lmk.stall_limit_critical
represents the max memory full stall in % that is allowed before
perceptible apps will get killed. By default it is set to 100%, which
effectively disables the feature.
Currently system stall is measured based on psi memory stall 10s average
value, however this definition might change in the future if better
metrics are developed. Setting ro.lmk.stall_limit_critical to 5 means
the system should be fully stalled (no productive work is done) for 5%
of the 10sec period, resulting in 0.5 sec loss due to the stall.

Bug: 205182133
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Change-Id: I9713e30d82641d86d1b7edb5e1ba2971b935c898
2022-02-21 20:27:45 -08:00
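The arithmetic in the commit above (a 5% limit over a 10-second window is 0.5 seconds of full stall) can be written out as a small sketch. The function names are hypothetical; the 10-second window and the "over the limit" comparison follow the commit text.

```cpp
// Seconds of full stall implied by a limit given in percent of the window.
// The 10 s default mirrors the psi 10s average mentioned in the commit.
double stall_seconds(double limit_pct, double window_sec = 10.0) {
    return window_sec * limit_pct / 100.0;
}

// Hypothetical check: perceptible apps become killable only when the
// measured full stall is strictly over the configured limit.
bool may_kill_perceptible(double mem_full_avg10_pct, double stall_limit_critical_pct) {
    return mem_full_avg10_pct > stall_limit_critical_pct;
}
```

With the default limit of 100, the comparison can never succeed because a percentage average cannot exceed 100, which is why the default effectively disables the feature.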
Suren Baghdasaryan 2bdf7f0c74 lmkd: Set task profiles to the entire process
set_process_group_and_prio sets task profiles for each thread in the
process separately. This can be avoided by setting task profiles to
the entire process using its pid.

Bug: 215557553
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Change-Id: I9c1917172019a42809385f6c9c084b8cb343b520
2022-01-24 13:43:42 -08:00
Suren Baghdasaryan af1b0e0627 lmkd: Implement watchdog thread
To detect lmkd being stuck on a syscall for prolonged period of time,
introduce a watchdog thread which gets set when lmkd starts handling of
events and is reset after handling is done. If it takes more than the
timeout period (2 sec) to handle an event, watchdog wakes up and kills
the least important process to prevent mounting memory pressure caused
by lmkd lockup. After a kill, watchdog will wait for the reset for
another timeout period and kill again. This repeats until lmkd unlocks
and resets the watchdog.

Bug: 201671997
Test: induce random sleep in lmkd main handler and observe watchdog kills
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Change-Id: I56a55834582e11c06cc6cf9da3bc7380e634b301
2022-01-06 18:14:25 +00:00
Suren Baghdasaryan 7c3addb2a1 lmkd: Use process_mrelease to reap the target process from a thread
process_mrelease syscall can be used to expedite memory release of
a process after it was killed. This allows memory to be released
without the target process being scheduled, therefore does not depend
on target's priority or the CPU it's running on.
However process_mrelease syscall can take considerable time. Blocking
lmkd main thread during that time can cause memory pressure events
being missed while lmkd is busy reaping previous target's memory.
For this reason reaping should be done in a separate thread. This way
lmkd main thread can keep monitoring memory pressure while memory is
being released.
Introduce Reaper class which maintains a pool of threads to perform
process killing and reaping. The main thread submits a request to the
Reaper to kill and reap the process without blocking. If all the threads
in the pool are busy at the time the next kill is needed, the kill is
performed by the main thread without reaping.

Bug: 130172058
Bug: 189803002
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Change-Id: If7b10fdd1838bdfeea3fed3031565feffe0b52be
2022-01-06 18:14:14 +00:00
liuhailong cf8af501dc lmkd: Fix lowmem_minfree out of bounds
The lmkd daemon launches before system_server. If lowmem_targets_size
has not been initialized by system_server, its value is zero.
If lmkd receives a psi event before system_server starts
and debug_process_killing is on, lmkd crashes here.

Bug: 209090314
Signed-off-by: liuhailong <liuhailong@oppo.com>
Change-Id: I0736a882ed1ff5eee2b07676ae590a2cb2a7721c
2021-12-04 17:59:24 +08:00
Suren Baghdasaryan 6e6d14b387 lmkd: fix low swap threshold failing to update after reinit
lmkd calculates low swap threshold using total available swap and
ro.lmk.swap_free_low_percentage property. A wrong assumption is made that
both these values are constant and therefore the threshold can be
calculated once and reused later. However ro.lmk.swap_free_low_percentage
can be changed by the user and lmkd --reinit issued to reapply new
configuration. If that happens low swap threshold will not be updated.
Fix this by calculating the threshold whenever it is used. The overhead
of that calculation is negligible.

Bug: 203161607
Test: setprop ro.lmk.swap_free_low_percentage <new value>; lmkd --reinit
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Change-Id: Idff50655a75d006ea86d9ab10ca54c375c4bea46
2021-10-28 13:30:08 -07:00
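The fix described in the commit above amounts to computing the threshold at the point of use instead of caching it. A minimal sketch, assuming the threshold is simply the configured percentage of total swap; the function and parameter names are hypothetical:

```cpp
#include <cstdint>

// Hypothetical helper: recompute on every use so that a changed
// swap_free_low_percentage takes effect after "lmkd --reinit".
int64_t low_swap_threshold_kb(int64_t total_swap_kb, int swap_free_low_percentage) {
    return total_swap_kb * swap_free_low_percentage / 100;
}
```

The calculation is one multiplication and one division, which fits the commit's remark that the overhead is negligible.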
Wei Wang 730e7a9248 lmkd: move to foreground cpuset before killing
Test: Build and boot
Bug: 199797672
Signed-off-by: Wei Wang <wvw@google.com>
Change-Id: Id475625e0d892fb7111a2cf054d1b57d17003d5a
2021-09-30 23:24:11 -07:00
Wei Wang 0162e0361f lmkd: use fd cache for cgroup migration
Test: Build
Bug: 199797672
Signed-off-by: Wei Wang <wvw@google.com>
Change-Id: Ie7a9eb9676c58309f1407c5f8cc59b302f107d38
2021-09-21 14:38:49 -07:00
Wei Wang 0195bcdba7 lmkd: migrate process to FOREGROUND sched group before kill
BG group may have settings such as cpu.shares impacting reclaim
performance. Let us migrate task to foreground sched group similarly to
cpuset group.

Test: Build
Bug: 199797672
Signed-off-by: Wei Wang <wvw@google.com>
Change-Id: I75ee9f3486a2c76e65267a98e39edff96a5e1673
2021-09-13 19:07:26 -07:00
Suren Baghdasaryan 0e64eadc21 lmkd: Do not re-initialize lmkd when persistent properties are loaded
When a device boots, lmkd starts before persistent properties are loaded,
therefore if experiments set any flags, the corresponding persistent
properties will trigger change notifications when they are first loaded
during boot.
In order to prevent lmkd from re-initializing on every property load,
mark persistent property change by setting lmkd.reinit to 0 and delay
lmkd re-initialization until sys.boot_completed=1 when all properties
are set and only one re-initialization will capture them all. On devices
with no experiment flags being set lmkd.reinit will be undefined at the
boot completion time and re-initialization will not be triggered at all.

Bug: 194316048
Test: adb shell device_config put lmkd_native thrashing_limit_critical 350
Test: adb shell device_config put lmkd_native thrashing_limit 100
Test: adb reboot; adb -b all logcat | grep lmkd
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Change-Id: Iba34fc719a18d58b890549c7415bec869d471901
2021-09-01 00:56:35 -07:00
Suren Baghdasaryan d0a800402c lmkd: Add support for persist.device_config.lmkd_native.* properties
Allow persist.device_config.lmkd_native.* to override ro.lmk.*
properties to enable experiments with lmkd configuration properties.
Experiments will be able to set appropriate
persist.device_config.lmkd_native.<name> property which will issue
"lmkd --reinit" command to reinitialize lmkd with new parameters.

Bug: 194316048
Test: adb shell device_config put lmkd_native thrashing_limit_critical 350
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Change-Id: Ia48fd51eab126d307a1604530b642e86cf250688
2021-08-31 09:20:46 -07:00
Suren Baghdasaryan 39b54809fb lmkd: Add thrashing and max_thrashing into killinfo reports
Due to the increased importance of thrashing limits, include current and
max thrashing levels into killinfo reports.

Bug: 195979894
Test: lmkd_unit_test
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Change-Id: I36f947e45e03a4d845d18881e137e4b242aacb65
2021-08-09 15:10:46 -07:00
George Burgess IV e849f1414e lmkd: fix potential NULL pointer dereference
`ki` appears to be potentially NULL. Output bogus values if it is.

Caught by the static analyzer:
> system/memory/lmkd/lmkd.cpp:2171:66: warning: Access to field
'kill_reason' results in a dereference of a null pointer (loaded from
variable 'ki') [clang-analyzer-core.NullDereference]

Bug: None
Test: TreeHugger
Change-Id: Iae26855528e1f7fec8f1455e06c7e813a732dc75
2021-08-05 06:59:42 +00:00
Suren Baghdasaryan 34928bb817 lmkd: Add a tracepoint for each kill with kill parameters
Add a trace for each kill that includes pid, kill reason, oom_adj_score,
min_oom_score and max_thrashing statistics at the time of the kill.

Bug: 195085238
Test: generate kills while tracing and observe the new tracepoints
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Change-Id: Ic2014adc08f5e5dd4aacd415970332618bd15250
2021-07-30 12:59:15 -07:00
Suren Baghdasaryan e16047516d lmkd: Add current and max thrashing levels in LMK_MEMORY_STATS reports
Thrashing threshold tuning requires collecting thrashing level data from
the field and correlating these levels with other indications of device
being non-responsive.
Include current and max thrashing levels in the lmkd kill reports. Max
thrashing level captures the highest level seen since the last kill report.

Bug: 194433891
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Change-Id: I8a34dc41e7f03668bfad4ac2cbcb5d2570a10752
Merged-In: I8a34dc41e7f03668bfad4ac2cbcb5d2570a10752
2021-07-23 19:11:36 +00:00
Suren Baghdasaryan 1ef4718aed Revert "lmkd: Disable critical thrashing limit by default"
This reverts commit e1ffef4e36.

Reason for revert: Restore the default thrashing limits to prevent unresponsive devices.

Bug: 194199500
Change-Id: I15be5b3d67a71b68bca6dea9c2d5b4aa54d6c260
Merged-In: I15be5b3d67a71b68bca6dea9c2d5b4aa54d6c260
2021-07-23 12:01:55 -07:00
Suren Baghdasaryan c1171394a3 lmkd: Disable critical thrashing limit by default
Critical thrashing limit determines the balance between how much
thrashing should be tolerated before killing a perceptible app.
This threshold might differ between devices, therefore we disable
critical thrashing limit by default allowing each device to set it
individually. This is done to prevent excessive kills of perceptible
apps.

Bug: 194199500
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Change-Id: Idd1715564c3727b09fcb0a109ab3d6bae9d0b99a
2021-07-20 18:12:22 +00:00
Suren Baghdasaryan 11221d4062 lmkd: Add ro.lmk.filecache_min_kb property for min filecache watermark
We see many cases when device keeps thrashing despite lmkd kills. This
happens because killed processes do not free enough filecache to fit
the current workingset completely.
To prevent such cases, introduce ro.lmk.filecache_min_kb property to
specify min filecache size in KB that should be reached after thrashing
is detected. Lmkd will keep killing background processes until this
filecache size limit is satisfied.

Bug: 193293513
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Change-Id: I49ca4cd2f33b27fdbc432d9ce6944b1a1794b749
2021-07-15 11:05:09 -07:00
Suren Baghdasaryan 940e7cf8bd lmkd: Include total GPU memory usage in killinfo reports
/sys/fs/bpf/map_gpu_mem_gpu_mem_total_map BPF map exposes total GPU
allocations size. Include this value into killinfo reports to track GPU
allocation size at the time of the kill.

Bug: 189366037
Test: lmkd_unit_test
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Change-Id: Icc1ed8ab2593530fa293ff9c82f6c8dc400485f5
2021-06-03 15:56:52 -07:00
Vova Sharaienko a92b76b54d lmkd: reroute atoms logging to AMS
- Added new lmkd message for clients to subscribe LMK_ASYNC_EVENT_STAT
- Added support to write kill & mem stats information via data socket
  to be read & parsed on the AMS Java side for future logging to statsd

Bug: 184698933
Test: lmkd_unit_test - test check_for_oom tests lmkd message send to AMS
Test: statsd_testdrive 51 54 to inspect statsd logged atoms data
Change-Id: Id682a438c87b3e4503261d26461f6cee641d86c4
Merged-In: Id682a438c87b3e4503261d26461f6cee641d86c4
2021-05-11 00:00:56 +00:00
Suren Baghdasaryan 5263aa7800 lmkd: Do not treat RSS=0 as a sign of a process being dead
With kernel SPLIT_RSS_COUNTING feature it is possible for a valid
process to report RSS of 0 size when reading /proc/pid/statm. This
happens because split RSS accounting aggregates per-thread counters
asynchronously and depending on the timing of the read, reported
value can be inaccurate and occasionally be 0.
lmkd currently treats processes reporting RSS of 0 as dead and
removes them from the list of processes being tracked. This might
lead to a valid process becoming unkillable.
Change lmkd to stop treating RSS of 0 as a sign of a dead process.

Bug: 160199622
Test: set ro.lmk.kill_heaviest_task=true and hack kernel to report RSS=0
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Change-Id: Ia311d2f98649c92d1a487657f94ea51f57813b73
2021-04-29 15:33:06 -07:00
Suren Baghdasaryan 9f1be12b9a lmkd: Handle cases when proc_get_name() might return NULL
proc_get_name() can return NULL if the corresponding process has died
or open fails with ENOMEM due to memory shortages.
Ensure such cases are handled without NULL pointer access.

Bug: 186157675
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Change-Id: I05b288e3808bec0bdb73db32de02ba3a322ca6e1
2021-04-23 21:18:18 +00:00
Suren Baghdasaryan 0142b3c166 lmkd: Allow lmkd to kill perceptible apps during heavy thrashing
Occasionally a system can get into heavy file cache thrashing situation
and become unresponsive. In these situations we observe lmkd wakeups,
however it does not kill because all non-perceptible apps are already
killed and the system manages to reclaim enough memory to stay above
min watermark.
Add ro.lmk.thrashing_limit_critical property which when breached will
allow lmkd to kill perceptible apps. The property represents the
percentage of refaulted workingset pages as a fraction of overall file
cache size. By default it is disabled.

Bug: 181778155
Test: thrashing.py 500 10 200
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Change-Id: Icb38ef6c90adaa4f5c956593b6ea0c4febc91dc0
2021-03-25 17:00:09 -07:00
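The property in the commit above is defined as refaulted workingset pages expressed as a percentage of file cache size. A sketch of that ratio and of the limit check follows; the function names and the zero-size guard are assumptions for illustration, not lmkd code.

```cpp
#include <cstdint>

// Refaulted workingset pages as a percentage of the file cache size.
int64_t thrashing_percent(int64_t refault_delta_pages, int64_t file_cache_pages) {
    if (file_cache_pages <= 0) return 0;  // guard added for this sketch only
    return refault_delta_pages * 100 / file_cache_pages;
}

// Hypothetical check: perceptible apps become killable once the measured
// thrashing level is over the critical limit.
bool exceeds_critical_limit(int64_t thrashing_pct, int64_t thrashing_limit_critical_pct) {
    return thrashing_pct > thrashing_limit_critical_pct;
}
```

Values above 100 are possible when more pages refault than the file cache currently holds, which is why limits such as 350 appear in the test commands elsewhere in this log.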