Commit Graph

114 Commits

Author SHA1 Message Date
Ken Chen 4c18e91f22 Rename gpu_mem.o to gpuMem.o
Underscore character may cause bpf prog/map naming collision. For
example, x.o with map y_z and x_y.o with map z both result in x_y_z
prog/map name, which should be prevented during compile-time.

aosp/2147825 will prohibit underscore character in bpf source name
(source name derives the obj name). Existing bpf modules with underscore
characters in source name need to be updated accordingly.

Bug: 236706995
Test: adb root; adb shell ls -l /sys/fs/bpf/ | grep gpuMem
Change-Id: I213624e59ce1bca6ee7c22028504f2d51e9c50df
2022-07-08 19:09:50 +08:00
Yuming Han ed8fc168e6 lmkd : Fixed running wrong for Go devices when use_minfree_levels is TRUE
The lmkd run wrong for Go devices, because min_score_adj is unused
when use_minfree_levels is set TURE.

Bug: 237947900
Signed-off-by: Yuming Han <yuming.han@unisoc.com>
Change-Id: I717561cd9e5f4d1a2ca60d9fc84adcd6e129420a
2022-07-07 18:24:22 +00:00
Yuming Han 79f58c012d lmkd: Fixed data overflow on ARM
Both pgscan_kwsapd and pgscan_direct are defined as unsigned long,
the overflow issues occur on ARM kernel space. Just check whether
their values changed.

Signed-off-by: Yuming Han <yuming.han@unisoc.com>
Change-Id: I73b27855ede9ca729208775e982660bae967ab92
2022-06-29 16:06:18 +08:00
Suren Baghdasaryan caebcddf9f lmkd: Fix the size of vmstat_field_names array
The size of the vmstat_field_names array should correspond to the number
of elements in vmstat_field enum (VS_FIELD_COUNT).

Bug: 227769256
Reported-by: Yuming Han <yuming.han@unisoc.com>
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Change-Id: Icac2810c4efca2a07cefba6e220165ef4f194867
2022-06-22 23:10:47 +00:00
Kameron Lutes e9769f7cf3 lmkd: Fix potential null dereference in hook call
If hooks are enabled in LMKD and kill_info is not supplied to
kill_one_process, there will be a null dereference on kill_info. This
changes validates ki before dereferencing.

Bug: b/210075795
Test: cq
Change-Id: Ie81ca9bdb73a71f16dc5682c8721a557b8b094fb
Merged-In: Ie81ca9bdb73a71f16dc5682c8721a557b8b094fb
2022-06-22 03:40:33 +00:00
Kameron Lutes 2cce3066da lmkd: Add hooks to LMKD
Adds several hooks to LMKD that can be overridden by the vendor. This
allows for device specific control of LMKD when necessary.

Bug: b/210075795
Test: cq

Change-Id: Ib231743183134b05148d45d681765860da6274ae
(cherry picked from commit 2c1248381a52fc520c6cd1acfaee80818eaa9ee1)
Merged-In: Ib231743183134b05148d45d681765860da6274ae
2022-06-22 03:35:23 +00:00
Yongqin Liu bf819b5593 lmkd: fix the cgroup attribute name to MemCgroupEventControl
which was CgroupEventControl before, but it's not the one
that definied in system/core/libprocessgroup/profiles/task_profiles.json.

And it causes lmkd crash for some setups like 4.19q + AOSP Master

Bug: 230642311

Test: boot the 4.19q + AOSP master setup with hikey960 board

Signed-off-by: Yongqin Liu <yongqin.liu@linaro.org>
Change-Id: I87b1ea2040f21c52d549db58692fc8a2b114f8e6
2022-04-29 18:50:20 +08:00
전윤재 f70a8a260f lmkd: Fix a comparison operation with uninitialized variable.
Prevent comparing uninitialized poll_start_tm with curr_tm in call_handler().
The bug caused by this has been fixed by the commit: d816ab.
But the main bug is not fixed yet and it may cause problem
in later if we add another operations in this if block.

Change-Id: Id13318297a2cbf2f9784134a2ccd648cc221e8c4
Signed-off-by: Yoonjae Jeon <yj213.jeon@samsung.com>
2022-04-25 00:51:47 +00:00
Bart Van Assche 759943643f lmkd: Add support for cgroups v2 memcg hierarchy
Use the /sys/fs/cgroup/uid_%u/pid_%u path instead of
/dev/memcg/apps/uid_%u/pid_%u" if the memcg controller is mounted in the
v2 cgroup hierarchy. Skip the code that refers to memcg attributes that
only exist in the cgroup v1 hierarchy when using the v2 memcg. Complain
if it is attempted to use the old kill strategy in combination with
memcg v2 since only the new strategy is compatible with the v2 cgroup
hiearchy.

Bug: 213617178
Test: Tested lmkd inside the Cuttlefish emulator. Triggered an
Test: out-of-memory condition as follows:
Test: i=0; while [ $i -lt 16 ]; do dd if=/dev/zero of=/dev/null bs=1G count=1 & ((i++)) done
Test: That caused the following output to appear in logcat:
Test: 02-03 18:13:02.772   241   241 I lowmemorykiller: Kill 'com.android.packageinstaller' (3031), uid 10022, oom_score_adj 975 to free 29348kB rss, 21232kB swap; reason: min watermark is breached and swap is low (135016kB < 150384kB)
Test: 02-03 18:13:02.772   241   241 I killinfo: [3031,10022,975,0,29348,3,73644,98460,8,1000,33448,40220,1503848,135016,81720,1387040,27268,29520,50348,103164,20176,57664,0,0,0,519,103,5,0,21232,30016,0,2]
Test: From the kernel log:
Test: [  302.834958] Out of memory: Killed process 3017 (ADB-JDWP Connec) total-vm:13522856kB, anon-rss:0kB, file-rss:0kB, shmem-rss:560kB, UID:10052 pgtables:1008kB oom_score_adj:975
Test: [  303.223702] Out of memory: Killed process 2859 (HeapTaskDaemon) total-vm:13534452kB, anon-rss:0kB, file-rss:0kB, shmem-rss:560kB, UID:10051 pgtables:1060kB oom_score_adj:965
Test: [  303.478833] Out of memory: Killed process 2816 (Signal Catcher) total-vm:13524108kB, anon-rss:0kB, file-rss:0kB, shmem-rss:564kB, UID:10073 pgtables:1016kB oom_score_adj:965
Test: [  304.823796] Out of memory: Killed process 2438 (ReferenceQueueD) total-vm:13529180kB, anon-rss:0kB, file-rss:0kB, shmem-rss:568kB, UID:10015 pgtables:1056kB oom_score_adj:955
Test: [  305.226728] Out of memory: Killed process 3126 (DefaultDispatch) total-vm:13532596kB, anon-rss:0kB, file-rss:0kB, shmem-rss:528kB, UID:10019 pgtables:1064kB oom_score_adj:945
Test: [  305.935615] Out of memory: Killed process 2637 (Jit thread pool) total-vm:13523084kB, anon-rss:0kB, file-rss:0kB, shmem-rss:0kB, UID:10024 pgtables:1036kB oom_score_adj:935
Test: [  307.055895] Out of memory: Killed process 2063 (HeapTaskDaemon) total-vm:13755876kB, anon-rss:0kB, file-rss:0kB, shmem-rss:11600kB, UID:10074 pgtables:1552kB oom_score_adj:0
Test: [  307.398512] Out of memory: Killed process 2298 (rs.media.module) total-vm:13560404kB, anon-rss:12924kB, file-rss:0kB, shmem-rss:720kB, UID:10076 pgtables:1156kB oom_score_adj:-700
Test: [  309.888679] Out of memory: Killed process 1745 (droid.bluetooth) total-vm:13720424kB, anon-rss:11084kB, file-rss:0kB, shmem-rss:208kB, UID:1002 pgtables:1220kB oom_score_adj:-700
Test: [  311.050133] Out of memory: Killed process 1759 (ndroid.systemui) total-vm:13852268kB, anon-rss:26468kB, file-rss:0kB, shmem-rss:6988kB, UID:10075 pgtables:1628kB oom_score_adj:-800
Test: [  313.520057] Out of memory: Killed process 2000 (m.android.phone) total-vm:13593404kB, anon-rss:14600kB, file-rss:236kB, shmem-rss:340kB, UID:1001 pgtables:1312kB oom_score_adj:-800
Test: [  314.337744] Out of memory: Killed process 1911 (rkstack.process) total-vm:13540420kB, anon-rss:8232kB, file-rss:0kB, shmem-rss:180kB, UID:1073 pgtables:1128kB oom_score_adj:-800
Test: [  314.745542] Out of memory: Killed process 1984 (com.android.se) total-vm:13525236kB, anon-rss:6944kB, file-rss:0kB, shmem-rss:188kB, UID:1068 pgtables:1012kB oom_score_adj:-800
Test: [  315.007212] Out of memory: Killed process 677 (system_server) total-vm:15049836kB, anon-rss:1856kB, file-rss:0kB, shmem-rss:240kB, UID:1000 pgtables:2084kB oom_score_adj:-900
Change-Id: I78821bcd332af7b3f642f037faa2df15a937dc26
Signed-off-by: Bart Van Assche <bvanassche@google.com>
2022-03-29 13:38:04 -07:00
Bart Van Assche b4d26bb22b lmkd: Look up cgroup attribute paths instead of hardcoding these
Retrieve cgroup attribute paths from task_profiles.json instead of
hardcoding these paths.

Bug: 213617178
Test: Tested lmkd in Cuttlefish.
Change-Id: I03f40ac8ccd4635f21432214e1acf997c505d1e9
Signed-off-by: Bart Van Assche <bvanassche@google.com>
2022-03-24 15:58:02 -07:00
Bart Van Assche 5ebc4e8f51 lmkd: Fix a potential buffer overflow
Prevent that the statement that writes '\0' past the read data can write
past the end of the buffer.

Bug: 213617178
Test: Compile-tested only.
Change-Id: I6922c343a6bcb52dce0b5cf54f09b2850e9dfde2
Signed-off-by: Bart Van Assche <bvanassche@google.com>
2022-02-23 17:05:11 +00:00
Bart Van Assche 545067957c lmkd: Use std::array<> and remove the ARRAY_SIZE() definition
Using ARRAY_SIZE() on a pointer yields 1 while applying .size() to a
pointer triggers a compiler error. Hence use .size() instead of
ARRAY_SIZE().

Bug: 213617178
Test: Compile-tested only.
Change-Id: Ie0f9740f59470c943f8d62b9475f7f987ed8707b
Signed-off-by: Bart Van Assche <bvanassche@google.com>
2022-02-23 17:05:11 +00:00
Bart Van Assche 80a3dba57a lmkd: Use std::min() and std::max() instead of defining min() and max() macros
std::min() and std::max() check whether the argument types match but
min() and max() not. Hence switch to std::min() and std::max().

Bug: 213617178
Test: Compile-tested only.
Change-Id: Iaf1f63360c9360938db56d485dbd1e504129c52d
Signed-off-by: Bart Van Assche <bvanassche@google.com>
2022-02-23 17:05:10 +00:00
Suren Baghdasaryan 014dd7156e lmkd: Log psi averages in killinfo reports
Add psi memory, io and cpu 10sec averages into killinfo reports.

Bug: 205182133
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Change-Id: Ifbcf47cab291fb20dbf0b5d32f1965f4e6462b49
2022-02-21 20:30:50 -08:00
Suren Baghdasaryan 5ae47a9563 lmkd: Allow killing perceptible apps when recorded stall is too high
When system is under heavy memory pressure the system might be able
to keep free memory above the min watermark avoiding perceptible app
kills. In such situation system might end up using all its cpu
capacity on memory reclaim and not doing productive work. To detect
this condition, check memory full stall and compare it with the new
ro.lmk.stall_limit_critical tunable representing the stall threshold.
When the recorded level is over ro.lmk.stall_limit_critical, lmkd will
be allowed to kill perceptible apps. ro.lmk.stall_limit_critical
represents the max memory full stall in % that is allowed before
perceptible apps will get killed. By default it is set to 100%, which
effectively disables the feature.
Currently system stall is measured based on psi memory stall 10s average
value, however this definition might change in the future if better
metrics are developed. Setting ro.lmk.stall_limit_critical to 5 means
the system should be fully stalled (no productive work is done) for 5%
of the 10sec period, resulting in 0.5 sec loss due to the stall.

Bug: 205182133
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Change-Id: I9713e30d82641d86d1b7edb5e1ba2971b935c898
2022-02-21 20:27:45 -08:00
Suren Baghdasaryan 2bdf7f0c74 lmkd: Set task profiles to the entire process
set_process_group_and_prio sets task profiles for each thread in the
process separately. This can be avoided by setting task profiles to
the entire process using its pid.

Bug: 215557553
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Change-Id: I9c1917172019a42809385f6c9c084b8cb343b520
2022-01-24 13:43:42 -08:00
Suren Baghdasaryan af1b0e0627 lmkd: Implement watchdog thread
To detect lmkd being stuck on a syscall for prolonged period of time,
introduce a watchdog thread which gets set when lmkd starts handling of
events and is reset after handling is done. If it takes more than the
timeout period (2 sec) to handle an event, watchdog wakes up and kills
the least important process to prevent mounting memory pressure caused
by lmkd lockup. After a kill, watchdog will wait for the reset for
another timeout period and kill again. This repeats until lmkd unlocks
and resets the watchdog.

Bug: 201671997
Test: induce random sleep in lmkd main handler and observe watchdog kills
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Change-Id: I56a55834582e11c06cc6cf9da3bc7380e634b301
2022-01-06 18:14:25 +00:00
Suren Baghdasaryan 7c3addb2a1 lmkd: Use process_mrelease to reap the target process from a thread
process_mrelease syscall can be used to expedite memory release of
a process after it was killed. This allows memory to be released
without the target process being scheduled, therefore does not depend
on target's priority or the CPU it's running on.
However process_mrelease syscall can take considerable time. Blocking
lmkd main thread during that time can cause memory pressure events
being missed while lmkd is busy reaping previous target's memory.
For this reason reaping should be done in a separate thread. This way
lmkd main thread can keep monitoring memory pressure while memory is
being released.
Introduce Reaper class which maintains a pool of threads to perform
process killing and reaping. The main thread submits a request to the
Reaper to kill and reap the process without blocking. If all the threads
in the pool are busy at the time the next kill is needed, the kill is
performed by the main thread without reaping.

Bug: 130172058
Bug: 189803002
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Change-Id: If7b10fdd1838bdfeea3fed3031565feffe0b52be
2022-01-06 18:14:14 +00:00
liuhailong cf8af501dc lmkd: Fix lowmem_minfree out of bounds
lmkd daemon launches before system_server. If lowmem_targets_size
does not initialize by system_server, the value will be zero.
Before system_server starts lmkd receives a psi event
and debug_process_killing on, the lmkd crashes here.

Bug: 209090314
Signed-off-by: liuhailong <liuhailong@oppo.com>
Change-Id: I0736a882ed1ff5eee2b07676ae590a2cb2a7721c
2021-12-04 17:59:24 +08:00
Suren Baghdasaryan 6e6d14b387 lmkd: fix low swap threshold failing to update after reinit
lmkd calculates low swap threshold using total available swap and
ro.lmk.swap_free_low_percentage property. A wrong assumption is made that
both these values are constant and therefore the threshold can be
calculated once and reused later. However ro.lmk.swap_free_low_percentage
can be changed by the user and lmkd --reinit issued to reapply new
configuration. If that happens low swap threshold will not be updated.
Fix this by calculating the threshold whenever it is used. The overhead
of that calculation is negligible.

Bug: 203161607
Test: setprop ro.lmk.swap_free_low_percentage <new value>; lmkd --reinit
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Change-Id: Idff50655a75d006ea86d9ab10ca54c375c4bea46
2021-10-28 13:30:08 -07:00
Wei Wang 730e7a9248 lmkd: move to foreground cpuset before killing
Test: Build and boot
Bug: 199797672
Signed-off-by: Wei Wang <wvw@google.com>
Change-Id: Id475625e0d892fb7111a2cf054d1b57d17003d5a
2021-09-30 23:24:11 -07:00
Wei Wang 0162e0361f lmkd: use fd cache for cgroup migration
Test: Build
Bug: 199797672
Signed-off-by: Wei Wang <wvw@google.com>
Change-Id: Ie7a9eb9676c58309f1407c5f8cc59b302f107d38
2021-09-21 14:38:49 -07:00
Wei Wang 0195bcdba7 lmkd: migrate process to FOREGROUND sched group before kill
BG group may have settings such as cpu.shares impacting reclaim
performance. Let us migrate task to foreground sched group similarly to
cpuset group.

Test: Build
Bug: 199797672
Signed-off-by: Wei Wang <wvw@google.com>
Change-Id: I75ee9f3486a2c76e65267a98e39edff96a5e1673
2021-09-13 19:07:26 -07:00
Suren Baghdasaryan 0e64eadc21 lmkd: Do not re-initialize lmkd when persistent properties are loaded
When a device boots, lmkd starts before persistent properties are loaded,
therefore if experiments set any flags, the corresponding persistent
properties will trigger change notifications when they are first loaded
during boot.
In order to prevent lmkd from re-initializing on every property load,
mark persistent property change by setting lmkd.reinit to 0 and delay
lmkd re-initialization until sys.boot_completed=1 when all properties
are set and only one re-initialization will capture them all. On devices
with no experiment flags being set lmkd.reinit will be undefined at the
boot completion time and re-initialization will not be triggered at all.

Bug: 194316048
Test: adb shell device_config put lmkd_native thrashing_limit_critical 350
Test: adb shell device_config put lmkd_native thrashing_limit 100
Test: adb reboot; adb -b all logcat | grep lmkd
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Change-Id: Iba34fc719a18d58b890549c7415bec869d471901
2021-09-01 00:56:35 -07:00
Suren Baghdasaryan d0a800402c lmkd: Add support for persist.device_config.lmkd_native.* properties
Allow persist.device_config.lmkd_native.* to override ro.lmk.*
properties to enable experiments with lmkd configuration properties.
Experiments will be able to set appropriate
persist.device_config.lmkd_native.<name> property which will issue
"lmkd --reinit" command to reinitialize lmkd with new parameters.

Bug: 194316048
Test: adb shell device_config put lmkd_native thrashing_limit_critical 350
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Change-Id: Ia48fd51eab126d307a1604530b642e86cf250688
2021-08-31 09:20:46 -07:00
Suren Baghdasaryan 39b54809fb lmkd: Add thrashing and max_thrashing into killinfo reports
Due to the increased importance of thrashing limits, include current and
max thrashing levels into killinfo reports.

Bug: 195979894
Test: lmkd_unit_test
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Change-Id: I36f947e45e03a4d845d18881e137e4b242aacb65
2021-08-09 15:10:46 -07:00
George Burgess IV e849f1414e lmkd: fix potential NULL pointer dereference
`ki` appears to be potentially NULL. Output bogus values if it is.

Caught by the static analyzer:
> system/memory/lmkd/lmkd.cpp:2171:66: warning: Access to field
'kill_reason' results in a dereference of a null pointer (loaded from
variable 'ki') [clang-analyzer-core.NullDereference]

Bug: None
Test: TreeHugger
Change-Id: Iae26855528e1f7fec8f1455e06c7e813a732dc75
2021-08-05 06:59:42 +00:00
Suren Baghdasaryan 34928bb817 lmkd: Add a tracepoint for each kill with kill parameters
Add a trace for each kill that includes pid, kill reason, oom_adj_score,
min_oom_score and max_thrashing statistics at the time of the kill.

Bug: 195085238
Test: generate kills while tracing and observer the new tracepoints
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Change-Id: Ic2014adc08f5e5dd4aacd415970332618bd15250
2021-07-30 12:59:15 -07:00
Suren Baghdasaryan e16047516d lmkd: Add current and max thrashing levels in LMK_MEMORY_STATS reports
Thrashing threshold tuning requires collecting thrashing level data from
the field and correlating these levels with other indications of device
being non-responsive.
Include current and max thrashing levels in the lmkd kill reports. Max
thrashing level captures the highest level seen since the last kill report.

Bug: 194433891
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Change-Id: I8a34dc41e7f03668bfad4ac2cbcb5d2570a10752
Merged-In: I8a34dc41e7f03668bfad4ac2cbcb5d2570a10752
2021-07-23 19:11:36 +00:00
Suren Baghdasaryan 1ef4718aed Revert "lmkd: Disable critical thrashing limit by default"
This reverts commit e1ffef4e36.

Reason for revert: Restore the default thrashing limits to prevent unresponsive devices.

Bug: 194199500
Change-Id: I15be5b3d67a71b68bca6dea9c2d5b4aa54d6c260
Merged-In: I15be5b3d67a71b68bca6dea9c2d5b4aa54d6c260
2021-07-23 12:01:55 -07:00
Suren Baghdasaryan c1171394a3 lmkd: Disable critical thrashing limit by default
Critical thrashing limit determines the balance between how much
thrashing should be tolerated before killing a perceptible app.
This threshold might differ between devices, therefore we disable
critical thrashing limit by default allowing each device to set it
individually. This is done to prevent excessive kills of perceptible
apps.

Bug: 194199500
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Change-Id: Idd1715564c3727b09fcb0a109ab3d6bae9d0b99a
2021-07-20 18:12:22 +00:00
Suren Baghdasaryan 11221d4062 lmkd: Add ro.lmk.filecache_min_kb property for min filecache watermark
We see many cases when device keeps thrashing despite lmkd kills. This
happens because killed processes do not free enough filecache to fit
the current workingset completely.
To prevent such cases, introduce ro.lmk.filecache_min_kb property to
specify min filecache size in KB that should be reached after thrashing
is detected. Lmkd will keep killing background processes until this
filecache size limit is satisfied.

Bug: 193293513
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Change-Id: I49ca4cd2f33b27fdbc432d9ce6944b1a1794b749
2021-07-15 11:05:09 -07:00
Suren Baghdasaryan 940e7cf8bd lmkd: Include total GPU memory usage in killinfo reports
/sys/fs/bpf/map_gpu_mem_gpu_mem_total_map BPF map exposes total GPU
allocations size. Include this value into killinfo reports to track GPU
allocation size at the time of the kill.

Bug: 189366037
Test: lmkd_unit_test
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Change-Id: Icc1ed8ab2593530fa293ff9c82f6c8dc400485f5
2021-06-03 15:56:52 -07:00
Vova Sharaienko a92b76b54d lmkd: reroute atoms logging to AMS
- Added new lmkd message for clients to subscribe LMK_ASYNC_EVENT_STAT
- Added support to write kill & mem stats information via data socket
  to be read & parsed on the AMS Java side for future logging to statsd

Bug: 184698933
Test: lmkd_unit_test - test check_for_oom tests lmkd message send to AMS
Test: statsd_testdrive 51 54 to inspect statsd logged atoms data
Change-Id: Id682a438c87b3e4503261d26461f6cee641d86c4
Merged-In: Id682a438c87b3e4503261d26461f6cee641d86c4
2021-05-11 00:00:56 +00:00
Suren Baghdasaryan 5263aa7800 lmkd: Do not treat RSS=0 as a sign of a process being dead
With kernel SPLIT_RSS_COUNTING feature it is possible for a valid
process to report RSS of 0 size when reading /proc/pid/statm. This
happens because split RSS accounting aggregates per-thread counters
asynchronously and depending on the timing of the read, reported
value can be inaccurate and occasionally be 0.
lmkd currently treats processes reporting RSS of 0 as dead and
removes them from the list of processes being tracked. This might
lead to a valid process becoming unkillable.
Change lmkd to stop treating RSS of 0 as a sign of a dead process.

Bug: 160199622
Test: set ro.lmk.kill_heaviest_task=true and hack kernel to report RSS=0
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Change-Id: Ia311d2f98649c92d1a487657f94ea51f57813b73
2021-04-29 15:33:06 -07:00
Suren Baghdasaryan 9f1be12b9a lmkd: Handle cases when proc_get_name() might return NULL
proc_get_name() can return NULL if the corresponding process has died
or open fails with ENOMEM due to memory shortages.
Ensure such cases are handled without NULL pointer access.

Bug: 186157675
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Change-Id: I05b288e3808bec0bdb73db32de02ba3a322ca6e1
2021-04-23 21:18:18 +00:00
Suren Baghdasaryan 0142b3c166 lmkd: Allow lmkd to kill perceptible apps during heavy thrashing
Occasionally a system can get into heavy file cache thrashing situation
and become unresponsive. In these situations we observe lmkd wakeups,
however it does not kill because all non-perceptible apps are already
killed and the system manages to reclaim enough memory to stay above
min watermark.
Add ro.lmk.thrashing_limit_critical property which when breached will
allow lmkd to kill perceptible apps. The property represents the
percentage of refaulted workingset pages as a fraction of overall file
cache size. By default it is disabled.

Bug: 181778155
Test: thrashing.py 500 10 200
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Change-Id: Icb38ef6c90adaa4f5c956593b6ea0c4febc91dc0
2021-03-25 17:00:09 -07:00
Josh Gao 84623bef7b Switch to Bionic's pidfd wrappers.
Bug: http://b/172518739
Test: treehugger
Change-Id: Ib6cac8f31ec64343c6eec6b82dac52888890c688
2021-03-18 17:16:08 -07:00
Suren Baghdasaryan 858e8c6373 lmkd: choose the heaviest task when killing perceptible processes
When killing a task at or lower than oom_score_adj PERCEPTIBLE_APP_ADJ
choose the heaviest task among the ones at that level to try minimizing
the number of required kills. Because killing a perceptible app will
affect user experience anyway, it makes sense to choose the one that
will release the most memory and therefore no more kills might be
necessary.

Bug: 181778155
Test: running thrashing.py script
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Change-Id: I775ff774430b6fde4d619ede794825dbae59fd8e
2021-03-05 17:45:30 +00:00
Suren Baghdasaryan 236781873f lmkd: fix log message reporting the breached watermark
Wrong condition causes reporting low watermark breach when min watermark
is breached and visa versa. Fix the condition to make reporting correct.

Bug: 181778155
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Change-Id: I684141c38f961fce99d17cfb3a83706fcd84ea10
2021-03-05 17:45:10 +00:00
Ioannis Ilkos 282437fbbe Reorder swap field in killinfo
Some tools might parse killinfo entries based on the field order. Move
the newly added swap field to the end to ensure compatibility.

Test: build
Change-Id: Id6dad850beba6835f061da95e84190d00a1b26a0
2021-03-04 17:50:05 +00:00
Ioannis Ilkos 4884890305 Log killed process swap size
We already log the rss size for the process. Given lmkd strategies also consider low swap, it will be beneficial to record the swap size too.

Test: build, manual test
Change-Id: I923f733f7a3aa77fc5968827693b0fc085819174
2021-02-26 17:05:35 +00:00
Chris Morin 74b4df95b4 Replace mentions of "oom_adj" with "oom_score_adj"
Some log messages mention "oom_adj" instead of "oom_score_adj" when
referring to oom_score_adj. This is confusing because "oom_adj" is a
separate value which was supplanted by oom_score_adj, but can still be
used.

Test: trigger memory pressure and view logs
Change-Id: I23825083cecfff6bd32bfb39c6dac1f2b17a72a7
2021-02-26 00:07:16 -08:00
Suren Baghdasaryan dc60f9717b lmkd: Handle workingset_refault vmstat field change in 5.9 kernel
Linux kernel 5.9 change some vmstat fields including workingset_refault
which affects lmkd operation. Update vmstat parsing to handle both
old (workingset_refault) and new (workingset_refault_file) names for
that field.

Bug: 175617952
Test: lmkd_unit_test
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Change-Id: I8f9b3d027ca96154f07e7252902a5aa04cf05a9f
2020-12-14 13:38:48 -08:00
Suren Baghdasaryan 9f8d3dec72 lmkd: Remove unused workingset_refault parsing from zoneinfo
workingset_refault field in zoneinfo is currently being parsed but
is not used. Instead the same field in vmstat is being used to
capture the number of file-backed workingset refaults. Remove the
unused field parsing code.

Bug: 175617952
Test: lmkd_unit_test
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Change-Id: I79641a833c252cf50ac08c0c7d17c8294236d82d
2020-12-14 13:10:59 -08:00
Suren Baghdasaryan 3cc1f13044 lmkd: report kill reason, and meminfo details to statsd for each kill
Information like free memory and swap as well as kill reason would be
useful for understanding regressions in the number of lmk kills in the
field.

Bug: 168117803
Change-Id: Ic46aed3c85b880b32ac5ad61b55f90e0d33517c7
Test: statsd_testdrive 51, load with lmk_unit_test
2020-09-11 14:24:23 +01:00
Martin Liu 589b5752ee lmkd: fix possible long stall state
If the first PSI event triggers a kill, lmkd won't resume polling
immediately after the process has died. Instead, it will wait until the
next PSI event to resume the polling which is too late when the device
is under memory pressure. This happens if data communication with AMS
happens after previous polling window expired, in which case paused
handler gets reset and polling does not resume after the kill.
Fix this by changing pause handler reset logic.

Bug: 167562248
Test: memory pressure test
Signed-off-by: Martin Liu <liumartin@google.com>
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Change-Id: I10c65c85b718a656e3d8991bf09948b96da895cb
2020-09-04 00:04:57 +08:00
Martin Liu c3108416e7 lmkd: avoid division by zero because of file_base_lru
It seems we have chance that file_base_lru is zero.
Avoid it by adding 1.

Bug: 167660459
Test: boot
Signed-off-by: Martin Liu <liumartin@google.com>
Change-Id: If19dbbaafe6cd28a9d5b7f8a002f3cd33daab5e7
2020-09-04 00:03:49 +08:00
Martin Liu 1f72f5fa4b lmkd: adjust thrashing dection strategy
When a device is thrashing the file cache, workingset refaults can
grow slowly because of variant reasons. Current thrashing detection
mechanism could reset the thrashing counter frequently as it relies
on presence of reclaim activity, however refaults can keep increasing
even when the device is not actively reclaiming. In addition, the
thrashing counter gets reset when conditions require a kill but lmkd
could not find an eligible process to be killed. This is problematic
because when this happens thrashing is being ignored.

Use a fixed 1 sec periods to aggregate the thrashing counter. Also we
need to keep monitoring thrashing counter while retrying as someone
could release the memory to mitigate the thrashing. If thrashing
counter is greater than the limit at the end of the 1 sec period this
means lmkd failed to find an eligible process to kill. In this case
we store accumulated thrashing in case a new eligible process appears
until accumulated thrashing is less that the limit or we miss an
entire 1 sec window.

Bug: 163134367
Test: heavy loading launch
Signed-off-by: Martin Liu <liumartin@google.com>
Change-Id: Ie9f4121ea604179c0ad510cc8430e7a6aec6e6b2
2020-08-28 13:04:42 +08:00
Ioannis Ilkos 279268a07f Emit swap size in the killed process' statsd atoms
Changes:
- We are already reading /proc/pid/status to resolve the tgid. While we
are at it, also parse RSS and swap values.
- Use the RSS and swap values for non memcg builds when creating the
statsd outputs
- Given we already read RSS, remove the separate read of /proc/pid/statm
that used to get tasksize.

Bug: 163116785
Test: manual, out/host/linux-x86/bin/statsd_testdrive 51
Change-Id: I9d98b9ffe8be0b014bb09174ec9532382cae1f38
2020-08-12 20:24:56 +01:00
Suren Baghdasaryan d7b4fcb8a5 lmkd: Add lmkd wakeup information into killinfo logs
Oftentimes while investigating bugreports it's unclear whether lmkd
was active between kills. To provide visibility into lmkd activity
adding the following fields into killinfo reports:
MsSinceEvent - number of msecs since the last PSI/vmpressure event
MsSincePrevWakeup - number of msecs since the previous wakeup
WakeupsSinceEvent - number of wakeups since the last PSI/vmpressure
event
SkippedWakeups - number of wakeups that were skipped due to an
incomplete kill

Bug: 162034541
Test: lmkd_unit_test
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Change-Id: I0356c27515132ff0dd309b59a8bf907acbd67cd8
2020-07-24 19:31:03 +00:00
Suren Baghdasaryan 7d1f4f0047 lmkd: Set default kill timeout to limit waits for uninterruptible processes
When lmkd tries to kill a process in uninterruptible sleep state, it may
need to wait for a long time. To prevent this set the default kill timeout
to 100ms which should work for majority of the devices.

Bug: 160295034
Test: lmkd_unit_test
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Change-Id: Ia280dc095df9ca8494278e0a75b976ed93fc04ae
2020-07-08 11:41:14 -07:00
Martin Liu 3185c2d096 lmkd: Fix do not kill perceptible apps due to low swap if above min wmark
Fix code logic to obey our intetion of not killing perceptible apps
due to low swap if above min wmark.

Bug: 155709603
Test: boot
Signed-off-by: Martin Liu <liumartin@google.com>
Merged-In: Ifc09c2a1fe7e21faa096988f471644f63951d81c
Change-Id: Ifc09c2a1fe7e21faa096988f471644f63951d81c
2020-06-02 23:43:49 +08:00
Suren Baghdasaryan 48135c4cba lmkd: Do not kill perceptible apps due to low swap if above min wmark
Prevent kills of perceptible apps due to swap shortages unless system
free memory is below the min watermark. This prevents kills of important
apps when the system is recovering from the memory pressure.

Bug: 155709603
Test: memory stress test with multiple foreground apps
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Change-Id: I6beb4b55f8b4f7bc22818b5a7bdfa3adc6cd31c1
2020-05-20 12:22:07 -07:00
Suren Baghdasaryan fb1f592602 lmkd: Set the default free swap threshold to 10% for all devices
Lower the min swap threshold to 10% for all devices to limit kills while
swap still has enough space.

Bug: 155709603
Test: memory stress test with multiple foreground apps
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Merged-In: I443486763c034ed0603ea52b81c060c3969af9a5
Change-Id: I443486763c034ed0603ea52b81c060c3969af9a5
2020-05-20 12:21:34 -07:00
Suren Baghdasaryan 5c039b53d8 lmkd: Fix min_score_adj to exclude killing foreground processes
In the cases when foreground processes should not be killed
min_score_adjust should be set above PERCEPTIBLE_APP_ADJ to prevent such
kills.

Bug: 155709603
Test: memory stress test with multiple foreground apps
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Merged-In: If187654b8001ce843ec6085ccd2042d75a986dae
Change-Id: If187654b8001ce843ec6085ccd2042d75a986dae
2020-05-20 12:21:10 -07:00
Suren Baghdasaryan ed715a3424 lmkd: Remove unused variables and fix type mismatches
Fix compilation warnings by removing unused variables and add typecasting
whenever mixed type comparisons are performed.

Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Change-Id: I7f0839d803a6bf6532f077208ce54aba761dc9fe
2020-05-11 19:52:52 -07:00
Suren Baghdasaryan 1d0ebeaa9c lmkd: Add property re-initialization support
Add --reinit command-line option to allow updating lmkd properties. For
example to enable debug logging in the running lmkd process user should
issue:

setprop ro.lmk.debug true
lmkd --reinit

Bug: 155149944
Test: lmkd_unit_test after resetting lmkd properties
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Change-Id: Ic60331f3368f5a7fdfe09ad7d47c7ccf0a497685
2020-05-06 15:05:04 -07:00
Suren Baghdasaryan 03dccf35a1 lmkd: enable ro.lmk.kill_timeout_ms to be used with kill notifications
Allow lmkd to stop waiting for a kill notification if a kill takes longer
than ro.lmk.kill_timeout_ms.

Bug: 147315292
Test: lmkd_unit_test with ro.lmk.kill_timeout_ms set to 100
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Change-Id: Ia3eed3448fd6928a5e634c2737044722048b3578
2020-04-29 15:11:36 -07:00
Suren Baghdasaryan 9ca5334683 lmkd: polling code cleanup
- Remove unused POLLING_STOP state
- Simplify POLLING_DO_NOT_CHANGE state handling
- Correct last_poll_tm assignment logic

Bug: 147315292
Test: lmkd_unit_test
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Change-Id: If0674eda954a25f0f6c9188501ff77db8ba0813b
2020-04-29 15:11:15 -07:00
Suren Baghdasaryan 51ee4c505f lmkd: add kill when swap utilization is too high
When non-swappable allocations cause memory pressure swap will not be
depleted, however a high percentage of the swappable memory will be
pushed into swap. Detect this condition and kill a process when swap
utilization is too high while under memory pressure.
Introduce ro.lmk.swap_util_max property to represent max percentage of
the overall swappable memory that can be swapped under memory pressure
without triggering a kill. ro.lmk.swap_util_max is set to 100 by default
which disables kills due to the swap utilization.

Bug: 147315292
Test: ION memory hogger with ro.lmk.swap_util_max set to 95
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Change-Id: I6dbf124bb24b220d136e8f16b3dae0c0c30d32ca
2020-04-29 15:11:02 -07:00
Muhammad Qureshi ed8fe8465a Use generated code for logging events to statsd
Use the autogenerated libstatslog_lmkd to send events to statsd

The logging schema for statsd is changing as part of statsd becoming
a Mainline module in R. The autogenerated code will handle the schema
change.

Bug: 145887874
Test: m -j
Test: atest android.cts.statsd.atom.UidAtomTests#testLmkKillOccurred

Change-Id: Ibae4cd822807369a799d5c1f6a9c51272e38a074
2020-01-13 12:16:47 -08:00
Suren Baghdasaryan 36baf44179 lmkd: Restrict lmkd unsolicited notifications only to subscribed recipients
lmkd unsolicited notifications can cause lmkd to block if clients are not
consuming them. Fix that by sending notifications to only subscribed
clients. Introduce LMK_SUBSCRIBE command to allow lmkd clients to subscribe
to event notifications. The only asynchronous event currently supported is
LMK_ASYNC_EVENT_KILL.

Bug: 146597855
Test: fill up send buffer using lmkd_unit_test
Test: confirm lmkd does not block after the fix
Change-Id: I014159aa55b59081f4b9ed53ecd160a49c0682bb
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
2019-12-23 12:35:29 -08:00
Tom Cherry 43f3d2b190 Build lmkd as C++
Bug: 145669697
Test: build
Change-Id: I4fb2a9a900c8a6915ee84cc3d82434596301b24b
2019-12-13 08:40:30 -08:00