When a proc list for an oomadj is non-empty, we currently read
from disk to get the size of every process in our list, so we
can know which is the largest/heaviest process.
However, if there's only a single process in our list, we already
know it's going to be our heaviest [*]. So we don't need to do
the (relatively expensive) disk access to figure out its size,
and can just directly return this process.
[*] There's the case where our attempt to read from /proc fails
for the process. The old code would then instantly remove this
stale pid (and return NULL if that was the only process in the
list). This new code will end up returning this stale process
instead. Since proc_get_heaviest() is meant to be used in the
same way as proc_adj_tail(), and proc_adj_tail() returns
processes without checking if they are stale, we don't consider
this an issue. (Note that in the current code, the only calling
site of proc_get_heaviest() will remove this stale process when
it calls kill_one_process().)
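
The shortcut can be sketched roughly as follows. This is a minimal illustration, not the lmkd implementation: struct proc, size_reader, fake_read_size() and get_heaviest() are hypothetical names, and the fake reader stands in for the /proc read.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified process record; lmkd's real structures differ. */
struct proc {
    int pid;
    struct proc *next; /* next process at the same oomadj, or NULL */
};

/* Hypothetical stand-in for the /proc size read; returns < 0 on failure. */
typedef long (*size_reader)(int pid);

/* Fake reader for illustration: counts calls and treats pid 3 as stale. */
static int fake_reads;
static long fake_read_size(int pid) {
    fake_reads++;
    return pid == 3 ? -1 : pid * 100L;
}

/* Returns the heaviest process in the list, or NULL if none is usable. */
static struct proc *get_heaviest(struct proc *head, size_reader read_size) {
    assert(read_size != NULL);
    if (head == NULL) {
        return NULL;
    }
    /* Single entry: it is trivially the heaviest, so skip the size read.
     * A stale pid may be returned here; the caller is expected to cope. */
    if (head->next == NULL) {
        return head;
    }
    struct proc *heaviest = NULL;
    long max_size = -1;
    for (struct proc *p = head; p != NULL; p = p->next) {
        long size = read_size(p->pid);
        if (size < 0) {
            continue; /* stale entry; this sketch skips it instead of removing it */
        }
        if (size > max_size) {
            max_size = size;
            heaviest = p;
        }
    }
    return heaviest;
}
```

With a one-element list the reader is never called, which is the saving this change is after.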
Bug: 405391096
Test: TreeHugger
Change-Id: Iaf2f5c57dcbf2d4e45c2545a8322736b5985337c
This change allows vendors to trigger LMKD kill events with custom
reasons and minimum score adjustments. This enables experimentation
with custom heuristics and data collection for evaluation before
upstreaming changes to AOSP.
Bug: 385050909
Test: build and check the vendor kill event
Change-Id: If9b51ed9603f0e10e6fc4671fb6da26548f41aaf
Merged-In: If9b51ed9603f0e10e6fc4671fb6da26548f41aaf
Signed-off-by: Martin Liu <liumartin@google.com>
If the cgroup interface is unavailable and can't be used to obtain memory
stats, we can fall back to reading procfs stats. Such a failure would
indicate a misconfigured device with ro.config.per_app_memcg set but
the memory cgroup not mounted or enabled correctly. However, stats
reporting should not suffer because of this misconfiguration. Allow
lmkd to fall back to reading stats from the procfs interface when this
happens.
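
A rough sketch of the fallback pattern, under the assumption that both sources can be modeled as readers with the same signature. The names stats_reader, read_mem_stats() and the two fake readers are hypothetical and do not correspond to lmkd functions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reader signature: returns 0 on success, -1 on failure. */
typedef int (*stats_reader)(int pid, long *rss_kb);

/* Fake cgroup reader that always fails, modeling an unmounted memcg. */
static int failing_cgroup_reader(int pid, long *rss_kb) {
    (void)pid;
    (void)rss_kb;
    return -1;
}

/* Fake procfs reader that reports a made-up value for illustration. */
static int fake_procfs_reader(int pid, long *rss_kb) {
    *rss_kb = pid * 10L;
    return 0;
}

/* Try the cgroup source first; fall back to procfs if it is missing or fails. */
static int read_mem_stats(int pid, stats_reader from_cgroup,
                          stats_reader from_procfs, long *rss_kb) {
    assert(from_procfs != NULL && rss_kb != NULL);
    if (from_cgroup != NULL && from_cgroup(pid, rss_kb) == 0) {
        return 0;
    }
    return from_procfs(pid, rss_kb);
}
```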
Bug: 388926998
Change-Id: Idfc777022c842b45a2640f04edb70de7ca6feac8
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Summary: As a good practice, let's make sure that the "kill_desc" buffer is always null-terminated, even if its size changes in the future.
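
For illustration only, one common way to guarantee termination regardless of buffer size is to derive the last index from the size passed in, rather than from a hard-coded constant. The helper copy_desc() below is hypothetical and is not the code touched by this change.

```c
#include <assert.h>
#include <string.h>

/* Copies src into dst and always null-terminates, truncating if needed. */
static void copy_desc(char *dst, size_t dst_size, const char *src) {
    assert(dst != NULL && src != NULL && dst_size > 0);
    /* strncpy() does not terminate dst when src is too long ... */
    strncpy(dst, src, dst_size - 1);
    /* ... so terminate explicitly, using the size rather than a constant. */
    dst[dst_size - 1] = '\0';
}
```

Calling it as copy_desc(buf, sizeof(buf), text) keeps the call correct if the declared size of buf changes later.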
Test: Successful build on master.
Change-Id: I68a0dc346ea26126a1581994f9c508980a6ac408
Signed-off-by: Abdelrahman Daim <adaim@meta.com>
LMKD is woken up when memory pressure is high enough in either the psi
or the vmpressure mechanism. Memory reclaim ability depends on the CPU
capacity of a chip and differs from one chip to another. This patch
counts the number of lmkd wakeups that occur when the memory pressure
threshold is met, instead of relying on heavy logging. To show the
count, re-init lmkd; it prints the count to the Android logcat and
resets it to zero.
Test: Run APP rotation
Bug: 365748420
Signed-off-by: JohnHsu <john.hsu@mediatek.com>
Change-Id: I3980d2a90a910c64449b4ad2b005e4d0437097e8
When both Medium and Critical events occur at the same time,
depending on how the events are queued, the later event
resets the former event.
We want the subsequent polling (until the next event is triggered)
to happen at the higher event level.
So it is fine if a Critical event overrides a Medium one, but not the
other way around.
Consider the scenario below, where both Medium and Critical events
occur (at T0) and are handled one after the other:
T0: critical event handled.
T0 + 2ms: medium event handled.
T0 + 102ms: medium event polling check. // This should be a critical poll
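
The intended rule can be sketched as "keep the higher of the two levels". The enum and function names below are hypothetical, chosen for this illustration, and the sketch assumes the levels are ordered by severity.

```c
#include <assert.h>

/* Hypothetical pressure levels, ordered from least to most severe. */
enum level { LEVEL_LOW = 0, LEVEL_MEDIUM, LEVEL_CRITICAL };

/* Level to use for the pending poll when a new event arrives:
 * a more severe event may override, a less severe one may not. */
static enum level next_poll_level(enum level pending, enum level incoming) {
    assert(incoming >= LEVEL_LOW && incoming <= LEVEL_CRITICAL);
    return incoming > pending ? incoming : pending;
}
```

In the scenario above, the Medium event handled at T0 + 2ms would leave the pending poll at Critical.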
Bug: 376003899
Change-Id: I16ff3b999d7531435324a628ac17968fd4cae8cf
When a new event occurs (of the same level or a different one, it
doesn't matter) after an event poll is handled, the poll for the new event
waits only for "100ms - time since the last poll was handled".
But the new poll should start 100ms after the last triggered event.
T0: event-1 triggered.
T100: event-1 polled.
T120: event-2 triggered.
T200: event-2 polled. (This poll should happen 100ms after T120, i.e., at T220.)
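
The arithmetic in the timeline can be checked with a small sketch. POLL_PERIOD_MS and poll_delay_ms() are hypothetical names; the only thing that differs between the old and the intended behavior is which timestamp is used as the reference.

```c
#include <assert.h>

#define POLL_PERIOD_MS 100 /* polling interval from the description above */

/* Remaining wait before the next poll, measured from a reference time. */
static long poll_delay_ms(long now_ms, long reference_ms) {
    assert(now_ms >= reference_ms);
    long elapsed = now_ms - reference_ms;
    return elapsed >= POLL_PERIOD_MS ? 0 : POLL_PERIOD_MS - elapsed;
}
```

At T120, using the last handled poll (T100) as the reference gives an 80ms wait and a poll at T200. Using the triggering event (T120) as the reference gives a 100ms wait and a poll at T220.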
Bug: 377418039
Change-Id: I10aace061668adfed2594581b94cb9f1e745820b
The kernel prints node stats within the first populated zone in the
zoneinfo file. LMKD tries to parse node stats when it reads the first
"Node %d, zone %8s" line in zoneinfo.
However, if the first zone is empty, LMKD can iterate over to the next
populated zone, i.e. the next "Node %d, zone %8s" line, while attempting
to read node stats. It thus reads an incorrect zone name for this next
zone.
To fix this, verify that node stats are indeed being parsed by
checking for the "per-node stats" line.
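
A minimal sketch of the check, assuming zoneinfo text in which the node stats block is introduced by a "per-node stats" line under the first populated zone. The helpers is_node_stats_header() and find_node_stats_zone() are hypothetical and much simpler than lmkd's parser.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Returns 1 if the line, ignoring leading whitespace, starts node stats. */
static int is_node_stats_header(const char *line) {
    assert(line != NULL);
    while (isspace((unsigned char)*line)) {
        line++;
    }
    return strncmp(line, "per-node stats", strlen("per-node stats")) == 0;
}

/* Finds the zone whose block carries the node stats and copies its name
 * into zone (at least 9 bytes). Returns 0 on success, -1 if none found. */
static int find_node_stats_zone(const char *text, char *zone) {
    char current[9] = "";
    const char *line = text;
    while (*line != '\0') {
        const char *end = strchr(line, '\n');
        size_t full = end ? (size_t)(end - line) : strlen(line);
        char buf[128];
        size_t len = full < sizeof(buf) - 1 ? full : sizeof(buf) - 1;
        memcpy(buf, line, len);
        buf[len] = '\0';

        int node;
        char name[9];
        if (sscanf(buf, "Node %d, zone %8s", &node, name) == 2) {
            strcpy(current, name); /* remember the zone we are inside */
        } else if (current[0] != '\0' && is_node_stats_header(buf)) {
            strcpy(zone, current); /* node stats really start here */
            return 0;
        }
        line += full;
        if (*line == '\n') {
            line++;
        }
    }
    return -1;
}
```

When the first zone is empty and has no "per-node stats" line, the sketch moves on and attributes the node stats to the zone that actually contains them.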
Bug: 292476676
Change-Id: I72cd111dac9032de506e1ab7f1c4dc96585a1e80
Signed-off-by: Jaskaran Singh <quic_jasksing@quicinc.com>
A recent kernel change [1] causes pidfd_wait() to receive EPOLLHUP when the
task exits. The current LMKD implementation expects to receive EPOLLHUP only
when a socket connection gets dropped, so it gets confused by this
new kernel behavior. Adjust LMKD's handling of EPOLLHUP to detect the case
when this event is generated by a pidfd.
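
One way to picture the adjustment is to classify EPOLLHUP by the kind of descriptor it arrived on. This is a sketch under assumptions: the fd_kind and hup_action enums and classify_hup() are hypothetical, and lmkd's actual event dispatch is structured differently.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/epoll.h>

/* Hypothetical tags for the kinds of fds registered with epoll. */
enum fd_kind { FD_KIND_SOCKET, FD_KIND_PIDFD };

/* Hypothetical outcomes of seeing (or not seeing) EPOLLHUP. */
enum hup_action { HUP_NONE, HUP_DROP_CONNECTION, HUP_PROCESS_EXITED };

/* Decide what EPOLLHUP means based on which kind of fd reported it. */
static enum hup_action classify_hup(enum fd_kind kind, uint32_t events) {
    assert(kind == FD_KIND_SOCKET || kind == FD_KIND_PIDFD);
    if (!(events & EPOLLHUP)) {
        return HUP_NONE;
    }
    /* On a pidfd the hangup signals task exit, not a dropped connection. */
    return kind == FD_KIND_PIDFD ? HUP_PROCESS_EXITED : HUP_DROP_CONNECTION;
}
```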
[1] https://lore.kernel.org/all/20240202131226.GA26018@redhat.com/
Bug: 352286227
Change-Id: Ibcf349ee3cc73551541d64975f0292d53c41d5c2
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Add a hook that is invoked when there are no killable processes at any
priority. This allows ARCVM to send VMMMS's no kill candidates message,
which prevents thrashing without having to wait on a balloon stall.
Bug: 362383831
Test: cq
Change-Id: Iffb680a78025bd201932bd805ceeecfe07b1fac9
Occasionally the test reads logcat before the lmkd reaper has had a chance to
write into it, resulting in the expected report messages being missed.
Add a 200ms delay after lmkd kills the process to give the lmkd reaper thread
more time to write its reports.
Bug: 347296675
Bug: 358830454
Test: atest lmkd_tests
Change-Id: I2549e37f25c81c9add91f7ee450c4a96c8cf18e4
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
The starting lines of kill and reap reports are the same. Add additional
logic in the test to distinguish between them and retry if a kill report
is found.
Bug: 347296675
Bug: 358830454
Test: mock the input data where lmkd_tests failed
Change-Id: Idf83831e45e6682c1dfb6cde258d4ec631a5eb32
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Add an RSS field to the LMK_PROCKILL command to report the latest memory
usage of the killed process.
Test: Verified RSS field is captured in ApplicationExitInfo
Bug: 322549716
Change-Id: Ic1788e8121da97cd879bd7e9d685c7b879ea5475
Signed-off-by: Carlos Galo <carlosgalo@google.com>