LMKD is woken up when memory pressure is high enough, as reported by
either the psi or the vmpressure mechanism. Memory reclaim ability
depends on the CPU capacity of a chip, so it differs from one chip to
another. This patch counts the number of lmkd wakeups that occur when the
memory pressure threshold is met, instead of relying on heavy logging. To
show the counts, re-init lmkd; it prints them to the Android logcat and
resets them to zero.
Test: Run APP rotation
Bug: 365748420
Signed-off-by: JohnHsu <john.hsu@mediatek.com>
Change-Id: I3980d2a90a910c64449b4ad2b005e4d0437097e8
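The wakeup counting described in the commit above could be sketched roughly as follows. This is a minimal illustration, not the actual lmkd code: the `PressureLevel` enum, the `WakeupCounter` class, and the summary format are all assumptions made for the example.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical pressure levels; the names are assumptions for this sketch.
enum class PressureLevel : std::size_t { kLow = 0, kMedium, kCritical, kCount };

class WakeupCounter {
  public:
    // Called on every wakeup instead of writing a log line.
    void Record(PressureLevel level) { ++counts_[static_cast<std::size_t>(level)]; }

    uint64_t Get(PressureLevel level) const {
        return counts_[static_cast<std::size_t>(level)];
    }

    // Called on re-init: builds a one-line summary and resets all counters.
    std::string ReportAndReset() {
        std::string out = "lmkd wakeups: low=" + std::to_string(counts_[0]) +
                          " medium=" + std::to_string(counts_[1]) +
                          " critical=" + std::to_string(counts_[2]);
        counts_.fill(0);
        return out;
    }

  private:
    std::array<uint64_t, static_cast<std::size_t>(PressureLevel::kCount)> counts_{};
};
```

Incrementing a counter per wakeup is far cheaper than logging each one, and the summary line is only produced when someone asks for it.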
When both Medium and Critical events occur at the same time,
depending on how the events are queued, the later event
resets the former event.
The subsequent polling (until the next event is triggered)
should happen at the higher event level.
So it is fine if a Critical event overrides Medium, but not the other
way around.
Consider the scenario below, where both Medium and Critical events
occur (at T0) and are handled one after the other:
T0: critical event handled.
T0 + 2ms: medium event handled.
T0 + 102ms: medium event polling check. //This should be critical poll
Bug: 376003899
Change-Id: I16ff3b999d7531435324a628ac17968fd4cae8cf
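The override rule from the commit above (a higher level may replace a lower one, never the reverse) can be sketched as below. The `Level` enum, `PollState`, and the function names are hypothetical and only illustrate the rule; the real lmkd state handling is more involved.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical event levels, ordered by severity.
enum Level { kNone = -1, kLow = 0, kMedium = 1, kCritical = 2 };

struct PollState {
    Level level = kNone;  // level used for the follow-up polling
};

// Keep the more severe of the current and the incoming level, so the
// order in which simultaneous events are dequeued does not matter.
inline void OnEvent(PollState& state, Level incoming) {
    state.level = std::max(state.level, incoming);
}

// Called when the polling window ends and a fresh event is awaited.
inline void OnPollingDone(PollState& state) { state.level = kNone; }
```

With this rule, handling Critical at T0 and Medium at T0 + 2ms leaves the poll level at Critical, which is the behavior the scenario asks for.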
When a new event occurs (of the same level or a different one, it
doesn't matter) after an event poll is handled, the poll for the new event
waits only for "100ms - time since the last poll was handled".
But the new poll should start 100ms after the last triggered event.
T0: event-1 triggered.
T100: event-1 polled
T120: event-2 triggered.
T200: event-2 polled. (This poll should happen 100ms after T120, i.e., T220)
Bug: 377418039
Change-Id: I10aace061668adfed2594581b94cb9f1e745820b
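The timing difference in the commit above can be checked with a little arithmetic. The sketch below is illustrative only: the function names are made up, and the 100ms interval is taken from the commit message.

```cpp
#include <cassert>
#include <cstdint>

constexpr int64_t kPollIntervalMs = 100;

// Behavior described as wrong: the wait is measured from the last
// handled poll, so a later event gets a shortened first interval.
constexpr int64_t DelayFromLastPollMs(int64_t now_ms, int64_t last_poll_ms) {
    return kPollIntervalMs - (now_ms - last_poll_ms);
}

// Intended behavior: the next poll is due one full interval after the
// most recently triggered event.
constexpr int64_t NextPollTimeMs(int64_t last_event_ms) {
    return last_event_ms + kPollIntervalMs;
}
```

For the timeline in the message (last poll at T100, event-2 at T120), the first function yields an 80ms wait and a poll at T200, while the second yields T220.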
The kernel will print node stats within the first populated zone in the
zoneinfo file. The LMKD tries parsing node stats when it reads the first
"Node %d, zone %8s" line in zoneinfo.
However if the first zone is empty, LMKD could iterate over to the next
populated zone i.e. the next "Node %d, zone %8s" line while attempting
to read node stats. It thus reads the incorrect zone name for this next
zone.
To fix this, verify that node stats are indeed being parsed by
checking for the "per-node stats" line.
Bug: 292476676
Change-Id: I72cd111dac9032de506e1ab7f1c4dc96585a1e80
Signed-off-by: Jaskaran Singh <quic_jasksing@quicinc.com>
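A simplified sketch of the parsing guard from the commit above is shown here. It is not the lmkd parser: `ZoneInfo` and `ParseZoneinfo` are hypothetical, the marker text "per-node stats" is assumed to match the kernel's output, and a "pages" line is assumed to mark the start of per-zone stats.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>
#include <vector>

struct ZoneInfo {
    std::vector<std::string> zone_names;
    std::map<std::string, long> node_stats;
};

ZoneInfo ParseZoneinfo(const std::string& text) {
    ZoneInfo info;
    std::istringstream in(text);
    std::string line;
    bool in_node_stats = false;
    while (std::getline(in, line)) {
        std::istringstream fields(line);
        std::string first;
        fields >> first;
        if (first == "Node") {
            // Header line: "Node <id>, zone <name>".
            std::string id, zone_kw, name;
            fields >> id >> zone_kw >> name;
            info.zone_names.push_back(name);
            in_node_stats = false;
        } else if (line.find("per-node stats") != std::string::npos) {
            // Only now is it safe to treat following lines as node stats.
            in_node_stats = true;
        } else if (first == "pages") {
            in_node_stats = false;  // per-zone stats begin here
        } else if (in_node_stats) {
            long value = 0;
            if (fields >> value) info.node_stats[first] = value;
        }
    }
    return info;
}
```

Because node stats are read only after the marker line is seen, an empty first zone no longer causes the next zone header to be consumed as if it were node-stat data.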
A recent kernel change [1] causes pidfd_wait() to receive EPOLLHUP when the
task exits. The current LMKD implementation expects to receive EPOLLHUP only
when a socket connection gets dropped, so it gets confused by this
new kernel behavior. Adjust LMKD handling of EPOLLHUP to detect the case
when this event is generated by a pidfd.
[1] https://lore.kernel.org/all/20240202131226.GA26018@redhat.com/
Bug: 352286227
Change-Id: Ibcf349ee3cc73551541d64975f0292d53c41d5c2
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
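One way to picture the EPOLLHUP handling from the commit above is to record what kind of descriptor each epoll registration refers to and branch on that. The sketch below assumes this approach; `FdKind`, `Action`, and `ClassifyEvent` are hypothetical names, and the actual lmkd change may be structured differently.

```cpp
#include <sys/epoll.h>

#include <cassert>
#include <cstdint>

// Hypothetical tag stored alongside each registered descriptor.
enum class FdKind { kDataSocket, kPidfd };

enum class Action { kNone, kHandleData, kDropConnection, kProcessExited };

Action ClassifyEvent(FdKind kind, uint32_t events) {
    if (kind == FdKind::kPidfd) {
        // For a pidfd, either EPOLLIN or EPOLLHUP is treated here as
        // "the watched process has exited", never as a dropped socket.
        if (events & (EPOLLIN | EPOLLHUP)) return Action::kProcessExited;
        return Action::kNone;
    }
    // For a socket, EPOLLHUP still means the peer went away.
    if (events & EPOLLHUP) return Action::kDropConnection;
    if (events & EPOLLIN) return Action::kHandleData;
    return Action::kNone;
}
```

The key point is that the same event bit leads to different actions depending on the descriptor type, so a pidfd hang-up is never mistaken for a closed client connection.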
Add a hook that is invoked when there are no killable processes at any
priority. This allows ARCVM to send VMMMS's no kill candidates message,
which prevents thrashing without having to wait on a balloon stall.
Bug: 362383831
Test: cq
Change-Id: Iffb680a78025bd201932bd805ceeecfe07b1fac9
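The hook described in the commit above might look something like the following sketch. The `Candidate` struct, the `NoKillCandidatesHook` type, and `PickVictim` are all invented for illustration; the real hook's name and signature are not given in the message.

```cpp
#include <cassert>
#include <functional>
#include <vector>

struct Candidate {
    int pid;
    int oom_score_adj;
    bool killable;
};

// Hypothetical hook type; invoked when nothing can be killed.
using NoKillCandidatesHook = std::function<void()>;

// Returns the pid of the killable candidate with the highest
// oom_score_adj, or -1 after invoking the hook if there is none.
int PickVictim(const std::vector<Candidate>& candidates,
               const NoKillCandidatesHook& hook) {
    const Candidate* best = nullptr;
    for (const auto& c : candidates) {
        if (!c.killable) continue;
        if (best == nullptr || c.oom_score_adj > best->oom_score_adj) best = &c;
    }
    if (best != nullptr) return best->pid;
    if (hook) hook();  // e.g. notify an external memory manager
    return -1;
}
```

Passing the hook as a callable keeps the kill-selection logic independent of whoever needs to be notified.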
Occasionally the test reads logcat before the lmkd reaper has had a chance to
write into it, resulting in the expected report messages being missed.
Add a 200ms delay after lmkd kills the process to give the lmkd reaper thread
more time to write its reports.
Bug: 347296675
Bug: 358830454
Test: atest lmkd_tests
Change-Id: I2549e37f25c81c9add91f7ee450c4a96c8cf18e4
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
The starting lines of kill and reap reports are the same. Add additional
logic in the test to distinguish between them and retry if a kill report
is found.
Bug: 347296675
Bug: 358830454
Test: mock the input data where lmkd_tests failed
Change-Id: Idf83831e45e6682c1dfb6cde258d4ec631a5eb32
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Add an RSS field to the LMK_PROCKILL cmd to report the latest memory usage of
the killed process.
Test: Verified RSS field is captured in ApplicationExitInfo
Bug: 322549716
Change-Id: Ic1788e8121da97cd879bd7e9d685c7b879ea5475
Signed-off-by: Carlos Galo <carlosgalo@google.com>
When we get nothing from /proc/<PID>/status and /proc/<PID>/cmdline,
we should return NULL or false because this usually indicates that the
process has already terminated. We should avoid attempting
to kill a non-existent process, as doing so needlessly wastes
the kill timeout.
Bug: 331612600
Test: give memory pressure to trigger LMKD
Change-Id: I468ff25012f9bb6fc842a7fad268ebcad0de4690
Signed-off-by: Martin Liu <liumartin@google.com>
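The early-return idea from the commit above can be sketched as below. This is an assumption-laden illustration rather than the lmkd code: `ReadProcFile` and `ProcessLooksAlive` are hypothetical helpers, and the sketch treats an unreadable or empty file as "process gone".

```cpp
#include <cassert>
#include <fstream>
#include <iterator>
#include <optional>
#include <string>

// Reads a whole file; returns nullopt if it cannot be opened or is empty.
std::optional<std::string> ReadProcFile(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    if (!in) return std::nullopt;
    std::string data((std::istreambuf_iterator<char>(in)),
                     std::istreambuf_iterator<char>());
    if (data.empty()) return std::nullopt;
    return data;
}

// Hypothetical pre-kill check: proc_dir would be "/proc/<PID>".
// A caller would skip the kill attempt when this returns false.
bool ProcessLooksAlive(const std::string& proc_dir) {
    return ReadProcFile(proc_dir + "/status").has_value() &&
           ReadProcFile(proc_dir + "/cmdline").has_value();
}
```

Reporting "nothing read" as a failure lets the caller move on to the next candidate immediately instead of waiting out a kill timeout for a process that no longer exists.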
Experiments on low-RAM devices indicate regressions due to the new low
memory kill reason, which causes LMKD to kill too many processes. Change
ro.lmk.lowmem_min_oom_score to disable kills for this reason by default.
Bug: 341257415
Change-Id: Id7137c4c8d888061353b253dc6906d2854e31b1d
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
We are trying to remove BPF_FD_JUST_USE_INT since we now have access to
libbase everywhere.
Test: builds
Change-Id: Ie9445d3d648e6837deb718aa38ebef3c936653d6