For now the only unsolicited message from lmkd is the notification of
processes killed due to memory pressure.
Bug: 136036078
Test: atest ApplicationExitInfoTest
Change-Id: I503fd6a45ebab5276460b0ab978ebb2b8431dc0d
Signed-off-by: Jing Ji <jji@google.com>
pidhash is defined as an array of pointers:
static struct proc** pidhash = NULL;
...So we should be allocating `LINE_MAX * sizeof(struct proc *)` bytes
here. Given the current constants, this saves ~130KB; not a big deal,
but still worthwhile.
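A minimal sketch of the corrected allocation. The `struct proc` below is a hypothetical cut-down stand-in (the real lmkd record has different fields); using `sizeof(*pidhash)` is one idiomatic way to keep the element size tied to the variable's pointer type, so the MallocSizeof mismatch cannot recur if the type changes.

```c
#include <limits.h>
#include <stdlib.h>

#ifndef LINE_MAX
#define LINE_MAX 2048 /* fallback: POSIX minimum value (_POSIX2_LINE_MAX) */
#endif

/* Hypothetical stand-in for the lmkd process record. */
struct proc {
    int pid;
    int oomadj;
    char taskname[256];
    struct proc *pidhash_next;
};

static struct proc **pidhash = NULL;

/* Allocate LINE_MAX bucket heads. Each bucket only holds a pointer, so the
 * element size is sizeof(struct proc *), not sizeof(struct proc).
 * calloc() zero-fills, so every bucket starts out empty (NULL). */
static int pidhash_init(void) {
    pidhash = calloc(LINE_MAX, sizeof(*pidhash));
    return pidhash != NULL ? 0 : -1;
}
```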
Caught by clang's static analyzer:
system/memory/lmkd/statslog.c:354:19: warning: Result of 'calloc' is
converted to a pointer of type 'struct proc *', which is incompatible
with sizeof operand type 'struct proc'
[clang-analyzer-unix.MallocSizeof]
Bug: None
Test: TreeHugger
Change-Id: Iee9ca00a3a2a0ecababe9810d2ffcfc42169dd25
Add an optional process type field to the lmkd registration protocol so that
applications can be distinguished from services.
Bug: 129011369
Test: boot and verify native service registration
Change-Id: Ie610b5d07cbe247a55ab31bc079ee5c5923bea11
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Introduce the lmkd_unregister_proc helper function. Fix a bug where
lmkd_pack_set_procremove used the wrong structure as a parameter.
Bug: 129011369
Test: verify process record removal when it is manually killed
Change-Id: I7ab5a499f6b1c6eecfdba4d0a5ec916053e2726a
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
In order to register native services, init needs the ability to communicate
with lmkd. Make the liblmkd_utils library available in recovery mode so that
init can link to it, and add a data socket in lmkd to support an additional
connection from init. Ensure the lmkd socket uses SOCK_CLOEXEC to prevent
init's children from inheriting it.
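A small sketch of what SOCK_CLOEXEC buys, assuming an AF_UNIX SOCK_SEQPACKET socket (how lmkd actually creates or obtains its socket is not shown here). Passing the flag in the socket type sets close-on-exec atomically at creation, so there is no window in which a concurrent fork()+exec() could leak the descriptor, as there would be with a separate fcntl(F_SETFD) call. `make_cloexec_socket` and `has_cloexec` are hypothetical helper names.

```c
#include <fcntl.h>
#include <sys/socket.h>
#include <unistd.h>

/* Create a socket that is closed automatically in any exec'd child. */
static int make_cloexec_socket(void) {
    return socket(AF_UNIX, SOCK_SEQPACKET | SOCK_CLOEXEC, 0);
}

/* Returns 1 if fd has FD_CLOEXEC set, 0 if not, -1 on error. */
static int has_cloexec(int fd) {
    int flags = fcntl(fd, F_GETFD);
    if (flags < 0)
        return -1;
    return (flags & FD_CLOEXEC) ? 1 : 0;
}
```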
Bug: 129011369
Test: boot and verify native service registration
Change-Id: Iaa4f59282fb10f838f6811571e97d55754b1bd41
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Associate each registered process with the PID of the lmkd client that
registered it to prevent one client from updating records of another
client.
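A minimal sketch of the ownership check, assuming a simplified record with a hypothetical `reg_pid` field. On an AF_UNIX connection, one way to learn the client's PID is getsockopt(SO_PEERCRED); the helper names here are illustrative, not lmkd's.

```c
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical, simplified process record. */
struct proc {
    pid_t pid;      /* the registered process */
    pid_t reg_pid;  /* PID of the lmkd client that registered it */
    int oomadj;
};

/* Only the client that registered a record may modify it. */
static bool client_owns(const struct proc *p, pid_t client_pid) {
    return p != NULL && p->reg_pid == client_pid;
}

/* Returns 0 on success, -1 if the caller did not register this record. */
static int update_oomadj(struct proc *p, pid_t client_pid, int oomadj) {
    if (!client_owns(p, client_pid))
        return -1;
    p->oomadj = oomadj;
    return 0;
}
```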
Bug: 129011369
Test: boot and verify native service registration
Change-Id: Id8ca7bb6314df225d04da6469b523d2cdc237eaa
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
poll_handler should always be set to handler_info, so it's more efficient
to just assign it unconditionally.
Test: TreeHugger
Change-Id: I55b5164da1817ef77b5d455eb618f9a2471afc5c
lmkd uses PIDs to track processes; however, occasionally a process's PID
might be reused without lmkd detecting it. This can happen if the originally
registered process crashes, PID numbers wrap around and the same PID gets
reused for a different process. In this situation lmkd might kill the wrong
process. To prevent this, lmkd will track processes using their pidfds.
During process registration lmkd calls sys_pidfd_open and stores the
returned pidfd with the process record. The pidfd will not be reused until
lmkd closes it, which happens only after the process is unregistered. This
way lmkd ensures that process identification is unique and can't be reused.
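A sketch of the registration/unregistration bookkeeping, assuming a simplified record and the raw syscall (older C libraries have no pidfd_open() wrapper). The `proc_register`/`proc_unregister` names are hypothetical; a pidfd of -1 stands for "unsupported or failed", in which case the caller would fall back to PID-only tracking (an assumption about the fallback policy).

```c
#define _GNU_SOURCE
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

#ifndef __NR_pidfd_open
#define __NR_pidfd_open 434 /* generic syscall table value (x86-64, arm64, ...) */
#endif

static int sys_pidfd_open(pid_t pid, unsigned int flags) {
    return (int)syscall(__NR_pidfd_open, pid, flags);
}

/* Hypothetical record keeping the pidfd alongside the PID. */
struct proc {
    pid_t pid;
    int pidfd; /* -1 if pidfd is unsupported or opening failed */
};

/* Pin the process identity: while the pidfd stays open it refers to this
 * exact process, even if the numeric PID is later recycled. */
static void proc_register(struct proc *p, pid_t pid) {
    p->pid = pid;
    p->pidfd = sys_pidfd_open(pid, 0);
}

/* Release the pidfd only when the record is removed. */
static void proc_unregister(struct proc *p) {
    if (p->pidfd >= 0) {
        close(p->pidfd);
        p->pidfd = -1;
    }
}
```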
Bug: 135608568
Test: lmkd_unit_test with and without pidfd kernel support
Change-Id: Ida10ea13905c250e47f792cdd6bd2e65aeaa3709
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
With pidfd polling support, lmkd can detect process death without periodic
polling. Implement a mechanism to detect kernel pidfd support, using the
existence of the pidfd_open syscall as an indicator. Implement the logic to
use a pidfd to wait for process death.
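A sketch of both pieces under stated assumptions: support is probed by opening a pidfd for our own PID, and death is detected because a pidfd becomes readable (POLLIN) once its process terminates. poll(2) is used for brevity; a daemon would more likely register the pidfd with its existing event loop. `pidfd_supported` and `wait_for_death` are hypothetical names.

```c
#define _GNU_SOURCE
#include <poll.h>
#include <stdbool.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

#ifndef __NR_pidfd_open
#define __NR_pidfd_open 434 /* generic syscall table value (x86-64, arm64, ...) */
#endif

static int sys_pidfd_open(pid_t pid, unsigned int flags) {
    return (int)syscall(__NR_pidfd_open, pid, flags);
}

/* Treat a successful pidfd_open() on our own PID as proof of support. */
static bool pidfd_supported(void) {
    int fd = sys_pidfd_open(getpid(), 0);
    if (fd < 0)
        return false;
    close(fd);
    return true;
}

/* Wait until the process behind pidfd exits or timeout_ms elapses.
 * Returns 1 if it exited, 0 on timeout, -1 on error. */
static int wait_for_death(int pidfd, int timeout_ms) {
    struct pollfd pfd = { .fd = pidfd, .events = POLLIN };
    int ret = poll(&pfd, 1, timeout_ms);
    if (ret < 0)
        return -1;
    return (ret > 0 && (pfd.revents & POLLIN)) ? 1 : 0;
}
```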
Bug: 135608568
Test: lmkd_unit_test with and without pidfd kernel support
Change-Id: Ic6db7e50893534467f5130a7f998b66fb4451272
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Caught by clang's static analyzer:
system/core/lmkd/lmkd.c:930:9: warning: 1st function call argument is an
uninitialized value [clang-analyzer-core.CallAndMessage]
Bug: None
Test: TreeHugger
Change-Id: I9dc8e97d6aa22ea977fa06553d957a31a9df8819
meminfo_log is used to log the state of the memory at the time of a kill.
Instead of reporting kill information and meminfo separately let's combine
them into one killinfo_log report. While normal logs can be trimmed by
chatty, meminfo_log uses a separate log context which gives it a better
chance of survival. As a result, all the information relevant to a kill
will be in one report that has a higher chance of surviving chatty.
Bug: 74119935
Test: lmkd_unit_test
Change-Id: I83a9c12d538e1fb107721b04fdafc3c6c0d83b60
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Move statsd related code out of lmkd.c to minimize ifdefs sprinkled around
the code and make it more maintainable.
Bug: 74119935
Test: lmkd_unit_test
Change-Id: Ib22f90fd380b9a31e09ab18ef16787bc07415ddf
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
When lmkd fails to kill it should log the error, remove the process record
and exit immediately.
Bug: 74119935
Test: lmkd_unit_test
Change-Id: I26b0fd873eeed325f7dd56097ec31abc0d63f3a1
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Page cache thrashing degrades device performance, and lmkd tries to stop it
by killing a process. However, if the thrashing application is the one the
user is interacting with, lmkd should not kill it, even though the thrashing
might affect device performance.
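One way to express this rule is a minimum oom_score_adj for thrashing victims. FOREGROUND_APP_ADJ (0) and PERCEPTIBLE_APP_ADJ (200) are Android's ProcessList values; the choice to spare everything up to perceptible, and the `kill_reason`/`may_kill` names, are illustrative assumptions rather than lmkd's exact policy.

```c
#include <stdbool.h>

#define FOREGROUND_APP_ADJ 0
#define PERCEPTIBLE_APP_ADJ 200

enum kill_reason { KILL_LOW_MEM, KILL_THRASHING };

/* Lowest oom_score_adj a victim may have for the given reason.
 * Assumed policy: thrashing kills never target the foreground app. */
static int min_victim_adj(enum kill_reason reason) {
    return reason == KILL_THRASHING ? PERCEPTIBLE_APP_ADJ + 1
                                    : FOREGROUND_APP_ADJ;
}

static bool may_kill(int oom_score_adj, enum kill_reason reason) {
    return oom_score_adj >= min_victim_adj(reason);
}
```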
Bug: 141286980
Test: SequentialRWTest CTS test
Change-Id: If86c0e7e8ad9adf1816659562151ca083eaa65c4
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Allow the kill report to be appended with an explanation of the reasons for
the kill. This helps identify kill reasons when troubleshooting lmkd kills.
Change-Id: Ie5dd7a44e51d04c43c2492be8c1bc964d1b03555
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Enable new kill strategy when PSI mode is used in combination with
ro.lmk.use_minfree_levels=false. Adjust ro.lmk.swap_free_low_percentage,
introduce ro.lmk.psi_partial_stall_ms and ro.lmk.psi_complete_stall_ms
system properties to support two levels of PSI events measuring partial
and complete stalls. Add ro.lmk.use_new_strategy system property to switch
to the old strategy if necessary.
Bug: 132642304
Test: lmkd_unit_test, ACT memory pressure tests
Change-Id: I6f1b65e19dbe9b58c862e5e4255270c82f0afb9a
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Parsing /proc/zoneinfo is expensive and zone watermarks normally do not
change often. Instead of checking free memory for each zone, we aggregate
zone watermarks and compare them with MemFree from meminfo as an
approximation of memory being under a given watermark.
zoneinfo parsing is rate limited to once per minute to detect a possible
change of the memory margins from userspace.
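A sketch of the aggregation and rate limit. Struct and function names are hypothetical; watermarks are in pages as in /proc/zoneinfo, and MemFree (reported in kB) is assumed to have been converted to pages by the caller. The 60-second refresh interval comes from the text above.

```c
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

struct zone_wmarks { int64_t min, low, high; };

enum wmark { WMARK_MIN, WMARK_LOW, WMARK_HIGH, WMARK_NONE };

/* Sum per-zone watermarks so they can be compared with system-wide MemFree. */
static struct zone_wmarks sum_wmarks(const struct zone_wmarks *zones, int n) {
    struct zone_wmarks sum = { 0, 0, 0 };
    for (int i = 0; i < n; i++) {
        sum.min += zones[i].min;
        sum.low += zones[i].low;
        sum.high += zones[i].high;
    }
    return sum;
}

/* Most severe watermark that free memory is below, or WMARK_NONE. */
static enum wmark breached_wmark(int64_t free_pages, const struct zone_wmarks *w) {
    if (free_pages < w->min) return WMARK_MIN;
    if (free_pages < w->low) return WMARK_LOW;
    if (free_pages < w->high) return WMARK_HIGH;
    return WMARK_NONE;
}

#define ZONEINFO_REFRESH_SEC 60

/* Re-parse /proc/zoneinfo at most once per minute. */
static bool zoneinfo_stale(const struct timespec *last, const struct timespec *now) {
    return now->tv_sec - last->tv_sec >= ZONEINFO_REFRESH_SEC;
}
```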
Bug: 132642304
Test: lmkd_unit_test, ACT memory pressure tests
Change-Id: If4a8154c004e24324e6de44359de416766139df6
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Add new kill strategy which makes kill decisions based on which zone
watermark is breached, how much free swap space is still available and
what percentage of the file-backed page cache has been refaulted. This mode
is designed to be used only with PSI signals. It kills unconditionally when
a critical pressure event is received, so the PSI stall threshold for that
event should be set to a value representing a truly unresponsive system
(currently set to 700ms out of 1sec spent in complete stall). New event
handler also controls polling interval based on current memory conditions.
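The inputs named above can be combined into one decision function. This is an illustrative sketch only: the `mem_state` fields, the ordering of checks and the `thrashing_limit` tunable are assumptions, not lmkd's actual heuristic.

```c
#include <stdbool.h>

enum wmark { WMARK_MIN, WMARK_LOW, WMARK_HIGH, WMARK_NONE };

/* Inputs gathered on each PSI event or poll (hypothetical names). */
struct mem_state {
    bool critical_event;  /* PSI complete-stall event fired */
    enum wmark wmark;     /* most severe watermark breached */
    bool swap_low;        /* free swap below ro.lmk.swap_free_low_percentage */
    int thrashing_pct;    /* refaulted file pages as % of file cache */
};

static bool should_kill(const struct mem_state *s, int thrashing_limit) {
    if (s->critical_event)
        return true;  /* system considered non-responsive: kill unconditionally */
    if (s->wmark <= WMARK_LOW && s->swap_low)
        return true;  /* below min or low watermark with little swap left */
    if (s->thrashing_pct > thrashing_limit)
        return true;  /* page cache is thrashing */
    return false;
}
```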
Bug: 132642304
Test: lmkd_unit_test, ACT memory pressure tests
Change-Id: Ia213ef2bb06b245d651ebf2d813e944b4ae7565f
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
After a memory event happens, the event handler can assess the current memory
condition and decide if and when lmkd should re-check memory metrics in
order to respond to changing memory conditions. Change the event handler
interface to allow control over the polling period and the ability to
start/extend a polling session.
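A sketch of such an interface: the handler returns a request and the main loop applies it. The enum values, the `polling_params` fields and the 1-second session window are hypothetical; starting while a session is already active extends it, matching the start/extend behavior described above.

```c
#include <stdbool.h>

enum polling_update { POLLING_DO_NOT_CHANGE, POLLING_START, POLLING_STOP };

struct polling_params {
    bool active;
    int interval_ms;   /* delay until the next re-check */
    long start_ms;     /* when the current session started */
    long deadline_ms;  /* session ends at this time */
};

#define POLLING_WINDOW_MS 1000 /* assumed session length */

static void apply_polling_update(struct polling_params *pp, enum polling_update upd,
                                 int interval_ms, long now_ms) {
    switch (upd) {
    case POLLING_START:
        if (!pp->active) {          /* new session */
            pp->active = true;
            pp->start_ms = now_ms;
        }
        pp->interval_ms = interval_ms;
        pp->deadline_ms = now_ms + POLLING_WINDOW_MS;  /* start or extend */
        break;
    case POLLING_STOP:
        pp->active = false;
        break;
    case POLLING_DO_NOT_CHANGE:
        break;
    }
}

/* True if the main loop should call the handler again at now_ms. */
static bool polling_should_continue(const struct polling_params *pp, long now_ms) {
    return pp->active && now_ms < pp->deadline_ms;
}
```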
Bug: 132642304
Test: lmkd_unit_test
Change-Id: Ia74011e943140b6cffbf452ff8e1744b7336eacf
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
/proc/zoneinfo contains per-node data, and each node contains a per-zone
section for each zone. The current parser does not recognize this hierarchy,
so useful per-zone information like zone watermarks cannot be retrieved.
Change the parser to parse zoneinfo into a hierarchical structure. The new
parser handles up to 2 nodes and can be easily extended to handle
more if needed by changing MAX_NR_NODES.
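A minimal sketch of hierarchical parsing. It recognizes the kernel's "Node N, zone NAME" header lines and the per-zone "min", "low" and "high" watermark lines; MAX_NR_ZONES and the struct layout are assumptions, and every other field is ignored here.

```c
#include <stdio.h>
#include <string.h>

#define MAX_NR_NODES 2
#define MAX_NR_ZONES 6 /* assumed upper bound on zones per node */

struct zone_info { char name[16]; long min, low, high; };
struct node_info { int id; int nr_zones; struct zone_info zones[MAX_NR_ZONES]; };
struct zoneinfo { int nr_nodes; struct node_info nodes[MAX_NR_NODES]; };

/* Parses buf in place. Returns 0, or -1 if the fixed limits are exceeded. */
static int parse_zoneinfo(char *buf, struct zoneinfo *zi) {
    struct node_info *node = NULL;
    struct zone_info *zone = NULL;
    char *line = buf;

    memset(zi, 0, sizeof(*zi));
    while (line != NULL && *line != '\0') {
        char *next = strchr(line, '\n');
        int node_id;
        char name[16], key[32];
        long val;

        if (next != NULL)
            *next++ = '\0'; /* split off one line */

        if (sscanf(line, "Node %d, zone %15s", &node_id, name) == 2) {
            if (node == NULL || node->id != node_id) { /* new node */
                if (zi->nr_nodes == MAX_NR_NODES)
                    return -1;
                node = &zi->nodes[zi->nr_nodes++];
                node->id = node_id;
            }
            if (node->nr_zones == MAX_NR_ZONES)
                return -1;
            zone = &node->zones[node->nr_zones++];
            strcpy(zone->name, name);
        } else if (zone != NULL && sscanf(line, " %31s %ld", key, &val) == 2) {
            if (strcmp(key, "min") == 0)
                zone->min = val;
            else if (strcmp(key, "low") == 0)
                zone->low = val;
            else if (strcmp(key, "high") == 0)
                zone->high = val;
        }
        line = next;
    }
    return 0;
}
```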
Bug: 132642304
Test: lmkd_unit_test
Change-Id: I9306289ea6d30d78a261c5d5c29f4f6ea167807d
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Files like /proc/zoneinfo or /proc/<pid>/status can be larger than the 4KB
page size. Change the reread_file routine to resize its read buffer whenever
it is not big enough to read the entire file. Start with a 1-page buffer
and double its size until it's big enough to read the entire file.
Read /proc/zoneinfo during initialization to grow the buffer
to a big enough size and avoid re-allocations when under memory pressure.
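A sketch of the grow-by-doubling read. `reread_fd` is a hypothetical name, not lmkd's reread_file signature. A read that fills the buffer completely means the file may be larger, so the buffer is doubled and the read retried from offset 0; the capacity is kept across calls so a buffer grown during initialization is reused later.

```c
#define _GNU_SOURCE
#include <stdlib.h>
#include <unistd.h>

/* Returns bytes read (buffer NUL-terminated) or -1 on error. */
static ssize_t reread_fd(int fd, char **buf, size_t *size) {
    if (*buf == NULL) {
        long page = sysconf(_SC_PAGESIZE);
        *size = page > 0 ? (size_t)page : 4096; /* start with one page */
        *buf = malloc(*size);
        if (*buf == NULL)
            return -1;
    }
    for (;;) {
        ssize_t len = pread(fd, *buf, *size - 1, 0); /* keep room for NUL */
        if (len < 0)
            return -1;
        if ((size_t)len < *size - 1) { /* the whole file fit */
            (*buf)[len] = '\0';
            return len;
        }
        char *bigger = realloc(*buf, *size * 2); /* may be larger: double */
        if (bigger == NULL)
            return -1;
        *buf = bigger;
        *size *= 2;
    }
}
```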
Bug: 137010962
Test: lmkd_unit_test
Change-Id: If9a5b0d27c2f4de9063f0fd0f36f908ece87dcce
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Fix termination of killed process name in proc_get_name function. While at
it also fix the coding style in the function.
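The commit does not show the fix itself; as a hedged illustration of correct termination, this hypothetical `proc_get_name_sketch` reads /proc/<pid>/cmdline, reserves one byte for the terminator and NUL-terminates after the bytes actually read, so argv[0] always ends within the buffer.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

static char *proc_get_name_sketch(pid_t pid, char *buf, size_t buf_size) {
    char path[64];
    ssize_t ret;
    int fd;

    if (buf_size == 0)
        return NULL;
    snprintf(path, sizeof(path), "/proc/%d/cmdline", (int)pid);
    fd = open(path, O_RDONLY | O_CLOEXEC);
    if (fd < 0)
        return NULL;
    ret = read(fd, buf, buf_size - 1); /* leave room for the terminator */
    close(fd);
    if (ret <= 0)
        return NULL; /* error, or empty cmdline (e.g. kernel threads) */
    buf[ret] = '\0'; /* argv[0] now ends at the first NUL in buf */
    return buf;
}
```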
Test: lmkd_unit_test
Bug: 141780598
Change-Id: I3f99b3e37b9a9d0750ece94f08f0b50ac839dacb
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Required because the kernel cannot always get the taskname safely at
the time the process is killed (due to competition for mm->mmap_sem).
Test: manually
Bug: 130017100
Signed-off-by: Jim Blackler <jimblackler@google.com>
Change-Id: I27a2c3340da321570f0832d58fe9e79ca031620b