Parsing /proc/zoneinfo is expensive and zone watermarks normally do no
change often. Instead of checking free memory per each zone we aggregate
zone watermarks and compare them with MemFree from meminfo as an
approximation of memory being under a given watermark.
zoneinfo parsing is rate limited to once per minute to detect a possible
change of the memory margins from userspace.
Bug: 132642304
Test: lmkd_unit_test, ACT memory pressure tests
Change-Id: If4a8154c004e24324e6de44359de416766139df6
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Add new kill strategy which makes kill decisions based on which zone
watermark is breached, how much free swap space is still available and
what percentage of the file-backed page cache has been refaulted. This mode
is designed to be used only with PSI signals. It kills unconditionally when
a critical pressure event is received, therefore PSI stall for that event
should be set to a value representing a truly non-responding system
(currently set to 700ms out of 1sec spent in complete stall). New event
handler also controls polling interval based on current memory conditions.
Bug: 132642304
Test: lmkd_unit_test, ACT memory pressure tests
Change-Id: Ia213ef2bb06b245d651ebf2d813e944b4ae7565f
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
After a memory event happens event handler can assess current memory
condition and decide if and when lmkd should re-check memory metrics in
order to respond to changing memory conditions. Change the event handler
interface to allow control over polling period and ability to start/extend
polling session.
Bug: 132642304
Test: lmkd_unit_test
Change-Id: Ia74011e943140b6cffbf452ff8e1744b7336eacf
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
/proc/zoneinfo contains per-node data and each node contains per-zone
section for each zone. Current parser does not recognize this hierarchy
and useful per-zone information like zone watermarks cannot be retrieved.
Change the parser to parse zoneinfo into a hierarchical structure. New
parser also handles up to 2 nodes and can be easily extended to handle
more if needed by changing MAX_NR_NODES.
Bug: 132642304
Test: lmkd_unit_test
Change-Id: I9306289ea6d30d78a261c5d5c29f4f6ea167807d
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Files like /proc/zoneinfo or /proc/<pid>/status can be larger than 4KB
page size. Change reread_file routine to resize read buffer whenever
it is not big enough to read the entire file. Start with 1-page buffer
and double its size until it's big enough to read the entire file.
Read /proc/zoneinfo during initialization to initialize the buffer
to a big enough size and avoid re-allocations when under memory pressure.
Bug: 137010962
Test: lmkd_unit_test
Change-Id: If9a5b0d27c2f4de9063f0fd0f36f908ece87dcce
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Fix termination of killed process name in proc_get_name function. While at
it also fix the coding style in the function.
Test: lmkd_unit_test
Bug: 141780598
Change-Id: I3f99b3e37b9a9d0750ece94f08f0b50ac839dacb
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Required because the kernel cannot always get the taskname safely at
the time the process is killed (due to competition for mm->mmap_sem).
Test: manually
Bug: 130017100
Signed-off-by: Jim Blackler <jimblackler@google.com>
Change-Id: I27a2c3340da321570f0832d58fe9e79ca031620b
Only thread group leaders should be registered with lmkd. Add a check to
ignore any non-leader TIDs and generate an error if such condition is
detected. Run the same check before killing a process to detect cases of
non-leader TIDs being used to kill a process. This might happen if PIDs
overflow and previously registered PID gets reused for a non-leader
thread in the following scenario:
1. pid X is a thread group leader and is registered with lmkd
2. pid X dies without lmkd knowing it and pid gets recycled
3. process Y creates a thread with tid X
4. lmkd kills pid X which results in process Y being killed
Bug: 136408020
Test: lmkd_unit_test
Change-Id: I46c5a0b273f2b72cefc20ec59b80b4393f2a1a37
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Occasionally we see cases when 40ms polling is still too conservative.
Change to 10ms polling period. Since the polling happens only after PSI
signal and continues for 1sec this should not affect system performance.
Test: lmkd_unit_test
Bug: 129358844
Change-Id: Ib759b865b2104be23741fc0eacaa541e22d50dde
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Previous change If154dc364711bf7c86f32e24ddcd10be359386de called
"lmkd: Do not downgrade/ignore events when swap is full" added SwapTotal
into meminfo structure without adding the field into events.logtag file.
This results in logs which missing field and all fields starting with
"SwapFree" get reordered as a result. Fix this by adding the missing field
into events.logtag.
Bug: 129274901
Test: Confirm correct information in the logcat
Change-Id: Ia4de3790a7e9d49a0e4cba8b3161a715eaf6532e
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
200ms was too lenient when under severe memory pressure.
Test: boots, works
Bug: 127765309
Change-Id: I8e047de6318574a107720c56473ed0f25582e182
Signed-off-by: Tim Murray <timmurray@google.com>
am: e7a9fabd64 -s ours
am skip reason: change_id I4ef6433391c8758626334731d2b5de038e4468ae with SHA1 1417cdbddb is in history
Change-Id: I0b6eb14568d480b13fd0cea14863a9ad4c14c0cd
am: e7cfa67a05 -s ours
am skip reason: change_id Ie555933aafa6a6b7aa1dbf5518ebe804376e0afd with SHA1 fe31bef940 is in history
Change-Id: I3c357f360da2969e1850f32419e4b074bbe17e21
am: b68fe506e0 -s ours
am skip reason: change_id I4ef6433391c8758626334731d2b5de038e4468ae with SHA1 34c3cb84a0 is in history
Change-Id: I2521241013c39d3cc16f45b1df996135f92a053f
am: 9eee2302ee -s ours
am skip reason: change_id Ie555933aafa6a6b7aa1dbf5518ebe804376e0afd with SHA1 4dbc24d393 is in history
Change-Id: I42d2080003936bcf99fec6917fe5e52366855c07
This is to measure an application's behavior with respect to being LMKed
(the longer an app lives before being LMKed, the better).
Bug: 119854389
Test: Manual
Change-Id: I4ef6433391c8758626334731d2b5de038e4468ae
Merged-In: I4ef6433391c8758626334731d2b5de038e4468ae
(cherry picked from I4ef6433391c8758626334731d2b5de038e4468ae)
am: 962e0442d1 -s ours
am skip reason: change_id I4ef6433391c8758626334731d2b5de038e4468ae with SHA1 1417cdbddb is in history
Change-Id: I56f76418a5c6a3435dec766d731068f60bd4b642
am: 2bc24f88ca -s ours
am skip reason: change_id Ie555933aafa6a6b7aa1dbf5518ebe804376e0afd with SHA1 4dbc24d393 is in history
Change-Id: I5676596b2ee9f7448faa0b0274ac9425c7525fb0
This is to measure an application's behavior with respect to being LMKed
(the longer an app lives before being LMKed, the better).
Bug: 119854389
Test: Manual
Change-Id: I4ef6433391c8758626334731d2b5de038e4468ae
Merged-In: I4ef6433391c8758626334731d2b5de038e4468ae
(cherry picked from I4ef6433391c8758626334731d2b5de038e4468ae)
Log min_score_adj when lmkd kills a process to determine the oom_score
levels that lmkd considers during the kill.
Bug: 123024834
Change-Id: I986ae8f2808199b1654bc8d2a32dd88046c79aa3
Signed-off-by: Suren Baghdasaryan <surenb@google.com>