Allow kill report to be appended with the explanation of the reasons
killing has been done. This would help identify kill reasons while
troubleshooting lmkd kills.
Change-Id: Ie5dd7a44e51d04c43c2492be8c1bc964d1b03555
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Enable new kill strategy when PSI mode is used in combination with
ro.lmk.use_minfree_levels=false. Adjust ro.lmk.swap_free_low_percentage,
introduce ro.lmk.psi_partial_stall_ms and ro.lmk.psi_complete_stall_ms
system properties to support two levels of PSI events measuring partial
and complete stalls. Add ro.lmk.use_new_strategy system property to switch
to the old strategy if necessary.
Bug: 132642304
Test: lmkd_unit_test, ACT memory pressure tests
Change-Id: I6f1b65e19dbe9b58c862e5e4255270c82f0afb9a
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Parsing /proc/zoneinfo is expensive and zone watermarks normally do no
change often. Instead of checking free memory per each zone we aggregate
zone watermarks and compare them with MemFree from meminfo as an
approximation of memory being under a given watermark.
zoneinfo parsing is rate limited to once per minute to detect a possible
change of the memory margins from userspace.
Bug: 132642304
Test: lmkd_unit_test, ACT memory pressure tests
Change-Id: If4a8154c004e24324e6de44359de416766139df6
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Add new kill strategy which makes kill decisions based on which zone
watermark is breached, how much free swap space is still available and
what percentage of the file-backed page cache has been refaulted. This mode
is designed to be used only with PSI signals. It kills unconditionally when
a critical pressure event is received, therefore PSI stall for that event
should be set to a value representing a truly non-responding system
(currently set to 700ms out of 1sec spent in complete stall). New event
handler also controls polling interval based on current memory conditions.
Bug: 132642304
Test: lmkd_unit_test, ACT memory pressure tests
Change-Id: Ia213ef2bb06b245d651ebf2d813e944b4ae7565f
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
After a memory event happens event handler can assess current memory
condition and decide if and when lmkd should re-check memory metrics in
order to respond to changing memory conditions. Change the event handler
interface to allow control over polling period and ability to start/extend
polling session.
Bug: 132642304
Test: lmkd_unit_test
Change-Id: Ia74011e943140b6cffbf452ff8e1744b7336eacf
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
/proc/zoneinfo contains per-node data and each node contains per-zone
section for each zone. Current parser does not recognize this hierarchy
and useful per-zone information like zone watermarks cannot be retrieved.
Change the parser to parse zoneinfo into a hierarchical structure. New
parser also handles up to 2 nodes and can be easily extended to handle
more if needed by changing MAX_NR_NODES.
Bug: 132642304
Test: lmkd_unit_test
Change-Id: I9306289ea6d30d78a261c5d5c29f4f6ea167807d
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Files like /proc/zoneinfo or /proc/<pid>/status can be larger than 4KB
page size. Change reread_file routine to resize read buffer whenever
it is not big enough to read the entire file. Start with 1-page buffer
and double its size until it's big enough to read the entire file.
Read /proc/zoneinfo during initialization to initialize the buffer
to a big enough size and avoid re-allocations when under memory pressure.
Bug: 137010962
Test: lmkd_unit_test
Change-Id: If9a5b0d27c2f4de9063f0fd0f36f908ece87dcce
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Fix termination of killed process name in proc_get_name function. While at
it also fix the coding style in the function.
Test: lmkd_unit_test
Bug: 141780598
Change-Id: I3f99b3e37b9a9d0750ece94f08f0b50ac839dacb
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Required because the kernel cannot always get the taskname safely at
the time the process is killed (due to competition for mm->mmap_sem).
Test: manually
Bug: 130017100
Signed-off-by: Jim Blackler <jimblackler@google.com>
Change-Id: I27a2c3340da321570f0832d58fe9e79ca031620b
Only thread group leaders should be registered with lmkd. Add a check to
ignore any non-leader TIDs and generate an error if such condition is
detected. Run the same check before killing a process to detect cases of
non-leader TIDs being used to kill a process. This might happen if PIDs
overflow and previously registered PID gets reused for a non-leader
thread in the following scenario:
1. pid X is a thread group leader and is registered with lmkd
2. pid X dies without lmkd knowing it and pid gets recycled
3. process Y creates a thread with tid X
4. lmkd kills pid X which results in process Y being killed
Bug: 136408020
Test: lmkd_unit_test
Change-Id: I46c5a0b273f2b72cefc20ec59b80b4393f2a1a37
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Occasionally we see cases when 40ms polling is still too conservative.
Change to 10ms polling period. Since the polling happens only after PSI
signal and continues for 1sec this should not affect system performance.
Test: lmkd_unit_test
Bug: 129358844
Change-Id: Ib759b865b2104be23741fc0eacaa541e22d50dde
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
200ms was too lenient when under severe memory pressure.
Test: boots, works
Bug: 127765309
Change-Id: I8e047de6318574a107720c56473ed0f25582e182
Signed-off-by: Tim Murray <timmurray@google.com>
Log min_score_adj when lmkd kills a process to determine the oom_score
levels that lmkd considers during the kill.
Bug: 123024834
Change-Id: I986ae8f2808199b1654bc8d2a32dd88046c79aa3
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
lmkd can't kill processes because it has compare the size between free swap and free memory. Free swap is often larger than the free memory when system is under low memory with less swap-backed or swappable pages and finally leads to I/O thrashing.
Test: TreeHugger
Bug: 124727769
Change-Id: Ia2848859aa97a24bd13c704acee4b86cd2d3f647
We already know that "polling" must be non-zero at this point,
because it hasn't been modified since our check on line 1960.
So we remove this check for code clarity.
Test: TreeHugger
Change-Id: I069d9fd0eef70748a5333733dd0518d1ac8021b7
With new psi monitor support in the kernel lmkd can use it to register
custom pressure levels. Add lmkd support for psi monitors when they are
provided by the kernel and use them by default. When kernel does not
support psi lmkd will fall back to vmpressure usage.
Add ability to poll memory status after the initial psi event is triggered
because kernel throttles psi memory pressure events to one per PSI tracking
window (currently set to 1sec). Current implementation polls every 200ms
for 1sec duration after the initial event is triggered.
If ro.lmk.use_psi is set to false psi logic will be disabled even when psi
is supported in kernel.
Bug: 111308141
Test: lmkd_unit_test
Change-Id: I685774b176f393bab7412161773f5c9af51e0163
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
This is to measure an application's behavior with respect to being LMKed
(the longer an app lives before being LMKed, the better).
Bug: 119854389
Test: Manual
Change-Id: I4ef6433391c8758626334731d2b5de038e4468ae
Merged-In: I4ef6433391c8758626334731d2b5de038e4468ae
(cherry picked from I4ef6433391c8758626334731d2b5de038e4468ae)
find_and_kill_processes() does not kill multiple processes at a time
anymore. Remove support for bulk process killing.
Change-Id: Id09132a9cebe44589a1a3ebcbff800a16fa56557
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Kill a single process at a time and try to wait up to 100ms for
that process to reclaim memory before triggering another kill.
Test: boots, works
bug: 116877958
Change-Id: I6775d0534b3e3728c04389d3eae1a00e3cbf9f27
Intrduce LMK_GETKILLCNT command for ActivityManager to get the number of
kills from lmkd.
Bug: 117126077
Test: used lmkd_unit_test to verify correct reporting
Change-Id: I09c720a7176b4df95efc544177cd2694f8d791be
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
lmkd sets the soft limit parameters for Go devices.
The limit for apps in the perceptible group is set to 16M.
However this limit is not sufficient for the keyboard app to
prevent pages from being re-claimed quickly. The mem usage of
the keyboard app is around 55M most cases with some occasional
spikes to 70-80M. Increasing the limit to 64M improves the warm
startup latency for keyboard. It is still lower than the limits
set for foreground and visible apps.
Test: Go device (1G)
Bug: 117517805
Merged-In: Id50e49327cfd76126e41ef6503971845f29196af
Change-Id: Id50e49327cfd76126e41ef6503971845f29196af
lmkd keeps a list of pids registered by ActivityManager, however on rare
occasions when framework restarts and lmkd survives that list has to be
purged. Implement a command that can be used to clear the pid list.
Bug: 116801366
Test: locally by killing zygote process
Change-Id: I71d6012f86bb83a73edd5b687e05a0848e0569b1
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
pid_remove() frees a structure representing registered process and the
pointer can't be used anymore. This change fixes an instance when pointer
was used after it was freed. pid_remove() is moved to the end of the
function and comments are added to prevent similar situation in the future.
Bug: 117625315
Change-Id: I6a922952a31232497b3f9caf87d5a21bd402db94
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Excessive number of failed kill reports when lmkd can't find an eligible
process to kill or frees not enough memory pollutes logs and bugreports.
Cleanup kill reports to remove duplicate information and rate limit failed
kill attempts at 1 report per sec. The number of suppressed failed kills
will be reported in the next lmkd report.
Bug: 113864581
Test: Verified using lmkd_unit_test
Change-Id: I67fa1fec97613f136c7582115edcbc56b1503c9c
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Introduce sys.lmk.minfree_levels system property to allow minfree level
reporting. The format for this property is:
<minfree 1>:<oom_adj 1>, <minfree 2>:<oom_adj 2>, ...
Max number of minfree levels is 6 and they are specified in the
increasing order. For example:
sys.lmk.minfree_levels=18432:0,23040:100,27648:200,32256:300,55296:900,80640:906
sys.lmk.minfree_levels updates are ratelimited to once per second in order
to prevent DoS attacks.
Bug: 111521182
Test: getprop sys.lmk.minfree_levels returns expected value
Change-Id: I80d75d6836650b12457d6a99ca88898535837a97
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
While troubleshooting memory pressure related issues it's hard to get a
good view of the memory state when lmkd kill happens. Logging relevant
information from /proc/meminfo file that was used to make a kill decision
is very helpful for further analysis. To do this efficiently we are using
Android Logger event library functions and log the data used for kill
decision after the kill signal was issued.
Test: Run lmkd_unit_test and logcat -b events -v descriptive
Change-Id: Id5de41b9d91a04dd5d3eb9b85d4e1babe9755628
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
When the swap space is full, a pressure event is unlikely to resolve by
itself. In this case, do not downgrade or ignore the events.
Bug: 112056451
Test: Fill up swap on a 1GB device and check critical vmpressure events
are not downgraded.
Change-Id: If154dc364711bf7c86f32e24ddcd10be359386de
Initial change to remove memory.stat usage when per-application memcgs
are disabled was partially merged into AOSP under the following id:
Ib6dd7586d3ef1c64cb04d16e2d2b21fa9c8e6a3a
This change adds the missing parts.
Bug: 110384555
Change-Id: I1265021b1ede0e68efbf80d6430a959eaf46a69a
Merged-In: Ib6dd7586d3ef1c64cb04d16e2d2b21fa9c8e6a3a
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
We're passing a 'line' whose backing buffer is PAGE_MAX in size
into memory_stat_parse_line(). We protect overflowing the smaller
LINE_MAX 'key' buffer via some C preprocessing macros to assure
we limit the size.
Test: Local build with LMKD_LOG_STATS set for this file.
Bug: 76220622
Change-Id: I9e50d4270f7099e37a9bfc7fb9b9b95cc7adb086
Per-application memory.stat files are not available when per-application
memcgs are not used (per_app_memcg=false). Disable its usage based on
ro.config.per_app_memcg property.
minchan:
* correct indentation of memory_stat_parse
* move per_app_memcg check into memory_stat_parse inside
* change low_ram_device to per_app_memcg
Bug: 110384555
Test: manual test to see lkmd log message with memory hogger
Merged-In: Ib6dd7586d3ef1c64cb04d16e2d2b21fa9c8e6a3a
Change-Id: Ib6dd7586d3ef1c64cb04d16e2d2b21fa9c8e6a3a
Signed-off-by: Minchan Kim <minchan@google.com>
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
* New clang compiler requires variadic function to have
at least one named parameter type.
* Use ##__VA_ARGS__ to work with empty __VA_ARGS__.
* Fix one ALOG_ASSERT parameter bug in lmkd/lmkd.c.
Bug: 111614304
Test: make with WITH_TIDY=1
Change-Id: I90f35aa88527a6897954f69a35b256a157a725c5
Setting memory.soft_limit_in_bytes on high-end devices with large memory
reserves affects performance of memory-hungry applications that have
large workingsets and keep thrashing because of the memory limits imposed.
Limit the usage of memory.soft_limit_in_bytes to low-memory devices only.
Add debug messages for future troubleshooting to capture cases when
vmpressure events are being ignored.
Bug: 78916015
Test: collect vmstat while running a heavy app
Change-Id: Ib4434b96d2be802ef89960b573486eae8d12f198
Merged-In: Ib4434b96d2be802ef89960b573486eae8d12f198
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Setting memory.soft_limit_in_bytes on high-end devices with large memory
reserves affects performance of memory-hungry applications that have
large workingsets and keep thrashing because of the memory limits imposed.
Limit the usage of memory.soft_limit_in_bytes to low-memory devices only.
Add debug messages for future troubleshooting to capture cases when
vmpressure events are being ignored.
Bug: 78916015
Test: collect vmstat while running a heavy app
Change-Id: Ib4434b96d2be802ef89960b573486eae8d12f198
Merged-In: Ib4434b96d2be802ef89960b573486eae8d12f198
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
This reverts commit 5e60f88cab8a22d88d76f7fd49f5a6e9cc0d931b.
Reason for revert: broke some builds
Bug: 78603347
Change-Id: I46bf6face35f5399d7d43146b360c0703eedfb1a