Add psi memory, io and cpu 10sec averages into killinfo reports.
Bug: 205182133
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Martin Liu <liumartin@google.com>
Change-Id: Ifbcf47cab291fb20dbf0b5d32f1965f4e6462b49
Merged-In: Ifbcf47cab291fb20dbf0b5d32f1965f4e6462b49
When system is under heavy memory pressure the system might be able
to keep free memory above the min watermark avoiding perceptible app
kills. In such situation system might end up using all its cpu
capacity on memory reclaim and not doing productive work. To detect
this condition, check memory full stall and compare it with the new
ro.lmk.stall_limit_critical tunable representing the stall threshold.
When the recorded level is over ro.lmk.stall_limit_critical, lmkd will
be allowed to kill perceptible apps. ro.lmk.stall_limit_critical
represents the max memory full stall in % that is allowed before
perceptible apps will get killed. By default it is set to 100%, which
effectively disables the feature.
Currently system stall is measured based on psi memory stall 10s average
value, however this definition might change in the future if better
metrics are developed. Setting ro.lmk.stall_limit_critical to 5 means
the system should be fully stalled (no productive work is done) for 5%
of the 10sec period, resulting in 0.5 sec loss due to the stall.
Bug: 205182133
Test: verify on heavy memory pressure test
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Martin Liu <liumartin@google.com>
Change-Id: I9713e30d82641d86d1b7edb5e1ba2971b935c898
Merged-In: I9713e30d82641d86d1b7edb5e1ba2971b935c898
set_process_group_and_prio sets task profiles for each thread in the
process separately. This can be avoided by setting task profiles to
the entire process using its pid.
Bug: 215557553
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Change-Id: I9c1917172019a42809385f6c9c084b8cb343b520
Merged-In: I9c1917172019a42809385f6c9c084b8cb343b520
BG group may have settings such as cpu.shares impacting reclaim
performance. Let us migrate task to foreground sched group similarly to
cpuset group.
Test: Build
Bug: 199797672
Signed-off-by: Wei Wang <wvw@google.com>
Change-Id: I75ee9f3486a2c76e65267a98e39edff96a5e1673
(cherry picked from commit 0195bcdba7)
When a device boots, lmkd starts before persistent properties are loaded,
therefore if experiments set any flags, the corresponding persistent
properties will trigger change notifications when they are first loaded
during boot.
In order to prevent lmkd from re-initializing on every property load,
mark persistent property change by setting lmkd.reinit to 0 and delay
lmkd re-initialization until sys.boot_completed=1 when all properties
are set and only one re-initialization will capture them all. On devices
with no experiment flags being set lmkd.reinit will be undefined at the
boot completion time and re-initialization will not be triggered at all.
Bug: 194316048
Test: adb shell device_config put lmkd_native thrashing_limit_critical 350
Test: adb shell device_config put lmkd_native thrashing_limit 100
Test: adb reboot; adb -b all logcat | grep lmkd
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Change-Id: Iba34fc719a18d58b890549c7415bec869d471901
Merged-In: Iba34fc719a18d58b890549c7415bec869d471901
Allow persist.device_config.lmkd_native.* to override ro.lmk.*
properties to enable experiments with lmkd configuration properties.
Experiments will be able to set appropriate
persist.device_config.lmkd_native.<name> property which will issue
"lmkd --reinit" command to reinitialize lmkd with new parameters.
Bug: 194316048
Test: adb shell device_config put lmkd_native thrashing_limit_critical 350
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Change-Id: Ia48fd51eab126d307a1604530b642e86cf250688
Merged-In: Ia48fd51eab126d307a1604530b642e86cf250688
When a device boots, lmkd starts before persistent properties are loaded,
therefore if experiments set any flags, the corresponding persistent
properties will trigger change notifications when they are first loaded
during boot.
In order to prevent lmkd from re-initializing on every property load,
mark persistent property change by setting lmkd.reinit to 0 and delay
lmkd re-initialization until sys.boot_completed=1 when all properties
are set and only one re-initialization will capture them all. On devices
with no experiment flags being set lmkd.reinit will be undefined at the
boot completion time and re-initialization will not be triggered at all.
Bug: 194316048
Test: adb shell device_config put lmkd_native thrashing_limit_critical 350
Test: adb shell device_config put lmkd_native thrashing_limit 100
Test: adb reboot; adb -b all logcat | grep lmkd
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Change-Id: Iba34fc719a18d58b890549c7415bec869d471901
Merged-In: Iba34fc719a18d58b890549c7415bec869d471901
Allow persist.device_config.lmkd_native.* to override ro.lmk.*
properties to enable experiments with lmkd configuration properties.
Experiments will be able to set appropriate
persist.device_config.lmkd_native.<name> property which will issue
"lmkd --reinit" command to reinitialize lmkd with new parameters.
Bug: 194316048
Test: adb shell device_config put lmkd_native thrashing_limit_critical 350
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Change-Id: Ia48fd51eab126d307a1604530b642e86cf250688
Merged-In: Ia48fd51eab126d307a1604530b642e86cf250688
Thrashing threshold tuning requires collecting thrashing level data from
the field and correlating these levels with other indications of device
being non-responsive.
Include current and max thrashing levels in the lmkd kill reports. Max
thrashing level captures the highest level seen since the last kill report.
Bug: 194433891
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Change-Id: I8a34dc41e7f03668bfad4ac2cbcb5d2570a10752
This reverts commit e1ffef4e36.
Reason for revert: Restore the default thrashing limits to prevent unresponsive devices.
Bug: 194199500
Change-Id: I15be5b3d67a71b68bca6dea9c2d5b4aa54d6c260
Critical thrashing limit determines the balance between how much
thrashing should be tolerated before killing a perceptible app.
This threshold might differ between devices, therefore we disable
critical thrashing limit by default allowing each device to set it
individually. This is done to prevent excessive kills of perceptible
apps.
Bug: 194199500
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Change-Id: Idd1715564c3727b09fcb0a109ab3d6bae9d0b99a
We see many cases when device keeps thrashing despite lmkd kills. This
happens because killed processes do not free enough filecache to fit
the current workingset completely.
To prevent such cases, introduce ro.lmk.filecache_min_kb property to
specify min filecache size in KB that should be reached after thrashing
is detected. Lmkd will keep killing background processes until this
filecache size limit is satisfied.
Bug: 193293513
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Change-Id: I49ca4cd2f33b27fdbc432d9ce6944b1a1794b749
Merged-In: I49ca4cd2f33b27fdbc432d9ce6944b1a1794b749
/sys/fs/bpf/map_gpu_mem_gpu_mem_total_map BPF map exposes total GPU
allocations size. Include this value into killinfo reports to track GPU
allocation size at the time of the kill.
Bug: 189366037
Test: lmkd_unit_test
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Change-Id: Icc1ed8ab2593530fa293ff9c82f6c8dc400485f5
Merged-In: Icc1ed8ab2593530fa293ff9c82f6c8dc400485f5
- Added new lmkd message for clients to subscribe LMK_ASYNC_EVENT_STAT
- Added support to write kill & mem stats information via data socket
to be read & parsed on the AMS Java side for future logging to statsd
Bug: 184698933
Test: lmkd_unit_test - test check_for_oom tests lmkd message send to AMS
Test: statsd_testdrive 51 54 to inspect statsd logged atoms data
Change-Id: Id682a438c87b3e4503261d26461f6cee641d86c4
Merged-In: Id682a438c87b3e4503261d26461f6cee641d86c4
- Added new lmkd message for clients to subscribe LMK_ASYNC_EVENT_STAT
- Added support to write kill & mem stats information via data socket
to be read & parsed on the AMS Java side for future logging to statsd
Bug: 184698933
Test: lmkd_unit_test - test check_for_oom tests lmkd message send to AMS
Test: statsd_testdrive 51 54 to inspect statsd logged atoms data
Change-Id: Id682a438c87b3e4503261d26461f6cee641d86c4
With kernel SPLIT_RSS_COUNTING feature it is possible for a valid
process to report RSS of 0 size when reading /proc/pid/statm. This
happens because split RSS accounting aggregates per-thread counters
asynchronously and depending on the timing of the read, reported
value can be inaccurate and occasionally be 0.
lmkd currently treats processes reporting RSS of 0 as dead and
removes them from the list of processes being tracked. This might
lead to a valid process becoming unkillable.
Change lmkd to stop treating RSS of 0 as a sign of a dead process.
Bug: 160199622
Test: set ro.lmk.kill_heaviest_task=true and hack kernel to report RSS=0
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Change-Id: Ia311d2f98649c92d1a487657f94ea51f57813b73
proc_get_name() can return NULL if the corresponding process has died
or open fails with ENOMEM due to memory shortages.
Ensure such cases are handled without NULL pointer access.
Bug: 186157675
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Change-Id: I05b288e3808bec0bdb73db32de02ba3a322ca6e1
proc_get_name() can return NULL if the corresponding process has died
or open fails with ENOMEM due to memory shortages.
Ensure such cases are handled without NULL pointer access.
Bug: 186157675
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Change-Id: I05b288e3808bec0bdb73db32de02ba3a322ca6e1