`ki` appears to be potentially NULL. Output bogus values if it is.
Caught by the static analyzer:
> system/memory/lmkd/lmkd.cpp:2171:66: warning: Access to field
'kill_reason' results in a dereference of a null pointer (loaded from
variable 'ki') [clang-analyzer-core.NullDereference]
Bug: None
Test: TreeHugger
Change-Id: Iae26855528e1f7fec8f1455e06c7e813a732dc75
Add a trace for each kill that includes pid, kill reason, oom_adj_score,
min_oom_score and max_thrashing statistics at the time of the kill.
Bug: 195085238
Test: generate kills while tracing and observer the new tracepoints
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Change-Id: Ic2014adc08f5e5dd4aacd415970332618bd15250
Thrashing threshold tuning requires collecting thrashing level data from
the field and correlating these levels with other indications of device
being non-responsive.
Include current and max thrashing levels in the lmkd kill reports. Max
thrashing level captures the highest level seen since the last kill report.
Bug: 194433891
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Change-Id: I8a34dc41e7f03668bfad4ac2cbcb5d2570a10752
Merged-In: I8a34dc41e7f03668bfad4ac2cbcb5d2570a10752
Critical thrashing limit determines the balance between how much
thrashing should be tolerated before killing a perceptible app.
This threshold might differ between devices, therefore we disable
critical thrashing limit by default allowing each device to set it
individually. This is done to prevent excessive kills of perceptible
apps.
Bug: 194199500
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Change-Id: Idd1715564c3727b09fcb0a109ab3d6bae9d0b99a
We see many cases when device keeps thrashing despite lmkd kills. This
happens because killed processes do not free enough filecache to fit
the current workingset completely.
To prevent such cases, introduce ro.lmk.filecache_min_kb property to
specify min filecache size in KB that should be reached after thrashing
is detected. Lmkd will keep killing background processes until this
filecache size limit is satisfied.
Bug: 193293513
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Change-Id: I49ca4cd2f33b27fdbc432d9ce6944b1a1794b749
/sys/fs/bpf/map_gpu_mem_gpu_mem_total_map BPF map exposes total GPU
allocations size. Include this value into killinfo reports to track GPU
allocation size at the time of the kill.
Bug: 189366037
Test: lmkd_unit_test
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Change-Id: Icc1ed8ab2593530fa293ff9c82f6c8dc400485f5
proc_get_name() can return NULL if the corresponding process has died
or open fails with ENOMEM due to memory shortages.
Ensure such cases are handled without NULL pointer access.
Bug: 186157675
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Change-Id: I05b288e3808bec0bdb73db32de02ba3a322ca6e1
(cherry picked from commit e5995b8269)
- Added new lmkd message for clients to subscribe LMK_ASYNC_EVENT_STAT
- Added support to write kill & mem stats information via data socket
to be read & parsed on the AMS Java side for future logging to statsd
Bug: 184698933
Test: lmkd_unit_test - test check_for_oom tests lmkd message send to AMS
Test: statsd_testdrive 51 54 to inspect statsd logged atoms data
Change-Id: Id682a438c87b3e4503261d26461f6cee641d86c4
Merged-In: Id682a438c87b3e4503261d26461f6cee641d86c4
With kernel SPLIT_RSS_COUNTING feature it is possible for a valid
process to report RSS of 0 size when reading /proc/pid/statm. This
happens because split RSS accounting aggregates per-thread counters
asynchronously and depending on the timing of the read, reported
value can be inaccurate and occasionally be 0.
lmkd currently treats processes reporting RSS of 0 as dead and
removes them from the list of processes being tracked. This might
lead to a valid process becoming unkillable.
Change lmkd to stop treating RSS of 0 as a sign of a dead process.
Bug: 160199622
Test: set ro.lmk.kill_heaviest_task=true and hack kernel to report RSS=0
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Change-Id: Ia311d2f98649c92d1a487657f94ea51f57813b73
proc_get_name() can return NULL if the corresponding process has died
or open fails with ENOMEM due to memory shortages.
Ensure such cases are handled without NULL pointer access.
Bug: 186157675
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Change-Id: I05b288e3808bec0bdb73db32de02ba3a322ca6e1
Occasionally a system can get into heavy file cache thrashing situation
and become unresponsive. In these situations we observe lmkd wakeups,
however it does not kill because all non-perceptible apps are already
killed and the system manages to reclaim enough memory to stay above
min watermark.
Add ro.lmk.thrashing_limit_critical property which when breached will
allow lmkd to kill perceptible apps. The property represents the
percentage of refaulted workingset pages as a fraction of overall file
cache size. By default it is disabled.
Bug: 181778155
Test: thrashing.py 500 10 200
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Merged-In: Icb38ef6c90adaa4f5c956593b6ea0c4febc91dc0
Change-Id: Icb38ef6c90adaa4f5c956593b6ea0c4febc91dc0
When killing a task at or lower than oom_score_adj PERCEPTIBLE_APP_ADJ
choose the heaviest task among the ones at that level to try minimizing
the number of required kills. Because killing a perceptible app will
affect user experience anyway, it makes sense to choose the one that
will release the most memory and therefore no more kills might be
necessary.
Bug: 181778155
Test: running thrashing.py script
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Merged-In: I775ff774430b6fde4d619ede794825dbae59fd8e
Change-Id: I775ff774430b6fde4d619ede794825dbae59fd8e
Occasionally a system can get into heavy file cache thrashing situation
and become unresponsive. In these situations we observe lmkd wakeups,
however it does not kill because all non-perceptible apps are already
killed and the system manages to reclaim enough memory to stay above
min watermark.
Add ro.lmk.thrashing_limit_critical property which when breached will
allow lmkd to kill perceptible apps. The property represents the
percentage of refaulted workingset pages as a fraction of overall file
cache size. By default it is disabled.
Bug: 181778155
Test: thrashing.py 500 10 200
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Change-Id: Icb38ef6c90adaa4f5c956593b6ea0c4febc91dc0
When killing a task at or lower than oom_score_adj PERCEPTIBLE_APP_ADJ
choose the heaviest task among the ones at that level to try minimizing
the number of required kills. Because killing a perceptible app will
affect user experience anyway, it makes sense to choose the one that
will release the most memory and therefore no more kills might be
necessary.
Bug: 181778155
Test: running thrashing.py script
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Change-Id: I775ff774430b6fde4d619ede794825dbae59fd8e
Wrong condition causes reporting low watermark breach when min watermark
is breached and visa versa. Fix the condition to make reporting correct.
Bug: 181778155
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Change-Id: I684141c38f961fce99d17cfb3a83706fcd84ea10
Some tools might parse killinfo entries based on the field order. Move
the newly added swap field to the end to ensure compatibility.
Test: build
Change-Id: Id6dad850beba6835f061da95e84190d00a1b26a0