Add a trace for each kill that includes pid, kill reason, oom_adj_score,
min_oom_score and max_thrashing statistics at the time of the kill.
Bug: 195085238
Test: generate kills while tracing and observer the new tracepoints
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Change-Id: Ic2014adc08f5e5dd4aacd415970332618bd15250
Thrashing threshold tuning requires collecting thrashing level data from
the field and correlating these levels with other indications of device
being non-responsive.
Include current and max thrashing levels in the lmkd kill reports. Max
thrashing level captures the highest level seen since the last kill report.
Bug: 194433891
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Change-Id: I8a34dc41e7f03668bfad4ac2cbcb5d2570a10752
Merged-In: I8a34dc41e7f03668bfad4ac2cbcb5d2570a10752
Thrashing threshold tuning requires collecting thrashing level data from
the field and correlating these levels with other indications of device
being non-responsive.
Include current and max thrashing levels in the lmkd kill reports. Max
thrashing level captures the highest level seen since the last kill report.
Bug: 194433891
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Change-Id: I8a34dc41e7f03668bfad4ac2cbcb5d2570a10752
This reverts commit e1ffef4e36.
Reason for revert: Restore the default thrashing limits to prevent unresponsive devices.
Bug: 194199500
Change-Id: I15be5b3d67a71b68bca6dea9c2d5b4aa54d6c260
Critical thrashing limit determines the balance between how much
thrashing should be tolerated before killing a perceptible app.
This threshold might differ between devices, therefore we disable
critical thrashing limit by default allowing each device to set it
individually. This is done to prevent excessive kills of perceptible
apps.
Bug: 194199500
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Change-Id: Idd1715564c3727b09fcb0a109ab3d6bae9d0b99a
Critical thrashing limit determines the balance between how much
thrashing should be tolerated before killing a perceptible app.
This threshold might differ between devices, therefore we disable
critical thrashing limit by default allowing each device to set it
individually. This is done to prevent excessive kills of perceptible
apps.
Bug: 194199500
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Change-Id: Idd1715564c3727b09fcb0a109ab3d6bae9d0b99a
We see many cases when device keeps thrashing despite lmkd kills. This
happens because killed processes do not free enough filecache to fit
the current workingset completely.
To prevent such cases, introduce ro.lmk.filecache_min_kb property to
specify min filecache size in KB that should be reached after thrashing
is detected. Lmkd will keep killing background processes until this
filecache size limit is satisfied.
Bug: 193293513
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Change-Id: I49ca4cd2f33b27fdbc432d9ce6944b1a1794b749
Merged-In: I49ca4cd2f33b27fdbc432d9ce6944b1a1794b749
We see many cases when device keeps thrashing despite lmkd kills. This
happens because killed processes do not free enough filecache to fit
the current workingset completely.
To prevent such cases, introduce ro.lmk.filecache_min_kb property to
specify min filecache size in KB that should be reached after thrashing
is detected. Lmkd will keep killing background processes until this
filecache size limit is satisfied.
Bug: 193293513
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Change-Id: I49ca4cd2f33b27fdbc432d9ce6944b1a1794b749
/sys/fs/bpf/map_gpu_mem_gpu_mem_total_map BPF map exposes total GPU
allocations size. Include this value into killinfo reports to track GPU
allocation size at the time of the kill.
Bug: 189366037
Test: lmkd_unit_test
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Change-Id: Icc1ed8ab2593530fa293ff9c82f6c8dc400485f5
Merged-In: Icc1ed8ab2593530fa293ff9c82f6c8dc400485f5
/sys/fs/bpf/map_gpu_mem_gpu_mem_total_map BPF map exposes total GPU
allocations size. Include this value into killinfo reports to track GPU
allocation size at the time of the kill.
Bug: 189366037
Test: lmkd_unit_test
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Change-Id: Icc1ed8ab2593530fa293ff9c82f6c8dc400485f5
proc_get_name() can return NULL if the corresponding process has died
or open fails with ENOMEM due to memory shortages.
Ensure such cases are handled without NULL pointer access.
Bug: 186157675
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Change-Id: I05b288e3808bec0bdb73db32de02ba3a322ca6e1
(cherry picked from commit e5995b8269)
- Added new lmkd message for clients to subscribe LMK_ASYNC_EVENT_STAT
- Added support to write kill & mem stats information via data socket
to be read & parsed on the AMS Java side for future logging to statsd
Bug: 184698933
Test: lmkd_unit_test - test check_for_oom tests lmkd message send to AMS
Test: statsd_testdrive 51 54 to inspect statsd logged atoms data
Change-Id: Id682a438c87b3e4503261d26461f6cee641d86c4
Merged-In: Id682a438c87b3e4503261d26461f6cee641d86c4