Commit Graph

409 Commits

Author SHA1 Message Date
Suren Baghdasaryan 12cacaedc0 lmkd: Change meminfo_log into killinfo_log and log additional fields
meminfo_log is used to log the state of memory at the time of a kill.
Instead of reporting kill information and meminfo separately, combine
them into one killinfo_log report. While normal logs can be trimmed by
chatty, meminfo_log uses a separate log context, which gives it a better
chance of survival. As a result, all the information relevant to a kill
ends up in one report that has a higher chance of surviving chatty.

Bug: 74119935
Test: lmkd_unit_test
Change-Id: I83a9c12d538e1fb107721b04fdafc3c6c0d83b60
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
2019-10-01 23:26:56 +00:00
Suren Baghdasaryan 8b016be930 lmkd: Isolate statslog related code from lmkd code
Move statsd related code out of lmkd.c to minimize ifdefs sprinkled around
the code and make it more maintainable.

Bug: 74119935
Test: lmkd_unit_test
Change-Id: Ib22f90fd380b9a31e09ab18ef16787bc07415ddf
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
2019-10-01 16:36:08 +00:00
Suren Baghdasaryan c2e05b6ffa lmkd: Fix kill failure handling
When lmkd fails to kill a process, it should log the error, remove the
process record, and exit immediately.

Bug: 74119935
Test: lmkd_unit_test
Change-Id: I26b0fd873eeed325f7dd56097ec31abc0d63f3a1
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
2019-10-01 16:35:49 +00:00
Suren Baghdasaryan 970a26aeb7 lmkd: Prevent killing foreground processes due to thrashing
Page cache thrashing affects device performance, and by killing a process
we try to stop it. However, if the thrashing application is the one the
user is interacting with, lmkd should not kill it, even though it might
affect device performance.

Bug: 141286980
Test: SequentialRWTest CTS test
Change-Id: If86c0e7e8ad9adf1816659562151ca083eaa65c4
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
2019-09-30 16:54:55 +00:00
Suren Baghdasaryan 89454e620c lmkd: Add optional kill reason description into kill reports
Allow a kill report to be appended with an explanation of why the kill
was performed. This helps identify kill reasons while troubleshooting
lmkd kills.

Change-Id: Ie5dd7a44e51d04c43c2492be8c1bc964d1b03555
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
2019-09-30 16:54:39 +00:00
Suren Baghdasaryan 2f88e15c3a lmkd: Enable new kill strategy, add and adjust required system properties
Enable new kill strategy when PSI mode is used in combination with
ro.lmk.use_minfree_levels=false. Adjust ro.lmk.swap_free_low_percentage,
introduce ro.lmk.psi_partial_stall_ms and ro.lmk.psi_complete_stall_ms
system properties to support two levels of PSI events measuring partial
and complete stalls. Add ro.lmk.use_new_strategy system property to switch
to the old strategy if necessary.

Bug: 132642304
Test: lmkd_unit_test, ACT memory pressure tests
Change-Id: I6f1b65e19dbe9b58c862e5e4255270c82f0afb9a
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
2019-09-30 16:54:27 +00:00
Suren Baghdasaryan 81c75b2a33 lmkd: Use aggregate zone watermarks as low memory threshold
Parsing /proc/zoneinfo is expensive and zone watermarks normally do not
change often. Instead of checking free memory for each zone, we aggregate
zone watermarks and compare them with MemFree from meminfo as an
approximation of memory being under a given watermark.
zoneinfo parsing is rate limited to once per minute to detect a possible
change of the memory margins from userspace.

Bug: 132642304
Test: lmkd_unit_test, ACT memory pressure tests
Change-Id: If4a8154c004e24324e6de44359de416766139df6
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
2019-09-30 16:54:16 +00:00
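
Illustration for 81c75b2a33: a minimal Python sketch of the idea described in the message above, not lmkd's C code. It sums per-zone watermarks, compares a MemFree value against the totals, and refreshes the totals at most once per interval. The function and class names, the nested-dict input shape, and the `load_nodes` callback are all hypothetical. The sketch assumes MemFree has already been converted to pages, and it ignores any per-zone reserves the real implementation may account for.

```python
import time


def aggregate_watermarks(nodes):
    """Sum min/low/high watermarks (in pages) over every zone of every node."""
    totals = {"min": 0, "low": 0, "high": 0}
    for zones in nodes.values():
        for zone in zones.values():
            for level in totals:
                totals[level] += zone.get(level, 0)
    return totals


def lowest_breached_watermark(mem_free_pages, totals):
    """Return the most severe watermark that MemFree is below, or None."""
    for level in ("min", "low", "high"):
        if mem_free_pages < totals[level]:
            return level
    return None


class WatermarkCache:
    """Recompute the totals at most once per interval (one minute per the commit)."""

    def __init__(self, load_nodes, interval_s=60.0, clock=time.monotonic):
        self._load_nodes = load_nodes  # hypothetical callback returning parsed zones
        self._interval_s = interval_s
        self._clock = clock            # injectable so the rate limit can be tested
        self._totals = None
        self._stamp = 0.0

    def get(self):
        now = self._clock()
        if self._totals is None or now - self._stamp >= self._interval_s:
            self._totals = aggregate_watermarks(self._load_nodes())
            self._stamp = now
        return self._totals
```

The trade-off sketched here is the one the message names: a cheap comparison against cached totals on every event, at the cost of reacting to changed watermarks up to a minute late.
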
Suren Baghdasaryan af2be4c55d lmkd: Introduce kill strategy based on zone watermarks, swap and thrashing
Add new kill strategy which makes kill decisions based on which zone
watermark is breached, how much free swap space is still available and
what percentage of the file-backed page cache has been refaulted. This mode
is designed to be used only with PSI signals. It kills unconditionally when
a critical pressure event is received, therefore PSI stall for that event
should be set to a value representing a truly non-responding system
(currently set to 700ms out of 1sec spent in complete stall). New event
handler also controls polling interval based on current memory conditions.

Bug: 132642304
Test: lmkd_unit_test, ACT memory pressure tests
Change-Id: Ia213ef2bb06b245d651ebf2d813e944b4ae7565f
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
2019-09-30 16:54:05 +00:00
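
Illustration for af2be4c55d: a hedged Python sketch of the inputs this message lists. Only two things are taken from the message: thrashing is the percentage of the file-backed page cache that has been refaulted, and a critical pressure event kills unconditionally. How the remaining inputs combine is an assumption made for illustration, as are all function names, parameters, and thresholds.

```python
def thrashing_percent(refaults_now, refaults_base, file_pages_base):
    """Percentage of the file-backed page cache refaulted since a baseline."""
    if file_pages_base <= 0:
        return 0
    return (refaults_now - refaults_base) * 100 // file_pages_base


def should_kill(critical_event, breached_watermark, swap_is_low,
                thrashing, thrashing_limit):
    """Decide whether to kill; only the first rule comes from the commit message."""
    # From the commit message: a critical pressure event always kills.
    if critical_event:
        return True
    # Assumption for illustration only: without a breached watermark, do nothing.
    if breached_watermark is None:
        return False
    # Assumption for illustration only: low swap or heavy thrashing justifies a kill.
    return swap_is_low or thrashing > thrashing_limit
```

For example, 500 refaults against a baseline of 10000 file-backed pages gives a thrashing value of 5 percent.
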
Suren Baghdasaryan e12a067ee0 lmkd: Support variable polling intervals set by event handlers
After a memory event happens, the event handler can assess the current
memory condition and decide if and when lmkd should re-check memory
metrics in order to respond to changing memory conditions. Change the
event handler interface to allow control over the polling period and the
ability to start or extend a polling session.

Bug: 132642304
Test: lmkd_unit_test
Change-Id: Ia74011e943140b6cffbf452ff8e1744b7336eacf
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
2019-09-30 16:53:46 +00:00
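
Illustration for e12a067ee0: a small Python sketch of a handler-driven polling session, assuming the handler reports back whether polling should be left alone or started/extended, and at what interval. The enum, the dataclass, the function name, and the 1000 ms default session length are hypothetical and are not taken from lmkd's source.

```python
from dataclasses import dataclass
from enum import Enum


class PollingUpdate(Enum):
    DO_NOT_CHANGE = "do_not_change"
    START_OR_EXTEND = "start_or_extend"


@dataclass
class PollingState:
    interval_ms: int = 0      # how often to re-check memory metrics
    session_end_ms: int = 0   # polling stops once the clock reaches this value

    def active(self, now_ms):
        return now_ms < self.session_end_ms


def apply_handler_decision(state, update, interval_ms, now_ms, session_ms=1000):
    """Apply what the event handler asked for after it assessed memory."""
    if update is PollingUpdate.START_OR_EXTEND:
        state.interval_ms = interval_ms
        state.session_end_ms = now_ms + session_ms
    return state
```

Keeping the decision in the handler's return value lets the main loop stay generic: it only needs to look at `PollingState` to know whether and when to wake up next.
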
Suren Baghdasaryan 92d0eec2d2 lmkd: Change zoneinfo parsing to retrieve zone watermarks
/proc/zoneinfo contains per-node data, and each node contains a section
for each of its zones. The current parser does not recognize this
hierarchy, so useful per-zone information like zone watermarks cannot be
retrieved. Change the parser to parse zoneinfo into a hierarchical
structure. The new parser handles up to 2 nodes and can be easily extended
to handle more if needed by changing MAX_NR_NODES.

Bug: 132642304
Test: lmkd_unit_test
Change-Id: I9306289ea6d30d78a261c5d5c29f4f6ea167807d
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
2019-09-30 16:53:34 +00:00
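
Illustration for 92d0eec2d2: a minimal Python sketch of parsing zoneinfo text into a node -> zone -> fields hierarchy. It is not lmkd's parser. `MAX_NR_NODES` and its value of 2 come from the message above; the function name, the returned dict shape, and the choice to keep only `pages free`, `min`, `low`, and `high` are assumptions.

```python
import re

MAX_NR_NODES = 2  # limit named in the commit message

ZONE_HEADER = re.compile(r"^Node\s+(\d+),\s+zone\s+(\S+)")
WATERMARKS = ("min", "low", "high")


def parse_zoneinfo(text):
    """Return {node_id: {zone_name: {"free", "min", "low", "high"}}}."""
    nodes = {}
    zone = None
    for line in text.splitlines():
        header = ZONE_HEADER.match(line)
        if header:
            node_id, zone_name = int(header.group(1)), header.group(2)
            if node_id not in nodes and len(nodes) >= MAX_NR_NODES:
                raise ValueError("more nodes than MAX_NR_NODES")
            zone = nodes.setdefault(node_id, {}).setdefault(zone_name, {})
            continue
        if zone is None:
            continue  # lines before the first zone header
        fields = line.split()
        if len(fields) == 3 and fields[:2] == ["pages", "free"]:
            zone["free"] = int(fields[2])
        elif len(fields) == 2 and fields[0] in WATERMARKS and fields[1].isdigit():
            # Per-CPU pageset lines use "high:" with a colon, so they do not match.
            zone[fields[0]] = int(fields[1])
    return nodes
```
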
Suren Baghdasaryan 03cb836735 lmkd: Change procfs read routine to handle files larger than 1 page in size
Files like /proc/zoneinfo or /proc/<pid>/status can be larger than the 4KB
page size. Change the reread_file routine to resize the read buffer
whenever it is not big enough to read the entire file. Start with a 1-page
buffer and double its size until it is big enough to read the entire file.
Read /proc/zoneinfo during initialization to grow the buffer to a big
enough size and avoid re-allocations when under memory pressure.

Bug: 137010962
Test: lmkd_unit_test
Change-Id: If9a5b0d27c2f4de9063f0fd0f36f908ece87dcce
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
2019-09-30 16:53:18 +00:00
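
Illustration for 03cb836735: a Python sketch of the buffer-doubling read described above, assuming a 4096-byte page. It is not the reread_file routine itself; the `RereadBuffer` class and `read_all` method are hypothetical names. A read that fills the buffer exactly is treated as "possibly truncated", so the buffer is doubled and the file is read again from the start.

```python
import os

PAGE_SIZE = 4096  # assumed; the commit message refers to a 4KB page


class RereadBuffer:
    """Keeps one growable buffer so later reads need no reallocation."""

    def __init__(self, size=PAGE_SIZE):
        self.buf = bytearray(size)

    def read_all(self, path):
        """Read the whole file, doubling the buffer until the content fits."""
        fd = os.open(path, os.O_RDONLY)
        try:
            while True:
                os.lseek(fd, 0, os.SEEK_SET)
                total = 0
                while total < len(self.buf):
                    chunk = os.read(fd, len(self.buf) - total)
                    if not chunk:
                        break  # end of file
                    self.buf[total:total + len(chunk)] = chunk
                    total += len(chunk)
                if total < len(self.buf):
                    return bytes(self.buf[:total])
                # Buffer is full, so the file may be larger: double and retry.
                self.buf = bytearray(len(self.buf) * 2)
        finally:
            os.close(fd)
```

Calling `read_all` once at startup on the largest expected file leaves the buffer at its final size, which mirrors the message's point about avoiding re-allocations under memory pressure.
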
Suren Baghdasaryan 6dce58d561 lmkd: Fix killed process name reporting
Fix termination of killed process name in proc_get_name function. While at
it also fix the coding style in the function.

Test: lmkd_unit_test
Bug: 141780598
Change-Id: I3f99b3e37b9a9d0750ece94f08f0b50ac839dacb
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
2019-09-30 16:02:35 +00:00
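
Illustration for 6dce58d561: the message does not say what was wrong with the termination, so this is only a generic Python sketch of the underlying concern. Bytes read from a file such as /proc/<pid>/cmdline are not guaranteed to end with a terminator, so the name is cut explicitly at the first NUL or at a length cap. The function name and the 256-byte cap are hypothetical.

```python
def name_from_cmdline(raw, max_len=256):
    """Return the first NUL-separated field of raw cmdline bytes as text."""
    raw = raw[:max_len]          # never trust the input to be short
    end = raw.find(b"\0")
    if end != -1:
        raw = raw[:end]          # cut at the first terminator, if any
    return raw.decode("utf-8", errors="replace")
```
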
Jim Blackler 90853b6203 lmkd: Maintain pid to taskname mapping to amend kill reports.
Required because the kernel cannot always get the taskname safely at
the time the process is killed (due to competition for mm->mmap_sem).

Test: manually
Bug: 130017100
Signed-off-by: Jim Blackler <jimblackler@google.com>
Change-Id: I27a2c3340da321570f0832d58fe9e79ca031620b
2019-09-26 16:27:03 -07:00
Suren Baghdasaryan fe8419215a Merge "lmkd: Prevent non-main threads being registered or killed by lmkd"
am: e0b729d214

Change-Id: I11738b8c2c7acfa4947976e8b12ac9b94e7fc8ac
2019-07-12 13:39:47 -07:00
Suren Baghdasaryan 3ee11d4392 lmkd: Prevent non-main threads being registered or killed by lmkd
Only thread group leaders should be registered with lmkd. Add a check to
ignore any non-leader TIDs and generate an error if such a condition is
detected. Run the same check before killing a process to detect cases of
non-leader TIDs being used to kill a process. This might happen if PIDs
overflow and a previously registered PID gets reused for a non-leader
thread in the following scenario:

1. pid X is a thread group leader and is registered with lmkd
2. pid X dies without lmkd knowing it and pid gets recycled
3. process Y creates a thread with tid X
4. lmkd kills pid X which results in process Y being killed

Bug: 136408020
Test: lmkd_unit_test
Change-Id: I46c5a0b273f2b72cefc20ec59b80b4393f2a1a37
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
2019-07-12 15:11:54 +00:00
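
Illustration for 3ee11d4392: a Python sketch of one way to tell a thread group leader from a non-leader thread on Linux, by comparing the `Tgid:` field of /proc/<tid>/status with the id itself. Whether lmkd performs the check this way is an assumption; the function names and the injectable `read` parameter are hypothetical.

```python
def parse_tgid(status_text):
    """Extract the thread group id from /proc/<tid>/status content."""
    for line in status_text.splitlines():
        if line.startswith("Tgid:"):
            return int(line.split()[1])
    return None


def read_status(tid):
    with open(f"/proc/{tid}/status") as f:
        return f.read()


def is_thread_group_leader(tid, read=read_status):
    """True only when the id names a thread group leader (Tgid equals the id)."""
    try:
        tgid = parse_tgid(read(tid))
    except OSError:
        return False  # the task is gone or unreadable: do not register or kill
    return tgid == tid
```

In the scenario listed above, step 4 would be caught by this check: the recycled id X would report process Y's Tgid, which differs from X.
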
Tim Murray b99e0df08b Merge "lmkd: use ALOGE for logging kills" into qt-dev
am: cc18faf4d6

Change-Id: I51e496d8e5ffa102585dbfaabf8878e9d6bd6c71
2019-05-30 15:31:52 -07:00
Tim Murray 406a184c22 lmkd: use ALOGE for logging kills
Test: boots, works
Bug: 133761317
Signed-off-by: Tim Murray <timmurray@google.com>
Exempt-From-Owner-Approval: trivial change

Change-Id: I1a4a3741694078eec124f1f560ea68e78754bca6
2019-05-30 22:10:36 +00:00
Xin Li d595cdcb43 [automerger skipped] DO NOT MERGE - Skip pi-platform-release (PPRL.190505.001) in stage-aosp-master
am: 26ba26bf71 -s ours
am skip reason: subject contains skip directive

Change-Id: Ia7ec0913ab3369794a415cb2538afbef90f91b30
2019-05-15 17:44:48 -07:00
Xin Li 0946569e6e DO NOT MERGE - Skip pi-platform-release (PPRL.190505.001) in stage-aosp-master
Bug: 132622481
Change-Id: I14ace58ee5b4efd490f9213f7a2087d4d56334db
2019-05-14 12:10:14 -07:00
android-build-team Robot 07b0174090 Snap for 5450365 from e6ef013d2c8c1201540128d9961b73450257ab90 to pi-platform-release
Change-Id: I51d6a1b331634e927247a13a96bfdd10d4e00f4f
2019-05-07 22:04:02 +00:00
Jim Blackler b3cd6342f3 Merge "Allow memory metrics on devices that use kernel LMK" am: 446014ecf9
am: 1376ac38b2

Change-Id: I9f2939684710c6d4efb77175ea17d62a89866563
2019-04-30 03:28:14 -07:00
Jim Blackler ad823c6ac6 Merge "Allow memory metrics on devices that use kernel LMK"
am: 446014ecf9

Change-Id: I9b24267ab63bd4fb58fb36de50561a9c2bf47939
2019-04-30 03:24:06 -07:00
Jim Blackler 700b7191e1 Allow memory metrics on devices that use kernel LMK
Bug: 130017100
Test: Tested manually
Change-Id: I37f6edb71decc1260bd521595842508926fa86aa
2019-04-29 11:02:51 +00:00
Suren Baghdasaryan 2910b0fea1 Merge "lmkd: set PSI_POLL_PERIOD to 10ms" am: e346d03c2b am: bbe0e86c73
am: fc3ccad7de

Change-Id: I36b9a34f8996e9db69cb5697bf752f9b330c9554
2019-03-26 22:23:58 -07:00
Suren Baghdasaryan 34f40e00e7 Merge "lmkd: set PSI_POLL_PERIOD to 10ms" am: e346d03c2b
am: bbe0e86c73

Change-Id: I5cc97b7f87cfb356127ab379695d77acd408c828
2019-03-26 22:19:29 -07:00
Suren Baghdasaryan 8f33f1f8a6 Merge "lmkd: set PSI_POLL_PERIOD to 10ms"
am: e346d03c2b

Change-Id: Ia148c74acf22e2f863eb4f5f411f3e5f0f4e5f55
2019-03-26 22:14:26 -07:00
Suren Baghdasaryan 881a544aec lmkd: set PSI_POLL_PERIOD to 10ms
Occasionally we see cases when 40ms polling is still too conservative.
Change to 10ms polling period. Since the polling happens only after PSI
signal and continues for 1sec this should not affect system performance.

Test: lmkd_unit_test
Bug: 129358844

Change-Id: Ib759b865b2104be23741fc0eacaa541e22d50dde
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
2019-03-27 02:10:10 +00:00
Suren Baghdasaryan d02e12f514 Merge "lmkd: Fix meminfo logs missing SwapTotal and having wrong field order" am: eedd5f6855 am: 318e33a401
am: 1b7fb341f1

Change-Id: I1187c9c2ab83a086807ca76bd2ca28da5cd0b808
2019-03-25 15:37:05 -07:00
Suren Baghdasaryan e616b91acd Merge "lmkd: Fix meminfo logs missing SwapTotal and having wrong field order" am: eedd5f6855
am: 318e33a401

Change-Id: I229f6f19e3f7a29793484dc53d2ae5c6af5de8f2
2019-03-25 15:33:07 -07:00
Suren Baghdasaryan 81211b235d Merge "lmkd: Fix meminfo logs missing SwapTotal and having wrong field order"
am: eedd5f6855

Change-Id: I89f2abf04b6b50a69528ccd383b38bb83a8b164c
2019-03-25 15:28:17 -07:00
Suren Baghdasaryan 8a3e0c15c1 lmkd: Fix meminfo logs missing SwapTotal and having wrong field order
Previous change If154dc364711bf7c86f32e24ddcd10be359386de, titled
"lmkd: Do not downgrade/ignore events when swap is full", added SwapTotal
to the meminfo structure without adding the field to the events.logtag
file. As a result, logs are missing the field, and all fields starting
with "SwapFree" are reordered. Fix this by adding the missing field
to events.logtag.

Bug: 129274901
Test: Confirm correct information in the logcat
Change-Id: Ia4de3790a7e9d49a0e4cba8b3161a715eaf6532e
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
2019-03-25 11:04:11 -07:00
Tim Murray 6e8bb64ce9 Merge "lmkd: set PSI_POLL_PERIOD to 40ms" am: 92dcfb3187 am: 90cfb547d2
am: 8f66fc9000

Change-Id: I66d8ed75fb820edcd369e17eee4c83056edbea64
2019-03-13 16:43:58 -07:00
Tim Murray 6aa9241282 Merge "lmkd: set PSI_POLL_PERIOD to 40ms" am: 92dcfb3187
am: 90cfb547d2

Change-Id: I3bcad580bb095c186d990de98723ec78463ba2d4
2019-03-13 16:39:49 -07:00
Tim Murray f71c9beded Merge "lmkd: set PSI_POLL_PERIOD to 40ms"
am: 92dcfb3187

Change-Id: Ib8d6a2dc535e9cccad95c790da9454d5550b4cd2
2019-03-13 16:35:39 -07:00
Tim Murray 55e117bd2e Merge "lmkd: set PSI_POLL_PERIOD to 40ms"
2019-03-13 23:27:13 +00:00
Tim Murray 0fca3629ca lmkd: set PSI_POLL_PERIOD to 40ms
200ms was too lenient when under severe memory pressure.

Test: boots, works
Bug: 127765309

Change-Id: I8e047de6318574a107720c56473ed0f25582e182
Signed-off-by: Tim Murray <timmurray@google.com>
2019-03-13 10:13:19 -07:00
The Android Open Source Project 6faf98ff88 [automerger skipped] DO NOT MERGE - Merge PPRL.190305.001 into master am: 4d916a1ece -s ours am: 8ba23f2a77 -s ours
am: e4660ddc99 -s ours
am skip reason: subject contains skip directive

Change-Id: I5d90725e75f140c111188e13dba76dae683f3ba7
2019-03-12 23:02:09 -07:00
The Android Open Source Project 4ca7e05e68 [automerger skipped] DO NOT MERGE - Merge PPRL.190305.001 into master am: 4d916a1ece -s ours
am: 8ba23f2a77 -s ours
am skip reason: subject contains skip directive

Change-Id: I448cd15c59dac97660765fc2a02e5ff611a3ed70
2019-03-12 22:34:21 -07:00
The Android Open Source Project 62324f3b00 [automerger skipped] DO NOT MERGE - Merge PPRL.190305.001 into master
am: 4d916a1ece -s ours
am skip reason: subject contains skip directive

Change-Id: I31a0756b75ab6d657dc26807ce8baeb42c40d232
2019-03-12 21:47:32 -07:00
The Android Open Source Project 8199c0f1ca DO NOT MERGE - Merge PPRL.190305.001 into master
Bug: 127812889
Change-Id: I16a546dc24d3cf980ad7ab09895c0d97ee436224
2019-03-11 11:57:28 -07:00
Suren Baghdasaryan 0a966657f9 Merge "Add min_score_adj into LmkKillOccurred event" am: a953ae0546 am: efcab54b55
am: a4bd8777d3

Change-Id: I1c55bd0e3ab771e20cd77250ade6ee20f9e50cce
2019-03-05 16:23:28 -08:00
Jim Blackler b8d21d5d2b [automerger skipped] Add start time to LmkKillOccurred
am: e7a9fabd64 -s ours
am skip reason: change_id I4ef6433391c8758626334731d2b5de038e4468ae with SHA1 1417cdbddb is in history

Change-Id: I0b6eb14568d480b13fd0cea14863a9ad4c14c0cd
2019-03-05 16:15:29 -08:00
Suren Baghdasaryan da0073abd4 Merge "Add min_score_adj into LmkKillOccurred event" am: a953ae0546
am: efcab54b55

Change-Id: I0c1f8f60ef70181e4d3e1399eae45723040174f5
2019-03-05 16:14:07 -08:00
Rajeev Kumar 4255742795 [automerger skipped] Read memory stats from /proc/pid/stat file.
am: e7cfa67a05 -s ours
am skip reason: change_id Ie555933aafa6a6b7aa1dbf5518ebe804376e0afd with SHA1 fe31bef940 is in history

Change-Id: I3c357f360da2969e1850f32419e4b074bbe17e21
2019-03-05 16:10:57 -08:00
Jim Blackler dcdb43747e [automerger skipped] Add start time to LmkKillOccurred am: 962e0442d1 -s ours
am: b68fe506e0 -s ours
am skip reason: change_id I4ef6433391c8758626334731d2b5de038e4468ae with SHA1 34c3cb84a0 is in history

Change-Id: I2521241013c39d3cc16f45b1df996135f92a053f
2019-03-05 15:48:51 -08:00
Rajeev Kumar b14f9f3b46 [automerger skipped] Read memory stats from /proc/pid/stat file. am: 2bc24f88ca -s ours
am: 9eee2302ee -s ours
am skip reason: change_id Ie555933aafa6a6b7aa1dbf5518ebe804376e0afd with SHA1 4dbc24d393 is in history

Change-Id: I42d2080003936bcf99fec6917fe5e52366855c07
2019-03-05 15:48:03 -08:00
Jim Blackler 306de0dba5 Add start time to LmkKillOccurred
This is to measure an application's behavior with respect to being LMKed
(the longer an app lives before being LMKed, the better).

Bug: 119854389
Test: Manual
Change-Id: I4ef6433391c8758626334731d2b5de038e4468ae
Merged-In: I4ef6433391c8758626334731d2b5de038e4468ae
(cherry picked from I4ef6433391c8758626334731d2b5de038e4468ae)
2019-03-05 15:47:56 -08:00
Rajeev Kumar 160e50c6f6 Read memory stats from /proc/pid/stat file.
(cherry pick from commit 0301683e49ab255769b15469487feaab3466167a)
Bug: 117333340
Test: Manual testing using alloc-stress tool
Merged-In: Ie555933aafa6a6b7aa1dbf5518ebe804376e0afd

Change-Id: I8ab08606dba7de2f65711204453067dbfbdcbdd8
2019-03-05 15:46:07 -08:00
Jim Blackler d3b9beb3cd [automerger skipped] Add start time to LmkKillOccurred
am: 962e0442d1 -s ours
am skip reason: change_id I4ef6433391c8758626334731d2b5de038e4468ae with SHA1 1417cdbddb is in history

Change-Id: I56f76418a5c6a3435dec766d731068f60bd4b642
2019-03-05 15:27:13 -08:00
Rajeev Kumar 096ef04338 [automerger skipped] Read memory stats from /proc/pid/stat file.
am: 2bc24f88ca -s ours
am skip reason: change_id Ie555933aafa6a6b7aa1dbf5518ebe804376e0afd with SHA1 4dbc24d393 is in history

Change-Id: I5676596b2ee9f7448faa0b0274ac9425c7525fb0
2019-03-05 15:26:28 -08:00