Multigenerational LRU¶
Quick Start¶
Build Configurations¶
- Required
Set CONFIG_LRU_GEN=y.
- Optional
Set CONFIG_LRU_GEN_ENABLED=y to turn the feature on by default.
Runtime Configurations¶
- Required
Write 1 to /sys/kernel/mm/lru_gen/enable if the feature was not turned on by default.
- Optional
Write N to /sys/kernel/mm/lru_gen/min_ttl_ms to protect the working set of N milliseconds. The OOM killer is invoked if this working set cannot be kept in memory.
- Optional
Read /sys/kernel/debug/lru_gen to confirm the feature is turned on. This file has the following output:
  memcg  memcg_id  memcg_path
    node  node_id
      min_gen  birth_time  anon_size  file_size
      ...
      max_gen  birth_time  anon_size  file_size
min_gen is the oldest generation number and max_gen is the youngest generation number. birth_time is in milliseconds. anon_size and file_size are in pages.
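As a rough illustration of the output layout described above, the following user-space sketch parses that text into per-memcg, per-node generation records. The function name `parse_lru_gen` and the sample numbers are made up for this example; real output may contain additional fields or lines that this sketch simply skips.

```python
def parse_lru_gen(text):
    """Parse lru_gen debugfs text into {(memcg_id, node_id): [generation records]}."""
    result = {}
    memcg_id = None
    key = None
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "memcg":
            # "memcg memcg_id memcg_path"
            memcg_id = int(fields[1])
        elif fields[0] == "node":
            # "node node_id"
            key = (memcg_id, int(fields[1]))
            result[key] = []
        elif key is not None and len(fields) >= 4:
            # "gen birth_time anon_size file_size", oldest generation first
            try:
                gen, birth_ms, anon, file_ = (int(f) for f in fields[:4])
            except ValueError:
                continue  # not a generation line; ignored by this sketch
            result[key].append({
                "gen": gen,
                "birth_time_ms": birth_ms,
                "anon_pages": anon,
                "file_pages": file_,
            })
    return result


# Invented sample that only mimics the documented layout.
SAMPLE = """\
memcg 1 /
 node 0
 7 12000 100 2000
 8 8000 50 900
 9 3000 10 300
 10 500 0 20
"""

if __name__ == "__main__":
    gens = parse_lru_gen(SAMPLE)[(1, 0)]
    print("min_gen:", gens[0]["gen"], "max_gen:", gens[-1]["gen"])
```

On a real system the input would come from reading /sys/kernel/debug/lru_gen, which typically requires root and a mounted debugfs.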
Phones/Laptops/Workstations¶
No additional configurations required.
Servers/Data Centers¶
- To support more generations
Change CONFIG_NR_LRU_GENS to a larger number.
- To support more tiers
Change CONFIG_TIERS_PER_GEN to a larger number.
- To support full stats
Set CONFIG_LRU_GEN_STATS=y.
- Working set estimation
Write
  + memcg_id node_id max_gen [swappiness]
to /sys/kernel/debug/lru_gen to invoke the aging, which scans PTEs for accessed pages and then creates the next generation max_gen+1. A swap file and a non-zero swappiness, which overrides vm.swappiness, are required to scan PTEs mapping anon pages.
- Proactive reclaim
Write
  - memcg_id node_id min_gen [swappiness] [nr_to_reclaim]
to /sys/kernel/debug/lru_gen to invoke the eviction, which evicts generations less than or equal to min_gen. min_gen should be less than max_gen-1 as max_gen and max_gen-1 are not fully aged and therefore cannot be evicted. nr_to_reclaim can be used to limit the number of pages to evict. Multiple command lines are supported, as is concatenation with the delimiters "," and ";".
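The two command formats above can be assembled programmatically. The sketch below is a hypothetical user-space helper, not part of the kernel interface: the function names are invented, and the `max_gen` argument of `eviction_cmd` is used only to check the `min_gen < max_gen-1` rule before the string is built.

```python
LRU_GEN_DEBUGFS = "/sys/kernel/debug/lru_gen"


def aging_cmd(memcg_id, node_id, max_gen, swappiness=None):
    """Build '+ memcg_id node_id max_gen [swappiness]'."""
    parts = ["+", memcg_id, node_id, max_gen]
    if swappiness is not None:
        parts.append(swappiness)
    return " ".join(str(p) for p in parts)


def eviction_cmd(memcg_id, node_id, min_gen, max_gen,
                 swappiness=None, nr_to_reclaim=None):
    """Build '- memcg_id node_id min_gen [swappiness] [nr_to_reclaim]'."""
    if min_gen >= max_gen - 1:
        raise ValueError("min_gen should be less than max_gen-1")
    if nr_to_reclaim is not None and swappiness is None:
        # The optional fields are positional, so the earlier one must be present.
        raise ValueError("nr_to_reclaim requires swappiness to be given as well")
    parts = ["-", memcg_id, node_id, min_gen]
    if swappiness is not None:
        parts.append(swappiness)
    if nr_to_reclaim is not None:
        parts.append(nr_to_reclaim)
    return " ".join(str(p) for p in parts)


def submit(cmds, path=LRU_GEN_DEBUGFS):
    """Write several commands at once, concatenated with the ';' delimiter."""
    with open(path, "w") as f:
        f.write(";".join(cmds))


if __name__ == "__main__":
    print(aging_cmd(1, 0, 10, swappiness=100))
    print(eviction_cmd(1, 0, 7, max_gen=10, swappiness=100, nr_to_reclaim=4096))
```

Calling `submit()` on a real system needs root and a kernel built with this feature; the builders themselves can be exercised anywhere.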
Framework¶
For each lruvec, evictable pages are divided into multiple
generations. The youngest generation number is stored in
lrugen->max_seq for both anon and file types as they are aged on
an equal footing. The oldest generation numbers are stored in
lrugen->min_seq[] separately for anon and file types as clean
file pages can be evicted regardless of swap and writeback
constraints. These three variables are monotonically increasing.
Generation numbers are truncated into
order_base_2(CONFIG_NR_LRU_GENS+1) bits in order to fit into
page->flags. The sliding window technique is used to prevent
truncated generation numbers from overlapping. Each truncated
generation number is an index to an array of per-type and per-zone
lists lrugen->lists.
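The truncation arithmetic can be worked through with a small model. The sketch below assumes an illustrative CONFIG_NR_LRU_GENS of 4, uses the modulo indexing mentioned in the Aging and Eviction sections, and assumes the window between the oldest and youngest generation numbers never spans more than CONFIG_NR_LRU_GENS generations. The helper names other than `order_base_2` are hypothetical.

```python
NR_LRU_GENS = 4  # illustrative value for CONFIG_NR_LRU_GENS


def order_base_2(n):
    """ceil(log2(n)) for n >= 1, mirroring the kernel macro of the same name."""
    if n < 1:
        raise ValueError("this sketch only handles n >= 1")
    return (n - 1).bit_length()


def gen_bits(nr_gens=NR_LRU_GENS):
    """Bits needed in page->flags for a truncated generation number."""
    return order_base_2(nr_gens + 1)


def gen_index(seq, nr_gens=NR_LRU_GENS):
    """Truncate a monotonically increasing generation number to a list index."""
    return seq % nr_gens


def window_indices(min_seq, max_seq, nr_gens=NR_LRU_GENS):
    """List indices for every generation in the window [min_seq, max_seq]."""
    if max_seq - min_seq >= nr_gens:
        # Assumed sliding-window invariant: a wider window would make
        # two live generations share one index.
        raise ValueError("window wider than the number of generations")
    return [gen_index(seq, nr_gens) for seq in range(min_seq, max_seq + 1)]


if __name__ == "__main__":
    print(gen_bits())            # bits for 4 generations
    print(window_indices(5, 8))  # indices wrap around without colliding
```

As long as the window holds, every live generation maps to a distinct index even after the numbers wrap around.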
Each generation is then divided into multiple tiers. Tiers represent
levels of usage from file descriptors only. Pages accessed N times
via file descriptors belong to tier order_base_2(N). Each
generation contains at most CONFIG_TIERS_PER_GEN tiers, and they
require additional CONFIG_TIERS_PER_GEN-2 bits in page->flags.
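A short worked example of the tier mapping, assuming an illustrative CONFIG_TIERS_PER_GEN of 4. Clamping at the top tier is this sketch's assumption, derived from "at most CONFIG_TIERS_PER_GEN tiers"; the text does not spell out what happens beyond it. `tier_of` and `tier_bits` are hypothetical names.

```python
TIERS_PER_GEN = 4  # illustrative value for CONFIG_TIERS_PER_GEN


def order_base_2(n):
    """ceil(log2(n)) for n >= 1, mirroring the kernel macro of the same name."""
    if n < 1:
        raise ValueError("this sketch only handles n >= 1")
    return (n - 1).bit_length()


def tier_of(n_accesses, tiers_per_gen=TIERS_PER_GEN):
    """Tier of a page accessed n_accesses times via file descriptors."""
    tier = order_base_2(n_accesses)
    # Assumption: accesses beyond the top tier stay in the top tier.
    return min(tier, tiers_per_gen - 1)


def tier_bits(tiers_per_gen=TIERS_PER_GEN):
    """Additional page->flags bits stated in the text: CONFIG_TIERS_PER_GEN-2."""
    return tiers_per_gen - 2


if __name__ == "__main__":
    for n in (1, 2, 3, 4, 5, 8, 9):
        print(n, "accesses -> tier", tier_of(n))
```

Because the mapping is logarithmic, each higher tier covers twice as many access counts as the one below it.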
In contrast to moving across generations, which requires list
operations, moving across tiers only involves operations on
page->flags and therefore has a negligible cost. A feedback loop
modeled after the PID controller monitors refault rates of all tiers
and decides when to protect pages from which tiers.
The framework comprises two conceptually independent components: the aging and the eviction, which can be invoked separately from user space for the purpose of working set estimation and proactive reclaim.
Aging¶
The aging produces young generations. Given an lruvec, the aging
traverses lruvec_memcg()->mm_list and calls walk_page_range()
to scan PTEs for accessed pages (a mm_struct list is maintained
for each memcg). Upon finding one, the aging updates its
generation number to max_seq (modulo CONFIG_NR_LRU_GENS).
After each round of traversal, the aging increments max_seq. The
aging is due when min_seq[] reaches max_seq-1.
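The aging step can be pictured with a toy user-space model. This is only a sketch of the description above, not kernel code: `Page`, `age_once`, and the flat list standing in for the page table walk are invented for illustration, and clearing the accessed flag after observing it is an assumption.

```python
from dataclasses import dataclass

NR_LRU_GENS = 4  # illustrative value for CONFIG_NR_LRU_GENS


@dataclass
class Page:
    seq: int                # untruncated generation number of the page
    accessed: bool = False  # stands in for the accessed bit in a PTE


def age_once(pages, max_seq):
    """One round of aging: promote accessed pages, then open a new generation."""
    for page in pages:
        if page.accessed:
            page.seq = max_seq     # stored modulo NR_LRU_GENS in the real flags
            page.accessed = False  # assumption: the bit is cleared once seen
    return max_seq + 1             # incremented after each round of traversal


if __name__ == "__main__":
    pages = [Page(seq=5, accessed=True), Page(seq=5), Page(seq=6, accessed=True)]
    new_max_seq = age_once(pages, max_seq=8)
    print([(p.seq, p.seq % NR_LRU_GENS) for p in pages], new_max_seq)
```

Pages that were not accessed keep their old generation number and so drift toward the oldest generation as max_seq advances.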
Eviction¶
The eviction consumes old generations. Given an lruvec, the
eviction scans pages on the per-zone lists indexed by anon and file
min_seq[] (modulo CONFIG_NR_LRU_GENS). It first tries to
select a type based on the values of min_seq[]. If they are
equal, it selects the type that has a lower refault rate. The eviction
sorts a page according to its updated generation number if the aging
has found this page accessed. It also moves a page to the next
generation if this page is from an upper tier that has a higher
refault rate than the base tier. The eviction increments
min_seq[] of a selected type when it finds all the per-zone lists
indexed by min_seq[] of this selected type are empty.
To-do List¶
KVM Optimization¶
Support shadow page table walk.
NUMA Optimization¶
Optimize page table walk for NUMA.