What Does a Scheduler Actually Do?
Imagine a traffic controller at a busy airport. Dozens of planes are ready to land, but only a handful of runways are available. The controller decides which plane lands next, for how long it uses the runway, and when to wave it off so someone else gets a turn. That is what a scheduler does — except the planes are threads and the runways are CPUs.
Source File Map
| File | Purpose |
|---|---|
| kern/sched_4bsd.c | The traditional 4BSD scheduler implementation |
| kern/sched_ule.c | The ULE scheduler (default since FreeBSD 7.1) |
| kern/sched_shim.c | Pluggable scheduler framework (FreeBSD 16+) |
| sys/sched.h | Scheduler API: public function declarations |
| kern/kern_switch.c | Run queue operations and mi_switch() |
| sys/runq.h | Run queue data structure definitions |
| kern/kern_synch.c | Sleep/wakeup, thread blocking and unblocking |
| amd64/amd64/cpu_switch.S | AMD64 low-level context switch assembly |
| arm64/arm64/swtch.S | ARM64 low-level context switch assembly |
The Scheduler API
The sched.h header defines the contract every scheduler must fulfill:
| Function | What It Does |
|---|---|
| sched_add() | Place a thread on a run queue: "this thread is ready to run" |
| sched_switch() | Pick the next thread and switch to it |
| sched_choose() | Select the highest-priority runnable thread |
| sched_clock() | Periodic tick: update usage stats, check time slices |
| sched_prio() | Set a thread's priority |
| sched_wakeup() | Wake a sleeping thread and schedule it |
| sched_fork() | Initialize scheduling state for a new child thread |
| sched_exit() | Clean up scheduling state when a thread dies |
| sched_affinity() | Handle CPU affinity changes |
The Priority Space
FreeBSD uses numeric priorities where lower = more important. The 256-value priority space is divided into classes:
| Class | FreeBSD 10 | FreeBSD 15 | FreeBSD 16 |
|---|---|---|---|
| PRI_ITHD (interrupts) | 0–47 | 0–15 | 0–7 |
| PRI_REALTIME | 48–79 | 16–47 | 8–39 |
| PRI_KERN (kernel) | 80–119 | 48–87 | 40–55 |
| PRI_TIMESHARE (user) | 120–223 | 88–223 | 56–223 |
| PRI_IDLE | 224–255 | 224–255 | 224–255 |
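The interrupt range has narrowed at each step (48 levels, then 16, then 8), and FreeBSD 16 also shrinks the kernel range from 40 levels to 16, while the realtime range keeps its 32 levels throughout. The freed space goes to timeshare threads, which grow from 104 levels to 168. This reflects modern workloads where interactive responsiveness matters more than having many distinct interrupt priority levels.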
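To make the table concrete, here is a minimal sketch that maps a numeric priority to its class using the FreeBSD 16 column. The enum and function names are made up for illustration and are not kernel identifiers; only the range boundaries come from the table.

```c
#include <assert.h>

/* Illustrative names only; the boundaries are the FreeBSD 16 column above. */
enum ex_pri_class {
	EX_CLS_ITHD,
	EX_CLS_REALTIME,
	EX_CLS_KERN,
	EX_CLS_TIMESHARE,
	EX_CLS_IDLE
};

/* Map a priority (0 = most important, 255 = least) to its class. */
static enum ex_pri_class
ex_pri_class_fb16(int pri)
{
	assert(pri >= 0 && pri <= 255);
	if (pri <= 7)
		return (EX_CLS_ITHD);
	if (pri <= 39)
		return (EX_CLS_REALTIME);
	if (pri <= 55)
		return (EX_CLS_KERN);
	if (pri <= 223)
		return (EX_CLS_TIMESHARE);
	return (EX_CLS_IDLE);
}
```

With these boundaries, priority 56 is the most important timeshare priority and 224 is the first idle priority.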
SRQ Flags
When a thread is added to a run queue, flags describe why it is being added:
#define SRQ_BORING 0x0000 /* No special circumstances */
#define SRQ_YIELDING 0x0001 /* Thread is yielding voluntarily */
#define SRQ_OURSELF 0x0002 /* Adding ourselves to the run queue */
#define SRQ_INTR 0x0004 /* Wakeup is interrupt-driven (urgent) */
#define SRQ_PREEMPTED 0x0008 /* Thread was preempted */
#define SRQ_BORROWING 0x0010 /* Priority updated due to lending */
#define SRQ_HOLD 0x0020 /* Return holding original td lock (14+) */
#define SRQ_HOLDTD 0x0040 /* Return holding td lock (14+) */
SRQ_BORING: The default — just a regular thread being placed on the queue, nothing unusual.
SRQ_YIELDING: The thread said "I'm done for now, let someone else go." The scheduler places it at the back of its priority queue.
SRQ_INTR: This thread was woken by a hardware interrupt (like a network packet arriving). It may need to run urgently.
SRQ_PREEMPTED: A higher-priority thread showed up, so this one was forcibly pulled off the CPU. Gets special treatment when re-queued.
SRQ_HOLD / SRQ_HOLDTD (FreeBSD 14+): Lock-management flags that control which locks are held when the function returns — important for avoiding lock-order violations.
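Because the SRQ values are bit flags, callers combine them with bitwise OR and the scheduler tests them with bitwise AND. The sketch below shows that pattern with two hypothetical helpers (srq_is_urgent and srq_is_boring are not kernel functions). Note one subtlety: SRQ_BORING is zero, so it can only be detected by comparing the whole value, never by masking.

```c
#include <assert.h>

/* Flag values as listed above. */
#define SRQ_BORING    0x0000
#define SRQ_YIELDING  0x0001
#define SRQ_OURSELF   0x0002
#define SRQ_INTR      0x0004
#define SRQ_PREEMPTED 0x0008
#define SRQ_BORROWING 0x0010
#define SRQ_HOLD      0x0020
#define SRQ_HOLDTD    0x0040

/* Hypothetical helper: was this wakeup driven by an interrupt? */
static int
srq_is_urgent(int flags)
{
	return ((flags & SRQ_INTR) != 0);
}

/*
 * Hypothetical helper: SRQ_BORING is 0, so (flags & SRQ_BORING) is always
 * false. Compare the whole value instead.
 */
static int
srq_is_boring(int flags)
{
	return (flags == SRQ_BORING);
}
```

A caller that wakes a thread from an interrupt handler and wants the thread lock held on return could pass SRQ_INTR | SRQ_HOLDTD; srq_is_urgent() still reports true for that combination.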
A web server thread is sleeping, waiting for a network packet. The packet arrives and triggers a hardware interrupt. Which SRQ flag will be used when adding this thread back to the run queue?
Meet the 4BSD Scheduler
The 4BSD scheduler is the old guard — the scheduling algorithm that dates back to original BSD Unix. Its approach is elegant in its simplicity: track how much CPU time each thread has used recently, and gradually lower the priority of CPU-hungry threads so interactive programs stay responsive.
The Priority Decay Formula
As a thread accumulates clock ticks, the 4BSD scheduler asks "How much CPU has this thread used?" and recomputes its priority from these inputs:
- PUSER — base priority for timeshare threads
- ts_estcpu — estimated recent CPU usage (higher = used more CPU)
- INVERSE_ESTCPU_WEIGHT — controls how strongly CPU usage affects priority (typically 8)
- p_nice — user-settable "niceness" value (−20 to +20)
The more CPU you use, the higher your priority number becomes — which means lower actual priority. CPU hogs naturally sink to the back of the line.
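As a rough sketch, the inputs above combine in the same shape as resetpriority() in sched_4bsd.c: base priority, plus scaled CPU usage, plus a nice term, clamped to the timeshare range. The constants below are assumptions for illustration (the timeshare bounds are taken from the FreeBSD 10 column of the priority table, and NICE_WEIGHT and PRIO_MIN are names not introduced in the text); real values depend on the FreeBSD version and CPU count.

```c
#include <assert.h>

/* Illustrative constants; see the caveats in the text above. */
#define EX_PUSER                 120	/* assumed timeshare base priority */
#define EX_INVERSE_ESTCPU_WEIGHT 8
#define EX_NICE_WEIGHT           1	/* assumed weight of one nice step */
#define EX_PRIO_MIN              (-20)	/* lowest nice value */
#define EX_PRI_MIN_TIMESHARE     120
#define EX_PRI_MAX_TIMESHARE     223

/* Compute a timeshare priority from CPU usage and niceness. */
static int
ex_user_priority(unsigned estcpu, int nice)
{
	int pri;

	assert(nice >= -20 && nice <= 20);
	pri = EX_PUSER + (int)(estcpu / EX_INVERSE_ESTCPU_WEIGHT) +
	    EX_NICE_WEIGHT * (nice - EX_PRIO_MIN);
	/* Keep the result inside the timeshare class. */
	if (pri < EX_PRI_MIN_TIMESHARE)
		pri = EX_PRI_MIN_TIMESHARE;
	if (pri > EX_PRI_MAX_TIMESHARE)
		pri = EX_PRI_MAX_TIMESHARE;
	return (pri);
}
```

Under these assumptions a fresh thread at nice 0 gets priority 140, and the same thread after accumulating an estcpu of 80 gets 150: ten steps less important.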
CPU Usage Decay (schedcpu)
Every second, the schedcpu() function applies exponential decay to each thread's CPU usage estimate, multiplying ts_estcpu by (2 × load) / (2 × load + 1), where load is the system load average.
This means old CPU usage is gradually "forgotten." Under high load, decay happens more slowly — the system remembers CPU-hungry threads longer when resources are scarce.
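The classic BSD decay filter scales the estimate by (2 × load) / (2 × load + 1) once per second. The sketch below uses plain integers and a whole-number load for readability; the kernel works in fixed-point arithmetic, and the function name here is hypothetical.

```c
#include <assert.h>

/*
 * One decay pass over a CPU usage estimate.
 * load is the load average rounded to a whole number (a simplification).
 */
static unsigned
ex_decay_estcpu(unsigned estcpu, unsigned load)
{
	unsigned loadfac = 2 * load;

	return ((loadfac * estcpu) / (loadfac + 1));
}
```

Starting from an estimate of 90, one pass at load 1 leaves 60 (a factor of 2/3), while one pass at load 4 leaves 80 (a factor of 8/9). That is the "slower decay under high load" behavior described above.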
The Clock Tick Handler
/* Called on every scheduler clock tick */
ts->ts_cpticks++;
ts->ts_estcpu = ESTCPULIM(ts->ts_estcpu + 1);
if ((ts->ts_estcpu % INVERSE_ESTCPU_WEIGHT) == 0)
resetpriority(td);
Increment the tick counter (ts_cpticks) — this tracks raw clock ticks for the current scheduling window.
Add 1 to the estimated CPU usage (ts_estcpu) and clamp it to a maximum value so it does not overflow.
Every Nth tick (where N = INVERSE_ESTCPU_WEIGHT, typically 8), recalculate the thread's priority using the decay formula above.
Priority does not change on every tick — it is batched for efficiency. The thread's position in the run queue only shifts every 8 ticks.
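The batching is easy to see in a small user-space model of the tick handler. Everything here is a stand-in: the struct, the cap value, and the recalcs counter (which replaces the call to resetpriority()) are assumptions made for the sketch.

```c
#include <assert.h>

#define EX_INVERSE_ESTCPU_WEIGHT 8
#define EX_ESTCPU_MAX            1000	/* hypothetical cap, stands in for ESTCPULIM */

struct ex_ts {
	int      cpticks;	/* raw ticks in the current window */
	unsigned estcpu;	/* estimated CPU usage */
	int      recalcs;	/* how many times priority was recomputed */
};

/* Model of one scheduler clock tick charged to a running thread. */
static void
ex_sched_clock(struct ex_ts *ts)
{
	ts->cpticks++;
	if (ts->estcpu < EX_ESTCPU_MAX)
		ts->estcpu++;
	/* Recompute priority only when estcpu crosses a multiple of the weight. */
	if ((ts->estcpu % EX_INVERSE_ESTCPU_WEIGHT) == 0)
		ts->recalcs++;
}
```

Feeding 24 ticks into a zeroed ex_ts leaves estcpu at 24 but triggers only 3 recalculations (at 8, 16, and 24).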
Per-Thread State: struct td_sched
The 4BSD scheduler's per-thread state has evolved across versions:
struct td_sched {
fixpt_t ts_pctcpu;
int ts_cpticks;
int ts_slptime;
int ts_flags;
struct runq *ts_runq;
};
/* FreeBSD 11 added: */
int ts_slice; /* Remaining ticks in quantum */
/* FreeBSD 12+ added: */
u_int ts_estcpu; /* Estimated CPU utilization */
ts_pctcpu: Percentage of CPU used, shown in ps output as the %CPU column.
ts_cpticks: Raw clock ticks consumed in the current window. Feeds the decay formula.
ts_slptime: How long this thread has been sleeping. After sleeping, the next wakeup gets a priority boost.
ts_slice (FB11+): Remaining ticks before the scheduler forces a switch. Adds explicit time-slicing.
ts_estcpu (FB12+): Moved from the process struct into the per-thread struct for finer-grained accounting.
Run Queue Organization
On SMP systems, there are two pools of queues: a global queue for threads without CPU preferences, and per-CPU queues for threads pinned to specific CPUs.
4BSD Feature Matrix
| Feature | 10 | 11 | 12 | 13 | 14 | 15 | 16 |
|---|---|---|---|---|---|---|---|
| ts_slice field | — | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| TDF_SLICEEND flag | — | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| ts_estcpu in td_sched | — | — | ✓ | ✓ | ✓ | ✓ | ✓ |
| sched_clock(td, cnt) | — | — | — | — | ✓ | ✓ | ✓ |
| sched_switch(td, flags) | — | — | — | — | ✓ | ✓ | ✓ |
| HWT hooks | — | — | — | — | — | — | ✓ |
| sched_4bsd_* naming | — | — | — | — | — | — | ✓ |
| struct sched_instance | — | — | — | — | — | — | ✓ |
A thread has been using a lot of CPU. What happens to its ts_estcpu value during the schedcpu() decay pass?
Meet the ULE Scheduler
ULE (pronounced "you-lee") replaced the 4BSD scheduler as the default in FreeBSD 7.1. Where 4BSD treats SMP as an afterthought, ULE was designed from the ground up for multi-core systems.
Per-CPU Design
The fundamental insight of ULE is that each CPU gets its own scheduler state: a struct tdq with its own lock and its own run queues (separate realtime, timeshare, and idle queues through FreeBSD 15, a single unified queue in FreeBSD 16).
On a 64-core server, 64 CPUs can schedule threads simultaneously without contending on a single lock — a massive scalability win over 4BSD's global queue approach.
Interactivity Detection
ULE automatically classifies threads as interactive or batch-oriented by measuring their voluntary sleep patterns. The magic happens in sched_interact_score():
static int
sched_interact_score(struct thread *td) {
struct td_sched *ts = td->td_sched;
int div;
if (ts->ts_runtime >= ts->ts_slptime) {
div = max(1, ts->ts_runtime / SCHED_INTERACT_HALF);
return (SCHED_INTERACT_HALF +
(SCHED_INTERACT_HALF - (ts->ts_slptime / div)));
}
div = max(1, ts->ts_slptime / SCHED_INTERACT_HALF);
return (ts->ts_runtime / div);
}
Get the thread's scheduler state, which tracks time spent running vs. sleeping.
If the thread has spent more time running than sleeping, it gets a high score (less interactive — it is a CPU hog).
If the thread sleeps more than it runs, it gets a low score (more interactive — waiting for user input or I/O).
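The SCHED_INTERACT_HALF constant is the midpoint of the score range: threads that run more than they sleep score above it, and threads that sleep more than they run score below it. The cutoff for being treated as interactive is lower still (a score under 30, as the comparison table later notes).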
Interactive threads receive a priority boost, keeping them responsive even under heavy load.
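To get a feel for the numbers, here is a standalone version of the scoring arithmetic shown above, with the thread state replaced by two plain integers. It assumes a score range of 0 to 100, so the half-way constant is 50; the names are prefixed with ex_ to mark them as illustrative.

```c
#include <assert.h>

#define EX_INTERACT_MAX  100
#define EX_INTERACT_HALF (EX_INTERACT_MAX / 2)

static int
ex_max(int a, int b)
{
	return (a > b ? a : b);
}

/* Same arithmetic as the function above, on plain run/sleep totals. */
static int
ex_interact_score(int runtime, int slptime)
{
	int div;

	assert(runtime >= 0 && slptime >= 0);
	if (runtime >= slptime) {
		/* Mostly running: score lands in the upper half. */
		div = ex_max(1, runtime / EX_INTERACT_HALF);
		return (EX_INTERACT_HALF +
		    (EX_INTERACT_HALF - (slptime / div)));
	}
	/* Mostly sleeping: score lands in the lower half. */
	div = ex_max(1, slptime / EX_INTERACT_HALF);
	return (runtime / div);
}
```

A thread with 100 units of run time and 900 of sleep time scores 5 (strongly interactive). Swap the two and the score is 95 (strongly batch). Equal run and sleep time gives exactly 50.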
Work Stealing & Load Balancing
Is my own queue empty? If no, run the highest-priority thread. No cross-CPU coordination needed.
Check if any other CPU's tdq_load exceeds the load balancing threshold. If so, steal from the busiest.
Prefer stealing from the closest CPUs in the topology first: SMT siblings on the same core, then CPUs in the same physical package, then CPUs in the same NUMA domain. The farther a thread moves, the more cache and memory locality it loses.
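One way to picture victim selection is as a ranking over the other CPUs by topological distance and load. The sketch below is a deliberately simplified model, not ULE's actual algorithm: the struct, the numeric distance field, and the tie-breaking rule are all assumptions made for illustration.

```c
#include <assert.h>

/* Simplified per-CPU view; distance grows as CPUs share less hardware. */
struct ex_cpu {
	int load;	/* runnable threads queued on this CPU */
	int distance;	/* 0 = SMT sibling, 1 = same package, 2 = same NUMA domain, 3 = remote */
};

/*
 * Pick a CPU to steal from: among CPUs whose load exceeds thresh,
 * prefer the closest one, and break ties by taking the busiest.
 * Returns the CPU index, or -1 if nobody is worth stealing from.
 */
static int
ex_pick_victim(const struct ex_cpu *cpus, int ncpu, int self, int thresh)
{
	int best = -1;

	for (int i = 0; i < ncpu; i++) {
		if (i == self || cpus[i].load <= thresh)
			continue;
		if (best == -1 ||
		    cpus[i].distance < cpus[best].distance ||
		    (cpus[i].distance == cpus[best].distance &&
		    cpus[i].load > cpus[best].load))
			best = i;
	}
	return (best);
}
```

With an idle CPU 0, a remote CPU holding 5 threads, and two nearby CPUs holding 3 and 4 threads, the model steals from the nearby CPU with 4 threads even though the remote CPU is busier.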
ULE Feature Matrix
| Feature | 10 | 11 | 12 | 13 | 14 | 15 | 16 |
|---|---|---|---|---|---|---|---|
| 3 run queues per CPU | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | — |
| 1 unified run queue | — | — | — | — | — | — | ✓ |
| sched_slice_min | — | — | ✓ | ✓ | ✓ | ✓ | ✓ |
| always_steal tunable | — | — | — | — | ✓ | ✓ | ✓ |
| Cache-padded lock | — | — | — | — | ✓ | ✓ | ✓ |
| tdq_curthread field | — | — | — | — | — | ✓ | ✓ |
| Lockless TDQ accessors | — | — | — | — | — | — | ✓ |
| kern.sched.ule.* namespace | — | — | — | — | — | — | ✓ |
A thread spends 90% of its time sleeping (waiting for user keystrokes) and only 10% running. What does ULE's sched_interact_score() return for it?
The Run Queue
The run queue is the central data structure that holds all threads ready to execute. Its design has evolved significantly between FreeBSD versions, with the most dramatic change arriving in FreeBSD 16.
Data Structure Evolution
#define RQ_NQS 64 /* 64 run queues */
#define RQ_PPQ 4 /* 4 priorities per queue */
struct rqbits {
rqb_word_t rqb_bits[RQB_LEN];
};
struct runq {
struct rqbits rq_status;
struct rqhead rq_queues[RQ_NQS];
};
64 queues, each covering 4 priority values. Priority 0–3 → queue 0, priority 4–7 → queue 1, and so on.
rq_status: a bitmask with one bit per queue — set means "this queue has threads, check it."
Finding the highest-priority runnable thread = finding the lowest-set bit in rq_status. That is a single bsf (bit-scan forward) instruction — O(1).
The downside: 4 different priorities share a queue, so a priority-5 thread might wait behind priority-7 threads at the same queue slot.
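The 64-queue lookup can be sketched in a few lines. This model assumes the status bitmask fits in one 64-bit word and uses the GCC/Clang builtin __builtin_ctzll to find the lowest set bit; the kernel's rqbits layout is platform-specific, and the ex_ names are illustrative.

```c
#include <assert.h>
#include <stdint.h>

#define EX_RQ_NQS 64	/* number of queues */
#define EX_RQ_PPQ 4	/* priorities per queue */

/* Map a priority to its queue: 0-3 -> 0, 4-7 -> 1, and so on. */
static int
ex_queue_index(int pri)
{
	assert(pri >= 0 && pri < EX_RQ_NQS * EX_RQ_PPQ);
	return (pri / EX_RQ_PPQ);
}

/* Lowest set bit = most important non-empty queue; -1 if all are empty. */
static int
ex_first_queue(uint64_t status)
{
	if (status == 0)
		return (-1);
	return (__builtin_ctzll(status));
}
```

Priorities 5 and 7 both land in queue 1, which is exactly the sharing that lets one wait behind the other.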
#define RQ_NQS 256 /* 256 run queues */
#define RQ_PPQ 1 /* 1 priority per queue */
typedef unsigned long rqsw_t;
#define RQSW_NB 4 /* 4 × 64-bit = 256 bits */
struct rq_status {
rqsw_t rq_sw[RQSW_NB];
};
struct runq {
struct rq_status rq_status;
struct rq_queue rq_queues[RQ_NQS];
};
256 queues — one per priority level. Priority 5 → queue 5, no sharing.
rq_status: now 4 × 64-bit words = 256 bits total. One bit per queue, just more of them.
Finding the highest-priority thread: scan the 4 words with __builtin_ctzl() (count trailing zeros) — still effectively O(1).
✅ Eliminates intra-band priority inversion: no thread with priority 5 waits behind priority 7 in the same slot.
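The four-word scan is a short loop. The sketch below models the 256-bit status as four 64-bit words and again relies on the GCC/Clang builtin __builtin_ctzll; the struct and function names are hypothetical stand-ins for the kernel's rq_status and runq_sw_*() helpers.

```c
#include <assert.h>
#include <stdint.h>

#define EX_RQSW_NB  4	/* status words */
#define EX_RQSW_BPW 64	/* bits per word */

struct ex_rq_status {
	uint64_t sw[EX_RQSW_NB];
};

/* Mark the queue for priority pri as non-empty. */
static void
ex_status_set(struct ex_rq_status *s, int pri)
{
	assert(pri >= 0 && pri < EX_RQSW_NB * EX_RQSW_BPW);
	s->sw[pri / EX_RQSW_BPW] |= UINT64_C(1) << (pri % EX_RQSW_BPW);
}

/* Mark the queue for priority pri as empty. */
static void
ex_status_clear(struct ex_rq_status *s, int pri)
{
	assert(pri >= 0 && pri < EX_RQSW_NB * EX_RQSW_BPW);
	s->sw[pri / EX_RQSW_BPW] &= ~(UINT64_C(1) << (pri % EX_RQSW_BPW));
}

/* Return the most important non-empty priority, or -1 if none. */
static int
ex_status_first(const struct ex_rq_status *s)
{
	for (int i = 0; i < EX_RQSW_NB; i++)
		if (s->sw[i] != 0)
			return (i * EX_RQSW_BPW + __builtin_ctzll(s->sw[i]));
	return (-1);
}
```

The loop runs at most four iterations regardless of how many threads are queued, which is why the lookup still counts as constant time.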
Run Queue Comparison
| Aspect | FreeBSD 10–15 | FreeBSD 16 |
|---|---|---|
| Queues | 64 | 256 |
| Priorities per queue | 4 | 1 |
| Status tracking | struct rqbits (platform-specific) | struct rq_status with rqsw_t array |
| Bit operations | Direct rqb_bits manipulation | runq_sw_*() abstraction functions |
| Priority mapping | pri / RQ_PPQ = queue index | 1:1 mapping |
Core Queue Operations
/* Conceptual runq_add (simplified) */
void
runq_add(struct runq *rq, struct thread *td, int flags) {
int pri = td->td_priority;
int qi = pri / RQ_PPQ; /* FB10-15: pri/4 FB16: pri/1 */
struct rq_queue *rqh = &rq->rq_queues[qi];
if (flags & SRQ_PREEMPTED)
TAILQ_INSERT_HEAD(rqh, td, td_runq);
else
TAILQ_INSERT_TAIL(rqh, td, td_runq);
runq_setbit(&rq->rq_status, qi);
}
Convert the thread's numeric priority to a queue index. In FB10-15, divide by 4 so priorities 0–3 share queue 0. In FB16, it is 1:1.
Get a pointer to the target queue head in the run queue array.
If the thread was preempted, add it to the head: it lost the CPU involuntarily, so it runs first when its priority comes up again.
Otherwise (yielding or newly runnable), add it to the tail, where it waits behind other threads at the same priority.
Set the corresponding bit in the status bitmask to indicate this queue is non-empty. Enables O(1) queue selection via bit-scan instructions.
With RQ_PPQ=1 in FB16, every priority gets its own queue. This eliminates "priority inversion within a band" — no thread with priority 5 has to wait behind a thread with priority 7 in the same queue slot.
How does FreeBSD 16 track which of its 256 run queues are non-empty?
Context Switching
Every time the scheduler picks a new thread, the CPU needs to save the old thread's register state and load the new thread's state. This is the context switch — the most performance-critical path in the entire kernel.
What Gets Saved?
AMD64 (x86-64)
| Register Class | Registers | Why |
|---|---|---|
| Callee-saved GPRs | r12, r13, r14, r15, rbp, rsp, rbx | ABI requires these survive function calls |
| Instruction pointer | rip (via return address on stack) | Resume execution at the right place |
| FPU/SIMD state | xsave area (or fxsave on older CPUs) | AVX/SSE registers — lazily saved only if used |
ARM64 (AArch64)
| Register Class | Registers | Why |
|---|---|---|
| Callee-saved GPRs | x19–x29, lr (x30) | ARM64 calling convention preserves these |
| Stack & frame | sp, fp (x29) | Each thread has its own kernel stack |
| VFP/NEON state | Floating-point registers | Lazily saved when next used |
| PAC keys | ptrauth_switch() | Pointer authentication keys (security feature) |
Both architectures use lazy FPU saving: the FPU state is only saved/restored when a thread actually uses floating-point instructions. This avoids saving 512+ bytes of SIMD state on every context switch — most kernel threads never touch the FPU.
The mi_switch() Orchestrator
void
mi_switch(int flags) {
struct thread *td = curthread;
/* 1. Account for CPU time used */
sched_switch(td, flags);
/* sched_switch picks next thread and calls */
/* cpu_switch(old_td, new_td, lock) */
/* 2. We return here when scheduled again */
td->td_oncpu = PCPU_GET(cpuid);
}
Get the currently running thread (curthread is a per-CPU global — fast, no locking needed).
Call sched_switch(), which asks the scheduler to pick the next thread. The scheduler calls cpu_switch() under the hood.
The key insight: cpu_switch() does not return to us — it returns to the new thread's saved state. We only resume when some future scheduling decision picks us again.
When we are re-scheduled (maybe milliseconds or seconds later), we update our CPU ID to reflect which CPU we are now running on — we might have migrated to a different core!
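The "call that only returns when someone switches back to you" behavior can be reproduced in user space with the POSIX ucontext API. This is an analogy, not kernel code: swapcontext() plays the role of cpu_switch(), and the trace array simply records the order in which the two contexts run.

```c
#include <assert.h>
#include <ucontext.h>

static ucontext_t ex_main_ctx, ex_worker_ctx;
static char ex_worker_stack[64 * 1024];
static int ex_trace[4];
static int ex_n;

static void
ex_worker(void)
{
	ex_trace[ex_n++] = 2;	/* worker runs for the first time */
	/* Switch away; this call returns only when main switches back. */
	swapcontext(&ex_worker_ctx, &ex_main_ctx);
	ex_trace[ex_n++] = 4;	/* worker resumed exactly where it left off */
}

static void
ex_demo(void)
{
	/* Prepare a second context with its own stack. */
	getcontext(&ex_worker_ctx);
	ex_worker_ctx.uc_stack.ss_sp = ex_worker_stack;
	ex_worker_ctx.uc_stack.ss_size = sizeof(ex_worker_stack);
	ex_worker_ctx.uc_link = &ex_main_ctx;	/* where to go when ex_worker returns */
	makecontext(&ex_worker_ctx, ex_worker, 0);

	ex_trace[ex_n++] = 1;
	swapcontext(&ex_main_ctx, &ex_worker_ctx);	/* "mi_switch": we stop here */
	ex_trace[ex_n++] = 3;	/* back in main after the worker switched to us */
	swapcontext(&ex_main_ctx, &ex_worker_ctx);	/* let the worker finish */
}
```

After ex_demo() returns, the trace reads 1, 2, 3, 4: each swapcontext() call appears to "return" only when the other side hands control back, just as mi_switch() does for a kernel thread.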
Context Switch Overhead
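Same address space (for example, two threads of one process): only register save/restore is needed. No TLB flush, so this is the cheapest possible switch.
Different process: registers plus a page table switch and TLB flush. Some TLB entries may be preserved with PCID/ASID tags.
FPU/SIMD state: an extra cost added if either thread uses FPU/AVX/NEON. The xsave area can be 512–8192 bytes depending on available extensions.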
After cpu_switch() is called, when does the old thread resume executing?
The Pluggable Scheduler Framework
Until FreeBSD 15, choosing between 4BSD and ULE was a compile-time decision — you had to rebuild the kernel to switch. FreeBSD 16 changes this with a new pluggable scheduler framework that enables boot-time selection.
The Shim Architecture
The shim layer uses IFUNC trampolines — the same mechanism used for CPU-optimized string functions like memcpy(). The linker resolves each function pointer once at boot time. After that, each scheduler call goes directly to the implementation — no extra pointer dereference at runtime.
struct sched_instance — The Vtable
Each scheduler registers itself by filling in a structure of function pointers:
struct sched_instance {
/* Thread lifecycle */
void (*sched_fork)(struct thread *, struct thread *);
void (*sched_exit)(struct proc *, struct thread *);
/* Core scheduling */
void (*sched_clock)(struct thread *, int);
void (*sched_switch)(struct thread *, int);
void (*sched_add)(struct thread *, int);
struct thread *(*sched_choose)(void);
void (*sched_wakeup)(struct thread *, int);
/* Priority management */
void (*sched_prio)(struct thread *, u_char);
void (*sched_user_prio)(struct thread *, u_char);
/* ~45 total: affinity, preempt, bind, load… */
};
This is a vtable — the same design pattern as C++ virtual functions, but in plain C.
Each scheduler (4BSD, ULE, or a future third-party one) fills in this struct with its own function pointers.
The kernel calls sched_add(). The shim resolves this to whichever scheduler's .sched_add pointer was set at boot.
With ~45 function pointers, the interface covers everything from creating threads (sched_fork) to making scheduling decisions (sched_choose).
How DECLARE_SCHEDULER Works
#define DECLARE_SCHEDULER(si) \
DATA_SET(schedulers, si); \
SCHED_DEFINE_IFUNCS(si)
/* In sched_ule.c: */
static struct sched_instance ule_instance = {
.sched_add = sched_ule_add,
.sched_switch = sched_ule_switch,
.sched_choose = sched_ule_choose,
/* ... all ~45 functions ... */
};
DECLARE_SCHEDULER(ule_instance);
/* Loader tunable — set in /boot/loader.conf: */
kern.sched.name="ULE" # or "4BSD"
DECLARE_SCHEDULER does two things: registers the scheduler instance in a linker set, and generates IFUNC trampoline stubs.
DATA_SET(schedulers, si) places the instance in a special linker section so the kernel can discover all available schedulers at boot.
SCHED_DEFINE_IFUNCS generates one IFUNC resolver per scheduler function. At boot, the linker resolves each to the active scheduler's function pointer.
Setting kern.sched.name="ULE" in /boot/loader.conf selects the scheduler at boot, with no recompilation needed.
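The registration-and-selection idea can be modeled in portable C. In this sketch a static array stands in for the linker set, a name lookup stands in for the loader tunable, and calls go through a stored pointer. That last part is where it differs from the real framework, which uses IFUNC resolution so that no pointer is dereferenced per call. All ex_ names and the return values are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* A two-entry stand-in for the much larger struct sched_instance. */
struct ex_sched_instance {
	const char *name;
	int (*choose)(void);	/* returns a fake thread id */
};

static int ex_4bsd_choose(void) { return (4); }
static int ex_ule_choose(void) { return (7); }

static const struct ex_sched_instance ex_4bsd = { "4BSD", ex_4bsd_choose };
static const struct ex_sched_instance ex_ule = { "ULE", ex_ule_choose };

/* Stand-in for the linker set that DATA_SET() would populate. */
static const struct ex_sched_instance *const ex_schedulers[] = {
	&ex_4bsd, &ex_ule
};

static const struct ex_sched_instance *ex_active;

/* Select a scheduler by name; returns 0 on success, -1 if unknown. */
static int
ex_sched_select(const char *name)
{
	size_t n = sizeof(ex_schedulers) / sizeof(ex_schedulers[0]);

	for (size_t i = 0; i < n; i++) {
		if (strcmp(ex_schedulers[i]->name, name) == 0) {
			ex_active = ex_schedulers[i];
			return (0);
		}
	}
	return (-1);
}

/* The generic entry point the rest of the "kernel" would call. */
static int
ex_sched_choose(void)
{
	assert(ex_active != NULL);
	return (ex_active->choose());
}
```

Selecting "ULE" routes ex_sched_choose() to ex_ule_choose(); an unknown name is rejected and leaves the previous selection in place.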
What This Enables
Compare schedulers on the same hardware by rebooting with a different tunable. No recompilation, no separate kernel binary.
Researchers can implement novel scheduling algorithms as kernel modules, loadable without modifying the base kernel.
Distributions ship a single kernel binary with both schedulers compiled in, selected at boot time based on workload.
A researcher wants to test a new scheduling algorithm on FreeBSD 16. What do they need to do?
The Big Picture
Now that we have explored each subsystem in detail, let us step back and see how the FreeBSD scheduler has evolved across major versions, how the two schedulers compare, and how to observe it all with SDT probes and DTrace.
Timeline of Major Changes
FreeBSD 10 — Baseline
Both schedulers are present as separate compile-time options, sharing the 64-queue runq.
FreeBSD 11
ts_slice added to 4BSD for explicit time-slicing. SDT probes added to both schedulers for DTrace observability.
FreeBSD 12
ts_estcpu becomes explicit in 4BSD's td_sched. ULE gets a minimum time-slice tunable (sched_slice_min).
FreeBSD 14
API cleanup: sched_switch loses the newtd parameter, sched_wakeup gains flags. ULE gets cache-padded locks to avoid false sharing on multi-socket systems.
FreeBSD 15
tdq_curthread added to ULE, sched_ap_entry for application processor startup, AST-based preemption improvements.
FreeBSD 16
Unified run queue (3 queues → 1 per CPU), 256-queue runq (RQ_PPQ=1), pluggable framework (sched_shim.c), per-scheduler sysctl namespaces (kern.sched.ule.*).
4BSD vs ULE Comparison
| Feature | 4BSD | ULE |
|---|---|---|
| Design | Single global lock, global + per-CPU queues | Per-CPU locks, per-CPU queues only |
| Priority algorithm | Exponential CPU-usage decay | Interactivity scoring (sleep/run ratio) |
| Time slices | Fixed quantum (~100 ms) | Variable, based on classification |
| Interactive boost | None | Automatic: score < 30 → boosted |
| CPU topology | Basic (last-CPU preference) | Full: SMT, packages, NUMA |
| Load balancing | Shortest-queue + IPI | Work-stealing, topology-aware |
| Run queues (FB10–15) | 1 global + N per-CPU | 3 per CPU: realtime, timeshare, idle |
| Run queues (FB16) | 1 global + N per-CPU | 1 unified per CPU |
| Locking | sched_lock global spinlock | Per-CPU tdq_lock spinlocks |
| Default since | Original BSD (retired from default in 7.1) | FreeBSD 7.1 onward |
Available SDT Probes
| Probe | Fires When |
|---|---|
| sched:::change-pri | Thread priority changed via sched_thread_priority() |
| sched:::lend-pri | Priority lent due to priority propagation |
| sched:::enqueue | Thread added to run queue (sched_add()) |
| sched:::dequeue | Thread removed from run queue |
| sched:::load-change | Run queue load changes |
| sched:::on-cpu | Thread begins executing after context switch |
| sched:::off-cpu | Thread switched off CPU |
| sched:::surrender | Thread yields due to preemption |
DTrace One-Liners
# Watch threads being enqueued in real time
dtrace -n 'sched:::enqueue {
printf("%s (pid %d) enqueued", execname, pid); }'
# Top 10 threads by scheduling events, every 5 seconds
dtrace -n 'sched:::on-cpu { @[execname, tid] = count(); }' \
-n 'tick-5s { trunc(@, 10); printa(@); clear(@); }'
# How long do threads stay off-CPU between runs?
dtrace -n 'sched:::off-cpu { self->ts = timestamp; }' \
-n 'sched:::on-cpu /self->ts/ {
@[execname] = quantize(timestamp - self->ts);
self->ts = 0; }'
# Watch priority changes in real time
# (args[0] is the affected thread, arg2 is its new priority)
dtrace -n 'sched:::change-pri {
printf("%s[%d]: %d -> %d",
execname, tid, args[0]->td_priority, arg2); }'
enqueue probe: Fires every time a thread enters the run queue. Great for seeing which processes are most active.
on-cpu aggregation: Counts scheduling events per process, reports every 5 seconds. Shows who the CPU-hungriest processes are in production.
off-cpu/on-cpu latency: Measures how long each thread waits between being switched out and switched back in. A distribution (quantize) shows if outliers exist.
change-pri trace: Shows priority changes as they happen. Useful for diagnosing unexpected priority boosts or inversions.
All probes fire at near-zero cost when not active — SDT probes are nop instructions when DTrace is not running.
Scheduling Data Flow
The complete journey of a scheduling decision, from timer interrupt to new thread executing:
❶ Timer Interrupt
The timer interrupt runs statclock(), which calls sched_clock(td, cnt) to update CPU usage statistics and check whether the time slice has expired.
❷ Preemption Check
If the time slice is up, the scheduler sets TDF_NEEDRESCHED on the thread. This flag is checked on return from interrupt.
❸ mi_switch()
The kernel calls mi_switch(), which calls sched_switch(). The scheduler picks the highest-priority runnable thread from the run queue.
❹ cpu_switch()
Architecture-specific code saves old registers, loads new registers, switches the kernel stack pointer. On return, we are running as the new thread.
❺ Address Space Switch
If the new thread is in a different process, switch page tables and flush the TLB (unless PCID/ASID is available to avoid the flush).
❻ Resume Execution
The new thread returns from its saved mi_switch() call frame and continues executing. sched:::on-cpu fires.
The entire flow — from timer interrupt to new thread executing — typically takes 1–10 microseconds. Fast enough to happen thousands of times per second on each CPU without noticeable overhead.
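A quick back-of-the-envelope helper shows why that cost stays small. The function name and the sample rates are illustrative; only the 1–10 microsecond range comes from the text.

```c
#include <assert.h>

/* Fraction of one CPU's time spent context switching (0.0 to 1.0). */
static double
ex_switch_overhead(double switches_per_sec, double usec_per_switch)
{
	assert(switches_per_sec >= 0.0 && usec_per_switch >= 0.0);
	return (switches_per_sec * usec_per_switch / 1e6);
}
```

At 5,000 switches per second and 5 microseconds each, the result is 0.025: about 2.5% of one CPU.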
Putting It All Together
- Priority system: 256 levels, lower is better, divided into interrupt / realtime / kernel / timeshare / idle classes whose boundaries have shifted across versions.
- 4BSD: Classic decay-based scheduling, simple and predictable. Still useful for workloads where global lock contention is not an issue.
- ULE: Modern per-CPU design with interactivity scoring and work stealing. The default since FreeBSD 7.1 and continually refined.
- Run queue: Evolved from 64 shared-priority queues (FB10–15) to 256 one-priority-per-queue (FB16), eliminating intra-band inversions.
- Context switching: Architecture-specific register save/restore with lazy FPU and PCID/ASID support for cheaper process switches.
- Pluggable framework: IFUNC-based dispatching for boot-time scheduler selection — the prerequisite for a future ecosystem of specialized schedulers.
- DTrace: Production-safe observability into every scheduling decision with near-zero overhead when inactive.