Why the Sandbox Works
Capsicum is easiest to understand if you stop thinking about permissions as a mood and start thinking about them as tickets. Once a process enters capability mode, it must show an explicit ticket for almost everything interesting.
Ambient access
Classic UNIX code can often reach out to the whole system namespace whenever it wants.
Delegated handles
Capsicum turns file descriptors into authority tokens you can pass around and shrink.
Irreversible drop
Once the process enters capability mode it cannot leave, and every process it creates afterwards starts out in capability mode too.
if ((fd = open(basedir, O_DIRECTORY | O_RDONLY | O_CLOEXEC)) < 0)
    err(EXIT_FAILURE, "open(%s)", basedir);
cap_rights_init(&rights, CAP_READ, CAP_FSTATAT, CAP_LOOKUP,
    CAP_FCNTL);
if (cap_rights_limit(fd, &rights) < 0 && errno != ENOSYS)
    err(EXIT_FAILURE, "cap_rights_limit");
if (cap_enter() < 0 && errno != ENOSYS)
    err(EXIT_FAILURE, "cap_enter");
Open one directory while the process still has full naming power.
Prepare a rights set that says what this directory handle is allowed to do later.
Shrink the handle permanently. It can never grow broader again.
Only after the handle is shaped do we enter capability mode.
Capsicum is not mainly about denying things at random. It is about forcing naming to happen early and forcing the steady-state part of the program to happen through explicit handles.
You are sandboxing a file-serving worker. Which phase should still be doing broad pathname discovery?
Two Locks, Not One
A lot of Capsicum confusion comes from mixing up the process-wide wall with the per-descriptor wall. The kernel keeps those separate, and your design should too.
`SYF_CAPENABLED` is just the kernel's way of labeling a syscall as safe to use in capability mode.
if ((se->sy_flags & SYF_CAPENABLED) == 0) {
    if (CAP_TRACING(td))
        ktrcapfail(CAPFAIL_SYSCALL, NULL);
    if (IN_CAPABILITY_MODE(td)) {
        td->td_errno = error = ECAPMODE;
        goto retval;
    }
}
Look at the syscall entry in the kernel table.
If that syscall is not marked capability-enabled, record the violation if tracing is on.
If the process is sandboxed, stop right there and return ECAPMODE.
if (!cap_rights_contains(havep, needp)) {
    if (CAP_TRACING(curthread))
        ktrcapfail(type, rights);
    return (ENOTCAPABLE);
}
return (0);
Compare the rights this fd has with the rights the operation needs.
If the needed rights are not a subset of the rights the fd holds, record the violation when tracing is on.
Return ENOTCAPABLE because the handle is too weak for the requested action.
ECAPMODE
Your code still depends on ambient namespace power.
ENOTCAPABLE
Your architecture is close, but the delegated handle is missing rights.
kern.trap_enotcap
Debugging sysctl that makes the kernel deliver SIGTRAP when a syscall fails with ENOTCAPABLE or ECAPMODE, so the failure surfaces right at the offending call.
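Because the two error codes point at different kinds of fixes, it can help to classify them in one place while porting. A small sketch follows; `classify_capsicum_errno()`, `capsicum_fix_hint()`, and the `FIX_*` names are hypothetical, and the numeric fallbacks are there only so the file compiles where `<errno.h>` does not define these codes.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

#ifndef ENOTCAPABLE
#define ENOTCAPABLE 93  /* fallback for non-FreeBSD builds of this sketch */
#endif
#ifndef ECAPMODE
#define ECAPMODE 94     /* fallback for non-FreeBSD builds of this sketch */
#endif

enum capsicum_fix {
    FIX_NONE,           /* not a Capsicum failure */
    FIX_MOVE_TO_SETUP,  /* ECAPMODE: code still wants ambient access */
    FIX_WIDEN_RIGHTS    /* ENOTCAPABLE: handle exists but is too weak */
};

/* Map an errno value to the kind of fix the tutorial suggests. */
static enum capsicum_fix
classify_capsicum_errno(int error)
{
    switch (error) {
    case ECAPMODE:
        return (FIX_MOVE_TO_SETUP);
    case ENOTCAPABLE:
        return (FIX_WIDEN_RIGHTS);
    default:
        return (FIX_NONE);
    }
}

/* Human-readable hint for logs while porting. */
static const char *
capsicum_fix_hint(enum capsicum_fix fix)
{
    assert(fix <= FIX_WIDEN_RIGHTS);
    switch (fix) {
    case FIX_MOVE_TO_SETUP:
        return ("move this call before sandbox entry or behind a broker");
    case FIX_WIDEN_RIGHTS:
        return ("add the missing right when limiting this descriptor");
    default:
        return ("not a Capsicum failure");
    }
}
```

Logging `capsicum_fix_hint(classify_capsicum_errno(errno))` next to each failing call is one way to sort violations into "restructure" and "adjust rights" piles.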
A sandboxed process calls a syscall that the kernel has not marked as capability-safe. What kind of fix should you be thinking about first?
Naming Without Namespaces
The hardest mental shift is that pathnames stop being a free-floating string problem. In capability mode, a path is only meaningful relative to a delegated directory handle.
if (IN_CAPABILITY_MODE(td)) {
    ndp->ni_lcf |= NI_LCF_STRICTREL;
    ndp->ni_resflags |= NIRES_STRICTREL;
    if (ndp->ni_dirfd == AT_FDCWD)
        return (ECAPMODE);
}
When the caller is sandboxed, switch pathname lookup into strict relative mode.
Mark the result as a strict-relative lookup too.
If the caller tried to use AT_FDCWD, reject the request immediately.
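To build intuition for what "strict relative" means, here is a userland model of the rule. It is not the kernel's algorithm: `path_stays_beneath()` is a hypothetical helper, it ignores symlinks entirely, and it assumes the behavior where `..` is tolerated as long as the lookup never climbs above the starting directory (stricter configurations may reject `..` outright).

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Return true when path is relative and no ".." component would climb
 * above the directory the lookup starts from. Symlinks are not modeled.
 */
static bool
path_stays_beneath(const char *path)
{
    const char *p = path;
    int depth = 0;

    assert(path != NULL);
    if (*p == '/')
        return (false);     /* absolute paths name the global root */
    while (*p != '\0') {
        size_t len = strcspn(p, "/");

        if (len == 2 && p[0] == '.' && p[1] == '.') {
            if (--depth < 0)
                return (false); /* would escape the root fd */
        } else if (len > 0 && !(len == 1 && p[0] == '.')) {
            depth++;        /* ordinary component: one level down */
        }
        p += len;
        while (*p == '/')   /* skip one or more separators */
            p++;
    }
    return (true);
}
```

A worker could run a check like this on untrusted names before handing them to `openat()`, as a readable first filter in front of the kernel's own enforcement.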
if (chdir(_PATH_RWHODIR) < 0)
    err(1, "chdir(%s)", _PATH_RWHODIR);
if ((dirp = opendir(".")) == NULL)
    err(1, "opendir(%s)", _PATH_RWHODIR);
dfd = dirfd(dirp);
cap_rights_init(&rights, CAP_READ, CAP_LOOKUP);
if (caph_rights_limit(dfd, &rights) < 0)
    err(1, "cap_rights_limit failed: %s", _PATH_RWHODIR);
Move into the directory while broad lookup still works.
Open the directory and turn it into a stable handle.
Shrink that handle so it can only read and look up entries beneath it later.
Open a directory before the sandbox starts.
Give it `CAP_LOOKUP` plus the exact rights needed for the relative `*at()` operations you will use, such as `openat()` and `fstatat()`.
Use only relative names rooted at that descriptor.
Changing `open()` to `openat()` is not enough by itself. The real change is that the directory fd becomes part of the security boundary, so you must choose it carefully and limit it carefully.
You are sandboxing a worker that may read only one subtree. What is the most capability-native interface to give it?
Patterns from the Tree
The FreeBSD tree does not use one giant Capsicum template. It uses a handful of repeatable patterns chosen to match the shape of each program.
Directory-rooted worker
`pkg-serve` and `rwho` operate inside one delegated subtree using `openat()` and similar calls.
Pre-connected sockets
`traceroute` does naming and connection setup before entering capability mode, then keeps only send and receive rights.
Helper-assisted stdio
`capsicum_helpers.h` cuts boilerplate for common stream setups and cache preloads.
Brokered global services
`syslogd` uses Casper services instead of reopening broad namespaces from inside the sandbox.
caph_cache_catpages();
if (cansandbox && cap_enter() < 0) {
    if (errno != ENOSYS) {
        Fprintf(stderr, "%s: cap_enter: %s\n", prog,
            strerror(errno));
        exit(1);
    } else {
        cansandbox = false;
    }
}
Warm up localized message data first so later error handling does not touch the filesystem.
Try to enter the sandbox once setup is done.
If the kernel does not support Capsicum, degrade gracefully for this utility.
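The pre-connected socket pattern pairs naturally with that entry sequence: finish naming and connecting first, then shrink the socket. A minimal sketch, assuming a hypothetical helper `limit_connected_socket()` and assuming send and receive are the only operations needed afterwards (a program that also polls the socket would need more rights than shown):

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/socket.h>
#ifdef __FreeBSD__
#include <sys/capsicum.h>
#endif

/*
 * Shrink an already-connected socket to send and receive only.
 * Returns 0 on success, -1 with errno set on failure.
 * Assumption: a no-op on platforms without Capsicum.
 */
static int
limit_connected_socket(int s)
{
    assert(s >= 0);
#ifdef __FreeBSD__
    {
        cap_rights_t rights;

        cap_rights_init(&rights, CAP_SEND, CAP_RECV);
        /* Tolerate kernels built without Capsicum support. */
        if (cap_rights_limit(s, &rights) < 0 && errno != ENOSYS)
            return (-1);
    }
#endif
    return (0);
}
```

After this call the socket can still carry traffic, but it is no longer a handle through which the process could, for example, reconnect somewhere else.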
cap_casper = cap_init();
if (cap_casper == NULL)
    err(1, "Failed to communicate with libcasper");
cap_syslogd = cap_service_open(cap_casper, "syslogd.casper");
if (cap_syslogd == NULL)
    err(1, "Failed to open the syslogd.casper libcasper service");
Connect to the Casper broker during setup, before sandbox entry.
Open the specific service channel the daemon will rely on later.
From this point on, the daemon can ask the broker for narrow help instead of reopening global access directly.
If your code still needs broad naming during the main loop, do not fight the sandbox. Split the program into a broker and a worker so the narrow phase becomes obvious.
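To make the broker and worker split concrete, here is a hand-rolled stand-in built from `fork()` and a socketpair. It is not Casper and does not use libcasper: the wire format, the function names, and the single "file size by path" question are all hypothetical, chosen to keep the sketch short. The broker keeps ambient authority, and the worker keeps only the channel.

```c
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical wire format: fixed-size request and reply. */
struct size_req { char path[256]; };
struct size_rep { long long size; };    /* -1 on failure */

/* Broker: answers one narrow question using ambient naming. */
static void
broker_loop(int chan)
{
    struct size_req req;
    struct size_rep rep;
    struct stat sb;

    while (recv(chan, &req, sizeof(req), MSG_WAITALL) ==
        (ssize_t)sizeof(req)) {
        req.path[sizeof(req.path) - 1] = '\0';
        rep.size = (stat(req.path, &sb) == 0) ?
            (long long)sb.st_size : -1;
        if (send(chan, &rep, sizeof(rep), 0) != (ssize_t)sizeof(rep))
            break;
    }
}

/* Fork the broker during setup; return the worker's end, or -1. */
static int
broker_start(void)
{
    int sv[2];
    pid_t pid;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return (-1);
    pid = fork();
    if (pid < 0) {
        close(sv[0]);
        close(sv[1]);
        return (-1);
    }
    if (pid == 0) {         /* child becomes the broker */
        close(sv[0]);
        broker_loop(sv[1]);
        _exit(0);
    }
    close(sv[1]);
    return (sv[0]);
}

/* Worker side: ask the broker instead of calling stat() directly. */
static long long
broker_file_size(int chan, const char *path)
{
    struct size_req req;
    struct size_rep rep;

    assert(path != NULL);
    if (strlen(path) >= sizeof(req.path))
        return (-1);
    memset(&req, 0, sizeof(req));
    strcpy(req.path, path);
    if (send(chan, &req, sizeof(req), 0) != (ssize_t)sizeof(req))
        return (-1);
    if (recv(chan, &rep, sizeof(rep), MSG_WAITALL) != (ssize_t)sizeof(rep))
        return (-1);
    return (rep.size);
}
```

The worker would call `broker_start()` during setup, enter capability mode, and use `broker_file_size()` from then on. A real broker should also validate each request against a policy, since it is the part that still holds broad access.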
A long-lived sandboxed daemon sometimes needs hostname resolution well after startup. Which pattern from the tree is the best fit?
A Porting Playbook
By this point the individual tools should feel less mysterious. The remaining question is how to sequence the work so an existing application becomes capability-friendly without turning into chaos.
List every ambient dependency: paths, DNS, user databases, ioctls, and PIDs.
Split setup from steady-state work.
Pre-open and limit descriptors.
Enter capability mode and debug the remaining violations.
caph_cache_catpages();
caph_cache_tzdata();
/*
 * Cache UTX database fds.
 */
setutxent();
if (caph_enter() < 0)
    err(1, "cap_enter");
Warm up message catalog and timezone data first.
Force the utmpx (UTX) database handles to exist before the sandbox starts.
Only then drop the process into capability mode.
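For a simple stream-oriented tool, the helper header can carry most of the sequence. The wrapper below is a sketch under assumptions: `sandbox_with_helpers()` is a hypothetical name, the tool is assumed to need only stdin, stdout, and stderr after entry, and on platforms without `capsicum_helpers.h` the function only records that it ran.

```c
#include <assert.h>
#ifdef __FreeBSD__
#include <capsicum_helpers.h>
#endif

static int sandbox_entered;     /* guards against running setup twice */

/*
 * Preload caches, limit the standard streams, and enter capability
 * mode. Returns 0 on success, -1 on failure.
 */
static int
sandbox_with_helpers(void)
{
    assert(!sandbox_entered);
#ifdef __FreeBSD__
    caph_cache_catpages();      /* message catalogs */
    caph_cache_tzdata();        /* timezone data */
    if (caph_limit_stdio() < 0) /* restrict stdin, stdout, stderr */
        return (-1);
    if (caph_enter() < 0)       /* helper tolerates ENOSYS kernels */
        return (-1);
#endif
    sandbox_entered = 1;
    return (0);
}
```

Anything the tool needs beyond the standard streams still has to be opened and limited before this call.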
You inherited a daemon that mixes config loading, file opens, DNS lookups, and packet handling inside one giant loop. You want Capsicum without a month-long rewrite.
First cut
Separate startup discovery from long-lived packet handling, even before you add the actual Capsicum calls.
Second cut
Instrument failures as `ECAPMODE` vs `ENOTCAPABLE`. That tells you whether to move code or just widen one rights set a little.
Third cut
If late global operations remain, build a broker boundary instead of weakening the worker.
Capsicum-friendly structure is also easier for both humans and AI coding tools to reason about, because resource acquisition and steady-state work stop being tangled together.