# The Linux Sandbox

The Linux sandbox provides an API for restricting the capabilities of a process.
The overall design philosophy of the sandbox is documented
[elsewhere](../docs/design/sandbox.md); this document explains how it works on
Linux.

## Overall Design

There are several different sandboxing mechanisms available on Linux:

* setuid(2)
* namespaces
* seccomp(2) BPF
* seccomp(2) legacy
* selinux(7)
* apparmor(7)
* landlock(7)

Chromium chooses which mechanisms to use based on which kernel features are
available. We also generally use multiple layers of sandboxing, to achieve both
confinement for the process and reduction of the exposed kernel attack surface.
Of these mechanisms, Chromium uses:

* setuid(2) everywhere
* namespaces where supported (modern Linux kernels)
* seccomp(2) BPF where supported (modern Linux kernels)

We previously used, but no longer use:

* selinux(7)
* apparmor(7)

## setuid(2)

The setuid(2) sandbox takes advantage of the fact that privileged processes on
Linux are allowed to create new namespaces (see namespaces(7)). It sandboxes the
renderer by creating empty namespaces for it at launch time. It relies on a
setuid binary, usually installed at `/opt/google/chrome/chrome-sandbox`, which:

* Enters new PID and network namespaces, preventing the sandboxed process from
  directly accessing the network or seeing any other processes.
* chroot()s into a "safe" directory (currently inside the process's own /proc
  directory) by spawning a privileged helper process which shares its fs state
  (using `CLONE_FS`) and having that helper chroot() it, which leaves the
  process in an empty, read-only root directory.
* Marks itself as un-dumpable using `prctl(2)`, which prevents any process
  without `CAP_SYS_PTRACE` from tracing it. In theory this would keep renderers
  from debugging each other, but in practice they are isolated from each other
  by PID namespaces anyway.
* Uses capset(2) to drop all inherited capabilities.
* Drops from root back to the uid/gid/etc. of the user running the browser.

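As an illustration of the un-dumpable step in the list above, here is a minimal
Python sketch that clears the "dumpable" flag with `prctl(2)` and reads it back.
It is not the code in [suid/](suid/); the helper names are hypothetical, it
assumes a Linux host, and it runs the change in a forked child so the calling
process keeps its own flag.

```python
import ctypes
import os

# Option values from <linux/prctl.h>.
PR_GET_DUMPABLE = 3
PR_SET_DUMPABLE = 4

# Handle to the C library already loaded into this process (Linux assumed).
_libc = ctypes.CDLL(None, use_errno=True)


def _prctl(option: int, arg2: int = 0) -> int:
    """Call prctl(2), raising OSError if the kernel reports a failure."""
    result = _libc.prctl(
        ctypes.c_int(option),
        ctypes.c_ulong(arg2),
        ctypes.c_ulong(0),
        ctypes.c_ulong(0),
        ctypes.c_ulong(0),
    )
    if result < 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
    return result


def set_undumpable() -> None:
    """Clear the calling process's "dumpable" flag."""
    _prctl(PR_SET_DUMPABLE, 0)


def get_dumpable() -> int:
    """Return the "dumpable" flag; 0 means un-dumpable."""
    return _prctl(PR_GET_DUMPABLE)


def dumpable_flag_in_child_after_clearing() -> int:
    """Fork, clear the flag in the child only, and return the child's flag.

    The child reports its flag through its exit status (255 on error).
    """
    pid = os.fork()
    if pid == 0:
        try:
            set_undumpable()
            code = get_dumpable()
        except BaseException:
            code = 255
        os._exit(code)
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status)
```

A result of 0 from `dumpable_flag_in_child_after_clearing()` means the child
was un-dumpable after the call.
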
In general, the setuid sandbox makes an effort to apply all these mitigations,
but support for them varies between kernel versions, so the strength of the
setuid sandbox varies, with newer kernels providing better security.

The setuid sandbox is implemented in [suid/](suid/).

If you need to disable it, you can use `--disable-setuid-sandbox`. See also
[docs/linux/suid_sandbox_development.md](../../docs/linux/suid_sandbox_development.md)
for advice on developing the setuid sandbox itself.

## seccomp(2) BPF

On modern Linux kernels, we use the filter mode of seccomp(2), which allows us
to supply a program (written in a domain-specific language called "BPF", see
bpf(2) and bpfc(1)) which is evaluated every time the sandboxed process makes a
syscall to decide whether the syscall should be allowed. The seccomp filters
are compiled ahead of time and evaluated early in syscall handling, before the
kernel runs the syscall itself, so this both constrains what the process can do
and reduces the attack surface of the kernel.

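To make the filter model concrete, the following Python sketch builds a tiny
classic-BPF allowlist program in the `struct sock_filter` layout and evaluates
it in user space against a `struct seccomp_data` buffer. It is an illustration
only, not Chromium's policy or the `bpf_dsl` API: the function names are
hypothetical, the interpreter handles just the three opcodes used here, and
nothing is installed into the kernel. A real filter should also check the
`arch` field before trusting syscall numbers, which this sketch omits.

```python
import struct

# Opcodes from <linux/bpf_common.h>, return values from <linux/seccomp.h>.
BPF_LD_W_ABS = 0x20   # BPF_LD | BPF_W | BPF_ABS: load a 32-bit word
BPF_JMP_JEQ_K = 0x15  # BPF_JMP | BPF_JEQ | BPF_K: branch on accumulator == k
BPF_RET_K = 0x06      # BPF_RET | BPF_K: return the constant k
SECCOMP_RET_ALLOW = 0x7FFF0000
SECCOMP_RET_KILL_PROCESS = 0x80000000
SECCOMP_DATA_NR_OFFSET = 0  # offsetof(struct seccomp_data, nr)


def build_allowlist(allowed_syscalls):
    """Return (code, jt, jf, k) instructions allowing only the given numbers."""
    prog = [(BPF_LD_W_ABS, 0, 0, SECCOMP_DATA_NR_OFFSET)]
    for nr in allowed_syscalls:
        # On a match, fall through to the ALLOW; otherwise skip over it.
        prog.append((BPF_JMP_JEQ_K, 0, 1, nr))
        prog.append((BPF_RET_K, 0, 0, SECCOMP_RET_ALLOW))
    prog.append((BPF_RET_K, 0, 0, SECCOMP_RET_KILL_PROCESS))
    return prog


def encode(prog):
    """Pack the program as an array of struct sock_filter (8 bytes each)."""
    return b"".join(struct.pack("=HBBI", *insn) for insn in prog)


def make_seccomp_data(nr, arch=0xC000003E):
    """Pack struct seccomp_data: nr, arch, instruction_pointer, args[6].

    The default arch value is AUDIT_ARCH_X86_64.
    """
    return struct.pack("=iIQ6Q", nr, arch, 0, *([0] * 6))


def evaluate(prog, seccomp_data):
    """Interpret the three opcodes above and return the filter's verdict."""
    acc = 0
    pc = 0
    while True:
        code, jt, jf, k = prog[pc]
        pc += 1
        if code == BPF_LD_W_ABS:
            (acc,) = struct.unpack_from("=I", seccomp_data, k)
        elif code == BPF_JMP_JEQ_K:
            pc += jt if acc == k else jf
        elif code == BPF_RET_K:
            return k
        else:
            raise ValueError(f"unsupported opcode {code:#x}")
```

For example, with `prog = build_allowlist([0, 1, 60])` (read, write, and exit
on x86-64), `evaluate(prog, make_seccomp_data(1))` yields `SECCOMP_RET_ALLOW`,
while any syscall number outside the list yields `SECCOMP_RET_KILL_PROCESS`.
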
The seccomp sandbox is implemented in [seccomp-bpf/](seccomp-bpf/), and our
tools for working with the BPF DSL are in [bpf_dsl/](bpf_dsl/). The actual
baseline policies we use are in [seccomp-bpf-helpers/](seccomp-bpf-helpers/).

Since the seccomp sandbox applies its filter to every syscall the process
makes, to use it you must have an exhaustive list of syscalls that could be made
by the code being sandboxed: both code you wrote and code you didn't write.
Generating that list can be difficult, so it is helpful to have very good test
coverage that runs under the sandbox, to ensure you are exercising any code
paths that could lead to syscalls.

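One possible way to start such a list, offered here as an assumption rather
than as the Chromium workflow, is to run the tests under `strace -f` and
collect the syscall names from its log. The Python sketch below parses
default-format strace lines; the function names are hypothetical, and lines
with extra prefixes (for example timestamps) are not handled.

```python
import re

# Syscall name at the start of an strace line, optionally preceded by a
# bare PID ("1234 ") or a "[pid 1234] " prefix.
_SYSCALL_RE = re.compile(r"^(?:\[pid\s+\d+\]\s+|\d+\s+)?([a-z_][a-z0-9_]*)\(")


def syscalls_in_trace(lines):
    """Return the set of syscall names seen in an iterable of strace lines."""
    names = set()
    for line in lines:
        match = _SYSCALL_RE.match(line)
        if match:
            names.add(match.group(1))
    return names


def missing_from_allowlist(observed, allowed):
    """Return, sorted, the observed syscalls that the allowlist lacks."""
    return sorted(set(observed) - set(allowed))
```

Signal lines, exit lines, and `<... resumed>` continuations do not match the
pattern and are skipped. A trace only covers the code paths that actually ran,
so the result is a lower bound on the syscalls a policy must consider.
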
## landlock(7)

We currently don't use Landlock, but we'd like to:
[345514921](https://issues.chromium.org/issues/345514921).