Monthly Archives: April 2014

Using simple seccomp filters

http://outflux.net/teach-seccomp/

Introduction

The Linux kernel (starting in version 3.5) supports “seccomp filter” (or “mode 2 seccomp”). Ubuntu 12.04 LTS had it backported to its 3.2 kernel, and Chrome OS has been using it (in various forms) for a while. This document is designed as a quick-start guide for software authors that want to take advantage of this security feature. In the simplest terms, it allows a program to declare ahead of time which system calls it expects to use, so that if an attacker gains arbitrary code execution, they cannot poke at any unexpected system calls.
The full seccomp filter documentation can be found in the Linux kernel source. The seccomp filter system uses the Berkeley Packet Filter system. Combined with argument checking and the many possible filter return values (kill, trap, trace, errno), this allows for extensive logic. This document seeks to show only the minimal case of defining a syscall whitelist. Everything not added to this filter causes the program to be killed.
To determine which seccomp features are available at runtime, please see the seccomp autodetection examples.
Since it is not always obvious which syscalls are being called by the various libraries a program might use, this document also includes example code that provides a helper to assist in discovering unwhitelisted syscalls during filter development.

Example Program

First, we start with an example program that reads stdin, writes to stdout, sleeps, and exits. We want to make sure it never calls “fork”, so we’ve added that to the end so we can verify that seccomp filter is working, once it gets added.

/*
 * seccomp example with syscall reporting
 *
 * Copyright (c) 2012 The Chromium OS Authors <chromium-os-dev@chromium.org>
 * Authors:
 *  Kees Cook <keescook@chromium.org>
 *  Will Drewry <wad@chromium.org>
 *
 * Use of this source code is governed by a BSD-style license that can be
 * found in the LICENSE file.
 */
#define _GNU_SOURCE 1
#include <stdio.h>
#include <stddef.h>
#include <stdlib.h>
#include <unistd.h>

#include "config.h"

int main(int argc, char *argv[])
{
	char buf[1024];

	printf("Type stuff here: ");
	fflush(NULL);
	buf[0] = '\0';
	fgets(buf, sizeof(buf), stdin);
	printf("You typed: %s", buf);

	printf("And now we fork, which should do quite the opposite ...\n");
	fflush(NULL);
	sleep(1);

	fork();
	printf("You should not see this because I'm dead.\n");

	return 0;
}
When we build and run this now, we get:

$ autoconf
$ ./configure
checking for gcc... gcc
checking whether the C compiler works... yes
checking for C compiler default output file name... a.out
checking for suffix of executables... 
checking whether we are cross compiling... no
checking for suffix of object files... o
checking whether we are using the GNU C compiler... yes
checking whether gcc accepts -g... yes
checking for gcc option to accept ISO C89... none needed
configure: creating ./config.status
config.status: creating config.h
$ make
gcc -Wall   -c -o example.o example.c
gcc   example.o   -o example
$ ./example
Type stuff here: asdf
You typed: asdf
And now we fork, which should do quite the opposite ...
You should not see this because I'm dead.
You should not see this because I'm dead.

Everything is working, even the “fork” we want to eliminate.

Adding basic seccomp filtering

Next, we include the fancy “seccomp-bpf.h” header. This step also updates the example “configure.ac” to check for the new “linux/seccomp.h” include, since “seccomp-bpf.h” would like to use it. Then we build our initial list of basic system calls we expect (signal handling, read, write, exit). The flow of a simple seccomp BPF program starts with verifying the architecture (since syscall numbers are tied to architecture), then loads the syscall number and compares it against the whitelist. If no match is found, it kills the process:

--- step-1/example.c	2012-03-22 21:43:10.845732543 -0700
+++ step-2/example.c	2012-03-22 21:50:56.373304922 -0700
@@ -16,11 +16,54 @@
 #include <unistd.h>
 
 #include "config.h"
+#include "seccomp-bpf.h"
+
+static int install_syscall_filter(void)
+{
+	struct sock_filter filter[] = {
+		/* Validate architecture. */
+		VALIDATE_ARCHITECTURE,
+		/* Grab the system call number. */
+		EXAMINE_SYSCALL,
+		/* List allowed syscalls. */
+		ALLOW_SYSCALL(rt_sigreturn),
+#ifdef __NR_sigreturn
+		ALLOW_SYSCALL(sigreturn),
+#endif
+		ALLOW_SYSCALL(exit_group),
+		ALLOW_SYSCALL(exit),
+		ALLOW_SYSCALL(read),
+		ALLOW_SYSCALL(write),
+		KILL_PROCESS,
+	};
+	struct sock_fprog prog = {
+		.len = (unsigned short)(sizeof(filter)/sizeof(filter[0])),
+		.filter = filter,
+	};
+
+	if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0)) {
+		perror("prctl(NO_NEW_PRIVS)");
+		goto failed;
+	}
+	if (prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog)) {
+		perror("prctl(SECCOMP)");
+		goto failed;
+	}
+	return 0;
+
+failed:
+	if (errno == EINVAL)
+		fprintf(stderr, "SECCOMP_FILTER is not available. :(\n");
+	return 1;
+}
 
 int main(int argc, char *argv[])
 {
 	char buf[1024];
 
+	if (install_syscall_filter())
+		return 1;
+
 	printf("Type stuff here: ");
 	fflush(NULL);
 	buf[0] = '\0';
--- step-1/configure.ac	2012-03-22 21:40:51.651435417 -0700
+++ step-2/configure.ac	2012-03-22 21:44:19.438868163 -0700
@@ -2,4 +2,5 @@
 AC_PREREQ([2.59])
 AC_CONFIG_HEADERS([config.h])
 AC_PROG_CC
+AC_CHECK_HEADERS([linux/seccomp.h])
 AC_OUTPUT
While this gets us to a nice starting place, it’s not obvious what’s still needed when we run the program, since it just blows up instead:

$ ./configure
...
checking for linux/seccomp.h... yes
configure: creating ./config.status
config.status: creating config.h
$ make
gcc -Wall   -c -o example.o example.c
gcc   example.o   -o example
$ ./example
Bad system call
$ echo $?
159

Adding syscall reporting

Now we can utilize one of the extra features of seccomp filter, and temporarily catch the failed syscall and report it, instead of immediately exiting. The intention is to remove this at the end, since once we’ve finished our syscall list, we won’t need to change it (unless the program or its libraries change, in which case, we can do this again).
Here, we add the “syscall-reporter.mk” Makefile include and the “syscall-reporter.c” object to the Makefile, and then add “syscall-reporter.h” and a call to “install_syscall_reporter” to the program.

--- step-2/example.c	2012-03-22 21:50:56.373304922 -0700
+++ step-3/example.c	2012-03-22 21:51:04.377433872 -0700
@@ -17,6 +17,7 @@
 
 #include "config.h"
 #include "seccomp-bpf.h"
+#include "syscall-reporter.h"
 
 static int install_syscall_filter(void)
 {
@@ -34,6 +35,7 @@
 		ALLOW_SYSCALL(exit),
 		ALLOW_SYSCALL(read),
 		ALLOW_SYSCALL(write),
+		/* Add more syscalls here. */
 		KILL_PROCESS,
 	};
 	struct sock_fprog prog = {
@@ -61,6 +63,8 @@
 {
 	char buf[1024];
 
+	if (install_syscall_reporter())
+		return 1;
 	if (install_syscall_filter())
 		return 1;
 
--- step-2/Makefile	2012-03-22 19:41:02.510347542 -0700
+++ step-3/Makefile	2012-03-22 19:41:33.706847395 -0700
@@ -3,7 +3,9 @@
 
 all: example
 
-example: example.o
+include syscall-reporter.mk
+
+example: example.o syscall-reporter.o
 
 .PHONY: clean
 clean:
Now, when we run it, we can see the missing syscalls, and progressively add them until we’re up to the fork (which is implemented via the “clone” syscall):

$ make
gcc -Wall   -c -o example.o example.c
In file included from example.c:20:0:
syscall-reporter.h:21:2: warning: #warning "You've included the syscall reporter. Do not use in production!" [-Wcpp]
echo "static const char *syscall_names[] = {" > syscall-names.h ;\
        echo "#include <syscall.h>" | cpp -dM | grep '^#define __NR_' | \
                LC_ALL=C sed -r -n -e 's/^\#define[ \t]+__NR_([a-z0-9_]+)[ \t]+([0-9]+)(.*)/ [\2] = "\1",/p' >> syscall-names.h;\
        echo "};" >> syscall-names.h
gcc -Wall   -c -o syscall-reporter.o syscall-reporter.c
In file included from syscall-reporter.c:12:0:
syscall-reporter.h:21:2: warning: #warning "You've included the syscall reporter. Do not use in production!" [-Wcpp]
gcc   example.o syscall-reporter.o   -o example
$ ./example
Looks like you need syscall fstat(5) too!
$ vi example.c
...
$ make
gcc -Wall   -c -o example.o example.c
gcc   example.o syscall-reporter.o   -o example
$ ./example
Looks like you need syscall mmap(9) too!
$ vi example.c
...
$ make
gcc -Wall   -c -o example.o example.c
gcc   example.o syscall-reporter.o   -o example
$ ./example
Type stuff here: asdf
You typed: asdf
And now we fork, which should do quite the opposite ...
Looks like you need syscall rt_sigprocmask(14) too!
$ ...

Testing is done

This continues until we hit the report of the “clone” use, and we know we’re done:

--- step-3/example.c	2012-03-22 21:51:04.377433872 -0700
+++ step-4/example.c	2012-03-22 21:51:13.577583466 -0700
@@ -36,6 +36,11 @@
 		ALLOW_SYSCALL(read),
 		ALLOW_SYSCALL(write),
 		/* Add more syscalls here. */
+		ALLOW_SYSCALL(fstat),
+		ALLOW_SYSCALL(mmap),
+		ALLOW_SYSCALL(rt_sigprocmask),
+		ALLOW_SYSCALL(rt_sigaction),
+		ALLOW_SYSCALL(nanosleep),
 		KILL_PROCESS,
 	};
 	struct sock_fprog prog = {
$ ./example
Type stuff here: asdf
You typed: asdf
And now we fork, which should do quite the opposite ...
Looks like you need syscall clone(56) too!

Ready for prime-time

Now that we’re done, we can remove the syscall reporter again, and see that the program correctly dies when it hits the fork. (To be really done, the fork should be removed too!)

--- step-4/example.c	2012-03-22 21:51:13.577583466 -0700
+++ step-5/example.c	2012-03-22 21:51:21.785717260 -0700
@@ -17,7 +17,6 @@
 
 #include "config.h"
 #include "seccomp-bpf.h"
-#include "syscall-reporter.h"
 
 static int install_syscall_filter(void)
 {
@@ -35,7 +34,6 @@
 		ALLOW_SYSCALL(exit),
 		ALLOW_SYSCALL(read),
 		ALLOW_SYSCALL(write),
-		/* Add more syscalls here. */
 		ALLOW_SYSCALL(fstat),
 		ALLOW_SYSCALL(mmap),
 		ALLOW_SYSCALL(rt_sigprocmask),
@@ -68,8 +66,6 @@
 {
 	char buf[1024];
 
-	if (install_syscall_reporter())
-		return 1;
 	if (install_syscall_filter())
 		return 1;
 
--- step-4/Makefile	2012-03-22 19:55:27.056164102 -0700
+++ step-5/Makefile	2012-03-22 19:55:33.680270186 -0700
@@ -3,9 +3,7 @@
 
 all: example
 
-include syscall-reporter.mk
-
-example: example.o syscall-reporter.o
+example: example.o
 
 .PHONY: clean
 clean:
$ ./example
Type stuff here: asdf
You typed: asdf
And now we fork, which should do quite the opposite ...
Bad system call
$ echo $?
159

Conclusion

Ta-da! That’s it — you’ve now got a seccomp filter built into your program. To make this even more portable, you can ignore the “prctl” failures if seccomp is not available, or warn the user but not die, or put the entire thing behind a “#ifdef HAVE_LINUX_SECCOMP_H” test.
For more complex, or dynamic, BPF constructions, you’ll probably want to take a look at libseccomp.
For a stand-alone filtering tool, check out minijail.
Thanks for reading! —Kees Cook, Mar-Nov 2012.
For reference, this is all under a BSD license.

Unix signal

http://en.wikipedia.org/wiki/Unix_signal

kill -STOP pid

kill -CONT pid

Signals are a limited form of inter-process communication used in Unix, Unix-like, and other POSIX-compliant operating systems. A signal is an asynchronous notification sent to a process or to a specific thread within the same process in order to notify it of an event that occurred. Signals have been around since the 1970s Bell Labs Unix and have been more recently specified in the POSIX standard.

When a signal is sent, the operating system interrupts the target process’s normal flow of execution to deliver the signal. Execution can be interrupted during any non-atomic instruction. If the process has previously registered a signal handler, that routine is executed. Otherwise, the default signal handler is executed.

Embedded programs may find signals useful for interprocess communications, as the computational and memory footprint for signals is small.

Sending signals

The kill(2) system call will send a specified signal to a specified process, if permissions allow. Similarly, the kill(1) command allows a user to send signals to processes. The raise(3) library function sends the specified signal to the current process.

Exceptions such as division by zero or a segmentation violation will generate signals (here, SIGFPE and SIGSEGV respectively, which both by default cause a core dump and a program exit).

The kernel can generate signals to notify processes of events. For example, SIGPIPE will be generated when a process writes to a pipe which has been closed by the reader; by default, this causes the process to terminate, which is convenient when constructing shell pipelines.

Typing certain key combinations at the controlling terminal of a running process causes the system to send it certain signals:

  • Ctrl-C (in older Unixes, DEL) sends an INT signal (SIGINT); by default, this causes the process to terminate.
  • Ctrl-Z sends a TSTP signal (SIGTSTP); by default, this causes the process to suspend execution.
  • Ctrl-\ sends a QUIT signal (SIGQUIT); by default, this causes the process to terminate and dump core.
  • Ctrl-T (not supported on all UNIXes) sends an INFO signal (SIGINFO); by default, and if supported by the command, this causes the operating system to show information about the running command.

On modern operating systems, these default key combinations can be changed with the stty command.

Handling signals

Signal handlers can be installed with the signal() system call. If a signal handler is not installed for a particular signal, the default handler is used. Otherwise the signal is intercepted and the signal handler is invoked. The process can also specify two default behaviors, without creating a handler: ignore the signal (SIG_IGN) and use the default signal handler (SIG_DFL). There are two signals which cannot be intercepted and handled: SIGKILL and SIGSTOP.

Risks

Signal handling is vulnerable to race conditions. Because signals are asynchronous, another signal (even of the same type) can be delivered to the process during execution of the signal handling routine. The sigprocmask() call can be used to block and unblock delivery of signals.

Signals can cause the interruption of a system call in progress, leaving it to the application to manage a non-transparent restart.

Signal handlers should be written in a way that doesn’t result in any unwanted side-effects, e.g. errno alteration, signal mask alteration, signal disposition change, and other global process attribute changes. Use of non-reentrant functions, e.g., malloc or printf, inside signal handlers is also unsafe.

Signal handlers can instead put the signal into a queue and immediately return. The main thread will then continue “uninterrupted” until signals are taken from the queue, such as in an event loop. “Uninterrupted” here means that operations that block may return prematurely and must be resumed, as mentioned above. Signals should be processed from the queue on the main thread and not by worker pools, as that reintroduces the problem of asynchronicity.

Relationship with hardware exceptions

A process’s execution may result in the generation of a hardware exception, for instance, if the process attempts to divide by zero or incurs a TLB miss.

In Unix-like operating systems, this event automatically changes the processor context to start executing a kernel exception handler. In case of some exceptions, such as a page fault, the kernel has sufficient information to fully handle the event itself and resume the process’s execution.

Other exceptions, however, the kernel cannot process intelligently and it must instead defer the exception handling operation to the faulting process. This deferral is achieved via the signal mechanism, wherein the kernel sends to the process a signal corresponding to the current exception. For example, if a process attempted integer divide by zero on an x86 CPU, a divide error exception would be generated and cause the kernel to send the SIGFPE signal to the process.

Similarly, if the process attempted to access a memory address outside of its virtual address space, the kernel would notify the process of this violation via a SIGSEGV signal. The exact mapping between signal names and exceptions is obviously dependent upon the CPU, since exception types differ between architectures.

POSIX signals

The list below documents the signals that are specified by the Single Unix Specification.[1] All signals are defined as macro constants in the <signal.h> header file. The name of the macro constant consists of a “SIG” prefix and several characters that identify the signal. Each macro constant expands into an integer; these numbers can vary across platforms.

SIGABRT
The SIGABRT signal is sent to a process to tell it to abort, i.e. to terminate. The signal is usually initiated by the process itself when it calls the abort function of the C Standard Library, but it can be sent to the process from outside like any other signal.
SIGALRM, SIGVTALRM and SIGPROF
The SIGALRM, SIGVTALRM and SIGPROF signals are sent to a process when the time limit specified in a call to a preceding alarm setting function (such as setitimer) elapses. SIGALRM is sent when real or clock time elapses. SIGVTALRM is sent when CPU time used by the process elapses. SIGPROF is sent when CPU time used by the process and by the system on behalf of the process elapses.
SIGBUS
The SIGBUS signal is sent to a process when it causes a bus error. The conditions that lead to the signal being raised are, for example, incorrect memory access alignment or non-existent physical address.
SIGCHLD
The SIGCHLD signal is sent to a process when a child process terminates, is interrupted, or resumes after being interrupted. One common usage of the signal is to instruct the operating system to clean up the resources used by a child process after its termination without an explicit call to the wait system call.
SIGCONT
The SIGCONT signal instructs the operating system to continue (restart) a process previously paused by the SIGSTOP or SIGTSTP signal. One important use of this signal is in job control in the Unix shell.
SIGFPE
The SIGFPE signal is sent to a process when it executes an erroneous arithmetic operation, such as division by zero (the name “FPE”, standing for floating-point exception, is a misnomer as the signal covers integer-arithmetic errors as well).[2]
SIGHUP
The SIGHUP signal is sent to a process when its controlling terminal is closed. It was originally designed to notify the process of a serial line drop (a hangup). In modern systems, this signal usually means that the controlling pseudo or virtual terminal has been closed.[3] Many daemons will reload their configuration files and reopen their logfiles instead of exiting when receiving this signal.[4] nohup is a command to make a command ignore the signal.
SIGILL
The SIGILL signal is sent to a process when it attempts to execute an illegal, malformed, unknown, or privileged instruction.
SIGINT
The SIGINT signal is sent to a process by its controlling terminal when a user wishes to interrupt the process. This is typically initiated by pressing Control-C, but on some systems, the “delete” character or “break” key can be used.[5]
SIGKILL
The SIGKILL signal is sent to a process to cause it to terminate immediately (kill). In contrast to SIGTERM and SIGINT, this signal cannot be caught or ignored, and the receiving process cannot perform any clean-up upon receiving this signal.
SIGPIPE
The SIGPIPE signal is sent to a process when it attempts to write to a pipe without a process connected to the other end.
SIGQUIT
The SIGQUIT signal is sent to a process by its controlling terminal when the user requests that the process quit and perform a core dump.
SIGSEGV
The SIGSEGV signal is sent to a process when it makes an invalid virtual memory reference, or segmentation fault, i.e. when it performs a segmentation violation.[6]
SIGSTOP
The SIGSTOP signal instructs the operating system to stop a process for later resumption.
SIGTERM
The SIGTERM signal is sent to a process to request its termination. Unlike the SIGKILL signal, it can be caught and interpreted or ignored by the process. This allows the process to terminate gracefully, releasing resources and saving state if appropriate. It should be noted that SIGINT is nearly identical to SIGTERM.
SIGTSTP
The SIGTSTP signal is sent to a process by its controlling terminal to request it to stop temporarily. It is commonly initiated by the user pressing Control-Z. Unlike SIGSTOP, the process can register a signal handler for or ignore the signal.
SIGTTIN and SIGTTOU
The SIGTTIN and SIGTTOU signals are sent to a process when it attempts to read in or write out respectively from the tty while in the background. Typically, these signals can be received only by processes under job control; daemons do not have controlling terminals and should never receive them.
SIGUSR1 and SIGUSR2
The SIGUSR1 and SIGUSR2 signals are sent to a process to indicate user-defined conditions.
SIGPOLL
The SIGPOLL signal is sent to a process when an asynchronous I/O event occurs (meaning it has been polled).
SIGSYS
The SIGSYS signal is sent to a process when it passes a bad argument to a system call.
SIGTRAP
The SIGTRAP signal is sent to a process when an exception (or trap) occurs: a condition that a debugger has requested to be informed of — for example, when a particular function is executed, or when a particular variable changes value.
SIGURG
The SIGURG signal is sent to a process when a socket has urgent or out-of-band data available to read.
SIGXCPU
The SIGXCPU signal is sent to a process when it has used up the CPU for a duration that exceeds a certain predetermined user-settable value.[7] The arrival of a SIGXCPU signal provides the receiving process a chance to quickly save any intermediate results and to exit gracefully, before it is terminated by the operating system using the SIGKILL signal.
SIGXFSZ
The SIGXFSZ signal is sent to a process when it grows a file larger than the maximum allowed size.
SIGRTMIN to SIGRTMAX
The SIGRTMIN to SIGRTMAX signals are intended to be used for user-defined purposes. They are real-time signals.
Signal     Code  Default Action  Description
SIGABRT    6     A               Process abort signal.
SIGALRM    14    T               Alarm clock.
SIGBUS     10    A               Access to an undefined portion of a memory object.
SIGCHLD    18    I               Child process terminated, stopped, or continued.
SIGCONT    25    C               Continue executing, if stopped.
SIGFPE     8     A               Erroneous arithmetic operation.
SIGHUP     1     T               Hangup.
SIGILL     4     A               Illegal instruction.
SIGINT     2     T               Terminal interrupt signal.
SIGKILL    9     T               Kill (cannot be caught or ignored).
SIGPIPE    13    T               Write on a pipe with no one to read it.
SIGQUIT    3     A               Terminal quit signal.
SIGSEGV    11    A               Invalid memory reference.
SIGSTOP    23    S               Stop executing (cannot be caught or ignored).
SIGTERM    15    T               Termination signal.
SIGTSTP    24    S               Terminal stop signal.
SIGTTIN    26    S               Background process attempting read.
SIGTTOU    27    S               Background process attempting write.
SIGUSR1    16    T               User-defined signal 1.
SIGUSR2    17    T               User-defined signal 2.
SIGPOLL    22    T               Pollable event.
SIGPROF    29    T               Profiling timer expired.
SIGSYS     12    A               Bad system call.
SIGTRAP    5     A               Trace/breakpoint trap.
SIGURG     21    I               High bandwidth data is available at a socket.
SIGVTALRM  28    T               Virtual timer expired.
SIGXCPU    30    A               CPU time limit exceeded.
SIGXFSZ    31    A               File size limit exceeded.
Default Actions:
T – Abnormal termination of the process. The process is terminated with all the consequences of _exit() except that the status made available to wait() and waitpid() indicates abnormal termination by the specified signal.
A – Abnormal termination of the process. Additionally, implementation-defined abnormal termination actions, such as creation of a core file, may occur.
I – Ignore the signal.
S – Stop the process.
C – Continue the process, if it is stopped; otherwise, ignore the signal.

Miscellaneous signals

The following signals are not specified in the POSIX specification. They are, however, sometimes used on various systems.

SIGEMT
The SIGEMT signal is sent to a process when an emulator trap occurs.
SIGINFO
The SIGINFO signal is sent to a process when a status (info) request is received from the controlling terminal.
SIGPWR
The SIGPWR signal is sent to a process when the system experiences a power failure.
SIGLOST
The SIGLOST signal is sent to a process when a file lock is lost.
SIGWINCH
The SIGWINCH signal is sent to a process when its controlling terminal changes its size (a window change).