Using simple seccomp filters

http://outflux.net/teach-seccomp/

Introduction

The Linux kernel (starting in version 3.5) supports “seccomp filter” (or “mode 2 seccomp”). Ubuntu 12.04 LTS had it backported to its 3.2 kernel, and Chrome OS has been using it (in various forms) for a while. This document is designed as a quick-start guide for software authors that want to take advantage of this security feature. In the simplest terms, it allows a program to declare ahead of time which system calls it expects to use, so that if an attacker gains arbitrary code execution, they cannot poke at any unexpected system calls.
The full seccomp filter documentation can be found in the Linux kernel source. The seccomp filter system uses the Berkeley Packet Filter (BPF) system. Combined with argument checking and the many possible filter return values (kill, trap, trace, errno), this allows for extensive logic. This document seeks to show only the minimal case of defining a syscall whitelist. Any syscall not added to this filter causes the program to be killed.
To determine which seccomp features are available at runtime, please see the seccomp autodetection examples.
Since it is not always obvious which syscalls are being called by the various libraries a program might use, this document also includes example code that provides a helper to assist in discovering unwhitelisted syscalls during filter development.

Example Program

First, we start with an example program that reads stdin, writes to stdout, sleeps, and exits. We want to make sure it never calls “fork”, so we’ve added that to the end so we can verify that seccomp filter is working, once it gets added.

/*
 * seccomp example with syscall reporting
 *
 * Copyright (c) 2012 The Chromium OS Authors <chromium-os-dev@chromium.org>
 * Authors:
 *  Kees Cook <keescook@chromium.org>
 *  Will Drewry <wad@chromium.org>
 *
 * Use of this source code is governed by a BSD-style license that can be
 * found in the LICENSE file.
 */
#define _GNU_SOURCE 1
#include <stdio.h>
#include <stddef.h>
#include <stdlib.h>
#include <unistd.h>

#include "config.h"

int main(int argc, char *argv[])
{
	char buf[1024];

	printf("Type stuff here: ");
	fflush(NULL);
	buf[0] = '\0';
	fgets(buf, sizeof(buf), stdin);
	printf("You typed: %s", buf);

	printf("And now we fork, which should do quite the opposite ...\n");
	fflush(NULL);
	sleep(1);

	fork();
	printf("You should not see this because I'm dead.\n");

	return 0;
}
When we build and run this now, we get:

$ autoconf
$ ./configure
checking for gcc... gcc
checking whether the C compiler works... yes
checking for C compiler default output file name... a.out
checking for suffix of executables... 
checking whether we are cross compiling... no
checking for suffix of object files... o
checking whether we are using the GNU C compiler... yes
checking whether gcc accepts -g... yes
checking for gcc option to accept ISO C89... none needed
configure: creating ./config.status
config.status: creating config.h
$ make
gcc -Wall   -c -o example.o example.c
gcc   example.o   -o example
$ ./example
Type stuff here: asdf
You typed: asdf
And now we fork, which should do quite the opposite ...
You should not see this because I'm dead.
You should not see this because I'm dead.

Everything is working, even the “fork” we want to eliminate.

Adding basic seccomp filtering

Next, we include the fancy “seccomp-bpf.h” header. This step also updates the example “configure.ac” to check for the new “linux/seccomp.h” include, since “seccomp-bpf.h” would like to use it. Then we build our initial list of basic system calls we expect (signal handling, read, write, exit). A simple seccomp BPF program starts by verifying the architecture (since syscall numbers are tied to architecture), then loads the syscall number and compares it against the whitelist. If no match is found, it kills the process:

--- step-1/example.c	2012-03-22 21:43:10.845732543 -0700
+++ step-2/example.c	2012-03-22 21:50:56.373304922 -0700
@@ -16,11 +16,54 @@
 #include <unistd.h>
 
 #include "config.h"
+#include "seccomp-bpf.h"
+
+static int install_syscall_filter(void)
+{
+	struct sock_filter filter[] = {
+		/* Validate architecture. */
+		VALIDATE_ARCHITECTURE,
+		/* Grab the system call number. */
+		EXAMINE_SYSCALL,
+		/* List allowed syscalls. */
+		ALLOW_SYSCALL(rt_sigreturn),
+#ifdef __NR_sigreturn
+		ALLOW_SYSCALL(sigreturn),
+#endif
+		ALLOW_SYSCALL(exit_group),
+		ALLOW_SYSCALL(exit),
+		ALLOW_SYSCALL(read),
+		ALLOW_SYSCALL(write),
+		KILL_PROCESS,
+	};
+	struct sock_fprog prog = {
+		.len = (unsigned short)(sizeof(filter)/sizeof(filter[0])),
+		.filter = filter,
+	};
+
+	if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0)) {
+		perror("prctl(NO_NEW_PRIVS)");
+		goto failed;
+	}
+	if (prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog)) {
+		perror("prctl(SECCOMP)");
+		goto failed;
+	}
+	return 0;
+
+failed:
+	if (errno == EINVAL)
+		fprintf(stderr, "SECCOMP_FILTER is not available. :(\n");
+	return 1;
+}
 
 int main(int argc, char *argv[])
 {
 	char buf[1024];
 
+	if (install_syscall_filter())
+		return 1;
+
 	printf("Type stuff here: ");
 	fflush(NULL);
 	buf[0] = '\0';
--- step-1/configure.ac	2012-03-22 21:40:51.651435417 -0700
+++ step-2/configure.ac	2012-03-22 21:44:19.438868163 -0700
@@ -2,4 +2,5 @@
 AC_PREREQ([2.59])
 AC_CONFIG_HEADERS([config.h])
 AC_PROG_CC
+AC_CHECK_HEADERS([linux/seccomp.h])
 AC_OUTPUT
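
The macros used in the filter above come from “seccomp-bpf.h”, which is not reproduced here. As a rough sketch of what such macros might expand to (the SKETCH_ names are made up for illustration, and the real header may differ), each one is a short run of classic BPF instructions operating on the kernel's “struct seccomp_data”:

```c
#include <stddef.h>
#include <sys/syscall.h>
#include <linux/audit.h>
#include <linux/filter.h>
#include <linux/seccomp.h>

/* Assumption: only these three architectures are handled in this sketch. */
#if defined(__x86_64__)
# define SKETCH_ARCH AUDIT_ARCH_X86_64
#elif defined(__i386__)
# define SKETCH_ARCH AUDIT_ARCH_I386
#elif defined(__aarch64__)
# define SKETCH_ARCH AUDIT_ARCH_AARCH64
#else
# error "add the AUDIT_ARCH_* value for this architecture"
#endif

/* Load seccomp_data.arch; if it matches, skip the kill instruction. */
#define SKETCH_VALIDATE_ARCHITECTURE \
	BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, arch)), \
	BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, SKETCH_ARCH, 1, 0), \
	BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL)

/* Load the syscall number into the BPF accumulator. */
#define SKETCH_EXAMINE_SYSCALL \
	BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr))

/* On a match fall through to "allow"; otherwise skip one instruction. */
#define SKETCH_ALLOW_SYSCALL(name) \
	BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_##name, 0, 1), \
	BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW)

/* Reached only when nothing in the list matched. */
#define SKETCH_KILL_PROCESS \
	BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL)

/* A two-entry whitelist built from the sketch macros (not installed). */
static struct sock_filter sketch_filter[] = {
	SKETCH_VALIDATE_ARCHITECTURE,
	SKETCH_EXAMINE_SYSCALL,
	SKETCH_ALLOW_SYSCALL(read),
	SKETCH_ALLOW_SYSCALL(write),
	SKETCH_KILL_PROCESS,
};
static const size_t sketch_filter_len =
	sizeof(sketch_filter) / sizeof(sketch_filter[0]);
```

Each allowed syscall costs two instructions, so the filter grows linearly with the whitelist. For a list this short that is fine.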
While this gets us to a nice starting place, it’s not obvious what’s still needed when we run the program, since it just blows up instead:

$ ./configure
...
checking for linux/seccomp.h... yes
configure: creating ./config.status
config.status: creating config.h
$ make
gcc -Wall   -c -o example.o example.c
gcc   example.o   -o example
$ ./example
Bad system call
$ echo $?
159
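
The exit status 159 is the shell's way of saying “killed by signal 31”: it reports 128 plus the signal number, and 31 is SIGSYS on x86. A supervising process can make the same distinction from the wait status. The helper names below are hypothetical and only illustrate the decoding:

```c
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Mimic the shell convention: 128 + signal number for a signal death,
 * otherwise the plain exit code. Returns -1 for any other status. */
int shell_exit_code(int status)
{
	if (WIFSIGNALED(status))
		return 128 + WTERMSIG(status);
	if (WIFEXITED(status))
		return WEXITSTATUS(status);
	return -1;
}

/* Fork a child that sends itself `sig`, and return its raw wait status
 * (or -1 if fork or waitpid failed). */
int status_of_child_killed_by(int sig)
{
	int status;
	pid_t pid = fork();

	if (pid < 0)
		return -1;
	if (pid == 0) {
		/* Restore the default action in case it was changed. */
		signal(sig, SIG_DFL);
		raise(sig);
		_exit(0);	/* only reached if the signal did not kill us */
	}
	if (waitpid(pid, &status, 0) != pid)
		return -1;
	return status;
}
```

Calling shell_exit_code(status_of_child_killed_by(SIGSYS)) gives 128 + SIGSYS, which matches the 159 above on x86.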

Adding syscall reporting

Now we can utilize one of the extra features of seccomp filter, and temporarily catch the failed syscall and report it, instead of immediately exiting. The intention is to remove this at the end, since once we’ve finished our syscall list, we won’t need to change it (unless the program or its libraries change, in which case, we can do this again).
Here, we add the “syscall-reporter.mk” Makefile include and the “syscall-reporter.c” object to the Makefile, and then add “syscall-reporter.h” and a call to “install_syscall_reporter” to the program.
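
The reporter's source is not shown in this document. A minimal sketch of the idea, assuming the filter's final action is switched from kill to trap (SECCOMP_RET_TRAP) so that the kernel delivers SIGSYS instead of killing the process, could look like the following. All sketch_ names are hypothetical, and the real “syscall-reporter.c” also maps the number to a syscall name:

```c
#define _GNU_SOURCE 1
#include <signal.h>
#include <string.h>
#include <unistd.h>

#ifndef SYS_SECCOMP
# define SYS_SECCOMP 1	/* si_code of a SIGSYS raised by seccomp */
#endif

/* Write n in decimal into out (at least 10 bytes); return the length.
 * Hand-rolled because printf() is not async-signal-safe. */
static size_t sketch_utoa(unsigned int n, char *out)
{
	char tmp[10];
	size_t len = 0, i;

	do {
		tmp[len++] = (char)('0' + n % 10);
		n /= 10;
	} while (n > 0);
	for (i = 0; i < len; i++)
		out[i] = tmp[len - 1 - i];
	return len;
}

static void sketch_reporter(int sig, siginfo_t *info, void *ctx)
{
	static const char prefix[] = "unlisted syscall number: ";
	char num[11];
	size_t len;

	(void)sig;
	(void)ctx;
	/* Ignore SIGSYS that did not come from a seccomp filter. */
	if (info->si_code != SYS_SECCOMP)
		return;
	len = sketch_utoa((unsigned int)info->si_syscall, num);
	num[len++] = '\n';
	/* write() and _exit() must themselves be allowed by the filter. */
	if (write(STDERR_FILENO, prefix, sizeof(prefix) - 1) < 0 ||
	    write(STDERR_FILENO, num, len) < 0)
		_exit(2);
	_exit(1);
}

/* Install the SIGSYS handler; returns 0 on success, -1 on failure. */
int install_reporter_sketch(void)
{
	struct sigaction act;
	sigset_t mask;

	memset(&act, 0, sizeof(act));
	act.sa_sigaction = sketch_reporter;
	act.sa_flags = SA_SIGINFO;
	if (sigaction(SIGSYS, &act, NULL) < 0)
		return -1;

	sigemptyset(&mask);
	sigaddset(&mask, SIGSYS);
	return sigprocmask(SIG_UNBLOCK, &mask, NULL);
}
```

The handler has to be installed before the filter, since installing it afterwards would itself need rt_sigaction and rt_sigprocmask on the whitelist.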

--- step-2/example.c	2012-03-22 21:50:56.373304922 -0700
+++ step-3/example.c	2012-03-22 21:51:04.377433872 -0700
@@ -17,6 +17,7 @@
 
 #include "config.h"
 #include "seccomp-bpf.h"
+#include "syscall-reporter.h"
 
 static int install_syscall_filter(void)
 {
@@ -34,6 +35,7 @@
 		ALLOW_SYSCALL(exit),
 		ALLOW_SYSCALL(read),
 		ALLOW_SYSCALL(write),
+		/* Add more syscalls here. */
 		KILL_PROCESS,
 	};
 	struct sock_fprog prog = {
@@ -61,6 +63,8 @@
 {
 	char buf[1024];
 
+	if (install_syscall_reporter())
+		return 1;
 	if (install_syscall_filter())
 		return 1;
 
--- step-2/Makefile	2012-03-22 19:41:02.510347542 -0700
+++ step-3/Makefile	2012-03-22 19:41:33.706847395 -0700
@@ -3,7 +3,9 @@
 
 all: example
 
-example: example.o
+include syscall-reporter.mk
+
+example: example.o syscall-reporter.o
 
 .PHONY: clean
 clean:
Now, when we run it, we can see the missing syscalls, and progressively add them until we’re up to the fork (which is implemented via the “clone” syscall):

$ make
gcc -Wall   -c -o example.o example.c
In file included from example.c:20:0:
syscall-reporter.h:21:2: warning: #warning "You've included the syscall reporter. Do not use in production!" [-Wcpp]
echo "static const char *syscall_names[] = {" > syscall-names.h ;\
        echo "#include <syscall.h>" | cpp -dM | grep '^#define __NR_' | \
                LC_ALL=C sed -r -n -e 's/^\#define[ \t]+__NR_([a-z0-9_]+)[ \t]+([0-9]+)(.*)/ [\2] = "\1",/p' >> syscall-names.h;\
        echo "};" >> syscall-names.h
gcc -Wall   -c -o syscall-reporter.o syscall-reporter.c
In file included from syscall-reporter.c:12:0:
syscall-reporter.h:21:2: warning: #warning "You've included the syscall reporter. Do not use in production!" [-Wcpp]
gcc   example.o syscall-reporter.o   -o example
$ ./example
Looks like you need syscall fstat(5) too!
$ vi example.c
...
$ make
gcc -Wall   -c -o example.o example.c
gcc   example.o syscall-reporter.o   -o example
$ ./example
Looks like you need syscall mmap(9) too!
$ vi example.c
...
$ make
gcc -Wall   -c -o example.o example.c
gcc   example.o syscall-reporter.o   -o example
$ ./example
Type stuff here: asdf
You typed: asdf
And now we fork, which should do quite the opposite ...
Looks like you need syscall rt_sigprocmask(14) too!
$ ...

Testing is done

This continues until we hit the report of the “clone” use, and we know we’re done:

--- step-3/example.c	2012-03-22 21:51:04.377433872 -0700
+++ step-4/example.c	2012-03-22 21:51:13.577583466 -0700
@@ -36,6 +36,11 @@
 		ALLOW_SYSCALL(read),
 		ALLOW_SYSCALL(write),
 		/* Add more syscalls here. */
+		ALLOW_SYSCALL(fstat),
+		ALLOW_SYSCALL(mmap),
+		ALLOW_SYSCALL(rt_sigprocmask),
+		ALLOW_SYSCALL(rt_sigaction),
+		ALLOW_SYSCALL(nanosleep),
 		KILL_PROCESS,
 	};
 	struct sock_fprog prog = {
$ ./example
Type stuff here: asdf
You typed: asdf
And now we fork, which should do quite the opposite ...
Looks like you need syscall clone(56) too!

Ready for prime-time

Now that we’re done, we can remove the syscall reporter again, and see that the program correctly dies when it hits the fork. (To be really done, the fork should be removed too!)

--- step-4/example.c	2012-03-22 21:51:13.577583466 -0700
+++ step-5/example.c	2012-03-22 21:51:21.785717260 -0700
@@ -17,7 +17,6 @@
 
 #include "config.h"
 #include "seccomp-bpf.h"
-#include "syscall-reporter.h"
 
 static int install_syscall_filter(void)
 {
@@ -35,7 +34,6 @@
 		ALLOW_SYSCALL(exit),
 		ALLOW_SYSCALL(read),
 		ALLOW_SYSCALL(write),
-		/* Add more syscalls here. */
 		ALLOW_SYSCALL(fstat),
 		ALLOW_SYSCALL(mmap),
 		ALLOW_SYSCALL(rt_sigprocmask),
@@ -68,8 +66,6 @@
 {
 	char buf[1024];
 
-	if (install_syscall_reporter())
-		return 1;
 	if (install_syscall_filter())
 		return 1;
 
--- step-4/Makefile	2012-03-22 19:55:27.056164102 -0700
+++ step-5/Makefile	2012-03-22 19:55:33.680270186 -0700
@@ -3,9 +3,7 @@
 
 all: example
 
-include syscall-reporter.mk
-
-example: example.o syscall-reporter.o
+example: example.o
 
 .PHONY: clean
 clean:
$ ./example
Type stuff here: asdf
You typed: asdf
And now we fork, which should do quite the opposite ...
Bad system call
$ echo $?
159

Conclusion

Ta-da! That’s it — you’ve now got a seccomp filter built into your program. To make this even more portable, you can ignore the “prctl” failures if seccomp is not available, or warn the user but not die, or put the entire thing behind a “#ifdef HAVE_LINUX_SECCOMP_H” test.
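
As one possible shape for the “warn the user but not die” option, the sketch below classifies a prctl failure the same way the example's error path does (EINVAL meaning that seccomp filter is not available) and lets the caller decide whether to continue. The sketch_ names are hypothetical and not part of any header:

```c
#include <errno.h>
#include <stdio.h>
#include <sys/prctl.h>
#include <linux/filter.h>
#include <linux/seccomp.h>

enum sketch_result {
	SKETCH_FILTER_ON,		/* filter installed */
	SKETCH_FILTER_UNAVAILABLE,	/* kernel lacks the feature */
	SKETCH_FILTER_ERROR		/* anything else: treat as fatal */
};

/* Map a prctl() return value and the errno it left behind to a result. */
enum sketch_result sketch_classify(int rc, int err)
{
	if (rc == 0)
		return SKETCH_FILTER_ON;
	return err == EINVAL ? SKETCH_FILTER_UNAVAILABLE
			     : SKETCH_FILTER_ERROR;
}

/* Try to install `prog`; warn and carry on if seccomp filter is missing. */
enum sketch_result sketch_install_best_effort(struct sock_fprog *prog)
{
	enum sketch_result res;
	int rc;

	rc = prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0);
	res = sketch_classify(rc, errno);
	if (res == SKETCH_FILTER_ON) {
		rc = prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, prog);
		res = sketch_classify(rc, errno);
	}
	if (res == SKETCH_FILTER_UNAVAILABLE)
		fprintf(stderr, "warning: seccomp filter unavailable, "
				"continuing without it\n");
	return res;
}
```

A caller would then exit only on SKETCH_FILTER_ERROR. Whether running unfiltered is acceptable is a policy decision for the program, not something the sketch can settle.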
For more complex, or dynamic, BPF constructions, you’ll probably want to take a look at libseccomp.
For a stand-alone filtering tool, check out minijail.
Thanks for reading! —Kees Cook, Mar-Nov 2012.
For reference, this is all under a BSD license.

Unix domain socket

http://en.wikipedia.org/wiki/Unix_domain_socket

A Unix domain socket or IPC socket (inter-process communication socket) is a data communications endpoint for exchanging data between processes executing within the same host operating system. While similar in functionality to named pipes, Unix domain sockets may be created as connection‑mode (SOCK_STREAM or SOCK_SEQPACKET) or as connectionless (SOCK_DGRAM), while pipes are streams only. Processes using Unix domain sockets do not need to share a common ancestry. The API for Unix domain sockets is similar to that of an Internet socket, but it does not use an underlying network protocol for communication. The Unix domain socket facility is a standard component of POSIX operating systems.

Unix domain sockets use the file system as their address name space. They are referenced by processes as inodes in the file system. This allows two processes to open the same socket in order to communicate. However, communication occurs entirely within the operating system kernel.
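
To make the file-system addressing concrete, here is a small sketch in which one process binds a stream socket to a path, connects to that path, and passes a byte through the kernel. The function names are hypothetical, and doing both ends in one process is only for brevity; normally the two ends live in different processes:

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Fill in a sockaddr_un for `path`; returns -1 if the path is too long. */
static int make_addr(const char *path, struct sockaddr_un *addr)
{
	if (strlen(path) >= sizeof(addr->sun_path))
		return -1;
	memset(addr, 0, sizeof(*addr));
	addr->sun_family = AF_UNIX;
	strcpy(addr->sun_path, path);
	return 0;
}

/* Listen on `path`, connect to it, and send one byte across.
 * Returns 0 on success, -1 on any failure. */
int unix_socket_demo(const char *path)
{
	struct sockaddr_un addr;
	int srv = -1, cli = -1, conn = -1, rc = -1;
	char c = 0;

	if (make_addr(path, &addr) < 0)
		return -1;
	unlink(path);	/* remove a stale socket file, if any */

	srv = socket(AF_UNIX, SOCK_STREAM, 0);
	cli = socket(AF_UNIX, SOCK_STREAM, 0);
	if (srv < 0 || cli < 0)
		goto out;
	/* bind() creates the socket file that names this endpoint. */
	if (bind(srv, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
	    listen(srv, 1) < 0)
		goto out;
	/* The connection waits in the backlog until accept() runs. */
	if (connect(cli, (struct sockaddr *)&addr, sizeof(addr)) < 0)
		goto out;
	conn = accept(srv, NULL, NULL);
	if (conn < 0)
		goto out;
	if (write(cli, "x", 1) == 1 && read(conn, &c, 1) == 1 && c == 'x')
		rc = 0;
out:
	if (conn >= 0)
		close(conn);
	if (cli >= 0)
		close(cli);
	if (srv >= 0)
		close(srv);
	unlink(path);
	return rc;
}
```

The socket file only serves as a name. The data itself never touches the disk.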

In addition to sending data, processes may send file descriptors across a Unix domain socket connection using the sendmsg() and recvmsg() system calls.
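
Descriptor passing uses an SCM_RIGHTS control message attached to an ordinary sendmsg() call. The sketch below uses hypothetical helper names and sends exactly one descriptor alongside a one-byte payload. The demo function runs both ends in one process over a socketpair(), which is an assumption made to keep the example short:

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Control buffer sized and aligned for a single int descriptor. */
union fd_ctrl {
	struct cmsghdr align;
	char buf[CMSG_SPACE(sizeof(int))];
};

/* Send `fd` over the Unix domain socket `sock`; 0 on success. */
int send_fd(int sock, int fd)
{
	char dummy = 'x';	/* at least one data byte must accompany it */
	struct iovec iov = { .iov_base = &dummy, .iov_len = 1 };
	union fd_ctrl ctrl;
	struct msghdr msg;
	struct cmsghdr *cmsg;

	memset(&ctrl, 0, sizeof(ctrl));
	memset(&msg, 0, sizeof(msg));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = ctrl.buf;
	msg.msg_controllen = sizeof(ctrl.buf);

	cmsg = CMSG_FIRSTHDR(&msg);
	cmsg->cmsg_level = SOL_SOCKET;
	cmsg->cmsg_type = SCM_RIGHTS;
	cmsg->cmsg_len = CMSG_LEN(sizeof(int));
	memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

	return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one descriptor from `sock`; returns it, or -1 on failure. */
int recv_fd(int sock)
{
	char dummy;
	struct iovec iov = { .iov_base = &dummy, .iov_len = 1 };
	union fd_ctrl ctrl;
	struct msghdr msg;
	struct cmsghdr *cmsg;
	int fd;

	memset(&msg, 0, sizeof(msg));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = ctrl.buf;
	msg.msg_controllen = sizeof(ctrl.buf);

	if (recvmsg(sock, &msg, 0) != 1)
		return -1;
	cmsg = CMSG_FIRSTHDR(&msg);
	if (!cmsg || cmsg->cmsg_level != SOL_SOCKET ||
	    cmsg->cmsg_type != SCM_RIGHTS ||
	    cmsg->cmsg_len != CMSG_LEN(sizeof(int)))
		return -1;
	memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
	return fd;
}

/* Pass a pipe's write end across a socketpair and use the copy. */
int fd_passing_demo(void)
{
	int sv[2], p[2], got, rc = -1;
	char c = 0;

	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
		return -1;
	if (pipe(p) < 0) {
		close(sv[0]);
		close(sv[1]);
		return -1;
	}
	if (send_fd(sv[0], p[1]) == 0 && (got = recv_fd(sv[1])) >= 0) {
		/* `got` is a new number referring to the same pipe end. */
		if (got != p[1] && write(got, "k", 1) == 1 &&
		    read(p[0], &c, 1) == 1 && c == 'k')
			rc = 0;
		close(got);
	}
	close(p[0]);
	close(p[1]);
	close(sv[0]);
	close(sv[1]);
	return rc;
}
```

The receiver gets its own descriptor number for the same open file, much as if dup() had worked across the process boundary.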

File descriptor

http://en.wikipedia.org/wiki/File_descriptor

In computer programming, a file descriptor (FD) is an abstract indicator for accessing a file. The term is generally used in POSIX operating systems.

In POSIX, a file descriptor is an integer, specifically of the C type int. There are three standard POSIX file descriptors, corresponding to the three standard streams, which presumably every process (save perhaps a daemon) should expect to have:

Integer value   Name
0               Standard input (stdin)
1               Standard output (stdout)
2               Standard error (stderr)

Generally, a file descriptor is an index for an entry in a kernel-resident array data structure containing the details of open files. In POSIX this data structure is called a file descriptor table, and each process has its own file descriptor table. The process passes the file descriptor to the kernel through a system call, and the kernel will access the file on behalf of the process. The process itself cannot read or write the file descriptor table directly.

On Linux, the set of file descriptors open in a process can be accessed under the path /proc/PID/fd/, where PID is the process identifier.
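
As an illustration, a process can inspect its own table through /proc/self/fd (on Linux, /proc/self refers to the calling process). The sketch below, with hypothetical function names, counts the numeric entries there and checks that opening a file adds exactly one:

```c
#include <dirent.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Count entries in /proc/self/fd, excluding the descriptor that
 * opendir() itself holds. Returns -1 if /proc is unavailable. */
int count_open_fds(void)
{
	DIR *dir = opendir("/proc/self/fd");
	struct dirent *ent;
	int own, count = 0;

	if (!dir)
		return -1;
	own = dirfd(dir);
	while ((ent = readdir(dir)) != NULL) {
		char *end;
		long fd = strtol(ent->d_name, &end, 10);

		if (end == ent->d_name || *end != '\0')
			continue;	/* skips "." and ".." */
		if (fd != own)
			count++;
	}
	closedir(dir);
	return count;
}

/* Returns 1 if opening /dev/null raises the count by exactly one. */
int fd_count_grows_by_one(void)
{
	int before = count_open_fds();
	int fd = open("/dev/null", O_RDONLY);
	int after = count_open_fds();

	if (fd >= 0)
		close(fd);
	return before >= 0 && fd >= 0 && after == before + 1;
}
```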

In Unix-like systems, file descriptors can refer to any Unix file type named in a file system. As well as regular files, this includes directories, block and character devices (also called “special files”), Unix domain sockets, and named pipes. File descriptors can also refer to other objects that do not normally exist in the file system, such as anonymous pipes and network sockets.

The FILE data structure in the C standard I/O library usually includes a low-level file descriptor for the object in question on Unix-like systems. The overall data structure provides additional abstraction and is instead known as a file handle.