system call

From NetBSD Wiki


System calls (or just syscalls) are the way a process asks the kernel to perform a certain task on its behalf. Processes can't access hardware, the filesystem or pseudoterminals directly; any such request must go through the kernel. Normally one doesn't need to bother with syscalls directly, because libc provides wrappers that make their use a lot simpler. Examples include open(2), read(2) and sysctl(2). malloc is not a system call itself, but a library function that obtains its memory from the kernel (via brk).

This also explains the description of the shell as "an abstraction over the Unix programming interface": a lot of ordinary shell commands and features (like mkdir, sysctl, file descriptor manipulation and even just cd) map almost directly to a system call.

By necessity, the syscall handler itself checks its arguments for out-of-bounds or otherwise incorrect values. In quite a number of cases, it also needs to verify that the calling process (which is simply the process currently running on the CPU) has sufficient permissions (determined by its EUID and EGID) to have the requested task performed. As you can imagine, this is a very important point in the security of the system.

Performing a system call manually

The way the actual system call is performed differs across platforms and OSes. On NetBSD/i386, it goes something like this (from src/lib/libc/arch/i386/sys/syscall.S):

syscall:
        pop     %ecx    /* rta */
        pop     %eax    /* syscall number */
        push    %ecx
        int     $0x80
        push    %ecx    /* Keep stack frame consistent */
        jc      err
        ret
err:
        jmp  CERROR

The syscall(2) function gets called from C with a (variable) number of arguments on the stack. The return address ("rta") is always stored on top of the stack, so it is first popped into %ecx to get at the first argument, the syscall number, which is then popped into %eax. After the return address has been pushed back, the actual system call is invoked by means of a software interrupt, int $0x80. Any additional arguments are simply left on the stack, where the in-kernel system call handler can have a look at them when it needs to. Note: the system call handler does NOT pop off its arguments.

The final push of %ecx has no function vital to the syscall itself. It is there because the syscall number was popped off as well, and the C caller assumes the stack is the same size on exit as it was on entry, so one extra word is pushed to make up for it. Finally, jc tests the carry flag, which signals that the system call failed; in that case control jumps to CERROR, the libc error path.

You can just call a system call via libc, using the aforementioned syscall(2) function. If you must implement it in assembly for whatever reason, you can use the code just shown. The "magic" syscall numbers (to pass in %eax) can be found in src/sys/sys/syscall.h, or in /usr/include/sys/syscall.h on an installed system. These numbers are OS dependent, but on NetBSD they are the same across all ports.
