Sunday, May 23, 2010

x86 Linux Networking System Calls: Socketcall

In a monolithic kernel such as Linux, networking operations are performed in kernel space. On some architectures, such as the DEC Alpha, this is reflected directly in the system call table: system calls, the fabric that connects user space and kernel space, exist for individual socket operations such as connect, listen, and bind. This is not the case on the 32-bit x86 platform. Instead, all socket operations there are multiplexed through a single system call, socketcall. Socketcall takes two parameters. The first is an integer that specifies which socket operation to execute. The values and their corresponding functions are listed in /usr/include/linux/net.h and are reproduced below:

$ grep SYS_ /usr/include/linux/net.h  
#define SYS_SOCKET      1               /* sys_socket(2)                */
#define SYS_BIND        2               /* sys_bind(2)                  */
#define SYS_CONNECT     3               /* sys_connect(2)               */
#define SYS_LISTEN      4               /* sys_listen(2)                */
#define SYS_ACCEPT      5               /* sys_accept(2)                */
#define SYS_GETSOCKNAME 6               /* sys_getsockname(2)           */
#define SYS_GETPEERNAME 7               /* sys_getpeername(2)           */
#define SYS_SOCKETPAIR  8               /* sys_socketpair(2)            */
#define SYS_SEND        9               /* sys_send(2)                  */
#define SYS_RECV        10              /* sys_recv(2)                  */
#define SYS_SENDTO      11              /* sys_sendto(2)                */
#define SYS_RECVFROM    12              /* sys_recvfrom(2)              */
#define SYS_SHUTDOWN    13              /* sys_shutdown(2)              */
#define SYS_SETSOCKOPT  14              /* sys_setsockopt(2)            */
#define SYS_GETSOCKOPT  15              /* sys_getsockopt(2)            */
#define SYS_SENDMSG     16              /* sys_sendmsg(2)               */
#define SYS_RECVMSG     17              /* sys_recvmsg(2)               */
#define SYS_ACCEPT4     18              /* sys_accept4(2)               */
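The list above is, in effect, a dispatch table. As a rough user-space illustration, the idea of one entry point that takes a call number and a pointer to an argument array can be sketched in C as a switch on the call number. This is not the kernel's code. The CALL_* names and demo_socketcall are hypothetical, and only a few of the calls are handled.

```c
#include <errno.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Call numbers copied from the SYS_* list in /usr/include/linux/net.h.
 * The CALL_* names are hypothetical, chosen to avoid clashing with
 * other headers; only a few calls are handled in this sketch. */
enum {
    CALL_SOCKET     = 1,
    CALL_BIND       = 2,
    CALL_CONNECT    = 3,
    CALL_LISTEN     = 4,
    CALL_SOCKETPAIR = 8,
};

/* Hypothetical user-space dispatcher: the call number selects the
 * operation, and args points to that operation's arguments stored as
 * unsigned longs, in the same order the real function takes them.
 * Returns the operation's result, or -1 with errno = EINVAL for an
 * unknown call number. */
long demo_socketcall(int call, const unsigned long *args)
{
    switch (call) {
    case CALL_SOCKET:
        return socket((int)args[0], (int)args[1], (int)args[2]);
    case CALL_BIND:
        return bind((int)args[0], (const struct sockaddr *)args[1],
                    (socklen_t)args[2]);
    case CALL_CONNECT:
        return connect((int)args[0], (const struct sockaddr *)args[1],
                       (socklen_t)args[2]);
    case CALL_LISTEN:
        return listen((int)args[0], (int)args[1]);
    case CALL_SOCKETPAIR:
        /* Pointer arguments travel through the array as integers. */
        return socketpair((int)args[0], (int)args[1], (int)args[2],
                          (int *)args[3]);
    default:
        errno = EINVAL;
        return -1;
    }
}
```

For example, passing the array { PF_INET, SOCK_STREAM, 0 } with CALL_SOCKET is equivalent to calling socket(PF_INET, SOCK_STREAM, 0). The real sys_socketcall in net/socket.c additionally copies the array in from user space, using a per-call argument count, before it dispatches.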
 
The second parameter is a pointer to an array of parameters for that corresponding call. For example, to create and then close a TCP socket:
 
        pushl        %ebp
        movl         %esp, %ebp
        subl         $16, %esp           # room for fd + 3-element array

        # parameters for socket(2)
        movl         $2, -12(%ebp)       # PF_INET
        movl         $1, -8(%ebp)        # SOCK_STREAM
        movl         $0, -4(%ebp)        # protocol (0 = default, i.e. TCP)

        # invoke socketcall
        movl         $102, %eax          # socketcall
        movl         $1, %ebx            # SYS_SOCKET
        leal         -12(%ebp), %ecx     # address of parameter array
        int          $0x80
        movl         %eax, -16(%ebp)     # save socket fd

        # close socket
        movl         $6, %eax            # close
        movl         -16(%ebp), %ebx     # load socket fd
        int          $0x80

        addl         $16, %esp           # discard local variables
        popl         %ebp
 
Let's dissect this code.

The first three instructions set up a stack frame and reserve enough space on the stack to store both the socket's file descriptor and the array of parameters to pass to the socket call. Because the stack on an x86 machine grows downwards, I subtract from the stack pointer to make room for these local variables. I subtract 16 bytes because the socket call requires an array of three unsigned longs, 4 bytes each on x86, so the parameter array needs 12 bytes. The file descriptor takes another 4 bytes, for a total of 16 bytes.
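To make the layout concrete, the 16-byte frame can be modeled as a C struct whose fields are listed from the lowest address upward. This is a hypothetical model rather than anything the assembler produces. It uses uint32_t to stand in for the 4-byte unsigned long of 32-bit x86, and the function names are illustrative.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the 16 bytes reserved by "subl $16, %esp",
 * listed from the lowest address upward. uint32_t stands in for the
 * 4-byte unsigned long of 32-bit x86. */
struct socket_frame {
    uint32_t fd;       /* -16(%ebp): descriptor saved after int $0x80 */
    uint32_t args[3];  /* -12(%ebp), -8(%ebp), -4(%ebp): socket(2) arguments */
};

/* The reserved area ends where %ebp points, so a field's offset from
 * %ebp is its offset in the struct minus the size of the whole frame. */
long fd_offset_from_ebp(void)
{
    return (long)offsetof(struct socket_frame, fd)
         - (long)sizeof(struct socket_frame);
}

/* Offset from %ebp of parameter-array element i (0, 1 or 2). */
long arg_offset_from_ebp(int i)
{
    return (long)offsetof(struct socket_frame, args)
         + (long)i * (long)sizeof(uint32_t)
         - (long)sizeof(struct socket_frame);
}
```

With this model, the offsets come out to -16 for the descriptor and -12, -8 and -4 for the three parameters. These are exactly the displacements used in the listing.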

The next three instructions fill the array with the desired parameters to the socket call. Notice that the values in the array are identical to the values that would be passed to the socket(2) function: PF_INET, SOCK_STREAM, and 0.

The next four instructions invoke the socketcall system call. On the x86 platform, a system call can be made in one of three ways. In the first method, the system call number is loaded into the EAX register, the parameters are loaded into the EBX, ECX, EDX, ESI, EDI, and EBP registers respectively, and the system call is invoked with interrupt 128 (int $0x80). In the second method, the system call number is again loaded into EAX. The first five parameters are loaded into the EBX, EBP, EDX, ESI, and EDI registers respectively, because ECX receives the return address. The sixth parameter is placed at the top of the user stack, and the call is made with the SYSCALL instruction. In the third method, the system call number is loaded into EAX and the first five parameters into EBX, ECX, EDX, ESI, and EDI. EBP holds the user stack pointer, with the sixth parameter at 0(%ebp), and the call is made with the SYSENTER instruction. Each of these methods is documented in the comments of arch/x86/ia32/ia32entry.S in the kernel source, the entry code for 32-bit system calls on 64-bit kernels.
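For quick reference, the argument locations for the three entry paths can be written down as a small lookup table. The locations are taken from the header comments of arch/x86/ia32/ia32entry.S. The enum and function names are hypothetical, and in all three cases the system call number itself is in EAX.

```c
#include <stddef.h>

/* Hypothetical names for the three 32-bit entry paths. */
enum entry_method { ENTRY_INT80, ENTRY_SYSCALL, ENTRY_SYSENTER };

/* Where arguments 1..6 live on kernel entry, per the comments in
 * ia32entry.S. The call number is always in eax. */
static const char *const arg_location[3][6] = {
    [ENTRY_INT80]    = { "ebx", "ecx", "edx", "esi", "edi", "ebp" },
    [ENTRY_SYSCALL]  = { "ebx", "ebp", "edx", "esi", "edi", "0(%esp)" },
    [ENTRY_SYSENTER] = { "ebx", "ecx", "edx", "esi", "edi", "0(%ebp)" },
};

/* Return the location of argument n (1..6), or NULL if n is out of range. */
const char *syscall_arg_location(enum entry_method m, int n)
{
    if (n < 1 || n > 6)
        return NULL;
    return arg_location[m][n - 1];
}
```

The table shows where the paths differ. SYSCALL overwrites ECX with the return address, so the second argument moves to EBP. SYSENTER uses EBP to carry the user stack pointer, so the sixth argument is read from memory.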

In this case, I use the first method. First, the system call number, 102 for socketcall, is loaded into the EAX register. Then the first parameter, 1 (SYS_SOCKET from the table above), is moved into the EBX register. The address of the array containing the parameters to socket(2) is then loaded into the ECX register. The last of the four instructions invokes the system call using interrupt 128. At this point a socket has been created, and its file descriptor is returned in the EAX register. The movl instruction after the interrupt saves the file descriptor into the local variable we reserved earlier on the stack, because the next system call will overwrite EAX.
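With int $0x80, the kernel does not set errno. On failure it instead leaves a negative error number in EAX. The helper below is a minimal sketch of how a caller might interpret that raw value, following the convention glibc's wrappers use, in which values from -4095 to -1 are errors. The name decode_raw_syscall_return is hypothetical.

```c
#include <errno.h>

/* Hypothetical helper: interpret the raw value left in EAX by a system
 * call. Results in -4095..-1 are negated errno values; anything else
 * is a successful result (for socketcall with SYS_SOCKET, the new fd). */
long decode_raw_syscall_return(long raw)
{
    if (raw < 0 && raw >= -4095) {
        errno = (int)-raw;   /* e.g. -EMFILE becomes errno = EMFILE */
        return -1;
    }
    return raw;
}
```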

At this point, subsequent socketcall invocations could use the socket for various networking operations. Afterwards, the socket must be closed. Since a socket is a file descriptor, it can be closed like any other file descriptor, using the close system call (number 6), whose single parameter goes in EBX. Finally, once the socket is closed, the stack frame is torn down. This is done by adding back the 16 bytes we subtracted from the stack pointer, which discards the local variables, and then restoring the preserved EBP register.
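For comparison, the same create-and-close sequence can be sketched in C with glibc's generic syscall(2) wrapper. This assumes glibc, which defines SYS_socketcall in <sys/syscall.h> only on architectures that have the multiplexer. On x86-64, which has a dedicated socket system call, the sketch falls back to calling socket(2) directly. The function name create_and_close_tcp_socket is hypothetical.

```c
#define _GNU_SOURCE            /* for the syscall() declaration */
#include <sys/socket.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Create and close a TCP socket, mirroring the assembly listing.
 * Returns 0 on success, -1 on failure (errno set). */
int create_and_close_tcp_socket(void)
{
    long fd;
#ifdef SYS_socketcall
    /* Multiplexed path: call number 1 (SYS_SOCKET) plus a pointer to
     * the three socket(2) arguments, just like EBX and ECX above. */
    unsigned long args[3] = { PF_INET, SOCK_STREAM, 0 };
    fd = syscall(SYS_socketcall, 1, args);
#else
    /* No socketcall on this architecture (e.g. x86-64). */
    fd = socket(PF_INET, SOCK_STREAM, 0);
#endif
    if (fd < 0)
        return -1;
    /* A socket is an ordinary file descriptor, so close(2) releases it. */
    return close((int)fd);
}
```

Unlike the raw int $0x80 sequence, syscall() reports failure by returning -1 and setting errno, so the error check can simply test for a negative result.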

Hopefully within the next week or so, I will post a full example application that uses this interface. Note that, because this system call is platform specific, it should not be used in applications that claim to be portable. This article aims to provide insight into the underlying mechanisms rather than to serve as a tutorial on socket programming in Linux.

6 comments:

  1. A great overview, very nice. I do have one question though.

    You present the what and the how nicely and clearly, but do you have any insight into the why? Given that all the other syscalls get their own numbers, why are these socket calls multiplexed into their own second-level hierarchy on x86 Linux?

  2. Thanks, Paul. That is a very good question.

    From what I understand, this additional hierarchy was implemented to conserve system call numbers.

  3. Jonathan Trowbridge, December 27, 2010 at 3:14 PM

    Hey great tutorial!

    I don't understand one thing tho. I notice that you're copying the socket file descriptor into ebx. Shouldn't it be ebp?

    You have:

    movl %eax, -16(%ebx)
    vs.
    movl %eax, -16(%ebp)

    and:

    movl -16(%ebx), %ebx # load socket fd
    vs.
    movl -16(%ebp), %ebx # load socket fd

    If what you had originally is correct, could you please explain why? I'm just starting to learn assembly.

  4. Thanks, Jonathan.

    You are absolutely correct. That was a typo on my part. It should be fixed now.

    Sorry for the confusion.

  5. Hello,
    I ran strace on gnome-dictionary on an Ubuntu 10.04 32-bit system. Even so, it showed system calls like bind and listen separately instead of socketcall. The uname -a command reports an i686 system. I don't know why this is. Can you please explain it to me? The kernel is 2.6.32-28-generic #55.

  6. Hi Utkarsh,

    Taking a glance at the source for strace, I see that strace splits socketcall into the appropriate subcalls. If you're interested, look in the strace source directory for the preprocessor define "SYS_socket_subcall".
