VNET: PlanetLab Virtualized Network Access
Abstract

This document describes the design of VNET, the PlanetLab module that provides virtualized network access on PlanetLab nodes.

VNET replaces, but is also backward compatible with, the safe raw sockets interface supported by PLKMOD (also known as SILK) in version 2 of the PlanetLab software [1]. VNET provides a restricted form of raw IP and raw packet sockets, ensures isolation of traffic between slices, and supports a unique interface for accessing proxied IP addresses, all while maintaining compatibility with standard Linux/BSD socket APIs.

The goal of VNET is to be as transparent as possible. The great majority of PlanetLab users should not have to recompile any code, use anything besides standard programs and APIs, or even be aware that VNET virtualizes their network access. This document is intended to be read by PlanetLab users who require raw or close-to-raw access to the network.

Connection tracking is the heart of VNET. VNET relies on Linux's Netfilter system to associate every inbound and outbound IP packet with a connection structure. VNET then ensures that slices send and receive only packets associated with connections that they own. That is, slices can only:
When an IP packet is sent through a socket, it passes through VNET and is associated with a new or existing connection. If the connection is not already bound to a slice, VNET allows the packet through and binds the connection to the slice that sent the packet. If the connection is bound to a different slice, the packet is dropped and an error is returned to the sending application.

When an IP packet is received by the stack, it also passes through VNET and is associated with a new or existing connection. If the packet was expected (that is, if the connection was bound or initiated by a slice), VNET allows that slice to receive the packet. Otherwise, the packet can only be received by the root slice.

Connections are defined on a per-protocol basis. Currently, VNET supports the following protocols:
Associating packets with connections is not always trivial. For example, Netfilter considers some ICMP errors (e.g., most ICMP Port Unreachable messages) to be associated with the connection that caused the ICMP error to be generated, rather than the ICMP connection itself. A pleasant consequence of this behavior is that slices are automatically entitled to receive their own ICMP errors.

Connections may be bound, or effectively reserved, by using the
Example 1. Example C code for creating and binding a regular TCP socket.

    #include <string.h>
    #include <sys/socket.h>
    #include <netinet/in.h>

    int sock;
    struct sockaddr_in sin;

    /* Create a TCP socket */
    sock = socket(PF_INET, SOCK_STREAM, 0);

    /* Bind port 9090 on all interfaces to it */
    memset(&sin, 0, sizeof(sin));
    sin.sin_family = AF_INET;
    sin.sin_addr.s_addr = htonl(INADDR_ANY);
    sin.sin_port = htons(9090);
    bind(sock, (struct sockaddr *) &sin, sizeof(sin));

Once a local port is successfully bound by a slice, no other
slice may send or receive packets associated with that port. If
another slice attempts to

VNET and Netfilter thus effectively serve as a stateful switching firewall for local sockets. The system is stateful because connection state is continually tracked; it is switching because slices only see their own traffic; and it is a firewall because incoming traffic is redirected to the appropriate slice.

VNET extends the Linux

As previously mentioned, the primary restriction on sent packets is that slices may only send packets associated with connections that they own (i.e., new connections or connections that they initiated). Slices may send packets through any number of raw or regular sockets, although it is recommended that only a single raw IP or packet socket be open at any one time for the highest performance.

Because raw IP and packet sockets are generally used only
by administrative programs, there are very few restrictions in
the kernel stack on what can be sent through them. VNET thus
implements its own restrictions on sent packets, and rejects
malformed or otherwise disallowed packets with the standard
error
For backward compatibility with PLKMOD, and to support
reservation for IP protocols besides TCP and UDP,
Example 2. Example C code for creating and binding a raw ICMP socket.

    #include <string.h>
    #include <sys/socket.h>
    #include <netinet/in.h>

    int sock;
    struct sockaddr_in sin;

    /* Create a raw ICMP socket */
    sock = socket(PF_INET, SOCK_RAW, IPPROTO_ICMP);

    /* Bind ICMP Echo ID 23456 to it */
    memset(&sin, 0, sizeof(sin));
    sin.sin_family = AF_INET;
    sin.sin_addr.s_addr = htonl(INADDR_ANY);
    sin.sin_port = htons(23456);
    bind(sock, (struct sockaddr *) &sin, sizeof(sin));

    /* Note that this socket will receive ALL ICMP packets that your
       slice is entitled to receive. The act of binding simply ensures
       that your slice will be able to send and receive packets
       associated with the specified ICMP Echo ID, through this socket
       or others. */

Example 3. Example C code for creating and binding a raw GRE socket.

    #include <string.h>
    #include <sys/socket.h>
    #include <netinet/in.h>

    int sock;
    struct sockaddr_in sin;

    /* Create a raw GRE socket */
    sock = socket(PF_INET, SOCK_RAW, IPPROTO_GRE);

    /* Bind GRE key 12 (and/or PPTP Call ID 12) to it */
    memset(&sin, 0, sizeof(sin));
    sin.sin_family = AF_INET;
    sin.sin_addr.s_addr = htonl(INADDR_ANY);
    sin.sin_port = htons(12);
    bind(sock, (struct sockaddr *) &sin, sizeof(sin));

    /* Note that this socket will receive ALL GRE and PPTP packets that
       your slice is entitled to receive. The act of binding simply
       ensures that your slice will be able to send and receive packets
       associated with the specified GRE key (and/or PPTP Call ID),
       through this socket or others. */

Raw IP sockets must be created as the root user. Most of
the socket options described in the

Usually, the Linux kernel responds to TCP or UDP requests to ports that it believes are non-listening with a TCP RST packet or an ICMP Port Unreachable error. The kernel does not track the state of connections carried out through raw IP sockets, and usually interferes with them. VNET suppresses these kernel-generated replies if a TCP or UDP port has been bound to a safe raw socket. This behavior is not standard, and programs that rely on it will not work on regular Linux.

It is not absolutely necessary to bind local ports before using them. If a connection is not in use (that is, if the local source port of the connection has not been bound, and no other slice is sending packets from the same source tuple to the same destination tuple at the same time), then packets related to the connection may be sent and received through any number of regular or raw sockets until the port is bound by another slice.

Lazy binding is why stock ping and traceroute now work in version 3 of PlanetLab Linux without modification. Both of these programs use unbound raw IP sockets to send packets with effectively random ICMP Echo IDs. Each ICMP connection that these programs generate is lazily bound to the slice running it.

Linux

Example 4. Example C code for creating and binding a raw packet socket.

    #include <string.h>
    #include <unistd.h>
    #include <sys/ioctl.h>
    #include <sys/socket.h>
    #include <arpa/inet.h>
    #include <net/if.h>
    #include <net/ethernet.h>
    #include <netpacket/packet.h>

    int sock;
    struct ifreq ifr;
    struct sockaddr_ll sll;

    /* Open a generic IP socket for querying the stack */
    sock = socket(PF_INET, SOCK_RAW, 0);

    /* Get interface index */
    memset(&ifr, 0, sizeof(ifr));
    ifr.ifr_addr.sa_family = PF_INET;
    strcpy(ifr.ifr_name, "vnet");
    ioctl(sock, SIOCGIFINDEX, &ifr);

    /* Re-open raw packet socket */
    close(sock);
    sock = socket(PF_PACKET, SOCK_DGRAM, htons(ETH_P_IP));

    /* Bind packet socket to the "vnet" virtual Ethernet device */
    memset(&sll, 0, sizeof(sll));
    sll.sll_family = PF_PACKET;
    sll.sll_protocol = htons(ETH_P_IP);
    sll.sll_ifindex = ifr.ifr_ifindex;
    bind(sock, (struct sockaddr *) &sll, sizeof(sll));

Standard
VNET restricts the types of packets that may be sent through packet sockets. Normally, Linux packet sockets can be used to send any type of packet that the underlying device driver accepts. However, packets sent through VNET packet sockets are actually re-routed through the IP stack as if they were locally generated by a regular socket. Thus, VNET packet sockets may only be used to send well-formed, routable IP packets that follow the restrictions listed in Section 3.1, “Restrictions on sent packets”. Packet sockets may also be used to receive raw IP
packets. IP packets are routed to the

Proxy sockets are special packet sockets that allow slices to utilize unused IP address space on PlanetLab node subnets. On nodes with access to such address space, slices may bind to a virtual interface that proxies for the space. The interfaces should be treated as if they were anonymous (i.e., unconfigured for IP) Ethernet interfaces in promiscuous mode, connected to a subnet servicing the unused address space.

The primary applications for the interface are network telescopes and honey farms. Because proxy sockets do not require any special configuration, stock programs like tcpdump, Honeyd, and Snort may be used to implement such applications. Other possible applications include user-level IP routers such as Click and implementations of IP anycast.

The same restrictions that apply to safe raw IP and packet
sockets apply to proxy sockets as well. However, proxy sockets should be bound to an available proxy device of the form proxy0, proxy1, etc., rather than the

Packets sent through proxy sockets are re-routed through the IP stack as if they were forwarded from a foreign host. Thus, the normal restrictions listed in Section 3.1, “Restrictions on sent packets” do not apply. As long as the packets are well-formed enough for the kernel to forward, they will be accepted.

Stock Linux ships with a virtual network driver that
tunnels packets between

VNET emulates a single TAP interface

The IP address assigned to the

VNET provides a version of
Example 5. Example listing of

    prot  port   slice  types
    tcp   52906  524    C
    udp   9999   0      c
    icmp  55170  874    c
    tcp   49943  769    c
    tcp   43693  769    c
    udp   55196  874    c
    tcp   39261  769    c
    tcp   49075  769    c
    udp   9999   0      c

The
Lazy binds, discussed in the section called “Lazy binding”,
are also printed in
Netfilter's

Example 6. Example listing of

    tcp 6 107 TIME_WAIT src=24.154.102.20 dst=128.112.139.71 sport=2550 dport=3127 xid=-1 \
        src=128.112.139.71 dst=24.154.102.20 sport=3127 dport=2550 xid=524 [ASSURED] use=1
    icmp 1 29 src=128.112.139.71 dst=198.78.49.61 type=8 code=0 id=32111 xid=661 [UNREPLIED] \
        src=198.78.49.61 dst=128.112.139.71 type=0 code=0 id=32111 xid=-1 use=1
    tcp 6 431583 ESTABLISHED src=128.238.35.25 dst=128.112.139.71 sport=41955 dport=3124 xid=-1 \
        src=128.112.139.71 dst=128.238.35.25 sport=3124 dport=41955 xid=524 [ASSURED] use=1
    tcp 6 25 TIME_WAIT src=127.0.0.1 dst=127.0.0.1 sport=59101 dport=3100 xid=619 \
        src=127.0.0.1 dst=127.0.0.1 sport=3100 dport=59101 xid=687 [ASSURED] use=1
    udp 17 135 src=128.112.139.71 dst=140.112.107.82 sport=4121 dport=4121 xid=524 \
        src=140.112.107.82 dst=128.112.139.71 sport=4121 dport=4121 xid=-1 [ASSURED] use=1

On PlanetLab, the listing is supplemented with
information about the owner of each side of the connection
(loopback connections may have two different owners). An
The meaning of the other fields is the same. Examining the first and second entries in the example above:
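The per-side owner fields can also be pulled out of a listing line mechanically. The following is a minimal sketch in C, assuming that each line carries one `xid=` token per direction, as in the listing above, and that a continued entry has already been joined into a single line. The helper name `parse_xids` is hypothetical and is not part of VNET.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not part of VNET): collect up to max values of the
 * "xid=" tokens found in one connection listing line, in order of
 * appearance. Returns the number of values stored in xids[]. */
int parse_xids(const char *line, long *xids, int max)
{
    int n = 0;
    const char *p = line;

    while (n < max && (p = strstr(p, "xid=")) != NULL) {
        /* Only accept "xid=" at the start of a space-separated token */
        int at_token_start = (p == line) || (p[-1] == ' ');

        p += 4; /* skip past "xid=" */
        if (at_token_start) {
            char *end;
            long value = strtol(p, &end, 10);

            if (end != p)
                xids[n++] = value;
            p = end;
        }
    }
    return n;
}
```

Applied to the first entry above (with the continuation joined), the helper stores -1 for the first side and 524 for the second.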
VNET extends the meaning of the standard socket option
Example 7. Example C code for retrieving the slice owner of the other end of a connection.

    /* struct ucred requires _GNU_SOURCE on glibc */
    int sock;
    struct sockaddr_in peer;
    socklen_t peerlen = sizeof(peer);
    struct ucred peercred;
    socklen_t peercredlen = sizeof(peercred);

    /* At this point, sock is connected to a peer. */
    getpeername(sock, (struct sockaddr *) &peer, &peerlen);
    getsockopt(sock, SOL_SOCKET, SO_PEERCRED, &peercred, &peercredlen);
    printf("Peered with slice ID %d\n", (int) peercred.gid);
When the option is set via
Example 8. Example C code for transferring ownership of a socket to another slice.

    #if !defined(SO_SETXID)
    #define SO_SETXID SO_PEERCRED
    #endif

    /* At this point, sock has been created and xid contains the ID of
       the desired slice. */
    setsockopt(sock, SOL_SOCKET, SO_SETXID, &xid, sizeof(xid));

Because setting

VNET adds an

The

A. Examples

The following examples demonstrate the use of the VNET extensions described in this document. The code is freely redistributable within the constraints of the BSD license.
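As a further illustration (a minimal sketch, not one of the original VNET examples), the binding pattern from Example 1 can be wrapped with error checking using only the standard sockets API. The helper names `bind_tcp_port` and `bound_port` are hypothetical. On stock Linux, a second bind to a port that is already bound fails with `EADDRINUSE`; the error that VNET returns for a conflict between slices is not assumed here.

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Hypothetical helper (not part of VNET): create a TCP socket and bind it
 * to the given port on all interfaces. Port 0 asks the kernel to pick a
 * free port. Returns the socket descriptor, or -1 with errno set. */
int bind_tcp_port(uint16_t port)
{
    struct sockaddr_in sin;
    int sock = socket(PF_INET, SOCK_STREAM, 0);

    if (sock < 0)
        return -1;

    memset(&sin, 0, sizeof(sin));
    sin.sin_family = AF_INET;
    sin.sin_addr.s_addr = htonl(INADDR_ANY);
    sin.sin_port = htons(port);

    if (bind(sock, (struct sockaddr *) &sin, sizeof(sin)) < 0) {
        int saved = errno; /* close() may overwrite errno */

        close(sock);
        errno = saved;
        return -1;
    }
    return sock;
}

/* Hypothetical helper: return the local port bound to sock in host byte
 * order, or 0 if it cannot be determined. */
uint16_t bound_port(int sock)
{
    struct sockaddr_in sin;
    socklen_t len = sizeof(sin);

    if (getsockname(sock, (struct sockaddr *) &sin, &len) < 0)
        return 0;
    return ntohs(sin.sin_port);
}
```

A caller would check the return value of `bind_tcp_port` and inspect `errno` on failure, rather than assuming that the port was reserved.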
[1] Scout Module API.
[2] Proper: Privileged Operations in a Virtualised System Environment. PDN-04-022. August 2004 (updated October 2004).
[3] Address Allocation for Private Internets. RFC 1918. February 1996.