Friday, 22 August 2014

What is SKB in Linux kernel? What are SKB operations? Memory Representation of SKB? How to send packet out using skb operations?

Many readers will already be familiar with the OSI reference model and the TCP/IP model.

Any networking application relies on the TCP/IP model to process and route packets from one end to the other. So how does the Linux kernel support networking, and how are packets processed inside the kernel?

This article will answer the above questions.
  •   A network interface represents a thing which sends and receives packets. This is normally interface code for a physical device like an ethernet card. However some devices are software only such as the loopback device which is used for sending data to yourself.
  • The network subsystem of the Linux kernel is designed to be completely protocol-independent. This applies to both networking protocols (Internet protocol [IP] versus IPX or other protocols) and hardware protocols (Ethernet versus token ring, etc.).
  •    A header is a set of bytes (err, octets) prepended to a packet as it is passed through the various layers of the networking subsystem. When an application sends a block of data through a TCP socket, the networking subsystem breaks that data up into packets and puts a TCP header, describing where each packet fits within the stream, at the beginning. The lower levels then put an IP header, used to route the packet to its destination, in front of the TCP header. If the packet moves over an Ethernet-like medium, an Ethernet header, interpreted by the hardware, goes in front of the rest.

The two most important data structures of the Linux kernel network layer are:
-         sk_buff     (defined in include/linux/skbuff.h)
-         net_device  (defined in include/linux/netdevice.h)
Each interface is described by a struct net_device item. (This is covered in another post.)

What is sk_buff:
  • sk_buff stands for socket buffer. It is the core structure in Linux networking.
  • skbuffs are the buffers in which the Linux kernel handles network packets. A packet is received by the network card, put into an skbuff and then passed to the network stack, which works on that skbuff for the whole lifetime of the packet.
  • In other words:
Packets have to be manipulated as they travel through the Linux kernel stack, and this manipulation must do the following efficiently:
  • Add protocol headers/trailers on the way down the stack.
  • Remove protocol headers/trailers on the way up the stack.
  • Concatenate/separate data.
  • Give each protocol convenient access to its header fields.
In order to perform all of the above, the kernel provides the sk_buff structure.

Structure of sk_buff

An sk_buff structure is created when an application passes data to a socket, or when a packet arrives at the network adapter (in which case the driver invokes dev_alloc_skb()).


MEMORY Representation of SKB structure:
SKB has four parts. Memory representation of skb structure is depicted below.


sk_buff has four pointers and a length field, as listed below.
head
the start of the allocated buffer
data
the start of the valid packet data
tail
the end of the valid packet data
end
the end of the allocated buffer
len
the number of bytes of data in the packet

As shown in the figure above, skb memory usually has four parts:
1.       head room : located between skb->head and skb->data. This is where the protocol headers added on the local host are stored, such as the TCP, IP and Ethernet headers;
2.       user data : located between skb->data and skb->tail. It is usually filled in by the application layer through system calls;
3.       tail room : located between skb->tail and skb->end. It is the free space into which the kernel can append more user data;
4.       after skb->end, a special structure, struct skb_shared_info, is stored.
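The four regions can be made concrete with a small userspace model. This is only a sketch for illustration: struct toy_skb and the toy_* helpers are hypothetical names, not kernel APIs, and the trailing struct skb_shared_info is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy userspace model of the four pointers; NOT the kernel's struct sk_buff. */
struct toy_skb {
    unsigned char *head;  /* start of the allocated buffer */
    unsigned char *data;  /* start of the valid packet data */
    unsigned char *tail;  /* end of the valid packet data */
    unsigned char *end;   /* end of the allocated buffer */
    unsigned int len;     /* number of valid data bytes */
};

/* Allocate a descriptor plus a data buffer of 'size' bytes. */
static struct toy_skb *toy_alloc(size_t size)
{
    struct toy_skb *skb = malloc(sizeof(*skb));

    if (!skb)
        return NULL;
    skb->head = malloc(size ? size : 1);
    if (!skb->head) {
        free(skb);
        return NULL;
    }
    /* Fresh buffer: no head room, no data, everything is tail room. */
    skb->data = skb->head;
    skb->tail = skb->head;
    skb->end = skb->head + size;
    skb->len = 0;
    return skb;
}

static void toy_free(struct toy_skb *skb)
{
    if (skb) {
        free(skb->head);
        free(skb);
    }
}

/* Head room: bytes between head and data. */
static size_t toy_headroom(const struct toy_skb *skb)
{
    return (size_t)(skb->data - skb->head);
}

/* Tail room: bytes between tail and end. */
static size_t toy_tailroom(const struct toy_skb *skb)
{
    return (size_t)(skb->end - skb->tail);
}

/* Ordering invariant of a linear buffer: head <= data <= tail <= end. */
static void toy_check(const struct toy_skb *skb)
{
    assert(skb->head <= skb->data);
    assert(skb->data <= skb->tail);
    assert(skb->tail <= skb->end);
    assert(skb->len == (unsigned int)(skb->tail - skb->data));
}
```

In the kernel, the equivalent questions are answered by skb_headroom() and skb_tailroom(), which are described later in this post.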

STEPS for sending the packet out using SKB OPERATIONS:

Step 1:   Allocate Memory for the skb
 
skb = alloc_skb(len, GFP_KERNEL);
Once memory has been allocated for the skb using alloc_skb(), it looks as shown below.

As you can see, the head, data, and tail pointers all point to the beginning of the data buffer, and the end pointer points to the end of it. Note that all of the data area is considered tail room.
The length of this SKB is zero. It isn't very interesting yet, since it doesn't contain any packet data at all.

Step 2:  Space for Headroom to add protocol headers (Ethernet + IP+ TCP headers etc..)

Reserve some space for the protocol headers using skb_reserve(). This is usually how the head room is initialized, as shown in the figure below.


skb_reserve(skb, header_len);


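skb_reserve() advances both the skb->data and skb->tail pointers by the specified header length. It may only be called on an empty skb, before any data has been added.


For example, when the TCP layer sends a data packet, the head room must be at least sizeof(struct tcphdr) + sizeof(struct iphdr) + sizeof(struct ethhdr).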
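The head room arithmetic can be sketched in plain userspace C. The names toy_buf and toy_reserve are hypothetical, and the header sizes assume an untagged Ethernet header (14 bytes) and IPv4 and TCP headers without options (20 bytes each).

```c
#include <assert.h>
#include <stddef.h>

/* Minimum header sizes, assuming no VLAN tag, no IP options, no TCP options. */
enum {
    TOY_ETH_HDR_LEN = 14,
    TOY_IPV4_HDR_LEN = 20,
    TOY_TCP_HDR_LEN = 20
};

/* Toy model of the four pointers; not the kernel's struct sk_buff. */
struct toy_buf {
    unsigned char *head;
    unsigned char *data;
    unsigned char *tail;
    unsigned char *end;
};

/* Model of skb_reserve(): move data and tail forward together. */
static void toy_reserve(struct toy_buf *b, size_t len)
{
    assert(b->data == b->tail);                /* buffer must still be empty */
    assert((size_t)(b->end - b->tail) >= len); /* and large enough */
    b->data += len;
    b->tail += len;
}
```

With a 256-byte buffer, reserving room for all three headers leaves 54 bytes of head room and 202 bytes of tail room for the payload.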

Step 3:  Add the user data (payload) using skb_put()
skb_put() advances the 'skb->tail' pointer by the specified number of bytes (user_data_len) and increments 'skb->len' by the same amount. It returns a pointer to the start of the newly added area. This routine must not be called on an SKB that has any paged data.

unsigned char *data = skb_put(skb, user_data_len);
memset(data, 0x11, user_data_len);   /* fill the payload with the byte 0x11 */

 
Step 4:  Add the headers in the headroom using skb_push()

skb_push() decrements the 'skb->data' pointer by the specified number of bytes and increments 'skb->len' by the same amount. Make sure that there is enough head room for the push being performed.
For example, push a UDP header to the front of the SKB.
struct udphdr *udp = (struct udphdr *)skb_push(skb, sizeof(struct udphdr));

udp->source = htons(1025);
udp->dest = htons(2095);
udp->len = htons(sizeof(struct udphdr) + user_data_len);   /* UDP length covers header and payload */
udp->check = 0;
 
skb_pull():
skb_pull() removes headers from the start of the data area and returns those bytes to the headroom.
It increments skb->data by the specified number of bytes and decrements skb->len by the same amount.



Step 5: Similarly, push the IPv4 header (followed by the link-layer header) onto the sk_buff using
skb_push(), and send the packet out using the dev_queue_xmit() function.
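The whole sequence of steps can be traced with a userspace model that only moves pointers. This is a sketch, not kernel code: the toy_* names are hypothetical, the headers are zero-filled placeholders of 14 (Ethernet), 20 (IPv4) and 8 (UDP) bytes, and nothing is transmitted.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy model of a linear skb; not the kernel's struct sk_buff. */
struct toy_skb {
    unsigned char *head, *data, *tail, *end;
    unsigned int len;
};

static void toy_init(struct toy_skb *skb, unsigned char *buf, size_t size)
{
    skb->head = skb->data = skb->tail = buf;
    skb->end = buf + size;
    skb->len = 0;
}

/* Like skb_reserve(): open up head room in an empty buffer. */
static void toy_reserve(struct toy_skb *skb, unsigned int n)
{
    assert(skb->len == 0 && (size_t)(skb->end - skb->tail) >= n);
    skb->data += n;
    skb->tail += n;
}

/* Like skb_put(): grow the data area at the tail, return the old tail. */
static unsigned char *toy_put(struct toy_skb *skb, unsigned int n)
{
    unsigned char *old_tail = skb->tail;

    assert((size_t)(skb->end - skb->tail) >= n);
    skb->tail += n;
    skb->len += n;
    return old_tail;
}

/* Like skb_push(): grow the data area at the front, return the new data. */
static unsigned char *toy_push(struct toy_skb *skb, unsigned int n)
{
    assert((size_t)(skb->data - skb->head) >= n);
    skb->data -= n;
    skb->len += n;
    return skb->data;
}

/* Like skb_pull(): drop n bytes from the front, return the new data. */
static unsigned char *toy_pull(struct toy_skb *skb, unsigned int n)
{
    assert(skb->len >= n);
    skb->data += n;
    skb->len -= n;
    return skb->data;
}

/* Steps 1 to 5: reserve, put payload, push UDP, IPv4 and Ethernet headers. */
static unsigned int toy_build_frame(struct toy_skb *skb, unsigned char *buf,
                                    size_t size, const void *payload,
                                    unsigned int payload_len)
{
    toy_init(skb, buf, size);
    toy_reserve(skb, 14 + 20 + 8);
    memcpy(toy_put(skb, payload_len), payload, payload_len);
    memset(toy_push(skb, 8), 0, 8);    /* placeholder UDP header */
    memset(toy_push(skb, 20), 0, 20);  /* placeholder IPv4 header */
    memset(toy_push(skb, 14), 0, 14);  /* placeholder Ethernet header */
    return skb->len;
}
```

After toy_build_frame() the head room is used up completely and data equals head again. A receiver would walk the same buffer in the opposite direction, calling the pull helper once per header.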

Pictorial representation of skb operations.


Some more skb operations:
skb_clone(), skb_copy(), skb_trim()

skb_clone(): This function does not copy the packet data. It only creates a new struct sk_buff that points to the same region of memory as the original.
In this case the dataref counter in the skb_shared_info structure is incremented to 2.
For example:
Packets can be captured with "tcpdump". At that point the packet is handed to the Linux protocol stack, and the same packet is handed to tcpdump as well.
Is it necessary to copy the skb completely here? No. Both consumers only read the packet, so the network data itself stays unchanged. Only the struct sk_buff is duplicated, and both structures point to the same region of memory. This is what skb_clone() does:




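When the skb was allocated from the fast-clone cache (as in older kernels), the clone is stored right after the original and can also be reached as skb + 1.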

skb_copy()
In some cases the packet data has to be modified, and then a complete private copy is needed. For this, call skb_copy(). It copies the struct sk_buff together with all of its data and creates another, independent skb.
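The difference between the two operations can be modelled in userspace with a reference-counted data block. This is a sketch under simplifying assumptions: toy_data, toy_skb and the toy_* functions are hypothetical names, and the dataref field here only stands in for the counter kept in skb_shared_info.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define TOY_DATA_MAX 64

/* Packet data plus the counter of descriptors that share it. */
struct toy_data {
    int dataref;
    unsigned char bytes[TOY_DATA_MAX];
};

/* Descriptor; plays the role of struct sk_buff in this model. */
struct toy_skb {
    struct toy_data *d;
    size_t len;
};

static struct toy_skb *toy_new(const void *src, size_t len)
{
    struct toy_skb *skb;

    if (len > TOY_DATA_MAX)
        return NULL;
    skb = malloc(sizeof(*skb));
    if (!skb)
        return NULL;
    skb->d = malloc(sizeof(*skb->d));
    if (!skb->d) {
        free(skb);
        return NULL;
    }
    skb->d->dataref = 1;
    memcpy(skb->d->bytes, src, len);
    skb->len = len;
    return skb;
}

/* Clone: new descriptor, same data block, reference count goes up. */
static struct toy_skb *toy_clone(const struct toy_skb *skb)
{
    struct toy_skb *n = malloc(sizeof(*n));

    if (!n)
        return NULL;
    *n = *skb;
    n->d->dataref++;
    return n;
}

/* Copy: new descriptor and a private data block with its own count of 1. */
static struct toy_skb *toy_copy(const struct toy_skb *skb)
{
    return toy_new(skb->d->bytes, skb->len);
}

/* Free the descriptor; free the data only when the last user is gone. */
static void toy_free(struct toy_skb *skb)
{
    assert(skb->d->dataref > 0);
    if (--skb->d->dataref == 0)
        free(skb->d);
    free(skb);
}
```

Writing through the copy leaves the original untouched, while writing through a clone would be visible to every descriptor that shares the data. That is why a clone should be treated as read-only.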

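skb_trim(): removes data from the end of a buffer, cutting its length down to len. If the buffer is already shorter than len, it is left unchanged.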

Prototype
void skb_trim(struct sk_buff *skb, unsigned int len);
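The effect of trimming on a linear buffer can be sketched as follows. The toy_skb structure and toy_trim() are hypothetical userspace stand-ins, shown only to illustrate how len and tail move together.

```c
#include <assert.h>

/* Only the fields that trimming touches; not the kernel's struct sk_buff. */
struct toy_skb {
    unsigned char *data;
    unsigned char *tail;
    unsigned int len;
};

/* Cut the buffer down to 'len' bytes; a shorter buffer is left alone. */
static void toy_trim(struct toy_skb *skb, unsigned int len)
{
    if (skb->len > len) {
        skb->len = len;
        skb->tail = skb->data + len;
    }
    /* For a linear buffer, len always equals tail - data. */
    assert(skb->len == (unsigned int)(skb->tail - skb->data));
}
```

A typical use is dropping trailing bytes that are not part of the packet, for example padding added to reach a minimum frame size.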

skb support functions

There are a bunch of skb support functions provided by the sk_buff layer. 

allocation / free / copy / clone and expansion functions

struct sk_buff *alloc_skb(unsigned int size, int gfp_mask)
This function allocates a new skb. It is provided by the skb layer, which initializes some private data and keeps memory statistics. The returned buffer has no headroom and a tailroom of /size/ bytes.

void kfree_skb(struct sk_buff *skb)
Decrement the skb's usage count by one and free the skb if no references left.

struct sk_buff *skb_get(struct sk_buff *skb)
Increments the skb's usage count by one and returns a pointer to it.

struct sk_buff *skb_clone(struct sk_buff *skb, int gfp_mask)
This function clones a skb. Both copies share the packet data but have their own struct sk_buff. The new copy is not owned by any socket, reference count is 1.

struct sk_buff *skb_copy(const struct sk_buff *skb, int gfp_mask)
Makes a real copy of the skb, including packet data. This is needed, if You wish to modify the packet data. Reference count of the new skb is 1.

struct sk_buff *skb_copy_expand(const struct sk_buff *skb, int new_headroom, int new_tailroom, int gfp_mask)
Makes a copy of the skb, including packet data. Additionally, the new skb has a headroom of /new_headroom/ bytes and a tailroom of /new_tailroom/ bytes.

ancillary functions

int skb_cloned(struct sk_buff *skb)
Is the skb a clone?

int skb_shared(struct sk_buff *skb)
Is this skb shared? (is the reference count > 1)?

operations on lists of skb's

struct sk_buff *skb_peek(struct sk_buff_head *list_)
peek a skb from front of the list; does not remove skb from the list

struct sk_buff *skb_peek_tail(struct sk_buff_head *list_)
peek a skb from tail of the list; does not remove skb from the list

__u32 skb_queue_len(struct sk_buff_head *list_)
return the length of the given skb list

void skb_queue_head(struct sk_buff_head *list_, struct sk_buff *newsk)
enqueue a skb at the head of a given list

void skb_queue_tail(struct sk_buff_head *list_, struct sk_buff *newsk)
enqueue a skb at the end of a given list. 

int skb_headroom(struct sk_buff *skb)
return the amount of bytes of free space at the head of skb

int skb_tailroom(struct sk_buff *skb)
return the amount of bytes of free space at the end of skb

int skb_cow(struct sk_buff *skb, unsigned int headroom)
If the buffer passed lacks sufficient headroom or is a clone, its data is reallocated with the additional headroom made available. Returns 0 on success.

struct sk_buff *skb_dequeue(struct sk_buff_head *list_)
skb_dequeue() takes the first buffer from a list (dequeues an skb from the head of the given list). If the list is empty, a NULL pointer is returned. This is used to pull buffers off queues. The buffers are added with the routines skb_queue_head() and skb_queue_tail().

struct sk_buff *skb_dequeue_tail(struct sk_buff_head *list_)
dequeue an skb from the tail of the given list
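The list operations above behave like a double-ended queue. The following userspace sketch shows that behaviour with hypothetical toy_* names. It uses a plain NULL-terminated doubly linked list and no locking, so it models the semantics only, not the kernel's struct sk_buff_head implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Toy buffer with list links and an id to tell buffers apart. */
struct toy_skb {
    struct toy_skb *next, *prev;
    int id;
};

struct toy_queue {
    struct toy_skb *first, *last;
    unsigned int qlen;
};

static void toy_queue_init(struct toy_queue *q)
{
    q->first = q->last = NULL;
    q->qlen = 0;
}

/* Enqueue at the head of the list. */
static void toy_queue_head(struct toy_queue *q, struct toy_skb *skb)
{
    skb->prev = NULL;
    skb->next = q->first;
    if (q->first)
        q->first->prev = skb;
    else
        q->last = skb;
    q->first = skb;
    q->qlen++;
}

/* Enqueue at the tail of the list. */
static void toy_queue_tail(struct toy_queue *q, struct toy_skb *skb)
{
    skb->next = NULL;
    skb->prev = q->last;
    if (q->last)
        q->last->next = skb;
    else
        q->first = skb;
    q->last = skb;
    q->qlen++;
}

/* Look at the first buffer without removing it. */
static struct toy_skb *toy_peek(const struct toy_queue *q)
{
    return q->first;
}

/* Remove and return the first buffer, or NULL if the list is empty. */
static struct toy_skb *toy_dequeue(struct toy_queue *q)
{
    struct toy_skb *skb = q->first;

    if (!skb)
        return NULL;
    assert(q->qlen > 0);
    q->first = skb->next;
    if (q->first)
        q->first->prev = NULL;
    else
        q->last = NULL;
    skb->next = skb->prev = NULL;
    q->qlen--;
    return skb;
}

/* Remove and return the last buffer, or NULL if the list is empty. */
static struct toy_skb *toy_dequeue_tail(struct toy_queue *q)
{
    struct toy_skb *skb = q->last;

    if (!skb)
        return NULL;
    assert(q->qlen > 0);
    q->last = skb->prev;
    if (q->last)
        q->last->next = NULL;
    else
        q->first = NULL;
    skb->next = skb->prev = NULL;
    q->qlen--;
    return skb;
}
```

Queuing at the tail and dequeuing from the head gives FIFO order, which is the usual pattern for transmit and receive queues.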

Network device packet flow:

How to identify whether an skb is linear or not.
When all of the data is placed between skb->head and skb->end, the skb is called linear.
skb->data_len == 0 means the skb is linear.

Otherwise, the skb is not linear, which means skb->data_len != 0.
In that case the length of the data in the linear part (the part that starts at skb->data) is skb->len - skb->data_len.


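Pseudocode
----------
/* SKB is linear: all data sits between skb->data and skb->tail */
if (skb->data_len == 0) {
    printk(KERN_INFO "skb is linear: skb->len is %u\n", skb->len);
} else {
    /* SKB is not linear: compute the linear part without modifying skb->len */
    unsigned int head_len = skb->len - skb->data_len;   /* same value as skb_headlen(skb) */

    printk(KERN_INFO "skb is not linear: linear length is %u, total length is %u\n",
           head_len, skb->len);
}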

skb->data_len = sum of skb_shared_info->frags[0 ... nr_frags - 1].size
                + size of the data in the skbs on skb_shared_info->frag_list
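The length bookkeeping can be checked with a small userspace calculation. This is a sketch with hypothetical names (toy_frag, toy_shinfo, toy_skb), and the frag_list contribution is reduced to a single byte count for simplicity.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_FRAGS 4

/* One paged fragment: only the size matters for this calculation. */
struct toy_frag {
    unsigned int offset;
    unsigned int size;
};

/* Stand-in for struct skb_shared_info. */
struct toy_shinfo {
    unsigned int nr_frags;
    struct toy_frag frags[TOY_MAX_FRAGS];
    unsigned int frag_list_bytes;  /* total data held by skbs on frag_list */
};

struct toy_skb {
    unsigned int len;       /* linear part + non-linear part */
    unsigned int data_len;  /* non-linear part only */
    struct toy_shinfo shinfo;
};

/* data_len = sum of all fragment sizes + bytes on the frag_list. */
static unsigned int toy_data_len(const struct toy_shinfo *sh)
{
    unsigned int total = sh->frag_list_bytes;
    unsigned int i;

    assert(sh->nr_frags <= TOY_MAX_FRAGS);
    for (i = 0; i < sh->nr_frags; i++)
        total += sh->frags[i].size;
    return total;
}

/* Recompute both length fields for a given number of linear bytes. */
static void toy_update_len(struct toy_skb *skb, unsigned int linear_bytes)
{
    skb->data_len = toy_data_len(&skb->shinfo);
    skb->len = linear_bytes + skb->data_len;
}

/* Length of the linear part, the quantity the kernel calls the head length. */
static unsigned int toy_headlen(const struct toy_skb *skb)
{
    return skb->len - skb->data_len;
}

static int toy_is_nonlinear(const struct toy_skb *skb)
{
    return skb->data_len != 0;
}
```

For instance, 100 linear bytes plus two fragments of 1000 and 500 bytes give data_len = 1500 and len = 1600, so the head length is 100 again.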

When is an SKB non-linear?

First Model:
This is the common NIC driver model. The data is stored in different physical pages, and skb_shared_info holds an array (frags[]) of (page, offset, size) entries that record where each piece of data lives.


Second Model:
frag_list is used when reassembling fragmented IP packets:
Each fragment has its own skb structure. The fragments are linked through skb->next into a singly linked list, and the head of that list is stored in the frag_list field of the first skb's skb_shared_info.


Third Model:
This is the model used by GSO segmentation. When a large TCP packet is cut into several MTU-sized segments, the resulting skbs are also linked together through skb->next:


References:
http://wangcong.org/blog/archives/2337
http://www.skbuff.net/skbbasic.html
http://vger.kernel.org/~davem/skb_data.html
http://blog.csdn.net/npy_lp/article/details/7263902