Monday 27 May 2013

What is EPOLL? Epoll vs Poll vs Select call ? And How to implement UDP server in Linux using EPOLL?



Today in internet world, as the number of users are increasing day to day and to support these users it needs more efficient HTTP servers.

A common problem in HTTP server scalability is how to ensure that the server handles a large number of connections simultaneously without degrading the performance.

An event-driven approach is often implemented in high-performance network servers to multiplex a large number of concurrent connections over a few server processes. 

In event-driven servers it is important that the server focuses on connections that can be serviced without blocking its main process.

What is EPOLL?
===========
epoll - I/O event notification facility

Select Vs poll Vs Epoll
==================
The Epoll event mechanism  is designed to scale to larger numbers of connections than select and poll.

One of the problems with select and poll is that in a single call they must both inform the kernel of all of the events of interest and obtain new events.
This can result in large overheads, particularly in environments with large numbers of connections and relatively few new events occurring.

However, if your server application is network-intensive (e.g., 1000s of concurrent connections and/or a high connection rate), you should get really serious about performance.
This situation is often called the c10k problem. With select() or poll(), your network server will hardly perform any useful things but wasting precious CPU cycles under such high load.

c10k Problem
===========
Suppose that there are 10,000 concurrent connections. Typically, only a small number of file descriptors among them, say 10, are ready to read.
The rest 9,990 file descriptors are copied and scanned for no reason, for every select()/poll() call.

Another Example as :

The cost of  Epoll is closer to the number of file descriptors that actually have events on them.
If you're monitoring 200 file descriptors, but only 100 of them have events on them, then you're (very roughly) only paying for those 100 active file descriptors.
This is where Epoll tends to offer one of its major advantages over select. If you have a thousand clients that are mostly idle,
then when you use select you're still paying for all one thousands of them. However, with Epoll, it's like you've only got a few - you're only paying for the ones that are active at any given time.

All this means that epoll will lead to less CPU usage for most workloads

Time Complexity
=============

Select  -> O(n)   Epoll -> O(1)

Select calls, which are O(n), epoll is an O(1) algorithm – this means that it scales well as the number of watched file descriptors increase.
select uses a linear search through the list of watched file descriptors, which causes its O(n) behaviour, whereas epoll uses callbacks in the kernel file structure.

Another fundamental difference of epoll is that it can be used in an edge-triggered, as opposed to level-triggered, fashion.
 This means that you receive “hints” when the kernel believes the file descriptor has become ready for I/O, as opposed to being told “I/O can be carried out on this file descriptor”.

No of clients support is a Limitation in Select Call
==============================================
Using Select() call, Max number of clients it handle is 1024 (1k).

In other words, server is able to handle only 1024 client after which connections are failing.
Increased per process max open files (1024) to 100000 and still the connections failed at 1024.

select limitation

select fails after 1024 fds as FD_SETSIZE max to 1024.
As a natural progression poll was tried next to overcome max open fd issue.

poll limitation
poll solves the max fd issue. But as the number of concurrent clients started increasing, performance dropped drastically.
Poll implementation does O(n) operations internally and performance drops as number of fds increases.

epoll
Epoll solved both problems and gave awesome performance.

Triggering modes
=============

  • Edge Triggered Mode
  •  Level Triggered Mode

Epoll provides both edge-triggered and level-triggered modes. 

In edge-triggered mode, a call to epoll_wait will return only when a new event is en queued with the epoll object, while in level-triggered mode, epoll_wait will return as long as the condition holds.

For instance, if a pipe, registered with epoll, has received data, a call to epoll_wait will return, signaling the presence of data to be read.
Suppose the reader only consumed part of data from the buffer. In level-triggered mode, further calls to epoll_wait will return immediately, as long as the pipe's buffer contains data to be read.
In edge-triggered mode, however, epoll_wait will return only once new data is written to the pipe

To Understand Better…..

When an FD becomes read or write ready, you might not want necessarily want to read (or write) all the data immediately.

Level-triggered epoll will keep nagging you as long as the FD remains ready, whereas edge-triggered won't bother you again until the next time you get an EAGAIN
(so it's more complicated to code around, but can be more efficient depending on what you need to do).

Say you're writing from a resource to an FD. If you register your interest for that FD becoming write ready as level-triggered, you'll get constant notification that the FD is still ready for writing.
If the resource isn't yet available, that's a waste of a wake-up, because you can't write any more anyway.

If you were to add it as edge-triggered instead, you'd get notification that the FD was write ready once, then when the other resource becomes ready you write as much as you can.
Then if write(2) returns EAGAIN, you stop writing and wait for the next notification.

The same applies for reading, because you might not want to pull all the data into user-space before you're ready to do whatever you want to do with it
 (thus having to buffer it, etc etc). With edge-triggered epoll you get told when it's ready to read, and then can remember that and do the actual reading "as and when".

EPOLL SYSTEM Calls
==================

The Epoll interface consists of three system calls:

int epoll_create(int size);

Creates an epoll object and returns its file descriptor. size is obsolete since kernel 2.6.8 but must be greater than zero for backwards compatibility.

int epoll_ctl(int epfd, int op, int fd, struct epoll_event *event);

Controls (configures) which file descriptors are watched by this object, and for which events. op can be ADD, MODIFY or DELETE.

int epoll_wait(int epfd, struct epoll_event *events, int maxevents, int timeout);

Waits for any of the events registered for with epoll_ctl, until at least one occurs or the timeout elapses. Returns the occurred events in events, up to maxevents at once.


 UDP SERVER IMPLEMENTED USING EPOLL
==========================================


#include <stdio.h>          // for printf() and fprintf()
#include <sys/socket.h>     // for socket(), bind(), and connect()
#include <arpa/inet.h>      // for sockaddr_in and inet_ntoa()
#include <stdlib.h>         // for atoi() and exit()
#include <string.h>         // for memset()
#include <unistd.h>         // for close()
#include <fcntl.h>          // for fcntl()
#include <errno.h>
#include <sys/epoll.h>

#define MAX_EVENTS 100

#define BUFFSIZE 5096

unsigned char buf[BUFFSIZE];

/*
 * Dump Data
 */
void dumpData(unsigned char *data,  unsigned int len)
{
  unsigned int uIndx;

  if(data)
    {
      for(uIndx=0; uIndx<len; ++uIndx)
        {
          if(uIndx%32 == 0)
            {
              printf("\n%4d:", uIndx);
            }
          if(uIndx%4 == 0)
            {
              printf(" ");
            }
          printf("%02x", data[uIndx]);
        }
    }
  printf(" Length of Bytes: %d\n", len);
  printf("\n");
}


/*
 * make_socket_non_blocking :
 *   This Function makes socket as Non blocking
 */
static int make_socket_non_blocking(int sockFd)
{
  int getFlag, setFlag;
 
  getFlag = fcntl(sockFd, F_GETFL, 0);
 
  if(getFlag == -1)
  {
    perror("fnctl");
    return -1;
  }
 
  /* Set the Flag as Non Blocking Socket */
  getFlag |= O_NONBLOCK;
 
  setFlag = fcntl(sockFd, F_GETFL, getFlag);
 
  if(setFlag == -1)
  {
    perror("fnctl");
    return -1;
  }
 
  return 0;
}

/*
 *  Main Routine
 */
int main()
{
  int i, length, receivelen;

  /* Socket Parameters */
  int sockFd;
  int optval = 1;   // Socket Option Always = 1

  /* Server Address */
  struct sockaddr_in serverAddr, receivesocket;

  /* Epoll File Descriptor */
  int epollFd;      

  /* EPOLL Event structures */
  struct epoll_event  ev;                  
  struct epoll_event  events[MAX_EVENTS];               
  int numEvents;    
             
  int ctlFd; 
  // Step 1: First Create UDP Socket 
 
  /* Create UDP socket
   * socket(protocol_family, sock_type, IPPROTO_UDP);
   */
  sockFd = socket(AF_INET, SOCK_DGRAM, IPPROTO_UDP);

  /* Check socket is successful or not */
  if (sockFd == -1)
  {
    perror(" Create SockFd Fail \n");
    return -1;
  }

  // Step 2: Make Socket as Non Blocking Socket.
  //         To handle multiple clients Asychronously, required to
  //         configure socket as Non Blocking socket

  /* Make Socket as Non Blocking Socket */
  make_socket_non_blocking(sockFd);

  // Step 3: Set socket options
  //    One can set different sock Options as RE-USE ADDR, 

  //    BROADCAST etc.
 
  /*  In this Program, the socket is set to RE-USE ADDR
   *  So this gives flexibilty to other sockets to BIND to the 

      same port Num */

  if(setsockopt(sockFd, SOL_SOCKET, SO_REUSEADDR, &optval, sizeof(optval))== -1)
  {
     perror("setsockopt Fail\n");
     return -1;
  }

  // Step 4: Bind to the Recieve socket
  /* Bind to its own port Num  ( Listen on Port Number) */
  
  /* Setup the addresses */
 
   /* my address or Parameters
     ( These are required for Binding the Port and IP Address )
      Bind to my own port and Address */
  memset(&receivesocket, 0, sizeof(receivesocket));
  receivesocket.sin_family = AF_INET;
  receivesocket.sin_addr.s_addr = htonl(INADDR_ANY);
  receivesocket.sin_port = htons(2905);

  receivelen = sizeof(receivesocket);

  /* Bind the my Socket */
  if (bind(sockFd, (struct sockaddr *) &receivesocket, receivelen) < 0)
  {
    perror("bind");
    return -1;
  }

  // EPOLL Implementation Starts
  // Step 5: Create Epoll Instance
             /* paramater is Optional */
 
  epollFd = epoll_create(6);

  if(epollFd == -1)
  {
     perror("epoll_create");
     return -1;
  }

  /* Add the udp Sock File Descriptor to Epoll Instance */
  ev.data.fd = sockFd;
 
  /* Events are Read Only and Edge-Triggered */
  ev.events = EPOLLIN | EPOLLET;

 
  // Step 6: control interface for an epoll descriptor
  /* EPOLL_CTL_ADD
      Register the target file descriptor fd on the epoll instance
      referred to by the file descriptor epfd and
      associate the event event with the internal file linked to fd.
  */


  /* Add the sock Fd to the EPOLL */
  ctlFd  = epoll_ctl (epollFd, EPOLL_CTL_ADD, sockFd, &ev);
 
  if (ctlFd == -1)
  {
    perror ("epoll_ctl");
    return -1;
  }

 // Step 7: Start the Event Loop using epoll_wait() in while Loop.

 /* Event Loop */
 while(1)
 {
     /*  Wait for events.
      *  int epoll_wait(int epfd, struct epoll_event *events, int
      *  maxevents, int timeout);
      *  Specifying a timeout of -1 makes epoll_wait() wait
      *  indefinitely.
      */
    
     /* Epoll Wait Indefently since Time Out is -1 */
     numEvents = epoll_wait(epollFd, events, MAX_EVENTS, -1);

     for (i = 0; i < numEvents; i++)
     {
       if ((events[i].events & EPOLLERR) ||
           (events[i].events & EPOLLHUP) ||
           (!(events[i].events & EPOLLIN)))
        {
           /* An error has occured on this fd, or  the socket is not
            * ready for reading (why were we notified then?)
            */
           fprintf (stderr, "epoll error\n");
           close (events[i].data.fd);
           continue;
        }
       /* We have data on the fd waiting to be read. Read and
        * display it. We must read whatever data is available
        * completely, as we are running in edge-triggered mode
        * and won't get a notification again for the same data.
        */
       else if ( (events[i].events & EPOLLIN) &&
           (sockFd == events[i].data.fd) )
       {
         while (1)
         {

           memset(buf, 0, BUFFSIZE);
           /* Recieve the Data from Other system */
           if ((length = recvfrom(sockFd, buf, BUFFSIZE, 0, NULL, NULL)) < 0)
            {
                perror("recvfrom");
                return -1;
            }

           else if(length == 0)
             {
               printf( " The Return Value is 0\n");
               break;
             }
           else
             {
               /* Print The data */
               printf("Recvd Byte length : %d",  length);
               dumpData(buf, length);
             }
          }
       }
     }
  }

close( sockFd );
close( epollFd );
return 0;
}



==============================================================================
UDP CLIENT -> udpclient.c
==============================================================================

#include <stdio.h>
#include <arpa/inet.h>
#include <string.h>
#include<stdlib.h>
#include <sys/unistd.h>
#include <sys/fcntl.h>


#define BUFFSIZE 5096
#define MAX_LEN 100000

int sendlen, receivelen;
int received = 0, i,count, rcvCnt=0, sentCnt=0;
unsigned char buffer[BUFFSIZE];
struct sockaddr_in receivesocket;
struct sockaddr_in sendsocket;
int sock;
unsigned int ch;
unsigned int noOfTimes;
   
int sendUDPData();  
   
int main(int argc, char *argv[]) {
   
    int ret = 0;
  int optval = 1;

    /* Create the UDP socket */
    if ((sock = socket(PF_INET, SOCK_DGRAM, IPPROTO_UDP)) < 0) {
        perror("socket");
        return -1;
    }

    /* my address */
    memset(&receivesocket, 0, sizeof(receivesocket));
    receivesocket.sin_family = AF_INET;
    receivesocket.sin_addr.s_addr = htonl(INADDR_ANY);
    receivesocket.sin_port = htons(2905);

    receivelen = sizeof(receivesocket);

 if(setsockopt(sock, SOL_SOCKET, SO_REUSEADDR, &optval, sizeof(optval))== -1)
  {
     perror("setsockopt Fail\n");
     return -1;
  }
  if (bind(sock, (struct sockaddr *) &receivesocket, receivelen) < 0) 

    {
        perror("bind");
        return -1;      
    } 
    /* kernel address */
    memset(&sendsocket, 0, sizeof(sendsocket));
    sendsocket.sin_family = AF_INET;
    sendsocket.sin_addr.s_addr = inet_addr("10.12.7.95");
    sendsocket.sin_port = htons(2905);

   do
    {
       printf("\n");
       printf(" Enter your choice:\t");
       printf(" 1. Send UDP Data \n");
       printf(" 2. exit \n");
       scanf("%d", &ch);
       printf("\n");

       switch(ch)
       {

           case 1:
                   printf("Enter the Length of the Payload \n");
                   scanf("%d", &sendlen);
                   printf("Enter How many times you want to send data \n");
                   scanf("%d", &noOfTimes);
                   sendUDPData();
                   break;

           default:
                  printf("Invalid Choice\n");
                  break;
       }
    }while(ch!=2);
return 0;
}
int sendUDPData()
{
    int count=0;
        memset(buffer, 0x31, sendlen);
       
        for(count=0; count< noOfTimes;  count++)
        {
       if (sendto(sock, buffer, sendlen, 0, (struct sockaddr *) &sendsocket,
                        sizeof(sendsocket)) != sendlen)      
       {
        perror("sendto");
        return -1;
       }

    printf("\n");
    }
    return 0;
}

Monday 13 May 2013

Simple Memory pool Implementation for Real Time Applications

Dear readers,

   In this post, I would like to introduce about the Simple Memory Pool Implementation which can be used for real time applications.

Before looking in to actual Implementation, first know about what is memory pool and why is required in real time applications.

In real time applications, performance plays a very important role. That means any real time system should have high performance to get desirable results else it is considered as useless system.

Hence, when ever a developer writing any code, he/she should keep in mind that how much code optimization and performance that is going to achieve with the implementation.

In the code often developers allocate the memory using dynamic allocation macros such as MALLOC in 
case of C and NEW in case of C++. 

Suppose, you have packet receptor system which receives the packets from other device. And this receptor system responsibility to add a extra header to the received packet and sends the modified packet out to the end device. 

What happens when code allocates/dellocates memory using malloc/free for every packet. 

Do you think there is a performance degradation? Yes, obviously there is performance degradation. 

Then what is the solution? Solution is Memory pools.
   A more efficient solution is preallocating a number of memory blocks with the same size called the memory pool. The application can allocate, access, and free blocks represented by handles at run time.

What is Memory Pool?
   Memory pool is pre-allocation of fixed size of memory or it also defined as fixed size of memory allocated using MALLOC or New during initialization of the system.

Why it is required?
   To avoid the performance degradation in real time systems.

How it is implemented?

Step 1 :  Create a memory Pool. 
Application needs to initialize a pool. Application allocates the memory required during initialization time using MALLOC/NEW.
Step 2:    Get memory from Pool
Whenever a memory is required and the same memory would be fetched from the Pool.              
Step3:   Add memory to Pool
Once the Memory is no more used then add the same memory to Pool.
Step4:  Destroy/Deallocate memory Pool
And finally, before application quits, it should de-allocate the pool, to make sure that all memory blocks allocated by the factory are released back to the operating system. After this, of course no more memory pool allocation can be requested.

Simple Program for Memory Pool:



#include<stdio.h>
#include<stdlib.h>
#include<string.h>

#define MAX_MEMORY_POOL_NODES 10

/* Node */
typedef struct node
{
    int    data;
    int    nodeAllocated;
}NODE;

typedef struct Memorypool
{
    NODE       *array[MAX_MEMORY_POOL_NODES];
    int        nodeCounter;
}MEMORYPOOL;

MEMORYPOOL allocNode;

/*
 * initialiseMemoryPool()
 */
void initialiseMemoryPool()
{
   unsigned char uIndx;
   
   /* Initialise the Memory for the Nodes */      
   for(uIndx = 0; uIndx<MAX_MEMORY_POOL_NODES; uIndx++)
   {
      allocNode.array[uIndx]= (NODE *)malloc(sizeof(NODE));
      allocNode.array[uIndx]->nodeAllocated = 0;

      allocNode.nodeCounter++;
   }
}

NODE *getTheFreeNodeFromPool()
{
  int uIndx;
  
   /* Get the Memory from the Pool of Nodes */
   for(uIndx = 0; uIndx<MAX_MEMORY_POOL_NODES; uIndx++)
   {
      if(allocNode.array[uIndx]->nodeAllocated == 0) 
      {
          allocNode.array[uIndx]->nodeAllocated = 1;
          allocNode.nodeCounter--; 
          break;
      }
   }  

  if(uIndx == MAX_MEMORY_POOL_NODES) 
  {
     printf(" No Free Nodes are available \n");
     return NULL;
  }
  return allocNode.array[uIndx];
}

/* 
 *  Add the Node to Memory pool
 */
void addTheNodeToMemoryPool(unsigned char uIndx)
{
   /* Add the Node to Pool */
   if(allocNode.array[uIndx]->nodeAllocated == 1 )
   {
         allocNode.array[uIndx]->data = 0;
         allocNode.array[uIndx]->nodeAllocated = 0; 
         allocNode.nodeCounter++;
    }
   else
   {
      printf("No Nodes are there Add \n");
      return;
   }
}

/*
 * deallocateTheMemoryPool-  Deallocate the Memory Pool
 */
void deallocateTheMemoryPool(void)
{
  unsigned char uIndx;

  /* De-allocate the Memory Pool */
  for(uIndx = 0; uIndx<MAX_MEMORY_POOL_NODES; uIndx++)
    {
        free(allocNode.array[uIndx]);
    }
}
              
/* Main */
int main()
{
    unsigned char uIndx, index;
   
   // Step 1:
   /* Initialise the Memory Pool */
   initialiseMemoryPool();

   // Step 2:
   /* Get the Node from the Memory Pool */
   for(uIndx=0 ; uIndx< MAX_MEMORY_POOL_NODES; uIndx++)
   { 
      NODE *node= NULL; 
      int data;
      
      node = getTheFreeNodeFromPool();

      if(node)
      {
          printf("Enter the Data to be added in the Linklist \n");
          scanf("%d", &data);
    
          node->data = data;
      }
   }
  
   /* Display the Data */
   printf(" Display Data \n");
   for (uIndx=0 ; uIndx<MAX_MEMORY_POOL_NODES; uIndx++)
    {
      printf("%d\n", allocNode.array[uIndx]->data);
    }

   
   //Step 3:
   /* Add the Node to Memory Pool */   
   printf(" Enter the Node to be added to memory pool \n");
   scanf("%d", &index);
   
   addTheNodeToMemoryPool(index);

   /* Display the Data */
   printf(" Display Data after adding Node to Free Pool \n");
   for (uIndx=0 ; uIndx<MAX_MEMORY_POOL_NODES; uIndx++)
    {
      printf("%d\n", allocNode.array[uIndx]->data);
    }

   //Step 4:
   /* De Allocate the Memory Pool */
   deallocateTheMemoryPool();
   return 0;
}