Today, the number of internet users grows every day, and supporting them requires more efficient HTTP servers.
A common problem in HTTP server scalability is how to handle a large number of simultaneous connections without degrading performance.
An event-driven approach is often used in high-performance network servers to multiplex a large number of concurrent connections over a few server processes.
In event-driven servers it is important that the server focuses on connections that can be serviced without blocking its main process.
What is EPOLL?
===========
epoll - I/O event notification facility
Select Vs poll Vs Epoll
==================
The epoll event mechanism is designed to scale to larger numbers of connections than select and poll.
One of the problems with select and poll is that a single call must both inform the kernel of all the events of interest and obtain new events.
This can result in large overheads, particularly in environments with large numbers of connections and relatively few new events occurring.
If your server application is network-intensive (e.g., thousands of concurrent connections and/or a high connection rate), you should get really serious about performance.
This situation is often called the c10k problem. With select() or poll(), a network server under such high load spends most of its CPU cycles on bookkeeping instead of useful work.
c10k Problem
===========
Suppose that there are 10,000 concurrent connections. Typically, only a small number of file descriptors among them, say 10, are ready to read.
The remaining 9,990 file descriptors are copied and scanned for no reason on every select()/poll() call.
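The wasted work described above can be made visible with a small sketch. It assumes a hypothetical helper, poll_once(), and a hypothetical struct scan_result that are not part of any library: the helper hands the whole descriptor array to poll() and then counts how many entries the application had to inspect to find the ready ones.

```c
#include <poll.h>

/* Hypothetical result type: how much work one poll() round cost us. */
struct scan_result {
    int ready;    /* descriptors that actually have data to read */
    int scanned;  /* array entries the application had to inspect */
};

/* Hypothetical helper: performs one non-blocking poll() over n entries. */
struct scan_result poll_once(struct pollfd *fds, nfds_t n)
{
    struct scan_result r = {0, 0};

    /* The complete array is passed to the kernel on every call. */
    int rc = poll(fds, n, 0);
    if (rc <= 0)
        return r;  /* nothing ready, or an error */

    /* poll() reports how many are ready, not which ones,
     * so every entry must be checked. */
    for (nfds_t i = 0; i < n; i++) {
        r.scanned++;
        if (fds[i].revents & POLLIN)
            r.ready++;
    }
    return r;
}
```

With 10,000 entries and 10 ready descriptors, scanned would be 10,000 while ready would be 10, which is the overhead the c10k discussion refers to.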
Another example:
The cost of epoll is closer to the number of file descriptors that actually have events on them.
If you're monitoring 200 file descriptors, but only 100 of them have events on them, then you're (very roughly) only paying for those 100 active file descriptors.
This is where epoll tends to offer one of its major advantages over select. If you have a thousand clients that are mostly idle, then with select you're still paying for all one thousand of them. However, with epoll, it's like you've only got a few: you're only paying for the ones that are active at any given time.
All this means that epoll will lead to less CPU usage for most workloads.
Time Complexity
=============
select -> O(n), epoll -> O(1)
A select call is O(n) in the number of watched file descriptors, whereas epoll is often described as O(1): the cost of epoll_wait depends on the number of ready descriptors, not on the number watched, so it scales well as the number of watched file descriptors increases.
select uses a linear search through the list of watched file descriptors, which causes its O(n) behaviour, whereas epoll uses callbacks in the kernel file structure.
Another fundamental difference is that epoll can be used in an edge-triggered, as opposed to level-triggered, fashion.
This means that you receive “hints” when the kernel believes the file descriptor has become ready for I/O, as opposed to being told “I/O can be carried out on this file descriptor”.
Number of Clients Supported is a Limitation in Select Call
==============================================
Using the select() call, the maximum number of clients the server can handle is 1024 (1k).
In other words, the server is able to handle only 1024 clients, after which connections fail.
Increasing the per-process limit on open files from 1024 to 100000 did not help: connections still failed at 1024.
select limitation
select fails beyond 1024 fds because FD_SETSIZE is fixed at 1024: an fd_set can only represent descriptors numbered 0 to FD_SETSIZE - 1.
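Because the size of an fd_set is fixed at compile time, raising the open-file limit cannot help, and passing a descriptor outside the range 0 to FD_SETSIZE - 1 to FD_SET() is undefined behaviour. A minimal sketch of a guard follows; safe_fd_set() is a hypothetical wrapper name chosen for illustration, not a library function.

```c
#include <sys/select.h>

/* Hypothetical wrapper around FD_SET(): refuses descriptors that an
 * fd_set cannot represent instead of writing outside the bitmap. */
int safe_fd_set(int fd, fd_set *set)
{
    if (fd < 0 || fd >= FD_SETSIZE)
        return -1;  /* out of range for select() */
    FD_SET(fd, set);
    return 0;
}
```

A server built on select() could call such a guard on every accepted descriptor and reject the connection cleanly when it returns -1.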
As a natural progression, poll was tried next to overcome the max open fd issue.
poll limitation
poll solves the max fd issue. But as the number of concurrent clients increased, performance dropped drastically.
The poll implementation does O(n) work internally, so performance drops as the number of fds increases.
epoll
epoll solved both problems: it has no FD_SETSIZE cap, and its cost does not grow with the number of idle fds.
Triggering modes
=============
- Edge Triggered Mode
- Level Triggered Mode
Epoll provides both edge-triggered and level-triggered modes.
In edge-triggered mode, a call to epoll_wait will return only when a new event is enqueued with the epoll object, while in level-triggered mode, epoll_wait will return as long as the condition holds.
For instance, if a pipe registered with epoll has received data, a call to epoll_wait will return, signaling the presence of data to be read.
Suppose the reader consumed only part of the data from the buffer. In level-triggered mode, further calls to epoll_wait will return immediately, as long as the pipe's buffer contains data to be read.
In edge-triggered mode, however, epoll_wait will return only once new data is written to the pipe.
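The pipe scenario above can be tried out with a short experiment. This is only a sketch: count_wakeups() is a hypothetical helper that writes once to a pipe, never reads from it, and counts how many of the following non-blocking epoll_wait calls report the read end as ready.

```c
#include <stdint.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical helper: returns how many of `attempts` epoll_wait calls
 * report the pipe as readable when the data is never consumed.
 * Pass 0 for level-triggered or EPOLLET for edge-triggered mode. */
int count_wakeups(uint32_t extra_flags, int attempts)
{
    int pipefd[2];
    int wakeups = 0;
    struct epoll_event ev = {0};

    if (pipe(pipefd) == -1)
        return -1;

    int epfd = epoll_create(1);
    if (epfd == -1) {
        close(pipefd[0]);
        close(pipefd[1]);
        return -1;
    }

    /* Watch the read end of the pipe in the requested mode. */
    ev.events = EPOLLIN | extra_flags;
    ev.data.fd = pipefd[0];
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, pipefd[0], &ev) == -1 ||
        write(pipefd[1], "hello", 5) != 5) {
        wakeups = -1;
        goto out;
    }

    /* Poll repeatedly with a zero timeout and never read the data. */
    for (int i = 0; i < attempts; i++) {
        struct epoll_event got;
        if (epoll_wait(epfd, &got, 1, 0) > 0)
            wakeups++;
    }

out:
    close(epfd);
    close(pipefd[0]);
    close(pipefd[1]);
    return wakeups;
}
```

Calling count_wakeups(0, 3) should report the pipe on every attempt, while count_wakeups(EPOLLET, 3) should report it only once, because no new data arrives after the first notification.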
To understand better:
When an FD becomes read or write ready, you might not necessarily want to read (or write) all the data immediately.
Level-triggered epoll will keep nagging you as long as the FD remains ready, whereas edge-triggered won't bother you again until the FD becomes ready anew after you have hit EAGAIN (so it's more complicated to code around, but can be more efficient depending on what you need to do).
Say you're writing from a resource to an FD. If you register your interest for that FD becoming write ready as level-triggered, you'll get constant notification that the FD is still ready for writing.
If the resource isn't yet available, that's a waste of a wake-up, because you can't write any more anyway.
If you were to add it as edge-triggered instead, you'd get notification that the FD was write ready once, then when the other resource becomes ready you write as much as you can.
Then if write(2) returns EAGAIN, you stop writing and wait for the next notification.
The same applies for reading, because you might not want to pull all the data into user-space before you're ready to do whatever you want to do with it (thus having to buffer it, etc.). With edge-triggered epoll you get told when it's ready to read, and then can remember that and do the actual reading "as and when".
EPOLL SYSTEM Calls
==================
The epoll interface consists of three system calls:
int epoll_create(int size);
Creates an epoll object and returns its file descriptor. The size argument has been ignored since kernel 2.6.8 but must be greater than zero for backwards compatibility.
int epoll_ctl(int epfd, int op, int fd, struct epoll_event *event);
Controls (configures) which file descriptors are watched by this object, and for which events. op can be EPOLL_CTL_ADD, EPOLL_CTL_MOD or EPOLL_CTL_DEL.
int epoll_wait(int epfd, struct epoll_event *events, int maxevents, int timeout);
Waits for any of the events registered with epoll_ctl, until at least one occurs or the timeout (in milliseconds) elapses. Returns the occurred events in events, up to maxevents at once.
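Before the full server listing, the three calls can be seen working together in a much smaller sketch. wait_readable() is a hypothetical helper written for illustration: it creates an epoll object, registers one descriptor for EPOLLIN, and waits once.

```c
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if fd becomes readable within
 * timeout_ms, 0 on timeout, -1 on error. */
int wait_readable(int fd, int timeout_ms)
{
    struct epoll_event ev = {0};
    struct epoll_event ready;
    int epfd, n;

    /* 1. epoll_create: the size argument only needs to be > 0. */
    epfd = epoll_create(1);
    if (epfd == -1)
        return -1;

    /* 2. epoll_ctl: register interest in "readable" events on fd. */
    ev.events = EPOLLIN;
    ev.data.fd = fd;
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev) == -1) {
        close(epfd);
        return -1;
    }

    /* 3. epoll_wait: block until an event arrives or the timeout expires. */
    n = epoll_wait(epfd, &ready, 1, timeout_ms);
    close(epfd);

    if (n == -1)
        return -1;
    return (n == 1 && (ready.events & EPOLLIN)) ? 1 : 0;
}
```

Creating and destroying the epoll object on every call is only acceptable in a demonstration; a server keeps one epoll object alive for its whole event loop, as the listing below does.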
UDP SERVER IMPLEMENTED USING EPOLL
==========================================
#include <stdio.h> // for printf() and fprintf()
#include <sys/socket.h> // for socket(), bind(), and connect()
#include <arpa/inet.h> // for sockaddr_in and inet_ntoa()
#include <stdlib.h> // for atoi() and exit()
#include <string.h> // for memset()
#include <unistd.h> // for close()
#include <fcntl.h> // for fcntl()
#include <errno.h>
#include <sys/epoll.h>
#define MAX_EVENTS 100
#define BUFFSIZE 5096
unsigned char buf[BUFFSIZE];
/*
* Dump Data
*/
void dumpData(unsigned char *data, unsigned int len)
{
unsigned int uIndx;
if(data)
{
for(uIndx=0; uIndx<len; ++uIndx)
{
if(uIndx%32 == 0)
{
printf("\n%4u:", uIndx);
}
if(uIndx%4 == 0)
{
printf(" ");
}
printf("%02x", data[uIndx]);
}
}
printf(" Length of Bytes: %u\n", len);
printf("\n");
}
/*
* make_socket_non_blocking :
* This Function makes socket as Non blocking
*/
static int make_socket_non_blocking(int sockFd)
{
int getFlag, setFlag;
getFlag = fcntl(sockFd, F_GETFL, 0);
if(getFlag == -1)
{
perror("fcntl");
return -1;
}
/* Set the Flag as Non Blocking Socket */
getFlag |= O_NONBLOCK;
/* F_SETFL writes the updated flags back; F_GETFL would only read them */
setFlag = fcntl(sockFd, F_SETFL, getFlag);
if(setFlag == -1)
{
perror("fcntl");
return -1;
}
return 0;
}
/*
* Main Routine
*/
int main()
{
int i, length, receivelen;
/* Socket Parameters */
int sockFd;
int optval = 1; // Socket Option Always = 1
/* Server Address */
struct sockaddr_in serverAddr, receivesocket;
/* Epoll File Descriptor */
int epollFd;
/* EPOLL Event structures */
struct epoll_event ev;
struct epoll_event events[MAX_EVENTS];
int numEvents;
int ctlFd;
// Step 1: First Create UDP Socket
/* Create UDP socket
* socket(protocol_family, sock_type, IPPROTO_UDP);
*/
sockFd = socket(AF_INET, SOCK_DGRAM, IPPROTO_UDP);
/* Check socket is successful or not */
if (sockFd == -1)
{
perror("socket");
return -1;
}
// Step 2: Make Socket as Non Blocking Socket.
// To handle multiple clients asynchronously, the socket must be
// configured as a non-blocking socket
/* Make Socket as Non Blocking Socket */
if (make_socket_non_blocking(sockFd) == -1)
{
close(sockFd);
return -1;
}
// Step 3: Set socket options
// One can set different sock Options as RE-USE ADDR,
// BROADCAST etc.
/* In this program, the socket is set to RE-USE ADDR,
* which gives other sockets the flexibility to BIND to the
same port number */
if(setsockopt(sockFd, SOL_SOCKET, SO_REUSEADDR, &optval, sizeof(optval))== -1)
{
perror("setsockopt");
return -1;
}
// Step 4: Bind to the Receive socket
/* Bind to its own port Num ( Listen on Port Number) */
/* Setup the addresses */
/* my address or Parameters
( These are required for Binding the Port and IP Address )
Bind to my own port and Address */
memset(&receivesocket, 0, sizeof(receivesocket));
receivesocket.sin_family = AF_INET;
receivesocket.sin_addr.s_addr = htonl(INADDR_ANY);
receivesocket.sin_port = htons(2905);
receivelen = sizeof(receivesocket);
/* Bind the my Socket */
if (bind(sockFd, (struct sockaddr *) &receivesocket, receivelen) < 0)
{
perror("bind");
return -1;
}
// EPOLL Implementation Starts
// Step 5: Create Epoll Instance
/* The size parameter is ignored by modern kernels but must be > 0 */
epollFd = epoll_create(6);
if(epollFd == -1)
{
perror("epoll_create");
return -1;
}
/* Add the udp Sock File Descriptor to Epoll Instance */
ev.data.fd = sockFd;
/* Events are Read Only and Edge-Triggered */
ev.events = EPOLLIN | EPOLLET;
// Step 6: control interface for an epoll descriptor
/* EPOLL_CTL_ADD
Register the target file descriptor fd on the epoll instance
referred to by the file descriptor epfd and
associate the event event with the internal file linked to fd.
*/
/* Add the sock Fd to the EPOLL */
ctlFd = epoll_ctl (epollFd, EPOLL_CTL_ADD, sockFd, &ev);
if (ctlFd == -1)
{
perror ("epoll_ctl");
return -1;
}
// Step 7: Start the Event Loop using epoll_wait() in while Loop.
/* Event Loop */
while(1)
{
/* Wait for events.
* int epoll_wait(int epfd, struct epoll_event *events, int
* maxevents, int timeout);
* Specifying a timeout of -1 makes epoll_wait() wait
* indefinitely.
*/
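/* Wait indefinitely, since the timeout is -1 */
numEvents = epoll_wait(epollFd, events, MAX_EVENTS, -1);
if (numEvents == -1)
{
if (errno == EINTR)
{
continue; /* Interrupted by a signal: retry */
}
perror("epoll_wait");
return -1;
}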
for (i = 0; i < numEvents; i++)
{
if ((events[i].events & EPOLLERR) ||
(events[i].events & EPOLLHUP) ||
(!(events[i].events & EPOLLIN)))
{
/* An error has occurred on this fd, or the socket is not
* ready for reading (why were we notified then?)
*/
fprintf (stderr, "epoll error\n");
close (events[i].data.fd);
continue;
}
/* We have data on the fd waiting to be read. Read and
* display it. We must read whatever data is available
* completely, as we are running in edge-triggered mode
* and won't get a notification again for the same data.
*/
else if ( (events[i].events & EPOLLIN) &&
(sockFd == events[i].data.fd) )
{
while (1)
{
memset(buf, 0, BUFFSIZE);
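/* Receive the data from the other system */
if ((length = recvfrom(sockFd, buf, BUFFSIZE, 0, NULL, NULL)) < 0)
{
if (errno == EAGAIN || errno == EWOULDBLOCK)
{
/* Socket drained: wait for the next edge-triggered event */
break;
}
perror("recvfrom");
return -1;
}
else if(length == 0)
{
/* A zero-length UDP datagram is valid; keep draining */
printf( " The Return Value is 0\n");
continue;
}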
else
{
/* Print The data */
printf("Recvd Byte length : %d", length);
dumpData(buf, length);
}
}
}
}
}
close( sockFd );
close( epollFd );
return 0;
}
==============================================================================
UDP CLIENT -> udpclient.c
==============================================================================
#include <stdio.h>
#include <arpa/inet.h>
#include <string.h>
#include <stdlib.h>
#include <sys/socket.h>
#include <unistd.h>
#include <fcntl.h>
#define BUFFSIZE 5096
#define MAX_LEN 100000
int sendlen, receivelen;
int received = 0, i,count, rcvCnt=0, sentCnt=0;
unsigned char buffer[BUFFSIZE];
struct sockaddr_in receivesocket;
struct sockaddr_in sendsocket;
int sock;
unsigned int ch;
unsigned int noOfTimes;
int sendUDPData();
int main(int argc, char *argv[]) {
int ret = 0;
int optval = 1;
/* Create the UDP socket */
if ((sock = socket(PF_INET, SOCK_DGRAM, IPPROTO_UDP)) < 0) {
perror("socket");
return -1;
}
/* my address */
memset(&receivesocket, 0, sizeof(receivesocket));
receivesocket.sin_family = AF_INET;
receivesocket.sin_addr.s_addr = htonl(INADDR_ANY);
receivesocket.sin_port = htons(2905);
receivelen = sizeof(receivesocket);
if(setsockopt(sock, SOL_SOCKET, SO_REUSEADDR, &optval, sizeof(optval))== -1)
{
perror("setsockopt Fail\n");
return -1;
}
if (bind(sock, (struct sockaddr *) &receivesocket, receivelen) < 0)
{
perror("bind");
return -1;
}
/* server (destination) address */
memset(&sendsocket, 0, sizeof(sendsocket));
sendsocket.sin_family = AF_INET;
sendsocket.sin_addr.s_addr = inet_addr("10.12.7.95");
sendsocket.sin_port = htons(2905);
do
{
printf("\n");
printf(" Enter your choice:\t");
printf(" 1. Send UDP Data \n");
printf(" 2. exit \n");
if (scanf("%u", &ch) != 1)
{
return -1; /* Stop on EOF or non-numeric input */
}
printf("\n");
switch(ch)
{
case 1:
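printf("Enter the Length of the Payload \n");
scanf("%d", &sendlen);
/* Reject lengths that would overflow the send buffer */
if (sendlen < 0 || sendlen > BUFFSIZE)
{
printf("Payload length must be between 0 and %d\n", BUFFSIZE);
break;
}
printf("Enter How many times you want to send data \n");
scanf("%u", &noOfTimes);
sendUDPData();
break;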
case 2:
/* Exit: the loop condition ends the program */
break;
default:
printf("Invalid Choice\n");
break;
}
}while(ch!=2);
return 0;
}
int sendUDPData()
{
unsigned int count = 0; /* unsigned, to match noOfTimes */
memset(buffer, 0x31, sendlen);
for(count=0; count< noOfTimes; count++)
{
if (sendto(sock, buffer, sendlen, 0, (struct sockaddr *) &sendsocket,
sizeof(sendsocket)) != sendlen)
{
perror("sendto");
return -1;
}
printf("\n");
}
return 0;
}