Thursday, 12 December 2013

HASHING for Real Time Applications



What is HASHING?
Hashing is a method for storing and retrieving records from a database efficiently.
Hashing allows insert, delete, and search for records based on a search key value.

If the hashing implemented properly and all above operations could be performed in constant time i.e. O (1). This is better than binary search on a sorted array of n records, O (log n) average cost required to do an operation on a binary search tree.

To get time complexity of O (1), designers need to give attention while implementing a hash system.

What is HASH TABLE?
A hash system stores records or elements in an array called a hash table.
(Or) an effective data structure for implementing dictionaries (the store List of records).

What is HASH FUNCTION?














Hash function that can take a key value and compute an integer value (or an index in a table) from it.
Hashing works by performing a computation on a search key K in a way that is intended to identify the position (or index) in HASH TABLE that contains the record with key K.
The function that does this calculation is called the hash function, and will be denoted by the letter h (k). Hash function takes input as a key and returns the index or slot. A position in the hash table is also known as a slot or Index.

Hashing is not good for applications where multiple records with the same key value are permitted. Without collision & overflow, search only takes O (1) time. Data size is not concerned. Security:  If you do not know the hash function, you cannot get data

A good hash function satisfies (approximately) the assumption of simple uniform hashing
Each key is equally likely to hash to any of the “m” slots.  Independently of where any other key has hashed to

Direct Addressing:
With Direct addressing, an element with key (i.e. here i, j, k) stored in slot i, j, k.
With Hashing:

These elements stored in slot h(k1), h(k2), h(k3)

Following are the list of Techniques can be used for Hash Function:
 Division Method:
  • Mapping a key k into one of “m” slots by taking the remainder of k divided by m

h (k) = k mod m    
Ex. m = 12, k = 100, then h(k) = 4

Or hash Index = key % hash_Table_Size
  •  Prime number m” may be good choice
 Mid square Method:
Mapping a key k into one of m slots by get the middle some digits from value k2

h (k ) =    k2 get middle (log m) digits

Ex. m    =  10000,  k = 113586, log m = 4
h(k)      =   1135862                              get middle 4 digits
            =   1290 1779 369              get middle 4 digits
     =   1779
Folding:
  •   Portions of the key are often recombined, or folded together
  •  Divide k into some sections, besides the last section, have same length. Then add these sections together.
a. shift folding
b. folding at the boundaries

H(k) = ∑ (section divided from k) by a or b

Ex, k = 12320324111220, section length = 3

Shift Folding ->  123 + 203 + 241+ 112 + 20                     =>    879
Folding at the boundaries -> 123 + 302 + 241+ 211 + 20  =>  897

 It can be efficiently performed using bitwise operations

Collision & Overflow handling
Collision
      When two keys map to the same location in the hash table.

There are two ways to avoid collisions:-
  •  Open hashing (called as separate Chaining)
  • Closed hashing (called as Open Addressing)   (linear probing, quadratic probing, double hashing)
The difference between the two has to do with whether collisions are stored outside the table (open hashing), or whether collisions result in storing one of the records at another slot in the table (closed hashing).

Open Hashing: (or) Separate chaining
  •  All keys that map to the same hash value are kept in a list (or “bucket”) 
  •  An array of linked list implementation.
 The simplest form of open hashing defines each slot in the hash table to be the head of a linked list. All records that hash to a particular slot are placed on that slot's linked list.

The figure illustrates a hash table where each slot stores one record and a link pointer to the rest of the list.

 Worst-case insert time =   O (1)
ü  insert into the beginning of each link list

Worst -case search time = Θ(n)
ü  Every key mapping to the same slot
ü  Ex. h(1) = h(2) = h(3) =...= h(n) = x
ü  then search key 1

Analysis:
Worst case complexity of all these operations is O(n) In the average case, the running time is O(1 +α) where load factor 
α
=
  n/m               

n
=
 number of elements stored

m
=
 number of hash values or buckets

Θ (1 +α ) means ?
If the number of hash table slots is at least proportional to the number of elements in the
table, we have n = O(m) and, α = n/m = O(m)/m = O(1). Thus, searching takes constant time
on average.
Or hash value h(k) can be computed in O(1) time. If n is O (m), the average case complexity of these operations becomes O(1) !

Closed Hashing: 
  •  No chaining, every key fits in the hash  table 
  • All elements are stored in the hash table itself 
  •  Avoids pointers; only computes the sequence of slots to be examined. 
  •  Collisions are handled by generating a sequence of rehash values. 

Closed Hashing methods:
  •  linear probing,
  • quadratic probing,
  • double hashing

The below program designed for Uniform Hashing ( No collisions) used division method.

#include<stdio.h>
#include<stdlib.h>
#include<string.h>

/* For Any HASH Implementation a Perfect Hash Function is
 * required to avoid collisions */

 /* In real time time is it possible to have perfect Hash function
  * Some times possible and may not possible also
  * Then How to avoid collisions ( Having Same Index value) to store the values
  * Use seperate chaining or Bucket Hashing
  */
typedef struct
{
  int data;
}HASH_TABLE;

HASH_TABLE **hashTable;
int *value;

void intialise_values_and_hashTable(int maxNum, int hashTableSize)
{
   int i;

   /* Intialise the data */
   value = (int *)malloc(maxNum * sizeof(int));

   /* Intialise the Hash Table */
   hashTable = (HASH_TABLE **)malloc(hashTableSize * sizeof(int));


   for(i=0; i<maxNum; i++)
   {
     value[i] = i;
   }

   /* Allocate the Memory Pool of Data */
   for(i=0; i<hashTableSize; i++)
   {
     hashTable[i] = (HASH_TABLE *)malloc(sizeof(HASH_TABLE));
    
     /* reset the Data */
     memset(hashTable[i], 0x00, sizeof(HASH_TABLE));
   }

}

int hashImplementation(int maxNum, int hashTableSize)
{
   int keyIndex, key, hashIndex;

   printf(" Enter the Key Index to get appropriate Key (0 - %d )\n",
               maxNum-1);
   scanf("%d", &keyIndex);
   
   
    /* Get the Hash Key, Ensure that hash key
     * should be unique
     * (e.g: For Key computation use << or >> operators)
     */
    key = getKey(keyIndex, maxNum);
    printf(" The Key : %d\n", key);

    /* Compute the Hash Index
     * This is one of the Important step in Hash Function
     * Function can be computed based on key and HashTable Size
     */
    hashIndex = hashFunction(key,  hashTableSize);

    printf(" The HashIndex : %d\n", hashIndex);

    return hashIndex;
}


/*
 * getKey fn
 */
int getKey(int keyIndex, int maxNum)
{
  int i;

  for(i=0; i<maxNum; i++)
  {

    if(keyIndex == i)
      return value[i];
  }
  if(i == maxNum)
    printf(" No Index Is found\n");

 return 0;
}

/*
 * hashIndex fn
 */
int hashFunction(int key, int hashTableSize)
{
  int index;

  index = key % hashTableSize;
  return index;
}

void insertDataIntoHashTable(int info, int hashIndex)
{
   if(!hashTable[hashIndex]->data)
   {
       hashTable[hashIndex]->data = info;   
   }
   else
   {
      printf(" Data : %d already exists for hashIndex : %d \n:",
          hashTable[hashIndex]->data, hashIndex);
   }
}

void deleteDataFromHashTable(int hashIndex)
{
   hashTable[hashIndex]->data = 0;
}


void displayHashTable(int hashTableSize)
{
  int i;
  for(i=0; i<hashTableSize; i++)
  {
     printf("Index : %d ->  |  data: %d\t |\n", i,
          hashTable[i]->data);
  }

}
/*
 * Main()
 */
int main()
{
   int i, maxNum, hashTableSize, info, hashIndex, ch;

   printf("Enter the Max Number of Entries to be present in Hash Table\n");
   scanf("%d", &maxNum);

   printf("Enter the HASH TABLE SIZE\n");
   scanf("%d", &hashTableSize);

   intialise_values_and_hashTable(maxNum, hashTableSize );

   do
   {
     printf("Enter the choice \n");
     printf("1. Insert \n");
     printf("2. Delete \n");
     printf("3. Display \n");
     printf("10. Exit \n");
     scanf("%d", &ch);

     switch(ch)
     {
       case 1:
         {
           printf(" Enter the Data to be inserted \n");
           scanf("%d", &info);

           hashIndex = hashImplementation(maxNum, hashTableSize);
           /* Insert the data in to Hash Table
            */
           insertDataIntoHashTable(info, hashIndex);
         }
         break;
       case 2:
         {
           printf(" Enter the hashIndex ( Should not be > HashTableSize\n");
           scanf("%d", &hashIndex);

           deleteDataFromHashTable(hashIndex);
         }
         break;
       case 3:
           /* Display*/
           displayHashTable(hashTableSize);
           break;
      default:
           printf("Invalid choice \n");
           break;
      }
   } while(ch != 10);

   /* Free the Allocated Hash Table memory */
   for(i=0; i<hashTableSize; i++)
   {
     free(hashTable[i]);
   }

   free(value);
   free(hashTable);
}

The below program designed with collisions( Used Open Hashing, separate chaining).
#include<stdio.h>
#include<stdlib.h>
#include<string.h>

/* For Any HASH Implementation a Perfect Hash Function is
 * required to avoid collisions */

 /* In real time time is it possible to have perfect Hash function
  * Some times possible and may not possible also
  * Then How to avoid collisions ( Having Same Index value) to store the values
  * Use seperate chaining or Bucket Hashing
  */
struct newNode
{
   int data;
   int key;
   struct newNode *next;
};

typedef struct newNode newData;

typedef struct
{
  int count;
  struct newNode *new;
}HASH_TABLE;


HASH_TABLE **hashTable;
int *value;

/* Bucket Hashing */
HASH_TABLE *head, *last;

void intialise_values_and_hashTable(int maxNum, int hashTableSize)
{
   int i;

   /* Intialise the data */
   value = (int *)malloc(maxNum * sizeof(int));

   /* Intialise the Hash Table */
   hashTable = (HASH_TABLE **)malloc(hashTableSize * sizeof(int));

   for(i=0; i<maxNum; i++)
   {
     value[i] = rand();
   }

   /* Allocate the Memory Pool of Data */
   for(i=0; i<hashTableSize; i++)
   {
     hashTable[i] = (HASH_TABLE *)malloc(sizeof(HASH_TABLE));
    
     /* reset the Data */
     memset(hashTable[i], 0x00, sizeof(HASH_TABLE));
   }
}

int hashImplementation(int maxNum, int hashTableSize)
{
   int keyIndex, key, hashIndex;

   printf(" Enter the Key Index to get appropriate Key (0 - %d )\n",
               maxNum-1);
   scanf("%d", &keyIndex);
   
   
    /* Get the Hash Key, Ensure that hash key
     * should be unique
     * (e.g: For Key computation use << or >> operators)
     */
    key = getKey(keyIndex, maxNum);

    printf(" The Key : %d\n", key);

    /* Compute the Hash Index
     * This is one of the Important step in Hash Function
     * Function can be computed based on key and HashTable Size
     */
    hashIndex = hashFunction(key,  hashTableSize);

    printf(" The HashIndex : %d\n", hashIndex);

    return hashIndex;
}


/*
 * getKey fn
 */
int getKey(int keyIndex, int maxNum)
{
  int i;

  for(i=0; i<maxNum; i++)
  {

    if(keyIndex == i)
      return value[i];
  }
  if(i == maxNum)
    printf(" No Index Is found\n");

 return 0;
}

/*
 * hashIndex fn
 */
int hashFunction(int key, int hashTableSize)
{
  int index;

  index = key % hashTableSize;
  return index;
}

newData *createNewData(int info, int keyData)
{
   newData *new = NULL;

   new = (newData *)malloc(sizeof(newData));
  
   if(new)
   {
      new->data =  info;
      new->key =  keyData;
      new->next =  0;

   }
   return new;
}

void insertDataIntoHashTable(int info, int hashIndex, int keyData)
{
  
   newData *newInfo = NULL, *temp =NULL;

   /* Create New Data */
   newInfo = createNewData(info, keyData);

   if(!hashTable[hashIndex]->new)
   {
      hashTable[hashIndex]->new = newInfo;
   }
   else
   {
     temp = hashTable[hashIndex]->new;
    
     while(temp->next!=0)
     {
        temp = temp->next;
     }
     temp->next = newInfo;
   }
   hashTable[hashIndex]->count++;
}

void deleteDataFromHashTable(int hashIndex, int keyData)
{
   newData *head, *temp, *prev, *current= NULL;
   int count = 0;
  
   current = hashTable[hashIndex]->new;
   head = hashTable[hashIndex]->new;

   while(current)
   {
     if(current->key == keyData)
     {
        if(current == head)
        {
          printf(" It is going inside loop \n");
          temp = current;
          current = current->next;
         
          hashTable[hashIndex]->new = current;

          free(temp);
          temp = NULL;
          return;
        }
        else if(current->next == 0)
        {
          temp = current;
          prev->next = 0;
          free(temp);
          temp= NULL;
          return;

        }
        else
        {
           temp = current;
           prev->next = current->next;
           free(temp);
           temp= NULL;
           return;
        }
     }
     prev = current;
     current = current->next;
     count++;
   }
   if(count == hashTable[hashIndex]->count)
   {
      printf("Data is not found in the hashTable \n");
   }
}


void displayHashTable(int hashTableSize)
{
  int i;

  newData *temp=NULL;
 
  for(i=0; i<hashTableSize; i++)
  {
    temp = hashTable[i]->new;

    while(temp)
    {
      printf(" Key :%d  Data : %d and hashIndex : %d\n", temp->key,
                    temp->data, i);
      temp = temp->next;
    }
  }
}
/*
 * Main()
 */
int main()
{
   int i, keyData, maxNum, hashTableSize, info, hashIndex, ch;

   printf("Enter the Max Number of Entries to be present in Hash Table\n");
   scanf("%d", &maxNum);

   printf("Enter the HASH TABLE SIZE\n");
   scanf("%d", &hashTableSize);

   intialise_values_and_hashTable(maxNum, hashTableSize );

   do
   {
     printf("Enter the choice \n");
     printf("1. Insert \n");
     printf("2. Delete \n");
     printf("3. Display \n");
     printf("10. Exit \n");
     scanf("%d", &ch);

     switch(ch)
     {
       case 1:
         {
           printf(" Enter the Data to be inserted \n");
           scanf("%d", &info);

           printf(" Enter the key Data to be inserted \n");
           scanf("%d", &keyData);

           hashIndex = hashImplementation(maxNum, hashTableSize);
           /* Insert the data in to Hash Table
            */
           insertDataIntoHashTable(info, hashIndex, keyData);
         }
         break;
       case 2:
         {
           printf(" Enter the hashIndex ( Should not be > HashTableSize\n");
           scanf("%d", &hashIndex);

           printf("Enter the KeyData based on tat deletion \n");
           scanf("%d", &keyData);

           deleteDataFromHashTable(hashIndex, keyData);
         }
         break;
       case 3:
           /* Display*/
           displayHashTable(hashTableSize);
           break;
      default:
           printf("Invalid choice \n");
           break;
      }
   } while(ch != 10);

   /* Free the Allocated Hash Table memory */
   for(i=0; i<hashTableSize; i++)
   {
     free(hashTable[i]);
   }

   free(value);
   free(hashTable);
}



No comments:

Post a Comment