Bloom Filters



A Bloom filter is a space-efficient probabilistic data structure that tests whether an element is a member of a set, quickly and with little memory.

It does not store the elements themselves. It only records enough information to answer the question of whether an element is present or not.

False positive matches are possible, but false negatives are not. In other words, a query returns either "possibly in set" or "definitely not in set".

How Does a Bloom Filter Work?

Let's understand how a Bloom filter works with the help of an example.

Suppose we have a set of elements {A, B, C, D, E, F, G, H, I, J} and we want to check whether the element 'X' is present in the set.

The filter is built and queried as follows:

  • Initially, we create a bit array of size 'm' and initialize all bits to 0.
  • We also create 'k' hash functions, each of which maps an element to one of the 'm' bits.
  • For each element in the set, we calculate the 'k' hash values and set the corresponding bits in the bit array to 1.
  • When we want to check whether an element 'X' is present in the set, we calculate the 'k' hash values for 'X' and check if all the corresponding bits are set to 1.
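  • If all the bits are set to 1, we say that 'X' is possibly in the set, because those bits may have been set by other elements. If any of the bits is 0, we say that 'X' is definitely not in the set, because inserting 'X' would have set that bit to 1.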
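
The steps above can be sketched in a few lines of Python. This is a minimal illustration rather than a production implementation: the class name BloomFilter, the method names add and might_contain, the sizes m = 1000 and k = 3, and the way the k hash functions are derived (salting SHA-256 with the hash number) are all assumptions made for the example.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter sketch: m bits, k hash functions."""

    def __init__(self, m, k):
        self.m = m              # number of bits in the array
        self.k = k              # number of hash functions
        self.bits = [0] * m     # all bits start at 0

    def _indexes(self, item):
        # Derive k indexes by salting SHA-256 with the hash number i.
        # This salting scheme is an illustrative assumption.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item):
        # Set the k bits selected by the hash functions to 1.
        for idx in self._indexes(item):
            self.bits[idx] = 1

    def might_contain(self, item):
        # True  -> possibly in the set (could be a false positive)
        # False -> definitely not in the set
        return all(self.bits[idx] for idx in self._indexes(item))


bf = BloomFilter(m=1000, k=3)
for element in "ABCDEFGHIJ":
    bf.add(element)

print(bf.might_contain("A"))  # True: inserted elements are always reported
print(bf.might_contain("X"))  # usually False; True would be a false positive
```

An inserted element can never be reported as absent, because add sets exactly the bits that might_contain later checks. An element that was never inserted is reported as present only if all of its k bits happen to have been set by other elements.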

Implementation of Bloom Filter

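Here's an example implementation of a Bloom filter in C, C++, Java and Python. To keep the code short, each program uses a bit array of size 10 and a single hash function (k = 1):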

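// C program to implement a simplified Bloom filter (k = 1 hash function)
#include <stdio.h>
#include <stdbool.h>

#define SIZE 10

// Bit array of m = SIZE bits, all initially 0
bool bitArray[SIZE] = {0};

// Maps a non-negative key to an index in the range [0, SIZE)
int hash1(int key){
   return key % SIZE;
}

// Sets the bit selected by the hash of the key
void insert(int key){
   int h1 = hash1(key);
   bitArray[h1] = 1;
}

// Returns true if the key is possibly present, false if it is definitely absent
bool search(int key){
   int h1 = hash1(key);
   return bitArray[h1];
}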

int main(){
   insert(3);
   insert(5);
   insert(7);
   insert(9);

   printf("%d\n", search(3));
   printf("%d\n", search(5));
   printf("%d\n", search(7));
   printf("%d\n", search(9));
   printf("%d\n", search(4));
   printf("%d\n", search(6));
   printf("%d\n", search(8));

   return 0;
}

Output

Following is the output of the above C program:

1
1
1
1
0
0
0
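
// C++ program to implement a simplified Bloom filter (k = 1 hash function)
#include <iostream>
#include <vector>
using namespace std;

const int SIZE = 10;

// Bit array of m = SIZE bits, all initially false
vector<bool> bitArray(SIZE, false);

// Maps a non-negative key to an index in the range [0, SIZE)
int hash1(int key){
   return key % SIZE;
}

// Sets the bit selected by the hash of the key
void insert(int key){
   int h1 = hash1(key);
   bitArray[h1] = true;
}

// Returns true if the key is possibly present, false if it is definitely absent
bool search(int key){
   int h1 = hash1(key);
   return bitArray[h1];
}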

int main(){
   insert(3);
   insert(5);
   insert(7);
   insert(9);

   cout << search(3) << endl;
   cout << search(5) << endl;
   cout << search(7) << endl;
   cout << search(9) << endl;
   cout << search(4) << endl;
   cout << search(6) << endl;
   cout << search(8) << endl;

   return 0;
}

Output

Following is the output of the above C++ program:

1
1
1
1
0
0
0
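
// Java program to implement a simplified Bloom filter (k = 1 hash function)
public class BloomFilter {
   static final int SIZE = 10;

   // Bit array of m = SIZE bits, all initially false
   static boolean[] bitArray = new boolean[SIZE];

   // Maps a non-negative key to an index in the range [0, SIZE)
   static int hash1(int key){
      return key % SIZE;
   }

   // Sets the bit selected by the hash of the key
   static void insert(int key){
      int h1 = hash1(key);
      bitArray[h1] = true;
   }

   // Returns true if the key is possibly present, false if it is definitely absent
   static boolean search(int key){
      int h1 = hash1(key);
      return bitArray[h1];
   }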

   public static void main(String[] args){
      insert(3);
      insert(5);
      insert(7);
      insert(9);

      System.out.println(search(3));
      System.out.println(search(5));
      System.out.println(search(7));
      System.out.println(search(9));
      System.out.println(search(4));
      System.out.println(search(6));
      System.out.println(search(8));
   }
}

Output

Following is the output of the above Java program:

true
true
true
true
false
false
false
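
# Python program to implement a simplified Bloom filter (k = 1 hash function)
SIZE = 10

# Bit array of m = SIZE bits, all initially False
bitArray = [False] * SIZE

def hash1(key):
    # Maps an integer key to an index in the range [0, SIZE)
    return key % SIZE

def insert(key):
    # Sets the bit selected by the hash of the key
    h1 = hash1(key)
    bitArray[h1] = True

def search(key):
    # True means possibly present, False means definitely absent
    h1 = hash1(key)
    return bitArray[h1]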

insert(3)
insert(5)
insert(7)
insert(9)

print(search(3))
print(search(5))
print(search(7))
print(search(9))
print(search(4))
print(search(6))
print(search(8))

Output

Following is the output of the above Python program:

True
True
True
True
False
False
False

Features of Bloom Filter

Some of the key features of Bloom filters include:

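  • Space-efficient: A Bloom filter stores only a fixed-size bit array, not the elements themselves, so it uses far less memory than a hash set or a list holding the same elements.
  • Fast: Insertion and lookup each compute 'k' hash values, so they take O(k) time regardless of how many elements have been added.
  • Probabilistic: Bloom filters may return false positives, but never false negatives.
  • Scalable: The memory required depends on the number of elements and the acceptable false positive rate, not on the size of each element, which makes Bloom filters practical for large datasets.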
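
To make the space and accuracy trade-off concrete, the sketch below applies the standard sizing approximations for a Bloom filter, which assume independent, uniformly distributed hash functions: m = -n ln(p) / (ln 2)^2 bits and k = (m / n) ln 2 hash functions for n expected elements and a target false positive rate p. The function names optimal_parameters and false_positive_rate, and the example values n = 1,000,000 and p = 0.01, are chosen here for illustration.

```python
import math


def optimal_parameters(n, p):
    """Return (m, k) for n expected elements and target false positive rate p."""
    m = math.ceil(-n * math.log(p) / (math.log(2) ** 2))  # bits in the array
    k = max(1, round((m / n) * math.log(2)))              # hash functions
    return m, k


def false_positive_rate(m, k, n):
    """Approximate false positive probability after n insertions."""
    return (1 - math.exp(-k * n / m)) ** k


n = 1_000_000
m, k = optimal_parameters(n, p=0.01)

print(m, k)                          # roughly 9.6 million bits, k = 7
print(m / 8 / 1024 / 1024)           # array size in MiB, a little over 1
print(false_positive_rate(m, k, n))  # close to the 0.01 target
```

Under these assumptions, about 9.6 bits per element are enough for a 1% false positive rate, however large each element is.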

Applications of Bloom Filter

Bloom filters are used in various applications, including:

  • Spell checkers
  • Network routers
  • Web browsers
  • Database systems
  • Anti-virus software
  • Big data processing
  • Content delivery networks