The Pigeonhole Principle : Maximum Gap

The Pigeonhole Principle : Maximum Gap

ยท

5 min read

This is follow up from the previous blog. I will highly suggest you give it a read before proceeding.

Recap Maximum Gap

Given an integer array nums, return the maximum difference between two successive elements in its sorted form. If the array contains less than two elements, return 0.

You must write an algorithm that runs in linear time and uses linear extra space.

We tried two approaches before:

  1. Simple Sorting and iterating through the array ( max arr length = 105) with O(nlogn) time complexity.
  2. Building a boolean array until the max element (max element = 109) and iterating through this new array. Although this is linear time complexity and utilizing extra space, n here is 109. Iterating a billion-element array is expensive. We should come up with a more efficient solution.

Further Exploration

The problem here is we have such a big search space ~109. How can we reduce it further?

Instead of building array from 0th element to Max Element, what if we look for range from min to max values? We can build array mapping from min element to max.

This approach can reduce some search space but not drastically. Take an example below, we would still have an array of length 107.

e.g. [100, 500, 300, 9999999, 9999]  MaxGap = 9999999 - 9999 = 9990000

What do we care about in the range?

  • Having elements in the contiguous sequence between the range min and max
  • If all the elements are present between this range , then MaxGap = 1

Alt fig.1.

When will the max gap be greater than 1 ?

  • If some or sequence of elements go missing in the range, we have MaxGap > 1

How to identify the missing ranges?

If you take a look at the diagram above, you will see that an array works like a bucket, and that each bucket contains only one element.

n buckets = n elements

  • Can we bucket the sequences with more than one element? It's possible!
  • Perhaps we can store some meta-information about each bucket to help find the gaps? It's possible, what kind of information

To explore further, question that comes first to mind,

How do we determine the number of elements to store in each bucket?

Let us , for example, transform our original array into buckets of size = 3 (random selection) and visualize.

p2-selected.png

p2-selected2.png

In other words, when there are equal sized buckets, the bucket with no elements will add the most to the gap between contiguous sequences. By looking at the previous and next buckets, where it overflowed, the total can be calculated.

Approach

We can bucket the range of elements in such a way that we can find at least one empty bucket, and this is the gap that actually contributes to MaxGap.

If we store min and max element for each bucket, then for such an empty bucket,

MaxGap = (max Element from previous non-empty bucket) - (min Element from next non-empty bucket)

Therefore, finding all the missing elements becomes unnecessary.

Although the above diagram shows that intuitively, how do we know this will be true always?

Pigeonhole Principle

The Pigeonhole Principle states that

if n items are put into m containers, with n>m, then at least one container must contain more than one item

Considering the same principle alternate formulations for non-missing and missing items, which is relevant to this problem,

If n objects are distributed over n places in such a way that no place receives more than one object, then each place receives exactly one object.

This is like our array from min to max range, no missing, with MaxGap = 1.

If n objects are distributed over m places, and if n < m, then some place receives no object.

This is like our array from min to max range, some missing, with MaxGap > 1. In other words, if n-1 numbers are divided into n buckets, there is at least one empty bucket.

It follows that each of these empty buckets will contribute to the MaxGap, which now reduces the problem to identifying all the empty buckets and computing min and max for non-empty buckets adjacent to those.

To solidify, let us do an example walkthrough,

Example Walkthrough

p3-example.png

Steps

  1. Given n elements in input array, first build buckets of size (n-1) [so that we have at least one empty bucket]
  2. Decide bucket size from the range min and max values for the input array. [ Note: both min and max elements are kept exclusive when adding to the bucket to satisfy the condition]

    (n-2)/(n-1) produces one empty bucket (total numbers divided by total buckets)

  3. As we are storing max and min values for each bucket, initialize the variables for each bucket.
  4. Iterate over input array, and for each element, find which bucket it belongs to using below formula to find bucketIndex. Skip the min and max elements as explained in Step 2.

    Bucket_Index = Math.floor((element-minElem)/bucketSize)

  5. Once bucket index identified, compare with max and min values for that bucket and update accordingly.
  6. Now that we have stored meta-information and buckets , iterate over buckets to identify empty ones.
  7. When empty bucket found, calculate the maxGap.

Code

V2 - Buckets and Pigeonhole Principle

class Solution {
    static class Bucket{
        int max;
        int min;
        Bucket(int max,int min){
            this.max = max;
            this.min = min;
        }
    }

    public int maximumGap(int[] nums) {
        if(nums.length == 1) return 0;

        // find range of elements max and min
        int maxElem = Arrays.stream(nums).max().getAsInt();
        int minElem = Arrays.stream(nums).min().getAsInt();
        int n = nums.length;

        // build a bucket of size (n-1)
        Bucket[] bucketArr = new Bucket[n-1];
        for(int i=0;i<n-1;i++){
            bucketArr[i] = new Bucket(Integer.MIN_VALUE,Integer.MAX_VALUE);
        }

        // divide the range into equal sized buckets
        // make sure both numerator and denominator are floats!!!
        float bucketSize = (float)(maxElem - minElem)/(float)(n-1);

        // add the elements to the buckets
        // find min and max for each bucket

        for(Integer elem: nums){
            // continue if min or max
            if(elem == minElem || elem == maxElem){
                continue;
            }
            int bucketIndex = (int) Math.floor((elem-minElem)/bucketSize);
            bucketArr[bucketIndex].max = Math.max(bucketArr[bucketIndex].max, elem);
            bucketArr[bucketIndex].min = Math.min(bucketArr[bucketIndex].min, elem);
        }

        // identify all the empty buckets and calculate maxGap
        int maxGap = 0;
        int maxOfPrevNonEmpty =  minElem;

        for(int i = 0; i< n-1; i++){
            //Empty bucket - skip
            if(bucketArr[i].min == Integer.MAX_VALUE){
                continue;
            }
            int minOfNextNonEmpty = bucketArr[i].min;

            maxGap = Math.max(maxGap, minOfNextNonEmpty - maxOfPrevNonEmpty);

            maxOfPrevNonEmpty = bucketArr[i].max;
        }

        maxGap = Math.max(maxGap, maxElem-maxOfPrevNonEmpty);
        return maxGap;
    }
}

Time Complexity : O(n)

Space Complexity : O(n)

Author: Rasika.Warade

ย