Reservoir Sampling

Reservoir Sampling

ยท

1 min read

From a stream of events, return a random element. The class definition is

class StreamObserber()
    def observer(x):
    def sample(): # return an random element x

Solution Exploration

Pasted image 20220130070643.png

Code

V0 - Store all the elements

import random  


class StreamObserver:  

    def __init__(self):  
        self.n = 0  
        self.elements = []  

    def observe(self, x):  
        self.n += 1  
        self.elements.append(x)  

    def sample(self):  
        r = random.randint(0, self.n - 1)  
        return self.elements[r]

V1 - Limit the list size

class StreamObserverV1:  
 """  
 Optimize for memory, use a fixed size queue and discard the oldest element    when we have reached the maximum size.  
 """  
 def __init__(self):  
        self.elements = []  
        self.max_size = 10  

 def observe(self, x):  
        if len(self.elements) >= self.max_size:  
            self.elements.pop(0)  
            self.elements.append(x)  

  def sample(self):  
       r = random.randint(0, len(self.elements) - 1)  
       return self.elements[r]

V2 - Just Store that One Element

class StreamObserverV2:  

    def __init__(self) -> None:  
        self.elm = None  
        self.n = 0  

   def observe(self, x):  
        self.n += 1  
        prob = 1 / (self.n)  
        randomProb = random.uniform(0, 1)  
        if randomProb < prob:  
            self.elm = x  
        else:  
            pass  

   def sample(self):  
         return self.elm

Questions

How to test the implementation, it is randomized

  • Push in 1 to 100 numbers, and sample for 10 times should have got at-least one less than 10 ?
ย