schrodinger.math.sampling module

Functions and classes for supporting statistical sampling.

class schrodinger.math.sampling.ReservoirSampler(k, rng=None)

Bases: object

Obtain a random sample of k items, without replacement, from an input stream of unknown size.

Similar to more_itertools.sample, except that it provides feedback on which items are kept or rejected at each step, and the sample can be accessed at any point while iterating through data (for example, after one iterator is exhausted but before another is fed in).

A random sample is determined in a single pass through the data. A working sample, based on the observations seen so far, is held in memory and can be accessed at any time through the ‘sample’ attribute. For example:

>>> import numpy as np
>>> from schrodinger.math.sampling import ReservoirSampler
>>> # Create a reservoir object with total sample size of 2.
>>> r = ReservoirSampler(2, np.random.default_rng(42))
>>> # Fill the first two slots with [1,2].
>>> _ = r.consider(1)
>>> _ = r.consider(2)
>>> r.sample
[1, 2]
>>> # Consider whether '3' should be included in the sample.
>>> r.consider(3)
(True, 1)
>>> # The '1' element was replaced with '3'.
>>> r.sample
[3, 2]
>>> # Now consider '4' for inclusion.
>>> r.consider(4)
(False, None)
>>> # The '4' element was not included. Our sample is unchanged.
>>> r.sample
[3, 2]

Based on Algorithm L from the following paper:

Kim-Hung Li. "Reservoir-sampling algorithms of time complexity
O(n(1 + log(N/n)))". ACM Transactions on Mathematical Software.
Volume 20. Issue 4. pp 481–493. https://doi.org/10.1145/198429.198435
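
Algorithm L avoids drawing a random number for every item: once the reservoir is full, it draws a skip length and only uses the random number generator again when an item is actually accepted. The following is a minimal sketch, not the library's implementation, of how such a skip counter can sit behind a per-item consider() call with the same (accepted, replaced) return contract. The class name SketchReservoir is hypothetical, and it uses the standard-library random.Random instead of numpy.random.Generator so that it runs without NumPy; it will not reproduce the doctest output above for a given seed.

```python
import math
import random
from typing import Any, List, Optional, Tuple


class SketchReservoir:
    """Hypothetical stand-in for ReservoirSampler built on Li's Algorithm L.

    Assumption: uses random.Random rather than numpy.random.Generator.
    """

    def __init__(self, k: int, rng: Optional[random.Random] = None):
        if k < 1:
            raise ValueError("k must be a positive integer")
        self.k = k
        self.sample: List[Any] = []
        self._rng = rng if rng is not None else random.Random()
        self._log_w = 0.0  # log(W), the running threshold of Algorithm L
        self._skip = 0     # items left to reject before the next replacement

    def _uniform(self) -> float:
        # Draw from the open interval (0, 1) so that log() is always finite.
        u = self._rng.random()
        while u == 0.0:
            u = self._rng.random()
        return u

    def _schedule_next(self) -> None:
        # W <- W * U**(1/k), tracked in log space.
        self._log_w += math.log(self._uniform()) / self.k
        # -expm1(log W) equals 1 - W, computed without cancellation near W = 1.
        log_one_minus_w = math.log(-math.expm1(self._log_w))
        # Number of upcoming items to reject: floor(log(U) / log(1 - W)).
        self._skip = math.floor(math.log(self._uniform()) / log_one_minus_w)

    def consider(self, x: Any) -> Tuple[bool, Any]:
        # Phase 1: the first k items always enter the reservoir.
        if len(self.sample) < self.k:
            self.sample.append(x)
            if len(self.sample) == self.k:
                self._schedule_next()
            return True, None
        # Phase 2: reject items until the precomputed skip count runs out.
        if self._skip > 0:
            self._skip -= 1
            return False, None
        # Replace a uniformly chosen slot, then draw the next skip length.
        slot = self._rng.randrange(self.k)
        replaced, self.sample[slot] = self.sample[slot], x
        self._schedule_next()
        return True, replaced
```

Precomputing the skip length is what gives Algorithm L its O(k(1 + log(N/k))) expected count of random draws; a per-item accept/reject test (Algorithm R) would draw once for every item in the stream.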

Variables
  • k (int) – The sample size.

  • sample (list) – The retained sample data.

__init__(k, rng=None)
Parameters
  • k (int) – Sample size. Must be a positive integer.

  • rng (Optional[numpy.random.Generator]) – Random number generator.

consider(x)

Considers ‘x’ for inclusion into the sample.

Returns

A tuple. The first element indicates whether ‘x’ was accepted into the sample. The second element will either be the item that was replaced by ‘x’ or None if no item was replaced.

Return type

tuple[bool, Any]

class schrodinger.math.sampling.SystematicSampler(k, *, offset=0)

Bases: object

A callable class that performs systematic interval sampling, returning only every k-th item after an initial offset. The interval state persists between calls, so multiple iterables can be passed through as if they were all part of one larger sampling frame.

If persistent state is not needed, then itertools.islice can be used to obtain such a sample.
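
The persistent-state behavior can be illustrated with a small stand-in that keeps a running position across calls. This is a minimal sketch, not the library's class: the name SketchSystematic is hypothetical, and it assumes the selected positions in the overall frame are offset, offset + k, offset + 2k, and so on (the same positions itertools.islice(frame, offset, None, k) would select for a single iterable).

```python
from typing import Any, Iterable, Iterator


class SketchSystematic:
    """Hypothetical stateful systematic sampler (not the library's class).

    Assumption: selected frame positions are offset, offset + k, offset + 2k, ...
    """

    def __init__(self, k: int, *, offset: int = 0):
        if k < 1:
            raise ValueError("k must be a positive integer")
        if offset < 0:
            raise ValueError("offset must be non-negative")
        self.k = k
        self.offset = offset
        self._position = 0  # index of the next element in the overall frame

    def __call__(self, iterable: Iterable[Any]) -> Iterator[Any]:
        for x in iterable:
            pos = self._position
            self._position += 1
            # Keep the element only if it falls on the sampling schedule.
            if pos >= self.offset and (pos - self.offset) % self.k == 0:
                yield x


sampler = SketchSystematic(3, offset=1)
first = list(sampler(range(5)))       # frame positions 0..4
second = list(sampler(range(5, 12)))  # frame positions continue at 5
```

Here first is [1, 4] and second is [7, 10], the same result as sampling range(12) in one pass. Because __call__ in this sketch is a generator, the position counter only advances as the returned iterator is consumed; an iterator abandoned part-way leaves the frame position where it stopped.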

__init__(k, *, offset=0)
Parameters
  • k (int) – The sample interval. Take every k-th element.

  • offset (int) – The number of elements to offset from the beginning of the sampling frame.

classmethod filter(sampling_frame, k, *, offset=0)

A generator function that yields only every k-th element from the sampling frame.

Parameters
  • sampling_frame (Iterable) – The elements from which to create a sample.

  • k (int) – The sample interval. Take every k-th element.

  • offset (int) – The number of elements to offset from the beginning of the sampling frame.

Returns

An iterator that filters out elements not meeting the sampling schedule.

Return type

Iterator
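
For a single sampling frame with no state carried between calls, this kind of filter reduces to itertools.islice. A minimal sketch, assuming the schedule selects positions offset, offset + k, offset + 2k, and so on; the function name systematic_filter is hypothetical and is not part of this module.

```python
from itertools import islice
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def systematic_filter(sampling_frame: Iterable[T], k: int, *,
                      offset: int = 0) -> Iterator[T]:
    """Hypothetical stateless equivalent: every k-th element after offset."""
    if k < 1:
        raise ValueError("k must be a positive integer")
    if offset < 0:
        raise ValueError("offset must be non-negative")
    # islice(frame, start, None, step) yields frame[start], frame[start + step], ...
    return islice(sampling_frame, offset, None, k)
```

Unlike a generator function, this sketch validates its arguments eagerly, at call time, rather than on the first call to next().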