sampling Package

This module implements an incremental sampler, used to approximate the task by randomly selecting a portion of the triplets.

sampler Module

The sampler class implementing incremental sampling without replacement.

Incremental means that you don’t have to draw the whole sample at once; at any given time you can request a piece of the sample of the size you specify. This is useful for very large sample sizes.

class ABXpy.sampling.sampler.IncrementalSampler(N, K, step=None, relative_indexing=True, dtype=<Mock id='139952200484560'>)

Bases: object

Class for sampling without replacement in an incremental fashion

Toy example of usage:

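    import numpy as np
    from ABXpy.sampling.sampler import IncrementalSampler

    # N == K: every item is sampled, so the chunks concatenate to 0..N-1
    sampler = IncrementalSampler(10**4, 10**4, step=100, relative_indexing=False)
    complete_sample = np.concatenate([sample for sample in sampler])
    assert all(complete_sample == np.arange(10**4))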

More realistic example of usage: sampling 1 million items without replacement from a total of 1 trillion items, considering 100 million items at a time:

    sampler = IncrementalSampler(10**12, 10**6, step=10**8, relative_indexing=False)
    complete_sample = np.concatenate([sample for sample in sampler])
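The idea behind incremental sampling can be illustrated with a small standalone sketch. This is not the ABXpy implementation: the function name `incremental_sample` and its `rng` argument are hypothetical, and the sketch assumes that the number of items kept in each chunk is drawn from a hypergeometric distribution, after which the kept items are chosen uniformly inside the chunk.

```python
import numpy as np


def incremental_sample(N, K, step, rng=None):
    """Yield, chunk by chunk, the sorted absolute indices of a uniform
    sample of K items drawn without replacement from range(N).

    Hypothetical helper for illustration; not part of ABXpy.
    """
    rng = np.random.default_rng() if rng is None else rng
    position = 0    # first index of the current chunk
    remaining = K   # items still to be drawn from the unseen part
    while position < N:
        n = min(step, N - position)   # size of the current chunk
        unseen = N - position         # items not yet considered
        # Number of the remaining draws that fall inside this chunk.
        k = int(rng.hypergeometric(remaining, unseen - remaining, n))
        # Which items of the chunk are kept (relative indices, sorted).
        kept = np.sort(rng.choice(n, size=k, replace=False))
        yield position + kept
        position += n
        remaining -= k


# Usage: 50 items out of 1000, looking at 100 items at a time.
chunks = list(incremental_sample(1000, 50, step=100,
                                 rng=np.random.default_rng(0)))
complete_sample = np.concatenate(chunks)
```

Only one chunk of indices is held in memory at a time. Note that NumPy's `Generator.hypergeometric` restricts its population arguments to values below 10**9, so this sketch does not cover the trillion-item case above.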

next()
sample(n, dtype=<Mock id='139952200484688'>)

Fast implementation of the sampling function

Get all samples from the next n items in a way that avoids rejection sampling with samples that are too large, more precisely samples whose expected number of sampled items is larger than 10**5.

Parameters
n : int

the size of the chunk

Returns
sample : numpy.array

the indices to keep, given either relative to the current position in the sample or as absolute indices, depending on the value of relative_indexing specified when initialising the sampler (default value is True)

simple_sample(n)

Get all samples from the next n items in a naive fashion

Parameters
n : int

the size of the chunk

Returns
sample : numpy.array

the indices to be kept relative to the current position in the sample

ABXpy.sampling.sampler.Knuth_sampling(n, N, dtype=<Mock id='139952200485072'>)

This is the usual sampling function when n is comparable to N
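As an illustration, here is a minimal sketch of selection sampling (Algorithm S in Knuth's The Art of Computer Programming, vol. 2), assuming that this is the method the function name refers to. The name `selection_sampling` and its `rng` argument are hypothetical and not part of ABXpy. The items are scanned once, and each is kept with probability (items still needed) / (items still unexamined), so the cost grows with N rather than with n.

```python
import random


def selection_sampling(n, N, rng=random):
    """Return n sorted indices drawn uniformly without replacement
    from range(N). Hypothetical helper for illustration."""
    if not 0 <= n <= N:
        raise ValueError("need 0 <= n <= N")
    sample = []
    for t in range(N):              # t items have been examined so far
        needed = n - len(sample)    # items still to select
        if needed == 0:
            break
        # Keep item t with probability needed / (N - t).
        if rng.random() * (N - t) < needed:
            sample.append(t)
    return sample


picked = selection_sampling(5, 20, rng=random.Random(0))
```

Because the scan visits indices in increasing order, the result comes out sorted without an extra sorting step.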

ABXpy.sampling.sampler.hypergeometric_sample(N, K, n)

This function returns the number of elements to sample from the next n items.
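The sketch below shows the distribution such a count follows, under the assumption that N is the number of items not yet considered and K the number of items still to be sampled among them (the source does not spell out the argument meanings). The helper `hypergeom_pmf` and the sizes used are hypothetical; the draw itself uses NumPy's `Generator.hypergeometric`.

```python
from math import comb

import numpy as np


def hypergeom_pmf(k, N, K, n):
    """Probability that exactly k of the K sampled items fall among
    the next n of the N remaining items."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)


N, K, n = 1000, 50, 100     # hypothetical sizes, for illustration only
rng = np.random.default_rng(0)
k = int(rng.hypergeometric(K, N - K, n))   # one random draw of the count
mean = n * K / N                           # expected count for the chunk
# The probabilities over all feasible counts sum to 1.
total = sum(hypergeom_pmf(i, N, K, n) for i in range(min(n, K) + 1))
```

With these sizes the expected count is n * K / N, that is, 5 sampled items per chunk of 100.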

ABXpy.sampling.sampler.rejection_sampling(n, N, dtype=<Mock id='139952200485200'>)

Uses rejection sampling to maintain good performance when n << N
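A minimal sketch of the rejection idea follows, not necessarily how ABXpy implements it: draw indices uniformly with replacement and reject any index that was already drawn. The name `rejection_sample` and its `rng` argument are hypothetical. When n << N, collisions are rare, so the cost is close to n draws and does not depend on N.

```python
import numpy as np


def rejection_sample(n, N, rng=None):
    """Return n sorted indices drawn uniformly without replacement
    from range(N). Hypothetical helper for illustration."""
    if not 0 <= n <= N:
        raise ValueError("need 0 <= n <= N")
    rng = np.random.default_rng() if rng is None else rng
    chosen = set()
    while len(chosen) < n:
        # Draw as many candidates as are still missing; indices already
        # in the set are rejected implicitly by the set update.
        candidates = rng.integers(0, N, size=n - len(chosen))
        chosen.update(candidates.tolist())
    return np.sort(np.fromiter(chosen, dtype=np.int64, count=n))


sample = rejection_sample(1000, 10**9, rng=np.random.default_rng(0))
```

As n approaches N, most candidates collide with earlier draws and the loop slows down, which is why a sequential scan is the better choice in that regime.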

ABXpy.sampling.sampler.sample_without_replacement(n, N, dtype=<Mock id='139952200484880'>)

Returns uniform samples in [0, N-1] without replacement. It will use Knuth sampling or rejection sampling depending on the parameters n and N.

Note

The values 0.6 and 100 are based on empirical tests of the functions and would need to be changed if the functions are changed.