`sampling` Package

This module implements an incremental sampler, used to approximate the task by randomly selecting a portion of the triplets.

`sampler` Module

The sampler class implements incremental sampling without replacement.

Incremental means that you don’t have to draw the whole sample at once: at any given time you can request a piece of the sample of a size you specify. This is useful for very large sample sizes.
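One common way to draw a sample piece by piece is to decide, for each chunk of items, how many sample elements fall inside it (a hypergeometric draw) and then pick that many positions within the chunk. The following is a minimal sketch of that idea, not the ABXpy implementation; `incremental_sample` and its arguments are hypothetical names, and it relies only on NumPy's `Generator.hypergeometric` and `Generator.choice`.

```python
import numpy as np


def incremental_sample(N, K, step, seed=None):
    """Yield sorted absolute indices of a K-out-of-N sample, one chunk at a time.

    Hypothetical helper for illustration; assumes 0 <= K <= N and step >= 1.
    """
    rng = np.random.default_rng(seed)
    remaining_items = N   # items not yet considered
    remaining_draws = K   # sample elements still to be placed
    position = 0          # absolute index of the first item of the current chunk
    while remaining_items > 0:
        n = min(step, remaining_items)
        if remaining_draws == 0:
            k = 0
        elif n == remaining_items:
            # Last chunk: everything still to be drawn must fall here.
            k = remaining_draws
        else:
            # Number of sample elements falling among the next n items.
            k = int(rng.hypergeometric(ngood=n, nbad=remaining_items - n,
                                       nsample=remaining_draws))
        # Pick which k of the n items are kept, then shift to absolute indices.
        yield np.sort(rng.choice(n, size=k, replace=False)) + position
        position += n
        remaining_items -= n
        remaining_draws -= k
```

Each yielded array can be processed and discarded before the next one is requested, so the whole sample never has to be held in memory at once.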

class ABXpy.sampling.sampler.IncrementalSampler(N, K, step=None, relative_indexing=True, dtype=<Mock id='139952200484560'>)

Bases: object

Class for sampling without replacement in an incremental fashion

Toy example of usage:

    import numpy as np
    from ABXpy.sampling.sampler import IncrementalSampler

    sampler = IncrementalSampler(10**4, 10**4, step=100, relative_indexing=False)
    complete_sample = np.concatenate([sample for sample in sampler])
    assert all(complete_sample == np.arange(10**4))

More realistic example of usage: sampling 1 million items without replacement from a total of 1 trillion items, considering 100 million items at a time:

    sampler = IncrementalSampler(10**12, 10**6, step=10**8, relative_indexing=False)
    complete_sample = np.concatenate([sample for sample in sampler])

next()

sample(n, dtype=<Mock id='139952200484688'>)

Fast implementation of the sampling function.

Get all samples from the next n items in a way that avoids rejection sampling with overly large samples, that is, samples whose expected number of sampled items exceeds 10**5.

Parameters

n : int
    The size of the chunk.

Returns

sample : numpy.array
    The indices to keep, given either relative to the current position in the sample or absolutely, depending on the value of relative_indexing specified when initialising the sampler (default value is True).
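The 10**5 bound on the expected number of sampled items suggests splitting a large request into smaller pieces. A minimal sketch of one way to do that, assuming the expected count for a piece of n items is n * K / N (with N and K taken as the remaining population and the remaining sample size); `split_chunk` is a hypothetical helper, not part of ABXpy:

```python
def split_chunk(n, N, K, max_expected=10**5):
    """Split n items into piece sizes whose expected sample count stays bounded.

    Hypothetical helper: N is the remaining population size and K the number
    of sample elements still to draw, so K / N is the expected sampling rate.
    """
    rate = K / N
    if rate * n <= max_expected:
        return [n]                      # small enough to handle in one go
    size = max(1, int(max_expected / rate))
    sizes = [size] * (n // size)        # full-size pieces
    if n % size:
        sizes.append(n % size)          # leftover piece
    return sizes
```

With the realistic example above (N = 10**12, K = 10**6, n = 10**8) the expected count per chunk is 100, so no splitting is needed.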

simple_sample(n)

Get all samples from the next n items in a naive fashion.

Parameters

n : int
    The size of the chunk.

Returns

sample : numpy.array
    The indices to keep, relative to the current position in the sample.

ABXpy.sampling.sampler.Knuth_sampling(n, N, dtype=<Mock id='139952200485072'>): This is the usual sampling function when n is comparable to N.
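For readers unfamiliar with the technique, the sketch below shows Knuth's selection sampling (Algorithm S), assuming that is the algorithm the function name refers to. It scans the items in order and keeps item t with probability (n - m) / (N - t), where m is the number of items already kept. `selection_sampling` is a hypothetical name and the code is illustrative rather than the ABXpy source.

```python
import numpy as np


def selection_sampling(n, N, seed=None):
    """Return n sorted indices drawn uniformly from range(N) without replacement.

    Illustrative sketch of Knuth's Algorithm S; assumes 0 <= n <= N.
    """
    rng = np.random.default_rng(seed)
    sample = np.empty(n, dtype=np.int64)
    t = 0  # number of items examined so far
    m = 0  # number of items selected so far
    while m < n:
        # Keep item t with probability (n - m) / (N - t).
        if (N - t) * rng.random() < n - m:
            sample[m] = t
            m += 1
        t += 1
    return sample
```

The loop examines at most N items, so the cost is proportional to N regardless of n, which is why this approach suits the case where n is comparable to N.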

ABXpy.sampling.sampler.hypergeometric_sample(N, K, n): This function returns the number of elements to sample from the next n items.

ABXpy.sampling.sampler.rejection_sampling(n, N, dtype=<Mock id='139952200485200'>): Uses rejection sampling to maintain good performance when n << N.
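The idea behind rejection sampling here is to draw indices with replacement and discard duplicates, repeating until n distinct values have been collected; when n is much smaller than N, duplicates are rare and few rounds are needed. A minimal sketch under that assumption (`rejection_sample` is a hypothetical name, not the ABXpy function):

```python
import numpy as np


def rejection_sample(n, N, seed=None):
    """Return n sorted distinct indices drawn uniformly from range(N).

    Illustrative sketch: draw with replacement, drop duplicates, redraw the rest.
    """
    if n > N:
        raise ValueError("cannot draw more distinct items than the population holds")
    rng = np.random.default_rng(seed)
    sample = np.array([], dtype=np.int64)
    while sample.size < n:
        # Draw only as many candidates as are still missing.
        draws = rng.integers(0, N, size=n - sample.size)
        # union1d merges, removes duplicates, and returns a sorted array.
        sample = np.union1d(sample, draws)
    return sample
```

Unlike a sequential scan, the cost depends on n rather than N, at the price of extra rounds when n approaches N.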

ABXpy.sampling.sampler.sample_without_replacement(n, N, dtype=<Mock id='139952200484880'>): Returns uniform samples in [0, N-1] without replacement. It will use Knuth sampling or rejection sampling depending on the parameters n and N.

Note

The values 0.6 and 100 used to choose between the two methods are based on empirical tests of the functions and would need to be changed if the functions are changed.
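The documentation does not say how these two values enter the decision. A minimal sketch of one plausible rule, assuming (this is an assumption, not something stated here) that 100 is a lower bound on N and 0.6 an upper bound on the ratio n / N for preferring rejection sampling; `choose_method` and its parameter names are hypothetical:

```python
def choose_method(n, N, max_ratio=0.6, min_population=100):
    """Return the name of the sampling strategy to use for drawing n out of N.

    Hypothetical dispatch rule; the thresholds mirror the values in the note,
    but their exact role in ABXpy is assumed rather than documented.
    """
    # Rejection sampling pays off when the population is large and the
    # sample is a small enough fraction of it that duplicates are rare.
    if N > min_population and n / N < max_ratio:
        return "rejection"
    return "knuth"
```

For example, drawing 10 items out of 10**6 would select rejection sampling under this rule, while drawing 90 out of 100 would fall back to Knuth sampling.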
