sampling Package
This module implements an incremental sampler used to approximate the task by randomly selecting a portion of the triplets.
sampler Module
This module provides a class implementing incremental sampling without replacement. Incremental means that you do not have to draw the whole sample at once: at any given time you can get a piece of the sample of a size you specify. This is useful for very large sample sizes.
class ABXpy.sampling.sampler.IncrementalSampler(N, K, step=None, relative_indexing=True, dtype=<numpy dtype>)
Bases: object
Class for sampling without replacement in an incremental fashion.
Toy example of usage:

    import numpy as np
    from ABXpy.sampling.sampler import IncrementalSampler

    sampler = IncrementalSampler(10**4, 10**4, step=100, relative_indexing=False)
    complete_sample = np.concatenate([sample for sample in sampler])
    assert np.array_equal(complete_sample, np.arange(10**4))  # K == N: every index, in order
A more realistic example: sampling 1 million items without replacement from a total of 1 trillion items, considering 100 million items at a time:

    import numpy as np
    from ABXpy.sampling.sampler import IncrementalSampler

    # 10**12 items in chunks of 10**8, i.e. 10**4 chunks
    sampler = IncrementalSampler(10**12, 10**6, step=10**8, relative_indexing=False)
    complete_sample = np.concatenate([sample for sample in sampler])
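To illustrate the idea, here is a minimal sketch of incremental sampling without replacement, assuming the sampler proceeds chunk by chunk in the way the hypergeometric_sample description suggests: for each chunk, draw how many of the remaining picks fall inside it, then choose that many positions uniformly within the chunk. The class ToyIncrementalSampler and its seed parameter are hypothetical and not part of ABXpy; the sketch is written for clarity rather than speed and is only meant for modest N.

```python
import numpy as np


class ToyIncrementalSampler:
    """Hypothetical sketch (not ABXpy's implementation) of incremental
    sampling without replacement: K of N items, `step` items at a time."""

    def __init__(self, N, K, step, relative_indexing=True, seed=None):
        if not 0 <= K <= N:
            raise ValueError("need 0 <= K <= N")
        self.N_left = N        # items not yet considered
        self.K_left = K        # picks still to be made
        self.step = step
        self.relative_indexing = relative_indexing
        self.offset = 0        # absolute index of the next chunk's first item
        self.rng = np.random.default_rng(seed)

    def __iter__(self):
        return self

    def __next__(self):
        if self.N_left == 0:
            raise StopIteration
        n = min(self.step, self.N_left)
        if self.K_left in (0, self.N_left) or n == self.N_left:
            # Deterministic cases: nothing left to pick, every remaining
            # item must be picked, or this is the last chunk.
            k = min(self.K_left, n)
        else:
            # Number of the remaining picks that land in the next n items.
            k = int(self.rng.hypergeometric(self.K_left, self.N_left - self.K_left, n))
        # Choose k distinct positions in [0, n), returned in increasing order.
        picks = np.sort(self.rng.choice(n, size=k, replace=False))
        start = self.offset
        self.N_left -= n
        self.K_left -= k
        self.offset += n
        return picks if self.relative_indexing else picks + start


sampler = ToyIncrementalSampler(10**4, 10**4, step=100, relative_indexing=False, seed=0)
complete_sample = np.concatenate(list(sampler))
```

With K == N every item is kept, so the concatenated chunks reproduce range(N), mirroring the toy example above; with K < N each chunk holds a sorted, random subset of its positions.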
sample(n, dtype=<numpy dtype>)
Fast implementation of the sampling function.
Gets all samples from the next n items in a way that avoids rejection sampling with overly large samples, that is, samples whose expected number of sampled items is larger than 10**5.
- Parameters
  - n (int): the size of the chunk
- Returns
  - sample (numpy.array): the indices to keep, given either relative to the current position in the sample or as absolute positions, depending on the value of relative_indexing specified when initialising the sampler (default: True)
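One way the 10**5 cap could be enforced, sketched under the assumption that the next n items are split into sub-chunks small enough that each one's expected number of picks, m * K_left / N_left, stays at or below the cap. The helper split_chunk and its parameter names are hypothetical; ABXpy's actual splitting rule may differ.

```python
def split_chunk(n, N_left, K_left, max_expected=10**5):
    """Hypothetical helper: split the next n items into sub-chunks whose
    expected number of sampled items is at most max_expected.

    N_left: items not yet considered; K_left: picks still to be made.
    """
    if K_left == 0:
        return [n] if n else []   # nothing will be sampled; no need to split
    # Largest size m with m * K_left / N_left <= max_expected (exact integer
    # arithmetic), but always at least one item per sub-chunk.
    m = max(1, (max_expected * N_left) // K_left)
    sizes = [m] * (n // m)
    if n % m:
        sizes.append(n % m)       # leftover partial sub-chunk
    return sizes
```

For the realistic example above (10**6 picks out of 10**12 items, step 10**8), the first chunk is expected to contain only 100 picks, so no split happens; splitting only kicks in when K_left / N_left is large.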
ABXpy.sampling.sampler.Knuth_sampling(n, N, dtype=<numpy dtype>)
This is the usual sampling function when n is comparable to N.
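Assuming Knuth_sampling refers to Knuth's selection sampling (Algorithm S in The Art of Computer Programming, Vol. 2, Section 3.4.2), here is a minimal sketch of that technique. It scans the N candidates once, which is why it suits the case where n is comparable to N. The function name selection_sampling and the rng parameter are illustrative, not ABXpy's API.

```python
import numpy as np


def selection_sampling(n, N, rng=None):
    """Select n distinct values from range(N), uniformly and in increasing
    order, with Knuth's selection sampling (Algorithm S)."""
    if not 0 <= n <= N:
        raise ValueError("need 0 <= n <= N")
    rng = np.random.default_rng() if rng is None else rng
    out = np.empty(n, dtype=np.int64)
    chosen = 0
    for t in range(N):
        if chosen == n:
            break
        # Keep item t with probability (picks still needed) / (items left);
        # this becomes 1 once the items left equal the picks still needed.
        if (N - t) * rng.random() < n - chosen:
            out[chosen] = t
            chosen += 1
    return out
```

The cost is proportional to N regardless of n, so for n much smaller than N a method whose cost depends on n is preferable.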
ABXpy.sampling.sampler.hypergeometric_sample(N, K, n)
Returns the number of elements to sample from the next n items.
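If K items remain to be sampled among the N remaining ones, the number of them that fall within the next n items follows a hypergeometric distribution with mean n * K / N. A minimal sketch using NumPy's Generator.hypergeometric; the function name hypergeometric_count is hypothetical, and ABXpy may compute this differently.

```python
import numpy as np


def hypergeometric_count(N, K, n, rng=None):
    """Hypothetical stand-in: how many of the K items still to sample fall
    among the next n of the N remaining items."""
    rng = np.random.default_rng() if rng is None else rng
    if n == 0 or K in (0, N) or n == N:
        # Deterministic edge cases: empty chunk, nothing to pick,
        # everything to pick, or the chunk covers all remaining items.
        return min(K, n)
    return int(rng.hypergeometric(ngood=K, nbad=N - K, nsample=n))
```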
ABXpy.sampling.sampler.rejection_sampling(n, N, dtype=<numpy dtype>)
Uses rejection sampling to maintain good performance when n << N.
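A minimal sketch of one common form of rejection sampling for n << N: draw integers uniformly with replacement and reject duplicates until n distinct values are collected. Collisions are rare when n is much smaller than N, so few draws are wasted; as n approaches N the number of redraws grows quickly, which is where a scanning method does better. The name rejection_sample is hypothetical, not ABXpy's function.

```python
import numpy as np


def rejection_sample(n, N, rng=None):
    """Draw n distinct integers from range(N), returned sorted, by sampling
    with replacement and discarding duplicates."""
    if not 0 <= n <= N:
        raise ValueError("need 0 <= n <= N")
    rng = np.random.default_rng() if rng is None else rng
    seen = set()
    while len(seen) < n:
        # Draw only as many values as are still missing; any value already
        # in `seen` is silently rejected by the set.
        batch = rng.integers(0, N, size=n - len(seen))
        seen.update(batch.tolist())
    return np.array(sorted(seen), dtype=np.int64)
```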
ABXpy.sampling.sampler.sample_without_replacement(n, N, dtype=<numpy dtype>)
Returns uniform samples in [0, N-1] without replacement. It uses Knuth sampling or rejection sampling depending on the parameters n and N.
Note: the values 0.6 and 100 are based on empirical tests of the functions and would need to be changed if the functions are changed.