dival.datasets package
Submodules
- dival.datasets.angle_subset_dataset module
- dival.datasets.cached_dataset module
- dival.datasets.dataset module
Dataset
Dataset.space
Dataset.shape
Dataset.train_len
Dataset.validation_len
Dataset.test_len
Dataset.random_access
Dataset.num_elements_per_sample
Dataset.standard_dataset_name
Dataset.__init__()
Dataset.generator()
Dataset.get_train_generator()
Dataset.get_validation_generator()
Dataset.get_test_generator()
Dataset.get_len()
Dataset.get_train_len()
Dataset.get_validation_len()
Dataset.get_test_len()
Dataset.get_shape()
Dataset.get_num_elements_per_sample()
Dataset.get_data_pairs()
Dataset.get_data_pairs_per_index()
Dataset.create_torch_dataset()
Dataset.create_keras_generator()
Dataset.get_sample()
Dataset.get_samples()
Dataset.supports_random_access()
ObservationGroundTruthPairDataset
GroundTruthDataset
- dival.datasets.ellipses_dataset module
- dival.datasets.fbp_dataset module
- dival.datasets.lodopab_dataset module
download_lodopab()
LoDoPaBDataset
LoDoPaBDataset.space
LoDoPaBDataset.shape
LoDoPaBDataset.train_len
LoDoPaBDataset.validation_len
LoDoPaBDataset.test_len
LoDoPaBDataset.random_access
LoDoPaBDataset.num_elements_per_sample
LoDoPaBDataset.ray_trafo
LoDoPaBDataset.sorted_by_patient
LoDoPaBDataset.rel_patient_ids
LoDoPaBDataset.__init__()
LoDoPaBDataset.generator()
LoDoPaBDataset.get_ray_trafo()
LoDoPaBDataset.get_sample()
LoDoPaBDataset.get_samples()
LoDoPaBDataset.get_indices_for_patient()
LoDoPaBDataset.check_for_lodopab()
LoDoPaBDataset.get_num_patients()
LoDoPaBDataset.get_patient_ids()
LoDoPaBDataset.get_idx_sorted_by_patient()
- dival.datasets.reordered_dataset module
- dival.datasets.standard module
Module contents
Implements datasets for training and evaluating learned reconstructors.

- get_standard_dataset(): Return a standard dataset by name.
- Dataset: Dataset base class.
- GroundTruthDataset: Ground truth dataset base class.
- ObservationGroundTruthPairDataset: Dataset of pairs generated from a ground truth generator by applying a forward operator and noise.
- EllipsesDataset: Dataset with images of multiple random ellipses.
- LoDoPaBDataset: The LoDoPaB-CT dataset, which is documented in the Data Descriptor article https://www.nature.com/articles/s41597-021-00893-z and hosted on https://zenodo.org/record/3384092.
The function get_standard_dataset() returns fixed “standard” datasets with pairs of observation and ground truth samples. Currently the standard datasets are 'ellipses' and 'lodopab'.
The class ObservationGroundTruthPairDataset can be used, either directly or via GroundTruthDataset.create_pair_dataset(), to create a custom dataset of pairs given a ground truth dataset and a forward operator. For example (sketched below):

1. define a GroundTruthDataset object (e.g. EllipsesDataset)
2. define a forward operator
3. call create_pair_dataset() of the dataset and pass the forward operator as well as some noise specification if wanted
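A minimal sketch of these three steps, assuming odl is installed with the 'astra_cpu' backend available; the noise specification below mirrors the one used for the 'ellipses' standard dataset::

    import odl
    from dival.datasets import EllipsesDataset

    # 1. define a GroundTruthDataset object
    ellipses = EllipsesDataset(image_size=128)

    # 2. define a forward operator (parallel beam ray transform, 30 angles)
    geometry = odl.tomo.parallel_beam_geometry(ellipses.space, num_angles=30)
    ray_trafo = odl.tomo.RayTransform(ellipses.space, geometry,
                                      impl='astra_cpu')

    # 3. create the pair dataset, adding 2.5% relative white Gaussian noise
    dataset = ellipses.create_pair_dataset(
        forward_op=ray_trafo, noise_type='white',
        noise_kwargs={'relative_stddev': True, 'stddev': 0.025},
        noise_seeds={'train': 1, 'validation': 2, 'test': 3})

    # pair datasets created this way do not support random access,
    # so samples are drawn from the generator
    obs, gt = next(dataset.generator(part='train'))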
- dival.datasets.get_standard_dataset(name, **kwargs)[source]
Return a standard dataset by name.
The standard datasets are (currently):
'ellipses'
A typical synthetic CT dataset with ellipse phantoms.
EllipsesDataset is used as the ground truth dataset, a ray transform with parallel beam geometry using 30 angles is applied, and white Gaussian noise with a standard deviation of 2.5% (i.e. 0.025 * mean(abs(observation))) is added. In order to avoid the inverse crime, the ground truth images of shape (128, 128) are upscaled by bilinear interpolation to a resolution of (400, 400) before the ray transform is applied (whose discretization is different from the one of ray_trafo).
- Attributes of the returned dataset:
  ray_trafo (odl.tomo.RayTransform) – Ray transform corresponding to the noiseless forward operator.
  get_ray_trafo(**kwargs) (function) – Function that returns a ray transform corresponding to the noiseless forward operator. Keyword arguments (e.g. impl) are forwarded to the RayTransform constructor.
'lodopab'
The LoDoPaB-CT dataset, which is documented in the Data Descriptor article https://www.nature.com/articles/s41597-021-00893-z and hosted on https://zenodo.org/record/3384092. It is a simulated low dose CT dataset based on real reconstructions from the LIDC-IDRI dataset.
The dataset contains 42895 pairs of images and projection data. For simulation, a ray transform with parallel beam geometry using 1000 angles and 513 detector pixels is used. Poisson noise corresponding to 4096 incident photons per pixel before attenuation is applied to the projection data.
- Attributes of the returned dataset:
  ray_trafo (odl.tomo.RayTransform) – Ray transform corresponding to the noiseless forward operator.
- Methods of the returned dataset:
  get_ray_trafo(**kwargs) – Function that returns a ray transform corresponding to the noiseless forward operator. Keyword arguments (e.g. impl) are forwarded to the RayTransform constructor.
- Parameters:
  name (str) – Name of the dataset.
  kwargs (dict) – Keyword arguments. Supported parameters for the datasets are:
  'ellipses'
    impl ({'skimage', 'astra_cpu', 'astra_cuda'}, optional) – Implementation passed to odl.tomo.RayTransform. Default: 'astra_cuda'.
    fixed_seeds (dict or bool, optional) – Seeds to use for random ellipse generation, passed to EllipsesDataset.__init__(). Default: False.
    fixed_noise_seeds (dict or bool, optional) – Seeds to use for noise generation, passed as noise_seeds to GroundTruthDataset.create_pair_dataset(). If True is passed (the default), the seeds {'train': 1, 'validation': 2, 'test': 3} are used.
  'lodopab'
    num_angles (int, optional) – Number of angles to use from the full 1000 angles. Must be a divisor of 1000.
    observation_model ({'post-log', 'pre-log'}, optional) – The observation model to use. Default: 'post-log'.
    min_photon_count (float, optional) – Replacement value for a simulated photon count of zero. If observation_model == 'post-log', a value greater than zero is required in order to avoid undefined values. The default is 0.1, both for the 'post-log' and the 'pre-log' model.
    sorted_by_patient (bool, optional) – Whether to sort the samples by patient id. Useful for resplitting the dataset. Default: False.
    impl ({'skimage', 'astra_cpu', 'astra_cuda'}, optional) – Implementation passed to odl.tomo.RayTransform. Default: 'astra_cuda'.
- Returns:
  dataset – The standard dataset. It has an attribute standard_dataset_name that stores its name.
- Return type:
  Dataset
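For example, the 'ellipses' dataset can be obtained and used as follows (a sketch; impl='astra_cpu' is passed here in case no CUDA-enabled GPU is available)::

    from dival.datasets import get_standard_dataset

    dataset = get_standard_dataset('ellipses', impl='astra_cpu')
    ray_trafo = dataset.ray_trafo  # noiseless forward operator

    # the 'ellipses' dataset does not support random access,
    # so samples are drawn from the generator
    obs, gt = next(dataset.generator(part='train'))
    print(dataset.standard_dataset_name)  # 'ellipses'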
- class dival.datasets.Dataset(space=None)[source]
Bases:
object
Dataset base class.
Subclasses must either implement generator() or provide random access by implementing get_sample() and get_samples() (which then should be indicated by setting the attribute random_access = True).
- space
  The spaces of the elements of samples as a tuple. If only one element per sample is provided, this attribute is the space of the element (i.e., no tuple). It is strongly recommended to set this attribute in subclasses, as some functionality may depend on it.
  - Type:
    [tuple of ] odl.space.base_tensors.TensorSpace or None
- shape
The shapes of the elements of samples as a tuple of tuple of int. If only one element per sample is provided, this attribute is the shape of the element (i.e., not a tuple of tuple of int, but a tuple of int).
- Type:
[tuple of ] tuple of int, optional
- train_len
Number of training samples.
- Type:
int, optional
- validation_len
Number of validation samples.
- Type:
int, optional
- test_len
Number of test samples.
- Type:
int, optional
- random_access
  Whether the dataset supports random access via self.get_sample and self.get_samples. Setting this attribute is the preferred way for subclasses to indicate whether they support random access.
  - Type:
    bool, optional
- num_elements_per_sample
Number of elements per sample. E.g. 1 for a ground truth dataset or 2 for a dataset of pairs of observation and ground truth.
- Type:
int, optional
- standard_dataset_name
Datasets returned by get_standard_dataset() have this attribute, which stores the dataset's name.
- Type:
str, optional
- __init__(space=None)[source]
The attributes that potentially should be set by the subclass are: space (can also be set by argument), shape, train_len, validation_len, test_len, random_access and num_elements_per_sample.
- Parameters:
  space ([tuple of ] odl.space.base_tensors.TensorSpace, optional) – The spaces of the elements of samples as a tuple. If only one element per sample is provided, this attribute is the space of the element (i.e., no tuple). It is strongly recommended to set space in subclasses, as some functionality may depend on it.
- generator(part='train')[source]
Yield data.
The default implementation calls get_sample() if the dataset implements it (i.e., supports random access).
- Parameters:
  part ({'train', 'validation', 'test'}, optional) – Whether to yield train, validation or test data. Default is 'train'.
- Yields:
  data (odl element or tuple of odl elements) – Sample of the dataset.
- get_len(part='train')[source]
Return the number of elements the generator will yield.
- Parameters:
part ({'train', 'validation', 'test'}, optional) – Whether to return the number of train, validation or test elements. Default is 'train'.
- get_shape()[source]
Return the shape of each element.
Returns shape if it is set. Otherwise, it is inferred from space (which is strongly recommended to be set in every subclass). If space is not set either, a NotImplementedError is raised.
- Returns:
  shape
- Return type:
  [tuple of ] tuple
- get_num_elements_per_sample()[source]
Return number of elements per sample.
Returns num_elements_per_sample if it is set. Otherwise, it is inferred from space (which is strongly recommended to be set in every subclass). If space is not set either, a NotImplementedError is raised.
- Returns:
  num_elements_per_sample
- Return type:
  int
- get_data_pairs(part='train', n=None)[source]
Return the first samples from a data part as a DataPairs object.
Only supports datasets with two elements per sample.
- Parameters:
  part ({'train', 'validation', 'test'}, optional) – The data part. Default is 'train'.
  n (int, optional) – Number of pairs (from the beginning). If None, all available data is used (the default).
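For instance, a small evaluation set can be fetched like this (a sketch, assuming the returned DataPairs object exposes observations and ground_truth attributes as in dival.data.DataPairs)::

    # fetch the first 5 validation pairs as a DataPairs object
    pairs = dataset.get_data_pairs(part='validation', n=5)
    for obs, gt in zip(pairs.observations, pairs.ground_truth):
        pass  # e.g. evaluate a reconstructor on (obs, gt)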
- get_data_pairs_per_index(part='train', index=None)[source]
Return specific samples from a data part as a DataPairs object.
Only supports datasets with two elements per sample.
For datasets not supporting random access, samples are extracted from generator(), which can be computationally expensive.
- Parameters:
  part ({'train', 'validation', 'test'}, optional) – The data part. Default is 'train'.
  index (int or list of int, optional) – Indices of the samples in the data part. Default is [0].
- create_torch_dataset(part='train', reshape=None, transform=None)[source]
Create a torch dataset wrapper for one part of this dataset.
If supports_random_access() returns False, a subclass of torch.utils.data.IterableDataset is returned that fetches samples via generator(). Note: When using torch's DataLoader with multiple workers, you might want to individually configure the datasets for each worker; see the PyTorch docs on IterableDataset. For this purpose it can be useful to modify the wrapped dival dataset in worker_init_fn(), where it can be accessed via torch.utils.data.get_worker_info().dataset.dataset.
If supports_random_access() returns True, a subclass of torch.utils.data.Dataset is returned that retrieves samples using get_sample().
- Parameters:
  part ({'train', 'validation', 'test'}, optional) – The data part. Default is 'train'.
  reshape (tuple of (tuple or None), optional) – Shapes to which the elements of each sample will be reshaped. If None is passed for an element, no reshape is applied.
  transform (callable, optional) – Transform to be applied on each sample, useful for augmentation. Default: None, i.e. no transform.
- Returns:
  dataset – The torch dataset wrapping this dataset. The wrapped dival dataset is assigned to the attribute dataset.dataset.
- Return type:
  torch.utils.data.Dataset or torch.utils.data.IterableDataset
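A sketch of typical usage with a pair dataset that supports random access (e.g. 'lodopab'), prepending a channel dimension to both sample elements::

    from torch.utils.data import DataLoader

    torch_dataset = dataset.create_torch_dataset(
        part='train',
        reshape=((1,) + dataset.space[0].shape,
                 (1,) + dataset.space[1].shape))

    # shuffle=True requires the map-style wrapper, i.e. random access
    loader = DataLoader(torch_dataset, batch_size=4, shuffle=True)
    for obs_batch, gt_batch in loader:
        pass  # e.g. feed the batch to a training step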
- create_keras_generator(part='train', batch_size=1, shuffle=True, reshape=None)[source]
Create a keras data generator wrapper for one part of this dataset.
If supports_random_access() returns False, a generator wrapping generator() is returned. In this case, no shuffling is performed regardless of the passed shuffle parameter. Also, parallel data loading (with multiple workers) is not applicable.
If supports_random_access() returns True, a tf.keras.utils.Sequence is returned, which is implemented using get_sample(). For datasets that support parallel calls to get_sample(), the returned data generator (sequence) can be used by multiple workers.
- Parameters:
  part ({'train', 'validation', 'test'}, optional) – The data part. Default is 'train'.
  batch_size (int, optional) – Batch size. Default is 1.
  shuffle (bool, optional) – Whether to shuffle samples each epoch. This option has no effect if supports_random_access() returns False, since in that case samples are fetched directly from generator(). The default is True.
  reshape (tuple of (tuple or None), optional) – Shapes to which the elements of each sample will be reshaped. If None is passed for an element, no reshape is applied.
- get_sample(index, part='train', out=None)[source]
Get single sample by index.
- Parameters:
  index (int) – Index of the sample.
  part ({'train', 'validation', 'test'}, optional) – The data part. Default is 'train'.
  out (array-like or tuple of (array-like or bool) or None) – Array(s) (or e.g. odl element(s)) to which the sample is written. A tuple should be passed if the dataset returns two or more arrays per sample (i.e. pairs, …). If a tuple element is a bool, it has the following meaning:
    True – create a new array and return it.
    False – do not return this array, i.e. None is returned.
- Returns:
  sample – E.g. for a pair dataset: (array, None) if out=(True, False).
- Return type:
  [tuple of ] (array-like or None)
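A sketch of how out can be used with a pair dataset that supports random access (e.g. 'lodopab')::

    import numpy as np

    # retrieve only the observation of sample 3; the ground truth is skipped
    obs, gt = dataset.get_sample(3, part='train', out=(True, False))
    assert gt is None

    # write the ground truth directly into a preallocated array
    gt_buf = np.empty(dataset.space[1].shape, dtype=np.float32)
    dataset.get_sample(3, part='train', out=(False, gt_buf))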
- get_samples(key, part='train', out=None)[source]
Get samples by slice or range.
The default implementation calls get_sample() if the dataset implements it.
- Parameters:
  key (slice or range) – Indices of the samples.
  part ({'train', 'validation', 'test'}, optional) – The data part. Default is 'train'.
  out (array-like or tuple of (array-like or bool) or None) – Array(s) (or e.g. odl element(s)) to which the samples are written. The first dimension must match the number of samples requested. A tuple should be passed if the dataset returns two or more arrays per sample (i.e. pairs, …). If a tuple element is a bool, it has the following meaning:
    True – create a new array and return it.
    False – do not return this array, i.e. None is returned.
- Returns:
  samples – If the dataset has multiple arrays per sample, a tuple holding arrays is returned. E.g. for a pair dataset: (array, None) if out=(True, False). The samples are stacked in the first (additional) dimension of each array.
- Return type:
  [tuple of ] (array-like or None)
- supports_random_access()[source]
Whether random access seems to be supported.
If the object has the attribute self.random_access, its value is returned (this is the preferred way for subclasses to indicate whether they support random access). Otherwise, a simple duck-type check is performed which tries to get the first sample by random access.
- Returns:
supports – True if the dataset supports random access, otherwise False.
- Return type:
  bool
- class dival.datasets.GroundTruthDataset(space=None)[source]
Bases:
Dataset
Ground truth dataset base class.
- __init__(space=None)[source]
- Parameters:
space (odl.space.base_tensors.TensorSpace, optional) – The space of the samples. It is strongly recommended to set space in subclasses, as some functionality may depend on it.
- create_pair_dataset(forward_op, post_processor=None, noise_type=None, noise_kwargs=None, noise_seeds=None)[source]
The parameters are a subset of those of ObservationGroundTruthPairDataset.__init__().
- class dival.datasets.ObservationGroundTruthPairDataset(ground_truth_gen, forward_op, post_processor=None, train_len=None, validation_len=None, test_len=None, domain=None, noise_type=None, noise_kwargs=None, noise_seeds=None)[source]
Bases:
Dataset
Dataset of pairs generated from a ground truth generator by applying a forward operator and noise.
NB: This dataset class does not allow for random access. Supporting random access would require restoring the same random generator state each time the same sample is accessed, if a fixed noise realization should be used for each sample.
- __init__(ground_truth_gen, forward_op, post_processor=None, train_len=None, validation_len=None, test_len=None, domain=None, noise_type=None, noise_kwargs=None, noise_seeds=None)[source]
- Parameters:
ground_truth_gen (generator function) – Function returning a generator providing ground truth. Must accept a part parameter like Dataset.generator().
forward_op (odl operator) – Forward operator to apply on the ground truth.
post_processor (odl operator, optional) – Post-processor to apply on the result of the forward operator.
train_len (int, optional) – Number of training samples.
validation_len (int, optional) – Number of validation samples.
test_len (int, optional) – Number of test samples.
domain (odl space, optional) – Ground truth domain. If not specified, it is inferred from forward_op.
noise_type (str, optional) – Noise type. See NoiseOperator for the list of supported noise types.
noise_kwargs (dict, optional) – Keyword arguments passed to NoiseOperator.
noise_seeds (dict of int, optional) – Seeds to use for random noise generation. The part ('train', …) is the key to the dict. If a key is omitted or a value is None, no fixed seed is used for that part. By default, no fixed seeds are used.
- generator(part='train')[source]
Yield data.
The default implementation calls get_sample() if the dataset implements it (i.e., supports random access).
- Parameters:
  part ({'train', 'validation', 'test'}, optional) – Whether to yield train, validation or test data. Default is 'train'.
- Yields:
  data (odl element or tuple of odl elements) – Sample of the dataset.
- class dival.datasets.EllipsesDataset(image_size=128, min_pt=None, max_pt=None, train_len=32000, validation_len=3200, test_len=3200, fixed_seeds=False)[source]
Bases:
GroundTruthDataset
Dataset with images of multiple random ellipses.
This dataset uses odl.phantom.ellipsoid_phantom() to create the images. The images are normalized to have a value range of [0., 1.] with a background value of 0.
- space
  odl.uniform_discr(min_pt, max_pt, (image_size, image_size), dtype='float32'), with the parameters passed to __init__().
- shape
  (image_size, image_size), with the image_size parameter passed to __init__(). Default: (128, 128).
- train_len
  The train_len parameter passed to __init__(). Default: 32000.
- validation_len
  The validation_len parameter passed to __init__(). Default: 3200.
- test_len
  The test_len parameter passed to __init__(). Default: 3200.
- random_access
  False
- num_elements_per_sample
  1
- __init__(image_size=128, min_pt=None, max_pt=None, train_len=32000, validation_len=3200, test_len=3200, fixed_seeds=False)[source]
- Parameters:
  image_size (int, optional) – Number of pixels per image dimension. Default: 128.
  min_pt ([int, int], optional) – Minimum values of the lp space. Default: [-image_size/2, -image_size/2].
  max_pt ([int, int], optional) – Maximum values of the lp space. Default: [image_size/2, image_size/2].
  train_len (int or None, optional) – Length of the training set. Default: 32000. If None, infinitely many samples can be generated.
  validation_len (int, optional) – Length of the validation set. Default: 3200.
  test_len (int, optional) – Length of the test set. Default: 3200.
  fixed_seeds (dict or bool, optional) – Seeds to use for random generation. The values of the keys 'train', 'validation' and 'test' are used. If a seed is None or omitted, it is chosen randomly. If True is passed, the seeds {'train': 42, 'validation': 2, 'test': 1} are used. If False is passed (the default), all seeds are chosen randomly.
- class dival.datasets.LoDoPaBDataset(min_pt=None, max_pt=None, observation_model='post-log', min_photon_count=None, sorted_by_patient=False, impl='astra_cuda')[source]
Bases:
Dataset
The LoDoPaB-CT dataset, which is documented in the Data Descriptor article https://www.nature.com/articles/s41597-021-00893-z and hosted on https://zenodo.org/record/3384092. It is a simulated low dose CT dataset based on real reconstructions from the LIDC-IDRI dataset.
The dataset contains 42895 pairs of images and projection data. For simulation, a ray transform with parallel beam geometry using 1000 angles and 513 detector pixels is used. Poisson noise corresponding to 4096 incident photons per pixel before attenuation is applied to the projection data. The images have a size of 362x362 px.
An ODL ray transform that corresponds to the noiseless forward operator can be obtained via the get_ray_trafo() method of this dataset. Additionally, the ray_trafo attribute holds a ray transform instance, which is created during __init__(). Note: By default, the 'astra_cuda' implementation backend is used, which requires both astra and a CUDA-enabled GPU to be available. You can choose a different backend by passing impl='skimage' or impl='astra_cpu'.
Further functionalities:
- converting the stored post-log observations to pre-log observations on the fly (cf. the observation_model parameter of __init__())
- sorting by patient ids (cf. the sorted_by_patient parameter of __init__())
- changing the zero photon count replacement value of 0.1 used for pre-log observations (cf. the min_photon_count parameter of __init__())
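A minimal sketch of constructing and accessing the dataset, assuming the LoDoPaB-CT data has been downloaded to the configured data path (cf. download_lodopab())::

    from dival.datasets import LoDoPaBDataset

    dataset = LoDoPaBDataset(impl='astra_cpu')  # default impl is 'astra_cuda'
    ray_trafo = dataset.get_ray_trafo(impl='astra_cpu')

    # LoDoPaBDataset supports random access
    obs, gt = dataset.get_sample(0, part='train')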
- space
  (space[0], space[1]), where
  space[0] is odl.uniform_discr([0., -0.1838], [3.1416, 0.1838], (1000, 513), dtype='float32') and
  space[1] is odl.uniform_discr(min_pt, max_pt, (362, 362), dtype='float32'), with the min_pt and max_pt parameters passed to __init__()
- shape
(362, 362)
- train_len
35820
- validation_len
3522
- test_len
3553
- random_access
True
- num_elements_per_sample
2
- ray_trafo
  Ray transform corresponding to the noiseless forward operator.
  - Type:
    odl.tomo.RayTransform
- sorted_by_patient
  Whether the samples are sorted by patient id. Default: False.
  - Type:
    bool
- rel_patient_ids
  Relative patient ids of the samples in the original non-sorted order for each part, as returned by LoDoPaBDataset.get_patient_ids(). None if the csv files are not found.
  - Type:
    (dict of array) or None
- __init__(min_pt=None, max_pt=None, observation_model='post-log', min_photon_count=None, sorted_by_patient=False, impl='astra_cuda')[source]
- Parameters:
min_pt ([float, float], optional) – Minimum values of the lp space. Default: [-0.13, -0.13].
max_pt ([float, float], optional) – Maximum values of the lp space. Default: [0.13, 0.13].
observation_model ({'post-log', 'pre-log'}, optional) – The observation model to use. The default is 'post-log'.
  'post-log' – Observations are linearly related to the normalized ground truth via the ray transform, obs = ray_trafo(gt) + noise. Note that the scaling of the observations matches the normalized ground truth, i.e., they are divided by the linear attenuation of 3071 HU.
  'pre-log' – Observations are non-linearly related to the ground truth, as given by the Beer-Lambert law. The model is obs = exp(-ray_trafo(gt * MU(3071 HU))) + noise, where MU(3071 HU) is the factor by which the ground truth was normalized.
min_photon_count (float, optional) – Replacement value for a simulated photon count of zero. If observation_model == 'post-log', a value greater than zero is required in order to avoid undefined values. The default is 0.1, both for the 'post-log' and the 'pre-log' model.
sorted_by_patient (bool, optional) – Whether to sort the samples by patient id. Useful for resplitting the dataset. See also get_indices_for_patient(). Note that the slices of each patient are ordered randomly wrt. the z-location in any case. Default: False.
impl ({'skimage', 'astra_cpu', 'astra_cuda'}, optional) – Implementation passed to odl.tomo.RayTransform to construct ray_trafo.
- generator(part='train')[source]
Yield pairs of low dose observations and (virtual) ground truth.
- Parameters:
part ({'train', 'validation', 'test'}, optional) – The data part. Default is 'train'.
- Yields:
  (observation, ground_truth) –
    observation – odl element with shape (1000, 513). The values depend on the observation_model and min_photon_count parameters that were passed to __init__().
    ground_truth – odl element with shape (362, 362). The values lie in the range [0., 1.].
- get_ray_trafo(**kwargs)[source]
Return the ray transform that is a noiseless version of the forward operator.
- Parameters:
impl ({'skimage', 'astra_cpu', 'astra_cuda'}, optional) – The backend implementation passed to odl.tomo.RayTransform.
- Returns:
  ray_trafo – The ray transform that corresponds to the noiseless map from 362 x 362 images to the -log of their projections (sinograms).
- Return type:
  odl operator
- get_sample(index, part='train', out=None)[source]
Get single sample of the dataset. Returns a pair of (virtual) ground truth and its low dose observation, of which either part can be left out by option.
- Parameters:
index (int) – The index into the dataset part.
part ({'train', 'validation', 'test'}, optional) – The data part. Default is 'train'.
out (tuple of (array-like or bool), optional) – out == (out_observation, out_ground_truth), where
  out_observation (array-like or bool) – Shape (1000, 513). If an odl element or array is passed, the observation is written to it. If True, a new odl element holding the observation is created (the default). If False, no observation is returned.
  out_ground_truth (array-like or bool) – Shape (362, 362). If an odl element or array is passed, the ground truth is written to it. If True, a new odl element holding the ground truth is created (the default). If False, no ground truth is returned.
- Returns:
  observation (odl element or np.ndarray or None) – Depending on the value of out_observation (see parameter out), a newly created odl element, out_observation or None is returned. The observation values depend on the observation_model and min_photon_count parameters that were given to the constructor.
  ground_truth (odl element or np.ndarray or None) – Depending on the value of out_ground_truth (see parameter out), a newly created odl element, out_ground_truth or None is returned. The values lie in the range [0., 1.].
- Return type:
  (observation, ground_truth)
- get_samples(key, part='train', out=None)[source]
Get slice of the dataset. Returns a pair of (virtual) ground truth data and its low dose observation data, of which either part can be left out by option.
- Parameters:
key (slice or range) – The indices into the dataset part.
part ({'train', 'validation', 'test'}, optional) – The data part. Default is 'train'.
out (tuple of (array or bool), optional) – out == (out_observation, out_ground_truth), where
  out_observation (np.ndarray or bool) – If an array is passed, the observation data is written to it. If True, a new array holding the observation data is created (the default). If False, no observation data is returned.
  out_ground_truth (np.ndarray or bool) – If an array is passed, the ground truth data is written to it. If True, a new array holding the ground truth data is created (the default). If False, no ground truth data is returned.
- Returns:
  observation (np.ndarray or None) – Shape (samples, 1000, 513). Depending on the value of out_observation (see parameter out), a newly created array, out_observation or None is returned. The observation values depend on the observation_model and min_photon_count parameters that were given to the constructor.
  ground_truth (np.ndarray or None) – Shape (samples, 362, 362). Depending on the value of out_ground_truth (see parameter out), a newly created array, out_ground_truth or None is returned. The values lie in the range [0., 1.].
- Return type:
  (observation, ground_truth)
- get_indices_for_patient(rel_patient_id, part='train')[source]
Return the indices of the samples from one patient. If self.sorted_by_patient is True, the indices will be subsequent.
- Parameters:
  rel_patient_id (int) – Patient id, relative to the part.
  part ({'train', 'validation', 'test'}, optional) – The data part. Default is 'train'.
- Returns:
indices – The indices of the samples from the patient.
- Return type:
array
- static check_for_lodopab()[source]
Quickly check whether the first and last file of each dataset part exist under the configured data path.
- Returns:
exists – Whether LoDoPaB seems to exist.
- Return type:
bool
- static get_num_patients(part='train')[source]
Return the number of patients in a dataset part.
- Parameters:
part ({'train', 'validation', 'test'}, optional) – Whether to return the number of train, validation or test patients. Default is 'train'.
- static get_patient_ids(relative=True)[source]
Return the (relative) patient id for all samples of all dataset parts.
- Parameters:
relative (bool, optional) – Whether to use ids relative to the dataset part. The csv files store absolute indices, where “train_ids < validation_ids < test_ids”. If False, these absolute indices are returned. If True, the smallest absolute id of the part is subtracted, giving zero-based (relative) patient ids. Default: True.
- Returns:
ids – For each part: an array with the (relative) patient ids for all samples (length: number of samples in the corresponding part).
- Return type:
dict of array
- Raises:
OSError – An OSError is raised if one of the csv files containing the patient ids is missing in the configured data path.
- static get_idx_sorted_by_patient(ids=None)[source]
Return indices that allow access to each dataset part in patient id order.
Note: In most cases this method should not be called directly. Rather, specify sorted_by_patient=True to the constructor if applicable. A plausible use case of this method, however, is to access existing cache files that were created with sorted_by_patient=False. In this case, the dataset should be constructed with sorted_by_patient=False, wrapped by a CachedDataset and then reordered with ReorderedDataset using the indices returned by this method, as sketched below.
- Parameters:
  ids (dict of array-like, optional) – Patient ids as returned by get_patient_ids(). It is not relevant to this function whether they are relative.
- Returns:
  idx – Indices that allow access to each dataset part in patient id order. Each array value is an index into the samples in original order (as stored in the HDF5 files). I.e.: By iterating the samples with index idx[part][i] for i = 0, 1, 2, ..., one first obtains all samples from one patient, then continues with the samples of the second patient, and so on.
- Return type:
  dict of array
- Raises:
  OSError – An OSError is raised if ids is None and one of the csv files containing the patient ids is missing in the configured data path.
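A sketch of the cache-reordering use case described above (the cache file names are hypothetical; the cache is assumed to have been created from the dataset in its original, non-sorted order)::

    from dival.datasets import (CachedDataset, LoDoPaBDataset,
                                ReorderedDataset)

    dataset = LoDoPaBDataset(sorted_by_patient=False, impl='astra_cpu')
    cache_files = {'train': ('cache_train_obs.npy', 'cache_train_gt.npy')}
    cached_dataset = CachedDataset(dataset, dataset.space, cache_files)

    # reorder the cached samples into patient id order
    idx = LoDoPaBDataset.get_idx_sorted_by_patient()
    sorted_dataset = ReorderedDataset(cached_dataset, idx)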
- class dival.datasets.AngleSubsetDataset(dataset, angle_indices, impl=None)[source]
Bases:
Dataset
CT dataset that selects a subset of the angles of a basis CT dataset.
- __init__(dataset, angle_indices, impl=None)[source]
- Parameters:
dataset (Dataset) – Basis CT dataset. Requirements:
  - sample elements are (observation, ground_truth)
  - get_ray_trafo() gives the corresponding ray transform
angle_indices (array-like or slice) – Indices of the angles to use from the observations.
impl ({'skimage', 'astra_cpu', 'astra_cuda'}, optional) – Implementation passed to odl.tomo.RayTransform to construct ray_trafo.
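A sketch of creating a sparse-angle variant of a standard CT dataset (assuming 'lodopab' with its 1000 angles as the basis dataset)::

    from dival.datasets import AngleSubsetDataset, get_standard_dataset

    dataset = get_standard_dataset('lodopab', impl='astra_cpu')
    # keep every 10th of the 1000 angles, i.e. 100 angles
    angle_subset_dataset = AngleSubsetDataset(dataset, range(0, 1000, 10))
    ray_trafo = angle_subset_dataset.get_ray_trafo(impl='astra_cpu')
    obs, gt = angle_subset_dataset.get_sample(0, part='train')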
- get_ray_trafo(**kwargs)[source]
Return the ray transform that matches the subset of angles specified to the constructor via angle_indices.
- generator(part='train')[source]
Yield data.
The default implementation calls get_sample() if the dataset implements it (i.e., supports random access).
- Parameters:
  part ({'train', 'validation', 'test'}, optional) – Whether to yield train, validation or test data. Default is 'train'.
- Yields:
  data (odl element or tuple of odl elements) – Sample of the dataset.
- get_sample(index, part='train', out=None)[source]
Get single sample by index.
- Parameters:
index (int) – Index of the sample.
part ({'train', 'validation', 'test'}, optional) – The data part. Default is 'train'.
out (array-like or tuple of (array-like or bool) or None) – Array(s) (or e.g. odl element(s)) to which the sample is written. A tuple should be passed if the dataset returns two or more arrays per sample (i.e. pairs, …). If a tuple element is a bool, it has the following meaning:
  True – create a new array and return it.
  False – do not return this array, i.e. None is returned.
- Returns:
  sample – E.g. for a pair dataset: (array, None) if out=(True, False).
- Return type:
  [tuple of ] (array-like or None)
- get_samples(key, part='train', out=None)[source]
Get samples by slice or range.
The default implementation calls get_sample() if the dataset implements it.
- Parameters:
  key (slice or range) – Indices of the samples.
  part ({'train', 'validation', 'test'}, optional) – The data part. Default is 'train'.
  out (array-like or tuple of (array-like or bool) or None) – Array(s) (or e.g. odl element(s)) to which the samples are written. The first dimension must match the number of samples requested. A tuple should be passed if the dataset returns two or more arrays per sample (i.e. pairs, …). If a tuple element is a bool, it has the following meaning:
    True – create a new array and return it.
    False – do not return this array, i.e. None is returned.
- Returns:
  samples – If the dataset has multiple arrays per sample, a tuple holding arrays is returned. E.g. for a pair dataset: (array, None) if out=(True, False). The samples are stacked in the first (additional) dimension of each array.
- Return type:
  [tuple of ] (array-like or None)
- class dival.datasets.CachedDataset(dataset, space, cache_files, size=None)[source]
Bases:
Dataset
Dataset that replaces elements of another dataset with cached data from .npy files.
The arrays in the .npy files must have shape (self.get_len(part),) + self.space[i].shape for the i-th component.
for the i-th component.- __init__(dataset, space, cache_files, size=None)[source]
- Parameters:
dataset (Dataset) – Original dataset from which non-cached elements are used. Must support random access if any elements are not cached.
space ([tuple of ] odl.space.base_tensors.TensorSpace, optional) – The space(s) of the elements of samples as a tuple. This may differ from the space of the original dataset, e.g. for precomputing domain-changing operations on the elements.
cache_files (dict of [tuple of ] (str or None)) – Filenames of the cache files for each part and for each component. The part ('train', …) is the key to the dict. For each part, a tuple of filenames should be provided, each of which can be None, meaning that this component should be fetched from the original dataset. If the dataset only provides one element per sample, the filename does not have to be packed inside a tuple. If a key is omitted, the part is fetched from the original dataset.
As an example, for a CT dataset with cached FBPs instead of observations for parts 'train' and 'validation':
{'train': ('cache_train_fbp.npy', None), 'validation': ('cache_validation_fbp.npy', None)}
size (dict of int, optional) – Numbers of samples for each part. If a field is omitted or has value None, all available samples are used, which may be less than the number of samples in the original dataset if the cache contains fewer samples. Default: {}.
- generator(part='train')[source]
Yield data.
The default implementation calls get_sample() if the dataset implements it (i.e., supports random access).
- Parameters:
  part ({'train', 'validation', 'test'}, optional) – Whether to yield train, validation or test data. Default is 'train'.
- Yields:
  data (odl element or tuple of odl elements) – Sample of the dataset.
- get_sample(index, part='train', out=None)[source]
Get single sample by index.
- Parameters:
index (int) – Index of the sample.
part ({'train', 'validation', 'test'}, optional) – The data part. Default is 'train'.
out (array-like or tuple of (array-like or bool) or None) – Array(s) (or e.g. odl element(s)) to which the sample is written. A tuple should be passed if the dataset returns two or more arrays per sample (i.e. pairs, …). If a tuple element is a bool, it has the following meaning:
  True – create a new array and return it.
  False – do not return this array, i.e. None is returned.
- Returns:
  sample – E.g. for a pair dataset: (array, None) if out=(True, False).
- Return type:
  [tuple of ] (array-like or None)
- get_samples(key, part='train', out=None)[source]
Get samples by slice or range.
The default implementation calls get_sample() if the dataset implements it.
- Parameters:
  key (slice or range) – Indices of the samples.
  part ({'train', 'validation', 'test'}, optional) – The data part. Default is 'train'.
  out (array-like or tuple of (array-like or bool) or None) – Array(s) (or e.g. odl element(s)) to which the samples are written. The first dimension must match the number of samples requested. A tuple should be passed if the dataset returns two or more arrays per sample (i.e. pairs, …). If a tuple element is a bool, it has the following meaning:
    True – create a new array and return it.
    False – do not return this array, i.e. None is returned.
- Returns:
  samples – If the dataset has multiple arrays per sample, a tuple holding arrays is returned. E.g. for a pair dataset: (array, None) if out=(True, False). The samples are stacked in the first (additional) dimension of each array.
- Return type:
  [tuple of ] (array-like or None)
- dival.datasets.generate_cache_files(dataset, cache_files, size=None, flush_interval=1000)[source]
Generate cache files for CachedDataset.
- Parameters:
  dataset (Dataset) – Dataset from which to cache samples.
  cache_files (dict of [tuple of ] (str or None)) – Filenames of the cache files for each part and for each component to be cached. The part ('train', …) is the key to the dict. For each part, a tuple of filenames should be provided, each of which can be None, meaning that this component should not be cached. If the dataset only provides one element per sample, the filename does not have to be packed inside a tuple. If a key is omitted, the part is not cached.
  As an example, for a CT dataset with cached FBPs instead of observations for parts 'train' and 'validation':
  {'train': ('cache_train_fbp.npy', None), 'validation': ('cache_validation_fbp.npy', None)}
  size (dict of int, optional) – Numbers of samples to cache for each dataset part. If a field is omitted or has value None, all samples are cached. Default: {}.
  flush_interval (int, optional) – Number of samples to retrieve before flushing to file (using memmap). This amount of samples should fit into the system's main memory (RAM). If -1, each file's content is only flushed once at the end.
- class dival.datasets.FBPDataset(dataset, ray_trafo, filter_type='Hann', frequency_scaling=1.0)[source]
Bases:
Dataset
Dataset computing filtered back-projections for a CT dataset on the fly.
Each sample is a pair of an FBP and a ground truth image.
- __init__(dataset, ray_trafo, filter_type='Hann', frequency_scaling=1.0)[source]
- Parameters:
dataset (Dataset) – CT dataset. FBPs are computed from the observations; the ground truth is taken directly from the dataset.
ray_trafo (odl.tomo.RayTransform) – Ray transform from which the FBP operator is constructed.
filter_type (str, optional) – Filter type accepted by odl.tomo.fbp_op(). Default: 'Hann'.
frequency_scaling (float, optional) – Relative cutoff frequency passed to odl.tomo.fbp_op(). Default: 1.0.
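A sketch combining FBPDataset with generate_cache_files() and CachedDataset to precompute the FBPs once and reuse them during training (the file names follow the example above; the FBPDataset spaces are assumed to be the pair of reconstruction spaces)::

    from dival.datasets import (CachedDataset, FBPDataset,
                                generate_cache_files, get_standard_dataset)

    dataset = get_standard_dataset('lodopab', impl='astra_cpu')
    fbp_dataset = FBPDataset(dataset, dataset.get_ray_trafo(impl='astra_cpu'))

    # cache the FBPs; the ground truth component (None) is not cached
    cache_files = {'train': ('cache_train_fbp.npy', None),
                   'validation': ('cache_validation_fbp.npy', None)}
    generate_cache_files(fbp_dataset, cache_files)
    cached_fbp_dataset = CachedDataset(fbp_dataset, fbp_dataset.space,
                                       cache_files)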
- generator(part='train')[source]
Yield data.
The default implementation calls get_sample() if the dataset implements it (i.e., supports random access).
- Parameters:
  part ({'train', 'validation', 'test'}, optional) – Whether to yield train, validation or test data. Default is 'train'.
- Yields:
  data (odl element or tuple of odl elements) – Sample of the dataset.
- get_sample(index, part='train', out=None)[source]
Get single sample by index.
- Parameters:
index (int) – Index of the sample.
part ({'train', 'validation', 'test'}, optional) – The data part. Default is 'train'.
out (array-like or tuple of (array-like or bool) or None) – Array(s) (or e.g. odl element(s)) to which the sample is written. A tuple should be passed if the dataset returns two or more arrays per sample (i.e. pairs, …). If a tuple element is a bool, it has the following meaning:
  True – create a new array and return it.
  False – do not return this array, i.e. None is returned.
- Returns:
  sample – E.g. for a pair dataset: (array, None) if out=(True, False).
- Return type:
  [tuple of ] (array-like or None)
- get_samples(key, part='train', out=None)[source]
Get samples by slice or range.
The default implementation calls get_sample() if the dataset implements it.
- Parameters:
  key (slice or range) – Indices of the samples.
  part ({'train', 'validation', 'test'}, optional) – The data part. Default is 'train'.
  out (array-like or tuple of (array-like or bool) or None) – Array(s) (or e.g. odl element(s)) to which the samples are written. The first dimension must match the number of samples requested. A tuple should be passed if the dataset returns two or more arrays per sample (i.e. pairs, …). If a tuple element is a bool, it has the following meaning:
    True – create a new array and return it.
    False – do not return this array, i.e. None is returned.
- Returns:
  samples – If the dataset has multiple arrays per sample, a tuple holding arrays is returned. E.g. for a pair dataset: (array, None) if out=(True, False). The samples are stacked in the first (additional) dimension of each array.
- Return type:
  [tuple of ] (array-like or None)
- class dival.datasets.ReorderedDataset(dataset, idx)[source]
Bases:
Dataset
Dataset that reorders the samples of another dataset by specified index arrays for each part.
- __init__(dataset, idx)[source]
- Parameters:
dataset (Dataset) – Dataset to take the samples from. Must support random access.
idx (dict of array-like) – Indices into the original dataset for each part. Each array-like must have (at least) the same length as the part.
- get_sample(index, part='train', out=None)[source]
Get single sample by index.
- Parameters:
index (int) – Index of the sample.
part ({'train', 'validation', 'test'}, optional) – The data part. Default is 'train'.
out (array-like or tuple of (array-like or bool) or None) – Array(s) (or e.g. odl element(s)) to which the sample is written. A tuple should be passed if the dataset returns two or more arrays per sample (i.e. pairs, …). If a tuple element is a bool, it has the following meaning:
  True – create a new array and return it.
  False – do not return this array, i.e. None is returned.
- Returns:
  sample – E.g. for a pair dataset: (array, None) if out=(True, False).
- Return type:
  [tuple of ] (array-like or None)