dival.datasets.lodopab_dataset module
Provides LoDoPaBDataset.
Provides simple access to the LoDoPaB-CT dataset documented in a Data Descriptor article.
- class dival.datasets.lodopab_dataset.LoDoPaBDataset(min_pt=None, max_pt=None, observation_model='post-log', min_photon_count=None, sorted_by_patient=False, impl='astra_cuda')[source]
Bases: Dataset
The LoDoPaB-CT dataset, which is documented in the Data Descriptor article https://www.nature.com/articles/s41597-021-00893-z and hosted on https://zenodo.org/record/3384092. It is a simulated low dose CT dataset based on real reconstructions from the LIDC-IDRI dataset.
The dataset contains 42895 pairs of images and projection data. For simulation, a ray transform with parallel beam geometry using 1000 angles and 513 detector pixels is used. Poisson noise corresponding to 4096 incident photons per pixel before attenuation is applied to the projection data. The images have a size of 362x362 px.
An ODL ray transform that corresponds to the noiseless forward operator can be obtained via the get_ray_trafo method of this dataset. Additionally, the ray_trafo attribute holds a ray transform instance, which is created during __init__().
Note: By default, the 'astra_cuda' implementation backend is used, which requires both astra and a CUDA-enabled GPU to be available. You can choose a different backend by passing impl='skimage' or impl='astra_cpu'.
Further functionalities:
- converting the stored post-log observations to pre-log observations on the fly (cf. observation_model parameter of __init__())
- sorting by patient ids (cf. sorted_by_patient parameter of __init__())
- changing the zero photon count replacement value of 0.1 used for pre-log observations (cf. min_photon_count parameter of __init__())
- space
(space[0], space[1]), where
space[0]: odl.uniform_discr([0., -0.1838], [3.1416, 0.1838], (1000, 513), dtype='float32')
space[1]: odl.uniform_discr(min_pt, max_pt, (362, 362), dtype='float32'), with min_pt and max_pt parameters passed to __init__()
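The numbers in space[0] can be related to the image domain with a little arithmetic. The sketch below uses the default min_pt and max_pt of ±0.13 from the __init__ parameters; reading 0.1838 as the half-diagonal of the image domain is an inference from the numbers, not a statement made on this page.

```python
import math

# Default image domain from the __init__ defaults: [-0.13, 0.13]^2, 362 x 362 px.
min_pt, max_pt = (-0.13, -0.13), (0.13, 0.13)
image_shape = (362, 362)

# Observation grid from space[0]: 1000 angles, 513 detector pixels.
num_angles, num_det_pixels = 1000, 513

pixel_size = (max_pt[0] - min_pt[0]) / image_shape[0]
# Assumption: the detector half-width equals the image half-diagonal.
half_diagonal = math.hypot(max_pt[0], max_pt[1])
angle_step = math.pi / num_angles  # angular interval [0, 3.1416] split into 1000 cells
det_pixel_size = 2 * half_diagonal / num_det_pixels

print(f"image pixel size:    {pixel_size:.6f}")
print(f"detector half-width: {half_diagonal:.4f}")
print(f"angular step:        {angle_step:.6f} rad")
print(f"detector pixel size: {det_pixel_size:.6f}")
```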
- shape
(362, 362)
- train_len
35820
- validation_len
3522
- test_len
3553
- random_access
True
- num_elements_per_sample
2
- ray_trafo
Ray transform corresponding to the noiseless forward operator.
- Type:
odl.tomo.RayTransform
- sorted_by_patient
Whether the samples are sorted by patient id. Default: False.
- Type:
bool
- rel_patient_ids
Relative patient ids of the samples in the original non-sorted order for each part, as returned by LoDoPaBDataset.get_patient_ids(). None, if the csv files are not found.
- Type:
(dict of array) or None
- __init__(min_pt=None, max_pt=None, observation_model='post-log', min_photon_count=None, sorted_by_patient=False, impl='astra_cuda')[source]
- Parameters:
min_pt ([float, float], optional) – Minimum values of the lp space. Default: [-0.13, -0.13].
max_pt ([float, float], optional) – Maximum values of the lp space. Default: [0.13, 0.13].
observation_model ({'post-log', 'pre-log'}, optional) – The observation model to use. The default is 'post-log'.
'post-log'
Observations are linearly related to the normalized ground truth via the ray transform, obs = ray_trafo(gt) + noise. Note that the scaling of the observations matches the normalized ground truth, i.e., they are divided by the linear attenuation of 3071 HU.
'pre-log'
Observations are non-linearly related to the ground truth, as given by the Beer-Lambert law. The model is obs = exp(-ray_trafo(gt * MU(3071 HU))) + noise, where MU(3071 HU) is the factor by which the ground truth was normalized.
min_photon_count (float, optional) – Replacement value for a simulated photon count of zero. If observation_model == 'post-log', a value greater than zero is required in order to avoid undefined values. The default is 0.1, both for the 'post-log' and the 'pre-log' model.
sorted_by_patient (bool, optional) – Whether to sort the samples by patient id. Useful to resplit the dataset. See also get_indices_for_patient(). Note that the slices of each patient are ordered randomly with respect to the z-location in any case. Default: False.
impl ({'skimage', 'astra_cpu', 'astra_cuda'}, optional) – Implementation passed to odl.tomo.RayTransform to construct ray_trafo.
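The relation between the two observation models, and the role of min_photon_count, can be sketched for a single detector value. The sketch assumes the standard Hounsfield definition, HU = 1000 * (mu - mu_water) / (mu_water - mu_air), with mu_water = 20/m and mu_air = 0.02/m. These two constants and all helper names are assumptions for illustration and are not stated on this page. The 4096 incident photons per pixel come from the dataset description.

```python
import math

# Assumed attenuation coefficients (1/m); not stated in this documentation.
MU_WATER = 20.0
MU_AIR = 0.02
PHOTONS_PER_PIXEL = 4096  # from the dataset description


def mu_from_hu(hu):
    """Linear attenuation for a Hounsfield value (standard HU definition)."""
    return hu * (MU_WATER - MU_AIR) / 1000.0 + MU_WATER


MU_MAX = mu_from_hu(3071)  # plays the role of the factor MU(3071 HU)


def post_log_to_pre_log(post_log_value):
    """Map a normalized post-log value to the pre-log (Beer-Lambert) value."""
    return math.exp(-post_log_value * MU_MAX)


def photon_count_to_post_log(photon_count, min_photon_count=0.1):
    """Normalized post-log value; counts below min_photon_count are replaced."""
    count = max(photon_count, min_photon_count)
    return -math.log(count / PHOTONS_PER_PIXEL) / MU_MAX


# A zero count would give log(0); the replacement value keeps the result finite.
worst_case = photon_count_to_post_log(0)
print(MU_MAX, worst_case, post_log_to_pre_log(worst_case))
```

Under these assumptions, the largest possible post-log value corresponds to a pre-log value of min_photon_count / 4096.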
- generator(part='train')[source]
Yield pairs of low dose observations and (virtual) ground truth.
- Parameters:
part ({'train', 'validation', 'test'}, optional) – The data part. Default is 'train'.
- Yields:
(observation, ground_truth) –
- observation: odl element with shape (1000, 513)
The values depend on the observation_model and min_photon_count parameters that were passed to __init__().
- ground_truth: odl element with shape (362, 362)
The values lie in the range [0., 1.].
- get_ray_trafo(**kwargs)[source]
Return the ray transform that is a noiseless version of the forward operator.
- Parameters:
impl ({'skimage', 'astra_cpu', 'astra_cuda'}, optional) – The backend implementation passed to odl.tomo.RayTransform.
- Returns:
ray_trafo – The ray transform that corresponds to the noiseless map from 362 x 362 images to the -log of their projections (sinograms).
- Return type:
odl operator
- get_sample(index, part='train', out=None)[source]
Get a single sample of the dataset. Returns a pair of a low dose observation and its (virtual) ground truth, either of which can be left out via the out parameter.
- Parameters:
index (int) – The index into the dataset part.
part ({'train', 'validation', 'test'}, optional) – The data part. Default is 'train'.
out (tuple of array-likes or bools, optional) – out == (out_observation, out_ground_truth)
- out_observation: array-like or bool
Shape (1000, 513). If an odl element or array is passed, the observation is written to it. If True, a new odl element holding the observation is created (the default). If False, no observation is returned.
- out_ground_truth: array-like or bool
Shape (362, 362). If an odl element or array is passed, the ground truth is written to it. If True, a new odl element holding the ground truth is created (the default). If False, no ground truth is returned.
- Returns:
- observation: odl element or np.ndarray or None
Depending on the value of out_observation (see parameter out), a newly created odl element, out_observation or None is returned. The observation values depend on the observation_model and min_photon_count parameters that were given to the constructor.
- ground_truth: odl element or np.ndarray or None
Depending on the value of out_ground_truth (see parameter out), a newly created odl element, out_ground_truth or None is returned. The values lie in the range [0., 1.].
- Return type:
(observation, ground_truth)
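The out argument selects, per entry, whether data is loaded into a new container, written into an existing buffer, or skipped. The sketch below only mimics these documented True / False / buffer semantics for one entry, using plain Python lists. resolve_out and load_observation are hypothetical helpers and not dival's implementation.

```python
def resolve_out(out_slot, load):
    """Mimic the documented semantics of one entry of ``out``.

    True   -> create and return a new container
    False  -> skip loading and return None
    buffer -> write into the given buffer and return it
    """
    if out_slot is False:
        return None
    data = load()
    if out_slot is True:
        return list(data)
    out_slot[:] = data  # write in place into the caller's buffer
    return out_slot


def load_observation():
    return [0.1, 0.2, 0.3]  # stand-in for reading a sinogram from disk


buffer = [0.0, 0.0, 0.0]
skipped = resolve_out(False, load_observation)   # nothing is loaded
created = resolve_out(True, load_observation)    # a new list is created
reused = resolve_out(buffer, load_observation)   # the caller's buffer is filled
print(skipped, created, reused is buffer)
```

Following the parameter description, a call such as get_sample(0, out=(True, False)) would be expected to return the observation together with None in place of the ground truth.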
- get_samples(key, part='train', out=None)[source]
Get a slice of the dataset. Returns a pair of low dose observation data and the corresponding (virtual) ground truth data, either of which can be left out via the out parameter.
- Parameters:
key (slice or range) – The indices into the dataset part.
part ({'train', 'validation', 'test'}, optional) – The data part. Default is 'train'.
out (tuple of arrays or bools, optional) – out == (out_observation, out_ground_truth)
- out_observation: np.ndarray or bool
If an array is passed, the observation data is written to it. If True, a new array holding the observation data is created (the default). If False, no observation data is returned.
- out_ground_truth: np.ndarray or bool
If an array is passed, the ground truth data is written to it. If True, a new array holding the ground truth data is created (the default). If False, no ground truth data is returned.
- Returns:
- observation: np.ndarray or None
Shape (samples, 1000, 513). Depending on the value of out_observation (see parameter out), a newly created array, out_observation or None is returned. The observation values depend on the observation_model and min_photon_count parameters that were given to the constructor.
- ground_truth: np.ndarray or None
Shape (samples, 362, 362). Depending on the value of out_ground_truth (see parameter out), a newly created array, out_ground_truth or None is returned. The values lie in the range [0., 1.].
- Return type:
(observation, ground_truth)
- get_indices_for_patient(rel_patient_id, part='train')[source]
Return the indices of the samples from one patient. If self.sorted_by_patient is True, the indices will be consecutive.
- Parameters:
rel_patient_id (int) – Patient id, relative to the part.
part ({'train', 'validation', 'test'}, optional) – The data part in which to look up the patient. Default is 'train'.
- Returns:
indices – The indices of the samples from the patient.
- Return type:
array
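Conceptually, the lookup selects every sample position whose relative patient id equals the requested one. The following is a minimal pure-Python sketch with made-up ids; indices_for_patient is a hypothetical helper, not the library method.

```python
def indices_for_patient(rel_patient_ids, rel_patient_id):
    """Positions of all samples that belong to one patient."""
    return [i for i, pid in enumerate(rel_patient_ids) if pid == rel_patient_id]


# Made-up relative patient ids for six samples in stored (unsorted) order.
ids_unsorted = [2, 0, 1, 0, 2, 1]
print(indices_for_patient(ids_unsorted, 0))  # [1, 3]

# After sorting by patient, the matching indices are consecutive.
ids_sorted = sorted(ids_unsorted)
print(indices_for_patient(ids_sorted, 0))    # [0, 1]
```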
- static check_for_lodopab()[source]
Fast check whether first and last file of each dataset part exist under the configured data path.
- Returns:
exists – Whether LoDoPaB seems to exist.
- Return type:
bool
- static get_num_patients(part='train')[source]
Return the number of patients in a dataset part.
- Parameters:
part ({'train', 'validation', 'test'}, optional) – Whether to return the number of train, validation or test patients. Default is 'train'.
- static get_patient_ids(relative=True)[source]
Return the (relative) patient id for all samples of all dataset parts.
- Parameters:
relative (bool, optional) – Whether to use ids relative to the dataset part. The csv files store absolute indices, where “train_ids < validation_ids < test_ids”. If False, these absolute indices are returned. If True, the smallest absolute id of the part is subtracted, giving zero-based (relative) patient ids. Default: True.
- Returns:
ids – For each part: an array with the (relative) patient ids for all samples (length: number of samples in the corresponding part).
- Return type:
dict of array
- Raises:
OSError – An OSError is raised if one of the csv files containing the patient ids is missing in the configured data path.
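The conversion from absolute to relative ids described for relative=True can be sketched as follows. The ids are made up and to_relative_ids is a hypothetical helper; the sketch does not read the csv files.

```python
def to_relative_ids(absolute_ids_per_part):
    """Subtract each part's smallest absolute id to get zero-based ids."""
    relative = {}
    for part, ids in absolute_ids_per_part.items():
        offset = min(ids)
        relative[part] = [pid - offset for pid in ids]
    return relative


# Made-up absolute ids with train_ids < validation_ids < test_ids.
absolute = {
    'train': [1, 0, 1, 2],
    'validation': [3, 4, 3],
    'test': [6, 5],
}
relative = to_relative_ids(absolute)
print(relative)
```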
- static get_idx_sorted_by_patient(ids=None)[source]
Return indices that allow access to each dataset part in patient id order.
Note: in most cases this method should not be called directly. Rather, pass sorted_by_patient=True to the constructor if applicable. A plausible use case of this method, however, is to access existing cache files that were created with sorted_by_patient=False. In this case, the dataset should be constructed with sorted_by_patient=False, wrapped by a CachedDataset and then reordered with ReorderedDataset using the indices returned by this method.
- Parameters:
ids (dict of array-like, optional) – Patient ids as returned by get_patient_ids(). It is not relevant to this function whether they are relative.
- Returns:
idx – Indices that allow access to each dataset part in patient id order. Each array value is an index into the samples in original order (as stored in the HDF5 files). That is, by iterating the samples with index idx[part][i] for i = 0, 1, 2, ... one first obtains all samples from one patient, then continues with the samples of the second patient, and so on.
- Return type:
dict of array
- Raises:
OSError – An OSError is raised if ids is None and one of the csv files containing the patient ids is missing in the configured data path.
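The returned indices behave like an argsort of the patient ids for each part. The sketch below illustrates this with made-up ids and a stable sort; idx_sorted_by_patient is a hypothetical helper, and whether the library keeps the stored order within one patient is an assumption.

```python
def idx_sorted_by_patient(ids_per_part):
    """Stable argsort of the patient ids for each part."""
    return {
        part: sorted(range(len(ids)), key=lambda i: ids[i])
        for part, ids in ids_per_part.items()
    }


# Made-up patient ids for five samples in stored order.
ids = {'train': [2, 0, 1, 0, 2]}
idx = idx_sorted_by_patient(ids)
print(idx['train'])  # [1, 3, 2, 0, 4]

# Accessing the samples through idx visits one patient after the other.
reordered_ids = [ids['train'][i] for i in idx['train']]
print(reordered_ids)  # [0, 0, 1, 2, 2]
```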