fab.datasources
## Data Sources
The data produced by each device is saved in the HDF files in groups, indexed by a key that identifies their purpose. For example, the data coming from the Gas Monitor Detector (GMD) of FLASH2 is stored under the key `/FL2/Photon Diagnostic/GMD/Pulse resolved energy/energy hall`.
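The key is a slash-separated path through the nested HDF5 hierarchy. Its components contain spaces, so it has to be passed verbatim. The minimal standard-library sketch below only splits the example key into its levels for illustration. fab does not require this step.

```python
from pathlib import PurePosixPath

# Example key from the text: each component is one level of the HDF5 hierarchy.
key = "/FL2/Photon Diagnostic/GMD/Pulse resolved energy/energy hall"

# parts[0] is the root "/"; the remaining parts are the nested names.
levels = PurePosixPath(key).parts[1:]

# Print the levels as an indented tree.
for depth, name in enumerate(levels):
    print("  " * depth + name)
```

With `h5py`, the same string can be used directly as a path into an open file, e.g. `f[key]`.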
To load data from the HDF stores, we can use the `fab.datasources.HDFSource` class.
To load the GMD data from DAQ run 43861 of beamtime 11013355, all we need to do is:
```python
from fab.magic import beamtime
from fab.datasources import HDFSource

gmd = HDFSource(name='GMD',
                hdf_key="/FL2/Photon Diagnostic/GMD/Pulse resolved energy/energy hall")
data = gmd.load(daq_run=43861)  # load DAQ run 43861
```
In this case, `data` will be an `xarray.DataArray` containing the raw GMD data, indexed by `train_id` (the macropulse ID).
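Indexing by `train_id` is what lets data from different devices be matched macropulse by macropulse. For `xarray` objects, this alignment is available through `xarray.align(..., join="inner")`. The NumPy sketch below spells out the same idea on plain arrays. The train IDs and values are made up for illustration, and the second source is hypothetical.

```python
import numpy as np

# Hypothetical train IDs and values from two sources in the same run;
# the second source has no entry for train 1002.
gmd_ids = np.array([1001, 1002, 1003, 1004])
gmd_vals = np.array([10.0, 11.0, 12.0, 13.0])
other_ids = np.array([1001, 1003, 1004])
other_vals = np.array([0.1, 0.3, 0.4])

# Keep only the trains present in both sources (sorted by train_id),
# together with the positions of those trains in each input array.
common, gmd_idx, other_idx = np.intersect1d(gmd_ids, other_ids, return_indices=True)
aligned_gmd = gmd_vals[gmd_idx]
aligned_other = other_vals[other_idx]
```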
See the `fab.datasources.HDFSource` documentation for more initialization options and parameters.
### Configuration

For most use cases, DataSources will be defined in a configuration file as part of an instrument, removing the need to instantiate them manually. See the `fab.settings` module documentation for more information on how to configure datasources from a configuration file.
### Preloading values

By default, datasources that only deal with small amounts of data load all their data directly into RAM, bypassing the lazy behaviour of `dask.array`. This reduces loading time and speeds up computations.
We can force a datasource to cache its data on disk for fast access by setting the `preload_values` argument to `True`. This is useful for small datasources that are used often in the analysis (e.g. indexes, set points, and slowly varying diagnostics), because it reduces loading time: the data is loaded from the cache instead of from the HDF5 files. It is a really bad idea for large sources (GBs to TBs), however, because in the best case it will cause a `MemoryError` when the data is loaded into RAM.
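A back-of-the-envelope estimate shows why the size of a source matters. The numbers below are hypothetical: a 10 Hz train rate, a one-hour run, and an 8000-sample float32 trace per train. They are chosen only to contrast a waveform-like source with a scalar, slowly varying one.

```python
# Hypothetical run parameters (assumed, for illustration only).
train_rate_hz = 10          # trains (macropulses) per second
run_duration_s = 60 * 60    # a one-hour run
n_trains = train_rate_hz * run_duration_s

# Waveform-like source: one float32 trace of 8000 samples per train.
trace_bytes = n_trains * 8000 * 4

# Slowly varying scalar source: one float64 value per train.
scalar_bytes = n_trains * 8

print(f"{n_trains} trains")
print(f"trace source:  {trace_bytes / 1e9:.2f} GB")
print(f"scalar source: {scalar_bytes / 1e6:.2f} MB")
```

Across many runs or multi-channel detectors, the waveform-like case quickly grows toward the GB to TB range. The scalar case stays small enough to keep in memory.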
The default behaviour should already be appropriate for most use cases. Only set `preload_values` to `True` if you know what you are doing. If you are unsure, contact an expert before using this setting.
### Specialized sources

Most of the time, the 'raw' data loaded from the HDF files with `fab.datasources.HDFSource` will be all we need. However, in some cases a bit of preprocessing is a nice addition.
For some common use cases, specialized loaders are implemented. In the case of the GMD data, we could use `fab.datasources.GMD` as a specialized source:
```python
from fab.magic import beamtime
from fab.datasources import GMD

gmd = GMD(name='GMD',
          data_key="/FL2/Photon Diagnostic/GMD/Pulse resolved energy/energy hall",
          calibration_key="/FL2/Photon Diagnostic/GMD/Average energy/energy hall")
data = gmd.load(daq_run=43861)
```
This source formats the data and indexes it by both `train_id` and `shot_id`. If a calibration key is provided, the data is also calibrated so that its absolute value represents the pulse energy in µJ. Calibration increases loading time. If a relative value is all you need (e.g. for normalization), you can pass an empty string `""` as the calibration key.
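The NumPy sketch below illustrates two points: what indexing by `train_id` and `shot_id` means, and why relative values are enough for normalization. Several things in it are assumptions made for illustration: the flat per-run layout, the values, and the treatment of calibration as a single overall scale factor. It does not describe how `fab.datasources.GMD` works internally.

```python
import numpy as np

# Hypothetical run: 3 trains (macropulses) with 4 shots (pulses) each,
# stored as one flat array of pulse-resolved values.
train_ids = np.array([1001, 1002, 1003])
n_shots = 4
raw_flat = np.arange(1.0, 13.0)

# Index by (train_id, shot_id): one row per train, one column per shot.
gmd = raw_flat.reshape(len(train_ids), n_shots)
shot_ids = np.arange(n_shots)

# A hypothetical detector signal recorded for the same pulses.
signal = 3.0 * gmd

# Normalize shot by shot with the relative (uncalibrated) GMD values.
normalized = signal / gmd

# If calibration were a single overall factor (assumed here: 0.5),
# normalizing by calibrated values would only rescale the result.
calibration_factor = 0.5
normalized_calibrated = signal / (calibration_factor * gmd)
```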
A similar specialized source is available for BAM (bunch arrival monitor) data under `fab.datasources.BAM`.
If the loaded configuration specifies a list of sources (see the `fab.settings` module documentation), you can also initialize a source by its name:
```python
from fab.magic import beamtime
from fab.datasources import DataSource

source = DataSource.from_name("name_of_source")
```
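Conceptually, initializing a source by name amounts to looking up its entry in the loaded configuration and building the matching class from it. The sketch below mimics that lookup with a plain dictionary. `CONFIG_SOURCES`, `SketchHDFSource` and `source_from_name` are hypothetical stand-ins and are not part of fab. The real configuration format is described in the `fab.settings` documentation.

```python
from dataclasses import dataclass

# Hypothetical stand-in for the parsed configuration: source name -> settings.
CONFIG_SOURCES = {
    "GMD": {
        "type": "HDFSource",
        "hdf_key": "/FL2/Photon Diagnostic/GMD/Pulse resolved energy/energy hall",
    },
}


@dataclass
class SketchHDFSource:
    """Hypothetical minimal source holding only a name and an HDF key."""
    name: str
    hdf_key: str


# Map the "type" field of a configuration entry to the class to build.
SOURCE_TYPES = {"HDFSource": SketchHDFSource}


def source_from_name(name):
    """Build a source from its configuration entry (illustrative only)."""
    try:
        settings = dict(CONFIG_SOURCES[name])
    except KeyError:
        raise KeyError(f"no source named {name!r} in the configuration") from None
    cls = SOURCE_TYPES[settings.pop("type")]
    return cls(name=name, **settings)


source = source_from_name("GMD")
```

Looking up an unknown name raises a `KeyError` that names the missing source. Copying the entry before popping `"type"` leaves the configuration dictionary unchanged.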
### General sources

A general `fab.datasources.DataSource` abstract base class is provided in case the data source API needs to be extended, for example to integrate data recorded outside of the DAQ system within the project. See the following section on Instruments for how to integrate many datasources together.
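The sketch below illustrates that kind of extension. It defines a stand-in abstract base class with a single abstract `load(daq_run)` method, which mirrors the `load(daq_run=...)` calls used in the examples. A subclass then serves values logged outside the DAQ, here parsed from CSV text and keyed by train ID. This page does not cover which methods `fab.datasources.DataSource` actually requires, so check its documentation before subclassing it. `SketchDataSource` and `LoggedValuesSource` are hypothetical names.

```python
import csv
import io
from abc import ABC, abstractmethod


class SketchDataSource(ABC):
    """Hypothetical stand-in for an abstract data source base class."""

    def __init__(self, name):
        self.name = name

    @abstractmethod
    def load(self, daq_run):
        """Return the data recorded during `daq_run`."""


class LoggedValuesSource(SketchDataSource):
    """Serves values logged outside the DAQ, read from CSV text (hypothetical)."""

    def __init__(self, name, csv_text):
        super().__init__(name)
        self._rows = list(csv.DictReader(io.StringIO(csv_text)))

    def load(self, daq_run):
        # Keep only the rows of the requested run, keyed by train_id
        # so they can be matched with data from DAQ sources.
        return {
            int(row["train_id"]): float(row["value"])
            for row in self._rows
            if int(row["daq_run"]) == daq_run
        }


log = "daq_run,train_id,value\n43861,1001,0.5\n43861,1002,0.7\n43862,2001,0.9\n"
src = LoggedValuesSource("temperature", log)
data = src.load(daq_run=43861)
```

Because `load` is abstract, the base class itself cannot be instantiated. Only subclasses that implement it can be used as sources.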
```python
from .basesources import DataSource
from .HDFSource import HDFSource

from .special import GMD, BAM, Timestamp
from .adc import SlicedADC
from .exosources import CSVSource, PnCCD
from .timepix import Timepix

#__all__ = [DataSource, HDFSource, Timestamp, GMD, BAM, SlicedADC, CSVSource, PnCCD]
```