fab.datasources

Data Sources

The data produced by each device is saved in the HDF files in groups, indexed by a key that identifies their purpose. For example, the data coming from the Gas Monitor Detector (GMD) of FLASH2 is stored under the key: /FL2/Photon Diagnostic/GMD/Pulse resolved energy/energy hall.

To load data from the HDF stores, we can use the fab.datasources.HDFSource class. To load the GMD data from DAQ run 43861 from beamtime number 11013355, all we need to do is:

from fab.magic import beamtime

from fab.datasources import HDFSource

gmd = HDFSource(name='GMD', hdf_key="/FL2/Photon Diagnostic/GMD/Pulse resolved energy/energy hall")
data = gmd.load(daq_run=43861)  # Load DAQ run 43861

In this case, data will be an xarray.DataArray containing the raw GMD data, indexed by train_id (macropulse id).

See the fab.datasources.HDFSource documentation for more initialization options and parameters.
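
To give a feel for working with an array indexed by train_id, the following sketch builds a small synthetic xarray.DataArray and applies a few label-based operations to it. It does not call fab: the values are made up, and the second dimension name ("pulse") is an assumption chosen for illustration, so the layout returned by HDFSource.load may differ.

```python
import numpy as np
import xarray as xr

# Synthetic stand-in for the array returned by gmd.load(); values are made up.
# The "pulse" dimension name is hypothetical, not taken from fab.
train_ids = np.array([1000, 1001, 1002, 1003])
values = np.arange(12.0).reshape(4, 3)  # 4 trains, 3 pulses per train
data = xr.DataArray(
    values,
    dims=("train_id", "pulse"),
    coords={"train_id": train_ids},
    name="GMD",
)

# Select a single train by its label (not by its position in the array).
one_train = data.sel(train_id=1001)

# Select a range of trains; label-based slices include both end points.
subset = data.sel(train_id=slice(1001, 1002))

# Average over the pulses of each train, keeping the train_id index.
per_train_mean = data.mean(dim="pulse")
```

Because the selection is label-based, the same code keeps working when trains are missing from a run, which would not be the case with positional indexing.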

Configuration

For most use cases, DataSources will be defined in a configuration file as part of an instrument, removing the need to instantiate them manually. See the fab.settings module documentation for more information on how to configure datasources from a configuration file.

Preloading values

By default, datasources that only deal with small amounts of data will load all data directly into RAM, bypassing the lazy behaviour of dask.array. This speeds up both loading and computations.

We can force a datasource to cache its data on disk for fast access by setting the preload_values argument to True. This is useful for small datasources that are used often in the analysis (e.g. indexes, set points, and slowly varying diagnostics), as it speeds up loading (the data is loaded from the cache instead of from the HDF5 files). It is, however, a really bad idea for large sources (GBs to TBs), as it will (in the best case) cause a MemoryError when the data is loaded into RAM.

The default behaviour should already be appropriate for most use cases. Only set this to True if you know what you are doing. If you are unsure, contact an expert before using this setting.
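
A quick back-of-the-envelope estimate can help judge whether a source is small enough to preload. The sketch below is plain Python and independent of fab; the helper name estimated_size_bytes and the two example shapes are hypothetical and only meant to show the orders of magnitude involved.

```python
from math import prod


def estimated_size_bytes(shape, itemsize=8):
    """Uncompressed in-memory size of an array with the given shape.

    itemsize is the number of bytes per element (8 for float64).
    """
    return prod(shape) * itemsize


# Hypothetical shapes, for illustration only.
# One float64 value per train, 100 000 trains:
slow_diagnostic = estimated_size_bytes((100_000,))
# One 1024 x 1024 uint16 image per train, 100 000 trains:
camera_images = estimated_size_bytes((100_000, 1024, 1024), itemsize=2)

print(f"slow diagnostic: {slow_diagnostic / 1e6:.1f} MB")
print(f"camera images:   {camera_images / 1e9:.1f} GB")
```

A source of the first kind fits comfortably in RAM, while one of the second kind is far beyond what a typical workstation can hold and should stay lazy.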

Specialized sources

Most of the time the 'raw' data found in the HDF files loaded with fab.datasources.HDFSource will be all we need. However, in some cases, a bit of preprocessing is a nice addition.

For some common use cases, specialized loaders are implemented. In the case of the GMD data, we could use 'fab.datasources.GMD' as a specialized source.

from fab.magic import beamtime
from fab.datasources import GMD

gmd = GMD(
    name='GMD',
    data_key="/FL2/Photon Diagnostic/GMD/Pulse resolved energy/energy hall",
    calibration_key="/FL2/Photon Diagnostic/GMD/Average energy/energy hall",
)
data = gmd.load(daq_run=43861)

This source formats the data and indexes it by both train_id and shot_id. Moreover, if a calibration key is provided, the data is calibrated so that its absolute value represents the pulse energy in uJ. Calibration increases loading time; if a relative value is all you need (e.g. for normalization), you can pass "" as the calibration key. A similar specialized source is available for BAM data under fab.datasources.BAM.
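
As an illustration of why relative values are enough for normalization, the sketch below divides a shot-resolved signal by shot-resolved GMD values. It uses synthetic xarray data rather than fab; the array layout (two dimensions named train_id and shot_id) is an assumption about how such data could be organized, and all numbers are made up.

```python
import xarray as xr

# Synthetic data on a shared (train_id, shot_id) grid; values are made up.
coords = {"train_id": [1000, 1001], "shot_id": [0, 1, 2]}
dims = ("train_id", "shot_id")

# Uncalibrated (relative) pulse energies.
gmd = xr.DataArray([[1.0, 2.0, 4.0], [2.0, 2.0, 1.0]], dims=dims, coords=coords)

# Some detector signal that scales with the pulse energy.
signal = xr.DataArray([[10.0, 20.0, 40.0], [30.0, 30.0, 15.0]], dims=dims, coords=coords)

# xarray aligns both arrays on train_id and shot_id before dividing.
normalized = signal / gmd
```

A constant calibration factor applied to gmd would only rescale normalized by the same constant, so shot-to-shot comparisons are unaffected.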

If the loaded configuration specifies a list of sources (see the settings module documentation), you can also initialize a source by its name:

from fab.magic import beamtime
from fab.datasources import DataSource

source = DataSource.from_name("name_of_source")

General sources

A general fab.datasources.DataSource abstract base class is provided in case the data source API needs to be extended, for example to integrate data that was recorded outside of the DAQ system into the project. See the following section on Instruments to see how to integrate many datasources together.
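
The general pattern of extending an abstract base class can be sketched as follows. This is not fab's interface: SketchSource is a stand-in defined here so that the example runs on its own, and the class names, the load signature and the CSV layout are all hypothetical. The abstract methods that fab.datasources.DataSource actually requires should be taken from its documentation.

```python
import csv
import io
from abc import ABC, abstractmethod


class SketchSource(ABC):
    """Stand-in base class for this sketch; NOT fab.datasources.DataSource."""

    def __init__(self, name):
        self.name = name

    @abstractmethod
    def load(self, daq_run):
        """Return the data recorded for the given DAQ run."""


class LogbookSource(SketchSource):
    """Hypothetical source reading (daq_run, value) rows from CSV text."""

    def __init__(self, name, csv_text):
        super().__init__(name)
        self._csv_text = csv_text

    def load(self, daq_run):
        # Keep only the rows that belong to the requested run.
        rows = csv.DictReader(io.StringIO(self._csv_text))
        return [float(row["value"]) for row in rows if int(row["daq_run"]) == daq_run]


# Data recorded outside the DAQ, e.g. exported from an electronic logbook.
logbook = "daq_run,value\n43861,1.5\n43861,2.5\n43862,9.0\n"
source = LogbookSource("delay_stage", logbook)
values = source.load(daq_run=43861)
```

Keeping the same load(daq_run=...) call shape as the built-in sources means externally recorded data can be requested in the same way as DAQ data.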

from .basesources import DataSource
from .HDFSource import HDFSource

from .special import GMD, BAM, Timestamp
from .adc import SlicedADC
from .exosources import CSVSource, PnCCD
from .timepix import Timepix

#__all__ = [DataSource, HDFSource, Timestamp, GMD, BAM, SlicedADC, CSVSource, PnCCD]