ceda_di package¶
Submodules¶
ceda_di.envi_geo module¶
Interface for reading data from ENVI BSQ/BIL packed binary files. Also contains methods for extracting metadata (geospatial/temporal).
- class ceda_di.envi_geo.BIL(header_path, path=None, unpack_fmt='<d')¶
Bases: ceda_di.envi_geo.ENVI
Sub-class of ENVI that uses the envi_io.BilFile class to read binary data from BIL files.
- read()¶
Return a dict containing a summary of the file’s data.
Returns: A dict containing a summary of the file’s data.
- class ceda_di.envi_geo.BSQ(header_path, path=None, unpack_fmt='<d')¶
Bases: ceda_di.envi_geo.ENVI
Sub-class of ENVI that uses the envi_io.BsqFile class to read binary data from BSQ files.
- read()¶
Return a dict containing a summary of the file’s data.
Returns: A dict containing a summary of the file’s data.
- class ceda_di.envi_geo.ENVI(header_path, path=None, unpack_fmt='<d')¶
Bases: ceda_di._dataset._geospatial
- get_data_format()¶
Return file format information
Returns: A dict containing file format information
- get_geospatial()¶
Read geospatial data parsed from binary file
Returns: A dict containing geospatial information
- get_parameters()¶
Return a list of Parameter objects containing parameter information.
Returns: A list of Parameter objects containing parameter information.
- get_properties()¶
Return a metadata.product.Properties object describing the file’s metadata.
Returns: A metadata.product.Properties object describing the file.
- get_temporal()¶
Return a dictionary containing the start and end times of the data file.
Returns: A dict containing temporal data.
ceda_di.envi_io module¶
Interface for ENVI BIL and BSQ files (reading packed binary data). Used by envi_geo in the ceda_di module.
Taken and adapted from: arsf-dan.nerc.ac.uk/trac/attachment/ticket/287/data_handler.py
Original author: Ben Taylor (benj)
- class ceda_di.envi_io.BilFile(header_path, path=None, unpack_fmt='<d')¶
Bases: ceda_di.envi_io.EnviFile
Child class of EnviFile. Provides correct wrappers of read() methods for reading BIL files in order.
- read()¶
Read BIL file (reading bytes in correct order)
Returns: A multi-dimensional list containing unpacked BIL data.
- class ceda_di.envi_io.BsqFile(header_path, path=None, unpack_fmt='<d')¶
Bases: ceda_di.envi_io.EnviFile
Child class of EnviFile. Provides correct wrappers of read() methods for reading BSQ files in order.
- read()¶
Read BSQ file (reading bytes in correct order)
Returns: A multi-dimensional list containing unpacked BSQ data.
- class ceda_di.envi_io.EnviFile(header_path, path=None, unpack_fmt='<d')¶
Bases: object
Superclass for BilFile and BsqFile. Contains generic read() method that subclasses use to unpack binary data in the correct order.
- calc_from_xy()¶
Calculate the number of pixels per line based on file size.
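As a rough illustration of this kind of calculation (not the package's own code), the number of pixels per line follows from the file size once the number of lines, the number of bands and the byte width of each value are known. The function name and its arguments are hypothetical.

```python
import struct


def pixels_per_line(file_size, lines, bands, unpack_fmt="<d"):
    """Derive pixels per line from the size of the binary file in bytes."""
    bytes_per_value = struct.calcsize(unpack_fmt)  # 8 for '<d'
    bytes_per_pixel_column = lines * bands * bytes_per_value
    if file_size % bytes_per_pixel_column:
        raise ValueError("File size does not match the given lines/bands")
    return file_size // bytes_per_pixel_column


# 10 lines, 3 bands, 4 little-endian doubles per line = 960 bytes
print(pixels_per_line(960, lines=10, bands=3))  # prints 4
```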
- check_valid_fmt_string()¶
Check the format string for validity.
Returns: Number of bytes needed for type in the format string
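A minimal sketch of such a check, assuming the format string follows the conventions of Python's struct module (as the default '<d' suggests). The helper name is hypothetical; struct.calcsize both validates the string and reports its byte width.

```python
import struct


def fmt_size(unpack_fmt):
    """Return the byte width of a struct format, or raise ValueError."""
    try:
        return struct.calcsize(unpack_fmt)
    except struct.error as exc:
        raise ValueError("Invalid format string: %r" % unpack_fmt) from exc


print(fmt_size("<d"))  # little-endian double, prints 8
print(fmt_size("<h"))  # little-endian short, prints 2
```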
- get_path(path, ext)¶
Given the path of an ENVI header file, try to guess the path of the ENVI binary and return it.
Parameters: - path – Path to the ENVI header file
- ext – Extension of the binary data file to read
Returns: The path of the ENVI binary data file
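One plausible way to make that guess, shown only as a sketch: assume the header and the binary share a base name and differ only in extension. The function name and the example path are hypothetical, and the real method may try other naming schemes.

```python
import os


def guess_binary_path(header_path, ext):
    """Swap the header's extension for the binary file's extension."""
    root, _ = os.path.splitext(header_path)
    return root + "." + ext.lstrip(".")


print(guess_binary_path("/data/flight/scene.hdr", "bil"))
# prints /data/flight/scene.bil
```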
- process_hdr()¶
Parse the provided header file.
Returns: Header file parsed into key/value pairs
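ENVI header files are plain text made up of "key = value" entries after a leading "ENVI" marker line. The sketch below parses that simple case into a dict; it is an illustration rather than the package's parser, and it does not handle values that span several lines inside braces.

```python
def parse_envi_header(text):
    """Parse single-line 'key = value' entries of an ENVI header."""
    header = {}
    for line in text.splitlines():
        if "=" not in line:
            continue  # skips the leading 'ENVI' marker and blank lines
        key, value = line.split("=", 1)
        header[key.strip().lower()] = value.strip().strip("{}").strip()
    return header


sample = """ENVI
samples = 4
lines = 10
bands = 3
interleave = bil
byte order = 0
"""
print(parse_envi_header(sample)["interleave"])  # prints bil
```

All values come back as strings, so numeric fields such as samples, lines and bands need converting before use.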
- read(x_size, y_size, z_size)¶
Read an ENVI binary file incrementally, returning arrays containing binary data.
Parameters: - x_size – Number of bands (BIL) || Number of lines (BSQ)
- y_size – Number of lines (BIL) || Number of bands (BSQ)
- z_size – Pixels per line
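The difference between the two layouts can be sketched with the struct module. BIL stores, for each line, one row of pixels per band; BSQ stores, for each band, one row of pixels per line. A single reader that walks an outer and an inner dimension therefore serves both, with the roles of lines and bands swapped. The names below (read_blocks, outer, inner) are hypothetical and do not mirror the x_size/y_size arguments exactly.

```python
import io
import struct


def read_blocks(stream, outer, inner, pixels, unpack_fmt="<d"):
    """Read outer*inner rows of `pixels` values each, in file order.

    Assumes unpack_fmt is a byte-order character followed by a type code.
    """
    row_fmt = unpack_fmt[0] + str(pixels) + unpack_fmt[1:]  # e.g. '<3d'
    row_bytes = struct.calcsize(row_fmt)
    return [
        [list(struct.unpack(row_fmt, stream.read(row_bytes)))
         for _ in range(inner)]
        for _ in range(outer)
    ]


def value(line, band, pixel):
    """Encode the coordinates in the value so the ordering is visible."""
    return float(line * 100 + band * 10 + pixel)


lines, bands, pixels = 2, 2, 3
bil_values = [value(l, b, p) for l in range(lines)
              for b in range(bands) for p in range(pixels)]
bsq_values = [value(l, b, p) for b in range(bands)
              for l in range(lines) for p in range(pixels)]
bil_bytes = struct.pack("<%dd" % len(bil_values), *bil_values)
bsq_bytes = struct.pack("<%dd" % len(bsq_values), *bsq_values)

bil = read_blocks(io.BytesIO(bil_bytes), lines, bands, pixels)  # [line][band]
bsq = read_blocks(io.BytesIO(bsq_bytes), bands, lines, pixels)  # [band][line]
print(bil[1][0][2], bsq[0][1][2])  # same sample: line 1, band 0, pixel 2
```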
ceda_di.exif_geo module¶
Module containing classes to read and export XML metadata embedded in GeoTIFF files using the EXIF standard.
- class ceda_di.exif_geo.EXIF(fname)¶
Bases: ceda_di._dataset._geospatial
Class that handles extraction and export of EXIF metadata from GeoTIFF image files.
- get_geospatial()¶
Return a dictionary containing geospatial extent metadata.
Returns: Dictionary containing geospatial extent metadata.
- get_properties()¶
Return a ceda_di.metadata.product.Properties object populated with the file’s metadata.
Returns: A ceda_di.metadata.product.Properties object
- get_temporal()¶
Return a dictionary containing temporal extent metadata
Returns: Dictionary containing temporal extent metadata
ceda_di.extract module¶
‘Extract’ module - handles file crawling and metadata extraction.
- class ceda_di.extract.Extract(conf)¶
Bases: object
File crawler and metadata extractor class. Part of core functionality of ceda_di.
- conf(conf_opt)¶
Return configuration option or raise exception if it doesn’t exist.
Parameters: conf_opt (str) – The name of the configuration option to find.
- index_properties(filename, handler)¶
Index the file in Elasticsearch.
- make_dirs()¶
Create directories for output files.
- prepare_logging()¶
Initial logging setup
- process_file(filename)¶
Instantiate a handler for a file and extract metadata.
- run()¶
Run main metadata extraction suite.
- write_properties(fname, _geospatial_obj)¶
Write module properties to an output file.
ceda_di.hdf4_geo module¶
Interface to extract and generate JSON from HDF4 EUFAR metadata
- class ceda_di.hdf4_geo.HDF4(fname)¶
Bases: ceda_di._dataset._geospatial
HDF4 context manager class.
- get_geospatial()¶
Search through HDF4 file, returning a list of coordinates from the ‘Navigation’ vgroup (if it exists).
Returns: Dict containing geospatial information.
- get_properties()¶
Return a ceda_di.metadata.product.Properties object containing geospatial and temporal metadata from the file.
Returns: A metadata.product.Properties object
- get_temporal()¶
Search through HDF4 file, returning timestamps from the ‘Mission’ vgroup (if it exists)
Returns: List containing temporal metadata
- hdf = None¶
- v = None¶
- vs = None¶
ceda_di.index module¶
- class ceda_di.index.BulkIndexer(config, threshold=1000)¶
Bases: object
Context manager for indexing into an ES installation by pooling documents and submitting in large bulk requests when the document count reaches a certain threshold.
- add_to_index_pool(document, mapping=None)¶
Add document to the correct pool, dependent on mapping type.
Parameters: - document (object) – The JSON-serialisable object to index.
- mapping (str) – The mapping to index the document into.
- index_directory(path, mapping=None)¶
Indexes all files in a given directory.
Parameters: - path (str) – The path to the directory containing the data files.
- mapping (str) – The mapping type (doc type) for the document to be indexed as.
- submit_pool(mapping=None)¶
Submit current document grouping (grouped by mapping) to the appropriate mapping in the Elasticsearch index.
Parameters: mapping (str) – The mapping to submit to the index.
- submit_pools()¶
Submit all current document pools to the Elasticsearch index.
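The pooling pattern described for this class can be sketched without an Elasticsearch connection. In the illustration below a callback stands in for the bulk request, and the class and method names are hypothetical. Flushing the remaining documents when the context manager exits is an assumption about sensible behaviour, not a statement about BulkIndexer itself.

```python
from collections import defaultdict


class PooledSubmitter:
    """Collect documents per mapping and flush each pool at a threshold."""

    def __init__(self, submit, threshold=1000):
        self.submit = submit          # callable(mapping, documents)
        self.threshold = threshold
        self.pools = defaultdict(list)

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.flush_all()              # submit whatever is left over
        return False                  # never swallow exceptions

    def add(self, document, mapping=None):
        pool = self.pools[mapping]
        pool.append(document)
        if len(pool) >= self.threshold:
            self.flush(mapping)

    def flush(self, mapping=None):
        documents = self.pools.pop(mapping, [])
        if documents:
            self.submit(mapping, documents)

    def flush_all(self):
        for mapping in list(self.pools):
            self.flush(mapping)


batches = []
with PooledSubmitter(lambda m, docs: batches.append((m, len(docs))),
                     threshold=2) as pool:
    for i in range(5):
        pool.add({"id": i}, mapping="geo")
print(batches)  # prints [('geo', 2), ('geo', 2), ('geo', 1)]
```

Grouping by mapping keeps each bulk request homogeneous, and the threshold bounds how many documents are held in memory at once.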
- ceda_di.index.create_index(config, elasticsearch)¶
Set up an index in Elasticsearch, given a configuration file path.
Parameters: - config (dict) – Application configuration dictionary, including ES config.
- index_settings_path (str) – Path to index settings JSON document.
ceda_di.jascis module¶
ceda_di.netcdf_geo module¶
Metadata adapters for NetCDF files.
- class ceda_di.netcdf_geo.NetCDFFactory(fpath)¶
Bases: object
Factory for checking, handling and returning an appropriate metadata extraction class.
Parameters: fpath (str) – Path to NetCDF file
- get_properties()¶
Return correct metadata extraction class based on metadata format.
- class ceda_di.netcdf_geo.NetCDF_Base¶
Bases: ceda_di._dataset._geospatial
Base class - provides common NetCDF metadata extraction methods
Parameters: fpath (str) – Path to NetCDF file
- static clean_coordinate(coord)¶
Return True if coordinate is valid.
- static find_var_by_regex(ncdf, regex)¶
Find a variable reference searching by regular expression.
Parameters: - ncdf (Dataset) – Reference to an opened netCDF4.Dataset object
- regex (re) – Regular expression to match with variable name
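A minimal sketch of a regex lookup over variable names. netCDF4.Dataset exposes its variables as a mapping from name to variable, so a plain dict stands in for it here; the function name and the variable names are hypothetical, and returning the first match (or None) is an assumption.

```python
import re


def find_by_regex(variables, regex):
    """Return the first variable whose name matches, or None."""
    pattern = re.compile(regex)
    for name, variable in variables.items():
        if pattern.search(name):
            return variable
    return None


# A plain dict stands in for netCDF4.Dataset.variables
variables = {"TIME": "time-var", "LAT_GIN": "lat-var", "LON_GIN": "lon-var"}
print(find_by_regex(variables, r"^LAT"))  # prints lat-var
```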
- static find_var_by_standard_name(ncdf, standard_name)¶
Find a variable reference searching by CF standard name.
Parameters: - ncdf (Dataset) – Reference to an opened netCDF4.Dataset object
- standard_name (str) – The CF standard name to search for
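A lookup by CF standard name can be sketched the same way: scan the variables and compare each one's standard_name attribute, tolerating variables that lack it. SimpleNamespace objects stand in for netCDF4 variables, and the function name is hypothetical.

```python
from types import SimpleNamespace


def find_by_standard_name(variables, standard_name):
    """Return the first variable with a matching standard_name, or None."""
    for variable in variables.values():
        if getattr(variable, "standard_name", None) == standard_name:
            return variable
    return None


variables = {
    "lat": SimpleNamespace(standard_name="latitude", units="degrees_north"),
    "t": SimpleNamespace(standard_name="time"),
    "flag": SimpleNamespace(),  # no standard_name attribute at all
}
print(find_by_standard_name(variables, "latitude").units)
# prints degrees_north
```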
- static geospatial(ncdf, lat_name, lon_name)¶
Return a dict containing lat/lons from NetCDF file.
Parameters: - ncdf (Dataset) – Reference to an opened netCDF4.Dataset object
- lat_name – Name of parameter containing latitude values
- lon_name – Name of parameter containing longitude values
Returns: Geospatial information as dict.
- static params(ncdf)¶
Construct list of Parameters based on variables in NetCDF file.
Parameters: ncdf (Dataset) – Reference to an opened netCDF4.Dataset object
Returns: List of metadata.product.Parameter objects
- static temporal(ncdf, time_name)¶
Extract time values from Dataset using the variable name provided.
Parameters: - ncdf (Dataset) – Reference to an opened netCDF4.Dataset object
- time_name (str) – Name of the time parameter
- class ceda_di.netcdf_geo.NetCDF_CF(fpath, convention)¶
Bases: ceda_di._dataset._geospatial
Metadata extraction class for CF-compliant NetCDF files.
- get_geospatial()¶
- get_parameters()¶
- get_properties()¶
Return a metadata.product.Properties object populated with metadata.
Returns: Properties object populated with metadata
- get_temporal()¶
- class ceda_di.netcdf_geo.NetCDF_RAF(fpath, convention)¶
Bases: ceda_di._dataset._geospatial
Metadata extraction class for NCAR-RAF-compliant NetCDF.
- get_geospatial()¶
- get_parameters()¶
- get_properties()¶
Return a metadata.product.Properties object populated with metadata.
Returns: Properties object populated with metadata
- get_temporal()¶
ceda_di.search module¶
- class ceda_di.search.ElasticsearchClientFactory¶
Bases: object
- get_client(config_args)¶
Return an appropriately configured Elasticsearch client.
Parameters: config_args – Configuration dictionary. Should contain an Elasticsearch hostname under key ‘es-host’ and an Elasticsearch port under the key ‘es-port’.
Returns: A configured Elasticsearch instance
- class ceda_di.search.JsonQueryBuilder¶
Bases: object
- build(extents_string=None, max_results=None)¶
Build an Elasticsearch query dictionary from a given extents string.
Parameters: extents_string – A string specifying temporal or spatial extents, e.g. ‘t=[2014-10-12T12:13:14,2014-10-12T17:18:19]’.
Returns: A dictionary which is valid Elasticsearch query JSON.
- process_datetime_extents(start, end)¶
Process a datetime extents search filter and add it to the query dictionary.
Will parse partial datetimes to maximise the search window - e.g. start=2009, end=2010 will find all results from 2009-01-01T00:00:00 to 2010-12-31T23:59:59
Parameters: - start – Start datetime string
- end – End datetime string
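The widening of partial datetimes can be sketched as follows: missing fields are padded with their earliest values for a start bound and their latest values for an end bound. This is an illustration under the assumption that inputs look like the ISO-style strings shown above; the function name is hypothetical.

```python
import calendar
from datetime import datetime


def expand_partial(text, latest=False):
    """Pad a partial ISO-style datetime to its earliest or latest instant."""
    date_part, _, time_part = text.partition("T")
    date_fields = [int(f) for f in date_part.split("-")]
    time_fields = [int(f) for f in time_part.split(":")] if time_part else []

    year = date_fields[0]
    month = date_fields[1] if len(date_fields) > 1 else (12 if latest else 1)
    if len(date_fields) > 2:
        day = date_fields[2]
    else:
        # Last day of the month for an end bound, first day otherwise
        day = calendar.monthrange(year, month)[1] if latest else 1
    defaults = (23, 59, 59) if latest else (0, 0, 0)
    hour, minute, second = time_fields + list(defaults[len(time_fields):])
    return datetime(year, month, day, hour, minute, second)


start = expand_partial("2009")
end = expand_partial("2010", latest=True)
print(start.isoformat(), end.isoformat())
# prints 2009-01-01T00:00:00 2010-12-31T23:59:59
```

A single partial datetime can be handled by expanding the same string twice, once as the start bound and once as the end bound.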
- process_latitude_extents(lat_1, lat_2)¶
Process latitude extents search filter and add it to the query dictionary.
Will always include the region from the lowest latitude specified to the highest, regardless of the order in which they are passed to this function.
Parameters: - lat_1 – Latitude float in the range -90 to +90 degrees.
- lat_2 – Latitude float in the range -90 to +90 degrees.
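The order-independence amounts to sorting the two values before building the filter. A small sketch with a hypothetical function name follows; rejecting out-of-range values is an assumption, since the behaviour for invalid latitudes is not documented here.

```python
def latitude_bounds(lat_1, lat_2):
    """Return (lowest, highest) latitude regardless of argument order."""
    for lat in (lat_1, lat_2):
        if not -90.0 <= lat <= 90.0:
            raise ValueError("Latitude out of range: %r" % lat)
    return min(lat_1, lat_2), max(lat_1, lat_2)


print(latitude_bounds(55.5, -10.0))  # prints (-10.0, 55.5)
```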
- process_longitude_extents(start, end)¶
Process longitude extents search filter and add it to the query dictionary.
Will automatically constrain start and end longitudes to be within the range -180 to +180 (so they may be specified e.g. as 370). The region searched is always the region from the start longitude to the end longitude.
Parameters: - start – Start longitude
- end – End longitude
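Constraining a longitude to the range -180 to +180 is a modular-arithmetic step. The sketch below uses hypothetical function names; note that with this particular formula an input of exactly +180 comes back as -180. Because the region always runs from start to end, a wrapped start that is greater than the wrapped end indicates a region crossing the antimeridian.

```python
def wrap_longitude(lon):
    """Map any longitude in degrees onto the range -180 to +180."""
    return (lon + 180.0) % 360.0 - 180.0


def longitude_region(start, end):
    """Return wrapped (start, end) and whether the region crosses 180."""
    start, end = wrap_longitude(start), wrap_longitude(end)
    return start, end, start > end


print(wrap_longitude(370))         # prints 10.0
print(longitude_region(170, 190))  # prints (170.0, -170.0, True)
```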
- process_single_datetime(datetime)¶
Process a single datetime search filter and add it to the query dictionary.
Will parse partial datetimes to maximise the search window - e.g. 2009 will find all results from 2009-01-01T00:00:00 to 2009-12-31T23:59:59
Parameters: datetime – Datetime string
- process_single_latitude(lat)¶
Process a single latitude search filter and add it to the query dictionary.
Parameters: lat – Latitude to filter by
- process_single_longitude(lon)¶
Process a single longitude search filter.
Will automatically constrain to within the range -180 to +180 (so values of e.g. 370 are acceptable).
Parameters: lon – Longitude to filter by
- class ceda_di.search.Searcher(config_args, json_query_builder=<ceda_di.search.JsonQueryBuilder object>, elastic_search_client_factory=<ceda_di.search.ElasticsearchClientFactory object>)¶
Bases: object
Coordinates the searching of Elasticsearch nodes to output matching filepaths.
- run()¶
Run the search and output the results matching the configuration belonging to this instance.
Returns: Outputs matching filenames to sys.stdout