OccurrenceCube Class

class b3alien.b3cube.OccurrenceCube(filepath: str, source='geoparquet', gproject='', dims=None, coords=None, index_col=None)[source]

Bases: object

Load a GeoParquet file (local or from GCS) into a sparse xarray cube.

Parameters:
  • filepath (str) – Path to the GeoParquet file (e.g. ‘gs://bucket/file.parquet’).

  • dims (list or tuple, optional) – Dimension names. Default is [‘time’, ‘cell’, ‘species’].

  • coords (dict, optional) – Optional coordinates to assign to the cube.

  • index_col (str or list, optional) – Column(s) to use for reshaping if needed.

Returns:

A sparse data cube loaded from the GeoParquet file. self.df contains a geopandas.GeoDataFrame; self.data contains the sparse xarray cube.

Return type:

b3cube.OccurrenceCube

_create_xcube(df, dims=('time', 'cell', 'species'))[source]

Convert a GeoDataFrame into a sparse xarray cube. Geometry metadata is attached when a GeoParquet file was loaded; for a pure GBIF cube, the geometry is ignored.
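As a rough illustration of this reshaping, the sketch below pivots a long-format occurrence table into a 3-D array indexed by (time, cell, species). It is not the library's implementation: the column names yearmonth, cellCode, speciesKey and count are assumptions, and a dense NumPy array stands in for the sparse xarray cube.

```python
import numpy as np
import pandas as pd

# Hypothetical long-format occurrence table (column names are assumptions).
df = pd.DataFrame({
    "yearmonth": ["2020-01", "2020-01", "2020-02", "2020-02"],
    "cellCode": ["A", "B", "A", "A"],
    "speciesKey": [101, 101, 102, 101],
    "count": [3, 1, 2, 5],
})

# Map every label to an integer position along its axis.
t_idx, times = pd.factorize(df["yearmonth"], sort=True)
c_idx, cells = pd.factorize(df["cellCode"], sort=True)
s_idx, species = pd.factorize(df["speciesKey"], sort=True)

# Dense stand-in for the sparse cube: most entries stay zero.
cube = np.zeros((len(times), len(cells), len(species)), dtype=int)
np.add.at(cube, (t_idx, c_idx, s_idx), df["count"].to_numpy())

print(cube.shape)
```

Real occurrence cubes are mostly empty, which is why a sparse backend is preferable to the dense array used here.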

_filter_species(speciesKey)[source]

Filter the cube to only include data for a specific speciesKey.

Parameters:

speciesKey (int or str) – The GBIF speciesKey to filter on.

Returns:

Updates self.df and self.data to only include the specified speciesKey.

Return type:

None
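The filtering step can be pictured with plain pandas. This is a minimal sketch on a toy table, not the method itself: filter_species is a hypothetical helper that returns a new frame, whereas the method above updates self.df and self.data in place.

```python
import pandas as pd

# Toy occurrence table; the column names are assumptions.
df = pd.DataFrame({
    "speciesKey": [101, 102, 101, 103],
    "cellCode": ["A", "A", "B", "C"],
    "count": [3, 2, 1, 4],
})

def filter_species(frame, species_key):
    """Hypothetical helper: keep only rows for one speciesKey."""
    # Compare as strings so that int and str keys both match.
    mask = frame["speciesKey"].astype(str) == str(species_key)
    return frame[mask].reset_index(drop=True)

subset = filter_species(df, "101")
print(subset)
```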

_load_gbifcsv(path)[source]

Load a GBIF CSV file from local disk using Pandas. Assumes tab-separated values.
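Reading tab-separated values with Pandas amounts to passing sep="\t" to pandas.read_csv. The sketch below reads from an in-memory string instead of a file on disk; the column names in the sample are assumptions.

```python
import io
import pandas as pd

# Small tab-separated sample standing in for a GBIF download.
tsv = "specieskey\tyearmonth\tcount\n101\t2020-01\t3\n102\t2020-02\t2\n"

# sep="\t" tells pandas that the fields are tab-separated.
df = pd.read_csv(io.StringIO(tsv), sep="\t")
print(df)
```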

_load_geoparquet(path, gproject)[source]

Load a GeoParquet file from local disk or GCS using GeoPandas.

_species_richness(normalized=False)[source]

Calculate species richness per cell from the sparse cube.

Parameters:

normalized (bool) – Whether to compute normalized richness (richness / total occurrences).

Returns:

Sets self.richness as a DataFrame with columns ‘cell’ and ‘richness’ or ‘normalized_richness’.

Return type:

None
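The two quantities can be sketched with a pandas groupby: richness as the number of distinct species per cell, and normalized richness as that number divided by the total occurrences in the cell. The input column names are assumptions, and the method itself works on the sparse cube rather than on a table like this one.

```python
import pandas as pd

# Toy occurrence table; the column names are assumptions.
df = pd.DataFrame({
    "cell": ["A", "A", "A", "B"],
    "speciesKey": [101, 102, 101, 101],
    "count": [3, 2, 5, 1],
})

grouped = df.groupby("cell")
# Richness: number of distinct species observed in each cell.
richness = grouped["speciesKey"].nunique().rename("richness")
# Normalized richness: richness divided by total occurrences in the cell.
total = grouped["count"].sum()
normalized = (richness / total).rename("normalized_richness")

result = pd.concat([richness, normalized], axis=1).reset_index()
print(result)
```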

Biodiversity data cube functions

b3alien.b3cube.aggregate_count_per_cell(cube, taxonRank, taxon)[source]

Aggregate the counts per taxonomic level per cell. The result can be used as a normalization factor.

Parameters:
  • cube (OccurrenceCube) – An OccurrenceCube with geometries added

  • taxonRank (dwc.taxonRank term) – Level at which the aggregation needs to be performed

  • taxon (str) – Name of the taxon at which aggregation needs to be performed

  • plot (bool, optional) – whether the aggregated count per cell needs to be plotted on a map

Returns:

gdf – dataframe containing geometries and the aggregated occurrence count

Return type:

GeoDataFrame
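A minimal sketch of this kind of aggregation, assuming a table with one column per taxonomic rank plus cell and count columns. aggregate_per_cell is a hypothetical stand-in that works on a plain DataFrame and leaves out the geometries.

```python
import pandas as pd

# Toy table; "family" plays the role of the taxonRank column (assumption).
df = pd.DataFrame({
    "cell": ["A", "A", "B", "B"],
    "family": ["Poaceae", "Rosaceae", "Poaceae", "Poaceae"],
    "count": [3, 2, 1, 4],
})

def aggregate_per_cell(frame, taxon_rank, taxon):
    """Hypothetical helper: sum occurrence counts per cell for one taxon."""
    selected = frame[frame[taxon_rank] == taxon]
    return selected.groupby("cell", as_index=False)["count"].sum()

agg = aggregate_per_cell(df, "family", "Poaceae")
print(agg)
```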

b3alien.b3cube.calculate_rate(df_cumulative)[source]

Calculate the rate of establishment from the cumulative distribution.

Parameters:

df_cumulative (pandas.DataFrame) – Dataframe containing the cumulative distribution.

Returns:

  • s1 (pandas.Series) – Series of the time axis.

  • s2 (pandas.Series) – Series of the rate of establishment.
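One simple way to derive a rate from a cumulative curve is to take the difference between consecutive time steps. The sketch below does exactly that. Whether calculate_rate uses this definition is an assumption, as are the column names year and cumulative.

```python
import pandas as pd

# Toy cumulative distribution; the column names are assumptions.
df_cumulative = pd.DataFrame({
    "year": [2000, 2001, 2002, 2003],
    "cumulative": [1, 3, 3, 6],
})

# First difference of the cumulative count: new species per year.
# The first row has no predecessor, so it is dropped.
years = df_cumulative["year"].iloc[1:].reset_index(drop=True)
rate = df_cumulative["cumulative"].diff().iloc[1:].reset_index(drop=True)

print(pd.DataFrame({"year": years, "rate": rate}))
```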

b3alien.b3cube.cumulative_species(cube, species_to_keep)[source]

Calculate the cumulative number of species in an OccurrenceCube.

Parameters:
  • cube (b3alien.b3cube.OccurrenceCube) – Species OccurrenceCube from GBIF.

  • species_to_keep (numpy.array) – Array of GBIF speciesKeys that need to be taken into account to calculate the cumulative species number of a cube.

  • geom (str, optional)

Returns:

  • df1 (pandas.DataFrame) – Sparse dataframe that still contains the cumulative sum per grid cell.

  • df2 (pandas.DataFrame) – Cumulative dataframe, independent of cell.
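The cell-independent cumulative count can be sketched as follows: keep the requested species, find the first year each one was recorded, count first records per year, and accumulate. This is an illustration on a toy table with assumed column names, not the function's actual code.

```python
import numpy as np
import pandas as pd

# Toy occurrence table; the column names are assumptions.
df = pd.DataFrame({
    "year": [2000, 2000, 2001, 2002, 2002],
    "speciesKey": [101, 102, 101, 103, 104],
})
species_to_keep = np.array([101, 102, 103])

# Keep only the species of interest.
kept = df[df["speciesKey"].isin(species_to_keep)]
# Year of first record per species.
first_seen = kept.groupby("speciesKey")["year"].min()
# Number of newly recorded species per year, then the running total.
new_per_year = first_seen.value_counts().sort_index()
cumulative = new_per_year.cumsum()

print(cumulative)
```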

b3alien.b3cube.filter_multiple_cells(df_sparse)[source]

Only count a species established when it is present in more than one cell.

Parameters:

df_sparse (pandas.DataFrame) – Dataframe containing the species richness per grid cell.

Returns:

Cumulative species when in multiple cells.

Return type:

pandas.DataFrame
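One possible reading of this criterion is sketched below: a species counts as established in the year it is first recorded in a second cell. The input schema (year, cell, speciesKey) and the exact rule are assumptions; the function's real input is the sparse dataframe described above.

```python
import pandas as pd

# Toy table; the column names are assumptions.
df = pd.DataFrame({
    "year": [2000, 2001, 2001, 2002, 2001, 2003],
    "cell": ["A", "A", "B", "C", "A", "B"],
    "speciesKey": [101, 101, 101, 102, 103, 103],
})

# First year each species was recorded in each cell.
first_in_cell = (
    df.groupby(["speciesKey", "cell"])["year"].min().reset_index()
      .sort_values(["speciesKey", "year"])
)
# Rank the cells of each species by the year they were first occupied.
first_in_cell["cell_rank"] = first_in_cell.groupby("speciesKey").cumcount() + 1

# A species is counted from the year it reaches its second cell.
second_cell = first_in_cell[first_in_cell["cell_rank"] == 2]
established_year = second_cell.set_index("speciesKey")["year"]
cumulative = established_year.value_counts().sort_index().cumsum()

print(cumulative)
```

Species 102 occurs in a single cell only, so it never enters the cumulative count.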

b3alien.b3cube.filter_multiple_occ(df_sparse)[source]

Only count a species established when there are multiple occurrences in a cell.

Parameters:

df_sparse (pandas.DataFrame) – Dataframe containing the species richness per grid cell.

Returns:

Cumulative species when multiple occurrences in a cell.

Return type:

pandas.DataFrame
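The occurrence-based criterion can be sketched in a similar way: sum the occurrences per species and cell, keep the combinations with more than one occurrence, and accumulate over the year of first record. The column names and the use of a summed count column are assumptions.

```python
import pandas as pd

# Toy table; the column names are assumptions.
df = pd.DataFrame({
    "year": [2000, 2001, 2002, 2002],
    "cell": ["A", "A", "B", "C"],
    "speciesKey": [101, 102, 102, 103],
    "count": [1, 4, 1, 2],
})

# Total occurrences and first year per (species, cell) combination.
totals = df.groupby(["speciesKey", "cell"], as_index=False).agg(
    count=("count", "sum"),
    year=("year", "min"),
)
# Keep combinations with more than one occurrence.
multi = totals[totals["count"] > 1]
first_year = multi.groupby("speciesKey")["year"].min()
cumulative = first_year.value_counts().sort_index().cumsum()

print(cumulative)
```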

b3alien.b3cube.filter_time_window(df, start_year, end_year, cols=['year', 'rate'])[source]

Filter a two-column dataframe to a window of years.

Parameters:
  • df (pandas.DataFrame) – A two-column dataframe with one time axis.

  • start_year (int) – Earliest year to include.

  • end_year (int) – Latest year to include.

  • cols (list of str, optional) – List of the names of the columns of the dataframe

Returns:

  • time (pandas.Series) – Filtered time series

  • filtered_var (pandas.Series) – Filtered variable corresponding to the time series
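A minimal sketch of such a filter, assuming both bounds are inclusive (the documentation does not say). filter_window is a hypothetical stand-in, not the function above.

```python
import pandas as pd

def filter_window(df, start_year, end_year, cols=("year", "rate")):
    """Hypothetical helper: return the time and variable series in a window."""
    time_col, var_col = cols
    # Series.between is inclusive on both ends by default (assumed behaviour).
    mask = df[time_col].between(start_year, end_year)
    filtered = df.loc[mask]
    return filtered[time_col], filtered[var_col]

df = pd.DataFrame({
    "year": [1990, 2000, 2010, 2020],
    "rate": [0.5, 1.0, 2.0, 1.5],
})
time, rate = filter_window(df, 2000, 2010)
print(time.tolist(), rate.tolist())
```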

b3alien.b3cube.find_correlations(cube: OccurrenceCube, top_n=10)[source]

Calculates species co-occurrence using efficient matrix operations.

This method identifies pairs of species that occur in the same ‘cell’ at the same ‘time’.

Parameters:
  • cube (OccurrenceCube)

  • top_n (int)

Returns:

A list of tuples, where each tuple contains the species names and their co-occurrence count, sorted in descending order.

Return type:

list
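Co-occurrence counting with matrix operations can be illustrated with a presence matrix P whose rows are (time, cell) combinations and whose columns are species. The product P.T @ P then holds, for each species pair, the number of rows in which both are present. The sketch uses a small dense NumPy array with made-up species names; how the function builds its matrix from the cube is not shown here.

```python
import numpy as np

species = ["sp_a", "sp_b", "sp_c"]  # hypothetical species names

# Rows: (time, cell) combinations; columns: species; 1 = present.
presence = np.array([
    [1, 1, 0],
    [1, 1, 1],
    [0, 1, 1],
    [1, 0, 0],
])

# Entry (i, j) counts the rows where species i and j are both present.
co = presence.T @ presence

# Read each unordered pair once from the upper triangle (diagonal excluded).
i_idx, j_idx = np.triu_indices(len(species), k=1)
pairs = [(species[i], species[j], int(co[i, j])) for i, j in zip(i_idx, j_idx)]
pairs.sort(key=lambda p: p[2], reverse=True)

top_n = 2
top_pairs = pairs[:top_n]
print(top_pairs)
```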

b3alien.b3cube.get_survey_effort(cube, dateFormat='%Y-%m', calc_type='total')[source]

Estimate the survey effort in an OccurrenceCube.

Parameters:
  • cube (b3alien.b3cube.OccurrenceCube) – Species OccurrenceCube from GBIF.

  • dateFormat (str, optional) – Dateformat stored in the OccurrenceCube. Default is ‘%Y-%m’

  • calc_type (str, optional) –

    Type of survey effort to be calculated.

    ‘distinct’ : total number of distinct observers

    ‘total’ : total number of occurrences

    Default is ‘total’.

Returns:

df – Dataframe containing time and the chosen measurement for survey effort.

Return type:

pandas.DataFrame
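The two effort measures can be sketched with a groupby over the time axis. The column names yearmonth, count and distinctobservers are assumptions, and survey_effort is a hypothetical stand-in that works on a plain DataFrame rather than on an OccurrenceCube.

```python
import pandas as pd

# Toy table; the column names are assumptions.
df = pd.DataFrame({
    "yearmonth": ["2020-01", "2020-01", "2020-02"],
    "count": [3, 1, 2],
    "distinctobservers": [2, 1, 1],
})

def survey_effort(frame, date_format="%Y-%m", calc_type="total"):
    """Hypothetical helper: sum the chosen effort column per time step."""
    column = {"total": "count", "distinct": "distinctobservers"}[calc_type]
    dated = frame.assign(
        date=pd.to_datetime(frame["yearmonth"], format=date_format)
    )
    # Note: summing per-row observer counts can count one observer twice.
    return dated.groupby("date", as_index=False)[column].sum()

effort = survey_effort(df)
print(effort)
```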

b3alien.b3cube.plot_cumsum(df_cumulative)[source]

Create a plot of the cumulative number of species.

Parameters:

df_cumulative (pandas.DataFrame) – Dataframe containing the cumulative number over time.

Returns:

A plot of the cumulative number of species.

Return type:

matplotlib.plot

b3alien.b3cube.plot_richness(cube, normalized=False, html_path='richness_map.html')[source]

Plot species richness from an OccurrenceCube instance.

Parameters:
  • cube (OccurrenceCube) – An instance of the cube with .data and .richness available.

  • normalized (bool) – Whether to use normalized richness.

  • html_path (str) – Path to save the map if not in Jupyter.

Return type:

None