OccurrenceCube Class

class b3alien.b3cube.OccurrenceCube(filepath: str, source='geoparquet', gproject='', dims=None, coords=None, index_col=None)

Bases: object

Load a GeoParquet file (local or from GCS) or a GBIF CSV cube into a sparse xarray cube.

Parameters:

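filepath (str) – Path to the GeoParquet file (e.g. ‘gs://bucket/file.parquet’).
source (str, optional) – Format of the input file. Default is ‘geoparquet’.
gproject (str, optional) – Google Cloud project used when reading from GCS.
dims (list or tuple, optional) – Dimension names. Default is [‘time’, ‘cell’, ‘species’].
coords (dict, optional) – Coordinates to assign to the cube.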
index_col (str or list, optional) – Column(s) to use for reshaping if needed.

Returns:

A sparse data cube loaded from the GeoParquet file. self.df contains a geopandas.GeoDataFrame and self.data a sparse xarray.DataArray.

Return type:

b3cube.OccurrenceCube

_create_xcube(df, dims=('time', 'cell', 'species')): Convert a GeoDataFrame into a sparse xarray cube. When a GeoParquet file was loaded, the geometry is kept as metadata; for a pure GBIF cube, the geometry is ignored.
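To illustrate what a (time, cell, species) cube is, here is a minimal sketch that scatters a long-format occurrence table into a dense NumPy array. The column names and the helper long_to_dense_cube are hypothetical, and this is not the b3alien implementation: _create_xcube builds a sparse xarray cube instead, which stores only the nonzero entries.

```python
import numpy as np
import pandas as pd

# Hypothetical long-format occurrence table (column names are assumptions).
records = pd.DataFrame({
    "time": ["2020-01", "2020-01", "2020-02", "2020-02"],
    "cell": ["A", "B", "A", "A"],
    "species": [101, 101, 202, 101],
    "occurrences": [3, 1, 2, 5],
})

def long_to_dense_cube(df, dims=("time", "cell", "species"), value="occurrences"):
    """Scatter a long table into a dense array with one axis per dimension."""
    codes, labels = [], []
    for dim in dims:
        # factorize maps each label to an integer position along its axis
        c, u = pd.factorize(df[dim], sort=True)
        codes.append(c)
        labels.append(u)
    cube = np.zeros([len(u) for u in labels], dtype=df[value].dtype)
    # np.add.at accumulates repeated coordinates instead of overwriting them
    np.add.at(cube, tuple(codes), df[value].to_numpy())
    return cube, labels

cube, labels = long_to_dense_cube(records)
```

Here labels holds the coordinate values for each axis, which is what would be passed as coordinates when wrapping the array in an xarray object.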

_filter_species(speciesKey)

Filter the cube to only include data for a specific speciesKey.

Parameters:: speciesKey (int or str) – The GBIF speciesKey to filter on.
Returns:: Updates self.df and self.data to only include the specified speciesKey.
Return type:: None

_load_gbifcsv(path)[source]: Load a GBIF CSV file from local disk using Pandas. Assumes tab-separated values.

_load_geoparquet(path, gproject)[source]: Load a GeoParquet file from local disk or GCS using GeoPandas.

_species_richness(normalized=False)

Calculate species richness per cell from the sparse cube.

Parameters:: normalized (bool) – Whether to compute normalized richness (richness / total occurrences).

Returns:: Sets self.richness as a DataFrame with columns ‘cell’ and ‘richness’ or ‘normalized_richness’.
Return type:: None
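The richness calculation can be sketched with a pandas groupby, assuming a long-format table with hypothetical columns cell, species and occurrences. The helper richness_per_cell is illustrative only; it follows the documented definition (distinct species per cell, optionally divided by the total occurrences in that cell) and the documented output columns.

```python
import pandas as pd

# Hypothetical occurrence table (column names are assumptions).
df = pd.DataFrame({
    "cell": ["A", "A", "A", "B", "B"],
    "species": [101, 202, 303, 101, 101],
    "occurrences": [4, 2, 1, 3, 1],
})

def richness_per_cell(df, normalized=False):
    grouped = df.groupby("cell")
    # richness = number of distinct species recorded in the cell
    richness = grouped["species"].nunique()
    if not normalized:
        return richness.rename("richness").reset_index()
    # normalized richness = richness / total occurrences in the cell
    total = grouped["occurrences"].sum()
    return (richness / total).rename("normalized_richness").reset_index()

plain = richness_per_cell(df)
norm = richness_per_cell(df, normalized=True)
```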

Biodiversity data cube functions

b3alien.b3cube.aggregate_count_per_cell(cube, taxonRank, taxon)

Aggregate the occurrence counts per cell for a given taxon at a given taxonomic rank. The result can be used as a normalization factor.

Parameters:

cube (OccurrenceCube) – An OccurrenceCube with geometries added
taxonRank (dwc.taxonRank term) – Taxonomic rank at which the aggregation is performed.
taxon (str) – Name of the taxon for which the counts are aggregated.
plot (bool, optional) – Whether to plot the aggregated count per cell on a map.

Returns:

gdf – dataframe containing geometries and the aggregated occurrence count

Return type:

GeoDataFrame
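The aggregation amounts to selecting the rows that belong to the taxon at the requested rank and summing the counts per cell. A minimal sketch, assuming one column per taxonomic rank (here a hypothetical family column) and an occurrences column; the helper count_per_cell is illustrative, and unlike the documented function it returns no geometries.

```python
import pandas as pd

# Hypothetical table with one column per rank (column names are assumptions).
df = pd.DataFrame({
    "cell": ["A", "A", "B", "B", "C"],
    "family": ["Apidae", "Formicidae", "Apidae", "Apidae", "Formicidae"],
    "occurrences": [5, 2, 1, 3, 7],
})

def count_per_cell(df, taxon_rank, taxon):
    # keep only records of the requested taxon at the requested rank
    subset = df[df[taxon_rank] == taxon]
    # total occurrences of that taxon in each cell
    return subset.groupby("cell", as_index=False)["occurrences"].sum()

counts = count_per_cell(df, "family", "Apidae")
```

Cells without any record of the taxon (cell C here) do not appear in the result, which matters when the counts are later used as a denominator.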

b3alien.b3cube.calculate_rate(df_cumulative)

Calculate the rate of establishment from the cumulative distribution.

Parameters:

df_cumulative (pandas.DataFrame) – DataFrame containing the cumulative distribution.

Returns:

s1 (pandas.Series) – Series of the time axis.
s2 (pandas.Series) – Series of the rate of establishment.
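One plausible way to derive a rate from a cumulative curve is its first difference divided by the time step. The sketch below assumes hypothetical columns year and cumulative and this finite-difference definition; the documented calculate_rate may define the rate differently.

```python
import pandas as pd

def rate_from_cumulative(df_cumulative, time_col="year", value_col="cumulative"):
    df = df_cumulative.sort_values(time_col)
    # change in the cumulative count divided by the elapsed time
    rate = df[value_col].diff() / df[time_col].diff()
    # the first row has no predecessor, so its rate is NaN and is dropped
    mask = rate.notna()
    time = df[time_col][mask].reset_index(drop=True)
    return time, rate[mask].reset_index(drop=True)

cum = pd.DataFrame({"year": [2000, 2001, 2003, 2004], "cumulative": [2, 5, 9, 10]})
years, rate = rate_from_cumulative(cum)
```

Dividing by the year difference keeps the rate comparable when some years are missing (2002 here).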

b3alien.b3cube.cumulative_species(cube, species_to_keep)

Calculate the cumulative number of species in an OccurrenceCube.

Parameters:

cube (b3alien.b3cube.OccurrenceCube) – Species OccurrenceCube from GBIF.
species_to_keep (numpy.array) – Array of GBIF speciesKeys that need to be taken into account to calculate the cumulative species number of a cube.
geom (str, optional)

Returns:

df1 (pandas.DataFrame) – Sparse dataframe that still contains the cumulative sum per grid cell.
df2 (pandas.DataFrame) – Cumulative number of species, independent of grid cell.
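A cell-independent cumulative curve can be sketched by taking the first year in which each retained species is recorded and summing the number of new species per year. The columns year and species and the helper cumulative_species_sketch are hypothetical; the sketch only mirrors the second return value (df2), not the per-cell df1.

```python
import pandas as pd

# Hypothetical long-format table: one row per species record per year.
df = pd.DataFrame({
    "year": [2000, 2000, 2001, 2002, 2002, 2003],
    "species": [1, 2, 1, 3, 4, 2],
})

def cumulative_species_sketch(df, species_to_keep):
    kept = df[df["species"].isin(species_to_keep)]
    # year in which each retained species is first recorded
    first_year = kept.groupby("species")["year"].min()
    # newly recorded species per year, with 0 for years without new species
    years = range(df["year"].min(), df["year"].max() + 1)
    new_per_year = first_year.value_counts().reindex(years, fill_value=0)
    cumulative = new_per_year.cumsum()
    return cumulative.rename("cumulative").rename_axis("year").reset_index()

result = cumulative_species_sketch(df, species_to_keep=[1, 2, 3])
```

Species 4 is excluded because it is not in species_to_keep, so the curve ends at 3.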

b3alien.b3cube.filter_multiple_cells(df_sparse)

Count a species as established only when it is present in more than one cell.

Parameters:: df_sparse (pandas.DataFrame) – DataFrame containing the species richness per grid cell.
Returns:: Cumulative number of species present in more than one cell.
Return type:: pandas.DataFrame
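The "more than one cell" rule can be sketched by tracking, per species, how many distinct cells it has occupied over time and taking the year it reaches a second cell as its establishment year. The columns and the helper establishment_multiple_cells are hypothetical, and this is not necessarily how filter_multiple_cells works internally.

```python
import pandas as pd

# Hypothetical records (column names are assumptions).
df = pd.DataFrame({
    "year": [2000, 2001, 2001, 2002, 2003, 2000],
    "cell": ["A", "A", "B", "C", "D", "E"],
    "species": [1, 1, 1, 2, 2, 3],
})

def establishment_multiple_cells(df, min_cells=2):
    # first year in which each species is recorded in each cell
    first = df.groupby(["species", "cell"])["year"].min().reset_index()
    first = first.sort_values(["species", "year"])
    # running count of distinct cells occupied by each species
    first["n_cells"] = first.groupby("species").cumcount() + 1
    # the year a species reaches min_cells cells is its establishment year
    established = first[first["n_cells"] == min_cells]
    return established.set_index("species")["year"]

est = establishment_multiple_cells(df)
cumulative = est.value_counts().sort_index().cumsum()
```

Species 3 is only ever seen in one cell, so it never counts as established.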

b3alien.b3cube.filter_multiple_occ(df_sparse)

Count a species as established only when it has multiple occurrences in a cell.

Parameters:: df_sparse (pandas.DataFrame) – DataFrame containing the species richness per grid cell.
Returns:: Cumulative number of species with multiple occurrences in a cell.
Return type:: pandas.DataFrame
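A sketch of the "multiple occurrences in a cell" rule, assuming that the threshold applies to a single cell within a single time step (an assumption; the documented function may accumulate counts over time). Column names and the helper establishment_multiple_occ are hypothetical.

```python
import pandas as pd

# Hypothetical records (column names are assumptions).
df = pd.DataFrame({
    "year": [2000, 2001, 2001, 2002, 2002],
    "cell": ["A", "A", "B", "C", "D"],
    "species": [1, 1, 2, 2, 3],
    "occurrences": [1, 3, 1, 2, 1],
})

def establishment_multiple_occ(df, min_occ=2):
    # keep records with at least min_occ occurrences in one cell and year
    dense = df[df["occurrences"] >= min_occ]
    # the earliest such year is taken as the establishment year
    return dense.groupby("species")["year"].min()

est = establishment_multiple_occ(df)
cumulative = est.value_counts().sort_index().cumsum()
```

Species 3 never has more than one occurrence in a cell and is therefore excluded.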

b3alien.b3cube.filter_time_window(df, start_year, end_year, cols=['year', 'rate'])

Filter a two-column DataFrame to a window of years.

Parameters:

df (pandas.DataFrame) – A two-column DataFrame with one time axis.
start_year (int) – Earliest year of the window.
end_year (int) – Latest year of the window.
cols (list of str, optional) – List of the names of the columns of the dataframe

Returns:

time (pandas.Series) – Filtered time series
filtered_var (pandas.Series) – Filtered variable corresponding to the time series
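The filter can be sketched with a boolean mask from Series.between. The helper filter_window is hypothetical, treats both bounds as inclusive (an assumption), and uses a tuple default for cols to avoid a mutable default argument.

```python
import pandas as pd

def filter_window(df, start_year, end_year, cols=("year", "rate")):
    time_col, var_col = cols
    # between() is inclusive on both ends by default
    mask = df[time_col].between(start_year, end_year)
    return df.loc[mask, time_col], df.loc[mask, var_col]

df = pd.DataFrame({"year": [1998, 2000, 2005, 2010], "rate": [0.5, 1.0, 1.5, 2.0]})
time, rate = filter_window(df, 2000, 2005)
```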

b3alien.b3cube.find_correlations(cube: OccurrenceCube, top_n=10)

Calculates species co-occurrence using efficient matrix operations.

This method identifies pairs of species that occur in the same ‘cell’ at the same ‘time’.

Parameters:

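cube (OccurrenceCube) – The data cube to analyze.
top_n (int) – Number of most frequently co-occurring species pairs to return. Default is 10.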

Returns:

A list of tuples, where each tuple contains the species names and their co-occurrence count, sorted in descending order.

Return type:

list
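The matrix approach can be sketched as follows: build a presence/absence matrix X with one row per (time, cell) combination and one column per species; then X.T @ X counts, for every species pair, the units in which both are present. The columns, the helper top_cooccurring_pairs and the (name, name, count) tuple layout are assumptions, not the b3alien implementation.

```python
import numpy as np
import pandas as pd

# Hypothetical long-format records (column names are assumptions).
df = pd.DataFrame({
    "time": [1, 1, 1, 2, 2, 2],
    "cell": ["A", "A", "B", "A", "A", "A"],
    "species": ["Apis", "Bombus", "Apis", "Apis", "Bombus", "Vespa"],
})

def top_cooccurring_pairs(df, top_n=10):
    # presence/absence matrix: rows = (time, cell), columns = species
    presence = pd.crosstab([df["time"], df["cell"]], df["species"]).clip(upper=1)
    X = presence.to_numpy()
    # counts[i, j] = number of (time, cell) units containing both species i and j
    counts = X.T @ X
    # upper triangle without the diagonal: each unordered pair once
    rows, cols = np.triu_indices_from(counts, k=1)
    names = presence.columns
    pairs = [(names[a], names[b], int(counts[a, b]))
             for a, b in zip(rows, cols) if counts[a, b] > 0]
    # stable sort, highest co-occurrence count first
    pairs.sort(key=lambda p: p[2], reverse=True)
    return pairs[:top_n]

pairs = top_cooccurring_pairs(df)
```

The diagonal of counts holds the number of units in which each species occurs on its own, which is why it is excluded.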

b3alien.b3cube.get_survey_effort(cube, dateFormat='%Y-%m', calc_type='total')

Estimate the survey effort in an OccurrenceCube.

Parameters:

cube (b3alien.b3cube.OccurrenceCube) – Species OccurrenceCube from GBIF.
dateFormat (str, optional) – Date format stored in the OccurrenceCube. Default is ‘%Y-%m’.
calc_type (str, optional) –

Type of survey effort to calculate:
‘distinct’ : total number of distinct observers
‘total’ : total number of occurrences
Default is ‘total’.

Returns:

df – DataFrame containing the time and the chosen survey-effort measure.

Return type:

pandas.DataFrame
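Both survey-effort measures can be sketched as a groupby over the parsed time column: a sum of occurrences for 'total' and a count of distinct observers for 'distinct'. The columns yearmonth, observer and occurrences and the helper survey_effort_sketch are hypothetical; a real GBIF cube may store observer information differently.

```python
import pandas as pd

# Hypothetical records (column names are assumptions).
df = pd.DataFrame({
    "yearmonth": ["2020-01", "2020-01", "2020-02", "2020-02", "2020-02"],
    "observer": ["ann", "bob", "ann", "dee", "cy"],
    "occurrences": [3, 1, 2, 4, 1],
})

def survey_effort_sketch(df, calc_type="total", date_format="%Y-%m"):
    # parse the time labels with the stored date format
    time = pd.to_datetime(df["yearmonth"], format=date_format)
    grouped = df.groupby(time)
    if calc_type == "total":
        effort = grouped["occurrences"].sum()
    elif calc_type == "distinct":
        effort = grouped["observer"].nunique()
    else:
        raise ValueError("calc_type must be 'total' or 'distinct'")
    return effort.rename(calc_type).rename_axis("time").reset_index()

total = survey_effort_sketch(df)
distinct = survey_effort_sketch(df, calc_type="distinct")
```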

b3alien.b3cube.plot_cumsum(df_cumulative)

Create a plot of the cumulative number of species.

Parameters:: df_cumulative (pandas.DataFrame) – DataFrame containing the cumulative number of species over time.
Returns:: A plot of the cumulative number of species.
Return type:: matplotlib.plot

b3alien.b3cube.plot_richness(cube, normalized=False, html_path='richness_map.html')

Plot species richness from an OccurrenceCube instance.

Parameters:

cube (OccurrenceCube) – An instance of the cube with .data and .richness available.
normalized (bool) – Whether to use normalized richness.
html_path (str) – Path to save the map if not in Jupyter.

Return type:

None