OccurrenceCube Class
- class b3alien.b3cube.OccurrenceCube(filepath: str, source='geoparquet', gproject='', dims=None, coords=None, index_col=None)[source]
Bases: object
Load a GeoParquet file (local or from GCS) into a sparse xarray cube.
- Parameters:
filepath (str) – Path to the GeoParquet file (e.g. ‘gs://bucket/file.parquet’).
dims (list or tuple, optional) – Dimension names. Default is [‘time’, ‘cell’, ‘species’].
coords (dict, optional) – Optional coordinates to assign to the cube.
index_col (str or list, optional) – Column(s) to use for reshaping if needed.
- Returns:
A sparse data cube loaded from the GeoParquet file. self.df contains a geopandas.GeoDataFrame and self.data contains a sparse xarray object.
- Return type:
OccurrenceCube
- _create_xcube(df, dims=('time', 'cell', 'species'))[source]
This function converts a GeoDataFrame into a sparse xarray cube with geometry metadata in case a GeoParquet file was loaded. In case of a pure GBIF cube, the geometry is ignored.
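As a rough illustration of what a sparse cube over (time, cell, species) stores, the sketch below keeps only the non-zero entries in a dictionary. It is plain Python rather than the xarray-based implementation, and the field names (`yearmonth`, `cellCode`, `speciesKey`, `count`) are assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical occurrence records; the field names are assumed for illustration.
records = [
    {"yearmonth": "2020-01", "cellCode": "A1", "speciesKey": 101, "count": 3},
    {"yearmonth": "2020-01", "cellCode": "A1", "speciesKey": 102, "count": 1},
    {"yearmonth": "2020-02", "cellCode": "B2", "speciesKey": 101, "count": 2},
    {"yearmonth": "2020-02", "cellCode": "B2", "speciesKey": 101, "count": 4},
]


def build_sparse_cube(rows):
    """Sum counts per (time, cell, species); absent keys are implicit zeros."""
    cube = defaultdict(int)
    for row in rows:
        key = (row["yearmonth"], row["cellCode"], row["speciesKey"])
        cube[key] += row["count"]
    return dict(cube)


sparse_cube = build_sparse_cube(records)

# Coordinate values along each dimension, in sorted order.
coords = {
    "time": sorted({key[0] for key in sparse_cube}),
    "cell": sorted({key[1] for key in sparse_cube}),
    "species": sorted({key[2] for key in sparse_cube}),
}
```

Only three of the eight possible (time, cell, species) combinations are stored here, which is the point of a sparse representation.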
- _filter_species(speciesKey)[source]
Filter the cube to only include data for a specific speciesKey.
- Parameters:
speciesKey (int or str) – The GBIF speciesKey to filter on.
- Returns:
Updates self.df and self.data to only include the specified speciesKey.
- Return type:
None
- _load_gbifcsv(path)[source]
Load a GBIF CSV file from local disk using Pandas. Assumes tab-separated values.
- _load_geoparquet(path, gproject)[source]
Load a GeoParquet file from local disk or GCS using GeoPandas.
- _species_richness(normalized=False)[source]
Calculate species richness per cell from the sparse cube.
- Parameters:
normalized (bool) – Whether to compute normalized richness (richness / total occurrences).
- Returns:
Sets self.richness as a DataFrame with columns ‘cell’ and ‘richness’ or ‘normalized_richness’.
- Return type:
None
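The richness calculation described above can be sketched in plain Python. The cube is represented here as a dictionary keyed by (time, cell, species), which is an assumption for the example and not the class's internal storage; the helper name `species_richness` is likewise hypothetical.

```python
def species_richness(cube, normalized=False):
    """Return {cell: richness}; cube maps (time, cell, species) -> count."""
    species_per_cell = {}
    total_per_cell = {}
    for (_, cell, species), count in cube.items():
        if count > 0:
            species_per_cell.setdefault(cell, set()).add(species)
            total_per_cell[cell] = total_per_cell.get(cell, 0) + count
    richness = {cell: len(found) for cell, found in species_per_cell.items()}
    if normalized:
        # Normalized richness: distinct species divided by total occurrences.
        return {cell: richness[cell] / total_per_cell[cell] for cell in richness}
    return richness


example_cube = {
    ("2020-01", "A1", 101): 3,
    ("2020-01", "A1", 102): 1,
    ("2020-02", "A1", 101): 4,
    ("2020-02", "B2", 103): 2,
}
richness = species_richness(example_cube)
normalized_richness = species_richness(example_cube, normalized=True)
```

Cell A1 holds two distinct species across eight occurrences, so its normalized richness is 0.25, while B2 holds one species across two occurrences.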
Biodiversity data cube functions
- b3alien.b3cube.aggregate_count_per_cell(cube, taxonRank, taxon)[source]
Aggregate the counts per taxonomic level per cell. This can be used as a normalization factor.
- Parameters:
cube (OccurrenceCube) – An OccurrenceCube with geometries added
taxonRank (dwc.taxonRank term) – Level at which the aggregation needs to be performed
taxon (str) – Name of the taxon at which aggregation needs to be performed
plot (bool, optional) – whether the aggregated count per cell needs to be plotted on a map
- Returns:
gdf – dataframe containing geometries and the aggregated occurrence count
- Return type:
GeoDataFrame
- b3alien.b3cube.calculate_rate(df_cumulative)[source]
Calculate the rate of establishment from the cumulative distribution.
- Parameters:
df_cumulative (pandas.DataFrame) – DataFrame containing the cumulative distribution.
- Returns:
s1 (pandas.Series) – Series of the time axis.
s2 (pandas.Series) – Series of the rate of establishment.
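One plausible reading of "rate of establishment" is the increase in the cumulative count per unit of time. The sketch below assumes that definition, which the description above does not spell out, and uses plain lists instead of pandas objects; the function name `rate_from_cumulative` is hypothetical.

```python
def rate_from_cumulative(years, cumulative):
    """Return (years, rates): the increase per year between successive points."""
    rate_years = []
    rates = []
    for i in range(1, len(years)):
        span = years[i] - years[i - 1]
        rate_years.append(years[i])
        rates.append((cumulative[i] - cumulative[i - 1]) / span)
    return rate_years, rates


years = [2000, 2001, 2002, 2004]
cumulative = [1, 3, 3, 7]
rate_years, rates = rate_from_cumulative(years, cumulative)
```

The first point has no predecessor, so the returned series is one element shorter than the input. Dividing by the gap in years keeps the rate comparable when the time axis is irregular.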
- b3alien.b3cube.cumulative_species(cube, species_to_keep)[source]
Calculate the cumulative number of species in an OccurrenceCube.
- Parameters:
cube (b3alien.b3cube.OccurrenceCube) – Species OccurrenceCube from GBIF.
species_to_keep (numpy.array) – Array of GBIF speciesKeys that need to be taken into account to calculate the cumulative species number of a cube.
geom (str, optional)
- Returns:
df1 (pandas.DataFrame) – Sparse dataframe that still contains the cumulative sum per grid cell.
df2 (pandas.DataFrame) – Cumulative dataframe, independent of grid cell.
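The cell-independent cumulative count can be illustrated as follows: each retained species is counted once, at the first time it appears anywhere in the cube. This is a plain-Python sketch under the assumption that the cube is a dictionary keyed by (time, cell, species); the function name is hypothetical and the real function returns pandas DataFrames.

```python
def cumulative_species_count(cube, species_to_keep):
    """Return [(time, cumulative number of distinct species seen so far)]."""
    keep = set(species_to_keep)
    first_seen = {}
    for (time, _, species), count in cube.items():
        if species in keep and count > 0:
            if species not in first_seen or time < first_seen[species]:
                first_seen[species] = time
    result = []
    running = 0
    for time in sorted(set(first_seen.values())):
        # Add every species whose first record falls at this time step.
        running += sum(1 for t in first_seen.values() if t == time)
        result.append((time, running))
    return result


example_cube = {
    (2001, "A1", 101): 1,
    (2001, "B2", 102): 2,
    (2003, "A1", 101): 5,
    (2003, "A1", 103): 1,
    (2004, "B2", 104): 1,
}
cumulative = cumulative_species_count(example_cube, [101, 102, 103])
```

Species 104 is not in the list of keys to keep, so 2004 does not appear in the result.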
- b3alien.b3cube.filter_multiple_cells(df_sparse)[source]
Only count a species established when it is present in more than one cell.
- Parameters:
df_sparse (pandas.DataFrame) – DataFrame containing the species richness per grid cell.
- Returns:
Cumulative species when in multiple cells.
- Return type:
pandas.DataFrame
- b3alien.b3cube.filter_multiple_occ(df_sparse)[source]
Only count a species established when there are multiple occurrences in a cell.
- Parameters:
df_sparse (pandas.DataFrame) – DataFrame containing the species richness per grid cell.
- Returns:
Cumulative species when multiple occurrences in a cell.
- Return type:
pandas.DataFrame
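The two establishment criteria, presence in more than one cell and multiple occurrences within a cell, can be contrasted with a small plain-Python sketch. The dictionary cube, the helper names, and the choice to sum occurrences over time within a cell are all assumptions made for illustration.

```python
def established_by_cells(cube):
    """Species recorded in more than one distinct cell."""
    cells_per_species = {}
    for (_, cell, species), count in cube.items():
        if count > 0:
            cells_per_species.setdefault(species, set()).add(cell)
    return sorted(s for s, cells in cells_per_species.items() if len(cells) > 1)


def established_by_occurrences(cube):
    """Species whose summed count exceeds one in at least one cell."""
    totals = {}
    for (_, cell, species), count in cube.items():
        totals[(cell, species)] = totals.get((cell, species), 0) + count
    return sorted({species for (_, species), total in totals.items() if total > 1})


example_cube = {
    (2001, "A1", 101): 1,
    (2002, "B2", 101): 1,
    (2001, "A1", 102): 3,
    (2001, "C3", 103): 1,
}
by_cells = established_by_cells(example_cube)
by_occurrences = established_by_occurrences(example_cube)
```

Species 101 qualifies only under the first criterion and species 102 only under the second, which shows that the two filters are not interchangeable.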
- b3alien.b3cube.filter_time_window(df, start_year, end_year, cols=['year', 'rate'])[source]
Filter a two column dataframe based on year window.
- Parameters:
df (pandas.DataFrame) – A two-column dataframe with one time axis.
start_year (int) – Integer number of the earliest date to filter on.
end_year (int) – Integer number of the latest date to filter on.
cols (list of str, optional) – List of the names of the columns of the dataframe
- Returns:
time (pandas.Series) – Filtered time series
filtered_var (pandas.Series) – Filtered variable corresponding to the time series
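A minimal sketch of the year-window filter, using a list of dictionaries in place of a DataFrame. Treating both bounds as inclusive is an assumption, as is the helper name `filter_window`.

```python
def filter_window(rows, start_year, end_year, cols=("year", "rate")):
    """Return (times, values) for rows whose time lies in [start_year, end_year]."""
    time_col, value_col = cols
    kept = [row for row in rows if start_year <= row[time_col] <= end_year]
    return [row[time_col] for row in kept], [row[value_col] for row in kept]


rows = [
    {"year": 1998, "rate": 0.5},
    {"year": 2000, "rate": 1.0},
    {"year": 2005, "rate": 2.5},
    {"year": 2011, "rate": 4.0},
]
time, filtered_var = filter_window(rows, 2000, 2010)
```

The `cols` argument names the time column first and the value column second, mirroring the default `['year', 'rate']` in the signature above.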
- b3alien.b3cube.find_correlations(cube: OccurrenceCube, top_n=10)[source]
Calculates species co-occurrence using efficient matrix operations.
This method identifies pairs of species that occur in the same ‘cell’ at the same ‘time’.
- Parameters:
cube (OccurrenceCube)
top_n (int)
- Returns:
A list of tuples, where each tuple contains the species names and their co-occurrence count, sorted in descending order.
- Return type:
list
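To make the notion of co-occurrence concrete, the sketch below counts how often two species share the same (time, cell) slot. It uses straightforward pair counting with `itertools.combinations` rather than the matrix operations mentioned above, and the dictionary cube and function name are assumptions for the example.

```python
from collections import Counter
from itertools import combinations


def top_cooccurrences(cube, top_n=10):
    """Return [(species_a, species_b, n)] for the most frequent species pairs."""
    species_by_slot = {}
    for (time, cell, species), count in cube.items():
        if count > 0:
            species_by_slot.setdefault((time, cell), set()).add(species)
    pair_counts = Counter()
    for species in species_by_slot.values():
        # Sorting gives each unordered pair a single canonical key.
        for a, b in combinations(sorted(species), 2):
            pair_counts[(a, b)] += 1
    return [(a, b, n) for (a, b), n in pair_counts.most_common(top_n)]


example_cube = {
    ("2020-01", "A1", "sp_a"): 1,
    ("2020-01", "A1", "sp_b"): 2,
    ("2020-01", "A1", "sp_c"): 1,
    ("2020-02", "A1", "sp_a"): 1,
    ("2020-02", "A1", "sp_b"): 1,
    ("2020-02", "B2", "sp_c"): 1,
}
top_pair = top_cooccurrences(example_cube, top_n=1)
all_pairs = top_cooccurrences(example_cube)
```

A matrix formulation would build a binary slot-by-species matrix and multiply its transpose by itself; the off-diagonal entries of that product hold the same pair counts.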
- b3alien.b3cube.get_survey_effort(cube, dateFormat='%Y-%m', calc_type='total')[source]
Estimate the survey effort in an OccurrenceCube.
- Parameters:
cube (b3alien.b3cube.OccurrenceCube) – Species OccurrenceCube from GBIF.
dateFormat (str, optional) – Dateformat stored in the OccurrenceCube. Default is ‘%Y-%m’
calc_type (str, optional) – Type of survey effort to be calculated: ‘distinct’ for the total number of distinct observers, ‘total’ for the total number of occurrences. Default is ‘total’.
- Returns:
df – Dataframe containing time and the chosen measurement for survey effort.
- Return type:
pandas.DataFrame
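The difference between the two `calc_type` options can be shown with a small plain-Python sketch. The record fields (`time`, `observer`, `count`) and the helper name are assumptions for illustration; the real function operates on an OccurrenceCube and returns a DataFrame.

```python
def survey_effort(records, calc_type="total"):
    """Return {time: effort} as total occurrences or distinct observers."""
    if calc_type == "total":
        effort = {}
        for record in records:
            effort[record["time"]] = effort.get(record["time"], 0) + record["count"]
        return effort
    if calc_type == "distinct":
        observers = {}
        for record in records:
            observers.setdefault(record["time"], set()).add(record["observer"])
        return {time: len(names) for time, names in observers.items()}
    raise ValueError("calc_type must be 'total' or 'distinct'")


# Hypothetical records; the field names are assumed.
records = [
    {"time": "2020-01", "observer": "alice", "count": 3},
    {"time": "2020-01", "observer": "bob", "count": 1},
    {"time": "2020-01", "observer": "alice", "count": 2},
    {"time": "2020-02", "observer": "bob", "count": 4},
]
total_effort = survey_effort(records, calc_type="total")
distinct_effort = survey_effort(records, calc_type="distinct")
```

In January 2020 the two observers contribute six occurrences in total, so the two measures can diverge considerably for the same period.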
- b3alien.b3cube.plot_cumsum(df_cumulative)[source]
Create a plot of the cumulative number of species.
- Parameters:
df_cumulative (pandas.DataFrame) – DataFrame containing the cumulative number over time.
- Returns:
A plot of the cumulative number of species.
- Return type:
matplotlib.plot
- b3alien.b3cube.plot_richness(cube, normalized=False, html_path='richness_map.html')[source]
Plot species richness from an OccurrenceCube instance.
- Parameters:
cube (OccurrenceCube) – An instance of the cube with .data and .richness available.
normalized (bool) – Whether to use normalized richness.
html_path (str) – Path to save the map if not in Jupyter.
- Return type:
None