pyinterp.Histogram2D

class pyinterp.Histogram2D(x: pyinterp.core.Axis, y: pyinterp.core.Axis, bin_counts: Optional[int] = None, dtype: Optional[numpy.dtype] = dtype('float64'))

Bases: object

Group a number of more or less continuous values into a smaller number of “bins” located on a grid.

This class builds a histogram for each pixel of the defined grid. The statistics for a pixel are then computed from its histogram.

The histogram uses the algorithm described in the paper A Streaming Parallel Decision Tree Algorithm. Consequently, if the number of observations falling in a pixel exceeds the maximum number of bins, the computed statistics are approximations of the exact values. This algorithm is useful if you want to know the statistical distribution per pixel or the value of a quantile, such as the median. Otherwise, use the pyinterp.Binning2D class.

Note

Yael Ben-Haim and Elad Tom-Tov, A Streaming Parallel Decision Tree Algorithm, Journal of Machine Learning Research, 11(28):849-872, 2010. http://jmlr.org/papers/v11/ben-haim10a.html
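To illustrate why the statistics become approximate once a pixel holds more observations than bins, here is a minimal pure-Python sketch of the streaming-histogram update step from the cited paper. It is not pyinterp's implementation: the class name StreamingHistogram and its methods are hypothetical and exist only for this illustration.

```python
from bisect import bisect_left


class StreamingHistogram:
    """Hypothetical fixed-size histogram of [centroid, count] pairs,
    kept sorted by centroid (illustration only, not pyinterp code)."""

    def __init__(self, max_bins=100):
        self.max_bins = max_bins
        self.bins = []  # sorted list of [centroid, count]

    def push(self, value):
        # Add the value as a new bin, or increment an identical centroid.
        centroids = [b[0] for b in self.bins]
        i = bisect_left(centroids, value)
        if i < len(self.bins) and self.bins[i][0] == value:
            self.bins[i][1] += 1
        else:
            self.bins.insert(i, [value, 1])
        # Too many bins: merge the two closest neighbours.
        if len(self.bins) > self.max_bins:
            self._merge_closest()

    def _merge_closest(self):
        # Locate the adjacent pair with the smallest gap and replace it
        # with one bin at the count-weighted mean of the two centroids.
        gaps = [self.bins[k + 1][0] - self.bins[k][0]
                for k in range(len(self.bins) - 1)]
        k = gaps.index(min(gaps))
        (p1, m1), (p2, m2) = self.bins[k], self.bins[k + 1]
        self.bins[k:k + 2] = [[(p1 * m1 + p2 * m2) / (m1 + m2), m1 + m2]]

    def count(self):
        return sum(m for _, m in self.bins)

    def mean(self):
        return sum(p * m for p, m in self.bins) / self.count()


# Four samples pushed into a histogram limited to three bins.
h = StreamingHistogram(max_bins=3)
for v in [1.0, 2.0, 3.0, 10.0]:
    h.push(v)
```

In this sketch the fourth sample forces a merge of the two closest centroids (1.0 and 2.0), so the individual values are no longer recoverable. The count and the mean survive the merge, whereas quantities that depend on the shape of the distribution, such as quantiles, can only be estimated from the remaining centroids.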

__init__(x: pyinterp.core.Axis, y: pyinterp.core.Axis, bin_counts: Optional[int] = None, dtype: Optional[numpy.dtype] = dtype('float64'))

Initializes the grid used to calculate the statistics.

Parameters
  • x (pyinterp.Axis) – Definition of the bin centers for the X axis of the grid.

  • y (pyinterp.Axis) – Definition of the bin centers for the Y axis of the grid.

  • bin_counts (int, optional) – The maximum number of bins in the histogram built for each pixel. If not set, 100 bins are used.

  • dtype (numpy.dtype, optional) – Data type of the instance to create.

Note

The axes define the centers of the different cells where the statistics will be calculated, as shown in the figure below.

../_images/coordinates.svg

In this example, to calculate the statistics in the cells shown, the coordinates of the axes must be shifted by half a grid step, here 0.5.
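The half-step shift can be sketched with NumPy as follows. The cell edges used here are hypothetical values chosen to match a grid step of 1; it is assumed that the resulting array of centers would then be passed to pyinterp.Axis to build the x or y axis.

```python
import numpy as np

# Hypothetical cell edges: four cells of width 1 covering [0, 4].
edges = np.arange(0.0, 5.0, 1.0)
step = edges[1] - edges[0]

# The axis must hold the cell centers, i.e. the lower edges
# shifted by half a grid step (0.5 here).
centers = edges[:-1] + step / 2
```

With these values, the first cell spans 0 to 1 and its center is 0.5.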

Methods

Histogram2D.clear()

Clears the data inside each bin.

Histogram2D.push(x, y, z)

Push new samples into the defined bins.

Histogram2D.push_delayed(x, y, z)

Push new samples into the defined bins from a Dask array.

Histogram2D.variable([statistics])

Gets the regular grid containing the calculated statistics.

Histogram2D.__add__(other)

Histogram2D.__repr__()

Called by the repr() built-in function to compute the string representation of this instance.

Attributes

Histogram2D.x

Gets the bin centers for the X axis of the grid.

Histogram2D.y

Gets the bin centers for the Y axis of the grid.