.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "auto_examples/ex_binning.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end <sphx_glr_download_auto_examples_ex_binning.py>`
        to download the full example code or to run this example in your
        browser via Binder

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_examples_ex_binning.py:


*******
Binning
*******

=========
Binning2D
=========

Statistical data binning is a way to group several more or less continuous
values into a smaller number of *bins*. For example, if you have irregularly
distributed data over the oceans, you can organize these observations into a
smaller number of geographical intervals (for example, by grouping them into
cells of five degrees in latitude and longitude).

In this example, we will calculate drifter velocity statistics in the Black Sea
over a period of 9 years.

.. GENERATED FROM PYTHON SOURCE LINES 19-28

.. code-block:: Python

    import cartopy.crs
    import matplotlib
    import matplotlib.pyplot
    import numpy

    import pyinterp
    import pyinterp.backends.xarray
    import pyinterp.tests

.. GENERATED FROM PYTHON SOURCE LINES 29-31

The first step is to load the data into memory:

.. GENERATED FROM PYTHON SOURCE LINES 31-33

.. code-block:: Python

    ds = pyinterp.tests.load_aoml()

.. GENERATED FROM PYTHON SOURCE LINES 34-35

Let's start by calculating the norm of the velocity vectors (u, v).

.. GENERATED FROM PYTHON SOURCE LINES 35-37

.. code-block:: Python

    norm = (ds.ud**2 + ds.vd**2)**0.5

.. GENERATED FROM PYTHON SOURCE LINES 38-40

Now, we will describe the grid used to calculate our
:py:class:`binned ` statistics.

.. GENERATED FROM PYTHON SOURCE LINES 40-45

.. code-block:: Python

    binning = pyinterp.Binning2D(
        pyinterp.Axis(numpy.arange(27, 42, 0.3), is_circle=True),
        pyinterp.Axis(numpy.arange(40, 47, 0.3)))
    binning

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    Axis:
      x:
        min_value: 27
        max_value: 41.7
        step     : 0.3
        is_circle: false
      y:
        min_value: 40
        max_value: 46.9
        step     : 0.3
        is_circle: false

.. GENERATED FROM PYTHON SOURCE LINES 46-48

We push the loaded data into the different defined bins using
:ref:`simple binning `.

.. GENERATED FROM PYTHON SOURCE LINES 48-51

.. code-block:: Python

    binning.clear()
    binning.push(ds.lon, ds.lat, norm, True)

.. GENERATED FROM PYTHON SOURCE LINES 52-67

.. note::

    If the processed data is larger than the available RAM, it is possible to
    use Dask to parallelize the calculation. To do this, an instance must be
    built, then the data must be added using the
    :py:meth:`push_delayed ` method. This method returns a graph which, when
    executed, returns a new instance containing the calculated statistics.

    .. code:: python

        binning = binning.push_delayed(lon, lat, data).compute()

It is possible to retrieve other statistical :py:meth:`variables ` such as
variance, minimum, maximum, etc.

.. GENERATED FROM PYTHON SOURCE LINES 67-69

.. code-block:: Python

    nearest = binning.variable('mean')

.. GENERATED FROM PYTHON SOURCE LINES 70-72

Then, we push the loaded data into the different defined bins using
:ref:`linear binning `.

.. GENERATED FROM PYTHON SOURCE LINES 72-76

.. code-block:: Python

    binning.clear()
    binning.push(ds.lon, ds.lat, norm, False)
    linear = binning.variable('mean')

.. GENERATED FROM PYTHON SOURCE LINES 77-78

We visualize the result.

.. GENERATED FROM PYTHON SOURCE LINES 78-107

.. code-block:: Python

    fig = matplotlib.pyplot.figure(figsize=(10, 8))
    ax1 = fig.add_subplot(211, projection=cartopy.crs.PlateCarree())
    lon, lat = numpy.meshgrid(binning.x, binning.y, indexing='ij')
    pcm = ax1.pcolormesh(lon,
                         lat,
                         nearest,
                         cmap='jet',
                         shading='auto',
                         vmin=0,
                         vmax=1,
                         transform=cartopy.crs.PlateCarree())
    ax1.coastlines()
    ax1.set_title('Simple binning.')

    ax2 = fig.add_subplot(212, projection=cartopy.crs.PlateCarree())
    lon, lat = numpy.meshgrid(binning.x, binning.y, indexing='ij')
    pcm = ax2.pcolormesh(lon,
                         lat,
                         linear,
                         cmap='jet',
                         shading='auto',
                         vmin=0,
                         vmax=1,
                         transform=cartopy.crs.PlateCarree())
    ax2.coastlines()
    ax2.set_title('Linear binning.')
    fig.colorbar(pcm, ax=[ax1, ax2], shrink=0.8)
    fig.show()

.. image-sg:: /auto_examples/images/sphx_glr_ex_binning_001.png
   :alt: Simple binning., Linear binning.
   :srcset: /auto_examples/images/sphx_glr_ex_binning_001.png
   :class: sphx-glr-single-img

.. GENERATED FROM PYTHON SOURCE LINES 108-122

===========
Histogram2D
===========

:py:class:`This class`, like the previous one, allows calculating a binning
using histograms. In addition, this approach calculates the quantiles of the
distribution and obtains the median value of the pixels.

Note that the algorithm caps the number of bins handled by each histogram. If
the number of observations is greater than the capacity of the histogram, the
histogram will be compressed to best represent the distribution within the
limited memory size. The description of the exact algorithm is in the article
`A Streaming Parallel Decision Tree Algorithm `_.

.. GENERATED FROM PYTHON SOURCE LINES 122-127

.. code-block:: Python

    hist2d = pyinterp.Histogram2D(
        pyinterp.Axis(numpy.arange(27, 42, 0.3), is_circle=True),
        pyinterp.Axis(numpy.arange(40, 47, 0.3)))
    hist2d

.. rst-class:: sphx-glr-script-out

.. code-block:: none

    Axis:
      x:
        min_value: 27
        max_value: 41.7
        step     : 0.3
        is_circle: false
      y:
        min_value: 40
        max_value: 46.9
        step     : 0.3
        is_circle: false

.. GENERATED FROM PYTHON SOURCE LINES 128-130

We push the loaded data into the different defined bins using the method
:py:meth:`push `.

.. GENERATED FROM PYTHON SOURCE LINES 130-132

.. code-block:: Python

    hist2d.push(ds.lon, ds.lat, norm)

.. GENERATED FROM PYTHON SOURCE LINES 133-134

We visualize the mean versus the median of the distribution.

.. GENERATED FROM PYTHON SOURCE LINES 134-162

.. code-block:: Python

    fig = matplotlib.pyplot.figure(figsize=(10, 8))
    ax1 = fig.add_subplot(211, projection=cartopy.crs.PlateCarree())
    lon, lat = numpy.meshgrid(binning.x, binning.y, indexing='ij')
    pcm = ax1.pcolormesh(lon,
                         lat,
                         nearest,
                         cmap='jet',
                         shading='auto',
                         vmin=0,
                         vmax=1,
                         transform=cartopy.crs.PlateCarree())
    ax1.coastlines()
    ax1.set_title('Mean')

    ax2 = fig.add_subplot(212, projection=cartopy.crs.PlateCarree())
    lon, lat = numpy.meshgrid(binning.x, binning.y, indexing='ij')
    pcm = ax2.pcolormesh(lon,
                         lat,
                         hist2d.variable('quantile', 0.5),
                         cmap='jet',
                         shading='auto',
                         vmin=0,
                         vmax=1,
                         transform=cartopy.crs.PlateCarree())
    ax2.coastlines()
    ax2.set_title('Median')
    fig.colorbar(pcm, ax=[ax1, ax2], shrink=0.8)
    fig.show()

.. image-sg:: /auto_examples/images/sphx_glr_ex_binning_002.png
   :alt: Mean, Median
   :srcset: /auto_examples/images/sphx_glr_ex_binning_002.png
   :class: sphx-glr-single-img

.. rst-class:: sphx-glr-timing

   **Total running time of the script:** (0 minutes 1.715 seconds)

.. _sphx_glr_download_auto_examples_ex_binning.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: binder-badge

      .. image:: images/binder_badge_logo.svg
        :target: https://mybinder.org/v2/gh/CNES/pangeo-pyinterp/master?urlpath=lab/tree/notebooks/auto_examples/ex_binning.ipynb
        :alt: Launch binder
        :width: 150 px

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: ex_binning.ipynb `

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: ex_binning.py `

.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery `_
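
The difference between the simple and linear binning used in this example can
be illustrated with a minimal, pure-Python sketch in one dimension. This is not
pyinterp's implementation: the helper ``bin_means`` and its parameters are
hypothetical names introduced here for illustration, and the sketch assumes
evenly spaced, increasing bin centers. With simple binning, each observation
gives its full weight to the nearest bin center; with linear binning, the
weight is split between the two neighboring centers in proportion to proximity.

.. code-block:: Python

    import math


    def bin_means(xs, values, centers, linear=False):
        """Mean of ``values`` per bin (hypothetical helper, 1D only).

        Assumes ``centers`` is evenly spaced and increasing.
        """
        step = centers[1] - centers[0]
        n = len(centers)
        weighted_sum = [0.0] * n
        weight_total = [0.0] * n
        for x, value in zip(xs, values):
            # Fractional index of the observation on the grid of centers.
            pos = (x - centers[0]) / step
            if linear:
                # Split the weight between the two surrounding centers.
                left = int(math.floor(pos))
                frac = pos - left
                shares = ((left, 1.0 - frac), (left + 1, frac))
            else:
                # Give the whole weight to the nearest center.
                shares = ((int(math.floor(pos + 0.5)), 1.0), )
            for index, weight in shares:
                # Ignore contributions that fall outside the grid.
                if 0 <= index < n and weight > 0.0:
                    weighted_sum[index] += weight * value
                    weight_total[index] += weight
        # Bins that received no weight are reported as NaN.
        return [
            s / w if w > 0.0 else math.nan
            for s, w in zip(weighted_sum, weight_total)
        ]


    centers = [0.0, 1.0, 2.0]
    simple = bin_means([0.25, 1.0], [4.0, 8.0], centers, linear=False)
    linear = bin_means([0.25, 1.0], [4.0, 8.0], centers, linear=True)

In this small case, the observation at 0.25 affects only the first bin under
simple binning, so the second bin has a mean of 8. Under linear binning, a
quarter of its weight goes to the second bin, which lowers that mean to 7.2.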