.. _combining data: Combining data -------------- .. jupyter-execute:: :hide-code: :hide-output: import numpy as np import pandas as pd import xarray as xr np.random.seed(123456) %xmode minimal * For combining datasets or data arrays along a single dimension, see concatenate_. * For combining datasets with different variables, see merge_. * For combining datasets or data arrays with different indexes or missing values, see combine_. * For combining datasets or data arrays along multiple dimensions see combining.multi_. .. _concatenate: Concatenate ~~~~~~~~~~~ To combine :py:class:`~xarray.Dataset` / :py:class:`~xarray.DataArray` objects along an existing or new dimension into a larger object, you can use :py:func:`~xarray.concat`. ``concat`` takes an iterable of ``DataArray`` or ``Dataset`` objects, as well as a dimension name, and concatenates along that dimension: .. jupyter-execute:: da = xr.DataArray( np.arange(6).reshape(2, 3), [("x", ["a", "b"]), ("y", [10, 20, 30])] ) da.isel(y=slice(0, 1)) # same as da[:, :1] .. jupyter-execute:: # This resembles how you would use np.concatenate: xr.concat([da[:, :1], da[:, 1:]], dim="y") .. jupyter-execute:: # For more friendly pandas-like indexing you can use: xr.concat([da.isel(y=slice(0, 1)), da.isel(y=slice(1, None))], dim="y") In addition to combining along an existing dimension, ``concat`` can create a new dimension by stacking lower dimensional arrays together: .. jupyter-execute:: da.sel(x="a") .. jupyter-execute:: xr.concat([da.isel(x=0), da.isel(x=1)], "x") If the second argument to ``concat`` is a new dimension name, the arrays will be concatenated along that new dimension, which is always inserted as the first dimension: .. jupyter-execute:: xr.concat([da.isel(x=0), da.isel(x=1)], "new_dim") The second argument to ``concat`` can also be an :py:class:`~pandas.Index` or :py:class:`~xarray.DataArray` object as well as a string, in which case it is used to label the values along the new dimension: .. jupyter-execute:: xr.concat([da.isel(x=0), da.isel(x=1)], pd.Index([-90, -100], name="new_dim")) Of course, ``concat`` also works on ``Dataset`` objects: .. jupyter-execute:: ds = da.to_dataset(name="foo") xr.concat([ds.sel(x="a"), ds.sel(x="b")], "x") :py:func:`~xarray.concat` has a number of options which provide deeper control over which variables are concatenated and how it handles conflicting variables between datasets. With the default parameters, xarray will load some coordinate variables into memory to compare them between datasets. This may be prohibitively expensive if you are manipulating your dataset lazily using :ref:`dask`. .. _merge: Merge ~~~~~ To combine variables and coordinates between multiple ``DataArray`` and/or ``Dataset`` objects, use :py:func:`~xarray.merge`. It can merge a list of ``Dataset``, ``DataArray`` or dictionaries of objects convertible to ``DataArray`` objects: .. jupyter-execute:: xr.merge([ds, ds.rename({"foo": "bar"})]) .. jupyter-execute:: xr.merge([xr.DataArray(n, name="var%d" % n) for n in range(5)]) If you merge another dataset (or a dictionary including data array objects), by default the resulting dataset will be aligned on the **union** of all index coordinates: .. jupyter-execute:: other = xr.Dataset({"bar": ("x", [1, 2, 3, 4]), "x": list("abcd")}) xr.merge([ds, other]) This ensures that ``merge`` is non-destructive. ``xarray.MergeError`` is raised if you attempt to merge two variables with the same name but different values: .. 
.. jupyter-execute::
    :raises:

    xr.merge([ds, ds + 1])

The same non-destructive merging between ``DataArray`` index coordinates is used in the :py:class:`~xarray.Dataset` constructor:

.. jupyter-execute::

    xr.Dataset({"a": da.isel(x=slice(0, 1)), "b": da.isel(x=slice(1, 2))})

.. _combine:

Combine
~~~~~~~

The instance method :py:meth:`~xarray.DataArray.combine_first` combines two datasets/data arrays and defaults to non-null values in the calling object, using values from the called object to fill holes. The resulting coordinates are the union of coordinate labels. Vacant cells as a result of the outer-join are filled with ``NaN``.

For example:

.. jupyter-execute::

    ar0 = xr.DataArray([[0, 0], [0, 0]], [("x", ["a", "b"]), ("y", [-1, 0])])
    ar1 = xr.DataArray([[1, 1], [1, 1]], [("x", ["b", "c"]), ("y", [0, 1])])
    ar0.combine_first(ar1)

.. jupyter-execute::

    ar1.combine_first(ar0)

For datasets, ``ds0.combine_first(ds1)`` works similarly to ``xr.merge([ds0, ds1])``, except that ``xr.merge`` raises ``MergeError`` when there are conflicting values in variables to be merged, whereas ``.combine_first`` defaults to the calling object's values.

.. _update:

Update
~~~~~~

In contrast to ``merge``, :py:meth:`~xarray.Dataset.update` modifies a dataset in-place without checking for conflicts, and will overwrite any existing variables with new values:

.. jupyter-execute::

    ds.update({"space": ("space", [10.2, 9.4, 3.9])})

However, dimensions are still required to be consistent between different Dataset variables, so you cannot change the size of a dimension unless you replace all dataset variables that use it.

``update`` also performs automatic alignment if necessary. Unlike ``merge``, it maintains the alignment of the original array instead of merging indexes:

.. jupyter-execute::

    ds.update(other)

The exact same alignment logic is used when setting a variable with ``__setitem__`` syntax:

.. jupyter-execute::

    ds["baz"] = xr.DataArray([9, 9, 9, 9, 9], coords=[("x", list("abcde"))])
    ds.baz

Equals and identical
~~~~~~~~~~~~~~~~~~~~

Xarray objects can be compared by using the :py:meth:`~xarray.Dataset.equals`, :py:meth:`~xarray.Dataset.identical` and :py:meth:`~xarray.Dataset.broadcast_equals` methods. These methods are used by the optional ``compat`` argument on ``concat`` and ``merge``.

:py:attr:`~xarray.Dataset.equals` checks dimension names, indexes and array values:

.. jupyter-execute::

    da.equals(da.copy())

:py:attr:`~xarray.Dataset.identical` also checks attributes, and the name of each object:

.. jupyter-execute::

    da.identical(da.rename("bar"))

:py:attr:`~xarray.Dataset.broadcast_equals` does a more relaxed form of equality check that allows variables to have different dimensions, as long as values are constant along those new dimensions:

.. jupyter-execute::

    left = xr.Dataset(coords={"x": 0})
    right = xr.Dataset({"x": [0, 0, 0]})
    left.broadcast_equals(right)

Like pandas objects, two xarray objects are still equal or identical if they have missing values marked by ``NaN`` in the same locations.

In contrast, the ``==`` operation performs element-wise comparison (like numpy):

.. jupyter-execute::

    da == da.copy()

Note that ``NaN`` does not compare equal to ``NaN`` in element-wise comparison; you may need to deal with missing values explicitly.

.. _combining.no_conflicts:

Merging with 'no_conflicts'
~~~~~~~~~~~~~~~~~~~~~~~~~~~

The ``compat`` argument ``'no_conflicts'`` is only available when combining xarray objects with ``merge``.
In addition to the above comparison methods it allows the merging of xarray objects with locations where *either* have ``NaN`` values. This can be used to combine data with overlapping coordinates as long as any non-missing values agree or are disjoint: .. jupyter-execute:: ds1 = xr.Dataset({"a": ("x", [10, 20, 30, np.nan])}, {"x": [1, 2, 3, 4]}) ds2 = xr.Dataset({"a": ("x", [np.nan, 30, 40, 50])}, {"x": [2, 3, 4, 5]}) xr.merge([ds1, ds2], compat="no_conflicts") Note that due to the underlying representation of missing values as floating point numbers (``NaN``), variable data type is not always preserved when merging in this manner. .. _combining.multi: Combining along multiple dimensions ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ For combining many objects along multiple dimensions xarray provides :py:func:`~xarray.combine_nested` and :py:func:`~xarray.combine_by_coords`. These functions use a combination of ``concat`` and ``merge`` across different variables to combine many objects into one. :py:func:`~xarray.combine_nested` requires specifying the order in which the objects should be combined, while :py:func:`~xarray.combine_by_coords` attempts to infer this ordering automatically from the coordinates in the data. :py:func:`~xarray.combine_nested` is useful when you know the spatial relationship between each object in advance. The datasets must be provided in the form of a nested list, which specifies their relative position and ordering. A common task is collecting data from a parallelized simulation where each processor wrote out data to a separate file. A domain which was decomposed into 4 parts, 2 each along both the x and y axes, requires organising the datasets into a doubly-nested list, e.g: .. jupyter-execute:: arr = xr.DataArray( name="temperature", data=np.random.randint(5, size=(2, 2)), dims=["x", "y"] ) arr .. jupyter-execute:: ds_grid = [[arr, arr], [arr, arr]] xr.combine_nested(ds_grid, concat_dim=["x", "y"]) :py:func:`~xarray.combine_nested` can also be used to explicitly merge datasets with different variables. For example if we have 4 datasets, which are divided along two times, and contain two different variables, we can pass ``None`` to ``'concat_dim'`` to specify the dimension of the nested list over which we wish to use ``merge`` instead of ``concat``: .. jupyter-execute:: temp = xr.DataArray(name="temperature", data=np.random.randn(2), dims=["t"]) precip = xr.DataArray(name="precipitation", data=np.random.randn(2), dims=["t"]) ds_grid = [[temp, precip], [temp, precip]] xr.combine_nested(ds_grid, concat_dim=["t", None]) :py:func:`~xarray.combine_by_coords` is for combining objects which have dimension coordinates which specify their relationship to and order relative to one another, for example a linearly-increasing 'time' dimension coordinate. Here we combine two datasets using their common dimension coordinates. Notice they are concatenated in order based on the values in their dimension coordinates, not on their position in the list passed to ``combine_by_coords``. .. jupyter-execute:: x1 = xr.DataArray(name="foo", data=np.random.randn(3), coords=[("x", [0, 1, 2])]) x2 = xr.DataArray(name="foo", data=np.random.randn(3), coords=[("x", [3, 4, 5])]) xr.combine_by_coords([x2, x1]) These functions can be used by :py:func:`~xarray.open_mfdataset` to open many files as one dataset. The particular function used is specified by setting the argument ``'combine'`` to ``'by_coords'`` or ``'nested'``. 
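For example, a minimal sketch of both modes (the file names and layout below are hypothetical):

.. code-block:: python

    import xarray as xr

    # Let the dimension coordinates in each file determine the ordering:
    ds = xr.open_mfdataset("data/temperature_*.nc", combine="by_coords")

    # Or state the ordering explicitly with a nested list of files,
    # mirroring the combine_nested examples above:
    ds = xr.open_mfdataset(
        [["x0y0.nc", "x0y1.nc"], ["x1y0.nc", "x1y1.nc"]],
        combine="nested",
        concat_dim=["x", "y"],
    )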
This is useful for situations where your data is split across many files in multiple locations, which have some known relationship between one another.

.. currentmodule:: xarray

.. _complex:

Complex Numbers
===============

.. jupyter-execute::
    :hide-code:

    import numpy as np
    import xarray as xr

Xarray leverages NumPy to seamlessly handle complex numbers in :py:class:`~xarray.DataArray` and :py:class:`~xarray.Dataset` objects.

In the examples below, we are using a DataArray named ``da`` with complex elements (of :math:`\mathbb{C}`):

.. jupyter-execute::

    data = np.array([[1 + 2j, 3 + 4j], [5 + 6j, 7 + 8j]])
    da = xr.DataArray(
        data,
        dims=["x", "y"],
        coords={"x": ["a", "b"], "y": [1, 2]},
        name="complex_nums",
    )

Operations on Complex Data
--------------------------

You can access real and imaginary components using the ``.real`` and ``.imag`` attributes. Most NumPy universal functions (ufuncs) like :py:func:`numpy.abs <numpy.absolute>` or :py:func:`numpy.angle` work directly.

.. jupyter-execute::

    da.real

.. jupyter-execute::

    np.abs(da)

.. note::
    Like NumPy, ``.real`` and ``.imag`` typically return *views*, not copies, of the original data.

Reading and Writing Complex Data
--------------------------------

Writing complex data to NetCDF files (see :ref:`io.netcdf`) is supported via :py:meth:`~xarray.DataArray.to_netcdf` using specific backend engines that handle complex types:

.. tab:: h5netcdf

    This requires the `h5netcdf <https://github.com/h5netcdf/h5netcdf>`_ library to be installed.

    .. jupyter-execute::

        # write the data to disk
        da.to_netcdf("complex_nums_h5.nc", engine="h5netcdf")

        # read the file back into memory
        ds_h5 = xr.open_dataset("complex_nums_h5.nc", engine="h5netcdf")

        # check the dtype
        ds_h5[da.name].dtype

.. tab:: netcdf4

    Requires the `netcdf4-python (>= 1.7.1) <https://github.com/Unidata/netcdf4-python>`_ library, and you have to enable ``auto_complex=True``.

    .. jupyter-execute::

        # write the data to disk
        da.to_netcdf("complex_nums_nc4.nc", engine="netcdf4", auto_complex=True)

        # read the file back into memory
        ds_nc4 = xr.open_dataset(
            "complex_nums_nc4.nc", engine="netcdf4", auto_complex=True
        )

        # check the dtype
        ds_nc4[da.name].dtype

.. warning::
    The ``scipy`` engine only supports NetCDF V3 and does *not* support complex arrays; writing with ``engine="scipy"`` raises a ``TypeError``.

Alternative: Manual Handling
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

If direct writing is not supported (e.g., targeting NetCDF3), you can manually split the complex array into separate real and imaginary variables before saving:

.. jupyter-execute::

    # write the real and imaginary parts as separate variables
    ds_manual = xr.Dataset(
        {
            f"{da.name}_real": da.real,
            f"{da.name}_imag": da.imag,
        }
    )
    ds_manual.to_netcdf("complex_manual.nc", engine="scipy")

    # read them back and reconstruct the complex array
    ds = xr.open_dataset("complex_manual.nc", engine="scipy")
    reconstructed = ds[f"{da.name}_real"] + 1j * ds[f"{da.name}_imag"]

Recommendations
^^^^^^^^^^^^^^^

- Use ``engine="netcdf4"`` with ``auto_complex=True`` for full compliance and ease.
- Use ``h5netcdf`` for HDF5-based storage when interoperability with HDF5 is desired.
- For maximum legacy support (NetCDF3), manually handle real/imaginary components.

.. jupyter-execute::
    :hide-code:

    # Cleanup
    import os

    for f in ["complex_nums_nc4.nc", "complex_nums_h5.nc", "complex_manual.nc"]:
        if os.path.exists(f):
            os.remove(f)

See also
--------

- :ref:`io.netcdf` — full NetCDF I/O guide
- `NumPy complex numbers <https://numpy.org/doc/stable/reference/arrays.scalars.html>`__

.. currentmodule:: xarray

..
_compute: ########### Computation ########### The labels associated with :py:class:`~xarray.DataArray` and :py:class:`~xarray.Dataset` objects enables some powerful shortcuts for computation, notably including aggregation and broadcasting by dimension names. Basic array math ================ Arithmetic operations with a single DataArray automatically vectorize (like numpy) over all array values: .. jupyter-execute:: :hide-code: :hide-output: import numpy as np import pandas as pd import xarray as xr np.random.seed(123456) %xmode minimal .. jupyter-execute:: arr = xr.DataArray( np.random.default_rng(0).random((2, 3)), [("x", ["a", "b"]), ("y", [10, 20, 30])], ) arr - 3 .. jupyter-execute:: abs(arr) You can also use any of numpy's or scipy's many `ufunc`__ functions directly on a DataArray: __ https://numpy.org/doc/stable/reference/ufuncs.html .. jupyter-execute:: np.sin(arr) Use :py:func:`~xarray.where` to conditionally switch between values: .. jupyter-execute:: xr.where(arr > 0, "positive", "negative") Use ``@`` to compute the :py:func:`~xarray.dot` product: .. jupyter-execute:: arr @ arr Data arrays also implement many :py:class:`numpy.ndarray` methods: .. jupyter-execute:: arr.round(2) .. jupyter-execute:: arr.T .. jupyter-execute:: intarr = xr.DataArray([0, 1, 2, 3, 4, 5]) intarr << 2 # only supported for int types .. jupyter-execute:: intarr >> 1 .. _missing_values: Missing values ============== Xarray represents missing values using the "NaN" (Not a Number) value from NumPy, which is a special floating-point value that indicates a value that is undefined or unrepresentable. There are several methods for handling missing values in xarray: Xarray objects borrow the :py:meth:`~xarray.DataArray.isnull`, :py:meth:`~xarray.DataArray.notnull`, :py:meth:`~xarray.DataArray.count`, :py:meth:`~xarray.DataArray.dropna`, :py:meth:`~xarray.DataArray.fillna`, :py:meth:`~xarray.DataArray.ffill`, and :py:meth:`~xarray.DataArray.bfill` methods for working with missing data from pandas: :py:meth:`~xarray.DataArray.isnull` is a method in xarray that can be used to check for missing or null values in an xarray object. It returns a new xarray object with the same dimensions as the original object, but with boolean values indicating where **missing values** are present. .. jupyter-execute:: x = xr.DataArray([0, 1, np.nan, np.nan, 2], dims=["x"]) x.isnull() In this example, the third and fourth elements of 'x' are NaN, so the resulting :py:class:`~xarray.DataArray` object has 'True' values in the third and fourth positions and 'False' values in the other positions. :py:meth:`~xarray.DataArray.notnull` is a method in xarray that can be used to check for non-missing or non-null values in an xarray object. It returns a new xarray object with the same dimensions as the original object, but with boolean values indicating where **non-missing values** are present. .. jupyter-execute:: x = xr.DataArray([0, 1, np.nan, np.nan, 2], dims=["x"]) x.notnull() In this example, the first two and the last elements of x are not NaN, so the resulting :py:class:`~xarray.DataArray` object has 'True' values in these positions, and 'False' values in the third and fourth positions where NaN is located. :py:meth:`~xarray.DataArray.count` is a method in xarray that can be used to count the number of non-missing values along one or more dimensions of an xarray object. 
It returns a new xarray object with the same dimensions as the original object, but with each element replaced by the count of non-missing values along the specified dimensions.

.. jupyter-execute::

    x = xr.DataArray([0, 1, np.nan, np.nan, 2], dims=["x"])
    x.count()

In this example, 'x' has five elements, but two of them are NaN, so the resulting :py:class:`~xarray.DataArray` object has a single element containing the value '3', which represents the number of non-null elements in x.

:py:meth:`~xarray.DataArray.dropna` is a method in xarray that can be used to remove missing or null values from an xarray object. It returns a new xarray object with the same dimensions as the original object, but with missing values removed.

.. jupyter-execute::

    x = xr.DataArray([0, 1, np.nan, np.nan, 2], dims=["x"])
    x.dropna(dim="x")

In this example, calling ``x.dropna(dim="x")`` removes any missing values and returns a new :py:class:`~xarray.DataArray` object with only the non-null elements [0, 1, 2] of 'x', in the original order.

:py:meth:`~xarray.DataArray.fillna` is a method in xarray that can be used to fill missing or null values in an xarray object with a specified value or method. It returns a new xarray object with the same dimensions as the original object, but with missing values filled.

.. jupyter-execute::

    x = xr.DataArray([0, 1, np.nan, np.nan, 2], dims=["x"])
    x.fillna(-1)

In this example, there are two NaN values in 'x', so calling ``x.fillna(-1)`` replaces these values with -1 and returns a new :py:class:`~xarray.DataArray` object with five elements, containing the values [0, 1, -1, -1, 2] in the original order.

:py:meth:`~xarray.DataArray.ffill` is a method in xarray that can be used to forward fill (or fill forward) missing values in an xarray object along one or more dimensions. It returns a new xarray object with the same dimensions as the original object, but with missing values replaced by the last non-missing value along the specified dimensions.

.. jupyter-execute::

    x = xr.DataArray([0, 1, np.nan, np.nan, 2], dims=["x"])
    x.ffill("x")

In this example, there are two NaN values in 'x', so calling ``x.ffill("x")`` fills both of these values with the last preceding non-null value along the dimension, which is 1. The resulting :py:class:`~xarray.DataArray` object has five elements, containing the values [0, 1, 1, 1, 2] in the original order.

:py:meth:`~xarray.DataArray.bfill` is a method in xarray that can be used to backward fill (or fill backward) missing values in an xarray object along one or more dimensions. It returns a new xarray object with the same dimensions as the original object, but with missing values replaced by the next non-missing value along the specified dimensions.

.. jupyter-execute::

    x = xr.DataArray([0, 1, np.nan, np.nan, 2], dims=["x"])
    x.bfill("x")

In this example, there are two NaN values in 'x', so calling ``x.bfill("x")`` fills both of these values with the next non-null value along the dimension, which is 2. The resulting :py:class:`~xarray.DataArray` object has five elements, containing the values [0, 1, 2, 2, 2] in the original order.

Like pandas, xarray uses the float value ``np.nan`` (not-a-number) to represent missing values.

Xarray objects also have an :py:meth:`~xarray.DataArray.interpolate_na` method for filling missing values via 1D interpolation. It returns a new xarray object with the same dimensions as the original object, but with missing values interpolated.

..
jupyter-execute:: x = xr.DataArray( [0, 1, np.nan, np.nan, 2], dims=["x"], coords={"xx": xr.Variable("x", [0, 1, 1.1, 1.9, 3])}, ) x.interpolate_na(dim="x", method="linear", use_coordinate="xx") In this example, there are two NaN values in 'x', so calling x.interpolate_na(dim="x", method="linear", use_coordinate="xx") fills these values with interpolated values along the "x" dimension using linear interpolation based on the values of the xx coordinate. The resulting :py:class:`~xarray.DataArray` object has five elements, containing the values [0., 1., 1.05, 1.45, 2.] in the original order. Note that the interpolated values are calculated based on the values of the 'xx' coordinate, which has non-integer values, resulting in non-integer interpolated values. Note that xarray slightly diverges from the pandas ``interpolate`` syntax by providing the ``use_coordinate`` keyword which facilitates a clear specification of which values to use as the index in the interpolation. Xarray also provides the ``max_gap`` keyword argument to limit the interpolation to data gaps of length ``max_gap`` or smaller. See :py:meth:`~xarray.DataArray.interpolate_na` for more. .. _agg: Aggregation =========== Aggregation methods have been updated to take a ``dim`` argument instead of ``axis``. This allows for very intuitive syntax for aggregation methods that are applied along particular dimension(s): .. jupyter-execute:: arr.sum(dim="x") .. jupyter-execute:: arr.std(["x", "y"]) .. jupyter-execute:: arr.min() If you need to figure out the axis number for a dimension yourself (say, for wrapping code designed to work with numpy arrays), you can use the :py:meth:`~xarray.DataArray.get_axis_num` method: .. jupyter-execute:: arr.get_axis_num("y") These operations automatically skip missing values, like in pandas: .. jupyter-execute:: xr.DataArray([1, 2, np.nan, 3]).mean() If desired, you can disable this behavior by invoking the aggregation method with ``skipna=False``. .. _compute.rolling: Rolling window operations ========================= ``DataArray`` objects include a :py:meth:`~xarray.DataArray.rolling` method. This method supports rolling window aggregation: .. jupyter-execute:: arr = xr.DataArray(np.arange(0, 7.5, 0.5).reshape(3, 5), dims=("x", "y")) arr :py:meth:`~xarray.DataArray.rolling` is applied along one dimension using the name of the dimension as a key (e.g. ``y``) and the window size as the value (e.g. ``3``). We get back a ``Rolling`` object: .. jupyter-execute:: arr.rolling(y=3) Aggregation and summary methods can be applied directly to the ``Rolling`` object: .. jupyter-execute:: r = arr.rolling(y=3) r.reduce(np.std) .. jupyter-execute:: r.mean() Aggregation results are assigned the coordinate at the end of each window by default, but can be centered by passing ``center=True`` when constructing the ``Rolling`` object: .. jupyter-execute:: r = arr.rolling(y=3, center=True) r.mean() As can be seen above, aggregations of windows which overlap the border of the array produce ``nan``\s. Setting ``min_periods`` in the call to ``rolling`` changes the minimum number of observations within the window required to have a value when aggregating: .. jupyter-execute:: r = arr.rolling(y=3, min_periods=2) r.mean() .. jupyter-execute:: r = arr.rolling(y=3, center=True, min_periods=2) r.mean() From version 0.17, xarray supports multidimensional rolling, .. jupyter-execute:: r = arr.rolling(x=2, y=3, min_periods=2) r.mean() .. 
tip:: Note that rolling window aggregations are faster and use less memory when bottleneck_ is installed. This only applies to numpy-backed xarray objects with 1d-rolling. .. _bottleneck: https://github.com/pydata/bottleneck We can also manually iterate through ``Rolling`` objects: .. code:: python for label, arr_window in r: # arr_window is a view of x ... .. _compute.rolling_exp: While ``rolling`` provides a simple moving average, ``DataArray`` also supports an exponential moving average with :py:meth:`~xarray.DataArray.rolling_exp`. This is similar to pandas' ``ewm`` method. numbagg_ is required. .. _numbagg: https://github.com/numbagg/numbagg .. code:: python arr.rolling_exp(y=3).mean() The ``rolling_exp`` method takes a ``window_type`` kwarg, which can be ``'alpha'``, ``'com'`` (for ``center-of-mass``), ``'span'``, and ``'halflife'``. The default is ``span``. Finally, the rolling object has a ``construct`` method which returns a view of the original ``DataArray`` with the windowed dimension in the last position. You can use this for more advanced rolling operations such as strided rolling, windowed rolling, convolution, short-time FFT etc. .. jupyter-execute:: # rolling with 2-point stride rolling_da = r.construct(x="x_win", y="y_win", stride=2) rolling_da .. jupyter-execute:: rolling_da.mean(["x_win", "y_win"], skipna=False) Because the ``DataArray`` given by ``r.construct('window_dim')`` is a view of the original array, it is memory efficient. You can also use ``construct`` to compute a weighted rolling sum: .. jupyter-execute:: weight = xr.DataArray([0.25, 0.5, 0.25], dims=["window"]) arr.rolling(y=3).construct(y="window").dot(weight) .. note:: numpy's Nan-aggregation functions such as ``nansum`` copy the original array. In xarray, we internally use these functions in our aggregation methods (such as ``.sum()``) if ``skipna`` argument is not specified or set to True. This means ``rolling_da.mean('window_dim')`` is memory inefficient. To avoid this, use ``skipna=False`` as the above example. .. _compute.weighted: Weighted array reductions ========================= :py:class:`DataArray` and :py:class:`Dataset` objects include :py:meth:`DataArray.weighted` and :py:meth:`Dataset.weighted` array reduction methods. They currently support weighted ``sum``, ``mean``, ``std``, ``var`` and ``quantile``. .. jupyter-execute:: coords = dict(month=("month", [1, 2, 3])) prec = xr.DataArray([1.1, 1.0, 0.9], dims=("month",), coords=coords) weights = xr.DataArray([31, 28, 31], dims=("month",), coords=coords) Create a weighted object: .. jupyter-execute:: weighted_prec = prec.weighted(weights) weighted_prec Calculate the weighted sum: .. jupyter-execute:: weighted_prec.sum() Calculate the weighted mean: .. jupyter-execute:: weighted_prec.mean(dim="month") Calculate the weighted quantile: .. jupyter-execute:: weighted_prec.quantile(q=0.5, dim="month") The weighted sum corresponds to: .. jupyter-execute:: weighted_sum = (prec * weights).sum() weighted_sum the weighted mean to: .. jupyter-execute:: weighted_mean = weighted_sum / weights.sum() weighted_mean the weighted variance to: .. jupyter-execute:: weighted_var = weighted_prec.sum_of_squares() / weights.sum() weighted_var and the weighted standard deviation to: .. jupyter-execute:: weighted_std = np.sqrt(weighted_var) weighted_std However, the functions also take missing values in the data into account: .. 
.. jupyter-execute::

    data = xr.DataArray([np.nan, 2, 4])
    weights = xr.DataArray([8, 1, 1])

    data.weighted(weights).mean()

Using ``(data * weights).sum() / weights.sum()`` would (incorrectly) result in 0.6.

If the weights add up to 0, ``sum`` returns 0:

.. jupyter-execute::

    data = xr.DataArray([1.0, 1.0])
    weights = xr.DataArray([-1.0, 1.0])

    data.weighted(weights).sum()

and ``mean``, ``std`` and ``var`` return ``nan``:

.. jupyter-execute::

    data.weighted(weights).mean()

.. note::
    ``weights`` must be a :py:class:`DataArray` and cannot contain missing values. Missing values can be replaced manually by ``weights.fillna(0)``.

.. _compute.coarsen:

Coarsen large arrays
====================

:py:class:`DataArray` and :py:class:`Dataset` objects include :py:meth:`~xarray.DataArray.coarsen` and :py:meth:`~xarray.Dataset.coarsen` methods, which support block aggregation along multiple dimensions,

.. jupyter-execute::

    x = np.linspace(0, 10, 300)
    t = pd.date_range("1999-12-15", periods=364)
    da = xr.DataArray(
        np.sin(x) * np.cos(np.linspace(0, 1, 364)[:, np.newaxis]),
        dims=["time", "x"],
        coords={"time": t, "x": x},
    )
    da

In order to take a block mean for every 7 days along the ``time`` dimension and every 2 points along the ``x`` dimension,

.. jupyter-execute::

    da.coarsen(time=7, x=2).mean()

:py:meth:`~xarray.DataArray.coarsen` raises a ``ValueError`` if the data length is not a multiple of the corresponding window size. You can choose the ``boundary='trim'`` or ``boundary='pad'`` options for trimming the excess entries or padding ``nan`` to insufficient entries,

.. jupyter-execute::

    da.coarsen(time=30, x=2, boundary="trim").mean()

If you want to apply a specific function to a coordinate, you can pass the function or method name to the ``coord_func`` option,

.. jupyter-execute::

    da.coarsen(time=7, x=2, coord_func={"time": "min"}).mean()

You can also :ref:`use coarsen to reshape` without applying a computation.

.. _compute.using_coordinates:

Computation using Coordinates
=============================

Xarray objects have some handy methods for computation with their coordinates. :py:meth:`~xarray.DataArray.differentiate` computes derivatives by central finite differences using their coordinates,

.. jupyter-execute::

    a = xr.DataArray([0, 1, 2, 3], dims=["x"], coords=[[0.1, 0.11, 0.2, 0.3]])
    a.differentiate("x")

This method can also be used for multidimensional arrays,

.. jupyter-execute::

    a = xr.DataArray(
        np.arange(8).reshape(4, 2), dims=["x", "y"], coords={"x": [0.1, 0.11, 0.2, 0.3]}
    )
    a.differentiate("x")

:py:meth:`~xarray.DataArray.integrate` computes integration based on the trapezoidal rule using their coordinates,

.. jupyter-execute::

    a.integrate("x")

.. note::
    These methods are limited to simple cartesian geometry. Differentiation and integration along multidimensional coordinates are not supported.

.. _compute.polyfit:

Fitting polynomials
===================

Xarray objects provide an interface for performing linear or polynomial regressions using the least-squares method. :py:meth:`~xarray.DataArray.polyfit` computes the best fitting coefficients along a given dimension and for a given order,

.. jupyter-execute::

    x = xr.DataArray(np.arange(10), dims=["x"], name="x")
    a = xr.DataArray(3 + 4 * x, dims=["x"], coords={"x": x})
    out = a.polyfit(dim="x", deg=1, full=True)
    out

The method outputs a dataset containing the coefficients (and more if ``full=True``). The inverse operation is done with :py:func:`~xarray.polyval`,

.. jupyter-execute::

    xr.polyval(coord=x, coeffs=out.polyfit_coefficients)

..
note:: These methods replicate the behaviour of :py:func:`numpy.polyfit` and :py:func:`numpy.polyval`. .. _compute.curvefit: Fitting arbitrary functions =========================== Xarray objects also provide an interface for fitting more complex functions using :py:func:`scipy.optimize.curve_fit`. :py:meth:`~xarray.DataArray.curvefit` accepts user-defined functions and can fit along multiple coordinates. For example, we can fit a relationship between two ``DataArray`` objects, maintaining a unique fit at each spatial coordinate but aggregating over the time dimension: .. jupyter-execute:: def exponential(x, a, xc): return np.exp((x - xc) / a) x = np.arange(-5, 5, 0.1) t = np.arange(-5, 5, 0.1) X, T = np.meshgrid(x, t) Z1 = np.random.uniform(low=-5, high=5, size=X.shape) Z2 = exponential(Z1, 3, X) Z3 = exponential(Z1, 1, -X) ds = xr.Dataset( data_vars=dict( var1=(["t", "x"], Z1), var2=(["t", "x"], Z2), var3=(["t", "x"], Z3) ), coords={"t": t, "x": x}, ) ds[["var2", "var3"]].curvefit( coords=ds.var1, func=exponential, reduce_dims="t", bounds={"a": (0.5, 5), "xc": (-5, 5)}, ) We can also fit multi-dimensional functions, and even use a wrapper function to simultaneously fit a summation of several functions, such as this field containing two gaussian peaks: .. jupyter-execute:: def gaussian_2d(coords, a, xc, yc, xalpha, yalpha): x, y = coords z = a * np.exp( -np.square(x - xc) / 2 / np.square(xalpha) - np.square(y - yc) / 2 / np.square(yalpha) ) return z def multi_peak(coords, *args): z = np.zeros(coords[0].shape) for i in range(len(args) // 5): z += gaussian_2d(coords, *args[i * 5 : i * 5 + 5]) return z x = np.arange(-5, 5, 0.1) y = np.arange(-5, 5, 0.1) X, Y = np.meshgrid(x, y) n_peaks = 2 names = ["a", "xc", "yc", "xalpha", "yalpha"] names = [f"{name}{i}" for i in range(n_peaks) for name in names] Z = gaussian_2d((X, Y), 3, 1, 1, 2, 1) + gaussian_2d((X, Y), 2, -1, -2, 1, 1) Z += np.random.normal(scale=0.1, size=Z.shape) da = xr.DataArray(Z, dims=["y", "x"], coords={"y": y, "x": x}) da.curvefit( coords=["x", "y"], func=multi_peak, param_names=names, kwargs={"maxfev": 10000}, ) .. note:: This method replicates the behavior of :py:func:`scipy.optimize.curve_fit`. .. _compute.broadcasting: Broadcasting by dimension name ============================== ``DataArray`` objects automatically align themselves ("broadcasting" in the numpy parlance) by dimension name instead of axis order. With xarray, you do not need to transpose arrays or insert dimensions of length 1 to get array operations to work, as commonly done in numpy with :py:func:`numpy.reshape` or :py:data:`numpy.newaxis`. This is best illustrated by a few examples. Consider two one-dimensional arrays with different sizes aligned along different dimensions: .. jupyter-execute:: a = xr.DataArray([1, 2], [("x", ["a", "b"])]) a .. jupyter-execute:: b = xr.DataArray([-1, -2, -3], [("y", [10, 20, 30])]) b With xarray, we can apply binary mathematical operations to these arrays, and their dimensions are expanded automatically: .. jupyter-execute:: a * b Moreover, dimensions are always reordered to the order in which they first appeared: .. jupyter-execute:: c = xr.DataArray(np.arange(6).reshape(3, 2), [b["y"], a["x"]]) c .. jupyter-execute:: a + c This means, for example, that you always subtract an array from its transpose: .. jupyter-execute:: c - c.T You can explicitly broadcast xarray data structures by using the :py:func:`~xarray.broadcast` function: .. jupyter-execute:: a2, b2 = xr.broadcast(a, b) a2 .. jupyter-execute:: b2 .. 
.. _math automatic alignment:

Automatic alignment
===================

Xarray enforces alignment between *index* :ref:`coordinates` (that is, coordinates with the same name as a dimension, marked by ``*``) on objects used in binary operations.

Similarly to pandas, alignment happens automatically for binary arithmetic operations. The default result of a binary operation is given by the *intersection* (not the union) of coordinate labels:

.. jupyter-execute::

    arr = xr.DataArray(np.arange(3), [("x", range(3))])
    arr + arr[:-1]

If coordinate values for a dimension are missing on either argument, all matching dimensions must have the same size:

.. jupyter-execute::
    :raises:

    arr + xr.DataArray([1, 2], dims="x")

However, one can explicitly change this default automatic alignment type ("inner") via :py:func:`~xarray.set_options()` in a context manager:

.. jupyter-execute::

    with xr.set_options(arithmetic_join="outer"):
        arr + arr[:1]

    arr + arr[:1]

Before loops or performance critical code, it's a good idea to align arrays explicitly (e.g., by putting them in the same Dataset or using :py:func:`~xarray.align`) to avoid the overhead of repeated alignment with each operation. See :ref:`align and reindex` for more details.

.. note::
    There is no automatic alignment between arguments when performing in-place arithmetic operations such as ``+=``. You will need to use :ref:`manual alignment`. This ensures in-place arithmetic never needs to modify data types.

.. _coordinates math:

Coordinates
===========

Although index coordinates are aligned, other coordinates are not, and if their values conflict, they will be dropped. This is necessary, for example, because indexing turns 1D coordinates into scalar coordinates:

.. jupyter-execute::

    arr[0]

.. jupyter-execute::

    arr[1]

.. jupyter-execute::

    # notice that the scalar coordinate 'x' is silently dropped
    arr[1] - arr[0]

Still, xarray will persist other coordinates in arithmetic, as long as there are no conflicting values:

.. jupyter-execute::

    # only one argument has the 'x' coordinate
    arr[0] + 1

.. jupyter-execute::

    # both arguments have the same 'x' coordinate
    arr[0] - arr[0]

Math with datasets
==================

Datasets support arithmetic operations by automatically looping over all data variables:

.. jupyter-execute::

    ds = xr.Dataset(
        {
            "x_and_y": (("x", "y"), np.random.randn(3, 5)),
            "x_only": ("x", np.random.randn(3)),
        },
        coords=arr.coords,
    )
    ds > 0

Datasets support most of the same methods found on data arrays:

.. jupyter-execute::

    ds.mean(dim="x")

.. jupyter-execute::

    abs(ds)

Datasets also support NumPy ufuncs (requires NumPy v1.13 or newer), or alternatively you can use :py:meth:`~xarray.Dataset.map` to map a function to each variable in a dataset:

.. jupyter-execute::

    np.sin(ds)  # equivalent to ds.map(np.sin)

Datasets also use looping over variables for *broadcasting* in binary arithmetic. You can do arithmetic between any ``DataArray`` and a dataset:

.. jupyter-execute::

    ds + arr

Arithmetic between two datasets matches data variables of the same name:

.. jupyter-execute::

    ds2 = xr.Dataset({"x_and_y": 0, "x_only": 100})
    ds - ds2

Similarly to index based alignment, the result has the intersection of all matching data variables.

.. _compute.wrapping-custom:

Wrapping custom computation
===========================

It doesn't always make sense to do computation directly with xarray objects:

- In the inner loop of performance limited code, using xarray can add considerable overhead compared to using NumPy or native Python types.
This is particularly true when working with scalars or small arrays (less than ~1e6 elements). Keeping track of labels and ensuring their consistency adds overhead, and xarray's core itself is not especially fast, because it's written in Python rather than a compiled language like C. Also, xarray's high level label-based APIs remove low-level control over how operations are implemented.

- Even if speed doesn't matter, it can be important to wrap existing code, or to support alternative interfaces that don't use xarray objects.

For these reasons, it is often well-advised to write low-level routines that work with NumPy arrays, and to wrap these routines to work with xarray objects. However, adding support for labels on both :py:class:`~xarray.Dataset` and :py:class:`~xarray.DataArray` can be a bit of a chore.

To make this easier, xarray supplies the :py:func:`~xarray.apply_ufunc` helper function, designed for wrapping functions that support broadcasting and vectorization on unlabeled arrays in the style of a NumPy `universal function <https://numpy.org/doc/stable/reference/ufuncs.html>`_ ("ufunc" for short). ``apply_ufunc`` takes care of everything needed for an idiomatic xarray wrapper, including alignment, broadcasting, looping over ``Dataset`` variables (if needed), and merging of coordinates. In fact, many internal xarray functions/methods are written using ``apply_ufunc``.

Simple functions that act independently on each value should work without any additional arguments:

.. jupyter-execute::

    squared_error = lambda x, y: (x - y) ** 2
    arr1 = xr.DataArray([0, 1, 2, 3], dims="x")
    xr.apply_ufunc(squared_error, arr1, 1)

For using more complex operations that consider some array values collectively, it's important to understand the idea of "core dimensions" from NumPy's `generalized ufuncs <https://numpy.org/doc/stable/reference/c-api/generalized-ufuncs.html>`_. Core dimensions are defined as dimensions that should *not* be broadcast over. Usually, they correspond to the fundamental dimensions over which an operation is defined, e.g., the summed axis in ``np.sum``. A good clue that core dimensions are needed is the presence of an ``axis`` argument on the corresponding NumPy function.

With ``apply_ufunc``, core dimensions are recognized by name, and then moved to the last dimension of any input arguments before applying the given function. This means that for functions that accept an ``axis`` argument, you usually need to set ``axis=-1``.

As an example, here is how we would wrap :py:func:`numpy.linalg.norm` to calculate the vector norm:

.. code-block:: python

    def vector_norm(x, dim, ord=None):
        return xr.apply_ufunc(
            np.linalg.norm, x, input_core_dims=[[dim]], kwargs={"ord": ord, "axis": -1}
        )

.. jupyter-execute::
    :hide-code:

    def vector_norm(x, dim, ord=None):
        return xr.apply_ufunc(
            np.linalg.norm, x, input_core_dims=[[dim]], kwargs={"ord": ord, "axis": -1}
        )

.. jupyter-execute::

    vector_norm(arr1, dim="x")

Because ``apply_ufunc`` follows a standard convention for ufuncs, it plays nicely with tools for building vectorized functions, like :py:func:`numpy.broadcast_arrays` and :py:class:`numpy.vectorize`. For high performance needs, consider using `Numba's vectorize and guvectorize <https://numba.readthedocs.io/en/stable/user/vectorize.html>`__.

In addition to wrapping functions, ``apply_ufunc`` can automatically parallelize many functions when using dask by setting ``dask='parallelized'``. See :ref:`dask.automatic-parallelization` for details.

:py:func:`~xarray.apply_ufunc` also supports some advanced options for controlling alignment of variables and the form of the result. See the docstring for full details and more examples.
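For instance, here is a hedged sketch (not from the xarray docs) of a wrapper whose result keeps its core dimension, which additionally requires ``output_core_dims``; :py:func:`numpy.sort` stands in for any axis-aware function:

.. code-block:: python

    import numpy as np
    import xarray as xr


    def sort_along(obj, dim):
        # "dim" is moved to the last axis on input; because np.sort keeps that
        # axis in its output, it must also be declared as an output core dimension.
        return xr.apply_ufunc(
            np.sort,
            obj,
            input_core_dims=[[dim]],
            output_core_dims=[[dim]],
            kwargs={"axis": -1},
        )


    arr = xr.DataArray([[3, 1, 2], [0, 2, 1]], dims=("y", "x"))
    sort_along(arr, "x")  # sorts each row independently along "x"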
.. currentmodule:: xarray

.. _dask:

Parallel Computing with Dask
============================

.. jupyter-execute::

    # Note that it's not necessary to import dask to use xarray with dask.
    import numpy as np
    import pandas as pd
    import xarray as xr
    import bottleneck

.. jupyter-execute::
    :hide-code:

    import os

    np.random.seed(123456)

    # limit the amount of information printed to screen
    xr.set_options(display_expand_data=False)
    np.set_printoptions(precision=3, linewidth=100, threshold=10, edgeitems=2)

    ds = xr.Dataset(
        {
            "temperature": (
                ("time", "latitude", "longitude"),
                np.random.randn(30, 180, 180),
            ),
            "time": pd.date_range("2015-01-01", periods=30),
            "longitude": np.arange(180),
            "latitude": np.arange(89.5, -90.5, -1),
        }
    )
    ds.to_netcdf("example-data.nc")

Xarray integrates with `Dask <https://www.dask.org>`__, a general purpose library for parallel computing, to handle larger-than-memory computations.

If you've been using Xarray to read in large datasets or split up data across a number of files, you may already be using Dask:

.. code-block:: python

    ds = xr.open_zarr("/path/to/data.zarr")
    timeseries = ds["temp"].mean(dim=["x", "y"]).compute()  # Compute result

Using Dask with Xarray feels similar to working with NumPy arrays, but on much larger datasets. The Dask integration is transparent, so you usually don't need to manage the parallelism directly; Xarray and Dask handle these aspects behind the scenes. This makes it easy to write code that scales from small, in-memory datasets on a single machine to large datasets that are distributed across a cluster, with minimal code changes.

Examples
--------

If you're new to using Xarray with Dask, we recommend the `Xarray + Dask Tutorial <https://tutorial.xarray.dev/intermediate/xarray_and_dask.html>`_.

Here are some examples for using Xarray with Dask at scale:

- `Zonal averaging with the NOAA National Water Model `_
- `CMIP6 Precipitation Frequency Analysis `_
- `Using Dask + Cloud Optimized GeoTIFFs `_

Find more examples at the `Project Pythia cookbook gallery <https://cookbooks.projectpythia.org/>`_.

Using Dask with Xarray
----------------------

.. image:: ../_static/dask-array.svg
    :width: 50 %
    :align: right
    :alt: A Dask array

Dask divides arrays into smaller parts called chunks. These chunks are small, manageable pieces of the larger dataset that Dask is able to process in parallel (see the `Dask Array docs on chunks <https://docs.dask.org/en/stable/array-chunks.html>`_). Commonly chunks are set when reading data, but you can also set the chunksize manually at any point in your workflow using :py:meth:`Dataset.chunk` and :py:meth:`DataArray.chunk`. See :ref:`dask.chunks` for more.

Xarray operations on Dask-backed arrays are lazy. This means computations are not executed immediately, but are instead queued up as tasks in a Dask graph. When a result is requested (e.g., for plotting, writing to disk, or explicitly computing), Dask executes the task graph. The computations are carried out in parallel, with each chunk being processed independently. This parallel execution is key to handling large datasets efficiently.

Nearly all Xarray methods have been extended to work automatically with Dask Arrays. This includes things like indexing, concatenating, rechunking, grouped operations, etc. Common operations are covered in more detail in each of the sections below.

.. _dask.io:

Reading and writing data
~~~~~~~~~~~~~~~~~~~~~~~~

When reading data, Dask divides your dataset into smaller chunks. You can specify the size of chunks with the ``chunks`` argument. Specifying ``chunks="auto"`` will set the dask chunk sizes to be a multiple of the on-disk chunk sizes. This can be a good idea, but usually the appropriate dask chunk size will depend on your workflow.
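As a small sketch (reusing the ``example-data.nc`` file written in the setup above), you can compare chunking strategies and inspect the result without triggering any computation:

.. code-block:: python

    # chunks="auto": dask picks chunk sizes aligned with the on-disk chunks
    ds_auto = xr.open_dataset("example-data.nc", chunks="auto")

    # An explicit mapping requests particular chunk sizes per dimension
    ds_explicit = xr.open_dataset("example-data.nc", chunks={"time": 10})

    # Inspecting chunk sizes is lazy and does not load any data
    print(ds_explicit["temperature"].chunksizes)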
.. tab:: Zarr

    The `Zarr <https://zarr.readthedocs.io>`_ format is ideal for working with large datasets. Each chunk is stored in a separate file, allowing parallel reading and writing with Dask. You can also use Zarr to read/write directly from cloud storage buckets (see the `Dask documentation on connecting to remote data <https://docs.dask.org/en/stable/how-to/connect-to-remote-data.html>`__).

    When you open a Zarr dataset with :py:func:`~xarray.open_zarr`, it is loaded as a Dask array by default (if Dask is installed)::

        ds = xr.open_zarr("path/to/directory.zarr")

    See :ref:`io.zarr` for more details.

.. tab:: NetCDF

    Open a single netCDF file with :py:func:`~xarray.open_dataset` by supplying a ``chunks`` argument::

        ds = xr.open_dataset("example-data.nc", chunks={"time": 10})

    Or open multiple files in parallel with :py:func:`~xarray.open_mfdataset`::

        xr.open_mfdataset('my/files/*.nc', parallel=True)

    .. tip::
        When reading in many netCDF files with :py:func:`~xarray.open_mfdataset`, using ``engine="h5netcdf"`` can be faster than the default which uses the netCDF4 package.

    Save larger-than-memory netCDF files::

        ds.to_netcdf("my-big-file.nc")

    Or set ``compute=False`` to return a dask.delayed object that can be computed later::

        delayed_write = ds.to_netcdf("my-big-file.nc", compute=False)
        delayed_write.compute()

    .. note::
        When using Dask's distributed scheduler to write NETCDF4 files, it may be necessary to set the environment variable ``HDF5_USE_FILE_LOCKING=FALSE`` to avoid competing locks within the HDF5 SWMR file locking scheme. Note that writing netCDF files with Dask's distributed scheduler is only supported for the netcdf4 backend.

    See :ref:`io.netcdf` for more details.

.. tab:: HDF5

    Open HDF5 files with :py:func:`~xarray.open_dataset`::

        xr.open_dataset("/path/to/my/file.h5", chunks='auto')

    See :ref:`io.hdf5` for more details.

.. tab:: GeoTIFF

    Open large geoTIFF files with rioxarray::

        xds = rioxarray.open_rasterio("my-satellite-image.tif", chunks='auto')

    See :ref:`io.rasterio` for more details.

Loading Dask Arrays
~~~~~~~~~~~~~~~~~~~

There are a few common cases where you may want to convert lazy Dask arrays into eager, in-memory Xarray data structures:

- You want to inspect smaller intermediate results when working interactively or debugging
- You've reduced the dataset (by filtering or with a groupby, for example) and now have something much smaller that fits in memory
- You need to compute intermediate results since Dask is unable (or struggles) to perform a certain computation. The canonical example of this is normalizing a dataset, e.g., ``ds - ds.mean()``, when ``ds`` is larger than memory. Typically, you should either save ``ds`` to disk or compute ``ds.mean()`` eagerly.

To do this, you can use :py:meth:`Dataset.compute` or :py:meth:`DataArray.compute`:

.. jupyter-execute::

    ds.compute()

.. note::
    Using :py:meth:`Dataset.compute` is preferred to :py:meth:`Dataset.load`, which changes the results in-place.

You can also access :py:attr:`DataArray.values`, which will always be a NumPy array:

.. jupyter-input::

    ds.temperature.values

.. jupyter-output::

    array([[[ 4.691e-01, -2.829e-01, ..., -5.577e-01,  3.814e-01],
            [ 1.337e+00, -1.531e+00, ...,  8.726e-01, -1.538e+00],
            ...
    # truncated for brevity

NumPy ufuncs like :py:func:`numpy.sin` transparently work on all xarray objects, including those that store lazy Dask arrays:

.. jupyter-execute::

    np.sin(ds)

To access Dask arrays directly, use the :py:attr:`DataArray.data` attribute which exposes the DataArray's underlying array type.
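For instance, a brief sketch (assuming ``ds`` is Dask-backed, e.g. opened with a ``chunks`` argument); inspecting the underlying array triggers no computation:

.. code-block:: python

    dask_arr = ds.temperature.data  # a dask.array.Array, not a numpy.ndarray
    print(type(dask_arr))
    print(dask_arr.chunks)  # chunk sizes along each dimension

    # On a NumPy-backed object, .data would instead return the ndarray itself.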
If you're using a Dask cluster, you can also use :py:meth:`Dataset.persist` for quickly accessing intermediate outputs. This is most helpful after expensive operations like rechunking or setting an index. It's a way of telling the cluster that it should start executing the computations that you have defined so far, and that it should try to keep those results in memory. You will get back a new Dask array that is semantically equivalent to your old array, but now points to running data. .. code-block:: python ds = ds.persist() .. tip:: Remember to save the dataset returned by persist! This is a common mistake. .. _dask.chunks: Chunking and performance ~~~~~~~~~~~~~~~~~~~~~~~~ The way a dataset is chunked can be critical to performance when working with large datasets. You'll want chunk sizes large enough to reduce the number of chunks that Dask has to think about (to reduce overhead from the task graph) but also small enough so that many of them can fit in memory at once. .. tip:: A good rule of thumb is to create arrays with a minimum chunk size of at least one million elements (e.g., a 1000x1000 matrix). With large arrays (10+ GB), you may need larger chunks. See `Choosing good chunk sizes in Dask `_. It can be helpful to choose chunk sizes based on your downstream analyses and to chunk as early as possible. Datasets with smaller chunks along the time axis, for example, can make time domain problems easier to parallelize since Dask can perform the same operation on each time chunk. If you're working with a large dataset with chunks that make downstream analyses challenging, you may need to rechunk your data. This is an expensive operation though, so is only recommended when needed. You can chunk or rechunk a dataset by: - Specifying the ``chunks`` kwarg when reading in your dataset. If you know you'll want to do some spatial subsetting, for example, you could use ``chunks={'latitude': 10, 'longitude': 10}`` to specify small chunks across space. This can avoid loading subsets of data that span multiple chunks, thus reducing the number of file reads. Note that this will only work, though, for chunks that are similar to how the data is chunked on disk. Otherwise, it will be very slow and require a lot of network bandwidth. - Many array file formats are chunked on disk. You can specify ``chunks={}`` to have a single dask chunk map to a single on-disk chunk, and ``chunks="auto"`` to have a single dask chunk be a automatically chosen multiple of the on-disk chunks. - Using :py:meth:`Dataset.chunk` after you've already read in your dataset. For time domain problems, for example, you can use ``ds.chunk(time=TimeResampler())`` to rechunk according to a specified unit of time. ``ds.chunk(time=TimeResampler("MS"))``, for example, will set the chunks so that a month of data is contained in one chunk. For large-scale rechunking tasks (e.g., converting a simulation dataset stored with chunking only along time to a dataset with chunking only across space), consider writing another copy of your data on disk and/or using dedicated tools such as `Rechunker `_. .. _dask.automatic-parallelization: Parallelize custom functions with ``apply_ufunc`` and ``map_blocks`` ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Almost all of Xarray's built-in operations work on Dask arrays. If you want to use a function that isn't wrapped by Xarray, and have it applied in parallel on each block of your xarray object, you have three options: 1. 
Use :py:func:`~xarray.apply_ufunc` to apply functions that consume and return NumPy arrays. 2. Use :py:func:`~xarray.map_blocks`, :py:meth:`Dataset.map_blocks` or :py:meth:`DataArray.map_blocks` to apply functions that consume and return xarray objects. 3. Extract Dask Arrays from xarray objects with :py:attr:`DataArray.data` and use Dask directly. .. tip:: See the extensive Xarray tutorial on `apply_ufunc `_. ``apply_ufunc`` ############### :py:func:`~xarray.apply_ufunc` automates `embarrassingly parallel `__ "map" type operations where a function written for processing NumPy arrays should be repeatedly applied to Xarray objects containing Dask Arrays. It works similarly to :py:func:`dask.array.map_blocks` and :py:func:`dask.array.blockwise`, but without requiring an intermediate layer of abstraction. See the `Dask documentation `__ for more details. For the best performance when using Dask's multi-threaded scheduler, wrap a function that already releases the global interpreter lock, which fortunately already includes most NumPy and Scipy functions. Here we show an example using NumPy operations and a fast function from `bottleneck `__, which we use to calculate `Spearman's rank-correlation coefficient `__: .. code-block:: python def covariance_gufunc(x, y): return ( (x - x.mean(axis=-1, keepdims=True)) * (y - y.mean(axis=-1, keepdims=True)) ).mean(axis=-1) def pearson_correlation_gufunc(x, y): return covariance_gufunc(x, y) / (x.std(axis=-1) * y.std(axis=-1)) def spearman_correlation_gufunc(x, y): x_ranks = bottleneck.rankdata(x, axis=-1) y_ranks = bottleneck.rankdata(y, axis=-1) return pearson_correlation_gufunc(x_ranks, y_ranks) def spearman_correlation(x, y, dim): return xr.apply_ufunc( spearman_correlation_gufunc, x, y, input_core_dims=[[dim], [dim]], dask="parallelized", output_dtypes=[float], ) The only aspect of this example that is different from standard usage of ``apply_ufunc()`` is that we needed to supply the ``output_dtypes`` arguments. (Read up on :ref:`compute.wrapping-custom` for an explanation of the "core dimensions" listed in ``input_core_dims``.) Our new ``spearman_correlation()`` function achieves near linear speedup when run on large arrays across the four cores on my laptop. It would also work as a streaming operation, when run on arrays loaded from disk: .. jupyter-input:: rs = np.random.default_rng(0) array1 = xr.DataArray(rs.randn(1000, 100000), dims=["place", "time"]) # 800MB array2 = array1 + 0.5 * rs.randn(1000, 100000) # using one core, on NumPy arrays %time _ = spearman_correlation(array1, array2, 'time') # CPU times: user 21.6 s, sys: 2.84 s, total: 24.5 s # Wall time: 24.9 s chunked1 = array1.chunk({"place": 10}) chunked2 = array2.chunk({"place": 10}) # using all my laptop's cores, with Dask r = spearman_correlation(chunked1, chunked2, "time").compute() %time _ = r.compute() # CPU times: user 30.9 s, sys: 1.74 s, total: 32.6 s # Wall time: 4.59 s One limitation of ``apply_ufunc()`` is that it cannot be applied to arrays with multiple chunks along a core dimension: .. jupyter-input:: spearman_correlation(chunked1, chunked2, "place") .. jupyter-output:: ValueError: dimension 'place' on 0th function argument to apply_ufunc with dask='parallelized' consists of multiple chunks, but is also a core dimension. To fix, rechunk into a single Dask array chunk along this dimension, i.e., ``.rechunk({'place': -1})``, but beware that this may significantly increase memory usage. 
This reflects the nature of core dimensions, in contrast to broadcast (non-core) dimensions that allow operations to be split into arbitrary chunks for application.

.. tip::
    When possible, it's recommended to use pre-existing ``dask.array`` functions, either with existing xarray methods or :py:func:`~xarray.apply_ufunc()` with ``dask='allowed'``. Dask can often have a more efficient implementation that makes use of the specialized structure of a problem, unlike the generic speedups offered by ``dask='parallelized'``.

``map_blocks``
##############

Functions that consume and return Xarray objects can be easily applied in parallel using :py:func:`map_blocks`. Your function will receive an Xarray Dataset or DataArray subset to one chunk along each chunked dimension.

.. jupyter-execute::

    ds.temperature

This DataArray has 3 chunks, each with length 10 along the time dimension. At compute time, a function applied with :py:func:`map_blocks` will receive a DataArray corresponding to a single block of shape 10x180x180 (time x latitude x longitude) with values loaded. The following snippet illustrates how to check the shape of the object received by the applied function.

.. jupyter-execute::

    def func(da):
        print(da.sizes)
        return da.time


    mapped = xr.map_blocks(func, ds.temperature)
    mapped

Notice that the :py:meth:`map_blocks` call printed ``Frozen({'time': 0, 'latitude': 0, 'longitude': 0})`` to screen. ``func`` received 0-sized blocks! :py:meth:`map_blocks` needs to know what the final result looks like in terms of dimensions, shapes etc. It does so by running the provided function on 0-shaped inputs (*automated inference*). This works in many cases, but not all. If automatic inference does not work for your function, provide the ``template`` kwarg (see :ref:`below <template-note>`).

In this case, automatic inference has worked, so let's check that the result is as expected.

.. jupyter-execute::

    mapped.load(scheduler="single-threaded")
    mapped.identical(ds.time)

Note that we use ``.load(scheduler="single-threaded")`` to execute the computation. This executes the Dask graph in serial using a for loop, but allows for printing to screen and other debugging techniques. We can easily see that our function is receiving blocks of shape 10x180x180 and the returned result is identical to ``ds.time`` as expected.

Here is a common example where automated inference will not work.

.. jupyter-execute::
    :raises:

    def func(da):
        print(da.sizes)
        return da.isel(time=[1])


    mapped = xr.map_blocks(func, ds.temperature)

``func`` cannot be run on 0-shaped inputs because it is not possible to extract element 1 along a dimension of size 0. In this case we need to tell :py:func:`map_blocks` what the returned result looks like using the ``template`` kwarg. ``template`` must be an xarray Dataset or DataArray (depending on what the function returns) with dimensions, shapes, chunk sizes, attributes, coordinate variables *and* data variables that look exactly like the expected result. The variables should be dask-backed and hence not incur much memory cost.

.. _template-note:

.. note::
    Note that when ``template`` is provided, ``attrs`` from ``template`` are copied over to the result. Any ``attrs`` set in ``func`` will be ignored.

.. jupyter-execute::

    template = ds.temperature.isel(time=[1, 11, 21])
    mapped = xr.map_blocks(func, ds.temperature, template=template)

Notice that the 0-shaped sizes were not printed to screen. Since ``template`` has been provided, :py:func:`map_blocks` does not need to infer it by running ``func`` on 0-shaped inputs.
.. jupyter-execute::

    mapped.identical(template)

:py:func:`map_blocks` also allows passing ``args`` and ``kwargs`` down to the user function ``func``. ``func`` will be executed as ``func(block_xarray, *args, **kwargs)``, so ``args`` must be a list and ``kwargs`` must be a dictionary.

.. jupyter-execute::

    def func(obj, a, b=0):
        return obj + a + b


    mapped = ds.map_blocks(func, args=[10], kwargs={"b": 10})
    expected = ds + 10 + 10
    mapped.identical(expected)

.. jupyter-execute::
    :hide-code:

    ds.close()  # Closes "example-data.nc".
    os.remove("example-data.nc")

.. tip::

    As :py:func:`map_blocks` loads each block into memory, reduce the size of the objects consumed by user functions as much as possible. For example, drop unneeded variables before calling ``func`` with :py:func:`map_blocks`.

Deploying Dask
--------------

By default, Dask uses the multi-threaded scheduler, which distributes work across multiple cores on a single machine and allows for processing some datasets that do not fit into memory. However, this has two limitations:

- You are limited by the size of your hard drive
- Downloading data can be slow and expensive

Instead, it can be faster and cheaper to run your computations close to where your data is stored, distributed across many machines on a Dask cluster. Often, this means deploying Dask on HPC clusters or on the cloud. See the `Dask deployment documentation `__ for more details.

Best Practices
--------------

Dask is pretty easy to use but there are some gotchas, many of which are under active development. Here are some tips we have found through experience. We also recommend checking out the `Dask best practices `_.

1. Do your spatial and temporal indexing (e.g. ``.sel()`` or ``.isel()``) early, especially before calling ``resample()`` or ``groupby()``. Grouping and resampling trigger some computation on all the blocks, which in theory should commute with indexing, but this optimization hasn't been implemented in Dask yet. (See `Dask issue #746 `_).

2. More generally, ``groupby()`` is a costly operation and will perform a lot better if the ``flox`` package is installed. See the `flox documentation `_ for more. By default Xarray will use ``flox`` if installed.

3. Save intermediate results to disk as netCDF files (using ``to_netcdf()``) and then load them again with ``open_dataset()`` for further computations. For example, if subtracting the temporal mean from a dataset, save the temporal mean to disk before subtracting. Again, in theory, Dask should be able to do the computation in a streaming fashion, but in practice this is a fail case for the Dask scheduler, because it tries to keep every chunk of an array that it computes in memory. (See `Dask issue #874 `_)

4. Use the `Dask dashboard `_ to identify performance bottlenecks.

Here's an example of a simplified workflow putting some of these tips together:

.. code-block:: python

    ds = xr.open_zarr(
        # Since we're doing a spatial reduction, increase chunk size in x, y
        "my-data.zarr",
        chunks={"x": 100, "y": 100},
    )

    time_subset = ds.sea_temperature.sel(
        time=slice("2020-01-01", "2020-12-31")  # Filter early
    )

    # faster resampling when flox is installed
    daily = time_subset.resample(time="D").mean()

    daily.load()  # Pull smaller results into memory after reducing the dataset
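The dashboard mentioned in tip 4 comes from Dask's distributed scheduler. As a minimal sketch (assuming the ``distributed`` package is installed; not part of the workflow above), creating a client before triggering computation gives you a diagnostics dashboard:

.. code-block:: python

    from dask.distributed import Client

    client = Client()  # starts a local cluster and a diagnostics dashboard
    print(client.dashboard_link)  # open this URL to watch tasks execute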
.. _data structures:

Data Structures
===============

.. jupyter-execute::
    :hide-code:
    :hide-output:

    import numpy as np
    import pandas as pd
    import xarray as xr
    import matplotlib.pyplot as plt

    np.random.seed(123456)
    np.set_printoptions(threshold=10)
    %xmode minimal

DataArray
---------

:py:class:`xarray.DataArray` is xarray's implementation of a labeled, multi-dimensional array. It has several key properties:

- ``values``: a :py:class:`numpy.ndarray` or :ref:`numpy-like array ` holding the array's values
- ``dims``: dimension names for each axis (e.g., ``('x', 'y', 'z')``)
- ``coords``: a dict-like container of arrays (*coordinates*) that label each point (e.g., 1-dimensional arrays of numbers, datetime objects or strings)
- ``attrs``: :py:class:`dict` to hold arbitrary metadata (*attributes*)

Xarray uses ``dims`` and ``coords`` to enable its core metadata-aware operations. Dimensions provide names that xarray uses instead of the ``axis`` argument found in many numpy functions. Coordinates enable fast label-based indexing and alignment, building on the functionality of the ``index`` found on a pandas :py:class:`~pandas.DataFrame` or :py:class:`~pandas.Series`.

DataArray objects also can have a ``name`` and can hold arbitrary metadata in the form of their ``attrs`` property. Names and attributes are strictly for users and user-written code: xarray makes no attempt to interpret them, and propagates them only in unambiguous cases. For reading and writing attributes xarray relies on the capabilities of the supported backends. (see FAQ, :ref:`approach to metadata`).

.. _creating a dataarray:

Creating a DataArray
~~~~~~~~~~~~~~~~~~~~

The :py:class:`~xarray.DataArray` constructor takes:

- ``data``: a multi-dimensional array of values (e.g., a numpy ndarray, a :ref:`numpy-like array `, :py:class:`~pandas.Series`, :py:class:`~pandas.DataFrame` or ``pandas.Panel``)
- ``coords``: a list or dictionary of coordinates. If a list, it should be a list of tuples where the first element is the dimension name and the second element is the corresponding coordinate array_like object.
- ``dims``: a list of dimension names. If omitted and ``coords`` is a list of tuples, dimension names are taken from ``coords``.
- ``attrs``: a dictionary of attributes to add to the instance
- ``name``: a string that names the instance

.. jupyter-execute::

    data = np.random.rand(4, 3)
    locs = ["IA", "IL", "IN"]
    times = pd.date_range("2000-01-01", periods=4)
    foo = xr.DataArray(data, coords=[times, locs], dims=["time", "space"])
    foo

Only ``data`` is required; all of the other arguments will be filled in with default values:

.. jupyter-execute::

    xr.DataArray(data)

As you can see, dimension names are always present in the xarray data model: if you do not provide them, defaults of the form ``dim_N`` will be created. However, coordinates are always optional, and dimensions do not have automatic coordinate labels.

.. note::

    This is different from pandas, where axes always have tick labels, which default to the integers ``[0, ..., n-1]``.

    Prior to xarray v0.9, xarray copied this behavior: default coordinates for each dimension would be created if coordinates were not supplied explicitly. This is no longer the case.

Coordinates can be specified in the following ways:

- A list of values with length equal to the number of dimensions, providing coordinate labels for each dimension.
  Each value must be of one of the following forms:

  * A :py:class:`~xarray.DataArray` or :py:class:`~xarray.Variable`
  * A tuple of the form ``(dims, data[, attrs])``, which is converted into arguments for :py:class:`~xarray.Variable`
  * A pandas object or scalar value, which is converted into a ``DataArray``
  * A 1D array or list, which is interpreted as values for a one dimensional coordinate variable along the same dimension as its name

- A dictionary of ``{coord_name: coord}`` where values are of the same form as the list. Supplying coordinates as a dictionary allows coordinates other than those corresponding to dimensions (more on these later). If you supply ``coords`` as a dictionary, you must explicitly provide ``dims``.

As a list of tuples:

.. jupyter-execute::

    xr.DataArray(data, coords=[("time", times), ("space", locs)])

As a dictionary:

.. jupyter-execute::

    xr.DataArray(
        data,
        coords={
            "time": times,
            "space": locs,
            "const": 42,
            "ranking": ("space", [1, 2, 3]),
        },
        dims=["time", "space"],
    )

As a dictionary with coords across multiple dimensions:

.. jupyter-execute::

    xr.DataArray(
        data,
        coords={
            "time": times,
            "space": locs,
            "const": 42,
            "ranking": (("time", "space"), np.arange(12).reshape(4, 3)),
        },
        dims=["time", "space"],
    )

If you create a ``DataArray`` by supplying a pandas :py:class:`~pandas.Series`, :py:class:`~pandas.DataFrame` or ``pandas.Panel``, any non-specified arguments in the ``DataArray`` constructor will be filled in from the pandas object:

.. jupyter-execute::

    df = pd.DataFrame({"x": [0, 1], "y": [2, 3]}, index=["a", "b"])
    df.index.name = "abc"
    df.columns.name = "xyz"
    df

.. jupyter-execute::

    xr.DataArray(df)

DataArray properties
~~~~~~~~~~~~~~~~~~~~

Let's take a look at the important properties on our array:

.. jupyter-execute::

    foo.values

.. jupyter-execute::

    foo.dims

.. jupyter-execute::

    foo.coords

.. jupyter-execute::

    foo.attrs

.. jupyter-execute::

    print(foo.name)

You can modify ``values`` in-place:

.. jupyter-execute::

    foo.values = 1.0 * foo.values

.. note::

    The array values in a :py:class:`~xarray.DataArray` have a single (homogeneous) data type. To work with heterogeneous or structured data types in xarray, use coordinates, or put separate ``DataArray`` objects in a single :py:class:`~xarray.Dataset` (see below).

Now fill in some of that missing metadata:

.. jupyter-execute::

    foo.name = "foo"
    foo.attrs["units"] = "meters"
    foo

The :py:meth:`~xarray.DataArray.rename` method is another option, returning a new data array:

.. jupyter-execute::

    foo.rename("bar")

DataArray Coordinates
~~~~~~~~~~~~~~~~~~~~~

The ``coords`` property is dict-like. Individual coordinates can be accessed from the coordinates by name, or even by indexing the data array itself:

.. jupyter-execute::

    foo.coords["time"]

.. jupyter-execute::

    foo["time"]

These are also :py:class:`~xarray.DataArray` objects, which contain tick-labels for each dimension.

Coordinates can also be set or removed by using the dictionary-like syntax:

.. jupyter-execute::

    foo["ranking"] = ("space", [1, 2, 3])
    foo.coords

.. jupyter-execute::

    del foo["ranking"]
    foo.coords

For more details, see :ref:`coordinates` below.

Dataset
-------

:py:class:`xarray.Dataset` is xarray's multi-dimensional equivalent of a :py:class:`~pandas.DataFrame`. It is a dict-like container of labeled arrays (:py:class:`~xarray.DataArray` objects) with aligned dimensions. It is designed as an in-memory representation of the data model from the `netCDF`__ file format.
__ https://www.unidata.ucar.edu/software/netcdf/ In addition to the dict-like interface of the dataset itself, which can be used to access any variable in a dataset, datasets have four key properties: - ``dims``: a dictionary mapping from dimension names to the fixed length of each dimension (e.g., ``{'x': 6, 'y': 6, 'time': 8}``) - ``data_vars``: a dict-like container of DataArrays corresponding to variables - ``coords``: another dict-like container of DataArrays intended to label points used in ``data_vars`` (e.g., arrays of numbers, datetime objects or strings) - ``attrs``: :py:class:`dict` to hold arbitrary metadata The distinction between whether a variable falls in data or coordinates (borrowed from `CF conventions`_) is mostly semantic, and you can probably get away with ignoring it if you like: dictionary like access on a dataset will supply variables found in either category. However, xarray does make use of the distinction for indexing and computations. Coordinates indicate constant/fixed/independent quantities, unlike the varying/measured/dependent quantities that belong in data. .. _CF conventions: https://cfconventions.org/ Here is an example of how we might structure a dataset for a weather forecast: .. image:: ../_static/dataset-diagram.png In this example, it would be natural to call ``temperature`` and ``precipitation`` "data variables" and all the other arrays "coordinate variables" because they label the points along the dimensions. (see [1]_ for more background on this example). Creating a Dataset ~~~~~~~~~~~~~~~~~~ To make an :py:class:`~xarray.Dataset` from scratch, supply dictionaries for any variables (``data_vars``), coordinates (``coords``) and attributes (``attrs``). - ``data_vars`` should be a dictionary with each key as the name of the variable and each value as one of: * A :py:class:`~xarray.DataArray` or :py:class:`~xarray.Variable` * A tuple of the form ``(dims, data[, attrs])``, which is converted into arguments for :py:class:`~xarray.Variable` * A pandas object, which is converted into a ``DataArray`` * A 1D array or list, which is interpreted as values for a one dimensional coordinate variable along the same dimension as its name - ``coords`` should be a dictionary of the same form as ``data_vars``. - ``attrs`` should be a dictionary. Let's create some fake data for the example we show above. In this example dataset, we will represent measurements of the temperature and pressure that were made under various conditions: * the measurements were made on four different days; * they were made at two separate locations, which we will represent using their latitude and longitude; and * they were made using instruments by three different manufacturers, which we will refer to as ``'manufac1'``, ``'manufac2'``, and ``'manufac3'``. .. 
jupyter-execute::

    np.random.seed(0)
    temperature = 15 + 8 * np.random.randn(2, 3, 4)
    precipitation = 10 * np.random.rand(2, 3, 4)
    lon = [-99.83, -99.32]
    lat = [42.25, 42.21]
    instruments = ["manufac1", "manufac2", "manufac3"]
    time = pd.date_range("2014-09-06", periods=4)
    reference_time = pd.Timestamp("2014-09-05")

    # for real use cases, it's good practice to supply array attributes such as
    # units, but we won't bother here for the sake of brevity
    ds = xr.Dataset(
        {
            "temperature": (["loc", "instrument", "time"], temperature),
            "precipitation": (["loc", "instrument", "time"], precipitation),
        },
        coords={
            "lon": (["loc"], lon),
            "lat": (["loc"], lat),
            "instrument": instruments,
            "time": time,
            "reference_time": reference_time,
        },
    )
    ds

Here we pass :py:class:`xarray.DataArray` objects or a pandas object as values in the dictionary:

.. jupyter-execute::

    xr.Dataset(dict(bar=foo))

.. jupyter-execute::

    xr.Dataset(dict(bar=foo.to_pandas()))

Where a pandas object is supplied as a value, the names of its indexes are used as dimension names, and its data is aligned to any existing dimensions.

You can also create a dataset from:

- A :py:class:`pandas.DataFrame` or ``pandas.Panel`` along its columns and items respectively, by passing it into the :py:class:`~xarray.Dataset` directly
- A :py:class:`pandas.DataFrame` with :py:meth:`Dataset.from_dataframe `, which will additionally handle MultiIndexes. See :ref:`pandas`
- A netCDF file on disk with :py:func:`~xarray.open_dataset`. See :ref:`io`.

Dataset contents
~~~~~~~~~~~~~~~~

:py:class:`~xarray.Dataset` implements the Python mapping interface, with values given by :py:class:`xarray.DataArray` objects:

.. jupyter-execute::

    print("temperature" in ds)
    ds["temperature"]

Valid keys include each listed coordinate and data variable.

Data and coordinate variables are also contained separately in the :py:attr:`~xarray.Dataset.data_vars` and :py:attr:`~xarray.Dataset.coords` dictionary-like attributes:

.. jupyter-execute::

    ds.data_vars

.. jupyter-execute::

    ds.coords

Finally, like data arrays, datasets also store arbitrary metadata in the form of ``attributes``:

.. jupyter-execute::

    print(ds.attrs)

    ds.attrs["title"] = "example attribute"
    ds

Xarray does not enforce any restrictions on attributes, but serialization to some file formats may fail if you use objects that are not strings, numbers or :py:class:`numpy.ndarray` objects.

As a useful shortcut, you can use attribute style access for reading (but not setting) variables and attributes:

.. jupyter-execute::

    ds.temperature

This is particularly useful in an exploratory context, because you can tab-complete these variable names with tools like IPython.

.. _dictionary_like_methods:

Dictionary like methods
~~~~~~~~~~~~~~~~~~~~~~~

We can update a dataset in-place using Python's standard dictionary syntax. For example, to create this example dataset from scratch, we could have written:

.. jupyter-execute::

    ds = xr.Dataset()
    ds["temperature"] = (("loc", "instrument", "time"), temperature)
    ds["temperature_double"] = (("loc", "instrument", "time"), temperature * 2)
    ds["precipitation"] = (("loc", "instrument", "time"), precipitation)
    ds.coords["lat"] = (("loc",), lat)
    ds.coords["lon"] = (("loc",), lon)
    ds.coords["time"] = pd.date_range("2014-09-06", periods=4)
    ds.coords["reference_time"] = pd.Timestamp("2014-09-05")

To change the variables in a ``Dataset``, you can use all the standard dictionary methods, including ``values``, ``items``, ``__delitem__``, ``get`` and :py:meth:`~xarray.Dataset.update`.
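As a quick illustration, here is a sketch of these dict-style idioms applied to the dataset above (``pressure`` is just an illustrative missing key; this snippet is not part of the executed example):

.. code-block:: python

    print(list(ds.keys()))  # variable and coordinate names
    print("temperature" in ds)  # membership tests work as for a dict
    print(ds.get("pressure", None))  # returns the default instead of raising KeyError
    del ds["temperature_double"]  # remove a variable in-place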
Note that assigning a ``DataArray`` or pandas object to a ``Dataset`` variable using ``__setitem__`` or ``update`` will :ref:`automatically align` the array(s) to the original dataset's indexes.

You can copy a ``Dataset`` by calling the :py:meth:`~xarray.Dataset.copy` method. By default, the copy is shallow, so only the container will be copied: the arrays in the ``Dataset`` will still be stored in the same underlying :py:class:`numpy.ndarray` objects. You can copy all data by calling ``ds.copy(deep=True)``.

.. _transforming datasets:

Transforming datasets
~~~~~~~~~~~~~~~~~~~~~

In addition to dictionary-like methods (described above), xarray has additional methods (like pandas) for transforming datasets into new objects.

For removing variables, you can select and drop an explicit list of variables by indexing with a list of names, or use the :py:meth:`~xarray.Dataset.drop_vars` method to return a new ``Dataset``. These operations keep around coordinates:

.. jupyter-execute::

    ds[["temperature"]]

.. jupyter-execute::

    ds[["temperature", "temperature_double"]]

.. jupyter-execute::

    ds.drop_vars("temperature")

To remove a dimension, you can use the :py:meth:`~xarray.Dataset.drop_dims` method. Any variables using that dimension are dropped:

.. jupyter-execute::

    ds.drop_dims("time")

As an alternative to dictionary-like modifications, you can use :py:meth:`~xarray.Dataset.assign` and :py:meth:`~xarray.Dataset.assign_coords`. These methods return a new dataset with additional (or replaced) values:

.. jupyter-execute::

    ds.assign(temperature2=2 * ds.temperature)

There is also the :py:meth:`~xarray.Dataset.pipe` method that allows you to use a method call with an external function (e.g., ``ds.pipe(func)``) instead of simply calling it (e.g., ``func(ds)``). This allows you to write pipelines for transforming your data (using "method chaining") instead of writing hard-to-follow nested function calls:

.. jupyter-input::

    # these lines are equivalent, but with pipe we can make the logic flow
    # entirely from left to right
    plt.plot((2 * ds.temperature.sel(loc=0)).mean("instrument"))
    (ds.temperature.sel(loc=0).pipe(lambda x: 2 * x).mean("instrument").pipe(plt.plot))

Both ``pipe`` and ``assign`` replicate the pandas methods of the same names (:py:meth:`DataFrame.pipe ` and :py:meth:`DataFrame.assign `).

With xarray, there is no performance penalty for creating new datasets, even if variables are lazily loaded from a file on disk. Creating new objects instead of mutating existing objects often results in easier-to-understand code, so we encourage using this approach.

Renaming variables
~~~~~~~~~~~~~~~~~~

Another useful option is the :py:meth:`~xarray.Dataset.rename` method to rename dataset variables:

.. jupyter-execute::

    ds.rename({"temperature": "temp", "precipitation": "precip"})

The related :py:meth:`~xarray.Dataset.swap_dims` method allows you to swap dimension and non-dimension variables:

.. jupyter-execute::

    ds.coords["day"] = ("time", [6, 7, 8, 9])
    ds.swap_dims({"time": "day"})

DataTree
--------

:py:class:`~xarray.DataTree` is ``xarray``'s highest-level data structure, able to organise heterogeneous data which could not be stored inside a single :py:class:`~xarray.Dataset` object. This includes representing the recursive structure of multiple `groups`_ within a netCDF file or `Zarr Store`_.

.. _groups: https://www.unidata.ucar.edu/software/netcdf/workshops/2011/groups-types/GroupsIntro.html
.. _Zarr Store: https://zarr.readthedocs.io/en/stable/tutorial.html#groups

Each :py:class:`~xarray.DataTree` object (or "node") contains the same data that a single :py:class:`xarray.Dataset` would (i.e. :py:class:`~xarray.DataArray` objects stored under hashable keys), and so has the same key properties:

- ``dims``: a dictionary mapping of dimension names to lengths, for the variables in this node, and this node's ancestors,
- ``data_vars``: a dict-like container of DataArrays corresponding to variables in this node,
- ``coords``: another dict-like container of DataArrays, corresponding to coordinate variables in this node, and this node's ancestors,
- ``attrs``: dict to hold arbitrary metadata relevant to data in this node.

A single :py:class:`~xarray.DataTree` object acts much like a single :py:class:`~xarray.Dataset` object, and has a similar set of dict-like methods defined upon it. However, :py:class:`~xarray.DataTree`\s can also contain other :py:class:`~xarray.DataTree` objects, so they can be thought of as nested dict-like containers of both :py:class:`xarray.DataArray`\s and :py:class:`~xarray.DataTree`\s.

A single datatree object is known as a "node", and its position relative to other nodes is defined by two more key properties:

- ``children``: A dictionary mapping from names to other :py:class:`~xarray.DataTree` objects, known as its "child nodes".
- ``parent``: The single :py:class:`~xarray.DataTree` object whose children this datatree is a member of, known as its "parent node".

Each child automatically knows about its parent node, and a node without a parent is known as a "root" node (represented by the ``parent`` attribute pointing to ``None``). Nodes can have multiple children, but as each child node has at most one parent, there can only ever be one root node in a given tree.

The overall structure is technically a connected acyclic undirected rooted graph, otherwise known as a `"Tree" `_.

:py:class:`~xarray.DataTree` objects can also optionally have a ``name`` as well as ``attrs``, just like a :py:class:`~xarray.DataArray`. Again these are not normally used unless explicitly accessed by the user.

.. _creating a datatree:

Creating a DataTree
~~~~~~~~~~~~~~~~~~~

One way to create a :py:class:`~xarray.DataTree` from scratch is to create each node individually, specifying the nodes' relationship to one another as you create each one.

The :py:class:`~xarray.DataTree` constructor takes:

- ``dataset``: The data that will be stored in this node, represented by a single :py:class:`xarray.Dataset`.
- ``children``: The various child nodes (if there are any), given as a mapping from string keys to :py:class:`~xarray.DataTree` objects.
- ``name``: A string to use as the name of this node.

Let's make a single datatree node with some example data in it:

.. jupyter-execute::

    ds1 = xr.Dataset({"foo": "orange"})
    dt = xr.DataTree(name="root", dataset=ds1)
    dt

At this point we have created a single node datatree with no parent and no children.

.. jupyter-execute::

    print(dt.parent is None)
    dt.children

We can add a second node to this tree, assigning it to the parent node ``dt``:

.. jupyter-execute::

    dataset2 = xr.Dataset({"bar": 0}, coords={"y": ("y", [0, 1, 2])})
    dt2 = xr.DataTree(name="a", dataset=dataset2)

    # Add the child Datatree to the root node
    dt.children = {"child-node": dt2}
    dt

More idiomatically you can create a tree from a dictionary of ``Datasets`` and ``DataTrees``.
In this case we add a new node under ``dt["child-node"]`` by providing the explicit path under ``"child-node"`` as the dictionary key:

.. jupyter-execute::

    # create a third Dataset
    ds3 = xr.Dataset({"zed": np.nan})

    # create a tree from a dictionary of DataTrees and Datasets
    dt = xr.DataTree.from_dict({"/": dt, "/child-node/new-zed-node": ds3})

We have created a tree with three nodes in it:

.. jupyter-execute::

    dt

Consistency checks are enforced. For instance, if we try to create a cycle, where the root node is also a child of a descendant, the constructor will raise an :py:class:`~xarray.InvalidTreeError`:

.. jupyter-execute::
    :raises:

    dt["child-node"].children = {"new-child": dt}

Alternatively you can also create a :py:class:`~xarray.DataTree` object from:

- A dictionary mapping directory-like paths to either :py:class:`~xarray.DataTree` nodes or data, using :py:meth:`xarray.DataTree.from_dict()`,
- A well-formed netCDF or Zarr file on disk with :py:func:`~xarray.open_datatree()`. See :ref:`reading and writing files `.

For data files with groups that do not align, see :py:func:`xarray.open_groups`, or target each group individually with :py:func:`xarray.open_dataset(group='groupname') `. For more information about coordinate alignment, see :ref:`datatree-inheritance`.

DataTree Contents
~~~~~~~~~~~~~~~~~

Like :py:class:`~xarray.Dataset`, :py:class:`~xarray.DataTree` implements the Python mapping interface, but with values given by either :py:class:`~xarray.DataArray` objects or other :py:class:`~xarray.DataTree` objects.

.. jupyter-execute::

    dt["child-node"]

.. jupyter-execute::

    dt["foo"]

Iterating over keys will iterate over both the names of variables and child nodes.

We can also access all the data in a single node, and its inherited coordinates, through a dataset-like view:

.. jupyter-execute::

    dt["child-node"].dataset

This demonstrates the fact that the data in any one node is equivalent to the contents of a single :py:class:`~xarray.Dataset` object. The :py:attr:`DataTree.dataset ` property returns an immutable view, but we can instead extract the node's data contents as a new and mutable :py:class:`~xarray.Dataset` object via :py:meth:`DataTree.to_dataset() `:

.. jupyter-execute::

    dt["child-node"].to_dataset()

Like with :py:class:`~xarray.Dataset`, you can access the data and coordinate variables of a node separately via the :py:attr:`~xarray.DataTree.data_vars` and :py:attr:`~xarray.DataTree.coords` attributes:

.. jupyter-execute::

    dt["child-node"].data_vars

.. jupyter-execute::

    dt["child-node"].coords

Dictionary-like methods
~~~~~~~~~~~~~~~~~~~~~~~

We can update a datatree in-place using Python's standard dictionary syntax, similar to how we can for Dataset objects. For example, to create this example DataTree from scratch, we could have written:

.. jupyter-execute::

    dt = xr.DataTree(name="root")
    dt["foo"] = "orange"
    dt["child-node"] = xr.DataTree(
        dataset=xr.Dataset({"bar": 0}, coords={"y": ("y", [0, 1, 2])})
    )
    dt["child-node/new-zed-node/zed"] = np.nan
    dt

To change the variables in a node of a :py:class:`~xarray.DataTree`, you can use all the standard dictionary methods, including ``values``, ``items``, ``__delitem__``, ``get`` and :py:meth:`xarray.DataTree.update`. Note that assigning a :py:class:`~xarray.DataTree` object to a :py:class:`~xarray.DataTree` variable using ``__setitem__`` or :py:meth:`~xarray.DataTree.update` will :ref:`automatically align ` the array(s) to the original node's indexes.
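For instance, here is a small sketch of growing and pruning a node with dict-style syntax (``bar_doubled`` is an illustrative name, not part of the example tree above):

.. code-block:: python

    # Assign a derived variable into the child node using a path-like key
    dt["child-node/bar_doubled"] = dt["child-node"]["bar"] * 2

    # __delitem__ works on a node just as it does on a Dataset
    del dt["child-node"]["bar_doubled"]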
If you copy a :py:class:`~xarray.DataTree` using the :py:func:`copy` function or the :py:meth:`xarray.DataTree.copy` method, it will copy the subtree, meaning that node and all the children below it, but no parents above it. Like for :py:class:`~xarray.Dataset`, this copy is shallow by default, but you can copy all the underlying data arrays by calling ``dt.copy(deep=True)``.

.. _datatree-inheritance:

DataTree Inheritance
~~~~~~~~~~~~~~~~~~~~

DataTree implements a simple inheritance mechanism. Coordinates, dimensions and their associated indices are propagated downward, starting from the root node to all descendent nodes. Coordinate inheritance was inspired by the NetCDF-CF inherited dimensions, but DataTree's inheritance is slightly stricter yet easier to reason about.

The constraint that this puts on a DataTree is that dimensions and indices that are inherited must be aligned with any direct descendant node's existing dimension or index. This allows descendants to use dimensions defined in ancestor nodes, without duplicating that information. But as a consequence, if a dimension-name is defined on a node and that same dimension-name exists in one of its ancestors, they must align (have the same index and size).

Some examples:

.. jupyter-execute::

    # Set up coordinates
    time = xr.DataArray(data=["2022-01", "2023-01"], dims="time")
    stations = xr.DataArray(data=list("abcdef"), dims="station")
    lon = [-100, -80, -60]
    lat = [10, 20, 30]

    # Set up fake data
    wind_speed = xr.DataArray(np.ones((2, 6)) * 2, dims=("time", "station"))
    pressure = xr.DataArray(np.ones((2, 6)) * 3, dims=("time", "station"))
    air_temperature = xr.DataArray(np.ones((2, 6)) * 4, dims=("time", "station"))
    dewpoint = xr.DataArray(np.ones((2, 6)) * 5, dims=("time", "station"))
    infrared = xr.DataArray(np.ones((2, 3, 3)) * 6, dims=("time", "lon", "lat"))
    true_color = xr.DataArray(np.ones((2, 3, 3)) * 7, dims=("time", "lon", "lat"))

    dt2 = xr.DataTree.from_dict(
        {
            "/": xr.Dataset(
                coords={"time": time},
            ),
            "/weather": xr.Dataset(
                coords={"station": stations},
                data_vars={
                    "wind_speed": wind_speed,
                    "pressure": pressure,
                },
            ),
            "/weather/temperature": xr.Dataset(
                data_vars={
                    "air_temperature": air_temperature,
                    "dewpoint": dewpoint,
                },
            ),
            "/satellite": xr.Dataset(
                coords={"lat": lat, "lon": lon},
                data_vars={
                    "infrared": infrared,
                    "true_color": true_color,
                },
            ),
        },
    )
    dt2

Here there are four different coordinate variables, which apply to variables in the DataTree in different ways:

- ``time`` is a shared coordinate used by both ``weather`` and ``satellite`` variables
- ``station`` is used only for ``weather`` variables
- ``lat`` and ``lon`` are only used for ``satellite`` images

Coordinate variables are inherited to descendent nodes, which is only possible because variables at different levels of a hierarchical DataTree are always aligned. Placing the ``time`` variable at the root node automatically indicates that it applies to all descendent nodes. Similarly, ``station`` is in the base ``weather`` node, because it applies to all weather variables, both directly in ``weather`` and in the ``temperature`` sub-tree. Notice the inherited coordinates are explicitly shown in the tree representation under ``Inherited coordinates:``.

.. jupyter-execute::

    dt2["/weather"]
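To see the alignment constraint in action, here is a hedged sketch of a failure case (illustrative, not from the original example): attaching a child whose ``time`` dimension disagrees with the inherited ``time`` index fails.

.. code-block:: python

    # The root defines a time index of length 2; the child uses a time
    # dimension of length 3, so inherited-dimension alignment fails.
    bad = xr.Dataset({"foo": ("time", [1, 2, 3])})
    xr.DataTree.from_dict(
        {"/": xr.Dataset(coords={"time": time}), "/child": bad}
    )  # expected to raise an alignment error (the exact exception may vary)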
Accessing any of the lower level trees through the :py:func:`.dataset ` property automatically includes coordinates from higher levels (e.g., ``time`` and ``station``):

.. jupyter-execute::

    dt2["/weather/temperature"].dataset

Similarly, when you retrieve a Dataset through :py:func:`~xarray.DataTree.to_dataset`, the inherited coordinates are included by default unless you exclude them with the ``inherit`` flag:

.. jupyter-execute::

    dt2["/weather/temperature"].to_dataset()

.. jupyter-execute::

    dt2["/weather/temperature"].to_dataset(inherit=False)

For more examples and further discussion, see :ref:`alignment and coordinate inheritance `.

.. _coordinates:

Coordinates
-----------

Coordinates are ancillary variables stored for ``DataArray`` and ``Dataset`` objects in the ``coords`` attribute:

.. jupyter-execute::

    ds.coords

Unlike attributes, xarray *does* interpret and persist coordinates in operations that transform xarray objects. There are two types of coordinates in xarray:

- **dimension coordinates** are one dimensional coordinates with a name equal to their sole dimension (marked by ``*`` when printing a dataset or data array). They are used for label based indexing and alignment, like the ``index`` found on a pandas :py:class:`~pandas.DataFrame` or :py:class:`~pandas.Series`. Indeed, these "dimension" coordinates use a :py:class:`pandas.Index` internally to store their values.

- **non-dimension coordinates** are variables that contain coordinate data, but are not a dimension coordinate. They can be multidimensional (see :ref:`/examples/multidimensional-coords.ipynb`), and there is no relationship between the name of a non-dimension coordinate and the name(s) of its dimension(s). Non-dimension coordinates can be useful for indexing or plotting; otherwise, xarray does not make any direct use of the values associated with them. They are not used for alignment or automatic indexing, nor are they required to match when doing arithmetic (see :ref:`coordinates math`).

.. note::

    Xarray's terminology differs from the `CF terminology`_, where the "dimension coordinates" are called "coordinate variables", and the "non-dimension coordinates" are called "auxiliary coordinate variables" (see :issue:`1295` for more details).

.. _CF terminology: https://cfconventions.org/cf-conventions/v1.6.0/cf-conventions.html#terminology

Modifying coordinates
~~~~~~~~~~~~~~~~~~~~~

To entirely add or remove coordinate arrays, you can use dictionary-like syntax, as shown above.

To convert back and forth between data and coordinates, you can use the :py:meth:`~xarray.Dataset.set_coords` and :py:meth:`~xarray.Dataset.reset_coords` methods:

.. jupyter-execute::

    ds.reset_coords()

.. jupyter-execute::

    ds.set_coords(["temperature", "precipitation"])

.. jupyter-execute::

    ds["temperature"].reset_coords(drop=True)

Notice that these operations skip coordinates with names given by dimensions, as used for indexing. This is mostly because we are not entirely sure how to design the interface around the fact that xarray cannot store a coordinate and variable with the same name but different values in the same dictionary. But we do recognize that supporting something like this would be useful.

Coordinates methods
~~~~~~~~~~~~~~~~~~~

``Coordinates`` objects also have a few useful methods, mostly for converting them into dataset objects:

.. jupyter-execute::

    ds.coords.to_dataset()

The merge method is particularly interesting, because it implements the same logic used for merging coordinates in arithmetic operations (see :ref:`compute`):
.. jupyter-execute::

    alt = xr.Dataset(coords={"z": [10], "lat": 0, "lon": 0})
    ds.coords.merge(alt.coords)

The ``coords.merge`` method may be useful if you want to implement your own binary operations that act on xarray objects. In the future, we hope to write more helper functions so that you can easily make your functions act like xarray's built-in arithmetic.

Indexes
~~~~~~~

To convert a coordinate (or any ``DataArray``) into an actual :py:class:`pandas.Index`, use the :py:meth:`~xarray.DataArray.to_index` method:

.. jupyter-execute::

    ds["time"].to_index()

A useful shortcut is the ``indexes`` property (on both ``DataArray`` and ``Dataset``), which lazily constructs a dictionary whose keys are given by each dimension and whose values are ``Index`` objects:

.. jupyter-execute::

    ds.indexes

MultiIndex coordinates
~~~~~~~~~~~~~~~~~~~~~~

Xarray supports labeling coordinate values with a :py:class:`pandas.MultiIndex`:

.. jupyter-execute::

    midx = pd.MultiIndex.from_arrays(
        [["R", "R", "V", "V"], [0.1, 0.2, 0.7, 0.9]], names=("band", "wn")
    )
    mda = xr.DataArray(np.random.rand(4), coords={"spec": midx}, dims="spec")
    mda

For convenience multi-index levels are directly accessible as "virtual" or "derived" coordinates (marked by ``-`` when printing a dataset or data array):

.. jupyter-execute::

    mda["band"]

.. jupyter-execute::

    mda.wn

Indexing with multi-index levels is also possible using the ``sel`` method (see :ref:`multi-level indexing`).

Unlike other coordinates, "virtual" level coordinates are not stored in the ``coords`` attribute of ``DataArray`` and ``Dataset`` objects (although they are shown when printing the ``coords`` attribute). Consequently, most of the coordinates-related methods do not apply to them, and a "virtual" level coordinate cannot be used to replace one particular level.

Because in a ``DataArray`` or ``Dataset`` object each multi-index level is accessible as a "virtual" coordinate, its name must not conflict with the names of the other levels, coordinates and data variables of the same object. Even though xarray sets default names for multi-indexes with unnamed levels, it is recommended that you explicitly set the names of the levels.

.. [1] Latitude and longitude are 2D arrays because the dataset uses `projected coordinates`__. ``reference_time`` refers to the reference time at which the forecast was made, rather than ``time`` which is the valid time for which the forecast applies.

__ https://en.wikipedia.org/wiki/Map_projection

.. currentmodule:: xarray

.. _userguide.duckarrays:

Working with numpy-like arrays
==============================

NumPy-like arrays (often known as :term:`duck array`\s) are drop-in replacements for the :py:class:`numpy.ndarray` class but with different features, such as propagating physical units or a different layout in memory. Xarray can often wrap these array types, allowing you to use labelled dimensions and indexes whilst benefiting from the additional features of these array libraries.

Some numpy-like array types that xarray already has some support for:

* `Cupy `_ - GPU support (see `cupy-xarray `_),
* `Sparse `_ - for performant arrays with many zero elements,
* `Pint `_ - for tracking the physical units of your data (see `pint-xarray `_),
* `Dask `_ - parallel computing on larger-than-memory arrays (see :ref:`using dask with xarray `),
* `Cubed `_ - another parallel computing framework that emphasises reliability (see `cubed-xarray `_).

.. warning:: This feature should be considered somewhat experimental.
Please report any bugs you find on `xarray’s issue tracker `_. .. note:: For information on wrapping dask arrays see :ref:`dask`. Whilst xarray wraps dask arrays in a similar way to that described on this page, chunked array types like :py:class:`dask.array.Array` implement additional methods that require slightly different user code (e.g. calling ``.chunk`` or ``.compute``). See the docs on :ref:`wrapping chunked arrays `. Why "duck"? ----------- Why is it also called a "duck" array? This comes from a common statement of object-oriented programming - "If it walks like a duck, and quacks like a duck, treat it like a duck". In other words, a library like xarray that is capable of using multiple different types of arrays does not have to explicitly check that each one it encounters is permitted (e.g. ``if dask``, ``if numpy``, ``if sparse`` etc.). Instead xarray can take the more permissive approach of simply treating the wrapped array as valid, attempting to call the relevant methods (e.g. ``.mean()``) and only raising an error if a problem occurs (e.g. the method is not found on the wrapped class). This is much more flexible, and allows objects and classes from different libraries to work together more easily. What is a numpy-like array? --------------------------- A "numpy-like array" (also known as a "duck array") is a class that contains array-like data, and implements key numpy-like functionality such as indexing, broadcasting, and computation methods. For example, the `sparse `_ library provides a sparse array type which is useful for representing nD array objects like sparse matrices in a memory-efficient manner. We can create a sparse array object (of the :py:class:`sparse.COO` type) from a numpy array like this: .. jupyter-execute:: from sparse import COO import xarray as xr import numpy as np %xmode minimal .. jupyter-execute:: x = np.eye(4, dtype=np.uint8) # create diagonal identity matrix s = COO.from_numpy(x) s This sparse object does not attempt to explicitly store every element in the array, only the non-zero elements. This approach is much more efficient for large arrays with only a few non-zero elements (such as tri-diagonal matrices). Sparse array objects can be converted back to a "dense" numpy array by calling :py:meth:`sparse.COO.todense`. Just like :py:class:`numpy.ndarray` objects, :py:class:`sparse.COO` arrays support indexing .. jupyter-execute:: s[1, 1] # diagonal elements should be ones .. jupyter-execute:: s[2, 3] # off-diagonal elements should be zero broadcasting, .. jupyter-execute:: x2 = np.zeros( (4, 1), dtype=np.uint8 ) # create second sparse array of different shape s2 = COO.from_numpy(x2) (s * s2) # multiplication requires broadcasting and various computation methods .. jupyter-execute:: s.sum(axis=1) This numpy-like array also supports calling so-called `numpy ufuncs `_ ("universal functions") on it directly: .. jupyter-execute:: np.sum(s, axis=1) Notice that in each case the API for calling the operation on the sparse array is identical to that of calling it on the equivalent numpy array - this is the sense in which the sparse array is "numpy-like". .. note:: For discussion on exactly which methods a class needs to implement to be considered "numpy-like", see :ref:`internals.duckarrays`. Wrapping numpy-like arrays in xarray ------------------------------------ :py:class:`DataArray`, :py:class:`Dataset`, and :py:class:`Variable` objects can wrap these numpy-like arrays. 
Constructing xarray objects which wrap numpy-like arrays ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ The primary way to create an xarray object which wraps a numpy-like array is to pass that numpy-like array instance directly to the constructor of the xarray class. The :ref:`page on xarray data structures ` shows how :py:class:`DataArray` and :py:class:`Dataset` both accept data in various forms through their ``data`` argument, but in fact this data can also be any wrappable numpy-like array. For example, we can wrap the sparse array we created earlier inside a new DataArray object: .. jupyter-execute:: s_da = xr.DataArray(s, dims=["i", "j"]) s_da We can see what's inside - the printable representation of our xarray object (the repr) automatically uses the printable representation of the underlying wrapped array. Of course our sparse array object is still there underneath - it's stored under the ``.data`` attribute of the dataarray: .. jupyter-execute:: s_da.data Array methods ~~~~~~~~~~~~~ We saw above that numpy-like arrays provide numpy methods. Xarray automatically uses these when you call the corresponding xarray method: .. jupyter-execute:: s_da.sum(dim="j") Converting wrapped types ~~~~~~~~~~~~~~~~~~~~~~~~ If you want to change the type inside your xarray object you can use :py:meth:`DataArray.as_numpy`: .. jupyter-execute:: s_da.as_numpy() This returns a new :py:class:`DataArray` object, but now wrapping a normal numpy array. If instead you want to convert to numpy and return that numpy array you can use either :py:meth:`DataArray.to_numpy` or :py:meth:`DataArray.values`, where the former is strongly preferred. The difference is in the way they coerce to numpy - :py:meth:`~DataArray.values` always uses :py:func:`numpy.asarray` which will fail for some array types (e.g. ``cupy``), whereas :py:meth:`~DataArray.to_numpy` uses the correct method depending on the array type. .. jupyter-execute:: s_da.to_numpy() .. jupyter-execute:: :raises: s_da.values This illustrates the difference between :py:meth:`~DataArray.data` and :py:meth:`~DataArray.values`, which is sometimes a point of confusion for new xarray users. Explicitly: :py:meth:`DataArray.data` returns the underlying numpy-like array, regardless of type, whereas :py:meth:`DataArray.values` converts the underlying array to a numpy array before returning it. (This is another reason to use :py:meth:`~DataArray.to_numpy` over :py:meth:`~DataArray.values` - the intention is clearer.) Conversion to numpy as a fallback ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ If a wrapped array does not implement the corresponding array method then xarray will often attempt to convert the underlying array to a numpy array so that the operation can be performed. You may want to watch out for this behavior, and report any instances in which it causes problems. 
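One practical way to spot such conversions (a minimal sketch using the sparse-backed ``s_da`` from above; the choice of operation is illustrative) is to check the type of ``.data`` after an operation:

.. code-block:: python

    import sparse

    result = s_da * 2  # scalar multiplication is implemented by sparse

    # If the duck array survived the operation, .data is still a sparse.COO;
    # a plain numpy.ndarray here would indicate a silent fallback conversion.
    assert isinstance(result.data, sparse.COO)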
Most of xarray's API does support using :term:`duck array` objects, but there are a few areas where the code will still convert to ``numpy`` arrays:

- Dimension coordinates, and thus all indexing operations:

  * :py:meth:`Dataset.sel` and :py:meth:`DataArray.sel`
  * :py:meth:`Dataset.loc` and :py:meth:`DataArray.loc`
  * :py:meth:`Dataset.drop_sel` and :py:meth:`DataArray.drop_sel`
  * :py:meth:`Dataset.reindex`, :py:meth:`Dataset.reindex_like`, :py:meth:`DataArray.reindex` and :py:meth:`DataArray.reindex_like`: duck arrays in data variables and non-dimension coordinates won't be cast

- Functions and methods that depend on external libraries or features of ``numpy`` not covered by ``__array_function__`` / ``__array_ufunc__``:

  * :py:meth:`Dataset.ffill` and :py:meth:`DataArray.ffill` (uses ``bottleneck``)
  * :py:meth:`Dataset.bfill` and :py:meth:`DataArray.bfill` (uses ``bottleneck``)
  * :py:meth:`Dataset.interp`, :py:meth:`Dataset.interp_like`, :py:meth:`DataArray.interp` and :py:meth:`DataArray.interp_like` (uses ``scipy``): duck arrays in data variables and non-dimension coordinates will be cast, in addition to not supporting duck arrays in dimension coordinates
  * :py:meth:`Dataset.rolling` and :py:meth:`DataArray.rolling` (requires ``numpy>=1.20``)
  * :py:meth:`Dataset.rolling_exp` and :py:meth:`DataArray.rolling_exp` (uses ``numbagg``)
  * :py:meth:`Dataset.interpolate_na` and :py:meth:`DataArray.interpolate_na` (uses :py:class:`numpy.vectorize`)
  * :py:func:`apply_ufunc` with ``vectorize=True`` (uses :py:class:`numpy.vectorize`)

- Incompatibilities between different :term:`duck array` libraries:

  * :py:meth:`Dataset.chunk` and :py:meth:`DataArray.chunk`: this fails if the data was not already chunked and the :term:`duck array` (e.g. a ``pint`` quantity) should wrap the new ``dask`` array; changing the chunk sizes works, however.

Extensions using duck arrays
----------------------------

Whilst the features above allow many numpy-like array libraries to be used pretty seamlessly with xarray, it often also makes sense to use an interfacing package to make certain tasks easier.

For example the `pint-xarray package `_ offers a custom ``.pint`` accessor (see :ref:`internals.accessors`) which provides convenient access to information stored within the wrapped array (e.g. ``.units`` and ``.magnitude``), and makes creating wrapped pint arrays (and especially xarray-wrapping-pint-wrapping-dask arrays) simpler for the user.

We maintain a list of libraries extending ``xarray`` to make working with particular wrapped duck arrays easier. If you know of more that aren't on this list please raise an issue to add them!

- `pint-xarray `_
- `cupy-xarray `_
- `cubed-xarray `_

.. _ecosystem:

Xarray related projects
-----------------------

Below is a list of existing open source projects that build functionality upon xarray. See also section :ref:`internals` for more details on how to build xarray extensions. We also maintain the `xarray-contrib `_ GitHub organization as a place to curate projects that build upon xarray.

Geosciences
~~~~~~~~~~~

- `aospy `_: Automated analysis and management of gridded climate data.
- `argopy `_: xarray-based Argo data access, manipulation and visualisation for standard users as well as Argo experts.
- `cf_xarray `_: Provides an accessor (DataArray.cf or Dataset.cf) that allows you to interpret Climate and Forecast metadata convention attributes present on xarray objects.
- `climpred `_: Analysis of ensemble forecast models for climate prediction.
- `geocube `_: Tool to convert geopandas vector data into rasterized xarray data.
- `GeoWombat `_: Utilities for analysis of remotely sensed and gridded raster data at scale (easily tame Landsat, Sentinel, Quickbird, and PlanetScope).
- `grib2io `_: Utility to work with GRIB2 files including an xarray backend, DASK support for parallel reading in open_mfdataset, lazy loading of data, editing of GRIB2 attributes and GRIB2IO DataArray attrs, and spatial interpolation and reprojection of GRIB2 messages and GRIB2IO Datasets/DataArrays for both grid to grid and grid to stations.
- `gsw-xarray `_: a wrapper around `gsw `_ that adds CF-compliant attributes when possible, units, name.
- `infinite-diff `_: xarray-based finite-differencing, focused on gridded climate/meteorology data
- `marc_analysis `_: Analysis package for CESM/MARC experiments and output.
- `MetPy `_: A collection of tools in Python for reading, visualizing, and performing calculations with weather data.
- `MPAS-Analysis `_: Analysis for simulations produced with Model for Prediction Across Scales (MPAS) components and the Accelerated Climate Model for Energy (ACME).
- `OGGM `_: Open Global Glacier Model
- `Oocgcm `_: Analysis of large gridded geophysical datasets
- `Open Data Cube `_: Analysis toolkit of continental scale Earth Observation data from satellites.
- `Pangaea `_: xarray extension for gridded land surface & weather model output.
- `Pangeo `_: A community effort for big data geoscience in the cloud.
- `PyGDX `_: Python 3 package for accessing data stored in GAMS Data eXchange (GDX) files. Also uses a custom subclass.
- `pyinterp `_: Python 3 package for interpolating geo-referenced data used in the field of geosciences.
- `pyXpcm `_: xarray-based Profile Classification Modelling (PCM), mostly for ocean data.
- `Regionmask `_: plotting and creation of masks of spatial regions
- `rioxarray `_: geospatial xarray extension powered by rasterio
- `salem `_: Adds geolocalised subsetting, masking, and plotting operations to xarray's data structures via accessors.
- `SatPy `_: Library for reading and manipulating meteorological remote sensing data and writing it to various image and data file formats.
- `SARXarray `_: xarray extension for reading and processing large Synthetic Aperture Radar (SAR) data stacks.
- `shxarray `_: Convert, filter, and map geodesy-related spherical harmonic representations of gravity and terrestrial water storage through an xarray extension.
- `Spyfit `_: FTIR spectroscopy of the atmosphere
- `windspharm `_: Spherical harmonic wind analysis in Python.
- `wradlib `_: An Open Source Library for Weather Radar Data Processing.
- `wrf-python `_: A collection of diagnostic and interpolation routines for use with output of the Weather Research and Forecasting (WRF-ARW) Model.
- `xarray-eopf `_: An xarray backend implementation for opening ESA EOPF data products in Zarr format.
- `xarray-regrid `_: xarray extension for regridding rectilinear data.
- `xarray-simlab `_: xarray extension for computer model simulations.
- `xarray-spatial `_: Numba-accelerated raster-based spatial processing tools (NDVI, curvature, zonal-statistics, proximity, hillshading, viewshed, etc.)
- `xarray-topo `_: xarray extension for topographic analysis and modelling.
- `xbpch `_: xarray interface for bpch files.
- `xCDAT `_: An extension of xarray for climate data analysis on structured grids.
- `xclim `_: A library for calculating climate science indices with unit handling built from xarray and dask.
- `xESMF `_: Universal regridder for geospatial data.
- `xgcm `_: Extends the xarray data model to understand finite volume grid cells (common in General Circulation Models) and provides interpolation and difference operations for such grids.
- `xmitgcm `_: a python package for reading `MITgcm `_ binary MDS files into xarray data structures.
- `xnemogcm `_: a package to read `NEMO `_ output files and add attributes to interface with xgcm.

Machine Learning
~~~~~~~~~~~~~~~~

- `ArviZ `_: Exploratory analysis of Bayesian models, built on top of xarray.
- `Darts `_: User-friendly modern machine learning for time series in Python.
- `Elm `_: Parallel machine learning on xarray data structures
- `sklearn-xarray (1) `_: Combines scikit-learn and xarray (1).
- `sklearn-xarray (2) `_: Combines scikit-learn and xarray (2).
- `xbatcher `_: Batch Generation from Xarray Datasets.

Other domains
~~~~~~~~~~~~~

- `ptsa `_: EEG Time Series Analysis
- `pycalphad `_: Computational Thermodynamics in Python
- `pyomeca `_: Python framework for biomechanical analysis
- `movement `_: A Python toolbox for analysing animal body movements

Extend xarray capabilities
~~~~~~~~~~~~~~~~~~~~~~~~~~

- `Collocate `_: Collocate xarray trajectories in arbitrary physical dimensions
- `eofs `_: EOF analysis in Python.
- `hypothesis-gufunc `_: Extension to hypothesis. Makes it easy to write unit tests with xarray objects as input.
- `ntv-pandas `_: A tabular analyzer and a semantic, compact and reversible converter for multidimensional and tabular data
- `nxarray `_: NeXus input/output capability for xarray.
- `xarray-compare `_: xarray extension for data comparison.
- `xarray-dataclasses `_: xarray extension for typed DataArray and Dataset creation.
- `xarray_einstats `_: Statistics, linear algebra and einops for xarray
- `xarray_extras `_: Advanced algorithms for xarray objects (e.g. integrations/interpolations).
- `xeofs `_: PCA/EOF analysis and related techniques, integrated with xarray and Dask for efficient handling of large-scale data.
- `xpublish `_: Publish Xarray Datasets via a Zarr compatible REST API.
- `xrft `_: Fourier transforms for xarray data.
- `xr-scipy `_: A lightweight scipy wrapper for xarray.
- `X-regression `_: Multiple linear regression from Statsmodels library coupled with Xarray library.
- `xskillscore `_: Metrics for verifying forecasts.
- `xyzpy `_: Easily generate high dimensional data, including parallelization.
- `xarray-lmfit `_: xarray extension for curve fitting using `lmfit `_.

Visualization
~~~~~~~~~~~~~

- `datashader `_, `geoviews `_, `holoviews `_: visualization packages for large data.
- `hvplot `_: A high-level plotting API for the PyData ecosystem built on HoloViews.
- `psyplot `_: Interactive data visualization with python.
- `xarray-leaflet `_: An xarray extension for tiled map plotting based on ipyleaflet.
- `xtrude `_: An xarray extension for 3D terrain visualization based on pydeck.
- `pyvista-xarray `_: xarray DataArray accessor for 3D visualization with `PyVista `_ and DataSet engines for reading VTK data formats.

Non-Python projects
~~~~~~~~~~~~~~~~~~~

- `xframe `_: C++ data structures inspired by xarray.
- `AxisArrays `_, `NamedArrays `_ and `YAXArrays.jl `_: similar data structures for Julia.

More projects can be found at the `"xarray" Github topic `_.

.. currentmodule:: xarray

.. _groupby:

GroupBy: Group and Bin Data
---------------------------

Often we want to bin or group data, produce statistics (mean, variance) on the groups, and then return a reduced data set.
To do this, Xarray supports `"group by"`__ operations with the same API as pandas to implement the `split-apply-combine`__ strategy: __ https://pandas.pydata.org/pandas-docs/stable/groupby.html __ https://www.jstatsoft.org/v40/i01/paper - Split your data into multiple independent groups. - Apply some function to each group. - Combine your groups back into a single data object. Group by operations work on both :py:class:`Dataset` and :py:class:`DataArray` objects. Most of the examples focus on grouping by a single one-dimensional variable, although support for grouping over a multi-dimensional variable has recently been implemented. Note that for one-dimensional data, it is usually faster to rely on pandas' implementation of the same pipeline. .. tip:: `Install the flox package `_ to substantially improve the performance of GroupBy operations, particularly with dask. flox `extends Xarray's in-built GroupBy capabilities `_ by allowing grouping by multiple variables, and lazy grouping by dask arrays. If installed, Xarray will automatically use flox by default. Split ~~~~~ Let's create a simple example dataset: .. jupyter-execute:: :hide-code: import numpy as np import pandas as pd import xarray as xr np.random.seed(123456) .. jupyter-execute:: ds = xr.Dataset( {"foo": (("x", "y"), np.random.rand(4, 3))}, coords={"x": [10, 20, 30, 40], "letters": ("x", list("abba"))}, ) arr = ds["foo"] ds If we groupby the name of a variable or coordinate in a dataset (we can also use a DataArray directly), we get back a ``GroupBy`` object: .. jupyter-execute:: ds.groupby("letters") This object works very similarly to a pandas GroupBy object. You can view the group indices with the ``groups`` attribute: .. jupyter-execute:: ds.groupby("letters").groups You can also iterate over groups in ``(label, group)`` pairs: .. jupyter-execute:: list(ds.groupby("letters")) You can index out a particular group: .. jupyter-execute:: ds.groupby("letters")["b"] To group by multiple variables, see :ref:`this section `. Binning ~~~~~~~ Sometimes you don't want to use all the unique values to determine the groups but instead want to "bin" the data into coarser groups. You could always create a customized coordinate, but xarray facilitates this via the :py:meth:`Dataset.groupby_bins` method. .. jupyter-execute:: x_bins = [0, 25, 50] ds.groupby_bins("x", x_bins).groups The binning is implemented via :func:`pandas.cut`, whose documentation details how the bins are assigned. As seen in the example above, by default, the bins are labeled with strings using set notation to precisely identify the bin limits. To override this behavior, you can specify the bin labels explicitly. Here we choose ``float`` labels which identify the bin centers: .. jupyter-execute:: x_bin_labels = [12.5, 37.5] ds.groupby_bins("x", x_bins, labels=x_bin_labels).groups Apply ~~~~~ To apply a function to each group, you can use the flexible :py:meth:`core.groupby.DatasetGroupBy.map` method. The resulting objects are automatically concatenated back together along the group axis: .. jupyter-execute:: def standardize(x): return (x - x.mean()) / x.std() arr.groupby("letters").map(standardize) GroupBy objects also have a :py:meth:`core.groupby.DatasetGroupBy.reduce` method and methods like :py:meth:`core.groupby.DatasetGroupBy.mean` as shortcuts for applying an aggregation function: .. jupyter-execute:: arr.groupby("letters").mean(dim="x") Using a groupby is thus also a convenient shortcut for aggregating over all dimensions *other than* the provided one: .. 
Split ~~~~~ Let's create a simple example dataset: .. jupyter-execute:: :hide-code: import numpy as np import pandas as pd import xarray as xr np.random.seed(123456) .. jupyter-execute:: ds = xr.Dataset( {"foo": (("x", "y"), np.random.rand(4, 3))}, coords={"x": [10, 20, 30, 40], "letters": ("x", list("abba"))}, ) arr = ds["foo"] ds If we groupby the name of a variable or coordinate in a dataset (we can also use a DataArray directly), we get back a ``GroupBy`` object: .. jupyter-execute:: ds.groupby("letters") This object works very similarly to a pandas GroupBy object. You can view the group indices with the ``groups`` attribute: .. jupyter-execute:: ds.groupby("letters").groups You can also iterate over groups in ``(label, group)`` pairs: .. jupyter-execute:: list(ds.groupby("letters")) You can index out a particular group: .. jupyter-execute:: ds.groupby("letters")["b"] To group by multiple variables, see :ref:`this section `. Binning ~~~~~~~ Sometimes you don't want to use all the unique values to determine the groups but instead want to "bin" the data into coarser groups. You could always create a customized coordinate, but xarray facilitates this via the :py:meth:`Dataset.groupby_bins` method. .. jupyter-execute:: x_bins = [0, 25, 50] ds.groupby_bins("x", x_bins).groups The binning is implemented via :func:`pandas.cut`, whose documentation details how the bins are assigned. As seen in the example above, by default, the bins are labeled with strings using set notation to precisely identify the bin limits. To override this behavior, you can specify the bin labels explicitly. Here we choose ``float`` labels which identify the bin centers: .. jupyter-execute:: x_bin_labels = [12.5, 37.5] ds.groupby_bins("x", x_bins, labels=x_bin_labels).groups Apply ~~~~~ To apply a function to each group, you can use the flexible :py:meth:`core.groupby.DatasetGroupBy.map` method. The resulting objects are automatically concatenated back together along the group axis: .. jupyter-execute:: def standardize(x): return (x - x.mean()) / x.std() arr.groupby("letters").map(standardize) GroupBy objects also have a :py:meth:`core.groupby.DatasetGroupBy.reduce` method and methods like :py:meth:`core.groupby.DatasetGroupBy.mean` as shortcuts for applying an aggregation function: .. jupyter-execute:: arr.groupby("letters").mean(dim="x") Using a groupby is thus also a convenient shortcut for aggregating over all dimensions *other than* the provided one: .. jupyter-execute:: ds.groupby("x").std(...) .. note:: We use an ellipsis (`...`) here to indicate we want to reduce over all other dimensions. First and last ~~~~~~~~~~~~~~ There are two special aggregation operations that are currently only found on groupby objects: first and last. These return the first or last value for each group along the grouped dimension: .. jupyter-execute:: ds.groupby("letters").first(...) By default, they skip missing values (control this with ``skipna``).
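For instance, with ``skipna=False`` the result is whatever value comes first in each group, even if it is ``NaN``. A minimal sketch:

.. code-block:: python

    # keep a leading NaN instead of skipping to the next valid value
    ds.groupby("letters").first(skipna=False)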
Grouped arithmetic ~~~~~~~~~~~~~~~~~~ GroupBy objects also support a limited set of binary arithmetic operations, as a shortcut for mapping over all unique labels. Binary arithmetic is supported for ``(GroupBy, Dataset)`` and ``(GroupBy, DataArray)`` pairs, as long as the dataset or data array uses the unique grouped values as one of its index coordinates. For example: .. jupyter-execute:: alt = arr.groupby("letters").mean(...) alt .. jupyter-execute:: ds.groupby("letters") - alt This last line is roughly equivalent to the following:: results = [] for label, group in ds.groupby('letters'): results.append(group - alt.sel(letters=label)) xr.concat(results, dim='x') .. _groupby.multidim: Multidimensional Grouping ~~~~~~~~~~~~~~~~~~~~~~~~~ Many datasets have a multidimensional coordinate variable (e.g. longitude) which is different from the logical grid dimensions (e.g. nx, ny). Such variables are valid under the `CF conventions`__. Xarray supports groupby operations over multidimensional coordinate variables: __ https://cfconventions.org/cf-conventions/v1.6.0/cf-conventions.html#_two_dimensional_latitude_longitude_coordinate_variables .. jupyter-execute:: da = xr.DataArray( [[0, 1], [2, 3]], coords={ "lon": (["ny", "nx"], [[30, 40], [40, 50]]), "lat": (["ny", "nx"], [[10, 10], [20, 20]]), }, dims=["ny", "nx"], ) da .. jupyter-execute:: da.groupby("lon").sum(...) .. jupyter-execute:: da.groupby("lon").map(lambda x: x - x.mean(), shortcut=False) Because multidimensional groups have the ability to generate a very large number of bins, coarse-binning via :py:meth:`Dataset.groupby_bins` may be desirable: .. jupyter-execute:: da.groupby_bins("lon", [0, 45, 50]).sum() These methods group by ``lon`` values. It is also possible to groupby each cell in a grid, regardless of value, by stacking multiple dimensions, applying your function, and then unstacking the result: .. jupyter-execute:: stacked = da.stack(gridcell=["ny", "nx"]) stacked.groupby("gridcell").sum(...).unstack("gridcell") Alternatively, you can groupby both ``lat`` and ``lon`` at the :ref:`same time `. .. _groupby.groupers: Grouper Objects ~~~~~~~~~~~~~~~ Both ``groupby_bins`` and ``resample`` are specializations of the core ``groupby`` operation for binning and time resampling, respectively. Many problems demand more complex GroupBy application: for example, grouping by multiple variables with a combination of categorical grouping, binning, and resampling; or more specializations like spatial resampling; or more complex time grouping like special handling of seasons, or the ability to specify custom seasons. To handle these use-cases and more, Xarray is evolving to provide an extension point using ``Grouper`` objects. .. tip:: See the `grouper design`_ doc for more detail on the motivation and design ideas behind Grouper objects. .. _grouper design: https://github.com/pydata/xarray/blob/main/design_notes/grouper_objects.md For now Xarray provides three specialized Grouper objects: 1. :py:class:`groupers.UniqueGrouper` for categorical grouping 2. :py:class:`groupers.BinGrouper` for binned grouping 3. :py:class:`groupers.TimeResampler` for resampling along a datetime coordinate These provide functionality identical to the existing ``groupby``, ``groupby_bins``, and ``resample`` methods. That is, .. code-block:: python ds.groupby("x") is identical to .. code-block:: python from xarray.groupers import UniqueGrouper ds.groupby(x=UniqueGrouper()) Similarly, .. code-block:: python ds.groupby_bins("x", bins=bins) is identical to .. code-block:: python from xarray.groupers import BinGrouper ds.groupby(x=BinGrouper(bins)) and .. code-block:: python ds.resample(time="ME") is identical to .. code-block:: python from xarray.groupers import TimeResampler ds.resample(time=TimeResampler("ME")) The :py:class:`groupers.UniqueGrouper` accepts an optional ``labels`` kwarg that is not present in :py:meth:`DataArray.groupby` or :py:meth:`Dataset.groupby`. Specifying ``labels`` is required when grouping by a lazy array type (e.g. dask or cubed). The ``labels`` are used to construct the output coordinate (say, for a reduction), and aggregations will only be run over the specified labels. You may also use ``labels`` to specify the ordering of groups to be used during iteration. The order will be preserved in the output.
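For example, you might fix both the set of groups and their output order with something like the following sketch (``ds`` and its ``letters`` coordinate are the example dataset defined above):

.. code-block:: python

    from xarray.groupers import UniqueGrouper

    # aggregate only over the listed groups, in exactly this order
    ds.groupby(letters=UniqueGrouper(labels=["b", "a"])).sum()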
.. _groupby.multiple: Grouping by multiple variables ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Use grouper objects to group by multiple dimensions: .. jupyter-execute:: from xarray.groupers import UniqueGrouper da.groupby(["lat", "lon"]).sum() The above is sugar for using ``UniqueGrouper`` objects directly: .. jupyter-execute:: da.groupby(lat=UniqueGrouper(), lon=UniqueGrouper()).sum() Different groupers can be combined to construct sophisticated GroupBy operations. .. jupyter-execute:: from xarray.groupers import BinGrouper ds.groupby(x=BinGrouper(bins=[5, 15, 25]), letters=UniqueGrouper()).sum() Time Grouping and Resampling ~~~~~~~~~~~~~~~~~~~~~~~~~~~~ .. seealso:: See :ref:`resampling`. Shuffling ~~~~~~~~~ Shuffling generalizes sorting a DataArray or Dataset by another DataArray (call it ``label``), and follows from the idea of grouping by ``label``. Shuffling reorders the DataArray or the DataArrays in a Dataset such that all members of a group occur sequentially. For example, shuffle the object using either :py:class:`DatasetGroupBy` or :py:class:`DataArrayGroupBy`, as appropriate: .. jupyter-execute:: da = xr.DataArray( dims="x", data=[1, 2, 3, 4, 5, 6], coords={"label": ("x", "a b c a b c".split(" "))}, ) da.groupby("label").shuffle_to_chunks() For chunked array types (e.g. dask or cubed), shuffle may result in a more optimized communication pattern when compared to direct indexing by the appropriate indexer. Shuffling also makes GroupBy operations on chunked arrays an embarrassingly parallel problem, and may significantly improve workloads that use :py:meth:`DatasetGroupBy.map` or :py:meth:`DataArrayGroupBy.map`. .. _userguide.hierarchical-data: Hierarchical data ================= .. jupyter-execute:: :hide-code: :hide-output: import numpy as np import pandas as pd import xarray as xr np.random.seed(123456) np.set_printoptions(threshold=10) %xmode minimal .. _why: Why Hierarchical Data? ---------------------- Many real-world datasets are composed of multiple differing components, and it can often be useful to think of these in terms of a hierarchy of related groups of data. Examples of data which one might want to organise in a grouped or hierarchical manner include: - Simulation data at multiple resolutions, - Observational data about the same system but from multiple different types of sensors, - Mixed experimental and theoretical data, - A systematic study recording the same experiment but with different parameters, - Heterogeneous data, such as demographic and meteorological data, or even any combination of the above. Often datasets like this cannot easily fit into a single :py:class:`~xarray.Dataset` object, or are more usefully thought of as groups of related :py:class:`~xarray.Dataset` objects. For this purpose we provide the :py:class:`xarray.DataTree` class. This page explains in detail how to understand and use the different features of the :py:class:`~xarray.DataTree` class for your own hierarchical data needs. .. _node relationships: Node Relationships ------------------ .. _creating a family tree: Creating a Family Tree ~~~~~~~~~~~~~~~~~~~~~~ The three main ways of creating a :py:class:`~xarray.DataTree` object are described briefly in :ref:`creating a datatree`. Here we go into more detail about how to create a tree node-by-node, using a famous family tree from the Simpsons cartoon as an example. Let's start by defining nodes representing the two siblings, Bart and Lisa Simpson: .. jupyter-execute:: bart = xr.DataTree(name="Bart") lisa = xr.DataTree(name="Lisa") Each of these node objects knows its own :py:class:`~xarray.DataTree.name`, but they currently have no relationship to one another. We can connect them by creating another node representing a common parent, Homer Simpson: .. jupyter-execute:: homer = xr.DataTree(name="Homer", children={"Bart": bart, "Lisa": lisa}) Here we set the children of Homer in the node's constructor. We now have a small family tree where we can see how these individual Simpson family members are related to one another: .. jupyter-execute:: print(homer) .. note:: We use ``print()`` above to show the compact tree hierarchy. :py:class:`~xarray.DataTree` objects also have an interactive HTML representation that is enabled by default in editors such as JupyterLab and VSCode. The HTML representation is especially helpful for larger trees and exploring new datasets, as it allows you to expand and collapse nodes. If you prefer the text representations you can also set ``xr.set_options(display_style="text")``. .. Comment:: may remove note and print()s after upstream theme changes https://github.com/pydata/pydata-sphinx-theme/pull/2187 The nodes representing Bart and Lisa are now connected - we can confirm their sibling rivalry by examining the :py:class:`~xarray.DataTree.siblings` property: .. jupyter-execute:: list(homer["Bart"].siblings) But oops, we forgot Homer's third daughter, Maggie! Let's add her by updating Homer's :py:class:`~xarray.DataTree.children` property to include her: .. jupyter-execute:: maggie = xr.DataTree(name="Maggie") homer.children = {"Bart": bart, "Lisa": lisa, "Maggie": maggie} print(homer) Let's check that Maggie knows who her Dad is: .. jupyter-execute:: maggie.parent.name That's good - updating the properties of our nodes does not break the internal consistency of our tree, as changes of parentage are automatically reflected on both nodes. These children obviously have another parent, Marge Simpson, but :py:class:`~xarray.DataTree` nodes can only have a maximum of one parent.
Genealogical `family trees are not even technically trees `_ in the mathematical sense - the fact that distant relatives can mate makes them directed acyclic graphs. Trees of :py:class:`~xarray.DataTree` objects cannot represent this. Homer is currently listed as having no parent (the so-called "root node" of this tree), but we can update his :py:class:`~xarray.DataTree.parent` property: .. jupyter-execute:: abe = xr.DataTree(name="Abe") abe.children = {"Homer": homer} Abe is now the "root" of this tree, which we can see by examining the :py:class:`~xarray.DataTree.root` property of any node in the tree: .. jupyter-execute:: maggie.root.name We can see the whole tree by printing Abe's node or just part of the tree by printing Homer's node: .. jupyter-execute:: print(abe) .. jupyter-execute:: print(abe["Homer"]) In episode 28, Abe Simpson reveals that he had another son, Herbert "Herb" Simpson. We can add Herbert to the family tree without displacing Homer by :py:meth:`~xarray.DataTree.assign`-ing another child to Abe: .. jupyter-execute:: herbert = xr.DataTree(name="Herb") abe = abe.assign({"Herbert": herbert}) print(abe) .. jupyter-execute:: print(abe["Herbert"].name) print(herbert.name) .. note:: This example shows a subtlety - the returned tree has Homer's brother listed as ``"Herbert"``, but the original node was named "Herb". Not only are names overridden when stored as keys like this, but the new node is a copy, so that the original node that was referenced is unchanged (i.e. ``herbert.name == "Herb"`` still). In other words, nodes are copied into trees, not inserted into them. This is intentional, and mirrors the behaviour when storing named :py:class:`~xarray.DataArray` objects inside datasets. Certain manipulations of our tree are forbidden if they would create an inconsistent result. In episode 51 of the show Futurama, Philip J. Fry travels back in time and accidentally becomes his own grandfather. If we try similar time-travelling hijinks with Homer, we get a :py:class:`~xarray.InvalidTreeError` raised: .. jupyter-execute:: :raises: abe["Homer"].children = {"Abe": abe} .. _evolutionary tree: Ancestry in an Evolutionary Tree ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Let's use a different example of a tree to discuss more complex relationships between nodes - the phylogenetic tree, or tree of life. .. jupyter-execute:: vertebrates = xr.DataTree.from_dict( { "/Sharks": None, "/Bony Skeleton/Ray-finned Fish": None, "/Bony Skeleton/Four Limbs/Amphibians": None, "/Bony Skeleton/Four Limbs/Amniotic Egg/Hair/Primates": None, "/Bony Skeleton/Four Limbs/Amniotic Egg/Hair/Rodents & Rabbits": None, "/Bony Skeleton/Four Limbs/Amniotic Egg/Two Fenestrae/Dinosaurs": None, "/Bony Skeleton/Four Limbs/Amniotic Egg/Two Fenestrae/Birds": None, }, name="Vertebrae", ) primates = vertebrates["/Bony Skeleton/Four Limbs/Amniotic Egg/Hair/Primates"] dinosaurs = vertebrates[ "/Bony Skeleton/Four Limbs/Amniotic Egg/Two Fenestrae/Dinosaurs" ] We have used the :py:meth:`~xarray.DataTree.from_dict` constructor method as a preferred way to quickly create a whole tree, and :ref:`filesystem paths` (to be explained shortly) to select two nodes of interest. .. jupyter-execute:: print(vertebrates) This tree shows various families of species, grouped by their common features (making it technically a `"Cladogram" `_, rather than an evolutionary tree). Here both the species and the features used to group them are represented by :py:class:`~xarray.DataTree` node objects - there is no distinction in types of node.
We can, however, get a list of only the nodes we used to represent species, by using the fact that all those nodes have no children - they are "leaf nodes". We can check if a node is a leaf with :py:meth:`~xarray.DataTree.is_leaf`, and get a list of all leaves with the :py:class:`~xarray.DataTree.leaves` property: .. jupyter-execute:: print(primates.is_leaf) [node.name for node in vertebrates.leaves] Pretending that this is a true evolutionary tree for a moment, we can find the features of the evolutionary ancestors (so-called "ancestor" nodes), the distinguishing feature of the common ancestor of all vertebrate life (the root node), and even the distinguishing feature of the common ancestor of any two species (the common ancestor of two nodes): .. jupyter-execute:: print([node.name for node in reversed(primates.parents)]) print(primates.root.name) print(primates.find_common_ancestor(dinosaurs).name) We can only find a common ancestor between two nodes that lie in the same tree. If we try to find the common evolutionary ancestor between primates and an Alien species that has no relationship to Earth's evolutionary tree, an error will be raised. .. jupyter-execute:: :raises: alien = xr.DataTree(name="Xenomorph") primates.find_common_ancestor(alien) .. _navigating trees: Navigating Trees ---------------- There are various ways to access the different nodes in a tree. Properties ~~~~~~~~~~ We can navigate trees using the :py:class:`~xarray.DataTree.parent` and :py:class:`~xarray.DataTree.children` properties of each node, for example: .. jupyter-execute:: lisa.parent.children["Bart"].name but there are also more convenient ways to access nodes. Dictionary-like interface ~~~~~~~~~~~~~~~~~~~~~~~~~ Children are stored on each node as a key-value mapping from name to child node. They can be accessed and altered via the :py:class:`~xarray.DataTree.__getitem__` and :py:class:`~xarray.DataTree.__setitem__` syntax. In general :py:class:`~xarray.DataTree` objects support almost the entire set of dict-like methods, including :py:meth:`~xarray.DataTree.keys`, :py:meth:`~xarray.DataTree.values`, :py:meth:`~xarray.DataTree.items`, :py:meth:`~xarray.DataTree.__delitem__` and :py:meth:`~xarray.DataTree.update`. .. jupyter-execute:: print(vertebrates["Bony Skeleton"]["Ray-finned Fish"])
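The remaining dict-like methods behave much as you would expect from an ordinary dictionary. A minimal sketch, using a fresh throwaway tree so the examples above are left untouched:

.. code-block:: python

    tmp = xr.DataTree(name="root")
    tmp["a"] = xr.DataTree()          # __setitem__ adds a child node
    tmp.update({"b": xr.DataTree()})  # update() can add several at once
    print(list(tmp.keys()))           # ['a', 'b']
    del tmp["a"]                      # __delitem__ removes a child again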
Note that the dict-like interface combines access to child :py:class:`~xarray.DataTree` nodes and stored :py:class:`~xarray.DataArray` objects, so if we have a node that contains both children and data, calling :py:meth:`~xarray.DataTree.keys` will list both names of child nodes and names of data variables: .. jupyter-execute:: dt = xr.DataTree( dataset=xr.Dataset({"foo": 0, "bar": 1}), children={"a": xr.DataTree(), "b": xr.DataTree()}, ) print(dt) list(dt.keys()) This also means that the names of variables and of child nodes must be different from one another. Attribute-like access ~~~~~~~~~~~~~~~~~~~~~ You can also select both variables and child nodes through dot indexing: .. jupyter-execute:: print(dt.foo) print(dt.a) .. _filesystem paths: Filesystem-like Paths ~~~~~~~~~~~~~~~~~~~~~ Hierarchical trees can be thought of as analogous to file systems. Each node is like a directory, and each directory can contain both more sub-directories and data. .. note:: Future development will allow you to make the filesystem analogy concrete by using :py:func:`~xarray.DataTree.open_mfdatatree` or :py:func:`~xarray.DataTree.save_mfdatatree`. (`See related issue in GitHub `_) Datatree objects support a syntax inspired by unix-like filesystems, where the "path" to a node is specified by the keys of each intermediate node in sequence, separated by forward slashes. This is an extension of the conventional dictionary ``__getitem__`` syntax to allow navigation across multiple levels of the tree. Like with filepaths, paths within the tree can either be relative to the current node, e.g. .. jupyter-execute:: print(abe["Homer/Bart"].name) print(abe["./Homer/Bart"].name) # alternative syntax or relative to the root node. A path specified from the root (as opposed to being specified relative to an arbitrary node in the tree) is sometimes also referred to as a `"fully qualified name" `_, or as an "absolute path". The root node is referred to by ``"/"``, so the path from the root node to its grand-child would be ``"/child/grandchild"``, e.g. .. jupyter-execute:: # access lisa's sibling by a relative path. print(lisa["../Bart"]) # or from absolute path print(lisa["/Homer/Bart"]) Relative paths between nodes also support the ``"../"`` syntax to mean the parent of the current node. We can use this with ``__setitem__`` to add a missing entry to our evolutionary tree, but add it relative to a more familiar node of interest: .. jupyter-execute:: primates["../../Two Fenestrae/Crocodiles"] = xr.DataTree() print(vertebrates) Given two nodes in a tree, we can also find their relative path: .. jupyter-execute:: bart.relative_to(lisa) You can use this filepath feature to build a nested tree from a dictionary of filesystem-like paths and corresponding :py:class:`~xarray.Dataset` objects in a single step. If we have a dictionary where each key is a valid path, and each value is either valid data or ``None``, we can construct a complex tree quickly using the alternative constructor :py:meth:`~xarray.DataTree.from_dict()`: .. jupyter-execute:: d = { "/": xr.Dataset({"foo": "orange"}), "/a": xr.Dataset({"bar": 0}, coords={"y": ("y", [0, 1, 2])}), "/a/b": xr.Dataset({"zed": np.nan}), "a/c/d": None, } dt = xr.DataTree.from_dict(d) print(dt) .. note:: Notice that using the path-like syntax will also create any intermediate empty nodes necessary to reach the end of the specified path (i.e. the node labelled ``"/a/c"`` in this case). This is to help avoid lots of redundant entries when creating deeply-nested trees using :py:meth:`xarray.DataTree.from_dict`. .. _iterating over trees: Iterating over trees ~~~~~~~~~~~~~~~~~~~~ You can iterate over every node in a tree using the :py:class:`~xarray.DataTree.subtree` property. This returns an iterable of nodes, which yields them in depth-first order. .. jupyter-execute:: for node in vertebrates.subtree: print(node.path) Similarly, :py:class:`~xarray.DataTree.subtree_with_keys` returns an iterable of relative paths and corresponding nodes. A very useful pattern is to iterate over :py:class:`~xarray.DataTree.subtree_with_keys` to manipulate nodes however you wish, then rebuild a new tree using :py:meth:`xarray.DataTree.from_dict()`. For example, we could keep only the nodes containing data by looping over all nodes, checking if they contain any data using :py:class:`~xarray.DataTree.has_data`, then rebuilding a new tree using only the paths of those nodes: ..
jupyter-execute:: non_empty_nodes = { path: node.dataset for path, node in dt.subtree_with_keys if node.has_data } print(xr.DataTree.from_dict(non_empty_nodes)) You can see this tree is similar to the ``dt`` object above, except that it is missing the empty nodes ``a/c`` and ``a/c/d``. (If you want to keep the name of the root node, you will need to add the ``name`` kwarg to :py:meth:`~xarray.DataTree.from_dict`, i.e. ``DataTree.from_dict(non_empty_nodes, name=dt.name)``.) .. _manipulating trees: Manipulating Trees ------------------ Subsetting Tree Nodes ~~~~~~~~~~~~~~~~~~~~~ We can subset our tree to select only nodes of interest in various ways. As on a real filesystem, matching nodes by common patterns in their paths is often useful. We can use :py:meth:`xarray.DataTree.match` for this: .. jupyter-execute:: dt = xr.DataTree.from_dict( { "/a/A": None, "/a/B": None, "/b/A": None, "/b/B": None, } ) result = dt.match("*/B") print(result) We can also subset trees by the contents of the nodes. :py:meth:`xarray.DataTree.filter` retains only the nodes of a tree that meet a certain condition. For example, we could recreate the Simpsons family tree with the ages of each individual, then filter for only the adults. First let's recreate the tree, but with an ``age`` data variable in every node: .. jupyter-execute:: simpsons = xr.DataTree.from_dict( { "/": xr.Dataset({"age": 83}), "/Herbert": xr.Dataset({"age": 40}), "/Homer": xr.Dataset({"age": 39}), "/Homer/Bart": xr.Dataset({"age": 10}), "/Homer/Lisa": xr.Dataset({"age": 8}), "/Homer/Maggie": xr.Dataset({"age": 1}), }, name="Abe", ) print(simpsons) Now let's filter out the minors: .. jupyter-execute:: print(simpsons.filter(lambda node: node["age"] > 18)) The result is a new tree, containing only the nodes matching the condition. (Yes, under the hood :py:meth:`~xarray.DataTree.filter` is just syntactic sugar for the pattern we showed you in :ref:`iterating over trees` !) .. _Tree Contents: Tree Contents ------------- Hollow Trees ~~~~~~~~~~~~ A concept that can sometimes be useful is that of a "Hollow Tree", which means a tree with data stored only at the leaf nodes. This is useful because certain tree manipulation operations only make sense for hollow trees. You can check if a tree is a hollow tree by using the :py:class:`~xarray.DataTree.is_hollow` property. We can see that the Simpsons tree is not hollow because the data variable ``"age"`` is present at some nodes which have children (i.e. Abe and Homer). .. jupyter-execute:: simpsons.is_hollow
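A tree that keeps data only at its leaves returns ``True`` instead. A minimal sketch, using a hypothetical two-leaf tree:

.. code-block:: python

    hollow = xr.DataTree.from_dict(
        {
            "/branch/leaf1": xr.Dataset({"x": 1}),
            "/branch/leaf2": xr.Dataset({"x": 2}),
        }
    )
    hollow.is_hollow  # True: data is stored only in the leaf nodes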
.. _tree computation: Computation ----------- :py:class:`~xarray.DataTree` objects are also useful for performing computations, not just for organizing data. Operations and Methods on Trees ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ To show how applying operations across a whole tree at once can be useful, let's first create an example scientific dataset. .. jupyter-execute:: def time_stamps(n_samples, T): """Create an array of evenly-spaced time stamps""" return xr.DataArray( data=np.linspace(0, 2 * np.pi * T, n_samples), dims=["time"] ) def signal_generator(t, f, A, phase): """Generate an example electrical-like waveform""" return A * np.sin(f * t.data + phase) time_stamps1 = time_stamps(n_samples=15, T=1.5) time_stamps2 = time_stamps(n_samples=10, T=1.0) voltages = xr.DataTree.from_dict( { "/oscilloscope1": xr.Dataset( { "potential": ( "time", signal_generator(time_stamps1, f=2, A=1.2, phase=0.5), ), "current": ( "time", signal_generator(time_stamps1, f=2, A=1.2, phase=1), ), }, coords={"time": time_stamps1}, ), "/oscilloscope2": xr.Dataset( { "potential": ( "time", signal_generator(time_stamps2, f=1.6, A=1.6, phase=0.2), ), "current": ( "time", signal_generator(time_stamps2, f=1.6, A=1.6, phase=0.7), ), }, coords={"time": time_stamps2}, ), } ) print(voltages) Most xarray computation methods also exist as methods on datatree objects, so you can for example take the mean value of these two timeseries at once: .. jupyter-execute:: print(voltages.mean(dim="time")) This works by mapping the standard :py:meth:`xarray.Dataset.mean()` method over the dataset stored in each node of the tree one-by-one. The arguments passed to the method are used for every node, so the values of the arguments you pass might be valid for one node and invalid for another: .. jupyter-execute:: :raises: voltages.isel(time=12) Notice that the error raised helpfully indicates which node of the tree the operation failed on. Arithmetic Methods on Trees ~~~~~~~~~~~~~~~~~~~~~~~~~~~ Arithmetic methods are also implemented, so you can e.g. add a scalar to every dataset in the tree at once. For example, we can advance the timeline of the Simpsons by a decade just by: .. jupyter-execute:: print(simpsons + 10) See that the same change (fast-forwarding by adding 10 years to the age of each character) has been applied to every node. Mapping Custom Functions Over Trees ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ You can map custom computation over each node in a tree using :py:meth:`xarray.DataTree.map_over_datasets`. You can map any function, so long as it takes :py:class:`xarray.Dataset` objects as one (or more) of the input arguments, and returns one (or more) xarray datasets. .. note:: Functions passed to :py:func:`~xarray.DataTree.map_over_datasets` cannot alter nodes in-place. Instead they must return new :py:class:`xarray.Dataset` objects. For example, we can define a function to calculate the Root Mean Square of a timeseries: .. jupyter-execute:: def rms(signal): return np.sqrt(np.mean(signal**2)) Then calculate the RMS value of these signals: .. jupyter-execute:: print(voltages.map_over_datasets(rms)) .. _multiple trees: Operating on Multiple Trees --------------------------- The examples so far have involved mapping functions or methods over the nodes of a single tree, but we can generalize this to mapping functions over multiple trees at once. We can also use :py:func:`~xarray.map_over_datasets` to apply a function over the data in multiple trees, by passing the trees as positional arguments. Iterating Over Multiple Trees ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ To iterate over the corresponding nodes in multiple trees, use :py:func:`~xarray.group_subtrees` instead of :py:class:`~xarray.DataTree.subtree_with_keys`. This combines well with :py:meth:`xarray.DataTree.from_dict()` to build a new tree: ..
jupyter-execute:: dt1 = xr.DataTree.from_dict({"a": xr.Dataset({"x": 1}), "b": xr.Dataset({"x": 2})}) dt2 = xr.DataTree.from_dict( {"a": xr.Dataset({"x": 10}), "b": xr.Dataset({"x": 20})} ) result = {} for path, (node1, node2) in xr.group_subtrees(dt1, dt2): result[path] = node1.dataset + node2.dataset dt3 = xr.DataTree.from_dict(result) print(dt3) Alternatively, you can apply a function directly to paired datasets at every node using :py:func:`xarray.map_over_datasets`: .. jupyter-execute:: dt3 = xr.map_over_datasets(lambda x, y: x + y, dt1, dt2) print(dt3) Comparing Trees for Isomorphism ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ For it to make sense to map a single non-unary function over the nodes of multiple trees at once, each tree needs to have the same structure. Specifically, two trees can only be considered similar, or "isomorphic", if the full paths to all of their descendant nodes are the same. Applying :py:func:`~xarray.group_subtrees` to trees with different structures raises :py:class:`~xarray.TreeIsomorphismError`: .. jupyter-execute:: :raises: tree = xr.DataTree.from_dict({"a": None, "a/b": None, "a/c": None}) simple_tree = xr.DataTree.from_dict({"a": None}) for _ in xr.group_subtrees(tree, simple_tree): ... We can also explicitly check whether any two trees are isomorphic using the :py:meth:`~xarray.DataTree.isomorphic` method: .. jupyter-execute:: tree.isomorphic(simple_tree) Corresponding tree nodes do not need to have the same data in order to be considered isomorphic: .. jupyter-execute:: tree_with_data = xr.DataTree.from_dict({"a": xr.Dataset({"foo": 1})}) simple_tree.isomorphic(tree_with_data) They also do not need to define child nodes in the same order: .. jupyter-execute:: reordered_tree = xr.DataTree.from_dict({"a": None, "a/c": None, "a/b": None}) tree.isomorphic(reordered_tree) Arithmetic Between Multiple Trees ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Arithmetic operations like multiplication are binary operations, so as long as we have two isomorphic trees, we can do arithmetic between them. .. jupyter-execute:: currents = xr.DataTree.from_dict( { "/oscilloscope1": xr.Dataset( { "current": ( "time", signal_generator(time_stamps1, f=2, A=1.2, phase=1), ), }, coords={"time": time_stamps1}, ), "/oscilloscope2": xr.Dataset( { "current": ( "time", signal_generator(time_stamps2, f=1.6, A=1.6, phase=0.7), ), }, coords={"time": time_stamps2}, ), } ) print(currents) .. jupyter-execute:: currents.isomorphic(voltages) We could use this feature to quickly calculate the electrical power in our signal, P = IV. .. jupyter-execute:: power = currents * voltages print(power) .. _hierarchical-data.alignment-and-coordinate-inheritance: Alignment and Coordinate Inheritance ------------------------------------ .. _data-alignment: Data Alignment ~~~~~~~~~~~~~~ The data in different datatree nodes are not totally independent. In particular, dimensions (and indexes) in child nodes must be exactly aligned with those in their parent nodes. Exact alignment means that shared dimensions must be the same length, and indexes along those dimensions must be equal. .. note:: If you were a previous user of the prototype `xarray-contrib/datatree `_ package, this is different from what you're used to! In that package the data model was that the data stored in each node actually was completely unrelated. The data model is now slightly stricter. This allows us to provide features like :ref:`coordinate-inheritance`. To demonstrate, let's first generate some example datasets which are not aligned with one another: ..
jupyter-execute:: # (drop the attributes just to make the printed representation shorter) ds = xr.tutorial.open_dataset("air_temperature").drop_attrs() ds_daily = ds.resample(time="D").mean("time") ds_weekly = ds.resample(time="W").mean("time") ds_monthly = ds.resample(time="ME").mean("time") These datasets have different lengths along the ``time`` dimension, and are therefore not aligned along that dimension. .. jupyter-execute:: print(ds_daily.sizes) print(ds_weekly.sizes) print(ds_monthly.sizes) We cannot store these non-alignable variables on a single :py:class:`~xarray.Dataset` object, because they do not exactly align: .. jupyter-execute:: :raises: xr.align(ds_daily, ds_weekly, ds_monthly, join="exact") But we :ref:`previously said ` that multi-resolution data is a good use case for :py:class:`~xarray.DataTree`, so surely we should be able to store these in a single :py:class:`~xarray.DataTree`? If we first try to create a :py:class:`~xarray.DataTree` with these different-length time dimensions present in both parents and children, we will still get an alignment error: .. jupyter-execute:: :raises: xr.DataTree.from_dict({"daily": ds_daily, "daily/weekly": ds_weekly}) This is because DataTree checks that data in child nodes align exactly with their parents. .. note:: This requirement of aligned dimensions is similar to netCDF's concept of `inherited dimensions `_, as in netCDF-4 files dimensions are `visible to all child groups `_. This alignment check is performed up through the tree, all the way to the root, and so is therefore equivalent to requiring that this :py:func:`~xarray.align` command succeeds: .. code:: python xr.align(child.dataset, *(parent.dataset for parent in child.parents), join="exact") To represent our unalignable data in a single :py:class:`~xarray.DataTree`, we must instead place all variables which are a function of these different-length dimensions into nodes that are not direct descendants of one another, e.g. organize them as siblings. .. jupyter-execute:: dt = xr.DataTree.from_dict( {"daily": ds_daily, "weekly": ds_weekly, "monthly": ds_monthly} ) print(dt) Now we have a valid :py:class:`~xarray.DataTree` structure which contains all the data at each different time frequency, stored in a separate group. This is a useful way to organise our data because we can still operate on all the groups at once. For example, we can extract all three timeseries at a specific lat-lon location: .. jupyter-execute:: dt_sel = dt.sel(lat=75, lon=300) print(dt_sel) or compute the standard deviation of each timeseries to find out how it varies with sampling frequency: .. jupyter-execute:: dt_std = dt.std(dim="time") print(dt_std) .. _coordinate-inheritance: Coordinate Inheritance ~~~~~~~~~~~~~~~~~~~~~~ Notice that in the trees we constructed above there is some redundancy - the ``lat`` and ``lon`` variables appear in each sibling group, but are identical across the groups. .. jupyter-execute:: dt We can use "Coordinate Inheritance" to define them only once in a parent group and remove this redundancy, whilst still being able to access those coordinate variables from the child groups. .. note:: This is also a new feature relative to the prototype `xarray-contrib/datatree `_ package. Let's instead place only the time-dependent variables in the child groups, and put the non-time-dependent ``lat`` and ``lon`` variables in the parent (root) group: ..
jupyter-execute:: dt = xr.DataTree.from_dict( { "/": ds.drop_dims("time"), "daily": ds_daily.drop_vars(["lat", "lon"]), "weekly": ds_weekly.drop_vars(["lat", "lon"]), "monthly": ds_monthly.drop_vars(["lat", "lon"]), } ) dt This is preferred to the previous representation because it now makes it clear that all of these datasets share common spatial grid coordinates. Defining the common coordinates just once also ensures that the spatial coordinates for each group cannot become out of sync with one another during operations. We can still access the coordinates defined in the parent groups from any of the child groups as if they were actually present on the child groups: .. jupyter-execute:: dt.daily.coords .. jupyter-execute:: dt["daily/lat"] As we can still access them, we say that the ``lat`` and ``lon`` coordinates in the child groups have been "inherited" from their common parent group. If we print just one of the child nodes, it will still display inherited coordinates, but explicitly mark them as such: .. jupyter-execute:: dt["/daily"] This helps to differentiate which variables are defined on the datatree node that you are currently looking at, and which were defined somewhere above it. We can also still perform all the same operations on the whole tree: .. jupyter-execute:: dt.sel(lat=[75], lon=[300]) .. jupyter-execute:: dt.std(dim="time") ########### User Guide ########### In this user guide, you will find detailed descriptions and examples that describe many common tasks that you can accomplish with Xarray. .. toctree:: :maxdepth: 2 :caption: Data model terminology data-structures hierarchical-data dask .. toctree:: :maxdepth: 2 :caption: Core operations indexing combining reshaping computation groupby interpolation .. toctree:: :maxdepth: 2 :caption: I/O io complex-numbers .. toctree:: :maxdepth: 2 :caption: Visualization plotting .. toctree:: :maxdepth: 2 :caption: Interoperability pandas duckarrays ecosystem .. toctree:: :maxdepth: 2 :caption: Domain-specific workflows time-series weather-climate .. toctree:: :maxdepth: 2 :caption: Options and Testing options testing .. _indexing: Indexing and selecting data =========================== .. jupyter-execute:: :hide-code: :hide-output: import numpy as np import pandas as pd import xarray as xr np.random.seed(123456) %xmode minimal Xarray offers extremely flexible indexing routines that combine the best features of NumPy and pandas for data selection. The most basic way to access elements of a :py:class:`~xarray.DataArray` object is to use Python's ``[]`` syntax, such as ``array[i, j]``, where ``i`` and ``j`` are both integers. As xarray objects can store coordinates corresponding to each dimension of an array, label-based indexing similar to ``pandas.DataFrame.loc`` is also possible. In label-based indexing, the element position ``i`` is automatically looked up from the coordinate values. Dimensions of xarray objects have names, so you can also look up the dimensions by name, instead of remembering their positional order. Quick overview -------------- In total, xarray supports four different kinds of indexing, as described below and summarized in this table: .. |br| raw:: html <br/>
+------------------+--------------+---------------------------------+--------------------------------+ | Dimension lookup | Index lookup | ``DataArray`` syntax | ``Dataset`` syntax | +==================+==============+=================================+================================+ | Positional | By integer | ``da[:, 0]`` | *not available* | +------------------+--------------+---------------------------------+--------------------------------+ | Positional | By label | ``da.loc[:, 'IA']`` | *not available* | +------------------+--------------+---------------------------------+--------------------------------+ | By name | By integer | ``da.isel(space=0)`` or |br| | ``ds.isel(space=0)`` or |br| | | | | ``da[dict(space=0)]`` | ``ds[dict(space=0)]`` | +------------------+--------------+---------------------------------+--------------------------------+ | By name | By label | ``da.sel(space='IA')`` or |br| | ``ds.sel(space='IA')`` or |br| | | | | ``da.loc[dict(space='IA')]`` | ``ds.loc[dict(space='IA')]`` | +------------------+--------------+---------------------------------+--------------------------------+ More advanced indexing is also possible for all the methods by supplying :py:class:`~xarray.DataArray` objects as indexers. See :ref:`vectorized_indexing` for the details. Positional indexing ------------------- Indexing a :py:class:`~xarray.DataArray` directly works (mostly) just like it does for numpy arrays, except that the returned object is always another DataArray: .. jupyter-execute:: da = xr.DataArray( np.random.rand(4, 3), [ ("time", pd.date_range("2000-01-01", periods=4)), ("space", ["IA", "IL", "IN"]), ], ) da[:2] .. jupyter-execute:: da[0, 0] .. jupyter-execute:: da[:, [2, 1]] Attributes are persisted in all indexing operations. .. warning:: Positional indexing deviates from NumPy when indexing with multiple arrays like ``da[[0, 1], [0, 1]]``, as described in :ref:`vectorized_indexing`. Xarray also supports label-based indexing, just like pandas. Because we use a :py:class:`pandas.Index` under the hood, label based indexing is very fast. To do label based indexing, use the :py:attr:`~xarray.DataArray.loc` attribute: .. jupyter-execute:: da.loc["2000-01-01":"2000-01-02", "IA"] In this example, the selection is the subpart of the array in the range '2000-01-01':'2000-01-02' along the first coordinate ``time``, with the value 'IA' from the second coordinate ``space``. You can perform any of the `label indexing operations supported by pandas`__, including indexing with individual labels, slices, and lists/arrays of labels, as well as indexing with boolean arrays. Like pandas, label based indexing in xarray is *inclusive* of both the start and stop bounds. __ https://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-label Setting values with label based indexing is also supported: .. jupyter-execute:: da.loc["2000-01-01", ["IL", "IN"]] = -10 da Indexing with dimension names ----------------------------- With the dimension names, we do not have to rely on dimension order and can use them explicitly to slice data. There are two ways to do this: 1. Use the :py:meth:`~xarray.DataArray.sel` and :py:meth:`~xarray.DataArray.isel` convenience methods: .. jupyter-execute:: # index by integer array indices da.isel(space=0, time=slice(None, 2)) .. jupyter-execute:: # index by dimension coordinate labels da.sel(time=slice("2000-01-01", "2000-01-02")) 2. Use a dictionary as the argument for array positional or label based array indexing: ..
jupyter-execute:: # index by integer array indices da[dict(space=0, time=slice(None, 2))] .. jupyter-execute:: # index by dimension coordinate labels da.loc[dict(time=slice("2000-01-01", "2000-01-02"))] The arguments to these methods can be any objects that could index the array along the dimension given by the keyword, e.g., labels for an individual value, Python :py:class:`slice` objects or 1-dimensional arrays. .. note:: We would love to be able to do indexing with labeled dimension names inside brackets, but unfortunately, `Python does not yet support indexing with keyword arguments`__ like ``da[space=0]``. __ https://legacy.python.org/dev/peps/pep-0472/ .. _nearest neighbor lookups: Nearest neighbor lookups ------------------------ The label based selection methods :py:meth:`~xarray.Dataset.sel`, :py:meth:`~xarray.Dataset.reindex` and :py:meth:`~xarray.Dataset.reindex_like` all support ``method`` and ``tolerance`` keyword arguments. The method parameter allows for enabling nearest neighbor (inexact) lookups by use of the methods ``'pad'``, ``'backfill'`` or ``'nearest'``: .. jupyter-execute:: da = xr.DataArray([1, 2, 3], [("x", [0, 1, 2])]) da.sel(x=[1.1, 1.9], method="nearest") .. jupyter-execute:: da.sel(x=0.1, method="backfill") .. jupyter-execute:: da.reindex(x=[0.5, 1, 1.5, 2, 2.5], method="pad") Tolerance limits the maximum distance for valid matches with an inexact lookup: .. jupyter-execute:: da.reindex(x=[1.1, 1.5], method="nearest", tolerance=0.2) The method parameter is not yet supported if any of the arguments to ``.sel()`` is a ``slice`` object: .. jupyter-execute:: :raises: da.sel(x=slice(1, 3), method="nearest") However, you don't need to use ``method`` to do inexact slicing. Slicing already returns all values inside the range (inclusive), as long as the index labels are monotonic increasing: .. jupyter-execute:: da.sel(x=slice(0.9, 3.1)) Indexing axes with monotonic decreasing labels also works, as long as the ``slice`` or ``.loc`` arguments are also decreasing: .. jupyter-execute:: reversed_da = da[::-1] reversed_da.loc[3.1:0.9] .. note:: If you want to interpolate along coordinates rather than looking up the nearest neighbors, use :py:meth:`~xarray.Dataset.interp` and :py:meth:`~xarray.Dataset.interp_like`. See :ref:`interpolation ` for the details. Dataset indexing ---------------- We can also use these methods to index all variables in a dataset simultaneously, returning a new dataset: .. jupyter-execute:: da = xr.DataArray( np.random.rand(4, 3), [ ("time", pd.date_range("2000-01-01", periods=4)), ("space", ["IA", "IL", "IN"]), ], ) ds = da.to_dataset(name="foo") ds.isel(space=[0], time=[0]) .. jupyter-execute:: ds.sel(time="2000-01-01") Positional indexing on a dataset is not supported because the ordering of dimensions in a dataset is somewhat ambiguous (it can vary between different arrays). However, you can do normal indexing with dimension names: .. jupyter-execute:: ds[dict(space=[0], time=[0])] .. jupyter-execute:: ds.loc[dict(time="2000-01-01")] Dropping labels and dimensions ------------------------------ The :py:meth:`~xarray.Dataset.drop_sel` method returns a new object with the listed index labels along a dimension dropped: .. jupyter-execute:: ds.drop_sel(space=["IN", "IL"]) ``drop_sel`` is both a ``Dataset`` and ``DataArray`` method. Use :py:meth:`~xarray.Dataset.drop_dims` to drop a full dimension from a Dataset. Any variables with these dimensions are also dropped: .. jupyter-execute:: ds.drop_dims("time")
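If you want to drop by integer position rather than by label, there is also a positional counterpart, :py:meth:`~xarray.Dataset.drop_isel`; a minimal sketch:

.. code-block:: python

    # drop the first point along ``space`` by position instead of by label
    ds.drop_isel(space=[0])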
.. _masking with where: Masking with ``where`` ---------------------- Indexing methods on xarray objects generally return a subset of the original data. However, it is sometimes useful to select an object with the same shape as the original data, but with some elements masked. To do this type of selection in xarray, use :py:meth:`~xarray.DataArray.where`: .. jupyter-execute:: da = xr.DataArray(np.arange(16).reshape(4, 4), dims=["x", "y"]) da.where(da.x + da.y < 4) This is particularly useful for ragged indexing of multi-dimensional data, e.g., to apply a 2D mask to an image. Note that ``where`` follows all the usual xarray broadcasting and alignment rules for binary operations (e.g., ``+``) between the object being indexed and the condition, as described in :ref:`compute`: .. jupyter-execute:: da.where(da.y < 2) By default ``where`` maintains the original size of the data. For cases where the selected data size is much smaller than the original data, use of the option ``drop=True`` clips coordinate elements that are fully masked: .. jupyter-execute:: da.where(da.y < 2, drop=True) .. _selecting values with isin: Selecting values with ``isin`` ------------------------------ To check whether elements of an xarray object contain a single object, you can compare with the equality operator ``==`` (e.g., ``arr == 3``). To check multiple values, use :py:meth:`~xarray.DataArray.isin`: .. jupyter-execute:: da = xr.DataArray([1, 2, 3, 4, 5], dims=["x"]) da.isin([2, 4]) :py:meth:`~xarray.DataArray.isin` works particularly well with :py:meth:`~xarray.DataArray.where` to support indexing by arrays that are not already labels of an array: .. jupyter-execute:: lookup = xr.DataArray([-1, -2, -3, -4, -5], dims=["x"]) da.where(lookup.isin([-2, -4]), drop=True) However, some caution is in order: when done repeatedly, this type of indexing is significantly slower than using :py:meth:`~xarray.DataArray.sel`. .. _vectorized_indexing: Vectorized Indexing ------------------- Like numpy and pandas, xarray supports indexing many array elements at once in a vectorized manner. If you only provide integers, slices, or unlabeled arrays (arrays without dimension names, such as ``np.ndarray`` or ``list``, but not :py:class:`~xarray.DataArray` or :py:class:`~xarray.Variable`), indexing can be understood as orthogonal. Each indexer component selects independently along the corresponding dimension, similar to how vector indexing works in Fortran or MATLAB, or after using the :py:func:`numpy.ix_` helper: .. jupyter-execute:: da = xr.DataArray( np.arange(12).reshape((3, 4)), dims=["x", "y"], coords={"x": [0, 1, 2], "y": ["a", "b", "c", "d"]}, ) da .. jupyter-execute:: da[[0, 2, 2], [1, 3]] For more flexibility, you can supply :py:class:`~xarray.DataArray` objects as indexers. Dimensions on resultant arrays are given by the ordered union of the indexers' dimensions: .. jupyter-execute:: ind_x = xr.DataArray([0, 1], dims=["x"]) ind_y = xr.DataArray([0, 1], dims=["y"]) da[ind_x, ind_y] # orthogonal indexing Slices or sequences/arrays without named dimensions are treated as if they have the same dimension as the one being indexed along: .. jupyter-execute:: # Because [0, 1] is used to index along dimension 'x', # it is assumed to have dimension 'x' da[[0, 1], ind_x] Furthermore, you can use multi-dimensional :py:class:`~xarray.DataArray` objects as indexers, where the dimensions of the resulting array are also determined by the indexers' dimensions: ..
jupyter-execute:: ind = xr.DataArray([[0, 1], [0, 1]], dims=["a", "b"]) da[ind] Similar to how `NumPy's advanced indexing`_ works, vectorized indexing for xarray is based on our :ref:`broadcasting rules `. See :ref:`indexing.rules` for the complete specification. .. _NumPy's advanced indexing: https://numpy.org/doc/stable/user/basics.indexing.html#advanced-indexing Vectorized indexing also works with ``isel``, ``loc``, and ``sel``: .. jupyter-execute:: ind = xr.DataArray([[0, 1], [0, 1]], dims=["a", "b"]) da.isel(y=ind) # same as da[:, ind] .. jupyter-execute:: ind = xr.DataArray([["a", "b"], ["b", "a"]], dims=["a", "b"]) da.loc[:, ind] # same as da.sel(y=ind) These methods may also be applied to ``Dataset`` objects: .. jupyter-execute:: ds = da.to_dataset(name="bar") ds.isel(x=xr.DataArray([0, 1, 2], dims=["points"])) Vectorized indexing may be used to extract information from the nearest grid cells of interest, for example, the nearest climate model grid cells to a collection of specified weather station latitudes and longitudes. To trigger vectorized indexing behavior you will need to provide the selection dimensions with a new shared output dimension name. In the example below, the selections of the closest latitude and longitude are renamed to an output dimension named "points": .. jupyter-execute:: ds = xr.tutorial.open_dataset("air_temperature") # Define target latitude and longitude (where weather stations might be) target_lon = xr.DataArray([200, 201, 202, 205], dims="points") target_lat = xr.DataArray([31, 41, 42, 42], dims="points") # Retrieve data at the grid cells nearest to the target latitudes and longitudes da = ds["air"].sel(lon=target_lon, lat=target_lat, method="nearest") da .. tip:: If you are lazily loading your data from disk, not every form of vectorized indexing is supported (or if supported, may not be supported efficiently). You may find increased performance by loading your data into memory first, e.g., with :py:meth:`~xarray.Dataset.load`. .. note:: If an indexer is a :py:class:`~xarray.DataArray`, its coordinates should not conflict with the selected subpart of the target array (except for the explicitly indexed dimensions with ``.loc``/``.sel``). Otherwise, ``IndexError`` will be raised. .. _assigning_values: Assigning values with indexing ------------------------------ To select and assign values to a portion of a :py:class:`~xarray.DataArray`, you can use indexing with ``.loc``: .. jupyter-execute:: ds = xr.tutorial.open_dataset("air_temperature") # add an empty 2D dataarray ds["empty"] = xr.full_like(ds.air.mean("time"), fill_value=0) # modify one grid point using loc() ds["empty"].loc[dict(lon=260, lat=30)] = 100 # modify a 2D region using loc() lc = ds.coords["lon"] la = ds.coords["lat"] ds["empty"].loc[ dict(lon=lc[(lc > 220) & (lc < 260)], lat=la[(la > 20) & (la < 60)]) ] = 100 or :py:func:`~xarray.where`: .. jupyter-execute:: # modify one grid point using xr.where() ds["empty"] = xr.where( (ds.coords["lat"] == 20) & (ds.coords["lon"] == 260), 100, ds["empty"] ) # or modify a 2D region using xr.where() mask = ( (ds.coords["lat"] > 20) & (ds.coords["lat"] < 60) & (ds.coords["lon"] > 220) & (ds.coords["lon"] < 260) ) ds["empty"] = xr.where(mask, 100, ds["empty"]) Vectorized indexing can also be used to assign values to xarray objects. .. jupyter-execute:: da = xr.DataArray( np.arange(12).reshape((3, 4)), dims=["x", "y"], coords={"x": [0, 1, 2], "y": ["a", "b", "c", "d"]}, ) da .. jupyter-execute:: da[0] = -1 # assignment with broadcasting da ..
jupyter-execute:: ind_x = xr.DataArray([0, 1], dims=["x"]) ind_y = xr.DataArray([0, 1], dims=["y"]) da[ind_x, ind_y] = -2 # assign -2 to (ix, iy) = (0, 0) and (1, 1) da .. jupyter-execute:: da[ind_x, ind_y] += 100 # increment is also possible da Like ``numpy.ndarray``, value assignment sometimes works differently from what one may expect. .. jupyter-execute:: da = xr.DataArray([0, 1, 2, 3], dims=["x"]) ind = xr.DataArray([0, 0, 0], dims=["x"]) da[ind] -= 1 da Here the 0th element is decremented only once. This is because ``v[0] = v[0] - 1`` is called three times, rather than ``v[0] = v[0] - 1 - 1 - 1``. See `Assigning values to indexed arrays`__ for the details. __ https://numpy.org/doc/stable/user/basics.indexing.html#assigning-values-to-indexed-arrays .. note:: Dask array does not support value assignment (see :ref:`dask` for the details). .. note:: Coordinates in both the left- and right-hand-side arrays should not conflict with each other. Otherwise, ``IndexError`` will be raised. .. warning:: Do not try to assign values when using any of the indexing methods ``isel`` or ``sel``:: # DO NOT do this da.isel(space=0) = 0 Instead, values can be assigned using dictionary-based indexing:: da[dict(space=0)] = 0 Assigning values with chained indexing using ``.sel`` or ``.isel`` fails silently. .. jupyter-execute:: da = xr.DataArray([0, 1, 2, 3], dims=["x"]) # DO NOT do this da.isel(x=[0, 1, 2])[1] = -1 da You can also assign values to all variables of a :py:class:`Dataset` at once: .. jupyter-execute:: :stderr: ds_org = xr.tutorial.open_dataset("eraint_uvz").isel( latitude=slice(56, 59), longitude=slice(255, 258), level=0 ) # set all values to 0 ds = xr.zeros_like(ds_org) ds .. jupyter-execute:: # by integer ds[dict(latitude=2, longitude=2)] = 1 ds["u"] .. jupyter-execute:: ds["v"] .. jupyter-execute:: # by label ds.loc[dict(latitude=47.25, longitude=[11.25, 12])] = 100 ds["u"] .. jupyter-execute:: # dataset as new values new_dat = ds_org.loc[dict(latitude=48, longitude=[11.25, 12])] new_dat .. jupyter-execute:: ds.loc[dict(latitude=47.25, longitude=[11.25, 12])] = new_dat ds["u"] The dimensions can differ between the variables in the dataset, but all variables need to have at least the dimensions specified in the indexer dictionary. The new values must be either a scalar, a :py:class:`DataArray` or a :py:class:`Dataset` itself that contains all variables that also appear in the dataset to be modified. .. _more_advanced_indexing: More advanced indexing ---------------------- The use of :py:class:`~xarray.DataArray` objects as indexers enables very flexible indexing. The following is an example of the pointwise indexing: .. jupyter-execute:: da = xr.DataArray(np.arange(56).reshape((7, 8)), dims=["x", "y"]) da .. jupyter-execute:: da.isel(x=xr.DataArray([0, 1, 6], dims="z"), y=xr.DataArray([0, 1, 0], dims="z")) where three elements at ``(ix, iy) = ((0, 0), (1, 1), (6, 0))`` are selected and mapped along a new dimension ``z``. If you want to add a coordinate to the new dimension ``z``, you can supply a :py:class:`~xarray.DataArray` with a coordinate, .. jupyter-execute:: da.isel( x=xr.DataArray([0, 1, 6], dims="z", coords={"z": ["a", "b", "c"]}), y=xr.DataArray([0, 1, 0], dims="z"), ) Analogously, label-based pointwise-indexing is also possible with the ``.sel`` method: ..
jupyter-execute:: da = xr.DataArray( np.random.rand(4, 3), [ ("time", pd.date_range("2000-01-01", periods=4)), ("space", ["IA", "IL", "IN"]), ], ) times = xr.DataArray( pd.to_datetime(["2000-01-03", "2000-01-02", "2000-01-01"]), dims="new_time" ) da.sel(space=xr.DataArray(["IA", "IL", "IN"], dims=["new_time"]), time=times) .. _align and reindex: Align and reindex ----------------- Xarray's ``reindex``, ``reindex_like`` and ``align`` impose a ``DataArray`` or ``Dataset`` onto a new set of coordinates corresponding to dimensions. The original values are subset to the index labels still found in the new labels, and values corresponding to new labels not found in the original object are in-filled with ``NaN``. Xarray operations that combine multiple objects generally automatically align their arguments to share the same indexes. However, manual alignment can be useful for greater control and for increased performance. To reindex a particular dimension, use :py:meth:`~xarray.DataArray.reindex`: .. jupyter-execute:: da.reindex(space=["IA", "CA"]) The :py:meth:`~xarray.DataArray.reindex_like` method is a useful shortcut. To demonstrate, we will make a subset DataArray with new values: .. jupyter-execute:: foo = da.rename("foo") baz = (10 * da[:2, :2]).rename("baz") baz Reindexing ``foo`` with ``baz`` selects out the first two values along each dimension: .. jupyter-execute:: foo.reindex_like(baz) The opposite operation asks us to reindex to a larger shape, so we fill in the missing values with ``NaN``: .. jupyter-execute:: baz.reindex_like(foo) The :py:func:`~xarray.align` function lets us perform more flexible database-like ``'inner'``, ``'outer'``, ``'left'`` and ``'right'`` joins: .. jupyter-execute:: xr.align(foo, baz, join="inner") .. jupyter-execute:: xr.align(foo, baz, join="outer") Both ``reindex_like`` and ``align`` work interchangeably between :py:class:`~xarray.DataArray` and :py:class:`~xarray.Dataset` objects, and with any number of matching dimension names: .. jupyter-execute:: ds .. jupyter-execute:: ds.reindex_like(baz) .. jupyter-execute:: other = xr.DataArray(["a", "b", "c"], dims="other") # this is a no-op, because there are no shared dimension names ds.reindex_like(other) .. _indexing.missing_coordinates: Missing coordinate labels ------------------------- Coordinate labels for each dimension are optional (as of xarray v0.9). Label based indexing with ``.sel`` and ``.loc`` uses standard positional, integer-based indexing as a fallback for dimensions without a coordinate label: .. jupyter-execute:: da = xr.DataArray([1, 2, 3], dims="x") da.sel(x=[0, -1]) Alignment between xarray objects where one or both do not have coordinate labels succeeds only if all dimensions of the same name have the same length. Otherwise, it raises an informative error: .. jupyter-execute:: :raises: xr.align(da, da[:2]) Underlying Indexes ------------------ Xarray uses the :py:class:`pandas.Index` internally to perform indexing operations. If you need to access the underlying indexes, they are available through the :py:attr:`~xarray.DataArray.indexes` attribute. .. jupyter-execute:: da = xr.DataArray( np.random.rand(4, 3), [ ("time", pd.date_range("2000-01-01", periods=4)), ("space", ["IA", "IL", "IN"]), ], ) da .. jupyter-execute:: da.indexes .. jupyter-execute:: da.indexes["time"] Use :py:meth:`~xarray.DataArray.get_index` to get an index for a dimension, falling back to a default :py:class:`pandas.RangeIndex` if it has no coordinate labels: .. 
.. jupyter-execute::

    da = xr.DataArray([1, 2, 3], dims="x")
    da

.. jupyter-execute::

    da.get_index("x")

.. _copies_vs_views:

Copies vs. Views
----------------

Whether array indexing returns a view or a copy of the underlying data depends on the nature of the labels.

For positional (integer) indexing, xarray follows the same `rules`_ as NumPy:

* Positional indexing with only integers and slices returns a view.
* Positional indexing with arrays or lists returns a copy.

The rules for label based indexing are more complex:

* Label-based indexing with only slices returns a view.
* Label-based indexing with arrays returns a copy.
* Label-based indexing with scalars returns a view or a copy, depending on whether the corresponding positional indexer can be represented as an integer or a slice object. The exact rules are determined by pandas.

Whether data is a copy or a view is more predictable in xarray than in pandas, so unlike pandas, xarray does not produce `SettingWithCopy warnings`_. However, you should still avoid assignment with chained indexing.

Note that other operations (such as :py:attr:`~xarray.DataArray.values`) may also return views rather than copies.

.. _SettingWithCopy warnings: https://pandas.pydata.org/pandas-docs/stable/indexing.html#returning-a-view-versus-a-copy
.. _rules: https://numpy.org/doc/stable/user/basics.copies.html

.. _multi-level indexing:

Multi-level indexing
--------------------

Just like pandas, advanced indexing on multi-level indexes is possible with ``loc`` and ``sel``. You can slice a multi-index by providing multiple indexers, i.e., a tuple of slices, labels, lists of labels, or any selector allowed by pandas:

.. jupyter-execute::

    midx = pd.MultiIndex.from_product([list("abc"), [0, 1]], names=("one", "two"))
    mda = xr.DataArray(np.random.rand(6, 3), [("x", midx), ("y", range(3))])
    mda

.. jupyter-execute::

    mda.sel(x=(list("ab"), [0]))

You can also select multiple elements by providing a list of labels or tuples, or a slice of tuples:

.. jupyter-execute::

    mda.sel(x=[("a", 0), ("b", 1)])

Additionally, xarray supports dictionaries:

.. jupyter-execute::

    mda.sel(x={"one": "a", "two": 0})

For convenience, ``sel`` also accepts multi-index levels directly as keyword arguments:

.. jupyter-execute::

    mda.sel(one="a", two=0)

Note that when using ``sel`` it is not possible to mix a dimension indexer with level indexers for that dimension (e.g., ``mda.sel(x={'one': 'a'}, two=0)`` will raise a ``ValueError``).

Like pandas, xarray handles partial selection on a multi-index (level drop). As shown below, it also renames the dimension / coordinate when the multi-index is reduced to a single index.

.. jupyter-execute::

    mda.loc[{"one": "a"}, ...]

Unlike pandas, xarray does not guess whether you provide index levels or dimensions when using ``loc`` in some ambiguous cases. For example, for ``mda.loc[{'one': 'a', 'two': 0}]`` and ``mda.loc['a', 0]`` xarray always interprets ('one', 'two') and ('a', 0) as the names and labels of the 1st and 2nd dimension, respectively. You must specify all dimensions or use the ellipsis in the ``loc`` specifier, e.g. in the example above, ``mda.loc[{'one': 'a', 'two': 0}, :]`` or ``mda.loc[('a', 0), ...]``.

.. _indexing.rules:

Indexing rules
--------------

Here we describe the full rules xarray uses for vectorized indexing. Note that this is for the purposes of explanation: for the sake of efficiency and to support various backends, the actual implementation is different.

0. (Only for label based indexing.)
   Look up positional indexes along each dimension from the corresponding :py:class:`pandas.Index`.
1. A full slice object ``:`` is inserted for each dimension without an indexer.
2. ``slice`` objects are converted into arrays, given by ``np.arange(*slice.indices(...))``.
3. Assume dimension names for array indexers without dimensions, such as ``np.ndarray`` and ``list``, from the dimensions to be indexed along. For example, ``v.isel(x=[0, 1])`` is understood as ``v.isel(x=xr.DataArray([0, 1], dims=['x']))``.
4. For each variable in a ``Dataset`` or ``DataArray`` (the array and its coordinates):

   a. Broadcast all relevant indexers based on their dimension names (see :ref:`compute.broadcasting` for full details).
   b. Index the underlying array by the broadcast indexers, using NumPy's advanced indexing rules.

5. If any indexer DataArray has coordinates and no coordinate with the same name exists, attach them to the indexed object.

.. note::

    Only 1-dimensional boolean arrays can be used as indexers.

.. _interp:

Interpolating data
==================

.. jupyter-execute::
    :hide-code:

    import numpy as np
    import pandas as pd
    import xarray as xr
    import matplotlib.pyplot as plt

    np.random.seed(123456)

Xarray offers flexible interpolation routines, which have a similar interface to our :ref:`indexing`.

.. note::

    ``interp`` requires ``scipy`` to be installed.

Scalar and 1-dimensional interpolation
--------------------------------------

Interpolating a :py:class:`~xarray.DataArray` works mostly like labeled indexing of a :py:class:`~xarray.DataArray`,

.. jupyter-execute::

    da = xr.DataArray(
        np.sin(0.3 * np.arange(12).reshape(4, 3)),
        [("time", np.arange(4)), ("space", [0.1, 0.2, 0.3])],
    )

    # label lookup
    da.sel(time=3)

.. jupyter-execute::

    # interpolation
    da.interp(time=2.5)

Similar to the indexing, :py:meth:`~xarray.DataArray.interp` also accepts an array-like, which gives the interpolated result as an array.

.. jupyter-execute::

    # label lookup
    da.sel(time=[2, 3])

.. jupyter-execute::

    # interpolation
    da.interp(time=[2.5, 3.5])

To interpolate data with a :py:class:`numpy.datetime64` coordinate you can pass a string.

.. jupyter-execute::

    da_dt64 = xr.DataArray(
        [1, 3], [("time", pd.date_range("1/1/2000", "1/3/2000", periods=2))]
    )
    da_dt64.interp(time="2000-01-02")

The interpolated data can be merged into the original :py:class:`~xarray.DataArray` by specifying the time periods required.

.. jupyter-execute::

    da_dt64.interp(time=pd.date_range("1/1/2000", "1/3/2000", periods=3))

Interpolation of data indexed by a :py:class:`~xarray.CFTimeIndex` is also allowed. See :ref:`CFTimeIndex` for examples.

.. note::

    Currently, our interpolation only works for regular grids. Therefore, similarly to :py:meth:`~xarray.DataArray.sel`, only 1D coordinates along a dimension can be used as the original coordinate to be interpolated.

Multi-dimensional Interpolation
-------------------------------

Like :py:meth:`~xarray.DataArray.sel`, :py:meth:`~xarray.DataArray.interp` accepts multiple coordinates. In this case, multidimensional interpolation is carried out.

.. jupyter-execute::

    # label lookup
    da.sel(time=2, space=0.1)

.. jupyter-execute::

    # interpolation
    da.interp(time=2.5, space=0.15)

Array-like coordinates are also accepted:

.. jupyter-execute::

    # label lookup
    da.sel(time=[2, 3], space=[0.1, 0.2])

.. jupyter-execute::

    # interpolation
    da.interp(time=[1.5, 2.5], space=[0.15, 0.25])

The :py:meth:`~xarray.DataArray.interp_like` method is a useful shortcut.
This method interpolates an xarray object onto the coordinates of another xarray object. For example, if we want to compute the difference between two :py:class:`~xarray.DataArray` objects (``da`` and ``other``) which are on slightly different coordinates,

.. jupyter-execute::

    other = xr.DataArray(
        np.sin(0.4 * np.arange(9).reshape(3, 3)),
        [("time", [0.9, 1.9, 2.9]), ("space", [0.15, 0.25, 0.35])],
    )

it might be a good idea to first interpolate ``da`` so that it lies on the same coordinates as ``other``, and then subtract it. :py:meth:`~xarray.DataArray.interp_like` can be used for such a case,

.. jupyter-execute::

    # interpolate da along other's coordinates
    interpolated = da.interp_like(other)
    interpolated

It is now possible to safely compute the difference ``other - interpolated``.

Interpolation methods
---------------------

We use either :py:class:`scipy.interpolate.interp1d` or special interpolants from :py:mod:`scipy.interpolate` for 1-dimensional interpolation (see :py:meth:`~xarray.Dataset.interp`). For multi-dimensional interpolation, an attempt is first made to decompose the interpolation into a series of 1-dimensional interpolations, in which case the relevant 1-dimensional interpolator is used. If a decomposition cannot be made (e.g. with advanced interpolation), :py:func:`scipy.interpolate.interpn` is used.

The interpolation method can be specified by the optional ``method`` argument.

.. jupyter-execute::

    da = xr.DataArray(
        np.sin(np.linspace(0, 2 * np.pi, 10)),
        dims="x",
        coords={"x": np.linspace(0, 1, 10)},
    )

    da.plot.line("o", label="original")
    da.interp(x=np.linspace(0, 1, 100)).plot.line(label="linear (default)")
    da.interp(x=np.linspace(0, 1, 100), method="cubic").plot.line(label="cubic")
    plt.legend();

Additional keyword arguments can be passed to scipy's functions.

.. jupyter-execute::

    # fill 0 for the outside of the original coordinates.
    da.interp(x=np.linspace(-0.5, 1.5, 10), kwargs={"fill_value": 0.0})

.. jupyter-execute::

    # 1-dimensional extrapolation
    da.interp(x=np.linspace(-0.5, 1.5, 10), kwargs={"fill_value": "extrapolate"})

.. jupyter-execute::

    # multi-dimensional extrapolation
    da = xr.DataArray(
        np.sin(0.3 * np.arange(12).reshape(4, 3)),
        [("time", np.arange(4)), ("space", [0.1, 0.2, 0.3])],
    )

    da.interp(
        time=4, space=np.linspace(-0.1, 0.5, 10), kwargs={"fill_value": "extrapolate"}
    )

Advanced Interpolation
----------------------

:py:meth:`~xarray.DataArray.interp` accepts :py:class:`~xarray.DataArray` objects similar to :py:meth:`~xarray.DataArray.sel`, which enables more advanced interpolation. Based on the dimension of the new coordinate passed to :py:meth:`~xarray.DataArray.interp`, the dimensions of the result are determined. For example, if you want to interpolate a two dimensional array along a particular dimension, as illustrated below, you can pass two 1-dimensional :py:class:`~xarray.DataArray` objects with a common dimension as new coordinates.

.. image:: ../_static/advanced_selection_interpolation.svg
    :height: 200px
    :width: 400 px
    :alt: advanced indexing and interpolation
    :align: center

For example:

.. jupyter-execute::

    da = xr.DataArray(
        np.sin(0.3 * np.arange(20).reshape(5, 4)),
        [("x", np.arange(5)), ("y", [0.1, 0.2, 0.3, 0.4])],
    )

    # advanced indexing
    x = xr.DataArray([0, 2, 4], dims="z")
    y = xr.DataArray([0.1, 0.2, 0.3], dims="z")
    da.sel(x=x, y=y)
.. jupyter-execute::

    # advanced interpolation, without extrapolation
    x = xr.DataArray([0.5, 1.5, 2.5, 3.5], dims="z")
    y = xr.DataArray([0.15, 0.25, 0.35, 0.45], dims="z")
    da.interp(x=x, y=y)

where values on the original coordinates ``(x, y) = ((0.5, 0.15), (1.5, 0.25), (2.5, 0.35), (3.5, 0.45))`` are obtained by the 2-dimensional interpolation and mapped along a new dimension ``z``. Since no keyword arguments are passed to the interpolation routine, no extrapolation is performed, resulting in a ``nan`` value.

If you want to add a coordinate to the new dimension ``z``, you can supply :py:class:`~xarray.DataArray` objects with a coordinate. Extrapolation can be achieved by passing additional arguments to SciPy's ``interpn`` function,

.. jupyter-execute::

    x = xr.DataArray([0.5, 1.5, 2.5, 3.5], dims="z", coords={"z": ["a", "b", "c", "d"]})
    y = xr.DataArray(
        [0.15, 0.25, 0.35, 0.45], dims="z", coords={"z": ["a", "b", "c", "d"]}
    )
    da.interp(x=x, y=y, kwargs={"fill_value": None})

For the details of the advanced indexing, see :ref:`more advanced indexing <more_advanced_indexing>`.

Interpolating arrays with NaN
-----------------------------

Our :py:meth:`~xarray.DataArray.interp` works with arrays with NaN the same way that :py:class:`scipy.interpolate.interp1d` and :py:func:`scipy.interpolate.interpn` do. ``linear`` and ``nearest`` methods return arrays including NaN, while other methods such as ``cubic`` or ``quadratic`` return all NaN arrays.

.. jupyter-execute::

    da = xr.DataArray([0, 2, np.nan, 3, 3.25], dims="x", coords={"x": range(5)})
    da.interp(x=[0.5, 1.5, 2.5])

.. jupyter-execute::

    da.interp(x=[0.5, 1.5, 2.5], method="cubic")

To avoid this, you can drop the NaN values with :py:meth:`~xarray.DataArray.dropna`, and then make the interpolation

.. jupyter-execute::

    dropped = da.dropna("x")
    dropped

.. jupyter-execute::

    dropped.interp(x=[0.5, 1.5, 2.5], method="cubic")

If NaNs are distributed randomly in your multidimensional array, dropping all the columns containing more than one NaN by :py:meth:`~xarray.DataArray.dropna` may lose a significant amount of information. In such a case, you can fill the NaNs with :py:meth:`~xarray.DataArray.interpolate_na`, which is similar to :py:meth:`pandas.Series.interpolate`.

.. jupyter-execute::

    filled = da.interpolate_na(dim="x")
    filled

This fills NaN by interpolating along the specified dimension. After filling NaNs, you can interpolate:

.. jupyter-execute::

    filled.interp(x=[0.5, 1.5, 2.5], method="cubic")

For the details of :py:meth:`~xarray.DataArray.interpolate_na`, see :ref:`Missing values <missing_values>`.

Example
-------

Let's see how :py:meth:`~xarray.DataArray.interp` works on real data.

.. jupyter-execute::

    # Raw data
    ds = xr.tutorial.open_dataset("air_temperature").isel(time=0)
    fig, axes = plt.subplots(ncols=2, figsize=(10, 4))
    ds.air.plot(ax=axes[0])
    axes[0].set_title("Raw data")

    # Interpolated data
    new_lon = np.linspace(ds.lon[0].item(), ds.lon[-1].item(), ds.sizes["lon"] * 4)
    new_lat = np.linspace(ds.lat[0].item(), ds.lat[-1].item(), ds.sizes["lat"] * 4)
    dsi = ds.interp(lat=new_lat, lon=new_lon)
    dsi.air.plot(ax=axes[1])
    axes[1].set_title("Interpolated data");

Our advanced interpolation can be used to remap the data to the new coordinate. Consider the new coordinates ``x`` and ``z`` on the two dimensional plane. The remapping can be done as follows:
.. jupyter-execute::

    # new coordinate
    x = np.linspace(240, 300, 100)
    z = np.linspace(20, 70, 100)

    # relation between new and original coordinates
    lat = xr.DataArray(z, dims=["z"], coords={"z": z})
    lon = xr.DataArray(
        (x[:, np.newaxis] - 270) / np.cos(z * np.pi / 180) + 270,
        dims=["x", "z"],
        coords={"x": x, "z": z},
    )

    fig, axes = plt.subplots(ncols=2, figsize=(10, 4))
    ds.air.plot(ax=axes[0])

    # draw the new coordinate on the original coordinates.
    for idx in [0, 33, 66, 99]:
        axes[0].plot(lon.isel(x=idx), lat, "--k")
    for idx in [0, 33, 66, 99]:
        axes[0].plot(*xr.broadcast(lon.isel(z=idx), lat.isel(z=idx)), "--k")
    axes[0].set_title("Raw data")

    dsi = ds.interp(lon=lon, lat=lat)
    dsi.air.plot(ax=axes[1])
    axes[1].set_title("Remapped data");

.. currentmodule:: xarray

.. _io:

Reading and writing files
=========================

Xarray supports direct serialization and IO to several file formats, from simple :ref:`io.pickle` files to the more flexible :ref:`io.netcdf` format (recommended).

.. jupyter-execute::
    :hide-code:

    import os

    import iris
    import ncdata.iris_xarray
    import numpy as np
    import pandas as pd
    import xarray as xr

    np.random.seed(123456)

You can read different types of files in ``xr.open_dataset`` by specifying the engine to be used:

.. code:: python

    xr.open_dataset("example.nc", engine="netcdf4")

The "engine" provides a set of instructions that tells xarray how to read the data and pack them into a ``Dataset`` (or ``DataArray``). These instructions are stored in an underlying "backend". Xarray comes with several backends that cover many common data formats. Many more backends are available via external libraries, or you can `write your own <https://docs.xarray.dev/en/stable/internals/how-to-add-new-backend.html>`_.

This diagram aims to help you determine - based on the format of the file you'd like to read - which type of backend you're using and how to use it. Text and boxes are clickable for more information. Following the diagram is detailed information on many popular backends. You can learn more about using and developing backends in the `Xarray tutorial JupyterBook <https://tutorial.xarray.dev>`_.

.. _comment: mermaid Flowchart "link" text gets secondary color background, SVG icon fill gets primary color

.. mermaid::
    :config: {"theme":"base","themeVariables":{"fontSize":"20px","primaryColor":"#fff","primaryTextColor":"#fff","primaryBorderColor":"#59c7d6","lineColor":"#e28126","secondaryColor":"#767985"}}
    :alt: Flowchart illustrating how to choose the right backend engine to read your data

    flowchart LR
        built-in-eng["`**Is your data stored in one of these formats?**
        - netCDF4
        - netCDF3
        - Zarr
        - DODS/OPeNDAP
        - HDF5
        `"]
        built-in("`**You're in luck!** Xarray bundles a backend to automatically read these formats. Open data using xr.open_dataset(). We recommend explicitly setting engine='xxxx' for faster loading.`")
        installed-eng["""One of these formats?
        - GRIB
        - TileDB
        - GeoTIFF, JPEG-2000, etc. (via GDAL)
        - Sentinel-1 SAFE
        """]
        installed("""Install the linked backend library and use it with xr.open_dataset(file, engine='xxxx').""")
        other["`**Options:**
        - Look around to see if someone has created an Xarray backend for your format!
        - Create your own backend
        - Convert your data to a supported format
        `"]

        built-in-eng -->|Yes| built-in
        built-in-eng -->|No| installed-eng
        installed-eng -->|Yes| installed
        installed-eng -->|No| other

        click built-in-eng "https://docs.xarray.dev/en/stable/get-help/faq.html#how-do-i-open-format-x-file-as-an-xarray-dataset"

        classDef quesNodefmt font-size:12pt,fill:#0e4666,stroke:#59c7d6,stroke-width:3
        class built-in-eng,installed-eng quesNodefmt

        classDef ansNodefmt font-size:12pt,fill:#4a4a4a,stroke:#17afb4,stroke-width:3
        class built-in,installed,other ansNodefmt

        linkStyle default font-size:18pt,stroke-width:4

.. _io.netcdf:

netCDF
------

The recommended way to store xarray data structures is `netCDF`__, which is a binary file format for self-describing datasets that originated in the geosciences. Xarray is based on the netCDF data model, so netCDF files on disk directly correspond to :py:class:`Dataset` objects (more accurately, a group in a netCDF file directly corresponds to a :py:class:`Dataset` object; see :ref:`io.netcdf_groups` for more).

NetCDF is supported on almost all platforms, and parsers exist for the vast majority of scientific programming languages. Recent versions of netCDF are based on the even more widely used HDF5 file-format.

__ https://www.unidata.ucar.edu/software/netcdf/

.. tip::

    If you aren't familiar with this data format, the `netCDF FAQ`_ is a good place to start.

.. _netCDF FAQ: https://www.unidata.ucar.edu/software/netcdf/docs/faq.html#What-Is-netCDF

Reading and writing netCDF files with xarray requires scipy, h5netcdf, or the `netCDF4-Python`__ library to be installed. SciPy only supports reading and writing of netCDF V3 files.

__ https://github.com/Unidata/netcdf4-python

We can save a Dataset to disk using the :py:meth:`Dataset.to_netcdf` method:

.. jupyter-execute::

    ds = xr.Dataset(
        {"foo": (("x", "y"), np.random.rand(4, 5))},
        coords={
            "x": [10, 20, 30, 40],
            "y": pd.date_range("2000-01-01", periods=5),
            "z": ("x", list("abcd")),
        },
    )

    ds.to_netcdf("saved_on_disk.nc")

By default, the file is saved as netCDF4 (assuming netCDF4-Python is installed). You can control the format and engine used to write the file with the ``format`` and ``engine`` arguments.

.. tip::

    Using the `h5netcdf <https://github.com/h5netcdf/h5netcdf>`_ package by passing ``engine='h5netcdf'`` to :py:meth:`open_dataset` can sometimes be quicker than the default ``engine='netcdf4'`` that uses the `netCDF4 <https://github.com/Unidata/netcdf4-python>`_ package.

We can load netCDF files to create a new Dataset using :py:func:`open_dataset`:

.. jupyter-execute::

    ds_disk = xr.open_dataset("saved_on_disk.nc")
    ds_disk

.. jupyter-execute::
    :hide-code:

    # Close "saved_on_disk.nc", but retain the file until after closing or deleting other
    # datasets that will refer to it.
    ds_disk.close()

Similarly, a DataArray can be saved to disk using the :py:meth:`DataArray.to_netcdf` method, and loaded from disk using the :py:func:`open_dataarray` function. As netCDF files correspond to :py:class:`Dataset` objects, these functions internally convert the ``DataArray`` to a ``Dataset`` before saving, and then convert back when loading, ensuring that the ``DataArray`` that is loaded is always exactly the same as the one that was saved.

A dataset can also be loaded or written to a specific group within a netCDF file. To load from a group, pass a ``group`` keyword argument to the ``open_dataset`` function. The group can be specified as a path-like string, e.g., to access subgroup 'bar' within group 'foo' pass '/foo/bar' as the ``group`` argument.
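For example, a minimal sketch (the file name and group path here are hypothetical, assuming a file that contains a subgroup ``/foo/bar``):

.. code:: python

    # hypothetical file and group path, for illustration only
    ds_bar = xr.open_dataset("saved_groups.nc", group="/foo/bar")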
When writing multiple groups in one file, pass ``mode='a'`` to ``to_netcdf`` to ensure that each call does not delete the file.

.. tip::

    It is recommended to use :py:class:`~xarray.DataTree` to represent hierarchical data, and to use the :py:meth:`xarray.DataTree.to_netcdf` method when writing hierarchical data to a netCDF file.

Data is *always* loaded lazily from netCDF files. You can manipulate, slice and subset Dataset and DataArray objects, and no array values are loaded into memory until you try to perform some sort of actual computation. For an example of how these lazy arrays work, see the OPeNDAP section below.

There may be minor differences in the :py:class:`Dataset` object returned when reading a NetCDF file with different engines.

It is important to note that when you modify values of a Dataset, even one linked to files on disk, only the in-memory copy you are manipulating in xarray is modified: the original file on disk is never touched.

.. tip::

    Xarray's lazy loading of remote or on-disk datasets is often but not always desirable. Before performing computationally intense operations, it is often a good idea to load a Dataset (or DataArray) entirely into memory by invoking the :py:meth:`Dataset.load` method.

Datasets have a :py:meth:`Dataset.close` method to close the associated netCDF file. However, it's often cleaner to use a ``with`` statement:

.. jupyter-execute::

    # this automatically closes the dataset after use
    with xr.open_dataset("saved_on_disk.nc") as ds:
        print(ds.keys())

Although xarray provides reasonable support for incremental reads of files on disk, it does not support incremental writes, which can be a useful strategy for dealing with datasets too big to fit into memory. Instead, xarray integrates with dask.array (see :ref:`dask`), which provides a fully featured engine for streaming computation.

It is possible to append or overwrite netCDF variables using the ``mode='a'`` argument. When using this option, all variables in the dataset will be written to the original netCDF file, regardless of whether they already exist in the original dataset.

.. _io.netcdf_groups:

Groups
~~~~~~

Whilst netCDF groups can only be loaded individually as ``Dataset`` objects, a whole file of many nested groups can be loaded as a single :py:class:`xarray.DataTree` object. To open a whole netCDF file as a tree of groups use the :py:func:`xarray.open_datatree` function. To save a DataTree object as a netCDF file containing many groups, use the :py:meth:`xarray.DataTree.to_netcdf` method.

.. _netcdf.root_group.note:

.. note::

    Due to file format specifications the on-disk root group name is always ``"/"``, overriding any given ``DataTree`` root node name.

.. _netcdf.group.warning:

.. warning::

    ``DataTree`` objects do not follow the exact same data model as netCDF files, which means that perfect round-tripping is not always possible.

    In particular in the netCDF data model dimensions are entities that can exist regardless of whether any variable possesses them. This is in contrast to xarray's data model (and hence DataTree's data model), in which the dimensions of a (Dataset/Tree) object are simply the set of dimensions present across all variables in that dataset. This means that if a netCDF file contains dimensions but no variables which possess those dimensions, these dimensions will not be present when that file is opened as a DataTree object. Saving this DataTree object to file will therefore not preserve these "unused" dimensions.
.. _io.encoding:

Reading encoded data
~~~~~~~~~~~~~~~~~~~~

NetCDF files follow some conventions for encoding datetime arrays (as numbers with a "units" attribute) and for packing and unpacking data (as described by the "scale_factor" and "add_offset" attributes). If the argument ``decode_cf=True`` (default) is given to :py:func:`open_dataset`, xarray will attempt to automatically decode the values in the netCDF objects according to `CF conventions`_. Sometimes this will fail, for example, if a variable has an invalid "units" or "calendar" attribute. For these cases, you can turn this decoding off manually.

.. _CF conventions: https://cfconventions.org/

You can view this encoding information (among others) in the :py:attr:`DataArray.encoding` and :py:attr:`Dataset.encoding` attributes:

.. jupyter-execute::

    ds_disk["y"].encoding

.. jupyter-execute::

    ds_disk.encoding

Note that all operations that manipulate variables other than indexing will remove encoding information.

In some cases it is useful to intentionally reset a dataset's original encoding values. This can be done with either the :py:meth:`Dataset.drop_encoding` or :py:meth:`DataArray.drop_encoding` methods.

.. jupyter-execute::

    ds_no_encoding = ds_disk.drop_encoding()
    ds_no_encoding.encoding

.. _combining multiple files:

Reading multi-file datasets
...........................

NetCDF files are often encountered in collections, e.g., with different files corresponding to different model runs or one file per timestamp. Xarray can straightforwardly combine such files into a single Dataset by making use of :py:func:`concat`, :py:func:`merge`, :py:func:`combine_nested` and :py:func:`combine_by_coords`. For details on the difference between these functions see :ref:`combining data`.

Xarray includes support for manipulating datasets that don't fit into memory with dask_. If you have dask installed, you can open multiple files simultaneously in parallel using :py:func:`open_mfdataset`::

    xr.open_mfdataset('my/files/*.nc', parallel=True)

This function automatically concatenates and merges multiple files into a single xarray dataset. It is the recommended way to open multiple files with xarray. For more details on parallel reading, see :ref:`combining.multi`, :ref:`dask.io` and a `blog post`_ by Stephan Hoyer. :py:func:`open_mfdataset` takes many kwargs that allow you to control its behaviour (e.g. ``parallel``, ``combine``, ``compat``, ``join``, ``concat_dim``). See its docstring for more details.

.. note::

    A common use-case involves a dataset distributed across a large number of files with each file containing a large number of variables. Commonly, a few of these variables need to be concatenated along a dimension (say ``"time"``), while the rest are equal across the datasets (ignoring floating point differences). The following command with suitable modifications (such as ``parallel=True``) works well with such datasets::

        xr.open_mfdataset('my/files/*.nc', concat_dim="time", combine="nested",
                          data_vars='minimal', coords='minimal', compat='override')

    This command concatenates variables along the ``"time"`` dimension, but only those that already contain the ``"time"`` dimension (``data_vars='minimal', coords='minimal'``). Variables that lack the ``"time"`` dimension are taken from the first dataset (``compat='override'``).

.. _dask: https://www.dask.org
.. _blog post: https://stephanhoyer.com/2015/06/11/xray-dask-out-of-core-labeled-arrays/

Sometimes multi-file datasets are not conveniently organized for easy use of :py:func:`open_mfdataset`. One can use the ``preprocess`` argument to provide a function that takes a dataset and returns a modified Dataset. :py:func:`open_mfdataset` will call ``preprocess`` on every dataset (corresponding to each file) prior to combining them.

If :py:func:`open_mfdataset` does not meet your needs, other approaches are possible. The general pattern for parallel reading of multiple files using dask, modifying those datasets and then combining into a single ``Dataset`` is::

    import dask

    import xarray as xr

    def modify(ds):
        # modify ds here
        return ds

    # this is basically what open_mfdataset does
    open_kwargs = dict(decode_cf=True, decode_times=False)
    open_tasks = [dask.delayed(xr.open_dataset)(f, **open_kwargs) for f in file_names]
    tasks = [dask.delayed(modify)(task) for task in open_tasks]
    # dask.compute returns a one-element tuple; unpack the list of xarray.Datasets
    datasets = dask.compute(tasks)[0]
    combined = xr.combine_nested(datasets)  # or some combination of concat, merge

As an example, here's how we could approximate ``MFDataset`` from the netCDF4 library::

    from glob import glob

    import xarray as xr

    def read_netcdfs(files, dim):
        # glob expands paths with * to a list of files, like the unix shell
        paths = sorted(glob(files))
        datasets = [xr.open_dataset(p) for p in paths]
        combined = xr.concat(datasets, dim)
        return combined

    combined = read_netcdfs('/all/my/files/*.nc', dim='time')

This function will work in many cases, but it's not very robust. First, it never closes files, which means it will fail if you need to load more than a few thousand files. Second, it assumes that you want all the data from each file and that it can all fit into memory. In many situations, you only need a small subset or an aggregated summary of the data from each file.

Here's a slightly more sophisticated example of how to remedy these deficiencies::

    def read_netcdfs(files, dim, transform_func=None):
        def process_one_path(path):
            # use a context manager, to ensure the file gets closed after use
            with xr.open_dataset(path) as ds:
                # transform_func should do some sort of selection or
                # aggregation
                if transform_func is not None:
                    ds = transform_func(ds)
                # load all data from the transformed dataset, to ensure we can
                # use it after closing each original file
                ds.load()
                return ds

        paths = sorted(glob(files))
        datasets = [process_one_path(p) for p in paths]
        combined = xr.concat(datasets, dim)
        return combined

    # here we suppose we only care about the combined mean of each file;
    # you might also use indexing operations like .sel to subset datasets
    combined = read_netcdfs('/all/my/files/*.nc', dim='time',
                            transform_func=lambda ds: ds.mean())

This pattern works well and is very robust. We've used similar code to process tens of thousands of files constituting 100s of GB of data.

.. _io.netcdf.writing_encoded:

Writing encoded data
~~~~~~~~~~~~~~~~~~~~

Conversely, you can customize how xarray writes netCDF files on disk by providing explicit encodings for each dataset variable. The ``encoding`` argument takes a dictionary with variable names as keys and variable specific encodings as values. These encodings are saved as attributes on the netCDF variables on disk, which allows xarray to faithfully read encoded data back into memory.
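As a minimal sketch (the file name is illustrative; ``zlib`` and ``complevel`` are netCDF4 encoding options discussed under chunk based compression below):

.. code:: python

    # illustrative only: write "foo" as compressed 32-bit floats
    ds.to_netcdf(
        "encoded.nc",
        encoding={"foo": {"dtype": "float32", "zlib": True, "complevel": 4}},
    )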
It is important to note that using encodings is entirely optional: if you do not supply any of these encoding options, xarray will write data to disk using a default encoding, or the options in the ``encoding`` attribute, if set. This works perfectly fine in most cases, but encoding can be useful for additional control, especially for enabling compression.

In the file on disk, these encodings are saved as attributes on each variable, which allow xarray and other CF-compliant tools for working with netCDF files to correctly read the data.

Scaling and type conversions
............................

These encoding options (based on `CF Conventions on packed data`_) work on any version of the netCDF file format:

- ``dtype``: Any valid NumPy dtype or string convertible to a dtype, e.g., ``'int16'`` or ``'float32'``. This controls the type of the data written on disk.
- ``_FillValue``: Values of ``NaN`` in xarray variables are remapped to this value when saved on disk. This is important when converting floating point with missing values to integers on disk, because ``NaN`` is not a valid value for integer dtypes. By default, variables with float types are attributed a ``_FillValue`` of ``NaN`` in the output file, unless explicitly disabled with an encoding ``{'_FillValue': None}``.
- ``scale_factor`` and ``add_offset``: Used to convert from encoded data on disk to the decoded data in memory, according to the formula ``decoded = scale_factor * encoded + add_offset``. Please note that ``scale_factor`` and ``add_offset`` must be of the same type and determine the type of the decoded data.

These parameters can be fruitfully combined to compress discretized data on disk. For example, to save the variable ``foo`` with a precision of 0.1 in 16-bit integers while converting ``NaN`` to ``-9999``, we would use ``encoding={'foo': {'dtype': 'int16', 'scale_factor': 0.1, '_FillValue': -9999}}``. Compression and decompression with such discretization is extremely fast.

.. _CF Conventions on packed data: https://cfconventions.org/cf-conventions/cf-conventions.html#packed-data

.. _io.string-encoding:

String encoding
...............

Xarray can write unicode strings to netCDF files in two ways:

- As variable length strings. This is only supported on netCDF4 (HDF5) files.
- By encoding strings into bytes, and writing encoded bytes as a character array. The default encoding is UTF-8.

By default, we use variable length strings for compatible files and fall back to using encoded character arrays. Character arrays can be selected even for netCDF4 files by setting the ``dtype`` field in ``encoding`` to ``S1`` (corresponding to NumPy's single-character bytes dtype).

If character arrays are used:

- The string encoding that was used is stored on disk in the ``_Encoding`` attribute, which matches an ad-hoc convention adopted by the netCDF4-Python library. At the time of this writing (October 2017), a standard convention for indicating string encoding for character arrays in netCDF files was still under discussion. Technically, you can use `any string encoding recognized by Python <https://docs.python.org/3/library/codecs.html#standard-encodings>`_ if you feel the need to deviate from UTF-8, by setting the ``_Encoding`` field in ``encoding``. But `we don't recommend it <http://utf8everywhere.org/>`_.
- The character dimension name can be specified by the ``char_dim_name`` field of a variable's ``encoding``. If the name of the character dimension is not specified, the default is ``f'string{data.shape[-1]}'``.
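For instance, a minimal sketch (reusing the ``ds`` saved above, whose coordinate ``z`` holds strings; the file name is illustrative):

.. code:: python

    # illustrative only: force the string coordinate "z" to be written
    # as a fixed-width character array rather than variable length strings
    ds.to_netcdf("chars.nc", encoding={"z": {"dtype": "S1"}})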
When decoding character arrays from existing files, the ``char_dim_name`` is added to the variable's ``encoding`` so that the original dimension name is preserved if the variable is encoded again; the field can be edited by the user.

.. warning::

    Missing values in bytes or unicode string arrays (represented by ``NaN`` in xarray) are currently written to disk as empty strings ``''``. This means missing values will not be restored when data is loaded from disk. This behavior is likely to change in the future (:issue:`1647`). Unfortunately, explicitly setting a ``_FillValue`` for string arrays to handle missing values doesn't work yet either, though we also hope to fix this in the future.

Chunk based compression
.......................

``zlib``, ``complevel``, ``fletcher32``, ``contiguous`` and ``chunksizes`` can be used for enabling netCDF4/HDF5's chunk based compression, as described in the `documentation for createVariable`_ for netCDF4-Python. This only works for netCDF4 files and thus requires using ``format='netCDF4'`` and either ``engine='netcdf4'`` or ``engine='h5netcdf'``.

.. _documentation for createVariable: https://unidata.github.io/netcdf4-python/#netCDF4.Dataset.createVariable

Chunk based gzip compression can yield impressive space savings, especially for sparse data, but it comes with significant performance overhead. HDF5 libraries can only read complete chunks back into memory, and maximum decompression speed is in the range of 50-100 MB/s. Worse, HDF5's compression and decompression currently cannot be parallelized with dask. For these reasons, we recommend trying discretization based compression (described above) first.

Time units
..........

The ``units`` and ``calendar`` attributes control how xarray serializes ``datetime64`` and ``timedelta64`` arrays to datasets on disk as numeric values. The ``units`` encoding should be a string like ``'days since 1900-01-01'`` for ``datetime64`` data or a string like ``'days'`` for ``timedelta64`` data. ``calendar`` should be one of the calendar types supported by netCDF4-python: ``'standard'``, ``'gregorian'``, ``'proleptic_gregorian'``, ``'noleap'``, ``'365_day'``, ``'360_day'``, ``'julian'``, ``'all_leap'``, ``'366_day'``.

By default, xarray uses the ``'proleptic_gregorian'`` calendar and units of the smallest time difference between values, with a reference time of the first time value.

.. _io.coordinates:

Coordinates
...........

You can control the ``coordinates`` attribute written to disk by specifying ``DataArray.encoding["coordinates"]``. If not specified, xarray automatically sets ``DataArray.encoding["coordinates"]`` to a space-delimited list of names of coordinate variables that share dimensions with the ``DataArray`` being written. This allows perfect roundtripping of xarray datasets but may not be desirable. When an xarray ``Dataset`` contains non-dimensional coordinates that do not share dimensions with any of the variables, these coordinate variable names are saved under a "global" ``"coordinates"`` attribute. This is not CF-compliant but again facilitates roundtripping of xarray datasets.

Invalid netCDF files
~~~~~~~~~~~~~~~~~~~~

The library ``h5netcdf`` allows writing some dtypes that aren't allowed in netCDF4 (see the `h5netcdf <https://github.com/h5netcdf/h5netcdf>`_ documentation). This feature is available through :py:meth:`DataArray.to_netcdf` and :py:meth:`Dataset.to_netcdf` when used with ``engine="h5netcdf"`` and currently raises a warning unless ``invalid_netcdf=True`` is set.

.. warning::

    Note that this produces a file that is likely to be not readable by other netCDF libraries!
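A minimal sketch (assuming ``h5netcdf`` is installed; the complex-valued variable and file name are illustrative):

.. code:: python

    # complex values are not valid netCDF4, but h5netcdf can write them
    ds_cplx = xr.Dataset({"psi": ("x", np.exp(1j * np.linspace(0, np.pi, 4)))})
    ds_cplx.to_netcdf("complex.nc", engine="h5netcdf", invalid_netcdf=True)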
.. _io.hdf5:

HDF5
----

`HDF5`_ is both a file format and a data model for storing information. HDF5 stores data hierarchically, using groups to create a nested structure. HDF5 is a more general version of the netCDF4 data model, so the nested structure is one of many similarities between the two data formats.

Reading HDF5 files in xarray requires the ``h5netcdf`` engine, which can be installed with ``conda install h5netcdf``. Once installed we can use xarray to open HDF5 files:

.. code:: python

    xr.open_dataset("/path/to/my/file.h5")

The similarities between HDF5 and netCDF4 mean that HDF5 data can be written with the same :py:meth:`Dataset.to_netcdf` method as used for netCDF4 data:

.. jupyter-execute::

    ds = xr.Dataset(
        {"foo": (("x", "y"), np.random.rand(4, 5))},
        coords={
            "x": [10, 20, 30, 40],
            "y": pd.date_range("2000-01-01", periods=5),
            "z": ("x", list("abcd")),
        },
    )

    ds.to_netcdf("saved_on_disk.h5")

Groups
~~~~~~

If you have multiple or highly nested groups, xarray by default may not read the group that you want. A particular group of an HDF5 file can be specified using the ``group`` argument:

.. code:: python

    xr.open_dataset("/path/to/my/file.h5", group="/my/group")

While xarray cannot interrogate an HDF5 file to determine which groups are available, the HDF5 Python reader `h5py`_ can be used instead.

Natively the xarray data structures can only handle one level of nesting, organized as DataArrays inside of Datasets. If your HDF5 file has additional levels of hierarchy you can only access one group at a time and will need to specify group names.

.. _HDF5: https://hdfgroup.github.io/hdf5/index.html
.. _h5py: https://www.h5py.org/

.. _io.zarr:

Zarr
----

`Zarr`_ is a Python package that provides an implementation of chunked, compressed, N-dimensional arrays. Zarr has the ability to store arrays in a range of ways, including in memory, in files, and in cloud-based object storage such as `Amazon S3`_ and `Google Cloud Storage`_. Xarray's Zarr backend allows xarray to leverage these capabilities, including the ability to store and analyze datasets far too large to fit onto disk (particularly :ref:`in combination with dask <dask>`).

Xarray can't open just any zarr dataset, because xarray requires special metadata (attributes) describing the dataset dimensions and coordinates. At this time, xarray can only open zarr datasets with these special attributes, such as zarr datasets written by xarray, netCDF, or GDAL. For implementation details, see :ref:`zarr_encoding`.

To write a dataset with zarr, we use the :py:meth:`Dataset.to_zarr` method.

To write to a local directory, we pass a path to a directory:

.. jupyter-execute::
    :hide-code:

    ! rm -rf path/to/directory.zarr

.. jupyter-execute::
    :stderr:

    ds = xr.Dataset(
        {"foo": (("x", "y"), np.random.rand(4, 5))},
        coords={
            "x": [10, 20, 30, 40],
            "y": pd.date_range("2000-01-01", periods=5),
            "z": ("x", list("abcd")),
        },
    )

    ds.to_zarr("path/to/directory.zarr", zarr_format=2, consolidated=False)

(The suffix ``.zarr`` is optional--just a reminder that a zarr store lives there.) If the directory does not exist, it will be created. If a zarr store is already present at that path, an error will be raised, preventing it from being overwritten. To override this behavior and overwrite an existing store, add ``mode='w'`` when invoking :py:meth:`~Dataset.to_zarr`.

DataArrays can also be saved to disk using the :py:meth:`DataArray.to_zarr` method, and loaded from disk using the :py:func:`open_dataarray` function with ``engine='zarr'``.
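A minimal sketch (the store path is illustrative):

.. code:: python

    # illustrative only: round-trip a single DataArray through zarr
    ds.foo.to_zarr("path/to/foo.zarr", mode="w", consolidated=False)
    foo_disk = xr.open_dataarray("path/to/foo.zarr", engine="zarr", consolidated=False)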
Similar to :py:meth:`DataArray.to_netcdf`, :py:meth:`DataArray.to_zarr` will convert the ``DataArray`` to a ``Dataset`` before saving, and then convert back when loading, ensuring that the ``DataArray`` that is loaded is always exactly the same as the one that was saved.

.. note::

    xarray does not write NCZarr attributes. Therefore, NCZarr data must be opened in read-only mode.

To store variable length strings, convert them to object arrays first with ``dtype=object``.

To read back a zarr dataset that has been created this way, we use the :py:func:`open_zarr` method:

.. jupyter-execute::

    ds_zarr = xr.open_zarr("path/to/directory.zarr", consolidated=False)
    ds_zarr

Cloud Storage Buckets
~~~~~~~~~~~~~~~~~~~~~

It is possible to read and write xarray datasets directly from / to cloud storage buckets using zarr. This example uses the `gcsfs`_ package to provide an interface to `Google Cloud Storage`_.

General `fsspec`_ URLs, those that begin with ``s3://`` or ``gcs://`` for example, are parsed and the store set up for you automatically when reading. You should include any arguments to the storage backend as the key ``storage_options``, part of ``backend_kwargs``.

.. code:: python

    ds_gcs = xr.open_dataset(
        "gcs://<bucket-name>/path.zarr",
        backend_kwargs={
            "storage_options": {"project": "<project-name>", "token": None}
        },
        engine="zarr",
    )

This also works with ``open_mfdataset``, allowing you to pass a list of paths or a URL to be interpreted as a glob string.

For writing, you may either specify a bucket URL or explicitly set up a ``zarr.abc.store.Store`` instance, as follows:

.. tab:: URL

    .. code:: python

        # write to the bucket via GCS URL
        ds.to_zarr("gs://<bucket/path/to/directory.zarr>")
        # read it back
        ds_gcs = xr.open_zarr("gs://<bucket/path/to/directory.zarr>")

.. tab:: fsspec

    .. code:: python

        import gcsfs
        import zarr

        # manually manage the cloud filesystem connection -- useful, for example,
        # when you need to manage permissions to cloud resources
        fs = gcsfs.GCSFileSystem(project="<project-name>", token=None)
        zstore = zarr.storage.FsspecStore(fs, path="<bucket-name>")

        # write to the bucket
        ds.to_zarr(store=zstore)
        # read it back
        ds_gcs = xr.open_zarr(zstore)

.. tab:: obstore

    .. code:: python

        import obstore
        import zarr

        # alternatively, obstore offers a modern, performant interface for
        # cloud buckets
        gcsstore = obstore.store.GCSStore(
            "<bucket-name>", prefix="<path-to-store>", skip_signature=True
        )
        zstore = zarr.storage.ObjectStore(gcsstore)

        # write to the bucket
        ds.to_zarr(store=zstore)
        # read it back
        ds_gcs = xr.open_zarr(zstore)

.. _fsspec: https://filesystem-spec.readthedocs.io/en/latest/
.. _obstore: https://developmentseed.org/obstore/latest/
.. _Zarr: https://zarr.readthedocs.io/
.. _Amazon S3: https://aws.amazon.com/s3/
.. _Google Cloud Storage: https://cloud.google.com/storage/
.. _gcsfs: https://github.com/fsspec/gcsfs

.. _io.zarr.distributed_writes:

Distributed writes
~~~~~~~~~~~~~~~~~~

Xarray will natively use dask to write in parallel to a zarr store, which should satisfy most moderately sized datasets. For more flexible parallelization, we can use ``region`` to write to limited regions of arrays in an existing Zarr store.

To scale this up to writing large datasets, first create an initial Zarr store without writing all of its array data. This can be done by first creating a ``Dataset`` with dummy values stored in :ref:`dask`, and then calling ``to_zarr`` with ``compute=False`` to write only metadata (including ``attrs``) to Zarr:

.. jupyter-execute::
    :hide-code:

    ! rm -rf path/to/directory.zarr
.. jupyter-execute::

    import dask.array

    # The values of this dask array are entirely irrelevant; only the dtype,
    # shape and chunks are used
    dummies = dask.array.zeros(30, chunks=10)
    ds = xr.Dataset({"foo": ("x", dummies)}, coords={"x": np.arange(30)})
    path = "path/to/directory.zarr"

    # Now we write the metadata without computing any array values
    ds.to_zarr(path, compute=False, consolidated=False)

Now, a Zarr store with the correct variable shapes and attributes exists that can be filled out by subsequent calls to ``to_zarr``. ``region`` can be set to ``"auto"``, which opens the existing store and determines the correct alignment of the new data with the existing dimensions, or to an explicit mapping from dimension names to Python ``slice`` objects indicating where the data should be written (in index space, not label space), e.g.,

.. jupyter-execute::

    # For convenience, we'll slice a single dataset, but in the real use-case
    # we would create them separately possibly even from separate processes.
    ds = xr.Dataset({"foo": ("x", np.arange(30))}, coords={"x": np.arange(30)})
    # Any of the following region specifications are valid
    ds.isel(x=slice(0, 10)).to_zarr(path, region="auto", consolidated=False)
    ds.isel(x=slice(10, 20)).to_zarr(path, region={"x": "auto"}, consolidated=False)
    ds.isel(x=slice(20, 30)).to_zarr(path, region={"x": slice(20, 30)}, consolidated=False)

Concurrent writes with ``region`` are safe as long as they modify distinct chunks in the underlying Zarr arrays (or use an appropriate ``lock``).

As a safety check to make it harder to inadvertently override existing values, if you set ``region`` then *all* variables included in a Dataset must have dimensions included in ``region``. Other variables (typically coordinates) need to be explicitly dropped and/or written in separate calls to ``to_zarr`` with ``mode='a'``.

Zarr Compressors and Filters
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

There are many different options for compression and filtering possible with zarr; see the `Zarr`_ documentation for details.

These options can be passed to the ``to_zarr`` method as variable encoding. For example:

.. jupyter-execute::
    :hide-code:

    ! rm -rf foo.zarr

.. jupyter-execute::

    import zarr
    from zarr.codecs import BloscCodec

    compressor = BloscCodec(cname="zstd", clevel=3, shuffle="shuffle")
    ds.to_zarr("foo.zarr", consolidated=False, encoding={"foo": {"compressors": [compressor]}})

.. note::

    Not all native zarr compression and filtering options have been tested with xarray.

.. _io.zarr.appending:

Modifying existing Zarr stores
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Xarray supports several ways of incrementally writing variables to a Zarr store. These options are useful for scenarios when it is infeasible or undesirable to write your entire dataset at once.

1. Use ``mode='a'`` to add or overwrite entire variables,
2. Use ``append_dim`` to resize and append to existing variables, and
3. Use ``region`` to write to limited regions of existing arrays.

.. tip::

    For ``Dataset`` objects containing dask arrays, a single call to ``to_zarr()`` will write all of your data in parallel.

.. warning::

    Alignment of coordinates is currently not checked when modifying an existing Zarr store. It is up to the user to ensure that coordinates are consistent.

To add or overwrite entire variables, simply call :py:meth:`~Dataset.to_zarr` with ``mode='a'`` on a Dataset containing the new variables, passing in an existing Zarr store or path to a Zarr store.

To resize and then append values along an existing dimension in a store, set ``append_dim``.
This is a good option if data always arrives in a particular order, e.g., for time-stepping a simulation:

.. jupyter-execute::
    :hide-code:

    ! rm -rf path/to/directory.zarr

.. jupyter-execute::

    ds1 = xr.Dataset(
        {"foo": (("x", "y", "t"), np.random.rand(4, 5, 2))},
        coords={
            "x": [10, 20, 30, 40],
            "y": [1, 2, 3, 4, 5],
            "t": pd.date_range("2001-01-01", periods=2),
        },
    )
    ds1.to_zarr("path/to/directory.zarr", consolidated=False)

.. jupyter-execute::

    ds2 = xr.Dataset(
        {"foo": (("x", "y", "t"), np.random.rand(4, 5, 2))},
        coords={
            "x": [10, 20, 30, 40],
            "y": [1, 2, 3, 4, 5],
            "t": pd.date_range("2001-01-03", periods=2),
        },
    )
    ds2.to_zarr("path/to/directory.zarr", append_dim="t", consolidated=False)

.. _io.zarr.writing_chunks:

Specifying chunks in a zarr store
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Chunk sizes may be specified in one of three ways when writing to a zarr store:

1. Manual chunk sizing through the use of the ``encoding`` argument in :py:meth:`Dataset.to_zarr`.
2. Automatic chunking based on chunks in dask arrays.
3. Default chunk behavior determined by the zarr library.

The resulting chunks will be determined based on the order of the above list; dask chunks will be overridden by manually-specified chunks in the encoding argument, and the presence of either dask chunks or chunks in the ``encoding`` attribute will supersede the default chunking heuristics in zarr.

Importantly, this logic applies to every array in the zarr store individually, including coordinate arrays. Therefore, if a dataset contains one or more dask arrays, it may still be desirable to specify a chunk size for the coordinate arrays (for example, with a chunk size of ``-1`` to include the full coordinate).

To specify chunks manually using the ``encoding`` argument, provide a nested dictionary with the structure ``{'variable_or_coord_name': {'chunks': chunks_tuple}}``.

.. note::

    The positional ordering of the chunks in the encoding argument must match the positional ordering of the dimensions in each array. Watch out for arrays with differently-ordered dimensions within a single Dataset.

For example, let's say we're working with a dataset with dimensions ``('time', 'x', 'y')``, a variable ``Tair`` which is chunked in ``x`` and ``y``, and two multi-dimensional coordinates ``xc`` and ``yc``:

.. jupyter-execute::

    ds = xr.tutorial.open_dataset("rasm")

    ds["Tair"] = ds["Tair"].chunk({"x": 100, "y": 100})

    ds

These multi-dimensional coordinates are only two-dimensional and take up very little space on disk or in memory, yet when writing to disk the default zarr behavior is to split them into chunks:

.. jupyter-execute::

    ds.to_zarr("path/to/directory.zarr", consolidated=False, mode="w")
    !tree -I zarr.json path/to/directory.zarr

This may cause unwanted overhead on some systems, such as when reading from a cloud storage provider. To disable this chunking, we can specify a chunk size equal to the shape of each coordinate array in the ``encoding`` argument:

.. jupyter-execute::

    ds.to_zarr(
        "path/to/directory.zarr",
        encoding={"xc": {"chunks": ds.xc.shape}, "yc": {"chunks": ds.yc.shape}},
        consolidated=False,
        mode="w",
    )
    !tree -I zarr.json path/to/directory.zarr

The number of chunks on Tair matches our dask chunks, while there is now only a single chunk in the directory stores of each coordinate.

Groups
~~~~~~

Nested groups in zarr stores can be represented by loading the store as a :py:class:`xarray.DataTree` object, similarly to netCDF. To open a whole zarr store as a tree of groups use the :py:func:`open_datatree` function.
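A minimal sketch (the store path is illustrative, assuming a store with nested groups):

.. code:: python

    # illustrative only: open every group of a nested zarr store at once
    dt = xr.open_datatree("path/to/nested.zarr", engine="zarr")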
To save a ``DataTree`` object as a zarr store containing many groups, use the :py:meth:`xarray.DataTree.to_zarr()` method.

.. note::

    Note that perfect round-tripping should always be possible with a zarr store (:ref:`unlike for netCDF files <netcdf.group.warning>`), as zarr does not support "unused" dimensions.

    For the root group the same restrictions (:ref:`as for netCDF files <netcdf.root_group.note>`) apply. Due to file format specifications the on-disk root group name is always ``"/"``, overriding any given ``DataTree`` root node name.

.. _io.zarr.consolidated_metadata:

Consolidated Metadata
~~~~~~~~~~~~~~~~~~~~~

Xarray needs to read all of the zarr metadata when it opens a dataset. In some storage mediums, such as with cloud object storage (e.g. `Amazon S3`_), this can introduce significant overhead, because two separate HTTP calls to the object store must be made for each variable in the dataset.

By default Xarray uses a feature called *consolidated metadata*, storing all metadata for the entire dataset with a single key (by default called ``.zmetadata``). This typically drastically speeds up opening the store. (For more information on this feature, consult the `Zarr`_ documentation on consolidated metadata.)

By default, xarray writes consolidated metadata and attempts to read stores with consolidated metadata, falling back to non-consolidated metadata for reads. Because this fall-back option is so much slower, xarray issues a ``RuntimeWarning`` with guidance when reading with consolidated metadata fails:

    Failed to open Zarr store with consolidated metadata, falling back to try reading non-consolidated metadata. This is typically much slower for opening a dataset. To silence this warning, consider:

    1. Consolidating metadata in this existing store with :py:func:`zarr.consolidate_metadata`.
    2. Explicitly setting ``consolidated=False``, to avoid trying to read consolidated metadata.
    3. Explicitly setting ``consolidated=True``, to raise an error in this case instead of falling back to try reading non-consolidated metadata.

Fill Values
~~~~~~~~~~~

Zarr arrays have a ``fill_value`` that is used for chunks that were never written to disk. For the Zarr version 2 format, Xarray will set ``fill_value`` to be equal to the CF/NetCDF ``"_FillValue"``. This is ``np.nan`` by default for floats, and unset otherwise. Note that the Zarr library will set a default ``fill_value`` if not specified (usually ``0``).

For the Zarr version 3 format, ``_FillValue`` and ``fill_value`` are decoupled. So you can set ``fill_value`` in ``encoding`` as usual.

Note that at read-time, you can control whether ``_FillValue`` is masked using the ``mask_and_scale`` kwarg; and whether Zarr's ``fill_value`` is treated as synonymous with ``_FillValue`` using the ``use_zarr_fill_value_as_mask`` kwarg to :py:func:`xarray.open_zarr`.

.. _io.kerchunk:

Kerchunk
--------

`Kerchunk <https://fsspec.github.io/kerchunk/>`_ is a Python library that allows you to access chunked and compressed data formats (such as NetCDF3, NetCDF4, HDF5, GRIB2, TIFF & FITS), many of which are primary data formats for many data archives, by viewing the whole archive as an ephemeral `Zarr`_ dataset which allows for parallel, chunk-specific access.

Instead of creating a new copy of the dataset in the Zarr spec/format or downloading the files locally, Kerchunk reads through the data archive and extracts the byte range and compression information of each chunk and saves it as a ``reference``. These references are then saved as ``json`` files or ``parquet`` (more efficient) for later use.
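As a rough sketch of how such a reference file might be generated (this assumes ``kerchunk`` is installed; consult the kerchunk documentation for the authoritative API):

.. code:: python

    import json

    from kerchunk.hdf import SingleHdf5ToZarr

    # scan a local HDF5 file and save its chunk references as JSON
    refs = SingleHdf5ToZarr("saved_on_disk.h5").translate()
    with open("combined.json", "w") as f:
        json.dump(refs, f)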
Examples of these stored references can be found in the kerchunk documentation and the cookbooks linked below.

.. note::

    These references follow this `specification <https://fsspec.github.io/kerchunk/spec.html>`_. Packages like `kerchunk`_ and `virtualizarr <https://github.com/zarr-developers/VirtualiZarr>`_ help in creating and reading these references.

Reading these data archives becomes really easy with ``kerchunk`` in combination with ``xarray``, especially when these archives are large in size. A single combined reference can refer to thousands of the original data files present in these archives. You can view the whole dataset from this combined reference using the above packages.

The following example shows opening a single ``json`` reference to the ``saved_on_disk.h5`` file created above. If the file were instead stored remotely (e.g. ``s3://saved_on_disk.h5``) you can use ``storage_options`` that are used to configure `fsspec`_:

.. jupyter-execute::

    ds_kerchunked = xr.open_dataset(
        "./combined.json",
        engine="kerchunk",
        storage_options={},
    )

    ds_kerchunked

.. note::

    You can refer to the `project pythia kerchunk cookbook <https://projectpythia.org/kerchunk-cookbook/>`_ and the pangeo guide on kerchunk for more information.

.. _io.iris:

Iris
----

The Iris_ tool allows easy reading of common meteorological and climate model formats (including GRIB and UK MetOffice PP files) into ``Cube`` objects which are in many ways very similar to ``DataArray`` objects, while enforcing a CF-compliant data model.

DataArray ``to_iris`` and ``from_iris``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

If iris is installed, xarray can convert a ``DataArray`` into a ``Cube`` using :py:meth:`DataArray.to_iris`:

.. jupyter-execute::

    da = xr.DataArray(
        np.random.rand(4, 5),
        dims=["x", "y"],
        coords=dict(x=[10, 20, 30, 40], y=pd.date_range("2000-01-01", periods=5)),
    )

    cube = da.to_iris()
    print(cube)

Conversely, we can create a new ``DataArray`` object from a ``Cube`` using :py:meth:`DataArray.from_iris`:

.. jupyter-execute::

    da_cube = xr.DataArray.from_iris(cube)
    da_cube

Ncdata
~~~~~~

Ncdata_ provides more sophisticated means of transferring data, including entire datasets. It uses the file saving and loading functions in both projects to provide a more "correct" translation between them, but still with very low overhead and not using actual disk files.

Here we load an xarray dataset and convert it to Iris cubes:

.. jupyter-execute::
    :stderr:

    ds = xr.tutorial.open_dataset("air_temperature_gradient")
    cubes = ncdata.iris_xarray.cubes_from_xarray(ds)
    print(cubes)

.. jupyter-execute::

    print(cubes[1])

And we can convert the cubes back to an xarray dataset:

.. jupyter-execute::

    # ensure dataset-level and variable-level attributes loaded correctly
    iris.FUTURE.save_split_attrs = True

    ds = ncdata.iris_xarray.cubes_to_xarray(cubes)
    ds

Ncdata can also adjust file data within load and save operations, to fix data loading problems or provide exact save formatting without needing to modify files on disk. See, for example, the `ncdata usage examples`_.

.. _Iris: https://scitools.org.uk/iris
.. _Ncdata: https://ncdata.readthedocs.io/en/latest/index.html
.. _ncdata usage examples: https://github.com/pp-mo/ncdata/tree/v0.1.2?tab=readme-ov-file#correct-a-miscoded-attribute-in-iris-input

OPeNDAP
-------

Xarray includes support for `OPeNDAP`__ (via the netCDF4 library or Pydap), which lets us access large datasets over HTTP.

__ https://www.opendap.org/

For example, we can open a connection to GBs of weather data produced by the `PRISM`__ project, and hosted by `IRI`__ at Columbia:

__ https://www.prism.oregonstate.edu/
__ https://iri.columbia.edu/
jupyter-input:: remote_data = xr.open_dataset( "http://iridl.ldeo.columbia.edu/SOURCES/.OSU/.PRISM/.monthly/dods", decode_times=False, ) remote_data .. jupyter-output:: Dimensions: (T: 1422, X: 1405, Y: 621) Coordinates: * X (X) float32 -125.0 -124.958 -124.917 -124.875 -124.833 -124.792 -124.75 ... * T (T) float32 -779.5 -778.5 -777.5 -776.5 -775.5 -774.5 -773.5 -772.5 -771.5 ... * Y (Y) float32 49.9167 49.875 49.8333 49.7917 49.75 49.7083 49.6667 49.625 ... Data variables: ppt (T, Y, X) float64 ... tdmean (T, Y, X) float64 ... tmax (T, Y, X) float64 ... tmin (T, Y, X) float64 ... Attributes: Conventions: IRIDL expires: 1375315200 .. TODO: update this example to show off decode_cf? .. note:: Like many real-world datasets, this dataset does not entirely follow `CF conventions`_. Unexpected formats will usually cause xarray's automatic decoding to fail. The way to work around this is either to set ``decode_cf=False`` in ``open_dataset`` to turn off all use of CF conventions, or to disable only the troublesome parser. In this case, we set ``decode_times=False`` because the time axis here provides the calendar attribute in a format that xarray does not expect (the integer ``360`` instead of a string like ``'360_day'``). We can select and slice this data any number of times, and nothing is loaded over the network until we look at particular values: .. jupyter-input:: tmax = remote_data["tmax"][:500, ::3, ::3] tmax .. jupyter-output:: [48541500 values with dtype=float64] Coordinates: * Y (Y) float32 49.9167 49.7917 49.6667 49.5417 49.4167 49.2917 ... * X (X) float32 -125.0 -124.875 -124.75 -124.625 -124.5 -124.375 ... * T (T) float32 -779.5 -778.5 -777.5 -776.5 -775.5 -774.5 -773.5 ... Attributes: pointwidth: 120 standard_name: air_temperature units: Celsius_scale expires: 1443657600 .. jupyter-input:: # the data is downloaded automatically when we make the plot tmax[0].plot() .. image:: ../_static/opendap-prism-tmax.png Some servers require authentication before we can access the data. Pydap uses a `Requests`__ session object (which the user can pre-define), and this session object can recover `authentication`__ credentials from a locally stored ``.netrc`` file. For example, to connect to a server that requires NASA's URS authentication, with the username/password credentials stored in a locally accessible ``.netrc``, access to OPeNDAP data should be as simple as this:: import xarray as xr import requests my_session = requests.Session() ds_url = 'https://gpm1.gesdisc.eosdis.nasa.gov/opendap/hyrax/example.nc' ds = xr.open_dataset(ds_url, session=my_session, engine="pydap") Moreover, a bearer token header can be included in a ``requests`` session object, allowing for token-based authentication, which OPeNDAP servers can use to avoid some redirects. Lastly, OPeNDAP servers may provide endpoint URLs for different OPeNDAP protocols, DAP2 and DAP4. To specify which of the two protocols to use, replace the scheme of the URL with the name of the protocol. For example:: # dap2 url ds_url = 'dap2://gpm1.gesdisc.eosdis.nasa.gov/opendap/hyrax/example.nc' # dap4 url ds_url = 'dap4://gpm1.gesdisc.eosdis.nasa.gov/opendap/hyrax/example.nc' While most OPeNDAP servers implement DAP2, not all servers implement DAP4. It is recommended to check whether the URL you are using `supports DAP4`__ by opening it in a browser. __ https://docs.python-requests.org __ https://pydap.github.io/pydap/en/notebooks/Authentication.html __ https://pydap.github.io/pydap/en/faqs/dap2_or_dap4_url.html ..
_io.pickle: Pickle ------ The simplest way to serialize an xarray object is to use Python's built-in pickle module: .. jupyter-execute:: import pickle # use the highest protocol (-1) because it is much faster than the default # text-based pickle format pkl = pickle.dumps(ds, protocol=-1) pickle.loads(pkl) Pickling is important because it doesn't require any external libraries and lets you use xarray objects with Python modules like :py:mod:`multiprocessing` or :ref:`Dask `. However, pickling is **not recommended for long-term storage**. Restoring a pickle requires that the internal structure of the types for the pickled data remain unchanged. Because the internal design of xarray is still being refined, we make no guarantees (at this point) that objects pickled with this version of xarray will work in future versions. .. note:: When pickling an object opened from a NetCDF file, the pickle file will contain a reference to the file on disk. If you want to store the actual array values, load it into memory first with :py:meth:`Dataset.load` or :py:meth:`Dataset.compute`. .. _dictionary io: Dictionary ---------- We can convert a ``Dataset`` (or a ``DataArray``) to a dict using :py:meth:`Dataset.to_dict`: .. jupyter-execute:: ds = xr.Dataset({"foo": ("x", np.arange(30))}) d = ds.to_dict() d We can create a new xarray object from a dict using :py:meth:`Dataset.from_dict`: .. jupyter-execute:: ds_dict = xr.Dataset.from_dict(d) ds_dict Dictionary support allows for flexible use of xarray objects. It doesn't require external libraries and dicts can easily be pickled or converted to JSON or GeoJSON. All the values are converted to lists, so dicts might be quite large. To export just the dataset schema without the data itself, use the ``data=False`` option: .. jupyter-execute:: ds.to_dict(data=False) This can be useful for generating indices of dataset contents to expose to search indices or other automated data discovery tools. .. jupyter-execute:: :hide-code: import os # We're now done with the dataset named `ds`. Although the `with` statement closed # the dataset, displaying the unpickled pickle of `ds` re-opened "saved_on_disk.nc". # However, `ds` (rather than the unpickled dataset) refers to the open file. Delete # `ds` to close the file. del ds for f in ["saved_on_disk.nc", "saved_on_disk.h5"]: if os.path.exists(f): os.remove(f) .. _io.rasterio: Rasterio -------- GDAL-readable raster data, such as GeoTIFFs, can be opened using `rasterio`_ via the `rioxarray`_ extension. `rioxarray`_ can also handle geospatial-related tasks such as re-projecting and clipping. .. jupyter-input:: import rioxarray rds = rioxarray.open_rasterio("RGB.byte.tif") rds .. jupyter-output:: [1703814 values with dtype=uint8] Coordinates: * band (band) int64 1 2 3 * y (y) float64 2.827e+06 2.826e+06 ... 2.612e+06 2.612e+06 * x (x) float64 1.021e+05 1.024e+05 ... 3.389e+05 3.392e+05 spatial_ref int64 0 Attributes: STATISTICS_MAXIMUM: 255 STATISTICS_MEAN: 29.947726688477 STATISTICS_MINIMUM: 0 STATISTICS_STDDEV: 52.340921626611 transform: (300.0379266750948, 0.0, 101985.0, 0.0, -300.0417827... _FillValue: 0.0 scale_factor: 1.0 add_offset: 0.0 grid_mapping: spatial_ref .. jupyter-input:: rds.rio.crs # CRS.from_epsg(32618) rds4326 = rds.rio.reproject("epsg:4326") rds4326.rio.crs # CRS.from_epsg(4326) rds4326.rio.to_raster("RGB.byte.4326.tif") .. _rasterio: https://rasterio.readthedocs.io/en/latest/ .. _rioxarray: https://corteva.github.io/rioxarray/stable/ ..
_io.cfgrib: .. jupyter-execute:: :hide-code: import shutil shutil.rmtree("foo.zarr") shutil.rmtree("path/to/directory.zarr") GRIB format via cfgrib ---------------------- Xarray supports reading GRIB files via the ECMWF cfgrib_ Python driver, if it is installed. To open a GRIB file, supply ``engine='cfgrib'`` to :py:func:`open_dataset` after installing cfgrib_: .. jupyter-input:: ds_grib = xr.open_dataset("example.grib", engine="cfgrib") We recommend installing cfgrib via conda:: conda install -c conda-forge cfgrib .. _cfgrib: https://github.com/ecmwf/cfgrib CSV and other formats supported by pandas ----------------------------------------- For more options (tabular formats and CSV files in particular), consider exporting your objects to pandas and using its broad range of `IO tools`_. For CSV files, one might also consider `xarray_extras`_. .. _xarray_extras: https://xarray-extras.readthedocs.io/en/latest/api/csv.html .. _IO tools: https://pandas.pydata.org/pandas-docs/stable/user_guide/io.html Third party libraries --------------------- More formats are supported by extension libraries: - `xarray-mongodb `_: Store xarray objects on MongoDB .. currentmodule:: xarray .. _options: Configuration ============= Xarray offers a small number of configuration options through :py:func:`set_options`. With these, you can 1. Control the ``repr``: - ``display_expand_attrs`` - ``display_expand_coords`` - ``display_expand_data`` - ``display_expand_data_vars`` - ``display_max_rows`` - ``display_style`` 2. Control behaviour during operations: ``arithmetic_join``, ``keep_attrs``, ``use_bottleneck``. 3. Control colormaps for plots: ``cmap_divergent``, ``cmap_sequential``. 4. Control aspects of file reading: ``file_cache_maxsize``, ``warn_on_unclosed_files``. You can set these options either globally :: xr.set_options(arithmetic_join="exact") or locally as a context manager: :: with xr.set_options(arithmetic_join="exact"): # do operation here pass .. currentmodule:: xarray .. _pandas: =================== Working with pandas =================== One of the most important features of xarray is the ability to convert to and from :py:mod:`pandas` objects to interact with the rest of the PyData ecosystem. For example, for plotting labeled data, we highly recommend using the `visualization built in to pandas itself`__ or provided by pandas-aware libraries such as `Seaborn`__. __ https://pandas.pydata.org/pandas-docs/stable/visualization.html __ https://seaborn.pydata.org/ .. jupyter-execute:: :hide-code: import numpy as np import pandas as pd import xarray as xr np.random.seed(123456) Hierarchical and tidy data ~~~~~~~~~~~~~~~~~~~~~~~~~~ Tabular data is easiest to work with when it meets the criteria for `tidy data`__: * Each column holds a different variable. * Each row holds a different observation. __ https://www.jstatsoft.org/v59/i10/ In this "tidy data" format, we can represent any :py:class:`Dataset` and :py:class:`DataArray` in terms of :py:class:`~pandas.DataFrame` and :py:class:`~pandas.Series`, respectively (and vice-versa). The representation works by flattening non-coordinates to 1D, and turning the tensor product of coordinate indexes into a :py:class:`pandas.MultiIndex`. Dataset and DataFrame --------------------- To convert any dataset to a ``DataFrame`` in tidy form, use the :py:meth:`Dataset.to_dataframe()` method: ..
jupyter-execute:: ds = xr.Dataset( {"foo": (("x", "y"), np.random.randn(2, 3))}, coords={ "x": [10, 20], "y": ["a", "b", "c"], "along_x": ("x", np.random.randn(2)), "scalar": 123, }, ) ds .. jupyter-execute:: df = ds.to_dataframe() df We see that each variable and coordinate in the Dataset is now a column in the DataFrame, with the exception of indexes, which are in the index. To convert the ``DataFrame`` to any other convenient representation, use ``DataFrame`` methods like :py:meth:`~pandas.DataFrame.reset_index`, :py:meth:`~pandas.DataFrame.stack` and :py:meth:`~pandas.DataFrame.unstack`. For datasets containing dask arrays where the data should be lazily loaded, see the :py:meth:`Dataset.to_dask_dataframe()` method. To create a ``Dataset`` from a ``DataFrame``, use the :py:meth:`Dataset.from_dataframe` class method or the equivalent :py:meth:`pandas.DataFrame.to_xarray` method: .. jupyter-execute:: xr.Dataset.from_dataframe(df) Notice that the dimensions of variables in the ``Dataset`` have now expanded after the round-trip conversion to a ``DataFrame``. This is because every object in a ``DataFrame`` must have the same indices, so we need to broadcast the data of each array to the full size of the new ``MultiIndex``. Likewise, all the coordinates (other than indexes) ended up as variables, because pandas does not distinguish non-index coordinates. DataArray and Series -------------------- ``DataArray`` objects have a complementary representation in terms of a :py:class:`~pandas.Series`. Using a Series preserves the ``Dataset`` to ``DataArray`` relationship, because ``DataFrames`` are dict-like containers of ``Series``. The methods are very similar to those for working with DataFrames: .. jupyter-execute:: s = ds["foo"].to_series() s .. jupyter-execute:: # or equivalently, with Series.to_xarray() xr.DataArray.from_series(s) Both the ``from_series`` and ``from_dataframe`` methods use reindexing, so they work even if the hierarchical index is not a full tensor product: .. jupyter-execute:: s[::2] .. jupyter-execute:: s[::2].to_xarray() Lossless and reversible conversion ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ The previous ``Dataset`` example shows that the conversion is not reversible (lossy roundtrip) and that the size of the ``Dataset`` increases. In particular, the following deviations are noted after a roundtrip: - a non-dimension Dataset ``coordinate`` is converted into a ``variable`` - a non-dimension DataArray ``coordinate`` is not converted - ``dtype`` is not always the same (e.g. "str" is converted to "object") - ``attrs`` metadata is not conserved To avoid these problems, the third-party `ntv-pandas `__ library offers lossless and reversible conversions between ``Dataset``/ ``DataArray`` and pandas ``DataFrame`` objects. This solution is particularly interesting for converting any ``DataFrame`` into a ``Dataset`` (the converter finds the multidimensional structure hidden by the tabular structure). The `ntv-pandas examples `__ show how to improve the conversion for the previous ``Dataset`` example and for more complex examples. Multi-dimensional data ~~~~~~~~~~~~~~~~~~~~~~ Tidy data is great, but sometimes you want to preserve dimensions instead of automatically stacking them into a ``MultiIndex``. :py:meth:`DataArray.to_pandas()` is a shortcut that lets you convert a DataArray directly into a pandas object with the same dimensionality, if available in pandas (i.e., a 1D array is converted to a :py:class:`~pandas.Series` and 2D to :py:class:`~pandas.DataFrame`): ..
jupyter-execute:: arr = xr.DataArray( np.random.randn(2, 3), coords=[("x", [10, 20]), ("y", ["a", "b", "c"])] ) df = arr.to_pandas() df To perform the inverse operation, converting a pandas object into a data array with the same shape, simply use the :py:class:`DataArray` constructor: .. jupyter-execute:: xr.DataArray(df) Both the ``DataArray`` and ``Dataset`` constructors directly convert pandas objects into xarray objects with the same shape. This means that they preserve all use of multi-indexes: .. jupyter-execute:: index = pd.MultiIndex.from_arrays( [["a", "a", "b"], [0, 1, 2]], names=["one", "two"] ) df = pd.DataFrame({"x": 1, "y": 2}, index=index) ds = xr.Dataset(df) ds However, you will need to set dimension names explicitly, either with the ``dims`` argument in the ``DataArray`` constructor or by calling :py:meth:`~Dataset.rename` on the new object. .. _panel transition: Transitioning from pandas.Panel to xarray ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ``Panel``, pandas' data structure for 3D arrays, was always a second-class data structure compared to the Series and DataFrame. To allow pandas developers to focus more on its core functionality built around the DataFrame, pandas removed ``Panel`` in favor of directing users who use multi-dimensional arrays to xarray. Xarray has most of ``Panel``'s features, a more explicit API (particularly around indexing), and the ability to scale to >3 dimensions with the same interface. As discussed in the :ref:`data structures section of the docs `, there are two primary data structures in xarray: ``DataArray`` and ``Dataset``. You can imagine a ``DataArray`` as an n-dimensional pandas ``Series`` (i.e. a single typed array), and a ``Dataset`` as the ``DataFrame`` equivalent (i.e. a dict of aligned ``DataArray`` objects). So you can represent a Panel in two ways: - As a 3-dimensional ``DataArray``, - Or as a ``Dataset`` containing a number of 2-dimensional DataArray objects. Let's take a look: .. jupyter-execute:: data = np.random.default_rng(0).random((2, 3, 4)) items = list("ab") major_axis = list("mno") minor_axis = pd.date_range(start="2000", periods=4, name="date") With old versions of pandas (prior to 0.25), this could be stored in a ``Panel``: .. jupyter-input:: pd.Panel(data, items, major_axis, minor_axis) .. jupyter-output:: Dimensions: 2 (items) x 3 (major_axis) x 4 (minor_axis) Items axis: a to b Major_axis axis: m to o Minor_axis axis: 2000-01-01 00:00:00 to 2000-01-04 00:00:00 To put this data in a ``DataArray``, write: .. jupyter-execute:: array = xr.DataArray(data, [items, major_axis, minor_axis]) array As you can see, there are three dimensions (each is also a coordinate). Two of the axes were unnamed, so they have been assigned ``dim_0`` and ``dim_1`` respectively, while the third retains its name ``date``. You can also easily convert this data into a ``Dataset``: .. jupyter-execute:: array.to_dataset(dim="dim_0") Here, there are two data variables, each representing a DataFrame on panel's ``items`` axis, and labeled as such. Each variable is a 2D array of the respective values along the ``items`` dimension. While the xarray docs are relatively complete, a few items stand out for Panel users: - A DataArray's data is stored as a numpy array, and so can only contain a single type. As a result, a Panel that contains :py:class:`~pandas.DataFrame` objects with multiple types will be converted to ``dtype=object``. A ``Dataset`` of multiple ``DataArray`` objects each with its own dtype will allow original types to be preserved.
- :ref:`Indexing ` is similar to pandas, but more explicit and leverages xarray's naming of dimensions. - Because of those features, working with much higher-dimensional data is very practical. - Variables in ``Dataset`` objects can use a subset of the dataset's dimensions. For example, you can have one dataset with Person x Score x Time, and another with Person x Score. - Coordinates can be used for both dimensions and for variables which *label* the data variables, so you could have a coordinate ``Age`` that labels the ``Person`` dimension of a Dataset of Person x Score x Time. While xarray may take some getting used to, it's worth it! If anything is unclear, please `post an issue on GitHub `__ or `StackOverflow `__, and we'll endeavor to respond to the specific case or improve the general docs. .. currentmodule:: xarray .. _plotting: Plotting ======== Introduction ------------ Labeled data enables expressive computations. These same labels can also be used to easily create informative plots. Xarray's plotting capabilities are centered around :py:class:`DataArray` objects. To plot :py:class:`Dataset` objects, simply access the relevant DataArrays, i.e. ``dset['var1']``. Dataset-specific plotting routines are also available (see :ref:`plot-dataset`). Here we focus mostly on arrays 2d or larger. If your data fits nicely into a pandas DataFrame, then you're better off using one of the more developed tools there. Xarray plotting functionality is a thin wrapper around the popular `matplotlib `_ library. Matplotlib syntax and function names were copied as much as possible, which makes for an easy transition between the two. Matplotlib must be installed before xarray can plot. To use xarray's plotting capabilities with time coordinates containing ``cftime.datetime`` objects, `nc-time-axis `_ v1.3.0 or later needs to be installed. For more extensive plotting applications consider the following projects: - `Seaborn `_: "provides a high-level interface for drawing attractive statistical graphics." Integrates well with pandas. - `HoloViews `_ and `GeoViews `_: "Composable, declarative data structures for building even complex visualizations easily." Includes native support for xarray objects. - `hvplot `_: ``hvplot`` makes it very easy to produce dynamic plots (backed by ``Holoviews`` or ``Geoviews``) by adding a ``hvplot`` accessor to DataArrays. - `Cartopy `_: Provides cartographic tools. Imports ~~~~~~~ .. jupyter-execute:: :hide-code: # Use defaults so we don't get gridlines in generated docs import matplotlib as mpl mpl.rcdefaults() The following imports are necessary for all of the examples. .. jupyter-execute:: import cartopy.crs as ccrs import matplotlib.pyplot as plt import numpy as np import pandas as pd import xarray as xr For these examples we'll use the North American air temperature dataset. .. jupyter-execute:: airtemps = xr.tutorial.open_dataset("air_temperature") airtemps .. jupyter-execute:: # Convert to Celsius air = airtemps.air - 273.15 # copy attributes to get nice figure labels and change Kelvin to Celsius air.attrs = airtemps.air.attrs air.attrs["units"] = "deg C" .. note:: Until :issue:`1614` is solved, you might need to copy over the metadata in ``attrs`` to get informative figure labels (as was done above). DataArrays ---------- One Dimension ~~~~~~~~~~~~~ ================ Simple Example ================ The simplest way to make a plot is to call the :py:func:`DataArray.plot()` method. ..
jupyter-execute:: air1d = air.isel(lat=10, lon=10) air1d.plot(); Xarray uses the coordinate name along with metadata ``attrs.long_name``, ``attrs.standard_name``, ``DataArray.name`` and ``attrs.units`` (if available) to label the axes. The names ``long_name``, ``standard_name`` and ``units`` are copied from the `CF-conventions spec `_. When choosing names, the order of precedence is ``long_name``, ``standard_name`` and finally ``DataArray.name``. The y-axis label in the above plot was constructed from the ``long_name`` and ``units`` attributes of ``air1d``. .. jupyter-execute:: air1d.attrs ====================== Additional Arguments ====================== Additional arguments are passed directly to the matplotlib function which does the work. For example, :py:func:`xarray.plot.line` calls matplotlib.pyplot.plot_, passing in the index and the array values as x and y, respectively. So to make a line plot with blue triangles, a matplotlib format string can be used: .. _matplotlib.pyplot.plot: https://matplotlib.org/api/pyplot_api.html#matplotlib.pyplot.plot .. jupyter-execute:: air1d[:200].plot.line("b-^"); .. note:: Not all xarray plotting methods support passing positional arguments to the wrapped matplotlib functions, but they do all support keyword arguments. Keyword arguments work the same way, and are more explicit. .. jupyter-execute:: air1d[:200].plot.line(color="purple", marker="o"); ========================= Adding to Existing Axis ========================= To add the plot to an existing axis, pass the axis in as the keyword argument ``ax``. This works for all xarray plotting methods. In this example, ``axs`` is an array consisting of the left and right axes created by ``plt.subplots``. .. jupyter-execute:: fig, axs = plt.subplots(ncols=2) print(axs) air1d.plot(ax=axs[0]) air1d.plot.hist(ax=axs[1]); On the right is a histogram created by :py:func:`xarray.plot.hist`. .. _plotting.figsize: ============================= Controlling the figure size ============================= You can pass a ``figsize`` argument to all xarray's plotting methods to control the figure size. For convenience, xarray's plotting methods also support the ``aspect`` and ``size`` arguments which control the size of the resulting image via the formula ``figsize = (aspect * size, size)``: .. jupyter-execute:: air1d.plot(aspect=2, size=3); This feature also works with :ref:`plotting.faceting`. For facet plots, ``size`` and ``aspect`` refer to a single panel (so that ``aspect * size`` gives the width of each facet in inches), while ``figsize`` refers to the entire figure (as for matplotlib's ``figsize`` argument). .. note:: If ``figsize`` or ``size`` are used, a new figure is created, so this is mutually exclusive with the ``ax`` argument. .. note:: The convention used by xarray (``figsize = (aspect * size, size)``) is borrowed from seaborn: it is therefore `not equivalent to matplotlib's`_. .. _not equivalent to matplotlib's: https://github.com/mwaskom/seaborn/issues/746 .. _plotting.multiplelines: ========================= Determine x-axis values ========================= By default, dimension coordinates are used for the x-axis (here the time coordinates). However, you can also use non-dimension coordinates, MultiIndex levels, and dimensions without coordinates along the x-axis. To illustrate this, let's calculate a 'decimal day' (epoch) from the time and assign it as a non-dimension coordinate: ..
jupyter-execute:: decimal_day = (air1d.time - air1d.time[0]) / pd.Timedelta("1d") air1d_multi = air1d.assign_coords(decimal_day=("time", decimal_day.data)) air1d_multi To use ``'decimal_day'`` as the x coordinate, it must be explicitly specified: .. jupyter-execute:: air1d_multi.plot(x="decimal_day"); After creating a new MultiIndex named ``'date'`` from ``'time'`` and ``'decimal_day'``, it is also possible to use a MultiIndex level as the x-axis: .. jupyter-execute:: air1d_multi = air1d_multi.set_index(date=("time", "decimal_day")) air1d_multi.plot(x="decimal_day"); Finally, if a dataset does not have any coordinates, it enumerates all data points: .. jupyter-execute:: air1d_multi = air1d_multi.drop_vars(["date", "time", "decimal_day"]) air1d_multi.plot(); The same applies to 2D plots below. ==================================================== Multiple lines showing variation along a dimension ==================================================== It is possible to make line plots of two-dimensional data by calling :py:func:`xarray.plot.line` with appropriate arguments. Consider the 3D variable ``air`` defined above. We can use line plots to check the variation of air temperature at three different latitudes along a longitude line: .. jupyter-execute:: air.isel(lon=10, lat=[19, 21, 22]).plot.line(x="time"); It is required to explicitly specify either 1. ``x``: the dimension to be used for the x-axis, or 2. ``hue``: the dimension you want to represent by multiple lines. Thus, we could have made the previous plot by specifying ``hue='lat'`` instead of ``x='time'``. If required, the automatic legend can be turned off using ``add_legend=False``. Alternatively, ``hue`` can be passed directly to :py:func:`xarray.plot.line` as ``air.isel(lon=10, lat=[19,21,22]).plot.line(hue='lat')``. ======================== Dimension along y-axis ======================== It is also possible to make line plots such that the data are on the x-axis and a dimension is on the y-axis. This can be done by specifying the appropriate ``y`` keyword argument. .. jupyter-execute:: air.isel(time=10, lon=[10, 11]).plot(y="lat", hue="lon"); ============ Step plots ============ As an alternative, a step plot similar to matplotlib's ``plt.step`` can also be made using 1D data. .. jupyter-execute:: air1d[:20].plot.step(where="mid"); The argument ``where`` defines where the steps should be placed; options are ``'pre'`` (default), ``'post'``, and ``'mid'``. This is particularly handy when plotting data grouped with :py:meth:`Dataset.groupby_bins`. .. jupyter-execute:: air_grp = air.mean(["time", "lon"]).groupby_bins("lat", [0, 23.5, 66.5, 90]) air_mean = air_grp.mean() air_std = air_grp.std() air_mean.plot.step() (air_mean + air_std).plot.step(ls=":") (air_mean - air_std).plot.step(ls=":") plt.ylim(-20, 30) plt.title("Zonal mean temperature"); In this case, the actual boundaries of the bins are used and the ``where`` argument is ignored. Other axes kwargs ~~~~~~~~~~~~~~~~~ The keyword arguments ``xincrease`` and ``yincrease`` let you control the axes direction. .. jupyter-execute:: air.isel(time=10, lon=[10, 11]).plot.line( y="lat", hue="lon", xincrease=False, yincrease=False ); In addition, one can use ``xscale, yscale`` to set axes scaling; ``xticks, yticks`` to set axes ticks; and ``xlim, ylim`` to set axes limits. These accept the same values as the matplotlib methods ``ax.set_(x,y)scale()``, ``ax.set_(x,y)ticks()``, ``ax.set_(x,y)lim()``, respectively.
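For example, a minimal sketch (reusing ``air1d`` from above; the particular limits and ticks are arbitrary) that sets explicit y-axis limits and ticks through these keyword arguments:

.. jupyter-execute::

    # ylim/yticks are forwarded to ax.set_ylim()/ax.set_yticks()
    air1d[:200].plot(ylim=(-10, 30), yticks=range(-10, 31, 10));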
Two Dimensions ~~~~~~~~~~~~~~ ================ Simple Example ================ The method :py:meth:`DataArray.plot` calls :py:func:`xarray.plot.pcolormesh` by default when the data is two-dimensional. .. jupyter-execute:: air2d = air.isel(time=500) air2d.plot(); All 2d plots in xarray allow the use of the keyword arguments ``yincrease`` and ``xincrease``. .. jupyter-execute:: air2d.plot(yincrease=False); .. note:: We use :py:func:`xarray.plot.pcolormesh` as the default two-dimensional plot method because it is more flexible than :py:func:`xarray.plot.imshow`. However, for large arrays, ``imshow`` can be much faster than ``pcolormesh``. If speed is important to you and you are plotting a regular mesh, consider using ``imshow``. ================ Missing Values ================ Xarray plots data with :ref:`missing_values`. .. jupyter-execute:: bad_air2d = air2d.copy() bad_air2d[dict(lat=slice(0, 10), lon=slice(0, 25))] = np.nan bad_air2d.plot(); ======================== Nonuniform Coordinates ======================== It's not necessary for the coordinates to be evenly spaced. Both :py:func:`xarray.plot.pcolormesh` (default) and :py:func:`xarray.plot.contourf` can produce plots with nonuniform coordinates. .. jupyter-execute:: b = air2d.copy() # Apply a nonlinear transformation to one of the coords b.coords["lat"] = np.log(b.coords["lat"]) b.plot(); ==================== Other types of plot ==================== There are several other options for plotting 2D data. Contour plot using :py:meth:`DataArray.plot.contour()`: .. jupyter-execute:: air2d.plot.contour(); Filled contour plot using :py:meth:`DataArray.plot.contourf()`: .. jupyter-execute:: air2d.plot.contourf(); Surface plot using :py:meth:`DataArray.plot.surface()`: .. jupyter-execute:: # transpose just to make the example look a bit nicer air2d.T.plot.surface(); ==================== Calling Matplotlib ==================== Since this is a thin wrapper around matplotlib, all the functionality of matplotlib is available. .. jupyter-execute:: air2d.plot(cmap=plt.cm.Blues) plt.title("These colors prove North America\nhas fallen in the ocean") plt.ylabel("latitude") plt.xlabel("longitude"); .. note:: Xarray methods update label information and generally play around with the axes. So any kind of updates to the plot should be done *after* the call to xarray's plot function. In the example below, ``plt.xlabel`` effectively does nothing, since ``air2d.plot()`` updates the xlabel. .. jupyter-execute:: plt.xlabel("Never gonna see this.") air2d.plot(); =========== Colormaps =========== Xarray borrows logic from Seaborn to infer what kind of color map to use. For example, consider the original data in Kelvins rather than Celsius: .. jupyter-execute:: airtemps.air.isel(time=0).plot(); The Celsius data contain 0, so a diverging color map was used. The Kelvins do not have 0, so the default color map was used. .. _robust-plotting: ======== Robust ======== Outliers often have an extreme effect on the output of the plot. Here we add two bad data points. This affects the color scale, washing out the plot. .. jupyter-execute:: air_outliers = airtemps.air.isel(time=0).copy() air_outliers[0, 0] = 100 air_outliers[-1, -1] = 400 air_outliers.plot(); This plot shows that we have outliers. The easy way to visualize the data without the outliers is to pass the parameter ``robust=True``. This will use the 2nd and 98th percentiles of the data to compute the color limits. ..
jupyter-execute:: air_outliers.plot(robust=True); Observe that the ranges of the color bar have changed. The arrows on the color bar indicate that the colors include data points outside the bounds. ==================== Discrete Colormaps ==================== It is often useful, when visualizing 2d data, to use a discrete colormap, rather than the default continuous colormaps that matplotlib uses. The ``levels`` keyword argument can be used to generate plots with discrete colormaps. For example, to make a plot with 8 discrete color intervals: .. jupyter-execute:: air2d.plot(levels=8); It is also possible to use a list of levels to specify the boundaries of the discrete colormap: .. jupyter-execute:: air2d.plot(levels=[0, 12, 18, 30]); You can also specify a list of discrete colors through the ``colors`` argument: .. jupyter-execute:: flatui = ["#9b59b6", "#3498db", "#95a5a6", "#e74c3c", "#34495e", "#2ecc71"] air2d.plot(levels=[0, 12, 18, 30], colors=flatui); Finally, if you have `Seaborn `_ installed, you can also specify a seaborn color palette to the ``cmap`` argument. Note that ``levels`` *must* be specified with seaborn color palettes if using ``imshow`` or ``pcolormesh`` (but not with ``contour`` or ``contourf``, since levels are chosen automatically). .. jupyter-execute:: air2d.plot(levels=10, cmap="husl"); .. _plotting.faceting: Faceting ~~~~~~~~ Faceting here refers to splitting an array along one or two dimensions and plotting each group. Xarray's basic plotting is useful for plotting two dimensional arrays. What about three or four dimensional arrays? That's where facets become helpful. The general approach to plotting here is called “small multiples”, where the same kind of plot is repeated multiple times, and the specific use of small multiples to display the same relationship conditioned on one or more other variables is often called a “trellis plot”. Consider the temperature data set. There are 4 observations per day for two years, which makes for 2920 values along the time dimension. One way to visualize this data is to make a separate plot for each time period. The faceted dimension should not have too many values; faceting on the time dimension will produce 2920 plots. That's too many to be helpful. To handle this situation, try performing an operation that reduces the size of the data in some way. For example, we could compute the average air temperature for each month and reduce the size of this dimension from 2920 to 12. A simpler way is to just take a slice on that dimension. So let's use a slice to pick 6 times throughout the first year. .. jupyter-execute:: t = air.isel(time=slice(0, 365 * 4, 250)) t.coords ================ Simple Example ================ The easiest way to create faceted plots is to pass in ``row`` or ``col`` arguments to the xarray plotting methods/functions. This returns a :py:class:`xarray.plot.FacetGrid` object. .. jupyter-execute:: g_simple = t.plot(x="lon", y="lat", col="time", col_wrap=3); Faceting also works for line plots. .. jupyter-execute:: g_simple_line = t.isel(lat=slice(0, None, 4)).plot( x="lon", hue="lat", col="time", col_wrap=3 ); =============== 4 dimensional =============== For 4 dimensional arrays we can use the rows and columns of the grids. Here we create a 4 dimensional array by taking the original data and adding a fixed amount. Now we can see how the temperature maps would compare if one were much hotter. ..
jupyter-execute:: t2 = t.isel(time=slice(0, 2)) t4d = xr.concat([t2, t2 + 40], pd.Index(["normal", "hot"], name="fourth_dim")) # This is a 4d array t4d.coords t4d.plot(x="lon", y="lat", col="time", row="fourth_dim"); ================ Other features ================ Faceted plotting supports other arguments common to xarray 2d plots. .. jupyter-execute:: hasoutliers = t.isel(time=slice(0, 5)).copy() hasoutliers[0, 0, 0] = -100 hasoutliers[-1, -1, -1] = 400 g = hasoutliers.plot.pcolormesh( x="lon", y="lat", col="time", col_wrap=3, robust=True, cmap="viridis", cbar_kwargs={"label": "this has outliers"}, ) =================== FacetGrid Objects =================== The object returned, ``g`` in the above examples, is a :py:class:`~xarray.plot.FacetGrid` object that links a :py:class:`DataArray` to a matplotlib figure with a particular structure. This object can be used to control the behavior of the multiple plots. It borrows an API and code from `Seaborn's FacetGrid `_. The structure is contained within the ``axs`` and ``name_dicts`` attributes, both 2d NumPy object arrays. .. jupyter-execute:: g.axs .. jupyter-execute:: g.name_dicts It's possible to select the :py:class:`xarray.DataArray` or :py:class:`xarray.Dataset` corresponding to the FacetGrid through the ``name_dicts``. .. jupyter-execute:: g.data.loc[g.name_dicts[0, 0]] Here is an example of using the lower-level API and then modifying the axes after they have been plotted. .. jupyter-execute:: g = t.plot.imshow(x="lon", y="lat", col="time", col_wrap=3, robust=True) for i, ax in enumerate(g.axs.flat): ax.set_title("Air Temperature %d" % i) bottomright = g.axs[-1, -1] bottomright.annotate("bottom right", (240, 40)); :py:class:`~xarray.plot.FacetGrid` objects have methods that let you customize the automatically generated axis labels, axis ticks and plot titles. See :py:meth:`~xarray.plot.FacetGrid.set_titles`, :py:meth:`~xarray.plot.FacetGrid.set_xlabels`, :py:meth:`~xarray.plot.FacetGrid.set_ylabels` and :py:meth:`~xarray.plot.FacetGrid.set_ticks` for more information. Plotting functions can be applied to each subset of the data by calling :py:meth:`~xarray.plot.FacetGrid.map_dataarray` or to each subplot by calling :py:meth:`~xarray.plot.FacetGrid.map`. .. TODO: add an example of using the ``map`` method to plot dataset variables (e.g., with ``plt.quiver``). .. _plot-dataset: Datasets -------- Xarray has limited support for plotting Dataset variables against each other. Consider this dataset: .. jupyter-execute:: ds = xr.tutorial.scatter_example_dataset(seed=42) ds Scatter ~~~~~~~ Let's plot the ``A`` DataArray as a function of the ``y`` coord: .. jupyter-execute:: with xr.set_options(display_expand_data=False): display(ds.A) .. jupyter-execute:: ds.A.plot.scatter(x="y"); The same plot can be displayed using the dataset: .. jupyter-execute:: ds.plot.scatter(x="y", y="A"); Now suppose we want to scatter the ``A`` DataArray against the ``B`` DataArray: .. jupyter-execute:: ds.plot.scatter(x="A", y="B"); The ``hue`` kwarg lets you vary the color by variable value: .. jupyter-execute:: ds.plot.scatter(x="A", y="B", hue="w"); You can force a legend instead of a colorbar by setting ``add_legend=True, add_colorbar=False``. .. jupyter-execute:: ds.plot.scatter(x="A", y="B", hue="w", add_legend=True, add_colorbar=False); .. jupyter-execute:: ds.plot.scatter(x="A", y="B", hue="w", add_legend=False, add_colorbar=True); The ``markersize`` kwarg lets you vary the point's size by variable value.
You can additionally pass ``size_norm`` to control how the variable's values are mapped to point sizes. .. jupyter-execute:: ds.plot.scatter(x="A", y="B", hue="y", markersize="z"); The ``z`` kwarg lets you plot the data along the z-axis as well. .. jupyter-execute:: ds.plot.scatter(x="A", y="B", z="z", hue="y", markersize="x"); Faceting is also possible: .. jupyter-execute:: ds.plot.scatter(x="A", y="B", hue="y", markersize="x", row="x", col="w"); And adding the z-axis: .. jupyter-execute:: ds.plot.scatter(x="A", y="B", z="z", hue="y", markersize="x", row="x", col="w"); For more advanced scatter plots, we recommend converting the relevant data variables to a pandas DataFrame and using the extensive plotting capabilities of ``seaborn``. Quiver ~~~~~~ Visualizing vector fields is supported with quiver plots: .. jupyter-execute:: ds.isel(w=1, z=1).plot.quiver(x="x", y="y", u="A", v="B"); where ``u`` and ``v`` denote the x and y direction components of the arrow vectors. Again, faceting is also possible: .. jupyter-execute:: ds.plot.quiver(x="x", y="y", u="A", v="B", col="w", row="z", scale=4); ``scale`` is required for faceted quiver plots. The scale determines the number of data units per arrow length unit, i.e. a smaller scale parameter makes the arrow longer. Streamplot ~~~~~~~~~~ Visualizing vector fields is also supported with streamline plots: .. jupyter-execute:: ds.isel(w=1, z=1).plot.streamplot(x="x", y="y", u="A", v="B"); where ``u`` and ``v`` denote the x and y direction components of the vectors tangent to the streamlines. Again, faceting is also possible: .. jupyter-execute:: ds.plot.streamplot(x="x", y="y", u="A", v="B", col="w", row="z"); .. _plot-maps: Maps ---- To follow this section, you'll need to have Cartopy installed and working. This script will plot the air temperature on a map. .. jupyter-execute:: :stderr: air = xr.tutorial.open_dataset("air_temperature").air p = air.isel(time=0).plot( subplot_kws=dict(projection=ccrs.Orthographic(-80, 35), facecolor="gray"), transform=ccrs.PlateCarree(), ) p.axes.set_global() p.axes.coastlines(); When faceting on maps, the projection can be transferred to the ``plot`` function using the ``subplot_kws`` keyword. The axes for the subplots created by faceting are accessible in the object returned by ``plot``: .. jupyter-execute:: p = air.isel(time=[0, 4]).plot( transform=ccrs.PlateCarree(), col="time", subplot_kws={"projection": ccrs.Orthographic(-80, 35)}, ) for ax in p.axs.flat: ax.coastlines() ax.gridlines() Details ------- Ways to Use ~~~~~~~~~~~ There are three ways to use the xarray plotting functionality: 1. Use ``plot`` as a convenience method for a DataArray. 2. Access a specific plotting method from the ``plot`` attribute of a DataArray. 3. Use the xarray plot submodule directly. These are provided for user convenience; they all call the same code. .. jupyter-execute:: da = xr.DataArray(range(5)) fig, axs = plt.subplots(ncols=2, nrows=2) da.plot(ax=axs[0, 0]) da.plot.line(ax=axs[0, 1]) xr.plot.plot(da, ax=axs[1, 0]) xr.plot.line(da, ax=axs[1, 1]); Here the output is the same. Since the data is 1-dimensional, the line plot was used. The convenience method :py:meth:`xarray.DataArray.plot` dispatches to an appropriate plotting function based on the dimensions of the ``DataArray`` and whether the coordinates are sorted and uniformly spaced.
This table describes what gets plotted:

=============== ===========================
Dimensions      Plotting function
--------------- ---------------------------
1               :py:func:`xarray.plot.line`
2               :py:func:`xarray.plot.pcolormesh`
Anything else   :py:func:`xarray.plot.hist`
=============== ===========================

Coordinates ~~~~~~~~~~~ If you'd like to find out what's really going on in the coordinate system, read on. .. jupyter-execute:: a0 = xr.DataArray(np.zeros((4, 3, 2)), dims=("y", "x", "z"), name="temperature") a0[0, 0, 0] = 1 a = a0.isel(z=0) a The plot will produce an image corresponding to the values of the array. Hence the top left pixel will be a different color than the others. Before reading on, you may want to look at the coordinates and think carefully about what the limits, labels, and orientation for each of the axes should be. .. jupyter-execute:: a.plot(); It may seem strange that the values on the y axis are decreasing with -0.5 on the top. This is because the pixels are centered over their coordinates, and the axis labels and ranges correspond to the values of the coordinates. Multidimensional coordinates ~~~~~~~~~~~~~~~~~~~~~~~~~~~~ See also: :ref:`/examples/multidimensional-coords.ipynb`. You can plot irregular grids defined by multidimensional coordinates with xarray, but you'll have to tell the plot function to use these coordinates instead of the default ones: .. jupyter-execute:: lon, lat = np.meshgrid(np.linspace(-20, 20, 5), np.linspace(0, 30, 4)) lon += lat / 10 lat += lon / 10 da = xr.DataArray( np.arange(20).reshape(4, 5), dims=["y", "x"], coords={"lat": (("y", "x"), lat), "lon": (("y", "x"), lon)}, ) da.plot.pcolormesh(x="lon", y="lat"); Note that in this case, xarray still follows the pixel-centered convention. This might be undesirable in some cases, for example when your data is defined on a polar projection (:issue:`781`). This is why the default is to not follow this convention when plotting on a map: .. jupyter-execute:: :stderr: ax = plt.subplot(projection=ccrs.PlateCarree()) da.plot.pcolormesh(x="lon", y="lat", ax=ax) ax.scatter(lon, lat, transform=ccrs.PlateCarree()) ax.coastlines() ax.gridlines(draw_labels=True); You can, however, decide to infer the cell boundaries and use the ``infer_intervals`` keyword: .. jupyter-execute:: ax = plt.subplot(projection=ccrs.PlateCarree()) da.plot.pcolormesh(x="lon", y="lat", ax=ax, infer_intervals=True) ax.scatter(lon, lat, transform=ccrs.PlateCarree()) ax.coastlines() ax.gridlines(draw_labels=True); .. note:: The data model of xarray does not support datasets with `cell boundaries`_ yet. If you want to use these coordinates, you'll have to make the plots outside the xarray framework. .. _cell boundaries: https://cfconventions.org/cf-conventions/v1.6.0/cf-conventions.html#cell-boundaries One can also make line plots with multidimensional coordinates. In this case, ``hue`` must be a dimension name, not a coordinate name. .. jupyter-execute:: f, ax = plt.subplots(2, 1) da.plot.line(x="lon", hue="y", ax=ax[0]) da.plot.line(x="lon", hue="x", ax=ax[1]); .. _reshape: ############################### Reshaping and reorganizing data ############################### Reshaping and reorganizing data refers to the process of changing the structure or organization of data by modifying dimensions, array shapes, order of values, or indexes. Xarray provides several methods to accomplish these tasks.
These methods are particularly useful for reshaping xarray objects for use in machine learning packages, such as scikit-learn, that usually require two-dimensional numpy arrays as inputs. Reshaping can also be required before passing data to external visualization tools; for example, geospatial tools might expect input organized into a particular format corresponding to stacks of satellite images. Importing the library --------------------- .. jupyter-execute:: :hide-code: import numpy as np import pandas as pd import xarray as xr np.random.seed(123456) # Use defaults so we don't get gridlines in generated docs import matplotlib as mpl mpl.rcdefaults() Reordering dimensions --------------------- To reorder dimensions on a :py:class:`~xarray.DataArray` or across all variables on a :py:class:`~xarray.Dataset`, use :py:meth:`~xarray.DataArray.transpose`. An ellipsis (`...`) can be used to represent all other dimensions: .. jupyter-execute:: ds = xr.Dataset({"foo": (("x", "y", "z"), [[[42]]]), "bar": (("y", "z"), [[24]])}) ds.transpose("y", "z", "x") # equivalent to ds.transpose(..., "x") .. jupyter-execute:: ds.transpose() # reverses all dimensions Expand and squeeze dimensions ----------------------------- To expand a :py:class:`~xarray.DataArray` or all variables on a :py:class:`~xarray.Dataset` along a new dimension, use :py:meth:`~xarray.DataArray.expand_dims` .. jupyter-execute:: expanded = ds.expand_dims("w") expanded This method attaches a new dimension with size 1 to all data variables. To remove such a size-1 dimension from the :py:class:`~xarray.DataArray` or :py:class:`~xarray.Dataset`, use :py:meth:`~xarray.DataArray.squeeze` .. jupyter-execute:: expanded.squeeze("w") Converting between datasets and arrays -------------------------------------- To convert from a Dataset to a DataArray, use :py:meth:`~xarray.Dataset.to_dataarray`: .. jupyter-execute:: arr = ds.to_dataarray() arr This method broadcasts all data variables in the dataset against each other, then concatenates them along a new dimension into a new array while preserving coordinates. To convert back from a DataArray to a Dataset, use :py:meth:`~xarray.DataArray.to_dataset`: .. jupyter-execute:: arr.to_dataset(dim="variable") The broadcasting behavior of ``to_dataarray`` means that the resulting array includes the union of data variable dimensions: .. jupyter-execute:: ds2 = xr.Dataset({"a": 0, "b": ("x", [3, 4, 5])}) # the input dataset has 4 elements ds2 .. jupyter-execute:: # the resulting array has 6 elements ds2.to_dataarray() Otherwise, the result could not be represented as an orthogonal array. If you use ``to_dataset`` without supplying the ``dim`` argument, the DataArray will be converted into a Dataset of one variable: .. jupyter-execute:: arr.to_dataset(name="combined") .. _reshape.stack: Stack and unstack ----------------- As part of xarray's nascent support for :py:class:`pandas.MultiIndex`, we have implemented the :py:meth:`~xarray.DataArray.stack` and :py:meth:`~xarray.DataArray.unstack` methods, for combining or splitting dimensions: .. jupyter-execute:: array = xr.DataArray( np.random.randn(2, 3), coords=[("x", ["a", "b"]), ("y", [0, 1, 2])] ) stacked = array.stack(z=("x", "y")) stacked .. jupyter-execute:: stacked.unstack("z") As elsewhere in xarray, an ellipsis (`...`) can be used to represent all unlisted dimensions: ..
jupyter-execute:: stacked = array.stack(z=[..., "x"]) stacked These methods are modeled on the :py:class:`pandas.DataFrame` methods of the same name, although in xarray they always create new dimensions rather than adding to the existing index or columns. Like :py:meth:`DataFrame.unstack`, xarray's ``unstack`` always succeeds, even if the multi-index being unstacked does not contain all possible levels. Missing levels are filled in with ``NaN`` in the resulting object: .. jupyter-execute:: stacked2 = stacked[::2] stacked2 .. jupyter-execute:: stacked2.unstack("z") However, xarray's ``stack`` has an important difference from pandas: unlike pandas, it does not automatically drop missing values. Compare: .. jupyter-execute:: array = xr.DataArray([[np.nan, 1], [2, 3]], dims=["x", "y"]) array.stack(z=("x", "y")) .. jupyter-execute:: array.to_pandas().stack() We departed from pandas's behavior here because predictable shapes for new array dimensions are necessary for :ref:`dask`. .. _reshape.stacking_different: Stacking different variables together ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ These stacking and unstacking operations are particularly useful for reshaping xarray objects for use in machine learning packages, such as `scikit-learn `_, that usually require two-dimensional numpy arrays as inputs. For datasets with only one variable, we only need ``stack`` and ``unstack``, but combining multiple variables in a :py:class:`xarray.Dataset` is more complicated. If the variables in the dataset have matching numbers of dimensions, we can call :py:meth:`~xarray.Dataset.to_dataarray` and then stack along the new coordinate. But :py:meth:`~xarray.Dataset.to_dataarray` will broadcast the dataarrays together, which will effectively tile the lower dimensional variable along the missing dimensions. The method :py:meth:`xarray.Dataset.to_stacked_array` allows combining variables of differing dimensions without this wasteful copying, while :py:meth:`xarray.DataArray.to_unstacked_dataset` reverses this operation. Just as with :py:meth:`xarray.Dataset.stack`, the stacked coordinate is represented by a :py:class:`pandas.MultiIndex` object. These methods are used like this: .. jupyter-execute:: data = xr.Dataset( data_vars={"a": (("x", "y"), [[0, 1, 2], [3, 4, 5]]), "b": ("x", [6, 7])}, coords={"y": ["u", "v", "w"]}, ) data .. jupyter-execute:: stacked = data.to_stacked_array("z", sample_dims=["x"]) stacked .. jupyter-execute:: unstacked = stacked.to_unstacked_dataset("z") unstacked In this example, ``stacked`` is a two dimensional array that we can easily pass to scikit-learn or another generic numerical method. .. note:: Unlike with ``stack``, in ``to_stacked_array``, the user specifies the dimensions they **do not** want stacked. For a machine learning task, these unstacked dimensions can be interpreted as the dimensions over which samples are drawn, whereas the stacked coordinates are the features. Naturally, all variables should possess these sampling dimensions. .. _reshape.set_index: Set and reset index ------------------- Complementary to stack / unstack, xarray's ``.set_index``, ``.reset_index`` and ``.reorder_levels`` allow easy manipulation of ``DataArray`` or ``Dataset`` multi-indexes without modifying the data and its dimensions. You can create a multi-index from several 1-dimensional variables and/or coordinates using :py:meth:`~xarray.DataArray.set_index`: ..
jupyter-execute:: da = xr.DataArray( np.random.rand(4), coords={ "band": ("x", ["a", "a", "b", "b"]), "wavenumber": ("x", np.linspace(200, 400, 4)), }, dims="x", ) da .. jupyter-execute:: mda = da.set_index(x=["band", "wavenumber"]) mda These coordinates can now be used for indexing, e.g., .. jupyter-execute:: mda.sel(band="a") Conversely, you can use :py:meth:`~xarray.DataArray.reset_index` to extract multi-index levels as coordinates (this is mainly useful for serialization): .. jupyter-execute:: mda.reset_index("x") :py:meth:`~xarray.DataArray.reorder_levels` allows changing the order of multi-index levels: .. jupyter-execute:: mda.reorder_levels(x=["wavenumber", "band"]) As of xarray v0.9, coordinate labels for each dimension are optional. You can also use ``.set_index`` / ``.reset_index`` to add / remove labels for one or several dimensions: .. jupyter-execute:: array = xr.DataArray([1, 2, 3], dims="x") array .. jupyter-execute:: array["c"] = ("x", ["a", "b", "c"]) array.set_index(x="c") .. jupyter-execute:: array = array.set_index(x="c") array = array.reset_index("x", drop=True) .. _reshape.shift_and_roll: Shift and roll -------------- To adjust coordinate labels, you can use the :py:meth:`~xarray.Dataset.shift` and :py:meth:`~xarray.Dataset.roll` methods: .. jupyter-execute:: array = xr.DataArray([1, 2, 3, 4], dims="x") array.shift(x=2) .. jupyter-execute:: array.roll(x=2, roll_coords=True) .. _reshape.sort: Sort ---- One may sort a DataArray/Dataset via :py:meth:`~xarray.DataArray.sortby` and :py:meth:`~xarray.Dataset.sortby`. The input can be an individual 1D ``DataArray`` or a list of them: .. jupyter-execute:: ds = xr.Dataset( { "A": (("x", "y"), [[1, 2], [3, 4]]), "B": (("x", "y"), [[5, 6], [7, 8]]), }, coords={"x": ["b", "a"], "y": [1, 0]}, ) dax = xr.DataArray([100, 99], [("x", [0, 1])]) day = xr.DataArray([90, 80], [("y", [0, 1])]) ds.sortby([day, dax]) As a shortcut, you can refer to existing coordinates by name: .. jupyter-execute:: ds.sortby("x") .. jupyter-execute:: ds.sortby(["y", "x"]) .. jupyter-execute:: ds.sortby(["y", "x"], ascending=False) .. _reshape.coarsen: Reshaping via coarsen --------------------- Whilst :py:class:`~xarray.DataArray.coarsen` is normally used for reducing your data's resolution by applying a reduction function (see the :ref:`page on computation`), it can also be used to reorganise your data without applying a computation via :py:meth:`~xarray.computation.rolling.DataArrayCoarsen.construct`. Taking our example tutorial air temperature dataset over the Northern US, .. jupyter-execute:: air = xr.tutorial.open_dataset("air_temperature")["air"] air.isel(time=0).plot(x="lon", y="lat"); we can split this up into sub-regions of size ``(9, 18)`` points using :py:meth:`~xarray.computation.rolling.DataArrayCoarsen.construct`: .. jupyter-execute:: regions = air.coarsen(lat=9, lon=18, boundary="pad").construct( lon=("x_coarse", "x_fine"), lat=("y_coarse", "y_fine") ) with xr.set_options(display_expand_data=False): regions 9 new regions have been created, each of size 9 by 18 points. The ``boundary="pad"`` kwarg ensured that all regions are the same size, even though the data does not evenly divide into these sizes. By plotting these 9 regions together via :ref:`faceting`, we can see how they relate to the original data. .. jupyter-execute:: regions.isel(time=0).plot( x="x_fine", y="y_fine", col="x_coarse", row="y_coarse", yincrease=False ); We are now free to easily apply any custom computation to each coarsened region of our new dataarray.
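For example, a minimal sketch that averages over each region, reducing over the fine dimensions while keeping the coarse ones:

.. jupyter-execute::

    # one mean value per coarsened region (and per time step)
    region_means = regions.mean(["y_fine", "x_fine"])
    region_means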
As in the sketch above, this involves specifying that applied functions should act over the ``"x_fine"`` and ``"y_fine"`` dimensions, but broadcast over the ``"x_coarse"`` and ``"y_coarse"`` dimensions. .. currentmodule:: xarray .. _terminology: Terminology =========== *Xarray terminology differs slightly from CF, mathematical conventions, and pandas; so we've put together a glossary of its terms. Here,* ``arr`` *refers to an xarray* :py:class:`DataArray` *in the examples. For more complete examples, please consult the relevant documentation.* .. jupyter-execute:: :hide-code: import numpy as np import xarray as xr .. glossary:: DataArray A multi-dimensional array with labeled or named dimensions. ``DataArray`` objects add metadata such as dimension names, coordinates, and attributes (defined below) to underlying "unlabeled" data structures such as numpy and Dask arrays. If its optional ``name`` property is set, it is a *named DataArray*. Dataset A dict-like collection of ``DataArray`` objects with aligned dimensions. Thus, most operations that can be performed on the dimensions of a single ``DataArray`` can be performed on a dataset. Datasets have data variables (see **Variable** below), dimensions, coordinates, and attributes. Variable A `NetCDF-like variable `_ consisting of dimensions, data, and attributes which describe a single array. The main functional difference between variables and numpy arrays is that numerical operations on variables implement array broadcasting by dimension name. Each ``DataArray`` has an underlying variable that can be accessed via ``arr.variable``. However, a variable is not fully described outside of either a ``Dataset`` or a ``DataArray``. .. note:: The :py:class:`Variable` class is a low-level interface and can typically be ignored. However, the word "variable" appears often enough in the code and documentation that it is useful to understand. Dimension In mathematics, the *dimension* of data is loosely the number of degrees of freedom for it. A *dimension axis* is a set of all points in which all but one of these degrees of freedom is fixed. We can think of each dimension axis as having a name, for example the "x dimension". In xarray, a ``DataArray`` object's *dimensions* are its named dimension axes ``da.dims``, and the name of the ``i``-th dimension is ``da.dims[i]``. If an array is created without specifying dimension names, the default dimension names will be ``dim_0``, ``dim_1``, and so forth. Coordinate An array that labels a dimension or set of dimensions of another ``DataArray``. In the usual one-dimensional case, the coordinate array's values can loosely be thought of as tick labels along a dimension. We distinguish :term:`Dimension coordinate` vs. :term:`Non-dimension coordinate` and :term:`Indexed coordinate` vs. :term:`Non-indexed coordinate`. A coordinate named ``x`` can be retrieved from ``arr.coords["x"]``. A ``DataArray`` can have more coordinates than dimensions because a single dimension can be labeled by multiple coordinate arrays. However, only one coordinate array can be assigned as a particular dimension's dimension coordinate array. Dimension coordinate A one-dimensional coordinate array assigned to ``arr`` with both a name and dimension name in ``arr.dims``. Usually (but not always), a dimension coordinate is also an :term:`Indexed coordinate` so that it can be used for label-based indexing and alignment, like the index found on a :py:class:`pandas.DataFrame` or :py:class:`pandas.Series`.
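For example, a minimal sketch showing that a dimension coordinate supports label-based selection:

.. jupyter-execute::

    arr = xr.DataArray([1, 2, 3], dims="x", coords={"x": [10, 20, 30]})
    # "x" is a dimension coordinate, so we can select by label
    arr.sel(x=20)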
    Non-dimension coordinate
        A coordinate array assigned to ``arr`` with a name in ``arr.coords`` but *not* in ``arr.dims``. These coordinate arrays can be one-dimensional or multidimensional, and they are useful for auxiliary labeling. As an example, multidimensional coordinates are often used in geoscience datasets when :doc:`the data's physical coordinates (such as latitude and longitude) differ from their logical coordinates <../examples/multidimensional-coords>`. Printing ``arr.coords`` will print all of ``arr``'s coordinate names, with the corresponding dimension(s) in parentheses. For example, ``coord_name (dim_name) 1 2 3 ...``.

    Indexed coordinate
        A coordinate which has an associated :term:`Index`. Generally this means that the coordinate labels can be used for indexing (selection) and/or alignment. An indexed coordinate may have one or more arbitrary dimensions, although in most cases it is also a :term:`Dimension coordinate`. It may or may not be grouped with other indexed coordinates depending on whether they share the same index. Indexed coordinates are marked by an asterisk ``*`` when printing a ``DataArray`` or ``Dataset``.

    Non-indexed coordinate
        A coordinate which has no associated :term:`Index`. It may still represent fixed labels along one or more dimensions but it cannot be used for label-based indexing and alignment.

    Index
        An *index* is a data structure optimized for efficient data selection and alignment within a discrete or continuous space that is defined by coordinate labels (unless it is a functional index). By default, Xarray creates a :py:class:`~xarray.indexes.PandasIndex` object (i.e., a :py:class:`pandas.Index` wrapper) for each :term:`Dimension coordinate`. For more advanced use cases (e.g., staggered or irregular grids, geospatial indexes), Xarray also accepts any instance of a specialized :py:class:`~xarray.indexes.Index` subclass that is associated to one or more arbitrary coordinates. The index associated with the coordinate ``x`` can be retrieved by ``arr.xindexes["x"]`` (or ``arr.indexes["x"]`` if the index is convertible to a :py:class:`pandas.Index` object). If two coordinates ``x`` and ``y`` share the same index, ``arr.xindexes["x"]`` and ``arr.xindexes["y"]`` both return the same :py:class:`~xarray.indexes.Index` object.

    name
        The names of dimensions, coordinates, DataArray objects and data variables can be anything as long as they are :term:`hashable`. However, it is preferred to use :py:class:`str` typed names.

    scalar
        By definition, a scalar is not an :term:`array` and when converted to one, it has 0 dimensions. That means that, e.g., :py:class:`int`, :py:class:`float`, and :py:class:`str` objects are "scalar" while :py:class:`list` or :py:class:`tuple` are not.

    duck array
        `Duck arrays`__ are array implementations that behave like numpy arrays. They have to define the ``shape``, ``dtype`` and ``ndim`` properties. For integration with ``xarray``, the ``__array__``, ``__array_ufunc__`` and ``__array_function__`` protocols are also required.

        __ https://numpy.org/neps/nep-0022-ndarray-duck-typing-overview.html

    Aligning
        Aligning refers to the process of ensuring that two or more DataArrays or Datasets have the same dimensions and coordinates, so that they can be combined or compared properly.
        .. jupyter-execute::

            x = xr.DataArray(
                [[25, 35], [10, 24]],
                dims=("lat", "lon"),
                coords={"lat": [35.0, 40.0], "lon": [100.0, 120.0]},
            )
            y = xr.DataArray(
                [[20, 5], [7, 13]],
                dims=("lat", "lon"),
                coords={"lat": [35.0, 42.0], "lon": [100.0, 120.0]},
            )
            a, b = xr.align(x, y)

            # By default, an "inner join" is performed
            # so "a" is a copy of "x" where coordinates match "y"
            a

    Broadcasting
        A technique that allows operations to be performed on arrays with different shapes and dimensions. When performing operations on arrays with different shapes and dimensions, xarray will automatically attempt to broadcast the arrays to a common shape before the operation is applied.

        .. jupyter-execute::

            # 'a' has shape (3,) and 'b' has shape (4,)
            a = xr.DataArray(np.array([1, 2, 3]), dims=["x"])
            b = xr.DataArray(np.array([4, 5, 6, 7]), dims=["y"])

            # their sum is a 2D array with shape (3, 4)
            a + b

    Merging
        Merging is used to combine two or more Datasets or DataArrays that have different variables or coordinates along the same dimensions. When merging, xarray aligns the variables and coordinates of the different datasets along the specified dimensions and creates a new ``Dataset`` containing all the variables and coordinates.

        .. jupyter-execute::

            # create two 1D arrays with names
            arr1 = xr.DataArray(
                [1, 2, 3], dims=["x"], coords={"x": [10, 20, 30]}, name="arr1"
            )
            arr2 = xr.DataArray(
                [4, 5, 6], dims=["x"], coords={"x": [20, 30, 40]}, name="arr2"
            )

            # merge the two arrays into a new dataset
            merged_ds = xr.Dataset({"arr1": arr1, "arr2": arr2})
            merged_ds

    Concatenating
        Concatenating is used to combine two or more Datasets or DataArrays along a dimension. When concatenating, xarray arranges the datasets or dataarrays along a new dimension, and the resulting ``Dataset`` or ``DataArray`` will have the same variables and coordinates along the other dimensions.

        .. jupyter-execute::

            a = xr.DataArray([[1, 2], [3, 4]], dims=("x", "y"))
            b = xr.DataArray([[5, 6], [7, 8]], dims=("x", "y"))
            c = xr.concat([a, b], dim="c")
            c

    Combining
        Combining is the process of arranging two or more DataArrays or Datasets into a single ``DataArray`` or ``Dataset`` using some combination of merging and concatenation operations.

        .. jupyter-execute::

            ds1 = xr.Dataset(
                {"data": xr.DataArray([[1, 2], [3, 4]], dims=("x", "y"))},
                coords={"x": [1, 2], "y": [3, 4]},
            )
            ds2 = xr.Dataset(
                {"data": xr.DataArray([[5, 6], [7, 8]], dims=("x", "y"))},
                coords={"x": [2, 3], "y": [4, 5]},
            )

            # combine the datasets
            combined_ds = xr.combine_by_coords([ds1, ds2])
            combined_ds

    lazy
        Lazily-evaluated operations do not load data into memory until necessary. Instead of doing calculations right away, xarray lets you plan what calculations you want to do, like finding the average temperature in a dataset. This planning is called "lazy evaluation." Later, when you're ready to see the final result, you tell xarray, "Okay, go ahead and do those calculations now!" That's when xarray starts working through the steps you planned and gives you the answer you wanted. This lazy approach helps save time and memory because xarray only does the work when you actually need the results.

    labeled
        Labeled data has metadata describing the context of the data, not just the raw data values. This contextual information can be labels for array axes (i.e. dimension names), tick labels along axes (stored as coordinate variables), or unique names for each array. These labels provide context and meaning to the data, making it easier to understand and work with. For example, if you have temperature data for different cities over time, you can use xarray to label the dimensions: one for cities and another for time.
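        A minimal sketch (the city names and values here are purely illustrative):

        .. jupyter-execute::

            temps = xr.DataArray(
                [[20.1, 21.3], [15.2, 16.0]],
                dims=("city", "time"),
                coords={"city": ["NYC", "LA"], "time": ["2000-07-01", "2000-07-02"]},
                name="temperature",
            )
            temps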
    serialization
        Serialization is the process of converting your data into a format that makes it easy to save and share. When you serialize data in xarray, you're taking all those temperature measurements, along with their labels and other information, and turning them into a format that can be stored in a file or sent over the internet. Xarray objects can be serialized into formats which store the labels alongside the data. Some supported serialization formats are files that can then be stored or transferred (e.g. netCDF), whilst others are protocols that allow for data access over a network (e.g. Zarr).

    indexing
        :ref:`Indexing` is how you select subsets of your data which you are interested in.

        - Label-based Indexing: Selecting data by passing a specific label and comparing it to the labels stored in the associated coordinates. You can use labels to specify what you want, like "Give me the temperature for New York on July 15th."
        - Positional Indexing: You can use numbers to refer to positions in the data, like "Give me the third temperature value." This is useful when you know the order of your data but don't need to remember the exact labels.
        - Slicing: You can take a "slice" of your data, like you might want all temperatures from July 1st to July 10th. Xarray supports slicing for both positional and label-based indexing.

    DataTree
        A tree-like collection of ``Dataset`` objects. A *tree* is made up of one or more *nodes*, each of which can store the same information as a single ``Dataset`` (accessed via ``.dataset``). This data is stored in the same way as in a ``Dataset``, i.e. in the form of data :term:`variables`, :term:`dimensions`, :term:`coordinates`, and attributes. The nodes in a tree are linked to one another, and each node is itself an instance of ``DataTree``. Each node can have zero or more *children* (stored in a dictionary-like manner under their corresponding *names*), and those child nodes can themselves have children. If a node is a child of another node, that other node is said to be its *parent*. Nodes can have a maximum of one parent, and if a node has no parent it is said to be the *root* node of that *tree*.

    Subtree
        A section of a *tree*, consisting of a *node* along with all the child nodes below it (and the child nodes below them, i.e. all so-called *descendant* nodes). Excludes the parent node and all nodes above.

    Group
        Another word for a subtree, reflecting how the hierarchical structure of a ``DataTree`` allows for grouping related data together. Analogous to a single netCDF group or Zarr group.

.. _testing:

Testing your code
=================

.. jupyter-execute::
    :hide-code:

    import numpy as np
    import pandas as pd
    import xarray as xr

    np.random.seed(123456)

.. _testing.hypothesis:

Hypothesis testing
------------------

.. note::

    Testing with hypothesis is a fairly advanced topic. Before reading this section it is recommended that you take a look at our guide to xarray's :ref:`data structures`, are familiar with conventional unit testing in `pytest <https://docs.pytest.org/>`_, and have seen the `hypothesis library documentation <https://hypothesis.readthedocs.io/>`_.

`The hypothesis library <https://hypothesis.readthedocs.io/>`_ is a powerful tool for property-based testing. Instead of writing tests for one example at a time, it allows you to write tests parameterized by a source of many dynamically generated examples.
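As a brief sketch of the idea (the property tested here is deliberately trivial and purely illustrative):

.. jupyter-execute::

    from hypothesis import given
    import hypothesis.strategies as st

    @given(st.integers())
    def test_doubling_is_even(n):
        # hypothesis calls this function with many generated integers
        assert (2 * n) % 2 == 0

    # property-based tests can be run directly as well as via pytest
    test_doubling_is_even()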
This sketch is parameterized by the set of all possible integers via :py:func:`hypothesis.strategies.integers`. Property-based testing is extremely powerful, because (unlike more conventional example-based testing) it can find bugs that you did not even think to look for!

Strategies
~~~~~~~~~~

Each source of examples is called a "strategy", and xarray provides a range of custom strategies which produce xarray data structures containing arbitrary data. You can use these to efficiently test downstream code, quickly ensuring that your code can handle xarray objects of all possible structures and contents.

These strategies are accessible in the :py:mod:`xarray.testing.strategies` module, which provides

.. currentmodule:: xarray

.. autosummary::

    testing.strategies.supported_dtypes
    testing.strategies.names
    testing.strategies.dimension_names
    testing.strategies.dimension_sizes
    testing.strategies.attrs
    testing.strategies.variables
    testing.strategies.unique_subset_of

These build upon the numpy and array API strategies offered in :py:mod:`hypothesis.extra.numpy` and :py:mod:`hypothesis.extra.array_api`:

.. jupyter-execute::

    import hypothesis.extra.numpy as npst

Generating Examples
~~~~~~~~~~~~~~~~~~~

To see an example of what each of these strategies might produce, you can call one followed by the ``.example()`` method, which is a general hypothesis method valid for all strategies.

.. jupyter-execute::

    import xarray.testing.strategies as xrst

    xrst.variables().example()

.. jupyter-execute::

    xrst.variables().example()

.. jupyter-execute::

    xrst.variables().example()

You can see that calling ``.example()`` multiple times will generate different examples, giving you an idea of the wide range of data that the xarray strategies can generate.

In your tests, however, you should not use ``.example()``; instead you should parameterize your tests with the :py:func:`hypothesis.given` decorator:

.. jupyter-execute::

    from hypothesis import given

.. jupyter-execute::

    @given(xrst.variables())
    def test_function_that_acts_on_variables(var):
        assert func(var) == ...

Chaining Strategies
~~~~~~~~~~~~~~~~~~~

Xarray's strategies can accept other strategies as arguments, allowing you to customise the contents of the generated examples.

.. jupyter-execute::

    # generate a Variable containing an array with a complex number dtype,
    # but all other details still arbitrary
    from hypothesis.extra.numpy import complex_number_dtypes

    xrst.variables(dtype=complex_number_dtypes()).example()

This also works with custom strategies, or strategies defined in other packages. For example you could imagine creating a ``chunks`` strategy to specify particular chunking patterns for a dask-backed array.

Fixing Arguments
~~~~~~~~~~~~~~~~

If you want to fix one aspect of the data structure, whilst allowing variation in the generated examples over all other aspects, then use :py:func:`hypothesis.strategies.just()`.

.. jupyter-execute::

    import hypothesis.strategies as st

    # Generates only variable objects with dimensions ["x", "y"]
    xrst.variables(dims=st.just(["x", "y"])).example()

(This is technically another example of chaining strategies: :py:func:`hypothesis.strategies.just()` is simply a special strategy that just contains a single example.)

To fix the length of dimensions you can instead pass ``dims`` as a mapping of dimension names to lengths (i.e. following xarray objects' ``.sizes`` property), e.g.
.. jupyter-execute::

    # Generates only variables with dimensions ["x", "y"], of lengths 2 & 3 respectively
    xrst.variables(dims=st.just({"x": 2, "y": 3})).example()

You can also use this to specify that you want examples which are missing some part of the data structure, for instance

.. jupyter-execute::

    # Generates a Variable with no attributes
    xrst.variables(attrs=st.just({})).example()

Through a combination of chaining strategies and fixing arguments, you can specify quite complicated requirements on the objects your chained strategy will generate.

.. jupyter-execute::

    fixed_x_variable_y_maybe_z = st.fixed_dictionaries(
        {"x": st.just(2), "y": st.integers(3, 4)}, optional={"z": st.just(2)}
    )
    fixed_x_variable_y_maybe_z.example()

.. jupyter-execute::

    special_variables = xrst.variables(dims=fixed_x_variable_y_maybe_z)

    special_variables.example()

.. jupyter-execute::

    special_variables.example()

Here we have used one of hypothesis' built-in strategies, :py:func:`hypothesis.strategies.fixed_dictionaries`, to create a strategy which generates mappings of dimension names to lengths (i.e. the ``size`` of the xarray object we want). This particular strategy will always generate an ``x`` dimension of length 2, and a ``y`` dimension of length either 3 or 4, and will sometimes also generate a ``z`` dimension of length 2. By feeding this strategy for dictionaries into the ``dims`` argument of xarray's :py:func:`~xarray.testing.strategies.variables` strategy, we can generate arbitrary :py:class:`~xarray.Variable` objects whose dimensions will always match these specifications.

Generating Duck-type Arrays
~~~~~~~~~~~~~~~~~~~~~~~~~~~

Xarray objects don't have to wrap numpy arrays; in fact, they can wrap any array type which presents the same API as a numpy array (so-called "duck array wrapping", see :ref:`wrapping numpy-like arrays`).

Imagine we want to write a strategy which generates arbitrary ``Variable`` objects, each of which wraps a :py:class:`sparse.COO` array instead of a ``numpy.ndarray``. How could we do that? There are two ways:

1. Create an xarray object with numpy data and use hypothesis' ``.map()`` method to convert the underlying array to a different type:

.. jupyter-execute::

    import sparse

.. jupyter-execute::

    def convert_to_sparse(var):
        return var.copy(data=sparse.COO.from_numpy(var.to_numpy()))

.. jupyter-execute::

    sparse_variables = xrst.variables(dims=xrst.dimension_names(min_dims=1)).map(
        convert_to_sparse
    )

    sparse_variables.example()

.. jupyter-execute::

    sparse_variables.example()

2. Pass a function which returns a strategy which generates the duck-typed arrays directly to the ``array_strategy_fn`` argument of the xarray strategies:

.. jupyter-execute::

    def sparse_random_arrays(
        shape: tuple[int, ...],
    ) -> st.SearchStrategy[sparse._coo.core.COO]:
        """Strategy which generates random sparse.COO arrays"""
        if shape is None:
            shape = npst.array_shapes()
        else:
            shape = st.just(shape)
        density = st.integers(min_value=0, max_value=1)
        # note sparse.random does not accept a dtype kwarg
        return st.builds(sparse.random, shape=shape, density=density)


    def sparse_random_arrays_fn(
        *, shape: tuple[int, ...], dtype: np.dtype
    ) -> st.SearchStrategy[sparse._coo.core.COO]:
        return sparse_random_arrays(shape=shape)

.. jupyter-execute::

    sparse_random_variables = xrst.variables(
        array_strategy_fn=sparse_random_arrays_fn, dtype=st.just(np.dtype("float64"))
    )
    sparse_random_variables.example()

Either approach is fine, but one may be more convenient than the other depending on the type of the duck array which you want to wrap.
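Whichever you choose, the resulting strategy plugs into :py:func:`hypothesis.given` like any other; here is a minimal sketch using the ``sparse_variables`` strategy defined above:

.. jupyter-execute::

    @given(sparse_variables)
    def test_wraps_sparse(var):
        # every generated Variable should wrap a sparse.COO array
        assert isinstance(var.data, sparse.COO)

    test_wraps_sparse()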
Compatibility with the Python Array API Standard
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Xarray aims to be compatible with any duck-array type that conforms to the `Python Array API Standard <https://data-apis.org/array-api/latest/>`_ (see our :ref:`docs on Array API Standard support`).

.. warning::

    The strategies defined in :py:mod:`testing.strategies` are **not** guaranteed to use array API standard-compliant dtypes by default. For example arrays with the dtype ``np.dtype('float16')`` may be generated by :py:func:`testing.strategies.variables` (assuming the ``dtype`` kwarg was not explicitly passed), despite ``np.dtype('float16')`` not being in the array API standard.

If the array type you want to generate has an array API-compliant top-level namespace (e.g. that which is conventionally imported as ``xp`` or similar), you can use this neat trick:

.. jupyter-execute::

    import numpy as xp  # compatible in numpy 2.0

    # use `import numpy.array_api as xp` in numpy>=1.23,<2.0
    from hypothesis.extra.array_api import make_strategies_namespace

    xps = make_strategies_namespace(xp)

    xp_variables = xrst.variables(
        array_strategy_fn=xps.arrays,
        dtype=xps.scalar_dtypes(),
    )
    xp_variables.example()

Another array API-compliant duck array library would replace the import, e.g. ``import cupy as cp`` instead.

Testing over Subsets of Dimensions
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

A common task when testing xarray user code is checking that your function works for all valid input dimensions. We can chain strategies to achieve this, for which the helper strategy :py:func:`~testing.strategies.unique_subset_of` is useful.

It works for lists of dimension names

.. jupyter-execute::

    dims = ["x", "y", "z"]
    xrst.unique_subset_of(dims).example()

.. jupyter-execute::

    xrst.unique_subset_of(dims).example()

as well as for mappings of dimension names to sizes

.. jupyter-execute::

    dim_sizes = {"x": 2, "y": 3, "z": 4}
    xrst.unique_subset_of(dim_sizes).example()

.. jupyter-execute::

    xrst.unique_subset_of(dim_sizes).example()

This is useful because operations like reductions can be performed over any subset of the xarray object's dimensions. For example, we can write a pytest test checking that a reduction gives the expected result when applied along any possible valid subset of the Variable's dimensions.

.. code-block:: python

    import numpy.testing as npt


    @given(st.data(), xrst.variables(dims=xrst.dimension_names(min_dims=1)))
    def test_mean(data, var):
        """Test that the mean of an xarray Variable is always equal to the mean of the underlying array."""

        # specify arbitrary reduction along at least one dimension
        reduction_dims = data.draw(xrst.unique_subset_of(var.dims, min_size=1))

        # create expected result (using nanmean because arrays with NaNs will be generated)
        reduction_axes = tuple(var.get_axis_num(dim) for dim in reduction_dims)
        expected = np.nanmean(var.data, axis=reduction_axes)

        # assert property is always satisfied
        result = var.mean(dim=reduction_dims).data
        npt.assert_equal(expected, result)

.. currentmodule:: xarray

.. _time-series:

================
Time series data
================

A major use case for xarray is multi-dimensional time-series data. Accordingly, we've copied many of the features that make working with time-series data in pandas such a joy to xarray. In most cases, we rely on pandas for the core functionality.
.. jupyter-execute::
    :hide-code:

    import numpy as np
    import pandas as pd
    import xarray as xr

    np.random.seed(123456)

Creating datetime64 data
------------------------

Xarray uses the numpy dtypes :py:class:`numpy.datetime64` and :py:class:`numpy.timedelta64` with specified units (one of ``"s"``, ``"ms"``, ``"us"`` and ``"ns"``) to represent datetime data, which offer vectorized operations with numpy and smooth integration with pandas.

To convert to or create regular arrays of :py:class:`numpy.datetime64` data, we recommend using :py:func:`pandas.to_datetime`, :py:class:`pandas.DatetimeIndex`, or :py:func:`xarray.date_range`:

.. jupyter-execute::

    pd.to_datetime(["2000-01-01", "2000-02-02"])

.. jupyter-execute::

    pd.DatetimeIndex(
        ["2000-01-01 00:00:00", "2000-02-02 00:00:00"], dtype="datetime64[s]"
    )

.. jupyter-execute::

    xr.date_range("2000-01-01", periods=365)

.. jupyter-execute::

    xr.date_range("2000-01-01", periods=365, unit="s")

.. note::

    Care has to be taken to create the output with the desired resolution. For :py:func:`pandas.date_range` the ``unit`` kwarg has to be specified, and :py:func:`pandas.to_datetime` does not allow selecting the resolution at all; in that case :py:class:`pandas.DatetimeIndex` can be used directly. There is more in-depth information in section :ref:`internals.timecoding`.

Alternatively, you can supply arrays of Python ``datetime`` objects. These get converted automatically when used as arguments in xarray objects (with microsecond resolution):

.. jupyter-execute::

    import datetime

    xr.Dataset({"time": datetime.datetime(2000, 1, 1)})

When reading or writing netCDF files, xarray automatically decodes datetime and timedelta arrays using `CF conventions`_ (that is, by using a ``units`` attribute like ``'days since 2000-01-01'``).

.. _CF conventions: https://cfconventions.org

.. note::

    When decoding/encoding datetimes for non-standard calendars or for dates before `1582-10-15`_, xarray uses the `cftime`_ library by default. It was previously packaged with the ``netcdf4-python`` package under the name ``netcdftime`` but is now distributed separately. ``cftime`` is an :ref:`optional dependency` of xarray.

.. _cftime: https://unidata.github.io/cftime

.. _1582-10-15: https://en.wikipedia.org/wiki/Gregorian_calendar

You can manually decode arrays in this form by passing a dataset to :py:func:`decode_cf`:

.. jupyter-execute::

    attrs = {"units": "hours since 2000-01-01"}
    ds = xr.Dataset({"time": ("time", [0, 1, 2, 3], attrs)})

    # Default decoding to 'ns'-resolution
    xr.decode_cf(ds)

.. jupyter-execute::

    # Decoding to 's'-resolution
    coder = xr.coders.CFDatetimeCoder(time_unit="s")
    xr.decode_cf(ds, decode_times=coder)

From xarray 2025.01.2 the resolution of the dates can be one of ``"s"``, ``"ms"``, ``"us"`` or ``"ns"``. One limitation of using ``datetime64[ns]`` is that it limits the native representation of dates to those that fall between the years 1678 and 2262; this range widens significantly at lower resolutions. When a store contains dates outside of these bounds (or dates < `1582-10-15`_ with a Gregorian, also known as standard, calendar), dates will be returned as arrays of :py:class:`cftime.datetime` objects and a :py:class:`CFTimeIndex` will be used for indexing. :py:class:`CFTimeIndex` enables most of the indexing functionality of a :py:class:`pandas.DatetimeIndex`. See :ref:`CFTimeIndex` for more information.

Datetime indexing
-----------------

Xarray borrows powerful indexing machinery from pandas (see :ref:`indexing`).
This allows for several useful and succinct forms of indexing, particularly for ``datetime64`` data. For example, we support indexing with strings for single items and with the ``slice`` object:

.. jupyter-execute::

    time = pd.date_range("2000-01-01", freq="h", periods=365 * 24)
    ds = xr.Dataset({"foo": ("time", np.arange(365 * 24)), "time": time})
    ds.sel(time="2000-01")

.. jupyter-execute::

    ds.sel(time=slice("2000-06-01", "2000-06-10"))

You can also select a particular time by indexing with a :py:class:`datetime.time` object:

.. jupyter-execute::

    ds.sel(time=datetime.time(12))

For more details, read the pandas documentation and the section on :ref:`datetime_component_indexing` (i.e. using the ``.dt`` accessor).

.. _dt_accessor:

Datetime components
-------------------

Similar to `pandas accessors`_, the components of datetime objects contained in a given ``DataArray`` can be quickly computed using a special ``.dt`` accessor.

.. _pandas accessors: https://pandas.pydata.org/pandas-docs/stable/basics.html#basics-dt-accessors

.. jupyter-execute::

    time = pd.date_range("2000-01-01", freq="6h", periods=365 * 4)
    ds = xr.Dataset({"foo": ("time", np.arange(365 * 4)), "time": time})
    ds.time.dt.hour

.. jupyter-execute::

    ds.time.dt.dayofweek

The ``.dt`` accessor works on both coordinate dimensions and multi-dimensional data.

Xarray also supports a notion of "virtual" or "derived" coordinates for `datetime components`__ implemented by pandas, including "year", "month", "day", "hour", "minute", "second", "dayofyear", "week", "dayofweek", "weekday" and "quarter":

__ https://pandas.pydata.org/pandas-docs/stable/api.html#time-date-components

.. jupyter-execute::

    ds["time.month"]

.. jupyter-execute::

    ds["time.dayofyear"]

For use as a derived coordinate, xarray adds ``'season'`` to the list of datetime components supported by pandas:

.. jupyter-execute::

    ds["time.season"]

.. jupyter-execute::

    ds["time"].dt.season

The set of valid seasons consists of 'DJF', 'MAM', 'JJA' and 'SON', labeled by the first letters of the corresponding months.

You can use these shortcuts with both Datasets and DataArray coordinates.

In addition, xarray supports rounding operations ``floor``, ``ceil``, and ``round``. These operations require that you supply a `rounding frequency as a string argument.`__

__ https://pandas.pydata.org/pandas-docs/stable/timeseries.html#offset-aliases

.. jupyter-execute::

    ds["time"].dt.floor("D")

The ``.dt`` accessor can also be used to generate formatted datetime strings for arrays, utilising the same formatting as the standard `datetime.strftime`_.

.. _datetime.strftime: https://docs.python.org/3/library/datetime.html#strftime-strptime-behavior

.. jupyter-execute::

    ds["time"].dt.strftime("%a, %b %d %H:%M")

.. _datetime_component_indexing:

Indexing Using Datetime Components
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

You can use the ``.dt`` accessor when subsetting your data as well. For example, we can subset for the month of January using the following:

.. jupyter-execute::

    ds.isel(time=(ds.time.dt.month == 1))

You can also search for multiple months (in this case January through March), using ``isin``:

.. jupyter-execute::

    ds.isel(time=ds.time.dt.month.isin([1, 2, 3]))

.. _resampling:

Resampling and grouped operations
---------------------------------

.. seealso::

    For more generic documentation on grouping, see :ref:`groupby`.

Datetime components couple particularly well with grouped operations for analyzing features that repeat over time. Here's how to calculate the mean by time of day:
.. jupyter-execute::

    ds.groupby("time.hour").mean()

For upsampling or downsampling temporal resolutions, xarray offers a :py:meth:`Dataset.resample` method building on the core functionality offered by the pandas method of the same name. Resample uses essentially the same API as :py:meth:`pandas.DataFrame.resample` `in pandas`_.

.. _in pandas: https://pandas.pydata.org/pandas-docs/stable/timeseries.html#up-and-downsampling

For example, we can downsample our dataset from hourly to 6-hourly:

.. jupyter-execute::

    ds.resample(time="6h")

This will create a specialized :py:class:`~xarray.core.resample.DatasetResample` or :py:class:`~xarray.core.resample.DataArrayResample` object which saves information necessary for resampling. All of the reduction methods which work with :py:class:`Dataset` or :py:class:`DataArray` objects can also be used for resampling:

.. jupyter-execute::

    ds.resample(time="6h").mean()

You can also supply an arbitrary reduction function to aggregate over each resampling group:

.. jupyter-execute::

    ds.resample(time="6h").reduce(np.mean)

You can also resample on the time dimension while also reducing along other dimensions by specifying the ``dim`` keyword argument

.. code-block:: python

    ds.resample(time="6h").mean(dim=["time", "latitude", "longitude"])

For upsampling, xarray provides six methods: ``asfreq``, ``ffill``, ``bfill``, ``pad``, ``nearest`` and ``interpolate``. ``interpolate`` extends :py:func:`scipy.interpolate.interp1d` and supports all of its schemes. All of these resampling operations work on both Dataset and DataArray objects with an arbitrary number of dimensions.

In order to limit the scope of the methods ``ffill``, ``bfill``, ``pad`` and ``nearest``, the ``tolerance`` argument can be set in coordinate units. Data with indices outside of the given ``tolerance`` is set to ``NaN``.

.. jupyter-execute::

    ds.resample(time="1h").nearest(tolerance="1h")

It is often desirable to center the time values after a resampling operation. That can be accomplished by updating the resampled dataset time coordinate values using time offset arithmetic via the :py:func:`pandas.tseries.frequencies.to_offset` function.

.. jupyter-execute::

    resampled_ds = ds.resample(time="6h").mean()
    offset = pd.tseries.frequencies.to_offset("6h") / 2
    resampled_ds["time"] = resampled_ds.get_index("time") + offset
    resampled_ds

.. seealso::

    For more examples of using grouped operations on a time dimension, see :doc:`../examples/weather-data`.

.. _seasonal_grouping:

Handling Seasons
~~~~~~~~~~~~~~~~

Two extremely common time series operations are to group by seasons, and to resample to a seasonal frequency. Xarray has historically supported some simple versions of these computations, for example ``.groupby("time.season")`` (where the seasons are DJF, MAM, JJA, SON) and resampling to a seasonal frequency using pandas syntax: ``.resample(time="QS-DEC")``.

Quite commonly one wants more flexibility in defining seasons. For these use-cases, Xarray provides :py:class:`groupers.SeasonGrouper` and :py:class:`groupers.SeasonResampler`.

.. currentmodule:: xarray.groupers

.. jupyter-execute::

    from xarray.groupers import SeasonGrouper

    ds.groupby(time=SeasonGrouper(["DJF", "MAM", "JJA", "SON"])).mean()

Note how the seasons are in the specified order, unlike ``.groupby("time.season")`` where the seasons are sorted alphabetically.

.. jupyter-execute::

    ds.groupby("time.season").mean()

:py:class:`SeasonGrouper` supports overlapping seasons:
.. jupyter-execute::

    ds.groupby(time=SeasonGrouper(["DJFM", "MAMJ", "JJAS", "SOND"])).mean()

Skipping months is allowed:

.. jupyter-execute::

    ds.groupby(time=SeasonGrouper(["JJAS"])).mean()

Use :py:class:`SeasonResampler` to resample to custom seasons.

.. jupyter-execute::

    from xarray.groupers import SeasonResampler

    ds.resample(time=SeasonResampler(["DJF", "MAM", "JJA", "SON"])).mean()

:py:class:`SeasonResampler` is smart enough to correctly handle years for seasons that span the end of the year (e.g. DJF). By default :py:class:`SeasonResampler` will skip any season that is incomplete (e.g. the first DJF season for a time series that starts in Jan). Pass the ``drop_incomplete=False`` kwarg to :py:class:`SeasonResampler` to disable this behaviour.

.. jupyter-execute::

    from xarray.groupers import SeasonResampler

    ds.resample(
        time=SeasonResampler(["DJF", "MAM", "JJA", "SON"], drop_incomplete=False)
    ).mean()

Seasons need not be of the same length:

.. jupyter-execute::

    ds.resample(time=SeasonResampler(["JF", "MAM", "JJAS", "OND"])).mean()

.. currentmodule:: xarray

.. _weather-climate:

Weather and climate data
========================

.. jupyter-execute::
    :hide-code:

    import xarray as xr
    import numpy as np

Xarray can leverage metadata that follows the `Climate and Forecast (CF) conventions`_ if present. Examples include :ref:`automatic labelling of plots` with descriptive names and units, and support for non-standard calendars used in climate science through the ``cftime`` module (explained in the :ref:`CFTimeIndex` section). There are also a number of :ref:`geosciences-focused projects that build on xarray`.

.. _Climate and Forecast (CF) conventions: https://cfconventions.org

.. _cf_variables:

Related Variables
-----------------

Several CF variable attributes contain lists of other variables associated with the variable carrying the attribute. A few of these are now parsed by xarray, with the attribute value popped to encoding on read and the variables in that value interpreted as non-dimension coordinates:

- ``coordinates``
- ``bounds``
- ``grid_mapping``
- ``climatology``
- ``geometry``
- ``node_coordinates``
- ``node_count``
- ``part_node_count``
- ``interior_ring``
- ``cell_measures``
- ``formula_terms``

This decoding is controlled by the ``decode_coords`` kwarg to :py:func:`open_dataset` and :py:func:`open_mfdataset`.

The CF attribute ``ancillary_variables`` was not included in the list because the variables listed there are associated primarily with the variable carrying the attribute, rather than with the dimensions.

.. _metpy_accessor:

CF-compliant coordinate variables
---------------------------------

`MetPy`_ adds a ``metpy`` accessor that allows accessing coordinates with appropriate CF metadata using generic names ``x``, ``y``, ``vertical`` and ``time``. There is also a ``cartopy_crs`` attribute that provides projection information, parsed from the appropriate CF metadata, as a `Cartopy`_ projection object. See the `metpy documentation`_ for more information.

.. _`MetPy`: https://unidata.github.io/MetPy/dev/index.html
.. _`metpy documentation`: https://unidata.github.io/MetPy/dev/tutorials/xarray_tutorial.html#coordinates
.. _`Cartopy`: https://scitools.org.uk/cartopy/docs/latest/reference/crs.html
.. _CFTimeIndex:

Non-standard calendars and dates outside the precision range
------------------------------------------------------------

Through the standalone ``cftime`` library and a custom subclass of :py:class:`pandas.Index`, xarray supports a subset of the indexing functionality enabled through the standard :py:class:`pandas.DatetimeIndex` for dates from non-standard calendars commonly used in climate science, as well as for dates from a standard calendar that fall outside the `precision range`_ or prior to `1582-10-15`_.

.. note::

    As of xarray version 0.11, by default, :py:class:`cftime.datetime` objects will be used to represent times (either in indexes, as a :py:class:`~xarray.CFTimeIndex`, or in data arrays with dtype object) if any of the following are true:

    - The dates are from a non-standard calendar
    - Any dates are outside the nanosecond-precision range (prior to xarray version 2025.01.2)
    - Any dates are outside the time span limited by the resolution (from xarray version 2025.01.2)

    Otherwise pandas-compatible dates from a standard calendar will be represented with the ``np.datetime64[unit]`` data type (where unit can be one of ``"s"``, ``"ms"``, ``"us"``, ``"ns"``), enabling the use of a :py:class:`pandas.DatetimeIndex` or arrays with dtype ``np.datetime64[unit]`` and their full set of associated features.

    As of pandas version 2.0.0, pandas supports non-nanosecond precision datetime values. From xarray version 2025.01.2 on, non-nanosecond precision datetime values are also supported in xarray (this can be parameterized via :py:class:`~xarray.coders.CFDatetimeCoder` and the ``decode_times`` kwarg). See also :ref:`internals.timecoding`.

For example, you can create a DataArray indexed by a time coordinate with dates from a no-leap calendar and a :py:class:`~xarray.CFTimeIndex` will automatically be used:

.. jupyter-execute::

    from itertools import product
    from cftime import DatetimeNoLeap

    dates = [
        DatetimeNoLeap(year, month, 1)
        for year, month in product(range(1, 3), range(1, 13))
    ]
    da = xr.DataArray(np.arange(24), coords=[dates], dims=["time"], name="foo")

Xarray also includes a :py:func:`~xarray.date_range` function, which enables creating a :py:class:`~xarray.CFTimeIndex` with regularly-spaced dates. For instance, we can create the same dates and DataArray we created above using (note that ``use_cftime=True`` is not mandatory to return a :py:class:`~xarray.CFTimeIndex` for non-standard calendars, but can be nice to use to be explicit):

.. jupyter-execute::

    dates = xr.date_range(
        start="0001", periods=24, freq="MS", calendar="noleap", use_cftime=True
    )
    da = xr.DataArray(np.arange(24), coords=[dates], dims=["time"], name="foo")

Mirroring pandas' method with the same name, :py:meth:`~xarray.infer_freq` allows one to infer the sampling frequency of a :py:class:`~xarray.CFTimeIndex` or a 1-D :py:class:`~xarray.DataArray` containing cftime objects. It also works transparently with ``np.datetime64`` and ``np.timedelta64`` data (with "s", "ms", "us" or "ns" resolution).

.. jupyter-execute::

    xr.infer_freq(dates)

With :py:meth:`~xarray.CFTimeIndex.strftime` we can also easily generate formatted strings from the datetime values of a :py:class:`~xarray.CFTimeIndex` directly, or through the ``dt`` accessor for a :py:class:`~xarray.DataArray`, using the same formatting as the standard `datetime.strftime`_ convention.

.. _datetime.strftime: https://docs.python.org/3/library/datetime.html#strftime-strptime-behavior

.. jupyter-execute::

    dates.strftime("%c")
.. jupyter-execute::

    da["time"].dt.strftime("%Y%m%d")

Conversion between non-standard calendars, and to/from pandas DatetimeIndexes, is facilitated with the :py:meth:`xarray.Dataset.convert_calendar` method (also available as :py:meth:`xarray.DataArray.convert_calendar`). Here, like elsewhere in xarray, the ``use_cftime`` argument controls which datetime backend is used in the output. The default (``None``) is to use ``pandas`` when possible, i.e. when the calendar is ``standard``/``gregorian`` and dates start on or after `1582-10-15`_. There is no such restriction when converting to a ``proleptic_gregorian`` calendar.

.. _1582-10-15: https://en.wikipedia.org/wiki/Gregorian_calendar

.. jupyter-execute::

    dates = xr.date_range(
        start="2001", periods=24, freq="MS", calendar="noleap", use_cftime=True
    )
    da_nl = xr.DataArray(np.arange(24), coords=[dates], dims=["time"], name="foo")
    da_std = da_nl.convert_calendar("standard", use_cftime=True)

The data is unchanged, only the timestamps are modified. Further options are implemented for the special ``"360_day"`` calendar and for handling missing dates. There is also :py:meth:`xarray.Dataset.interp_calendar` (and :py:meth:`xarray.DataArray.interp_calendar`) for interpolating data between calendars.

For data indexed by a :py:class:`~xarray.CFTimeIndex`, xarray currently supports:

- `Partial datetime string indexing`_:

  .. jupyter-execute::

      da.sel(time="0001")

  .. jupyter-execute::

      da.sel(time=slice("0001-05", "0002-02"))

  .. note::

      For specifying full or partial datetime strings in cftime indexing, xarray supports two versions of the `ISO 8601 standard`_, the basic pattern (YYYYMMDDhhmmss) or the extended pattern (YYYY-MM-DDThh:mm:ss), as well as the default cftime string format (YYYY-MM-DD hh:mm:ss). This is somewhat more restrictive than pandas; in other words, some datetime strings that would be valid for a :py:class:`pandas.DatetimeIndex` are not valid for an :py:class:`~xarray.CFTimeIndex`.

- Access of basic datetime components via the ``dt`` accessor (in this case just "year", "month", "day", "hour", "minute", "second", "microsecond", "season", "dayofyear", "dayofweek", and "days_in_month") with the addition of "calendar", absent from pandas:

  .. jupyter-execute::

      da.time.dt.year

  .. jupyter-execute::

      da.time.dt.month

  .. jupyter-execute::

      da.time.dt.season

  .. jupyter-execute::

      da.time.dt.dayofyear

  .. jupyter-execute::

      da.time.dt.dayofweek

  .. jupyter-execute::

      da.time.dt.days_in_month

  .. jupyter-execute::

      da.time.dt.calendar

- Rounding of datetimes to fixed frequencies via the ``dt`` accessor:

  .. jupyter-execute::

      da.time.dt.ceil("3D").head()

  .. jupyter-execute::

      da.time.dt.floor("5D").head()

  .. jupyter-execute::

      da.time.dt.round("2D").head()

- Group-by operations based on datetime accessor attributes (e.g. by month of the year):

  .. jupyter-execute::

      da.groupby("time.month").sum()

- Interpolation using :py:class:`cftime.datetime` objects:

  .. jupyter-execute::

      da.interp(time=[DatetimeNoLeap(1, 1, 15), DatetimeNoLeap(1, 2, 15)])

- Interpolation using datetime strings:

  .. jupyter-execute::

      da.interp(time=["0001-01-15", "0001-02-15"])

- Differentiation:

  .. jupyter-execute::

      da.differentiate("time")

- Serialization:

  .. jupyter-execute::

      da.to_netcdf("example-no-leap.nc")
      reopened = xr.open_dataset("example-no-leap.nc")
      reopened

  .. jupyter-execute::
      :hide-code:

      import os

      reopened.close()
      os.remove("example-no-leap.nc")

- And resampling along the time dimension for data indexed by a :py:class:`~xarray.CFTimeIndex`:
  .. jupyter-execute::

      da.resample(time="81min", closed="right", label="right", offset="3min").mean()

.. _precision range: https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#timestamp-limitations
.. _ISO 8601 standard: https://en.wikipedia.org/wiki/ISO_8601
.. _partial datetime string indexing: https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#partial-string-indexing

API reference
=============

This page provides an auto-generated summary of xarray's API. For more details and examples, refer to the relevant chapters in the main part of the documentation. See also: "What parts of xarray are considered public API?" and "How stable is Xarray's API?".

Top-level functions
-------------------

- ``apply_ufunc(func, *args[, input_core_dims, ...])``: Apply a vectorized function for unlabeled arrays on xarray objects.
- ``align(*objects[, join, copy, indexes, ...])``: Given any number of Dataset and/or DataArray objects, returns new objects with aligned indexes and dimension sizes.
- ``broadcast(*args[, exclude])``: Explicitly broadcast any number of DataArray or Dataset objects against one another.
- ``concat(objs, dim[, data_vars, coords, ...])``: Concatenate xarray objects along a new or existing dimension.
- ``merge(objects[, compat, join, fill_value, ...])``: Merge any number of xarray objects into a single Dataset as variables.
- ``combine_by_coords([data_objects, compat, ...])``: Attempt to auto-magically combine the given datasets (or data arrays) into one by using dimension coordinates.
- ``combine_nested(datasets, concat_dim[, ...])``: Explicitly combine an N-dimensional grid of datasets into one by using a succession of concat and merge operations along each dimension of the grid.
- ``where(cond, x, y[, keep_attrs])``: Return elements from x or y depending on cond.
- ``infer_freq(index)``: Infer the most likely frequency given the input index.
- ``full_like(other, fill_value[, dtype, ...])``: Return a new object with the same shape and type as a given object.
- ``zeros_like(other[, dtype, chunks, ...])``: Return a new object of zeros with the same shape and type as a given dataarray or dataset.
- ``ones_like(other[, dtype, chunks, ...])``: Return a new object of ones with the same shape and type as a given dataarray or dataset.
- ``cov(da_a, da_b[, dim, ddof, weights])``: Compute covariance between two DataArray objects along a shared dimension.
- ``corr(da_a, da_b[, dim, weights])``: Compute the Pearson correlation coefficient between two DataArray objects along a shared dimension.
- ``cross(a, b, *, dim)``: Compute the cross product of two (arrays of) vectors.
- ``dot(*arrays[, dim])``: Generalized dot product for xarray objects.
- ``polyval(coord, coeffs[, degree_dim])``: Evaluate a polynomial at specific values.
- ``map_blocks(func, obj[, args, kwargs, template])``: Apply a function to each block of a DataArray or Dataset.
- ``show_versions([file])``: Print the versions of xarray and its dependencies.
- ``set_options(**kwargs)``: Set options for xarray in a controlled context.
- ``get_options()``: Get options for xarray.
- ``unify_chunks(*objects)``: Given any number of Dataset and/or DataArray objects, returns new objects with unified chunk size along all chunked dimensions.

Dataset
-------

Creating a dataset
~~~~~~~~~~~~~~~~~~

- ``Dataset([data_vars, coords, attrs])``: A multi-dimensional, in memory, array database.
- ``decode_cf(obj[, concat_characters, ...])``: Decode the given Dataset or Datastore according to CF conventions into a new Dataset.

Attributes
~~~~~~~~~~

- ``Dataset.dims``: Mapping from dimension names to lengths.
- ``Dataset.sizes``: Mapping from dimension names to lengths.
- ``Dataset.dtypes``: Mapping from data variable names to dtypes.
- ``Dataset.data_vars``: Dictionary of DataArray objects corresponding to data variables.
- ``Dataset.coords``: Mapping of ``DataArray`` objects corresponding to coordinate variables.
- ``Dataset.attrs``: Dictionary of global attributes on this dataset.
- ``Dataset.encoding``: Dictionary of global encoding attributes on this dataset.
- ``Dataset.indexes``: Mapping of pandas.Index objects used for label based indexing.
- ``Dataset.xindexes``: Mapping of ``Index`` objects used for label based indexing.
- ``Dataset.chunks``: Mapping from dimension names to block lengths for this dataset's data.
- ``Dataset.chunksizes``: Mapping from dimension names to block lengths for this dataset's data.
- ``Dataset.nbytes``: Total bytes consumed by the data arrays of all variables in this dataset.

Dictionary interface
~~~~~~~~~~~~~~~~~~~~

Datasets implement the mapping interface with keys given by variable names and values given by ``DataArray`` objects.

- ``Dataset.__getitem__(key)``: Access variables or coordinates of this dataset as a ``DataArray``, or a subset of variables, or an indexed dataset.
- ``Dataset.__setitem__(key, value)``: Add an array to this dataset.
- ``Dataset.__delitem__(key)``: Remove a variable from this dataset.
- ``Dataset.update(other)``: Update this dataset's variables with those from another dataset.
- ``Dataset.get(k[, d])``
- ``Dataset.items()``
- ``Dataset.keys()``
- ``Dataset.values()``

Dataset contents
~~~~~~~~~~~~~~~~

- ``Dataset.copy([deep, data])``: Returns a copy of this dataset.
- ``Dataset.assign([variables])``: Assign new data variables to a Dataset, returning a new object with all the original variables in addition to the new ones.
- ``Dataset.assign_coords([coords])``: Assign new coordinates to this object.
- ``Dataset.assign_attrs(*args, **kwargs)``: Assign new attrs to this object.
- ``Dataset.pipe(func, *args, **kwargs)``: Apply ``func(self, *args, **kwargs)``.
- ``Dataset.merge(other[, overwrite_vars, ...])``: Merge the arrays of two datasets into a single dataset.
- ``Dataset.rename([name_dict])``: Returns a new object with renamed variables, coordinates and dimensions.
- ``Dataset.rename_vars([name_dict])``: Returns a new object with renamed variables including coordinates.
- ``Dataset.rename_dims([dims_dict])``: Returns a new object with renamed dimensions only.
- ``Dataset.swap_dims([dims_dict])``: Returns a new object with swapped dimensions.
- ``Dataset.expand_dims([dim, axis, ...])``: Return a new object with an additional axis (or axes) inserted at the corresponding position in the array shape.
- ``Dataset.drop_vars(names, *[, errors])``: Drop variables from this dataset.
- ``Dataset.drop_indexes(coord_names, *[, errors])``: Drop the indexes assigned to the given coordinates.
- ``Dataset.drop_duplicates(dim, *[, keep])``: Returns a new Dataset with duplicate dimension values removed.
- ``Dataset.drop_dims(drop_dims, *[, errors])``: Drop dimensions and associated variables from this dataset.
- ``Dataset.drop_encoding()``: Return a new Dataset without encoding on the dataset or any of its variables/coords.
- ``Dataset.drop_attrs(*[, deep])``: Removes all attributes from the Dataset and its variables.
- ``Dataset.set_coords(names)``: Given names of one or more variables, set them as coordinates.
- ``Dataset.reset_coords([names, drop])``: Given names of coordinates, reset them to become variables.
- ``Dataset.convert_calendar(calendar[, dim, ...])``: Convert the Dataset to another calendar.
- ``Dataset.interp_calendar(target[, dim])``: Interpolates the Dataset to another calendar based on decimal year measure.
- ``Dataset.get_index(key)``: Get an index for a dimension, with fall-back to a default RangeIndex.

Comparisons
~~~~~~~~~~~

- ``Dataset.equals(other)``: Two Datasets are equal if they have matching variables and coordinates, all of which are equal.
- ``Dataset.identical(other)``: Like equals, but also checks all dataset attributes and the attributes on all variables and coordinates.
- ``Dataset.broadcast_equals(other)``: Two Datasets are broadcast equal if they are equal after broadcasting all variables against each other.

Indexing
~~~~~~~~

- ``Dataset.loc``: Attribute for location based indexing.
- ``Dataset.isel([indexers, drop, missing_dims])``: Returns a new dataset with each array indexed along the specified dimension(s).
- ``Dataset.sel([indexers, method, tolerance, drop])``: Returns a new dataset with each array indexed by tick labels along the specified dimension(s).
- ``Dataset.drop_sel([labels, errors])``: Drop index labels from this dataset.
- ``Dataset.drop_isel([indexers])``: Drop index positions from this Dataset.
- ``Dataset.head([indexers])``: Returns a new dataset with the first n values of each array for the specified dimension(s).
- ``Dataset.tail([indexers])``: Returns a new dataset with the last n values of each array for the specified dimension(s).
- ``Dataset.thin([indexers])``: Returns a new dataset with each array indexed along every n-th value for the specified dimension(s).
- ``Dataset.squeeze([dim, drop, axis])``: Return a new object with squeezed data.
- ``Dataset.interp([coords, method, ...])``: Interpolate a Dataset onto new coordinates.
- ``Dataset.interp_like(other[, method, ...])``: Interpolate this object onto the coordinates of another object.
- ``Dataset.reindex([indexers, method, ...])``: Conform this object onto a new set of indexes, filling in missing values with ``fill_value``.
- ``Dataset.reindex_like(other[, method, ...])``: Conform this object onto the indexes of another object, for indexes which the objects share.
- ``Dataset.set_index([indexes, append])``: Set Dataset (multi-)indexes using one or more existing coordinates or variables.
- ``Dataset.reset_index(dims_or_levels, *[, drop])``: Reset the specified index(es) or multi-index level(s).
- ``Dataset.set_xindex(coord_names[, index_cls])``: Set a new, Xarray-compatible index from one or more existing coordinate(s).
- ``Dataset.reorder_levels([dim_order])``: Rearrange index levels using input order.
- ``Dataset.query([queries, parser, engine, ...])``: Return a new dataset with each array indexed along the specified dimension(s), where the indexers are given as strings containing Python expressions to be evaluated against the data variables in the dataset.

Missing value handling
~~~~~~~~~~~~~~~~~~~~~~

- ``Dataset.isnull([keep_attrs])``: Test each value in the array for whether it is a missing value.
- ``Dataset.notnull([keep_attrs])``: Test each value in the array for whether it is not a missing value.
- ``Dataset.combine_first(other)``: Combine two Datasets, default to data_vars of self.
- ``Dataset.count([dim, keep_attrs])``: Reduce this Dataset's data by applying ``count`` along some dimension(s).
- ``Dataset.dropna(dim, *[, how, thresh, subset])``: Returns a new dataset with dropped labels for missing values along the provided dimension.
- ``Dataset.fillna(value)``: Fill missing values in this object.
- ``Dataset.ffill(dim[, limit])``: Fill NaN values by propagating values forward.
- ``Dataset.bfill(dim[, limit])``: Fill NaN values by propagating values backward.
- ``Dataset.interpolate_na([dim, method, limit, ...])``: Fill in NaNs by interpolating according to different methods.
- ``Dataset.where(cond[, other, drop])``: Filter elements from this object according to a condition.
- ``Dataset.isin(test_elements)``: Tests each value in the array for whether it is in test elements.

Computation
~~~~~~~~~~~

- ``Dataset.map(func[, keep_attrs, args])``: Apply a function to each data variable in this dataset.
- ``Dataset.reduce(func[, dim, keep_attrs, ...])``: Reduce this dataset by applying func along some dimension(s).
- ``Dataset.groupby([group, squeeze, ...])``: Returns a DatasetGroupBy object for performing grouped operations.
- ``Dataset.groupby_bins(group, bins[, right, ...])``: Returns a DatasetGroupBy object for performing grouped operations.
- ``Dataset.rolling([dim, min_periods, center])``: Rolling window object for Datasets.
- ``Dataset.rolling_exp([window, window_type])``: Exponentially-weighted moving window.
- ``Dataset.cumulative(dim[, min_periods])``: Accumulating object for Datasets.
- ``Dataset.weighted(weights)``: Weighted Dataset operations.
- ``Dataset.coarsen([dim, boundary, side, ...])``: Coarsen object for Datasets.
- ``Dataset.resample([indexer, skipna, closed, ...])``: Returns a Resample object for performing resampling operations.
- ``Dataset.diff(dim[, n, label])``: Calculate the n-th order discrete difference along given axis.
- ``Dataset.quantile(q[, dim, method, ...])``: Compute the qth quantile of the data along the specified dimension.
- ``Dataset.differentiate(coord[, edge_order, ...])``: Differentiate with the second order accurate central differences.
- ``Dataset.integrate(coord[, datetime_unit])``: Integrate along the given coordinate using the trapezoidal rule.
- ``Dataset.map_blocks(func[, args, kwargs, ...])``: Apply a function to each block of this Dataset.
- ``Dataset.polyfit(dim, deg[, skipna, rcond, ...])``: Least squares polynomial fit.
- ``Dataset.curvefit(coords, func[, ...])``: Curve fitting optimization for arbitrary functions.
- ``Dataset.eval(statement, *[, parser])``: Calculate an expression supplied as a string in the context of the dataset.

Aggregation
~~~~~~~~~~~

- ``Dataset.all([dim, keep_attrs])``: Reduce this Dataset's data by applying ``all`` along some dimension(s).
- ``Dataset.any([dim, keep_attrs])``: Reduce this Dataset's data by applying ``any`` along some dimension(s).
- ``Dataset.argmax([dim])``: Indices of the maxima of the member variables.
- ``Dataset.argmin([dim])``: Indices of the minima of the member variables.
- ``Dataset.count([dim, keep_attrs])``: Reduce this Dataset's data by applying ``count`` along some dimension(s).
- ``Dataset.idxmax([dim, skipna, fill_value, ...])``: Return the coordinate label of the maximum value along a dimension.
- ``Dataset.idxmin([dim, skipna, fill_value, ...])``: Return the coordinate label of the minimum value along a dimension.
- ``Dataset.max([dim, skipna, keep_attrs])``: Reduce this Dataset's data by applying ``max`` along some dimension(s).
- ``Dataset.min([dim, skipna, keep_attrs])``: Reduce this Dataset's data by applying ``min`` along some dimension(s).
- ``Dataset.mean([dim, skipna, keep_attrs])``: Reduce this Dataset's data by applying ``mean`` along some dimension(s).
- ``Dataset.median([dim, skipna, keep_attrs])``: Reduce this Dataset's data by applying ``median`` along some dimension(s).
- ``Dataset.prod([dim, skipna, min_count, ...])``: Reduce this Dataset's data by applying ``prod`` along some dimension(s).
- ``Dataset.sum([dim, skipna, min_count, keep_attrs])``: Reduce this Dataset's data by applying ``sum`` along some dimension(s).
- ``Dataset.std([dim, skipna, ddof, keep_attrs])``: Reduce this Dataset's data by applying ``std`` along some dimension(s).
- ``Dataset.var([dim, skipna, ddof, keep_attrs])``: Reduce this Dataset's data by applying ``var`` along some dimension(s).
- ``Dataset.cumsum([dim, skipna, keep_attrs])``: Reduce this Dataset's data by applying ``cumsum`` along some dimension(s).
- ``Dataset.cumprod([dim, skipna, keep_attrs])``: Reduce this Dataset's data by applying ``cumprod`` along some dimension(s).

ndarray methods
~~~~~~~~~~~~~~~

- ``Dataset.argsort([axis, kind, order])``: Returns the indices that would sort this array.
- ``Dataset.astype(dtype, *[, order, casting, ...])``: Copy of the xarray object, with data cast to a specified type.
- ``Dataset.clip([min, max, keep_attrs])``: Return an array whose values are limited to ``[min, max]``.
- ``Dataset.conj()``: Complex-conjugate all elements.
- ``Dataset.conjugate(*args, **kwargs)``: ``a.conj()``
- ``Dataset.imag``: The imaginary part of each data variable.
- ``Dataset.round(*args, **kwargs)``
- ``Dataset.real``: The real part of each data variable.
- ``Dataset.rank(dim, *[, pct, keep_attrs])``: Ranks the data.

Reshaping and reorganizing
~~~~~~~~~~~~~~~~~~~~~~~~~~

- ``Dataset.transpose(*dim[, missing_dims])``: Return a new Dataset object with all array dimensions transposed.
- ``Dataset.stack([dim, create_index, index_cls])``: Stack any number of existing dimensions into a single new dimension.
- ``Dataset.unstack([dim, fill_value, sparse])``: Unstack existing dimensions corresponding to MultiIndexes into multiple new dimensions.
- ``Dataset.to_stacked_array(new_dim, sample_dims)``: Combine variables of differing dimensionality into a DataArray without broadcasting.
- ``Dataset.shift([shifts, fill_value])``: Shift this dataset by an offset along one or more dimensions.
- ``Dataset.roll([shifts, roll_coords])``: Roll this dataset by an offset along one or more dimensions.
- ``Dataset.pad([pad_width, mode, stat_length, ...])``: Pad this dataset along one or more dimensions.
- ``Dataset.sortby(variables[, ascending])``: Sort object by labels or values (along an axis).
- ``Dataset.broadcast_like(other[, exclude])``: Broadcast this DataArray against another Dataset or DataArray.

DataArray
---------

- ``DataArray([data, coords, dims, name, attrs, ...])``: N-dimensional array with labeled coordinates and dimensions.

Attributes
~~~~~~~~~~

- ``DataArray.values``: The array's data converted to numpy.ndarray.
- ``DataArray.data``: The DataArray's data as an array.
- ``DataArray.coords``: Mapping of ``DataArray`` objects corresponding to coordinate variables.
- ``DataArray.dims``: Tuple of dimension names associated with this array.
- ``DataArray.sizes``: Ordered mapping from dimension names to lengths.
- ``DataArray.name``: The name of this array.
- ``DataArray.attrs``: Dictionary storing arbitrary metadata with this array.
- ``DataArray.encoding``: Dictionary of format-specific settings for how this array should be serialized.
- ``DataArray.indexes``: Mapping of pandas.Index objects used for label based indexing.
- ``DataArray.xindexes``: Mapping of ``Index`` objects used for label based indexing.
- ``DataArray.chunksizes``: Mapping from dimension names to block lengths for this dataarray's data.

ndarray attributes
~~~~~~~~~~~~~~~~~~

- ``DataArray.ndim``: Number of array dimensions.
---|---
`DataArray.nbytes` | Total bytes consumed by the elements of this DataArray's data.
`DataArray.shape` | Tuple of array dimensions.
`DataArray.size` | Number of elements in the array.
`DataArray.dtype` | Data-type of the array's elements.
`DataArray.chunks` | Tuple of block lengths for this dataarray's data, in order of dimensions, or None if the underlying data is not a dask array.

### DataArray contents

`DataArray.assign_coords`([coords]) | Assign new coordinates to this object.
---|---
`DataArray.assign_attrs`(*args, **kwargs) | Assign new attrs to this object.
`DataArray.pipe`(func, *args, **kwargs) | Apply `func(self, *args, **kwargs)`
`DataArray.rename`([new_name_or_name_dict]) | Returns a new DataArray with renamed coordinates, dimensions or a new name.
`DataArray.swap_dims`([dims_dict]) | Returns a new DataArray with swapped dimensions.
`DataArray.expand_dims`([dim, axis, ...]) | Return a new object with an additional axis (or axes) inserted at the corresponding position in the array shape.
`DataArray.drop_vars`(names, *[, errors]) | Returns an array with dropped variables.
`DataArray.drop_indexes`(coord_names, *[, errors]) | Drop the indexes assigned to the given coordinates.
`DataArray.drop_duplicates`(dim, *[, keep]) | Returns a new DataArray with duplicate dimension values removed.
`DataArray.drop_encoding`() | Return a new DataArray without encoding on the array or any attached coords.
`DataArray.drop_attrs`(*[, deep]) | Removes all attributes from the DataArray.
`DataArray.reset_coords`([names, drop]) | Given names of coordinates, reset them to become variables.
`DataArray.copy`([deep, data]) | Returns a copy of this array.
`DataArray.convert_calendar`(calendar[, dim, ...]) | Convert the DataArray to another calendar.
`DataArray.interp_calendar`(target[, dim]) | Interpolates the DataArray to another calendar based on decimal year measure.
`DataArray.get_index`(key) | Get an index for a dimension, with fall-back to a default RangeIndex.
`DataArray.astype`(dtype, *[, order, casting, ...]) | Copy of the xarray object, with data cast to a specified type.
`DataArray.item`(*args) | Copy an element of an array to a standard Python scalar and return it.

### Indexing

`DataArray.__getitem__`(key) |
---|---
`DataArray.__setitem__`(key, value) |
`DataArray.loc` | Attribute for location based indexing like pandas.
`DataArray.isel`([indexers, drop, missing_dims]) | Return a new DataArray whose data is given by selecting indexes along the specified dimension(s).
`DataArray.sel`([indexers, method, tolerance, ...]) | Return a new DataArray whose data is given by selecting index labels along the specified dimension(s).
`DataArray.drop_sel`([labels, errors]) | Drop index labels from this DataArray.
`DataArray.drop_isel`([indexers]) | Drop index positions from this DataArray.
`DataArray.head`([indexers]) | Return a new DataArray whose data is given by the first n values along the specified dimension(s).
`DataArray.tail`([indexers]) | Return a new DataArray whose data is given by the last n values along the specified dimension(s).
`DataArray.thin`([indexers]) | Return a new DataArray whose data is given by each n-th value along the specified dimension(s).
`DataArray.squeeze`([dim, drop, axis]) | Return a new object with squeezed data.
`DataArray.interp`([coords, method, ...]) | Interpolate a DataArray onto new coordinates.
`DataArray.interp_like`(other[, method, ...]) | Interpolate this object onto the coordinates of another object, filling out of range values with NaN.
`DataArray.reindex`([indexers, method, ...]) | Conform this object onto the indexes of another object, filling in missing values with `fill_value`.
`DataArray.reindex_like`(other, *[, method, ...]) | Conform this object onto the indexes of another object, for indexes which the objects share.
`DataArray.set_index`([indexes, append]) | Set DataArray (multi-)indexes using one or more existing coordinates.
`DataArray.reset_index`(dims_or_levels[, drop]) | Reset the specified index(es) or multi-index level(s).
`DataArray.set_xindex`(coord_names[, index_cls]) | Set a new, Xarray-compatible index from one or more existing coordinate(s).
`DataArray.reorder_levels`([dim_order]) | Rearrange index levels using input order.
`DataArray.query`([queries, parser, engine, ...]) | Return a new data array indexed along the specified dimension(s), where the indexers are given as strings containing Python expressions to be evaluated against the values in the array.
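A minimal sketch of the positional, label-based, and pandas-style selection listed above (the array contents are invented for illustration):

```python
import numpy as np
import xarray as xr

da = xr.DataArray(
    np.arange(6).reshape(2, 3),
    coords={"x": ["a", "b"], "y": [10, 20, 30]},
    dims=("x", "y"),
)

da.isel(x=0)                    # positional: first row
da.sel(x="a", y=20)             # label-based selection
da.loc["a", 10:20]              # pandas-style label slicing (inclusive)
da.sel(y=22, method="nearest")  # inexact label lookup with a fill rule
```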
### Missing value handling

`DataArray.isnull`([keep_attrs]) | Test each value in the array for whether it is a missing value.
---|---
`DataArray.notnull`([keep_attrs]) | Test each value in the array for whether it is not a missing value.
`DataArray.combine_first`(other) | Combine two DataArray objects, with union of coordinates.
`DataArray.count`([dim, keep_attrs]) | Reduce this DataArray's data by applying `count` along some dimension(s).
`DataArray.dropna`(dim, *[, how, thresh]) | Returns a new array with dropped labels for missing values along the provided dimension.
`DataArray.fillna`(value) | Fill missing values in this object.
`DataArray.ffill`(dim[, limit]) | Fill NaN values by propagating values forward
`DataArray.bfill`(dim[, limit]) | Fill NaN values by propagating values backward
`DataArray.interpolate_na`([dim, method, ...]) | Fill in NaNs by interpolating according to different methods.
`DataArray.where`(cond[, other, drop]) | Filter elements from this object according to a condition.
`DataArray.isin`(test_elements) | Tests each value in the array for whether it is in test elements.
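A short sketch of the missing-value helpers above on a one-dimensional array (the data is invented for illustration):

```python
import numpy as np
import xarray as xr

da = xr.DataArray([1.0, np.nan, 3.0, np.nan], dims="x", coords={"x": np.arange(4)})

da.isnull()                 # boolean mask of missing values
da.fillna(0.0)              # replace NaN with a constant
da.ffill(dim="x")           # propagate the last valid value forward
da.interpolate_na(dim="x")  # fill NaN by interpolation (linear by default)
da.where(da > 1)            # mask values failing a condition with NaN
```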
### Comparisons

`DataArray.equals`(other) | True if two DataArrays have the same dimensions, coordinates and values; otherwise False.
---|---
`DataArray.identical`(other) | Like equals, but also checks the array name and attributes, and attributes on all coordinates.
`DataArray.broadcast_equals`(other) | Two DataArrays are broadcast equal if they are equal after broadcasting them against each other such that they have the same dimensions.

### Computation

`DataArray.reduce`(func[, dim, axis, ...]) | Reduce this array by applying func along some dimension(s).
---|---
`DataArray.groupby`([group, squeeze, ...]) | Returns a DataArrayGroupBy object for performing grouped operations.
`DataArray.groupby_bins`(group, bins[, right, ...]) | Returns a DataArrayGroupBy object for performing grouped operations.
`DataArray.rolling`([dim, min_periods, center]) | Rolling window object for DataArrays.
`DataArray.rolling_exp`([window, window_type]) | Exponentially-weighted moving window.
`DataArray.cumulative`(dim[, min_periods]) | Accumulating object for DataArrays.
`DataArray.weighted`(weights) | Weighted DataArray operations.
`DataArray.coarsen`([dim, boundary, side, ...]) | Coarsen object for DataArrays.
`DataArray.resample`([indexer, skipna, ...]) | Returns a Resample object for performing resampling operations.
`DataArray.get_axis_num`(dim) | Return axis number(s) corresponding to dimension(s) in this array.
`DataArray.diff`(dim[, n, label]) | Calculate the n-th order discrete difference along given axis.
`DataArray.dot`(other[, dim]) | Perform dot product of two DataArrays along their shared dims.
`DataArray.quantile`(q[, dim, method, ...]) | Compute the qth quantile of the data along the specified dimension.
`DataArray.differentiate`(coord[, edge_order, ...]) | Differentiate the array with the second order accurate central differences.
`DataArray.integrate`([coord, datetime_unit]) | Integrate along the given coordinate using the trapezoidal rule.
`DataArray.polyfit`(dim, deg[, skipna, rcond, ...]) | Least squares polynomial fit.
`DataArray.map_blocks`(func[, args, kwargs, ...]) | Apply a function to each block of this DataArray.
`DataArray.curvefit`(coords, func[, ...]) | Curve fitting optimization for arbitrary functions.

### Aggregation

`DataArray.all`([dim, keep_attrs]) | Reduce this DataArray's data by applying `all` along some dimension(s).
---|---
`DataArray.any`([dim, keep_attrs]) | Reduce this DataArray's data by applying `any` along some dimension(s).
`DataArray.argmax`([dim, axis, keep_attrs, skipna]) | Index or indices of the maximum of the DataArray over one or more dimensions.
`DataArray.argmin`([dim, axis, keep_attrs, skipna]) | Index or indices of the minimum of the DataArray over one or more dimensions.
`DataArray.count`([dim, keep_attrs]) | Reduce this DataArray's data by applying `count` along some dimension(s).
`DataArray.idxmax`([dim, skipna, fill_value, ...]) | Return the coordinate label of the maximum value along a dimension.
`DataArray.idxmin`([dim, skipna, fill_value, ...]) | Return the coordinate label of the minimum value along a dimension.
`DataArray.max`([dim, skipna, keep_attrs]) | Reduce this DataArray's data by applying `max` along some dimension(s).
`DataArray.min`([dim, skipna, keep_attrs]) | Reduce this DataArray's data by applying `min` along some dimension(s).
`DataArray.mean`([dim, skipna, keep_attrs]) | Reduce this DataArray's data by applying `mean` along some dimension(s).
`DataArray.median`([dim, skipna, keep_attrs]) | Reduce this DataArray's data by applying `median` along some dimension(s).
`DataArray.prod`([dim, skipna, min_count, ...]) | Reduce this DataArray's data by applying `prod` along some dimension(s).
`DataArray.sum`([dim, skipna, min_count, ...]) | Reduce this DataArray's data by applying `sum` along some dimension(s).
`DataArray.std`([dim, skipna, ddof, keep_attrs]) | Reduce this DataArray's data by applying `std` along some dimension(s).
`DataArray.var`([dim, skipna, ddof, keep_attrs]) | Reduce this DataArray's data by applying `var` along some dimension(s).
`DataArray.cumsum`([dim, skipna, keep_attrs]) | Reduce this DataArray's data by applying `cumsum` along some dimension(s).
`DataArray.cumprod`([dim, skipna, keep_attrs]) | Reduce this DataArray's data by applying `cumprod` along some dimension(s).

### ndarray methods

`DataArray.argsort`([axis, kind, order]) | Returns the indices that would sort this array.
---|---
`DataArray.clip`([min, max, keep_attrs]) | Return an array whose values are limited to `[min, max]`.
`DataArray.conj`() | Complex-conjugate all elements.
`DataArray.conjugate`(*args, **kwargs) | a.conj()
`DataArray.imag` | The imaginary part of the array.
`DataArray.searchsorted`(v[, side, sorter]) | Find indices where elements of v should be inserted in a to maintain order.
`DataArray.round`(*args, **kwargs) |
`DataArray.real` | The real part of the array.
`DataArray.T` |
`DataArray.rank`(dim, *[, pct, keep_attrs]) | Ranks the data.

### String manipulation

`DataArray.str` |
---|---

`DataArray.str.capitalize`() | Convert strings in the array to be capitalized.
---|---
`DataArray.str.casefold`() | Convert strings in the array to be casefolded.
`DataArray.str.cat`(*others[, sep]) | Concatenate strings elementwise in the DataArray with other strings.
`DataArray.str.center`(width[, fillchar]) | Pad left and right side of each string in the array.
`DataArray.str.contains`(pat[, case, flags, regex]) | Test if pattern or regex is contained within each string of the array.
`DataArray.str.count`(pat[, flags, case]) | Count occurrences of pattern in each string of the array.
`DataArray.str.decode`(encoding[, errors]) | Decode character string in the array using indicated encoding.
`DataArray.str.encode`(encoding[, errors]) | Encode character string in the array using indicated encoding.
`DataArray.str.endswith`(pat) | Test if the end of each string in the array matches a pattern.
`DataArray.str.extract`(pat, dim[, case, flags]) | Extract the first match of capture groups in the regex pat as a new dimension in a DataArray.
`DataArray.str.extractall`(pat, group_dim, ...) | Extract all matches of capture groups in the regex pat as new dimensions in a DataArray.
`DataArray.str.find`(sub[, start, end, side]) | Return lowest or highest indexes in each string in the array where the substring is fully contained between [start:end].
`DataArray.str.findall`(pat[, case, flags]) | Find all occurrences of pattern or regular expression in the DataArray.
`DataArray.str.format`(*args, **kwargs) | Perform python string formatting on each element of the DataArray.
`DataArray.str.get`(i[, default]) | Extract character number i from each string in the array.
`DataArray.str.get_dummies`(dim[, sep]) | Return DataArray of dummy/indicator variables.
`DataArray.str.index`(sub[, start, end, side]) | Return lowest or highest indexes in each string where the substring is fully contained between [start:end].
`DataArray.str.isalnum`() | Check whether all characters in each string are alphanumeric.
`DataArray.str.isalpha`() | Check whether all characters in each string are alphabetic.
`DataArray.str.isdecimal`() | Check whether all characters in each string are decimal.
`DataArray.str.isdigit`() | Check whether all characters in each string are digits.
`DataArray.str.islower`() | Check whether all characters in each string are lowercase.
`DataArray.str.isnumeric`() | Check whether all characters in each string are numeric.
`DataArray.str.isspace`() | Check whether all characters in each string are spaces.
`DataArray.str.istitle`() | Check whether all characters in each string are titlecase.
`DataArray.str.isupper`() | Check whether all characters in each string are uppercase.
`DataArray.str.join`([dim, sep]) | Concatenate strings in a DataArray along a particular dimension.
`DataArray.str.len`() | Compute the length of each string in the array.
`DataArray.str.ljust`(width[, fillchar]) | Pad right side of each string in the array.
`DataArray.str.lower`() | Convert strings in the array to lowercase.
`DataArray.str.lstrip`([to_strip]) | Remove leading characters.
`DataArray.str.match`(pat[, case, flags]) | Determine if each string in the array matches a regular expression.
`DataArray.str.normalize`(form) | Return the Unicode normal form for the strings in the dataarray.
`DataArray.str.pad`(width[, side, fillchar]) | Pad strings in the array up to width.
`DataArray.str.partition`(dim[, sep]) | Split the strings in the DataArray at the first occurrence of separator sep.
`DataArray.str.repeat`(repeats) | Repeat each string in the array.
`DataArray.str.replace`(pat, repl[, n, case, ...]) | Replace occurrences of pattern/regex in the array with some string.
`DataArray.str.rfind`(sub[, start, end]) | Return highest indexes in each string in the array where the substring is fully contained between [start:end].
`DataArray.str.rindex`(sub[, start, end]) | Return highest indexes in each string where the substring is fully contained between [start:end].
`DataArray.str.rjust`(width[, fillchar]) | Pad left side of each string in the array.
`DataArray.str.rpartition`(dim[, sep]) | Split the strings in the DataArray at the last occurrence of separator sep.
`DataArray.str.rsplit`(dim[, sep, maxsplit]) | Split strings in a DataArray around the given separator/delimiter sep.
`DataArray.str.rstrip`([to_strip]) | Remove trailing characters.
`DataArray.str.slice`([start, stop, step]) | Slice substrings from each string in the array.
`DataArray.str.slice_replace`([start, stop, repl]) | Replace a positional slice of a string with another value.
`DataArray.str.split`(dim[, sep, maxsplit]) | Split strings in a DataArray around the given separator/delimiter sep.
`DataArray.str.startswith`(pat) | Test if the start of each string in the array matches a pattern.
`DataArray.str.strip`([to_strip, side]) | Remove leading and trailing characters.
`DataArray.str.swapcase`() | Convert strings in the array to be swapcased.
`DataArray.str.title`() | Convert strings in the array to titlecase.
`DataArray.str.translate`(table) | Map characters of each string through the given mapping table.
`DataArray.str.upper`() | Convert strings in the array to uppercase.
`DataArray.str.wrap`(width, **kwargs) | Wrap long strings in the array in paragraphs with length less than width.
`DataArray.str.zfill`(width) | Pad each string in the array by prepending '0' characters.
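The `.str` accessor applies these operations elementwise; a minimal sketch (the strings are invented for illustration):

```python
import xarray as xr

names = xr.DataArray(["ice_cream", "Pizza", "tacos"], dims="food")

names.str.upper()            # elementwise uppercase
names.str.contains("a")      # boolean mask from a pattern
names.str.replace("_", " ")  # substring replacement
names.str.len()              # length of each string
```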
### Datetimelike properties

**Datetime properties**:

`DataArray.dt.year` | The year of the datetime
---|---
`DataArray.dt.month` | The month as January=1, December=12
`DataArray.dt.day` | The days of the datetime
`DataArray.dt.hour` | The hours of the datetime
`DataArray.dt.minute` | The minutes of the datetime
`DataArray.dt.second` | The seconds of the datetime
`DataArray.dt.microsecond` | The microseconds of the datetime
`DataArray.dt.nanosecond` | The nanoseconds of the datetime
`DataArray.dt.dayofweek` | The day of the week with Monday=0, Sunday=6
`DataArray.dt.weekday` | The day of the week with Monday=0, Sunday=6
`DataArray.dt.dayofyear` | The ordinal day of the year
`DataArray.dt.quarter` | The quarter of the date
`DataArray.dt.days_in_month` | The number of days in the month
`DataArray.dt.daysinmonth` | The number of days in the month
`DataArray.dt.days_in_year` | The number of days in the year for each datetime
`DataArray.dt.season` | Season of the year
`DataArray.dt.time` | Timestamps corresponding to datetimes
`DataArray.dt.date` | Date corresponding to datetimes
`DataArray.dt.decimal_year` | Each datetime as the year plus the fraction of the year elapsed.
`DataArray.dt.calendar` | The name of the calendar of the dates.
`DataArray.dt.is_month_start` | Indicate whether the date is the first day of the month
`DataArray.dt.is_month_end` | Indicate whether the date is the last day of the month
`DataArray.dt.is_quarter_end` | Indicate whether the date is the last day of a quarter
`DataArray.dt.is_year_start` | Indicate whether the date is the first day of a year
`DataArray.dt.is_leap_year` | Indicate if the date belongs to a leap year
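The `.dt` accessor exposes these per element; a minimal sketch (the dates are invented for illustration):

```python
import pandas as pd
import xarray as xr

times = xr.DataArray(pd.date_range("2024-01-01", periods=4, freq="7D"), dims="time")

times.dt.year       # the year of each element
times.dt.dayofweek  # Monday=0 ... Sunday=6
times.dt.season     # "DJF", "MAM", "JJA" or "SON"
```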
**Datetime methods**:

`DataArray.dt.floor`(freq) | Round timestamps downward to specified frequency resolution.
---|---
`DataArray.dt.ceil`(freq) | Round timestamps upward to specified frequency resolution.
`DataArray.dt.isocalendar`() | Dataset containing ISO year, week number, and weekday.
`DataArray.dt.round`(freq) | Round timestamps to specified frequency resolution.
`DataArray.dt.strftime`(date_format) | Return an array of formatted strings specified by date_format, which supports the same string format as the python standard library.

**Timedelta properties**:

`DataArray.dt.days` | Number of days for each element
---|---
`DataArray.dt.seconds` | Number of seconds (>= 0 and less than 1 day) for each element
`DataArray.dt.microseconds` | Number of microseconds (>= 0 and less than 1 second) for each element
`DataArray.dt.nanoseconds` | Number of nanoseconds (>= 0 and less than 1 microsecond) for each element
`DataArray.dt.total_seconds` |

**Timedelta methods**:

`DataArray.dt.floor`(freq) | Round timestamps downward to specified frequency resolution.
---|---
`DataArray.dt.ceil`(freq) | Round timestamps upward to specified frequency resolution.
`DataArray.dt.round`(freq) | Round timestamps to specified frequency resolution.

### Reshaping and reorganizing

`DataArray.transpose`(*dim[, ...]) | Return a new DataArray object with transposed dimensions.
---|---
`DataArray.stack`([dim, create_index, index_cls]) | Stack any number of existing dimensions into a single new dimension.
`DataArray.unstack`([dim, fill_value, sparse]) | Unstack existing dimensions corresponding to MultiIndexes into multiple new dimensions.
`DataArray.to_unstacked_dataset`(dim[, level]) | Unstack DataArray expanding to Dataset along a given level of a stacked coordinate.
`DataArray.shift`([shifts, fill_value]) | Shift this DataArray by an offset along one or more dimensions.
`DataArray.roll`([shifts, roll_coords]) | Roll this array by an offset along one or more dimensions.
`DataArray.pad`([pad_width, mode, ...]) | Pad this array along one or more dimensions.
`DataArray.sortby`(variables[, ascending]) | Sort object by labels or values (along an axis).
`DataArray.broadcast_like`(other, *[, exclude]) | Broadcast this DataArray against another Dataset or DataArray.

## DataTree

### Creating a DataTree

Methods of creating a `DataTree`.

`DataTree`([dataset, children, name]) | A tree-like hierarchical collection of xarray objects.
---|---
`DataTree.from_dict`(d, /[, name]) | Create a datatree from a dictionary of data objects, organised by paths into the tree.
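A minimal sketch of building a tree from a dictionary of datasets (the group names and variables here are invented for illustration):

```python
import xarray as xr

tree = xr.DataTree.from_dict(
    {
        "/": xr.Dataset(attrs={"title": "root node"}),
        "/daily": xr.Dataset({"temp": ("time", [10.0, 11.0])}),
        "/monthly": xr.Dataset({"temp": ("time", [10.5])}),
    }
)

tree["daily"]  # a child node, itself a DataTree
tree.groups    # ('/', '/daily', '/monthly')
```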
### Tree Attributes

Attributes relating to the recursive tree-like structure of a `DataTree`.

`DataTree.parent` | Parent of this node.
---|---
`DataTree.children` | Child nodes of this node, stored under a mapping via their names.
`DataTree.name` | The name of this node.
`DataTree.path` | Return the file-like path from the root to this node.
`DataTree.root` | Root node of the tree
`DataTree.is_root` | Whether this node is the tree root.
`DataTree.is_leaf` | Whether this node is a leaf node.
`DataTree.leaves` | All leaf nodes.
`DataTree.level` | Level of this node.
`DataTree.depth` | Maximum level of this tree.
`DataTree.width` | Number of nodes at this level in the tree.
`DataTree.subtree` | Iterate over all nodes in this tree, including both self and all descendants.
`DataTree.subtree_with_keys` | Iterate over relative paths and node pairs for all nodes in this tree.
`DataTree.descendants` | Child nodes and all their child nodes.
`DataTree.siblings` | Nodes with the same parent as this node.
`DataTree.lineage` | All parent nodes and their parent nodes, starting with the closest.
`DataTree.parents` | All parent nodes and their parent nodes, starting with the closest.
`DataTree.ancestors` | All parent nodes and their parent nodes, starting with the most distant.
`DataTree.groups` | Return all groups in the tree, given as a tuple of path-like strings.
`DataTree.xindexes` | Mapping of xarray Index objects used for label based indexing.

### Data Contents

Interface to the data objects (optionally) stored inside a single `DataTree` node. This interface echoes that of `xarray.Dataset`.

`DataTree.dims` | Mapping from dimension names to lengths.
---|---
`DataTree.sizes` | Mapping from dimension names to lengths.
`DataTree.data_vars` | Dictionary of DataArray objects corresponding to data variables
`DataTree.ds` | An immutable Dataset-like view onto the data in this node.
`DataTree.coords` | Dictionary of xarray.DataArray objects corresponding to coordinate variables
`DataTree.attrs` | Dictionary of global attributes on this node object.
`DataTree.encoding` | Dictionary of global encoding attributes on this node object.
`DataTree.indexes` | Mapping of pandas.Index objects used for label based indexing.
`DataTree.nbytes` |
`DataTree.dataset` | An immutable Dataset-like view onto the data in this node.
`DataTree.to_dataset`([inherit]) | Return the data in this node as a new xarray.Dataset object.
`DataTree.has_data` | Whether or not there are any variables in this node.
`DataTree.has_attrs` | Whether or not there are any metadata attributes in this node.
`DataTree.is_empty` | False if node contains any data or attrs.
`DataTree.is_hollow` | True if only leaf nodes contain data.
`DataTree.chunksizes` | Mapping from group paths to a mapping of chunksizes.

### Dictionary Interface

`DataTree` objects also have a dict-like interface mapping keys to either `xarray.DataArray`s or to child `DataTree` nodes.

`DataTree.__getitem__`(key) | Access child nodes, variables, or coordinates stored anywhere in this tree.
---|---
`DataTree.__setitem__`(key, value) | Add either a child node or an array to the tree, at any position.
`DataTree.__delitem__`(key) | Remove a variable or child node from this datatree node.
`DataTree.update`(other) | Update this node's children and / or variables.
`DataTree.get`(key[, default]) | Access child nodes, variables, or coordinates stored in this node.
`DataTree.items`() |
`DataTree.keys`() |
`DataTree.values`() |

### Tree Manipulation

For manipulating, traversing, navigating, or mapping over the tree structure.

`DataTree.orphan`() | Detach this node from its parent.
---|---
`DataTree.same_tree`(other) | True if other node is in the same tree as this node.
`DataTree.relative_to`(other) | Compute the relative path from this node to node other.
`DataTree.iter_lineage`() | Iterate up the tree, starting from the current node.
`DataTree.find_common_ancestor`(other) | Find the first common ancestor of two nodes in the same tree.
`DataTree.map_over_datasets`(func, *args[, kwargs]) | Apply a function to every dataset in this subtree, returning a new tree which stores the results.
`DataTree.pipe`(func, *args, **kwargs) | Apply `func(self, *args, **kwargs)`
`DataTree.match`(pattern) | Return nodes with paths matching pattern.
`DataTree.filter`(filterfunc) | Filter nodes according to a specified condition.
`DataTree.filter_like`(other) | Filter a datatree like another datatree.

### Pathlib-like Interface

`DataTree` objects deliberately echo some of the API of `pathlib.PurePath`.

`DataTree.name` | The name of this node.
---|---
`DataTree.parent` | Parent of this node.
`DataTree.parents` | All parent nodes and their parent nodes, starting with the closest.
`DataTree.relative_to`(other) | Compute the relative path from this node to node other.

### DataTree Contents

Manipulate the contents of all nodes in a `DataTree` simultaneously.

`DataTree.copy`(*[, inherit, deep]) | Returns a copy of this subtree.
---|---

### DataTree Node Contents

Manipulate the contents of a single `DataTree` node.

`DataTree.assign`([items]) | Assign new data variables or child nodes to a DataTree, returning a new object with all the original items in addition to the new ones.
---|---
`DataTree.drop_nodes`(names, *[, errors]) | Drop child nodes from this node.

### DataTree Operations

Apply operations over multiple `DataTree` objects.

`map_over_datasets`(func, *args[, kwargs]) | Applies a function to every dataset in one or more DataTree objects with the same structure (i.e., that are isomorphic), returning new trees which store the results.
---|---
`group_subtrees`(*trees) | Iterate over subtrees grouped by relative paths in breadth-first order.

### Comparisons

Compare one `DataTree` object to another.

`DataTree.isomorphic`(other) | Two DataTrees are considered isomorphic if the set of paths to their descendent nodes are the same.
---|---
`DataTree.equals`(other) | Two DataTrees are equal if they have isomorphic node structures, with matching node names, and if they have matching variables and coordinates, all of which are equal.
`DataTree.identical`(other) | Like equals, but also checks attributes on all datasets, variables and coordinates, and requires that any inherited coordinates at the tree root are also inherited on the other tree.

### Indexing

Index into all nodes in the subtree simultaneously.

`DataTree.isel`([indexers, drop, missing_dims]) | Returns a new data tree with each array indexed along the specified dimension(s).
---|---
`DataTree.sel`([indexers, method, tolerance, drop]) | Returns a new data tree with each array indexed by tick labels along the specified dimension(s).

### Aggregation

Aggregate data in all nodes in the subtree simultaneously.

`DataTree.all`([dim, keep_attrs]) | Reduce this DataTree's data by applying `all` along some dimension(s).
---|---
`DataTree.any`([dim, keep_attrs]) | Reduce this DataTree's data by applying `any` along some dimension(s).
`DataTree.max`([dim, skipna, keep_attrs]) | Reduce this DataTree's data by applying `max` along some dimension(s).
`DataTree.min`([dim, skipna, keep_attrs]) | Reduce this DataTree's data by applying `min` along some dimension(s).
`DataTree.mean`([dim, skipna, keep_attrs]) | Reduce this DataTree's data by applying `mean` along some dimension(s).
`DataTree.median`([dim, skipna, keep_attrs]) | Reduce this DataTree's data by applying `median` along some dimension(s).
`DataTree.prod`([dim, skipna, min_count, ...]) | Reduce this DataTree's data by applying `prod` along some dimension(s).
`DataTree.sum`([dim, skipna, min_count, ...]) | Reduce this DataTree's data by applying `sum` along some dimension(s).
`DataTree.std`([dim, skipna, ddof, keep_attrs]) | Reduce this DataTree's data by applying `std` along some dimension(s).
`DataTree.var`([dim, skipna, ddof, keep_attrs]) | Reduce this DataTree's data by applying `var` along some dimension(s).
`DataTree.cumsum`([dim, skipna, keep_attrs]) | Reduce this DataTree's data by applying `cumsum` along some dimension(s).
`DataTree.cumprod`([dim, skipna, keep_attrs]) | Reduce this DataTree's data by applying `cumprod` along some dimension(s).

### ndarray methods

Methods copied from `numpy.ndarray` objects, here applying to the data in all nodes in the subtree.

`DataTree.argsort`([axis, kind, order]) | Returns the indices that would sort this array.
---|---
`DataTree.conj`() | Complex-conjugate all elements.
`DataTree.conjugate`(*args, **kwargs) | a.conj()
`DataTree.round`(*args, **kwargs) |

## Coordinates

### Creating coordinates

`Coordinates`([coords, indexes]) | Dictionary like container for Xarray coordinates (variables + indexes).
---|---
`Coordinates.from_xindex`(index) | Create Xarray coordinates from an existing Xarray index.
`Coordinates.from_pandas_multiindex`(midx, dim) | Wrap a pandas multi-index as Xarray coordinates (dimension + levels).

### Attributes

`Coordinates.dims` | Mapping from dimension names to lengths or tuple of dimension names.
---|---
`Coordinates.sizes` | Mapping from dimension names to lengths.
`Coordinates.dtypes` | Mapping from coordinate names to dtypes.
`Coordinates.variables` | Low level interface to Coordinates contents as dict of Variable objects.
`Coordinates.indexes` | Mapping of pandas.Index objects used for label based indexing.
`Coordinates.xindexes` | Mapping of `Index` objects used for label based indexing.

### Dictionary Interface

Coordinates implement the mapping interface with keys given by variable names and values given by `DataArray` objects.

`Coordinates.__getitem__`(key) |
---|---
`Coordinates.__setitem__`(key, value) |
`Coordinates.__delitem__`(key) |
`Coordinates.update`(other) | Update this Coordinates variables with other coordinate variables.
`Coordinates.get`(k[, d]) |
`Coordinates.items`() |
`Coordinates.keys`() |
`Coordinates.values`() |

### Coordinates contents

`Coordinates.to_dataset`() | Convert these coordinates into a new Dataset.
---|---
`Coordinates.to_index`([ordered_dims]) | Convert all index coordinates into a `pandas.Index`.
`Coordinates.assign`([coords]) | Assign new coordinates (and indexes) to a Coordinates object, returning a new object with all the original coordinates in addition to the new ones.
`Coordinates.merge`(other) | Merge two sets of coordinates to create a new Dataset
`Coordinates.copy`([deep, memo]) | Return a copy of this Coordinates object.

### Comparisons

`Coordinates.equals`(other) | Two Coordinates objects are equal if they have matching variables, all of which are equal.
---|---
`Coordinates.identical`(other) | Like equals, but also checks all variable attributes.
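A minimal sketch of constructing `Coordinates` explicitly, here wrapping a pandas MultiIndex (the names are invented for illustration):

```python
import pandas as pd
import xarray as xr

midx = pd.MultiIndex.from_product([["a", "b"], [0, 1]], names=("letter", "num"))
coords = xr.Coordinates.from_pandas_multiindex(midx, "x")

ds = xr.Dataset(coords=coords)  # "x", "letter" and "num" become coordinates
ds.coords["letter"]
```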
### Proxies

Coordinates that are accessed from the `coords` property of Dataset, DataArray and DataTree objects, respectively.

`core.coordinates.DatasetCoordinates`(dataset) | Dictionary like container for Dataset coordinates (variables + indexes).
---|---
`core.coordinates.DataArrayCoordinates`(dataarray) | Dictionary like container for DataArray coordinates (variables + indexes).
`core.coordinates.DataTreeCoordinates`(datatree) | Dictionary like container for coordinates of a DataTree node (variables + indexes).

## Indexes

Default, pandas-backed indexes built-in to Xarray:

`indexes.PandasIndex`(array, dim[, ...]) | Wrap a pandas.Index as an xarray compatible index.
---|---
`indexes.PandasMultiIndex`(array, dim[, ...]) | Wrap a pandas.MultiIndex as an xarray compatible index.

More complex indexes built-in to Xarray:

`CFTimeIndex`(data[, name]) | Custom Index for working with CF calendars and dates.
---|---
`indexes.RangeIndex`(transform) | Xarray index implementing a simple bounded 1-dimensional interval with evenly spaced, monotonic floating-point values.
`indexes.NDPointIndex`(tree_obj, *, ...) | Xarray index for irregular, n-dimensional data.

### Creating indexes

`cftime_range`([start, end, periods, freq, ...]) | Return a fixed frequency CFTimeIndex.
---|---
`date_range`([start, end, periods, freq, tz, ...]) | Return a fixed frequency datetime index.
`date_range_like`(source, calendar[, use_cftime]) | Generate a datetime array with the same frequency, start and end as another one, but in a different calendar.
`indexes.RangeIndex.arange`([start, stop, ...]) | Create a new RangeIndex from given start, stop and step values.
`indexes.RangeIndex.linspace`(start, stop[, ...]) | Create a new RangeIndex from given start / stop values and number of values.
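A minimal sketch of creating time indexes with these helpers:

```python
import xarray as xr

# Standard, pandas-backed datetime index:
times = xr.date_range("2000-01-01", periods=12, freq="MS")

# The same range on a non-standard calendar, backed by a CFTimeIndex:
noleap = xr.date_range("2000-01-01", periods=12, freq="MS", calendar="noleap")
```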
## Universal functions

These functions are equivalent to their NumPy versions, but for xarray objects backed by non-NumPy array types (e.g. `cupy`, `sparse`, or `jax`), they will ensure that the computation is dispatched to the appropriate backend. You can find them in the `xarray.ufuncs` module:

`ufuncs.abs` | xarray specific variant of `numpy.abs()`.
---|---
`ufuncs.absolute` | xarray specific variant of `numpy.absolute()`.
`ufuncs.acos` | xarray specific variant of `numpy.acos()`.
`ufuncs.acosh` | xarray specific variant of `numpy.acosh()`.
`ufuncs.arccos` | xarray specific variant of `numpy.arccos()`.
`ufuncs.arccosh` | xarray specific variant of `numpy.arccosh()`.
`ufuncs.arcsin` | xarray specific variant of `numpy.arcsin()`.
`ufuncs.arcsinh` | xarray specific variant of `numpy.arcsinh()`.
`ufuncs.arctan` | xarray specific variant of `numpy.arctan()`.
`ufuncs.arctanh` | xarray specific variant of `numpy.arctanh()`.
`ufuncs.asin` | xarray specific variant of `numpy.asin()`.
`ufuncs.asinh` | xarray specific variant of `numpy.asinh()`.
`ufuncs.atan` | xarray specific variant of `numpy.atan()`.
`ufuncs.atanh` | xarray specific variant of `numpy.atanh()`.
`ufuncs.bitwise_count` | xarray specific variant of `numpy.bitwise_count()`.
`ufuncs.bitwise_invert` | xarray specific variant of `numpy.bitwise_invert()`.
`ufuncs.bitwise_not` | xarray specific variant of `numpy.bitwise_not()`.
`ufuncs.cbrt` | xarray specific variant of `numpy.cbrt()`.
`ufuncs.ceil` | xarray specific variant of `numpy.ceil()`.
`ufuncs.conj` | xarray specific variant of `numpy.conj()`.
`ufuncs.conjugate` | xarray specific variant of `numpy.conjugate()`.
`ufuncs.cos` | xarray specific variant of `numpy.cos()`.
`ufuncs.cosh` | xarray specific variant of `numpy.cosh()`.
`ufuncs.deg2rad` | xarray specific variant of `numpy.deg2rad()`.
`ufuncs.degrees` | xarray specific variant of `numpy.degrees()`.
`ufuncs.exp` | xarray specific variant of `numpy.exp()`.
`ufuncs.exp2` | xarray specific variant of `numpy.exp2()`.
`ufuncs.expm1` | xarray specific variant of `numpy.expm1()`.
`ufuncs.fabs` | xarray specific variant of `numpy.fabs()`.
`ufuncs.floor` | xarray specific variant of `numpy.floor()`.
`ufuncs.invert` | xarray specific variant of `numpy.invert()`.
`ufuncs.isfinite` | xarray specific variant of `numpy.isfinite()`.
`ufuncs.isinf` | xarray specific variant of `numpy.isinf()`.
`ufuncs.isnan` | xarray specific variant of `numpy.isnan()`.
`ufuncs.isnat` | xarray specific variant of `numpy.isnat()`.
`ufuncs.log` | xarray specific variant of `numpy.log()`.
`ufuncs.log10` | xarray specific variant of `numpy.log10()`.
`ufuncs.log1p` | xarray specific variant of `numpy.log1p()`.
`ufuncs.log2` | xarray specific variant of `numpy.log2()`.
`ufuncs.logical_not` | xarray specific variant of `numpy.logical_not()`.
`ufuncs.negative` | xarray specific variant of `numpy.negative()`.
`ufuncs.positive` | xarray specific variant of `numpy.positive()`.
`ufuncs.rad2deg` | xarray specific variant of `numpy.rad2deg()`.
`ufuncs.radians` | xarray specific variant of `numpy.radians()`.
`ufuncs.reciprocal` | xarray specific variant of `numpy.reciprocal()`.
`ufuncs.rint` | xarray specific variant of `numpy.rint()`.
`ufuncs.sign` | xarray specific variant of `numpy.sign()`.
`ufuncs.signbit` | xarray specific variant of `numpy.signbit()`.
`ufuncs.sin` | xarray specific variant of `numpy.sin()`.
`ufuncs.sinh` | xarray specific variant of `numpy.sinh()`.
`ufuncs.spacing` | xarray specific variant of `numpy.spacing()`.
`ufuncs.sqrt` | xarray specific variant of `numpy.sqrt()`.
`ufuncs.square` | xarray specific variant of `numpy.square()`.
`ufuncs.tan` | xarray specific variant of `numpy.tan()`.
`ufuncs.tanh` | xarray specific variant of `numpy.tanh()`.
`ufuncs.trunc` | xarray specific variant of `numpy.trunc()`.
`ufuncs.add` | xarray specific variant of `numpy.add()`.
`ufuncs.arctan2` | xarray specific variant of `numpy.arctan2()`.
`ufuncs.atan2` | xarray specific variant of `numpy.atan2()`.
`ufuncs.bitwise_and` | xarray specific variant of `numpy.bitwise_and()`.
`ufuncs.bitwise_left_shift` | xarray specific variant of `numpy.bitwise_left_shift()`.
`ufuncs.bitwise_or` | xarray specific variant of `numpy.bitwise_or()`.
`ufuncs.bitwise_right_shift` | xarray specific variant of `numpy.bitwise_right_shift()`.
`ufuncs.bitwise_xor` | xarray specific variant of `numpy.bitwise_xor()`.
`ufuncs.copysign` | xarray specific variant of `numpy.copysign()`.
`ufuncs.divide` | xarray specific variant of `numpy.divide()`.
`ufuncs.equal` | xarray specific variant of `numpy.equal()`.
`ufuncs.float_power` | xarray specific variant of `numpy.float_power()`.
`ufuncs.floor_divide` | xarray specific variant of `numpy.floor_divide()`.
`ufuncs.fmax` | xarray specific variant of `numpy.fmax()`.
`ufuncs.fmin` | xarray specific variant of `numpy.fmin()`.
`ufuncs.fmod` | xarray specific variant of `numpy.fmod()`.
`ufuncs.gcd` | xarray specific variant of `numpy.gcd()`.
`ufuncs.greater` | xarray specific variant of `numpy.greater()`.
`ufuncs.greater_equal` | xarray specific variant of `numpy.greater_equal()`.
`ufuncs.heaviside` | xarray specific variant of `numpy.heaviside()`.
`ufuncs.hypot` | xarray specific variant of `numpy.hypot()`.
`ufuncs.lcm` | xarray specific variant of `numpy.lcm()`.
`ufuncs.ldexp` | xarray specific variant of `numpy.ldexp()`.
`ufuncs.left_shift` | xarray specific variant of `numpy.left_shift()`.
`ufuncs.less` | xarray specific variant of `numpy.less()`.
`ufuncs.less_equal` | xarray specific variant of `numpy.less_equal()`.
`ufuncs.logaddexp` | xarray specific variant of `numpy.logaddexp()`.
`ufuncs.logaddexp2` | xarray specific variant of `numpy.logaddexp2()`.
`ufuncs.logical_and` | xarray specific variant of `numpy.logical_and()`.
`ufuncs.logical_or` | xarray specific variant of `numpy.logical_or()`.
`ufuncs.logical_xor` | xarray specific variant of `numpy.logical_xor()`.
`ufuncs.maximum` | xarray specific variant of `numpy.maximum()`.
`ufuncs.minimum` | xarray specific variant of `numpy.minimum()`.
`ufuncs.mod` | xarray specific variant of `numpy.mod()`.
`ufuncs.multiply` | xarray specific variant of `numpy.multiply()`.
`ufuncs.nextafter` | xarray specific variant of `numpy.nextafter()`.
`ufuncs.not_equal` | xarray specific variant of `numpy.not_equal()`.
`ufuncs.pow` | xarray specific variant of `numpy.pow()`.
`ufuncs.power` | xarray specific variant of `numpy.power()`.
`ufuncs.remainder` | xarray specific variant of `numpy.remainder()`.
`ufuncs.right_shift` | xarray specific variant of `numpy.right_shift()`.
`ufuncs.subtract` | xarray specific variant of `numpy.subtract()`.
`ufuncs.true_divide` | xarray specific variant of `numpy.true_divide()`.
`ufuncs.angle` | xarray specific variant of `numpy.angle()`.
`ufuncs.isreal` | xarray specific variant of `numpy.isreal()`.
`ufuncs.iscomplex` | xarray specific variant of `numpy.iscomplex()`.
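For NumPy-backed arrays, plain `numpy` functions already work on xarray objects; the `xarray.ufuncs` variants matter when the underlying data lives in another backend. A minimal sketch:

```python
import numpy as np
import xarray as xr
import xarray.ufuncs as xu

da = xr.DataArray(np.linspace(0, np.pi, 5), dims="x")

xu.sin(da)    # like np.sin, but dispatches to the array's own backend
xu.isnan(da)  # elementwise, preserving dims, coords and attrs
```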
## IO / Conversion

### Dataset methods

`load_dataset`(filename_or_obj, **kwargs) | Open, load into memory, and close a Dataset from a file or file-like object.
---|---
`open_dataset`(filename_or_obj, *[, engine, ...]) | Open and decode a dataset from a file or file-like object.
`open_mfdataset`(paths[, chunks, concat_dim, ...]) | Open multiple files as a single dataset.
`open_zarr`(store[, group, synchronizer, ...]) | Load and decode a dataset from a Zarr store.
`save_mfdataset`(datasets, paths[, mode, ...]) | Write multiple datasets to disk as netCDF files simultaneously.
`Dataset.as_numpy`() | Coerces wrapped data and coordinates into numpy arrays, returning a Dataset.
`Dataset.from_dataframe`(dataframe[, sparse]) | Convert a pandas.DataFrame into an xarray.Dataset.
`Dataset.from_dict`(d) | Convert a dictionary into an xarray.Dataset.
`Dataset.to_dataarray`([dim, name]) | Convert this dataset into an xarray.DataArray.
`Dataset.to_dataframe`([dim_order]) | Convert this dataset into a pandas.DataFrame.
`Dataset.to_dask_dataframe`([dim_order, set_index]) | Convert this dataset into a dask.dataframe.DataFrame.
`Dataset.to_dict`([data, encoding]) | Convert this dataset to a dictionary following xarray naming conventions.
`Dataset.to_netcdf`([path, mode, format, ...]) | Write dataset contents to a netCDF file.
`Dataset.to_pandas`() | Convert this dataset into a pandas object without changing the number of dimensions.
`Dataset.to_zarr`([store, chunk_store, mode, ...]) | Write dataset contents to a zarr group.
`Dataset.chunk`([chunks, name_prefix, token, ...]) | Coerce all arrays in this dataset into dask arrays with the given chunks.
`Dataset.close`() | Release any resources linked to this object.
`Dataset.compute`(**kwargs) | Manually trigger loading and/or computation of this dataset's data from disk or a remote source into memory and return a new dataset.
`Dataset.filter_by_attrs`(**kwargs) | Returns a `Dataset` with variables that match specific conditions.
`Dataset.info`([buf]) | Concise summary of a Dataset variables and attributes.
`Dataset.load`(**kwargs) | Manually trigger loading and/or computation of this dataset's data from disk or a remote source into memory and return this dataset.
`Dataset.persist`(**kwargs) | Trigger computation, keeping data as chunked arrays.
`Dataset.unify_chunks`() | Unify chunk size along all chunked dimensions of this Dataset.

### DataArray methods

`load_dataarray`(filename_or_obj, **kwargs) | Open, load into memory, and close a DataArray from a file or file-like object containing a single data variable.
---|---
`open_dataarray`(filename_or_obj, *[, engine, ...]) | Open a DataArray from a file or file-like object containing a single data variable.
`DataArray.as_numpy`() | Coerces wrapped data and coordinates into numpy arrays, returning a DataArray.
`DataArray.from_dict`(d) | Convert a dictionary into an xarray.DataArray.
`DataArray.from_iris`(cube) | Convert an iris.cube.Cube into an xarray.DataArray.
`DataArray.from_series`(series[, sparse]) | Convert a pandas.Series into an xarray.DataArray.
`DataArray.to_dask_dataframe`([dim_order, ...]) | Convert this array into a dask.dataframe.DataFrame.
`DataArray.to_dataframe`([name, dim_order]) | Convert this array and its coordinates into a tidy pandas.DataFrame.
`DataArray.to_dataset`([dim, name, promote_attrs]) | Convert a DataArray to a Dataset.
`DataArray.to_dict`([data, encoding]) | Convert this xarray.DataArray into a dictionary following xarray naming conventions.
`DataArray.to_index`() | Convert this variable to a pandas.Index.
`DataArray.to_iris`() | Convert this array into an iris.cube.Cube.
`DataArray.to_masked_array`([copy]) | Convert this array into a numpy.ma.MaskedArray.
`DataArray.to_netcdf`([path, mode, format, ...]) | Write DataArray contents to a netCDF file.
`DataArray.to_numpy`() | Coerces wrapped data to numpy and returns a numpy.ndarray.
`DataArray.to_pandas`() | Convert this array into a pandas object with the same shape.
`DataArray.to_series`() | Convert this array into a pandas.Series.
`DataArray.to_zarr`([store, chunk_store, ...]) | Write DataArray contents to a Zarr store.
`DataArray.chunk`([chunks, name_prefix, ...]) | Coerce this array's data into a dask array with the given chunks.
`DataArray.close`() | Release any resources linked to this object.
`DataArray.compute`(**kwargs) | Manually trigger loading of this array's data from disk or a remote source into memory and return a new array.
`DataArray.persist`(**kwargs) | Trigger computation in constituent dask arrays.
`DataArray.load`(**kwargs) | Manually trigger loading of this array's data from disk or a remote source into memory and return this array.
`DataArray.unify_chunks`() | Unify chunk size along all chunked dimensions of this DataArray.
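A minimal round-trip sketch of the conversion and serialization methods above; the filename is a throwaway placeholder, and writing netCDF assumes a backend such as `netcdf4` or `h5netcdf` is installed:

```python
import numpy as np
import xarray as xr

ds = xr.Dataset({"foo": ("x", np.arange(4))})

# On-disk round trip (requires a netCDF backend):
ds.to_netcdf("example.nc")
reopened = xr.open_dataset("example.nc")

# In-memory conversions:
df = ds.to_dataframe()               # pandas.DataFrame
ds2 = xr.Dataset.from_dataframe(df)  # and back again
```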
### DataTree methods

`open_datatree`(filename_or_obj, *[, engine, ...]) | Open and decode a DataTree from a file or file-like object, creating one tree node for each group in the file.
---|---
`open_groups`(filename_or_obj, *[, engine, ...]) | Open and decode a file or file-like object, creating a dictionary containing one xarray Dataset for each group in the file.
`DataTree.to_dict`([relative]) | Create a dictionary mapping of paths to the data contained in those nodes.
`DataTree.to_netcdf`(filepath[, mode, ...]) | Write datatree contents to a netCDF file.
`DataTree.to_zarr`(store[, mode, encoding, ...]) | Write datatree contents to a Zarr store.
`DataTree.chunk`([chunks, name_prefix, token, ...]) | Coerce all arrays in all groups in this tree into dask arrays with the given chunks.
`DataTree.load`(**kwargs) | Manually trigger loading and/or computation of this datatree's data from disk or a remote source into memory and return this datatree.
`DataTree.compute`(**kwargs) | Manually trigger loading and/or computation of this datatree's data from disk or a remote source into memory and return a new datatree.
`DataTree.persist`(**kwargs) | Trigger computation, keeping data as chunked arrays.

## Encoding/Decoding

### Coder objects

`coders.CFDatetimeCoder`([use_cftime, time_unit]) | Coder for CF Datetime coding.
---|---

## Plotting

### Dataset

`Dataset.plot.scatter`(*args[, x, y, z, hue, ...]) | Scatter variables against each other.
---|---
`Dataset.plot.quiver`(*args[, x, y, u, v, ...]) | Quiver plot of Dataset variables.
`Dataset.plot.streamplot`(*args[, x, y, u, v, ...]) | Plot streamlines of Dataset variables.

### DataArray

`DataArray.plot`(*[, row, col, col_wrap, ax, ...]) | Default plot of DataArray using `matplotlib.pyplot`.
---|---

`DataArray.plot.contourf`(*args[, x, y, ...]) | Filled contour plot of 2D DataArray.
---|---
`DataArray.plot.contour`(*args[, x, y, ...]) | Contour plot of 2D DataArray.
`DataArray.plot.hist`(*args[, figsize, size, ...]) | Histogram of DataArray.
`DataArray.plot.imshow`(*args[, x, y, ...]) | Image plot of 2D DataArray.
`DataArray.plot.line`(*args[, row, col, ...]) | Line plot of DataArray values.
`DataArray.plot.pcolormesh`(*args[, x, y, ...]) | Pseudocolor plot of 2D DataArray.
`DataArray.plot.step`(*args[, where, ...]) | Step plot of DataArray values.
`DataArray.plot.scatter`(*args[, x, y, z, ...]) | Scatter variables against each other.
`DataArray.plot.surface`(*args[, x, y, ...]) | Surface plot of 2D DataArray.

### Faceting

`plot.FacetGrid`(data[, col, row, col_wrap, ...]) | Initialize the Matplotlib figure and FacetGrid object.
---|---
`plot.FacetGrid.add_colorbar`(**kwargs) | Draw a colorbar.
`plot.FacetGrid.add_legend`(*[, label, ...]) |
`plot.FacetGrid.add_quiverkey`(u, v, **kwargs) |
`plot.FacetGrid.map`(func, *args, **kwargs) | Apply a plotting function to each facet's subset of the data.
`plot.FacetGrid.map_dataarray`(func, x, y, ...) | Apply a plotting function to a 2d facet's subset of the data.
`plot.FacetGrid.map_dataarray_line`(func, x, ...) |
`plot.FacetGrid.map_dataset`(func[, x, y, ...]) |
`plot.FacetGrid.map_plot1d`(func, x, y, *[, ...]) | Apply a plotting function to a 1d facet's subset of the data.
`plot.FacetGrid.set_axis_labels`(*axlabels) | Set axis labels on the left column and bottom row of the grid.
`plot.FacetGrid.set_ticks`([max_xticks, ...]) | Set and control tick behavior.
`plot.FacetGrid.set_titles`([template, ...]) | Draw titles either above each facet or on the grid margins.
`plot.FacetGrid.set_xlabels`([label]) | Label the x axis on the bottom row of the grid.
`plot.FacetGrid.set_ylabels`([label]) | Label the y axis on the left column of the grid.
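A minimal plotting sketch (requires matplotlib; the data is invented for illustration):

```python
import numpy as np
import xarray as xr

da = xr.DataArray(
    np.random.randn(3, 4),
    coords={"lat": [10, 20, 30], "lon": [0, 90, 180, 270]},
    dims=("lat", "lon"),
)

da.plot()                   # 2D data defaults to a pcolormesh
da.plot.line(x="lon")       # one line per value of the other dimension
da.isel(lat=0).plot.hist()  # histogram of a 1D slice
```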
## GroupBy objects

### Dataset

`DatasetGroupBy`(obj, groupers[, ...]) |
---|---
`DatasetGroupBy.map`(func[, args, shortcut]) | Apply a function to each Dataset in the group and concatenate them together into a new Dataset.
`DatasetGroupBy.reduce`(func[, dim, axis, ...]) | Reduce the items in this group by applying func along some dimension(s).
`DatasetGroupBy.assign`(**kwargs) | Assign data variables by group.
`DatasetGroupBy.assign_coords`([coords]) | Assign coordinates by group.
`DatasetGroupBy.first`([skipna, keep_attrs]) | Return the first element of each group along the group dimension
`DatasetGroupBy.last`([skipna, keep_attrs]) | Return the last element of each group along the group dimension
`DatasetGroupBy.fillna`(value) | Fill missing values in this object by group.
`DatasetGroupBy.quantile`(q[, dim, method, ...]) | Compute the qth quantile over each array in the groups and concatenate them together into a new array.
`DatasetGroupBy.where`(cond[, other]) | Return elements from self or other depending on cond.
`DatasetGroupBy.all`([dim, keep_attrs]) | Reduce this Dataset's data by applying `all` along some dimension(s).
`DatasetGroupBy.any`([dim, keep_attrs]) | Reduce this Dataset's data by applying `any` along some dimension(s).
`DatasetGroupBy.count`([dim, keep_attrs]) | Reduce this Dataset's data by applying `count` along some dimension(s).
`DatasetGroupBy.cumsum`([dim, skipna, keep_attrs]) | Reduce this Dataset's data by applying `cumsum` along some dimension(s).
`DatasetGroupBy.cumprod`([dim, skipna, keep_attrs]) | Reduce this Dataset's data by applying `cumprod` along some dimension(s).
`DatasetGroupBy.max`([dim, skipna, keep_attrs]) | Reduce this Dataset's data by applying `max` along some dimension(s).
`DatasetGroupBy.mean`([dim, skipna, keep_attrs]) | Reduce this Dataset's data by applying `mean` along some dimension(s).
`DatasetGroupBy.median`([dim, skipna, keep_attrs]) | Reduce this Dataset's data by applying `median` along some dimension(s).
`DatasetGroupBy.min`([dim, skipna, keep_attrs]) | Reduce this Dataset's data by applying `min` along some dimension(s).
`DatasetGroupBy.prod`([dim, skipna, ...]) | Reduce this Dataset's data by applying `prod` along some dimension(s).
`DatasetGroupBy.std`([dim, skipna, ddof, ...]) | Reduce this Dataset's data by applying `std` along some dimension(s).
`DatasetGroupBy.sum`([dim, skipna, min_count, ...]) | Reduce this Dataset's data by applying `sum` along some dimension(s).
`DatasetGroupBy.var`([dim, skipna, ddof, ...]) | Reduce this Dataset's data by applying `var` along some dimension(s).
`DatasetGroupBy.dims` |
`DatasetGroupBy.groups` | Mapping from group labels to indices.
`DatasetGroupBy.shuffle_to_chunks`([chunks]) | Sort or "shuffle" the underlying object.
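A minimal groupby sketch (the data and grouping are invented for illustration):

```python
import numpy as np
import pandas as pd
import xarray as xr

ds = xr.Dataset(
    {"tmin": ("time", np.random.randn(365))},
    coords={"time": pd.date_range("2000-01-01", periods=365)},
)

ds.groupby("time.season").mean()                      # one value per season
ds.groupby("time.month").map(lambda g: g - g.mean())  # per-group anomalies
```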
### DataArray

`DataArrayGroupBy`(obj, groupers[, ...]) |
---|---
`DataArrayGroupBy.map`(func[, args, shortcut]) | Apply a function to each array in the group and concatenate them together into a new array.
`DataArrayGroupBy.reduce`(func[, dim, axis, ...]) | Reduce the items in this group by applying func along some dimension(s).
`DataArrayGroupBy.assign_coords`([coords]) | Assign coordinates by group.
`DataArrayGroupBy.first`([skipna, keep_attrs]) | Return the first element of each group along the group dimension
`DataArrayGroupBy.last`([skipna, keep_attrs]) | Return the last element of each group along the group dimension
`DataArrayGroupBy.fillna`(value) | Fill missing values in this object by group.
`DataArrayGroupBy.quantile`(q[, dim, method, ...]) | Compute the qth quantile over each array in the groups and concatenate them together into a new array.
`DataArrayGroupBy.where`(cond[, other]) | Return elements from self or other depending on cond.
`DataArrayGroupBy.all`([dim, keep_attrs]) | Reduce this DataArray's data by applying `all` along some dimension(s).
`DataArrayGroupBy.any`([dim, keep_attrs]) | Reduce this DataArray's data by applying `any` along some dimension(s).
`DataArrayGroupBy.count`([dim, keep_attrs]) | Reduce this DataArray's data by applying `count` along some dimension(s).
`DataArrayGroupBy.cumsum`([dim, skipna, ...]) | Reduce this DataArray's data by applying `cumsum` along some dimension(s).
`DataArrayGroupBy.cumprod`([dim, skipna, ...]) | Reduce this DataArray's data by applying `cumprod` along some dimension(s).
`DataArrayGroupBy.max`([dim, skipna, keep_attrs]) | Reduce this DataArray's data by applying `max` along some dimension(s).
`DataArrayGroupBy.mean`([dim, skipna, keep_attrs]) | Reduce this DataArray's data by applying `mean` along some dimension(s).
`DataArrayGroupBy.median`([dim, skipna, ...]) | Reduce this DataArray's data by applying `median` along some dimension(s).
`DataArrayGroupBy.min`([dim, skipna, keep_attrs]) | Reduce this DataArray's data by applying `min` along some dimension(s).
`DataArrayGroupBy.prod`([dim, skipna, ...]) | Reduce this DataArray's data by applying `prod` along some dimension(s).
`DataArrayGroupBy.std`([dim, skipna, ddof, ...]) | Reduce this DataArray's data by applying `std` along some dimension(s).
`DataArrayGroupBy.sum`([dim, skipna, ...]) | Reduce this DataArray's data by applying `sum` along some dimension(s).
`DataArrayGroupBy.var`([dim, skipna, ddof, ...]) | Reduce this DataArray's data by applying `var` along some dimension(s).
`DataArrayGroupBy.dims` |
`DataArrayGroupBy.groups` | Mapping from group labels to indices.
`DataArrayGroupBy.shuffle_to_chunks`([chunks]) | Sort or "shuffle" the underlying object.

### Grouper Objects

`groupers.BinGrouper`(bins[, right, labels, ...]) | Grouper object for binning numeric data.
---|---
`groupers.UniqueGrouper`([labels]) | Grouper object for grouping by a categorical variable.
`groupers.TimeResampler`(freq[, closed, ...]) | Grouper object specialized to resampling the time coordinate.
`groupers.SeasonGrouper`(seasons) | Allows grouping using a custom definition of seasons.
`groupers.SeasonResampler`(seasons, *[, ...]) | Allows grouping using a custom definition of seasons.

## Rolling objects

### Dataset

`DatasetRolling`(obj, windows[, min_periods, ...]) |
---|---
`DatasetRolling.construct`([window_dim, ...]) | Convert this rolling object to xr.Dataset, where the window dimension is stacked as a new dimension.
`DatasetRolling.reduce`(func[, keep_attrs, ...]) | Reduce the items in this group by applying func along some dimension(s).
`DatasetRolling.argmax`([keep_attrs]) | Reduce this object's data windows by applying argmax along its dimension.
`DatasetRolling.argmin`([keep_attrs]) | Reduce this object's data windows by applying argmin along its dimension.
`DatasetRolling.count`([keep_attrs]) | Reduce this object's data windows by applying count along its dimension.
`DatasetRolling.max`([keep_attrs]) | Reduce this object's data windows by applying max along its dimension.
`DatasetRolling.mean`([keep_attrs]) | Reduce this object's data windows by applying mean along its dimension.
`DatasetRolling.median`([keep_attrs]) | Reduce this object's data windows by applying median along its dimension.
`DatasetRolling.min`([keep_attrs]) | Reduce this object's data windows by applying min along its dimension.
`DatasetRolling.prod`([keep_attrs]) | Reduce this object's data windows by applying prod along its dimension.
`DatasetRolling.std`([keep_attrs]) | Reduce this object's data windows by applying std along its dimension.
`DatasetRolling.sum`([keep_attrs]) | Reduce this object's data windows by applying sum along its dimension.
`DatasetRolling.var`([keep_attrs]) | Reduce this object's data windows by applying var along its dimension.

### DataArray

`DataArrayRolling`(obj, windows[, ...]) |
---|---
`DataArrayRolling.__iter__`() |
`DataArrayRolling.construct`([window_dim, ...]) | Convert this rolling object to xr.DataArray, where the window dimension is stacked as a new dimension.
`DataArrayRolling.reduce`(func[, keep_attrs, ...]) | Reduce each window by applying func.
`DataArrayRolling.argmax`([keep_attrs]) | Reduce this object's data windows by applying argmax along its dimension.
`DataArrayRolling.argmin`([keep_attrs]) | Reduce this object's data windows by applying argmin along its dimension.
`DataArrayRolling.count`([keep_attrs]) | Reduce this object's data windows by applying count along its dimension.
`DataArrayRolling.max`([keep_attrs]) | Reduce this object's data windows by applying max along its dimension.
`DataArrayRolling.mean`([keep_attrs]) | Reduce this object's data windows by applying mean along its dimension.
`DataArrayRolling.median`([keep_attrs]) | Reduce this object's data windows by applying median along its dimension.
`DataArrayRolling.min`([keep_attrs]) | Reduce this object's data windows by applying min along its dimension.
`DataArrayRolling.prod`([keep_attrs]) | Reduce this object's data windows by applying prod along its dimension.
`DataArrayRolling.std`([keep_attrs]) | Reduce this object's data windows by applying std along its dimension.
`DataArrayRolling.sum`([keep_attrs]) | Reduce this object's data windows by applying sum along its dimension.
`DataArrayRolling.var`([keep_attrs]) | Reduce this object's data windows by applying var along its dimension.
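A minimal rolling-window sketch (the data is invented for illustration):

```python
import numpy as np
import xarray as xr

da = xr.DataArray(np.arange(10.0), dims="time")

da.rolling(time=3).mean()                 # NaN-padded at the window edges
da.rolling(time=3, center=True).sum()     # centered windows
da.rolling(time=3, min_periods=1).mean()  # emit values from the first point on
```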
## Coarsen objects

### Dataset

| Method | Description |
|---|---|
| `DatasetCoarsen`(obj, windows, boundary, side, ...) | |
| `DatasetCoarsen.all`([keep_attrs]) | Reduce this DatasetCoarsen's data by applying all along some dimension(s). |
| `DatasetCoarsen.any`([keep_attrs]) | Reduce this DatasetCoarsen's data by applying any along some dimension(s). |
| `DatasetCoarsen.construct`([window_dim, ...]) | Convert this Coarsen object to a DataArray or Dataset, where the coarsening dimension is split or reshaped to two new dimensions. |
| `DatasetCoarsen.count`([keep_attrs]) | Reduce this DatasetCoarsen's data by applying count along some dimension(s). |
| `DatasetCoarsen.max`([keep_attrs]) | Reduce this DatasetCoarsen's data by applying max along some dimension(s). |
| `DatasetCoarsen.mean`([keep_attrs]) | Reduce this DatasetCoarsen's data by applying mean along some dimension(s). |
| `DatasetCoarsen.median`([keep_attrs]) | Reduce this DatasetCoarsen's data by applying median along some dimension(s). |
| `DatasetCoarsen.min`([keep_attrs]) | Reduce this DatasetCoarsen's data by applying min along some dimension(s). |
| `DatasetCoarsen.prod`([keep_attrs]) | Reduce this DatasetCoarsen's data by applying prod along some dimension(s). |
| `DatasetCoarsen.reduce`(func[, keep_attrs]) | Reduce the items in this group by applying func along some dimension(s). |
| `DatasetCoarsen.std`([keep_attrs]) | Reduce this DatasetCoarsen's data by applying std along some dimension(s). |
| `DatasetCoarsen.sum`([keep_attrs]) | Reduce this DatasetCoarsen's data by applying sum along some dimension(s). |
| `DatasetCoarsen.var`([keep_attrs]) | Reduce this DatasetCoarsen's data by applying var along some dimension(s). |

### DataArray

| Method | Description |
|---|---|
| `DataArrayCoarsen`(obj, windows, boundary, ...) | |
| `DataArrayCoarsen.all`([keep_attrs]) | Reduce this DataArrayCoarsen's data by applying all along some dimension(s). |
| `DataArrayCoarsen.any`([keep_attrs]) | Reduce this DataArrayCoarsen's data by applying any along some dimension(s). |
| `DataArrayCoarsen.construct`([window_dim, ...]) | Convert this Coarsen object to a DataArray or Dataset, where the coarsening dimension is split or reshaped to two new dimensions. |
| `DataArrayCoarsen.count`([keep_attrs]) | Reduce this DataArrayCoarsen's data by applying count along some dimension(s). |
| `DataArrayCoarsen.max`([keep_attrs]) | Reduce this DataArrayCoarsen's data by applying max along some dimension(s). |
| `DataArrayCoarsen.mean`([keep_attrs]) | Reduce this DataArrayCoarsen's data by applying mean along some dimension(s). |
| `DataArrayCoarsen.median`([keep_attrs]) | Reduce this DataArrayCoarsen's data by applying median along some dimension(s). |
| `DataArrayCoarsen.min`([keep_attrs]) | Reduce this DataArrayCoarsen's data by applying min along some dimension(s). |
| `DataArrayCoarsen.prod`([keep_attrs]) | Reduce this DataArrayCoarsen's data by applying prod along some dimension(s). |
| `DataArrayCoarsen.reduce`(func[, keep_attrs]) | Reduce the items in this group by applying func along some dimension(s). |
| `DataArrayCoarsen.std`([keep_attrs]) | Reduce this DataArrayCoarsen's data by applying std along some dimension(s). |
| `DataArrayCoarsen.sum`([keep_attrs]) | Reduce this DataArrayCoarsen's data by applying sum along some dimension(s). |
| `DataArrayCoarsen.var`([keep_attrs]) | Reduce this DataArrayCoarsen's data by applying var along some dimension(s). |
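Coarsen objects are created with `Dataset.coarsen` / `DataArray.coarsen` and reduce non-overlapping blocks. A minimal sketch, with an arbitrarily chosen window size:

```python
import numpy as np
import xarray as xr

da = xr.DataArray(np.arange(12.0), dims="x")

# Downsample by averaging non-overlapping blocks of 3 along "x".
da.coarsen(x=3).mean()

# boundary="trim" drops the ragged edge when the length is not a multiple of 3.
da.isel(x=slice(0, 11)).coarsen(x=3, boundary="trim").sum()
```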
## Exponential rolling objects

| Method | Description |
|---|---|
| `RollingExp`(obj, windows[, window_type, ...]) | Exponentially-weighted moving window object. |
| `RollingExp.mean`([keep_attrs]) | Exponentially weighted moving average. |
| `RollingExp.sum`([keep_attrs]) | Exponentially weighted moving sum. |

## Weighted objects

### Dataset

| Method | Description |
|---|---|
| `DatasetWeighted`(obj, weights) | |
| `DatasetWeighted.mean`([dim, skipna, keep_attrs]) | Reduce this Dataset's data by a weighted `mean` along some dimension(s). |
| `DatasetWeighted.quantile`(q, *[, dim, ...]) | Apply a weighted `quantile` to this Dataset's data along some dimension(s). |
| `DatasetWeighted.sum`([dim, skipna, keep_attrs]) | Reduce this Dataset's data by a weighted `sum` along some dimension(s). |
| `DatasetWeighted.std`([dim, skipna, keep_attrs]) | Reduce this Dataset's data by a weighted `std` along some dimension(s). |
| `DatasetWeighted.var`([dim, skipna, keep_attrs]) | Reduce this Dataset's data by a weighted `var` along some dimension(s). |
| `DatasetWeighted.sum_of_weights`([dim, keep_attrs]) | Calculate the sum of weights, accounting for missing values in the data. |
| `DatasetWeighted.sum_of_squares`([dim, ...]) | Reduce this Dataset's data by a weighted `sum_of_squares` along some dimension(s). |

### DataArray

| Method | Description |
|---|---|
| `DataArrayWeighted`(obj, weights) | |
| `DataArrayWeighted.mean`([dim, skipna, keep_attrs]) | Reduce this DataArray's data by a weighted `mean` along some dimension(s). |
| `DataArrayWeighted.quantile`(q, *[, dim, ...]) | Apply a weighted `quantile` to this DataArray's data along some dimension(s). |
| `DataArrayWeighted.sum`([dim, skipna, keep_attrs]) | Reduce this DataArray's data by a weighted `sum` along some dimension(s). |
| `DataArrayWeighted.std`([dim, skipna, keep_attrs]) | Reduce this DataArray's data by a weighted `std` along some dimension(s). |
| `DataArrayWeighted.var`([dim, skipna, keep_attrs]) | Reduce this DataArray's data by a weighted `var` along some dimension(s). |
| `DataArrayWeighted.sum_of_weights`([dim, ...]) | Calculate the sum of weights, accounting for missing values in the data. |
| `DataArrayWeighted.sum_of_squares`([dim, ...]) | Reduce this DataArray's data by a weighted `sum_of_squares` along some dimension(s). |
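Weighted objects are created with `.weighted(weights)`. A minimal sketch, with made-up weights:

```python
import numpy as np
import xarray as xr

da = xr.DataArray([1.0, 2.0, 3.0, np.nan], dims="x")
weights = xr.DataArray([8, 1, 1, 10], dims="x")

weighted = da.weighted(weights)
weighted.mean()            # weighted mean, skipping the NaN by default
weighted.sum_of_weights()  # weights are only counted where data is non-missing
```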
## Resample objects

### Dataset

| Method | Description |
|---|---|
| `DatasetResample`(*args[, dim, resample_dim]) | DatasetGroupBy object specialized to resampling a specified dimension. |
| `DatasetResample.asfreq`() | Return values of original object at the new up-sampling frequency; essentially a re-index with new times set to NaN. |
| `DatasetResample.backfill`([tolerance]) | Backward fill new values at up-sampled frequency. |
| `DatasetResample.interpolate`([kind]) | Interpolate up-sampled data using the original data as knots. |
| `DatasetResample.nearest`([tolerance]) | Take new values from nearest original coordinate to up-sampled frequency coordinates. |
| `DatasetResample.pad`([tolerance]) | Forward fill new values at up-sampled frequency. |
| `DatasetResample.all`([dim, keep_attrs]) | Reduce this Dataset's data by applying `all` along some dimension(s). |
| `DatasetResample.any`([dim, keep_attrs]) | Reduce this Dataset's data by applying `any` along some dimension(s). |
| `DatasetResample.apply`(func[, args, shortcut]) | Backward compatible implementation of `map`. |
| `DatasetResample.assign`(**kwargs) | Assign data variables by group. |
| `DatasetResample.assign_coords`([coords]) | Assign coordinates by group. |
| `DatasetResample.bfill`([tolerance]) | Backward fill new values at up-sampled frequency. |
| `DatasetResample.count`([dim, keep_attrs]) | Reduce this Dataset's data by applying `count` along some dimension(s). |
| `DatasetResample.ffill`([tolerance]) | Forward fill new values at up-sampled frequency. |
| `DatasetResample.fillna`(value) | Fill missing values in this object by group. |
| `DatasetResample.first`([skipna, keep_attrs]) | Return the first element of each group along the group dimension. |
| `DatasetResample.last`([skipna, keep_attrs]) | Return the last element of each group along the group dimension. |
| `DatasetResample.map`(func[, args, shortcut]) | Apply a function over each Dataset in the groups generated for resampling and concatenate them together into a new Dataset. |
| `DatasetResample.max`([dim, skipna, keep_attrs]) | Reduce this Dataset's data by applying `max` along some dimension(s). |
| `DatasetResample.mean`([dim, skipna, keep_attrs]) | Reduce this Dataset's data by applying `mean` along some dimension(s). |
| `DatasetResample.median`([dim, skipna, keep_attrs]) | Reduce this Dataset's data by applying `median` along some dimension(s). |
| `DatasetResample.min`([dim, skipna, keep_attrs]) | Reduce this Dataset's data by applying `min` along some dimension(s). |
| `DatasetResample.prod`([dim, skipna, ...]) | Reduce this Dataset's data by applying `prod` along some dimension(s). |
| `DatasetResample.quantile`(q[, dim, method, ...]) | Compute the qth quantile over each array in the groups and concatenate them together into a new array. |
| `DatasetResample.reduce`(func[, dim, axis, ...]) | Reduce the items in this group by applying func along the pre-defined resampling dimension. |
| `DatasetResample.std`([dim, skipna, ddof, ...]) | Reduce this Dataset's data by applying `std` along some dimension(s). |
| `DatasetResample.sum`([dim, skipna, ...]) | Reduce this Dataset's data by applying `sum` along some dimension(s). |
| `DatasetResample.var`([dim, skipna, ddof, ...]) | Reduce this Dataset's data by applying `var` along some dimension(s). |
| `DatasetResample.where`(cond[, other]) | Return elements from self or other depending on cond. |
| `DatasetResample.dims` | |
| `DatasetResample.groups` | Mapping from group labels to indices. |

### DataArray

| Method | Description |
|---|---|
| `DataArrayResample`(*args[, dim, resample_dim]) | DataArrayGroupBy object specialized to time resampling operations over a specified dimension. |
| `DataArrayResample.asfreq`() | Return values of original object at the new up-sampling frequency; essentially a re-index with new times set to NaN. |
| `DataArrayResample.backfill`([tolerance]) | Backward fill new values at up-sampled frequency. |
| `DataArrayResample.interpolate`([kind]) | Interpolate up-sampled data using the original data as knots. |
| `DataArrayResample.nearest`([tolerance]) | Take new values from nearest original coordinate to up-sampled frequency coordinates. |
| `DataArrayResample.pad`([tolerance]) | Forward fill new values at up-sampled frequency. |
| `DataArrayResample.all`([dim, keep_attrs]) | Reduce this DataArray's data by applying `all` along some dimension(s). |
| `DataArrayResample.any`([dim, keep_attrs]) | Reduce this DataArray's data by applying `any` along some dimension(s). |
| `DataArrayResample.apply`(func[, args, shortcut]) | Backward compatible implementation of `map`. |
| `DataArrayResample.assign_coords`([coords]) | Assign coordinates by group. |
| `DataArrayResample.bfill`([tolerance]) | Backward fill new values at up-sampled frequency. |
| `DataArrayResample.count`([dim, keep_attrs]) | Reduce this DataArray's data by applying `count` along some dimension(s). |
| `DataArrayResample.ffill`([tolerance]) | Forward fill new values at up-sampled frequency. |
| `DataArrayResample.fillna`(value) | Fill missing values in this object by group. |
| `DataArrayResample.first`([skipna, keep_attrs]) | Return the first element of each group along the group dimension. |
| `DataArrayResample.last`([skipna, keep_attrs]) | Return the last element of each group along the group dimension. |
| `DataArrayResample.map`(func[, args, shortcut]) | Apply a function to each array in the group and concatenate them together into a new array. |
| `DataArrayResample.max`([dim, skipna, keep_attrs]) | Reduce this DataArray's data by applying `max` along some dimension(s). |
| `DataArrayResample.mean`([dim, skipna, keep_attrs]) | Reduce this DataArray's data by applying `mean` along some dimension(s). |
| `DataArrayResample.median`([dim, skipna, ...]) | Reduce this DataArray's data by applying `median` along some dimension(s). |
| `DataArrayResample.min`([dim, skipna, keep_attrs]) | Reduce this DataArray's data by applying `min` along some dimension(s). |
| `DataArrayResample.prod`([dim, skipna, ...]) | Reduce this DataArray's data by applying `prod` along some dimension(s). |
| `DataArrayResample.quantile`(q[, dim, method, ...]) | Compute the qth quantile over each array in the groups and concatenate them together into a new array. |
| `DataArrayResample.reduce`(func[, dim, axis, ...]) | Reduce the items in this group by applying func along the pre-defined resampling dimension. |
| `DataArrayResample.std`([dim, skipna, ddof, ...]) | Reduce this DataArray's data by applying `std` along some dimension(s). |
| `DataArrayResample.sum`([dim, skipna, ...]) | Reduce this DataArray's data by applying `sum` along some dimension(s). |
| `DataArrayResample.var`([dim, skipna, ddof, ...]) | Reduce this DataArray's data by applying `var` along some dimension(s). |
| `DataArrayResample.where`(cond[, other]) | Return elements from self or other depending on cond. |
| `DataArrayResample.dims` | |
| `DataArrayResample.groups` | Mapping from group labels to indices. |
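Resample objects are created with `.resample()` on a datetime dimension. A minimal sketch with synthetic daily data (frequency strings follow pandas conventions and may vary across versions):

```python
import numpy as np
import pandas as pd
import xarray as xr

times = pd.date_range("2024-01-01", periods=60, freq="D")
da = xr.DataArray(np.random.rand(60), coords={"time": times}, dims="time")

# Downsample daily values to month-start means ...
da.resample(time="MS").mean()

# ... or upsample to 12-hourly steps, forward-filling the new times.
da.resample(time="12h").ffill()
```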
## Accessors

| Class | Description |
|---|---|
| `accessor_dt.DatetimeAccessor`(obj) | Access datetime fields for DataArrays with datetime-like dtypes. |
| `accessor_dt.TimedeltaAccessor`(obj) | Access Timedelta fields for DataArrays with Timedelta-like dtypes. |
| `accessor_str.StringAccessor`(obj) | Vectorized string functions for string-like arrays. |
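These accessors are normally reached through the `.dt` and `.str` properties rather than instantiated directly. A minimal sketch, with illustrative data:

```python
import pandas as pd
import xarray as xr

times = xr.DataArray(pd.date_range("2024-01-01", periods=4, freq="D"), dims="time")
times.dt.dayofweek             # datetime fields via the dt accessor
times.dt.strftime("%Y-%m-%d")  # formatted timestamps

names = xr.DataArray(["north", "south"], dims="station")
names.str.upper()              # vectorized string methods via the str accessor
```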
## Custom Indexes

### Building custom indexes

These classes are building blocks for more complex Indexes:

| Class | Description |
|---|---|
| `indexes.CoordinateTransform`(coord_names, ...) | Abstract coordinate transform with dimension & coordinate names. |
| `indexes.CoordinateTransformIndex`(transform) | Helper class for creating Xarray indexes based on coordinate transforms. |
| `indexes.NDPointIndex`(tree_obj, *, ...) | Xarray index for irregular, n-dimensional data. |
| `indexes.TreeAdapter`(points, *, options) | Lightweight adapter abstract class for plugging in 3rd-party structures like `scipy.spatial.KDTree` or `sklearn.neighbors.KDTree` into `NDPointIndex`. |

The Index base class for building custom indexes:

| Method | Description |
|---|---|
| `Index.from_variables`(variables, *, options) | Create a new index object from one or more coordinate variables. |
| `Index.concat`(indexes, dim[, positions]) | Create a new index by concatenating one or more indexes of the same type. |
| `Index.stack`(variables, dim) | Create a new index by stacking coordinate variables into a single new dimension. |
| `Index.unstack`() | Unstack a (multi-)index into multiple (single) indexes. |
| `Index.create_variables`([variables]) | Maybe create new coordinate variables from this index. |
| `Index.should_add_coord_to_array`(name, var, dims) | Define whether or not an index coordinate variable should be added to a new DataArray. |
| `Index.to_pandas_index`() | Cast this xarray index to a pandas.Index object or raise a `TypeError` if this is not supported. |
| `Index.isel`(indexers) | Maybe returns a new index from the current index itself indexed by positional indexers. |
| `Index.sel`(labels) | Query the index with arbitrary coordinate label indexers. |
| `Index.join`(other[, how]) | Return a new index from the combination of this index with another index of the same type. |
| `Index.reindex_like`(other) | Query the index with another index of the same type. |
| `Index.equals`(other, **kwargs) | Compare this index with another index of the same type. |
| `Index.roll`(shifts) | Roll this index by an offset along one or more dimensions. |
| `Index.rename`(name_dict, dims_dict) | Maybe update the index with new coordinate and dimension names. |
| `Index.copy`([deep]) | Return a (deep) copy of this index. |

## Tutorial

| Function | Description |
|---|---|
| `tutorial.open_dataset`(name[, cache, ...]) | Open a dataset from the online repository (requires internet). |
| `tutorial.load_dataset`(*args, **kwargs) | Open, load into memory, and close a dataset from the online repository (requires internet). |
| `tutorial.open_datatree`(name[, cache, ...]) | Open a dataset as a DataTree from the online repository (requires internet). |
| `tutorial.load_datatree`(*args, **kwargs) | Open, load into memory (as a DataTree), and close a dataset from the online repository (requires internet). |

## Testing

| Function | Description |
|---|---|
| `testing.assert_equal`(a, b[, check_dim_order]) | Like `numpy.testing.assert_array_equal()`, but for xarray objects. |
| `testing.assert_identical`(a, b) | Like `xarray.testing.assert_equal()`, but also matches the objects' names and attributes. |
| `testing.assert_allclose`(a, b[, rtol, atol, ...]) | Like `numpy.testing.assert_allclose()`, but for xarray objects. |
| `testing.assert_chunks_equal`(a, b) | Assert that chunksizes along chunked dimensions are equal. |

Test that two `DataTree` objects are similar:

| Function | Description |
|---|---|
| `testing.assert_isomorphic`(a, b) | Two DataTrees are considered isomorphic if the set of paths to their descendent nodes are the same. |
| `testing.assert_equal`(a, b[, check_dim_order]) | Like `numpy.testing.assert_array_equal()`, but for xarray objects. |
| `testing.assert_identical`(a, b) | Like `xarray.testing.assert_equal()`, but also matches the objects' names and attributes. |

## Hypothesis Testing Strategies

See the documentation page on testing for a guide on how to use these strategies.

> **Warning:** These strategies should be considered highly experimental, and liable to change at any time.

| Strategy | Description |
|---|---|
| `testing.strategies.supported_dtypes`() | Generates only those numpy dtypes which xarray can handle. |
| `testing.strategies.names`() | Generates arbitrary string names for dimensions / variables. |
| `testing.strategies.dimension_names`(*[, ...]) | Generates an arbitrary list of valid dimension names. |
| `testing.strategies.dimension_sizes`(*[, ...]) | Generates an arbitrary mapping from dimension names to lengths. |
| `testing.strategies.attrs`() | Generates arbitrary valid attributes dictionaries for xarray objects. |
| `testing.strategies.variables`(*[, ...]) | Generates arbitrary xarray.Variable objects. |
| `testing.strategies.unique_subset_of`(objs, *) | Return a strategy which generates a unique subset of the given objects. |

## Exceptions

| Exception | Description |
|---|---|
| `AlignmentError` | Error class for alignment failures due to incompatible arguments. |
| `CoordinateValidationError` | Error class for Xarray coordinate validation failures. |
| `MergeError` | Error class for merge failures due to incompatible arguments. |
| `SerializationWarning` | Warnings about encoding/decoding issues in serialization. |

### DataTree

Exceptions raised when manipulating trees:

| Exception | Description |
|---|---|
| `TreeIsomorphismError` | Error raised if two tree objects do not share the same node structure. |
| `InvalidTreeError` | Raised when user attempts to create an invalid tree in some way. |
| `NotFoundInTreeError` | Raised when operation can't be completed because one node is not part of the expected tree. |
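These exceptions can be caught like any other Python exception; for example, a conflicting merge raises `MergeError`. A minimal sketch, with illustrative data:

```python
import xarray as xr

ds = xr.Dataset({"foo": ("x", [1, 2, 3])})
try:
    # Conflicting values for "foo" make merge raise rather than overwrite.
    xr.merge([ds, ds + 1])
except xr.MergeError as err:
    print(f"merge failed: {err}")
```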
## Advanced API

| Name | Description |
|---|---|
| `Coordinates`([coords, indexes]) | Dictionary-like container for Xarray coordinates (variables + indexes). |
| `Dataset.variables` | Low level interface to Dataset contents as dict of Variable objects. |
| `DataArray.variable` | Low level interface to the Variable object for this DataArray. |
| `DataTree.variables` | Low level interface to node contents as dict of Variable objects. |
| `Variable`(dims, data[, attrs, encoding, fastpath]) | A netcdf-like variable consisting of dimensions, data and attributes which describe a single Array. |
| `IndexVariable`(dims, data[, attrs, encoding, ...]) | Wrapper for accommodating a pandas.Index in an xarray.Variable. |
| `as_variable`(obj[, name, auto_convert]) | Convert an object into a Variable. |
| `Index`() | Base class inherited by all xarray-compatible indexes. |
| `IndexSelResult`(dim_indexers[, indexes, ...]) | Index query results. |
| `Context`(func) | Object carrying the information of a call. |
| `register_dataset_accessor`(name) | Register a custom property on xarray.Dataset objects. |
| `register_dataarray_accessor`(name) | Register a custom accessor on xarray.DataArray objects. |
| `register_datatree_accessor`(name) | Register a custom accessor on DataTree objects. |
| `Dataset.set_close`(close) | Register the function that releases any resources linked to this object. |
| `backends.BackendArray`() | |
| `backends.BackendEntrypoint`() | `BackendEntrypoint` is a class container and it is the main interface for the backend plugins, see BackendEntrypoint subclassing. |
| `backends.list_engines`() | Return a dictionary of available engines and their BackendEntrypoint objects. |
| `backends.refresh_engines`() | Refreshes the backend engines based on installed packages. |

These backends provide a low-level interface for lazily loading data from external file-formats or protocols, and can be manually invoked to create arguments for the `load_store` and `dump_to_store` Dataset methods:

| Class | Description |
|---|---|
| `backends.NetCDF4DataStore`(manager[, group, ...]) | Store for reading and writing data via the Python-NetCDF4 library. |
| `backends.H5NetCDFStore`(manager[, group, ...]) | Store for reading and writing data via h5netcdf. |
| `backends.PydapDataStore`(dataset[, group]) | Store for accessing OpenDAP datasets with pydap. |
| `backends.ScipyDataStore`(filename_or_obj[, ...]) | Store for reading and writing data via scipy.io.netcdf. |
| `backends.ZarrStore`(zarr_group[, mode, ...]) | Store for reading and writing data via zarr. |
| `backends.FileManager`() | Manager for acquiring and closing a file object. |
| `backends.CachingFileManager`(opener, *args[, ...]) | Wrapper for automatically opening and closing file objects. |
| `backends.DummyFileManager`(value) | FileManager that simply wraps an open file in the FileManager interface. |

These BackendEntrypoints provide a basic interface to the most commonly used filetypes in the xarray universe:

| Class | Description |
|---|---|
| `backends.NetCDF4BackendEntrypoint`() | Backend for netCDF files based on the netCDF4 package. |
| `backends.H5netcdfBackendEntrypoint`() | Backend for netCDF files based on the h5netcdf package. |
| `backends.PydapBackendEntrypoint`() | Backend for streaming datasets over the internet using the Data Access Protocol, also known as DODS or OPeNDAP, based on the pydap package. |
| `backends.ScipyBackendEntrypoint`() | Backend for netCDF files based on the scipy package. |
| `backends.StoreBackendEntrypoint`() | |
| `backends.ZarrBackendEntrypoint`() | Backend for ".zarr" files based on the zarr package. |

## Deprecated / Pending Deprecation

| Method | Description |
|---|---|
| `Dataset.drop`([labels, dim, errors]) | Backward compatible method based on drop_vars and drop_sel. |
| `DataArray.drop`([labels, dim, errors]) | Backward compatible method based on drop_vars and drop_sel. |
| `Dataset.apply`(func[, keep_attrs, args]) | Backward compatible implementation of `map`. |
| `core.groupby.DataArrayGroupBy.apply`(func[, ...]) | Backward compatible implementation of `map`. |
| `core.groupby.DatasetGroupBy.apply`(func[, ...]) | Backward compatible implementation of `map`. |

| Attribute | Description |
|---|---|
| `DataArray.dt.weekofyear` | The week ordinal of the year. |
| `DataArray.dt.week` | The week ordinal of the year. |
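Since `apply` is kept only for backward compatibility, new code should call `map` instead. A minimal sketch, with illustrative data:

```python
import xarray as xr

ds = xr.Dataset({"foo": ("x", [1.0, -2.0, 3.0])})

# Preferred: Dataset.map applies a function over each data variable.
ds.map(abs)

# Deprecated spelling of the same operation:
# ds.apply(abs)
```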