# Automatic Mixed Precision package - torch.cuda.amp

`torch.cuda.amp` provides convenience methods for mixed precision, where some operations use the `torch.float32` (`float`) datatype and other operations use `torch.float16` (`half`). Some ops, like linear layers and convolutions, are much faster in `float16`. Other ops, like reductions, often require the dynamic range of `float32`. Mixed precision tries to match each op to its appropriate datatype.

Ordinarily, "automatic mixed precision training" uses `torch.cuda.amp.autocast` and `torch.cuda.amp.GradScaler` together, as shown in the [Automatic Mixed Precision examples](https://pytorch.org/docs/1.8.0/notes/amp_examples.html#amp-examples) and [Automatic Mixed Precision recipe](https://pytorch.org/tutorials/recipes/recipes/amp_recipe.html). However, `autocast` and `GradScaler` are modular, and may be used separately if desired.

* Autocasting
* Gradient Scaling
* Autocast Op Reference
  * Op Eligibility
  * Op-Specific Behavior
    * Ops that can autocast to `float16`
    * Ops that can autocast to `float32`
    * Ops that promote to the widest input type
    * Prefer `binary_cross_entropy_with_logits` over `binary_cross_entropy`

## Autocasting

`class torch.cuda.amp.autocast(enabled=True)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/cuda/amp/autocast_mode.html#autocast)

Instances of `autocast` serve as context managers or decorators that allow regions of your script to run in mixed precision.

In these regions, CUDA ops run in an op-specific dtype chosen by autocast to improve performance while maintaining accuracy. See the Autocast Op Reference for details.

When entering an autocast-enabled region, Tensors may be any type. You should not call `.half()` on your model(s) or inputs when using autocasting.

`autocast` should wrap only the forward pass(es) of your network, including the loss computation(s). Backward passes under autocast are not recommended. Backward ops run in the same type that autocast used for the corresponding forward ops.

Example:

```python
# Creates model and optimizer in default precision
model = Net().cuda()
optimizer = optim.SGD(model.parameters(), ...)

for input, target in data:
    optimizer.zero_grad()

    # Enables autocasting for the forward pass (model + loss)
    with autocast():
        output = model(input)
        loss = loss_fn(output, target)

    # Exits the context manager before backward()
    loss.backward()
    optimizer.step()
```

See the [Automatic Mixed Precision examples](https://pytorch.org/docs/1.8.0/notes/amp_examples.html#amp-examples) for usage (along with gradient scaling) in more complex scenarios (e.g., gradient penalty, multiple models/losses, custom autograd functions).

`autocast` can also be used as a decorator, e.g., on the `forward` method of your model:

```python
class AutocastModel(nn.Module):
    ...
    @autocast()
    def forward(self, input):
        ...
```

Floating-point Tensors produced in an autocast-enabled region may be `float16`. After returning to an autocast-disabled region, using them with floating-point Tensors of different dtypes may cause type mismatch errors. If so, cast the Tensor(s) produced in the autocast region back to `float32` (or another dtype if desired). If a Tensor from the autocast region is already `float32`, the cast is a no-op and incurs no additional overhead.
Example:

```python
# Creates some tensors in default dtype (here assumed to be float32)
a_float32 = torch.rand((8, 8), device="cuda")
b_float32 = torch.rand((8, 8), device="cuda")
c_float32 = torch.rand((8, 8), device="cuda")
d_float32 = torch.rand((8, 8), device="cuda")

with autocast():
    # torch.mm is on autocast's list of ops that should run in float16.
    # Inputs are float32, but the op runs in float16 and produces float16 output.
    # No manual casts are required.
    e_float16 = torch.mm(a_float32, b_float32)
    # Also handles mixed input types
    f_float16 = torch.mm(d_float32, e_float16)

# After exiting autocast, calls f_float16.float() to use with d_float32
g_float32 = torch.mm(d_float32, f_float16.float())
```

Type mismatch errors _in_ an autocast-enabled region are a bug; if this is what you observe, please file an issue.

`autocast(enabled=False)` subregions can be nested in autocast-enabled regions. Locally disabling autocast can be useful, for example, if you want to force a subregion to run in a particular `dtype`. Disabling autocast gives you explicit control over the execution type. In the subregion, inputs from the surrounding region should be cast to `dtype` before use:

```python
# Creates some tensors in default dtype (here assumed to be float32)
a_float32 = torch.rand((8, 8), device="cuda")
b_float32 = torch.rand((8, 8), device="cuda")
c_float32 = torch.rand((8, 8), device="cuda")
d_float32 = torch.rand((8, 8), device="cuda")

with autocast():
    e_float16 = torch.mm(a_float32, b_float32)

    with autocast(enabled=False):
        # Calls e_float16.float() to ensure float32 execution
        # (necessary because e_float16 was created in an autocasted region)
        f_float32 = torch.mm(c_float32, e_float16.float())

    # No manual casts are required when re-entering the autocast-enabled region.
    # torch.mm again runs in float16 and produces float16 output, regardless of input types.
    g_float16 = torch.mm(d_float32, f_float32)
```

The autocast state is thread-local. If you want it enabled in a new thread, the context manager or decorator must be invoked in that thread. This affects [`torch.nn.DataParallel`](generated/torch.nn.dataparallel#torch.nn.DataParallel) and [`torch.nn.parallel.DistributedDataParallel`](generated/torch.nn.parallel.distributeddataparallel#torch.nn.parallel.DistributedDataParallel) when used with more than one GPU per process (see [Working with Multiple GPUs](https://pytorch.org/docs/1.8.0/notes/amp_examples.html#amp-multigpu)).

Parameters

* **enabled** ([bool](https://docs.python.org/3/library/functions.html#bool), optional, default=True) – Whether autocasting should be enabled in the region.

`torch.cuda.amp.custom_fwd(fwd=None, **kwargs)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/cuda/amp/autocast_mode.html#custom_fwd)

Helper decorator for `forward` methods of custom autograd functions (subclasses of [`torch.autograd.Function`](autograd#torch.autograd.Function)). See the [example page](https://pytorch.org/docs/1.8.0/notes/amp_examples.html#amp-custom-examples) for more detail.

Parameters

* **cast_inputs** (`torch.dtype` or None, optional, default=None) – If not `None`, when `forward` runs in an autocast-enabled region, casts incoming floating-point CUDA Tensors to the target dtype (non-floating-point Tensors are not affected), then executes `forward` with autocast disabled. If `None`, `forward`'s internal ops execute with the current autocast state.
Note

If the decorated `forward` is called outside an autocast-enabled region, `custom_fwd` is a no-op and `cast_inputs` has no effect.

`torch.cuda.amp.custom_bwd(bwd)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/cuda/amp/autocast_mode.html#custom_bwd)

Helper decorator for backward methods of custom autograd functions (subclasses of [`torch.autograd.Function`](autograd#torch.autograd.Function)). Ensures that `backward` executes with the same autocast state as `forward`. See the [example page](https://pytorch.org/docs/1.8.0/notes/amp_examples.html#amp-custom-examples) for more detail.

## Gradient Scaling

If the forward pass for a particular op has `float16` inputs, the backward pass for that op will produce `float16` gradients. Gradient values with small magnitudes may not be representable in `float16`. These values will flush to zero ("underflow"), so the update for the corresponding parameters will be lost.

To prevent underflow, "gradient scaling" multiplies the network's loss(es) by a scale factor and invokes a backward pass on the scaled loss(es). Gradients flowing backward through the network are then scaled by the same factor. In other words, gradient values have a larger magnitude, so they don't flush to zero.

Each parameter's gradient (`.grad` attribute) should be unscaled before the optimizer updates the parameters, so the scale factor does not interfere with the learning rate.

`class torch.cuda.amp.GradScaler(init_scale=65536.0, growth_factor=2.0, backoff_factor=0.5, growth_interval=2000, enabled=True)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/cuda/amp/grad_scaler.html#GradScaler)

`get_backoff_factor()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/cuda/amp/grad_scaler.html#GradScaler.get_backoff_factor)

Returns a Python float containing the scale backoff factor.

`get_growth_factor()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/cuda/amp/grad_scaler.html#GradScaler.get_growth_factor)

Returns a Python float containing the scale growth factor.

`get_growth_interval()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/cuda/amp/grad_scaler.html#GradScaler.get_growth_interval)

Returns a Python int containing the growth interval.

`get_scale()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/cuda/amp/grad_scaler.html#GradScaler.get_scale)

Returns a Python float containing the current scale, or 1.0 if scaling is disabled.

Warning

`get_scale()` incurs a CPU-GPU sync.

`is_enabled()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/cuda/amp/grad_scaler.html#GradScaler.is_enabled)

Returns a bool indicating whether this instance is enabled.

`load_state_dict(state_dict)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/cuda/amp/grad_scaler.html#GradScaler.load_state_dict)

Loads the scaler state. If this instance is disabled, `load_state_dict()` is a no-op.

Parameters

**state_dict** ([dict](https://docs.python.org/3/library/stdtypes.html#dict)) – scaler state. Should be an object returned from a call to `state_dict()`.

`scale(outputs)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/cuda/amp/grad_scaler.html#GradScaler.scale)

Multiplies ("scales") a tensor or list of tensors by the scale factor.

Returns scaled outputs. If this instance of `GradScaler` is not enabled, outputs are returned unmodified.

Parameters

**outputs** ([Tensor](tensors#torch.Tensor) or iterable of Tensors) – Outputs to scale.
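For orientation, a minimal sketch of how `scale()` combines with `step()` and `update()` in a typical training loop (see the Automatic Mixed Precision examples for complete recipes); `model`, `optimizer`, `loss_fn`, and `data` are assumed to be defined elsewhere:

```python
from torch.cuda.amp import autocast, GradScaler

scaler = GradScaler()  # created once, outside the training loop

for input, target in data:
    optimizer.zero_grad()

    # Forward pass (and loss) run under autocast
    with autocast():
        output = model(input)
        loss = loss_fn(output, target)

    # scale() multiplies the loss by the current scale factor,
    # so backward() produces scaled gradients
    scaler.scale(loss).backward()

    # step() unscales gradients, checks them for infs/NaNs,
    # and calls optimizer.step() only if they are finite
    scaler.step(optimizer)

    # update() adjusts the scale factor for the next iteration
    scaler.update()
```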
`set_backoff_factor(new_factor)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/cuda/amp/grad_scaler.html#GradScaler.set_backoff_factor)

Parameters

**new_factor** ([float](https://docs.python.org/3/library/functions.html#float)) – Value to use as the new scale backoff factor.

`set_growth_factor(new_factor)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/cuda/amp/grad_scaler.html#GradScaler.set_growth_factor)

Parameters

**new_factor** ([float](https://docs.python.org/3/library/functions.html#float)) – Value to use as the new scale growth factor.

`set_growth_interval(new_interval)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/cuda/amp/grad_scaler.html#GradScaler.set_growth_interval)

Parameters

**new_interval** ([int](https://docs.python.org/3/library/functions.html#int)) – Value to use as the new growth interval.

`state_dict()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/cuda/amp/grad_scaler.html#GradScaler.state_dict)

Returns the state of the scaler as a [`dict`](https://docs.python.org/3/library/stdtypes.html#dict). It contains five entries:

* `"scale"` - a Python float containing the current scale
* `"growth_factor"` - a Python float containing the current growth factor
* `"backoff_factor"` - a Python float containing the current backoff factor
* `"growth_interval"` - a Python int containing the current growth interval
* `"_growth_tracker"` - a Python int containing the number of recent consecutive unskipped steps

If this instance is not enabled, returns an empty dict.

Note

If you wish to checkpoint the scaler's state after a particular iteration, `state_dict()` should be called after `update()`.

`step(optimizer, *args, **kwargs)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/cuda/amp/grad_scaler.html#GradScaler.step)

`step()` carries out the following two operations:

1. Internally invokes `unscale_(optimizer)` (unless `unscale_()` was explicitly called for `optimizer` earlier in the iteration). As part of the `unscale_()`, gradients are checked for infs/NaNs.
2. If no inf/NaN gradients are found, invokes `optimizer.step()` using the unscaled gradients. Otherwise, `optimizer.step()` is skipped to avoid corrupting the params.

`*args` and `**kwargs` are forwarded to `optimizer.step()`.

Returns the return value of `optimizer.step(*args, **kwargs)`.

Parameters

* **optimizer** ([torch.optim.Optimizer](optim#torch.optim.Optimizer)) – Optimizer that applies the gradients.
* **args** – Any arguments.
* **kwargs** – Any keyword arguments.

Warning

Closure use is not currently supported.

`unscale_(optimizer)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/cuda/amp/grad_scaler.html#GradScaler.unscale_)

Divides ("unscales") the optimizer's gradient tensors by the scale factor.

`unscale_()` is optional, serving cases where you need to [modify or inspect gradients](https://pytorch.org/docs/1.8.0/notes/amp_examples.html#working-with-unscaled-gradients) between the backward pass(es) and `step()`. If `unscale_()` is not called explicitly, gradients will be unscaled automatically during `step()`.

Simple example, using `unscale_()` to enable clipping of unscaled gradients:
```python
...
scaler.scale(loss).backward()
scaler.unscale_(optimizer)
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm)
scaler.step(optimizer)
scaler.update()
```

Parameters

**optimizer** ([torch.optim.Optimizer](optim#torch.optim.Optimizer)) – Optimizer that owns the gradients to be unscaled.

Note

`unscale_()` does not incur a CPU-GPU sync.

Warning

`unscale_()` should only be called once per optimizer per `step()` call, and only after all gradients for that optimizer's assigned parameters have been accumulated. Calling `unscale_()` twice for a given optimizer between each `step()` triggers a RuntimeError.

Warning

`unscale_()` may unscale sparse gradients out of place, replacing the `.grad` attribute.

`update(new_scale=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/cuda/amp/grad_scaler.html#GradScaler.update)

Updates the scale factor.

If any optimizer steps were skipped, the scale is multiplied by `backoff_factor` to reduce it. If `growth_interval` unskipped iterations occurred consecutively, the scale is multiplied by `growth_factor` to increase it.

Passing `new_scale` sets the scale directly.

Parameters

**new_scale** (float or `torch.cuda.FloatTensor`, optional, default=None) – New scale factor.

Warning

`update()` should only be called at the end of the iteration, after `scaler.step(optimizer)` has been invoked for all optimizers used this iteration.

## Autocast Op Reference

### Op Eligibility

Only CUDA ops are eligible for autocasting.

Ops that run in `float64` or non-floating-point dtypes are not eligible, and will run in these types whether or not autocast is enabled.

Only out-of-place ops and Tensor methods are eligible. In-place variants and calls that explicitly supply an `out=...` Tensor are allowed in autocast-enabled regions, but won't go through autocasting. For example, in an autocast-enabled region `a.addmm(b, c)` can autocast, but `a.addmm_(b, c)` and `a.addmm(b, c, out=d)` cannot. For best performance and stability, prefer out-of-place ops in autocast-enabled regions.

Ops called with an explicit `dtype=...` argument are not eligible, and will produce output that respects the `dtype` argument.

### Op-Specific Behavior

The following lists describe the behavior of eligible ops in autocast-enabled regions. These ops always go through autocasting whether they are invoked as part of a [`torch.nn.Module`](generated/torch.nn.module#torch.nn.Module), as a function, or as a [`torch.Tensor`](tensors#torch.Tensor) method. If functions are exposed in multiple namespaces, they go through autocasting regardless of the namespace.

Ops not listed below do not go through autocasting. They run in the type defined by their inputs. However, autocasting may still change the type in which unlisted ops run if they're downstream from autocasted ops.

If an op is unlisted, we assume it's numerically stable in `float16`. If you believe an unlisted op is numerically unstable in `float16`, please file an issue.
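Before the per-op lists, a small hedged sketch of how this behavior can be observed by inspecting output dtypes on a CUDA device; the expected dtypes are noted in comments, and the specific ops used (`torch.mm` and `sum`) appear in the `float16` and `float32` lists below:

```python
import torch
from torch.cuda.amp import autocast

a = torch.rand((4, 4), device="cuda")  # float32 inputs
b = torch.rand((4, 4), device="cuda")

with autocast():
    mm_out = torch.mm(a, b)   # mm is on the float16 list below
    sum_out = mm_out.sum()    # sum is on the float32 list below

    # Supplying out= keeps the call eligible to run, but it bypasses autocasting
    preallocated = torch.empty((4, 4), device="cuda")
    torch.mm(a, b, out=preallocated)

print(mm_out.dtype)        # torch.float16 (expected)
print(sum_out.dtype)       # torch.float32 (expected)
print(preallocated.dtype)  # torch.float32 (expected; not autocast)
```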
#### Ops that can autocast to `float16`

`__matmul__`, `addbmm`, `addmm`, `addmv`, `addr`, `baddbmm`, `bmm`, `chain_matmul`, `conv1d`, `conv2d`, `conv3d`, `conv_transpose1d`, `conv_transpose2d`, `conv_transpose3d`, `GRUCell`, `linear`, `LSTMCell`, `matmul`, `mm`, `mv`, `prelu`, `RNNCell`

#### Ops that can autocast to `float32`

`__pow__`, `__rdiv__`, `__rpow__`, `__rtruediv__`, `acos`, `asin`, `binary_cross_entropy_with_logits`, `cosh`, `cosine_embedding_loss`, `cdist`, `cosine_similarity`, `cross_entropy`, `cumprod`, `cumsum`, `dist`, `erfinv`, `exp`, `expm1`, `gelu`, `group_norm`, `hinge_embedding_loss`, `kl_div`, `l1_loss`, `layer_norm`, `log`, `log_softmax`, `log10`, `log1p`, `log2`, `margin_ranking_loss`, `mse_loss`, `multilabel_margin_loss`, `multi_margin_loss`, `nll_loss`, `norm`, `normalize`, `pdist`, `poisson_nll_loss`, `pow`, `prod`, `reciprocal`, `rsqrt`, `sinh`, `smooth_l1_loss`, `soft_margin_loss`, `softmax`, `softmin`, `softplus`, `sum`, `renorm`, `tan`, `triplet_margin_loss`

#### Ops that promote to the widest input type

These ops don't require a particular dtype for stability, but take multiple inputs and require that the inputs' dtypes match. If all of the inputs are `float16`, the op runs in `float16`. If any of the inputs is `float32`, autocast casts all inputs to `float32` and runs the op in `float32`.

`addcdiv`, `addcmul`, `atan2`, `bilinear`, `cat`, `cross`, `dot`, `equal`, `index_put`, `stack`, `tensordot`

Some ops not listed here (e.g., binary ops like `add`) natively promote inputs without autocasting's intervention. If inputs are a mixture of `float16` and `float32`, these ops run in `float32` and produce `float32` output, regardless of whether autocast is enabled.

#### Prefer `binary_cross_entropy_with_logits` over `binary_cross_entropy`

The backward passes of [`torch.nn.functional.binary_cross_entropy()`](nn.functional#torch.nn.functional.binary_cross_entropy) (and [`torch.nn.BCELoss`](generated/torch.nn.bceloss#torch.nn.BCELoss), which wraps it) can produce gradients that aren't representable in `float16`. In autocast-enabled regions, the forward input may be `float16`, which means the backward gradient must be representable in `float16` (autocasting `float16` forward inputs to `float32` doesn't help, because that cast must be reversed in backward). Therefore, `binary_cross_entropy` and `BCELoss` raise an error in autocast-enabled regions.

Many models use a sigmoid layer right before the binary cross entropy layer. In this case, combine the two layers using [`torch.nn.functional.binary_cross_entropy_with_logits()`](nn.functional#torch.nn.functional.binary_cross_entropy_with_logits) or [`torch.nn.BCEWithLogitsLoss`](generated/torch.nn.bcewithlogitsloss#torch.nn.BCEWithLogitsLoss). `binary_cross_entropy_with_logits` and `BCEWithLogitsLoss` are safe to autocast.

# Automatic differentiation package - torch.autograd

`torch.autograd` provides classes and functions implementing automatic differentiation of arbitrary scalar-valued functions. It requires minimal changes to the existing code - you only need to declare `Tensor`s for which gradients should be computed with the `requires_grad=True` keyword. As of now, we only support autograd for floating point `Tensor` types (half, float, double and bfloat16) and complex `Tensor` types (cfloat, cdouble).
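For example, a minimal sketch of declaring a tensor for gradient computation and backpropagating through a scalar result:

```python
import torch

x = torch.ones(2, 2, requires_grad=True)  # leaf tensor tracked by autograd
y = (x * 3).sum()                         # scalar result built from x

y.backward()   # computes dy/dx and accumulates it into x.grad
print(x.grad)  # tensor([[3., 3.], [3., 3.]])
```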
`torch.autograd.backward(tensors, grad_tensors=None, retain_graph=None, create_graph=False, grad_variables=None, inputs=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/autograd.html#backward)

Computes the sum of gradients of given tensors w.r.t. graph leaves.

The graph is differentiated using the chain rule. If any of `tensors` are non-scalar (i.e. their data has more than one element) and require gradient, then the Jacobian-vector product will be computed; in this case the function additionally requires specifying `grad_tensors`. It should be a sequence of matching length that contains the "vector" in the Jacobian-vector product, usually the gradient of the differentiated function w.r.t. the corresponding tensors (`None` is an acceptable value for all tensors that don't need gradient tensors).

This function accumulates gradients in the leaves - you might need to zero `.grad` attributes or set them to `None` before calling it. See Default gradient layouts for details on the memory layout of accumulated gradients.

Note

Using this method with `create_graph=True` will create a reference cycle between the parameter and its gradient which can cause a memory leak. We recommend using `autograd.grad` when creating the graph to avoid this. If you have to use this function, make sure to reset the `.grad` fields of your parameters to `None` after use to break the cycle and avoid the leak.

Note

If you run any forward ops, create `grad_tensors`, and/or call `backward` in a user-specified CUDA stream context, see [Stream semantics of backward passes](https://pytorch.org/docs/1.8.0/notes/cuda.html#bwd-cuda-stream-semantics).

Parameters

* **tensors** (sequence of Tensor) – Tensors of which the derivative will be computed.
* **grad_tensors** (sequence of ([Tensor](tensors#torch.Tensor) or [None](https://docs.python.org/3/library/constants.html#None))) – The "vector" in the Jacobian-vector product, usually gradients w.r.t. each element of corresponding tensors. None values can be specified for scalar Tensors or ones that don't require grad. If a None value would be acceptable for all grad_tensors, then this argument is optional.
* **retain_graph** ([bool](https://docs.python.org/3/library/functions.html#bool), optional) – If `False`, the graph used to compute the grad will be freed. Note that in nearly all cases setting this option to `True` is not needed and often can be worked around in a much more efficient way. Defaults to the value of `create_graph`.
* **create_graph** ([bool](https://docs.python.org/3/library/functions.html#bool), optional) – If `True`, graph of the derivative will be constructed, allowing to compute higher order derivative products. Defaults to `False`.
* **inputs** (sequence of Tensor) – Inputs w.r.t. which the gradient will be accumulated into `.grad`. All other Tensors will be ignored. If not provided, the gradient is accumulated into all the leaf Tensors that were used to compute `tensors`. All the provided inputs must be leaf Tensors.

`torch.autograd.grad(outputs, inputs, grad_outputs=None, retain_graph=None, create_graph=False, only_inputs=True, allow_unused=False)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/autograd.html#grad)

Computes and returns the sum of gradients of outputs w.r.t. the inputs.
`grad_outputs` should be a sequence of length matching `outputs`, containing the "vector" in the Jacobian-vector product, usually the pre-computed gradients w.r.t. each of the outputs. If an output doesn't require_grad, then the gradient can be `None`.

If `only_inputs` is `True`, the function will only return a list of gradients w.r.t. the specified inputs. If it's `False`, then gradients w.r.t. all remaining leaves will still be computed, and will be accumulated into their `.grad` attribute.

Note

If you run any forward ops, create `grad_outputs`, and/or call `grad` in a user-specified CUDA stream context, see [Stream semantics of backward passes](https://pytorch.org/docs/1.8.0/notes/cuda.html#bwd-cuda-stream-semantics).

Parameters

* **outputs** (sequence of Tensor) – outputs of the differentiated function.
* **inputs** (sequence of Tensor) – Inputs w.r.t. which the gradient will be returned (and not accumulated into `.grad`).
* **grad_outputs** (sequence of Tensor) – The "vector" in the Jacobian-vector product. Usually gradients w.r.t. each output. None values can be specified for scalar Tensors or ones that don't require grad. If a None value would be acceptable for all grad_outputs, then this argument is optional. Default: None.
* **retain_graph** ([bool](https://docs.python.org/3/library/functions.html#bool), optional) – If `False`, the graph used to compute the grad will be freed. Note that in nearly all cases setting this option to `True` is not needed and often can be worked around in a much more efficient way. Defaults to the value of `create_graph`.
* **create_graph** ([bool](https://docs.python.org/3/library/functions.html#bool), optional) – If `True`, graph of the derivative will be constructed, allowing to compute higher order derivative products. Default: `False`.
* **allow_unused** ([bool](https://docs.python.org/3/library/functions.html#bool), optional) – If `False`, specifying inputs that were not used when computing outputs (and therefore their grad is always zero) is an error. Defaults to `False`.

## Functional higher level API

Warning

This API is in beta. Even though the function signatures are very unlikely to change, major improvements to performance are planned before we consider this stable.

This section contains the higher level API for autograd that builds on the basic API above and allows you to compute jacobians, hessians, etc.

This API works with user-provided functions that take only Tensors as input and return only Tensors. If your function takes other arguments that are not Tensors, or Tensors that don't have requires_grad set, you can use a lambda to capture them. For example, for a function `f` that takes three inputs - a Tensor for which we want the jacobian, another tensor that should be considered constant, and a boolean flag - as `f(input, constant, flag=flag)`, you can use it as `functional.jacobian(lambda x: f(x, constant, flag=flag), input)`.

`torch.autograd.functional.jacobian(func, inputs, create_graph=False, strict=False, vectorize=False)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/autograd/functional.html#jacobian)

Function that computes the Jacobian of a given function.

Parameters

* **func** (function) – a Python function that takes Tensor inputs and returns a tuple of Tensors or a Tensor.
* **inputs** (tuple of Tensors or [Tensor](tensors#torch.Tensor)) – inputs to the function `func`.
* **create_graph** ([bool](https://docs.python.org/3/library/functions.html#bool), optional) – If `True`, the Jacobian will be computed in a differentiable manner. Note that when `strict` is `False`, the result can not require gradients or be disconnected from the inputs. Defaults to `False`.
* **strict** ([bool](https://docs.python.org/3/library/functions.html#bool), optional) – If `True`, an error will be raised when we detect that there exists an input such that all the outputs are independent of it. If `False`, we return a Tensor of zeros as the jacobian for said inputs, which is the expected mathematical value. Defaults to `False`.
* **vectorize** ([bool](https://docs.python.org/3/library/functions.html#bool), optional) – This feature is experimental, please use at your own risk. When computing the jacobian, usually we invoke `autograd.grad` once per row of the jacobian. If this flag is `True`, we use the vmap prototype feature as the backend to vectorize calls to `autograd.grad` so we only invoke it once instead of once per row. This should lead to performance improvements in many use cases, however, due to this feature being incomplete, there may be performance cliffs. Please use `torch._C._debug_only_display_vmap_fallback_warnings(True)` to show any performance warnings and file us issues if warnings exist for your use case. Defaults to `False`.

Returns

If there is a single input and output, this will be a single Tensor containing the Jacobian for the linearized inputs and output. If one of the two is a tuple, then the Jacobian will be a tuple of Tensors. If both of them are tuples, then the Jacobian will be a tuple of tuples of Tensors where `Jacobian[i][j]` will contain the Jacobian of the `i`th output and `j`th input and will have as size the concatenation of the sizes of the corresponding output and the corresponding input and will have the same dtype and device as the corresponding input.

Return type

Jacobian ([Tensor](tensors#torch.Tensor) or nested tuple of Tensors)

#### Example

```python
>>> def exp_reducer(x):
...     return x.exp().sum(dim=1)
>>> inputs = torch.rand(2, 2)
>>> jacobian(exp_reducer, inputs)
tensor([[[1.4917, 2.4352],
         [0.0000, 0.0000]],
        [[0.0000, 0.0000],
         [2.4369, 2.3799]]])
>>> jacobian(exp_reducer, inputs, create_graph=True)
tensor([[[1.4917, 2.4352],
         [0.0000, 0.0000]],
        [[0.0000, 0.0000],
         [2.4369, 2.3799]]], grad_fn=)
>>> def exp_adder(x, y):
...     return 2 * x.exp() + 3 * y
>>> inputs = (torch.rand(2), torch.rand(2))
>>> jacobian(exp_adder, inputs)
(tensor([[2.8052, 0.0000],
         [0.0000, 3.3963]]),
 tensor([[3., 0.],
         [0., 3.]]))
```

`torch.autograd.functional.hessian(func, inputs, create_graph=False, strict=False, vectorize=False)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/autograd/functional.html#hessian)

Function that computes the Hessian of a given scalar function.

Parameters

* **func** (function) – a Python function that takes Tensor inputs and returns a Tensor with a single element.
* **inputs** (tuple of Tensors or [Tensor](tensors#torch.Tensor)) – inputs to the function `func`.
* **create_graph** ([bool](https://docs.python.org/3/library/functions.html#bool), optional) – If `True`, the Hessian will be computed in a differentiable manner. Note that when `strict` is `False`, the result can not require gradients or be disconnected from the inputs. Defaults to `False`.
* **strict** ([bool](https://docs.python.org/3/library/functions.html#bool), optional) – If `True`, an error will be raised when we detect that there exists an input such that all the outputs are independent of it. If `False`, we return a Tensor of zeros as the hessian for said inputs, which is the expected mathematical value. Defaults to `False`.
* **vectorize** ([bool](https://docs.python.org/3/library/functions.html#bool), optional) – This feature is experimental, please use at your own risk. When computing the hessian, usually we invoke `autograd.grad` once per row of the hessian. If this flag is `True`, we use the vmap prototype feature as the backend to vectorize calls to `autograd.grad` so we only invoke it once instead of once per row. This should lead to performance improvements in many use cases, however, due to this feature being incomplete, there may be performance cliffs. Please use `torch._C._debug_only_display_vmap_fallback_warnings(True)` to show any performance warnings and file us issues if warnings exist for your use case. Defaults to `False`.

Returns

If there is a single input, this will be a single Tensor containing the Hessian for the input. If it is a tuple, then the Hessian will be a tuple of tuples where `Hessian[i][j]` will contain the Hessian of the `i`th input and `j`th input with size the sum of the size of the `i`th input plus the size of the `j`th input. `Hessian[i][j]` will have the same dtype and device as the corresponding `i`th input.

Return type

Hessian ([Tensor](tensors#torch.Tensor) or a tuple of tuple of Tensors)

#### Example

```python
>>> def pow_reducer(x):
...     return x.pow(3).sum()
>>> inputs = torch.rand(2, 2)
>>> hessian(pow_reducer, inputs)
tensor([[[[5.2265, 0.0000],
          [0.0000, 0.0000]],
         [[0.0000, 4.8221],
          [0.0000, 0.0000]]],
        [[[0.0000, 0.0000],
          [1.9456, 0.0000]],
         [[0.0000, 0.0000],
          [0.0000, 3.2550]]]])
>>> hessian(pow_reducer, inputs, create_graph=True)
tensor([[[[5.2265, 0.0000],
          [0.0000, 0.0000]],
         [[0.0000, 4.8221],
          [0.0000, 0.0000]]],
        [[[0.0000, 0.0000],
          [1.9456, 0.0000]],
         [[0.0000, 0.0000],
          [0.0000, 3.2550]]]], grad_fn=)
>>> def pow_adder_reducer(x, y):
...     return (2 * x.pow(2) + 3 * y.pow(2)).sum()
>>> inputs = (torch.rand(2), torch.rand(2))
>>> hessian(pow_adder_reducer, inputs)
((tensor([[4., 0.],
          [0., 4.]]),
  tensor([[0., 0.],
          [0., 0.]])),
 (tensor([[0., 0.],
          [0., 0.]]),
  tensor([[6., 0.],
          [0., 6.]])))
```

`torch.autograd.functional.vjp(func, inputs, v=None, create_graph=False, strict=False)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/autograd/functional.html#vjp)

Function that computes the dot product between a vector `v` and the Jacobian of the given function at the point given by the inputs.

Parameters

* **func** (function) – a Python function that takes Tensor inputs and returns a tuple of Tensors or a Tensor.
* **inputs** (tuple of Tensors or [Tensor](tensors#torch.Tensor)) – inputs to the function `func`.
* **v** (tuple of Tensors or [Tensor](tensors#torch.Tensor)) – The vector for which the vector Jacobian product is computed. Must be the same size as the output of `func`. This argument is optional when the output of `func` contains a single element and (if it is not provided) will be set as a Tensor containing a single `1`.
* **create_graph** ([bool](https://docs.python.org/3/library/functions.html#bool), optional) – If `True`, both the output and result will be computed in a differentiable way.
Note that when `strict` is `False`, the result can not require gradients or be disconnected from the inputs. Defaults to `False`.
* **strict** ([bool](https://docs.python.org/3/library/functions.html#bool), optional) – If `True`, an error will be raised when we detect that there exists an input such that all the outputs are independent of it. If `False`, we return a Tensor of zeros as the vjp for said inputs, which is the expected mathematical value. Defaults to `False`.

Returns

tuple with:

* func_output (tuple of Tensors or Tensor): output of `func(inputs)`
* vjp (tuple of Tensors or Tensor): result of the dot product with the same shape as the inputs.

Return type

output ([tuple](https://docs.python.org/3/library/stdtypes.html#tuple))

#### Example

```python
>>> def exp_reducer(x):
...     return x.exp().sum(dim=1)
>>> inputs = torch.rand(4, 4)
>>> v = torch.ones(4)
>>> vjp(exp_reducer, inputs, v)
(tensor([5.7817, 7.2458, 5.7830, 6.7782]),
 tensor([[1.4458, 1.3962, 1.3042, 1.6354],
         [2.1288, 1.0652, 1.5483, 2.5035],
         [2.2046, 1.1292, 1.1432, 1.3059],
         [1.3225, 1.6652, 1.7753, 2.0152]]))
>>> vjp(exp_reducer, inputs, v, create_graph=True)
(tensor([5.7817, 7.2458, 5.7830, 6.7782], grad_fn=),
 tensor([[1.4458, 1.3962, 1.3042, 1.6354],
         [2.1288, 1.0652, 1.5483, 2.5035],
         [2.2046, 1.1292, 1.1432, 1.3059],
         [1.3225, 1.6652, 1.7753, 2.0152]], grad_fn=))
>>> def adder(x, y):
...     return 2 * x + 3 * y
>>> inputs = (torch.rand(2), torch.rand(2))
>>> v = torch.ones(2)
>>> vjp(adder, inputs, v)
(tensor([2.4225, 2.3340]),
 (tensor([2., 2.]), tensor([3., 3.])))
```

`torch.autograd.functional.jvp(func, inputs, v=None, create_graph=False, strict=False)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/autograd/functional.html#jvp)

Function that computes the dot product between the Jacobian of the given function at the point given by the inputs and a vector `v`.

Parameters

* **func** (function) – a Python function that takes Tensor inputs and returns a tuple of Tensors or a Tensor.
* **inputs** (tuple of Tensors or [Tensor](tensors#torch.Tensor)) – inputs to the function `func`.
* **v** (tuple of Tensors or [Tensor](tensors#torch.Tensor)) – The vector for which the Jacobian vector product is computed. Must be the same size as the input of `func`. This argument is optional when the input to `func` contains a single element and (if it is not provided) will be set as a Tensor containing a single `1`.
* **create_graph** ([bool](https://docs.python.org/3/library/functions.html#bool), optional) – If `True`, both the output and result will be computed in a differentiable way. Note that when `strict` is `False`, the result can not require gradients or be disconnected from the inputs. Defaults to `False`.
* **strict** ([bool](https://docs.python.org/3/library/functions.html#bool), optional) – If `True`, an error will be raised when we detect that there exists an input such that all the outputs are independent of it. If `False`, we return a Tensor of zeros as the jvp for said inputs, which is the expected mathematical value. Defaults to `False`.

Returns

tuple with:

* func_output (tuple of Tensors or Tensor): output of `func(inputs)`
* jvp (tuple of Tensors or Tensor): result of the dot product with the same shape as the output.

Return type

output ([tuple](https://docs.python.org/3/library/stdtypes.html#tuple))

#### Example

```python
>>> def exp_reducer(x):
...     return x.exp().sum(dim=1)
>>> inputs = torch.rand(4, 4)
>>> v = torch.ones(4, 4)
>>> jvp(exp_reducer, inputs, v)
(tensor([6.3090, 4.6742, 7.9114, 8.2106]),
 tensor([6.3090, 4.6742, 7.9114, 8.2106]))
>>> jvp(exp_reducer, inputs, v, create_graph=True)
(tensor([6.3090, 4.6742, 7.9114, 8.2106], grad_fn=),
 tensor([6.3090, 4.6742, 7.9114, 8.2106], grad_fn=))
>>> def adder(x, y):
...     return 2 * x + 3 * y
>>> inputs = (torch.rand(2), torch.rand(2))
>>> v = (torch.ones(2), torch.ones(2))
>>> jvp(adder, inputs, v)
(tensor([2.2399, 2.5005]),
 tensor([5., 5.]))
```

Note

The jvp is currently computed by using the backward of the backward (sometimes called the double backwards trick) as we don't have support for forward mode AD in PyTorch at the moment.

`torch.autograd.functional.vhp(func, inputs, v=None, create_graph=False, strict=False)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/autograd/functional.html#vhp)

Function that computes the dot product between a vector `v` and the Hessian of a given scalar function at the point given by the inputs.

Parameters

* **func** (function) – a Python function that takes Tensor inputs and returns a Tensor with a single element.
* **inputs** (tuple of Tensors or [Tensor](tensors#torch.Tensor)) – inputs to the function `func`.
* **v** (tuple of Tensors or [Tensor](tensors#torch.Tensor)) – The vector for which the vector Hessian product is computed. Must be the same size as the input of `func`. This argument is optional when `func`'s input contains a single element and (if it is not provided) will be set as a Tensor containing a single `1`.
* **create_graph** ([bool](https://docs.python.org/3/library/functions.html#bool), optional) – If `True`, both the output and result will be computed in a differentiable way. Note that when `strict` is `False`, the result can not require gradients or be disconnected from the inputs. Defaults to `False`.
* **strict** ([bool](https://docs.python.org/3/library/functions.html#bool), optional) – If `True`, an error will be raised when we detect that there exists an input such that all the outputs are independent of it. If `False`, we return a Tensor of zeros as the vhp for said inputs, which is the expected mathematical value. Defaults to `False`.

Returns

tuple with:

* func_output (tuple of Tensors or Tensor): output of `func(inputs)`
* vhp (tuple of Tensors or Tensor): result of the dot product with the same shape as the inputs.

Return type

output ([tuple](https://docs.python.org/3/library/stdtypes.html#tuple))

#### Example

```python
>>> def pow_reducer(x):
...     return x.pow(3).sum()
>>> inputs = torch.rand(2, 2)
>>> v = torch.ones(2, 2)
>>> vhp(pow_reducer, inputs, v)
(tensor(0.5591),
 tensor([[1.0689, 1.2431],
         [3.0989, 4.4456]]))
>>> vhp(pow_reducer, inputs, v, create_graph=True)
(tensor(0.5591, grad_fn=),
 tensor([[1.0689, 1.2431],
         [3.0989, 4.4456]], grad_fn=))
>>> def pow_adder_reducer(x, y):
...     return (2 * x.pow(2) + 3 * y.pow(2)).sum()
>>> inputs = (torch.rand(2), torch.rand(2))
>>> v = (torch.zeros(2), torch.ones(2))
>>> vhp(pow_adder_reducer, inputs, v)
(tensor(4.8053),
 (tensor([0., 0.]), tensor([6., 6.])))
```

`torch.autograd.functional.hvp(func, inputs, v=None, create_graph=False, strict=False)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/autograd/functional.html#hvp)

Function that computes the dot product between the Hessian of a given scalar function and a vector `v` at the point given by the inputs.
Parameters

* **func** (function) – a Python function that takes Tensor inputs and returns a Tensor with a single element.
* **inputs** (tuple of Tensors or [Tensor](tensors#torch.Tensor)) – inputs to the function `func`.
* **v** (tuple of Tensors or [Tensor](tensors#torch.Tensor)) – The vector for which the Hessian vector product is computed. Must be the same size as the input of `func`. This argument is optional when `func`'s input contains a single element and (if it is not provided) will be set as a Tensor containing a single `1`.
* **create_graph** ([bool](https://docs.python.org/3/library/functions.html#bool), optional) – If `True`, both the output and result will be computed in a differentiable way. Note that when `strict` is `False`, the result can not require gradients or be disconnected from the inputs. Defaults to `False`.
* **strict** ([bool](https://docs.python.org/3/library/functions.html#bool), optional) – If `True`, an error will be raised when we detect that there exists an input such that all the outputs are independent of it. If `False`, we return a Tensor of zeros as the hvp for said inputs, which is the expected mathematical value. Defaults to `False`.

Returns

tuple with:

* func_output (tuple of Tensors or Tensor): output of `func(inputs)`
* hvp (tuple of Tensors or Tensor): result of the dot product with the same shape as the inputs.

Return type

output ([tuple](https://docs.python.org/3/library/stdtypes.html#tuple))

#### Example

```python
>>> def pow_reducer(x):
...     return x.pow(3).sum()
>>> inputs = torch.rand(2, 2)
>>> v = torch.ones(2, 2)
>>> hvp(pow_reducer, inputs, v)
(tensor(0.1448),
 tensor([[2.0239, 1.6456],
         [2.4988, 1.4310]]))
>>> hvp(pow_reducer, inputs, v, create_graph=True)
(tensor(0.1448, grad_fn=),
 tensor([[2.0239, 1.6456],
         [2.4988, 1.4310]], grad_fn=))
>>> def pow_adder_reducer(x, y):
...     return (2 * x.pow(2) + 3 * y.pow(2)).sum()
>>> inputs = (torch.rand(2), torch.rand(2))
>>> v = (torch.zeros(2), torch.ones(2))
>>> hvp(pow_adder_reducer, inputs, v)
(tensor(2.3030),
 (tensor([0., 0.]), tensor([6., 6.])))
```

Note

This function is significantly slower than `vhp` due to backward mode AD constraints. If your function is twice continuously differentiable, then hvp = vhp.t(). So if you know that your function satisfies this condition, you should use vhp instead, which is much faster with the current implementation.

## Locally disabling gradient computation

`class torch.autograd.no_grad` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/autograd/grad_mode.html#no_grad)

Context-manager that disables gradient calculation.

Disabling gradient calculation is useful for inference, when you are sure that you will not call `Tensor.backward()`. It will reduce memory consumption for computations that would otherwise have `requires_grad=True`.

In this mode, the result of every computation will have `requires_grad=False`, even when the inputs have `requires_grad=True`.

This context manager is thread local; it will not affect computation in other threads.

Also functions as a decorator. (Make sure to instantiate with parentheses.)

Example:

```python
>>> x = torch.tensor([1], requires_grad=True)
>>> with torch.no_grad():
...     y = x * 2
>>> y.requires_grad
False
>>> @torch.no_grad()
... def doubler(x):
...     return x * 2
>>> z = doubler(x)
>>> z.requires_grad
False
```

`class torch.autograd.enable_grad` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/autograd/grad_mode.html#enable_grad)

Context-manager that enables gradient calculation.

Enables gradient calculation, if it has been disabled via `no_grad` or `set_grad_enabled`.

This context manager is thread local; it will not affect computation in other threads.

Also functions as a decorator. (Make sure to instantiate with parentheses.)

Example:

```python
>>> x = torch.tensor([1], requires_grad=True)
>>> with torch.no_grad():
...     with torch.enable_grad():
...         y = x * 2
>>> y.requires_grad
True
>>> y.backward()
>>> x.grad
>>> @torch.enable_grad()
... def doubler(x):
...     return x * 2
>>> with torch.no_grad():
...     z = doubler(x)
>>> z.requires_grad
True
```

`class torch.autograd.set_grad_enabled(mode)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/autograd/grad_mode.html#set_grad_enabled)

Context-manager that sets gradient calculation to on or off.

`set_grad_enabled` will enable or disable grads based on its argument `mode`. It can be used as a context-manager or as a function.

This context manager is thread local; it will not affect computation in other threads.

Parameters

**mode** ([bool](https://docs.python.org/3/library/functions.html#bool)) – Flag whether to enable grad (`True`), or disable (`False`). This can be used to conditionally enable gradients.

Example:

```python
>>> x = torch.tensor([1], requires_grad=True)
>>> is_train = False
>>> with torch.set_grad_enabled(is_train):
...     y = x * 2
>>> y.requires_grad
False
>>> torch.set_grad_enabled(True)
>>> y = x * 2
>>> y.requires_grad
True
>>> torch.set_grad_enabled(False)
>>> y = x * 2
>>> y.requires_grad
False
```

## Default gradient layouts

When a non-sparse `param` receives a non-sparse gradient during `torch.autograd.backward()` or `torch.Tensor.backward()`, `param.grad` is accumulated as follows.

If `param.grad` is initially `None`:

1. If `param`'s memory is non-overlapping and dense, `.grad` is created with strides matching `param` (thus matching `param`'s layout).
2. Otherwise, `.grad` is created with rowmajor-contiguous strides.

If `param` already has a non-sparse `.grad` attribute:

3. If `create_graph=False`, `backward()` accumulates into `.grad` in-place, which preserves its strides.
4. If `create_graph=True`, `backward()` replaces `.grad` with a new tensor `.grad + new grad`, which attempts (but does not guarantee) matching the preexisting `.grad`'s strides.

The default behavior (letting `.grad`s be `None` before the first `backward()`, such that their layout is created according to 1 or 2, and retained over time according to 3 or 4) is recommended for best performance. Calls to `model.zero_grad()` or `optimizer.zero_grad()` will not affect `.grad` layouts.

In fact, resetting all `.grad`s to `None` before each accumulation phase, e.g.:

```python
for iterations...
    ...
    for param in model.parameters():
        param.grad = None
    loss.backward()
```

such that they're recreated according to 1 or 2 every time, is a valid alternative to `model.zero_grad()` or `optimizer.zero_grad()` that may improve performance for some networks.

### Manual gradient layouts

If you need manual control over `.grad`'s strides, assign `param.grad =` a zeroed tensor with desired strides before the first `backward()`, and never reset it to `None`. 3 guarantees your layout is preserved as long as `create_graph=False`. 4 indicates your layout is _likely_ preserved even if `create_graph=True`.
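As a hedged sketch of the manual-layout approach described above (the channels-last conv weight is just an illustrative assumption, not a recommendation from this reference):

```python
import torch

# Hypothetical conv layer whose weight gradient we want in channels_last layout.
conv = torch.nn.Conv2d(3, 8, kernel_size=3)

# Assign a zeroed gradient with the desired strides before the first backward(),
# and never reset it to None afterwards; rule 3 above then preserves the layout
# as long as create_graph=False.
conv.weight.grad = torch.zeros_like(conv.weight, memory_format=torch.channels_last)

out = conv(torch.randn(1, 3, 16, 16)).sum()
out.backward()  # accumulates in-place into the preexisting channels_last .grad

# True (expected), since in-place accumulation preserves the assigned strides
print(conv.weight.grad.is_contiguous(memory_format=torch.channels_last))
```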
## In-place operations on Tensors

Supporting in-place operations in autograd is a hard matter, and we discourage their use in most cases. Autograd's aggressive buffer freeing and reuse makes it very efficient and there are very few occasions when in-place operations actually lower memory usage by any significant amount. Unless you're operating under heavy memory pressure, you might never need to use them.

### In-place correctness checks

All `Tensor`s keep track of in-place operations applied to them, and if the implementation detects that a tensor was saved for backward in one of the functions, but it was modified in-place afterwards, an error will be raised once the backward pass is started. This ensures that if you're using in-place functions and not seeing any errors, you can be sure that the computed gradients are correct.

## Variable (deprecated)

Warning

The Variable API has been deprecated: Variables are no longer necessary to use autograd with tensors. Autograd automatically supports Tensors with `requires_grad` set to `True`. Below please find a quick guide on what has changed:

* `Variable(tensor)` and `Variable(tensor, requires_grad)` still work as expected, but they return Tensors instead of Variables.
* `var.data` is the same thing as `tensor.data`.
* Methods such as `var.backward(), var.detach(), var.register_hook()` now work on tensors with the same method names.

In addition, one can now create tensors with `requires_grad=True` using factory methods such as [`torch.randn()`](generated/torch.randn#torch.randn), [`torch.zeros()`](generated/torch.zeros#torch.zeros), [`torch.ones()`](generated/torch.ones#torch.ones), and others like the following:

`autograd_tensor = torch.randn((2, 3, 4), requires_grad=True)`

## Tensor autograd functions

`class torch.Tensor`

`grad`

This attribute is `None` by default and becomes a Tensor the first time a call to `backward()` computes gradients for `self`. The attribute will then contain the gradients computed and future calls to `backward()` will accumulate (add) gradients into it.

`requires_grad`

Is `True` if gradients need to be computed for this Tensor, `False` otherwise.

Note

The fact that gradients need to be computed for a Tensor does not mean that the `grad` attribute will be populated, see `is_leaf` for more details.

`is_leaf`

All Tensors that have `requires_grad` which is `False` will be leaf Tensors by convention.

For Tensors that have `requires_grad` which is `True`, they will be leaf Tensors if they were created by the user. This means that they are not the result of an operation and so `grad_fn` is None.

Only leaf Tensors will have their `grad` populated during a call to `backward()`. To get `grad` populated for non-leaf Tensors, you can use `retain_grad()`.
Example:

```python
>>> a = torch.rand(10, requires_grad=True)
>>> a.is_leaf
True
>>> b = torch.rand(10, requires_grad=True).cuda()
>>> b.is_leaf
False
# b was created by the operation that cast a cpu Tensor into a cuda Tensor
>>> c = torch.rand(10, requires_grad=True) + 2
>>> c.is_leaf
False
# c was created by the addition operation
>>> d = torch.rand(10).cuda()
>>> d.is_leaf
True
# d does not require gradients and so has no operation creating it (that is tracked by the autograd engine)
>>> e = torch.rand(10).cuda().requires_grad_()
>>> e.is_leaf
True
# e requires gradients and has no operations creating it
>>> f = torch.rand(10, requires_grad=True, device="cuda")
>>> f.is_leaf
True
# f requires grad, has no operation creating it
```

`backward(gradient=None, retain_graph=None, create_graph=False, inputs=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/tensor.html#Tensor.backward)

Computes the gradient of the current tensor w.r.t. graph leaves.

The graph is differentiated using the chain rule. If the tensor is non-scalar (i.e. its data has more than one element) and requires gradient, the function additionally requires specifying `gradient`. It should be a tensor of matching type and location, that contains the gradient of the differentiated function w.r.t. `self`.

This function accumulates gradients in the leaves - you might need to zero `.grad` attributes or set them to `None` before calling it. See Default gradient layouts for details on the memory layout of accumulated gradients.

Note

If you run any forward ops, create `gradient`, and/or call `backward` in a user-specified CUDA stream context, see [Stream semantics of backward passes](https://pytorch.org/docs/1.8.0/notes/cuda.html#bwd-cuda-stream-semantics).

Parameters

* **gradient** ([Tensor](tensors#torch.Tensor) or [None](https://docs.python.org/3/library/constants.html#None)) – Gradient w.r.t. the tensor. If it is a tensor, it will be automatically converted to a Tensor that does not require grad unless `create_graph` is True. None values can be specified for scalar Tensors or ones that don't require grad. If a None value would be acceptable then this argument is optional.
* **retain_graph** ([bool](https://docs.python.org/3/library/functions.html#bool), optional) – If `False`, the graph used to compute the grads will be freed. Note that in nearly all cases setting this option to True is not needed and often can be worked around in a much more efficient way. Defaults to the value of `create_graph`.
* **create_graph** ([bool](https://docs.python.org/3/library/functions.html#bool), optional) – If `True`, graph of the derivative will be constructed, allowing to compute higher order derivative products. Defaults to `False`.
* **inputs** (sequence of Tensor) – Inputs w.r.t. which the gradient will be accumulated into `.grad`. All other Tensors will be ignored. If not provided, the gradient is accumulated into all the leaf Tensors that were used to compute `tensors`. All the provided inputs must be leaf Tensors.

`detach()`

Returns a new Tensor, detached from the current graph.

The result will never require gradient.

Note

Returned Tensor shares the same storage with the original one. In-place modifications on either of them will be seen, and may trigger errors in correctness checks.
IMPORTANT NOTE: Previously, in-place size / stride / storage changes (such as `resize_` / `resize_as_` / `set_` / `transpose_`) to the returned tensor also updated the original tensor. Now, these in-place changes will not update the original tensor anymore, and will instead trigger an error. For sparse tensors: In-place indices / values changes (such as `zero_` / `copy_` / `add_`) to the returned tensor will not update the original tensor anymore, and will instead trigger an error.

`detach_()`

Detaches the Tensor from the graph that created it, making it a leaf. Views cannot be detached in-place.

`register_hook(hook)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/tensor.html#Tensor.register_hook)

Registers a backward hook.

The hook will be called every time a gradient with respect to the Tensor is computed. The hook should have the following signature:

```python
hook(grad) -> Tensor or None
```

The hook should not modify its argument, but it can optionally return a new gradient which will be used in place of `grad`.

This function returns a handle with a method `handle.remove()` that removes the hook from the module.

Example:

```python
>>> v = torch.tensor([0., 0., 0.], requires_grad=True)
>>> h = v.register_hook(lambda grad: grad * 2)  # double the gradient
>>> v.backward(torch.tensor([1., 2., 3.]))
>>> v.grad
 2
 4
 6
[torch.FloatTensor of size (3,)]
>>> h.remove()  # removes the hook
```

`retain_grad()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/tensor.html#Tensor.retain_grad)

Enables the `.grad` attribute for non-leaf Tensors.

## Function

`class torch.autograd.Function` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/autograd/function.html#Function)

Records operation history and defines formulas for differentiating ops.

See the Note on extending the autograd engine for more details on how to use this class: Every operation performed on `Tensor`s creates a new function object, that performs the computation, and records that it happened. The history is retained in the form of a DAG of functions, with edges denoting data dependencies (`input <- output`). Then, when backward is called, the graph is processed in the topological ordering, by calling `backward()` methods of each `Function` object, and passing returned gradients on to the next `Function`s.

Normally, the only way users interact with functions is by creating subclasses and defining new operations. This is the recommended way of extending torch.autograd.

Examples:

```python
>>> class Exp(Function):
>>>
>>>     @staticmethod
>>>     def forward(ctx, i):
>>>         result = i.exp()
>>>         ctx.save_for_backward(result)
>>>         return result
>>>
>>>     @staticmethod
>>>     def backward(ctx, grad_output):
>>>         result, = ctx.saved_tensors
>>>         return grad_output * result
>>>
>>> # Use it by calling the apply method:
>>> output = Exp.apply(input)
```

`static backward(ctx, *grad_outputs)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/autograd/function.html#Function.backward)

Defines a formula for differentiating the operation.

This function is to be overridden by all subclasses.

It must accept a context `ctx` as the first argument, followed by as many outputs as `forward()` returned, and it should return as many tensors as there were inputs to `forward()`. Each argument is the gradient w.r.t the given output, and each returned value should be the gradient w.r.t. the corresponding input.

The context can be used to retrieve tensors saved during the forward pass. It also has an attribute `ctx.needs_input_grad` as a tuple of booleans representing whether each input needs gradient.
E.g., `backward()` will have `ctx.needs_input_grad[0] = True` if the first input to `forward()` needs the gradient computed w.r.t. the output. `static forward(ctx, *args, **kwargs)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/autograd/function.html#Function.forward) Performs the operation. This function is to be overridden by all subclasses. It must accept a context ctx as the first argument, followed by any number of arguments (tensors or other types). The context can be used to store tensors that can then be retrieved during the backward pass. ## Context method mixins When creating a new `Function`, the following methods are available to `ctx`. `class torch.autograd.function._ContextMethodMixin` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/autograd/function.html#_ContextMethodMixin) `mark_dirty(*args)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/autograd/function.html#_ContextMethodMixin.mark_dirty) Marks given tensors as modified in an in-place operation. **This should be called at most once, only from inside the** `forward()` **method, and all arguments should be inputs.** Every tensor that’s been modified in-place in a call to `forward()` should be given to this function, to ensure correctness of our checks. It doesn’t matter whether the function is called before or after modification. `mark_non_differentiable(*args)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/autograd/function.html#_ContextMethodMixin.mark_non_differentiable) Marks outputs as non-differentiable. **This should be called at most once, only from inside the** `forward()` **method, and all arguments should be outputs.** This will mark outputs as not requiring gradients, increasing the efficiency of backward computation. You still need to accept a gradient for each output in `backward()`, but it’s always going to be a zero tensor with the same shape as the corresponding output. This is used e.g. for indices returned from a max `Function`. `save_for_backward(*tensors)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/autograd/function.html#_ContextMethodMixin.save_for_backward) Saves given tensors for a future call to `backward()`. **This should be called at most once, and only from inside the** `forward()` **method.** Later, saved tensors can be accessed through the `saved_tensors` attribute. Before returning them to the user, a check is made to ensure they weren’t used in any in-place operation that modified their content. Arguments can also be `None`. `set_materialize_grads(value)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/autograd/function.html#_ContextMethodMixin.set_materialize_grads) Sets whether to materialize output grad tensors. Default is true. **This should be called only from inside the** `forward()` **method.** If true, undefined output grad tensors will be expanded to tensors full of zeros prior to calling the `backward()` method. ## Numerical gradient checking `torch.autograd.gradcheck(func, inputs, eps=1e-06, atol=1e-05, rtol=0.001, raise_exception=True, check_sparse_nnz=False, nondet_tol=0.0, check_undefined_grad=True, check_grad_dtypes=False, check_batched_grad=False)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/autograd/gradcheck.html#gradcheck) Check gradients computed via small finite differences against analytical gradients w.r.t. tensors in `inputs` that are of floating point or complex type and with `requires_grad=True`.
The check between numerical and analytical gradients uses [`allclose()`](generated/torch.allclose#torch.allclose "torch.allclose"). For complex functions, no notion of Jacobian exists. Gradcheck verifies if the numerical and analytical values of Wirtinger and Conjugate Wirtinger derivative are consistent. The gradient computation is done under the assumption that the overall function has a real valued output. For functions with complex output, gradcheck compares the numerical and analytical gradients for two values of `grad_output`: 1 and 1j. For more details, check out [Autograd for Complex Numbers](https://pytorch.org/docs/1.8.0/notes/autograd.html#complex-autograd- doc). Note The default values are designed for `input` of double precision. This check will likely fail if `input` is of less precision, e.g., `FloatTensor`. Warning If any checked tensor in `input` has overlapping memory, i.e., different indices pointing to the same memory address (e.g., from `torch.expand()`), this check will likely fail because the numerical gradients computed by point perturbation at such indices will change values at all other indices that share the same memory address. Parameters * **func** (_function_) – a Python function that takes Tensor inputs and returns a Tensor or a tuple of Tensors * **inputs** (_tuple of Tensor_ _or_[Tensor](tensors#torch.Tensor "torch.Tensor")) – inputs to the function * **eps** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _,__optional_) – perturbation for finite differences * **atol** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _,__optional_) – absolute tolerance * **rtol** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _,__optional_) – relative tolerance * **raise_exception** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – indicating whether to raise an exception if the check fails. The exception gives more information about the exact nature of the failure. This is helpful when debugging gradchecks. * **check_sparse_nnz** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – if True, gradcheck allows for SparseTensor input, and for any SparseTensor at input, gradcheck will perform check at nnz positions only. * **nondet_tol** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _,__optional_) – tolerance for non-determinism. When running identical inputs through the differentiation, the results must either match exactly (default, 0.0) or be within this tolerance. * **check_undefined_grad** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – if True, check if undefined output grads are supported and treated as zeros, for `Tensor` outputs. * **check_batched_grad** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – if True, check if we can compute batched gradients using prototype vmap support. Defaults to False. 
Returns True if all differences satisfy allclose condition `torch.autograd.gradgradcheck(func, inputs, grad_outputs=None, eps=1e-06, atol=1e-05, rtol=0.001, gen_non_contig_grad_outputs=False, raise_exception=True, nondet_tol=0.0, check_undefined_grad=True, check_grad_dtypes=False, check_batched_grad=False)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/autograd/gradcheck.html#gradgradcheck) Check gradients of gradients computed via small finite differences against analytical gradients w.r.t. tensors in `inputs` and `grad_outputs` that are of floating point or complex type and with `requires_grad=True`. This function checks that backpropagating through the gradients computed to the given `grad_outputs` are correct. The check between numerical and analytical gradients uses [`allclose()`](generated/torch.allclose#torch.allclose "torch.allclose"). Note The default values are designed for `input` and `grad_outputs` of double precision. This check will likely fail if they are of less precision, e.g., `FloatTensor`. Warning If any checked tensor in `input` and `grad_outputs` has overlapping memory, i.e., different indices pointing to the same memory address (e.g., from `torch.expand()`), this check will likely fail because the numerical gradients computed by point perturbation at such indices will change values at all other indices that share the same memory address. Parameters * **func** (_function_) – a Python function that takes Tensor inputs and returns a Tensor or a tuple of Tensors * **inputs** (_tuple of Tensor_ _or_[Tensor](tensors#torch.Tensor "torch.Tensor")) – inputs to the function * **grad_outputs** (_tuple of Tensor_ _or_[Tensor](tensors#torch.Tensor "torch.Tensor") _,__optional_) – The gradients with respect to the function’s outputs. * **eps** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _,__optional_) – perturbation for finite differences * **atol** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _,__optional_) – absolute tolerance * **rtol** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _,__optional_) – relative tolerance * **gen_non_contig_grad_outputs** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – if `grad_outputs` is `None` and `gen_non_contig_grad_outputs` is `True`, the randomly generated gradient outputs are made to be noncontiguous * **raise_exception** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – indicating whether to raise an exception if the check fails. The exception gives more information about the exact nature of the failure. This is helpful when debugging gradchecks. * **nondet_tol** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _,__optional_) – tolerance for non-determinism. When running identical inputs through the differentiation, the results must either match exactly (default, 0.0) or be within this tolerance. Note that a small amount of nondeterminism in the gradient will lead to larger inaccuracies in the second derivative. 
* **check_undefined_grad** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – if True, check if undefined output grads are supported and treated as zeros * **check_batched_grad** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – if True, check if we can compute batched gradients using prototype vmap support. Defaults to False. Returns True if all differences satisfy allclose condition ## Profiler Autograd includes a profiler that lets you inspect the cost of different operators inside your model - both on the CPU and GPU. There are two modes implemented at the moment - CPU-only, using `profile`, and nvprof-based (registering both CPU and GPU activity), using `emit_nvtx`. `class torch.autograd.profiler.profile(enabled=True, *, use_cuda=False, record_shapes=False, with_flops=False, profile_memory=False, with_stack=False, use_kineto=False, use_cpu=True)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/autograd/profiler.html#profile) Context manager that manages autograd profiler state and holds a summary of results. Under the hood it just records events of functions being executed in C++ and exposes those events to Python. You can wrap any code into it and it will only report runtime of PyTorch functions. Note: the profiler is thread-local and is automatically propagated into async tasks. Parameters * **enabled** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – Setting this to False makes this context manager a no-op. * **use_cuda** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – Enables timing of CUDA events as well using the cudaEvent API. Adds approximately 4us of overhead to each tensor operation. * **record_shapes** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – If shapes recording is set, information about input dimensions will be collected. This allows one to see which dimensions have been used under the hood and further group by them using prof.key_averages(group_by_input_shape=True). Please note that shape recording might skew your profiling data. It is recommended to use separate runs with and without shape recording to validate the timing. Most likely the skew will be negligible for the bottom-most events (in the case of nested function calls). But for higher-level functions the total self cpu time might be artificially increased because of the shape collection. * **with_flops** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – If with_flops is set, the profiler will estimate the FLOPS (floating point operations per second) value using the operator’s input shape and total time. This allows one to estimate the hardware performance. Currently, this option only works for the matrix multiplication and 2D convolution operators. * **profile_memory** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – track tensor memory allocation/deallocation. * **with_stack** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – record source information (file and line number) for the ops. * **use_kineto** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – experimental, enable profiling with Kineto profiler.
* **use_cpu** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – profile CPU events; setting to `False` requires `use_kineto=True` and can be used to lower the overhead for GPU-only profiling. #### Example >>> x = torch.randn((1, 1), requires_grad=True) >>> with torch.autograd.profiler.profile() as prof: >>> for _ in range(100): # any normal python code, really! >>> y = x ** 2 >>> y.backward() >>> # NOTE: some columns were removed for brevity >>> print(prof.key_averages().table(sort_by="self_cpu_time_total")) ----------------------------------- --------------- --------------- --------------- Name Self CPU total CPU time avg Number of Calls ----------------------------------- --------------- --------------- --------------- mul 32.048ms 32.048ms 200 pow 27.041ms 27.041ms 200 PowBackward0 9.727ms 55.483ms 100 torch::autograd::AccumulateGrad 9.148ms 9.148ms 100 torch::autograd::GraphRoot 691.816us 691.816us 100 ----------------------------------- --------------- --------------- --------------- `export_chrome_trace(path)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/autograd/profiler.html#profile.export_chrome_trace) Exports an EventList as a Chrome tracing tools file. The checkpoint can be later loaded and inspected under `chrome://tracing` URL. Parameters **path** ([str](https://docs.python.org/3/library/stdtypes.html#str "\(in Python v3.9\)")) – Path where the trace will be written. `key_averages(group_by_input_shape=False, group_by_stack_n=0)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/autograd/profiler.html#profile.key_averages) Averages all function events over their keys. Parameters * **group_by_input_shape** – group entries by (event name, input shapes) rather than just event name. This is useful to see which input shapes contribute to the runtime the most and may help with size-specific optimizations or choosing the best candidates for quantization. * **group_by_stack_n** – group by top n stack trace entries Returns An EventList containing FunctionEventAvg objects. `property self_cpu_time_total` Returns total time spent on CPU obtained as a sum of all self times across all the events. `table(sort_by=None, row_limit=100, max_src_column_width=75, header=None, top_level_events_only=False)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/autograd/profiler.html#profile.table) Prints an EventList as a nicely formatted table. Parameters * **sort_by** ([str](https://docs.python.org/3/library/stdtypes.html#str "\(in Python v3.9\)") _,__optional_) – Attribute used to sort entries. By default they are printed in the same order as they were registered. Valid keys include: `cpu_time`, `cuda_time`, `cpu_time_total`, `cuda_time_total`, `cpu_memory_usage`, `cuda_memory_usage`, `self_cpu_memory_usage`, `self_cuda_memory_usage`, `count`. * **top_level_events_only** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – Boolean flag to determine the selection of events to display. If true, the profiler will only display events at top level like top-level invocation of python `lstm`, python `add` or other functions, nested events like low-level cpu/cuda ops events are omitted for profiler result readability. Returns A string containing the table. `total_average()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/autograd/profiler.html#profile.total_average) Averages all events. Returns A FunctionEventAvg object.
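As a rough sketch of how the methods above compose (the tensor shapes, module, and trace filename here are illustrative choices, not taken from the original docs), one might record shapes during profiling, group the averaged results by input shape, and export a Chrome trace:

>>> import torch
>>> x = torch.randn(64, 128)
>>> linear = torch.nn.Linear(128, 32)
>>> with torch.autograd.profiler.profile(record_shapes=True) as prof:
...     for _ in range(10):
...         y = linear(x)
>>> # group averaged events by (name, input shapes) to spot size-specific hot spots
>>> print(prof.key_averages(group_by_input_shape=True).table(sort_by="cpu_time_total"))
>>> # write a trace file that can be opened at chrome://tracing
>>> prof.export_chrome_trace("trace.json")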
`class torch.autograd.profiler.emit_nvtx(enabled=True, record_shapes=False)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/autograd/profiler.html#emit_nvtx) Context manager that makes every autograd operation emit an NVTX range. It is useful when running the program under nvprof: nvprof --profile-from-start off -o trace_name.prof -- <regular command here> Unfortunately, there’s no way to force nvprof to flush the data it collected to disk, so for CUDA profiling one has to use this context manager to annotate nvprof traces and wait for the process to exit before inspecting them. Then, either NVIDIA Visual Profiler (nvvp) can be used to visualize the timeline, or `torch.autograd.profiler.load_nvprof()` can load the results for inspection e.g. in Python REPL. Parameters * **enabled** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_ _,__default=True_) – Setting `enabled=False` makes this context manager a no-op. Default: `True`. * **record_shapes** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_ _,__default=False_) – If `record_shapes=True`, the nvtx range wrapping each autograd op will append information about the sizes of Tensor arguments received by that op, in the following format: `[[arg0.size(0), arg0.size(1), ...], [arg1.size(0), arg1.size(1), ...], ...]` Non-tensor arguments will be represented by `[]`. Arguments will be listed in the order they are received by the backend op. Please note that this order may not match the order in which those arguments were passed on the Python side. Also note that shape recording may increase the overhead of nvtx range creation. #### Example >>> with torch.cuda.profiler.profile(): ... model(x) # Warmup CUDA memory allocator and profiler ... with torch.autograd.profiler.emit_nvtx(): ... model(x) **Forward-backward correlation** When viewing a profile created using `emit_nvtx` in the Nvidia Visual Profiler, correlating each backward-pass op with the corresponding forward-pass op can be difficult. To ease this task, `emit_nvtx` appends sequence number information to the ranges it generates. During the forward pass, each function range is decorated with `seq=<N>`. `seq` is a running counter, incremented each time a new backward Function object is created and stashed for backward. Thus, the `seq=<N>` annotation associated with each forward function range tells you that if a backward Function object is created by this forward function, the backward object will receive sequence number N. During the backward pass, the top-level range wrapping each C++ backward Function’s `apply()` call is decorated with `stashed seq=<M>`. `M` is the sequence number that the backward object was created with. By comparing `stashed seq` numbers in backward with `seq` numbers in forward, you can track down which forward op created each backward Function. Any functions executed during the backward pass are also decorated with `seq=<N>`. During default backward (with `create_graph=False`) this information is irrelevant, and in fact, `N` may simply be 0 for all such functions. Only the top-level ranges associated with backward Function objects’ `apply()` methods are useful, as a way to correlate these Function objects with the earlier forward pass. **Double-backward** If, on the other hand, a backward pass with `create_graph=True` is underway (in other words, if you are setting up for a double-backward), each function’s execution during backward is given a nonzero, useful `seq=<N>`.
Those functions may themselves create Function objects to be executed later during double-backward, just as the original functions in the forward pass did. The relationship between backward and double-backward is conceptually the same as the relationship between forward and backward: The functions still emit current-sequence-number-tagged ranges, the Function objects they create still stash those sequence numbers, and during the eventual double-backward, the Function objects’ `apply()` ranges are still tagged with `stashed seq` numbers, which can be compared to `seq` numbers from the backward pass. `torch.autograd.profiler.load_nvprof(path)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/autograd/profiler.html#load_nvprof) Opens an nvprof trace file and parses autograd annotations. Parameters **path** ([str](https://docs.python.org/3/library/stdtypes.html#str "\(in Python v3.9\)")) – path to nvprof trace ## Anomaly detection `class torch.autograd.detect_anomaly` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/autograd/anomaly_mode.html#detect_anomaly) Context-manager that enable anomaly detection for the autograd engine. This does two things: * Running the forward pass with detection enabled will allow the backward pass to print the traceback of the forward operation that created the failing backward function. * Any backward computation that generate “nan” value will raise an error. Warning This mode should be enabled only for debugging as the different tests will slow down your program execution. #### Example >>> import torch >>> from torch import autograd >>> class MyFunc(autograd.Function): ... @staticmethod ... def forward(ctx, inp): ... return inp.clone() ... @staticmethod ... def backward(ctx, gO): ... # Error during the backward pass ... raise RuntimeError("Some error in backward") ... return gO.clone() >>> def run_fn(a): ... out = MyFunc.apply(a) ... return out.sum() >>> inp = torch.rand(10, 10, requires_grad=True) >>> out = run_fn(inp) >>> out.backward() Traceback (most recent call last): File "", line 1, in File "/your/pytorch/install/torch/tensor.py", line 93, in backward torch.autograd.backward(self, gradient, retain_graph, create_graph) File "/your/pytorch/install/torch/autograd/__init__.py", line 90, in backward allow_unreachable=True) # allow_unreachable flag File "/your/pytorch/install/torch/autograd/function.py", line 76, in apply return self._forward_cls.backward(self, *args) File "", line 8, in backward RuntimeError: Some error in backward >>> with autograd.detect_anomaly(): ... inp = torch.rand(10, 10, requires_grad=True) ... out = run_fn(inp) ... out.backward() Traceback of forward call that caused the error: File "tmp.py", line 53, in out = run_fn(inp) File "tmp.py", line 44, in run_fn out = MyFunc.apply(a) Traceback (most recent call last): File "", line 4, in File "/your/pytorch/install/torch/tensor.py", line 93, in backward torch.autograd.backward(self, gradient, retain_graph, create_graph) File "/your/pytorch/install/torch/autograd/__init__.py", line 90, in backward allow_unreachable=True) # allow_unreachable flag File "/your/pytorch/install/torch/autograd/function.py", line 76, in apply return self._forward_cls.backward(self, *args) File "", line 8, in backward RuntimeError: Some error in backward `class torch.autograd.set_detect_anomaly(mode)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/autograd/anomaly_mode.html#set_detect_anomaly) Context-manager that sets the anomaly detection for the autograd engine on or off. 
`set_detect_anomaly` will enable or disable the autograd anomaly detection based on its argument `mode`. It can be used as a context-manager or as a function. See `detect_anomaly` above for details of the anomaly detection behaviour. Parameters **mode** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – Flag whether to enable anomaly detection (`True`), or disable (`False`). # torch.backends `torch.backends` controls the behavior of various backends that PyTorch supports. These backends include: * `torch.backends.cuda` * `torch.backends.cudnn` * `torch.backends.mkl` * `torch.backends.mkldnn` * `torch.backends.openmp` ## torch.backends.cuda `torch.backends.cuda.is_built()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/backends/cuda.html#is_built) Returns whether PyTorch is built with CUDA support. Note that this doesn’t necessarily mean CUDA is available; just that if this PyTorch binary were run on a machine with working CUDA drivers and devices, we would be able to use it. `torch.backends.cuda.matmul.allow_tf32` A [`bool`](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") that controls whether TensorFloat-32 tensor cores may be used in matrix multiplications on Ampere or newer GPUs. See [TensorFloat-32(TF32) on Ampere devices](https://pytorch.org/docs/1.8.0/notes/cuda.html#tf32-on-ampere). `torch.backends.cuda.cufft_plan_cache` `cufft_plan_cache` caches the cuFFT plans. `size` A readonly [`int`](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") that shows the number of plans currently in the cuFFT plan cache. `max_size` An [`int`](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") that controls the capacity of the cuFFT plan cache. `clear()` Clears the cuFFT plan cache. ## torch.backends.cudnn `torch.backends.cudnn.version()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/backends/cudnn.html#version) Returns the version of cuDNN. `torch.backends.cudnn.is_available()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/backends/cudnn.html#is_available) Returns a bool indicating if CUDNN is currently available. `torch.backends.cudnn.enabled` A [`bool`](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") that controls whether cuDNN is enabled. `torch.backends.cudnn.allow_tf32` A [`bool`](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") that controls whether TensorFloat-32 tensor cores may be used in cuDNN convolutions on Ampere or newer GPUs. See [TensorFloat-32(TF32) on Ampere devices](https://pytorch.org/docs/1.8.0/notes/cuda.html#tf32-on-ampere). `torch.backends.cudnn.deterministic` A [`bool`](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") that, if True, causes cuDNN to only use deterministic convolution algorithms. See also [`torch.are_deterministic_algorithms_enabled()`](generated/torch.are_deterministic_algorithms_enabled#torch.are_deterministic_algorithms_enabled "torch.are_deterministic_algorithms_enabled") and [`torch.use_deterministic_algorithms()`](generated/torch.use_deterministic_algorithms#torch.use_deterministic_algorithms "torch.use_deterministic_algorithms"). `torch.backends.cudnn.benchmark` A [`bool`](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") that, if True, causes cuDNN to benchmark multiple convolution algorithms and select the fastest.
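The flags above are plain module attributes, so adjusting them is just attribute assignment. A minimal sketch (the particular values are chosen for illustration; whether you want determinism or TF32 depends on your workload):

>>> import torch
>>> torch.backends.cuda.is_built()        # built with CUDA support?
>>> torch.backends.cudnn.is_available()   # cuDNN usable at runtime?
>>> # trade speed for reproducibility: disable algorithm benchmarking and
>>> # restrict cuDNN to deterministic convolution algorithms
>>> torch.backends.cudnn.benchmark = False
>>> torch.backends.cudnn.deterministic = True
>>> # opt out of TF32 tensor cores on Ampere GPUs to keep full float32 precision
>>> torch.backends.cuda.matmul.allow_tf32 = False
>>> torch.backends.cudnn.allow_tf32 = False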
## torch.backends.mkl `torch.backends.mkl.is_available()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/backends/mkl.html#is_available) Returns whether PyTorch is built with MKL support. ## torch.backends.mkldnn `torch.backends.mkldnn.is_available()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/backends/mkldnn.html#is_available) Returns whether PyTorch is built with MKL-DNN support. ## torch.backends.openmp `torch.backends.openmp.is_available()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/backends/openmp.html#is_available) Returns whether PyTorch is built with OpenMP support. # Benchmark Utils - torch.utils.benchmark `class torch.utils.benchmark.Timer(stmt='pass', setup='pass', timer=<built-in function perf_counter>, globals=None, label=None, sub_label=None, description=None, env=None, num_threads=1, language=<Language.PYTHON: 0>)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/utils/benchmark/utils/timer.html#Timer) Helper class for measuring execution time of PyTorch statements. For a full tutorial on how to use this class, see the PyTorch benchmark recipe in the tutorials. The PyTorch Timer is based on `timeit.Timer` (and in fact uses `timeit.Timer` internally), but with several key differences: 1. Runtime aware: Timer will perform warmups (important as some elements of PyTorch are lazily initialized), set threadpool size so that comparisons are apples-to-apples, and synchronize asynchronous CUDA functions when necessary. 2. Focus on replicates: When measuring code, and particularly complex kernels / models, run-to-run variation is a significant confounding factor. It is expected that all measurements should include replicates to quantify noise and allow median computation, which is more robust than mean. To that effect, this class deviates from the `timeit` API by conceptually merging `timeit.Timer.repeat` and `timeit.Timer.autorange`. (Exact algorithms are discussed in method docstrings.) The `timeit` method is replicated for cases where an adaptive strategy is not desired. 3. Optional metadata: When defining a Timer, one can optionally specify `label`, `sub_label`, `description`, and `env`. (Defined later) These fields are included in the representation of the result object and used by the `Compare` class to group and display results for comparison. 4. Instruction counts: In addition to wall times, Timer can run a statement under Callgrind and report instructions executed. Directly analogous to `timeit.Timer` constructor arguments: `stmt`, `setup`, `timer`, `globals`. PyTorch Timer specific constructor arguments: `label`, `sub_label`, `description`, `env`, `num_threads`. Parameters * **stmt** – Code snippet to be run in a loop and timed. * **setup** – Optional setup code. Used to define variables used in `stmt`. * **timer** – Callable which returns the current time. If PyTorch was built without CUDA or there is no GPU present, this defaults to `timeit.default_timer`; otherwise it will synchronize CUDA before measuring the time. * **globals** – A dict which defines the global variables when `stmt` is being executed. This is the other method for providing variables which `stmt` needs. * **label** – String which summarizes `stmt`. For instance, if `stmt` is “torch.nn.functional.relu(torch.add(x, 1, out=out))” one might set label to “ReLU(x + 1)” to improve readability. * **sub_label** – Provide supplemental information to disambiguate measurements with identical stmt or label.
For instance, in our example above sub_label might be “float” or “int”, so that it is easy to differentiate: “ReLU(x + 1): (float)” “ReLU(x + 1): (int)” when printing Measurements or summarizing using `Compare`. * **description** – String to distinguish measurements with identical label and sub_label. The principal use of `description` is to signal to `Compare` the columns of data. For instance one might set it based on the input size to create a table of the form: | n=1 | n=4 | ... ------------- ... ReLU(x + 1): (float) | ... | ... | ... ReLU(x + 1): (int) | ... | ... | ... using `Compare`. It is also included when printing a Measurement. * **env** – This tag indicates that otherwise identical tasks were run in different environments, and are therefore not equivalent, for instance when A/B testing a change to a kernel. `Compare` will treat Measurements with different `env` specification as distinct when merging replicate runs. * **num_threads** – The size of the PyTorch threadpool when executing `stmt`. Single-threaded performance is important as both a key inference workload and a good indicator of intrinsic algorithmic efficiency, so the default is set to one. This is in contrast to the default PyTorch threadpool size which tries to utilize all cores. `blocked_autorange(callback=None, min_run_time=0.2)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/utils/benchmark/utils/timer.html#Timer.blocked_autorange) Measure many replicates while keeping timer overhead to a minimum. At a high level, blocked_autorange executes the following pseudo-code: `setup` total_time = 0 while total_time < min_run_time start = timer() for _ in range(block_size): `stmt` total_time += (timer() - start) Note the variable `block_size` in the inner loop. The choice of block size is important to measurement quality, and must balance two competing objectives: 1. A small block size results in more replicates and generally better statistics. 2. A large block size better amortizes the cost of `timer` invocation, and results in a less biased measurement. This is important because CUDA synchronization time is non-trivial (order single to low double digit microseconds) and would otherwise bias the measurement. blocked_autorange sets block_size by running a warmup period, increasing block size until timer overhead is less than 0.1% of the overall computation. This value is then used for the main measurement loop. Returns A `Measurement` object that contains measured runtimes and repetition counts, and can be used to compute statistics. (mean, median, etc.) `collect_callgrind(number=100, collect_baseline=True)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/utils/benchmark/utils/timer.html#Timer.collect_callgrind) Collect instruction counts using Callgrind. Unlike wall times, instruction counts are deterministic (modulo non-determinism in the program itself and small amounts of jitter from the Python interpreter.) This makes them ideal for detailed performance analysis. This method runs `stmt` in a separate process so that Valgrind can instrument the program. Performance is severely degraded due to the instrumentation, however this is ameliorated by the fact that a small number of iterations is generally sufficient to obtain good measurements. In order to use this method `valgrind`, `callgrind_control`, and `callgrind_annotate` must be installed. Because there is a process boundary between the caller (this process) and the `stmt` execution, `globals` cannot contain arbitrary in-memory data structures.
(Unlike timing methods) Instead, globals are restricted to builtins, `nn.Modules`’s, and TorchScripted functions/modules to reduce the surprise factor from serialization and subsequent deserialization. The `GlobalsBridge` class provides more detail on this subject. Take particular care with nn.Modules: they rely on pickle and you may need to add an import to `setup` for them to transfer properly. By default, a profile for an empty statement will be collected and cached to indicate how many instructions are from the Python loop which drives `stmt`. Returns A `CallgrindStats` object which provides instruction counts and some basic facilities for analyzing and manipulating results. `timeit(number=1000000)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/utils/benchmark/utils/timer.html#Timer.timeit) Mirrors the semantics of timeit.Timer.timeit(). Execute the main statement (`stmt`) `number` times. `class torch.utils.benchmark.Measurement(number_per_run, raw_times, task_spec, metadata=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/utils/benchmark/utils/common.html#Measurement) The result of a Timer measurement. This class stores one or more measurements of a given statement. It is serializable and provides several convenience methods (including a detailed __repr__) for downstream consumers. `static merge(measurements)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/utils/benchmark/utils/common.html#Measurement.merge) Convenience method for merging replicates. Merge will extrapolate times to `number_per_run=1` and will not transfer any metadata. (Since it might differ between replicates) `property significant_figures` Approximate significant figure estimate. This property is intended to give a convenient way to estimate the precision of a measurement. It only uses the interquartile region to estimate statistics to try to mitigate skew from the tails, and uses a static z value of 1.645 since it is not expected to be used for small values of `n`, so z can approximate `t`. The significant figure estimation used in conjunction with the `trim_sigfig` method to provide a more human interpretable data summary. __repr__ does not use this method; it simply displays raw values. Significant figure estimation is intended for `Compare`. `class torch.utils.benchmark.CallgrindStats(task_spec, number_per_run, built_with_debug_symbols, baseline_inclusive_stats, baseline_exclusive_stats, stmt_inclusive_stats, stmt_exclusive_stats)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/utils/benchmark/utils/valgrind_wrapper/timer_interface.html#CallgrindStats) Top level container for Callgrind results collected by Timer. Manipulation is generally done using the FunctionCounts class, which is obtained by calling `CallgrindStats.stats(…)`. Several convenience methods are provided as well; the most significant is `CallgrindStats.as_standardized()`. `as_standardized()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/utils/benchmark/utils/valgrind_wrapper/timer_interface.html#CallgrindStats.as_standardized) Strip library names and some prefixes from function strings. When comparing two different sets of instruction counts, on stumbling block can be path prefixes. Callgrind includes the full filepath when reporting a function (as it should). However, this can cause issues when diffing profiles. 
If a key component such as Python or PyTorch was built in separate locations in the two profiles, this can result in something resembling: 23234231 /tmp/first_build_dir/thing.c:foo(...) 9823794 /tmp/first_build_dir/thing.c:bar(...) ... 53453 .../aten/src/Aten/...:function_that_actually_changed(...) ... -9823794 /tmp/second_build_dir/thing.c:bar(...) -23234231 /tmp/second_build_dir/thing.c:foo(...) Stripping prefixes can ameliorate this issue by regularizing the strings and causing better cancellation of equivalent call sites when diffing. `counts(*, denoise=False)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/utils/benchmark/utils/valgrind_wrapper/timer_interface.html#CallgrindStats.counts) Returns the total number of instructions executed. See `FunctionCounts.denoise()` for an explanation of the `denoise` arg. `delta(other, inclusive=False, subtract_baselines=True)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/utils/benchmark/utils/valgrind_wrapper/timer_interface.html#CallgrindStats.delta) Diff two sets of counts. One common reason to collect instruction counts is to determine the effect that a particular change will have on the number of instructions needed to perform some unit of work. If a change increases that number, the next logical question is “why”. This generally involves looking at what part of the code increased in instruction count. This function automates that process so that one can easily diff counts on both an inclusive and exclusive basis. The `subtract_baselines` argument allows one to disable baseline correction, though in most cases it shouldn’t matter as the baselines are expected to more or less cancel out. `stats(inclusive=False)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/utils/benchmark/utils/valgrind_wrapper/timer_interface.html#CallgrindStats.stats) Returns detailed function counts. Conceptually, the FunctionCounts returned can be thought of as a tuple of (count, path_and_function_name) tuples. `inclusive` matches the semantics of callgrind. If True, the counts include instructions executed by children. `inclusive=True` is useful for identifying hot spots in code; `inclusive=False` is useful for reducing noise when diffing counts from two different runs. (See CallgrindStats.delta(…) for more details) `class torch.utils.benchmark.FunctionCounts(_data, inclusive, _linewidth=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/utils/benchmark/utils/valgrind_wrapper/timer_interface.html#FunctionCounts) Container for manipulating Callgrind results. It supports: 1. Addition and subtraction to combine or diff results. 2. Tuple-like indexing. 3. A `denoise` function which strips CPython calls which are known to be non-deterministic and quite noisy. 4. Two higher order methods (`filter` and `transform`) for custom manipulation. `denoise()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/utils/benchmark/utils/valgrind_wrapper/timer_interface.html#FunctionCounts.denoise) Remove known noisy instructions. Several instructions in the CPython interpreter are rather noisy. These instructions involve unicode to dictionary lookups which Python uses to map variable names. FunctionCounts is generally a content agnostic container, however this is sufficiently important for obtaining reliable results to warrant an exception.
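To make the `as_standardized` / `delta` / `denoise` workflow above concrete, here is a minimal sketch of diffing instruction counts for two statements. It assumes `valgrind`, `callgrind_control`, and `callgrind_annotate` are installed; the two statements (`x.mul(2)` versus `x * 2`) and the replicate count are arbitrary illustrations, not taken from the original docs:

>>> from torch.utils.benchmark import Timer
>>> t0 = Timer(stmt="x.mul(2)", setup="import torch; x = torch.ones(128)")
>>> t1 = Timer(stmt="x * 2", setup="import torch; x = torch.ones(128)")
>>> stats0 = t0.collect_callgrind(number=10)
>>> stats1 = t1.collect_callgrind(number=10)
>>> # strip build paths so equivalent call sites cancel, then diff exclusive counts
>>> delta = stats1.as_standardized().delta(stats0.as_standardized(), inclusive=False)
>>> print(delta.denoise())   # drop known-noisy CPython dictionary lookups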
`filter(filter_fn)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/utils/benchmark/utils/valgrind_wrapper/timer_interface.html#FunctionCounts.filter) Keep only the elements where `filter_fn` applied to function name returns True. `transform(map_fn)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/utils/benchmark/utils/valgrind_wrapper/timer_interface.html#FunctionCounts.transform) Apply `map_fn` to all of the function names. This can be used to regularize function names (e.g. stripping irrelevant parts of the file path), coalesce entries by mapping multiple functions to the same name (in which case the counts are added together), etc. # torch.utils.bottleneck `torch.utils.bottleneck` is a tool that can be used as an initial step for debugging bottlenecks in your program. It summarizes runs of your script with the Python profiler and PyTorch’s autograd profiler. Run it on the command line with python -m torch.utils.bottleneck /path/to/source/script.py [args] where [args] are any number of arguments to `script.py`, or run `python -m torch.utils.bottleneck -h` for more usage instructions. Warning Because your script will be profiled, please ensure that it exits in a finite amount of time. Warning Due to the asynchronous nature of CUDA kernels, when running against CUDA code, the cProfile output and CPU-mode autograd profilers may not show correct timings: the reported CPU time reports the amount of time used to launch the kernels but does not include the time the kernel spent executing on a GPU unless the operation does a synchronize. Ops that do synchronize appear to be extremely expensive under regular CPU-mode profilers. In these case where timings are incorrect, the CUDA-mode autograd profiler may be helpful. Note To decide which (CPU-only-mode or CUDA-mode) autograd profiler output to look at, you should first check if your script is CPU-bound (“CPU total time is much greater than CUDA total time”). If it is CPU-bound, looking at the results of the CPU-mode autograd profiler will help. If on the other hand your script spends most of its time executing on the GPU, then it makes sense to start looking for responsible CUDA operators in the output of the CUDA-mode autograd profiler. Of course the reality is much more complicated and your script might not be in one of those two extremes depending on the part of the model you’re evaluating. If the profiler outputs don’t help, you could try looking at the result of [`torch.autograd.profiler.emit_nvtx()`](autograd#torch.autograd.profiler.emit_nvtx "torch.autograd.profiler.emit_nvtx") with `nvprof`. However, please take into account that the NVTX overhead is very high and often gives a heavily skewed timeline. Warning If you are profiling CUDA code, the first profiler that `bottleneck` runs (cProfile) will include the CUDA startup time (CUDA buffer allocation cost) in its time reporting. This should not matter if your bottlenecks result in code much slower than the CUDA startup time. For more complicated uses of the profilers (like in a multi-GPU case), please see or [`torch.autograd.profiler.profile()`](autograd#torch.autograd.profiler.profile "torch.autograd.profiler.profile") for more information. # torch.utils.checkpoint Note Checkpointing is implemented by rerunning a forward-pass segment for each checkpointed segment during backward. This can cause persistent states like the RNG state to be advanced than they would without checkpointing. 
By default, checkpointing includes logic to juggle the RNG state such that checkpointed passes making use of RNG (through dropout for example) have deterministic output as compared to non-checkpointed passes. The logic to stash and restore RNG states can incur a moderate performance hit depending on the runtime of checkpointed operations. If deterministic output compared to non-checkpointed passes is not required, supply `preserve_rng_state=False` to `checkpoint` or `checkpoint_sequential` to omit stashing and restoring the RNG state during each checkpoint. The stashing logic saves and restores the RNG state for the current device and the device of all cuda Tensor arguments to the `run_fn`. However, the logic has no way to anticipate if the user will move Tensors to a new device within the `run_fn` itself. Therefore, if you move Tensors to a new device (“new” meaning not belonging to the set of [current device + devices of Tensor arguments]) within `run_fn`, deterministic output compared to non-checkpointed passes is never guaranteed. `torch.utils.checkpoint.checkpoint(function, *args, **kwargs)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/utils/checkpoint.html#checkpoint) Checkpoint a model or part of the model Checkpointing works by trading compute for memory. Rather than storing all intermediate activations of the entire computation graph for computing backward, the checkpointed part does **not** save intermediate activations, and instead recomputes them in backward pass. It can be applied on any part of a model. Specifically, in the forward pass, `function` will run in [`torch.no_grad()`](generated/torch.no_grad#torch.no_grad "torch.no_grad") manner, i.e., not storing the intermediate activations. Instead, the forward pass saves the inputs tuple and the `function` parameter. In the backwards pass, the saved inputs and `function` is retrieved, and the forward pass is computed on `function` again, now tracking the intermediate activations, and then the gradients are calculated using these activation values. Warning Checkpointing doesn’t work with [`torch.autograd.grad()`](autograd#torch.autograd.grad "torch.autograd.grad"), but only with [`torch.autograd.backward()`](autograd#torch.autograd.backward "torch.autograd.backward"). Warning If `function` invocation during backward does anything different than the one during forward, e.g., due to some global variable, the checkpointed version won’t be equivalent, and unfortunately it can’t be detected. Warning If checkpointed segment contains tensors detached from the computational graph by `detach()` or `torch.no_grad()`, the backward pass will raise an error. This is because `checkpoint` makes all the outputs require gradients which causes issues when a tensor is defined to have no gradient in the model. To circumvent this, detach the tensors outside of the `checkpoint` function. Parameters * **function** – describes what to run in the forward pass of the model or part of the model. It should also know how to handle the inputs passed as the tuple. For example, in LSTM, if user passes `(activation, hidden)`, `function` should correctly use the first input as `activation` and the second input as `hidden` * **preserve_rng_state** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_ _,__default=True_) – Omit stashing and restoring the RNG state during each checkpoint. 
* **args** – tuple containing inputs to the `function` Returns Output of running `function` on `*args` `torch.utils.checkpoint.checkpoint_sequential(functions, segments, input, **kwargs)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/utils/checkpoint.html#checkpoint_sequential) A helper function for checkpointing sequential models. Sequential models execute a list of modules/functions in order (sequentially). Therefore, we can divide such a model in various segments and checkpoint each segment. All segments except the last will run in [`torch.no_grad()`](generated/torch.no_grad#torch.no_grad "torch.no_grad") manner, i.e., not storing the intermediate activations. The inputs of each checkpointed segment will be saved for re-running the segment in the backward pass. See `checkpoint()` on how checkpointing works. Warning Checkpointing doesn’t work with [`torch.autograd.grad()`](autograd#torch.autograd.grad "torch.autograd.grad"), but only with [`torch.autograd.backward()`](autograd#torch.autograd.backward "torch.autograd.backward"). Parameters * **functions** – A [`torch.nn.Sequential`](generated/torch.nn.sequential#torch.nn.Sequential "torch.nn.Sequential") or the list of modules or functions (comprising the model) to run sequentially. * **segments** – Number of chunks to create in the model * **input** – A Tensor that is input to `functions` * **preserve_rng_state** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_ _,__default=True_) – Omit stashing and restoring the RNG state during each checkpoint. Returns Output of running `functions` sequentially on `*inputs` #### Example >>> model = nn.Sequential(...) >>> input_var = checkpoint_sequential(model, chunks, input_var) # torch.utils.cpp_extension `torch.utils.cpp_extension.CppExtension(name, sources, *args, **kwargs)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/utils/cpp_extension.html#CppExtension) Creates a `setuptools.Extension` for C++. Convenience method that creates a `setuptools.Extension` with the bare minimum (but often sufficient) arguments to build a C++ extension. All arguments are forwarded to the `setuptools.Extension` constructor. #### Example >>> from setuptools import setup >>> from torch.utils.cpp_extension import BuildExtension, CppExtension >>> setup( name='extension', ext_modules=[ CppExtension( name='extension', sources=['extension.cpp'], extra_compile_args=['-g']), ], cmdclass={ 'build_ext': BuildExtension }) `torch.utils.cpp_extension.CUDAExtension(name, sources, *args, **kwargs)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/utils/cpp_extension.html#CUDAExtension) Creates a `setuptools.Extension` for CUDA/C++. Convenience method that creates a `setuptools.Extension` with the bare minimum (but often sufficient) arguments to build a CUDA/C++ extension. This includes the CUDA include path, library path and runtime library. All arguments are forwarded to the `setuptools.Extension` constructor. #### Example >>> from setuptools import setup >>> from torch.utils.cpp_extension import BuildExtension, CUDAExtension >>> setup( name='cuda_extension', ext_modules=[ CUDAExtension( name='cuda_extension', sources=['extension.cpp', 'extension_kernel.cu'], extra_compile_args={'cxx': ['-g'], 'nvcc': ['-O2']}) ], cmdclass={ 'build_ext': BuildExtension }) Compute capabilities: By default the extension will be compiled to run on all archs of the cards visible during the building process of the extension, plus PTX. 
If a new card is installed down the road, the extension may need to be recompiled. If a visible card has a compute capability (CC) that’s newer than the newest version for which your nvcc can build fully-compiled binaries, PyTorch will make nvcc fall back to building kernels with the newest version of PTX your nvcc does support (see below for details on PTX). You can override the default behavior using `TORCH_CUDA_ARCH_LIST` to explicitly specify which CCs you want the extension to support: TORCH_CUDA_ARCH_LIST="6.1 8.6" python build_my_extension.py TORCH_CUDA_ARCH_LIST="5.2 6.0 6.1 7.0 7.5 8.0 8.6+PTX" python build_my_extension.py The +PTX option causes extension kernel binaries to include PTX instructions for the specified CC. PTX is an intermediate representation that allows kernels to runtime-compile for any CC >= the specified CC (for example, 8.6+PTX generates PTX that can runtime-compile for any GPU with CC >= 8.6). This improves your binary’s forward compatibility. However, relying on older PTX to provide forward compat by runtime-compiling for newer CCs can modestly reduce performance on those newer CCs. If you know exact CC(s) of the GPUs you want to target, you’re always better off specifying them individually. For example, if you want your extension to run on 8.0 and 8.6, “8.0+PTX” would work functionally because it includes PTX that can runtime-compile for 8.6, but “8.0 8.6” would be better. Note that while it’s possible to include all supported archs, the more archs get included the slower the building process will be, as it will build a separate kernel image for each arch. `torch.utils.cpp_extension.BuildExtension(*args, **kwargs)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/utils/cpp_extension.html#BuildExtension) A custom `setuptools` build extension. This `setuptools.build_ext` subclass takes care of passing the minimum required compiler flags (e.g. `-std=c++14`) as well as mixed C++/CUDA compilation (and support for CUDA files in general). When using `BuildExtension`, it is allowed to supply a dictionary for `extra_compile_args` (rather than the usual list) that maps from languages (`cxx` or `nvcc`) to a list of additional compiler flags to supply to the compiler. This makes it possible to supply different flags to the C++ and CUDA compiler during mixed compilation. `use_ninja` (bool): If `use_ninja` is `True` (default), then we attempt to build using the Ninja backend. Ninja greatly speeds up compilation compared to the standard `setuptools.build_ext`. Falls back to the standard distutils backend if Ninja is not available. Note By default, the Ninja backend uses #CPUS + 2 workers to build the extension. This may use up too many resources on some systems. One can control the number of workers by setting the `MAX_JOBS` environment variable to a non-negative number. `torch.utils.cpp_extension.load(name, sources, extra_cflags=None, extra_cuda_cflags=None, extra_ldflags=None, extra_include_paths=None, build_directory=None, verbose=False, with_cuda=None, is_python_module=True, is_standalone=False, keep_intermediates=True)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/utils/cpp_extension.html#load) Loads a PyTorch C++ extension just-in-time (JIT). To load an extension, a Ninja build file is emitted, which is used to compile the given sources into a dynamic library. This library is subsequently loaded into the current Python process as a module and returned from this function, ready for use.
By default, the directory to which the build file is emitted and the resulting library compiled to is `<tmp>/torch_extensions/<name>`, where `<tmp>` is the temporary folder on the current platform and `<name>` the name of the extension. This location can be overridden in two ways. First, if the `TORCH_EXTENSIONS_DIR` environment variable is set, it replaces `<tmp>/torch_extensions` and all extensions will be compiled into subfolders of this directory. Second, if the `build_directory` argument to this function is supplied, it overrides the entire path, i.e. the library will be compiled into that folder directly. To compile the sources, the default system compiler (`c++`) is used, which can be overridden by setting the `CXX` environment variable. To pass additional arguments to the compilation process, `extra_cflags` or `extra_ldflags` can be provided. For example, to compile your extension with optimizations, pass `extra_cflags=['-O3']`. You can also use `extra_cflags` to pass further include directories. CUDA support with mixed compilation is provided. Simply pass CUDA source files (`.cu` or `.cuh`) along with other sources. Such files will be detected and compiled with nvcc rather than the C++ compiler. This includes passing the CUDA lib64 directory as a library directory, and linking `cudart`. You can pass additional flags to nvcc via `extra_cuda_cflags`, just like with `extra_cflags` for C++. Various heuristics for finding the CUDA install directory are used, which usually work fine. If not, setting the `CUDA_HOME` environment variable is the safest option. Parameters * **name** – The name of the extension to build. This MUST be the same as the name of the pybind11 module! * **sources** – A list of relative or absolute paths to C++ source files. * **extra_cflags** – optional list of compiler flags to forward to the build. * **extra_cuda_cflags** – optional list of compiler flags to forward to nvcc when building CUDA sources. * **extra_ldflags** – optional list of linker flags to forward to the build. * **extra_include_paths** – optional list of include directories to forward to the build. * **build_directory** – optional path to use as build workspace. * **verbose** – If `True`, turns on verbose logging of load steps. * **with_cuda** – Determines whether CUDA headers and libraries are added to the build. If set to `None` (default), this value is automatically determined based on the existence of `.cu` or `.cuh` in `sources`. Set it to `True` to force CUDA headers and libraries to be included. * **is_python_module** – If `True` (default), imports the produced shared library as a Python module. If `False`, behavior depends on `is_standalone`. * **is_standalone** – If `False` (default) loads the constructed extension into the process as a plain dynamic library. If `True`, build a standalone executable. Returns Returns the loaded PyTorch extension as a Python module. If `is_python_module` is `False` and `is_standalone` is `False`: returns nothing. (The shared library is loaded into the process as a side effect.) If `is_standalone` is `True`: returns the path to the executable. (On Windows, TORCH_LIB_PATH is added to the PATH environment variable as a side effect.)
#### Example

    >>> from torch.utils.cpp_extension import load
    >>> module = load(
            name='extension',
            sources=['extension.cpp', 'extension_kernel.cu'],
            extra_cflags=['-O2'],
            verbose=True)

`torch.utils.cpp_extension.load_inline(name, cpp_sources, cuda_sources=None, functions=None, extra_cflags=None, extra_cuda_cflags=None, extra_ldflags=None, extra_include_paths=None, build_directory=None, verbose=False, with_cuda=None, is_python_module=True, with_pytorch_error_handling=True, keep_intermediates=True)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/utils/cpp_extension.html#load_inline)

Loads a PyTorch C++ extension just-in-time (JIT) from string sources.

This function behaves exactly like `load()`, but takes its sources as strings rather than filenames. These strings are stored to files in the build directory, after which the behavior of `load_inline()` is identical to `load()`. See [the tests](https://github.com/pytorch/pytorch/blob/master/test/test_cpp_extensions.py) for good examples of using this function.

Sources may omit two required parts of a typical non-inline C++ extension: the necessary header includes, as well as the (pybind11) binding code. More precisely, strings passed to `cpp_sources` are first concatenated into a single `.cpp` file. This file is then prepended with `#include <torch/extension.h>`. Furthermore, if the `functions` argument is supplied, bindings will be automatically generated for each function specified. `functions` can either be a list of function names, or a dictionary mapping from function names to docstrings. If a list is given, the name of each function is used as its docstring.

The sources in `cuda_sources` are concatenated into a separate `.cu` file and prepended with `torch/types.h`, `cuda.h` and `cuda_runtime.h` includes. The `.cpp` and `.cu` files are compiled separately, but ultimately linked into a single library. Note that no bindings are generated for functions in `cuda_sources` per se. To bind to a CUDA kernel, you must create a C++ function that calls it, and either declare or define this C++ function in one of the `cpp_sources` (and include its name in `functions`).

See `load()` for a description of arguments omitted below.

Parameters

* **cpp_sources** – A string, or list of strings, containing C++ source code.
* **cuda_sources** – A string, or list of strings, containing CUDA source code.
* **functions** – A list of function names for which to generate function bindings. If a dictionary is given, it should map function names to docstrings (which are otherwise just the function names).
* **with_cuda** – Determines whether CUDA headers and libraries are added to the build. If set to `None` (default), this value is automatically determined based on whether `cuda_sources` is provided. Set it to `True` to force CUDA headers and libraries to be included.
* **with_pytorch_error_handling** – Determines whether PyTorch error and warning macros are handled by PyTorch rather than pybind11. To do this, each function `foo` is called via an intermediary `_safe_foo` function. This redirection might cause issues in obscure cases of C++. This flag should be set to `False` when this redirect causes issues.
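To make the CUDA path concrete, here is a hedged sketch (the kernel, the `scale` wrapper, and the extension name are all hypothetical) in which `cuda_sources` defines a kernel plus a C++ launcher, `cpp_sources` merely declares the launcher, and `functions` lists it so a binding is generated. The example that follows shows the simpler C++-only case.

    import torch
    from torch.utils.cpp_extension import load_inline

    # Kernel and a host-side launcher; compiled by nvcc.
    cuda_source = '''
    __global__ void scale_kernel(const float* x, float* out, float a, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) out[i] = a * x[i];
    }

    at::Tensor scale(at::Tensor x, float a) {
        auto out = at::empty_like(x);
        int n = x.numel();
        int threads = 256;
        int blocks = (n + threads - 1) / threads;
        scale_kernel<<<blocks, threads>>>(x.data_ptr<float>(), out.data_ptr<float>(), a, n);
        return out;
    }
    '''

    # Declaration only; the binding is generated because "scale" appears in `functions`.
    cpp_source = 'at::Tensor scale(at::Tensor x, float a);'

    module = load_inline(name='scale_extension',
                         cpp_sources=[cpp_source],
                         cuda_sources=[cuda_source],
                         functions=['scale'])

    y = module.scale(torch.randn(16, device='cuda'), 2.0)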
#### Example

    >>> from torch.utils.cpp_extension import load_inline
    >>> source = '''
    at::Tensor sin_add(at::Tensor x, at::Tensor y) {
      return x.sin() + y.sin();
    }
    '''
    >>> module = load_inline(name='inline_extension',
                             cpp_sources=[source],
                             functions=['sin_add'])

Note By default, the Ninja backend uses #CPUS + 2 workers to build the extension. This may use up too many resources on some systems. One can control the number of workers by setting the `MAX_JOBS` environment variable to a non-negative number.

`torch.utils.cpp_extension.include_paths(cuda=False)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/utils/cpp_extension.html#include_paths)

Get the include paths required to build a C++ or CUDA extension.

Parameters **cuda** – If `True`, includes CUDA-specific include paths.

Returns A list of include path strings.

`torch.utils.cpp_extension.check_compiler_abi_compatibility(compiler)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/utils/cpp_extension.html#check_compiler_abi_compatibility)

Verifies that the given compiler is ABI-compatible with PyTorch.

Parameters **compiler** ([str](https://docs.python.org/3/library/stdtypes.html#str "\(in Python v3.9\)")) – The compiler executable name to check (e.g. `g++`). Must be executable in a shell process.

Returns False if the compiler is (likely) ABI-incompatible with PyTorch, else True.

`torch.utils.cpp_extension.verify_ninja_availability()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/utils/cpp_extension.html#verify_ninja_availability)

Raises `RuntimeError` if the [ninja](https://ninja-build.org/) build system is not available on the system; does nothing otherwise.

`torch.utils.cpp_extension.is_ninja_available()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/utils/cpp_extension.html#is_ninja_available)

Returns `True` if the [ninja](https://ninja-build.org/) build system is available on the system, `False` otherwise.

# torch.cuda

This package adds support for CUDA tensor types, which implement the same functions as CPU tensors but utilize GPUs for computation.

It is lazily initialized, so you can always import it, and use `is_available()` to determine if your system supports CUDA.

[CUDA semantics](https://pytorch.org/docs/1.8.0/notes/cuda.html#cuda-semantics) has more details about working with CUDA.

`torch.cuda.can_device_access_peer(device, peer_device)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/cuda.html#can_device_access_peer)

Checks if peer access between two devices is possible.

`torch.cuda.current_blas_handle()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/cuda.html#current_blas_handle)

Returns the cublasHandle_t pointer to the current cuBLAS handle.

`torch.cuda.current_device()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/cuda.html#current_device)

Returns the index of the currently selected device.

`torch.cuda.current_stream(device=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/cuda.html#current_stream)

Returns the currently selected `Stream` for a given device.

Parameters **device** ([torch.device](tensor_attributes#torch.torch.device "torch.torch.device") _or_[int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – selected device. Returns the currently selected `Stream` for the current device, given by `current_device()`, if `device` is `None` (default).
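As a quick, hedged illustration of the query functions above (it assumes nothing beyond a standard install and skips GPU work when CUDA is unavailable):

    import torch

    if torch.cuda.is_available():
        print("device count:", torch.cuda.device_count())
        print("current device index:", torch.cuda.current_device())
        print("current stream:", torch.cuda.current_stream())
    else:
        print("CUDA is not available; torch.cuda can still be imported safely.")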
`torch.cuda.default_stream(device=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/cuda.html#default_stream) Returns the default `Stream` for a given device. Parameters **device** ([torch.device](tensor_attributes#torch.torch.device "torch.torch.device") _or_[int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – selected device. Returns the default `Stream` for the current device, given by `current_device()`, if `device` is `None` (default). `class torch.cuda.device(device)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/cuda.html#device) Context-manager that changes the selected device. Parameters **device** ([torch.device](tensor_attributes#torch.torch.device "torch.torch.device") _or_[int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – device index to select. It’s a no-op if this argument is a negative integer or `None`. `torch.cuda.device_count()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/cuda.html#device_count) Returns the number of GPUs available. `class torch.cuda.device_of(obj)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/cuda.html#device_of) Context-manager that changes the current device to that of given object. You can use both tensors and storages as arguments. If a given object is not allocated on a GPU, this is a no-op. Parameters **obj** ([Tensor](tensors#torch.Tensor "torch.Tensor") _or_ _Storage_) – object allocated on the selected device. `torch.cuda.get_arch_list()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/cuda.html#get_arch_list) Returns list CUDA architectures this library was compiled for. `torch.cuda.get_device_capability(device=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/cuda.html#get_device_capability) Gets the cuda capability of a device. Parameters **device** ([torch.device](tensor_attributes#torch.torch.device "torch.torch.device") _or_[int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – device for which to return the device capability. This function is a no-op if this argument is a negative integer. It uses the current device, given by `current_device()`, if `device` is `None` (default). Returns the major and minor cuda capability of the device Return type [tuple](https://docs.python.org/3/library/stdtypes.html#tuple "\(in Python v3.9\)")([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)"), [int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) `torch.cuda.get_device_name(device=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/cuda.html#get_device_name) Gets the name of a device. Parameters **device** ([torch.device](tensor_attributes#torch.torch.device "torch.torch.device") _or_[int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – device for which to return the name. This function is a no-op if this argument is a negative integer. It uses the current device, given by `current_device()`, if `device` is `None` (default). Returns the name of the device Return type [str](https://docs.python.org/3/library/stdtypes.html#str "\(in Python v3.9\)") `torch.cuda.get_device_properties(device)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/cuda.html#get_device_properties) Gets the properties of a device. 
Parameters **device** ([torch.device](tensor_attributes#torch.torch.device "torch.torch.device") _or_[int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _or_[str](https://docs.python.org/3/library/stdtypes.html#str "\(in Python v3.9\)")) – device for which to return the properties of the device. Returns the properties of the device Return type _CudaDeviceProperties `torch.cuda.get_gencode_flags()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/cuda.html#get_gencode_flags) Returns NVCC gencode flags this library were compiled with. `torch.cuda.init()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/cuda.html#init) Initialize PyTorch’s CUDA state. You may need to call this explicitly if you are interacting with PyTorch via its C API, as Python bindings for CUDA functionality will not be available until this initialization takes place. Ordinary users should not need this, as all of PyTorch’s CUDA methods automatically initialize CUDA state on-demand. Does nothing if the CUDA state is already initialized. `torch.cuda.ipc_collect()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/cuda.html#ipc_collect) Force collects GPU memory after it has been released by CUDA IPC. Note Checks if any sent CUDA tensors could be cleaned from the memory. Force closes shared memory file used for reference counting if there is no active counters. Useful when the producer process stopped actively sending tensors and want to release unused memory. `torch.cuda.is_available()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/cuda.html#is_available) Returns a bool indicating if CUDA is currently available. `torch.cuda.is_initialized()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/cuda.html#is_initialized) Returns whether PyTorch’s CUDA state has been initialized. `torch.cuda.set_device(device)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/cuda.html#set_device) Sets the current device. Usage of this function is discouraged in favor of `device`. In most cases it’s better to use `CUDA_VISIBLE_DEVICES` environmental variable. Parameters **device** ([torch.device](tensor_attributes#torch.torch.device "torch.torch.device") _or_[int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – selected device. This function is a no-op if this argument is negative. `torch.cuda.stream(stream)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/cuda.html#stream) Context-manager that selects a given stream. All CUDA kernels queued within its context will be enqueued on a selected stream. Parameters **stream** (Stream) – selected stream. This manager is a no-op if it’s `None`. Note Streams are per-device. If the selected stream is not on the current device, this function will also change the current device to match the stream. `torch.cuda.synchronize(device=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/cuda.html#synchronize) Waits for all kernels in all streams on a CUDA device to complete. Parameters **device** ([torch.device](tensor_attributes#torch.torch.device "torch.torch.device") _or_[int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – device for which to synchronize. It uses the current device, given by `current_device()`, if `device` is `None` (default). 
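The sketch below ties together the device-selection and synchronization helpers documented above; it assumes at least one CUDA device is present:

    import torch

    # Query properties of device 0.
    name = torch.cuda.get_device_name(0)
    major, minor = torch.cuda.get_device_capability(0)
    print(f"{name}: compute capability {major}.{minor}")

    # Temporarily switch the selected device (preferred over set_device()).
    with torch.cuda.device(0):
        x = torch.randn(1024, 1024, device='cuda')  # allocated on device 0
        y = x @ x                                   # kernel is queued asynchronously
        torch.cuda.synchronize()                    # block until all queued work finishes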
## Random Number Generator `torch.cuda.get_rng_state(device='cuda')` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/cuda/random.html#get_rng_state) Returns the random number generator state of the specified GPU as a ByteTensor. Parameters **device** ([torch.device](tensor_attributes#torch.torch.device "torch.torch.device") _or_[int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – The device to return the RNG state of. Default: `'cuda'` (i.e., `torch.device('cuda')`, the current CUDA device). Warning This function eagerly initializes CUDA. `torch.cuda.get_rng_state_all()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/cuda/random.html#get_rng_state_all) Returns a list of ByteTensor representing the random number states of all devices. `torch.cuda.set_rng_state(new_state, device='cuda')` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/cuda/random.html#set_rng_state) Sets the random number generator state of the specified GPU. Parameters * **new_state** (_torch.ByteTensor_) – The desired state * **device** ([torch.device](tensor_attributes#torch.torch.device "torch.torch.device") _or_[int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – The device to set the RNG state. Default: `'cuda'` (i.e., `torch.device('cuda')`, the current CUDA device). `torch.cuda.set_rng_state_all(new_states)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/cuda/random.html#set_rng_state_all) Sets the random number generator state of all devices. Parameters **new_states** (_Iterable of torch.ByteTensor_) – The desired state for each device `torch.cuda.manual_seed(seed)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/cuda/random.html#manual_seed) Sets the seed for generating random numbers for the current GPU. It’s safe to call this function if CUDA is not available; in that case, it is silently ignored. Parameters **seed** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – The desired seed. Warning If you are working with a multi-GPU model, this function is insufficient to get determinism. To seed all GPUs, use `manual_seed_all()`. `torch.cuda.manual_seed_all(seed)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/cuda/random.html#manual_seed_all) Sets the seed for generating random numbers on all GPUs. It’s safe to call this function if CUDA is not available; in that case, it is silently ignored. Parameters **seed** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – The desired seed. `torch.cuda.seed()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/cuda/random.html#seed) Sets the seed for generating random numbers to a random number for the current GPU. It’s safe to call this function if CUDA is not available; in that case, it is silently ignored. Warning If you are working with a multi-GPU model, this function will only initialize the seed on one GPU. To initialize all GPUs, use `seed_all()`. `torch.cuda.seed_all()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/cuda/random.html#seed_all) Sets the seed for generating random numbers to a random number on all GPUs. It’s safe to call this function if CUDA is not available; in that case, it is silently ignored. `torch.cuda.initial_seed()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/cuda/random.html#initial_seed) Returns the current random seed of the current GPU. Warning This function eagerly initializes CUDA. 
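A minimal sketch of seeding and of checkpointing/restoring the CUDA RNG state with the functions above (the `_all` variants are used so the snippet also covers multi-GPU setups):

    import torch

    torch.cuda.manual_seed_all(1234)          # seed every visible GPU

    states = torch.cuda.get_rng_state_all()   # list of ByteTensors, one per device
    a = torch.randn(4, device='cuda')

    torch.cuda.set_rng_state_all(states)      # rewind the generators
    b = torch.randn(4, device='cuda')

    assert torch.equal(a, b)                  # identical draws after restoring the state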
## Communication collectives `torch.cuda.comm.broadcast(tensor, devices=None, *, out=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/parallel/comm.html#broadcast) Broadcasts a tensor to specified GPU devices. Parameters * **tensor** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – tensor to broadcast. Can be on CPU or GPU. * **devices** (_Iterable_ _[_[torch.device](tensor_attributes#torch.torch.device "torch.torch.device") _,_[str](https://docs.python.org/3/library/stdtypes.html#str "\(in Python v3.9\)") _or_[int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _]__,__optional_) – an iterable of GPU devices, among which to broadcast. * **out** (_Sequence_ _[_[Tensor](tensors#torch.Tensor "torch.Tensor") _]__,__optional_ _,__keyword-only_) – the GPU tensors to store output results. Note Exactly one of `devices` and `out` must be specified. Returns * `If devices is specified,` a tuple containing copies of `tensor`, placed on `devices`. * `If out is specified,` a tuple containing `out` tensors, each containing a copy of `tensor`. `torch.cuda.comm.broadcast_coalesced(tensors, devices, buffer_size=10485760)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/parallel/comm.html#broadcast_coalesced) Broadcasts a sequence tensors to the specified GPUs. Small tensors are first coalesced into a buffer to reduce the number of synchronizations. Parameters * **tensors** (_sequence_) – tensors to broadcast. Must be on the same device, either CPU or GPU. * **devices** (_Iterable_ _[_[torch.device](tensor_attributes#torch.torch.device "torch.torch.device") _,_[str](https://docs.python.org/3/library/stdtypes.html#str "\(in Python v3.9\)") _or_[int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _]_) – an iterable of GPU devices, among which to broadcast. * **buffer_size** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – maximum size of the buffer used for coalescing Returns A tuple containing copies of `tensor`, placed on `devices`. `torch.cuda.comm.reduce_add(inputs, destination=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/parallel/comm.html#reduce_add) Sums tensors from multiple GPUs. All inputs should have matching shapes, dtype, and layout. The output tensor will be of the same shape, dtype, and layout. Parameters * **inputs** (_Iterable_ _[_[Tensor](tensors#torch.Tensor "torch.Tensor") _]_) – an iterable of tensors to add. * **destination** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – a device on which the output will be placed (default: current device). Returns A tensor containing an elementwise sum of all inputs, placed on the `destination` device. `torch.cuda.comm.scatter(tensor, devices=None, chunk_sizes=None, dim=0, streams=None, *, out=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/parallel/comm.html#scatter) Scatters tensor across multiple GPUs. Parameters * **tensor** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – tensor to scatter. Can be on CPU or GPU. * **devices** (_Iterable_ _[_[torch.device](tensor_attributes#torch.torch.device "torch.torch.device") _,_[str](https://docs.python.org/3/library/stdtypes.html#str "\(in Python v3.9\)") _or_[int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _]__,__optional_) – an iterable of GPU devices, among which to scatter. 
* **chunk_sizes** (_Iterable_ _[_[int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _]__,__optional_) – sizes of chunks to be placed on each device. It should match `devices` in length and sums to `tensor.size(dim)`. If not specified, `tensor` will be divided into equal chunks. * **dim** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – A dimension along which to chunk `tensor`. Default: `0`. * **streams** (_Iterable_ _[_Stream _]__,__optional_) – an iterable of Streams, among which to execute the scatter. If not specified, the default stream will be utilized. * **out** (_Sequence_ _[_[Tensor](tensors#torch.Tensor "torch.Tensor") _]__,__optional_ _,__keyword-only_) – the GPU tensors to store output results. Sizes of these tensors must match that of `tensor`, except for `dim`, where the total size must sum to `tensor.size(dim)`. Note Exactly one of `devices` and `out` must be specified. When `out` is specified, `chunk_sizes` must not be specified and will be inferred from sizes of `out`. Returns * `If devices is specified,` a tuple containing chunks of `tensor`, placed on `devices`. * `If out is specified,` a tuple containing `out` tensors, each containing a chunk of `tensor`. `torch.cuda.comm.gather(tensors, dim=0, destination=None, *, out=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/parallel/comm.html#gather) Gathers tensors from multiple GPU devices. Parameters * **tensors** (_Iterable_ _[_[Tensor](tensors#torch.Tensor "torch.Tensor") _]_) – an iterable of tensors to gather. Tensor sizes in all dimensions other than `dim` have to match. * **dim** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – a dimension along which the tensors will be concatenated. Default: `0`. * **destination** ([torch.device](tensor_attributes#torch.torch.device "torch.torch.device") _,_[str](https://docs.python.org/3/library/stdtypes.html#str "\(in Python v3.9\)") _, or_[int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – the output device. Can be CPU or CUDA. Default: the current CUDA device. * **out** ([Tensor](tensors#torch.Tensor "torch.Tensor") _,__optional_ _,__keyword-only_) – the tensor to store gather result. Its sizes must match those of `tensors`, except for `dim`, where the size must equal `sum(tensor.size(dim) for tensor in tensors)`. Can be on CPU or CUDA. Note `destination` must not be specified when `out` is specified. Returns * `If destination is specified,` a tensor located on `destination` device, that is a result of concatenating `tensors` along `dim`. * `If out is specified,` the `out` tensor, now containing results of concatenating `tensors` along `dim`. ## Streams and events `class torch.cuda.Stream` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/cuda/streams.html#Stream) Wrapper around a CUDA stream. A CUDA stream is a linear sequence of execution that belongs to a specific device, independent from other streams. See [CUDA semantics](https://pytorch.org/docs/1.8.0/notes/cuda.html#cuda-semantics) for details. Parameters * **device** ([torch.device](tensor_attributes#torch.torch.device "torch.torch.device") _or_[int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – a device on which to allocate the stream. If `device` is `None` (default) or a negative integer, this will use the current device. 
* **priority** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – priority of the stream. Can be either -1 (high priority) or 0 (low priority). By default, streams have priority 0. Note Although CUDA versions >= 11 support more than two levels of priorities, in PyTorch, we only support two levels of priorities. `query()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/cuda/streams.html#Stream.query) Checks if all the work submitted has been completed. Returns A boolean indicating if all kernels in this stream are completed. `record_event(event=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/cuda/streams.html#Stream.record_event) Records an event. Parameters **event** (Event _,__optional_) – event to record. If not given, a new one will be allocated. Returns Recorded event. `synchronize()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/cuda/streams.html#Stream.synchronize) Wait for all the kernels in this stream to complete. Note This is a wrapper around `cudaStreamSynchronize()`: see [CUDA Stream documentation](https://docs.nvidia.com/cuda/cuda-runtime- api/group__CUDART__STREAM.html) for more info. `wait_event(event)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/cuda/streams.html#Stream.wait_event) Makes all future work submitted to the stream wait for an event. Parameters **event** (Event) – an event to wait for. Note This is a wrapper around `cudaStreamWaitEvent()`: see [CUDA Stream documentation](https://docs.nvidia.com/cuda/cuda-runtime- api/group__CUDART__STREAM.html) for more info. This function returns without waiting for `event`: only future operations are affected. `wait_stream(stream)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/cuda/streams.html#Stream.wait_stream) Synchronizes with another stream. All future work submitted to this stream will wait until all kernels submitted to a given stream at the time of call complete. Parameters **stream** (Stream) – a stream to synchronize. Note This function returns without waiting for currently enqueued kernels in `stream`: only future operations are affected. `class torch.cuda.Event` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/cuda/streams.html#Event) Wrapper around a CUDA event. CUDA events are synchronization markers that can be used to monitor the device’s progress, to accurately measure timing, and to synchronize CUDA streams. The underlying CUDA events are lazily initialized when the event is first recorded or exported to another process. After creation, only streams on the same device may record the event. However, streams on any device can wait on the event. Parameters * **enable_timing** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – indicates if the event should measure time (default: `False`) * **blocking** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – if `True`, `wait()` will be blocking (default: `False`) * **interprocess** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – if `True`, the event can be shared between processes (default: `False`) `elapsed_time(end_event)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/cuda/streams.html#Event.elapsed_time) Returns the time elapsed in milliseconds after the event was recorded and before the end_event was recorded. 
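Putting the stream and event APIs together, here is a hedged sketch that times a matrix multiply with `elapsed_time()` and runs an independent host-to-device copy on a side stream:

    import torch

    x = torch.randn(2048, 2048, device='cuda')

    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)

    start.record()                   # recorded on the current stream
    y = x @ x
    end.record()
    end.synchronize()                # wait for the work captured by `end`
    print("matmul took", start.elapsed_time(end), "ms")

    # Overlap an independent copy on a side stream.
    side = torch.cuda.Stream()
    side.wait_stream(torch.cuda.current_stream())  # don't run ahead of queued work
    with torch.cuda.stream(side):
        z = torch.ones(2048, 2048).pin_memory().to('cuda', non_blocking=True)
    torch.cuda.current_stream().wait_stream(side)  # re-join before using z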
`classmethod from_ipc_handle(device, handle)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/cuda/streams.html#Event.from_ipc_handle) Reconstruct an event from an IPC handle on the given device. `ipc_handle()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/cuda/streams.html#Event.ipc_handle) Returns an IPC handle of this event. If not recorded yet, the event will use the current device. `query()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/cuda/streams.html#Event.query) Checks if all work currently captured by event has completed. Returns A boolean indicating if all work currently captured by event has completed. `record(stream=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/cuda/streams.html#Event.record) Records the event in a given stream. Uses `torch.cuda.current_stream()` if no stream is specified. The stream’s device must match the event’s device. `synchronize()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/cuda/streams.html#Event.synchronize) Waits for the event to complete. Waits until the completion of all work currently captured in this event. This prevents the CPU thread from proceeding until the event completes. Note This is a wrapper around `cudaEventSynchronize()`: see [CUDA Event documentation](https://docs.nvidia.com/cuda/cuda-runtime- api/group__CUDART__EVENT.html) for more info. `wait(stream=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/cuda/streams.html#Event.wait) Makes all future work submitted to the given stream wait for this event. Use `torch.cuda.current_stream()` if no stream is specified. ## Memory management `torch.cuda.empty_cache()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/cuda/memory.html#empty_cache) Releases all unoccupied cached memory currently held by the caching allocator so that those can be used in other GPU application and visible in `nvidia- smi`. Note `empty_cache()` doesn’t increase the amount of GPU memory available for PyTorch. However, it may help reduce fragmentation of GPU memory in certain cases. See [Memory management](https://pytorch.org/docs/1.8.0/notes/cuda.html#cuda-memory- management) for more details about GPU memory management. `torch.cuda.list_gpu_processes(device=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/cuda/memory.html#list_gpu_processes) Returns a human-readable printout of the running processes and their GPU memory use for a given device. This can be useful to display periodically during training, or when handling out-of-memory exceptions. Parameters **device** ([torch.device](tensor_attributes#torch.torch.device "torch.torch.device") _or_[int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – selected device. Returns printout for the current device, given by `current_device()`, if `device` is `None` (default). `torch.cuda.memory_stats(device=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/cuda/memory.html#memory_stats) Returns a dictionary of CUDA memory allocator statistics for a given device. The return value of this function is a dictionary of statistics, each of which is a non-negative integer. Core statistics: * `"allocated.{all,large_pool,small_pool}.{current,peak,allocated,freed}"`: number of allocation requests received by the memory allocator. * `"allocated_bytes.{all,large_pool,small_pool}.{current,peak,allocated,freed}"`: amount of allocated memory. 
* `"segment.{all,large_pool,small_pool}.{current,peak,allocated,freed}"`: number of reserved segments from `cudaMalloc()`. * `"reserved_bytes.{all,large_pool,small_pool}.{current,peak,allocated,freed}"`: amount of reserved memory. * `"active.{all,large_pool,small_pool}.{current,peak,allocated,freed}"`: number of active memory blocks. * `"active_bytes.{all,large_pool,small_pool}.{current,peak,allocated,freed}"`: amount of active memory. * `"inactive_split.{all,large_pool,small_pool}.{current,peak,allocated,freed}"`: number of inactive, non-releasable memory blocks. * `"inactive_split_bytes.{all,large_pool,small_pool}.{current,peak,allocated,freed}"`: amount of inactive, non-releasable memory. For these core statistics, values are broken down as follows. Pool type: * `all`: combined statistics across all memory pools. * `large_pool`: statistics for the large allocation pool (as of October 2019, for size >= 1MB allocations). * `small_pool`: statistics for the small allocation pool (as of October 2019, for size < 1MB allocations). Metric type: * `current`: current value of this metric. * `peak`: maximum value of this metric. * `allocated`: historical total increase in this metric. * `freed`: historical total decrease in this metric. In addition to the core statistics, we also provide some simple event counters: * `"num_alloc_retries"`: number of failed `cudaMalloc` calls that result in a cache flush and retry. * `"num_ooms"`: number of out-of-memory errors thrown. Parameters **device** ([torch.device](tensor_attributes#torch.torch.device "torch.torch.device") _or_[int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – selected device. Returns statistics for the current device, given by `current_device()`, if `device` is `None` (default). Note See [Memory management](https://pytorch.org/docs/1.8.0/notes/cuda.html#cuda- memory-management) for more details about GPU memory management. `torch.cuda.memory_summary(device=None, abbreviated=False)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/cuda/memory.html#memory_summary) Returns a human-readable printout of the current memory allocator statistics for a given device. This can be useful to display periodically during training, or when handling out-of-memory exceptions. Parameters * **device** ([torch.device](tensor_attributes#torch.torch.device "torch.torch.device") _or_[int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – selected device. Returns printout for the current device, given by `current_device()`, if `device` is `None` (default). * **abbreviated** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – whether to return an abbreviated summary (default: False). Note See [Memory management](https://pytorch.org/docs/1.8.0/notes/cuda.html#cuda- memory-management) for more details about GPU memory management. `torch.cuda.memory_snapshot()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/cuda/memory.html#memory_snapshot) Returns a snapshot of the CUDA memory allocator state across all devices. Interpreting the output of this function requires familiarity with the memory allocator internals. Note See [Memory management](https://pytorch.org/docs/1.8.0/notes/cuda.html#cuda- memory-management) for more details about GPU memory management. 
`torch.cuda.memory_allocated(device=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/cuda/memory.html#memory_allocated) Returns the current GPU memory occupied by tensors in bytes for a given device. Parameters **device** ([torch.device](tensor_attributes#torch.torch.device "torch.torch.device") _or_[int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – selected device. Returns statistic for the current device, given by `current_device()`, if `device` is `None` (default). Note This is likely less than the amount shown in `nvidia-smi` since some unused memory can be held by the caching allocator and some context needs to be created on GPU. See [Memory management](https://pytorch.org/docs/1.8.0/notes/cuda.html#cuda-memory- management) for more details about GPU memory management. `torch.cuda.max_memory_allocated(device=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/cuda/memory.html#max_memory_allocated) Returns the maximum GPU memory occupied by tensors in bytes for a given device. By default, this returns the peak allocated memory since the beginning of this program. `reset_peak_stats()` can be used to reset the starting point in tracking this metric. For example, these two functions can measure the peak allocated memory usage of each iteration in a training loop. Parameters **device** ([torch.device](tensor_attributes#torch.torch.device "torch.torch.device") _or_[int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – selected device. Returns statistic for the current device, given by `current_device()`, if `device` is `None` (default). Note See [Memory management](https://pytorch.org/docs/1.8.0/notes/cuda.html#cuda- memory-management) for more details about GPU memory management. `torch.cuda.reset_max_memory_allocated(device=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/cuda/memory.html#reset_max_memory_allocated) Resets the starting point in tracking maximum GPU memory occupied by tensors for a given device. See `max_memory_allocated()` for details. Parameters **device** ([torch.device](tensor_attributes#torch.torch.device "torch.torch.device") _or_[int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – selected device. Returns statistic for the current device, given by `current_device()`, if `device` is `None` (default). Warning This function now calls `reset_peak_memory_stats()`, which resets /all/ peak memory stats. Note See [Memory management](https://pytorch.org/docs/1.8.0/notes/cuda.html#cuda- memory-management) for more details about GPU memory management. `torch.cuda.memory_reserved(device=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/cuda/memory.html#memory_reserved) Returns the current GPU memory managed by the caching allocator in bytes for a given device. Parameters **device** ([torch.device](tensor_attributes#torch.torch.device "torch.torch.device") _or_[int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – selected device. Returns statistic for the current device, given by `current_device()`, if `device` is `None` (default). Note See [Memory management](https://pytorch.org/docs/1.8.0/notes/cuda.html#cuda- memory-management) for more details about GPU memory management. 
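As noted above, `reset_peak_memory_stats()` and `max_memory_allocated()` can be combined to measure the peak memory of each training iteration. A minimal sketch (the tiny model and random data are stand-ins):

    import torch

    model = torch.nn.Linear(512, 512).cuda()
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
    loss_fn = torch.nn.MSELoss()

    for step in range(3):
        input = torch.randn(64, 512, device='cuda')
        target = torch.randn(64, 512, device='cuda')

        torch.cuda.reset_peak_memory_stats()          # restart peak tracking for this step
        output = model(input)
        loss = loss_fn(output, target)
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()

        peak = torch.cuda.max_memory_allocated()      # peak bytes since the reset
        print(f"step {step}: peak allocated {peak / 2**20:.1f} MiB")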
`torch.cuda.max_memory_reserved(device=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/cuda/memory.html#max_memory_reserved) Returns the maximum GPU memory managed by the caching allocator in bytes for a given device. By default, this returns the peak cached memory since the beginning of this program. `reset_peak_stats()` can be used to reset the starting point in tracking this metric. For example, these two functions can measure the peak cached memory amount of each iteration in a training loop. Parameters **device** ([torch.device](tensor_attributes#torch.torch.device "torch.torch.device") _or_[int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – selected device. Returns statistic for the current device, given by `current_device()`, if `device` is `None` (default). Note See [Memory management](https://pytorch.org/docs/1.8.0/notes/cuda.html#cuda- memory-management) for more details about GPU memory management. `torch.cuda.set_per_process_memory_fraction(fraction, device=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/cuda/memory.html#set_per_process_memory_fraction) Set memory fraction for a process. The fraction is used to limit an caching allocator to allocated memory on a CUDA device. The allowed value equals the total visible memory multiplied fraction. If trying to allocate more than the allowed value in a process, will raise an out of memory error in allocator. Parameters * **fraction** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)")) – Range: 0~1. Allowed memory equals total_memory * fraction. * **device** ([torch.device](tensor_attributes#torch.torch.device "torch.torch.device") _or_[int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – selected device. If it is `None` the default CUDA device is used. Note In general, the total available free memory is less than the total capacity. `torch.cuda.memory_cached(device=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/cuda/memory.html#memory_cached) Deprecated; see `memory_reserved()`. `torch.cuda.max_memory_cached(device=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/cuda/memory.html#max_memory_cached) Deprecated; see `max_memory_reserved()`. `torch.cuda.reset_max_memory_cached(device=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/cuda/memory.html#reset_max_memory_cached) Resets the starting point in tracking maximum GPU memory managed by the caching allocator for a given device. See `max_memory_cached()` for details. Parameters **device** ([torch.device](tensor_attributes#torch.torch.device "torch.torch.device") _or_[int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – selected device. Returns statistic for the current device, given by `current_device()`, if `device` is `None` (default). Warning This function now calls `reset_peak_memory_stats()`, which resets /all/ peak memory stats. Note See [Memory management](https://pytorch.org/docs/1.8.0/notes/cuda.html#cuda- memory-management) for more details about GPU memory management. ## NVIDIA Tools Extension (NVTX) `torch.cuda.nvtx.mark(msg)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/cuda/nvtx.html#mark) Describe an instantaneous event that occurred at some point. Parameters **msg** (_string_) – ASCII message to associate with the event. 
`torch.cuda.nvtx.range_push(msg)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/cuda/nvtx.html#range_push) Pushes a range onto a stack of nested range span. Returns zero-based depth of the range that is started. Parameters **msg** (_string_) – ASCII message to associate with range `torch.cuda.nvtx.range_pop()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/cuda/nvtx.html#range_pop) Pops a range off of a stack of nested range spans. Returns the zero-based depth of the range that is ended. # torch.utils.data At the heart of PyTorch data loading utility is the `torch.utils.data.DataLoader` class. It represents a Python iterable over a dataset, with support for * map-style and iterable-style datasets, * customizing data loading order, * automatic batching, * single- and multi-process data loading, * automatic memory pinning. These options are configured by the constructor arguments of a `DataLoader`, which has signature: DataLoader(dataset, batch_size=1, shuffle=False, sampler=None, batch_sampler=None, num_workers=0, collate_fn=None, pin_memory=False, drop_last=False, timeout=0, worker_init_fn=None, *, prefetch_factor=2, persistent_workers=False) The sections below describe in details the effects and usages of these options. ## Dataset Types The most important argument of `DataLoader` constructor is `dataset`, which indicates a dataset object to load data from. PyTorch supports two different types of datasets: * map-style datasets, * iterable-style datasets. ### Map-style datasets A map-style dataset is one that implements the `__getitem__()` and `__len__()` protocols, and represents a map from (possibly non-integral) indices/keys to data samples. For example, such a dataset, when accessed with `dataset[idx]`, could read the `idx`-th image and its corresponding label from a folder on the disk. See `Dataset` for more details. ### Iterable-style datasets An iterable-style dataset is an instance of a subclass of `IterableDataset` that implements the `__iter__()` protocol, and represents an iterable over data samples. This type of datasets is particularly suitable for cases where random reads are expensive or even improbable, and where the batch size depends on the fetched data. For example, such a dataset, when called `iter(dataset)`, could return a stream of data reading from a database, a remote server, or even logs generated in real time. See `IterableDataset` for more details. Note When using an `IterableDataset` with multi-process data loading. The same dataset object is replicated on each worker process, and thus the replicas must be configured differently to avoid duplicated data. See `IterableDataset` documentations for how to achieve this. ## Data Loading Order and Sampler For iterable-style datasets, data loading order is entirely controlled by the user-defined iterable. This allows easier implementations of chunk-reading and dynamic batch size (e.g., by yielding a batched sample at each time). The rest of this section concerns the case with map-style datasets. `torch.utils.data.Sampler` classes are used to specify the sequence of indices/keys used in data loading. They represent iterable objects over the indices to datasets. E.g., in the common case with stochastic gradient decent (SGD), a `Sampler` could randomly permute a list of indices and yield each one at a time, or yield a small number of them for mini-batch SGD. A sequential or shuffled sampler will be automatically constructed based on the `shuffle` argument to a `DataLoader`. 
Alternatively, users may use the `sampler` argument to specify a custom `Sampler` object that at each time yields the next index/key to fetch. A custom `Sampler` that yields a list of batch indices at a time can be passed as the `batch_sampler` argument. Automatic batching can also be enabled via `batch_size` and `drop_last` arguments. See the next section for more details on this. Note Neither `sampler` nor `batch_sampler` is compatible with iterable-style datasets, since such datasets have no notion of a key or an index. ## Loading Batched and Non-Batched Data `DataLoader` supports automatically collating individual fetched data samples into batches via arguments `batch_size`, `drop_last`, and `batch_sampler`. ### Automatic batching (default) This is the most common case, and corresponds to fetching a minibatch of data and collating them into batched samples, i.e., containing Tensors with one dimension being the batch dimension (usually the first). When `batch_size` (default `1`) is not `None`, the data loader yields batched samples instead of individual samples. `batch_size` and `drop_last` arguments are used to specify how the data loader obtains batches of dataset keys. For map-style datasets, users can alternatively specify `batch_sampler`, which yields a list of keys at a time. Note The `batch_size` and `drop_last` arguments essentially are used to construct a `batch_sampler` from `sampler`. For map-style datasets, the `sampler` is either provided by user or constructed based on the `shuffle` argument. For iterable-style datasets, the `sampler` is a dummy infinite one. See this section on more details on samplers. Note When fetching from iterable-style datasets with multi-processing, the `drop_last` argument drops the last non-full batch of each worker’s dataset replica. After fetching a list of samples using the indices from sampler, the function passed as the `collate_fn` argument is used to collate lists of samples into batches. In this case, loading from a map-style dataset is roughly equivalent with: for indices in batch_sampler: yield collate_fn([dataset[i] for i in indices]) and loading from an iterable-style dataset is roughly equivalent with: dataset_iter = iter(dataset) for indices in batch_sampler: yield collate_fn([next(dataset_iter) for _ in indices]) A custom `collate_fn` can be used to customize collation, e.g., padding sequential data to max length of a batch. See this section on more about `collate_fn`. ### Disable automatic batching In certain cases, users may want to handle batching manually in dataset code, or simply load individual samples. For example, it could be cheaper to directly load batched data (e.g., bulk reads from a database or reading continuous chunks of memory), or the batch size is data dependent, or the program is designed to work on individual samples. Under these scenarios, it’s likely better to not use automatic batching (where `collate_fn` is used to collate the samples), but let the data loader directly return each member of the `dataset` object. When both `batch_size` and `batch_sampler` are `None` (default value for `batch_sampler` is already `None`), automatic batching is disabled. Each sample obtained from the `dataset` is processed with the function passed as the `collate_fn` argument. **When automatic batching is disabled** , the default `collate_fn` simply converts NumPy arrays into PyTorch Tensors, and keeps everything else untouched. 
In this case, loading from a map-style dataset is roughly equivalent with: for index in sampler: yield collate_fn(dataset[index]) and loading from an iterable-style dataset is roughly equivalent with: for data in iter(dataset): yield collate_fn(data) See this section on more about `collate_fn`. ### Working with `collate_fn` The use of `collate_fn` is slightly different when automatic batching is enabled or disabled. **When automatic batching is disabled** , `collate_fn` is called with each individual data sample, and the output is yielded from the data loader iterator. In this case, the default `collate_fn` simply converts NumPy arrays in PyTorch tensors. **When automatic batching is enabled** , `collate_fn` is called with a list of data samples at each time. It is expected to collate the input samples into a batch for yielding from the data loader iterator. The rest of this section describes behavior of the default `collate_fn` in this case. For instance, if each data sample consists of a 3-channel image and an integral class label, i.e., each element of the dataset returns a tuple `(image, class_index)`, the default `collate_fn` collates a list of such tuples into a single tuple of a batched image tensor and a batched class label Tensor. In particular, the default `collate_fn` has the following properties: * It always prepends a new dimension as the batch dimension. * It automatically converts NumPy arrays and Python numerical values into PyTorch Tensors. * It preserves the data structure, e.g., if each sample is a dictionary, it outputs a dictionary with the same set of keys but batched Tensors as values (or lists if the values can not be converted into Tensors). Same for `list` s, `tuple` s, `namedtuple` s, etc. Users may use customized `collate_fn` to achieve custom batching, e.g., collating along a dimension other than the first, padding sequences of various lengths, or adding support for custom data types. ## Single- and Multi-process Data Loading A `DataLoader` uses single-process data loading by default. Within a Python process, the [Global Interpreter Lock (GIL)](https://wiki.python.org/moin/GlobalInterpreterLock) prevents true fully parallelizing Python code across threads. To avoid blocking computation code with data loading, PyTorch provides an easy switch to perform multi-process data loading by simply setting the argument `num_workers` to a positive integer. ### Single-process data loading (default) In this mode, data fetching is done in the same process a `DataLoader` is initialized. Therefore, data loading may block computing. However, this mode may be preferred when resource(s) used for sharing data among processes (e.g., shared memory, file descriptors) is limited, or when the entire dataset is small and can be loaded entirely in memory. Additionally, single-process loading often shows more readable error traces and thus is useful for debugging. ### Multi-process data loading Setting the argument `num_workers` as a positive integer will turn on multi- process data loading with the specified number of loader worker processes. In this mode, each time an iterator of a `DataLoader` is created (e.g., when you call `enumerate(dataloader)`), `num_workers` worker processes are created. At this point, the `dataset`, `collate_fn`, and `worker_init_fn` are passed to each worker, where they are used to initialize, and fetch data. This means that dataset access together with its internal IO, transforms (including `collate_fn`) runs in the worker process. 
`torch.utils.data.get_worker_info()` returns various useful information in a worker process (including the worker id, dataset replica, initial seed, etc.), and returns `None` in main process. Users may use this function in dataset code and/or `worker_init_fn` to individually configure each dataset replica, and to determine whether the code is running in a worker process. For example, this can be particularly helpful in sharding the dataset. For map-style datasets, the main process generates the indices using `sampler` and sends them to the workers. So any shuffle randomization is done in the main process which guides loading by assigning indices to load. For iterable-style datasets, since each worker process gets a replica of the `dataset` object, naive multi-process loading will often result in duplicated data. Using `torch.utils.data.get_worker_info()` and/or `worker_init_fn`, users may configure each replica independently. (See `IterableDataset` documentations for how to achieve this. ) For similar reasons, in multi- process loading, the `drop_last` argument drops the last non-full batch of each worker’s iterable-style dataset replica. Workers are shut down once the end of the iteration is reached, or when the iterator becomes garbage collected. Warning It is generally not recommended to return CUDA tensors in multi-process loading because of many subtleties in using CUDA and sharing CUDA tensors in multiprocessing (see [CUDA in multiprocessing](https://pytorch.org/docs/1.8.0/notes/multiprocessing.html#multiprocessing- cuda-note)). Instead, we recommend using automatic memory pinning (i.e., setting `pin_memory=True`), which enables fast data transfer to CUDA-enabled GPUs. #### Platform-specific behaviors Since workers rely on Python [`multiprocessing`](https://docs.python.org/3/library/multiprocessing.html#module- multiprocessing "\(in Python v3.9\)"), worker launch behavior is different on Windows compared to Unix. * On Unix, `fork()` is the default [`multiprocessing`](https://docs.python.org/3/library/multiprocessing.html#module-multiprocessing "\(in Python v3.9\)") start method. Using `fork()`, child workers typically can access the `dataset` and Python argument functions directly through the cloned address space. * On Windows, `spawn()` is the default [`multiprocessing`](https://docs.python.org/3/library/multiprocessing.html#module-multiprocessing "\(in Python v3.9\)") start method. Using `spawn()`, another interpreter is launched which runs your main script, followed by the internal worker function that receives the `dataset`, `collate_fn` and other arguments through [`pickle`](https://docs.python.org/3/library/pickle.html#module-pickle "\(in Python v3.9\)") serialization. This separate serialization means that you should take two steps to ensure you are compatible with Windows while using multi-process data loading: * Wrap most of you main script’s code within `if __name__ == '__main__':` block, to make sure it doesn’t run again (most likely generating error) when each worker process is launched. You can place your dataset and `DataLoader` instance creation logic here, as it doesn’t need to be re-executed in workers. * Make sure that any custom `collate_fn`, `worker_init_fn` or `dataset` code is declared as top level definitions, outside of the `__main__` check. This ensures that they are available in worker processes. (this is needed since functions are pickled as references only, not `bytecode`.) 
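A hedged sketch of the Windows-friendly layout described above: the dataset and `collate_fn` are defined at top level so worker processes can unpickle them, and the `DataLoader` is only created inside the `__main__` guard (all names are illustrative):

    import torch
    from torch.utils.data import Dataset, DataLoader

    class SquaresDataset(Dataset):        # top-level definition: picklable by reference
        def __len__(self):
            return 100

        def __getitem__(self, idx):
            return torch.tensor([idx, idx * idx], dtype=torch.float32)

    def my_collate(batch):                # also defined at top level
        return torch.stack(batch, 0)

    if __name__ == '__main__':            # not re-executed when worker processes start
        loader = DataLoader(SquaresDataset(), batch_size=8,
                            num_workers=2, collate_fn=my_collate)
        for batch in loader:
            pass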
#### Randomness in multi-process data loading By default, each worker will have its PyTorch seed set to `base_seed + worker_id`, where `base_seed` is a long generated by main process using its RNG (thereby, consuming a RNG state mandatorily). However, seeds for other libraries may be duplicated upon initializing workers (e.g., NumPy), causing each worker to return identical random numbers. (See [this section](https://pytorch.org/docs/1.8.0/notes/faq.html#dataloader-workers- random-seed) in FAQ.). In `worker_init_fn`, you may access the PyTorch seed set for each worker with either `torch.utils.data.get_worker_info().seed` or [`torch.initial_seed()`](generated/torch.initial_seed#torch.initial_seed "torch.initial_seed"), and use it to seed other libraries before data loading. ## Memory Pinning Host to GPU copies are much faster when they originate from pinned (page- locked) memory. See [Use pinned memory buffers](https://pytorch.org/docs/1.8.0/notes/cuda.html#cuda-memory-pinning) for more details on when and how to use pinned memory generally. For data loading, passing `pin_memory=True` to a `DataLoader` will automatically put the fetched data Tensors in pinned memory, and thus enables faster data transfer to CUDA-enabled GPUs. The default memory pinning logic only recognizes Tensors and maps and iterables containing Tensors. By default, if the pinning logic sees a batch that is a custom type (which will occur if you have a `collate_fn` that returns a custom batch type), or if each element of your batch is a custom type, the pinning logic will not recognize them, and it will return that batch (or those elements) without pinning the memory. To enable memory pinning for custom batch or data type(s), define a `pin_memory()` method on your custom type(s). See the example below. Example: class SimpleCustomBatch: def __init__(self, data): transposed_data = list(zip(*data)) self.inp = torch.stack(transposed_data[0], 0) self.tgt = torch.stack(transposed_data[1], 0) # custom memory pinning method on custom type def pin_memory(self): self.inp = self.inp.pin_memory() self.tgt = self.tgt.pin_memory() return self def collate_wrapper(batch): return SimpleCustomBatch(batch) inps = torch.arange(10 * 5, dtype=torch.float32).view(10, 5) tgts = torch.arange(10 * 5, dtype=torch.float32).view(10, 5) dataset = TensorDataset(inps, tgts) loader = DataLoader(dataset, batch_size=2, collate_fn=collate_wrapper, pin_memory=True) for batch_ndx, sample in enumerate(loader): print(sample.inp.is_pinned()) print(sample.tgt.is_pinned()) `class torch.utils.data.DataLoader(dataset, batch_size=1, shuffle=False, sampler=None, batch_sampler=None, num_workers=0, collate_fn=None, pin_memory=False, drop_last=False, timeout=0, worker_init_fn=None, multiprocessing_context=None, generator=None, *, prefetch_factor=2, persistent_workers=False)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/utils/data/dataloader.html#DataLoader) Data loader. Combines a dataset and a sampler, and provides an iterable over the given dataset. The `DataLoader` supports both map-style and iterable-style datasets with single- or multi-process loading, customizing loading order and optional automatic batching (collation) and memory pinning. See `torch.utils.data` documentation page for more details. Parameters * **dataset** (Dataset) – dataset from which to load the data. * **batch_size** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – how many samples per batch to load (default: `1`). 
* **shuffle** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – set to `True` to have the data reshuffled at every epoch (default: `False`). * **sampler** (Sampler _or_ _Iterable_ _,__optional_) – defines the strategy to draw samples from the dataset. Can be any `Iterable` with `__len__` implemented. If specified, `shuffle` must not be specified. * **batch_sampler** (Sampler _or_ _Iterable_ _,__optional_) – like `sampler`, but returns a batch of indices at a time. Mutually exclusive with `batch_size`, `shuffle`, `sampler`, and `drop_last`. * **num_workers** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – how many subprocesses to use for data loading. `0` means that the data will be loaded in the main process. (default: `0`) * **collate_fn** (_callable_ _,__optional_) – merges a list of samples to form a mini-batch of Tensor(s). Used when using batched loading from a map-style dataset. * **pin_memory** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – If `True`, the data loader will copy Tensors into CUDA pinned memory before returning them. If your data elements are a custom type, or your `collate_fn` returns a batch that is a custom type, see the example below. * **drop_last** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – set to `True` to drop the last incomplete batch, if the dataset size is not divisible by the batch size. If `False` and the size of dataset is not divisible by the batch size, then the last batch will be smaller. (default: `False`) * **timeout** (_numeric_ _,__optional_) – if positive, the timeout value for collecting a batch from workers. Should always be non-negative. (default: `0`) * **worker_init_fn** (_callable_ _,__optional_) – If not `None`, this will be called on each worker subprocess with the worker id (an int in `[0, num_workers - 1]`) as input, after seeding and before data loading. (default: `None`) * **prefetch_factor** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_ _,__keyword-only arg_) – Number of samples loaded in advance by each worker. `2` means there will be a total of 2 * num_workers samples prefetched across all workers. (default: `2`) * **persistent_workers** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – If `True`, the data loader will not shutdown the worker processes after a dataset has been consumed once. This allows to maintain the workers `Dataset` instances alive. (default: `False`) Warning If the `spawn` start method is used, `worker_init_fn` cannot be an unpicklable object, e.g., a lambda function. See [Multiprocessing best practices](https://pytorch.org/docs/1.8.0/notes/multiprocessing.html#multiprocessing- best-practices) on more details related to multiprocessing in PyTorch. Warning `len(dataloader)` heuristic is based on the length of the sampler used. When `dataset` is an `IterableDataset`, it instead returns an estimate based on `len(dataset) / batch_size`, with proper rounding depending on `drop_last`, regardless of multi-process loading configurations. This represents the best guess PyTorch can make because PyTorch trusts user `dataset` code in correctly handling multi-process loading to avoid duplicate data. 
However, if sharding results in multiple workers having incomplete last batches, this estimate can still be inaccurate, because (1) an otherwise complete batch can be broken into multiple ones and (2) more than one batch worth of samples can be dropped when `drop_last` is set. Unfortunately, PyTorch can not detect such cases in general. See Dataset Types for more details on these two types of datasets and how `IterableDataset` interacts with Multi-process data loading. Warning See [Reproducibility](https://pytorch.org/docs/1.8.0/notes/randomness.html#reproducibility), and [My data loader workers return identical random numbers](https://pytorch.org/docs/1.8.0/notes/faq.html#dataloader-workers- random-seed), and Randomness in multi-process data loading notes for random seed related questions. `class torch.utils.data.Dataset` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/utils/data/dataset.html#Dataset) An abstract class representing a `Dataset`. All datasets that represent a map from keys to data samples should subclass it. All subclasses should overwrite `__getitem__()`, supporting fetching a data sample for a given key. Subclasses could also optionally overwrite `__len__()`, which is expected to return the size of the dataset by many `Sampler` implementations and the default options of `DataLoader`. Note `DataLoader` by default constructs a index sampler that yields integral indices. To make it work with a map-style dataset with non-integral indices/keys, a custom sampler must be provided. `class torch.utils.data.IterableDataset` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/utils/data/dataset.html#IterableDataset) An iterable Dataset. All datasets that represent an iterable of data samples should subclass it. Such form of datasets is particularly useful when data come from a stream. All subclasses should overwrite `__iter__()`, which would return an iterator of samples in this dataset. When a subclass is used with `DataLoader`, each item in the dataset will be yielded from the `DataLoader` iterator. When `num_workers > 0`, each worker process will have a different copy of the dataset object, so it is often desired to configure each copy independently to avoid having duplicate data returned from the workers. `get_worker_info()`, when called in a worker process, returns information about the worker. It can be used in either the dataset’s `__iter__()` method or the `DataLoader` ‘s `worker_init_fn` option to modify each copy’s behavior. Example 1: splitting workload across all workers in `__iter__()`: >>> class MyIterableDataset(torch.utils.data.IterableDataset): ... def __init__(self, start, end): ... super(MyIterableDataset).__init__() ... assert end > start, "this example code only works with end >= start" ... self.start = start ... self.end = end ... ... def __iter__(self): ... worker_info = torch.utils.data.get_worker_info() ... if worker_info is None: # single-process data loading, return the full iterator ... iter_start = self.start ... iter_end = self.end ... else: # in a worker process ... # split workload ... per_worker = int(math.ceil((self.end - self.start) / float(worker_info.num_workers))) ... worker_id = worker_info.id ... iter_start = self.start + worker_id * per_worker ... iter_end = min(iter_start + per_worker, self.end) ... return iter(range(iter_start, iter_end)) ... >>> # should give same set of data as range(3, 7), i.e., [3, 4, 5, 6]. 
>>> ds = MyIterableDataset(start=3, end=7) >>> # Single-process loading >>> print(list(torch.utils.data.DataLoader(ds, num_workers=0))) [3, 4, 5, 6] >>> # Mult-process loading with two worker processes >>> # Worker 0 fetched [3, 4]. Worker 1 fetched [5, 6]. >>> print(list(torch.utils.data.DataLoader(ds, num_workers=2))) [3, 5, 4, 6] >>> # With even more workers >>> print(list(torch.utils.data.DataLoader(ds, num_workers=20))) [3, 4, 5, 6] Example 2: splitting workload across all workers using `worker_init_fn`: >>> class MyIterableDataset(torch.utils.data.IterableDataset): ... def __init__(self, start, end): ... super(MyIterableDataset).__init__() ... assert end > start, "this example code only works with end >= start" ... self.start = start ... self.end = end ... ... def __iter__(self): ... return iter(range(self.start, self.end)) ... >>> # should give same set of data as range(3, 7), i.e., [3, 4, 5, 6]. >>> ds = MyIterableDataset(start=3, end=7) >>> # Single-process loading >>> print(list(torch.utils.data.DataLoader(ds, num_workers=0))) [3, 4, 5, 6] >>> >>> # Directly doing multi-process loading yields duplicate data >>> print(list(torch.utils.data.DataLoader(ds, num_workers=2))) [3, 3, 4, 4, 5, 5, 6, 6] >>> # Define a `worker_init_fn` that configures each dataset copy differently >>> def worker_init_fn(worker_id): ... worker_info = torch.utils.data.get_worker_info() ... dataset = worker_info.dataset # the dataset copy in this worker process ... overall_start = dataset.start ... overall_end = dataset.end ... # configure the dataset to only process the split workload ... per_worker = int(math.ceil((overall_end - overall_start) / float(worker_info.num_workers))) ... worker_id = worker_info.id ... dataset.start = overall_start + worker_id * per_worker ... dataset.end = min(dataset.start + per_worker, overall_end) ... >>> # Mult-process loading with the custom `worker_init_fn` >>> # Worker 0 fetched [3, 4]. Worker 1 fetched [5, 6]. >>> print(list(torch.utils.data.DataLoader(ds, num_workers=2, worker_init_fn=worker_init_fn))) [3, 5, 4, 6] >>> # With even more workers >>> print(list(torch.utils.data.DataLoader(ds, num_workers=20, worker_init_fn=worker_init_fn))) [3, 4, 5, 6] `class torch.utils.data.TensorDataset(*tensors)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/utils/data/dataset.html#TensorDataset) Dataset wrapping tensors. Each sample will be retrieved by indexing tensors along the first dimension. Parameters ***tensors** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – tensors that have the same size of the first dimension. `class torch.utils.data.ConcatDataset(datasets)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/utils/data/dataset.html#ConcatDataset) Dataset as a concatenation of multiple datasets. This class is useful to assemble different existing datasets. Parameters **datasets** (_sequence_) – List of datasets to be concatenated `class torch.utils.data.ChainDataset(datasets)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/utils/data/dataset.html#ChainDataset) Dataset for chainning multiple `IterableDataset` s. This class is useful to assemble different existing dataset streams. The chainning operation is done on-the-fly, so concatenating large-scale datasets with this class will be efficient. 
Parameters **datasets** (_iterable of IterableDataset_) – datasets to be chained together `class torch.utils.data.BufferedShuffleDataset(dataset, buffer_size)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/utils/data/dataset.html#BufferedShuffleDataset) Dataset shuffled from the original dataset. This class is useful to shuffle an existing instance of an IterableDataset. The buffer with `buffer_size` is filled with the items from the dataset first. Then, each item will be yielded from the buffer by reservoir sampling via iterator. `buffer_size` is required to be larger than 0. For `buffer_size == 1`, the dataset is not shuffled. In order to fully shuffle the whole dataset, `buffer_size` is required to be greater than or equal to the size of dataset. When it is used with `DataLoader`, each item in the dataset will be yielded from the `DataLoader` iterator. And, the method to set up a random seed is different based on `num_workers`. For single-process mode (`num_workers == 0`), the random seed is required to be set before the `DataLoader` in the main process. >>> ds = BufferedShuffleDataset(dataset) >>> random.seed(...) >>> print(list(torch.utils.data.DataLoader(ds, num_workers=0))) For multi-process mode (`num_workers > 0`), the random seed is set by a callable function in each worker. >>> ds = BufferedShuffleDataset(dataset) >>> def init_fn(worker_id): ... random.seed(...) >>> print(list(torch.utils.data.DataLoader(ds, ..., num_workers=n, worker_init_fn=init_fn))) Parameters * **dataset** (IterableDataset) – The original IterableDataset. * **buffer_size** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – The buffer size for shuffling. `class torch.utils.data.Subset(dataset, indices)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/utils/data/dataset.html#Subset) Subset of a dataset at specified indices. Parameters * **dataset** (Dataset) – The whole Dataset * **indices** (_sequence_) – Indices in the whole set selected for subset `torch.utils.data.get_worker_info()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/utils/data/_utils/worker.html#get_worker_info) Returns the information about the current `DataLoader` iterator worker process. When called in a worker, this returns an object guaranteed to have the following attributes: * `id`: the current worker id. * `num_workers`: the total number of workers. * `seed`: the random seed set for the current worker. This value is determined by main process RNG and the worker id. See `DataLoader`’s documentation for more details. * `dataset`: the copy of the dataset object in **this** process. Note that this will be a different object in a different process than the one in the main process. When called in the main process, this returns `None`. Note When used in a `worker_init_fn` passed over to `DataLoader`, this method can be useful to set up each worker process differently, for instance, using `worker_id` to configure the `dataset` object to only read a specific fraction of a sharded dataset, or use `seed` to seed other libraries used in dataset code (e.g., NumPy). `torch.utils.data.random_split(dataset, lengths, generator=)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/utils/data/dataset.html#random_split) Randomly split a dataset into non-overlapping new datasets of given lengths. 
Optionally fix the generator for reproducible results, e.g.:

    >>> random_split(range(10), [3, 7], generator=torch.Generator().manual_seed(42))

Parameters

* **dataset** (Dataset) – Dataset to be split
* **lengths** (_sequence_) – lengths of splits to be produced
* **generator** ([Generator](generated/torch.generator#torch.Generator "torch.Generator")) – Generator used for the random permutation.

`class torch.utils.data.Sampler(data_source)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/utils/data/sampler.html#Sampler)

Base class for all Samplers.

Every Sampler subclass has to provide an `__iter__()` method, providing a way to iterate over indices of dataset elements, and a `__len__()` method that returns the length of the returned iterators.

Note The `__len__()` method isn’t strictly required by `DataLoader`, but is expected in any calculation involving the length of a `DataLoader`.

`class torch.utils.data.SequentialSampler(data_source)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/utils/data/sampler.html#SequentialSampler)

Samples elements sequentially, always in the same order.

Parameters **data_source** (Dataset) – dataset to sample from

`class torch.utils.data.RandomSampler(data_source, replacement=False, num_samples=None, generator=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/utils/data/sampler.html#RandomSampler)

Samples elements randomly. If sampling without replacement, samples are drawn from a shuffled dataset. If sampling with replacement, the user can specify `num_samples` to draw.

Parameters

* **data_source** (Dataset) – dataset to sample from
* **replacement** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – samples are drawn on-demand with replacement if `True`, default=`False`
* **num_samples** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – number of samples to draw, default=`len(dataset)`. This argument is supposed to be specified only when `replacement` is `True`.
* **generator** ([Generator](generated/torch.generator#torch.Generator "torch.Generator")) – Generator used in sampling.

`class torch.utils.data.SubsetRandomSampler(indices, generator=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/utils/data/sampler.html#SubsetRandomSampler)

Samples elements randomly from a given list of indices, without replacement.

Parameters

* **indices** (_sequence_) – a sequence of indices
* **generator** ([Generator](generated/torch.generator#torch.Generator "torch.Generator")) – Generator used in sampling.

`class torch.utils.data.WeightedRandomSampler(weights, num_samples, replacement=True, generator=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/utils/data/sampler.html#WeightedRandomSampler)

Samples elements from `[0,..,len(weights)-1]` with given probabilities (weights).

Parameters

* **weights** (_sequence_) – a sequence of weights, not necessarily summing up to one
* **num_samples** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – number of samples to draw
* **replacement** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – if `True`, samples are drawn with replacement. If not, they are drawn without replacement, which means that when a sample index is drawn for a row, it cannot be drawn again for that row.
* **generator** ([Generator](generated/torch.generator#torch.Generator "torch.Generator")) – Generator used in sampling.
#### Example >>> list(WeightedRandomSampler([0.1, 0.9, 0.4, 0.7, 3.0, 0.6], 5, replacement=True)) [4, 4, 1, 4, 5] >>> list(WeightedRandomSampler([0.9, 0.4, 0.05, 0.2, 0.3, 0.1], 5, replacement=False)) [0, 1, 4, 3, 2] `class torch.utils.data.BatchSampler(sampler, batch_size, drop_last)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/utils/data/sampler.html#BatchSampler) Wraps another sampler to yield a mini-batch of indices. Parameters * **sampler** (Sampler _or_ _Iterable_) – Base sampler. Can be any iterable object * **batch_size** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – Size of mini-batch. * **drop_last** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – If `True`, the sampler will drop the last batch if its size would be less than `batch_size` #### Example >>> list(BatchSampler(SequentialSampler(range(10)), batch_size=3, drop_last=False)) [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9]] >>> list(BatchSampler(SequentialSampler(range(10)), batch_size=3, drop_last=True)) [[0, 1, 2], [3, 4, 5], [6, 7, 8]] `class torch.utils.data.distributed.DistributedSampler(dataset, num_replicas=None, rank=None, shuffle=True, seed=0, drop_last=False)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/utils/data/distributed.html#DistributedSampler) Sampler that restricts data loading to a subset of the dataset. It is especially useful in conjunction with [`torch.nn.parallel.DistributedDataParallel`](generated/torch.nn.parallel.distributeddataparallel#torch.nn.parallel.DistributedDataParallel "torch.nn.parallel.DistributedDataParallel"). In such a case, each process can pass a `DistributedSampler` instance as a `DataLoader` sampler, and load a subset of the original dataset that is exclusive to it. Note Dataset is assumed to be of constant size. Parameters * **dataset** – Dataset used for sampling. * **num_replicas** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – Number of processes participating in distributed training. By default, `world_size` is retrieved from the current distributed group. * **rank** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – Rank of the current process within `num_replicas`. By default, `rank` is retrieved from the current distributed group. * **shuffle** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – If `True` (default), sampler will shuffle the indices. * **seed** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – random seed used to shuffle the sampler if `shuffle=True`. This number should be identical across all processes in the distributed group. Default: `0`. * **drop_last** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – if `True`, then the sampler will drop the tail of the data to make it evenly divisible across the number of replicas. If `False`, the sampler will add extra indices to make the data evenly divisible across the replicas. Default: `False`. Warning In distributed mode, calling the `set_epoch()` method at the beginning of each epoch **before** creating the `DataLoader` iterator is necessary to make shuffling work properly across multiple epochs. Otherwise, the same ordering will be always used. 
Example:

    >>> sampler = DistributedSampler(dataset) if is_distributed else None
    >>> loader = DataLoader(dataset, shuffle=(sampler is None),
    ...                     sampler=sampler)
    >>> for epoch in range(start_epoch, n_epochs):
    ...     if is_distributed:
    ...         sampler.set_epoch(epoch)
    ...     train(loader)

# DDP Communication Hooks

DDP communication hook is a generic interface to control how to communicate gradients across workers by overriding the vanilla allreduce in [DistributedDataParallel](https://pytorch.org/docs/stable/generated/torch.nn.parallel.DistributedDataParallel.html#torch.nn.parallel.DistributedDataParallel). A few built-in communication hooks are provided, and users can easily apply any of these hooks to optimize communication. In addition, the hook interface can also support user-defined communication strategies for more advanced use cases.

Warning DDP communication hook is experimental and subject to change.

Warning DDP communication hooks can only support single process single device mode on the NCCL backend.

## How to Use a Communication Hook?

To use a communication hook, the user just needs to let the DDP model register the hook before the training loop, using `torch.nn.parallel.DistributedDataParallel.register_comm_hook()`.

## Default Communication Hooks

Default communication hooks are simple **stateless** hooks, so the input state in `register_comm_hook` is either a process group or `None`.

`torch.distributed.algorithms.ddp_comm_hooks.default_hooks.allreduce_hook(process_group, bucket)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributed/algorithms/ddp_comm_hooks/default_hooks.html#allreduce_hook)

This DDP communication hook just calls `allreduce` using `GradBucket` tensors. Once gradient tensors are aggregated across all workers, its `then` callback takes the mean and returns the result. If a user registers this hook, the DDP results are expected to be the same as when no hook was registered. Hence, this hook does not change the behavior of DDP, and users can use it as a reference, or modify it to log useful information or for other purposes, without affecting DDP behavior.

Example::

    >>> ddp_model.register_comm_hook(process_group, allreduce_hook)

`torch.distributed.algorithms.ddp_comm_hooks.default_hooks.fp16_compress_hook(process_group, bucket)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributed/algorithms/ddp_comm_hooks/default_hooks.html#fp16_compress_hook)

This DDP communication hook implements a simple gradient compression approach that converts `GradBucket` tensors whose type is assumed to be `torch.float32` to half-precision floating point format (`torch.float16`). It allreduces those `float16` gradient tensors. Once the compressed gradient tensors are allreduced, its `then` callback, called `decompress`, converts the aggregated result back to `float32` and takes the mean.

Example::

    >>> ddp_model.register_comm_hook(process_group, fp16_compress_hook)

## PowerSGD Communication Hook

PowerSGD ([Vogels et al., NeurIPS 2019](https://arxiv.org/abs/1905.13727)) is a gradient compression algorithm, which can provide very high compression rates and accelerate bandwidth-bound distributed training. This algorithm needs to maintain both some hyperparameters and the internal state. Therefore, the PowerSGD communication hook is a **stateful** hook, and the user needs to provide a state object defined as below.
### PowerSGD State

`class torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook.PowerSGDState(process_group, matrix_approximation_rank=1, start_powerSGD_iter=10, use_error_feedback=True, warm_start=True, random_seed=0)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributed/algorithms/ddp_comm_hooks/powerSGD_hook.html#PowerSGDState)

Stores both the algorithm’s hyperparameters and the internal state for all the gradients during the training. Particularly, `matrix_approximation_rank` and `start_powerSGD_iter` are the main hyperparameters that should be tuned by the user. For performance, we suggest keeping the binary hyperparameters `use_error_feedback` and `warm_start` on.

1. `matrix_approximation_rank` controls the size of compressed low-rank tensors, which determines the compression rate. The lower the rank, the stronger the compression.

1.1. If `matrix_approximation_rank` is too low, the full model quality will need more training steps to reach, or will never be reached, yielding a loss in accuracy.

1.2. Increasing `matrix_approximation_rank` can substantially increase the computation costs of the compression, and the accuracy may not be further improved beyond a certain `matrix_approximation_rank` threshold.

To tune `matrix_approximation_rank`, we suggest starting from 1 and increasing by factors of 2 (like an exponential grid search: 1, 2, 4, …) until a satisfactory accuracy is reached. Typically only a small value (1-4) is used. For some NLP tasks (as shown in Appendix D of the original paper), this value has been increased to 32.

2. `start_powerSGD_iter` defers PowerSGD compression until step `start_powerSGD_iter`, and vanilla allreduce runs prior to step `start_powerSGD_iter`. This hybrid scheme of **vanilla allreduce + PowerSGD** can effectively improve the accuracy, even when a relatively small `matrix_approximation_rank` is used. This is because the beginning of the training phase is usually very sensitive to inaccurate gradients, and compressing gradients too early may quickly put the training on a suboptimal trajectory, which can have an irrecoverable impact on the accuracy.

To tune `start_powerSGD_iter`, we suggest starting with 10% of the total training steps, and increasing it until a satisfactory accuracy is reached.

Warning If error feedback or warm-up is enabled, the minimum value of `start_powerSGD_iter` allowed in DDP is 2. This is because there is another internal optimization that rebuilds buckets at iteration 1 in DDP, and this can conflict with any tensor memorized before the rebuild process.

### PowerSGD Hooks

Warning PowerSGD typically requires extra memory of the same size as the model’s gradients to enable error feedback, which can compensate for biased compressed communication and improve accuracy.

Warning The current implementation may cause gradient overflow for FP16 input.

`torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook.powerSGD_hook(state, bucket)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributed/algorithms/ddp_comm_hooks/powerSGD_hook.html#powerSGD_hook)

This DDP communication hook implements the PowerSGD gradient compression algorithm described in the [paper](https://arxiv.org/abs/1905.13727). Once gradient tensors are aggregated across all workers, this hook applies compression as follows:

1. Views the input flattened 1D gradient tensor as two groups of per-parameter tensors: high-rank tensors and vector-like rank-1 tensors (for biases).

2. Handles rank-1 tensors by allreducing them without compression:
2.1. Allocates contiguous memory for those rank-1 tensors, and allreduces all the rank-1 tensors as a batch, without compression;

2.2. Copies the individual rank-1 tensors from the contiguous memory back to the input tensor.

3. Handles high-rank tensors by PowerSGD compression:

3.1. For each high-rank tensor M, creates two low-rank tensors P and Q for decomposing M, such that M = PQ^T, where Q is initialized from a standard normal distribution and orthogonalized;

3.2. Computes each P in Ps, which is equal to MQ;

3.3. Allreduces Ps as a batch;

3.4. Orthogonalizes each P in Ps;

3.5. Computes each Q in Qs, which is approximately equal to M^TP;

3.6. Allreduces Qs as a batch;

3.7. Computes each M among all the high-rank tensors, which is approximately equal to PQ^T.

Note that this communication hook enforces vanilla allreduce for the first `state.start_powerSGD_iter` iterations. This not only gives the user more control over the tradeoff between speedup and accuracy, but also helps abstract away some complexity of the internal optimization of DDP for future communication hook developers.

Parameters

* **state** (PowerSGDState) – State information to configure the compression rate and support error feedback, warm start, etc. To tune the compression configs, the main things to tune are `matrix_approximation_rank` and `start_powerSGD_iter`.
* **bucket** (_dist._GradBucket_) – Bucket that stores a 1D flattened gradient tensor that batches multiple per-variable tensors. Note that since DDP comm hook only supports single process single device mode at this time, only exactly one tensor is stored in this bucket.

Returns Future handler of the communication, which updates the gradients in place.

Example::

    >>> state = PowerSGDState(process_group=process_group, matrix_approximation_rank=1, start_powerSGD_iter=10)
    >>> ddp_model.register_comm_hook(state, powerSGD_hook)

`torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook.batched_powerSGD_hook(state, bucket)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributed/algorithms/ddp_comm_hooks/powerSGD_hook.html#batched_powerSGD_hook)

This DDP communication hook implements a simplified PowerSGD gradient compression algorithm described in the [paper](https://arxiv.org/abs/1905.13727). This variant does not compress the gradients layer by layer, but instead compresses the flattened input tensor that batches all the gradients. Therefore, it is **faster** than `powerSGD_hook()`, but usually results in a **much lower accuracy**, unless `matrix_approximation_rank` is 1.

Warning Increasing `matrix_approximation_rank` here may not necessarily increase the accuracy, because batching per-parameter tensors without column/row alignment can destroy low-rank structure. Therefore, the user should always consider `powerSGD_hook()` first, and only consider this variant when a satisfactory accuracy can be achieved when `matrix_approximation_rank` is 1.

Once gradient tensors are aggregated across all workers, this hook applies compression as follows:

1. Views the input flattened 1D gradient tensor as a square-shaped tensor M with 0 paddings;

2. Creates two low-rank tensors P and Q for decomposing M, such that M = PQ^T, where Q is initialized from a standard normal distribution and orthogonalized;

3. Computes P, which is equal to MQ;

4. Allreduces P;

5. Orthogonalizes P;

6. Computes Q, which is approximately equal to M^TP;

7. Allreduces Q;

8. Computes M, which is approximately equal to PQ^T.

9. Truncates the input tensor to the original length.
Note that this communication hook enforces vanilla allreduce for the first `state.start_powerSGD_iter` iterations. This not only gives the user more control over the tradeoff between speedup and accuracy, but also helps abstract away some complexity of the internal optimization of DDP for future communication hook developers.

Parameters

* **state** (PowerSGDState) – State information to configure the compression rate and support error feedback, warm start, etc. To tune the compression configs, the main things to tune are `matrix_approximation_rank` and `start_powerSGD_iter`.
* **bucket** (_dist._GradBucket_) – Bucket that stores a 1D flattened gradient tensor that batches multiple per-variable tensors. Note that since DDP comm hook only supports single process single device mode at this time, only exactly one tensor is stored in this bucket.

Returns Future handler of the communication, which updates the gradients in place.

Example::

    >>> state = PowerSGDState(process_group=process_group, matrix_approximation_rank=1)
    >>> ddp_model.register_comm_hook(state, batched_powerSGD_hook)

## Acknowledgements

Many thanks to PowerSGD paper author **Thijs Vogels** for the code review on the PowerSGD communication hook, as well as the [comparison experiments](https://observablehq.com/@tvogels/powersgd-benchmark), which show that the performance of the PowerSGD communication hook is on par with the implementation in the original [paper](https://arxiv.org/abs/1905.13727).

# Distributed communication package - torch.distributed

Note Please refer to [PyTorch Distributed Overview](https://pytorch.org/tutorials/beginner/dist_overview.html) for a brief introduction to all features related to distributed training.

## Backends

`torch.distributed` supports three built-in backends, each with different capabilities. The table below shows which functions are available for use with CPU / CUDA tensors. MPI supports CUDA only if the implementation used to build PyTorch supports it.

Backend | `gloo` (CPU) | `gloo` (GPU) | `mpi` (CPU) | `mpi` (GPU) | `nccl` (CPU) | `nccl` (GPU)
---|---|---|---|---|---|---
send | ✓ | ✘ | ✓ | ? | ✘ | ✘
recv | ✓ | ✘ | ✓ | ? | ✘ | ✘
broadcast | ✓ | ✓ | ✓ | ? | ✘ | ✓
all_reduce | ✓ | ✓ | ✓ | ? | ✘ | ✓
reduce | ✓ | ✘ | ✓ | ? | ✘ | ✓
all_gather | ✓ | ✘ | ✓ | ? | ✘ | ✓
gather | ✓ | ✘ | ✓ | ? | ✘ | ✘
scatter | ✓ | ✘ | ✓ | ? | ✘ | ✘
reduce_scatter | ✘ | ✘ | ✘ | ✘ | ✘ | ✓
all_to_all | ✘ | ✘ | ✓ | ? | ✘ | ✘
barrier | ✓ | ✘ | ✓ | ? | ✘ | ✓

### Backends that come with PyTorch

PyTorch distributed package supports Linux (stable), MacOS (stable), and Windows (prototype). By default for Linux, the Gloo and NCCL backends are built and included in PyTorch distributed (NCCL only when building with CUDA). MPI is an optional backend that can only be included if you build PyTorch from source (e.g., building PyTorch on a host that has MPI installed).

Note As of PyTorch v1.8, Windows supports all collective communications backends but NCCL. If the `init_method` argument of `init_process_group()` points to a file, it must adhere to the following schema:

* Local file system, `init_method="file:///d:/tmp/some_file"`
* Shared file system, `init_method="file://////{machine_name}/{share_folder_name}/some_file"`

As on the Linux platform, you can enable TcpStore by setting the environment variables MASTER_ADDR and MASTER_PORT.

### Which backend to use?

In the past, we were often asked: “which backend should I use?”.

* Rule of thumb
* Use the NCCL backend for distributed **GPU** training
* Use the Gloo backend for distributed **CPU** training.
* GPU hosts with InfiniBand interconnect * Use NCCL, since it’s the only backend that currently supports InfiniBand and GPUDirect. * GPU hosts with Ethernet interconnect * Use NCCL, since it currently provides the best distributed GPU training performance, especially for multiprocess single-node or multi-node distributed training. If you encounter any problem with NCCL, use Gloo as the fallback option. (Note that Gloo currently runs slower than NCCL for GPUs.) * CPU hosts with InfiniBand interconnect * If your InfiniBand has enabled IP over IB, use Gloo, otherwise, use MPI instead. We are planning on adding InfiniBand support for Gloo in the upcoming releases. * CPU hosts with Ethernet interconnect * Use Gloo, unless you have specific reasons to use MPI. ### Common environment variables #### Choosing the network interface to use By default, both the NCCL and Gloo backends will try to find the right network interface to use. If the automatically detected interface is not correct, you can override it using the following environment variables (applicable to the respective backend): * **NCCL_SOCKET_IFNAME** , for example `export NCCL_SOCKET_IFNAME=eth0` * **GLOO_SOCKET_IFNAME** , for example `export GLOO_SOCKET_IFNAME=eth0` If you’re using the Gloo backend, you can specify multiple interfaces by separating them by a comma, like this: `export GLOO_SOCKET_IFNAME=eth0,eth1,eth2,eth3`. The backend will dispatch operations in a round-robin fashion across these interfaces. It is imperative that all processes specify the same number of interfaces in this variable. #### Other NCCL environment variables NCCL has also provided a number of environment variables for fine-tuning purposes. Commonly used ones include the following for debugging purposes: * `export NCCL_DEBUG=INFO` * `export NCCL_DEBUG_SUBSYS=ALL` For the full list of NCCL environment variables, please refer to [NVIDIA NCCL’s official documentation](https://docs.nvidia.com/deeplearning/sdk/nccl- developer-guide/docs/env.html) ## Basics The `torch.distributed` package provides PyTorch support and communication primitives for multiprocess parallelism across several computation nodes running on one or more machines. The class [`torch.nn.parallel.DistributedDataParallel()`](generated/torch.nn.parallel.distributeddataparallel#torch.nn.parallel.DistributedDataParallel "torch.nn.parallel.DistributedDataParallel") builds on this functionality to provide synchronous distributed training as a wrapper around any PyTorch model. This differs from the kinds of parallelism provided by [Multiprocessing package - torch.multiprocessing](multiprocessing) and [`torch.nn.DataParallel()`](generated/torch.nn.dataparallel#torch.nn.DataParallel "torch.nn.DataParallel") in that it supports multiple network-connected machines and in that the user must explicitly launch a separate copy of the main training script for each process. In the single-machine synchronous case, `torch.distributed` or the [`torch.nn.parallel.DistributedDataParallel()`](generated/torch.nn.parallel.distributeddataparallel#torch.nn.parallel.DistributedDataParallel "torch.nn.parallel.DistributedDataParallel") wrapper may still have advantages over other approaches to data-parallelism, including [`torch.nn.DataParallel()`](generated/torch.nn.dataparallel#torch.nn.DataParallel "torch.nn.DataParallel"): * Each process maintains its own optimizer and performs a complete optimization step with each iteration. 
While this may appear redundant, since the gradients have already been gathered together and averaged across processes and are thus the same for every process, this means that no parameter broadcast step is needed, reducing time spent transferring tensors between nodes. * Each process contains an independent Python interpreter, eliminating the extra interpreter overhead and “GIL-thrashing” that comes from driving several execution threads, model replicas, or GPUs from a single Python process. This is especially important for models that make heavy use of the Python runtime, including models with recurrent layers or many small components. ## Initialization The package needs to be initialized using the `torch.distributed.init_process_group()` function before calling any other methods. This blocks until all processes have joined. `torch.distributed.is_available()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributed.html#is_available) Returns `True` if the distributed package is available. Otherwise, `torch.distributed` does not expose any other APIs. Currently, `torch.distributed` is available on Linux, MacOS and Windows. Set `USE_DISTRIBUTED=1` to enable it when building PyTorch from source. Currently, the default value is `USE_DISTRIBUTED=1` for Linux and Windows, `USE_DISTRIBUTED=0` for MacOS. `torch.distributed.init_process_group(backend, init_method=None, timeout=datetime.timedelta(seconds=1800), world_size=-1, rank=-1, store=None, group_name='')` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributed/distributed_c10d.html#init_process_group) Initializes the default distributed process group, and this will also initialize the distributed package. There are 2 main ways to initialize a process group: 1. Specify `store`, `rank`, and `world_size` explicitly. 2. Specify `init_method` (a URL string) which indicates where/how to discover peers. Optionally specify `rank` and `world_size`, or encode all required parameters in the URL and omit them. If neither is specified, `init_method` is assumed to be “env://”. Parameters * **backend** ([str](https://docs.python.org/3/library/stdtypes.html#str "\(in Python v3.9\)") _or_Backend) – The backend to use. Depending on build-time configurations, valid values include `mpi`, `gloo`, and `nccl`. This field should be given as a lowercase string (e.g., `"gloo"`), which can also be accessed via `Backend` attributes (e.g., `Backend.GLOO`). If using multiple processes per machine with `nccl` backend, each process must have exclusive access to every GPU it uses, as sharing GPUs between processes can result in deadlocks. * **init_method** ([str](https://docs.python.org/3/library/stdtypes.html#str "\(in Python v3.9\)") _,__optional_) – URL specifying how to initialize the process group. Default is “env://” if no `init_method` or `store` is specified. Mutually exclusive with `store`. * **world_size** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – Number of processes participating in the job. Required if `store` is specified. * **rank** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – Rank of the current process (it should be a number between 0 and `world_size`-1). Required if `store` is specified. * **store** (Store _,__optional_) – Key/value store accessible to all workers, used to exchange connection/address information. Mutually exclusive with `init_method`. 
* **timeout** (_timedelta_ _,__optional_) – Timeout for operations executed against the process group. Default value equals 30 minutes. This is applicable for the `gloo` backend. For `nccl`, this is applicable only if the environment variable `NCCL_BLOCKING_WAIT` or `NCCL_ASYNC_ERROR_HANDLING` is set to 1. When `NCCL_BLOCKING_WAIT` is set, this is the duration for which the process will block and wait for collectives to complete before throwing an exception. When `NCCL_ASYNC_ERROR_HANDLING` is set, this is the duration after which collectives will be aborted asynchronously and the process will crash. `NCCL_BLOCKING_WAIT` will provide errors to the user which can be caught and handled, but due to its blocking nature, it has a performance overhead. On the other hand, `NCCL_ASYNC_ERROR_HANDLING` has very little performance overhead, but crashes the process on errors. This is done since CUDA execution is async and it is no longer safe to continue executing user code since failed async NCCL operations might result in subsequent CUDA operations running on corrupted data. Only one of these two environment variables should be set. * **group_name** ([str](https://docs.python.org/3/library/stdtypes.html#str "\(in Python v3.9\)") _,__optional_ _,__deprecated_) – Group name. To enable `backend == Backend.MPI`, PyTorch needs to be built from source on a system that supports MPI. `class torch.distributed.Backend` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributed/distributed_c10d.html#Backend) An enum-like class of available backends: GLOO, NCCL, MPI, and other registered backends. The values of this class are lowercase strings, e.g., `"gloo"`. They can be accessed as attributes, e.g., `Backend.NCCL`. This class can be directly called to parse the string, e.g., `Backend(backend_str)` will check if `backend_str` is valid, and return the parsed lowercase string if so. It also accepts uppercase strings, e.g., `Backend("GLOO")` returns `"gloo"`. Note The entry `Backend.UNDEFINED` is present but only used as initial value of some fields. Users should neither use it directly nor assume its existence. `torch.distributed.get_backend(group=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributed/distributed_c10d.html#get_backend) Returns the backend of the given process group. Parameters **group** (_ProcessGroup_ _,__optional_) – The process group to work on. The default is the general main process group. If another specific group is specified, the calling process must be part of `group`. Returns The backend of the given process group as a lower case string. `torch.distributed.get_rank(group=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributed/distributed_c10d.html#get_rank) Returns the rank of current process group Rank is a unique identifier assigned to each process within a distributed process group. They are always consecutive integers ranging from 0 to `world_size`. Parameters **group** (_ProcessGroup_ _,__optional_) – The process group to work on. If None, the default process group will be used. Returns The rank of the process group -1, if not part of the group `torch.distributed.get_world_size(group=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributed/distributed_c10d.html#get_world_size) Returns the number of processes in the current process group Parameters **group** (_ProcessGroup_ _,__optional_) – The process group to work on. If None, the default process group will be used. 
Returns The world size of the process group -1, if not part of the group `torch.distributed.is_initialized()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributed/distributed_c10d.html#is_initialized) Checking if the default process group has been initialized `torch.distributed.is_mpi_available()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributed/distributed_c10d.html#is_mpi_available) Checks if the MPI backend is available. `torch.distributed.is_nccl_available()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributed/distributed_c10d.html#is_nccl_available) Checks if the NCCL backend is available. Currently three initialization methods are supported: ### TCP initialization There are two ways to initialize using TCP, both requiring a network address reachable from all processes and a desired `world_size`. The first way requires specifying an address that belongs to the rank 0 process. This initialization method requires that all processes have manually specified ranks. Note that multicast address is not supported anymore in the latest distributed package. `group_name` is deprecated as well. import torch.distributed as dist # Use address of one of the machines dist.init_process_group(backend, init_method='tcp://10.1.1.20:23456', rank=args.rank, world_size=4) ### Shared file-system initialization Another initialization method makes use of a file system that is shared and visible from all machines in a group, along with a desired `world_size`. The URL should start with `file://` and contain a path to a non-existent file (in an existing directory) on a shared file system. File-system initialization will automatically create that file if it doesn’t exist, but will not delete the file. Therefore, it is your responsibility to make sure that the file is cleaned up before the next `init_process_group()` call on the same file path/name. Note that automatic rank assignment is not supported anymore in the latest distributed package and `group_name` is deprecated as well. Warning This method assumes that the file system supports locking using `fcntl` \- most local systems and NFS support it. Warning This method will always create the file and try its best to clean up and remove the file at the end of the program. In other words, each initialization with the file init method will need a brand new empty file in order for the initialization to succeed. If the same file used by the previous initialization (which happens not to get cleaned up) is used again, this is unexpected behavior and can often cause deadlocks and failures. Therefore, even though this method will try its best to clean up the file, if the auto- delete happens to be unsuccessful, it is your responsibility to ensure that the file is removed at the end of the training to prevent the same file to be reused again during the next time. This is especially important if you plan to call `init_process_group()` multiple times on the same file name. In other words, if the file is not removed/cleaned up and you call `init_process_group()` again on that file, failures are expected. The rule of thumb here is that, make sure that the file is non-existent or empty every time `init_process_group()` is called. 
import torch.distributed as dist # rank should always be specified dist.init_process_group(backend, init_method='file:///mnt/nfs/sharedfile', world_size=4, rank=args.rank) ### Environment variable initialization This method will read the configuration from environment variables, allowing one to fully customize how the information is obtained. The variables to be set are: * `MASTER_PORT` \- required; has to be a free port on machine with rank 0 * `MASTER_ADDR` \- required (except for rank 0); address of rank 0 node * `WORLD_SIZE` \- required; can be set either here, or in a call to init function * `RANK` \- required; can be set either here, or in a call to init function The machine with rank 0 will be used to set up all connections. This is the default method, meaning that `init_method` does not have to be specified (or can be `env://`). ## Distributed Key-Value Store The distributed package comes with a distributed key-value store, which can be used to share information between processes in the group as well as to initialize the distributed pacakge in `torch.distributed.init_process_group()` (by explicitly creating the store as an alternative to specifying `init_method`.) There are 3 choices for Key-Value Stores: `TCPStore`, `FileStore`, and `HashStore`. `class torch.distributed.Store` Base class for all store implementations, such as the 3 provided by PyTorch distributed: (`TCPStore`, `FileStore`, and `HashStore`). `class torch.distributed.TCPStore` A TCP-based distributed key-value store implementation. The server store holds the data, while the client stores can connect to the server store over TCP and perform actions such as `set()` to insert a key-value pair, `get()` to retrieve a key-value pair, etc. Parameters * **host_name** ([str](https://docs.python.org/3/library/stdtypes.html#str "\(in Python v3.9\)")) – The hostname or IP Address the server store should run on. * **port** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – The port on which the server store should listen for incoming requests. * **world_size** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – The total number of store users (number of clients + 1 for the server). * **is_master** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – True when initializing the server store, False for client stores. * **timeout** (_timedelta_) – Timeout used by the store during initialization and for methods such as `get()` and `wait()`. Example:: >>> import torch.distributed as dist >>> from datetime import timedelta >>> # Run on process 1 (server) >>> server_store = dist.TCPStore("127.0.0.1", 1234, 2, True, timedelta(seconds=30)) >>> # Run on process 2 (client) >>> client_store = dist.TCPStore("127.0.0.1", 1234, 2, False) >>> # Use any of the store methods from either the client or server after initialization >>> server_store.set("first_key", "first_value") >>> client_store.get("first_key") `class torch.distributed.HashStore` A thread-safe store implementation based on an underlying hashmap. This store can be used within the same process (for example, by other threads), but cannot be used across processes. 
Example:: >>> import torch.distributed as dist >>> store = dist.HashStore() >>> # store can be used from other threads >>> # Use any of the store methods after initialization >>> store.set("first_key", "first_value") `class torch.distributed.FileStore` A store implementation that uses a file to store the underlying key-value pairs. Parameters * **file_name** ([str](https://docs.python.org/3/library/stdtypes.html#str "\(in Python v3.9\)")) – path of the file in which to store the key-value pairs * **world_size** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – The total number of processes using the store Example:: >>> import torch.distributed as dist >>> store1 = dist.FileStore("/tmp/filestore", 2) >>> store2 = dist.FileStore("/tmp/filestore", 2) >>> # Use any of the store methods from either the client or server after initialization >>> store1.set("first_key", "first_value") >>> store2.get("first_key") `class torch.distributed.PrefixStore` A wrapper around any of the 3 key-value stores (`TCPStore`, `FileStore`, and `HashStore`) that adds a prefix to each key inserted to the store. Parameters * **prefix** ([str](https://docs.python.org/3/library/stdtypes.html#str "\(in Python v3.9\)")) – The prefix string that is prepended to each key before being inserted into the store. * **store** (_torch.distributed.store_) – A store object that forms the underlying key-value store. `torch.distributed.Store.set(self: torch._C._distributed_c10d.Store, arg0: str, arg1: str) → None` Inserts the key-value pair into the store based on the supplied `key` and `value`. If `key` already exists in the store, it will overwrite the old value with the new supplied `value`. Parameters * **key** ([str](https://docs.python.org/3/library/stdtypes.html#str "\(in Python v3.9\)")) – The key to be added to the store. * **value** ([str](https://docs.python.org/3/library/stdtypes.html#str "\(in Python v3.9\)")) – The value associated with `key` to be added to the store. Example:: >>> import torch.distributed as dist >>> from datetime import timedelta >>> store = dist.TCPStore("127.0.0.1", 0, 1, True, timedelta(seconds=30)) >>> store.set("first_key", "first_value") >>> # Should return "first_value" >>> store.get("first_key") `torch.distributed.Store.get(self: torch._C._distributed_c10d.Store, arg0: str) → bytes` Retrieves the value associated with the given `key` in the store. If `key` is not present in the store, the function will wait for `timeout`, which is defined when initializing the store, before throwing an exception. Parameters **key** ([str](https://docs.python.org/3/library/stdtypes.html#str "\(in Python v3.9\)")) – The function will return the value associated with this key. Returns Value associated with `key` if `key` is in the store. Example:: >>> import torch.distributed as dist >>> from datetime import timedelta >>> store = dist.TCPStore("127.0.0.1", 0, 1, True, timedelta(seconds=30)) >>> store.set("first_key", "first_value") >>> # Should return "first_value" >>> store.get("first_key") `torch.distributed.Store.add(self: torch._C._distributed_c10d.Store, arg0: str, arg1: int) → int` The first call to add for a given `key` creates a counter associated with `key` in the store, initialized to `amount`. Subsequent calls to add with the same `key` increment the counter by the specified `amount`. Calling `add()` with a key that has already been set in the store by `set()` will result in an exception. 
Parameters * **key** ([str](https://docs.python.org/3/library/stdtypes.html#str "\(in Python v3.9\)")) – The key in the store whose counter will be incremented. * **amount** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – The quantity by which the counter will be incremented. Example:: >>> import torch.distributed as dist >>> from datetime import timedelta >>> # Using TCPStore as an example, other store types can also be used >>> store = dist.TCPStore("127.0.0.1", 0, 1, True, timedelta(seconds=30)) >>> store.add("first_key", 1) >>> store.add("first_key", 6) >>> # Should return 7 >>> store.get("first_key") `torch.distributed.Store.wait(*args, **kwargs)` Overloaded function. 1. wait(self: torch._C._distributed_c10d.Store, arg0: List[str]) -> None Waits for each key in `keys` to be added to the store. If not all keys are set before the `timeout` (set during store initialization), then `wait` will throw an exception. Parameters **keys** ([list](https://docs.python.org/3/library/stdtypes.html#list "\(in Python v3.9\)")) – List of keys on which to wait until they are set in the store. Example:: >>> import torch.distributed as dist >>> from datetime import timedelta >>> # Using TCPStore as an example, other store types can also be used >>> store = dist.TCPStore("127.0.0.1", 0, 1, True, timedelta(seconds=30)) >>> # This will throw an exception after 30 seconds >>> store.wait(["bad_key"]) 2. wait(self: torch._C._distributed_c10d.Store, arg0: List[str], arg1: datetime.timedelta) -> None Waits for each key in `keys` to be added to the store, and throws an exception if the keys have not been set by the supplied `timeout`. Parameters * **keys** ([list](https://docs.python.org/3/library/stdtypes.html#list "\(in Python v3.9\)")) – List of keys on which to wait until they are set in the store. * **timeout** (_timedelta_) – Time to wait for the keys to be added before throwing an exception. Example:: >>> import torch.distributed as dist >>> from datetime import timedelta >>> # Using TCPStore as an example, other store types can also be used >>> store = dist.TCPStore("127.0.0.1", 0, 1, True, timedelta(seconds=30)) >>> # This will throw an exception after 10 seconds >>> store.wait(["bad_key"], timedelta(seconds=10)) `torch.distributed.Store.num_keys(self: torch._C._distributed_c10d.Store) → int` Returns the number of keys set in the store. Note that this number will typically be one greater than the number of keys added by `set()` and `add()` since one key is used to coordinate all the workers using the store. Warning When used with the `TCPStore`, `num_keys` returns the number of keys written to the underlying file. If the store is destructed and another store is created with the same file, the original keys will be retained. Returns The number of keys present in the store. Example:: >>> import torch.distributed as dist >>> from datetime import timedelta >>> # Using TCPStore as an example, other store types can also be used >>> store = dist.TCPStore("127.0.0.1", 0, 1, True, timedelta(seconds=30)) >>> store.set("first_key", "first_value") >>> # This should return 2 >>> store.num_keys() `torch.distributed.Store.delete_key(self: torch._C._distributed_c10d.Store, arg0: str) → bool` Deletes the key-value pair associated with `key` from the store. Returns `true` if the key was successfully deleted, and `false` if it was not. Warning The `delete_key` API is only supported by the `TCPStore` and `HashStore`. Using this API with the `FileStore` will result in an exception. 
Parameters **key** ([str](https://docs.python.org/3/library/stdtypes.html#str "\(in Python v3.9\)")) – The key to be deleted from the store Returns `True` if `key` was deleted, otherwise `False`. Example:: >>> import torch.distributed as dist >>> from datetime import timedelta >>> # Using TCPStore as an example, HashStore can also be used >>> store = dist.TCPStore("127.0.0.1", 0, 1, True, timedelta(seconds=30)) >>> store.set("first_key", "first_value") >>> # This should return true >>> store.delete_key("first_key") >>> # This should return false >>> store.delete_key("bad_key") `torch.distributed.Store.set_timeout(self: torch._C._distributed_c10d.Store, arg0: datetime.timedelta) → None` Sets the store’s default timeout. This timeout is used during initialization and in `wait()` and `get()`. Parameters **timeout** (_timedelta_) – Timeout to be set in the store. Example:: >>> import torch.distributed as dist >>> from datetime import timedelta >>> # Using TCPStore as an example, other store types can also be used >>> store = dist.TCPStore("127.0.0.1", 0, 1, True, timedelta(seconds=30)) >>> store.set_timeout(timedelta(seconds=10)) >>> # This will throw an exception after 10 seconds >>> store.wait(["bad_key"]) ## Groups By default collectives operate on the default group (also called the world) and require all processes to enter the distributed function call. However, some workloads can benefit from more fine-grained communication. This is where distributed groups come into play. The `new_group()` function can be used to create new groups, with arbitrary subsets of all processes. It returns an opaque group handle that can be given as a `group` argument to all collectives (collectives are distributed functions to exchange information in certain well-known programming patterns). `torch.distributed.new_group(ranks=None, timeout=datetime.timedelta(seconds=1800), backend=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributed/distributed_c10d.html#new_group) Creates a new distributed group. This function requires that all processes in the main group (i.e. all processes that are part of the distributed job) enter this function, even if they are not going to be members of the group. Additionally, groups should be created in the same order in all processes. Warning Using multiple process groups with the `NCCL` backend concurrently is not safe and the user should perform explicit synchronization in their application to ensure only one process group is used at a time. This means collectives from one process group should have completed execution on the device (not just enqueued since CUDA execution is async) before collectives from another process group are enqueued. See [Using multiple NCCL communicators concurrently](https://docs.nvidia.com/deeplearning/nccl/user-guide/docs/usage/communicators.html#using-multiple-nccl-communicators-concurrently) for more details. Parameters * **ranks** ([list](https://docs.python.org/3/library/stdtypes.html#list "\(in Python v3.9\)") _[_[int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _]_) – List of ranks of group members. If `None`, will be set to all ranks. Default is `None`. * **timeout** (_timedelta_ _,__optional_) – Timeout for operations executed against the process group. Default value equals 30 minutes. This is only applicable for the `gloo` backend. * **backend** ([str](https://docs.python.org/3/library/stdtypes.html#str "\(in Python v3.9\)") _or_Backend _,__optional_) – The backend to use. Depending on build-time configurations, valid values are `gloo` and `nccl`. By default uses the same backend as the global group. This field should be given as a lowercase string (e.g., `"gloo"`), which can also be accessed via `Backend` attributes (e.g., `Backend.GLOO`). Returns A handle of distributed group that can be given to collective calls.
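A minimal sketch of subgroup usage (illustrative only; it assumes a default process group with at least 2 ranks has already been initialized, and the tensor values are made up):

import torch
import torch.distributed as dist

# Every rank must call new_group(), even ranks that will not be members.
group = dist.new_group(ranks=[0, 1])

t = torch.ones(1)
if dist.get_rank() in (0, 1):
    # The collective runs only across the subgroup, so ranks 0 and 1
    # both end up with tensor([2.]).
    dist.all_reduce(t, group=group)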
## Point-to-point communication `torch.distributed.send(tensor, dst, group=None, tag=0)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributed/distributed_c10d.html#send) Sends a tensor synchronously. Parameters * **tensor** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – Tensor to send. * **dst** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – Destination rank. * **group** (_ProcessGroup_ _,__optional_) – The process group to work on. If None, the default process group will be used. * **tag** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – Tag to match send with remote recv `torch.distributed.recv(tensor, src=None, group=None, tag=0)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributed/distributed_c10d.html#recv) Receives a tensor synchronously. Parameters * **tensor** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – Tensor to fill with received data. * **src** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – Source rank. Will receive from any process if unspecified. * **group** (_ProcessGroup_ _,__optional_) – The process group to work on. If None, the default process group will be used. * **tag** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – Tag to match recv with remote send Returns Sender rank. -1, if not part of the group `isend()` and `irecv()` return distributed request objects when used. In general, the type of this object is unspecified as they should never be created manually, but they are guaranteed to support two methods: * `is_completed()` \- returns True if the operation has finished * `wait()` \- will block the process until the operation is finished. `is_completed()` is guaranteed to return True once it returns. `torch.distributed.isend(tensor, dst, group=None, tag=0)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributed/distributed_c10d.html#isend) Sends a tensor asynchronously. Parameters * **tensor** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – Tensor to send. * **dst** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – Destination rank. * **group** (_ProcessGroup_ _,__optional_) – The process group to work on. If None, the default process group will be used. * **tag** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – Tag to match send with remote recv Returns A distributed request object. None, if not part of the group `torch.distributed.irecv(tensor, src=None, group=None, tag=0)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributed/distributed_c10d.html#irecv) Receives a tensor asynchronously. Parameters * **tensor** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – Tensor to fill with received data. * **src** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – Source rank. Will receive from any process if unspecified. * **group** (_ProcessGroup_ _,__optional_) – The process group to work on. If None, the default process group will be used.
* **tag** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – Tag to match recv with remote send Returns A distributed request object. None, if not part of the group ## Synchronous and asynchronous collective operations Every collective operation function supports the following two kinds of operations, depending on the setting of the `async_op` flag passed into the collective: **Synchronous operation** \- the default mode, when `async_op` is set to `False`. When the function returns, it is guaranteed that the collective operation is performed. In the case of CUDA operations, it is not guaranteed that the CUDA operation is completed, since CUDA operations are asynchronous. For CPU collectives, any further function calls utilizing the output of the collective call will behave as expected. For CUDA collectives, function calls utilizing the output on the same CUDA stream will behave as expected. Users must take care of synchronization under the scenario of running under different streams. For details on CUDA semantics such as stream synchronization, see [CUDA Semantics](https://pytorch.org/docs/stable/notes/cuda.html). See the below script to see examples of differences in these semantics for CPU and CUDA operations. **Asynchronous operation** \- when `async_op` is set to True. The collective operation function returns a distributed request object. In general, you don’t need to create it manually and it is guaranteed to support two methods: * `is_completed()` \- in the case of CPU collectives, returns `True` if completed. In the case of CUDA operations, returns `True` if the operation has been successfully enqueued onto a CUDA stream and the output can be utilized on the default stream without further synchronization. * `wait()` \- in the case of CPU collectives, will block the process until the operation is completed. In the case of CUDA collectives, will block until the operation has been successfully enqueued onto a CUDA stream and the output can be utilized on the default stream without further synchronization. **Example** The following code can serve as a reference regarding semantics for CUDA operations when using distributed collectives. It shows the explicit need to synchronize when using collective outputs on different CUDA streams: # Code runs on each rank. dist.init_process_group("nccl", rank=rank, world_size=2) output = torch.tensor([rank]).cuda(rank) s = torch.cuda.Stream() handle = dist.all_reduce(output, async_op=True) # Wait ensures the operation is enqueued, but not necessarily complete. handle.wait() # Using result on non-default stream. with torch.cuda.stream(s): s.wait_stream(torch.cuda.default_stream()) output.add_(100) if rank == 0: # if the explicit call to wait_stream was omitted, the output below will be # non-deterministically 1 or 101, depending on whether the allreduce overwrote # the value after the add completed. print(output) ## Collective functions `torch.distributed.broadcast(tensor, src, group=None, async_op=False)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributed/distributed_c10d.html#broadcast) Broadcasts the tensor to the whole group. `tensor` must have the same number of elements in all processes participating in the collective. Parameters * **tensor** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – Data to be sent if `src` is the rank of current process, and tensor to be used to save received data otherwise. 
* **src** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – Source rank. * **group** (_ProcessGroup_ _,__optional_) – The process group to work on. If None, the default process group will be used. * **async_op** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – Whether this op should be an async op Returns Async work handle, if async_op is set to True. None, if not async_op or if not part of the group `torch.distributed.broadcast_object_list(object_list, src=0, group=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributed/distributed_c10d.html#broadcast_object_list) Broadcasts picklable objects in `object_list` to the whole group. Similar to `broadcast()`, but Python objects can be passed in. Note that all objects in `object_list` must be picklable in order to be broadcasted. Parameters * **object_list** (_List_ _[__Any_ _]_) – List of input objects to broadcast. Each object must be picklable. Only objects on the `src` rank will be broadcast, but each rank must provide lists of equal sizes. * **src** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – Source rank from which to broadcast `object_list`. * **group** – (ProcessGroup, optional): The process group to work on. If None, the default process group will be used. Default is `None`. Returns `None`. If rank is part of the group, `object_list` will contain the broadcasted objects from `src` rank. Note For NCCL-based process groups, internal tensor representations of objects must be moved to the GPU device before communication takes place. In this case, the device used is given by `torch.cuda.current_device()` and it is the user’s responsibility to ensure that this is set so that each rank has an individual GPU, via `torch.cuda.set_device()`. Note Note that this API differs slightly from the `all_gather()` collective since it does not provide an `async_op` handle and thus will be a blocking call. Warning `broadcast_object_list()` uses `pickle` module implicitly, which is known to be insecure. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling. Only call this function with data you trust. Example:: >>> # Note: Process group initialization omitted on each rank. >>> import torch.distributed as dist >>> if dist.get_rank() == 0: >>> # Assumes world_size of 3. >>> objects = ["foo", 12, {1: 2}] # any picklable object >>> else: >>> objects = [None, None, None] >>> dist.broadcast_object_list(objects, src=0) >>> objects ['foo', 12, {1: 2}] `torch.distributed.all_reduce(tensor, op=<ReduceOp.SUM: 0>, group=None, async_op=False)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributed/distributed_c10d.html#all_reduce) Reduces the tensor data across all machines in such a way that all get the final result. After the call `tensor` is going to be bitwise identical in all processes. Complex tensors are supported. Parameters * **tensor** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – Input and output of the collective. The function operates in-place. * **op** (_optional_) – One of the values from `torch.distributed.ReduceOp` enum. Specifies an operation used for element-wise reductions. * **group** (_ProcessGroup_ _,__optional_) – The process group to work on. If None, the default process group will be used.
* **async_op** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – Whether this op should be an async op Returns Async work handle, if async_op is set to True. None, if not async_op or if not part of the group #### Examples >>> # All tensors below are of torch.int64 type. >>> # We have 2 process groups, 2 ranks. >>> tensor = torch.arange(2, dtype=torch.int64) + 1 + 2 * rank >>> tensor tensor([1, 2]) # Rank 0 tensor([3, 4]) # Rank 1 >>> dist.all_reduce(tensor, op=ReduceOp.SUM) >>> tensor tensor([4, 6]) # Rank 0 tensor([4, 6]) # Rank 1 >>> # All tensors below are of torch.cfloat type. >>> # We have 2 process groups, 2 ranks. >>> tensor = torch.tensor([1+1j, 2+2j], dtype=torch.cfloat) + 2 * rank * (1+1j) >>> tensor tensor([1.+1.j, 2.+2.j]) # Rank 0 tensor([3.+3.j, 4.+4.j]) # Rank 1 >>> dist.all_reduce(tensor, op=ReduceOp.SUM) >>> tensor tensor([4.+4.j, 6.+6.j]) # Rank 0 tensor([4.+4.j, 6.+6.j]) # Rank 1 `torch.distributed.reduce(tensor, dst, op=<ReduceOp.SUM: 0>, group=None, async_op=False)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributed/distributed_c10d.html#reduce) Reduces the tensor data across all machines. Only the process with rank `dst` is going to receive the final result. Parameters * **tensor** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – Input and output of the collective. The function operates in-place. * **dst** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – Destination rank * **op** (_optional_) – One of the values from `torch.distributed.ReduceOp` enum. Specifies an operation used for element-wise reductions. * **group** (_ProcessGroup_ _,__optional_) – The process group to work on. If None, the default process group will be used. * **async_op** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – Whether this op should be an async op Returns Async work handle, if async_op is set to True. None, if not async_op or if not part of the group `torch.distributed.all_gather(tensor_list, tensor, group=None, async_op=False)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributed/distributed_c10d.html#all_gather) Gathers tensors from the whole group in a list. Complex tensors are supported. Parameters * **tensor_list** ([list](https://docs.python.org/3/library/stdtypes.html#list "\(in Python v3.9\)") _[_[Tensor](tensors#torch.Tensor "torch.Tensor") _]_) – Output list. It should contain correctly-sized tensors to be used for output of the collective. * **tensor** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – Tensor to be broadcast from current process. * **group** (_ProcessGroup_ _,__optional_) – The process group to work on. If None, the default process group will be used. * **async_op** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – Whether this op should be an async op Returns Async work handle, if async_op is set to True. None, if not async_op or if not part of the group #### Examples >>> # All tensors below are of torch.int64 dtype. >>> # We have 2 process groups, 2 ranks.
>>> tensor_list = [torch.zeros(2, dtype=torch.int64) for _ in range(2)] >>> tensor_list [tensor([0, 0]), tensor([0, 0])] # Rank 0 and 1 >>> tensor = torch.arange(2, dtype=torch.int64) + 1 + 2 * rank >>> tensor tensor([1, 2]) # Rank 0 tensor([3, 4]) # Rank 1 >>> dist.all_gather(tensor_list, tensor) >>> tensor_list [tensor([1, 2]), tensor([3, 4])] # Rank 0 [tensor([1, 2]), tensor([3, 4])] # Rank 1 >>> # All tensors below are of torch.cfloat dtype. >>> # We have 2 process groups, 2 ranks. >>> tensor_list = [torch.zeros(2, dtype=torch.cfloat) for _ in range(2)] >>> tensor_list [tensor([0.+0.j, 0.+0.j]), tensor([0.+0.j, 0.+0.j])] # Rank 0 and 1 >>> tensor = torch.tensor([1+1j, 2+2j], dtype=torch.cfloat) + 2 * rank * (1+1j) >>> tensor tensor([1.+1.j, 2.+2.j]) # Rank 0 tensor([3.+3.j, 4.+4.j]) # Rank 1 >>> dist.all_gather(tensor_list, tensor) >>> tensor_list [tensor([1.+1.j, 2.+2.j]), tensor([3.+3.j, 4.+4.j])] # Rank 0 [tensor([1.+1.j, 2.+2.j]), tensor([3.+3.j, 4.+4.j])] # Rank 1 `torch.distributed.all_gather_object(object_list, obj, group=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributed/distributed_c10d.html#all_gather_object) Gathers picklable objects from the whole group into a list. Similar to `all_gather()`, but Python objects can be passed in. Note that the object must be picklable in order to be gathered. Parameters * **object_list** ([list](https://docs.python.org/3/library/stdtypes.html#list "\(in Python v3.9\)") _[__Any_ _]_) – Output list. It should be correctly sized as the size of the group for this collective and will contain the output. * **obj** (_Any_) – Picklable Python object to be broadcast from current process. * **group** (_ProcessGroup_ _,__optional_) – The process group to work on. If None, the default process group will be used. Default is `None`. Returns None. If the calling rank is part of this group, the output of the collective will be populated into the input `object_list`. If the calling rank is not part of the group, the passed in `object_list` will be unmodified. Note Note that this API differs slightly from the `all_gather()` collective since it does not provide an `async_op` handle and thus will be a blocking call. Note For NCCL-based process groups, internal tensor representations of objects must be moved to the GPU device before communication takes place. In this case, the device used is given by `torch.cuda.current_device()` and it is the user’s responsibility to ensure that this is set so that each rank has an individual GPU, via `torch.cuda.set_device()`. Warning `all_gather_object()` uses `pickle` module implicitly, which is known to be insecure. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling. Only call this function with data you trust. Example:: >>> # Note: Process group initialization omitted on each rank. >>> import torch.distributed as dist >>> # Assumes world_size of 3. >>> gather_objects = ["foo", 12, {1: 2}] # any picklable object >>> output = [None for _ in gather_objects] >>> dist.all_gather_object(output, gather_objects[dist.get_rank()]) >>> output ['foo', 12, {1: 2}] `torch.distributed.gather(tensor, gather_list=None, dst=0, group=None, async_op=False)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributed/distributed_c10d.html#gather) Gathers a list of tensors in a single process. Parameters * **tensor** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – Input tensor.
* **gather_list** ([list](https://docs.python.org/3/library/stdtypes.html#list "\(in Python v3.9\)") _[_[Tensor](tensors#torch.Tensor "torch.Tensor") _]__,__optional_) – List of appropriately-sized tensors to use for gathered data (default is None, must be specified on the destination rank) * **dst** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – Destination rank (default is 0) * **group** (_ProcessGroup_ _,__optional_) – The process group to work on. If None, the default process group will be used. * **async_op** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – Whether this op should be an async op Returns Async work handle, if async_op is set to True. None, if not async_op or if not part of the group `torch.distributed.gather_object(obj, object_gather_list=None, dst=0, group=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributed/distributed_c10d.html#gather_object) Gathers picklable objects from the whole group in a single process. Similar to `gather()`, but Python objects can be passed in. Note that the object must be picklable in order to be gathered. Parameters * **obj** (_Any_) – Input object. Must be picklable. * **object_gather_list** ([list](https://docs.python.org/3/library/stdtypes.html#list "\(in Python v3.9\)") _[__Any_ _]_) – Output list. On the `dst` rank, it should be correctly sized as the size of the group for this collective and will contain the output. Must be `None` on non-dst ranks. (default is `None`) * **dst** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – Destination rank. (default is 0) * **group** – (ProcessGroup, optional): The process group to work on. If None, the default process group will be used. Default is `None`. Returns None. On the `dst` rank, `object_gather_list` will contain the output of the collective. Note Note that this API differs slightly from the gather collective since it does not provide an async_op handle and thus will be a blocking call. Note Note that this API is not supported when using the NCCL backend. Warning `gather_object()` uses `pickle` module implicitly, which is known to be insecure. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling. Only call this function with data you trust. Example:: >>> # Note: Process group initialization omitted on each rank. >>> import torch.distributed as dist >>> # Assumes world_size of 3. >>> gather_objects = ["foo", 12, {1: 2}] # any picklable object >>> output = [None for _ in gather_objects] >>> dist.gather_object( gather_objects[dist.get_rank()], output if dist.get_rank() == 0 else None, dst=0 ) >>> # On rank 0 >>> output ['foo', 12, {1: 2}] `torch.distributed.scatter(tensor, scatter_list=None, src=0, group=None, async_op=False)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributed/distributed_c10d.html#scatter) Scatters a list of tensors to all processes in a group. Each process will receive exactly one tensor and store its data in the `tensor` argument. Parameters * **tensor** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – Output tensor. 
* **scatter_list** ([list](https://docs.python.org/3/library/stdtypes.html#list "\(in Python v3.9\)") _[_[Tensor](tensors#torch.Tensor "torch.Tensor") _]_) – List of tensors to scatter (default is None, must be specified on the source rank) * **src** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – Source rank (default is 0) * **group** (_ProcessGroup_ _,__optional_) – The process group to work on. If None, the default process group will be used. * **async_op** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – Whether this op should be an async op Returns Async work handle, if async_op is set to True. None, if not async_op or if not part of the group `torch.distributed.scatter_object_list(scatter_object_output_list, scatter_object_input_list, src=0, group=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributed/distributed_c10d.html#scatter_object_list) Scatters picklable objects in `scatter_object_input_list` to the whole group. Similar to `scatter()`, but Python objects can be passed in. On each rank, the scattered object will be stored as the first element of `scatter_object_output_list`. Note that all objects in `scatter_object_input_list` must be picklable in order to be scattered. Parameters * **scatter_object_output_list** (_List_ _[__Any_ _]_) – Non-empty list whose first element will store the object scattered to this rank. * **scatter_object_input_list** (_List_ _[__Any_ _]_) – List of input objects to scatter. Each object must be picklable. Only objects on the `src` rank will be scattered, and the argument can be `None` for non-src ranks. * **src** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – Source rank from which to scatter `scatter_object_input_list`. * **group** – (ProcessGroup, optional): The process group to work on. If None, the default process group will be used. Default is `None`. Returns `None`. If rank is part of the group, `scatter_object_output_list` will have its first element set to the scattered object for this rank. Note Note that this API differs slightly from the scatter collective since it does not provide an `async_op` handle and thus will be a blocking call. Warning `scatter_object_list()` uses `pickle` module implicitly, which is known to be insecure. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling. Only call this function with data you trust. Example:: >>> # Note: Process group initialization omitted on each rank. >>> import torch.distributed as dist >>> if dist.get_rank() == 0: >>> # Assumes world_size of 3. >>> objects = ["foo", 12, {1: 2}] # any picklable object >>> else: >>> # Can be any list on non-src ranks, elements are not used. >>> objects = [None, None, None] >>> output_list = [None] >>> dist.scatter_object_list(output_list, objects, src=0) >>> # Rank i gets objects[i]. For example, on rank 2: >>> output_list [{1: 2}] `torch.distributed.reduce_scatter(output, input_list, op=<ReduceOp.SUM: 0>, group=None, async_op=False)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributed/distributed_c10d.html#reduce_scatter) Reduces, then scatters a list of tensors to all processes in a group. Parameters * **output** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – Output tensor. * **input_list** ([list](https://docs.python.org/3/library/stdtypes.html#list "\(in Python v3.9\)") _[_[Tensor](tensors#torch.Tensor "torch.Tensor") _]_) – List of tensors to reduce and scatter. * **group** (_ProcessGroup_ _,__optional_) – The process group to work on. If None, the default process group will be used. * **async_op** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – Whether this op should be an async op. Returns Async work handle, if async_op is set to True. None, if not async_op or if not part of the group.
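`reduce_scatter` has no example in this section, so here is a minimal sketch (an illustration only; it assumes a process group is already initialized and that the chosen backend supports `reduce_scatter`):

import torch
import torch.distributed as dist

world_size = dist.get_world_size()
rank = dist.get_rank()

# Each rank contributes one input tensor per rank in the group.
input_list = [torch.full((2,), float(rank)) for _ in range(world_size)]
output = torch.empty(2)

# After the call, output on rank i holds the element-wise sum of
# every rank's input_list[i].
dist.reduce_scatter(output, input_list, op=dist.ReduceOp.SUM)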
`torch.distributed.all_to_all(output_tensor_list, input_tensor_list, group=None, async_op=False)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributed/distributed_c10d.html#all_to_all) Each process scatters a list of input tensors to all processes in a group and returns the gathered list of tensors in the output list. Parameters * **output_tensor_list** ([list](https://docs.python.org/3/library/stdtypes.html#list "\(in Python v3.9\)") _[_[Tensor](tensors#torch.Tensor "torch.Tensor") _]_) – List of tensors to be gathered one per rank. * **input_tensor_list** ([list](https://docs.python.org/3/library/stdtypes.html#list "\(in Python v3.9\)") _[_[Tensor](tensors#torch.Tensor "torch.Tensor") _]_) – List of tensors to scatter one per rank. * **group** (_ProcessGroup_ _,__optional_) – The process group to work on. If None, the default process group will be used. * **async_op** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – Whether this op should be an async op. Returns Async work handle, if async_op is set to True. None, if not async_op or if not part of the group. Warning `all_to_all` is experimental and subject to change. #### Examples >>> input = torch.arange(4) + rank * 4 >>> input = list(input.chunk(4)) >>> input [tensor([0]), tensor([1]), tensor([2]), tensor([3])] # Rank 0 [tensor([4]), tensor([5]), tensor([6]), tensor([7])] # Rank 1 [tensor([8]), tensor([9]), tensor([10]), tensor([11])] # Rank 2 [tensor([12]), tensor([13]), tensor([14]), tensor([15])] # Rank 3 >>> output = list(torch.empty([4], dtype=torch.int64).chunk(4)) >>> dist.all_to_all(output, input) >>> output [tensor([0]), tensor([4]), tensor([8]), tensor([12])] # Rank 0 [tensor([1]), tensor([5]), tensor([9]), tensor([13])] # Rank 1 [tensor([2]), tensor([6]), tensor([10]), tensor([14])] # Rank 2 [tensor([3]), tensor([7]), tensor([11]), tensor([15])] # Rank 3 >>> # Essentially, it is similar to the following operation: >>> scatter_list = input >>> gather_list = output >>> for i in range(world_size): >>> dist.scatter(gather_list[i], scatter_list if i == rank else [], src = i) >>> input tensor([0, 1, 2, 3, 4, 5]) # Rank 0 tensor([10, 11, 12, 13, 14, 15, 16, 17, 18]) # Rank 1 tensor([20, 21, 22, 23, 24]) # Rank 2 tensor([30, 31, 32, 33, 34, 35, 36]) # Rank 3 >>> input_splits [2, 2, 1, 1] # Rank 0 [3, 2, 2, 2] # Rank 1 [2, 1, 1, 1] # Rank 2 [2, 2, 2, 1] # Rank 3 >>> output_splits [2, 3, 2, 2] # Rank 0 [2, 2, 1, 2] # Rank 1 [1, 2, 1, 2] # Rank 2 [1, 2, 1, 1] # Rank 3 >>> input = list(input.split(input_splits)) >>> input [tensor([0, 1]), tensor([2, 3]), tensor([4]), tensor([5])] # Rank 0 [tensor([10, 11, 12]), tensor([13, 14]), tensor([15, 16]), tensor([17, 18])] # Rank 1 [tensor([20, 21]), tensor([22]), tensor([23]), tensor([24])] # Rank 2 [tensor([30, 31]), tensor([32, 33]), tensor([34, 35]), tensor([36])] # Rank 3 >>> output = ...
>>> dist.all_to_all(output, input) >>> output [tensor([0, 1]), tensor([10, 11, 12]), tensor([20, 21]), tensor([30, 31])] # Rank 0 [tensor([2, 3]), tensor([13, 14]), tensor([22]), tensor([32, 33])] # Rank 1 [tensor([4]), tensor([15, 16]), tensor([23]), tensor([34, 35])] # Rank 2 [tensor([5]), tensor([17, 18]), tensor([24]), tensor([36])] # Rank 3 `torch.distributed.barrier(group=None, async_op=False, device_ids=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributed/distributed_c10d.html#barrier) Synchronizes all processes. This collective blocks processes until the whole group enters this function (if `async_op` is False), or until `wait()` is called on the returned async work handle. Parameters * **group** (_ProcessGroup_ _,__optional_) – The process group to work on. If None, the default process group will be used. * **async_op** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – Whether this op should be an async op * **device_ids** (_[_[int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _]__,__optional_) – List of device/GPU ids. Valid only for NCCL backend. Returns Async work handle, if async_op is set to True. None, if not async_op or if not part of the group `class torch.distributed.ReduceOp` An enum-like class for available reduction operations: `SUM`, `PRODUCT`, `MIN`, `MAX`, `BAND`, `BOR`, and `BXOR`. Note that `BAND`, `BOR`, and `BXOR` reductions are not available when using the `NCCL` backend. Additionally, `MAX`, `MIN` and `PRODUCT` are not supported for complex tensors. The values of this class can be accessed as attributes, e.g., `ReduceOp.SUM`. They are used in specifying strategies for reduction collectives, e.g., `reduce()`, `all_reduce_multigpu()`, etc. Members: SUM PRODUCT MIN MAX BAND BOR BXOR `class torch.distributed.reduce_op` Deprecated enum-like class for reduction operations: `SUM`, `PRODUCT`, `MIN`, and `MAX`. Use `ReduceOp` instead. ## Autograd-enabled communication primitives If you want to use collective communication functions supporting autograd, you can find an implementation of those in the `torch.distributed.nn.*` module. Functions here are synchronous and will be inserted in the autograd graph, so you need to ensure that all processes that participated in the collective operation also run the backward pass; otherwise the backward communication will not take place and the processes may deadlock. Please note that currently the only backend where all the functions are guaranteed to work is `gloo`. The following autograd-enabled collectives are provided: `torch.distributed.nn.broadcast`, `torch.distributed.nn.gather`, `torch.distributed.nn.scatter`, `torch.distributed.nn.reduce`, `torch.distributed.nn.all_gather`, `torch.distributed.nn.all_to_all`, and `torch.distributed.nn.all_reduce`.
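As a rough illustration (a sketch only; it assumes a `gloo` process group is already initialized and relies on the autograd-enabled `all_reduce` returning the reduced tensor rather than operating in-place):

import torch
import torch.distributed.nn as dist_nn

x = torch.ones(2, requires_grad=True)
# Differentiable all-reduce: the communication is recorded in the autograd graph.
y = dist_nn.all_reduce(x)
# Every participating rank must run backward, or the backward communication stalls.
y.sum().backward()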
## Multi-GPU collective functions If you have more than one GPU on each node, when using the NCCL and Gloo backend, `broadcast_multigpu()` `all_reduce_multigpu()` `reduce_multigpu()` `all_gather_multigpu()` and `reduce_scatter_multigpu()` support distributed collective operations among multiple GPUs within each node. These functions can potentially improve the overall distributed training performance and be easily used by passing a list of tensors. Each Tensor in the passed tensor list needs to be on a separate GPU device of the host where the function is called. Note that the length of the tensor list needs to be identical among all the distributed processes. Also note that currently the multi-GPU collective functions are only supported by the NCCL backend. For example, suppose the system we use for distributed training has 2 nodes, each of which has 8 GPUs. On each of the 16 GPUs, there is a tensor that we would like to all-reduce. The following code can serve as a reference: Code running on Node 0 import torch import torch.distributed as dist dist.init_process_group(backend="nccl", init_method="file:///distributed_test", world_size=2, rank=0) tensor_list = [] for dev_idx in range(torch.cuda.device_count()): tensor_list.append(torch.FloatTensor([1]).cuda(dev_idx)) dist.all_reduce_multigpu(tensor_list) Code running on Node 1 import torch import torch.distributed as dist dist.init_process_group(backend="nccl", init_method="file:///distributed_test", world_size=2, rank=1) tensor_list = [] for dev_idx in range(torch.cuda.device_count()): tensor_list.append(torch.FloatTensor([1]).cuda(dev_idx)) dist.all_reduce_multigpu(tensor_list) After the call, all 16 tensors on the two nodes will have the all-reduced value of 16. `torch.distributed.broadcast_multigpu(tensor_list, src, group=None, async_op=False, src_tensor=0)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributed/distributed_c10d.html#broadcast_multigpu) Broadcasts the tensor to the whole group with multiple GPU tensors per node. `tensor` must have the same number of elements in all the GPUs from all processes participating in the collective. Each tensor in the list must be on a different GPU. Only the nccl and gloo backends are currently supported; tensors should only be GPU tensors. Parameters * **tensor_list** (_List_ _[_[Tensor](tensors#torch.Tensor "torch.Tensor") _]_) – Tensors that participate in the collective operation. If `src` is the rank, then the specified `src_tensor` element of `tensor_list` (`tensor_list[src_tensor]`) will be broadcast to all other tensors (on different GPUs) in the src process and all tensors in `tensor_list` of other non-src processes. You also need to make sure that `len(tensor_list)` is the same for all the distributed processes calling this function. * **src** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – Source rank. * **group** (_ProcessGroup_ _,__optional_) – The process group to work on. If None, the default process group will be used. * **async_op** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – Whether this op should be an async op * **src_tensor** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – Source tensor rank within `tensor_list` Returns Async work handle, if async_op is set to True. None, if not async_op or if not part of the group `torch.distributed.all_reduce_multigpu(tensor_list, op=<ReduceOp.SUM: 0>, group=None, async_op=False)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributed/distributed_c10d.html#all_reduce_multigpu) Reduces the tensor data across all machines in such a way that all get the final result. This function reduces a number of tensors on every node, while each tensor resides on a different GPU. Therefore, the input tensors in the tensor list need to be GPU tensors, and each tensor in the tensor list needs to reside on a different GPU. After the call, every tensor in `tensor_list` is going to be bitwise identical in all processes. Complex tensors are supported.
Only the nccl and gloo backends are currently supported; tensors should only be GPU tensors. Parameters * **tensor_list** (_List_ _[_[Tensor](tensors#torch.Tensor "torch.Tensor") _]_) – List of input and output tensors of the collective. The function operates in-place and requires each tensor to be a GPU tensor on a different GPU. You also need to make sure that `len(tensor_list)` is the same for all the distributed processes calling this function. * **op** (_optional_) – One of the values from `torch.distributed.ReduceOp` enum. Specifies an operation used for element-wise reductions. * **group** (_ProcessGroup_ _,__optional_) – The process group to work on. If None, the default process group will be used. * **async_op** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – Whether this op should be an async op Returns Async work handle, if async_op is set to True. None, if not async_op or if not part of the group `torch.distributed.reduce_multigpu(tensor_list, dst, op=<ReduceOp.SUM: 0>, group=None, async_op=False, dst_tensor=0)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributed/distributed_c10d.html#reduce_multigpu) Reduces the tensor data on multiple GPUs across all machines. Each tensor in `tensor_list` should reside on a separate GPU. Only the GPU of `tensor_list[dst_tensor]` on the process with rank `dst` is going to receive the final result. Only the nccl backend is currently supported; tensors should only be GPU tensors. Parameters * **tensor_list** (_List_ _[_[Tensor](tensors#torch.Tensor "torch.Tensor") _]_) – Input and output GPU tensors of the collective. The function operates in-place. You also need to make sure that `len(tensor_list)` is the same for all the distributed processes calling this function. * **dst** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – Destination rank * **op** (_optional_) – One of the values from `torch.distributed.ReduceOp` enum. Specifies an operation used for element-wise reductions. * **group** (_ProcessGroup_ _,__optional_) – The process group to work on. If None, the default process group will be used. * **async_op** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – Whether this op should be an async op * **dst_tensor** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – Destination tensor rank within `tensor_list` Returns Async work handle, if async_op is set to True. None, otherwise `torch.distributed.all_gather_multigpu(output_tensor_lists, input_tensor_list, group=None, async_op=False)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributed/distributed_c10d.html#all_gather_multigpu) Gathers tensors from the whole group in a list. Each tensor in `tensor_list` should reside on a separate GPU. Only the nccl backend is currently supported; tensors should only be GPU tensors. Complex tensors are supported. Parameters * **output_tensor_lists** (_List_ _[__List_ _[_[Tensor](tensors#torch.Tensor "torch.Tensor") _]__]_) – Output lists. It should contain correctly-sized tensors on each GPU to be used for output of the collective, e.g. `output_tensor_lists[i]` contains the all_gather result that resides on the GPU of `input_tensor_list[i]`. Note that each element of `output_tensor_lists` has the size of `world_size * len(input_tensor_list)`, since the function all gathers the result from every single GPU in the group.
To interpret each element of `output_tensor_lists[i]`, note that `input_tensor_list[j]` of rank k will appear in `output_tensor_lists[i][k * world_size + j]` Also note that `len(output_tensor_lists)`, and the size of each element in `output_tensor_lists` (each element is a list, therefore `len(output_tensor_lists[i])`) need to be the same for all the distributed processes calling this function. * **input_tensor_list** (_List_ _[_[Tensor](tensors#torch.Tensor "torch.Tensor") _]_) – List of tensors (on different GPUs) to be broadcast from current process. Note that `len(input_tensor_list)` needs to be the same for all the distributed processes calling this function. * **group** (_ProcessGroup_ _,__optional_) – The process group to work on. If None, the default process group will be used. * **async_op** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – Whether this op should be an async op Returns Async work handle, if async_op is set to True. None, if not async_op or if not part of the group `torch.distributed.reduce_scatter_multigpu(output_tensor_list, input_tensor_lists, op=<ReduceOp.SUM: 0>, group=None, async_op=False)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributed/distributed_c10d.html#reduce_scatter_multigpu) Reduce and scatter a list of tensors to the whole group. Only the nccl backend is currently supported. Each tensor in `output_tensor_list` should reside on a separate GPU, as should each list of tensors in `input_tensor_lists`. Parameters * **output_tensor_list** (_List_ _[_[Tensor](tensors#torch.Tensor "torch.Tensor") _]_) – Output tensors (on different GPUs) to receive the result of the operation. Note that `len(output_tensor_list)` needs to be the same for all the distributed processes calling this function. * **input_tensor_lists** (_List_ _[__List_ _[_[Tensor](tensors#torch.Tensor "torch.Tensor") _]__]_) – Input lists. It should contain correctly-sized tensors on each GPU to be used for input of the collective, e.g. `input_tensor_lists[i]` contains the reduce_scatter input that resides on the GPU of `output_tensor_list[i]`. Note that each element of `input_tensor_lists` has the size of `world_size * len(output_tensor_list)`, since the function scatters the result from every single GPU in the group. To interpret each element of `input_tensor_lists[i]`, note that `output_tensor_list[j]` of rank k receives the reduce-scattered result from `input_tensor_lists[i][k * world_size + j]` Also note that `len(input_tensor_lists)`, and the size of each element in `input_tensor_lists` (each element is a list, therefore `len(input_tensor_lists[i])`) need to be the same for all the distributed processes calling this function. * **group** (_ProcessGroup_ _,__optional_) – The process group to work on. If None, the default process group will be used. * **async_op** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – Whether this op should be an async op. Returns Async work handle, if async_op is set to True. None, if not async_op or if not part of the group. ## Third-party backends Besides the GLOO/MPI/NCCL backends, PyTorch distributed supports third-party backends through a run-time register mechanism. For references on how to develop a third-party backend through C++ Extension, please refer to [Tutorials - Custom C++ and CUDA Extensions](https://pytorch.org/tutorials/advanced/cpp_extension.html) and `test/cpp_extensions/cpp_c10d_extension.cpp`. The capability of third-party backends is determined by their own implementations. The new backend derives from `c10d.ProcessGroup` and registers the backend name and the instantiating interface through `torch.distributed.Backend.register_backend()` when imported. When manually importing this backend and invoking `torch.distributed.init_process_group()` with the corresponding backend name, the `torch.distributed` package runs on the new backend.
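As a rough sketch of that registration flow (the backend name "dummy" and the factory function below are hypothetical placeholders, not a real extension):

import torch.distributed as dist

# A real extension would provide a c10d.ProcessGroup subclass and a factory
# for it; this stand-in only shows the shape of the registration call.
def create_dummy_process_group(*args, **kwargs):
    # a real factory receives the store, rank, world size, etc. from init_process_group
    raise NotImplementedError("placeholder for a real ProcessGroup factory")

dist.Backend.register_backend("dummy", create_dummy_process_group)
# Afterwards the registered name can be passed to init_process_group, e.g.
# dist.init_process_group("dummy", store=..., rank=..., world_size=...)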
Warning The support of third-party backends is experimental and subject to change. ## Launch utility The `torch.distributed` package also provides a launch utility in `torch.distributed.launch`. This helper utility can be used to launch multiple processes per node for distributed training. `torch.distributed.launch` is a module that spawns multiple distributed training processes on each of the training nodes. The utility can be used for single-node distributed training, in which one or more processes per node will be spawned. The utility can be used for either CPU training or GPU training. If the utility is used for GPU training, each distributed process will be operating on a single GPU. This can considerably improve single-node training performance. It can also be used in multi-node distributed training, by spawning multiple processes on each node, which likewise improves multi-node distributed training performance. This is especially beneficial for systems with multiple InfiniBand interfaces that have direct-GPU support, since all of them can be utilized for aggregated communication bandwidth. In both cases of single-node distributed training or multi-node distributed training, this utility will launch the given number of processes per node (`--nproc_per_node`). If used for GPU training, this number needs to be less than or equal to the number of GPUs on the current system (`nproc_per_node`), and each process will be operating on a single GPU from _GPU 0 to GPU (nproc_per_node - 1)_. **How to use this module:** 1. Single-Node multi-process distributed training >>> python -m torch.distributed.launch --nproc_per_node=NUM_GPUS_YOU_HAVE YOUR_TRAINING_SCRIPT.py (--arg1 --arg2 --arg3 and all other arguments of your training script) 2. Multi-Node multi-process distributed training: (e.g. two nodes) Node 1: _(IP: 192.168.1.1, and has a free port: 1234)_ >>> python -m torch.distributed.launch --nproc_per_node=NUM_GPUS_YOU_HAVE --nnodes=2 --node_rank=0 --master_addr="192.168.1.1" --master_port=1234 YOUR_TRAINING_SCRIPT.py (--arg1 --arg2 --arg3 and all other arguments of your training script) Node 2: >>> python -m torch.distributed.launch --nproc_per_node=NUM_GPUS_YOU_HAVE --nnodes=2 --node_rank=1 --master_addr="192.168.1.1" --master_port=1234 YOUR_TRAINING_SCRIPT.py (--arg1 --arg2 --arg3 and all other arguments of your training script) 3. To look up what optional arguments this module offers: >>> python -m torch.distributed.launch --help **Important Notices:** 1\. This utility and multi-process distributed (single-node or multi-node) GPU training currently only achieves the best performance using the NCCL distributed backend. Thus the NCCL backend is the recommended backend to use for GPU training. 2\. In your training program, you must parse the command-line argument: `--local_rank=LOCAL_PROCESS_RANK`, which will be provided by this module. If your training program uses GPUs, you should ensure that your code only runs on the GPU device of LOCAL_PROCESS_RANK.
This can be done by: Parsing the local_rank argument >>> import argparse >>> parser = argparse.ArgumentParser() >>> parser.add_argument("--local_rank", type=int) >>> args = parser.parse_args() Set your device to local rank using either >>> torch.cuda.set_device(args.local_rank) # before your code runs or >>> with torch.cuda.device(args.local_rank): >>> # your code to run 3\. In your training program, you are supposed to call the following function at the beginning to start the distributed backend. You need to make sure that the init_method uses `env://`, which is the only `init_method` supported by this module. torch.distributed.init_process_group(backend='YOUR BACKEND', init_method='env://') 4\. In your training program, you can either use regular distributed functions or use the [`torch.nn.parallel.DistributedDataParallel()`](generated/torch.nn.parallel.distributeddataparallel#torch.nn.parallel.DistributedDataParallel "torch.nn.parallel.DistributedDataParallel") module. If your training program uses GPUs for training and you would like to use the [`torch.nn.parallel.DistributedDataParallel()`](generated/torch.nn.parallel.distributeddataparallel#torch.nn.parallel.DistributedDataParallel "torch.nn.parallel.DistributedDataParallel") module, here is how to configure it. model = torch.nn.parallel.DistributedDataParallel(model, device_ids=[args.local_rank], output_device=args.local_rank) Please ensure that the `device_ids` argument is set to be the only GPU device id that your code will be operating on. This is generally the local rank of the process. In other words, `device_ids` needs to be `[args.local_rank]`, and `output_device` needs to be `args.local_rank` in order to use this utility. 5\. Another way to pass `local_rank` to the subprocesses is via the environment variable `LOCAL_RANK`. This behavior is enabled when you launch the script with `--use_env=True`. You must adjust the subprocess example above to replace `args.local_rank` with `os.environ['LOCAL_RANK']`; the launcher will not pass `--local_rank` when you specify this flag. Warning `local_rank` is NOT globally unique: it is only unique per process on a machine. Thus, don’t use it to decide if you should, e.g., write to a networked filesystem; processes on different machines can share the same `local_rank`, so getting this wrong can lead to them writing to the same path. ## Spawn utility The [Multiprocessing package - torch.multiprocessing](multiprocessing#multiprocessing-doc) also provides a `spawn` function in [`torch.multiprocessing.spawn()`](multiprocessing#torch.multiprocessing.spawn "torch.multiprocessing.spawn"). This helper function can be used to spawn multiple processes. It works by passing in the function that you want to run and spawns N processes to run it. This can be used for multiprocess distributed training as well. For a reference on how to use it, see the [PyTorch example - ImageNet implementation](https://github.com/pytorch/examples/tree/master/imagenet). Note that this function requires Python 3.4 or higher.
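A minimal sketch of that pattern (the worker function, the `gloo` backend, and the TCP address below are illustrative choices, not part of the API):

import torch.distributed as dist
import torch.multiprocessing as mp

def worker(rank, world_size):
    # mp.spawn passes the process index as the first argument.
    dist.init_process_group("gloo", init_method="tcp://127.0.0.1:23456",
                            rank=rank, world_size=world_size)
    # ... training code for this process ...
    dist.destroy_process_group()

if __name__ == "__main__":
    world_size = 4
    mp.spawn(worker, args=(world_size,), nprocs=world_size)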
# Probability distributions - torch.distributions The `distributions` package contains parameterizable probability distributions and sampling functions. This allows the construction of stochastic computation graphs and stochastic gradient estimators for optimization. This package generally follows the design of the [TensorFlow Distributions](https://arxiv.org/abs/1711.10604) package. It is not possible to directly backpropagate through random samples. However, there are two main methods for creating surrogate functions that can be backpropagated through. These are the score function estimator/likelihood ratio estimator/REINFORCE and the pathwise derivative estimator. REINFORCE is commonly seen as the basis for policy gradient methods in reinforcement learning, and the pathwise derivative estimator is commonly seen in the reparameterization trick in variational autoencoders. Whilst the score function only requires the value of samples f(x), the pathwise derivative requires the derivative f'(x). The next sections discuss these two in a reinforcement learning example. For more details see [Gradient Estimation Using Stochastic Computation Graphs](https://arxiv.org/abs/1506.05254). ## Score function When the probability density function is differentiable with respect to its parameters, we only need `sample()` and `log_prob()` to implement REINFORCE: \Delta\theta = \alpha r \frac{\partial\log p(a|\pi^\theta(s))}{\partial\theta} where \theta are the parameters, \alpha is the learning rate, r is the reward and p(a|\pi^\theta(s)) is the probability of taking action a in state s given policy \pi^\theta. In practice we would sample an action from the output of a network, apply this action in an environment, and then use `log_prob` to construct an equivalent loss function. Note that we use a negative because optimizers use gradient descent, whilst the rule above assumes gradient ascent. With a categorical policy, the code for implementing REINFORCE would be as follows: probs = policy_network(state) # Note that this is equivalent to what used to be called multinomial m = Categorical(probs) action = m.sample() next_state, reward = env.step(action) loss = -m.log_prob(action) * reward loss.backward() ## Pathwise derivative The other way to implement these stochastic/policy gradients would be to use the reparameterization trick from the `rsample()` method, where the parameterized random variable can be constructed via a parameterized deterministic function of a parameter-free random variable. The reparameterized sample therefore becomes differentiable. The code for implementing the pathwise derivative would be as follows: params = policy_network(state) m = Normal(*params) # Any distribution with .has_rsample == True could work based on the application action = m.rsample() next_state, reward = env.step(action) # Assuming that reward is differentiable loss = -reward loss.backward() ## Distribution `class torch.distributions.distribution.Distribution(batch_shape=torch.Size([]), event_shape=torch.Size([]), validate_args=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/distribution.html#Distribution) Bases: [`object`](https://docs.python.org/3/library/functions.html#object "\(in Python v3.9\)") Distribution is the abstract base class for probability distributions. `property arg_constraints` Returns a dictionary from argument names to `Constraint` objects that should be satisfied by each argument of this distribution. Args that are not tensors need not appear in this dict. `property batch_shape` Returns the shape over which parameters are batched. `cdf(value)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/distribution.html#Distribution.cdf) Returns the cumulative density/mass function evaluated at `value`. Parameters **value** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – `entropy()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/distribution.html#Distribution.entropy) Returns entropy of distribution, batched over batch_shape.
Returns Tensor of shape batch_shape. `enumerate_support(expand=True)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/distribution.html#Distribution.enumerate_support) Returns tensor containing all values supported by a discrete distribution. The result will enumerate over dimension 0, so the shape of the result will be `(cardinality,) + batch_shape + event_shape` (where `event_shape = ()` for univariate distributions). Note that this enumerates over all batched tensors in lock-step `[[0, 0], [1, 1], …]`. With `expand=False`, enumeration happens along dim 0, but with the remaining batch dimensions being singleton dimensions, `[[0], [1], ...]`. To iterate over the full Cartesian product use `itertools.product(m.enumerate_support())`. Parameters **expand** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – whether to expand the support over the batch dims to match the distribution’s `batch_shape`. Returns Tensor iterating over dimension 0. `property event_shape` Returns the shape of a single sample (without batching). `expand(batch_shape, _instance=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/distribution.html#Distribution.expand) Returns a new distribution instance (or populates an existing instance provided by a derived class) with batch dimensions expanded to `batch_shape`. This method calls [`expand`](tensors#torch.Tensor.expand "torch.Tensor.expand") on the distribution’s parameters. As such, this does not allocate new memory for the expanded distribution instance. Additionally, this does not repeat any args checking or parameter broadcasting in `__init__.py`, when an instance is first created. Parameters * **batch_shape** (_torch.Size_) – the desired expanded size. * **_instance** – new instance provided by subclasses that need to override `.expand`. Returns New distribution instance with batch dimensions expanded to `batch_shape`. `icdf(value)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/distribution.html#Distribution.icdf) Returns the inverse cumulative density/mass function evaluated at `value`. Parameters **value** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – `log_prob(value)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/distribution.html#Distribution.log_prob) Returns the log of the probability density/mass function evaluated at `value`. Parameters **value** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – `property mean` Returns the mean of the distribution. `perplexity()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/distribution.html#Distribution.perplexity) Returns perplexity of distribution, batched over batch_shape. Returns Tensor of shape batch_shape. `rsample(sample_shape=torch.Size([]))` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/distribution.html#Distribution.rsample) Generates a sample_shape shaped reparameterized sample or sample_shape shaped batch of reparameterized samples if the distribution parameters are batched. `sample(sample_shape=torch.Size([]))` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/distribution.html#Distribution.sample) Generates a sample_shape shaped sample or sample_shape shaped batch of samples if the distribution parameters are batched.
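For instance, a minimal sketch with a batched `Normal` (the parameter values are arbitrary):

import torch
from torch.distributions import Normal

# Two independent Normals batched together: batch_shape = (2,).
m = Normal(torch.tensor([0.0, 10.0]), torch.tensor([1.0, 2.0]))

x = m.sample()                 # shape (2,); no gradient path to the parameters
y = m.sample(torch.Size([5]))  # shape (5, 2): sample_shape + batch_shape
z = m.rsample()                # reparameterized sample, differentiable w.r.t. loc and scale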
`sample_n(n)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/distribution.html#Distribution.sample_n)

Generates n samples or n batches of samples if the distribution parameters are batched.

`static set_default_validate_args(value)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/distribution.html#Distribution.set_default_validate_args)

Sets whether validation is enabled or disabled. The default behavior mimics Python's `assert` statement: validation is on by default, but is disabled if Python is run in optimized mode (via `python -O`). Validation may be expensive, so you may want to disable it once a model is working.

Parameters **value** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – Whether to enable validation.

`property stddev`

Returns the standard deviation of the distribution.

`property support`

Returns a `Constraint` object representing this distribution's support.

`property variance`

Returns the variance of the distribution.

## ExponentialFamily

`class torch.distributions.exp_family.ExponentialFamily(batch_shape=torch.Size([]), event_shape=torch.Size([]), validate_args=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/exp_family.html#ExponentialFamily)

Bases: `torch.distributions.distribution.Distribution`

ExponentialFamily is the abstract base class for probability distributions belonging to an exponential family, whose probability mass/density function has the form

$$p_{F}(x; \theta) = \exp(\langle t(x), \theta\rangle - F(\theta) + k(x))$$

where $\theta$ denotes the natural parameters, $t(x)$ denotes the sufficient statistic, $F(\theta)$ is the log normalizer function for a given family and $k(x)$ is the carrier measure.

Note

This class is an intermediary between the `Distribution` class and distributions which belong to an exponential family mainly to check the correctness of the `.entropy()` and analytic KL divergence methods. We use this class to compute the entropy and KL divergence using the AD framework and Bregman divergences (courtesy of: Frank Nielsen and Richard Nock, Entropies and Cross-entropies of Exponential Families).

`entropy()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/exp_family.html#ExponentialFamily.entropy)

Method to compute the entropy using Bregman divergence of the log normalizer.

## Bernoulli

`class torch.distributions.bernoulli.Bernoulli(probs=None, logits=None, validate_args=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/bernoulli.html#Bernoulli)

Bases: `torch.distributions.exp_family.ExponentialFamily`

Creates a Bernoulli distribution parameterized by `probs` or `logits` (but not both).

Samples are binary (0 or 1). They take the value `1` with probability `p` and `0` with probability `1 - p`.
Example: >>> m = Bernoulli(torch.tensor([0.3])) >>> m.sample() # 30% chance 1; 70% chance 0 tensor([ 0.]) Parameters * **probs** (_Number_ _,_[Tensor](tensors#torch.Tensor "torch.Tensor")) – the probability of sampling `1` * **logits** (_Number_ _,_[Tensor](tensors#torch.Tensor "torch.Tensor")) – the log-odds of sampling `1` `arg_constraints = {'logits': Real(), 'probs': Interval(lower_bound=0.0, upper_bound=1.0)}` `entropy()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/bernoulli.html#Bernoulli.entropy) `enumerate_support(expand=True)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/bernoulli.html#Bernoulli.enumerate_support) `expand(batch_shape, _instance=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/bernoulli.html#Bernoulli.expand) `has_enumerate_support = True` `log_prob(value)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/bernoulli.html#Bernoulli.log_prob) `logits` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/bernoulli.html#Bernoulli.logits) `property mean` `property param_shape` `probs` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/bernoulli.html#Bernoulli.probs) `sample(sample_shape=torch.Size([]))` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/bernoulli.html#Bernoulli.sample) `support = Boolean()` `property variance` ## Beta `class torch.distributions.beta.Beta(concentration1, concentration0, validate_args=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/beta.html#Beta) Bases: `torch.distributions.exp_family.ExponentialFamily` Beta distribution parameterized by `concentration1` and `concentration0`. Example: >>> m = Beta(torch.tensor([0.5]), torch.tensor([0.5])) >>> m.sample() # Beta distributed with concentration concentration1 and concentration0 tensor([ 0.1046]) Parameters * **concentration1** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _or_[Tensor](tensors#torch.Tensor "torch.Tensor")) – 1st concentration parameter of the distribution (often referred to as alpha) * **concentration0** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _or_[Tensor](tensors#torch.Tensor "torch.Tensor")) – 2nd concentration parameter of the distribution (often referred to as beta) `arg_constraints = {'concentration0': GreaterThan(lower_bound=0.0), 'concentration1': GreaterThan(lower_bound=0.0)}` `property concentration0` `property concentration1` `entropy()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/beta.html#Beta.entropy) `expand(batch_shape, _instance=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/beta.html#Beta.expand) `has_rsample = True` `log_prob(value)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/beta.html#Beta.log_prob) `property mean` `rsample(sample_shape=())` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/beta.html#Beta.rsample) `support = Interval(lower_bound=0.0, upper_bound=1.0)` `property variance` ## Binomial `class torch.distributions.binomial.Binomial(total_count=1, probs=None, logits=None, validate_args=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/binomial.html#Binomial) Bases: `torch.distributions.distribution.Distribution` Creates a Binomial distribution parameterized by `total_count` and either `probs` or `logits` (but not both). 
`total_count` must be broadcastable with `probs`/`logits`.

Example:

    >>> m = Binomial(100, torch.tensor([0 , .2, .8, 1]))
    >>> x = m.sample()
    tensor([ 0., 22., 71., 100.])
    >>> m = Binomial(torch.tensor([[5.], [10.]]), torch.tensor([0.5, 0.8]))
    >>> x = m.sample()
    tensor([[ 4., 5.], [ 7., 6.]])

Parameters

* **total_count** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _or_ [Tensor](tensors#torch.Tensor "torch.Tensor")) – number of Bernoulli trials
* **probs** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – Event probabilities
* **logits** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – Event log-odds

`arg_constraints = {'logits': Real(), 'probs': Interval(lower_bound=0.0, upper_bound=1.0), 'total_count': IntegerGreaterThan(lower_bound=0)}` `enumerate_support(expand=True)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/binomial.html#Binomial.enumerate_support) `expand(batch_shape, _instance=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/binomial.html#Binomial.expand) `has_enumerate_support = True` `log_prob(value)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/binomial.html#Binomial.log_prob) `logits` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/binomial.html#Binomial.logits) `property mean` `property param_shape` `probs` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/binomial.html#Binomial.probs) `sample(sample_shape=torch.Size([]))` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/binomial.html#Binomial.sample) `property support` `property variance`

## Categorical

`class torch.distributions.categorical.Categorical(probs=None, logits=None, validate_args=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/categorical.html#Categorical)

Bases: `torch.distributions.distribution.Distribution`

Creates a categorical distribution parameterized by either `probs` or `logits` (but not both).

Note It is equivalent to the distribution that [`torch.multinomial()`](generated/torch.multinomial#torch.multinomial "torch.multinomial") samples from.

Samples are integers from $\{0, \ldots, K-1\}$ where `K` is `probs.size(-1)`.

If `probs` is 1-dimensional with length-`K`, each element is the relative probability of sampling the class at that index.

If `probs` is N-dimensional, the first N-1 dimensions are treated as a batch of relative probability vectors.

Note The `probs` argument must be non-negative, finite and have a non-zero sum, and it will be normalized to sum to 1 along the last dimension. `probs` will return this normalized value. The `logits` argument will be interpreted as unnormalized log probabilities and can therefore be any real number. It will likewise be normalized so that the resulting probabilities sum to 1 along the last dimension. `logits` will return this normalized value.
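Because both parameterizations are stored in normalized form, the note above can be checked directly. A minimal sketch (with arbitrary values) showing that `probs` is normalized to sum to 1 and that `logits` is returned as normalized log-probabilities:

    import torch
    from torch.distributions import Categorical

    # Unnormalized relative probabilities; the constructor normalizes them
    # along the last dimension.
    m = Categorical(probs=torch.tensor([1.0, 2.0, 7.0]))
    print(m.probs)                 # tensor([0.1000, 0.2000, 0.7000])

    # Unnormalized log probabilities are accepted too; .logits returns the
    # normalized log-probabilities.
    m2 = Categorical(logits=torch.tensor([1.0, 1.0, 1.0]))
    print(m2.probs)                # tensor([0.3333, 0.3333, 0.3333])
    print(m2.logits.exp().sum())   # ~1.0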
See also: [`torch.multinomial()`](generated/torch.multinomial#torch.multinomial "torch.multinomial") Example: >>> m = Categorical(torch.tensor([ 0.25, 0.25, 0.25, 0.25 ])) >>> m.sample() # equal probability of 0, 1, 2, 3 tensor(3) Parameters * **probs** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – event probabilities * **logits** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – event log probabilities (unnormalized) `arg_constraints = {'logits': IndependentConstraint(Real(), 1), 'probs': Simplex()}` `entropy()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/categorical.html#Categorical.entropy) `enumerate_support(expand=True)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/categorical.html#Categorical.enumerate_support) `expand(batch_shape, _instance=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/categorical.html#Categorical.expand) `has_enumerate_support = True` `log_prob(value)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/categorical.html#Categorical.log_prob) `logits` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/categorical.html#Categorical.logits) `property mean` `property param_shape` `probs` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/categorical.html#Categorical.probs) `sample(sample_shape=torch.Size([]))` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/categorical.html#Categorical.sample) `property support` `property variance` ## Cauchy `class torch.distributions.cauchy.Cauchy(loc, scale, validate_args=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/cauchy.html#Cauchy) Bases: `torch.distributions.distribution.Distribution` Samples from a Cauchy (Lorentz) distribution. The distribution of the ratio of independent normally distributed random variables with means `0` follows a Cauchy distribution. Example: >>> m = Cauchy(torch.tensor([0.0]), torch.tensor([1.0])) >>> m.sample() # sample from a Cauchy distribution with loc=0 and scale=1 tensor([ 2.3214]) Parameters * **loc** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _or_[Tensor](tensors#torch.Tensor "torch.Tensor")) – mode or median of the distribution. * **scale** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _or_[Tensor](tensors#torch.Tensor "torch.Tensor")) – half width at half maximum. 
`arg_constraints = {'loc': Real(), 'scale': GreaterThan(lower_bound=0.0)}` `cdf(value)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/cauchy.html#Cauchy.cdf) `entropy()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/cauchy.html#Cauchy.entropy) `expand(batch_shape, _instance=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/cauchy.html#Cauchy.expand) `has_rsample = True` `icdf(value)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/cauchy.html#Cauchy.icdf) `log_prob(value)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/cauchy.html#Cauchy.log_prob) `property mean` `rsample(sample_shape=torch.Size([]))` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/cauchy.html#Cauchy.rsample) `support = Real()` `property variance`

## Chi2

`class torch.distributions.chi2.Chi2(df, validate_args=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/chi2.html#Chi2)

Bases: `torch.distributions.gamma.Gamma`

Creates a Chi2 distribution parameterized by shape parameter `df`. This is exactly equivalent to `Gamma(alpha=0.5*df, beta=0.5)`

Example:

    >>> m = Chi2(torch.tensor([1.0]))
    >>> m.sample()  # Chi2 distributed with shape df=1
    tensor([ 0.1046])

Parameters **df** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _or_ [Tensor](tensors#torch.Tensor "torch.Tensor")) – shape parameter of the distribution

`arg_constraints = {'df': GreaterThan(lower_bound=0.0)}` `property df` `expand(batch_shape, _instance=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/chi2.html#Chi2.expand)

## ContinuousBernoulli

`class torch.distributions.continuous_bernoulli.ContinuousBernoulli(probs=None, logits=None, lims=(0.499, 0.501), validate_args=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/continuous_bernoulli.html#ContinuousBernoulli)

Bases: `torch.distributions.exp_family.ExponentialFamily`

Creates a continuous Bernoulli distribution parameterized by `probs` or `logits` (but not both).

The distribution is supported in [0, 1] and parameterized by 'probs' (in (0,1)) or 'logits' (real-valued). Note that, unlike the Bernoulli, 'probs' does not correspond to a probability and 'logits' does not correspond to log-odds, but the same names are used due to the similarity with the Bernoulli. See [1] for more details.

Example:

    >>> m = ContinuousBernoulli(torch.tensor([0.3]))
    >>> m.sample()
    tensor([ 0.2538])

Parameters

* **probs** (_Number_ _,_ [Tensor](tensors#torch.Tensor "torch.Tensor")) – (0,1) valued parameters
* **logits** (_Number_ _,_ [Tensor](tensors#torch.Tensor "torch.Tensor")) – real valued parameters whose sigmoid matches 'probs'

[1] The continuous Bernoulli: fixing a pervasive error in variational autoencoders, Loaiza-Ganem G and Cunningham JP, NeurIPS 2019.
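Since the continuous Bernoulli supports reparameterized sampling (`has_rsample = True` in the listing below), it can be used with the pathwise derivative estimator described earlier. A minimal sketch with an arbitrary scalar loss:

    import torch
    from torch.distributions import ContinuousBernoulli

    # Parameter we want gradients for (e.g. the output of a decoder network).
    probs = torch.tensor([0.3], requires_grad=True)
    m = ContinuousBernoulli(probs=probs)

    x = m.rsample()                   # reparameterized sample in [0, 1]
    loss = (x - 0.5).pow(2).sum()     # arbitrary differentiable loss
    loss.backward()

    print(probs.grad)                 # gradients flow through rsample()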
`arg_constraints = {'logits': Real(), 'probs': Interval(lower_bound=0.0, upper_bound=1.0)}` `cdf(value)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/continuous_bernoulli.html#ContinuousBernoulli.cdf) `entropy()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/continuous_bernoulli.html#ContinuousBernoulli.entropy) `expand(batch_shape, _instance=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/continuous_bernoulli.html#ContinuousBernoulli.expand) `has_rsample = True` `icdf(value)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/continuous_bernoulli.html#ContinuousBernoulli.icdf) `log_prob(value)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/continuous_bernoulli.html#ContinuousBernoulli.log_prob) `logits` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/continuous_bernoulli.html#ContinuousBernoulli.logits) `property mean` `property param_shape` `probs` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/continuous_bernoulli.html#ContinuousBernoulli.probs) `rsample(sample_shape=torch.Size([]))` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/continuous_bernoulli.html#ContinuousBernoulli.rsample) `sample(sample_shape=torch.Size([]))` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/continuous_bernoulli.html#ContinuousBernoulli.sample) `property stddev` `support = Interval(lower_bound=0.0, upper_bound=1.0)` `property variance`

## Dirichlet

`class torch.distributions.dirichlet.Dirichlet(concentration, validate_args=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/dirichlet.html#Dirichlet)

Bases: `torch.distributions.exp_family.ExponentialFamily`

Creates a Dirichlet distribution parameterized by concentration `concentration`.

Example:

    >>> m = Dirichlet(torch.tensor([0.5, 0.5]))
    >>> m.sample()  # Dirichlet distributed with concentration concentration
    tensor([ 0.1046, 0.8954])

Parameters **concentration** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – concentration parameter of the distribution (often referred to as alpha)

`arg_constraints = {'concentration': IndependentConstraint(GreaterThan(lower_bound=0.0), 1)}` `entropy()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/dirichlet.html#Dirichlet.entropy) `expand(batch_shape, _instance=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/dirichlet.html#Dirichlet.expand) `has_rsample = True` `log_prob(value)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/dirichlet.html#Dirichlet.log_prob) `property mean` `rsample(sample_shape=())` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/dirichlet.html#Dirichlet.rsample) `support = Simplex()` `property variance`

## Exponential

`class torch.distributions.exponential.Exponential(rate, validate_args=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/exponential.html#Exponential)

Bases: `torch.distributions.exp_family.ExponentialFamily`

Creates an Exponential distribution parameterized by `rate`.
Example: >>> m = Exponential(torch.tensor([1.0])) >>> m.sample() # Exponential distributed with rate=1 tensor([ 0.1046]) Parameters **rate** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _or_[Tensor](tensors#torch.Tensor "torch.Tensor")) – rate = 1 / scale of the distribution `arg_constraints = {'rate': GreaterThan(lower_bound=0.0)}` `cdf(value)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/exponential.html#Exponential.cdf) `entropy()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/exponential.html#Exponential.entropy) `expand(batch_shape, _instance=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/exponential.html#Exponential.expand) `has_rsample = True` `icdf(value)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/exponential.html#Exponential.icdf) `log_prob(value)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/exponential.html#Exponential.log_prob) `property mean` `rsample(sample_shape=torch.Size([]))` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/exponential.html#Exponential.rsample) `property stddev` `support = GreaterThan(lower_bound=0.0)` `property variance` ## FisherSnedecor `class torch.distributions.fishersnedecor.FisherSnedecor(df1, df2, validate_args=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/fishersnedecor.html#FisherSnedecor) Bases: `torch.distributions.distribution.Distribution` Creates a Fisher-Snedecor distribution parameterized by `df1` and `df2`. Example: >>> m = FisherSnedecor(torch.tensor([1.0]), torch.tensor([2.0])) >>> m.sample() # Fisher-Snedecor-distributed with df1=1 and df2=2 tensor([ 0.2453]) Parameters * **df1** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _or_[Tensor](tensors#torch.Tensor "torch.Tensor")) – degrees of freedom parameter 1 * **df2** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _or_[Tensor](tensors#torch.Tensor "torch.Tensor")) – degrees of freedom parameter 2 `arg_constraints = {'df1': GreaterThan(lower_bound=0.0), 'df2': GreaterThan(lower_bound=0.0)}` `expand(batch_shape, _instance=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/fishersnedecor.html#FisherSnedecor.expand) `has_rsample = True` `log_prob(value)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/fishersnedecor.html#FisherSnedecor.log_prob) `property mean` `rsample(sample_shape=torch.Size([]))` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/fishersnedecor.html#FisherSnedecor.rsample) `support = GreaterThan(lower_bound=0.0)` `property variance` ## Gamma `class torch.distributions.gamma.Gamma(concentration, rate, validate_args=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/gamma.html#Gamma) Bases: `torch.distributions.exp_family.ExponentialFamily` Creates a Gamma distribution parameterized by shape `concentration` and `rate`. 
Example:

    >>> m = Gamma(torch.tensor([1.0]), torch.tensor([1.0]))
    >>> m.sample()  # Gamma distributed with concentration=1 and rate=1
    tensor([ 0.1046])

Parameters

* **concentration** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _or_ [Tensor](tensors#torch.Tensor "torch.Tensor")) – shape parameter of the distribution (often referred to as alpha)
* **rate** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _or_ [Tensor](tensors#torch.Tensor "torch.Tensor")) – rate = 1 / scale of the distribution (often referred to as beta)

`arg_constraints = {'concentration': GreaterThan(lower_bound=0.0), 'rate': GreaterThan(lower_bound=0.0)}` `entropy()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/gamma.html#Gamma.entropy) `expand(batch_shape, _instance=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/gamma.html#Gamma.expand) `has_rsample = True` `log_prob(value)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/gamma.html#Gamma.log_prob) `property mean` `rsample(sample_shape=torch.Size([]))` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/gamma.html#Gamma.rsample) `support = GreaterThan(lower_bound=0.0)` `property variance`

## Geometric

`class torch.distributions.geometric.Geometric(probs=None, logits=None, validate_args=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/geometric.html#Geometric)

Bases: `torch.distributions.distribution.Distribution`

Creates a Geometric distribution parameterized by `probs`, where `probs` is the probability of success of Bernoulli trials. It represents the probability that in $k + 1$ Bernoulli trials, the first $k$ trials failed, before seeing a success.

Samples are non-negative integers $[0, \infty)$.

Example:

    >>> m = Geometric(torch.tensor([0.3]))
    >>> m.sample()  # underlying Bernoulli has 30% chance 1; 70% chance 0
    tensor([ 2.])

Parameters

* **probs** (_Number_ _,_ [Tensor](tensors#torch.Tensor "torch.Tensor")) – the probability of sampling `1`. Must be in range (0, 1]
* **logits** (_Number_ _,_ [Tensor](tensors#torch.Tensor "torch.Tensor")) – the log-odds of sampling `1`.

`arg_constraints = {'logits': Real(), 'probs': Interval(lower_bound=0.0, upper_bound=1.0)}` `entropy()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/geometric.html#Geometric.entropy) `expand(batch_shape, _instance=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/geometric.html#Geometric.expand) `log_prob(value)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/geometric.html#Geometric.log_prob) `logits` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/geometric.html#Geometric.logits) `property mean` `probs` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/geometric.html#Geometric.probs) `sample(sample_shape=torch.Size([]))` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/geometric.html#Geometric.sample) `support = IntegerGreaterThan(lower_bound=0)` `property variance`

## Gumbel

`class torch.distributions.gumbel.Gumbel(loc, scale, validate_args=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/gumbel.html#Gumbel)

Bases: `torch.distributions.transformed_distribution.TransformedDistribution`

Samples from a Gumbel Distribution.
Examples: >>> m = Gumbel(torch.tensor([1.0]), torch.tensor([2.0])) >>> m.sample() # sample from Gumbel distribution with loc=1, scale=2 tensor([ 1.0124]) Parameters * **loc** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _or_[Tensor](tensors#torch.Tensor "torch.Tensor")) – Location parameter of the distribution * **scale** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _or_[Tensor](tensors#torch.Tensor "torch.Tensor")) – Scale parameter of the distribution `arg_constraints: Dict[str, torch.distributions.constraints.Constraint] = {'loc': Real(), 'scale': GreaterThan(lower_bound=0.0)}` `entropy()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/gumbel.html#Gumbel.entropy) `expand(batch_shape, _instance=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/gumbel.html#Gumbel.expand) `log_prob(value)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/gumbel.html#Gumbel.log_prob) `property mean` `property stddev` `support = Real()` `property variance` ## HalfCauchy `class torch.distributions.half_cauchy.HalfCauchy(scale, validate_args=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/half_cauchy.html#HalfCauchy) Bases: `torch.distributions.transformed_distribution.TransformedDistribution` Creates a half-Cauchy distribution parameterized by `scale` where: X ~ Cauchy(0, scale) Y = |X| ~ HalfCauchy(scale) Example: >>> m = HalfCauchy(torch.tensor([1.0])) >>> m.sample() # half-cauchy distributed with scale=1 tensor([ 2.3214]) Parameters **scale** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _or_[Tensor](tensors#torch.Tensor "torch.Tensor")) – scale of the full Cauchy distribution `arg_constraints: Dict[str, torch.distributions.constraints.Constraint] = {'scale': GreaterThan(lower_bound=0.0)}` `cdf(value)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/half_cauchy.html#HalfCauchy.cdf) `entropy()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/half_cauchy.html#HalfCauchy.entropy) `expand(batch_shape, _instance=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/half_cauchy.html#HalfCauchy.expand) `has_rsample = True` `icdf(prob)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/half_cauchy.html#HalfCauchy.icdf) `log_prob(value)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/half_cauchy.html#HalfCauchy.log_prob) `property mean` `property scale` `support = GreaterThan(lower_bound=0.0)` `property variance` ## HalfNormal `class torch.distributions.half_normal.HalfNormal(scale, validate_args=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/half_normal.html#HalfNormal) Bases: `torch.distributions.transformed_distribution.TransformedDistribution` Creates a half-normal distribution parameterized by `scale` where: X ~ Normal(0, scale) Y = |X| ~ HalfNormal(scale) Example: >>> m = HalfNormal(torch.tensor([1.0])) >>> m.sample() # half-normal distributed with scale=1 tensor([ 0.1046]) Parameters **scale** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _or_[Tensor](tensors#torch.Tensor "torch.Tensor")) – scale of the full Normal distribution `arg_constraints: Dict[str, torch.distributions.constraints.Constraint] = {'scale': GreaterThan(lower_bound=0.0)}` `cdf(value)` 
[[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/half_normal.html#HalfNormal.cdf) `entropy()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/half_normal.html#HalfNormal.entropy) `expand(batch_shape, _instance=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/half_normal.html#HalfNormal.expand) `has_rsample = True` `icdf(prob)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/half_normal.html#HalfNormal.icdf) `log_prob(value)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/half_normal.html#HalfNormal.log_prob) `property mean` `property scale` `support = GreaterThan(lower_bound=0.0)` `property variance` ## Independent `class torch.distributions.independent.Independent(base_distribution, reinterpreted_batch_ndims, validate_args=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/independent.html#Independent) Bases: `torch.distributions.distribution.Distribution` Reinterprets some of the batch dims of a distribution as event dims. This is mainly useful for changing the shape of the result of `log_prob()`. For example to create a diagonal Normal distribution with the same shape as a Multivariate Normal distribution (so they are interchangeable), you can: >>> loc = torch.zeros(3) >>> scale = torch.ones(3) >>> mvn = MultivariateNormal(loc, scale_tril=torch.diag(scale)) >>> [mvn.batch_shape, mvn.event_shape] [torch.Size(()), torch.Size((3,))] >>> normal = Normal(loc, scale) >>> [normal.batch_shape, normal.event_shape] [torch.Size((3,)), torch.Size(())] >>> diagn = Independent(normal, 1) >>> [diagn.batch_shape, diagn.event_shape] [torch.Size(()), torch.Size((3,))] Parameters * **base_distribution** (torch.distributions.distribution.Distribution) – a base distribution * **reinterpreted_batch_ndims** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – the number of batch dims to reinterpret as event dims `arg_constraints: Dict[str, torch.distributions.constraints.Constraint] = {}` `entropy()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/independent.html#Independent.entropy) `enumerate_support(expand=True)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/independent.html#Independent.enumerate_support) `expand(batch_shape, _instance=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/independent.html#Independent.expand) `property has_enumerate_support` `property has_rsample` `log_prob(value)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/independent.html#Independent.log_prob) `property mean` `rsample(sample_shape=torch.Size([]))` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/independent.html#Independent.rsample) `sample(sample_shape=torch.Size([]))` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/independent.html#Independent.sample) `property support` `property variance` ## Kumaraswamy `class torch.distributions.kumaraswamy.Kumaraswamy(concentration1, concentration0, validate_args=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/kumaraswamy.html#Kumaraswamy) Bases: `torch.distributions.transformed_distribution.TransformedDistribution` Samples from a Kumaraswamy distribution. 
Example:

    >>> m = Kumaraswamy(torch.Tensor([1.0]), torch.Tensor([1.0]))
    >>> m.sample()  # sample from a Kumaraswamy distribution with concentration alpha=1 and beta=1
    tensor([ 0.1729])

Parameters

* **concentration1** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _or_ [Tensor](tensors#torch.Tensor "torch.Tensor")) – 1st concentration parameter of the distribution (often referred to as alpha)
* **concentration0** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _or_ [Tensor](tensors#torch.Tensor "torch.Tensor")) – 2nd concentration parameter of the distribution (often referred to as beta)

`arg_constraints: Dict[str, torch.distributions.constraints.Constraint] = {'concentration0': GreaterThan(lower_bound=0.0), 'concentration1': GreaterThan(lower_bound=0.0)}` `entropy()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/kumaraswamy.html#Kumaraswamy.entropy) `expand(batch_shape, _instance=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/kumaraswamy.html#Kumaraswamy.expand) `has_rsample = True` `property mean` `support = Interval(lower_bound=0.0, upper_bound=1.0)` `property variance`

## LKJCholesky

`class torch.distributions.lkj_cholesky.LKJCholesky(dim, concentration=1.0, validate_args=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/lkj_cholesky.html#LKJCholesky)

Bases: `torch.distributions.distribution.Distribution`

LKJ distribution for lower Cholesky factor of correlation matrices. The distribution is controlled by `concentration` parameter $\eta$ to make the probability of the correlation matrix $M$ generated from a Cholesky factor proportional to $\det(M)^{\eta - 1}$. Because of that, when `concentration == 1`, we have a uniform distribution over Cholesky factors of correlation matrices. Note that this distribution samples the Cholesky factor of correlation matrices and not the correlation matrices themselves and thereby differs slightly from the derivations in [1] for the `LKJCorr` distribution. For sampling, this uses the Onion method from [1] Section 3.

    L ~ LKJCholesky(dim, concentration)
    X = L @ L' ~ LKJCorr(dim, concentration)

Example:

    >>> l = LKJCholesky(3, 0.5)
    >>> l.sample()  # l @ l.T is a sample of a correlation 3x3 matrix
    tensor([[ 1.0000,  0.0000,  0.0000],
            [ 0.3516,  0.9361,  0.0000],
            [-0.1899,  0.4748,  0.8593]])

Parameters

* **dim** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – dimension of the matrices
* **concentration** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _or_ [Tensor](tensors#torch.Tensor "torch.Tensor")) – concentration/shape parameter of the distribution (often referred to as eta)

**References**

[1] `Generating random correlation matrices based on vines and extended onion method`, Daniel Lewandowski, Dorota Kurowicka, Harry Joe.
`arg_constraints = {'concentration': GreaterThan(lower_bound=0.0)}` `expand(batch_shape, _instance=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/lkj_cholesky.html#LKJCholesky.expand) `log_prob(value)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/lkj_cholesky.html#LKJCholesky.log_prob) `sample(sample_shape=torch.Size([]))` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/lkj_cholesky.html#LKJCholesky.sample) `support = CorrCholesky()` ## Laplace `class torch.distributions.laplace.Laplace(loc, scale, validate_args=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/laplace.html#Laplace) Bases: `torch.distributions.distribution.Distribution` Creates a Laplace distribution parameterized by `loc` and `scale`. Example: >>> m = Laplace(torch.tensor([0.0]), torch.tensor([1.0])) >>> m.sample() # Laplace distributed with loc=0, scale=1 tensor([ 0.1046]) Parameters * **loc** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _or_[Tensor](tensors#torch.Tensor "torch.Tensor")) – mean of the distribution * **scale** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _or_[Tensor](tensors#torch.Tensor "torch.Tensor")) – scale of the distribution `arg_constraints = {'loc': Real(), 'scale': GreaterThan(lower_bound=0.0)}` `cdf(value)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/laplace.html#Laplace.cdf) `entropy()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/laplace.html#Laplace.entropy) `expand(batch_shape, _instance=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/laplace.html#Laplace.expand) `has_rsample = True` `icdf(value)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/laplace.html#Laplace.icdf) `log_prob(value)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/laplace.html#Laplace.log_prob) `property mean` `rsample(sample_shape=torch.Size([]))` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/laplace.html#Laplace.rsample) `property stddev` `support = Real()` `property variance` ## LogNormal `class torch.distributions.log_normal.LogNormal(loc, scale, validate_args=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/log_normal.html#LogNormal) Bases: `torch.distributions.transformed_distribution.TransformedDistribution` Creates a log-normal distribution parameterized by `loc` and `scale` where: X ~ Normal(loc, scale) Y = exp(X) ~ LogNormal(loc, scale) Example: >>> m = LogNormal(torch.tensor([0.0]), torch.tensor([1.0])) >>> m.sample() # log-normal distributed with mean=0 and stddev=1 tensor([ 0.1046]) Parameters * **loc** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _or_[Tensor](tensors#torch.Tensor "torch.Tensor")) – mean of log of distribution * **scale** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _or_[Tensor](tensors#torch.Tensor "torch.Tensor")) – standard deviation of log of the distribution `arg_constraints: Dict[str, torch.distributions.constraints.Constraint] = {'loc': Real(), 'scale': GreaterThan(lower_bound=0.0)}` `entropy()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/log_normal.html#LogNormal.entropy) `expand(batch_shape, _instance=None)` 
[[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/log_normal.html#LogNormal.expand) `has_rsample = True` `property loc` `property mean` `property scale` `support = GreaterThan(lower_bound=0.0)` `property variance` ## LowRankMultivariateNormal `class torch.distributions.lowrank_multivariate_normal.LowRankMultivariateNormal(loc, cov_factor, cov_diag, validate_args=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/lowrank_multivariate_normal.html#LowRankMultivariateNormal) Bases: `torch.distributions.distribution.Distribution` Creates a multivariate normal distribution with covariance matrix having a low-rank form parameterized by `cov_factor` and `cov_diag`: covariance_matrix = cov_factor @ cov_factor.T + cov_diag #### Example >>> m = LowRankMultivariateNormal(torch.zeros(2), torch.tensor([[1.], [0.]]), torch.ones(2)) >>> m.sample() # normally distributed with mean=`[0,0]`, cov_factor=`[[1],[0]]`, cov_diag=`[1,1]` tensor([-0.2102, -0.5429]) Parameters * **loc** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – mean of the distribution with shape `batch_shape + event_shape` * **cov_factor** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – factor part of low-rank form of covariance matrix with shape `batch_shape + event_shape + (rank,)` * **cov_diag** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – diagonal part of low-rank form of covariance matrix with shape `batch_shape + event_shape` Note The computation for determinant and inverse of covariance matrix is avoided when `cov_factor.shape[1] << cov_factor.shape[0]` thanks to [Woodbury matrix identity](https://en.wikipedia.org/wiki/Woodbury_matrix_identity) and [matrix determinant lemma](https://en.wikipedia.org/wiki/Matrix_determinant_lemma). Thanks to these formulas, we just need to compute the determinant and inverse of the small size “capacitance” matrix: capacitance = I + cov_factor.T @ inv(cov_diag) @ cov_factor `arg_constraints = {'cov_diag': IndependentConstraint(GreaterThan(lower_bound=0.0), 1), 'cov_factor': IndependentConstraint(Real(), 2), 'loc': IndependentConstraint(Real(), 1)}` `covariance_matrix` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/lowrank_multivariate_normal.html#LowRankMultivariateNormal.covariance_matrix) `entropy()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/lowrank_multivariate_normal.html#LowRankMultivariateNormal.entropy) `expand(batch_shape, _instance=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/lowrank_multivariate_normal.html#LowRankMultivariateNormal.expand) `has_rsample = True` `log_prob(value)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/lowrank_multivariate_normal.html#LowRankMultivariateNormal.log_prob) `property mean` `precision_matrix` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/lowrank_multivariate_normal.html#LowRankMultivariateNormal.precision_matrix) `rsample(sample_shape=torch.Size([]))` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/lowrank_multivariate_normal.html#LowRankMultivariateNormal.rsample) `scale_tril` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/lowrank_multivariate_normal.html#LowRankMultivariateNormal.scale_tril) `support = IndependentConstraint(Real(), 1)` `variance` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/lowrank_multivariate_normal.html#LowRankMultivariateNormal.variance) ## MixtureSameFamily `class 
torch.distributions.mixture_same_family.MixtureSameFamily(mixture_distribution, component_distribution, validate_args=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/mixture_same_family.html#MixtureSameFamily)

Bases: `torch.distributions.distribution.Distribution`

The `MixtureSameFamily` distribution implements a (batch of) mixture distribution where all components are from different parameterizations of the same distribution type. It is parameterized by a `Categorical` "selecting distribution" (over `k` components) and a component distribution, i.e., a `Distribution` with a rightmost batch shape (equal to `[k]`) which indexes each (batch of) component.

Examples:

    # Construct Gaussian Mixture Model in 1D consisting of 5 equally
    # weighted normal distributions
    >>> mix = D.Categorical(torch.ones(5,))
    >>> comp = D.Normal(torch.randn(5,), torch.rand(5,))
    >>> gmm = MixtureSameFamily(mix, comp)

    # Construct Gaussian Mixture Model in 2D consisting of 5 equally
    # weighted bivariate normal distributions
    >>> mix = D.Categorical(torch.ones(5,))
    >>> comp = D.Independent(D.Normal(torch.randn(5,2), torch.rand(5,2)), 1)
    >>> gmm = MixtureSameFamily(mix, comp)

    # Construct a batch of 3 Gaussian Mixture Models in 2D each
    # consisting of 5 random weighted bivariate normal distributions
    >>> mix = D.Categorical(torch.rand(3,5))
    >>> comp = D.Independent(D.Normal(torch.randn(3,5,2), torch.rand(3,5,2)), 1)
    >>> gmm = MixtureSameFamily(mix, comp)

Parameters

* **mixture_distribution** – `torch.distributions.Categorical`-like instance. Manages the probability of selecting components. The number of categories must match the rightmost batch dimension of the `component_distribution`. Must have either scalar `batch_shape` or `batch_shape` matching `component_distribution.batch_shape[:-1]`
* **component_distribution** – `torch.distributions.Distribution`-like instance. Right-most batch dimension indexes components.

`arg_constraints: Dict[str, torch.distributions.constraints.Constraint] = {}` `cdf(x)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/mixture_same_family.html#MixtureSameFamily.cdf) `property component_distribution` `expand(batch_shape, _instance=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/mixture_same_family.html#MixtureSameFamily.expand) `has_rsample = False` `log_prob(x)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/mixture_same_family.html#MixtureSameFamily.log_prob) `property mean` `property mixture_distribution` `sample(sample_shape=torch.Size([]))` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/mixture_same_family.html#MixtureSameFamily.sample) `property support` `property variance`

## Multinomial

`class torch.distributions.multinomial.Multinomial(total_count=1, probs=None, logits=None, validate_args=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/multinomial.html#Multinomial)

Bases: `torch.distributions.distribution.Distribution`

Creates a Multinomial distribution parameterized by `total_count` and either `probs` or `logits` (but not both). The innermost dimension of `probs` indexes over categories. All other dimensions index over batches.

Note that `total_count` need not be specified if only `log_prob()` is called (see example below)

Note

The `probs` argument must be non-negative, finite and have a non-zero sum, and it will be normalized to sum to 1 along the last dimension. `probs` will return this normalized value.
The `logits` argument will be interpreted as unnormalized log probabilities and can therefore be any real number. It will likewise be normalized so that the resulting probabilities sum to 1 along the last dimension. `logits` will return this normalized value.

* `sample()` requires a single shared `total_count` for all parameters and samples.
* `log_prob()` allows different `total_count` for each parameter and sample.

Example:

    >>> m = Multinomial(100, torch.tensor([ 1., 1., 1., 1.]))
    >>> x = m.sample()  # equal probability of 0, 1, 2, 3
    tensor([ 21., 24., 30., 25.])
    >>> Multinomial(probs=torch.tensor([1., 1., 1., 1.])).log_prob(x)
    tensor([-4.1338])

Parameters

* **total_count** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – number of trials
* **probs** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – event probabilities
* **logits** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – event log probabilities (unnormalized)

`arg_constraints = {'logits': IndependentConstraint(Real(), 1), 'probs': Simplex()}` `expand(batch_shape, _instance=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/multinomial.html#Multinomial.expand) `log_prob(value)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/multinomial.html#Multinomial.log_prob) `property logits` `property mean` `property param_shape` `property probs` `sample(sample_shape=torch.Size([]))` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/multinomial.html#Multinomial.sample) `property support` `total_count: int = None` `property variance`

## MultivariateNormal

`class torch.distributions.multivariate_normal.MultivariateNormal(loc, covariance_matrix=None, precision_matrix=None, scale_tril=None, validate_args=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/multivariate_normal.html#MultivariateNormal)

Bases: `torch.distributions.distribution.Distribution`

Creates a multivariate normal (also called Gaussian) distribution parameterized by a mean vector and a covariance matrix.

The multivariate normal distribution can be parameterized either in terms of a positive definite covariance matrix $\mathbf{\Sigma}$ or a positive definite precision matrix $\mathbf{\Sigma}^{-1}$ or a lower-triangular matrix $\mathbf{L}$ with positive-valued diagonal entries, such that $\mathbf{\Sigma} = \mathbf{L}\mathbf{L}^\top$. This triangular matrix can be obtained via e.g. Cholesky decomposition of the covariance.

#### Example

    >>> m = MultivariateNormal(torch.zeros(2), torch.eye(2))
    >>> m.sample()  # normally distributed with mean=`[0,0]` and covariance_matrix=`I`
    tensor([-0.2102, -0.5429])

Parameters

* **loc** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – mean of the distribution
* **covariance_matrix** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – positive-definite covariance matrix
* **precision_matrix** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – positive-definite precision matrix
* **scale_tril** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – lower-triangular factor of covariance, with positive-valued diagonal

Note

Only one of `covariance_matrix` or `precision_matrix` or `scale_tril` can be specified. Using `scale_tril` will be more efficient: all computations internally are based on `scale_tril`. If `covariance_matrix` or `precision_matrix` is passed instead, it is only used to compute the corresponding lower triangular matrices using a Cholesky decomposition.
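A minimal sketch of the note above (the covariance values are arbitrary): passing `scale_tril` hands the Cholesky factor to the distribution directly, while passing `covariance_matrix` makes the distribution compute that factor itself; both define the same density:

    import torch
    from torch.distributions import MultivariateNormal

    loc = torch.zeros(2)
    cov = torch.tensor([[2.0, 0.5],
                        [0.5, 1.0]])   # positive definite

    m_cov = MultivariateNormal(loc, covariance_matrix=cov)
    m_tril = MultivariateNormal(loc, scale_tril=torch.linalg.cholesky(cov))

    x = m_cov.sample()
    # Same distribution, so the log densities agree (up to numerical precision).
    print(m_cov.log_prob(x), m_tril.log_prob(x))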
`arg_constraints = {'covariance_matrix': PositiveDefinite(), 'loc': IndependentConstraint(Real(), 1), 'precision_matrix': PositiveDefinite(), 'scale_tril': LowerCholesky()}` `covariance_matrix` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/multivariate_normal.html#MultivariateNormal.covariance_matrix) `entropy()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/multivariate_normal.html#MultivariateNormal.entropy) `expand(batch_shape, _instance=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/multivariate_normal.html#MultivariateNormal.expand) `has_rsample = True` `log_prob(value)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/multivariate_normal.html#MultivariateNormal.log_prob) `property mean` `precision_matrix` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/multivariate_normal.html#MultivariateNormal.precision_matrix) `rsample(sample_shape=torch.Size([]))` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/multivariate_normal.html#MultivariateNormal.rsample) `scale_tril` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/multivariate_normal.html#MultivariateNormal.scale_tril) `support = IndependentConstraint(Real(), 1)` `property variance` ## NegativeBinomial `class torch.distributions.negative_binomial.NegativeBinomial(total_count, probs=None, logits=None, validate_args=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/negative_binomial.html#NegativeBinomial) Bases: `torch.distributions.distribution.Distribution` Creates a Negative Binomial distribution, i.e. distribution of the number of successful independent and identical Bernoulli trials before `total_count` failures are achieved. The probability of failure of each Bernoulli trial is `probs`. 
Parameters * **total_count** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _or_[Tensor](tensors#torch.Tensor "torch.Tensor")) – non-negative number of negative Bernoulli trials to stop, although the distribution is still valid for real valued count * **probs** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – Event probabilities of failure in the half open interval [0, 1) * **logits** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – Event log-odds for probabilities of failure `arg_constraints = {'logits': Real(), 'probs': HalfOpenInterval(lower_bound=0.0, upper_bound=1.0), 'total_count': GreaterThanEq(lower_bound=0)}` `expand(batch_shape, _instance=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/negative_binomial.html#NegativeBinomial.expand) `log_prob(value)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/negative_binomial.html#NegativeBinomial.log_prob) `logits` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/negative_binomial.html#NegativeBinomial.logits) `property mean` `property param_shape` `probs` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/negative_binomial.html#NegativeBinomial.probs) `sample(sample_shape=torch.Size([]))` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/negative_binomial.html#NegativeBinomial.sample) `support = IntegerGreaterThan(lower_bound=0)` `property variance` ## Normal `class torch.distributions.normal.Normal(loc, scale, validate_args=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/normal.html#Normal) Bases: `torch.distributions.exp_family.ExponentialFamily` Creates a normal (also called Gaussian) distribution parameterized by `loc` and `scale`. 
Example: >>> m = Normal(torch.tensor([0.0]), torch.tensor([1.0])) >>> m.sample() # normally distributed with loc=0 and scale=1 tensor([ 0.1046]) Parameters * **loc** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _or_[Tensor](tensors#torch.Tensor "torch.Tensor")) – mean of the distribution (often referred to as mu) * **scale** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _or_[Tensor](tensors#torch.Tensor "torch.Tensor")) – standard deviation of the distribution (often referred to as sigma) `arg_constraints = {'loc': Real(), 'scale': GreaterThan(lower_bound=0.0)}` `cdf(value)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/normal.html#Normal.cdf) `entropy()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/normal.html#Normal.entropy) `expand(batch_shape, _instance=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/normal.html#Normal.expand) `has_rsample = True` `icdf(value)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/normal.html#Normal.icdf) `log_prob(value)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/normal.html#Normal.log_prob) `property mean` `rsample(sample_shape=torch.Size([]))` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/normal.html#Normal.rsample) `sample(sample_shape=torch.Size([]))` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/normal.html#Normal.sample) `property stddev` `support = Real()` `property variance` ## OneHotCategorical `class torch.distributions.one_hot_categorical.OneHotCategorical(probs=None, logits=None, validate_args=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/one_hot_categorical.html#OneHotCategorical) Bases: `torch.distributions.distribution.Distribution` Creates a one-hot categorical distribution parameterized by `probs` or `logits`. Samples are one-hot coded vectors of size `probs.size(-1)`. Note The `probs` argument must be non-negative, finite and have a non-zero sum, and it will be normalized to sum to 1 along the last dimension. attr:`probs` will return this normalized value. The `logits` argument will be interpreted as unnormalized log probabilities and can therefore be any real number. It will likewise be normalized so that the resulting probabilities sum to 1 along the last dimension. attr:`logits` will return this normalized value. See also: `torch.distributions.Categorical()` for specifications of `probs` and `logits`. 
Example: >>> m = OneHotCategorical(torch.tensor([ 0.25, 0.25, 0.25, 0.25 ])) >>> m.sample() # equal probability of 0, 1, 2, 3 tensor([ 0., 0., 0., 1.]) Parameters * **probs** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – event probabilities * **logits** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – event log probabilities (unnormalized) `arg_constraints = {'logits': IndependentConstraint(Real(), 1), 'probs': Simplex()}` `entropy()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/one_hot_categorical.html#OneHotCategorical.entropy) `enumerate_support(expand=True)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/one_hot_categorical.html#OneHotCategorical.enumerate_support) `expand(batch_shape, _instance=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/one_hot_categorical.html#OneHotCategorical.expand) `has_enumerate_support = True` `log_prob(value)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/one_hot_categorical.html#OneHotCategorical.log_prob) `property logits` `property mean` `property param_shape` `property probs` `sample(sample_shape=torch.Size([]))` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/one_hot_categorical.html#OneHotCategorical.sample) `support = OneHot()` `property variance` ## Pareto `class torch.distributions.pareto.Pareto(scale, alpha, validate_args=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/pareto.html#Pareto) Bases: `torch.distributions.transformed_distribution.TransformedDistribution` Samples from a Pareto Type 1 distribution. Example: >>> m = Pareto(torch.tensor([1.0]), torch.tensor([1.0])) >>> m.sample() # sample from a Pareto distribution with scale=1 and alpha=1 tensor([ 1.5623]) Parameters * **scale** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _or_[Tensor](tensors#torch.Tensor "torch.Tensor")) – Scale parameter of the distribution * **alpha** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _or_[Tensor](tensors#torch.Tensor "torch.Tensor")) – Shape parameter of the distribution `arg_constraints: Dict[str, torch.distributions.constraints.Constraint] = {'alpha': GreaterThan(lower_bound=0.0), 'scale': GreaterThan(lower_bound=0.0)}` `entropy()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/pareto.html#Pareto.entropy) `expand(batch_shape, _instance=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/pareto.html#Pareto.expand) `property mean` `property support` `property variance` ## Poisson `class torch.distributions.poisson.Poisson(rate, validate_args=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/poisson.html#Poisson) Bases: `torch.distributions.exp_family.ExponentialFamily` Creates a Poisson distribution parameterized by `rate`, the rate parameter. 
Samples are nonnegative integers, with a pmf given by $\mathrm{rate}^k \frac{e^{-\mathrm{rate}}}{k!}$ Example: >>> m = Poisson(torch.tensor([4])) >>> m.sample() tensor([ 3.]) Parameters **rate** (_Number_ _,_[Tensor](tensors#torch.Tensor "torch.Tensor")) – the rate parameter `arg_constraints = {'rate': GreaterThan(lower_bound=0.0)}` `expand(batch_shape, _instance=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/poisson.html#Poisson.expand) `log_prob(value)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/poisson.html#Poisson.log_prob) `property mean` `sample(sample_shape=torch.Size([]))` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/poisson.html#Poisson.sample) `support = IntegerGreaterThan(lower_bound=0)` `property variance` ## RelaxedBernoulli `class torch.distributions.relaxed_bernoulli.RelaxedBernoulli(temperature, probs=None, logits=None, validate_args=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/relaxed_bernoulli.html#RelaxedBernoulli) Bases: `torch.distributions.transformed_distribution.TransformedDistribution` Creates a RelaxedBernoulli distribution, parametrized by `temperature`, and either `probs` or `logits` (but not both). This is a relaxed version of the `Bernoulli` distribution, so its values are in (0, 1) and it has reparametrizable samples. Example: >>> m = RelaxedBernoulli(torch.tensor([2.2]), torch.tensor([0.1, 0.2, 0.3, 0.99])) >>> m.sample() tensor([ 0.2951, 0.3442, 0.8918, 0.9021]) Parameters * **temperature** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – relaxation temperature * **probs** (_Number_ _,_[Tensor](tensors#torch.Tensor "torch.Tensor")) – the probability of sampling `1` * **logits** (_Number_ _,_[Tensor](tensors#torch.Tensor "torch.Tensor")) – the log-odds of sampling `1` `arg_constraints: Dict[str, torch.distributions.constraints.Constraint] = {'logits': Real(), 'probs': Interval(lower_bound=0.0, upper_bound=1.0)}` `expand(batch_shape, _instance=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/relaxed_bernoulli.html#RelaxedBernoulli.expand) `has_rsample = True` `property logits` `property probs` `support = Interval(lower_bound=0.0, upper_bound=1.0)` `property temperature` ## LogitRelaxedBernoulli `class torch.distributions.relaxed_bernoulli.LogitRelaxedBernoulli(temperature, probs=None, logits=None, validate_args=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/relaxed_bernoulli.html#LogitRelaxedBernoulli) Bases: `torch.distributions.distribution.Distribution` Creates a LogitRelaxedBernoulli distribution parameterized by `probs` or `logits` (but not both), which is the logit of a RelaxedBernoulli distribution. Samples are logits of values in (0, 1). See [1] for more details.
Parameters * **temperature** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – relaxation temperature * **probs** (_Number_ _,_[Tensor](tensors#torch.Tensor "torch.Tensor")) – the probability of sampling `1` * **logits** (_Number_ _,_[Tensor](tensors#torch.Tensor "torch.Tensor")) – the log-odds of sampling `1` [1] The Concrete Distribution: A Continuous Relaxation of Discrete Random Variables (Maddison et al, 2017) [2] Categorical Reparametrization with Gumbel-Softmax (Jang et al, 2017) `arg_constraints = {'logits': Real(), 'probs': Interval(lower_bound=0.0, upper_bound=1.0)}` `expand(batch_shape, _instance=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/relaxed_bernoulli.html#LogitRelaxedBernoulli.expand) `log_prob(value)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/relaxed_bernoulli.html#LogitRelaxedBernoulli.log_prob) `logits` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/relaxed_bernoulli.html#LogitRelaxedBernoulli.logits) `property param_shape` `probs` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/relaxed_bernoulli.html#LogitRelaxedBernoulli.probs) `rsample(sample_shape=torch.Size([]))` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/relaxed_bernoulli.html#LogitRelaxedBernoulli.rsample) `support = Real()` ## RelaxedOneHotCategorical `class torch.distributions.relaxed_categorical.RelaxedOneHotCategorical(temperature, probs=None, logits=None, validate_args=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/relaxed_categorical.html#RelaxedOneHotCategorical) Bases: `torch.distributions.transformed_distribution.TransformedDistribution` Creates a RelaxedOneHotCategorical distribution parametrized by `temperature`, and either `probs` or `logits`. This is a relaxed version of the `OneHotCategorical` distribution, so its samples are on simplex, and are reparametrizable. Example: >>> m = RelaxedOneHotCategorical(torch.tensor([2.2]), torch.tensor([0.1, 0.2, 0.3, 0.4])) >>> m.sample() tensor([ 0.1294, 0.2324, 0.3859, 0.2523]) Parameters * **temperature** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – relaxation temperature * **probs** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – event probabilities * **logits** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – unnormalized log probability for each event `arg_constraints: Dict[str, torch.distributions.constraints.Constraint] = {'logits': IndependentConstraint(Real(), 1), 'probs': Simplex()}` `expand(batch_shape, _instance=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/relaxed_categorical.html#RelaxedOneHotCategorical.expand) `has_rsample = True` `property logits` `property probs` `support = Simplex()` `property temperature` ## StudentT `class torch.distributions.studentT.StudentT(df, loc=0.0, scale=1.0, validate_args=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/studentT.html#StudentT) Bases: `torch.distributions.distribution.Distribution` Creates a Student’s t-distribution parameterized by degree of freedom `df`, mean `loc` and scale `scale`. 
Example: >>> m = StudentT(torch.tensor([2.0])) >>> m.sample() # Student's t-distributed with degrees of freedom=2 tensor([ 0.1046]) Parameters * **df** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _or_[Tensor](tensors#torch.Tensor "torch.Tensor")) – degrees of freedom * **loc** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _or_[Tensor](tensors#torch.Tensor "torch.Tensor")) – mean of the distribution * **scale** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _or_[Tensor](tensors#torch.Tensor "torch.Tensor")) – scale of the distribution `arg_constraints = {'df': GreaterThan(lower_bound=0.0), 'loc': Real(), 'scale': GreaterThan(lower_bound=0.0)}` `entropy()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/studentT.html#StudentT.entropy) `expand(batch_shape, _instance=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/studentT.html#StudentT.expand) `has_rsample = True` `log_prob(value)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/studentT.html#StudentT.log_prob) `property mean` `rsample(sample_shape=torch.Size([]))` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/studentT.html#StudentT.rsample) `support = Real()` `property variance` ## TransformedDistribution `class torch.distributions.transformed_distribution.TransformedDistribution(base_distribution, transforms, validate_args=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/transformed_distribution.html#TransformedDistribution) Bases: `torch.distributions.distribution.Distribution` Extension of the Distribution class, which applies a sequence of Transforms to a base distribution. Let f be the composition of transforms applied: X ~ BaseDistribution Y = f(X) ~ TransformedDistribution(BaseDistribution, f) log p(Y) = log p(X) + log |det (dX/dY)| Note that the `.event_shape` of a `TransformedDistribution` is the maximum shape of its base distribution and its transforms, since transforms can introduce correlations among events. An example for the usage of `TransformedDistribution` would be: # Building a Logistic Distribution # X ~ Uniform(0, 1) # f = a + b * logit(X) # Y ~ f(X) ~ Logistic(a, b) base_distribution = Uniform(0, 1) transforms = [SigmoidTransform().inv, AffineTransform(loc=a, scale=b)] logistic = TransformedDistribution(base_distribution, transforms) For more examples, please look at the implementations of `Gumbel`, `HalfCauchy`, `HalfNormal`, `LogNormal`, `Pareto`, `Weibull`, `RelaxedBernoulli` and `RelaxedOneHotCategorical` `arg_constraints: Dict[str, torch.distributions.constraints.Constraint] = {}` `cdf(value)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/transformed_distribution.html#TransformedDistribution.cdf) Computes the cumulative distribution function by inverting the transform(s) and computing the score of the base distribution. `expand(batch_shape, _instance=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/transformed_distribution.html#TransformedDistribution.expand) `property has_rsample` `icdf(value)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/transformed_distribution.html#TransformedDistribution.icdf) Computes the inverse cumulative distribution function using transform(s) and computing the score of the base distribution. 
`log_prob(value)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/transformed_distribution.html#TransformedDistribution.log_prob) Scores the sample by inverting the transform(s) and computing the score using the score of the base distribution and the log abs det jacobian. `rsample(sample_shape=torch.Size([]))` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/transformed_distribution.html#TransformedDistribution.rsample) Generates a sample_shape shaped reparameterized sample or sample_shape shaped batch of reparameterized samples if the distribution parameters are batched. Samples first from base distribution and applies `transform()` for every transform in the list. `sample(sample_shape=torch.Size([]))` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/transformed_distribution.html#TransformedDistribution.sample) Generates a sample_shape shaped sample or sample_shape shaped batch of samples if the distribution parameters are batched. Samples first from base distribution and applies `transform()` for every transform in the list. `property support` ## Uniform `class torch.distributions.uniform.Uniform(low, high, validate_args=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/uniform.html#Uniform) Bases: `torch.distributions.distribution.Distribution` Generates uniformly distributed random samples from the half-open interval `[low, high)`. Example: >>> m = Uniform(torch.tensor([0.0]), torch.tensor([5.0])) >>> m.sample() # uniformly distributed in the range [0.0, 5.0) tensor([ 2.3418]) Parameters * **low** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _or_[Tensor](tensors#torch.Tensor "torch.Tensor")) – lower range (inclusive). * **high** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _or_[Tensor](tensors#torch.Tensor "torch.Tensor")) – upper range (exclusive). `arg_constraints = {'high': Dependent(), 'low': Dependent()}` `cdf(value)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/uniform.html#Uniform.cdf) `entropy()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/uniform.html#Uniform.entropy) `expand(batch_shape, _instance=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/uniform.html#Uniform.expand) `has_rsample = True` `icdf(value)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/uniform.html#Uniform.icdf) `log_prob(value)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/uniform.html#Uniform.log_prob) `property mean` `rsample(sample_shape=torch.Size([]))` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/uniform.html#Uniform.rsample) `property stddev` `property support` `property variance` ## VonMises `class torch.distributions.von_mises.VonMises(loc, concentration, validate_args=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/von_mises.html#VonMises) Bases: `torch.distributions.distribution.Distribution` A circular von Mises distribution. This implementation uses polar coordinates. The `loc` and `value` args can be any real number (to facilitate unconstrained optimization), but are interpreted as angles modulo 2 pi. 
Example: >>> m = dist.VonMises(torch.tensor([1.0]), torch.tensor([1.0])) >>> m.sample() # von Mises distributed with loc=1 and concentration=1 tensor([1.9777]) Parameters * **loc** ([torch.Tensor](tensors#torch.Tensor "torch.Tensor")) – an angle in radians. * **concentration** ([torch.Tensor](tensors#torch.Tensor "torch.Tensor")) – concentration parameter `arg_constraints = {'concentration': GreaterThan(lower_bound=0.0), 'loc': Real()}` `expand(batch_shape)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/von_mises.html#VonMises.expand) `has_rsample = False` `log_prob(value)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/von_mises.html#VonMises.log_prob) `property mean` The provided mean is the circular one. `sample(sample_shape=torch.Size([]))` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/von_mises.html#VonMises.sample) The sampling algorithm for the von Mises distribution is based on the following paper: Best, D. J., and Nicholas I. Fisher. “Efficient simulation of the von Mises distribution.” Applied Statistics (1979): 152-157. `support = Real()` `variance` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/von_mises.html#VonMises.variance) The provided variance is the circular one. ## Weibull `class torch.distributions.weibull.Weibull(scale, concentration, validate_args=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/weibull.html#Weibull) Bases: `torch.distributions.transformed_distribution.TransformedDistribution` Samples from a two-parameter Weibull distribution. #### Example >>> m = Weibull(torch.tensor([1.0]), torch.tensor([1.0])) >>> m.sample() # sample from a Weibull distribution with scale=1, concentration=1 tensor([ 0.4784]) Parameters * **scale** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _or_[Tensor](tensors#torch.Tensor "torch.Tensor")) – Scale parameter of distribution (lambda). * **concentration** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _or_[Tensor](tensors#torch.Tensor "torch.Tensor")) – Concentration parameter of distribution (k/shape). `arg_constraints: Dict[str, torch.distributions.constraints.Constraint] = {'concentration': GreaterThan(lower_bound=0.0), 'scale': GreaterThan(lower_bound=0.0)}` `entropy()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/weibull.html#Weibull.entropy) `expand(batch_shape, _instance=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/weibull.html#Weibull.expand) `property mean` `support = GreaterThan(lower_bound=0.0)` `property variance` ## `KL Divergence` `torch.distributions.kl.kl_divergence(p, q)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/kl.html#kl_divergence) Compute Kullback-Leibler divergence $KL(p \| q)$ between two distributions. $KL(p \| q) = \int p(x) \log\frac{p(x)}{q(x)} \,dx$ Parameters * **p** (Distribution) – A `Distribution` object. * **q** (Distribution) – A `Distribution` object. Returns A batch of KL divergences of shape `batch_shape`. Return type [Tensor](tensors#torch.Tensor "torch.Tensor") Raises [**NotImplementedError**](https://docs.python.org/3/library/exceptions.html#NotImplementedError "\(in Python v3.9\)") – If the distribution types have not been registered via `register_kl()`.
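To make the calling pattern concrete, here is a minimal sketch (not part of the original reference) that computes the KL divergence between two `Normal` distributions, a pair for which PyTorch ships a registered implementation; the parameter values are arbitrary:

    import torch
    from torch.distributions import Normal
    from torch.distributions.kl import kl_divergence

    # Both arguments are Distribution instances; the result has shape batch_shape.
    p = Normal(torch.zeros(3), torch.ones(3))
    q = Normal(torch.ones(3), 2 * torch.ones(3))
    kl = kl_divergence(p, q)  # tensor of shape (3,): one KL(p || q) value per batch element

If the pair of distribution types has no registered implementation, the call raises `NotImplementedError`, as described above.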
`torch.distributions.kl.register_kl(type_p, type_q)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/kl.html#register_kl) Decorator to register a pairwise function with `kl_divergence()`. Usage: @register_kl(Normal, Normal) def kl_normal_normal(p, q): # insert implementation here Lookup returns the most specific (type,type) match ordered by subclass. If the match is ambiguous, a `RuntimeWarning` is raised. For example, to resolve the ambiguous situation: @register_kl(BaseP, DerivedQ) def kl_version1(p, q): ... @register_kl(DerivedP, BaseQ) def kl_version2(p, q): ... you should register a third most-specific implementation, e.g.: register_kl(DerivedP, DerivedQ)(kl_version1) # Break the tie. Parameters * **type_p** ([type](https://docs.python.org/3/library/functions.html#type "\(in Python v3.9\)")) – A subclass of `Distribution`. * **type_q** ([type](https://docs.python.org/3/library/functions.html#type "\(in Python v3.9\)")) – A subclass of `Distribution`. ## `Transforms` `class torch.distributions.transforms.Transform(cache_size=0)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/transforms.html#Transform) Abstract class for invertible transformations with computable log det jacobians. They are primarily used in `torch.distributions.TransformedDistribution`. Caching is useful for transforms whose inverses are either expensive or numerically unstable. Note that care must be taken with memoized values since the autograd graph may be reversed. For example, while the following works with or without caching: y = t(x) t.log_abs_det_jacobian(x, y).backward() # x will receive gradients. However, the following will error when caching due to dependency reversal: y = t(x) z = t.inv(y) grad(z.sum(), [y]) # error because z is x Derived classes should implement one or both of `_call()` or `_inverse()`. Derived classes that set `bijective=True` should also implement `log_abs_det_jacobian()`. Parameters **cache_size** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – Size of cache. If zero, no caching is done. If one, the latest single value is cached. Only 0 and 1 are supported. Variables * **~Transform.domain** (`Constraint`) – The constraint representing valid inputs to this transform. * **~Transform.codomain** (`Constraint`) – The constraint representing valid outputs to this transform which are inputs to the inverse transform. * **~Transform.bijective** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – Whether this transform is bijective. A transform `t` is bijective iff `t.inv(t(x)) == x` and `t(t.inv(y)) == y` for every `x` in the domain and `y` in the codomain. Transforms that are not bijective should at least maintain the weaker pseudoinverse properties `t(t.inv(t(x))) == t(x)` and `t.inv(t(t.inv(y))) == t.inv(y)`. * **~Transform.sign** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _or_[Tensor](tensors#torch.Tensor "torch.Tensor")) – For bijective univariate transforms, this should be +1 or -1 depending on whether the transform is monotone increasing or decreasing. `property inv` Returns the inverse `Transform` of this transform. This should satisfy `t.inv.inv is t`. `property sign` Returns the sign of the determinant of the Jacobian, if applicable. In general this only makes sense for bijective transforms.
`log_abs_det_jacobian(x, y)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/transforms.html#Transform.log_abs_det_jacobian) Computes the log det jacobian `log |dy/dx|` given input and output. `forward_shape(shape)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/transforms.html#Transform.forward_shape) Infers the shape of the forward computation, given the input shape. Defaults to preserving shape. `inverse_shape(shape)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/transforms.html#Transform.inverse_shape) Infers the shapes of the inverse computation, given the output shape. Defaults to preserving shape. `class torch.distributions.transforms.ComposeTransform(parts, cache_size=0)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/transforms.html#ComposeTransform) Composes multiple transforms in a chain. The transforms being composed are responsible for caching. Parameters * **parts** (list of `Transform`) – A list of transforms to compose. * **cache_size** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – Size of cache. If zero, no caching is done. If one, the latest single value is cached. Only 0 and 1 are supported. `class torch.distributions.transforms.IndependentTransform(base_transform, reinterpreted_batch_ndims, cache_size=0)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/transforms.html#IndependentTransform) Wrapper around another transform to treat `reinterpreted_batch_ndims`-many extra of the rightmost dimensions as dependent. This has no effect on the forward or backward transforms, but does sum out `reinterpreted_batch_ndims`-many of the rightmost dimensions in `log_abs_det_jacobian()`. Parameters * **base_transform** (`Transform`) – A base transform. * **reinterpreted_batch_ndims** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – The number of extra rightmost dimensions to treat as dependent. `class torch.distributions.transforms.ReshapeTransform(in_shape, out_shape, cache_size=0)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/transforms.html#ReshapeTransform) Unit Jacobian transform to reshape the rightmost part of a tensor. Note that `in_shape` and `out_shape` must have the same number of elements, just as for [`torch.Tensor.reshape()`](tensors#torch.Tensor.reshape "torch.Tensor.reshape"). Parameters * **in_shape** (_torch.Size_) – The input event shape. * **out_shape** (_torch.Size_) – The output event shape. `class torch.distributions.transforms.ExpTransform(cache_size=0)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/transforms.html#ExpTransform) Transform via the mapping $y = \exp(x)$. `class torch.distributions.transforms.PowerTransform(exponent, cache_size=0)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/transforms.html#PowerTransform) Transform via the mapping $y = x^{\text{exponent}}$. `class torch.distributions.transforms.SigmoidTransform(cache_size=0)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/transforms.html#SigmoidTransform) Transform via the mapping $y = \frac{1}{1 + \exp(-x)}$ and $x = \text{logit}(y)$. `class torch.distributions.transforms.TanhTransform(cache_size=0)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/transforms.html#TanhTransform) Transform via the mapping $y = \tanh(x)$.
It is equivalent to `ComposeTransform([AffineTransform(0., 2.), SigmoidTransform(), AffineTransform(-1., 2.)])`. However, this might not be numerically stable, thus it is recommended to use `TanhTransform` instead. Note that one should use `cache_size=1` when it comes to `NaN/Inf` values. `class torch.distributions.transforms.AbsTransform(cache_size=0)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/transforms.html#AbsTransform) Transform via the mapping $y = |x|$. `class torch.distributions.transforms.AffineTransform(loc, scale, event_dim=0, cache_size=0)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/transforms.html#AffineTransform) Transform via the pointwise affine mapping $y = \text{loc} + \text{scale} \times x$. Parameters * **loc** ([Tensor](tensors#torch.Tensor "torch.Tensor") _or_[float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)")) – Location parameter. * **scale** ([Tensor](tensors#torch.Tensor "torch.Tensor") _or_[float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)")) – Scale parameter. * **event_dim** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – Optional size of `event_shape`. This should be zero for univariate random variables, 1 for distributions over vectors, 2 for distributions over matrices, etc. `class torch.distributions.transforms.CorrCholeskyTransform(cache_size=0)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/transforms.html#CorrCholeskyTransform) Transforms an unconstrained real vector $x$ with length $D*(D-1)/2$ into the Cholesky factor of a D-dimensional correlation matrix. This Cholesky factor is a lower triangular matrix with positive diagonals and unit Euclidean norm for each row. The transform is processed as follows: 1. First we convert $x$ into a lower triangular matrix in row order. 2. For each row $X_i$ of the lower triangular part, we apply a _signed_ version of class `StickBreakingTransform` to transform $X_i$ into a unit Euclidean length vector using the following steps: - Scales into the interval $(-1, 1)$ domain: $r_i = \tanh(X_i)$. - Transforms into an unsigned domain: $z_i = r_i^2$. - Applies $s_i = StickBreakingTransform(z_i)$. - Transforms back into signed domain: $y_i = sign(r_i) * \sqrt{s_i}$. `class torch.distributions.transforms.SoftmaxTransform(cache_size=0)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/transforms.html#SoftmaxTransform) Transform from unconstrained space to the simplex via $y = \exp(x)$ then normalizing. This is not bijective and cannot be used for HMC. However, this acts mostly coordinate-wise (except for the final normalization), and thus is appropriate for coordinate-wise optimization algorithms. `class torch.distributions.transforms.StickBreakingTransform(cache_size=0)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/transforms.html#StickBreakingTransform) Transform from unconstrained space to the simplex of one additional dimension via a stick-breaking process. This transform arises as an iterated sigmoid transform in a stick-breaking construction of the `Dirichlet` distribution: the first logit is transformed via sigmoid to the first probability and the probability of everything else, and then the process recurses.
This is bijective and appropriate for use in HMC; however, it mixes coordinates together and is less appropriate for optimization. `class torch.distributions.transforms.LowerCholeskyTransform(cache_size=0)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/transforms.html#LowerCholeskyTransform) Transform from unconstrained matrices to lower-triangular matrices with nonnegative diagonal entries. This is useful for parameterizing positive definite matrices in terms of their Cholesky factorization. `class torch.distributions.transforms.StackTransform(tseq, dim=0, cache_size=0)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/transforms.html#StackTransform) Transform functor that applies a sequence of transforms `tseq` component-wise to each submatrix at `dim` in a way compatible with [`torch.stack()`](generated/torch.stack#torch.stack "torch.stack"). Example: x = torch.stack([torch.range(1, 10), torch.range(1, 10)], dim=1) t = StackTransform([ExpTransform(), identity_transform], dim=1) y = t(x) ## `Constraints` The following constraints are implemented: * `constraints.boolean` * `constraints.cat` * `constraints.corr_cholesky` * `constraints.dependent` * `constraints.greater_than(lower_bound)` * `constraints.greater_than_eq(lower_bound)` * `constraints.independent(constraint, reinterpreted_batch_ndims)` * `constraints.integer_interval(lower_bound, upper_bound)` * `constraints.interval(lower_bound, upper_bound)` * `constraints.less_than(upper_bound)` * `constraints.lower_cholesky` * `constraints.lower_triangular` * `constraints.multinomial` * `constraints.nonnegative_integer` * `constraints.one_hot` * `constraints.positive_definite` * `constraints.positive_integer` * `constraints.positive` * `constraints.real_vector` * `constraints.real` * `constraints.simplex` * `constraints.stack` * `constraints.unit_interval` `class torch.distributions.constraints.Constraint` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/constraints.html#Constraint) Abstract base class for constraints. A constraint object represents a region over which a variable is valid, e.g. within which a variable can be optimized. Variables * **~Constraint.is_discrete** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – Whether constrained space is discrete. Defaults to False. * **~Constraint.event_dim** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – Number of rightmost dimensions that together define an event. The `check()` method will remove this many dimensions when computing validity. `check(value)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/constraints.html#Constraint.check) Returns a byte tensor of `sample_shape + batch_shape` indicating whether each event in `value` satisfies this constraint.
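As a small illustrative sketch (not part of the original reference), the singleton constraints listed above can be queried directly through `check()`; the sample values here are arbitrary:

    import torch
    from torch.distributions import constraints

    values = torch.tensor([-1.0, 0.5, 2.0])

    # check() returns a tensor of True/False entries of shape sample_shape + batch_shape.
    constraints.positive.check(values)       # -> [False, True, True]
    constraints.unit_interval.check(values)  # -> [False, True, False]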
`torch.distributions.constraints.dependent_property` alias of `torch.distributions.constraints._DependentProperty` `torch.distributions.constraints.independent` alias of `torch.distributions.constraints._IndependentConstraint` `torch.distributions.constraints.integer_interval` alias of `torch.distributions.constraints._IntegerInterval` `torch.distributions.constraints.greater_than` alias of `torch.distributions.constraints._GreaterThan` `torch.distributions.constraints.greater_than_eq` alias of `torch.distributions.constraints._GreaterThanEq` `torch.distributions.constraints.less_than` alias of `torch.distributions.constraints._LessThan` `torch.distributions.constraints.multinomial` alias of `torch.distributions.constraints._Multinomial` `torch.distributions.constraints.interval` alias of `torch.distributions.constraints._Interval` `torch.distributions.constraints.half_open_interval` alias of `torch.distributions.constraints._HalfOpenInterval` `torch.distributions.constraints.cat` alias of `torch.distributions.constraints._Cat` `torch.distributions.constraints.stack` alias of `torch.distributions.constraints._Stack` ## `Constraint Registry` PyTorch provides two global `ConstraintRegistry` objects that link `Constraint` objects to `Transform` objects. These objects both input constraints and return transforms, but they have different guarantees on bijectivity. 1. `biject_to(constraint)` looks up a bijective `Transform` from `constraints.real` to the given `constraint`. The returned transform is guaranteed to have `.bijective = True` and should implement `.log_abs_det_jacobian()`. 2. `transform_to(constraint)` looks up a not-necessarily bijective `Transform` from `constraints.real` to the given `constraint`. The returned transform is not guaranteed to implement `.log_abs_det_jacobian()`. The `transform_to()` registry is useful for performing unconstrained optimization on constrained parameters of probability distributions, which are indicated by each distribution’s `.arg_constraints` dict. These transforms often overparameterize a space in order to avoid rotation; they are thus more suitable for coordinate-wise optimization algorithms like Adam: loc = torch.zeros(100, requires_grad=True) unconstrained = torch.zeros(100, requires_grad=True) scale = transform_to(Normal.arg_constraints['scale'])(unconstrained) loss = -Normal(loc, scale).log_prob(data).sum() The `biject_to()` registry is useful for Hamiltonian Monte Carlo, where samples from a probability distribution with constrained `.support` are propagated in an unconstrained space, and algorithms are typically rotation invariant: dist = Exponential(rate) unconstrained = torch.zeros(100, requires_grad=True) sample = biject_to(dist.support)(unconstrained) potential_energy = -dist.log_prob(sample).sum() Note An example where `transform_to` and `biject_to` differ is `constraints.simplex`: `transform_to(constraints.simplex)` returns a `SoftmaxTransform` that simply exponentiates and normalizes its inputs; this is a cheap and mostly coordinate-wise operation appropriate for algorithms like SVI. In contrast, `biject_to(constraints.simplex)` returns a `StickBreakingTransform` that bijects its input down to a one-fewer-dimensional space; this is a more expensive, less numerically stable transform, but it is needed for algorithms like HMC.
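For concreteness, a brief sketch (not part of the original reference) of the `constraints.simplex` case from the note above; the input values are arbitrary, and the extra output coordinate of the stick-breaking parameterization matches the `StickBreakingTransform` description earlier:

    import torch
    from torch.distributions import biject_to, constraints, transform_to

    unconstrained = torch.randn(4)

    # SoftmaxTransform: exponentiate and normalize; same number of coordinates.
    via_softmax = transform_to(constraints.simplex)(unconstrained)  # shape (4,), sums to 1

    # StickBreakingTransform: bijective stick-breaking map; one extra coordinate.
    via_stick = biject_to(constraints.simplex)(unconstrained)       # shape (5,), sums to 1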
The `biject_to` and `transform_to` objects can be extended by user-defined constraints and transforms using their `.register()` method either as a function on singleton constraints: transform_to.register(my_constraint, my_transform) or as a decorator on parameterized constraints: @transform_to.register(MyConstraintClass) def my_factory(constraint): assert isinstance(constraint, MyConstraintClass) return MyTransform(constraint.param1, constraint.param2) You can create your own registry by creating a new `ConstraintRegistry` object. `class torch.distributions.constraint_registry.ConstraintRegistry` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/constraint_registry.html#ConstraintRegistry) Registry to link constraints to transforms. `register(constraint, factory=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/constraint_registry.html#ConstraintRegistry.register) Registers a `Constraint` subclass in this registry. Usage: @my_registry.register(MyConstraintClass) def construct_transform(constraint): assert isinstance(constraint, MyConstraint) return MyTransform(constraint.arg_constraints) Parameters * **constraint** (subclass of `Constraint`) – A subclass of `Constraint`, or a singleton object of the desired class. * **factory** (_callable_) – A callable that inputs a constraint object and returns a `Transform` object. # torch.utils.dlpack `torch.utils.dlpack.from_dlpack(dlpack) → Tensor` Decodes a DLPack to a tensor. Parameters **dlpack** – a PyCapsule object with the dltensor The tensor will share the memory with the object represented in the dlpack. Note that each dlpack can only be consumed once. `torch.utils.dlpack.to_dlpack(tensor) → PyCapsule` Returns a DLPack representing the tensor. Parameters **tensor** – a tensor to be exported The dlpack shares the tensors memory. Note that each dlpack can only be consumed once. # torch.fft Discrete Fourier transforms and related functions. ## Fast Fourier Transforms `torch.fft.fft(input, n=None, dim=-1, norm=None) → Tensor` Computes the one dimensional discrete Fourier transform of `input`. Note The Fourier domain representation of any real signal satisfies the Hermitian property: `X[i] = conj(X[-i])`. This function always returns both the positive and negative frequency terms even though, for real inputs, the negative frequencies are redundant. `rfft()` returns the more compact one-sided representation where only the positive frequencies are returned. Parameters * **input** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – the input tensor * **n** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – Signal length. If given, the input will either be zero-padded or trimmed to this length before computing the FFT. * **dim** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – The dimension along which to take the one dimensional FFT. * **norm** ([str](https://docs.python.org/3/library/stdtypes.html#str "\(in Python v3.9\)") _,__optional_) – Normalization mode. For the forward transform (`fft()`), these correspond to: * `"forward"` \- normalize by `1/n` * `"backward"` \- no normalization * `"ortho"` \- normalize by `1/sqrt(n)` (making the FFT orthonormal) Calling the backward transform (`ifft()`) with the same normalization mode will apply an overall normalization of `1/n` between the two transforms. This is required to make `ifft()` the exact inverse. Default is `"backward"` (no normalization). 
#### Example >>> t = torch.arange(4) >>> t tensor([0, 1, 2, 3]) >>> torch.fft.fft(t) tensor([ 6.+0.j, -2.+2.j, -2.+0.j, -2.-2.j]) >>> t = torch.tensor([0.+1.j, 2.+3.j, 4.+5.j, 6.+7.j]) >>> torch.fft.fft(t) tensor([12.+16.j, -8.+0.j, -4.-4.j, 0.-8.j]) `torch.fft.ifft(input, n=None, dim=-1, norm=None) → Tensor` Computes the one dimensional inverse discrete Fourier transform of `input`. Parameters * **input** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – the input tensor * **n** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – Signal length. If given, the input will either be zero-padded or trimmed to this length before computing the IFFT. * **dim** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – The dimension along which to take the one dimensional IFFT. * **norm** ([str](https://docs.python.org/3/library/stdtypes.html#str "\(in Python v3.9\)") _,__optional_) – Normalization mode. For the backward transform (`ifft()`), these correspond to: * `"forward"` \- no normalization * `"backward"` \- normalize by `1/n` * `"ortho"` \- normalize by `1/sqrt(n)` (making the IFFT orthonormal) Calling the forward transform (`fft()`) with the same normalization mode will apply an overall normalization of `1/n` between the two transforms. This is required to make `ifft()` the exact inverse. Default is `"backward"` (normalize by `1/n`). #### Example >>> t = torch.tensor([ 6.+0.j, -2.+2.j, -2.+0.j, -2.-2.j]) >>> torch.fft.ifft(t) tensor([0.+0.j, 1.+0.j, 2.+0.j, 3.+0.j]) `torch.fft.fft2(input, s=None, dim=(-2, -1), norm=None) → Tensor` Computes the 2 dimensional discrete Fourier transform of `input`. Equivalent to `fftn()` but FFTs only the last two dimensions by default. Note The Fourier domain representation of any real signal satisfies the Hermitian property: `X[i, j] = conj(X[-i, -j])`. This function always returns all positive and negative frequency terms even though, for real inputs, half of these values are redundant. `rfft2()` returns the more compact one-sided representation where only the positive frequencies of the last dimension are returned. Parameters * **input** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – the input tensor * **s** (_Tuple_ _[_[int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _]__,__optional_) – Signal size in the transformed dimensions. If given, each dimension `dim[i]` will either be zero-padded or trimmed to the length `s[i]` before computing the FFT. If a length `-1` is specified, no padding is done in that dimension. Default: `s = [input.size(d) for d in dim]` * **dim** (_Tuple_ _[_[int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _]__,__optional_) – Dimensions to be transformed. Default: last two dimensions. * **norm** ([str](https://docs.python.org/3/library/stdtypes.html#str "\(in Python v3.9\)") _,__optional_) – Normalization mode. For the forward transform (`fft2()`), these correspond to: * `"forward"` \- normalize by `1/n` * `"backward"` \- no normalization * `"ortho"` \- normalize by `1/sqrt(n)` (making the FFT orthonormal) Where `n = prod(s)` is the logical FFT size. Calling the backward transform (`ifft2()`) with the same normalization mode will apply an overall normalization of `1/n` between the two transforms. This is required to make `ifft2()` the exact inverse. Default is `"backward"` (no normalization).
#### Example >>> x = torch.rand(10, 10, dtype=torch.complex64) >>> fft2 = torch.fft.fft2(x) The discrete Fourier transform is separable, so `fft2()` here is equivalent to two one-dimensional `fft()` calls: >>> two_ffts = torch.fft.fft(torch.fft.fft(x, dim=0), dim=1) >>> torch.allclose(fft2, two_ffts) `torch.fft.ifft2(input, s=None, dim=(-2, -1), norm=None) → Tensor` Computes the 2 dimensional inverse discrete Fourier transform of `input`. Equivalent to `ifftn()` but IFFTs only the last two dimensions by default. Parameters * **input** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – the input tensor * **s** (_Tuple_ _[_[int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _]__,__optional_) – Signal size in the transformed dimensions. If given, each dimension `dim[i]` will either be zero-padded or trimmed to the length `s[i]` before computing the IFFT. If a length `-1` is specified, no padding is done in that dimension. Default: `s = [input.size(d) for d in dim]` * **dim** (_Tuple_ _[_[int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _]__,__optional_) – Dimensions to be transformed. Default: last two dimensions. * **norm** ([str](https://docs.python.org/3/library/stdtypes.html#str "\(in Python v3.9\)") _,__optional_) – Normalization mode. For the backward transform (`ifft2()`), these correspond to: * `"forward"` \- no normalization * `"backward"` \- normalize by `1/n` * `"ortho"` \- normalize by `1/sqrt(n)` (making the IFFT orthonormal) Where `n = prod(s)` is the logical IFFT size. Calling the forward transform (`fft2()`) with the same normalization mode will apply an overall normalization of `1/n` between the two transforms. This is required to make `ifft2()` the exact inverse. Default is `"backward"` (normalize by `1/n`). #### Example >>> x = torch.rand(10, 10, dtype=torch.complex64) >>> ifft2 = torch.fft.ifft2(x) The discrete Fourier transform is separable, so `ifft2()` here is equivalent to two one-dimensional `ifft()` calls: >>> two_iffts = torch.fft.ifft(torch.fft.ifft(x, dim=0), dim=1) >>> torch.allclose(ifft2, two_iffts) `torch.fft.fftn(input, s=None, dim=None, norm=None) → Tensor` Computes the N dimensional discrete Fourier transform of `input`. Note The Fourier domain representation of any real signal satisfies the Hermitian property: `X[i_1, ..., i_n] = conj(X[-i_1, ..., -i_n])`. This function always returns all positive and negative frequency terms even though, for real inputs, half of these values are redundant. `rfftn()` returns the more compact one-sided representation where only the positive frequencies of the last dimension are returned. Parameters * **input** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – the input tensor * **s** (_Tuple_ _[_[int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _]__,__optional_) – Signal size in the transformed dimensions. If given, each dimension `dim[i]` will either be zero-padded or trimmed to the length `s[i]` before computing the FFT. If a length `-1` is specified, no padding is done in that dimension. Default: `s = [input.size(d) for d in dim]` * **dim** (_Tuple_ _[_[int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _]__,__optional_) – Dimensions to be transformed. Default: all dimensions, or the last `len(s)` dimensions if `s` is given. * **norm** ([str](https://docs.python.org/3/library/stdtypes.html#str "\(in Python v3.9\)") _,__optional_) – Normalization mode.
For the forward transform (`fftn()`), these correspond to: * `"forward"` \- normalize by `1/n` * `"backward"` \- no normalization * `"ortho"` \- normalize by `1/sqrt(n)` (making the FFT orthonormal) Where `n = prod(s)` is the logical FFT size. Calling the backward transform (`ifftn()`) with the same normalization mode will apply an overall normalization of `1/n` between the two transforms. This is required to make `ifftn()` the exact inverse. Default is `"backward"` (no normalization). #### Example >>> x = torch.rand(10, 10, dtype=torch.complex64) >>> fftn = torch.fft.fftn(x) The discrete Fourier transform is separable, so `fftn()` here is equivalent to two one-dimensional `fft()` calls: >>> two_ffts = torch.fft.fft(torch.fft.fft(x, dim=0), dim=1) >>> torch.allclose(fftn, two_ffts) `torch.fft.ifftn(input, s=None, dim=None, norm=None) → Tensor` Computes the N dimensional inverse discrete Fourier transform of `input`. Parameters * **input** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – the input tensor * **s** (_Tuple_ _[_[int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _]__,__optional_) – Signal size in the transformed dimensions. If given, each dimension `dim[i]` will either be zero-padded or trimmed to the length `s[i]` before computing the IFFT. If a length `-1` is specified, no padding is done in that dimension. Default: `s = [input.size(d) for d in dim]` * **dim** (_Tuple_ _[_[int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _]__,__optional_) – Dimensions to be transformed. Default: all dimensions, or the last `len(s)` dimensions if `s` is given. * **norm** ([str](https://docs.python.org/3/library/stdtypes.html#str "\(in Python v3.9\)") _,__optional_) – Normalization mode. For the backward transform (`ifftn()`), these correspond to: * `"forward"` \- no normalization * `"backward"` \- normalize by `1/n` * `"ortho"` \- normalize by `1/sqrt(n)` (making the IFFT orthonormal) Where `n = prod(s)` is the logical IFFT size. Calling the forward transform (`fftn()`) with the same normalization mode will apply an overall normalization of `1/n` between the two transforms. This is required to make `ifftn()` the exact inverse. Default is `"backward"` (normalize by `1/n`). #### Example >>> x = torch.rand(10, 10, dtype=torch.complex64) >>> ifftn = torch.fft.ifftn(x) The discrete Fourier transform is separable, so `ifftn()` here is equivalent to two one-dimensional `ifft()` calls: >>> two_iffts = torch.fft.ifft(torch.fft.ifft(x, dim=0), dim=1) >>> torch.allclose(ifftn, two_iffts) `torch.fft.rfft(input, n=None, dim=-1, norm=None) → Tensor` Computes the one dimensional Fourier transform of real-valued `input`. The FFT of a real signal is Hermitian-symmetric, `X[i] = conj(X[-i])` so the output contains only the positive frequencies below the Nyquist frequency. To compute the full output, use `fft()`. Parameters * **input** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – the real input tensor * **n** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – Signal length. If given, the input will either be zero-padded or trimmed to this length before computing the real FFT. * **dim** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – The dimension along which to take the one dimensional real FFT. * **norm** ([str](https://docs.python.org/3/library/stdtypes.html#str "\(in Python v3.9\)") _,__optional_) – Normalization mode.
For the forward transform (`rfft()`), these correspond to: * `"forward"` \- normalize by `1/n` * `"backward"` \- no normalization * `"ortho"` \- normalize by `1/sqrt(n)` (making the FFT orthonormal) Calling the backward transform (`irfft()`) with the same normalization mode will apply an overall normalization of `1/n` between the two transforms. This is required to make `irfft()` the exact inverse. Default is `"backward"` (no normalization). #### Example >>> t = torch.arange(4) >>> t tensor([0, 1, 2, 3]) >>> torch.fft.rfft(t) tensor([ 6.+0.j, -2.+2.j, -2.+0.j]) Compare against the full output from `fft()`: >>> torch.fft.fft(t) tensor([ 6.+0.j, -2.+2.j, -2.+0.j, -2.-2.j]) Notice that the symmetric element `T[-1] == T[1].conj()` is omitted. At the Nyquist frequency, `T[-2] == T[2]` is its own symmetric pair, and therefore must always be real-valued. `torch.fft.irfft(input, n=None, dim=-1, norm=None) → Tensor` Computes the inverse of `rfft()`. `input` is interpreted as a one-sided Hermitian signal in the Fourier domain, as produced by `rfft()`. By the Hermitian property, the output will be real-valued. Note Some input frequencies must be real-valued to satisfy the Hermitian property. In these cases the imaginary component will be ignored. For example, any imaginary component in the zero-frequency term cannot be represented in a real output and so will always be ignored. Note The correct interpretation of the Hermitian input depends on the length of the original data, as given by `n`. This is because each input shape could correspond to either an odd or even length signal. By default, the signal is assumed to be even length and odd signals will not round-trip properly. So, it is recommended to always pass the signal length `n`. Parameters * **input** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – the input tensor representing a half-Hermitian signal * **n** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – Output signal length. This determines the length of the output signal. If given, the input will either be zero-padded or trimmed to this length before computing the real IFFT. Defaults to even output: `n=2*(input.size(dim) - 1)`. * **dim** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – The dimension along which to take the one dimensional real IFFT. * **norm** ([str](https://docs.python.org/3/library/stdtypes.html#str "\(in Python v3.9\)") _,__optional_) – Normalization mode. For the backward transform (`irfft()`), these correspond to: * `"forward"` \- no normalization * `"backward"` \- normalize by `1/n` * `"ortho"` \- normalize by `1/sqrt(n)` (making the real IFFT orthonormal) Calling the forward transform (`rfft()`) with the same normalization mode will apply an overall normalization of `1/n` between the two transforms. This is required to make `irfft()` the exact inverse. Default is `"backward"` (normalize by `1/n`).
#### Example >>> t = torch.arange(5) >>> t tensor([0, 1, 2, 3, 4]) >>> T = torch.fft.rfft(t) >>> T tensor([10.0000+0.0000j, -2.5000+3.4410j, -2.5000+0.8123j]) Without specifying the output length to `irfft()`, the output will not round-trip properly because the input is odd-length: >>> torch.fft.irfft(T) tensor([0.6250, 1.4045, 3.1250, 4.8455]) So, it is recommended to always pass the signal length `n`: >>> torch.fft.irfft(T, t.numel()) tensor([0.0000, 1.0000, 2.0000, 3.0000, 4.0000]) `torch.fft.rfft2(input, s=None, dim=(-2, -1), norm=None) → Tensor` Computes the 2-dimensional discrete Fourier transform of real `input`. Equivalent to `rfftn()` but FFTs only the last two dimensions by default. The FFT of a real signal is Hermitian-symmetric, `X[i, j] = conj(X[-i, -j])`, so the full `fft2()` output contains redundant information. `rfft2()` instead omits the negative frequencies in the last dimension. Parameters * **input** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – the input tensor * **s** (_Tuple_ _[_[int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _]__,__optional_) – Signal size in the transformed dimensions. If given, each dimension `dim[i]` will either be zero-padded or trimmed to the length `s[i]` before computing the real FFT. If a length `-1` is specified, no padding is done in that dimension. Default: `s = [input.size(d) for d in dim]` * **dim** (_Tuple_ _[_[int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _]__,__optional_) – Dimensions to be transformed. Default: last two dimensions. * **norm** ([str](https://docs.python.org/3/library/stdtypes.html#str "\(in Python v3.9\)") _,__optional_) – Normalization mode. For the forward transform (`rfft2()`), these correspond to: * `"forward"` \- normalize by `1/n` * `"backward"` \- no normalization * `"ortho"` \- normalize by `1/sqrt(n)` (making the real FFT orthonormal) Where `n = prod(s)` is the logical FFT size. Calling the backward transform (`irfft2()`) with the same normalization mode will apply an overall normalization of `1/n` between the two transforms. This is required to make `irfft2()` the exact inverse. Default is `"backward"` (no normalization). #### Example >>> t = torch.rand(10, 10) >>> rfft2 = torch.fft.rfft2(t) >>> rfft2.size() torch.Size([10, 6]) Compared against the full output from `fft2()`, we have all elements up to the Nyquist frequency. >>> fft2 = torch.fft.fft2(t) >>> torch.allclose(fft2[..., :6], rfft2) True The discrete Fourier transform is separable, so `rfft2()` here is equivalent to a combination of `fft()` and `rfft()`: >>> two_ffts = torch.fft.fft(torch.fft.rfft(t, dim=1), dim=0) >>> torch.allclose(rfft2, two_ffts) `torch.fft.irfft2(input, s=None, dim=(-2, -1), norm=None) → Tensor` Computes the inverse of `rfft2()`. Equivalent to `irfftn()` but IFFTs only the last two dimensions by default. `input` is interpreted as a one-sided Hermitian signal in the Fourier domain, as produced by `rfft2()`. By the Hermitian property, the output will be real-valued. Note Some input frequencies must be real-valued to satisfy the Hermitian property. In these cases the imaginary component will be ignored. For example, any imaginary component in the zero-frequency term cannot be represented in a real output and so will always be ignored. Note The correct interpretation of the Hermitian input depends on the length of the original data, as given by `s`. This is because each input shape could correspond to either an odd or even length signal.
By default, the signal is assumed to be even length and odd signals will not round-trip properly. So, it is recommended to always pass the signal shape `s`. Parameters * **input** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – the input tensor * **s** (_Tuple_ _[_[int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _]__,__optional_) – Signal size in the transformed dimensions. If given, each dimension `dim[i]` will either be zero-padded or trimmed to the length `s[i]` before computing the real FFT. If a length `-1` is specified, no padding is done in that dimension. Defaults to even output in the last dimension: `s[-1] = 2*(input.size(dim[-1]) - 1)`. * **dim** (_Tuple_ _[_[int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _]__,__optional_) – Dimensions to be transformed. The last dimension must be the half-Hermitian compressed dimension. Default: last two dimensions. * **norm** ([str](https://docs.python.org/3/library/stdtypes.html#str "\(in Python v3.9\)") _,__optional_) – Normalization mode. For the backward transform (`irfft2()`), these correspond to: * `"forward"` \- no normalization * `"backward"` \- normalize by `1/n` * `"ortho"` \- normalize by `1/sqrt(n)` (making the real IFFT orthonormal) Where `n = prod(s)` is the logical IFFT size. Calling the forward transform (`rfft2()`) with the same normalization mode will apply an overall normalization of `1/n` between the two transforms. This is required to make `irfft2()` the exact inverse. Default is `"backward"` (normalize by `1/n`). #### Example >>> t = torch.rand(10, 9) >>> T = torch.fft.rfft2(t) Without specifying the output length to `irfft2()`, the output will not round- trip properly because the input is odd-length in the last dimension: >>> torch.fft.irfft2(T).size() torch.Size([10, 10]) So, it is recommended to always pass the signal shape `s`. >>> roundtrip = torch.fft.irfft2(T, t.size()) >>> roundtrip.size() torch.Size([10, 9]) >>> torch.allclose(roundtrip, t) True `torch.fft.rfftn(input, s=None, dim=None, norm=None) → Tensor` Computes the N-dimensional discrete Fourier transform of real `input`. The FFT of a real signal is Hermitian-symmetric, `X[i_1, ..., i_n] = conj(X[-i_1, ..., -i_n])` so the full `fftn()` output contains redundant information. `rfftn()` instead omits the negative frequencies in the last dimension. Parameters * **input** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – the input tensor * **s** (_Tuple_ _[_[int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _]__,__optional_) – Signal size in the transformed dimensions. If given, each dimension `dim[i]` will either be zero-padded or trimmed to the length `s[i]` before computing the real FFT. If a length `-1` is specified, no padding is done in that dimension. Default: `s = [input.size(d) for d in dim]` * **dim** (_Tuple_ _[_[int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _]__,__optional_) – Dimensions to be transformed. Default: all dimensions, or the last `len(s)` dimensions if `s` is given. * **norm** ([str](https://docs.python.org/3/library/stdtypes.html#str "\(in Python v3.9\)") _,__optional_) – Normalization mode. For the forward transform (`rfftn()`), these correspond to: * `"forward"` \- normalize by `1/n` * `"backward"` \- no normalization * `"ortho"` \- normalize by `1/sqrt(n)` (making the real FFT orthonormal) Where `n = prod(s)` is the logical FFT size. 
Calling the backward transform (`irfftn()`) with the same normalization mode will apply an overall normalization of `1/n` between the two transforms. This is required to make `irfftn()` the exact inverse. Default is `"backward"` (no normalization). #### Example >>> t = torch.rand(10, 10) >>> rfftn = torch.fft.rfftn(t) >>> rfftn.size() torch.Size([10, 6]) Compared against the full output from `fftn()`, we have all elements up to the Nyquist frequency. >>> fftn = torch.fft.fftn(t) >>> torch.allclose(fftn[..., :6], rfftn) True The discrete Fourier transform is separable, so `rfftn()` here is equivalent to a combination of `fft()` and `rfft()`: >>> two_ffts = torch.fft.fft(torch.fft.rfft(t, dim=1), dim=0) >>> torch.allclose(rfftn, two_ffts) `torch.fft.irfftn(input, s=None, dim=None, norm=None) → Tensor` Computes the inverse of `rfftn()`. `input` is interpreted as a one-sided Hermitian signal in the Fourier domain, as produced by `rfftn()`. By the Hermitian property, the output will be real-valued. Note Some input frequencies must be real-valued to satisfy the Hermitian property. In these cases the imaginary component will be ignored. For example, any imaginary component in the zero-frequency term cannot be represented in a real output and so will always be ignored. Note The correct interpretation of the Hermitian input depends on the length of the original data, as given by `s`. This is because each input shape could correspond to either an odd or even length signal. By default, the signal is assumed to be even length and odd signals will not round-trip properly. So, it is recommended to always pass the signal shape `s`. Parameters * **input** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – the input tensor * **s** (_Tuple_ _[_[int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _]__,__optional_) – Signal size in the transformed dimensions. If given, each dimension `dim[i]` will either be zero-padded or trimmed to the length `s[i]` before computing the real FFT. If a length `-1` is specified, no padding is done in that dimension. Defaults to even output in the last dimension: `s[-1] = 2*(input.size(dim[-1]) - 1)`. * **dim** (_Tuple_ _[_[int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _]__,__optional_) – Dimensions to be transformed. The last dimension must be the half-Hermitian compressed dimension. Default: all dimensions, or the last `len(s)` dimensions if `s` is given. * **norm** ([str](https://docs.python.org/3/library/stdtypes.html#str "\(in Python v3.9\)") _,__optional_) – Normalization mode. For the backward transform (`irfftn()`), these correspond to: * `"forward"` \- no normalization * `"backward"` \- normalize by `1/n` * `"ortho"` \- normalize by `1/sqrt(n)` (making the real IFFT orthonormal) Where `n = prod(s)` is the logical IFFT size. Calling the forward transform (`rfftn()`) with the same normalization mode will apply an overall normalization of `1/n` between the two transforms. This is required to make `irfftn()` the exact inverse. Default is `"backward"` (normalize by `1/n`). #### Example >>> t = torch.rand(10, 9) >>> T = torch.fft.rfftn(t) Without specifying the output length to `irfftn()`, the output will not round-trip properly because the input is odd-length in the last dimension: >>> torch.fft.irfftn(T).size() torch.Size([10, 10]) So, it is recommended to always pass the signal shape `s`.
>>> roundtrip = torch.fft.irfftn(T, t.size()) >>> roundtrip.size() torch.Size([10, 9]) >>> torch.allclose(roundtrip, t) True `torch.fft.hfft(input, n=None, dim=-1, norm=None) → Tensor` Computes the one dimensional discrete Fourier transform of a Hermitian symmetric `input` signal. Note `hfft()`/`ihfft()` are analogous to `rfft()`/`irfft()`. The real FFT expects a real signal in the time-domain and gives a Hermitian symmetry in the frequency-domain. The Hermitian FFT is the opposite; Hermitian symmetric in the time-domain and real-valued in the frequency-domain. For this reason, special care needs to be taken with the length argument `n`, in the same way as with `irfft()`. Note Because the signal is Hermitian in the time-domain, the result will be real in the frequency domain. Note that some input frequencies must be real-valued to satisfy the Hermitian property. In these cases the imaginary component will be ignored. For example, any imaginary component in `input[0]` would result in one or more complex frequency terms which cannot be represented in a real output and so will always be ignored. Note The correct interpretation of the Hermitian input depends on the length of the original data, as given by `n`. This is because each input shape could correspond to either an odd or even length signal. By default, the signal is assumed to be even length and odd signals will not round-trip properly. So, it is recommended to always pass the signal length `n`. Parameters * **input** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – the input tensor representing a half-Hermitian signal * **n** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – Output signal length. This determines the length of the real output. If given, the input will either be zero-padded or trimmed to this length before computing the Hermitian FFT. Defaults to even output: `n=2*(input.size(dim) - 1)`. * **dim** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – The dimension along which to take the one dimensional Hermitian FFT. * **norm** ([str](https://docs.python.org/3/library/stdtypes.html#str "\(in Python v3.9\)") _,__optional_) – Normalization mode. For the forward transform (`hfft()`), these correspond to: * `"forward"` \- normalize by `1/n` * `"backward"` \- no normalization * `"ortho"` \- normalize by `1/sqrt(n)` (making the Hermitian FFT orthonormal) Calling the backward transform (`ihfft()`) with the same normalization mode will apply an overall normalization of `1/n` between the two transforms. This is required to make `ihfft()` the exact inverse. Default is `"backward"` (no normalization). #### Example Taking a real-valued frequency signal and bringing it into the time domain gives Hermitian symmetric output: >>> t = torch.arange(5) >>> t tensor([0, 1, 2, 3, 4]) >>> T = torch.fft.ifft(t) >>> T tensor([ 2.0000-0.0000j, -0.5000-0.6882j, -0.5000-0.1625j, -0.5000+0.1625j, -0.5000+0.6882j]) Note that `T[1] == T[-1].conj()` and `T[2] == T[-2].conj()`, so the negative frequency terms are redundant. We can thus compute the forward transform without considering negative frequencies: >>> torch.fft.hfft(T[:3], n=5) tensor([0., 1., 2., 3., 4.]) Like with `irfft()`, the output length must be given in order to recover an odd length output (otherwise an even length is assumed): >>> torch.fft.hfft(T[:3]) tensor([0.5000, 1.1236, 2.5000, 3.8764]) `torch.fft.ihfft(input, n=None, dim=-1, norm=None) → Tensor` Computes the inverse of `hfft()`. `input` must be a real-valued signal, interpreted in the Fourier domain.
The IFFT of a real signal is Hermitian-symmetric, `X[i] = conj(X[-i])`. `ihfft()` represents this in the one-sided form where only the positive frequencies below the Nyquist frequency are included. To compute the full output, use `ifft()`. Parameters * **input** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – the real input tensor * **n** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – Signal length. If given, the input will either be zero-padded or trimmed to this length before computing the Hermitian IFFT. * **dim** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – The dimension along which to take the one dimensional Hermitian IFFT. * **norm** ([str](https://docs.python.org/3/library/stdtypes.html#str "\(in Python v3.9\)") _,__optional_) – Normalization mode. For the backward transform (`ihfft()`), these correspond to: * `"forward"` \- no normalization * `"backward"` \- normalize by `1/n` * `"ortho"` \- normalize by `1/sqrt(n)` (making the IFFT orthonormal) Calling the forward transform (`hfft()`) with the same normalization mode will apply an overall normalization of `1/n` between the two transforms. This is required to make `ihfft()` the exact inverse. Default is `"backward"` (normalize by `1/n`). #### Example >>> t = torch.arange(5) >>> t tensor([0, 1, 2, 3, 4]) >>> torch.fft.ihfft(t) tensor([ 2.0000-0.0000j, -0.5000-0.6882j, -0.5000-0.1625j]) Compare against the full output from `ifft()`: >>> torch.fft.ifft(t) tensor([ 2.0000-0.0000j, -0.5000-0.6882j, -0.5000-0.1625j, -0.5000+0.1625j, -0.5000+0.6882j]) ## Helper Functions `torch.fft.fftfreq(n, d=1.0, *, dtype=None, layout=torch.strided, device=None, requires_grad=False) → Tensor` Computes the discrete Fourier Transform sample frequencies for a signal of size `n`. Note By convention, `fft()` returns positive frequency terms first, followed by the negative frequencies in reverse order, so that `f[-i]` for all `0 < i ≤ n/2` gives the negative frequency terms. For an FFT of length `n` and with inputs spaced in length unit `d`, the frequencies are: f = [0, 1, ..., (n - 1) // 2, -(n // 2), ..., -1] / (d * n) #### Example >>> torch.fft.fftfreq(5) tensor([ 0.0000, 0.2000, 0.4000, -0.4000, -0.2000]) For even input, we can see the Nyquist frequency at `f[2]` is given as negative: >>> torch.fft.fftfreq(4) tensor([ 0.0000, 0.2500, -0.5000, -0.2500]) `torch.fft.rfftfreq(n, d=1.0, *, dtype=None, layout=torch.strided, device=None, requires_grad=False) → Tensor` Computes the sample frequencies for `rfft()` with a signal of size `n`. Note `rfft()` returns Hermitian one-sided output, so only the positive frequency terms are returned. For a real FFT of length `n` and with inputs spaced in length unit `d`, the frequencies are: f = torch.arange((n + 1) // 2) / (d * n) Note For even lengths, the Nyquist frequency at `f[n/2]` can be thought of as either negative or positive. Unlike `fftfreq()`, `rfftfreq()` always returns it as positive. Parameters * **n** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – the real FFT length * **d** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _,__optional_) – The sampling length scale. The spacing between individual samples of the FFT input. The default assumes unit spacing, dividing that result by the actual spacing gives the result in physical frequency units. Keyword Arguments * **dtype** (`torch.dtype`, optional) – the desired data type of returned tensor. Default: if `None`, uses a global default (see [`torch.set_default_tensor_type()`](generated/torch.set_default_tensor_type#torch.set_default_tensor_type "torch.set_default_tensor_type")).
* **layout** (`torch.layout`, optional) – the desired layout of returned Tensor. Default: `torch.strided`. * **device** (`torch.device`, optional) – the desired device of returned tensor. Default: if `None`, uses the current device for the default tensor type (see [`torch.set_default_tensor_type()`](generated/torch.set_default_tensor_type#torch.set_default_tensor_type "torch.set_default_tensor_type")). `device` will be the CPU for CPU tensor types and the current CUDA device for CUDA tensor types. * **requires_grad** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – If autograd should record operations on the returned tensor. Default: `False`. #### Example >>> torch.fft.rfftfreq(5) tensor([ 0.0000, 0.2000, 0.4000]) >>> torch.fft.rfftfreq(4) tensor([ 0.0000, 0.2500, 0.5000]) Compared to the output from `fftfreq()`, we see that the Nyquist frequency at `f[2]` has changed sign: >>> torch.fft.fftfreq(4) tensor([ 0.0000, 0.2500, -0.5000, -0.2500]) `torch.fft.fftshift(input, dim=None) → Tensor` Reorders n-dimensional FFT data, as provided by `fftn()`, to have negative frequency terms first. This performs a periodic shift of n-dimensional data such that the origin `(0, ..., 0)` is moved to the center of the tensor. Specifically, to `input.shape[dim] // 2` in each selected dimension. Note By convention, the FFT returns positive frequency terms first, followed by the negative frequencies in reverse order, so that `f[-i]` for all `0 < i ≤ n/2` gives the negative frequency terms. `fftshift()` rearranges all frequencies into ascending order from negative to positive with the zero-frequency term in the center. #### Example >>> f = torch.fft.fftfreq(4) >>> f tensor([ 0.0000, 0.2500, -0.5000, -0.2500]) >>> torch.fft.fftshift(f) tensor([-0.5000, -0.2500, 0.0000, 0.2500]) Also notice that the Nyquist frequency term at `f[2]` was moved to the beginning of the tensor. This also works for multi-dimensional transforms: >>> x = torch.fft.fftfreq(5, d=1/5) + 0.1 * torch.fft.fftfreq(5, d=1/5).unsqueeze(1) >>> x tensor([[ 0.0000, 1.0000, 2.0000, -2.0000, -1.0000], [ 0.1000, 1.1000, 2.1000, -1.9000, -0.9000], [ 0.2000, 1.2000, 2.2000, -1.8000, -0.8000], [-0.2000, 0.8000, 1.8000, -2.2000, -1.2000], [-0.1000, 0.9000, 1.9000, -2.1000, -1.1000]]) >>> torch.fft.fftshift(x) tensor([[-2.2000, -1.2000, -0.2000, 0.8000, 1.8000], [-2.1000, -1.1000, -0.1000, 0.9000, 1.9000], [-2.0000, -1.0000, 0.0000, 1.0000, 2.0000], [-1.9000, -0.9000, 0.1000, 1.1000, 2.1000], [-1.8000, -0.8000, 0.2000, 1.2000, 2.2000]]) `fftshift()` can also be useful for spatial data. If our data is defined on a centered grid (`[-(N//2), (N-1)//2]`) then we can use the standard FFT defined on an uncentered grid (`[0, N)`) by first applying an `ifftshift()`. >>> x_centered = torch.arange(-5, 5) >>> x_uncentered = torch.fft.ifftshift(x_centered) >>> fft_uncentered = torch.fft.fft(x_uncentered) Similarly, we can convert the frequency domain components to centered convention by applying `fftshift()`. >>> fft_centered = torch.fft.fftshift(fft_uncentered) The inverse transform, from centered Fourier space back to centered spatial data, can be performed by applying the inverse shifts in reverse order: >>> x_centered_2 = torch.fft.fftshift(torch.fft.ifft(torch.fft.ifftshift(fft_centered))) >>> torch.allclose(x_centered.to(torch.complex64), x_centered_2) True `torch.fft.ifftshift(input, dim=None) → Tensor` Inverse of `fftshift()`.
Parameters * **input** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – the tensor in FFT order * **dim** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__Tuple_ _[_[int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _]__,__optional_) – The dimensions to rearrange. Only dimensions specified here will be rearranged, any other dimensions will be left in their original order. Default: All dimensions of `input`. #### Example >>> f = torch.fft.fftfreq(5) >>> f tensor([ 0.0000, 0.2000, 0.4000, -0.4000, -0.2000]) A round-trip through `fftshift()` and `ifftshift()` gives the same result: >>> shifted = torch.fft.fftshift(f) >>> torch.fft.ifftshift(shifted) tensor([ 0.0000, 0.2000, 0.4000, -0.4000, -0.2000]) # torch.futures Warning The `torch.futures` package is experimental and subject to change. This package provides a `Future` type that encapsulates an asynchronous execution and a set of utility functions to simplify operations on `Future` objects. Currently, the `Future` type is primarily used by the [Distributed RPC Framework](rpc#distributed-rpc-framework). `class torch.futures.Future` Wrapper around a `torch._C.Future` which encapsulates an asynchronous execution of a callable, e.g. [`rpc_async()`](rpc#torch.distributed.rpc.rpc_async "torch.distributed.rpc.rpc_async"). It also exposes a set of APIs to add callback functions and set results. `add_done_callback(self: torch._C.Future, arg0: function) → None` `done()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/futures.html#Future.done) Return `True` if this `Future` is done. A `Future` is done if it has a result or an exception. `set_exception(result)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/futures.html#Future.set_exception) Set an exception for this `Future`, which will mark this `Future` as completed with an error and trigger all attached callbacks. Note that when calling wait()/value() on this `Future`, the exception set here will be raised inline. Parameters **result** ([BaseException](https://docs.python.org/3/library/exceptions.html#BaseException "\(in Python v3.9\)")) – the exception for this `Future`. Example:: >>> import torch >>> >>> fut = torch.futures.Future() >>> fut.set_exception(ValueError("foo")) >>> fut.wait() >>> >>> # Output: >>> ValueError: foo `set_result(result)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/futures.html#Future.set_result) Set the result for this `Future`, which will mark this `Future` as completed and trigger all attached callbacks. Note that a `Future` cannot be marked completed twice. Parameters **result** ([object](https://docs.python.org/3/library/functions.html#object "\(in Python v3.9\)")) – the result object of this `Future`. Example:: >>> import threading >>> import time >>> import torch >>> >>> def slow_set_future(fut, value): >>> time.sleep(0.5) >>> fut.set_result(value) >>> >>> fut = torch.futures.Future() >>> t = threading.Thread( >>> target=slow_set_future, >>> args=(fut, torch.ones(2) * 3) >>> ) >>> t.start() >>> >>> print(fut.wait()) # tensor([3., 3.]) >>> t.join() `then(callback)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/futures.html#Future.then) Append the given callback function to this `Future`, which will be run when the `Future` is completed. Multiple callbacks can be added to the same `Future`, and will be invoked in the same order as they were added.
The callback must take one argument, which is the reference to this `Future`. The callback function can use the `Future.wait()` API to get the value. Note that if this `Future` is already completed, the given callback will be run immediately inline. Parameters **callback** (`Callable`) – a `Callable` that takes this `Future` as the only argument. Returns A new `Future` object that holds the return value of the `callback` and will be marked as completed when the given `callback` finishes. Example:: >>> import torch >>> >>> def callback(fut): >>> print(f"RPC return value is {fut.wait()}.") >>> >>> fut = torch.futures.Future() >>> # The inserted callback will print the return value when >>> # receiving the response from "worker1" >>> cb_fut = fut.then(callback) >>> chain_cb_fut = cb_fut.then( >>> lambda x : print(f"Chained cb done. {x.wait()}") >>> ) >>> fut.set_result(5) >>> >>> # Outputs are: >>> # RPC return value is 5. >>> # Chained cb done. None `value(self: torch._C.Future) → object` `wait()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/futures.html#Future.wait) Block until the value of this `Future` is ready. Returns The value held by this `Future`. If the function (callback or RPC) creating the value has thrown an error, this `wait` method will also throw an error. `torch.futures.collect_all(futures)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/futures.html#collect_all) Collects the provided `Future` objects into a single combined `Future` that is completed when all of the sub-futures are completed. Parameters **futures** ([list](https://docs.python.org/3/library/stdtypes.html#list "\(in Python v3.9\)")) – a list of `Future` objects. Returns Returns a `Future` object to a list of the passed in Futures. Example:: >>> import torch >>> >>> fut0 = torch.futures.Future() >>> fut1 = torch.futures.Future() >>> >>> fut = torch.futures.collect_all([fut0, fut1]) >>> >>> fut0.set_result(0) >>> fut1.set_result(1) >>> >>> fut_list = fut.wait() >>> print(f"fut0 result = {fut_list[0].wait()}") >>> print(f"fut1 result = {fut_list[1].wait()}") >>> # outputs: >>> # fut0 result = 0 >>> # fut1 result = 1 `torch.futures.wait_all(futures)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/futures.html#wait_all) Waits for all provided futures to be complete, and returns the list of completed values. Parameters **futures** ([list](https://docs.python.org/3/library/stdtypes.html#list "\(in Python v3.9\)")) – a list of `Future` object. Returns A list of the completed `Future` results. This method will throw an error if `wait` on any `Future` throws. # torch.fx ## Overview **This feature is under a Beta release and its API may change.** FX is a toolkit for developers to use to transform `nn.Module` instances. FX consists of three main components: a **symbolic tracer,** an **intermediate representation** , and **Python code generation**. 
A demonstration of these components in action: import torch # Simple module for demonstration class MyModule(torch.nn.Module): def __init__(self): super().__init__() self.param = torch.nn.Parameter(torch.rand(3, 4)) self.linear = torch.nn.Linear(4, 5) def forward(self, x): return self.linear(x + self.param).clamp(min=0.0, max=1.0) module = MyModule() from torch.fx import symbolic_trace # Symbolic tracing frontend - captures the semantics of the module symbolic_traced : torch.fx.GraphModule = symbolic_trace(module) # High-level intermediate representation (IR) - Graph representation print(symbolic_traced.graph) """ graph(x): %param : [#users=1] = self.param %add_1 : [#users=1] = call_function[target=<built-in function add>](args = (%x, %param), kwargs = {}) %linear_1 : [#users=1] = call_module[target=linear](args = (%add_1,), kwargs = {}) %clamp_1 : [#users=1] = call_method[target=clamp](args = (%linear_1,), kwargs = {min: 0.0, max: 1.0}) return clamp_1 """ # Code generation - valid Python code print(symbolic_traced.code) """ def forward(self, x): param = self.param add_1 = x + param; x = param = None linear_1 = self.linear(add_1); add_1 = None clamp_1 = linear_1.clamp(min = 0.0, max = 1.0); linear_1 = None return clamp_1 """ The **symbolic tracer** performs "symbolic execution" of the Python code. It feeds fake values, called Proxies, through the code. Operations on these Proxies are recorded. More information about symbolic tracing can be found in the `symbolic_trace()` and `Tracer` documentation. The **intermediate representation** is the container for the operations that were recorded during symbolic tracing. It consists of a list of Nodes that represent function inputs, callsites (to functions, methods, or [`torch.nn.Module`](generated/torch.nn.module#torch.nn.Module "torch.nn.Module") instances), and return values. More information about the IR can be found in the documentation for `Graph`. The IR is the format on which transformations are applied. **Python code generation** is what makes FX a Python-to-Python (or Module-to-Module) transformation toolkit. For each Graph IR, we can create valid Python code matching the Graph's semantics. This functionality is wrapped up in `GraphModule`, which is a [`torch.nn.Module`](generated/torch.nn.module#torch.nn.Module "torch.nn.Module") instance that holds a `Graph` as well as a `forward` method generated from the Graph. Taken together, this pipeline of components (symbolic tracing → intermediate representation → transforms → Python code generation) constitutes the Python-to-Python transformation pipeline of FX. In addition, these components can be used separately. For example, symbolic tracing can be used in isolation to capture a form of the code for analysis (and not transformation) purposes. Code generation can be used for programmatically generating models, for example from a config file. There are many uses for FX! Several example transformations can be found at the [examples](https://github.com/pytorch/examples/tree/master/fx) repository. ## Writing Transformations What is an FX transform? Essentially, it's a function that looks like this. import torch import torch.fx def transform(m: nn.Module, tracer_class : type = torch.fx.Tracer) -> torch.nn.Module: # Step 1: Acquire a Graph representing the code in `m` # NOTE: torch.fx.symbolic_trace is a wrapper around a call to # fx.Tracer.trace and constructing a GraphModule. We'll # split that out in our transform to allow the caller to # customize tracing behavior.
graph : torch.fx.Graph = tracer_class().trace(m) # Step 2: Modify this Graph or create a new one graph = ... # Step 3: Construct a Module to return return torch.fx.GraphModule(m, graph) Your transform will take in a [`torch.nn.Module`](generated/torch.nn.module#torch.nn.Module "torch.nn.Module"), acquire a `Graph` from it, do some modifications, and return a new [`torch.nn.Module`](generated/torch.nn.module#torch.nn.Module "torch.nn.Module"). You should think of the [`torch.nn.Module`](generated/torch.nn.module#torch.nn.Module "torch.nn.Module") that your FX transform returns as identical to a regular [`torch.nn.Module`](generated/torch.nn.module#torch.nn.Module "torch.nn.Module") – you can pass it to another FX transform, you can pass it to TorchScript, or you can run it. Ensuring that the inputs and outputs of your FX transform are a [`torch.nn.Module`](generated/torch.nn.module#torch.nn.Module "torch.nn.Module") will allow for composability. Note It is also possible to modify an existing `GraphModule` instead of creating a new one, like so: import torch import torch.fx def transform(m : nn.Module) -> nn.Module: gm : torch.fx.GraphModule = torch.fx.symbolic_trace(m) # Modify gm.graph # <...> # Recompile the forward() method of `gm` from its Graph gm.recompile() return gm Note that you MUST call `GraphModule.recompile()` to bring the generated `forward()` method on the `GraphModule` in sync with the modified `Graph`. Given that you've passed in a [`torch.nn.Module`](generated/torch.nn.module#torch.nn.Module "torch.nn.Module") that has been traced into a `Graph`, there are now two primary approaches you can take to building a new `Graph`. ### A Quick Primer on Graphs Full treatment of the semantics of graphs can be found in the `Graph` documentation, but we are going to cover the basics here. A `Graph` is a data structure that represents a method on a `GraphModule`. The information that this requires is: * What are the inputs to the method? * What are the operations that run inside the method? * What is the output (i.e. return) value from the method? All three of these concepts are represented with `Node` instances. Let's see what we mean by that with a short example: import torch import torch.fx class MyModule(torch.nn.Module): def __init__(self): super().__init__() self.param = torch.nn.Parameter(torch.rand(3, 4)) self.linear = torch.nn.Linear(4, 5) def forward(self, x): return torch.topk(torch.sum( self.linear(x + self.linear.weight).relu(), dim=-1), 3) m = MyModule() gm = torch.fx.symbolic_trace(m) gm.graph.print_tabular() Here we define a module `MyModule` for demonstration purposes, instantiate it, symbolically trace it, then call the `Graph.print_tabular()` method to print out a table showing the nodes of this `Graph`:

opcode | name | target | args | kwargs
---|---|---|---|---
placeholder | x | x | () | {}
get_attr | linear_weight | linear.weight | () | {}
call_function | add_1 | <built-in function add> | (x, linear_weight) | {}
call_module | linear_1 | linear | (add_1,) | {}
call_method | relu_1 | relu | (linear_1,) | {}
call_function | sum_1 | <built-in method sum of type object at 0x...> | (relu_1,) | {'dim': -1}
call_function | topk_1 | <built-in method topk of type object at 0x...> | (sum_1, 3) | {}
output | output | output | (topk_1,) | {}

We can use this information to answer the questions we posed above. * What are the inputs to the method? In FX, method inputs are specified via special `placeholder` nodes. In this case, we have a single `placeholder` node with a `target` of `x`, meaning we have a single (non-self) argument named x. * What are the operations within the method?
The `get_attr`, `call_function`, `call_module`, and `call_method` nodes represent the operations in the method. A full treatment of the semantics of all of these can be found in the `Node` documentation. * What is the return value of the method? The return value in a `Graph` is specified by a special `output` node. Given that we now know the basics of how code is represented in FX, we can now explore how we would edit a `Graph`. ### Graph Manipulation #### Direct Graph Manipulation One approach to building this new `Graph` is to directly manipulate your old one. To aid in this, we can simply take the `Graph` we obtain from symbolic tracing and modify it. For example, let’s say we desire to replace [`torch.add()`](generated/torch.add#torch.add "torch.add") calls with [`torch.mul()`](generated/torch.mul#torch.mul "torch.mul") calls. import torch import torch.fx # Sample module class M(torch.nn.Module): def forward(self, x, y): return torch.add(x, y) def transform(m: torch.nn.Module, tracer_class : type = fx.Tracer) -> torch.nn.Module: graph : fx.Graph = tracer_class().trace(m) # FX represents its Graph as an ordered list of # nodes, so we can iterate through them. for node in graph.nodes: # Checks if we're calling a function (i.e: # torch.add) if node.op == 'call_function': # The target attribute is the function # that call_function calls. if node.target == torch.add: node.target = torch.mul graph.lint() # Does some checks to make sure the # Graph is well-formed. return fx.GraphModule(m, graph) We can also do more involved `Graph` rewrites, such as deleting or appending nodes. To aid in these transformations, FX has utility functions for transforming the graph that can be found in the `Graph` documentation. An example of using these APIs to append a `torch.relu()` call can be found below. # Specifies the insertion point. Any nodes added to the # Graph within this scope will be inserted after `node` with traced.graph.inserting_after(node): # Insert a new `call_function` node calling `torch.relu` new_node = traced.graph.call_function( torch.relu, args=(node,)) # We want all places that used the value of `node` to # now use that value after the `relu` call we've added. # We use the `replace_all_uses_with` API to do this. node.replace_all_uses_with(new_node) For simple transformations that only consist of substitutions, you can also make use of the [subgraph rewriter.](https://github.com/pytorch/pytorch/blob/master/torch/fx/subgraph_rewriter.py) #### Subgraph Rewriting With replace_pattern() FX also provides another level of automation on top of direct graph manipulation. The `replace_pattern()` API is essentially a “find/replace” tool for editing `Graph`s. It allows you to specify a `pattern` and `replacement` function and it will trace through those functions, find instances of the group of operations in the `pattern` graph, and replace those instances with copies of the `replacement` graph. This can help to greatly automate tedious graph manipulation code, which can get unwieldy as the transformations get more complex. 
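As an illustration of that workflow, the following is a minimal sketch (not taken from the examples linked below) of rewriting `torch.add` to `torch.mul` with the subgraph rewriter; the module `M` and the `pattern`/`replacement` functions are hypothetical names, and the entry point is assumed to be `torch.fx.subgraph_rewriter.replace_pattern` as linked above:

import torch
import torch.fx
from torch.fx import subgraph_rewriter, symbolic_trace

class M(torch.nn.Module):
    def forward(self, x, y):
        # Contains the group of operations we want to find and rewrite
        return torch.add(x, y).relu()

def pattern(x, y):
    # Traced and matched against subgraphs of the target Graph
    return torch.add(x, y)

def replacement(x, y):
    # Every match is replaced with a copy of this traced graph
    return torch.mul(x, y)

traced = symbolic_trace(M())
subgraph_rewriter.replace_pattern(traced, pattern, replacement)
traced.recompile()  # keep the generated forward() in sync with the edited Graph
print(traced.code)  # the add call should now appear as a mul

The sketch leans on the same contract described above: `pattern` and `replacement` are themselves traced, so they must be expressible as static, traceable code.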
#### Graph Manipulation Examples * [Replace one op](https://github.com/pytorch/examples/blob/master/fx/replace_op.py) * [Conv/Batch Norm fusion](https://github.com/pytorch/pytorch/blob/master/torch/fx/experimental/fuser.py) * [replace_pattern: Basic usage](https://github.com/pytorch/examples/blob/master/fx/subgraph_rewriter_basic_use.py) * [Quantization](https://pytorch.org/docs/master/quantization.html#prototype-fx-graph-mode-quantization) * [Invert Transformation](https://github.com/pytorch/examples/blob/master/fx/invert.py) ### Proxy/Retracing Another way of manipulating `Graph`s is by reusing the `Proxy` machinery used in symbolic tracing. For example, let’s imagine that we wanted to write a transformation that decomposed PyTorch functions into smaller operations. It would transform every `F.relu(x)` call into `(x > 0) * x`. One possibility would be to perform the requisite graph rewriting to insert the comparison and multiplication after the `F.relu`, and then clean up the original `F.relu`. However, we can automate this process by using `Proxy` objects to automatically record operations into the `Graph`. To use this method, we write the operations that we want inserted as regular PyTorch code and invoke that code with `Proxy` objects as arugments. These `Proxy` objects will capture the operations that are performed on them and append them to the `Graph`. # Note that this decomposition rule can be read as regular Python def relu_decomposition(x): return (x > 0) * x decomposition_rules = {} decomposition_rules[F.relu] = relu_decomposition def decompose(model: torch.nn.Module, tracer_class : type = fx.Tracer) -> torch.nn.Module: """ Decompose `model` into smaller constituent operations. Currently,this only supports decomposing ReLU into its mathematical definition: (x > 0) * x """ graph : fx.Graph = tracer_class().trace(model) new_graph = fx.Graph() env = {} for node in graph.nodes: if node.op == 'call_function' and node.target in decomposition_rules: # By wrapping the arguments with proxies, # we can dispatch to the appropriate # decomposition rule and implicitly add it # to the Graph by symbolically tracing it. proxy_args = [ fx.Proxy(env[x.name]) if isinstance(x, fx.Node) else x for x in node.args] output_proxy = decomposition_rules[node.target](*proxy_args) # Operations on `Proxy` always yield new `Proxy`s, and the # return value of our decomposition rule is no exception. # We need to extract the underlying `Node` from the `Proxy` # to use it in subsequent iterations of this transform. new_node = output_proxy.node env[node.name] = new_node else: # Default case: we don't have a decomposition rule for this # node, so just copy the node over into the new graph. new_node = new_graph.node_copy(node, lambda x: env[x.name]) env[node.name] = new_node return fx.GraphModule(model, new_graph) In addition to avoiding explicit graph manipulation, using `Proxy`s also allows you to specify your rewrite rules as native Python code. For transformations that require a large amount of rewrite rules (such as vmap or grad), this can often improve readability and maintainability of the rules. A worked example of using `Proxy`s for `Graph` manipulation can be found [here](https://github.com/pytorch/examples/blob/master/fx/proxy_based_graph_creation.py). ### The Interpreter Pattern A useful code organizational pattern in FX is to loop over all the `Node`s in a `Graph` and execute them. 
This can be used for several things including runtime analysis of values flowing through the graph or transformation of the code via retracing with `Proxy`s. For example, suppose we want to run a `GraphModule` and record the [`torch.Tensor`](tensors#torch.Tensor "torch.Tensor") shape and dtype properties on the nodes as we see them at runtime. That might look like: import torch import torch.fx from torch.fx.node import Node from typing import Dict class ShapeProp: """ Shape propagation. This class takes a `GraphModule`. Then, its `propagate` method executes the `GraphModule` node-by-node with the given arguments. As each operation executes, the ShapeProp class stores away the shape and element type for the output values of each operation on the `shape` and `dtype` attributes of the operation's `Node`. """ def __init__(self, mod): self.mod = mod self.graph = mod.graph self.modules = dict(self.mod.named_modules()) def propagate(self, *args): args_iter = iter(args) env : Dict[str, Node] = {} def load_arg(a): return torch.fx.graph.map_arg(a, lambda n: env[n.name]) def fetch_attr(target : str): target_atoms = target.split('.') attr_itr = self.mod for i, atom in enumerate(target_atoms): if not hasattr(attr_itr, atom): raise RuntimeError(f"Node referenced nonexistant target {'.'.join(target_atoms[:i])}") attr_itr = getattr(attr_itr, atom) return attr_itr for node in self.graph.nodes: if node.op == 'placeholder': result = next(args_iter) elif node.op == 'get_attr': result = fetch_attr(node.target) elif node.op == 'call_function': result = node.target(*load_arg(node.args), **load_arg(node.kwargs)) elif node.op == 'call_method': self_obj, *args = load_arg(node.args) kwargs = load_arg(node.kwargs) result = getattr(self_obj, node.target)(*args, **kwargs) elif node.op == 'call_module': result = self.modules[node.target](*load_arg(node.args), **load_arg(node.kwargs)) # This is the only code specific to shape propagation. # you can delete this `if` branch and this becomes # a generic GraphModule interpreter. if isinstance(result, torch.Tensor): node.shape = result.shape node.dtype = result.dtype env[node.name] = result return load_arg(self.graph.result) As you can see, a full interpreter for FX is not that complicated but it can be very useful. To ease using this pattern, we provide the `Interpreter` class, which encompasses the above logic in a way that certain aspects of the interpreter’s execution can be overridden via method overrides. In addition to executing operations, we can also generate a new `Graph` by feeding `Proxy` values through an interpreter. Similarly, we provide the `Transformer` class to encompass this pattern. `Transformer` behaves similarly to `Interpreter`, but instead of calling the `run` method to get a concrete output value from the Module, you would call the `Transformer.transform()` method to return a new `GraphModule` which was subject to any transformation rules you installed as overridden methods. #### Examples of the Interpreter Pattern * [Shape Propagation](https://github.com/pytorch/pytorch/blob/master/torch/fx/experimental/shape_prop.py) * [Performance Profiler](https://github.com/pytorch/tutorials/pull/1319) ## Debugging ### Introduction Often in the course of authoring transformations, our code will not be quite right. In this case, we may need to do some debugging. The key is to work backwards: first, check the results of invoking the generated module to prove or disprove correctness. Then, inspect and debug the generated code. 
Then, debug the process of transformations that led to the generated code. If you're not familiar with debuggers, please see the auxiliary section Available Debuggers. ### Checking Correctness of Modules Because the output of most deep learning modules consists of floating point [`torch.Tensor`](tensors#torch.Tensor "torch.Tensor") instances, checking for equivalence between the results of two [`torch.nn.Module`](generated/torch.nn.module#torch.nn.Module "torch.nn.Module") is not as straightforward as doing a simple equality check. To motivate this, let's use an example: import torch import torch.fx import torchvision.models as models def transform(m : torch.nn.Module) -> torch.nn.Module: gm = torch.fx.symbolic_trace(m) # Imagine we're doing some transforms here # <...> gm.recompile() return gm resnet18 = models.resnet18() transformed_resnet18 = transform(resnet18) input_image = torch.randn(5, 3, 224, 224) assert resnet18(input_image) == transformed_resnet18(input_image) """ RuntimeError: Boolean value of Tensor with more than one value is ambiguous """ Here, we've tried to check equality of the values of two deep learning models with the `==` equality operator. However, this is not well-defined, both because that operator returns a tensor rather than a bool, and because comparison of floating point values should use a margin of error (or epsilon) to account for the non-associativity of floating point operations (see [here](https://floating-point-gui.de/errors/comparison/) for more details). We can use [`torch.allclose()`](generated/torch.allclose#torch.allclose "torch.allclose") instead, which will give us an approximate comparison taking into account a relative and absolute tolerance threshold: assert torch.allclose(resnet18(input_image), transformed_resnet18(input_image)) This is the first tool in our toolbox to check if transformed modules are behaving as we expect compared to a reference implementation. ### Debugging the Generated Code Because FX generates the `forward()` function on `GraphModule`s, using traditional debugging techniques like `print` statements or `pdb` is not as straightforward. Luckily, we have several techniques we can use for debugging the generated code. #### Use `pdb` Invoke `pdb` to step into the running program. Although the code that represents the `Graph` is not in any source file, we can still step into it manually using `pdb` when the forward pass is invoked. import torch import torch.fx import torchvision.models as models def my_pass(inp: torch.nn.Module, tracer_class : type = fx.Tracer) -> torch.nn.Module: graph = tracer_class().trace(inp) # Transformation logic here # <...> # Return new Module return fx.GraphModule(inp, graph) my_module = models.resnet18() my_module_transformed = my_pass(my_module) input_value = torch.randn(5, 3, 224, 224) # When this line is executed at runtime, we will be dropped into an # interactive `pdb` prompt. We can use the `step` or `s` command to # step into the execution of the next line import pdb; pdb.set_trace() my_module_transformed(input_value) #### Print the Generated Code If you'd like to run the same code multiple times, then it can be a bit tedious to step to the right code with `pdb`. In that case, one approach is to simply copy-paste the generated `forward` pass into your code and examine it from there. # Assume that `traced` is a GraphModule that has undergone some # number of transforms # Copy this code for later print(traced) # Print the code generated from symbolic tracing.
This outputs: """ def forward(self, y): x = self.x add_1 = x + y; x = y = None return add_1 """ # Subclass the original Module class SubclassM(M): def __init__(self): super().__init__() # Paste the generated `forward` function (the one we printed and # copied above) here def forward(self, y): x = self.x add_1 = x + y; x = y = None return add_1 # Create an instance of the original, untraced Module. Then, create an # instance of the Module with the copied `forward` function. We can # now compare the output of both the original and the traced version. pre_trace = M() post_trace = SubclassM() #### Use the `to_folder` Function From `GraphModule` `GraphModule.to_folder()` is a method in `GraphModule` that allows you to dump out the generated FX code to a folder. Although copying the forward pass into the code often suffices as in Print the Generated Code, it may be easier to examine modules and parameters using `to_folder`. m = symbolic_trace(M()) m.to_folder("foo", "Bar") from foo import Bar y = Bar() After running the above example, we can then look at the code within `foo/module.py` and modify it as desired (e.g. adding `print` statements or using `pdb`) to debug the generated code. ### Debugging the Transformation Now that we've identified that a transformation is creating incorrect code, it's time to debug the transformation itself. First, we'll check the Limitations of Symbolic Tracing section in the documentation. Once we verify that tracing is working as expected, the goal becomes figuring out what went wrong during our `GraphModule` transformation. There may be a quick answer in Writing Transformations, but, if not, there are several ways to examine our traced module: # Sample Module class M(torch.nn.Module): def forward(self, x, y): return x + y # Create an instance of `M` m = M() # Symbolically trace an instance of `M` (returns a GraphModule). In # this example, we'll only be discussing how to inspect a # GraphModule, so we aren't showing any sample transforms for the # sake of brevity. traced = symbolic_trace(m) # Print the code produced by tracing the module. print(traced) # The generated `forward` function is: """ def forward(self, x, y): add_1 = x + y; x = y = None return add_1 """ # Print the internal Graph. print(traced.graph) # This print-out returns: """ graph(x, y): %add_1 : [#users=1] = call_function[target=<built-in function add>](args = (%x, %y), kwargs = {}) return add_1 """ # Print a tabular representation of the internal Graph. traced.graph.print_tabular() # This gives us: """ opcode name target args kwargs ------------- ------ ----------------------- -------- -------- placeholder x x () {} placeholder y y () {} call_function add_1 <built-in function add> (x, y) {} """ Using the utility functions above, we can compare our traced Module before and after we've applied our transformations. Sometimes, a simple visual comparison is enough to trace down a bug. If it's still not clear what's going wrong, a debugger like `pdb` can be a good next step. Going off of the example above, consider the following code: # Sample user-defined function def transform_graph(module: torch.nn.Module, tracer_class : type = fx.Tracer) -> torch.nn.Module: # Get the Graph from our traced Module g = tracer_class().trace(module) """ Transformations on `g` go here """ return fx.GraphModule(module, g) # Transform the Graph transformed = transform_graph(traced) # Print the new code after our transforms.
Check to see if it was # what we expected print(transformed) Using the above example, let’s say that the call to `print(traced)` showed us that there was an error in our transforms. We want to find what goes wrong using a debugger. We start a `pdb` session. We can see what’s happening during the transform by breaking on `transform_graph(traced)`, then pressing `s` to “step into” the call to `transform_graph(traced)`. We may also have good luck by editing the `print_tabular` method to print different attributes of the Nodes in the Graph. (For example, we might want to see the Node’s `input_nodes` and `users`.) ### Available Debuggers The most common Python debugger is [pdb](https://docs.python.org/3/library/pdb.html). You can start your program in “debug mode” with `pdb` by typing `python -m pdb FILENAME.py` into the command line, where `FILENAME` is the name of the file you want to debug. After that, you can use the `pdb` [debugger commands](https://docs.python.org/3/library/pdb.html#debugger-commands) to move through your running program stepwise. It’s common to set a breakpoint (`b LINE-NUMBER`) when you start `pdb`, then call `c` to run the program until that point. This prevents you from having to step through each line of execution (using `s` or `n`) to get to the part of the code you want to examine. Alternatively, you can write `import pdb; pdb.set_trace()` before the line you want to break at. If you add `pdb.set_trace()`, your program will automatically start in debug mode when you run it. (In other words, you can just type `python FILENAME.py` into the command line instead of `python -m pdb FILENAME.py`.) Once you’re running your file in debug mode, you can step through the code and examine your program’s internal state using certain commands. There are many excellent tutorials on `pdb` online, including RealPython’s [“Python Debugging With Pdb”](https://realpython.com/python- debugging-pdb/). IDEs like PyCharm or VSCode usually have a debugger built in. In your IDE, you can choose to either a) use `pdb` by pulling up a terminal window in your IDE (e.g. View → Terminal in VSCode), or b) use the built-in debugger (usually a graphical wrapper around `pdb`). ## Limitations of Symbolic Tracing FX uses a system of **symbolic tracing** (a.k.a [symbolic execution](https://en.wikipedia.org/wiki/Symbolic_execution)) to capture the semantics of programs in a transformable/analyzable form. The system is **tracing** in that it executes the program (really a [`torch.nn.Module`](generated/torch.nn.module#torch.nn.Module "torch.nn.Module") or function) to record operations. It is **symbolic** in that the data flowing through the program during this execution is not real data, but rather symbols (`Proxy` in FX parlance). Although symbolic tracing works for most neural net code, it has some limitations. ### Dynamic Control Flow The main limitation of symbolic tracing is it does not currently support _dynamic control flow_. That is, loops or `if` statements where the condition may depend on the input values of the program. 
For example, let's examine the following program: def func_to_trace(x): dim0 = x.size(0) if dim0 == 3: return torch.relu(x) else: return torch.neg(x) traced = torch.fx.symbolic_trace(func_to_trace) """ <...> File "dyn.py", line 6, in func_to_trace if dim0 == 3: File "pytorch/torch/fx/proxy.py", line 155, in __bool__ return self.tracer.to_bool(self) File "pytorch/torch/fx/proxy.py", line 85, in to_bool raise TraceError('symbolically traced variables cannot be used as inputs to control flow') torch.fx.proxy.TraceError: symbolically traced variables cannot be used as inputs to control flow """ The condition to the `if` statement relies on the value of `dim0`, which eventually relies on the value of `x`, a function input. Since `x` can change (i.e. if you pass a new input tensor to the traced function), this is _dynamic control flow_. The traceback walks back up through your code to show you where this situation happens. #### Static Control Flow On the other hand, so-called _static control flow_ is supported. Static control flow is loops or `if` statements whose value cannot change across invocations. Typically, in PyTorch programs, this control flow arises for code making decisions about a model's architecture based on hyper-parameters. As a concrete example: import torch import torch.fx class MyModule(torch.nn.Module): def __init__(self, do_activation : bool = False): super().__init__() self.do_activation = do_activation self.linear = torch.nn.Linear(512, 512) def forward(self, x): x = self.linear(x) # This if-statement is so-called static control flow. # Its condition does not depend on any input values if self.do_activation: x = torch.relu(x) return x without_activation = MyModule(do_activation=False) with_activation = MyModule(do_activation=True) traced_without_activation = torch.fx.symbolic_trace(without_activation) print(traced_without_activation.code) """ def forward(self, x): linear_1 = self.linear(x); x = None return linear_1 """ traced_with_activation = torch.fx.symbolic_trace(with_activation) print(traced_with_activation.code) """ import torch def forward(self, x): linear_1 = self.linear(x); x = None relu_1 = torch.relu(linear_1); linear_1 = None return relu_1 """ The if-statement `if self.do_activation` does not depend on any function inputs, thus it is static. `do_activation` can be considered to be a hyper-parameter, and the traces of different instances of `MyModule` with different values for that parameter have different code. This is a valid pattern that is supported by symbolic tracing. Many instances of dynamic control flow are semantically static control flow. These instances can be made to support symbolic tracing by removing the data dependencies on input values, for example by moving values to `Module` attributes or by passing constant values during symbolic tracing: def f(x, flag): if flag: return x else: return x*2 fx.symbolic_trace(f) # Fails! def wrapper(flag): return lambda x: f(x, flag) new_f = wrapper(flag=True) fx.symbolic_trace(new_f) In the case of truly dynamic control flow, the sections of the program that contain this code can be traced as calls to the Method (see Customizing Tracing with the Tracer class) or function (see `wrap()`) rather than tracing through them. ### Non-`torch` Functions FX uses `__torch_function__` as the mechanism by which it intercepts calls (see the [technical overview](https://github.com/pytorch/pytorch/blob/master/torch/fx/OVERVIEW.md#technical-details) for more information about this).
Some functions, such as builtin Python functions or those in the `math` module, are things that are not covered by `__torch_function__`, but we would still like to capture them in symbolic tracing. For example: import torch import torch.fx from math import sqrt def normalize(x): """ Normalize `x` by the size of the batch dimension """ return x / sqrt(len(x)) # It's valid Python code normalize(torch.rand(3, 4)) traced = torch.fx.symbolic_trace(normalize) """ <...> File "sqrt.py", line 9, in normalize return x / sqrt(len(x)) File "pytorch/torch/fx/proxy.py", line 161, in __len__ raise RuntimeError("'len' is not supported in symbolic tracing by default. If you want " RuntimeError: 'len' is not supported in symbolic tracing by default. If you want this call to be recorded, please call torch.fx.wrap('len') at module scope """ The error tells us that the built-in function `len` is not supported. We can make it so that functions like this are recorded in the trace as direct calls using the `wrap()` API: torch.fx.wrap('len') torch.fx.wrap('sqrt') traced = torch.fx.symbolic_trace(normalize) print(traced.code) """ import math def forward(self, x): len_1 = len(x) sqrt_1 = math.sqrt(len_1); len_1 = None truediv = x / sqrt_1; x = sqrt_1 = None return truediv """ ### Customizing Tracing with the `Tracer` class The `Tracer` class is the class that underlies the implementation of `symbolic_trace`. The behavior of tracing can be customized by subclassing Tracer, like so: class MyCustomTracer(torch.fx.Tracer): # Inside here you can override various methods # to customize tracing. See the `Tracer` API # reference pass # Let's use this custom tracer to trace through this module class MyModule(torch.nn.Module): def forward(self, x): return torch.relu(x) + torch.ones(3, 4) mod = MyModule() traced_graph = MyCustomTracer().trace(mod) # trace() returns a Graph. Let's wrap it up in a # GraphModule to make it runnable traced = torch.fx.GraphModule(mod, traced_graph) #### Leaf Modules Leaf Modules are the modules that appear as calls in the symbolic trace rather than being traced through. The default set of leaf modules is the set of standard `torch.nn` module instances. For example: class MySpecialSubmodule(torch.nn.Module): def forward(self, x): return torch.neg(x) class MyModule(torch.nn.Module): def __init__(self): super().__init__() self.linear = torch.nn.Linear(3, 4) self.submod = MySpecialSubmodule() def forward(self, x): return self.submod(self.linear(x)) traced = torch.fx.symbolic_trace(MyModule()) print(traced.code) # `linear` is preserved as a call, yet `submod` is traced though. # This is because the default set of "Leaf Modules" includes all # standard `torch.nn` modules. """ import torch def forward(self, x): linear_1 = self.linear(x); x = None neg_1 = torch.neg(linear_1); linear_1 = None return neg_1 """ The set of leaf modules can be customized by overriding `Tracer.is_leaf_module()`. ### Miscellanea * Tensor constructors (e.g. `torch.zeros`, `torch.ones`, `torch.rand`, `torch.randn`, `torch.sparse_coo_tensor`) are currently not traceable. * The deterministic constructors (`zeros`, `ones`) can be used and the value they produce will be embedded in the trace as a constant. This is only problematic if the arguments to these constructors refers to dynamic input sizes. In this case, `ones_like` or `zeros_like` may be a viable substitute. * Nondeterministic constructors (`rand`, `randn`) will have a single random value embedded in the trace. This is likely not the intended behavior. 
* This behavior may be fixed in a future release. * Type annotations * Python 3-style type annotations (e.g. `func(x : torch.Tensor, y : int) -> torch.Tensor`) are supported and will be preserved by symbolic tracing. * Python 2-style comment type annotations `# type: (torch.Tensor, int) -> torch.Tensor` are not currently supported. * Annotations on local names within a function are not currently supported. ## API Reference `torch.fx.symbolic_trace(root, concrete_args=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/fx/symbolic_trace.html#symbolic_trace) Symbolic tracing API Given an `nn.Module` or function instance `root`, this function will return a `GraphModule` constructed by recording operations seen while tracing through `root`. Parameters * **root** (_Union_ _[_[torch.nn.Module](generated/torch.nn.module#torch.nn.Module "torch.nn.Module") _,__Callable_ _]_) – Module or function to be traced and converted into a Graph representation. * **concrete_args** (_Optional_ _[__Dict_ _[_[str](https://docs.python.org/3/library/stdtypes.html#str "\(in Python v3.9\)") _,__any_ _]__]_) – Concrete arguments that should not be treated as Proxies. Returns a Module created from the recorded operations from `root`. Return type GraphModule `torch.fx.wrap(fn_or_name)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/fx/symbolic_trace.html#wrap) This function can be called at module-level scope to register fn_or_name as a “leaf function”. A “leaf function” will be preserved as a CallFunction node in the FX trace instead of being traced through: # foo/bar/baz.py def my_custom_function(x, y): return x * x + y * y torch.fx.wrap('my_custom_function') def fn_to_be_traced(x, y): # When symbolic tracing, the below call to my_custom_function will be inserted into # the graph rather than tracing it. return my_custom_function(x, y) This function can also equivalently be used as a decorator: # foo/bar/baz.py @torch.fx.wrap def my_custom_function(x, y): return x * x + y * y A wrapped function can be thought of a “leaf function”, analogous to the concept of “leaf modules”, that is, they are functions that are left as calls in the FX trace rather than traced through. Parameters **fn_or_name** (_Union_ _[_[str](https://docs.python.org/3/library/stdtypes.html#str "\(in Python v3.9\)") _,__Callable_ _]_) – The function or name of the global function to insert into the graph when it’s called `class torch.fx.GraphModule(root, graph, class_name='GraphModule')` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/fx/graph_module.html#GraphModule) GraphModule is an nn.Module generated from an fx.Graph. Graphmodule has a `graph` attribute, as well as `code` and `forward` attributes generated from that `graph`. Warning When `graph` is reassigned, `code` and `forward` will be automatically regenerated. However, if you edit the contents of the `graph` without reassigning the `graph` attribute itself, you must call `recompile()` to update the generated code. `__init__(root, graph, class_name='GraphModule')` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/fx/graph_module.html#GraphModule.__init__) Construct a GraphModule. Parameters * **root** (_Union_ _[_[torch.nn.Module](generated/torch.nn.module#torch.nn.Module "torch.nn.Module") _,__Dict_ _[_[str](https://docs.python.org/3/library/stdtypes.html#str "\(in Python v3.9\)") _,__Any_ _]_) – `root` can either be an nn.Module instance or a Dict mapping strings to any attribute type. 
In the case that `root` is a Module, any references to Module-based objects (via qualified name) in the Graph's Nodes' `target` field will be copied over from the respective place within `root`'s Module hierarchy into the GraphModule's module hierarchy. In the case that `root` is a dict, the qualified name found in a Node's `target` will be looked up directly in the dict's keys. The object mapped to by the Dict will be copied over into the appropriate place within the GraphModule's module hierarchy. * **graph** (Graph) – `graph` contains the nodes this GraphModule should use for code generation * **class_name** ([str](https://docs.python.org/3/library/stdtypes.html#str "\(in Python v3.9\)")) – `class_name` denotes the name of this GraphModule for debugging purposes. If it's unset, all error messages will report as originating from `GraphModule`. It may be helpful to set this to `root`'s original name or a name that makes sense within the context of your transform. `property code` Return the Python code generated from the `Graph` underlying this `GraphModule`. `property graph` Return the `Graph` underlying this `GraphModule`. `recompile()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/fx/graph_module.html#GraphModule.recompile) Recompile this GraphModule from its `graph` attribute. This should be called after editing the contained `graph`, otherwise the generated code of this `GraphModule` will be out of date. `to_folder(folder, module_name='FxModule')` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/fx/graph_module.html#GraphModule.to_folder) Dumps out module to `folder` with `module_name` so that it can be imported with `from <folder> import <module_name>` Parameters * **folder** (_Union_ _[_[str](https://docs.python.org/3/library/stdtypes.html#str "\(in Python v3.9\)") _,_[os.PathLike](https://docs.python.org/3/library/os.html#os.PathLike "\(in Python v3.9\)") _]_) – The folder to write the code out to * **module_name** ([str](https://docs.python.org/3/library/stdtypes.html#str "\(in Python v3.9\)")) – Top-level name to use for the `Module` while writing out the code `class torch.fx.Graph` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/fx/graph.html#Graph) `Graph` is the main data structure used in the FX Intermediate Representation. It consists of a series of `Node` s, each representing callsites (or other syntactic constructs). The list of `Node` s, taken together, constitute a valid Python function. For example, the following code import torch import torch.fx class MyModule(torch.nn.Module): def __init__(self): super().__init__() self.param = torch.nn.Parameter(torch.rand(3, 4)) self.linear = torch.nn.Linear(4, 5) def forward(self, x): return torch.topk(torch.sum(self.linear(x + self.linear.weight).relu(), dim=-1), 3) m = MyModule() gm = torch.fx.symbolic_trace(m) Will produce the following Graph: print(gm.graph) graph(x): %linear_weight : [#users=1] = self.linear.weight %add_1 : [#users=1] = call_function[target=operator.add](args = (%x, %linear_weight), kwargs = {}) %linear_1 : [#users=1] = call_module[target=linear](args = (%add_1,), kwargs = {}) %relu_1 : [#users=1] = call_method[target=relu](args = (%linear_1,), kwargs = {}) %sum_1 : [#users=1] = call_function[target=torch.sum](args = (%relu_1,), kwargs = {dim: -1}) %topk_1 : [#users=1] = call_function[target=torch.topk](args = (%sum_1, 3), kwargs = {}) return topk_1 For the semantics of operations represented in the `Graph`, please see `Node`.
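To make the relationship between `Graph`, its nodes, and `GraphModule` concrete, here is a minimal sketch of assembling a `Graph` by hand using `Graph` methods such as `placeholder()`, `call_function()`, and `output()` and then wrapping it in a `GraphModule`; the small graph being built is purely illustrative:

import torch
import torch.fx

# Build a Graph equivalent to: def forward(x): return torch.relu(x) + 1.0
g = torch.fx.Graph()
x = g.placeholder('x')                         # method input
relu = g.call_function(torch.relu, (x,))       # call_function node for torch.relu
add = g.call_function(torch.add, (relu, 1.0))  # call_function node for torch.add
g.output(add)                                  # return value of the method

# Wrap the Graph in a GraphModule to obtain a runnable nn.Module.
# An empty root module suffices here since no get_attr/call_module nodes are used.
gm = torch.fx.GraphModule(torch.nn.Module(), g)
print(gm.code)
print(gm(torch.randn(4)))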
`__init__()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/fx/graph.html#Graph.__init__) Construct an empty Graph. `call_function(the_function, args=None, kwargs=None, type_expr=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/fx/graph.html#Graph.call_function) Insert a `call_function` `Node` into the `Graph`. A `call_function` node represents a call to a Python callable, specified by `the_function`. `the_function` can be Parameters * **the_function** (_Callable_ _[__..__,__Any_ _]_) – The function to be called. Can be any PyTorch operator, Python function, or member of the `builtins` or `operator` namespaces. * **args** (_Optional_ _[__Tuple_ _[__Argument_ _,__..__]__]_) – The positional arguments to be passed to the called function. * **kwargs** (_Optional_ _[__Dict_ _[_[str](https://docs.python.org/3/library/stdtypes.html#str "\(in Python v3.9\)") _,__Argument_ _]__]_) – The keyword arguments to be passed to the called function * **type_expr** (_Optional_ _[__Any_ _]_) – an optional type annotation representing the Python type the output of this node will have. Returns The newly created and inserted `call_function` node. Note The same insertion point and type expression rules apply for this method as `Graph.create_node()`. `call_method(method_name, args=None, kwargs=None, type_expr=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/fx/graph.html#Graph.call_method) Insert a `call_method` `Node` into the `Graph`. A `call_method` node represents a call to a given method on the 0th element of `args`. Parameters * **method_name** ([str](https://docs.python.org/3/library/stdtypes.html#str "\(in Python v3.9\)")) – The name of the method to apply to the self argument. For example, if args[0] is a `Node` representing a `Tensor`, then to call `relu()` on that `Tensor`, pass `relu` to `method_name`. * **args** (_Optional_ _[__Tuple_ _[__Argument_ _,__..__]__]_) – The positional arguments to be passed to the called method. Note that this _should_ include a `self` argument. * **kwargs** (_Optional_ _[__Dict_ _[_[str](https://docs.python.org/3/library/stdtypes.html#str "\(in Python v3.9\)") _,__Argument_ _]__]_) – The keyword arguments to be passed to the called method * **type_expr** (_Optional_ _[__Any_ _]_) – an optional type annotation representing the Python type the output of this node will have. Returns The newly created and inserted `call_method` node. Note The same insertion point and type expression rules apply for this method as `Graph.create_node()`. `call_module(module_name, args=None, kwargs=None, type_expr=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/fx/graph.html#Graph.call_module) Insert a `call_module` `Node` into the `Graph`. A `call_module` node represents a call to the forward() function of a `Module` in the `Module` hierarchy. Parameters * **module_name** ([str](https://docs.python.org/3/library/stdtypes.html#str "\(in Python v3.9\)")) – The qualified name of the `Module` in the `Module` hierarchy to be called. For example, if the traced `Module` has a submodule named `foo`, which has a submodule named `bar`, the qualified name `foo.bar` should be passed as `module_name` to call that module. * **args** (_Optional_ _[__Tuple_ _[__Argument_ _,__..__]__]_) – The positional arguments to be passed to the called method. Note that this should _not_ include a `self` argument. 
* **kwargs** (_Optional_ _[__Dict_ _[_[str](https://docs.python.org/3/library/stdtypes.html#str "\(in Python v3.9\)") _,__Argument_ _]__]_) – The keyword arguments to be passed to the called method * **type_expr** (_Optional_ _[__Any_ _]_) – an optional type annotation representing the Python type the output of this node will have. Returns The newly-created and inserted `call_module` node. Note The same insertion point and type expression rules apply for this method as `Graph.create_node()`. `create_node(op, target, args=None, kwargs=None, name=None, type_expr=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/fx/graph.html#Graph.create_node) Create a `Node` and add it to the `Graph` at the current insert-point. Note that the current insert-point can be set via `Graph.inserting_before()` and `Graph.inserting_after()`. Parameters * **op** ([str](https://docs.python.org/3/library/stdtypes.html#str "\(in Python v3.9\)")) – the opcode for this Node. One of ‘call_function’, ‘call_method’, ‘get_attr’, ‘call_module’, ‘placeholder’, or ‘output’. The semantics of these opcodes are described in the `Graph` docstring. * **args** (_Optional_ _[__Tuple_ _[__Argument_ _,__..__]__]_) – is a tuple of arguments to this node. * **kwargs** (_Optional_ _[__Dict_ _[_[str](https://docs.python.org/3/library/stdtypes.html#str "\(in Python v3.9\)") _,__Argument_ _]__]_) – the kwargs of this Node * **name** (_Optional_ _[_[str](https://docs.python.org/3/library/stdtypes.html#str "\(in Python v3.9\)") _]_) – an optional string name for the `Node`. This will influence the name of the value assigned to in the Python generated code. * **type_expr** (_Optional_ _[__Any_ _]_) – an optional type annotation representing the Python type the output of this node will have. Returns The newly-created and inserted node. `erase_node(to_erase)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/fx/graph.html#Graph.erase_node) Erases a `Node` from the `Graph`. Throws an exception if there are still users of that node in the `Graph`. Parameters **to_erase** (Node) – The `Node` to erase from the `Graph`. `get_attr(qualified_name, type_expr=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/fx/graph.html#Graph.get_attr) Insert a `get_attr` node into the Graph. A `get_attr` `Node` represents the fetch of an attribute from the `Module` hierarchy. Parameters * **qualified_name** ([str](https://docs.python.org/3/library/stdtypes.html#str "\(in Python v3.9\)")) – the fully-qualified name of the attribute to be retrieved. For example, if the traced Module has a submodule named `foo`, which has a submodule named `bar`, which has an attribute named `baz`, the qualified name `foo.bar.baz` should be passed as `qualified_name`. * **type_expr** (_Optional_ _[__Any_ _]_) – an optional type annotation representing the Python type the output of this node will have. Returns The newly-created and inserted `get_attr` node. Note The same insertion point and type expression rules apply for this method as `Graph.create_node`. `graph_copy(g, val_map)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/fx/graph.html#Graph.graph_copy) Copy all nodes from a given graph into `self`. Parameters * **g** (Graph) – The source graph from which to copy Nodes. * **val_map** (_Dict_ _[_Node _,_Node _]_) – a dictionary that will be populated with a mapping from nodes in `g` to nodes in `self`. Note that `val_map` can be passed in with values in it already to override copying of certain values. 
Returns The value in `self` that is now equivalent to the output value in `g`, if `g` had an `output` node. `None` otherwise.

`inserting_after(n=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/fx/graph.html#Graph.inserting_after) Set the point at which create_node and companion methods will insert into the graph. When used within a ‘with’ statement, this will temporarily set the insert point and then restore it when the with statement exits:

    with g.inserting_after(n):
        ... # inserting after node n
    ... # insert point restored to what it was previously
    g.inserting_after(n) # set the insert point permanently

Parameters **n** (_Optional_ _[_Node _]_) – The node after which to insert. If None this will insert after the beginning of the entire graph. Returns A resource manager that will restore the insert point on `__exit__`.

`inserting_before(n=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/fx/graph.html#Graph.inserting_before) Set the point at which create_node and companion methods will insert into the graph. When used within a ‘with’ statement, this will temporarily set the insert point and then restore it when the with statement exits:

    with g.inserting_before(n):
        ... # inserting before node n
    ... # insert point restored to what it was previously
    g.inserting_before(n) # set the insert point permanently

Parameters **n** (_Optional_ _[_Node _]_) – The node before which to insert. If None this will insert before the beginning of the entire graph. Returns A resource manager that will restore the insert point on `__exit__`.

`lint(root=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/fx/graph.html#Graph.lint) Runs various checks on this Graph to make sure it is well-formed. In particular:

  - Checks Nodes have correct ownership (owned by this graph)
  - Checks Nodes appear in topological order
  - If `root` is provided, checks that targets exist in `root`

Parameters **root** (_Optional_ _[_[torch.nn.Module](generated/torch.nn.module#torch.nn.Module "torch.nn.Module") _]_) – The root module with which to check for targets. This is equivalent to the `root` argument that is passed when constructing a `GraphModule`.

`node_copy(node, arg_transform=<function <lambda>>)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/fx/graph.html#Graph.node_copy) Copy a node from one graph into another. `arg_transform` needs to transform arguments from the graph of node to the graph of self. Example:

    # Copying all the nodes in `g` into `new_graph`
    g : torch.fx.Graph = ...
    new_graph = torch.fx.Graph()
    value_remap = {}
    for node in g.nodes:
        value_remap[node] = new_graph.node_copy(node, lambda n : value_remap[n])

Parameters * **node** (Node) – The node to copy into `self`. * **arg_transform** (_Callable_ _[__[_Node _]__,__Argument_ _]_) – A function that transforms `Node` arguments in node’s `args` and `kwargs` into the equivalent argument in `self`. In the simplest case, this should retrieve a value out of a table mapping Nodes in the original graph to `self`.

`property nodes` Get the list of Nodes that constitute this Graph. Note that this `Node` list representation is a doubly-linked list. Mutations during iteration (e.g. delete a Node, add a Node) are safe. Returns A doubly-linked list of Nodes. Note that `reversed` can be called on this list to switch iteration order.

`output(result, type_expr=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/fx/graph.html#Graph.output) Insert an `output` `Node` into the `Graph`.
An `output` node represents a `return` statement in Python code. `result` is the value that should be returned. Parameters * **result** (_Argument_) – The value to be returned. * **type_expr** (_Optional_ _[__Any_ _]_) – an optional type annotation representing the Python type the output of this node will have. Note The same insertion point and type expression rules apply for this method as `Graph.create_node`. `placeholder(name, type_expr=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/fx/graph.html#Graph.placeholder) Insert a `placeholder` node into the Graph. A `placeholder` represents a function input. Parameters * **name** ([str](https://docs.python.org/3/library/stdtypes.html#str "\(in Python v3.9\)")) – A name for the input value. This corresponds to the name of the positional argument to the function this `Graph` represents. * **type_expr** (_Optional_ _[__Any_ _]_) – an optional type annotation representing the Python type the output of this node will have. This is needed in some cases for proper code generation (e.g. when the function is used subsequently in TorchScript compilation). Note The same insertion point and type expression rules apply for this method as `Graph.create_node`. `print_tabular()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/fx/graph.html#Graph.print_tabular) Prints the intermediate representation of the graph in tabular format. `python_code(root_module)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/fx/graph.html#Graph.python_code) Turn this `Graph` into valid Python code. Parameters **root_module** ([str](https://docs.python.org/3/library/stdtypes.html#str "\(in Python v3.9\)")) – The name of the root module on which to look-up qualified name targets. This is usually ‘self’. Returns The string source code generated from this `Graph`. `class torch.fx.Node(graph, name, op, target, args, kwargs, type=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/fx/node.html#Node) `Node` is the data structure that represents individual operations within a `Graph`. For the most part, Nodes represent callsites to various entities, such as operators, methods, and Modules (some exceptions include nodes that specify function inputs and outputs). Each `Node` has a function specified by its `op` property. The `Node` semantics for each value of `op` are as follows: * `placeholder` represents a function input. The `name` attribute specifies the name this value will take on. `target` is similarly the name of the argument. `args` holds either: 1) nothing, or 2) a single argument denoting the default parameter of the function input. `kwargs` is don’t-care. Placeholders correspond to the function parameters (e.g. `x`) in the graph printout. * `get_attr` retrieves a parameter from the module hierarchy. `name` is similarly the name the result of the fetch is assigned to. `target` is the fully-qualified name of the parameter’s position in the module hierarchy. `args` and `kwargs` are don’t-care * `call_function` applies a free function to some values. `name` is similarly the name of the value to assign to. `target` is the function to be applied. `args` and `kwargs` represent the arguments to the function, following the Python calling convention * `call_module` applies a module in the module hierarchy’s `forward()` method to given arguments. `name` is as previous. `target` is the fully-qualified name of the module in the module hierarchy to call. 
`args` and `kwargs` represent the arguments to invoke the module on, _excluding the self argument_. * `call_method` calls a method on a value. `name` is similar to the above. `target` is the string name of the method to apply to the `self` argument. `args` and `kwargs` represent the arguments to invoke the method on, _including the self argument_. * `output` contains the output of the traced function in its `args[0]` attribute. This corresponds to the “return” statement in the Graph printout.

`property all_input_nodes` Return all Nodes that are inputs to this Node. This is equivalent to iterating over `args` and `kwargs` and only collecting the values that are Nodes. Returns List of `Nodes` that appear in the `args` and `kwargs` of this `Node`, in that order.

`append(x)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/fx/node.html#Node.append) Insert x after this node in the list of nodes in the graph. Equivalent to `self.next.prepend(x)`. Parameters **x** (Node) – The node to put after this node. Must be a member of the same graph.

`property args` The tuple of arguments to this `Node`. The interpretation of arguments depends on the node’s opcode. See the `Node` docstring for more information. Assignment to this property is allowed. All accounting of uses and users is updated automatically on assignment.

`property kwargs` The dict of keyword arguments to this `Node`. The interpretation of arguments depends on the node’s opcode. See the `Node` docstring for more information. Assignment to this property is allowed. All accounting of uses and users is updated automatically on assignment.

`property next` Returns the next `Node` in the linked list of Nodes. Returns The next `Node` in the linked list of Nodes.

`prepend(x)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/fx/node.html#Node.prepend) Insert x before this node in the list of nodes in the graph. Example:

    Before: p -> self
            bx -> x -> ax
    After:  p -> x -> self
            bx -> ax

Parameters **x** (Node) – The node to put before this node. Must be a member of the same graph.

`property prev` Returns the previous `Node` in the linked list of Nodes. Returns The previous `Node` in the linked list of Nodes.

`replace_all_uses_with(replace_with)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/fx/node.html#Node.replace_all_uses_with) Replace all uses of `self` in the Graph with the Node `replace_with`. Parameters **replace_with** (Node) – The node to replace all uses of `self` with. Returns The list of Nodes on which this change was made.

`class torch.fx.Tracer(autowrap_modules=(math,))` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/fx/symbolic_trace.html#Tracer) `Tracer` is the class that implements the symbolic tracing functionality of `torch.fx.symbolic_trace`. A call to `symbolic_trace(m)` is equivalent to `Tracer().trace(m)`. Tracer can be subclassed to override various behaviors of the tracing process. The different behaviors that can be overridden are described in the docstrings of the methods on this class.

`call_module(m, forward, args, kwargs)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/fx/symbolic_trace.html#Tracer.call_module) Method that specifies the behavior of this `Tracer` when it encounters a call to an `nn.Module` instance. By default, the behavior is to check if the called module is a leaf module via `is_leaf_module`. If it is, emit a `call_module` node referring to `m` in the `Graph`. Otherwise, call the `Module` normally, tracing through the operations in its `forward` function.
This method can be overridden to, for example, create nested traced GraphModules, or any other behavior you would want while tracing across `Module` boundaries. Parameters * **m** ([Module](generated/torch.nn.module#torch.nn.Module "torch.nn.Module")) – The module for which a call is being emitted * **forward** (_Callable_) – The forward() method of the `Module` to be invoked * **args** (_Tuple_) – args of the module callsite * **kwargs** (_Dict_) – kwargs of the module callsite Returns The return value from the Module call. In the case that a `call_module` node was emitted, this is a `Proxy` value. Otherwise, it is whatever value was returned from the `Module` invocation.

`create_arg(a)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/fx/symbolic_trace.html#Tracer.create_arg) A method to specify the behavior of tracing when preparing values to be used as arguments to nodes in the `Graph`. By default, the behavior includes:

  1. Iterate through collection types (e.g. tuple, list, dict) and recursively call `create_arg` on the elements.
  2. Given a Proxy object, return a reference to the underlying IR `Node`.
  3. Given a non-Proxy Tensor object, emit IR for various cases:
     * For a Parameter, emit a `get_attr` node referring to that Parameter.
     * For a non-Parameter Tensor, store the Tensor away in a special attribute referring to that attribute.

This method can be overridden to support more types. Parameters **a** (_Any_) – The value to be emitted as an `Argument` in the `Graph`. Returns The value `a` converted into the appropriate `Argument`

`create_args_for_root(root_fn, is_module, concrete_args=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/fx/symbolic_trace.html#Tracer.create_args_for_root) Create `placeholder` nodes corresponding to the signature of the `root` Module. This method introspects root’s signature and emits those nodes accordingly, also supporting `*args` and `**kwargs`.

`is_leaf_module(m, module_qualified_name)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/fx/symbolic_trace.html#Tracer.is_leaf_module) A method to specify whether a given `nn.Module` is a “leaf” module. Leaf modules are the atomic units that appear in the IR, referenced by `call_module` calls. By default, Modules in the PyTorch standard library namespace (torch.nn) are leaf modules. All other modules are traced through and their constituent ops are recorded, unless specified otherwise by overriding this method. Parameters * **m** ([Module](generated/torch.nn.module#torch.nn.Module "torch.nn.Module")) – The module being queried about * **module_qualified_name** ([str](https://docs.python.org/3/library/stdtypes.html#str "\(in Python v3.9\)")) – The path to root of this module. For example, if you have a module hierarchy where submodule `foo` contains submodule `bar`, which contains submodule `baz`, that module will appear with the qualified name `foo.bar.baz` here.

`path_of_module(mod)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/fx/symbolic_trace.html#Tracer.path_of_module) Helper method to find the qualified name of `mod` in the Module hierarchy of `root`. For example, if `root` has a submodule named `foo`, which has a submodule named `bar`, passing `bar` into this function will return the string “foo.bar”. Parameters **mod** ([Module](generated/torch.nn.module#torch.nn.Module "torch.nn.Module")) – The `Module` to retrieve the qualified name for.
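As a small sketch of the customization points above (the module and tracer names `MyBlock` and `LeafBlockTracer` are hypothetical), overriding `is_leaf_module` keeps a user-defined module as a single `call_module` node instead of tracing through it, using `Tracer.trace` as documented below:

    import torch
    import torch.fx

    class MyBlock(torch.nn.Module):              # hypothetical user module
        def forward(self, x):
            return x.relu() + 1

    class Net(torch.nn.Module):
        def __init__(self):
            super().__init__()
            self.block = MyBlock()
        def forward(self, x):
            return self.block(x) * 2

    class LeafBlockTracer(torch.fx.Tracer):      # hypothetical subclass
        def is_leaf_module(self, m, module_qualified_name):
            # Record MyBlock as an atomic call_module node rather than
            # tracing through its forward().
            if isinstance(m, MyBlock):
                return True
            return super().is_leaf_module(m, module_qualified_name)

    net = Net()
    graph = LeafBlockTracer().trace(net)
    gm = torch.fx.GraphModule(net, graph)
    print(gm.graph)   # contains a call_module node targeting 'block'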
`trace(root, concrete_args=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/fx/symbolic_trace.html#Tracer.trace) Trace `root` and return the corresponding FX `Graph` representation. `root` can either be an `nn.Module` instance or a Python callable. Note that after this call, `self.root` may be different from the `root` passed in here. For example, when a free function is passed to `trace()`, we will create an `nn.Module` instance to use as the root and add embedded constants to it. Parameters **root** (_Union_ _[_[Module](generated/torch.nn.module#torch.nn.Module "torch.nn.Module") _,__Callable_ _]_) – Either a `Module` or a function to be traced through. Returns A `Graph` representing the semantics of the passed-in `root`.

`class torch.fx.Proxy(node, tracer=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/fx/proxy.html#Proxy) `Proxy` objects are `Node` wrappers that flow through the program during symbolic tracing and record all the operations (`torch` function calls, method calls, operators) that they touch into the growing FX Graph. If you’re doing graph transforms, you can wrap your own `Proxy` around a raw `Node` so that you can use the overloaded operators to add additional things to a `Graph`.

`class torch.fx.Interpreter(module)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/fx/interpreter.html#Interpreter) An Interpreter executes an FX graph Node-by-Node. This pattern can be useful for many things, including writing code transformations as well as analysis passes. Methods in the Interpreter class can be overridden to customize the behavior of execution. The map of overrideable methods in terms of call hierarchy:

    run()
        +-- run_node
            +-- placeholder()
            +-- get_attr()
            +-- call_function()
            +-- call_method()
            +-- call_module()
            +-- output()

#### Example

Suppose we want to swap all instances of `torch.neg` with `torch.sigmoid` and vice versa (including their `Tensor` method equivalents). We could subclass Interpreter like so:

    class NegSigmSwapInterpreter(Interpreter):
        def call_function(self, target : Target, args : Tuple, kwargs : Dict) -> Any:
            if target == torch.sigmoid:
                return torch.neg(*args, **kwargs)
            return super().call_function(target, args, kwargs)

        def call_method(self, target : Target, args : Tuple, kwargs : Dict) -> Any:
            if target == 'neg':
                call_self, *args_tail = args
                return call_self.sigmoid(*args_tail, **kwargs)
            return super().call_method(target, args, kwargs)

    def fn(x):
        return torch.sigmoid(x).neg()

    gm = torch.fx.symbolic_trace(fn)
    input = torch.randn(3, 4)
    result = NegSigmSwapInterpreter(gm).run(input)
    torch.testing.assert_allclose(result, torch.neg(input).sigmoid())

Parameters **module** (GraphModule) – The module to be executed

`call_function(target, args, kwargs)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/fx/interpreter.html#Interpreter.call_function) Execute a `call_function` node and return the result. Parameters * **target** (_Target_) – The call target for this node. See [Node](https://pytorch.org/docs/master/fx.html#torch.fx.Node) for details on semantics * **args** (_Tuple_) – Tuple of positional args for this invocation * **kwargs** (_Dict_) – Dict of keyword arguments for this invocation Return Any: The value returned by the function invocation

`call_method(target, args, kwargs)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/fx/interpreter.html#Interpreter.call_method) Execute a `call_method` node and return the result. Parameters * **target** (_Target_) – The call target for this node.
See [Node](https://pytorch.org/docs/master/fx.html#torch.fx.Node) for details on semantics * **args** (_Tuple_) – Tuple of positional args for this invocation * **kwargs** (_Dict_) – Dict of keyword arguments for this invocation Return Any: The value returned by the method invocation `call_module(target, args, kwargs)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/fx/interpreter.html#Interpreter.call_module) Execute a `call_module` node and return the result. Parameters * **target** (_Target_) – The call target for this node. See [Node](https://pytorch.org/docs/master/fx.html#torch.fx.Node) for details on semantics * **args** (_Tuple_) – Tuple of positional args for this invocation * **kwargs** (_Dict_) – Dict of keyword arguments for this invocation Return Any: The value returned by the module invocation `fetch_args_kwargs_from_env(n)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/fx/interpreter.html#Interpreter.fetch_args_kwargs_from_env) Fetch the concrete values of `args` and `kwargs` of node `n` from the current execution environment. Parameters **n** (Node) – The node for which `args` and `kwargs` should be fetched. Returns `args` and `kwargs` with concrete values for `n`. Return type Tuple[Tuple, Dict] `fetch_attr(target)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/fx/interpreter.html#Interpreter.fetch_attr) Fetch an attribute from the `Module` hierarchy of `self.module`. Parameters **target** ([str](https://docs.python.org/3/library/stdtypes.html#str "\(in Python v3.9\)")) – The fully-qualfiied name of the attribute to fetch Returns The value of the attribute. Return type Any `get_attr(target, args, kwargs)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/fx/interpreter.html#Interpreter.get_attr) Execute a `get_attr` node. Will retrieve an attribute value from the `Module` hierarchy of `self.module`. Parameters * **target** (_Target_) – The call target for this node. See [Node](https://pytorch.org/docs/master/fx.html#torch.fx.Node) for details on semantics * **args** (_Tuple_) – Tuple of positional args for this invocation * **kwargs** (_Dict_) – Dict of keyword arguments for this invocation Returns The value of the attribute that was retrieved Return type Any `map_nodes_to_values(args, n)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/fx/interpreter.html#Interpreter.map_nodes_to_values) Recursively descend through `args` and look up the concrete value for each `Node` in the current execution environment. Parameters * **args** (_Argument_) – Data structure within which to look up concrete values * **n** (Node) – Node to which `args` belongs. This is only used for error reporting. `output(target, args, kwargs)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/fx/interpreter.html#Interpreter.output) Execute an `output` node. This really just retrieves the value referenced by the `output` node and returns it. Parameters * **target** (_Target_) – The call target for this node. See [Node](https://pytorch.org/docs/master/fx.html#torch.fx.Node) for details on semantics * **args** (_Tuple_) – Tuple of positional args for this invocation * **kwargs** (_Dict_) – Dict of keyword arguments for this invocation Returns The return value referenced by the output node Return type Any `placeholder(target, args, kwargs)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/fx/interpreter.html#Interpreter.placeholder) Execute a `placeholder` node. 
Note that this is stateful: `Interpreter` maintains an internal iterator over arguments passed to `run` and this method returns next() on that iterator. Parameters * **target** (_Target_) – The call target for this node. See [Node](https://pytorch.org/docs/master/fx.html#torch.fx.Node) for details on semantics * **args** (_Tuple_) – Tuple of positional args for this invocation * **kwargs** (_Dict_) – Dict of keyword arguments for this invocation Returns The argument value that was retrieved. Return type Any

`run(*args, initial_env=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/fx/interpreter.html#Interpreter.run) Run `module` via interpretation and return the result. Parameters * ***args** – The arguments to the Module to run, in positional order * **initial_env** (_Optional_ _[__Dict_ _[_Node _,__Any_ _]__]_) – An optional starting environment for execution. This is a dict mapping `Node` to any value. This can be used, for example, to pre-populate results for certain `Nodes` so as to do only partial evaluation within the interpreter. Returns The value returned from executing the Module Return type Any

`run_node(n)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/fx/interpreter.html#Interpreter.run_node) Run a specific node `n` and return the result. Calls into placeholder, get_attr, call_function, call_method, call_module, or output depending on `node.op` Parameters **n** (Node) – The Node to execute Returns The result of executing `n` Return type Any

`class torch.fx.Transformer(module)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/fx/interpreter.html#Transformer) `Transformer` is a special type of interpreter that produces a new `Module`. It exposes a `transform()` method that returns the transformed `Module`. Unlike `Interpreter`, `Transformer` does not require arguments to run, since it works entirely symbolically.

#### Example

Suppose we want to swap all instances of `torch.neg` with `torch.sigmoid` and vice versa (including their `Tensor` method equivalents). We could subclass `Transformer` like so:

    class NegSigmSwapXformer(Transformer):
        def call_function(self, target : 'Target', args : Tuple[Argument, ...], kwargs : Dict[str, Any]) -> Any:
            if target == torch.sigmoid:
                return torch.neg(*args, **kwargs)
            return super().call_function(target, args, kwargs)

        def call_method(self, target : 'Target', args : Tuple[Argument, ...], kwargs : Dict[str, Any]) -> Any:
            if target == 'neg':
                call_self, *args_tail = args
                return call_self.sigmoid(*args_tail, **kwargs)
            return super().call_method(target, args, kwargs)

    def fn(x):
        return torch.sigmoid(x).neg()

    gm = torch.fx.symbolic_trace(fn)

    transformed : torch.nn.Module = NegSigmSwapXformer(gm).transform()
    input = torch.randn(3, 4)
    torch.testing.assert_allclose(transformed(input), torch.neg(input).sigmoid())

Parameters **module** (GraphModule) – The `Module` to be transformed.

`get_attr(target, args, kwargs)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/fx/interpreter.html#Transformer.get_attr) Execute a `get_attr` node. In `Transformer`, this is overridden to insert a new `get_attr` node into the output graph. Parameters * **target** (_Target_) – The call target for this node.
See [Node](https://pytorch.org/docs/master/fx.html#torch.fx.Node) for details on semantics * **args** (_Tuple_) – Tuple of positional args for this invocation * **kwargs** (_Dict_) – Dict of keyword arguments for this invocation `placeholder(target, args, kwargs)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/fx/interpreter.html#Transformer.placeholder) Execute a `placeholder` node. In `Transformer`, this is overridden to insert a new `placeholder` into the output graph. Parameters * **target** (_Target_) – The call target for this node. See [Node](https://pytorch.org/docs/master/fx.html#torch.fx.Node) for details on semantics * **args** (_Tuple_) – Tuple of positional args for this invocation * **kwargs** (_Dict_) – Dict of keyword arguments for this invocation `transform()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/fx/interpreter.html#Transformer.transform) Transform `self.module` and return the transformed `GraphModule`. `torch.fx.replace_pattern(gm, pattern, replacement)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/fx/subgraph_rewriter.html#replace_pattern) Matches all possible non-overlapping sets of operators and their data dependencies (`pattern`) in the Graph of a GraphModule (`gm`), then replaces each of these matched subgraphs with another subgraph (`replacement`). Parameters * **gm** – The GraphModule that wraps the Graph to operate on * **pattern** – The subgraph to match in `gm` for replacement * **replacement** – The subgraph to replace `pattern` with Returns A list of `Match` objects representing the places in the original graph that `pattern` was matched to. The list is empty if there are no matches. `Match` is defined as: class Match(NamedTuple): # Node from which the match was found anchor: Node # Maps nodes in the pattern subgraph to nodes in the larger graph nodes_map: Dict[Node, Node] Return type List[Match] Examples: import torch from torch.fx import symbolic_trace, subgraph_rewriter class M(torch.nn.Module): def __init__(self): super().__init__() def forward(self, x, w1, w2): m1 = torch.cat([w1, w2]).sum() m2 = torch.cat([w1, w2]).sum() return x + torch.max(m1) + torch.max(m2) def pattern(w1, w2): return torch.cat([w1, w2]).sum() def replacement(w1, w2): return torch.stack([w1, w2]) traced_module = symbolic_trace(M()) subgraph_rewriter.replace_pattern(traced_module, pattern, replacement) The above code will first match `pattern` in the `forward` method of `traced_module`. Pattern-matching is done based on use-def relationships, not node names. For example, if you had `p = torch.cat([a, b])` in `pattern`, you could match `m = torch.cat([a, b])` in the original `forward` function, despite the variable names being different (`p` vs `m`). The `return` statement in `pattern` is matched based on its value only; it may or may not match to the `return` statement in the larger graph. In other words, the pattern doesn’t have to extend to the end of the larger graph. When the pattern is matched, it will be removed from the larger function and replaced by `replacement`. If there are multiple matches for `pattern` in the larger function, each non-overlapping match will be replaced. In the case of a match overlap, the first found match in the set of overlapping matches will be replaced. (“First” here being defined as the first in a topological ordering of the Nodes’ use-def relationships. In most cases, the first Node is the parameter that appears directly after `self`, while the last Node is whatever the function returns.) 
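As a small sketch (reusing `traced_module`, `pattern`, and `replacement` from the example above), the call shown above can capture its return value to check how many sites were rewritten:

    matches = subgraph_rewriter.replace_pattern(traced_module, pattern, replacement)
    # One Match per non-overlapping occurrence that was rewritten; for the
    # freshly traced module above this should be 2 (m1 and m2).
    print(len(matches))
    for match in matches:
        print(match.anchor.name)       # the node from which each match was found
    traced_module.graph.lint()         # optional sanity check on the rewritten graph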
One important thing to note is that the parameters of the `pattern` Callable must be used in the Callable itself, and the parameters of the `replacement` Callable must match the pattern. The first rule is why, in the above code block, the `forward` function has parameters `x, w1, w2`, but the `pattern` function only has parameters `w1, w2`. `pattern` doesn’t use `x`, so it shouldn’t specify `x` as a parameter. As an example of the second rule, consider replacing

    def pattern(x, y):
        return torch.neg(x) + torch.relu(y)

with

    def replacement(x, y):
        return torch.relu(x)

In this case, `replacement` needs the same number of parameters as `pattern` (both `x` and `y`), even though the parameter `y` isn’t used in `replacement`. After calling `subgraph_rewriter.replace_pattern`, the generated Python code looks like this:

    def forward(self, x, w1, w2):
        stack_1 = torch.stack([w1, w2])
        sum_1 = stack_1.sum()
        stack_2 = torch.stack([w1, w2])
        sum_2 = stack_2.sum()
        max_1 = torch.max(sum_1)
        add_1 = x + max_1
        max_2 = torch.max(sum_2)
        add_2 = add_1 + max_2
        return add_2

# torch._assert

`torch._assert(condition, message)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch.html#_assert) A wrapper around Python’s assert which is symbolically traceable.

# torch.abs

`torch.abs(input, *, out=None) → Tensor` Computes the absolute value of each element in `input`. \text{out}_{i} = |\text{input}_{i}| Parameters **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. Example:

    >>> torch.abs(torch.tensor([-1, -2, 3]))
    tensor([ 1, 2, 3])

# torch.absolute

`torch.absolute(input, *, out=None) → Tensor` Alias for [`torch.abs()`](torch.abs#torch.abs "torch.abs")

# torch.acos

`torch.acos(input, *, out=None) → Tensor` Computes the inverse cosine of each element in `input`. \text{out}_{i} = \cos^{-1}(\text{input}_{i}) Parameters **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. Example:

    >>> a = torch.randn(4)
    >>> a
    tensor([ 0.3348, -0.5889, 0.2005, -0.1584])
    >>> torch.acos(a)
    tensor([ 1.2294, 2.2004, 1.3690, 1.7298])

# torch.acosh

`torch.acosh(input, *, out=None) → Tensor` Returns a new tensor with the inverse hyperbolic cosine of the elements of `input`. Note The domain of the inverse hyperbolic cosine is `[1, inf)` and values outside this range will be mapped to `NaN`, except for `+ INF` for which the output is mapped to `+ INF`. \text{out}_{i} = \cosh^{-1}(\text{input}_{i}) Parameters **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. Example:

    >>> a = torch.randn(4).uniform_(1, 2)
    >>> a
    tensor([ 1.3192, 1.9915, 1.9674, 1.7151 ])
    >>> torch.acosh(a)
    tensor([ 0.7791, 1.3120, 1.2979, 1.1341 ])

# torch.add

`torch.add(input, other, *, out=None)` Adds the scalar `other` to each element of the input `input` and returns a new resulting tensor. \text{out} = \text{input} + \text{other} If `input` is of type FloatTensor or DoubleTensor, `other` must be a real number, otherwise it should be an integer. Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor.
* **value** (_Number_) – the number to be added to each element of `input` Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. Example: >>> a = torch.randn(4) >>> a tensor([ 0.0202, 1.0985, 1.3506, -0.6056]) >>> torch.add(a, 20) tensor([ 20.0202, 21.0985, 21.3506, 19.3944]) `torch.add(input, other, *, alpha=1, out=None)` Each element of the tensor `other` is multiplied by the scalar `alpha` and added to each element of the tensor `input`. The resulting tensor is returned. The shapes of `input` and `other` must be [broadcastable](https://pytorch.org/docs/1.8.0/notes/broadcasting.html#broadcasting- semantics). out=input+alpha×other\text{out} = \text{input} + \text{alpha} \times \text{other} If `other` is of type FloatTensor or DoubleTensor, `alpha` must be a real number, otherwise it should be an integer. Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the first input tensor * **other** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the second input tensor Keyword Arguments * **alpha** (_Number_) – the scalar multiplier for `other` * **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. Example: >>> a = torch.randn(4) >>> a tensor([-0.9732, -0.3497, 0.6245, 0.4022]) >>> b = torch.randn(4, 1) >>> b tensor([[ 0.3743], [-1.7724], [-0.5811], [-0.8017]]) >>> torch.add(a, b, alpha=10) tensor([[ 2.7695, 3.3930, 4.3672, 4.1450], [-18.6971, -18.0736, -17.0994, -17.3216], [ -6.7845, -6.1610, -5.1868, -5.4090], [ -8.9902, -8.3667, -7.3925, -7.6147]]) # torch.addbmm `torch.addbmm(input, batch1, batch2, *, beta=1, alpha=1, out=None) → Tensor` Performs a batch matrix-matrix product of matrices stored in `batch1` and `batch2`, with a reduced add step (all matrix multiplications get accumulated along the first dimension). `input` is added to the final result. `batch1` and `batch2` must be 3-D tensors each containing the same number of matrices. If `batch1` is a (b×n×m)(b \times n \times m) tensor, `batch2` is a (b×m×p)(b \times m \times p) tensor, `input` must be [broadcastable](https://pytorch.org/docs/1.8.0/notes/broadcasting.html#broadcasting- semantics) with a (n×p)(n \times p) tensor and `out` will be a (n×p)(n \times p) tensor. out=β input+α(∑i=0b−1batch1i@batch2i)out = \beta\ \text{input} + \alpha\ (\sum_{i=0}^{b-1} \text{batch1}_i \mathbin{@} \text{batch2}_i) If `beta` is 0, then `input` will be ignored, and `nan` and `inf` in it will not be propagated. For inputs of type `FloatTensor` or `DoubleTensor`, arguments `beta` and `alpha` must be real numbers, otherwise they should be integers. This operator supports [TensorFloat32](https://pytorch.org/docs/1.8.0/notes/cuda.html#tf32-on- ampere). Parameters * **batch1** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the first batch of matrices to be multiplied * **batch2** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the second batch of matrices to be multiplied Keyword Arguments * **beta** (_Number_ _,__optional_) – multiplier for `input` (β\beta ) * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – matrix to be added * **alpha** (_Number_ _,__optional_) – multiplier for `batch1 @ batch2` (α\alpha ) * **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. 
Example: >>> M = torch.randn(3, 5) >>> batch1 = torch.randn(10, 3, 4) >>> batch2 = torch.randn(10, 4, 5) >>> torch.addbmm(M, batch1, batch2) tensor([[ 6.6311, 0.0503, 6.9768, -12.0362, -2.1653], [ -4.8185, -1.4255, -6.6760, 8.9453, 2.5743], [ -3.8202, 4.3691, 1.0943, -1.1109, 5.4730]]) # torch.addcdiv `torch.addcdiv(input, tensor1, tensor2, *, value=1, out=None) → Tensor` Performs the element-wise division of `tensor1` by `tensor2`, multiply the result by the scalar `value` and add it to `input`. Warning Integer division with addcdiv is no longer supported, and in a future release addcdiv will perform a true division of tensor1 and tensor2. The historic addcdiv behavior can be implemented as (input + value * torch.trunc(tensor1 / tensor2)).to(input.dtype) for integer inputs and as (input + value * tensor1 / tensor2) for float inputs. The future addcdiv behavior is just the latter implementation: (input + value * tensor1 / tensor2), for all dtypes. outi=inputi+value×tensor1itensor2i\text{out}_i = \text{input}_i + \text{value} \times \frac{\text{tensor1}_i}{\text{tensor2}_i} The shapes of `input`, `tensor1`, and `tensor2` must be [broadcastable](https://pytorch.org/docs/1.8.0/notes/broadcasting.html#broadcasting- semantics). For inputs of type `FloatTensor` or `DoubleTensor`, `value` must be a real number, otherwise an integer. Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the tensor to be added * **tensor1** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the numerator tensor * **tensor2** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the denominator tensor Keyword Arguments * **value** (_Number_ _,__optional_) – multiplier for tensor1/tensor2\text{tensor1} / \text{tensor2} * **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. Example: >>> t = torch.randn(1, 3) >>> t1 = torch.randn(3, 1) >>> t2 = torch.randn(1, 3) >>> torch.addcdiv(t, t1, t2, value=0.1) tensor([[-0.2312, -3.6496, 0.1312], [-1.0428, 3.4292, -0.1030], [-0.5369, -0.9829, 0.0430]]) # torch.addcmul `torch.addcmul(input, tensor1, tensor2, *, value=1, out=None) → Tensor` Performs the element-wise multiplication of `tensor1` by `tensor2`, multiply the result by the scalar `value` and add it to `input`. outi=inputi+value×tensor1i×tensor2i\text{out}_i = \text{input}_i + \text{value} \times \text{tensor1}_i \times \text{tensor2}_i The shapes of [`tensor`](torch.tensor#torch.tensor "torch.tensor"), `tensor1`, and `tensor2` must be [broadcastable](https://pytorch.org/docs/1.8.0/notes/broadcasting.html#broadcasting- semantics). For inputs of type `FloatTensor` or `DoubleTensor`, `value` must be a real number, otherwise an integer. Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the tensor to be added * **tensor1** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the tensor to be multiplied * **tensor2** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the tensor to be multiplied Keyword Arguments * **value** (_Number_ _,__optional_) – multiplier for tensor1.∗tensor2tensor1 .* tensor2 * **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. 
Example: >>> t = torch.randn(1, 3) >>> t1 = torch.randn(3, 1) >>> t2 = torch.randn(1, 3) >>> torch.addcmul(t, t1, t2, value=0.1) tensor([[-0.8635, -0.6391, 1.6174], [-0.7617, -0.5879, 1.7388], [-0.8353, -0.6249, 1.6511]]) # torch.addmm `torch.addmm(input, mat1, mat2, *, beta=1, alpha=1, out=None) → Tensor` Performs a matrix multiplication of the matrices `mat1` and `mat2`. The matrix `input` is added to the final result. If `mat1` is a (n×m)(n \times m) tensor, `mat2` is a (m×p)(m \times p) tensor, then `input` must be [broadcastable](https://pytorch.org/docs/1.8.0/notes/broadcasting.html#broadcasting- semantics) with a (n×p)(n \times p) tensor and `out` will be a (n×p)(n \times p) tensor. `alpha` and `beta` are scaling factors on matrix-vector product between `mat1` and `mat2` and the added matrix `input` respectively. out=β input+α(mat1i@mat2i)\text{out} = \beta\ \text{input} + \alpha\ (\text{mat1}_i \mathbin{@} \text{mat2}_i) If `beta` is 0, then `input` will be ignored, and `nan` and `inf` in it will not be propagated. For inputs of type `FloatTensor` or `DoubleTensor`, arguments `beta` and `alpha` must be real numbers, otherwise they should be integers. This operator supports [TensorFloat32](https://pytorch.org/docs/1.8.0/notes/cuda.html#tf32-on- ampere). Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – matrix to be added * **mat1** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the first matrix to be matrix multiplied * **mat2** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the second matrix to be matrix multiplied Keyword Arguments * **beta** (_Number_ _,__optional_) – multiplier for `input` (β\beta ) * **alpha** (_Number_ _,__optional_) – multiplier for mat1@mat2mat1 @ mat2 (α\alpha ) * **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. Example: >>> M = torch.randn(2, 3) >>> mat1 = torch.randn(2, 3) >>> mat2 = torch.randn(3, 3) >>> torch.addmm(M, mat1, mat2) tensor([[-4.8716, 1.4671, -1.3746], [ 0.7573, -3.9555, -2.8681]]) # torch.addmv `torch.addmv(input, mat, vec, *, beta=1, alpha=1, out=None) → Tensor` Performs a matrix-vector product of the matrix `mat` and the vector `vec`. The vector `input` is added to the final result. If `mat` is a (n×m)(n \times m) tensor, `vec` is a 1-D tensor of size `m`, then `input` must be [broadcastable](https://pytorch.org/docs/1.8.0/notes/broadcasting.html#broadcasting- semantics) with a 1-D tensor of size `n` and `out` will be 1-D tensor of size `n`. `alpha` and `beta` are scaling factors on matrix-vector product between `mat` and `vec` and the added tensor `input` respectively. out=β input+α(mat@vec)\text{out} = \beta\ \text{input} + \alpha\ (\text{mat} \mathbin{@} \text{vec}) If `beta` is 0, then `input` will be ignored, and `nan` and `inf` in it will not be propagated. For inputs of type `FloatTensor` or `DoubleTensor`, arguments `beta` and `alpha` must be real numbers, otherwise they should be integers Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – vector to be added * **mat** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – matrix to be matrix multiplied * **vec** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – vector to be matrix multiplied Keyword Arguments * **beta** (_Number_ _,__optional_) – multiplier for `input` (β\beta ) * **alpha** (_Number_ _,__optional_) – multiplier for mat@vecmat @ vec (α\alpha ) * **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. 
Example: >>> M = torch.randn(2) >>> mat = torch.randn(2, 3) >>> vec = torch.randn(3) >>> torch.addmv(M, mat, vec) tensor([-0.3768, -5.5565]) # torch.addr `torch.addr(input, vec1, vec2, *, beta=1, alpha=1, out=None) → Tensor` Performs the outer-product of vectors `vec1` and `vec2` and adds it to the matrix `input`. Optional values `beta` and `alpha` are scaling factors on the outer product between `vec1` and `vec2` and the added matrix `input` respectively. out=β input+α(vec1⊗vec2)\text{out} = \beta\ \text{input} + \alpha\ (\text{vec1} \otimes \text{vec2}) If `beta` is 0, then `input` will be ignored, and `nan` and `inf` in it will not be propagated. If `vec1` is a vector of size `n` and `vec2` is a vector of size `m`, then `input` must be [broadcastable](https://pytorch.org/docs/1.8.0/notes/broadcasting.html#broadcasting- semantics) with a matrix of size (n×m)(n \times m) and `out` will be a matrix of size (n×m)(n \times m) . Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – matrix to be added * **vec1** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the first vector of the outer product * **vec2** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the second vector of the outer product Keyword Arguments * **beta** (_Number_ _,__optional_) – multiplier for `input` (β\beta ) * **alpha** (_Number_ _,__optional_) – multiplier for vec1⊗vec2\text{vec1} \otimes \text{vec2} (α\alpha ) * **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. Example: >>> vec1 = torch.arange(1., 4.) >>> vec2 = torch.arange(1., 3.) >>> M = torch.zeros(3, 2) >>> torch.addr(M, vec1, vec2) tensor([[ 1., 2.], [ 2., 4.], [ 3., 6.]]) # torch.all `torch.all(input) → Tensor` Tests if all elements in `input` evaluate to `True`. Note This function matches the behaviour of NumPy in returning output of dtype `bool` for all supported dtypes except `uint8`. For `uint8` the dtype of output is `uint8` itself. Example: >>> a = torch.rand(1, 2).bool() >>> a tensor([[False, True]], dtype=torch.bool) >>> torch.all(a) tensor(False, dtype=torch.bool) >>> a = torch.arange(0, 3) >>> a tensor([0, 1, 2]) >>> torch.all(a) tensor(False) `torch.all(input, dim, keepdim=False, *, out=None) → Tensor` For each row of `input` in the given dimension `dim`, returns `True` if all elements in the row evaluate to `True` and `False` otherwise. If `keepdim` is `True`, the output tensor is of the same size as `input` except in the dimension `dim` where it is of size 1. Otherwise, `dim` is squeezed (see [`torch.squeeze()`](torch.squeeze#torch.squeeze "torch.squeeze")), resulting in the output tensor having 1 fewer dimension than `input`. Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. * **dim** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – the dimension to reduce. * **keepdim** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – whether the output tensor has `dim` retained or not. Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. 
Example: >>> a = torch.rand(4, 2).bool() >>> a tensor([[True, True], [True, False], [True, True], [True, True]], dtype=torch.bool) >>> torch.all(a, dim=1) tensor([ True, False, True, True], dtype=torch.bool) >>> torch.all(a, dim=0) tensor([ True, False], dtype=torch.bool) # torch.allclose `torch.allclose(input, other, rtol=1e-05, atol=1e-08, equal_nan=False) → bool` This function checks if all `input` and `other` satisfy the condition: ∣input−other∣≤atol+rtol×∣other∣\lvert \text{input} - \text{other} \rvert \leq \texttt{atol} + \texttt{rtol} \times \lvert \text{other} \rvert elementwise, for all elements of `input` and `other`. The behaviour of this function is analogous to [numpy.allclose](https://docs.scipy.org/doc/numpy/reference/generated/numpy.allclose.html) Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – first tensor to compare * **other** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – second tensor to compare * **atol** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _,__optional_) – absolute tolerance. Default: 1e-08 * **rtol** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _,__optional_) – relative tolerance. Default: 1e-05 * **equal_nan** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – if `True`, then two `NaN` s will be considered equal. Default: `False` Example: >>> torch.allclose(torch.tensor([10000., 1e-07]), torch.tensor([10000.1, 1e-08])) False >>> torch.allclose(torch.tensor([10000., 1e-08]), torch.tensor([10000.1, 1e-09])) True >>> torch.allclose(torch.tensor([1.0, float('nan')]), torch.tensor([1.0, float('nan')])) False >>> torch.allclose(torch.tensor([1.0, float('nan')]), torch.tensor([1.0, float('nan')]), equal_nan=True) True # torch.amax `torch.amax(input, dim, keepdim=False, *, out=None) → Tensor` Returns the maximum value of each slice of the `input` tensor in the given dimension(s) `dim`. Note `The difference between max/min and amax/amin is:` * `amax`/`amin` supports reducing on multiple dimensions, * `amax`/`amin` does not return indices, * `amax`/`amin` evenly distributes gradient between equal values, while `max(dim)`/`min(dim)` propagates gradient only to a single index in the source tensor. If `keepdim is ``True``, the output tensors are of the same size as `input` except in the dimension(s) `dim` where they are of size 1. Otherwise, `dim`s are squeezed (see :func:`torch.squeeze`), resulting in the output tensors having fewer dimension than `input`. Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. * **dim** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _or_ _tuple of python:ints_) – the dimension or dimensions to reduce. * **keepdim** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – whether the output tensor has `dim` retained or not. Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. Example: >>> a = torch.randn(4, 4) >>> a tensor([[ 0.8177, 1.4878, -0.2491, 0.9130], [-0.7158, 1.1775, 2.0992, 0.4817], [-0.0053, 0.0164, -1.3738, -0.0507], [ 1.9700, 1.1106, -1.0318, -1.0816]]) >>> torch.amax(a, 1) tensor([1.4878, 2.0992, 0.0164, 1.9700]) # torch.amin `torch.amin(input, dim, keepdim=False, *, out=None) → Tensor` Returns the minimum value of each slice of the `input` tensor in the given dimension(s) `dim`. 
Note `The difference between max/min and amax/amin is:` * `amax`/`amin` supports reducing on multiple dimensions, * `amax`/`amin` does not return indices, * `amax`/`amin` evenly distributes gradient between equal values, while `max(dim)`/`min(dim)` propagates gradient only to a single index in the source tensor. If `keepdim` is `True`, the output tensors are of the same size as `input` except in the dimension(s) `dim` where they are of size 1. Otherwise, `dim`s are squeezed (see :func:`torch.squeeze`), resulting in the output tensors having fewer dimensions than `input`. Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. * **dim** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _or_ _tuple of python:ints_) – the dimension or dimensions to reduce. * **keepdim** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – whether the output tensor has `dim` retained or not. Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. Example: >>> a = torch.randn(4, 4) >>> a tensor([[ 0.6451, -0.4866, 0.2987, -1.3312], [-0.5744, 1.2980, 1.8397, -0.2713], [ 0.9128, 0.9214, -1.7268, -0.2995], [ 0.9023, 0.4853, 0.9075, -1.6165]]) >>> torch.amin(a, 1) tensor([-1.3312, -0.5744, -1.7268, -1.6165]) # torch.angle `torch.angle(input, *, out=None) → Tensor` Computes the element-wise angle (in radians) of the given `input` tensor. outi=angle(inputi)\text{out}_{i} = angle(\text{input}_{i}) Parameters **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. Note Starting in PyTorch 1.8, angle returns pi for negative real numbers, zero for non-negative real numbers, and propagates NaNs. Previously the function would return zero for all real numbers and not propagate floating-point NaNs. Example: >>> torch.angle(torch.tensor([-1 + 1j, -2 + 2j, 3 - 3j]))*180/3.14159 tensor([ 135., 135, -45]) # torch.any `torch.any(input) → Tensor` Parameters **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. Tests if any element in `input` evaluates to `True`. Note This function matches the behaviour of NumPy in returning output of dtype `bool` for all supported dtypes except `uint8`. For `uint8` the dtype of output is `uint8` itself. Example: >>> a = torch.rand(1, 2).bool() >>> a tensor([[False, True]], dtype=torch.bool) >>> torch.any(a) tensor(True, dtype=torch.bool) >>> a = torch.arange(0, 3) >>> a tensor([0, 1, 2]) >>> torch.any(a) tensor(True) `torch.any(input, dim, keepdim=False, *, out=None) → Tensor` For each row of `input` in the given dimension `dim`, returns `True` if any element in the row evaluate to `True` and `False` otherwise. If `keepdim` is `True`, the output tensor is of the same size as `input` except in the dimension `dim` where it is of size 1. Otherwise, `dim` is squeezed (see [`torch.squeeze()`](torch.squeeze#torch.squeeze "torch.squeeze")), resulting in the output tensor having 1 fewer dimension than `input`. Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. * **dim** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – the dimension to reduce. * **keepdim** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – whether the output tensor has `dim` retained or not. 
Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. Example: >>> a = torch.randn(4, 2) < 0 >>> a tensor([[ True, True], [False, True], [ True, True], [False, False]]) >>> torch.any(a, 1) tensor([ True, True, True, False]) >>> torch.any(a, 0) tensor([True, True]) # torch.arange `torch.arange(start=0, end, step=1, *, out=None, dtype=None, layout=torch.strided, device=None, requires_grad=False) → Tensor` Returns a 1-D tensor of size ⌈end−startstep⌉\left\lceil \frac{\text{end} - \text{start}}{\text{step}} \right\rceil with values from the interval `[start, end)` taken with common difference `step` beginning from `start`. Note that non-integer `step` is subject to floating point rounding errors when comparing against `end`; to avoid inconsistency, we advise adding a small epsilon to `end` in such cases. outi+1=outi+step\text{out}_{{i+1}} = \text{out}_{i} + \text{step} Parameters * **start** (_Number_) – the starting value for the set of points. Default: `0`. * **end** (_Number_) – the ending value for the set of points * **step** (_Number_) – the gap between each pair of adjacent points. Default: `1`. Keyword Arguments * **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. * **dtype** ([`torch.dtype`](../tensor_attributes#torch.torch.dtype "torch.torch.dtype"), optional) – the desired data type of returned tensor. Default: if `None`, uses a global default (see [`torch.set_default_tensor_type()`](torch.set_default_tensor_type#torch.set_default_tensor_type "torch.set_default_tensor_type")). If `dtype` is not given, infer the data type from the other input arguments. If any of `start`, `end`, or `stop` are floating-point, the `dtype` is inferred to be the default dtype, see [`get_default_dtype()`](torch.get_default_dtype#torch.get_default_dtype "torch.get_default_dtype"). Otherwise, the `dtype` is inferred to be `torch.int64`. * **layout** ([`torch.layout`](../tensor_attributes#torch.torch.layout "torch.torch.layout"), optional) – the desired layout of returned Tensor. Default: `torch.strided`. * **device** ([`torch.device`](../tensor_attributes#torch.torch.device "torch.torch.device"), optional) – the desired device of returned tensor. Default: if `None`, uses the current device for the default tensor type (see [`torch.set_default_tensor_type()`](torch.set_default_tensor_type#torch.set_default_tensor_type "torch.set_default_tensor_type")). `device` will be the CPU for CPU tensor types and the current CUDA device for CUDA tensor types. * **requires_grad** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – If autograd should record operations on the returned tensor. Default: `False`. Example: >>> torch.arange(5) tensor([ 0, 1, 2, 3, 4]) >>> torch.arange(1, 4) tensor([ 1, 2, 3]) >>> torch.arange(1, 2.5, 0.5) tensor([ 1.0000, 1.5000, 2.0000]) # torch.arccos `torch.arccos(input, *, out=None) → Tensor` Alias for [`torch.acos()`](torch.acos#torch.acos "torch.acos"). # torch.arccosh `torch.arccosh(input, *, out=None) → Tensor` Alias for [`torch.acosh()`](torch.acosh#torch.acosh "torch.acosh"). # torch.arcsin `torch.arcsin(input, *, out=None) → Tensor` Alias for [`torch.asin()`](torch.asin#torch.asin "torch.asin"). # torch.arcsinh `torch.arcsinh(input, *, out=None) → Tensor` Alias for [`torch.asinh()`](torch.asinh#torch.asinh "torch.asinh"). 
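Returning to the `torch.arange` entry above: its dtype inference and the floating-point-step caveat can be seen in a minimal sketch (the epsilon value below is an illustrative choice, not one prescribed by the documentation):

```python
import torch

# Integer arguments infer torch.int64; a floating-point argument infers the default dtype.
print(torch.arange(5).dtype)          # torch.int64
print(torch.arange(0, 5, 0.5).dtype)  # torch.float32 (the default dtype)

# With a non-integer step, rounding error in the comparison against `end` can
# change the length; adding a small epsilon to `end` keeps it consistent.
eps = 1e-6
t = torch.arange(0.0, 1.0 + eps, 0.1)
print(t.numel())  # 11 values: 0.0, 0.1, ..., 1.0
```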
# torch.arctan

`torch.arctan(input, *, out=None) → Tensor` Alias for [`torch.atan()`](torch.atan#torch.atan "torch.atan").

# torch.arctanh

`torch.arctanh(input, *, out=None) → Tensor` Alias for [`torch.atanh()`](torch.atanh#torch.atanh "torch.atanh").

# torch.are_deterministic_algorithms_enabled

`torch.are_deterministic_algorithms_enabled()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch.html#are_deterministic_algorithms_enabled) Returns True if the global deterministic flag is turned on. Refer to [`torch.use_deterministic_algorithms()`](torch.use_deterministic_algorithms#torch.use_deterministic_algorithms "torch.use_deterministic_algorithms") documentation for more details.

# torch.argmax

`torch.argmax(input) → LongTensor` Returns the indices of the maximum value of all elements in the `input` tensor. This is the second value returned by [`torch.max()`](torch.max#torch.max "torch.max"). See its documentation for the exact semantics of this method. Note If there are multiple maximal values then the indices of the first maximal value are returned. Parameters **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. Example: >>> a = torch.randn(4, 4) >>> a tensor([[ 1.3398, 0.2663, -0.2686, 0.2450], [-0.7401, -0.8805, -0.3402, -1.1936], [ 0.4907, -1.3948, -1.0691, -0.3132], [-1.6092, 0.5419, -0.2993, 0.3195]]) >>> torch.argmax(a) tensor(0) `torch.argmax(input, dim, keepdim=False) → LongTensor` Returns the indices of the maximum values of a tensor across a dimension. This is the second value returned by [`torch.max()`](torch.max#torch.max "torch.max"). See its documentation for the exact semantics of this method. Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. * **dim** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – the dimension to reduce. If `None`, the argmax of the flattened input is returned. * **keepdim** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – whether the output tensor has `dim` retained or not. Ignored if `dim=None`. Example: >>> a = torch.randn(4, 4) >>> a tensor([[ 1.3398, 0.2663, -0.2686, 0.2450], [-0.7401, -0.8805, -0.3402, -1.1936], [ 0.4907, -1.3948, -1.0691, -0.3132], [-1.6092, 0.5419, -0.2993, 0.3195]]) >>> torch.argmax(a, dim=1) tensor([ 0, 2, 0, 1])

# torch.argmin

`torch.argmin(input, dim=None, keepdim=False) → LongTensor` Returns the indices of the minimum value(s) of the flattened tensor or along a dimension. This is the second value returned by [`torch.min()`](torch.min#torch.min "torch.min"). See its documentation for the exact semantics of this method. Note If there are multiple minimal values then the indices of the first minimal value are returned. Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. * **dim** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – the dimension to reduce. If `None`, the argmin of the flattened input is returned. * **keepdim** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – whether the output tensor has `dim` retained or not. Ignored if `dim=None`.
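The tie-breaking rule in the notes above (the index of the first maximal or minimal value is returned) can be checked directly; a minimal sketch with deliberately repeated values (the tensor is illustrative):

```python
import torch

x = torch.tensor([3.0, 1.0, 3.0, 1.0])

# The maximum 3.0 appears at indices 0 and 2; the minimum 1.0 at indices 1 and 3.
print(torch.argmax(x))  # tensor(0) -- index of the first maximal value
print(torch.argmin(x))  # tensor(1) -- index of the first minimal value
```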
Example: >>> a = torch.randn(4, 4) >>> a tensor([[ 0.1139, 0.2254, -0.1381, 0.3687], [ 1.0100, -1.1975, -0.0102, -0.4732], [-0.9240, 0.1207, -0.7506, -1.0213], [ 1.7809, -1.2960, 0.9384, 0.1438]]) >>> torch.argmin(a) tensor(13) >>> torch.argmin(a, dim=1) tensor([ 2, 1, 3, 1]) >>> torch.argmin(a, dim=1, keepdim=True) tensor([[2], [1], [3], [1]]) # torch.argsort `torch.argsort(input, dim=-1, descending=False) → LongTensor` Returns the indices that sort a tensor along a given dimension in ascending order by value. This is the second value returned by [`torch.sort()`](torch.sort#torch.sort "torch.sort"). See its documentation for the exact semantics of this method. Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. * **dim** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – the dimension to sort along * **descending** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – controls the sorting order (ascending or descending) Example: >>> a = torch.randn(4, 4) >>> a tensor([[ 0.0785, 1.5267, -0.8521, 0.4065], [ 0.1598, 0.0788, -0.0745, -1.2700], [ 1.2208, 1.0722, -0.7064, 1.2564], [ 0.0669, -0.2318, -0.8229, -0.9280]]) >>> torch.argsort(a, dim=1) tensor([[2, 0, 3, 1], [3, 2, 1, 0], [2, 1, 0, 3], [3, 2, 1, 0]]) # torch.as_strided `torch.as_strided(input, size, stride, storage_offset=0) → Tensor` Create a view of an existing `torch.Tensor` `input` with specified `size`, `stride` and `storage_offset`. Warning More than one element of a created tensor may refer to a single memory location. As a result, in-place operations (especially ones that are vectorized) may result in incorrect behavior. If you need to write to the tensors, please clone them first. Many PyTorch functions, which return a view of a tensor, are internally implemented with this function. Those functions, like [`torch.Tensor.expand()`](../tensors#torch.Tensor.expand "torch.Tensor.expand"), are easier to read and are therefore more advisable to use. Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. * **size** ([tuple](https://docs.python.org/3/library/stdtypes.html#tuple "\(in Python v3.9\)") _or_ _ints_) – the shape of the output tensor * **stride** ([tuple](https://docs.python.org/3/library/stdtypes.html#tuple "\(in Python v3.9\)") _or_ _ints_) – the stride of the output tensor * **storage_offset** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – the offset in the underlying storage of the output tensor Example: >>> x = torch.randn(3, 3) >>> x tensor([[ 0.9039, 0.6291, 1.0795], [ 0.1586, 2.1939, -0.4900], [-0.1909, -0.7503, 1.9355]]) >>> t = torch.as_strided(x, (2, 2), (1, 2)) >>> t tensor([[0.9039, 1.0795], [0.6291, 0.1586]]) >>> t = torch.as_strided(x, (2, 2), (1, 2), 1) tensor([[0.6291, 0.1586], [1.0795, 2.1939]]) # torch.as_tensor `torch.as_tensor(data, dtype=None, device=None) → Tensor` Convert the data into a `torch.Tensor`. If the data is already a `Tensor` with the same `dtype` and `device`, no copy will be performed, otherwise a new `Tensor` will be returned with computational graph retained if data `Tensor` has `requires_grad=True`. Similarly, if the data is an `ndarray` of the corresponding `dtype` and the `device` is the cpu, no copy will be performed. Parameters * **data** (_array_like_) – Initial data for the tensor. Can be a list, tuple, NumPy `ndarray`, scalar, and other types. 
* **dtype** ([`torch.dtype`](../tensor_attributes#torch.torch.dtype "torch.torch.dtype"), optional) – the desired data type of returned tensor. Default: if `None`, infers data type from `data`. * **device** ([`torch.device`](../tensor_attributes#torch.torch.device "torch.torch.device"), optional) – the desired device of returned tensor. Default: if `None`, uses the current device for the default tensor type (see [`torch.set_default_tensor_type()`](torch.set_default_tensor_type#torch.set_default_tensor_type "torch.set_default_tensor_type")). `device` will be the CPU for CPU tensor types and the current CUDA device for CUDA tensor types. Example: >>> a = numpy.array([1, 2, 3]) >>> t = torch.as_tensor(a) >>> t tensor([ 1, 2, 3]) >>> t[0] = -1 >>> a array([-1, 2, 3]) >>> a = numpy.array([1, 2, 3]) >>> t = torch.as_tensor(a, device=torch.device('cuda')) >>> t tensor([ 1, 2, 3]) >>> t[0] = -1 >>> a array([1, 2, 3]) # torch.asin `torch.asin(input, *, out=None) → Tensor` Returns a new tensor with the arcsine of the elements of `input`. outi=sin⁡−1(inputi)\text{out}_{i} = \sin^{-1}(\text{input}_{i}) Parameters **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. Example: >>> a = torch.randn(4) >>> a tensor([-0.5962, 1.4985, -0.4396, 1.4525]) >>> torch.asin(a) tensor([-0.6387, nan, -0.4552, nan]) # torch.asinh `torch.asinh(input, *, out=None) → Tensor` Returns a new tensor with the inverse hyperbolic sine of the elements of `input`. outi=sinh⁡−1(inputi)\text{out}_{i} = \sinh^{-1}(\text{input}_{i}) Parameters **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. Example: >>> a = torch.randn(4) >>> a tensor([ 0.1606, -1.4267, -1.0899, -1.0250 ]) >>> torch.asinh(a) tensor([ 0.1599, -1.1534, -0.9435, -0.8990 ]) # torch.atan `torch.atan(input, *, out=None) → Tensor` Returns a new tensor with the arctangent of the elements of `input`. outi=tan⁡−1(inputi)\text{out}_{i} = \tan^{-1}(\text{input}_{i}) Parameters **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. Example: >>> a = torch.randn(4) >>> a tensor([ 0.2341, 0.2539, -0.6256, -0.6448]) >>> torch.atan(a) tensor([ 0.2299, 0.2487, -0.5591, -0.5727]) # torch.atan2 `torch.atan2(input, other, *, out=None) → Tensor` Element-wise arctangent of inputi/otheri\text{input}_{i} / \text{other}_{i} with consideration of the quadrant. Returns a new tensor with the signed angles in radians between vector (otheri,inputi)(\text{other}_{i}, \text{input}_{i}) and vector (1,0)(1, 0) . (Note that otheri\text{other}_{i} , the second parameter, is the x-coordinate, while inputi\text{input}_{i} , the first parameter, is the y-coordinate.) The shapes of `input` and `other` must be [broadcastable](https://pytorch.org/docs/1.8.0/notes/broadcasting.html#broadcasting- semantics). Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the first input tensor * **other** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the second input tensor Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. 
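Since `other` is the x-coordinate, `torch.atan2` can distinguish quadrants that the plain ratio passed to `torch.atan` cannot; a minimal sketch (the values are illustrative):

```python
import math
import torch

y = torch.tensor([ 1.0, -1.0])  # y-coordinates (first argument)
x = torch.tensor([-1.0, -1.0])  # x-coordinates (second argument)

# atan of the ratio folds opposite quadrants together ...
print(torch.atan(y / x) * 180 / math.pi)   # approximately [-45., 45.]
# ... while atan2 keeps the full signed angle.
print(torch.atan2(y, x) * 180 / math.pi)   # approximately [135., -135.]
```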
Example: >>> a = torch.randn(4) >>> a tensor([ 0.9041, 0.0196, -0.3108, -2.4423]) >>> torch.atan2(a, torch.randn(4)) tensor([ 0.9833, 0.0811, -1.9743, -1.4151]) # torch.atanh `torch.atanh(input, *, out=None) → Tensor` Returns a new tensor with the inverse hyperbolic tangent of the elements of `input`. Note The domain of the inverse hyperbolic tangent is `(-1, 1)` and values outside this range will be mapped to `NaN`, except for the values `1` and `-1` for which the output is mapped to `+/-INF` respectively. outi=tanh⁡−1(inputi)\text{out}_{i} = \tanh^{-1}(\text{input}_{i}) Parameters **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. Example: >>> a = torch.randn(4).uniform_(-1, 1) >>> a tensor([ -0.9385, 0.2968, -0.8591, -0.1871 ]) >>> torch.atanh(a) tensor([ -1.7253, 0.3060, -1.2899, -0.1893 ]) # torch.atleast_1d `torch.atleast_1d(*tensors)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/functional.html#atleast_1d) Returns a 1-dimensional view of each input tensor with zero dimensions. Input tensors with one or more dimensions are returned as-is. Parameters **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _or_ _list of Tensors_) – Returns output (Tensor or tuple of Tensors) Example:: >>> x = torch.randn(2) >>> x tensor([1.4584, 0.7583]) >>> torch.atleast_1d(x) tensor([1.4584, 0.7583]) >>> x = torch.tensor(1.) >>> x tensor(1.) >>> torch.atleast_1d(x) tensor([1.]) >>> x = torch.tensor(0.5) >>> y = torch.tensor(1.) >>> torch.atleast_1d((x,y)) (tensor([0.5000]), tensor([1.])) # torch.atleast_2d `torch.atleast_2d(*tensors)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/functional.html#atleast_2d) Returns a 2-dimensional view of each input tensor with zero dimensions. Input tensors with two or more dimensions are returned as-is. :param input: :type input: Tensor or list of Tensors Returns output (Tensor or tuple of Tensors) Example:: >>> x = torch.tensor(1.) >>> x tensor(1.) >>> torch.atleast_2d(x) tensor([[1.]]) >>> x = torch.randn(2,2) >>> x tensor([[2.2086, 2.5165], [0.1757, 0.5194]]) >>> torch.atleast_2d(x) tensor([[2.2086, 2.5165], [0.1757, 0.5194]]) >>> x = torch.tensor(0.5) >>> y = torch.tensor(1.) >>> torch.atleast_2d((x,y)) (tensor([[0.5000]]), tensor([[1.]])) # torch.atleast_3d `torch.atleast_3d(*tensors)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/functional.html#atleast_3d) Returns a 3-dimensional view of each input tensor with zero dimensions. Input tensors with three or more dimensions are returned as-is. :param input: :type input: Tensor or list of Tensors Returns output (Tensor or tuple of Tensors) #### Example >>> x = torch.tensor(0.5) >>> x tensor(0.5000) >>> torch.atleast_3d(x) tensor([[[0.5000]]]) >>> y = torch.randn(2,2) >>> y tensor([[-0.8079, 0.7460], [-1.1647, 1.4734]]) >>> torch.atleast_3d(y) tensor([[[-0.8079], [ 0.7460]], [[-1.1647], [ 1.4734]]]) >>> x = torch.randn(1,1,1) >>> x tensor([[[-1.5689]]]) >>> torch.atleast_3d(x) tensor([[[-1.5689]]]) >>> x = torch.tensor(0.5) >>> y = torch.tensor(1.) >>> torch.atleast_3d((x,y)) (tensor([[[0.5000]]]), tensor([[[1.]]])) # torch.baddbmm `torch.baddbmm(input, batch1, batch2, *, beta=1, alpha=1, out=None) → Tensor` Performs a batch matrix-matrix product of matrices in `batch1` and `batch2`. `input` is added to the final result. `batch1` and `batch2` must be 3-D tensors each containing the same number of matrices. 
If `batch1` is a (b×n×m)(b \times n \times m) tensor, `batch2` is a (b×m×p)(b \times m \times p) tensor, then `input` must be [broadcastable](https://pytorch.org/docs/1.8.0/notes/broadcasting.html#broadcasting- semantics) with a (b×n×p)(b \times n \times p) tensor and `out` will be a (b×n×p)(b \times n \times p) tensor. Both `alpha` and `beta` mean the same as the scaling factors used in [`torch.addbmm()`](torch.addbmm#torch.addbmm "torch.addbmm"). outi=βinputi+α(batch1i@batch2i)\text{out}_i = \beta\ \text{input}_i + \alpha\ (\text{batch1}_i \mathbin{@} \text{batch2}_i) If `beta` is 0, then `input` will be ignored, and `nan` and `inf` in it will not be propagated. For inputs of type `FloatTensor` or `DoubleTensor`, arguments `beta` and `alpha` must be real numbers, otherwise they should be integers. This operator supports [TensorFloat32](https://pytorch.org/docs/1.8.0/notes/cuda.html#tf32-on- ampere). Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the tensor to be added * **batch1** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the first batch of matrices to be multiplied * **batch2** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the second batch of matrices to be multiplied Keyword Arguments * **beta** (_Number_ _,__optional_) – multiplier for `input` (β\beta ) * **alpha** (_Number_ _,__optional_) – multiplier for batch1@batch2\text{batch1} \mathbin{@} \text{batch2} (α\alpha ) * **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. Example: >>> M = torch.randn(10, 3, 5) >>> batch1 = torch.randn(10, 3, 4) >>> batch2 = torch.randn(10, 4, 5) >>> torch.baddbmm(M, batch1, batch2).size() torch.Size([10, 3, 5])

# torch.bartlett_window

`torch.bartlett_window(window_length, periodic=True, *, dtype=None, layout=torch.strided, device=None, requires_grad=False) → Tensor` Bartlett window function. w[n] = 1 - \left| \frac{2n}{N - 1} - 1 \right| = \begin{cases} \frac{2n}{N - 1} & \text{if } 0 \leq n \leq \frac{N - 1}{2} \\ 2 - \frac{2n}{N - 1} & \text{if } \frac{N - 1}{2} < n < N \end{cases} where N is the full window size. The input `window_length` is a positive integer controlling the returned window size. The `periodic` flag determines whether the returned window trims off the last duplicate value from the symmetric window and is ready to be used as a periodic window with functions like [`torch.stft()`](torch.stft#torch.stft "torch.stft"). Therefore, if `periodic` is true, the N in the above formula is in fact window_length + 1. Also, we always have `torch.bartlett_window(L, periodic=True)` equal to `torch.bartlett_window(L + 1, periodic=False)[:-1]`. Note If `window_length` = 1, the returned window contains a single value 1. Parameters * **window_length** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – the size of returned window * **periodic** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – If True, returns a window to be used as periodic function. If False, return a symmetric window. Keyword Arguments * **dtype** ([`torch.dtype`](../tensor_attributes#torch.torch.dtype "torch.torch.dtype"), optional) – the desired data type of returned tensor. Default: if `None`, uses a global default (see [`torch.set_default_tensor_type()`](torch.set_default_tensor_type#torch.set_default_tensor_type "torch.set_default_tensor_type")). Only floating point types are supported. * **layout** ([`torch.layout`](../tensor_attributes#torch.torch.layout "torch.torch.layout"), optional) – the desired layout of returned window tensor. Only `torch.strided` (dense layout) is supported. * **device** ([`torch.device`](../tensor_attributes#torch.torch.device "torch.torch.device"), optional) – the desired device of returned tensor. Default: if `None`, uses the current device for the default tensor type (see [`torch.set_default_tensor_type()`](torch.set_default_tensor_type#torch.set_default_tensor_type "torch.set_default_tensor_type")). `device` will be the CPU for CPU tensor types and the current CUDA device for CUDA tensor types. * **requires_grad** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – If autograd should record operations on the returned tensor. Default: `False`. Returns A 1-D tensor of size (window_length,) containing the window Return type [Tensor](../tensors#torch.Tensor "torch.Tensor")

# torch.bernoulli

`torch.bernoulli(input, *, generator=None, out=None) → Tensor` Draws binary random numbers (0 or 1) from a Bernoulli distribution. The `input` tensor should be a tensor containing probabilities to be used for drawing the binary random number. Hence, all values in `input` have to be in the range 0 ≤ input_i ≤ 1. The i-th element of the output tensor will draw a value 1 according to the i-th probability value given in `input`: \text{out}_{i} \sim \mathrm{Bernoulli}(p = \text{input}_{i}) The returned `out` tensor only has values 0 or 1 and is of the same shape as `input`. `out` can have integral `dtype`, but `input` must have floating point `dtype`. Parameters **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor of probability values for the Bernoulli distribution Keyword Arguments * **generator** (`torch.Generator` _,__optional_) – a pseudorandom number generator for sampling * **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. Example: >>> a = torch.empty(3, 3).uniform_(0, 1) # generate a uniform random matrix with range [0, 1] >>> a tensor([[ 0.1737, 0.0950, 0.3609], [ 0.7148, 0.0289, 0.2676], [ 0.9456, 0.8937, 0.7202]]) >>> torch.bernoulli(a) tensor([[ 1., 0., 0.], [ 0., 0., 0.], [ 1., 1., 1.]]) >>> a = torch.ones(3, 3) # probability of drawing "1" is 1 >>> torch.bernoulli(a) tensor([[ 1., 1., 1.], [ 1., 1., 1.], [ 1., 1., 1.]]) >>> a = torch.zeros(3, 3) # probability of drawing "1" is 0 >>> torch.bernoulli(a) tensor([[ 0., 0., 0.], [ 0., 0., 0.], [ 0., 0., 0.]])

# torch.bincount

`torch.bincount(input, weights=None, minlength=0) → Tensor` Count the frequency of each value in an array of non-negative ints. The number of bins (size 1) is one larger than the largest value in `input` unless `input` is empty, in which case the result is a tensor of size 0. If `minlength` is specified, the number of bins is at least `minlength` and if `input` is empty, then the result is tensor of size `minlength` filled with zeros. If `n` is the value at position `i`, `out[n] += weights[i]` if `weights` is specified else `out[n] += 1`. Note This operation may produce nondeterministic gradients when given tensors on a CUDA device. See [Reproducibility](https://pytorch.org/docs/1.8.0/notes/randomness.html) for more information. Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – 1-d int tensor * **weights** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – optional, weight for each value in the input tensor. Should be of same size as input tensor.
* **minlength** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – optional, minimum number of bins. Should be non-negative. Returns a tensor of shape `Size([max(input) + 1])` if `input` is non-empty, else `Size(0)` Return type output ([Tensor](../tensors#torch.Tensor "torch.Tensor")) Example: >>> input = torch.randint(0, 8, (5,), dtype=torch.int64) >>> weights = torch.linspace(0, 1, steps=5) >>> input, weights (tensor([4, 3, 6, 3, 4]), tensor([ 0.0000, 0.2500, 0.5000, 0.7500, 1.0000]) >>> torch.bincount(input) tensor([0, 0, 0, 2, 2, 0, 1]) >>> input.bincount(weights) tensor([0.0000, 0.0000, 0.0000, 1.0000, 1.0000, 0.0000, 0.5000]) # torch.bitwise_and `torch.bitwise_and(input, other, *, out=None) → Tensor` Computes the bitwise AND of `input` and `other`. The input tensor must be of integral or Boolean types. For bool tensors, it computes the logical AND. Parameters * **input** – the first input tensor * **other** – the second input tensor Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. #### Example >>> torch.bitwise_and(torch.tensor([-1, -2, 3], dtype=torch.int8), torch.tensor([1, 0, 3], dtype=torch.int8)) tensor([1, 0, 3], dtype=torch.int8) >>> torch.bitwise_and(torch.tensor([True, True, False]), torch.tensor([False, True, False])) tensor([ False, True, False]) # torch.bitwise_not `torch.bitwise_not(input, *, out=None) → Tensor` Computes the bitwise NOT of the given input tensor. The input tensor must be of integral or Boolean types. For bool tensors, it computes the logical NOT. Parameters **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. #### Example >>> torch.bitwise_not(torch.tensor([-1, -2, 3], dtype=torch.int8)) tensor([ 0, 1, -4], dtype=torch.int8) # torch.bitwise_or `torch.bitwise_or(input, other, *, out=None) → Tensor` Computes the bitwise OR of `input` and `other`. The input tensor must be of integral or Boolean types. For bool tensors, it computes the logical OR. Parameters * **input** – the first input tensor * **other** – the second input tensor Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. #### Example >>> torch.bitwise_or(torch.tensor([-1, -2, 3], dtype=torch.int8), torch.tensor([1, 0, 3], dtype=torch.int8)) tensor([-1, -2, 3], dtype=torch.int8) >>> torch.bitwise_or(torch.tensor([True, True, False]), torch.tensor([False, True, False])) tensor([ True, True, False]) # torch.bitwise_xor `torch.bitwise_xor(input, other, *, out=None) → Tensor` Computes the bitwise XOR of `input` and `other`. The input tensor must be of integral or Boolean types. For bool tensors, it computes the logical XOR. Parameters * **input** – the first input tensor * **other** – the second input tensor Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. #### Example >>> torch.bitwise_xor(torch.tensor([-1, -2, 3], dtype=torch.int8), torch.tensor([1, 0, 3], dtype=torch.int8)) tensor([-2, -2, 0], dtype=torch.int8) >>> torch.bitwise_xor(torch.tensor([True, True, False]), torch.tensor([False, True, False])) tensor([ True, False, False]) # torch.blackman_window `torch.blackman_window(window_length, periodic=True, *, dtype=None, layout=torch.strided, device=None, requires_grad=False) → Tensor` Blackman window function. 
w[n]=0.42−0.5cos⁡(2πnN−1)+0.08cos⁡(4πnN−1)w[n] = 0.42 - 0.5 \cos \left( \frac{2 \pi n}{N - 1} \right) + 0.08 \cos \left( \frac{4 \pi n}{N - 1} \right) where NN is the full window size. The input `window_length` is a positive integer controlling the returned window size. `periodic` flag determines whether the returned window trims off the last duplicate value from the symmetric window and is ready to be used as a periodic window with functions like [`torch.stft()`](torch.stft#torch.stft "torch.stft"). Therefore, if `periodic` is true, the NN in above formula is in fact window_length+1\text{window\\_length} + 1 . Also, we always have `torch.blackman_window(L, periodic=True)` equal to `torch.blackman_window(L + 1, periodic=False)[:-1])`. Note If `window_length` =1=1 , the returned window contains a single value 1. Parameters * **window_length** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – the size of returned window * **periodic** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – If True, returns a window to be used as periodic function. If False, return a symmetric window. Keyword Arguments * **dtype** ([`torch.dtype`](../tensor_attributes#torch.torch.dtype "torch.torch.dtype"), optional) – the desired data type of returned tensor. Default: if `None`, uses a global default (see [`torch.set_default_tensor_type()`](torch.set_default_tensor_type#torch.set_default_tensor_type "torch.set_default_tensor_type")). Only floating point types are supported. * **layout** ([`torch.layout`](../tensor_attributes#torch.torch.layout "torch.torch.layout"), optional) – the desired layout of returned window tensor. Only `torch.strided` (dense layout) is supported. * **device** ([`torch.device`](../tensor_attributes#torch.torch.device "torch.torch.device"), optional) – the desired device of returned tensor. Default: if `None`, uses the current device for the default tensor type (see [`torch.set_default_tensor_type()`](torch.set_default_tensor_type#torch.set_default_tensor_type "torch.set_default_tensor_type")). `device` will be the CPU for CPU tensor types and the current CUDA device for CUDA tensor types. * **requires_grad** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – If autograd should record operations on the returned tensor. Default: `False`. Returns A 1-D tensor of size (window_length,)(\text{window\\_length},) containing the window Return type [Tensor](../tensors#torch.Tensor "torch.Tensor") # torch.block_diag `torch.block_diag(*tensors)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/functional.html#block_diag) Create a block diagonal matrix from provided tensors. Parameters ***tensors** – One or more tensors with 0, 1, or 2 dimensions. Returns A 2 dimensional tensor with all the input tensors arranged in order such that their upper left and lower right corners are diagonally adjacent. All other elements are set to 0. 
Return type [Tensor](../tensors#torch.Tensor "torch.Tensor") Example: >>> import torch >>> A = torch.tensor([[0, 1], [1, 0]]) >>> B = torch.tensor([[3, 4, 5], [6, 7, 8]]) >>> C = torch.tensor(7) >>> D = torch.tensor([1, 2, 3]) >>> E = torch.tensor([[4], [5], [6]]) >>> torch.block_diag(A, B, C, D, E) tensor([[0, 1, 0, 0, 0, 0, 0, 0, 0, 0], [1, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 3, 4, 5, 0, 0, 0, 0, 0], [0, 0, 6, 7, 8, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 7, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 1, 2, 3, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 4], [0, 0, 0, 0, 0, 0, 0, 0, 0, 5], [0, 0, 0, 0, 0, 0, 0, 0, 0, 6]]) # torch.bmm `torch.bmm(input, mat2, *, deterministic=False, out=None) → Tensor` Performs a batch matrix-matrix product of matrices stored in `input` and `mat2`. `input` and `mat2` must be 3-D tensors each containing the same number of matrices. If `input` is a (b×n×m)(b \times n \times m) tensor, `mat2` is a (b×m×p)(b \times m \times p) tensor, `out` will be a (b×n×p)(b \times n \times p) tensor. outi=inputi@mat2i\text{out}_i = \text{input}_i \mathbin{@} \text{mat2}_i This operator supports [TensorFloat32](https://pytorch.org/docs/1.8.0/notes/cuda.html#tf32-on- ampere). Note This function does not [broadcast](https://pytorch.org/docs/1.8.0/notes/broadcasting.html#broadcasting- semantics). For broadcasting matrix products, see [`torch.matmul()`](torch.matmul#torch.matmul "torch.matmul"). Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the first batch of matrices to be multiplied * **mat2** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the second batch of matrices to be multiplied Keyword Arguments * **deterministic** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – flag to choose between a faster non-deterministic calculation, or a slower deterministic calculation. This argument is only available for sparse-dense CUDA bmm. Default: `False` * **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. Example: >>> input = torch.randn(10, 3, 4) >>> mat2 = torch.randn(10, 4, 5) >>> res = torch.bmm(input, mat2) >>> res.size() torch.Size([10, 3, 5]) # torch.broadcast_shapes `torch.broadcast_shapes(*shapes) → Size` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/functional.html#broadcast_shapes) Similar to [`broadcast_tensors()`](torch.broadcast_tensors#torch.broadcast_tensors "torch.broadcast_tensors") but for shapes. This is equivalent to `torch.broadcast_tensors(*map(torch.empty, shapes))[0].shape` but avoids the need create to intermediate tensors. This is useful for broadcasting tensors of common batch shape but different rightmost shape, e.g. to broadcast mean vectors with covariance matrices. Example: >>> torch.broadcast_shapes((2,), (3, 1), (1, 1, 1)) torch.Size([1, 3, 2]) Parameters ***shapes** (_torch.Size_) – Shapes of tensors. Returns A shape compatible with all input shapes. Return type shape (torch.Size) Raises [**RuntimeError**](https://docs.python.org/3/library/exceptions.html#RuntimeError "\(in Python v3.9\)") – If shapes are incompatible. # torch.broadcast_tensors `torch.broadcast_tensors(*tensors) → List of Tensors` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/functional.html#broadcast_tensors) Broadcasts the given tensors according to [Broadcasting semantics](https://pytorch.org/docs/1.8.0/notes/broadcasting.html#broadcasting- semantics). 
Parameters ***tensors** – any number of tensors of the same type Warning More than one element of a broadcasted tensor may refer to a single memory location. As a result, in-place operations (especially ones that are vectorized) may result in incorrect behavior. If you need to write to the tensors, please clone them first. Example: >>> x = torch.arange(3).view(1, 3) >>> y = torch.arange(2).view(2, 1) >>> a, b = torch.broadcast_tensors(x, y) >>> a.size() torch.Size([2, 3]) >>> a tensor([[0, 1, 2], [0, 1, 2]]) # torch.broadcast_to `torch.broadcast_to(input, shape) → Tensor` Broadcasts `input` to the shape `shape`. Equivalent to calling `input.expand(shape)`. See [`expand()`](../tensors#torch.Tensor.expand "torch.Tensor.expand") for details. Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. * **shape** (list, tuple, or `torch.Size`) – the new shape. Example: >>> x = torch.tensor([1, 2, 3]) >>> torch.broadcast_to(x, (3, 3)) tensor([[1, 2, 3], [1, 2, 3], [1, 2, 3]]) # torch.bucketize `torch.bucketize(input, boundaries, *, out_int32=False, right=False, out=None) → Tensor` Returns the indices of the buckets to which each value in the `input` belongs, where the boundaries of the buckets are set by `boundaries`. Return a new tensor with the same size as `input`. If `right` is False (default), then the left boundary is closed. More formally, the returned index satisfies the following rules: `right` | _returned index satisfies_ ---|--- False | `boundaries[i-1] < input[m][n]...[l][x] <= boundaries[i]` True | `boundaries[i-1] <= input[m][n]...[l][x] < boundaries[i]` Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _or_ _Scalar_) – N-D tensor or a Scalar containing the search value(s). * **boundaries** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – 1-D tensor, must contain a monotonically increasing sequence. Keyword Arguments * **out_int32** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – indicate the output data type. torch.int32 if True, torch.int64 otherwise. Default value is False, i.e. default output data type is torch.int64. * **right** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – if False, return the first suitable location that is found. If True, return the last such index. If no suitable index found, return 0 for non-numerical value (eg. nan, inf) or the size of `boundaries` (one pass the last index). In other words, if False, gets the lower bound index for each value in `input` from `boundaries`. If True, gets the upper bound index instead. Default value is False. * **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor, must be the same size as `input` if provided. Example: >>> boundaries = torch.tensor([1, 3, 5, 7, 9]) >>> boundaries tensor([1, 3, 5, 7, 9]) >>> v = torch.tensor([[3, 6, 9], [3, 6, 9]]) >>> v tensor([[3, 6, 9], [3, 6, 9]]) >>> torch.bucketize(v, boundaries) tensor([[1, 3, 4], [1, 3, 4]]) >>> torch.bucketize(v, boundaries, right=True) tensor([[2, 3, 5], [2, 3, 5]]) # torch.can_cast `torch.can_cast(from, to) → bool` Determines if a type conversion is allowed under PyTorch casting rules described in the type promotion [documentation](../tensor_attributes#type- promotion-doc). Parameters * **from** (_dpython:type_) – The original [`torch.dtype`](../tensor_attributes#torch.torch.dtype "torch.torch.dtype"). 
* **to** (_dpython:type_) – The target [`torch.dtype`](../tensor_attributes#torch.torch.dtype "torch.torch.dtype"). Example: >>> torch.can_cast(torch.double, torch.float) True >>> torch.can_cast(torch.float, torch.int) False # torch.cartesian_prod `torch.cartesian_prod(*tensors)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/functional.html#cartesian_prod) Do cartesian product of the given sequence of tensors. The behavior is similar to python’s `itertools.product`. Parameters ***tensors** – any number of 1 dimensional tensors. Returns A tensor equivalent to converting all the input tensors into lists, do `itertools.product` on these lists, and finally convert the resulting list into tensor. Return type [Tensor](../tensors#torch.Tensor "torch.Tensor") Example: >>> a = [1, 2, 3] >>> b = [4, 5] >>> list(itertools.product(a, b)) [(1, 4), (1, 5), (2, 4), (2, 5), (3, 4), (3, 5)] >>> tensor_a = torch.tensor(a) >>> tensor_b = torch.tensor(b) >>> torch.cartesian_prod(tensor_a, tensor_b) tensor([[1, 4], [1, 5], [2, 4], [2, 5], [3, 4], [3, 5]]) # torch.cat `torch.cat(tensors, dim=0, *, out=None) → Tensor` Concatenates the given sequence of `seq` tensors in the given dimension. All tensors must either have the same shape (except in the concatenating dimension) or be empty. `torch.cat()` can be seen as an inverse operation for [`torch.split()`](torch.split#torch.split "torch.split") and [`torch.chunk()`](torch.chunk#torch.chunk "torch.chunk"). `torch.cat()` can be best understood via examples. Parameters * **tensors** (_sequence of Tensors_) – any python sequence of tensors of the same type. Non-empty tensors provided must have the same shape, except in the cat dimension. * **dim** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – the dimension over which the tensors are concatenated Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. Example: >>> x = torch.randn(2, 3) >>> x tensor([[ 0.6580, -1.0969, -0.4614], [-0.1034, -0.5790, 0.1497]]) >>> torch.cat((x, x, x), 0) tensor([[ 0.6580, -1.0969, -0.4614], [-0.1034, -0.5790, 0.1497], [ 0.6580, -1.0969, -0.4614], [-0.1034, -0.5790, 0.1497], [ 0.6580, -1.0969, -0.4614], [-0.1034, -0.5790, 0.1497]]) >>> torch.cat((x, x, x), 1) tensor([[ 0.6580, -1.0969, -0.4614, 0.6580, -1.0969, -0.4614, 0.6580, -1.0969, -0.4614], [-0.1034, -0.5790, 0.1497, -0.1034, -0.5790, 0.1497, -0.1034, -0.5790, 0.1497]]) # torch.cdist `torch.cdist(x1, x2, p=2.0, compute_mode='use_mm_for_euclid_dist_if_necessary')` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/functional.html#cdist) Computes batched the p-norm distance between each pair of the two collections of row vectors. Parameters * **x1** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – input tensor of shape B×P×MB \times P \times M . * **x2** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – input tensor of shape B×R×MB \times R \times M . * **p** – p value for the p-norm distance to calculate between each vector pair ∈[0,∞]\in [0, \infty] . * **compute_mode** – ‘use_mm_for_euclid_dist_if_necessary’ - will use matrix multiplication approach to calculate euclidean distance (p = 2) if P > 25 or R > 25 ‘use_mm_for_euclid_dist’ - will always use matrix multiplication approach to calculate euclidean distance (p = 2) ‘donot_use_mm_for_euclid_dist’ - will never use matrix multiplication approach to calculate euclidean distance (p = 2) Default: use_mm_for_euclid_dist_if_necessary. 
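A minimal sketch of the `compute_mode` switch described above (shapes are illustrative; the modes differ only in how the p=2 case is evaluated, so the results should agree up to floating-point error):

```python
import torch

x1 = torch.randn(2, 30, 8)  # B x P x M, with P > 25, so the default mode may use matmul
x2 = torch.randn(2, 40, 8)  # B x R x M

d_default = torch.cdist(x1, x2)  # 'use_mm_for_euclid_dist_if_necessary'
d_direct = torch.cdist(x1, x2, compute_mode='donot_use_mm_for_euclid_dist')

print(d_default.shape)                                 # torch.Size([2, 30, 40])
print(torch.allclose(d_default, d_direct, atol=1e-5))  # True (up to numerical error)
```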
If x1 has shape B×P×MB \times P \times M and x2 has shape B×R×MB \times R \times M then the output will have shape B×P×RB \times P \times R . This function is equivalent to `scipy.spatial.distance.cdist(input,’minkowski’, p=p)` if p∈(0,∞)p \in (0, \infty) . When p=0p = 0 it is equivalent to `scipy.spatial.distance.cdist(input, ‘hamming’) * M`. When p=∞p = \infty , the closest scipy function is `scipy.spatial.distance.cdist(xn, lambda x, y: np.abs(x - y).max())`. #### Example >>> a = torch.tensor([[0.9041, 0.0196], [-0.3108, -2.4423], [-0.4821, 1.059]]) >>> a tensor([[ 0.9041, 0.0196], [-0.3108, -2.4423], [-0.4821, 1.0590]]) >>> b = torch.tensor([[-2.1763, -0.4713], [-0.6986, 1.3702]]) >>> b tensor([[-2.1763, -0.4713], [-0.6986, 1.3702]]) >>> torch.cdist(a, b, p=2) tensor([[3.1193, 2.0959], [2.7138, 3.8322], [2.2830, 0.3791]]) # torch.ceil `torch.ceil(input, *, out=None) → Tensor` Returns a new tensor with the ceil of the elements of `input`, the smallest integer greater than or equal to each element. outi=⌈inputi⌉=⌊inputi⌋+1\text{out}_{i} = \left\lceil \text{input}_{i} \right\rceil = \left\lfloor \text{input}_{i} \right\rfloor + 1 Parameters **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. Example: >>> a = torch.randn(4) >>> a tensor([-0.6341, -1.4208, -1.0900, 0.5826]) >>> torch.ceil(a) tensor([-0., -1., -1., 1.]) # torch.chain_matmul `torch.chain_matmul(*matrices)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/functional.html#chain_matmul) Returns the matrix product of the NN 2-D tensors. This product is efficiently computed using the matrix chain order algorithm which selects the order in which incurs the lowest cost in terms of arithmetic operations ([[CLRS]](https://mitpress.mit.edu/books/introduction-algorithms-third- edition)). Note that since this is a function to compute the product, NN needs to be greater than or equal to 2; if equal to 2 then a trivial matrix-matrix product is returned. If NN is 1, then this is a no-op - the original matrix is returned as is. Parameters **matrices** (_Tensors..._) – a sequence of 2 or more 2-D tensors whose product is to be determined. Returns if the ithi^{th} tensor was of dimensions pi×pi+1p_{i} \times p_{i + 1} , then the product would be of dimensions p1×pN+1p_{1} \times p_{N + 1} . Return type [Tensor](../tensors#torch.Tensor "torch.Tensor") Example: >>> a = torch.randn(3, 4) >>> b = torch.randn(4, 5) >>> c = torch.randn(5, 6) >>> d = torch.randn(6, 7) >>> torch.chain_matmul(a, b, c, d) tensor([[ -2.3375, -3.9790, -4.1119, -6.6577, 9.5609, -11.5095, -3.2614], [ 21.4038, 3.3378, -8.4982, -5.2457, -10.2561, -2.4684, 2.7163], [ -0.9647, -5.8917, -2.3213, -5.2284, 12.8615, -12.2816, -2.5095]]) # torch.cholesky `torch.cholesky(input, upper=False, *, out=None) → Tensor` Computes the Cholesky decomposition of a symmetric positive-definite matrix AA or for batches of symmetric positive-definite matrices. If `upper` is `True`, the returned matrix `U` is upper-triangular, and the decomposition has the form: A=UTUA = U^TU If `upper` is `False`, the returned matrix `L` is lower-triangular, and the decomposition has the form: A=LLTA = LL^T If `upper` is `True`, and AA is a batch of symmetric positive-definite matrices, then the returned tensor will be composed of upper-triangular Cholesky factors of each of the individual matrices. 
Similarly, when `upper` is `False`, the returned tensor will be composed of lower-triangular Cholesky factors of each of the individual matrices. Note [`torch.linalg.cholesky()`](../linalg#torch.linalg.cholesky "torch.linalg.cholesky") should be used over `torch.cholesky` when possible. Note however that [`torch.linalg.cholesky()`](../linalg#torch.linalg.cholesky "torch.linalg.cholesky") does not yet support the `upper` parameter and instead always returns the lower triangular matrix. Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor AA of size (∗,n,n)(*, n, n) where `*` is zero or more batch dimensions consisting of symmetric positive-definite matrices. * **upper** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – flag that indicates whether to return a upper or lower triangular matrix. Default: `False` Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output matrix Example: >>> a = torch.randn(3, 3) >>> a = torch.mm(a, a.t()) # make symmetric positive-definite >>> l = torch.cholesky(a) >>> a tensor([[ 2.4112, -0.7486, 1.4551], [-0.7486, 1.3544, 0.1294], [ 1.4551, 0.1294, 1.6724]]) >>> l tensor([[ 1.5528, 0.0000, 0.0000], [-0.4821, 1.0592, 0.0000], [ 0.9371, 0.5487, 0.7023]]) >>> torch.mm(l, l.t()) tensor([[ 2.4112, -0.7486, 1.4551], [-0.7486, 1.3544, 0.1294], [ 1.4551, 0.1294, 1.6724]]) >>> a = torch.randn(3, 2, 2) >>> a = torch.matmul(a, a.transpose(-1, -2)) + 1e-03 # make symmetric positive-definite >>> l = torch.cholesky(a) >>> z = torch.matmul(l, l.transpose(-1, -2)) >>> torch.max(torch.abs(z - a)) # Max non-zero tensor(2.3842e-07) # torch.cholesky_inverse `torch.cholesky_inverse(input, upper=False, *, out=None) → Tensor` Computes the inverse of a symmetric positive-definite matrix AA using its Cholesky factor uu : returns matrix `inv`. The inverse is computed using LAPACK routines `dpotri` and `spotri` (and the corresponding MAGMA routines). If `upper` is `False`, uu is lower triangular such that the returned tensor is inv=(uuT)−1inv = (uu^{{T}})^{{-1}} If `upper` is `True` or not provided, uu is upper triangular such that the returned tensor is inv=(uTu)−1inv = (u^T u)^{{-1}} Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input 2-D tensor uu , a upper or lower triangular Cholesky factor * **upper** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – whether to return a lower (default) or upper triangular matrix Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor for `inv` Example: >>> a = torch.randn(3, 3) >>> a = torch.mm(a, a.t()) + 1e-05 * torch.eye(3) # make symmetric positive definite >>> u = torch.cholesky(a) >>> a tensor([[ 0.9935, -0.6353, 1.5806], [ -0.6353, 0.8769, -1.7183], [ 1.5806, -1.7183, 10.6618]]) >>> torch.cholesky_inverse(u) tensor([[ 1.9314, 1.2251, -0.0889], [ 1.2251, 2.4439, 0.2122], [-0.0889, 0.2122, 0.1412]]) >>> a.inverse() tensor([[ 1.9314, 1.2251, -0.0889], [ 1.2251, 2.4439, 0.2122], [-0.0889, 0.2122, 0.1412]]) # torch.cholesky_solve `torch.cholesky_solve(input, input2, upper=False, *, out=None) → Tensor` Solves a linear system of equations with a positive semidefinite matrix to be inverted given its Cholesky factor matrix uu . 
If `upper` is `False`, uu is lower triangular and `c` is returned such that: c=(uuT)−1bc = (u u^T)^{{-1}} b If `upper` is `True` or not provided, uu is upper triangular and `c` is returned such that: c=(uTu)−1bc = (u^T u)^{{-1}} b `torch.cholesky_solve(b, u)` can take in 2D inputs `b, u` or inputs that are batches of 2D matrices. If the inputs are batches, then batched outputs `c` are returned. Supports real-valued and complex-valued inputs. For the complex-valued inputs the transpose operator above is the conjugate transpose. Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – input matrix bb of size (∗,m,k)(*, m, k) , where ∗* is zero or more batch dimensions * **input2** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – input matrix uu of size (∗,m,m)(*, m, m) , where ∗* is zero or more batch dimensions composed of upper or lower triangular Cholesky factor * **upper** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – whether to consider the Cholesky factor as a lower or upper triangular matrix. Default: `False`. Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor for `c` Example: >>> a = torch.randn(3, 3) >>> a = torch.mm(a, a.t()) # make symmetric positive definite >>> u = torch.cholesky(a) >>> a tensor([[ 0.7747, -1.9549, 1.3086], [-1.9549, 6.7546, -5.4114], [ 1.3086, -5.4114, 4.8733]]) >>> b = torch.randn(3, 2) >>> b tensor([[-0.6355, 0.9891], [ 0.1974, 1.4706], [-0.4115, -0.6225]]) >>> torch.cholesky_solve(b, u) tensor([[ -8.1625, 19.6097], [ -5.8398, 14.2387], [ -4.3771, 10.4173]]) >>> torch.mm(a.inverse(), b) tensor([[ -8.1626, 19.6097], [ -5.8398, 14.2387], [ -4.3771, 10.4173]])

# torch.chunk

`torch.chunk(input, chunks, dim=0) → List of Tensors` Splits a tensor into a specific number of chunks. Each chunk is a view of the input tensor. The last chunk will be smaller if the tensor size along the given dimension `dim` is not divisible by `chunks`. Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the tensor to split * **chunks** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – number of chunks to return * **dim** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – dimension along which to split the tensor

# torch.clamp

`torch.clamp(input, min, max, *, out=None) → Tensor` Clamp all elements in `input` into the range `[` [`min`](torch.min#torch.min "torch.min"), [`max`](torch.max#torch.max "torch.max") `]`. Letting min_value and max_value be [`min`](torch.min#torch.min "torch.min") and [`max`](torch.max#torch.max "torch.max"), respectively, this returns: yi=min⁡(max⁡(xi,min_value),max_value)y_i = \min(\max(x_i, \text{min\\_value}), \text{max\\_value}) Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. * **min** (_Number_) – lower-bound of the range to be clamped to * **max** (_Number_) – upper-bound of the range to be clamped to Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. Example: >>> a = torch.randn(4) >>> a tensor([-1.7120, 0.1734, -0.0478, -0.0922]) >>> torch.clamp(a, min=-0.5, max=0.5) tensor([-0.5000, 0.1734, -0.0478, -0.0922]) `torch.clamp(input, *, min, out=None) → Tensor` Clamps all elements in `input` to be larger than or equal to [`min`](torch.min#torch.min "torch.min").
Parameters **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. Keyword Arguments * **min** (_Number_) – minimal value of each element in the output * **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. Example: >>> a = torch.randn(4) >>> a tensor([-0.0299, -2.3184, 2.1593, -0.8883]) >>> torch.clamp(a, min=0.5) tensor([ 0.5000, 0.5000, 2.1593, 0.5000]) `torch.clamp(input, *, max, out=None) → Tensor` Clamps all elements in `input` to be smaller or equal [`max`](torch.max#torch.max "torch.max"). Parameters **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. Keyword Arguments * **max** (_Number_) – maximal value of each element in the output * **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. Example: >>> a = torch.randn(4) >>> a tensor([ 0.7753, -0.4702, -0.4599, 1.1899]) >>> torch.clamp(a, max=0.5) tensor([ 0.5000, -0.4702, -0.4599, 0.5000]) # torch.clip `torch.clip(input, min, max, *, out=None) → Tensor` Alias for [`torch.clamp()`](torch.clamp#torch.clamp "torch.clamp"). # torch.clone `torch.clone(input, *, memory_format=torch.preserve_format) → Tensor` Returns a copy of `input`. Note This function is differentiable, so gradients will flow back from the result of this operation to `input`. To create a tensor without an autograd relationship to `input` see [`detach()`](../autograd#torch.Tensor.detach "torch.Tensor.detach"). Parameters **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. Keyword Arguments **memory_format** ([`torch.memory_format`](../tensor_attributes#torch.torch.memory_format "torch.torch.memory_format"), optional) – the desired memory format of returned tensor. Default: `torch.preserve_format`. # torch.column_stack `torch.column_stack(tensors, *, out=None) → Tensor` Creates a new tensor by horizontally stacking the tensors in `tensors`. Equivalent to `torch.hstack(tensors)`, except each zero or one dimensional tensor `t` in `tensors` is first reshaped into a `(t.numel(), 1)` column before being stacked horizontally. Parameters **tensors** (_sequence of Tensors_) – sequence of tensors to concatenate Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. Example: >>> a = torch.tensor([1, 2, 3]) >>> b = torch.tensor([4, 5, 6]) >>> torch.column_stack((a, b)) tensor([[1, 4], [2, 5], [3, 6]]) >>> a = torch.arange(5) >>> b = torch.arange(10).reshape(5, 2) >>> torch.column_stack((a, b, b)) tensor([[0, 0, 1, 0, 1], [1, 2, 3, 2, 3], [2, 4, 5, 4, 5], [3, 6, 7, 6, 7], [4, 8, 9, 8, 9]]) # torch.combinations `torch.combinations(input, r=2, with_replacement=False) → seq` Compute combinations of length rr of the given tensor. The behavior is similar to python’s `itertools.combinations` when `with_replacement` is set to `False`, and `itertools.combinations_with_replacement` when `with_replacement` is set to `True`. Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – 1D vector. * **r** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – number of elements to combine * **with_replacement** (_boolean_ _,__optional_) – whether to allow duplication in combination Returns A tensor equivalent to converting all the input tensors into lists, do `itertools.combinations` or `itertools.combinations_with_replacement` on these lists, and finally convert the resulting list into tensor. 
Return type [Tensor](../tensors#torch.Tensor "torch.Tensor") Example: >>> a = [1, 2, 3] >>> list(itertools.combinations(a, r=2)) [(1, 2), (1, 3), (2, 3)] >>> list(itertools.combinations(a, r=3)) [(1, 2, 3)] >>> list(itertools.combinations_with_replacement(a, r=2)) [(1, 1), (1, 2), (1, 3), (2, 2), (2, 3), (3, 3)] >>> tensor_a = torch.tensor(a) >>> torch.combinations(tensor_a) tensor([[1, 2], [1, 3], [2, 3]]) >>> torch.combinations(tensor_a, r=3) tensor([[1, 2, 3]]) >>> torch.combinations(tensor_a, with_replacement=True) tensor([[1, 1], [1, 2], [1, 3], [2, 2], [2, 3], [3, 3]]) # torch.compiled_with_cxx11_abi `torch.compiled_with_cxx11_abi()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch.html#compiled_with_cxx11_abi) Returns whether PyTorch was built with _GLIBCXX_USE_CXX11_ABI=1 # torch.complex `torch.complex(real, imag, *, out=None) → Tensor` Constructs a complex tensor with its real part equal to [`real`](torch.real#torch.real "torch.real") and its imaginary part equal to [`imag`](torch.imag#torch.imag "torch.imag"). Parameters * **real** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – The real part of the complex tensor. Must be float or double. * **imag** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – The imaginary part of the complex tensor. Must be same dtype as [`real`](torch.real#torch.real "torch.real"). Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – If the inputs are `torch.float32`, must be `torch.complex64`. If the inputs are `torch.float64`, must be `torch.complex128`. Example:: >>> real = torch.tensor([1, 2], dtype=torch.float32) >>> imag = torch.tensor([3, 4], dtype=torch.float32) >>> z = torch.complex(real, imag) >>> z tensor([(1.+3.j), (2.+4.j)]) >>> z.dtype torch.complex64 # torch.conj `torch.conj(input, *, out=None) → Tensor` Computes the element-wise conjugate of the given `input` tensor. If :attr`input` has a non-complex dtype, this function just returns `input`. Warning In the future, `torch.conj()` may return a non-writeable view for an `input` of non-complex dtype. It’s recommended that programs not modify the tensor returned by `torch.conj()` when `input` is of non-complex dtype to be compatible with this change. outi=conj(inputi)\text{out}_{i} = conj(\text{input}_{i}) Parameters **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. Example: >>> torch.conj(torch.tensor([-1 + 1j, -2 + 2j, 3 - 3j])) tensor([-1 - 1j, -2 - 2j, 3 + 3j]) # torch.copysign `torch.copysign(input, other, *, out=None) → Tensor` Create a new floating-point tensor with the magnitude of `input` and the sign of `other`, elementwise. outi={−∣inputi∣ifotheri≤−0.0∣inputi∣ifotheri≥0.0\text{out}_{i} = \begin{cases} -|\text{input}_{i}| & \text{if} \text{other}_{i} \leq -0.0 \\\ |\text{input}_{i}| & \text{if} \text{other}_{i} \geq 0.0 \\\ \end{cases} Supports [broadcasting to a common shape](https://pytorch.org/docs/1.8.0/notes/broadcasting.html#broadcasting- semantics), and integer and float inputs. Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – magnitudes. * **other** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _or_ _Number_) – contains value(s) whose signbit(s) are applied to the magnitudes in `input`. Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. 
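One consequence of the case analysis above is that a negative zero in `other` counts as negative; a minimal sketch (the values are illustrative):

```python
import torch

mag = torch.tensor([1.0, 2.0, 3.0])
sgn = torch.tensor([-0.0, 0.0, -5.0])

# -0.0 satisfies "other <= -0.0", so the first magnitude comes out negative.
print(torch.copysign(mag, sgn))  # tensor([-1.,  2., -3.])
```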
Example: >>> a = torch.randn(5) >>> a tensor([-1.2557, -0.0026, -0.5387, 0.4740, -0.9244]) >>> torch.copysign(a, 1) tensor([1.2557, 0.0026, 0.5387, 0.4740, 0.9244]) >>> a = torch.randn(4, 4) >>> a tensor([[ 0.7079, 0.2778, -1.0249, 0.5719], [-0.0059, -0.2600, -0.4475, -1.3948], [ 0.3667, -0.9567, -2.5757, -0.1751], [ 0.2046, -0.0742, 0.2998, -0.1054]]) >>> b = torch.randn(4) tensor([ 0.2373, 0.3120, 0.3190, -1.1128]) >>> torch.copysign(a, b) tensor([[ 0.7079, 0.2778, 1.0249, -0.5719], [ 0.0059, 0.2600, 0.4475, -1.3948], [ 0.3667, 0.9567, 2.5757, -0.1751], [ 0.2046, 0.0742, 0.2998, -0.1054]]) # torch.cos `torch.cos(input, *, out=None) → Tensor` Returns a new tensor with the cosine of the elements of `input`. outi=cos⁡(inputi)\text{out}_{i} = \cos(\text{input}_{i}) Parameters **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. Example: >>> a = torch.randn(4) >>> a tensor([ 1.4309, 1.2706, -0.8562, 0.9796]) >>> torch.cos(a) tensor([ 0.1395, 0.2957, 0.6553, 0.5574]) # torch.cosh `torch.cosh(input, *, out=None) → Tensor` Returns a new tensor with the hyperbolic cosine of the elements of `input`. outi=cosh⁡(inputi)\text{out}_{i} = \cosh(\text{input}_{i}) Parameters **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. Example: >>> a = torch.randn(4) >>> a tensor([ 0.1632, 1.1835, -0.6979, -0.7325]) >>> torch.cosh(a) tensor([ 1.0133, 1.7860, 1.2536, 1.2805]) Note When `input` is on the CPU, the implementation of torch.cosh may use the Sleef library, which rounds very large results to infinity or negative infinity. See [here](https://sleef.org/purec.xhtml) for details. # torch.count_nonzero `torch.count_nonzero(input, dim=None) → Tensor` Counts the number of non-zero values in the tensor `input` along the given `dim`. If no dim is specified then all non-zeros in the tensor are counted. Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. * **dim** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _or_ _tuple of python:ints_ _,__optional_) – Dim or tuple of dims along which to count non-zeros. Example: >>> x = torch.zeros(3,3) >>> x[torch.randn(3,3) > 0.5] = 1 >>> x tensor([[0., 1., 1.], [0., 0., 0.], [0., 0., 1.]]) >>> torch.count_nonzero(x) tensor(3) >>> torch.count_nonzero(x, dim=0) tensor([0, 1, 2]) # torch.cross `torch.cross(input, other, dim=None, *, out=None) → Tensor` Returns the cross product of vectors in dimension `dim` of `input` and `other`. `input` and `other` must have the same size, and the size of their `dim` dimension should be 3. If `dim` is not given, it defaults to the first dimension found with the size 3. Note that this might be unexpected. Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. * **other** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the second input tensor * **dim** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – the dimension to take the cross-product in. Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. 
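The default-`dim` behavior of `torch.cross` noted above (the first dimension of size 3 is used, which may not be the trailing one) can be made visible; a minimal sketch (the shapes are illustrative):

```python
import torch

a = torch.randn(3, 4, 3)
b = torch.randn(3, 4, 3)

# With dim omitted, the cross product is taken over dim 0 (the first size-3 dim),
# not over the last dimension.
print(torch.allclose(torch.cross(a, b), torch.cross(a, b, dim=0)))   # True
print(torch.allclose(torch.cross(a, b), torch.cross(a, b, dim=-1)))  # False (with random inputs)
```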
Example:

>>> a = torch.randn(4, 3)
>>> a
tensor([[-0.3956, 1.1455, 1.6895], [-0.5849, 1.3672, 0.3599], [-1.1626, 0.7180, -0.0521], [-0.1339, 0.9902, -2.0225]])
>>> b = torch.randn(4, 3)
>>> b
tensor([[-0.0257, -1.4725, -1.2251], [-1.1479, -0.7005, -1.9757], [-1.3904, 0.3726, -1.1836], [-0.9688, -0.7153, 0.2159]])
>>> torch.cross(a, b, dim=1)
tensor([[ 1.0844, -0.5281, 0.6120], [-2.4490, -1.5687, 1.9792], [-0.8304, -1.3037, 0.5650], [-1.2329, 1.9883, 1.0551]])
>>> torch.cross(a, b)
tensor([[ 1.0844, -0.5281, 0.6120], [-2.4490, -1.5687, 1.9792], [-0.8304, -1.3037, 0.5650], [-1.2329, 1.9883, 1.0551]])

# torch.cummax

`torch.cummax(input, dim, *, out=None) -> (Tensor, LongTensor)`

Returns a namedtuple `(values, indices)` where `values` is the cumulative maximum of elements of `input` in the dimension `dim`, and `indices` is the index location of each maximum value found in the dimension `dim`.

y_i = \max(x_1, x_2, x_3, \dots, x_i)

Parameters

* **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor.
* **dim** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – the dimension to do the operation over

Keyword Arguments

**out** ([tuple](https://docs.python.org/3/library/stdtypes.html#tuple "\(in Python v3.9\)") _,__optional_) – the result tuple of two output tensors (values, indices)

Example:

>>> a = torch.randn(10)
>>> a
tensor([-0.3449, -1.5447, 0.0685, -1.5104, -1.1706, 0.2259, 1.4696, -1.3284, 1.9946, -0.8209])
>>> torch.cummax(a, dim=0)
torch.return_types.cummax( values=tensor([-0.3449, -0.3449, 0.0685, 0.0685, 0.0685, 0.2259, 1.4696, 1.4696, 1.9946, 1.9946]), indices=tensor([0, 0, 2, 2, 2, 5, 6, 6, 8, 8]))

# torch.cummin

`torch.cummin(input, dim, *, out=None) -> (Tensor, LongTensor)`

Returns a namedtuple `(values, indices)` where `values` is the cumulative minimum of elements of `input` in the dimension `dim`, and `indices` is the index location of each minimum value found in the dimension `dim`.

y_i = \min(x_1, x_2, x_3, \dots, x_i)

Parameters

* **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor.
* **dim** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – the dimension to do the operation over

Keyword Arguments

**out** ([tuple](https://docs.python.org/3/library/stdtypes.html#tuple "\(in Python v3.9\)") _,__optional_) – the result tuple of two output tensors (values, indices)

Example:

>>> a = torch.randn(10)
>>> a
tensor([-0.2284, -0.6628, 0.0975, 0.2680, -1.3298, -0.4220, -0.3885, 1.1762, 0.9165, 1.6684])
>>> torch.cummin(a, dim=0)
torch.return_types.cummin( values=tensor([-0.2284, -0.6628, -0.6628, -0.6628, -1.3298, -1.3298, -1.3298, -1.3298, -1.3298, -1.3298]), indices=tensor([0, 1, 1, 1, 4, 4, 4, 4, 4, 4]))

# torch.cumprod

`torch.cumprod(input, dim, *, dtype=None, out=None) → Tensor`

Returns the cumulative product of elements of `input` in the dimension `dim`. For example, if `input` is a vector of size N, the result will also be a vector of size N, with elements

y_i = x_1 \times x_2 \times x_3 \times \dots \times x_i

Parameters

* **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor.
* **dim** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – the dimension to do the operation over

Keyword Arguments

* **dtype** ([`torch.dtype`](../tensor_attributes#torch.torch.dtype "torch.torch.dtype"), optional) – the desired data type of returned tensor.
If specified, the input tensor is cast to `dtype` before the operation is performed. This is useful for preventing data type overflows. Default: None.
* **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor.

Example:

>>> a = torch.randn(10)
>>> a
tensor([ 0.6001, 0.2069, -0.1919, 0.9792, 0.6727, 1.0062, 0.4126, -0.2129, -0.4206, 0.1968])
>>> torch.cumprod(a, dim=0)
tensor([ 0.6001, 0.1241, -0.0238, -0.0233, -0.0157, -0.0158, -0.0065, 0.0014, -0.0006, -0.0001])
>>> a[5] = 0.0
>>> torch.cumprod(a, dim=0)
tensor([ 0.6001, 0.1241, -0.0238, -0.0233, -0.0157, -0.0000, -0.0000, 0.0000, -0.0000, -0.0000])

# torch.cumsum

`torch.cumsum(input, dim, *, dtype=None, out=None) → Tensor`

Returns the cumulative sum of elements of `input` in the dimension `dim`. For example, if `input` is a vector of size N, the result will also be a vector of size N, with elements

y_i = x_1 + x_2 + x_3 + \dots + x_i

Parameters

* **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor.
* **dim** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – the dimension to do the operation over

Keyword Arguments

* **dtype** ([`torch.dtype`](../tensor_attributes#torch.torch.dtype "torch.torch.dtype"), optional) – the desired data type of returned tensor. If specified, the input tensor is cast to `dtype` before the operation is performed. This is useful for preventing data type overflows. Default: None.
* **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor.

Example:

>>> a = torch.randn(10)
>>> a
tensor([-0.8286, -0.4890, 0.5155, 0.8443, 0.1865, -0.1752, -2.0595, 0.1850, -1.1571, -0.4243])
>>> torch.cumsum(a, dim=0)
tensor([-0.8286, -1.3175, -0.8020, 0.0423, 0.2289, 0.0537, -2.0058, -1.8209, -2.9780, -3.4022])

# torch.deg2rad

`torch.deg2rad(input, *, out=None) → Tensor`

Returns a new tensor with each of the elements of `input` converted from angles in degrees to radians.

Parameters

**input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor.

Keyword Arguments

**out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor.

Example:

>>> a = torch.tensor([[180.0, -180.0], [360.0, -360.0], [90.0, -90.0]])
>>> torch.deg2rad(a)
tensor([[ 3.1416, -3.1416], [ 6.2832, -6.2832], [ 1.5708, -1.5708]])

# torch.dequantize

`torch.dequantize(tensor) → Tensor`

Returns an fp32 Tensor by dequantizing a quantized Tensor.

Parameters

**tensor** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – A quantized Tensor

`torch.dequantize(tensors) → sequence of Tensors`

Given a list of quantized Tensors, dequantize them and return a list of fp32 Tensors.

Parameters

**tensors** (_sequence of Tensors_) – A list of quantized Tensors

# torch.det

`torch.det(input) → Tensor`

Calculates the determinant of a square matrix or batches of square matrices.

Note

`torch.det()` is deprecated. Please use [`torch.linalg.det()`](../linalg#torch.linalg.det "torch.linalg.det") instead.

Note

Backward through det internally uses SVD results when `input` is not invertible. In this case, double backward through det will be unstable when `input` doesn't have distinct singular values. See [`torch.svd()`](torch.svd#torch.svd "torch.svd") for details.

Parameters

**input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor of size `(*, n, n)` where `*` is zero or more batch dimensions.
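Since `torch.det()` is deprecated in favor of `torch.linalg.det()`, a minimal migration sketch (assuming the `torch.linalg` namespace is available, as it is from 1.8 onwards):

>>> A = torch.randn(3, 3)
>>> # both spellings compute the same determinant
>>> torch.allclose(torch.det(A), torch.linalg.det(A))
True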
Example: >>> A = torch.randn(3, 3) >>> torch.det(A) tensor(3.7641) >>> A = torch.randn(3, 2, 2) >>> A tensor([[[ 0.9254, -0.6213], [-0.5787, 1.6843]], [[ 0.3242, -0.9665], [ 0.4539, -0.0887]], [[ 1.1336, -0.4025], [-0.7089, 0.9032]]]) >>> A.det() tensor([1.1990, 0.4099, 0.7386]) # torch.diag `torch.diag(input, diagonal=0, *, out=None) → Tensor` * If `input` is a vector (1-D tensor), then returns a 2-D square tensor with the elements of `input` as the diagonal. * If `input` is a matrix (2-D tensor), then returns a 1-D tensor with the diagonal elements of `input`. The argument [`diagonal`](torch.diagonal#torch.diagonal "torch.diagonal") controls which diagonal to consider: * If [`diagonal`](torch.diagonal#torch.diagonal "torch.diagonal") = 0, it is the main diagonal. * If [`diagonal`](torch.diagonal#torch.diagonal "torch.diagonal") > 0, it is above the main diagonal. * If [`diagonal`](torch.diagonal#torch.diagonal "torch.diagonal") < 0, it is below the main diagonal. Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. * **diagonal** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – the diagonal to consider Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. See also [`torch.diagonal()`](torch.diagonal#torch.diagonal "torch.diagonal") always returns the diagonal of its input. [`torch.diagflat()`](torch.diagflat#torch.diagflat "torch.diagflat") always constructs a tensor with diagonal elements specified by the input. Examples: Get the square matrix where the input vector is the diagonal: >>> a = torch.randn(3) >>> a tensor([ 0.5950,-0.0872, 2.3298]) >>> torch.diag(a) tensor([[ 0.5950, 0.0000, 0.0000], [ 0.0000,-0.0872, 0.0000], [ 0.0000, 0.0000, 2.3298]]) >>> torch.diag(a, 1) tensor([[ 0.0000, 0.5950, 0.0000, 0.0000], [ 0.0000, 0.0000,-0.0872, 0.0000], [ 0.0000, 0.0000, 0.0000, 2.3298], [ 0.0000, 0.0000, 0.0000, 0.0000]]) Get the k-th diagonal of a given matrix: >>> a = torch.randn(3, 3) >>> a tensor([[-0.4264, 0.0255,-0.1064], [ 0.8795,-0.2429, 0.1374], [ 0.1029,-0.6482,-1.6300]]) >>> torch.diag(a, 0) tensor([-0.4264,-0.2429,-1.6300]) >>> torch.diag(a, 1) tensor([ 0.0255, 0.1374]) # torch.diag_embed `torch.diag_embed(input, offset=0, dim1=-2, dim2=-1) → Tensor` Creates a tensor whose diagonals of certain 2D planes (specified by `dim1` and `dim2`) are filled by `input`. To facilitate creating batched diagonal matrices, the 2D planes formed by the last two dimensions of the returned tensor are chosen by default. The argument `offset` controls which diagonal to consider: * If `offset` = 0, it is the main diagonal. * If `offset` > 0, it is above the main diagonal. * If `offset` < 0, it is below the main diagonal. The size of the new matrix will be calculated to make the specified diagonal of the size of the last input dimension. Note that for `offset` other than 00 , the order of `dim1` and `dim2` matters. Exchanging them is equivalent to changing the sign of `offset`. Applying [`torch.diagonal()`](torch.diagonal#torch.diagonal "torch.diagonal") to the output of this function with the same arguments yields a matrix identical to input. However, [`torch.diagonal()`](torch.diagonal#torch.diagonal "torch.diagonal") has different default dimensions, so those need to be explicitly specified. Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. Must be at least 1-dimensional. 
* **offset** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – which diagonal to consider. Default: 0 (main diagonal). * **dim1** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – first dimension with respect to which to take diagonal. Default: -2. * **dim2** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – second dimension with respect to which to take diagonal. Default: -1. Example: >>> a = torch.randn(2, 3) >>> torch.diag_embed(a) tensor([[[ 1.5410, 0.0000, 0.0000], [ 0.0000, -0.2934, 0.0000], [ 0.0000, 0.0000, -2.1788]], [[ 0.5684, 0.0000, 0.0000], [ 0.0000, -1.0845, 0.0000], [ 0.0000, 0.0000, -1.3986]]]) >>> torch.diag_embed(a, offset=1, dim1=0, dim2=2) tensor([[[ 0.0000, 1.5410, 0.0000, 0.0000], [ 0.0000, 0.5684, 0.0000, 0.0000]], [[ 0.0000, 0.0000, -0.2934, 0.0000], [ 0.0000, 0.0000, -1.0845, 0.0000]], [[ 0.0000, 0.0000, 0.0000, -2.1788], [ 0.0000, 0.0000, 0.0000, -1.3986]], [[ 0.0000, 0.0000, 0.0000, 0.0000], [ 0.0000, 0.0000, 0.0000, 0.0000]]]) # torch.diagflat `torch.diagflat(input, offset=0) → Tensor` * If `input` is a vector (1-D tensor), then returns a 2-D square tensor with the elements of `input` as the diagonal. * If `input` is a tensor with more than one dimension, then returns a 2-D tensor with diagonal elements equal to a flattened `input`. The argument `offset` controls which diagonal to consider: * If `offset` = 0, it is the main diagonal. * If `offset` > 0, it is above the main diagonal. * If `offset` < 0, it is below the main diagonal. Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. * **offset** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – the diagonal to consider. Default: 0 (main diagonal). Examples: >>> a = torch.randn(3) >>> a tensor([-0.2956, -0.9068, 0.1695]) >>> torch.diagflat(a) tensor([[-0.2956, 0.0000, 0.0000], [ 0.0000, -0.9068, 0.0000], [ 0.0000, 0.0000, 0.1695]]) >>> torch.diagflat(a, 1) tensor([[ 0.0000, -0.2956, 0.0000, 0.0000], [ 0.0000, 0.0000, -0.9068, 0.0000], [ 0.0000, 0.0000, 0.0000, 0.1695], [ 0.0000, 0.0000, 0.0000, 0.0000]]) >>> a = torch.randn(2, 2) >>> a tensor([[ 0.2094, -0.3018], [-0.1516, 1.9342]]) >>> torch.diagflat(a) tensor([[ 0.2094, 0.0000, 0.0000, 0.0000], [ 0.0000, -0.3018, 0.0000, 0.0000], [ 0.0000, 0.0000, -0.1516, 0.0000], [ 0.0000, 0.0000, 0.0000, 1.9342]]) # torch.diagonal `torch.diagonal(input, offset=0, dim1=0, dim2=1) → Tensor` Returns a partial view of `input` with the its diagonal elements with respect to `dim1` and `dim2` appended as a dimension at the end of the shape. The argument `offset` controls which diagonal to consider: * If `offset` = 0, it is the main diagonal. * If `offset` > 0, it is above the main diagonal. * If `offset` < 0, it is below the main diagonal. Applying [`torch.diag_embed()`](torch.diag_embed#torch.diag_embed "torch.diag_embed") to the output of this function with the same arguments yields a diagonal matrix with the diagonal entries of the input. However, [`torch.diag_embed()`](torch.diag_embed#torch.diag_embed "torch.diag_embed") has different default dimensions, so those need to be explicitly specified. Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. Must be at least 2-dimensional. 
* **offset** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – which diagonal to consider. Default: 0 (main diagonal).
* **dim1** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – first dimension with respect to which to take diagonal. Default: 0.
* **dim2** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – second dimension with respect to which to take diagonal. Default: 1.

Note

To take a batch diagonal, pass in dim1=-2, dim2=-1.

Examples:

>>> a = torch.randn(3, 3)
>>> a
tensor([[-1.0854, 1.1431, -0.1752], [ 0.8536, -0.0905, 0.0360], [ 0.6927, -0.3735, -0.4945]])
>>> torch.diagonal(a, 0)
tensor([-1.0854, -0.0905, -0.4945])
>>> torch.diagonal(a, 1)
tensor([ 1.1431, 0.0360])
>>> x = torch.randn(2, 5, 4, 2)
>>> torch.diagonal(x, offset=-1, dim1=1, dim2=2)
tensor([[[-1.2631, 0.3755, -1.5977, -1.8172], [-1.1065, 1.0401, -0.2235, -0.7938]], [[-1.7325, -0.3081, 0.6166, 0.2335], [ 1.0500, 0.7336, -0.3836, -1.1015]]])

# torch.diff

`torch.diff(input, n=1, dim=-1, prepend=None, append=None) → Tensor`

Computes the n-th forward difference along the given dimension. The first-order differences are given by `out[i] = input[i + 1] - input[i]`. Higher-order differences are calculated by using `torch.diff()` recursively.

Note

Only `n = 1` is currently supported.

Parameters

* **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the tensor to compute the differences on
* **n** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – the number of times to recursively compute the difference
* **dim** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – the dimension to compute the difference along. Default is the last dimension.
* **prepend, append** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – values to prepend or append to `input` along `dim` before computing the difference. Their dimensions must be equivalent to that of input, and their shapes must match input’s shape except on `dim`.

Keyword Arguments

**out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor.

Example:

>>> a = torch.tensor([1, 3, 2])
>>> torch.diff(a)
tensor([ 2, -1])
>>> b = torch.tensor([4, 5])
>>> torch.diff(a, append=b)
tensor([ 2, -1, 2, 1])
>>> c = torch.tensor([[1, 2, 3], [3, 4, 5]])
>>> torch.diff(c, dim=0)
tensor([[2, 2, 2]])
>>> torch.diff(c, dim=1)
tensor([[1, 1], [1, 1]])

# torch.digamma

`torch.digamma(input, *, out=None) → Tensor`

Computes the logarithmic derivative of the gamma function on `input`.

\psi(x) = \frac{d}{dx} \ln\left(\Gamma\left(x\right)\right) = \frac{\Gamma'(x)}{\Gamma(x)}

Parameters

**input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the tensor to compute the digamma function on

Keyword Arguments

**out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor.

Note

This function is similar to SciPy’s `scipy.special.digamma`.

Note

From PyTorch 1.8 onwards, the digamma function returns `-Inf` for `0`. Previously it returned `NaN` for `0`.

Example:

>>> a = torch.tensor([1, 0.5])
>>> torch.digamma(a)
tensor([-0.5772, -1.9635])

# torch.dist

`torch.dist(input, other, p=2) → Tensor`

Returns the p-norm of (`input` - `other`). The shapes of `input` and `other` must be [broadcastable](https://pytorch.org/docs/1.8.0/notes/broadcasting.html#broadcasting- semantics).
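Equivalently, `torch.dist(x, y, p)` can be thought of as the p-norm of the broadcasted elementwise difference; a minimal cross-check sketch (assuming floating-point inputs):

>>> x = torch.randn(4)
>>> y = torch.randn(4)
>>> torch.allclose(torch.dist(x, y, 3), torch.norm(x - y, p=3))
True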
Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. * **other** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the Right-hand-side input tensor * **p** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _,__optional_) – the norm to be computed Example: >>> x = torch.randn(4) >>> x tensor([-1.5393, -0.8675, 0.5916, 1.6321]) >>> y = torch.randn(4) >>> y tensor([ 0.0967, -1.0511, 0.6295, 0.8360]) >>> torch.dist(x, y, 3.5) tensor(1.6727) >>> torch.dist(x, y, 3) tensor(1.6973) >>> torch.dist(x, y, 0) tensor(inf) >>> torch.dist(x, y, 1) tensor(2.6537) # torch.div `torch.div(input, other, *, rounding_mode=None, out=None) → Tensor` Divides each element of the input `input` by the corresponding element of `other`. outi=inputiotheri\text{out}_i = \frac{\text{input}_i}{\text{other}_i} Note By default, this performs a “true” division like Python 3. See the `rounding_mode` argument for floor division. Supports [broadcasting to a common shape](https://pytorch.org/docs/1.8.0/notes/broadcasting.html#broadcasting- semantics), [type promotion](../tensor_attributes#type-promotion-doc), and integer, float, and complex inputs. Always promotes integer types to the default scalar type. Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the dividend * **other** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _or_ _Number_) – the divisor Keyword Arguments * **rounding_mode** ([str](https://docs.python.org/3/library/stdtypes.html#str "\(in Python v3.9\)") _,__optional_) – Type of rounding applied to the result: * None - default behavior. Performs no rounding and, if both `input` and `other` are integer types, promotes the inputs to the default scalar type. Equivalent to true division in Python (the `/` operator) and NumPy’s `np.true_divide`. * `"trunc"` \- rounds the results of the division towards zero. Equivalent to C-style integer division. * `"floor"` \- rounds the results of the division down. Equivalent to floor division in Python (the `//` operator) and NumPy’s `np.floor_divide`. * **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. Examples: >>> x = torch.tensor([ 0.3810, 1.2774, -0.2972, -0.3719, 0.4637]) >>> torch.div(x, 0.5) tensor([ 0.7620, 2.5548, -0.5944, -0.7438, 0.9274]) >>> a = torch.tensor([[-0.3711, -1.9353, -0.4605, -0.2917], ... [ 0.1815, -1.0111, 0.9805, -1.5923], ... [ 0.1062, 1.4581, 0.7759, -1.2344], ... [-0.1830, -0.0313, 1.1908, -1.4757]]) >>> b = torch.tensor([ 0.8032, 0.2930, -0.8113, -0.2308]) >>> torch.div(a, b) tensor([[-0.4620, -6.6051, 0.5676, 1.2639], [ 0.2260, -3.4509, -1.2086, 6.8990], [ 0.1322, 4.9764, -0.9564, 5.3484], [-0.2278, -0.1068, -1.4678, 6.3938]]) >>> torch.div(a, b, rounding_mode='trunc') tensor([[-0., -6., 0., 1.], [ 0., -3., -1., 6.], [ 0., 4., -0., 5.], [-0., -0., -1., 6.]]) >>> torch.div(a, b, rounding_mode='floor') tensor([[-1., -7., 0., 1.], [ 0., -4., -2., 6.], [ 0., 4., -1., 5.], [-1., -1., -2., 6.]]) # torch.divide `torch.divide(input, other, *, rounding_mode=None, out=None) → Tensor` Alias for [`torch.div()`](torch.div#torch.div "torch.div"). # torch.dot `torch.dot(input, other, *, out=None) → Tensor` Computes the dot product of two 1D tensors. Note Unlike NumPy’s dot, torch.dot intentionally only supports computing the dot product of two 1D tensors with the same number of elements. Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – first tensor in the dot product, must be 1D. 
* **other** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – second tensor in the dot product, must be 1D.

Keyword Arguments

**out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor.

Example:

>>> torch.dot(torch.tensor([2, 3]), torch.tensor([2, 1]))
tensor(7)

# torch.dstack

`torch.dstack(tensors, *, out=None) → Tensor`

Stack tensors in sequence depthwise (along third axis). This is equivalent to concatenation along the third axis after 1-D and 2-D tensors have been reshaped by [`torch.atleast_3d()`](torch.atleast_3d#torch.atleast_3d "torch.atleast_3d").

Parameters

**tensors** (_sequence of Tensors_) – sequence of tensors to concatenate

Keyword Arguments

**out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor.

Example:

>>> a = torch.tensor([1, 2, 3])
>>> b = torch.tensor([4, 5, 6])
>>> torch.dstack((a,b))
tensor([[[1, 4], [2, 5], [3, 6]]])
>>> a = torch.tensor([[1],[2],[3]])
>>> b = torch.tensor([[4],[5],[6]])
>>> torch.dstack((a,b))
tensor([[[1, 4]], [[2, 5]], [[3, 6]]])

# torch.eig

`torch.eig(input, eigenvectors=False, *, out=None) -> (Tensor, Tensor)`

Computes the eigenvalues and eigenvectors of a real square matrix.

Note

Since eigenvalues and eigenvectors might be complex, backward pass is supported only if eigenvalues and eigenvectors are all real valued. When `input` is on CUDA, `torch.eig()` causes host-device synchronization.

Parameters

* **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the square matrix of shape (n \times n) for which the eigenvalues and eigenvectors will be computed
* **eigenvectors** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – `True` to compute both eigenvalues and eigenvectors; otherwise, only eigenvalues will be computed

Keyword Arguments

**out** ([tuple](https://docs.python.org/3/library/stdtypes.html#tuple "\(in Python v3.9\)") _,__optional_) – the output tensors

Returns

A namedtuple (eigenvalues, eigenvectors) containing

* **eigenvalues** (_Tensor_): Shape (n \times 2). Each row is an eigenvalue of `input`, where the first element is the real part and the second element is the imaginary part. The eigenvalues are not necessarily ordered.
* **eigenvectors** (_Tensor_): If `eigenvectors=False`, it’s an empty tensor. Otherwise, this tensor of shape (n \times n) can be used to compute normalized (unit length) eigenvectors of corresponding eigenvalues as follows. If the corresponding `eigenvalues[j]` is a real number, column `eigenvectors[:, j]` is the eigenvector corresponding to `eigenvalues[j]`. If the corresponding `eigenvalues[j]` and `eigenvalues[j + 1]` form a complex conjugate pair, then the true eigenvectors can be computed as \text{true eigenvector}[j] = \text{eigenvectors}[:, j] + i \times \text{eigenvectors}[:, j + 1], \text{true eigenvector}[j + 1] = \text{eigenvectors}[:, j] - i \times \text{eigenvectors}[:, j + 1].

Return type

([Tensor](../tensors#torch.Tensor "torch.Tensor"), [Tensor](../tensors#torch.Tensor "torch.Tensor"))

Example:

Trivial example with a diagonal matrix.
By default, only eigenvalues are computed: >>> a = torch.diag(torch.tensor([1, 2, 3], dtype=torch.double)) >>> e, v = torch.eig(a) >>> e tensor([[1., 0.], [2., 0.], [3., 0.]], dtype=torch.float64) >>> v tensor([], dtype=torch.float64) Compute also the eigenvectors: >>> e, v = torch.eig(a, eigenvectors=True) >>> e tensor([[1., 0.], [2., 0.], [3., 0.]], dtype=torch.float64) >>> v tensor([[1., 0., 0.], [0., 1., 0.], [0., 0., 1.]], dtype=torch.float64) # torch.einsum `torch.einsum(equation, *operands) → Tensor` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/functional.html#einsum) Sums the product of the elements of the input `operands` along dimensions specified using a notation based on the Einstein summation convention. Einsum allows computing many common multi-dimensional linear algebraic array operations by representing them in a short-hand format based on the Einstein summation convention, given by `equation`. The details of this format are described below, but the general idea is to label every dimension of the input `operands` with some subscript and define which subscripts are part of the output. The output is then computed by summing the product of the elements of the `operands` along the dimensions whose subscripts are not part of the output. For example, matrix multiplication can be computed using einsum as `torch.einsum(“ij,jk->ik”, A, B)`. Here, j is the summation subscript and i and k the output subscripts (see section below for more details on why). Equation: The `equation` string specifies the subscripts (lower case letters `[‘a’, ‘z’]`) for each dimension of the input `operands` in the same order as the dimensions, separating subcripts for each operand by a comma (‘,’), e.g. `‘ij,jk’` specify subscripts for two 2D operands. The dimensions labeled with the same subscript must be broadcastable, that is, their size must either match or be `1`. The exception is if a subscript is repeated for the same input operand, in which case the dimensions labeled with this subscript for this operand must match in size and the operand will be replaced by its diagonal along these dimensions. The subscripts that appear exactly once in the `equation` will be part of the output, sorted in increasing alphabetical order. The output is computed by multiplying the input `operands` element- wise, with their dimensions aligned based on the subscripts, and then summing out the dimensions whose subscripts are not part of the output. Optionally, the output subscripts can be explicitly defined by adding an arrow (‘->’) at the end of the equation followed by the subscripts for the output. For instance, the following equation computes the transpose of a matrix multiplication: ‘ij,jk->ki’. The output subscripts must appear at least once for some input operand and at most once for the output. Ellipsis (‘…’) can be used in place of subscripts to broadcast the dimensions covered by the ellipsis. Each input operand may contain at most one ellipsis which will cover the dimensions not covered by subscripts, e.g. for an input operand with 5 dimensions, the ellipsis in the equation `‘ab…c’` cover the third and fourth dimensions. The ellipsis does not need to cover the same number of dimensions across the `operands` but the ‘shape’ of the ellipsis (the size of the dimensions covered by them) must broadcast together. 
If the output is not explicitly defined with the arrow (‘->’) notation, the ellipsis will come first in the output (left-most dimensions), before the subscript labels that appear exactly once for the input operands. e.g. the following equation implements batch matrix multiplication `‘…ij,…jk’`. A few final notes: the equation may contain whitespaces between the different elements (subscripts, ellipsis, arrow and comma) but something like `‘…’` is not valid. An empty string `‘’` is valid for scalar operands. Note `torch.einsum` handles ellipsis (‘…’) differently from NumPy in that it allows dimensions covered by the ellipsis to be summed over, that is, ellipsis are not required to be part of the output. Note This function does not optimize the given expression, so a different formula for the same computation may run faster or consume less memory. Projects like opt_einsum () can optimize the formula for you. Parameters * **equation** (_string_) – The subscripts for the Einstein summation. * **operands** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – The operands to compute the Einstein sum of. Examples: # trace >>> torch.einsum('ii', torch.randn(4, 4)) tensor(-1.2104) # diagonal >>> torch.einsum('ii->i', torch.randn(4, 4)) tensor([-0.1034, 0.7952, -0.2433, 0.4545]) # outer product >>> x = torch.randn(5) >>> y = torch.randn(4) >>> torch.einsum('i,j->ij', x, y) tensor([[ 0.1156, -0.2897, -0.3918, 0.4963], [-0.3744, 0.9381, 1.2685, -1.6070], [ 0.7208, -1.8058, -2.4419, 3.0936], [ 0.1713, -0.4291, -0.5802, 0.7350], [ 0.5704, -1.4290, -1.9323, 2.4480]]) # batch matrix multiplication >>> As = torch.randn(3,2,5) >>> Bs = torch.randn(3,5,4) >>> torch.einsum('bij,bjk->bik', As, Bs) tensor([[[-1.0564, -1.5904, 3.2023, 3.1271], [-1.6706, -0.8097, -0.8025, -2.1183]], [[ 4.2239, 0.3107, -0.5756, -0.2354], [-1.4558, -0.3460, 1.5087, -0.8530]], [[ 2.8153, 1.8787, -4.3839, -1.2112], [ 0.3728, -2.1131, 0.0921, 0.8305]]]) # batch permute >>> A = torch.randn(2, 3, 4, 5) >>> torch.einsum('...ij->...ji', A).shape torch.Size([2, 3, 5, 4]) # equivalent to torch.nn.functional.bilinear >>> A = torch.randn(3,5,4) >>> l = torch.randn(2,5) >>> r = torch.randn(2,4) >>> torch.einsum('bn,anm,bm->ba', l, A, r) tensor([[-0.3430, -5.2405, 0.4494], [ 0.3311, 5.5201, -3.0356]]) # torch.empty `torch.empty(*size, *, out=None, dtype=None, layout=torch.strided, device=None, requires_grad=False, pin_memory=False) → Tensor` Returns a tensor filled with uninitialized data. The shape of the tensor is defined by the variable argument `size`. Parameters **size** (_int..._) – a sequence of integers defining the shape of the output tensor. Can be a variable number of arguments or a collection like a list or tuple. Keyword Arguments * **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. * **dtype** ([`torch.dtype`](../tensor_attributes#torch.torch.dtype "torch.torch.dtype"), optional) – the desired data type of returned tensor. Default: if `None`, uses a global default (see [`torch.set_default_tensor_type()`](torch.set_default_tensor_type#torch.set_default_tensor_type "torch.set_default_tensor_type")). * **layout** ([`torch.layout`](../tensor_attributes#torch.torch.layout "torch.torch.layout"), optional) – the desired layout of returned Tensor. Default: `torch.strided`. * **device** ([`torch.device`](../tensor_attributes#torch.torch.device "torch.torch.device"), optional) – the desired device of returned tensor. 
Default: if `None`, uses the current device for the default tensor type (see [`torch.set_default_tensor_type()`](torch.set_default_tensor_type#torch.set_default_tensor_type "torch.set_default_tensor_type")). `device` will be the CPU for CPU tensor types and the current CUDA device for CUDA tensor types.
* **requires_grad** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – If autograd should record operations on the returned tensor. Default: `False`.
* **pin_memory** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – If set, returned tensor would be allocated in the pinned memory. Works only for CPU tensors. Default: `False`.
* **memory_format** ([`torch.memory_format`](../tensor_attributes#torch.torch.memory_format "torch.torch.memory_format"), optional) – the desired memory format of returned Tensor. Default: `torch.contiguous_format`.

Example:

>>> torch.empty(2, 3)
tensor(1.00000e-08 * [[ 6.3984, 0.0000, 0.0000], [ 0.0000, 0.0000, 0.0000]])

# torch.empty_like

`torch.empty_like(input, *, dtype=None, layout=None, device=None, requires_grad=False, memory_format=torch.preserve_format) → Tensor`

Returns an uninitialized tensor with the same size as `input`. `torch.empty_like(input)` is equivalent to `torch.empty(input.size(), dtype=input.dtype, layout=input.layout, device=input.device)`.

Parameters

**input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the size of `input` will determine the size of the output tensor.

Keyword Arguments

* **dtype** ([`torch.dtype`](../tensor_attributes#torch.torch.dtype "torch.torch.dtype"), optional) – the desired data type of returned Tensor. Default: if `None`, defaults to the dtype of `input`.
* **layout** ([`torch.layout`](../tensor_attributes#torch.torch.layout "torch.torch.layout"), optional) – the desired layout of returned tensor. Default: if `None`, defaults to the layout of `input`.
* **device** ([`torch.device`](../tensor_attributes#torch.torch.device "torch.torch.device"), optional) – the desired device of returned tensor. Default: if `None`, defaults to the device of `input`.
* **requires_grad** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – If autograd should record operations on the returned tensor. Default: `False`.
* **memory_format** ([`torch.memory_format`](../tensor_attributes#torch.torch.memory_format "torch.torch.memory_format"), optional) – the desired memory format of returned Tensor. Default: `torch.preserve_format`.

Example:

>>> a = torch.empty(2, 3)
>>> torch.empty_like(a)
tensor([[ 9.4064e+13, 2.8000e+01, 9.3493e+13], [ 7.5751e+18, 7.1428e+18, 7.5955e+18]])

# torch.empty_strided

`torch.empty_strided(size, stride, *, dtype=None, layout=None, device=None, requires_grad=False, pin_memory=False) → Tensor`

Returns a tensor filled with uninitialized data. The shape and strides of the tensor are defined by the variable arguments `size` and `stride` respectively. `torch.empty_strided(size, stride)` is equivalent to `torch.empty(size).as_strided(size, stride)`.

Warning

More than one element of the created tensor may refer to a single memory location. As a result, in-place operations (especially ones that are vectorized) may result in incorrect behavior. If you need to write to the tensors, please clone them first.
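To illustrate the warning above, here is a hedged sketch of how overlapping strides make distinct indices alias the same storage element (the chosen size and strides are illustrative):

>>> t = torch.empty_strided((2, 2), (1, 1)).zero_()
>>> t[0, 1] = 5.
>>> t   # the write to t[0, 1] also appears at t[1, 0]: both index storage element 1
tensor([[0., 5.],
        [5., 0.]])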
Parameters * **size** (_tuple of python:ints_) – the shape of the output tensor * **stride** (_tuple of python:ints_) – the strides of the output tensor Keyword Arguments * **dtype** ([`torch.dtype`](../tensor_attributes#torch.torch.dtype "torch.torch.dtype"), optional) – the desired data type of returned tensor. Default: if `None`, uses a global default (see [`torch.set_default_tensor_type()`](torch.set_default_tensor_type#torch.set_default_tensor_type "torch.set_default_tensor_type")). * **layout** ([`torch.layout`](../tensor_attributes#torch.torch.layout "torch.torch.layout"), optional) – the desired layout of returned Tensor. Default: `torch.strided`. * **device** ([`torch.device`](../tensor_attributes#torch.torch.device "torch.torch.device"), optional) – the desired device of returned tensor. Default: if `None`, uses the current device for the default tensor type (see [`torch.set_default_tensor_type()`](torch.set_default_tensor_type#torch.set_default_tensor_type "torch.set_default_tensor_type")). `device` will be the CPU for CPU tensor types and the current CUDA device for CUDA tensor types. * **requires_grad** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – If autograd should record operations on the returned tensor. Default: `False`. * **pin_memory** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – If set, returned tensor would be allocated in the pinned memory. Works only for CPU tensors. Default: `False`. Example: >>> a = torch.empty_strided((2, 3), (1, 2)) >>> a tensor([[8.9683e-44, 4.4842e-44, 5.1239e+07], [0.0000e+00, 0.0000e+00, 3.0705e-41]]) >>> a.stride() (1, 2) >>> a.size() torch.Size([2, 3]) # enable_grad `class torch.enable_grad` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/autograd/grad_mode.html#enable_grad) Context-manager that enables gradient calculation. Enables gradient calculation, if it has been disabled via [`no_grad`](torch.no_grad#torch.no_grad "torch.no_grad") or [`set_grad_enabled`](torch.set_grad_enabled#torch.set_grad_enabled "torch.set_grad_enabled"). This context manager is thread local; it will not affect computation in other threads. Also functions as a decorator. (Make sure to instantiate with parenthesis.) Example: >>> x = torch.tensor([1], requires_grad=True) >>> with torch.no_grad(): ... with torch.enable_grad(): ... y = x * 2 >>> y.requires_grad True >>> y.backward() >>> x.grad >>> @torch.enable_grad() ... def doubler(x): ... return x * 2 >>> with torch.no_grad(): ... z = doubler(x) >>> z.requires_grad True # torch.eq `torch.eq(input, other, *, out=None) → Tensor` Computes element-wise equality The second argument can be a number or a tensor whose shape is [broadcastable](https://pytorch.org/docs/1.8.0/notes/broadcasting.html#broadcasting- semantics) with the first argument. Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the tensor to compare * **other** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _or_[float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)")) – the tensor or value to compare Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. 
Returns A boolean tensor that is True where `input` is equal to `other` and False elsewhere Example: >>> torch.eq(torch.tensor([[1, 2], [3, 4]]), torch.tensor([[1, 1], [4, 4]])) tensor([[ True, False], [False, True]]) # torch.equal `torch.equal(input, other) → bool` `True` if two tensors have the same size and elements, `False` otherwise. Example: >>> torch.equal(torch.tensor([1, 2]), torch.tensor([1, 2])) True # torch.erf `torch.erf(input, *, out=None) → Tensor` Computes the error function of each element. The error function is defined as follows: erf(x)=2π∫0xe−t2dt\mathrm{erf}(x) = \frac{2}{\sqrt{\pi}} \int_{0}^{x} e^{-t^2} dt Parameters **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. Example: >>> torch.erf(torch.tensor([0, -1., 10.])) tensor([ 0.0000, -0.8427, 1.0000]) # torch.erfc `torch.erfc(input, *, out=None) → Tensor` Computes the complementary error function of each element of `input`. The complementary error function is defined as follows: erfc(x)=1−2π∫0xe−t2dt\mathrm{erfc}(x) = 1 - \frac{2}{\sqrt{\pi}} \int_{0}^{x} e^{-t^2} dt Parameters **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. Example: >>> torch.erfc(torch.tensor([0, -1., 10.])) tensor([ 1.0000, 1.8427, 0.0000]) # torch.erfinv `torch.erfinv(input, *, out=None) → Tensor` Computes the inverse error function of each element of `input`. The inverse error function is defined in the range (−1,1)(-1, 1) as: erfinv(erf(x))=x\mathrm{erfinv}(\mathrm{erf}(x)) = x Parameters **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. Example: >>> torch.erfinv(torch.tensor([0, 0.5, -1.])) tensor([ 0.0000, 0.4769, -inf]) # torch.exp `torch.exp(input, *, out=None) → Tensor` Returns a new tensor with the exponential of the elements of the input tensor `input`. yi=exiy_{i} = e^{x_{i}} Parameters **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. Example: >>> torch.exp(torch.tensor([0, math.log(2.)])) tensor([ 1., 2.]) # torch.exp2 `torch.exp2(input, *, out=None) → Tensor` Computes the base two exponential function of `input`. yi=2xiy_{i} = 2^{x_{i}} Parameters **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. Example: >>> torch.exp2(torch.tensor([0, math.log2(2.), 3, 4])) tensor([ 1., 2., 8., 16.]) # torch.expm1 `torch.expm1(input, *, out=None) → Tensor` Returns a new tensor with the exponential of the elements minus 1 of `input`. yi=exi−1y_{i} = e^{x_{i}} - 1 Parameters **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. Example: >>> torch.expm1(torch.tensor([0, math.log(2.)])) tensor([ 0., 1.]) # torch.eye `torch.eye(n, m=None, *, out=None, dtype=None, layout=torch.strided, device=None, requires_grad=False) → Tensor` Returns a 2-D tensor with ones on the diagonal and zeros elsewhere. 
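Passing `m` yields a rectangular matrix, and `dtype` can be used to build a boolean diagonal mask; a small sketch:

>>> torch.eye(2, 4)
tensor([[1., 0., 0., 0.],
        [0., 1., 0., 0.]])
>>> torch.eye(3, dtype=torch.bool)
tensor([[ True, False, False],
        [False,  True, False],
        [False, False,  True]])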
Parameters

* **n** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – the number of rows
* **m** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – the number of columns, with default being `n`

Keyword Arguments

* **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor.
* **dtype** ([`torch.dtype`](../tensor_attributes#torch.torch.dtype "torch.torch.dtype"), optional) – the desired data type of returned tensor. Default: if `None`, uses a global default (see [`torch.set_default_tensor_type()`](torch.set_default_tensor_type#torch.set_default_tensor_type "torch.set_default_tensor_type")).
* **layout** ([`torch.layout`](../tensor_attributes#torch.torch.layout "torch.torch.layout"), optional) – the desired layout of returned Tensor. Default: `torch.strided`.
* **device** ([`torch.device`](../tensor_attributes#torch.torch.device "torch.torch.device"), optional) – the desired device of returned tensor. Default: if `None`, uses the current device for the default tensor type (see [`torch.set_default_tensor_type()`](torch.set_default_tensor_type#torch.set_default_tensor_type "torch.set_default_tensor_type")). `device` will be the CPU for CPU tensor types and the current CUDA device for CUDA tensor types.
* **requires_grad** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – If autograd should record operations on the returned tensor. Default: `False`.

Returns

A 2-D tensor with ones on the diagonal and zeros elsewhere

Return type

[Tensor](../tensors#torch.Tensor "torch.Tensor")

Example:

>>> torch.eye(3)
tensor([[ 1., 0., 0.], [ 0., 1., 0.], [ 0., 0., 1.]])

# torch.fake_quantize_per_channel_affine

`torch.fake_quantize_per_channel_affine(input, scale, zero_point, axis, quant_min, quant_max) → Tensor`

Returns a new tensor with the data in `input` fake quantized per channel using `scale`, `zero_point`, `quant_min` and `quant_max`, across the channel specified by `axis`.

\text{output} = \min(\text{quant\_max}, \max(\text{quant\_min}, \text{std::nearby\_int}(\text{input} / \text{scale}) + \text{zero\_point}))

Parameters

* **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input value(s), in `torch.float32`.
* **scale** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – quantization scale, per channel * **zero_point** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – quantization zero_point, per channel * **axis** (_int32_) – channel axis * **quant_min** (_int64_) – lower bound of the quantized domain * **quant_max** (_int64_) – upper bound of the quantized domain Returns A newly fake_quantized per channel tensor Return type [Tensor](../tensors#torch.Tensor "torch.Tensor") Example: >>> x = torch.randn(2, 2, 2) >>> x tensor([[[-0.2525, -0.0466], [ 0.3491, -0.2168]], [[-0.5906, 1.6258], [ 0.6444, -0.0542]]]) >>> scales = (torch.randn(2) + 1) * 0.05 >>> scales tensor([0.0475, 0.0486]) >>> zero_points = torch.zeros(2).to(torch.long) >>> zero_points tensor([0, 0]) >>> torch.fake_quantize_per_channel_affine(x, scales, zero_points, 1, 0, 255) tensor([[[0.0000, 0.0000], [0.3405, 0.0000]], [[0.0000, 1.6134], [0.6323, 0.0000]]]) # torch.fake_quantize_per_tensor_affine `torch.fake_quantize_per_tensor_affine(input, scale, zero_point, quant_min, quant_max) → Tensor` Returns a new tensor with the data in `input` fake quantized using `scale`, `zero_point`, `quant_min` and `quant_max`. output=min(quant_max,max(quant_min,std::nearby_int(input/scale)+zero_point))\text{output} = min( \text{quant\\_max}, max( \text{quant\\_min}, \text{std::nearby\\_int}(\text{input} / \text{scale}) + \text{zero\\_point} ) ) Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input value(s), in `torch.float32`. * **scale** (_double_) – quantization scale * **zero_point** (_int64_) – quantization zero_point * **quant_min** (_int64_) – lower bound of the quantized domain * **quant_max** (_int64_) – upper bound of the quantized domain Returns A newly fake_quantized tensor Return type [Tensor](../tensors#torch.Tensor "torch.Tensor") Example: >>> x = torch.randn(4) >>> x tensor([ 0.0552, 0.9730, 0.3973, -1.0780]) >>> torch.fake_quantize_per_tensor_affine(x, 0.1, 0, 0, 255) tensor([0.1000, 1.0000, 0.4000, 0.0000]) # torch.fix `torch.fix(input, *, out=None) → Tensor` Alias for [`torch.trunc()`](torch.trunc#torch.trunc "torch.trunc") # torch.flatten `torch.flatten(input, start_dim=0, end_dim=-1) → Tensor` Flattens `input` by reshaping it into a one-dimensional tensor. If `start_dim` or `end_dim` are passed, only dimensions starting with `start_dim` and ending with `end_dim` are flattened. The order of elements in `input` is unchanged. Unlike NumPy’s flatten, which always copies input’s data, this function may return the original object, a view, or copy. If no dimensions are flattened, then the original object `input` is returned. Otherwise, if input can be viewed as the flattened shape, then that view is returned. Finally, only if the input cannot be viewed as the flattened shape is input’s data copied. See [`torch.Tensor.view()`](../tensors#torch.Tensor.view "torch.Tensor.view") for details on when a view will be returned. Note Flattening a zero-dimensional tensor will return a one-dimensional view. Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. * **start_dim** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – the first dim to flatten * **end_dim** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – the last dim to flatten Example: >>> t = torch.tensor([[[1, 2], ... [3, 4]], ... [[5, 6], ... 
[7, 8]]]) >>> torch.flatten(t) tensor([1, 2, 3, 4, 5, 6, 7, 8]) >>> torch.flatten(t, start_dim=1) tensor([[1, 2, 3, 4], [5, 6, 7, 8]]) # torch.flip `torch.flip(input, dims) → Tensor` Reverse the order of a n-D tensor along given axis in dims. Note `torch.flip` makes a copy of `input`’s data. This is different from NumPy’s `np.flip`, which returns a view in constant time. Since copying a tensor’s data is more work than viewing that data, `torch.flip` is expected to be slower than `np.flip`. Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. * **dims** (_a list_ _or_[tuple](https://docs.python.org/3/library/stdtypes.html#tuple "\(in Python v3.9\)")) – axis to flip on Example: >>> x = torch.arange(8).view(2, 2, 2) >>> x tensor([[[ 0, 1], [ 2, 3]], [[ 4, 5], [ 6, 7]]]) >>> torch.flip(x, [0, 1]) tensor([[[ 6, 7], [ 4, 5]], [[ 2, 3], [ 0, 1]]]) # torch.fliplr `torch.fliplr(input) → Tensor` Flip tensor in the left/right direction, returning a new tensor. Flip the entries in each row in the left/right direction. Columns are preserved, but appear in a different order than before. Note Requires the tensor to be at least 2-D. Note `torch.fliplr` makes a copy of `input`’s data. This is different from NumPy’s `np.fliplr`, which returns a view in constant time. Since copying a tensor’s data is more work than viewing that data, `torch.fliplr` is expected to be slower than `np.fliplr`. Parameters **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – Must be at least 2-dimensional. Example: >>> x = torch.arange(4).view(2, 2) >>> x tensor([[0, 1], [2, 3]]) >>> torch.fliplr(x) tensor([[1, 0], [3, 2]]) # torch.flipud `torch.flipud(input) → Tensor` Flip tensor in the up/down direction, returning a new tensor. Flip the entries in each column in the up/down direction. Rows are preserved, but appear in a different order than before. Note Requires the tensor to be at least 1-D. Note `torch.flipud` makes a copy of `input`’s data. This is different from NumPy’s `np.flipud`, which returns a view in constant time. Since copying a tensor’s data is more work than viewing that data, `torch.flipud` is expected to be slower than `np.flipud`. Parameters **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – Must be at least 1-dimensional. Example: >>> x = torch.arange(4).view(2, 2) >>> x tensor([[0, 1], [2, 3]]) >>> torch.flipud(x) tensor([[2, 3], [0, 1]]) # torch.float_power `torch.float_power(input, exponent, *, out=None) → Tensor` Raises `input` to the power of `exponent`, elementwise, in double precision. If neither input is complex returns a `torch.float64` tensor, and if one or more inputs is complex returns a `torch.complex128` tensor. Note This function always computes in double precision, unlike [`torch.pow()`](torch.pow#torch.pow "torch.pow"), which implements more typical [type promotion](../tensor_attributes#type-promotion-doc). This is useful when the computation needs to be performed in a wider or more precise dtype, or the results of the computation may contain fractional values not representable in the input dtypes, like when an integer base is raised to a negative integer exponent. Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _or_ _Number_) – the base value(s) * **exponent** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _or_ _Number_) – the exponent value(s) Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. 
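One situation where this matters is an integer base raised to a negative integer exponent, which fractional results cannot represent in an integer dtype; a hedged comparison sketch:

>>> a = torch.arange(1, 5)              # int64
>>> torch.float_power(a, -2)            # computed in float64, fractional results preserved
tensor([1.0000, 0.2500, 0.1111, 0.0625], dtype=torch.float64)
>>> torch.pow(a.double(), -2)           # comparable only after casting manually
tensor([1.0000, 0.2500, 0.1111, 0.0625], dtype=torch.float64)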
Example: >>> a = torch.randint(10, (4,)) >>> a tensor([6, 4, 7, 1]) >>> torch.float_power(a, 2) tensor([36., 16., 49., 1.], dtype=torch.float64) >>> a = torch.arange(1, 5) >>> a tensor([ 1, 2, 3, 4]) >>> exp = torch.tensor([2, -3, 4, -5]) >>> exp tensor([ 2, -3, 4, -5]) >>> torch.float_power(a, exp) tensor([1.0000e+00, 1.2500e-01, 8.1000e+01, 9.7656e-04], dtype=torch.float64) # torch.floor `torch.floor(input, *, out=None) → Tensor` Returns a new tensor with the floor of the elements of `input`, the largest integer less than or equal to each element. outi=⌊inputi⌋\text{out}_{i} = \left\lfloor \text{input}_{i} \right\rfloor Parameters **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. Example: >>> a = torch.randn(4) >>> a tensor([-0.8166, 1.5308, -0.2530, -0.2091]) >>> torch.floor(a) tensor([-1., 1., -1., -1.]) # torch.floor_divide `torch.floor_divide(input, other, *, out=None) → Tensor` Warning This function’s name is a misnomer. It actually rounds the quotient towards zero instead of taking its floor. This behavior will be deprecated in a future PyTorch release. Computes `input` divided by `other`, elementwise, and rounds each quotient towards zero. Equivalently, it truncates the quotient(s): outi=trunc(inputiotheri)\text{{out}}_i = \text{trunc} \left( \frac{{\text{{input}}_i}}{{\text{{other}}_i}} \right) Supports broadcasting to a common shape, type promotion, and integer and float inputs. Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _or_ _Number_) – the dividend * **other** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _or_ _Number_) – the divisor Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. Example: >>> a = torch.tensor([4.0, 3.0]) >>> b = torch.tensor([2.0, 2.0]) >>> torch.floor_divide(a, b) tensor([2.0, 1.0]) >>> torch.floor_divide(a, 1.4) tensor([2.0, 2.0]) # torch.fmax `torch.fmax(input, other, *, out=None) → Tensor` Computes the element-wise maximum of `input` and `other`. This is like [`torch.maximum()`](torch.maximum#torch.maximum "torch.maximum") except it handles NaNs differently: if exactly one of the two elements being compared is a NaN then the non-NaN element is taken as the maximum. Only if both elements are NaN is NaN propagated. This function is a wrapper around C++’s `std::fmax` and is similar to NumPy’s `fmax` function. Supports [broadcasting to a common shape](https://pytorch.org/docs/1.8.0/notes/broadcasting.html#broadcasting- semantics), [type promotion](../tensor_attributes#type-promotion-doc), and integer and floating-point inputs. Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. * **other** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the second input tensor Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. Example: >>> a = torch.tensor([9.7, float('nan'), 3.1, float('nan')]) >>> b = torch.tensor([-2.2, 0.5, float('nan'), float('nan')]) >>> torch.fmax(a, b) tensor([9.7000, 0.5000, 3.1000, nan]) # torch.fmin `torch.fmin(input, other, *, out=None) → Tensor` Computes the element-wise minimum of `input` and `other`. 
This is like [`torch.minimum()`](torch.minimum#torch.minimum "torch.minimum") except it handles NaNs differently: if exactly one of the two elements being compared is a NaN then the non-NaN element is taken as the minimum. Only if both elements are NaN is NaN propagated. This function is a wrapper around C++’s `std::fmin` and is similar to NumPy’s `fmin` function. Supports [broadcasting to a common shape](https://pytorch.org/docs/1.8.0/notes/broadcasting.html#broadcasting- semantics), [type promotion](../tensor_attributes#type-promotion-doc), and integer and floating-point inputs. Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. * **other** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the second input tensor Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. Example: >>> a = torch.tensor([2.2, float('nan'), 2.1, float('nan')]) >>> b = torch.tensor([-9.3, 0.1, float('nan'), float('nan')]) >>> torch.fmin(a, b) tensor([-9.3000, 0.1000, 2.1000, nan]) # torch.fmod `torch.fmod(input, other, *, out=None) → Tensor` Computes the element-wise remainder of division. The dividend and divisor may contain both for integer and floating point numbers. The remainder has the same sign as the dividend `input`. Supports [broadcasting to a common shape](https://pytorch.org/docs/1.8.0/notes/broadcasting.html#broadcasting- semantics), [type promotion](../tensor_attributes#type-promotion-doc), and integer and float inputs. Note When the divisor is zero, returns `NaN` for floating point dtypes on both CPU and GPU; raises `RuntimeError` for integer division by zero on CPU; Integer division by zero on GPU may return any value. Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the dividend * **other** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _or_ _Scalar_) – the divisor Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. Example: >>> torch.fmod(torch.tensor([-3., -2, -1, 1, 2, 3]), 2) tensor([-1., -0., -1., 1., 0., 1.]) >>> torch.fmod(torch.tensor([1, 2, 3, 4, 5]), 1.5) tensor([1.0000, 0.5000, 0.0000, 1.0000, 0.5000]) # torch.frac `torch.frac(input, *, out=None) → Tensor` Computes the fractional portion of each element in `input`. outi=inputi−⌊∣inputi∣⌋∗sgn⁡(inputi)\text{out}_{i} = \text{input}_{i} - \left\lfloor |\text{input}_{i}| \right\rfloor * \operatorname{sgn}(\text{input}_{i}) Example: >>> torch.frac(torch.tensor([1, 2.5, -3.2])) tensor([ 0.0000, 0.5000, -0.2000]) # torch.from_numpy `torch.from_numpy(ndarray) → Tensor` Creates a [`Tensor`](../tensors#torch.Tensor "torch.Tensor") from a [`numpy.ndarray`](https://numpy.org/doc/stable/reference/generated/numpy.ndarray.html#numpy.ndarray "\(in NumPy v1.20\)"). The returned tensor and `ndarray` share the same memory. Modifications to the tensor will be reflected in the `ndarray` and vice versa. The returned tensor is not resizable. It currently accepts `ndarray` with dtypes of `numpy.float64`, `numpy.float32`, `numpy.float16`, `numpy.complex64`, `numpy.complex128`, `numpy.int64`, `numpy.int32`, `numpy.int16`, `numpy.int8`, `numpy.uint8`, and `numpy.bool`. 
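If you do not want the returned tensor to share memory with the array, clone it; a minimal sketch (assuming NumPy is imported as `numpy`, matching the example below):

>>> a = numpy.array([1., 2., 3.])
>>> t = torch.from_numpy(a).clone()   # clone() copies the data, breaking the shared storage
>>> t[0] = -1
>>> a
array([1., 2., 3.])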
Example: >>> a = numpy.array([1, 2, 3]) >>> t = torch.from_numpy(a) >>> t tensor([ 1, 2, 3]) >>> t[0] = -1 >>> a array([-1, 2, 3]) # torch.full `torch.full(size, fill_value, *, out=None, dtype=None, layout=torch.strided, device=None, requires_grad=False) → Tensor` Creates a tensor of size `size` filled with `fill_value`. The tensor’s dtype is inferred from `fill_value`. Parameters * **size** (_int..._) – a list, tuple, or `torch.Size` of integers defining the shape of the output tensor. * **fill_value** (_Scalar_) – the value to fill the output tensor with. Keyword Arguments * **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. * **dtype** ([`torch.dtype`](../tensor_attributes#torch.torch.dtype "torch.torch.dtype"), optional) – the desired data type of returned tensor. Default: if `None`, uses a global default (see [`torch.set_default_tensor_type()`](torch.set_default_tensor_type#torch.set_default_tensor_type "torch.set_default_tensor_type")). * **layout** ([`torch.layout`](../tensor_attributes#torch.torch.layout "torch.torch.layout"), optional) – the desired layout of returned Tensor. Default: `torch.strided`. * **device** ([`torch.device`](../tensor_attributes#torch.torch.device "torch.torch.device"), optional) – the desired device of returned tensor. Default: if `None`, uses the current device for the default tensor type (see [`torch.set_default_tensor_type()`](torch.set_default_tensor_type#torch.set_default_tensor_type "torch.set_default_tensor_type")). `device` will be the CPU for CPU tensor types and the current CUDA device for CUDA tensor types. * **requires_grad** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – If autograd should record operations on the returned tensor. Default: `False`. Example: >>> torch.full((2, 3), 3.141592) tensor([[ 3.1416, 3.1416, 3.1416], [ 3.1416, 3.1416, 3.1416]]) # torch.full_like `torch.full_like(input, fill_value, *, dtype=None, layout=torch.strided, device=None, requires_grad=False, memory_format=torch.preserve_format) → Tensor` Returns a tensor with the same size as `input` filled with `fill_value`. `torch.full_like(input, fill_value)` is equivalent to `torch.full(input.size(), fill_value, dtype=input.dtype, layout=input.layout, device=input.device)`. Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the size of `input` will determine size of the output tensor. * **fill_value** – the number to fill the output tensor with. Keyword Arguments * **dtype** ([`torch.dtype`](../tensor_attributes#torch.torch.dtype "torch.torch.dtype"), optional) – the desired data type of returned Tensor. Default: if `None`, defaults to the dtype of `input`. * **layout** ([`torch.layout`](../tensor_attributes#torch.torch.layout "torch.torch.layout"), optional) – the desired layout of returned tensor. Default: if `None`, defaults to the layout of `input`. * **device** ([`torch.device`](../tensor_attributes#torch.torch.device "torch.torch.device"), optional) – the desired device of returned tensor. Default: if `None`, defaults to the device of `input`. * **requires_grad** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – If autograd should record operations on the returned tensor. Default: `False`. * **memory_format** ([`torch.memory_format`](../tensor_attributes#torch.torch.memory_format "torch.torch.memory_format"), optional) – the desired memory format of returned Tensor. 
Default: `torch.preserve_format`. # torch.gather `torch.gather(input, dim, index, *, sparse_grad=False, out=None) → Tensor` Gathers values along an axis specified by `dim`. For a 3-D tensor the output is specified by: out[i][j][k] = input[index[i][j][k]][j][k] # if dim == 0 out[i][j][k] = input[i][index[i][j][k]][k] # if dim == 1 out[i][j][k] = input[i][j][index[i][j][k]] # if dim == 2 `input` and `index` must have the same number of dimensions. It is also required that `index.size(d) <= input.size(d)` for all dimensions `d != dim`. `out` will have the same shape as `index`. Note that `input` and `index` do not broadcast against each other. Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the source tensor * **dim** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – the axis along which to index * **index** (_LongTensor_) – the indices of elements to gather Keyword Arguments * **sparse_grad** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – If `True`, gradient w.r.t. `input` will be a sparse tensor. * **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the destination tensor Example: >>> t = torch.tensor([[1, 2], [3, 4]]) >>> torch.gather(t, 1, torch.tensor([[0, 0], [1, 0]])) tensor([[ 1, 1], [ 4, 3]]) # torch.gcd `torch.gcd(input, other, *, out=None) → Tensor` Computes the element-wise greatest common divisor (GCD) of `input` and `other`. Both `input` and `other` must have integer types. Note This defines gcd(0,0)=0gcd(0, 0) = 0 . Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. * **other** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the second input tensor Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. Example: >>> a = torch.tensor([5, 10, 15]) >>> b = torch.tensor([3, 4, 5]) >>> torch.gcd(a, b) tensor([1, 2, 5]) >>> c = torch.tensor([3]) >>> torch.gcd(a, c) tensor([1, 1, 3]) # torch.ge `torch.ge(input, other, *, out=None) → Tensor` Computes input≥other\text{input} \geq \text{other} element-wise. The second argument can be a number or a tensor whose shape is [broadcastable](https://pytorch.org/docs/1.8.0/notes/broadcasting.html#broadcasting- semantics) with the first argument. Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the tensor to compare * **other** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _or_[float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)")) – the tensor or value to compare Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. Returns A boolean tensor that is True where `input` is greater than or equal to `other` and False elsewhere Example: >>> torch.ge(torch.tensor([[1, 2], [3, 4]]), torch.tensor([[1, 1], [4, 4]])) tensor([[True, True], [False, True]]) # Generator `class torch.Generator(device='cpu') → Generator` Creates and returns a generator object that manages the state of the algorithm which produces pseudo random numbers. Used as a keyword argument in many [In- place random sampling](../torch#inplace-random-sampling) functions. Parameters **device** ([`torch.device`](../tensor_attributes#torch.torch.device "torch.torch.device"), optional) – the desired device for the generator. Returns An torch.Generator object. 
Return type Generator Example: >>> g_cpu = torch.Generator() >>> g_cuda = torch.Generator(device='cuda') `device` Generator.device -> device Gets the current device of the generator. Example: >>> g_cpu = torch.Generator() >>> g_cpu.device device(type='cpu') `get_state() → Tensor` Returns the Generator state as a `torch.ByteTensor`. Returns A `torch.ByteTensor` which contains all the necessary bits to restore a Generator to a specific point in time. Return type [Tensor](../tensors#torch.Tensor "torch.Tensor") Example: >>> g_cpu = torch.Generator() >>> g_cpu.get_state() `initial_seed() → int` Returns the initial seed for generating random numbers. Example: >>> g_cpu = torch.Generator() >>> g_cpu.initial_seed() 2147483647 `manual_seed(seed) → Generator` Sets the seed for generating random numbers. Returns a `torch.Generator` object. It is recommended to set a large seed, i.e. a number that has a good balance of 0 and 1 bits. Avoid having many 0 bits in the seed. Parameters **seed** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – The desired seed. Value must be within the inclusive range `[-0x8000_0000_0000_0000, 0xffff_ffff_ffff_ffff]`. Otherwise, a RuntimeError is raised. Negative inputs are remapped to positive values with the formula `0xffff_ffff_ffff_ffff + seed`. Returns An torch.Generator object. Return type Generator Example: >>> g_cpu = torch.Generator() >>> g_cpu.manual_seed(2147483647) `seed() → int` Gets a non-deterministic random number from std::random_device or the current time and uses it to seed a Generator. Example: >>> g_cpu = torch.Generator() >>> g_cpu.seed() 1516516984916 `set_state(new_state) → void` Sets the Generator state. Parameters **new_state** (_torch.ByteTensor_) – The desired state. Example: >>> g_cpu = torch.Generator() >>> g_cpu_other = torch.Generator() >>> g_cpu.set_state(g_cpu_other.get_state()) # torch.geqrf `torch.geqrf(input, *, out=None) -> (Tensor, Tensor)` This is a low-level function for calling LAPACK directly. This function returns a namedtuple (a, tau) as defined in [LAPACK documentation for geqrf](https://software.intel.com/en-us/node/521004) . You’ll generally want to use [`torch.qr()`](torch.qr#torch.qr "torch.qr") instead. Computes a QR decomposition of `input`, but without constructing QQ and RR as explicit separate matrices. Rather, this directly calls the underlying LAPACK function `?geqrf` which produces a sequence of ‘elementary reflectors’. See [LAPACK documentation for geqrf](https://software.intel.com/en- us/node/521004) for further details. Parameters **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input matrix Keyword Arguments **out** ([tuple](https://docs.python.org/3/library/stdtypes.html#tuple "\(in Python v3.9\)") _,__optional_) – the output tuple of (Tensor, Tensor) # torch.ger `torch.ger(input, vec2, *, out=None) → Tensor` Alias of [`torch.outer()`](torch.outer#torch.outer "torch.outer"). Warning This function is deprecated and will be removed in a future PyTorch release. Use [`torch.outer()`](torch.outer#torch.outer "torch.outer") instead. # torch.get_default_dtype `torch.get_default_dtype() → torch.dtype` Get the current default floating point [`torch.dtype`](../tensor_attributes#torch.torch.dtype "torch.torch.dtype"). 
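As a supplement to the example below, a small sketch (assuming a fresh session with the stock default of `torch.float32`) showing how the default dtype is picked up by factory calls with Python floats:

import torch

print(torch.get_default_dtype())        # torch.float32 on a fresh session
x = torch.tensor([1.0, 2.0])            # Python floats follow the default floating point dtype
print(x.dtype)                          # torch.float32
torch.set_default_dtype(torch.float64)
print(torch.tensor([1.0, 2.0]).dtype)   # torch.float64 after changing the default
torch.set_default_dtype(torch.float32)  # restore the default for the examples that follow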
Example: >>> torch.get_default_dtype() # initial default for floating point is torch.float32 torch.float32 >>> torch.set_default_dtype(torch.float64) >>> torch.get_default_dtype() # default is now changed to torch.float64 torch.float64 >>> torch.set_default_tensor_type(torch.FloatTensor) # setting tensor type also affects this >>> torch.get_default_dtype() # changed to torch.float32, the dtype for torch.FloatTensor torch.float32 # torch.get_num_interop_threads `torch.get_num_interop_threads() → int` Returns the number of threads used for inter-op parallelism on CPU (e.g. in JIT interpreter) # torch.get_num_threads `torch.get_num_threads() → int` Returns the number of threads used for parallelizing CPU operations # torch.get_rng_state `torch.get_rng_state()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/random.html#get_rng_state) Returns the random number generator state as a `torch.ByteTensor`. # torch.greater `torch.greater(input, other, *, out=None) → Tensor` Alias for [`torch.gt()`](torch.gt#torch.gt "torch.gt"). # torch.greater_equal `torch.greater_equal(input, other, *, out=None) → Tensor` Alias for [`torch.ge()`](torch.ge#torch.ge "torch.ge"). # torch.gt `torch.gt(input, other, *, out=None) → Tensor` Computes input>other\text{input} > \text{other} element-wise. The second argument can be a number or a tensor whose shape is [broadcastable](https://pytorch.org/docs/1.8.0/notes/broadcasting.html#broadcasting- semantics) with the first argument. Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the tensor to compare * **other** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _or_[float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)")) – the tensor or value to compare Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. Returns A boolean tensor that is True where `input` is greater than `other` and False elsewhere Example: >>> torch.gt(torch.tensor([[1, 2], [3, 4]]), torch.tensor([[1, 1], [4, 4]])) tensor([[False, True], [False, False]]) # torch.hamming_window `torch.hamming_window(window_length, periodic=True, alpha=0.54, beta=0.46, *, dtype=None, layout=torch.strided, device=None, requires_grad=False) → Tensor` Hamming window function. w[n]=α−βcos⁡(2πnN−1),w[n] = \alpha - \beta\ \cos \left( \frac{2 \pi n}{N - 1} \right), where NN is the full window size. The input `window_length` is a positive integer controlling the returned window size. `periodic` flag determines whether the returned window trims off the last duplicate value from the symmetric window and is ready to be used as a periodic window with functions like [`torch.stft()`](torch.stft#torch.stft "torch.stft"). Therefore, if `periodic` is true, the NN in above formula is in fact window_length+1\text{window\\_length} + 1 . Also, we always have `torch.hamming_window(L, periodic=True)` equal to `torch.hamming_window(L + 1, periodic=False)[:-1])`. Note If `window_length` =1=1 , the returned window contains a single value 1. Note This is a generalized version of [`torch.hann_window()`](torch.hann_window#torch.hann_window "torch.hann_window"). Parameters * **window_length** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – the size of returned window * **periodic** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – If True, returns a window to be used as periodic function. If False, return a symmetric window. 
* **alpha** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _,__optional_) – The coefficient α\alpha in the equation above * **beta** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _,__optional_) – The coefficient β\beta in the equation above Keyword Arguments * **dtype** ([`torch.dtype`](../tensor_attributes#torch.torch.dtype "torch.torch.dtype"), optional) – the desired data type of returned tensor. Default: if `None`, uses a global default (see [`torch.set_default_tensor_type()`](torch.set_default_tensor_type#torch.set_default_tensor_type "torch.set_default_tensor_type")). Only floating point types are supported. * **layout** ([`torch.layout`](../tensor_attributes#torch.torch.layout "torch.torch.layout"), optional) – the desired layout of returned window tensor. Only `torch.strided` (dense layout) is supported. * **device** ([`torch.device`](../tensor_attributes#torch.torch.device "torch.torch.device"), optional) – the desired device of returned tensor. Default: if `None`, uses the current device for the default tensor type (see [`torch.set_default_tensor_type()`](torch.set_default_tensor_type#torch.set_default_tensor_type "torch.set_default_tensor_type")). `device` will be the CPU for CPU tensor types and the current CUDA device for CUDA tensor types. * **requires_grad** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – If autograd should record operations on the returned tensor. Default: `False`. Returns A 1-D tensor of size (window_length,)(\text{window\\_length},) containing the window Return type [Tensor](../tensors#torch.Tensor "torch.Tensor") # torch.hann_window `torch.hann_window(window_length, periodic=True, *, dtype=None, layout=torch.strided, device=None, requires_grad=False) → Tensor` Hann window function. w[n]=12[1−cos⁡(2πnN−1)]=sin⁡2(πnN−1),w[n] = \frac{1}{2}\ \left[1 - \cos \left( \frac{2 \pi n}{N - 1} \right)\right] = \sin^2 \left( \frac{\pi n}{N - 1} \right), where NN is the full window size. The input `window_length` is a positive integer controlling the returned window size. `periodic` flag determines whether the returned window trims off the last duplicate value from the symmetric window and is ready to be used as a periodic window with functions like [`torch.stft()`](torch.stft#torch.stft "torch.stft"). Therefore, if `periodic` is true, the NN in above formula is in fact window_length+1\text{window\\_length} + 1 . Also, we always have `torch.hann_window(L, periodic=True)` equal to `torch.hann_window(L + 1, periodic=False)[:-1])`. Note If `window_length` =1=1 , the returned window contains a single value 1. Parameters * **window_length** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – the size of returned window * **periodic** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – If True, returns a window to be used as periodic function. If False, return a symmetric window. Keyword Arguments * **dtype** ([`torch.dtype`](../tensor_attributes#torch.torch.dtype "torch.torch.dtype"), optional) – the desired data type of returned tensor. Default: if `None`, uses a global default (see [`torch.set_default_tensor_type()`](torch.set_default_tensor_type#torch.set_default_tensor_type "torch.set_default_tensor_type")). Only floating point types are supported. 
* **layout** ([`torch.layout`](../tensor_attributes#torch.torch.layout "torch.torch.layout"), optional) – the desired layout of returned window tensor. Only `torch.strided` (dense layout) is supported. * **device** ([`torch.device`](../tensor_attributes#torch.torch.device "torch.torch.device"), optional) – the desired device of returned tensor. Default: if `None`, uses the current device for the default tensor type (see [`torch.set_default_tensor_type()`](torch.set_default_tensor_type#torch.set_default_tensor_type "torch.set_default_tensor_type")). `device` will be the CPU for CPU tensor types and the current CUDA device for CUDA tensor types. * **requires_grad** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – If autograd should record operations on the returned tensor. Default: `False`. Returns A 1-D tensor of size (window_length,)(\text{window\\_length},) containing the window Return type [Tensor](../tensors#torch.Tensor "torch.Tensor") # torch.heaviside `torch.heaviside(input, values, *, out=None) → Tensor` Computes the Heaviside step function for each element in `input`. The Heaviside step function is defined as: heaviside(input,values)={0,if input < 0values,if input == 01,if input > 0\text{{heaviside}}(input, values) = \begin{cases} 0, & \text{if input < 0}\\\ values, & \text{if input == 0}\\\ 1, & \text{if input > 0} \end{cases} Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. * **values** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – The values to use where `input` is zero. Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. Example: >>> input = torch.tensor([-1.5, 0, 2.0]) >>> values = torch.tensor([0.5]) >>> torch.heaviside(input, values) tensor([0.0000, 0.5000, 1.0000]) >>> values = torch.tensor([1.2, -2.0, 3.5]) >>> torch.heaviside(input, values) tensor([0., -2., 1.]) # torch.histc `torch.histc(input, bins=100, min=0, max=0, *, out=None) → Tensor` Computes the histogram of a tensor. The elements are sorted into equal width bins between [`min`](torch.min#torch.min "torch.min") and [`max`](torch.max#torch.max "torch.max"). If [`min`](torch.min#torch.min "torch.min") and [`max`](torch.max#torch.max "torch.max") are both zero, the minimum and maximum values of the data are used. Elements lower than min and higher than max are ignored. Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. * **bins** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – number of histogram bins * **min** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – lower end of the range (inclusive) * **max** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – upper end of the range (inclusive) Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. Returns Histogram represented as a tensor Return type [Tensor](../tensors#torch.Tensor "torch.Tensor") Example: >>> torch.histc(torch.tensor([1., 2, 1]), bins=4, min=0, max=3) tensor([ 0., 2., 1., 0.]) # torch.hstack `torch.hstack(tensors, *, out=None) → Tensor` Stack tensors in sequence horizontally (column wise). This is equivalent to concatenation along the first axis for 1-D tensors, and along the second axis for all other tensors. 
Parameters **tensors** (_sequence of Tensors_) – sequence of tensors to concatenate Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. Example: >>> a = torch.tensor([1, 2, 3]) >>> b = torch.tensor([4, 5, 6]) >>> torch.hstack((a,b)) tensor([1, 2, 3, 4, 5, 6]) >>> a = torch.tensor([[1],[2],[3]]) >>> b = torch.tensor([[4],[5],[6]]) >>> torch.hstack((a,b)) tensor([[1, 4], [2, 5], [3, 6]]) # torch.hypot `torch.hypot(input, other, *, out=None) → Tensor` Given the legs of a right triangle, return its hypotenuse. outi=inputi2+otheri2\text{out}_{i} = \sqrt{\text{input}_{i}^{2} + \text{other}_{i}^{2}} The shapes of `input` and `other` must be [broadcastable](https://pytorch.org/docs/1.8.0/notes/broadcasting.html#broadcasting- semantics). Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the first input tensor * **other** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the second input tensor Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. Example: >>> a = torch.hypot(torch.tensor([4.0]), torch.tensor([3.0, 4.0, 5.0])) tensor([5.0000, 5.6569, 6.4031]) # torch.i0 `torch.i0(input, *, out=None) → Tensor` Computes the zeroth order modified Bessel function of the first kind for each element of `input`. outi=I0(inputi)=∑k=0∞(inputi2/4)k(k!)2\text{out}_{i} = I_0(\text{input}_{i}) = \sum_{k=0}^{\infty} \frac{(\text{input}_{i}^2/4)^k}{(k!)^2} Parameters **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. Example: >>> torch.i0(torch.arange(5, dtype=torch.float32)) tensor([ 1.0000, 1.2661, 2.2796, 4.8808, 11.3019]) # torch.igamma `torch.igamma(input, other, *, out=None) → Tensor` Computes the regularized lower incomplete gamma function: outi=1Γ(inputi)∫0otheritinputi−1e−tdt\text{out}_{i} = \frac{1}{\Gamma(\text{input}_i)} \int_0^{\text{other}_i} t^{\text{input}_i-1} e^{-t} dt where both inputi\text{input}_i and otheri\text{other}_i are weakly positive and at least one is strictly positive. If both are zero or either is negative then outi=nan\text{out}_i=\text{nan} . Γ(⋅)\Gamma(\cdot) in the equation above is the gamma function, Γ(inputi)=∫0∞t(inputi−1)e−tdt.\Gamma(\text{input}_i) = \int_0^\infty t^{(\text{input}_i-1)} e^{-t} dt. See [`torch.igammac()`](torch.igammac#torch.igammac "torch.igammac") and [`torch.lgamma()`](torch.lgamma#torch.lgamma "torch.lgamma") for related functions. Supports [broadcasting to a common shape](https://pytorch.org/docs/1.8.0/notes/broadcasting.html#broadcasting- semantics) and float inputs. Note The backward pass with respect to `input` is not yet supported. Please open an issue on PyTorch’s Github to request it. Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the first non-negative input tensor * **other** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the second non-negative input tensor Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. 
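As a hedged illustration (in addition to the example below) of how the regularized lower and upper incomplete gamma functions complement each other:

import torch

a = torch.tensor([4.0])
x = torch.tensor([3.0, 4.0, 5.0])
lower = torch.igamma(a, x)    # regularized lower incomplete gamma
upper = torch.igammac(a, x)   # regularized upper incomplete gamma (see torch.igammac below)
# For non-negative inputs the two results are expected to sum to one elementwise
assert torch.allclose(lower + upper, torch.ones_like(lower))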
Example: >>> a1 = torch.tensor([4.0]) >>> a2 = torch.tensor([3.0, 4.0, 5.0]) >>> a = torch.igamma(a1, a2) tensor([0.3528, 0.5665, 0.7350]) >>> b = torch.igamma(a1, a2) + torch.igammac(a1, a2) tensor([1., 1., 1.]) # torch.igammac `torch.igammac(input, other, *, out=None) → Tensor` Computes the regularized upper incomplete gamma function: \text{out}_{i} = \frac{1}{\Gamma(\text{input}_i)} \int_{\text{other}_i}^{\infty} t^{\text{input}_i-1} e^{-t} dt where both \text{input}_i and \text{other}_i are weakly positive and at least one is strictly positive. If both are zero or either is negative then \text{out}_i=\text{nan} . \Gamma(\cdot) in the equation above is the gamma function, \Gamma(\text{input}_i) = \int_0^\infty t^{(\text{input}_i-1)} e^{-t} dt. See [`torch.igamma()`](torch.igamma#torch.igamma "torch.igamma") and [`torch.lgamma()`](torch.lgamma#torch.lgamma "torch.lgamma") for related functions. Supports [broadcasting to a common shape](https://pytorch.org/docs/1.8.0/notes/broadcasting.html#broadcasting- semantics) and float inputs. Note The backward pass with respect to `input` is not yet supported. Please open an issue on PyTorch’s Github to request it. Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the first non-negative input tensor * **other** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the second non-negative input tensor Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. Example: >>> a1 = torch.tensor([4.0]) >>> a2 = torch.tensor([3.0, 4.0, 5.0]) >>> a = torch.igammac(a1, a2) tensor([0.6472, 0.4335, 0.2650]) >>> b = torch.igamma(a1, a2) + torch.igammac(a1, a2) tensor([1., 1., 1.]) # torch.imag `torch.imag(input) → Tensor` Returns a new tensor containing the imaginary values of the `input` tensor. The returned tensor and `input` share the same underlying storage. Warning `imag()` is only supported for tensors with complex dtypes. Parameters **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. Example: >>> x=torch.randn(4, dtype=torch.cfloat) >>> x tensor([(0.3100+0.3553j), (-0.5445-0.7896j), (-1.6492-0.0633j), (-0.0638-0.8119j)]) >>> x.imag tensor([ 0.3553, -0.7896, -0.0633, -0.8119]) # torch.index_select `torch.index_select(input, dim, index, *, out=None) → Tensor` Returns a new tensor which indexes the `input` tensor along dimension `dim` using the entries in `index` which is a `LongTensor`. The returned tensor has the same number of dimensions as the original tensor (`input`). The `dim`th dimension has the same size as the length of `index`; other dimensions have the same size as in the original tensor. Note The returned tensor does **not** use the same storage as the original tensor. If `out` has a different shape than expected, we silently change it to the correct shape, reallocating the underlying storage if necessary. Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. * **dim** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – the dimension in which we index * **index** (_IntTensor_ _or_ _LongTensor_) – the 1-D tensor containing the indices to index Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor.
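A small sketch (with illustrative values) of the copy semantics called out in the Note above: the returned tensor owns its own storage, so writes to it do not touch `input`:

import torch

x = torch.arange(12).reshape(3, 4)
idx = torch.tensor([0, 2])
rows = torch.index_select(x, 0, idx)   # select rows 0 and 2
rows[0, 0] = -1                        # modify the selection...
print(x[0, 0])                         # ...the original is unchanged: tensor(0)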
Example: >>> x = torch.randn(3, 4) >>> x tensor([[ 0.1427, 0.0231, -0.5414, -1.0009], [-0.4664, 0.2647, -0.1228, -1.1068], [-1.1734, -0.6571, 0.7230, -0.6004]]) >>> indices = torch.tensor([0, 2]) >>> torch.index_select(x, 0, indices) tensor([[ 0.1427, 0.0231, -0.5414, -1.0009], [-1.1734, -0.6571, 0.7230, -0.6004]]) >>> torch.index_select(x, 1, indices) tensor([[ 0.1427, -0.5414], [-0.4664, -0.1228], [-1.1734, 0.7230]]) # torch.initial_seed `torch.initial_seed()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/random.html#initial_seed) Returns the initial seed for generating random numbers as a Python `long`. # torch.inner `torch.inner(input, other, *, out=None) → Tensor` Computes the dot product for 1D tensors. For higher dimensions, sums the product of elements from `input` and `other` along their last dimension. Note If either `input` or `other` is a scalar, the result is equivalent to `torch.mul(input, other)`. If both `input` and `other` are non-scalars, the size of their last dimension must match and the result is equivalent to `torch.tensordot(input, other, dims=([-1], [-1]))` Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – First input tensor * **other** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – Second input tensor Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – Optional output tensor to write result into. The output shape is `input.shape[:-1] + other.shape[:-1]`. Example: # Dot product >>> torch.inner(torch.tensor([1, 2, 3]), torch.tensor([0, 2, 1])) tensor(7) # Multidimensional input tensors >>> a = torch.randn(2, 3) >>> a tensor([[0.8173, 1.0874, 1.1784], [0.3279, 0.1234, 2.7894]]) >>> b = torch.randn(2, 4, 3) >>> b tensor([[[-0.4682, -0.7159, 0.1506], [ 0.4034, -0.3657, 1.0387], [ 0.9892, -0.6684, 0.1774], [ 0.9482, 1.3261, 0.3917]], [[ 0.4537, 0.7493, 1.1724], [ 0.2291, 0.5749, -0.2267], [-0.7920, 0.3607, -0.3701], [ 1.3666, -0.5850, -1.7242]]]) >>> torch.inner(a, b) tensor([[[-0.9837, 1.1560, 0.2907, 2.6785], [ 2.5671, 0.5452, -0.6912, -1.5509]], [[ 0.1782, 2.9843, 0.7366, 1.5672], [ 3.5115, -0.4864, -1.2476, -4.4337]]]) # Scalar input >>> torch.inner(a, torch.tensor(2)) tensor([[1.6347, 2.1748, 2.3567], [0.6558, 0.2469, 5.5787]]) # torch.inverse `torch.inverse(input, *, out=None) → Tensor` Takes the inverse of the square matrix `input`. `input` can be batches of 2D square tensors, in which case this function would return a tensor composed of individual inverses. Supports real and complex input. Note `torch.inverse()` is deprecated. Please use [`torch.linalg.inv()`](../linalg#torch.linalg.inv "torch.linalg.inv") instead. Note Irrespective of the original strides, the returned tensors will be transposed, i.e. with strides like `input.contiguous().transpose(-2, -1).stride()` Parameters **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor of size (∗,n,n)(*, n, n) where `*` is zero or more batch dimensions Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. 
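Since the note above deprecates `torch.inverse()` in favor of `torch.linalg.inv()`, here is a minimal sketch comparing the two (illustrative only):

import torch

x = torch.rand(4, 4)
old = torch.inverse(x)        # deprecated entry point
new = torch.linalg.inv(x)     # recommended replacement per the note above
assert torch.allclose(old, new)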
Examples: >>> x = torch.rand(4, 4) >>> y = torch.inverse(x) >>> z = torch.mm(x, y) >>> z tensor([[ 1.0000, -0.0000, -0.0000, 0.0000], [ 0.0000, 1.0000, 0.0000, 0.0000], [ 0.0000, 0.0000, 1.0000, 0.0000], [ 0.0000, -0.0000, -0.0000, 1.0000]]) >>> torch.max(torch.abs(z - torch.eye(4))) # Max non-zero tensor(1.1921e-07) >>> # Batched inverse example >>> x = torch.randn(2, 3, 4, 4) >>> y = torch.inverse(x) >>> z = torch.matmul(x, y) >>> torch.max(torch.abs(z - torch.eye(4).expand_as(x))) # Max non-zero tensor(1.9073e-06) >>> x = torch.rand(4, 4, dtype=torch.cdouble) >>> y = torch.inverse(x) >>> z = torch.mm(x, y) >>> z tensor([[ 1.0000e+00+0.0000e+00j, -1.3878e-16+3.4694e-16j, 5.5511e-17-1.1102e-16j, 0.0000e+00-1.6653e-16j], [ 5.5511e-16-1.6653e-16j, 1.0000e+00+6.9389e-17j, 2.2204e-16-1.1102e-16j, -2.2204e-16+1.1102e-16j], [ 3.8858e-16-1.2490e-16j, 2.7756e-17+3.4694e-17j, 1.0000e+00+0.0000e+00j, -4.4409e-16+5.5511e-17j], [ 4.4409e-16+5.5511e-16j, -3.8858e-16+1.8041e-16j, 2.2204e-16+0.0000e+00j, 1.0000e+00-3.4694e-16j]], dtype=torch.complex128) >>> torch.max(torch.abs(z - torch.eye(4, dtype=torch.cdouble))) # Max non-zero tensor(7.5107e-16, dtype=torch.float64) # torch.is_complex `torch.is_complex(input) -> (bool)` Returns True if the data type of `input` is a complex data type i.e., one of `torch.complex64`, and `torch.complex128`. Parameters **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. # torch.is_floating_point `torch.is_floating_point(input) -> (bool)` Returns True if the data type of `input` is a floating point data type i.e., one of `torch.float64`, `torch.float32`, `torch.float16`, and `torch.bfloat16`. Parameters **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. # torch.is_nonzero `torch.is_nonzero(input) -> (bool)` Returns True if the `input` is a single element tensor which is not equal to zero after type conversions. i.e. not equal to `torch.tensor([0.])` or `torch.tensor([0])` or `torch.tensor([False])`. Throws a `RuntimeError` if `torch.numel() != 1` (even in case of sparse tensors). Parameters **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. Examples: >>> torch.is_nonzero(torch.tensor([0.])) False >>> torch.is_nonzero(torch.tensor([1.5])) True >>> torch.is_nonzero(torch.tensor([False])) False >>> torch.is_nonzero(torch.tensor([3])) True >>> torch.is_nonzero(torch.tensor([1, 3, 5])) Traceback (most recent call last): ... RuntimeError: bool value of Tensor with more than one value is ambiguous >>> torch.is_nonzero(torch.tensor([])) Traceback (most recent call last): ... RuntimeError: bool value of Tensor with no values is ambiguous # torch.is_storage `torch.is_storage(obj)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch.html#is_storage) Returns True if `obj` is a PyTorch storage object. Parameters **obj** (_Object_) – Object to test # torch.is_tensor `torch.is_tensor(obj)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch.html#is_tensor) Returns True if `obj` is a PyTorch tensor. Note that this function is simply doing `isinstance(obj, Tensor)`. Using that `isinstance` check is better for typechecking with mypy, and more explicit - so it’s recommended to use that instead of `is_tensor`. Parameters **obj** (_Object_) – Object to test # torch.isclose `torch.isclose(input, other, rtol=1e-05, atol=1e-08, equal_nan=False) → Tensor` Returns a new tensor with boolean elements representing if each element of `input` is “close” to the corresponding element of `other`. 
Closeness is defined as: \lvert \text{input} - \text{other} \rvert \leq \texttt{atol} + \texttt{rtol} \times \lvert \text{other} \rvert where `input` and `other` are finite. Where `input` and/or `other` are nonfinite they are close if and only if they are equal, with NaNs being considered equal to each other when `equal_nan` is True. Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – first tensor to compare * **other** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – second tensor to compare * **atol** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _,__optional_) – absolute tolerance. Default: 1e-08 * **rtol** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _,__optional_) – relative tolerance. Default: 1e-05 * **equal_nan** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – if `True`, then two `NaN`s will be considered equal. Default: `False` Examples: >>> torch.isclose(torch.tensor((1., 2, 3)), torch.tensor((1 + 1e-10, 3, 4))) tensor([ True, False, False]) >>> torch.isclose(torch.tensor((float('inf'), 4)), torch.tensor((float('inf'), 6)), rtol=.5) tensor([True, True]) # torch.isfinite `torch.isfinite(input) → Tensor` Returns a new tensor with boolean elements representing if each element is `finite` or not. Real values are finite when they are not NaN, negative infinity, or infinity. Complex values are finite when both their real and imaginary parts are finite. Parameters **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. Returns A boolean tensor that is True where `input` is finite and False elsewhere Example: >>> torch.isfinite(torch.tensor([1, float('inf'), 2, float('-inf'), float('nan')])) tensor([True, False, True, False, False]) # torch.isinf `torch.isinf(input) → Tensor` Tests if each element of `input` is infinite (positive or negative infinity) or not. Note Complex values are infinite when their real or imaginary part is infinite. Parameters **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. Returns A boolean tensor that is True where `input` is infinite and False elsewhere Example: >>> torch.isinf(torch.tensor([1, float('inf'), 2, float('-inf'), float('nan')])) tensor([False, True, False, True, False]) # torch.isnan `torch.isnan(input) → Tensor` Returns a new tensor with boolean elements representing if each element of `input` is NaN or not. Complex values are considered NaN when their real and/or imaginary part is NaN. Parameters **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. Returns A boolean tensor that is True where `input` is NaN and False elsewhere Example: >>> torch.isnan(torch.tensor([1, float('nan'), 2])) tensor([False, True, False]) # torch.isneginf `torch.isneginf(input, *, out=None) → Tensor` Tests if each element of `input` is negative infinity or not. Parameters **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. Example: >>> a = torch.tensor([-float('inf'), float('inf'), 1.2]) >>> torch.isneginf(a) tensor([ True, False, False]) # torch.isposinf `torch.isposinf(input, *, out=None) → Tensor` Tests if each element of `input` is positive infinity or not. Parameters **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor.
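A hedged sketch (ahead of the example below) showing how `torch.isposinf` and `torch.isneginf` are expected to partition the elements flagged by `torch.isinf`:

import torch

a = torch.tensor([-float('inf'), float('inf'), 1.2, float('nan')])
pos = torch.isposinf(a)       # tensor([False,  True, False, False])
neg = torch.isneginf(a)       # tensor([ True, False, False, False])
assert torch.equal(pos | neg, torch.isinf(a))   # NaN and finite values are flagged by neither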
Example:: >>> a = torch.tensor([-float('inf'), float('inf'), 1.2]) >>> torch.isposinf(a) tensor([False, True, False]) # torch.isreal `torch.isreal(input) → Tensor` Returns a new tensor with boolean elements representing if each element of `input` is real-valued or not. All real-valued types are considered real. Complex values are considered real when their imaginary part is 0. Parameters **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. Returns A boolean tensor that is True where `input` is real and False elsewhere Example: >>> torch.isreal(torch.tensor([1, 1+1j, 2+0j])) tensor([True, False, True]) # torch.istft `torch.istft(input, n_fft, hop_length=None, win_length=None, window=None, center=True, normalized=False, onesided=None, length=None, return_complex=False)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/functional.html#istft) Inverse short time Fourier Transform. This is expected to be the inverse of [`stft()`](torch.stft#torch.stft "torch.stft"). It has the same parameters (+ additional optional parameter of `length`) and it should return the least squares estimation of the original signal. The algorithm will check using the NOLA condition ( nonzero overlap). Important consideration in the parameters `window` and `center` so that the envelop created by the summation of all the windows is never zero at certain point in time. Specifically, ∑t=−∞∞∣w∣2[n−t×hop_length]=0\sum_{t=-\infty}^{\infty} |w|^2[n-t\times hop\\_length] \cancel{=} 0 . Since [`stft()`](torch.stft#torch.stft "torch.stft") discards elements at the end of the signal if they do not fit in a frame, `istft` may return a shorter signal than the original signal (can occur if `center` is False since the signal isn’t padded). If `center` is `True`, then there will be padding e.g. `'constant'`, `'reflect'`, etc. Left padding can be trimmed off exactly because they can be calculated but right padding cannot be calculated without additional information. Example: Suppose the last window is: `[17, 18, 0, 0, 0]` vs `[18, 0, 0, 0, 0]` The `n_fft`, `hop_length`, `win_length` are all the same which prevents the calculation of right padding. These additional values could be zeros or a reflection of the signal so providing `length` could be useful. If `length` is `None` then padding will be aggressively removed (some loss of signal). [1] D. W. Griffin and J. S. Lim, “Signal estimation from modified short-time Fourier transform,” IEEE Trans. ASSP, vol.32, no.2, pp.236-243, Apr. 1984. Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – The input tensor. Expected to be output of [`stft()`](torch.stft#torch.stft "torch.stft"), can either be complex (`channel`, `fft_size`, `n_frame`), or real (`channel`, `fft_size`, `n_frame`, 2) where the `channel` dimension is optional. Deprecated since version 1.8.0: Real input is deprecated, use complex inputs as returned by `stft(..., return_complex=True)` instead. * **n_fft** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – Size of Fourier transform * **hop_length** (_Optional_ _[_[int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _]_) – The distance between neighboring sliding window frames. (Default: `n_fft // 4`) * **win_length** (_Optional_ _[_[int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _]_) – The size of window frame and STFT filter. 
(Default: `n_fft`) * **window** (_Optional_ _[_[torch.Tensor](../tensors#torch.Tensor "torch.Tensor") _]_) – The optional window function. (Default: `torch.ones(win_length)`) * **center** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – Whether `input` was padded on both sides so that the t-th frame is centered at time t \times \text{hop\\_length} . (Default: `True`) * **normalized** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – Whether the STFT was normalized. (Default: `False`) * **onesided** (_Optional_ _[_[bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _]_) – Whether the STFT was onesided. (Default: `True` if `n_fft != fft_size` in the input size) * **length** (_Optional_ _[_[int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _]_) – The amount to trim the signal by (i.e. the original signal length). (Default: whole signal) * **return_complex** (_Optional_ _[_[bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _]_) – Whether the output should be complex, or if the input should be assumed to derive from a real signal and window. Note that this is incompatible with `onesided=True`. (Default: `False`) Returns Least squares estimation of the original signal of size (…, signal_length) Return type [Tensor](../tensors#torch.Tensor "torch.Tensor") # torch.jit.fork `torch.jit.fork(func, *args, **kwargs)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/jit/_async.html#fork) Creates an asynchronous task executing `func` and a reference to the value of the result of this execution. `fork` will return immediately, so the return value of `func` may not have been computed yet. To force completion of the task and access the return value invoke `torch.jit.wait` on the Future. `fork` invoked with a `func` which returns `T` is typed as `torch.jit.Future[T]`. `fork` calls can be arbitrarily nested, and may be invoked with positional and keyword arguments. Asynchronous execution will only occur when run in TorchScript. If run in pure Python, `fork` will not execute in parallel. `fork` will also not execute in parallel when invoked while tracing, however the `fork` and `wait` calls will be captured in the exported IR Graph. Warning `fork` tasks will execute non-deterministically. We recommend only spawning parallel fork tasks for pure functions that do not modify their inputs, module attributes, or global state. Parameters * **func** (_callable_ _or_[torch.nn.Module](torch.nn.module#torch.nn.Module "torch.nn.Module")) – A Python function or `torch.nn.Module` that will be invoked. If executed in TorchScript, it will execute asynchronously, otherwise it will not. Traced invocations of fork will be captured in the IR. * ***args**_,_ ****kwargs** – arguments to invoke `func` with. Returns a reference to the execution of `func`. The value `T` can only be accessed by forcing completion of `func` through `torch.jit.wait`.
Return type `torch.jit.Future[T]` Example (fork a free function): import torch from torch import Tensor def foo(a : Tensor, b : int) -> Tensor: return a + b def bar(a): fut : torch.jit.Future[Tensor] = torch.jit.fork(foo, a, b=2) return torch.jit.wait(fut) script_bar = torch.jit.script(bar) input = torch.tensor(2) # only the scripted version executes asynchronously assert script_bar(input) == bar(input) # trace is not run asynchronously, but fork is captured in IR graph = torch.jit.trace(bar, (input,)).graph assert "fork" in str(graph) Example (fork a module method): import torch from torch import Tensor class AddMod(torch.nn.Module): def forward(self, a: Tensor, b : int): return a + b class Mod(torch.nn.Module): def __init__(self): super(Mod, self).__init__() self.mod = AddMod() def forward(self, input): fut = torch.jit.fork(self.mod, input, b=2) return torch.jit.wait(fut) input = torch.tensor(2) mod = Mod() assert mod(input) == torch.jit.script(mod).forward(input) # torch.jit.freeze `torch.jit.freeze(mod, preserved_attrs=None, optimize_numerics=True)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/jit/_freeze.html#freeze) Freezing a [`ScriptModule`](torch.jit.scriptmodule#torch.jit.ScriptModule "torch.jit.ScriptModule") will clone it and attempt to inline the cloned module’s submodules, parameters, and attributes as constants in the TorchScript IR Graph. By default, `forward` will be preserved, as well as attributes & methods specified in `preserved_attrs`. Additionally, any attribute that is modified within a preserved method will be preserved. Freezing currently only accepts ScriptModules that are in eval mode. Parameters * **mod** ([`ScriptModule`](torch.jit.scriptmodule#torch.jit.ScriptModule "torch.jit.ScriptModule")) – a module to be frozen * **preserved_attrs** (_Optional_ _[__List_ _[_[str](https://docs.python.org/3/library/stdtypes.html#str "\(in Python v3.9\)") _]__]_) – a list of attributes to preserve in addition to the forward method. Attributes modified in preserved methods will also be preserved. * **optimize_numerics** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – If `True`, a set of optimization passes will be run that does not strictly preserve numerics. Full details of the optimization can be found at `torch.jit.optimize_frozen_module`. Returns Frozen [`ScriptModule`](torch.jit.scriptmodule#torch.jit.ScriptModule "torch.jit.ScriptModule").
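Before the examples below, a hedged sketch of the eval-mode requirement stated above (the module name is illustrative and the exact error text may differ):

import torch

class TinyModule(torch.nn.Module):
    def forward(self, x):
        return x + 1

scripted = torch.jit.script(TinyModule())        # modules start out in training mode
try:
    torch.jit.freeze(scripted)                   # expected to be rejected: not in eval mode
except RuntimeError as err:
    print("freeze refused a training-mode module:", err)

frozen = torch.jit.freeze(torch.jit.script(TinyModule().eval()))   # accepted once in eval mode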
Example (Freezing a simple module with a Parameter): import torch class MyModule(torch.nn.Module): def __init__(self, N, M): super(MyModule, self).__init__() self.weight = torch.nn.Parameter(torch.rand(N, M)) self.linear = torch.nn.Linear(N, M) def forward(self, input): output = self.weight.mm(input) output = self.linear(output) return output scripted_module = torch.jit.script(MyModule(2, 3).eval()) frozen_module = torch.jit.freeze(scripted_module) # parameters have been removed and inlined into the Graph as constants assert len(list(frozen_module.named_parameters())) == 0 # See the compiled graph as Python code print(frozen_module.code) Example (Freezing a module with preserved attributes): import torch class MyModule2(torch.nn.Module): def __init__(self): super(MyModule2, self).__init__() self.modified_tensor = torch.tensor(10.) self.version = 1 def forward(self, input): self.modified_tensor += 1 return input + self.modified_tensor scripted_module = torch.jit.script(MyModule2().eval()) frozen_module = torch.jit.freeze(scripted_module, preserved_attrs=["version"]) # we've manually preserved `version`, so it still exists on the frozen module and can be modified assert frozen_module.version == 1 frozen_module.version = 2 # `modified_tensor` is detected as being mutated in the forward, so freezing preserves # it to retain model semantics assert frozen_module(torch.tensor(1)) == torch.tensor(12) # now that we've run it once, the next result will be incremented by one assert frozen_module(torch.tensor(1)) == torch.tensor(13) Note If you’re not sure why an attribute is not being inlined as a constant, you can run `dump_alias_db` on frozen_module.forward.graph to see if freezing has detected the attribute is being modified. # torch.jit.ignore `torch.jit.ignore(drop=False, **kwargs)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/_jit_internal.html#ignore) This decorator indicates to the compiler that a function or method should be ignored and left as a Python function. This allows you to leave code in your model that is not yet TorchScript compatible. If called from TorchScript, ignored functions will dispatch the call to the Python interpreter. Models with ignored functions cannot be exported; use [`@torch.jit.unused`](torch.jit.unused#torch.jit.unused "torch.jit.unused") instead. Example (using `@torch.jit.ignore` on a method): import torch import torch.nn as nn class MyModule(nn.Module): @torch.jit.ignore def debugger(self, x): import pdb pdb.set_trace() def forward(self, x): x += 10 # The compiler would normally try to compile `debugger`, # but since it is `@ignore`d, it will be left as a call # to Python self.debugger(x) return x m = torch.jit.script(MyModule()) # Error! The call `debugger` cannot be saved since it calls into Python m.save("m.pt") Example (using `@torch.jit.ignore(drop=True)` on a method): import torch import torch.nn as nn class MyModule(nn.Module): @torch.jit.ignore(drop=True) def training_method(self, x): import pdb pdb.set_trace() def forward(self, x): if self.training: self.training_method(x) return x m = torch.jit.script(MyModule()) # This is OK since `training_method` is not saved, the call is replaced # with a `raise`. m.save("m.pt") # torch.jit.isinstance `torch.jit.isinstance(obj, target_type)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/jit.html#isinstance) This function provides for container type refinement in TorchScript. It can refine parameterized containers of the List, Dict, Tuple, and Optional types. E.g. `List[str]`, `Dict[str, List[torch.Tensor]]`, `Optional[Tuple[int,str,int]]`. It can also refine basic types such as bools and ints that are available in TorchScript.
Parameters * **obj** – object to refine the type of * **target_type** – type to try to refine obj to Returns True if obj was successfully refined to the type of target_type, False otherwise with no new type refinement Return type `bool` Example (using `torch.jit.isinstance` for type refinement): .. testcode: import torch from typing import Any, Dict, List class MyModule(torch.nn.Module): def __init__(self): super(MyModule, self).__init__() def forward(self, input: Any): # note the Any type if torch.jit.isinstance(input, List[torch.Tensor]): for t in input: y = t.clamp(0, 0.5) elif torch.jit.isinstance(input, Dict[str, str]): for val in input.values(): print(val) m = torch.jit.script(MyModule()) x = [torch.rand(3,3), torch.rand(4,3)] m(x) y = {"key1":"val1","key2":"val2"} m(y) # torch.jit.load `torch.jit.load(f, map_location=None, _extra_files=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/jit/_serialization.html#load) Load a [`ScriptModule`](torch.jit.scriptmodule#torch.jit.ScriptModule "torch.jit.ScriptModule") or [`ScriptFunction`](torch.jit.scriptfunction#torch.jit.ScriptFunction "torch.jit.ScriptFunction") previously saved with [`torch.jit.save`](torch.jit.save#torch.jit.save "torch.jit.save") All previously saved modules, no matter their device, are first loaded onto CPU, and then are moved to the devices they were saved from. If this fails (e.g. because the run time system doesn’t have certain devices), an exception is raised. Parameters * **f** – a file-like object (has to implement read, readline, tell, and seek), or a string containing a file name * **map_location** (_string_ _or_[torch.device](../tensor_attributes#torch.torch.device "torch.torch.device")) – A simplified version of `map_location` in `torch.jit.save` used to dynamically remap storages to an alternative set of devices. * **_extra_files** (_dictionary of filename to content_) – The extra filenames given in the map would be loaded and their content would be stored in the provided map. Returns A [`ScriptModule`](torch.jit.scriptmodule#torch.jit.ScriptModule "torch.jit.ScriptModule") object. Example: import torch import io torch.jit.load('scriptmodule.pt') # Load ScriptModule from io.BytesIO object with open('scriptmodule.pt', 'rb') as f: buffer = io.BytesIO(f.read()) # Load all tensors to the original device torch.jit.load(buffer) # Load all tensors onto CPU, using a device buffer.seek(0) torch.jit.load(buffer, map_location=torch.device('cpu')) # Load all tensors onto CPU, using a string buffer.seek(0) torch.jit.load(buffer, map_location='cpu') # Load with extra files. extra_files = {'foo.txt': ''} # values will be replaced with data torch.jit.load('scriptmodule.pt', _extra_files=extra_files) print(extra_files['foo.txt']) # torch.jit.save `torch.jit.save(m, f, _extra_files=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/jit/_serialization.html#save) Save an offline version of this module for use in a separate process. The saved module serializes all of the methods, submodules, parameters, and attributes of this module. It can be loaded into the C++ API using `torch::jit::load(filename)` or into the Python API with [`torch.jit.load`](torch.jit.load#torch.jit.load "torch.jit.load"). To be able to save a module, it must not make any calls to native Python functions. This means that all submodules must be subclasses of [`ScriptModule`](torch.jit.scriptmodule#torch.jit.ScriptModule "torch.jit.ScriptModule") as well. 
Danger All modules, no matter their device, are always loaded onto the CPU during loading. This is different from [`torch.load()`](torch.load#torch.load "torch.load")’s semantics and may change in the future. Parameters * **m** – A [`ScriptModule`](torch.jit.scriptmodule#torch.jit.ScriptModule "torch.jit.ScriptModule") to save. * **f** – A file-like object (has to implement write and flush) or a string containing a file name. * **_extra_files** – Map from filename to contents which will be stored as part of `f`. Note torch.jit.save attempts to preserve the behavior of some operators across versions. For example, dividing two integer tensors in PyTorch 1.5 performed floor division, and if the module containing that code is saved in PyTorch 1.5 and loaded in PyTorch 1.6 its division behavior will be preserved. The same module saved in PyTorch 1.6 will fail to load in PyTorch 1.5, however, since the behavior of division changed in 1.6, and 1.5 does not know how to replicate the 1.6 behavior. Example: import torch import io class MyModule(torch.nn.Module): def forward(self, x): return x + 10 m = torch.jit.script(MyModule()) # Save to file torch.jit.save(m, 'scriptmodule.pt') # This line is equivalent to the previous m.save("scriptmodule.pt") # Save to io.BytesIO buffer buffer = io.BytesIO() torch.jit.save(m, buffer) # Save with extra files extra_files = {'foo.txt': b'bar'} torch.jit.save(m, 'scriptmodule.pt', _extra_files=extra_files) # torch.jit.script `torch.jit.script(obj, optimize=None, _frames_up=0, _rcb=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/jit/_script.html#script) Scripting a function or `nn.Module` will inspect the source code, compile it as TorchScript code using the TorchScript compiler, and return a [`ScriptModule`](torch.jit.scriptmodule#torch.jit.ScriptModule "torch.jit.ScriptModule") or [`ScriptFunction`](torch.jit.scriptfunction#torch.jit.ScriptFunction "torch.jit.ScriptFunction"). TorchScript itself is a subset of the Python language, so not all features in Python work, but we provide enough functionality to compute on tensors and do control-dependent operations. For a complete guide, see the [TorchScript Language Reference](../jit_language_reference#language-reference). `torch.jit.script` can be used as a function for modules and functions, and as a decorator `@torch.jit.script` for [TorchScript Classes](../jit_language_reference#id2) and functions. Parameters **obj** (callable, class, or `nn.Module`) – The `nn.Module`, function, or class type to compile. Returns If `obj` is `nn.Module`, `script` returns a [`ScriptModule`](torch.jit.scriptmodule#torch.jit.ScriptModule "torch.jit.ScriptModule") object. The returned [`ScriptModule`](torch.jit.scriptmodule#torch.jit.ScriptModule "torch.jit.ScriptModule") will have the same set of sub-modules and parameters as the original `nn.Module`. If `obj` is a standalone function, a [`ScriptFunction`](torch.jit.scriptfunction#torch.jit.ScriptFunction "torch.jit.ScriptFunction") will be returned. **Scripting a function** The `@torch.jit.script` decorator will construct a [`ScriptFunction`](torch.jit.scriptfunction#torch.jit.ScriptFunction "torch.jit.ScriptFunction") by compiling the body of the function. 
Example (scripting a function): import torch @torch.jit.script def foo(x, y): if x.max() > y.max(): r = x else: r = y return r print(type(foo)) # torch.jit.ScriptFuncion # See the compiled graph as Python code print(foo.code) # Call the function using the TorchScript interpreter foo(torch.ones(2, 2), torch.ones(2, 2)) **Scripting an nn.Module** Scripting an `nn.Module` by default will compile the `forward` method and recursively compile any methods, submodules, and functions called by `forward`. If a `nn.Module` only uses features supported in TorchScript, no changes to the original module code should be necessary. `script` will construct [`ScriptModule`](torch.jit.scriptmodule#torch.jit.ScriptModule "torch.jit.ScriptModule") that has copies of the attributes, parameters, and methods of the original module. Example (scripting a simple module with a Parameter): import torch class MyModule(torch.nn.Module): def __init__(self, N, M): super(MyModule, self).__init__() # This parameter will be copied to the new ScriptModule self.weight = torch.nn.Parameter(torch.rand(N, M)) # When this submodule is used, it will be compiled self.linear = torch.nn.Linear(N, M) def forward(self, input): output = self.weight.mv(input) # This calls the `forward` method of the `nn.Linear` module, which will # cause the `self.linear` submodule to be compiled to a `ScriptModule` here output = self.linear(output) return output scripted_module = torch.jit.script(MyModule(2, 3)) Example (scripting a module with traced submodules): import torch import torch.nn as nn import torch.nn.functional as F class MyModule(nn.Module): def __init__(self): super(MyModule, self).__init__() # torch.jit.trace produces a ScriptModule's conv1 and conv2 self.conv1 = torch.jit.trace(nn.Conv2d(1, 20, 5), torch.rand(1, 1, 16, 16)) self.conv2 = torch.jit.trace(nn.Conv2d(20, 20, 5), torch.rand(1, 20, 16, 16)) def forward(self, input): input = F.relu(self.conv1(input)) input = F.relu(self.conv2(input)) return input scripted_module = torch.jit.script(MyModule()) To compile a method other than `forward` (and recursively compile anything it calls), add the [`@torch.jit.export`](../jit#torch.jit.export "torch.jit.export") decorator to the method. To opt out of compilation use [`@torch.jit.ignore`](torch.jit.ignore#torch.jit.ignore "torch.jit.ignore") or [`@torch.jit.unused`](torch.jit.unused#torch.jit.unused "torch.jit.unused"). Example (an exported and ignored method in a module): import torch import torch.nn as nn class MyModule(nn.Module): def __init__(self): super(MyModule, self).__init__() @torch.jit.export def some_entry_point(self, input): return input + 10 @torch.jit.ignore def python_only_fn(self, input): # This function won't be compiled, so any # Python APIs can be used import pdb pdb.set_trace() def forward(self, input): if self.training: self.python_only_fn(input) return input * 99 scripted_module = torch.jit.script(MyModule()) print(scripted_module.some_entry_point(torch.randn(2, 2))) print(scripted_module(torch.randn(2, 2))) # torch.jit.script_if_tracing `torch.jit.script_if_tracing(fn)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/jit.html#script_if_tracing) Compiles `fn` when it is first called during tracing. `torch.jit.script` has a non-negligible start up time when it is first called due to lazy- initializations of many compiler builtins. Therefore you should not use it in library code. However, you may want to have parts of your library work in tracing even if they use control flow. 
In these cases, you should use `@torch.jit.script_if_tracing` to substitute for `torch.jit.script`. Parameters **fn** – A function to compile. Returns If called during tracing, a [`ScriptFunction`](torch.jit.scriptfunction#torch.jit.ScriptFunction "torch.jit.ScriptFunction") created by `torch.jit.script` is returned. Otherwise, the original function `fn` is returned. # ScriptFunction `class torch.jit.ScriptFunction` Functionally equivalent to a [`ScriptModule`](torch.jit.scriptmodule#torch.jit.ScriptModule "torch.jit.ScriptModule"), but represents a single function and does not have any attributes or Parameters. `get_debug_state(self: torch._C.ScriptFunction) → torch._C.GraphExecutorState` `save(self: torch._C.ScriptFunction, filename: str, _extra_files: Dict[str, str] = {}) → None` `save_to_buffer(self: torch._C.ScriptFunction, _extra_files: Dict[str, str] = {}) → bytes` # ScriptModule `class torch.jit.ScriptModule` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/jit/_script.html#ScriptModule) A wrapper around C++ `torch::jit::Module`. `ScriptModule`s contain methods, attributes, parameters, and constants. These can be accessed the same as on a normal `nn.Module`. `add_module(name, module)` Adds a child module to the current module. The module can be accessed as an attribute using the given name. Parameters * **name** (_string_) – name of the child module. The child module can be accessed from this module using the given name * **module** ([Module](torch.nn.module#torch.nn.Module "torch.nn.Module")) – child module to be added to the module. `apply(fn)` Applies `fn` recursively to every submodule (as returned by `.children()`) as well as self. Typical use includes initializing the parameters of a model (see also [torch.nn.init](../nn.init#nn-init-doc)). Parameters **fn** (`Module` -> None) – function to be applied to each submodule Returns self Return type [Module](torch.nn.module#torch.nn.Module "torch.nn.Module") Example: >>> @torch.no_grad() >>> def init_weights(m): >>> print(m) >>> if type(m) == nn.Linear: >>> m.weight.fill_(1.0) >>> print(m.weight) >>> net = nn.Sequential(nn.Linear(2, 2), nn.Linear(2, 2)) >>> net.apply(init_weights) Linear(in_features=2, out_features=2, bias=True) Parameter containing: tensor([[ 1., 1.], [ 1., 1.]]) Linear(in_features=2, out_features=2, bias=True) Parameter containing: tensor([[ 1., 1.], [ 1., 1.]]) Sequential( (0): Linear(in_features=2, out_features=2, bias=True) (1): Linear(in_features=2, out_features=2, bias=True) ) Sequential( (0): Linear(in_features=2, out_features=2, bias=True) (1): Linear(in_features=2, out_features=2, bias=True) ) `bfloat16()` Casts all floating point parameters and buffers to `bfloat16` datatype. Returns self Return type [Module](torch.nn.module#torch.nn.Module "torch.nn.Module") `buffers(recurse=True)` Returns an iterator over module buffers. Parameters **recurse** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – if True, then yields buffers of this module and all submodules. Otherwise, yields only buffers that are direct members of this module. Yields _torch.Tensor_ – module buffer Example: >>> for buf in model.buffers(): >>> print(type(buf), buf.size()) (20L,) (20L, 1L, 5L, 5L) `children()` Returns an iterator over immediate children modules. Yields _Module_ – a child module `property code` Returns a pretty-printed representation (as valid Python syntax) of the internal graph for the `forward` method. See [Inspecting Code](../jit#inspecting-code) for details. 
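For instance, a minimal sketch of inspecting the generated code (the module below is a made-up example for illustration, not part of the API):

import torch

class MyCell(torch.nn.Module):
    def forward(self, x):
        return torch.tanh(x) + 1

scripted = torch.jit.script(MyCell())
# Prints the compiler's pretty-printed, Python-syntax representation of `forward`
print(scripted.code)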
`property code_with_constants` Returns a tuple of: [0] a pretty-printed representation (as valid Python syntax) of the internal graph for the `forward` method. See `code`. [1] a ConstMap following the CONSTANT.cN format of the output in [0]. The indices in the [0] output are keys to the underlying constant’s values. See [Inspecting Code](../jit#inspecting-code) for details. `cpu()` Moves all model parameters and buffers to the CPU. Returns self Return type [Module](torch.nn.module#torch.nn.Module "torch.nn.Module") `cuda(device=None)` Moves all model parameters and buffers to the GPU. This also makes associated parameters and buffers different objects. So it should be called before constructing optimizer if the module will live on GPU while being optimized. Parameters **device** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – if specified, all parameters will be copied to that device Returns self Return type [Module](torch.nn.module#torch.nn.Module "torch.nn.Module") `double()` Casts all floating point parameters and buffers to `double` datatype. Returns self Return type [Module](torch.nn.module#torch.nn.Module "torch.nn.Module") `eval()` Sets the module in evaluation mode. This has any effect only on certain modules. See documentations of particular modules for details of their behaviors in training/evaluation mode, if they are affected, e.g. `Dropout`, `BatchNorm`, etc. This is equivalent with [`self.train(False)`](torch.nn.module#torch.nn.Module.train "torch.nn.Module.train"). Returns self Return type [Module](torch.nn.module#torch.nn.Module "torch.nn.Module") `extra_repr()` Set the extra representation of the module To print customized extra information, you should re-implement this method in your own modules. Both single-line and multi-line strings are acceptable. `float()` Casts all floating point parameters and buffers to float datatype. Returns self Return type [Module](torch.nn.module#torch.nn.Module "torch.nn.Module") `property graph` Returns a string representation of the internal graph for the `forward` method. See [Interpreting Graphs](../jit#interpreting-graphs) for details. `half()` Casts all floating point parameters and buffers to `half` datatype. Returns self Return type [Module](torch.nn.module#torch.nn.Module "torch.nn.Module") `property inlined_graph` Returns a string representation of the internal graph for the `forward` method. This graph will be preprocessed to inline all function and method calls. See [Interpreting Graphs](../jit#interpreting-graphs) for details. `load_state_dict(state_dict, strict=True)` Copies parameters and buffers from `state_dict` into this module and its descendants. If `strict` is `True`, then the keys of `state_dict` must exactly match the keys returned by this module’s [`state_dict()`](torch.nn.module#torch.nn.Module.state_dict "torch.nn.Module.state_dict") function. Parameters * **state_dict** ([dict](https://docs.python.org/3/library/stdtypes.html#dict "\(in Python v3.9\)")) – a dict containing parameters and persistent buffers. * **strict** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – whether to strictly enforce that the keys in `state_dict` match the keys returned by this module’s [`state_dict()`](torch.nn.module#torch.nn.Module.state_dict "torch.nn.Module.state_dict") function. 
Default: `True` Returns * **missing_keys** is a list of str containing the missing keys * **unexpected_keys** is a list of str containing the unexpected keys Return type `NamedTuple` with `missing_keys` and `unexpected_keys` fields `modules()` Returns an iterator over all modules in the network. Yields _Module_ – a module in the network Note Duplicate modules are returned only once. In the following example, `l` will be returned only once. Example: >>> l = nn.Linear(2, 2) >>> net = nn.Sequential(l, l) >>> for idx, m in enumerate(net.modules()): print(idx, '->', m) 0 -> Sequential( (0): Linear(in_features=2, out_features=2, bias=True) (1): Linear(in_features=2, out_features=2, bias=True) ) 1 -> Linear(in_features=2, out_features=2, bias=True) `named_buffers(prefix='', recurse=True)` Returns an iterator over module buffers, yielding both the name of the buffer as well as the buffer itself. Parameters * **prefix** ([str](https://docs.python.org/3/library/stdtypes.html#str "\(in Python v3.9\)")) – prefix to prepend to all buffer names. * **recurse** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – if True, then yields buffers of this module and all submodules. Otherwise, yields only buffers that are direct members of this module. Yields _(string, torch.Tensor)_ – Tuple containing the name and buffer Example: >>> for name, buf in self.named_buffers(): >>> if name in ['running_var']: >>> print(buf.size()) `named_children()` Returns an iterator over immediate children modules, yielding both the name of the module as well as the module itself. Yields _(string, Module)_ – Tuple containing a name and child module Example: >>> for name, module in model.named_children(): >>> if name in ['conv4', 'conv5']: >>> print(module) `named_modules(memo=None, prefix='')` Returns an iterator over all modules in the network, yielding both the name of the module as well as the module itself. Yields _(string, Module)_ – Tuple of name and module Note Duplicate modules are returned only once. In the following example, `l` will be returned only once. Example: >>> l = nn.Linear(2, 2) >>> net = nn.Sequential(l, l) >>> for idx, m in enumerate(net.named_modules()): print(idx, '->', m) 0 -> ('', Sequential( (0): Linear(in_features=2, out_features=2, bias=True) (1): Linear(in_features=2, out_features=2, bias=True) )) 1 -> ('0', Linear(in_features=2, out_features=2, bias=True)) `named_parameters(prefix='', recurse=True)` Returns an iterator over module parameters, yielding both the name of the parameter as well as the parameter itself. Parameters * **prefix** ([str](https://docs.python.org/3/library/stdtypes.html#str "\(in Python v3.9\)")) – prefix to prepend to all parameter names. * **recurse** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – if True, then yields parameters of this module and all submodules. Otherwise, yields only parameters that are direct members of this module. Yields _(string, Parameter)_ – Tuple containing the name and parameter Example: >>> for name, param in self.named_parameters(): >>> if name in ['bias']: >>> print(param.size()) `parameters(recurse=True)` Returns an iterator over module parameters. This is typically passed to an optimizer. Parameters **recurse** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – if True, then yields parameters of this module and all submodules. Otherwise, yields only parameters that are direct members of this module. 
Yields _Parameter_ – module parameter Example: >>> for param in model.parameters(): >>> print(type(param), param.size()) (20L,) (20L, 1L, 5L, 5L) `register_backward_hook(hook)` Registers a backward hook on the module. This function is deprecated in favor of `nn.Module.register_full_backward_hook()` and the behavior of this function will change in future versions. Returns a handle that can be used to remove the added hook by calling `handle.remove()` Return type `torch.utils.hooks.RemovableHandle` `register_buffer(name, tensor, persistent=True)` Adds a buffer to the module. This is typically used to register a buffer that should not to be considered a model parameter. For example, BatchNorm’s `running_mean` is not a parameter, but is part of the module’s state. Buffers, by default, are persistent and will be saved alongside parameters. This behavior can be changed by setting `persistent` to `False`. The only difference between a persistent buffer and a non-persistent buffer is that the latter will not be a part of this module’s `state_dict`. Buffers can be accessed as attributes using given names. Parameters * **name** (_string_) – name of the buffer. The buffer can be accessed from this module using the given name * **tensor** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – buffer to be registered. * **persistent** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – whether the buffer is part of this module’s `state_dict`. Example: >>> self.register_buffer('running_mean', torch.zeros(num_features)) `register_forward_hook(hook)` Registers a forward hook on the module. The hook will be called every time after `forward()` has computed an output. It should have the following signature: hook(module, input, output) -> None or modified output The input contains only the positional arguments given to the module. Keyword arguments won’t be passed to the hooks and only to the `forward`. The hook can modify the output. It can modify the input inplace but it will not have effect on forward since this is called after `forward()` is called. Returns a handle that can be used to remove the added hook by calling `handle.remove()` Return type `torch.utils.hooks.RemovableHandle` `register_forward_pre_hook(hook)` Registers a forward pre-hook on the module. The hook will be called every time before `forward()` is invoked. It should have the following signature: hook(module, input) -> None or modified input The input contains only the positional arguments given to the module. Keyword arguments won’t be passed to the hooks and only to the `forward`. The hook can modify the input. User can either return a tuple or a single modified value in the hook. We will wrap the value into a tuple if a single value is returned(unless that value is already a tuple). Returns a handle that can be used to remove the added hook by calling `handle.remove()` Return type `torch.utils.hooks.RemovableHandle` `register_full_backward_hook(hook)` Registers a backward hook on the module. The hook will be called every time the gradients with respect to module inputs are computed. The hook should have the following signature: hook(module, grad_input, grad_output) -> tuple(Tensor) or None The `grad_input` and `grad_output` are tuples that contain the gradients with respect to the inputs and outputs respectively. The hook should not modify its arguments, but it can optionally return a new gradient with respect to the input that will be used in place of `grad_input` in subsequent computations. 
`grad_input` will only correspond to the inputs given as positional arguments and all kwarg arguments are ignored. Entries in `grad_input` and `grad_output` will be `None` for all non-Tensor arguments. Warning Modifying inputs or outputs inplace is not allowed when using backward hooks and will raise an error. Returns a handle that can be used to remove the added hook by calling `handle.remove()` Return type `torch.utils.hooks.RemovableHandle` `register_parameter(name, param)` Adds a parameter to the module. The parameter can be accessed as an attribute using given name. Parameters * **name** (_string_) – name of the parameter. The parameter can be accessed from this module using the given name * **param** ([Parameter](torch.nn.parameter.parameter#torch.nn.parameter.Parameter "torch.nn.parameter.Parameter")) – parameter to be added to the module. `requires_grad_(requires_grad=True)` Change if autograd should record operations on parameters in this module. This method sets the parameters’ `requires_grad` attributes in-place. This method is helpful for freezing part of the module for finetuning or training parts of a model individually (e.g., GAN training). Parameters **requires_grad** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – whether autograd should record operations on parameters in this module. Default: `True`. Returns self Return type [Module](torch.nn.module#torch.nn.Module "torch.nn.Module") `save(f, _extra_files={})` See [`torch.jit.save`](torch.jit.save#torch.jit.save "torch.jit.save") for details. `state_dict(destination=None, prefix='', keep_vars=False)` Returns a dictionary containing a whole state of the module. Both parameters and persistent buffers (e.g. running averages) are included. Keys are corresponding parameter and buffer names. Returns a dictionary containing a whole state of the module Return type [dict](https://docs.python.org/3/library/stdtypes.html#dict "\(in Python v3.9\)") Example: >>> module.state_dict().keys() ['bias', 'weight'] `to(*args, **kwargs)` Moves and/or casts the parameters and buffers. This can be called as `to(device=None, dtype=None, non_blocking=False)` `to(dtype, non_blocking=False)` `to(tensor, non_blocking=False)` `to(memory_format=torch.channels_last)` Its signature is similar to [`torch.Tensor.to()`](../tensors#torch.Tensor.to "torch.Tensor.to"), but only accepts floating point or complex `dtype`s. In addition, this method will only cast the floating point or complex parameters and buffers to :attr:`dtype` (if given). The integral parameters and buffers will be moved `device`, if that is given, but with dtypes unchanged. When `non_blocking` is set, it tries to convert/move asynchronously with respect to the host if possible, e.g., moving CPU Tensors with pinned memory to CUDA devices. See below for examples. Note This method modifies the module in-place. 
Parameters * **device** (`torch.device`) – the desired device of the parameters and buffers in this module * **dtype** (`torch.dtype`) – the desired floating point or complex dtype of the parameters and buffers in this module * **tensor** ([torch.Tensor](../tensors#torch.Tensor "torch.Tensor")) – Tensor whose dtype and device are the desired dtype and device for all parameters and buffers in this module * **memory_format** (`torch.memory_format`) – the desired memory format for 4D parameters and buffers in this module (keyword only argument) Returns self Return type [Module](torch.nn.module#torch.nn.Module "torch.nn.Module") Examples: >>> linear = nn.Linear(2, 2) >>> linear.weight Parameter containing: tensor([[ 0.1913, -0.3420], [-0.5113, -0.2325]]) >>> linear.to(torch.double) Linear(in_features=2, out_features=2, bias=True) >>> linear.weight Parameter containing: tensor([[ 0.1913, -0.3420], [-0.5113, -0.2325]], dtype=torch.float64) >>> gpu1 = torch.device("cuda:1") >>> linear.to(gpu1, dtype=torch.half, non_blocking=True) Linear(in_features=2, out_features=2, bias=True) >>> linear.weight Parameter containing: tensor([[ 0.1914, -0.3420], [-0.5112, -0.2324]], dtype=torch.float16, device='cuda:1') >>> cpu = torch.device("cpu") >>> linear.to(cpu) Linear(in_features=2, out_features=2, bias=True) >>> linear.weight Parameter containing: tensor([[ 0.1914, -0.3420], [-0.5112, -0.2324]], dtype=torch.float16) >>> linear = nn.Linear(2, 2, bias=None).to(torch.cdouble) >>> linear.weight Parameter containing: tensor([[ 0.3741+0.j, 0.2382+0.j], [ 0.5593+0.j, -0.4443+0.j]], dtype=torch.complex128) >>> linear(torch.ones(3, 2, dtype=torch.cdouble)) tensor([[0.6122+0.j, 0.1150+0.j], [0.6122+0.j, 0.1150+0.j], [0.6122+0.j, 0.1150+0.j]], dtype=torch.complex128) `train(mode=True)` Sets the module in training mode. This has any effect only on certain modules. See documentations of particular modules for details of their behaviors in training/evaluation mode, if they are affected, e.g. `Dropout`, `BatchNorm`, etc. Parameters **mode** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – whether to set training mode (`True`) or evaluation mode (`False`). Default: `True`. Returns self Return type [Module](torch.nn.module#torch.nn.Module "torch.nn.Module") `type(dst_type)` Casts all parameters and buffers to `dst_type`. Parameters **dst_type** ([type](https://docs.python.org/3/library/functions.html#type "\(in Python v3.9\)") _or_ _string_) – the desired type Returns self Return type [Module](torch.nn.module#torch.nn.Module "torch.nn.Module") `xpu(device=None)` Moves all model parameters and buffers to the XPU. This also makes associated parameters and buffers different objects. So it should be called before constructing optimizer if the module will live on XPU while being optimized. Parameters **device** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – if specified, all parameters will be copied to that device Returns self Return type [Module](torch.nn.module#torch.nn.Module "torch.nn.Module") `zero_grad(set_to_none=False)` Sets gradients of all model parameters to zero. See similar function under [`torch.optim.Optimizer`](../optim#torch.optim.Optimizer "torch.optim.Optimizer") for more context. Parameters **set_to_none** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – instead of setting to zero, set the grads to None. 
See [`torch.optim.Optimizer.zero_grad()`](../optim#torch.optim.Optimizer.zero_grad "torch.optim.Optimizer.zero_grad") for details. # torch.jit.trace `torch.jit.trace(func, example_inputs, optimize=None, check_trace=True, check_inputs=None, check_tolerance=1e-05, strict=True, _force_outplace=False, _module_class=None, _compilation_unit=)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/jit/_trace.html#trace) Trace a function and return an executable or [`ScriptFunction`](torch.jit.scriptfunction#torch.jit.ScriptFunction "torch.jit.ScriptFunction") that will be optimized using just-in-time compilation. Tracing is ideal for code that operates only on `Tensor`s and lists, dictionaries, and tuples of `Tensor`s. Using `torch.jit.trace` and `torch.jit.trace_module`, you can turn an existing module or Python function into a TorchScript [`ScriptFunction`](torch.jit.scriptfunction#torch.jit.ScriptFunction "torch.jit.ScriptFunction") or [`ScriptModule`](torch.jit.scriptmodule#torch.jit.ScriptModule "torch.jit.ScriptModule"). You must provide example inputs, and we run the function, recording the operations performed on all the tensors. * The resulting recording of a standalone function produces `ScriptFunction`. * The resulting recording of `nn.Module.forward` or `nn.Module` produces `ScriptModule`. This module also contains any parameters that the original module had as well. Warning Tracing only correctly records functions and modules which are not data dependent (e.g., do not have conditionals on data in tensors) and do not have any untracked external dependencies (e.g., perform input/output or access global variables). Tracing only records operations done when the given function is run on the given tensors. Therefore, the returned `ScriptModule` will always run the same traced graph on any input. This has some important implications when your module is expected to run different sets of operations, depending on the input and/or the module state. For example, * Tracing will not record any control-flow like if-statements or loops. When this control-flow is constant across your module, this is fine and it often inlines the control-flow decisions. But sometimes the control-flow is actually part of the model itself. For instance, a recurrent network is a loop over the (possibly dynamic) length of an input sequence. * In the returned [`ScriptModule`](torch.jit.scriptmodule#torch.jit.ScriptModule "torch.jit.ScriptModule"), operations that have different behaviors in `training` and `eval` modes will always behave as if it is in the mode it was in during tracing, no matter which mode the `ScriptModule` is in. In cases like these, tracing would not be appropriate and [`scripting`](torch.jit.script#torch.jit.script "torch.jit.script") is a better choice. If you trace such models, you may silently get incorrect results on subsequent invocations of the model. The tracer will try to emit warnings when doing something that may cause an incorrect trace to be produced. Parameters * **func** (_callable_ _or_[torch.nn.Module](torch.nn.module#torch.nn.Module "torch.nn.Module")) – A Python function or `torch.nn.Module` that will be run with `example_inputs`. `func` arguments and return values must be tensors or (possibly nested) tuples that contain tensors. When a module is passed `torch.jit.trace`, only the `forward` method is run and traced (see [`torch.jit.trace`](torch.jit.trace_module#torch.jit.trace_module "torch.jit.trace_module") for details). 
* **example_inputs** ([tuple](https://docs.python.org/3/library/stdtypes.html#tuple "\(in Python v3.9\)") _or_[torch.Tensor](../tensors#torch.Tensor "torch.Tensor")) – A tuple of example inputs that will be passed to the function while tracing. The resulting trace can be run with inputs of different types and shapes assuming the traced operations support those types and shapes. `example_inputs` may also be a single Tensor in which case it is automatically wrapped in a tuple. Keyword Arguments * **check_trace** (`bool`, optional) – Check if the same inputs run through traced code produce the same outputs. Default: `True`. You might want to disable this if, for example, your network contains non- deterministic ops or if you are sure that the network is correct despite a checker failure. * **check_inputs** (_list of tuples_ _,__optional_) – A list of tuples of input arguments that should be used to check the trace against what is expected. Each tuple is equivalent to a set of input arguments that would be specified in `example_inputs`. For best results, pass in a set of checking inputs representative of the space of shapes and types of inputs you expect the network to see. If not specified, the original `example_inputs` are used for checking * **check_tolerance** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _,__optional_) – Floating-point comparison tolerance to use in the checker procedure. This can be used to relax the checker strictness in the event that results diverge numerically for a known reason, such as operator fusion. * **strict** (`bool`, optional) – run the tracer in a strict mode or not (default: `True`). Only turn this off when you want the tracer to record your mutable container types (currently `list`/`dict`) and you are sure that the container you are using in your problem is a `constant` structure and does not get used as control flow (if, for) conditions. Returns If `func` is `nn.Module` or `forward` of `nn.Module`, `trace` returns a [`ScriptModule`](torch.jit.scriptmodule#torch.jit.ScriptModule "torch.jit.ScriptModule") object with a single `forward` method containing the traced code. The returned `ScriptModule` will have the same set of sub-modules and parameters as the original `nn.Module`. If `func` is a standalone function, `trace` returns `ScriptFunction`. 
Example (tracing a function): import torch def foo(x, y): return 2 * x + y # Run `foo` with the provided inputs and record the tensor operations traced_foo = torch.jit.trace(foo, (torch.rand(3), torch.rand(3))) # `traced_foo` can now be run with the TorchScript interpreter or saved # and loaded in a Python-free environment Example (tracing an existing module): import torch import torch.nn as nn class Net(nn.Module): def __init__(self): super(Net, self).__init__() self.conv = nn.Conv2d(1, 1, 3) def forward(self, x): return self.conv(x) n = Net() example_weight = torch.rand(1, 1, 3, 3) example_forward_input = torch.rand(1, 1, 3, 3) # Trace a specific method and construct `ScriptModule` with # a single `forward` method module = torch.jit.trace(n.forward, example_forward_input) # Trace a module (implicitly traces `forward`) and construct a # `ScriptModule` with a single `forward` method module = torch.jit.trace(n, example_forward_input) # torch.jit.trace_module `torch.jit.trace_module(mod, inputs, optimize=None, check_trace=True, check_inputs=None, check_tolerance=1e-05, strict=True, _force_outplace=False, _module_class=None, _compilation_unit=)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/jit/_trace.html#trace_module) Trace a module and return an executable [`ScriptModule`](torch.jit.scriptmodule#torch.jit.ScriptModule "torch.jit.ScriptModule") that will be optimized using just-in-time compilation. When a module is passed to [`torch.jit.trace`](torch.jit.trace#torch.jit.trace "torch.jit.trace"), only the `forward` method is run and traced. With `trace_module`, you can specify a dictionary of method names to example inputs to trace (see the `inputs`) argument below. See [`torch.jit.trace`](torch.jit.trace#torch.jit.trace "torch.jit.trace") for more information on tracing. Parameters * **mod** ([torch.nn.Module](torch.nn.module#torch.nn.Module "torch.nn.Module")) – A `torch.nn.Module` containing methods whose names are specified in `inputs`. The given methods will be compiled as a part of a single `ScriptModule`. * **inputs** ([dict](https://docs.python.org/3/library/stdtypes.html#dict "\(in Python v3.9\)")) – A dict containing sample inputs indexed by method names in `mod`. The inputs will be passed to methods whose names correspond to inputs’ keys while tracing. `{ 'forward' : example_forward_input, 'method2': example_method2_input}` Keyword Arguments * **check_trace** (`bool`, optional) – Check if the same inputs run through traced code produce the same outputs. Default: `True`. You might want to disable this if, for example, your network contains non- deterministic ops or if you are sure that the network is correct despite a checker failure. * **check_inputs** (_list of dicts_ _,__optional_) – A list of dicts of input arguments that should be used to check the trace against what is expected. Each tuple is equivalent to a set of input arguments that would be specified in `inputs`. For best results, pass in a set of checking inputs representative of the space of shapes and types of inputs you expect the network to see. If not specified, the original `inputs` are used for checking * **check_tolerance** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _,__optional_) – Floating-point comparison tolerance to use in the checker procedure. This can be used to relax the checker strictness in the event that results diverge numerically for a known reason, such as operator fusion. 
Returns A [`ScriptModule`](torch.jit.scriptmodule#torch.jit.ScriptModule "torch.jit.ScriptModule") object with a single `forward` method containing the traced code. When `func` is a `torch.nn.Module`, the returned [`ScriptModule`](torch.jit.scriptmodule#torch.jit.ScriptModule "torch.jit.ScriptModule") will have the same set of sub-modules and parameters as `func`. Example (tracing a module with multiple methods): import torch import torch.nn as nn class Net(nn.Module): def __init__(self): super(Net, self).__init__() self.conv = nn.Conv2d(1, 1, 3) def forward(self, x): return self.conv(x) def weighted_kernel_sum(self, weight): return weight * self.conv.weight n = Net() example_weight = torch.rand(1, 1, 3, 3) example_forward_input = torch.rand(1, 1, 3, 3) # Trace a specific method and construct `ScriptModule` with # a single `forward` method module = torch.jit.trace(n.forward, example_forward_input) # Trace a module (implicitly traces `forward`) and construct a # `ScriptModule` with a single `forward` method module = torch.jit.trace(n, example_forward_input) # Trace specific methods on a module (specified in `inputs`), constructs # a `ScriptModule` with `forward` and `weighted_kernel_sum` methods inputs = {'forward' : example_forward_input, 'weighted_kernel_sum' : example_weight} module = torch.jit.trace_module(n, inputs) # torch.jit.unused `torch.jit.unused(fn)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/_jit_internal.html#unused) This decorator indicates to the compiler that a function or method should be ignored and replaced with the raising of an exception. This allows you to leave code in your model that is not yet TorchScript compatible and still export your model. Example (using `@torch.jit.unused` on a method): import torch import torch.nn as nn class MyModule(nn.Module): def __init__(self, use_memory_efficient): super(MyModule, self).__init__() self.use_memory_efficient = use_memory_efficient @torch.jit.unused def memory_efficient(self, x): import pdb pdb.set_trace() return x + 10 def forward(self, x): # Use not-yet-scriptable memory efficient mode if self.use_memory_efficient: return self.memory_efficient(x) else: return x + 10 m = torch.jit.script(MyModule(use_memory_efficient=False)) m.save("m.pt") m = torch.jit.script(MyModule(use_memory_efficient=True)) # exception raised m(torch.rand(100)) # torch.jit.wait `torch.jit.wait(future)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/jit/_async.html#wait) Forces completion of a `torch.jit.Future[T]` asynchronous task, returning the result of the task. See [`fork()`](torch.jit.fork#torch.jit.fork "torch.jit.fork") for docs and examples. :param func: an asynchronous task reference, created through `torch.jit.fork` :type func: torch.jit.Future[T] Returns the return value of the the completed task Return type `T` # torch.kaiser_window `torch.kaiser_window(window_length, periodic=True, beta=12.0, *, dtype=None, layout=torch.strided, device=None, requires_grad=False) → Tensor` Computes the Kaiser window with window length `window_length` and shape parameter `beta`. Let I_0 be the zeroth order modified Bessel function of the first kind (see [`torch.i0()`](torch.i0#torch.i0 "torch.i0")) and `N = L - 1` if `periodic` is False and `L` if `periodic` is True, where `L` is the `window_length`. 
This function computes:

out_i = I_0 \left( \beta \sqrt{1 - \left( \frac{i - N/2}{N/2} \right)^2 } \right) / I_0(\beta)

Calling `torch.kaiser_window(L, B, periodic=True)` is equivalent to calling `torch.kaiser_window(L + 1, B, periodic=False)[:-1]`. The `periodic` argument is intended as a helpful shorthand to produce a periodic window as input to functions like [`torch.stft()`](torch.stft#torch.stft "torch.stft").

Note

If `window_length` is one, then the returned window is a single element tensor containing a one.

Parameters

* **window_length** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – length of the window.
* **periodic** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – If True, returns a periodic window suitable for use in spectral analysis. If False, returns a symmetric window suitable for use in filter design.
* **beta** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _,__optional_) – shape parameter for the window.

Keyword Arguments

* **dtype** ([`torch.dtype`](../tensor_attributes#torch.torch.dtype "torch.torch.dtype"), optional) – the desired data type of returned tensor. Default: if `None`, uses a global default (see [`torch.set_default_tensor_type()`](torch.set_default_tensor_type#torch.set_default_tensor_type "torch.set_default_tensor_type")).
* **layout** ([`torch.layout`](../tensor_attributes#torch.torch.layout "torch.torch.layout"), optional) – the desired layout of returned window tensor. Only `torch.strided` (dense layout) is supported.
* **device** ([`torch.device`](../tensor_attributes#torch.torch.device "torch.torch.device"), optional) – the desired device of returned tensor. Default: if `None`, uses the current device for the default tensor type (see [`torch.set_default_tensor_type()`](torch.set_default_tensor_type#torch.set_default_tensor_type "torch.set_default_tensor_type")). `device` will be the CPU for CPU tensor types and the current CUDA device for CUDA tensor types.
* **requires_grad** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – If autograd should record operations on the returned tensor. Default: `False`.
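A minimal usage sketch (output values omitted, since the exact numbers depend on the Bessel-function evaluation); the last line simply checks the periodic/symmetric equivalence stated above:

import torch

# Symmetric 5-point Kaiser window with the default shape parameter beta=12.0
w_sym = torch.kaiser_window(5, periodic=False)

# Periodic window of the same length, e.g. as input to torch.stft
w_per = torch.kaiser_window(5, periodic=True)

# A periodic window of length L matches the first L points of a
# symmetric window of length L + 1 (the equivalence described above)
assert torch.allclose(w_per, torch.kaiser_window(6, periodic=False)[:-1])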
# torch.kron

`torch.kron(input, other, *, out=None) → Tensor`

Computes the Kronecker product, denoted by \otimes , of `input` and `other`.

If `input` is a (a_0 \times a_1 \times \dots \times a_n) tensor and `other` is a (b_0 \times b_1 \times \dots \times b_n) tensor, the result will be a (a_0*b_0 \times a_1*b_1 \times \dots \times a_n*b_n) tensor with the following entries:

(\text{input} \otimes \text{other})_{k_0, k_1, \dots, k_n} = \text{input}_{i_0, i_1, \dots, i_n} * \text{other}_{j_0, j_1, \dots, j_n},

where k_t = i_t * b_t + j_t for 0 \leq t \leq n . If one tensor has fewer dimensions than the other it is unsqueezed until it has the same number of dimensions.

Supports real-valued and complex-valued inputs.

Note

This function generalizes the typical definition of the Kronecker product for two matrices to two tensors, as described above. When `input` is a (m \times n) matrix and `other` is a (p \times q) matrix, the result will be a (p*m \times q*n) block matrix:

\mathbf{A} \otimes \mathbf{B} = \begin{bmatrix} a_{11} \mathbf{B} & \cdots & a_{1n} \mathbf{B} \\ \vdots & \ddots & \vdots \\ a_{m1} \mathbf{B} & \cdots & a_{mn} \mathbf{B} \end{bmatrix}

where `input` is \mathbf{A} and `other` is \mathbf{B} .

Parameters

* **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the first tensor in the Kronecker product.
* **other** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the second tensor in the Kronecker product.

Keyword Arguments

**out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – The output tensor. Ignored if `None`. Default: `None`

Examples:

>>> mat1 = torch.eye(2)
>>> mat2 = torch.ones(2, 2)
>>> torch.kron(mat1, mat2)
tensor([[1., 1., 0., 0.],
        [1., 1., 0., 0.],
        [0., 0., 1., 1.],
        [0., 0., 1., 1.]])

>>> mat1 = torch.eye(2)
>>> mat2 = torch.arange(1, 5).reshape(2, 2)
>>> torch.kron(mat1, mat2)
tensor([[1., 2., 0., 0.],
        [3., 4., 0., 0.],
        [0., 0., 1., 2.],
        [0., 0., 3., 4.]])

# torch.kthvalue

`torch.kthvalue(input, k, dim=None, keepdim=False, *, out=None) -> (Tensor, LongTensor)`

Returns a namedtuple `(values, indices)` where `values` is the `k` th smallest element of each row of the `input` tensor in the given dimension `dim`, and `indices` is the index location of each element found.

If `dim` is not given, the last dimension of the `input` is chosen.

If `keepdim` is `True`, both the `values` and `indices` tensors are the same size as `input`, except in the dimension `dim` where they are of size 1. Otherwise, `dim` is squeezed (see [`torch.squeeze()`](torch.squeeze#torch.squeeze "torch.squeeze")), resulting in both the `values` and `indices` tensors having 1 fewer dimension than the `input` tensor.

Note

When `input` is a CUDA tensor and there are multiple valid `k` th values, this function may nondeterministically return `indices` for any of them.

Parameters

* **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor.
* **k** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – k for the k-th smallest element
* **dim** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – the dimension to find the kth value along
* **keepdim** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – whether the output tensor has `dim` retained or not.

Keyword Arguments

**out** ([tuple](https://docs.python.org/3/library/stdtypes.html#tuple "\(in Python v3.9\)") _,__optional_) – the output tuple of (Tensor, LongTensor) can be optionally given to be used as output buffers

Example:

>>> x = torch.arange(1., 6.)
>>> x
tensor([ 1., 2., 3., 4., 5.])
>>> torch.kthvalue(x, 4)
torch.return_types.kthvalue(values=tensor(4.), indices=tensor(3))

>>> x = torch.arange(1., 7.).resize_(2, 3)
>>> x
tensor([[ 1., 2., 3.],
        [ 4., 5., 6.]])
>>> torch.kthvalue(x, 2, 0, True)
torch.return_types.kthvalue(values=tensor([[4., 5., 6.]]), indices=tensor([[1, 1, 1]]))

# torch.lcm

`torch.lcm(input, other, *, out=None) → Tensor`

Computes the element-wise least common multiple (LCM) of `input` and `other`.

Both `input` and `other` must have integer types.

Note

This defines lcm(0, 0) = 0 and lcm(0, a) = 0 .

Parameters

* **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor.
* **other** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the second input tensor

Keyword Arguments

**out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor.

Example:

>>> a = torch.tensor([5, 10, 15])
>>> b = torch.tensor([3, 4, 5])
>>> torch.lcm(a, b)
tensor([15, 20, 15])
>>> c = torch.tensor([3])
>>> torch.lcm(a, c)
tensor([15, 30, 15])

# torch.ldexp

`torch.ldexp(input, other, *, out=None) → Tensor`

Multiplies `input` by `2 ** other`.

\text{out}_i = \text{input}_i * 2^{\text{other}_i}

Typically this function is used to construct floating point numbers by multiplying mantissas in `input` with integral powers of two created from the exponents in `other`.

Parameters

* **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor.
* **other** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – a tensor of exponents, typically integers.

Keyword Arguments

**out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor.

Example:

>>> torch.ldexp(torch.tensor([1.]), torch.tensor([1]))
tensor([2.])
>>> torch.ldexp(torch.tensor([1.0]), torch.tensor([1, 2, 3, 4]))
tensor([ 2., 4., 8., 16.])

# torch.le

`torch.le(input, other, *, out=None) → Tensor`

Computes \text{input} \leq \text{other} element-wise.

The second argument can be a number or a tensor whose shape is [broadcastable](https://pytorch.org/docs/1.8.0/notes/broadcasting.html#broadcasting-semantics) with the first argument.

Parameters

* **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the tensor to compare
* **other** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _or_ _Scalar_) – the tensor or value to compare

Keyword Arguments

**out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor.

Returns

A boolean tensor that is True where `input` is less than or equal to `other` and False elsewhere

Example:

>>> torch.le(torch.tensor([[1, 2], [3, 4]]), torch.tensor([[1, 1], [4, 4]]))
tensor([[True, False],
        [True, True]])

# torch.lerp

`torch.lerp(input, end, weight, *, out=None)`

Does a linear interpolation of two tensors `start` (given by `input`) and `end` based on a scalar or tensor `weight` and returns the resulting `out` tensor.

\text{out}_i = \text{start}_i + \text{weight}_i \times (\text{end}_i - \text{start}_i)

The shapes of `start` and `end` must be [broadcastable](https://pytorch.org/docs/1.8.0/notes/broadcasting.html#broadcasting-semantics). If `weight` is a tensor, then the shapes of `weight`, `start`, and `end` must be [broadcastable](https://pytorch.org/docs/1.8.0/notes/broadcasting.html#broadcasting-semantics).

Parameters

* **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the tensor with the starting points
* **end** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the tensor with the ending points
* **weight** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _or_ _tensor_) – the weight for the interpolation formula

Keyword Arguments

**out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor.

Example:

>>> start = torch.arange(1., 5.)
>>> end = torch.empty(4).fill_(10)
>>> start
tensor([ 1., 2., 3., 4.])
>>> end
tensor([ 10., 10., 10., 10.])
>>> torch.lerp(start, end, 0.5)
tensor([ 5.5000, 6.0000, 6.5000, 7.0000])
>>> torch.lerp(start, end, torch.full_like(start, 0.5))
tensor([ 5.5000, 6.0000, 6.5000, 7.0000])

# torch.less

`torch.less(input, other, *, out=None) → Tensor`

Alias for [`torch.lt()`](torch.lt#torch.lt "torch.lt").

# torch.less_equal

`torch.less_equal(input, other, *, out=None) → Tensor`

Alias for [`torch.le()`](torch.le#torch.le "torch.le").

# torch.lgamma

`torch.lgamma(input, *, out=None) → Tensor`

Computes the logarithm of the gamma function on `input`.

\text{out}_i = \log \Gamma(\text{input}_i)

Parameters

**input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor.

Keyword Arguments

**out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor.

Example:

>>> a = torch.arange(0.5, 2, 0.5)
>>> torch.lgamma(a)
tensor([ 0.5724, 0.0000, -0.1208])

# torch.linspace

`torch.linspace(start, end, steps, *, out=None, dtype=None, layout=torch.strided, device=None, requires_grad=False) → Tensor`

Creates a one-dimensional tensor of size `steps` whose values are evenly spaced from `start` to `end`, inclusive. That is, the values are:

(\text{start}, \text{start} + \frac{\text{end} - \text{start}}{\text{steps} - 1}, \ldots, \text{start} + (\text{steps} - 2) * \frac{\text{end} - \text{start}}{\text{steps} - 1}, \text{end})

Warning

Not providing a value for `steps` is deprecated. For backwards compatibility, not providing a value for `steps` will create a tensor with 100 elements. Note that this behavior is not reflected in the documented function signature and should not be relied on. In a future PyTorch release, failing to provide a value for `steps` will throw a runtime error.

Parameters

* **start** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)")) – the starting value for the set of points
* **end** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)")) – the ending value for the set of points
* **steps** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – size of the constructed tensor

Keyword Arguments

* **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor.
* **dtype** ([`torch.dtype`](../tensor_attributes#torch.torch.dtype "torch.torch.dtype"), optional) – the desired data type of returned tensor. Default: if `None`, uses a global default (see [`torch.set_default_tensor_type()`](torch.set_default_tensor_type#torch.set_default_tensor_type "torch.set_default_tensor_type")).
* **layout** ([`torch.layout`](../tensor_attributes#torch.torch.layout "torch.torch.layout"), optional) – the desired layout of returned Tensor. Default: `torch.strided`.
* **device** ([`torch.device`](../tensor_attributes#torch.torch.device "torch.torch.device"), optional) – the desired device of returned tensor. Default: if `None`, uses the current device for the default tensor type (see [`torch.set_default_tensor_type()`](torch.set_default_tensor_type#torch.set_default_tensor_type "torch.set_default_tensor_type")). `device` will be the CPU for CPU tensor types and the current CUDA device for CUDA tensor types.
* **requires_grad** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – If autograd should record operations on the returned tensor. Default: `False`. Example: >>> torch.linspace(3, 10, steps=5) tensor([ 3.0000, 4.7500, 6.5000, 8.2500, 10.0000]) >>> torch.linspace(-10, 10, steps=5) tensor([-10., -5., 0., 5., 10.]) >>> torch.linspace(start=-10, end=10, steps=5) tensor([-10., -5., 0., 5., 10.]) >>> torch.linspace(start=-10, end=10, steps=1) tensor([-10.]) # torch.load `torch.load(f, map_location=None, pickle_module=, **pickle_load_args)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/serialization.html#load) Loads an object saved with [`torch.save()`](torch.save#torch.save "torch.save") from a file. `torch.load()` uses Python’s unpickling facilities but treats storages, which underlie tensors, specially. They are first deserialized on the CPU and are then moved to the device they were saved from. If this fails (e.g. because the run time system doesn’t have certain devices), an exception is raised. However, storages can be dynamically remapped to an alternative set of devices using the `map_location` argument. If `map_location` is a callable, it will be called once for each serialized storage with two arguments: storage and location. The storage argument will be the initial deserialization of the storage, residing on the CPU. Each serialized storage has a location tag associated with it which identifies the device it was saved from, and this tag is the second argument passed to `map_location`. The builtin location tags are `'cpu'` for CPU tensors and `'cuda:device_id'` (e.g. `'cuda:2'`) for CUDA tensors. `map_location` should return either `None` or a storage. If `map_location` returns a storage, it will be used as the final deserialized object, already moved to the right device. Otherwise, `torch.load()` will fall back to the default behavior, as if `map_location` wasn’t specified. If `map_location` is a [`torch.device`](../tensor_attributes#torch.torch.device "torch.torch.device") object or a string containing a device tag, it indicates the location where all tensors should be loaded. Otherwise, if `map_location` is a dict, it will be used to remap location tags appearing in the file (keys), to ones that specify where to put the storages (values). User extensions can register their own location tags and tagging and deserialization methods using `torch.serialization.register_package()`. Parameters * **f** – a file-like object (has to implement `read()`, `readline()`, `tell()`, and `seek()`), or a string or os.PathLike object containing a file name * **map_location** – a function, [`torch.device`](../tensor_attributes#torch.torch.device "torch.torch.device"), string or a dict specifying how to remap storage locations * **pickle_module** – module used for unpickling metadata and objects (has to match the `pickle_module` used to serialize file) * **pickle_load_args** – (Python 3 only) optional keyword arguments passed over to `pickle_module.load()` and `pickle_module.Unpickler()`, e.g., `errors=...`. Warning `torch.load()` uses `pickle` module implicitly, which is known to be insecure. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling. Never load data that could have come from an untrusted source, or that could have been tampered with. **Only load data you trust**. Note When you call `torch.load()` on a file which contains GPU tensors, those tensors will be loaded to GPU by default. 
You can call `torch.load(.., map_location='cpu')` and then `load_state_dict()` to avoid GPU RAM surge when loading a model checkpoint. Note By default, we decode byte strings as `utf-8`. This is to avoid a common error case `UnicodeDecodeError: 'ascii' codec can't decode byte 0x...` when loading files saved by Python 2 in Python 3. If this default is incorrect, you may use an extra `encoding` keyword argument to specify how these objects should be loaded, e.g., `encoding='latin1'` decodes them to strings using `latin1` encoding, and `encoding='bytes'` keeps them as byte arrays which can be decoded later with `byte_array.decode(...)`. #### Example >>> torch.load('tensors.pt') # Load all tensors onto the CPU >>> torch.load('tensors.pt', map_location=torch.device('cpu')) # Load all tensors onto the CPU, using a function >>> torch.load('tensors.pt', map_location=lambda storage, loc: storage) # Load all tensors onto GPU 1 >>> torch.load('tensors.pt', map_location=lambda storage, loc: storage.cuda(1)) # Map tensors from GPU 1 to GPU 0 >>> torch.load('tensors.pt', map_location={'cuda:1':'cuda:0'}) # Load tensor from io.BytesIO object >>> with open('tensor.pt', 'rb') as f: ... buffer = io.BytesIO(f.read()) >>> torch.load(buffer) # Load a module with 'ascii' encoding for unpickling >>> torch.load('module.pt', encoding='ascii') # torch.lobpcg `torch.lobpcg(A, k=None, B=None, X=None, n=None, iK=None, niter=None, tol=None, largest=None, method=None, tracker=None, ortho_iparams=None, ortho_fparams=None, ortho_bparams=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/_lobpcg.html#lobpcg) Find the k largest (or smallest) eigenvalues and the corresponding eigenvectors of a symmetric positive defined generalized eigenvalue problem using matrix-free LOBPCG methods. This function is a front-end to the following LOBPCG algorithms selectable via `method` argument: `method=”basic”` \- the LOBPCG method introduced by Andrew Knyazev, see [Knyazev2001]. A less robust method, may fail when Cholesky is applied to singular input. `method=”ortho”` \- the LOBPCG method with orthogonal basis selection [StathopoulosEtal2002]. A robust method. Supported inputs are dense, sparse, and batches of dense matrices. Note In general, the basic method spends least time per iteration. However, the robust methods converge much faster and are more stable. So, the usage of the basic method is generally not recommended but there exist cases where the usage of the basic method may be preferred. Warning The backward method does not support sparse and complex inputs. It works only when `B` is not provided (i.e. `B == None`). We are actively working on extensions, and the details of the algorithms are going to be published promptly. Warning While it is assumed that `A` is symmetric, `A.grad` is not. To make sure that `A.grad` is symmetric, so that `A - t * A.grad` is symmetric in first-order optimization routines, prior to running `lobpcg` we do the following symmetrization map: `A -> (A + A.t()) / 2`. The map is performed only when the `A` requires gradients. Parameters * **A** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor of size (∗,m,m)(*, m, m) * **B** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the input tensor of size (∗,m,m)(*, m, m) . When not specified, `B` is interpereted as identity matrix. * **X** (_tensor_ _,__optional_) – the input tensor of size (∗,m,n)(*, m, n) where `k <= n <= m`. When specified, it is used as initial approximation of eigenvectors. 
X must be a dense tensor. * **iK** (_tensor_ _,__optional_) – the input tensor of size (∗,m,m)(*, m, m) . When specified, it will be used as preconditioner. * **k** (_integer_ _,__optional_) – the number of requested eigenpairs. Default is the number of XX columns (when specified) or `1`. * **n** (_integer_ _,__optional_) – if XX is not specified then `n` specifies the size of the generated random approximation of eigenvectors. Default value for `n` is `k`. If XX is specified, the value of `n` (when specified) must be the number of XX columns. * **tol** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _,__optional_) – residual tolerance for stopping criterion. Default is `feps ** 0.5` where `feps` is smallest non-zero floating-point number of the given input tensor `A` data type. * **largest** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – when True, solve the eigenproblem for the largest eigenvalues. Otherwise, solve the eigenproblem for smallest eigenvalues. Default is `True`. * **method** ([str](https://docs.python.org/3/library/stdtypes.html#str "\(in Python v3.9\)") _,__optional_) – select LOBPCG method. See the description of the function above. Default is “ortho”. * **niter** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – maximum number of iterations. When reached, the iteration process is hard-stopped and the current approximation of eigenpairs is returned. For infinite iteration but until convergence criteria is met, use `-1`. * **tracker** (_callable_ _,__optional_) – a function for tracing the iteration process. When specified, it is called at each iteration step with LOBPCG instance as an argument. The LOBPCG instance holds the full state of the iteration process in the following attributes: `iparams`, `fparams`, `bparams` \- dictionaries of integer, float, and boolean valued input parameters, respectively `ivars`, `fvars`, `bvars`, `tvars` \- dictionaries of integer, float, boolean, and Tensor valued iteration variables, respectively. `A`, `B`, `iK` \- input Tensor arguments. `E`, `X`, `S`, `R` \- iteration Tensor variables. For instance: `ivars[“istep”]` \- the current iteration step `X` \- the current approximation of eigenvectors `E` \- the current approximation of eigenvalues `R` \- the current residual `ivars[“converged_count”]` \- the current number of converged eigenpairs `tvars[“rerr”]` \- the current state of convergence criteria Note that when `tracker` stores Tensor objects from the LOBPCG instance, it must make copies of these. If `tracker` sets `bvars[“force_stop”] = True`, the iteration process will be hard-stopped. * **ortho_fparams, ortho_bparams** (_ortho_iparams_ _,_) – various parameters to LOBPCG algorithm when using `method=”ortho”`. Returns tensor of eigenvalues of size (∗,k)(*, k) X (Tensor): tensor of eigenvectors of size (∗,m,k)(*, m, k) Return type E ([Tensor](../tensors#torch.Tensor "torch.Tensor")) #### References [Knyazev2001] Andrew V. Knyazev. (2001) Toward the Optimal Preconditioned Eigensolver: Locally Optimal Block Preconditioned Conjugate Gradient Method. SIAM J. Sci. Comput., 23(2), 517-541. (25 pages) [StathopoulosEtal2002] Andreas Stathopoulos and Kesheng Wu. (2002) A Block Orthogonalization Procedure with Constant Synchronization Requirements. SIAM J. Sci. Comput., 23(6), 2165-2182. (18 pages) [DuerschEtal2018] Jed A. Duersch, Meiyue Shao, Chao Yang, Ming Gu. 
(2018) A Robust and Efficient Implementation of LOBPCG. SIAM J. Sci. Comput., 40(5), C655-C676. (22 pages) # torch.log `torch.log(input, *, out=None) → Tensor` Returns a new tensor with the natural logarithm of the elements of `input`. yi=log⁡e(xi)y_{i} = \log_{e} (x_{i}) Parameters **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. Example: >>> a = torch.randn(5) >>> a tensor([-0.7168, -0.5471, -0.8933, -1.4428, -0.1190]) >>> torch.log(a) tensor([ nan, nan, nan, nan, nan]) # torch.log10 `torch.log10(input, *, out=None) → Tensor` Returns a new tensor with the logarithm to the base 10 of the elements of `input`. yi=log⁡10(xi)y_{i} = \log_{10} (x_{i}) Parameters **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. Example: >>> a = torch.rand(5) >>> a tensor([ 0.5224, 0.9354, 0.7257, 0.1301, 0.2251]) >>> torch.log10(a) tensor([-0.2820, -0.0290, -0.1392, -0.8857, -0.6476]) # torch.log1p `torch.log1p(input, *, out=None) → Tensor` Returns a new tensor with the natural logarithm of (1 + `input`). yi=log⁡e(xi+1)y_i = \log_{e} (x_i + 1) Note This function is more accurate than [`torch.log()`](torch.log#torch.log "torch.log") for small values of `input` Parameters **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. Example: >>> a = torch.randn(5) >>> a tensor([-1.0090, -0.9923, 1.0249, -0.5372, 0.2492]) >>> torch.log1p(a) tensor([ nan, -4.8653, 0.7055, -0.7705, 0.2225]) # torch.log2 `torch.log2(input, *, out=None) → Tensor` Returns a new tensor with the logarithm to the base 2 of the elements of `input`. yi=log⁡2(xi)y_{i} = \log_{2} (x_{i}) Parameters **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. Example: >>> a = torch.rand(5) >>> a tensor([ 0.8419, 0.8003, 0.9971, 0.5287, 0.0490]) >>> torch.log2(a) tensor([-0.2483, -0.3213, -0.0042, -0.9196, -4.3504]) # torch.logaddexp `torch.logaddexp(input, other, *, out=None) → Tensor` Logarithm of the sum of exponentiations of the inputs. Calculates pointwise log⁡(ex+ey)\log\left(e^x + e^y\right) . This function is useful in statistics where the calculated probabilities of events may be so small as to exceed the range of normal floating point numbers. In such cases the logarithm of the calculated probability is stored. This function allows adding probabilities stored in such a fashion. This op should be disambiguated with [`torch.logsumexp()`](torch.logsumexp#torch.logsumexp "torch.logsumexp") which performs a reduction on a single tensor. Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. * **other** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the second input tensor Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. 
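Note For intuition, `logaddexp(x, y)` is mathematically equal to log(exp(x) + exp(y)); a numerically stable way to evaluate this by hand is the usual log-sum-exp trick. The following is a minimal illustrative sketch (it is not the library's actual kernel), using the same inputs as the example below:

>>> import torch
>>> x = torch.tensor([-100.0, -200, -300])
>>> y = torch.tensor([-1.0, -2, -3])
>>> m = torch.maximum(x, y)
>>> # factor out the larger argument so the remaining exponent cannot overflow
>>> m + torch.log1p(torch.exp(-torch.abs(x - y)))
tensor([-1., -2., -3.])
>>> torch.logaddexp(x, y)  # same result
tensor([-1., -2., -3.])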
Example: >>> torch.logaddexp(torch.tensor([-1.0]), torch.tensor([-1.0, -2, -3])) tensor([-0.3069, -0.6867, -0.8731]) >>> torch.logaddexp(torch.tensor([-100.0, -200, -300]), torch.tensor([-1.0, -2, -3])) tensor([-1., -2., -3.]) >>> torch.logaddexp(torch.tensor([1.0, 2000, 30000]), torch.tensor([-1.0, -2, -3])) tensor([1.1269e+00, 2.0000e+03, 3.0000e+04]) # torch.logaddexp2 `torch.logaddexp2(input, other, *, out=None) → Tensor` Logarithm of the sum of exponentiations of the inputs in base-2. Calculates pointwise log⁡2(2x+2y)\log_2\left(2^x + 2^y\right) . See [`torch.logaddexp()`](torch.logaddexp#torch.logaddexp "torch.logaddexp") for more details. Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. * **other** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the second input tensor Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. # torch.logcumsumexp `torch.logcumsumexp(input, dim, *, out=None) → Tensor` Returns the logarithm of the cumulative summation of the exponentiation of elements of `input` in the dimension `dim`. For summation index jj given by `dim` and other indices ii , the result is logcumsumexp(x)ij=log⁡∑j=0iexp⁡(xij)\text{logcumsumexp}(x)_{ij} = \log \sum\limits_{j=0}^{i} \exp(x_{ij}) Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. * **dim** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – the dimension to do the operation over Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. Example:: >>> a = torch.randn(10) >>> torch.logcumsumexp(a, dim=0) tensor([-0.42296738, -0.04462666, 0.86278635, 0.94622083, 1.05277811, 1.39202815, 1.83525007, 1.84492621, 2.06084887, 2.06844475])) # torch.logdet `torch.logdet(input) → Tensor` Calculates log determinant of a square matrix or batches of square matrices. Note Result is `-inf` if `input` has zero log determinant, and is `nan` if `input` has negative determinant. Note Backward through `logdet()` internally uses SVD results when `input` is not invertible. In this case, double backward through `logdet()` will be unstable in when `input` doesn’t have distinct singular values. See [`svd()`](torch.svd#torch.svd "torch.svd") for details. Parameters **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor of size `(*, n, n)` where `*` is zero or more batch dimensions. Example: >>> A = torch.randn(3, 3) >>> torch.det(A) tensor(0.2611) >>> torch.logdet(A) tensor(-1.3430) >>> A tensor([[[ 0.9254, -0.6213], [-0.5787, 1.6843]], [[ 0.3242, -0.9665], [ 0.4539, -0.0887]], [[ 1.1336, -0.4025], [-0.7089, 0.9032]]]) >>> A.det() tensor([1.1990, 0.4099, 0.7386]) >>> A.det().log() tensor([ 0.1815, -0.8917, -0.3031]) # torch.logical_and `torch.logical_and(input, other, *, out=None) → Tensor` Computes the element-wise logical AND of the given input tensors. Zeros are treated as `False` and nonzeros are treated as `True`. Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. * **other** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the tensor to compute AND with Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. 
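Note A common use of the logical ops is combining boolean masks for indexing. A small illustrative sketch (variable names and values are arbitrary):

>>> import torch
>>> x = torch.tensor([-2.0, -0.5, 0.5, 2.0])
>>> mask = torch.logical_and(x > -1, x < 1)  # True only where both conditions hold
>>> x[mask]
tensor([-0.5000,  0.5000])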
Example: >>> torch.logical_and(torch.tensor([True, False, True]), torch.tensor([True, False, False])) tensor([ True, False, False]) >>> a = torch.tensor([0, 1, 10, 0], dtype=torch.int8) >>> b = torch.tensor([4, 0, 1, 0], dtype=torch.int8) >>> torch.logical_and(a, b) tensor([False, False, True, False]) >>> torch.logical_and(a.double(), b.double()) tensor([False, False, True, False]) >>> torch.logical_and(a.double(), b) tensor([False, False, True, False]) >>> torch.logical_and(a, b, out=torch.empty(4, dtype=torch.bool)) tensor([False, False, True, False]) # torch.logical_not `torch.logical_not(input, *, out=None) → Tensor` Computes the element-wise logical NOT of the given input tensor. If not specified, the output tensor will have the bool dtype. If the input tensor is not a bool tensor, zeros are treated as `False` and non-zeros are treated as `True`. Parameters **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. Example: >>> torch.logical_not(torch.tensor([True, False])) tensor([False, True]) >>> torch.logical_not(torch.tensor([0, 1, -10], dtype=torch.int8)) tensor([ True, False, False]) >>> torch.logical_not(torch.tensor([0., 1.5, -10.], dtype=torch.double)) tensor([ True, False, False]) >>> torch.logical_not(torch.tensor([0., 1., -10.], dtype=torch.double), out=torch.empty(3, dtype=torch.int16)) tensor([1, 0, 0], dtype=torch.int16) # torch.logical_or `torch.logical_or(input, other, *, out=None) → Tensor` Computes the element-wise logical OR of the given input tensors. Zeros are treated as `False` and nonzeros are treated as `True`. Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. * **other** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the tensor to compute OR with Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. Example: >>> torch.logical_or(torch.tensor([True, False, True]), torch.tensor([True, False, False])) tensor([ True, False, True]) >>> a = torch.tensor([0, 1, 10, 0], dtype=torch.int8) >>> b = torch.tensor([4, 0, 1, 0], dtype=torch.int8) >>> torch.logical_or(a, b) tensor([ True, True, True, False]) >>> torch.logical_or(a.double(), b.double()) tensor([ True, True, True, False]) >>> torch.logical_or(a.double(), b) tensor([ True, True, True, False]) >>> torch.logical_or(a, b, out=torch.empty(4, dtype=torch.bool)) tensor([ True, True, True, False]) # torch.logical_xor `torch.logical_xor(input, other, *, out=None) → Tensor` Computes the element-wise logical XOR of the given input tensors. Zeros are treated as `False` and nonzeros are treated as `True`. Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. * **other** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the tensor to compute XOR with Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. 
Example: >>> torch.logical_xor(torch.tensor([True, False, True]), torch.tensor([True, False, False])) tensor([False, False, True]) >>> a = torch.tensor([0, 1, 10, 0], dtype=torch.int8) >>> b = torch.tensor([4, 0, 1, 0], dtype=torch.int8) >>> torch.logical_xor(a, b) tensor([ True, True, False, False]) >>> torch.logical_xor(a.double(), b.double()) tensor([ True, True, False, False]) >>> torch.logical_xor(a.double(), b) tensor([ True, True, False, False]) >>> torch.logical_xor(a, b, out=torch.empty(4, dtype=torch.bool)) tensor([ True, True, False, False]) # torch.logit `torch.logit(input, eps=None, *, out=None) → Tensor` Returns a new tensor with the logit of the elements of `input`. `input` is clamped to [eps, 1 - eps] when eps is not None. When eps is None and `input` < 0 or `input` > 1, the function yields NaN. y_{i} = \ln\left(\frac{z_{i}}{1 - z_{i}}\right), \quad z_{i} = \begin{cases} x_{i} & \text{if eps is None} \\ \text{eps} & \text{if } x_{i} < \text{eps} \\ x_{i} & \text{if } \text{eps} \leq x_{i} \leq 1 - \text{eps} \\ 1 - \text{eps} & \text{if } x_{i} > 1 - \text{eps} \end{cases} Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. * **eps** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _,__optional_) – the epsilon for input clamp bound. Default: `None` Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. Example: >>> a = torch.rand(5) >>> a tensor([0.2796, 0.9331, 0.6486, 0.1523, 0.6516]) >>> torch.logit(a, eps=1e-6) tensor([-0.9466, 2.6352, 0.6131, -1.7169, 0.6261]) # torch.logspace `torch.logspace(start, end, steps, base=10.0, *, out=None, dtype=None, layout=torch.strided, device=None, requires_grad=False) → Tensor` Creates a one-dimensional tensor of size `steps` whose values are evenly spaced from \text{base}^{\text{start}} to \text{base}^{\text{end}}, inclusive, on a logarithmic scale with base `base`. That is, the values are: (\text{base}^{\text{start}}, \text{base}^{(\text{start} + \frac{\text{end} - \text{start}}{\text{steps} - 1})}, \ldots, \text{base}^{(\text{start} + (\text{steps} - 2) * \frac{\text{end} - \text{start}}{\text{steps} - 1})}, \text{base}^{\text{end}}) Warning Not providing a value for `steps` is deprecated. For backwards compatibility, not providing a value for `steps` will create a tensor with 100 elements. Note that this behavior is not reflected in the documented function signature and should not be relied on. In a future PyTorch release, failing to provide a value for `steps` will throw a runtime error. Parameters * **start** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)")) – the starting value for the set of points * **end** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)")) – the ending value for the set of points * **steps** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – size of the constructed tensor * **base** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _,__optional_) – base of the logarithm function. Default: `10.0`. Keyword Arguments * **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor.
* **dtype** ([`torch.dtype`](../tensor_attributes#torch.torch.dtype "torch.torch.dtype"), optional) – the desired data type of returned tensor. Default: if `None`, uses a global default (see [`torch.set_default_tensor_type()`](torch.set_default_tensor_type#torch.set_default_tensor_type "torch.set_default_tensor_type")). * **layout** ([`torch.layout`](../tensor_attributes#torch.torch.layout "torch.torch.layout"), optional) – the desired layout of returned Tensor. Default: `torch.strided`. * **device** ([`torch.device`](../tensor_attributes#torch.torch.device "torch.torch.device"), optional) – the desired device of returned tensor. Default: if `None`, uses the current device for the default tensor type (see [`torch.set_default_tensor_type()`](torch.set_default_tensor_type#torch.set_default_tensor_type "torch.set_default_tensor_type")). `device` will be the CPU for CPU tensor types and the current CUDA device for CUDA tensor types. * **requires_grad** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – If autograd should record operations on the returned tensor. Default: `False`. Example: >>> torch.logspace(start=-10, end=10, steps=5) tensor([ 1.0000e-10, 1.0000e-05, 1.0000e+00, 1.0000e+05, 1.0000e+10]) >>> torch.logspace(start=0.1, end=1.0, steps=5) tensor([ 1.2589, 2.1135, 3.5481, 5.9566, 10.0000]) >>> torch.logspace(start=0.1, end=1.0, steps=1) tensor([1.2589]) >>> torch.logspace(start=2, end=2, steps=1, base=2) tensor([4.0]) # torch.logsumexp `torch.logsumexp(input, dim, keepdim=False, *, out=None)` Returns the log of summed exponentials of each row of the `input` tensor in the given dimension `dim`. The computation is numerically stabilized. For summation index j given by `dim` and other indices i, the result is \text{logsumexp}(x)_{i} = \log \sum_j \exp(x_{ij}) If `keepdim` is `True`, the output tensor is of the same size as `input` except in the dimension(s) `dim` where it is of size 1. Otherwise, `dim` is squeezed (see [`torch.squeeze()`](torch.squeeze#torch.squeeze "torch.squeeze")), resulting in the output tensor having 1 (or `len(dim)`) fewer dimension(s). Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. * **dim** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _or_ _tuple of python:ints_) – the dimension or dimensions to reduce. * **keepdim** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – whether the output tensor has `dim` retained or not. Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. Example: >>> a = torch.randn(3, 3) >>> torch.logsumexp(a, 1) tensor([ 0.8442, 1.4322, 0.8711]) # torch.lstsq `torch.lstsq(input, A, *, out=None) → Tensor` Computes the solution to the least squares and least norm problems for a full rank matrix A of size (m \times n) and a matrix B of size (m \times k). If m \geq n, `lstsq()` solves the least-squares problem: \min_X \|AX-B\|_2. If m < n, `lstsq()` solves the least-norm problem: \min_X \|X\|_2 \quad \text{subject to} \quad AX = B. The returned tensor X has shape (\max(m, n) \times k); the first n rows of X contain the solution. If m \geq n, the residual sum of squares for the solution in each column is given by the sum of squares of elements in the remaining m - n rows of that column. Note The case when m < n is not supported on the GPU. Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the matrix B * **A** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the m by n matrix A Keyword Arguments **out** ([tuple](https://docs.python.org/3/library/stdtypes.html#tuple "\(in Python v3.9\)") _,__optional_) – the optional destination tensor Returns A namedtuple `(solution, QR)` containing: * **solution** (_Tensor_): the least squares solution * **QR** (_Tensor_): the details of the QR factorization Return type ([Tensor](../tensors#torch.Tensor "torch.Tensor"), [Tensor](../tensors#torch.Tensor "torch.Tensor")) Example: >>> A = torch.tensor([[1., 1, 1], ... [2, 3, 4], ... [3, 5, 2], ... [4, 2, 5], ... [5, 4, 3]]) >>> B = torch.tensor([[-10., -3], ... [ 12, 14], ... [ 14, 12], ... [ 16, 16], ...
[ 18, 16]]) >>> X, _ = torch.lstsq(B, A) >>> X tensor([[ 2.0000, 1.0000], [ 1.0000, 1.0000], [ 1.0000, 2.0000], [ 10.9635, 4.8501], [ 8.9332, 5.2418]]) # torch.lt `torch.lt(input, other, *, out=None) → Tensor` Computes \text{input} < \text{other} element-wise. The second argument can be a number or a tensor whose shape is [broadcastable](https://pytorch.org/docs/1.8.0/notes/broadcasting.html#broadcasting-semantics) with the first argument. Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the tensor to compare * **other** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _or_[float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)")) – the tensor or value to compare Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. Returns A boolean tensor that is True where `input` is less than `other` and False elsewhere Example: >>> torch.lt(torch.tensor([[1, 2], [3, 4]]), torch.tensor([[1, 1], [4, 4]])) tensor([[False, False], [True, False]]) # torch.lu `torch.lu(*args, **kwargs)` Computes the LU factorization of a matrix or batches of matrices `A`. Returns a tuple containing the LU factorization and pivots of `A`. Pivoting is done if `pivot` is set to `True`. Note The pivots returned by the function are 1-indexed. If `pivot` is `False`, then the returned pivots tensor is filled with zeros of the appropriate size. Note LU factorization with `pivot` = `False` is not available for CPU, and attempting to do so will throw an error. However, LU factorization with `pivot` = `False` is available for CUDA. Note This function does not check whether the factorization was successful if `get_infos` is `True`, since the status of the factorization is present in the third element of the return tuple. Note In the case of batches of square matrices with size less than or equal to 32 on a CUDA device, the LU factorization is repeated for singular matrices due to a bug in the MAGMA library (see magma issue 13). Note `L`, `U`, and `P` can be derived using [`torch.lu_unpack()`](torch.lu_unpack#torch.lu_unpack "torch.lu_unpack"). Warning The LU factorization does have backward support, but only for square inputs of full rank. Parameters * **A** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the tensor to factor of size (*, m, n) * **pivot** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – controls whether pivoting is done. Default: `True` * **get_infos** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – if set to `True`, returns an info IntTensor. Default: `False` * **out** ([tuple](https://docs.python.org/3/library/stdtypes.html#tuple "\(in Python v3.9\)") _,__optional_) – optional output tuple. If `get_infos` is `True`, then the elements in the tuple are Tensor, IntTensor, and IntTensor. If `get_infos` is `False`, then the elements in the tuple are Tensor, IntTensor. Default: `None` Returns A tuple of tensors containing * **factorization** (_Tensor_): the factorization of size (*, m, n) * **pivots** (_IntTensor_): the pivots of size (*, \text{min}(m, n)). `pivots` stores all the intermediate transpositions of rows. The final permutation `perm` could be reconstructed by applying `swap(perm[i], perm[pivots[i] - 1])` for `i = 0, ..., pivots.size(-1) - 1`, where `perm` is initially the identity permutation of m elements (essentially this is what [`torch.lu_unpack()`](torch.lu_unpack#torch.lu_unpack "torch.lu_unpack") is doing).
* **infos** (_IntTensor_ , _optional_): if `get_infos` is `True`, this is a tensor of size (∗)(*) where non-zero values indicate whether factorization for the matrix or each minibatch has succeeded or failed Return type ([Tensor](../tensors#torch.Tensor "torch.Tensor"), IntTensor, IntTensor (optional)) Example: >>> A = torch.randn(2, 3, 3) >>> A_LU, pivots = torch.lu(A) >>> A_LU tensor([[[ 1.3506, 2.5558, -0.0816], [ 0.1684, 1.1551, 0.1940], [ 0.1193, 0.6189, -0.5497]], [[ 0.4526, 1.2526, -0.3285], [-0.7988, 0.7175, -0.9701], [ 0.2634, -0.9255, -0.3459]]]) >>> pivots tensor([[ 3, 3, 3], [ 3, 3, 3]], dtype=torch.int32) >>> A_LU, pivots, info = torch.lu(A, get_infos=True) >>> if info.nonzero().size(0) == 0: ... print('LU factorization succeeded for all samples!') LU factorization succeeded for all samples! # torch.lu_solve `torch.lu_solve(b, LU_data, LU_pivots, *, out=None) → Tensor` Returns the LU solve of the linear system Ax=bAx = b using the partially pivoted LU factorization of A from [`torch.lu()`](torch.lu#torch.lu "torch.lu"). This function supports `float`, `double`, `cfloat` and `cdouble` dtypes for `input`. Parameters * **b** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the RHS tensor of size (∗,m,k)(*, m, k) , where ∗* is zero or more batch dimensions. * **LU_data** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the pivoted LU factorization of A from [`torch.lu()`](torch.lu#torch.lu "torch.lu") of size (∗,m,m)(*, m, m) , where ∗* is zero or more batch dimensions. * **LU_pivots** (_IntTensor_) – the pivots of the LU factorization from [`torch.lu()`](torch.lu#torch.lu "torch.lu") of size (∗,m)(*, m) , where ∗* is zero or more batch dimensions. The batch dimensions of `LU_pivots` must be equal to the batch dimensions of `LU_data`. Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. Example: >>> A = torch.randn(2, 3, 3) >>> b = torch.randn(2, 3, 1) >>> A_LU = torch.lu(A) >>> x = torch.lu_solve(b, *A_LU) >>> torch.norm(torch.bmm(A, x) - b) tensor(1.00000e-07 * 2.8312) # torch.lu_unpack `torch.lu_unpack(LU_data, LU_pivots, unpack_data=True, unpack_pivots=True)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/functional.html#lu_unpack) Unpacks the data and pivots from a LU factorization of a tensor. Returns a tuple of tensors as `(the pivots, the L tensor, the U tensor)`. 
Parameters * **LU_data** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the packed LU factorization data * **LU_pivots** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the packed LU factorization pivots * **unpack_data** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – flag indicating if the data should be unpacked * **unpack_pivots** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – flag indicating if the pivots should be unpacked Examples: >>> A = torch.randn(2, 3, 3) >>> A_LU, pivots = A.lu() >>> P, A_L, A_U = torch.lu_unpack(A_LU, pivots) >>> >>> # can recover A from factorization >>> A_ = torch.bmm(P, torch.bmm(A_L, A_U)) >>> # LU factorization of a rectangular matrix: >>> A = torch.randn(2, 3, 2) >>> A_LU, pivots = A.lu() >>> P, A_L, A_U = torch.lu_unpack(A_LU, pivots) >>> P tensor([[[1., 0., 0.], [0., 1., 0.], [0., 0., 1.]], [[0., 0., 1.], [0., 1., 0.], [1., 0., 0.]]]) >>> A_L tensor([[[ 1.0000, 0.0000], [ 0.4763, 1.0000], [ 0.3683, 0.1135]], [[ 1.0000, 0.0000], [ 0.2957, 1.0000], [-0.9668, -0.3335]]]) >>> A_U tensor([[[ 2.1962, 1.0881], [ 0.0000, -0.8681]], [[-1.0947, 0.3736], [ 0.0000, 0.5718]]]) >>> A_ = torch.bmm(P, torch.bmm(A_L, A_U)) >>> torch.norm(A_ - A) tensor(2.9802e-08) # torch.manual_seed `torch.manual_seed(seed)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/random.html#manual_seed) Sets the seed for generating random numbers. Returns a `torch.Generator` object. Parameters **seed** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – The desired seed. Value must be within the inclusive range `[-0x8000_0000_0000_0000, 0xffff_ffff_ffff_ffff]`. Otherwise, a RuntimeError is raised. Negative inputs are remapped to positive values with the formula `0xffff_ffff_ffff_ffff + seed`. # torch.masked_select `torch.masked_select(input, mask, *, out=None) → Tensor` Returns a new 1-D tensor which indexes the `input` tensor according to the boolean mask `mask` which is a `BoolTensor`. The shapes of the `mask` tensor and the `input` tensor don’t need to match, but they must be [broadcastable](https://pytorch.org/docs/1.8.0/notes/broadcasting.html#broadcasting- semantics). Note The returned tensor does **not** use the same storage as the original tensor Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. * **mask** (_BoolTensor_) – the tensor containing the binary mask to index with Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. Example: >>> x = torch.randn(3, 4) >>> x tensor([[ 0.3552, -2.3825, -0.8297, 0.3477], [-1.2035, 1.2252, 0.5002, 0.6248], [ 0.1307, -2.0608, 0.1244, 2.0139]]) >>> mask = x.ge(0.5) >>> mask tensor([[False, False, False, False], [False, True, True, True], [False, False, False, True]]) >>> torch.masked_select(x, mask) tensor([ 1.2252, 0.5002, 0.6248, 2.0139]) # torch.matmul `torch.matmul(input, other, *, out=None) → Tensor` Matrix product of two tensors. The behavior depends on the dimensionality of the tensors as follows: * If both tensors are 1-dimensional, the dot product (scalar) is returned. * If both arguments are 2-dimensional, the matrix-matrix product is returned. * If the first argument is 1-dimensional and the second argument is 2-dimensional, a 1 is prepended to its dimension for the purpose of the matrix multiply. After the matrix multiply, the prepended dimension is removed. 
* If the first argument is 2-dimensional and the second argument is 1-dimensional, the matrix-vector product is returned. * If both arguments are at least 1-dimensional and at least one argument is N-dimensional (where N > 2), then a batched matrix multiply is returned. If the first argument is 1-dimensional, a 1 is prepended to its dimension for the purpose of the batched matrix multiply and removed after. If the second argument is 1-dimensional, a 1 is appended to its dimension for the purpose of the batched matrix multiple and removed after. The non-matrix (i.e. batch) dimensions are [broadcasted](https://pytorch.org/docs/1.8.0/notes/broadcasting.html#broadcasting-semantics) (and thus must be broadcastable). For example, if `input` is a (j×1×n×n)(j \times 1 \times n \times n) tensor and `other` is a (k×n×n)(k \times n \times n) tensor, `out` will be a (j×k×n×n)(j \times k \times n \times n) tensor. Note that the broadcasting logic only looks at the batch dimensions when determining if the inputs are broadcastable, and not the matrix dimensions. For example, if `input` is a (j×1×n×m)(j \times 1 \times n \times m) tensor and `other` is a (k×m×p)(k \times m \times p) tensor, these inputs are valid for broadcasting even though the final two dimensions (i.e. the matrix dimensions) are different. `out` will be a (j×k×n×p)(j \times k \times n \times p) tensor. This operator supports [TensorFloat32](https://pytorch.org/docs/1.8.0/notes/cuda.html#tf32-on- ampere). Note The 1-dimensional dot product version of this function does not support an `out` parameter. Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the first tensor to be multiplied * **other** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the second tensor to be multiplied Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. Example: >>> # vector x vector >>> tensor1 = torch.randn(3) >>> tensor2 = torch.randn(3) >>> torch.matmul(tensor1, tensor2).size() torch.Size([]) >>> # matrix x vector >>> tensor1 = torch.randn(3, 4) >>> tensor2 = torch.randn(4) >>> torch.matmul(tensor1, tensor2).size() torch.Size([3]) >>> # batched matrix x broadcasted vector >>> tensor1 = torch.randn(10, 3, 4) >>> tensor2 = torch.randn(4) >>> torch.matmul(tensor1, tensor2).size() torch.Size([10, 3]) >>> # batched matrix x batched matrix >>> tensor1 = torch.randn(10, 3, 4) >>> tensor2 = torch.randn(10, 4, 5) >>> torch.matmul(tensor1, tensor2).size() torch.Size([10, 3, 5]) >>> # batched matrix x broadcasted matrix >>> tensor1 = torch.randn(10, 3, 4) >>> tensor2 = torch.randn(4, 5) >>> torch.matmul(tensor1, tensor2).size() torch.Size([10, 3, 5]) # torch.matrix_exp `torch.matrix_exp()` Returns the matrix exponential. Supports batched input. For a matrix `A`, the matrix exponential is defined as eA=∑k=0∞Ak/k!\mathrm{e}^A = \sum_{k=0}^\infty A^k / k! The implementation is based on: Bader, P.; Blanes, S.; Casas, F. Computing the Matrix Exponential with an Optimized Taylor Polynomial Approximation. Mathematics 2019, 7, 1174. Parameters **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. 
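Note As an informal sanity check (not the library's actual algorithm), the result can be compared against a truncated Taylor series \sum_{k} A^k / k!; a short sketch with an arbitrary 2x2 matrix:

>>> import math
>>> import torch
>>> A = torch.tensor([[0.0, 1.0], [-1.0, 0.0]])
>>> # truncated Taylor series; 20 terms are plenty for a matrix of this size
>>> approx = sum(torch.matrix_power(A, k) / math.factorial(k) for k in range(20))
>>> torch.allclose(torch.matrix_exp(A), approx, atol=1e-6)
True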
Example: >>> a = torch.randn(2, 2, 2) >>> a[0, :, :] = torch.eye(2, 2) >>> a[1, :, :] = 2 * torch.eye(2, 2) >>> a tensor([[[1., 0.], [0., 1.]], [[2., 0.], [0., 2.]]]) >>> torch.matrix_exp(a) tensor([[[2.7183, 0.0000], [0.0000, 2.7183]], [[7.3891, 0.0000], [0.0000, 7.3891]]]) >>> import math >>> x = torch.tensor([[0, math.pi/3], [-math.pi/3, 0]]) >>> x.matrix_exp() # should be [[cos(pi/3), sin(pi/3)], [-sin(pi/3), cos(pi/3)]] tensor([[ 0.5000, 0.8660], [-0.8660, 0.5000]]) # torch.matrix_power `torch.matrix_power(input, n) → Tensor` Returns the matrix raised to the power `n` for square matrices. For batch of matrices, each individual matrix is raised to the power `n`. If `n` is negative, then the inverse of the matrix (if invertible) is raised to the power `n`. For a batch of matrices, the batched inverse (if invertible) is raised to the power `n`. If `n` is 0, then an identity matrix is returned. Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. * **n** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – the power to raise the matrix to Example: >>> a = torch.randn(2, 2, 2) >>> a tensor([[[-1.9975, -1.9610], [ 0.9592, -2.3364]], [[-1.2534, -1.3429], [ 0.4153, -1.4664]]]) >>> torch.matrix_power(a, 3) tensor([[[ 3.9392, -23.9916], [ 11.7357, -0.2070]], [[ 0.2468, -6.7168], [ 2.0774, -0.8187]]]) # torch.matrix_rank `torch.matrix_rank(input, tol=None, symmetric=False, *, out=None) → Tensor` Returns the numerical rank of a 2-D tensor. The method to compute the matrix rank is done using SVD by default. If `symmetric` is `True`, then `input` is assumed to be symmetric, and the computation of the rank is done by obtaining the eigenvalues. `tol` is the threshold below which the singular values (or the eigenvalues when `symmetric` is `True`) are considered to be 0. If `tol` is not specified, `tol` is set to `S.max() * max(S.size()) * eps` where `S` is the singular values (or the eigenvalues when `symmetric` is `True`), and `eps` is the epsilon value for the datatype of `input`. Note `torch.matrix_rank()` is deprecated. Please use [`torch.linalg.matrix_rank()`](../linalg#torch.linalg.matrix_rank "torch.linalg.matrix_rank") instead. The parameter `symmetric` was renamed in [`torch.linalg.matrix_rank()`](../linalg#torch.linalg.matrix_rank "torch.linalg.matrix_rank") to `hermitian`. Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input 2-D tensor * **tol** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _,__optional_) – the tolerance value. Default: `None` * **symmetric** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – indicates whether `input` is symmetric. Default: `False` Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. Example: >>> a = torch.eye(10) >>> torch.matrix_rank(a) tensor(10) >>> b = torch.eye(10) >>> b[0, 0] = 0 >>> torch.matrix_rank(b) tensor(9) # torch.max `torch.max(input) → Tensor` Returns the maximum value of all elements in the `input` tensor. Warning This function produces deterministic (sub)gradients unlike `max(dim=0)` Parameters **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. 
Example: >>> a = torch.randn(1, 3) >>> a tensor([[ 0.6763, 0.7445, -2.2369]]) >>> torch.max(a) tensor(0.7445) `torch.max(input, dim, keepdim=False, *, out=None) -> (Tensor, LongTensor)` Returns a namedtuple `(values, indices)` where `values` is the maximum value of each row of the `input` tensor in the given dimension `dim`. And `indices` is the index location of each maximum value found (argmax). If `keepdim` is `True`, the output tensors are of the same size as `input` except in the dimension `dim` where they are of size 1. Otherwise, `dim` is squeezed (see [`torch.squeeze()`](torch.squeeze#torch.squeeze "torch.squeeze")), resulting in the output tensors having 1 fewer dimension than `input`. Note If there are multiple maximal values in a reduced row then the indices of the first maximal value are returned. Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. * **dim** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – the dimension to reduce. * **keepdim** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – whether the output tensor has `dim` retained or not. Default: `False`. Keyword Arguments **out** ([tuple](https://docs.python.org/3/library/stdtypes.html#tuple "\(in Python v3.9\)") _,__optional_) – the result tuple of two output tensors (max, max_indices) Example: >>> a = torch.randn(4, 4) >>> a tensor([[-1.2360, -0.2942, -0.1222, 0.8475], [ 1.1949, -1.1127, -2.2379, -0.6702], [ 1.5717, -0.9207, 0.1297, -1.8768], [-0.6172, 1.0036, -0.6060, -0.2432]]) >>> torch.max(a, 1) torch.return_types.max(values=tensor([0.8475, 1.1949, 1.5717, 1.0036]), indices=tensor([3, 0, 0, 1])) `torch.max(input, other, *, out=None) → Tensor` See [`torch.maximum()`](torch.maximum#torch.maximum "torch.maximum"). # torch.maximum `torch.maximum(input, other, *, out=None) → Tensor` Computes the element-wise maximum of `input` and `other`. Note If one of the elements being compared is a NaN, then that element is returned. `maximum()` is not supported for tensors with complex dtypes. Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. * **other** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the second input tensor Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. Example: >>> a = torch.tensor((1, 2, -1)) >>> b = torch.tensor((3, 0, 4)) >>> torch.maximum(a, b) tensor([3, 2, 4]) # torch.mean `torch.mean(input) → Tensor` Returns the mean value of all elements in the `input` tensor. Parameters **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. Example: >>> a = torch.randn(1, 3) >>> a tensor([[ 0.2294, -0.5481, 1.3288]]) >>> torch.mean(a) tensor(0.3367) `torch.mean(input, dim, keepdim=False, *, out=None) → Tensor` Returns the mean value of each row of the `input` tensor in the given dimension `dim`. If `dim` is a list of dimensions, reduce over all of them. If `keepdim` is `True`, the output tensor is of the same size as `input` except in the dimension(s) `dim` where it is of size 1. Otherwise, `dim` is squeezed (see [`torch.squeeze()`](torch.squeeze#torch.squeeze "torch.squeeze")), resulting in the output tensor having 1 (or `len(dim)`) fewer dimension(s). Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. 
* **dim** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _or_ _tuple of python:ints_) – the dimension or dimensions to reduce. * **keepdim** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – whether the output tensor has `dim` retained or not. Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. Example: >>> a = torch.randn(4, 4) >>> a tensor([[-0.3841, 0.6320, 0.4254, -0.7384], [-0.9644, 1.0131, -0.6549, -1.4279], [-0.2951, -1.3350, -0.7694, 0.5600], [ 1.0842, -0.9580, 0.3623, 0.2343]]) >>> torch.mean(a, 1) tensor([-0.0163, -0.5085, -0.4599, 0.1807]) >>> torch.mean(a, 1, True) tensor([[-0.0163], [-0.5085], [-0.4599], [ 0.1807]]) # torch.median `torch.median(input) → Tensor` Returns the median of the values in `input`. Note The median is not unique for `input` tensors with an even number of elements. In this case the lower of the two medians is returned. To compute the mean of both medians, use [`torch.quantile()`](torch.quantile#torch.quantile "torch.quantile") with `q=0.5` instead. Warning This function produces deterministic (sub)gradients unlike `median(dim=0)` Parameters **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. Example: >>> a = torch.randn(1, 3) >>> a tensor([[ 1.5219, -1.5212, 0.2202]]) >>> torch.median(a) tensor(0.2202) `torch.median(input, dim=-1, keepdim=False, *, out=None) -> (Tensor, LongTensor)` Returns a namedtuple `(values, indices)` where `values` contains the median of each row of `input` in the dimension `dim`, and `indices` contains the index of the median values found in the dimension `dim`. By default, `dim` is the last dimension of the `input` tensor. If `keepdim` is `True`, the output tensors are of the same size as `input` except in the dimension `dim` where they are of size 1. Otherwise, `dim` is squeezed (see [`torch.squeeze()`](torch.squeeze#torch.squeeze "torch.squeeze")), resulting in the outputs tensor having 1 fewer dimension than `input`. Note The median is not unique for `input` tensors with an even number of elements in the dimension `dim`. In this case the lower of the two medians is returned. To compute the mean of both medians in `input`, use [`torch.quantile()`](torch.quantile#torch.quantile "torch.quantile") with `q=0.5` instead. Warning `indices` does not necessarily contain the first occurrence of each median value found, unless it is unique. The exact implementation details are device- specific. Do not expect the same result when run on CPU and GPU in general. For the same reason do not expect the gradients to be deterministic. Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. * **dim** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – the dimension to reduce. * **keepdim** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – whether the output tensor has `dim` retained or not. Keyword Arguments **out** (_(_[Tensor](../tensors#torch.Tensor "torch.Tensor") _,_[Tensor](../tensors#torch.Tensor "torch.Tensor") _)__,__optional_) – The first tensor will be populated with the median values and the second tensor, which must have dtype long, with their indices in the dimension `dim` of `input`. 
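Note To make the "lower of the two medians" behavior above concrete, a brief sketch with an even number of elements (values chosen for illustration):

>>> import torch
>>> t = torch.tensor([1., 2., 3., 4.])
>>> torch.median(t)         # lower of the two middle values
tensor(2.)
>>> torch.quantile(t, 0.5)  # mean of the two middle values
tensor(2.5000)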
Example: >>> a = torch.randn(4, 5) >>> a tensor([[ 0.2505, -0.3982, -0.9948, 0.3518, -1.3131], [ 0.3180, -0.6993, 1.0436, 0.0438, 0.2270], [-0.2751, 0.7303, 0.2192, 0.3321, 0.2488], [ 1.0778, -1.9510, 0.7048, 0.4742, -0.7125]]) >>> torch.median(a, 1) torch.return_types.median(values=tensor([-0.3982, 0.2270, 0.2488, 0.4742]), indices=tensor([1, 4, 4, 3])) # torch.meshgrid `torch.meshgrid(*tensors)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/functional.html#meshgrid) Take NN tensors, each of which can be either scalar or 1-dimensional vector, and create NN N-dimensional grids, where the ii th grid is defined by expanding the ii th input over dimensions defined by other inputs. Parameters **tensors** (_list of Tensor_) – list of scalars or 1 dimensional tensors. Scalars will be treated as tensors of size (1,)(1,) automatically Returns If the input has kk tensors of size (N1,),(N2,),…,(Nk,)(N_1,), (N_2,), \ldots , (N_k,) , then the output would also have kk tensors, where all tensors are of size (N1,N2,…,Nk)(N_1, N_2, \ldots , N_k) . Return type seq (sequence of Tensors) Example: >>> x = torch.tensor([1, 2, 3]) >>> y = torch.tensor([4, 5, 6]) >>> grid_x, grid_y = torch.meshgrid(x, y) >>> grid_x tensor([[1, 1, 1], [2, 2, 2], [3, 3, 3]]) >>> grid_y tensor([[4, 5, 6], [4, 5, 6], [4, 5, 6]]) # torch.min `torch.min(input) → Tensor` Returns the minimum value of all elements in the `input` tensor. Warning This function produces deterministic (sub)gradients unlike `min(dim=0)` Parameters **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. Example: >>> a = torch.randn(1, 3) >>> a tensor([[ 0.6750, 1.0857, 1.7197]]) >>> torch.min(a) tensor(0.6750) `torch.min(input, dim, keepdim=False, *, out=None) -> (Tensor, LongTensor)` Returns a namedtuple `(values, indices)` where `values` is the minimum value of each row of the `input` tensor in the given dimension `dim`. And `indices` is the index location of each minimum value found (argmin). If `keepdim` is `True`, the output tensors are of the same size as `input` except in the dimension `dim` where they are of size 1. Otherwise, `dim` is squeezed (see [`torch.squeeze()`](torch.squeeze#torch.squeeze "torch.squeeze")), resulting in the output tensors having 1 fewer dimension than `input`. Note If there are multiple minimal values in a reduced row then the indices of the first minimal value are returned. Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. * **dim** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – the dimension to reduce. * **keepdim** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – whether the output tensor has `dim` retained or not. Keyword Arguments **out** ([tuple](https://docs.python.org/3/library/stdtypes.html#tuple "\(in Python v3.9\)") _,__optional_) – the tuple of two output tensors (min, min_indices) Example: >>> a = torch.randn(4, 4) >>> a tensor([[-0.6248, 1.1334, -1.1899, -0.2803], [-1.4644, -0.2635, -0.3651, 0.6134], [ 0.2457, 0.0384, 1.0128, 0.7015], [-0.1153, 2.9849, 2.1458, 0.5788]]) >>> torch.min(a, 1) torch.return_types.min(values=tensor([-1.1899, -1.4644, 0.0384, -0.1153]), indices=tensor([2, 0, 1, 0])) `torch.min(input, other, *, out=None) → Tensor` See [`torch.minimum()`](torch.minimum#torch.minimum "torch.minimum"). # torch.minimum `torch.minimum(input, other, *, out=None) → Tensor` Computes the element-wise minimum of `input` and `other`. 
Note If one of the elements being compared is a NaN, then that element is returned. `minimum()` is not supported for tensors with complex dtypes. Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. * **other** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the second input tensor Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. Example: >>> a = torch.tensor((1, 2, -1)) >>> b = torch.tensor((3, 0, 4)) >>> torch.minimum(a, b) tensor([1, 0, -1]) # torch.mm `torch.mm(input, mat2, *, out=None) → Tensor` Performs a matrix multiplication of the matrices `input` and `mat2`. If `input` is a (n×m)(n \times m) tensor, `mat2` is a (m×p)(m \times p) tensor, `out` will be a (n×p)(n \times p) tensor. Note This function does not [broadcast](https://pytorch.org/docs/1.8.0/notes/broadcasting.html#broadcasting- semantics). For broadcasting matrix products, see [`torch.matmul()`](torch.matmul#torch.matmul "torch.matmul"). Supports strided and sparse 2-D tensors as inputs, autograd with respect to strided inputs. This operator supports [TensorFloat32](https://pytorch.org/docs/1.8.0/notes/cuda.html#tf32-on- ampere). Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the first matrix to be matrix multiplied * **mat2** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the second matrix to be matrix multiplied Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. Example: >>> mat1 = torch.randn(2, 3) >>> mat2 = torch.randn(3, 3) >>> torch.mm(mat1, mat2) tensor([[ 0.4851, 0.5037, -0.3633], [-0.0760, -3.6705, 2.4784]]) # torch.mode `torch.mode(input, dim=-1, keepdim=False, *, out=None) -> (Tensor, LongTensor)` Returns a namedtuple `(values, indices)` where `values` is the mode value of each row of the `input` tensor in the given dimension `dim`, i.e. a value which appears most often in that row, and `indices` is the index location of each mode value found. By default, `dim` is the last dimension of the `input` tensor. If `keepdim` is `True`, the output tensors are of the same size as `input` except in the dimension `dim` where they are of size 1. Otherwise, `dim` is squeezed (see [`torch.squeeze()`](torch.squeeze#torch.squeeze "torch.squeeze")), resulting in the output tensors having 1 fewer dimension than `input`. Note This function is not defined for `torch.cuda.Tensor` yet. Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. * **dim** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – the dimension to reduce. * **keepdim** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – whether the output tensor has `dim` retained or not. Keyword Arguments **out** ([tuple](https://docs.python.org/3/library/stdtypes.html#tuple "\(in Python v3.9\)") _,__optional_) – the result tuple of two output tensors (values, indices) Example: >>> a = torch.randint(10, (5,)) >>> a tensor([6, 5, 1, 0, 2]) >>> b = a + (torch.randn(50, 1) * 5).long() >>> torch.mode(b, 0) torch.return_types.mode(values=tensor([6, 5, 1, 0, 2]), indices=tensor([2, 2, 2, 2, 2])) # torch.moveaxis `torch.moveaxis(input, source, destination) → Tensor` Alias for [`torch.movedim()`](torch.movedim#torch.movedim "torch.movedim"). This function is equivalent to NumPy’s moveaxis function. 
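Note For a tensor with a known number of dimensions, moving an axis is equivalent to an explicit permutation of the dimensions; an illustrative sketch for the 3-D case (not an official equivalence claim beyond this shape):

>>> import torch
>>> t = torch.randn(3, 2, 1)
>>> torch.equal(torch.moveaxis(t, 1, 0), t.permute(1, 0, 2))
True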
Examples: >>> t = torch.randn(3,2,1) >>> t tensor([[[-0.3362], [-0.8437]], [[-0.9627], [ 0.1727]], [[ 0.5173], [-0.1398]]]) >>> torch.moveaxis(t, 1, 0).shape torch.Size([2, 3, 1]) >>> torch.moveaxis(t, 1, 0) tensor([[[-0.3362], [-0.9627], [ 0.5173]], [[-0.8437], [ 0.1727], [-0.1398]]]) >>> torch.moveaxis(t, (1, 2), (0, 1)).shape torch.Size([2, 1, 3]) >>> torch.moveaxis(t, (1, 2), (0, 1)) tensor([[[-0.3362, -0.9627, 0.5173]], [[-0.8437, 0.1727, -0.1398]]]) # torch.movedim `torch.movedim(input, source, destination) → Tensor` Moves the dimension(s) of `input` at the position(s) in `source` to the position(s) in `destination`. Other dimensions of `input` that are not explicitly moved remain in their original order and appear at the positions not specified in `destination`. Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. * **source** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _or_ _tuple of python:ints_) – Original positions of the dims to move. These must be unique. * **destination** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _or_ _tuple of python:ints_) – Destination positions for each of the original dims. These must also be unique. Examples: >>> t = torch.randn(3,2,1) >>> t tensor([[[-0.3362], [-0.8437]], [[-0.9627], [ 0.1727]], [[ 0.5173], [-0.1398]]]) >>> torch.movedim(t, 1, 0).shape torch.Size([2, 3, 1]) >>> torch.movedim(t, 1, 0) tensor([[[-0.3362], [-0.9627], [ 0.5173]], [[-0.8437], [ 0.1727], [-0.1398]]]) >>> torch.movedim(t, (1, 2), (0, 1)).shape torch.Size([2, 1, 3]) >>> torch.movedim(t, (1, 2), (0, 1)) tensor([[[-0.3362, -0.9627, 0.5173]], [[-0.8437, 0.1727, -0.1398]]]) # torch.msort `torch.msort(input, *, out=None) → Tensor` Sorts the elements of the `input` tensor along its first dimension in ascending order by value. Note `torch.msort(t)` is equivalent to `torch.sort(t, dim=0)[0]`. See also [`torch.sort()`](torch.sort#torch.sort "torch.sort"). Parameters **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. Example: >>> t = torch.randn(3, 4) >>> t tensor([[-0.1321, 0.4370, -1.2631, -1.1289], [-2.0527, -1.1250, 0.2275, 0.3077], [-0.0881, -0.1259, -0.5495, 1.0284]]) >>> torch.msort(t) tensor([[-2.0527, -1.1250, -1.2631, -1.1289], [-0.1321, -0.1259, -0.5495, 0.3077], [-0.0881, 0.4370, 0.2275, 1.0284]]) # torch.mul `torch.mul(input, other, *, out=None)` Multiplies each element of the input `input` with the scalar `other` and returns a new resulting tensor. outi=other×inputi\text{out}_i = \text{other} \times \text{input}_i If `input` is of type `FloatTensor` or `DoubleTensor`, `other` should be a real number, otherwise it should be an integer Parameters * **{input}** – * **other** (_Number_) – the number to be multiplied to each element of `input` Keyword Arguments **{out}** – Example: >>> a = torch.randn(3) >>> a tensor([ 0.2015, -0.4255, 2.6087]) >>> torch.mul(a, 100) tensor([ 20.1494, -42.5491, 260.8663]) `torch.mul(input, other, *, out=None)` Each element of the tensor `input` is multiplied by the corresponding element of the Tensor `other`. The resulting tensor is returned. The shapes of `input` and `other` must be [broadcastable](https://pytorch.org/docs/1.8.0/notes/broadcasting.html#broadcasting- semantics). 
outi=inputi×otheri\text{out}_i = \text{input}_i \times \text{other}_i Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the first multiplicand tensor * **other** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the second multiplicand tensor Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. Example: >>> a = torch.randn(4, 1) >>> a tensor([[ 1.1207], [-0.3137], [ 0.0700], [ 0.8378]]) >>> b = torch.randn(1, 4) >>> b tensor([[ 0.5146, 0.1216, -0.5244, 2.2382]]) >>> torch.mul(a, b) tensor([[ 0.5767, 0.1363, -0.5877, 2.5083], [-0.1614, -0.0382, 0.1645, -0.7021], [ 0.0360, 0.0085, -0.0367, 0.1567], [ 0.4312, 0.1019, -0.4394, 1.8753]]) # torch.multinomial `torch.multinomial(input, num_samples, replacement=False, *, generator=None, out=None) → LongTensor` Returns a tensor where each row contains `num_samples` indices sampled from the multinomial probability distribution located in the corresponding row of tensor `input`. Note The rows of `input` do not need to sum to one (in which case we use the values as weights), but must be non-negative, finite and have a non-zero sum. Indices are ordered from left to right according to when each was sampled (first samples are placed in first column). If `input` is a vector, `out` is a vector of size `num_samples`. If `input` is a matrix with `m` rows, `out` is an matrix of shape (m×num_samples)(m \times \text{num\\_samples}) . If replacement is `True`, samples are drawn with replacement. If not, they are drawn without replacement, which means that when a sample index is drawn for a row, it cannot be drawn again for that row. Note When drawn without replacement, `num_samples` must be lower than number of non-zero elements in `input` (or the min number of non-zero elements in each row of `input` if it is a matrix). Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor containing probabilities * **num_samples** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – number of samples to draw * **replacement** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – whether to draw with replacement or not Keyword Arguments * **generator** ([`torch.Generator`](torch.generator#torch.Generator "torch.Generator"), optional) – a pseudorandom number generator for sampling * **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. Example: >>> weights = torch.tensor([0, 10, 3, 0], dtype=torch.float) # create a tensor of weights >>> torch.multinomial(weights, 2) tensor([1, 2]) >>> torch.multinomial(weights, 4) # ERROR! RuntimeError: invalid argument 2: invalid multinomial distribution (with replacement=False, not enough non-negative category to sample) at ../aten/src/TH/generic/THTensorRandom.cpp:320 >>> torch.multinomial(weights, 4, replacement=True) tensor([ 2, 1, 1, 1]) # torch.multiply `torch.multiply(input, other, *, out=None)` Alias for [`torch.mul()`](torch.mul#torch.mul "torch.mul"). # torch.mv `torch.mv(input, vec, *, out=None) → Tensor` Performs a matrix-vector product of the matrix `input` and the vector `vec`. If `input` is a (n×m)(n \times m) tensor, `vec` is a 1-D tensor of size mm , `out` will be 1-D of size nn . Note This function does not [broadcast](https://pytorch.org/docs/1.8.0/notes/broadcasting.html#broadcasting- semantics). 
Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – matrix to be multiplied * **vec** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – vector to be multiplied Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. Example: >>> mat = torch.randn(2, 3) >>> vec = torch.randn(3) >>> torch.mv(mat, vec) tensor([ 1.0404, -0.6361]) # torch.mvlgamma `torch.mvlgamma(input, p) → Tensor` Computes the [multivariate log-gamma function](https://en.wikipedia.org/wiki/Multivariate_gamma_function)) with dimension pp element-wise, given by log⁡(Γp(a))=C+∑i=1plog⁡(Γ(a−i−12))\log(\Gamma_{p}(a)) = C + \displaystyle \sum_{i=1}^{p} \log\left(\Gamma\left(a - \frac{i - 1}{2}\right)\right) where C=log⁡(π)×p(p−1)4C = \log(\pi) \times \frac{p (p - 1)}{4} and Γ(⋅)\Gamma(\cdot) is the Gamma function. All elements must be greater than p−12\frac{p - 1}{2} , otherwise an error would be thrown. Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the tensor to compute the multivariate log-gamma function * **p** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – the number of dimensions Example: >>> a = torch.empty(2, 3).uniform_(1, 2) >>> a tensor([[1.6835, 1.8474, 1.1929], [1.0475, 1.7162, 1.4180]]) >>> torch.mvlgamma(a, 2) tensor([[0.3928, 0.4007, 0.7586], [1.0311, 0.3901, 0.5049]]) # torch.nan_to_num `torch.nan_to_num(input, nan=0.0, posinf=None, neginf=None, *, out=None) → Tensor` Replaces `NaN`, positive infinity, and negative infinity values in `input` with the values specified by `nan`, `posinf`, and `neginf`, respectively. By default, `NaN`s are replaced with zero, positive infinity is replaced with the greatest finite value representable by :attr:`input`’s dtype, and negative infinity is replaced with the least finite value representable by `input`’s dtype. Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. * **nan** (_Number_ _,__optional_) – the value to replace `NaN`s with. Default is zero. * **posinf** (_Number_ _,__optional_) – if a Number, the value to replace positive infinity values with. If None, positive infinity values are replaced with the greatest finite value representable by `input`’s dtype. Default is None. * **neginf** (_Number_ _,__optional_) – if a Number, the value to replace negative infinity values with. If None, negative infinity values are replaced with the lowest finite value representable by `input`’s dtype. Default is None. Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. Example: >>> x = torch.tensor([float('nan'), float('inf'), -float('inf'), 3.14]) >>> torch.nan_to_num(x) tensor([ 0.0000e+00, 3.4028e+38, -3.4028e+38, 3.1400e+00]) >>> torch.nan_to_num(x, nan=2.0) tensor([ 2.0000e+00, 3.4028e+38, -3.4028e+38, 3.1400e+00]) >>> torch.nan_to_num(x, nan=2.0, posinf=1.0) tensor([ 2.0000e+00, 1.0000e+00, -3.4028e+38, 3.1400e+00]) # torch.nanmedian `torch.nanmedian(input) → Tensor` Returns the median of the values in `input`, ignoring `NaN` values. This function is identical to [`torch.median()`](torch.median#torch.median "torch.median") when there are no `NaN` values in `input`. When `input` has one or more `NaN` values, [`torch.median()`](torch.median#torch.median "torch.median") will always return `NaN`, while this function will return the median of the non-`NaN` elements in `input`. 
If all the elements in `input` are `NaN` it will also return `NaN`. Parameters **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. Example: >>> a = torch.tensor([1, float('nan'), 3, 2]) >>> a.median() tensor(nan) >>> a.nanmedian() tensor(2.) `torch.nanmedian(input, dim=-1, keepdim=False, *, out=None) -> (Tensor, LongTensor)` Returns a namedtuple `(values, indices)` where `values` contains the median of each row of `input` in the dimension `dim`, ignoring `NaN` values, and `indices` contains the index of the median values found in the dimension `dim`. This function is identical to [`torch.median()`](torch.median#torch.median "torch.median") when there are no `NaN` values in a reduced row. When a reduced row has one or more `NaN` values, [`torch.median()`](torch.median#torch.median "torch.median") will always reduce it to `NaN`, while this function will reduce it to the median of the non-`NaN` elements. If all the elements in a reduced row are `NaN` then it will be reduced to `NaN`, too. Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. * **dim** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – the dimension to reduce. * **keepdim** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – whether the output tensor has `dim` retained or not. Keyword Arguments **out** (_(_[Tensor](../tensors#torch.Tensor "torch.Tensor") _,_[Tensor](../tensors#torch.Tensor "torch.Tensor") _)__,__optional_) – The first tensor will be populated with the median values and the second tensor, which must have dtype long, with their indices in the dimension `dim` of `input`. Example: >>> a = torch.tensor([[2, 3, 1], [float('nan'), 1, float('nan')]]) >>> a tensor([[2., 3., 1.], [nan, 1., nan]]) >>> a.median(0) torch.return_types.median(values=tensor([nan, 1., nan]), indices=tensor([1, 1, 1])) >>> a.nanmedian(0) torch.return_types.nanmedian(values=tensor([2., 1., 1.]), indices=tensor([0, 1, 0])) # torch.nanquantile `torch.nanquantile(input, q, dim=None, keepdim=False, *, out=None) → Tensor` This is a variant of [`torch.quantile()`](torch.quantile#torch.quantile "torch.quantile") that “ignores” `NaN` values, computing the quantiles `q` as if `NaN` values in `input` did not exist. If all values in a reduced row are `NaN` then the quantiles for that reduction will be `NaN`. See the documentation for [`torch.quantile()`](torch.quantile#torch.quantile "torch.quantile"). Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. * **q** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _or_[Tensor](../tensors#torch.Tensor "torch.Tensor")) – a scalar or 1D tensor of quantile values in the range [0, 1] * **dim** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – the dimension to reduce. * **keepdim** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – whether the output tensor has `dim` retained or not. Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. 
Example: >>> t = torch.tensor([float('nan'), 1, 2]) >>> t.quantile(0.5) tensor(nan) >>> t.nanquantile(0.5) tensor(1.5000) >>> t = torch.tensor([[float('nan'), float('nan')], [1, 2]]) >>> t tensor([[nan, nan], [1., 2.]]) >>> t.nanquantile(0.5, dim=0) tensor([1., 2.]) >>> t.nanquantile(0.5, dim=1) tensor([ nan, 1.5000]) # torch.nansum `torch.nansum(input, *, dtype=None) → Tensor` Returns the sum of all elements, treating Not a Numbers (NaNs) as zero. Parameters **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. Keyword Arguments **dtype** ([`torch.dtype`](../tensor_attributes#torch.torch.dtype "torch.torch.dtype"), optional) – the desired data type of returned tensor. If specified, the input tensor is casted to `dtype` before the operation is performed. This is useful for preventing data type overflows. Default: None. Example: >>> a = torch.tensor([1., 2., float('nan'), 4.]) >>> torch.nansum(a) tensor(7.) `torch.nansum(input, dim, keepdim=False, *, dtype=None) → Tensor` Returns the sum of each row of the `input` tensor in the given dimension `dim`, treating Not a Numbers (NaNs) as zero. If `dim` is a list of dimensions, reduce over all of them. If `keepdim` is `True`, the output tensor is of the same size as `input` except in the dimension(s) `dim` where it is of size 1. Otherwise, `dim` is squeezed (see [`torch.squeeze()`](torch.squeeze#torch.squeeze "torch.squeeze")), resulting in the output tensor having 1 (or `len(dim)`) fewer dimension(s). Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. * **dim** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _or_ _tuple of python:ints_) – the dimension or dimensions to reduce. * **keepdim** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – whether the output tensor has `dim` retained or not. Keyword Arguments **dtype** ([`torch.dtype`](../tensor_attributes#torch.torch.dtype "torch.torch.dtype"), optional) – the desired data type of returned tensor. If specified, the input tensor is casted to `dtype` before the operation is performed. This is useful for preventing data type overflows. Default: None. Example: >>> torch.nansum(torch.tensor([1., float("nan")])) 1.0 >>> a = torch.tensor([[1, 2], [3., float("nan")]]) >>> torch.nansum(a) tensor(6.) >>> torch.nansum(a, dim=0) tensor([4., 2.]) >>> torch.nansum(a, dim=1) tensor([3., 3.]) # torch.narrow `torch.narrow(input, dim, start, length) → Tensor` Returns a new tensor that is a narrowed version of `input` tensor. The dimension `dim` is input from `start` to `start + length`. The returned tensor and `input` tensor share the same underlying storage. Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the tensor to narrow * **dim** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – the dimension along which to narrow * **start** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – the starting dimension * **length** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – the distance to the ending dimension Example: >>> x = torch.tensor([[1, 2, 3], [4, 5, 6], [7, 8, 9]]) >>> torch.narrow(x, 0, 0, 2) tensor([[ 1, 2, 3], [ 4, 5, 6]]) >>> torch.narrow(x, 1, 1, 2) tensor([[ 2, 3], [ 5, 6], [ 8, 9]]) # torch.ne `torch.ne(input, other, *, out=None) → Tensor` Computes input≠other\text{input} \neq \text{other} element-wise. 
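A one-line illustration with a scalar `other` (not from the original examples):

>>> import torch
>>> torch.ne(torch.tensor([1, 2, 3]), 2)
tensor([ True, False,  True])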
The second argument can be a number or a tensor whose shape is [broadcastable](https://pytorch.org/docs/1.8.0/notes/broadcasting.html#broadcasting- semantics) with the first argument. Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the tensor to compare * **other** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _or_[float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)")) – the tensor or value to compare Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. Returns A boolean tensor that is True where `input` is not equal to `other` and False elsewhere Example: >>> torch.ne(torch.tensor([[1, 2], [3, 4]]), torch.tensor([[1, 1], [4, 4]])) tensor([[False, True], [True, False]]) # torch.neg `torch.neg(input, *, out=None) → Tensor` Returns a new tensor with the negative of the elements of `input`. out=−1×input\text{out} = -1 \times \text{input} Parameters **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. Example: >>> a = torch.randn(5) >>> a tensor([ 0.0090, -0.2262, -0.0682, -0.2866, 0.3940]) >>> torch.neg(a) tensor([-0.0090, 0.2262, 0.0682, 0.2866, -0.3940]) # torch.negative `torch.negative(input, *, out=None) → Tensor` Alias for [`torch.neg()`](torch.neg#torch.neg "torch.neg") # torch.nextafter `torch.nextafter(input, other, *, out=None) → Tensor` Return the next floating-point value after `input` towards `other`, elementwise. The shapes of `input` and `other` must be [broadcastable](https://pytorch.org/docs/1.8.0/notes/broadcasting.html#broadcasting- semantics). Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the first input tensor * **other** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the second input tensor Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. Example:: >>> eps = torch.finfo(torch.float32).eps >>> torch.nextafter(torch.Tensor([1, 2]), torch.Tensor([2, 1])) == torch.Tensor([eps + 1, 2 - eps]) tensor([True, True]) # AdaptiveAvgPool1d `class torch.nn.AdaptiveAvgPool1d(output_size)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/pooling.html#AdaptiveAvgPool1d) Applies a 1D adaptive average pooling over an input signal composed of several input planes. The output size is H, for any input size. The number of output features is equal to the number of input planes. Parameters **output_size** – the target output size H #### Examples >>> # target output size of 5 >>> m = nn.AdaptiveAvgPool1d(5) >>> input = torch.randn(1, 64, 8) >>> output = m(input) # AdaptiveAvgPool2d `class torch.nn.AdaptiveAvgPool2d(output_size)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/pooling.html#AdaptiveAvgPool2d) Applies a 2D adaptive average pooling over an input signal composed of several input planes. The output is of size H x W, for any input size. The number of output features is equal to the number of input planes. Parameters **output_size** – the target output size of the image of the form H x W. Can be a tuple (H, W) or a single H for a square image H x H. H and W can be either a `int`, or `None` which means the size will be the same as that of the input. 
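As an informal cross-check (a sketch, not from the original docs): when each input dimension is an integer multiple of the corresponding target output dimension, adaptive average pooling coincides with a fixed `AvgPool2d` whose kernel size and stride equal that ratio:

>>> import torch
>>> import torch.nn as nn
>>> x = torch.randn(1, 3, 8, 12)
>>> adaptive = nn.AdaptiveAvgPool2d((4, 6))
>>> fixed = nn.AvgPool2d(kernel_size=(2, 2), stride=(2, 2))  # 8/4 == 12/6 == 2
>>> torch.allclose(adaptive(x), fixed(x))
True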
#### Examples >>> # target output size of 5x7 >>> m = nn.AdaptiveAvgPool2d((5,7)) >>> input = torch.randn(1, 64, 8, 9) >>> output = m(input) >>> # target output size of 7x7 (square) >>> m = nn.AdaptiveAvgPool2d(7) >>> input = torch.randn(1, 64, 10, 9) >>> output = m(input) >>> # target output size of 10x7 >>> m = nn.AdaptiveAvgPool2d((None, 7)) >>> input = torch.randn(1, 64, 10, 9) >>> output = m(input) # AdaptiveAvgPool3d `class torch.nn.AdaptiveAvgPool3d(output_size)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/pooling.html#AdaptiveAvgPool3d) Applies a 3D adaptive average pooling over an input signal composed of several input planes. The output is of size D x H x W, for any input size. The number of output features is equal to the number of input planes. Parameters **output_size** – the target output size of the form D x H x W. Can be a tuple (D, H, W) or a single number D for a cube D x D x D. D, H and W can be either a `int`, or `None` which means the size will be the same as that of the input. #### Examples >>> # target output size of 5x7x9 >>> m = nn.AdaptiveAvgPool3d((5,7,9)) >>> input = torch.randn(1, 64, 8, 9, 10) >>> output = m(input) >>> # target output size of 7x7x7 (cube) >>> m = nn.AdaptiveAvgPool3d(7) >>> input = torch.randn(1, 64, 10, 9, 8) >>> output = m(input) >>> # target output size of 7x9x8 >>> m = nn.AdaptiveAvgPool3d((7, None, None)) >>> input = torch.randn(1, 64, 10, 9, 8) >>> output = m(input) # AdaptiveLogSoftmaxWithLoss `class torch.nn.AdaptiveLogSoftmaxWithLoss(in_features, n_classes, cutoffs, div_value=4.0, head_bias=False)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/adaptive.html#AdaptiveLogSoftmaxWithLoss) Efficient softmax approximation as described in [Efficient softmax approximation for GPUs by Edouard Grave, Armand Joulin, Moustapha Cissé, David Grangier, and Hervé Jégou](https://arxiv.org/abs/1609.04309). Adaptive softmax is an approximate strategy for training models with large output spaces. It is most effective when the label distribution is highly imbalanced, for example in natural language modelling, where the word frequency distribution approximately follows the [Zipf’s law](https://en.wikipedia.org/wiki/Zipf%27s_law). Adaptive softmax partitions the labels into several clusters, according to their frequency. These clusters may contain different number of targets each. Additionally, clusters containing less frequent labels assign lower dimensional embeddings to those labels, which speeds up the computation. For each minibatch, only clusters for which at least one target is present are evaluated. The idea is that the clusters which are accessed frequently (like the first one, containing most frequent labels), should also be cheap to compute – that is, contain a small number of assigned labels. We highly recommend taking a look at the original paper for more details. * `cutoffs` should be an ordered Sequence of integers sorted in the increasing order. It controls number of clusters and the partitioning of targets into clusters. For example setting `cutoffs = [10, 100, 1000]` means that first `10` targets will be assigned to the ‘head’ of the adaptive softmax, targets `11, 12, …, 100` will be assigned to the first cluster, and targets `101, 102, …, 1000` will be assigned to the second cluster, while targets `1001, 1002, …, n_classes - 1` will be assigned to the last, third cluster. 
* `div_value` is used to compute the size of each additional cluster, which is given as ⌊in_featuresdiv_valueidx⌋\left\lfloor\frac{\texttt{in\\_features}}{\texttt{div\\_value}^{idx}}\right\rfloor , where idxidx is the cluster index (with clusters for less frequent words having larger indices, and indices starting from 11 ). * `head_bias` if set to True, adds a bias term to the ‘head’ of the adaptive softmax. See paper for details. Set to False in the official implementation. Warning Labels passed as inputs to this module should be sorted according to their frequency. This means that the most frequent label should be represented by the index `0`, and the least frequent label should be represented by the index `n_classes - 1`. Note This module returns a `NamedTuple` with `output` and `loss` fields. See further documentation for details. Note To compute log-probabilities for all classes, the `log_prob` method can be used. Parameters * **in_features** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – Number of features in the input tensor * **n_classes** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – Number of classes in the dataset * **cutoffs** (_Sequence_) – Cutoffs used to assign targets to their buckets * **div_value** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _,__optional_) – value used as an exponent to compute sizes of the clusters. Default: 4.0 * **head_bias** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – If `True`, adds a bias term to the ‘head’ of the adaptive softmax. Default: `False` Returns * **output** is a Tensor of size `N` containing computed target log probabilities for each example * **loss** is a Scalar representing the computed negative log likelihood loss Return type `NamedTuple` with `output` and `loss` fields Shape: * input: (N,in_features)(N, \texttt{in\\_features}) * target: (N)(N) where each value satisfies 0<=target[i]<=n_classes0 <= \texttt{target[i]} <= \texttt{n\\_classes} * output1: (N)(N) * output2: `Scalar` `log_prob(input)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/adaptive.html#AdaptiveLogSoftmaxWithLoss.log_prob) Computes log probabilities for all n_classes\texttt{n\\_classes} Parameters **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – a minibatch of examples Returns log-probabilities for each class cc in range 0<=c<=n_classes0 <= c <= \texttt{n\\_classes} , where n_classes\texttt{n\\_classes} is a parameter passed to `AdaptiveLogSoftmaxWithLoss` constructor. Shape: * Input: (N,in_features)(N, \texttt{in\\_features}) * Output: (N,n_classes)(N, \texttt{n\\_classes}) `predict(input)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/adaptive.html#AdaptiveLogSoftmaxWithLoss.predict) This is equivalent to `self.log_prob(input).argmax(dim=1)`, but is more efficient in some cases.
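A quick, informal check of this equivalence (the layer sizes below are arbitrary and chosen only for illustration):

>>> import torch
>>> import torch.nn as nn
>>> asm = nn.AdaptiveLogSoftmaxWithLoss(in_features=64, n_classes=1000, cutoffs=[10, 100, 500])
>>> hidden = torch.randn(128, 64)
>>> out, loss = asm(hidden, torch.randint(0, 1000, (128,)))  # forward returns a NamedTuple (output, loss)
>>> torch.equal(asm.predict(hidden), asm.log_prob(hidden).argmax(dim=1))
True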
Parameters **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – a minibatch of examples Returns a class with the highest probability for each example Return type output ([Tensor](../tensors#torch.Tensor "torch.Tensor")) Shape: * Input: (N,in_features)(N, \texttt{in\\_features}) * Output: (N)(N) # AdaptiveMaxPool1d `class torch.nn.AdaptiveMaxPool1d(output_size, return_indices=False)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/pooling.html#AdaptiveMaxPool1d) Applies a 1D adaptive max pooling over an input signal composed of several input planes. The output size is H, for any input size. The number of output features is equal to the number of input planes. Parameters * **output_size** – the target output size H * **return_indices** – if `True`, will return the indices along with the outputs. Useful to pass to nn.MaxUnpool1d. Default: `False` #### Examples >>> # target output size of 5 >>> m = nn.AdaptiveMaxPool1d(5) >>> input = torch.randn(1, 64, 8) >>> output = m(input) # AdaptiveMaxPool2d `class torch.nn.AdaptiveMaxPool2d(output_size, return_indices=False)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/pooling.html#AdaptiveMaxPool2d) Applies a 2D adaptive max pooling over an input signal composed of several input planes. The output is of size H x W, for any input size. The number of output features is equal to the number of input planes. Parameters * **output_size** – the target output size of the image of the form H x W. Can be a tuple (H, W) or a single H for a square image H x H. H and W can be either a `int`, or `None` which means the size will be the same as that of the input. * **return_indices** – if `True`, will return the indices along with the outputs. Useful to pass to nn.MaxUnpool2d. Default: `False` #### Examples >>> # target output size of 5x7 >>> m = nn.AdaptiveMaxPool2d((5,7)) >>> input = torch.randn(1, 64, 8, 9) >>> output = m(input) >>> # target output size of 7x7 (square) >>> m = nn.AdaptiveMaxPool2d(7) >>> input = torch.randn(1, 64, 10, 9) >>> output = m(input) >>> # target output size of 10x7 >>> m = nn.AdaptiveMaxPool2d((None, 7)) >>> input = torch.randn(1, 64, 10, 9) >>> output = m(input) # AdaptiveMaxPool3d `class torch.nn.AdaptiveMaxPool3d(output_size, return_indices=False)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/pooling.html#AdaptiveMaxPool3d) Applies a 3D adaptive max pooling over an input signal composed of several input planes. The output is of size D x H x W, for any input size. The number of output features is equal to the number of input planes. Parameters * **output_size** – the target output size of the image of the form D x H x W. Can be a tuple (D, H, W) or a single D for a cube D x D x D. D, H and W can be either a `int`, or `None` which means the size will be the same as that of the input. * **return_indices** – if `True`, will return the indices along with the outputs. Useful to pass to nn.MaxUnpool3d. 
Default: `False` #### Examples >>> # target output size of 5x7x9 >>> m = nn.AdaptiveMaxPool3d((5,7,9)) >>> input = torch.randn(1, 64, 8, 9, 10) >>> output = m(input) >>> # target output size of 7x7x7 (cube) >>> m = nn.AdaptiveMaxPool3d(7) >>> input = torch.randn(1, 64, 10, 9, 8) >>> output = m(input) >>> # target output size of 7x9x8 >>> m = nn.AdaptiveMaxPool3d((7, None, None)) >>> input = torch.randn(1, 64, 10, 9, 8) >>> output = m(input) # AlphaDropout `class torch.nn.AlphaDropout(p=0.5, inplace=False)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/dropout.html#AlphaDropout) Applies Alpha Dropout over the input. Alpha Dropout is a type of Dropout that maintains the self-normalizing property. For an input with zero mean and unit standard deviation, the output of Alpha Dropout maintains the original mean and standard deviation of the input. Alpha Dropout goes hand-in-hand with SELU activation function, which ensures that the outputs have zero mean and unit standard deviation. During training, it randomly masks some of the elements of the input tensor with probability _p_ using samples from a bernoulli distribution. The elements to masked are randomized on every forward call, and scaled and shifted to maintain zero mean and unit standard deviation. During evaluation the module simply computes an identity function. More details can be found in the paper [Self-Normalizing Neural Networks](https://arxiv.org/abs/1706.02515) . Parameters * **p** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)")) – probability of an element to be dropped. Default: 0.5 * **inplace** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – If set to `True`, will do this operation in-place Shape: * Input: (∗)(*) . Input can be of any shape * Output: (∗)(*) . Output is of the same shape as input Examples: >>> m = nn.AlphaDropout(p=0.2) >>> input = torch.randn(20, 16) >>> output = m(input) # AvgPool1d `class torch.nn.AvgPool1d(kernel_size, stride=None, padding=0, ceil_mode=False, count_include_pad=True)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/pooling.html#AvgPool1d) Applies a 1D average pooling over an input signal composed of several input planes. In the simplest case, the output value of the layer with input size (N,C,L)(N, C, L) , output (N,C,Lout)(N, C, L_{out}) and `kernel_size` kk can be precisely described as: out(Ni,Cj,l)=1k∑m=0k−1input(Ni,Cj,stride×l+m)\text{out}(N_i, C_j, l) = \frac{1}{k} \sum_{m=0}^{k-1} \text{input}(N_i, C_j, \text{stride} \times l + m) If `padding` is non-zero, then the input is implicitly zero-padded on both sides for `padding` number of points. Note When ceil_mode=True, sliding windows are allowed to go off-bounds if they start within the left padding or the input. Sliding windows that would start in the right padded region are ignored. The parameters `kernel_size`, `stride`, `padding` can each be an `int` or a one-element tuple. Parameters * **kernel_size** – the size of the window * **stride** – the stride of the window. 
Default value is `kernel_size` * **padding** – implicit zero padding to be added on both sides * **ceil_mode** – when True, will use `ceil` instead of `floor` to compute the output shape * **count_include_pad** – when True, will include the zero-padding in the averaging calculation Shape: * Input: (N,C,Lin)(N, C, L_{in}) * Output: (N,C,Lout)(N, C, L_{out}) , where Lout=⌊Lin+2×padding−kernel_sizestride+1⌋L_{out} = \left\lfloor \frac{L_{in} + 2 \times \text{padding} - \text{kernel\\_size}}{\text{stride}} + 1\right\rfloor Examples: >>> # pool with window of size=3, stride=2 >>> m = nn.AvgPool1d(3, stride=2) >>> m(torch.tensor([[[1.,2,3,4,5,6,7]]])) tensor([[[ 2., 4., 6.]]]) # AvgPool2d `class torch.nn.AvgPool2d(kernel_size, stride=None, padding=0, ceil_mode=False, count_include_pad=True, divisor_override=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/pooling.html#AvgPool2d) Applies a 2D average pooling over an input signal composed of several input planes. In the simplest case, the output value of the layer with input size (N,C,H,W)(N, C, H, W) , output (N,C,Hout,Wout)(N, C, H_{out}, W_{out}) and `kernel_size` (kH,kW)(kH, kW) can be precisely described as: out(Ni,Cj,h,w)=1kH∗kW∑m=0kH−1∑n=0kW−1input(Ni,Cj,stride[0]×h+m,stride[1]×w+n)out(N_i, C_j, h, w) = \frac{1}{kH * kW} \sum_{m=0}^{kH-1} \sum_{n=0}^{kW-1} input(N_i, C_j, stride[0] \times h + m, stride[1] \times w + n) If `padding` is non-zero, then the input is implicitly zero-padded on both sides for `padding` number of points. Note When ceil_mode=True, sliding windows are allowed to go off-bounds if they start within the left padding or the input. Sliding windows that would start in the right padded region are ignored. The parameters `kernel_size`, `stride`, `padding` can either be: * a single `int` – in which case the same value is used for the height and width dimension * a `tuple` of two ints – in which case, the first `int` is used for the height dimension, and the second `int` for the width dimension Parameters * **kernel_size** – the size of the window * **stride** – the stride of the window. Default value is `kernel_size` * **padding** – implicit zero padding to be added on both sides * **ceil_mode** – when True, will use `ceil` instead of `floor` to compute the output shape * **count_include_pad** – when True, will include the zero-padding in the averaging calculation * **divisor_override** – if specified, it will be used as divisor, otherwise `kernel_size` will be used Shape: * Input: (N,C,Hin,Win)(N, C, H_{in}, W_{in}) * Output: (N,C,Hout,Wout)(N, C, H_{out}, W_{out}) , where Hout=⌊Hin+2×padding[0]−kernel_size[0]stride[0]+1⌋H_{out} = \left\lfloor\frac{H_{in} + 2 \times \text{padding}[0] - \text{kernel\\_size}[0]}{\text{stride}[0]} + 1\right\rfloor Wout=⌊Win+2×padding[1]−kernel_size[1]stride[1]+1⌋W_{out} = \left\lfloor\frac{W_{in} + 2 \times \text{padding}[1] - \text{kernel\\_size}[1]}{\text{stride}[1]} + 1\right\rfloor Examples: >>> # pool of square window of size=3, stride=2 >>> m = nn.AvgPool2d(3, stride=2) >>> # pool of non-square window >>> m = nn.AvgPool2d((3, 2), stride=(2, 1)) >>> input = torch.randn(20, 16, 50, 32) >>> output = m(input) # AvgPool3d `class torch.nn.AvgPool3d(kernel_size, stride=None, padding=0, ceil_mode=False, count_include_pad=True, divisor_override=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/pooling.html#AvgPool3d) Applies a 3D average pooling over an input signal composed of several input planes. 
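Both the 2D and 3D average-pooling layers accept `count_include_pad` and `divisor_override`; their effect on the divisor can be seen on a tiny all-ones input (an illustrative sketch using `AvgPool2d`, not from the original examples). With `padding=1`, the top-left window below covers three padded zeros and a single one:

>>> import torch
>>> import torch.nn as nn
>>> x = torch.ones(1, 1, 2, 2)
>>> nn.AvgPool2d(2, stride=2, padding=1, count_include_pad=True)(x)[0, 0, 0, 0]   # sum 1 / 4 window cells
tensor(0.2500)
>>> nn.AvgPool2d(2, stride=2, padding=1, count_include_pad=False)(x)[0, 0, 0, 0]  # sum 1 / 1 non-padded cell
tensor(1.)
>>> nn.AvgPool2d(2, stride=2, padding=1, divisor_override=2)(x)[0, 0, 0, 0]       # sum 1 / fixed divisor 2
tensor(0.5000)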
In the simplest case, the output value of the layer with input size (N,C,D,H,W)(N, C, D, H, W) , output (N,C,Dout,Hout,Wout)(N, C, D_{out}, H_{out}, W_{out}) and `kernel_size` (kD,kH,kW)(kD, kH, kW) can be precisely described as: out(Ni,Cj,d,h,w)=∑k=0kD−1∑m=0kH−1∑n=0kW−1input(Ni,Cj,stride[0]×d+k,stride[1]×h+m,stride[2]×w+n)kD×kH×kW\begin{aligned} \text{out}(N_i, C_j, d, h, w) ={} & \sum_{k=0}^{kD-1} \sum_{m=0}^{kH-1} \sum_{n=0}^{kW-1} \\\ & \frac{\text{input}(N_i, C_j, \text{stride}[0] \times d + k, \text{stride}[1] \times h + m, \text{stride}[2] \times w + n)} {kD \times kH \times kW} \end{aligned} If `padding` is non-zero, then the input is implicitly zero-padded on all three sides for `padding` number of points. Note When ceil_mode=True, sliding windows are allowed to go off-bounds if they start within the left padding or the input. Sliding windows that would start in the right padded region are ignored. The parameters `kernel_size`, `stride` can either be: * a single `int` – in which case the same value is used for the depth, height and width dimension * a `tuple` of three ints – in which case, the first `int` is used for the depth dimension, the second `int` for the height dimension and the third `int` for the width dimension Parameters * **kernel_size** – the size of the window * **stride** – the stride of the window. Default value is `kernel_size` * **padding** – implicit zero padding to be added on all three sides * **ceil_mode** – when True, will use `ceil` instead of `floor` to compute the output shape * **count_include_pad** – when True, will include the zero-padding in the averaging calculation * **divisor_override** – if specified, it will be used as divisor, otherwise `kernel_size` will be used Shape: * Input: (N,C,Din,Hin,Win)(N, C, D_{in}, H_{in}, W_{in}) * Output: (N,C,Dout,Hout,Wout)(N, C, D_{out}, H_{out}, W_{out}) , where Dout=⌊Din+2×padding[0]−kernel_size[0]stride[0]+1⌋D_{out} = \left\lfloor\frac{D_{in} + 2 \times \text{padding}[0] - \text{kernel\\_size}[0]}{\text{stride}[0]} + 1\right\rfloor Hout=⌊Hin+2×padding[1]−kernel_size[1]stride[1]+1⌋H_{out} = \left\lfloor\frac{H_{in} + 2 \times \text{padding}[1] - \text{kernel\\_size}[1]}{\text{stride}[1]} + 1\right\rfloor Wout=⌊Win+2×padding[2]−kernel_size[2]stride[2]+1⌋W_{out} = \left\lfloor\frac{W_{in} + 2 \times \text{padding}[2] - \text{kernel\\_size}[2]}{\text{stride}[2]} + 1\right\rfloor Examples: >>> # pool of square window of size=3, stride=2 >>> m = nn.AvgPool3d(3, stride=2) >>> # pool of non-square window >>> m = nn.AvgPool3d((3, 2, 2), stride=(2, 1, 2)) >>> input = torch.randn(20, 16, 50,44, 31) >>> output = m(input) # BatchNorm1d `class torch.nn.BatchNorm1d(num_features, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/batchnorm.html#BatchNorm1d) Applies Batch Normalization over a 2D or 3D input (a mini-batch of 1D inputs with optional additional channel dimension) as described in the paper [Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift](https://arxiv.org/abs/1502.03167) . y=x−E[x]Var[x]+ϵ∗γ+βy = \frac{x - \mathrm{E}[x]}{\sqrt{\mathrm{Var}[x] + \epsilon}} * \gamma + \beta The mean and standard-deviation are calculated per-dimension over the mini- batches and γ\gamma and β\beta are learnable parameter vectors of size `C` (where `C` is the input size). By default, the elements of γ\gamma are set to 1 and the elements of β\beta are set to 0. 
The standard-deviation is calculated via the biased estimator, equivalent to `torch.var(input, unbiased=False)`. Also by default, during training this layer keeps running estimates of its computed mean and variance, which are then used for normalization during evaluation. The running estimates are kept with a default `momentum` of 0.1. If `track_running_stats` is set to `False`, this layer then does not keep running estimates, and batch statistics are instead used during evaluation time as well. Note This `momentum` argument is different from one used in optimizer classes and the conventional notion of momentum. Mathematically, the update rule for running statistics here is x^new=(1−momentum)×x^+momentum×xt\hat{x}_\text{new} = (1 - \text{momentum}) \times \hat{x} + \text{momentum} \times x_t , where x^\hat{x} is the estimated statistic and xtx_t is the new observed value. Because the Batch Normalization is done over the `C` dimension, computing statistics on `(N, L)` slices, it’s common terminology to call this Temporal Batch Normalization. Parameters * **num_features** – CC from an expected input of size (N,C,L)(N, C, L) or LL from input of size (N,L)(N, L) * **eps** – a value added to the denominator for numerical stability. Default: 1e-5 * **momentum** – the value used for the running_mean and running_var computation. Can be set to `None` for cumulative moving average (i.e. simple average). Default: 0.1 * **affine** – a boolean value that when set to `True`, this module has learnable affine parameters. Default: `True` * **track_running_stats** – a boolean value that when set to `True`, this module tracks the running mean and variance, and when set to `False`, this module does not track such statistics, and initializes statistics buffers `running_mean` and `running_var` as `None`. When these buffers are `None`, this module always uses batch statistics. in both training and eval modes. Default: `True` Shape: * Input: (N,C)(N, C) or (N,C,L)(N, C, L) * Output: (N,C)(N, C) or (N,C,L)(N, C, L) (same shape as input) Examples: >>> # With Learnable Parameters >>> m = nn.BatchNorm1d(100) >>> # Without Learnable Parameters >>> m = nn.BatchNorm1d(100, affine=False) >>> input = torch.randn(20, 100) >>> output = m(input) # BatchNorm2d `class torch.nn.BatchNorm2d(num_features, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/batchnorm.html#BatchNorm2d) Applies Batch Normalization over a 4D input (a mini-batch of 2D inputs with additional channel dimension) as described in the paper [Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift](https://arxiv.org/abs/1502.03167) . y=x−E[x]Var[x]+ϵ∗γ+βy = \frac{x - \mathrm{E}[x]}{ \sqrt{\mathrm{Var}[x] + \epsilon}} * \gamma + \beta The mean and standard-deviation are calculated per-dimension over the mini- batches and γ\gamma and β\beta are learnable parameter vectors of size `C` (where `C` is the input size). By default, the elements of γ\gamma are set to 1 and the elements of β\beta are set to 0. The standard-deviation is calculated via the biased estimator, equivalent to `torch.var(input, unbiased=False)`. Also by default, during training this layer keeps running estimates of its computed mean and variance, which are then used for normalization during evaluation. The running estimates are kept with a default `momentum` of 0.1. 
If `track_running_stats` is set to `False`, this layer then does not keep running estimates, and batch statistics are instead used during evaluation time as well. Note This `momentum` argument is different from one used in optimizer classes and the conventional notion of momentum. Mathematically, the update rule for running statistics here is x^new=(1−momentum)×x^+momentum×xt\hat{x}_\text{new} = (1 - \text{momentum}) \times \hat{x} + \text{momentum} \times x_t , where x^\hat{x} is the estimated statistic and xtx_t is the new observed value. Because the Batch Normalization is done over the `C` dimension, computing statistics on `(N, H, W)` slices, it’s common terminology to call this Spatial Batch Normalization. Parameters * **num_features** – CC from an expected input of size (N,C,H,W)(N, C, H, W) * **eps** – a value added to the denominator for numerical stability. Default: 1e-5 * **momentum** – the value used for the running_mean and running_var computation. Can be set to `None` for cumulative moving average (i.e. simple average). Default: 0.1 * **affine** – a boolean value that when set to `True`, this module has learnable affine parameters. Default: `True` * **track_running_stats** – a boolean value that when set to `True`, this module tracks the running mean and variance, and when set to `False`, this module does not track such statistics, and initializes statistics buffers `running_mean` and `running_var` as `None`. When these buffers are `None`, this module always uses batch statistics. in both training and eval modes. Default: `True` Shape: * Input: (N,C,H,W)(N, C, H, W) * Output: (N,C,H,W)(N, C, H, W) (same shape as input) Examples: >>> # With Learnable Parameters >>> m = nn.BatchNorm2d(100) >>> # Without Learnable Parameters >>> m = nn.BatchNorm2d(100, affine=False) >>> input = torch.randn(20, 100, 35, 45) >>> output = m(input) # BatchNorm3d `class torch.nn.BatchNorm3d(num_features, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/batchnorm.html#BatchNorm3d) Applies Batch Normalization over a 5D input (a mini-batch of 3D inputs with additional channel dimension) as described in the paper [Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift](https://arxiv.org/abs/1502.03167) . y=x−E[x]Var[x]+ϵ∗γ+βy = \frac{x - \mathrm{E}[x]}{ \sqrt{\mathrm{Var}[x] + \epsilon}} * \gamma + \beta The mean and standard-deviation are calculated per-dimension over the mini- batches and γ\gamma and β\beta are learnable parameter vectors of size `C` (where `C` is the input size). By default, the elements of γ\gamma are set to 1 and the elements of β\beta are set to 0. The standard-deviation is calculated via the biased estimator, equivalent to `torch.var(input, unbiased=False)`. Also by default, during training this layer keeps running estimates of its computed mean and variance, which are then used for normalization during evaluation. The running estimates are kept with a default `momentum` of 0.1. If `track_running_stats` is set to `False`, this layer then does not keep running estimates, and batch statistics are instead used during evaluation time as well. Note This `momentum` argument is different from one used in optimizer classes and the conventional notion of momentum. 
Mathematically, the update rule for running statistics here is x^new=(1−momentum)×x^+momentum×xt\hat{x}_\text{new} = (1 - \text{momentum}) \times \hat{x} + \text{momentum} \times x_t , where x^\hat{x} is the estimated statistic and xtx_t is the new observed value. Because the Batch Normalization is done over the `C` dimension, computing statistics on `(N, D, H, W)` slices, it’s common terminology to call this Volumetric Batch Normalization or Spatio-temporal Batch Normalization. Parameters * **num_features** – CC from an expected input of size (N,C,D,H,W)(N, C, D, H, W) * **eps** – a value added to the denominator for numerical stability. Default: 1e-5 * **momentum** – the value used for the running_mean and running_var computation. Can be set to `None` for cumulative moving average (i.e. simple average). Default: 0.1 * **affine** – a boolean value that when set to `True`, this module has learnable affine parameters. Default: `True` * **track_running_stats** – a boolean value that when set to `True`, this module tracks the running mean and variance, and when set to `False`, this module does not track such statistics, and initializes statistics buffers `running_mean` and `running_var` as `None`. When these buffers are `None`, this module always uses batch statistics. in both training and eval modes. Default: `True` Shape: * Input: (N,C,D,H,W)(N, C, D, H, W) * Output: (N,C,D,H,W)(N, C, D, H, W) (same shape as input) Examples: >>> # With Learnable Parameters >>> m = nn.BatchNorm3d(100) >>> # Without Learnable Parameters >>> m = nn.BatchNorm3d(100, affine=False) >>> input = torch.randn(20, 100, 35, 45, 10) >>> output = m(input) # BCELoss `class torch.nn.BCELoss(weight=None, size_average=None, reduce=None, reduction='mean')` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/loss.html#BCELoss) Creates a criterion that measures the Binary Cross Entropy between the target and the output: The unreduced (i.e. with `reduction` set to `'none'`) loss can be described as: ℓ(x,y)=L={l1,…,lN}⊤,ln=−wn[yn⋅log⁡xn+(1−yn)⋅log⁡(1−xn)],\ell(x, y) = L = \\{l_1,\dots,l_N\\}^\top, \quad l_n = - w_n \left[ y_n \cdot \log x_n + (1 - y_n) \cdot \log (1 - x_n) \right], where NN is the batch size. If `reduction` is not `'none'` (default `'mean'`), then ℓ(x,y)={mean⁡(L),if reduction=‘mean’;sum⁡(L),if reduction=‘sum’.\ell(x, y) = \begin{cases} \operatorname{mean}(L), & \text{if reduction} = \text{`mean';}\\\ \operatorname{sum}(L), & \text{if reduction} = \text{`sum'.} \end{cases} This is used for measuring the error of a reconstruction in for example an auto-encoder. Note that the targets yy should be numbers between 0 and 1. Notice that if xnx_n is either 0 or 1, one of the log terms would be mathematically undefined in the above loss equation. PyTorch chooses to set log⁡(0)=−∞\log (0) = -\infty , since lim⁡x→0log⁡(x)=−∞\lim_{x\to 0} \log (x) = -\infty . However, an infinite term in the loss equation is not desirable for several reasons. For one, if either yn=0y_n = 0 or (1−yn)=0(1 - y_n) = 0 , then we would be multiplying 0 with infinity. Secondly, if we have an infinite loss value, then we would also have an infinite term in our gradient, since lim⁡x→0ddxlog⁡(x)=∞\lim_{x\to 0} \frac{d}{dx} \log (x) = \infty . This would make BCELoss’s backward method nonlinear with respect to xnx_n , and using it for things like linear regression would not be straight-forward. Our solution is that BCELoss clamps its log function outputs to be greater than or equal to -100. 
This way, we can always have a finite loss value and a linear backward method. Parameters * **weight** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – a manual rescaling weight given to the loss of each batch element. If given, has to be a Tensor of size `nbatch`. * **size_average** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – Deprecated (see `reduction`). By default, the losses are averaged over each loss element in the batch. Note that for some losses, there are multiple elements per sample. If the field `size_average` is set to `False`, the losses are instead summed for each minibatch. Ignored when `reduce` is `False`. Default: `True` * **reduce** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – Deprecated (see `reduction`). By default, the losses are averaged or summed over observations for each minibatch depending on `size_average`. When `reduce` is `False`, returns a loss per batch element instead and ignores `size_average`. Default: `True` * **reduction** (_string_ _,__optional_) – Specifies the reduction to apply to the output: `'none'` | `'mean'` | `'sum'`. `'none'`: no reduction will be applied, `'mean'`: the sum of the output will be divided by the number of elements in the output, `'sum'`: the output will be summed. Note: `size_average` and `reduce` are in the process of being deprecated, and in the meantime, specifying either of those two args will override `reduction`. Default: `'mean'` Shape: * Input: (N,∗)(N, *) where ∗* means, any number of additional dimensions * Target: (N,∗)(N, *) , same shape as the input * Output: scalar. If `reduction` is `'none'`, then (N,∗)(N, *) , same shape as input. Examples: >>> m = nn.Sigmoid() >>> loss = nn.BCELoss() >>> input = torch.randn(3, requires_grad=True) >>> target = torch.empty(3).random_(2) >>> output = loss(m(input), target) >>> output.backward() # BCEWithLogitsLoss `class torch.nn.BCEWithLogitsLoss(weight=None, size_average=None, reduce=None, reduction='mean', pos_weight=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/loss.html#BCEWithLogitsLoss) This loss combines a `Sigmoid` layer and the `BCELoss` in one single class. This version is more numerically stable than using a plain `Sigmoid` followed by a `BCELoss` as, by combining the operations into one layer, we take advantage of the log-sum-exp trick for numerical stability. The unreduced (i.e. with `reduction` set to `'none'`) loss can be described as: ℓ(x,y)=L={l1,…,lN}⊤,ln=−wn[yn⋅log⁡σ(xn)+(1−yn)⋅log⁡(1−σ(xn))],\ell(x, y) = L = \\{l_1,\dots,l_N\\}^\top, \quad l_n = - w_n \left[ y_n \cdot \log \sigma(x_n) + (1 - y_n) \cdot \log (1 - \sigma(x_n)) \right], where NN is the batch size. If `reduction` is not `'none'` (default `'mean'`), then ℓ(x,y)={mean⁡(L),if reduction=‘mean’;sum⁡(L),if reduction=‘sum’.\ell(x, y) = \begin{cases} \operatorname{mean}(L), & \text{if reduction} = \text{`mean';}\\\ \operatorname{sum}(L), & \text{if reduction} = \text{`sum'.} \end{cases} This is used for measuring the error of a reconstruction in for example an auto-encoder. Note that the targets `t[i]` should be numbers between 0 and 1. It’s possible to trade off recall and precision by adding weights to positive examples. 
In the case of multi-label classification the loss can be described as: ℓc(x,y)=Lc={l1,c,…,lN,c}⊤,ln,c=−wn,c[pcyn,c⋅log⁡σ(xn,c)+(1−yn,c)⋅log⁡(1−σ(xn,c))],\ell_c(x, y) = L_c = \\{l_{1,c},\dots,l_{N,c}\\}^\top, \quad l_{n,c} = - w_{n,c} \left[ p_c y_{n,c} \cdot \log \sigma(x_{n,c}) + (1 - y_{n,c}) \cdot \log (1 - \sigma(x_{n,c})) \right], where cc is the class number (c>1c > 1 for multi-label binary classification, c=1c = 1 for single-label binary classification), nn is the number of the sample in the batch and pcp_c is the weight of the positive answer for the class cc . pc>1p_c > 1 increases the recall, pc<1p_c < 1 increases the precision. For example, if a dataset contains 100 positive and 300 negative examples of a single class, then `pos_weight` for the class should be equal to 300100=3\frac{300}{100}=3 . The loss would act as if the dataset contains 3×100=3003\times 100=300 positive examples. Examples: >>> target = torch.ones([10, 64], dtype=torch.float32) # 64 classes, batch size = 10 >>> output = torch.full([10, 64], 1.5) # A prediction (logit) >>> pos_weight = torch.ones([64]) # All weights are equal to 1 >>> criterion = torch.nn.BCEWithLogitsLoss(pos_weight=pos_weight) >>> criterion(output, target) # -log(sigmoid(1.5)) tensor(0.2014) Parameters * **weight** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – a manual rescaling weight given to the loss of each batch element. If given, has to be a Tensor of size `nbatch`. * **size_average** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – Deprecated (see `reduction`). By default, the losses are averaged over each loss element in the batch. Note that for some losses, there are multiple elements per sample. If the field `size_average` is set to `False`, the losses are instead summed for each minibatch. Ignored when `reduce` is `False`. Default: `True` * **reduce** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – Deprecated (see `reduction`). By default, the losses are averaged or summed over observations for each minibatch depending on `size_average`. When `reduce` is `False`, returns a loss per batch element instead and ignores `size_average`. Default: `True` * **reduction** (_string_ _,__optional_) – Specifies the reduction to apply to the output: `'none'` | `'mean'` | `'sum'`. `'none'`: no reduction will be applied, `'mean'`: the sum of the output will be divided by the number of elements in the output, `'sum'`: the output will be summed. Note: `size_average` and `reduce` are in the process of being deprecated, and in the meantime, specifying either of those two args will override `reduction`. Default: `'mean'` * **pos_weight** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – a weight of positive examples. Must be a vector with length equal to the number of classes. Shape: * Input: (N,∗)(N, *) where ∗* means, any number of additional dimensions * Target: (N,∗)(N, *) , same shape as the input * Output: scalar. If `reduction` is `'none'`, then (N,∗)(N, *) , same shape as input. 
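Following the rule above, `pos_weight` is typically set to the per-class ratio of negative to positive examples; a small sketch with made-up counts:

>>> import torch
>>> import torch.nn as nn
>>> num_pos = torch.tensor([100., 200.,  50.])  # hypothetical count of positive targets per class
>>> num_neg = torch.tensor([300., 100., 950.])  # hypothetical count of negative targets per class
>>> pos_weight = num_neg / num_pos              # e.g. 300 / 100 = 3 for the first class
>>> criterion = nn.BCEWithLogitsLoss(pos_weight=pos_weight)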
Examples: >>> loss = nn.BCEWithLogitsLoss() >>> input = torch.randn(3, requires_grad=True) >>> target = torch.empty(3).random_(2) >>> output = loss(input, target) >>> output.backward() # Bilinear `class torch.nn.Bilinear(in1_features, in2_features, out_features, bias=True)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/linear.html#Bilinear) Applies a bilinear transformation to the incoming data: y=x1TAx2+by = x_1^T A x_2 + b Parameters * **in1_features** – size of each first input sample * **in2_features** – size of each second input sample * **out_features** – size of each output sample * **bias** – If set to False, the layer will not learn an additive bias. Default: `True` Shape: * Input1: (N,∗,Hin1)(N, *, H_{in1}) where Hin1=in1_featuresH_{in1}=\text{in1\\_features} and ∗* means any number of additional dimensions. All but the last dimension of the inputs should be the same. * Input2: (N,∗,Hin2)(N, *, H_{in2}) where Hin2=in2_featuresH_{in2}=\text{in2\\_features} . * Output: (N,∗,Hout)(N, *, H_{out}) where Hout=out_featuresH_{out}=\text{out\\_features} and all but the last dimension are the same shape as the input. Variables * **~Bilinear.weight** – the learnable weights of the module of shape (out_features,in1_features,in2_features)(\text{out\\_features}, \text{in1\\_features}, \text{in2\\_features}) . The values are initialized from U(−k,k)\mathcal{U}(-\sqrt{k}, \sqrt{k}) , where k=1in1_featuresk = \frac{1}{\text{in1\\_features}} * **~Bilinear.bias** – the learnable bias of the module of shape (out_features)(\text{out\\_features}) . If `bias` is `True`, the values are initialized from U(−k,k)\mathcal{U}(-\sqrt{k}, \sqrt{k}) , where k=1in1_featuresk = \frac{1}{\text{in1\\_features}} Examples: >>> m = nn.Bilinear(20, 30, 40) >>> input1 = torch.randn(128, 20) >>> input2 = torch.randn(128, 30) >>> output = m(input1, input2) >>> print(output.size()) torch.Size([128, 40]) # CELU `class torch.nn.CELU(alpha=1.0, inplace=False)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/activation.html#CELU) Applies the element-wise function: CELU(x)=max⁡(0,x)+min⁡(0,α∗(exp⁡(x/α)−1))\text{CELU}(x) = \max(0,x) + \min(0, \alpha * (\exp(x/\alpha) - 1)) More details can be found in the paper [Continuously Differentiable Exponential Linear Units](https://arxiv.org/abs/1704.07483) . Parameters * **alpha** – the α\alpha value for the CELU formulation. Default: 1.0 * **inplace** – can optionally do the operation in-place. Default: `False` Shape: * Input: (N,∗)(N, *) where `*` means, any number of additional dimensions * Output: (N,∗)(N, *) , same shape as the input Examples: >>> m = nn.CELU() >>> input = torch.randn(2) >>> output = m(input) # ChannelShuffle `class torch.nn.ChannelShuffle(groups)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/channelshuffle.html#ChannelShuffle) Divide the channels in a tensor of shape (∗,C,H,W)(*, C , H, W) into g groups and rearrange them as (∗,Cg,g,H,W)(*, C \frac g, g, H, W) , while keeping the original tensor shape. Parameters **groups** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – number of groups to divide channels in. 
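The rearrangement can equivalently be expressed as a reshape of the channel dimension into `(groups, C // groups)`, a transpose of those two axes, and a reshape back; an informal sketch of this equivalence (not from the original examples):

>>> import torch
>>> import torch.nn as nn
>>> x = torch.arange(16.).reshape(1, 4, 2, 2)
>>> g = 2
>>> n, c, h, w = x.shape
>>> manual = x.reshape(n, g, c // g, h, w).transpose(1, 2).reshape(n, c, h, w)
>>> torch.equal(nn.ChannelShuffle(g)(x), manual)
True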
Examples: >>> channel_shuffle = nn.ChannelShuffle(2) >>> input = torch.randn(1, 4, 2, 2) >>> print(input) [[[[1, 2], [3, 4]], [[5, 6], [7, 8]], [[9, 10], [11, 12]], [[13, 14], [15, 16]], ]] >>> output = channel_shuffle(input) >>> print(output) [[[[1, 2], [3, 4]], [[9, 10], [11, 12]], [[5, 6], [7, 8]], [[13, 14], [15, 16]], ]] # ConstantPad1d `class torch.nn.ConstantPad1d(padding, value)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/padding.html#ConstantPad1d) Pads the input tensor boundaries with a constant value. For `N`-dimensional padding, use [`torch.nn.functional.pad()`](../nn.functional#torch.nn.functional.pad "torch.nn.functional.pad"). Parameters **padding** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,_[tuple](https://docs.python.org/3/library/stdtypes.html#tuple "\(in Python v3.9\)")) – the size of the padding. If is `int`, uses the same padding in both boundaries. If a 2-`tuple`, uses (padding_left\text{padding\\_left} , padding_right\text{padding\\_right} ) Shape: * Input: (N,C,Win)(N, C, W_{in}) * Output: (N,C,Wout)(N, C, W_{out}) where Wout=Win+padding_left+padding_rightW_{out} = W_{in} + \text{padding\\_left} + \text{padding\\_right} Examples: >>> m = nn.ConstantPad1d(2, 3.5) >>> input = torch.randn(1, 2, 4) >>> input tensor([[[-1.0491, -0.7152, -0.0749, 0.8530], [-1.3287, 1.8966, 0.1466, -0.2771]]]) >>> m(input) tensor([[[ 3.5000, 3.5000, -1.0491, -0.7152, -0.0749, 0.8530, 3.5000, 3.5000], [ 3.5000, 3.5000, -1.3287, 1.8966, 0.1466, -0.2771, 3.5000, 3.5000]]]) >>> m = nn.ConstantPad1d(2, 3.5) >>> input = torch.randn(1, 2, 3) >>> input tensor([[[ 1.6616, 1.4523, -1.1255], [-3.6372, 0.1182, -1.8652]]]) >>> m(input) tensor([[[ 3.5000, 3.5000, 1.6616, 1.4523, -1.1255, 3.5000, 3.5000], [ 3.5000, 3.5000, -3.6372, 0.1182, -1.8652, 3.5000, 3.5000]]]) >>> # using different paddings for different sides >>> m = nn.ConstantPad1d((3, 1), 3.5) >>> m(input) tensor([[[ 3.5000, 3.5000, 3.5000, 1.6616, 1.4523, -1.1255, 3.5000], [ 3.5000, 3.5000, 3.5000, -3.6372, 0.1182, -1.8652, 3.5000]]]) # ConstantPad2d `class torch.nn.ConstantPad2d(padding, value)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/padding.html#ConstantPad2d) Pads the input tensor boundaries with a constant value. For `N`-dimensional padding, use [`torch.nn.functional.pad()`](../nn.functional#torch.nn.functional.pad "torch.nn.functional.pad"). Parameters **padding** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,_[tuple](https://docs.python.org/3/library/stdtypes.html#tuple "\(in Python v3.9\)")) – the size of the padding. If is `int`, uses the same padding in all boundaries. 
If a 4-`tuple`, uses (padding_left\text{padding\\_left} , padding_right\text{padding\\_right} , padding_top\text{padding\\_top} , padding_bottom\text{padding\\_bottom} ) Shape: * Input: (N,C,Hin,Win)(N, C, H_{in}, W_{in}) * Output: (N,C,Hout,Wout)(N, C, H_{out}, W_{out}) where Hout=Hin+padding_top+padding_bottomH_{out} = H_{in} + \text{padding\\_top} + \text{padding\\_bottom} Wout=Win+padding_left+padding_rightW_{out} = W_{in} + \text{padding\\_left} + \text{padding\\_right} Examples: >>> m = nn.ConstantPad2d(2, 3.5) >>> input = torch.randn(1, 2, 2) >>> input tensor([[[ 1.6585, 0.4320], [-0.8701, -0.4649]]]) >>> m(input) tensor([[[ 3.5000, 3.5000, 3.5000, 3.5000, 3.5000, 3.5000], [ 3.5000, 3.5000, 3.5000, 3.5000, 3.5000, 3.5000], [ 3.5000, 3.5000, 1.6585, 0.4320, 3.5000, 3.5000], [ 3.5000, 3.5000, -0.8701, -0.4649, 3.5000, 3.5000], [ 3.5000, 3.5000, 3.5000, 3.5000, 3.5000, 3.5000], [ 3.5000, 3.5000, 3.5000, 3.5000, 3.5000, 3.5000]]]) >>> # using different paddings for different sides >>> m = nn.ConstantPad2d((3, 0, 2, 1), 3.5) >>> m(input) tensor([[[ 3.5000, 3.5000, 3.5000, 3.5000, 3.5000], [ 3.5000, 3.5000, 3.5000, 3.5000, 3.5000], [ 3.5000, 3.5000, 3.5000, 1.6585, 0.4320], [ 3.5000, 3.5000, 3.5000, -0.8701, -0.4649], [ 3.5000, 3.5000, 3.5000, 3.5000, 3.5000]]]) # ConstantPad3d `class torch.nn.ConstantPad3d(padding, value)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/padding.html#ConstantPad3d) Pads the input tensor boundaries with a constant value. For `N`-dimensional padding, use [`torch.nn.functional.pad()`](../nn.functional#torch.nn.functional.pad "torch.nn.functional.pad"). Parameters **padding** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,_[tuple](https://docs.python.org/3/library/stdtypes.html#tuple "\(in Python v3.9\)")) – the size of the padding. If is `int`, uses the same padding in all boundaries. If a 6-`tuple`, uses (padding_left\text{padding\\_left} , padding_right\text{padding\\_right} , padding_top\text{padding\\_top} , padding_bottom\text{padding\\_bottom} , padding_front\text{padding\\_front} , padding_back\text{padding\\_back} ) Shape: * Input: (N,C,Din,Hin,Win)(N, C, D_{in}, H_{in}, W_{in}) * Output: (N,C,Dout,Hout,Wout)(N, C, D_{out}, H_{out}, W_{out}) where Dout=Din+padding_front+padding_backD_{out} = D_{in} + \text{padding\\_front} + \text{padding\\_back} Hout=Hin+padding_top+padding_bottomH_{out} = H_{in} + \text{padding\\_top} + \text{padding\\_bottom} Wout=Win+padding_left+padding_rightW_{out} = W_{in} + \text{padding\\_left} + \text{padding\\_right} Examples: >>> m = nn.ConstantPad3d(3, 3.5) >>> input = torch.randn(16, 3, 10, 20, 30) >>> output = m(input) >>> # using different paddings for different sides >>> m = nn.ConstantPad3d((3, 3, 6, 6, 0, 1), 3.5) >>> output = m(input) # Conv1d `class torch.nn.Conv1d(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, bias=True, padding_mode='zeros')` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/conv.html#Conv1d) Applies a 1D convolution over an input signal composed of several input planes. 
In the simplest case, the output value of the layer with input size (N,Cin,L)(N, C_{\text{in}}, L) and output (N,Cout,Lout)(N, C_{\text{out}}, L_{\text{out}}) can be precisely described as: out(Ni,Coutj)=bias(Coutj)+∑k=0Cin−1weight(Coutj,k)⋆input(Ni,k)\text{out}(N_i, C_{\text{out}_j}) = \text{bias}(C_{\text{out}_j}) + \sum_{k = 0}^{C_{in} - 1} \text{weight}(C_{\text{out}_j}, k) \star \text{input}(N_i, k) where ⋆\star is the valid [cross- correlation](https://en.wikipedia.org/wiki/Cross-correlation) operator, NN is a batch size, CC denotes a number of channels, LL is a length of signal sequence. This module supports [TensorFloat32](https://pytorch.org/docs/1.8.0/notes/cuda.html#tf32-on- ampere). * `stride` controls the stride for the cross-correlation, a single number or a one-element tuple. * `padding` controls the amount of implicit padding on both sides for `padding` number of points. * `dilation` controls the spacing between the kernel points; also known as the à trous algorithm. It is harder to describe, but this [link](https://github.com/vdumoulin/conv_arithmetic/blob/master/README.md) has a nice visualization of what `dilation` does. * `groups` controls the connections between inputs and outputs. `in_channels` and `out_channels` must both be divisible by `groups`. For example, * At groups=1, all inputs are convolved to all outputs. * At groups=2, the operation becomes equivalent to having two conv layers side by side, each seeing half the input channels and producing half the output channels, and both subsequently concatenated. * At groups= `in_channels`, each input channel is convolved with its own set of filters (of size out_channelsin_channels\frac{\text{out\\_channels}}{\text{in\\_channels}} ). Note When `groups == in_channels` and `out_channels == K * in_channels`, where `K` is a positive integer, this operation is also known as a “depthwise convolution”. In other words, for an input of size (N,Cin,Lin)(N, C_{in}, L_{in}) , a depthwise convolution with a depthwise multiplier `K` can be performed with the arguments (Cin=Cin,Cout=Cin×K,...,groups=Cin)(C_\text{in}=C_\text{in}, C_\text{out}=C_\text{in} \times \text{K}, ..., \text{groups}=C_\text{in}) . Note In some circumstances when given tensors on a CUDA device and using CuDNN, this operator may select a nondeterministic algorithm to increase performance. If this is undesirable, you can try to make the operation deterministic (potentially at a performance cost) by setting `torch.backends.cudnn.deterministic = True`. See [Reproducibility](https://pytorch.org/docs/1.8.0/notes/randomness.html) for more information. Parameters * **in_channels** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – Number of channels in the input image * **out_channels** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – Number of channels produced by the convolution * **kernel_size** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _or_[tuple](https://docs.python.org/3/library/stdtypes.html#tuple "\(in Python v3.9\)")) – Size of the convolving kernel * **stride** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _or_[tuple](https://docs.python.org/3/library/stdtypes.html#tuple "\(in Python v3.9\)") _,__optional_) – Stride of the convolution. 
Default: 1 * **padding** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _or_[tuple](https://docs.python.org/3/library/stdtypes.html#tuple "\(in Python v3.9\)") _,__optional_) – Zero-padding added to both sides of the input. Default: 0 * **padding_mode** (_string_ _,__optional_) – `'zeros'`, `'reflect'`, `'replicate'` or `'circular'`. Default: `'zeros'` * **dilation** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _or_[tuple](https://docs.python.org/3/library/stdtypes.html#tuple "\(in Python v3.9\)") _,__optional_) – Spacing between kernel elements. Default: 1 * **groups** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – Number of blocked connections from input channels to output channels. Default: 1 * **bias** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – If `True`, adds a learnable bias to the output. Default: `True` Shape: * Input: (N,Cin,Lin)(N, C_{in}, L_{in}) * Output: (N,Cout,Lout)(N, C_{out}, L_{out}) where Lout=⌊Lin+2×padding−dilation×(kernel_size−1)−1stride+1⌋L_{out} = \left\lfloor\frac{L_{in} + 2 \times \text{padding} - \text{dilation} \times (\text{kernel\\_size} - 1) - 1}{\text{stride}} + 1\right\rfloor Variables * **~Conv1d.weight** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the learnable weights of the module of shape (out_channels,in_channelsgroups,kernel_size)(\text{out\\_channels}, \frac{\text{in\\_channels}}{\text{groups}}, \text{kernel\\_size}) . The values of these weights are sampled from U(−k,k)\mathcal{U}(-\sqrt{k}, \sqrt{k}) where k=groupsCin∗kernel_sizek = \frac{groups}{C_\text{in} * \text{kernel\\_size}} * **~Conv1d.bias** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the learnable bias of the module of shape (out_channels). If `bias` is `True`, then the values of these weights are sampled from U(−k,k)\mathcal{U}(-\sqrt{k}, \sqrt{k}) where k=groupsCin∗kernel_sizek = \frac{groups}{C_\text{in} * \text{kernel\\_size}} Examples: >>> m = nn.Conv1d(16, 33, 3, stride=2) >>> input = torch.randn(20, 16, 50) >>> output = m(input) # Conv2d `class torch.nn.Conv2d(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, bias=True, padding_mode='zeros')` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/conv.html#Conv2d) Applies a 2D convolution over an input signal composed of several input planes. In the simplest case, the output value of the layer with input size (N,Cin,H,W)(N, C_{\text{in}}, H, W) and output (N,Cout,Hout,Wout)(N, C_{\text{out}}, H_{\text{out}}, W_{\text{out}}) can be precisely described as: out(Ni,Coutj)=bias(Coutj)+∑k=0Cin−1weight(Coutj,k)⋆input(Ni,k)\text{out}(N_i, C_{\text{out}_j}) = \text{bias}(C_{\text{out}_j}) + \sum_{k = 0}^{C_{\text{in}} - 1} \text{weight}(C_{\text{out}_j}, k) \star \text{input}(N_i, k) where ⋆\star is the valid 2D [cross- correlation](https://en.wikipedia.org/wiki/Cross-correlation) operator, NN is a batch size, CC denotes a number of channels, HH is a height of input planes in pixels, and WW is width in pixels. This module supports [TensorFloat32](https://pytorch.org/docs/1.8.0/notes/cuda.html#tf32-on- ampere). * `stride` controls the stride for the cross-correlation, a single number or a tuple. * `padding` controls the amount of implicit padding on both sides for `padding` number of points for each dimension. 
* `dilation` controls the spacing between the kernel points; also known as the à trous algorithm. It is harder to describe, but this [link](https://github.com/vdumoulin/conv_arithmetic/blob/master/README.md) has a nice visualization of what `dilation` does. * `groups` controls the connections between inputs and outputs. `in_channels` and `out_channels` must both be divisible by `groups`. For example, * At groups=1, all inputs are convolved to all outputs. * At groups=2, the operation becomes equivalent to having two conv layers side by side, each seeing half the input channels and producing half the output channels, and both subsequently concatenated. * At groups= `in_channels`, each input channel is convolved with its own set of filters (of size out_channelsin_channels\frac{\text{out\\_channels}}{\text{in\\_channels}} ). The parameters `kernel_size`, `stride`, `padding`, `dilation` can either be: * a single `int` – in which case the same value is used for the height and width dimension * a `tuple` of two ints – in which case, the first `int` is used for the height dimension, and the second `int` for the width dimension Note When `groups == in_channels` and `out_channels == K * in_channels`, where `K` is a positive integer, this operation is also known as a “depthwise convolution”. In other words, for an input of size (N,Cin,Hin,Win)(N, C_{in}, H_{in}, W_{in}) , a depthwise convolution with a depthwise multiplier `K` can be performed with the arguments (Cin=Cin,Cout=Cin×K,...,groups=Cin)(C_\text{in}=C_\text{in}, C_\text{out}=C_\text{in} \times \text{K}, ..., \text{groups}=C_\text{in}) . Note In some circumstances when given tensors on a CUDA device and using CuDNN, this operator may select a nondeterministic algorithm to increase performance. If this is undesirable, you can try to make the operation deterministic (potentially at a performance cost) by setting `torch.backends.cudnn.deterministic = True`. See [Reproducibility](https://pytorch.org/docs/1.8.0/notes/randomness.html) for more information. Parameters * **in_channels** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – Number of channels in the input image * **out_channels** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – Number of channels produced by the convolution * **kernel_size** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _or_[tuple](https://docs.python.org/3/library/stdtypes.html#tuple "\(in Python v3.9\)")) – Size of the convolving kernel * **stride** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _or_[tuple](https://docs.python.org/3/library/stdtypes.html#tuple "\(in Python v3.9\)") _,__optional_) – Stride of the convolution. Default: 1 * **padding** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _or_[tuple](https://docs.python.org/3/library/stdtypes.html#tuple "\(in Python v3.9\)") _,__optional_) – Zero-padding added to both sides of the input. Default: 0 * **padding_mode** (_string_ _,__optional_) – `'zeros'`, `'reflect'`, `'replicate'` or `'circular'`. Default: `'zeros'` * **dilation** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _or_[tuple](https://docs.python.org/3/library/stdtypes.html#tuple "\(in Python v3.9\)") _,__optional_) – Spacing between kernel elements.
Default: 1 * **groups** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – Number of blocked connections from input channels to output channels. Default: 1 * **bias** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – If `True`, adds a learnable bias to the output. Default: `True` Shape: * Input: (N,Cin,Hin,Win)(N, C_{in}, H_{in}, W_{in}) * Output: (N,Cout,Hout,Wout)(N, C_{out}, H_{out}, W_{out}) where Hout=⌊Hin+2×padding[0]−dilation[0]×(kernel_size[0]−1)−1stride[0]+1⌋H_{out} = \left\lfloor\frac{H_{in} + 2 \times \text{padding}[0] - \text{dilation}[0] \times (\text{kernel\\_size}[0] - 1) - 1}{\text{stride}[0]} + 1\right\rfloor Wout=⌊Win+2×padding[1]−dilation[1]×(kernel_size[1]−1)−1stride[1]+1⌋W_{out} = \left\lfloor\frac{W_{in} + 2 \times \text{padding}[1] - \text{dilation}[1] \times (\text{kernel\\_size}[1] - 1) - 1}{\text{stride}[1]} + 1\right\rfloor Variables * **~Conv2d.weight** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the learnable weights of the module of shape (out_channels,in_channelsgroups,(\text{out\\_channels}, \frac{\text{in\\_channels}}{\text{groups}}, kernel_size[0],kernel_size[1])\text{kernel\\_size[0]}, \text{kernel\\_size[1]}) . The values of these weights are sampled from U(−k,k)\mathcal{U}(-\sqrt{k}, \sqrt{k}) where k=groupsCin∗∏i=01kernel_size[i]k = \frac{groups}{C_\text{in} * \prod_{i=0}^{1}\text{kernel\\_size}[i]} * **~Conv2d.bias** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the learnable bias of the module of shape (out_channels). If `bias` is `True`, then the values of these weights are sampled from U(−k,k)\mathcal{U}(-\sqrt{k}, \sqrt{k}) where k=groupsCin∗∏i=01kernel_size[i]k = \frac{groups}{C_\text{in} * \prod_{i=0}^{1}\text{kernel\\_size}[i]} #### Examples >>> # With square kernels and equal stride >>> m = nn.Conv2d(16, 33, 3, stride=2) >>> # non-square kernels and unequal stride and with padding >>> m = nn.Conv2d(16, 33, (3, 5), stride=(2, 1), padding=(4, 2)) >>> # non-square kernels and unequal stride and with padding and dilation >>> m = nn.Conv2d(16, 33, (3, 5), stride=(2, 1), padding=(4, 2), dilation=(3, 1)) >>> input = torch.randn(20, 16, 50, 100) >>> output = m(input) # Conv3d `class torch.nn.Conv3d(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, bias=True, padding_mode='zeros')` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/conv.html#Conv3d) Applies a 3D convolution over an input signal composed of several input planes. In the simplest case, the output value of the layer with input size (N,Cin,D,H,W)(N, C_{in}, D, H, W) and output (N,Cout,Dout,Hout,Wout)(N, C_{out}, D_{out}, H_{out}, W_{out}) can be precisely described as: out(Ni,Coutj)=bias(Coutj)+∑k=0Cin−1weight(Coutj,k)⋆input(Ni,k)out(N_i, C_{out_j}) = bias(C_{out_j}) + \sum_{k = 0}^{C_{in} - 1} weight(C_{out_j}, k) \star input(N_i, k) where ⋆\star is the valid 3D [cross- correlation](https://en.wikipedia.org/wiki/Cross-correlation) operator This module supports [TensorFloat32](https://pytorch.org/docs/1.8.0/notes/cuda.html#tf32-on- ampere). * `stride` controls the stride for the cross-correlation. * `padding` controls the amount of implicit padding on both sides for `padding` number of points for each dimension. * `dilation` controls the spacing between the kernel points; also known as the à trous algorithm. 
It is harder to describe, but this [link](https://github.com/vdumoulin/conv_arithmetic/blob/master/README.md) has a nice visualization of what `dilation` does. * `groups` controls the connections between inputs and outputs. `in_channels` and `out_channels` must both be divisible by `groups`. For example, * At groups=1, all inputs are convolved to all outputs. * At groups=2, the operation becomes equivalent to having two conv layers side by side, each seeing half the input channels and producing half the output channels, and both subsequently concatenated. * At groups= `in_channels`, each input channel is convolved with its own set of filters (of size out_channelsin_channels\frac{\text{out\\_channels}}{\text{in\\_channels}} ). The parameters `kernel_size`, `stride`, `padding`, `dilation` can either be: * a single `int` – in which case the same value is used for the depth, height and width dimension * a `tuple` of three ints – in which case, the first `int` is used for the depth dimension, the second `int` for the height dimension and the third `int` for the width dimension Note When `groups == in_channels` and `out_channels == K * in_channels`, where `K` is a positive integer, this operation is also known as a “depthwise convolution”. In other words, for an input of size (N,Cin,Din,Hin,Win)(N, C_{in}, D_{in}, H_{in}, W_{in}) , a depthwise convolution with a depthwise multiplier `K` can be performed with the arguments (Cin=Cin,Cout=Cin×K,...,groups=Cin)(C_\text{in}=C_\text{in}, C_\text{out}=C_\text{in} \times \text{K}, ..., \text{groups}=C_\text{in}) . Note In some circumstances when given tensors on a CUDA device and using CuDNN, this operator may select a nondeterministic algorithm to increase performance. If this is undesirable, you can try to make the operation deterministic (potentially at a performance cost) by setting `torch.backends.cudnn.deterministic = True`. See [Reproducibility](https://pytorch.org/docs/1.8.0/notes/randomness.html) for more information. Parameters * **in_channels** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – Number of channels in the input image * **out_channels** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – Number of channels produced by the convolution * **kernel_size** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _or_[tuple](https://docs.python.org/3/library/stdtypes.html#tuple "\(in Python v3.9\)")) – Size of the convolving kernel * **stride** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _or_[tuple](https://docs.python.org/3/library/stdtypes.html#tuple "\(in Python v3.9\)") _,__optional_) – Stride of the convolution. Default: 1 * **padding** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _or_[tuple](https://docs.python.org/3/library/stdtypes.html#tuple "\(in Python v3.9\)") _,__optional_) – Zero-padding added to all three sides of the input. Default: 0 * **padding_mode** (_string_ _,__optional_) – `'zeros'`, `'reflect'`, `'replicate'` or `'circular'`. Default: `'zeros'` * **dilation** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _or_[tuple](https://docs.python.org/3/library/stdtypes.html#tuple "\(in Python v3.9\)") _,__optional_) – Spacing between kernel elements. Default: 1 * **groups** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – Number of blocked connections from input channels to output channels.
Default: 1 * **bias** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – If `True`, adds a learnable bias to the output. Default: `True` Shape: * Input: (N,Cin,Din,Hin,Win)(N, C_{in}, D_{in}, H_{in}, W_{in}) * Output: (N,Cout,Dout,Hout,Wout)(N, C_{out}, D_{out}, H_{out}, W_{out}) where Dout=⌊Din+2×padding[0]−dilation[0]×(kernel_size[0]−1)−1stride[0]+1⌋D_{out} = \left\lfloor\frac{D_{in} + 2 \times \text{padding}[0] - \text{dilation}[0] \times (\text{kernel\\_size}[0] - 1) - 1}{\text{stride}[0]} + 1\right\rfloor Hout=⌊Hin+2×padding[1]−dilation[1]×(kernel_size[1]−1)−1stride[1]+1⌋H_{out} = \left\lfloor\frac{H_{in} + 2 \times \text{padding}[1] - \text{dilation}[1] \times (\text{kernel\\_size}[1] - 1) - 1}{\text{stride}[1]} + 1\right\rfloor Wout=⌊Win+2×padding[2]−dilation[2]×(kernel_size[2]−1)−1stride[2]+1⌋W_{out} = \left\lfloor\frac{W_{in} + 2 \times \text{padding}[2] - \text{dilation}[2] \times (\text{kernel\\_size}[2] - 1) - 1}{\text{stride}[2]} + 1\right\rfloor Variables * **~Conv3d.weight** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the learnable weights of the module of shape (out_channels,in_channelsgroups,(\text{out\\_channels}, \frac{\text{in\\_channels}}{\text{groups}}, kernel_size[0],kernel_size[1],kernel_size[2])\text{kernel\\_size[0]}, \text{kernel\\_size[1]}, \text{kernel\\_size[2]}) . The values of these weights are sampled from U(−k,k)\mathcal{U}(-\sqrt{k}, \sqrt{k}) where k=groupsCin∗∏i=02kernel_size[i]k = \frac{groups}{C_\text{in} * \prod_{i=0}^{2}\text{kernel\\_size}[i]} * **~Conv3d.bias** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the learnable bias of the module of shape (out_channels). If `bias` is `True`, then the values of these weights are sampled from U(−k,k)\mathcal{U}(-\sqrt{k}, \sqrt{k}) where k=groupsCin∗∏i=02kernel_size[i]k = \frac{groups}{C_\text{in} * \prod_{i=0}^{2}\text{kernel\\_size}[i]} Examples: >>> # With square kernels and equal stride >>> m = nn.Conv3d(16, 33, 3, stride=2) >>> # non-square kernels and unequal stride and with padding >>> m = nn.Conv3d(16, 33, (3, 5, 2), stride=(2, 1, 1), padding=(4, 2, 0)) >>> input = torch.randn(20, 16, 10, 50, 100) >>> output = m(input) # ConvTranspose1d `class torch.nn.ConvTranspose1d(in_channels, out_channels, kernel_size, stride=1, padding=0, output_padding=0, groups=1, bias=True, dilation=1, padding_mode='zeros')` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/conv.html#ConvTranspose1d) Applies a 1D transposed convolution operator over an input image composed of several input planes. This module can be seen as the gradient of Conv1d with respect to its input. It is also known as a fractionally-strided convolution or a deconvolution (although it is not an actual deconvolution operation). This module supports [TensorFloat32](https://pytorch.org/docs/1.8.0/notes/cuda.html#tf32-on- ampere). * `stride` controls the stride for the cross-correlation. * `padding` controls the amount of implicit zero padding on both sides for `dilation * (kernel_size - 1) - padding` number of points. See note below for details. * `output_padding` controls the additional size added to one side of the output shape. See note below for details. * `dilation` controls the spacing between the kernel points; also known as the à trous algorithm. It is harder to describe, but this [link](https://github.com/vdumoulin/conv_arithmetic/blob/master/README.md) has a nice visualization of what `dilation` does. 
* `groups` controls the connections between inputs and outputs. `in_channels` and `out_channels` must both be divisible by `groups`. For example, * At groups=1, all inputs are convolved to all outputs. * At groups=2, the operation becomes equivalent to having two conv layers side by side, each seeing half the input channels and producing half the output channels, and both subsequently concatenated. * At groups= `in_channels`, each input channel is convolved with its own set of filters (of size out_channelsin_channels\frac{\text{out\\_channels}}{\text{in\\_channels}} ). Note The `padding` argument effectively adds `dilation * (kernel_size - 1) - padding` amount of zero padding to both sizes of the input. This is set so that when a [`Conv1d`](torch.nn.conv1d#torch.nn.Conv1d "torch.nn.Conv1d") and a `ConvTranspose1d` are initialized with same parameters, they are inverses of each other in regard to the input and output shapes. However, when `stride > 1`, [`Conv1d`](torch.nn.conv1d#torch.nn.Conv1d "torch.nn.Conv1d") maps multiple input shapes to the same output shape. `output_padding` is provided to resolve this ambiguity by effectively increasing the calculated output shape on one side. Note that `output_padding` is only used to find output shape, but does not actually add zero-padding to output. Note In some circumstances when using the CUDA backend with CuDNN, this operator may select a nondeterministic algorithm to increase performance. If this is undesirable, you can try to make the operation deterministic (potentially at a performance cost) by setting `torch.backends.cudnn.deterministic = True`. Please see the notes on [Reproducibility](https://pytorch.org/docs/1.8.0/notes/randomness.html) for background. Parameters * **in_channels** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – Number of channels in the input image * **out_channels** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – Number of channels produced by the convolution * **kernel_size** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _or_[tuple](https://docs.python.org/3/library/stdtypes.html#tuple "\(in Python v3.9\)")) – Size of the convolving kernel * **stride** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _or_[tuple](https://docs.python.org/3/library/stdtypes.html#tuple "\(in Python v3.9\)") _,__optional_) – Stride of the convolution. Default: 1 * **padding** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _or_[tuple](https://docs.python.org/3/library/stdtypes.html#tuple "\(in Python v3.9\)") _,__optional_) – `dilation * (kernel_size - 1) - padding` zero-padding will be added to both sides of the input. Default: 0 * **output_padding** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _or_[tuple](https://docs.python.org/3/library/stdtypes.html#tuple "\(in Python v3.9\)") _,__optional_) – Additional size added to one side of the output shape. Default: 0 * **groups** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – Number of blocked connections from input channels to output channels. Default: 1 * **bias** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – If `True`, adds a learnable bias to the output. 
Default: `True` * **dilation** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _or_[tuple](https://docs.python.org/3/library/stdtypes.html#tuple "\(in Python v3.9\)") _,__optional_) – Spacing between kernel elements. Default: 1 Shape: * Input: (N,Cin,Lin)(N, C_{in}, L_{in}) * Output: (N,Cout,Lout)(N, C_{out}, L_{out}) where Lout=(Lin−1)×stride−2×padding+dilation×(kernel_size−1)+output_padding+1L_{out} = (L_{in} - 1) \times \text{stride} - 2 \times \text{padding} + \text{dilation} \times (\text{kernel\\_size} - 1) + \text{output\\_padding} + 1 Variables * **~ConvTranspose1d.weight** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the learnable weights of the module of shape (in_channels,out_channelsgroups,(\text{in\\_channels}, \frac{\text{out\\_channels}}{\text{groups}}, kernel_size)\text{kernel\\_size}) . The values of these weights are sampled from U(−k,k)\mathcal{U}(-\sqrt{k}, \sqrt{k}) where k=groupsCout∗kernel_sizek = \frac{groups}{C_\text{out} * \text{kernel\\_size}} * **~ConvTranspose1d.bias** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the learnable bias of the module of shape (out_channels). If `bias` is `True`, then the values of these weights are sampled from U(−k,k)\mathcal{U}(-\sqrt{k}, \sqrt{k}) where k=groupsCout∗kernel_sizek = \frac{groups}{C_\text{out} * \text{kernel\\_size}} # ConvTranspose2d `class torch.nn.ConvTranspose2d(in_channels, out_channels, kernel_size, stride=1, padding=0, output_padding=0, groups=1, bias=True, dilation=1, padding_mode='zeros')` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/conv.html#ConvTranspose2d) Applies a 2D transposed convolution operator over an input image composed of several input planes. This module can be seen as the gradient of Conv2d with respect to its input. It is also known as a fractionally-strided convolution or a deconvolution (although it is not an actual deconvolution operation). This module supports [TensorFloat32](https://pytorch.org/docs/1.8.0/notes/cuda.html#tf32-on- ampere). * `stride` controls the stride for the cross-correlation. * `padding` controls the amount of implicit zero padding on both sides for `dilation * (kernel_size - 1) - padding` number of points. See note below for details. * `output_padding` controls the additional size added to one side of the output shape. See note below for details. * `dilation` controls the spacing between the kernel points; also known as the à trous algorithm. It is harder to describe, but this [link](https://github.com/vdumoulin/conv_arithmetic/blob/master/README.md) has a nice visualization of what `dilation` does. * `groups` controls the connections between inputs and outputs. `in_channels` and `out_channels` must both be divisible by `groups`. For example, * At groups=1, all inputs are convolved to all outputs. * At groups=2, the operation becomes equivalent to having two conv layers side by side, each seeing half the input channels and producing half the output channels, and both subsequently concatenated. * At groups= `in_channels`, each input channel is convolved with its own set of filters (of size out_channelsin_channels\frac{\text{out\\_channels}}{\text{in\\_channels}} ). 
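For instance (an illustrative sketch with arbitrarily chosen sizes, not taken from the reference above), the effect of `groups` is visible in the shape of the learnable weight, which holds `out_channels // groups` kernels per input channel:

>>> # groups=2 splits the 16 input channels into two groups of 8,
>>> # each group producing half of the 32 output channels
>>> m = nn.ConvTranspose2d(16, 32, 3, groups=2)
>>> m.weight.shape  # (in_channels, out_channels // groups, kH, kW)
torch.Size([16, 16, 3, 3])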
The parameters `kernel_size`, `stride`, `padding`, `output_padding` can either be: * a single `int` – in which case the same value is used for the height and width dimensions * a `tuple` of two ints – in which case, the first `int` is used for the height dimension, and the second `int` for the width dimension Note The `padding` argument effectively adds `dilation * (kernel_size - 1) - padding` amount of zero padding to both sizes of the input. This is set so that when a [`Conv2d`](torch.nn.conv2d#torch.nn.Conv2d "torch.nn.Conv2d") and a `ConvTranspose2d` are initialized with same parameters, they are inverses of each other in regard to the input and output shapes. However, when `stride > 1`, [`Conv2d`](torch.nn.conv2d#torch.nn.Conv2d "torch.nn.Conv2d") maps multiple input shapes to the same output shape. `output_padding` is provided to resolve this ambiguity by effectively increasing the calculated output shape on one side. Note that `output_padding` is only used to find output shape, but does not actually add zero-padding to output. Note In some circumstances when given tensors on a CUDA device and using CuDNN, this operator may select a nondeterministic algorithm to increase performance. If this is undesirable, you can try to make the operation deterministic (potentially at a performance cost) by setting `torch.backends.cudnn.deterministic = True`. See [Reproducibility](https://pytorch.org/docs/1.8.0/notes/randomness.html) for more information. Parameters * **in_channels** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – Number of channels in the input image * **out_channels** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – Number of channels produced by the convolution * **kernel_size** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _or_[tuple](https://docs.python.org/3/library/stdtypes.html#tuple "\(in Python v3.9\)")) – Size of the convolving kernel * **stride** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _or_[tuple](https://docs.python.org/3/library/stdtypes.html#tuple "\(in Python v3.9\)") _,__optional_) – Stride of the convolution. Default: 1 * **padding** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _or_[tuple](https://docs.python.org/3/library/stdtypes.html#tuple "\(in Python v3.9\)") _,__optional_) – `dilation * (kernel_size - 1) - padding` zero-padding will be added to both sides of each dimension in the input. Default: 0 * **output_padding** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _or_[tuple](https://docs.python.org/3/library/stdtypes.html#tuple "\(in Python v3.9\)") _,__optional_) – Additional size added to one side of each dimension in the output shape. Default: 0 * **groups** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – Number of blocked connections from input channels to output channels. Default: 1 * **bias** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – If `True`, adds a learnable bias to the output. Default: `True` * **dilation** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _or_[tuple](https://docs.python.org/3/library/stdtypes.html#tuple "\(in Python v3.9\)") _,__optional_) – Spacing between kernel elements. 
Default: 1 Shape: * Input: (N,Cin,Hin,Win)(N, C_{in}, H_{in}, W_{in}) * Output: (N,Cout,Hout,Wout)(N, C_{out}, H_{out}, W_{out}) where Hout=(Hin−1)×stride[0]−2×padding[0]+dilation[0]×(kernel_size[0]−1)+output_padding[0]+1H_{out} = (H_{in} - 1) \times \text{stride}[0] - 2 \times \text{padding}[0] + \text{dilation}[0] \times (\text{kernel\\_size}[0] - 1) + \text{output\\_padding}[0] + 1 Wout=(Win−1)×stride[1]−2×padding[1]+dilation[1]×(kernel_size[1]−1)+output_padding[1]+1W_{out} = (W_{in} - 1) \times \text{stride}[1] - 2 \times \text{padding}[1] + \text{dilation}[1] \times (\text{kernel\\_size}[1] - 1) + \text{output\\_padding}[1] + 1 Variables * **~ConvTranspose2d.weight** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the learnable weights of the module of shape (in_channels,out_channelsgroups,(\text{in\\_channels}, \frac{\text{out\\_channels}}{\text{groups}}, kernel_size[0],kernel_size[1])\text{kernel\\_size[0]}, \text{kernel\\_size[1]}) . The values of these weights are sampled from U(−k,k)\mathcal{U}(-\sqrt{k}, \sqrt{k}) where k=groupsCout∗∏i=01kernel_size[i]k = \frac{groups}{C_\text{out} * \prod_{i=0}^{1}\text{kernel\\_size}[i]} * **~ConvTranspose2d.bias** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the learnable bias of the module of shape (out_channels) If `bias` is `True`, then the values of these weights are sampled from U(−k,k)\mathcal{U}(-\sqrt{k}, \sqrt{k}) where k=groupsCout∗∏i=01kernel_size[i]k = \frac{groups}{C_\text{out} * \prod_{i=0}^{1}\text{kernel\\_size}[i]} Examples: >>> # With square kernels and equal stride >>> m = nn.ConvTranspose2d(16, 33, 3, stride=2) >>> # non-square kernels and unequal stride and with padding >>> m = nn.ConvTranspose2d(16, 33, (3, 5), stride=(2, 1), padding=(4, 2)) >>> input = torch.randn(20, 16, 50, 100) >>> output = m(input) >>> # exact output size can be also specified as an argument >>> input = torch.randn(1, 16, 12, 12) >>> downsample = nn.Conv2d(16, 16, 3, stride=2, padding=1) >>> upsample = nn.ConvTranspose2d(16, 16, 3, stride=2, padding=1) >>> h = downsample(input) >>> h.size() torch.Size([1, 16, 6, 6]) >>> output = upsample(h, output_size=input.size()) >>> output.size() torch.Size([1, 16, 12, 12]) # ConvTranspose3d `class torch.nn.ConvTranspose3d(in_channels, out_channels, kernel_size, stride=1, padding=0, output_padding=0, groups=1, bias=True, dilation=1, padding_mode='zeros')` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/conv.html#ConvTranspose3d) Applies a 3D transposed convolution operator over an input image composed of several input planes. The transposed convolution operator multiplies each input value element-wise by a learnable kernel, and sums over the outputs from all input feature planes. This module can be seen as the gradient of Conv3d with respect to its input. It is also known as a fractionally-strided convolution or a deconvolution (although it is not an actual deconvolution operation). This module supports [TensorFloat32](https://pytorch.org/docs/1.8.0/notes/cuda.html#tf32-on- ampere). * `stride` controls the stride for the cross-correlation. * `padding` controls the amount of implicit zero padding on both sides for `dilation * (kernel_size - 1) - padding` number of points. See note below for details. * `output_padding` controls the additional size added to one side of the output shape. See note below for details. * `dilation` controls the spacing between the kernel points; also known as the à trous algorithm. 
It is harder to describe, but this [link](https://github.com/vdumoulin/conv_arithmetic/blob/master/README.md) has a nice visualization of what `dilation` does. * `groups` controls the connections between inputs and outputs. `in_channels` and `out_channels` must both be divisible by `groups`. For example, * At groups=1, all inputs are convolved to all outputs. * At groups=2, the operation becomes equivalent to having two conv layers side by side, each seeing half the input channels and producing half the output channels, and both subsequently concatenated. * At groups= `in_channels`, each input channel is convolved with its own set of filters (of size out_channelsin_channels\frac{\text{out\\_channels}}{\text{in\\_channels}} ). The parameters `kernel_size`, `stride`, `padding`, `output_padding` can either be: * a single `int` – in which case the same value is used for the depth, height and width dimensions * a `tuple` of three ints – in which case, the first `int` is used for the depth dimension, the second `int` for the height dimension and the third `int` for the width dimension Note The `padding` argument effectively adds `dilation * (kernel_size - 1) - padding` amount of zero padding to both sizes of the input. This is set so that when a [`Conv3d`](torch.nn.conv3d#torch.nn.Conv3d "torch.nn.Conv3d") and a `ConvTranspose3d` are initialized with same parameters, they are inverses of each other in regard to the input and output shapes. However, when `stride > 1`, [`Conv3d`](torch.nn.conv3d#torch.nn.Conv3d "torch.nn.Conv3d") maps multiple input shapes to the same output shape. `output_padding` is provided to resolve this ambiguity by effectively increasing the calculated output shape on one side. Note that `output_padding` is only used to find output shape, but does not actually add zero-padding to output. Note In some circumstances when given tensors on a CUDA device and using CuDNN, this operator may select a nondeterministic algorithm to increase performance. If this is undesirable, you can try to make the operation deterministic (potentially at a performance cost) by setting `torch.backends.cudnn.deterministic = True`. See [Reproducibility](https://pytorch.org/docs/1.8.0/notes/randomness.html) for more information. Parameters * **in_channels** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – Number of channels in the input image * **out_channels** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – Number of channels produced by the convolution * **kernel_size** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _or_[tuple](https://docs.python.org/3/library/stdtypes.html#tuple "\(in Python v3.9\)")) – Size of the convolving kernel * **stride** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _or_[tuple](https://docs.python.org/3/library/stdtypes.html#tuple "\(in Python v3.9\)") _,__optional_) – Stride of the convolution. Default: 1 * **padding** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _or_[tuple](https://docs.python.org/3/library/stdtypes.html#tuple "\(in Python v3.9\)") _,__optional_) – `dilation * (kernel_size - 1) - padding` zero-padding will be added to both sides of each dimension in the input. 
Default: 0 * **output_padding** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _or_[tuple](https://docs.python.org/3/library/stdtypes.html#tuple "\(in Python v3.9\)") _,__optional_) – Additional size added to one side of each dimension in the output shape. Default: 0 * **groups** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – Number of blocked connections from input channels to output channels. Default: 1 * **bias** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – If `True`, adds a learnable bias to the output. Default: `True` * **dilation** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _or_[tuple](https://docs.python.org/3/library/stdtypes.html#tuple "\(in Python v3.9\)") _,__optional_) – Spacing between kernel elements. Default: 1 Shape: * Input: (N,Cin,Din,Hin,Win)(N, C_{in}, D_{in}, H_{in}, W_{in}) * Output: (N,Cout,Dout,Hout,Wout)(N, C_{out}, D_{out}, H_{out}, W_{out}) where Dout=(Din−1)×stride[0]−2×padding[0]+dilation[0]×(kernel_size[0]−1)+output_padding[0]+1D_{out} = (D_{in} - 1) \times \text{stride}[0] - 2 \times \text{padding}[0] + \text{dilation}[0] \times (\text{kernel\\_size}[0] - 1) + \text{output\\_padding}[0] + 1 Hout=(Hin−1)×stride[1]−2×padding[1]+dilation[1]×(kernel_size[1]−1)+output_padding[1]+1H_{out} = (H_{in} - 1) \times \text{stride}[1] - 2 \times \text{padding}[1] + \text{dilation}[1] \times (\text{kernel\\_size}[1] - 1) + \text{output\\_padding}[1] + 1 Wout=(Win−1)×stride[2]−2×padding[2]+dilation[2]×(kernel_size[2]−1)+output_padding[2]+1W_{out} = (W_{in} - 1) \times \text{stride}[2] - 2 \times \text{padding}[2] + \text{dilation}[2] \times (\text{kernel\\_size}[2] - 1) + \text{output\\_padding}[2] + 1 Variables * **~ConvTranspose3d.weight** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the learnable weights of the module of shape (in_channels,out_channelsgroups,(\text{in\\_channels}, \frac{\text{out\\_channels}}{\text{groups}}, kernel_size[0],kernel_size[1],kernel_size[2])\text{kernel\\_size[0]}, \text{kernel\\_size[1]}, \text{kernel\\_size[2]}) . The values of these weights are sampled from U(−k,k)\mathcal{U}(-\sqrt{k}, \sqrt{k}) where k=groupsCout∗∏i=02kernel_size[i]k = \frac{groups}{C_\text{out} * \prod_{i=0}^{2}\text{kernel\\_size}[i]} * **~ConvTranspose3d.bias** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the learnable bias of the module of shape (out_channels) If `bias` is `True`, then the values of these weights are sampled from U(−k,k)\mathcal{U}(-\sqrt{k}, \sqrt{k}) where k=groupsCout∗∏i=02kernel_size[i]k = \frac{groups}{C_\text{out} * \prod_{i=0}^{2}\text{kernel\\_size}[i]} Examples: >>> # With square kernels and equal stride >>> m = nn.ConvTranspose3d(16, 33, 3, stride=2) >>> # non-square kernels and unequal stride and with padding >>> m = nn.ConvTranspose3d(16, 33, (3, 5, 2), stride=(2, 1, 1), padding=(0, 4, 2)) >>> input = torch.randn(20, 16, 10, 50, 100) >>> output = m(input) # CosineEmbeddingLoss `class torch.nn.CosineEmbeddingLoss(margin=0.0, size_average=None, reduce=None, reduction='mean')` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/loss.html#CosineEmbeddingLoss) Creates a criterion that measures the loss given input tensors x1x_1 , x2x_2 and a `Tensor` label yy with values 1 or -1. 
This is used for measuring whether two inputs are similar or dissimilar, using the cosine distance, and is typically used for learning nonlinear embeddings or semi-supervised learning. The loss function for each sample is: loss(x,y)={1−cos⁡(x1,x2),if y=1max⁡(0,cos⁡(x1,x2)−margin),if y=−1\text{loss}(x, y) = \begin{cases} 1 - \cos(x_1, x_2), & \text{if } y = 1 \\\ \max(0, \cos(x_1, x_2) - \text{margin}), & \text{if } y = -1 \end{cases} Parameters * **margin** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _,__optional_) – Should be a number from −1-1 to 11 , 00 to 0.50.5 is suggested. If `margin` is missing, the default value is 00 . * **size_average** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – Deprecated (see `reduction`). By default, the losses are averaged over each loss element in the batch. Note that for some losses, there are multiple elements per sample. If the field `size_average` is set to `False`, the losses are instead summed for each minibatch. Ignored when `reduce` is `False`. Default: `True` * **reduce** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – Deprecated (see `reduction`). By default, the losses are averaged or summed over observations for each minibatch depending on `size_average`. When `reduce` is `False`, returns a loss per batch element instead and ignores `size_average`. Default: `True` * **reduction** (_string_ _,__optional_) – Specifies the reduction to apply to the output: `'none'` | `'mean'` | `'sum'`. `'none'`: no reduction will be applied, `'mean'`: the sum of the output will be divided by the number of elements in the output, `'sum'`: the output will be summed. Note: `size_average` and `reduce` are in the process of being deprecated, and in the meantime, specifying either of those two args will override `reduction`. Default: `'mean'` # CosineSimilarity `class torch.nn.CosineSimilarity(dim=1, eps=1e-08)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/distance.html#CosineSimilarity) Returns cosine similarity between x1x_1 and x2x_2 , computed along dim. similarity=x1⋅x2max⁡(∥x1∥2⋅∥x2∥2,ϵ).\text{similarity} = \dfrac{x_1 \cdot x_2}{\max(\Vert x_1 \Vert _2 \cdot \Vert x_2 \Vert _2, \epsilon)}. Parameters * **dim** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – Dimension where cosine similarity is computed. Default: 1 * **eps** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _,__optional_) – Small value to avoid division by zero. Default: 1e-8 Shape: * Input1: (∗1,D,∗2)(\ast_1, D, \ast_2) where D is at position `dim` * Input2: (∗1,D,∗2)(\ast_1, D, \ast_2) , same shape as the Input1 * Output: (∗1,∗2)(\ast_1, \ast_2) Examples:: >>> input1 = torch.randn(100, 128) >>> input2 = torch.randn(100, 128) >>> cos = nn.CosineSimilarity(dim=1, eps=1e-6) >>> output = cos(input1, input2) # CrossEntropyLoss `class torch.nn.CrossEntropyLoss(weight=None, size_average=None, ignore_index=-100, reduce=None, reduction='mean')` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/loss.html#CrossEntropyLoss) This criterion combines [`LogSoftmax`](torch.nn.logsoftmax#torch.nn.LogSoftmax "torch.nn.LogSoftmax") and [`NLLLoss`](torch.nn.nllloss#torch.nn.NLLLoss "torch.nn.NLLLoss") in one single class. It is useful when training a classification problem with `C` classes. 
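As a minimal illustration of that combination (a sketch added here for clarity, not part of the upstream reference), cross entropy computed on raw scores matches `NLLLoss` applied to `LogSoftmax` outputs:

>>> x = torch.randn(3, 5)
>>> target = torch.tensor([1, 0, 4])
>>> ce = nn.CrossEntropyLoss()(x, target)
>>> nll = nn.NLLLoss()(nn.LogSoftmax(dim=1)(x), target)
>>> torch.allclose(ce, nll)
True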
If provided, the optional argument `weight` should be a 1D `Tensor` assigning weight to each of the classes. This is particularly useful when you have an unbalanced training set. The `input` is expected to contain raw, unnormalized scores for each class. `input` has to be a Tensor of size either (minibatch,C)(minibatch, C) or (minibatch,C,d1,d2,...,dK)(minibatch, C, d_1, d_2, ..., d_K) with K≥1K \geq 1 for the `K`-dimensional case (described later). This criterion expects a class index in the range [0,C−1][0, C-1] as the `target` for each value of a 1D tensor of size `minibatch`; if `ignore_index` is specified, this criterion also accepts this class index (this index may not necessarily be in the class range). The loss can be described as: loss(x,class)=−log⁡(exp⁡(x[class])∑jexp⁡(x[j]))=−x[class]+log⁡(∑jexp⁡(x[j]))\text{loss}(x, class) = -\log\left(\frac{\exp(x[class])}{\sum_j \exp(x[j])}\right) = -x[class] + \log\left(\sum_j \exp(x[j])\right) or in the case of the `weight` argument being specified: loss(x,class)=weight[class](−x[class]+log⁡(∑jexp⁡(x[j])))\text{loss}(x, class) = weight[class] \left(-x[class] + \log\left(\sum_j \exp(x[j])\right)\right) The losses are averaged across observations for each minibatch. If the `weight` argument is specified then this is a weighted average: loss=∑i=1Nloss(i,class[i])∑i=1Nweight[class[i]]\text{loss} = \frac{\sum^{N}_{i=1} loss(i, class[i])}{\sum^{N}_{i=1} weight[class[i]]} Can also be used for higher dimension inputs, such as 2D images, by providing an input of size (minibatch,C,d1,d2,...,dK)(minibatch, C, d_1, d_2, ..., d_K) with K≥1K \geq 1 , where KK is the number of dimensions, and a target of appropriate shape (see below). Parameters * **weight** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – a manual rescaling weight given to each class. If given, has to be a Tensor of size `C` * **size_average** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – Deprecated (see `reduction`). By default, the losses are averaged over each loss element in the batch. Note that for some losses, there are multiple elements per sample. If the field `size_average` is set to `False`, the losses are instead summed for each minibatch. Ignored when `reduce` is `False`. Default: `True` * **ignore_index** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – Specifies a target value that is ignored and does not contribute to the input gradient. When `size_average` is `True`, the loss is averaged over non-ignored targets. * **reduce** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – Deprecated (see `reduction`). By default, the losses are averaged or summed over observations for each minibatch depending on `size_average`. When `reduce` is `False`, returns a loss per batch element instead and ignores `size_average`. Default: `True` * **reduction** (_string_ _,__optional_) – Specifies the reduction to apply to the output: `'none'` | `'mean'` | `'sum'`. `'none'`: no reduction will be applied, `'mean'`: the weighted mean of the output is taken, `'sum'`: the output will be summed. Note: `size_average` and `reduce` are in the process of being deprecated, and in the meantime, specifying either of those two args will override `reduction`. Default: `'mean'` Shape: * Input: (N,C)(N, C) where `C = number of classes`, or (N,C,d1,d2,...,dK)(N, C, d_1, d_2, ..., d_K) with K≥1K \geq 1 in the case of `K`-dimensional loss. 
* Target: (N)(N) where each value is 0≤targets[i]≤C−10 \leq \text{targets}[i] \leq C-1 , or (N,d1,d2,...,dK)(N, d_1, d_2, ..., d_K) with K≥1K \geq 1 in the case of K-dimensional loss. * Output: scalar. If `reduction` is `'none'`, then the same size as the target: (N)(N) , or (N,d1,d2,...,dK)(N, d_1, d_2, ..., d_K) with K≥1K \geq 1 in the case of K-dimensional loss. Examples: >>> loss = nn.CrossEntropyLoss() >>> input = torch.randn(3, 5, requires_grad=True) >>> target = torch.empty(3, dtype=torch.long).random_(5) >>> output = loss(input, target) >>> output.backward() # CTCLoss `class torch.nn.CTCLoss(blank=0, reduction='mean', zero_infinity=False)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/loss.html#CTCLoss) The Connectionist Temporal Classification loss. Calculates loss between a continuous (unsegmented) time series and a target sequence. CTCLoss sums over the probability of possible alignments of input to target, producing a loss value which is differentiable with respect to each input node. The alignment of input to target is assumed to be “many-to-one”, which limits the length of the target sequence such that it must be ≤\leq the input length. Parameters * **blank** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – blank label. Default 00 . * **reduction** (_string_ _,__optional_) – Specifies the reduction to apply to the output: `'none'` | `'mean'` | `'sum'`. `'none'`: no reduction will be applied, `'mean'`: the output losses will be divided by the target lengths and then the mean over the batch is taken, `'sum'`: the output losses will be summed. Default: `'mean'` * **zero_infinity** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – Whether to zero infinite losses and the associated gradients. Default: `False` Infinite losses mainly occur when the inputs are too short to be aligned to the targets. Shape: * Log_probs: Tensor of size (T,N,C)(T, N, C) , where T=input lengthT = \text{input length} , N=batch sizeN = \text{batch size} , and C=number of classes (including blank)C = \text{number of classes (including blank)} . The logarithmized probabilities of the outputs (e.g. obtained with [`torch.nn.functional.log_softmax()`](../nn.functional#torch.nn.functional.log_softmax "torch.nn.functional.log_softmax")). * Targets: Tensor of size (N,S)(N, S) or (sum⁡(target_lengths))(\operatorname{sum}(\text{target\\_lengths})) , where N=batch sizeN = \text{batch size} and S=max target length, if shape is (N,S)S = \text{max target length, if shape is } (N, S) . It represents the target sequences. Each element in the target sequence is a class index. The target index cannot be blank (default=0). In the (N,S)(N, S) form, targets are padded to the length of the longest sequence, and stacked. In the (sum⁡(target_lengths))(\operatorname{sum}(\text{target\\_lengths})) form, the targets are assumed to be un-padded and concatenated within 1 dimension. * Input_lengths: Tuple or tensor of size (N)(N) , where N=batch sizeN = \text{batch size} . It represents the lengths of the inputs (must each be ≤T\leq T ). The lengths are specified for each sequence to achieve masking under the assumption that sequences are padded to equal lengths. * Target_lengths: Tuple or tensor of size (N)(N) , where N=batch sizeN = \text{batch size} . It represents the lengths of the targets. Lengths are specified for each sequence to achieve masking under the assumption that sequences are padded to equal lengths.
If target shape is (N,S)(N,S) , target_lengths are effectively the stop index sns_n for each target sequence, such that `target_n = targets[n,0:s_n]` for each target in a batch. Lengths must each be ≤S\leq S If the targets are given as a 1d tensor that is the concatenation of individual targets, the target_lengths must add up to the total length of the tensor. * Output: scalar. If `reduction` is `'none'`, then (N)(N) , where N=batch sizeN = \text{batch size} . Examples: >>> # Target are to be padded >>> T = 50 # Input sequence length >>> C = 20 # Number of classes (including blank) >>> N = 16 # Batch size >>> S = 30 # Target sequence length of longest target in batch (padding length) >>> S_min = 10 # Minimum target length, for demonstration purposes >>> >>> # Initialize random batch of input vectors, for *size = (T,N,C) >>> input = torch.randn(T, N, C).log_softmax(2).detach().requires_grad_() >>> >>> # Initialize random batch of targets (0 = blank, 1:C = classes) >>> target = torch.randint(low=1, high=C, size=(N, S), dtype=torch.long) >>> >>> input_lengths = torch.full(size=(N,), fill_value=T, dtype=torch.long) >>> target_lengths = torch.randint(low=S_min, high=S, size=(N,), dtype=torch.long) >>> ctc_loss = nn.CTCLoss() >>> loss = ctc_loss(input, target, input_lengths, target_lengths) >>> loss.backward() >>> >>> >>> # Target are to be un-padded >>> T = 50 # Input sequence length >>> C = 20 # Number of classes (including blank) >>> N = 16 # Batch size >>> >>> # Initialize random batch of input vectors, for *size = (T,N,C) >>> input = torch.randn(T, N, C).log_softmax(2).detach().requires_grad_() >>> input_lengths = torch.full(size=(N,), fill_value=T, dtype=torch.long) >>> >>> # Initialize random batch of targets (0 = blank, 1:C = classes) >>> target_lengths = torch.randint(low=1, high=T, size=(N,), dtype=torch.long) >>> target = torch.randint(low=1, high=C, size=(sum(target_lengths),), dtype=torch.long) >>> ctc_loss = nn.CTCLoss() >>> loss = ctc_loss(input, target, input_lengths, target_lengths) >>> loss.backward() Reference: A. Graves et al.: Connectionist Temporal Classification: Labelling Unsegmented Sequence Data with Recurrent Neural Networks: Note In order to use CuDNN, the following must be satisfied: `targets` must be in concatenated format, all `input_lengths` must be `T`. blank=0blank=0 , `target_lengths` ≤256\leq 256 , the integer arguments must be of dtype `torch.int32`. The regular implementation uses the (more common in PyTorch) `torch.long` dtype. Note In some circumstances when using the CUDA backend with CuDNN, this operator may select a nondeterministic algorithm to increase performance. If this is undesirable, you can try to make the operation deterministic (potentially at a performance cost) by setting `torch.backends.cudnn.deterministic = True`. Please see the notes on [Reproducibility](https://pytorch.org/docs/1.8.0/notes/randomness.html) for background. # DataParallel `class torch.nn.DataParallel(module, device_ids=None, output_device=None, dim=0)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/parallel/data_parallel.html#DataParallel) Implements data parallelism at the module level. This container parallelizes the application of the given `module` by splitting the input across the specified devices by chunking in the batch dimension (other objects will be copied once per device). In the forward pass, the module is replicated on each device, and each replica handles a portion of the input. 
During the backwards pass, gradients from each replica are summed into the original module. The batch size should be larger than the number of GPUs used. Warning It is recommended to use [`DistributedDataParallel`](torch.nn.parallel.distributeddataparallel#torch.nn.parallel.DistributedDataParallel "torch.nn.parallel.DistributedDataParallel"), instead of this class, to do multi-GPU training, even if there is only a single node. See: [Use nn.parallel.DistributedDataParallel instead of multiprocessing or nn.DataParallel](https://pytorch.org/docs/1.8.0/notes/cuda.html#cuda-nn-ddp- instead) and [Distributed Data Parallel](https://pytorch.org/docs/1.8.0/notes/ddp.html#ddp). Arbitrary positional and keyword inputs are allowed to be passed into DataParallel but some types are specially handled. tensors will be **scattered** on dim specified (default 0). tuple, list and dict types will be shallow copied. The other types will be shared among different threads and can be corrupted if written to in the model’s forward pass. The parallelized `module` must have its parameters and buffers on `device_ids[0]` before running this `DataParallel` module. Warning In each forward, `module` is **replicated** on each device, so any updates to the running module in `forward` will be lost. For example, if `module` has a counter attribute that is incremented in each `forward`, it will always stay at the initial value because the update is done on the replicas which are destroyed after `forward`. However, `DataParallel` guarantees that the replica on `device[0]` will have its parameters and buffers sharing storage with the base parallelized `module`. So **in-place** updates to the parameters or buffers on `device[0]` will be recorded. E.g., [`BatchNorm2d`](torch.nn.batchnorm2d#torch.nn.BatchNorm2d "torch.nn.BatchNorm2d") and [`spectral_norm()`](torch.nn.utils.spectral_norm#torch.nn.utils.spectral_norm "torch.nn.utils.spectral_norm") rely on this behavior to update the buffers. Warning Forward and backward hooks defined on `module` and its submodules will be invoked `len(device_ids)` times, each with inputs located on a particular device. Particularly, the hooks are only guaranteed to be executed in correct order with respect to operations on corresponding devices. For example, it is not guaranteed that hooks set via [`register_forward_pre_hook()`](torch.nn.module#torch.nn.Module.register_forward_pre_hook "torch.nn.Module.register_forward_pre_hook") be executed before `all` `len(device_ids)` [`forward()`](torch.nn.module#torch.nn.Module.forward "torch.nn.Module.forward") calls, but that each such hook be executed before the corresponding [`forward()`](torch.nn.module#torch.nn.Module.forward "torch.nn.Module.forward") call of that device. Warning When `module` returns a scalar (i.e., 0-dimensional tensor) in `forward()`, this wrapper will return a vector of length equal to number of devices used in data parallelism, containing the result from each device. Note There is a subtlety in using the `pack sequence -> recurrent network -> unpack sequence` pattern in a [`Module`](torch.nn.module#torch.nn.Module "torch.nn.Module") wrapped in `DataParallel`. See [My recurrent network doesn’t work with data parallelism](https://pytorch.org/docs/1.8.0/notes/faq.html#pack-rnn-unpack- with-data-parallelism) section in FAQ for details. 
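To make the scalar-return warning above concrete, here is a small sketch (it assumes at least two visible CUDA devices; `ScalarNet` is a hypothetical module introduced only for illustration):

>>> class ScalarNet(nn.Module):
...     def forward(self, x):
...         return x.sum()  # each replica returns a 0-dim tensor
...
>>> net = nn.DataParallel(ScalarNet().cuda(), device_ids=[0, 1])
>>> out = net(torch.randn(8, 4))  # may emit a UserWarning about gathering scalars
>>> out.shape  # one entry per device rather than a single scalar
torch.Size([2])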
Parameters * **module** ([Module](torch.nn.module#torch.nn.Module "torch.nn.Module")) – module to be parallelized * **device_ids** (_list of python:int_ _or_[torch.device](../tensor_attributes#torch.torch.device "torch.torch.device")) – CUDA devices (default: all devices) * **output_device** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _or_[torch.device](../tensor_attributes#torch.torch.device "torch.torch.device")) – device location of output (default: device_ids[0]) Variables **~DataParallel.module** ([Module](torch.nn.module#torch.nn.Module "torch.nn.Module")) – the module to be parallelized Example: >>> net = torch.nn.DataParallel(model, device_ids=[0, 1, 2]) >>> output = net(input_var) # input_var can be on any device, including CPU # Dropout `class torch.nn.Dropout(p=0.5, inplace=False)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/dropout.html#Dropout) During training, randomly zeroes some of the elements of the input tensor with probability `p` using samples from a Bernoulli distribution. Each channel will be zeroed out independently on every forward call. This has proven to be an effective technique for regularization and preventing the co-adaptation of neurons as described in the paper [Improving neural networks by preventing co-adaptation of feature detectors](https://arxiv.org/abs/1207.0580) . Furthermore, the outputs are scaled by a factor of 11−p\frac{1}{1-p} during training. This means that during evaluation the module simply computes an identity function. Parameters * **p** – probability of an element to be zeroed. Default: 0.5 * **inplace** – If set to `True`, will do this operation in-place. Default: `False` Shape: * Input: (∗)(*) . Input can be of any shape * Output: (∗)(*) . Output is of the same shape as input Examples: >>> m = nn.Dropout(p=0.2) >>> input = torch.randn(20, 16) >>> output = m(input) # Dropout2d `class torch.nn.Dropout2d(p=0.5, inplace=False)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/dropout.html#Dropout2d) Randomly zero out entire channels (a channel is a 2D feature map, e.g., the jj -th channel of the ii -th sample in the batched input is a 2D tensor input[i,j]\text{input}[i, j] ). Each channel will be zeroed out independently on every forward call with probability `p` using samples from a Bernoulli distribution. Usually the input comes from `nn.Conv2d` modules. As described in the paper [Efficient Object Localization Using Convolutional Networks](https://arxiv.org/abs/1411.4280) , if adjacent pixels within feature maps are strongly correlated (as is normally the case in early convolution layers) then i.i.d. dropout will not regularize the activations and will otherwise just result in an effective learning rate decrease. In this case, `nn.Dropout2d()` will help promote independence between feature maps and should be used instead. Parameters * **p** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _,__optional_) – probability of an element to be zero-ed. 
* **inplace** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – If set to `True`, will do this operation in-place Shape: * Input: (N,C,H,W)(N, C, H, W) * Output: (N,C,H,W)(N, C, H, W) (same shape as input) Examples: >>> m = nn.Dropout2d(p=0.2) >>> input = torch.randn(20, 16, 32, 32) >>> output = m(input) # Dropout3d `class torch.nn.Dropout3d(p=0.5, inplace=False)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/dropout.html#Dropout3d) Randomly zero out entire channels (a channel is a 3D feature map, e.g., the jj -th channel of the ii -th sample in the batched input is a 3D tensor input[i,j]\text{input}[i, j] ). Each channel will be zeroed out independently on every forward call with probability `p` using samples from a Bernoulli distribution. Usually the input comes from `nn.Conv3d` modules. As described in the paper [Efficient Object Localization Using Convolutional Networks](https://arxiv.org/abs/1411.4280) , if adjacent pixels within feature maps are strongly correlated (as is normally the case in early convolution layers) then i.i.d. dropout will not regularize the activations and will otherwise just result in an effective learning rate decrease. In this case, `nn.Dropout3d()` will help promote independence between feature maps and should be used instead. Parameters * **p** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _,__optional_) – probability of an element to be zeroed. * **inplace** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – If set to `True`, will do this operation in-place Shape: * Input: (N,C,D,H,W)(N, C, D, H, W) * Output: (N,C,D,H,W)(N, C, D, H, W) (same shape as input) Examples: >>> m = nn.Dropout3d(p=0.2) >>> input = torch.randn(20, 16, 4, 32, 32) >>> output = m(input) # ELU `class torch.nn.ELU(alpha=1.0, inplace=False)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/activation.html#ELU) Applies the element-wise function: ELU(x)={x, if x>0α∗(exp⁡(x)−1), if x≤0\text{ELU}(x) = \begin{cases} x, & \text{ if } x > 0\\\ \alpha * (\exp(x) - 1), & \text{ if } x \leq 0 \end{cases} Parameters * **alpha** – the α\alpha value for the ELU formulation. Default: 1.0 * **inplace** – can optionally do the operation in-place. Default: `False` Shape: * Input: (N,∗)(N, *) where `*` means, any number of additional dimensions * Output: (N,∗)(N, *) , same shape as the input Examples: >>> m = nn.ELU() >>> input = torch.randn(2) >>> output = m(input) # Embedding `class torch.nn.Embedding(num_embeddings, embedding_dim, padding_idx=None, max_norm=None, norm_type=2.0, scale_grad_by_freq=False, sparse=False, _weight=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/sparse.html#Embedding) A simple lookup table that stores embeddings of a fixed dictionary and size. This module is often used to store word embeddings and retrieve them using indices. The input to the module is a list of indices, and the output is the corresponding word embeddings. 
Parameters * **num_embeddings** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – size of the dictionary of embeddings * **embedding_dim** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – the size of each embedding vector * **padding_idx** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – If given, pads the output with the embedding vector at `padding_idx` (initialized to zeros) whenever it encounters the index. * **max_norm** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _,__optional_) – If given, each embedding vector with norm larger than `max_norm` is renormalized to have norm `max_norm`. * **norm_type** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _,__optional_) – The p of the p-norm to compute for the `max_norm` option. Default `2`. * **scale_grad_by_freq** (_boolean_ _,__optional_) – If given, this will scale gradients by the inverse of frequency of the words in the mini-batch. Default `False`. * **sparse** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – If `True`, gradient w.r.t. `weight` matrix will be a sparse tensor. See Notes for more details regarding sparse gradients. Variables **~Embedding.weight** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the learnable weights of the module of shape (num_embeddings, embedding_dim) initialized from N(0,1)\mathcal{N}(0, 1) Shape: * Input: (∗)(*) , IntTensor or LongTensor of arbitrary shape containing the indices to extract * Output: (∗,H)(*, H) , where `*` is the input shape and H=embedding_dimH=\text{embedding\\_dim} Note Keep in mind that only a limited number of optimizers support sparse gradients: currently it’s `optim.SGD` (`CUDA` and `CPU`), `optim.SparseAdam` (`CUDA` and `CPU`) and `optim.Adagrad` (`CPU`) Note With `padding_idx` set, the embedding vector at `padding_idx` is initialized to all zeros. However, note that this vector can be modified afterwards, e.g., using a customized initialization method, and thus changing the vector used to pad the output. The gradient for this vector from `Embedding` is always zero. Note When `max_norm` is not `None`, `Embedding`’s forward method will modify the `weight` tensor in-place. Since tensors needed for gradient computations cannot be modified in-place, performing a differentiable operation on `Embedding.weight` before calling `Embedding`’s forward method requires cloning `Embedding.weight` when `max_norm` is not `None`. 
For example: n, d, m = 3, 5, 7 embedding = nn.Embedding(n, d, max_norm=True) W = torch.randn((m, d), requires_grad=True) idx = torch.tensor([1, 2]) a = embedding.weight.clone() @ W.t() # weight must be cloned for this to be differentiable b = embedding(idx) @ W.t() # modifies weight in-place out = (a.unsqueeze(0) + b.unsqueeze(1)) loss = out.sigmoid().prod() loss.backward() Examples: >>> # an Embedding module containing 10 tensors of size 3 >>> embedding = nn.Embedding(10, 3) >>> # a batch of 2 samples of 4 indices each >>> input = torch.LongTensor([[1,2,4,5],[4,3,2,9]]) >>> embedding(input) tensor([[[-0.0251, -1.6902, 0.7172], [-0.6431, 0.0748, 0.6969], [ 1.4970, 1.3448, -0.9685], [-0.3677, -2.7265, -0.1685]], [[ 1.4970, 1.3448, -0.9685], [ 0.4362, -0.4004, 0.9400], [-0.6431, 0.0748, 0.6969], [ 0.9124, -2.3616, 1.1151]]]) >>> # example with padding_idx >>> embedding = nn.Embedding(10, 3, padding_idx=0) >>> input = torch.LongTensor([[0,2,0,5]]) >>> embedding(input) tensor([[[ 0.0000, 0.0000, 0.0000], [ 0.1535, -2.0309, 0.9315], [ 0.0000, 0.0000, 0.0000], [-0.1655, 0.9897, 0.0635]]]) `classmethod from_pretrained(embeddings, freeze=True, padding_idx=None, max_norm=None, norm_type=2.0, scale_grad_by_freq=False, sparse=False)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/sparse.html#Embedding.from_pretrained) Creates Embedding instance from given 2-dimensional FloatTensor. Parameters * **embeddings** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – FloatTensor containing weights for the Embedding. First dimension is being passed to Embedding as `num_embeddings`, second as `embedding_dim`. * **freeze** (_boolean_ _,__optional_) – If `True`, the tensor does not get updated in the learning process. Equivalent to `embedding.weight.requires_grad = False`. Default: `True` * **padding_idx** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – See module initialization documentation. * **max_norm** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _,__optional_) – See module initialization documentation. * **norm_type** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _,__optional_) – See module initialization documentation. Default `2`. * **scale_grad_by_freq** (_boolean_ _,__optional_) – See module initialization documentation. Default `False`. * **sparse** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – See module initialization documentation. Examples: >>> # FloatTensor containing pretrained weights >>> weight = torch.FloatTensor([[1, 2.3, 3], [4, 5.1, 6.3]]) >>> embedding = nn.Embedding.from_pretrained(weight) >>> # Get embeddings for index 1 >>> input = torch.LongTensor([1]) >>> embedding(input) tensor([[ 4.0000, 5.1000, 6.3000]]) # EmbeddingBag `class torch.nn.EmbeddingBag(num_embeddings, embedding_dim, max_norm=None, norm_type=2.0, scale_grad_by_freq=False, mode='mean', sparse=False, _weight=None, include_last_offset=False)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/sparse.html#EmbeddingBag) Computes sums or means of ‘bags’ of embeddings, without instantiating the intermediate embeddings. 
For bags of constant length and no `per_sample_weights` and 2D inputs, this class * with `mode="sum"` is equivalent to [`Embedding`](torch.nn.embedding#torch.nn.Embedding "torch.nn.Embedding") followed by `torch.sum(dim=1)`, * with `mode="mean"` is equivalent to [`Embedding`](torch.nn.embedding#torch.nn.Embedding "torch.nn.Embedding") followed by `torch.mean(dim=1)`, * with `mode="max"` is equivalent to [`Embedding`](torch.nn.embedding#torch.nn.Embedding "torch.nn.Embedding") followed by `torch.max(dim=1)`. However, `EmbeddingBag` is much more time and memory efficient than using a chain of these operations. EmbeddingBag also supports per-sample weights as an argument to the forward pass. This scales the output of the Embedding before performing a weighted reduction as specified by `mode`. If `per_sample_weights`` is passed, the only supported `mode` is `"sum"`, which computes a weighted sum according to `per_sample_weights`. Parameters * **num_embeddings** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – size of the dictionary of embeddings * **embedding_dim** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – the size of each embedding vector * **max_norm** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _,__optional_) – If given, each embedding vector with norm larger than `max_norm` is renormalized to have norm `max_norm`. * **norm_type** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _,__optional_) – The p of the p-norm to compute for the `max_norm` option. Default `2`. * **scale_grad_by_freq** (_boolean_ _,__optional_) – if given, this will scale gradients by the inverse of frequency of the words in the mini-batch. Default `False`. Note: this option is not supported when `mode="max"`. * **mode** (_string_ _,__optional_) – `"sum"`, `"mean"` or `"max"`. Specifies the way to reduce the bag. `"sum"` computes the weighted sum, taking `per_sample_weights` into consideration. `"mean"` computes the average of the values in the bag, `"max"` computes the max value over each bag. Default: `"mean"` * **sparse** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – if `True`, gradient w.r.t. `weight` matrix will be a sparse tensor. See Notes for more details regarding sparse gradients. Note: this option is not supported when `mode="max"`. * **include_last_offset** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – if `True`, `offsets` has one additional element, where the last element is equivalent to the size of `indices`. This matches the CSR format. Variables **~EmbeddingBag.weight** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the learnable weights of the module of shape `(num_embeddings, embedding_dim)` initialized from N(0,1)\mathcal{N}(0, 1) . `Inputs: input (IntTensor or LongTensor), offsets (IntTensor or LongTensor, optional), and` `per_index_weights` (Tensor, optional) * `input` and `offsets` have to be of the same type, either int or long * If `input` is 2D of shape `(B, N)`, it will be treated as `B` bags (sequences) each of fixed length `N`, and this will return `B` values aggregated in a way depending on the `mode`. `offsets` is ignored and required to be `None` in this case. * If `input` is 1D of shape `(N)`, it will be treated as a concatenation of multiple bags (sequences). 
`offsets` is required to be a 1D tensor containing the starting index positions of each bag in `input`. Therefore, for `offsets` of shape `(B)`, `input` will be viewed as having `B` bags. Empty bags (i.e., having 0-length) will have returned vectors filled by zeros. per_sample_weights (Tensor, optional): a tensor of float / double weights, or None to indicate all weights should be taken to be `1`. If specified, `per_sample_weights` must have exactly the same shape as input and is treated as having the same `offsets`, if those are not `None`. Only supported for `mode='sum'`. Output shape: `(B, embedding_dim)` Examples: >>> # an Embedding module containing 10 tensors of size 3 >>> embedding_sum = nn.EmbeddingBag(10, 3, mode='sum') >>> # a batch of 2 samples of 4 indices each >>> input = torch.LongTensor([1,2,4,5,4,3,2,9]) >>> offsets = torch.LongTensor([0,4]) >>> embedding_sum(input, offsets) tensor([[-0.8861, -5.4350, -0.0523], [ 1.1306, -2.5798, -1.0044]]) `classmethod from_pretrained(embeddings, freeze=True, max_norm=None, norm_type=2.0, scale_grad_by_freq=False, mode='mean', sparse=False, include_last_offset=False)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/sparse.html#EmbeddingBag.from_pretrained) Creates EmbeddingBag instance from given 2-dimensional FloatTensor. Parameters * **embeddings** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – FloatTensor containing weights for the EmbeddingBag. First dimension is being passed to EmbeddingBag as ‘num_embeddings’, second as ‘embedding_dim’. * **freeze** (_boolean_ _,__optional_) – If `True`, the tensor does not get updated in the learning process. Equivalent to `embeddingbag.weight.requires_grad = False`. Default: `True` * **max_norm** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _,__optional_) – See module initialization documentation. Default: `None` * **norm_type** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _,__optional_) – See module initialization documentation. Default `2`. * **scale_grad_by_freq** (_boolean_ _,__optional_) – See module initialization documentation. Default `False`. * **mode** (_string_ _,__optional_) – See module initialization documentation. Default: `"mean"` * **sparse** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – See module initialization documentation. Default: `False`. * **include_last_offset** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – See module initialization documentation. Default: `False`. Examples: >>> # FloatTensor containing pretrained weights >>> weight = torch.FloatTensor([[1, 2.3, 3], [4, 5.1, 6.3]]) >>> embeddingbag = nn.EmbeddingBag.from_pretrained(weight) >>> # Get embeddings for index 1 >>> input = torch.LongTensor([[1, 0]]) >>> embeddingbag(input) tensor([[ 2.5000, 3.7000, 4.6500]]) # Flatten `class torch.nn.Flatten(start_dim=1, end_dim=-1)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/flatten.html#Flatten) Flattens a contiguous range of dims into a tensor. For use with `Sequential`. Shape: * Input: (N,∗dims)(N, *dims) * Output: (N,∏∗dims)(N, \prod *dims) (for the default case). Parameters * **start_dim** – first dim to flatten (default = 1). * **end_dim** – last dim to flatten (default = -1). 
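A brief sketch of how `start_dim` and `end_dim` select which range of dimensions is collapsed (the tensor sizes here are arbitrary):

    import torch
    import torch.nn as nn

    x = torch.randn(2, 3, 4, 5)

    # Default: keep dim 0 (the batch dimension) and flatten dims 1..-1.
    print(nn.Flatten()(x).shape)             # torch.Size([2, 60])

    # Flatten only dims 2..3, leaving the leading dimensions untouched.
    print(nn.Flatten(start_dim=2)(x).shape)  # torch.Size([2, 3, 20])

    # Flatten dims 0..1, e.g. to merge batch and channel dimensions.
    print(nn.Flatten(0, 1)(x).shape)         # torch.Size([6, 4, 5])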
Examples:: >>> input = torch.randn(32, 1, 5, 5) >>> m = nn.Sequential( >>> nn.Conv2d(1, 32, 5, 1, 1), >>> nn.Flatten() >>> ) >>> output = m(input) >>> output.size() torch.Size([32, 288]) `add_module(name, module)` Adds a child module to the current module. The module can be accessed as an attribute using the given name. Parameters * **name** (_string_) – name of the child module. The child module can be accessed from this module using the given name * **module** ([Module](torch.nn.module#torch.nn.Module "torch.nn.Module")) – child module to be added to the module. `apply(fn)` Applies `fn` recursively to every submodule (as returned by `.children()`) as well as self. Typical use includes initializing the parameters of a model (see also [torch.nn.init](../nn.init#nn-init-doc)). Parameters **fn** ([`Module`](torch.nn.module#torch.nn.Module "torch.nn.Module") -> None) – function to be applied to each submodule Returns self Return type [Module](torch.nn.module#torch.nn.Module "torch.nn.Module") Example: >>> @torch.no_grad() >>> def init_weights(m): >>> print(m) >>> if type(m) == nn.Linear: >>> m.weight.fill_(1.0) >>> print(m.weight) >>> net = nn.Sequential(nn.Linear(2, 2), nn.Linear(2, 2)) >>> net.apply(init_weights) Linear(in_features=2, out_features=2, bias=True) Parameter containing: tensor([[ 1., 1.], [ 1., 1.]]) Linear(in_features=2, out_features=2, bias=True) Parameter containing: tensor([[ 1., 1.], [ 1., 1.]]) Sequential( (0): Linear(in_features=2, out_features=2, bias=True) (1): Linear(in_features=2, out_features=2, bias=True) ) Sequential( (0): Linear(in_features=2, out_features=2, bias=True) (1): Linear(in_features=2, out_features=2, bias=True) ) `bfloat16()` Casts all floating point parameters and buffers to `bfloat16` datatype. Returns self Return type [Module](torch.nn.module#torch.nn.Module "torch.nn.Module") `buffers(recurse=True)` Returns an iterator over module buffers. Parameters **recurse** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – if True, then yields buffers of this module and all submodules. Otherwise, yields only buffers that are direct members of this module. Yields _torch.Tensor_ – module buffer Example: >>> for buf in model.buffers(): >>> print(type(buf), buf.size()) (20L,) (20L, 1L, 5L, 5L) `children()` Returns an iterator over immediate children modules. Yields _Module_ – a child module `cpu()` Moves all model parameters and buffers to the CPU. Returns self Return type [Module](torch.nn.module#torch.nn.Module "torch.nn.Module") `cuda(device=None)` Moves all model parameters and buffers to the GPU. This also makes associated parameters and buffers different objects. So it should be called before constructing optimizer if the module will live on GPU while being optimized. Parameters **device** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – if specified, all parameters will be copied to that device Returns self Return type [Module](torch.nn.module#torch.nn.Module "torch.nn.Module") `double()` Casts all floating point parameters and buffers to `double` datatype. Returns self Return type [Module](torch.nn.module#torch.nn.Module "torch.nn.Module") `eval()` Sets the module in evaluation mode. This has any effect only on certain modules. See documentations of particular modules for details of their behaviors in training/evaluation mode, if they are affected, e.g. [`Dropout`](torch.nn.dropout#torch.nn.Dropout "torch.nn.Dropout"), `BatchNorm`, etc. 
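For instance, a minimal sketch of the effect on `Dropout` (names here are illustrative): in training mode the surviving elements are rescaled by 1/(1-p), while in evaluation mode the module acts as an identity:

    import torch
    import torch.nn as nn

    drop = nn.Dropout(p=0.5)
    x = torch.ones(8)

    drop.train()     # training mode: elements are zeroed or scaled by 1/(1-p) = 2
    print(drop(x))   # e.g. tensor([2., 0., 2., 2., 0., 0., 2., 2.]) (random)

    drop.eval()      # evaluation mode: dropout is a no-op
    assert torch.equal(drop(x), x)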
This is equivalent with [`self.train(False)`](torch.nn.module#torch.nn.Module.train "torch.nn.Module.train"). Returns self Return type [Module](torch.nn.module#torch.nn.Module "torch.nn.Module") `float()` Casts all floating point parameters and buffers to float datatype. Returns self Return type [Module](torch.nn.module#torch.nn.Module "torch.nn.Module") `half()` Casts all floating point parameters and buffers to `half` datatype. Returns self Return type [Module](torch.nn.module#torch.nn.Module "torch.nn.Module") `load_state_dict(state_dict, strict=True)` Copies parameters and buffers from `state_dict` into this module and its descendants. If `strict` is `True`, then the keys of `state_dict` must exactly match the keys returned by this module’s [`state_dict()`](torch.nn.module#torch.nn.Module.state_dict "torch.nn.Module.state_dict") function. Parameters * **state_dict** ([dict](https://docs.python.org/3/library/stdtypes.html#dict "\(in Python v3.9\)")) – a dict containing parameters and persistent buffers. * **strict** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – whether to strictly enforce that the keys in `state_dict` match the keys returned by this module’s [`state_dict()`](torch.nn.module#torch.nn.Module.state_dict "torch.nn.Module.state_dict") function. Default: `True` Returns * **missing_keys** is a list of str containing the missing keys * **unexpected_keys** is a list of str containing the unexpected keys Return type `NamedTuple` with `missing_keys` and `unexpected_keys` fields `modules()` Returns an iterator over all modules in the network. Yields _Module_ – a module in the network Note Duplicate modules are returned only once. In the following example, `l` will be returned only once. Example: >>> l = nn.Linear(2, 2) >>> net = nn.Sequential(l, l) >>> for idx, m in enumerate(net.modules()): print(idx, '->', m) 0 -> Sequential( (0): Linear(in_features=2, out_features=2, bias=True) (1): Linear(in_features=2, out_features=2, bias=True) ) 1 -> Linear(in_features=2, out_features=2, bias=True) `named_buffers(prefix='', recurse=True)` Returns an iterator over module buffers, yielding both the name of the buffer as well as the buffer itself. Parameters * **prefix** ([str](https://docs.python.org/3/library/stdtypes.html#str "\(in Python v3.9\)")) – prefix to prepend to all buffer names. * **recurse** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – if True, then yields buffers of this module and all submodules. Otherwise, yields only buffers that are direct members of this module. Yields _(string, torch.Tensor)_ – Tuple containing the name and buffer Example: >>> for name, buf in self.named_buffers(): >>> if name in ['running_var']: >>> print(buf.size()) `named_children()` Returns an iterator over immediate children modules, yielding both the name of the module as well as the module itself. Yields _(string, Module)_ – Tuple containing a name and child module Example: >>> for name, module in model.named_children(): >>> if name in ['conv4', 'conv5']: >>> print(module) `named_modules(memo=None, prefix='')` Returns an iterator over all modules in the network, yielding both the name of the module as well as the module itself. Yields _(string, Module)_ – Tuple of name and module Note Duplicate modules are returned only once. In the following example, `l` will be returned only once. 
Example: >>> l = nn.Linear(2, 2) >>> net = nn.Sequential(l, l) >>> for idx, m in enumerate(net.named_modules()): print(idx, '->', m) 0 -> ('', Sequential( (0): Linear(in_features=2, out_features=2, bias=True) (1): Linear(in_features=2, out_features=2, bias=True) )) 1 -> ('0', Linear(in_features=2, out_features=2, bias=True)) `named_parameters(prefix='', recurse=True)` Returns an iterator over module parameters, yielding both the name of the parameter as well as the parameter itself. Parameters * **prefix** ([str](https://docs.python.org/3/library/stdtypes.html#str "\(in Python v3.9\)")) – prefix to prepend to all parameter names. * **recurse** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – if True, then yields parameters of this module and all submodules. Otherwise, yields only parameters that are direct members of this module. Yields _(string, Parameter)_ – Tuple containing the name and parameter Example: >>> for name, param in self.named_parameters(): >>> if name in ['bias']: >>> print(param.size()) `parameters(recurse=True)` Returns an iterator over module parameters. This is typically passed to an optimizer. Parameters **recurse** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – if True, then yields parameters of this module and all submodules. Otherwise, yields only parameters that are direct members of this module. Yields _Parameter_ – module parameter Example: >>> for param in model.parameters(): >>> print(type(param), param.size()) (20L,) (20L, 1L, 5L, 5L) `register_backward_hook(hook)` Registers a backward hook on the module. This function is deprecated in favor of `nn.Module.register_full_backward_hook()` and the behavior of this function will change in future versions. Returns a handle that can be used to remove the added hook by calling `handle.remove()` Return type `torch.utils.hooks.RemovableHandle` `register_buffer(name, tensor, persistent=True)` Adds a buffer to the module. This is typically used to register a buffer that should not to be considered a model parameter. For example, BatchNorm’s `running_mean` is not a parameter, but is part of the module’s state. Buffers, by default, are persistent and will be saved alongside parameters. This behavior can be changed by setting `persistent` to `False`. The only difference between a persistent buffer and a non-persistent buffer is that the latter will not be a part of this module’s `state_dict`. Buffers can be accessed as attributes using given names. Parameters * **name** (_string_) – name of the buffer. The buffer can be accessed from this module using the given name * **tensor** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – buffer to be registered. * **persistent** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – whether the buffer is part of this module’s `state_dict`. Example: >>> self.register_buffer('running_mean', torch.zeros(num_features)) `register_forward_hook(hook)` Registers a forward hook on the module. The hook will be called every time after `forward()` has computed an output. It should have the following signature: hook(module, input, output) -> None or modified output The input contains only the positional arguments given to the module. Keyword arguments won’t be passed to the hooks and only to the `forward`. The hook can modify the output. It can modify the input inplace but it will not have effect on forward since this is called after `forward()` is called. 
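A minimal sketch of a forward hook that records output shapes (the module and hook names are illustrative; returning `None` leaves the outputs unchanged):

    import torch
    import torch.nn as nn

    net = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))
    shapes = []

    def record_shape(module, input, output):
        # Called after each forward() of the hooked module;
        # returning None keeps the output as-is.
        shapes.append((type(module).__name__, tuple(output.shape)))

    handles = [m.register_forward_hook(record_shape) for m in net]

    net(torch.randn(3, 4))
    print(shapes)    # [('Linear', (3, 8)), ('ReLU', (3, 8)), ('Linear', (3, 2))]

    for h in handles:
        h.remove()   # detach the hooks when they are no longer needed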
Returns a handle that can be used to remove the added hook by calling `handle.remove()` Return type `torch.utils.hooks.RemovableHandle` `register_forward_pre_hook(hook)` Registers a forward pre-hook on the module. The hook will be called every time before `forward()` is invoked. It should have the following signature: hook(module, input) -> None or modified input The input contains only the positional arguments given to the module. Keyword arguments won’t be passed to the hooks and only to the `forward`. The hook can modify the input. User can either return a tuple or a single modified value in the hook. We will wrap the value into a tuple if a single value is returned(unless that value is already a tuple). Returns a handle that can be used to remove the added hook by calling `handle.remove()` Return type `torch.utils.hooks.RemovableHandle` `register_full_backward_hook(hook)` Registers a backward hook on the module. The hook will be called every time the gradients with respect to module inputs are computed. The hook should have the following signature: hook(module, grad_input, grad_output) -> tuple(Tensor) or None The `grad_input` and `grad_output` are tuples that contain the gradients with respect to the inputs and outputs respectively. The hook should not modify its arguments, but it can optionally return a new gradient with respect to the input that will be used in place of `grad_input` in subsequent computations. `grad_input` will only correspond to the inputs given as positional arguments and all kwarg arguments are ignored. Entries in `grad_input` and `grad_output` will be `None` for all non-Tensor arguments. Warning Modifying inputs or outputs inplace is not allowed when using backward hooks and will raise an error. Returns a handle that can be used to remove the added hook by calling `handle.remove()` Return type `torch.utils.hooks.RemovableHandle` `register_parameter(name, param)` Adds a parameter to the module. The parameter can be accessed as an attribute using given name. Parameters * **name** (_string_) – name of the parameter. The parameter can be accessed from this module using the given name * **param** ([Parameter](torch.nn.parameter.parameter#torch.nn.parameter.Parameter "torch.nn.parameter.Parameter")) – parameter to be added to the module. `requires_grad_(requires_grad=True)` Change if autograd should record operations on parameters in this module. This method sets the parameters’ `requires_grad` attributes in-place. This method is helpful for freezing part of the module for finetuning or training parts of a model individually (e.g., GAN training). Parameters **requires_grad** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – whether autograd should record operations on parameters in this module. Default: `True`. Returns self Return type [Module](torch.nn.module#torch.nn.Module "torch.nn.Module") `state_dict(destination=None, prefix='', keep_vars=False)` Returns a dictionary containing a whole state of the module. Both parameters and persistent buffers (e.g. running averages) are included. Keys are corresponding parameter and buffer names. Returns a dictionary containing a whole state of the module Return type [dict](https://docs.python.org/3/library/stdtypes.html#dict "\(in Python v3.9\)") Example: >>> module.state_dict().keys() ['bias', 'weight'] `to(*args, **kwargs)` Moves and/or casts the parameters and buffers. 
This can be called as `to(device=None, dtype=None, non_blocking=False)` `to(dtype, non_blocking=False)` `to(tensor, non_blocking=False)` `to(memory_format=torch.channels_last)` Its signature is similar to [`torch.Tensor.to()`](../tensors#torch.Tensor.to "torch.Tensor.to"), but only accepts floating point or complex `dtype`s. In addition, this method will only cast the floating point or complex parameters and buffers to :attr:`dtype` (if given). The integral parameters and buffers will be moved `device`, if that is given, but with dtypes unchanged. When `non_blocking` is set, it tries to convert/move asynchronously with respect to the host if possible, e.g., moving CPU Tensors with pinned memory to CUDA devices. See below for examples. Note This method modifies the module in-place. Parameters * **device** (`torch.device`) – the desired device of the parameters and buffers in this module * **dtype** (`torch.dtype`) – the desired floating point or complex dtype of the parameters and buffers in this module * **tensor** ([torch.Tensor](../tensors#torch.Tensor "torch.Tensor")) – Tensor whose dtype and device are the desired dtype and device for all parameters and buffers in this module * **memory_format** (`torch.memory_format`) – the desired memory format for 4D parameters and buffers in this module (keyword only argument) Returns self Return type [Module](torch.nn.module#torch.nn.Module "torch.nn.Module") Examples: >>> linear = nn.Linear(2, 2) >>> linear.weight Parameter containing: tensor([[ 0.1913, -0.3420], [-0.5113, -0.2325]]) >>> linear.to(torch.double) Linear(in_features=2, out_features=2, bias=True) >>> linear.weight Parameter containing: tensor([[ 0.1913, -0.3420], [-0.5113, -0.2325]], dtype=torch.float64) >>> gpu1 = torch.device("cuda:1") >>> linear.to(gpu1, dtype=torch.half, non_blocking=True) Linear(in_features=2, out_features=2, bias=True) >>> linear.weight Parameter containing: tensor([[ 0.1914, -0.3420], [-0.5112, -0.2324]], dtype=torch.float16, device='cuda:1') >>> cpu = torch.device("cpu") >>> linear.to(cpu) Linear(in_features=2, out_features=2, bias=True) >>> linear.weight Parameter containing: tensor([[ 0.1914, -0.3420], [-0.5112, -0.2324]], dtype=torch.float16) >>> linear = nn.Linear(2, 2, bias=None).to(torch.cdouble) >>> linear.weight Parameter containing: tensor([[ 0.3741+0.j, 0.2382+0.j], [ 0.5593+0.j, -0.4443+0.j]], dtype=torch.complex128) >>> linear(torch.ones(3, 2, dtype=torch.cdouble)) tensor([[0.6122+0.j, 0.1150+0.j], [0.6122+0.j, 0.1150+0.j], [0.6122+0.j, 0.1150+0.j]], dtype=torch.complex128) `train(mode=True)` Sets the module in training mode. This has any effect only on certain modules. See documentations of particular modules for details of their behaviors in training/evaluation mode, if they are affected, e.g. [`Dropout`](torch.nn.dropout#torch.nn.Dropout "torch.nn.Dropout"), `BatchNorm`, etc. Parameters **mode** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – whether to set training mode (`True`) or evaluation mode (`False`). Default: `True`. Returns self Return type [Module](torch.nn.module#torch.nn.Module "torch.nn.Module") `type(dst_type)` Casts all parameters and buffers to `dst_type`. Parameters **dst_type** ([type](https://docs.python.org/3/library/functions.html#type "\(in Python v3.9\)") _or_ _string_) – the desired type Returns self Return type [Module](torch.nn.module#torch.nn.Module "torch.nn.Module") `xpu(device=None)` Moves all model parameters and buffers to the XPU. 
This also makes associated parameters and buffers different objects. So it should be called before constructing optimizer if the module will live on XPU while being optimized. Parameters **device** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – if specified, all parameters will be copied to that device Returns self Return type [Module](torch.nn.module#torch.nn.Module "torch.nn.Module") `zero_grad(set_to_none=False)` Sets gradients of all model parameters to zero. See similar function under [`torch.optim.Optimizer`](../optim#torch.optim.Optimizer "torch.optim.Optimizer") for more context. Parameters **set_to_none** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – instead of setting to zero, set the grads to None. See [`torch.optim.Optimizer.zero_grad()`](../optim#torch.optim.Optimizer.zero_grad "torch.optim.Optimizer.zero_grad") for details. # Fold `class torch.nn.Fold(output_size, kernel_size, dilation=1, padding=0, stride=1)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/fold.html#Fold) Combines an array of sliding local blocks into a large containing tensor. Consider a batched `input` tensor containing sliding local blocks, e.g., patches of images, of shape (N,C×∏(kernel_size),L)(N, C \times \prod(\text{kernel\\_size}), L) , where NN is batch dimension, C×∏(kernel_size)C \times \prod(\text{kernel\\_size}) is the number of values within a block (a block has ∏(kernel_size)\prod(\text{kernel\\_size}) spatial locations each containing a CC -channeled vector), and LL is the total number of blocks. (This is exactly the same specification as the output shape of [`Unfold`](torch.nn.unfold#torch.nn.Unfold "torch.nn.Unfold").) This operation combines these local blocks into the large `output` tensor of shape (N,C,output_size[0],output_size[1],…)(N, C, \text{output\\_size}[0], \text{output\\_size}[1], \dots) by summing the overlapping values. Similar to [`Unfold`](torch.nn.unfold#torch.nn.Unfold "torch.nn.Unfold"), the arguments must satisfy L=∏d⌊output_size[d]+2×padding[d]−dilation[d]×(kernel_size[d]−1)−1stride[d]+1⌋,L = \prod_d \left\lfloor\frac{\text{output\\_size}[d] + 2 \times \text{padding}[d] % - \text{dilation}[d] \times (\text{kernel\\_size}[d] - 1) - 1}{\text{stride}[d]} + 1\right\rfloor, where dd is over all spatial dimensions. * `output_size` describes the spatial shape of the large containing tensor of the sliding local blocks. It is useful to resolve the ambiguity when multiple input shapes map to same number of sliding blocks, e.g., with `stride > 0`. The `padding`, `stride` and `dilation` arguments specify how the sliding blocks are retrieved. * `stride` controls the stride for the sliding blocks. * `padding` controls the amount of implicit zero-paddings on both sides for `padding` number of points for each dimension before reshaping. * `dilation` controls the spacing between the kernel points; also known as the à trous algorithm. It is harder to describe, but this [link](https://github.com/vdumoulin/conv_arithmetic/blob/master/README.md) has a nice visualization of what `dilation` does. 
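A small sketch of how these arguments determine the number of blocks L (here with a 4 x 5 spatial size, a 2 x 2 kernel, stride 1, no padding, and no dilation, matching the `Fold` example further below):

    import torch
    import torch.nn as nn

    # With spatial size 4 x 5, kernel 2 x 2, stride 1, padding 0, dilation 1:
    #   L = floor((4 - 2)/1 + 1) * floor((5 - 2)/1 + 1) = 3 * 4 = 12
    unfold = nn.Unfold(kernel_size=(2, 2))
    blocks = unfold(torch.randn(1, 3, 4, 5))
    print(blocks.shape)        # torch.Size([1, 12, 12]) -> (N, C * 2 * 2, L) with L = 12

    # Folding back onto a 4 x 5 output consumes exactly those L = 12 blocks.
    fold = nn.Fold(output_size=(4, 5), kernel_size=(2, 2))
    print(fold(blocks).shape)  # torch.Size([1, 3, 4, 5])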
Parameters * **output_size** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _or_[tuple](https://docs.python.org/3/library/stdtypes.html#tuple "\(in Python v3.9\)")) – the shape of the spatial dimensions of the output (i.e., `output.sizes()[2:]`) * **kernel_size** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _or_[tuple](https://docs.python.org/3/library/stdtypes.html#tuple "\(in Python v3.9\)")) – the size of the sliding blocks * **stride** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _or_[tuple](https://docs.python.org/3/library/stdtypes.html#tuple "\(in Python v3.9\)")) – the stride of the sliding blocks in the input spatial dimensions. Default: 1 * **padding** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _or_[tuple](https://docs.python.org/3/library/stdtypes.html#tuple "\(in Python v3.9\)") _,__optional_) – implicit zero padding to be added on both sides of input. Default: 0 * **dilation** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _or_[tuple](https://docs.python.org/3/library/stdtypes.html#tuple "\(in Python v3.9\)") _,__optional_) – a parameter that controls the stride of elements within the neighborhood. Default: 1 * If `output_size`, `kernel_size`, `dilation`, `padding` or `stride` is an int or a tuple of length 1 then their values will be replicated across all spatial dimensions. * For the case of two output spatial dimensions this operation is sometimes called `col2im`. Note `Fold` calculates each combined value in the resulting large tensor by summing all values from all containing blocks. [`Unfold`](torch.nn.unfold#torch.nn.Unfold "torch.nn.Unfold") extracts the values in the local blocks by copying from the large tensor. So, if the blocks overlap, they are not inverses of each other. In general, folding and unfolding operations are related as follows. Consider `Fold` and [`Unfold`](torch.nn.unfold#torch.nn.Unfold "torch.nn.Unfold") instances created with the same parameters: >>> fold_params = dict(kernel_size=..., dilation=..., padding=..., stride=...) >>> fold = nn.Fold(output_size=..., **fold_params) >>> unfold = nn.Unfold(**fold_params) Then for any (supported) `input` tensor the following equality holds: fold(unfold(input)) == divisor * input where `divisor` is a tensor that depends only on the shape and dtype of the `input`: >>> input_ones = torch.ones(input.shape, dtype=input.dtype) >>> divisor = fold(unfold(input_ones)) When the `divisor` tensor contains no zero elements, then `fold` and `unfold` operations are inverses of each other (up to constant divisor). Warning Currently, only 4-D output tensors (batched image-like tensors) are supported. Shape: * Input: (N,C×∏(kernel_size),L)(N, C \times \prod(\text{kernel\\_size}), L) * Output: (N,C,output_size[0],output_size[1],…)(N, C, \text{output\\_size}[0], \text{output\\_size}[1], \dots) as described above Examples: >>> fold = nn.Fold(output_size=(4, 5), kernel_size=(2, 2)) >>> input = torch.randn(1, 3 * 2 * 2, 12) >>> output = fold(input) >>> output.size() torch.Size([1, 3, 4, 5]) # FractionalMaxPool2d `class torch.nn.FractionalMaxPool2d(kernel_size, output_size=None, output_ratio=None, return_indices=False, _random_samples=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/pooling.html#FractionalMaxPool2d) Applies a 2D fractional max pooling over an input signal composed of several input planes. 
Fractional MaxPooling is described in detail in the paper [Fractional MaxPooling](https://arxiv.org/abs/1412.6071) by Ben Graham The max-pooling operation is applied in kH×kWkH \times kW regions by a stochastic step size determined by the target output size. The number of output features is equal to the number of input planes. Parameters * **kernel_size** – the size of the window to take a max over. Can be a single number k (for a square kernel of k x k) or a tuple `(kh, kw)` * **output_size** – the target output size of the image of the form `oH x oW`. Can be a tuple `(oH, oW)` or a single number oH for a square image `oH x oH` * **output_ratio** – If one wants to have an output size as a ratio of the input size, this option can be given. This has to be a number or tuple in the range (0, 1) * **return_indices** – if `True`, will return the indices along with the outputs. Useful to pass to `nn.MaxUnpool2d()`. Default: `False` #### Examples >>> # pool of square window of size=3, and target output size 13x12 >>> m = nn.FractionalMaxPool2d(3, output_size=(13, 12)) >>> # pool of square window and target output size being half of input image size >>> m = nn.FractionalMaxPool2d(3, output_ratio=(0.5, 0.5)) >>> input = torch.randn(20, 16, 50, 32) >>> output = m(input) # GaussianNLLLoss `class torch.nn.GaussianNLLLoss(*, full=False, eps=1e-06, reduction='mean')` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/loss.html#GaussianNLLLoss) Gaussian negative log likelihood loss. The targets are treated as samples from Gaussian distributions with expectations and variances predicted by the neural network. For a D-dimensional `target` tensor modelled as having heteroscedastic Gaussian distributions with a D-dimensional tensor of expectations `input` and a D-dimensional tensor of positive variances `var` the loss is: loss=12∑i=1D(log⁡(max(var[i], eps))+(input[i]−target[i])2max(var[i], eps))+const.\text{loss} = \frac{1}{2}\sum_{i=1}^D \left(\log\left(\text{max}\left(\text{var}[i], \ \text{eps}\right)\right) + \frac{\left(\text{input}[i] - \text{target}[i]\right)^2} {\text{max}\left(\text{var}[i], \ \text{eps}\right)}\right) + \text{const.} where `eps` is used for stability. By default, the constant term of the loss function is omitted unless `full` is `True`. If `var` is a scalar (implying `target` tensor has homoscedastic Gaussian distributions) it is broadcasted to be the same size as the input. Parameters * **full** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – include the constant term in the loss calculation. Default: `False`. * **eps** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _,__optional_) – value used to clamp `var` (see note below), for stability. Default: 1e-6. * **reduction** (_string_ _,__optional_) – specifies the reduction to apply to the output:`'none'` | `'mean'` | `'sum'`. `'none'`: no reduction will be applied, `'mean'`: the output is the average of all batch member losses, `'sum'`: the output is the sum of all batch member losses. Default: `'mean'`. Shape: * Input: (N,∗)(N, *) where ∗* means any number of additional dimensions * Target: (N,∗)(N, *) , same shape as the input * Var: (N,1)(N, 1) or (N,∗)(N, *) , same shape as the input * Output: scalar if `reduction` is `'mean'` (default) or `'sum'`. 
If `reduction` is `'none'`, then (N)(N) Examples: >>> loss = nn.GaussianNLLLoss() >>> input = torch.randn(5, 2, requires_grad=True) >>> target = torch.randn(5, 2) >>> var = torch.ones(5, 2, requires_grad=True) #heteroscedastic >>> output = loss(input, target, var) >>> output.backward() >>> loss = nn.GaussianNLLLoss() >>> input = torch.randn(5, 2, requires_grad=True) >>> target = torch.randn(5, 2) >>> var = torch.ones(5, 1, requires_grad=True) #homoscedastic >>> output = loss(input, target, var) >>> output.backward() Note The clamping of `var` is ignored with respect to autograd, and so the gradients are unaffected by it. Reference: Nix, D. A. and Weigend, A. S., “Estimating the mean and variance of the target probability distribution”, Proceedings of 1994 IEEE International Conference on Neural Networks (ICNN’94), Orlando, FL, USA, 1994, pp. 55-60 vol.1, doi: 10.1109/ICNN.1994.374138. # GELU `class torch.nn.GELU` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/activation.html#GELU) Applies the Gaussian Error Linear Units function: GELU(x)=x∗Φ(x)\text{GELU}(x) = x * \Phi(x) where Φ(x)\Phi(x) is the Cumulative Distribution Function for Gaussian Distribution. Shape: * Input: (N,∗)(N, *) where `*` means, any number of additional dimensions * Output: (N,∗)(N, *) , same shape as the input Examples: >>> m = nn.GELU() >>> input = torch.randn(2) >>> output = m(input) # GroupNorm `class torch.nn.GroupNorm(num_groups, num_channels, eps=1e-05, affine=True)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/normalization.html#GroupNorm) Applies Group Normalization over a mini-batch of inputs as described in the paper [Group Normalization](https://arxiv.org/abs/1803.08494) y=x−E[x]Var[x]+ϵ∗γ+βy = \frac{x - \mathrm{E}[x]}{ \sqrt{\mathrm{Var}[x] + \epsilon}} * \gamma + \beta The input channels are separated into `num_groups` groups, each containing `num_channels / num_groups` channels. The mean and standard-deviation are calculated separately over the each group. γ\gamma and β\beta are learnable per-channel affine transform parameter vectors of size `num_channels` if `affine` is `True`. The standard-deviation is calculated via the biased estimator, equivalent to `torch.var(input, unbiased=False)`. This layer uses statistics computed from input data in both training and evaluation modes. Parameters * **num_groups** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – number of groups to separate the channels into * **num_channels** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – number of channels expected in input * **eps** – a value added to the denominator for numerical stability. Default: 1e-5 * **affine** – a boolean value that when set to `True`, this module has learnable per-channel affine parameters initialized to ones (for weights) and zeros (for biases). Default: `True`. 
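The grouping can be made concrete by reproducing the normalization by hand; a sketch with `affine=False` so that only the statistics matter (the sizes are arbitrary):

    import torch
    import torch.nn as nn

    N, C, H, W, groups = 4, 6, 10, 10, 3
    x = torch.randn(N, C, H, W)

    gn = nn.GroupNorm(groups, C, affine=False)

    # Manual computation: mean and (biased) variance are taken over each
    # group of C / groups channels, separately per sample.
    xg = x.view(N, groups, -1)
    mean = xg.mean(dim=-1, keepdim=True)
    var = xg.var(dim=-1, unbiased=False, keepdim=True)
    manual = ((xg - mean) / torch.sqrt(var + gn.eps)).view(N, C, H, W)

    assert torch.allclose(gn(x), manual, atol=1e-5)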
Shape: * Input: (N,C,∗)(N, C, *) where C=num_channelsC=\text{num\\_channels} * Output: (N,C,∗)(N, C, *) (same shape as input) Examples: >>> input = torch.randn(20, 6, 10, 10) >>> # Separate 6 channels into 3 groups >>> m = nn.GroupNorm(3, 6) >>> # Separate 6 channels into 6 groups (equivalent with InstanceNorm) >>> m = nn.GroupNorm(6, 6) >>> # Put all 6 channels into a single group (equivalent with LayerNorm) >>> m = nn.GroupNorm(1, 6) >>> # Activating the module >>> output = m(input) # GRU `class torch.nn.GRU(*args, **kwargs)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/rnn.html#GRU) Applies a multi-layer gated recurrent unit (GRU) RNN to an input sequence. For each element in the input sequence, each layer computes the following function: rt=σ(Wirxt+bir+Whrh(t−1)+bhr)zt=σ(Wizxt+biz+Whzh(t−1)+bhz)nt=tanh⁡(Winxt+bin+rt∗(Whnh(t−1)+bhn))ht=(1−zt)∗nt+zt∗h(t−1)\begin{array}{ll} r_t = \sigma(W_{ir} x_t + b_{ir} + W_{hr} h_{(t-1)} + b_{hr}) \\\ z_t = \sigma(W_{iz} x_t + b_{iz} + W_{hz} h_{(t-1)} + b_{hz}) \\\ n_t = \tanh(W_{in} x_t + b_{in} + r_t * (W_{hn} h_{(t-1)}+ b_{hn})) \\\ h_t = (1 - z_t) * n_t + z_t * h_{(t-1)} \end{array} where hth_t is the hidden state at time `t`, xtx_t is the input at time `t`, h(t−1)h_{(t-1)} is the hidden state of the layer at time `t-1` or the initial hidden state at time `0`, and rtr_t , ztz_t , ntn_t are the reset, update, and new gates, respectively. σ\sigma is the sigmoid function, and ∗* is the Hadamard product. In a multilayer GRU, the input xt(l)x^{(l)}_t of the ll -th layer (l>=2l >= 2 ) is the hidden state ht(l−1)h^{(l-1)}_t of the previous layer multiplied by dropout δt(l−1)\delta^{(l-1)}_t where each δt(l−1)\delta^{(l-1)}_t is a Bernoulli random variable which is 00 with probability `dropout`. Parameters * **input_size** – The number of expected features in the input `x` * **hidden_size** – The number of features in the hidden state `h` * **num_layers** – Number of recurrent layers. E.g., setting `num_layers=2` would mean stacking two GRUs together to form a `stacked GRU`, with the second GRU taking in outputs of the first GRU and computing the final results. Default: 1 * **bias** – If `False`, then the layer does not use bias weights `b_ih` and `b_hh`. Default: `True` * **batch_first** – If `True`, then the input and output tensors are provided as (batch, seq, feature). Default: `False` * **dropout** – If non-zero, introduces a `Dropout` layer on the outputs of each GRU layer except the last layer, with dropout probability equal to `dropout`. Default: 0 * **bidirectional** – If `True`, becomes a bidirectional GRU. Default: `False` Inputs: input, h_0 * **input** of shape `(seq_len, batch, input_size)`: tensor containing the features of the input sequence. The input can also be a packed variable length sequence. See [`torch.nn.utils.rnn.pack_padded_sequence()`](torch.nn.utils.rnn.pack_padded_sequence#torch.nn.utils.rnn.pack_padded_sequence "torch.nn.utils.rnn.pack_padded_sequence") for details. * **h_0** of shape `(num_layers * num_directions, batch, hidden_size)`: tensor containing the initial hidden state for each element in the batch. Defaults to zero if not provided. If the RNN is bidirectional, num_directions should be 2, else it should be 1. Outputs: output, h_n * **output** of shape `(seq_len, batch, num_directions * hidden_size)`: tensor containing the output features h_t from the last layer of the GRU, for each `t`. 
If a [`torch.nn.utils.rnn.PackedSequence`](torch.nn.utils.rnn.packedsequence#torch.nn.utils.rnn.PackedSequence "torch.nn.utils.rnn.PackedSequence") has been given as the input, the output will also be a packed sequence. For the unpacked case, the directions can be separated using `output.view(seq_len, batch, num_directions, hidden_size)`, with forward and backward being direction `0` and `1` respectively. Similarly, the directions can be separated in the packed case. * **h_n** of shape `(num_layers * num_directions, batch, hidden_size)`: tensor containing the hidden state for `t = seq_len` Like _output_ , the layers can be separated using `h_n.view(num_layers, num_directions, batch, hidden_size)`. Shape: * Input1: (L,N,Hin)(L, N, H_{in}) tensor containing input features where Hin=input_sizeH_{in}=\text{input\\_size} and `L` represents a sequence length. * Input2: (S,N,Hout)(S, N, H_{out}) tensor containing the initial hidden state for each element in the batch. Hout=hidden_sizeH_{out}=\text{hidden\\_size} Defaults to zero if not provided. where S=num_layers∗num_directionsS=\text{num\\_layers} * \text{num\\_directions} If the RNN is bidirectional, num_directions should be 2, else it should be 1. * Output1: (L,N,Hall)(L, N, H_{all}) where Hall=num_directions∗hidden_sizeH_{all}=\text{num\\_directions} * \text{hidden\\_size} * Output2: (S,N,Hout)(S, N, H_{out}) tensor containing the next hidden state for each element in the batch Variables * **~GRU.weight_ih_l[k]** – the learnable input-hidden weights of the kth\text{k}^{th} layer (W_ir|W_iz|W_in), of shape `(3*hidden_size, input_size)` for `k = 0`. Otherwise, the shape is `(3*hidden_size, num_directions * hidden_size)` * **~GRU.weight_hh_l[k]** – the learnable hidden-hidden weights of the kth\text{k}^{th} layer (W_hr|W_hz|W_hn), of shape `(3*hidden_size, hidden_size)` * **~GRU.bias_ih_l[k]** – the learnable input-hidden bias of the kth\text{k}^{th} layer (b_ir|b_iz|b_in), of shape `(3*hidden_size)` * **~GRU.bias_hh_l[k]** – the learnable hidden-hidden bias of the kth\text{k}^{th} layer (b_hr|b_hz|b_hn), of shape `(3*hidden_size)` Note All the weights and biases are initialized from U(−k,k)\mathcal{U}(-\sqrt{k}, \sqrt{k}) where k=1hidden_sizek = \frac{1}{\text{hidden\\_size}} Orphan Note If the following conditions are satisfied: 1) cudnn is enabled, 2) input data is on the GPU 3) input data has dtype `torch.float16` 4) V100 GPU is used, 5) input data is not in `PackedSequence` format persistent algorithm can be selected to improve performance. Examples: >>> rnn = nn.GRU(10, 20, 2) >>> input = torch.randn(5, 3, 10) >>> h0 = torch.randn(2, 3, 20) >>> output, hn = rnn(input, h0) # GRUCell `class torch.nn.GRUCell(input_size, hidden_size, bias=True)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/rnn.html#GRUCell) A gated recurrent unit (GRU) cell r=σ(Wirx+bir+Whrh+bhr)z=σ(Wizx+biz+Whzh+bhz)n=tanh⁡(Winx+bin+r∗(Whnh+bhn))h′=(1−z)∗n+z∗h\begin{array}{ll} r = \sigma(W_{ir} x + b_{ir} + W_{hr} h + b_{hr}) \\\ z = \sigma(W_{iz} x + b_{iz} + W_{hz} h + b_{hz}) \\\ n = \tanh(W_{in} x + b_{in} + r * (W_{hn} h + b_{hn})) \\\ h' = (1 - z) * n + z * h \end{array} where σ\sigma is the sigmoid function, and ∗* is the Hadamard product. Parameters * **input_size** – The number of expected features in the input `x` * **hidden_size** – The number of features in the hidden state `h` * **bias** – If `False`, then the layer does not use bias weights `b_ih` and `b_hh`. 
Default: `True` Inputs: input, hidden * **input** of shape `(batch, input_size)`: tensor containing input features * **hidden** of shape `(batch, hidden_size)`: tensor containing the initial hidden state for each element in the batch. Defaults to zero if not provided. Outputs: h’ * **h’** of shape `(batch, hidden_size)`: tensor containing the next hidden state for each element in the batch Shape: * Input1: (N,Hin)(N, H_{in}) tensor containing input features where HinH_{in} = `input_size` * Input2: (N,Hout)(N, H_{out}) tensor containing the initial hidden state for each element in the batch where HoutH_{out} = `hidden_size` Defaults to zero if not provided. * Output: (N,Hout)(N, H_{out}) tensor containing the next hidden state for each element in the batch Variables * **~GRUCell.weight_ih** – the learnable input-hidden weights, of shape `(3*hidden_size, input_size)` * **~GRUCell.weight_hh** – the learnable hidden-hidden weights, of shape `(3*hidden_size, hidden_size)` * **~GRUCell.bias_ih** – the learnable input-hidden bias, of shape `(3*hidden_size)` * **~GRUCell.bias_hh** – the learnable hidden-hidden bias, of shape `(3*hidden_size)` Note All the weights and biases are initialized from U(−k,k)\mathcal{U}(-\sqrt{k}, \sqrt{k}) where k=1hidden_sizek = \frac{1}{\text{hidden\\_size}} Examples: >>> rnn = nn.GRUCell(10, 20) >>> input = torch.randn(6, 3, 10) >>> hx = torch.randn(3, 20) >>> output = [] >>> for i in range(6): hx = rnn(input[i], hx) output.append(hx) # Hardshrink `class torch.nn.Hardshrink(lambd=0.5)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/activation.html#Hardshrink) Applies the hard shrinkage function element-wise: HardShrink(x)={x, if x>λx, if x<−λ0, otherwise \text{HardShrink}(x) = \begin{cases} x, & \text{ if } x > \lambda \\\ x, & \text{ if } x < -\lambda \\\ 0, & \text{ otherwise } \end{cases} Parameters **lambd** – the λ\lambda value for the Hardshrink formulation. Default: 0.5 Shape: * Input: (N,∗)(N, *) where `*` means, any number of additional dimensions * Output: (N,∗)(N, *) , same shape as the input Examples: >>> m = nn.Hardshrink() >>> input = torch.randn(2) >>> output = m(input) # Hardsigmoid `class torch.nn.Hardsigmoid(inplace=False)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/activation.html#Hardsigmoid) Applies the element-wise function: Hardsigmoid(x)={0if x≤−3,1if x≥+3,x/6+1/2otherwise\text{Hardsigmoid}(x) = \begin{cases} 0 & \text{if~} x \le -3, \\\ 1 & \text{if~} x \ge +3, \\\ x / 6 + 1 / 2 & \text{otherwise} \end{cases} Parameters **inplace** – can optionally do the operation in-place. Default: `False` Shape: * Input: (N,∗)(N, *) where `*` means, any number of additional dimensions * Output: (N,∗)(N, *) , same shape as the input Examples: >>> m = nn.Hardsigmoid() >>> input = torch.randn(2) >>> output = m(input) # Hardswish `class torch.nn.Hardswish(inplace=False)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/activation.html#Hardswish) Applies the hardswish function, element-wise, as described in the paper: [Searching for MobileNetV3](https://arxiv.org/abs/1905.02244). Hardswish(x)={0if x≤−3,xif x≥+3,x⋅(x+3)/6otherwise\text{Hardswish}(x) = \begin{cases} 0 & \text{if~} x \le -3, \\\ x & \text{if~} x \ge +3, \\\ x \cdot (x + 3) /6 & \text{otherwise} \end{cases} Parameters **inplace** – can optionally do the operation in-place. 
Default: `False` Shape: * Input: (N,∗)(N, *) where `*` means, any number of additional dimensions * Output: (N,∗)(N, *) , same shape as the input Examples: >>> m = nn.Hardswish() >>> input = torch.randn(2) >>> output = m(input) # Hardtanh `class torch.nn.Hardtanh(min_val=-1.0, max_val=1.0, inplace=False, min_value=None, max_value=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/activation.html#Hardtanh) Applies the HardTanh function element-wise HardTanh is defined as: HardTanh(x)={1 if x>1−1 if x<−1x otherwise \text{HardTanh}(x) = \begin{cases} 1 & \text{ if } x > 1 \\\ -1 & \text{ if } x < -1 \\\ x & \text{ otherwise } \\\ \end{cases} The range of the linear region [−1,1][-1, 1] can be adjusted using `min_val` and `max_val`. Parameters * **min_val** – minimum value of the linear region range. Default: -1 * **max_val** – maximum value of the linear region range. Default: 1 * **inplace** – can optionally do the operation in-place. Default: `False` Keyword arguments `min_value` and `max_value` have been deprecated in favor of `min_val` and `max_val`. Shape: * Input: (N,∗)(N, *) where `*` means, any number of additional dimensions * Output: (N,∗)(N, *) , same shape as the input Examples: >>> m = nn.Hardtanh(-2, 2) >>> input = torch.randn(2) >>> output = m(input) # HingeEmbeddingLoss `class torch.nn.HingeEmbeddingLoss(margin=1.0, size_average=None, reduce=None, reduction='mean')` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/loss.html#HingeEmbeddingLoss) Measures the loss given an input tensor xx and a labels tensor yy (containing 1 or -1). This is usually used for measuring whether two inputs are similar or dissimilar, e.g. using the L1 pairwise distance as xx , and is typically used for learning nonlinear embeddings or semi-supervised learning. The loss function for nn -th sample in the mini-batch is ln={xn,ifyn=1,max⁡{0,Δ−xn},ifyn=−1,l_n = \begin{cases} x_n, & \text{if}\; y_n = 1,\\\ \max \\{0, \Delta - x_n\\}, & \text{if}\; y_n = -1, \end{cases} and the total loss functions is ℓ(x,y)={mean⁡(L),if reduction=‘mean’;sum⁡(L),if reduction=‘sum’.\ell(x, y) = \begin{cases} \operatorname{mean}(L), & \text{if reduction} = \text{`mean';}\\\ \operatorname{sum}(L), & \text{if reduction} = \text{`sum'.} \end{cases} where L={l1,…,lN}⊤L = \\{l_1,\dots,l_N\\}^\top . Parameters * **margin** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _,__optional_) – Has a default value of `1`. * **size_average** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – Deprecated (see `reduction`). By default, the losses are averaged over each loss element in the batch. Note that for some losses, there are multiple elements per sample. If the field `size_average` is set to `False`, the losses are instead summed for each minibatch. Ignored when `reduce` is `False`. Default: `True` * **reduce** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – Deprecated (see `reduction`). By default, the losses are averaged or summed over observations for each minibatch depending on `size_average`. When `reduce` is `False`, returns a loss per batch element instead and ignores `size_average`. Default: `True` * **reduction** (_string_ _,__optional_) – Specifies the reduction to apply to the output: `'none'` | `'mean'` | `'sum'`. 
`'none'`: no reduction will be applied, `'mean'`: the sum of the output will be divided by the number of elements in the output, `'sum'`: the output will be summed. Note: `size_average` and `reduce` are in the process of being deprecated, and in the meantime, specifying either of those two args will override `reduction`. Default: `'mean'` Shape: * Input: (∗)(*) where ∗* means, any number of dimensions. The sum operation operates over all the elements. * Target: (∗)(*) , same shape as the input * Output: scalar. If `reduction` is `'none'`, then same shape as the input # Identity `class torch.nn.Identity(*args, **kwargs)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/linear.html#Identity) A placeholder identity operator that is argument-insensitive. Parameters * **args** – any argument (unused) * **kwargs** – any keyword argument (unused) Examples: >>> m = nn.Identity(54, unused_argument1=0.1, unused_argument2=False) >>> input = torch.randn(128, 20) >>> output = m(input) >>> print(output.size()) torch.Size([128, 20]) # InstanceNorm1d `class torch.nn.InstanceNorm1d(num_features, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/instancenorm.html#InstanceNorm1d) Applies Instance Normalization over a 3D input (a mini-batch of 1D inputs with optional additional channel dimension) as described in the paper [Instance Normalization: The Missing Ingredient for Fast Stylization](https://arxiv.org/abs/1607.08022). y=x−E[x]Var[x]+ϵ∗γ+βy = \frac{x - \mathrm{E}[x]}{ \sqrt{\mathrm{Var}[x] + \epsilon}} * \gamma + \beta The mean and standard-deviation are calculated per-dimension separately for each object in a mini-batch. γ\gamma and β\beta are learnable parameter vectors of size `C` (where `C` is the input size) if `affine` is `True`. The standard-deviation is calculated via the biased estimator, equivalent to `torch.var(input, unbiased=False)`. By default, this layer uses instance statistics computed from input data in both training and evaluation modes. If `track_running_stats` is set to `True`, during training this layer keeps running estimates of its computed mean and variance, which are then used for normalization during evaluation. The running estimates are kept with a default `momentum` of 0.1. Note This `momentum` argument is different from one used in optimizer classes and the conventional notion of momentum. Mathematically, the update rule for running statistics here is x^new=(1−momentum)×x^+momentum×xt\hat{x}_\text{new} = (1 - \text{momentum}) \times \hat{x} + \text{momentum} \times x_t , where x^\hat{x} is the estimated statistic and xtx_t is the new observed value. Note `InstanceNorm1d` and [`LayerNorm`](torch.nn.layernorm#torch.nn.LayerNorm "torch.nn.LayerNorm") are very similar, but have some subtle differences. `InstanceNorm1d` is applied on each channel of channeled data like multidimensional time series, but [`LayerNorm`](torch.nn.layernorm#torch.nn.LayerNorm "torch.nn.LayerNorm") is usually applied on entire sample and often in NLP tasks. Additionally, [`LayerNorm`](torch.nn.layernorm#torch.nn.LayerNorm "torch.nn.LayerNorm") applies elementwise affine transform, while `InstanceNorm1d` usually don’t apply affine transform. Parameters * **num_features** – CC from an expected input of size (N,C,L)(N, C, L) or LL from input of size (N,L)(N, L) * **eps** – a value added to the denominator for numerical stability. 
Default: 1e-5 * **momentum** – the value used for the running_mean and running_var computation. Default: 0.1 * **affine** – a boolean value that when set to `True`, this module has learnable affine parameters, initialized the same way as done for batch normalization. Default: `False`. * **track_running_stats** – a boolean value that when set to `True`, this module tracks the running mean and variance, and when set to `False`, this module does not track such statistics and always uses batch statistics in both training and eval modes. Default: `False` Shape: * Input: (N,C,L)(N, C, L) * Output: (N,C,L)(N, C, L) (same shape as input) Examples: >>> # Without Learnable Parameters >>> m = nn.InstanceNorm1d(100) >>> # With Learnable Parameters >>> m = nn.InstanceNorm1d(100, affine=True) >>> input = torch.randn(20, 100, 40) >>> output = m(input) # InstanceNorm2d `class torch.nn.InstanceNorm2d(num_features, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/instancenorm.html#InstanceNorm2d) Applies Instance Normalization over a 4D input (a mini-batch of 2D inputs with additional channel dimension) as described in the paper [Instance Normalization: The Missing Ingredient for Fast Stylization](https://arxiv.org/abs/1607.08022). y=x−E[x]Var[x]+ϵ∗γ+βy = \frac{x - \mathrm{E}[x]}{ \sqrt{\mathrm{Var}[x] + \epsilon}} * \gamma + \beta The mean and standard-deviation are calculated per-dimension separately for each object in a mini-batch. γ\gamma and β\beta are learnable parameter vectors of size `C` (where `C` is the input size) if `affine` is `True`. The standard-deviation is calculated via the biased estimator, equivalent to `torch.var(input, unbiased=False)`. By default, this layer uses instance statistics computed from input data in both training and evaluation modes. If `track_running_stats` is set to `True`, during training this layer keeps running estimates of its computed mean and variance, which are then used for normalization during evaluation. The running estimates are kept with a default `momentum` of 0.1. Note This `momentum` argument is different from one used in optimizer classes and the conventional notion of momentum. Mathematically, the update rule for running statistics here is x^new=(1−momentum)×x^+momentum×xt\hat{x}_\text{new} = (1 - \text{momentum}) \times \hat{x} + \text{momentum} \times x_t , where x^\hat{x} is the estimated statistic and xtx_t is the new observed value. Note `InstanceNorm2d` and [`LayerNorm`](torch.nn.layernorm#torch.nn.LayerNorm "torch.nn.LayerNorm") are very similar, but have some subtle differences. `InstanceNorm2d` is applied on each channel of channeled data like RGB images, but [`LayerNorm`](torch.nn.layernorm#torch.nn.LayerNorm "torch.nn.LayerNorm") is usually applied on entire sample and often in NLP tasks. Additionally, [`LayerNorm`](torch.nn.layernorm#torch.nn.LayerNorm "torch.nn.LayerNorm") applies elementwise affine transform, while `InstanceNorm2d` usually don’t apply affine transform. Parameters * **num_features** – CC from an expected input of size (N,C,H,W)(N, C, H, W) * **eps** – a value added to the denominator for numerical stability. Default: 1e-5 * **momentum** – the value used for the running_mean and running_var computation. Default: 0.1 * **affine** – a boolean value that when set to `True`, this module has learnable affine parameters, initialized the same way as done for batch normalization. Default: `False`. 
* **track_running_stats** – a boolean value that when set to `True`, this module tracks the running mean and variance, and when set to `False`, this module does not track such statistics and always uses batch statistics in both training and eval modes. Default: `False` Shape: * Input: (N,C,H,W)(N, C, H, W) * Output: (N,C,H,W)(N, C, H, W) (same shape as input) Examples: >>> # Without Learnable Parameters >>> m = nn.InstanceNorm2d(100) >>> # With Learnable Parameters >>> m = nn.InstanceNorm2d(100, affine=True) >>> input = torch.randn(20, 100, 35, 45) >>> output = m(input) # InstanceNorm3d `class torch.nn.InstanceNorm3d(num_features, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/instancenorm.html#InstanceNorm3d) Applies Instance Normalization over a 5D input (a mini-batch of 3D inputs with additional channel dimension) as described in the paper [Instance Normalization: The Missing Ingredient for Fast Stylization](https://arxiv.org/abs/1607.08022). y=x−E[x]Var[x]+ϵ∗γ+βy = \frac{x - \mathrm{E}[x]}{ \sqrt{\mathrm{Var}[x] + \epsilon}} * \gamma + \beta The mean and standard-deviation are calculated per-dimension separately for each object in a mini-batch. γ\gamma and β\beta are learnable parameter vectors of size C (where C is the input size) if `affine` is `True`. The standard-deviation is calculated via the biased estimator, equivalent to `torch.var(input, unbiased=False)`. By default, this layer uses instance statistics computed from input data in both training and evaluation modes. If `track_running_stats` is set to `True`, during training this layer keeps running estimates of its computed mean and variance, which are then used for normalization during evaluation. The running estimates are kept with a default `momentum` of 0.1. Note This `momentum` argument is different from one used in optimizer classes and the conventional notion of momentum. Mathematically, the update rule for running statistics here is x^new=(1−momentum)×x^+momentum×xt\hat{x}_\text{new} = (1 - \text{momentum}) \times \hat{x} + \text{momentum} \times x_t , where x^\hat{x} is the estimated statistic and xtx_t is the new observed value. Note `InstanceNorm3d` and [`LayerNorm`](torch.nn.layernorm#torch.nn.LayerNorm "torch.nn.LayerNorm") are very similar, but have some subtle differences. `InstanceNorm3d` is applied on each channel of channeled data like 3D models with RGB color, but [`LayerNorm`](torch.nn.layernorm#torch.nn.LayerNorm "torch.nn.LayerNorm") is usually applied on entire sample and often in NLP tasks. Additionally, [`LayerNorm`](torch.nn.layernorm#torch.nn.LayerNorm "torch.nn.LayerNorm") applies elementwise affine transform, while `InstanceNorm3d` usually don’t apply affine transform. Parameters * **num_features** – CC from an expected input of size (N,C,D,H,W)(N, C, D, H, W) * **eps** – a value added to the denominator for numerical stability. Default: 1e-5 * **momentum** – the value used for the running_mean and running_var computation. Default: 0.1 * **affine** – a boolean value that when set to `True`, this module has learnable affine parameters, initialized the same way as done for batch normalization. Default: `False`. * **track_running_stats** – a boolean value that when set to `True`, this module tracks the running mean and variance, and when set to `False`, this module does not track such statistics and always uses batch statistics in both training and eval modes. 
Default: `False` Shape: * Input: (N,C,D,H,W)(N, C, D, H, W) * Output: (N,C,D,H,W)(N, C, D, H, W) (same shape as input) Examples: >>> # Without Learnable Parameters >>> m = nn.InstanceNorm3d(100) >>> # With Learnable Parameters >>> m = nn.InstanceNorm3d(100, affine=True) >>> input = torch.randn(20, 100, 35, 45, 10) >>> output = m(input) # KLDivLoss `class torch.nn.KLDivLoss(size_average=None, reduce=None, reduction='mean', log_target=False)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/loss.html#KLDivLoss) The Kullback-Leibler divergence loss measure [Kullback-Leibler divergence](https://en.wikipedia.org/wiki/Kullback- Leibler_divergence) is a useful distance measure for continuous distributions and is often useful when performing direct regression over the space of (discretely sampled) continuous output distributions. As with [`NLLLoss`](torch.nn.nllloss#torch.nn.NLLLoss "torch.nn.NLLLoss"), the `input` given is expected to contain _log-probabilities_ and is not restricted to a 2D Tensor. The targets are interpreted as _probabilities_ by default, but could be considered as _log-probabilities_ with `log_target` set to `True`. This criterion expects a `target` `Tensor` of the same size as the `input` `Tensor`. The unreduced (i.e. with `reduction` set to `'none'`) loss can be described as: l(x,y)=L={l1,…,lN},ln=yn⋅(log⁡yn−xn)l(x,y) = L = \\{ l_1,\dots,l_N \\}, \quad l_n = y_n \cdot \left( \log y_n - x_n \right) where the index NN spans all dimensions of `input` and LL has the same shape as `input`. If `reduction` is not `'none'` (default `'mean'`), then: ℓ(x,y)={mean⁡(L),if reduction=‘mean’;sum⁡(L),if reduction=‘sum’.\ell(x, y) = \begin{cases} \operatorname{mean}(L), & \text{if reduction} = \text{`mean';} \\\ \operatorname{sum}(L), & \text{if reduction} = \text{`sum'.} \end{cases} In default `reduction` mode `'mean'`, the losses are averaged for each minibatch over observations **as well as** over dimensions. `'batchmean'` mode gives the correct KL divergence where losses are averaged over batch dimension only. `'mean'` mode’s behavior will be changed to the same as `'batchmean'` in the next major release. Parameters * **size_average** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – Deprecated (see `reduction`). By default, the losses are averaged over each loss element in the batch. Note that for some losses, there are multiple elements per sample. If the field `size_average` is set to `False`, the losses are instead summed for each minibatch. Ignored when `reduce` is `False`. Default: `True` * **reduce** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – Deprecated (see `reduction`). By default, the losses are averaged or summed over observations for each minibatch depending on `size_average`. When `reduce` is `False`, returns a loss per batch element instead and ignores `size_average`. Default: `True` * **reduction** (_string_ _,__optional_) – Specifies the reduction to apply to the output: `'none'` | `'batchmean'` | `'sum'` | `'mean'`. `'none'`: no reduction will be applied. `'batchmean'`: the sum of the output will be divided by batchsize. `'sum'`: the output will be summed. `'mean'`: the output will be divided by the number of elements in the output. Default: `'mean'` * **log_target** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – Specifies whether `target` is passed in the log space. 
Default: `False`

Note

`size_average` and `reduce` are in the process of being deprecated, and in the meantime, specifying either of those two args will override `reduction`.

Note

`reduction` = `'mean'` does not return the true KL divergence value; please use `reduction` = `'batchmean'`, which aligns with the mathematical definition of KL divergence. In the next major release, `'mean'` will be changed to be the same as `'batchmean'`.

Shape:

* Input: (N, *) where * means any number of additional dimensions
* Target: (N, *), same shape as the input
* Output: scalar by default. If `reduction` is `'none'`, then (N, *), the same shape as the input

# L1Loss

`class torch.nn.L1Loss(size_average=None, reduce=None, reduction='mean')` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/loss.html#L1Loss)

Creates a criterion that measures the mean absolute error (MAE) between each element in the input x and target y.

The unreduced (i.e. with `reduction` set to `'none'`) loss can be described as:

\ell(x, y) = L = \{l_1, \dots, l_N\}^\top, \quad l_n = \left| x_n - y_n \right|,

where N is the batch size. If `reduction` is not `'none'` (default `'mean'`), then:

\ell(x, y) = \begin{cases} \operatorname{mean}(L), & \text{if reduction} = \text{'mean';} \\ \operatorname{sum}(L), & \text{if reduction} = \text{'sum'.} \end{cases}

x and y are tensors of arbitrary shapes with a total of n elements each.

The sum operation still operates over all the elements, and divides by n. The division by n can be avoided if one sets `reduction = 'sum'`.

Supports real-valued and complex-valued inputs.

Parameters

* **size_average** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)"), optional) – Deprecated (see `reduction`). By default, the losses are averaged over each loss element in the batch. Note that for some losses, there are multiple elements per sample. If the field `size_average` is set to `False`, the losses are instead summed for each minibatch. Ignored when `reduce` is `False`. Default: `True`
* **reduce** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)"), optional) – Deprecated (see `reduction`). By default, the losses are averaged or summed over observations for each minibatch depending on `size_average`. When `reduce` is `False`, returns a loss per batch element instead and ignores `size_average`. Default: `True`
* **reduction** (string, optional) – Specifies the reduction to apply to the output: `'none'` | `'mean'` | `'sum'`. `'none'`: no reduction will be applied, `'mean'`: the sum of the output will be divided by the number of elements in the output, `'sum'`: the output will be summed. Note: `size_average` and `reduce` are in the process of being deprecated, and in the meantime, specifying either of those two args will override `reduction`. Default: `'mean'`

Shape:

* Input: (N, *) where * means any number of additional dimensions
* Target: (N, *), same shape as the input
* Output: scalar.
If `reduction` is `'none'`, then (N,∗)(N, *) , same shape as the input Examples: >>> loss = nn.L1Loss() >>> input = torch.randn(3, 5, requires_grad=True) >>> target = torch.randn(3, 5) >>> output = loss(input, target) >>> output.backward() # LayerNorm `class torch.nn.LayerNorm(normalized_shape, eps=1e-05, elementwise_affine=True)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/normalization.html#LayerNorm) Applies Layer Normalization over a mini-batch of inputs as described in the paper [Layer Normalization](https://arxiv.org/abs/1607.06450) y=x−E[x]Var[x]+ϵ∗γ+βy = \frac{x - \mathrm{E}[x]}{ \sqrt{\mathrm{Var}[x] + \epsilon}} * \gamma + \beta The mean and standard-deviation are calculated separately over the last certain number dimensions which have to be of the shape specified by `normalized_shape`. γ\gamma and β\beta are learnable affine transform parameters of `normalized_shape` if `elementwise_affine` is `True`. The standard-deviation is calculated via the biased estimator, equivalent to `torch.var(input, unbiased=False)`. Note Unlike Batch Normalization and Instance Normalization, which applies scalar scale and bias for each entire channel/plane with the `affine` option, Layer Normalization applies per-element scale and bias with `elementwise_affine`. This layer uses statistics computed from input data in both training and evaluation modes. Parameters * **normalized_shape** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _or_[list](https://docs.python.org/3/library/stdtypes.html#list "\(in Python v3.9\)") _or_ _torch.Size_) – input shape from an expected input of size [∗×normalized_shape[0]×normalized_shape[1]×…×normalized_shape[−1]][* \times \text{normalized\\_shape}[0] \times \text{normalized\\_shape}[1] \times \ldots \times \text{normalized\\_shape}[-1]] If a single integer is used, it is treated as a singleton list, and this module will normalize over the last dimension which is expected to be of that specific size. * **eps** – a value added to the denominator for numerical stability. Default: 1e-5 * **elementwise_affine** – a boolean value that when set to `True`, this module has learnable per-element affine parameters initialized to ones (for weights) and zeros (for biases). Default: `True`. Shape: * Input: (N,∗)(N, *) * Output: (N,∗)(N, *) (same shape as input) Examples: >>> input = torch.randn(20, 5, 10, 10) >>> # With Learnable Parameters >>> m = nn.LayerNorm(input.size()[1:]) >>> # Without Learnable Parameters >>> m = nn.LayerNorm(input.size()[1:], elementwise_affine=False) >>> # Normalize over last two dimensions >>> m = nn.LayerNorm([10, 10]) >>> # Normalize over last dimension of size 10 >>> m = nn.LayerNorm(10) >>> # Activating the module >>> output = m(input) # LazyConv1d `class torch.nn.LazyConv1d(out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, bias=True, padding_mode='zeros')` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/conv.html#LazyConv1d) A [`torch.nn.Conv1d`](torch.nn.conv1d#torch.nn.Conv1d "torch.nn.Conv1d") module with lazy initialization of the `in_channels` argument of the [`Conv1d`](torch.nn.conv1d#torch.nn.Conv1d "torch.nn.Conv1d") that is inferred from the `input.size(1)`. 
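For illustration, a minimal usage sketch of the lazy `in_channels` inference (the batch size, channel count, and length below are arbitrary; only `input.size(1)` determines the inferred `in_channels`):

>>> import torch
>>> import torch.nn as nn
>>> conv = nn.LazyConv1d(out_channels=32, kernel_size=3)  # in_channels left unspecified
>>> x = torch.randn(8, 16, 50)   # (batch, channels, length); 16 is inferred as in_channels
>>> y = conv(x)                  # the first forward call materializes the parameters
>>> y.shape
torch.Size([8, 32, 48])
>>> conv.weight.shape            # now a regular Conv1d weight of shape (out, in, kernel)
torch.Size([32, 16, 3])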
Parameters * **out_channels** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – Number of channels produced by the convolution * **kernel_size** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _or_[tuple](https://docs.python.org/3/library/stdtypes.html#tuple "\(in Python v3.9\)")) – Size of the convolving kernel * **stride** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _or_[tuple](https://docs.python.org/3/library/stdtypes.html#tuple "\(in Python v3.9\)") _,__optional_) – Stride of the convolution. Default: 1 * **padding** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _or_[tuple](https://docs.python.org/3/library/stdtypes.html#tuple "\(in Python v3.9\)") _,__optional_) – Zero-padding added to both sides of the input. Default: 0 * **padding_mode** (_string_ _,__optional_) – `'zeros'`, `'reflect'`, `'replicate'` or `'circular'`. Default: `'zeros'` * **dilation** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _or_[tuple](https://docs.python.org/3/library/stdtypes.html#tuple "\(in Python v3.9\)") _,__optional_) – Spacing between kernel elements. Default: 1 * **groups** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – Number of blocked connections from input channels to output channels. Default: 1 * **bias** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – If `True`, adds a learnable bias to the output. Default: `True` See also [`torch.nn.Conv1d`](torch.nn.conv1d#torch.nn.Conv1d "torch.nn.Conv1d") and [`torch.nn.modules.lazy.LazyModuleMixin`](torch.nn.modules.lazy.lazymodulemixin#torch.nn.modules.lazy.LazyModuleMixin "torch.nn.modules.lazy.LazyModuleMixin") `cls_to_become` alias of [`Conv1d`](torch.nn.conv1d#torch.nn.Conv1d "torch.nn.Conv1d") # LazyConv2d `class torch.nn.LazyConv2d(out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, bias=True, padding_mode='zeros')` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/conv.html#LazyConv2d) A [`torch.nn.Conv2d`](torch.nn.conv2d#torch.nn.Conv2d "torch.nn.Conv2d") module with lazy initialization of the `in_channels` argument of the [`Conv2d`](torch.nn.conv2d#torch.nn.Conv2d "torch.nn.Conv2d") that is inferred from the `input.size(1)`. Parameters * **out_channels** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – Number of channels produced by the convolution * **kernel_size** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _or_[tuple](https://docs.python.org/3/library/stdtypes.html#tuple "\(in Python v3.9\)")) – Size of the convolving kernel * **stride** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _or_[tuple](https://docs.python.org/3/library/stdtypes.html#tuple "\(in Python v3.9\)") _,__optional_) – Stride of the convolution. Default: 1 * **padding** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _or_[tuple](https://docs.python.org/3/library/stdtypes.html#tuple "\(in Python v3.9\)") _,__optional_) – Zero-padding added to both sides of the input. Default: 0 * **padding_mode** (_string_ _,__optional_) – `'zeros'`, `'reflect'`, `'replicate'` or `'circular'`. 
Default: `'zeros'` * **dilation** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _or_[tuple](https://docs.python.org/3/library/stdtypes.html#tuple "\(in Python v3.9\)") _,__optional_) – Spacing between kernel elements. Default: 1 * **groups** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – Number of blocked connections from input channels to output channels. Default: 1 * **bias** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – If `True`, adds a learnable bias to the output. Default: `True` See also [`torch.nn.Conv2d`](torch.nn.conv2d#torch.nn.Conv2d "torch.nn.Conv2d") and [`torch.nn.modules.lazy.LazyModuleMixin`](torch.nn.modules.lazy.lazymodulemixin#torch.nn.modules.lazy.LazyModuleMixin "torch.nn.modules.lazy.LazyModuleMixin") `cls_to_become` alias of [`Conv2d`](torch.nn.conv2d#torch.nn.Conv2d "torch.nn.Conv2d") # LazyConv3d `class torch.nn.LazyConv3d(out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, bias=True, padding_mode='zeros')` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/conv.html#LazyConv3d) A [`torch.nn.Conv3d`](torch.nn.conv3d#torch.nn.Conv3d "torch.nn.Conv3d") module with lazy initialization of the `in_channels` argument of the [`Conv3d`](torch.nn.conv3d#torch.nn.Conv3d "torch.nn.Conv3d") that is inferred from the `input.size(1)`. Parameters * **out_channels** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – Number of channels produced by the convolution * **kernel_size** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _or_[tuple](https://docs.python.org/3/library/stdtypes.html#tuple "\(in Python v3.9\)")) – Size of the convolving kernel * **stride** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _or_[tuple](https://docs.python.org/3/library/stdtypes.html#tuple "\(in Python v3.9\)") _,__optional_) – Stride of the convolution. Default: 1 * **padding** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _or_[tuple](https://docs.python.org/3/library/stdtypes.html#tuple "\(in Python v3.9\)") _,__optional_) – Zero-padding added to both sides of the input. Default: 0 * **padding_mode** (_string_ _,__optional_) – `'zeros'`, `'reflect'`, `'replicate'` or `'circular'`. Default: `'zeros'` * **dilation** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _or_[tuple](https://docs.python.org/3/library/stdtypes.html#tuple "\(in Python v3.9\)") _,__optional_) – Spacing between kernel elements. Default: 1 * **groups** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – Number of blocked connections from input channels to output channels. Default: 1 * **bias** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – If `True`, adds a learnable bias to the output. 
Default: `True` See also [`torch.nn.Conv3d`](torch.nn.conv3d#torch.nn.Conv3d "torch.nn.Conv3d") and [`torch.nn.modules.lazy.LazyModuleMixin`](torch.nn.modules.lazy.lazymodulemixin#torch.nn.modules.lazy.LazyModuleMixin "torch.nn.modules.lazy.LazyModuleMixin") `cls_to_become` alias of [`Conv3d`](torch.nn.conv3d#torch.nn.Conv3d "torch.nn.Conv3d") # LazyConvTranspose1d `class torch.nn.LazyConvTranspose1d(out_channels, kernel_size, stride=1, padding=0, output_padding=0, groups=1, bias=True, dilation=1, padding_mode='zeros')` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/conv.html#LazyConvTranspose1d) A [`torch.nn.ConvTranspose1d`](torch.nn.convtranspose1d#torch.nn.ConvTranspose1d "torch.nn.ConvTranspose1d") module with lazy initialization of the `in_channels` argument of the [`ConvTranspose1d`](torch.nn.convtranspose1d#torch.nn.ConvTranspose1d "torch.nn.ConvTranspose1d") that is inferred from the `input.size(1)`. Parameters * **out_channels** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – Number of channels produced by the convolution * **kernel_size** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _or_[tuple](https://docs.python.org/3/library/stdtypes.html#tuple "\(in Python v3.9\)")) – Size of the convolving kernel * **stride** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _or_[tuple](https://docs.python.org/3/library/stdtypes.html#tuple "\(in Python v3.9\)") _,__optional_) – Stride of the convolution. Default: 1 * **padding** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _or_[tuple](https://docs.python.org/3/library/stdtypes.html#tuple "\(in Python v3.9\)") _,__optional_) – `dilation * (kernel_size - 1) - padding` zero-padding will be added to both sides of the input. Default: 0 * **output_padding** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _or_[tuple](https://docs.python.org/3/library/stdtypes.html#tuple "\(in Python v3.9\)") _,__optional_) – Additional size added to one side of the output shape. Default: 0 * **groups** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – Number of blocked connections from input channels to output channels. Default: 1 * **bias** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – If `True`, adds a learnable bias to the output. Default: `True` * **dilation** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _or_[tuple](https://docs.python.org/3/library/stdtypes.html#tuple "\(in Python v3.9\)") _,__optional_) – Spacing between kernel elements. 
Default: 1 See also [`torch.nn.ConvTranspose1d`](torch.nn.convtranspose1d#torch.nn.ConvTranspose1d "torch.nn.ConvTranspose1d") and [`torch.nn.modules.lazy.LazyModuleMixin`](torch.nn.modules.lazy.lazymodulemixin#torch.nn.modules.lazy.LazyModuleMixin "torch.nn.modules.lazy.LazyModuleMixin") `cls_to_become` alias of [`ConvTranspose1d`](torch.nn.convtranspose1d#torch.nn.ConvTranspose1d "torch.nn.ConvTranspose1d") # LazyConvTranspose2d `class torch.nn.LazyConvTranspose2d(out_channels, kernel_size, stride=1, padding=0, output_padding=0, groups=1, bias=True, dilation=1, padding_mode='zeros')` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/conv.html#LazyConvTranspose2d) A [`torch.nn.ConvTranspose2d`](torch.nn.convtranspose2d#torch.nn.ConvTranspose2d "torch.nn.ConvTranspose2d") module with lazy initialization of the `in_channels` argument of the [`ConvTranspose2d`](torch.nn.convtranspose2d#torch.nn.ConvTranspose2d "torch.nn.ConvTranspose2d") that is inferred from the `input.size(1)`. Parameters * **out_channels** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – Number of channels produced by the convolution * **kernel_size** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _or_[tuple](https://docs.python.org/3/library/stdtypes.html#tuple "\(in Python v3.9\)")) – Size of the convolving kernel * **stride** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _or_[tuple](https://docs.python.org/3/library/stdtypes.html#tuple "\(in Python v3.9\)") _,__optional_) – Stride of the convolution. Default: 1 * **padding** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _or_[tuple](https://docs.python.org/3/library/stdtypes.html#tuple "\(in Python v3.9\)") _,__optional_) – `dilation * (kernel_size - 1) - padding` zero-padding will be added to both sides of each dimension in the input. Default: 0 * **output_padding** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _or_[tuple](https://docs.python.org/3/library/stdtypes.html#tuple "\(in Python v3.9\)") _,__optional_) – Additional size added to one side of each dimension in the output shape. Default: 0 * **groups** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – Number of blocked connections from input channels to output channels. Default: 1 * **bias** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – If `True`, adds a learnable bias to the output. Default: `True` * **dilation** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _or_[tuple](https://docs.python.org/3/library/stdtypes.html#tuple "\(in Python v3.9\)") _,__optional_) – Spacing between kernel elements. 
Default: 1 See also [`torch.nn.ConvTranspose2d`](torch.nn.convtranspose2d#torch.nn.ConvTranspose2d "torch.nn.ConvTranspose2d") and [`torch.nn.modules.lazy.LazyModuleMixin`](torch.nn.modules.lazy.lazymodulemixin#torch.nn.modules.lazy.LazyModuleMixin "torch.nn.modules.lazy.LazyModuleMixin") `cls_to_become` alias of [`ConvTranspose2d`](torch.nn.convtranspose2d#torch.nn.ConvTranspose2d "torch.nn.ConvTranspose2d") # LazyConvTranspose3d `class torch.nn.LazyConvTranspose3d(out_channels, kernel_size, stride=1, padding=0, output_padding=0, groups=1, bias=True, dilation=1, padding_mode='zeros')` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/conv.html#LazyConvTranspose3d) A [`torch.nn.ConvTranspose3d`](torch.nn.convtranspose3d#torch.nn.ConvTranspose3d "torch.nn.ConvTranspose3d") module with lazy initialization of the `in_channels` argument of the [`ConvTranspose3d`](torch.nn.convtranspose3d#torch.nn.ConvTranspose3d "torch.nn.ConvTranspose3d") that is inferred from the `input.size(1)`. Parameters * **out_channels** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – Number of channels produced by the convolution * **kernel_size** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _or_[tuple](https://docs.python.org/3/library/stdtypes.html#tuple "\(in Python v3.9\)")) – Size of the convolving kernel * **stride** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _or_[tuple](https://docs.python.org/3/library/stdtypes.html#tuple "\(in Python v3.9\)") _,__optional_) – Stride of the convolution. Default: 1 * **padding** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _or_[tuple](https://docs.python.org/3/library/stdtypes.html#tuple "\(in Python v3.9\)") _,__optional_) – `dilation * (kernel_size - 1) - padding` zero-padding will be added to both sides of each dimension in the input. Default: 0 * **output_padding** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _or_[tuple](https://docs.python.org/3/library/stdtypes.html#tuple "\(in Python v3.9\)") _,__optional_) – Additional size added to one side of each dimension in the output shape. Default: 0 * **groups** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – Number of blocked connections from input channels to output channels. Default: 1 * **bias** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – If `True`, adds a learnable bias to the output. Default: `True` * **dilation** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _or_[tuple](https://docs.python.org/3/library/stdtypes.html#tuple "\(in Python v3.9\)") _,__optional_) – Spacing between kernel elements. Default: 1 See also [`torch.nn.ConvTranspose3d`](torch.nn.convtranspose3d#torch.nn.ConvTranspose3d "torch.nn.ConvTranspose3d") and [`torch.nn.modules.lazy.LazyModuleMixin`](torch.nn.modules.lazy.lazymodulemixin#torch.nn.modules.lazy.LazyModuleMixin "torch.nn.modules.lazy.LazyModuleMixin") `cls_to_become` alias of [`ConvTranspose3d`](torch.nn.convtranspose3d#torch.nn.ConvTranspose3d "torch.nn.ConvTranspose3d") # LazyLinear `class torch.nn.LazyLinear(out_features, bias=True)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/linear.html#LazyLinear) A [`torch.nn.Linear`](torch.nn.linear#torch.nn.Linear "torch.nn.Linear") module with lazy initialization. 
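A brief usage sketch of this lazy behavior (the sizes below are arbitrary); the mechanics are described in the paragraph that follows:

>>> import torch
>>> import torch.nn as nn
>>> fc = nn.LazyLinear(out_features=10)   # in_features is not given here
>>> x = torch.randn(4, 128)
>>> out = fc(x)                           # in_features=128 is inferred on the first call
>>> out.shape
torch.Size([4, 10])
>>> fc.weight.shape                       # materialized as a regular Linear weight
torch.Size([10, 128])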
In this module, the `weight` and `bias` are of `torch.nn.UninitializedParameter` class. They will be initialized after the first call to `forward` is done and the module will become a regular [`torch.nn.Linear`](torch.nn.linear#torch.nn.Linear "torch.nn.Linear") module. Check the [`torch.nn.modules.lazy.LazyModuleMixin`](torch.nn.modules.lazy.lazymodulemixin#torch.nn.modules.lazy.LazyModuleMixin "torch.nn.modules.lazy.LazyModuleMixin") for further documentation on lazy modules and their limitations. Parameters * **out_features** – size of each output sample * **bias** – If set to `False`, the layer will not learn an additive bias. Default: `True` Variables * **~LazyLinear.weight** – the learnable weights of the module of shape (out_features,in_features)(\text{out\\_features}, \text{in\\_features}) . The values are initialized from U(−k,k)\mathcal{U}(-\sqrt{k}, \sqrt{k}) , where k=1in_featuresk = \frac{1}{\text{in\\_features}} * **~LazyLinear.bias** – the learnable bias of the module of shape (out_features)(\text{out\\_features}) . If `bias` is `True`, the values are initialized from U(−k,k)\mathcal{U}(-\sqrt{k}, \sqrt{k}) where k=1in_featuresk = \frac{1}{\text{in\\_features}} `cls_to_become` alias of [`Linear`](torch.nn.linear#torch.nn.Linear "torch.nn.Linear") # LeakyReLU `class torch.nn.LeakyReLU(negative_slope=0.01, inplace=False)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/activation.html#LeakyReLU) Applies the element-wise function: LeakyReLU(x)=max⁡(0,x)+negative_slope∗min⁡(0,x)\text{LeakyReLU}(x) = \max(0, x) + \text{negative\\_slope} * \min(0, x) or LeakyRELU(x)={x, if x≥0negative_slope×x, otherwise \text{LeakyRELU}(x) = \begin{cases} x, & \text{ if } x \geq 0 \\\ \text{negative\\_slope} \times x, & \text{ otherwise } \end{cases} Parameters * **negative_slope** – Controls the angle of the negative slope. Default: 1e-2 * **inplace** – can optionally do the operation in-place. Default: `False` Shape: * Input: (N,∗)(N, *) where `*` means, any number of additional dimensions * Output: (N,∗)(N, *) , same shape as the input Examples: >>> m = nn.LeakyReLU(0.1) >>> input = torch.randn(2) >>> output = m(input) # Linear `class torch.nn.Linear(in_features, out_features, bias=True)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/linear.html#Linear) Applies a linear transformation to the incoming data: y=xAT+by = xA^T + b This module supports [TensorFloat32](https://pytorch.org/docs/1.8.0/notes/cuda.html#tf32-on- ampere). Parameters * **in_features** – size of each input sample * **out_features** – size of each output sample * **bias** – If set to `False`, the layer will not learn an additive bias. Default: `True` Shape: * Input: (N,∗,Hin)(N, *, H_{in}) where ∗* means any number of additional dimensions and Hin=in_featuresH_{in} = \text{in\\_features} * Output: (N,∗,Hout)(N, *, H_{out}) where all but the last dimension are the same shape as the input and Hout=out_featuresH_{out} = \text{out\\_features} . Variables * **~Linear.weight** – the learnable weights of the module of shape (out_features,in_features)(\text{out\\_features}, \text{in\\_features}) . The values are initialized from U(−k,k)\mathcal{U}(-\sqrt{k}, \sqrt{k}) , where k=1in_featuresk = \frac{1}{\text{in\\_features}} * **~Linear.bias** – the learnable bias of the module of shape (out_features)(\text{out\\_features}) . 
If `bias` is `True`, the values are initialized from U(−k,k)\mathcal{U}(-\sqrt{k}, \sqrt{k}) where k=1in_featuresk = \frac{1}{\text{in\\_features}} Examples: >>> m = nn.Linear(20, 30) >>> input = torch.randn(128, 20) >>> output = m(input) >>> print(output.size()) torch.Size([128, 30]) # LocalResponseNorm `class torch.nn.LocalResponseNorm(size, alpha=0.0001, beta=0.75, k=1.0)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/normalization.html#LocalResponseNorm) Applies local response normalization over an input signal composed of several input planes, where channels occupy the second dimension. Applies normalization across channels. bc=ac(k+αn∑c′=max⁡(0,c−n/2)min⁡(N−1,c+n/2)ac′2)−βb_{c} = a_{c}\left(k + \frac{\alpha}{n} \sum_{c'=\max(0, c-n/2)}^{\min(N-1,c+n/2)}a_{c'}^2\right)^{-\beta} Parameters * **size** – amount of neighbouring channels used for normalization * **alpha** – multiplicative factor. Default: 0.0001 * **beta** – exponent. Default: 0.75 * **k** – additive factor. Default: 1 Shape: * Input: (N,C,∗)(N, C, *) * Output: (N,C,∗)(N, C, *) (same shape as input) Examples: >>> lrn = nn.LocalResponseNorm(2) >>> signal_2d = torch.randn(32, 5, 24, 24) >>> signal_4d = torch.randn(16, 5, 7, 7, 7, 7) >>> output_2d = lrn(signal_2d) >>> output_4d = lrn(signal_4d) # LogSigmoid `class torch.nn.LogSigmoid` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/activation.html#LogSigmoid) Applies the element-wise function: LogSigmoid(x)=log⁡(11+exp⁡(−x))\text{LogSigmoid}(x) = \log\left(\frac{ 1 }{ 1 + \exp(-x)}\right) Shape: * Input: (N,∗)(N, *) where `*` means, any number of additional dimensions * Output: (N,∗)(N, *) , same shape as the input Examples: >>> m = nn.LogSigmoid() >>> input = torch.randn(2) >>> output = m(input) # LogSoftmax `class torch.nn.LogSoftmax(dim=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/activation.html#LogSoftmax) Applies the log⁡(Softmax(x))\log(\text{Softmax}(x)) function to an n-dimensional input Tensor. The LogSoftmax formulation can be simplified as: LogSoftmax(xi)=log⁡(exp⁡(xi)∑jexp⁡(xj))\text{LogSoftmax}(x_{i}) = \log\left(\frac{\exp(x_i) }{ \sum_j \exp(x_j)} \right) Shape: * Input: (∗)(*) where `*` means, any number of additional dimensions * Output: (∗)(*) , same shape as the input Parameters **dim** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – A dimension along which LogSoftmax will be computed. Returns a Tensor of the same dimension and shape as the input with values in the range [-inf, 0) Examples: >>> m = nn.LogSoftmax() >>> input = torch.randn(2, 3) >>> output = m(input) # LPPool1d `class torch.nn.LPPool1d(norm_type, kernel_size, stride=None, ceil_mode=False)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/pooling.html#LPPool1d) Applies a 1D power-average pooling over an input signal composed of several input planes. On each window, the function computed is: f(X)=∑x∈Xxppf(X) = \sqrt[p]{\sum_{x \in X} x^{p}} * At p = ∞\infty , one gets Max Pooling * At p = 1, one gets Sum Pooling (which is proportional to Average Pooling) Note If the sum to the power of `p` is zero, the gradient of this function is not defined. This implementation will set the gradient to zero in this case. Parameters * **kernel_size** – a single int, the size of the window * **stride** – a single int, the stride of the window. 
Default value is `kernel_size` * **ceil_mode** – when True, will use `ceil` instead of `floor` to compute the output shape Shape: * Input: (N,C,Lin)(N, C, L_{in}) * Output: (N,C,Lout)(N, C, L_{out}) , where Lout=⌊Lin−kernel_sizestride+1⌋L_{out} = \left\lfloor\frac{L_{in} - \text{kernel\\_size}}{\text{stride}} + 1\right\rfloor Examples:: >>> # power-2 pool of window of length 3, with stride 2. >>> m = nn.LPPool1d(2, 3, stride=2) >>> input = torch.randn(20, 16, 50) >>> output = m(input) # LPPool2d `class torch.nn.LPPool2d(norm_type, kernel_size, stride=None, ceil_mode=False)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/pooling.html#LPPool2d) Applies a 2D power-average pooling over an input signal composed of several input planes. On each window, the function computed is: f(X)=∑x∈Xxppf(X) = \sqrt[p]{\sum_{x \in X} x^{p}} * At p = ∞\infty , one gets Max Pooling * At p = 1, one gets Sum Pooling (which is proportional to average pooling) The parameters `kernel_size`, `stride` can either be: * a single `int` – in which case the same value is used for the height and width dimension * a `tuple` of two ints – in which case, the first `int` is used for the height dimension, and the second `int` for the width dimension Note If the sum to the power of `p` is zero, the gradient of this function is not defined. This implementation will set the gradient to zero in this case. Parameters * **kernel_size** – the size of the window * **stride** – the stride of the window. Default value is `kernel_size` * **ceil_mode** – when True, will use `ceil` instead of `floor` to compute the output shape Shape: * Input: (N,C,Hin,Win)(N, C, H_{in}, W_{in}) * Output: (N,C,Hout,Wout)(N, C, H_{out}, W_{out}) , where Hout=⌊Hin−kernel_size[0]stride[0]+1⌋H_{out} = \left\lfloor\frac{H_{in} - \text{kernel\\_size}[0]}{\text{stride}[0]} + 1\right\rfloor Wout=⌊Win−kernel_size[1]stride[1]+1⌋W_{out} = \left\lfloor\frac{W_{in} - \text{kernel\\_size}[1]}{\text{stride}[1]} + 1\right\rfloor Examples: >>> # power-2 pool of square window of size=3, stride=2 >>> m = nn.LPPool2d(2, 3, stride=2) >>> # pool of non-square window of power 1.2 >>> m = nn.LPPool2d(1.2, (3, 2), stride=(2, 1)) >>> input = torch.randn(20, 16, 50, 32) >>> output = m(input) # LSTM `class torch.nn.LSTM(*args, **kwargs)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/rnn.html#LSTM) Applies a multi-layer long short-term memory (LSTM) RNN to an input sequence. For each element in the input sequence, each layer computes the following function: it=σ(Wiixt+bii+Whiht−1+bhi)ft=σ(Wifxt+bif+Whfht−1+bhf)gt=tanh⁡(Wigxt+big+Whght−1+bhg)ot=σ(Wioxt+bio+Whoht−1+bho)ct=ft⊙ct−1+it⊙gtht=ot⊙tanh⁡(ct)\begin{array}{ll} \\\ i_t = \sigma(W_{ii} x_t + b_{ii} + W_{hi} h_{t-1} + b_{hi}) \\\ f_t = \sigma(W_{if} x_t + b_{if} + W_{hf} h_{t-1} + b_{hf}) \\\ g_t = \tanh(W_{ig} x_t + b_{ig} + W_{hg} h_{t-1} + b_{hg}) \\\ o_t = \sigma(W_{io} x_t + b_{io} + W_{ho} h_{t-1} + b_{ho}) \\\ c_t = f_t \odot c_{t-1} + i_t \odot g_t \\\ h_t = o_t \odot \tanh(c_t) \\\ \end{array} where hth_t is the hidden state at time `t`, ctc_t is the cell state at time `t`, xtx_t is the input at time `t`, ht−1h_{t-1} is the hidden state of the layer at time `t-1` or the initial hidden state at time `0`, and iti_t , ftf_t , gtg_t , oto_t are the input, forget, cell, and output gates, respectively. σ\sigma is the sigmoid function, and ⊙\odot is the Hadamard product. 
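As an illustrative check of these equations (the sizes below are arbitrary and chosen for brevity), one step of a single-layer `nn.LSTM` can be recomputed directly from its weights, whose gate ordering `(W_ii|W_if|W_ig|W_io)` is documented under Variables below:

>>> import torch
>>> import torch.nn as nn
>>> lstm = nn.LSTM(input_size=4, hidden_size=3, num_layers=1)
>>> x = torch.randn(1, 1, 4)                        # (seq_len, batch, input_size)
>>> h0, c0 = torch.randn(1, 1, 3), torch.randn(1, 1, 3)
>>> out, (hn, cn) = lstm(x, (h0, c0))
>>> # Recompute the same step from the gate equations above.
>>> gates = (lstm.weight_ih_l0 @ x[0, 0] + lstm.bias_ih_l0
...          + lstm.weight_hh_l0 @ h0[0, 0] + lstm.bias_hh_l0)
>>> i, f, g, o = gates.chunk(4)                     # gate order: input, forget, cell, output
>>> c1 = f.sigmoid() * c0[0, 0] + i.sigmoid() * g.tanh()
>>> h1 = o.sigmoid() * c1.tanh()
>>> torch.allclose(h1, hn[0, 0], atol=1e-6)
True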
In a multilayer LSTM, the input xt(l)x^{(l)}_t of the ll -th layer (l>=2l >= 2 ) is the hidden state ht(l−1)h^{(l-1)}_t of the previous layer multiplied by dropout δt(l−1)\delta^{(l-1)}_t where each δt(l−1)\delta^{(l-1)}_t is a Bernoulli random variable which is 00 with probability `dropout`. If `proj_size > 0` is specified, LSTM with projections will be used. This changes the LSTM cell in the following way. First, the dimension of hth_t will be changed from `hidden_size` to `proj_size` (dimensions of WhiW_{hi} will be changed accordingly). Second, the output hidden state of each layer will be multiplied by a learnable projection matrix: ht=Whrhth_t = W_{hr}h_t . Note that as a consequence of this, the output of LSTM network will be of different shape as well. See Inputs/Outputs sections below for exact dimensions of all variables. You can find more details in . Parameters * **input_size** – The number of expected features in the input `x` * **hidden_size** – The number of features in the hidden state `h` * **num_layers** – Number of recurrent layers. E.g., setting `num_layers=2` would mean stacking two LSTMs together to form a `stacked LSTM`, with the second LSTM taking in outputs of the first LSTM and computing the final results. Default: 1 * **bias** – If `False`, then the layer does not use bias weights `b_ih` and `b_hh`. Default: `True` * **batch_first** – If `True`, then the input and output tensors are provided as (batch, seq, feature). Default: `False` * **dropout** – If non-zero, introduces a `Dropout` layer on the outputs of each LSTM layer except the last layer, with dropout probability equal to `dropout`. Default: 0 * **bidirectional** – If `True`, becomes a bidirectional LSTM. Default: `False` * **proj_size** – If `> 0`, will use LSTM with projections of corresponding size. Default: 0 Inputs: input, (h_0, c_0) * **input** of shape `(seq_len, batch, input_size)`: tensor containing the features of the input sequence. The input can also be a packed variable length sequence. See [`torch.nn.utils.rnn.pack_padded_sequence()`](torch.nn.utils.rnn.pack_padded_sequence#torch.nn.utils.rnn.pack_padded_sequence "torch.nn.utils.rnn.pack_padded_sequence") or [`torch.nn.utils.rnn.pack_sequence()`](torch.nn.utils.rnn.pack_sequence#torch.nn.utils.rnn.pack_sequence "torch.nn.utils.rnn.pack_sequence") for details. * **h_0** of shape `(num_layers * num_directions, batch, hidden_size)`: tensor containing the initial hidden state for each element in the batch. If the LSTM is bidirectional, num_directions should be 2, else it should be 1. If `proj_size > 0` was specified, the shape has to be `(num_layers * num_directions, batch, proj_size)`. * **c_0** of shape `(num_layers * num_directions, batch, hidden_size)`: tensor containing the initial cell state for each element in the batch. If `(h_0, c_0)` is not provided, both **h_0** and **c_0** default to zero. Outputs: output, (h_n, c_n) * **output** of shape `(seq_len, batch, num_directions * hidden_size)`: tensor containing the output features `(h_t)` from the last layer of the LSTM, for each `t`. If a [`torch.nn.utils.rnn.PackedSequence`](torch.nn.utils.rnn.packedsequence#torch.nn.utils.rnn.PackedSequence "torch.nn.utils.rnn.PackedSequence") has been given as the input, the output will also be a packed sequence. If `proj_size > 0` was specified, output shape will be `(seq_len, batch, num_directions * proj_size)`. 
For the unpacked case, the directions can be separated using `output.view(seq_len, batch, num_directions, hidden_size)`, with forward and backward being direction `0` and `1` respectively. Similarly, the directions can be separated in the packed case.

* **h_n** of shape `(num_layers * num_directions, batch, hidden_size)`: tensor containing the hidden state for `t = seq_len`. If `proj_size > 0` was specified, `h_n` shape will be `(num_layers * num_directions, batch, proj_size)`. Like _output_, the layers can be separated using `h_n.view(num_layers, num_directions, batch, hidden_size)` and similarly for _c_n_.
* **c_n** of shape `(num_layers * num_directions, batch, hidden_size)`: tensor containing the cell state for `t = seq_len`.

Variables

* **~LSTM.weight_ih_l[k]** – the learnable input-hidden weights of the \text{k}^{th} layer `(W_ii|W_if|W_ig|W_io)`, of shape `(4*hidden_size, input_size)` for `k = 0`. Otherwise, the shape is `(4*hidden_size, num_directions * hidden_size)`
* **~LSTM.weight_hh_l[k]** – the learnable hidden-hidden weights of the \text{k}^{th} layer `(W_hi|W_hf|W_hg|W_ho)`, of shape `(4*hidden_size, hidden_size)`. If `proj_size > 0` was specified, the shape will be `(4*hidden_size, proj_size)`.
* **~LSTM.bias_ih_l[k]** – the learnable input-hidden bias of the \text{k}^{th} layer `(b_ii|b_if|b_ig|b_io)`, of shape `(4*hidden_size)`
* **~LSTM.bias_hh_l[k]** – the learnable hidden-hidden bias of the \text{k}^{th} layer `(b_hi|b_hf|b_hg|b_ho)`, of shape `(4*hidden_size)`
* **~LSTM.weight_hr_l[k]** – the learnable projection weights of the \text{k}^{th} layer, of shape `(proj_size, hidden_size)`. Only present when `proj_size > 0` was specified.

Note

All the weights and biases are initialized from \mathcal{U}(-\sqrt{k}, \sqrt{k}) where k = \frac{1}{\text{hidden\_size}}

Warning

There are known non-determinism issues for RNN functions on some versions of cuDNN and CUDA. You can enforce deterministic behavior by setting the following environment variables:

On CUDA 10.1, set environment variable `CUDA_LAUNCH_BLOCKING=1`. This may affect performance.

On CUDA 10.2 or later, set environment variable (note the leading colon symbol) `CUBLAS_WORKSPACE_CONFIG=:16:8` or `CUBLAS_WORKSPACE_CONFIG=:4096:2`.

See the [cuDNN 8 Release Notes](https://docs.nvidia.com/deeplearning/sdk/cudnn-release-notes/rel_8.html) for more information.

Note

If the following conditions are satisfied: 1) cudnn is enabled, 2) input data is on the GPU, 3) input data has dtype `torch.float16`, 4) a V100 GPU is used, and 5) input data is not in `PackedSequence` format, then the persistent algorithm can be selected to improve performance.

Examples:

>>> rnn = nn.LSTM(10, 20, 2)
>>> input = torch.randn(5, 3, 10)
>>> h0 = torch.randn(2, 3, 20)
>>> c0 = torch.randn(2, 3, 20)
>>> output, (hn, cn) = rnn(input, (h0, c0))

# LSTMCell

`class torch.nn.LSTMCell(input_size, hidden_size, bias=True)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/rnn.html#LSTMCell)

A long short-term memory (LSTM) cell.

\begin{array}{ll} i = \sigma(W_{ii} x + b_{ii} + W_{hi} h + b_{hi}) \\ f = \sigma(W_{if} x + b_{if} + W_{hf} h + b_{hf}) \\ g = \tanh(W_{ig} x + b_{ig} + W_{hg} h + b_{hg}) \\ o = \sigma(W_{io} x + b_{io} + W_{ho} h + b_{ho}) \\ c' = f * c + i * g \\ h' = o * \tanh(c') \end{array}

where \sigma is the sigmoid function, and * is the Hadamard product.
Parameters

* **input_size** – The number of expected features in the input `x`
* **hidden_size** – The number of features in the hidden state `h`
* **bias** – If `False`, then the layer does not use bias weights `b_ih` and `b_hh`. Default: `True`

Inputs: input, (h_0, c_0)

* **input** of shape `(batch, input_size)`: tensor containing input features
* **h_0** of shape `(batch, hidden_size)`: tensor containing the initial hidden state for each element in the batch.
* **c_0** of shape `(batch, hidden_size)`: tensor containing the initial cell state for each element in the batch.

If `(h_0, c_0)` is not provided, both **h_0** and **c_0** default to zero.

Outputs: (h_1, c_1)

* **h_1** of shape `(batch, hidden_size)`: tensor containing the next hidden state for each element in the batch
* **c_1** of shape `(batch, hidden_size)`: tensor containing the next cell state for each element in the batch

Variables

* **~LSTMCell.weight_ih** – the learnable input-hidden weights, of shape `(4*hidden_size, input_size)`
* **~LSTMCell.weight_hh** – the learnable hidden-hidden weights, of shape `(4*hidden_size, hidden_size)`
* **~LSTMCell.bias_ih** – the learnable input-hidden bias, of shape `(4*hidden_size)`
* **~LSTMCell.bias_hh** – the learnable hidden-hidden bias, of shape `(4*hidden_size)`

Note

All the weights and biases are initialized from \mathcal{U}(-\sqrt{k}, \sqrt{k}) where k = \frac{1}{\text{hidden\_size}}

Examples:

>>> rnn = nn.LSTMCell(10, 20)
>>> input = torch.randn(6, 3, 10)
>>> hx = torch.randn(3, 20)
>>> cx = torch.randn(3, 20)
>>> output = []
>>> for i in range(6):
        hx, cx = rnn(input[i], (hx, cx))
        output.append(hx)

# MarginRankingLoss

`class torch.nn.MarginRankingLoss(margin=0.0, size_average=None, reduce=None, reduction='mean')` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/loss.html#MarginRankingLoss)

Creates a criterion that measures the loss given inputs x1, x2, two 1D mini-batch `Tensors`, and a label 1D mini-batch tensor y (containing 1 or -1).

If y = 1 then it is assumed the first input should be ranked higher (have a larger value) than the second input, and vice versa for y = -1.

The loss function for each pair of samples in the mini-batch is:

\text{loss}(x1, x2, y) = \max(0, -y * (x1 - x2) + \text{margin})

Parameters

* **margin** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)"), optional) – Has a default value of 0.
* **size_average** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)"), optional) – Deprecated (see `reduction`). By default, the losses are averaged over each loss element in the batch. Note that for some losses, there are multiple elements per sample. If the field `size_average` is set to `False`, the losses are instead summed for each minibatch. Ignored when `reduce` is `False`. Default: `True`
* **reduce** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)"), optional) – Deprecated (see `reduction`). By default, the losses are averaged or summed over observations for each minibatch depending on `size_average`. When `reduce` is `False`, returns a loss per batch element instead and ignores `size_average`. Default: `True`
* **reduction** (string, optional) – Specifies the reduction to apply to the output: `'none'` | `'mean'` | `'sum'`.
`'none'`: no reduction will be applied, `'mean'`: the sum of the output will be divided by the number of elements in the output, `'sum'`: the output will be summed. Note: `size_average` and `reduce` are in the process of being deprecated, and in the meantime, specifying either of those two args will override `reduction`. Default: `'mean'` Shape: * Input1: (N)(N) where `N` is the batch size. * Input2: (N)(N) , same shape as the Input1. * Target: (N)(N) , same shape as the inputs. * Output: scalar. If `reduction` is `'none'`, then (N)(N) . Examples: >>> loss = nn.MarginRankingLoss() >>> input1 = torch.randn(3, requires_grad=True) >>> input2 = torch.randn(3, requires_grad=True) >>> target = torch.randn(3).sign() >>> output = loss(input1, input2, target) >>> output.backward() # MaxPool1d `class torch.nn.MaxPool1d(kernel_size, stride=None, padding=0, dilation=1, return_indices=False, ceil_mode=False)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/pooling.html#MaxPool1d) Applies a 1D max pooling over an input signal composed of several input planes. In the simplest case, the output value of the layer with input size (N,C,L)(N, C, L) and output (N,C,Lout)(N, C, L_{out}) can be precisely described as: out(Ni,Cj,k)=max⁡m=0,…,kernel_size−1input(Ni,Cj,stride×k+m)out(N_i, C_j, k) = \max_{m=0, \ldots, \text{kernel\\_size} - 1} input(N_i, C_j, stride \times k + m) If `padding` is non-zero, then the input is implicitly padded with negative infinity on both sides for `padding` number of points. `dilation` is the stride between the elements within the sliding window. This [link](https://github.com/vdumoulin/conv_arithmetic/blob/master/README.md) has a nice visualization of the pooling parameters. Note When ceil_mode=True, sliding windows are allowed to go off-bounds if they start within the left padding or the input. Sliding windows that would start in the right padded region are ignored. Parameters * **kernel_size** – The size of the sliding window, must be > 0. * **stride** – The stride of the sliding window, must be > 0\. Default value is `kernel_size`. * **padding** – Implicit negative infinity padding to be added on both sides, must be >= 0 and <= kernel_size / 2. * **dilation** – The stride between elements within a sliding window, must be > 0. * **return_indices** – If `True`, will return the argmax along with the max values. Useful for [`torch.nn.MaxUnpool1d`](torch.nn.maxunpool1d#torch.nn.MaxUnpool1d "torch.nn.MaxUnpool1d") later * **ceil_mode** – If `True`, will use `ceil` instead of `floor` to compute the output shape. This ensures that every element in the input tensor is covered by a sliding window. Shape: * Input: (N,C,Lin)(N, C, L_{in}) * Output: (N,C,Lout)(N, C, L_{out}) , where Lout=⌊Lin+2×padding−dilation×(kernel_size−1)−1stride+1⌋L_{out} = \left\lfloor \frac{L_{in} + 2 \times \text{padding} - \text{dilation} \times (\text{kernel\\_size} - 1) - 1}{\text{stride}} + 1\right\rfloor Examples: >>> # pool of size=3, stride=2 >>> m = nn.MaxPool1d(3, stride=2) >>> input = torch.randn(20, 16, 50) >>> output = m(input) # MaxPool2d `class torch.nn.MaxPool2d(kernel_size, stride=None, padding=0, dilation=1, return_indices=False, ceil_mode=False)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/pooling.html#MaxPool2d) Applies a 2D max pooling over an input signal composed of several input planes. 
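Before the formal description, a quick shape sketch (sizes are arbitrary) that matches the output-size formula given below:

>>> import torch
>>> import torch.nn as nn
>>> m = nn.MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1)
>>> x = torch.randn(1, 8, 50, 32)
>>> m(x).shape
torch.Size([1, 8, 25, 16])
>>> # H_out = floor((50 + 2*1 - 1*(3-1) - 1)/2 + 1) = floor(25.5) = 25
>>> # W_out = floor((32 + 2*1 - 1*(3-1) - 1)/2 + 1) = floor(16.5) = 16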
In the simplest case, the output value of the layer with input size (N,C,H,W)(N, C, H, W) , output (N,C,Hout,Wout)(N, C, H_{out}, W_{out}) and `kernel_size` (kH,kW)(kH, kW) can be precisely described as: out(Ni,Cj,h,w)=max⁡m=0,…,kH−1max⁡n=0,…,kW−1input(Ni,Cj,stride[0]×h+m,stride[1]×w+n)\begin{aligned} out(N_i, C_j, h, w) ={} & \max_{m=0, \ldots, kH-1} \max_{n=0, \ldots, kW-1} \\\ & \text{input}(N_i, C_j, \text{stride[0]} \times h + m, \text{stride[1]} \times w + n) \end{aligned} If `padding` is non-zero, then the input is implicitly zero-padded on both sides for `padding` number of points. `dilation` controls the spacing between the kernel points. It is harder to describe, but this [link](https://github.com/vdumoulin/conv_arithmetic/blob/master/README.md) has a nice visualization of what `dilation` does. Note When ceil_mode=True, sliding windows are allowed to go off-bounds if they start within the left padding or the input. Sliding windows that would start in the right padded region are ignored. The parameters `kernel_size`, `stride`, `padding`, `dilation` can either be: * a single `int` – in which case the same value is used for the height and width dimension * a `tuple` of two ints – in which case, the first `int` is used for the height dimension, and the second `int` for the width dimension Parameters * **kernel_size** – the size of the window to take a max over * **stride** – the stride of the window. Default value is `kernel_size` * **padding** – implicit zero padding to be added on both sides * **dilation** – a parameter that controls the stride of elements in the window * **return_indices** – if `True`, will return the max indices along with the outputs. Useful for [`torch.nn.MaxUnpool2d`](torch.nn.maxunpool2d#torch.nn.MaxUnpool2d "torch.nn.MaxUnpool2d") later * **ceil_mode** – when True, will use `ceil` instead of `floor` to compute the output shape Shape: * Input: (N,C,Hin,Win)(N, C, H_{in}, W_{in}) * Output: (N,C,Hout,Wout)(N, C, H_{out}, W_{out}) , where Hout=⌊Hin+2∗padding[0]−dilation[0]×(kernel_size[0]−1)−1stride[0]+1⌋H_{out} = \left\lfloor\frac{H_{in} + 2 * \text{padding[0]} - \text{dilation[0]} \times (\text{kernel\\_size[0]} - 1) - 1}{\text{stride[0]}} + 1\right\rfloor Wout=⌊Win+2∗padding[1]−dilation[1]×(kernel_size[1]−1)−1stride[1]+1⌋W_{out} = \left\lfloor\frac{W_{in} + 2 * \text{padding[1]} - \text{dilation[1]} \times (\text{kernel\\_size[1]} - 1) - 1}{\text{stride[1]}} + 1\right\rfloor Examples: >>> # pool of square window of size=3, stride=2 >>> m = nn.MaxPool2d(3, stride=2) >>> # pool of non-square window >>> m = nn.MaxPool2d((3, 2), stride=(2, 1)) >>> input = torch.randn(20, 16, 50, 32) >>> output = m(input) # MaxPool3d `class torch.nn.MaxPool3d(kernel_size, stride=None, padding=0, dilation=1, return_indices=False, ceil_mode=False)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/pooling.html#MaxPool3d) Applies a 3D max pooling over an input signal composed of several input planes. 
In the simplest case, the output value of the layer with input size (N,C,D,H,W)(N, C, D, H, W) , output (N,C,Dout,Hout,Wout)(N, C, D_{out}, H_{out}, W_{out}) and `kernel_size` (kD,kH,kW)(kD, kH, kW) can be precisely described as: out(Ni,Cj,d,h,w)=max⁡k=0,…,kD−1max⁡m=0,…,kH−1max⁡n=0,…,kW−1input(Ni,Cj,stride[0]×d+k,stride[1]×h+m,stride[2]×w+n)\begin{aligned} \text{out}(N_i, C_j, d, h, w) ={} & \max_{k=0, \ldots, kD-1} \max_{m=0, \ldots, kH-1} \max_{n=0, \ldots, kW-1} \\\ & \text{input}(N_i, C_j, \text{stride[0]} \times d + k, \text{stride[1]} \times h + m, \text{stride[2]} \times w + n) \end{aligned} If `padding` is non-zero, then the input is implicitly zero-padded on both sides for `padding` number of points. `dilation` controls the spacing between the kernel points. It is harder to describe, but this [link](https://github.com/vdumoulin/conv_arithmetic/blob/master/README.md) has a nice visualization of what `dilation` does. Note When ceil_mode=True, sliding windows are allowed to go off-bounds if they start within the left padding or the input. Sliding windows that would start in the right padded region are ignored. The parameters `kernel_size`, `stride`, `padding`, `dilation` can either be: * a single `int` – in which case the same value is used for the depth, height and width dimension * a `tuple` of three ints – in which case, the first `int` is used for the depth dimension, the second `int` for the height dimension and the third `int` for the width dimension Parameters * **kernel_size** – the size of the window to take a max over * **stride** – the stride of the window. Default value is `kernel_size` * **padding** – implicit zero padding to be added on all three sides * **dilation** – a parameter that controls the stride of elements in the window * **return_indices** – if `True`, will return the max indices along with the outputs. Useful for [`torch.nn.MaxUnpool3d`](torch.nn.maxunpool3d#torch.nn.MaxUnpool3d "torch.nn.MaxUnpool3d") later * **ceil_mode** – when True, will use `ceil` instead of `floor` to compute the output shape Shape: * Input: (N,C,Din,Hin,Win)(N, C, D_{in}, H_{in}, W_{in}) * Output: (N,C,Dout,Hout,Wout)(N, C, D_{out}, H_{out}, W_{out}) , where Dout=⌊Din+2×padding[0]−dilation[0]×(kernel_size[0]−1)−1stride[0]+1⌋D_{out} = \left\lfloor\frac{D_{in} + 2 \times \text{padding}[0] - \text{dilation}[0] \times (\text{kernel\\_size}[0] - 1) - 1}{\text{stride}[0]} + 1\right\rfloor Hout=⌊Hin+2×padding[1]−dilation[1]×(kernel_size[1]−1)−1stride[1]+1⌋H_{out} = \left\lfloor\frac{H_{in} + 2 \times \text{padding}[1] - \text{dilation}[1] \times (\text{kernel\\_size}[1] - 1) - 1}{\text{stride}[1]} + 1\right\rfloor Wout=⌊Win+2×padding[2]−dilation[2]×(kernel_size[2]−1)−1stride[2]+1⌋W_{out} = \left\lfloor\frac{W_{in} + 2 \times \text{padding}[2] - \text{dilation}[2] \times (\text{kernel\\_size}[2] - 1) - 1}{\text{stride}[2]} + 1\right\rfloor Examples: >>> # pool of square window of size=3, stride=2 >>> m = nn.MaxPool3d(3, stride=2) >>> # pool of non-square window >>> m = nn.MaxPool3d((3, 2, 2), stride=(2, 1, 2)) >>> input = torch.randn(20, 16, 50,44, 31) >>> output = m(input) # MaxUnpool1d `class torch.nn.MaxUnpool1d(kernel_size, stride=None, padding=0)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/pooling.html#MaxUnpool1d) Computes a partial inverse of [`MaxPool1d`](torch.nn.maxpool1d#torch.nn.MaxPool1d "torch.nn.MaxPool1d"). [`MaxPool1d`](torch.nn.maxpool1d#torch.nn.MaxPool1d "torch.nn.MaxPool1d") is not fully invertible, since the non-maximal values are lost. 
`MaxUnpool1d` takes in as input the output of [`MaxPool1d`](torch.nn.maxpool1d#torch.nn.MaxPool1d "torch.nn.MaxPool1d") including the indices of the maximal values and computes a partial inverse in which all non-maximal values are set to zero. Note [`MaxPool1d`](torch.nn.maxpool1d#torch.nn.MaxPool1d "torch.nn.MaxPool1d") can map several input sizes to the same output sizes. Hence, the inversion process can get ambiguous. To accommodate this, you can provide the needed output size as an additional argument `output_size` in the forward call. See the Inputs and Example below. Parameters * **kernel_size** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _or_[tuple](https://docs.python.org/3/library/stdtypes.html#tuple "\(in Python v3.9\)")) – Size of the max pooling window. * **stride** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _or_[tuple](https://docs.python.org/3/library/stdtypes.html#tuple "\(in Python v3.9\)")) – Stride of the max pooling window. It is set to `kernel_size` by default. * **padding** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _or_[tuple](https://docs.python.org/3/library/stdtypes.html#tuple "\(in Python v3.9\)")) – Padding that was added to the input Inputs: * `input`: the input Tensor to invert * `indices`: the indices given out by [`MaxPool1d`](torch.nn.maxpool1d#torch.nn.MaxPool1d "torch.nn.MaxPool1d") * `output_size` (optional): the targeted output size Shape: * Input: (N,C,Hin)(N, C, H_{in}) * Output: (N,C,Hout)(N, C, H_{out}) , where Hout=(Hin−1)×stride[0]−2×padding[0]+kernel_size[0]H_{out} = (H_{in} - 1) \times \text{stride}[0] - 2 \times \text{padding}[0] + \text{kernel\\_size}[0] or as given by `output_size` in the call operator Example: >>> pool = nn.MaxPool1d(2, stride=2, return_indices=True) >>> unpool = nn.MaxUnpool1d(2, stride=2) >>> input = torch.tensor([[[1., 2, 3, 4, 5, 6, 7, 8]]]) >>> output, indices = pool(input) >>> unpool(output, indices) tensor([[[ 0., 2., 0., 4., 0., 6., 0., 8.]]]) >>> # Example showcasing the use of output_size >>> input = torch.tensor([[[1., 2, 3, 4, 5, 6, 7, 8, 9]]]) >>> output, indices = pool(input) >>> unpool(output, indices, output_size=input.size()) tensor([[[ 0., 2., 0., 4., 0., 6., 0., 8., 0.]]]) >>> unpool(output, indices) tensor([[[ 0., 2., 0., 4., 0., 6., 0., 8.]]]) # MaxUnpool2d `class torch.nn.MaxUnpool2d(kernel_size, stride=None, padding=0)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/pooling.html#MaxUnpool2d) Computes a partial inverse of [`MaxPool2d`](torch.nn.maxpool2d#torch.nn.MaxPool2d "torch.nn.MaxPool2d"). [`MaxPool2d`](torch.nn.maxpool2d#torch.nn.MaxPool2d "torch.nn.MaxPool2d") is not fully invertible, since the non-maximal values are lost. `MaxUnpool2d` takes in as input the output of [`MaxPool2d`](torch.nn.maxpool2d#torch.nn.MaxPool2d "torch.nn.MaxPool2d") including the indices of the maximal values and computes a partial inverse in which all non-maximal values are set to zero. Note [`MaxPool2d`](torch.nn.maxpool2d#torch.nn.MaxPool2d "torch.nn.MaxPool2d") can map several input sizes to the same output sizes. Hence, the inversion process can get ambiguous. To accommodate this, you can provide the needed output size as an additional argument `output_size` in the forward call. See the Inputs and Example below. 
Parameters * **kernel_size** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _or_[tuple](https://docs.python.org/3/library/stdtypes.html#tuple "\(in Python v3.9\)")) – Size of the max pooling window. * **stride** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _or_[tuple](https://docs.python.org/3/library/stdtypes.html#tuple "\(in Python v3.9\)")) – Stride of the max pooling window. It is set to `kernel_size` by default. * **padding** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _or_[tuple](https://docs.python.org/3/library/stdtypes.html#tuple "\(in Python v3.9\)")) – Padding that was added to the input Inputs: * `input`: the input Tensor to invert * `indices`: the indices given out by [`MaxPool2d`](torch.nn.maxpool2d#torch.nn.MaxPool2d "torch.nn.MaxPool2d") * `output_size` (optional): the targeted output size Shape: * Input: (N,C,Hin,Win)(N, C, H_{in}, W_{in}) * Output: (N,C,Hout,Wout)(N, C, H_{out}, W_{out}) , where Hout=(Hin−1)×stride[0]−2×padding[0]+kernel_size[0]H_{out} = (H_{in} - 1) \times \text{stride[0]} - 2 \times \text{padding[0]} + \text{kernel\\_size[0]} Wout=(Win−1)×stride[1]−2×padding[1]+kernel_size[1]W_{out} = (W_{in} - 1) \times \text{stride[1]} - 2 \times \text{padding[1]} + \text{kernel\\_size[1]} or as given by `output_size` in the call operator Example: >>> pool = nn.MaxPool2d(2, stride=2, return_indices=True) >>> unpool = nn.MaxUnpool2d(2, stride=2) >>> input = torch.tensor([[[[ 1., 2, 3, 4], [ 5, 6, 7, 8], [ 9, 10, 11, 12], [13, 14, 15, 16]]]]) >>> output, indices = pool(input) >>> unpool(output, indices) tensor([[[[ 0., 0., 0., 0.], [ 0., 6., 0., 8.], [ 0., 0., 0., 0.], [ 0., 14., 0., 16.]]]]) >>> # specify a different output size than input size >>> unpool(output, indices, output_size=torch.Size([1, 1, 5, 5])) tensor([[[[ 0., 0., 0., 0., 0.], [ 6., 0., 8., 0., 0.], [ 0., 0., 0., 14., 0.], [ 16., 0., 0., 0., 0.], [ 0., 0., 0., 0., 0.]]]]) # MaxUnpool3d `class torch.nn.MaxUnpool3d(kernel_size, stride=None, padding=0)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/pooling.html#MaxUnpool3d) Computes a partial inverse of [`MaxPool3d`](torch.nn.maxpool3d#torch.nn.MaxPool3d "torch.nn.MaxPool3d"). [`MaxPool3d`](torch.nn.maxpool3d#torch.nn.MaxPool3d "torch.nn.MaxPool3d") is not fully invertible, since the non-maximal values are lost. `MaxUnpool3d` takes in as input the output of [`MaxPool3d`](torch.nn.maxpool3d#torch.nn.MaxPool3d "torch.nn.MaxPool3d") including the indices of the maximal values and computes a partial inverse in which all non-maximal values are set to zero. Note [`MaxPool3d`](torch.nn.maxpool3d#torch.nn.MaxPool3d "torch.nn.MaxPool3d") can map several input sizes to the same output sizes. Hence, the inversion process can get ambiguous. To accommodate this, you can provide the needed output size as an additional argument `output_size` in the forward call. See the Inputs section below. Parameters * **kernel_size** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _or_[tuple](https://docs.python.org/3/library/stdtypes.html#tuple "\(in Python v3.9\)")) – Size of the max pooling window. * **stride** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _or_[tuple](https://docs.python.org/3/library/stdtypes.html#tuple "\(in Python v3.9\)")) – Stride of the max pooling window. It is set to `kernel_size` by default. 
* **padding** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _or_[tuple](https://docs.python.org/3/library/stdtypes.html#tuple "\(in Python v3.9\)")) – Padding that was added to the input Inputs: * `input`: the input Tensor to invert * `indices`: the indices given out by [`MaxPool3d`](torch.nn.maxpool3d#torch.nn.MaxPool3d "torch.nn.MaxPool3d") * `output_size` (optional): the targeted output size Shape: * Input: (N,C,Din,Hin,Win)(N, C, D_{in}, H_{in}, W_{in}) * Output: (N,C,Dout,Hout,Wout)(N, C, D_{out}, H_{out}, W_{out}) , where Dout=(Din−1)×stride[0]−2×padding[0]+kernel_size[0]D_{out} = (D_{in} - 1) \times \text{stride[0]} - 2 \times \text{padding[0]} + \text{kernel\\_size[0]} Hout=(Hin−1)×stride[1]−2×padding[1]+kernel_size[1]H_{out} = (H_{in} - 1) \times \text{stride[1]} - 2 \times \text{padding[1]} + \text{kernel\\_size[1]} Wout=(Win−1)×stride[2]−2×padding[2]+kernel_size[2]W_{out} = (W_{in} - 1) \times \text{stride[2]} - 2 \times \text{padding[2]} + \text{kernel\\_size[2]} or as given by `output_size` in the call operator Example: >>> # pool of square window of size=3, stride=2 >>> pool = nn.MaxPool3d(3, stride=2, return_indices=True) >>> unpool = nn.MaxUnpool3d(3, stride=2) >>> output, indices = pool(torch.randn(20, 16, 51, 33, 15)) >>> unpooled_output = unpool(output, indices) >>> unpooled_output.size() torch.Size([20, 16, 51, 33, 15]) # Module `class torch.nn.Module` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/module.html#Module) Base class for all neural network modules. Your models should also subclass this class. Modules can also contain other Modules, allowing to nest them in a tree structure. You can assign the submodules as regular attributes: import torch.nn as nn import torch.nn.functional as F class Model(nn.Module): def __init__(self): super(Model, self).__init__() self.conv1 = nn.Conv2d(1, 20, 5) self.conv2 = nn.Conv2d(20, 20, 5) def forward(self, x): x = F.relu(self.conv1(x)) return F.relu(self.conv2(x)) Submodules assigned in this way will be registered, and will have their parameters converted too when you call `to()`, etc. Variables **training** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – Boolean represents whether this module is in training or evaluation mode. `add_module(name, module)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/module.html#Module.add_module) Adds a child module to the current module. The module can be accessed as an attribute using the given name. Parameters * **name** (_string_) – name of the child module. The child module can be accessed from this module using the given name * **module** (Module) – child module to be added to the module. `apply(fn)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/module.html#Module.apply) Applies `fn` recursively to every submodule (as returned by `.children()`) as well as self. Typical use includes initializing the parameters of a model (see also [torch.nn.init](../nn.init#nn-init-doc)). 
Parameters **fn** (`Module` -> None) – function to be applied to each submodule Returns self Return type Module Example: >>> @torch.no_grad() >>> def init_weights(m): >>> print(m) >>> if type(m) == nn.Linear: >>> m.weight.fill_(1.0) >>> print(m.weight) >>> net = nn.Sequential(nn.Linear(2, 2), nn.Linear(2, 2)) >>> net.apply(init_weights) Linear(in_features=2, out_features=2, bias=True) Parameter containing: tensor([[ 1., 1.], [ 1., 1.]]) Linear(in_features=2, out_features=2, bias=True) Parameter containing: tensor([[ 1., 1.], [ 1., 1.]]) Sequential( (0): Linear(in_features=2, out_features=2, bias=True) (1): Linear(in_features=2, out_features=2, bias=True) ) Sequential( (0): Linear(in_features=2, out_features=2, bias=True) (1): Linear(in_features=2, out_features=2, bias=True) ) `bfloat16()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/module.html#Module.bfloat16) Casts all floating point parameters and buffers to `bfloat16` datatype. Returns self Return type Module `buffers(recurse=True)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/module.html#Module.buffers) Returns an iterator over module buffers. Parameters **recurse** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – if True, then yields buffers of this module and all submodules. Otherwise, yields only buffers that are direct members of this module. Yields _torch.Tensor_ – module buffer Example: >>> for buf in model.buffers(): >>> print(type(buf), buf.size()) (20L,) (20L, 1L, 5L, 5L) `children()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/module.html#Module.children) Returns an iterator over immediate children modules. Yields _Module_ – a child module `cpu()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/module.html#Module.cpu) Moves all model parameters and buffers to the CPU. Returns self Return type Module `cuda(device=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/module.html#Module.cuda) Moves all model parameters and buffers to the GPU. This also makes associated parameters and buffers different objects. So it should be called before constructing optimizer if the module will live on GPU while being optimized. Parameters **device** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – if specified, all parameters will be copied to that device Returns self Return type Module `double()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/module.html#Module.double) Casts all floating point parameters and buffers to `double` datatype. Returns self Return type Module `dump_patches: bool = False` This allows better BC support for `load_state_dict()`. In `state_dict()`, the version number will be saved as in the attribute `_metadata` of the returned state dict, and thus pickled. `_metadata` is a dictionary with keys that follow the naming convention of state dict. See `_load_from_state_dict` on how to use this information in loading. If new parameters/buffers are added/removed from a module, this number shall be bumped, and the module’s `_load_from_state_dict` method can compare the version number and do appropriate changes if the state dict is from before the change. `eval()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/module.html#Module.eval) Sets the module in evaluation mode. This has any effect only on certain modules. 
See documentations of particular modules for details of their behaviors in training/evaluation mode, if they are affected, e.g. [`Dropout`](torch.nn.dropout#torch.nn.Dropout "torch.nn.Dropout"), `BatchNorm`, etc. This is equivalent with `self.train(False)`. Returns self Return type Module `extra_repr()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/module.html#Module.extra_repr) Set the extra representation of the module To print customized extra information, you should re-implement this method in your own modules. Both single-line and multi-line strings are acceptable. `float()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/module.html#Module.float) Casts all floating point parameters and buffers to float datatype. Returns self Return type Module `forward(*input)` Defines the computation performed at every call. Should be overridden by all subclasses. Note Although the recipe for forward pass needs to be defined within this function, one should call the `Module` instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them. `half()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/module.html#Module.half) Casts all floating point parameters and buffers to `half` datatype. Returns self Return type Module `load_state_dict(state_dict, strict=True)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/module.html#Module.load_state_dict) Copies parameters and buffers from `state_dict` into this module and its descendants. If `strict` is `True`, then the keys of `state_dict` must exactly match the keys returned by this module’s `state_dict()` function. Parameters * **state_dict** ([dict](https://docs.python.org/3/library/stdtypes.html#dict "\(in Python v3.9\)")) – a dict containing parameters and persistent buffers. * **strict** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – whether to strictly enforce that the keys in `state_dict` match the keys returned by this module’s `state_dict()` function. Default: `True` Returns * **missing_keys** is a list of str containing the missing keys * **unexpected_keys** is a list of str containing the unexpected keys Return type `NamedTuple` with `missing_keys` and `unexpected_keys` fields `modules()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/module.html#Module.modules) Returns an iterator over all modules in the network. Yields _Module_ – a module in the network Note Duplicate modules are returned only once. In the following example, `l` will be returned only once. Example: >>> l = nn.Linear(2, 2) >>> net = nn.Sequential(l, l) >>> for idx, m in enumerate(net.modules()): print(idx, '->', m) 0 -> Sequential( (0): Linear(in_features=2, out_features=2, bias=True) (1): Linear(in_features=2, out_features=2, bias=True) ) 1 -> Linear(in_features=2, out_features=2, bias=True) `named_buffers(prefix='', recurse=True)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/module.html#Module.named_buffers) Returns an iterator over module buffers, yielding both the name of the buffer as well as the buffer itself. Parameters * **prefix** ([str](https://docs.python.org/3/library/stdtypes.html#str "\(in Python v3.9\)")) – prefix to prepend to all buffer names. * **recurse** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – if True, then yields buffers of this module and all submodules. 
Otherwise, yields only buffers that are direct members of this module. Yields _(string, torch.Tensor)_ – Tuple containing the name and buffer Example: >>> for name, buf in self.named_buffers(): >>> if name in ['running_var']: >>> print(buf.size()) `named_children()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/module.html#Module.named_children) Returns an iterator over immediate children modules, yielding both the name of the module as well as the module itself. Yields _(string, Module)_ – Tuple containing a name and child module Example: >>> for name, module in model.named_children(): >>> if name in ['conv4', 'conv5']: >>> print(module) `named_modules(memo=None, prefix='')` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/module.html#Module.named_modules) Returns an iterator over all modules in the network, yielding both the name of the module as well as the module itself. Yields _(string, Module)_ – Tuple of name and module Note Duplicate modules are returned only once. In the following example, `l` will be returned only once. Example: >>> l = nn.Linear(2, 2) >>> net = nn.Sequential(l, l) >>> for idx, m in enumerate(net.named_modules()): print(idx, '->', m) 0 -> ('', Sequential( (0): Linear(in_features=2, out_features=2, bias=True) (1): Linear(in_features=2, out_features=2, bias=True) )) 1 -> ('0', Linear(in_features=2, out_features=2, bias=True)) `named_parameters(prefix='', recurse=True)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/module.html#Module.named_parameters) Returns an iterator over module parameters, yielding both the name of the parameter as well as the parameter itself. Parameters * **prefix** ([str](https://docs.python.org/3/library/stdtypes.html#str "\(in Python v3.9\)")) – prefix to prepend to all parameter names. * **recurse** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – if True, then yields parameters of this module and all submodules. Otherwise, yields only parameters that are direct members of this module. Yields _(string, Parameter)_ – Tuple containing the name and parameter Example: >>> for name, param in self.named_parameters(): >>> if name in ['bias']: >>> print(param.size()) `parameters(recurse=True)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/module.html#Module.parameters) Returns an iterator over module parameters. This is typically passed to an optimizer. Parameters **recurse** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – if True, then yields parameters of this module and all submodules. Otherwise, yields only parameters that are direct members of this module. Yields _Parameter_ – module parameter Example: >>> for param in model.parameters(): >>> print(type(param), param.size()) (20L,) (20L, 1L, 5L, 5L) `register_backward_hook(hook)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/module.html#Module.register_backward_hook) Registers a backward hook on the module. This function is deprecated in favor of `nn.Module.register_full_backward_hook()` and the behavior of this function will change in future versions. Returns a handle that can be used to remove the added hook by calling `handle.remove()` Return type `torch.utils.hooks.RemovableHandle` `register_buffer(name, tensor, persistent=True)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/module.html#Module.register_buffer) Adds a buffer to the module. 
This is typically used to register a buffer that should not to be considered a model parameter. For example, BatchNorm’s `running_mean` is not a parameter, but is part of the module’s state. Buffers, by default, are persistent and will be saved alongside parameters. This behavior can be changed by setting `persistent` to `False`. The only difference between a persistent buffer and a non-persistent buffer is that the latter will not be a part of this module’s `state_dict`. Buffers can be accessed as attributes using given names. Parameters * **name** (_string_) – name of the buffer. The buffer can be accessed from this module using the given name * **tensor** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – buffer to be registered. * **persistent** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – whether the buffer is part of this module’s `state_dict`. Example: >>> self.register_buffer('running_mean', torch.zeros(num_features)) `register_forward_hook(hook)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/module.html#Module.register_forward_hook) Registers a forward hook on the module. The hook will be called every time after `forward()` has computed an output. It should have the following signature: hook(module, input, output) -> None or modified output The input contains only the positional arguments given to the module. Keyword arguments won’t be passed to the hooks and only to the `forward`. The hook can modify the output. It can modify the input inplace but it will not have effect on forward since this is called after `forward()` is called. Returns a handle that can be used to remove the added hook by calling `handle.remove()` Return type `torch.utils.hooks.RemovableHandle` `register_forward_pre_hook(hook)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/module.html#Module.register_forward_pre_hook) Registers a forward pre-hook on the module. The hook will be called every time before `forward()` is invoked. It should have the following signature: hook(module, input) -> None or modified input The input contains only the positional arguments given to the module. Keyword arguments won’t be passed to the hooks and only to the `forward`. The hook can modify the input. User can either return a tuple or a single modified value in the hook. We will wrap the value into a tuple if a single value is returned(unless that value is already a tuple). Returns a handle that can be used to remove the added hook by calling `handle.remove()` Return type `torch.utils.hooks.RemovableHandle` `register_full_backward_hook(hook)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/module.html#Module.register_full_backward_hook) Registers a backward hook on the module. The hook will be called every time the gradients with respect to module inputs are computed. The hook should have the following signature: hook(module, grad_input, grad_output) -> tuple(Tensor) or None The `grad_input` and `grad_output` are tuples that contain the gradients with respect to the inputs and outputs respectively. The hook should not modify its arguments, but it can optionally return a new gradient with respect to the input that will be used in place of `grad_input` in subsequent computations. `grad_input` will only correspond to the inputs given as positional arguments and all kwarg arguments are ignored. Entries in `grad_input` and `grad_output` will be `None` for all non-Tensor arguments. 
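As a rough, illustrative sketch (the hook, layer, and tensor sizes below are invented for this example and are not part of the documented API), a full backward hook can inspect the gradients and then be detached through the returned handle:

>>> import torch
>>> import torch.nn as nn
>>> def report_grad_norms(module, grad_input, grad_output):
...     # print the norm of each gradient flowing out of the module;
...     # returning None leaves the gradients unchanged
...     print([g.norm().item() for g in grad_output if g is not None])
>>> layer = nn.Linear(4, 2)
>>> handle = layer.register_full_backward_hook(report_grad_norms)
>>> layer(torch.randn(3, 4)).sum().backward()  # triggers the hook during the backward pass
>>> handle.remove()  # detach the hook once it is no longer needed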
Warning Modifying inputs or outputs inplace is not allowed when using backward hooks and will raise an error. Returns a handle that can be used to remove the added hook by calling `handle.remove()` Return type `torch.utils.hooks.RemovableHandle` `register_parameter(name, param)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/module.html#Module.register_parameter) Adds a parameter to the module. The parameter can be accessed as an attribute using given name. Parameters * **name** (_string_) – name of the parameter. The parameter can be accessed from this module using the given name * **param** ([Parameter](torch.nn.parameter.parameter#torch.nn.parameter.Parameter "torch.nn.parameter.Parameter")) – parameter to be added to the module. `requires_grad_(requires_grad=True)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/module.html#Module.requires_grad_) Change if autograd should record operations on parameters in this module. This method sets the parameters’ `requires_grad` attributes in-place. This method is helpful for freezing part of the module for finetuning or training parts of a model individually (e.g., GAN training). Parameters **requires_grad** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – whether autograd should record operations on parameters in this module. Default: `True`. Returns self Return type Module `state_dict(destination=None, prefix='', keep_vars=False)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/module.html#Module.state_dict) Returns a dictionary containing a whole state of the module. Both parameters and persistent buffers (e.g. running averages) are included. Keys are corresponding parameter and buffer names. Returns a dictionary containing a whole state of the module Return type [dict](https://docs.python.org/3/library/stdtypes.html#dict "\(in Python v3.9\)") Example: >>> module.state_dict().keys() ['bias', 'weight'] `to(*args, **kwargs)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/module.html#Module.to) Moves and/or casts the parameters and buffers. This can be called as `to(device=None, dtype=None, non_blocking=False)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/module.html#Module.to) `to(dtype, non_blocking=False)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/module.html#Module.to) `to(tensor, non_blocking=False)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/module.html#Module.to) `to(memory_format=torch.channels_last)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/module.html#Module.to) Its signature is similar to [`torch.Tensor.to()`](../tensors#torch.Tensor.to "torch.Tensor.to"), but only accepts floating point or complex `dtype`s. In addition, this method will only cast the floating point or complex parameters and buffers to :attr:`dtype` (if given). The integral parameters and buffers will be moved `device`, if that is given, but with dtypes unchanged. When `non_blocking` is set, it tries to convert/move asynchronously with respect to the host if possible, e.g., moving CPU Tensors with pinned memory to CUDA devices. See below for examples. Note This method modifies the module in-place. 
Parameters * **device** (`torch.device`) – the desired device of the parameters and buffers in this module * **dtype** (`torch.dtype`) – the desired floating point or complex dtype of the parameters and buffers in this module * **tensor** ([torch.Tensor](../tensors#torch.Tensor "torch.Tensor")) – Tensor whose dtype and device are the desired dtype and device for all parameters and buffers in this module * **memory_format** (`torch.memory_format`) – the desired memory format for 4D parameters and buffers in this module (keyword only argument) Returns self Return type Module Examples: >>> linear = nn.Linear(2, 2) >>> linear.weight Parameter containing: tensor([[ 0.1913, -0.3420], [-0.5113, -0.2325]]) >>> linear.to(torch.double) Linear(in_features=2, out_features=2, bias=True) >>> linear.weight Parameter containing: tensor([[ 0.1913, -0.3420], [-0.5113, -0.2325]], dtype=torch.float64) >>> gpu1 = torch.device("cuda:1") >>> linear.to(gpu1, dtype=torch.half, non_blocking=True) Linear(in_features=2, out_features=2, bias=True) >>> linear.weight Parameter containing: tensor([[ 0.1914, -0.3420], [-0.5112, -0.2324]], dtype=torch.float16, device='cuda:1') >>> cpu = torch.device("cpu") >>> linear.to(cpu) Linear(in_features=2, out_features=2, bias=True) >>> linear.weight Parameter containing: tensor([[ 0.1914, -0.3420], [-0.5112, -0.2324]], dtype=torch.float16) >>> linear = nn.Linear(2, 2, bias=None).to(torch.cdouble) >>> linear.weight Parameter containing: tensor([[ 0.3741+0.j, 0.2382+0.j], [ 0.5593+0.j, -0.4443+0.j]], dtype=torch.complex128) >>> linear(torch.ones(3, 2, dtype=torch.cdouble)) tensor([[0.6122+0.j, 0.1150+0.j], [0.6122+0.j, 0.1150+0.j], [0.6122+0.j, 0.1150+0.j]], dtype=torch.complex128) `train(mode=True)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/module.html#Module.train) Sets the module in training mode. This has any effect only on certain modules. See documentations of particular modules for details of their behaviors in training/evaluation mode, if they are affected, e.g. [`Dropout`](torch.nn.dropout#torch.nn.Dropout "torch.nn.Dropout"), `BatchNorm`, etc. Parameters **mode** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – whether to set training mode (`True`) or evaluation mode (`False`). Default: `True`. Returns self Return type Module `type(dst_type)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/module.html#Module.type) Casts all parameters and buffers to `dst_type`. Parameters **dst_type** ([type](https://docs.python.org/3/library/functions.html#type "\(in Python v3.9\)") _or_ _string_) – the desired type Returns self Return type Module `xpu(device=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/module.html#Module.xpu) Moves all model parameters and buffers to the XPU. This also makes associated parameters and buffers different objects. So it should be called before constructing optimizer if the module will live on XPU while being optimized. Parameters **device** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – if specified, all parameters will be copied to that device Returns self Return type Module `zero_grad(set_to_none=False)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/module.html#Module.zero_grad) Sets gradients of all model parameters to zero. See similar function under [`torch.optim.Optimizer`](../optim#torch.optim.Optimizer "torch.optim.Optimizer") for more context. 
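As a small sketch (the model, optimizer, and data below are placeholders, not part of the documented API), gradients are typically cleared once per training step; the `set_to_none` flag is described just below:

>>> import torch
>>> import torch.nn as nn
>>> model = nn.Linear(4, 2)
>>> optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
>>> loss = model(torch.randn(8, 4)).sum()
>>> loss.backward()
>>> optimizer.step()
>>> model.zero_grad(set_to_none=True)  # gradients become None instead of zero-filled tensors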
Parameters **set_to_none** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – instead of setting to zero, set the grads to None. See [`torch.optim.Optimizer.zero_grad()`](../optim#torch.optim.Optimizer.zero_grad "torch.optim.Optimizer.zero_grad") for details. # ModuleDict `class torch.nn.ModuleDict(modules=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/container.html#ModuleDict) Holds submodules in a dictionary. `ModuleDict` can be indexed like a regular Python dictionary, but modules it contains are properly registered, and will be visible by all [`Module`](torch.nn.module#torch.nn.Module "torch.nn.Module") methods. `ModuleDict` is an **ordered** dictionary that respects * the order of insertion, and * in `update()`, the order of the merged `OrderedDict`, `dict` (started from Python 3.6) or another `ModuleDict` (the argument to `update()`). Note that `update()` with other unordered mapping types (e.g., Python’s plain `dict` before Python version 3.6) does not preserve the order of the merged mapping. Parameters **modules** (_iterable_ _,__optional_) – a mapping (dictionary) of (string: module) or an iterable of key-value pairs of type (string, module) Example: class MyModule(nn.Module): def __init__(self): super(MyModule, self).__init__() self.choices = nn.ModuleDict({ 'conv': nn.Conv2d(10, 10, 3), 'pool': nn.MaxPool2d(3) }) self.activations = nn.ModuleDict([ ['lrelu', nn.LeakyReLU()], ['prelu', nn.PReLU()] ]) def forward(self, x, choice, act): x = self.choices[choice](x) x = self.activations[act](x) return x `clear()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/container.html#ModuleDict.clear) Remove all items from the ModuleDict. `items()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/container.html#ModuleDict.items) Return an iterable of the ModuleDict key/value pairs. `keys()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/container.html#ModuleDict.keys) Return an iterable of the ModuleDict keys. `pop(key)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/container.html#ModuleDict.pop) Remove key from the ModuleDict and return its module. Parameters **key** (_string_) – key to pop from the ModuleDict `update(modules)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/container.html#ModuleDict.update) Update the `ModuleDict` with the key-value pairs from a mapping or an iterable, overwriting existing keys. Note If `modules` is an `OrderedDict`, a `ModuleDict`, or an iterable of key-value pairs, the order of new elements in it is preserved. Parameters **modules** (_iterable_) – a mapping (dictionary) from string to [`Module`](torch.nn.module#torch.nn.Module "torch.nn.Module"), or an iterable of key-value pairs of type (string, [`Module`](torch.nn.module#torch.nn.Module "torch.nn.Module")) `values()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/container.html#ModuleDict.values) Return an iterable of the ModuleDict values. # ModuleList `class torch.nn.ModuleList(modules=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/container.html#ModuleList) Holds submodules in a list. `ModuleList` can be indexed like a regular Python list, but modules it contains are properly registered, and will be visible by all [`Module`](torch.nn.module#torch.nn.Module "torch.nn.Module") methods. 
Parameters **modules** (_iterable_ _,__optional_) – an iterable of modules to add Example: class MyModule(nn.Module): def __init__(self): super(MyModule, self).__init__() self.linears = nn.ModuleList([nn.Linear(10, 10) for i in range(10)]) def forward(self, x): # ModuleList can act as an iterable, or be indexed using ints for i, l in enumerate(self.linears): x = self.linears[i // 2](x) + l(x) return x `append(module)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/container.html#ModuleList.append) Appends a given module to the end of the list. Parameters **module** ([nn.Module](torch.nn.module#torch.nn.Module "torch.nn.Module")) – module to append `extend(modules)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/container.html#ModuleList.extend) Appends modules from a Python iterable to the end of the list. Parameters **modules** (_iterable_) – iterable of modules to append `insert(index, module)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/container.html#ModuleList.insert) Insert a given module before a given index in the list. Parameters * **index** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – index to insert. * **module** ([nn.Module](torch.nn.module#torch.nn.Module "torch.nn.Module")) – module to insert # LazyModuleMixin `class torch.nn.modules.lazy.LazyModuleMixin(*args, **kwargs)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/lazy.html#LazyModuleMixin) A mixin for modules that lazily initialize parameters, also known as “lazy modules.” Modules that lazily initialize parameters, or “lazy modules”, derive the shapes of their parameters from the first input(s) to their forward method. Until that first forward they contain `torch.nn.UninitializedParameter`s that should not be accessed or used, and afterward they contain regular :class:`torch.nn.Parameter`s. Lazy modules are convenient since they don't require computing some module arguments, like the `in_features` argument of a typical [`torch.nn.Linear`](torch.nn.linear#torch.nn.Linear "torch.nn.Linear"). After construction, networks with lazy modules should first be converted to the desired dtype and placed on the desired device. The lazy modules should then be initialized with one or more “dry runs”. These “dry runs” send inputs of the correct size, dtype, and device through the network and to each one of its lazy modules. After this the network can be used as usual. >>> class LazyMLP(torch.nn.Module): ... def __init__(self): ... super().__init__() ... self.fc1 = torch.nn.LazyLinear(10) ... self.relu1 = torch.nn.ReLU() ... self.fc2 = torch.nn.LazyLinear(1) ... self.relu2 = torch.nn.ReLU() ... ... def forward(self, input): ... x = self.relu1(self.fc1(input)) ... y = self.relu2(self.fc2(x)) ... 
return y

>>> # constructs a network with lazy modules
>>> lazy_mlp = LazyMLP()
>>> # transforms the network's device and dtype
>>> # NOTE: these transforms can and should be applied after construction and before any 'dry runs'
>>> lazy_mlp = lazy_mlp.cuda().double()
>>> lazy_mlp
LazyMLP(
  (fc1): LazyLinear(in_features=0, out_features=10, bias=True)
  (relu1): ReLU()
  (fc2): LazyLinear(in_features=0, out_features=1, bias=True)
  (relu2): ReLU()
)
>>> # performs a dry run to initialize the network's lazy modules
>>> lazy_mlp(torch.ones(10, 10).cuda().double())
>>> # after initialization, LazyLinear modules become regular Linear modules
>>> lazy_mlp
LazyMLP(
  (fc1): Linear(in_features=10, out_features=10, bias=True)
  (relu1): ReLU()
  (fc2): Linear(in_features=10, out_features=1, bias=True)
  (relu2): ReLU()
)
>>> # attaches an optimizer, since parameters can now be used as usual
>>> optim = torch.optim.SGD(lazy_mlp.parameters(), lr=0.01)

A final caveat when using lazy modules is that the order of initialization of a network’s parameters may change, since the lazy modules are always initialized after other modules. This can cause the parameters of a network using lazy modules to be initialized differently than the parameters of a network without lazy modules. For example, if the LazyMLP class defined above had a [`torch.nn.LazyLinear`](torch.nn.lazylinear#torch.nn.LazyLinear "torch.nn.LazyLinear") module first and then a regular [`torch.nn.Linear`](torch.nn.linear#torch.nn.Linear "torch.nn.Linear") second, the second module would be initialized on construction and the first module would be initialized during the first dry run.

Lazy modules can be serialized with a state dict like other modules. For example:

>>> lazy_mlp = LazyMLP()
>>> # The state dict shows the uninitialized parameters
>>> lazy_mlp.state_dict()
OrderedDict([('fc1.weight', Uninitialized parameter),
             ('fc1.bias',
              tensor([-1.8832e+25, 4.5636e-41, -1.8832e+25, 4.5636e-41, -6.1598e-30,
                      4.5637e-41, -1.8788e+22, 4.5636e-41, -2.0042e-31, 4.5637e-41])),
             ('fc2.weight', Uninitialized parameter),
             ('fc2.bias', tensor([0.0019]))])

Lazy modules can also load regular `torch.nn.Parameter`s, which replace their `torch.nn.UninitializedParameter`s:

>>> full_mlp = LazyMLP()
>>> # Dry run to initialize another module
>>> full_mlp.forward(torch.ones(10, 1))
>>> # Load an initialized state into a lazy module
>>> lazy_mlp.load_state_dict(full_mlp.state_dict())
>>> # The state dict now holds valid values
>>> lazy_mlp.state_dict()
OrderedDict([('fc1.weight',
              tensor([[-0.3837], [ 0.0907], [ 0.6708], [-0.5223], [-0.9028], [ 0.2851],
                      [-0.4537], [ 0.6813], [ 0.5766], [-0.8678]])),
             ('fc1.bias',
              tensor([-1.8832e+25, 4.5636e-41, -1.8832e+25, 4.5636e-41, -6.1598e-30,
                      4.5637e-41, -1.8788e+22, 4.5636e-41, -2.0042e-31, 4.5637e-41])),
             ('fc2.weight',
              tensor([[ 0.1320, 0.2938, 0.0679, 0.2793, 0.1088, -0.1795, -0.2301, 0.2807,
                        0.2479, 0.1091]])),
             ('fc2.bias', tensor([0.0019]))])

Note, however, that lazy modules cannot validate that the shape of parameters they load is correct.

`has_uninitialized_params()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/lazy.html#LazyModuleMixin.has_uninitialized_params)

Check if a module has parameters that are not initialized.

`initialize_parameters(*args, **kwargs)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/lazy.html#LazyModuleMixin.initialize_parameters)

Initialize parameters according to the input batch properties.
This adds an interface to isolate parameter initialization from the forward pass when doing parameter shape inference. # torch.nn.modules.module.register_module_backward_hook `torch.nn.modules.module.register_module_backward_hook(hook)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/module.html#register_module_backward_hook) Registers a backward hook common to all the modules. This function is deprecated in favor of `nn.module.register_module_full_backward_hook()` and the behavior of this function will change in future versions. Returns a handle that can be used to remove the added hook by calling `handle.remove()` Return type `torch.utils.hooks.RemovableHandle` # torch.nn.modules.module.register_module_forward_hook `torch.nn.modules.module.register_module_forward_hook(hook)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/module.html#register_module_forward_hook) Registers a global forward hook for all the modules Warning This adds global state to the `nn.module` module and it is only intended for debugging/profiling purposes. The hook will be called every time after `forward()` has computed an output. It should have the following signature: hook(module, input, output) -> None or modified output The input contains only the positional arguments given to the module. Keyword arguments won’t be passed to the hooks and only to the `forward`. The hook can modify the output. It can modify the input inplace but it will not have effect on forward since this is called after `forward()` is called. Returns a handle that can be used to remove the added hook by calling `handle.remove()` Return type `torch.utils.hooks.RemovableHandle` This hook will be executed before specific module hooks registered with `register_forward_hook`. # torch.nn.modules.module.register_module_forward_pre_hook `torch.nn.modules.module.register_module_forward_pre_hook(hook)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/module.html#register_module_forward_pre_hook) Registers a forward pre-hook common to all modules. Warning This adds global state to the `nn.module` module and it is only intended for debugging/profiling purposes. The hook will be called every time before `forward()` is invoked. It should have the following signature: hook(module, input) -> None or modified input The input contains only the positional arguments given to the module. Keyword arguments won’t be passed to the hooks and only to the `forward`. The hook can modify the input. User can either return a tuple or a single modified value in the hook. We will wrap the value into a tuple if a single value is returned(unless that value is already a tuple). This hook has precedence over the specific module hooks registered with `register_forward_pre_hook`. Returns a handle that can be used to remove the added hook by calling `handle.remove()` Return type `torch.utils.hooks.RemovableHandle` # MSELoss `class torch.nn.MSELoss(size_average=None, reduce=None, reduction='mean')` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/loss.html#MSELoss) Creates a criterion that measures the mean squared error (squared L2 norm) between each element in the input xx and target yy . The unreduced (i.e. with `reduction` set to `'none'`) loss can be described as: ℓ(x,y)=L={l1,…,lN}⊤,ln=(xn−yn)2,\ell(x, y) = L = \\{l_1,\dots,l_N\\}^\top, \quad l_n = \left( x_n - y_n \right)^2, where NN is the batch size. 
If `reduction` is not `'none'` (default `'mean'`), then: ℓ(x,y)={mean⁡(L),if reduction=‘mean’;sum⁡(L),if reduction=‘sum’.\ell(x, y) = \begin{cases} \operatorname{mean}(L), & \text{if reduction} = \text{`mean';}\\\ \operatorname{sum}(L), & \text{if reduction} = \text{`sum'.} \end{cases} xx and yy are tensors of arbitrary shapes with a total of nn elements each. The mean operation still operates over all the elements, and divides by nn . The division by nn can be avoided if one sets `reduction = 'sum'`. Parameters * **size_average** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – Deprecated (see `reduction`). By default, the losses are averaged over each loss element in the batch. Note that for some losses, there are multiple elements per sample. If the field `size_average` is set to `False`, the losses are instead summed for each minibatch. Ignored when `reduce` is `False`. Default: `True` * **reduce** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – Deprecated (see `reduction`). By default, the losses are averaged or summed over observations for each minibatch depending on `size_average`. When `reduce` is `False`, returns a loss per batch element instead and ignores `size_average`. Default: `True` * **reduction** (_string_ _,__optional_) – Specifies the reduction to apply to the output: `'none'` | `'mean'` | `'sum'`. `'none'`: no reduction will be applied, `'mean'`: the sum of the output will be divided by the number of elements in the output, `'sum'`: the output will be summed. Note: `size_average` and `reduce` are in the process of being deprecated, and in the meantime, specifying either of those two args will override `reduction`. Default: `'mean'` Shape: * Input: (N,∗)(N, *) where ∗* means, any number of additional dimensions * Target: (N,∗)(N, *) , same shape as the input Examples: >>> loss = nn.MSELoss() >>> input = torch.randn(3, 5, requires_grad=True) >>> target = torch.randn(3, 5) >>> output = loss(input, target) >>> output.backward() # MultiheadAttention `class torch.nn.MultiheadAttention(embed_dim, num_heads, dropout=0.0, bias=True, add_bias_kv=False, add_zero_attn=False, kdim=None, vdim=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/activation.html#MultiheadAttention) Allows the model to jointly attend to information from different representation subspaces. See [Attention Is All You Need](https://arxiv.org/abs/1706.03762) MultiHead(Q,K,V)=Concat(head1,…,headh)WO\text{MultiHead}(Q, K, V) = \text{Concat}(head_1,\dots,head_h)W^O where headi=Attention(QWiQ,KWiK,VWiV)head_i = \text{Attention}(QW_i^Q, KW_i^K, VW_i^V) . Parameters * **embed_dim** – total dimension of the model. * **num_heads** – parallel attention heads. * **dropout** – a Dropout layer on attn_output_weights. Default: 0.0. * **bias** – add bias as module parameter. Default: True. * **add_bias_kv** – add bias to the key and value sequences at dim=0. * **add_zero_attn** – add a new batch of zeros to the key and value sequences at dim=1. * **kdim** – total number of features in key. Default: None. * **vdim** – total number of features in value. Default: None. Note that if `kdim` and `vdim` are None, they will be set to `embed_dim` such that query, key, and value have the same number of features. 
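As a supplement to the brief example that follows, here is a self-contained sketch with concrete, made-up sizes; in this release inputs are laid out as (L, N, E) for the query and (S, N, E) for the key and value, as detailed in the shape section further below:

>>> import torch
>>> import torch.nn as nn
>>> embed_dim, num_heads = 16, 4   # embed_dim must be divisible by num_heads
>>> multihead_attn = nn.MultiheadAttention(embed_dim, num_heads)
>>> L, S, N = 5, 7, 2              # target length, source length, batch size (illustrative values)
>>> query = torch.randn(L, N, embed_dim)
>>> key = torch.randn(S, N, embed_dim)
>>> value = torch.randn(S, N, embed_dim)
>>> attn_output, attn_output_weights = multihead_attn(query, key, value)
>>> attn_output.shape, attn_output_weights.shape
(torch.Size([5, 2, 16]), torch.Size([2, 5, 7]))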
Examples: >>> multihead_attn = nn.MultiheadAttention(embed_dim, num_heads) >>> attn_output, attn_output_weights = multihead_attn(query, key, value) `forward(query, key, value, key_padding_mask=None, need_weights=True, attn_mask=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/activation.html#MultiheadAttention.forward) Parameters * **key, value** (_query_ _,_) – map a query and a set of key-value pairs to an output. See “Attention Is All You Need” for more details. * **key_padding_mask** – if provided, specified padding elements in the key will be ignored by the attention. When given a binary mask and a value is True, the corresponding value on the attention layer will be ignored. When given a byte mask and a value is non-zero, the corresponding value on the attention layer will be ignored * **need_weights** – output attn_output_weights. * **attn_mask** – 2D or 3D mask that prevents attention to certain positions. A 2D mask will be broadcasted for all the batches while a 3D mask allows to specify a different mask for the entries of each batch. Shapes for inputs: * query: (L,N,E)(L, N, E) where L is the target sequence length, N is the batch size, E is the embedding dimension. * key: (S,N,E)(S, N, E) , where S is the source sequence length, N is the batch size, E is the embedding dimension. * value: (S,N,E)(S, N, E) where S is the source sequence length, N is the batch size, E is the embedding dimension. * key_padding_mask: (N,S)(N, S) where N is the batch size, S is the source sequence length. If a ByteTensor is provided, the non-zero positions will be ignored while the position with the zero positions will be unchanged. If a BoolTensor is provided, the positions with the value of `True` will be ignored while the position with the value of `False` will be unchanged. * attn_mask: if a 2D mask: (L,S)(L, S) where L is the target sequence length, S is the source sequence length. If a 3D mask: (N⋅num_heads,L,S)(N\cdot\text{num\\_heads}, L, S) where N is the batch size, L is the target sequence length, S is the source sequence length. `attn_mask` ensure that position i is allowed to attend the unmasked positions. If a ByteTensor is provided, the non-zero positions are not allowed to attend while the zero positions will be unchanged. If a BoolTensor is provided, positions with `True` is not allowed to attend while `False` values will be unchanged. If a FloatTensor is provided, it will be added to the attention weight. Shapes for outputs: * attn_output: (L,N,E)(L, N, E) where L is the target sequence length, N is the batch size, E is the embedding dimension. * attn_output_weights: (N,L,S)(N, L, S) where N is the batch size, L is the target sequence length, S is the source sequence length. # MultiLabelMarginLoss `class torch.nn.MultiLabelMarginLoss(size_average=None, reduce=None, reduction='mean')` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/loss.html#MultiLabelMarginLoss) Creates a criterion that optimizes a multi-class multi-classification hinge loss (margin-based loss) between input xx (a 2D mini-batch `Tensor`) and output yy (which is a 2D `Tensor` of target class indices). 
For each sample in the mini-batch: loss(x,y)=∑ijmax⁡(0,1−(x[y[j]]−x[i]))x.size(0)\text{loss}(x, y) = \sum_{ij}\frac{\max(0, 1 - (x[y[j]] - x[i]))}{\text{x.size}(0)} where x∈{0,⋯,x.size(0)−1}x \in \left\\{0, \; \cdots , \; \text{x.size}(0) - 1\right\\} , y∈{0,⋯,y.size(0)−1}y \in \left\\{0, \; \cdots , \; \text{y.size}(0) - 1\right\\} , 0≤y[j]≤x.size(0)−10 \leq y[j] \leq \text{x.size}(0)-1 , and i≠y[j]i \neq y[j] for all ii and jj . yy and xx must have the same size. The criterion only considers a contiguous block of non-negative targets that starts at the front. This allows for different samples to have variable amounts of target classes. Parameters * **size_average** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – Deprecated (see `reduction`). By default, the losses are averaged over each loss element in the batch. Note that for some losses, there are multiple elements per sample. If the field `size_average` is set to `False`, the losses are instead summed for each minibatch. Ignored when `reduce` is `False`. Default: `True` * **reduce** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – Deprecated (see `reduction`). By default, the losses are averaged or summed over observations for each minibatch depending on `size_average`. When `reduce` is `False`, returns a loss per batch element instead and ignores `size_average`. Default: `True` * **reduction** (_string_ _,__optional_) – Specifies the reduction to apply to the output: `'none'` | `'mean'` | `'sum'`. `'none'`: no reduction will be applied, `'mean'`: the sum of the output will be divided by the number of elements in the output, `'sum'`: the output will be summed. Note: `size_average` and `reduce` are in the process of being deprecated, and in the meantime, specifying either of those two args will override `reduction`. Default: `'mean'` Shape: * Input: (C)(C) or (N,C)(N, C) where `N` is the batch size and `C` is the number of classes. * Target: (C)(C) or (N,C)(N, C) , label targets padded by -1 ensuring same shape as the input. * Output: scalar. If `reduction` is `'none'`, then (N)(N) . Examples: >>> loss = nn.MultiLabelMarginLoss() >>> x = torch.FloatTensor([[0.1, 0.2, 0.4, 0.8]]) >>> # for target y, only consider labels 3 and 0, not after label -1 >>> y = torch.LongTensor([[3, 0, -1, 1]]) >>> loss(x, y) >>> # 0.25 * ((1-(0.1-0.2)) + (1-(0.1-0.4)) + (1-(0.8-0.2)) + (1-(0.8-0.4))) tensor(0.8500) # MultiLabelSoftMarginLoss `class torch.nn.MultiLabelSoftMarginLoss(weight=None, size_average=None, reduce=None, reduction='mean')` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/loss.html#MultiLabelSoftMarginLoss) Creates a criterion that optimizes a multi-label one-versus-all loss based on max-entropy, between input xx and target yy of size (N,C)(N, C) . For each sample in the minibatch: loss(x,y)=−1C∗∑iy[i]∗log⁡((1+exp⁡(−x[i]))−1)+(1−y[i])∗log⁡(exp⁡(−x[i])(1+exp⁡(−x[i])))loss(x, y) = - \frac{1}{C} * \sum_i y[i] * \log((1 + \exp(-x[i]))^{-1}) + (1-y[i]) * \log\left(\frac{\exp(-x[i])}{(1 + \exp(-x[i]))}\right) where i∈{0,⋯,x.nElement()−1}i \in \left\\{0, \; \cdots , \; \text{x.nElement}() - 1\right\\} , y[i]∈{0,1}y[i] \in \left\\{0, \; 1\right\\} . Parameters * **weight** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – a manual rescaling weight given to each class. If given, it has to be a Tensor of size `C`. Otherwise, it is treated as if having all ones. 
* **size_average** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – Deprecated (see `reduction`). By default, the losses are averaged over each loss element in the batch. Note that for some losses, there are multiple elements per sample. If the field `size_average` is set to `False`, the losses are instead summed for each minibatch. Ignored when `reduce` is `False`. Default: `True` * **reduce** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – Deprecated (see `reduction`). By default, the losses are averaged or summed over observations for each minibatch depending on `size_average`. When `reduce` is `False`, returns a loss per batch element instead and ignores `size_average`. Default: `True` * **reduction** (_string_ _,__optional_) – Specifies the reduction to apply to the output: `'none'` | `'mean'` | `'sum'`. `'none'`: no reduction will be applied, `'mean'`: the sum of the output will be divided by the number of elements in the output, `'sum'`: the output will be summed. Note: `size_average` and `reduce` are in the process of being deprecated, and in the meantime, specifying either of those two args will override `reduction`. Default: `'mean'` Shape: * Input: (N,C)(N, C) where `N` is the batch size and `C` is the number of classes. * Target: (N,C)(N, C) , label targets padded by -1 ensuring same shape as the input. * Output: scalar. If `reduction` is `'none'`, then (N)(N) . # MultiMarginLoss `class torch.nn.MultiMarginLoss(p=1, margin=1.0, weight=None, size_average=None, reduce=None, reduction='mean')` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/loss.html#MultiMarginLoss) Creates a criterion that optimizes a multi-class classification hinge loss (margin-based loss) between input xx (a 2D mini-batch `Tensor`) and output yy (which is a 1D tensor of target class indices, 0≤y≤x.size(1)−10 \leq y \leq \text{x.size}(1)-1 ): For each mini-batch sample, the loss in terms of the 1D input xx and scalar output yy is: loss(x,y)=∑imax⁡(0,margin−x[y]+x[i]))px.size(0)\text{loss}(x, y) = \frac{\sum_i \max(0, \text{margin} - x[y] + x[i]))^p}{\text{x.size}(0)} where x∈{0,⋯,x.size(0)−1}x \in \left\\{0, \; \cdots , \; \text{x.size}(0) - 1\right\\} and i≠yi \neq y . Optionally, you can give non-equal weighting on the classes by passing a 1D `weight` tensor into the constructor. The loss function then becomes: loss(x,y)=∑imax⁡(0,w[y]∗(margin−x[y]+x[i]))p)x.size(0)\text{loss}(x, y) = \frac{\sum_i \max(0, w[y] * (\text{margin} - x[y] + x[i]))^p)}{\text{x.size}(0)} Parameters * **p** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – Has a default value of 11 . 11 and 22 are the only supported values. * **margin** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _,__optional_) – Has a default value of 11 . * **weight** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – a manual rescaling weight given to each class. If given, it has to be a Tensor of size `C`. Otherwise, it is treated as if having all ones. * **size_average** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – Deprecated (see `reduction`). By default, the losses are averaged over each loss element in the batch. Note that for some losses, there are multiple elements per sample. If the field `size_average` is set to `False`, the losses are instead summed for each minibatch. 
Ignored when `reduce` is `False`. Default: `True` * **reduce** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – Deprecated (see `reduction`). By default, the losses are averaged or summed over observations for each minibatch depending on `size_average`. When `reduce` is `False`, returns a loss per batch element instead and ignores `size_average`. Default: `True` * **reduction** (_string_ _,__optional_) – Specifies the reduction to apply to the output: `'none'` | `'mean'` | `'sum'`. `'none'`: no reduction will be applied, `'mean'`: the sum of the output will be divided by the number of elements in the output, `'sum'`: the output will be summed. Note: `size_average` and `reduce` are in the process of being deprecated, and in the meantime, specifying either of those two args will override `reduction`. Default: `'mean'` # NLLLoss `class torch.nn.NLLLoss(weight=None, size_average=None, ignore_index=-100, reduce=None, reduction='mean')` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/loss.html#NLLLoss) The negative log likelihood loss. It is useful to train a classification problem with `C` classes. If provided, the optional argument `weight` should be a 1D Tensor assigning weight to each of the classes. This is particularly useful when you have an unbalanced training set. The `input` given through a forward call is expected to contain log- probabilities of each class. `input` has to be a Tensor of size either (minibatch,C)(minibatch, C) or (minibatch,C,d1,d2,...,dK)(minibatch, C, d_1, d_2, ..., d_K) with K≥1K \geq 1 for the `K`-dimensional case (described later). Obtaining log-probabilities in a neural network is easily achieved by adding a `LogSoftmax` layer in the last layer of your network. You may use `CrossEntropyLoss` instead, if you prefer not to add an extra layer. The `target` that this loss expects should be a class index in the range [0,C−1][0, C-1] where `C = number of classes`; if `ignore_index` is specified, this loss also accepts this class index (this index may not necessarily be in the class range). The unreduced (i.e. with `reduction` set to `'none'`) loss can be described as: ℓ(x,y)=L={l1,…,lN}⊤,ln=−wynxn,yn,wc=weight[c]⋅1{c≠ignore_index},\ell(x, y) = L = \\{l_1,\dots,l_N\\}^\top, \quad l_n = - w_{y_n} x_{n,y_n}, \quad w_{c} = \text{weight}[c] \cdot \mathbb{1}\\{c \not= \text{ignore\\_index}\\}, where xx is the input, yy is the target, ww is the weight, and NN is the batch size. If `reduction` is not `'none'` (default `'mean'`), then ℓ(x,y)={∑n=1N1∑n=1Nwynln,if reduction=‘mean’;∑n=1Nln,if reduction=‘sum’.\ell(x, y) = \begin{cases} \sum_{n=1}^N \frac{1}{\sum_{n=1}^N w_{y_n}} l_n, & \text{if reduction} = \text{`mean';}\\\ \sum_{n=1}^N l_n, & \text{if reduction} = \text{`sum'.} \end{cases} Can also be used for higher dimension inputs, such as 2D images, by providing an input of size (minibatch,C,d1,d2,...,dK)(minibatch, C, d_1, d_2, ..., d_K) with K≥1K \geq 1 , where KK is the number of dimensions, and a target of appropriate shape (see below). In the case of images, it computes NLL loss per-pixel. Parameters * **weight** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – a manual rescaling weight given to each class. If given, it has to be a Tensor of size `C`. Otherwise, it is treated as if having all ones. * **size_average** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – Deprecated (see `reduction`). 
By default, the losses are averaged over each loss element in the batch. Note that for some losses, there are multiple elements per sample. If the field `size_average` is set to `False`, the losses are instead summed for each minibatch. Ignored when `reduce` is `False`. Default: `True` * **ignore_index** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – Specifies a target value that is ignored and does not contribute to the input gradient. When `size_average` is `True`, the loss is averaged over non-ignored targets. * **reduce** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – Deprecated (see `reduction`). By default, the losses are averaged or summed over observations for each minibatch depending on `size_average`. When `reduce` is `False`, returns a loss per batch element instead and ignores `size_average`. Default: `True` * **reduction** (_string_ _,__optional_) – Specifies the reduction to apply to the output: `'none'` | `'mean'` | `'sum'`. `'none'`: no reduction will be applied, `'mean'`: the weighted mean of the output is taken, `'sum'`: the output will be summed. Note: `size_average` and `reduce` are in the process of being deprecated, and in the meantime, specifying either of those two args will override `reduction`. Default: `'mean'` Shape: * Input: (N,C)(N, C) where `C = number of classes`, or (N,C,d1,d2,...,dK)(N, C, d_1, d_2, ..., d_K) with K≥1K \geq 1 in the case of `K`-dimensional loss. * Target: (N)(N) where each value is 0≤targets[i]≤C−10 \leq \text{targets}[i] \leq C-1 , or (N,d1,d2,...,dK)(N, d_1, d_2, ..., d_K) with K≥1K \geq 1 in the case of K-dimensional loss. * Output: scalar. If `reduction` is `'none'`, then the same size as the target: (N)(N) , or (N,d1,d2,...,dK)(N, d_1, d_2, ..., d_K) with K≥1K \geq 1 in the case of K-dimensional loss. Examples: >>> m = nn.LogSoftmax(dim=1) >>> loss = nn.NLLLoss() >>> # input is of size N x C = 3 x 5 >>> input = torch.randn(3, 5, requires_grad=True) >>> # each element in target has to have 0 <= value < C >>> target = torch.tensor([1, 0, 4]) >>> output = loss(m(input), target) >>> output.backward() >>> >>> >>> # 2D loss example (used, for example, with image inputs) >>> N, C = 5, 4 >>> loss = nn.NLLLoss() >>> # input is of size N x C x height x width >>> data = torch.randn(N, 16, 10, 10) >>> conv = nn.Conv2d(16, C, (3, 3)) >>> m = nn.LogSoftmax(dim=1) >>> # each element in target has to have 0 <= value < C >>> target = torch.empty(N, 8, 8, dtype=torch.long).random_(0, C) >>> output = loss(m(conv(data)), target) >>> output.backward() # PairwiseDistance `class torch.nn.PairwiseDistance(p=2.0, eps=1e-06, keepdim=False)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/distance.html#PairwiseDistance) Computes the batchwise pairwise distance between vectors v1v_1 , v2v_2 using the p-norm: ∥x∥p=(∑i=1n∣xi∣p)1/p.\Vert x \Vert _p = \left( \sum_{i=1}^n \vert x_i \vert ^ p \right) ^ {1/p}. Parameters * **p** (_real_) – the norm degree. Default: 2 * **eps** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _,__optional_) – Small value to avoid division by zero. Default: 1e-6 * **keepdim** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – Determines whether or not to keep the vector dimension. Default: False Shape: * Input1: (N,D)(N, D) where `D = vector dimension` * Input2: (N,D)(N, D) , same shape as the Input1 * Output: (N)(N) . 
If `keepdim` is `True`, then (N,1)(N, 1) . Examples:: >>> pdist = nn.PairwiseDistance(p=2) >>> input1 = torch.randn(100, 128) >>> input2 = torch.randn(100, 128) >>> output = pdist(input1, input2) # DistributedDataParallel `class torch.nn.parallel.DistributedDataParallel(module, device_ids=None, output_device=None, dim=0, broadcast_buffers=True, process_group=None, bucket_cap_mb=25, find_unused_parameters=False, check_reduction=False, gradient_as_bucket_view=False)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/parallel/distributed.html#DistributedDataParallel) Implements distributed data parallelism that is based on `torch.distributed` package at the module level. This container parallelizes the application of the given module by splitting the input across the specified devices by chunking in the batch dimension. The module is replicated on each machine and each device, and each such replica handles a portion of the input. During the backwards pass, gradients from each node are averaged. The batch size should be larger than the number of GPUs used locally. See also: [Basics](../distributed#distributed-basics) and [Use nn.parallel.DistributedDataParallel instead of multiprocessing or nn.DataParallel](https://pytorch.org/docs/1.8.0/notes/cuda.html#cuda-nn-ddp- instead). The same constraints on input as in [`torch.nn.DataParallel`](torch.nn.dataparallel#torch.nn.DataParallel "torch.nn.DataParallel") apply. Creation of this class requires that `torch.distributed` to be already initialized, by calling [`torch.distributed.init_process_group()`](../distributed#torch.distributed.init_process_group "torch.distributed.init_process_group"). `DistributedDataParallel` is proven to be significantly faster than [`torch.nn.DataParallel`](torch.nn.dataparallel#torch.nn.DataParallel "torch.nn.DataParallel") for single-node multi-GPU data parallel training. To use `DistributedDataParallel` on a host with N GPUs, you should spawn up `N` processes, ensuring that each process exclusively works on a single GPU from 0 to N-1. This can be done by either setting `CUDA_VISIBLE_DEVICES` for every process or by calling: >>> torch.cuda.set_device(i) where i is from 0 to N-1. In each process, you should refer the following to construct this module: >>> torch.distributed.init_process_group( >>> backend='nccl', world_size=N, init_method='...' >>> ) >>> model = DistributedDataParallel(model, device_ids=[i], output_device=i) In order to spawn up multiple processes per node, you can use either `torch.distributed.launch` or `torch.multiprocessing.spawn`. Note Please refer to [PyTorch Distributed Overview](https://pytorch.org/tutorials/beginner/dist_overview.html) for a brief introduction to all features related to distributed training. Note `nccl` backend is currently the fastest and highly recommended backend when using GPUs. This applies to both single-node and multi-node distributed training. Note This module also supports mixed-precision distributed training. This means that your model can have different types of parameters such as mixed types of `fp16` and `fp32`, the gradient reduction on these mixed types of parameters will just work fine. Note If you use `torch.save` on one process to checkpoint the module, and `torch.load` on some other processes to recover it, make sure that `map_location` is configured properly for every process. Without `map_location`, `torch.load` would recover the module to devices where the module was saved from. 
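As a hedged sketch of the `map_location` note above (the checkpoint path and the `ddp_model` variable are placeholders, not part of the official API):

import torch
import torch.distributed as dist

CHECKPOINT_PATH = "ddp_checkpoint.pt"  # placeholder path
rank = dist.get_rank()

# Save from a single process only.
if rank == 0:
    torch.save(ddp_model.state_dict(), CHECKPOINT_PATH)

# Make sure the file exists before the other ranks try to read it.
dist.barrier()

# Map tensors saved from rank 0's GPU onto this process's GPU.
map_location = {"cuda:0": "cuda:%d" % rank}
ddp_model.load_state_dict(torch.load(CHECKPOINT_PATH, map_location=map_location))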
Note When a model is trained on `M` nodes with `batch=N`, the gradient will be `M` times smaller when compared to the same model trained on a single node with `batch=M*N` if the loss is summed (NOT averaged as usual) across instances in a batch (because the gradients between different nodes are averaged). You should take this into consideration when you want to obtain a mathematically equivalent training process compared to the local training counterpart. But in most cases, you can just treat a DistributedDataParallel wrapped model, a DataParallel wrapped model and an ordinary model on a single GPU as the same (E.g. using the same learning rate for equivalent batch size). Note Parameters are never broadcast between processes. The module performs an all- reduce step on gradients and assumes that they will be modified by the optimizer in all processes in the same way. Buffers (e.g. BatchNorm stats) are broadcast from the module in process of rank 0, to all other replicas in the system in every iteration. Note If you are using DistributedDataParallel in conjunction with the [Distributed RPC Framework](../rpc#distributed-rpc-framework), you should always use [`torch.distributed.autograd.backward()`](../rpc#torch.distributed.autograd.backward "torch.distributed.autograd.backward") to compute gradients and [`torch.distributed.optim.DistributedOptimizer`](../rpc#torch.distributed.optim.DistributedOptimizer "torch.distributed.optim.DistributedOptimizer") for optimizing parameters. Example: >>> import torch.distributed.autograd as dist_autograd >>> from torch.nn.parallel import DistributedDataParallel as DDP >>> from torch import optim >>> from torch.distributed.optim import DistributedOptimizer >>> from torch.distributed.rpc import RRef >>> >>> t1 = torch.rand((3, 3), requires_grad=True) >>> t2 = torch.rand((3, 3), requires_grad=True) >>> rref = rpc.remote("worker1", torch.add, args=(t1, t2)) >>> ddp_model = DDP(my_model) >>> >>> # Setup optimizer >>> optimizer_params = [rref] >>> for param in ddp_model.parameters(): >>> optimizer_params.append(RRef(param)) >>> >>> dist_optim = DistributedOptimizer( >>> optim.SGD, >>> optimizer_params, >>> lr=0.05, >>> ) >>> >>> with dist_autograd.context() as context_id: >>> pred = ddp_model(rref.to_here()) >>> loss = loss_func(pred, loss) >>> dist_autograd.backward(context_id, loss) >>> dist_optim.step() Warning Constructor, forward method, and differentiation of the output (or a function of the output of this module) are distributed synchronization points. Take that into account in case different processes might be executing different code. Warning This module assumes all parameters are registered in the model by the time it is created. No parameters should be added nor removed later. Same applies to buffers. Warning This module assumes all parameters are registered in the model of each distributed processes are in the same order. The module itself will conduct gradient `allreduce` following the reverse order of the registered parameters of the model. In other words, it is users’ responsibility to ensure that each distributed process has the exact same model and thus the exact same parameter registration order. Warning This module allows parameters with non-rowmajor-contiguous strides. For example, your model may contain some parameters whose `torch.memory_format` is `torch.contiguous_format` and others whose format is `torch.channels_last`. However, corresponding parameters in different processes must have the same strides. 
Warning This module doesn’t work with [`torch.autograd.grad()`](../autograd#torch.autograd.grad "torch.autograd.grad") (i.e. it will only work if gradients are to be accumulated in `.grad` attributes of parameters). Warning If you plan on using this module with a `nccl` backend or a `gloo` backend (that uses Infiniband), together with a DataLoader that uses multiple workers, please change the multiprocessing start method to `forkserver` (Python 3 only) or `spawn`. Unfortunately Gloo (that uses Infiniband) and NCCL2 are not fork safe, and you will likely experience deadlocks if you don’t change this setting. Warning Forward and backward hooks defined on `module` and its submodules won’t be invoked anymore, unless the hooks are initialized in the `forward()` method. Warning You should never try to change your model’s parameters after wrapping up your model with `DistributedDataParallel`. Because, when wrapping up your model with `DistributedDataParallel`, the constructor of `DistributedDataParallel` will register the additional gradient reduction functions on all the parameters of the model itself at the time of construction. If you change the model’s parameters afterwards, gradient redunction functions no longer match the correct set of parameters. Warning Using `DistributedDataParallel` in conjunction with the [Distributed RPC Framework](../rpc#distributed-rpc-framework) is experimental and subject to change. Warning The `gradient_as_bucket_view` mode does not yet work with Automatic Mixed Precision (AMP). AMP maintains stashed gradients that are used for unscaling gradients. With `gradient_as_bucket_view=True`, these stashed gradients will point to communication buckets in the first iteration. In the next iteration, the communication buckets are mutated and thus these stashed gradients will be unexpectedly mutated as well, which might lead to wrong results. Parameters * **module** ([Module](torch.nn.module#torch.nn.Module "torch.nn.Module")) – module to be parallelized * **device_ids** (_list of python:int_ _or_[torch.device](../tensor_attributes#torch.torch.device "torch.torch.device")) – CUDA devices. This should only be provided when the input module resides on a single CUDA device. For single-device modules, the i’th `module` replica is placed on `device_ids[i]`. For multi-device modules and CPU modules, `device_ids` must be `None` or an empty list, and input data for the forward pass must be placed on the correct device. (default: all visible devices for single-device modules) * **output_device** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _or_[torch.device](../tensor_attributes#torch.torch.device "torch.torch.device")) – Device location of output for single-device CUDA modules. For multi-device modules and CPU modules, it must be `None`, and the module itself dictates the output location. (default: `device_ids[0]` for single-device modules) * **broadcast_buffers** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – Flag that enables syncing (broadcasting) buffers of the module at beginning of the `forward` function. (default: `True`) * **process_group** – The process group to be used for distributed data all-reduction. If `None`, the default process group, which is created by [`torch.distributed.init_process_group()`](../distributed#torch.distributed.init_process_group "torch.distributed.init_process_group"), will be used. 
(default: `None`) * **bucket_cap_mb** – `DistributedDataParallel` will bucket parameters into multiple buckets so that gradient reduction of each bucket can potentially overlap with backward computation. `bucket_cap_mb` controls the bucket size in MegaBytes (MB). (default: 25) * **find_unused_parameters** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – Traverse the autograd graph from all tensors contained in the return value of the wrapped module’s `forward` function. Parameters that don’t receive gradients as part of this graph are preemptively marked as being ready to be reduced. Note that all `forward` outputs that are derived from module parameters must participate in calculating loss and later the gradient computation. If they don’t, this wrapper will hang waiting for autograd to produce gradients for those parameters. Any outputs derived from module parameters that are otherwise unused can be detached from the autograd graph using `torch.Tensor.detach`. (default: `False`) * **check_reduction** – This argument is deprecated. * **gradient_as_bucket_view** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – This is a prototype feature and subject to changes. When set to `True`, gradients will be views pointing to different offsets of `allreduce` communication buckets. This can reduce peak memory usage, where the saved memory size will be equal to the total gradients size. Moreover, it avoids the overhead of copying between gradients and `allreduce` communication buckets. When gradients are views, `detach_()` cannot be called on the gradients. If hitting such errors, please fix it by referring to the [`zero_grad()`](../optim#torch.optim.Optimizer.zero_grad "torch.optim.Optimizer.zero_grad") function in `torch/optim/optimizer.py` as a solution. Variables **~DistributedDataParallel.module** ([Module](torch.nn.module#torch.nn.Module "torch.nn.Module")) – the module to be parallelized. Example: >>> torch.distributed.init_process_group(backend='nccl', world_size=4, init_method='...') >>> net = torch.nn.parallel.DistributedDataParallel(model, pg) `join(divide_by_initial_world_size=True, enable=True)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/parallel/distributed.html#DistributedDataParallel.join) A context manager to be used in conjunction with an instance of `torch.nn.parallel.DistributedDataParallel` to be able to train with uneven inputs across participating processes. This context manager will keep track of already-joined DDP processes, and “shadow” the forward and backward passes by inserting collective communication operations to match with the ones created by non-joined DDP processes. This will ensure each collective call has a corresponding call by already-joined DDP processes, preventing hangs or errors that would otherwise happen when training with uneven inputs across processes. Once all DDP processes have joined, the context manager will broadcast the model corresponding to the last joined process to all processes to ensure the model is the same across all processes (which is guaranteed by DDP). To use this to enable training with uneven inputs across processes, simply wrap this context manager around your training loop. No further modifications to the model or data loading is required. Warning This module works only with the multi-process, single-device usage of `torch.nn.parallel.DistributedDataParallel`, which means that a single process works on a single GPU. 
Warning This module currently does not support custom distributed collective operations in the forward pass, such as `SyncBatchNorm` or other custom defined collectives in the model’s forward pass. Parameters * **divide_by_initial_world_size** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – If `True`, will divide gradients by the initial `world_size` DDP training was launched with. If `False`, will compute the effective world size (number of ranks that have not depleted their inputs yet) and divide gradients by that during allreduce. Set `divide_by_initial_world_size=True` to ensure every input sample including the uneven inputs have equal weight in terms of how much they contribute to the global gradient. This is achieved by always dividing the gradient by the initial `world_size` even when we encounter uneven inputs. If you set this to `False`, we divide the gradient by the remaining number of nodes. This ensures parity with training on a smaller `world_size` although it also means the uneven inputs would contribute more towards the global gradient. Typically, you would want to set this to `True` for cases where the last few inputs of your training job are uneven. In extreme cases, where there is a large discrepancy in the number of inputs, setting this to `False` might provide better results. * **enable** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – Whether to enable uneven input detection or not. Pass in `enable=False` to disable in cases where you know that inputs are even across participating processes. Default is `True`. Example: >>> import torch >>> import torch.distributed as dist >>> import os >>> import torch.multiprocessing as mp >>> import torch.nn as nn >>> # On each spawned worker >>> def worker(rank): >>> dist.init_process_group("nccl", rank=rank, world_size=2) >>> torch.cuda.set_device(rank) >>> model = nn.Linear(1, 1, bias=False).to(rank) >>> model = torch.nn.parallel.DistributedDataParallel( >>> model, device_ids=[rank], output_device=rank >>> ) >>> # Rank 1 gets one more input than rank 0. >>> inputs = [torch.tensor([1]).float() for _ in range(10 + rank)] >>> with model.join(): >>> for _ in range(5): >>> for inp in inputs: >>> loss = model(inp).sum() >>> loss.backward() >>> # Without the join() API, the below synchronization will hang >>> # blocking for rank 1's allreduce to complete. >>> torch.cuda.synchronize(device=rank) `no_sync()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/parallel/distributed.html#DistributedDataParallel.no_sync) A context manager to disable gradient synchronizations across DDP processes. Within this context, gradients will be accumulated on module variables, which will later be synchronized in the first forward-backward pass exiting the context. Example: >>> ddp = torch.nn.parallel.DistributedDataParallel(model, pg) >>> with ddp.no_sync(): >>> for input in inputs: >>> ddp(input).backward() # no synchronization, accumulate grads >>> ddp(another_input).backward() # synchronize grads `register_comm_hook(state, hook)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/parallel/distributed.html#DistributedDataParallel.register_comm_hook) Registers a communication hook which is an enhancement that provides a flexible hook to users where they can specify how DDP aggregates gradients across multiple workers. This hook would be very useful for researchers to try out new ideas. 
For example, this hook can be used to implement several algorithms like GossipGrad and gradient compression which involve different communication strategies for parameter syncs while running Distributed DataParallel training.

Parameters

* **state** ([object](https://docs.python.org/3/library/functions.html#object "\(in Python v3.9\)")) – Passed to the hook to maintain any state information during the training process. Examples include error feedback in gradient compression, peers to communicate with next in GossipGrad, etc. It is locally stored by each worker and shared by all the gradient tensors on the worker.
* **hook** (_callable_) – a callable that aggregates (typically averages) gradient tensors across workers, defined as: `hook(state: object, bucket: dist._GradBucket) -> torch.futures.Future`. This function is called once the bucket is ready. The hook can perform whatever processing is needed and return a Future indicating completion of any async work (ex: allreduce). If the hook doesn’t perform any communication, it can also just return a completed Future. The Future should hold the new value of the grad bucket’s tensors. Once a bucket is ready, the c10d reducer would call this hook and use the tensors returned by the Future and copy grads to individual parameters. We also provide an API called `get_future` to retrieve a Future associated with the completion of `c10d.ProcessGroup.work`.

Warning Grad bucket’s tensors will not be predivided by world_size. The user is responsible for dividing by the world_size in the case of operations like allreduce.

Warning The DDP communication hook can only be registered once and should be registered before calling backward.

Warning The Future object that the hook returns should contain a result that has the same shape as the tensors inside the grad bucket.

Warning The DDP communication hook does not support single-process multiple-device mode. GradBucket tensors should consist of only a single tensor.

Warning The `get_future` API supports only the NCCL backend and will return a `torch._C.Future` which is an internal type and should be used with caution. It can still be used by the `register_comm_hook` API, but it is subject to some subtle differences compared to `torch.futures.Future`.

Warning The DDP communication hook is experimental and subject to change.

Example: Below is an example of a noop hook that returns the same tensors.

>>> def noop(state: object, bucket: dist._GradBucket) -> torch.futures.Future:
>>>     fut = torch.futures.Future()
>>>     fut.set_result(bucket.get_tensors())
>>>     return fut

>>> ddp.register_comm_hook(state=None, hook=noop)

Example: Below is an example of a Parallel SGD algorithm where gradients are encoded before allreduce, and then decoded after allreduce (`encode` and `decode` are user-defined functions).

>>> def encode_and_decode(state: object, bucket: dist._GradBucket) -> torch.futures.Future:
>>>     tensors = [t / process_group.world_size for t in bucket.get_tensors()]
>>>     encoded_tensors = encode(tensors)  # encode gradients
>>>     fut = process_group.allreduce(encoded_tensors).get_future()
>>>     # Define the then callback to decode (renamed so it does not shadow decode()).
>>>     def decode_callback(fut):
>>>         decoded_tensors = decode(fut.value())  # decode gradients
>>>         return decoded_tensors
>>>     return fut.then(decode_callback)

>>> ddp.register_comm_hook(state=None, hook=encode_and_decode)

# Parameter

`class torch.nn.parameter.Parameter` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/parameter.html#Parameter)

A kind of Tensor that is to be considered a module parameter.
Parameters are [`Tensor`](../tensors#torch.Tensor "torch.Tensor") subclasses that have a very special property when used with `Module`s: when they’re assigned as Module attributes, they are automatically added to the list of its parameters and will appear, e.g., in the `parameters()` iterator. Assigning a plain Tensor does not have such an effect. This is because one might want to cache some temporary state, like the last hidden state of an RNN, in the model. If there were no such class as `Parameter`, these temporaries would get registered too.

Parameters

* **data** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – parameter tensor.
* **requires_grad** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – if the parameter requires gradient. See [Excluding subgraphs from backward](https://pytorch.org/docs/1.8.0/notes/autograd.html#excluding-subgraphs) for more details. Default: `True`

# UninitializedParameter

`class torch.nn.parameter.UninitializedParameter` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/parameter.html#UninitializedParameter)

A parameter that is not initialized. Uninitialized parameters are a special case of `torch.nn.Parameter` where the shape of the data is still unknown. Unlike a `torch.nn.Parameter`, uninitialized parameters hold no data, and attempting to access some properties, like their shape, will throw a runtime error. The only operations that can be performed on an uninitialized parameter are changing its datatype, moving it to a different device, and converting it to a regular `torch.nn.Parameter`.

`materialize(shape, device=None, dtype=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/parameter.html#UninitializedParameter.materialize)

Creates a Parameter with the same properties as the uninitialized one. Given a shape, it materializes a parameter on the same device and with the same `dtype` as the current one, or with those specified in the arguments.

Parameters

* **shape** (tuple) – the shape for the materialized tensor.
* **device** (`torch.device`) – the desired device of the parameters and buffers in this module. Optional.
* **dtype** (`torch.dtype`) – the desired floating point type of the floating point parameters and buffers in this module. Optional.

# ParameterDict

`class torch.nn.ParameterDict(parameters=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/container.html#ParameterDict)

Holds parameters in a dictionary.

ParameterDict can be indexed like a regular Python dictionary, but the parameters it contains are properly registered and will be visible to all Module methods.

`ParameterDict` is an **ordered** dictionary that respects

* the order of insertion, and
* in `update()`, the order of the merged `OrderedDict` or another `ParameterDict` (the argument to `update()`).

Note that `update()` with other unordered mapping types (e.g., Python’s plain `dict`) does not preserve the order of the merged mapping.
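A minimal sketch of the ordering behavior described above (parameter names and sizes are arbitrary):

import torch
import torch.nn as nn
from collections import OrderedDict

params = nn.ParameterDict(OrderedDict([
    ('left', nn.Parameter(torch.randn(5, 10))),
    ('right', nn.Parameter(torch.randn(5, 10))),
]))

# update() with an OrderedDict (or another ParameterDict) keeps the
# order of the merged mapping.
params.update(OrderedDict([
    ('up', nn.Parameter(torch.randn(5, 10))),
    ('down', nn.Parameter(torch.randn(5, 10))),
]))
print(list(params.keys()))  # ['left', 'right', 'up', 'down']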
Parameters **parameters** (_iterable_ _,__optional_) – a mapping (dictionary) of (string : `Parameter`) or an iterable of key-value pairs of type (string, `Parameter`) Example: class MyModule(nn.Module): def __init__(self): super(MyModule, self).__init__() self.params = nn.ParameterDict({ 'left': nn.Parameter(torch.randn(5, 10)), 'right': nn.Parameter(torch.randn(5, 10)) }) def forward(self, x, choice): x = self.params[choice].mm(x) return x `clear()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/container.html#ParameterDict.clear) Remove all items from the ParameterDict. `items()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/container.html#ParameterDict.items) Return an iterable of the ParameterDict key/value pairs. `keys()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/container.html#ParameterDict.keys) Return an iterable of the ParameterDict keys. `pop(key)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/container.html#ParameterDict.pop) Remove key from the ParameterDict and return its parameter. Parameters **key** (_string_) – key to pop from the ParameterDict `update(parameters)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/container.html#ParameterDict.update) Update the `ParameterDict` with the key-value pairs from a mapping or an iterable, overwriting existing keys. Note If `parameters` is an `OrderedDict`, a `ParameterDict`, or an iterable of key- value pairs, the order of new elements in it is preserved. Parameters **parameters** (_iterable_) – a mapping (dictionary) from string to `Parameter`, or an iterable of key-value pairs of type (string, `Parameter`) `values()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/container.html#ParameterDict.values) Return an iterable of the ParameterDict values. # ParameterList `class torch.nn.ParameterList(parameters=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/container.html#ParameterList) Holds parameters in a list. `ParameterList` can be indexed like a regular Python list, but parameters it contains are properly registered, and will be visible by all [`Module`](torch.nn.module#torch.nn.Module "torch.nn.Module") methods. Parameters **parameters** (_iterable_ _,__optional_) – an iterable of `Parameter` to add Example: class MyModule(nn.Module): def __init__(self): super(MyModule, self).__init__() self.params = nn.ParameterList([nn.Parameter(torch.randn(10, 10)) for i in range(10)]) def forward(self, x): # ParameterList can act as an iterable, or be indexed using ints for i, p in enumerate(self.params): x = self.params[i // 2].mm(x) + p.mm(x) return x `append(parameter)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/container.html#ParameterList.append) Appends a given parameter at the end of the list. Parameters **parameter** (_nn.Parameter_) – parameter to append `extend(parameters)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/container.html#ParameterList.extend) Appends parameters from a Python iterable to the end of the list. Parameters **parameters** (_iterable_) – iterable of parameters to append # PixelShuffle `class torch.nn.PixelShuffle(upscale_factor)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/pixelshuffle.html#PixelShuffle) Rearranges elements in a tensor of shape (∗,C×r2,H,W)(*, C \times r^2, H, W) to a tensor of shape (∗,C,H×r,W×r)(*, C, H \times r, W \times r) , where r is an upscale factor. 
This is useful for implementing efficient sub-pixel convolution with a stride of 1/r1/r . See the paper: [Real-Time Single Image and Video Super-Resolution Using an Efficient Sub-Pixel Convolutional Neural Network](https://arxiv.org/abs/1609.05158) by Shi et. al (2016) for more details. Parameters **upscale_factor** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – factor to increase spatial resolution by Shape: * Input: (∗,Cin,Hin,Win)(*, C_{in}, H_{in}, W_{in}) , where * is zero or more batch dimensions * Output: (∗,Cout,Hout,Wout)(*, C_{out}, H_{out}, W_{out}) , where Cout=Cin÷upscale_factor2C_{out} = C_{in} \div \text{upscale\\_factor}^2 Hout=Hin×upscale_factorH_{out} = H_{in} \times \text{upscale\\_factor} Wout=Win×upscale_factorW_{out} = W_{in} \times \text{upscale\\_factor} Examples: >>> pixel_shuffle = nn.PixelShuffle(3) >>> input = torch.randn(1, 9, 4, 4) >>> output = pixel_shuffle(input) >>> print(output.size()) torch.Size([1, 1, 12, 12]) # PixelUnshuffle `class torch.nn.PixelUnshuffle(downscale_factor)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/pixelshuffle.html#PixelUnshuffle) Reverses the [`PixelShuffle`](torch.nn.pixelshuffle#torch.nn.PixelShuffle "torch.nn.PixelShuffle") operation by rearranging elements in a tensor of shape (∗,C,H×r,W×r)(*, C, H \times r, W \times r) to a tensor of shape (∗,C×r2,H,W)(*, C \times r^2, H, W) , where r is a downscale factor. See the paper: [Real-Time Single Image and Video Super-Resolution Using an Efficient Sub-Pixel Convolutional Neural Network](https://arxiv.org/abs/1609.05158) by Shi et. al (2016) for more details. Parameters **downscale_factor** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – factor to decrease spatial resolution by Shape: * Input: (∗,Cin,Hin,Win)(*, C_{in}, H_{in}, W_{in}) , where * is zero or more batch dimensions * Output: (∗,Cout,Hout,Wout)(*, C_{out}, H_{out}, W_{out}) , where Cout=Cin×downscale_factor2C_{out} = C_{in} \times \text{downscale\\_factor}^2 Hout=Hin÷downscale_factorH_{out} = H_{in} \div \text{downscale\\_factor} Wout=Win÷downscale_factorW_{out} = W_{in} \div \text{downscale\\_factor} Examples: >>> pixel_unshuffle = nn.PixelUnshuffle(3) >>> input = torch.randn(1, 1, 12, 12) >>> output = pixel_unshuffle(input) >>> print(output.size()) torch.Size([1, 9, 4, 4]) # PoissonNLLLoss `class torch.nn.PoissonNLLLoss(log_input=True, full=False, size_average=None, eps=1e-08, reduce=None, reduction='mean')` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/loss.html#PoissonNLLLoss) Negative log likelihood loss with Poisson distribution of target. The loss can be described as: target∼Poisson(input)loss(input,target)=input−target∗log⁡(input)+log⁡(target!)\text{target} \sim \mathrm{Poisson}(\text{input}) \text{loss}(\text{input}, \text{target}) = \text{input} - \text{target} * \log(\text{input}) + \log(\text{target!}) The last term can be omitted or approximated with Stirling formula. The approximation is used for target values more than 1. For targets less or equal to 1 zeros are added to the loss. Parameters * **log_input** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – if `True` the loss is computed as exp⁡(input)−target∗input\exp(\text{input}) - \text{target}*\text{input} , if `False` the loss is input−target∗log⁡(input+eps)\text{input} - \text{target}*\log(\text{input}+\text{eps}) . 
* **full** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – whether to compute the full loss, i.e., to add the Stirling approximation term target∗log⁡(target)−target+0.5∗log⁡(2πtarget).\text{target}*\log(\text{target}) - \text{target} + 0.5 * \log(2\pi\text{target}).
* **size_average** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – Deprecated (see `reduction`). By default, the losses are averaged over each loss element in the batch. Note that for some losses, there are multiple elements per sample. If the field `size_average` is set to `False`, the losses are instead summed for each minibatch. Ignored when `reduce` is `False`. Default: `True`
* **eps** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _,__optional_) – Small value to avoid evaluation of log⁡(0)\log(0) when `log_input = False`. Default: 1e-8
* **reduce** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – Deprecated (see `reduction`). By default, the losses are averaged or summed over observations for each minibatch depending on `size_average`. When `reduce` is `False`, returns a loss per batch element instead and ignores `size_average`. Default: `True`
* **reduction** (_string_ _,__optional_) – Specifies the reduction to apply to the output: `'none'` | `'mean'` | `'sum'`. `'none'`: no reduction will be applied, `'mean'`: the sum of the output will be divided by the number of elements in the output, `'sum'`: the output will be summed. Note: `size_average` and `reduce` are in the process of being deprecated, and in the meantime, specifying either of those two args will override `reduction`. Default: `'mean'`

Examples:

>>> loss = nn.PoissonNLLLoss()
>>> log_input = torch.randn(5, 2, requires_grad=True)
>>> target = torch.randn(5, 2)
>>> output = loss(log_input, target)
>>> output.backward()

Shape:

* Input: (N,∗)(N, *) where ∗* means any number of additional dimensions
* Target: (N,∗)(N, *) , same shape as the input
* Output: scalar by default. If `reduction` is `'none'`, then (N,∗)(N, *) , the same shape as the input

# PReLU

`class torch.nn.PReLU(num_parameters=1, init=0.25)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/activation.html#PReLU)

Applies the element-wise function: PReLU(x)=max⁡(0,x)+a∗min⁡(0,x)\text{PReLU}(x) = \max(0,x) + a * \min(0,x) or PReLU(x)={x, if x≥0ax, otherwise \text{PReLU}(x) = \begin{cases} x, & \text{ if } x \geq 0 \\\ ax, & \text{ otherwise } \end{cases} Here aa is a learnable parameter. When called without arguments, `nn.PReLU()` uses a single parameter aa across all input channels. If called with `nn.PReLU(nChannels)`, a separate aa is used for each input channel.

Note Weight decay should not be used when learning aa for good performance.

Note The channel dim is the 2nd dim of the input. When the input has fewer than 2 dims, there is no channel dim and the number of channels = 1.

Parameters

* **num_parameters** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – number of aa to learn. Although it takes an int as input, only two values are legitimate: 1, or the number of channels of the input. Default: 1
* **init** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)")) – the initial value of aa .
Default: 0.25 Shape: * Input: (N,∗)(N, *) where `*` means, any number of additional dimensions * Output: (N,∗)(N, *) , same shape as the input Variables **~PReLU.weight** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the learnable weights of shape (`num_parameters`). Examples: >>> m = nn.PReLU() >>> input = torch.randn(2) >>> output = m(input) # ReflectionPad1d `class torch.nn.ReflectionPad1d(padding)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/padding.html#ReflectionPad1d) Pads the input tensor using the reflection of the input boundary. For `N`-dimensional padding, use [`torch.nn.functional.pad()`](../nn.functional#torch.nn.functional.pad "torch.nn.functional.pad"). Parameters **padding** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,_[tuple](https://docs.python.org/3/library/stdtypes.html#tuple "\(in Python v3.9\)")) – the size of the padding. If is `int`, uses the same padding in all boundaries. If a 2-`tuple`, uses (padding_left\text{padding\\_left} , padding_right\text{padding\\_right} ) Shape: * Input: (N,C,Win)(N, C, W_{in}) * Output: (N,C,Wout)(N, C, W_{out}) where Wout=Win+padding_left+padding_rightW_{out} = W_{in} + \text{padding\\_left} + \text{padding\\_right} Examples: >>> m = nn.ReflectionPad1d(2) >>> input = torch.arange(8, dtype=torch.float).reshape(1, 2, 4) >>> input tensor([[[0., 1., 2., 3.], [4., 5., 6., 7.]]]) >>> m(input) tensor([[[2., 1., 0., 1., 2., 3., 2., 1.], [6., 5., 4., 5., 6., 7., 6., 5.]]]) >>> # using different paddings for different sides >>> m = nn.ReflectionPad1d((3, 1)) >>> m(input) tensor([[[3., 2., 1., 0., 1., 2., 3., 2.], [7., 6., 5., 4., 5., 6., 7., 6.]]]) # ReflectionPad2d `class torch.nn.ReflectionPad2d(padding)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/padding.html#ReflectionPad2d) Pads the input tensor using the reflection of the input boundary. For `N`-dimensional padding, use [`torch.nn.functional.pad()`](../nn.functional#torch.nn.functional.pad "torch.nn.functional.pad"). Parameters **padding** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,_[tuple](https://docs.python.org/3/library/stdtypes.html#tuple "\(in Python v3.9\)")) – the size of the padding. If is `int`, uses the same padding in all boundaries. 
If a 4-`tuple`, uses (padding_left\text{padding\\_left} , padding_right\text{padding\\_right} , padding_top\text{padding\\_top} , padding_bottom\text{padding\\_bottom} ) Shape: * Input: (N,C,Hin,Win)(N, C, H_{in}, W_{in}) * Output: (N,C,Hout,Wout)(N, C, H_{out}, W_{out}) where Hout=Hin+padding_top+padding_bottomH_{out} = H_{in} + \text{padding\\_top} + \text{padding\\_bottom} Wout=Win+padding_left+padding_rightW_{out} = W_{in} + \text{padding\\_left} + \text{padding\\_right} Examples: >>> m = nn.ReflectionPad2d(2) >>> input = torch.arange(9, dtype=torch.float).reshape(1, 1, 3, 3) >>> input tensor([[[[0., 1., 2.], [3., 4., 5.], [6., 7., 8.]]]]) >>> m(input) tensor([[[[8., 7., 6., 7., 8., 7., 6.], [5., 4., 3., 4., 5., 4., 3.], [2., 1., 0., 1., 2., 1., 0.], [5., 4., 3., 4., 5., 4., 3.], [8., 7., 6., 7., 8., 7., 6.], [5., 4., 3., 4., 5., 4., 3.], [2., 1., 0., 1., 2., 1., 0.]]]]) >>> # using different paddings for different sides >>> m = nn.ReflectionPad2d((1, 1, 2, 0)) >>> m(input) tensor([[[[7., 6., 7., 8., 7.], [4., 3., 4., 5., 4.], [1., 0., 1., 2., 1.], [4., 3., 4., 5., 4.], [7., 6., 7., 8., 7.]]]]) # ReLU `class torch.nn.ReLU(inplace=False)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/activation.html#ReLU) Applies the rectified linear unit function element-wise: ReLU(x)=(x)+=max⁡(0,x)\text{ReLU}(x) = (x)^+ = \max(0, x) Parameters **inplace** – can optionally do the operation in-place. Default: `False` Shape: * Input: (N,∗)(N, *) where `*` means, any number of additional dimensions * Output: (N,∗)(N, *) , same shape as the input Examples: >>> m = nn.ReLU() >>> input = torch.randn(2) >>> output = m(input) An implementation of CReLU - https://arxiv.org/abs/1603.05201 >>> m = nn.ReLU() >>> input = torch.randn(2).unsqueeze(0) >>> output = torch.cat((m(input),m(-input))) # ReLU6 `class torch.nn.ReLU6(inplace=False)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/activation.html#ReLU6) Applies the element-wise function: ReLU6(x)=min⁡(max⁡(0,x),6)\text{ReLU6}(x) = \min(\max(0,x), 6) Parameters **inplace** – can optionally do the operation in-place. Default: `False` Shape: * Input: (N,∗)(N, *) where `*` means, any number of additional dimensions * Output: (N,∗)(N, *) , same shape as the input Examples: >>> m = nn.ReLU6() >>> input = torch.randn(2) >>> output = m(input) # ReplicationPad1d `class torch.nn.ReplicationPad1d(padding)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/padding.html#ReplicationPad1d) Pads the input tensor using replication of the input boundary. For `N`-dimensional padding, use [`torch.nn.functional.pad()`](../nn.functional#torch.nn.functional.pad "torch.nn.functional.pad"). Parameters **padding** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,_[tuple](https://docs.python.org/3/library/stdtypes.html#tuple "\(in Python v3.9\)")) – the size of the padding. If is `int`, uses the same padding in all boundaries. 
If a 2-`tuple`, uses (padding_left\text{padding\\_left} , padding_right\text{padding\\_right} ) Shape: * Input: (N,C,Win)(N, C, W_{in}) * Output: (N,C,Wout)(N, C, W_{out}) where Wout=Win+padding_left+padding_rightW_{out} = W_{in} + \text{padding\\_left} + \text{padding\\_right} Examples: >>> m = nn.ReplicationPad1d(2) >>> input = torch.arange(8, dtype=torch.float).reshape(1, 2, 4) >>> input tensor([[[0., 1., 2., 3.], [4., 5., 6., 7.]]]) >>> m(input) tensor([[[0., 0., 0., 1., 2., 3., 3., 3.], [4., 4., 4., 5., 6., 7., 7., 7.]]]) >>> # using different paddings for different sides >>> m = nn.ReplicationPad1d((3, 1)) >>> m(input) tensor([[[0., 0., 0., 0., 1., 2., 3., 3.], [4., 4., 4., 4., 5., 6., 7., 7.]]]) # ReplicationPad2d `class torch.nn.ReplicationPad2d(padding)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/padding.html#ReplicationPad2d) Pads the input tensor using replication of the input boundary. For `N`-dimensional padding, use [`torch.nn.functional.pad()`](../nn.functional#torch.nn.functional.pad "torch.nn.functional.pad"). Parameters **padding** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,_[tuple](https://docs.python.org/3/library/stdtypes.html#tuple "\(in Python v3.9\)")) – the size of the padding. If is `int`, uses the same padding in all boundaries. If a 4-`tuple`, uses (padding_left\text{padding\\_left} , padding_right\text{padding\\_right} , padding_top\text{padding\\_top} , padding_bottom\text{padding\\_bottom} ) Shape: * Input: (N,C,Hin,Win)(N, C, H_{in}, W_{in}) * Output: (N,C,Hout,Wout)(N, C, H_{out}, W_{out}) where Hout=Hin+padding_top+padding_bottomH_{out} = H_{in} + \text{padding\\_top} + \text{padding\\_bottom} Wout=Win+padding_left+padding_rightW_{out} = W_{in} + \text{padding\\_left} + \text{padding\\_right} Examples: >>> m = nn.ReplicationPad2d(2) >>> input = torch.arange(9, dtype=torch.float).reshape(1, 1, 3, 3) >>> input tensor([[[[0., 1., 2.], [3., 4., 5.], [6., 7., 8.]]]]) >>> m(input) tensor([[[[0., 0., 0., 1., 2., 2., 2.], [0., 0., 0., 1., 2., 2., 2.], [0., 0., 0., 1., 2., 2., 2.], [3., 3., 3., 4., 5., 5., 5.], [6., 6., 6., 7., 8., 8., 8.], [6., 6., 6., 7., 8., 8., 8.], [6., 6., 6., 7., 8., 8., 8.]]]]) >>> # using different paddings for different sides >>> m = nn.ReplicationPad2d((1, 1, 2, 0)) >>> m(input) tensor([[[[0., 0., 1., 2., 2.], [0., 0., 1., 2., 2.], [0., 0., 1., 2., 2.], [3., 3., 4., 5., 5.], [6., 6., 7., 8., 8.]]]]) # ReplicationPad3d `class torch.nn.ReplicationPad3d(padding)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/padding.html#ReplicationPad3d) Pads the input tensor using replication of the input boundary. For `N`-dimensional padding, use [`torch.nn.functional.pad()`](../nn.functional#torch.nn.functional.pad "torch.nn.functional.pad"). Parameters **padding** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,_[tuple](https://docs.python.org/3/library/stdtypes.html#tuple "\(in Python v3.9\)")) – the size of the padding. If is `int`, uses the same padding in all boundaries. 
If a 6-`tuple`, uses (padding_left\text{padding\\_left} , padding_right\text{padding\\_right} , padding_top\text{padding\\_top} , padding_bottom\text{padding\\_bottom} , padding_front\text{padding\\_front} , padding_back\text{padding\\_back} ) Shape: * Input: (N,C,Din,Hin,Win)(N, C, D_{in}, H_{in}, W_{in}) * Output: (N,C,Dout,Hout,Wout)(N, C, D_{out}, H_{out}, W_{out}) where Dout=Din+padding_front+padding_backD_{out} = D_{in} + \text{padding\\_front} + \text{padding\\_back} Hout=Hin+padding_top+padding_bottomH_{out} = H_{in} + \text{padding\\_top} + \text{padding\\_bottom} Wout=Win+padding_left+padding_rightW_{out} = W_{in} + \text{padding\\_left} + \text{padding\\_right} Examples: >>> m = nn.ReplicationPad3d(3) >>> input = torch.randn(16, 3, 8, 320, 480) >>> output = m(input) >>> # using different paddings for different sides >>> m = nn.ReplicationPad3d((3, 3, 6, 6, 1, 1)) >>> output = m(input) # RNN `class torch.nn.RNN(*args, **kwargs)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/rnn.html#RNN) Applies a multi-layer Elman RNN with tanh⁡\tanh or ReLU\text{ReLU} non- linearity to an input sequence. For each element in the input sequence, each layer computes the following function: ht=tanh⁡(Wihxt+bih+Whhh(t−1)+bhh)h_t = \tanh(W_{ih} x_t + b_{ih} + W_{hh} h_{(t-1)} + b_{hh}) where hth_t is the hidden state at time `t`, xtx_t is the input at time `t`, and h(t−1)h_{(t-1)} is the hidden state of the previous layer at time `t-1` or the initial hidden state at time `0`. If `nonlinearity` is `'relu'`, then ReLU\text{ReLU} is used instead of tanh⁡\tanh . Parameters * **input_size** – The number of expected features in the input `x` * **hidden_size** – The number of features in the hidden state `h` * **num_layers** – Number of recurrent layers. E.g., setting `num_layers=2` would mean stacking two RNNs together to form a `stacked RNN`, with the second RNN taking in outputs of the first RNN and computing the final results. Default: 1 * **nonlinearity** – The non-linearity to use. Can be either `'tanh'` or `'relu'`. Default: `'tanh'` * **bias** – If `False`, then the layer does not use bias weights `b_ih` and `b_hh`. Default: `True` * **batch_first** – If `True`, then the input and output tensors are provided as `(batch, seq, feature)`. Default: `False` * **dropout** – If non-zero, introduces a `Dropout` layer on the outputs of each RNN layer except the last layer, with dropout probability equal to `dropout`. Default: 0 * **bidirectional** – If `True`, becomes a bidirectional RNN. Default: `False` Inputs: input, h_0 * **input** of shape `(seq_len, batch, input_size)`: tensor containing the features of the input sequence. The input can also be a packed variable length sequence. See [`torch.nn.utils.rnn.pack_padded_sequence()`](torch.nn.utils.rnn.pack_padded_sequence#torch.nn.utils.rnn.pack_padded_sequence "torch.nn.utils.rnn.pack_padded_sequence") or [`torch.nn.utils.rnn.pack_sequence()`](torch.nn.utils.rnn.pack_sequence#torch.nn.utils.rnn.pack_sequence "torch.nn.utils.rnn.pack_sequence") for details. * **h_0** of shape `(num_layers * num_directions, batch, hidden_size)`: tensor containing the initial hidden state for each element in the batch. Defaults to zero if not provided. If the RNN is bidirectional, num_directions should be 2, else it should be 1. Outputs: output, h_n * **output** of shape `(seq_len, batch, num_directions * hidden_size)`: tensor containing the output features (`h_t`) from the last layer of the RNN, for each `t`. 
If a [`torch.nn.utils.rnn.PackedSequence`](torch.nn.utils.rnn.packedsequence#torch.nn.utils.rnn.PackedSequence "torch.nn.utils.rnn.PackedSequence") has been given as the input, the output will also be a packed sequence. For the unpacked case, the directions can be separated using `output.view(seq_len, batch, num_directions, hidden_size)`, with forward and backward being direction `0` and `1` respectively. Similarly, the directions can be separated in the packed case. * **h_n** of shape `(num_layers * num_directions, batch, hidden_size)`: tensor containing the hidden state for `t = seq_len`. Like _output_ , the layers can be separated using `h_n.view(num_layers, num_directions, batch, hidden_size)`. Shape: * Input1: (L,N,Hin)(L, N, H_{in}) tensor containing input features where Hin=input_sizeH_{in}=\text{input\\_size} and `L` represents a sequence length. * Input2: (S,N,Hout)(S, N, H_{out}) tensor containing the initial hidden state for each element in the batch. Hout=hidden_sizeH_{out}=\text{hidden\\_size} Defaults to zero if not provided. where S=num_layers∗num_directionsS=\text{num\\_layers} * \text{num\\_directions} If the RNN is bidirectional, num_directions should be 2, else it should be 1. * Output1: (L,N,Hall)(L, N, H_{all}) where Hall=num_directions∗hidden_sizeH_{all}=\text{num\\_directions} * \text{hidden\\_size} * Output2: (S,N,Hout)(S, N, H_{out}) tensor containing the next hidden state for each element in the batch Variables * **~RNN.weight_ih_l[k]** – the learnable input-hidden weights of the k-th layer, of shape `(hidden_size, input_size)` for `k = 0`. Otherwise, the shape is `(hidden_size, num_directions * hidden_size)` * **~RNN.weight_hh_l[k]** – the learnable hidden-hidden weights of the k-th layer, of shape `(hidden_size, hidden_size)` * **~RNN.bias_ih_l[k]** – the learnable input-hidden bias of the k-th layer, of shape `(hidden_size)` * **~RNN.bias_hh_l[k]** – the learnable hidden-hidden bias of the k-th layer, of shape `(hidden_size)` Note All the weights and biases are initialized from U(−k,k)\mathcal{U}(-\sqrt{k}, \sqrt{k}) where k=1hidden_sizek = \frac{1}{\text{hidden\\_size}} Warning There are known non-determinism issues for RNN functions on some versions of cuDNN and CUDA. You can enforce deterministic behavior by setting the following environment variables: On CUDA 10.1, set environment variable `CUDA_LAUNCH_BLOCKING=1`. This may affect performance. On CUDA 10.2 or later, set environment variable (note the leading colon symbol) `CUBLAS_WORKSPACE_CONFIG=:16:8` or `CUBLAS_WORKSPACE_CONFIG=:4096:2`. See the [cuDNN 8 Release Notes](https://docs.nvidia.com/deeplearning/sdk/cudnn-release- notes/rel_8.html) for more information. Orphan Note If the following conditions are satisfied: 1) cudnn is enabled, 2) input data is on the GPU 3) input data has dtype `torch.float16` 4) V100 GPU is used, 5) input data is not in `PackedSequence` format persistent algorithm can be selected to improve performance. Examples: >>> rnn = nn.RNN(10, 20, 2) >>> input = torch.randn(5, 3, 10) >>> h0 = torch.randn(2, 3, 20) >>> output, hn = rnn(input, h0) # RNNBase `class torch.nn.RNNBase(mode, input_size, hidden_size, num_layers=1, bias=True, batch_first=False, dropout=0.0, bidirectional=False, proj_size=0)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/rnn.html#RNNBase) `flatten_parameters()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/rnn.html#RNNBase.flatten_parameters) Resets parameter data pointer so that they can use faster code paths. 
Right now, this works only if the module is on the GPU and cuDNN is enabled. Otherwise, it’s a no-op. # RNNCell `class torch.nn.RNNCell(input_size, hidden_size, bias=True, nonlinearity='tanh')` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/rnn.html#RNNCell) An Elman RNN cell with tanh or ReLU non-linearity. h′=tanh⁡(Wihx+bih+Whhh+bhh)h' = \tanh(W_{ih} x + b_{ih} + W_{hh} h + b_{hh}) If `nonlinearity` is `‘relu’`, then ReLU is used in place of tanh. Parameters * **input_size** – The number of expected features in the input `x` * **hidden_size** – The number of features in the hidden state `h` * **bias** – If `False`, then the layer does not use bias weights `b_ih` and `b_hh`. Default: `True` * **nonlinearity** – The non-linearity to use. Can be either `'tanh'` or `'relu'`. Default: `'tanh'` Inputs: input, hidden * **input** of shape `(batch, input_size)`: tensor containing input features * **hidden** of shape `(batch, hidden_size)`: tensor containing the initial hidden state for each element in the batch. Defaults to zero if not provided. Outputs: h’ * **h’** of shape `(batch, hidden_size)`: tensor containing the next hidden state for each element in the batch Shape: * Input1: (N,Hin)(N, H_{in}) tensor containing input features where HinH_{in} = `input_size` * Input2: (N,Hout)(N, H_{out}) tensor containing the initial hidden state for each element in the batch where HoutH_{out} = `hidden_size` Defaults to zero if not provided. * Output: (N,Hout)(N, H_{out}) tensor containing the next hidden state for each element in the batch Variables * **~RNNCell.weight_ih** – the learnable input-hidden weights, of shape `(hidden_size, input_size)` * **~RNNCell.weight_hh** – the learnable hidden-hidden weights, of shape `(hidden_size, hidden_size)` * **~RNNCell.bias_ih** – the learnable input-hidden bias, of shape `(hidden_size)` * **~RNNCell.bias_hh** – the learnable hidden-hidden bias, of shape `(hidden_size)` Note All the weights and biases are initialized from U(−k,k)\mathcal{U}(-\sqrt{k}, \sqrt{k}) where k=1hidden_sizek = \frac{1}{\text{hidden\\_size}} Examples: >>> rnn = nn.RNNCell(10, 20) >>> input = torch.randn(6, 3, 10) >>> hx = torch.randn(3, 20) >>> output = [] >>> for i in range(6): hx = rnn(input[i], hx) output.append(hx) # RReLU `class torch.nn.RReLU(lower=0.125, upper=0.3333333333333333, inplace=False)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/activation.html#RReLU) Applies the randomized leaky rectified liner unit function, element-wise, as described in the paper: [Empirical Evaluation of Rectified Activations in Convolutional Network](https://arxiv.org/abs/1505.00853). The function is defined as: RReLU(x)={xif x≥0ax otherwise \text{RReLU}(x) = \begin{cases} x & \text{if } x \geq 0 \\\ ax & \text{ otherwise } \end{cases} where aa is randomly sampled from uniform distribution U(lower,upper)\mathcal{U}(\text{lower}, \text{upper}) . See: Parameters * **lower** – lower bound of the uniform distribution. Default: 18\frac{1}{8} * **upper** – upper bound of the uniform distribution. Default: 13\frac{1}{3} * **inplace** – can optionally do the operation in-place. 
Default: `False` Shape: * Input: (N,∗)(N, *) where `*` means, any number of additional dimensions * Output: (N,∗)(N, *) , same shape as the input Examples: >>> m = nn.RReLU(0.1, 0.3) >>> input = torch.randn(2) >>> output = m(input) # SELU `class torch.nn.SELU(inplace=False)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/activation.html#SELU) Applied element-wise, as: SELU(x)=scale∗(max⁡(0,x)+min⁡(0,α∗(exp⁡(x)−1)))\text{SELU}(x) = \text{scale} * (\max(0,x) + \min(0, \alpha * (\exp(x) - 1))) with α=1.6732632423543772848170429916717\alpha = 1.6732632423543772848170429916717 and scale=1.0507009873554804934193349852946\text{scale} = 1.0507009873554804934193349852946 . More details can be found in the paper [Self-Normalizing Neural Networks](https://arxiv.org/abs/1706.02515) . Parameters **inplace** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – can optionally do the operation in- place. Default: `False` Shape: * Input: (N,∗)(N, *) where `*` means, any number of additional dimensions * Output: (N,∗)(N, *) , same shape as the input Examples: >>> m = nn.SELU() >>> input = torch.randn(2) >>> output = m(input) # Sequential `class torch.nn.Sequential(*args)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/container.html#Sequential) A sequential container. Modules will be added to it in the order they are passed in the constructor. Alternatively, an ordered dict of modules can also be passed in. To make it easier to understand, here is a small example: # Example of using Sequential model = nn.Sequential( nn.Conv2d(1,20,5), nn.ReLU(), nn.Conv2d(20,64,5), nn.ReLU() ) # Example of using Sequential with OrderedDict model = nn.Sequential(OrderedDict([ ('conv1', nn.Conv2d(1,20,5)), ('relu1', nn.ReLU()), ('conv2', nn.Conv2d(20,64,5)), ('relu2', nn.ReLU()) ])) # Sigmoid `class torch.nn.Sigmoid` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/activation.html#Sigmoid) Applies the element-wise function: Sigmoid(x)=σ(x)=11+exp⁡(−x)\text{Sigmoid}(x) = \sigma(x) = \frac{1}{1 + \exp(-x)} Shape: * Input: (N,∗)(N, *) where `*` means, any number of additional dimensions * Output: (N,∗)(N, *) , same shape as the input Examples: >>> m = nn.Sigmoid() >>> input = torch.randn(2) >>> output = m(input) # SiLU `class torch.nn.SiLU(inplace=False)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/activation.html#SiLU) Applies the silu function, element-wise. silu(x)=x∗σ(x),where σ(x) is the logistic sigmoid.\text{silu}(x) = x * \sigma(x), \text{where } \sigma(x) \text{ is the logistic sigmoid.} Note See [Gaussian Error Linear Units (GELUs)](https://arxiv.org/abs/1606.08415) where the SiLU (Sigmoid Linear Unit) was originally coined, and see [Sigmoid- Weighted Linear Units for Neural Network Function Approximation in Reinforcement Learning](https://arxiv.org/abs/1702.03118) and [Swish: a Self- Gated Activation Function](https://arxiv.org/abs/1710.05941v1) where the SiLU was experimented with later. 
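As a quick check of the definition above (a sketch, not one of the library's own examples), `nn.SiLU` should agree with multiplying the input by its logistic sigmoid:

>>> import torch
>>> import torch.nn as nn
>>> x = torch.randn(4)
>>> m = nn.SiLU()
>>> # silu(x) = x * sigmoid(x), so the two computations should match
>>> torch.allclose(m(x), x * torch.sigmoid(x))
True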
Shape: * Input: (N, *) where `*` means any number of additional dimensions * Output: (N, *), same shape as the input Examples: >>> m = nn.SiLU() >>> input = torch.randn(2) >>> output = m(input) # SmoothL1Loss `class torch.nn.SmoothL1Loss(size_average=None, reduce=None, reduction='mean', beta=1.0)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/loss.html#SmoothL1Loss) Creates a criterion that uses a squared term if the absolute element-wise error falls below beta and an L1 term otherwise. It is less sensitive to outliers than the [`torch.nn.MSELoss`](torch.nn.mseloss#torch.nn.MSELoss "torch.nn.MSELoss") and in some cases prevents exploding gradients (e.g. see the `Fast R-CNN` paper by Ross Girshick). Omitting a scaling factor of `beta`, this loss is also known as the Huber loss: \text{loss}(x, y) = \frac{1}{n} \sum_{i} z_{i} where z_{i} is given by: z_{i} = \begin{cases} 0.5 (x_i - y_i)^2 / \text{beta}, & \text{if } |x_i - y_i| < \text{beta} \\ |x_i - y_i| - 0.5 \cdot \text{beta}, & \text{otherwise} \end{cases} As with the other loss criteria, `size_average`, `reduce`, and `reduction` control how the per-element losses are reduced; `beta` (default 1.0) specifies the threshold at which to change between the squared and L1 terms. # Softmax `class torch.nn.Softmax(dim=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/activation.html#Softmax) Applies the Softmax function to an n-dimensional input Tensor rescaling it so that the elements of the n-dimensional output Tensor lie in the range `[0, 1]` and sum to 1. Softmax is defined as: \text{Softmax}(x_{i}) = \frac{\exp(x_i)}{\sum_j \exp(x_j)} Shape: * Input: (*) where `*` means any number of additional dimensions * Output: (*), same shape as the input Parameters **dim** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – A dimension along which Softmax will be computed (so every slice along dim will sum to 1). Returns a Tensor of the same dimension and shape as the input, with values in the range [0, 1] Examples: >>> m = nn.Softmax(dim=1) >>> input = torch.randn(2, 3) >>> output = m(input) # Softmax2d `class torch.nn.Softmax2d` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/activation.html#Softmax2d) Applies SoftMax over features to each spatial location. When given an image of `Channels x Height x Width`, it will apply `Softmax` to each location (Channels, h_i, w_j) Shape: * Input: (N, C, H, W) * Output: (N, C, H, W) (same shape as input) Returns a Tensor of the same dimension and shape as the input with values in the range [0, 1] Examples: >>> m = nn.Softmax2d() >>> # you softmax over the 2nd dimension >>> input = torch.randn(2, 3, 12, 13) >>> output = m(input) # Softmin `class torch.nn.Softmin(dim=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/activation.html#Softmin) Applies the Softmin function to an n-dimensional input Tensor rescaling it so that the elements of the n-dimensional output Tensor lie in the range `[0, 1]` and sum to 1. Softmin is defined as: \text{Softmin}(x_{i}) = \frac{\exp(-x_i)}{\sum_j \exp(-x_j)} Shape: * Input: (*) where `*` means any number of additional dimensions * Output: (*), same shape as the input Parameters **dim** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – A dimension along which Softmin will be computed (so every slice along dim will sum to 1). Returns a Tensor of the same dimension and shape as the input, with values in the range [0, 1] Examples: >>> m = nn.Softmin() >>> input = torch.randn(2, 3) >>> output = m(input) # Softplus `class torch.nn.Softplus(beta=1, threshold=20)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/activation.html#Softplus) Applies the element-wise function: \text{Softplus}(x) = \frac{1}{\beta} * \log(1 + \exp(\beta * x)) SoftPlus is a smooth approximation to the ReLU function and can be used to constrain the output of a machine to always be positive. For numerical stability the implementation reverts to the linear function when \text{input} \times \beta > \text{threshold}. Parameters * **beta** – the β value for the Softplus formulation. Default: 1 * **threshold** – values above this revert to a linear function.
Default: 20 Shape: * Input: (N,∗)(N, *) where `*` means, any number of additional dimensions * Output: (N,∗)(N, *) , same shape as the input Examples: >>> m = nn.Softplus() >>> input = torch.randn(2) >>> output = m(input) # Softshrink `class torch.nn.Softshrink(lambd=0.5)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/activation.html#Softshrink) Applies the soft shrinkage function elementwise: SoftShrinkage(x)={x−λ, if x>λx+λ, if x<−λ0, otherwise \text{SoftShrinkage}(x) = \begin{cases} x - \lambda, & \text{ if } x > \lambda \\\ x + \lambda, & \text{ if } x < -\lambda \\\ 0, & \text{ otherwise } \end{cases} Parameters **lambd** – the λ\lambda (must be no less than zero) value for the Softshrink formulation. Default: 0.5 Shape: * Input: (N,∗)(N, *) where `*` means, any number of additional dimensions * Output: (N,∗)(N, *) , same shape as the input Examples: >>> m = nn.Softshrink() >>> input = torch.randn(2) >>> output = m(input) # Softsign `class torch.nn.Softsign` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/activation.html#Softsign) Applies the element-wise function: SoftSign(x)=x1+∣x∣\text{SoftSign}(x) = \frac{x}{ 1 + |x|} Shape: * Input: (N,∗)(N, *) where `*` means, any number of additional dimensions * Output: (N,∗)(N, *) , same shape as the input Examples: >>> m = nn.Softsign() >>> input = torch.randn(2) >>> output = m(input) # SyncBatchNorm `class torch.nn.SyncBatchNorm(num_features, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True, process_group=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/batchnorm.html#SyncBatchNorm) Applies Batch Normalization over a N-Dimensional input (a mini-batch of [N-2]D inputs with additional channel dimension) as described in the paper [Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift](https://arxiv.org/abs/1502.03167) . y=x−E[x]Var[x]+ϵ∗γ+βy = \frac{x - \mathrm{E}[x]}{ \sqrt{\mathrm{Var}[x] + \epsilon}} * \gamma + \beta The mean and standard-deviation are calculated per-dimension over all mini- batches of the same process groups. γ\gamma and β\beta are learnable parameter vectors of size `C` (where `C` is the input size). By default, the elements of γ\gamma are sampled from U(0,1)\mathcal{U}(0, 1) and the elements of β\beta are set to 0. The standard-deviation is calculated via the biased estimator, equivalent to `torch.var(input, unbiased=False)`. Also by default, during training this layer keeps running estimates of its computed mean and variance, which are then used for normalization during evaluation. The running estimates are kept with a default `momentum` of 0.1. If `track_running_stats` is set to `False`, this layer then does not keep running estimates, and batch statistics are instead used during evaluation time as well. Note This `momentum` argument is different from one used in optimizer classes and the conventional notion of momentum. Mathematically, the update rule for running statistics here is x^new=(1−momentum)×x^+momentum×xt\hat{x}_\text{new} = (1 - \text{momentum}) \times \hat{x} + \text{momentum} \times x_t , where x^\hat{x} is the estimated statistic and xtx_t is the new observed value. Because the Batch Normalization is done for each channel in the `C` dimension, computing statistics on `(N, +)` slices, it’s common terminology to call this Volumetric Batch Normalization or Spatio-temporal Batch Normalization. 
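To make the running-statistics update rule in the note above concrete, here is a minimal sketch. It uses `nn.BatchNorm1d` (which follows the same update rule) rather than a full distributed `SyncBatchNorm` setup, so it can run on a single process:

>>> import torch
>>> import torch.nn as nn
>>> bn = nn.BatchNorm1d(1, momentum=0.1)
>>> x = torch.full((4, 1), 2.0)   # batch mean is 2.0
>>> _ = bn(x)                     # one training-mode forward pass
>>> # running_mean starts at 0: (1 - 0.1) * 0.0 + 0.1 * 2.0 = 0.2
>>> bn.running_mean
tensor([0.2000])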
Currently `SyncBatchNorm` only supports `DistributedDataParallel` (DDP) with a single GPU per process. Use `torch.nn.SyncBatchNorm.convert_sync_batchnorm()` to convert a `BatchNorm*D` layer to `SyncBatchNorm` before wrapping the network with DDP. Parameters * **num_features** – C from an expected input of size (N, C, +) * **eps** – a value added to the denominator for numerical stability. Default: `1e-5` * **momentum** – the value used for the running_mean and running_var computation. Can be set to `None` for cumulative moving average (i.e. simple average). Default: 0.1 * **affine** – a boolean value that when set to `True`, this module has learnable affine parameters. Default: `True` * **track_running_stats** – a boolean value that when set to `True`, this module tracks the running mean and variance, and when set to `False`, this module does not track such statistics, and initializes statistics buffers `running_mean` and `running_var` as `None`. When these buffers are `None`, this module always uses batch statistics in both training and eval modes. Default: `True` * **process_group** – synchronization of stats happens within each process group individually. Default behavior is synchronization across the whole world. Shape: * Input: (N, C, +) * Output: (N, C, +) (same shape as input) Examples: >>> # With Learnable Parameters >>> m = nn.SyncBatchNorm(100) >>> # creating process group (optional) >>> # ranks is a list of int identifying rank ids. >>> ranks = list(range(8)) >>> r1, r2 = ranks[:4], ranks[4:] >>> # Note: every rank calls into new_group for every >>> # process group created, even if that rank is not >>> # part of the group. >>> process_groups = [torch.distributed.new_group(pids) for pids in [r1, r2]] >>> process_group = process_groups[0 if dist.get_rank() <= 3 else 1] >>> # Without Learnable Parameters >>> m = nn.SyncBatchNorm(100, affine=False, process_group=process_group) >>> input = torch.randn(20, 100, 35, 45, 10) >>> output = m(input) >>> # network is nn.BatchNorm layer >>> sync_bn_network = nn.SyncBatchNorm.convert_sync_batchnorm(network, process_group) >>> # only single gpu per process is currently supported >>> ddp_sync_bn_network = torch.nn.parallel.DistributedDataParallel( >>> sync_bn_network, >>> device_ids=[args.local_rank], >>> output_device=args.local_rank) `classmethod convert_sync_batchnorm(module, process_group=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/batchnorm.html#SyncBatchNorm.convert_sync_batchnorm) Helper function to convert all `BatchNorm*D` layers in the model to `torch.nn.SyncBatchNorm` layers. Parameters * **module** ([nn.Module](torch.nn.module#torch.nn.Module "torch.nn.Module")) – module containing one or more `BatchNorm*D` layers * **process_group** (_optional_) – process group to scope synchronization, default is the whole world Returns The original `module` with the converted `torch.nn.SyncBatchNorm` layers. If the original `module` is a `BatchNorm*D` layer, a new `torch.nn.SyncBatchNorm` layer object will be returned instead. Example: >>> # Network with nn.BatchNorm layer >>> module = torch.nn.Sequential( >>> torch.nn.Linear(20, 100), >>> torch.nn.BatchNorm1d(100), >>> ).cuda() >>> # creating process group (optional) >>> # ranks is a list of int identifying rank ids. >>> ranks = list(range(8)) >>> r1, r2 = ranks[:4], ranks[4:] >>> # Note: every rank calls into new_group for every >>> # process group created, even if that rank is not >>> # part of the group.
>>> process_groups = [torch.distributed.new_group(pids) for pids in [r1, r2]] >>> process_group = process_groups[0 if dist.get_rank() <= 3 else 1] >>> sync_bn_module = torch.nn.SyncBatchNorm.convert_sync_batchnorm(module, process_group) # Tanh `class torch.nn.Tanh` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/activation.html#Tanh) Applies the element-wise function: Tanh(x)=tanh⁡(x)=exp⁡(x)−exp⁡(−x)exp⁡(x)+exp⁡(−x)\text{Tanh}(x) = \tanh(x) = \frac{\exp(x) - \exp(-x)} {\exp(x) + \exp(-x)} Shape: * Input: (N,∗)(N, *) where `*` means, any number of additional dimensions * Output: (N,∗)(N, *) , same shape as the input Examples: >>> m = nn.Tanh() >>> input = torch.randn(2) >>> output = m(input) # Tanhshrink `class torch.nn.Tanhshrink` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/activation.html#Tanhshrink) Applies the element-wise function: Tanhshrink(x)=x−tanh⁡(x)\text{Tanhshrink}(x) = x - \tanh(x) Shape: * Input: (N,∗)(N, *) where `*` means, any number of additional dimensions * Output: (N,∗)(N, *) , same shape as the input Examples: >>> m = nn.Tanhshrink() >>> input = torch.randn(2) >>> output = m(input) # Threshold `class torch.nn.Threshold(threshold, value, inplace=False)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/activation.html#Threshold) Thresholds each element of the input Tensor. Threshold is defined as: y={x, if x>thresholdvalue, otherwise y = \begin{cases} x, &\text{ if } x > \text{threshold} \\\ \text{value}, &\text{ otherwise } \end{cases} Parameters * **threshold** – The value to threshold at * **value** – The value to replace with * **inplace** – can optionally do the operation in-place. Default: `False` Shape: * Input: (N,∗)(N, *) where `*` means, any number of additional dimensions * Output: (N,∗)(N, *) , same shape as the input Examples: >>> m = nn.Threshold(0.1, 20) >>> input = torch.randn(2) >>> output = m(input) # Transformer `class torch.nn.Transformer(d_model=512, nhead=8, num_encoder_layers=6, num_decoder_layers=6, dim_feedforward=2048, dropout=0.1, activation='relu', custom_encoder=None, custom_decoder=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/transformer.html#Transformer) A transformer model. User is able to modify the attributes as needed. The architecture is based on the paper “Attention Is All You Need”. Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Lukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In Advances in Neural Information Processing Systems, pages 6000-6010. Users can build the BERT() model with corresponding parameters. Parameters * **d_model** – the number of expected features in the encoder/decoder inputs (default=512). * **nhead** – the number of heads in the multiheadattention models (default=8). * **num_encoder_layers** – the number of sub-encoder-layers in the encoder (default=6). * **num_decoder_layers** – the number of sub-decoder-layers in the decoder (default=6). * **dim_feedforward** – the dimension of the feedforward network model (default=2048). * **dropout** – the dropout value (default=0.1). * **activation** – the activation function of encoder/decoder intermediate layer, relu or gelu (default=relu). * **custom_encoder** – custom encoder (default=None). * **custom_decoder** – custom decoder (default=None). 
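In addition to the basic example below, here is a hedged sketch of passing a causal target mask and a key padding mask to `forward()` (both mask arguments and `generate_square_subsequent_mask()` are documented further down); the shapes follow the (S, N, E) / (T, N, E) convention:

>>> import torch
>>> import torch.nn as nn
>>> model = nn.Transformer(d_model=512, nhead=8)
>>> src = torch.rand(10, 32, 512)   # (S, N, E)
>>> tgt = torch.rand(20, 32, 512)   # (T, N, E)
>>> # additive causal mask: position i in tgt may only attend to positions <= i
>>> tgt_mask = model.generate_square_subsequent_mask(20)
>>> # boolean padding mask of shape (N, S); True marks source positions to ignore
>>> src_key_padding_mask = torch.zeros(32, 10, dtype=torch.bool)
>>> out = model(src, tgt, tgt_mask=tgt_mask, src_key_padding_mask=src_key_padding_mask)
>>> out.shape
torch.Size([20, 32, 512])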
Examples:: >>> transformer_model = nn.Transformer(nhead=16, num_encoder_layers=12) >>> src = torch.rand((10, 32, 512)) >>> tgt = torch.rand((20, 32, 512)) >>> out = transformer_model(src, tgt) Note: A full example applying the nn.Transformer module to a word language model is available in the PyTorch examples repository. `forward(src, tgt, src_mask=None, tgt_mask=None, memory_mask=None, src_key_padding_mask=None, tgt_key_padding_mask=None, memory_key_padding_mask=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/transformer.html#Transformer.forward) Take in and process masked source/target sequences. Parameters * **src** – the sequence to the encoder (required). * **tgt** – the sequence to the decoder (required). * **src_mask** – the additive mask for the src sequence (optional). * **tgt_mask** – the additive mask for the tgt sequence (optional). * **memory_mask** – the additive mask for the encoder output (optional). * **src_key_padding_mask** – the ByteTensor mask for src keys per batch (optional). * **tgt_key_padding_mask** – the ByteTensor mask for tgt keys per batch (optional). * **memory_key_padding_mask** – the ByteTensor mask for memory keys per batch (optional). Shape: * src: (S, N, E). * tgt: (T, N, E). * src_mask: (S, S). * tgt_mask: (T, T). * memory_mask: (T, S). * src_key_padding_mask: (N, S). * tgt_key_padding_mask: (N, T). * memory_key_padding_mask: (N, S). Note: [src/tgt/memory]_mask ensures that position i is allowed to attend to the unmasked positions. If a ByteTensor is provided, the non-zero positions are not allowed to attend while the zero positions will be unchanged. If a BoolTensor is provided, positions with `True` are not allowed to attend while `False` values will be unchanged. If a FloatTensor is provided, it will be added to the attention weight. [src/tgt/memory]_key_padding_mask provides specified elements in the key to be ignored by the attention. If a ByteTensor is provided, the non-zero positions will be ignored while the zero positions will be unchanged. If a BoolTensor is provided, the positions with the value of `True` will be ignored while the positions with the value of `False` will be unchanged. * output: (T, N, E). Note: Due to the multi-head attention architecture in the transformer model, the output sequence length of a transformer is the same as the input sequence (i.e. target) length of the decoder. where S is the source sequence length, T is the target sequence length, N is the batch size, E is the feature number. #### Examples >>> output = transformer_model(src, tgt, src_mask=src_mask, tgt_mask=tgt_mask) `generate_square_subsequent_mask(sz)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/transformer.html#Transformer.generate_square_subsequent_mask) Generate a square mask for the sequence. The masked positions are filled with float('-inf'). Unmasked positions are filled with float(0.0). # TransformerDecoder `class torch.nn.TransformerDecoder(decoder_layer, num_layers, norm=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/transformer.html#TransformerDecoder) TransformerDecoder is a stack of N decoder layers. Parameters * **decoder_layer** – an instance of the TransformerDecoderLayer() class (required). * **num_layers** – the number of sub-decoder-layers in the decoder (required). * **norm** – the layer normalization component (optional).
Examples:: >>> decoder_layer = nn.TransformerDecoderLayer(d_model=512, nhead=8) >>> transformer_decoder = nn.TransformerDecoder(decoder_layer, num_layers=6) >>> memory = torch.rand(10, 32, 512) >>> tgt = torch.rand(20, 32, 512) >>> out = transformer_decoder(tgt, memory) `forward(tgt, memory, tgt_mask=None, memory_mask=None, tgt_key_padding_mask=None, memory_key_padding_mask=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/transformer.html#TransformerDecoder.forward) Pass the inputs (and mask) through the decoder layer in turn. Parameters * **tgt** – the sequence to the decoder (required). * **memory** – the sequence from the last layer of the encoder (required). * **tgt_mask** – the mask for the tgt sequence (optional). * **memory_mask** – the mask for the memory sequence (optional). * **tgt_key_padding_mask** – the mask for the tgt keys per batch (optional). * **memory_key_padding_mask** – the mask for the memory keys per batch (optional). Shape: see the docs in Transformer class. # TransformerDecoderLayer `class torch.nn.TransformerDecoderLayer(d_model, nhead, dim_feedforward=2048, dropout=0.1, activation='relu')` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/transformer.html#TransformerDecoderLayer) TransformerDecoderLayer is made up of self-attn, multi-head-attn and feedforward network. This standard decoder layer is based on the paper “Attention Is All You Need”. Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Lukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In Advances in Neural Information Processing Systems, pages 6000-6010. Users may modify or implement in a different way during application. Parameters * **d_model** – the number of expected features in the input (required). * **nhead** – the number of heads in the multiheadattention models (required). * **dim_feedforward** – the dimension of the feedforward network model (default=2048). * **dropout** – the dropout value (default=0.1). * **activation** – the activation function of intermediate layer, relu or gelu (default=relu). Examples:: >>> decoder_layer = nn.TransformerDecoderLayer(d_model=512, nhead=8) >>> memory = torch.rand(10, 32, 512) >>> tgt = torch.rand(20, 32, 512) >>> out = decoder_layer(tgt, memory) `forward(tgt, memory, tgt_mask=None, memory_mask=None, tgt_key_padding_mask=None, memory_key_padding_mask=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/transformer.html#TransformerDecoderLayer.forward) Pass the inputs (and mask) through the decoder layer. Parameters * **tgt** – the sequence to the decoder layer (required). * **memory** – the sequence from the last layer of the encoder (required). * **tgt_mask** – the mask for the tgt sequence (optional). * **memory_mask** – the mask for the memory sequence (optional). * **tgt_key_padding_mask** – the mask for the tgt keys per batch (optional). * **memory_key_padding_mask** – the mask for the memory keys per batch (optional). Shape: see the docs in Transformer class. # TransformerEncoder `class torch.nn.TransformerEncoder(encoder_layer, num_layers, norm=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/transformer.html#TransformerEncoder) TransformerEncoder is a stack of N encoder layers Parameters * **encoder_layer** – an instance of the TransformerEncoderLayer() class (required). * **num_layers** – the number of sub-encoder-layers in the encoder (required). 
* **norm** – the layer normalization component (optional). Examples:: >>> encoder_layer = nn.TransformerEncoderLayer(d_model=512, nhead=8) >>> transformer_encoder = nn.TransformerEncoder(encoder_layer, num_layers=6) >>> src = torch.rand(10, 32, 512) >>> out = transformer_encoder(src) `forward(src, mask=None, src_key_padding_mask=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/transformer.html#TransformerEncoder.forward) Pass the input through the encoder layers in turn. Parameters * **src** – the sequence to the encoder (required). * **mask** – the mask for the src sequence (optional). * **src_key_padding_mask** – the mask for the src keys per batch (optional). Shape: see the docs in Transformer class. # TransformerEncoderLayer `class torch.nn.TransformerEncoderLayer(d_model, nhead, dim_feedforward=2048, dropout=0.1, activation='relu')` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/transformer.html#TransformerEncoderLayer) TransformerEncoderLayer is made up of self-attn and feedforward network. This standard encoder layer is based on the paper “Attention Is All You Need”. Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Lukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In Advances in Neural Information Processing Systems, pages 6000-6010. Users may modify or implement in a different way during application. Parameters * **d_model** – the number of expected features in the input (required). * **nhead** – the number of heads in the multiheadattention models (required). * **dim_feedforward** – the dimension of the feedforward network model (default=2048). * **dropout** – the dropout value (default=0.1). * **activation** – the activation function of intermediate layer, relu or gelu (default=relu). Examples:: >>> encoder_layer = nn.TransformerEncoderLayer(d_model=512, nhead=8) >>> src = torch.rand(10, 32, 512) >>> out = encoder_layer(src) `forward(src, src_mask=None, src_key_padding_mask=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/transformer.html#TransformerEncoderLayer.forward) Pass the input through the encoder layer. Parameters * **src** – the sequence to the encoder layer (required). * **src_mask** – the mask for the src sequence (optional). * **src_key_padding_mask** – the mask for the src keys per batch (optional). Shape: see the docs in Transformer class. # TripletMarginLoss `class torch.nn.TripletMarginLoss(margin=1.0, p=2.0, eps=1e-06, swap=False, size_average=None, reduce=None, reduction='mean')` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/loss.html#TripletMarginLoss) Creates a criterion that measures the triplet loss given an input tensors x1x1 , x2x2 , x3x3 and a margin with a value greater than 00 . This is used for measuring a relative similarity between samples. A triplet is composed by `a`, `p` and `n` (i.e., `anchor`, `positive examples` and `negative examples` respectively). The shapes of all input tensors should be (N,D)(N, D) . The distance swap is described in detail in the paper [Learning shallow convolutional feature descriptors with triplet losses](http://www.bmva.org/bmvc/2016/papers/paper119/index.html) by V. Balntas, E. Riba et al. 
The loss function for each sample in the mini-batch is: L(a,p,n)=max⁡{d(ai,pi)−d(ai,ni)+margin,0}L(a, p, n) = \max \\{d(a_i, p_i) - d(a_i, n_i) + {\rm margin}, 0\\} where d(xi,yi)=∥xi−yi∥pd(x_i, y_i) = \left\lVert {\bf x}_i - {\bf y}_i \right\rVert_p See also [`TripletMarginWithDistanceLoss`](torch.nn.tripletmarginwithdistanceloss#torch.nn.TripletMarginWithDistanceLoss "torch.nn.TripletMarginWithDistanceLoss"), which computes the triplet margin loss for input tensors using a custom distance function. Parameters * **margin** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _,__optional_) – Default: 11 . * **p** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – The norm degree for pairwise distance. Default: 22 . * **swap** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – The distance swap is described in detail in the paper `Learning shallow convolutional feature descriptors with triplet losses` by V. Balntas, E. Riba et al. Default: `False`. * **size_average** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – Deprecated (see `reduction`). By default, the losses are averaged over each loss element in the batch. Note that for some losses, there are multiple elements per sample. If the field `size_average` is set to `False`, the losses are instead summed for each minibatch. Ignored when `reduce` is `False`. Default: `True` * **reduce** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – Deprecated (see `reduction`). By default, the losses are averaged or summed over observations for each minibatch depending on `size_average`. When `reduce` is `False`, returns a loss per batch element instead and ignores `size_average`. Default: `True` * **reduction** (_string_ _,__optional_) – Specifies the reduction to apply to the output: `'none'` | `'mean'` | `'sum'`. `'none'`: no reduction will be applied, `'mean'`: the sum of the output will be divided by the number of elements in the output, `'sum'`: the output will be summed. Note: `size_average` and `reduce` are in the process of being deprecated, and in the meantime, specifying either of those two args will override `reduction`. Default: `'mean'` Shape: * Input: (N,D)(N, D) where DD is the vector dimension. * `Output: A Tensor of shape (N)(N) if reduction is 'none', or a scalar` otherwise. >>> triplet_loss = nn.TripletMarginLoss(margin=1.0, p=2) >>> anchor = torch.randn(100, 128, requires_grad=True) >>> positive = torch.randn(100, 128, requires_grad=True) >>> negative = torch.randn(100, 128, requires_grad=True) >>> output = triplet_loss(anchor, positive, negative) >>> output.backward() # TripletMarginWithDistanceLoss `class torch.nn.TripletMarginWithDistanceLoss(*, distance_function=None, margin=1.0, swap=False, reduction='mean')` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/loss.html#TripletMarginWithDistanceLoss) Creates a criterion that measures the triplet loss given input tensors aa , pp , and nn (representing anchor, positive, and negative examples, respectively), and a nonnegative, real-valued function (“distance function”) used to compute the relationship between the anchor and positive example (“positive distance”) and the anchor and negative example (“negative distance”). 
The unreduced loss (i.e., with `reduction` set to `'none'`) can be described as: ℓ(a,p,n)=L={l1,…,lN}⊤,li=max⁡{d(ai,pi)−d(ai,ni)+margin,0}\ell(a, p, n) = L = \\{l_1,\dots,l_N\\}^\top, \quad l_i = \max \\{d(a_i, p_i) - d(a_i, n_i) + {\rm margin}, 0\\} where NN is the batch size; dd is a nonnegative, real-valued function quantifying the closeness of two tensors, referred to as the `distance_function`; and marginmargin is a nonnegative margin representing the minimum difference between the positive and negative distances that is required for the loss to be 0. The input tensors have NN elements each and can be of any shape that the distance function can handle. If `reduction` is not `'none'` (default `'mean'`), then: ℓ(x,y)={mean⁡(L),if reduction=‘mean’;sum⁡(L),if reduction=‘sum’.\ell(x, y) = \begin{cases} \operatorname{mean}(L), & \text{if reduction} = \text{`mean';}\\\ \operatorname{sum}(L), & \text{if reduction} = \text{`sum'.} \end{cases} See also [`TripletMarginLoss`](torch.nn.tripletmarginloss#torch.nn.TripletMarginLoss "torch.nn.TripletMarginLoss"), which computes the triplet loss for input tensors using the lpl_p distance as the distance function. Parameters * **distance_function** (_callable_ _,__optional_) – A nonnegative, real-valued function that quantifies the closeness of two tensors. If not specified, `nn.PairwiseDistance` will be used. Default: `None` * **margin** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _,__optional_) – A nonnegative margin representing the minimum difference between the positive and negative distances required for the loss to be 0. Larger margins penalize cases where the negative examples are not distant enough from the anchors, relative to the positives. Default: 11 . * **swap** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – Whether to use the distance swap described in the paper `Learning shallow convolutional feature descriptors with triplet losses` by V. Balntas, E. Riba et al. If True, and if the positive example is closer to the negative example than the anchor is, swaps the positive example and the anchor in the loss computation. Default: `False`. * **reduction** (_string_ _,__optional_) – Specifies the (optional) reduction to apply to the output: `'none'` | `'mean'` | `'sum'`. `'none'`: no reduction will be applied, `'mean'`: the sum of the output will be divided by the number of elements in the output, `'sum'`: the output will be summed. Default: `'mean'` Shape: * Input: (N,∗)(N, *) where ∗* represents any number of additional dimensions as supported by the distance function. * Output: A Tensor of shape (N)(N) if `reduction` is `'none'`, or a scalar otherwise. 
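As a sketch (under the assumption that the default `nn.PairwiseDistance` is used when no `distance_function` is given, as stated in the `distance_function` parameter above), the unreduced loss can be reproduced directly from the formula:

>>> import torch
>>> import torch.nn as nn
>>> a, p, n = torch.randn(5, 8), torch.randn(5, 8), torch.randn(5, 8)
>>> loss_fn = nn.TripletMarginWithDistanceLoss(margin=1.0, reduction='none')
>>> d = nn.PairwiseDistance()
>>> # l_i = max(d(a_i, p_i) - d(a_i, n_i) + margin, 0)
>>> manual = torch.clamp(d(a, p) - d(a, n) + 1.0, min=0)
>>> torch.allclose(loss_fn(a, p, n), manual)
True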
Examples: >>> # Initialize embeddings >>> embedding = nn.Embedding(1000, 128) >>> anchor_ids = torch.randint(0, 1000, (1,)) >>> positive_ids = torch.randint(0, 1000, (1,)) >>> negative_ids = torch.randint(0, 1000, (1,)) >>> anchor = embedding(anchor_ids) >>> positive = embedding(positive_ids) >>> negative = embedding(negative_ids) >>> >>> # Built-in Distance Function >>> triplet_loss = \ >>> nn.TripletMarginWithDistanceLoss(distance_function=nn.PairwiseDistance()) >>> output = triplet_loss(anchor, positive, negative) >>> output.backward() >>> >>> # Custom Distance Function >>> def l_infinity(x1, x2): >>> return torch.max(torch.abs(x1 - x2), dim=1).values >>> >>> triplet_loss = \ >>> nn.TripletMarginWithDistanceLoss(distance_function=l_infinity, margin=1.5) >>> output = triplet_loss(anchor, positive, negative) >>> output.backward() >>> >>> # Custom Distance Function (Lambda) >>> triplet_loss = \ >>> nn.TripletMarginWithDistanceLoss( >>> distance_function=lambda x, y: 1.0 - F.cosine_similarity(x, y)) >>> output = triplet_loss(anchor, positive, negative) >>> output.backward() Reference: V. Balntas, et al.: Learning shallow convolutional feature descriptors with triplet losses: # Unflatten `class torch.nn.Unflatten(dim, unflattened_size)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/flatten.html#Unflatten) Unflattens a tensor dim expanding it to a desired shape. For use with `Sequential`. * `dim` specifies the dimension of the input tensor to be unflattened, and it can be either `int` or `str` when `Tensor` or `NamedTensor` is used, respectively. * `unflattened_size` is the new shape of the unflattened dimension of the tensor and it can be a `tuple` of ints or a `list` of ints or `torch.Size` for `Tensor` input; a `NamedShape` (tuple of `(name, size)` tuples) for `NamedTensor` input. Shape: * Input: (N,∗dims)(N, *dims) * Output: (N,Cout,Hout,Wout)(N, C_{\text{out}}, H_{\text{out}}, W_{\text{out}}) Parameters * **dim** (_Union_ _[_[int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,_[str](https://docs.python.org/3/library/stdtypes.html#str "\(in Python v3.9\)") _]_) – Dimension to be unflattened * **unflattened_size** (_Union_ _[__torch.Size_ _,__Tuple_ _,__List_ _,__NamedShape_ _]_) – New shape of the unflattened dimension #### Examples >>> input = torch.randn(2, 50) >>> # With tuple of ints >>> m = nn.Sequential( >>> nn.Linear(50, 50), >>> nn.Unflatten(1, (2, 5, 5)) >>> ) >>> output = m(input) >>> output.size() torch.Size([2, 2, 5, 5]) >>> # With torch.Size >>> m = nn.Sequential( >>> nn.Linear(50, 50), >>> nn.Unflatten(1, torch.Size([2, 5, 5])) >>> ) >>> output = m(input) >>> output.size() torch.Size([2, 2, 5, 5]) >>> # With namedshape (tuple of tuples) >>> input = torch.randn(2, 50, names=('N', 'features')) >>> unflatten = nn.Unflatten('features', (('C', 2), ('H', 5), ('W', 5))) >>> output = unflatten(input) >>> output.size() torch.Size([2, 2, 5, 5]) `add_module(name, module)` Adds a child module to the current module. The module can be accessed as an attribute using the given name. Parameters * **name** (_string_) – name of the child module. The child module can be accessed from this module using the given name * **module** ([Module](torch.nn.module#torch.nn.Module "torch.nn.Module")) – child module to be added to the module. `apply(fn)` Applies `fn` recursively to every submodule (as returned by `.children()`) as well as self. Typical use includes initializing the parameters of a model (see also [torch.nn.init](../nn.init#nn-init-doc)). 
Parameters **fn** ([`Module`](torch.nn.module#torch.nn.Module "torch.nn.Module") -> None) – function to be applied to each submodule Returns self Return type [Module](torch.nn.module#torch.nn.Module "torch.nn.Module") Example: >>> @torch.no_grad() >>> def init_weights(m): >>> print(m) >>> if type(m) == nn.Linear: >>> m.weight.fill_(1.0) >>> print(m.weight) >>> net = nn.Sequential(nn.Linear(2, 2), nn.Linear(2, 2)) >>> net.apply(init_weights) Linear(in_features=2, out_features=2, bias=True) Parameter containing: tensor([[ 1., 1.], [ 1., 1.]]) Linear(in_features=2, out_features=2, bias=True) Parameter containing: tensor([[ 1., 1.], [ 1., 1.]]) Sequential( (0): Linear(in_features=2, out_features=2, bias=True) (1): Linear(in_features=2, out_features=2, bias=True) ) Sequential( (0): Linear(in_features=2, out_features=2, bias=True) (1): Linear(in_features=2, out_features=2, bias=True) ) `bfloat16()` Casts all floating point parameters and buffers to `bfloat16` datatype. Returns self Return type [Module](torch.nn.module#torch.nn.Module "torch.nn.Module") `buffers(recurse=True)` Returns an iterator over module buffers. Parameters **recurse** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – if True, then yields buffers of this module and all submodules. Otherwise, yields only buffers that are direct members of this module. Yields _torch.Tensor_ – module buffer Example: >>> for buf in model.buffers(): >>> print(type(buf), buf.size()) (20L,) (20L, 1L, 5L, 5L) `children()` Returns an iterator over immediate children modules. Yields _Module_ – a child module `cpu()` Moves all model parameters and buffers to the CPU. Returns self Return type [Module](torch.nn.module#torch.nn.Module "torch.nn.Module") `cuda(device=None)` Moves all model parameters and buffers to the GPU. This also makes associated parameters and buffers different objects. So it should be called before constructing optimizer if the module will live on GPU while being optimized. Parameters **device** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – if specified, all parameters will be copied to that device Returns self Return type [Module](torch.nn.module#torch.nn.Module "torch.nn.Module") `double()` Casts all floating point parameters and buffers to `double` datatype. Returns self Return type [Module](torch.nn.module#torch.nn.Module "torch.nn.Module") `eval()` Sets the module in evaluation mode. This has any effect only on certain modules. See documentations of particular modules for details of their behaviors in training/evaluation mode, if they are affected, e.g. [`Dropout`](torch.nn.dropout#torch.nn.Dropout "torch.nn.Dropout"), `BatchNorm`, etc. This is equivalent with [`self.train(False)`](torch.nn.module#torch.nn.Module.train "torch.nn.Module.train"). Returns self Return type [Module](torch.nn.module#torch.nn.Module "torch.nn.Module") `float()` Casts all floating point parameters and buffers to float datatype. Returns self Return type [Module](torch.nn.module#torch.nn.Module "torch.nn.Module") `half()` Casts all floating point parameters and buffers to `half` datatype. Returns self Return type [Module](torch.nn.module#torch.nn.Module "torch.nn.Module") `load_state_dict(state_dict, strict=True)` Copies parameters and buffers from `state_dict` into this module and its descendants. 
If `strict` is `True`, then the keys of `state_dict` must exactly match the keys returned by this module’s [`state_dict()`](torch.nn.module#torch.nn.Module.state_dict "torch.nn.Module.state_dict") function. Parameters * **state_dict** ([dict](https://docs.python.org/3/library/stdtypes.html#dict "\(in Python v3.9\)")) – a dict containing parameters and persistent buffers. * **strict** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – whether to strictly enforce that the keys in `state_dict` match the keys returned by this module’s [`state_dict()`](torch.nn.module#torch.nn.Module.state_dict "torch.nn.Module.state_dict") function. Default: `True` Returns * **missing_keys** is a list of str containing the missing keys * **unexpected_keys** is a list of str containing the unexpected keys Return type `NamedTuple` with `missing_keys` and `unexpected_keys` fields `modules()` Returns an iterator over all modules in the network. Yields _Module_ – a module in the network Note Duplicate modules are returned only once. In the following example, `l` will be returned only once. Example: >>> l = nn.Linear(2, 2) >>> net = nn.Sequential(l, l) >>> for idx, m in enumerate(net.modules()): print(idx, '->', m) 0 -> Sequential( (0): Linear(in_features=2, out_features=2, bias=True) (1): Linear(in_features=2, out_features=2, bias=True) ) 1 -> Linear(in_features=2, out_features=2, bias=True) `named_buffers(prefix='', recurse=True)` Returns an iterator over module buffers, yielding both the name of the buffer as well as the buffer itself. Parameters * **prefix** ([str](https://docs.python.org/3/library/stdtypes.html#str "\(in Python v3.9\)")) – prefix to prepend to all buffer names. * **recurse** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – if True, then yields buffers of this module and all submodules. Otherwise, yields only buffers that are direct members of this module. Yields _(string, torch.Tensor)_ – Tuple containing the name and buffer Example: >>> for name, buf in self.named_buffers(): >>> if name in ['running_var']: >>> print(buf.size()) `named_children()` Returns an iterator over immediate children modules, yielding both the name of the module as well as the module itself. Yields _(string, Module)_ – Tuple containing a name and child module Example: >>> for name, module in model.named_children(): >>> if name in ['conv4', 'conv5']: >>> print(module) `named_modules(memo=None, prefix='')` Returns an iterator over all modules in the network, yielding both the name of the module as well as the module itself. Yields _(string, Module)_ – Tuple of name and module Note Duplicate modules are returned only once. In the following example, `l` will be returned only once. Example: >>> l = nn.Linear(2, 2) >>> net = nn.Sequential(l, l) >>> for idx, m in enumerate(net.named_modules()): print(idx, '->', m) 0 -> ('', Sequential( (0): Linear(in_features=2, out_features=2, bias=True) (1): Linear(in_features=2, out_features=2, bias=True) )) 1 -> ('0', Linear(in_features=2, out_features=2, bias=True)) `named_parameters(prefix='', recurse=True)` Returns an iterator over module parameters, yielding both the name of the parameter as well as the parameter itself. Parameters * **prefix** ([str](https://docs.python.org/3/library/stdtypes.html#str "\(in Python v3.9\)")) – prefix to prepend to all parameter names. 
* **recurse** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – if True, then yields parameters of this module and all submodules. Otherwise, yields only parameters that are direct members of this module. Yields _(string, Parameter)_ – Tuple containing the name and parameter Example: >>> for name, param in self.named_parameters(): >>> if name in ['bias']: >>> print(param.size()) `parameters(recurse=True)` Returns an iterator over module parameters. This is typically passed to an optimizer. Parameters **recurse** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – if True, then yields parameters of this module and all submodules. Otherwise, yields only parameters that are direct members of this module. Yields _Parameter_ – module parameter Example: >>> for param in model.parameters(): >>> print(type(param), param.size()) (20L,) (20L, 1L, 5L, 5L) `register_backward_hook(hook)` Registers a backward hook on the module. This function is deprecated in favor of `nn.Module.register_full_backward_hook()` and the behavior of this function will change in future versions. Returns a handle that can be used to remove the added hook by calling `handle.remove()` Return type `torch.utils.hooks.RemovableHandle` `register_buffer(name, tensor, persistent=True)` Adds a buffer to the module. This is typically used to register a buffer that should not to be considered a model parameter. For example, BatchNorm’s `running_mean` is not a parameter, but is part of the module’s state. Buffers, by default, are persistent and will be saved alongside parameters. This behavior can be changed by setting `persistent` to `False`. The only difference between a persistent buffer and a non-persistent buffer is that the latter will not be a part of this module’s `state_dict`. Buffers can be accessed as attributes using given names. Parameters * **name** (_string_) – name of the buffer. The buffer can be accessed from this module using the given name * **tensor** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – buffer to be registered. * **persistent** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – whether the buffer is part of this module’s `state_dict`. Example: >>> self.register_buffer('running_mean', torch.zeros(num_features)) `register_forward_hook(hook)` Registers a forward hook on the module. The hook will be called every time after `forward()` has computed an output. It should have the following signature: hook(module, input, output) -> None or modified output The input contains only the positional arguments given to the module. Keyword arguments won’t be passed to the hooks and only to the `forward`. The hook can modify the output. It can modify the input inplace but it will not have effect on forward since this is called after `forward()` is called. Returns a handle that can be used to remove the added hook by calling `handle.remove()` Return type `torch.utils.hooks.RemovableHandle` `register_forward_pre_hook(hook)` Registers a forward pre-hook on the module. The hook will be called every time before `forward()` is invoked. It should have the following signature: hook(module, input) -> None or modified input The input contains only the positional arguments given to the module. Keyword arguments won’t be passed to the hooks and only to the `forward`. The hook can modify the input. User can either return a tuple or a single modified value in the hook. 
We will wrap the value into a tuple if a single value is returned(unless that value is already a tuple). Returns a handle that can be used to remove the added hook by calling `handle.remove()` Return type `torch.utils.hooks.RemovableHandle` `register_full_backward_hook(hook)` Registers a backward hook on the module. The hook will be called every time the gradients with respect to module inputs are computed. The hook should have the following signature: hook(module, grad_input, grad_output) -> tuple(Tensor) or None The `grad_input` and `grad_output` are tuples that contain the gradients with respect to the inputs and outputs respectively. The hook should not modify its arguments, but it can optionally return a new gradient with respect to the input that will be used in place of `grad_input` in subsequent computations. `grad_input` will only correspond to the inputs given as positional arguments and all kwarg arguments are ignored. Entries in `grad_input` and `grad_output` will be `None` for all non-Tensor arguments. Warning Modifying inputs or outputs inplace is not allowed when using backward hooks and will raise an error. Returns a handle that can be used to remove the added hook by calling `handle.remove()` Return type `torch.utils.hooks.RemovableHandle` `register_parameter(name, param)` Adds a parameter to the module. The parameter can be accessed as an attribute using given name. Parameters * **name** (_string_) – name of the parameter. The parameter can be accessed from this module using the given name * **param** ([Parameter](torch.nn.parameter.parameter#torch.nn.parameter.Parameter "torch.nn.parameter.Parameter")) – parameter to be added to the module. `requires_grad_(requires_grad=True)` Change if autograd should record operations on parameters in this module. This method sets the parameters’ `requires_grad` attributes in-place. This method is helpful for freezing part of the module for finetuning or training parts of a model individually (e.g., GAN training). Parameters **requires_grad** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – whether autograd should record operations on parameters in this module. Default: `True`. Returns self Return type [Module](torch.nn.module#torch.nn.Module "torch.nn.Module") `state_dict(destination=None, prefix='', keep_vars=False)` Returns a dictionary containing a whole state of the module. Both parameters and persistent buffers (e.g. running averages) are included. Keys are corresponding parameter and buffer names. Returns a dictionary containing a whole state of the module Return type [dict](https://docs.python.org/3/library/stdtypes.html#dict "\(in Python v3.9\)") Example: >>> module.state_dict().keys() ['bias', 'weight'] `to(*args, **kwargs)` Moves and/or casts the parameters and buffers. This can be called as `to(device=None, dtype=None, non_blocking=False)` `to(dtype, non_blocking=False)` `to(tensor, non_blocking=False)` `to(memory_format=torch.channels_last)` Its signature is similar to [`torch.Tensor.to()`](../tensors#torch.Tensor.to "torch.Tensor.to"), but only accepts floating point or complex `dtype`s. In addition, this method will only cast the floating point or complex parameters and buffers to :attr:`dtype` (if given). The integral parameters and buffers will be moved `device`, if that is given, but with dtypes unchanged. When `non_blocking` is set, it tries to convert/move asynchronously with respect to the host if possible, e.g., moving CPU Tensors with pinned memory to CUDA devices. 
See below for examples. Note This method modifies the module in-place. Parameters * **device** (`torch.device`) – the desired device of the parameters and buffers in this module * **dtype** (`torch.dtype`) – the desired floating point or complex dtype of the parameters and buffers in this module * **tensor** ([torch.Tensor](../tensors#torch.Tensor "torch.Tensor")) – Tensor whose dtype and device are the desired dtype and device for all parameters and buffers in this module * **memory_format** (`torch.memory_format`) – the desired memory format for 4D parameters and buffers in this module (keyword only argument) Returns self Return type [Module](torch.nn.module#torch.nn.Module "torch.nn.Module") Examples: >>> linear = nn.Linear(2, 2) >>> linear.weight Parameter containing: tensor([[ 0.1913, -0.3420], [-0.5113, -0.2325]]) >>> linear.to(torch.double) Linear(in_features=2, out_features=2, bias=True) >>> linear.weight Parameter containing: tensor([[ 0.1913, -0.3420], [-0.5113, -0.2325]], dtype=torch.float64) >>> gpu1 = torch.device("cuda:1") >>> linear.to(gpu1, dtype=torch.half, non_blocking=True) Linear(in_features=2, out_features=2, bias=True) >>> linear.weight Parameter containing: tensor([[ 0.1914, -0.3420], [-0.5112, -0.2324]], dtype=torch.float16, device='cuda:1') >>> cpu = torch.device("cpu") >>> linear.to(cpu) Linear(in_features=2, out_features=2, bias=True) >>> linear.weight Parameter containing: tensor([[ 0.1914, -0.3420], [-0.5112, -0.2324]], dtype=torch.float16) >>> linear = nn.Linear(2, 2, bias=None).to(torch.cdouble) >>> linear.weight Parameter containing: tensor([[ 0.3741+0.j, 0.2382+0.j], [ 0.5593+0.j, -0.4443+0.j]], dtype=torch.complex128) >>> linear(torch.ones(3, 2, dtype=torch.cdouble)) tensor([[0.6122+0.j, 0.1150+0.j], [0.6122+0.j, 0.1150+0.j], [0.6122+0.j, 0.1150+0.j]], dtype=torch.complex128) `train(mode=True)` Sets the module in training mode. This has any effect only on certain modules. See documentations of particular modules for details of their behaviors in training/evaluation mode, if they are affected, e.g. [`Dropout`](torch.nn.dropout#torch.nn.Dropout "torch.nn.Dropout"), `BatchNorm`, etc. Parameters **mode** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – whether to set training mode (`True`) or evaluation mode (`False`). Default: `True`. Returns self Return type [Module](torch.nn.module#torch.nn.Module "torch.nn.Module") `type(dst_type)` Casts all parameters and buffers to `dst_type`. Parameters **dst_type** ([type](https://docs.python.org/3/library/functions.html#type "\(in Python v3.9\)") _or_ _string_) – the desired type Returns self Return type [Module](torch.nn.module#torch.nn.Module "torch.nn.Module") `xpu(device=None)` Moves all model parameters and buffers to the XPU. This also makes associated parameters and buffers different objects. So it should be called before constructing optimizer if the module will live on XPU while being optimized. Parameters **device** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – if specified, all parameters will be copied to that device Returns self Return type [Module](torch.nn.module#torch.nn.Module "torch.nn.Module") `zero_grad(set_to_none=False)` Sets gradients of all model parameters to zero. See similar function under [`torch.optim.Optimizer`](../optim#torch.optim.Optimizer "torch.optim.Optimizer") for more context. 
Parameters **set_to_none** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – instead of setting to zero, set the grads to None. See [`torch.optim.Optimizer.zero_grad()`](../optim#torch.optim.Optimizer.zero_grad "torch.optim.Optimizer.zero_grad") for details. # Unfold `class torch.nn.Unfold(kernel_size, dilation=1, padding=0, stride=1)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/fold.html#Unfold) Extracts sliding local blocks from a batched input tensor. Consider a batched `input` tensor of shape (N, C, *), where N is the batch dimension, C is the channel dimension, and * represents arbitrary spatial dimensions. This operation flattens each sliding `kernel_size`-sized block within the spatial dimensions of `input` into a column (i.e., last dimension) of a 3-D `output` tensor of shape (N, C \times \prod(\text{kernel\_size}), L), where C \times \prod(\text{kernel\_size}) is the total number of values within each block (a block has \prod(\text{kernel\_size}) spatial locations, each containing a C-channeled vector), and L is the total number of such blocks: L = \prod_d \left\lfloor\frac{\text{spatial\_size}[d] + 2 \times \text{padding}[d] - \text{dilation}[d] \times (\text{kernel\_size}[d] - 1) - 1}{\text{stride}[d]} + 1\right\rfloor, where \text{spatial\_size} is formed by the spatial dimensions of `input` (* above), and d is over all spatial dimensions. Therefore, indexing `output` at the last dimension (column dimension) gives all values within a certain block. The `padding`, `stride` and `dilation` arguments specify how the sliding blocks are retrieved. * `stride` controls the stride for the sliding blocks. * `padding` controls the amount of implicit zero-paddings on both sides for `padding` number of points for each dimension before reshaping. * `dilation` controls the spacing between the kernel points; also known as the à trous algorithm. It is harder to describe, but this [link](https://github.com/vdumoulin/conv_arithmetic/blob/master/README.md) has a nice visualization of what `dilation` does. Parameters * **kernel_size** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _or_[tuple](https://docs.python.org/3/library/stdtypes.html#tuple "\(in Python v3.9\)")) – the size of the sliding blocks * **stride** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _or_[tuple](https://docs.python.org/3/library/stdtypes.html#tuple "\(in Python v3.9\)") _,__optional_) – the stride of the sliding blocks in the input spatial dimensions. Default: 1 * **padding** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _or_[tuple](https://docs.python.org/3/library/stdtypes.html#tuple "\(in Python v3.9\)") _,__optional_) – implicit zero padding to be added on both sides of input. Default: 0 * **dilation** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _or_[tuple](https://docs.python.org/3/library/stdtypes.html#tuple "\(in Python v3.9\)") _,__optional_) – a parameter that controls the stride of elements within the neighborhood. Default: 1 * If `kernel_size`, `dilation`, `padding` or `stride` is an int or a tuple of length 1, their values will be replicated across all spatial dimensions.
* For the case of two input spatial dimensions this operation is sometimes called `im2col`. Note [`Fold`](torch.nn.fold#torch.nn.Fold "torch.nn.Fold") calculates each combined value in the resulting large tensor by summing all values from all containing blocks. `Unfold` extracts the values in the local blocks by copying from the large tensor. So, if the blocks overlap, they are not inverses of each other. In general, folding and unfolding operations are related as follows. Consider [`Fold`](torch.nn.fold#torch.nn.Fold "torch.nn.Fold") and `Unfold` instances created with the same parameters: >>> fold_params = dict(kernel_size=..., dilation=..., padding=..., stride=...) >>> fold = nn.Fold(output_size=..., **fold_params) >>> unfold = nn.Unfold(**fold_params) Then for any (supported) `input` tensor the following equality holds: fold(unfold(input)) == divisor * input where `divisor` is a tensor that depends only on the shape and dtype of the `input`: >>> input_ones = torch.ones(input.shape, dtype=input.dtype) >>> divisor = fold(unfold(input_ones)) When the `divisor` tensor contains no zero elements, then `fold` and `unfold` operations are inverses of each other (up to constant divisor). Warning Currently, only 4-D input tensors (batched image-like tensors) are supported. Shape: * Input: (N,C,∗)(N, C, *) * Output: (N,C×∏(kernel_size),L)(N, C \times \prod(\text{kernel\\_size}), L) as described above Examples: >>> unfold = nn.Unfold(kernel_size=(2, 3)) >>> input = torch.randn(2, 5, 3, 4) >>> output = unfold(input) >>> # each patch contains 30 values (2x3=6 vectors, each of 5 channels) >>> # 4 blocks (2x3 kernels) in total in the 3x4 input >>> output.size() torch.Size([2, 30, 4]) >>> # Convolution is equivalent with Unfold + Matrix Multiplication + Fold (or view to output shape) >>> inp = torch.randn(1, 3, 10, 12) >>> w = torch.randn(2, 3, 4, 5) >>> inp_unf = torch.nn.functional.unfold(inp, (4, 5)) >>> out_unf = inp_unf.transpose(1, 2).matmul(w.view(w.size(0), -1).t()).transpose(1, 2) >>> out = torch.nn.functional.fold(out_unf, (7, 8), (1, 1)) >>> # or equivalently (and avoiding a copy), >>> # out = out_unf.view(1, 2, 7, 8) >>> (torch.nn.functional.conv2d(inp, w) - out).abs().max() tensor(1.9073e-06) # Upsample `class torch.nn.Upsample(size=None, scale_factor=None, mode='nearest', align_corners=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/upsampling.html#Upsample) Upsamples a given multi-channel 1D (temporal), 2D (spatial) or 3D (volumetric) data. The input data is assumed to be of the form `minibatch x channels x [optional depth] x [optional height] x width`. Hence, for spatial inputs, we expect a 4D Tensor and for volumetric inputs, we expect a 5D Tensor. The algorithms available for upsampling are nearest neighbor and linear, bilinear, bicubic and trilinear for 3D, 4D and 5D input Tensor, respectively. One can either give a `scale_factor` or the target output `size` to calculate the output size. 
(You cannot give both, as it is ambiguous) Parameters * **size** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _or_ _Tuple_ _[_[int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _] or_ _Tuple_ _[_[int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,_[int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _] or_ _Tuple_ _[_[int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,_[int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,_[int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _]__,__optional_) – output spatial sizes * **scale_factor** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _or_ _Tuple_ _[_[float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _] or_ _Tuple_ _[_[float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _,_[float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _] or_ _Tuple_ _[_[float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _,_[float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _,_[float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _]__,__optional_) – multiplier for spatial size. Has to match input size if it is a tuple. * **mode** ([str](https://docs.python.org/3/library/stdtypes.html#str "\(in Python v3.9\)") _,__optional_) – the upsampling algorithm: one of `'nearest'`, `'linear'`, `'bilinear'`, `'bicubic'` and `'trilinear'`. Default: `'nearest'` * **align_corners** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – if `True`, the corner pixels of the input and output tensors are aligned, and thus preserving the values at those pixels. This only has effect when `mode` is `'linear'`, `'bilinear'`, or `'trilinear'`. Default: `False` Shape: * Input: (N,C,Win)(N, C, W_{in}) , (N,C,Hin,Win)(N, C, H_{in}, W_{in}) or (N,C,Din,Hin,Win)(N, C, D_{in}, H_{in}, W_{in}) * Output: (N,C,Wout)(N, C, W_{out}) , (N,C,Hout,Wout)(N, C, H_{out}, W_{out}) or (N,C,Dout,Hout,Wout)(N, C, D_{out}, H_{out}, W_{out}) , where Dout=⌊Din×scale_factor⌋D_{out} = \left\lfloor D_{in} \times \text{scale\\_factor} \right\rfloor Hout=⌊Hin×scale_factor⌋H_{out} = \left\lfloor H_{in} \times \text{scale\\_factor} \right\rfloor Wout=⌊Win×scale_factor⌋W_{out} = \left\lfloor W_{in} \times \text{scale\\_factor} \right\rfloor Warning With `align_corners = True`, the linearly interpolating modes (`linear`, `bilinear`, `bicubic`, and `trilinear`) don’t proportionally align the output and input pixels, and thus the output values can depend on the input size. This was the default behavior for these modes up to version 0.3.1. Since then, the default behavior is `align_corners = False`. See below for concrete examples on how this affects the outputs. Note If you want downsampling/general resizing, you should use `interpolate()`. 
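Since `Upsample` is implemented in terms of `interpolate()`, the module form and the functional form can be used interchangeably; the following minimal sketch (assuming the usual `torch.nn.functional` import alias `F`) is illustrative rather than canonical:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    x = torch.arange(1, 5, dtype=torch.float32).view(1, 1, 2, 2)

    # Module form...
    up = nn.Upsample(scale_factor=2, mode='bilinear', align_corners=False)
    y_module = up(x)

    # ...and the equivalent functional call
    y_functional = F.interpolate(x, scale_factor=2, mode='bilinear',
                                 align_corners=False)

    print(torch.equal(y_module, y_functional))  # True: the module wraps the same functional call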
Examples: >>> input = torch.arange(1, 5, dtype=torch.float32).view(1, 1, 2, 2) >>> input tensor([[[[ 1., 2.], [ 3., 4.]]]]) >>> m = nn.Upsample(scale_factor=2, mode='nearest') >>> m(input) tensor([[[[ 1., 1., 2., 2.], [ 1., 1., 2., 2.], [ 3., 3., 4., 4.], [ 3., 3., 4., 4.]]]]) >>> m = nn.Upsample(scale_factor=2, mode='bilinear') # align_corners=False >>> m(input) tensor([[[[ 1.0000, 1.2500, 1.7500, 2.0000], [ 1.5000, 1.7500, 2.2500, 2.5000], [ 2.5000, 2.7500, 3.2500, 3.5000], [ 3.0000, 3.2500, 3.7500, 4.0000]]]]) >>> m = nn.Upsample(scale_factor=2, mode='bilinear', align_corners=True) >>> m(input) tensor([[[[ 1.0000, 1.3333, 1.6667, 2.0000], [ 1.6667, 2.0000, 2.3333, 2.6667], [ 2.3333, 2.6667, 3.0000, 3.3333], [ 3.0000, 3.3333, 3.6667, 4.0000]]]]) >>> # Try scaling the same data in a larger tensor >>> >>> input_3x3 = torch.zeros(3, 3).view(1, 1, 3, 3) >>> input_3x3[:, :, :2, :2].copy_(input) tensor([[[[ 1., 2.], [ 3., 4.]]]]) >>> input_3x3 tensor([[[[ 1., 2., 0.], [ 3., 4., 0.], [ 0., 0., 0.]]]]) >>> m = nn.Upsample(scale_factor=2, mode='bilinear') # align_corners=False >>> # Notice that values in top left corner are the same with the small input (except at boundary) >>> m(input_3x3) tensor([[[[ 1.0000, 1.2500, 1.7500, 1.5000, 0.5000, 0.0000], [ 1.5000, 1.7500, 2.2500, 1.8750, 0.6250, 0.0000], [ 2.5000, 2.7500, 3.2500, 2.6250, 0.8750, 0.0000], [ 2.2500, 2.4375, 2.8125, 2.2500, 0.7500, 0.0000], [ 0.7500, 0.8125, 0.9375, 0.7500, 0.2500, 0.0000], [ 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000]]]]) >>> m = nn.Upsample(scale_factor=2, mode='bilinear', align_corners=True) >>> # Notice that values in top left corner are now changed >>> m(input_3x3) tensor([[[[ 1.0000, 1.4000, 1.8000, 1.6000, 0.8000, 0.0000], [ 1.8000, 2.2000, 2.6000, 2.2400, 1.1200, 0.0000], [ 2.6000, 3.0000, 3.4000, 2.8800, 1.4400, 0.0000], [ 2.4000, 2.7200, 3.0400, 2.5600, 1.2800, 0.0000], [ 1.2000, 1.3600, 1.5200, 1.2800, 0.6400, 0.0000], [ 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000]]]]) # UpsamplingBilinear2d `class torch.nn.UpsamplingBilinear2d(size=None, scale_factor=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/upsampling.html#UpsamplingBilinear2d) Applies a 2D bilinear upsampling to an input signal composed of several input channels. To specify the scale, it takes either the `size` or the `scale_factor` as it’s constructor argument. When `size` is given, it is the output size of the image `(h, w)`. Parameters * **size** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _or_ _Tuple_ _[_[int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,_[int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _]__,__optional_) – output spatial sizes * **scale_factor** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _or_ _Tuple_ _[_[float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _,_[float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _]__,__optional_) – multiplier for spatial size. Warning This class is deprecated in favor of `interpolate()`. It is equivalent to `nn.functional.interpolate(..., mode='bilinear', align_corners=True)`. 
Shape: * Input: (N, C, H_{in}, W_{in}) * Output: (N, C, H_{out}, W_{out}) where H_{out} = \left\lfloor H_{in} \times \text{scale\_factor} \right\rfloor W_{out} = \left\lfloor W_{in} \times \text{scale\_factor} \right\rfloor Examples: >>> input = torch.arange(1, 5, dtype=torch.float32).view(1, 1, 2, 2) >>> input tensor([[[[ 1., 2.], [ 3., 4.]]]]) >>> m = nn.UpsamplingBilinear2d(scale_factor=2) >>> m(input) tensor([[[[ 1.0000, 1.3333, 1.6667, 2.0000], [ 1.6667, 2.0000, 2.3333, 2.6667], [ 2.3333, 2.6667, 3.0000, 3.3333], [ 3.0000, 3.3333, 3.6667, 4.0000]]]]) # UpsamplingNearest2d `class torch.nn.UpsamplingNearest2d(size=None, scale_factor=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/upsampling.html#UpsamplingNearest2d) Applies a 2D nearest neighbor upsampling to an input signal composed of several input channels. To specify the scale, it takes either the `size` or the `scale_factor` as its constructor argument. When `size` is given, it is the output size of the image `(h, w)`. Parameters * **size** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _or_ _Tuple_ _[_[int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,_[int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _]__,__optional_) – output spatial sizes * **scale_factor** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _or_ _Tuple_ _[_[float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _,_[float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _]__,__optional_) – multiplier for spatial size. Warning This class is deprecated in favor of `interpolate()`. Shape: * Input: (N, C, H_{in}, W_{in}) * Output: (N, C, H_{out}, W_{out}) where H_{out} = \left\lfloor H_{in} \times \text{scale\_factor} \right\rfloor W_{out} = \left\lfloor W_{in} \times \text{scale\_factor} \right\rfloor Examples: >>> input = torch.arange(1, 5, dtype=torch.float32).view(1, 1, 2, 2) >>> input tensor([[[[ 1., 2.], [ 3., 4.]]]]) >>> m = nn.UpsamplingNearest2d(scale_factor=2) >>> m(input) tensor([[[[ 1., 1., 2., 2.], [ 1., 1., 2., 2.], [ 3., 3., 4., 4.], [ 3., 3., 4., 4.]]]]) # torch.nn.utils.clip_grad_norm_ `torch.nn.utils.clip_grad_norm_(parameters, max_norm, norm_type=2.0)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/utils/clip_grad.html#clip_grad_norm_) Clips gradient norm of an iterable of parameters. The norm is computed over all gradients together, as if they were concatenated into a single vector. Gradients are modified in-place. Parameters * **parameters** (_Iterable_ _[_[Tensor](../tensors#torch.Tensor "torch.Tensor") _] or_[Tensor](../tensors#torch.Tensor "torch.Tensor")) – an iterable of Tensors or a single Tensor that will have gradients normalized * **max_norm** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _or_[int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – max norm of the gradients * **norm_type** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _or_[int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – type of the used p-norm. Can be `'inf'` for infinity norm. Returns Total norm of the parameters (viewed as a single vector).
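A common pattern is to clip gradients between `backward()` and `optimizer.step()`. The sketch below is a minimal, self-contained illustration (the tiny `nn.Linear` model, learning rate, and `max_norm=1.0` are arbitrary choices, not recommendations):

    import torch
    import torch.nn as nn

    model = nn.Linear(10, 2)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
    loss_fn = nn.MSELoss()

    for _ in range(3):
        input = torch.randn(8, 10)
        target = torch.randn(8, 2)
        optimizer.zero_grad()
        loss = loss_fn(model(input), target)
        loss.backward()
        # Rescales all gradients in-place so that their combined 2-norm
        # does not exceed max_norm; returns the pre-clipping total norm.
        total_norm = torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
        optimizer.step()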
# torch.nn.utils.clip_grad_value_ `torch.nn.utils.clip_grad_value_(parameters, clip_value)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/utils/clip_grad.html#clip_grad_value_) Clips gradient of an iterable of parameters at specified value. Gradients are modified in-place. Parameters * **parameters** (_Iterable_ _[_[Tensor](../tensors#torch.Tensor "torch.Tensor") _] or_[Tensor](../tensors#torch.Tensor "torch.Tensor")) – an iterable of Tensors or a single Tensor that will have gradients normalized * **clip_value** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _or_[int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – maximum allowed value of the gradients. The gradients are clipped in the range [-clip_value,clip_value]\left[\text{-clip\\_value}, \text{clip\\_value}\right] # torch.nn.utils.parameters_to_vector `torch.nn.utils.parameters_to_vector(parameters)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/utils/convert_parameters.html#parameters_to_vector) Convert parameters to one vector Parameters **parameters** (_Iterable_ _[_[Tensor](../tensors#torch.Tensor "torch.Tensor") _]_) – an iterator of Tensors that are the parameters of a model. Returns The parameters represented by a single vector # BasePruningMethod `class torch.nn.utils.prune.BasePruningMethod` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/utils/prune.html#BasePruningMethod) Abstract base class for creation of new pruning techniques. Provides a skeleton for customization requiring the overriding of methods such as `compute_mask()` and `apply()`. `classmethod apply(module, name, *args, importance_scores=None, **kwargs)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/utils/prune.html#BasePruningMethod.apply) Adds the forward pre-hook that enables pruning on the fly and the reparametrization of a tensor in terms of the original tensor and the pruning mask. Parameters * **module** ([nn.Module](torch.nn.module#torch.nn.Module "torch.nn.Module")) – module containing the tensor to prune * **name** ([str](https://docs.python.org/3/library/stdtypes.html#str "\(in Python v3.9\)")) – parameter name within `module` on which pruning will act. * **args** – arguments passed on to a subclass of `BasePruningMethod` * **importance_scores** ([torch.Tensor](../tensors#torch.Tensor "torch.Tensor")) – tensor of importance scores (of same shape as module parameter) used to compute mask for pruning. The values in this tensor indicate the importance of the corresponding elements in the parameter being pruned. If unspecified or None, the parameter will be used in its place. * **kwargs** – keyword arguments passed on to a subclass of a `BasePruningMethod` `apply_mask(module)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/utils/prune.html#BasePruningMethod.apply_mask) Simply handles the multiplication between the parameter being pruned and the generated mask. Fetches the mask and the original tensor from the module and returns the pruned version of the tensor. Parameters **module** ([nn.Module](torch.nn.module#torch.nn.Module "torch.nn.Module")) – module containing the tensor to prune Returns pruned version of the input tensor Return type pruned_tensor ([torch.Tensor](../tensors#torch.Tensor "torch.Tensor")) `abstract compute_mask(t, default_mask)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/utils/prune.html#BasePruningMethod.compute_mask) Computes and returns a mask for the input tensor `t`. 
Starting from a base `default_mask` (which should be a mask of ones if the tensor has not been pruned yet), generate a new mask to apply on top of the `default_mask` according to the specific pruning method recipe. Parameters * **t** ([torch.Tensor](../tensors#torch.Tensor "torch.Tensor")) – tensor representing the importance scores of the parameter to prune. * **default_mask** ([torch.Tensor](../tensors#torch.Tensor "torch.Tensor")) – Base mask from previous pruning iterations, that need to be respected after the new mask is applied. Same dims as `t`. Returns mask to apply to `t`, of same dims as `t` Return type mask ([torch.Tensor](../tensors#torch.Tensor "torch.Tensor")) `prune(t, default_mask=None, importance_scores=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/utils/prune.html#BasePruningMethod.prune) Computes and returns a pruned version of input tensor `t` according to the pruning rule specified in `compute_mask()`. Parameters * **t** ([torch.Tensor](../tensors#torch.Tensor "torch.Tensor")) – tensor to prune (of same dimensions as `default_mask`). * **importance_scores** ([torch.Tensor](../tensors#torch.Tensor "torch.Tensor")) – tensor of importance scores (of same shape as `t`) used to compute mask for pruning `t`. The values in this tensor indicate the importance of the corresponding elements in the `t` that is being pruned. If unspecified or None, the tensor `t` will be used in its place. * **default_mask** ([torch.Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – mask from previous pruning iteration, if any. To be considered when determining what portion of the tensor that pruning should act on. If None, default to a mask of ones. Returns pruned version of tensor `t`. `remove(module)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/utils/prune.html#BasePruningMethod.remove) Removes the pruning reparameterization from a module. The pruned parameter named `name` remains permanently pruned, and the parameter named `name+'_orig'` is removed from the parameter list. Similarly, the buffer named `name+'_mask'` is removed from the buffers. Note Pruning itself is NOT undone or reversed! # torch.nn.utils.prune.custom_from_mask `torch.nn.utils.prune.custom_from_mask(module, name, mask)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/utils/prune.html#custom_from_mask) Prunes tensor corresponding to parameter called `name` in `module` by applying the pre-computed mask in `mask`. Modifies module in place (and also returns the modified module) by: 1) adding a named buffer called `name+'_mask'` corresponding to the binary mask applied to the parameter `name` by the pruning method. 2) replacing the parameter `name` by its pruned version, while the original (unpruned) parameter is stored in a new parameter named `name+'_orig'`. Parameters * **module** ([nn.Module](torch.nn.module#torch.nn.Module "torch.nn.Module")) – module containing the tensor to prune * **name** ([str](https://docs.python.org/3/library/stdtypes.html#str "\(in Python v3.9\)")) – parameter name within `module` on which pruning will act. * **mask** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – binary mask to be applied to the parameter. Returns modified (i.e.
pruned) version of the input module Return type module ([nn.Module](torch.nn.module#torch.nn.Module "torch.nn.Module")) #### Examples >>> m = prune.custom_from_mask( nn.Linear(5, 3), name='bias', mask=torch.Tensor([0, 1, 0]) ) >>> print(m.bias_mask) tensor([0., 1., 0.]) # CustomFromMask `class torch.nn.utils.prune.CustomFromMask(mask)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/utils/prune.html#CustomFromMask) `classmethod apply(module, name, mask)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/utils/prune.html#CustomFromMask.apply) Adds the forward pre-hook that enables pruning on the fly and the reparametrization of a tensor in terms of the original tensor and the pruning mask. Parameters * **module** ([nn.Module](torch.nn.module#torch.nn.Module "torch.nn.Module")) – module containing the tensor to prune * **name** ([str](https://docs.python.org/3/library/stdtypes.html#str "\(in Python v3.9\)")) – parameter name within `module` on which pruning will act. `apply_mask(module)` Simply handles the multiplication between the parameter being pruned and the generated mask. Fetches the mask and the original tensor from the module and returns the pruned version of the tensor. Parameters **module** ([nn.Module](torch.nn.module#torch.nn.Module "torch.nn.Module")) – module containing the tensor to prune Returns pruned version of the input tensor Return type pruned_tensor ([torch.Tensor](../tensors#torch.Tensor "torch.Tensor")) `prune(t, default_mask=None, importance_scores=None)` Computes and returns a pruned version of input tensor `t` according to the pruning rule specified in `compute_mask()`. Parameters * **t** ([torch.Tensor](../tensors#torch.Tensor "torch.Tensor")) – tensor to prune (of same dimensions as `default_mask`). * **importance_scores** ([torch.Tensor](../tensors#torch.Tensor "torch.Tensor")) – tensor of importance scores (of same shape as `t`) used to compute mask for pruning `t`. The values in this tensor indicate the importance of the corresponding elements in the `t` that is being pruned. If unspecified or None, the tensor `t` will be used in its place. * **default_mask** ([torch.Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – mask from previous pruning iteration, if any. To be considered when determining what portion of the tensor that pruning should act on. If None, default to a mask of ones. Returns pruned version of tensor `t`. `remove(module)` Removes the pruning reparameterization from a module. The pruned parameter named `name` remains permanently pruned, and the parameter named `name+'_orig'` is removed from the parameter list. Similarly, the buffer named `name+'_mask'` is removed from the buffers. Note Pruning itself is NOT undone or reversed! # torch.nn.utils.prune.global_unstructured `torch.nn.utils.prune.global_unstructured(parameters, pruning_method, importance_scores=None, **kwargs)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/utils/prune.html#global_unstructured) Globally prunes tensors corresponding to all parameters in `parameters` by applying the specified `pruning_method`. Modifies modules in place by: 1) adding a named buffer called `name+'_mask'` corresponding to the binary mask applied to the parameter `name` by the pruning method. 2) replacing the parameter `name` by its pruned version, while the original (unpruned) parameter is stored in a new parameter named `name+'_orig'`. 
Parameters * **parameters** (_Iterable of_ _(__module_ _,__name_ _)__tuples_) – parameters of the model to prune in a global fashion, i.e. by aggregating all weights prior to deciding which ones to prune. module must be of type `nn.Module`, and name must be a string. * **pruning_method** (_function_) – a valid pruning function from this module, or a custom one implemented by the user that satisfies the implementation guidelines and has `PRUNING_TYPE='unstructured'`. * **importance_scores** ([dict](https://docs.python.org/3/library/stdtypes.html#dict "\(in Python v3.9\)")) – a dictionary mapping (module, name) tuples to the corresponding parameter’s importance scores tensor. The tensor should be the same shape as the parameter, and is used for computing mask for pruning. If unspecified or None, the parameter will be used in place of its importance scores. * **kwargs** – other keyword arguments such as: amount (int or float): quantity of parameters to prune across the specified parameters. If `float`, should be between 0.0 and 1.0 and represent the fraction of parameters to prune. If `int`, it represents the absolute number of parameters to prune. Raises [**TypeError**](https://docs.python.org/3/library/exceptions.html#TypeError "\(in Python v3.9\)") – if `PRUNING_TYPE != 'unstructured'` Note Since global structured pruning doesn’t make much sense unless the norm is normalized by the size of the parameter, we now limit the scope of global pruning to unstructured methods. #### Examples >>> net = nn.Sequential(OrderedDict([ ('first', nn.Linear(10, 4)), ('second', nn.Linear(4, 1)), ])) >>> parameters_to_prune = ( (net.first, 'weight'), (net.second, 'weight'), ) >>> prune.global_unstructured( parameters_to_prune, pruning_method=prune.L1Unstructured, amount=10, ) >>> print(sum(torch.nn.utils.parameters_to_vector(net.buffers()) == 0)) tensor(10, dtype=torch.uint8) # Identity `class torch.nn.utils.prune.Identity` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/utils/prune.html#Identity) Utility pruning method that does not prune any units but generates the pruning parametrization with a mask of ones. `classmethod apply(module, name)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/utils/prune.html#Identity.apply) Adds the forward pre-hook that enables pruning on the fly and the reparametrization of a tensor in terms of the original tensor and the pruning mask. Parameters * **module** ([nn.Module](torch.nn.module#torch.nn.Module "torch.nn.Module")) – module containing the tensor to prune * **name** ([str](https://docs.python.org/3/library/stdtypes.html#str "\(in Python v3.9\)")) – parameter name within `module` on which pruning will act. `apply_mask(module)` Simply handles the multiplication between the parameter being pruned and the generated mask. Fetches the mask and the original tensor from the module and returns the pruned version of the tensor. Parameters **module** ([nn.Module](torch.nn.module#torch.nn.Module "torch.nn.Module")) – module containing the tensor to prune Returns pruned version of the input tensor Return type pruned_tensor ([torch.Tensor](../tensors#torch.Tensor "torch.Tensor")) `prune(t, default_mask=None, importance_scores=None)` Computes and returns a pruned version of input tensor `t` according to the pruning rule specified in `compute_mask()`. Parameters * **t** ([torch.Tensor](../tensors#torch.Tensor "torch.Tensor")) – tensor to prune (of same dimensions as `default_mask`). 
* **importance_scores** ([torch.Tensor](../tensors#torch.Tensor "torch.Tensor")) – tensor of importance scores (of same shape as `t`) used to compute mask for pruning `t`. The values in this tensor indicate the importance of the corresponding elements in the `t` that is being pruned. If unspecified or None, the tensor `t` will be used in its place. * **default_mask** ([torch.Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – mask from previous pruning iteration, if any. To be considered when determining what portion of the tensor that pruning should act on. If None, default to a mask of ones. Returns pruned version of tensor `t`. `remove(module)` Removes the pruning reparameterization from a module. The pruned parameter named `name` remains permanently pruned, and the parameter named `name+'_orig'` is removed from the parameter list. Similarly, the buffer named `name+'_mask'` is removed from the buffers. Note Pruning itself is NOT undone or reversed! # torch.nn.utils.prune.is_pruned `torch.nn.utils.prune.is_pruned(module)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/utils/prune.html#is_pruned) Check whether `module` is pruned by looking for `forward_pre_hooks` in its modules that inherit from the [`BasePruningMethod`](torch.nn.utils.prune.basepruningmethod#torch.nn.utils.prune.BasePruningMethod "torch.nn.utils.prune.BasePruningMethod"). Parameters **module** ([nn.Module](torch.nn.module#torch.nn.Module "torch.nn.Module")) – object that is either pruned or unpruned Returns binary answer to whether `module` is pruned. #### Examples >>> m = nn.Linear(5, 7) >>> print(prune.is_pruned(m)) False >>> prune.random_unstructured(m, name='weight', amount=0.2) >>> print(prune.is_pruned(m)) True # torch.nn.utils.prune.l1_unstructured `torch.nn.utils.prune.l1_unstructured(module, name, amount, importance_scores=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/utils/prune.html#l1_unstructured) Prunes tensor corresponding to parameter called `name` in `module` by removing the specified `amount` of (currently unpruned) units with the lowest L1-norm. Modifies module in place (and also return the modified module) by: 1) adding a named buffer called `name+'_mask'` corresponding to the binary mask applied to the parameter `name` by the pruning method. 2) replacing the parameter `name` by its pruned version, while the original (unpruned) parameter is stored in a new parameter named `name+'_orig'`. Parameters * **module** ([nn.Module](torch.nn.module#torch.nn.Module "torch.nn.Module")) – module containing the tensor to prune * **name** ([str](https://docs.python.org/3/library/stdtypes.html#str "\(in Python v3.9\)")) – parameter name within `module` on which pruning will act. * **amount** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _or_[float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)")) – quantity of parameters to prune. If `float`, should be between 0.0 and 1.0 and represent the fraction of parameters to prune. If `int`, it represents the absolute number of parameters to prune. * **importance_scores** ([torch.Tensor](../tensors#torch.Tensor "torch.Tensor")) – tensor of importance scores (of same shape as module parameter) used to compute mask for pruning. The values in this tensor indicate the importance of the corresponding elements in the parameter being pruned. If unspecified or None, the module parameter will be used in its place. Returns modified (i.e. 
pruned) version of the input module Return type module ([nn.Module](torch.nn.module#torch.nn.Module "torch.nn.Module")) #### Examples >>> m = prune.l1_unstructured(nn.Linear(2, 3), 'weight', amount=0.2) >>> m.state_dict().keys() odict_keys(['bias', 'weight_orig', 'weight_mask']) # L1Unstructured `class torch.nn.utils.prune.L1Unstructured(amount)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/utils/prune.html#L1Unstructured) Prune (currently unpruned) units in a tensor by zeroing out the ones with the lowest L1-norm. Parameters **amount** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _or_[float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)")) – quantity of parameters to prune. If `float`, should be between 0.0 and 1.0 and represent the fraction of parameters to prune. If `int`, it represents the absolute number of parameters to prune. `classmethod apply(module, name, amount, importance_scores=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/utils/prune.html#L1Unstructured.apply) Adds the forward pre-hook that enables pruning on the fly and the reparametrization of a tensor in terms of the original tensor and the pruning mask. Parameters * **module** ([nn.Module](torch.nn.module#torch.nn.Module "torch.nn.Module")) – module containing the tensor to prune * **name** ([str](https://docs.python.org/3/library/stdtypes.html#str "\(in Python v3.9\)")) – parameter name within `module` on which pruning will act. * **amount** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _or_[float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)")) – quantity of parameters to prune. If `float`, should be between 0.0 and 1.0 and represent the fraction of parameters to prune. If `int`, it represents the absolute number of parameters to prune. * **importance_scores** ([torch.Tensor](../tensors#torch.Tensor "torch.Tensor")) – tensor of importance scores (of same shape as module parameter) used to compute mask for pruning. The values in this tensor indicate the importance of the corresponding elements in the parameter being pruned. If unspecified or None, the module parameter will be used in its place. `apply_mask(module)` Simply handles the multiplication between the parameter being pruned and the generated mask. Fetches the mask and the original tensor from the module and returns the pruned version of the tensor. Parameters **module** ([nn.Module](torch.nn.module#torch.nn.Module "torch.nn.Module")) – module containing the tensor to prune Returns pruned version of the input tensor Return type pruned_tensor ([torch.Tensor](../tensors#torch.Tensor "torch.Tensor")) `prune(t, default_mask=None, importance_scores=None)` Computes and returns a pruned version of input tensor `t` according to the pruning rule specified in `compute_mask()`. Parameters * **t** ([torch.Tensor](../tensors#torch.Tensor "torch.Tensor")) – tensor to prune (of same dimensions as `default_mask`). * **importance_scores** ([torch.Tensor](../tensors#torch.Tensor "torch.Tensor")) – tensor of importance scores (of same shape as `t`) used to compute mask for pruning `t`. The values in this tensor indicate the importance of the corresponding elements in the `t` that is being pruned. If unspecified or None, the tensor `t` will be used in its place. * **default_mask** ([torch.Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – mask from previous pruning iteration, if any. 
To be considered when determining what portion of the tensor that pruning should act on. If None, default to a mask of ones. Returns pruned version of tensor `t`. `remove(module)` Removes the pruning reparameterization from a module. The pruned parameter named `name` remains permanently pruned, and the parameter named `name+'_orig'` is removed from the parameter list. Similarly, the buffer named `name+'_mask'` is removed from the buffers. Note Pruning itself is NOT undone or reversed! # torch.nn.utils.prune.ln_structured `torch.nn.utils.prune.ln_structured(module, name, amount, n, dim, importance_scores=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/utils/prune.html#ln_structured) Prunes tensor corresponding to parameter called `name` in `module` by removing the specified `amount` of (currently unpruned) channels along the specified `dim` with the lowest L``n``-norm. Modifies module in place (and also return the modified module) by: 1) adding a named buffer called `name+'_mask'` corresponding to the binary mask applied to the parameter `name` by the pruning method. 2) replacing the parameter `name` by its pruned version, while the original (unpruned) parameter is stored in a new parameter named `name+'_orig'`. Parameters * **module** ([nn.Module](torch.nn.module#torch.nn.Module "torch.nn.Module")) – module containing the tensor to prune * **name** ([str](https://docs.python.org/3/library/stdtypes.html#str "\(in Python v3.9\)")) – parameter name within `module` on which pruning will act. * **amount** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _or_[float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)")) – quantity of parameters to prune. If `float`, should be between 0.0 and 1.0 and represent the fraction of parameters to prune. If `int`, it represents the absolute number of parameters to prune. * **n** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,_[float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _,__inf_ _,__-inf_ _,__'fro'__,__'nuc'_) – See documentation of valid entries for argument `p` in [`torch.norm()`](torch.norm#torch.norm "torch.norm"). * **dim** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – index of the dim along which we define channels to prune. * **importance_scores** ([torch.Tensor](../tensors#torch.Tensor "torch.Tensor")) – tensor of importance scores (of same shape as module parameter) used to compute mask for pruning. The values in this tensor indicate the importance of the corresponding elements in the parameter being pruned. If unspecified or None, the module parameter will be used in its place. Returns modified (i.e. pruned) version of the input module Return type module ([nn.Module](torch.nn.module#torch.nn.Module "torch.nn.Module")) #### Examples >>> m = prune.ln_structured( nn.Conv2d(5, 3, 2), 'weight', amount=0.3, dim=1, n=float('-inf') ) # LnStructured `class torch.nn.utils.prune.LnStructured(amount, n, dim=-1)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/utils/prune.html#LnStructured) Prune entire (currently unpruned) channels in a tensor based on their Ln-norm. Parameters * **amount** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _or_[float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)")) – quantity of channels to prune. 
If `float`, should be between 0.0 and 1.0 and represent the fraction of parameters to prune. If `int`, it represents the absolute number of parameters to prune. * **n** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,_[float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _,__inf_ _,__-inf_ _,__'fro'__,__'nuc'_) – See documentation of valid entries for argument `p` in [`torch.norm()`](torch.norm#torch.norm "torch.norm"). * **dim** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – index of the dim along which we define channels to prune. Default: -1. `classmethod apply(module, name, amount, n, dim, importance_scores=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/utils/prune.html#LnStructured.apply) Adds the forward pre-hook that enables pruning on the fly and the reparametrization of a tensor in terms of the original tensor and the pruning mask. Parameters * **module** ([nn.Module](torch.nn.module#torch.nn.Module "torch.nn.Module")) – module containing the tensor to prune * **name** ([str](https://docs.python.org/3/library/stdtypes.html#str "\(in Python v3.9\)")) – parameter name within `module` on which pruning will act. * **amount** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _or_[float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)")) – quantity of parameters to prune. If `float`, should be between 0.0 and 1.0 and represent the fraction of parameters to prune. If `int`, it represents the absolute number of parameters to prune. * **n** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,_[float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _,__inf_ _,__-inf_ _,__'fro'__,__'nuc'_) – See documentation of valid entries for argument `p` in [`torch.norm()`](torch.norm#torch.norm "torch.norm"). * **dim** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – index of the dim along which we define channels to prune. * **importance_scores** ([torch.Tensor](../tensors#torch.Tensor "torch.Tensor")) – tensor of importance scores (of same shape as module parameter) used to compute mask for pruning. The values in this tensor indicate the importance of the corresponding elements in the parameter being pruned. If unspecified or None, the module parameter will be used in its place. `apply_mask(module)` Simply handles the multiplication between the parameter being pruned and the generated mask. Fetches the mask and the original tensor from the module and returns the pruned version of the tensor. Parameters **module** ([nn.Module](torch.nn.module#torch.nn.Module "torch.nn.Module")) – module containing the tensor to prune Returns pruned version of the input tensor Return type pruned_tensor ([torch.Tensor](../tensors#torch.Tensor "torch.Tensor")) `compute_mask(t, default_mask)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/utils/prune.html#LnStructured.compute_mask) Computes and returns a mask for the input tensor `t`. Starting from a base `default_mask` (which should be a mask of ones if the tensor has not been pruned yet), generate a mask to apply on top of the `default_mask` by zeroing out the channels along the specified dim with the lowest Ln-norm. 
Parameters * **t** ([torch.Tensor](../tensors#torch.Tensor "torch.Tensor")) – tensor representing the parameter to prune * **default_mask** ([torch.Tensor](../tensors#torch.Tensor "torch.Tensor")) – Base mask from previous pruning iterations, that need to be respected after the new mask is applied. Same dims as `t`. Returns mask to apply to `t`, of same dims as `t` Return type mask ([torch.Tensor](../tensors#torch.Tensor "torch.Tensor")) Raises [**IndexError**](https://docs.python.org/3/library/exceptions.html#IndexError "\(in Python v3.9\)") – if `self.dim >= len(t.shape)` `prune(t, default_mask=None, importance_scores=None)` Computes and returns a pruned version of input tensor `t` according to the pruning rule specified in `compute_mask()`. Parameters * **t** ([torch.Tensor](../tensors#torch.Tensor "torch.Tensor")) – tensor to prune (of same dimensions as `default_mask`). * **importance_scores** ([torch.Tensor](../tensors#torch.Tensor "torch.Tensor")) – tensor of importance scores (of same shape as `t`) used to compute mask for pruning `t`. The values in this tensor indicate the importance of the corresponding elements in the `t` that is being pruned. If unspecified or None, the tensor `t` will be used in its place. * **default_mask** ([torch.Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – mask from previous pruning iteration, if any. To be considered when determining what portion of the tensor that pruning should act on. If None, default to a mask of ones. Returns pruned version of tensor `t`. `remove(module)` Removes the pruning reparameterization from a module. The pruned parameter named `name` remains permanently pruned, and the parameter named `name+'_orig'` is removed from the parameter list. Similarly, the buffer named `name+'_mask'` is removed from the buffers. Note Pruning itself is NOT undone or reversed! # PruningContainer `class torch.nn.utils.prune.PruningContainer(*args)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/utils/prune.html#PruningContainer) Container holding a sequence of pruning methods for iterative pruning. Keeps track of the order in which pruning methods are applied and handles combining successive pruning calls. Accepts as argument an instance of a BasePruningMethod or an iterable of them. `add_pruning_method(method)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/utils/prune.html#PruningContainer.add_pruning_method) Adds a child pruning `method` to the container. Parameters **method** (_subclass of BasePruningMethod_) – child pruning method to be added to the container. `classmethod apply(module, name, *args, importance_scores=None, **kwargs)` Adds the forward pre-hook that enables pruning on the fly and the reparametrization of a tensor in terms of the original tensor and the pruning mask. Parameters * **module** ([nn.Module](torch.nn.module#torch.nn.Module "torch.nn.Module")) – module containing the tensor to prune * **name** ([str](https://docs.python.org/3/library/stdtypes.html#str "\(in Python v3.9\)")) – parameter name within `module` on which pruning will act. * **args** – arguments passed on to a subclass of [`BasePruningMethod`](torch.nn.utils.prune.basepruningmethod#torch.nn.utils.prune.BasePruningMethod "torch.nn.utils.prune.BasePruningMethod") * **importance_scores** ([torch.Tensor](../tensors#torch.Tensor "torch.Tensor")) – tensor of importance scores (of same shape as module parameter) used to compute mask for pruning. 
The values in this tensor indicate the importance of the corresponding elements in the parameter being pruned. If unspecified or None, the parameter will be used in its place. * **kwargs** – keyword arguments passed on to a subclass of a [`BasePruningMethod`](torch.nn.utils.prune.basepruningmethod#torch.nn.utils.prune.BasePruningMethod "torch.nn.utils.prune.BasePruningMethod") `apply_mask(module)` Simply handles the multiplication between the parameter being pruned and the generated mask. Fetches the mask and the original tensor from the module and returns the pruned version of the tensor. Parameters **module** ([nn.Module](torch.nn.module#torch.nn.Module "torch.nn.Module")) – module containing the tensor to prune Returns pruned version of the input tensor Return type pruned_tensor ([torch.Tensor](../tensors#torch.Tensor "torch.Tensor")) `compute_mask(t, default_mask)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/utils/prune.html#PruningContainer.compute_mask) Applies the latest `method` by computing the new partial masks and returning its combination with the `default_mask`. The new partial mask should be computed on the entries or channels that were not zeroed out by the `default_mask`. Which portions of the tensor `t` the new mask will be calculated from depends on the `PRUNING_TYPE` (handled by the type handler): * for ‘unstructured’, the mask will be computed from the raveled list of nonmasked entries; * for ‘structured’, the mask will be computed from the nonmasked channels in the tensor; * for ‘global’, the mask will be computed across all entries. Parameters * **t** ([torch.Tensor](../tensors#torch.Tensor "torch.Tensor")) – tensor representing the parameter to prune (of same dimensions as `default_mask`). * **default_mask** ([torch.Tensor](../tensors#torch.Tensor "torch.Tensor")) – mask from previous pruning iteration. Returns new mask that combines the effects of the `default_mask` and the new mask from the current pruning `method` (of same dimensions as `default_mask` and `t`). Return type mask ([torch.Tensor](../tensors#torch.Tensor "torch.Tensor")) `prune(t, default_mask=None, importance_scores=None)` Computes and returns a pruned version of input tensor `t` according to the pruning rule specified in `compute_mask()`. Parameters * **t** ([torch.Tensor](../tensors#torch.Tensor "torch.Tensor")) – tensor to prune (of same dimensions as `default_mask`). * **importance_scores** ([torch.Tensor](../tensors#torch.Tensor "torch.Tensor")) – tensor of importance scores (of same shape as `t`) used to compute mask for pruning `t`. The values in this tensor indicate the importance of the corresponding elements in the `t` that is being pruned. If unspecified or None, the tensor `t` will be used in its place. * **default_mask** ([torch.Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – mask from previous pruning iteration, if any. To be considered when determining what portion of the tensor that pruning should act on. If None, default to a mask of ones. Returns pruned version of tensor `t`. `remove(module)` Removes the pruning reparameterization from a module. The pruned parameter named `name` remains permanently pruned, and the parameter named `name+'_orig'` is removed from the parameter list. Similarly, the buffer named `name+'_mask'` is removed from the buffers. Note Pruning itself is NOT undone or reversed! 
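To make the iterative behavior concrete, here is a minimal sketch of how a `PruningContainer` arises when the same parameter is pruned twice (the `nn.Linear(8, 4)` module and the 50%/25% amounts are arbitrary; `_forward_pre_hooks` is an internal attribute, inspected here purely for illustration):

    import torch
    import torch.nn as nn
    import torch.nn.utils.prune as prune

    m = nn.Linear(8, 4)

    # Round one: zero out 50% of the 32 weight entries with the lowest L1 norm.
    prune.l1_unstructured(m, name='weight', amount=0.5)

    # Round two on the same parameter: the pruning hook is replaced by a
    # PruningContainer that combines the old mask with a new partial mask,
    # computed only over the entries that survived round one.
    prune.l1_unstructured(m, name='weight', amount=0.25)

    hook = next(iter(m._forward_pre_hooks.values()))
    print(isinstance(hook, prune.PruningContainer))  # True

    # 16 entries are pruned in round one, then 4 of the remaining 16,
    # so 20/32 = 62.5% of the weight is now zero.
    print(float((m.weight == 0).sum()) / m.weight.nelement())  # 0.625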
# torch.nn.utils.prune.random_structured `torch.nn.utils.prune.random_structured(module, name, amount, dim)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/utils/prune.html#random_structured) Prunes tensor corresponding to parameter called `name` in `module` by removing the specified `amount` of (currently unpruned) channels along the specified `dim` selected at random. Modifies module in place (and also return the modified module) by: 1) adding a named buffer called `name+'_mask'` corresponding to the binary mask applied to the parameter `name` by the pruning method. 2) replacing the parameter `name` by its pruned version, while the original (unpruned) parameter is stored in a new parameter named `name+'_orig'`. Parameters * **module** ([nn.Module](torch.nn.module#torch.nn.Module "torch.nn.Module")) – module containing the tensor to prune * **name** ([str](https://docs.python.org/3/library/stdtypes.html#str "\(in Python v3.9\)")) – parameter name within `module` on which pruning will act. * **amount** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _or_[float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)")) – quantity of parameters to prune. If `float`, should be between 0.0 and 1.0 and represent the fraction of parameters to prune. If `int`, it represents the absolute number of parameters to prune. * **dim** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – index of the dim along which we define channels to prune. Returns modified (i.e. pruned) version of the input module Return type module ([nn.Module](torch.nn.module#torch.nn.Module "torch.nn.Module")) #### Examples >>> m = prune.random_structured( nn.Linear(5, 3), 'weight', amount=3, dim=1 ) >>> columns_pruned = int(sum(torch.sum(m.weight, dim=0) == 0)) >>> print(columns_pruned) 3 # torch.nn.utils.prune.random_unstructured `torch.nn.utils.prune.random_unstructured(module, name, amount)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/utils/prune.html#random_unstructured) Prunes tensor corresponding to parameter called `name` in `module` by removing the specified `amount` of (currently unpruned) units selected at random. Modifies module in place (and also return the modified module) by: 1) adding a named buffer called `name+'_mask'` corresponding to the binary mask applied to the parameter `name` by the pruning method. 2) replacing the parameter `name` by its pruned version, while the original (unpruned) parameter is stored in a new parameter named `name+'_orig'`. Parameters * **module** ([nn.Module](torch.nn.module#torch.nn.Module "torch.nn.Module")) – module containing the tensor to prune * **name** ([str](https://docs.python.org/3/library/stdtypes.html#str "\(in Python v3.9\)")) – parameter name within `module` on which pruning will act. * **amount** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _or_[float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)")) – quantity of parameters to prune. If `float`, should be between 0.0 and 1.0 and represent the fraction of parameters to prune. If `int`, it represents the absolute number of parameters to prune. Returns modified (i.e. 
pruned) version of the input module Return type module ([nn.Module](torch.nn.module#torch.nn.Module "torch.nn.Module")) #### Examples >>> m = prune.random_unstructured(nn.Linear(2, 3), 'weight', amount=1) >>> torch.sum(m.weight_mask == 0) tensor(1) # RandomStructured `class torch.nn.utils.prune.RandomStructured(amount, dim=-1)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/utils/prune.html#RandomStructured) Prune entire (currently unpruned) channels in a tensor at random. Parameters * **amount** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _or_[float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)")) – quantity of parameters to prune. If `float`, should be between 0.0 and 1.0 and represent the fraction of parameters to prune. If `int`, it represents the absolute number of parameters to prune. * **dim** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – index of the dim along which we define channels to prune. Default: -1. `classmethod apply(module, name, amount, dim=-1)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/utils/prune.html#RandomStructured.apply) Adds the forward pre-hook that enables pruning on the fly and the reparametrization of a tensor in terms of the original tensor and the pruning mask. Parameters * **module** ([nn.Module](torch.nn.module#torch.nn.Module "torch.nn.Module")) – module containing the tensor to prune * **name** ([str](https://docs.python.org/3/library/stdtypes.html#str "\(in Python v3.9\)")) – parameter name within `module` on which pruning will act. * **amount** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _or_[float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)")) – quantity of parameters to prune. If `float`, should be between 0.0 and 1.0 and represent the fraction of parameters to prune. If `int`, it represents the absolute number of parameters to prune. * **dim** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – index of the dim along which we define channels to prune. Default: -1. `apply_mask(module)` Simply handles the multiplication between the parameter being pruned and the generated mask. Fetches the mask and the original tensor from the module and returns the pruned version of the tensor. Parameters **module** ([nn.Module](torch.nn.module#torch.nn.Module "torch.nn.Module")) – module containing the tensor to prune Returns pruned version of the input tensor Return type pruned_tensor ([torch.Tensor](../tensors#torch.Tensor "torch.Tensor")) `compute_mask(t, default_mask)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/utils/prune.html#RandomStructured.compute_mask) Computes and returns a mask for the input tensor `t`. Starting from a base `default_mask` (which should be a mask of ones if the tensor has not been pruned yet), generate a random mask to apply on top of the `default_mask` by randomly zeroing out channels along the specified dim of the tensor. Parameters * **t** ([torch.Tensor](../tensors#torch.Tensor "torch.Tensor")) – tensor representing the parameter to prune * **default_mask** ([torch.Tensor](../tensors#torch.Tensor "torch.Tensor")) – Base mask from previous pruning iterations, that need to be respected after the new mask is applied. Same dims as `t`. 
Returns mask to apply to `t`, of same dims as `t` Return type mask ([torch.Tensor](../tensors#torch.Tensor "torch.Tensor")) Raises [**IndexError**](https://docs.python.org/3/library/exceptions.html#IndexError "\(in Python v3.9\)") – if `self.dim >= len(t.shape)` `prune(t, default_mask=None, importance_scores=None)` Computes and returns a pruned version of input tensor `t` according to the pruning rule specified in `compute_mask()`. Parameters * **t** ([torch.Tensor](../tensors#torch.Tensor "torch.Tensor")) – tensor to prune (of same dimensions as `default_mask`). * **importance_scores** ([torch.Tensor](../tensors#torch.Tensor "torch.Tensor")) – tensor of importance scores (of same shape as `t`) used to compute mask for pruning `t`. The values in this tensor indicate the importance of the corresponding elements in the `t` that is being pruned. If unspecified or None, the tensor `t` will be used in its place. * **default_mask** ([torch.Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – mask from previous pruning iteration, if any. To be considered when determining what portion of the tensor that pruning should act on. If None, default to a mask of ones. Returns pruned version of tensor `t`. `remove(module)` Removes the pruning reparameterization from a module. The pruned parameter named `name` remains permanently pruned, and the parameter named `name+'_orig'` is removed from the parameter list. Similarly, the buffer named `name+'_mask'` is removed from the buffers. Note Pruning itself is NOT undone or reversed! # RandomUnstructured `class torch.nn.utils.prune.RandomUnstructured(amount)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/utils/prune.html#RandomUnstructured) Prune (currently unpruned) units in a tensor at random. Parameters * **name** ([str](https://docs.python.org/3/library/stdtypes.html#str "\(in Python v3.9\)")) – parameter name within `module` on which pruning will act. * **amount** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _or_[float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)")) – quantity of parameters to prune. If `float`, should be between 0.0 and 1.0 and represent the fraction of parameters to prune. If `int`, it represents the absolute number of parameters to prune. `classmethod apply(module, name, amount)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/utils/prune.html#RandomUnstructured.apply) Adds the forward pre-hook that enables pruning on the fly and the reparametrization of a tensor in terms of the original tensor and the pruning mask. Parameters * **module** ([nn.Module](torch.nn.module#torch.nn.Module "torch.nn.Module")) – module containing the tensor to prune * **name** ([str](https://docs.python.org/3/library/stdtypes.html#str "\(in Python v3.9\)")) – parameter name within `module` on which pruning will act. * **amount** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _or_[float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)")) – quantity of parameters to prune. If `float`, should be between 0.0 and 1.0 and represent the fraction of parameters to prune. If `int`, it represents the absolute number of parameters to prune. `apply_mask(module)` Simply handles the multiplication between the parameter being pruned and the generated mask. Fetches the mask and the original tensor from the module and returns the pruned version of the tensor. 
Parameters **module** ([nn.Module](torch.nn.module#torch.nn.Module "torch.nn.Module")) – module containing the tensor to prune Returns pruned version of the input tensor Return type pruned_tensor ([torch.Tensor](../tensors#torch.Tensor "torch.Tensor")) `prune(t, default_mask=None, importance_scores=None)` Computes and returns a pruned version of input tensor `t` according to the pruning rule specified in `compute_mask()`. Parameters * **t** ([torch.Tensor](../tensors#torch.Tensor "torch.Tensor")) – tensor to prune (of same dimensions as `default_mask`). * **importance_scores** ([torch.Tensor](../tensors#torch.Tensor "torch.Tensor")) – tensor of importance scores (of same shape as `t`) used to compute mask for pruning `t`. The values in this tensor indicate the importance of the corresponding elements in the `t` that is being pruned. If unspecified or None, the tensor `t` will be used in its place. * **default_mask** ([torch.Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – mask from previous pruning iteration, if any. To be considered when determining what portion of the tensor that pruning should act on. If None, default to a mask of ones. Returns pruned version of tensor `t`. `remove(module)` Removes the pruning reparameterization from a module. The pruned parameter named `name` remains permanently pruned, and the parameter named `name+'_orig'` is removed from the parameter list. Similarly, the buffer named `name+'_mask'` is removed from the buffers. Note Pruning itself is NOT undone or reversed! # torch.nn.utils.prune.remove `torch.nn.utils.prune.remove(module, name)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/utils/prune.html#remove) Removes the pruning reparameterization from a module and the pruning method from the forward hook. The pruned parameter named `name` remains permanently pruned, and the parameter named `name+'_orig'` is removed from the parameter list. Similarly, the buffer named `name+'_mask'` is removed from the buffers. Note Pruning itself is NOT undone or reversed! Parameters * **module** ([nn.Module](torch.nn.module#torch.nn.Module "torch.nn.Module")) – module containing the tensor to prune * **name** ([str](https://docs.python.org/3/library/stdtypes.html#str "\(in Python v3.9\)")) – parameter name within `module` on which pruning will act. #### Examples >>> m = random_unstructured(nn.Linear(5, 7), name='weight', amount=0.2) >>> m = remove(m, name='weight') # torch.nn.utils.remove_spectral_norm `torch.nn.utils.remove_spectral_norm(module, name='weight')` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/utils/spectral_norm.html#remove_spectral_norm) Removes the spectral normalization reparameterization from a module. Parameters * **module** ([Module](torch.nn.module#torch.nn.Module "torch.nn.Module")) – containing module * **name** ([str](https://docs.python.org/3/library/stdtypes.html#str "\(in Python v3.9\)") _,__optional_) – name of weight parameter #### Example >>> m = spectral_norm(nn.Linear(40, 10)) >>> remove_spectral_norm(m) # torch.nn.utils.remove_weight_norm `torch.nn.utils.remove_weight_norm(module, name='weight')` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/utils/weight_norm.html#remove_weight_norm) Removes the weight normalization reparameterization from a module. 
Parameters * **module** ([Module](torch.nn.module#torch.nn.Module "torch.nn.Module")) – containing module * **name** ([str](https://docs.python.org/3/library/stdtypes.html#str "\(in Python v3.9\)") _,__optional_) – name of weight parameter #### Example >>> m = weight_norm(nn.Linear(20, 40)) >>> remove_weight_norm(m) # torch.nn.utils.rnn.pack_padded_sequence `torch.nn.utils.rnn.pack_padded_sequence(input, lengths, batch_first=False, enforce_sorted=True)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/utils/rnn.html#pack_padded_sequence) Packs a Tensor containing padded sequences of variable length. `input` can be of size `T x B x *` where `T` is the length of the longest sequence (equal to `lengths[0]`), `B` is the batch size, and `*` is any number of dimensions (including 0). If `batch_first` is `True`, `B x T x *` `input` is expected. For unsorted sequences, use `enforce_sorted = False`. If `enforce_sorted` is `True`, the sequences should be sorted by length in a decreasing order, i.e. `input[:,0]` should be the longest sequence, and `input[:,B-1]` the shortest one. `enforce_sorted = True` is only necessary for ONNX export. Note This function accepts any input that has at least two dimensions. You can apply it to pack the labels, and use the output of the RNN with them to compute the loss directly. A Tensor can be retrieved from a [`PackedSequence`](torch.nn.utils.rnn.packedsequence#torch.nn.utils.rnn.PackedSequence "torch.nn.utils.rnn.PackedSequence") object by accessing its `.data` attribute. Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – padded batch of variable length sequences. * **lengths** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _or_[list](https://docs.python.org/3/library/stdtypes.html#list "\(in Python v3.9\)") _(_[int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _)_) – list of sequence lengths of each batch element (must be on the CPU if provided as a tensor). * **batch_first** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – if `True`, the input is expected in `B x T x *` format. * **enforce_sorted** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – if `True`, the input is expected to contain sequences sorted by length in a decreasing order. If `False`, the input will get sorted unconditionally. Default: `True`. Returns a [`PackedSequence`](torch.nn.utils.rnn.packedsequence#torch.nn.utils.rnn.PackedSequence "torch.nn.utils.rnn.PackedSequence") object # torch.nn.utils.rnn.pack_sequence `torch.nn.utils.rnn.pack_sequence(sequences, enforce_sorted=True)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/utils/rnn.html#pack_sequence) Packs a list of variable length Tensors `sequences` should be a list of Tensors of size `L x *`, where `L` is the length of a sequence and `*` is any number of trailing dimensions, including zero. For unsorted sequences, use `enforce_sorted = False`. If `enforce_sorted` is `True`, the sequences should be sorted in the order of decreasing length. `enforce_sorted = True` is only necessary for ONNX export. 
#### Example

>>> from torch.nn.utils.rnn import pack_sequence
>>> a = torch.tensor([1,2,3])
>>> b = torch.tensor([4,5])
>>> c = torch.tensor([6])
>>> pack_sequence([a, b, c])
PackedSequence(data=tensor([ 1, 4, 6, 2, 5, 3]), batch_sizes=tensor([ 3, 2, 1]))

Parameters

* **sequences** ([list](https://docs.python.org/3/library/stdtypes.html#list "\(in Python v3.9\)") _[_[Tensor](../tensors#torch.Tensor "torch.Tensor") _]_) – A list of sequences of decreasing length.
* **enforce_sorted** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – if `True`, checks that the input contains sequences sorted by length in a decreasing order. If `False`, this condition is not checked. Default: `True`.

Returns

a [`PackedSequence`](torch.nn.utils.rnn.packedsequence#torch.nn.utils.rnn.PackedSequence "torch.nn.utils.rnn.PackedSequence") object

# PackedSequence

`class torch.nn.utils.rnn.PackedSequence` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/utils/rnn.html#PackedSequence)

Holds the data and list of `batch_sizes` of a packed sequence. All RNN modules accept packed sequences as inputs.

Note

Instances of this class should never be created manually. They are meant to be instantiated by functions like [`pack_padded_sequence()`](torch.nn.utils.rnn.pack_padded_sequence#torch.nn.utils.rnn.pack_padded_sequence "torch.nn.utils.rnn.pack_padded_sequence").

Batch sizes represent the number of elements at each sequence step in the batch, not the varying sequence lengths passed to [`pack_padded_sequence()`](torch.nn.utils.rnn.pack_padded_sequence#torch.nn.utils.rnn.pack_padded_sequence "torch.nn.utils.rnn.pack_padded_sequence"). For instance, given data `abc` and `x` the `PackedSequence` would contain data `axbc` with `batch_sizes=[2,1,1]`.

Variables

* **data** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – Tensor containing the packed sequence
* **batch_sizes** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – Tensor of integers holding information about the batch size at each sequence step
* **sorted_indices** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – Tensor of integers describing how this `PackedSequence` is constructed from the original sequences.
* **unsorted_indices** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – Tensor of integers describing how to recover the original sequences in their correct order.

Note

`data` can be on an arbitrary device and of an arbitrary dtype. `sorted_indices` and `unsorted_indices` must be `torch.int64` tensors on the same device as `data`. However, `batch_sizes` should always be a CPU `torch.int64` tensor. This invariant is maintained throughout the `PackedSequence` class, and all functions that construct a `PackedSequence` in PyTorch only pass in tensors conforming to this constraint.

`property batch_sizes`

Alias for field number 1

`count()`

Return number of occurrences of value.

`property data`

Alias for field number 0

`index()`

Return first index of value. Raises ValueError if the value is not present.
`property is_cuda`

Returns true if `self.data` is stored on a GPU

`is_pinned()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/utils/rnn.html#PackedSequence.is_pinned)

Returns true if `self.data` is stored in pinned memory

`property sorted_indices`

Alias for field number 2

`to(*args, **kwargs)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/utils/rnn.html#PackedSequence.to)

Performs dtype and/or device conversion on `self.data`. It has a similar signature to [`torch.Tensor.to()`](../tensors#torch.Tensor.to "torch.Tensor.to"), except that optional arguments like `non_blocking` and `copy` should be passed as kwargs, not args, or they will not apply to the index tensors.

Note

If the `self.data` Tensor already has the correct `torch.dtype` and `torch.device`, then `self` is returned. Otherwise, returns a copy with the desired configuration.

`property unsorted_indices`

Alias for field number 3

# torch.nn.utils.rnn.pad_packed_sequence

`torch.nn.utils.rnn.pad_packed_sequence(sequence, batch_first=False, padding_value=0.0, total_length=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/utils/rnn.html#pad_packed_sequence)

Pads a packed batch of variable length sequences.

It is an inverse operation to [`pack_padded_sequence()`](torch.nn.utils.rnn.pack_padded_sequence#torch.nn.utils.rnn.pack_padded_sequence "torch.nn.utils.rnn.pack_padded_sequence").

The returned Tensor’s data will be of size `T x B x *`, where `T` is the length of the longest sequence and `B` is the batch size. If `batch_first` is True, the data will be transposed into `B x T x *` format.

#### Example

>>> from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence
>>> seq = torch.tensor([[1,2,0], [3,0,0], [4,5,6]])
>>> lens = [2, 1, 3]
>>> packed = pack_padded_sequence(seq, lens, batch_first=True, enforce_sorted=False)
>>> packed
PackedSequence(data=tensor([4, 1, 3, 5, 2, 6]), batch_sizes=tensor([3, 2, 1]),
               sorted_indices=tensor([2, 0, 1]), unsorted_indices=tensor([1, 2, 0]))
>>> seq_unpacked, lens_unpacked = pad_packed_sequence(packed, batch_first=True)
>>> seq_unpacked
tensor([[1, 2, 0],
        [3, 0, 0],
        [4, 5, 6]])
>>> lens_unpacked
tensor([2, 1, 3])

Note

`total_length` is useful to implement the `pack sequence -> recurrent network -> unpack sequence` pattern in a [`Module`](torch.nn.module#torch.nn.Module "torch.nn.Module") wrapped in [`DataParallel`](torch.nn.dataparallel#torch.nn.DataParallel "torch.nn.DataParallel"). See [this FAQ section](https://pytorch.org/docs/1.8.0/notes/faq.html#pack-rnn-unpack-with-data-parallelism) for details.

Parameters

* **sequence** ([PackedSequence](torch.nn.utils.rnn.packedsequence#torch.nn.utils.rnn.PackedSequence "torch.nn.utils.rnn.PackedSequence")) – batch to pad
* **batch_first** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – if `True`, the output will be in `B x T x *` format.
* **padding_value** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _,__optional_) – values for padded elements.
* **total_length** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – if not `None`, the output will be padded to have length `total_length`. This method will throw [`ValueError`](https://docs.python.org/3/library/exceptions.html#ValueError "\(in Python v3.9\)") if `total_length` is less than the max sequence length in `sequence`.
Returns

Tuple of Tensor containing the padded sequence, and a Tensor containing the list of lengths of each sequence in the batch. Batch elements will be re-ordered as they were ordered originally when the batch was passed to `pack_padded_sequence` or `pack_sequence`.

# torch.nn.utils.rnn.pad_sequence

`torch.nn.utils.rnn.pad_sequence(sequences, batch_first=False, padding_value=0.0)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/utils/rnn.html#pad_sequence)

Pad a list of variable length Tensors with `padding_value`.

`pad_sequence` stacks a list of Tensors along a new dimension, and pads them to equal length. For example, if the input is a list of sequences with size `L x *`, the output is of size `T x B x *` if `batch_first` is `False`, and `B x T x *` otherwise. `B` is the batch size, equal to the number of elements in `sequences`. `T` is the length of the longest sequence. `L` is the length of each sequence. `*` is any number of trailing dimensions, including none.

#### Example

>>> from torch.nn.utils.rnn import pad_sequence
>>> a = torch.ones(25, 300)
>>> b = torch.ones(22, 300)
>>> c = torch.ones(15, 300)
>>> pad_sequence([a, b, c]).size()
torch.Size([25, 3, 300])

Note

This function returns a Tensor of size `T x B x *` or `B x T x *` where `T` is the length of the longest sequence. This function assumes that the trailing dimensions and type of all the Tensors in `sequences` are the same.

Parameters

* **sequences** ([list](https://docs.python.org/3/library/stdtypes.html#list "\(in Python v3.9\)") _[_[Tensor](../tensors#torch.Tensor "torch.Tensor") _]_) – list of variable length sequences.
* **batch_first** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – output will be in `B x T x *` if True, or in `T x B x *` otherwise
* **padding_value** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _,__optional_) – value for padded elements. Default: 0.

Returns

Tensor of size `T x B x *` if `batch_first` is `False`. Tensor of size `B x T x *` otherwise

# torch.nn.utils.spectral_norm

`torch.nn.utils.spectral_norm(module, name='weight', n_power_iterations=1, eps=1e-12, dim=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/utils/spectral_norm.html#spectral_norm)

Applies spectral normalization to a parameter in the given module.

\mathbf{W}_{SN} = \dfrac{\mathbf{W}}{\sigma(\mathbf{W})}, \sigma(\mathbf{W}) = \max_{\mathbf{h}: \mathbf{h} \ne 0} \dfrac{\|\mathbf{W} \mathbf{h}\|_2}{\|\mathbf{h}\|_2}

Spectral normalization stabilizes the training of discriminators (critics) in Generative Adversarial Networks (GANs) by rescaling the weight tensor with the spectral norm σ of the weight matrix, calculated using the power iteration method. If the dimension of the weight tensor is greater than 2, it is reshaped to 2D in the power iteration method to get the spectral norm. This is implemented via a hook that calculates the spectral norm and rescales the weight before every `forward()` call.

See [Spectral Normalization for Generative Adversarial Networks](https://arxiv.org/abs/1802.05957).
Parameters

* **module** ([nn.Module](torch.nn.module#torch.nn.Module "torch.nn.Module")) – containing module
* **name** ([str](https://docs.python.org/3/library/stdtypes.html#str "\(in Python v3.9\)") _,__optional_) – name of weight parameter
* **n_power_iterations** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – number of power iterations to calculate spectral norm
* **eps** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _,__optional_) – epsilon for numerical stability in calculating norms
* **dim** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – dimension corresponding to number of outputs, the default is `0`, except for modules that are instances of ConvTranspose{1,2,3}d, when it is `1`

Returns

The original module with the spectral norm hook

Example:

>>> m = spectral_norm(nn.Linear(20, 40))
>>> m
Linear(in_features=20, out_features=40, bias=True)
>>> m.weight_u.size()
torch.Size([40])

# torch.nn.utils.vector_to_parameters

`torch.nn.utils.vector_to_parameters(vec, parameters)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/utils/convert_parameters.html#vector_to_parameters)

Convert one vector to the parameters.

Parameters

* **vec** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – a single vector representing the parameters of a model.
* **parameters** (_Iterable_ _[_[Tensor](../tensors#torch.Tensor "torch.Tensor") _]_) – an iterator of Tensors that are the parameters of a model.

# torch.nn.utils.weight_norm

`torch.nn.utils.weight_norm(module, name='weight', dim=0)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/utils/weight_norm.html#weight_norm)

Applies weight normalization to a parameter in the given module.

\mathbf{w} = g \dfrac{\mathbf{v}}{\|\mathbf{v}\|}

Weight normalization is a reparameterization that decouples the magnitude of a weight tensor from its direction. This replaces the parameter specified by `name` (e.g. `'weight'`) with two parameters: one specifying the magnitude (e.g. `'weight_g'`) and one specifying the direction (e.g. `'weight_v'`). Weight normalization is implemented via a hook that recomputes the weight tensor from the magnitude and direction before every `forward()` call.

By default, with `dim=0`, the norm is computed independently per output channel/plane. To compute a norm over the entire weight tensor, use `dim=None`.

See [https://arxiv.org/abs/1602.07868](https://arxiv.org/abs/1602.07868)

Parameters

* **module** ([Module](torch.nn.module#torch.nn.Module "torch.nn.Module")) – containing module
* **name** ([str](https://docs.python.org/3/library/stdtypes.html#str "\(in Python v3.9\)") _,__optional_) – name of weight parameter
* **dim** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – dimension over which to compute the norm

Returns

The original module with the weight norm hook

Example:

>>> m = weight_norm(nn.Linear(20, 40), name='weight')
>>> m
Linear(in_features=20, out_features=40, bias=True)
>>> m.weight_g.size()
torch.Size([40, 1])
>>> m.weight_v.size()
torch.Size([40, 20])

# ZeroPad2d

`class torch.nn.ZeroPad2d(padding)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/padding.html#ZeroPad2d)

Pads the input tensor boundaries with zero.

For `N`-dimensional padding, use [`torch.nn.functional.pad()`](../nn.functional#torch.nn.functional.pad "torch.nn.functional.pad").
Parameters

**padding** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,_[tuple](https://docs.python.org/3/library/stdtypes.html#tuple "\(in Python v3.9\)")) – the size of the padding. If it is an `int`, uses the same padding in all boundaries. If a 4-`tuple`, uses (padding_left, padding_right, padding_top, padding_bottom)

Shape:

* Input: (N, C, H_in, W_in)
* Output: (N, C, H_out, W_out) where H_out = H_in + padding_top + padding_bottom and W_out = W_in + padding_left + padding_right

Examples:

>>> m = nn.ZeroPad2d(2)
>>> input = torch.randn(1, 1, 3, 3)
>>> input
tensor([[[[-0.1678, -0.4418,  1.9466],
          [ 0.9604, -0.4219, -0.5241],
          [-0.9162, -0.5436, -0.6446]]]])
>>> m(input)
tensor([[[[ 0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000],
          [ 0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000],
          [ 0.0000,  0.0000, -0.1678, -0.4418,  1.9466,  0.0000,  0.0000],
          [ 0.0000,  0.0000,  0.9604, -0.4219, -0.5241,  0.0000,  0.0000],
          [ 0.0000,  0.0000, -0.9162, -0.5436, -0.6446,  0.0000,  0.0000],
          [ 0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000],
          [ 0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000]]]])
>>> # using different paddings for different sides
>>> m = nn.ZeroPad2d((1, 1, 2, 0))
>>> m(input)
tensor([[[[ 0.0000,  0.0000,  0.0000,  0.0000,  0.0000],
          [ 0.0000,  0.0000,  0.0000,  0.0000,  0.0000],
          [ 0.0000, -0.1678, -0.4418,  1.9466,  0.0000],
          [ 0.0000,  0.9604, -0.4219, -0.5241,  0.0000],
          [ 0.0000, -0.9162, -0.5436, -0.6446,  0.0000]]]])

# no_grad

`class torch.no_grad` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/autograd/grad_mode.html#no_grad)

Context-manager that disables gradient calculation.

Disabling gradient calculation is useful for inference, when you are sure that you will not call [`Tensor.backward()`](../autograd#torch.Tensor.backward "torch.Tensor.backward"). It will reduce memory consumption for computations that would otherwise have `requires_grad=True`.

In this mode, the result of every computation will have `requires_grad=False`, even when the inputs have `requires_grad=True`.

This context manager is thread local; it will not affect computation in other threads.

Also functions as a decorator. (Make sure to instantiate with parentheses.)

Example:

>>> x = torch.tensor([1], requires_grad=True)
>>> with torch.no_grad():
...     y = x * 2
>>> y.requires_grad
False
>>> @torch.no_grad()
... def doubler(x):
...     return x * 2
>>> z = doubler(x)
>>> z.requires_grad
False

# torch.nonzero

`torch.nonzero(input, *, out=None, as_tuple=False) → LongTensor or tuple of LongTensors`

Note

`torch.nonzero(..., as_tuple=False)` (default) returns a 2-D tensor where each row is the index for a nonzero value.

`torch.nonzero(..., as_tuple=True)` returns a tuple of 1-D index tensors, allowing for advanced indexing, so `x[x.nonzero(as_tuple=True)]` gives all nonzero values of tensor `x`. Of the returned tuple, each index tensor contains nonzero indices for a certain dimension.

See below for more details on the two behaviors.

When `input` is on CUDA, `torch.nonzero()` causes host-device synchronization.

**When `as_tuple` is `False` (default):**

Returns a tensor containing the indices of all non-zero elements of `input`. Each row in the result contains the indices of a non-zero element in `input`.
The result is sorted lexicographically, with the last index changing the fastest (C-style).

If `input` has n dimensions, then the resulting indices tensor `out` is of size (z × n), where z is the total number of non-zero elements in the `input` tensor.

**When `as_tuple` is `True`:**

Returns a tuple of 1-D tensors, one for each dimension in `input`, each containing the indices (in that dimension) of all non-zero elements of `input`.

If `input` has n dimensions, then the resulting tuple contains n tensors of size z, where z is the total number of non-zero elements in the `input` tensor.

As a special case, when `input` has zero dimensions and a nonzero scalar value, it is treated as a one-dimensional tensor with one element.

Parameters

**input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor.

Keyword Arguments

**out** (_LongTensor_ _,__optional_) – the output tensor containing indices

Returns

If `as_tuple` is `False`, the output tensor containing indices. If `as_tuple` is `True`, one 1-D tensor for each dimension, containing the indices of each nonzero element along that dimension.

Return type

LongTensor or tuple of LongTensor

Example:

>>> torch.nonzero(torch.tensor([1, 1, 1, 0, 1]))
tensor([[ 0],
        [ 1],
        [ 2],
        [ 4]])
>>> torch.nonzero(torch.tensor([[0.6, 0.0, 0.0, 0.0],
...                             [0.0, 0.4, 0.0, 0.0],
...                             [0.0, 0.0, 1.2, 0.0],
...                             [0.0, 0.0, 0.0,-0.4]]))
tensor([[ 0,  0],
        [ 1,  1],
        [ 2,  2],
        [ 3,  3]])
>>> torch.nonzero(torch.tensor([1, 1, 1, 0, 1]), as_tuple=True)
(tensor([0, 1, 2, 4]),)
>>> torch.nonzero(torch.tensor([[0.6, 0.0, 0.0, 0.0],
...                             [0.0, 0.4, 0.0, 0.0],
...                             [0.0, 0.0, 1.2, 0.0],
...                             [0.0, 0.0, 0.0,-0.4]]), as_tuple=True)
(tensor([0, 1, 2, 3]), tensor([0, 1, 2, 3]))
>>> torch.nonzero(torch.tensor(5), as_tuple=True)
(tensor([0]),)

# torch.norm

`torch.norm(input, p='fro', dim=None, keepdim=False, out=None, dtype=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/functional.html#norm)

Returns the matrix norm or vector norm of a given tensor.

Warning

torch.norm is deprecated and may be removed in a future PyTorch release. Use [`torch.linalg.norm()`](../linalg#torch.linalg.norm "torch.linalg.norm") instead, but note that [`torch.linalg.norm()`](../linalg#torch.linalg.norm "torch.linalg.norm") has a different signature and slightly different behavior that is more consistent with NumPy’s numpy.linalg.norm.

Parameters

* **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – The input tensor. Its data type must be either a floating point or complex type. For complex inputs, the norm is calculated using the absolute value of each element. If the input is complex and neither `dtype` nor `out` is specified, the result’s data type will be the corresponding floating point type (e.g. float if `input` is complexfloat).
* **p** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,_[float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _,__inf_ _,__-inf_ _,__'fro'__,__'nuc'__,__optional_) – the order of norm. Default: `'fro'` The following norms can be calculated:

ord | matrix norm | vector norm
---|---|---
'fro' | Frobenius norm | –
'nuc' | nuclear norm | –
Number | – | sum(abs(x)**ord)**(1./ord)

The vector norm can be calculated across any number of dimensions. The corresponding dimensions of `input` are flattened into one dimension, and the norm is calculated on the flattened dimension.
Frobenius norm produces the same result as `p=2` in all cases except when `dim` is a list of three or more dims, in which case Frobenius norm throws an error. Nuclear norm can only be calculated across exactly two dimensions.

* **dim** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__tuple of ints_ _,__list of ints_ _,__optional_) – Specifies which dimension or dimensions of `input` to calculate the norm across. If `dim` is `None`, the norm will be calculated across all dimensions of `input`. If the norm type indicated by `p` does not support the specified number of dimensions, an error will occur.
* **keepdim** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – whether the output tensors have `dim` retained or not. Ignored if `dim` = `None` and `out` = `None`. Default: `False`
* **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. Ignored if `dim` = `None` and `out` = `None`.
* **dtype** ([`torch.dtype`](../tensor_attributes#torch.torch.dtype "torch.torch.dtype"), optional) – the desired data type of returned tensor. If specified, the input tensor is cast to `dtype` while performing the operation. Default: None.

Note

Even though `p='fro'` supports any number of dimensions, the true mathematical definition of Frobenius norm only applies to tensors with exactly two dimensions. [`torch.linalg.norm()`](../linalg#torch.linalg.norm "torch.linalg.norm") with `ord='fro'` aligns with the mathematical definition, since it can only be applied across exactly two dimensions.

Example:

>>> import torch
>>> a = torch.arange(9, dtype= torch.float) - 4
>>> b = a.reshape((3, 3))
>>> torch.norm(a)
tensor(7.7460)
>>> torch.norm(b)
tensor(7.7460)
>>> torch.norm(a, float('inf'))
tensor(4.)
>>> torch.norm(b, float('inf'))
tensor(4.)
>>> c = torch.tensor([[ 1, 2, 3],[-1, 1, 4]] , dtype= torch.float)
>>> torch.norm(c, dim=0)
tensor([1.4142, 2.2361, 5.0000])
>>> torch.norm(c, dim=1)
tensor([3.7417, 4.2426])
>>> torch.norm(c, p=1, dim=1)
tensor([6., 6.])
>>> d = torch.arange(8, dtype= torch.float).reshape(2,2,2)
>>> torch.norm(d, dim=(1,2))
tensor([ 3.7417, 11.2250])
>>> torch.norm(d[0, :, :]), torch.norm(d[1, :, :])
(tensor(3.7417), tensor(11.2250))

# torch.normal

`torch.normal(mean, std, *, generator=None, out=None) → Tensor`

Returns a tensor of random numbers drawn from separate normal distributions whose mean and standard deviation are given.

The [`mean`](torch.mean#torch.mean "torch.mean") is a tensor with the mean of each output element’s normal distribution.

The [`std`](torch.std#torch.std "torch.std") is a tensor with the standard deviation of each output element’s normal distribution.

The shapes of [`mean`](torch.mean#torch.mean "torch.mean") and [`std`](torch.std#torch.std "torch.std") don’t need to match, but the total number of elements in each tensor needs to be the same.
Note When the shapes do not match, the shape of [`mean`](torch.mean#torch.mean "torch.mean") is used as the shape for the returned output tensor Parameters * **mean** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the tensor of per-element means * **std** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the tensor of per-element standard deviations Keyword Arguments * **generator** ([`torch.Generator`](torch.generator#torch.Generator "torch.Generator"), optional) – a pseudorandom number generator for sampling * **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. Example: >>> torch.normal(mean=torch.arange(1., 11.), std=torch.arange(1, 0, -0.1)) tensor([ 1.0425, 3.5672, 2.7969, 4.2925, 4.7229, 6.2134, 8.0505, 8.1408, 9.0563, 10.0566]) `torch.normal(mean=0.0, std, *, out=None) → Tensor` Similar to the function above, but the means are shared among all drawn elements. Parameters * **mean** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _,__optional_) – the mean for all distributions * **std** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the tensor of per-element standard deviations Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. Example: >>> torch.normal(mean=0.5, std=torch.arange(1., 6.)) tensor([-1.2793, -1.0732, -2.0687, 5.1177, -1.2303]) `torch.normal(mean, std=1.0, *, out=None) → Tensor` Similar to the function above, but the standard-deviations are shared among all drawn elements. Parameters * **mean** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the tensor of per-element means * **std** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _,__optional_) – the standard deviation for all distributions Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor Example: >>> torch.normal(mean=torch.arange(1., 6.)) tensor([ 1.1552, 2.6148, 2.6535, 5.8318, 4.2361]) `torch.normal(mean, std, size, *, out=None) → Tensor` Similar to the function above, but the means and standard deviations are shared among all drawn elements. The resulting tensor has size given by `size`. Parameters * **mean** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)")) – the mean for all distributions * **std** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)")) – the standard deviation for all distributions * **size** (_int..._) – a sequence of integers defining the shape of the output tensor. Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. Example: >>> torch.normal(2, 3, size=(1, 4)) tensor([[-1.3987, -1.9544, 3.6048, 0.7909]]) # torch.not_equal `torch.not_equal(input, other, *, out=None) → Tensor` Alias for [`torch.ne()`](torch.ne#torch.ne "torch.ne"). # torch.numel `torch.numel(input) → int` Returns the total number of elements in the `input` tensor. Parameters **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. Example: >>> a = torch.randn(1, 2, 3, 4, 5) >>> torch.numel(a) 120 >>> a = torch.zeros(4,4) >>> torch.numel(a) 16 # torch.ones `torch.ones(*size, *, out=None, dtype=None, layout=torch.strided, device=None, requires_grad=False) → Tensor` Returns a tensor filled with the scalar value `1`, with the shape defined by the variable argument `size`. 
Parameters **size** (_int..._) – a sequence of integers defining the shape of the output tensor. Can be a variable number of arguments or a collection like a list or tuple. Keyword Arguments * **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. * **dtype** ([`torch.dtype`](../tensor_attributes#torch.torch.dtype "torch.torch.dtype"), optional) – the desired data type of returned tensor. Default: if `None`, uses a global default (see [`torch.set_default_tensor_type()`](torch.set_default_tensor_type#torch.set_default_tensor_type "torch.set_default_tensor_type")). * **layout** ([`torch.layout`](../tensor_attributes#torch.torch.layout "torch.torch.layout"), optional) – the desired layout of returned Tensor. Default: `torch.strided`. * **device** ([`torch.device`](../tensor_attributes#torch.torch.device "torch.torch.device"), optional) – the desired device of returned tensor. Default: if `None`, uses the current device for the default tensor type (see [`torch.set_default_tensor_type()`](torch.set_default_tensor_type#torch.set_default_tensor_type "torch.set_default_tensor_type")). `device` will be the CPU for CPU tensor types and the current CUDA device for CUDA tensor types. * **requires_grad** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – If autograd should record operations on the returned tensor. Default: `False`. Example: >>> torch.ones(2, 3) tensor([[ 1., 1., 1.], [ 1., 1., 1.]]) >>> torch.ones(5) tensor([ 1., 1., 1., 1., 1.]) # torch.ones_like `torch.ones_like(input, *, dtype=None, layout=None, device=None, requires_grad=False, memory_format=torch.preserve_format) → Tensor` Returns a tensor filled with the scalar value `1`, with the same size as `input`. `torch.ones_like(input)` is equivalent to `torch.ones(input.size(), dtype=input.dtype, layout=input.layout, device=input.device)`. Warning As of 0.4, this function does not support an `out` keyword. As an alternative, the old `torch.ones_like(input, out=output)` is equivalent to `torch.ones(input.size(), out=output)`. Parameters **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the size of `input` will determine size of the output tensor. Keyword Arguments * **dtype** ([`torch.dtype`](../tensor_attributes#torch.torch.dtype "torch.torch.dtype"), optional) – the desired data type of returned Tensor. Default: if `None`, defaults to the dtype of `input`. * **layout** ([`torch.layout`](../tensor_attributes#torch.torch.layout "torch.torch.layout"), optional) – the desired layout of returned tensor. Default: if `None`, defaults to the layout of `input`. * **device** ([`torch.device`](../tensor_attributes#torch.torch.device "torch.torch.device"), optional) – the desired device of returned tensor. Default: if `None`, defaults to the device of `input`. * **requires_grad** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – If autograd should record operations on the returned tensor. Default: `False`. * **memory_format** ([`torch.memory_format`](../tensor_attributes#torch.torch.memory_format "torch.torch.memory_format"), optional) – the desired memory format of returned Tensor. Default: `torch.preserve_format`. 
Example: >>> input = torch.empty(2, 3) >>> torch.ones_like(input) tensor([[ 1., 1., 1.], [ 1., 1., 1.]]) # torch.orgqr `torch.orgqr(input, input2) → Tensor` Computes the orthogonal matrix `Q` of a QR factorization, from the `(input, input2)` tuple returned by [`torch.geqrf()`](torch.geqrf#torch.geqrf "torch.geqrf"). This directly calls the underlying LAPACK function `?orgqr`. See [LAPACK documentation for orgqr](https://software.intel.com/en-us/mkl-developer- reference-c-orgqr) for further details. Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the `a` from [`torch.geqrf()`](torch.geqrf#torch.geqrf "torch.geqrf"). * **input2** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the `tau` from [`torch.geqrf()`](torch.geqrf#torch.geqrf "torch.geqrf"). # torch.ormqr `torch.ormqr(input, input2, input3, left=True, transpose=False) → Tensor` Multiplies `mat` (given by `input3`) by the orthogonal `Q` matrix of the QR factorization formed by [`torch.geqrf()`](torch.geqrf#torch.geqrf "torch.geqrf") that is represented by `(a, tau)` (given by (`input`, `input2`)). This directly calls the underlying LAPACK function `?ormqr`. See [LAPACK documentation for ormqr](https://software.intel.com/en-us/mkl-developer- reference-c-ormqr) for further details. Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the `a` from [`torch.geqrf()`](torch.geqrf#torch.geqrf "torch.geqrf"). * **input2** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the `tau` from [`torch.geqrf()`](torch.geqrf#torch.geqrf "torch.geqrf"). * **input3** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the matrix to be multiplied. # torch.outer `torch.outer(input, vec2, *, out=None) → Tensor` Outer product of `input` and `vec2`. If `input` is a vector of size nn and `vec2` is a vector of size mm , then `out` must be a matrix of size (n×m)(n \times m) . Note This function does not [broadcast](https://pytorch.org/docs/1.8.0/notes/broadcasting.html#broadcasting- semantics). Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – 1-D input vector * **vec2** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – 1-D input vector Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – optional output matrix Example: >>> v1 = torch.arange(1., 5.) >>> v2 = torch.arange(1., 4.) >>> torch.outer(v1, v2) tensor([[ 1., 2., 3.], [ 2., 4., 6.], [ 3., 6., 9.], [ 4., 8., 12.]]) # torch.pca_lowrank `torch.pca_lowrank(A, q=None, center=True, niter=2)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/_lowrank.html#pca_lowrank) Performs linear Principal Component Analysis (PCA) on a low-rank matrix, batches of such matrices, or sparse matrix. This function returns a namedtuple `(U, S, V)` which is the nearly optimal approximation of a singular value decomposition of a centered matrix AA such that A=Udiag(S)VTA = U diag(S) V^T . Note The relation of `(U, S, V)` to PCA is as follows: * AA is a data matrix with `m` samples and `n` features * the VV columns represent the principal directions * S∗∗2/(m−1)S ** 2 / (m - 1) contains the eigenvalues of ATA/(m−1)A^T A / (m - 1) which is the covariance of `A` when `center=True` is provided. 
* `matmul(A, V[:, :k])` projects data to the first k principal components Note Different from the standard SVD, the size of returned matrices depend on the specified rank and q values as follows: * UU is m x q matrix * SS is q-vector * VV is n x q matrix Note To obtain repeatable results, reset the seed for the pseudorandom number generator Parameters * **A** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor of size (∗,m,n)(*, m, n) * **q** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – a slightly overestimated rank of AA . By default, `q = min(6, m, n)`. * **center** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – if True, center the input tensor, otherwise, assume that the input is centered. * **niter** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – the number of subspace iterations to conduct; niter must be a nonnegative integer, and defaults to 2. References: - Nathan Halko, Per-Gunnar Martinsson, and Joel Tropp, Finding structure with randomness: probabilistic algorithms for constructing approximate matrix decompositions, arXiv:0909.4061 [math.NA; math.PR], 2009 (available at `arXiv `_). # torch.pinverse `torch.pinverse(input, rcond=1e-15) → Tensor` Calculates the pseudo-inverse (also known as the Moore-Penrose inverse) of a 2D tensor. Please look at [Moore-Penrose inverse](https://en.wikipedia.org/wiki/Moore%E2%80%93Penrose_inverse) for more details Note `torch.pinverse()` is deprecated. Please use [`torch.linalg.pinv()`](../linalg#torch.linalg.pinv "torch.linalg.pinv") instead which includes new parameters `hermitian` and `out`. Note This method is implemented using the Singular Value Decomposition. Note The pseudo-inverse is not necessarily a continuous function in the elements of the matrix [[1]](https://epubs.siam.org/doi/10.1137/0117004). Therefore, derivatives are not always existent, and exist for a constant rank only [[2]](https://www.jstor.org/stable/2156365). However, this method is backprop- able due to the implementation by using SVD results, and could be unstable. Double-backward will also be unstable due to the usage of SVD internally. See [`svd()`](torch.svd#torch.svd "torch.svd") for more details. Note Supports real and complex inputs. Batched version for complex inputs is only supported on the CPU. Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – The input tensor of size (∗,m,n)(*, m, n) where ∗* is zero or more batch dimensions. * **rcond** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _,__optional_) – A floating point value to determine the cutoff for small singular values. Default: `1e-15`. 
Returns The pseudo-inverse of `input` of dimensions (∗,n,m)(*, n, m) Example: >>> input = torch.randn(3, 5) >>> input tensor([[ 0.5495, 0.0979, -1.4092, -0.1128, 0.4132], [-1.1143, -0.3662, 0.3042, 1.6374, -0.9294], [-0.3269, -0.5745, -0.0382, -0.5922, -0.6759]]) >>> torch.pinverse(input) tensor([[ 0.0600, -0.1933, -0.2090], [-0.0903, -0.0817, -0.4752], [-0.7124, -0.1631, -0.2272], [ 0.1356, 0.3933, -0.5023], [-0.0308, -0.1725, -0.5216]]) >>> # Batched pinverse example >>> a = torch.randn(2,6,3) >>> b = torch.pinverse(a) >>> torch.matmul(b, a) tensor([[[ 1.0000e+00, 1.6391e-07, -1.1548e-07], [ 8.3121e-08, 1.0000e+00, -2.7567e-07], [ 3.5390e-08, 1.4901e-08, 1.0000e+00]], [[ 1.0000e+00, -8.9407e-08, 2.9802e-08], [-2.2352e-07, 1.0000e+00, 1.1921e-07], [ 0.0000e+00, 8.9407e-08, 1.0000e+00]]]) # torch.poisson `torch.poisson(input, generator=None) → Tensor` Returns a tensor of the same size as `input` with each element sampled from a Poisson distribution with rate parameter given by the corresponding element in `input` i.e., outi∼Poisson(inputi)\text{out}_i \sim \text{Poisson}(\text{input}_i) Parameters **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor containing the rates of the Poisson distribution Keyword Arguments **generator** ([`torch.Generator`](torch.generator#torch.Generator "torch.Generator"), optional) – a pseudorandom number generator for sampling Example: >>> rates = torch.rand(4, 4) * 5 # rate parameter between 0 and 5 >>> torch.poisson(rates) tensor([[9., 1., 3., 5.], [8., 6., 6., 0.], [0., 4., 5., 3.], [2., 1., 4., 2.]]) # torch.polar `torch.polar(abs, angle, *, out=None) → Tensor` Constructs a complex tensor whose elements are Cartesian coordinates corresponding to the polar coordinates with absolute value [`abs`](torch.abs#torch.abs "torch.abs") and angle [`angle`](torch.angle#torch.angle "torch.angle"). out=abs⋅cos⁡(angle)+abs⋅sin⁡(angle)⋅j\text{out} = \text{abs} \cdot \cos(\text{angle}) + \text{abs} \cdot \sin(\text{angle}) \cdot j Parameters * **abs** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – The absolute value the complex tensor. Must be float or double. * **angle** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – The angle of the complex tensor. Must be same dtype as [`abs`](torch.abs#torch.abs "torch.abs"). Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – If the inputs are `torch.float32`, must be `torch.complex64`. If the inputs are `torch.float64`, must be `torch.complex128`. Example:: >>> import numpy as np >>> abs = torch.tensor([1, 2], dtype=torch.float64) >>> angle = torch.tensor([np.pi / 2, 5 * np.pi / 4], dtype=torch.float64) >>> z = torch.polar(abs, angle) >>> z tensor([(0.0000+1.0000j), (-1.4142-1.4142j)], dtype=torch.complex128) # torch.polygamma `torch.polygamma(n, input, *, out=None) → Tensor` Computes the nthn^{th} derivative of the digamma function on `input`. n≥0n \geq 0 is called the order of the polygamma function. ψ(n)(x)=d(n)dx(n)ψ(x)\psi^{(n)}(x) = \frac{d^{(n)}}{dx^{(n)}} \psi(x) Note This function is implemented only for nonnegative integers n≥0n \geq 0 . Parameters * **n** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – the order of the polygamma function * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. 
Example:: >>> a = torch.tensor([1, 0.5]) >>> torch.polygamma(1, a) tensor([1.64493, 4.9348]) >>> torch.polygamma(2, a) tensor([ -2.4041, -16.8288]) >>> torch.polygamma(3, a) tensor([ 6.4939, 97.4091]) >>> torch.polygamma(4, a) tensor([ -24.8863, -771.4742]) # torch.pow `torch.pow(input, exponent, *, out=None) → Tensor` Takes the power of each element in `input` with `exponent` and returns a tensor with the result. `exponent` can be either a single `float` number or a `Tensor` with the same number of elements as `input`. When `exponent` is a scalar value, the operation applied is: outi=xiexponent\text{out}_i = x_i ^ \text{exponent} When `exponent` is a tensor, the operation applied is: outi=xiexponenti\text{out}_i = x_i ^ {\text{exponent}_i} When `exponent` is a tensor, the shapes of `input` and `exponent` must be [broadcastable](https://pytorch.org/docs/1.8.0/notes/broadcasting.html#broadcasting- semantics). Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. * **exponent** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _or_ _tensor_) – the exponent value Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. Example: >>> a = torch.randn(4) >>> a tensor([ 0.4331, 1.2475, 0.6834, -0.2791]) >>> torch.pow(a, 2) tensor([ 0.1875, 1.5561, 0.4670, 0.0779]) >>> exp = torch.arange(1., 5.) >>> a = torch.arange(1., 5.) >>> a tensor([ 1., 2., 3., 4.]) >>> exp tensor([ 1., 2., 3., 4.]) >>> torch.pow(a, exp) tensor([ 1., 4., 27., 256.]) `torch.pow(self, exponent, *, out=None) → Tensor` `self` is a scalar `float` value, and `exponent` is a tensor. The returned tensor `out` is of the same shape as `exponent` The operation applied is: outi=selfexponenti\text{out}_i = \text{self} ^ {\text{exponent}_i} Parameters * **self** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)")) – the scalar base value for the power operation * **exponent** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the exponent tensor Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. Example: >>> exp = torch.arange(1., 5.) >>> base = 2 >>> torch.pow(base, exp) tensor([ 2., 4., 8., 16.]) # torch.prod `torch.prod(input, *, dtype=None) → Tensor` Returns the product of all elements in the `input` tensor. Parameters **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. Keyword Arguments **dtype** ([`torch.dtype`](../tensor_attributes#torch.torch.dtype "torch.torch.dtype"), optional) – the desired data type of returned tensor. If specified, the input tensor is casted to `dtype` before the operation is performed. This is useful for preventing data type overflows. Default: None. Example: >>> a = torch.randn(1, 3) >>> a tensor([[-0.8020, 0.5428, -1.5854]]) >>> torch.prod(a) tensor(0.6902) `torch.prod(input, dim, keepdim=False, *, dtype=None) → Tensor` Returns the product of each row of the `input` tensor in the given dimension `dim`. If `keepdim` is `True`, the output tensor is of the same size as `input` except in the dimension `dim` where it is of size 1. Otherwise, `dim` is squeezed (see [`torch.squeeze()`](torch.squeeze#torch.squeeze "torch.squeeze")), resulting in the output tensor having 1 fewer dimension than `input`. Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. 
* **dim** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – the dimension to reduce.
* **keepdim** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – whether the output tensor has `dim` retained or not.

Keyword Arguments

**dtype** ([`torch.dtype`](../tensor_attributes#torch.torch.dtype "torch.torch.dtype"), optional) – the desired data type of returned tensor. If specified, the input tensor is cast to `dtype` before the operation is performed. This is useful for preventing data type overflows. Default: None.

Example:

>>> a = torch.randn(4, 2)
>>> a
tensor([[ 0.5261, -0.3837],
        [ 1.1857, -0.2498],
        [-1.1646,  0.0705],
        [ 1.1131, -1.0629]])
>>> torch.prod(a, 1)
tensor([-0.2018, -0.2962, -0.0821, -1.1831])

# torch.promote_types

`torch.promote_types(type1, type2) → dtype`

Returns the [`torch.dtype`](../tensor_attributes#torch.torch.dtype "torch.torch.dtype") with the smallest size and scalar kind that is not smaller nor of lower kind than either `type1` or `type2`. See type promotion [documentation](../tensor_attributes#type-promotion-doc) for more information on the type promotion logic.

Parameters

* **type1** ([`torch.dtype`](../tensor_attributes#torch.torch.dtype "torch.torch.dtype")) –
* **type2** ([`torch.dtype`](../tensor_attributes#torch.torch.dtype "torch.torch.dtype")) –

Example:

>>> torch.promote_types(torch.int32, torch.float32)
torch.float32
>>> torch.promote_types(torch.uint8, torch.long)
torch.long

# torch.qr

`torch.qr(input, some=True, *, out=None) -> (Tensor, Tensor)`

Computes the QR decomposition of a matrix or a batch of matrices `input`, and returns a namedtuple (Q, R) of tensors such that input = QR, with Q being an orthogonal matrix or batch of orthogonal matrices and R being an upper triangular matrix or batch of upper triangular matrices.

If `some` is `True`, then this function returns the thin (reduced) QR factorization. Otherwise, if `some` is `False`, this function returns the complete QR factorization.

Warning

`torch.qr` is deprecated. Please use [`torch.linalg.qr()`](../linalg#torch.linalg.qr "torch.linalg.qr") instead.

**Differences with** `torch.linalg.qr`:

* `torch.linalg.qr` takes a string parameter `mode` instead of `some`:
  * `some=True` is equivalent to `mode='reduced'`: both are the default
  * `some=False` is equivalent to `mode='complete'`.

Warning

If you plan to backpropagate through QR, note that the current backward implementation is only well-defined when the first min(input.size(-1), input.size(-2)) columns of `input` are linearly independent. This behavior will probably change once QR supports pivoting.

Note

This function uses LAPACK for CPU inputs and MAGMA for CUDA inputs, and may produce different (valid) decompositions on different device types or different platforms.

Parameters

* **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor of size (*, m, n) where `*` is zero or more batch dimensions consisting of matrices of dimension m × n.
* **some** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – Set to `True` for reduced QR decomposition and `False` for complete QR decomposition.
If `k = min(m, n)` then: * `some=True` : returns `(Q, R)` with dimensions (m, k), (k, n) (default) * `'some=False'`: returns `(Q, R)` with dimensions (m, m), (m, n) Keyword Arguments **out** ([tuple](https://docs.python.org/3/library/stdtypes.html#tuple "\(in Python v3.9\)") _,__optional_) – tuple of `Q` and `R` tensors. The dimensions of `Q` and `R` are detailed in the description of `some` above. Example: >>> a = torch.tensor([[12., -51, 4], [6, 167, -68], [-4, 24, -41]]) >>> q, r = torch.qr(a) >>> q tensor([[-0.8571, 0.3943, 0.3314], [-0.4286, -0.9029, -0.0343], [ 0.2857, -0.1714, 0.9429]]) >>> r tensor([[ -14.0000, -21.0000, 14.0000], [ 0.0000, -175.0000, 70.0000], [ 0.0000, 0.0000, -35.0000]]) >>> torch.mm(q, r).round() tensor([[ 12., -51., 4.], [ 6., 167., -68.], [ -4., 24., -41.]]) >>> torch.mm(q.t(), q).round() tensor([[ 1., 0., 0.], [ 0., 1., -0.], [ 0., -0., 1.]]) >>> a = torch.randn(3, 4, 5) >>> q, r = torch.qr(a, some=False) >>> torch.allclose(torch.matmul(q, r), a) True >>> torch.allclose(torch.matmul(q.transpose(-2, -1), q), torch.eye(5)) True # torch.quantile `torch.quantile(input, q) → Tensor` Returns the q-th quantiles of all elements in the `input` tensor, doing a linear interpolation when the q-th quantile lies between two data points. Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. * **q** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _or_[Tensor](../tensors#torch.Tensor "torch.Tensor")) – a scalar or 1D tensor of quantile values in the range [0, 1] Example: >>> a = torch.randn(1, 3) >>> a tensor([[ 0.0700, -0.5446, 0.9214]]) >>> q = torch.tensor([0, 0.5, 1]) >>> torch.quantile(a, q) tensor([-0.5446, 0.0700, 0.9214]) `torch.quantile(input, q, dim=None, keepdim=False, *, out=None) → Tensor` Returns the q-th quantiles of each row of the `input` tensor along the dimension `dim`, doing a linear interpolation when the q-th quantile lies between two data points. By default, `dim` is `None` resulting in the `input` tensor being flattened before computation. If `keepdim` is `True`, the output dimensions are of the same size as `input` except in the dimensions being reduced (`dim` or all if `dim` is `None`) where they have size 1. Otherwise, the dimensions being reduced are squeezed (see [`torch.squeeze()`](torch.squeeze#torch.squeeze "torch.squeeze")). If `q` is a 1D tensor, an extra dimension is prepended to the output tensor with the same size as `q` which represents the quantiles. Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. * **q** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _or_[Tensor](../tensors#torch.Tensor "torch.Tensor")) – a scalar or 1D tensor of quantile values in the range [0, 1] * **dim** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – the dimension to reduce. * **keepdim** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – whether the output tensor has `dim` retained or not. Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. 
Example: >>> a = torch.randn(2, 3) >>> a tensor([[ 0.0795, -1.2117, 0.9765], [ 1.1707, 0.6706, 0.4884]]) >>> q = torch.tensor([0.25, 0.5, 0.75]) >>> torch.quantile(a, q, dim=1, keepdim=True) tensor([[[-0.5661], [ 0.5795]], [[ 0.0795], [ 0.6706]], [[ 0.5280], [ 0.9206]]]) >>> torch.quantile(a, q, dim=1, keepdim=True).shape torch.Size([3, 2, 1]) # torch.quantize_per_channel `torch.quantize_per_channel(input, scales, zero_points, axis, dtype) → Tensor` Converts a float tensor to a per-channel quantized tensor with given scales and zero points. Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – float tensor to quantize * **scales** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – float 1D tensor of scales to use, size should match `input.size(axis)` * **zero_points** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – integer 1D tensor of offset to use, size should match `input.size(axis)` * **axis** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – dimension on which apply per-channel quantization * **dtype** ([`torch.dtype`](../tensor_attributes#torch.torch.dtype "torch.torch.dtype")) – the desired data type of returned tensor. Has to be one of the quantized dtypes: `torch.quint8`, `torch.qint8`, `torch.qint32` Returns A newly quantized tensor Return type [Tensor](../tensors#torch.Tensor "torch.Tensor") Example: >>> x = torch.tensor([[-1.0, 0.0], [1.0, 2.0]]) >>> torch.quantize_per_channel(x, torch.tensor([0.1, 0.01]), torch.tensor([10, 0]), 0, torch.quint8) tensor([[-1., 0.], [ 1., 2.]], size=(2, 2), dtype=torch.quint8, quantization_scheme=torch.per_channel_affine, scale=tensor([0.1000, 0.0100], dtype=torch.float64), zero_point=tensor([10, 0]), axis=0) >>> torch.quantize_per_channel(x, torch.tensor([0.1, 0.01]), torch.tensor([10, 0]), 0, torch.quint8).int_repr() tensor([[ 0, 10], [100, 200]], dtype=torch.uint8) # torch.quantize_per_tensor `torch.quantize_per_tensor(input, scale, zero_point, dtype) → Tensor` Converts a float tensor to a quantized tensor with given scale and zero point. Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – float tensor to quantize * **scale** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)")) – scale to apply in quantization formula * **zero_point** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – offset in integer value that maps to float zero * **dtype** ([`torch.dtype`](../tensor_attributes#torch.torch.dtype "torch.torch.dtype")) – the desired data type of returned tensor. Has to be one of the quantized dtypes: `torch.quint8`, `torch.qint8`, `torch.qint32` Returns A newly quantized tensor Return type [Tensor](../tensors#torch.Tensor "torch.Tensor") Example: >>> torch.quantize_per_tensor(torch.tensor([-1.0, 0.0, 1.0, 2.0]), 0.1, 10, torch.quint8) tensor([-1., 0., 1., 2.], size=(4,), dtype=torch.quint8, quantization_scheme=torch.per_tensor_affine, scale=0.1, zero_point=10) >>> torch.quantize_per_tensor(torch.tensor([-1.0, 0.0, 1.0, 2.0]), 0.1, 10, torch.quint8).int_repr() tensor([ 0, 10, 20, 30], dtype=torch.uint8) # SobolEngine `class torch.quasirandom.SobolEngine(dimension, scramble=False, seed=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/quasirandom.html#SobolEngine) The `torch.quasirandom.SobolEngine` is an engine for generating (scrambled) Sobol sequences. Sobol sequences are an example of low discrepancy quasi- random sequences. 
This implementation of an engine for Sobol sequences is capable of sampling sequences up to a maximum dimension of 21201. It uses direction numbers obtained using the search criterion D(6) up to dimension 21201, which is the recommended choice by the authors. #### References * Art B. Owen. Scrambling Sobol and Niederreiter-Xing points. Journal of Complexity, 14(4):466-489, December 1998. * I. M. Sobol. The distribution of points in a cube and the accurate evaluation of integrals. Zh. Vychisl. Mat. i Mat. Phys., 7:784-802, 1967. Parameters * **dimension** (_Int_) – The dimensionality of the sequence to be drawn * **scramble** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – Setting this to `True` will produce scrambled Sobol sequences. Scrambling is capable of producing better Sobol sequences. Default: `False`. * **seed** (_Int_ _,__optional_) – This is the seed for the scrambling. The seed of the random number generator is set to this, if specified. Otherwise, it uses a random seed. Default: `None` Examples: >>> soboleng = torch.quasirandom.SobolEngine(dimension=5) >>> soboleng.draw(3) tensor([[0.5000, 0.5000, 0.5000, 0.5000, 0.5000], [0.7500, 0.2500, 0.7500, 0.2500, 0.7500], [0.2500, 0.7500, 0.2500, 0.7500, 0.2500]]) `draw(n=1, out=None, dtype=torch.float32)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/quasirandom.html#SobolEngine.draw) Function to draw a sequence of `n` points from a Sobol sequence. Note that the samples are dependent on the previous samples. The size of the result is (n, dimension). Parameters * **n** (_Int_ _,__optional_) – The length of sequence of points to draw. Default: 1 * **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – The output tensor * **dtype** (`torch.dtype`, optional) – the desired data type of the returned tensor. Default: `torch.float32` `draw_base2(m, out=None, dtype=torch.float32)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/quasirandom.html#SobolEngine.draw_base2) Function to draw a sequence of `2**m` points from a Sobol sequence. Note that the samples are dependent on the previous samples. The size of the result is (2**m, dimension). Parameters * **m** (_Int_) – The (base2) exponent of the number of points to draw. * **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – The output tensor * **dtype** (`torch.dtype`, optional) – the desired data type of the returned tensor. Default: `torch.float32` `fast_forward(n)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/quasirandom.html#SobolEngine.fast_forward) Function to fast-forward the state of the `SobolEngine` by `n` steps. This is equivalent to drawing `n` samples without using the samples. Parameters **n** (_Int_) – The number of steps to fast-forward by. `reset()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/quasirandom.html#SobolEngine.reset) Function to reset the `SobolEngine` to base state. # torch.rad2deg `torch.rad2deg(input, *, out=None) → Tensor` Returns a new tensor with each of the elements of `input` converted from angles in radians to degrees. Parameters **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor.
Example: >>> a = torch.tensor([[3.142, -3.142], [6.283, -6.283], [1.570, -1.570]]) >>> torch.rad2deg(a) tensor([[ 180.0233, -180.0233], [ 359.9894, -359.9894], [ 89.9544, -89.9544]]) # torch.rand `torch.rand(*size, *, out=None, dtype=None, layout=torch.strided, device=None, requires_grad=False) → Tensor` Returns a tensor filled with random numbers from a uniform distribution on the interval [0,1)[0, 1) The shape of the tensor is defined by the variable argument `size`. Parameters **size** (_int..._) – a sequence of integers defining the shape of the output tensor. Can be a variable number of arguments or a collection like a list or tuple. Keyword Arguments * **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. * **dtype** ([`torch.dtype`](../tensor_attributes#torch.torch.dtype "torch.torch.dtype"), optional) – the desired data type of returned tensor. Default: if `None`, uses a global default (see [`torch.set_default_tensor_type()`](torch.set_default_tensor_type#torch.set_default_tensor_type "torch.set_default_tensor_type")). * **layout** ([`torch.layout`](../tensor_attributes#torch.torch.layout "torch.torch.layout"), optional) – the desired layout of returned Tensor. Default: `torch.strided`. * **device** ([`torch.device`](../tensor_attributes#torch.torch.device "torch.torch.device"), optional) – the desired device of returned tensor. Default: if `None`, uses the current device for the default tensor type (see [`torch.set_default_tensor_type()`](torch.set_default_tensor_type#torch.set_default_tensor_type "torch.set_default_tensor_type")). `device` will be the CPU for CPU tensor types and the current CUDA device for CUDA tensor types. * **requires_grad** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – If autograd should record operations on the returned tensor. Default: `False`. Example: >>> torch.rand(4) tensor([ 0.5204, 0.2503, 0.3525, 0.5673]) >>> torch.rand(2, 3) tensor([[ 0.8237, 0.5781, 0.6879], [ 0.3816, 0.7249, 0.0998]]) # torch.rand_like `torch.rand_like(input, *, dtype=None, layout=None, device=None, requires_grad=False, memory_format=torch.preserve_format) → Tensor` Returns a tensor with the same size as `input` that is filled with random numbers from a uniform distribution on the interval [0,1)[0, 1) . `torch.rand_like(input)` is equivalent to `torch.rand(input.size(), dtype=input.dtype, layout=input.layout, device=input.device)`. Parameters **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the size of `input` will determine size of the output tensor. Keyword Arguments * **dtype** ([`torch.dtype`](../tensor_attributes#torch.torch.dtype "torch.torch.dtype"), optional) – the desired data type of returned Tensor. Default: if `None`, defaults to the dtype of `input`. * **layout** ([`torch.layout`](../tensor_attributes#torch.torch.layout "torch.torch.layout"), optional) – the desired layout of returned tensor. Default: if `None`, defaults to the layout of `input`. * **device** ([`torch.device`](../tensor_attributes#torch.torch.device "torch.torch.device"), optional) – the desired device of returned tensor. Default: if `None`, defaults to the device of `input`. * **requires_grad** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – If autograd should record operations on the returned tensor. Default: `False`. 
* **memory_format** ([`torch.memory_format`](../tensor_attributes#torch.torch.memory_format "torch.torch.memory_format"), optional) – the desired memory format of returned Tensor. Default: `torch.preserve_format`. # torch.randint `torch.randint(low=0, high, size, *, generator=None, out=None, dtype=None, layout=torch.strided, device=None, requires_grad=False) → Tensor` Returns a tensor filled with random integers generated uniformly between `low` (inclusive) and `high` (exclusive). The shape of the tensor is defined by the variable argument `size`. Note With the global dtype default (`torch.float32`), this function returns a tensor with dtype `torch.int64`. Parameters * **low** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – Lowest integer to be drawn from the distribution. Default: 0. * **high** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – One above the highest integer to be drawn from the distribution. * **size** ([tuple](https://docs.python.org/3/library/stdtypes.html#tuple "\(in Python v3.9\)")) – a tuple defining the shape of the output tensor. Keyword Arguments * **generator** ([`torch.Generator`](torch.generator#torch.Generator "torch.Generator"), optional) – a pseudorandom number generator for sampling * **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. * **dtype** ([`torch.dtype`](../tensor_attributes#torch.torch.dtype "torch.torch.dtype"), optional) – the desired data type of returned tensor. Default: if `None`, uses a global default (see [`torch.set_default_tensor_type()`](torch.set_default_tensor_type#torch.set_default_tensor_type "torch.set_default_tensor_type")). * **layout** ([`torch.layout`](../tensor_attributes#torch.torch.layout "torch.torch.layout"), optional) – the desired layout of returned Tensor. Default: `torch.strided`. * **device** ([`torch.device`](../tensor_attributes#torch.torch.device "torch.torch.device"), optional) – the desired device of returned tensor. Default: if `None`, uses the current device for the default tensor type (see [`torch.set_default_tensor_type()`](torch.set_default_tensor_type#torch.set_default_tensor_type "torch.set_default_tensor_type")). `device` will be the CPU for CPU tensor types and the current CUDA device for CUDA tensor types. * **requires_grad** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – If autograd should record operations on the returned tensor. Default: `False`. Example: >>> torch.randint(3, 5, (3,)) tensor([4, 3, 4]) >>> torch.randint(10, (2, 2)) tensor([[0, 2], [5, 5]]) >>> torch.randint(3, 10, (2, 2)) tensor([[4, 5], [6, 7]]) # torch.randint_like `torch.randint_like(input, low=0, high, *, dtype=None, layout=torch.strided, device=None, requires_grad=False, memory_format=torch.preserve_format) → Tensor` Returns a tensor with the same shape as Tensor `input` filled with random integers generated uniformly between `low` (inclusive) and `high` (exclusive). Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the size of `input` will determine size of the output tensor. * **low** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – Lowest integer to be drawn from the distribution. Default: 0. * **high** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – One above the highest integer to be drawn from the distribution. 
Keyword Arguments * **dtype** ([`torch.dtype`](../tensor_attributes#torch.torch.dtype "torch.torch.dtype"), optional) – the desired data type of returned Tensor. Default: if `None`, defaults to the dtype of `input`. * **layout** ([`torch.layout`](../tensor_attributes#torch.torch.layout "torch.torch.layout"), optional) – the desired layout of returned tensor. Default: if `None`, defaults to the layout of `input`. * **device** ([`torch.device`](../tensor_attributes#torch.torch.device "torch.torch.device"), optional) – the desired device of returned tensor. Default: if `None`, defaults to the device of `input`. * **requires_grad** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – If autograd should record operations on the returned tensor. Default: `False`. * **memory_format** ([`torch.memory_format`](../tensor_attributes#torch.torch.memory_format "torch.torch.memory_format"), optional) – the desired memory format of returned Tensor. Default: `torch.preserve_format`. # torch.randn `torch.randn(*size, *, out=None, dtype=None, layout=torch.strided, device=None, requires_grad=False) → Tensor` Returns a tensor filled with random numbers from a normal distribution with mean `0` and variance `1` (also called the standard normal distribution). outi∼N(0,1)\text{out}_{i} \sim \mathcal{N}(0, 1) The shape of the tensor is defined by the variable argument `size`. Parameters **size** (_int..._) – a sequence of integers defining the shape of the output tensor. Can be a variable number of arguments or a collection like a list or tuple. Keyword Arguments * **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. * **dtype** ([`torch.dtype`](../tensor_attributes#torch.torch.dtype "torch.torch.dtype"), optional) – the desired data type of returned tensor. Default: if `None`, uses a global default (see [`torch.set_default_tensor_type()`](torch.set_default_tensor_type#torch.set_default_tensor_type "torch.set_default_tensor_type")). * **layout** ([`torch.layout`](../tensor_attributes#torch.torch.layout "torch.torch.layout"), optional) – the desired layout of returned Tensor. Default: `torch.strided`. * **device** ([`torch.device`](../tensor_attributes#torch.torch.device "torch.torch.device"), optional) – the desired device of returned tensor. Default: if `None`, uses the current device for the default tensor type (see [`torch.set_default_tensor_type()`](torch.set_default_tensor_type#torch.set_default_tensor_type "torch.set_default_tensor_type")). `device` will be the CPU for CPU tensor types and the current CUDA device for CUDA tensor types. * **requires_grad** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – If autograd should record operations on the returned tensor. Default: `False`. Example: >>> torch.randn(4) tensor([-2.1436, 0.9966, 2.3426, -0.6366]) >>> torch.randn(2, 3) tensor([[ 1.5954, 2.8929, -1.0923], [ 1.1719, -0.4709, -0.1996]]) # torch.randn_like `torch.randn_like(input, *, dtype=None, layout=None, device=None, requires_grad=False, memory_format=torch.preserve_format) → Tensor` Returns a tensor with the same size as `input` that is filled with random numbers from a normal distribution with mean 0 and variance 1. `torch.randn_like(input)` is equivalent to `torch.randn(input.size(), dtype=input.dtype, layout=input.layout, device=input.device)`. 
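A minimal usage sketch (the sampled values are random; what is guaranteed is that the size, dtype, layout and device follow `input`):

>>> base = torch.empty(2, 3, dtype=torch.float64)
>>> noise = torch.randn_like(base)
>>> noise.shape, noise.dtype
(torch.Size([2, 3]), torch.float64)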
Parameters **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the size of `input` will determine size of the output tensor. Keyword Arguments * **dtype** ([`torch.dtype`](../tensor_attributes#torch.torch.dtype "torch.torch.dtype"), optional) – the desired data type of returned Tensor. Default: if `None`, defaults to the dtype of `input`. * **layout** ([`torch.layout`](../tensor_attributes#torch.torch.layout "torch.torch.layout"), optional) – the desired layout of returned tensor. Default: if `None`, defaults to the layout of `input`. * **device** ([`torch.device`](../tensor_attributes#torch.torch.device "torch.torch.device"), optional) – the desired device of returned tensor. Default: if `None`, defaults to the device of `input`. * **requires_grad** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – If autograd should record operations on the returned tensor. Default: `False`. * **memory_format** ([`torch.memory_format`](../tensor_attributes#torch.torch.memory_format "torch.torch.memory_format"), optional) – the desired memory format of returned Tensor. Default: `torch.preserve_format`. # torch.randperm `torch.randperm(n, *, generator=None, out=None, dtype=torch.int64, layout=torch.strided, device=None, requires_grad=False, pin_memory=False) → Tensor` Returns a random permutation of integers from `0` to `n - 1`. Parameters **n** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – the upper bound (exclusive) Keyword Arguments * **generator** ([`torch.Generator`](torch.generator#torch.Generator "torch.Generator"), optional) – a pseudorandom number generator for sampling * **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. * **dtype** ([`torch.dtype`](../tensor_attributes#torch.torch.dtype "torch.torch.dtype"), optional) – the desired data type of returned tensor. Default: `torch.int64`. * **layout** ([`torch.layout`](../tensor_attributes#torch.torch.layout "torch.torch.layout"), optional) – the desired layout of returned Tensor. Default: `torch.strided`. * **device** ([`torch.device`](../tensor_attributes#torch.torch.device "torch.torch.device"), optional) – the desired device of returned tensor. Default: if `None`, uses the current device for the default tensor type (see [`torch.set_default_tensor_type()`](torch.set_default_tensor_type#torch.set_default_tensor_type "torch.set_default_tensor_type")). `device` will be the CPU for CPU tensor types and the current CUDA device for CUDA tensor types. * **requires_grad** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – If autograd should record operations on the returned tensor. Default: `False`. * **pin_memory** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – If set, returned tensor would be allocated in the pinned memory. Works only for CPU tensors. Default: `False`. Example: >>> torch.randperm(4) tensor([2, 1, 0, 3]) # torch.range `torch.range(start=0, end, step=1, *, out=None, dtype=None, layout=torch.strided, device=None, requires_grad=False) → Tensor` Returns a 1-D tensor of size ⌊end−startstep⌋+1\left\lfloor \frac{\text{end} - \text{start}}{\text{step}} \right\rfloor + 1 with values from `start` to `end` with step `step`. Step is the gap between two values in the tensor. outi+1=outi+step.\text{out}_{i+1} = \text{out}_i + \text{step}. 
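As a quick check of the size formula above: `torch.range(1, 4, 0.5)` yields ⌊(4 − 1) / 0.5⌋ + 1 = 7 values (see the example further below), whereas `torch.arange(1, 4, 0.5)` excludes the endpoint and returns only 6.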
Warning This function is deprecated and will be removed in a future release because its behavior is inconsistent with Python’s range builtin. Instead, use [`torch.arange()`](torch.arange#torch.arange "torch.arange"), which produces values in [start, end). Parameters * **start** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)")) – the starting value for the set of points. Default: `0`. * **end** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)")) – the ending value for the set of points * **step** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)")) – the gap between each pair of adjacent points. Default: `1`. Keyword Arguments * **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. * **dtype** ([`torch.dtype`](../tensor_attributes#torch.torch.dtype "torch.torch.dtype"), optional) – the desired data type of returned tensor. Default: if `None`, uses a global default (see [`torch.set_default_tensor_type()`](torch.set_default_tensor_type#torch.set_default_tensor_type "torch.set_default_tensor_type")). If `dtype` is not given, infer the data type from the other input arguments. If any of `start`, `end`, or `stop` are floating-point, the `dtype` is inferred to be the default dtype, see [`get_default_dtype()`](torch.get_default_dtype#torch.get_default_dtype "torch.get_default_dtype"). Otherwise, the `dtype` is inferred to be `torch.int64`. * **layout** ([`torch.layout`](../tensor_attributes#torch.torch.layout "torch.torch.layout"), optional) – the desired layout of returned Tensor. Default: `torch.strided`. * **device** ([`torch.device`](../tensor_attributes#torch.torch.device "torch.torch.device"), optional) – the desired device of returned tensor. Default: if `None`, uses the current device for the default tensor type (see [`torch.set_default_tensor_type()`](torch.set_default_tensor_type#torch.set_default_tensor_type "torch.set_default_tensor_type")). `device` will be the CPU for CPU tensor types and the current CUDA device for CUDA tensor types. * **requires_grad** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – If autograd should record operations on the returned tensor. Default: `False`. Example: >>> torch.range(1, 4) tensor([ 1., 2., 3., 4.]) >>> torch.range(1, 4, 0.5) tensor([ 1.0000, 1.5000, 2.0000, 2.5000, 3.0000, 3.5000, 4.0000]) # torch.ravel `torch.ravel(input) → Tensor` Return a contiguous flattened tensor. A copy is made only if needed. Parameters **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. Example: >>> t = torch.tensor([[[1, 2], ... [3, 4]], ... [[5, 6], ... [7, 8]]]) >>> torch.ravel(t) tensor([1, 2, 3, 4, 5, 6, 7, 8]) # torch.real `torch.real(input) → Tensor` Returns a new tensor containing real values of the `self` tensor. The returned tensor and `self` share the same underlying storage. Warning `real()` is only supported for tensors with complex dtypes. Parameters **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. Example:: >>> x=torch.randn(4, dtype=torch.cfloat) >>> x tensor([(0.3100+0.3553j), (-0.5445-0.7896j), (-1.6492-0.0633j), (-0.0638-0.8119j)]) >>> x.real tensor([ 0.3100, -0.5445, -1.6492, -0.0638]) # torch.reciprocal `torch.reciprocal(input, *, out=None) → Tensor` Returns a new tensor with the reciprocal of the elements of `input` Note Unlike NumPy’s reciprocal, torch.reciprocal supports integral inputs. 
Integral inputs to reciprocal are automatically [promoted](../tensor_attributes#type-promotion-doc) to the default scalar type. \text{out}_{i} = \frac{1}{\text{input}_{i}} Parameters **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. Example: >>> a = torch.randn(4) >>> a tensor([-0.4595, -2.1219, -1.4314, 0.7298]) >>> torch.reciprocal(a) tensor([-2.1763, -0.4713, -0.6986, 1.3702]) # torch.remainder `torch.remainder(input, other, *, out=None) → Tensor` Computes the element-wise remainder of division. The dividend and divisor may contain both integer and floating point numbers. The remainder has the same sign as the divisor `other`. Supports [broadcasting to a common shape](https://pytorch.org/docs/1.8.0/notes/broadcasting.html#broadcasting-semantics), [type promotion](../tensor_attributes#type-promotion-doc), and integer and float inputs. Note Complex inputs are not supported. In some cases, it is not mathematically possible to satisfy the definition of a modulo operation with complex numbers. See [`torch.fmod()`](torch.fmod#torch.fmod "torch.fmod") for how division by zero is handled. Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the dividend * **other** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _or_ _Scalar_) – the divisor Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. Example: >>> torch.remainder(torch.tensor([-3., -2, -1, 1, 2, 3]), 2) tensor([ 1., 0., 1., 1., 0., 1.]) >>> torch.remainder(torch.tensor([1, 2, 3, 4, 5]), 1.5) tensor([ 1.0000, 0.5000, 0.0000, 1.0000, 0.5000]) See also [`torch.fmod()`](torch.fmod#torch.fmod "torch.fmod"), which computes the element-wise remainder of division equivalently to the C library function `fmod()`. # torch.renorm `torch.renorm(input, p, dim, maxnorm, *, out=None) → Tensor` Returns a tensor where each sub-tensor of `input` along dimension `dim` is normalized such that the `p`-norm of the sub-tensor is lower than the value `maxnorm`. Note If the norm of a row is lower than `maxnorm`, the row is unchanged. Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. * **p** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)")) – the power for the norm computation * **dim** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – the dimension to slice over to get the sub-tensors * **maxnorm** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)")) – the maximum norm to keep each sub-tensor under Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. Example: >>> x = torch.ones(3, 3) >>> x[1].fill_(2) tensor([ 2., 2., 2.]) >>> x[2].fill_(3) tensor([ 3., 3., 3.]) >>> x tensor([[ 1., 1., 1.], [ 2., 2., 2.], [ 3., 3., 3.]]) >>> torch.renorm(x, 1, 0, 5) tensor([[ 1.0000, 1.0000, 1.0000], [ 1.6667, 1.6667, 1.6667], [ 1.6667, 1.6667, 1.6667]]) # torch.repeat_interleave `torch.repeat_interleave(input, repeats, dim=None) → Tensor` Repeat elements of a tensor. Warning This is different from [`torch.Tensor.repeat()`](../tensors#torch.Tensor.repeat "torch.Tensor.repeat") but similar to `numpy.repeat`. Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor.
* **repeats** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _or_[int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – The number of repetitions for each element. repeats is broadcasted to fit the shape of the given axis. * **dim** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – The dimension along which to repeat values. By default, use the flattened input array, and return a flat output array. Returns Repeated tensor which has the same shape as input, except along the given axis. Return type [Tensor](../tensors#torch.Tensor "torch.Tensor") Example: >>> x = torch.tensor([1, 2, 3]) >>> x.repeat_interleave(2) tensor([1, 1, 2, 2, 3, 3]) >>> y = torch.tensor([[1, 2], [3, 4]]) >>> torch.repeat_interleave(y, 2) tensor([1, 1, 2, 2, 3, 3, 4, 4]) >>> torch.repeat_interleave(y, 3, dim=1) tensor([[1, 1, 1, 2, 2, 2], [3, 3, 3, 4, 4, 4]]) >>> torch.repeat_interleave(y, torch.tensor([1, 2]), dim=0) tensor([[1, 2], [3, 4], [3, 4]]) `torch.repeat_interleave(repeats) → Tensor` If the `repeats` is `tensor([n1, n2, n3, …])`, then the output will be `tensor([0, 0, …, 1, 1, …, 2, 2, …, …])` where `0` appears `n1` times, `1` appears `n2` times, `2` appears `n3` times, etc. # torch.reshape `torch.reshape(input, shape) → Tensor` Returns a tensor with the same data and number of elements as `input`, but with the specified shape. When possible, the returned tensor will be a view of `input`. Otherwise, it will be a copy. Contiguous inputs and inputs with compatible strides can be reshaped without copying, but you should not depend on the copying vs. viewing behavior. See [`torch.Tensor.view()`](../tensors#torch.Tensor.view "torch.Tensor.view") on when it is possible to return a view. A single dimension may be -1, in which case it’s inferred from the remaining dimensions and the number of elements in `input`. Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the tensor to be reshaped * **shape** (_tuple of python:ints_) – the new shape Example: >>> a = torch.arange(4.) >>> torch.reshape(a, (2, 2)) tensor([[ 0., 1.], [ 2., 3.]]) >>> b = torch.tensor([[0, 1], [2, 3]]) >>> torch.reshape(b, (-1,)) tensor([ 0, 1, 2, 3]) # torch.result_type `torch.result_type(tensor1, tensor2) → dtype` Returns the [`torch.dtype`](../tensor_attributes#torch.torch.dtype "torch.torch.dtype") that would result from performing an arithmetic operation on the provided input tensors. See type promotion [documentation](../tensor_attributes#type-promotion-doc) for more information on the type promotion logic. Parameters * **tensor1** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _or_ _Number_) – an input tensor or number * **tensor2** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _or_ _Number_) – an input tensor or number Example: >>> torch.result_type(torch.tensor([1, 2], dtype=torch.int), 1.0) torch.float32 >>> torch.result_type(torch.tensor([1, 2], dtype=torch.uint8), torch.tensor(1)) torch.uint8 # torch.roll `torch.roll(input, shifts, dims=None) → Tensor` Roll the tensor along the given dimension(s). Elements that are shifted beyond the last position are re-introduced at the first position. If a dimension is not specified, the tensor will be flattened before rolling and then restored to the original shape. Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. 
* **shifts** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _or_ _tuple of python:ints_) – The number of places by which the elements of the tensor are shifted. If shifts is a tuple, dims must be a tuple of the same size, and each dimension will be rolled by the corresponding value * **dims** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _or_ _tuple of python:ints_) – Axis along which to roll Example: >>> x = torch.tensor([1, 2, 3, 4, 5, 6, 7, 8]).view(4, 2) >>> x tensor([[1, 2], [3, 4], [5, 6], [7, 8]]) >>> torch.roll(x, 1, 0) tensor([[7, 8], [1, 2], [3, 4], [5, 6]]) >>> torch.roll(x, -1, 0) tensor([[3, 4], [5, 6], [7, 8], [1, 2]]) >>> torch.roll(x, shifts=(2, 1), dims=(0, 1)) tensor([[6, 5], [8, 7], [2, 1], [4, 3]]) # torch.rot90 `torch.rot90(input, k, dims) → Tensor` Rotate an n-D tensor by 90 degrees in the plane specified by the `dims` axes. Rotation direction is from the first towards the second axis if k > 0, and from the second towards the first for k < 0. Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. * **k** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – number of times to rotate * **dims** (_a list_ _or_[tuple](https://docs.python.org/3/library/stdtypes.html#tuple "\(in Python v3.9\)")) – axes to rotate Example: >>> x = torch.arange(4).view(2, 2) >>> x tensor([[0, 1], [2, 3]]) >>> torch.rot90(x, 1, [0, 1]) tensor([[1, 3], [0, 2]]) >>> x = torch.arange(8).view(2, 2, 2) >>> x tensor([[[0, 1], [2, 3]], [[4, 5], [6, 7]]]) >>> torch.rot90(x, 1, [1, 2]) tensor([[[1, 3], [0, 2]], [[5, 7], [4, 6]]]) # torch.round `torch.round(input, *, out=None) → Tensor` Returns a new tensor with each of the elements of `input` rounded to the closest integer. Parameters **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. Example: >>> a = torch.randn(4) >>> a tensor([ 0.9920, 0.6077, 0.9734, -1.0362]) >>> torch.round(a) tensor([ 1., 1., 1., -1.]) # torch.row_stack `torch.row_stack(tensors, *, out=None) → Tensor` Alias of [`torch.vstack()`](torch.vstack#torch.vstack "torch.vstack"). # torch.rsqrt `torch.rsqrt(input, *, out=None) → Tensor` Returns a new tensor with the reciprocal of the square-root of each of the elements of `input`. \text{out}_{i} = \frac{1}{\sqrt{\text{input}_{i}}} Parameters **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. Example: >>> a = torch.randn(4) >>> a tensor([-0.0370, 0.2970, 1.5420, -0.9105]) >>> torch.rsqrt(a) tensor([ nan, 1.8351, 0.8053, nan]) # torch.save `torch.save(obj, f, pickle_module=pickle, pickle_protocol=2, _use_new_zipfile_serialization=True)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/serialization.html#save) Saves an object to a disk file. See also: `saving-loading-tensors` Parameters * **obj** – saved object * **f** – a file-like object (has to implement write and flush) or a string or os.PathLike object containing a file name * **pickle_module** – module used for pickling metadata and objects * **pickle_protocol** – can be specified to override the default protocol Note A common PyTorch convention is to save tensors using .pt file extension. Note PyTorch preserves storage sharing across serialization.
See `preserve-storage-sharing` for more details. Note The 1.6 release of PyTorch switched `torch.save` to use a new zipfile-based file format. `torch.load` still retains the ability to load files in the old format. If for any reason you want `torch.save` to use the old format, pass the kwarg `_use_new_zipfile_serialization=False`. #### Example >>> # Save to file >>> x = torch.tensor([0, 1, 2, 3, 4]) >>> torch.save(x, 'tensor.pt') >>> # Save to io.BytesIO buffer >>> buffer = io.BytesIO() >>> torch.save(x, buffer) # torch.scatter `torch.scatter(input, dim, index, src) → Tensor` Out-of-place version of [`torch.Tensor.scatter_()`](../tensors#torch.Tensor.scatter_ "torch.Tensor.scatter_") # torch.scatter_add `torch.scatter_add(input, dim, index, src) → Tensor` Out-of-place version of [`torch.Tensor.scatter_add_()`](../tensors#torch.Tensor.scatter_add_ "torch.Tensor.scatter_add_") # torch.searchsorted `torch.searchsorted(sorted_sequence, values, *, out_int32=False, right=False, out=None) → Tensor` Find the indices from the _innermost_ dimension of `sorted_sequence` such that, if the corresponding values in `values` were inserted before the indices, the order of the corresponding _innermost_ dimension within `sorted_sequence` would be preserved. Return a new tensor with the same size as `values`. If `right` is False (default), then the left boundary of `sorted_sequence` is closed. More formally, the returned index satisfies the following rules:

`sorted_sequence` | `right` | _returned index satisfies_
---|---|---
1-D | False | `sorted_sequence[i-1] < values[m][n]...[l][x] <= sorted_sequence[i]`
1-D | True | `sorted_sequence[i-1] <= values[m][n]...[l][x] < sorted_sequence[i]`
N-D | False | `sorted_sequence[m][n]...[l][i-1] < values[m][n]...[l][x] <= sorted_sequence[m][n]...[l][i]`
N-D | True | `sorted_sequence[m][n]...[l][i-1] <= values[m][n]...[l][x] < sorted_sequence[m][n]...[l][i]`

Parameters * **sorted_sequence** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – N-D or 1-D tensor, containing monotonically increasing sequence on the _innermost_ dimension. * **values** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _or_ _Scalar_) – N-D tensor or a Scalar containing the search value(s). Keyword Arguments * **out_int32** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – indicate the output data type. torch.int32 if True, torch.int64 otherwise. Default value is False, i.e. default output data type is torch.int64. * **right** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – if False, return the first suitable location that is found. If True, return the last such index. If no suitable index is found, return 0 for non-numerical value (e.g. nan, inf) or the size of _innermost_ dimension within `sorted_sequence` (one past the last index of the _innermost_ dimension). In other words, if False, gets the lower bound index for each value in `values` on the corresponding _innermost_ dimension of the `sorted_sequence`. If True, gets the upper bound index instead. Default value is False. * **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor, must be the same size as `values` if provided. Note If your use case is always 1-D sorted sequence, [`torch.bucketize()`](torch.bucketize#torch.bucketize "torch.bucketize") is preferred, because it has fewer dimension checks resulting in slightly better performance.
Example: >>> sorted_sequence = torch.tensor([[1, 3, 5, 7, 9], [2, 4, 6, 8, 10]]) >>> sorted_sequence tensor([[ 1, 3, 5, 7, 9], [ 2, 4, 6, 8, 10]]) >>> values = torch.tensor([[3, 6, 9], [3, 6, 9]]) >>> values tensor([[3, 6, 9], [3, 6, 9]]) >>> torch.searchsorted(sorted_sequence, values) tensor([[1, 3, 4], [1, 2, 4]]) >>> torch.searchsorted(sorted_sequence, values, right=True) tensor([[2, 3, 5], [1, 3, 4]]) >>> sorted_sequence_1d = torch.tensor([1, 3, 5, 7, 9]) >>> sorted_sequence_1d tensor([1, 3, 5, 7, 9]) >>> torch.searchsorted(sorted_sequence_1d, values) tensor([[1, 3, 4], [1, 3, 4]]) # torch.seed `torch.seed()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/random.html#seed) Sets the seed for generating random numbers to a non-deterministic random number. Returns a 64 bit number used to seed the RNG. # torch.set_default_dtype `torch.set_default_dtype(d)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch.html#set_default_dtype) Sets the default floating point dtype to `d`. This dtype is: 1. The inferred dtype for python floats in [`torch.tensor()`](torch.tensor#torch.tensor "torch.tensor"). 2. Used to infer dtype for python complex numbers. The default complex dtype is set to `torch.complex128` if default floating point dtype is `torch.float64`, otherwise it’s set to `torch.complex64` The default floating point dtype is initially `torch.float32`. Parameters **d** ([`torch.dtype`](../tensor_attributes#torch.torch.dtype "torch.torch.dtype")) – the floating point dtype to make the default #### Example >>> # initial default for floating point is torch.float32 >>> torch.tensor([1.2, 3]).dtype torch.float32 >>> # initial default for floating point is torch.complex64 >>> torch.tensor([1.2, 3j]).dtype torch.complex64 >>> torch.set_default_dtype(torch.float64) >>> torch.tensor([1.2, 3]).dtype # a new floating point tensor torch.float64 >>> torch.tensor([1.2, 3j]).dtype # a new complex tensor torch.complex128 # torch.set_default_tensor_type `torch.set_default_tensor_type(t)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch.html#set_default_tensor_type) Sets the default `torch.Tensor` type to floating point tensor type `t`. This type will also be used as default floating point type for type inference in [`torch.tensor()`](torch.tensor#torch.tensor "torch.tensor"). The default floating point tensor type is initially `torch.FloatTensor`. Parameters **t** ([type](https://docs.python.org/3/library/functions.html#type "\(in Python v3.9\)") _or_ _string_) – the floating point tensor type or its name Example: >>> torch.tensor([1.2, 3]).dtype # initial default for floating point is torch.float32 torch.float32 >>> torch.set_default_tensor_type(torch.DoubleTensor) >>> torch.tensor([1.2, 3]).dtype # a new floating point tensor torch.float64 # torch.set_flush_denormal `torch.set_flush_denormal(mode) → bool` Disables denormal floating numbers on CPU. Returns `True` if your system supports flushing denormal numbers and it successfully configures flush denormal mode. `set_flush_denormal()` is only supported on x86 architectures supporting SSE3. 
Parameters **mode** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – Controls whether to enable flush denormal mode or not Example: >>> torch.set_flush_denormal(True) True >>> torch.tensor([1e-323], dtype=torch.float64) tensor([ 0.], dtype=torch.float64) >>> torch.set_flush_denormal(False) True >>> torch.tensor([1e-323], dtype=torch.float64) tensor(9.88131e-324 * [ 1.0000], dtype=torch.float64) # set_grad_enabled `class torch.set_grad_enabled(mode)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/autograd/grad_mode.html#set_grad_enabled) Context-manager that sets gradient calculation to on or off. `set_grad_enabled` will enable or disable grads based on its argument `mode`. It can be used as a context manager or as a function. This context manager is thread local; it will not affect computation in other threads. Parameters **mode** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – Flag whether to enable grad (`True`), or disable (`False`). This can be used to conditionally enable gradients. Example: >>> x = torch.tensor([1], requires_grad=True) >>> is_train = False >>> with torch.set_grad_enabled(is_train): ... y = x * 2 >>> y.requires_grad False >>> torch.set_grad_enabled(True) >>> y = x * 2 >>> y.requires_grad True >>> torch.set_grad_enabled(False) >>> y = x * 2 >>> y.requires_grad False # torch.set_num_interop_threads `torch.set_num_interop_threads(int)` Sets the number of threads used for interop parallelism (e.g. in JIT interpreter) on CPU. Warning Can only be called once and before any inter-op parallel work is started (e.g. JIT execution). # torch.set_num_threads `torch.set_num_threads(int)` Sets the number of threads used for intraop parallelism on CPU. Warning To ensure that the correct number of threads is used, set_num_threads must be called before running eager, JIT or autograd code. # torch.set_printoptions `torch.set_printoptions(precision=None, threshold=None, edgeitems=None, linewidth=None, profile=None, sci_mode=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/_tensor_str.html#set_printoptions) Set options for printing. Items shamelessly taken from NumPy. Parameters * **precision** – Number of digits of precision for floating point output (default = 4). * **threshold** – Total number of array elements which trigger summarization rather than full `repr` (default = 1000). * **edgeitems** – Number of array items in summary at beginning and end of each dimension (default = 3). * **linewidth** – The number of characters per line for the purpose of inserting line breaks (default = 80). Thresholded matrices will ignore this parameter. * **profile** – Sane defaults for pretty printing. Can override with any of the above options. (any one of `default`, `short`, `full`) * **sci_mode** – Enable (True) or disable (False) scientific notation. If None (default) is specified, the value is defined by `torch._tensor_str._Formatter`. This value is automatically chosen by the framework. # torch.set_rng_state `torch.set_rng_state(new_state)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/random.html#set_rng_state) Sets the random number generator state. Parameters **new_state** (_torch.ByteTensor_) – The desired state # torch.sgn `torch.sgn(input, *, out=None) → Tensor` For complex tensors, this function returns a new tensor whose elements have the same angle as that of the elements of `input` and absolute value 1.
For a non-complex tensor, this function returns the signs of the elements of `input` (see [`torch.sign()`](torch.sign#torch.sign "torch.sign")). outi=0\text{out}_{i} = 0 , if ∣inputi∣==0|{\text{{input}}_i}| == 0 outi=inputi∣inputi∣\text{out}_{i} = \frac{{\text{{input}}_i}}{|{\text{{input}}_i}|} , otherwise Parameters **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. Example: >>> x=torch.tensor([3+4j, 7-24j, 0, 1+2j]) >>> x.sgn() tensor([0.6000+0.8000j, 0.2800-0.9600j, 0.0000+0.0000j, 0.4472+0.8944j]) # torch.sigmoid `torch.sigmoid(input, *, out=None) → Tensor` Returns a new tensor with the sigmoid of the elements of `input`. outi=11+e−inputi\text{out}_{i} = \frac{1}{1 + e^{-\text{input}_{i}}} Parameters **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. Example: >>> a = torch.randn(4) >>> a tensor([ 0.9213, 1.0887, -0.8858, -1.7683]) >>> torch.sigmoid(a) tensor([ 0.7153, 0.7481, 0.2920, 0.1458]) # torch.sign `torch.sign(input, *, out=None) → Tensor` Returns a new tensor with the signs of the elements of `input`. outi=sgn⁡(inputi)\text{out}_{i} = \operatorname{sgn}(\text{input}_{i}) Parameters **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. Example: >>> a = torch.tensor([0.7, -1.2, 0., 2.3]) >>> a tensor([ 0.7000, -1.2000, 0.0000, 2.3000]) >>> torch.sign(a) tensor([ 1., -1., 0., 1.]) # torch.signbit `torch.signbit(input, *, out=None) → Tensor` Tests if each element of `input` has its sign bit set (is less than zero) or not. Parameters **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. Example: >>> a = torch.tensor([0.7, -1.2, 0., 2.3]) >>> torch.signbit(a) tensor([ False, True, False, False]) # torch.sin `torch.sin(input, *, out=None) → Tensor` Returns a new tensor with the sine of the elements of `input`. outi=sin⁡(inputi)\text{out}_{i} = \sin(\text{input}_{i}) Parameters **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. Example: >>> a = torch.randn(4) >>> a tensor([-0.5461, 0.1347, -2.7266, -0.2746]) >>> torch.sin(a) tensor([-0.5194, 0.1343, -0.4032, -0.2711]) # torch.sinc `torch.sinc(input, *, out=None) → Tensor` Computes the normalized sinc of `input.` outi={1,if inputi=0sin⁡(πinputi)/(πinputi),otherwise\text{out}_{i} = \begin{cases} 1, & \text{if}\ \text{input}_{i}=0 \\\ \sin(\pi \text{input}_{i}) / (\pi \text{input}_{i}), & \text{otherwise} \end{cases} Parameters **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. Example: >>> a = torch.randn(4) >>> a tensor([ 0.2252, -0.2948, 1.0267, -1.1566]) >>> torch.sinc(a) tensor([ 0.9186, 0.8631, -0.0259, -0.1300]) # torch.sinh `torch.sinh(input, *, out=None) → Tensor` Returns a new tensor with the hyperbolic sine of the elements of `input`. 
\text{out}_{i} = \sinh(\text{input}_{i}) Parameters **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. Example: >>> a = torch.randn(4) >>> a tensor([ 0.5380, -0.8632, -0.1265, 0.9399]) >>> torch.sinh(a) tensor([ 0.5644, -0.9744, -0.1268, 1.0845]) Note When `input` is on the CPU, the implementation of torch.sinh may use the Sleef library, which rounds very large results to infinity or negative infinity. See [here](https://sleef.org/purec.xhtml) for details. # torch.slogdet `torch.slogdet(input) -> (Tensor, Tensor)` Calculates the sign and log absolute value of the determinant(s) of a square matrix or batches of square matrices. Note `torch.slogdet()` is deprecated. Please use [`torch.linalg.slogdet()`](../linalg#torch.linalg.slogdet "torch.linalg.slogdet") instead. Note If `input` has zero determinant, this returns `(0, -inf)`. Note Backward through `slogdet()` internally uses SVD results when `input` is not invertible. In this case, double backward through `slogdet()` will be unstable when `input` doesn’t have distinct singular values. See [`svd()`](torch.svd#torch.svd "torch.svd") for details. Parameters **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor of size `(*, n, n)` where `*` is zero or more batch dimensions. Returns A namedtuple (sign, logabsdet) containing the sign of the determinant, and the log value of the absolute determinant. Example: >>> A = torch.randn(3, 3) >>> A tensor([[ 0.0032, -0.2239, -1.1219], [-0.6690, 0.1161, 0.4053], [-1.6218, -0.9273, -0.0082]]) >>> torch.det(A) tensor(-0.7576) >>> torch.logdet(A) tensor(nan) >>> torch.slogdet(A) torch.return_types.slogdet(sign=tensor(-1.), logabsdet=tensor(-0.2776)) # torch.solve `torch.solve(input, A, *, out=None) -> (Tensor, Tensor)` This function returns the solution to the system of linear equations represented by AX = B and the LU factorization of A, in order as a namedtuple `solution, LU`. `LU` contains `L` and `U` factors for LU factorization of `A`. `torch.solve(B, A)` can take in 2D inputs `B, A` or inputs that are batches of 2D matrices. If the inputs are batches, then returns batched outputs `solution, LU`. Supports real-valued and complex-valued inputs. Note Irrespective of the original strides, the returned matrices `solution` and `LU` will be transposed, i.e. with strides like `B.contiguous().transpose(-1, -2).stride()` and `A.contiguous().transpose(-1, -2).stride()` respectively. Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – input matrix B of size (*, m, k), where * is zero or more batch dimensions. * **A** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – input square matrix of size (*, m, m), where * is zero or more batch dimensions. Keyword Arguments **out** (_(_[Tensor](../tensors#torch.Tensor "torch.Tensor") _,_[Tensor](../tensors#torch.Tensor "torch.Tensor") _)__,__optional_) – optional output tuple. Example: >>> A = torch.tensor([[6.80, -2.11, 5.66, 5.97, 8.23], ... [-6.05, -3.30, 5.36, -4.44, 1.08], ... [-0.45, 2.58, -2.70, 0.27, 9.04], ... [8.32, 2.71, 4.35, -7.17, 2.14], ... [-9.67, -5.14, -7.26, 6.08, -6.87]]).t() >>> B = torch.tensor([[4.02, 6.19, -8.22, -7.57, -3.03], ... [-1.56, 4.00, -8.67, 1.75, 2.86], ...
[9.81, -4.09, -4.57, -8.61, 8.99]]).t() >>> X, LU = torch.solve(B, A) >>> torch.dist(B, torch.mm(A, X)) tensor(1.00000e-06 * 7.0977) >>> # Batched solver example >>> A = torch.randn(2, 3, 1, 4, 4) >>> B = torch.randn(2, 3, 1, 4, 6) >>> X, LU = torch.solve(B, A) >>> torch.dist(B, A.matmul(X)) tensor(1.00000e-06 * 3.6386) # torch.sort `torch.sort(input, dim=-1, descending=False, *, out=None) -> (Tensor, LongTensor)` Sorts the elements of the `input` tensor along a given dimension in ascending order by value. If `dim` is not given, the last dimension of the `input` is chosen. If `descending` is `True` then the elements are sorted in descending order by value. A namedtuple of (values, indices) is returned, where the `values` are the sorted values and `indices` are the indices of the elements in the original `input` tensor. Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. * **dim** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – the dimension to sort along * **descending** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – controls the sorting order (ascending or descending) Keyword Arguments **out** ([tuple](https://docs.python.org/3/library/stdtypes.html#tuple "\(in Python v3.9\)") _,__optional_) – the output tuple of (`Tensor`, `LongTensor`) that can be optionally given to be used as output buffers Example: >>> x = torch.randn(3, 4) >>> sorted, indices = torch.sort(x) >>> sorted tensor([[-0.2162, 0.0608, 0.6719, 2.3332], [-0.5793, 0.0061, 0.6058, 0.9497], [-0.5071, 0.3343, 0.9553, 1.0960]]) >>> indices tensor([[ 1, 0, 2, 3], [ 3, 1, 0, 2], [ 0, 3, 1, 2]]) >>> sorted, indices = torch.sort(x, 0) >>> sorted tensor([[-0.5071, -0.2162, 0.6719, -0.5793], [ 0.0608, 0.0061, 0.9497, 0.3343], [ 0.6058, 0.9553, 1.0960, 2.3332]]) >>> indices tensor([[ 2, 0, 0, 1], [ 0, 1, 1, 2], [ 1, 2, 2, 0]]) # torch.sparse_coo_tensor `torch.sparse_coo_tensor(indices, values, size=None, *, dtype=None, device=None, requires_grad=False) → Tensor` Constructs a [sparse tensor in COO(rdinate) format](../sparse#sparse-coo-docs) with specified values at the given `indices`. Note This function returns an [uncoalesced tensor](../sparse#sparse-uncoalesced- coo-docs). Parameters * **indices** (_array_like_) – Initial data for the tensor. Can be a list, tuple, NumPy `ndarray`, scalar, and other types. Will be cast to a `torch.LongTensor` internally. The indices are the coordinates of the non-zero values in the matrix, and thus should be two-dimensional where the first dimension is the number of tensor dimensions and the second dimension is the number of non-zero values. * **values** (_array_like_) – Initial values for the tensor. Can be a list, tuple, NumPy `ndarray`, scalar, and other types. * **size** (list, tuple, or `torch.Size`, optional) – Size of the sparse tensor. If not provided the size will be inferred as the minimum size big enough to hold all non-zero elements. Keyword Arguments * **dtype** ([`torch.dtype`](../tensor_attributes#torch.torch.dtype "torch.torch.dtype"), optional) – the desired data type of returned tensor. Default: if None, infers data type from `values`. * **device** ([`torch.device`](../tensor_attributes#torch.torch.device "torch.torch.device"), optional) – the desired device of returned tensor. 
Default: if None, uses the current device for the default tensor type (see [`torch.set_default_tensor_type()`](torch.set_default_tensor_type#torch.set_default_tensor_type "torch.set_default_tensor_type")). `device` will be the CPU for CPU tensor types and the current CUDA device for CUDA tensor types. * **requires_grad** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – If autograd should record operations on the returned tensor. Default: `False`. Example: >>> i = torch.tensor([[0, 1, 1], ... [2, 0, 2]]) >>> v = torch.tensor([3, 4, 5], dtype=torch.float32) >>> torch.sparse_coo_tensor(i, v, [2, 4]) tensor(indices=tensor([[0, 1, 1], [2, 0, 2]]), values=tensor([3., 4., 5.]), size=(2, 4), nnz=3, layout=torch.sparse_coo) >>> torch.sparse_coo_tensor(i, v) # Shape inference tensor(indices=tensor([[0, 1, 1], [2, 0, 2]]), values=tensor([3., 4., 5.]), size=(2, 3), nnz=3, layout=torch.sparse_coo) >>> torch.sparse_coo_tensor(i, v, [2, 4], ... dtype=torch.float64, ... device=torch.device('cuda:0')) tensor(indices=tensor([[0, 1, 1], [2, 0, 2]]), values=tensor([3., 4., 5.]), device='cuda:0', size=(2, 4), nnz=3, dtype=torch.float64, layout=torch.sparse_coo) # Create an empty sparse tensor with the following invariants: # 1. sparse_dim + dense_dim = len(SparseTensor.shape) # 2. SparseTensor._indices().shape = (sparse_dim, nnz) # 3. SparseTensor._values().shape = (nnz, SparseTensor.shape[sparse_dim:]) # # For instance, to create an empty sparse tensor with nnz = 0, dense_dim = 0 and # sparse_dim = 1 (hence indices is a 2D tensor of shape = (1, 0)) >>> S = torch.sparse_coo_tensor(torch.empty([1, 0]), [], [1]) tensor(indices=tensor([], size=(1, 0)), values=tensor([], size=(0,)), size=(1,), nnz=0, layout=torch.sparse_coo) # and to create an empty sparse tensor with nnz = 0, dense_dim = 1 and # sparse_dim = 1 >>> S = torch.sparse_coo_tensor(torch.empty([1, 0]), torch.empty([0, 2]), [1, 2]) tensor(indices=tensor([], size=(1, 0)), values=tensor([], size=(0, 2)), size=(1, 2), nnz=0, layout=torch.sparse_coo) # torch.split `torch.split(tensor, split_size_or_sections, dim=0)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/functional.html#split) Splits the tensor into chunks. Each chunk is a view of the original tensor. If `split_size_or_sections` is an integer type, then [`tensor`](torch.tensor#torch.tensor "torch.tensor") will be split into equally sized chunks (if possible). Last chunk will be smaller if the tensor size along the given dimension `dim` is not divisible by `split_size`. If `split_size_or_sections` is a list, then [`tensor`](torch.tensor#torch.tensor "torch.tensor") will be split into `len(split_size_or_sections)` chunks with sizes in `dim` according to `split_size_or_sections`. Parameters * **tensor** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – tensor to split. * **split_size_or_sections** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _) or_ _(_[list](https://docs.python.org/3/library/stdtypes.html#list "\(in Python v3.9\)") _(_[int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _)_) – size of a single chunk or list of sizes for each chunk * **dim** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – dimension along which to split the tensor. 
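Because each chunk is a view of `tensor`, writing into a chunk also modifies the original tensor. The following is an illustrative sketch of that behavior (it is not part of the reference example, which follows below):

>>> import torch
>>> t = torch.arange(6)
>>> first, second = torch.split(t, 3)
>>> first.fill_(-1)        # in-place write through the view
tensor([-1, -1, -1])
>>> t                      # the original tensor reflects the change
tensor([-1, -1, -1,  3,  4,  5])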
Example:: >>> a = torch.arange(10).reshape(5,2) >>> a tensor([[0, 1], [2, 3], [4, 5], [6, 7], [8, 9]]) >>> torch.split(a, 2) (tensor([[0, 1], [2, 3]]), tensor([[4, 5], [6, 7]]), tensor([[8, 9]])) >>> torch.split(a, [1,4]) (tensor([[0, 1]]), tensor([[2, 3], [4, 5], [6, 7], [8, 9]])) # torch.sqrt `torch.sqrt(input, *, out=None) → Tensor` Returns a new tensor with the square-root of the elements of `input`. outi=inputi\text{out}_{i} = \sqrt{\text{input}_{i}} Parameters **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. Example: >>> a = torch.randn(4) >>> a tensor([-2.0755, 1.0226, 0.0831, 0.4806]) >>> torch.sqrt(a) tensor([ nan, 1.0112, 0.2883, 0.6933]) # torch.square `torch.square(input, *, out=None) → Tensor` Returns a new tensor with the square of the elements of `input`. Parameters **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. Example: >>> a = torch.randn(4) >>> a tensor([-2.0755, 1.0226, 0.0831, 0.4806]) >>> torch.square(a) tensor([ 4.3077, 1.0457, 0.0069, 0.2310]) # torch.squeeze `torch.squeeze(input, dim=None, *, out=None) → Tensor` Returns a tensor with all the dimensions of `input` of size `1` removed. For example, if `input` is of shape: (A×1×B×C×1×D)(A \times 1 \times B \times C \times 1 \times D) then the `out` tensor will be of shape: (A×B×C×D)(A \times B \times C \times D) . When `dim` is given, a squeeze operation is done only in the given dimension. If `input` is of shape: (A×1×B)(A \times 1 \times B) , `squeeze(input, 0)` leaves the tensor unchanged, but `squeeze(input, 1)` will squeeze the tensor to the shape (A×B)(A \times B) . Note The returned tensor shares the storage with the input tensor, so changing the contents of one will change the contents of the other. Warning If the tensor has a batch dimension of size 1, then `squeeze(input)` will also remove the batch dimension, which can lead to unexpected errors. Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. * **dim** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – if given, the input will be squeezed only in this dimension Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. Example: >>> x = torch.zeros(2, 1, 2, 1, 2) >>> x.size() torch.Size([2, 1, 2, 1, 2]) >>> y = torch.squeeze(x) >>> y.size() torch.Size([2, 2, 2]) >>> y = torch.squeeze(x, 0) >>> y.size() torch.Size([2, 1, 2, 1, 2]) >>> y = torch.squeeze(x, 1) >>> y.size() torch.Size([2, 2, 1, 2]) # torch.stack `torch.stack(tensors, dim=0, *, out=None) → Tensor` Concatenates a sequence of tensors along a new dimension. All tensors need to be of the same size. Parameters * **tensors** (_sequence of Tensors_) – sequence of tensors to concatenate * **dim** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – dimension to insert. Has to be between 0 and the number of dimensions of concatenated tensors (inclusive) Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. # torch.std `torch.std(input, unbiased=True) → Tensor` Returns the standard-deviation of all elements in the `input` tensor. 
If `unbiased` is `False`, then the standard-deviation will be calculated via the biased estimator. Otherwise, Bessel’s correction will be used. Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. * **unbiased** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – whether to use the unbiased estimation or not Example: >>> a = torch.randn(1, 3) >>> a tensor([[-0.8166, -1.3802, -0.3560]]) >>> torch.std(a) tensor(0.5130) `torch.std(input, dim, unbiased=True, keepdim=False, *, out=None) → Tensor` Returns the standard-deviation of each row of the `input` tensor in the dimension `dim`. If `dim` is a list of dimensions, reduce over all of them. If `keepdim` is `True`, the output tensor is of the same size as `input` except in the dimension(s) `dim` where it is of size 1. Otherwise, `dim` is squeezed (see [`torch.squeeze()`](torch.squeeze#torch.squeeze "torch.squeeze")), resulting in the output tensor having 1 (or `len(dim)`) fewer dimension(s). If `unbiased` is `False`, then the standard-deviation will be calculated via the biased estimator. Otherwise, Bessel’s correction will be used. Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. * **dim** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _or_ _tuple of python:ints_) – the dimension or dimensions to reduce. * **unbiased** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – whether to use the unbiased estimation or not * **keepdim** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – whether the output tensor has `dim` retained or not. Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. Example: >>> a = torch.randn(4, 4) >>> a tensor([[ 0.2035, 1.2959, 1.8101, -0.4644], [ 1.5027, -0.3270, 0.5905, 0.6538], [-1.5745, 1.3330, -0.5596, -0.6548], [ 0.1264, -0.5080, 1.6420, 0.1992]]) >>> torch.std(a, dim=1) tensor([ 1.0311, 0.7477, 1.2204, 0.9087]) # torch.std_mean `torch.std_mean(input, unbiased=True) -> (Tensor, Tensor)` Returns the standard-deviation and mean of all elements in the `input` tensor. If `unbiased` is `False`, then the standard-deviation will be calculated via the biased estimator. Otherwise, Bessel’s correction will be used. Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. * **unbiased** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – whether to use the unbiased estimation or not Example: >>> a = torch.randn(1, 3) >>> a tensor([[0.3364, 0.3591, 0.9462]]) >>> torch.std_mean(a) (tensor(0.3457), tensor(0.5472)) `torch.std_mean(input, dim, unbiased=True, keepdim=False) -> (Tensor, Tensor)` Returns the standard-deviation and mean of each row of the `input` tensor in the dimension `dim`. If `dim` is a list of dimensions, reduce over all of them. If `keepdim` is `True`, the output tensor is of the same size as `input` except in the dimension(s) `dim` where it is of size 1. Otherwise, `dim` is squeezed (see [`torch.squeeze()`](torch.squeeze#torch.squeeze "torch.squeeze")), resulting in the output tensor having 1 (or `len(dim)`) fewer dimension(s). If `unbiased` is `False`, then the standard-deviation will be calculated via the biased estimator. Otherwise, Bessel’s correction will be used. Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. 
* **dim** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _or_ _tuple of python:ints_) – the dimension or dimensions to reduce. * **unbiased** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – whether to use the unbiased estimation or not * **keepdim** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – whether the output tensor has `dim` retained or not. Example: >>> a = torch.randn(4, 4) >>> a tensor([[ 0.5648, -0.5984, -1.2676, -1.4471], [ 0.9267, 1.0612, 1.1050, -0.6014], [ 0.0154, 1.9301, 0.0125, -1.0904], [-1.9711, -0.7748, -1.3840, 0.5067]]) >>> torch.std_mean(a, 1) (tensor([0.9110, 0.8197, 1.2552, 1.0608]), tensor([-0.6871, 0.6229, 0.2169, -0.9058])) # torch.stft `torch.stft(input, n_fft, hop_length=None, win_length=None, window=None, center=True, pad_mode='reflect', normalized=False, onesided=None, return_complex=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/functional.html#stft) Short-time Fourier transform (STFT). Warning From version 1.8.0, `return_complex` must always be given explicitly for real inputs and `return_complex=False` has been deprecated. Strongly prefer `return_complex=True`, as in a future PyTorch release this function will only return complex tensors. Note that [`torch.view_as_real()`](torch.view_as_real#torch.view_as_real "torch.view_as_real") can be used to recover a real tensor with an extra last dimension for real and imaginary components. The STFT computes the Fourier transform of short overlapping windows of the input, giving the frequency components of the signal as they change over time. The interface of this function is modeled after the [librosa](https://librosa.org/doc/latest/generated/librosa.stft.html) stft function. Ignoring the optional batch dimension, this method computes the following expression: X[m, \omega] = \sum_{k = 0}^{\text{win\_length}-1} \text{window}[k]\ \text{input}[m \times \text{hop\_length} + k]\ \exp\left(- j \frac{2 \pi \cdot \omega k}{\text{win\_length}}\right), where m is the index of the sliding window and \omega is the frequency, 0 \leq \omega < \text{n\_fft}. # torch.sub `torch.sub(input, other, *, alpha=1, out=None) → Tensor` Subtracts `other`, scaled by `alpha`, from `input`: \text{out}_i = \text{input}_i - \text{alpha} \times \text{other}_i Example: >>> a = torch.tensor((1, 2)) >>> b = torch.tensor((0, 1)) >>> torch.sub(a, b, alpha=2) tensor([1, 0]) # torch.subtract `torch.subtract(input, other, *, alpha=1, out=None) → Tensor` Alias for [`torch.sub()`](torch.sub#torch.sub "torch.sub"). # torch.sum `torch.sum(input, *, dtype=None) → Tensor` Returns the sum of all elements in the `input` tensor. Parameters **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. Keyword Arguments **dtype** ([`torch.dtype`](../tensor_attributes#torch.torch.dtype "torch.torch.dtype"), optional) – the desired data type of returned tensor. If specified, the input tensor is cast to `dtype` before the operation is performed. This is useful for preventing data type overflows. Default: None. Example: >>> a = torch.randn(1, 3) >>> a tensor([[ 0.1133, -0.9567, 0.2958]]) >>> torch.sum(a) tensor(-0.5475) `torch.sum(input, dim, keepdim=False, *, dtype=None) → Tensor` Returns the sum of each row of the `input` tensor in the given dimension `dim`. If `dim` is a list of dimensions, reduce over all of them. If `keepdim` is `True`, the output tensor is of the same size as `input` except in the dimension(s) `dim` where it is of size 1.
Otherwise, `dim` is squeezed (see [`torch.squeeze()`](torch.squeeze#torch.squeeze "torch.squeeze")), resulting in the output tensor having 1 (or `len(dim)`) fewer dimension(s). Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. * **dim** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _or_ _tuple of python:ints_) – the dimension or dimensions to reduce. * **keepdim** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – whether the output tensor has `dim` retained or not. Keyword Arguments **dtype** ([`torch.dtype`](../tensor_attributes#torch.torch.dtype "torch.torch.dtype"), optional) – the desired data type of returned tensor. If specified, the input tensor is casted to `dtype` before the operation is performed. This is useful for preventing data type overflows. Default: None. Example: >>> a = torch.randn(4, 4) >>> a tensor([[ 0.0569, -0.2475, 0.0737, -0.3429], [-0.2993, 0.9138, 0.9337, -1.6864], [ 0.1132, 0.7892, -0.1003, 0.5688], [ 0.3637, -0.9906, -0.4752, -1.5197]]) >>> torch.sum(a, 1) tensor([-0.4598, -0.1381, 1.3708, -2.6217]) >>> b = torch.arange(4 * 5 * 6).view(4, 5, 6) >>> torch.sum(b, (2, 1)) tensor([ 435., 1335., 2235., 3135.]) # torch.svd `torch.svd(input, some=True, compute_uv=True, *, out=None) -> (Tensor, Tensor, Tensor)` Computes the singular value decomposition of either a matrix or batch of matrices `input`. The singular value decomposition is represented as a namedtuple (`U,S,V`), such that `input` = `U` diag(`S`) `Vᴴ`, where `Vᴴ` is the transpose of `V` for the real-valued inputs, or the conjugate transpose of `V` for the complex-valued inputs. If `input` is a batch of tensors, then `U`, `S`, and `V` are also batched with the same batch dimensions as `input`. If `some` is `True` (default), the method returns the reduced singular value decomposition i.e., if the last two dimensions of `input` are `m` and `n`, then the returned `U` and `V` matrices will contain only min(`n, m`) orthonormal columns. If `compute_uv` is `False`, the returned `U` and `V` will be zero-filled matrices of shape `(m × m)` and `(n × n)` respectively, and the same device as `input`. The `some` argument has no effect when `compute_uv` is `False`. Supports input of float, double, cfloat and cdouble data types. The dtypes of `U` and `V` are the same as `input`’s. `S` will always be real-valued, even if `input` is complex. Warning `torch.svd()` is deprecated. Please use [`torch.linalg.svd()`](../linalg#torch.linalg.svd "torch.linalg.svd") instead, which is similar to NumPy’s `numpy.linalg.svd`. Note Differences with [`torch.linalg.svd()`](../linalg#torch.linalg.svd "torch.linalg.svd"): * `some` is the opposite of [`torch.linalg.svd()`](../linalg#torch.linalg.svd "torch.linalg.svd")’s `full_matricies`. Note that default value for both is `True`, so the default behavior is effectively the opposite. * `torch.svd()` returns `V`, whereas [`torch.linalg.svd()`](../linalg#torch.linalg.svd "torch.linalg.svd") returns `Vᴴ`. * If `compute_uv=False`, `torch.svd()` returns zero-filled tensors for `U` and `Vh`, whereas [`torch.linalg.svd()`](../linalg#torch.linalg.svd "torch.linalg.svd") returns empty tensors. Note The singular values are returned in descending order. If `input` is a batch of matrices, then the singular values of each matrix in the batch is returned in descending order. 
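Given the deprecation warning and the differences listed above, the following is a minimal migration sketch (illustrative only; it assumes the `torch.linalg.svd` API with the `full_matrices` argument referenced in the note above):

>>> import torch
>>> a = torch.randn(5, 3)
>>> u, s, v = torch.svd(a, some=True)                        # deprecated API; returns V
>>> u2, s2, vh = torch.linalg.svd(a, full_matrices=False)    # replacement API; returns Vᴴ
>>> v_from_linalg = vh.transpose(-2, -1).conj()              # recover V in the torch.svd convention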
Note The implementation of SVD on CPU uses the LAPACK routine `?gesdd` (a divide- and-conquer algorithm) instead of `?gesvd` for speed. Analogously, the SVD on GPU uses the cuSOLVER routines `gesvdj` and `gesvdjBatched` on CUDA 10.1.243 and later, and uses the MAGMA routine `gesdd` on earlier versions of CUDA. Note The returned matrix `U` will be transposed, i.e. with strides `U.contiguous().transpose(-2, -1).stride()`. Note Gradients computed using `U` and `V` may be unstable if `input` is not full rank or has non-unique singular values. Note When `some` = `False`, the gradients on `U[..., :, min(m, n):]` and `V[..., :, min(m, n):]` will be ignored in backward as those vectors can be arbitrary bases of the subspaces. Note The `S` tensor can only be used to compute gradients if `compute_uv` is True. Note With the complex-valued input the backward operation works correctly only for gauge invariant loss functions. Please look at [Gauge problem in AD](https://re-ra.xyz/Gauge-Problem-in-Automatic-Differentiation/) for more details. Note Since `U` and `V` of an SVD is not unique, each vector can be multiplied by an arbitrary phase factor eiϕe^{i \phi} while the SVD result is still correct. Different platforms, like Numpy, or inputs on different device types, may produce different `U` and `V` tensors. Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor of size `(*, m, n)` where `*` is zero or more batch dimensions consisting of `(m × n)` matrices. * **some** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – controls whether to compute the reduced or full decomposition, and consequently the shape of returned `U` and `V`. Defaults to True. * **compute_uv** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – option whether to compute `U` and `V` or not. Defaults to True. Keyword Arguments **out** ([tuple](https://docs.python.org/3/library/stdtypes.html#tuple "\(in Python v3.9\)") _,__optional_) – the output tuple of tensors Example: >>> a = torch.randn(5, 3) >>> a tensor([[ 0.2364, -0.7752, 0.6372], [ 1.7201, 0.7394, -0.0504], [-0.3371, -1.0584, 0.5296], [ 0.3550, -0.4022, 1.5569], [ 0.2445, -0.0158, 1.1414]]) >>> u, s, v = torch.svd(a) >>> u tensor([[ 0.4027, 0.0287, 0.5434], [-0.1946, 0.8833, 0.3679], [ 0.4296, -0.2890, 0.5261], [ 0.6604, 0.2717, -0.2618], [ 0.4234, 0.2481, -0.4733]]) >>> s tensor([2.3289, 2.0315, 0.7806]) >>> v tensor([[-0.0199, 0.8766, 0.4809], [-0.5080, 0.4054, -0.7600], [ 0.8611, 0.2594, -0.4373]]) >>> torch.dist(a, torch.mm(torch.mm(u, torch.diag(s)), v.t())) tensor(8.6531e-07) >>> a_big = torch.randn(7, 5, 3) >>> u, s, v = torch.svd(a_big) >>> torch.dist(a_big, torch.matmul(torch.matmul(u, torch.diag_embed(s)), v.transpose(-2, -1))) tensor(2.6503e-06) # torch.svd_lowrank `torch.svd_lowrank(A, q=6, niter=2, M=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/_lowrank.html#svd_lowrank) Return the singular value decomposition `(U, S, V)` of a matrix, batches of matrices, or a sparse matrix AA such that A≈Udiag(S)VTA \approx U diag(S) V^T . In case MM is given, then SVD is computed for the matrix A−MA - M . Note The implementation is based on the Algorithm 5.1 from Halko et al, 2009. Note To obtain repeatable results, reset the seed for the pseudorandom number generator Note The input is assumed to be a low-rank matrix. 
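This entry ships without an example, so the following is a minimal usage sketch (illustrative values; shapes follow the parameters listed below):

>>> import torch
>>> _ = torch.manual_seed(0)                          # reset the RNG for repeatable results (see the note above)
>>> A = torch.randn(100, 20) @ torch.randn(20, 80)    # a matrix of shape (100, 80) with rank at most 20
>>> U, S, V = torch.svd_lowrank(A, q=25)              # q slightly overestimates the rank
>>> U.shape, S.shape, V.shape
(torch.Size([100, 25]), torch.Size([25]), torch.Size([80, 25]))
>>> err = torch.dist(A, U @ torch.diag(S) @ V.t())    # small, since rank(A) <= 20 < q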
Note In general, use the full-rank SVD implementation `torch.svd` for dense matrices due to its 10-fold higher performance characteristics. The low-rank SVD will be useful for huge sparse matrices that `torch.svd` cannot handle. Parameters * **A** (_Tensor_) – the input tensor of size (*, m, n) * **q** (_int, optional_) – a slightly overestimated rank of A. * **niter** (_int, optional_) – the number of subspace iterations to conduct; niter must be a nonnegative integer, and defaults to 2 * **M** (_Tensor, optional_) – the input tensor’s mean of size (*, 1, n). References * Nathan Halko, Per-Gunnar Martinsson, and Joel Tropp, Finding structure with randomness: probabilistic algorithms for constructing approximate matrix decompositions, arXiv:0909.4061 [math.NA; math.PR], 2009 (available at [arXiv](https://arxiv.org/abs/0909.4061)). # torch.swapaxes `torch.swapaxes(input, axis0, axis1) → Tensor` Alias for [`torch.transpose()`](torch.transpose#torch.transpose "torch.transpose"). This function is equivalent to NumPy’s swapaxes function. Examples: >>> x = torch.tensor([[[0,1],[2,3]],[[4,5],[6,7]]]) >>> x tensor([[[0, 1], [2, 3]], [[4, 5], [6, 7]]]) >>> torch.swapaxes(x, 0, 1) tensor([[[0, 1], [4, 5]], [[2, 3], [6, 7]]]) >>> torch.swapaxes(x, 0, 2) tensor([[[0, 4], [2, 6]], [[1, 5], [3, 7]]]) # torch.swapdims `torch.swapdims(input, dim0, dim1) → Tensor` Alias for [`torch.transpose()`](torch.transpose#torch.transpose "torch.transpose"). This function is equivalent to NumPy’s swapaxes function. Examples: >>> x = torch.tensor([[[0,1],[2,3]],[[4,5],[6,7]]]) >>> x tensor([[[0, 1], [2, 3]], [[4, 5], [6, 7]]]) >>> torch.swapdims(x, 0, 1) tensor([[[0, 1], [4, 5]], [[2, 3], [6, 7]]]) >>> torch.swapdims(x, 0, 2) tensor([[[0, 4], [2, 6]], [[1, 5], [3, 7]]]) # torch.symeig `torch.symeig(input, eigenvectors=False, upper=True, *, out=None) -> (Tensor, Tensor)` This function returns eigenvalues and eigenvectors of a real symmetric matrix `input` or a batch of real symmetric matrices, represented by a namedtuple (eigenvalues, eigenvectors). This function calculates all eigenvalues (and vectors) of `input` such that \text{input} = V \text{diag}(e) V^T . The boolean argument `eigenvectors` defines computation of both eigenvectors and eigenvalues or eigenvalues only. If it is `False`, only eigenvalues are computed. If it is `True`, both eigenvalues and eigenvectors are computed. Since the input matrix `input` is supposed to be symmetric, only the upper triangular portion is used by default. If `upper` is `False`, then the lower triangular portion is used. Note The eigenvalues are returned in ascending order. If `input` is a batch of matrices, then the eigenvalues of each matrix in the batch are returned in ascending order. Note Irrespective of the original strides, the returned matrix `V` will be transposed, i.e. with strides `V.contiguous().transpose(-1, -2).stride()`. Warning Extra care needs to be taken when backpropagating through the outputs. Such an operation is only stable when all eigenvalues are distinct and becomes less stable the smaller \min_{i \neq j} |\lambda_i - \lambda_j| is. Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor of size (*, n, n) where `*` is zero or more batch dimensions consisting of symmetric matrices.
* **eigenvectors** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – controls whether eigenvectors have to be computed * **upper** (_boolean_ _,__optional_) – controls whether to consider upper-triangular or lower-triangular region Keyword Arguments **out** ([tuple](https://docs.python.org/3/library/stdtypes.html#tuple "\(in Python v3.9\)") _,__optional_) – the output tuple of (Tensor, Tensor) Returns A namedtuple (eigenvalues, eigenvectors) containing * **eigenvalues** (_Tensor_): Shape (∗,m)(*, m) . The eigenvalues in ascending order. * **eigenvectors** (_Tensor_): Shape (∗,m,m)(*, m, m) . If `eigenvectors=False`, it’s an empty tensor. Otherwise, this tensor contains the orthonormal eigenvectors of the `input`. Return type ([Tensor](../tensors#torch.Tensor "torch.Tensor"), [Tensor](../tensors#torch.Tensor "torch.Tensor")) Examples: >>> a = torch.randn(5, 5) >>> a = a + a.t() # To make a symmetric >>> a tensor([[-5.7827, 4.4559, -0.2344, -1.7123, -1.8330], [ 4.4559, 1.4250, -2.8636, -3.2100, -0.1798], [-0.2344, -2.8636, 1.7112, -5.5785, 7.1988], [-1.7123, -3.2100, -5.5785, -2.6227, 3.1036], [-1.8330, -0.1798, 7.1988, 3.1036, -5.1453]]) >>> e, v = torch.symeig(a, eigenvectors=True) >>> e tensor([-13.7012, -7.7497, -2.3163, 5.2477, 8.1050]) >>> v tensor([[ 0.1643, 0.9034, -0.0291, 0.3508, 0.1817], [-0.2417, -0.3071, -0.5081, 0.6534, 0.4026], [-0.5176, 0.1223, -0.0220, 0.3295, -0.7798], [-0.4850, 0.2695, -0.5773, -0.5840, 0.1337], [ 0.6415, -0.0447, -0.6381, -0.0193, -0.4230]]) >>> a_big = torch.randn(5, 2, 2) >>> a_big = a_big + a_big.transpose(-2, -1) # To make a_big symmetric >>> e, v = a_big.symeig(eigenvectors=True) >>> torch.allclose(torch.matmul(v, torch.matmul(e.diag_embed(), v.transpose(-2, -1))), a_big) True # torch.t `torch.t(input) → Tensor` Expects `input` to be <= 2-D tensor and transposes dimensions 0 and 1. 0-D and 1-D tensors are returned as is. When input is a 2-D tensor this is equivalent to `transpose(input, 0, 1)`. Parameters **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. Example: >>> x = torch.randn(()) >>> x tensor(0.1995) >>> torch.t(x) tensor(0.1995) >>> x = torch.randn(3) >>> x tensor([ 2.4320, -0.4608, 0.7702]) >>> torch.t(x) tensor([ 2.4320, -0.4608, 0.7702]) >>> x = torch.randn(2, 3) >>> x tensor([[ 0.4875, 0.9158, -0.5872], [ 0.3938, -0.6929, 0.6932]]) >>> torch.t(x) tensor([[ 0.4875, 0.3938], [ 0.9158, -0.6929], [-0.5872, 0.6932]]) # torch.take `torch.take(input, index) → Tensor` Returns a new tensor with the elements of `input` at the given indices. The input tensor is treated as if it were viewed as a 1-D tensor. The result takes the same shape as the indices. Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. * **indices** (_LongTensor_) – the indices into tensor Example: >>> src = torch.tensor([[4, 3, 5], ... [6, 7, 8]]) >>> torch.take(src, torch.tensor([0, 2, 5])) tensor([ 4, 5, 8]) # torch.tan `torch.tan(input, *, out=None) → Tensor` Returns a new tensor with the tangent of the elements of `input`. outi=tan⁡(inputi)\text{out}_{i} = \tan(\text{input}_{i}) Parameters **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. 
Example: >>> a = torch.randn(4) >>> a tensor([-1.2027, -1.7687, 0.4412, -1.3856]) >>> torch.tan(a) tensor([-2.5930, 4.9859, 0.4722, -5.3366]) # torch.tanh `torch.tanh(input, *, out=None) → Tensor` Returns a new tensor with the hyperbolic tangent of the elements of `input`. outi=tanh⁡(inputi)\text{out}_{i} = \tanh(\text{input}_{i}) Parameters **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. Example: >>> a = torch.randn(4) >>> a tensor([ 0.8986, -0.7279, 1.1745, 0.2611]) >>> torch.tanh(a) tensor([ 0.7156, -0.6218, 0.8257, 0.2553]) # torch.tensor `torch.tensor(data, *, dtype=None, device=None, requires_grad=False, pin_memory=False) → Tensor` Constructs a tensor with `data`. Warning `torch.tensor()` always copies `data`. If you have a Tensor `data` and want to avoid a copy, use [`torch.Tensor.requires_grad_()`](../tensors#torch.Tensor.requires_grad_ "torch.Tensor.requires_grad_") or [`torch.Tensor.detach()`](../autograd#torch.Tensor.detach "torch.Tensor.detach"). If you have a NumPy `ndarray` and want to avoid a copy, use [`torch.as_tensor()`](torch.as_tensor#torch.as_tensor "torch.as_tensor"). Warning When data is a tensor `x`, `torch.tensor()` reads out ‘the data’ from whatever it is passed, and constructs a leaf variable. Therefore `torch.tensor(x)` is equivalent to `x.clone().detach()` and `torch.tensor(x, requires_grad=True)` is equivalent to `x.clone().detach().requires_grad_(True)`. The equivalents using `clone()` and `detach()` are recommended. Parameters **data** (_array_like_) – Initial data for the tensor. Can be a list, tuple, NumPy `ndarray`, scalar, and other types. Keyword Arguments * **dtype** ([`torch.dtype`](../tensor_attributes#torch.torch.dtype "torch.torch.dtype"), optional) – the desired data type of returned tensor. Default: if `None`, infers data type from `data`. * **device** ([`torch.device`](../tensor_attributes#torch.torch.device "torch.torch.device"), optional) – the desired device of returned tensor. Default: if `None`, uses the current device for the default tensor type (see [`torch.set_default_tensor_type()`](torch.set_default_tensor_type#torch.set_default_tensor_type "torch.set_default_tensor_type")). `device` will be the CPU for CPU tensor types and the current CUDA device for CUDA tensor types. * **requires_grad** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – If autograd should record operations on the returned tensor. Default: `False`. * **pin_memory** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – If set, returned tensor would be allocated in the pinned memory. Works only for CPU tensors. Default: `False`. Example: >>> torch.tensor([[0.1, 1.2], [2.2, 3.1], [4.9, 5.2]]) tensor([[ 0.1000, 1.2000], [ 2.2000, 3.1000], [ 4.9000, 5.2000]]) >>> torch.tensor([0, 1]) # Type inference on data tensor([ 0, 1]) >>> torch.tensor([[0.11111, 0.222222, 0.3333333]], ... dtype=torch.float64, ... 
device=torch.device('cuda:0')) # creates a torch.cuda.DoubleTensor tensor([[ 0.1111, 0.2222, 0.3333]], dtype=torch.float64, device='cuda:0') >>> torch.tensor(3.14159) # Create a scalar (zero-dimensional tensor) tensor(3.1416) >>> torch.tensor([]) # Create an empty tensor (of size (0,)) tensor([]) # torch.tensor_split `torch.tensor_split(input, indices_or_sections, dim=0) → List of Tensors` Splits a tensor into multiple sub-tensors, all of which are views of `input`, along dimension `dim` according to the indices or number of sections specified by `indices_or_sections`. This function is based on NumPy’s [`numpy.array_split()`](https://numpy.org/doc/stable/reference/generated/numpy.array_split.html#numpy.array_split "\(in NumPy v1.20\)"). Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the tensor to split * **indices_or_sections** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,_[int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _or_[list](https://docs.python.org/3/library/stdtypes.html#list "\(in Python v3.9\)") _or_ _tuple of python:ints_) – If `indices_or_sections` is an integer `n` or a zero dimensional long tensor with value `n`, `input` is split into `n` sections along dimension `dim`. If `input` is divisible by `n` along dimension `dim`, each section will be of equal size, `input.size(dim) / n`. If `input` is not divisible by `n`, the sizes of the first `int(input.size(dim) % n)` sections will have size `int(input.size(dim) / n) + 1`, and the rest will have size `int(input.size(dim) / n)`. If `indices_or_sections` is a list or tuple of ints, or a one-dimensional long tensor, then `input` is split along dimension `dim` at each of the indices in the list, tuple or tensor. For instance, `indices_or_sections=[2, 3]` and `dim=0` would result in the tensors `input[:2]`, `input[2:3]`, and `input[3:]`. If indices_or_sections is a tensor, it must be a zero-dimensional or one- dimensional long tensor on the CPU. * **dim** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – dimension along which to split the tensor. Default: `0` Example:: >>> x = torch.arange(8) >>> torch.tensor_split(x, 3) (tensor([0, 1, 2]), tensor([3, 4, 5]), tensor([6, 7])) >>> x = torch.arange(7) >>> torch.tensor_split(x, 3) (tensor([0, 1, 2]), tensor([3, 4]), tensor([5, 6])) >>> torch.tensor_split(x, (1, 6)) (tensor([0]), tensor([1, 2, 3, 4, 5]), tensor([6])) >>> x = torch.arange(14).reshape(2, 7) >>> x tensor([[ 0, 1, 2, 3, 4, 5, 6], [ 7, 8, 9, 10, 11, 12, 13]]) >>> torch.tensor_split(x, 3, dim=1) (tensor([[0, 1, 2], [7, 8, 9]]), tensor([[ 3, 4], [10, 11]]), tensor([[ 5, 6], [12, 13]])) >>> torch.tensor_split(x, (1, 6), dim=1) (tensor([[0], [7]]), tensor([[ 1, 2, 3, 4, 5], [ 8, 9, 10, 11, 12]]), tensor([[ 6], [13]])) # torch.tensordot `torch.tensordot(a, b, dims=2, out=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/functional.html#tensordot) Returns a contraction of a and b over multiple dimensions. `tensordot` implements a generalized matrix product. 
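For intuition: with 2-D inputs and a single contracted dimension, `tensordot` reduces to an ordinary matrix product. An illustrative sketch (not part of the reference examples further below):

>>> import torch
>>> a = torch.randn(2, 3)
>>> b = torch.randn(3, 4)
>>> torch.allclose(torch.tensordot(a, b, dims=1), a @ b)           # contract last dim of a with first dim of b
True
>>> torch.allclose(torch.tensordot(a, b, dims=([1], [0])), a @ b)  # the explicit list form names the same pair
True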
Parameters * **a** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – Left tensor to contract * **b** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – Right tensor to contract * **dims** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _or_ _Tuple_ _[__List_ _[_[int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _]__]__containing two lists_) – number of dimensions to contract or explicit lists of dimensions for `a` and `b` respectively When called with a non-negative integer argument `dims` = dd , and the number of dimensions of `a` and `b` is mm and nn , respectively, `tensordot()` computes ri0,...,im−d,id,...,in=∑k0,...,kd−1ai0,...,im−d,k0,...,kd−1×bk0,...,kd−1,id,...,in.r_{i_0,...,i_{m-d}, i_d,...,i_n} = \sum_{k_0,...,k_{d-1}} a_{i_0,...,i_{m-d},k_0,...,k_{d-1}} \times b_{k_0,...,k_{d-1}, i_d,...,i_n}. When called with `dims` of the list form, the given dimensions will be contracted in place of the last dd of `a` and the first dd of bb . The sizes in these dimensions must match, but `tensordot()` will deal with broadcasted dimensions. Examples: >>> a = torch.arange(60.).reshape(3, 4, 5) >>> b = torch.arange(24.).reshape(4, 3, 2) >>> torch.tensordot(a, b, dims=([1, 0], [0, 1])) tensor([[4400., 4730.], [4532., 4874.], [4664., 5018.], [4796., 5162.], [4928., 5306.]]) >>> a = torch.randn(3, 4, 5, device='cuda') >>> b = torch.randn(4, 5, 6, device='cuda') >>> c = torch.tensordot(a, b, dims=2).cpu() tensor([[ 8.3504, -2.5436, 6.2922, 2.7556, -1.0732, 3.2741], [ 3.3161, 0.0704, 5.0187, -0.4079, -4.3126, 4.8744], [ 0.8223, 3.9445, 3.2168, -0.2400, 3.4117, 1.7780]]) >>> a = torch.randn(3, 5, 4, 6) >>> b = torch.randn(6, 4, 5, 3) >>> torch.tensordot(a, b, dims=([2, 1, 3], [1, 2, 0])) tensor([[ 7.7193, -2.4867, -10.3204], [ 1.5513, -14.4737, -6.5113], [ -0.2850, 4.2573, -3.5997]]) # torch.tile `torch.tile(input, reps) → Tensor` Constructs a tensor by repeating the elements of `input`. The `reps` argument specifies the number of repetitions in each dimension. If `reps` specifies fewer dimensions than `input` has, then ones are prepended to `reps` until all dimensions are specified. For example, if `input` has shape (8, 6, 4, 2) and `reps` is (2, 2), then `reps` is treated as (1, 1, 2, 2). Analogously, if `input` has fewer dimensions than `reps` specifies, then `input` is treated as if it were unsqueezed at dimension zero until it has as many dimensions as `reps` specifies. For example, if `input` has shape (4, 2) and `reps` is (3, 3, 2, 2), then `input` is treated as if it had the shape (1, 1, 4, 2). Note This function is similar to NumPy’s tile function. Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the tensor whose elements to repeat. * **reps** ([tuple](https://docs.python.org/3/library/stdtypes.html#tuple "\(in Python v3.9\)")) – the number of repetitions per dimension. Example: >>> x = torch.tensor([1, 2, 3]) >>> x.tile((2,)) tensor([1, 2, 3, 1, 2, 3]) >>> y = torch.tensor([[1, 2], [3, 4]]) >>> torch.tile(y, (2, 2)) tensor([[1, 2, 1, 2], [3, 4, 3, 4], [1, 2, 1, 2], [3, 4, 3, 4]]) # torch.topk `torch.topk(input, k, dim=None, largest=True, sorted=True, *, out=None) -> (Tensor, LongTensor)` Returns the `k` largest elements of the given `input` tensor along a given dimension. If `dim` is not given, the last dimension of the `input` is chosen. If `largest` is `False` then the `k` smallest elements are returned. 
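For example (an illustrative sketch; the reference example below uses the default `largest=True`):

>>> import torch
>>> x = torch.arange(1., 6.)
>>> torch.topk(x, 3, largest=False)   # the 3 smallest elements, sorted
torch.return_types.topk(values=tensor([1., 2., 3.]), indices=tensor([0, 1, 2]))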
A namedtuple of `(values, indices)` is returned, where the `indices` are the indices of the elements in the original `input` tensor. The boolean option `sorted` if `True`, will make sure that the returned `k` elements are themselves sorted Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. * **k** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – the k in “top-k” * **dim** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – the dimension to sort along * **largest** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – controls whether to return largest or smallest elements * **sorted** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – controls whether to return the elements in sorted order Keyword Arguments **out** ([tuple](https://docs.python.org/3/library/stdtypes.html#tuple "\(in Python v3.9\)") _,__optional_) – the output tuple of (Tensor, LongTensor) that can be optionally given to be used as output buffers Example: >>> x = torch.arange(1., 6.) >>> x tensor([ 1., 2., 3., 4., 5.]) >>> torch.topk(x, 3) torch.return_types.topk(values=tensor([5., 4., 3.]), indices=tensor([4, 3, 2])) # torch.trace `torch.trace(input) → Tensor` Returns the sum of the elements of the diagonal of the input 2-D matrix. Example: >>> x = torch.arange(1., 10.).view(3, 3) >>> x tensor([[ 1., 2., 3.], [ 4., 5., 6.], [ 7., 8., 9.]]) >>> torch.trace(x) tensor(15.) # torch.transpose `torch.transpose(input, dim0, dim1) → Tensor` Returns a tensor that is a transposed version of `input`. The given dimensions `dim0` and `dim1` are swapped. The resulting `out` tensor shares its underlying storage with the `input` tensor, so changing the content of one would change the content of the other. Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. * **dim0** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – the first dimension to be transposed * **dim1** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – the second dimension to be transposed Example: >>> x = torch.randn(2, 3) >>> x tensor([[ 1.0028, -0.9893, 0.5809], [-0.1669, 0.7299, 0.4942]]) >>> torch.transpose(x, 0, 1) tensor([[ 1.0028, -0.1669], [-0.9893, 0.7299], [ 0.5809, 0.4942]]) # torch.trapz `torch.trapz(y, x, *, dim=-1) → Tensor` Estimate ∫ydx\int y\,dx along `dim`, using the trapezoid rule. Parameters * **y** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – The values of the function to integrate * **x** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – The points at which the function `y` is sampled. If `x` is not in ascending order, intervals on which it is decreasing contribute negatively to the estimated integral (i.e., the convention ∫abf=−∫baf\int_a^b f = -\int_b^a f is followed). * **dim** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – The dimension along which to integrate. By default, use the last dimension. Returns A Tensor with the same shape as the input, except with `dim` removed. Each element of the returned tensor represents the estimated integral ∫ydx\int y\,dx along `dim`. 
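As a hand-checkable sketch, independent of the reference example that follows: for y = [1, 2, 3] sampled at x = [0, 1, 2], the trapezoid rule gives (1 + 2)/2 + (2 + 3)/2 = 4.

>>> import torch
>>> y = torch.tensor([1., 2., 3.])
>>> x = torch.tensor([0., 1., 2.])
>>> torch.trapz(y, x)
tensor(4.)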
Example: >>> y = torch.randn((2, 3)) >>> y tensor([[-2.1156, 0.6857, -0.2700], [-1.2145, 0.5540, 2.0431]]) >>> x = torch.tensor([[1, 3, 4], [1, 2, 3]]) >>> torch.trapz(y, x) tensor([-1.2220, 0.9683]) `torch.trapz(y, *, dx=1, dim=-1) → Tensor` As above, but the sample points are spaced uniformly at a distance of `dx`. Parameters **y** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – The values of the function to integrate Keyword Arguments * **dx** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)")) – The distance between points at which `y` is sampled. * **dim** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – The dimension along which to integrate. By default, use the last dimension. Returns A Tensor with the same shape as the input, except with `dim` removed. Each element of the returned tensor represents the estimated integral ∫ydx\int y\,dx along `dim`. # torch.triangular_solve `torch.triangular_solve(input, A, upper=True, transpose=False, unitriangular=False) -> (Tensor, Tensor)` Solves a system of equations with a triangular coefficient matrix AA and multiple right-hand sides bb . In particular, solves AX=bAX = b and assumes AA is upper-triangular with the default keyword arguments. `torch.triangular_solve(b, A)` can take in 2D inputs `b, A` or inputs that are batches of 2D matrices. If the inputs are batches, then returns batched outputs `X` Supports real-valued and complex-valued inputs. Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – multiple right-hand sides of size (∗,m,k)(*, m, k) where ∗* is zero of more batch dimensions (bb ) * **A** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input triangular coefficient matrix of size (∗,m,m)(*, m, m) where ∗* is zero or more batch dimensions * **upper** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – whether to solve the upper-triangular system of equations (default) or the lower-triangular system of equations. Default: `True`. * **transpose** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – whether AA should be transposed before being sent into the solver. Default: `False`. * **unitriangular** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – whether AA is unit triangular. If True, the diagonal elements of AA are assumed to be 1 and not referenced from AA . Default: `False`. Returns A namedtuple `(solution, cloned_coefficient)` where `cloned_coefficient` is a clone of AA and `solution` is the solution XX to AX=bAX = b (or whatever variant of the system of equations, depending on the keyword arguments.) Examples: >>> A = torch.randn(2, 2).triu() >>> A tensor([[ 1.1527, -1.0753], [ 0.0000, 0.7986]]) >>> b = torch.randn(2, 3) >>> b tensor([[-0.0210, 2.3513, -1.5492], [ 1.5429, 0.7403, -1.0243]]) >>> torch.triangular_solve(b, A) torch.return_types.triangular_solve( solution=tensor([[ 1.7841, 2.9046, -2.5405], [ 1.9320, 0.9270, -1.2826]]), cloned_coefficient=tensor([[ 1.1527, -1.0753], [ 0.0000, 0.7986]])) # torch.tril `torch.tril(input, diagonal=0, *, out=None) → Tensor` Returns the lower triangular part of the matrix (2-D tensor) or batch of matrices `input`, the other elements of the result tensor `out` are set to 0. The lower triangular part of the matrix is defined as the elements on and below the diagonal. 
The argument [`diagonal`](torch.diagonal#torch.diagonal "torch.diagonal") controls which diagonal to consider. If [`diagonal`](torch.diagonal#torch.diagonal "torch.diagonal") = 0, all elements on and below the main diagonal are retained. A positive value includes just as many diagonals above the main diagonal, and similarly a negative value excludes just as many diagonals below the main diagonal. The main diagonal are the set of indices {(i,i)}\lbrace (i, i) \rbrace for i∈[0,min⁡{d1,d2}−1]i \in [0, \min\\{d_{1}, d_{2}\\} - 1] where d1,d2d_{1}, d_{2} are the dimensions of the matrix. Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. * **diagonal** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – the diagonal to consider Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. Example: >>> a = torch.randn(3, 3) >>> a tensor([[-1.0813, -0.8619, 0.7105], [ 0.0935, 0.1380, 2.2112], [-0.3409, -0.9828, 0.0289]]) >>> torch.tril(a) tensor([[-1.0813, 0.0000, 0.0000], [ 0.0935, 0.1380, 0.0000], [-0.3409, -0.9828, 0.0289]]) >>> b = torch.randn(4, 6) >>> b tensor([[ 1.2219, 0.5653, -0.2521, -0.2345, 1.2544, 0.3461], [ 0.4785, -0.4477, 0.6049, 0.6368, 0.8775, 0.7145], [ 1.1502, 3.2716, -1.1243, -0.5413, 0.3615, 0.6864], [-0.0614, -0.7344, -1.3164, -0.7648, -1.4024, 0.0978]]) >>> torch.tril(b, diagonal=1) tensor([[ 1.2219, 0.5653, 0.0000, 0.0000, 0.0000, 0.0000], [ 0.4785, -0.4477, 0.6049, 0.0000, 0.0000, 0.0000], [ 1.1502, 3.2716, -1.1243, -0.5413, 0.0000, 0.0000], [-0.0614, -0.7344, -1.3164, -0.7648, -1.4024, 0.0000]]) >>> torch.tril(b, diagonal=-1) tensor([[ 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000], [ 0.4785, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000], [ 1.1502, 3.2716, 0.0000, 0.0000, 0.0000, 0.0000], [-0.0614, -0.7344, -1.3164, 0.0000, 0.0000, 0.0000]]) # torch.tril_indices `torch.tril_indices(row, col, offset=0, *, dtype=torch.long, device='cpu', layout=torch.strided) → Tensor` Returns the indices of the lower triangular part of a `row`-by- `col` matrix in a 2-by-N Tensor, where the first row contains row coordinates of all indices and the second row contains column coordinates. Indices are ordered based on rows and then columns. The lower triangular part of the matrix is defined as the elements on and below the diagonal. The argument `offset` controls which diagonal to consider. If `offset` = 0, all elements on and below the main diagonal are retained. A positive value includes just as many diagonals above the main diagonal, and similarly a negative value excludes just as many diagonals below the main diagonal. The main diagonal are the set of indices {(i,i)}\lbrace (i, i) \rbrace for i∈[0,min⁡{d1,d2}−1]i \in [0, \min\\{d_{1}, d_{2}\\} - 1] where d1,d2d_{1}, d_{2} are the dimensions of the matrix. Note When running on CUDA, `row * col` must be less than 2592^{59} to prevent overflow during calculation. Parameters * **row** (`int`) – number of rows in the 2-D matrix. * **col** (`int`) – number of columns in the 2-D matrix. * **offset** (`int`) – diagonal offset from the main diagonal. Default: if not provided, 0. Keyword Arguments * **dtype** ([`torch.dtype`](../tensor_attributes#torch.torch.dtype "torch.torch.dtype"), optional) – the desired data type of returned tensor. Default: if `None`, `torch.long`. * **device** ([`torch.device`](../tensor_attributes#torch.torch.device "torch.torch.device"), optional) – the desired device of returned tensor. 
Default: if `None`, uses the current device for the default tensor type (see [`torch.set_default_tensor_type()`](torch.set_default_tensor_type#torch.set_default_tensor_type "torch.set_default_tensor_type")). `device` will be the CPU for CPU tensor types and the current CUDA device for CUDA tensor types. * **layout** ([`torch.layout`](../tensor_attributes#torch.torch.layout "torch.torch.layout"), optional) – currently only support `torch.strided`. Example:: >>> a = torch.tril_indices(3, 3) >>> a tensor([[0, 1, 1, 2, 2, 2], [0, 0, 1, 0, 1, 2]]) >>> a = torch.tril_indices(4, 3, -1) >>> a tensor([[1, 2, 2, 3, 3, 3], [0, 0, 1, 0, 1, 2]]) >>> a = torch.tril_indices(4, 3, 1) >>> a tensor([[0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3], [0, 1, 0, 1, 2, 0, 1, 2, 0, 1, 2]]) # torch.triu `torch.triu(input, diagonal=0, *, out=None) → Tensor` Returns the upper triangular part of a matrix (2-D tensor) or batch of matrices `input`, the other elements of the result tensor `out` are set to 0. The upper triangular part of the matrix is defined as the elements on and above the diagonal. The argument [`diagonal`](torch.diagonal#torch.diagonal "torch.diagonal") controls which diagonal to consider. If [`diagonal`](torch.diagonal#torch.diagonal "torch.diagonal") = 0, all elements on and above the main diagonal are retained. A positive value excludes just as many diagonals above the main diagonal, and similarly a negative value includes just as many diagonals below the main diagonal. The main diagonal are the set of indices {(i,i)}\lbrace (i, i) \rbrace for i∈[0,min⁡{d1,d2}−1]i \in [0, \min\\{d_{1}, d_{2}\\} - 1] where d1,d2d_{1}, d_{2} are the dimensions of the matrix. Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. * **diagonal** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – the diagonal to consider Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. 
Example: >>> a = torch.randn(3, 3) >>> a tensor([[ 0.2309, 0.5207, 2.0049], [ 0.2072, -1.0680, 0.6602], [ 0.3480, -0.5211, -0.4573]]) >>> torch.triu(a) tensor([[ 0.2309, 0.5207, 2.0049], [ 0.0000, -1.0680, 0.6602], [ 0.0000, 0.0000, -0.4573]]) >>> torch.triu(a, diagonal=1) tensor([[ 0.0000, 0.5207, 2.0049], [ 0.0000, 0.0000, 0.6602], [ 0.0000, 0.0000, 0.0000]]) >>> torch.triu(a, diagonal=-1) tensor([[ 0.2309, 0.5207, 2.0049], [ 0.2072, -1.0680, 0.6602], [ 0.0000, -0.5211, -0.4573]]) >>> b = torch.randn(4, 6) >>> b tensor([[ 0.5876, -0.0794, -1.8373, 0.6654, 0.2604, 1.5235], [-0.2447, 0.9556, -1.2919, 1.3378, -0.1768, -1.0857], [ 0.4333, 0.3146, 0.6576, -1.0432, 0.9348, -0.4410], [-0.9888, 1.0679, -1.3337, -1.6556, 0.4798, 0.2830]]) >>> torch.triu(b, diagonal=1) tensor([[ 0.0000, -0.0794, -1.8373, 0.6654, 0.2604, 1.5235], [ 0.0000, 0.0000, -1.2919, 1.3378, -0.1768, -1.0857], [ 0.0000, 0.0000, 0.0000, -1.0432, 0.9348, -0.4410], [ 0.0000, 0.0000, 0.0000, 0.0000, 0.4798, 0.2830]]) >>> torch.triu(b, diagonal=-1) tensor([[ 0.5876, -0.0794, -1.8373, 0.6654, 0.2604, 1.5235], [-0.2447, 0.9556, -1.2919, 1.3378, -0.1768, -1.0857], [ 0.0000, 0.3146, 0.6576, -1.0432, 0.9348, -0.4410], [ 0.0000, 0.0000, -1.3337, -1.6556, 0.4798, 0.2830]]) # torch.triu_indices `torch.triu_indices(row, col, offset=0, *, dtype=torch.long, device='cpu', layout=torch.strided) → Tensor` Returns the indices of the upper triangular part of a `row` by `col` matrix in a 2-by-N Tensor, where the first row contains row coordinates of all indices and the second row contains column coordinates. Indices are ordered based on rows and then columns. The upper triangular part of the matrix is defined as the elements on and above the diagonal. The argument `offset` controls which diagonal to consider. If `offset` = 0, all elements on and above the main diagonal are retained. A positive value excludes just as many diagonals above the main diagonal, and similarly a negative value includes just as many diagonals below the main diagonal. The main diagonal are the set of indices {(i,i)}\lbrace (i, i) \rbrace for i∈[0,min⁡{d1,d2}−1]i \in [0, \min\\{d_{1}, d_{2}\\} - 1] where d1,d2d_{1}, d_{2} are the dimensions of the matrix. Note When running on CUDA, `row * col` must be less than 2592^{59} to prevent overflow during calculation. Parameters * **row** (`int`) – number of rows in the 2-D matrix. * **col** (`int`) – number of columns in the 2-D matrix. * **offset** (`int`) – diagonal offset from the main diagonal. Default: if not provided, 0. Keyword Arguments * **dtype** ([`torch.dtype`](../tensor_attributes#torch.torch.dtype "torch.torch.dtype"), optional) – the desired data type of returned tensor. Default: if `None`, `torch.long`. * **device** ([`torch.device`](../tensor_attributes#torch.torch.device "torch.torch.device"), optional) – the desired device of returned tensor. Default: if `None`, uses the current device for the default tensor type (see [`torch.set_default_tensor_type()`](torch.set_default_tensor_type#torch.set_default_tensor_type "torch.set_default_tensor_type")). `device` will be the CPU for CPU tensor types and the current CUDA device for CUDA tensor types. * **layout** ([`torch.layout`](../tensor_attributes#torch.torch.layout "torch.torch.layout"), optional) – currently only support `torch.strided`. 
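In practice the returned coordinate pairs are used to gather or assign the upper-triangular entries of a matrix. An illustrative sketch (the reference example of the raw indices follows):

>>> import torch
>>> m = torch.arange(9.).reshape(3, 3)
>>> idx = torch.triu_indices(3, 3)
>>> m[idx[0], idx[1]]    # the six entries on and above the main diagonal
tensor([0., 1., 2., 4., 5., 8.])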
Example:: >>> a = torch.triu_indices(3, 3) >>> a tensor([[0, 0, 0, 1, 1, 2], [0, 1, 2, 1, 2, 2]]) >>> a = torch.triu_indices(4, 3, -1) >>> a tensor([[0, 0, 0, 1, 1, 1, 2, 2, 3], [0, 1, 2, 0, 1, 2, 1, 2, 2]]) >>> a = torch.triu_indices(4, 3, 1) >>> a tensor([[0, 0, 1], [1, 2, 2]]) # torch.true_divide `torch.true_divide(dividend, divisor, *, out) → Tensor` Alias for [`torch.div()`](torch.div#torch.div "torch.div") with `rounding_mode=None`. # torch.trunc `torch.trunc(input, *, out=None) → Tensor` Returns a new tensor with the truncated integer values of the elements of `input`. Parameters **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. Example: >>> a = torch.randn(4) >>> a tensor([ 3.4742, 0.5466, -0.8008, -0.9079]) >>> torch.trunc(a) tensor([ 3., 0., -0., -0.]) # torch.unbind `torch.unbind(input, dim=0) → seq` Removes a tensor dimension. Returns a tuple of all slices along a given dimension, already without it. Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the tensor to unbind * **dim** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – dimension to remove Example: >>> torch.unbind(torch.tensor([[1, 2, 3], >>> [4, 5, 6], >>> [7, 8, 9]])) (tensor([1, 2, 3]), tensor([4, 5, 6]), tensor([7, 8, 9])) # torch.unique `torch.unique(*args, **kwargs)` Returns the unique elements of the input tensor. Note This function is different from [`torch.unique_consecutive()`](torch.unique_consecutive#torch.unique_consecutive "torch.unique_consecutive") in the sense that this function also eliminates non-consecutive duplicate values. Note Currently in the CUDA implementation and the CPU implementation when dim is specified, `torch.unique` always sort the tensor at the beginning regardless of the `sort` argument. Sorting could be slow, so if your input tensor is already sorted, it is recommended to use [`torch.unique_consecutive()`](torch.unique_consecutive#torch.unique_consecutive "torch.unique_consecutive") which avoids the sorting. Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor * **sorted** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – Whether to sort the unique elements in ascending order before returning as output. * **return_inverse** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – Whether to also return the indices for where elements in the original input ended up in the returned unique list. * **return_counts** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – Whether to also return the counts for each unique element. * **dim** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – the dimension to apply unique. If `None`, the unique of the flattened input is returned. default: `None` Returns A tensor or a tuple of tensors containing * **output** (_Tensor_): the output list of unique scalar elements. * **inverse_indices** (_Tensor_): (optional) if `return_inverse` is True, there will be an additional returned tensor (same shape as input) representing the indices for where elements in the original input map to in the output; otherwise, this function will only return a single tensor. 
* **counts** (_Tensor_): (optional) if `return_counts` is True, there will be an additional returned tensor (same shape as output or output.size(dim), if dim was specified) representing the number of occurrences for each unique value or tensor. Return type ([Tensor](../tensors#torch.Tensor "torch.Tensor"), [Tensor](../tensors#torch.Tensor "torch.Tensor") (optional), [Tensor](../tensors#torch.Tensor "torch.Tensor") (optional)) Example: >>> output = torch.unique(torch.tensor([1, 3, 2, 3], dtype=torch.long)) >>> output tensor([ 2, 3, 1]) >>> output, inverse_indices = torch.unique( ... torch.tensor([1, 3, 2, 3], dtype=torch.long), sorted=True, return_inverse=True) >>> output tensor([ 1, 2, 3]) >>> inverse_indices tensor([ 0, 2, 1, 2]) >>> output, inverse_indices = torch.unique( ... torch.tensor([[1, 3], [2, 3]], dtype=torch.long), sorted=True, return_inverse=True) >>> output tensor([ 1, 2, 3]) >>> inverse_indices tensor([[ 0, 2], [ 1, 2]]) # torch.unique_consecutive `torch.unique_consecutive(*args, **kwargs)` Eliminates all but the first element from every consecutive group of equivalent elements. Note This function is different from [`torch.unique()`](torch.unique#torch.unique "torch.unique") in the sense that this function only eliminates consecutive duplicate values. This semantics is similar to `std::unique` in C++. Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor * **return_inverse** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – Whether to also return the indices for where elements in the original input ended up in the returned unique list. * **return_counts** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – Whether to also return the counts for each unique element. * **dim** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – the dimension to apply unique. If `None`, the unique of the flattened input is returned. default: `None` Returns A tensor or a tuple of tensors containing * **output** (_Tensor_): the output list of unique scalar elements. * **inverse_indices** (_Tensor_): (optional) if `return_inverse` is True, there will be an additional returned tensor (same shape as input) representing the indices for where elements in the original input map to in the output; otherwise, this function will only return a single tensor. * **counts** (_Tensor_): (optional) if `return_counts` is True, there will be an additional returned tensor (same shape as output or output.size(dim), if dim was specified) representing the number of occurrences for each unique value or tensor. Return type ([Tensor](../tensors#torch.Tensor "torch.Tensor"), [Tensor](../tensors#torch.Tensor "torch.Tensor") (optional), [Tensor](../tensors#torch.Tensor "torch.Tensor") (optional)) Example: >>> x = torch.tensor([1, 1, 2, 2, 3, 1, 1, 2]) >>> output = torch.unique_consecutive(x) >>> output tensor([1, 2, 3, 1, 2]) >>> output, inverse_indices = torch.unique_consecutive(x, return_inverse=True) >>> output tensor([1, 2, 3, 1, 2]) >>> inverse_indices tensor([0, 0, 1, 1, 2, 3, 3, 4]) >>> output, counts = torch.unique_consecutive(x, return_counts=True) >>> output tensor([1, 2, 3, 1, 2]) >>> counts tensor([2, 2, 1, 2, 1]) # torch.unsqueeze `torch.unsqueeze(input, dim) → Tensor` Returns a new tensor with a dimension of size one inserted at the specified position. The returned tensor shares the same underlying data with this tensor. 
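For instance (an illustrative sketch), writing through the unsqueezed view is visible in the original tensor:

>>> import torch
>>> x = torch.tensor([1, 2, 3, 4])
>>> y = torch.unsqueeze(x, 0)   # shape (1, 4), same underlying data as x
>>> y[0, 0] = 9
>>> x
tensor([9, 2, 3, 4])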
A `dim` value within the range `[-input.dim() - 1, input.dim() + 1)` can be used. Negative `dim` will correspond to `unsqueeze()` applied at `dim` = `dim + input.dim() + 1`. Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. * **dim** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – the index at which to insert the singleton dimension Example: >>> x = torch.tensor([1, 2, 3, 4]) >>> torch.unsqueeze(x, 0) tensor([[ 1, 2, 3, 4]]) >>> torch.unsqueeze(x, 1) tensor([[ 1], [ 2], [ 3], [ 4]]) # torch.use_deterministic_algorithms `torch.use_deterministic_algorithms(d)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch.html#use_deterministic_algorithms) Sets whether PyTorch operations must use “deterministic” algorithms. That is, algorithms which, given the same input, and when run on the same software and hardware, always produce the same output. When True, operations will use deterministic algorithms when available, and if only nondeterministic algorithms are available they will throw a :class:RuntimeError when called. Warning This feature is in beta, and its design and implementation may change in the future. The following normally-nondeterministic operations will act deterministically when `d=True`: * [`torch.nn.Conv1d`](torch.nn.conv1d#torch.nn.Conv1d "torch.nn.Conv1d") when called on CUDA tensor * [`torch.nn.Conv2d`](torch.nn.conv2d#torch.nn.Conv2d "torch.nn.Conv2d") when called on CUDA tensor * [`torch.nn.Conv3d`](torch.nn.conv3d#torch.nn.Conv3d "torch.nn.Conv3d") when called on CUDA tensor * [`torch.nn.ConvTranspose1d`](torch.nn.convtranspose1d#torch.nn.ConvTranspose1d "torch.nn.ConvTranspose1d") when called on CUDA tensor * [`torch.nn.ConvTranspose2d`](torch.nn.convtranspose2d#torch.nn.ConvTranspose2d "torch.nn.ConvTranspose2d") when called on CUDA tensor * [`torch.nn.ConvTranspose3d`](torch.nn.convtranspose3d#torch.nn.ConvTranspose3d "torch.nn.ConvTranspose3d") when called on CUDA tensor * [`torch.bmm()`](torch.bmm#torch.bmm "torch.bmm") when called on sparse-dense CUDA tensors * `torch.__getitem__()` backward when `self` is a CPU tensor and `indices` is a list of tensors * `torch.index_put()` with `accumulate=True` when called on a CPU tensor The following normally-nondeterministic operations will throw a [`RuntimeError`](https://docs.python.org/3/library/exceptions.html#RuntimeError "\(in Python v3.9\)") when `d=True`: * [`torch.nn.AvgPool3d`](torch.nn.avgpool3d#torch.nn.AvgPool3d "torch.nn.AvgPool3d") when called on a CUDA tensor that requires grad * [`torch.nn.AdaptiveAvgPool2d`](torch.nn.adaptiveavgpool2d#torch.nn.AdaptiveAvgPool2d "torch.nn.AdaptiveAvgPool2d") when called on a CUDA tensor that requires grad * [`torch.nn.AdaptiveAvgPool3d`](torch.nn.adaptiveavgpool3d#torch.nn.AdaptiveAvgPool3d "torch.nn.AdaptiveAvgPool3d") when called on a CUDA tensor that requires grad * [`torch.nn.MaxPool3d`](torch.nn.maxpool3d#torch.nn.MaxPool3d "torch.nn.MaxPool3d") when called on a CUDA tensor that requires grad * [`torch.nn.AdaptiveMaxPool2d`](torch.nn.adaptivemaxpool2d#torch.nn.AdaptiveMaxPool2d "torch.nn.AdaptiveMaxPool2d") when called on a CUDA tensor that requires grad * [`torch.nn.FractionalMaxPool2d`](torch.nn.fractionalmaxpool2d#torch.nn.FractionalMaxPool2d "torch.nn.FractionalMaxPool2d") when called on a CUDA tensor that requires grad * `torch.nn.FractionalMaxPool3d` when called on a CUDA tensor that requires grad * [`torch.nn.functional.interpolate()`](../nn.functional#torch.nn.functional.interpolate 
"torch.nn.functional.interpolate") when called on a CUDA tensor that requires grad and one of the following modes is used: * `linear` * `bilinear` * `bicubic` * `trilinear` * [`torch.nn.ReflectionPad1d`](torch.nn.reflectionpad1d#torch.nn.ReflectionPad1d "torch.nn.ReflectionPad1d") when called on a CUDA tensor that requires grad * [`torch.nn.ReflectionPad2d`](torch.nn.reflectionpad2d#torch.nn.ReflectionPad2d "torch.nn.ReflectionPad2d") when called on a CUDA tensor that requires grad * [`torch.nn.ReplicationPad1d`](torch.nn.replicationpad1d#torch.nn.ReplicationPad1d "torch.nn.ReplicationPad1d") when called on a CUDA tensor that requires grad * [`torch.nn.ReplicationPad2d`](torch.nn.replicationpad2d#torch.nn.ReplicationPad2d "torch.nn.ReplicationPad2d") when called on a CUDA tensor that requires grad * [`torch.nn.ReplicationPad3d`](torch.nn.replicationpad3d#torch.nn.ReplicationPad3d "torch.nn.ReplicationPad3d") when called on a CUDA tensor that requires grad * [`torch.nn.NLLLoss`](torch.nn.nllloss#torch.nn.NLLLoss "torch.nn.NLLLoss") when called on a CUDA tensor that requires grad * [`torch.nn.CTCLoss`](torch.nn.ctcloss#torch.nn.CTCLoss "torch.nn.CTCLoss") when called on a CUDA tensor that requires grad * [`torch.nn.EmbeddingBag`](torch.nn.embeddingbag#torch.nn.EmbeddingBag "torch.nn.EmbeddingBag") when called on a CUDA tensor that requires grad * `torch.scatter_add_()` when called on a CUDA tensor * `torch.index_add_()` when called on a CUDA tensor * `torch.index_copy()` * [`torch.index_select()`](torch.index_select#torch.index_select "torch.index_select") when called on a CUDA tensor that requires grad * [`torch.repeat_interleave()`](torch.repeat_interleave#torch.repeat_interleave "torch.repeat_interleave") when called on a CUDA tensor that requires grad * [`torch.histc()`](torch.histc#torch.histc "torch.histc") when called on a CUDA tensor * [`torch.bincount()`](torch.bincount#torch.bincount "torch.bincount") when called on a CUDA tensor * [`torch.kthvalue()`](torch.kthvalue#torch.kthvalue "torch.kthvalue") with called on a CUDA tensor * [`torch.median()`](torch.median#torch.median "torch.median") with indices output when called on a CUDA tensor A handful of CUDA operations are nondeterministic if the CUDA version is 10.2 or greater, unless the environment variable `CUBLAS_WORKSPACE_CONFIG=:4096:8` or `CUBLAS_WORKSPACE_CONFIG=:16:8` is set. See the CUDA documentation for more details: If one of these environment variable configurations is not set, a [`RuntimeError`](https://docs.python.org/3/library/exceptions.html#RuntimeError "\(in Python v3.9\)") will be raised from these operations when called with CUDA tensors: * [`torch.mm()`](torch.mm#torch.mm "torch.mm") * [`torch.mv()`](torch.mv#torch.mv "torch.mv") * [`torch.bmm()`](torch.bmm#torch.bmm "torch.bmm") Note that deterministic operations tend to have worse performance than non- deterministic operations. Parameters **d** ([`bool`](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – If True, force operations to be deterministic. If False, allow non-deterministic operations. # torch.vander `torch.vander(x, N=None, increasing=False) → Tensor` Generates a Vandermonde matrix. The columns of the output matrix are elementwise powers of the input vector x(N−1),x(N−2),...,x0x^{(N-1)}, x^{(N-2)}, ..., x^0 . If increasing is True, the order of the columns is reversed x0,x1,...,x(N−1)x^0, x^1, ..., x^{(N-1)} . Such a matrix with a geometric progression in each row is named for Alexandre-Theophile Vandermonde. 
Parameters * **x** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – 1-D input tensor. * **N** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – Number of columns in the output. If N is not specified, a square array is returned (N=len(x))(N = len(x)) . * **increasing** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – Order of the powers of the columns. If True, the powers increase from left to right, if False (the default) they are reversed. Returns Vandermonde matrix. If increasing is False, the first column is x(N−1)x^{(N-1)} , the second x(N−2)x^{(N-2)} and so forth. If increasing is True, the columns are x0,x1,...,x(N−1)x^0, x^1, ..., x^{(N-1)} . Return type [Tensor](../tensors#torch.Tensor "torch.Tensor") Example: >>> x = torch.tensor([1, 2, 3, 5]) >>> torch.vander(x) tensor([[ 1, 1, 1, 1], [ 8, 4, 2, 1], [ 27, 9, 3, 1], [125, 25, 5, 1]]) >>> torch.vander(x, N=3) tensor([[ 1, 1, 1], [ 4, 2, 1], [ 9, 3, 1], [25, 5, 1]]) >>> torch.vander(x, N=3, increasing=True) tensor([[ 1, 1, 1], [ 1, 2, 4], [ 1, 3, 9], [ 1, 5, 25]]) # torch.var `torch.var(input, unbiased=True) → Tensor` Returns the variance of all elements in the `input` tensor. If `unbiased` is `False`, then the variance will be calculated via the biased estimator. Otherwise, Bessel’s correction will be used. Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. * **unbiased** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – whether to use the unbiased estimation or not Example: >>> a = torch.randn(1, 3) >>> a tensor([[-0.3425, -1.2636, -0.4864]]) >>> torch.var(a) tensor(0.2455) `torch.var(input, dim, unbiased=True, keepdim=False, *, out=None) → Tensor` Returns the variance of each row of the `input` tensor in the given dimension `dim`. If `keepdim` is `True`, the output tensor is of the same size as `input` except in the dimension(s) `dim` where it is of size 1. Otherwise, `dim` is squeezed (see [`torch.squeeze()`](torch.squeeze#torch.squeeze "torch.squeeze")), resulting in the output tensor having 1 (or `len(dim)`) fewer dimension(s). If `unbiased` is `False`, then the variance will be calculated via the biased estimator. Otherwise, Bessel’s correction will be used. Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. * **dim** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _or_ _tuple of python:ints_) – the dimension or dimensions to reduce. * **unbiased** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – whether to use the unbiased estimation or not * **keepdim** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – whether the output tensor has `dim` retained or not. Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. Example: >>> a = torch.randn(4, 4) >>> a tensor([[-0.3567, 1.7385, -1.3042, 0.7423], [ 1.3436, -0.1015, -0.9834, -0.8438], [ 0.6056, 0.1089, -0.3112, -1.4085], [-0.7700, 0.6074, -0.1469, 0.7777]]) >>> torch.var(a, 1) tensor([ 1.7444, 1.1363, 0.7356, 0.5112]) # torch.var_mean `torch.var_mean(input, unbiased=True) -> (Tensor, Tensor)` Returns the variance and mean of all elements in the `input` tensor. If `unbiased` is `False`, then the variance will be calculated via the biased estimator. Otherwise, Bessel’s correction will be used. 
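The sketch below (illustrative, not from the original reference) spells out what `unbiased` changes, by recomputing both estimators by hand:

    import torch

    a = torch.randn(6)
    n = a.numel()

    var_u, mean = torch.var_mean(a)                  # unbiased: divides by n - 1
    var_b, _ = torch.var_mean(a, unbiased=False)     # biased: divides by n

    mu = a.sum() / n
    assert torch.allclose(mean, mu)
    assert torch.allclose(var_b, ((a - mu) ** 2).sum() / n)
    assert torch.allclose(var_u, ((a - mu) ** 2).sum() / (n - 1))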
Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. * **unbiased** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – whether to use the unbiased estimation or not Example: >>> a = torch.randn(1, 3) >>> a tensor([[0.0146, 0.4258, 0.2211]]) >>> torch.var_mean(a) (tensor(0.0423), tensor(0.2205)) `torch.var_mean(input, dim, keepdim=False, unbiased=True) -> (Tensor, Tensor)` Returns the variance and mean of each row of the `input` tensor in the given dimension `dim`. If `keepdim` is `True`, the output tensor is of the same size as `input` except in the dimension(s) `dim` where it is of size 1. Otherwise, `dim` is squeezed (see [`torch.squeeze()`](torch.squeeze#torch.squeeze "torch.squeeze")), resulting in the output tensor having 1 (or `len(dim)`) fewer dimension(s). If `unbiased` is `False`, then the variance will be calculated via the biased estimator. Otherwise, Bessel’s correction will be used. Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. * **dim** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _or_ _tuple of python:ints_) – the dimension or dimensions to reduce. * **keepdim** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – whether the output tensor has `dim` retained or not. * **unbiased** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – whether to use the unbiased estimation or not Example: >>> a = torch.randn(4, 4) >>> a tensor([[-1.5650, 2.0415, -0.1024, -0.5790], [ 0.2325, -2.6145, -1.6428, -0.3537], [-0.2159, -1.1069, 1.2882, -1.3265], [-0.6706, -1.5893, 0.6827, 1.6727]]) >>> torch.var_mean(a, 1) (tensor([2.3174, 1.6403, 1.4092, 2.0791]), tensor([-0.0512, -1.0946, -0.3403, 0.0239])) # torch.vdot `torch.vdot(input, other, *, out=None) → Tensor` Computes the dot product of two 1D tensors. The vdot(a, b) function handles complex numbers differently than dot(a, b). If the first argument is complex, the complex conjugate of the first argument is used for the calculation of the dot product. Note Unlike NumPy’s vdot, torch.vdot intentionally only supports computing the dot product of two 1D tensors with the same number of elements. Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – first tensor in the dot product, must be 1D. Its conjugate is used if it’s complex. * **other** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – second tensor in the dot product, must be 1D. Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. Example: >>> torch.vdot(torch.tensor([2, 3]), torch.tensor([2, 1])) tensor(7) >>> a = torch.tensor((1 +2j, 3 - 1j)) >>> b = torch.tensor((2 +1j, 4 - 0j)) >>> torch.vdot(a, b) tensor([16.+1.j]) >>> torch.vdot(b, a) tensor([16.-1.j]) # torch.view_as_complex `torch.view_as_complex(input) → Tensor` Returns a view of `input` as a complex tensor. For an input complex tensor of `size` m1,m2,…,mi,2m1, m2, \dots, mi, 2 , this function returns a new complex tensor of `size` m1,m2,…,mim1, m2, \dots, mi where the last dimension of the input tensor is expected to represent the real and imaginary components of complex numbers. Warning `view_as_complex()` is only supported for tensors with [`torch.dtype`](../tensor_attributes#torch.torch.dtype "torch.torch.dtype") `torch.float64` and `torch.float32`. The input is expected to have the last dimension of `size` 2\. 
In addition, the tensor must have a `stride` of 1 for its last dimension. The strides of all other dimensions must be even numbers. Parameters **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. Example:: >>> x=torch.randn(4, 2) >>> x tensor([[ 1.6116, -0.5772], [-1.4606, -0.9120], [ 0.0786, -1.7497], [-0.6561, -1.6623]]) >>> torch.view_as_complex(x) tensor([(1.6116-0.5772j), (-1.4606-0.9120j), (0.0786-1.7497j), (-0.6561-1.6623j)]) # torch.view_as_real `torch.view_as_real(input) → Tensor` Returns a view of `input` as a real tensor. For an input complex tensor of `size` m1,m2,…,mim1, m2, \dots, mi , this function returns a new real tensor of size m1,m2,…,mi,2m1, m2, \dots, mi, 2 , where the last dimension of size 2 represents the real and imaginary components of complex numbers. Warning `view_as_real()` is only supported for tensors with `complex dtypes`. Parameters **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. Example:: >>> x=torch.randn(4, dtype=torch.cfloat) >>> x tensor([(0.4737-0.3839j), (-0.2098-0.6699j), (0.3470-0.9451j), (-0.5174-1.3136j)]) >>> torch.view_as_real(x) tensor([[ 0.4737, -0.3839], [-0.2098, -0.6699], [ 0.3470, -0.9451], [-0.5174, -1.3136]]) # torch.vstack `torch.vstack(tensors, *, out=None) → Tensor` Stack tensors in sequence vertically (row wise). This is equivalent to concatenation along the first axis after all 1-D tensors have been reshaped by [`torch.atleast_2d()`](torch.atleast_2d#torch.atleast_2d "torch.atleast_2d"). Parameters **tensors** (_sequence of Tensors_) – sequence of tensors to concatenate Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. Example: >>> a = torch.tensor([1, 2, 3]) >>> b = torch.tensor([4, 5, 6]) >>> torch.vstack((a,b)) tensor([[1, 2, 3], [4, 5, 6]]) >>> a = torch.tensor([[1],[2],[3]]) >>> b = torch.tensor([[4],[5],[6]]) >>> torch.vstack((a,b)) tensor([[1], [2], [3], [4], [5], [6]]) # torch.where `torch.where(condition, x, y) → Tensor` Return a tensor of elements selected from either `x` or `y`, depending on `condition`. The operation is defined as: outi={xiif conditioniyiotherwise\text{out}_i = \begin{cases} \text{x}_i & \text{if } \text{condition}_i \\\ \text{y}_i & \text{otherwise} \\\ \end{cases} Note The tensors `condition`, `x`, `y` must be [broadcastable](https://pytorch.org/docs/1.8.0/notes/broadcasting.html#broadcasting- semantics). Note Currently valid scalar and tensor combination are 1. Scalar of floating dtype and torch.double 2. Scalar of integral dtype and torch.long 3. 
Scalar of complex dtype and torch.complex128 Parameters * **condition** (_BoolTensor_) – When True (nonzero), yield x, otherwise yield y * **x** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _or_ _Scalar_) – value (if `x` is a scalar) or values selected at indices where `condition` is `True` * **y** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _or_ _Scalar_) – value (if `y` is a scalar) or values selected at indices where `condition` is `False` Returns A tensor of shape equal to the broadcasted shape of `condition`, `x`, `y` Return type [Tensor](../tensors#torch.Tensor "torch.Tensor") Example: >>> x = torch.randn(3, 2) >>> y = torch.ones(3, 2) >>> x tensor([[-0.4620, 0.3139], [ 0.3898, -0.7197], [ 0.0478, -0.1657]]) >>> torch.where(x > 0, x, y) tensor([[ 1.0000, 0.3139], [ 0.3898, 1.0000], [ 0.0478, 1.0000]]) >>> x = torch.randn(2, 2, dtype=torch.double) >>> x tensor([[ 1.0779, 0.0383], [-0.8785, -1.1089]], dtype=torch.float64) >>> torch.where(x > 0, x, 0.) tensor([[1.0779, 0.0383], [0.0000, 0.0000]], dtype=torch.float64) `torch.where(condition) → tuple of LongTensor` `torch.where(condition)` is identical to `torch.nonzero(condition, as_tuple=True)`. Note See also [`torch.nonzero()`](torch.nonzero#torch.nonzero "torch.nonzero"). # torch.xlogy `torch.xlogy(input, other, *, out=None) → Tensor` Computes `input * log(other)` with the following cases: \text{out}_{i} = \begin{cases} \text{NaN} & \text{if } \text{other}_{i} = \text{NaN} \\ 0 & \text{if } \text{input}_{i} = 0.0 \\ \text{input}_{i} * \log(\text{other}_{i}) & \text{otherwise} \end{cases} Similar to SciPy's `scipy.special.xlogy`. Parameters * **input** (_Number_ _or_ [Tensor](../tensors#torch.Tensor "torch.Tensor")) – the multiplier * **other** (_Number_ _or_ [Tensor](../tensors#torch.Tensor "torch.Tensor")) – the argument of the logarithm Note At least one of `input` or `other` must be a tensor. Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. Example: >>> x = torch.zeros(5,) >>> y = torch.tensor([-1, 0, 1, float('inf'), float('nan')]) >>> torch.xlogy(x, y) tensor([0., 0., 0., 0., nan]) >>> x = torch.tensor([1, 2, 3]) >>> y = torch.tensor([3, 2, 1]) >>> torch.xlogy(x, y) tensor([1.0986, 1.3863, 0.0000]) >>> torch.xlogy(x, 4) tensor([1.3863, 2.7726, 4.1589]) >>> torch.xlogy(2, y) tensor([2.1972, 1.3863, 0.0000]) # torch.zeros `torch.zeros(*size, *, out=None, dtype=None, layout=torch.strided, device=None, requires_grad=False) → Tensor` Returns a tensor filled with the scalar value `0`, with the shape defined by the variable argument `size`. Parameters **size** (_int..._) – a sequence of integers defining the shape of the output tensor. Can be a variable number of arguments or a collection like a list or tuple. Keyword Arguments * **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. * **dtype** ([`torch.dtype`](../tensor_attributes#torch.torch.dtype "torch.torch.dtype"), optional) – the desired data type of returned tensor. Default: if `None`, uses a global default (see [`torch.set_default_tensor_type()`](torch.set_default_tensor_type#torch.set_default_tensor_type "torch.set_default_tensor_type")). * **layout** ([`torch.layout`](../tensor_attributes#torch.torch.layout "torch.torch.layout"), optional) – the desired layout of returned Tensor. Default: `torch.strided`.
* **device** ([`torch.device`](../tensor_attributes#torch.torch.device "torch.torch.device"), optional) – the desired device of returned tensor. Default: if `None`, uses the current device for the default tensor type (see [`torch.set_default_tensor_type()`](torch.set_default_tensor_type#torch.set_default_tensor_type "torch.set_default_tensor_type")). `device` will be the CPU for CPU tensor types and the current CUDA device for CUDA tensor types. * **requires_grad** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – If autograd should record operations on the returned tensor. Default: `False`. Example: >>> torch.zeros(2, 3) tensor([[ 0., 0., 0.], [ 0., 0., 0.]]) >>> torch.zeros(5) tensor([ 0., 0., 0., 0., 0.]) # torch.zeros_like `torch.zeros_like(input, *, dtype=None, layout=None, device=None, requires_grad=False, memory_format=torch.preserve_format) → Tensor` Returns a tensor filled with the scalar value `0`, with the same size as `input`. `torch.zeros_like(input)` is equivalent to `torch.zeros(input.size(), dtype=input.dtype, layout=input.layout, device=input.device)`. Warning As of 0.4, this function does not support an `out` keyword. As an alternative, the old `torch.zeros_like(input, out=output)` is equivalent to `torch.zeros(input.size(), out=output)`. Parameters **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the size of `input` will determine size of the output tensor. Keyword Arguments * **dtype** ([`torch.dtype`](../tensor_attributes#torch.torch.dtype "torch.torch.dtype"), optional) – the desired data type of returned Tensor. Default: if `None`, defaults to the dtype of `input`. * **layout** ([`torch.layout`](../tensor_attributes#torch.torch.layout "torch.torch.layout"), optional) – the desired layout of returned tensor. Default: if `None`, defaults to the layout of `input`. * **device** ([`torch.device`](../tensor_attributes#torch.torch.device "torch.torch.device"), optional) – the desired device of returned tensor. Default: if `None`, defaults to the device of `input`. * **requires_grad** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – If autograd should record operations on the returned tensor. Default: `False`. * **memory_format** ([`torch.memory_format`](../tensor_attributes#torch.torch.memory_format "torch.torch.memory_format"), optional) – the desired memory format of returned Tensor. Default: `torch.preserve_format`. Example: >>> input = torch.empty(2, 3) >>> torch.zeros_like(input) tensor([[ 0., 0., 0.], [ 0., 0., 0.]]) # torch.hub Pytorch Hub is a pre-trained model repository designed to facilitate research reproducibility. ## Publishing models Pytorch Hub supports publishing pre-trained models(model definitions and pre- trained weights) to a github repository by adding a simple `hubconf.py` file; `hubconf.py` can have multiple entrypoints. Each entrypoint is defined as a python function (example: a pre-trained model you want to publish). def entrypoint_name(*args, **kwargs): # args & kwargs are optional, for models which take positional/keyword arguments. ... ### How to implement an entrypoint? Here is a code snippet specifies an entrypoint for `resnet18` model if we expand the implementation in `pytorch/vision/hubconf.py`. In most case importing the right function in `hubconf.py` is sufficient. Here we just want to use the expanded version as an example to show how it works. 
You can see the full script in the [pytorch/vision repo](https://github.com/pytorch/vision/blob/master/hubconf.py). dependencies = ['torch'] from torchvision.models.resnet import resnet18 as _resnet18 # resnet18 is the name of entrypoint def resnet18(pretrained=False, **kwargs): """ # This docstring shows up in hub.help() Resnet18 model pretrained (bool): kwargs, load pretrained weights into the model """ # Call the model, load pretrained weights model = _resnet18(pretrained=pretrained, **kwargs) return model * The `dependencies` variable is a **list** of package names required to **load** the model. Note that this might be slightly different from the dependencies required to train the model. * `args` and `kwargs` are passed along to the real callable function. * The docstring of the function works as a help message. It explains what the model does and what the allowed positional/keyword arguments are. It's highly recommended to add a few examples here. * An entrypoint function can either return a model (`nn.Module`), or auxiliary tools to make the user workflow smoother, e.g. tokenizers. * Callables prefixed with an underscore are considered helper functions which won't show up in `torch.hub.list()`. * Pretrained weights can either be stored locally in the github repo, or be loadable by `torch.hub.load_state_dict_from_url()`. If less than 2GB, it's recommended to attach them to a [project release](https://help.github.com/en/articles/distributing-large-binaries) and use the url from the release. In the example above, `torchvision.models.resnet.resnet18` handles `pretrained`; alternatively, you can put the following logic in the entrypoint definition: if pretrained: # For a checkpoint saved in the local github repo, e.g. <RELATIVE_PATH_TO_CHECKPOINT>=weights/save.pth dirname = os.path.dirname(__file__) checkpoint = os.path.join(dirname, <RELATIVE_PATH_TO_CHECKPOINT>) state_dict = torch.load(checkpoint) model.load_state_dict(state_dict) # For a checkpoint saved elsewhere checkpoint = 'https://download.pytorch.org/models/resnet18-5c106cde.pth' model.load_state_dict(torch.hub.load_state_dict_from_url(checkpoint, progress=False)) ### Important Notice * The published models should be in at least a branch/tag. They can't point to a random commit. ## Loading models from Hub Pytorch Hub provides convenient APIs to explore all available models in hub through `torch.hub.list()`, show docstrings and examples through `torch.hub.help()`, and load the pre-trained models using `torch.hub.load()`. `torch.hub.list(github, force_reload=False)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/hub.html#list) List all entrypoints available in `github` hubconf. Parameters * **github** (_string_) – a string with format "repo_owner/repo_name[:tag_name]" with an optional tag/branch. The default branch is `master` if not specified. Example: 'pytorch/vision[:hub]' * **force_reload** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – whether to discard the existing cache and force a fresh download. Default is `False`. Returns a list of available entrypoint names Return type entrypoints #### Example >>> entrypoints = torch.hub.list('pytorch/vision', force_reload=True) `torch.hub.help(github, model, force_reload=False)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/hub.html#help) Show the docstring of entrypoint `model`. Parameters * **github** (_string_) – a string with format "repo_owner/repo_name[:tag_name]" with an optional tag/branch. The default branch is `master` if not specified.
Example: ‘pytorch/vision[:hub]’ * **model** (_string_) – a string of entrypoint name defined in repo’s hubconf.py * **force_reload** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – whether to discard the existing cache and force a fresh download. Default is `False`. #### Example >>> print(torch.hub.help('pytorch/vision', 'resnet18', force_reload=True)) `torch.hub.load(repo_or_dir, model, *args, **kwargs)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/hub.html#load) Load a model from a github repo or a local directory. Note: Loading a model is the typical use case, but this can also be used to for loading other objects such as tokenizers, loss functions, etc. If `source` is `'github'`, `repo_or_dir` is expected to be of the form `repo_owner/repo_name[:tag_name]` with an optional tag/branch. If `source` is `'local'`, `repo_or_dir` is expected to be a path to a local directory. Parameters * **repo_or_dir** (_string_) – repo name (`repo_owner/repo_name[:tag_name]`), if `source = 'github'`; or a path to a local directory, if `source = 'local'`. * **model** (_string_) – the name of a callable (entrypoint) defined in the repo/dir’s `hubconf.py`. * ***args** (_optional_) – the corresponding args for callable `model`. * **source** (_string_ _,__optional_) – `'github'` | `'local'`. Specifies how `repo_or_dir` is to be interpreted. Default is `'github'`. * **force_reload** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – whether to force a fresh download of the github repo unconditionally. Does not have any effect if `source = 'local'`. Default is `False`. * **verbose** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – If `False`, mute messages about hitting local caches. Note that the message about first download cannot be muted. Does not have any effect if `source = 'local'`. Default is `True`. * ****kwargs** (_optional_) – the corresponding kwargs for callable `model`. Returns The output of the `model` callable when called with the given `*args` and `**kwargs`. #### Example >>> # from a github repo >>> repo = 'pytorch/vision' >>> model = torch.hub.load(repo, 'resnet50', pretrained=True) >>> # from a local directory >>> path = '/some/local/path/pytorch/vision' >>> model = torch.hub.load(path, 'resnet50', pretrained=True) `torch.hub.download_url_to_file(url, dst, hash_prefix=None, progress=True)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/hub.html#download_url_to_file) Download object at the given URL to a local path. Parameters * **url** (_string_) – URL of the object to download * **dst** (_string_) – Full path where object will be saved, e.g. `/tmp/temporary_file` * **hash_prefix** (_string_ _,__optional_) – If not None, the SHA256 downloaded file should start with `hash_prefix`. Default: None * **progress** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – whether or not to display a progress bar to stderr Default: True #### Example >>> torch.hub.download_url_to_file('https://s3.amazonaws.com/pytorch/models/resnet18-5c106cde.pth', '/tmp/temporary_file') `torch.hub.load_state_dict_from_url(url, model_dir=None, map_location=None, progress=True, check_hash=False, file_name=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/hub.html#load_state_dict_from_url) Loads the Torch serialized object at the given URL. 
If the downloaded file is a zip file, it will be automatically decompressed. If the object is already present in `model_dir`, it's deserialized and returned. The default value of `model_dir` is `<hub_dir>/checkpoints`, where `<hub_dir>` is the directory returned by `get_dir()`. Parameters * **url** (_string_) – URL of the object to download * **model_dir** (_string_ _,__optional_) – directory in which to save the object * **map_location** (_optional_) – a function or a dict specifying how to remap storage locations (see torch.load) * **progress** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – whether or not to display a progress bar to stderr. Default: True * **check_hash** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – If True, the filename part of the URL should follow the naming convention `filename-<sha256>.ext`, where `<sha256>` is the first eight or more digits of the SHA256 hash of the contents of the file. The hash is used to ensure unique names and to verify the contents of the file. Default: False * **file_name** (_string_ _,__optional_) – name for the downloaded file. The filename from `url` will be used if not set. #### Example >>> state_dict = torch.hub.load_state_dict_from_url('https://s3.amazonaws.com/pytorch/models/resnet18-5c106cde.pth') ### Running a loaded model Note that `*args` and `**kwargs` in `torch.hub.load()` are used to **instantiate** a model. After you have loaded a model, how can you find out what you can do with it? A suggested workflow is: * `dir(model)` to see all available methods of the model. * `help(model.foo)` to check what arguments `model.foo` takes to run. To help users explore without referring back and forth to the documentation, we strongly recommend repo owners make function help messages clear and succinct. It's also helpful to include a minimal working example. ### Where are my downloaded models saved? The locations are used in the order of: * Calling `hub.set_dir()` * `$TORCH_HOME/hub`, if environment variable `TORCH_HOME` is set. * `$XDG_CACHE_HOME/torch/hub`, if environment variable `XDG_CACHE_HOME` is set. * `~/.cache/torch/hub` `torch.hub.get_dir()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/hub.html#get_dir) Get the Torch Hub cache directory used for storing downloaded models & weights. If `set_dir()` is not called, the default path is `$TORCH_HOME/hub`, where the environment variable `$TORCH_HOME` defaults to `$XDG_CACHE_HOME/torch`. `$XDG_CACHE_HOME` follows the X Desktop Group (XDG) specification of the Linux filesystem layout, with a default value of `~/.cache` if the environment variable is not set. `torch.hub.set_dir(d)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/hub.html#set_dir) Optionally set the Torch Hub directory used to save downloaded models & weights. Parameters **d** (_string_) – path to a local folder to save downloaded models & weights. ### Caching logic By default, files are not cleaned up after loading. Hub uses the cache by default if it already exists in the directory returned by `get_dir()`. Users can force a reload by calling `hub.load(..., force_reload=True)`. This will delete the existing github folder and downloaded weights and reinitialize a fresh download. This is useful when updates are published to the same branch, so users can keep up with the latest release. ### Known limitations Torch hub works by importing the package as if it were installed. There are some side effects introduced by importing in Python.
For example, you can see new items in Python caches `sys.modules` and `sys.path_importer_cache` which is normal Python behavior. A known limitation that worth mentioning here is user **CANNOT** load two different branches of the same repo in the **same python process**. It’s just like installing two packages with the same name in Python, which is not good. Cache might join the party and give you surprises if you actually try that. Of course it’s totally fine to load them in separate processes. # PyTorch documentation PyTorch is an optimized tensor library for deep learning using GPUs and CPUs. Features described in this documentation are classified by release status: _Stable:_ These features will be maintained long-term and there should generally be no major performance limitations or gaps in documentation. We also expect to maintain backwards compatibility (although breaking changes can happen and notice will be given one release ahead of time). _Beta:_ Features are tagged as Beta because the API may change based on user feedback, because the performance needs to improve, or because coverage across operators is not yet complete. For Beta features, we are committing to seeing the feature through to the Stable classification. We are not, however, committing to backwards compatibility. _Prototype:_ These features are typically not available as part of binary distributions like PyPI or Conda, except sometimes behind run-time flags, and are at an early stage for feedback and testing. Notes * [Automatic Mixed Precision examples](https://pytorch.org/docs/1.8.0/notes/amp_examples.html) * [Autograd mechanics](https://pytorch.org/docs/1.8.0/notes/autograd.html) * [Broadcasting semantics](https://pytorch.org/docs/1.8.0/notes/broadcasting.html) * [CPU threading and TorchScript inference](https://pytorch.org/docs/1.8.0/notes/cpu_threading_torchscript_inference.html) * [CUDA semantics](https://pytorch.org/docs/1.8.0/notes/cuda.html) * [Distributed Data Parallel](https://pytorch.org/docs/1.8.0/notes/ddp.html) * [Extending PyTorch](https://pytorch.org/docs/1.8.0/notes/extending.html) * [Frequently Asked Questions](https://pytorch.org/docs/1.8.0/notes/faq.html) * [Features for large-scale deployments](https://pytorch.org/docs/1.8.0/notes/large_scale_deployments.html) * [Modules](https://pytorch.org/docs/1.8.0/notes/modules.html) * [Multiprocessing best practices](https://pytorch.org/docs/1.8.0/notes/multiprocessing.html) * [Reproducibility](https://pytorch.org/docs/1.8.0/notes/randomness.html) * [Serialization semantics](https://pytorch.org/docs/1.8.0/notes/serialization.html) * [Windows FAQ](https://pytorch.org/docs/1.8.0/notes/windows.html) Language Bindings * [C++](https://pytorch.org/docs/1.8.0/cpp_index.html) * [Javadoc](https://pytorch.org/javadoc/) Python API * [torch](torch) * [torch.nn](nn) * [torch.nn.functional](nn.functional) * [torch.Tensor](tensors) * [Tensor Attributes](tensor_attributes) * [Tensor Views](tensor_view) * [torch.autograd](autograd) * [torch.cuda](cuda) * [torch.cuda.amp](amp) * [torch.backends](backends) * [torch.distributed](distributed) * [torch.distributions](distributions) * [torch.fft](fft) * [torch.futures](futures) * [torch.fx](fx) * [torch.hub](hub) * [torch.jit](jit) * [torch.linalg](linalg) * [torch.overrides](torch.overrides) * [torch.nn.init](nn.init) * [torch.onnx](onnx) * [torch.optim](optim) * [Complex Numbers](complex_numbers) * [DDP Communication Hooks](ddp_comm_hooks) * [Pipeline Parallelism](pipeline) * [Quantization](quantization) * [Distributed RPC 
Framework](rpc) * [torch.random](random) * [torch.sparse](sparse) * [torch.Storage](storage) * [torch.utils.benchmark](benchmark_utils) * [torch.utils.bottleneck](bottleneck) * [torch.utils.checkpoint](checkpoint) * [torch.utils.cpp_extension](cpp_extension) * [torch.utils.data](data) * [torch.utils.dlpack](dlpack) * [torch.utils.mobile_optimizer](mobile_optimizer) * [torch.utils.model_zoo](model_zoo) * [torch.utils.tensorboard](tensorboard) * [Type Info](type_info) * [Named Tensors](named_tensor) * [Named Tensors operator coverage](name_inference) * [torch.__config__](__config__) Libraries * [torchaudio](https://pytorch.org/audio/stable) * [torchtext](https://pytorch.org/text/stable) * [torchvision](https://pytorch.org/vision/stable) * [TorchElastic](https://pytorch.org/elastic/) * [TorchServe](https://pytorch.org/serve) * [PyTorch on XLA Devices](http://pytorch.org/xla/) Community * [PyTorch Contribution Guide](https://pytorch.org/docs/1.8.0/community/contribution_guide.html) * [PyTorch Governance](https://pytorch.org/docs/1.8.0/community/governance.html) * [PyTorch Governance | Persons of Interest](https://pytorch.org/docs/1.8.0/community/persons_of_interest.html) # Indices and tables * [Index](https://pytorch.org/docs/1.8.0/genindex.html) * [Module Index](https://pytorch.org/docs/1.8.0/py-modindex.html) # TorchScript * Creating TorchScript Code * Mixing Tracing and Scripting * TorchScript Language * Built-in Functions and Modules * PyTorch Functions and Modules * Python Functions and Modules * Python Language Reference Comparison * Debugging * Disable JIT for Debugging * Inspecting Code * Interpreting Graphs * Tracer * Frequently Asked Questions * Appendix * Migrating to PyTorch 1.2 Recursive Scripting API * References TorchScript is a way to create serializable and optimizable models from PyTorch code. Any TorchScript program can be saved from a Python process and loaded in a process where there is no Python dependency. We provide tools to incrementally transition a model from a pure Python program to a TorchScript program that can be run independently from Python, such as in a standalone C++ program. This makes it possible to train models in PyTorch using familiar tools in Python and then export the model via TorchScript to a production environment where Python programs may be disadvantageous for performance and multi-threading reasons. For a gentle introduction to TorchScript, see the [Introduction to TorchScript](https://pytorch.org/tutorials/beginner/Intro_to_TorchScript_tutorial.html) tutorial. For an end-to-end example of converting a PyTorch model to TorchScript and running it in C++, see the [Loading a PyTorch Model in C++](https://pytorch.org/tutorials/advanced/cpp_export.html) tutorial. ## Creating TorchScript Code [`script`](generated/torch.jit.script#torch.jit.script "torch.jit.script")(obj[, optimize, _frames_up, _rcb]) | Scripting a function or `nn.Module` will inspect the source code, compile it as TorchScript code using the TorchScript compiler, and return a [`ScriptModule`](generated/torch.jit.scriptmodule#torch.jit.ScriptModule "torch.jit.ScriptModule") or [`ScriptFunction`](generated/torch.jit.scriptfunction#torch.jit.ScriptFunction "torch.jit.ScriptFunction"). 
---|--- [`trace`](generated/torch.jit.trace#torch.jit.trace "torch.jit.trace")(func, example_inputs[, optimize, …]) | Trace a function and return an executable or [`ScriptFunction`](generated/torch.jit.scriptfunction#torch.jit.ScriptFunction "torch.jit.ScriptFunction") that will be optimized using just-in-time compilation. [`script_if_tracing`](generated/torch.jit.script_if_tracing#torch.jit.script_if_tracing "torch.jit.script_if_tracing")(fn) | Compiles `fn` when it is first called during tracing. [`trace_module`](generated/torch.jit.trace_module#torch.jit.trace_module "torch.jit.trace_module")(mod, inputs[, optimize, …]) | Trace a module and return an executable [`ScriptModule`](generated/torch.jit.scriptmodule#torch.jit.ScriptModule "torch.jit.ScriptModule") that will be optimized using just-in-time compilation. [`fork`](generated/torch.jit.fork#torch.jit.fork "torch.jit.fork")(func, *args, **kwargs) | Creates an asynchronous task executing `func` and a reference to the value of the result of this execution. [`wait`](generated/torch.jit.wait#torch.jit.wait "torch.jit.wait")(future) | Forces completion of a `torch.jit.Future[T]` asynchronous task, returning the result of the task. [`ScriptModule`](generated/torch.jit.scriptmodule#torch.jit.ScriptModule "torch.jit.ScriptModule")() | A wrapper around C++ `torch::jit::Module`. [`ScriptFunction`](generated/torch.jit.scriptfunction#torch.jit.ScriptFunction "torch.jit.ScriptFunction") | Functionally equivalent to a [`ScriptModule`](generated/torch.jit.scriptmodule#torch.jit.ScriptModule "torch.jit.ScriptModule"), but represents a single function and does not have any attributes or Parameters. [`freeze`](generated/torch.jit.freeze#torch.jit.freeze "torch.jit.freeze")(mod[, preserved_attrs, optimize_numerics]) | Freezing a [`ScriptModule`](generated/torch.jit.scriptmodule#torch.jit.ScriptModule "torch.jit.ScriptModule") will clone it and attempt to inline the cloned module’s submodules, parameters, and attributes as constants in the TorchScript IR Graph. [`save`](generated/torch.jit.save#torch.jit.save "torch.jit.save")(m, f[, _extra_files]) | Save an offline version of this module for use in a separate process. [`load`](generated/torch.jit.load#torch.jit.load "torch.jit.load")(f[, map_location, _extra_files]) | Load a [`ScriptModule`](generated/torch.jit.scriptmodule#torch.jit.ScriptModule "torch.jit.ScriptModule") or [`ScriptFunction`](generated/torch.jit.scriptfunction#torch.jit.ScriptFunction "torch.jit.ScriptFunction") previously saved with [`torch.jit.save`](generated/torch.jit.save#torch.jit.save "torch.jit.save") [`ignore`](generated/torch.jit.ignore#torch.jit.ignore "torch.jit.ignore")([drop]) | This decorator indicates to the compiler that a function or method should be ignored and left as a Python function. [`unused`](generated/torch.jit.unused#torch.jit.unused "torch.jit.unused")(fn) | This decorator indicates to the compiler that a function or method should be ignored and replaced with the raising of an exception. [`isinstance`](generated/torch.jit.isinstance#torch.jit.isinstance "torch.jit.isinstance")(obj, target_type) | This function provides for conatiner type refinement in TorchScript. ## Mixing Tracing and Scripting In many cases either tracing or scripting is an easier approach for converting a model to TorchScript. Tracing and scripting can be composed to suit the particular requirements of a part of a model. Scripted functions can call traced functions. 
This is particularly useful when you need to use control-flow around a simple feed-forward model. For instance the beam search of a sequence to sequence model will typically be written in script but can call an encoder module generated using tracing. Example (calling a traced function in script): import torch def foo(x, y): return 2 * x + y traced_foo = torch.jit.trace(foo, (torch.rand(3), torch.rand(3))) @torch.jit.script def bar(x): return traced_foo(x, x) Traced functions can call script functions. This is useful when a small part of a model requires some control-flow even though most of the model is just a feed-forward network. Control-flow inside of a script function called by a traced function is preserved correctly. Example (calling a script function in a traced function): import torch @torch.jit.script def foo(x, y): if x.max() > y.max(): r = x else: r = y return r def bar(x, y, z): return foo(x, y) + z traced_bar = torch.jit.trace(bar, (torch.rand(3), torch.rand(3), torch.rand(3))) This composition also works for `nn.Module`s as well, where it can be used to generate a submodule using tracing that can be called from the methods of a script module. Example (using a traced module): import torch import torchvision class MyScriptModule(torch.nn.Module): def __init__(self): super(MyScriptModule, self).__init__() self.means = torch.nn.Parameter(torch.tensor([103.939, 116.779, 123.68]) .resize_(1, 3, 1, 1)) self.resnet = torch.jit.trace(torchvision.models.resnet18(), torch.rand(1, 3, 224, 224)) def forward(self, input): return self.resnet(input - self.means) my_script_module = torch.jit.script(MyScriptModule()) ## TorchScript Language TorchScript is a statically typed subset of Python, so many Python features apply directly to TorchScript. See the full [TorchScript Language Reference](jit_language_reference#language-reference) for details. ## Built-in Functions and Modules TorchScript supports the use of most PyTorch functions and many Python built- ins. See [TorchScript Builtins](jit_builtin_functions#builtin-functions) for a full reference of supported functions. ### PyTorch Functions and Modules TorchScript supports a subset of the tensor and neural network functions that PyTorch provides. Most methods on Tensor as well as functions in the `torch` namespace, all functions in `torch.nn.functional` and most modules from `torch.nn` are supported in TorchScript. See [TorchScript Unsupported Pytorch Constructs](jit_unsupported#jit- unsupported) for a list of unsupported PyTorch functions and modules. ### Python Functions and Modules Many of Python’s [built-in functions](https://docs.python.org/3/library/functions.html) are supported in TorchScript. The [`math`](https://docs.python.org/3/library/math.html#module- math "\(in Python v3.9\)") module is also supported (see [math Module](jit_builtin_functions#math-module) for details), but no other Python modules (built-in or third party) are supported. ### Python Language Reference Comparison For a full listing of supported Python features, see [Python Language Reference Coverage](jit_python_reference#python-language-reference). ## Debugging ### Disable JIT for Debugging `PYTORCH_JIT` Setting the environment variable `PYTORCH_JIT=0` will disable all script and tracing annotations. If there is hard-to-debug error in one of your TorchScript models, you can use this flag to force everything to run using native Python. Since TorchScript (scripting and tracing) is disabled with this flag, you can use tools like `pdb` to debug the model code. 
For example: @torch.jit.script def scripted_fn(x : torch.Tensor): for i in range(12): x = x + x return x def fn(x): x = torch.neg(x) import pdb; pdb.set_trace() return scripted_fn(x) traced_fn = torch.jit.trace(fn, (torch.rand(4, 5),)) traced_fn(torch.rand(3, 4)) Debugging this script with `pdb` works except for when we invoke the [`@torch.jit.script`](generated/torch.jit.script#torch.jit.script "torch.jit.script") function. We can globally disable JIT, so that we can call the [`@torch.jit.script`](generated/torch.jit.script#torch.jit.script "torch.jit.script") function as a normal Python function and not compile it. If the above script is called `disable_jit_example.py`, we can invoke it like so: $ PYTORCH_JIT=0 python disable_jit_example.py and we will be able to step into the [`@torch.jit.script`](generated/torch.jit.script#torch.jit.script "torch.jit.script") function as a normal Python function. To disable the TorchScript compiler for a specific function, see [`@torch.jit.ignore`](generated/torch.jit.ignore#torch.jit.ignore "torch.jit.ignore"). ### Inspecting Code TorchScript provides a code pretty-printer for all [`ScriptModule`](generated/torch.jit.scriptmodule#torch.jit.ScriptModule "torch.jit.ScriptModule") instances. This pretty-printer gives an interpretation of the script method’s code as valid Python syntax. For example: @torch.jit.script def foo(len): # type: (int) -> torch.Tensor rv = torch.zeros(3, 4) for i in range(len): if i < 10: rv = rv - 1.0 else: rv = rv + 1.0 return rv print(foo.code) A [`ScriptModule`](generated/torch.jit.scriptmodule#torch.jit.ScriptModule "torch.jit.ScriptModule") with a single `forward` method will have an attribute `code`, which you can use to inspect the [`ScriptModule`](generated/torch.jit.scriptmodule#torch.jit.ScriptModule "torch.jit.ScriptModule")’s code. If the [`ScriptModule`](generated/torch.jit.scriptmodule#torch.jit.ScriptModule "torch.jit.ScriptModule") has more than one method, you will need to access `.code` on the method itself and not the module. We can inspect the code of a method named `foo` on a [`ScriptModule`](generated/torch.jit.scriptmodule#torch.jit.ScriptModule "torch.jit.ScriptModule") by accessing `.foo.code`. The example above produces this output: def foo(len: int) -> Tensor: rv = torch.zeros([3, 4], dtype=None, layout=None, device=None, pin_memory=None) rv0 = rv for i in range(len): if torch.lt(i, 10): rv1 = torch.sub(rv0, 1., 1) else: rv1 = torch.add(rv0, 1., 1) rv0 = rv1 return rv0 This is TorchScript’s compilation of the code for the `forward` method. You can use this to ensure TorchScript (tracing or scripting) has captured your model code correctly. ### Interpreting Graphs TorchScript also has a representation at a lower level than the code pretty- printer, in the form of IR graphs. TorchScript uses a static single assignment (SSA) intermediate representation (IR) to represent computation. The instructions in this format consist of ATen (the C++ backend of PyTorch) operators and other primitive operators, including control flow operators for loops and conditionals. As an example: @torch.jit.script def foo(len): # type: (int) -> torch.Tensor rv = torch.zeros(3, 4) for i in range(len): if i < 10: rv = rv - 1.0 else: rv = rv + 1.0 return rv print(foo.graph) `graph` follows the same rules described in the Inspecting Code section with regard to `forward` method lookup. 
The example script above produces the graph: graph(%len.1 : int): %24 : int = prim::Constant[value=1]() %17 : bool = prim::Constant[value=1]() # test.py:10:5 %12 : bool? = prim::Constant() %10 : Device? = prim::Constant() %6 : int? = prim::Constant() %1 : int = prim::Constant[value=3]() # test.py:9:22 %2 : int = prim::Constant[value=4]() # test.py:9:25 %20 : int = prim::Constant[value=10]() # test.py:11:16 %23 : float = prim::Constant[value=1]() # test.py:12:23 %4 : int[] = prim::ListConstruct(%1, %2) %rv.1 : Tensor = aten::zeros(%4, %6, %6, %10, %12) # test.py:9:10 %rv : Tensor = prim::Loop(%len.1, %17, %rv.1) # test.py:10:5 block0(%i.1 : int, %rv.14 : Tensor): %21 : bool = aten::lt(%i.1, %20) # test.py:11:12 %rv.13 : Tensor = prim::If(%21) # test.py:11:9 block0(): %rv.3 : Tensor = aten::sub(%rv.14, %23, %24) # test.py:12:18 -> (%rv.3) block1(): %rv.6 : Tensor = aten::add(%rv.14, %23, %24) # test.py:14:18 -> (%rv.6) -> (%17, %rv.13) return (%rv) Take the instruction `%rv.1 : Tensor = aten::zeros(%4, %6, %6, %10, %12) # test.py:9:10` for example. * `%rv.1 : Tensor` means we assign the output to a (unique) value named `rv.1`, that value is of `Tensor` type and that we do not know its concrete shape. * `aten::zeros` is the operator (equivalent to `torch.zeros`) and the input list `(%4, %6, %6, %10, %12)` specifies which values in scope should be passed as inputs. The schema for built-in functions like `aten::zeros` can be found at Builtin Functions. * `# test.py:9:10` is the location in the original source file that generated this instruction. In this case, it is a file named `test.py`, on line 9, and at character 10. Notice that operators can also have associated `blocks`, namely the `prim::Loop` and `prim::If` operators. In the graph print-out, these operators are formatted to reflect their equivalent source code forms to facilitate easy debugging. Graphs can be inspected as shown to confirm that the computation described by a [`ScriptModule`](generated/torch.jit.scriptmodule#torch.jit.ScriptModule "torch.jit.ScriptModule") is correct, in both automated and manual fashion, as described below. ### Tracer #### Tracing Edge Cases There are some edge cases that exist where the trace of a given Python function/module will not be representative of the underlying code. These cases can include: * Tracing of control flow that is dependent on inputs (e.g. tensor shapes) * Tracing of in-place operations of tensor views (e.g. indexing on the left-hand side of an assignment) Note that these cases may in fact be traceable in the future. #### Automatic Trace Checking One way to automatically catch many errors in traces is by using `check_inputs` on the `torch.jit.trace()` API. `check_inputs` takes a list of tuples of inputs that will be used to re-trace the computation and verify the results. For example: def loop_in_traced_fn(x): result = x[0] for i in range(x.size(0)): result = result * x[i] return result inputs = (torch.rand(3, 4, 5),) check_inputs = [(torch.rand(4, 5, 6),), (torch.rand(2, 3, 4),)] traced = torch.jit.trace(loop_in_traced_fn, inputs, check_inputs=check_inputs) Gives us the following diagnostic information: ERROR: Graphs differed across invocations! 
Graph diff: graph(%x : Tensor) { %1 : int = prim::Constant[value=0]() %2 : int = prim::Constant[value=0]() %result.1 : Tensor = aten::select(%x, %1, %2) %4 : int = prim::Constant[value=0]() %5 : int = prim::Constant[value=0]() %6 : Tensor = aten::select(%x, %4, %5) %result.2 : Tensor = aten::mul(%result.1, %6) %8 : int = prim::Constant[value=0]() %9 : int = prim::Constant[value=1]() %10 : Tensor = aten::select(%x, %8, %9) - %result : Tensor = aten::mul(%result.2, %10) + %result.3 : Tensor = aten::mul(%result.2, %10) ? ++ %12 : int = prim::Constant[value=0]() %13 : int = prim::Constant[value=2]() %14 : Tensor = aten::select(%x, %12, %13) + %result : Tensor = aten::mul(%result.3, %14) + %16 : int = prim::Constant[value=0]() + %17 : int = prim::Constant[value=3]() + %18 : Tensor = aten::select(%x, %16, %17) - %15 : Tensor = aten::mul(%result, %14) ? ^ ^ + %19 : Tensor = aten::mul(%result, %18) ? ^ ^ - return (%15); ? ^ + return (%19); ? ^ } This message indicates to us that the computation differed between when we first traced it and when we traced it with the `check_inputs`. Indeed, the loop within the body of `loop_in_traced_fn` depends on the shape of the input `x`, and thus when we try another `x` with a different shape, the trace differs. In this case, data-dependent control flow like this can be captured using [`torch.jit.script()`](generated/torch.jit.script#torch.jit.script "torch.jit.script") instead: def fn(x): result = x[0] for i in range(x.size(0)): result = result * x[i] return result inputs = (torch.rand(3, 4, 5),) check_inputs = [(torch.rand(4, 5, 6),), (torch.rand(2, 3, 4),)] scripted_fn = torch.jit.script(fn) print(scripted_fn.graph) #print(str(scripted_fn.graph).strip()) for input_tuple in [inputs] + check_inputs: torch.testing.assert_allclose(fn(*input_tuple), scripted_fn(*input_tuple)) Which produces: graph(%x : Tensor) { %5 : bool = prim::Constant[value=1]() %1 : int = prim::Constant[value=0]() %result.1 : Tensor = aten::select(%x, %1, %1) %4 : int = aten::size(%x, %1) %result : Tensor = prim::Loop(%4, %5, %result.1) block0(%i : int, %7 : Tensor) { %10 : Tensor = aten::select(%x, %1, %i) %result.2 : Tensor = aten::mul(%7, %10) -> (%5, %result.2) } return (%result); } #### Tracer Warnings The tracer produces warnings for several problematic patterns in traced computation. As an example, take a trace of a function that contains an in- place assignment on a slice (a view) of a Tensor: def fill_row_zero(x): x[0] = torch.rand(*x.shape[1:2]) return x traced = torch.jit.trace(fill_row_zero, (torch.rand(3, 4),)) print(traced.graph) Produces several warnings and a graph which simply returns the input: fill_row_zero.py:4: TracerWarning: There are 2 live references to the data region being modified when tracing in-place operator copy_ (possibly due to an assignment). This might cause the trace to be incorrect, because all other views that also reference this data will not reflect this change in the trace! On the other hand, if all other views use the same memory chunk, but are disjoint (e.g. are outputs of torch.split), this might still be safe. x[0] = torch.rand(*x.shape[1:2]) fill_row_zero.py:6: TracerWarning: Output nr 1. of the traced function does not match the corresponding output of the Python function. Detailed error: Not within tolerance rtol=1e-05 atol=1e-05 at input[0, 1] (0.09115803241729736 vs. 
0.6782537698745728) and 3 other locations (33.00%) traced = torch.jit.trace(fill_row_zero, (torch.rand(3, 4),)) graph(%0 : Float(3, 4)) { return (%0); } We can fix this by modifying the code to not use the in-place update, but rather build up the result tensor out-of-place with `torch.cat`: def fill_row_zero(x): x = torch.cat((torch.rand(1, *x.shape[1:2]), x[1:2]), dim=0) return x traced = torch.jit.trace(fill_row_zero, (torch.rand(3, 4),)) print(traced.graph) ## Frequently Asked Questions Q: I would like to train a model on GPU and do inference on CPU. What are the best practices? First convert your model from GPU to CPU and then save it, like so: cpu_model = gpu_model.cpu() sample_input_cpu = sample_input_gpu.cpu() traced_cpu = torch.jit.trace(cpu_model, sample_input_cpu) torch.jit.save(traced_cpu, "cpu.pt") traced_gpu = torch.jit.trace(gpu_model, sample_input_gpu) torch.jit.save(traced_gpu, "gpu.pt") # ... later, when using the model: if use_gpu: model = torch.jit.load("gpu.pt") else: model = torch.jit.load("cpu.pt") model(input) This is recommended because the tracer may witness tensor creation on a specific device, so casting an already-loaded model may have unexpected effects. Casting the model _before_ saving it ensures that the tracer has the correct device information. Q: How do I store attributes on a [`ScriptModule`](generated/torch.jit.scriptmodule#torch.jit.ScriptModule "torch.jit.ScriptModule")? Say we have a model like: import torch class Model(torch.nn.Module): def __init__(self): super(Model, self).__init__() self.x = 2 def forward(self): return self.x m = torch.jit.script(Model()) If `Model` is instantiated it will result in a compilation error since the compiler doesn’t know about `x`. There are 4 ways to inform the compiler of attributes on [`ScriptModule`](generated/torch.jit.scriptmodule#torch.jit.ScriptModule "torch.jit.ScriptModule"): 1\. `nn.Parameter` \- Values wrapped in `nn.Parameter` will work as they do on `nn.Module`s 2\. `register_buffer` \- Values wrapped in `register_buffer` will work as they do on `nn.Module`s. This is equivalent to an attribute (see 4) of type `Tensor`. 3\. Constants - Annotating a class member as `Final` (or adding it to a list called `__constants__` at the class definition level) will mark the contained names as constants. Constants are saved directly in the code of the model. See `builtin-constants` for details. 4\. Attributes - Values that are a `supported type` can be added as mutable attributes. Most types can be inferred but some may need to be specified, see `module attributes` for details. Q: I would like to trace module’s method but I keep getting this error: `RuntimeError: Cannot insert a Tensor that requires grad as a constant. Consider making it a parameter or input, or detaching the gradient` This error usually means that the method you are tracing uses a module’s parameters and you are passing the module’s method instead of the module instance (e.g. `my_module_instance.forward` vs `my_module_instance`). * Invoking `trace` with a module’s method captures module parameters (which may require gradients) as **constants**. * On the other hand, invoking `trace` with module’s instance (e.g. `my_module`) creates a new module and correctly copies parameters into the new module, so they can accumulate gradients if required. 
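For concreteness, a minimal sketch contrasting the two call forms (the module, layer sizes, and input below are illustrative, not from the original example):

import torch
import torch.nn as nn

class MyModule(nn.Module):
    def __init__(self):
        super(MyModule, self).__init__()
        self.linear = nn.Linear(4, 4)

    def forward(self, x):
        return self.linear(x)

my_module_instance = MyModule()
example_input = torch.rand(1, 4)

# Passing the bound method captures the module's parameters as constants and
# typically triggers the "requires grad as a constant" error:
#   torch.jit.trace(my_module_instance.forward, example_input)

# Passing the module instance copies the parameters into the traced module:
traced = torch.jit.trace(my_module_instance, example_input)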
To trace a specific method on a module, see [`torch.jit.trace_module`](generated/torch.jit.trace_module#torch.jit.trace_module "torch.jit.trace_module"). ## Appendix ### Migrating to PyTorch 1.2 Recursive Scripting API This section details the changes to TorchScript in PyTorch 1.2. If you are new to TorchScript you can skip this section. There are two main changes to the TorchScript API with PyTorch 1.2. 1\. [`torch.jit.script`](generated/torch.jit.script#torch.jit.script "torch.jit.script") will now attempt to recursively compile functions, methods, and classes that it encounters. Once you call `torch.jit.script`, compilation is “opt-out”, rather than “opt-in”. 2\. `torch.jit.script(nn_module_instance)` is now the preferred way to create [`ScriptModule`](generated/torch.jit.scriptmodule#torch.jit.ScriptModule "torch.jit.ScriptModule")s, instead of inheriting from `torch.jit.ScriptModule`. These changes combine to provide a simpler, easier-to-use API for converting your `nn.Module`s into [`ScriptModule`](generated/torch.jit.scriptmodule#torch.jit.ScriptModule "torch.jit.ScriptModule")s, ready to be optimized and executed in a non-Python environment. The new usage looks like this: import torch import torch.nn as nn import torch.nn.functional as F class Model(nn.Module): def __init__(self): super(Model, self).__init__() self.conv1 = nn.Conv2d(1, 20, 5) self.conv2 = nn.Conv2d(20, 20, 5) def forward(self, x): x = F.relu(self.conv1(x)) return F.relu(self.conv2(x)) my_model = Model() my_scripted_model = torch.jit.script(my_model) * The module’s `forward` is compiled by default. Methods called from `forward` are lazily compiled in the order they are used in `forward`. * To compile a method other than `forward` that is not called from `forward`, add `@torch.jit.export`. * To stop the compiler from compiling a method, add [`@torch.jit.ignore`](generated/torch.jit.ignore#torch.jit.ignore "torch.jit.ignore") or [`@torch.jit.unused`](generated/torch.jit.unused#torch.jit.unused "torch.jit.unused"). `@ignore` leaves the method as a call to Python, and `@unused` replaces it with an exception. Methods marked `@ignore` cannot be exported; methods marked `@unused` can. * Most attribute types can be inferred, so `torch.jit.Attribute` is not necessary. For empty container types, annotate their types using [PEP 526-style](https://www.python.org/dev/peps/pep-0526/#class-and-instance-variable-annotations) class annotations. * Constants can be marked with a `Final` class annotation instead of adding the name of the member to `__constants__`. * Python 3 type hints can be used in place of `torch.jit.annotate`. As a result of these changes, the following items are considered deprecated and should not appear in new code: * The `@torch.jit.script_method` decorator * Classes that inherit from `torch.jit.ScriptModule` * The `torch.jit.Attribute` wrapper class * The `__constants__` array * The `torch.jit.annotate` function #### Modules Warning The [`@torch.jit.ignore`](generated/torch.jit.ignore#torch.jit.ignore "torch.jit.ignore") annotation’s behavior changes in PyTorch 1.2. Before PyTorch 1.2 the @ignore decorator was used to make a function or method callable from code that is exported. To get this functionality back, use `@torch.jit.unused()`. `@torch.jit.ignore` is now equivalent to `@torch.jit.ignore(drop=False)`. See [`@torch.jit.ignore`](generated/torch.jit.ignore#torch.jit.ignore "torch.jit.ignore") and [`@torch.jit.unused`](generated/torch.jit.unused#torch.jit.unused "torch.jit.unused") for details.
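As a rough sketch of the post-1.2 behavior (the module and method names here are hypothetical), a method that is not TorchScript-compatible can be excluded with `@torch.jit.unused`; scripting still succeeds, and calling the excluded method from compiled code raises an exception:

import torch
import torch.nn as nn

class MyModule(nn.Module):
    def forward(self, x):
        return x + 1

    @torch.jit.unused
    def debug_only(self, x):
        # Not TorchScript-compatible; replaced with a raising stub when scripted.
        import pdb; pdb.set_trace()
        return x

m = torch.jit.script(MyModule())  # compiles `forward`, skips `debug_only`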
When passed to the [`torch.jit.script`](generated/torch.jit.script#torch.jit.script "torch.jit.script") function, a `torch.nn.Module`’s data is copied to a [`ScriptModule`](generated/torch.jit.scriptmodule#torch.jit.ScriptModule "torch.jit.ScriptModule") and the TorchScript compiler compiles the module. The module’s `forward` is compiled by default. Methods called from `forward` are lazily compiled in the order they are used in `forward`, as well as any `@torch.jit.export` methods. `torch.jit.export(fn)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/_jit_internal.html#export) This decorator indicates that a method on an `nn.Module` is used as an entry point into a [`ScriptModule`](generated/torch.jit.scriptmodule#torch.jit.ScriptModule "torch.jit.ScriptModule") and should be compiled. `forward` implicitly is assumed to be an entry point, so it does not need this decorator. Functions and methods called from `forward` are compiled as they are seen by the compiler, so they do not need this decorator either. Example (using `@torch.jit.export` on a method): import torch import torch.nn as nn class MyModule(nn.Module): def implicitly_compiled_method(self, x): return x + 99 # `forward` is implicitly decorated with `@torch.jit.export`, # so adding it here would have no effect def forward(self, x): return x + 10 @torch.jit.export def another_forward(self, x): # When the compiler sees this call, it will compile # `implicitly_compiled_method` return self.implicitly_compiled_method(x) def unused_method(self, x): return x - 20 # `m` will contain compiled methods: # `forward` # `another_forward` # `implicitly_compiled_method` # `unused_method` will not be compiled since it was not called from # any compiled methods and wasn't decorated with `@torch.jit.export` m = torch.jit.script(MyModule()) #### Functions Functions don’t change much, they can be decorated with [`@torch.jit.ignore`](generated/torch.jit.ignore#torch.jit.ignore "torch.jit.ignore") or [`torch.jit.unused`](generated/torch.jit.unused#torch.jit.unused "torch.jit.unused") if needed. # Same behavior as pre-PyTorch 1.2 @torch.jit.script def some_fn(): return 2 # Marks a function as ignored, if nothing # ever calls it then this has no effect @torch.jit.ignore def some_fn2(): return 2 # As with ignore, if nothing calls it then it has no effect. # If it is called in script it is replaced with an exception. @torch.jit.unused def some_fn3(): import pdb; pdb.set_trace() return 4 # Doesn't do anything, this function is already # the main entry point @torch.jit.export def some_fn4(): return 2 #### TorchScript Classes Warning TorchScript class support is experimental. Currently it is best suited for simple record-like types (think a `NamedTuple` with methods attached). Everything in a user defined [TorchScript Class](torchscript-class) is exported by default, functions can be decorated with [`@torch.jit.ignore`](generated/torch.jit.ignore#torch.jit.ignore "torch.jit.ignore") if needed. #### Attributes The TorchScript compiler needs to know the types of `module attributes`. Most types can be inferred from the value of the member. Empty lists and dicts cannot have their types inferred and must have their types annotated with [PEP 526-style](https://www.python.org/dev/peps/pep-0526/#class-and-instance- variable-annotations) class annotations. 
If a type cannot be inferred and is not explicitly annotated, it will not be added as an attribute to the resulting [`ScriptModule`](generated/torch.jit.scriptmodule#torch.jit.ScriptModule "torch.jit.ScriptModule") Old API: from typing import Dict import torch class MyModule(torch.jit.ScriptModule): def __init__(self): super(MyModule, self).__init__() self.my_dict = torch.jit.Attribute({}, Dict[str, int]) self.my_int = torch.jit.Attribute(20, int) m = MyModule() New API: from typing import Dict class MyModule(torch.nn.Module): my_dict: Dict[str, int] def __init__(self): super(MyModule, self).__init__() # This type cannot be inferred and must be specified self.my_dict = {} # The attribute type here is inferred to be `int` self.my_int = 20 def forward(self): pass m = torch.jit.script(MyModule()) #### Constants The `Final` type constructor can be used to mark members as `constant`. If members are not marked constant, they will be copied to the resulting [`ScriptModule`](generated/torch.jit.scriptmodule#torch.jit.ScriptModule "torch.jit.ScriptModule") as an attribute. Using `Final` opens opportunities for optimization if the value is known to be fixed and gives additional type safety. Old API: class MyModule(torch.jit.ScriptModule): __constants__ = ['my_constant'] def __init__(self): super(MyModule, self).__init__() self.my_constant = 2 def forward(self): pass m = MyModule() New API: try: from typing_extensions import Final except: # If you don't have `typing_extensions` installed, you can use a # polyfill from `torch.jit`. from torch.jit import Final class MyModule(torch.nn.Module): my_constant: Final[int] def __init__(self): super(MyModule, self).__init__() self.my_constant = 2 def forward(self): pass m = torch.jit.script(MyModule()) #### Variables Containers are assumed to have type `Tensor` and be non-optional (see `Default Types` for more information). Previously, `torch.jit.annotate` was used to tell the TorchScript compiler what the type should be. Python 3 style type hints are now supported. import torch from typing import Dict, Optional @torch.jit.script def make_dict(flag: bool): x: Dict[str, int] = {} x['hi'] = 2 b: Optional[int] = None if flag: b = 2 return x, b ### References * [Python Language Reference Coverage](jit_python_reference) * [TorchScript Unsupported Pytorch Constructs](jit_unsupported) * Types * Expressions * Statements * Variable Resolution * Use of Python Values # TorchScript Language Reference TorchScript is a statically typed subset of Python that can either be written directly (using the [`@torch.jit.script`](generated/torch.jit.script#torch.jit.script "torch.jit.script") decorator) or generated automatically from Python code via tracing. When using tracing, code is automatically converted into this subset of Python by recording only the actual operators on tensors and simply executing and discarding the other surrounding Python code. When writing TorchScript directly using `@torch.jit.script` decorator, the programmer must only use the subset of Python supported in TorchScript. This section documents what is supported in TorchScript as if it were a language reference for a stand alone language. Any features of Python not mentioned in this reference are not part of TorchScript. See `Builtin Functions` for a complete reference of available Pytorch tensor methods, modules, and functions. As a subset of Python, any valid TorchScript function is also a valid Python function. 
This makes it possible to `disable TorchScript` and debug the function using standard Python tools like `pdb`. The reverse is not true: there are many valid Python programs that are not valid TorchScript programs. Instead, TorchScript focuses specifically on the features of Python that are needed to represent neural network models in PyTorch. ## Types The largest difference between TorchScript and the full Python language is that TorchScript only supports a small set of types that are needed to express neural net models. In particular, TorchScript supports:

Type | Description
---|---
`Tensor` | A PyTorch tensor of any dtype, dimension, or backend
`Tuple[T0, T1, ..., TN]` | A tuple containing subtypes `T0`, `T1`, etc. (e.g. `Tuple[Tensor, Tensor]`)
`bool` | A boolean value
`int` | A scalar integer
`float` | A scalar floating point number
`str` | A string
`List[T]` | A list of which all members are type `T`
`Optional[T]` | A value which is either None or type `T`
`Dict[K, V]` | A dict with key type `K` and value type `V`. Only `str`, `int`, and `float` are allowed as key types.
`T` | A TorchScript Class
`E` | A TorchScript Enum
`NamedTuple[T0, T1, ...]` | A [`collections.namedtuple`](https://docs.python.org/3/library/collections.html#collections.namedtuple "\(in Python v3.9\)") tuple type

Unlike Python, each variable in a TorchScript function must have a single static type. This makes it easier to optimize TorchScript functions. Example (a type mismatch): import torch @torch.jit.script def an_error(x): if x: r = torch.rand(1) else: r = 4 return r Traceback (most recent call last): ... RuntimeError: ... Type mismatch: r is set to type Tensor in the true branch and type int in the false branch: @torch.jit.script def an_error(x): if x: ~~~~~ r = torch.rand(1) ~~~~~~~~~~~~~~~~~ else: ~~~~~ r = 4 ~~~~~ <--- HERE return r and was used here: else: r = 4 return r ~ <--- HERE... ### Unsupported Typing Constructs TorchScript does not support all features and types of the [`typing`](https://docs.python.org/3/library/typing.html#module-typing "\(in Python v3.9\)") module. Some of these are more fundamental things that are unlikely to be added in the future while others may be added if there is enough user demand to make it a priority. These types and features from the [`typing`](https://docs.python.org/3/library/typing.html#module-typing "\(in Python v3.9\)") module are unavailable in TorchScript.
Item | Description
---|---
[`typing.Any`](https://docs.python.org/3/library/typing.html#typing.Any "\(in Python v3.9\)") | [`typing.Any`](https://docs.python.org/3/library/typing.html#typing.Any "\(in Python v3.9\)") is currently in development but not yet released
[`typing.NoReturn`](https://docs.python.org/3/library/typing.html#typing.NoReturn "\(in Python v3.9\)") | Not implemented
[`typing.Union`](https://docs.python.org/3/library/typing.html#typing.Union "\(in Python v3.9\)") | Unlikely to be implemented (however [`typing.Optional`](https://docs.python.org/3/library/typing.html#typing.Optional "\(in Python v3.9\)") is supported)
[`typing.Sequence`](https://docs.python.org/3/library/typing.html#typing.Sequence "\(in Python v3.9\)") | Not implemented
[`typing.Callable`](https://docs.python.org/3/library/typing.html#typing.Callable "\(in Python v3.9\)") | Not implemented
[`typing.Literal`](https://docs.python.org/3/library/typing.html#typing.Literal "\(in Python v3.9\)") | Not implemented
[`typing.ClassVar`](https://docs.python.org/3/library/typing.html#typing.ClassVar "\(in Python v3.9\)") | Not implemented
[`typing.Final`](https://docs.python.org/3/library/typing.html#typing.Final "\(in Python v3.9\)") | This is supported for module attribute class annotations, but not for functions
[`typing.AnyStr`](https://docs.python.org/3/library/typing.html#typing.AnyStr "\(in Python v3.9\)") | TorchScript does not support [`bytes`](https://docs.python.org/3/library/stdtypes.html#bytes "\(in Python v3.9\)") so this type is not used
[`typing.overload`](https://docs.python.org/3/library/typing.html#typing.overload "\(in Python v3.9\)") | [`typing.overload`](https://docs.python.org/3/library/typing.html#typing.overload "\(in Python v3.9\)") is currently in development but not yet released
Type aliases | Not implemented
Nominal vs structural subtyping | Nominal typing is in development, but structural typing is not
NewType | Unlikely to be implemented
Generics | Unlikely to be implemented

Any other functionality from the [`typing`](https://docs.python.org/3/library/typing.html#module-typing "\(in Python v3.9\)") module not explicitly listed in this documentation is unsupported. ### Default Types By default, all parameters to a TorchScript function are assumed to be Tensor. To specify that an argument to a TorchScript function is another type, it is possible to use MyPy-style type annotations using the types listed above. import torch @torch.jit.script def foo(x, tup): # type: (int, Tuple[Tensor, Tensor]) -> Tensor t0, t1 = tup return t0 + t1 + x print(foo(3, (torch.rand(3), torch.rand(3)))) Note It is also possible to annotate types with Python 3 type hints from the `typing` module. import torch from typing import Tuple @torch.jit.script def foo(x: int, tup: Tuple[torch.Tensor, torch.Tensor]) -> torch.Tensor: t0, t1 = tup return t0 + t1 + x print(foo(3, (torch.rand(3), torch.rand(3)))) An empty list is assumed to be `List[Tensor]` and empty dicts `Dict[str, Tensor]`. To instantiate an empty list or dict of other types, use `Python 3 type hints`.
Example (type annotations for Python 3): import torch import torch.nn as nn from typing import Dict, List, Tuple class EmptyDataStructures(torch.nn.Module): def __init__(self): super(EmptyDataStructures, self).__init__() def forward(self, x: torch.Tensor) -> Tuple[List[Tuple[int, float]], Dict[str, int]]: # This annotates the list to be a `List[Tuple[int, float]]` my_list: List[Tuple[int, float]] = [] for i in range(10): my_list.append((i, x.item())) my_dict: Dict[str, int] = {} return my_list, my_dict x = torch.jit.script(EmptyDataStructures()) ### Optional Type Refinement TorchScript will refine the type of a variable of type `Optional[T]` when a comparison to `None` is made inside the conditional of an if-statement or checked in an `assert`. The compiler can reason about multiple `None` checks that are combined with `and`, `or`, and `not`. Refinement will also occur for else blocks of if-statements that are not explicitly written. The `None` check must be within the if-statement’s condition; assigning a `None` check to a variable and using it in the if-statement’s condition will not refine the types of variables in the check. Only local variables will be refined, an attribute like `self.x` will not and must assigned to a local variable to be refined. Example (refining types on parameters and locals): import torch import torch.nn as nn from typing import Optional class M(nn.Module): z: Optional[int] def __init__(self, z): super(M, self).__init__() # If `z` is None, its type cannot be inferred, so it must # be specified (above) self.z = z def forward(self, x, y, z): # type: (Optional[int], Optional[int], Optional[int]) -> int if x is None: x = 1 x = x + 1 # Refinement for an attribute by assigning it to a local z = self.z if y is not None and z is not None: x = y + z # Refinement via an `assert` assert z is not None x += z return x module = torch.jit.script(M(2)) module = torch.jit.script(M(None)) ### TorchScript Classes Warning TorchScript class support is experimental. Currently it is best suited for simple record-like types (think a `NamedTuple` with methods attached). Python classes can be used in TorchScript if they are annotated with [`@torch.jit.script`](generated/torch.jit.script#torch.jit.script "torch.jit.script"), similar to how you would declare a TorchScript function: @torch.jit.script class Foo: def __init__(self, x, y): self.x = x def aug_add_x(self, inc): self.x += inc This subset is restricted: * All functions must be valid TorchScript functions (including `__init__()`). * Classes must be new-style classes, as we use `__new__()` to construct them with pybind11. * TorchScript classes are statically typed. Members can only be declared by assigning to self in the `__init__()` method. For example, assigning to `self` outside of the `__init__()` method: @torch.jit.script class Foo: def assign_x(self): self.x = torch.rand(2, 3) Will result in: RuntimeError: Tried to set nonexistent attribute: x. Did you forget to initialize it in __init__()?: def assign_x(self): self.x = torch.rand(2, 3) ~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE * No expressions except method definitions are allowed in the body of the class. * No support for inheritance or any other polymorphism strategy, except for inheriting from `object` to specify a new-style class. 
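A minimal sketch of the corrected pattern for the `RuntimeError` example above (not from the original docs): declare the attribute in `__init__()`, after which methods may reassign it:

import torch

@torch.jit.script
class Foo:
    def __init__(self):
        # Declaring `x` here makes it a known, statically typed member (Tensor).
        self.x = torch.rand(2, 3)

    def assign_x(self):
        # Reassigning a member declared in __init__() is allowed.
        self.x = torch.rand(2, 3)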
After a class is defined, it can be used in both TorchScript and Python interchangeably like any other TorchScript type: # Declare a TorchScript class @torch.jit.script class Pair: def __init__(self, first, second): self.first = first self.second = second @torch.jit.script def sum_pair(p): # type: (Pair) -> Tensor return p.first + p.second p = Pair(torch.rand(2, 3), torch.rand(2, 3)) print(sum_pair(p)) ### TorchScript Enums Python enums can be used in TorchScript without any extra annotation or code: from enum import Enum class Color(Enum): RED = 1 GREEN = 2 @torch.jit.script def enum_fn(x: Color, y: Color) -> bool: if x == Color.RED: return True return x == y After an enum is defined, it can be used in both TorchScript and Python interchangeably like any other TorchScript type. The type of the values of an enum must be `int`, `float`, or `str`. All values must be of the same type; heterogenous types for enum values are not supported. ### Named Tuples Types produced by [`collections.namedtuple`](https://docs.python.org/3/library/collections.html#collections.namedtuple "\(in Python v3.9\)") can be used in TorchScript. import torch import collections Point = collections.namedtuple('Point', ['x', 'y']) @torch.jit.script def total(point): # type: (Point) -> Tensor return point.x + point.y p = Point(x=torch.rand(3), y=torch.rand(3)) print(total(p)) ### Iterables Some functions (for example, [`zip`](https://docs.python.org/3/library/functions.html#zip "\(in Python v3.9\)") and [`enumerate`](https://docs.python.org/3/library/functions.html#enumerate "\(in Python v3.9\)")) can only operate on iterable types. Iterable types in TorchScript include `Tensor`s, lists, tuples, dictionaries, strings, [`torch.nn.ModuleList`](generated/torch.nn.modulelist#torch.nn.ModuleList "torch.nn.ModuleList") and [`torch.nn.ModuleDict`](generated/torch.nn.moduledict#torch.nn.ModuleDict "torch.nn.ModuleDict"). ## Expressions The following Python Expressions are supported. ### Literals True False None 'string literals' "string literals" 3 # interpreted as int 3.4 # interpreted as a float #### List Construction An empty list is assumed have type `List[Tensor]`. The types of other list literals are derived from the type of the members. See Default Types for more details. [3, 4] [] [torch.rand(3), torch.rand(4)] #### Tuple Construction (3, 4) (3,) #### Dict Construction An empty dict is assumed have type `Dict[str, Tensor]`. The types of other dict literals are derived from the type of the members. See Default Types for more details. {'hello': 3} {} {'a': torch.rand(3), 'b': torch.rand(4)} ### Variables See Variable Resolution for how variables are resolved. my_variable_name ### Arithmetic Operators a + b a - b a * b a / b a ^ b a @ b ### Comparison Operators a == b a != b a < b a > b a <= b a >= b ### Logical Operators a and b a or b not b ### Subscripts and Slicing t[0] t[-1] t[0:2] t[1:] t[:1] t[:] t[0, 1] t[0, 1:2] t[0, :1] t[-1, 1:, 0] t[1:, -1, 0] t[i:j, i] ### Function Calls Calls to `builtin functions` torch.rand(3, dtype=torch.int) Calls to other script functions: import torch @torch.jit.script def foo(x): return x + 1 @torch.jit.script def bar(x): return foo(x) ### Method Calls Calls to methods of builtin types like tensor: `x.mm(y)` On modules, methods must be compiled before they can be called. The TorchScript compiler recursively compiles methods it sees when compiling other methods. By default, compilation starts on the `forward` method. 
Any methods called by `forward` will be compiled, and any methods called by those methods, and so on. To start compilation at a method other than `forward`, use the [`@torch.jit.export`](jit#torch.jit.export "torch.jit.export") decorator (`forward` implicitly is marked `@torch.jit.export`). Calling a submodule directly (e.g. `self.resnet(input)`) is equivalent to calling its `forward` method (e.g. `self.resnet.forward(input)`). import torch import torch.nn as nn import torchvision class MyModule(nn.Module): def __init__(self): super(MyModule, self).__init__() means = torch.tensor([103.939, 116.779, 123.68]) self.means = torch.nn.Parameter(means.resize_(1, 3, 1, 1)) resnet = torchvision.models.resnet18() self.resnet = torch.jit.trace(resnet, torch.rand(1, 3, 224, 224)) def helper(self, input): return self.resnet(input - self.means) def forward(self, input): return self.helper(input) # Since nothing in the model calls `top_level_method`, the compiler # must be explicitly told to compile this method @torch.jit.export def top_level_method(self, input): return self.other_helper(input) def other_helper(self, input): return input + 10 # `my_script_module` will have the compiled methods `forward`, `helper`, # `top_level_method`, and `other_helper` my_script_module = torch.jit.script(MyModule()) ### Ternary Expressions x if x > y else y ### Casts float(ten) int(3.5) bool(ten) str(2)`` ### Accessing Module Parameters self.my_parameter self.my_submodule.my_parameter ## Statements TorchScript supports the following types of statements: ### Simple Assignments a = b a += b # short-hand for a = a + b, does not operate in-place on a a -= b ### Pattern Matching Assignments a, b = tuple_or_list a, b, *c = a_tuple Multiple Assignments a = b, c = tup ### Print Statements print("the result of an add:", a + b) ### If Statements if a < 4: r = -a elif a < 3: r = a + a else: r = 3 * a In addition to bools, floats, ints, and Tensors can be used in a conditional and will be implicitly casted to a boolean. ### While Loops a = 0 while a < 4: print(a) a += 1 ### For loops with range x = 0 for i in range(10): x *= i ### For loops over tuples These unroll the loop, generating a body for each member of the tuple. The body must type-check correctly for each member. tup = (3, torch.rand(4)) for x in tup: print(x) ### For loops over constant nn.ModuleList To use a `nn.ModuleList` inside a compiled method, it must be marked constant by adding the name of the attribute to the `__constants__` list for the type. For loops over a `nn.ModuleList` will unroll the body of the loop at compile time, with each member of the constant module list. class SubModule(torch.nn.Module): def __init__(self): super(SubModule, self).__init__() self.weight = nn.Parameter(torch.randn(2)) def forward(self, input): return self.weight + input class MyModule(torch.nn.Module): __constants__ = ['mods'] def __init__(self): super(MyModule, self).__init__() self.mods = torch.nn.ModuleList([SubModule() for i in range(10)]) def forward(self, v): for module in self.mods: v = module(v) return v m = torch.jit.script(MyModule()) ### Break and Continue for i in range(5): if i == 1: continue if i == 3: break print(i) ### Return return a, b ## Variable Resolution TorchScript supports a subset of Python’s variable resolution (i.e. scoping) rules. Local variables behave the same as in Python, except for the restriction that a variable must have the same type along all paths through a function. 
If a variable has a different type on different branches of an if statement, it is an error to use it after the end of the if statement. Similarly, a variable is not allowed to be used if it is only _defined_ along some paths through the function. Example: @torch.jit.script def foo(x): if x < 0: y = 4 print(y) Traceback (most recent call last): ... RuntimeError: ... y is not defined in the false branch... @torch.jit.script... def foo(x): if x < 0: ~~~~~~~~~ y = 4 ~~~~~ <--- HERE print(y) and was used here: if x < 0: y = 4 print(y) ~ <--- HERE... Non-local variables are resolved to Python values at compile time when the function is defined. These values are then converted into TorchScript values using the rules described in Use of Python Values. ## Use of Python Values To make writing TorchScript more convenient, we allow script code to refer to Python values in the surrounding scope. For instance, any time there is a reference to `torch`, the TorchScript compiler is actually resolving it to the `torch` Python module when the function is declared. These Python values are not a first class part of TorchScript. Instead they are de-sugared at compile- time into the primitive types that TorchScript supports. This depends on the dynamic type of the Python valued referenced when compilation occurs. This section describes the rules that are used when accessing Python values in TorchScript. ### Functions TorchScript can call Python functions. This functionality is very useful when incrementally converting a model to TorchScript. The model can be moved function-by-function to TorchScript, leaving calls to Python functions in place. This way you can incrementally check the correctness of the model as you go. `torch.jit.is_scripting()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/_jit_internal.html#is_scripting) Function that returns True when in compilation and False otherwise. This is useful especially with the @unused decorator to leave code in your model that is not yet TorchScript compatible. .. testcode: import torch @torch.jit.unused def unsupported_linear_op(x): return x def linear(x): if not torch.jit.is_scripting(): return torch.linear(x) else: return unsupported_linear_op(x) ### Attribute Lookup On Python Modules TorchScript can lookup attributes on modules. `Builtin functions` like `torch.add` are accessed this way. This allows TorchScript to call functions defined in other modules. ### Python-defined Constants TorchScript also provides a way to use constants that are defined in Python. These can be used to hard-code hyper-parameters into the function, or to define universal constants. There are two ways of specifying that a Python value should be treated as a constant. 1. Values looked up as attributes of a module are assumed to be constant: import math import torch @torch.jit.script def fn(): return math.pi 2. 
Attributes of a ScriptModule can be marked constant by annotating them with `Final[T]`. import torch import torch.nn as nn class Foo(nn.Module): # `Final` from the `typing_extensions` module can also be used a: torch.jit.Final[int] def __init__(self): super(Foo, self).__init__() self.a = 1 + 4 def forward(self, input): return self.a + input f = torch.jit.script(Foo()) Supported constant Python types are: * `int` * `float` * `bool` * `torch.device` * `torch.layout` * `torch.dtype` * tuples containing supported types * `torch.nn.ModuleList` which can be used in a TorchScript for loop ### Module Attributes The `torch.nn.Parameter` wrapper and `register_buffer` can be used to assign tensors to a module. Other values assigned to a module that is compiled will be added to the compiled module if their types can be inferred. All types available in TorchScript can be used as module attributes. Tensor attributes are semantically the same as buffers. The type of empty lists and dictionaries and `None` values cannot be inferred and must be specified via [PEP 526-style](https://www.python.org/dev/peps/pep-0526/#class-and-instance-variable-annotations) class annotations. If a type cannot be inferred and is not explicitly annotated, it will not be added as an attribute to the resulting `ScriptModule`. Example: from typing import List, Dict class Foo(nn.Module): # `words` is initialized as an empty list, so its type must be specified words: List[str] # The type could potentially be inferred if `a_dict` (below) was not # empty, but this annotation ensures `some_dict` will be made into the # proper type some_dict: Dict[str, int] def __init__(self, a_dict): super(Foo, self).__init__() self.words = [] self.some_dict = a_dict # `int`s can be inferred self.my_int = 10 def forward(self, input): # type: (str) -> int self.words.append(input) return self.some_dict[input] + self.my_int f = torch.jit.script(Foo({'hi': 2})) # torch.linalg Common linear algebra operations. This module is in BETA. New functions are still being added, and some functions may change in future PyTorch releases. See the documentation of each function for details. ## Functions `torch.linalg.cholesky(input, *, out=None) → Tensor` Computes the Cholesky decomposition of a Hermitian (or symmetric for real-valued matrices) positive-definite matrix or the Cholesky decompositions for a batch of such matrices. Each decomposition has the form `input = L Lᴴ`, where `L` is a lower-triangular matrix and `Lᴴ` is the conjugate transpose of `L`, which is just a transpose for the case of real-valued input matrices. In code it translates to `input = L @ L.t()` if `input` is real-valued and `input = L @ L.conj().t()` if `input` is complex-valued. The batch of `L` matrices is returned. Supports real-valued and complex-valued inputs. Note When given inputs on a CUDA device, this function synchronizes that device with the CPU. Note LAPACK’s `potrf` is used for CPU inputs, and MAGMA’s `potrf` is used for CUDA inputs. Note If `input` is not a Hermitian positive-definite matrix, or if it’s a batch of matrices and one or more of them is not a Hermitian positive-definite matrix, then a RuntimeError will be thrown. If `input` is a batch of matrices, then the error message will include the batch index of the first matrix that is not Hermitian positive-definite.
Parameters **input** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – the input tensor of size (∗,n,n)(*, n, n) consisting of Hermitian positive-definite n×nn \times n matrices, where ∗* is zero or more batch dimensions. Keyword Arguments **out** ([Tensor](tensors#torch.Tensor "torch.Tensor") _,__optional_) – The output tensor. Ignored if `None`. Default: `None` Examples: >>> a = torch.randn(2, 2, dtype=torch.complex128) >>> a = torch.mm(a, a.t().conj()) # creates a Hermitian positive-definite matrix >>> l = torch.linalg.cholesky(a) >>> a tensor([[2.5266+0.0000j, 1.9586-2.0626j], [1.9586+2.0626j, 9.4160+0.0000j]], dtype=torch.complex128) >>> l tensor([[1.5895+0.0000j, 0.0000+0.0000j], [1.2322+1.2976j, 2.4928+0.0000j]], dtype=torch.complex128) >>> torch.mm(l, l.t().conj()) tensor([[2.5266+0.0000j, 1.9586-2.0626j], [1.9586+2.0626j, 9.4160+0.0000j]], dtype=torch.complex128) >>> a = torch.randn(3, 2, 2, dtype=torch.float64) >>> a = torch.matmul(a, a.transpose(-2, -1)) # creates a symmetric positive-definite matrix >>> l = torch.linalg.cholesky(a) >>> a tensor([[[ 1.1629, 2.0237], [ 2.0237, 6.6593]], [[ 0.4187, 0.1830], [ 0.1830, 0.1018]], [[ 1.9348, -2.5744], [-2.5744, 4.6386]]], dtype=torch.float64) >>> l tensor([[[ 1.0784, 0.0000], [ 1.8766, 1.7713]], [[ 0.6471, 0.0000], [ 0.2829, 0.1477]], [[ 1.3910, 0.0000], [-1.8509, 1.1014]]], dtype=torch.float64) >>> torch.allclose(torch.matmul(l, l.transpose(-2, -1)), a) True `torch.linalg.cond(input, p=None, *, out=None) → Tensor` Computes the condition number of a matrix `input`, or of each matrix in a batched `input`, using the matrix norm defined by `p`. For norms `{‘fro’, ‘nuc’, inf, -inf, 1, -1}` this is defined as the matrix norm of `input` times the matrix norm of the inverse of `input` computed using `torch.linalg.norm()`. While for norms `{None, 2, -2}` this is defined as the ratio between the largest and smallest singular values computed using `torch.linalg.svd()`. This function supports float, double, cfloat and cdouble dtypes. Note When given inputs on a CUDA device, this function may synchronize that device with the CPU depending on which norm `p` is used. Note For norms `{None, 2, -2}`, `input` may be a non-square matrix or batch of non- square matrices. For other norms, however, `input` must be a square matrix or a batch of square matrices, and if this requirement is not satisfied a RuntimeError will be thrown. Note For norms `{‘fro’, ‘nuc’, inf, -inf, 1, -1}` if `input` is a non-invertible matrix then a tensor containing infinity will be returned. If `input` is a batch of matrices and one or more of them is not invertible then a RuntimeError will be thrown. Parameters * **input** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – the input matrix of size `(m, n)` or the batch of matrices of size `(*, m, n)` where `*` is one or more batch dimensions. * **p** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,_[float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _,__inf_ _,__-inf_ _,__'fro'__,__'nuc'__,__optional_) – the type of the matrix norm to use in the computations. inf refers to `float('inf')`, numpy’s `inf` object, or any equivalent object. 
The following norms can be used: p | norm for matrices ---|--- None | ratio of the largest singular value to the smallest singular value ’fro’ | Frobenius norm ’nuc’ | nuclear norm inf | max(sum(abs(x), dim=1)) -inf | min(sum(abs(x), dim=1)) 1 | max(sum(abs(x), dim=0)) -1 | min(sum(abs(x), dim=0)) 2 | ratio of the largest singular value to the smallest singular value -2 | ratio of the smallest singular value to the largest singular value Default: `None` Keyword Arguments **out** ([Tensor](tensors#torch.Tensor "torch.Tensor") _,__optional_) – tensor to write the output to. Default is `None`. Returns The condition number of `input`. The output dtype is always real valued even for complex inputs (e.g. float if `input` is cfloat). Examples: >>> a = torch.randn(3, 4, 4, dtype=torch.complex64) >>> torch.linalg.cond(a) >>> a = torch.tensor([[1., 0, -1], [0, 1, 0], [1, 0, 1]]) >>> torch.linalg.cond(a) tensor([1.4142]) >>> torch.linalg.cond(a, 'fro') tensor(3.1623) >>> torch.linalg.cond(a, 'nuc') tensor(9.2426) >>> torch.linalg.cond(a, float('inf')) tensor(2.) >>> torch.linalg.cond(a, float('-inf')) tensor(1.) >>> torch.linalg.cond(a, 1) tensor(2.) >>> torch.linalg.cond(a, -1) tensor(1.) >>> torch.linalg.cond(a, 2) tensor([1.4142]) >>> torch.linalg.cond(a, -2) tensor([0.7071]) >>> a = torch.randn(2, 3, 3) >>> a tensor([[[-0.9204, 1.1140, 1.2055], [ 0.3988, -0.2395, -0.7441], [-0.5160, 0.3115, 0.2619]], [[-2.2128, 0.9241, 2.1492], [-1.1277, 2.7604, -0.8760], [ 1.2159, 0.5960, 0.0498]]]) >>> torch.linalg.cond(a) tensor([[9.5917], [3.2538]]) >>> a = torch.randn(2, 3, 3, dtype=torch.complex64) >>> a tensor([[[-0.4671-0.2137j, -0.1334-0.9508j, 0.6252+0.1759j], [-0.3486-0.2991j, -0.1317+0.1252j, 0.3025-0.1604j], [-0.5634+0.8582j, 0.1118-0.4677j, -0.1121+0.7574j]], [[ 0.3964+0.2533j, 0.9385-0.6417j, -0.0283-0.8673j], [ 0.2635+0.2323j, -0.8929-1.1269j, 0.3332+0.0733j], [ 0.1151+0.1644j, -1.1163+0.3471j, -0.5870+0.1629j]]]) >>> torch.linalg.cond(a) tensor([[4.6245], [4.5671]]) >>> torch.linalg.cond(a, 1) tensor([9.2589, 9.3486]) `torch.linalg.det(input) → Tensor` Computes the determinant of a square matrix `input`, or of each square matrix in a batched `input`. This function supports float, double, cfloat and cdouble dtypes. Note When given inputs on a CUDA device, this function synchronizes that device with the CPU. Note The determinant is computed using LU factorization. LAPACK’s `getrf` is used for CPU inputs, and MAGMA’s `getrf` is used for CUDA inputs. Note Backward through `det` internally uses `torch.linalg.svd()` when `input` is not invertible. In this case, double backward through `det` will be unstable when `input` doesn’t have distinct singular values. See `torch.linalg.svd()` for more details. Parameters **input** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – the input matrix of size `(n, n)` or the batch of matrices of size `(*, n, n)` where `*` is one or more batch dimensions. 
Example: >>> a = torch.randn(3, 3) >>> a tensor([[ 0.9478, 0.9158, -1.1295], [ 0.9701, 0.7346, -1.8044], [-0.2337, 0.0557, 0.6929]]) >>> torch.linalg.det(a) tensor(0.0934) >>> a = torch.randn(3, 2, 2) >>> a tensor([[[ 0.9254, -0.6213], [-0.5787, 1.6843]], [[ 0.3242, -0.9665], [ 0.4539, -0.0887]], [[ 1.1336, -0.4025], [-0.7089, 0.9032]]]) >>> torch.linalg.det(a) tensor([1.1990, 0.4099, 0.7386]) `torch.linalg.slogdet(input, *, out=None) -> (Tensor, Tensor)` Calculates the sign and natural logarithm of the absolute value of a square matrix’s determinant, or of the absolute values of the determinants of a batch of square matrices `input`. The determinant can be computed with `sign * exp(logabsdet)`. Supports input of float, double, cfloat and cdouble datatypes. Note When given inputs on a CUDA device, this function synchronizes that device with the CPU. Note The determinant is computed using LU factorization. LAPACK’s `getrf` is used for CPU inputs, and MAGMA’s `getrf` is used for CUDA inputs. Note For matrices that have zero determinant, this returns `(0, -inf)`. If `input` is batched then the entries in the result tensors corresponding to matrices with the zero determinant have sign 0 and the natural logarithm of the absolute value of the determinant -inf. Parameters **input** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – the input matrix of size (n,n)(n, n) or the batch of matrices of size (∗,n,n)(*, n, n) where ∗* is one or more batch dimensions. Keyword Arguments **out** ([tuple](https://docs.python.org/3/library/stdtypes.html#tuple "\(in Python v3.9\)") _,__optional_) – tuple of two tensors to write the output to. Returns A namedtuple (sign, logabsdet) containing the sign of the determinant and the natural logarithm of the absolute value of determinant, respectively. Example: >>> A = torch.randn(3, 3) >>> A tensor([[ 0.0032, -0.2239, -1.1219], [-0.6690, 0.1161, 0.4053], [-1.6218, -0.9273, -0.0082]]) >>> torch.linalg.det(A) tensor(-0.7576) >>> torch.linalg.logdet(A) tensor(nan) >>> torch.linalg.slogdet(A) torch.return_types.linalg_slogdet(sign=tensor(-1.), logabsdet=tensor(-0.2776)) `torch.linalg.eigh(input, UPLO='L', *, out=None) -> (Tensor, Tensor)` Computes the eigenvalues and eigenvectors of a complex Hermitian (or real symmetric) matrix `input`, or of each such matrix in a batched `input`. For a single matrix `input`, the tensor of eigenvalues `w` and the tensor of eigenvectors `V` decompose the `input` such that `input = V diag(w) Vᴴ`, where `Vᴴ` is the transpose of `V` for real-valued `input`, or the conjugate transpose of `V` for complex-valued `input`. Since the matrix or matrices in `input` are assumed to be Hermitian, the imaginary part of their diagonals is always treated as zero. When `UPLO` is “L”, its default value, only the lower triangular part of each matrix is used in the computation. When `UPLO` is “U” only the upper triangular part of each matrix is used. Supports input of float, double, cfloat and cdouble dtypes. Note When given inputs on a CUDA device, this function synchronizes that device with the CPU. Note The eigenvalues/eigenvectors are computed using LAPACK’s `syevd` and `heevd` routines for CPU inputs, and MAGMA’s `syevd` and `heevd` routines for CUDA inputs. Note The eigenvalues of real symmetric or complex Hermitian matrices are always real. Note The eigenvectors of matrices are not unique, so any eigenvector multiplied by a constant remains a valid eigenvector. This function may compute different eigenvector representations on different device types. 
Usually the difference is only in the sign of the eigenvector. Note See `torch.linalg.eigvalsh()` for a related function that computes only eigenvalues. However, that function is not differentiable. Parameters * **input** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – the Hermitian `n times n` matrix or the batch of such matrices of size `(*, n, n)` where `*` is one or more batch dimensions. * **UPLO** (_'L'__,__'U'__,__optional_) – controls whether to use the upper-triangular or the lower-triangular part of `input` in the computations. Default is `'L'`. Keyword Arguments **out** ([tuple](https://docs.python.org/3/library/stdtypes.html#tuple "\(in Python v3.9\)") _,__optional_) – tuple of two tensors to write the output to. Default is `None`. Returns A namedtuple (eigenvalues, eigenvectors) containing * `eigenvalues (Tensor): Shape (*, m).` The eigenvalues in ascending order. * `eigenvectors (Tensor): Shape (*, m, m).` The orthonormal eigenvectors of the `input`. Return type ([Tensor](tensors#torch.Tensor "torch.Tensor"), [Tensor](tensors#torch.Tensor "torch.Tensor")) Examples: >>> a = torch.randn(2, 2, dtype=torch.complex128) >>> a = a + a.t().conj() # creates a Hermitian matrix >>> a tensor([[2.9228+0.0000j, 0.2029-0.0862j], [0.2029+0.0862j, 0.3464+0.0000j]], dtype=torch.complex128) >>> w, v = torch.linalg.eigh(a) >>> w tensor([0.3277, 2.9415], dtype=torch.float64) >>> v tensor([[-0.0846+-0.0000j, -0.9964+0.0000j], [ 0.9170+0.3898j, -0.0779-0.0331j]], dtype=torch.complex128) >>> torch.allclose(torch.matmul(v, torch.matmul(w.to(v.dtype).diag_embed(), v.t().conj())), a) True >>> a = torch.randn(3, 2, 2, dtype=torch.float64) >>> a = a + a.transpose(-2, -1) # creates a symmetric matrix >>> w, v = torch.linalg.eigh(a) >>> torch.allclose(torch.matmul(v, torch.matmul(w.diag_embed(), v.transpose(-2, -1))), a) True `torch.linalg.eigvalsh(input, UPLO='L', *, out=None) → Tensor` Computes the eigenvalues of a complex Hermitian (or real symmetric) matrix `input`, or of each such matrix in a batched `input`. The eigenvalues are returned in ascending order. Since the matrix or matrices in `input` are assumed to be Hermitian, the imaginary part of their diagonals is always treated as zero. When `UPLO` is “L”, its default value, only the lower triangular part of each matrix is used in the computation. When `UPLO` is “U” only the upper triangular part of each matrix is used. Supports input of float, double, cfloat and cdouble dtypes. Note When given inputs on a CUDA device, this function synchronizes that device with the CPU. Note The eigenvalues are computed using LAPACK’s `syevd` and `heevd` routines for CPU inputs, and MAGMA’s `syevd` and `heevd` routines for CUDA inputs. Note The eigenvalues of real symmetric or complex Hermitian matrices are always real. Note This function doesn’t support backpropagation, please use `torch.linalg.eigh()` instead, which also computes the eigenvectors. Note See `torch.linalg.eigh()` for a related function that computes both eigenvalues and eigenvectors. Parameters * **input** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – the Hermitian `n times n` matrix or the batch of such matrices of size `(*, n, n)` where `*` is one or more batch dimensions. * **UPLO** (_'L'__,__'U'__,__optional_) – controls whether to use the upper-triangular or the lower-triangular part of `input` in the computations. Default is `'L'`. Keyword Arguments **out** ([Tensor](tensors#torch.Tensor "torch.Tensor") _,__optional_) – tensor to write the output to. Default is `None`. 
Examples: >>> a = torch.randn(2, 2, dtype=torch.complex128) >>> a = a + a.t().conj() # creates a Hermitian matrix >>> a tensor([[2.9228+0.0000j, 0.2029-0.0862j], [0.2029+0.0862j, 0.3464+0.0000j]], dtype=torch.complex128) >>> w = torch.linalg.eigvalsh(a) >>> w tensor([0.3277, 2.9415], dtype=torch.float64) >>> a = torch.randn(3, 2, 2, dtype=torch.float64) >>> a = a + a.transpose(-2, -1) # creates a symmetric matrix >>> a tensor([[[ 2.8050, -0.3850], [-0.3850, 3.2376]], [[-1.0307, -2.7457], [-2.7457, -1.7517]], [[ 1.7166, 2.2207], [ 2.2207, -2.0898]]], dtype=torch.float64) >>> w = torch.linalg.eigvalsh(a) >>> w tensor([[ 2.5797, 3.4629], [-4.1605, 1.3780], [-3.1113, 2.7381]], dtype=torch.float64) `torch.linalg.matrix_rank(input, tol=None, hermitian=False, *, out=None) → Tensor` Computes the numerical rank of a matrix `input`, or of each matrix in a batched `input`. The matrix rank is computed as the number of singular values (or absolute eigenvalues when `hermitian` is `True`) that are greater than the specified `tol` threshold. If `tol` is not specified, `tol` is set to `S.max(dim=-1)*max(input.shape[-2:])*eps`, where `S` is the singular values (or absolute eigenvalues when `hermitian` is `True`), and `eps` is the epsilon value for the datatype of `input`. The epsilon value can be obtained using the `eps` attribute of `torch.finfo`. Supports input of float, double, cfloat and cdouble dtypes. Note When given inputs on a CUDA device, this function synchronizes that device with the CPU. Note The matrix rank is computed using singular value decomposition (see `torch.linalg.svd()`) by default. If `hermitian` is `True`, then `input` is assumed to be Hermitian (symmetric if real-valued), and the computation is done by obtaining the eigenvalues (see `torch.linalg.eigvalsh()`). Parameters * **input** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – the input matrix of size `(m, n)` or the batch of matrices of size `(*, m, n)` where `*` is one or more batch dimensions. * **tol** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _,__optional_) – the tolerance value. Default is `None` * **hermitian** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – indicates whether `input` is Hermitian. Default is `False`. Keyword Arguments **out** ([Tensor](tensors#torch.Tensor "torch.Tensor") _,__optional_) – tensor to write the output to. Default is `None`. Examples: >>> a = torch.eye(10) >>> torch.linalg.matrix_rank(a) tensor(10) >>> b = torch.eye(10) >>> b[0, 0] = 0 >>> torch.linalg.matrix_rank(b) tensor(9) >>> a = torch.randn(4, 3, 2) >>> torch.linalg.matrix_rank(a) tensor([2, 2, 2, 2]) >>> a = torch.randn(2, 4, 2, 3) >>> torch.linalg.matrix_rank(a) tensor([[2, 2, 2, 2], [2, 2, 2, 2]]) >>> a = torch.randn(2, 4, 3, 3, dtype=torch.complex64) >>> torch.linalg.matrix_rank(a) tensor([[3, 3, 3, 3], [3, 3, 3, 3]]) >>> torch.linalg.matrix_rank(a, hermitian=True) tensor([[3, 3, 3, 3], [3, 3, 3, 3]]) >>> torch.linalg.matrix_rank(a, tol=1.0) tensor([[3, 2, 2, 2], [1, 2, 1, 2]]) >>> torch.linalg.matrix_rank(a, tol=1.0, hermitian=True) tensor([[2, 2, 2, 1], [1, 2, 2, 2]]) `torch.linalg.norm(input, ord=None, dim=None, keepdim=False, *, out=None, dtype=None) → Tensor` Returns the matrix norm or vector norm of a given tensor. This function can calculate one of eight different types of matrix norms, or one of an infinite number of vector norms, depending on both the number of reduction dimensions and the value of the `ord` parameter. 
Parameters * **input** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – The input tensor. If dim is None, x must be 1-D or 2-D, unless `ord` is None. If both `dim` and `ord` are None, the 2-norm of the input flattened to 1-D will be returned. Its data type must be either a floating point or complex type. For complex inputs, the norm is calculated on of the absolute values of each element. If the input is complex and neither `dtype` nor `out` is specified, the result’s data type will be the corresponding floating point type (e.g. float if `input` is complexfloat). * **ord** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,_[float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _,__inf_ _,__-inf_ _,__'fro'__,__'nuc'__,__optional_) – The order of norm. inf refers to `float('inf')`, numpy’s `inf` object, or any equivalent object. The following norms can be calculated: ord | norm for matrices | norm for vectors ---|---|--- None | Frobenius norm | 2-norm ’fro’ | Frobenius norm | – not supported – ‘nuc’ | nuclear norm | – not supported – inf | max(sum(abs(x), dim=1)) | max(abs(x)) -inf | min(sum(abs(x), dim=1)) | min(abs(x)) 0 | – not supported – | sum(x != 0) 1 | max(sum(abs(x), dim=0)) | as below -1 | min(sum(abs(x), dim=0)) | as below 2 | 2-norm (largest sing. value) | as below -2 | smallest singular value | as below other | – not supported – | sum(abs(x)**ord)**(1./ord) Default: `None` * **dim** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__2-tuple of python:ints_ _,__2-list of python:ints_ _,__optional_) – If `dim` is an int, vector norm will be calculated over the specified dimension. If `dim` is a 2-tuple of ints, matrix norm will be calculated over the specified dimensions. If `dim` is None, matrix norm will be calculated when the input tensor has two dimensions, and vector norm will be calculated when the input tensor has one dimension. Default: `None` * **keepdim** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – If set to True, the reduced dimensions are retained in the result as dimensions with size one. Default: `False` Keyword Arguments * **out** ([Tensor](tensors#torch.Tensor "torch.Tensor") _,__optional_) – The output tensor. Ignored if `None`. Default: `None` * **dtype** (`torch.dtype`, optional) – If specified, the input tensor is cast to `dtype` before performing the operation, and the returned tensor’s type will be `dtype`. If this argument is used in conjunction with the `out` argument, the output tensor’s type must match this argument or a RuntimeError will be raised. Default: `None` Examples: >>> import torch >>> from torch import linalg as LA >>> a = torch.arange(9, dtype=torch.float) - 4 >>> a tensor([-4., -3., -2., -1., 0., 1., 2., 3., 4.]) >>> b = a.reshape((3, 3)) >>> b tensor([[-4., -3., -2.], [-1., 0., 1.], [ 2., 3., 4.]]) >>> LA.norm(a) tensor(7.7460) >>> LA.norm(b) tensor(7.7460) >>> LA.norm(b, 'fro') tensor(7.7460) >>> LA.norm(a, float('inf')) tensor(4.) >>> LA.norm(b, float('inf')) tensor(9.) >>> LA.norm(a, -float('inf')) tensor(0.) >>> LA.norm(b, -float('inf')) tensor(2.) >>> LA.norm(a, 1) tensor(20.) >>> LA.norm(b, 1) tensor(7.) >>> LA.norm(a, -1) tensor(0.) >>> LA.norm(b, -1) tensor(6.) >>> LA.norm(a, 2) tensor(7.7460) >>> LA.norm(b, 2) tensor(7.3485) >>> LA.norm(a, -2) tensor(0.) 
>>> LA.norm(b.double(), -2) tensor(1.8570e-16, dtype=torch.float64) >>> LA.norm(a, 3) tensor(5.8480) >>> LA.norm(a, -3) tensor(0.) Using the `dim` argument to compute vector norms: >>> c = torch.tensor([[1., 2., 3.], ... [-1, 1, 4]]) >>> LA.norm(c, dim=0) tensor([1.4142, 2.2361, 5.0000]) >>> LA.norm(c, dim=1) tensor([3.7417, 4.2426]) >>> LA.norm(c, ord=1, dim=1) tensor([6., 6.]) Using the `dim` argument to compute matrix norms: >>> m = torch.arange(8, dtype=torch.float).reshape(2, 2, 2) >>> LA.norm(m, dim=(1,2)) tensor([ 3.7417, 11.2250]) >>> LA.norm(m[0, :, :]), LA.norm(m[1, :, :]) (tensor(3.7417), tensor(11.2250)) `torch.linalg.pinv(input, rcond=1e-15, hermitian=False, *, out=None) → Tensor` Computes the pseudo-inverse (also known as the Moore-Penrose inverse) of a matrix `input`, or of each matrix in a batched `input`. The singular values (or the absolute values of the eigenvalues when `hermitian` is `True`) that are below the specified `rcond` threshold are treated as zero and discarded in the computation. Supports input of float, double, cfloat and cdouble datatypes. Note When given inputs on a CUDA device, this function synchronizes that device with the CPU. Note The pseudo-inverse is computed using singular value decomposition (see `torch.linalg.svd()`) by default. If `hermitian` is `True`, then `input` is assumed to be Hermitian (symmetric if real-valued), and the computation of the pseudo-inverse is done by obtaining the eigenvalues and eigenvectors (see `torch.linalg.eigh()`). Note If singular value decomposition or eigenvalue decomposition algorithms do not converge then a RuntimeError will be thrown. Parameters * **input** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – the input matrix of size `(m, n)` or the batch of matrices of size `(*, m, n)` where `*` is one or more batch dimensions. * **rcond** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _,_[Tensor](tensors#torch.Tensor "torch.Tensor") _,__optional_) – the tolerance value to determine the cutoff for small singular values. Must be broadcastable to the singular values of `input` as returned by [`torch.svd()`](generated/torch.svd#torch.svd "torch.svd"). Default is `1e-15`. * **hermitian** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – indicates whether `input` is Hermitian. Default is `False`. Keyword Arguments **out** ([Tensor](tensors#torch.Tensor "torch.Tensor") _,__optional_) – The output tensor. Ignored if `None`. Default is `None`. 
Examples:

>>> input = torch.randn(3, 5)
>>> input
tensor([[ 0.5495, 0.0979, -1.4092, -0.1128, 0.4132],
        [-1.1143, -0.3662, 0.3042, 1.6374, -0.9294],
        [-0.3269, -0.5745, -0.0382, -0.5922, -0.6759]])
>>> torch.linalg.pinv(input)
tensor([[ 0.0600, -0.1933, -0.2090],
        [-0.0903, -0.0817, -0.4752],
        [-0.7124, -0.1631, -0.2272],
        [ 0.1356, 0.3933, -0.5023],
        [-0.0308, -0.1725, -0.5216]])

Batched linalg.pinv example

>>> a = torch.randn(2, 6, 3)
>>> b = torch.linalg.pinv(a)
>>> torch.matmul(b, a)
tensor([[[ 1.0000e+00, 1.6391e-07, -1.1548e-07],
         [ 8.3121e-08, 1.0000e+00, -2.7567e-07],
         [ 3.5390e-08, 1.4901e-08, 1.0000e+00]],

        [[ 1.0000e+00, -8.9407e-08, 2.9802e-08],
         [-2.2352e-07, 1.0000e+00, 1.1921e-07],
         [ 0.0000e+00, 8.9407e-08, 1.0000e+00]]])

Hermitian input example

>>> a = torch.randn(3, 3, dtype=torch.complex64)
>>> a = a + a.t().conj()  # creates a Hermitian matrix
>>> b = torch.linalg.pinv(a, hermitian=True)
>>> torch.matmul(b, a)
tensor([[ 1.0000e+00+0.0000e+00j, -1.1921e-07-2.3842e-07j, 5.9605e-08-2.3842e-07j],
        [ 5.9605e-08+2.3842e-07j, 1.0000e+00+2.3842e-07j, -4.7684e-07+1.1921e-07j],
        [-1.1921e-07+0.0000e+00j, -2.3842e-07-2.9802e-07j, 1.0000e+00-1.7897e-07j]])

Non-default rcond example

>>> rcond = 0.5
>>> a = torch.randn(3, 3)
>>> torch.linalg.pinv(a)
tensor([[ 0.2971, -0.4280, -2.0111],
        [-0.0090, 0.6426, -0.1116],
        [-0.7832, -0.2465, 1.0994]])
>>> torch.linalg.pinv(a, rcond)
tensor([[-0.2672, -0.2351, -0.0539],
        [-0.0211, 0.6467, -0.0698],
        [-0.4400, -0.3638, -0.0910]])

Matrix-wise rcond example

>>> a = torch.randn(5, 6, 2, 3, 3)
>>> rcond = torch.rand(2)  # different rcond values for each matrix in a[:, :, 0] and a[:, :, 1]
>>> torch.linalg.pinv(a, rcond)
>>> rcond = torch.randn(5, 6, 2)  # different rcond value for each matrix in 'a'
>>> torch.linalg.pinv(a, rcond)

`torch.linalg.svd(input, full_matrices=True, compute_uv=True, *, out=None) -> (Tensor, Tensor, Tensor)`

Computes the singular value decomposition of either a matrix or batch of matrices `input`.

The singular value decomposition is represented as a namedtuple `(U, S, Vh)`, such that `input = U @ diag(S) @ Vh`. If `input` is a batch of tensors, then `U`, `S`, and `Vh` are also batched with the same batch dimensions as `input`.

If `full_matrices` is `False`, the method returns the reduced singular value decomposition, i.e., if the last two dimensions of `input` are `m` and `n`, then the returned `U` and `V` matrices will contain only `min(n, m)` orthonormal columns.

If `compute_uv` is `False`, the returned `U` and `Vh` will be empty tensors with no elements and the same device as `input`. The `full_matrices` argument has no effect when `compute_uv` is False.

The dtypes of `U` and `V` are the same as `input`'s. `S` will always be real-valued, even if `input` is complex.

Note Unlike NumPy's `linalg.svd`, this always returns a namedtuple of three tensors, even when `compute_uv=False`. This behavior may change in a future PyTorch release.

Note The singular values are returned in descending order. If `input` is a batch of matrices, then the singular values of each matrix in the batch are returned in descending order.

Note The implementation of SVD on CPU uses the LAPACK routine `?gesdd` (a divide-and-conquer algorithm) instead of `?gesvd` for speed. Analogously, the SVD on GPU uses the cuSOLVER routines `gesvdj` and `gesvdjBatched` on CUDA 10.1.243 and later, and uses the MAGMA routine `gesdd` on earlier versions of CUDA.

Note The returned matrix `U` will be transposed, i.e.
with strides `U.contiguous().transpose(-2, -1).stride()`.

Note Gradients computed using `U` and `Vh` may be unstable if `input` is not full rank or has non-unique singular values.

Note When `full_matrices` = `True`, the gradients on `U[..., :, min(m, n):]` and `V[..., :, min(m, n):]` will be ignored in backward as those vectors can be arbitrary bases of the subspaces.

Note The `S` tensor can only be used to compute gradients if `compute_uv` is True.

Note Since the `U` and `V` of an SVD are not unique, each vector can be multiplied by an arbitrary phase factor `e^{i phi}` while the SVD result is still correct. Different platforms, like NumPy, or inputs on different device types, may produce different `U` and `V` tensors.

Parameters

* **input** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – the input tensor of size `(*, m, n)` where `*` is zero or more batch dimensions consisting of `m × n` matrices.
* **full_matrices** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – controls whether to compute the full or reduced decomposition, and consequently the shape of returned `U` and `V`. Defaults to True.
* **compute_uv** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – whether to compute `U` and `V` or not. Defaults to True.
* **out** ([tuple](https://docs.python.org/3/library/stdtypes.html#tuple "\(in Python v3.9\)") _,__optional_) – a tuple of three tensors to use for the outputs. If compute_uv=False, the 1st and 3rd arguments must be tensors, but they are ignored. E.g. you can pass `(torch.Tensor(), out_S, torch.Tensor())`

Example:

>>> import torch
>>> a = torch.randn(5, 3)
>>> a
tensor([[-0.3357, -0.2987, -1.1096],
        [ 1.4894, 1.0016, -0.4572],
        [-1.9401, 0.7437, 2.0968],
        [ 0.1515, 1.3812, 1.5491],
        [-1.8489, -0.5907, -2.5673]])
>>>
>>> # reconstruction in the full_matrices=False case
>>> u, s, vh = torch.linalg.svd(a, full_matrices=False)
>>> u.shape, s.shape, vh.shape
(torch.Size([5, 3]), torch.Size([3]), torch.Size([3, 3]))
>>> torch.dist(a, u @ torch.diag(s) @ vh)
tensor(1.0486e-06)
>>>
>>> # reconstruction in the full_matrices=True case
>>> u, s, vh = torch.linalg.svd(a)
>>> u.shape, s.shape, vh.shape
(torch.Size([5, 5]), torch.Size([3]), torch.Size([3, 3]))
>>> torch.dist(a, u[:, :3] @ torch.diag(s) @ vh)
tensor(1.0486e-06)
>>>
>>> # extra dimensions
>>> a_big = torch.randn(7, 5, 3)
>>> u, s, vh = torch.linalg.svd(a_big, full_matrices=False)
>>> torch.dist(a_big, u @ torch.diag_embed(s) @ vh)
tensor(3.0957e-06)

`torch.linalg.solve(input, other, *, out=None) → Tensor`

Computes the solution `x` to the matrix equation `matmul(input, x) = other` with a square matrix, or batches of such matrices, `input` and one or more right-hand side vectors `other`. If `input` is batched and `other` is not, then `other` is broadcast to have the same batch dimensions as `input`. The resulting tensor has the same shape as the (possibly broadcast) `other`.

Supports input of `float`, `double`, `cfloat` and `cdouble` dtypes.

Note If `input` is a non-square or non-invertible matrix, or a batch containing non-square matrices or one or more non-invertible matrices, then a RuntimeError will be thrown.

Note When given inputs on a CUDA device, this function synchronizes that device with the CPU.
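To see the RuntimeError mentioned in the first note above, here is a minimal sketch (the all-zeros matrix is just a convenient non-invertible input):

import torch

A = torch.zeros(3, 3)          # singular, hence not invertible
b = torch.randn(3)
try:
    torch.linalg.solve(A, b)
except RuntimeError as e:      # raised because A is non-invertible, as noted above
    print("RuntimeError:", e)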
Parameters * **input** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – the square n×nn \times n matrix or the batch of such matrices of size (∗,n,n)(*, n, n) where `*` is one or more batch dimensions. * **other** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – right-hand side tensor of shape (∗,n)(*, n) or (∗,n,k)(*, n, k) , where kk is the number of right-hand side vectors. Keyword Arguments **out** ([Tensor](tensors#torch.Tensor "torch.Tensor") _,__optional_) – The output tensor. Ignored if `None`. Default: `None` Examples: >>> A = torch.eye(3) >>> b = torch.randn(3) >>> x = torch.linalg.solve(A, b) >>> torch.allclose(A @ x, b) True Batched input: >>> A = torch.randn(2, 3, 3) >>> b = torch.randn(3, 1) >>> x = torch.linalg.solve(A, b) >>> torch.allclose(A @ x, b) True >>> b = torch.rand(3) # b is broadcast internally to (*A.shape[:-2], 3) >>> x = torch.linalg.solve(A, b) >>> x.shape torch.Size([2, 3]) >>> Ax = A @ x.unsqueeze(-1) >>> torch.allclose(Ax, b.unsqueeze(-1).expand_as(Ax)) True `torch.linalg.tensorinv(input, ind=2, *, out=None) → Tensor` Computes a tensor `input_inv` such that `tensordot(input_inv, input, ind) == I_n` (inverse tensor equation), where `I_n` is the n-dimensional identity tensor and `n` is equal to `input.ndim`. The resulting tensor `input_inv` has shape equal to `input.shape[ind:] + input.shape[:ind]`. Supports input of `float`, `double`, `cfloat` and `cdouble` data types. Note If `input` is not invertible or does not satisfy the requirement `prod(input.shape[ind:]) == prod(input.shape[:ind])`, then a RuntimeError will be thrown. Note When `input` is a 2-dimensional tensor and `ind=1`, this function computes the (multiplicative) inverse of `input`, equivalent to calling [`torch.inverse()`](generated/torch.inverse#torch.inverse "torch.inverse"). Parameters * **input** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – A tensor to invert. Its shape must satisfy `prod(input.shape[:ind]) == prod(input.shape[ind:])`. * **ind** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – A positive integer that describes the inverse tensor equation. See [`torch.tensordot()`](generated/torch.tensordot#torch.tensordot "torch.tensordot") for details. Default: 2. Keyword Arguments **out** ([Tensor](tensors#torch.Tensor "torch.Tensor") _,__optional_) – The output tensor. Ignored if `None`. Default: `None` Examples: >>> a = torch.eye(4 * 6).reshape((4, 6, 8, 3)) >>> ainv = torch.linalg.tensorinv(a, ind=2) >>> ainv.shape torch.Size([8, 3, 4, 6]) >>> b = torch.randn(4, 6) >>> torch.allclose(torch.tensordot(ainv, b), torch.linalg.tensorsolve(a, b)) True >>> a = torch.randn(4, 4) >>> a_tensorinv = torch.linalg.tensorinv(a, ind=1) >>> a_inv = torch.inverse(a) >>> torch.allclose(a_tensorinv, a_inv) True `torch.linalg.tensorsolve(input, other, dims=None, *, out=None) → Tensor` Computes a tensor `x` such that `tensordot(input, x, dims=x.ndim) = other`. The resulting tensor `x` has the same shape as `input[other.ndim:]`. Supports real-valued and complex-valued inputs. Note If `input` does not satisfy the requirement `prod(input.shape[other.ndim:]) == prod(input.shape[:other.ndim])` after (optionally) moving the dimensions using `dims`, then a RuntimeError will be thrown. Parameters * **input** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – “left-hand-side” tensor, it must satisfy the requirement `prod(input.shape[other.ndim:]) == prod(input.shape[:other.ndim])`. 
* **other** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – “right-hand-side” tensor of shape `input.shape[other.ndim]`. * **dims** (_Tuple_ _[_[int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _]_) – dimensions of `input` to be moved before the computation. Equivalent to calling `input = movedim(input, dims, range(len(dims) - input.ndim, 0))`. If None (default), no dimensions are moved. Keyword Arguments **out** ([Tensor](tensors#torch.Tensor "torch.Tensor") _,__optional_) – The output tensor. Ignored if `None`. Default: `None` Examples: >>> a = torch.eye(2 * 3 * 4).reshape((2 * 3, 4, 2, 3, 4)) >>> b = torch.randn(2 * 3, 4) >>> x = torch.linalg.tensorsolve(a, b) >>> x.shape torch.Size([2, 3, 4]) >>> torch.allclose(torch.tensordot(a, x, dims=x.ndim), b) True >>> a = torch.randn(6, 4, 4, 3, 2) >>> b = torch.randn(4, 3, 2) >>> x = torch.linalg.tensorsolve(a, b, dims=(0, 2)) >>> x.shape torch.Size([6, 4]) >>> a = a.permute(1, 3, 4, 0, 2) >>> a.shape[b.ndim:] torch.Size([6, 4]) >>> torch.allclose(torch.tensordot(a, x, dims=x.ndim), b, atol=1e-6) True `torch.linalg.inv(input, *, out=None) → Tensor` Computes the multiplicative inverse matrix of a square matrix `input`, or of each square matrix in a batched `input`. The result satisfies the relation: `matmul(inv(input),input)` = `matmul(input,inv(input))` = `eye(input.shape[0]).expand_as(input)`. Supports input of float, double, cfloat and cdouble data types. Note When given inputs on a CUDA device, this function synchronizes that device with the CPU. Note The inverse matrix is computed using LAPACK’s `getrf` and `getri` routines for CPU inputs. For CUDA inputs, cuSOLVER’s `getrf` and `getrs` routines as well as cuBLAS’ `getrf` and `getri` routines are used if CUDA version >= 10.1.243, otherwise MAGMA’s `getrf` and `getri` routines are used instead. Note If `input` is a non-invertible matrix or non-square matrix, or batch with at least one such matrix, then a RuntimeError will be thrown. Parameters **input** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – the square `(n, n)` matrix or the batch of such matrices of size `(*, n, n)` where `*` is one or more batch dimensions. Keyword Arguments **out** ([Tensor](tensors#torch.Tensor "torch.Tensor") _,__optional_) – The output tensor. Ignored if `None`. Default is `None`. 
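Before the built-in examples, a small cross-check (a sketch with arbitrary shapes; the diagonal shift just keeps the random matrix well conditioned) relating `torch.linalg.inv` to `torch.linalg.solve`, documented earlier in this section:

import torch

A = torch.randn(4, 4) + 4 * torch.eye(4)   # shift the diagonal to keep A well conditioned
b = torch.randn(4, 2)

x_inv = torch.linalg.inv(A) @ b            # explicit inverse, then multiply
x_solve = torch.linalg.solve(A, b)         # solve A @ x = b directly
assert torch.allclose(x_inv, x_solve, atol=1e-5)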
Examples: >>> x = torch.rand(4, 4) >>> y = torch.linalg.inv(x) >>> z = torch.mm(x, y) >>> z tensor([[ 1.0000, -0.0000, -0.0000, 0.0000], [ 0.0000, 1.0000, 0.0000, 0.0000], [ 0.0000, 0.0000, 1.0000, 0.0000], [ 0.0000, -0.0000, -0.0000, 1.0000]]) >>> torch.max(torch.abs(z - torch.eye(4))) # Max non-zero tensor(1.1921e-07) >>> # Batched inverse example >>> x = torch.randn(2, 3, 4, 4) >>> y = torch.linalg.inv(x) >>> z = torch.matmul(x, y) >>> torch.max(torch.abs(z - torch.eye(4).expand_as(x))) # Max non-zero tensor(1.9073e-06) >>> x = torch.rand(4, 4, dtype=torch.cdouble) >>> y = torch.linalg.inv(x) >>> z = torch.mm(x, y) >>> z tensor([[ 1.0000e+00+0.0000e+00j, -1.3878e-16+3.4694e-16j, 5.5511e-17-1.1102e-16j, 0.0000e+00-1.6653e-16j], [ 5.5511e-16-1.6653e-16j, 1.0000e+00+6.9389e-17j, 2.2204e-16-1.1102e-16j, -2.2204e-16+1.1102e-16j], [ 3.8858e-16-1.2490e-16j, 2.7756e-17+3.4694e-17j, 1.0000e+00+0.0000e+00j, -4.4409e-16+5.5511e-17j], [ 4.4409e-16+5.5511e-16j, -3.8858e-16+1.8041e-16j, 2.2204e-16+0.0000e+00j, 1.0000e+00-3.4694e-16j]], dtype=torch.complex128) >>> torch.max(torch.abs(z - torch.eye(4, dtype=torch.cdouble))) # Max non-zero tensor(7.5107e-16, dtype=torch.float64) `torch.linalg.qr(input, mode='reduced', *, out=None) -> (Tensor, Tensor)` Computes the QR decomposition of a matrix or a batch of matrices `input`, and returns a namedtuple (Q, R) of tensors such that input=QR\text{input} = Q R with QQ being an orthogonal matrix or batch of orthogonal matrices and RR being an upper triangular matrix or batch of upper triangular matrices. Depending on the value of `mode` this function returns the reduced or complete QR factorization. See below for a list of valid modes. Note **Differences with** `numpy.linalg.qr`: * `mode='raw'` is not implemented * unlike `numpy.linalg.qr`, this function always returns a tuple of two tensors. When `mode='r'`, the `Q` tensor is an empty tensor. This behavior may change in a future PyTorch release. Note Backpropagation is not supported for `mode='r'`. Use `mode='reduced'` instead. Backpropagation is also not supported if the first min⁡(input.size(−1),input.size(−2))\min(input.size(-1), input.size(-2)) columns of any matrix in `input` are not linearly independent. While no error will be thrown when this occurs the values of the “gradient” produced may be anything. This behavior may change in the future. Note This function uses LAPACK for CPU inputs and MAGMA for CUDA inputs, and may produce different (valid) decompositions on different device types or different platforms. Parameters * **input** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – the input tensor of size (∗,m,n)(*, m, n) where `*` is zero or more batch dimensions consisting of matrices of dimension m×nm \times n . * **mode** ([str](https://docs.python.org/3/library/stdtypes.html#str "\(in Python v3.9\)") _,__optional_) – if `k = min(m, n)` then: * `'reduced'` : returns `(Q, R)` with dimensions (m, k), (k, n) (default) * `'complete'`: returns `(Q, R)` with dimensions (m, m), (m, n) * `'r'`: computes only `R`; returns `(Q, R)` where `Q` is empty and `R` has dimensions (k, n) Keyword Arguments **out** ([tuple](https://docs.python.org/3/library/stdtypes.html#tuple "\(in Python v3.9\)") _,__optional_) – tuple of `Q` and `R` tensors. The dimensions of `Q` and `R` are detailed in the description of `mode` above. 
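The shapes promised by each `mode` can be checked directly; a minimal sketch (with m=5, n=3, so k=3):

import torch

m, n = 5, 3                                  # k = min(m, n) = 3
a = torch.randn(m, n)

q, r = torch.linalg.qr(a, mode='reduced')    # Q: (m, k), R: (k, n)
print(q.shape, r.shape)                      # torch.Size([5, 3]) torch.Size([3, 3])

q, r = torch.linalg.qr(a, mode='complete')   # Q: (m, m), R: (m, n)
print(q.shape, r.shape)                      # torch.Size([5, 5]) torch.Size([5, 3])

q, r = torch.linalg.qr(a, mode='r')          # Q is empty, R: (k, n)
print(q.numel(), r.shape)                    # 0 torch.Size([3, 3])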
Example:

>>> a = torch.tensor([[12., -51, 4], [6, 167, -68], [-4, 24, -41]])
>>> q, r = torch.linalg.qr(a)
>>> q
tensor([[-0.8571, 0.3943, 0.3314],
        [-0.4286, -0.9029, -0.0343],
        [ 0.2857, -0.1714, 0.9429]])
>>> r
tensor([[ -14.0000, -21.0000, 14.0000],
        [ 0.0000, -175.0000, 70.0000],
        [ 0.0000, 0.0000, -35.0000]])
>>> torch.mm(q, r).round()
tensor([[ 12., -51., 4.],
        [ 6., 167., -68.],
        [ -4., 24., -41.]])
>>> torch.mm(q.t(), q).round()
tensor([[ 1., 0., 0.],
        [ 0., 1., -0.],
        [ 0., -0., 1.]])
>>> q2, r2 = torch.linalg.qr(a, mode='r')
>>> q2
tensor([])
>>> torch.equal(r, r2)
True
>>> a = torch.randn(3, 4, 5)
>>> q, r = torch.linalg.qr(a, mode='complete')
>>> torch.allclose(torch.matmul(q, r), a)
True
>>> torch.allclose(torch.matmul(q.transpose(-2, -1), q), torch.eye(4))
True

# torch.utils.mobile_optimizer

Warning This API is in beta and may change in the near future.

Torch mobile supports the `torch.utils.mobile_optimizer.optimize_for_mobile` utility to run a list of optimization passes on modules in eval mode. The method takes the following parameters: a torch.jit.ScriptModule object, a blocklisting optimization set, and a preserved method list.

By default, if the optimization blocklist is None or empty, `optimize_for_mobile` will run the following optimizations:

* **Conv2D + BatchNorm fusion** (blocklisting option `MobileOptimizerType::CONV_BN_FUSION`): This optimization pass folds `Conv2d-BatchNorm2d` into `Conv2d` in the `forward` method of this module and all its submodules. The weight and bias of the `Conv2d` are correspondingly updated.
* **Insert and Fold prepacked ops** (blocklisting option `MobileOptimizerType::INSERT_FOLD_PREPACK_OPS`): This optimization pass rewrites the graph to replace 2D convolutions and linear ops with their prepacked counterparts. Prepacked ops are stateful: they require some state, such as prepacked weights, to be created ahead of time and then used during op execution. XNNPACK is one such backend that provides prepacked ops, with kernels optimized for mobile platforms (such as ARM CPUs). Prepacking of weights enables efficient memory access and thus faster kernel execution. At the moment the `optimize_for_mobile` pass rewrites the graph to replace `Conv2D/Linear` with 1) an op that pre-packs weights for the XNNPACK conv2d/linear ops and 2) an op that takes the pre-packed weight and activation as input and generates output activations. Since 1 needs to be done only once, the weight pre-packing is folded so that it happens only once, at model load time. This pass of `optimize_for_mobile` performs 1 and 2 and then folds, i.e. removes, the weight pre-packing ops.
* **ReLU/Hardtanh fusion**: XNNPACK ops support fusion of clamping; that is, clamping of the output activation is done as part of the kernel, including for 2D convolution and linear op kernels, so clamping effectively comes for free. Any op that can be expressed as a clamping op, such as `ReLU` or `hardtanh`, can therefore be fused with the preceding `Conv2D` or `linear` op in XNNPACK. This pass rewrites the graph by finding `ReLU`/`hardtanh` ops that follow the XNNPACK `Conv2D`/`linear` ops written by the previous pass, and fuses them together.
* **Dropout removal** (blocklisting option `MobileOptimizerType::REMOVE_DROPOUT`): This optimization pass removes `dropout` and `dropout_` nodes from this module when training is false.
* **Conv packed params hoisting** (blocklisting option `MobileOptimizerType::HOIST_CONV_PACKED_PARAMS`): This optimization pass moves convolution packed params to the root module, so that the convolution structs can be deleted. This decreases model size without impacting numerics.

`optimize_for_mobile` will also invoke the freeze_module pass, which only preserves the `forward` method. If you have other methods that need to be preserved, add them to the preserved method list and pass it into the method.

`torch.utils.mobile_optimizer.optimize_for_mobile(script_module, optimization_blocklist=None, preserved_methods=None, backend='CPU')` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/utils/mobile_optimizer.html#optimize_for_mobile)

Parameters

* **script_module** – An instance of a torch script module with type ScriptModule.
* **optimization_blocklist** – A set with type MobileOptimizerType. When the set is not passed, the optimization method will run all the optimization passes; otherwise, it will run only the passes that are not included in optimization_blocklist.
* **preserved_methods** – A list of methods that need to be preserved when the freeze_module pass is invoked
* **backend** – Device type to use for running the resulting model ('CPU' (default), 'Vulkan' or 'Metal').

Returns A new optimized torch script module

# torch.utils.model_zoo

Moved to `torch.hub`.

`torch.utils.model_zoo.load_url(url, model_dir=None, map_location=None, progress=True, check_hash=False, file_name=None)`

Loads the Torch serialized object at the given URL. If the downloaded file is a zip file, it will be automatically decompressed. If the object is already present in `model_dir`, it's deserialized and returned. The default value of `model_dir` is `<hub_dir>/checkpoints` where `hub_dir` is the directory returned by [`get_dir()`](hub#torch.hub.get_dir "torch.hub.get_dir").

Parameters

* **url** (_string_) – URL of the object to download
* **model_dir** (_string_ _,__optional_) – directory in which to save the object
* **map_location** (_optional_) – a function or a dict specifying how to remap storage locations (see torch.load)
* **progress** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – whether or not to display a progress bar to stderr. Default: True
* **check_hash** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – If True, the filename part of the URL should follow the naming convention `filename-<sha256>.ext` where `<sha256>` is the first eight or more digits of the SHA256 hash of the contents of the file. The hash is used to ensure unique names and to verify the contents of the file. Default: False
* **file_name** (_string_ _,__optional_) – name for the downloaded file. Filename from `url` will be used if not set.

#### Example

>>> state_dict = torch.hub.load_state_dict_from_url('https://s3.amazonaws.com/pytorch/models/resnet18-5c106cde.pth')

# Multiprocessing package - torch.multiprocessing

torch.multiprocessing is a wrapper around the native [`multiprocessing`](https://docs.python.org/3/library/multiprocessing.html#module-multiprocessing "\(in Python v3.9\)") module. It registers custom reducers that use shared memory to provide shared views on the same data in different processes. Once the tensor/storage is moved to shared memory (see [`share_memory_()`](tensors#torch.Tensor.share_memory_ "torch.Tensor.share_memory_")), it will be possible to send it to other processes without making any copies.
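For example, a minimal sketch of this sharing behaviour for a CPU tensor (the in-place update made in the child process is visible in the parent because both processes see the same shared storage):

import torch
import torch.multiprocessing as mp

def worker(t):
    t.add_(1)                      # modifies the shared storage in place

if __name__ == '__main__':
    x = torch.zeros(3)
    x.share_memory_()              # move the underlying storage to shared memory
    p = mp.Process(target=worker, args=(x,))
    p.start()
    p.join()
    print(x)                       # tensor([1., 1., 1.])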
The API is 100% compatible with the original module: it's enough to change `import multiprocessing` to `import torch.multiprocessing` to have all tensors sent through the queues, or shared via other mechanisms, moved to shared memory.

Because the APIs are so similar, we do not document most of this package's contents, and we recommend referring to the very good docs of the original module.

Warning If the main process exits abruptly (e.g. because of an incoming signal), Python's `multiprocessing` sometimes fails to clean up its children. It's a known caveat, so if you're seeing any resource leaks after interrupting the interpreter, it probably means that this has just happened to you.

## Strategy management

`torch.multiprocessing.get_all_sharing_strategies()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/multiprocessing.html#get_all_sharing_strategies) Returns a set of sharing strategies supported on the current system.

`torch.multiprocessing.get_sharing_strategy()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/multiprocessing.html#get_sharing_strategy) Returns the current strategy for sharing CPU tensors.

`torch.multiprocessing.set_sharing_strategy(new_strategy)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/multiprocessing.html#set_sharing_strategy) Sets the strategy for sharing CPU tensors.

Parameters **new_strategy** ([str](https://docs.python.org/3/library/stdtypes.html#str "\(in Python v3.9\)")) – Name of the selected strategy. Should be one of the values returned by `get_all_sharing_strategies()`.

## Sharing CUDA tensors

Sharing CUDA tensors between processes is supported only in Python 3, using the `spawn` or `forkserver` start method. Unlike CPU tensors, the sending process is required to keep the original tensor as long as the receiving process retains a copy of the tensor. The refcounting is implemented under the hood but requires users to follow the best practices below.

Warning If the consumer process dies abnormally due to a fatal signal, the shared tensor could be kept in memory forever as long as the sending process is running.

1. Release memory ASAP in the consumer.

## Good
x = queue.get()
# do something with x
del x

## Bad
x = queue.get()
# do something with x
# do everything else (the producer has to keep x in memory)

2. Keep the producer process running until all consumers exit. This prevents the situation where the producer releases memory that is still in use by a consumer.

## producer
# send tensors, do something
event.wait()

## consumer
# receive tensors and use them
event.set()

3. Don't pass received tensors.

# not going to work
x = queue.get()
queue_2.put(x)

# you need to create a process-local copy
x = queue.get()
x_clone = x.clone()
queue_2.put(x_clone)

# putting and getting from the same queue in the same process will likely result in a segfault
queue.put(tensor)
x = queue.get()

## Sharing strategies

This section provides a brief overview of how the different sharing strategies work. Note that it applies only to CPU tensors; CUDA tensors will always use the CUDA API, as that's the only way they can be shared.

### File descriptor - `file_descriptor`

Note This is the default strategy (except for macOS and OS X where it's not supported).

This strategy will use file descriptors as shared memory handles. Whenever a storage is moved to shared memory, a file descriptor obtained from `shm_open` is cached with the object, and when it's going to be sent to other processes, the file descriptor will be transferred (e.g.
via UNIX sockets) to it. The receiver will also cache the file descriptor and `mmap` it, to obtain a shared view onto the storage data. Note that if there will be a lot of tensors shared, this strategy will keep a large number of file descriptors open most of the time. If your system has low limits for the number of open file descriptors, and you can’t raise them, you should use the `file_system` strategy. ### File system - `file_system` This strategy will use file names given to `shm_open` to identify the shared memory regions. This has a benefit of not requiring the implementation to cache the file descriptors obtained from it, but at the same time is prone to shared memory leaks. The file can’t be deleted right after its creation, because other processes need to access it to open their views. If the processes fatally crash, or are killed, and don’t call the storage destructors, the files will remain in the system. This is very serious, because they keep using up the memory until the system is restarted, or they’re freed manually. To counter the problem of shared memory file leaks, `torch.multiprocessing` will spawn a daemon named `torch_shm_manager` that will isolate itself from the current process group, and will keep track of all shared memory allocations. Once all processes connected to it exit, it will wait a moment to ensure there will be no new connections, and will iterate over all shared memory files allocated by the group. If it finds that any of them still exist, they will be deallocated. We’ve tested this method and it proved to be robust to various failures. Still, if your system has high enough limits, and `file_descriptor` is a supported strategy, we do not recommend switching to this one. ## Spawning subprocesses Note Available for Python >= 3.4. This depends on the `spawn` start method in Python’s `multiprocessing` package. Spawning a number of subprocesses to perform some function can be done by creating `Process` instances and calling `join` to wait for their completion. This approach works fine when dealing with a single subprocess but presents potential issues when dealing with multiple processes. Namely, joining processes sequentially implies they will terminate sequentially. If they don’t, and the first process does not terminate, the process termination will go unnoticed. Also, there are no native facilities for error propagation. The `spawn` function below addresses these concerns and takes care of error propagation, out of order termination, and will actively terminate processes upon detecting an error in one of them. `torch.multiprocessing.spawn(fn, args=(), nprocs=1, join=True, daemon=False, start_method='spawn')` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/multiprocessing/spawn.html#spawn) Spawns `nprocs` processes that run `fn` with `args`. If one of the processes exits with a non-zero exit status, the remaining processes are killed and an exception is raised with the cause of termination. In the case an exception was caught in the child process, it is forwarded and its traceback is included in the exception raised in the parent process. Parameters * **fn** (_function_) – Function is called as the entrypoint of the spawned process. This function must be defined at the top level of a module so it can be pickled and spawned. This is a requirement imposed by multiprocessing. The function is called as `fn(i, *args)`, where `i` is the process index and `args` is the passed through tuple of arguments. 
* **args** ([tuple](https://docs.python.org/3/library/stdtypes.html#tuple "\(in Python v3.9\)")) – Arguments passed to `fn`. * **nprocs** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – Number of processes to spawn. * **join** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – Perform a blocking join on all processes. * **daemon** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – The spawned processes’ daemon flag. If set to True, daemonic processes will be created. * **start_method** (_string_) – (deprecated) this method will always use `spawn` as the start method. To use a different start method use `start_processes()`. Returns None if `join` is `True`, `ProcessContext` if `join` is `False` `class torch.multiprocessing.SpawnContext` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/multiprocessing/spawn.html#SpawnContext) Returned by `spawn()` when called with `join=False`. `join(timeout=None)` Tries to join one or more processes in this spawn context. If one of them exited with a non-zero exit status, this function kills the remaining processes and raises an exception with the cause of the first process exiting. Returns `True` if all processes have been joined successfully, `False` if there are more processes that need to be joined. Parameters **timeout** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)")) – Wait this long before giving up on waiting. # Named Tensors Named Tensors allow users to give explicit names to tensor dimensions. In most cases, operations that take dimension parameters will accept dimension names, avoiding the need to track dimensions by position. In addition, named tensors use names to automatically check that APIs are being used correctly at runtime, providing extra safety. Names can also be used to rearrange dimensions, for example, to support “broadcasting by name” rather than “broadcasting by position”. Warning The named tensor API is a prototype feature and subject to change. ## Creating named tensors Factory functions now take a new `names` argument that associates a name with each dimension. >>> torch.zeros(2, 3, names=('N', 'C')) tensor([[0., 0., 0.], [0., 0., 0.]], names=('N', 'C')) Named dimensions, like regular Tensor dimensions, are ordered. `tensor.names[i]` is the name of dimension `i` of `tensor`. The following factory functions support named tensors: * [`torch.empty()`](generated/torch.empty#torch.empty "torch.empty") * [`torch.rand()`](generated/torch.rand#torch.rand "torch.rand") * [`torch.randn()`](generated/torch.randn#torch.randn "torch.randn") * [`torch.ones()`](generated/torch.ones#torch.ones "torch.ones") * [`torch.tensor()`](generated/torch.tensor#torch.tensor "torch.tensor") * [`torch.zeros()`](generated/torch.zeros#torch.zeros "torch.zeros") ## Named dimensions See `names` for restrictions on tensor names. Use `names` to access the dimension names of a tensor and `rename()` to rename named dimensions. >>> imgs = torch.randn(1, 2, 2, 3 , names=('N', 'C', 'H', 'W')) >>> imgs.names ('N', 'C', 'H', 'W') >>> renamed_imgs = imgs.rename(H='height', W='width') >>> renamed_imgs.names ('N', 'C', 'height', 'width) Named tensors can coexist with unnamed tensors; named tensors are instances of [`torch.Tensor`](tensors#torch.Tensor "torch.Tensor"). Unnamed tensors have `None`-named dimensions. Named tensors do not require all dimensions to be named. 
>>> imgs = torch.randn(1, 2, 2, 3, names=(None, 'C', 'H', 'W'))
>>> imgs.names
(None, 'C', 'H', 'W')

## Name propagation semantics

Named tensors use names to automatically check that APIs are being called correctly at runtime. This occurs in a process called _name inference_. More formally, name inference consists of the following two steps:

* **Check names**: an operator may perform automatic checks at runtime that check that certain dimension names must match.
* **Propagate names**: name inference propagates names to output tensors.

All operations that support named tensors propagate names.

>>> x = torch.randn(3, 3, names=('N', 'C'))
>>> x.abs().names
('N', 'C')

### Match semantics

Two names _match_ if they are equal (string equality) or if at least one is `None`. Nones are essentially a special "wildcard" name.

`unify(A, B)` determines which of the names `A` and `B` to propagate to the outputs. It returns the more _specific_ of the two names, if they match. If the names do not match, then it errors.

Note In practice, when working with named tensors, one should avoid having unnamed dimensions because their handling can be complicated. It is recommended to lift all unnamed dimensions to be named dimensions by using `refine_names()`.

### Basic name inference rules

Let's see how `match` and `unify` are used in name inference in the case of adding two one-dim tensors with no broadcasting.

x = torch.randn(3, names=('X',))
y = torch.randn(3)
z = torch.randn(3, names=('Z',))

**Check names**: check that the names of the two tensors _match_. For the following examples:

>>> # x + y  # match('X', None) is True
>>> # x + z  # match('X', 'Z') is False
>>> # x + x  # match('X', 'X') is True
>>> x + z
Error when attempting to broadcast dims ['X'] and dims ['Z']: dim 'X' and dim 'Z' are at the same position from the right but do not match.

**Propagate names**: _unify_ the names to select which one to propagate. In the case of `x + y`, `unify('X', None) = 'X'` because `'X'` is more specific than `None`.

>>> (x + y).names
('X',)
>>> (x + x).names
('X',)

For a comprehensive list of name inference rules, see [Named Tensors operator coverage](name_inference#name-inference-reference-doc). Here are two common operations that may be useful to go over:

* Binary arithmetic ops: [Unifies names from inputs](name_inference#unifies-names-from-inputs-doc)
* Matrix multiplication ops: [Contracts away dims](name_inference#contracts-away-dims-doc)

## Explicit alignment by names

Use `align_as()` or `align_to()` to align tensor dimensions by name to a specified ordering. This is useful for performing "broadcasting by names".

# This function is agnostic to the dimension ordering of `input`,
# as long as it has a `C` dimension somewhere.
def scale_channels(input, scale):
    scale = scale.refine_names('C')
    return input * scale.align_as(input)

>>> num_channels = 3
>>> scale = torch.randn(num_channels, names=('C',))
>>> imgs = torch.rand(3, 3, 3, num_channels, names=('N', 'H', 'W', 'C'))
>>> more_imgs = torch.rand(3, num_channels, 3, 3, names=('N', 'C', 'H', 'W'))
>>> videos = torch.randn(3, num_channels, 3, 3, 3, names=('N', 'C', 'H', 'W', 'D'))
>>> scale_channels(imgs, scale)
>>> scale_channels(more_imgs, scale)
>>> scale_channels(videos, scale)

## Manipulating dimensions

Use `align_to()` to permute large amounts of dimensions without mentioning all of them as required by [`permute()`](tensors#torch.Tensor.permute "torch.Tensor.permute").
>>> tensor = torch.randn(2, 2, 2, 2, 2, 2) >>> named_tensor = tensor.refine_names('A', 'B', 'C', 'D', 'E', 'F') # Move the F (dim 5) and E dimension (dim 4) to the front while keeping # the rest in the same order >>> tensor.permute(5, 4, 0, 1, 2, 3) >>> named_tensor.align_to('F', 'E', ...) Use [`flatten()`](tensors#torch.Tensor.flatten "torch.Tensor.flatten") and `unflatten()` to flatten and unflatten dimensions, respectively. These methods are more verbose than [`view()`](tensors#torch.Tensor.view "torch.Tensor.view") and [`reshape()`](tensors#torch.Tensor.reshape "torch.Tensor.reshape"), but have more semantic meaning to someone reading the code. >>> imgs = torch.randn(32, 3, 128, 128) >>> named_imgs = imgs.refine_names('N', 'C', 'H', 'W') >>> flat_imgs = imgs.view(32, -1) >>> named_flat_imgs = named_imgs.flatten(['C', 'H', 'W'], 'features') >>> named_flat_imgs.names ('N', 'features') >>> unflattened_imgs = imgs.view(32, 3, 128, 128) >>> unflattened_named_imgs = named_flat_imgs.unflatten( 'features', [('C', 3), ('H', 128), ('W', 128)]) ## Autograd support Autograd currently supports named tensors in a limited manner: autograd ignores names on all tensors. Gradient computation is still correct but we lose the safety that names give us. >>> x = torch.randn(3, names=('D',)) >>> weight = torch.randn(3, names=('D',), requires_grad=True) >>> loss = (x - weight).abs() >>> grad_loss = torch.randn(3) >>> loss.backward(grad_loss) >>> weight.grad # Unnamed for now. Will be named in the future tensor([-1.8107, -0.6357, 0.0783]) >>> weight.grad.zero_() >>> grad_loss = grad_loss.refine_names('C') >>> loss = (x - weight).abs() # Ideally we'd check that the names of loss and grad_loss match but we don't yet. >>> loss.backward(grad_loss) >>> weight.grad tensor([-1.8107, -0.6357, 0.0783]) ## Currently supported operations and subsystems ### Operators See [Named Tensors operator coverage](name_inference#name-inference-reference- doc) for a full list of the supported torch and tensor operations. We do not yet support the following that is not covered by the link: * indexing, advanced indexing. For `torch.nn.functional` operators, we support the following: * [`torch.nn.functional.relu()`](nn.functional#torch.nn.functional.relu "torch.nn.functional.relu") * [`torch.nn.functional.softmax()`](nn.functional#torch.nn.functional.softmax "torch.nn.functional.softmax") * [`torch.nn.functional.log_softmax()`](nn.functional#torch.nn.functional.log_softmax "torch.nn.functional.log_softmax") * [`torch.nn.functional.tanh()`](nn.functional#torch.nn.functional.tanh "torch.nn.functional.tanh") * [`torch.nn.functional.sigmoid()`](nn.functional#torch.nn.functional.sigmoid "torch.nn.functional.sigmoid") * [`torch.nn.functional.dropout()`](nn.functional#torch.nn.functional.dropout "torch.nn.functional.dropout") ### Subsystems Autograd is supported, see Autograd support. Because gradients are currently unnamed, optimizers may work but are untested. NN modules are currently unsupported. This can lead to the following when calling modules with named tensor inputs: * NN module parameters are unnamed, so outputs may be partially named. * NN module forward passes have code that don’t support named tensors and will error out appropriately. 
We also do not support the following subsystems, though some may work out of the box:

* distributions
* serialization ([`torch.load()`](generated/torch.load#torch.load "torch.load"), [`torch.save()`](generated/torch.save#torch.save "torch.save"))
* multiprocessing
* JIT
* distributed
* ONNX

If any of these would help your use case, please [search if an issue has already been filed](https://github.com/pytorch/pytorch/issues?q=is%3Aopen+is%3Aissue+label%3A%22module%3A+named+tensor%22) and if not, [file one](https://github.com/pytorch/pytorch/issues/new/choose).

## Named tensor API reference

In this section you can find the documentation for named-tensor-specific APIs. For a comprehensive reference on how names are propagated through other PyTorch operators, see [Named Tensors operator coverage](name_inference#name-inference-reference-doc).

`class torch.Tensor`

`names`

Stores names for each of this tensor's dimensions. `names[idx]` corresponds to the name of tensor dimension `idx`. Names are either a string if the dimension is named or `None` if the dimension is unnamed. Dimension names may contain characters or underscore. Furthermore, a dimension name must be a valid Python variable name (i.e., does not start with underscore). Tensors may not have two named dimensions with the same name.

Warning The named tensor API is experimental and subject to change.

`rename(*names, **rename_map)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/tensor.html#Tensor.rename)

Renames dimension names of `self`. There are two main usages:

`self.rename(**rename_map)` returns a view on the tensor that has dims renamed as specified in the mapping `rename_map`.

`self.rename(*names)` returns a view on the tensor, renaming all dimensions positionally using `names`. Use `self.rename(None)` to drop names on a tensor.

One cannot specify both positional args `names` and keyword args `rename_map`.

Examples:

>>> imgs = torch.rand(2, 3, 5, 7, names=('N', 'C', 'H', 'W'))
>>> renamed_imgs = imgs.rename(N='batch', C='channels')
>>> renamed_imgs.names
('batch', 'channels', 'H', 'W')
>>> renamed_imgs = imgs.rename(None)
>>> renamed_imgs.names
(None, None, None, None)
>>> renamed_imgs = imgs.rename('batch', 'channel', 'height', 'width')
>>> renamed_imgs.names
('batch', 'channel', 'height', 'width')

Warning The named tensor API is experimental and subject to change.

`rename_(*names, **rename_map)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/tensor.html#Tensor.rename_)

In-place version of `rename()`.

`refine_names(*names)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/tensor.html#Tensor.refine_names)

Refines the dimension names of `self` according to `names`. Refining is a special case of renaming that "lifts" unnamed dimensions. A `None` dim can be refined to have any name; a named dim can only be refined to have the same name.

Because named tensors can coexist with unnamed tensors, refining names gives a nice way to write named-tensor-aware code that works with both named and unnamed tensors.

`names` may contain up to one Ellipsis (`...`). The Ellipsis is expanded greedily; it is expanded in-place to fill `names` to the same length as `self.dim()` using names from the corresponding indices of `self.names`. Python 2 does not support Ellipsis but one may use a string literal instead (`'...'`).

Parameters **names** (_iterable of str_) – The desired names of the output tensor. May contain up to one Ellipsis.
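As a quick illustration of the refinement rule above, before the examples below (a sketch; the dimension names are arbitrary): unnamed (`None`) dims can be lifted to any name, while an already-named dim can only be "refined" to the very same name:

import torch

x = torch.randn(2, 3)              # fully unnamed tensor
y = x.refine_names('N', 'C')       # None dims may be lifted to any name
print(y.names)                     # ('N', 'C')

z = y.refine_names('N', 'C')       # a named dim may only be refined to the same name
# y.refine_names('N', 'K')         # would raise an error: 'C' cannot be refined to 'K'
print(z.names)                     # ('N', 'C')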
Examples: >>> imgs = torch.randn(32, 3, 128, 128) >>> named_imgs = imgs.refine_names('N', 'C', 'H', 'W') >>> named_imgs.names ('N', 'C', 'H', 'W') >>> tensor = torch.randn(2, 3, 5, 7, 11) >>> tensor = tensor.refine_names('A', ..., 'B', 'C') >>> tensor.names ('A', None, None, 'B', 'C') Warning The named tensor API is experimental and subject to change. `align_as(other) → Tensor` Permutes the dimensions of the `self` tensor to match the dimension order in the `other` tensor, adding size-one dims for any new names. This operation is useful for explicit broadcasting by names (see examples). All of the dims of `self` must be named in order to use this method. The resulting tensor is a view on the original tensor. All dimension names of `self` must be present in `other.names`. `other` may contain named dimensions that are not in `self.names`; the output tensor has a size-one dimension for each of those new names. To align a tensor to a specific order, use `align_to()`. Examples: # Example 1: Applying a mask >>> mask = torch.randint(2, [127, 128], dtype=torch.bool).refine_names('W', 'H') >>> imgs = torch.randn(32, 128, 127, 3, names=('N', 'H', 'W', 'C')) >>> imgs.masked_fill_(mask.align_as(imgs), 0) # Example 2: Applying a per-channel-scale >>> def scale_channels(input, scale): >>> scale = scale.refine_names('C') >>> return input * scale.align_as(input) >>> num_channels = 3 >>> scale = torch.randn(num_channels, names=('C',)) >>> imgs = torch.rand(32, 128, 128, num_channels, names=('N', 'H', 'W', 'C')) >>> more_imgs = torch.rand(32, num_channels, 128, 128, names=('N', 'C', 'H', 'W')) >>> videos = torch.randn(3, num_channels, 128, 128, 128, names=('N', 'C', 'H', 'W', 'D')) # scale_channels is agnostic to the dimension order of the input >>> scale_channels(imgs, scale) >>> scale_channels(more_imgs, scale) >>> scale_channels(videos, scale) Warning The named tensor API is experimental and subject to change. `align_to(*names)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/tensor.html#Tensor.align_to) Permutes the dimensions of the `self` tensor to match the order specified in `names`, adding size-one dims for any new names. All of the dims of `self` must be named in order to use this method. The resulting tensor is a view on the original tensor. All dimension names of `self` must be present in `names`. `names` may contain additional names that are not in `self.names`; the output tensor has a size- one dimension for each of those new names. `names` may contain up to one Ellipsis (`...`). The Ellipsis is expanded to be equal to all dimension names of `self` that are not mentioned in `names`, in the order that they appear in `self`. Python 2 does not support Ellipsis but one may use a string literal instead (`'...'`). Parameters **names** (_iterable of str_) – The desired dimension ordering of the output tensor. May contain up to one Ellipsis that is expanded to all unmentioned dim names of `self`. Examples: >>> tensor = torch.randn(2, 2, 2, 2, 2, 2) >>> named_tensor = tensor.refine_names('A', 'B', 'C', 'D', 'E', 'F') # Move the F and E dims to the front while keeping the rest in order >>> named_tensor.align_to('F', 'E', ...) Warning The named tensor API is experimental and subject to change. `unflatten(dim, sizes)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/tensor.html#Tensor.unflatten) Expands the dimension [`dim`](tensors#torch.Tensor.dim "torch.Tensor.dim") of the `self` tensor over multiple dimensions of sizes given by `sizes`. 
* `sizes` is the new shape of the unflattened dimension and it can be a `Tuple[int]` as well as `torch.Size` if `self` is a `Tensor`, or `namedshape` (Tuple[(name: str, size: int)]) if `self` is a `NamedTensor`. The total number of elements in sizes must match the number of elements in the original dim being unflattened. Parameters * **dim** (_Union_ _[_[int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,_[str](https://docs.python.org/3/library/stdtypes.html#str "\(in Python v3.9\)") _]_) – Dimension to unflatten * **sizes** (_Union_ _[__Tuple_ _[_[int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _] or_ _torch.Size_ _,__Tuple_ _[__Tuple_ _[_[str](https://docs.python.org/3/library/stdtypes.html#str "\(in Python v3.9\)") _,_[int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _]__]__]_) – New shape of the unflattened dimension #### Examples >>> torch.randn(3, 4, 1).unflatten(1, (2, 2)).shape torch.Size([3, 2, 2, 1]) >>> torch.randn(2, 4, names=('A', 'B')).unflatten('B', (('B1', 2), ('B2', 2))) tensor([[[-1.1772, 0.0180], [ 0.2412, 0.1431]], [[-1.1819, -0.8899], [ 1.5813, 0.2274]]], names=(‘A’, ‘B1’, ‘B2’)) Warning The named tensor API is experimental and subject to change. `flatten(dims, out_dim) → Tensor` Flattens `dims` into a single dimension with name `out_dim`. All of `dims` must be consecutive in order in the `self` tensor, but not necessary contiguous in memory. Examples: >>> imgs = torch.randn(32, 3, 128, 128, names=('N', 'C', 'H', 'W')) >>> flat_imgs = imgs.flatten(['C', 'H', 'W'], 'features') >>> flat_imgs.names, flat_imgs.shape (('N', 'features'), torch.Size([32, 49152])) Warning The named tensor API is experimental and subject to change. # torch.nn.functional ## Convolution functions ### conv1d `torch.nn.functional.conv1d(input, weight, bias=None, stride=1, padding=0, dilation=1, groups=1) → Tensor` Applies a 1D convolution over an input signal composed of several input planes. This operator supports [TensorFloat32](https://pytorch.org/docs/1.8.0/notes/cuda.html#tf32-on- ampere). See [`Conv1d`](generated/torch.nn.conv1d#torch.nn.Conv1d "torch.nn.Conv1d") for details and output shape. Note In some circumstances when given tensors on a CUDA device and using CuDNN, this operator may select a nondeterministic algorithm to increase performance. If this is undesirable, you can try to make the operation deterministic (potentially at a performance cost) by setting `torch.backends.cudnn.deterministic = True`. See [Reproducibility](https://pytorch.org/docs/1.8.0/notes/randomness.html) for more information. Parameters * **input** – input tensor of shape (minibatch,in_channels,iW)(\text{minibatch} , \text{in\\_channels} , iW) * **weight** – filters of shape (out_channels,in_channelsgroups,kW)(\text{out\\_channels} , \frac{\text{in\\_channels}}{\text{groups}} , kW) * **bias** – optional bias of shape (out_channels)(\text{out\\_channels}) . Default: `None` * **stride** – the stride of the convolving kernel. Can be a single number or a one-element tuple `(sW,)`. Default: 1 * **padding** – implicit paddings on both sides of the input. Can be a single number or a one-element tuple `(padW,)`. Default: 0 * **dilation** – the spacing between kernel elements. Can be a single number or a one-element tuple `(dW,)`. Default: 1 * **groups** – split input into groups, in_channels\text{in\\_channels} should be divisible by the number of groups. 
Default: 1 Examples: >>> filters = torch.randn(33, 16, 3) >>> inputs = torch.randn(20, 16, 50) >>> F.conv1d(inputs, filters) ### conv2d `torch.nn.functional.conv2d(input, weight, bias=None, stride=1, padding=0, dilation=1, groups=1) → Tensor` Applies a 2D convolution over an input image composed of several input planes. This operator supports [TensorFloat32](https://pytorch.org/docs/1.8.0/notes/cuda.html#tf32-on- ampere). See [`Conv2d`](generated/torch.nn.conv2d#torch.nn.Conv2d "torch.nn.Conv2d") for details and output shape. Note In some circumstances when given tensors on a CUDA device and using CuDNN, this operator may select a nondeterministic algorithm to increase performance. If this is undesirable, you can try to make the operation deterministic (potentially at a performance cost) by setting `torch.backends.cudnn.deterministic = True`. See [Reproducibility](https://pytorch.org/docs/1.8.0/notes/randomness.html) for more information. Parameters * **input** – input tensor of shape (minibatch,in_channels,iH,iW)(\text{minibatch} , \text{in\\_channels} , iH , iW) * **weight** – filters of shape (out_channels,in_channelsgroups,kH,kW)(\text{out\\_channels} , \frac{\text{in\\_channels}}{\text{groups}} , kH , kW) * **bias** – optional bias tensor of shape (out_channels)(\text{out\\_channels}) . Default: `None` * **stride** – the stride of the convolving kernel. Can be a single number or a tuple `(sH, sW)`. Default: 1 * **padding** – implicit paddings on both sides of the input. Can be a single number or a tuple `(padH, padW)`. Default: 0 * **dilation** – the spacing between kernel elements. Can be a single number or a tuple `(dH, dW)`. Default: 1 * **groups** – split input into groups, in_channels\text{in\\_channels} should be divisible by the number of groups. Default: 1 Examples: >>> # With square kernels and equal stride >>> filters = torch.randn(8,4,3,3) >>> inputs = torch.randn(1,4,5,5) >>> F.conv2d(inputs, filters, padding=1) ### conv3d `torch.nn.functional.conv3d(input, weight, bias=None, stride=1, padding=0, dilation=1, groups=1) → Tensor` Applies a 3D convolution over an input image composed of several input planes. This operator supports [TensorFloat32](https://pytorch.org/docs/1.8.0/notes/cuda.html#tf32-on- ampere). See [`Conv3d`](generated/torch.nn.conv3d#torch.nn.Conv3d "torch.nn.Conv3d") for details and output shape. Note In some circumstances when given tensors on a CUDA device and using CuDNN, this operator may select a nondeterministic algorithm to increase performance. If this is undesirable, you can try to make the operation deterministic (potentially at a performance cost) by setting `torch.backends.cudnn.deterministic = True`. See [Reproducibility](https://pytorch.org/docs/1.8.0/notes/randomness.html) for more information. Parameters * **input** – input tensor of shape (minibatch,in_channels,iT,iH,iW)(\text{minibatch} , \text{in\\_channels} , iT , iH , iW) * **weight** – filters of shape (out_channels,in_channelsgroups,kT,kH,kW)(\text{out\\_channels} , \frac{\text{in\\_channels}}{\text{groups}} , kT , kH , kW) * **bias** – optional bias tensor of shape (out_channels)(\text{out\\_channels}) . Default: None * **stride** – the stride of the convolving kernel. Can be a single number or a tuple `(sT, sH, sW)`. Default: 1 * **padding** – implicit paddings on both sides of the input. Can be a single number or a tuple `(padT, padH, padW)`. Default: 0 * **dilation** – the spacing between kernel elements. Can be a single number or a tuple `(dT, dH, dW)`. 
Default: 1 * **groups** – split input into groups, in_channels\text{in\\_channels} should be divisible by the number of groups. Default: 1 Examples: >>> filters = torch.randn(33, 16, 3, 3, 3) >>> inputs = torch.randn(20, 16, 50, 10, 20) >>> F.conv3d(inputs, filters) ### conv_transpose1d `torch.nn.functional.conv_transpose1d(input, weight, bias=None, stride=1, padding=0, output_padding=0, groups=1, dilation=1) → Tensor` Applies a 1D transposed convolution operator over an input signal composed of several input planes, sometimes also called “deconvolution”. This operator supports [TensorFloat32](https://pytorch.org/docs/1.8.0/notes/cuda.html#tf32-on- ampere). See [`ConvTranspose1d`](generated/torch.nn.convtranspose1d#torch.nn.ConvTranspose1d "torch.nn.ConvTranspose1d") for details and output shape. Note In some circumstances when given tensors on a CUDA device and using CuDNN, this operator may select a nondeterministic algorithm to increase performance. If this is undesirable, you can try to make the operation deterministic (potentially at a performance cost) by setting `torch.backends.cudnn.deterministic = True`. See [Reproducibility](https://pytorch.org/docs/1.8.0/notes/randomness.html) for more information. Parameters * **input** – input tensor of shape (minibatch,in_channels,iW)(\text{minibatch} , \text{in\\_channels} , iW) * **weight** – filters of shape (in_channels,out_channelsgroups,kW)(\text{in\\_channels} , \frac{\text{out\\_channels}}{\text{groups}} , kW) * **bias** – optional bias of shape (out_channels)(\text{out\\_channels}) . Default: None * **stride** – the stride of the convolving kernel. Can be a single number or a tuple `(sW,)`. Default: 1 * **padding** – `dilation * (kernel_size - 1) - padding` zero-padding will be added to both sides of each dimension in the input. Can be a single number or a tuple `(padW,)`. Default: 0 * **output_padding** – additional size added to one side of each dimension in the output shape. Can be a single number or a tuple `(out_padW)`. Default: 0 * **groups** – split input into groups, in_channels\text{in\\_channels} should be divisible by the number of groups. Default: 1 * **dilation** – the spacing between kernel elements. Can be a single number or a tuple `(dW,)`. Default: 1 Examples: >>> inputs = torch.randn(20, 16, 50) >>> weights = torch.randn(16, 33, 5) >>> F.conv_transpose1d(inputs, weights) ### conv_transpose2d `torch.nn.functional.conv_transpose2d(input, weight, bias=None, stride=1, padding=0, output_padding=0, groups=1, dilation=1) → Tensor` Applies a 2D transposed convolution operator over an input image composed of several input planes, sometimes also called “deconvolution”. This operator supports [TensorFloat32](https://pytorch.org/docs/1.8.0/notes/cuda.html#tf32-on- ampere). See [`ConvTranspose2d`](generated/torch.nn.convtranspose2d#torch.nn.ConvTranspose2d "torch.nn.ConvTranspose2d") for details and output shape. Note In some circumstances when given tensors on a CUDA device and using CuDNN, this operator may select a nondeterministic algorithm to increase performance. If this is undesirable, you can try to make the operation deterministic (potentially at a performance cost) by setting `torch.backends.cudnn.deterministic = True`. See [Reproducibility](https://pytorch.org/docs/1.8.0/notes/randomness.html) for more information. 
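Before the parameter list, a quick sketch of the upsampling behaviour implied by "transposed convolution" (the shapes here are arbitrary; the commented output size follows the ConvTranspose2d output-shape formula):

import torch
import torch.nn.functional as F

x = torch.randn(1, 4, 8, 8)        # (minibatch, in_channels, iH, iW)
w = torch.randn(4, 8, 3, 3)        # (in_channels, out_channels/groups, kH, kW)
y = F.conv_transpose2d(x, w, stride=2)
print(y.shape)                     # torch.Size([1, 8, 17, 17]): the spatial size roughly doubles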
Parameters * **input** – input tensor of shape (minibatch,in_channels,iH,iW)(\text{minibatch} , \text{in\\_channels} , iH , iW) * **weight** – filters of shape (in_channels,out_channelsgroups,kH,kW)(\text{in\\_channels} , \frac{\text{out\\_channels}}{\text{groups}} , kH , kW) * **bias** – optional bias of shape (out_channels)(\text{out\\_channels}) . Default: None * **stride** – the stride of the convolving kernel. Can be a single number or a tuple `(sH, sW)`. Default: 1 * **padding** – `dilation * (kernel_size - 1) - padding` zero-padding will be added to both sides of each dimension in the input. Can be a single number or a tuple `(padH, padW)`. Default: 0 * **output_padding** – additional size added to one side of each dimension in the output shape. Can be a single number or a tuple `(out_padH, out_padW)`. Default: 0 * **groups** – split input into groups, in_channels\text{in\\_channels} should be divisible by the number of groups. Default: 1 * **dilation** – the spacing between kernel elements. Can be a single number or a tuple `(dH, dW)`. Default: 1 Examples: >>> # With square kernels and equal stride >>> inputs = torch.randn(1, 4, 5, 5) >>> weights = torch.randn(4, 8, 3, 3) >>> F.conv_transpose2d(inputs, weights, padding=1) ### conv_transpose3d `torch.nn.functional.conv_transpose3d(input, weight, bias=None, stride=1, padding=0, output_padding=0, groups=1, dilation=1) → Tensor` Applies a 3D transposed convolution operator over an input image composed of several input planes, sometimes also called “deconvolution” This operator supports [TensorFloat32](https://pytorch.org/docs/1.8.0/notes/cuda.html#tf32-on- ampere). See [`ConvTranspose3d`](generated/torch.nn.convtranspose3d#torch.nn.ConvTranspose3d "torch.nn.ConvTranspose3d") for details and output shape. Note In some circumstances when given tensors on a CUDA device and using CuDNN, this operator may select a nondeterministic algorithm to increase performance. If this is undesirable, you can try to make the operation deterministic (potentially at a performance cost) by setting `torch.backends.cudnn.deterministic = True`. See [Reproducibility](https://pytorch.org/docs/1.8.0/notes/randomness.html) for more information. Parameters * **input** – input tensor of shape (minibatch,in_channels,iT,iH,iW)(\text{minibatch} , \text{in\\_channels} , iT , iH , iW) * **weight** – filters of shape (in_channels,out_channelsgroups,kT,kH,kW)(\text{in\\_channels} , \frac{\text{out\\_channels}}{\text{groups}} , kT , kH , kW) * **bias** – optional bias of shape (out_channels)(\text{out\\_channels}) . Default: None * **stride** – the stride of the convolving kernel. Can be a single number or a tuple `(sT, sH, sW)`. Default: 1 * **padding** – `dilation * (kernel_size - 1) - padding` zero-padding will be added to both sides of each dimension in the input. Can be a single number or a tuple `(padT, padH, padW)`. Default: 0 * **output_padding** – additional size added to one side of each dimension in the output shape. Can be a single number or a tuple `(out_padT, out_padH, out_padW)`. Default: 0 * **groups** – split input into groups, in_channels\text{in\\_channels} should be divisible by the number of groups. Default: 1 * **dilation** – the spacing between kernel elements. Can be a single number or a tuple `(dT, dH, dW)`. 
Default: 1 Examples: >>> inputs = torch.randn(20, 16, 50, 10, 20) >>> weights = torch.randn(16, 33, 3, 3, 3) >>> F.conv_transpose3d(inputs, weights) ### unfold `torch.nn.functional.unfold(input, kernel_size, dilation=1, padding=0, stride=1)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/functional.html#unfold) Extracts sliding local blocks from a batched input tensor. Warning Currently, only 4-D input tensors (batched image-like tensors) are supported. Warning More than one element of the unfolded tensor may refer to a single memory location. As a result, in-place operations (especially ones that are vectorized) may result in incorrect behavior. If you need to write to the tensor, please clone it first. See [`torch.nn.Unfold`](generated/torch.nn.unfold#torch.nn.Unfold "torch.nn.Unfold") for details ### fold `torch.nn.functional.fold(input, output_size, kernel_size, dilation=1, padding=0, stride=1)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/functional.html#fold) Combines an array of sliding local blocks into a large containing tensor. Warning Currently, only 3-D output tensors (unfolded batched image-like tensors) are supported. See [`torch.nn.Fold`](generated/torch.nn.fold#torch.nn.Fold "torch.nn.Fold") for details ## Pooling functions ### avg_pool1d `torch.nn.functional.avg_pool1d(input, kernel_size, stride=None, padding=0, ceil_mode=False, count_include_pad=True) → Tensor` Applies a 1D average pooling over an input signal composed of several input planes. See [`AvgPool1d`](generated/torch.nn.avgpool1d#torch.nn.AvgPool1d "torch.nn.AvgPool1d") for details and output shape. Parameters * **input** – input tensor of shape (minibatch,in_channels,iW)(\text{minibatch} , \text{in\\_channels} , iW) * **kernel_size** – the size of the window. Can be a single number or a tuple `(kW,)` * **stride** – the stride of the window. Can be a single number or a tuple `(sW,)`. Default: `kernel_size` * **padding** – implicit zero paddings on both sides of the input. Can be a single number or a tuple `(padW,)`. Default: 0 * **ceil_mode** – when True, will use `ceil` instead of `floor` to compute the output shape. Default: `False` * **count_include_pad** – when True, will include the zero-padding in the averaging calculation. Default: `True` Examples: >>> # pool of square window of size=3, stride=2 >>> input = torch.tensor([[[1, 2, 3, 4, 5, 6, 7]]], dtype=torch.float32) >>> F.avg_pool1d(input, kernel_size=3, stride=2) tensor([[[ 2., 4., 6.]]]) ### avg_pool2d `torch.nn.functional.avg_pool2d(input, kernel_size, stride=None, padding=0, ceil_mode=False, count_include_pad=True, divisor_override=None) → Tensor` Applies 2D average-pooling operation in kH×kWkH \times kW regions by step size sH×sWsH \times sW steps. The number of output features is equal to the number of input planes. See [`AvgPool2d`](generated/torch.nn.avgpool2d#torch.nn.AvgPool2d "torch.nn.AvgPool2d") for details and output shape. Parameters * **input** – input tensor (minibatch,in_channels,iH,iW)(\text{minibatch} , \text{in\\_channels} , iH , iW) * **kernel_size** – size of the pooling region. Can be a single number or a tuple `(kH, kW)` * **stride** – stride of the pooling operation. Can be a single number or a tuple `(sH, sW)`. Default: `kernel_size` * **padding** – implicit zero paddings on both sides of the input. Can be a single number or a tuple `(padH, padW)`. Default: 0 * **ceil_mode** – when True, will use `ceil` instead of `floor` in the formula to compute the output shape. 
Default: `False`
* **count_include_pad** – when True, will include the zero-padding in the averaging calculation. Default: `True`
* **divisor_override** – if specified, it will be used as the divisor, otherwise the size of the pooling region will be used. Default: None

### avg_pool3d

`torch.nn.functional.avg_pool3d(input, kernel_size, stride=None, padding=0, ceil_mode=False, count_include_pad=True, divisor_override=None) → Tensor`

Applies a 3D average-pooling operation in $kT \times kH \times kW$ regions with step size $sT \times sH \times sW$. The number of output features is equal to $\lfloor\frac{\text{input planes}}{sT}\rfloor$.

See [`AvgPool3d`](generated/torch.nn.avgpool3d#torch.nn.AvgPool3d "torch.nn.AvgPool3d") for details and output shape.

Parameters

* **input** – input tensor of shape (minibatch, in_channels, iT, iH, iW)
* **kernel_size** – size of the pooling region. Can be a single number or a tuple `(kT, kH, kW)`
* **stride** – stride of the pooling operation. Can be a single number or a tuple `(sT, sH, sW)`. Default: `kernel_size`
* **padding** – implicit zero paddings on both sides of the input. Can be a single number or a tuple `(padT, padH, padW)`. Default: 0
* **ceil_mode** – when True, will use `ceil` instead of `floor` in the formula to compute the output shape. Default: `False`
* **count_include_pad** – when True, will include the zero-padding in the averaging calculation. Default: `True`
* **divisor_override** – if specified, it will be used as the divisor, otherwise the size of the pooling region will be used. Default: None

### max_pool1d

`torch.nn.functional.max_pool1d(*args, **kwargs)`

Applies a 1D max pooling over an input signal composed of several input planes.

See [`MaxPool1d`](generated/torch.nn.maxpool1d#torch.nn.MaxPool1d "torch.nn.MaxPool1d") for details.

### max_pool2d

`torch.nn.functional.max_pool2d(*args, **kwargs)`

Applies a 2D max pooling over an input signal composed of several input planes.

See [`MaxPool2d`](generated/torch.nn.maxpool2d#torch.nn.MaxPool2d "torch.nn.MaxPool2d") for details.

### max_pool3d

`torch.nn.functional.max_pool3d(*args, **kwargs)`

Applies a 3D max pooling over an input signal composed of several input planes.

See [`MaxPool3d`](generated/torch.nn.maxpool3d#torch.nn.MaxPool3d "torch.nn.MaxPool3d") for details.

### max_unpool1d

`torch.nn.functional.max_unpool1d(input, indices, kernel_size, stride=None, padding=0, output_size=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/functional.html#max_unpool1d)

Computes a partial inverse of `MaxPool1d`.

See [`MaxUnpool1d`](generated/torch.nn.maxunpool1d#torch.nn.MaxUnpool1d "torch.nn.MaxUnpool1d") for details.

### max_unpool2d

`torch.nn.functional.max_unpool2d(input, indices, kernel_size, stride=None, padding=0, output_size=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/functional.html#max_unpool2d)

Computes a partial inverse of `MaxPool2d`.

See [`MaxUnpool2d`](generated/torch.nn.maxunpool2d#torch.nn.MaxUnpool2d "torch.nn.MaxUnpool2d") for details.

### max_unpool3d

`torch.nn.functional.max_unpool3d(input, indices, kernel_size, stride=None, padding=0, output_size=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/functional.html#max_unpool3d)

Computes a partial inverse of `MaxPool3d`.

See [`MaxUnpool3d`](generated/torch.nn.maxunpool3d#torch.nn.MaxUnpool3d "torch.nn.MaxUnpool3d") for details.
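As a rough illustration of how the pooling and unpooling functions above fit together, the following sketch pools a small tensor with `max_pool2d(..., return_indices=True)` and then reconstructs its shape with `max_unpool2d`. The returned `indices` record which element of each window held the maximum, so the unpooled tensor places the pooled values back at those positions and fills everything else with zeros (the printed values are deterministic for this input; the formatting is illustrative):

>>> import torch
>>> import torch.nn.functional as F
>>> x = torch.arange(16.).reshape(1, 1, 4, 4)   # (N, C, H, W)
>>> pooled, indices = F.max_pool2d(x, kernel_size=2, return_indices=True)
>>> pooled
tensor([[[[ 5.,  7.],
          [13., 15.]]]])
>>> # max_unpool2d is only a partial inverse: non-maximal entries come back as zeros
>>> F.max_unpool2d(pooled, indices, kernel_size=2).shape
torch.Size([1, 1, 4, 4])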
### lp_pool1d `torch.nn.functional.lp_pool1d(input, norm_type, kernel_size, stride=None, ceil_mode=False)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/functional.html#lp_pool1d) Applies a 1D power-average pooling over an input signal composed of several input planes. If the sum of all inputs to the power of `p` is zero, the gradient is set to zero as well. See [`LPPool1d`](generated/torch.nn.lppool1d#torch.nn.LPPool1d "torch.nn.LPPool1d") for details. ### lp_pool2d `torch.nn.functional.lp_pool2d(input, norm_type, kernel_size, stride=None, ceil_mode=False)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/functional.html#lp_pool2d) Applies a 2D power-average pooling over an input signal composed of several input planes. If the sum of all inputs to the power of `p` is zero, the gradient is set to zero as well. See [`LPPool2d`](generated/torch.nn.lppool2d#torch.nn.LPPool2d "torch.nn.LPPool2d") for details. ### adaptive_max_pool1d `torch.nn.functional.adaptive_max_pool1d(*args, **kwargs)` Applies a 1D adaptive max pooling over an input signal composed of several input planes. See [`AdaptiveMaxPool1d`](generated/torch.nn.adaptivemaxpool1d#torch.nn.AdaptiveMaxPool1d "torch.nn.AdaptiveMaxPool1d") for details and output shape. Parameters * **output_size** – the target output size (single integer) * **return_indices** – whether to return pooling indices. Default: `False` ### adaptive_max_pool2d `torch.nn.functional.adaptive_max_pool2d(*args, **kwargs)` Applies a 2D adaptive max pooling over an input signal composed of several input planes. See [`AdaptiveMaxPool2d`](generated/torch.nn.adaptivemaxpool2d#torch.nn.AdaptiveMaxPool2d "torch.nn.AdaptiveMaxPool2d") for details and output shape. Parameters * **output_size** – the target output size (single integer or double-integer tuple) * **return_indices** – whether to return pooling indices. Default: `False` ### adaptive_max_pool3d `torch.nn.functional.adaptive_max_pool3d(*args, **kwargs)` Applies a 3D adaptive max pooling over an input signal composed of several input planes. See [`AdaptiveMaxPool3d`](generated/torch.nn.adaptivemaxpool3d#torch.nn.AdaptiveMaxPool3d "torch.nn.AdaptiveMaxPool3d") for details and output shape. Parameters * **output_size** – the target output size (single integer or triple-integer tuple) * **return_indices** – whether to return pooling indices. Default: `False` ### adaptive_avg_pool1d `torch.nn.functional.adaptive_avg_pool1d(input, output_size) → Tensor` Applies a 1D adaptive average pooling over an input signal composed of several input planes. See [`AdaptiveAvgPool1d`](generated/torch.nn.adaptiveavgpool1d#torch.nn.AdaptiveAvgPool1d "torch.nn.AdaptiveAvgPool1d") for details and output shape. Parameters **output_size** – the target output size (single integer) ### adaptive_avg_pool2d `torch.nn.functional.adaptive_avg_pool2d(input, output_size)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/functional.html#adaptive_avg_pool2d) Applies a 2D adaptive average pooling over an input signal composed of several input planes. See [`AdaptiveAvgPool2d`](generated/torch.nn.adaptiveavgpool2d#torch.nn.AdaptiveAvgPool2d "torch.nn.AdaptiveAvgPool2d") for details and output shape. 
Parameters

**output_size** – the target output size (single integer or double-integer tuple)

### adaptive_avg_pool3d

`torch.nn.functional.adaptive_avg_pool3d(input, output_size)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/functional.html#adaptive_avg_pool3d)

Applies a 3D adaptive average pooling over an input signal composed of several input planes.

See [`AdaptiveAvgPool3d`](generated/torch.nn.adaptiveavgpool3d#torch.nn.AdaptiveAvgPool3d "torch.nn.AdaptiveAvgPool3d") for details and output shape.

Parameters

**output_size** – the target output size (single integer or triple-integer tuple)

## Non-linear activation functions

### threshold

`torch.nn.functional.threshold(input, threshold, value, inplace=False)`

Thresholds each element of the input Tensor.

See [`Threshold`](generated/torch.nn.threshold#torch.nn.Threshold "torch.nn.Threshold") for more details.

`torch.nn.functional.threshold_(input, threshold, value) → Tensor`

In-place version of `threshold()`.

### relu

`torch.nn.functional.relu(input, inplace=False) → Tensor` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/functional.html#relu)

Applies the rectified linear unit function element-wise. See [`ReLU`](generated/torch.nn.relu#torch.nn.ReLU "torch.nn.ReLU") for more details.

`torch.nn.functional.relu_(input) → Tensor`

In-place version of `relu()`.

### hardtanh

`torch.nn.functional.hardtanh(input, min_val=-1., max_val=1., inplace=False) → Tensor` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/functional.html#hardtanh)

Applies the HardTanh function element-wise. See [`Hardtanh`](generated/torch.nn.hardtanh#torch.nn.Hardtanh "torch.nn.Hardtanh") for more details.

`torch.nn.functional.hardtanh_(input, min_val=-1., max_val=1.) → Tensor`

In-place version of `hardtanh()`.

### hardswish

`torch.nn.functional.hardswish(input, inplace=False)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/functional.html#hardswish)

Applies the hardswish function, element-wise, as described in the paper: [Searching for MobileNetV3](https://arxiv.org/abs/1905.02244).

$$\text{Hardswish}(x) = \begin{cases} 0 & \text{if } x \le -3, \\ x & \text{if } x \ge +3, \\ x \cdot (x + 3) / 6 & \text{otherwise} \end{cases}$$

See [`Hardswish`](generated/torch.nn.hardswish#torch.nn.Hardswish "torch.nn.Hardswish") for more details.

### relu6

`torch.nn.functional.relu6(input, inplace=False) → Tensor` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/functional.html#relu6)

Applies the element-wise function $\text{ReLU6}(x) = \min(\max(0, x), 6)$.

See [`ReLU6`](generated/torch.nn.relu6#torch.nn.ReLU6 "torch.nn.ReLU6") for more details.

### elu

`torch.nn.functional.elu(input, alpha=1.0, inplace=False)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/functional.html#elu)

Applies element-wise, $\text{ELU}(x) = \max(0, x) + \min(0, \alpha * (\exp(x) - 1))$.

See [`ELU`](generated/torch.nn.elu#torch.nn.ELU "torch.nn.ELU") for more details.

`torch.nn.functional.elu_(input, alpha=1.) → Tensor`

In-place version of `elu()`.
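For a quick sense of how the clamping-style activations above differ, here is a small sketch (as in the other Examples in this reference, `torch` and `torch.nn.functional as F` are assumed to be imported); the printed values follow directly from the element-wise definitions given above:

>>> x = torch.tensor([-4., -1., 0., 2., 8.])
>>> F.relu(x)                   # max(0, x)
tensor([0., 0., 0., 2., 8.])
>>> F.relu6(x)                  # min(max(0, x), 6)
tensor([0., 0., 0., 2., 6.])
>>> F.hardtanh(x)               # clamp to [min_val, max_val] = [-1, 1]
tensor([-1., -1.,  0.,  1.,  1.])
>>> F.elu(x, alpha=1.0)         # negative side saturates towards -alpha
tensor([-0.9817, -0.6321,  0.0000,  2.0000,  8.0000])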
### selu

`torch.nn.functional.selu(input, inplace=False) → Tensor` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/functional.html#selu)

Applies element-wise, $\text{SELU}(x) = \text{scale} * (\max(0, x) + \min(0, \alpha * (\exp(x) - 1)))$, with $\alpha = 1.6732632423543772848170429916717$ and $\text{scale} = 1.0507009873554804934193349852946$.

See [`SELU`](generated/torch.nn.selu#torch.nn.SELU "torch.nn.SELU") for more details.

### celu

`torch.nn.functional.celu(input, alpha=1., inplace=False) → Tensor` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/functional.html#celu)

Applies element-wise, $\text{CELU}(x) = \max(0, x) + \min(0, \alpha * (\exp(x/\alpha) - 1))$.

See [`CELU`](generated/torch.nn.celu#torch.nn.CELU "torch.nn.CELU") for more details.

### leaky_relu

`torch.nn.functional.leaky_relu(input, negative_slope=0.01, inplace=False) → Tensor` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/functional.html#leaky_relu)

Applies element-wise, $\text{LeakyReLU}(x) = \max(0, x) + \text{negative\_slope} * \min(0, x)$.

See [`LeakyReLU`](generated/torch.nn.leakyrelu#torch.nn.LeakyReLU "torch.nn.LeakyReLU") for more details.

`torch.nn.functional.leaky_relu_(input, negative_slope=0.01) → Tensor`

In-place version of `leaky_relu()`.

### prelu

`torch.nn.functional.prelu(input, weight) → Tensor` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/functional.html#prelu)

Applies element-wise the function $\text{PReLU}(x) = \max(0, x) + \text{weight} * \min(0, x)$, where weight is a learnable parameter.

See [`PReLU`](generated/torch.nn.prelu#torch.nn.PReLU "torch.nn.PReLU") for more details.

### rrelu

`torch.nn.functional.rrelu(input, lower=1./8, upper=1./3, training=False, inplace=False) → Tensor` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/functional.html#rrelu)

Randomized leaky ReLU.

See [`RReLU`](generated/torch.nn.rrelu#torch.nn.RReLU "torch.nn.RReLU") for more details.

`torch.nn.functional.rrelu_(input, lower=1./8, upper=1./3, training=False) → Tensor`

In-place version of `rrelu()`.

### glu

`torch.nn.functional.glu(input, dim=-1) → Tensor` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/functional.html#glu)

The gated linear unit. Computes:

$$\text{GLU}(a, b) = a \otimes \sigma(b)$$

where `input` is split in half along `dim` to form `a` and `b`, $\sigma$ is the sigmoid function, and $\otimes$ is the element-wise product between matrices.

See [Language Modeling with Gated Convolutional Networks](https://arxiv.org/abs/1612.08083).

Parameters

* **input** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – input tensor
* **dim** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – dimension on which to split the input. Default: -1

### gelu

`torch.nn.functional.gelu(input) → Tensor` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/functional.html#gelu)

Applies element-wise the function $\text{GELU}(x) = x * \Phi(x)$, where $\Phi(x)$ is the cumulative distribution function of the Gaussian distribution.

See [Gaussian Error Linear Units (GELUs)](https://arxiv.org/abs/1606.08415).
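The way `glu` consumes its input can be easy to miss: it halves the chosen dimension, using the first half as `a` and the second half as the gate `b`. A minimal sketch (assuming `torch` and `torch.nn.functional as F` are imported):

>>> x = torch.randn(4, 6)
>>> F.glu(x, dim=-1).shape          # the last dimension is halved: a ⊗ sigmoid(b)
torch.Size([4, 3])
>>> a, b = x.chunk(2, dim=-1)       # same split that glu performs internally
>>> torch.allclose(F.glu(x, dim=-1), a * torch.sigmoid(b))
True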
### logsigmoid

`torch.nn.functional.logsigmoid(input) → Tensor`

Applies element-wise $\text{LogSigmoid}(x_i) = \log\left(\frac{1}{1 + \exp(-x_i)}\right)$.

See [`LogSigmoid`](generated/torch.nn.logsigmoid#torch.nn.LogSigmoid "torch.nn.LogSigmoid") for more details.

### hardshrink

`torch.nn.functional.hardshrink(input, lambd=0.5) → Tensor` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/functional.html#hardshrink)

Applies the hard shrinkage function element-wise.

See [`Hardshrink`](generated/torch.nn.hardshrink#torch.nn.Hardshrink "torch.nn.Hardshrink") for more details.

### tanhshrink

`torch.nn.functional.tanhshrink(input) → Tensor` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/functional.html#tanhshrink)

Applies element-wise, $\text{Tanhshrink}(x) = x - \text{Tanh}(x)$.

See [`Tanhshrink`](generated/torch.nn.tanhshrink#torch.nn.Tanhshrink "torch.nn.Tanhshrink") for more details.

### softsign

`torch.nn.functional.softsign(input) → Tensor` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/functional.html#softsign)

Applies element-wise the function $\text{SoftSign}(x) = \frac{x}{1 + |x|}$.

See [`Softsign`](generated/torch.nn.softsign#torch.nn.Softsign "torch.nn.Softsign") for more details.

### softplus

`torch.nn.functional.softplus(input, beta=1, threshold=20) → Tensor`

Applies element-wise the function $\text{Softplus}(x) = \frac{1}{\beta} * \log(1 + \exp(\beta * x))$.

For numerical stability the implementation reverts to the linear function when $\text{input} \times \beta > \text{threshold}$.

See [`Softplus`](generated/torch.nn.softplus#torch.nn.Softplus "torch.nn.Softplus") for more details.

### softmin

`torch.nn.functional.softmin(input, dim=None, _stacklevel=3, dtype=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/functional.html#softmin)

Applies a softmin function.

Note that $\text{Softmin}(x) = \text{Softmax}(-x)$. See the softmax definition for the mathematical formula.

See [`Softmin`](generated/torch.nn.softmin#torch.nn.Softmin "torch.nn.Softmin") for more details.

Parameters

* **input** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – input
* **dim** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – A dimension along which softmin will be computed (so every slice along dim will sum to 1).
* **dtype** (`torch.dtype`, optional) – the desired data type of the returned tensor. If specified, the input tensor is cast to `dtype` before the operation is performed. This is useful for preventing data type overflows. Default: None.

### softmax

`torch.nn.functional.softmax(input, dim=None, _stacklevel=3, dtype=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/functional.html#softmax)

Applies a softmax function.

Softmax is defined as:

$$\text{Softmax}(x_i) = \frac{\exp(x_i)}{\sum_j \exp(x_j)}$$

It is applied to all slices along dim, and will re-scale them so that the elements lie in the range `[0, 1]` and sum to 1.

See [`Softmax`](generated/torch.nn.softmax#torch.nn.Softmax "torch.nn.Softmax") for more details.

Parameters

* **input** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – input
* **dim** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – A dimension along which softmax will be computed.
* **dtype** (`torch.dtype`, optional) – the desired data type of the returned tensor.
If specified, the input tensor is cast to `dtype` before the operation is performed. This is useful for preventing data type overflows. Default: None.

Note

This function doesn't work directly with NLLLoss, which expects the Log to be computed between the Softmax and itself. Use log_softmax instead (it's faster and has better numerical properties).

### softshrink

`torch.nn.functional.softshrink(input, lambd=0.5) → Tensor`

Applies the soft shrinkage function element-wise.

See [`Softshrink`](generated/torch.nn.softshrink#torch.nn.Softshrink "torch.nn.Softshrink") for more details.

### gumbel_softmax

`torch.nn.functional.gumbel_softmax(logits, tau=1, hard=False, eps=1e-10, dim=-1)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/functional.html#gumbel_softmax)

Samples from the Gumbel-Softmax distribution ([Link 1](https://arxiv.org/abs/1611.00712), [Link 2](https://arxiv.org/abs/1611.01144)) and optionally discretizes.

Parameters

* **logits** – `[…, num_features]` unnormalized log probabilities
* **tau** – non-negative scalar temperature
* **hard** – if `True`, the returned samples will be discretized as one-hot vectors, but will be differentiated as if they were the soft samples in autograd
* **dim** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – A dimension along which softmax will be computed. Default: -1.

Returns

Sampled tensor of the same shape as `logits` from the Gumbel-Softmax distribution. If `hard=True`, the returned samples will be one-hot, otherwise they will be probability distributions that sum to 1 across `dim`.

Note

This function is here for legacy reasons and may be removed from nn.Functional in the future.

Note

The main trick for `hard` is to do `y_hard - y_soft.detach() + y_soft`. It achieves two things: it makes the output value exactly one-hot (since we add and then subtract the y_soft value), and it makes the gradient equal to the y_soft gradient (since we strip all other gradients).

Examples:

>>> logits = torch.randn(20, 32)
>>> # Sample soft categorical using reparametrization trick:
>>> F.gumbel_softmax(logits, tau=1, hard=False)
>>> # Sample hard categorical using "Straight-through" trick:
>>> F.gumbel_softmax(logits, tau=1, hard=True)

### log_softmax

`torch.nn.functional.log_softmax(input, dim=None, _stacklevel=3, dtype=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/functional.html#log_softmax)

Applies a softmax followed by a logarithm.

While mathematically equivalent to log(softmax(x)), doing these two operations separately is slower and numerically unstable. This function uses an alternative formulation to compute the output and gradient correctly.

See [`LogSoftmax`](generated/torch.nn.logsoftmax#torch.nn.LogSoftmax "torch.nn.LogSoftmax") for more details.

Parameters

* **input** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – input
* **dim** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – A dimension along which log_softmax will be computed.
* **dtype** (`torch.dtype`, optional) – the desired data type of the returned tensor. If specified, the input tensor is cast to `dtype` before the operation is performed. This is useful for preventing data type overflows. Default: None.
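To connect `softmax`, `log_softmax`, and the note above about `NLLLoss`, here is a minimal sketch (assuming `torch` and `torch.nn.functional as F` are imported; `nll_loss` and `cross_entropy` are covered in the Loss functions section below):

>>> x = torch.randn(2, 5)
>>> p = F.softmax(x, dim=1)
>>> torch.allclose(p.sum(dim=1), torch.ones(2))        # each row sums to 1
True
>>> torch.allclose(F.log_softmax(x, dim=1), p.log())   # same value, computed more stably
True
>>> target = torch.tensor([0, 3])
>>> torch.allclose(F.nll_loss(F.log_softmax(x, dim=1), target),
...                F.cross_entropy(x, target))
True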
### tanh

`torch.nn.functional.tanh(input) → Tensor` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/functional.html#tanh)

Applies element-wise, $\text{Tanh}(x) = \tanh(x) = \frac{\exp(x) - \exp(-x)}{\exp(x) + \exp(-x)}$.

See [`Tanh`](generated/torch.nn.tanh#torch.nn.Tanh "torch.nn.Tanh") for more details.

### sigmoid

`torch.nn.functional.sigmoid(input) → Tensor` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/functional.html#sigmoid)

Applies the element-wise function $\text{Sigmoid}(x) = \frac{1}{1 + \exp(-x)}$.

See [`Sigmoid`](generated/torch.nn.sigmoid#torch.nn.Sigmoid "torch.nn.Sigmoid") for more details.

### hardsigmoid

`torch.nn.functional.hardsigmoid(input, inplace=False) → Tensor` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/functional.html#hardsigmoid)

Applies the element-wise function

$$\text{Hardsigmoid}(x) = \begin{cases} 0 & \text{if } x \le -3, \\ 1 & \text{if } x \ge +3, \\ x/6 + 1/2 & \text{otherwise} \end{cases}$$

Parameters

**inplace** – If set to `True`, will do this operation in-place. Default: `False`

See [`Hardsigmoid`](generated/torch.nn.hardsigmoid#torch.nn.Hardsigmoid "torch.nn.Hardsigmoid") for more details.

### silu

`torch.nn.functional.silu(input, inplace=False)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/functional.html#silu)

Applies the SiLU function, element-wise:

$$\text{silu}(x) = x * \sigma(x), \text{ where } \sigma(x) \text{ is the logistic sigmoid.}$$

Note

See [Gaussian Error Linear Units (GELUs)](https://arxiv.org/abs/1606.08415), where the SiLU (Sigmoid Linear Unit) was originally coined, and see [Sigmoid-Weighted Linear Units for Neural Network Function Approximation in Reinforcement Learning](https://arxiv.org/abs/1702.03118) and [Swish: a Self-Gated Activation Function](https://arxiv.org/abs/1710.05941v1) where the SiLU was experimented with later.

See [`SiLU`](generated/torch.nn.silu#torch.nn.SiLU "torch.nn.SiLU") for more details.

## Normalization functions

### batch_norm

`torch.nn.functional.batch_norm(input, running_mean, running_var, weight=None, bias=None, training=False, momentum=0.1, eps=1e-05)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/functional.html#batch_norm)

Applies Batch Normalization for each channel across a batch of data.

See [`BatchNorm1d`](generated/torch.nn.batchnorm1d#torch.nn.BatchNorm1d "torch.nn.BatchNorm1d"), [`BatchNorm2d`](generated/torch.nn.batchnorm2d#torch.nn.BatchNorm2d "torch.nn.BatchNorm2d"), [`BatchNorm3d`](generated/torch.nn.batchnorm3d#torch.nn.BatchNorm3d "torch.nn.BatchNorm3d") for details.

### instance_norm

`torch.nn.functional.instance_norm(input, running_mean=None, running_var=None, weight=None, bias=None, use_input_stats=True, momentum=0.1, eps=1e-05)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/functional.html#instance_norm)

Applies Instance Normalization for each channel in each data sample in a batch.

See [`InstanceNorm1d`](generated/torch.nn.instancenorm1d#torch.nn.InstanceNorm1d "torch.nn.InstanceNorm1d"), [`InstanceNorm2d`](generated/torch.nn.instancenorm2d#torch.nn.InstanceNorm2d "torch.nn.InstanceNorm2d"), [`InstanceNorm3d`](generated/torch.nn.instancenorm3d#torch.nn.InstanceNorm3d "torch.nn.InstanceNorm3d") for details.
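As a rough sketch of how the functional `batch_norm` is typically called directly, outside of an `nn.BatchNorm*` module (assuming `torch` and `torch.nn.functional as F` are imported; the variable names are only illustrative): with `training=True` it normalizes using batch statistics and updates the running buffers in place, while with `training=False` it normalizes using the running statistics instead.

>>> x = torch.randn(8, 3, 4, 4)                      # (N, C, H, W)
>>> running_mean, running_var = torch.zeros(3), torch.ones(3)
>>> y = F.batch_norm(x, running_mean, running_var, training=True, momentum=0.1)
>>> torch.allclose(y.mean(dim=(0, 2, 3)), torch.zeros(3), atol=1e-5)   # per-channel mean ~ 0
True
>>> # training=True also updated running_mean / running_var in place;
>>> # the same call with training=False would normalize using those running statistics:
>>> y_eval = F.batch_norm(x, running_mean, running_var, training=False)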
### layer_norm

`torch.nn.functional.layer_norm(input, normalized_shape, weight=None, bias=None, eps=1e-05)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/functional.html#layer_norm)

Applies Layer Normalization over the last certain number of dimensions.

See [`LayerNorm`](generated/torch.nn.layernorm#torch.nn.LayerNorm "torch.nn.LayerNorm") for details.

### local_response_norm

`torch.nn.functional.local_response_norm(input, size, alpha=0.0001, beta=0.75, k=1.0)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/functional.html#local_response_norm)

Applies local response normalization over an input signal composed of several input planes, where channels occupy the second dimension. Applies normalization across channels.

See [`LocalResponseNorm`](generated/torch.nn.localresponsenorm#torch.nn.LocalResponseNorm "torch.nn.LocalResponseNorm") for details.

### normalize

`torch.nn.functional.normalize(input, p=2, dim=1, eps=1e-12, out=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/functional.html#normalize)

Performs $L_p$ normalization of inputs over the specified dimension.

For a tensor `input` of sizes $(n_0, ..., n_{dim}, ..., n_k)$, each $n_{dim}$-element vector $v$ along dimension `dim` is transformed as

$$v = \frac{v}{\max(\lVert v \rVert_p, \epsilon)}.$$

With the default arguments it uses the Euclidean norm over vectors along dimension 1 for normalization.

Parameters

* **input** – input tensor of any shape
* **p** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)")) – the exponent value in the norm formulation. Default: 2
* **dim** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – the dimension to reduce. Default: 1
* **eps** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)")) – small value to avoid division by zero. Default: 1e-12
* **out** ([Tensor](tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. If `out` is used, this operation won't be differentiable.

## Linear functions

### linear

`torch.nn.functional.linear(input, weight, bias=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/functional.html#linear)

Applies a linear transformation to the incoming data: $y = xA^T + b$.

This operator supports [TensorFloat32](https://pytorch.org/docs/1.8.0/notes/cuda.html#tf32-on-ampere).

Shape:

* Input: $(N, *, \text{in\_features})$ where N is the batch size and `*` means any number of additional dimensions
* Weight: $(\text{out\_features}, \text{in\_features})$
* Bias: $(\text{out\_features})$
* Output: $(N, *, \text{out\_features})$

### bilinear

`torch.nn.functional.bilinear(input1, input2, weight, bias=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/functional.html#bilinear)

Applies a bilinear transformation to the incoming data: $y = x_1^T A x_2 + b$.

Shape:

* input1: $(N, *, H_{in1})$ where $H_{in1} = \text{in1\_features}$ and $*$ means any number of additional dimensions. All but the last dimension of the inputs should be the same.
* input2: (N,∗,Hin2)(N, *, H_{in2}) where Hin2=in2_featuresH_{in2}=\text{in2\\_features} * weight: (out_features,in1_features,in2_features)(\text{out\\_features}, \text{in1\\_features}, \text{in2\\_features}) * bias: (out_features)(\text{out\\_features}) * output: (N,∗,Hout)(N, *, H_{out}) where Hout=out_featuresH_{out}=\text{out\\_features} and all but the last dimension are the same shape as the input. ## Dropout functions ### dropout `torch.nn.functional.dropout(input, p=0.5, training=True, inplace=False)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/functional.html#dropout) During training, randomly zeroes some of the elements of the input tensor with probability `p` using samples from a Bernoulli distribution. See [`Dropout`](generated/torch.nn.dropout#torch.nn.Dropout "torch.nn.Dropout") for details. Parameters * **p** – probability of an element to be zeroed. Default: 0.5 * **training** – apply dropout if is `True`. Default: `True` * **inplace** – If set to `True`, will do this operation in-place. Default: `False` ### alpha_dropout `torch.nn.functional.alpha_dropout(input, p=0.5, training=False, inplace=False)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/functional.html#alpha_dropout) Applies alpha dropout to the input. See [`AlphaDropout`](generated/torch.nn.alphadropout#torch.nn.AlphaDropout "torch.nn.AlphaDropout") for details. ### feature_alpha_dropout `torch.nn.functional.feature_alpha_dropout(input, p=0.5, training=False, inplace=False)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/functional.html#feature_alpha_dropout) Randomly masks out entire channels (a channel is a feature map, e.g. the jj -th channel of the ii -th sample in the batch input is a tensor input[i,j]\text{input}[i, j] ) of the input tensor). Instead of setting activations to zero, as in regular Dropout, the activations are set to the negative saturation value of the SELU activation function. Each element will be masked independently on every forward call with probability `p` using samples from a Bernoulli distribution. The elements to be masked are randomized on every forward call, and scaled and shifted to maintain zero mean and unit variance. See `FeatureAlphaDropout` for details. Parameters * **p** – dropout probability of a channel to be zeroed. Default: 0.5 * **training** – apply dropout if is `True`. Default: `True` * **inplace** – If set to `True`, will do this operation in-place. Default: `False` ### dropout2d `torch.nn.functional.dropout2d(input, p=0.5, training=True, inplace=False)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/functional.html#dropout2d) Randomly zero out entire channels (a channel is a 2D feature map, e.g., the jj -th channel of the ii -th sample in the batched input is a 2D tensor input[i,j]\text{input}[i, j] ) of the input tensor). Each channel will be zeroed out independently on every forward call with probability `p` using samples from a Bernoulli distribution. See [`Dropout2d`](generated/torch.nn.dropout2d#torch.nn.Dropout2d "torch.nn.Dropout2d") for details. Parameters * **p** – probability of a channel to be zeroed. Default: 0.5 * **training** – apply dropout if is `True`. Default: `True` * **inplace** – If set to `True`, will do this operation in-place. 
Default: `False` ### dropout3d `torch.nn.functional.dropout3d(input, p=0.5, training=True, inplace=False)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/functional.html#dropout3d) Randomly zero out entire channels (a channel is a 3D feature map, e.g., the jj -th channel of the ii -th sample in the batched input is a 3D tensor input[i,j]\text{input}[i, j] ) of the input tensor). Each channel will be zeroed out independently on every forward call with probability `p` using samples from a Bernoulli distribution. See [`Dropout3d`](generated/torch.nn.dropout3d#torch.nn.Dropout3d "torch.nn.Dropout3d") for details. Parameters * **p** – probability of a channel to be zeroed. Default: 0.5 * **training** – apply dropout if is `True`. Default: `True` * **inplace** – If set to `True`, will do this operation in-place. Default: `False` ## Sparse functions ### embedding `torch.nn.functional.embedding(input, weight, padding_idx=None, max_norm=None, norm_type=2.0, scale_grad_by_freq=False, sparse=False)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/functional.html#embedding) A simple lookup table that looks up embeddings in a fixed dictionary and size. This module is often used to retrieve word embeddings using indices. The input to the module is a list of indices, and the embedding matrix, and the output is the corresponding word embeddings. See [`torch.nn.Embedding`](generated/torch.nn.embedding#torch.nn.Embedding "torch.nn.Embedding") for more details. Parameters * **input** (_LongTensor_) – Tensor containing indices into the embedding matrix * **weight** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – The embedding matrix with number of rows equal to the maximum possible index + 1, and number of columns equal to the embedding size * **padding_idx** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – If given, pads the output with the embedding vector at `padding_idx` (initialized to zeros) whenever it encounters the index. * **max_norm** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _,__optional_) – If given, each embedding vector with norm larger than `max_norm` is renormalized to have norm `max_norm`. Note: this will modify `weight` in-place. * **norm_type** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _,__optional_) – The p of the p-norm to compute for the `max_norm` option. Default `2`. * **scale_grad_by_freq** (_boolean_ _,__optional_) – If given, this will scale gradients by the inverse of frequency of the words in the mini-batch. Default `False`. * **sparse** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – If `True`, gradient w.r.t. `weight` will be a sparse tensor. See Notes under [`torch.nn.Embedding`](generated/torch.nn.embedding#torch.nn.Embedding "torch.nn.Embedding") for more details regarding sparse gradients. 
Shape: * Input: LongTensor of arbitrary shape containing the indices to extract * `Weight: Embedding matrix of floating point type with shape (V, embedding_dim),` where V = maximum index + 1 and embedding_dim = the embedding size * Output: `(*, embedding_dim)`, where `*` is the input shape Examples: >>> # a batch of 2 samples of 4 indices each >>> input = torch.tensor([[1,2,4,5],[4,3,2,9]]) >>> # an embedding matrix containing 10 tensors of size 3 >>> embedding_matrix = torch.rand(10, 3) >>> F.embedding(input, embedding_matrix) tensor([[[ 0.8490, 0.9625, 0.6753], [ 0.9666, 0.7761, 0.6108], [ 0.6246, 0.9751, 0.3618], [ 0.4161, 0.2419, 0.7383]], [[ 0.6246, 0.9751, 0.3618], [ 0.0237, 0.7794, 0.0528], [ 0.9666, 0.7761, 0.6108], [ 0.3385, 0.8612, 0.1867]]]) >>> # example with padding_idx >>> weights = torch.rand(10, 3) >>> weights[0, :].zero_() >>> embedding_matrix = weights >>> input = torch.tensor([[0,2,0,5]]) >>> F.embedding(input, embedding_matrix, padding_idx=0) tensor([[[ 0.0000, 0.0000, 0.0000], [ 0.5609, 0.5384, 0.8720], [ 0.0000, 0.0000, 0.0000], [ 0.6262, 0.2438, 0.7471]]]) ### embedding_bag `torch.nn.functional.embedding_bag(input, weight, offsets=None, max_norm=None, norm_type=2, scale_grad_by_freq=False, mode='mean', sparse=False, per_sample_weights=None, include_last_offset=False)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/functional.html#embedding_bag) Computes sums, means or maxes of `bags` of embeddings, without instantiating the intermediate embeddings. See [`torch.nn.EmbeddingBag`](generated/torch.nn.embeddingbag#torch.nn.EmbeddingBag "torch.nn.EmbeddingBag") for more details. Note This operation may produce nondeterministic gradients when given tensors on a CUDA device. See [Reproducibility](https://pytorch.org/docs/1.8.0/notes/randomness.html) for more information. Parameters * **input** (_LongTensor_) – Tensor containing bags of indices into the embedding matrix * **weight** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – The embedding matrix with number of rows equal to the maximum possible index + 1, and number of columns equal to the embedding size * **offsets** (_LongTensor_ _,__optional_) – Only used when `input` is 1D. `offsets` determines the starting index position of each bag (sequence) in `input`. * **max_norm** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _,__optional_) – If given, each embedding vector with norm larger than `max_norm` is renormalized to have norm `max_norm`. Note: this will modify `weight` in-place. * **norm_type** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _,__optional_) – The `p` in the `p`-norm to compute for the `max_norm` option. Default `2`. * **scale_grad_by_freq** (_boolean_ _,__optional_) – if given, this will scale gradients by the inverse of frequency of the words in the mini-batch. Default `False`. Note: this option is not supported when `mode="max"`. * **mode** (_string_ _,__optional_) – `"sum"`, `"mean"` or `"max"`. Specifies the way to reduce the bag. Default: `"mean"` * **sparse** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – if `True`, gradient w.r.t. `weight` will be a sparse tensor. See Notes under [`torch.nn.Embedding`](generated/torch.nn.embedding#torch.nn.Embedding "torch.nn.Embedding") for more details regarding sparse gradients. Note: this option is not supported when `mode="max"`. 
* **per_sample_weights** ([Tensor](tensors#torch.Tensor "torch.Tensor") _,__optional_) – a tensor of float / double weights, or None to indicate all weights should be taken to be 1. If specified, `per_sample_weights` must have exactly the same shape as input and is treated as having the same `offsets`, if those are not None.
* **include_last_offset** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – if `True`, the size of offsets is equal to the number of bags + 1. The last element is the size of the input, or the ending index position of the last bag (sequence).

Shape:

* `input` (LongTensor) and `offsets` (LongTensor, optional)
  * If `input` is 2D of shape `(B, N)`, it will be treated as `B` bags (sequences) each of fixed length `N`, and this will return `B` values aggregated in a way depending on the `mode`. `offsets` is ignored and required to be `None` in this case.
  * If `input` is 1D of shape `(N)`, it will be treated as a concatenation of multiple bags (sequences). `offsets` is required to be a 1D tensor containing the starting index positions of each bag in `input`. Therefore, for `offsets` of shape `(B)`, `input` will be viewed as having `B` bags. Empty bags (i.e., having 0-length) will have returned vectors filled by zeros.
* `weight` (Tensor): the learnable weights of the module of shape `(num_embeddings, embedding_dim)`
* `per_sample_weights` (Tensor, optional). Has the same shape as `input`.
* `output`: aggregated embedding values of shape `(B, embedding_dim)`

Examples:

>>> # an Embedding module containing 10 tensors of size 3
>>> embedding_matrix = torch.rand(10, 3)
>>> # a batch of 2 samples of 4 indices each
>>> input = torch.tensor([1,2,4,5,4,3,2,9])
>>> offsets = torch.tensor([0,4])
>>> F.embedding_bag(input, embedding_matrix, offsets)
tensor([[ 0.3397,  0.3552,  0.5545],
        [ 0.5893,  0.4386,  0.5882]])

### one_hot

`torch.nn.functional.one_hot(tensor, num_classes=-1) → LongTensor`

Takes a LongTensor with index values of shape `(*)` and returns a tensor of shape `(*, num_classes)` that has zeros everywhere except where the index of the last dimension matches the corresponding value of the input tensor, in which case it will be 1.

See also [One-hot on Wikipedia](https://en.wikipedia.org/wiki/One-hot).

Parameters

* **tensor** (_LongTensor_) – class values of any shape.
* **num_classes** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – Total number of classes. If set to -1, the number of classes will be inferred as one greater than the largest class value in the input tensor.

Returns

LongTensor that has one more dimension with 1 values at the index of the last dimension indicated by the input, and 0 everywhere else.
#### Examples >>> F.one_hot(torch.arange(0, 5) % 3) tensor([[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 0, 0], [0, 1, 0]]) >>> F.one_hot(torch.arange(0, 5) % 3, num_classes=5) tensor([[1, 0, 0, 0, 0], [0, 1, 0, 0, 0], [0, 0, 1, 0, 0], [1, 0, 0, 0, 0], [0, 1, 0, 0, 0]]) >>> F.one_hot(torch.arange(0, 6).view(3,2) % 3) tensor([[[1, 0, 0], [0, 1, 0]], [[0, 0, 1], [1, 0, 0]], [[0, 1, 0], [0, 0, 1]]]) ## Distance functions ### pairwise_distance `torch.nn.functional.pairwise_distance(x1, x2, p=2.0, eps=1e-06, keepdim=False)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/functional.html#pairwise_distance) See [`torch.nn.PairwiseDistance`](generated/torch.nn.pairwisedistance#torch.nn.PairwiseDistance "torch.nn.PairwiseDistance") for details ### cosine_similarity `torch.nn.functional.cosine_similarity(x1, x2, dim=1, eps=1e-8) → Tensor` Returns cosine similarity between x1 and x2, computed along dim. similarity=x1⋅x2max⁡(∥x1∥2⋅∥x2∥2,ϵ)\text{similarity} = \dfrac{x_1 \cdot x_2}{\max(\Vert x_1 \Vert _2 \cdot \Vert x_2 \Vert _2, \epsilon)} Parameters * **x1** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – First input. * **x2** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – Second input (of size matching x1). * **dim** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – Dimension of vectors. Default: 1 * **eps** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _,__optional_) – Small value to avoid division by zero. Default: 1e-8 Shape: * Input: (∗1,D,∗2)(\ast_1, D, \ast_2) where D is at position `dim`. * Output: (∗1,∗2)(\ast_1, \ast_2) where 1 is at position `dim`. Example: >>> input1 = torch.randn(100, 128) >>> input2 = torch.randn(100, 128) >>> output = F.cosine_similarity(input1, input2) >>> print(output) ### pdist `torch.nn.functional.pdist(input, p=2) → Tensor` Computes the p-norm distance between every pair of row vectors in the input. This is identical to the upper triangular portion, excluding the diagonal, of `torch.norm(input[:, None] - input, dim=2, p=p)`. This function will be faster if the rows are contiguous. If input has shape N×MN \times M then the output will have shape 12N(N−1)\frac{1}{2} N (N - 1) . This function is equivalent to `scipy.spatial.distance.pdist(input, ‘minkowski’, p=p)` if p∈(0,∞)p \in (0, \infty) . When p=0p = 0 it is equivalent to `scipy.spatial.distance.pdist(input, ‘hamming’) * M`. When p=∞p = \infty , the closest scipy function is `scipy.spatial.distance.pdist(xn, lambda x, y: np.abs(x - y).max())`. Parameters * **input** – input tensor of shape N×MN \times M . * **p** – p value for the p-norm distance to calculate between each vector pair ∈[0,∞]\in [0, \infty] . ## Loss functions ### binary_cross_entropy `torch.nn.functional.binary_cross_entropy(input, target, weight=None, size_average=None, reduce=None, reduction='mean')` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/functional.html#binary_cross_entropy) Function that measures the Binary Cross Entropy between the target and the output. See [`BCELoss`](generated/torch.nn.bceloss#torch.nn.BCELoss "torch.nn.BCELoss") for details. 
Parameters * **input** – Tensor of arbitrary shape * **target** – Tensor of the same shape as input * **weight** ([Tensor](tensors#torch.Tensor "torch.Tensor") _,__optional_) – a manual rescaling weight if provided it’s repeated to match input tensor shape * **size_average** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – Deprecated (see `reduction`). By default, the losses are averaged over each loss element in the batch. Note that for some losses, there multiple elements per sample. If the field `size_average` is set to `False`, the losses are instead summed for each minibatch. Ignored when reduce is `False`. Default: `True` * **reduce** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – Deprecated (see `reduction`). By default, the losses are averaged or summed over observations for each minibatch depending on `size_average`. When `reduce` is `False`, returns a loss per batch element instead and ignores `size_average`. Default: `True` * **reduction** (_string_ _,__optional_) – Specifies the reduction to apply to the output: `'none'` | `'mean'` | `'sum'`. `'none'`: no reduction will be applied, `'mean'`: the sum of the output will be divided by the number of elements in the output, `'sum'`: the output will be summed. Note: `size_average` and `reduce` are in the process of being deprecated, and in the meantime, specifying either of those two args will override `reduction`. Default: `'mean'` Examples: >>> input = torch.randn((3, 2), requires_grad=True) >>> target = torch.rand((3, 2), requires_grad=False) >>> loss = F.binary_cross_entropy(F.sigmoid(input), target) >>> loss.backward() ### binary_cross_entropy_with_logits `torch.nn.functional.binary_cross_entropy_with_logits(input, target, weight=None, size_average=None, reduce=None, reduction='mean', pos_weight=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/functional.html#binary_cross_entropy_with_logits) Function that measures Binary Cross Entropy between target and output logits. See [`BCEWithLogitsLoss`](generated/torch.nn.bcewithlogitsloss#torch.nn.BCEWithLogitsLoss "torch.nn.BCEWithLogitsLoss") for details. Parameters * **input** – Tensor of arbitrary shape * **target** – Tensor of the same shape as input * **weight** ([Tensor](tensors#torch.Tensor "torch.Tensor") _,__optional_) – a manual rescaling weight if provided it’s repeated to match input tensor shape * **size_average** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – Deprecated (see `reduction`). By default, the losses are averaged over each loss element in the batch. Note that for some losses, there multiple elements per sample. If the field `size_average` is set to `False`, the losses are instead summed for each minibatch. Ignored when reduce is `False`. Default: `True` * **reduce** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – Deprecated (see `reduction`). By default, the losses are averaged or summed over observations for each minibatch depending on `size_average`. When `reduce` is `False`, returns a loss per batch element instead and ignores `size_average`. Default: `True` * **reduction** (_string_ _,__optional_) – Specifies the reduction to apply to the output: `'none'` | `'mean'` | `'sum'`. 
`'none'`: no reduction will be applied, `'mean'`: the sum of the output will be divided by the number of elements in the output, `'sum'`: the output will be summed. Note: `size_average` and `reduce` are in the process of being deprecated, and in the meantime, specifying either of those two args will override `reduction`. Default: `'mean'` * **pos_weight** ([Tensor](tensors#torch.Tensor "torch.Tensor") _,__optional_) – a weight of positive examples. Must be a vector with length equal to the number of classes. Examples: >>> input = torch.randn(3, requires_grad=True) >>> target = torch.empty(3).random_(2) >>> loss = F.binary_cross_entropy_with_logits(input, target) >>> loss.backward() ### poisson_nll_loss `torch.nn.functional.poisson_nll_loss(input, target, log_input=True, full=False, size_average=None, eps=1e-08, reduce=None, reduction='mean')` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/functional.html#poisson_nll_loss) Poisson negative log likelihood loss. See [`PoissonNLLLoss`](generated/torch.nn.poissonnllloss#torch.nn.PoissonNLLLoss "torch.nn.PoissonNLLLoss") for details. Parameters * **input** – expectation of underlying Poisson distribution. * **target** – random sample target∼Poisson(input)target \sim \text{Poisson}(input) . * **log_input** – if `True` the loss is computed as exp⁡(input)−target∗input\exp(\text{input}) - \text{target} * \text{input} , if `False` then loss is input−target∗log⁡(input+eps)\text{input} - \text{target} * \log(\text{input}+\text{eps}) . Default: `True` * **full** – whether to compute full loss, i. e. to add the Stirling approximation term. Default: `False` target∗log⁡(target)−target+0.5∗log⁡(2∗π∗target)\text{target} * \log(\text{target}) - \text{target} + 0.5 * \log(2 * \pi * \text{target}) . * **size_average** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – Deprecated (see `reduction`). By default, the losses are averaged over each loss element in the batch. Note that for some losses, there multiple elements per sample. If the field `size_average` is set to `False`, the losses are instead summed for each minibatch. Ignored when reduce is `False`. Default: `True` * **eps** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _,__optional_) – Small value to avoid evaluation of log⁡(0)\log(0) when `log_input`=``False``. Default: 1e-8 * **reduce** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – Deprecated (see `reduction`). By default, the losses are averaged or summed over observations for each minibatch depending on `size_average`. When `reduce` is `False`, returns a loss per batch element instead and ignores `size_average`. Default: `True` * **reduction** (_string_ _,__optional_) – Specifies the reduction to apply to the output: `'none'` | `'mean'` | `'sum'`. `'none'`: no reduction will be applied, `'mean'`: the sum of the output will be divided by the number of elements in the output, `'sum'`: the output will be summed. Note: `size_average` and `reduce` are in the process of being deprecated, and in the meantime, specifying either of those two args will override `reduction`. 
Default: `'mean'` ### cosine_embedding_loss `torch.nn.functional.cosine_embedding_loss(input1, input2, target, margin=0, size_average=None, reduce=None, reduction='mean') → Tensor` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/functional.html#cosine_embedding_loss) See [`CosineEmbeddingLoss`](generated/torch.nn.cosineembeddingloss#torch.nn.CosineEmbeddingLoss "torch.nn.CosineEmbeddingLoss") for details. ### cross_entropy `torch.nn.functional.cross_entropy(input, target, weight=None, size_average=None, ignore_index=-100, reduce=None, reduction='mean')` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/functional.html#cross_entropy) This criterion combines `log_softmax` and `nll_loss` in a single function. See [`CrossEntropyLoss`](generated/torch.nn.crossentropyloss#torch.nn.CrossEntropyLoss "torch.nn.CrossEntropyLoss") for details. Parameters * **input** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – (N,C)(N, C) where `C = number of classes` or (N,C,H,W)(N, C, H, W) in case of 2D Loss, or (N,C,d1,d2,...,dK)(N, C, d_1, d_2, ..., d_K) where K≥1K \geq 1 in the case of K-dimensional loss. * **target** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – (N)(N) where each value is 0≤targets[i]≤C−10 \leq \text{targets}[i] \leq C-1 , or (N,d1,d2,...,dK)(N, d_1, d_2, ..., d_K) where K≥1K \geq 1 for K-dimensional loss. * **weight** ([Tensor](tensors#torch.Tensor "torch.Tensor") _,__optional_) – a manual rescaling weight given to each class. If given, has to be a Tensor of size `C` * **size_average** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – Deprecated (see `reduction`). By default, the losses are averaged over each loss element in the batch. Note that for some losses, there multiple elements per sample. If the field `size_average` is set to `False`, the losses are instead summed for each minibatch. Ignored when reduce is `False`. Default: `True` * **ignore_index** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – Specifies a target value that is ignored and does not contribute to the input gradient. When `size_average` is `True`, the loss is averaged over non-ignored targets. Default: -100 * **reduce** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – Deprecated (see `reduction`). By default, the losses are averaged or summed over observations for each minibatch depending on `size_average`. When `reduce` is `False`, returns a loss per batch element instead and ignores `size_average`. Default: `True` * **reduction** (_string_ _,__optional_) – Specifies the reduction to apply to the output: `'none'` | `'mean'` | `'sum'`. `'none'`: no reduction will be applied, `'mean'`: the sum of the output will be divided by the number of elements in the output, `'sum'`: the output will be summed. Note: `size_average` and `reduce` are in the process of being deprecated, and in the meantime, specifying either of those two args will override `reduction`. Default: `'mean'` Examples: >>> input = torch.randn(3, 5, requires_grad=True) >>> target = torch.randint(5, (3,), dtype=torch.int64) >>> loss = F.cross_entropy(input, target) >>> loss.backward() ### ctc_loss `torch.nn.functional.ctc_loss(log_probs, targets, input_lengths, target_lengths, blank=0, reduction='mean', zero_infinity=False)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/functional.html#ctc_loss) The Connectionist Temporal Classification loss. 
See [`CTCLoss`](generated/torch.nn.ctcloss#torch.nn.CTCLoss "torch.nn.CTCLoss") for details. Note In some circumstances when given tensors on a CUDA device and using CuDNN, this operator may select a nondeterministic algorithm to increase performance. If this is undesirable, you can try to make the operation deterministic (potentially at a performance cost) by setting `torch.backends.cudnn.deterministic = True`. See [Reproducibility](https://pytorch.org/docs/1.8.0/notes/randomness.html) for more information. Note This operation may produce nondeterministic gradients when given tensors on a CUDA device. See [Reproducibility](https://pytorch.org/docs/1.8.0/notes/randomness.html) for more information. Parameters * **log_probs** – (T,N,C)(T, N, C) where `C = number of characters in alphabet including blank`, `T = input length`, and `N = batch size`. The logarithmized probabilities of the outputs (e.g. obtained with `torch.nn.functional.log_softmax()`). * **targets** – (N,S)(N, S) or `(sum(target_lengths))`. Targets cannot be blank. In the second form, the targets are assumed to be concatenated. * **input_lengths** – (N)(N) . Lengths of the inputs (must each be ≤T\leq T ) * **target_lengths** – (N)(N) . Lengths of the targets * **blank** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – Blank label. Default 00 . * **reduction** (_string_ _,__optional_) – Specifies the reduction to apply to the output: `'none'` | `'mean'` | `'sum'`. `'none'`: no reduction will be applied, `'mean'`: the output losses will be divided by the target lengths and then the mean over the batch is taken, `'sum'`: the output will be summed. Default: `'mean'` * **zero_infinity** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – Whether to zero infinite losses and the associated gradients. Default: `False` Infinite losses mainly occur when the inputs are too short to be aligned to the targets. Example: >>> log_probs = torch.randn(50, 16, 20).log_softmax(2).detach().requires_grad_() >>> targets = torch.randint(1, 20, (16, 30), dtype=torch.long) >>> input_lengths = torch.full((16,), 50, dtype=torch.long) >>> target_lengths = torch.randint(10,30,(16,), dtype=torch.long) >>> loss = F.ctc_loss(log_probs, targets, input_lengths, target_lengths) >>> loss.backward() ### hinge_embedding_loss `torch.nn.functional.hinge_embedding_loss(input, target, margin=1.0, size_average=None, reduce=None, reduction='mean') → Tensor` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/functional.html#hinge_embedding_loss) See [`HingeEmbeddingLoss`](generated/torch.nn.hingeembeddingloss#torch.nn.HingeEmbeddingLoss "torch.nn.HingeEmbeddingLoss") for details. ### kl_div `torch.nn.functional.kl_div(input, target, size_average=None, reduce=None, reduction='mean', log_target=False)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/functional.html#kl_div) The [Kullback-Leibler divergence Loss](https://en.wikipedia.org/wiki/Kullback- Leibler_divergence) See [`KLDivLoss`](generated/torch.nn.kldivloss#torch.nn.KLDivLoss "torch.nn.KLDivLoss") for details. Parameters * **input** – Tensor of arbitrary shape * **target** – Tensor of the same shape as input * **size_average** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – Deprecated (see `reduction`). By default, the losses are averaged over each loss element in the batch. Note that for some losses, there multiple elements per sample. 
If the field `size_average` is set to `False`, the losses are instead summed for each minibatch. Ignored when reduce is `False`. Default: `True`
* **reduce** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – Deprecated (see `reduction`). By default, the losses are averaged or summed over observations for each minibatch depending on `size_average`. When `reduce` is `False`, returns a loss per batch element instead and ignores `size_average`. Default: `True`
* **reduction** (_string_ _,__optional_) – Specifies the reduction to apply to the output: `'none'` | `'batchmean'` | `'sum'` | `'mean'`. `'none'`: no reduction will be applied, `'batchmean'`: the sum of the output will be divided by the batch size, `'sum'`: the output will be summed, `'mean'`: the output will be divided by the number of elements in the output. Default: `'mean'`
* **log_target** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – A flag indicating whether `target` is passed in the log space. It is recommended to pass certain distributions (like `softmax`) in the log space to avoid numerical issues caused by explicit `log`. Default: `False`

Note

`size_average` and `reduce` are in the process of being deprecated, and in the meantime, specifying either of those two args will override `reduction`.

Note

`reduction` = `'mean'` doesn't return the true KL divergence value; please use `reduction` = `'batchmean'`, which aligns with the mathematical definition of KL divergence. In the next major release, `'mean'` will be changed to behave the same as `'batchmean'`.

### l1_loss

`torch.nn.functional.l1_loss(input, target, size_average=None, reduce=None, reduction='mean') → Tensor` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/functional.html#l1_loss)

Function that takes the mean element-wise absolute value difference.

See [`L1Loss`](generated/torch.nn.l1loss#torch.nn.L1Loss "torch.nn.L1Loss") for details.

### mse_loss

`torch.nn.functional.mse_loss(input, target, size_average=None, reduce=None, reduction='mean') → Tensor` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/functional.html#mse_loss)

Measures the element-wise mean squared error.

See [`MSELoss`](generated/torch.nn.mseloss#torch.nn.MSELoss "torch.nn.MSELoss") for details.

### margin_ranking_loss

`torch.nn.functional.margin_ranking_loss(input1, input2, target, margin=0, size_average=None, reduce=None, reduction='mean') → Tensor` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/functional.html#margin_ranking_loss)

See [`MarginRankingLoss`](generated/torch.nn.marginrankingloss#torch.nn.MarginRankingLoss "torch.nn.MarginRankingLoss") for details.

### multilabel_margin_loss

`torch.nn.functional.multilabel_margin_loss(input, target, size_average=None, reduce=None, reduction='mean') → Tensor` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/functional.html#multilabel_margin_loss)

See [`MultiLabelMarginLoss`](generated/torch.nn.multilabelmarginloss#torch.nn.MultiLabelMarginLoss "torch.nn.MultiLabelMarginLoss") for details.

### multilabel_soft_margin_loss

`torch.nn.functional.multilabel_soft_margin_loss(input, target, weight=None, size_average=None) → Tensor` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/functional.html#multilabel_soft_margin_loss)

See [`MultiLabelSoftMarginLoss`](generated/torch.nn.multilabelsoftmarginloss#torch.nn.MultiLabelSoftMarginLoss "torch.nn.MultiLabelSoftMarginLoss") for details.
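The functional losses above (`kl_div`, `l1_loss`, `mse_loss`, `margin_ranking_loss`, and the other losses in this section) follow the same calling pattern as the `binary_cross_entropy_with_logits` and `cross_entropy` examples earlier. A minimal sketch for `kl_div`, with the predictions in log space and `reduction='batchmean'` as recommended in the note above (the tensor shapes are illustrative only):

    >>> import torch
    >>> import torch.nn.functional as F
    >>> # Predictions must be provided as log-probabilities
    >>> input = F.log_softmax(torch.randn(3, 5, requires_grad=True), dim=1)
    >>> # With log_target=False (the default), the target is a probability distribution
    >>> target = F.softmax(torch.randn(3, 5), dim=1)
    >>> loss = F.kl_div(input, target, reduction='batchmean')
    >>> loss.backward()
    >>> # Alternatively, pass the target in log space to avoid an explicit log
    >>> log_target = F.log_softmax(torch.randn(3, 5), dim=1)
    >>> loss = F.kl_div(input, log_target, reduction='batchmean', log_target=True)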
### multi_margin_loss `torch.nn.functional.multi_margin_loss(input, target, p=1, margin=1.0, weight=None, size_average=None, reduce=None, reduction='mean')` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/functional.html#multi_margin_loss) multi_margin_loss(input, target, p=1, margin=1, weight=None, size_average=None, reduce=None, reduction=’mean’) -> Tensor See [`MultiMarginLoss`](generated/torch.nn.multimarginloss#torch.nn.MultiMarginLoss "torch.nn.MultiMarginLoss") for details. ### nll_loss `torch.nn.functional.nll_loss(input, target, weight=None, size_average=None, ignore_index=-100, reduce=None, reduction='mean')` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/functional.html#nll_loss) The negative log likelihood loss. See [`NLLLoss`](generated/torch.nn.nllloss#torch.nn.NLLLoss "torch.nn.NLLLoss") for details. Parameters * **input** – (N,C)(N, C) where `C = number of classes` or (N,C,H,W)(N, C, H, W) in case of 2D Loss, or (N,C,d1,d2,...,dK)(N, C, d_1, d_2, ..., d_K) where K≥1K \geq 1 in the case of K-dimensional loss. * **target** – (N)(N) where each value is 0≤targets[i]≤C−10 \leq \text{targets}[i] \leq C-1 , or (N,d1,d2,...,dK)(N, d_1, d_2, ..., d_K) where K≥1K \geq 1 for K-dimensional loss. * **weight** ([Tensor](tensors#torch.Tensor "torch.Tensor") _,__optional_) – a manual rescaling weight given to each class. If given, has to be a Tensor of size `C` * **size_average** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – Deprecated (see `reduction`). By default, the losses are averaged over each loss element in the batch. Note that for some losses, there multiple elements per sample. If the field `size_average` is set to `False`, the losses are instead summed for each minibatch. Ignored when reduce is `False`. Default: `True` * **ignore_index** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – Specifies a target value that is ignored and does not contribute to the input gradient. When `size_average` is `True`, the loss is averaged over non-ignored targets. Default: -100 * **reduce** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – Deprecated (see `reduction`). By default, the losses are averaged or summed over observations for each minibatch depending on `size_average`. When `reduce` is `False`, returns a loss per batch element instead and ignores `size_average`. Default: `True` * **reduction** (_string_ _,__optional_) – Specifies the reduction to apply to the output: `'none'` | `'mean'` | `'sum'`. `'none'`: no reduction will be applied, `'mean'`: the sum of the output will be divided by the number of elements in the output, `'sum'`: the output will be summed. Note: `size_average` and `reduce` are in the process of being deprecated, and in the meantime, specifying either of those two args will override `reduction`. 
Default: `'mean'`

Example:

    >>> # input is of size N x C = 3 x 5
    >>> input = torch.randn(3, 5, requires_grad=True)
    >>> # each element in target has to have 0 <= value < C
    >>> target = torch.tensor([1, 0, 4])
    >>> output = F.nll_loss(F.log_softmax(input, dim=1), target)
    >>> output.backward()

### smooth_l1_loss

`torch.nn.functional.smooth_l1_loss(input, target, size_average=None, reduce=None, reduction='mean', beta=1.0)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/functional.html#smooth_l1_loss)

Function that uses a squared term if the absolute element-wise error falls below beta and an L1 term otherwise.

See [`SmoothL1Loss`](generated/torch.nn.smoothl1loss#torch.nn.SmoothL1Loss "torch.nn.SmoothL1Loss") for details.

### soft_margin_loss

`torch.nn.functional.soft_margin_loss(input, target, size_average=None, reduce=None, reduction='mean') → Tensor` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/functional.html#soft_margin_loss)

See [`SoftMarginLoss`](generated/torch.nn.softmarginloss#torch.nn.SoftMarginLoss "torch.nn.SoftMarginLoss") for details.

### triplet_margin_loss

`torch.nn.functional.triplet_margin_loss(anchor, positive, negative, margin=1.0, p=2, eps=1e-06, swap=False, size_average=None, reduce=None, reduction='mean')` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/functional.html#triplet_margin_loss)

See [`TripletMarginLoss`](generated/torch.nn.tripletmarginloss#torch.nn.TripletMarginLoss "torch.nn.TripletMarginLoss") for details.

### triplet_margin_with_distance_loss

`torch.nn.functional.triplet_margin_with_distance_loss(anchor, positive, negative, *, distance_function=None, margin=1.0, swap=False, reduction='mean')` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/functional.html#triplet_margin_with_distance_loss)

See [`TripletMarginWithDistanceLoss`](generated/torch.nn.tripletmarginwithdistanceloss#torch.nn.TripletMarginWithDistanceLoss "torch.nn.TripletMarginWithDistanceLoss") for details.

## Vision functions

### pixel_shuffle

`torch.nn.functional.pixel_shuffle(input, upscale_factor) → Tensor`

Rearranges elements in a tensor of shape (*, C \times r^2, H, W) to a tensor of shape (*, C, H \times r, W \times r), where r is the `upscale_factor`.

See [`PixelShuffle`](generated/torch.nn.pixelshuffle#torch.nn.PixelShuffle "torch.nn.PixelShuffle") for details.

Parameters

* **input** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – the input tensor
* **upscale_factor** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – factor to increase spatial resolution by

Examples:

    >>> input = torch.randn(1, 9, 4, 4)
    >>> output = torch.nn.functional.pixel_shuffle(input, 3)
    >>> print(output.size())
    torch.Size([1, 1, 12, 12])

### pixel_unshuffle

`torch.nn.functional.pixel_unshuffle(input, downscale_factor) → Tensor`

Reverses the [`PixelShuffle`](generated/torch.nn.pixelshuffle#torch.nn.PixelShuffle "torch.nn.PixelShuffle") operation by rearranging elements in a tensor of shape (*, C, H \times r, W \times r) to a tensor of shape (*, C \times r^2, H, W), where r is the `downscale_factor`.

See [`PixelUnshuffle`](generated/torch.nn.pixelunshuffle#torch.nn.PixelUnshuffle "torch.nn.PixelUnshuffle") for details.
Parameters

* **input** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – the input tensor
* **downscale_factor** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – factor to decrease spatial resolution by

Examples:

    >>> input = torch.randn(1, 1, 12, 12)
    >>> output = torch.nn.functional.pixel_unshuffle(input, 3)
    >>> print(output.size())
    torch.Size([1, 9, 4, 4])

### pad

`torch.nn.functional.pad(input, pad, mode='constant', value=0)`

Pads tensor.

Padding size: The padding size by which to pad some dimensions of `input` is described starting from the last dimension and moving forward. \left\lfloor\frac{\text{len(pad)}}{2}\right\rfloor dimensions of `input` will be padded. For example, to pad only the last dimension of the input tensor, then `pad` has the form (\text{padding\_left}, \text{padding\_right}); to pad the last 2 dimensions of the input tensor, then use (\text{padding\_left}, \text{padding\_right}, \text{padding\_top}, \text{padding\_bottom}); to pad the last 3 dimensions, use (\text{padding\_left}, \text{padding\_right}, \text{padding\_top}, \text{padding\_bottom}, \text{padding\_front}, \text{padding\_back}).

Padding mode: See [`torch.nn.ConstantPad2d`](generated/torch.nn.constantpad2d#torch.nn.ConstantPad2d "torch.nn.ConstantPad2d"), [`torch.nn.ReflectionPad2d`](generated/torch.nn.reflectionpad2d#torch.nn.ReflectionPad2d "torch.nn.ReflectionPad2d"), and [`torch.nn.ReplicationPad2d`](generated/torch.nn.replicationpad2d#torch.nn.ReplicationPad2d "torch.nn.ReplicationPad2d") for concrete examples on how each of the padding modes works. Constant padding is implemented for arbitrary dimensions. Replicate padding is implemented for padding the last 3 dimensions of a 5D input tensor, the last 2 dimensions of a 4D input tensor, or the last dimension of a 3D input tensor. Reflect padding is only implemented for padding the last 2 dimensions of a 4D input tensor, or the last dimension of a 3D input tensor.

Note

When using the CUDA backend, this operation may induce nondeterministic behaviour in its backward pass that is not easily switched off. Please see the notes on [Reproducibility](https://pytorch.org/docs/1.8.0/notes/randomness.html) for background.

Parameters

* **input** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – N-dimensional tensor
* **pad** ([tuple](https://docs.python.org/3/library/stdtypes.html#tuple "\(in Python v3.9\)")) – m-element tuple, where \frac{m}{2} \leq input dimensions and m is even.
* **mode** – `'constant'`, `'reflect'`, `'replicate'` or `'circular'`. Default: `'constant'`
* **value** – fill value for `'constant'` padding.
Default: `0` Examples: >>> t4d = torch.empty(3, 3, 4, 2) >>> p1d = (1, 1) # pad last dim by 1 on each side >>> out = F.pad(t4d, p1d, "constant", 0) # effectively zero padding >>> print(out.size()) torch.Size([3, 3, 4, 4]) >>> p2d = (1, 1, 2, 2) # pad last dim by (1, 1) and 2nd to last by (2, 2) >>> out = F.pad(t4d, p2d, "constant", 0) >>> print(out.size()) torch.Size([3, 3, 8, 4]) >>> t4d = torch.empty(3, 3, 4, 2) >>> p3d = (0, 1, 2, 1, 3, 3) # pad by (0, 1), (2, 1), and (3, 3) >>> out = F.pad(t4d, p3d, "constant", 0) >>> print(out.size()) torch.Size([3, 9, 7, 3]) ### interpolate `torch.nn.functional.interpolate(input, size=None, scale_factor=None, mode='nearest', align_corners=None, recompute_scale_factor=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/functional.html#interpolate) Down/up samples the input to either the given `size` or the given `scale_factor` The algorithm used for interpolation is determined by `mode`. Currently temporal, spatial and volumetric sampling are supported, i.e. expected inputs are 3-D, 4-D or 5-D in shape. The input dimensions are interpreted in the form: `mini-batch x channels x [optional depth] x [optional height] x width`. The modes available for resizing are: `nearest`, `linear` (3D-only), `bilinear`, `bicubic` (4D-only), `trilinear` (5D-only), `area` Parameters * **input** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – the input tensor * **size** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _or_ _Tuple_ _[_[int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _] or_ _Tuple_ _[_[int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,_[int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _] or_ _Tuple_ _[_[int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,_[int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,_[int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _]_) – output spatial size. * **scale_factor** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _or_ _Tuple_ _[_[float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _]_) – multiplier for spatial size. Has to match input size if it is a tuple. * **mode** ([str](https://docs.python.org/3/library/stdtypes.html#str "\(in Python v3.9\)")) – algorithm used for upsampling: `'nearest'` | `'linear'` | `'bilinear'` | `'bicubic'` | `'trilinear'` | `'area'`. Default: `'nearest'` * **align_corners** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – Geometrically, we consider the pixels of the input and output as squares rather than points. If set to `True`, the input and output tensors are aligned by the center points of their corner pixels, preserving the values at the corner pixels. If set to `False`, the input and output tensors are aligned by the corner points of their corner pixels, and the interpolation uses edge value padding for out-of-boundary values, making this operation _independent_ of input size when `scale_factor` is kept the same. This only has an effect when `mode` is `'linear'`, `'bilinear'`, `'bicubic'` or `'trilinear'`. Default: `False` * **recompute_scale_factor** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – recompute the scale_factor for use in the interpolation calculation. 
When `scale_factor` is passed as a parameter, it is used to compute the `output_size`. If `recompute_scale_factor` is `False` or not specified, the passed-in `scale_factor` will be used in the interpolation computation. Otherwise, a new `scale_factor` will be computed based on the output and input sizes for use in the interpolation computation (i.e. the computation will be identical to if the computed `output_size` were passed-in explicitly). Note that when `scale_factor` is floating-point, the recomputed scale_factor may differ from the one passed in due to rounding and precision issues. Note With `mode='bicubic'`, it’s possible to cause overshoot, in other words it can produce negative values or values greater than 255 for images. Explicitly call `result.clamp(min=0, max=255)` if you want to reduce the overshoot when displaying the image. Warning With `align_corners = True`, the linearly interpolating modes (`linear`, `bilinear`, and `trilinear`) don’t proportionally align the output and input pixels, and thus the output values can depend on the input size. This was the default behavior for these modes up to version 0.3.1. Since then, the default behavior is `align_corners = False`. See [`Upsample`](generated/torch.nn.upsample#torch.nn.Upsample "torch.nn.Upsample") for concrete examples on how this affects the outputs. Warning When scale_factor is specified, if recompute_scale_factor=True, scale_factor is used to compute the output_size which will then be used to infer new scales for the interpolation. The default behavior for recompute_scale_factor changed to False in 1.6.0, and scale_factor is used in the interpolation calculation. Note This operation may produce nondeterministic gradients when given tensors on a CUDA device. See [Reproducibility](https://pytorch.org/docs/1.8.0/notes/randomness.html) for more information. ### upsample `torch.nn.functional.upsample(input, size=None, scale_factor=None, mode='nearest', align_corners=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/functional.html#upsample) Upsamples the input to either the given `size` or the given `scale_factor` Warning This function is deprecated in favor of `torch.nn.functional.interpolate()`. This is equivalent with `nn.functional.interpolate(...)`. Note This operation may produce nondeterministic gradients when given tensors on a CUDA device. See [Reproducibility](https://pytorch.org/docs/1.8.0/notes/randomness.html) for more information. The algorithm used for upsampling is determined by `mode`. Currently temporal, spatial and volumetric upsampling are supported, i.e. expected inputs are 3-D, 4-D or 5-D in shape. The input dimensions are interpreted in the form: `mini-batch x channels x [optional depth] x [optional height] x width`. 
The modes available for upsampling are: `nearest`, `linear` (3D-only), `bilinear`, `bicubic` (4D-only), `trilinear` (5D-only) Parameters * **input** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – the input tensor * **size** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _or_ _Tuple_ _[_[int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _] or_ _Tuple_ _[_[int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,_[int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _] or_ _Tuple_ _[_[int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,_[int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,_[int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _]_) – output spatial size. * **scale_factor** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _or_ _Tuple_ _[_[float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _]_) – multiplier for spatial size. Has to match input size if it is a tuple. * **mode** (_string_) – algorithm used for upsampling: `'nearest'` | `'linear'` | `'bilinear'` | `'bicubic'` | `'trilinear'`. Default: `'nearest'` * **align_corners** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – Geometrically, we consider the pixels of the input and output as squares rather than points. If set to `True`, the input and output tensors are aligned by the center points of their corner pixels, preserving the values at the corner pixels. If set to `False`, the input and output tensors are aligned by the corner points of their corner pixels, and the interpolation uses edge value padding for out-of-boundary values, making this operation _independent_ of input size when `scale_factor` is kept the same. This only has an effect when `mode` is `'linear'`, `'bilinear'`, `'bicubic'` or `'trilinear'`. Default: `False` Note With `mode='bicubic'`, it’s possible to cause overshoot, in other words it can produce negative values or values greater than 255 for images. Explicitly call `result.clamp(min=0, max=255)` if you want to reduce the overshoot when displaying the image. Warning With `align_corners = True`, the linearly interpolating modes (`linear`, `bilinear`, and `trilinear`) don’t proportionally align the output and input pixels, and thus the output values can depend on the input size. This was the default behavior for these modes up to version 0.3.1. Since then, the default behavior is `align_corners = False`. See [`Upsample`](generated/torch.nn.upsample#torch.nn.Upsample "torch.nn.Upsample") for concrete examples on how this affects the outputs. ### upsample_nearest `torch.nn.functional.upsample_nearest(input, size=None, scale_factor=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/functional.html#upsample_nearest) Upsamples the input, using nearest neighbours’ pixel values. Warning This function is deprecated in favor of `torch.nn.functional.interpolate()`. This is equivalent with `nn.functional.interpolate(..., mode='nearest')`. Currently spatial and volumetric upsampling are supported (i.e. expected inputs are 4 or 5 dimensional). 
Parameters

* **input** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – input
* **size** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _or_ _Tuple[int, int] or Tuple[int, int, int]_) – output spatial size.
* **scale_factor** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – multiplier for spatial size. Has to be an integer.

Note

This operation may produce nondeterministic gradients when given tensors on a CUDA device. See [Reproducibility](https://pytorch.org/docs/1.8.0/notes/randomness.html) for more information.

### upsample_bilinear

`torch.nn.functional.upsample_bilinear(input, size=None, scale_factor=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/functional.html#upsample_bilinear)

Upsamples the input, using bilinear upsampling.

Warning

This function is deprecated in favor of `torch.nn.functional.interpolate()`. This is equivalent to `nn.functional.interpolate(..., mode='bilinear', align_corners=True)`.

Expected inputs are spatial (4 dimensional). Use `upsample_trilinear` for volumetric (5 dimensional) inputs.

Parameters

* **input** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – input
* **size** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _or_ _Tuple[int, int]_) – output spatial size.
* **scale_factor** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _or_ _Tuple[int, int]_) – multiplier for spatial size

Note

This operation may produce nondeterministic gradients when given tensors on a CUDA device. See [Reproducibility](https://pytorch.org/docs/1.8.0/notes/randomness.html) for more information.

### grid_sample

`torch.nn.functional.grid_sample(input, grid, mode='bilinear', padding_mode='zeros', align_corners=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/functional.html#grid_sample)

Given an `input` and a flow-field `grid`, computes the `output` using `input` values and pixel locations from `grid`.

Currently, only spatial (4-D) and volumetric (5-D) `input` are supported.

In the spatial (4-D) case, for `input` with shape (N, C, H_\text{in}, W_\text{in}) and `grid` with shape (N, H_\text{out}, W_\text{out}, 2), the output will have shape (N, C, H_\text{out}, W_\text{out}).

For each output location `output[n, :, h, w]`, the size-2 vector `grid[n, h, w]` specifies `input` pixel locations `x` and `y`, which are used to interpolate the output value `output[n, :, h, w]`. In the case of 5D inputs, `grid[n, d, h, w]` specifies the `x`, `y`, `z` pixel locations for interpolating `output[n, :, d, h, w]`.

The `mode` argument specifies `nearest` or `bilinear` interpolation method to sample the input pixels.
`grid` specifies the sampling pixel locations normalized by the `input` spatial dimensions. Therefore, it should have most values in the range of `[-1, 1]`. For example, values `x = -1, y = -1` is the left-top pixel of `input`, and values `x = 1, y = 1` is the right-bottom pixel of `input`. If `grid` has values outside the range of `[-1, 1]`, the corresponding outputs are handled as defined by `padding_mode`. Options are * `padding_mode="zeros"`: use `0` for out-of-bound grid locations, * `padding_mode="border"`: use border values for out-of-bound grid locations, * `padding_mode="reflection"`: use values at locations reflected by the border for out-of-bound grid locations. For location far away from the border, it will keep being reflected until becoming in bound, e.g., (normalized) pixel location `x = -3.5` reflects by border `-1` and becomes `x' = 1.5`, then reflects by border `1` and becomes `x'' = -0.5`. Note This function is often used in conjunction with `affine_grid()` to build [Spatial Transformer Networks](https://arxiv.org/abs/1506.02025) . Note When using the CUDA backend, this operation may induce nondeterministic behaviour in its backward pass that is not easily switched off. Please see the notes on [Reproducibility](https://pytorch.org/docs/1.8.0/notes/randomness.html) for background. Note NaN values in `grid` would be interpreted as `-1`. Parameters * **input** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – input of shape (N,C,Hin,Win)(N, C, H_\text{in}, W_\text{in}) (4-D case) or (N,C,Din,Hin,Win)(N, C, D_\text{in}, H_\text{in}, W_\text{in}) (5-D case) * **grid** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – flow-field of shape (N,Hout,Wout,2)(N, H_\text{out}, W_\text{out}, 2) (4-D case) or (N,Dout,Hout,Wout,3)(N, D_\text{out}, H_\text{out}, W_\text{out}, 3) (5-D case) * **mode** ([str](https://docs.python.org/3/library/stdtypes.html#str "\(in Python v3.9\)")) – interpolation mode to calculate output values `'bilinear'` | `'nearest'` | `'bicubic'`. Default: `'bilinear'` Note: `mode='bicubic'` supports only 4-D input. When `mode='bilinear'` and the input is 5-D, the interpolation mode used internally will actually be trilinear. However, when the input is 4-D, the interpolation mode will legitimately be bilinear. * **padding_mode** ([str](https://docs.python.org/3/library/stdtypes.html#str "\(in Python v3.9\)")) – padding mode for outside grid values `'zeros'` | `'border'` | `'reflection'`. Default: `'zeros'` * **align_corners** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – Geometrically, we consider the pixels of the input as squares rather than points. If set to `True`, the extrema (`-1` and `1`) are considered as referring to the center points of the input’s corner pixels. If set to `False`, they are instead considered as referring to the corner points of the input’s corner pixels, making the sampling more resolution agnostic. This option parallels the `align_corners` option in `interpolate()`, and so whichever option is used here should also be used there to resize the input image before grid sampling. Default: `False` Returns output Tensor Return type output ([Tensor](tensors#torch.Tensor "torch.Tensor")) Warning When `align_corners = True`, the grid positions depend on the pixel size relative to the input image size, and so the locations sampled by `grid_sample()` will differ for the same input given at different resolutions (that is, after being upsampled or downsampled). 
The default behavior up to version 1.2.0 was `align_corners = True`. Since then, the default behavior has been changed to `align_corners = False`, in order to bring it in line with the default for `interpolate()`.

Note

`mode='bicubic'` is implemented using the [cubic convolution algorithm](https://en.wikipedia.org/wiki/Bicubic_interpolation) with \alpha = -0.75. The constant \alpha might differ from package to package. For example, [PIL](https://github.com/python-pillow/Pillow/blob/4634eafe3c695a014267eefdce830b4a825beed7/src/libImaging/Resample.c#L51) and [OpenCV](https://github.com/opencv/opencv/blob/f345ed564a06178670750bad59526cfa4033be55/modules/imgproc/src/resize.cpp#L908) use -0.5 and -0.75, respectively. This algorithm may “overshoot” the range of values it’s interpolating. For example, it may produce negative values or values greater than 255 when interpolating input in [0, 255]. Clamp the results with `torch.clamp()` to ensure they are within the valid range.

### affine_grid

`torch.nn.functional.affine_grid(theta, size, align_corners=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/functional.html#affine_grid)

Generates a 2D or 3D flow field (sampling grid), given a batch of affine matrices `theta`.

Note

This function is often used in conjunction with `grid_sample()` to build [Spatial Transformer Networks](https://arxiv.org/abs/1506.02025).

Parameters

* **theta** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – input batch of affine matrices with shape (N \times 2 \times 3) for 2D or (N \times 3 \times 4) for 3D
* **size** (_torch.Size_) – the target output image size. (N \times C \times H \times W for 2D or N \times C \times D \times H \times W for 3D) Example: torch.Size((32, 3, 24, 24))
* **align_corners** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – if `True`, consider `-1` and `1` to refer to the centers of the corner pixels rather than the image corners. Refer to `grid_sample()` for a more complete description. A grid generated by `affine_grid()` should be passed to `grid_sample()` with the same setting for this option. Default: `False`

Returns

output Tensor of size (N \times H \times W \times 2)

Return type

output ([Tensor](tensors#torch.Tensor "torch.Tensor"))

Warning

When `align_corners = True`, the grid positions depend on the pixel size relative to the input image size, and so the locations sampled by `grid_sample()` will differ for the same input given at different resolutions (that is, after being upsampled or downsampled). The default behavior up to version 1.2.0 was `align_corners = True`. Since then, the default behavior has been changed to `align_corners = False`, in order to bring it in line with the default for `interpolate()`.

Warning

When `align_corners = True`, 2D affine transforms on 1D data and 3D affine transforms on 2D data (that is, when one of the spatial dimensions has unit size) are ill-defined, and not an intended use case. This is not a problem when `align_corners = False`. Up to version 1.2.0, all grid points along a unit dimension were considered arbitrarily to be at `-1`. From version 1.3.0, under `align_corners = True` all grid points along a unit dimension are considered to be at `0` (the center of the input image).
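As the notes above point out, `affine_grid()` and `grid_sample()` are typically used together. A minimal sketch of the round trip with an identity affine transform; the shapes and the `align_corners=False` setting below are chosen purely for illustration:

    >>> import torch
    >>> import torch.nn.functional as F
    >>> input = torch.arange(16.).reshape(1, 1, 4, 4)
    >>> # Identity 2D affine transform for a batch of one image: theta has shape (N, 2, 3)
    >>> theta = torch.tensor([[[1., 0., 0.],
    ...                        [0., 1., 0.]]])
    >>> grid = F.affine_grid(theta, size=(1, 1, 4, 4), align_corners=False)
    >>> grid.shape  # (N, H_out, W_out, 2), ready to be passed to grid_sample()
    torch.Size([1, 4, 4, 2])
    >>> # Sampling the input with the identity grid reproduces the input
    >>> output = F.grid_sample(input, grid, align_corners=False)
    >>> torch.allclose(output, input)
    True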
## DataParallel functions (multi-GPU, distributed) ### data_parallel `torch.nn.parallel.data_parallel(module, inputs, device_ids=None, output_device=None, dim=0, module_kwargs=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/parallel/data_parallel.html#data_parallel) Evaluates module(input) in parallel across the GPUs given in device_ids. This is the functional version of the DataParallel module. Parameters * **module** ([Module](generated/torch.nn.module#torch.nn.Module "torch.nn.Module")) – the module to evaluate in parallel * **inputs** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – inputs to the module * **device_ids** (_list of python:int_ _or_[torch.device](tensor_attributes#torch.torch.device "torch.torch.device")) – GPU ids on which to replicate module * **output_device** (_list of python:int_ _or_[torch.device](tensor_attributes#torch.torch.device "torch.torch.device")) – GPU location of the output Use -1 to indicate the CPU. (default: device_ids[0]) Returns a Tensor containing the result of module(input) located on output_device # torch.nn These are the basic building block for graphs torch.nn * Containers * Convolution Layers * Pooling layers * Padding Layers * Non-linear Activations (weighted sum, nonlinearity) * Non-linear Activations (other) * Normalization Layers * Recurrent Layers * Transformer Layers * Linear Layers * Dropout Layers * Sparse Layers * Distance Functions * Loss Functions * Vision Layers * Shuffle Layers * DataParallel Layers (multi-GPU, distributed) * Utilities * Quantized Functions * Lazy Modules Initialization [`Parameter`](generated/torch.nn.parameter.parameter#torch.nn.parameter.Parameter "torch.nn.parameter.Parameter") | A kind of Tensor that is to be considered a module parameter. ---|--- [`UninitializedParameter`](generated/torch.nn.parameter.uninitializedparameter#torch.nn.parameter.UninitializedParameter "torch.nn.parameter.UninitializedParameter") | A parameter that is not initialized. ## Containers [`Module`](generated/torch.nn.module#torch.nn.Module "torch.nn.Module") | Base class for all neural network modules. ---|--- [`Sequential`](generated/torch.nn.sequential#torch.nn.Sequential "torch.nn.Sequential") | A sequential container. [`ModuleList`](generated/torch.nn.modulelist#torch.nn.ModuleList "torch.nn.ModuleList") | Holds submodules in a list. [`ModuleDict`](generated/torch.nn.moduledict#torch.nn.ModuleDict "torch.nn.ModuleDict") | Holds submodules in a dictionary. [`ParameterList`](generated/torch.nn.parameterlist#torch.nn.ParameterList "torch.nn.ParameterList") | Holds parameters in a list. [`ParameterDict`](generated/torch.nn.parameterdict#torch.nn.ParameterDict "torch.nn.ParameterDict") | Holds parameters in a dictionary. Global Hooks For Module [`register_module_forward_pre_hook`](generated/torch.nn.modules.module.register_module_forward_pre_hook#torch.nn.modules.module.register_module_forward_pre_hook "torch.nn.modules.module.register_module_forward_pre_hook") | Registers a forward pre-hook common to all modules. 
---|--- [`register_module_forward_hook`](generated/torch.nn.modules.module.register_module_forward_hook#torch.nn.modules.module.register_module_forward_hook "torch.nn.modules.module.register_module_forward_hook") | Registers a global forward hook for all the modules [`register_module_backward_hook`](generated/torch.nn.modules.module.register_module_backward_hook#torch.nn.modules.module.register_module_backward_hook "torch.nn.modules.module.register_module_backward_hook") | Registers a backward hook common to all the modules. ## Convolution Layers [`nn.Conv1d`](generated/torch.nn.conv1d#torch.nn.Conv1d "torch.nn.Conv1d") | Applies a 1D convolution over an input signal composed of several input planes. ---|--- [`nn.Conv2d`](generated/torch.nn.conv2d#torch.nn.Conv2d "torch.nn.Conv2d") | Applies a 2D convolution over an input signal composed of several input planes. [`nn.Conv3d`](generated/torch.nn.conv3d#torch.nn.Conv3d "torch.nn.Conv3d") | Applies a 3D convolution over an input signal composed of several input planes. [`nn.ConvTranspose1d`](generated/torch.nn.convtranspose1d#torch.nn.ConvTranspose1d "torch.nn.ConvTranspose1d") | Applies a 1D transposed convolution operator over an input image composed of several input planes. [`nn.ConvTranspose2d`](generated/torch.nn.convtranspose2d#torch.nn.ConvTranspose2d "torch.nn.ConvTranspose2d") | Applies a 2D transposed convolution operator over an input image composed of several input planes. [`nn.ConvTranspose3d`](generated/torch.nn.convtranspose3d#torch.nn.ConvTranspose3d "torch.nn.ConvTranspose3d") | Applies a 3D transposed convolution operator over an input image composed of several input planes. [`nn.LazyConv1d`](generated/torch.nn.lazyconv1d#torch.nn.LazyConv1d "torch.nn.LazyConv1d") | A [`torch.nn.Conv1d`](generated/torch.nn.conv1d#torch.nn.Conv1d "torch.nn.Conv1d") module with lazy initialization of the `in_channels` argument of the `Conv1d` that is inferred from the `input.size(1)`. [`nn.LazyConv2d`](generated/torch.nn.lazyconv2d#torch.nn.LazyConv2d "torch.nn.LazyConv2d") | A [`torch.nn.Conv2d`](generated/torch.nn.conv2d#torch.nn.Conv2d "torch.nn.Conv2d") module with lazy initialization of the `in_channels` argument of the `Conv2d` that is inferred from the `input.size(1)`. [`nn.LazyConv3d`](generated/torch.nn.lazyconv3d#torch.nn.LazyConv3d "torch.nn.LazyConv3d") | A [`torch.nn.Conv3d`](generated/torch.nn.conv3d#torch.nn.Conv3d "torch.nn.Conv3d") module with lazy initialization of the `in_channels` argument of the `Conv3d` that is inferred from the `input.size(1)`. [`nn.LazyConvTranspose1d`](generated/torch.nn.lazyconvtranspose1d#torch.nn.LazyConvTranspose1d "torch.nn.LazyConvTranspose1d") | A [`torch.nn.ConvTranspose1d`](generated/torch.nn.convtranspose1d#torch.nn.ConvTranspose1d "torch.nn.ConvTranspose1d") module with lazy initialization of the `in_channels` argument of the `ConvTranspose1d` that is inferred from the `input.size(1)`. [`nn.LazyConvTranspose2d`](generated/torch.nn.lazyconvtranspose2d#torch.nn.LazyConvTranspose2d "torch.nn.LazyConvTranspose2d") | A [`torch.nn.ConvTranspose2d`](generated/torch.nn.convtranspose2d#torch.nn.ConvTranspose2d "torch.nn.ConvTranspose2d") module with lazy initialization of the `in_channels` argument of the `ConvTranspose2d` that is inferred from the `input.size(1)`. 
[`nn.LazyConvTranspose3d`](generated/torch.nn.lazyconvtranspose3d#torch.nn.LazyConvTranspose3d "torch.nn.LazyConvTranspose3d") | A [`torch.nn.ConvTranspose3d`](generated/torch.nn.convtranspose3d#torch.nn.ConvTranspose3d "torch.nn.ConvTranspose3d") module with lazy initialization of the `in_channels` argument of the `ConvTranspose3d` that is inferred from the `input.size(1)`. [`nn.Unfold`](generated/torch.nn.unfold#torch.nn.Unfold "torch.nn.Unfold") | Extracts sliding local blocks from a batched input tensor. [`nn.Fold`](generated/torch.nn.fold#torch.nn.Fold "torch.nn.Fold") | Combines an array of sliding local blocks into a large containing tensor. ## Pooling layers [`nn.MaxPool1d`](generated/torch.nn.maxpool1d#torch.nn.MaxPool1d "torch.nn.MaxPool1d") | Applies a 1D max pooling over an input signal composed of several input planes. ---|--- [`nn.MaxPool2d`](generated/torch.nn.maxpool2d#torch.nn.MaxPool2d "torch.nn.MaxPool2d") | Applies a 2D max pooling over an input signal composed of several input planes. [`nn.MaxPool3d`](generated/torch.nn.maxpool3d#torch.nn.MaxPool3d "torch.nn.MaxPool3d") | Applies a 3D max pooling over an input signal composed of several input planes. [`nn.MaxUnpool1d`](generated/torch.nn.maxunpool1d#torch.nn.MaxUnpool1d "torch.nn.MaxUnpool1d") | Computes a partial inverse of `MaxPool1d`. [`nn.MaxUnpool2d`](generated/torch.nn.maxunpool2d#torch.nn.MaxUnpool2d "torch.nn.MaxUnpool2d") | Computes a partial inverse of `MaxPool2d`. [`nn.MaxUnpool3d`](generated/torch.nn.maxunpool3d#torch.nn.MaxUnpool3d "torch.nn.MaxUnpool3d") | Computes a partial inverse of `MaxPool3d`. [`nn.AvgPool1d`](generated/torch.nn.avgpool1d#torch.nn.AvgPool1d "torch.nn.AvgPool1d") | Applies a 1D average pooling over an input signal composed of several input planes. [`nn.AvgPool2d`](generated/torch.nn.avgpool2d#torch.nn.AvgPool2d "torch.nn.AvgPool2d") | Applies a 2D average pooling over an input signal composed of several input planes. [`nn.AvgPool3d`](generated/torch.nn.avgpool3d#torch.nn.AvgPool3d "torch.nn.AvgPool3d") | Applies a 3D average pooling over an input signal composed of several input planes. [`nn.FractionalMaxPool2d`](generated/torch.nn.fractionalmaxpool2d#torch.nn.FractionalMaxPool2d "torch.nn.FractionalMaxPool2d") | Applies a 2D fractional max pooling over an input signal composed of several input planes. [`nn.LPPool1d`](generated/torch.nn.lppool1d#torch.nn.LPPool1d "torch.nn.LPPool1d") | Applies a 1D power-average pooling over an input signal composed of several input planes. [`nn.LPPool2d`](generated/torch.nn.lppool2d#torch.nn.LPPool2d "torch.nn.LPPool2d") | Applies a 2D power-average pooling over an input signal composed of several input planes. [`nn.AdaptiveMaxPool1d`](generated/torch.nn.adaptivemaxpool1d#torch.nn.AdaptiveMaxPool1d "torch.nn.AdaptiveMaxPool1d") | Applies a 1D adaptive max pooling over an input signal composed of several input planes. [`nn.AdaptiveMaxPool2d`](generated/torch.nn.adaptivemaxpool2d#torch.nn.AdaptiveMaxPool2d "torch.nn.AdaptiveMaxPool2d") | Applies a 2D adaptive max pooling over an input signal composed of several input planes. [`nn.AdaptiveMaxPool3d`](generated/torch.nn.adaptivemaxpool3d#torch.nn.AdaptiveMaxPool3d "torch.nn.AdaptiveMaxPool3d") | Applies a 3D adaptive max pooling over an input signal composed of several input planes. [`nn.AdaptiveAvgPool1d`](generated/torch.nn.adaptiveavgpool1d#torch.nn.AdaptiveAvgPool1d "torch.nn.AdaptiveAvgPool1d") | Applies a 1D adaptive average pooling over an input signal composed of several input planes. 
[`nn.AdaptiveAvgPool2d`](generated/torch.nn.adaptiveavgpool2d#torch.nn.AdaptiveAvgPool2d "torch.nn.AdaptiveAvgPool2d") | Applies a 2D adaptive average pooling over an input signal composed of several input planes. [`nn.AdaptiveAvgPool3d`](generated/torch.nn.adaptiveavgpool3d#torch.nn.AdaptiveAvgPool3d "torch.nn.AdaptiveAvgPool3d") | Applies a 3D adaptive average pooling over an input signal composed of several input planes. ## Padding Layers [`nn.ReflectionPad1d`](generated/torch.nn.reflectionpad1d#torch.nn.ReflectionPad1d "torch.nn.ReflectionPad1d") | Pads the input tensor using the reflection of the input boundary. ---|--- [`nn.ReflectionPad2d`](generated/torch.nn.reflectionpad2d#torch.nn.ReflectionPad2d "torch.nn.ReflectionPad2d") | Pads the input tensor using the reflection of the input boundary. [`nn.ReplicationPad1d`](generated/torch.nn.replicationpad1d#torch.nn.ReplicationPad1d "torch.nn.ReplicationPad1d") | Pads the input tensor using replication of the input boundary. [`nn.ReplicationPad2d`](generated/torch.nn.replicationpad2d#torch.nn.ReplicationPad2d "torch.nn.ReplicationPad2d") | Pads the input tensor using replication of the input boundary. [`nn.ReplicationPad3d`](generated/torch.nn.replicationpad3d#torch.nn.ReplicationPad3d "torch.nn.ReplicationPad3d") | Pads the input tensor using replication of the input boundary. [`nn.ZeroPad2d`](generated/torch.nn.zeropad2d#torch.nn.ZeroPad2d "torch.nn.ZeroPad2d") | Pads the input tensor boundaries with zero. [`nn.ConstantPad1d`](generated/torch.nn.constantpad1d#torch.nn.ConstantPad1d "torch.nn.ConstantPad1d") | Pads the input tensor boundaries with a constant value. [`nn.ConstantPad2d`](generated/torch.nn.constantpad2d#torch.nn.ConstantPad2d "torch.nn.ConstantPad2d") | Pads the input tensor boundaries with a constant value. [`nn.ConstantPad3d`](generated/torch.nn.constantpad3d#torch.nn.ConstantPad3d "torch.nn.ConstantPad3d") | Pads the input tensor boundaries with a constant value. ## Non-linear Activations (weighted sum, nonlinearity) [`nn.ELU`](generated/torch.nn.elu#torch.nn.ELU "torch.nn.ELU") | Applies the element-wise function: ---|--- [`nn.Hardshrink`](generated/torch.nn.hardshrink#torch.nn.Hardshrink "torch.nn.Hardshrink") | Applies the hard shrinkage function element-wise: [`nn.Hardsigmoid`](generated/torch.nn.hardsigmoid#torch.nn.Hardsigmoid "torch.nn.Hardsigmoid") | Applies the element-wise function: [`nn.Hardtanh`](generated/torch.nn.hardtanh#torch.nn.Hardtanh "torch.nn.Hardtanh") | Applies the HardTanh function element-wise [`nn.Hardswish`](generated/torch.nn.hardswish#torch.nn.Hardswish "torch.nn.Hardswish") | Applies the hardswish function, element-wise, as described in the paper: [`nn.LeakyReLU`](generated/torch.nn.leakyrelu#torch.nn.LeakyReLU "torch.nn.LeakyReLU") | Applies the element-wise function: [`nn.LogSigmoid`](generated/torch.nn.logsigmoid#torch.nn.LogSigmoid "torch.nn.LogSigmoid") | Applies the element-wise function: [`nn.MultiheadAttention`](generated/torch.nn.multiheadattention#torch.nn.MultiheadAttention "torch.nn.MultiheadAttention") | Allows the model to jointly attend to information from different representation subspaces. 
[`nn.PReLU`](generated/torch.nn.prelu#torch.nn.PReLU "torch.nn.PReLU") | Applies the element-wise function: [`nn.ReLU`](generated/torch.nn.relu#torch.nn.ReLU "torch.nn.ReLU") | Applies the rectified linear unit function element-wise: [`nn.ReLU6`](generated/torch.nn.relu6#torch.nn.ReLU6 "torch.nn.ReLU6") | Applies the element-wise function: [`nn.RReLU`](generated/torch.nn.rrelu#torch.nn.RReLU "torch.nn.RReLU") | Applies the randomized leaky rectified liner unit function, element-wise, as described in the paper: [`nn.SELU`](generated/torch.nn.selu#torch.nn.SELU "torch.nn.SELU") | Applied element-wise, as: [`nn.CELU`](generated/torch.nn.celu#torch.nn.CELU "torch.nn.CELU") | Applies the element-wise function: [`nn.GELU`](generated/torch.nn.gelu#torch.nn.GELU "torch.nn.GELU") | Applies the Gaussian Error Linear Units function: [`nn.Sigmoid`](generated/torch.nn.sigmoid#torch.nn.Sigmoid "torch.nn.Sigmoid") | Applies the element-wise function: [`nn.SiLU`](generated/torch.nn.silu#torch.nn.SiLU "torch.nn.SiLU") | Applies the silu function, element-wise. [`nn.Softplus`](generated/torch.nn.softplus#torch.nn.Softplus "torch.nn.Softplus") | Applies the element-wise function: [`nn.Softshrink`](generated/torch.nn.softshrink#torch.nn.Softshrink "torch.nn.Softshrink") | Applies the soft shrinkage function elementwise: [`nn.Softsign`](generated/torch.nn.softsign#torch.nn.Softsign "torch.nn.Softsign") | Applies the element-wise function: [`nn.Tanh`](generated/torch.nn.tanh#torch.nn.Tanh "torch.nn.Tanh") | Applies the element-wise function: [`nn.Tanhshrink`](generated/torch.nn.tanhshrink#torch.nn.Tanhshrink "torch.nn.Tanhshrink") | Applies the element-wise function: [`nn.Threshold`](generated/torch.nn.threshold#torch.nn.Threshold "torch.nn.Threshold") | Thresholds each element of the input Tensor. ## Non-linear Activations (other) [`nn.Softmin`](generated/torch.nn.softmin#torch.nn.Softmin "torch.nn.Softmin") | Applies the Softmin function to an n-dimensional input Tensor rescaling them so that the elements of the n-dimensional output Tensor lie in the range `[0, 1]` and sum to 1. ---|--- [`nn.Softmax`](generated/torch.nn.softmax#torch.nn.Softmax "torch.nn.Softmax") | Applies the Softmax function to an n-dimensional input Tensor rescaling them so that the elements of the n-dimensional output Tensor lie in the range [0,1] and sum to 1. [`nn.Softmax2d`](generated/torch.nn.softmax2d#torch.nn.Softmax2d "torch.nn.Softmax2d") | Applies SoftMax over features to each spatial location. [`nn.LogSoftmax`](generated/torch.nn.logsoftmax#torch.nn.LogSoftmax "torch.nn.LogSoftmax") | Applies the log⁡(Softmax(x))\log(\text{Softmax}(x)) function to an n-dimensional input Tensor. [`nn.AdaptiveLogSoftmaxWithLoss`](generated/torch.nn.adaptivelogsoftmaxwithloss#torch.nn.AdaptiveLogSoftmaxWithLoss "torch.nn.AdaptiveLogSoftmaxWithLoss") | Efficient softmax approximation as described in [Efficient softmax approximation for GPUs by Edouard Grave, Armand Joulin, Moustapha Cissé, David Grangier, and Hervé Jégou](https://arxiv.org/abs/1609.04309). ## Normalization Layers [`nn.BatchNorm1d`](generated/torch.nn.batchnorm1d#torch.nn.BatchNorm1d "torch.nn.BatchNorm1d") | Applies Batch Normalization over a 2D or 3D input (a mini-batch of 1D inputs with optional additional channel dimension) as described in the paper [Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift](https://arxiv.org/abs/1502.03167) . 
---|--- [`nn.BatchNorm2d`](generated/torch.nn.batchnorm2d#torch.nn.BatchNorm2d "torch.nn.BatchNorm2d") | Applies Batch Normalization over a 4D input (a mini-batch of 2D inputs with additional channel dimension) as described in the paper [Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift](https://arxiv.org/abs/1502.03167) . [`nn.BatchNorm3d`](generated/torch.nn.batchnorm3d#torch.nn.BatchNorm3d "torch.nn.BatchNorm3d") | Applies Batch Normalization over a 5D input (a mini-batch of 3D inputs with additional channel dimension) as described in the paper [Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift](https://arxiv.org/abs/1502.03167) . [`nn.GroupNorm`](generated/torch.nn.groupnorm#torch.nn.GroupNorm "torch.nn.GroupNorm") | Applies Group Normalization over a mini-batch of inputs as described in the paper [Group Normalization](https://arxiv.org/abs/1803.08494) [`nn.SyncBatchNorm`](generated/torch.nn.syncbatchnorm#torch.nn.SyncBatchNorm "torch.nn.SyncBatchNorm") | Applies Batch Normalization over a N-Dimensional input (a mini-batch of [N-2]D inputs with additional channel dimension) as described in the paper [Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift](https://arxiv.org/abs/1502.03167) . [`nn.InstanceNorm1d`](generated/torch.nn.instancenorm1d#torch.nn.InstanceNorm1d "torch.nn.InstanceNorm1d") | Applies Instance Normalization over a 3D input (a mini-batch of 1D inputs with optional additional channel dimension) as described in the paper [Instance Normalization: The Missing Ingredient for Fast Stylization](https://arxiv.org/abs/1607.08022). [`nn.InstanceNorm2d`](generated/torch.nn.instancenorm2d#torch.nn.InstanceNorm2d "torch.nn.InstanceNorm2d") | Applies Instance Normalization over a 4D input (a mini-batch of 2D inputs with additional channel dimension) as described in the paper [Instance Normalization: The Missing Ingredient for Fast Stylization](https://arxiv.org/abs/1607.08022). [`nn.InstanceNorm3d`](generated/torch.nn.instancenorm3d#torch.nn.InstanceNorm3d "torch.nn.InstanceNorm3d") | Applies Instance Normalization over a 5D input (a mini-batch of 3D inputs with additional channel dimension) as described in the paper [Instance Normalization: The Missing Ingredient for Fast Stylization](https://arxiv.org/abs/1607.08022). [`nn.LayerNorm`](generated/torch.nn.layernorm#torch.nn.LayerNorm "torch.nn.LayerNorm") | Applies Layer Normalization over a mini-batch of inputs as described in the paper [Layer Normalization](https://arxiv.org/abs/1607.06450) [`nn.LocalResponseNorm`](generated/torch.nn.localresponsenorm#torch.nn.LocalResponseNorm "torch.nn.LocalResponseNorm") | Applies local response normalization over an input signal composed of several input planes, where channels occupy the second dimension. ## Recurrent Layers [`nn.RNNBase`](generated/torch.nn.rnnbase#torch.nn.RNNBase "torch.nn.RNNBase") | ---|--- [`nn.RNN`](generated/torch.nn.rnn#torch.nn.RNN "torch.nn.RNN") | Applies a multi-layer Elman RNN with tanh⁡\tanh or ReLU\text{ReLU} non-linearity to an input sequence. [`nn.LSTM`](generated/torch.nn.lstm#torch.nn.LSTM "torch.nn.LSTM") | Applies a multi-layer long short-term memory (LSTM) RNN to an input sequence. [`nn.GRU`](generated/torch.nn.gru#torch.nn.GRU "torch.nn.GRU") | Applies a multi-layer gated recurrent unit (GRU) RNN to an input sequence. 
[`nn.RNNCell`](generated/torch.nn.rnncell#torch.nn.RNNCell "torch.nn.RNNCell") | An Elman RNN cell with tanh or ReLU non-linearity. [`nn.LSTMCell`](generated/torch.nn.lstmcell#torch.nn.LSTMCell "torch.nn.LSTMCell") | A long short-term memory (LSTM) cell. [`nn.GRUCell`](generated/torch.nn.grucell#torch.nn.GRUCell "torch.nn.GRUCell") | A gated recurrent unit (GRU) cell ## Transformer Layers [`nn.Transformer`](generated/torch.nn.transformer#torch.nn.Transformer "torch.nn.Transformer") | A transformer model. ---|--- [`nn.TransformerEncoder`](generated/torch.nn.transformerencoder#torch.nn.TransformerEncoder "torch.nn.TransformerEncoder") | TransformerEncoder is a stack of N encoder layers [`nn.TransformerDecoder`](generated/torch.nn.transformerdecoder#torch.nn.TransformerDecoder "torch.nn.TransformerDecoder") | TransformerDecoder is a stack of N decoder layers [`nn.TransformerEncoderLayer`](generated/torch.nn.transformerencoderlayer#torch.nn.TransformerEncoderLayer "torch.nn.TransformerEncoderLayer") | TransformerEncoderLayer is made up of self-attn and feedforward network. [`nn.TransformerDecoderLayer`](generated/torch.nn.transformerdecoderlayer#torch.nn.TransformerDecoderLayer "torch.nn.TransformerDecoderLayer") | TransformerDecoderLayer is made up of self-attn, multi-head-attn and feedforward network. ## Linear Layers [`nn.Identity`](generated/torch.nn.identity#torch.nn.Identity "torch.nn.Identity") | A placeholder identity operator that is argument-insensitive. ---|--- [`nn.Linear`](generated/torch.nn.linear#torch.nn.Linear "torch.nn.Linear") | Applies a linear transformation to the incoming data: y=xAT+by = xA^T + b [`nn.Bilinear`](generated/torch.nn.bilinear#torch.nn.Bilinear "torch.nn.Bilinear") | Applies a bilinear transformation to the incoming data: y=x1TAx2+by = x_1^T A x_2 + b [`nn.LazyLinear`](generated/torch.nn.lazylinear#torch.nn.LazyLinear "torch.nn.LazyLinear") | A [`torch.nn.Linear`](generated/torch.nn.linear#torch.nn.Linear "torch.nn.Linear") module with lazy initialization. ## Dropout Layers [`nn.Dropout`](generated/torch.nn.dropout#torch.nn.Dropout "torch.nn.Dropout") | During training, randomly zeroes some of the elements of the input tensor with probability `p` using samples from a Bernoulli distribution. ---|--- [`nn.Dropout2d`](generated/torch.nn.dropout2d#torch.nn.Dropout2d "torch.nn.Dropout2d") | Randomly zero out entire channels (a channel is a 2D feature map, e.g., the jj -th channel of the ii -th sample in the batched input is a 2D tensor input[i,j]\text{input}[i, j] ). [`nn.Dropout3d`](generated/torch.nn.dropout3d#torch.nn.Dropout3d "torch.nn.Dropout3d") | Randomly zero out entire channels (a channel is a 3D feature map, e.g., the jj -th channel of the ii -th sample in the batched input is a 3D tensor input[i,j]\text{input}[i, j] ). [`nn.AlphaDropout`](generated/torch.nn.alphadropout#torch.nn.AlphaDropout "torch.nn.AlphaDropout") | Applies Alpha Dropout over the input. ## Sparse Layers [`nn.Embedding`](generated/torch.nn.embedding#torch.nn.Embedding "torch.nn.Embedding") | A simple lookup table that stores embeddings of a fixed dictionary and size. ---|--- [`nn.EmbeddingBag`](generated/torch.nn.embeddingbag#torch.nn.EmbeddingBag "torch.nn.EmbeddingBag") | Computes sums or means of ‘bags’ of embeddings, without instantiating the intermediate embeddings. ## Distance Functions [`nn.CosineSimilarity`](generated/torch.nn.cosinesimilarity#torch.nn.CosineSimilarity "torch.nn.CosineSimilarity") | Returns cosine similarity between x1x_1 and x2x_2 , computed along dim. 
---|--- [`nn.PairwiseDistance`](generated/torch.nn.pairwisedistance#torch.nn.PairwiseDistance "torch.nn.PairwiseDistance") | Computes the batchwise pairwise distance between vectors v1v_1 , v2v_2 using the p-norm: ## Loss Functions [`nn.L1Loss`](generated/torch.nn.l1loss#torch.nn.L1Loss "torch.nn.L1Loss") | Creates a criterion that measures the mean absolute error (MAE) between each element in the input xx and target yy . ---|--- [`nn.MSELoss`](generated/torch.nn.mseloss#torch.nn.MSELoss "torch.nn.MSELoss") | Creates a criterion that measures the mean squared error (squared L2 norm) between each element in the input xx and target yy . [`nn.CrossEntropyLoss`](generated/torch.nn.crossentropyloss#torch.nn.CrossEntropyLoss "torch.nn.CrossEntropyLoss") | This criterion combines [`LogSoftmax`](generated/torch.nn.logsoftmax#torch.nn.LogSoftmax "torch.nn.LogSoftmax") and [`NLLLoss`](generated/torch.nn.nllloss#torch.nn.NLLLoss "torch.nn.NLLLoss") in one single class. [`nn.CTCLoss`](generated/torch.nn.ctcloss#torch.nn.CTCLoss "torch.nn.CTCLoss") | The Connectionist Temporal Classification loss. [`nn.NLLLoss`](generated/torch.nn.nllloss#torch.nn.NLLLoss "torch.nn.NLLLoss") | The negative log likelihood loss. [`nn.PoissonNLLLoss`](generated/torch.nn.poissonnllloss#torch.nn.PoissonNLLLoss "torch.nn.PoissonNLLLoss") | Negative log likelihood loss with Poisson distribution of target. [`nn.GaussianNLLLoss`](generated/torch.nn.gaussiannllloss#torch.nn.GaussianNLLLoss "torch.nn.GaussianNLLLoss") | Gaussian negative log likelihood loss. [`nn.KLDivLoss`](generated/torch.nn.kldivloss#torch.nn.KLDivLoss "torch.nn.KLDivLoss") | The Kullback-Leibler divergence loss measure [`nn.BCELoss`](generated/torch.nn.bceloss#torch.nn.BCELoss "torch.nn.BCELoss") | Creates a criterion that measures the Binary Cross Entropy between the target and the output: [`nn.BCEWithLogitsLoss`](generated/torch.nn.bcewithlogitsloss#torch.nn.BCEWithLogitsLoss "torch.nn.BCEWithLogitsLoss") | This loss combines a `Sigmoid` layer and the `BCELoss` in one single class. [`nn.MarginRankingLoss`](generated/torch.nn.marginrankingloss#torch.nn.MarginRankingLoss "torch.nn.MarginRankingLoss") | Creates a criterion that measures the loss given inputs x1x1 , x2x2 , two 1D mini-batch `Tensors`, and a label 1D mini-batch tensor yy (containing 1 or -1). [`nn.HingeEmbeddingLoss`](generated/torch.nn.hingeembeddingloss#torch.nn.HingeEmbeddingLoss "torch.nn.HingeEmbeddingLoss") | Measures the loss given an input tensor xx and a labels tensor yy (containing 1 or -1). [`nn.MultiLabelMarginLoss`](generated/torch.nn.multilabelmarginloss#torch.nn.MultiLabelMarginLoss "torch.nn.MultiLabelMarginLoss") | Creates a criterion that optimizes a multi-class multi-classification hinge loss (margin-based loss) between input xx (a 2D mini-batch `Tensor`) and output yy (which is a 2D `Tensor` of target class indices). [`nn.SmoothL1Loss`](generated/torch.nn.smoothl1loss#torch.nn.SmoothL1Loss "torch.nn.SmoothL1Loss") | Creates a criterion that uses a squared term if the absolute element-wise error falls below beta and an L1 term otherwise. [`nn.SoftMarginLoss`](generated/torch.nn.softmarginloss#torch.nn.SoftMarginLoss "torch.nn.SoftMarginLoss") | Creates a criterion that optimizes a two-class classification logistic loss between input tensor xx and target tensor yy (containing 1 or -1). 
[`nn.MultiLabelSoftMarginLoss`](generated/torch.nn.multilabelsoftmarginloss#torch.nn.MultiLabelSoftMarginLoss "torch.nn.MultiLabelSoftMarginLoss") | Creates a criterion that optimizes a multi-label one-versus-all loss based on max-entropy, between input xx and target yy of size (N,C)(N, C) . [`nn.CosineEmbeddingLoss`](generated/torch.nn.cosineembeddingloss#torch.nn.CosineEmbeddingLoss "torch.nn.CosineEmbeddingLoss") | Creates a criterion that measures the loss given input tensors x1x_1 , x2x_2 and a `Tensor` label yy with values 1 or -1. [`nn.MultiMarginLoss`](generated/torch.nn.multimarginloss#torch.nn.MultiMarginLoss "torch.nn.MultiMarginLoss") | Creates a criterion that optimizes a multi-class classification hinge loss (margin-based loss) between input xx (a 2D mini-batch `Tensor`) and output yy (which is a 1D tensor of target class indices, 0≤y≤x.size(1)−10 \leq y \leq \text{x.size}(1)-1 ): [`nn.TripletMarginLoss`](generated/torch.nn.tripletmarginloss#torch.nn.TripletMarginLoss "torch.nn.TripletMarginLoss") | Creates a criterion that measures the triplet loss given an input tensors x1x1 , x2x2 , x3x3 and a margin with a value greater than 00 . [`nn.TripletMarginWithDistanceLoss`](generated/torch.nn.tripletmarginwithdistanceloss#torch.nn.TripletMarginWithDistanceLoss "torch.nn.TripletMarginWithDistanceLoss") | Creates a criterion that measures the triplet loss given input tensors aa , pp , and nn (representing anchor, positive, and negative examples, respectively), and a nonnegative, real-valued function (“distance function”) used to compute the relationship between the anchor and positive example (“positive distance”) and the anchor and negative example (“negative distance”). ## Vision Layers [`nn.PixelShuffle`](generated/torch.nn.pixelshuffle#torch.nn.PixelShuffle "torch.nn.PixelShuffle") | Rearranges elements in a tensor of shape (∗,C×r2,H,W)(*, C \times r^2, H, W) to a tensor of shape (∗,C,H×r,W×r)(*, C, H \times r, W \times r) , where r is an upscale factor. ---|--- [`nn.PixelUnshuffle`](generated/torch.nn.pixelunshuffle#torch.nn.PixelUnshuffle "torch.nn.PixelUnshuffle") | Reverses the [`PixelShuffle`](generated/torch.nn.pixelshuffle#torch.nn.PixelShuffle "torch.nn.PixelShuffle") operation by rearranging elements in a tensor of shape (∗,C,H×r,W×r)(*, C, H \times r, W \times r) to a tensor of shape (∗,C×r2,H,W)(*, C \times r^2, H, W) , where r is a downscale factor. [`nn.Upsample`](generated/torch.nn.upsample#torch.nn.Upsample "torch.nn.Upsample") | Upsamples a given multi-channel 1D (temporal), 2D (spatial) or 3D (volumetric) data. [`nn.UpsamplingNearest2d`](generated/torch.nn.upsamplingnearest2d#torch.nn.UpsamplingNearest2d "torch.nn.UpsamplingNearest2d") | Applies a 2D nearest neighbor upsampling to an input signal composed of several input channels. [`nn.UpsamplingBilinear2d`](generated/torch.nn.upsamplingbilinear2d#torch.nn.UpsamplingBilinear2d "torch.nn.UpsamplingBilinear2d") | Applies a 2D bilinear upsampling to an input signal composed of several input channels. ## Shuffle Layers [`nn.ChannelShuffle`](generated/torch.nn.channelshuffle#torch.nn.ChannelShuffle "torch.nn.ChannelShuffle") | Divide the channels in a tensor of shape (∗,C,H,W)(*, C , H, W) into g groups and rearrange them as (∗,Cg,g,H,W)(*, C \frac g, g, H, W) , while keeping the original tensor shape. ---|--- ## DataParallel Layers (multi-GPU, distributed) [`nn.DataParallel`](generated/torch.nn.dataparallel#torch.nn.DataParallel "torch.nn.DataParallel") | Implements data parallelism at the module level. 
---|--- [`nn.parallel.DistributedDataParallel`](generated/torch.nn.parallel.distributeddataparallel#torch.nn.parallel.DistributedDataParallel "torch.nn.parallel.DistributedDataParallel") | Implements distributed data parallelism that is based on `torch.distributed` package at the module level. ## Utilities From the `torch.nn.utils` module [`clip_grad_norm_`](generated/torch.nn.utils.clip_grad_norm_#torch.nn.utils.clip_grad_norm_ "torch.nn.utils.clip_grad_norm_") | Clips gradient norm of an iterable of parameters. ---|--- [`clip_grad_value_`](generated/torch.nn.utils.clip_grad_value_#torch.nn.utils.clip_grad_value_ "torch.nn.utils.clip_grad_value_") | Clips gradient of an iterable of parameters at specified value. [`parameters_to_vector`](generated/torch.nn.utils.parameters_to_vector#torch.nn.utils.parameters_to_vector "torch.nn.utils.parameters_to_vector") | Convert parameters to one vector [`vector_to_parameters`](generated/torch.nn.utils.vector_to_parameters#torch.nn.utils.vector_to_parameters "torch.nn.utils.vector_to_parameters") | Convert one vector to the parameters [`prune.BasePruningMethod`](generated/torch.nn.utils.prune.basepruningmethod#torch.nn.utils.prune.BasePruningMethod "torch.nn.utils.prune.BasePruningMethod") | Abstract base class for creation of new pruning techniques. ---|--- [`prune.PruningContainer`](generated/torch.nn.utils.prune.pruningcontainer#torch.nn.utils.prune.PruningContainer "torch.nn.utils.prune.PruningContainer") | Container holding a sequence of pruning methods for iterative pruning. ---|--- [`prune.Identity`](generated/torch.nn.utils.prune.identity#torch.nn.utils.prune.Identity "torch.nn.utils.prune.Identity") | Utility pruning method that does not prune any units but generates the pruning parametrization with a mask of ones. [`prune.RandomUnstructured`](generated/torch.nn.utils.prune.randomunstructured#torch.nn.utils.prune.RandomUnstructured "torch.nn.utils.prune.RandomUnstructured") | Prune (currently unpruned) units in a tensor at random. [`prune.L1Unstructured`](generated/torch.nn.utils.prune.l1unstructured#torch.nn.utils.prune.L1Unstructured "torch.nn.utils.prune.L1Unstructured") | Prune (currently unpruned) units in a tensor by zeroing out the ones with the lowest L1-norm. [`prune.RandomStructured`](generated/torch.nn.utils.prune.randomstructured#torch.nn.utils.prune.RandomStructured "torch.nn.utils.prune.RandomStructured") | Prune entire (currently unpruned) channels in a tensor at random. [`prune.LnStructured`](generated/torch.nn.utils.prune.lnstructured#torch.nn.utils.prune.LnStructured "torch.nn.utils.prune.LnStructured") | Prune entire (currently unpruned) channels in a tensor based on their Ln-norm. [`prune.CustomFromMask`](generated/torch.nn.utils.prune.customfrommask#torch.nn.utils.prune.CustomFromMask "torch.nn.utils.prune.CustomFromMask") | [`prune.identity`](generated/torch.nn.utils.prune.identity#torch.nn.utils.prune.identity "torch.nn.utils.prune.identity") | Applies pruning reparametrization to the tensor corresponding to the parameter called `name` in `module` without actually pruning any units. [`prune.random_unstructured`](generated/torch.nn.utils.prune.random_unstructured#torch.nn.utils.prune.random_unstructured "torch.nn.utils.prune.random_unstructured") | Prunes tensor corresponding to parameter called `name` in `module` by removing the specified `amount` of (currently unpruned) units selected at random. 
[`prune.l1_unstructured`](generated/torch.nn.utils.prune.l1_unstructured#torch.nn.utils.prune.l1_unstructured "torch.nn.utils.prune.l1_unstructured") | Prunes tensor corresponding to parameter called `name` in `module` by removing the specified `amount` of (currently unpruned) units with the lowest L1-norm. [`prune.random_structured`](generated/torch.nn.utils.prune.random_structured#torch.nn.utils.prune.random_structured "torch.nn.utils.prune.random_structured") | Prunes tensor corresponding to parameter called `name` in `module` by removing the specified `amount` of (currently unpruned) channels along the specified `dim` selected at random. [`prune.ln_structured`](generated/torch.nn.utils.prune.ln_structured#torch.nn.utils.prune.ln_structured "torch.nn.utils.prune.ln_structured") | Prunes tensor corresponding to parameter called `name` in `module` by removing the specified `amount` of (currently unpruned) channels along the specified `dim` with the lowest L``n``-norm. [`prune.global_unstructured`](generated/torch.nn.utils.prune.global_unstructured#torch.nn.utils.prune.global_unstructured "torch.nn.utils.prune.global_unstructured") | Globally prunes tensors corresponding to all parameters in `parameters` by applying the specified `pruning_method`. [`prune.custom_from_mask`](generated/torch.nn.utils.prune.custom_from_mask#torch.nn.utils.prune.custom_from_mask "torch.nn.utils.prune.custom_from_mask") | Prunes tensor corresponding to parameter called `name` in `module` by applying the pre-computed mask in `mask`. [`prune.remove`](generated/torch.nn.utils.prune.remove#torch.nn.utils.prune.remove "torch.nn.utils.prune.remove") | Removes the pruning reparameterization from a module and the pruning method from the forward hook. [`prune.is_pruned`](generated/torch.nn.utils.prune.is_pruned#torch.nn.utils.prune.is_pruned "torch.nn.utils.prune.is_pruned") | Check whether `module` is pruned by looking for `forward_pre_hooks` in its modules that inherit from the `BasePruningMethod`. [`weight_norm`](generated/torch.nn.utils.weight_norm#torch.nn.utils.weight_norm "torch.nn.utils.weight_norm") | Applies weight normalization to a parameter in the given module. [`remove_weight_norm`](generated/torch.nn.utils.remove_weight_norm#torch.nn.utils.remove_weight_norm "torch.nn.utils.remove_weight_norm") | Removes the weight normalization reparameterization from a module. [`spectral_norm`](generated/torch.nn.utils.spectral_norm#torch.nn.utils.spectral_norm "torch.nn.utils.spectral_norm") | Applies spectral normalization to a parameter in the given module. [`remove_spectral_norm`](generated/torch.nn.utils.remove_spectral_norm#torch.nn.utils.remove_spectral_norm "torch.nn.utils.remove_spectral_norm") | Removes the spectral normalization reparameterization from a module. Utility functions in other modules [`nn.utils.rnn.PackedSequence`](generated/torch.nn.utils.rnn.packedsequence#torch.nn.utils.rnn.PackedSequence "torch.nn.utils.rnn.PackedSequence") | Holds the data and list of `batch_sizes` of a packed sequence. ---|--- [`nn.utils.rnn.pack_padded_sequence`](generated/torch.nn.utils.rnn.pack_padded_sequence#torch.nn.utils.rnn.pack_padded_sequence "torch.nn.utils.rnn.pack_padded_sequence") | Packs a Tensor containing padded sequences of variable length. [`nn.utils.rnn.pad_packed_sequence`](generated/torch.nn.utils.rnn.pad_packed_sequence#torch.nn.utils.rnn.pad_packed_sequence "torch.nn.utils.rnn.pad_packed_sequence") | Pads a packed batch of variable length sequences. 
[`nn.utils.rnn.pad_sequence`](generated/torch.nn.utils.rnn.pad_sequence#torch.nn.utils.rnn.pad_sequence "torch.nn.utils.rnn.pad_sequence") | Pad a list of variable length Tensors with `padding_value` [`nn.utils.rnn.pack_sequence`](generated/torch.nn.utils.rnn.pack_sequence#torch.nn.utils.rnn.pack_sequence "torch.nn.utils.rnn.pack_sequence") | Packs a list of variable length Tensors [`nn.Flatten`](generated/torch.nn.flatten#torch.nn.Flatten "torch.nn.Flatten") | Flattens a contiguous range of dims into a tensor. [`nn.Unflatten`](generated/torch.nn.unflatten#torch.nn.Unflatten "torch.nn.Unflatten") | Unflattens a tensor dim expanding it to a desired shape. ## Quantized Functions Quantization refers to techniques for performing computations and storing tensors at lower bitwidths than floating point precision. PyTorch supports both per tensor and per channel asymmetric linear quantization. To learn more how to use quantized functions in PyTorch, please refer to the [Quantization](quantization#quantization-doc) documentation. ## Lazy Modules Initialization [`nn.modules.lazy.LazyModuleMixin`](generated/torch.nn.modules.lazy.lazymodulemixin#torch.nn.modules.lazy.LazyModuleMixin "torch.nn.modules.lazy.LazyModuleMixin") | A mixin for modules that lazily initialize parameters, also known as “lazy modules.” ---|--- # torch.nn.init `torch.nn.init.calculate_gain(nonlinearity, param=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/init.html#calculate_gain) Return the recommended gain value for the given nonlinearity function. The values are as follows: nonlinearity | gain ---|--- Linear / Identity | 11 Conv{1,2,3}D | 11 Sigmoid | 11 Tanh | 53\frac{5}{3} ReLU | 2\sqrt{2} Leaky Relu | 21+negative_slope2\sqrt{\frac{2}{1 + \text{negative\\_slope}^2}} SELU | 34\frac{3}{4} Parameters * **nonlinearity** – the non-linear function (`nn.functional` name) * **param** – optional parameter for the non-linear function #### Examples >>> gain = nn.init.calculate_gain('leaky_relu', 0.2) # leaky_relu with negative_slope=0.2 `torch.nn.init.uniform_(tensor, a=0.0, b=1.0)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/init.html#uniform_) Fills the input Tensor with values drawn from the uniform distribution U(a,b)\mathcal{U}(a, b) . Parameters * **tensor** – an n-dimensional `torch.Tensor` * **a** – the lower bound of the uniform distribution * **b** – the upper bound of the uniform distribution #### Examples >>> w = torch.empty(3, 5) >>> nn.init.uniform_(w) `torch.nn.init.normal_(tensor, mean=0.0, std=1.0)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/init.html#normal_) Fills the input Tensor with values drawn from the normal distribution N(mean,std2)\mathcal{N}(\text{mean}, \text{std}^2) . Parameters * **tensor** – an n-dimensional `torch.Tensor` * **mean** – the mean of the normal distribution * **std** – the standard deviation of the normal distribution #### Examples >>> w = torch.empty(3, 5) >>> nn.init.normal_(w) `torch.nn.init.constant_(tensor, val)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/init.html#constant_) Fills the input Tensor with the value val\text{val} . Parameters * **tensor** – an n-dimensional `torch.Tensor` * **val** – the value to fill the tensor with #### Examples >>> w = torch.empty(3, 5) >>> nn.init.constant_(w, 0.3) `torch.nn.init.ones_(tensor)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/init.html#ones_) Fills the input Tensor with the scalar value `1`. 
Parameters **tensor** – an n-dimensional `torch.Tensor` #### Examples >>> w = torch.empty(3, 5) >>> nn.init.ones_(w) `torch.nn.init.zeros_(tensor)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/init.html#zeros_) Fills the input Tensor with the scalar value `0`. Parameters **tensor** – an n-dimensional `torch.Tensor` #### Examples >>> w = torch.empty(3, 5) >>> nn.init.zeros_(w) `torch.nn.init.eye_(tensor)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/init.html#eye_) Fills the 2-dimensional input `Tensor` with the identity matrix. Preserves the identity of the inputs in `Linear` layers, where as many inputs are preserved as possible. Parameters **tensor** – a 2-dimensional `torch.Tensor` #### Examples >>> w = torch.empty(3, 5) >>> nn.init.eye_(w) `torch.nn.init.dirac_(tensor, groups=1)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/init.html#dirac_) Fills the {3, 4, 5}-dimensional input `Tensor` with the Dirac delta function. Preserves the identity of the inputs in `Convolutional` layers, where as many input channels are preserved as possible. In case of groups>1, each group of channels preserves identity Parameters * **tensor** – a {3, 4, 5}-dimensional `torch.Tensor` * **groups** (_optional_) – number of groups in the conv layer (default: 1) #### Examples >>> w = torch.empty(3, 16, 5, 5) >>> nn.init.dirac_(w) >>> w = torch.empty(3, 24, 5, 5) >>> nn.init.dirac_(w, 3) `torch.nn.init.xavier_uniform_(tensor, gain=1.0)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/init.html#xavier_uniform_) Fills the input `Tensor` with values according to the method described in `Understanding the difficulty of training deep feedforward neural networks` \- Glorot, X. & Bengio, Y. (2010), using a uniform distribution. The resulting tensor will have values sampled from U(−a,a)\mathcal{U}(-a, a) where a=gain×6fan_in+fan_outa = \text{gain} \times \sqrt{\frac{6}{\text{fan\\_in} + \text{fan\\_out}}} Also known as Glorot initialization. Parameters * **tensor** – an n-dimensional `torch.Tensor` * **gain** – an optional scaling factor #### Examples >>> w = torch.empty(3, 5) >>> nn.init.xavier_uniform_(w, gain=nn.init.calculate_gain('relu')) `torch.nn.init.xavier_normal_(tensor, gain=1.0)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/init.html#xavier_normal_) Fills the input `Tensor` with values according to the method described in `Understanding the difficulty of training deep feedforward neural networks` \- Glorot, X. & Bengio, Y. (2010), using a normal distribution. The resulting tensor will have values sampled from N(0,std2)\mathcal{N}(0, \text{std}^2) where std=gain×2fan_in+fan_out\text{std} = \text{gain} \times \sqrt{\frac{2}{\text{fan\\_in} + \text{fan\\_out}}} Also known as Glorot initialization. Parameters * **tensor** – an n-dimensional `torch.Tensor` * **gain** – an optional scaling factor #### Examples >>> w = torch.empty(3, 5) >>> nn.init.xavier_normal_(w) `torch.nn.init.kaiming_uniform_(tensor, a=0, mode='fan_in', nonlinearity='leaky_relu')` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/init.html#kaiming_uniform_) Fills the input `Tensor` with values according to the method described in `Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification` \- He, K. et al. (2015), using a uniform distribution. 
The resulting tensor will have values sampled from U(−bound,bound)\mathcal{U}(-\text{bound}, \text{bound}) where bound=gain×3fan_mode\text{bound} = \text{gain} \times \sqrt{\frac{3}{\text{fan\\_mode}}} Also known as He initialization. Parameters * **tensor** – an n-dimensional `torch.Tensor` * **a** – the negative slope of the rectifier used after this layer (only used with `'leaky_relu'`) * **mode** – either `'fan_in'` (default) or `'fan_out'`. Choosing `'fan_in'` preserves the magnitude of the variance of the weights in the forward pass. Choosing `'fan_out'` preserves the magnitudes in the backwards pass. * **nonlinearity** – the non-linear function (`nn.functional` name), recommended to use only with `'relu'` or `'leaky_relu'` (default). #### Examples >>> w = torch.empty(3, 5) >>> nn.init.kaiming_uniform_(w, mode='fan_in', nonlinearity='relu') `torch.nn.init.kaiming_normal_(tensor, a=0, mode='fan_in', nonlinearity='leaky_relu')` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/init.html#kaiming_normal_) Fills the input `Tensor` with values according to the method described in `Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification` \- He, K. et al. (2015), using a normal distribution. The resulting tensor will have values sampled from N(0,std2)\mathcal{N}(0, \text{std}^2) where std=gainfan_mode\text{std} = \frac{\text{gain}}{\sqrt{\text{fan\\_mode}}} Also known as He initialization. Parameters * **tensor** – an n-dimensional `torch.Tensor` * **a** – the negative slope of the rectifier used after this layer (only used with `'leaky_relu'`) * **mode** – either `'fan_in'` (default) or `'fan_out'`. Choosing `'fan_in'` preserves the magnitude of the variance of the weights in the forward pass. Choosing `'fan_out'` preserves the magnitudes in the backwards pass. * **nonlinearity** – the non-linear function (`nn.functional` name), recommended to use only with `'relu'` or `'leaky_relu'` (default). #### Examples >>> w = torch.empty(3, 5) >>> nn.init.kaiming_normal_(w, mode='fan_out', nonlinearity='relu') `torch.nn.init.orthogonal_(tensor, gain=1)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/init.html#orthogonal_) Fills the input `Tensor` with a (semi) orthogonal matrix, as described in `Exact solutions to the nonlinear dynamics of learning in deep linear neural networks` \- Saxe, A. et al. (2013). The input tensor must have at least 2 dimensions, and for tensors with more than 2 dimensions the trailing dimensions are flattened. Parameters * **tensor** – an n-dimensional `torch.Tensor`, where n≥2n \geq 2 * **gain** – optional scaling factor #### Examples >>> w = torch.empty(3, 5) >>> nn.init.orthogonal_(w) `torch.nn.init.sparse_(tensor, sparsity, std=0.01)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/init.html#sparse_) Fills the 2D input `Tensor` as a sparse matrix, where the non-zero elements will be drawn from the normal distribution N(0,0.01)\mathcal{N}(0, 0.01) , as described in `Deep learning via Hessian-free optimization` \- Martens, J. (2010). 
Parameters * **tensor** – an n-dimensional `torch.Tensor` * **sparsity** – The fraction of elements in each column to be set to zero * **std** – the standard deviation of the normal distribution used to generate the non-zero values #### Examples >>> w = torch.empty(3, 5) >>> nn.init.sparse_(w, sparsity=0.1) # torch.onnx * Example: End-to-end AlexNet from PyTorch to ONNX * Tracing vs Scripting * Write PyTorch model in Torch way * Using dictionaries to handle Named Arguments as model inputs * Indexing * Getter * Setter * TorchVision support * Limitations * Supported operators * Adding support for operators * ATen operators * Non-ATen operators * Custom operators * Operator Export Type * ONNX * ONNX_ATEN * ONNX_ATEN_FALLBACK * RAW * ONNX_FALLTHROUGH * Frequently Asked Questions * Use external data format * Training * Functions ## Example: End-to-end AlexNet from PyTorch to ONNX Here is a simple script which exports a pretrained AlexNet as defined in torchvision into ONNX. It runs a single round of inference and then saves the resulting traced model to `alexnet.onnx`: import torch import torchvision dummy_input = torch.randn(10, 3, 224, 224, device='cuda') model = torchvision.models.alexnet(pretrained=True).cuda() # Providing input and output names sets the display names for values # within the model's graph. Setting these does not change the semantics # of the graph; it is only for readability. # # The inputs to the network consist of the flat list of inputs (i.e. # the values you would pass to the forward() method) followed by the # flat list of parameters. You can partially specify names, i.e. provide # a list here shorter than the number of inputs to the model, and we will # only set that subset of names, starting from the beginning. input_names = [ "actual_input_1" ] + [ "learned_%d" % i for i in range(16) ] output_names = [ "output1" ] torch.onnx.export(model, dummy_input, "alexnet.onnx", verbose=True, input_names=input_names, output_names=output_names) The resulting `alexnet.onnx` is a binary protobuf file which contains both the network structure and parameters of the model you exported (in this case, AlexNet). The keyword argument `verbose=True` causes the exporter to print out a human-readable representation of the network: # These are the inputs and parameters to the network, which have taken on # the names we specified earlier. graph(%actual_input_1 : Float(10, 3, 224, 224) %learned_0 : Float(64, 3, 11, 11) %learned_1 : Float(64) %learned_2 : Float(192, 64, 5, 5) %learned_3 : Float(192) # ---- omitted for brevity ---- %learned_14 : Float(1000, 4096) %learned_15 : Float(1000)) { # Every statement consists of some output tensors (and their types), # the operator to be run (with its attributes, e.g., kernels, strides, # etc.), its input tensors (%actual_input_1, %learned_0, %learned_1) %17 : Float(10, 64, 55, 55) = onnx::Conv[dilations=[1, 1], group=1, kernel_shape=[11, 11], pads=[2, 2, 2, 2], strides=[4, 4]](%actual_input_1, %learned_0, %learned_1), scope: AlexNet/Sequential[features]/Conv2d[0] %18 : Float(10, 64, 55, 55) = onnx::Relu(%17), scope: AlexNet/Sequential[features]/ReLU[1] %19 : Float(10, 64, 27, 27) = onnx::MaxPool[kernel_shape=[3, 3], pads=[0, 0, 0, 0], strides=[2, 2]](%18), scope: AlexNet/Sequential[features]/MaxPool2d[2] # ---- omitted for brevity ---- %29 : Float(10, 256, 6, 6) = onnx::MaxPool[kernel_shape=[3, 3], pads=[0, 0, 0, 0], strides=[2, 2]](%28), scope: AlexNet/Sequential[features]/MaxPool2d[12] # Dynamic means that the shape is not known. 
This may be because of a # limitation of our implementation (which we would like to fix in a # future release) or shapes which are truly dynamic. %30 : Dynamic = onnx::Shape(%29), scope: AlexNet %31 : Dynamic = onnx::Slice[axes=[0], ends=[1], starts=[0]](%30), scope: AlexNet %32 : Long() = onnx::Squeeze[axes=[0]](%31), scope: AlexNet %33 : Long() = onnx::Constant[value={9216}](), scope: AlexNet # ---- omitted for brevity ---- %output1 : Float(10, 1000) = onnx::Gemm[alpha=1, beta=1, broadcast=1, transB=1](%45, %learned_14, %learned_15), scope: AlexNet/Sequential[classifier]/Linear[6] return (%output1); } You can also verify the protobuf using the [ONNX](https://github.com/onnx/onnx/) library. You can install `ONNX` with conda: conda install -c conda-forge onnx Then, you can run: import onnx # Load the ONNX model model = onnx.load("alexnet.onnx") # Check that the IR is well formed onnx.checker.check_model(model) # Print a human readable representation of the graph onnx.helper.printable_graph(model.graph) To run the exported script with [caffe2](https://caffe2.ai/), you will need to install `caffe2`: If you don’t have one already, Please [follow the install instructions](https://caffe2.ai/docs/getting-started.html). Once these are installed, you can use the backend for Caffe2: # ...continuing from above import caffe2.python.onnx.backend as backend import numpy as np rep = backend.prepare(model, device="CUDA:0") # or "CPU" # For the Caffe2 backend: # rep.predict_net is the Caffe2 protobuf for the network # rep.workspace is the Caffe2 workspace for the network # (see the class caffe2.python.onnx.backend.Workspace) outputs = rep.run(np.random.randn(10, 3, 224, 224).astype(np.float32)) # To run networks with more than one input, pass a tuple # rather than a single numpy ndarray. print(outputs[0]) You can also run the exported model with [ONNX Runtime](https://github.com/microsoft/onnxruntime), you will need to install `ONNX Runtime`: please [follow these instructions](https://github.com/microsoft/onnxruntime#installation). Once these are installed, you can use the backend for ONNX Runtime: # ...continuing from above import onnxruntime as ort ort_session = ort.InferenceSession('alexnet.onnx') outputs = ort_session.run(None, {'actual_input_1': np.random.randn(10, 3, 224, 224).astype(np.float32)}) print(outputs[0]) Here is another [tutorial of exporting the SuperResolution model to ONNX.](https://pytorch.org/tutorials/advanced/super_resolution_with_onnxruntime.html). In the future, there will be backends for other frameworks as well. ## Tracing vs Scripting The ONNX exporter can be both _trace-based_ and _script-based_ exporter. * _trace-based_ means that it operates by executing your model once, and exporting the operators which were actually run during this run. This means that if your model is dynamic, e.g., changes behavior depending on input data, the export won’t be accurate. Similarly, a trace is likely to be valid only for a specific input size (which is one reason why we require explicit inputs on tracing.) We recommend examining the model trace and making sure the traced operators look reasonable. If your model contains control flows like for loops and if conditions, _trace-based_ exporter will unroll the loops and if conditions, exporting a static graph that is exactly the same as this run. If you want to export your model with dynamic control flows, you will need to use the _script-based_ exporter. * _script-based_ means that the model you are trying to export is a [ScriptModule](jit). 
`ScriptModule` is the core data structure in `TorchScript`, and `TorchScript` is a subset of Python language, that creates serializable and optimizable models from PyTorch code. We allow mixing tracing and scripting. You can compose tracing and scripting to suit the particular requirements of a part of a model. Checkout this example: import torch # Trace-based only class LoopModel(torch.nn.Module): def forward(self, x, y): for i in range(y): x = x + i return x model = LoopModel() dummy_input = torch.ones(2, 3, dtype=torch.long) loop_count = torch.tensor(5, dtype=torch.long) torch.onnx.export(model, (dummy_input, loop_count), 'loop.onnx', verbose=True) With _trace-based_ exporter, we get the result ONNX graph which unrolls the for loop: graph(%0 : Long(2, 3), %1 : Long()): %2 : Tensor = onnx::Constant[value={1}]() %3 : Tensor = onnx::Add(%0, %2) %4 : Tensor = onnx::Constant[value={2}]() %5 : Tensor = onnx::Add(%3, %4) %6 : Tensor = onnx::Constant[value={3}]() %7 : Tensor = onnx::Add(%5, %6) %8 : Tensor = onnx::Constant[value={4}]() %9 : Tensor = onnx::Add(%7, %8) return (%9) To utilize _script-based_ exporter for capturing the dynamic loop, we can write the loop in script, and call it from the regular nn.Module: # Mixing tracing and scripting @torch.jit.script def loop(x, y): for i in range(int(y)): x = x + i return x class LoopModel2(torch.nn.Module): def forward(self, x, y): return loop(x, y) model = LoopModel2() dummy_input = torch.ones(2, 3, dtype=torch.long) loop_count = torch.tensor(5, dtype=torch.long) torch.onnx.export(model, (dummy_input, loop_count), 'loop.onnx', verbose=True, input_names=['input_data', 'loop_range']) Now the exported ONNX graph becomes: graph(%input_data : Long(2, 3), %loop_range : Long()): %2 : Long() = onnx::Constant[value={1}](), scope: LoopModel2/loop %3 : Tensor = onnx::Cast[to=9](%2) %4 : Long(2, 3) = onnx::Loop(%loop_range, %3, %input_data), scope: LoopModel2/loop # custom_loop.py:240:5 block0(%i.1 : Long(), %cond : bool, %x.6 : Long(2, 3)): %8 : Long(2, 3) = onnx::Add(%x.6, %i.1), scope: LoopModel2/loop # custom_loop.py:241:13 %9 : Tensor = onnx::Cast[to=9](%2) -> (%9, %8) return (%4) The dynamic control flow is captured correctly. We can verify in backends with different loop range. import caffe2.python.onnx.backend as backend import numpy as np import onnx model = onnx.load('loop.onnx') rep = backend.prepare(model) outputs = rep.run((dummy_input.numpy(), np.array(9).astype(np.int64))) print(outputs[0]) #[[37 37 37] # [37 37 37]] import onnxruntime as ort ort_sess = ort.InferenceSession('loop.onnx') outputs = ort_sess.run(None, {'input_data': dummy_input.numpy(), 'loop_range': np.array(9).astype(np.int64)}) print(outputs) #[array([[37, 37, 37], # [37, 37, 37]], dtype=int64)] To avoid exporting a variable scalar tensor as a fixed value constant as part of the ONNX model, please avoid use of `torch.Tensor.item()`. Torch supports implicit cast of single-element tensors to numbers. E.g.: class LoopModel(torch.nn.Module): def forward(self, x, y): res = [] arr = x.split(2, 0) for i in range(int(y)): res += [arr[i].sum(0, False)] return torch.stack(res) model = torch.jit.script(LoopModel()) inputs = (torch.randn(16), torch.tensor(8)) out = model(*inputs) torch.onnx.export(model, inputs, 'loop_and_list.onnx', opset_version=11, example_outputs=out) ## Write PyTorch model in Torch way PyTorch models can be written using numpy manipulations, but this is not proper when we convert to the ONNX model. 
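For example, the two modules below compute the same concatenation, but only the second one traces into a graph that stays correct for new inputs (a minimal sketch; the module names are illustrative):

import numpy as np
import torch

class NumpyConcat(torch.nn.Module):
    def forward(self, x, y):
        # The round-trip through numpy is invisible to the tracer, so the
        # concatenated values computed from the example inputs are baked
        # into the exported graph as a constant.
        return torch.from_numpy(np.concatenate((x.numpy(), y.numpy()), axis=1))

class TorchConcat(torch.nn.Module):
    def forward(self, x, y):
        # torch.cat is recorded as a real operator and exports correctly.
        return torch.cat((x, y), dim=1)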
The trace-based exporter records numpy values as constant nodes, so the exported graph computes the wrong result when the input changes. The PyTorch model therefore needs to be implemented with torch operators. For example, do not use numpy operators on numpy arrays:

np.concatenate((x, y, z), axis=1)

and do not convert to numpy types:

y = x.astype(np.int)

Always use torch tensors and torch operators: torch.cat, etc. In addition, the Dropout layer needs to be defined in the constructor (`__init__`) so that inference handles it properly, i.e.:

class MyModule(nn.Module):
    def __init__(self):
        super().__init__()
        self.dropout = nn.Dropout(0.5)

    def forward(self, x):
        x = self.dropout(x)
        return x

## Using dictionaries to handle Named Arguments as model inputs

There are two ways to handle models whose inputs include named parameters or keyword arguments:

* The first method is to pass all the inputs in the same order as required by the model and pass None values for the keyword arguments that do not require a value to be passed.
* The second and more intuitive method is to represent the keyword arguments as key-value pairs, where the key is the name of the argument in the model signature and the value is the argument to be passed.

For example, in the model:

class Model(torch.nn.Module):
    def forward(self, x, y=None, z=None):
        if y is not None:
            return x + y
        if z is not None:
            return x + z
        return x

model = Model()
x = torch.randn(2, 3)
z = torch.randn(2, 3)

There are two ways of exporting the model:

* Not using a dictionary for the keyword arguments and passing all the inputs in the same order as required by the model:

torch.onnx.export(model, (x, None, z), 'test.onnx')

* Using a dictionary to represent the keyword arguments. This dictionary is always passed in addition to the non-keyword arguments and is always the last argument in the args tuple:

torch.onnx.export(model, (x, {'y': None, 'z': z}), 'test.onnx')

For cases in which there are no keyword arguments, models can be exported with either an empty dictionary or no dictionary. For example,

torch.onnx.export(model, (x, {}), 'test.onnx')

or

torch.onnx.export(model, (x, ), 'test.onnx')

The exception to this rule is the case in which the last input is itself of a dictionary type. In that case it is mandatory to pass an empty dictionary as the last argument in the args tuple. For example,

class Model(torch.nn.Module):
    def forward(self, k, x):
        ...
        return x

model = Model()
k = torch.randn(2, 3)
x = {torch.tensor(1.): torch.randn(2, 3)}

Without the empty dictionary, the export call would assume that the 'x' input is the optional dictionary of named arguments. To prevent this from being an issue, a constraint is placed requiring an empty dictionary as the last input in the args tuple in such cases. The new call would look like this:

torch.onnx.export(model, (k, x, {}), 'test.onnx')

## Indexing

Tensor indexing in PyTorch is very flexible and complicated. There are two categories of indexing, and both are largely supported in export today. If you are experiencing issues exporting indexing that belongs to the supported patterns below, please double check that you are exporting with the latest opset (opset_version=12).

### Getter

This type of indexing occurs on the RHS. Export is supported for ONNX opset version >= 9. E.g.:

data = torch.randn(3, 4)
index = torch.tensor([1, 2])

# RHS indexing is supported in ONNX opset >= 11.
class RHSIndexing(torch.nn.Module): def forward(self, data, index): return data[index] out = RHSIndexing()(data, index) torch.onnx.export(RHSIndexing(), (data, index), 'indexing.onnx', opset_version=9) # onnxruntime import onnxruntime sess = onnxruntime.InferenceSession('indexing.onnx') out_ort = sess.run(None, { sess.get_inputs()[0].name: data.numpy(), sess.get_inputs()[1].name: index.numpy(), }) assert torch.all(torch.eq(out, torch.tensor(out_ort))) Below is the list of supported patterns for RHS indexing. # Scalar indices data[0, 1] # Slice indices data[:3] # Tensor indices data[torch.tensor([[1, 2], [2, 3]])] data[torch.tensor([2, 3]), torch.tensor([1, 2])] data[torch.tensor([[1, 2], [2, 3]]), torch.tensor([2, 3])] data[torch.tensor([2, 3]), :, torch.tensor([1, 2])] # Ellipsis # Not supported in scripting # i.e. torch.jit.script(model) will fail if model contains this pattern. # Export is supported under tracing # i.e. torch.onnx.export(model) data[...] # The combination of above data[2, ..., torch.tensor([2, 1, 3]), 2:4, torch.tensor([[1], [2]])] # Boolean mask (supported for ONNX opset version >= 11) data[data != 1] And below is the list of unsupported patterns for RHS indexing. # Tensor indices that includes negative values. data[torch.tensor([[1, 2], [2, -3]]), torch.tensor([-2, 3])] ### Setter In code, this type of indexing occurs on the LHS. Export is supported for ONNX opset version >= 11. E.g.: data = torch.zeros(3, 4) new_data = torch.arange(4).to(torch.float32) # LHS indexing is supported in ONNX opset >= 11. class LHSIndexing(torch.nn.Module): def forward(self, data, new_data): data[1] = new_data return data out = LHSIndexing()(data, new_data) data = torch.zeros(3, 4) new_data = torch.arange(4).to(torch.float32) torch.onnx.export(LHSIndexing(), (data, new_data), 'inplace_assign.onnx', opset_version=11) # onnxruntime import onnxruntime sess = onnxruntime.InferenceSession('inplace_assign.onnx') out_ort = sess.run(None, { sess.get_inputs()[0].name: torch.zeros(3, 4).numpy(), sess.get_inputs()[1].name: new_data.numpy(), }) assert torch.all(torch.eq(out, torch.tensor(out_ort))) Below is the list of supported patterns for LHS indexing. # Scalar indices data[0, 1] = new_data # Slice indices data[:3] = new_data # Tensor indices # If more than one tensor are used as indices, only consecutive 1-d tensor indices are supported. data[torch.tensor([[1, 2], [2, 3]])] = new_data data[torch.tensor([2, 3]), torch.tensor([1, 2])] = new_data # Ellipsis # Not supported to export in script modules # i.e. torch.onnx.export(torch.jit.script(model)) will fail if model contains this pattern. # Export is supported under tracing # i.e. torch.onnx.export(model) data[...] = new_data # The combination of above data[2, ..., torch.tensor([2, 1, 3]), 2:4] += update # Boolean mask data[data != 1] = new_data And below is the list of unsupported patterns for LHS indexing. # Multiple tensor indices if any has rank >= 2 data[torch.tensor([[1, 2], [2, 3]]), torch.tensor([2, 3])] = new_data # Multiple tensor indices that are not consecutive data[torch.tensor([2, 3]), :, torch.tensor([1, 2])] = new_data # Tensor indices that includes negative values. data[torch.tensor([1, -2]), torch.tensor([-2, 3])] = new_data If you are experiencing issues exporting indexing that belongs to the above supported patterns, please double check that you are exporting with the latest opset (opset_version=12). ## TorchVision support All TorchVision models, except for quantized versions, are exportable to ONNX. 
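For instance, a standard classification model can be exported directly; a minimal sketch (the model choice, file name, input shape, and opset version here are arbitrary):

import torch
import torchvision

# Any non-quantized TorchVision model can be exported the same way.
model = torchvision.models.resnet18(pretrained=True).eval()
dummy_input = torch.randn(1, 3, 224, 224)
torch.onnx.export(model, dummy_input, "resnet18.onnx", opset_version=11)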
More details can be found in [TorchVision](torchvision/models). ## Limitations * Only tuples, lists and Variables are supported as JIT inputs/outputs. Dictionaries and strings are also accepted but their usage is not recommended. Users need to verify their dict inputs carefully, and keep in mind that dynamic lookups are not available. * PyTorch and ONNX backends(Caffe2, ONNX Runtime, etc) often have implementations of operators with some numeric differences. Depending on model structure, these differences may be negligible, but they can also cause major divergences in behavior (especially on untrained models.) We allow Caffe2 to call directly to Torch implementations of operators, to help you smooth over these differences when precision is important, and to also document these differences. ## Supported operators The following operators are supported: * BatchNorm * ConstantPadNd * Conv * Dropout * Embedding (no optional arguments supported) * EmbeddingBag * FeatureDropout (training mode not supported) * Index * MaxPool1d * MaxPool2d * MaxPool3d * RNN * abs * absolute * acos * adaptive_avg_pool1d * adaptive_avg_pool2d * adaptive_avg_pool3d * adaptive_max_pool1d * adaptive_max_pool2d * adaptive_max_pool3d * add (nonzero alpha not supported) * addmm * and * arange * argmax * argmin * asin * atan * avg_pool1d * avg_pool2d * avg_pool2d * avg_pool3d * as_strided * baddbmm * bitshift * cat * ceil * celu * clamp * clamp_max * clamp_min * concat * copy * cos * cumsum * det * dim_arange * div * dropout * einsum * elu * empty * empty_like * eq * erf * exp * expand * expand_as * eye * flatten * floor * floor_divide * frobenius_norm * full * full_like * gather * ge * gelu * glu * group_norm * gt * hardswish * hardtanh * im2col * index_copy * index_fill * index_put * index_select * instance_norm * interpolate * isnan * KLDivLoss * layer_norm * le * leaky_relu * len * log * log1p * log2 * log_sigmoid * log_softmax * logdet * logsumexp * lt * masked_fill * masked_scatter * masked_select * max * mean * min * mm * mul * multinomial * narrow * ne * neg * new_empty * new_full * new_zeros * nll_loss * nonzero * norm * ones * ones_like * or * permute * pixel_shuffle * pow * prelu (single weight shared among input channels not supported) * prod * rand * randn * randn_like * reciprocal * reflection_pad * relu * repeat * replication_pad * reshape * reshape_as * round * rrelu * rsqrt * rsub * scalar_tensor * scatter * scatter_add * select * selu * sigmoid * sign * sin * size * slice * softmax * softplus * sort * split * sqrt * squeeze * stack * std * sub (nonzero alpha not supported) * sum * t * tan * tanh * threshold (non-zero threshold/non-zero value not supported) * to * topk * transpose * true_divide * type_as * unbind * unfold (experimental support with ATen-Caffe2 integration) * unique * unsqueeze * upsample_nearest1d * upsample_nearest2d * upsample_nearest3d * view * weight_norm * where * zeros * zeros_like The operator set above is sufficient to export the following models: * AlexNet * DCGAN * DenseNet * Inception (warning: this model is highly sensitive to changes in operator implementation) * ResNet * SuperResolution * VGG * [word_language_model](https://github.com/pytorch/examples/tree/master/word_language_model) ## Adding support for operators Adding export support for operators is an _advance usage_. To achieve this, developers need to touch the source code of PyTorch. Please follow the [instructions](https://github.com/pytorch/pytorch#from-source) for installing PyTorch from source. 
If the wanted operator is standardized in ONNX, adding export support for it (i.e., adding a symbolic function for the operator) should be easy. To confirm whether the operator is standardized, please check the [ONNX operator list](https://github.com/onnx/onnx/blob/master/docs/Operators.md).

### ATen operators

If the operator is an ATen operator, which means you can find the declaration of the function in `torch/csrc/autograd/generated/VariableType.h` (available in the generated code in the PyTorch install dir), you should add the symbolic function in `torch/onnx/symbolic_opset<version>.py` and follow the instructions listed below:

* Define the symbolic function in `torch/onnx/symbolic_opset<version>.py`, for example [torch/onnx/symbolic_opset9.py](https://github.com/pytorch/pytorch/blob/master/torch/onnx/symbolic_opset9.py). Make sure the function has the same name as the ATen operator/function defined in `VariableType.h`.
* The first parameter is always the exported ONNX graph. Parameter names must EXACTLY match the names in `VariableType.h`, because dispatch is done with keyword arguments.
* Parameter ordering does NOT necessarily match what is in `VariableType.h`: tensors (inputs) always come first, followed by non-tensor arguments.
* In the symbolic function, if the operator is already standardized in ONNX, we only need to create a node to represent the ONNX operator in the graph.
* If an input argument is a tensor, but ONNX asks for a scalar, we have to do the conversion explicitly. The helper function `_scalar` can convert a scalar tensor into a Python scalar, and `_if_scalar_type_as` can turn a Python scalar into a PyTorch tensor.

### Non-ATen operators

If the operator is a non-ATen operator, the symbolic function has to be added in the corresponding PyTorch Function class. Please follow these instructions:

* Create a symbolic function named `symbolic` in the corresponding Function class.
* The first parameter is always the exported ONNX graph.
* Parameter names except the first must EXACTLY match the names in `forward`.
* The output tuple size must match the outputs of `forward`.
* In the symbolic function, if the operator is already standardized in ONNX, we just need to create a node to represent the ONNX operator in the graph.

Symbolic functions should be implemented in Python. All of these functions interact with Python methods which are implemented via C++-Python bindings, but intuitively the interface they provide looks like this:

def operator/symbolic(g, *inputs):
    """
    Modifies Graph (e.g., using "op"), adding the ONNX operations representing this PyTorch function, and returning a Value or tuple of Values specifying the ONNX outputs whose values correspond to the original PyTorch return values of the autograd Function (or None if an output is not supported by ONNX).
Args: g (Graph): graph to write the ONNX representation into inputs (Value...): list of values representing the variables which contain the inputs for this function """ class Value(object): """Represents an intermediate tensor value computed in ONNX.""" def type(self): """Returns the Type of the value.""" class Type(object): def sizes(self): """Returns a tuple of ints representing the shape of a tensor this describes.""" class Graph(object): def op(self, opname, *inputs, **attrs): """ Create an ONNX operator 'opname', taking 'args' as inputs and attributes 'kwargs' and add it as a node to the current graph, returning the value representing the single output of this operator (see the `outputs` keyword argument for multi-return nodes). The set of operators and the inputs/attributes they take is documented at https://github.com/onnx/onnx/blob/master/docs/Operators.md Args: opname (string): The ONNX operator name, e.g., `Abs` or `Add`. args (Value...): The inputs to the operator; usually provided as arguments to the `symbolic` definition. kwargs: The attributes of the ONNX operator, with keys named according to the following convention: `alpha_f` indicates the `alpha` attribute with type `f`. The valid type specifiers are `f` (float), `i` (int), `s` (string) or `t` (Tensor). An attribute specified with type float accepts either a single float, or a list of floats (e.g., you would say `dims_i` for a `dims` attribute that takes a list of integers). outputs (int, optional): The number of outputs this operator returns; by default an operator is assumed to return a single output. If `outputs` is greater than one, this functions returns a tuple of output `Value`, representing each output of the ONNX operator in positional. """ The ONNX graph C++ definition is in `torch/csrc/jit/ir/ir.h`. Here is an example of handling missing symbolic function for `elu` operator. We try to export the model and see the error message as below: UserWarning: ONNX export failed on elu because torch.onnx.symbolic_opset9.elu does not exist RuntimeError: ONNX export failed: Couldn't export operator elu The export fails because PyTorch does not support exporting `elu` operator. We find `virtual Tensor elu(const Tensor & input, Scalar alpha, bool inplace) const override;` in `VariableType.h`. This means `elu` is an ATen operator. We check the [ONNX operator list](https://github.com/onnx/onnx/blob/master/docs/Operators.md), and confirm that `Elu` is standardized in ONNX. We add the following lines to `symbolic_opset9.py`: def elu(g, input, alpha, inplace=False): return g.op("Elu", input, alpha_f=_scalar(alpha)) Now PyTorch is able to export `elu` operator. There are more examples in [symbolic_opset9.py](https://github.com/pytorch/pytorch/blob/master/torch/onnx/symbolic_opset9.py), [symbolic_opset10.py](https://github.com/pytorch/pytorch/blob/master/torch/onnx/symbolic_opset10.py). The interface for specifying operator definitions is experimental; adventurous users should note that the APIs will probably change in a future interface. ### Custom operators Following this tutorial [Extending TorchScript with Custom C++ Operators](https://pytorch.org/tutorials/advanced/torch_script_custom_ops.html), you can create and register your own custom ops implementation in PyTorch. 
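Before the export example below can run, the compiled extension that registers `custom_ops::foo_forward` has to be loaded into the current process, for example (a minimal sketch; the library path is hypothetical and depends on how the tutorial's extension was built):

import torch

# Loading the shared library registers custom_ops::foo_forward with the
# PyTorch dispatcher, making torch.ops.custom_ops.foo_forward callable.
torch.ops.load_library("build/libcustom_ops.so")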
Here's how to export such a model to ONNX:

# Create custom symbolic function
from torch.onnx.symbolic_helper import parse_args

@parse_args('v', 'v', 'f', 'i')
def symbolic_foo_forward(g, input1, input2, attr1, attr2):
    return g.op("Foo", input1, input2, attr1_f=attr1, attr2_i=attr2)

# Register custom symbolic function
from torch.onnx import register_custom_op_symbolic
register_custom_op_symbolic('custom_ops::foo_forward', symbolic_foo_forward, 9)

class FooModel(torch.nn.Module):
    def __init__(self, attr1, attr2):
        super(FooModel, self).__init__()
        self.attr1 = attr1
        self.attr2 = attr2

    def forward(self, input1, input2):
        # Calling custom op
        return torch.ops.custom_ops.foo_forward(input1, input2, self.attr1, self.attr2)

model = FooModel(attr1, attr2)
torch.onnx.export(model, (dummy_input1, dummy_input2), 'model.onnx', custom_opsets={"custom_domain": 2})

Depending on the custom operator, you can export it as one or a combination of existing ONNX ops, or as a custom op in ONNX. In the latter case, you can specify the custom domain and version (custom opset) using the `custom_opsets` dictionary at export. If not explicitly specified, the custom opset version is set to 1 by default. If you use custom ONNX ops, you will need to extend the backend of your choice with a matching custom op implementation, e.g. [Caffe2 custom ops](https://caffe2.ai/docs/custom-operators.html), [ONNX Runtime custom ops](https://github.com/microsoft/onnxruntime/blob/master/docs/AddingCustomOp.md).

## Operator Export Type

Exporting models with unsupported ONNX operators can be achieved using the `operator_export_type` flag in the export API. This flag is useful when users try to export ATen and non-ATen operators that are not registered and supported in ONNX.

### ONNX

This mode is used to export all operators as regular ONNX operators. This is the default `operator_export_type` mode. Example torch ir graph:

graph(%0 : Float(2, 3, 4, strides=[12, 4, 1])):
  %3 : Float(2, 3, 4, strides=[12, 4, 1]) = aten::exp(%0)
  %4 : Float(2, 3, 4, strides=[12, 4, 1]) = aten::div(%0, %3)
  return (%4)

is exported as:

graph(%0 : Float(2, 3, 4, strides=[12, 4, 1])):
  %1 : Float(2, 3, 4, strides=[12, 4, 1]) = onnx::Exp(%0)
  %2 : Float(2, 3, 4, strides=[12, 4, 1]) = onnx::Div(%0, %1)
  return (%2)

### ONNX_ATEN

This mode is used to export all operators as ATen ops, avoiding conversion to ONNX. Example torch ir graph:

graph(%0 : Float(2, 3, 4, strides=[12, 4, 1])):
  %3 : Float(2, 3, 4, strides=[12, 4, 1]) = aten::exp(%0)
  %4 : Float(2, 3, 4, strides=[12, 4, 1]) = aten::div(%0, %3)
  return (%4)

is exported as:

graph(%0 : Float(2, 3, 4, strides=[12, 4, 1])):
  %1 : Float(2, 3, 4, strides=[12, 4, 1]) = aten::ATen[operator="exp"](%0)
  %2 : Float(2, 3, 4, strides=[12, 4, 1]) = aten::ATen[operator="div"](%0, %1)
  return (%2)

### ONNX_ATEN_FALLBACK

This mode falls back to ATen for operators that are not supported in ONNX; supported operators are exported to ONNX regularly. In the following example, aten::triu is not supported in ONNX, so the exporter falls back to ATen for this operator. Example torch ir graph:

graph(%0 : Float):
  %3 : int = prim::Constant[value=0]()
  %4 : Float = aten::triu(%0, %3) # unsupported op
  %5 : Float = aten::mul(%4, %0) # registered op
  return (%5)

is exported as:

graph(%0 : Float):
  %1 : Long() = onnx::Constant[value={0}]()
  %2 : Float = aten::ATen[operator="triu"](%0, %1) # unsupported op
  %3 : Float = onnx::Mul(%2, %0) # registered op
  return (%3)

### RAW

To export the raw IR.
Example torch ir graph: graph(%x.1 : Float(1, strides=[1])): %1 : Tensor = aten::exp(%x.1) %2 : Tensor = aten::div(%x.1, %1) %y.1 : Tensor[] = prim::ListConstruct(%2) return (%y.1) is exported as: graph(%x.1 : Float(1, strides=[1])): %1 : Tensor = aten::exp(%x.1) %2 : Tensor = aten::div(%x.1, %1) %y.1 : Tensor[] = prim::ListConstruct(%2) return (%y.1) ### ONNX_FALLTHROUGH This mode can be used to export any operator (ATen or non-ATen) that is not registered and supported in ONNX. Exported falls through and exports the operator as is, as custom op. Exporting custom operators enables users to register and implement the operator as part of their runtime backend. Example torch ir graph: graph(%0 : Float(2, 3, 4, strides=[12, 4, 1]), %1 : Float(2, 3, 4, strides=[12, 4, 1])): %6 : Float(2, 3, 4, strides=[12, 4, 1]) = foo_namespace::bar(%0, %1) # custom op %7 : Float(2, 3, 4, strides=[12, 4, 1]) = aten::div(%6, %0) # registered op return (%7)) is exported as: graph(%0 : Float(2, 3, 4, strides=[12, 4, 1]), %1 : Float(2, 3, 4, strides=[12, 4, 1])): %2 : Float(2, 3, 4, strides=[12, 4, 1]) = foo_namespace::bar(%0, %1) # custom op %3 : Float(2, 3, 4, strides=[12, 4, 1]) = onnx::Div(%2, %0) # registered op return (%3 ## Frequently Asked Questions Q: I have exported my lstm model, but its input size seems to be fixed? The tracer records the example inputs shape in the graph. In case the model should accept inputs of dynamic shape, you can utilize the parameter `dynamic_axes` in export api. layer_count = 4 model = nn.LSTM(10, 20, num_layers=layer_count, bidirectional=True) model.eval() with torch.no_grad(): input = torch.randn(5, 3, 10) h0 = torch.randn(layer_count * 2, 3, 20) c0 = torch.randn(layer_count * 2, 3, 20) output, (hn, cn) = model(input, (h0, c0)) # default export torch.onnx.export(model, (input, (h0, c0)), 'lstm.onnx') onnx_model = onnx.load('lstm.onnx') # input shape [5, 3, 10] print(onnx_model.graph.input[0]) # export with `dynamic_axes` torch.onnx.export(model, (input, (h0, c0)), 'lstm.onnx', input_names=['input', 'h0', 'c0'], output_names=['output', 'hn', 'cn'], dynamic_axes={'input': {0: 'sequence'}, 'output': {0: 'sequence'}}) onnx_model = onnx.load('lstm.onnx') # input shape ['sequence', 3, 10] print(onnx_model.graph.input[0]) Q: How to export models with loops in it? Please checkout Tracing vs Scripting. Q: Does ONNX support implicit scalar datatype casting? No, but the exporter will try to handle that part. Scalars are converted to constant tensors in ONNX. The exporter will try to figure out the right datatype for scalars. However for cases that it failed to do so, you will need to manually provide the datatype information. This often happens with scripted models, where the datatypes are not recorded. We are trying to improve the datatype propagation in the exporter such that manual changes are not required in the future. class ImplicitCastType(torch.jit.ScriptModule): @torch.jit.script_method def forward(self, x): # Exporter knows x is float32, will export '2' as float32 as well. y = x + 2 # Without type propagation, exporter doesn't know the datatype of y. # Thus '3' is exported as int64 by default. return y + 3 # The following will export correctly. # return y + torch.tensor([3], dtype=torch.float32) x = torch.tensor([1.0], dtype=torch.float32) torch.onnx.export(ImplicitCastType(), x, 'models/implicit_cast.onnx', example_outputs=ImplicitCastType()(x)) Q: Is tensor in-place indexed assignment like `data[index] = new_data` supported? 
Yes, this is supported for ONNX opset version >= 11. Please check out Indexing.

Q: Is tensor list exportable to ONNX?

Yes, this is supported for ONNX opset version >= 11. ONNX introduced the concept of Sequence in opset 11. Similar to a list, a Sequence is a data type that contains an arbitrary number of Tensors. Associated operators are also introduced in ONNX, such as SequenceInsert, SequenceAt, etc. However, in-place list append within loops is not exportable to ONNX. To implement this, please use the in-place add operator. E.g.:

class ListLoopModel(torch.nn.Module):
    def forward(self, x):
        res = []
        res1 = []
        arr = x.split(2, 0)
        res2 = torch.zeros(3, 4, dtype=torch.long)
        for i in range(len(arr)):
            res += [arr[i].sum(0, False)]
            res1 += [arr[-1 - i].sum(0, False)]
            res2 += 1
        return torch.stack(res), torch.stack(res1), res2

model = torch.jit.script(ListLoopModel())
inputs = torch.randn(16)
out = model(inputs)
torch.onnx.export(model, (inputs, ), 'loop_and_list.onnx', opset_version=11, example_outputs=out)

# onnxruntime
import onnxruntime
sess = onnxruntime.InferenceSession('loop_and_list.onnx')
out_ort = sess.run(None, {
    sess.get_inputs()[0].name: inputs.numpy(),
})
assert all(torch.allclose(o, torch.tensor(o_ort)) for o, o_ort in zip(out, out_ort))

## Use external data format

The `use_external_data_format` argument in the export API enables export of models in the ONNX external data format. With this option enabled, the exporter stores some model parameters in external binary files rather than in the ONNX file itself. These external binary files are stored in the same location as the ONNX file. The argument ‘f’ must be a string specifying the location of the model.

model = torchvision.models.mobilenet_v2(pretrained=True)
input = torch.randn(2, 3, 224, 224, requires_grad=True)
torch.onnx.export(model, (input, ), './large_model.onnx', use_external_data_format=True)

This argument enables export of large models to ONNX. Models larger than 2GB cannot be exported in a single file because of the protobuf size limit. Users should set `use_external_data_format` to `True` to successfully export such models.

## Training

The `training` argument in the export API allows users to export models in a training-friendly mode. `TrainingMode.TRAINING` exports the model in a training-friendly mode that avoids certain model optimizations which might interfere with model parameter training. `TrainingMode.PRESERVE` exports the model in inference mode if `model.training` is `False`; otherwise, it exports the model in a training-friendly mode. The default mode for this argument is `TrainingMode.EVAL`, which exports the model in inference mode.

## Functions

`torch.onnx.export(model, args, f, export_params=True, verbose=False, training=TrainingMode.EVAL, input_names=None, output_names=None, aten=False, export_raw_ir=False, operator_export_type=None, opset_version=None, _retain_param_name=True, do_constant_folding=True, example_outputs=None, strip_doc_string=True, dynamic_axes=None, keep_initializers_as_inputs=None, custom_opsets=None, enable_onnx_checker=True, use_external_data_format=False)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/onnx.html#export)

Export a model into ONNX format. This exporter runs your model once in order to get a trace of its execution to be exported; at the moment, it supports a limited set of dynamic models (e.g., RNNs).

Parameters

* **model** ([torch.nn.Module](generated/torch.nn.module#torch.nn.Module "torch.nn.Module")) – the model to be exported.
* **args** (_tuple of arguments_ or [torch.Tensor](tensors#torch.Tensor "torch.Tensor"), _optionally ending with a dictionary of named arguments_) – the inputs to the model; an optional trailing dictionary maps each named parameter (KEY: str) to its corresponding input (VALUE). args can be structured in either of the following ways:

  1. ONLY A TUPLE OF ARGUMENTS or a torch.Tensor: `args = (x, y, z)`. The inputs to the model, e.g., such that `model(*args)` is a valid invocation of the model. Any non-Tensor arguments will be hard-coded into the exported model; any Tensor arguments will become inputs of the exported model, in the order they occur in args. If args is a Tensor, this is equivalent to having called it with a 1-ary tuple of that Tensor.

  2. A TUPLE OF ARGUMENTS WITH A DICTIONARY OF NAMED PARAMETERS: `args = (x, {'y': input_y, 'z': input_z})`. The inputs to the model are structured as a tuple of non-keyword arguments, with the last value of the tuple being a dictionary of named parameters and the corresponding inputs as key-value pairs. If a certain named argument is not present in the dictionary, it is assigned the default value, or None if a default value is not provided.

  Cases in which a dictionary input is the last input of the args tuple cause a conflict when a dictionary of named parameters is used. The model below provides such an example.

  class Model(torch.nn.Module):
      def forward(self, k, x):
          ...
          return x

  m = Model()
  k = torch.randn(2, 3)
  x = {torch.tensor(1.): torch.randn(2, 3)}

  Previously, the call to the export API would look like

  torch.onnx.export(model, (k, x), 'test.onnx')

  This would work as intended. However, the export function now assumes that the ‘x’ input is intended to represent the optional dictionary of named arguments. To prevent this from being an issue, a constraint is placed: provide an empty dictionary as the last input in the args tuple in such cases. The new call looks like this:

  torch.onnx.export(model, (k, x, {}), 'test.onnx')

* **f** – a file-like object (has to implement fileno that returns a file descriptor) or a string containing a file name. A binary Protobuf will be written to this file.

* **export_params** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)"), _default True_) – if specified, all parameters will be exported. Set this to False if you want to export an untrained model. In this case, the exported model will first take all of its parameters as arguments, with the ordering as specified by `model.state_dict().values()`

* **verbose** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)"), _default False_) – if specified, we will print out a debug description of the trace being exported.

* **training** (_enum_, _default TrainingMode.EVAL_) – TrainingMode.EVAL: export the model in inference mode. TrainingMode.PRESERVE: export the model in inference mode if model.training is False and in a training-friendly mode if model.training is True. TrainingMode.TRAINING: export the model in a training-friendly mode.

* **input_names** (_list of strings_, _default empty list_) – names to assign to the input nodes of the graph, in order

* **output_names** (_list of strings_, _default empty list_) – names to assign to the output nodes of the graph, in order

* **aten** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)"), _default False_) – [DEPRECATED.
use operator_export_type] export the model in ATen mode. If using ATen mode, all the ops originally exported by the functions in symbolic_opset.py are exported as ATen ops.

* **export_raw_ir** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)"), _default False_) – [DEPRECATED. use operator_export_type] export the internal IR directly instead of converting it to ONNX ops.

* **operator_export_type** (_enum_, _default OperatorExportTypes.ONNX_) –

  OperatorExportTypes.ONNX: All ops are exported as regular ONNX ops (with ONNX namespace).

  OperatorExportTypes.ONNX_ATEN: All ops are exported as ATen ops (with aten namespace).

  OperatorExportTypes.ONNX_ATEN_FALLBACK: If an ATen op is not supported in ONNX or its symbolic is missing, fall back on the ATen op. Registered ops are exported to ONNX regularly. Example graph:

  graph(%0 : Float):
    %3 : int = prim::Constant[value=0]()
    %4 : Float = aten::triu(%0, %3) # missing op
    %5 : Float = aten::mul(%4, %0) # registered op
    return (%5)

  is exported as:

  graph(%0 : Float):
    %1 : Long() = onnx::Constant[value={0}]()
    %2 : Float = aten::ATen[operator="triu"](%0, %1) # missing op
    %3 : Float = onnx::Mul(%2, %0) # registered op
    return (%3)

  In the above example, aten::triu is not supported in ONNX, hence the exporter falls back on this op.

  OperatorExportTypes.RAW: Export raw ir.

  OperatorExportTypes.ONNX_FALLTHROUGH: If an op is not supported in ONNX, fall through and export the operator as is, as a custom ONNX op. Using this mode, the op can be exported and implemented by the user for their runtime backend. Example graph:

  graph(%x.1 : Long(1, strides=[1])):
    %1 : None = prim::Constant()
    %2 : Tensor = aten::sum(%x.1, %1)
    %y.1 : Tensor[] = prim::ListConstruct(%2)
    return (%y.1)

  is exported as:

  graph(%x.1 : Long(1, strides=[1])):
    %1 : Tensor = onnx::ReduceSum[keepdims=0](%x.1)
    %y.1 : Long() = prim::ListConstruct(%1)
    return (%y.1)

  In the above example, prim::ListConstruct is not supported, hence the exporter falls through.

* **opset_version** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)"), _default is 9_) – by default we export the model to the opset version of the onnx submodule. Since ONNX’s latest opset may evolve before the next stable release, by default we export to one stable opset version. Right now, the supported stable opset version is 9. The opset_version must be _onnx_main_opset or in _onnx_stable_opsets, which are defined in torch/onnx/symbolic_helper.py

* **do_constant_folding** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)"), _default True_) – If True, the constant-folding optimization is applied to the model during export. Constant-folding optimization will replace some of the ops that have all constant inputs with pre-computed constant nodes.

* **example_outputs** (_tuple of Tensors_, _default None_) – Model’s example outputs being exported. example_outputs must be provided when exporting a ScriptModule or TorchScript Function.

* **strip_doc_string** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)"), _default True_) – if True, strips the field “doc_string” from the exported model, which contains information about the stack trace.

* **dynamic_axes** (_dict<string, dict<int, string>> or dict<string, list(int)>_, _default empty dict_) – a dictionary to specify dynamic axes of input/output, such that: - KEY: input and/or output names - VALUE: index of dynamic axes for given key and potentially the name to be used for exported dynamic axes.
  In general the value is defined according to one of the following ways, or a combination of both: (1) a list of integers specifying the dynamic axes of the provided input; in this scenario automated names will be generated and applied to the dynamic axes of the provided input/output during export. OR (2) an inner dictionary that specifies a mapping FROM the index of the dynamic axis in the corresponding input/output TO the name that is desired to be applied on such axis of such input/output during export.

  For example, if we have the following shapes for inputs and outputs:

  shape(input_1) = ('b', 3, 'w', 'h')
  shape(input_2) = ('b', 4)
  shape(output) = ('b', 'd', 5)

  Then `dynamic_axes` can be defined in one of the following ways:

  1. ONLY INDICES: `dynamic_axes = {'input_1':[0, 2, 3], 'input_2':[0], 'output':[0, 1]}` where automatic names will be generated for the exported dynamic axes

  2. INDICES WITH CORRESPONDING NAMES: `dynamic_axes = {'input_1':{0:'batch', 2:'width', 3:'height'}, 'input_2':{0:'batch'}, 'output':{0:'batch', 1:'detections'}}` where the provided names will be applied to the exported dynamic axes

  3. MIXED MODE OF (1) and (2): `dynamic_axes = {'input_1':[0, 2, 3], 'input_2':{0:'batch'}, 'output':[0,1]}`

* **keep_initializers_as_inputs** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)"), _default None_) – If True, all the initializers (typically corresponding to parameters) in the exported graph will also be added as inputs to the graph. If False, then initializers are not added as inputs to the graph, and only the non-parameter inputs are added as inputs. This may allow for better optimizations (such as constant folding, etc.) by backends/runtimes that execute these graphs. If unspecified (default None), then the behavior is chosen automatically as follows: if operator_export_type is OperatorExportTypes.ONNX, the behavior is equivalent to setting this argument to False; for other values of operator_export_type, the behavior is equivalent to setting this argument to True. Note that for ONNX opset version < 9, initializers MUST be part of graph inputs. Therefore, if the opset_version argument is set to 8 or lower, this argument will be ignored.

* **custom_opsets** (_dict<string, int>_, _default empty dict_) – A dictionary to indicate the custom opset domain and version at export. If the model contains a custom opset, it is optional to specify the domain and opset version in the dictionary: - KEY: opset domain name - VALUE: opset version. If a custom opset is not provided in this dictionary, the opset version is set to 1 by default.

* **enable_onnx_checker** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)"), _default True_) – If True, the onnx model checker will be run as part of the export, to ensure the exported model is a valid ONNX model.

* **use_external_data_format** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)"), _default False_) – If True, then the model is exported in ONNX external data format, in which case some of the model parameters are stored in external binary files and not in the ONNX model file itself. In this case, the argument ‘f’ must be a string specifying the location of the model, and the external binary files will be stored in the same location specified by ‘f’. If False, then the model is stored in regular format, i.e. model and parameters are all in one file. This argument is ignored for all export types other than ONNX.
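As an illustration of how the arguments above combine, the following sketch exports a torchvision model with a dynamic batch dimension; the model choice, file name, and input shape are arbitrary placeholders, and only arguments documented above are used:

import torch
import torchvision

# Placeholder model and example input; any traceable nn.Module works the same way.
model = torchvision.models.mobilenet_v2(pretrained=True).eval()
dummy_input = torch.randn(1, 3, 224, 224)

torch.onnx.export(
    model,
    (dummy_input,),                       # args: a 1-ary tuple of example inputs
    'mobilenet_v2.onnx',                  # f: where the binary Protobuf is written
    export_params=True,                   # store the trained weights in the exported file
    opset_version=11,                     # target a specific stable ONNX opset
    do_constant_folding=True,             # pre-compute nodes with all-constant inputs
    input_names=['input'],
    output_names=['output'],
    dynamic_axes={'input': {0: 'batch'},  # mark the batch dimension as dynamic
                  'output': {0: 'batch'}})

As in the LSTM example above, the exported file can then be loaded with `onnx.load(...)` to confirm that the batch dimension appears as a named dynamic axis.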
`torch.onnx.export_to_pretty_string(*args, **kwargs)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/onnx.html#export_to_pretty_string)

`torch.onnx.register_custom_op_symbolic(symbolic_name, symbolic_fn, opset_version)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/onnx.html#register_custom_op_symbolic)

`torch.onnx.operators.shape_as_tensor(x)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/onnx/operators.html#shape_as_tensor)

`torch.onnx.select_model_mode_for_export(model, mode)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/onnx.html#select_model_mode_for_export)

A context manager to temporarily set the training mode of ‘model’ to ‘mode’, resetting it when we exit the with-block. A no-op if mode is None. Changed in version 1.6: renamed from set_training.

`torch.onnx.is_in_onnx_export()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/onnx.html#is_in_onnx_export)

Checks whether it is in the middle of the ONNX export. This function returns True in the middle of torch.onnx.export(). torch.onnx.export should be executed with a single thread.

# torch.optim

`torch.optim` is a package implementing various optimization algorithms. Most commonly used methods are already supported, and the interface is general enough that more sophisticated ones can also be easily integrated in the future.

## How to use an optimizer

To use `torch.optim` you have to construct an optimizer object that will hold the current state and will update the parameters based on the computed gradients.

### Constructing it

To construct an `Optimizer` you have to give it an iterable containing the parameters (all should be `Variable` s) to optimize. Then, you can specify optimizer-specific options such as the learning rate, weight decay, etc.

Note

If you need to move a model to GPU via `.cuda()`, please do so before constructing optimizers for it. Parameters of a model after `.cuda()` will be different objects from those before the call. In general, you should make sure that optimized parameters live in consistent locations when optimizers are constructed and used.

Example:

optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
optimizer = optim.Adam([var1, var2], lr=0.0001)

### Per-parameter options

`Optimizer` s also support specifying per-parameter options. To do this, instead of passing an iterable of `Variable` s, pass in an iterable of [`dict`](https://docs.python.org/3/library/stdtypes.html#dict "\(in Python v3.9\)") s. Each of them will define a separate parameter group, and should contain a `params` key, containing a list of parameters belonging to it. Other keys should match the keyword arguments accepted by the optimizers, and will be used as optimization options for this group.

Note

You can still pass options as keyword arguments. They will be used as defaults, in the groups that didn’t override them. This is useful when you only want to vary a single option, while keeping all others consistent between parameter groups.

For example, this is very useful when one wants to specify per-layer learning rates:

optim.SGD([
    {'params': model.base.parameters()},
    {'params': model.classifier.parameters(), 'lr': 1e-3}
], lr=1e-2, momentum=0.9)

This means that `model.base`’s parameters will use the default learning rate of `1e-2`, `model.classifier`’s parameters will use a learning rate of `1e-3`, and a momentum of `0.9` will be used for all parameters.

### Taking an optimization step

All optimizers implement a `step()` method that updates the parameters.
It can be used in two ways:

#### `optimizer.step()`

This is a simplified version supported by most optimizers. The function can be called once the gradients are computed using e.g. `backward()`.

Example:

for input, target in dataset:
    optimizer.zero_grad()
    output = model(input)
    loss = loss_fn(output, target)
    loss.backward()
    optimizer.step()

#### `optimizer.step(closure)`

Some optimization algorithms such as Conjugate Gradient and LBFGS need to reevaluate the function multiple times, so you have to pass in a closure that allows them to recompute your model. The closure should clear the gradients, compute the loss, and return it.

Example:

for input, target in dataset:
    def closure():
        optimizer.zero_grad()
        output = model(input)
        loss = loss_fn(output, target)
        loss.backward()
        return loss
    optimizer.step(closure)

## Algorithms

`class torch.optim.Optimizer(params, defaults)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/optim/optimizer.html#Optimizer)

Base class for all optimizers.

Warning

Parameters need to be specified as collections that have a deterministic ordering that is consistent between runs. Examples of objects that don’t satisfy those properties are sets and iterators over values of dictionaries.

Parameters

* **params** (_iterable_) – an iterable of [`torch.Tensor`](tensors#torch.Tensor "torch.Tensor") s or [`dict`](https://docs.python.org/3/library/stdtypes.html#dict "\(in Python v3.9\)") s. Specifies what Tensors should be optimized.
* **defaults** ([dict](https://docs.python.org/3/library/stdtypes.html#dict "\(in Python v3.9\)")) – a dict containing default values of optimization options (used when a parameter group doesn’t specify them).

`add_param_group(param_group)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/optim/optimizer.html#Optimizer.add_param_group)

Add a param group to the `Optimizer`’s `param_groups`.

This can be useful when fine-tuning a pre-trained network, as frozen layers can be made trainable and added to the `Optimizer` as training progresses.

Parameters

**param_group** ([dict](https://docs.python.org/3/library/stdtypes.html#dict "\(in Python v3.9\)")) – Specifies what Tensors should be optimized, along with group-specific optimization options.

`load_state_dict(state_dict)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/optim/optimizer.html#Optimizer.load_state_dict)

Loads the optimizer state.

Parameters

**state_dict** ([dict](https://docs.python.org/3/library/stdtypes.html#dict "\(in Python v3.9\)")) – optimizer state. Should be an object returned from a call to `state_dict()`.

`state_dict()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/optim/optimizer.html#Optimizer.state_dict)

Returns the state of the optimizer as a [`dict`](https://docs.python.org/3/library/stdtypes.html#dict "\(in Python v3.9\)").

It contains two entries:

* state - a dict holding current optimization state. Its content differs between optimizer classes.
* param_groups - a dict containing all parameter groups

`step(closure)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/optim/optimizer.html#Optimizer.step)

Performs a single optimization step (parameter update).

Parameters

**closure** (_callable_) – A closure that reevaluates the model and returns the loss. Optional for most optimizers.

Note

Unless otherwise specified, this function should not modify the `.grad` field of the parameters.
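The `state_dict()` / `load_state_dict()` pair above is what is typically used to checkpoint and resume training. A minimal sketch follows; the model, optimizer, and file name are illustrative placeholders rather than part of the API:

import torch
import torch.nn as nn
import torch.optim as optim

model = nn.Linear(10, 2)                      # placeholder model
optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

# ... train for a while, then save both model and optimizer state
torch.save({'model': model.state_dict(),
            'optimizer': optimizer.state_dict()}, 'checkpoint.pt')

# Later: rebuild the same objects, then restore their state before resuming
checkpoint = torch.load('checkpoint.pt')
model.load_state_dict(checkpoint['model'])
optimizer.load_state_dict(checkpoint['optimizer'])

Note that the optimizer must be constructed over the same parameters (in the same order) before `load_state_dict()` is called.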
`zero_grad(set_to_none=False)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/optim/optimizer.html#Optimizer.zero_grad) Sets the gradients of all optimized [`torch.Tensor`](tensors#torch.Tensor "torch.Tensor") s to zero. Parameters **set_to_none** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – instead of setting to zero, set the grads to None. This will in general have lower memory footprint, and can modestly improve performance. However, it changes certain behaviors. For example: 1. When the user tries to access a gradient and perform manual ops on it, a None attribute or a Tensor full of 0s will behave differently. 2. If the user requests `zero_grad(set_to_none=True)` followed by a backward pass, `.grad`s are guaranteed to be None for params that did not receive a gradient. 3. `torch.optim` optimizers have a different behavior if the gradient is 0 or None (in one case it does the step with a gradient of 0 and in the other it skips the step altogether). `class torch.optim.Adadelta(params, lr=1.0, rho=0.9, eps=1e-06, weight_decay=0)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/optim/adadelta.html#Adadelta) Implements Adadelta algorithm. It has been proposed in [ADADELTA: An Adaptive Learning Rate Method](https://arxiv.org/abs/1212.5701). Parameters * **params** (_iterable_) – iterable of parameters to optimize or dicts defining parameter groups * **rho** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _,__optional_) – coefficient used for computing a running average of squared gradients (default: 0.9) * **eps** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _,__optional_) – term added to the denominator to improve numerical stability (default: 1e-6) * **lr** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _,__optional_) – coefficient that scale delta before it is applied to the parameters (default: 1.0) * **weight_decay** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _,__optional_) – weight decay (L2 penalty) (default: 0) `step(closure=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/optim/adadelta.html#Adadelta.step) Performs a single optimization step. Parameters **closure** (_callable_ _,__optional_) – A closure that reevaluates the model and returns the loss. `class torch.optim.Adagrad(params, lr=0.01, lr_decay=0, weight_decay=0, initial_accumulator_value=0, eps=1e-10)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/optim/adagrad.html#Adagrad) Implements Adagrad algorithm. It has been proposed in [Adaptive Subgradient Methods for Online Learning and Stochastic Optimization](http://jmlr.org/papers/v12/duchi11a.html). 
Parameters * **params** (_iterable_) – iterable of parameters to optimize or dicts defining parameter groups * **lr** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _,__optional_) – learning rate (default: 1e-2) * **lr_decay** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _,__optional_) – learning rate decay (default: 0) * **weight_decay** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _,__optional_) – weight decay (L2 penalty) (default: 0) * **eps** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _,__optional_) – term added to the denominator to improve numerical stability (default: 1e-10) `step(closure=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/optim/adagrad.html#Adagrad.step) Performs a single optimization step. Parameters **closure** (_callable_ _,__optional_) – A closure that reevaluates the model and returns the loss. `class torch.optim.Adam(params, lr=0.001, betas=(0.9, 0.999), eps=1e-08, weight_decay=0, amsgrad=False)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/optim/adam.html#Adam) Implements Adam algorithm. It has been proposed in [Adam: A Method for Stochastic Optimization](https://arxiv.org/abs/1412.6980). The implementation of the L2 penalty follows changes proposed in [Decoupled Weight Decay Regularization](https://arxiv.org/abs/1711.05101). Parameters * **params** (_iterable_) – iterable of parameters to optimize or dicts defining parameter groups * **lr** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _,__optional_) – learning rate (default: 1e-3) * **betas** (_Tuple_ _[_[float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _,_[float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _]__,__optional_) – coefficients used for computing running averages of gradient and its square (default: (0.9, 0.999)) * **eps** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _,__optional_) – term added to the denominator to improve numerical stability (default: 1e-8) * **weight_decay** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _,__optional_) – weight decay (L2 penalty) (default: 0) * **amsgrad** (_boolean_ _,__optional_) – whether to use the AMSGrad variant of this algorithm from the paper [On the Convergence of Adam and Beyond](https://openreview.net/forum?id=ryQu7f-RZ) (default: False) `step(closure=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/optim/adam.html#Adam.step) Performs a single optimization step. Parameters **closure** (_callable_ _,__optional_) – A closure that reevaluates the model and returns the loss. `class torch.optim.AdamW(params, lr=0.001, betas=(0.9, 0.999), eps=1e-08, weight_decay=0.01, amsgrad=False)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/optim/adamw.html#AdamW) Implements AdamW algorithm. The original Adam algorithm was proposed in [Adam: A Method for Stochastic Optimization](https://arxiv.org/abs/1412.6980). The AdamW variant was proposed in [Decoupled Weight Decay Regularization](https://arxiv.org/abs/1711.05101). 
Parameters * **params** (_iterable_) – iterable of parameters to optimize or dicts defining parameter groups * **lr** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _,__optional_) – learning rate (default: 1e-3) * **betas** (_Tuple_ _[_[float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _,_[float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _]__,__optional_) – coefficients used for computing running averages of gradient and its square (default: (0.9, 0.999)) * **eps** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _,__optional_) – term added to the denominator to improve numerical stability (default: 1e-8) * **weight_decay** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _,__optional_) – weight decay coefficient (default: 1e-2) * **amsgrad** (_boolean_ _,__optional_) – whether to use the AMSGrad variant of this algorithm from the paper [On the Convergence of Adam and Beyond](https://openreview.net/forum?id=ryQu7f-RZ) (default: False) `step(closure=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/optim/adamw.html#AdamW.step) Performs a single optimization step. Parameters **closure** (_callable_ _,__optional_) – A closure that reevaluates the model and returns the loss. `class torch.optim.SparseAdam(params, lr=0.001, betas=(0.9, 0.999), eps=1e-08)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/optim/sparse_adam.html#SparseAdam) Implements lazy version of Adam algorithm suitable for sparse tensors. In this variant, only moments that show up in the gradient get updated, and only those portions of the gradient get applied to the parameters. Parameters * **params** (_iterable_) – iterable of parameters to optimize or dicts defining parameter groups * **lr** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _,__optional_) – learning rate (default: 1e-3) * **betas** (_Tuple_ _[_[float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _,_[float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _]__,__optional_) – coefficients used for computing running averages of gradient and its square (default: (0.9, 0.999)) * **eps** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _,__optional_) – term added to the denominator to improve numerical stability (default: 1e-8) `step(closure=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/optim/sparse_adam.html#SparseAdam.step) Performs a single optimization step. Parameters **closure** (_callable_ _,__optional_) – A closure that reevaluates the model and returns the loss. `class torch.optim.Adamax(params, lr=0.002, betas=(0.9, 0.999), eps=1e-08, weight_decay=0)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/optim/adamax.html#Adamax) Implements Adamax algorithm (a variant of Adam based on infinity norm). It has been proposed in [Adam: A Method for Stochastic Optimization](https://arxiv.org/abs/1412.6980). 
Parameters * **params** (_iterable_) – iterable of parameters to optimize or dicts defining parameter groups * **lr** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _,__optional_) – learning rate (default: 2e-3) * **betas** (_Tuple_ _[_[float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _,_[float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _]__,__optional_) – coefficients used for computing running averages of gradient and its square * **eps** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _,__optional_) – term added to the denominator to improve numerical stability (default: 1e-8) * **weight_decay** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _,__optional_) – weight decay (L2 penalty) (default: 0) `step(closure=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/optim/adamax.html#Adamax.step) Performs a single optimization step. Parameters **closure** (_callable_ _,__optional_) – A closure that reevaluates the model and returns the loss. `class torch.optim.ASGD(params, lr=0.01, lambd=0.0001, alpha=0.75, t0=1000000.0, weight_decay=0)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/optim/asgd.html#ASGD) Implements Averaged Stochastic Gradient Descent. It has been proposed in [Acceleration of stochastic approximation by averaging](https://dl.acm.org/citation.cfm?id=131098). Parameters * **params** (_iterable_) – iterable of parameters to optimize or dicts defining parameter groups * **lr** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _,__optional_) – learning rate (default: 1e-2) * **lambd** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _,__optional_) – decay term (default: 1e-4) * **alpha** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _,__optional_) – power for eta update (default: 0.75) * **t0** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _,__optional_) – point at which to start averaging (default: 1e6) * **weight_decay** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _,__optional_) – weight decay (L2 penalty) (default: 0) `step(closure=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/optim/asgd.html#ASGD.step) Performs a single optimization step. Parameters **closure** (_callable_ _,__optional_) – A closure that reevaluates the model and returns the loss. `class torch.optim.LBFGS(params, lr=1, max_iter=20, max_eval=None, tolerance_grad=1e-07, tolerance_change=1e-09, history_size=100, line_search_fn=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/optim/lbfgs.html#LBFGS) Implements L-BFGS algorithm, heavily inspired by `minFunc `. Warning This optimizer doesn’t support per-parameter options and parameter groups (there can be only one). Warning Right now all parameters have to be on a single device. This will be improved in the future. Note This is a very memory intensive optimizer (it requires additional `param_bytes * (history_size + 1)` bytes). If it doesn’t fit in memory try reducing the history size, or use a different algorithm. 
Parameters

* **lr** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)")) – learning rate (default: 1)
* **max_iter** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – maximal number of iterations per optimization step (default: 20)
* **max_eval** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – maximal number of function evaluations per optimization step (default: max_iter * 1.25).
* **tolerance_grad** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)")) – termination tolerance on first order optimality (default: 1e-7).
* **tolerance_change** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)")) – termination tolerance on function value/parameter changes (default: 1e-9).
* **history_size** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – update history size (default: 100).
* **line_search_fn** ([str](https://docs.python.org/3/library/stdtypes.html#str "\(in Python v3.9\)")) – either ‘strong_wolfe’ or None (default: None).

`step(closure)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/optim/lbfgs.html#LBFGS.step)

Performs a single optimization step.

Parameters

**closure** (_callable_) – A closure that reevaluates the model and returns the loss.

`class torch.optim.RMSprop(params, lr=0.01, alpha=0.99, eps=1e-08, weight_decay=0, momentum=0, centered=False)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/optim/rmsprop.html#RMSprop)

Implements RMSprop algorithm.

Proposed by G. Hinton in his [course](https://www.cs.toronto.edu/~tijmen/csc321/slides/lecture_slides_lec6.pdf). The centered version first appears in [Generating Sequences With Recurrent Neural Networks](https://arxiv.org/pdf/1308.0850v5.pdf). The implementation here takes the square root of the gradient average before adding epsilon (note that TensorFlow interchanges these two operations). The effective learning rate is thus α/(√v + ε), where α is the scheduled learning rate and v is the weighted moving average of the squared gradient.

Parameters

* **params** (_iterable_) – iterable of parameters to optimize or dicts defining parameter groups
* **lr** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)"), _optional_) – learning rate (default: 1e-2)
* **momentum** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)"), _optional_) – momentum factor (default: 0)
* **alpha** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)"), _optional_) – smoothing constant (default: 0.99)
* **eps** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)"), _optional_) – term added to the denominator to improve numerical stability (default: 1e-8)
* **centered** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)"), _optional_) – if `True`, compute the centered RMSProp; the gradient is normalized by an estimation of its variance
* **weight_decay** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)"), _optional_) – weight decay (L2 penalty) (default: 0)

`step(closure=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/optim/rmsprop.html#RMSprop.step)

Performs a single optimization step.
Parameters

**closure** (_callable_, _optional_) – A closure that reevaluates the model and returns the loss.

`class torch.optim.Rprop(params, lr=0.01, etas=(0.5, 1.2), step_sizes=(1e-06, 50))` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/optim/rprop.html#Rprop)

Implements the resilient backpropagation algorithm.

Parameters

* **params** (_iterable_) – iterable of parameters to optimize or dicts defining parameter groups
* **lr** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)"), _optional_) – learning rate (default: 1e-2)
* **etas** (_Tuple_[[float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)"), [float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)")], _optional_) – pair of (etaminus, etaplus), which are multiplicative increase and decrease factors (default: (0.5, 1.2))
* **step_sizes** (_Tuple_[[float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)"), [float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)")], _optional_) – a pair of minimal and maximal allowed step sizes (default: (1e-6, 50))

`step(closure=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/optim/rprop.html#Rprop.step)

Performs a single optimization step.

Parameters

**closure** (_callable_, _optional_) – A closure that reevaluates the model and returns the loss.

`class torch.optim.SGD(params, lr=<required parameter>, momentum=0, dampening=0, weight_decay=0, nesterov=False)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/optim/sgd.html#SGD)

Implements stochastic gradient descent (optionally with momentum).

Nesterov momentum is based on the formula from [On the importance of initialization and momentum in deep learning](http://www.cs.toronto.edu/%7Ehinton/absps/momentum.pdf).

Parameters

* **params** (_iterable_) – iterable of parameters to optimize or dicts defining parameter groups
* **lr** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)")) – learning rate
* **momentum** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)"), _optional_) – momentum factor (default: 0)
* **weight_decay** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)"), _optional_) – weight decay (L2 penalty) (default: 0)
* **dampening** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)"), _optional_) – dampening for momentum (default: 0)
* **nesterov** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)"), _optional_) – enables Nesterov momentum (default: False)

#### Example

>>> optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
>>> optimizer.zero_grad()
>>> loss_fn(model(input), target).backward()
>>> optimizer.step()

Note

The implementation of SGD with Momentum/Nesterov subtly differs from Sutskever et al. and implementations in some other frameworks. Considering the specific case of Momentum, the update can be written as

\begin{aligned} v_{t+1} & = \mu * v_{t} + g_{t+1}, \\ p_{t+1} & = p_{t} - \text{lr} * v_{t+1}, \end{aligned}

where p, g, v and \mu denote the parameters, gradient, velocity, and momentum respectively. This is in contrast to Sutskever et al. and other frameworks, which employ an update of the form

\begin{aligned} v_{t+1} & = \mu * v_{t} + \text{lr} * g_{t+1}, \\ p_{t+1} & = p_{t} - v_{t+1}. \end{aligned}
The Nesterov version is analogously modified.

`step(closure=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/optim/sgd.html#SGD.step)

Performs a single optimization step.

Parameters

**closure** (_callable_, _optional_) – A closure that reevaluates the model and returns the loss.

## How to adjust learning rate

`torch.optim.lr_scheduler` provides several methods to adjust the learning rate based on the number of epochs. `torch.optim.lr_scheduler.ReduceLROnPlateau` allows dynamic learning rate reduction based on some validation measurements.

Learning rate scheduling should be applied after the optimizer’s update; e.g., you should write your code this way:

>>> scheduler = ...
>>> for epoch in range(100):
>>>     train(...)
>>>     validate(...)
>>>     scheduler.step()

Warning

Prior to PyTorch 1.1.0, the learning rate scheduler was expected to be called before the optimizer’s update; 1.1.0 changed this behavior in a BC-breaking way. If you use the learning rate scheduler (calling `scheduler.step()`) before the optimizer’s update (calling `optimizer.step()`), this will skip the first value of the learning rate schedule. If you are unable to reproduce results after upgrading to PyTorch 1.1.0, please check if you are calling `scheduler.step()` at the wrong time.

`class torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda, last_epoch=-1, verbose=False)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/optim/lr_scheduler.html#LambdaLR)

Sets the learning rate of each parameter group to the initial lr times a given function. When last_epoch=-1, sets initial lr as lr.

Parameters

* **optimizer** (Optimizer) – Wrapped optimizer.
* **lr_lambda** (_function_ or [list](https://docs.python.org/3/library/stdtypes.html#list "\(in Python v3.9\)")) – A function which computes a multiplicative factor given an integer parameter epoch, or a list of such functions, one for each group in optimizer.param_groups.
* **last_epoch** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – The index of the last epoch. Default: -1.
* **verbose** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – If `True`, prints a message to stdout for each update. Default: `False`.

#### Example

>>> # Assuming optimizer has two groups.
>>> lambda1 = lambda epoch: epoch // 30
>>> lambda2 = lambda epoch: 0.95 ** epoch
>>> scheduler = LambdaLR(optimizer, lr_lambda=[lambda1, lambda2])
>>> for epoch in range(100):
>>>     train(...)
>>>     validate(...)
>>>     scheduler.step()

`load_state_dict(state_dict)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/optim/lr_scheduler.html#LambdaLR.load_state_dict)

Loads the scheduler’s state.

When saving or loading the scheduler, please make sure to also save or load the state of the optimizer.

Parameters

**state_dict** ([dict](https://docs.python.org/3/library/stdtypes.html#dict "\(in Python v3.9\)")) – scheduler state. Should be an object returned from a call to `state_dict()`.

`state_dict()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/optim/lr_scheduler.html#LambdaLR.state_dict)

Returns the state of the scheduler as a [`dict`](https://docs.python.org/3/library/stdtypes.html#dict "\(in Python v3.9\)"). It contains an entry for every variable in self.__dict__ which is not the optimizer. The learning rate lambda functions will only be saved if they are callable objects and not if they are functions or lambdas.
When saving or loading the scheduler, please make sure to also save or load the state of the optimizer. `class torch.optim.lr_scheduler.MultiplicativeLR(optimizer, lr_lambda, last_epoch=-1, verbose=False)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/optim/lr_scheduler.html#MultiplicativeLR) Multiply the learning rate of each parameter group by the factor given in the specified function. When last_epoch=-1, sets initial lr as lr. Parameters * **optimizer** (Optimizer) – Wrapped optimizer. * **lr_lambda** (_function_ _or_[list](https://docs.python.org/3/library/stdtypes.html#list "\(in Python v3.9\)")) – A function which computes a multiplicative factor given an integer parameter epoch, or a list of such functions, one for each group in optimizer.param_groups. * **last_epoch** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – The index of last epoch. Default: -1. * **verbose** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – If `True`, prints a message to stdout for each update. Default: `False`. #### Example >>> lmbda = lambda epoch: 0.95 >>> scheduler = MultiplicativeLR(optimizer, lr_lambda=lmbda) >>> for epoch in range(100): >>> train(...) >>> validate(...) >>> scheduler.step() `load_state_dict(state_dict)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/optim/lr_scheduler.html#MultiplicativeLR.load_state_dict) Loads the schedulers state. Parameters **state_dict** ([dict](https://docs.python.org/3/library/stdtypes.html#dict "\(in Python v3.9\)")) – scheduler state. Should be an object returned from a call to `state_dict()`. `state_dict()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/optim/lr_scheduler.html#MultiplicativeLR.state_dict) Returns the state of the scheduler as a [`dict`](https://docs.python.org/3/library/stdtypes.html#dict "\(in Python v3.9\)"). It contains an entry for every variable in self.__dict__ which is not the optimizer. The learning rate lambda functions will only be saved if they are callable objects and not if they are functions or lambdas. `class torch.optim.lr_scheduler.StepLR(optimizer, step_size, gamma=0.1, last_epoch=-1, verbose=False)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/optim/lr_scheduler.html#StepLR) Decays the learning rate of each parameter group by gamma every step_size epochs. Notice that such decay can happen simultaneously with other changes to the learning rate from outside this scheduler. When last_epoch=-1, sets initial lr as lr. Parameters * **optimizer** (Optimizer) – Wrapped optimizer. * **step_size** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – Period of learning rate decay. * **gamma** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)")) – Multiplicative factor of learning rate decay. Default: 0.1. * **last_epoch** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – The index of last epoch. Default: -1. * **verbose** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – If `True`, prints a message to stdout for each update. Default: `False`. #### Example >>> # Assuming optimizer uses lr = 0.05 for all groups >>> # lr = 0.05 if epoch < 30 >>> # lr = 0.005 if 30 <= epoch < 60 >>> # lr = 0.0005 if 60 <= epoch < 90 >>> # ... >>> scheduler = StepLR(optimizer, step_size=30, gamma=0.1) >>> for epoch in range(100): >>> train(...) >>> validate(...) 
>>>     scheduler.step()

`class torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones, gamma=0.1, last_epoch=-1, verbose=False)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/optim/lr_scheduler.html#MultiStepLR)

Decays the learning rate of each parameter group by gamma once the number of epochs reaches one of the milestones. Notice that such decay can happen simultaneously with other changes to the learning rate from outside this scheduler. When last_epoch=-1, sets initial lr as lr.

Parameters

* **optimizer** (Optimizer) – Wrapped optimizer.
* **milestones** ([list](https://docs.python.org/3/library/stdtypes.html#list "\(in Python v3.9\)")) – List of epoch indices. Must be increasing.
* **gamma** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)")) – Multiplicative factor of learning rate decay. Default: 0.1.
* **last_epoch** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – The index of the last epoch. Default: -1.
* **verbose** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – If `True`, prints a message to stdout for each update. Default: `False`.

#### Example

>>> # Assuming optimizer uses lr = 0.05 for all groups
>>> # lr = 0.05 if epoch < 30
>>> # lr = 0.005 if 30 <= epoch < 80
>>> # lr = 0.0005 if epoch >= 80
>>> scheduler = MultiStepLR(optimizer, milestones=[30,80], gamma=0.1)
>>> for epoch in range(100):
>>>     train(...)
>>>     validate(...)
>>>     scheduler.step()

`class torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma, last_epoch=-1, verbose=False)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/optim/lr_scheduler.html#ExponentialLR)

Decays the learning rate of each parameter group by gamma every epoch. When last_epoch=-1, sets initial lr as lr.

Parameters

* **optimizer** (Optimizer) – Wrapped optimizer.
* **gamma** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)")) – Multiplicative factor of learning rate decay.
* **last_epoch** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – The index of the last epoch. Default: -1.
* **verbose** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – If `True`, prints a message to stdout for each update. Default: `False`.

`class torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max, eta_min=0, last_epoch=-1, verbose=False)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/optim/lr_scheduler.html#CosineAnnealingLR)

Set the learning rate of each parameter group using a cosine annealing schedule, where \eta_{max} is set to the initial lr and T_{cur} is the number of epochs since the last restart in SGDR:

\begin{aligned} \eta_t & = \eta_{min} + \frac{1}{2}(\eta_{max} - \eta_{min})\left(1 + \cos\left(\frac{T_{cur}}{T_{max}}\pi\right)\right), & T_{cur} \neq (2k+1)T_{max}; \\ \eta_{t+1} & = \eta_{t} + \frac{1}{2}(\eta_{max} - \eta_{min})\left(1 - \cos\left(\frac{1}{T_{max}}\pi\right)\right), & T_{cur} = (2k+1)T_{max}. \end{aligned}

When last_epoch=-1, sets initial lr as lr. Notice that because the schedule is defined recursively, the learning rate can be simultaneously modified outside this scheduler by other operators.
If the learning rate is set solely by this scheduler, the learning rate at each step becomes:

\eta_t = \eta_{min} + \frac{1}{2}(\eta_{max} - \eta_{min})\left(1 + \cos\left(\frac{T_{cur}}{T_{max}}\pi\right)\right)

It has been proposed in [SGDR: Stochastic Gradient Descent with Warm Restarts](https://arxiv.org/abs/1608.03983). Note that this only implements the cosine annealing part of SGDR, and not the restarts.

Parameters

* **optimizer** (Optimizer) – Wrapped optimizer.
* **T_max** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – Maximum number of iterations.
* **eta_min** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)")) – Minimum learning rate. Default: 0.
* **last_epoch** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – The index of the last epoch. Default: -1.
* **verbose** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – If `True`, prints a message to stdout for each update. Default: `False`.

`class torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, mode='min', factor=0.1, patience=10, threshold=0.0001, threshold_mode='rel', cooldown=0, min_lr=0, eps=1e-08, verbose=False)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/optim/lr_scheduler.html#ReduceLROnPlateau)

Reduce learning rate when a metric has stopped improving. Models often benefit from reducing the learning rate by a factor of 2-10 once learning stagnates. This scheduler reads a metric quantity and, if no improvement is seen for a ‘patience’ number of epochs, the learning rate is reduced.

Parameters

* **optimizer** (Optimizer) – Wrapped optimizer.
* **mode** ([str](https://docs.python.org/3/library/stdtypes.html#str "\(in Python v3.9\)")) – One of `min`, `max`. In `min` mode, lr will be reduced when the quantity monitored has stopped decreasing; in `max` mode it will be reduced when the quantity monitored has stopped increasing. Default: ‘min’.
* **factor** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)")) – Factor by which the learning rate will be reduced. new_lr = lr * factor. Default: 0.1.
* **patience** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – Number of epochs with no improvement after which learning rate will be reduced. For example, if `patience = 2`, then we will ignore the first 2 epochs with no improvement, and will only decrease the LR after the 3rd epoch if the loss still hasn’t improved then. Default: 10.
* **threshold** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)")) – Threshold for measuring the new optimum, to only focus on significant changes. Default: 1e-4.
* **threshold_mode** ([str](https://docs.python.org/3/library/stdtypes.html#str "\(in Python v3.9\)")) – One of `rel`, `abs`. In `rel` mode, dynamic_threshold = best * ( 1 + threshold ) in ‘max’ mode or best * ( 1 - threshold ) in `min` mode. In `abs` mode, dynamic_threshold = best + threshold in `max` mode or best - threshold in `min` mode. Default: ‘rel’.
* **cooldown** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – Number of epochs to wait before resuming normal operation after lr has been reduced. Default: 0.
* **min_lr** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _or_[list](https://docs.python.org/3/library/stdtypes.html#list "\(in Python v3.9\)")) – A scalar or a list of scalars. A lower bound on the learning rate of all param groups or each group respectively. Default: 0. * **eps** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)")) – Minimal decay applied to lr. If the difference between new and old lr is smaller than eps, the update is ignored. Default: 1e-8. * **verbose** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – If `True`, prints a message to stdout for each update. Default: `False`. #### Example >>> optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9) >>> scheduler = ReduceLROnPlateau(optimizer, 'min') >>> for epoch in range(10): >>> train(...) >>> val_loss = validate(...) >>> # Note that step should be called after validate() >>> scheduler.step(val_loss) `class torch.optim.lr_scheduler.CyclicLR(optimizer, base_lr, max_lr, step_size_up=2000, step_size_down=None, mode='triangular', gamma=1.0, scale_fn=None, scale_mode='cycle', cycle_momentum=True, base_momentum=0.8, max_momentum=0.9, last_epoch=-1, verbose=False)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/optim/lr_scheduler.html#CyclicLR) Sets the learning rate of each parameter group according to cyclical learning rate policy (CLR). The policy cycles the learning rate between two boundaries with a constant frequency, as detailed in the paper [Cyclical Learning Rates for Training Neural Networks](https://arxiv.org/abs/1506.01186). The distance between the two boundaries can be scaled on a per-iteration or per-cycle basis. Cyclical learning rate policy changes the learning rate after every batch. `step` should be called after a batch has been used for training. This class has three built-in policies, as put forth in the paper: * “triangular”: A basic triangular cycle without amplitude scaling. * “triangular2”: A basic triangular cycle that scales initial amplitude by half each cycle. * “exp_range”: A cycle that scales initial amplitude by gammacycle iterations\text{gamma}^{\text{cycle iterations}} at each cycle iteration. This implementation was adapted from the github repo: [bckenstler/CLR](https://github.com/bckenstler/CLR) Parameters * **optimizer** (Optimizer) – Wrapped optimizer. * **base_lr** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _or_[list](https://docs.python.org/3/library/stdtypes.html#list "\(in Python v3.9\)")) – Initial learning rate which is the lower boundary in the cycle for each parameter group. * **max_lr** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _or_[list](https://docs.python.org/3/library/stdtypes.html#list "\(in Python v3.9\)")) – Upper learning rate boundaries in the cycle for each parameter group. Functionally, it defines the cycle amplitude (max_lr - base_lr). The lr at any cycle is the sum of base_lr and some scaling of the amplitude; therefore max_lr may not actually be reached depending on scaling function. * **step_size_up** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – Number of training iterations in the increasing half of a cycle. Default: 2000 * **step_size_down** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – Number of training iterations in the decreasing half of a cycle. 
If step_size_down is None, it is set to step_size_up. Default: None * **mode** ([str](https://docs.python.org/3/library/stdtypes.html#str "\(in Python v3.9\)")) – One of {triangular, triangular2, exp_range}. Values correspond to policies detailed above. If scale_fn is not None, this argument is ignored. Default: ‘triangular’ * **gamma** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)")) – Constant in ‘exp_range’ scaling function: gamma**(cycle iterations) Default: 1.0 * **scale_fn** (_function_) – Custom scaling policy defined by a single argument lambda function, where 0 <= scale_fn(x) <= 1 for all x >= 0. If specified, then ‘mode’ is ignored. Default: None * **scale_mode** ([str](https://docs.python.org/3/library/stdtypes.html#str "\(in Python v3.9\)")) – {‘cycle’, ‘iterations’}. Defines whether scale_fn is evaluated on cycle number or cycle iterations (training iterations since start of cycle). Default: ‘cycle’ * **cycle_momentum** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – If `True`, momentum is cycled inversely to learning rate between ‘base_momentum’ and ‘max_momentum’. Default: True * **base_momentum** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _or_[list](https://docs.python.org/3/library/stdtypes.html#list "\(in Python v3.9\)")) – Lower momentum boundaries in the cycle for each parameter group. Note that momentum is cycled inversely to learning rate; at the peak of a cycle, momentum is ‘base_momentum’ and learning rate is ‘max_lr’. Default: 0.8 * **max_momentum** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _or_[list](https://docs.python.org/3/library/stdtypes.html#list "\(in Python v3.9\)")) – Upper momentum boundaries in the cycle for each parameter group. Functionally, it defines the cycle amplitude (max_momentum - base_momentum). The momentum at any cycle is the difference of max_momentum and some scaling of the amplitude; therefore base_momentum may not actually be reached depending on scaling function. Note that momentum is cycled inversely to learning rate; at the start of a cycle, momentum is ‘max_momentum’ and learning rate is ‘base_lr’ Default: 0.9 * **last_epoch** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – The index of the last batch. This parameter is used when resuming a training job. Since `step()` should be invoked after each batch instead of after each epoch, this number represents the total number of _batches_ computed, not the total number of epochs computed. When last_epoch=-1, the schedule is started from the beginning. Default: -1 * **verbose** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – If `True`, prints a message to stdout for each update. Default: `False`. #### Example >>> optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9) >>> scheduler = torch.optim.lr_scheduler.CyclicLR(optimizer, base_lr=0.01, max_lr=0.1) >>> data_loader = torch.utils.data.DataLoader(...) >>> for epoch in range(10): >>> for batch in data_loader: >>> train_batch(...) >>> scheduler.step() `get_lr()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/optim/lr_scheduler.html#CyclicLR.get_lr) Calculates the learning rate at batch index. This function treats `self.last_epoch` as the last batch index. If `self.cycle_momentum` is `True`, this function has a side effect of updating the optimizer’s momentum. 
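To illustrate the `scale_fn` and `scale_mode` parameters of `CyclicLR` documented above, the following sketch (the decay constant 0.99 is arbitrary) defines a custom per-iteration scaling policy; when `scale_fn` is given, `mode` is ignored:

>>> optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
>>> # Amplitude decays exponentially with the iteration count;
>>> # scale_fn(x) stays within [0, 1] for all x >= 0, as required.
>>> scale_fn = lambda x: 0.99 ** x
>>> scheduler = torch.optim.lr_scheduler.CyclicLR(optimizer, base_lr=0.01, max_lr=0.1,
>>>                                               scale_fn=scale_fn, scale_mode='iterations')
>>> data_loader = torch.utils.data.DataLoader(...)
>>> for epoch in range(10):
>>>     for batch in data_loader:
>>>         train_batch(...)
>>>         scheduler.step()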
`class torch.optim.lr_scheduler.OneCycleLR(optimizer, max_lr, total_steps=None, epochs=None, steps_per_epoch=None, pct_start=0.3, anneal_strategy='cos', cycle_momentum=True, base_momentum=0.85, max_momentum=0.95, div_factor=25.0, final_div_factor=10000.0, three_phase=False, last_epoch=-1, verbose=False)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/optim/lr_scheduler.html#OneCycleLR) Sets the learning rate of each parameter group according to the 1cycle learning rate policy. The 1cycle policy anneals the learning rate from an initial learning rate to some maximum learning rate and then from that maximum learning rate to some minimum learning rate much lower than the initial learning rate. This policy was initially described in the paper [Super- Convergence: Very Fast Training of Neural Networks Using Large Learning Rates](https://arxiv.org/abs/1708.07120). The 1cycle learning rate policy changes the learning rate after every batch. `step` should be called after a batch has been used for training. This scheduler is not chainable. Note also that the total number of steps in the cycle can be determined in one of two ways (listed in order of precedence): 1. A value for total_steps is explicitly provided. 2. A number of epochs (epochs) and a number of steps per epoch (steps_per_epoch) are provided. In this case, the number of total steps is inferred by total_steps = epochs * steps_per_epoch You must either provide a value for total_steps or provide a value for both epochs and steps_per_epoch. The default behaviour of this scheduler follows the fastai implementation of 1cycle, which claims that “unpublished work has shown even better results by using only two phases”. To mimic the behaviour of the original paper instead, set `three_phase=True`. Parameters * **optimizer** (Optimizer) – Wrapped optimizer. * **max_lr** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _or_[list](https://docs.python.org/3/library/stdtypes.html#list "\(in Python v3.9\)")) – Upper learning rate boundaries in the cycle for each parameter group. * **total_steps** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – The total number of steps in the cycle. Note that if a value is not provided here, then it must be inferred by providing a value for epochs and steps_per_epoch. Default: None * **epochs** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – The number of epochs to train for. This is used along with steps_per_epoch in order to infer the total number of steps in the cycle if a value for total_steps is not provided. Default: None * **steps_per_epoch** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – The number of steps per epoch to train for. This is used along with epochs in order to infer the total number of steps in the cycle if a value for total_steps is not provided. Default: None * **pct_start** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)")) – The percentage of the cycle (in number of steps) spent increasing the learning rate. Default: 0.3 * **anneal_strategy** ([str](https://docs.python.org/3/library/stdtypes.html#str "\(in Python v3.9\)")) – {‘cos’, ‘linear’} Specifies the annealing strategy: “cos” for cosine annealing, “linear” for linear annealing. 
Default: ‘cos’ * **cycle_momentum** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – If `True`, momentum is cycled inversely to learning rate between ‘base_momentum’ and ‘max_momentum’. Default: True * **base_momentum** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _or_[list](https://docs.python.org/3/library/stdtypes.html#list "\(in Python v3.9\)")) – Lower momentum boundaries in the cycle for each parameter group. Note that momentum is cycled inversely to learning rate; at the peak of a cycle, momentum is ‘base_momentum’ and learning rate is ‘max_lr’. Default: 0.85 * **max_momentum** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _or_[list](https://docs.python.org/3/library/stdtypes.html#list "\(in Python v3.9\)")) – Upper momentum boundaries in the cycle for each parameter group. Functionally, it defines the cycle amplitude (max_momentum - base_momentum). Note that momentum is cycled inversely to learning rate; at the start of a cycle, momentum is ‘max_momentum’ and learning rate is ‘base_lr’ Default: 0.95 * **div_factor** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)")) – Determines the initial learning rate via initial_lr = max_lr/div_factor Default: 25 * **final_div_factor** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)")) – Determines the minimum learning rate via min_lr = initial_lr/final_div_factor Default: 1e4 * **three_phase** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – If `True`, use a third phase of the schedule to annihilate the learning rate according to ‘final_div_factor’ instead of modifying the second phase (the first two phases will be symmetrical about the step indicated by ‘pct_start’). * **last_epoch** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – The index of the last batch. This parameter is used when resuming a training job. Since `step()` should be invoked after each batch instead of after each epoch, this number represents the total number of _batches_ computed, not the total number of epochs computed. When last_epoch=-1, the schedule is started from the beginning. Default: -1 * **verbose** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – If `True`, prints a message to stdout for each update. Default: `False`. #### Example >>> data_loader = torch.utils.data.DataLoader(...) >>> optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9) >>> scheduler = torch.optim.lr_scheduler.OneCycleLR(optimizer, max_lr=0.01, steps_per_epoch=len(data_loader), epochs=10) >>> for epoch in range(10): >>> for batch in data_loader: >>> train_batch(...) 
>>> scheduler.step()

`class torch.optim.lr_scheduler.CosineAnnealingWarmRestarts(optimizer, T_0, T_mult=1, eta_min=0, last_epoch=-1, verbose=False)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/optim/lr_scheduler.html#CosineAnnealingWarmRestarts)

Set the learning rate of each parameter group using a cosine annealing schedule, where \eta_{max} is set to the initial lr, T_{cur} is the number of epochs since the last restart and T_{i} is the number of epochs between two warm restarts in SGDR:

\eta_t = \eta_{min} + \frac{1}{2}(\eta_{max} - \eta_{min})\left(1 + \cos\left(\frac{T_{cur}}{T_{i}}\pi\right)\right)

When T_{cur} = T_{i}, set \eta_t = \eta_{min}. When T_{cur} = 0 after a restart, set \eta_t = \eta_{max}.

It has been proposed in [SGDR: Stochastic Gradient Descent with Warm Restarts](https://arxiv.org/abs/1608.03983).

Parameters

* **optimizer** (Optimizer) – Wrapped optimizer.
* **T_0** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – Number of iterations for the first restart.
* **T_mult** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – A factor by which T_{i} increases after a restart. Default: 1.
* **eta_min** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _,__optional_) – Minimum learning rate. Default: 0.
* **last_epoch** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – The index of the last epoch. Default: -1.
* **verbose** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – If `True`, prints a message to stdout for each update. Default: `False`.

`step(epoch=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/optim/lr_scheduler.html#CosineAnnealingWarmRestarts.step)

Step can be called after every batch update.

#### Example

>>> scheduler = CosineAnnealingWarmRestarts(optimizer, T_0, T_mult)
>>> iters = len(dataloader)
>>> for epoch in range(20):
>>>     for i, sample in enumerate(dataloader):
>>>         inputs, labels = sample['inputs'], sample['labels']
>>>         optimizer.zero_grad()
>>>         outputs = net(inputs)
>>>         loss = criterion(outputs, labels)
>>>         loss.backward()
>>>         optimizer.step()
>>>         scheduler.step(epoch + i / iters)

This function can be called in an interleaved way.

#### Example

>>> scheduler = CosineAnnealingWarmRestarts(optimizer, T_0, T_mult)
>>> for epoch in range(20):
>>>     scheduler.step()
>>> scheduler.step(26)
>>> scheduler.step()  # equivalent to scheduler.step(27), continuing from the interleaved call above

## Stochastic Weight Averaging

`torch.optim.swa_utils` implements Stochastic Weight Averaging (SWA). In particular, the `torch.optim.swa_utils.AveragedModel` class implements SWA models, `torch.optim.swa_utils.SWALR` implements the SWA learning rate scheduler and `torch.optim.swa_utils.update_bn()` is a utility function used to update SWA batch normalization statistics at the end of training.

SWA has been proposed in [Averaging Weights Leads to Wider Optima and Better Generalization](https://arxiv.org/abs/1803.05407).

### Constructing averaged models

The `AveragedModel` class serves to compute the weights of the SWA model. You can create an averaged model by running:

>>> swa_model = AveragedModel(model)

Here the model `model` can be an arbitrary [`torch.nn.Module`](generated/torch.nn.module#torch.nn.Module "torch.nn.Module") object. `swa_model` will keep track of the running averages of the parameters of the `model`.
To update these averages, you can use the `update_parameters()` function:

>>> swa_model.update_parameters(model)

### SWA learning rate schedules

Typically, in SWA the learning rate is set to a high constant value. `SWALR` is a learning rate scheduler that anneals the learning rate to a fixed value, and then keeps it constant. For example, the following code creates a scheduler that linearly anneals the learning rate from its initial value to 0.05 in 5 epochs within each parameter group:

>>> swa_scheduler = torch.optim.swa_utils.SWALR(optimizer, \
>>>         anneal_strategy="linear", anneal_epochs=5, swa_lr=0.05)

You can also use cosine annealing to a fixed value instead of linear annealing by setting `anneal_strategy="cos"`.

### Taking care of batch normalization

`update_bn()` is a utility function that allows you to compute the batchnorm statistics for the SWA model on a given dataloader `loader` at the end of training:

>>> torch.optim.swa_utils.update_bn(loader, swa_model)

`update_bn()` applies the `swa_model` to every element in the dataloader and computes the activation statistics for each batch normalization layer in the model.

Warning

`update_bn()` assumes that each batch in the dataloader `loader` is either a tensor or a list of tensors where the first element is the tensor that the network `swa_model` should be applied to. If your dataloader has a different structure, you can update the batch normalization statistics of the `swa_model` by doing a forward pass with the `swa_model` on each element of the dataset.

### Custom averaging strategies

By default, `torch.optim.swa_utils.AveragedModel` computes a running equal average of the parameters that you provide, but you can also use custom averaging functions with the `avg_fn` parameter. In the following example `ema_model` computes an exponential moving average.

Example:

>>> ema_avg = lambda averaged_model_parameter, model_parameter, num_averaged:\
>>>         0.1 * averaged_model_parameter + 0.9 * model_parameter
>>> ema_model = torch.optim.swa_utils.AveragedModel(model, avg_fn=ema_avg)

### Putting it all together

In the example below, `swa_model` is the SWA model that accumulates the averages of the weights. We train the model for a total of 300 epochs and we switch to the SWA learning rate schedule and start to collect SWA averages of the parameters at epoch 160:

>>> loader, optimizer, model, loss_fn = ...
>>> swa_model = torch.optim.swa_utils.AveragedModel(model)
>>> scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=300)
>>> swa_start = 160
>>> swa_scheduler = SWALR(optimizer, swa_lr=0.05)
>>>
>>> for epoch in range(300):
>>>     for input, target in loader:
>>>         optimizer.zero_grad()
>>>         loss_fn(model(input), target).backward()
>>>         optimizer.step()
>>>     if epoch > swa_start:
>>>         swa_model.update_parameters(model)
>>>         swa_scheduler.step()
>>>     else:
>>>         scheduler.step()
>>>
>>> # Update bn statistics for the swa_model at the end
>>> torch.optim.swa_utils.update_bn(loader, swa_model)
>>> # Use swa_model to make predictions on test data
>>> preds = swa_model(test_input)

# Pipeline Parallelism

Pipeline parallelism was originally introduced in the [Gpipe](https://arxiv.org/abs/1811.06965) paper and is an efficient technique to train large models on multiple GPUs.

Warning

Pipeline Parallelism is experimental and subject to change.

## Model Parallelism using multiple GPUs

Typically for large models which don’t fit on a single GPU, model parallelism is employed where certain parts of the model are placed on different GPUs.
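For instance, a two-layer model might be split across two GPUs as in the following minimal sketch (the layer sizes and device indices are purely illustrative):

>>> import torch
>>> import torch.nn as nn
>>>
>>> # Naive model parallelism: consecutive parts of the model live on different GPUs.
>>> part1 = nn.Linear(16, 8).cuda(0)   # first part on GPU 0
>>> part2 = nn.Linear(8, 4).cuda(1)    # second part on GPU 1
>>>
>>> x = torch.rand(32, 16).cuda(0)
>>> out = part2(part1(x).cuda(1))      # GPU 1 only starts once GPU 0 has finished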
However, if this is done naively for sequential models, the training process suffers from GPU under-utilization, since only one GPU is active at a time, as shown in the figure below:

The figure represents a model with 4 layers placed on 4 different GPUs (vertical axis). The horizontal axis represents training this model through time, demonstrating that only 1 GPU is utilized at a time ([image source](https://arxiv.org/abs/1811.06965)).

## Pipelined Execution

To alleviate this problem, pipeline parallelism splits the input minibatch into multiple microbatches and pipelines the execution of these microbatches across multiple GPUs. This is outlined in the figure below:

The figure represents a model with 4 layers placed on 4 different GPUs (vertical axis). The horizontal axis represents training this model through time, demonstrating that the GPUs are utilized much more efficiently. However, there still exists a bubble (as demonstrated in the figure) where certain GPUs are not utilized ([image source](https://arxiv.org/abs/1811.06965)).

## Pipe APIs in PyTorch

`class torch.distributed.pipeline.sync.Pipe(module, chunks=1, checkpoint='except_last', deferred_batch_norm=False)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributed/pipeline/sync/pipe.html#Pipe)

Wraps an arbitrary [`nn.Sequential`](generated/torch.nn.sequential#torch.nn.Sequential "torch.nn.Sequential") module to train on using synchronous pipeline parallelism. If the module requires lots of memory and doesn’t fit on a single GPU, pipeline parallelism is a useful technique to employ for training.

The implementation is based on the [torchgpipe](https://arxiv.org/abs/2004.09910) paper.

Pipe combines pipeline parallelism with checkpointing to reduce peak memory required to train while minimizing device under-utilization.

You should place all the modules on the appropriate devices and wrap them into an [`nn.Sequential`](generated/torch.nn.sequential#torch.nn.Sequential "torch.nn.Sequential") module defining the desired order of execution.

Parameters

* **module** ([`nn.Sequential`](generated/torch.nn.sequential#torch.nn.Sequential "torch.nn.Sequential")) – sequential module to be parallelized using pipelining. Each module in the sequence has to have all of its parameters on a single device. Each module in the sequence has to either be an nn.Module or [`nn.Sequential`](generated/torch.nn.sequential#torch.nn.Sequential "torch.nn.Sequential") (to combine multiple sequential modules on a single device)
* **chunks** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – number of micro-batches (default: `1`)
* **checkpoint** ([str](https://docs.python.org/3/library/stdtypes.html#str "\(in Python v3.9\)")) – when to enable checkpointing, one of `'always'`, `'except_last'`, or `'never'` (default: `'except_last'`). `'never'` disables checkpointing completely, `'except_last'` enables checkpointing for all micro-batches except the last one and `'always'` enables checkpointing for all micro-batches.
* **deferred_batch_norm** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – whether to use deferred `BatchNorm` moving statistics (default: [`False`](https://docs.python.org/3/library/constants.html#False "\(in Python v3.9\)")). If set to [`True`](https://docs.python.org/3/library/constants.html#True "\(in Python v3.9\)"), we track statistics across multiple micro-batches to update the running statistics per mini-batch.
Raises

* [**TypeError**](https://docs.python.org/3/library/exceptions.html#TypeError "\(in Python v3.9\)") – the module is not a [`nn.Sequential`](generated/torch.nn.sequential#torch.nn.Sequential "torch.nn.Sequential").
* [**ValueError**](https://docs.python.org/3/library/exceptions.html#ValueError "\(in Python v3.9\)") – invalid arguments

Example:: Pipeline of two FC layers across GPUs 0 and 1.

>>> fc1 = nn.Linear(16, 8).cuda(0)
>>> fc2 = nn.Linear(8, 4).cuda(1)
>>> model = nn.Sequential(fc1, fc2)
>>> model = Pipe(model, chunks=8)
>>> input = torch.rand(16, 16).cuda(0)
>>> output_rref = model(input)

Note

You can wrap a `Pipe` model with [`torch.nn.parallel.DistributedDataParallel`](generated/torch.nn.parallel.distributeddataparallel#torch.nn.parallel.DistributedDataParallel "torch.nn.parallel.DistributedDataParallel") only when the checkpoint parameter of `Pipe` is `'never'`.

Note

`Pipe` only supports intra-node pipelining currently, but will be expanded to support inter-node pipelining in the future. The forward function returns an [`RRef`](rpc#torch.distributed.rpc.RRef "torch.distributed.rpc.RRef") to allow for inter-node pipelining in the future, where the output might be on a remote host. For intra-node pipelining you can use [`local_value()`](rpc#torch.distributed.rpc.RRef.local_value "torch.distributed.rpc.RRef.local_value") to retrieve the output locally.

Warning

`Pipe` is experimental and subject to change.

`forward(input)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributed/pipeline/sync/pipe.html#Pipe.forward)

Processes a single input mini-batch through the pipe and returns an [`RRef`](rpc#torch.distributed.rpc.RRef "torch.distributed.rpc.RRef") pointing to the output. `Pipe` is a fairly transparent module wrapper. It doesn’t modify the input and output signature of the underlying module, but there is a type restriction: input and output have to be a [`Tensor`](tensors#torch.Tensor "torch.Tensor") or a sequence of tensors. This restriction is applied at partition boundaries too.

The input tensor is split into multiple micro-batches based on the `chunks` parameter used to initialize `Pipe`. The batch size is assumed to be the first dimension of the tensor and if the batch size is less than `chunks`, the number of micro-batches is equal to the batch size.

Parameters

**input** (torch.Tensor or sequence of [`Tensor`](tensors#torch.Tensor "torch.Tensor")) – input mini-batch

Returns

[`RRef`](rpc#torch.distributed.rpc.RRef "torch.distributed.rpc.RRef") to the output of the mini-batch

Raises

[**TypeError**](https://docs.python.org/3/library/exceptions.html#TypeError "\(in Python v3.9\)") – input is not a tensor or sequence of tensors.

### Skip connections

Certain models like ResNeXt are not completely sequential and have skip connections between layers. Naively implementing this as part of pipeline parallelism would imply that we need to copy outputs for certain layers through multiple GPUs until we eventually reach the GPU where the layer for the skip connection resides. To avoid this copy overhead, we provide the APIs below to stash and pop Tensors in different layers of the model.

`torch.distributed.pipeline.sync.skip.skippable.skippable(stash=(), pop=())` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributed/pipeline/sync/skip/skippable.html#skippable)

The decorator to define a [`nn.Module`](generated/torch.nn.module#torch.nn.Module "torch.nn.Module") with skip connections. Decorated modules are called “skippable”.
This functionality works perfectly fine even when the module is not wrapped by `Pipe`. Each skip tensor is managed by its name. Before manipulating skip tensors, a skippable module must statically declare the names for skip tensors by `stash` and/or `pop` parameters. Skip tensors with pre-declared name can be stashed by `yield stash(name, tensor)` or popped by `tensor = yield pop(name)`. Here is an example with three layers. A skip tensor named “1to3” is stashed and popped at the first and last layer, respectively: @skippable(stash=['1to3']) class Layer1(nn.Module): def forward(self, input): yield stash('1to3', input) return f1(input) class Layer2(nn.Module): def forward(self, input): return f2(input) @skippable(pop=['1to3']) class Layer3(nn.Module): def forward(self, input): skip_1to3 = yield pop('1to3') return f3(input) + skip_1to3 model = nn.Sequential(Layer1(), Layer2(), Layer3()) One skippable module can stash or pop multiple skip tensors: @skippable(stash=['alice', 'bob'], pop=['carol']) class StashStashPop(nn.Module): def forward(self, input): yield stash('alice', f_alice(input)) yield stash('bob', f_bob(input)) carol = yield pop('carol') return input + carol Every skip tensor must be associated with exactly one pair of `stash` and `pop`. `Pipe` checks this restriction automatically when wrapping a module. You can also check the restriction by `verify_skippables()` without `Pipe`. `class torch.distributed.pipeline.sync.skip.skippable.stash(name, tensor)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributed/pipeline/sync/skip/skippable.html#stash) The command to stash a skip tensor. def forward(self, input): yield stash('name', input) return f(input) Parameters * **name** ([str](https://docs.python.org/3/library/stdtypes.html#str "\(in Python v3.9\)")) – name of skip tensor * **input** ([torch.Tensor](tensors#torch.Tensor "torch.Tensor") _or_[None](https://docs.python.org/3/library/constants.html#None "\(in Python v3.9\)")) – tensor to pass to the skip connection `class torch.distributed.pipeline.sync.skip.skippable.pop(name)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributed/pipeline/sync/skip/skippable.html#pop) The command to pop a skip tensor. def forward(self, input): skip = yield pop('name') return f(input) + skip Parameters **name** ([str](https://docs.python.org/3/library/stdtypes.html#str "\(in Python v3.9\)")) – name of skip tensor Returns the skip tensor previously stashed by another layer under the same name `torch.distributed.pipeline.sync.skip.skippable.verify_skippables(module)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributed/pipeline/sync/skip/skippable.html#verify_skippables) Verifies if the underlying skippable modules satisfy integrity. Every skip tensor must have only one pair of `stash` and `pop`. If there are one or more unmatched pairs, it will raise [`TypeError`](https://docs.python.org/3/library/exceptions.html#TypeError "\(in Python v3.9\)") with the detailed messages. Here are a few failure cases. `verify_skippables()` will report failure for these cases: # Layer1 stashes "1to3". # Layer3 pops "1to3". nn.Sequential(Layer1(), Layer2()) # └──── ? nn.Sequential(Layer2(), Layer3()) # ? ────┘ nn.Sequential(Layer1(), Layer2(), Layer3(), Layer3()) # └───────────────────┘ ^^^^^^ nn.Sequential(Layer1(), Layer1(), Layer2(), Layer3()) # ^^^^^^ └───────────────────┘ To use the same name for multiple skip tensors, they must be isolated by different namespaces. See `isolate()`. 
Raises [**TypeError**](https://docs.python.org/3/library/exceptions.html#TypeError "\(in Python v3.9\)") – one or more pairs of `stash` and `pop` are not matched. ## Acknowledgements The implementation for pipeline parallelism is based on [fairscale’s pipe implementation](https://github.com/facebookresearch/fairscale/tree/master/fairscale/nn/pipe) and [torchgpipe](https://github.com/kakaobrain/torchgpipe). We would like to thank both teams for their contributions and guidance towards bringing pipeline parallelism into PyTorch. # torch.random `torch.random.fork_rng(devices=None, enabled=True, _caller='fork_rng', _devices_kw='devices')` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/random.html#fork_rng) Forks the RNG, so that when you return, the RNG is reset to the state that it was previously in. Parameters * **devices** (_iterable of CUDA IDs_) – CUDA devices for which to fork the RNG. CPU RNG state is always forked. By default, `fork_rng()` operates on all devices, but will emit a warning if your machine has a lot of devices, since this function will run very slowly in that case. If you explicitly specify devices, this warning will be suppressed * **enabled** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – if `False`, the RNG is not forked. This is a convenience argument for easily disabling the context manager without having to delete it and unindent your Python code under it. `torch.random.get_rng_state()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/random.html#get_rng_state) Returns the random number generator state as a `torch.ByteTensor`. `torch.random.initial_seed()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/random.html#initial_seed) Returns the initial seed for generating random numbers as a Python `long`. `torch.random.manual_seed(seed)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/random.html#manual_seed) Sets the seed for generating random numbers. Returns a `torch.Generator` object. Parameters **seed** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – The desired seed. Value must be within the inclusive range `[-0x8000_0000_0000_0000, 0xffff_ffff_ffff_ffff]`. Otherwise, a RuntimeError is raised. Negative inputs are remapped to positive values with the formula `0xffff_ffff_ffff_ffff + seed`. `torch.random.seed()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/random.html#seed) Sets the seed for generating random numbers to a non-deterministic random number. Returns a 64 bit number used to seed the RNG. `torch.random.set_rng_state(new_state)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/random.html#set_rng_state) Sets the random number generator state. Parameters **new_state** (_torch.ByteTensor_) – The desired state # Distributed RPC Framework The distributed RPC framework provides mechanisms for multi-machine model training through a set of primitives to allow for remote communication, and a higher-level API to automatically differentiate models split across several machines. Warning APIs in the RPC package are stable. There are multiple ongoing work items to improve performance and error handling, which will ship in future releases. Note Please refer to [PyTorch Distributed Overview](https://pytorch.org/tutorials/beginner/dist_overview.html) for a brief introduction to all features related to distributed training. 
## Basics

The distributed RPC framework makes it easy to run functions remotely, supports referencing remote objects without copying the real data around, and provides autograd and optimizer APIs to transparently run backward passes and update parameters across RPC boundaries. These features can be categorized into four sets of APIs.

1. **Remote Procedure Call (RPC)** supports running a function on the specified destination worker with the given arguments and getting the return value back or creating a reference to the return value. There are three main RPC APIs: `rpc_sync()` (synchronous), `rpc_async()` (asynchronous), and `remote()` (asynchronous, returning a reference to the remote return value); a short sketch contrasting the three call styles follows this list. Use the synchronous API if the user code cannot proceed without the return value. Otherwise, use the asynchronous API to get a future, and wait on the future when the return value is needed on the caller. The `remote()` API is useful when the requirement is to create something remotely but never fetch it to the caller. Imagine the case that a driver process is setting up a parameter server and a trainer. The driver can create an embedding table on the parameter server and then share the reference to the embedding table with the trainer, but it will never use the embedding table locally. In this case, `rpc_sync()` and `rpc_async()` are no longer appropriate, as they always imply that the return value will be returned to the caller immediately or in the future.
2. **Remote Reference (RRef)** serves as a distributed shared pointer to a local or remote object. It can be shared with other workers and reference counting will be handled transparently. Each RRef only has one owner and the object only lives on that owner. Non-owner workers holding RRefs can get copies of the object from the owner by explicitly requesting it. This is useful when a worker needs to access some data object, but is itself neither the creator (the caller of `remote()`) nor the owner of the object. The distributed optimizer, as we will discuss below, is one example of such use cases.
3. **Distributed Autograd** stitches together the local autograd engines on all the workers involved in the forward pass, and automatically reaches out to them during the backward pass to compute gradients. This is especially helpful if the forward pass needs to span multiple machines, e.g., in distributed model parallel training, parameter-server training, etc. With this feature, user code no longer needs to worry about how to send gradients across RPC boundaries and in which order the local autograd engines should be launched, which can become quite complicated when there are nested and inter-dependent RPC calls in the forward pass.
4. The **Distributed Optimizer**’s constructor takes an [`Optimizer()`](optim#torch.optim.Optimizer "torch.optim.Optimizer") (e.g., [`SGD()`](optim#torch.optim.SGD "torch.optim.SGD"), [`Adagrad()`](optim#torch.optim.Adagrad "torch.optim.Adagrad"), etc.) and a list of parameter RRefs, creates an [`Optimizer()`](optim#torch.optim.Optimizer "torch.optim.Optimizer") instance on each distinct RRef owner, and updates parameters accordingly when running `step()`. When you have distributed forward and backward passes, parameters and gradients will be scattered across multiple workers, and hence an optimizer is needed on each of the involved workers. The Distributed Optimizer wraps all those local optimizers into one, and provides a concise constructor and `step()` API.
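As a minimal illustration of the three call styles above (this sketch assumes two already-initialized workers, with the peer named "ps" purely for illustration; see the `init_rpc()` documentation below for the actual setup):

>>> import torch
>>> import torch.distributed.rpc as rpc
>>>
>>> # Blocking call: waits for the result of torch.add on worker "ps".
>>> ret = rpc.rpc_sync("ps", torch.add, args=(torch.ones(2), 1))
>>> # Non-blocking call: returns a Future immediately.
>>> fut = rpc.rpc_async("ps", torch.add, args=(torch.ones(2), 1))
>>> # Remote call: returns an RRef; the result stays on "ps" until fetched.
>>> rref = rpc.remote("ps", torch.add, args=(torch.ones(2), 1))
>>>
>>> total = ret + fut.wait() + rref.to_here()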
## RPC Before using RPC and distributed autograd primitives, initialization must take place. To initialize the RPC framework we need to use `init_rpc()` which would initialize the RPC framework, RRef framework and distributed autograd. `torch.distributed.rpc.init_rpc(name, backend=None, rank=-1, world_size=None, rpc_backend_options=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributed/rpc.html#init_rpc) Initializes RPC primitives such as the local RPC agent and distributed autograd, which immediately makes the current process ready to send and receive RPCs. Parameters * **name** ([str](https://docs.python.org/3/library/stdtypes.html#str "\(in Python v3.9\)")) – a globally unique name of this node. (e.g., `Trainer3`, `ParameterServer2`, `Master`, `Worker1`) Name can only contain number, alphabet, underscore, colon, and/or dash, and must be shorter than 128 characters. * **backend** (BackendType _,__optional_) – The type of RPC backend implementation. Supported values include `BackendType.TENSORPIPE` (the default) and `BackendType.PROCESS_GROUP`. See Backends for more information. * **rank** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – a globally unique id/rank of this node. * **world_size** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – The number of workers in the group. * **rpc_backend_options** (RpcBackendOptions _,__optional_) – The options passed to the RpcAgent constructor. It must be an agent-specific subclass of `RpcBackendOptions` and contains agent-specific initialization configurations. By default, for all agents, it sets the default timeout to 60 seconds and performs the rendezvous with an underlying process group initialized using `init_method = "env://"`, meaning that environment variables `MASTER_ADDR` and `MASTER_PORT` need to be set properly. See Backends for more information and find which options are available. The following APIs allow users to remotely execute functions as well as create references (RRefs) to remote data objects. In these APIs, when passing a `Tensor` as an argument or a return value, the destination worker will try to create a `Tensor` with the same meta (i.e., shape, stride, etc.). We intentionally disallow transmitting CUDA tensors because it might crash if the device lists on source and destination workers do not match. In such cases, applications can always explicitly move the input tensors to CPU on the caller and move it to the desired devices on the callee if necessary. Warning TorchScript support in RPC is a prototype feature and subject to change. Since v1.5.0, `torch.distributed.rpc` supports calling TorchScript functions as RPC target functions, and this will help improve parallelism on the callee side as executing TorchScript functions does not require GIL. `torch.distributed.rpc.rpc_sync(to, func, args=None, kwargs=None, timeout=-1.0)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributed/rpc/api.html#rpc_sync) Make a blocking RPC call to run function `func` on worker `to`. RPC messages are sent and received in parallel to execution of Python code. This method is thread-safe. Parameters * **to** ([str](https://docs.python.org/3/library/stdtypes.html#str "\(in Python v3.9\)") _or_WorkerInfo _or_[int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – name/rank/`WorkerInfo` of the destination worker. * **func** (_callable_) – a callable function, such as Python callables, builtin operators (e.g. 
[`add()`](generated/torch.add#torch.add "torch.add")) and annotated TorchScript functions. * **args** ([tuple](https://docs.python.org/3/library/stdtypes.html#tuple "\(in Python v3.9\)")) – the argument tuple for the `func` invocation. * **kwargs** ([dict](https://docs.python.org/3/library/stdtypes.html#dict "\(in Python v3.9\)")) – is a dictionary of keyword arguments for the `func` invocation. * **timeout** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _,__optional_) – timeout in seconds to use for this RPC. If the RPC does not complete in this amount of time, an exception indicating it has timed out will be raised. A value of 0 indicates an infinite timeout, i.e. a timeout error will never be raised. If not provided, the default value set during initialization or with `_set_rpc_timeout` is used. Returns Returns the result of running `func` with `args` and `kwargs`. Warning Using GPU tensors as arguments or return values of `func` is not supported since we don’t support sending GPU tensors over the wire. You need to explicitly copy GPU tensors to CPU before using them as arguments or return values of `func`. Example:: Make sure that `MASTER_ADDR` and `MASTER_PORT` are set properly on both workers. Refer to [`init_process_group()`](distributed#torch.distributed.init_process_group "torch.distributed.init_process_group") API for more details. For example, >>> export MASTER_ADDR=localhost >>> export MASTER_PORT=5678 Then run the following code in two different processes: >>> # On worker 0: >>> import torch >>> import torch.distributed.rpc as rpc >>> rpc.init_rpc("worker0", rank=0, world_size=2) >>> ret = rpc.rpc_sync("worker1", torch.add, args=(torch.ones(2), 3)) >>> rpc.shutdown() >>> # On worker 1: >>> import torch.distributed.rpc as rpc >>> rpc.init_rpc("worker1", rank=1, world_size=2) >>> rpc.shutdown() Below is an example of running a TorchScript function using RPC. >>> # On both workers: >>> @torch.jit.script >>> def my_script_add(t1, t2): >>> return torch.add(t1, t2) >>> # On worker 0: >>> import torch.distributed.rpc as rpc >>> rpc.init_rpc("worker0", rank=0, world_size=2) >>> ret = rpc.rpc_sync("worker1", my_script_add, args=(torch.ones(2), 3)) >>> rpc.shutdown() >>> # On worker 1: >>> import torch.distributed.rpc as rpc >>> rpc.init_rpc("worker1", rank=1, world_size=2) >>> rpc.shutdown() `torch.distributed.rpc.rpc_async(to, func, args=None, kwargs=None, timeout=-1.0)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributed/rpc/api.html#rpc_async) Make a non-blocking RPC call to run function `func` on worker `to`. RPC messages are sent and received in parallel to execution of Python code. This method is thread-safe. This method will immediately return a [`Future`](futures#torch.futures.Future "torch.futures.Future") that can be awaited on. Parameters * **to** ([str](https://docs.python.org/3/library/stdtypes.html#str "\(in Python v3.9\)") _or_WorkerInfo _or_[int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – name/rank/`WorkerInfo` of the destination worker. * **func** (_callable_) – a callable function, such as Python callables, builtin operators (e.g. [`add()`](generated/torch.add#torch.add "torch.add")) and annotated TorchScript functions. * **args** ([tuple](https://docs.python.org/3/library/stdtypes.html#tuple "\(in Python v3.9\)")) – the argument tuple for the `func` invocation. 
* **kwargs** ([dict](https://docs.python.org/3/library/stdtypes.html#dict "\(in Python v3.9\)")) – is a dictionary of keyword arguments for the `func` invocation. * **timeout** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _,__optional_) – timeout in seconds to use for this RPC. If the RPC does not complete in this amount of time, an exception indicating it has timed out will be raised. A value of 0 indicates an infinite timeout, i.e. a timeout error will never be raised. If not provided, the default value set during initialization or with `_set_rpc_timeout` is used. Returns Returns a [`Future`](futures#torch.futures.Future "torch.futures.Future") object that can be waited on. When completed, the return value of `func` on `args` and `kwargs` can be retrieved from the [`Future`](futures#torch.futures.Future "torch.futures.Future") object. Warning Using GPU tensors as arguments or return values of `func` is not supported since we don’t support sending GPU tensors over the wire. You need to explicitly copy GPU tensors to CPU before using them as arguments or return values of `func`. Warning The `rpc_async` API does not copy storages of argument tensors until sending them over the wire, which could be done by a different thread depending on the RPC backend type. The caller should make sure that the contents of those tensors stay intact until the returned [`Future`](futures#torch.futures.Future "torch.futures.Future") completes. Example:: Make sure that `MASTER_ADDR` and `MASTER_PORT` are set properly on both workers. Refer to [`init_process_group()`](distributed#torch.distributed.init_process_group "torch.distributed.init_process_group") API for more details. For example, >>> export MASTER_ADDR=localhost >>> export MASTER_PORT=5678 Then run the following code in two different processes: >>> # On worker 0: >>> import torch >>> import torch.distributed.rpc as rpc >>> rpc.init_rpc("worker0", rank=0, world_size=2) >>> fut1 = rpc.rpc_async("worker1", torch.add, args=(torch.ones(2), 3)) >>> fut2 = rpc.rpc_async("worker1", min, args=(1, 2)) >>> result = fut1.wait() + fut2.wait() >>> rpc.shutdown() >>> # On worker 1: >>> import torch.distributed.rpc as rpc >>> rpc.init_rpc("worker1", rank=1, world_size=2) >>> rpc.shutdown() Below is an example of running a TorchScript function using RPC. >>> # On both workers: >>> @torch.jit.script >>> def my_script_add(t1, t2): >>> return torch.add(t1, t2) >>> # On worker 0: >>> import torch.distributed.rpc as rpc >>> rpc.init_rpc("worker0", rank=0, world_size=2) >>> fut = rpc.rpc_async("worker1", my_script_add, args=(torch.ones(2), 3)) >>> ret = fut.wait() >>> rpc.shutdown() >>> # On worker 1: >>> import torch.distributed.rpc as rpc >>> rpc.init_rpc("worker1", rank=1, world_size=2) >>> rpc.shutdown() `torch.distributed.rpc.remote(to, func, args=None, kwargs=None, timeout=-1.0)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributed/rpc/api.html#remote) Make a remote call to run `func` on worker `to` and return an `RRef` to the result value immediately. Worker `to` will be the owner of the returned `RRef`, and the worker calling `remote` is a user. The owner manages the global reference count of its `RRef`, and the owner `RRef` is only destructed when globally there are no living references to it. 
Parameters * **to** ([str](https://docs.python.org/3/library/stdtypes.html#str "\(in Python v3.9\)") _or_WorkerInfo _or_[int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – name/rank/`WorkerInfo` of the destination worker. * **func** (_callable_) – a callable function, such as Python callables, builtin operators (e.g. [`add()`](generated/torch.add#torch.add "torch.add")) and annotated TorchScript functions. * **args** ([tuple](https://docs.python.org/3/library/stdtypes.html#tuple "\(in Python v3.9\)")) – the argument tuple for the `func` invocation. * **kwargs** ([dict](https://docs.python.org/3/library/stdtypes.html#dict "\(in Python v3.9\)")) – is a dictionary of keyword arguments for the `func` invocation. * **timeout** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _,__optional_) – timeout in seconds for this remote call. If the creation of this `RRef` on worker `to` is not successfully processed on this worker within this timeout, then the next time there is an attempt to use the RRef (such as `to_here()`), a timeout will be raised indicating this failure. A value of 0 indicates an infinite timeout, i.e. a timeout error will never be raised. If not provided, the default value set during initialization or with `_set_rpc_timeout` is used. Returns A user `RRef` instance to the result value. Use the blocking API `torch.distributed.rpc.RRef.to_here()` to retrieve the result value locally. Warning Using GPU tensors as arguments or return values of `func` is not supported since we don’t support sending GPU tensors over the wire. You need to explicitly copy GPU tensors to CPU before using them as arguments or return values of `func`. Warning The `remote` API does not copy storages of argument tensors until sending them over the wire, which could be done by a different thread depending on the RPC backend type. The caller should make sure that the contents of those tensors stay intact until the returned RRef is confirmed by the owner, which can be checked using the `torch.distributed.rpc.RRef.confirmed_by_owner()` API. Warning Errors such as timeouts for the `remote` API are handled on a best-effort basis. This means that when remote calls initiated by `remote` fail, such as with a timeout error, we take a best-effort approach to error handling. This means that errors are handled and set on the resulting RRef on an asynchronous basis. If the RRef has not been used by the application before this handling (such as `to_here` or fork call), then future uses of the `RRef` will appropriately raise errors. However, it is possible that the user application will use the `RRef` before the errors are handled. In this case, errors may not be raised as they have not yet been handled. Example:: Make sure that `MASTER_ADDR` and `MASTER_PORT` are set properly on both workers. Refer to [`init_process_group()`](distributed#torch.distributed.init_process_group "torch.distributed.init_process_group") API for more details. 
For example, >>> export MASTER_ADDR=localhost >>> export MASTER_PORT=5678 Then run the following code in two different processes: >>> # On worker 0: >>> import torch >>> import torch.distributed.rpc as rpc >>> rpc.init_rpc("worker0", rank=0, world_size=2) >>> rref1 = rpc.remote("worker1", torch.add, args=(torch.ones(2), 3)) >>> rref2 = rpc.remote("worker1", torch.add, args=(torch.ones(2), 1)) >>> x = rref1.to_here() + rref2.to_here() >>> rpc.shutdown() >>> # On worker 1: >>> import torch.distributed.rpc as rpc >>> rpc.init_rpc("worker1", rank=1, world_size=2) >>> rpc.shutdown() Below is an example of running a TorchScript function using RPC. >>> # On both workers: >>> @torch.jit.script >>> def my_script_add(t1, t2): >>> return torch.add(t1, t2) >>> # On worker 0: >>> import torch.distributed.rpc as rpc >>> rpc.init_rpc("worker0", rank=0, world_size=2) >>> rref = rpc.remote("worker1", my_script_add, args=(torch.ones(2), 3)) >>> rref.to_here() >>> rpc.shutdown() >>> # On worker 1: >>> import torch.distributed.rpc as rpc >>> rpc.init_rpc("worker1", rank=1, world_size=2) >>> rpc.shutdown() `torch.distributed.rpc.get_worker_info(worker_name=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributed/rpc/api.html#get_worker_info) Get `WorkerInfo` of a given worker name. Use this `WorkerInfo` to avoid passing an expensive string on every invocation. Parameters **worker_name** ([str](https://docs.python.org/3/library/stdtypes.html#str "\(in Python v3.9\)")) – the string name of a worker. If `None`, return the the id of the current worker. (default `None`) Returns `WorkerInfo` instance for the given `worker_name` or `WorkerInfo` of the current worker if `worker_name` is `None`. `torch.distributed.rpc.shutdown(graceful=True)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributed/rpc/api.html#shutdown) Perform a shutdown of the RPC agent, and then destroy the RPC agent. This stops the local agent from accepting outstanding requests, and shuts down the RPC framework by terminating all RPC threads. If `graceful=True`, this will block until all local and remote RPC processes reach this method and wait for all outstanding work to complete. Otherwise, if `graceful=False`, this is a local shutdown, and it does not wait for other RPC processes to reach this method. Warning For [`Future`](futures#torch.futures.Future "torch.futures.Future") objects returned by `rpc_async()`, `future.wait()` should not be called after `shutdown()`. Parameters **graceful** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – Whether to do a graceful shutdown or not. If True, this will 1) wait until there is no pending system messages for `UserRRefs` and delete them; 2) block until all local and remote RPC processes have reached this method and wait for all outstanding work to complete. Example:: Make sure that `MASTER_ADDR` and `MASTER_PORT` are set properly on both workers. Refer to [`init_process_group()`](distributed#torch.distributed.init_process_group "torch.distributed.init_process_group") API for more details. 
For example, >>> export MASTER_ADDR=localhost >>> export MASTER_PORT=5678 Then run the following code in two different processes: >>> # On worker 0: >>> import torch >>> import torch.distributed.rpc as rpc >>> rpc.init_rpc("worker0", rank=0, world_size=2) >>> # do some work >>> result = rpc.rpc_sync("worker1", torch.add, args=(torch.ones(1), 1)) >>> # ready to shutdown >>> rpc.shutdown() >>> # On worker 1: >>> import torch.distributed.rpc as rpc >>> rpc.init_rpc("worker1", rank=1, world_size=2) >>> # wait for worker 0 to finish work, and then shutdown. >>> rpc.shutdown() `class torch.distributed.rpc.WorkerInfo` A structure that encapsulates information of a worker in the system. Contains the name and ID of the worker. This class is not meant to be constructed directly, rather, an instance can be retrieved through `get_worker_info()` and the result can be passed in to functions such as `rpc_sync()`, `rpc_async()`, `remote()` to avoid copying a string on every invocation. `property id` Globally unique id to identify the worker. `property name` The name of the worker. The RPC package also provides decorators which allow applications to specify how a given function should be treated on the callee side. `torch.distributed.rpc.functions.async_execution(fn)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributed/rpc/functions.html#async_execution) A decorator for a function indicating that the return value of the function is guaranteed to be a [`Future`](futures#torch.futures.Future "torch.futures.Future") object and this function can run asynchronously on the RPC callee. More specifically, the callee extracts the [`Future`](futures#torch.futures.Future "torch.futures.Future") returned by the wrapped function and installs subsequent processing steps as a callback to that [`Future`](futures#torch.futures.Future "torch.futures.Future"). The installed callback will read the value from the [`Future`](futures#torch.futures.Future "torch.futures.Future") when completed and send the value back as the RPC response. That also means the returned [`Future`](futures#torch.futures.Future "torch.futures.Future") only exists on the callee side and is never sent through RPC. This decorator is useful when the wrapped function’s (`fn`) execution needs to pause and resume due to, e.g., containing `rpc_async()` or waiting for other signals. Note To enable asynchronous execution, applications must pass the function object returned by this decorator to RPC APIs. If RPC detected attributes installed by this decorator, it knows that this function returns a `Future` object and will handle that accordingly. However, this does not mean this decorator has to be outmost one when defining a function. For example, when combined with `@staticmethod` or `@classmethod`, `@rpc.functions.async_execution` needs to be the inner decorator to allow the target function be recognized as a static or class function. This target function can still execute asynchronously because, when accessed, the static or class method preserves attributes installed by `@rpc.functions.async_execution`. Example:: The returned [`Future`](futures#torch.futures.Future "torch.futures.Future") object can come from `rpc_async()`, [`then()`](futures#torch.futures.Future.then "torch.futures.Future.then"), or [`Future`](futures#torch.futures.Future "torch.futures.Future") constructor. 
The example below shows directly using the [`Future`](futures#torch.futures.Future "torch.futures.Future") returned by [`then()`](futures#torch.futures.Future.then "torch.futures.Future.then"). >>> from torch.distributed import rpc >>> >>> # omitting setup and shutdown RPC >>> >>> # On all workers >>> @rpc.functions.async_execution >>> def async_add_chained(to, x, y, z): >>> # This function runs on "worker1" and returns immediately when >>> # the callback is installed through the `then(cb)` API. In the >>> # mean time, the `rpc_async` to "worker2" can run concurrently. >>> # When the return value of that `rpc_async` arrives at >>> # "worker1", "worker1" will run the lambda function accordingly >>> # and set the value for the previously returned `Future`, which >>> # will then trigger RPC to send the result back to "worker0". >>> return rpc.rpc_async(to, torch.add, args=(x, y)).then( >>> lambda fut: fut.wait() + z >>> ) >>> >>> # On worker0 >>> ret = rpc.rpc_sync( >>> "worker1", >>> async_add_chained, >>> args=("worker2", torch.ones(2), 1, 1) >>> ) >>> print(ret) # prints tensor([3., 3.]) When combined with TorchScript decorators, this decorator must be the outmost one. >>> from torch import Tensor >>> from torch.futures import Future >>> from torch.distributed import rpc >>> >>> # omitting setup and shutdown RPC >>> >>> # On all workers >>> @torch.jit.script >>> def script_add(x: Tensor, y: Tensor) -> Tensor: >>> return x + y >>> >>> @rpc.functions.async_execution >>> @torch.jit.script >>> def async_add(to: str, x: Tensor, y: Tensor) -> Future[Tensor]: >>> return rpc.rpc_async(to, script_add, (x, y)) >>> >>> # On worker0 >>> ret = rpc.rpc_sync( >>> "worker1", >>> async_add, >>> args=("worker2", torch.ones(2), 1) >>> ) >>> print(ret) # prints tensor([2., 2.]) When combined with static or class method, this decorator must be the inner one. >>> from torch.distributed import rpc >>> >>> # omitting setup and shutdown RPC >>> >>> # On all workers >>> class AsyncExecutionClass: >>> >>> @staticmethod >>> @rpc.functions.async_execution >>> def static_async_add(to, x, y, z): >>> return rpc.rpc_async(to, torch.add, args=(x, y)).then( >>> lambda fut: fut.wait() + z >>> ) >>> >>> @classmethod >>> @rpc.functions.async_execution >>> def class_async_add(cls, to, x, y, z): >>> ret_fut = torch.futures.Future() >>> rpc.rpc_async(to, torch.add, args=(x, y)).then( >>> lambda fut: ret_fut.set_result(fut.wait() + z) >>> ) >>> return ret_fut >>> >>> @rpc.functions.async_execution >>> def bound_async_add(self, to, x, y, z): >>> return rpc.rpc_async(to, torch.add, args=(x, y)).then( >>> lambda fut: fut.wait() + z >>> ) >>> >>> # On worker0 >>> ret = rpc.rpc_sync( >>> "worker1", >>> AsyncExecutionClass.static_async_add, >>> args=("worker2", torch.ones(2), 1, 2) >>> ) >>> print(ret) # prints tensor([4., 4.]) >>> >>> ret = rpc.rpc_sync( >>> "worker1", >>> AsyncExecutionClass.class_async_add, >>> args=("worker2", torch.ones(2), 1, 2) >>> ) >>> print(ret) # prints tensor([4., 4.]) This decorator also works with RRef helpers, i.e., . `torch.distributed.rpc.RRef.rpc_sync()`, `torch.distributed.rpc.RRef.rpc_async()`, and `torch.distributed.rpc.RRef.remote()`. 
>>> from torch.distributed import rpc >>> >>> # reuse the AsyncExecutionClass class above >>> rref = rpc.remote("worker1", AsyncExecutionClass) >>> ret = rref.rpc_sync().static_async_add("worker2", torch.ones(2), 1, 2) >>> print(ret) # prints tensor([4., 4.]) >>> >>> rref = rpc.remote("worker1", AsyncExecutionClass) >>> ret = rref.rpc_async().static_async_add("worker2", torch.ones(2), 1, 2).wait() >>> print(ret) # prints tensor([4., 4.]) >>> >>> rref = rpc.remote("worker1", AsyncExecutionClass) >>> ret = rref.remote().static_async_add("worker2", torch.ones(2), 1, 2).to_here() >>> print(ret) # prints tensor([4., 4.]) ### Backends The RPC module can leverage different backends to perform the communication between the nodes. The backend to be used can be specified in the `init_rpc()` function, by passing a certain value of the `BackendType` enum. Regardless of what backend is used, the rest of the RPC API won’t change. Each backend also defines its own subclass of the `RpcBackendOptions` class, an instance of which can also be passed to `init_rpc()` to configure the backend’s behavior. `class torch.distributed.rpc.BackendType` An enum class of available backends. PyTorch ships with two builtin backends: `BackendType.TENSORPIPE` and `BackendType.PROCESS_GROUP`. Additional ones can be registered using the `register_backend()` function. `class torch.distributed.rpc.RpcBackendOptions` An abstract structure encapsulating the options passed into the RPC backend. An instance of this class can be passed in to `init_rpc()` in order to initialize RPC with specific configurations, such as the RPC timeout and `init_method` to be used. `property init_method` URL specifying how to initialize the process group. Default is `env://` `property rpc_timeout` A float indicating the timeout to use for all RPCs. If an RPC does not complete in this timeframe, it will complete with an exception indicating that it has timed out. #### TensorPipe Backend The TensorPipe agent, which is the default, leverages [the TensorPipe library](https://github.com/pytorch/tensorpipe), which provides a natively point-to-point communication primitive specifically suited for machine learning that fundamentally addresses some of the limitations of Gloo. Compared to Gloo, it has the advantage of being asynchronous, which allows a large number of transfers to occur simultaneously, each at their own speed, without blocking each other. It will only open pipes between pairs of nodes when needed, on demand, and when one node fails only its incident pipes will be closed, while all other ones will keep working as normal. In addition, it is able to support multiple different transports (TCP, of course, but also shared memory, NVLink, InfiniBand, …) and can automatically detect their availability and negotiate the best transport to use for each pipe. The TensorPipe backend has been introduced in PyTorch v1.6 and is being actively developed. At the moment, it only supports CPU tensors, with GPU support coming soon. It comes with a TCP-based transport, just like Gloo. It is also able to automatically chunk and multiplex large tensors over multiple sockets and threads in order to achieve very high bandwidths. The agent will be able to pick the best transport on its own, with no intervention required. 
Example: >>> import os >>> from torch.distributed import rpc >>> os.environ['MASTER_ADDR'] = 'localhost' >>> os.environ['MASTER_PORT'] = '29500' >>> >>> rpc.init_rpc( >>> "worker1", >>> rank=0, >>> world_size=2, >>> rpc_backend_options=rpc.TensorPipeRpcBackendOptions( >>> num_worker_threads=8, >>> rpc_timeout=20 # 20 second timeout >>> ) >>> ) >>> >>> # omitting init_rpc invocation on worker2 `class torch.distributed.rpc.TensorPipeRpcBackendOptions(*, num_worker_threads=16, rpc_timeout=60.0, init_method='env://', device_maps=None, _transports=None, _channels=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributed/rpc/options.html#TensorPipeRpcBackendOptions) The backend options for `TensorPipeAgent`, derived from `RpcBackendOptions`. Parameters * **num_worker_threads** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – The number of threads in the thread-pool used by `TensorPipeAgent` to execute requests (default: 16). * **rpc_timeout** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _,__optional_) – The default timeout, in seconds, for RPC requests (default: 60 seconds). If the RPC has not completed in this timeframe, an exception indicating so will be raised. Callers can override this timeout for individual RPCs in `rpc_sync()` and `rpc_async()` if necessary. * **init_method** ([str](https://docs.python.org/3/library/stdtypes.html#str "\(in Python v3.9\)") _,__optional_) – The URL to initialize the distributed store used for rendezvous. It takes any value accepted for the same argument of [`init_process_group()`](distributed#torch.distributed.init_process_group "torch.distributed.init_process_group") (default: `env://`). * **device_maps** (_Dict_ _[_[str](https://docs.python.org/3/library/stdtypes.html#str "\(in Python v3.9\)") _,__Dict_ _]_) – Device placement mappings from this worker to the callee. Key is the callee worker name and value the dictionary (`Dict` of `int`, `str`, or `torch.device`) that maps this worker’s devices to the callee worker’s devices. (default: `None`) `property device_maps` The device map locations. `property init_method` URL specifying how to initialize the process group. Default is `env://` `property num_worker_threads` The number of threads in the thread-pool used by `TensorPipeAgent` to execute requests. `property rpc_timeout` A float indicating the timeout to use for all RPCs. If an RPC does not complete in this timeframe, it will complete with an exception indicating that it has timed out. `set_device_map(to, device_map)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributed/rpc/options.html#TensorPipeRpcBackendOptions.set_device_map) Set device mapping between each RPC caller and callee pair. This function can be called multiple times to incrementally add device placement configurations. Parameters * **worker_name** ([str](https://docs.python.org/3/library/stdtypes.html#str "\(in Python v3.9\)")) – Callee name. * **device_map** (_Dict of python:int_ _,_[str](https://docs.python.org/3/library/stdtypes.html#str "\(in Python v3.9\)") _, or_[torch.device](tensor_attributes#torch.torch.device "torch.torch.device")) – Device placement mappings from this worker to the callee. This map must be invertible. 
Example::

>>> # both workers
>>> def add(x, y):
>>>     print(x)  # tensor([1., 1.], device='cuda:1')
>>>     return x + y, (x + y).to(2)
>>>
>>> # on worker 0
>>> options = TensorPipeRpcBackendOptions(
>>>     num_worker_threads=8,
>>>     device_maps={"worker1": {0: 1}}
>>>     # maps worker0's cuda:0 to worker1's cuda:1
>>> )
>>> options.set_device_map("worker1", {1: 2})
>>> # maps worker0's cuda:1 to worker1's cuda:2
>>>
>>> rpc.init_rpc(
>>>     "worker0",
>>>     rank=0,
>>>     world_size=2,
>>>     backend=rpc.BackendType.TENSORPIPE,
>>>     rpc_backend_options=options
>>> )
>>>
>>> x = torch.ones(2)
>>> rets = rpc.rpc_sync("worker1", add, args=(x.to(0), 1))
>>> # The first argument will be moved to cuda:1 on worker1. When
>>> # sending the return values back, they will follow the inverse of
>>> # the device map, and hence will be moved back to cuda:0 and
>>> # cuda:1 on worker0
>>> print(rets[0])  # tensor([2., 2.], device='cuda:0')
>>> print(rets[1])  # tensor([2., 2.], device='cuda:1')

#### Process Group Backend

Warning

The Process Group Backend will be deprecated soon; we recommend using the TensorPipe Backend instead.

The Process Group agent instantiates a process group from the [`distributed`](distributed#module-torch.distributed "torch.distributed") module and utilizes its point-to-point communication capabilities to send RPC messages. Internally, the process group uses [the Gloo library](https://github.com/facebookincubator/gloo/). Gloo has been hardened by years of extensive use in PyTorch and is thus very reliable. However, as it was designed to perform collective communication, it may not always be the best fit for RPC. For example, each networking operation is synchronous and blocking, which means that it cannot be run in parallel with others. Moreover, it opens a connection between all pairs of nodes, and brings down all of them when one fails, thus reducing the resiliency and the elasticity of the system.

Example:

>>> import os
>>> from torch.distributed import rpc
>>> os.environ['MASTER_ADDR'] = 'localhost'
>>> os.environ['MASTER_PORT'] = '29500'
>>>
>>> rpc.init_rpc(
>>>     "worker1",
>>>     rank=0,
>>>     world_size=2,
>>>     backend=rpc.BackendType.PROCESS_GROUP,
>>>     rpc_backend_options=rpc.ProcessGroupRpcBackendOptions(
>>>         num_send_recv_threads=16,
>>>         rpc_timeout=20  # 20 second timeout
>>>     )
>>> )
>>>
>>> # omitting init_rpc invocation on worker2

`class torch.distributed.rpc.ProcessGroupRpcBackendOptions`

The backend options class for `ProcessGroupAgent`, which is derived from `RpcBackendOptions`.

Parameters

* **num_send_recv_threads** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – The number of threads in the thread-pool used by `ProcessGroupAgent` (default: 4).
* **rpc_timeout** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _,__optional_) – The default timeout, in seconds, for RPC requests (default: 60 seconds). If the RPC has not completed in this timeframe, an exception indicating so will be raised. Callers can override this timeout for individual RPCs in `rpc_sync()` and `rpc_async()` if necessary.
* **init_method** ([str](https://docs.python.org/3/library/stdtypes.html#str "\(in Python v3.9\)") _,__optional_) – The URL to initialize `ProcessGroupGloo` (default: `env://`).

`property init_method`

URL specifying how to initialize the process group. Default is `env://`

`property num_send_recv_threads`

The number of threads in the thread-pool used by ProcessGroupAgent.
`property rpc_timeout`

A float indicating the timeout to use for all RPCs. If an RPC does not complete in this timeframe, it will complete with an exception indicating that it has timed out.

## RRef

An `RRef` (Remote REFerence) is a reference to a value of some type `T` (e.g. `Tensor`) on a remote worker. This handle keeps the referenced remote value alive on the owner, but there is no implication that the value will be transferred to the local worker in the future. RRefs can be used in multi-machine training by holding references to [nn.Modules](https://pytorch.org/docs/stable/nn.html#torch.nn.Module) that exist on other workers, and calling the appropriate functions to retrieve or modify their parameters during training. See [Remote Reference Protocol](rpc/rref#remote-reference-protocol) for more details.

`class torch.distributed.rpc.RRef` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributed/rpc/api.html#RRef)

`backward(self: torch._C._distributed_rpc.PyRRef, dist_autograd_ctx_id: int = -1, retain_graph: bool = False) → None`

Runs the backward pass using the RRef as the root of the backward pass. If `dist_autograd_ctx_id` is provided, we perform a distributed backward pass using the provided ctx_id starting from the owner of the RRef. In this case, `get_gradients()` should be used to retrieve the gradients. If `dist_autograd_ctx_id` is `None`, it is assumed that this is a local autograd graph and we only perform a local backward pass. In the local case, the node calling this API has to be the owner of the RRef. The value of the RRef is expected to be a scalar Tensor.

Parameters

* **dist_autograd_ctx_id** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – The distributed autograd context id for which we should retrieve the gradients (default: -1).
* **retain_graph** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – If `False`, the graph used to compute the grad will be freed. Note that in nearly all cases setting this option to `True` is not needed and often can be worked around in a much more efficient way. Usually, you need to set this to `True` to run backward multiple times (default: False).

Example::

>>> import torch.distributed.autograd as dist_autograd
>>> with dist_autograd.context() as context_id:
>>>     rref.backward(context_id)

`confirmed_by_owner(self: torch._C._distributed_rpc.PyRRef) → bool`

Returns whether this `RRef` has been confirmed by the owner. `OwnerRRef` always returns true, while `UserRRef` only returns true when the owner knows about this `UserRRef`.

`is_owner(self: torch._C._distributed_rpc.PyRRef) → bool`

Returns whether or not the current node is the owner of this `RRef`.

`local_value(self: torch._C._distributed_rpc.PyRRef) → object`

If the current node is the owner, returns a reference to the local value. Otherwise, throws an exception.

`owner(self: torch._C._distributed_rpc.PyRRef) → torch._C._distributed_rpc.WorkerInfo`

Returns worker information of the node that owns this `RRef`.

`owner_name(self: torch._C._distributed_rpc.PyRRef) → str`

Returns the worker name of the node that owns this `RRef`.

`remote(self: torch._C._distributed_rpc.PyRRef, timeout: float = -1.0) → object`

Create a helper proxy to easily launch a `remote` using the owner of the RRef as the destination to run functions on the object referenced by this RRef.
More specifically, `rref.remote().func_name(*args, **kwargs)` is the same as the following: >>> def run(rref, func_name, args, kwargs): >>> return getattr(rref.local_value(), func_name)(*args, **kwargs) >>> >>> rpc.remote(rref.owner(), run, args=(rref, func_name, args, kwargs)) Parameters **timeout** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _,__optional_) – Timeout for `rref.remote()`. If the creation of this `RRef` is not successfully completed within the timeout, then the next time there is an attempt to use the RRef (such as `to_here`), a timeout will be raised. If not provided, the default RPC timeout will be used. Please see `rpc.remote()` for specific timeout semantics for `RRef`. Example:: >>> from torch.distributed import rpc >>> rref = rpc.remote("worker1", torch.add, args=(torch.zeros(2, 2), 1)) >>> rref.remote().size().to_here() # returns torch.Size([2, 2]) >>> rref.remote().view(1, 4).to_here() # returns tensor([[1., 1., 1., 1.]]) `rpc_async(self: torch._C._distributed_rpc.PyRRef, timeout: float = -1.0) → object` Create a helper proxy to easily launch an `rpc_async` using the owner of the RRef as the destination to run functions on the object referenced by this RRef. More specifically, `rref.rpc_async().func_name(*args, **kwargs)` is the same as the following: >>> def run(rref, func_name, args, kwargs): >>> return getattr(rref.local_value(), func_name)(*args, **kwargs) >>> >>> rpc.rpc_async(rref.owner(), run, args=(rref, func_name, args, kwargs)) Parameters **timeout** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _,__optional_) – Timeout for `rref.rpc_async()`. If the call does not complete within this timeframe, an exception indicating so will be raised. If this argument is not provided, the default RPC timeout will be used. Example:: >>> from torch.distributed import rpc >>> rref = rpc.remote("worker1", torch.add, args=(torch.zeros(2, 2), 1)) >>> rref.rpc_async().size().wait() # returns torch.Size([2, 2]) >>> rref.rpc_async().view(1, 4).wait() # returns tensor([[1., 1., 1., 1.]]) `rpc_sync(self: torch._C._distributed_rpc.PyRRef, timeout: float = -1.0) → object` Create a helper proxy to easily launch an `rpc_sync` using the owner of the RRef as the destination to run functions on the object referenced by this RRef. More specifically, `rref.rpc_sync().func_name(*args, **kwargs)` is the same as the following: >>> def run(rref, func_name, args, kwargs): >>> return getattr(rref.local_value(), func_name)(*args, **kwargs) >>> >>> rpc.rpc_sync(rref.owner(), run, args=(rref, func_name, args, kwargs)) Parameters **timeout** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _,__optional_) – Timeout for `rref.rpc_sync()`. If the call does not complete within this timeframe, an exception indicating so will be raised. If this argument is not provided, the default RPC timeout will be used. Example:: >>> from torch.distributed import rpc >>> rref = rpc.remote("worker1", torch.add, args=(torch.zeros(2, 2), 1)) >>> rref.rpc_sync().size() # returns torch.Size([2, 2]) >>> rref.rpc_sync().view(1, 4) # returns tensor([[1., 1., 1., 1.]]) `to_here(self: torch._C._distributed_rpc.PyRRef, timeout: float = -1.0) → object` Blocking call that copies the value of the RRef from the owner to the local node and returns it. If the current node is the owner, returns a reference to the local value. 
Parameters **timeout** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _,__optional_) – Timeout for `to_here`. If the call does not complete within this timeframe, an exception indicating so will be raised. If this argument is not provided, the default RPC timeout (60s) will be used. More Information about RRef * [Remote Reference Protocol](rpc/rref) * [Background](rpc/rref#background) * [Assumptions](rpc/rref#assumptions) * [RRef Lifetime](rpc/rref#rref-lifetime) * [Design Reasoning](rpc/rref#design-reasoning) * [Implementation](rpc/rref#implementation) * [Protocol Scenarios](rpc/rref#protocol-scenarios) * [User Share RRef with Owner as Return Value](rpc/rref#user-share-rref-with-owner-as-return-value) * [User Share RRef with Owner as Argument](rpc/rref#user-share-rref-with-owner-as-argument) * [Owner Share RRef with User](rpc/rref#owner-share-rref-with-user) * [User Share RRef with User](rpc/rref#user-share-rref-with-user) ## Distributed Autograd Framework This module provides an RPC-based distributed autograd framework that can be used for applications such as model parallel training. In short, applications may send and receive gradient recording tensors over RPC. In the forward pass, we record when gradient recording tensors are sent over RPC and during the backward pass we use this information to perform a distributed backward pass using RPC. For more details see [Distributed Autograd Design](rpc/distributed_autograd#distributed-autograd-design). `class torch.distributed.autograd.context` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributed/autograd.html#context) Context object to wrap forward and backward passes when using distributed autograd. The `context_id` generated in the `with` statement is required to uniquely identify a distributed backward pass on all workers. Each worker stores metadata associated with this `context_id`, which is required to correctly execute a distributed autograd pass. Example:: >>> import torch.distributed.autograd as dist_autograd >>> with dist_autograd.context() as context_id: >>> t1 = torch.rand((3, 3), requires_grad=True) >>> t2 = torch.rand((3, 3), requires_grad=True) >>> loss = rpc.rpc_sync("worker1", torch.add, args=(t1, t2)).sum() >>> dist_autograd.backward(context_id, [loss]) `torch.distributed.autograd.backward(context_id: int, roots: List[Tensor], retain_graph = False) → None` Kicks off the distributed backward pass using the provided roots. This currently implements the [FAST mode algorithm](rpc/distributed_autograd#fast- mode-algorithm) which assumes all RPC messages sent in the same distributed autograd context across workers would be part of the autograd graph during the backward pass. We use the provided roots to discover the autograd graph and compute appropriate dependencies. This method blocks until the entire autograd computation is done. We accumulate the gradients in the appropriate `torch.distributed.autograd.context` on each of the nodes. The autograd context to be used is looked up given the `context_id` that is passed in when `torch.distributed.autograd.backward()` is called. If there is no valid autograd context corresponding to the given ID, we throw an error. You can retrieve the accumulated gradients using the `get_gradients()` API. Parameters * **context_id** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – The autograd context id for which we should retrieve the gradients. 
* **roots** ([list](https://docs.python.org/3/library/stdtypes.html#list "\(in Python v3.9\)")) – Tensors which represent the roots of the autograd computation. All the tensors should be scalars.
* **retain_graph** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – If False, the graph used to compute the grad will be freed. Note that in nearly all cases setting this option to True is not needed and often can be worked around in a much more efficient way. Usually, you need to set this to True to run backward multiple times.

Example::

>>> import torch.distributed.autograd as dist_autograd
>>> with dist_autograd.context() as context_id:
>>>     pred = model.forward()
>>>     loss = loss_func(pred, target)
>>>     dist_autograd.backward(context_id, [loss])

`torch.distributed.autograd.get_gradients(context_id: int) → Dict[Tensor, Tensor]`

Retrieves a map from Tensor to the appropriate gradient for that Tensor accumulated in the provided context corresponding to the given `context_id` as part of the distributed autograd backward pass.

Parameters

**context_id** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – The autograd context id for which we should retrieve the gradients.

Returns

A map where the key is the Tensor and the value is the associated gradient for that Tensor.

Example::

>>> import torch.distributed.autograd as dist_autograd
>>> with dist_autograd.context() as context_id:
>>>     t1 = torch.rand((3, 3), requires_grad=True)
>>>     t2 = torch.rand((3, 3), requires_grad=True)
>>>     loss = t1 + t2
>>>     dist_autograd.backward(context_id, [loss.sum()])
>>>     grads = dist_autograd.get_gradients(context_id)
>>>     print(grads[t1])
>>>     print(grads[t2])

More Information about RPC Autograd

* [Distributed Autograd Design](rpc/distributed_autograd)
  * [Background](rpc/distributed_autograd#background)
  * [Autograd recording during the forward pass](rpc/distributed_autograd#autograd-recording-during-the-forward-pass)
  * [Distributed Autograd Context](rpc/distributed_autograd#distributed-autograd-context)
  * [Distributed Backward Pass](rpc/distributed_autograd#distributed-backward-pass)
  * [Computing dependencies](rpc/distributed_autograd#computing-dependencies)
  * [FAST mode algorithm](rpc/distributed_autograd#fast-mode-algorithm)
  * [SMART mode algorithm](rpc/distributed_autograd#smart-mode-algorithm)
  * [Distributed Optimizer](rpc/distributed_autograd#distributed-optimizer)
  * [Simple end to end example](rpc/distributed_autograd#simple-end-to-end-example)

## Distributed Optimizer

`torch.distributed.optim` exposes DistributedOptimizer, which takes a list of remote parameters (`RRef`) and runs the optimizer locally on the workers where the parameters live. The distributed optimizer can use any of the local optimizer [Algorithms](optim#optimizer-algorithms) to apply the gradients on each worker.

`class torch.distributed.optim.DistributedOptimizer(optimizer_class, params_rref, *args, **kwargs)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributed/optim/optimizer.html#DistributedOptimizer)

DistributedOptimizer takes remote references to parameters scattered across workers and applies the given optimizer locally for each parameter. This class uses `get_gradients()` in order to retrieve the gradients for specific parameters. Concurrent calls to `step()`, either from the same or different clients, will be serialized on each worker – as each worker's optimizer can only work on one set of gradients at a time.
However, there is no guarantee that the full forward-backward-optimizer sequence will execute for one client at a time. This means that the gradients being applied may not correspond to the latest forward pass executed on a given worker. Also, there is no guaranteed ordering across workers. `DistributedOptimizer` creates the local optimizer with TorchScript enabled by default, so that optimizer updates are not blocked by the Python Global Interpreter Lock (GIL) during multithreaded training (e.g. Distributed Model Parallel). This feature is currently in beta stage, enabled for optimizers including `Adagrad`, `Adam`, `SGD`, `RMSprop`, `AdamW` and `Adadelta`. We are increasing the coverage to all optimizers in future releases. Parameters * **optimizer_class** ([optim.Optimizer](optim#torch.optim.Optimizer "torch.optim.Optimizer")) – the class of optimizer to instantiate on each worker. * **params_rref** ([list](https://docs.python.org/3/library/stdtypes.html#list "\(in Python v3.9\)") _[_RRef _]_) – list of RRefs to local or remote parameters to optimize. * **args** – arguments to pass to the optimizer constructor on each worker. * **kwargs** – arguments to pass to the optimizer constructor on each worker. Example:: >>> import torch.distributed.autograd as dist_autograd >>> import torch.distributed.rpc as rpc >>> from torch import optim >>> from torch.distributed.optim import DistributedOptimizer >>> >>> with dist_autograd.context() as context_id: >>> # Forward pass. >>> rref1 = rpc.remote("worker1", torch.add, args=(torch.ones(2), 3)) >>> rref2 = rpc.remote("worker1", torch.add, args=(torch.ones(2), 1)) >>> loss = rref1.to_here() + rref2.to_here() >>> >>> # Backward pass. >>> dist_autograd.backward(context_id, [loss.sum()]) >>> >>> # Optimizer. >>> dist_optim = DistributedOptimizer( >>> optim.SGD, >>> [rref1, rref2], >>> lr=0.05, >>> ) >>> dist_optim.step(context_id) `step(context_id)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributed/optim/optimizer.html#DistributedOptimizer.step) Performs a single optimization step. This will call [`torch.optim.Optimizer.step()`](optim#torch.optim.Optimizer.step "torch.optim.Optimizer.step") on each worker containing parameters to be optimized, and will block until all workers return. The provided `context_id` will be used to retrieve the corresponding `context` that contains the gradients that should be applied to the parameters. Parameters **context_id** – the autograd context id for which we should run the optimizer step. ## Design Notes The distributed autograd design note covers the design of the RPC-based distributed autograd framework that is useful for applications such as model parallel training. * [Distributed Autograd Design](rpc/distributed_autograd#distributed-autograd-design) The RRef design note covers the design of the RRef (Remote REFerence) protocol used to refer to values on remote workers by the framework. * [Remote Reference Protocol](rpc/rref#remote-reference-protocol) ## Tutorials The RPC tutorials introduce users to the RPC framework, provide several example applications using torch.distributed.rpc APIs, and demonstrate how to use [the profiler](https://pytorch.org/docs/stable/autograd.html#profiler) to profile RPC-based workloads. 
* [Getting started with Distributed RPC Framework](https://pytorch.org/tutorials/intermediate/rpc_tutorial.html)
* [Implementing a Parameter Server using Distributed RPC Framework](https://pytorch.org/tutorials/intermediate/rpc_param_server_tutorial.html)
* [Combining Distributed DataParallel with Distributed RPC Framework](https://pytorch.org/tutorials/advanced/rpc_ddp_tutorial.html)
* [Profiling RPC-based Workloads](https://pytorch.org/tutorials/recipes/distributed_rpc_profiling.html)
* [Implementing batch RPC processing](https://pytorch.org/tutorials/intermediate/rpc_async_execution.html)
* [Distributed Pipeline Parallel](https://pytorch.org/tutorials/intermediate/dist_pipeline_parallel_tutorial.html)

# torch.sparse

## Introduction

PyTorch provides [`torch.Tensor`](tensors#torch.Tensor "torch.Tensor") to represent a multi-dimensional array containing elements of a single data type. By default, array elements are stored contiguously in memory, leading to efficient implementations of various array processing algorithms that rely on fast access to array elements. However, there exists an important class of multi-dimensional arrays, so-called sparse arrays, where the contiguous memory storage of array elements turns out to be suboptimal. Sparse arrays have the property that a vast portion of their elements are equal to zero, which means that a lot of memory as well as processor resources can be spared if only the non-zero elements are stored and/or processed. Various sparse storage formats ([such as COO, CSR/CSC, LIL, etc.](https://en.wikipedia.org/wiki/Sparse_matrix)) have been developed that are optimized for a particular structure of non-zero elements in sparse arrays as well as for specific operations on the arrays.

Note

When talking about storing only non-zero elements of a sparse array, the usage of the adjective "non-zero" is not strict: one is allowed to also store zeros in the sparse array data structure. Hence, in the following, we use "specified elements" for those array elements that are actually stored. In addition, the unspecified elements are typically, but not necessarily, assumed to have zero value; hence we use the term "fill value" to denote such elements.

Note

Using a sparse storage format for storing sparse arrays can be advantageous only when the size and sparsity levels of arrays are high. Otherwise, for small-sized or low-sparsity arrays, using the contiguous memory storage format is likely the most efficient approach.

Warning

The PyTorch API of sparse tensors is in beta and may change in the near future.

## Sparse COO tensors

Currently, PyTorch implements the so-called Coordinate format, or COO format, as the default sparse storage format for storing sparse tensors. In COO format, the specified elements are stored as tuples of element indices and the corresponding values. In particular,

* the indices of specified elements are collected in an `indices` tensor of size `(ndim, nse)` and with element type `torch.int64`,
* the corresponding values are collected in a `values` tensor of size `(nse,)` and with an arbitrary integer or floating point number element type,

where `ndim` is the dimensionality of the tensor and `nse` is the number of specified elements.

Note

The memory consumption of a sparse COO tensor is at least `(ndim * 8 + <size of element type in bytes>) * nse` bytes (plus a constant overhead from storing other tensor data). The memory consumption of a strided tensor is at least `product(<tensor shape>) * <size of element type in bytes>`.
For example, the memory consumption of a 10 000 x 10 000 tensor with 100 000 non-zero 32-bit floating point numbers is at least `(2 * 8 + 4) * 100 000 = 2 000 000` bytes when using COO tensor layout and `10 000 * 10 000 * 4 = 400 000 000` bytes when using the default strided tensor layout. Notice the 200-fold memory saving from using the COO storage format.

### Construction

A sparse COO tensor can be constructed by providing the two tensors of indices and values, as well as the size of the sparse tensor (when it cannot be inferred from the indices and values tensors), to the function [`torch.sparse_coo_tensor()`](generated/torch.sparse_coo_tensor#torch.sparse_coo_tensor "torch.sparse_coo_tensor").

Suppose we want to define a sparse tensor with the entry 3 at location (0, 2), entry 4 at location (1, 0), and entry 5 at location (1, 2). Unspecified elements are assumed to have the same value, the fill value, which is zero by default. We would then write:

>>> i = [[0, 1, 1], [2, 0, 2]]
>>> v = [3, 4, 5]
>>> s = torch.sparse_coo_tensor(i, v, (2, 3))
>>> s
tensor(indices=tensor([[0, 1, 1],
                       [2, 0, 2]]),
       values=tensor([3, 4, 5]),
       size=(2, 3), nnz=3, layout=torch.sparse_coo)
>>> s.to_dense()
tensor([[0, 0, 3],
        [4, 0, 5]])

Note that the input `i` is NOT a list of index tuples. If you want to write your indices this way, you should transpose before passing them to the sparse constructor:

>>> i = [[0, 2], [1, 0], [1, 2]]
>>> v = [3, 4, 5]
>>> s = torch.sparse_coo_tensor(list(zip(*i)), v, (2, 3))
>>> # Or another equivalent formulation to get s
>>> s = torch.sparse_coo_tensor(torch.tensor(i).t(), v, (2, 3))
>>> torch.sparse_coo_tensor(torch.tensor(i).t(), v, torch.Size([2, 3])).to_dense()
tensor([[0, 0, 3],
        [4, 0, 5]])

An empty sparse COO tensor can be constructed by specifying its size only:

>>> torch.sparse_coo_tensor(size=(2, 3))
tensor(indices=tensor([], size=(2, 0)),
       values=tensor([], size=(0,)),
       size=(2, 3), nnz=0, layout=torch.sparse_coo)

### Hybrid sparse COO tensors

PyTorch implements an extension of sparse tensors with scalar values to sparse tensors with (contiguous) tensor values. Such tensors are called hybrid tensors. A PyTorch hybrid COO tensor extends the sparse COO tensor by allowing the `values` tensor to be a multi-dimensional tensor, so that we have:

* the indices of specified elements are collected in an `indices` tensor of size `(sparse_dims, nse)` and with element type `torch.int64`,
* the corresponding (tensor) values are collected in a `values` tensor of size `(nse, dense_dims)` and with an arbitrary integer or floating point number element type.

Note

We use an (M + K)-dimensional tensor to denote an N-dimensional hybrid sparse tensor, where M and K are the numbers of sparse and dense dimensions, respectively, such that M + K == N holds.

Suppose we want to create a (2 + 1)-dimensional tensor with the entry [3, 4] at location (0, 2), entry [5, 6] at location (1, 0), and entry [7, 8] at location (1, 2).
We would write:

>>> i = [[0, 1, 1], [2, 0, 2]]
>>> v = [[3, 4], [5, 6], [7, 8]]
>>> s = torch.sparse_coo_tensor(i, v, (2, 3, 2))
>>> s
tensor(indices=tensor([[0, 1, 1],
                       [2, 0, 2]]),
       values=tensor([[3, 4],
                      [5, 6],
                      [7, 8]]),
       size=(2, 3, 2), nnz=3, layout=torch.sparse_coo)
>>> s.to_dense()
tensor([[[0, 0],
         [0, 0],
         [3, 4]],
        [[5, 6],
         [0, 0],
         [7, 8]]])

In general, if `s` is a sparse COO tensor and `M = s.sparse_dim()`, `K = s.dense_dim()`, then we have the following invariants:

* `M + K == len(s.shape) == s.ndim` - the dimensionality of a tensor is the sum of the number of sparse and dense dimensions,
* `s.indices().shape == (M, nse)` - sparse indices are stored explicitly,
* `s.values().shape == (nse,) + s.shape[M : M + K]` - the values of a hybrid tensor are K-dimensional tensors,
* `s.values().layout == torch.strided` - values are stored as strided tensors.

Note

Dense dimensions always follow sparse dimensions, that is, mixing of dense and sparse dimensions is not supported.

### Uncoalesced sparse COO tensors

The PyTorch sparse COO tensor format permits _uncoalesced_ sparse tensors, where there may be duplicate coordinates in the indices; in this case, the interpretation is that the value at that index is the sum of all duplicate value entries. For example, one can specify multiple values, `3` and `4`, for the same index `1`, which leads to a 1-D uncoalesced tensor:

>>> i = [[1, 1]]
>>> v = [3, 4]
>>> s = torch.sparse_coo_tensor(i, v, (3,))
>>> s
tensor(indices=tensor([[1, 1]]),
       values=tensor([3, 4]),
       size=(3,), nnz=2, layout=torch.sparse_coo)

while the coalescing process will accumulate the multi-valued elements into a single value using summation:

>>> s.coalesce()
tensor(indices=tensor([[1]]),
       values=tensor([7]),
       size=(3,), nnz=1, layout=torch.sparse_coo)

In general, the output of the `torch.Tensor.coalesce()` method is a sparse tensor with the following properties:

* the indices of specified tensor elements are unique,
* the indices are sorted in lexicographical order,
* `torch.Tensor.is_coalesced()` returns `True`.

Note

For the most part, you shouldn't have to care whether or not a sparse tensor is coalesced, as most operations will work identically given a coalesced or uncoalesced sparse tensor. However, some operations can be implemented more efficiently on uncoalesced tensors, and some on coalesced tensors. For instance, addition of sparse COO tensors is implemented by simply concatenating the indices and values tensors:

>>> a = torch.sparse_coo_tensor([[1, 1]], [5, 6], (2,))
>>> b = torch.sparse_coo_tensor([[0, 0]], [7, 8], (2,))
>>> a + b
tensor(indices=tensor([[0, 0, 1, 1]]),
       values=tensor([7, 8, 5, 6]),
       size=(2,), nnz=4, layout=torch.sparse_coo)

If you repeatedly perform an operation that can produce duplicate entries (e.g., [`torch.Tensor.add()`](tensors#torch.Tensor.add "torch.Tensor.add")), you should occasionally coalesce your sparse tensors to prevent them from growing too large. On the other hand, the lexicographical ordering of indices can be advantageous for implementing algorithms that involve many element selection operations, such as slicing or matrix products.
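To illustrate the note above, here is a minimal sketch (not part of the original API reference) of how repeated additions let duplicate entries accumulate and how `coalesce()` folds them back into a single specified element:

>>> s = torch.sparse_coo_tensor([[0]], [1.0], (2,))
>>> for _ in range(3):
>>>     s = s + s  # each addition concatenates indices and values
>>> s._nnz()  # eight duplicate entries, all at index 0
8
>>> s = s.coalesce()  # duplicates are summed into one specified element
>>> s._nnz()
1
>>> s.values()
tensor([8.])

Coalescing periodically in such a loop keeps the memory footprint proportional to the number of distinct indices rather than to the number of additions performed.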
### Working with sparse COO tensors

Let's consider the following example:

>>> i = [[0, 1, 1], [2, 0, 2]]
>>> v = [[3, 4], [5, 6], [7, 8]]
>>> s = torch.sparse_coo_tensor(i, v, (2, 3, 2))

As mentioned above, a sparse COO tensor is a [`torch.Tensor`](tensors#torch.Tensor "torch.Tensor") instance, and to distinguish it from `Tensor` instances that use some other layout, one can use the `torch.Tensor.is_sparse` or `torch.Tensor.layout` properties:

>>> isinstance(s, torch.Tensor)
True
>>> s.is_sparse
True
>>> s.layout == torch.sparse_coo
True

The number of sparse and dense dimensions can be acquired using the methods `torch.Tensor.sparse_dim()` and `torch.Tensor.dense_dim()`, respectively. For instance:

>>> s.sparse_dim(), s.dense_dim()
(2, 1)

If `s` is a sparse COO tensor, then its COO format data can be acquired using the methods `torch.Tensor.indices()` and `torch.Tensor.values()`.

Note

Currently, one can acquire the COO format data only when the tensor instance is coalesced:

>>> s.indices()
RuntimeError: Cannot get indices on an uncoalesced tensor, please call .coalesce() first

For acquiring the COO format data of an uncoalesced tensor, use `torch.Tensor._values()` and `torch.Tensor._indices()`:

>>> s._indices()
tensor([[0, 1, 1],
        [2, 0, 2]])

Constructing a new sparse COO tensor results in a tensor that is not coalesced:

>>> s.is_coalesced()
False

but one can construct a coalesced copy of a sparse COO tensor using the `torch.Tensor.coalesce()` method:

>>> s2 = s.coalesce()
>>> s2.indices()
tensor([[0, 1, 1],
        [2, 0, 2]])

When working with uncoalesced sparse COO tensors, one must take into account the additive nature of uncoalesced data: the values at the same indices are the terms of a sum whose evaluation gives the value of the corresponding tensor element. For example, scalar multiplication on an uncoalesced sparse tensor could be implemented by multiplying all the uncoalesced values with the scalar, because `c * (a + b) == c * a + c * b` holds. However, any nonlinear operation, say, a square root, cannot be implemented by applying the operation to uncoalesced data, because `sqrt(a + b) == sqrt(a) + sqrt(b)` does not hold in general.

Slicing (with positive step) of a sparse COO tensor is supported only for dense dimensions. Indexing is supported for both sparse and dense dimensions:

>>> s[1]
tensor(indices=tensor([[0, 2]]),
       values=tensor([[5, 6],
                      [7, 8]]),
       size=(3, 2), nnz=2, layout=torch.sparse_coo)
>>> s[1, 0, 1]
tensor(6)
>>> s[1, 0, 1:]
tensor([6])

In PyTorch, the fill value of a sparse tensor cannot be specified explicitly and is assumed to be zero in general. However, there exist operations that may interpret the fill value differently. For instance, `torch.sparse.softmax()` computes the softmax with the assumption that the fill value is negative infinity.

## Supported Linear Algebra operations

The following table summarizes supported Linear Algebra operations on sparse matrices where the operand layouts may vary. Here `T[layout]` denotes a tensor with a given layout. Similarly, `M[layout]` denotes a matrix (2-D PyTorch tensor), and `V[layout]` denotes a vector (1-D PyTorch tensor). In addition, `f` denotes a scalar (float or 0-D PyTorch tensor), `*` is element-wise multiplication, and `@` is matrix multiplication.
PyTorch operation | Sparse grad? | Layout signature
---|---|---
[`torch.mv()`](generated/torch.mv#torch.mv "torch.mv") | no | `M[sparse_coo] @ V[strided] -> V[strided]`
[`torch.matmul()`](generated/torch.matmul#torch.matmul "torch.matmul") | no | `M[sparse_coo] @ M[strided] -> M[strided]`
[`torch.mm()`](generated/torch.mm#torch.mm "torch.mm") | no | `M[sparse_coo] @ M[strided] -> M[strided]`
`torch.sparse.mm()` | yes | `M[sparse_coo] @ M[strided] -> M[strided]`
`torch.smm()` | no | `M[sparse_coo] @ M[strided] -> M[sparse_coo]`
`torch.hspmm()` | no | `M[sparse_coo] @ M[strided] -> M[hybrid sparse_coo]`
[`torch.bmm()`](generated/torch.bmm#torch.bmm "torch.bmm") | no | `T[sparse_coo] @ T[strided] -> T[strided]`
[`torch.addmm()`](generated/torch.addmm#torch.addmm "torch.addmm") | no | `f * M[strided] + f * (M[sparse_coo] @ M[strided]) -> M[strided]`
`torch.sparse.addmm()` | yes | `f * M[strided] + f * (M[sparse_coo] @ M[strided]) -> M[strided]`
`torch.sspaddmm()` | no | `f * M[sparse_coo] + f * (M[sparse_coo] @ M[strided]) -> M[sparse_coo]`
[`torch.lobpcg()`](generated/torch.lobpcg#torch.lobpcg "torch.lobpcg") | no | `GENEIG(M[sparse_coo]) -> M[strided], M[strided]`
[`torch.pca_lowrank()`](generated/torch.pca_lowrank#torch.pca_lowrank "torch.pca_lowrank") | yes | `PCA(M[sparse_coo]) -> M[strided], M[strided], M[strided]`
[`torch.svd_lowrank()`](generated/torch.svd_lowrank#torch.svd_lowrank "torch.svd_lowrank") | yes | `SVD(M[sparse_coo]) -> M[strided], M[strided], M[strided]`

The "Sparse grad?" column indicates whether the PyTorch operation supports backward with respect to the sparse matrix argument. All PyTorch operations, except `torch.smm()`, support backward with respect to strided matrix arguments.

Note

Currently, PyTorch does not support matrix multiplication with the layout signature `M[strided] @ M[sparse_coo]`. However, applications can still compute this using the matrix relation `D @ S == (S.t() @ D.t()).t()`.

`class torch.Tensor`

The following methods are specific to sparse tensors:

`is_sparse`

Is `True` if the Tensor uses sparse storage layout, `False` otherwise.

`dense_dim() → int`

Return the number of dense dimensions in a sparse tensor `self`.

Warning

Throws an error if `self` is not a sparse tensor.

See also `Tensor.sparse_dim()` and hybrid tensors.

`sparse_dim() → int`

Return the number of sparse dimensions in a sparse tensor `self`.

Warning

Throws an error if `self` is not a sparse tensor.

See also `Tensor.dense_dim()` and hybrid tensors.

`sparse_mask(mask) → Tensor`

Returns a new sparse tensor with values from a strided tensor `self` filtered by the indices of the sparse tensor `mask`. The values of the `mask` sparse tensor are ignored. `self` and `mask` tensors must have the same shape.

Note

The returned sparse tensor has the same indices as the sparse tensor `mask`, even when the corresponding values in `self` are zeros.

Parameters

**mask** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – a sparse tensor whose indices are used as a filter

Example:

>>> nse = 5
>>> dims = (5, 5, 2, 2)
>>> I = torch.cat([torch.randint(0, dims[0], size=(nse,)), ...
torch.randint(0, dims[1], size=(nse,))], 0).reshape(2, nse) >>> V = torch.randn(nse, dims[2], dims[3]) >>> S = torch.sparse_coo_tensor(I, V, dims).coalesce() >>> D = torch.randn(dims) >>> D.sparse_mask(S) tensor(indices=tensor([[0, 0, 0, 2], [0, 1, 4, 3]]), values=tensor([[[ 1.6550, 0.2397], [-0.1611, -0.0779]], [[ 0.2326, -1.0558], [ 1.4711, 1.9678]], [[-0.5138, -0.0411], [ 1.9417, 0.5158]], [[ 0.0793, 0.0036], [-0.2569, -0.1055]]]), size=(5, 5, 2, 2), nnz=4, layout=torch.sparse_coo) `sparse_resize_(size, sparse_dim, dense_dim) → Tensor` Resizes `self` sparse tensor to the desired size and the number of sparse and dense dimensions. Note If the number of specified elements in `self` is zero, then [`size`](tensors#torch.Tensor.size "torch.Tensor.size"), `sparse_dim`, and `dense_dim` can be any size and positive integers such that `len(size) == sparse_dim + dense_dim`. If `self` specifies one or more elements, however, then each dimension in [`size`](tensors#torch.Tensor.size "torch.Tensor.size") must not be smaller than the corresponding dimension of `self`, `sparse_dim` must equal the number of sparse dimensions in `self`, and `dense_dim` must equal the number of dense dimensions in `self`. Warning Throws an error if `self` is not a sparse tensor. Parameters * **size** (_torch.Size_) – the desired size. If `self` is non-empty sparse tensor, the desired size cannot be smaller than the original size. * **sparse_dim** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – the number of sparse dimensions * **dense_dim** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – the number of dense dimensions `sparse_resize_and_clear_(size, sparse_dim, dense_dim) → Tensor` Removes all specified elements from a sparse tensor `self` and resizes `self` to the desired size and the number of sparse and dense dimensions. Parameters * **size** (_torch.Size_) – the desired size. * **sparse_dim** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – the number of sparse dimensions * **dense_dim** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – the number of dense dimensions `to_dense() → Tensor` Creates a strided copy of `self`. Warning Throws an error if `self` is a strided tensor. Example: >>> s = torch.sparse_coo_tensor( ... torch.tensor([[1, 1], ... [0, 2]]), ... torch.tensor([9, 10]), ... size=(3, 3)) >>> s.to_dense() tensor([[ 0, 0, 0], [ 9, 0, 10], [ 0, 0, 0]]) `to_sparse(sparseDims) → Tensor` Returns a sparse copy of the tensor. PyTorch supports sparse tensors in coordinate format. Parameters **sparseDims** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – the number of sparse dimensions to include in the new sparse tensor Example: >>> d = torch.tensor([[0, 0, 0], [9, 0, 10], [0, 0, 0]]) >>> d tensor([[ 0, 0, 0], [ 9, 0, 10], [ 0, 0, 0]]) >>> d.to_sparse() tensor(indices=tensor([[1, 1], [0, 2]]), values=tensor([ 9, 10]), size=(3, 3), nnz=2, layout=torch.sparse_coo) >>> d.to_sparse(1) tensor(indices=tensor([[1]]), values=tensor([[ 9, 0, 10]]), size=(3, 3), nnz=1, layout=torch.sparse_coo) `coalesce() → Tensor` Returns a coalesced copy of `self` if `self` is an uncoalesced tensor. Returns `self` if `self` is a coalesced tensor. Warning Throws an error if `self` is not a sparse COO tensor. `is_coalesced() → bool` Returns `True` if `self` is a sparse COO tensor that is coalesced, `False` otherwise. 
Warning Throws an error if `self` is not a sparse COO tensor. See `coalesce()` and uncoalesced tensors. `indices() → Tensor` Return the indices tensor of a sparse COO tensor. Warning Throws an error if `self` is not a sparse COO tensor. See also `Tensor.values()`. Note This method can only be called on a coalesced sparse tensor. See `Tensor.coalesce()` for details. `values() → Tensor` Return the values tensor of a sparse COO tensor. Warning Throws an error if `self` is not a sparse COO tensor. See also `Tensor.indices()`. Note This method can only be called on a coalesced sparse tensor. See `Tensor.coalesce()` for details. The following [`torch.Tensor`](tensors#torch.Tensor "torch.Tensor") methods support sparse COO tensors: [`add()`](tensors#torch.Tensor.add "torch.Tensor.add") [`add_()`](tensors#torch.Tensor.add_ "torch.Tensor.add_") [`addmm()`](tensors#torch.Tensor.addmm "torch.Tensor.addmm") [`addmm_()`](tensors#torch.Tensor.addmm_ "torch.Tensor.addmm_") [`any()`](tensors#torch.Tensor.any "torch.Tensor.any") [`asin()`](tensors#torch.Tensor.asin "torch.Tensor.asin") [`asin_()`](tensors#torch.Tensor.asin_ "torch.Tensor.asin_") [`arcsin()`](tensors#torch.Tensor.arcsin "torch.Tensor.arcsin") [`arcsin_()`](tensors#torch.Tensor.arcsin_ "torch.Tensor.arcsin_") [`bmm()`](tensors#torch.Tensor.bmm "torch.Tensor.bmm") [`clone()`](tensors#torch.Tensor.clone "torch.Tensor.clone") [`deg2rad()`](tensors#torch.Tensor.deg2rad "torch.Tensor.deg2rad") `deg2rad_()` [`detach()`](autograd#torch.Tensor.detach "torch.Tensor.detach") [`detach_()`](autograd#torch.Tensor.detach_ "torch.Tensor.detach_") [`dim()`](tensors#torch.Tensor.dim "torch.Tensor.dim") [`div()`](tensors#torch.Tensor.div "torch.Tensor.div") [`div_()`](tensors#torch.Tensor.div_ "torch.Tensor.div_") [`floor_divide()`](tensors#torch.Tensor.floor_divide "torch.Tensor.floor_divide") [`floor_divide_()`](tensors#torch.Tensor.floor_divide_ "torch.Tensor.floor_divide_") [`get_device()`](tensors#torch.Tensor.get_device "torch.Tensor.get_device") [`index_select()`](tensors#torch.Tensor.index_select "torch.Tensor.index_select") [`isnan()`](tensors#torch.Tensor.isnan "torch.Tensor.isnan") [`log1p()`](tensors#torch.Tensor.log1p "torch.Tensor.log1p") [`log1p_()`](tensors#torch.Tensor.log1p_ "torch.Tensor.log1p_") [`mm()`](tensors#torch.Tensor.mm "torch.Tensor.mm") [`mul()`](tensors#torch.Tensor.mul "torch.Tensor.mul") [`mul_()`](tensors#torch.Tensor.mul_ "torch.Tensor.mul_") [`mv()`](tensors#torch.Tensor.mv "torch.Tensor.mv") [`narrow_copy()`](tensors#torch.Tensor.narrow_copy "torch.Tensor.narrow_copy") [`neg()`](tensors#torch.Tensor.neg "torch.Tensor.neg") [`neg_()`](tensors#torch.Tensor.neg_ "torch.Tensor.neg_") [`negative()`](tensors#torch.Tensor.negative "torch.Tensor.negative") [`negative_()`](tensors#torch.Tensor.negative_ "torch.Tensor.negative_") [`numel()`](tensors#torch.Tensor.numel "torch.Tensor.numel") [`rad2deg()`](tensors#torch.Tensor.rad2deg "torch.Tensor.rad2deg") `rad2deg_()` [`resize_as_()`](tensors#torch.Tensor.resize_as_ "torch.Tensor.resize_as_") [`size()`](tensors#torch.Tensor.size "torch.Tensor.size") [`pow()`](tensors#torch.Tensor.pow "torch.Tensor.pow") [`sqrt()`](tensors#torch.Tensor.sqrt "torch.Tensor.sqrt") [`square()`](tensors#torch.Tensor.square "torch.Tensor.square") `smm()` `sspaddmm()` [`sub()`](tensors#torch.Tensor.sub "torch.Tensor.sub") [`sub_()`](tensors#torch.Tensor.sub_ "torch.Tensor.sub_") [`t()`](tensors#torch.Tensor.t "torch.Tensor.t") [`t_()`](tensors#torch.Tensor.t_ "torch.Tensor.t_") 
[`transpose()`](tensors#torch.Tensor.transpose "torch.Tensor.transpose") [`transpose_()`](tensors#torch.Tensor.transpose_ "torch.Tensor.transpose_") [`zero_()`](tensors#torch.Tensor.zero_ "torch.Tensor.zero_") ## Sparse tensor functions `torch.sparse_coo_tensor(indices, values, size=None, *, dtype=None, device=None, requires_grad=False) → Tensor` Constructs a sparse tensor in COO(rdinate) format with specified values at the given `indices`. Note This function returns an uncoalesced tensor. Parameters * **indices** (_array_like_) – Initial data for the tensor. Can be a list, tuple, NumPy `ndarray`, scalar, and other types. Will be cast to a `torch.LongTensor` internally. The indices are the coordinates of the non-zero values in the matrix, and thus should be two-dimensional where the first dimension is the number of tensor dimensions and the second dimension is the number of non-zero values. * **values** (_array_like_) – Initial values for the tensor. Can be a list, tuple, NumPy `ndarray`, scalar, and other types. * **size** (list, tuple, or `torch.Size`, optional) – Size of the sparse tensor. If not provided the size will be inferred as the minimum size big enough to hold all non-zero elements. Keyword Arguments * **dtype** ([`torch.dtype`](tensor_attributes#torch.torch.dtype "torch.torch.dtype"), optional) – the desired data type of returned tensor. Default: if None, infers data type from `values`. * **device** ([`torch.device`](tensor_attributes#torch.torch.device "torch.torch.device"), optional) – the desired device of returned tensor. Default: if None, uses the current device for the default tensor type (see [`torch.set_default_tensor_type()`](generated/torch.set_default_tensor_type#torch.set_default_tensor_type "torch.set_default_tensor_type")). `device` will be the CPU for CPU tensor types and the current CUDA device for CUDA tensor types. * **requires_grad** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – If autograd should record operations on the returned tensor. Default: `False`. Example: >>> i = torch.tensor([[0, 1, 1], ... [2, 0, 2]]) >>> v = torch.tensor([3, 4, 5], dtype=torch.float32) >>> torch.sparse_coo_tensor(i, v, [2, 4]) tensor(indices=tensor([[0, 1, 1], [2, 0, 2]]), values=tensor([3., 4., 5.]), size=(2, 4), nnz=3, layout=torch.sparse_coo) >>> torch.sparse_coo_tensor(i, v) # Shape inference tensor(indices=tensor([[0, 1, 1], [2, 0, 2]]), values=tensor([3., 4., 5.]), size=(2, 3), nnz=3, layout=torch.sparse_coo) >>> torch.sparse_coo_tensor(i, v, [2, 4], ... dtype=torch.float64, ... device=torch.device('cuda:0')) tensor(indices=tensor([[0, 1, 1], [2, 0, 2]]), values=tensor([3., 4., 5.]), device='cuda:0', size=(2, 4), nnz=3, dtype=torch.float64, layout=torch.sparse_coo) # Create an empty sparse tensor with the following invariants: # 1. sparse_dim + dense_dim = len(SparseTensor.shape) # 2. SparseTensor._indices().shape = (sparse_dim, nnz) # 3. 
SparseTensor._values().shape = (nnz, SparseTensor.shape[sparse_dim:])
#
# For instance, to create an empty sparse tensor with nnz = 0, dense_dim = 0 and
# sparse_dim = 1 (hence indices is a 2D tensor of shape = (1, 0))
>>> S = torch.sparse_coo_tensor(torch.empty([1, 0]), [], [1])
tensor(indices=tensor([], size=(1, 0)),
       values=tensor([], size=(0,)),
       size=(1,), nnz=0, layout=torch.sparse_coo)

# and to create an empty sparse tensor with nnz = 0, dense_dim = 1 and
# sparse_dim = 1
>>> S = torch.sparse_coo_tensor(torch.empty([1, 0]), torch.empty([0, 2]), [1, 2])
tensor(indices=tensor([], size=(1, 0)),
       values=tensor([], size=(0, 2)),
       size=(1, 2), nnz=0, layout=torch.sparse_coo)

`torch.sparse.sum(input, dim=None, dtype=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/sparse.html#sum)

Returns the sum of each row of the sparse tensor `input` in the given dimensions `dim`. If `dim` is a list of dimensions, reduce over all of them. When summing over all `sparse_dim`, this method returns a dense tensor instead of a sparse tensor.

All summed `dim` are squeezed (see [`torch.squeeze()`](generated/torch.squeeze#torch.squeeze "torch.squeeze")), resulting in an output tensor having `dim` fewer dimensions than `input`.

During backward, only gradients at `nnz` locations of `input` will propagate back. Note that the gradient of `input` is coalesced.

Parameters

* **input** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – the input sparse tensor
* **dim** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _or_ _tuple of python:ints_) – a dimension or a list of dimensions to reduce. Default: reduce over all dims.
* **dtype** (`torch.dtype`, optional) – the desired data type of returned Tensor. Default: dtype of `input`.

Example:

>>> nnz = 3
>>> dims = [5, 5, 2, 3]
>>> I = torch.cat([torch.randint(0, dims[0], size=(nnz,)),
                   torch.randint(0, dims[1], size=(nnz,))], 0).reshape(2, nnz)
>>> V = torch.randn(nnz, dims[2], dims[3])
>>> size = torch.Size(dims)
>>> S = torch.sparse_coo_tensor(I, V, size)
>>> S
tensor(indices=tensor([[2, 0, 3],
                       [2, 4, 1]]),
       values=tensor([[[-0.6438, -1.6467,  1.4004],
                       [ 0.3411,  0.0918, -0.2312]],
                      [[ 0.5348,  0.0634, -2.0494],
                       [-0.7125, -1.0646,  2.1844]],
                      [[ 0.1276,  0.1874, -0.6334],
                       [-1.9682, -0.5340,  0.7483]]]),
       size=(5, 5, 2, 3), nnz=3, layout=torch.sparse_coo)

# when sum over only part of sparse_dims, return a sparse tensor
>>> torch.sparse.sum(S, [1, 3])
tensor(indices=tensor([[0, 2, 3]]),
       values=tensor([[-1.4512,  0.4073],
                      [-0.8901,  0.2017],
                      [-0.3183, -1.7539]]),
       size=(5, 2), nnz=3, layout=torch.sparse_coo)

# when sum over all sparse dim, return a dense tensor
# with summed dims squeezed
>>> torch.sparse.sum(S, [0, 1, 3])
tensor([-2.6596, -1.1450])

`torch.sparse.addmm(mat, mat1, mat2, beta=1.0, alpha=1.0)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/sparse.html#addmm)

This function does the exact same thing as [`torch.addmm()`](generated/torch.addmm#torch.addmm "torch.addmm") in the forward, except that it supports backward for the sparse matrix `mat1`. `mat1` needs to have `sparse_dim = 2`. Note that the gradient of `mat1` is a coalesced sparse tensor.
Parameters

* **mat** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – a dense matrix to be added
* **mat1** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – a sparse matrix to be multiplied
* **mat2** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – a dense matrix to be multiplied
* **beta** (_Number_ _,__optional_) – multiplier for `mat` (β)
* **alpha** (_Number_ _,__optional_) – multiplier for `mat1 @ mat2` (α)

`torch.sparse.mm(mat1, mat2)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/sparse.html#mm)

Performs a matrix multiplication of the sparse matrix `mat1` and the (sparse or strided) matrix `mat2`. Similar to [`torch.mm()`](generated/torch.mm#torch.mm "torch.mm"), if `mat1` is an (n × m) tensor and `mat2` is an (m × p) tensor, `out` will be an (n × p) tensor. `mat1` needs to have `sparse_dim = 2`. This function also supports backward for both matrices. Note that the gradient of `mat1` is a coalesced sparse tensor.

Parameters

* **mat1** (_SparseTensor_) – the first sparse matrix to be multiplied
* **mat2** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – the second matrix to be multiplied, which could be sparse or dense

Shape: The format of the output tensor of this function follows:

- sparse x sparse -> sparse
- sparse x dense -> dense

Example:

>>> a = torch.randn(2, 3).to_sparse().requires_grad_(True)
>>> a
tensor(indices=tensor([[0, 0, 0, 1, 1, 1],
                       [0, 1, 2, 0, 1, 2]]),
       values=tensor([ 1.5901,  0.0183, -0.6146,  1.8061, -0.0112,  0.6302]),
       size=(2, 3), nnz=6, layout=torch.sparse_coo, requires_grad=True)
>>> b = torch.randn(3, 2, requires_grad=True)
>>> b
tensor([[-0.6479,  0.7874],
        [-1.2056,  0.5641],
        [-1.1716, -0.9923]], requires_grad=True)
>>> y = torch.sparse.mm(a, b)
>>> y
tensor([[-0.3323,  1.8723],
        [-1.8951,  0.7904]], grad_fn=<SparseAddmmBackward>)
>>> y.sum().backward()
>>> a.grad
tensor(indices=tensor([[0, 0, 0, 1, 1, 1],
                       [0, 1, 2, 0, 1, 2]]),
       values=tensor([ 0.1394, -0.6415, -2.1639,  0.1394, -0.6415, -2.1639]),
       size=(2, 3), nnz=6, layout=torch.sparse_coo)

`torch.sspaddmm(input, mat1, mat2, *, beta=1, alpha=1, out=None) → Tensor`

Matrix multiplies a sparse tensor `mat1` with a dense tensor `mat2`, then adds the sparse tensor `input` to the result.

Note: This function is equivalent to [`torch.addmm()`](generated/torch.addmm#torch.addmm "torch.addmm"), except `input` and `mat1` are sparse.

Parameters

* **input** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – a sparse matrix to be added
* **mat1** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – a sparse matrix to be matrix multiplied
* **mat2** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – a dense matrix to be matrix multiplied

Keyword Arguments

* **beta** (_Number_ _,__optional_) – multiplier for `input` (β)
* **alpha** (_Number_ _,__optional_) – multiplier for `mat1 @ mat2` (α)
* **out** ([Tensor](tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor.

`torch.hspmm(mat1, mat2, *, out=None) → Tensor`

Performs a matrix multiplication of a sparse COO matrix `mat1` and a strided matrix `mat2`. The result is a (1 + 1)-dimensional hybrid COO matrix.

Parameters

* **mat1** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – the first sparse matrix to be matrix multiplied
* **mat2** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – the second strided matrix to be matrix multiplied

Keyword Arguments

* **out** ([Tensor](tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor.

`torch.smm(input, mat) → Tensor`

Performs a matrix multiplication of the sparse matrix `input` with the dense matrix `mat`.
Parameters * **input** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – a sparse matrix to be matrix multiplied * **mat** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – a dense matrix to be matrix multiplied `torch.sparse.softmax(input, dim, dtype=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/sparse.html#softmax) Applies a softmax function. Softmax is defined as: \text{Softmax}(x_{i}) = \frac{\exp(x_i)}{\sum_j \exp(x_j)} where i, j run over sparse tensor indices and unspecified entries are ignored. This is equivalent to defining unspecified entries as negative infinity, so that \exp(x_k) = 0 when the entry with index k has not been specified. It is applied to all slices along `dim`, and will re-scale them so that the elements lie in the range `[0, 1]` and sum to 1. Parameters * **input** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – input * **dim** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – A dimension along which softmax will be computed. * **dtype** (`torch.dtype`, optional) – the desired data type of the returned tensor. If specified, the input tensor is cast to `dtype` before the operation is performed. This is useful for preventing data type overflows. Default: None `torch.sparse.log_softmax(input, dim, dtype=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/sparse.html#log_softmax) Applies a softmax function followed by logarithm. See `softmax` for more details. Parameters * **input** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – input * **dim** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – A dimension along which softmax will be computed. * **dtype** (`torch.dtype`, optional) – the desired data type of the returned tensor. If specified, the input tensor is cast to `dtype` before the operation is performed. This is useful for preventing data type overflows.
Default: None ## Other functions The following `torch` functions support sparse COO tensors: [`cat()`](generated/torch.cat#torch.cat "torch.cat") [`dstack()`](generated/torch.dstack#torch.dstack "torch.dstack") [`empty()`](generated/torch.empty#torch.empty "torch.empty") [`empty_like()`](generated/torch.empty_like#torch.empty_like "torch.empty_like") [`hstack()`](generated/torch.hstack#torch.hstack "torch.hstack") [`index_select()`](generated/torch.index_select#torch.index_select "torch.index_select") [`is_complex()`](generated/torch.is_complex#torch.is_complex "torch.is_complex") [`is_floating_point()`](generated/torch.is_floating_point#torch.is_floating_point "torch.is_floating_point") [`is_nonzero()`](generated/torch.is_nonzero#torch.is_nonzero "torch.is_nonzero") `is_same_size()` `is_signed()` [`is_tensor()`](generated/torch.is_tensor#torch.is_tensor "torch.is_tensor") [`lobpcg()`](generated/torch.lobpcg#torch.lobpcg "torch.lobpcg") [`mm()`](generated/torch.mm#torch.mm "torch.mm") `native_norm()` [`pca_lowrank()`](generated/torch.pca_lowrank#torch.pca_lowrank "torch.pca_lowrank") `select()` [`stack()`](generated/torch.stack#torch.stack "torch.stack") [`svd_lowrank()`](generated/torch.svd_lowrank#torch.svd_lowrank "torch.svd_lowrank") [`unsqueeze()`](generated/torch.unsqueeze#torch.unsqueeze "torch.unsqueeze") [`vstack()`](generated/torch.vstack#torch.vstack "torch.vstack") [`zeros()`](generated/torch.zeros#torch.zeros "torch.zeros") [`zeros_like()`](generated/torch.zeros_like#torch.zeros_like "torch.zeros_like") # torch.Storage A `torch.Storage` is a contiguous, one-dimensional array of a single data type. Every [`torch.Tensor`](tensors#torch.Tensor "torch.Tensor") has a corresponding storage of the same data type. `class torch.FloatStorage(*args, **kwargs)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch.html#FloatStorage) `bfloat16()` Casts this storage to bfloat16 type `bool()` Casts this storage to bool type `byte()` Casts this storage to byte type `char()` Casts this storage to char type `clone()` Returns a copy of this storage `complex_double()` Casts this storage to complex double type `complex_float()` Casts this storage to complex float type `copy_()` `cpu()` Returns a CPU copy of this storage if it’s not already on the CPU `cuda(device=None, non_blocking=False, **kwargs)` Returns a copy of this object in CUDA memory. If this object is already in CUDA memory and on the correct device, then no copy is performed and the original object is returned. Parameters * **device** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – The destination GPU id. Defaults to the current device. * **non_blocking** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – If `True` and the source is in pinned memory, the copy will be asynchronous with respect to the host. Otherwise, the argument has no effect. * ****kwargs** – For compatibility, may contain the key `async` in place of the `non_blocking` argument. `data_ptr()` `device` `double()` Casts this storage to double type `dtype` `element_size()` `fill_()` `float()` Casts this storage to float type `static from_buffer()` `static from_file(filename, shared=False, size=0) → Storage` If `shared` is `True`, then memory is shared between all processes. All changes are written to the file. If `shared` is `False`, then the changes on the storage do not affect the file. `size` is the number of elements in the storage. 
If `shared` is `False`, then the file must contain at least `size * sizeof(Type)` bytes (`Type` is the type of storage). If `shared` is `True` the file will be created if needed. Parameters * **filename** ([str](https://docs.python.org/3/library/stdtypes.html#str "\(in Python v3.9\)")) – file name to map * **shared** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – whether to share memory * **size** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – number of elements in the storage `get_device()` `half()` Casts this storage to half type `int()` Casts this storage to int type `is_cuda: bool = False` `is_pinned()` `is_shared()` `is_sparse: bool = False` `long()` Casts this storage to long type `new()` `pin_memory()` Copies the storage to pinned memory, if it’s not already pinned. `resize_()` `share_memory_()` Moves the storage to shared memory. This is a no-op for storages already in shared memory and for CUDA storages, which do not need to be moved for sharing across processes. Storages in shared memory cannot be resized. Returns: self `short()` Casts this storage to short type `size()` `tolist()` Returns a list containing the elements of this storage `type(dtype=None, non_blocking=False, **kwargs)` Returns the type if `dtype` is not provided, else casts this object to the specified type. If this is already of the correct type, no copy is performed and the original object is returned. Parameters * **dtype** ([type](https://docs.python.org/3/library/functions.html#type "\(in Python v3.9\)") _or_ _string_) – The desired type * **non_blocking** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – If `True`, and the source is in pinned memory and destination is on the GPU or vice versa, the copy is performed asynchronously with respect to the host. Otherwise, the argument has no effect. * ****kwargs** – For compatibility, may contain the key `async` in place of the `non_blocking` argument. The `async` arg is deprecated. # Tensor Attributes Each `torch.Tensor` has a `torch.dtype`, `torch.device`, and `torch.layout`. ## torch.dtype `class torch.dtype` A `torch.dtype` is an object that represents the data type of a [`torch.Tensor`](tensors#torch.Tensor "torch.Tensor"). PyTorch has twelve different data types: Data type | dtype | Legacy Constructors ---|---|--- 32-bit floating point | `torch.float32` or `torch.float` | `torch.*.FloatTensor` 64-bit floating point | `torch.float64` or `torch.double` | `torch.*.DoubleTensor` 64-bit complex | `torch.complex64` or `torch.cfloat` | 128-bit complex | `torch.complex128` or `torch.cdouble` | 16-bit floating point 1 | `torch.float16` or `torch.half` | `torch.*.HalfTensor` 16-bit floating point 2 | `torch.bfloat16` | `torch.*.BFloat16Tensor` 8-bit integer (unsigned) | `torch.uint8` | `torch.*.ByteTensor` 8-bit integer (signed) | `torch.int8` | `torch.*.CharTensor` 16-bit integer (signed) | `torch.int16` or `torch.short` | `torch.*.ShortTensor` 32-bit integer (signed) | `torch.int32` or `torch.int` | `torch.*.IntTensor` 64-bit integer (signed) | `torch.int64` or `torch.long` | `torch.*.LongTensor` Boolean | `torch.bool` | `torch.*.BoolTensor` `1` Sometimes referred to as binary16: uses 1 sign, 5 exponent, and 10 significand bits. Useful when precision is important. `2` Sometimes referred to as Brain Floating Point: use 1 sign, 8 exponent and 7 significand bits. 
Useful when range is important, since it has the same number of exponent bits as `float32` To find out if a `torch.dtype` is a floating point data type, the property [`is_floating_point`](generated/torch.is_floating_point#torch.is_floating_point "torch.is_floating_point") can be used, which returns `True` if the data type is a floating point data type. To find out if a `torch.dtype` is a complex data type, the property [`is_complex`](generated/torch.is_complex#torch.is_complex "torch.is_complex") can be used, which returns `True` if the data type is a complex data type. When the dtypes of inputs to an arithmetic operation (`add`, `sub`, `div`, `mul`) differ, we promote by finding the minimum dtype that satisfies the following rules: * If the type of a scalar operand is of a higher category than tensor operands (where complex > floating > integral > boolean), we promote to a type with sufficient size to hold all scalar operands of that category. * If a zero-dimension tensor operand has a higher category than dimensioned operands, we promote to a type with sufficient size and category to hold all zero-dim tensor operands of that category. * If there are no higher-category zero-dim operands, we promote to a type with sufficient size and category to hold all dimensioned operands. A floating point scalar operand has dtype `torch.get_default_dtype()` and an integral non-boolean scalar operand has dtype `torch.int64`. Unlike numpy, we do not inspect values when determining the minimum `dtypes` of an operand. Quantized and complex types are not yet supported. Promotion Examples: >>> float_tensor = torch.ones(1, dtype=torch.float) >>> double_tensor = torch.ones(1, dtype=torch.double) >>> complex_float_tensor = torch.ones(1, dtype=torch.complex64) >>> complex_double_tensor = torch.ones(1, dtype=torch.complex128) >>> int_tensor = torch.ones(1, dtype=torch.int) >>> long_tensor = torch.ones(1, dtype=torch.long) >>> uint_tensor = torch.ones(1, dtype=torch.uint8) >>> double_tensor = torch.ones(1, dtype=torch.double) >>> bool_tensor = torch.ones(1, dtype=torch.bool) # zero-dim tensors >>> long_zerodim = torch.tensor(1, dtype=torch.long) >>> int_zerodim = torch.tensor(1, dtype=torch.int) >>> torch.add(5, 5).dtype torch.int64 # 5 is an int64, but does not have higher category than int_tensor so is not considered. >>> (int_tensor + 5).dtype torch.int32 >>> (int_tensor + long_zerodim).dtype torch.int32 >>> (long_tensor + int_tensor).dtype torch.int64 >>> (bool_tensor + long_tensor).dtype torch.int64 >>> (bool_tensor + uint_tensor).dtype torch.uint8 >>> (float_tensor + double_tensor).dtype torch.float64 >>> (complex_float_tensor + complex_double_tensor).dtype torch.complex128 >>> (bool_tensor + int_tensor).dtype torch.int32 # Since long is a different kind than float, result dtype only needs to be large enough # to hold the float. >>> torch.add(long_tensor, float_tensor).dtype torch.float32 `When the output tensor of an arithmetic operation is specified, we allow casting to its dtype except that:` * An integral output tensor cannot accept a floating point tensor. * A boolean output tensor cannot accept a non-boolean tensor. 
* A non-complex output tensor cannot accept a complex tensor Casting Examples: # allowed: >>> float_tensor *= double_tensor >>> float_tensor *= int_tensor >>> float_tensor *= uint_tensor >>> float_tensor *= bool_tensor >>> float_tensor *= double_tensor >>> int_tensor *= long_tensor >>> int_tensor *= uint_tensor >>> uint_tensor *= int_tensor # disallowed (RuntimeError: result type can't be cast to the desired output type): >>> int_tensor *= float_tensor >>> bool_tensor *= int_tensor >>> bool_tensor *= uint_tensor >>> float_tensor *= complex_float_tensor ## torch.device `class torch.device` A `torch.device` is an object representing the device on which a [`torch.Tensor`](tensors#torch.Tensor "torch.Tensor") is or will be allocated. The `torch.device` contains a device type (`'cpu'` or `'cuda'`) and optional device ordinal for the device type. If the device ordinal is not present, this object will always represent the current device for the device type, even after [`torch.cuda.set_device()`](cuda#torch.cuda.set_device "torch.cuda.set_device") is called; e.g., a [`torch.Tensor`](tensors#torch.Tensor "torch.Tensor") constructed with device `'cuda'` is equivalent to `'cuda:X'` where X is the result of [`torch.cuda.current_device()`](cuda#torch.cuda.current_device "torch.cuda.current_device"). A [`torch.Tensor`](tensors#torch.Tensor "torch.Tensor")’s device can be accessed via the [`Tensor.device`](tensors#torch.Tensor.device "torch.Tensor.device") property. A `torch.device` can be constructed via a string or via a string and device ordinal Via a string: >>> torch.device('cuda:0') device(type='cuda', index=0) >>> torch.device('cpu') device(type='cpu') >>> torch.device('cuda') # current cuda device device(type='cuda') Via a string and device ordinal: >>> torch.device('cuda', 0) device(type='cuda', index=0) >>> torch.device('cpu', 0) device(type='cpu', index=0) Note The `torch.device` argument in functions can generally be substituted with a string. This allows for fast prototyping of code. >>> # Example of a function that takes in a torch.device >>> cuda1 = torch.device('cuda:1') >>> torch.randn((2,3), device=cuda1) >>> # You can substitute the torch.device with a string >>> torch.randn((2,3), device='cuda:1') Note For legacy reasons, a device can be constructed via a single device ordinal, which is treated as a cuda device. This matches [`Tensor.get_device()`](tensors#torch.Tensor.get_device "torch.Tensor.get_device"), which returns an ordinal for cuda tensors and is not supported for cpu tensors. >>> torch.device(1) device(type='cuda', index=1) Note Methods which take a device will generally accept a (properly formatted) string or (legacy) integer device ordinal, i.e. the following are all equivalent: >>> torch.randn((2,3), device=torch.device('cuda:1')) >>> torch.randn((2,3), device='cuda:1') >>> torch.randn((2,3), device=1) # legacy ## torch.layout `class torch.layout` Warning The `torch.layout` class is in beta and subject to change. A `torch.layout` is an object that represents the memory layout of a [`torch.Tensor`](tensors#torch.Tensor "torch.Tensor"). Currently, we support `torch.strided` (dense Tensors) and have beta support for `torch.sparse_coo` (sparse COO Tensors). `torch.strided` represents dense Tensors and is the memory layout that is most commonly used. Each strided tensor has an associated `torch.Storage`, which holds its data. These tensors provide multi-dimensional, [strided](https://en.wikipedia.org/wiki/Stride_of_an_array) view of a storage. 
Strides are a list of integers: the k-th stride represents the jump in the memory necessary to go from one element to the next one in the k-th dimension of the Tensor. This concept makes it possible to perform many tensor operations efficiently. Example: >>> x = torch.Tensor([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]]) >>> x.stride() (5, 1) >>> x.t().stride() (1, 5) For more information on `torch.sparse_coo` tensors, see [torch.sparse](sparse#sparse-docs). ## torch.memory_format `class torch.memory_format` A `torch.memory_format` is an object representing the memory format on which a [`torch.Tensor`](tensors#torch.Tensor "torch.Tensor") is or will be allocated. Possible values are: * `torch.contiguous_format`: Tensor is or will be allocated in dense non-overlapping memory. Strides are represented by values in decreasing order. * `torch.channels_last`: Tensor is or will be allocated in dense non-overlapping memory. Strides are represented by values satisfying `strides[0] > strides[2] > strides[3] > strides[1] == 1`, aka NHWC order. * `torch.preserve_format`: Used in functions like `clone` to preserve the memory format of the input tensor. If the input tensor is allocated in dense non-overlapping memory, the output tensor strides will be copied from the input. Otherwise, the output strides will follow `torch.contiguous_format`.
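For illustration, a minimal sketch (shapes chosen arbitrarily) of the stride orderings described above:

import torch

x = torch.empty(2, 3, 4, 5)                          # NCHW tensor, torch.contiguous_format
x.stride()                                           # (60, 20, 5, 1): strictly decreasing
y = x.contiguous(memory_format=torch.channels_last)
y.stride()                                           # (60, 1, 15, 3): strides[0] > strides[2] > strides[3] > strides[1] == 1
z = y.clone(memory_format=torch.preserve_format)     # clone keeps y's channels_last strides
z.stride()                                           # (60, 1, 15, 3)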
# torch.utils.tensorboard Before going further, more details on TensorBoard can be found at https://www.tensorflow.org/tensorboard/. Once you’ve installed TensorBoard, these utilities let you log PyTorch models and metrics into a directory for visualization within the TensorBoard UI. Scalars, images, histograms, graphs, and embedding visualizations are all supported for PyTorch models and tensors as well as Caffe2 nets and blobs. The SummaryWriter class is your main entry to log data for consumption and visualization by TensorBoard. For example: import torch import torchvision from torch.utils.tensorboard import SummaryWriter from torchvision import datasets, transforms # Writer will output to ./runs/ directory by default writer = SummaryWriter() transform = transforms.Compose([transforms.ToTensor(), transforms.Normalize((0.5,), (0.5,))]) trainset = datasets.MNIST('mnist_train', train=True, download=True, transform=transform) trainloader = torch.utils.data.DataLoader(trainset, batch_size=64, shuffle=True) model = torchvision.models.resnet50(False) # Have ResNet model take in grayscale rather than RGB model.conv1 = torch.nn.Conv2d(1, 64, kernel_size=7, stride=2, padding=3, bias=False) images, labels = next(iter(trainloader)) grid = torchvision.utils.make_grid(images) writer.add_image('images', grid, 0) writer.add_graph(model, images) writer.close() This can then be visualized with TensorBoard, which should be installable and runnable with: pip install tensorboard tensorboard --logdir=runs Lots of information can be logged for one experiment. To avoid cluttering the UI and have better result clustering, we can group plots by naming them hierarchically. For example, “Loss/train” and “Loss/test” will be grouped together, while “Accuracy/train” and “Accuracy/test” will be grouped separately in the TensorBoard interface. from torch.utils.tensorboard import SummaryWriter import numpy as np writer = SummaryWriter() for n_iter in range(100): writer.add_scalar('Loss/train', np.random.random(), n_iter) writer.add_scalar('Loss/test', np.random.random(), n_iter) writer.add_scalar('Accuracy/train', np.random.random(), n_iter) writer.add_scalar('Accuracy/test', np.random.random(), n_iter) Expected result: [](_images/hier_tags.png) `class torch.utils.tensorboard.writer.SummaryWriter(log_dir=None, comment='', purge_step=None, max_queue=10, flush_secs=120, filename_suffix='')` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/utils/tensorboard/writer.html#SummaryWriter) Writes entries directly to event files in the log_dir to be consumed by TensorBoard. The `SummaryWriter` class provides a high-level API to create an event file in a given directory and add summaries and events to it. The class updates the file contents asynchronously. This allows a training program to call methods to add data to the file directly from the training loop, without slowing down training. `__init__(log_dir=None, comment='', purge_step=None, max_queue=10, flush_secs=120, filename_suffix='')` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/utils/tensorboard/writer.html#SummaryWriter.__init__) Creates a `SummaryWriter` that will write out events and summaries to the event file. Parameters * **log_dir** (_string_) – Save directory location. Default is runs/**CURRENT_DATETIME_HOSTNAME**, which changes after each run. Use a hierarchical folder structure to compare between runs easily, e.g. pass in ‘runs/exp1’, ‘runs/exp2’, etc. for each new experiment to compare across them. * **comment** (_string_) – Comment log_dir suffix appended to the default `log_dir`. If `log_dir` is assigned, this argument has no effect. * **purge_step** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – When logging crashes at step T+X and restarts at step T, any events whose global_step is larger than or equal to T will be purged and hidden from TensorBoard. Note that crashed and resumed experiments should have the same `log_dir`. * **max_queue** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – Size of the queue for pending events and summaries before one of the ‘add’ calls forces a flush to disk. Default is ten items. * **flush_secs** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – How often, in seconds, to flush the pending events and summaries to disk. Default is every two minutes. * **filename_suffix** (_string_) – Suffix added to all event filenames in the log_dir directory. More details on filename construction in tensorboard.summary.writer.event_file_writer.EventFileWriter. Examples: from torch.utils.tensorboard import SummaryWriter # create a summary writer with automatically generated folder name. writer = SummaryWriter() # folder location: runs/May04_22-14-54_s-MacBook-Pro.local/ # create a summary writer using the specified folder name. writer = SummaryWriter("my_experiment") # folder location: my_experiment # create a summary writer with comment appended. writer = SummaryWriter(comment="LR_0.1_BATCH_16") # folder location: runs/May04_22-14-54_s-MacBook-Pro.localLR_0.1_BATCH_16/ `add_scalar(tag, scalar_value, global_step=None, walltime=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/utils/tensorboard/writer.html#SummaryWriter.add_scalar) Add scalar data to summary.
Parameters * **tag** (_string_) – Data identifier * **scalar_value** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _or_ _string/blobname_) – Value to save * **global_step** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – Global step value to record * **walltime** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)")) – Optional override default walltime (time.time()) with seconds after epoch of event Examples: from torch.utils.tensorboard import SummaryWriter writer = SummaryWriter() x = range(100) for i in x: writer.add_scalar('y=2x', i * 2, i) writer.close() Expected result: [](_images/add_scalar.png) `add_scalars(main_tag, tag_scalar_dict, global_step=None, walltime=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/utils/tensorboard/writer.html#SummaryWriter.add_scalars) Adds many scalar values to summary. Parameters * **main_tag** (_string_) – The parent name for the tags * **tag_scalar_dict** ([dict](https://docs.python.org/3/library/stdtypes.html#dict "\(in Python v3.9\)")) – Key-value pair storing the tag and corresponding values * **global_step** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – Global step value to record * **walltime** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)")) – Optional override default walltime (time.time()) seconds after epoch of event Examples: from torch.utils.tensorboard import SummaryWriter import numpy as np writer = SummaryWriter() r = 5 for i in range(100): writer.add_scalars('run_14h', {'xsinx':i*np.sin(i/r), 'xcosx':i*np.cos(i/r), 'tanx': np.tan(i/r)}, i) writer.close() # This call adds three values to the same scalar plot with the tag # 'run_14h' in TensorBoard's scalar section. Expected result: [](_images/add_scalars.png) `add_histogram(tag, values, global_step=None, bins='tensorflow', walltime=None, max_bins=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/utils/tensorboard/writer.html#SummaryWriter.add_histogram) Add histogram to summary. Parameters * **tag** (_string_) – Data identifier * **values** ([torch.Tensor](tensors#torch.Tensor "torch.Tensor") _,__numpy.array_ _, or_ _string/blobname_) – Values to build histogram * **global_step** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – Global step value to record * **bins** (_string_) – One of {‘tensorflow’, ‘auto’, ‘fd’, …}. This determines how the bins are made. You can find other options in the `numpy.histogram` documentation. * **walltime** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)")) – Optional override default walltime (time.time()) seconds after epoch of event Examples: from torch.utils.tensorboard import SummaryWriter import numpy as np writer = SummaryWriter() for i in range(10): x = np.random.random(1000) writer.add_histogram('distribution centers', x + i, i) writer.close() Expected result: [](_images/add_histogram.png) `add_image(tag, img_tensor, global_step=None, walltime=None, dataformats='CHW')` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/utils/tensorboard/writer.html#SummaryWriter.add_image) Add image data to summary. Note that this requires the `pillow` package.
Parameters * **tag** (_string_) – Data identifier * **img_tensor** ([torch.Tensor](tensors#torch.Tensor "torch.Tensor") _,__numpy.array_ _, or_ _string/blobname_) – Image data * **global_step** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – Global step value to record * **walltime** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)")) – Optional override default walltime (time.time()) seconds after epoch of event Shape: img_tensor: Default is (3,H,W)(3, H, W) . You can use `torchvision.utils.make_grid()` to convert a batch of tensor into 3xHxW format or call `add_images` and let us do the job. Tensor with (1,H,W)(1, H, W) , (H,W)(H, W) , (H,W,3)(H, W, 3) is also suitable as long as corresponding `dataformats` argument is passed, e.g. `CHW`, `HWC`, `HW`. Examples: from torch.utils.tensorboard import SummaryWriter import numpy as np img = np.zeros((3, 100, 100)) img[0] = np.arange(0, 10000).reshape(100, 100) / 10000 img[1] = 1 - np.arange(0, 10000).reshape(100, 100) / 10000 img_HWC = np.zeros((100, 100, 3)) img_HWC[:, :, 0] = np.arange(0, 10000).reshape(100, 100) / 10000 img_HWC[:, :, 1] = 1 - np.arange(0, 10000).reshape(100, 100) / 10000 writer = SummaryWriter() writer.add_image('my_image', img, 0) # If you have non-default dimension setting, set the dataformats argument. writer.add_image('my_image_HWC', img_HWC, 0, dataformats='HWC') writer.close() Expected result: [](_images/add_image.png) `add_images(tag, img_tensor, global_step=None, walltime=None, dataformats='NCHW')` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/utils/tensorboard/writer.html#SummaryWriter.add_images) Add batched image data to summary. Note that this requires the `pillow` package. Parameters * **tag** (_string_) – Data identifier * **img_tensor** ([torch.Tensor](tensors#torch.Tensor "torch.Tensor") _,__numpy.array_ _, or_ _string/blobname_) – Image data * **global_step** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – Global step value to record * **walltime** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)")) – Optional override default walltime (time.time()) seconds after epoch of event * **dataformats** (_string_) – Image data format specification of the form NCHW, NHWC, CHW, HWC, HW, WH, etc. Shape: img_tensor: Default is (N,3,H,W)(N, 3, H, W) . If `dataformats` is specified, other shape will be accepted. e.g. NCHW or NHWC. Examples: from torch.utils.tensorboard import SummaryWriter import numpy as np img_batch = np.zeros((16, 3, 100, 100)) for i in range(16): img_batch[i, 0] = np.arange(0, 10000).reshape(100, 100) / 10000 / 16 * i img_batch[i, 1] = (1 - np.arange(0, 10000).reshape(100, 100) / 10000) / 16 * i writer = SummaryWriter() writer.add_images('my_image_batch', img_batch, 0) writer.close() Expected result: [](_images/add_images.png) `add_figure(tag, figure, global_step=None, close=True, walltime=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/utils/tensorboard/writer.html#SummaryWriter.add_figure) Render matplotlib figure into an image and add it to summary. Note that this requires the `matplotlib` package. 
Parameters * **tag** (_string_) – Data identifier * **figure** (_matplotlib.pyplot.figure_) – Figure or a list of figures * **global_step** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – Global step value to record * **close** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – Flag to automatically close the figure * **walltime** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)")) – Optional override default walltime (time.time()) seconds after epoch of event `add_video(tag, vid_tensor, global_step=None, fps=4, walltime=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/utils/tensorboard/writer.html#SummaryWriter.add_video) Add video data to summary. Note that this requires the `moviepy` package. Parameters * **tag** (_string_) – Data identifier * **vid_tensor** ([torch.Tensor](tensors#torch.Tensor "torch.Tensor")) – Video data * **global_step** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – Global step value to record * **fps** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _or_[int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – Frames per second * **walltime** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)")) – Optional override default walltime (time.time()) seconds after epoch of event Shape: vid_tensor: (N,T,C,H,W)(N, T, C, H, W) . The values should lie in [0, 255] for type `uint8` or [0, 1] for type `float`. `add_audio(tag, snd_tensor, global_step=None, sample_rate=44100, walltime=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/utils/tensorboard/writer.html#SummaryWriter.add_audio) Add audio data to summary. Parameters * **tag** (_string_) – Data identifier * **snd_tensor** ([torch.Tensor](tensors#torch.Tensor "torch.Tensor")) – Sound data * **global_step** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – Global step value to record * **sample_rate** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – sample rate in Hz * **walltime** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)")) – Optional override default walltime (time.time()) seconds after epoch of event Shape: snd_tensor: (1,L)(1, L) . The values should lie between [-1, 1]. `add_text(tag, text_string, global_step=None, walltime=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/utils/tensorboard/writer.html#SummaryWriter.add_text) Add text data to summary. Parameters * **tag** (_string_) – Data identifier * **text_string** (_string_) – String to save * **global_step** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – Global step value to record * **walltime** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)")) – Optional override default walltime (time.time()) seconds after epoch of event Examples: writer.add_text('lstm', 'This is an lstm', 0) writer.add_text('rnn', 'This is an rnn', 10) `add_graph(model, input_to_model=None, verbose=False)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/utils/tensorboard/writer.html#SummaryWriter.add_graph) Add graph data to summary. Parameters * **model** ([torch.nn.Module](generated/torch.nn.module#torch.nn.Module "torch.nn.Module")) – Model to draw. 
* **input_to_model** ([torch.Tensor](tensors#torch.Tensor "torch.Tensor") _or_ _list of torch.Tensor_) – A variable or a tuple of variables to be fed. * **verbose** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – Whether to print graph structure in console. `add_embedding(mat, metadata=None, label_img=None, global_step=None, tag='default', metadata_header=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/utils/tensorboard/writer.html#SummaryWriter.add_embedding) Add embedding projector data to summary. Parameters * **mat** ([torch.Tensor](tensors#torch.Tensor "torch.Tensor") _or_ _numpy.array_) – A matrix in which each row is the feature vector of a data point * **metadata** ([list](https://docs.python.org/3/library/stdtypes.html#list "\(in Python v3.9\)")) – A list of labels; each element will be converted to a string * **label_img** ([torch.Tensor](tensors#torch.Tensor "torch.Tensor")) – Images corresponding to each data point * **global_step** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – Global step value to record * **tag** (_string_) – Name for the embedding Shape: mat: (N, D), where N is the number of data points and D is the feature dimension label_img: (N, C, H, W) Examples: import keyword import torch from torch.utils.tensorboard import SummaryWriter writer = SummaryWriter() meta = [] while len(meta)<100: meta = meta+keyword.kwlist # get some strings meta = meta[:100] for i, v in enumerate(meta): meta[i] = v+str(i) label_img = torch.rand(100, 3, 10, 32) for i in range(100): label_img[i]*=i/100.0 writer.add_embedding(torch.randn(100, 5), metadata=meta, label_img=label_img) writer.add_embedding(torch.randn(100, 5), label_img=label_img) writer.add_embedding(torch.randn(100, 5), metadata=meta) `add_pr_curve(tag, labels, predictions, global_step=None, num_thresholds=127, weights=None, walltime=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/utils/tensorboard/writer.html#SummaryWriter.add_pr_curve) Adds a precision-recall curve. Plotting a precision-recall curve lets you understand your model’s performance under different threshold settings. With this function, you provide the ground truth labeling (T/F) and prediction confidence (usually the output of your model) for each target. The TensorBoard UI will let you choose the threshold interactively. Parameters * **tag** (_string_) – Data identifier * **labels** ([torch.Tensor](tensors#torch.Tensor "torch.Tensor") _,__numpy.array_ _, or_ _string/blobname_) – Ground truth data. Binary label for each element. * **predictions** ([torch.Tensor](tensors#torch.Tensor "torch.Tensor") _,__numpy.array_ _, or_ _string/blobname_) – The probability that an element is classified as true. Value should be in [0, 1] * **global_step** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – Global step value to record * **num_thresholds** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – Number of thresholds used to draw the curve.
* **walltime** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)")) – Optional override default walltime (time.time()) seconds after epoch of event Examples: from torch.utils.tensorboard import SummaryWriter import numpy as np labels = np.random.randint(2, size=100) # binary label predictions = np.random.rand(100) writer = SummaryWriter() writer.add_pr_curve('pr_curve', labels, predictions, 0) writer.close() `add_custom_scalars(layout)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/utils/tensorboard/writer.html#SummaryWriter.add_custom_scalars) Creates a special chart by collecting chart tags in ‘scalars’. Note that this function can only be called once for each SummaryWriter() object. Because it only provides metadata to tensorboard, the function can be called before or after the training loop. Parameters **layout** ([dict](https://docs.python.org/3/library/stdtypes.html#dict "\(in Python v3.9\)")) – {categoryName: _charts_}, where _charts_ is also a dictionary {chartName: _ListOfProperties_}. The first element in _ListOfProperties_ is the chart’s type (one of **Multiline** or **Margin**) and the second element should be a list containing the tags you have used in the add_scalar function, which will be collected into the new chart. Examples: layout = {'Taiwan':{'twse':['Multiline',['twse/0050', 'twse/2330']]}, 'USA':{ 'dow':['Margin', ['dow/aaa', 'dow/bbb', 'dow/ccc']], 'nasdaq':['Margin', ['nasdaq/aaa', 'nasdaq/bbb', 'nasdaq/ccc']]}} writer.add_custom_scalars(layout) `add_mesh(tag, vertices, colors=None, faces=None, config_dict=None, global_step=None, walltime=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/utils/tensorboard/writer.html#SummaryWriter.add_mesh) Add meshes or 3D point clouds to TensorBoard. The visualization is based on Three.js, so it allows users to interact with the rendered object. Besides basic definitions such as vertices and faces, users can further provide camera parameters, lighting conditions, etc. Please check the Three.js documentation for advanced usage. Parameters * **tag** (_string_) – Data identifier * **vertices** ([torch.Tensor](tensors#torch.Tensor "torch.Tensor")) – List of the 3D coordinates of vertices. * **colors** ([torch.Tensor](tensors#torch.Tensor "torch.Tensor")) – Colors for each vertex * **faces** ([torch.Tensor](tensors#torch.Tensor "torch.Tensor")) – Indices of vertices within each triangle. (Optional) * **config_dict** – Dictionary with Three.js class names and configuration. * **global_step** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – Global step value to record * **walltime** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)")) – Optional override default walltime (time.time()) seconds after epoch of event Shape: vertices: (B, N, 3). (batch, number_of_vertices, channels) colors: (B, N, 3). The values should lie in [0, 255] for type `uint8` or [0, 1] for type `float`. faces: (B, N, 3). The values should lie in [0, number_of_vertices] for type `uint8`.
Examples: from torch.utils.tensorboard import SummaryWriter vertices_tensor = torch.as_tensor([ [1, 1, 1], [-1, -1, 1], [1, -1, -1], [-1, 1, -1], ], dtype=torch.float).unsqueeze(0) colors_tensor = torch.as_tensor([ [255, 0, 0], [0, 255, 0], [0, 0, 255], [255, 0, 255], ], dtype=torch.int).unsqueeze(0) faces_tensor = torch.as_tensor([ [0, 2, 3], [0, 3, 1], [0, 1, 2], [1, 3, 2], ], dtype=torch.int).unsqueeze(0) writer = SummaryWriter() writer.add_mesh('my_mesh', vertices=vertices_tensor, colors=colors_tensor, faces=faces_tensor) writer.close() `add_hparams(hparam_dict, metric_dict, hparam_domain_discrete=None, run_name=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/utils/tensorboard/writer.html#SummaryWriter.add_hparams) Add a set of hyperparameters to be compared in TensorBoard. Parameters * **hparam_dict** ([dict](https://docs.python.org/3/library/stdtypes.html#dict "\(in Python v3.9\)")) – Each key-value pair in the dictionary is the name of the hyper parameter and it’s corresponding value. The type of the value can be one of `bool`, `string`, `float`, `int`, or `None`. * **metric_dict** ([dict](https://docs.python.org/3/library/stdtypes.html#dict "\(in Python v3.9\)")) – Each key-value pair in the dictionary is the name of the metric and it’s corresponding value. Note that the key used here should be unique in the tensorboard record. Otherwise the value you added by `add_scalar` will be displayed in hparam plugin. In most cases, this is unwanted. * **hparam_domain_discrete** – (Optional[Dict[str, List[Any]]]) A dictionary that contains names of the hyperparameters and all discrete values they can hold * **run_name** ([str](https://docs.python.org/3/library/stdtypes.html#str "\(in Python v3.9\)")) – Name of the run, to be included as part of the logdir. If unspecified, will use current timestamp. Examples: from torch.utils.tensorboard import SummaryWriter with SummaryWriter() as w: for i in range(5): w.add_hparams({'lr': 0.1*i, 'bsize': i}, {'hparam/accuracy': 10*i, 'hparam/loss': 10*i}) Expected result: [](_images/add_hparam.png) `flush()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/utils/tensorboard/writer.html#SummaryWriter.flush) Flushes the event file to disk. Call this method to make sure that all pending events have been written to disk. `close()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/utils/tensorboard/writer.html#SummaryWriter.close) # torch.Tensor A `torch.Tensor` is a multi-dimensional matrix containing elements of a single data type. 
Torch defines 10 tensor types with CPU and GPU variants which are as follows: Data type | dtype | CPU tensor | GPU tensor ---|---|---|--- 32-bit floating point | `torch.float32` or `torch.float` | `torch.FloatTensor` | `torch.cuda.FloatTensor` 64-bit floating point | `torch.float64` or `torch.double` | `torch.DoubleTensor` | `torch.cuda.DoubleTensor` 16-bit floating point 1 | `torch.float16` or `torch.half` | `torch.HalfTensor` | `torch.cuda.HalfTensor` 16-bit floating point 2 | `torch.bfloat16` | `torch.BFloat16Tensor` | `torch.cuda.BFloat16Tensor` 32-bit complex | `torch.complex32` | | 64-bit complex | `torch.complex64` | | 128-bit complex | `torch.complex128` or `torch.cdouble` | | 8-bit integer (unsigned) | `torch.uint8` | `torch.ByteTensor` | `torch.cuda.ByteTensor` 8-bit integer (signed) | `torch.int8` | `torch.CharTensor` | `torch.cuda.CharTensor` 16-bit integer (signed) | `torch.int16` or `torch.short` | `torch.ShortTensor` | `torch.cuda.ShortTensor` 32-bit integer (signed) | `torch.int32` or `torch.int` | `torch.IntTensor` | `torch.cuda.IntTensor` 64-bit integer (signed) | `torch.int64` or `torch.long` | `torch.LongTensor` | `torch.cuda.LongTensor` Boolean | `torch.bool` | `torch.BoolTensor` | `torch.cuda.BoolTensor` `1` Sometimes referred to as binary16: uses 1 sign, 5 exponent, and 10 significand bits. Useful when precision is important at the expense of range. `2` Sometimes referred to as Brain Floating Point: uses 1 sign, 8 exponent, and 7 significand bits. Useful when range is important, since it has the same number of exponent bits as `float32` `torch.Tensor` is an alias for the default tensor type (`torch.FloatTensor`). A tensor can be constructed from a Python [`list`](https://docs.python.org/3/library/stdtypes.html#list "\(in Python v3.9\)") or sequence using the [`torch.tensor()`](generated/torch.tensor#torch.tensor "torch.tensor") constructor: >>> torch.tensor([[1., -1.], [1., -1.]]) tensor([[ 1.0000, -1.0000], [ 1.0000, -1.0000]]) >>> torch.tensor(np.array([[1, 2, 3], [4, 5, 6]])) tensor([[ 1, 2, 3], [ 4, 5, 6]]) Warning [`torch.tensor()`](generated/torch.tensor#torch.tensor "torch.tensor") always copies `data`. If you have a Tensor `data` and just want to change its `requires_grad` flag, use `requires_grad_()` or [`detach()`](autograd#torch.Tensor.detach "torch.Tensor.detach") to avoid a copy. If you have a numpy array and want to avoid a copy, use [`torch.as_tensor()`](generated/torch.as_tensor#torch.as_tensor "torch.as_tensor"). 
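For illustration, a minimal sketch (values chosen arbitrarily) of the copy-avoiding alternatives mentioned in the warning above:

import numpy as np
import torch

a = np.array([1.0, 2.0, 3.0])
t = torch.as_tensor(a)      # shares memory with `a`, no copy (same dtype, CPU)
a[0] = 10.0                 # the change is visible through `t` as well

x = torch.randn(3)
x.requires_grad_()          # flips the flag in place, no copy
y = x.detach()              # new Tensor sharing the same storage, without autograd history
z = torch.tensor(x)         # copies the data (and warns); prefer x.clone().detach()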
A tensor of specific data type can be constructed by passing a [`torch.dtype`](tensor_attributes#torch.torch.dtype "torch.torch.dtype") and/or a [`torch.device`](tensor_attributes#torch.torch.device "torch.torch.device") to a constructor or tensor creation op: >>> torch.zeros([2, 4], dtype=torch.int32) tensor([[ 0, 0, 0, 0], [ 0, 0, 0, 0]], dtype=torch.int32) >>> cuda0 = torch.device('cuda:0') >>> torch.ones([2, 4], dtype=torch.float64, device=cuda0) tensor([[ 1.0000, 1.0000, 1.0000, 1.0000], [ 1.0000, 1.0000, 1.0000, 1.0000]], dtype=torch.float64, device='cuda:0') The contents of a tensor can be accessed and modified using Python’s indexing and slicing notation: >>> x = torch.tensor([[1, 2, 3], [4, 5, 6]]) >>> print(x[1][2]) tensor(6) >>> x[0][1] = 8 >>> print(x) tensor([[ 1, 8, 3], [ 4, 5, 6]]) Use `torch.Tensor.item()` to get a Python number from a tensor containing a single value: >>> x = torch.tensor([[1]]) >>> x tensor([[ 1]]) >>> x.item() 1 >>> x = torch.tensor(2.5) >>> x tensor(2.5000) >>> x.item() 2.5 A tensor can be created with `requires_grad=True` so that [`torch.autograd`](autograd#module-torch.autograd "torch.autograd") records operations on them for automatic differentiation. >>> x = torch.tensor([[1., -1.], [1., 1.]], requires_grad=True) >>> out = x.pow(2).sum() >>> out.backward() >>> x.grad tensor([[ 2.0000, -2.0000], [ 2.0000, 2.0000]]) Each tensor has an associated `torch.Storage`, which holds its data. The tensor class also provides multi-dimensional, [strided](https://en.wikipedia.org/wiki/Stride_of_an_array) view of a storage and defines numeric operations on it. Note For more information on tensor views, see [Tensor Views](tensor_view#tensor- view-doc). Note For more information on the [`torch.dtype`](tensor_attributes#torch.torch.dtype "torch.torch.dtype"), [`torch.device`](tensor_attributes#torch.torch.device "torch.torch.device"), and [`torch.layout`](tensor_attributes#torch.torch.layout "torch.torch.layout") attributes of a `torch.Tensor`, see [Tensor Attributes](tensor_attributes#tensor-attributes-doc). Note Methods which mutate a tensor are marked with an underscore suffix. For example, `torch.FloatTensor.abs_()` computes the absolute value in-place and returns the modified tensor, while `torch.FloatTensor.abs()` computes the result in a new tensor. Note To change an existing tensor’s [`torch.device`](tensor_attributes#torch.torch.device "torch.torch.device") and/or [`torch.dtype`](tensor_attributes#torch.torch.dtype "torch.torch.dtype"), consider using `to()` method on the tensor. Warning Current implementation of `torch.Tensor` introduces memory overhead, thus it might lead to unexpectedly high memory usage in the applications with many tiny tensors. If this is your case, consider using one large structure. `class torch.Tensor` There are a few main ways to create a tensor, depending on your use case. * To create a tensor with pre-existing data, use [`torch.tensor()`](generated/torch.tensor#torch.tensor "torch.tensor"). * To create a tensor with specific size, use `torch.*` tensor creation ops (see [Creation Ops](torch#tensor-creation-ops)). * To create a tensor with the same size (and similar types) as another tensor, use `torch.*_like` tensor creation ops (see [Creation Ops](torch#tensor-creation-ops)). * To create a tensor with similar type but different size as another tensor, use `tensor.new_*` creation ops. `new_tensor(data, dtype=None, device=None, requires_grad=False) → Tensor` Returns a new Tensor with `data` as the tensor data. 
By default, the returned Tensor has the same [`torch.dtype`](tensor_attributes#torch.torch.dtype "torch.torch.dtype") and [`torch.device`](tensor_attributes#torch.torch.device "torch.torch.device") as this tensor. Warning `new_tensor()` always copies `data`. If you have a Tensor `data` and want to avoid a copy, use `torch.Tensor.requires_grad_()` or [`torch.Tensor.detach()`](autograd#torch.Tensor.detach "torch.Tensor.detach"). If you have a numpy array and want to avoid a copy, use [`torch.from_numpy()`](generated/torch.from_numpy#torch.from_numpy "torch.from_numpy"). Warning When data is a tensor `x`, `new_tensor()` reads out ‘the data’ from whatever it is passed, and constructs a leaf variable. Therefore `tensor.new_tensor(x)` is equivalent to `x.clone().detach()` and `tensor.new_tensor(x, requires_grad=True)` is equivalent to `x.clone().detach().requires_grad_(True)`. The equivalents using `clone()` and `detach()` are recommended. Parameters * **data** (_array_like_) – The returned Tensor copies `data`. * **dtype** ([`torch.dtype`](tensor_attributes#torch.torch.dtype "torch.torch.dtype"), optional) – the desired type of returned tensor. Default: if None, same [`torch.dtype`](tensor_attributes#torch.torch.dtype "torch.torch.dtype") as this tensor. * **device** ([`torch.device`](tensor_attributes#torch.torch.device "torch.torch.device"), optional) – the desired device of returned tensor. Default: if None, same [`torch.device`](tensor_attributes#torch.torch.device "torch.torch.device") as this tensor. * **requires_grad** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – If autograd should record operations on the returned tensor. Default: `False`. Example: >>> tensor = torch.ones((2,), dtype=torch.int8) >>> data = [[0, 1], [2, 3]] >>> tensor.new_tensor(data) tensor([[ 0, 1], [ 2, 3]], dtype=torch.int8) `new_full(size, fill_value, dtype=None, device=None, requires_grad=False) → Tensor` Returns a Tensor of size `size` filled with `fill_value`. By default, the returned Tensor has the same [`torch.dtype`](tensor_attributes#torch.torch.dtype "torch.torch.dtype") and [`torch.device`](tensor_attributes#torch.torch.device "torch.torch.device") as this tensor. Parameters * **fill_value** (_scalar_) – the number to fill the output tensor with. * **dtype** ([`torch.dtype`](tensor_attributes#torch.torch.dtype "torch.torch.dtype"), optional) – the desired type of returned tensor. Default: if None, same [`torch.dtype`](tensor_attributes#torch.torch.dtype "torch.torch.dtype") as this tensor. * **device** ([`torch.device`](tensor_attributes#torch.torch.device "torch.torch.device"), optional) – the desired device of returned tensor. Default: if None, same [`torch.device`](tensor_attributes#torch.torch.device "torch.torch.device") as this tensor. * **requires_grad** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – If autograd should record operations on the returned tensor. Default: `False`. Example: >>> tensor = torch.ones((2,), dtype=torch.float64) >>> tensor.new_full((3, 4), 3.141592) tensor([[ 3.1416, 3.1416, 3.1416, 3.1416], [ 3.1416, 3.1416, 3.1416, 3.1416], [ 3.1416, 3.1416, 3.1416, 3.1416]], dtype=torch.float64) `new_empty(size, dtype=None, device=None, requires_grad=False) → Tensor` Returns a Tensor of size `size` filled with uninitialized data. 
By default, the returned Tensor has the same [`torch.dtype`](tensor_attributes#torch.torch.dtype "torch.torch.dtype") and [`torch.device`](tensor_attributes#torch.torch.device "torch.torch.device") as this tensor. Parameters * **dtype** ([`torch.dtype`](tensor_attributes#torch.torch.dtype "torch.torch.dtype"), optional) – the desired type of returned tensor. Default: if None, same [`torch.dtype`](tensor_attributes#torch.torch.dtype "torch.torch.dtype") as this tensor. * **device** ([`torch.device`](tensor_attributes#torch.torch.device "torch.torch.device"), optional) – the desired device of returned tensor. Default: if None, same [`torch.device`](tensor_attributes#torch.torch.device "torch.torch.device") as this tensor. * **requires_grad** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – If autograd should record operations on the returned tensor. Default: `False`. Example: >>> tensor = torch.ones(()) >>> tensor.new_empty((2, 3)) tensor([[ 5.8182e-18, 4.5765e-41, -1.0545e+30], [ 3.0949e-41, 4.4842e-44, 0.0000e+00]]) `new_ones(size, dtype=None, device=None, requires_grad=False) → Tensor` Returns a Tensor of size `size` filled with `1`. By default, the returned Tensor has the same [`torch.dtype`](tensor_attributes#torch.torch.dtype "torch.torch.dtype") and [`torch.device`](tensor_attributes#torch.torch.device "torch.torch.device") as this tensor. Parameters * **size** (_int..._) – a list, tuple, or `torch.Size` of integers defining the shape of the output tensor. * **dtype** ([`torch.dtype`](tensor_attributes#torch.torch.dtype "torch.torch.dtype"), optional) – the desired type of returned tensor. Default: if None, same [`torch.dtype`](tensor_attributes#torch.torch.dtype "torch.torch.dtype") as this tensor. * **device** ([`torch.device`](tensor_attributes#torch.torch.device "torch.torch.device"), optional) – the desired device of returned tensor. Default: if None, same [`torch.device`](tensor_attributes#torch.torch.device "torch.torch.device") as this tensor. * **requires_grad** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – If autograd should record operations on the returned tensor. Default: `False`. Example: >>> tensor = torch.tensor((), dtype=torch.int32) >>> tensor.new_ones((2, 3)) tensor([[ 1, 1, 1], [ 1, 1, 1]], dtype=torch.int32) `new_zeros(size, dtype=None, device=None, requires_grad=False) → Tensor` Returns a Tensor of size `size` filled with `0`. By default, the returned Tensor has the same [`torch.dtype`](tensor_attributes#torch.torch.dtype "torch.torch.dtype") and [`torch.device`](tensor_attributes#torch.torch.device "torch.torch.device") as this tensor. Parameters * **size** (_int..._) – a list, tuple, or `torch.Size` of integers defining the shape of the output tensor. * **dtype** ([`torch.dtype`](tensor_attributes#torch.torch.dtype "torch.torch.dtype"), optional) – the desired type of returned tensor. Default: if None, same [`torch.dtype`](tensor_attributes#torch.torch.dtype "torch.torch.dtype") as this tensor. * **device** ([`torch.device`](tensor_attributes#torch.torch.device "torch.torch.device"), optional) – the desired device of returned tensor. Default: if None, same [`torch.device`](tensor_attributes#torch.torch.device "torch.torch.device") as this tensor. * **requires_grad** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – If autograd should record operations on the returned tensor. Default: `False`. 
Example: >>> tensor = torch.tensor((), dtype=torch.float64) >>> tensor.new_zeros((2, 3)) tensor([[ 0., 0., 0.], [ 0., 0., 0.]], dtype=torch.float64) `is_cuda` Is `True` if the Tensor is stored on the GPU, `False` otherwise. `is_quantized` Is `True` if the Tensor is quantized, `False` otherwise. `is_meta` Is `True` if the Tensor is a meta tensor, `False` otherwise. Meta tensors are like normal tensors, but they carry no data. `device` Is the [`torch.device`](tensor_attributes#torch.torch.device "torch.torch.device") where this Tensor is. `grad` This attribute is `None` by default and becomes a Tensor the first time a call to [`backward()`](autograd#torch.Tensor.backward "torch.Tensor.backward") computes gradients for `self`. The attribute will then contain the gradients computed and future calls to [`backward()`](autograd#torch.Tensor.backward "torch.Tensor.backward") will accumulate (add) gradients into it. `ndim` Alias for `dim()` `T` Is this Tensor with its dimensions reversed. If `n` is the number of dimensions in `x`, `x.T` is equivalent to `x.permute(n-1, n-2, ..., 0)`. `real` Returns a new tensor containing real values of the `self` tensor. The returned tensor and `self` share the same underlying storage. Warning [`real()`](generated/torch.real#torch.real "torch.real") is only supported for tensors with complex dtypes. Example:: >>> x=torch.randn(4, dtype=torch.cfloat) >>> x tensor([(0.3100+0.3553j), (-0.5445-0.7896j), (-1.6492-0.0633j), (-0.0638-0.8119j)]) >>> x.real tensor([ 0.3100, -0.5445, -1.6492, -0.0638]) `imag` Returns a new tensor containing imaginary values of the `self` tensor. The returned tensor and `self` share the same underlying storage. Warning [`imag()`](generated/torch.imag#torch.imag "torch.imag") is only supported for tensors with complex dtypes. Example:: >>> x=torch.randn(4, dtype=torch.cfloat) >>> x tensor([(0.3100+0.3553j), (-0.5445-0.7896j), (-1.6492-0.0633j), (-0.0638-0.8119j)]) >>> x.imag tensor([ 0.3553, -0.7896, -0.0633, -0.8119]) `abs() → Tensor` See [`torch.abs()`](generated/torch.abs#torch.abs "torch.abs") `abs_() → Tensor` In-place version of `abs()` `absolute() → Tensor` Alias for [`abs()`](generated/torch.abs#torch.abs "torch.abs") `absolute_() → Tensor` In-place version of `absolute()` Alias for `abs_()` `acos() → Tensor` See [`torch.acos()`](generated/torch.acos#torch.acos "torch.acos") `acos_() → Tensor` In-place version of `acos()` `arccos() → Tensor` See [`torch.arccos()`](generated/torch.arccos#torch.arccos "torch.arccos") `arccos_() → Tensor` In-place version of `arccos()` `add(other, *, alpha=1) → Tensor` Add a scalar or tensor to `self` tensor. If both `alpha` and `other` are specified, each element of `other` is scaled by `alpha` before being used. 
When `other` is a tensor, the shape of `other` must be [broadcastable](https://pytorch.org/docs/1.8.0/notes/broadcasting.html#broadcasting- semantics) with the shape of the underlying tensor See [`torch.add()`](generated/torch.add#torch.add "torch.add") `add_(other, *, alpha=1) → Tensor` In-place version of `add()` `addbmm(batch1, batch2, *, beta=1, alpha=1) → Tensor` See [`torch.addbmm()`](generated/torch.addbmm#torch.addbmm "torch.addbmm") `addbmm_(batch1, batch2, *, beta=1, alpha=1) → Tensor` In-place version of `addbmm()` `addcdiv(tensor1, tensor2, *, value=1) → Tensor` See [`torch.addcdiv()`](generated/torch.addcdiv#torch.addcdiv "torch.addcdiv") `addcdiv_(tensor1, tensor2, *, value=1) → Tensor` In-place version of `addcdiv()` `addcmul(tensor1, tensor2, *, value=1) → Tensor` See [`torch.addcmul()`](generated/torch.addcmul#torch.addcmul "torch.addcmul") `addcmul_(tensor1, tensor2, *, value=1) → Tensor` In-place version of `addcmul()` `addmm(mat1, mat2, *, beta=1, alpha=1) → Tensor` See [`torch.addmm()`](generated/torch.addmm#torch.addmm "torch.addmm") `addmm_(mat1, mat2, *, beta=1, alpha=1) → Tensor` In-place version of `addmm()` `sspaddmm(mat1, mat2, *, beta=1, alpha=1) → Tensor` See [`torch.sspaddmm()`](sparse#torch.sspaddmm "torch.sspaddmm") `addmv(mat, vec, *, beta=1, alpha=1) → Tensor` See [`torch.addmv()`](generated/torch.addmv#torch.addmv "torch.addmv") `addmv_(mat, vec, *, beta=1, alpha=1) → Tensor` In-place version of `addmv()` `addr(vec1, vec2, *, beta=1, alpha=1) → Tensor` See [`torch.addr()`](generated/torch.addr#torch.addr "torch.addr") `addr_(vec1, vec2, *, beta=1, alpha=1) → Tensor` In-place version of `addr()` `allclose(other, rtol=1e-05, atol=1e-08, equal_nan=False) → Tensor` See [`torch.allclose()`](generated/torch.allclose#torch.allclose "torch.allclose") `amax(dim=None, keepdim=False) → Tensor` See [`torch.amax()`](generated/torch.amax#torch.amax "torch.amax") `amin(dim=None, keepdim=False) → Tensor` See [`torch.amin()`](generated/torch.amin#torch.amin "torch.amin") `angle() → Tensor` See [`torch.angle()`](generated/torch.angle#torch.angle "torch.angle") `apply_(callable) → Tensor` Applies the function `callable` to each element in the tensor, replacing each element with the value returned by `callable`. Note This function only works with CPU tensors and should not be used in code sections that require high performance. 
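For illustration, a minimal sketch (values chosen arbitrarily) of `apply_` on a small CPU tensor; as the note above says, it is meant for convenience rather than performance:

import torch

t = torch.tensor([1.0, 2.0, 3.0])   # CPU tensor
t.apply_(lambda v: v * v + 1.0)     # the callable receives and returns Python numbers
# t is now tensor([ 2.,  5., 10.]), modified in place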
`argmax(dim=None, keepdim=False) → LongTensor` See [`torch.argmax()`](generated/torch.argmax#torch.argmax "torch.argmax") `argmin(dim=None, keepdim=False) → LongTensor` See [`torch.argmin()`](generated/torch.argmin#torch.argmin "torch.argmin") `argsort(dim=-1, descending=False) → LongTensor` See [`torch.argsort()`](generated/torch.argsort#torch.argsort "torch.argsort") `asin() → Tensor` See [`torch.asin()`](generated/torch.asin#torch.asin "torch.asin") `asin_() → Tensor` In-place version of `asin()` `arcsin() → Tensor` See [`torch.arcsin()`](generated/torch.arcsin#torch.arcsin "torch.arcsin") `arcsin_() → Tensor` In-place version of `arcsin()` `as_strided(size, stride, storage_offset=0) → Tensor` See [`torch.as_strided()`](generated/torch.as_strided#torch.as_strided "torch.as_strided") `atan() → Tensor` See [`torch.atan()`](generated/torch.atan#torch.atan "torch.atan") `atan_() → Tensor` In-place version of `atan()` `arctan() → Tensor` See [`torch.arctan()`](generated/torch.arctan#torch.arctan "torch.arctan") `arctan_() → Tensor` In-place version of `arctan()` `atan2(other) → Tensor` See [`torch.atan2()`](generated/torch.atan2#torch.atan2 "torch.atan2") `atan2_(other) → Tensor` In-place version of `atan2()` `all(dim=None, keepdim=False) → Tensor` See [`torch.all()`](generated/torch.all#torch.all "torch.all") `any(dim=None, keepdim=False) → Tensor` See [`torch.any()`](generated/torch.any#torch.any "torch.any") `backward(gradient=None, retain_graph=None, create_graph=False, inputs=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/tensor.html#Tensor.backward) Computes the gradient of current tensor w.r.t. graph leaves. The graph is differentiated using the chain rule. If the tensor is non-scalar (i.e. its data has more than one element) and requires gradient, the function additionally requires specifying `gradient`. It should be a tensor of matching type and location, that contains the gradient of the differentiated function w.r.t. `self`. This function accumulates gradients in the leaves - you might need to zero `.grad` attributes or set them to `None` before calling it. See [Default gradient layouts](autograd#default-grad-layouts) for details on the memory layout of accumulated gradients. Note If you run any forward ops, create `gradient`, and/or call `backward` in a user-specified CUDA stream context, see [Stream semantics of backward passes](https://pytorch.org/docs/1.8.0/notes/cuda.html#bwd-cuda-stream- semantics). Parameters * **gradient** (Tensor _or_[None](https://docs.python.org/3/library/constants.html#None "\(in Python v3.9\)")) – Gradient w.r.t. the tensor. If it is a tensor, it will be automatically converted to a Tensor that does not require grad unless `create_graph` is True. None values can be specified for scalar Tensors or ones that don’t require grad. If a None value would be acceptable then this argument is optional. * **retain_graph** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – If `False`, the graph used to compute the grads will be freed. Note that in nearly all cases setting this option to True is not needed and often can be worked around in a much more efficient way. Defaults to the value of `create_graph`. * **create_graph** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – If `True`, graph of the derivative will be constructed, allowing to compute higher order derivative products. Defaults to `False`. 
* **inputs** (_sequence of Tensor_) – Inputs w.r.t. which the gradient will be accumulated into `.grad`. All other Tensors will be ignored. If not provided, the gradient is accumulated into all the leaf Tensors that were used to compute the current tensor. All the provided inputs must be leaf Tensors. `baddbmm(batch1, batch2, *, beta=1, alpha=1) → Tensor` See [`torch.baddbmm()`](generated/torch.baddbmm#torch.baddbmm "torch.baddbmm") `baddbmm_(batch1, batch2, *, beta=1, alpha=1) → Tensor` In-place version of `baddbmm()` `bernoulli(*, generator=None) → Tensor` Returns a result tensor where each `result[i]` is independently sampled from \text{Bernoulli}(\texttt{self[i]}). `self` must have floating point `dtype`, and the result will have the same `dtype`. See [`torch.bernoulli()`](generated/torch.bernoulli#torch.bernoulli "torch.bernoulli") `bernoulli_()` `bernoulli_(p=0.5, *, generator=None) → Tensor` Fills each location of `self` with an independent sample from \text{Bernoulli}(\texttt{p}). `self` can have integral `dtype`. `bernoulli_(p_tensor, *, generator=None) → Tensor` `p_tensor` should be a tensor containing probabilities to be used for drawing the binary random number. The i-th element of `self` tensor will be set to a value sampled from \text{Bernoulli}(\texttt{p\_tensor[i]}). `self` can have integral `dtype`, but `p_tensor` must have floating point `dtype`. See also `bernoulli()` and [`torch.bernoulli()`](generated/torch.bernoulli#torch.bernoulli "torch.bernoulli") `bfloat16(memory_format=torch.preserve_format) → Tensor` `self.bfloat16()` is equivalent to `self.to(torch.bfloat16)`. See `to()`. Parameters **memory_format** ([`torch.memory_format`](tensor_attributes#torch.torch.memory_format "torch.torch.memory_format"), optional) – the desired memory format of returned Tensor. Default: `torch.preserve_format`. `bincount(weights=None, minlength=0) → Tensor` See [`torch.bincount()`](generated/torch.bincount#torch.bincount "torch.bincount") `bitwise_not() → Tensor` See [`torch.bitwise_not()`](generated/torch.bitwise_not#torch.bitwise_not "torch.bitwise_not") `bitwise_not_() → Tensor` In-place version of `bitwise_not()` `bitwise_and() → Tensor` See [`torch.bitwise_and()`](generated/torch.bitwise_and#torch.bitwise_and "torch.bitwise_and") `bitwise_and_() → Tensor` In-place version of `bitwise_and()` `bitwise_or() → Tensor` See [`torch.bitwise_or()`](generated/torch.bitwise_or#torch.bitwise_or "torch.bitwise_or") `bitwise_or_() → Tensor` In-place version of `bitwise_or()` `bitwise_xor() → Tensor` See [`torch.bitwise_xor()`](generated/torch.bitwise_xor#torch.bitwise_xor "torch.bitwise_xor") `bitwise_xor_() → Tensor` In-place version of `bitwise_xor()` `bmm(batch2) → Tensor` See [`torch.bmm()`](generated/torch.bmm#torch.bmm "torch.bmm") `bool(memory_format=torch.preserve_format) → Tensor` `self.bool()` is equivalent to `self.to(torch.bool)`. See `to()`. Parameters **memory_format** ([`torch.memory_format`](tensor_attributes#torch.torch.memory_format "torch.torch.memory_format"), optional) – the desired memory format of returned Tensor. Default: `torch.preserve_format`. `byte(memory_format=torch.preserve_format) → Tensor` `self.byte()` is equivalent to `self.to(torch.uint8)`. See `to()`. Parameters **memory_format** ([`torch.memory_format`](tensor_attributes#torch.torch.memory_format "torch.torch.memory_format"), optional) – the desired memory format of returned Tensor. Default: `torch.preserve_format`.
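A quick, hedged illustration of the `to()`-style conversion helpers above (`bool()`, `byte()`); the input values are arbitrary:

>>> import torch
>>> t = torch.tensor([0.0, 1.7, 2.0])
>>> t.bool()                 # same as t.to(torch.bool)
tensor([False,  True,  True])
>>> t.byte()                 # same as t.to(torch.uint8); fractional parts are truncated
tensor([0, 1, 2], dtype=torch.uint8)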
`broadcast_to(shape) → Tensor` See [`torch.broadcast_to()`](generated/torch.broadcast_to#torch.broadcast_to "torch.broadcast_to"). `cauchy_(median=0, sigma=1, *, generator=None) → Tensor` Fills the tensor with numbers drawn from the Cauchy distribution: f(x) = \dfrac{1}{\pi} \dfrac{\sigma}{(x - \text{median})^2 + \sigma^2} `ceil() → Tensor` See [`torch.ceil()`](generated/torch.ceil#torch.ceil "torch.ceil") `ceil_() → Tensor` In-place version of `ceil()` `char(memory_format=torch.preserve_format) → Tensor` `self.char()` is equivalent to `self.to(torch.int8)`. See `to()`. Parameters **memory_format** ([`torch.memory_format`](tensor_attributes#torch.torch.memory_format "torch.torch.memory_format"), optional) – the desired memory format of returned Tensor. Default: `torch.preserve_format`. `cholesky(upper=False) → Tensor` See [`torch.cholesky()`](generated/torch.cholesky#torch.cholesky "torch.cholesky") `cholesky_inverse(upper=False) → Tensor` See [`torch.cholesky_inverse()`](generated/torch.cholesky_inverse#torch.cholesky_inverse "torch.cholesky_inverse") `cholesky_solve(input2, upper=False) → Tensor` See [`torch.cholesky_solve()`](generated/torch.cholesky_solve#torch.cholesky_solve "torch.cholesky_solve") `chunk(chunks, dim=0) → List of Tensors` See [`torch.chunk()`](generated/torch.chunk#torch.chunk "torch.chunk") `clamp(min, max) → Tensor` See [`torch.clamp()`](generated/torch.clamp#torch.clamp "torch.clamp") `clamp_(min, max) → Tensor` In-place version of `clamp()` `clip(min, max) → Tensor` Alias for `clamp()`. `clip_(min, max) → Tensor` Alias for `clamp_()`. `clone(*, memory_format=torch.preserve_format) → Tensor` See [`torch.clone()`](generated/torch.clone#torch.clone "torch.clone") `contiguous(memory_format=torch.contiguous_format) → Tensor` Returns a contiguous in memory tensor containing the same data as `self` tensor. If `self` tensor is already in the specified memory format, this function returns the `self` tensor. Parameters **memory_format** ([`torch.memory_format`](tensor_attributes#torch.torch.memory_format "torch.torch.memory_format"), optional) – the desired memory format of returned Tensor. Default: `torch.contiguous_format`. `copy_(src, non_blocking=False) → Tensor` Copies the elements from `src` into `self` tensor and returns `self`. The `src` tensor must be [broadcastable](https://pytorch.org/docs/1.8.0/notes/broadcasting.html#broadcasting-semantics) with the `self` tensor. It may be of a different data type or reside on a different device. Parameters * **src** (Tensor) – the source tensor to copy from * **non_blocking** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – if `True` and this copy is between CPU and GPU, the copy may occur asynchronously with respect to the host. For other cases, this argument has no effect.
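A minimal sketch of `copy_()` (arbitrary values), showing the broadcasting and dtype conversion described above:

>>> import torch
>>> dst = torch.zeros(2, 3)
>>> src = torch.tensor([1., 2., 3.])          # broadcastable to dst's shape
>>> dst.copy_(src)                            # returns dst
tensor([[1., 2., 3.],
        [1., 2., 3.]])
>>> dst.copy_(torch.arange(6).reshape(2, 3))  # int64 source is converted to dst's float dtype
tensor([[0., 1., 2.],
        [3., 4., 5.]])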
`conj() → Tensor` See [`torch.conj()`](generated/torch.conj#torch.conj "torch.conj") `copysign(other) → Tensor` See [`torch.copysign()`](generated/torch.copysign#torch.copysign "torch.copysign") `copysign_(other) → Tensor` In-place version of `copysign()` `cos() → Tensor` See [`torch.cos()`](generated/torch.cos#torch.cos "torch.cos") `cos_() → Tensor` In-place version of `cos()` `cosh() → Tensor` See [`torch.cosh()`](generated/torch.cosh#torch.cosh "torch.cosh") `cosh_() → Tensor` In-place version of `cosh()` `count_nonzero(dim=None) → Tensor` See [`torch.count_nonzero()`](generated/torch.count_nonzero#torch.count_nonzero "torch.count_nonzero") `acosh() → Tensor` See [`torch.acosh()`](generated/torch.acosh#torch.acosh "torch.acosh") `acosh_() → Tensor` In-place version of `acosh()` `arccosh()` acosh() -> Tensor See [`torch.arccosh()`](generated/torch.arccosh#torch.arccosh "torch.arccosh") `arccosh_()` acosh_() -> Tensor In-place version of `arccosh()` `cpu(memory_format=torch.preserve_format) → Tensor` Returns a copy of this object in CPU memory. If this object is already in CPU memory and on the correct device, then no copy is performed and the original object is returned. Parameters **memory_format** ([`torch.memory_format`](tensor_attributes#torch.torch.memory_format "torch.torch.memory_format"), optional) – the desired memory format of returned Tensor. Default: `torch.preserve_format`. `cross(other, dim=-1) → Tensor` See [`torch.cross()`](generated/torch.cross#torch.cross "torch.cross") `cuda(device=None, non_blocking=False, memory_format=torch.preserve_format) → Tensor` Returns a copy of this object in CUDA memory. If this object is already in CUDA memory and on the correct device, then no copy is performed and the original object is returned. Parameters * **device** ([`torch.device`](tensor_attributes#torch.torch.device "torch.torch.device")) – The destination GPU device. Defaults to the current CUDA device. * **non_blocking** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – If `True` and the source is in pinned memory, the copy will be asynchronous with respect to the host. Otherwise, the argument has no effect. Default: `False`. * **memory_format** ([`torch.memory_format`](tensor_attributes#torch.torch.memory_format "torch.torch.memory_format"), optional) – the desired memory format of returned Tensor. Default: `torch.preserve_format`. `logcumsumexp(dim) → Tensor` See [`torch.logcumsumexp()`](generated/torch.logcumsumexp#torch.logcumsumexp "torch.logcumsumexp") `cummax(dim) -> (Tensor, Tensor)` See [`torch.cummax()`](generated/torch.cummax#torch.cummax "torch.cummax") `cummin(dim) -> (Tensor, Tensor)` See [`torch.cummin()`](generated/torch.cummin#torch.cummin "torch.cummin") `cumprod(dim, dtype=None) → Tensor` See [`torch.cumprod()`](generated/torch.cumprod#torch.cumprod "torch.cumprod") `cumprod_(dim, dtype=None) → Tensor` In-place version of `cumprod()` `cumsum(dim, dtype=None) → Tensor` See [`torch.cumsum()`](generated/torch.cumsum#torch.cumsum "torch.cumsum") `cumsum_(dim, dtype=None) → Tensor` In-place version of `cumsum()` `data_ptr() → int` Returns the address of the first element of `self` tensor. `deg2rad() → Tensor` See [`torch.deg2rad()`](generated/torch.deg2rad#torch.deg2rad "torch.deg2rad") `dequantize() → Tensor` Given a quantized Tensor, dequantize it and return the dequantized float Tensor. 
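Since `cummax()`/`cummin()` above return a `(values, indices)` pair, here is a minimal sketch with arbitrary values:

>>> import torch
>>> t = torch.tensor([1., 3., 2., 5., 4.])
>>> values, indices = t.cummax(dim=0)     # running maximum and where it was reached
>>> values
tensor([1., 3., 3., 5., 5.])
>>> indices
tensor([0, 1, 1, 3, 3])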
`det() → Tensor` See [`torch.det()`](generated/torch.det#torch.det "torch.det") `dense_dim() → int` Return the number of dense dimensions in a [sparse tensor](sparse#sparse-docs) `self`. Warning Throws an error if `self` is not a sparse tensor. See also [`Tensor.sparse_dim()`](sparse#torch.Tensor.sparse_dim "torch.Tensor.sparse_dim") and [hybrid tensors](sparse#sparse-hybrid-coo- docs). `detach()` Returns a new Tensor, detached from the current graph. The result will never require gradient. Note Returned Tensor shares the same storage with the original one. In-place modifications on either of them will be seen, and may trigger errors in correctness checks. IMPORTANT NOTE: Previously, in-place size / stride / storage changes (such as `resize_` / `resize_as_` / `set_` / `transpose_`) to the returned tensor also update the original tensor. Now, these in-place changes will not update the original tensor anymore, and will instead trigger an error. For sparse tensors: In-place indices / values changes (such as `zero_` / `copy_` / `add_`) to the returned tensor will not update the original tensor anymore, and will instead trigger an error. `detach_()` Detaches the Tensor from the graph that created it, making it a leaf. Views cannot be detached in-place. `diag(diagonal=0) → Tensor` See [`torch.diag()`](generated/torch.diag#torch.diag "torch.diag") `diag_embed(offset=0, dim1=-2, dim2=-1) → Tensor` See [`torch.diag_embed()`](generated/torch.diag_embed#torch.diag_embed "torch.diag_embed") `diagflat(offset=0) → Tensor` See [`torch.diagflat()`](generated/torch.diagflat#torch.diagflat "torch.diagflat") `diagonal(offset=0, dim1=0, dim2=1) → Tensor` See [`torch.diagonal()`](generated/torch.diagonal#torch.diagonal "torch.diagonal") `fill_diagonal_(fill_value, wrap=False) → Tensor` Fill the main diagonal of a tensor that has at least 2-dimensions. When dims>2, all dimensions of input must be of equal length. This function modifies the input tensor in-place, and returns the input tensor. Parameters * **fill_value** (_Scalar_) – the fill value * **wrap** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – the diagonal ‘wrapped’ after N columns for tall matrices. Example: >>> a = torch.zeros(3, 3) >>> a.fill_diagonal_(5) tensor([[5., 0., 0.], [0., 5., 0.], [0., 0., 5.]]) >>> b = torch.zeros(7, 3) >>> b.fill_diagonal_(5) tensor([[5., 0., 0.], [0., 5., 0.], [0., 0., 5.], [0., 0., 0.], [0., 0., 0.], [0., 0., 0.], [0., 0., 0.]]) >>> c = torch.zeros(7, 3) >>> c.fill_diagonal_(5, wrap=True) tensor([[5., 0., 0.], [0., 5., 0.], [0., 0., 5.], [0., 0., 0.], [5., 0., 0.], [0., 5., 0.], [0., 0., 5.]]) `fmax(other) → Tensor` See [`torch.fmax()`](generated/torch.fmax#torch.fmax "torch.fmax") `fmin(other) → Tensor` See [`torch.fmin()`](generated/torch.fmin#torch.fmin "torch.fmin") `diff(n=1, dim=-1, prepend=None, append=None) → Tensor` See [`torch.diff()`](generated/torch.diff#torch.diff "torch.diff") `digamma() → Tensor` See [`torch.digamma()`](generated/torch.digamma#torch.digamma "torch.digamma") `digamma_() → Tensor` In-place version of `digamma()` `dim() → int` Returns the number of dimensions of `self` tensor. 
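A minimal sketch of `diff()` from the entry above (arbitrary values); `n` controls how many times the difference is applied:

>>> import torch
>>> t = torch.tensor([1, 3, 6, 10])
>>> t.diff()        # first-order differences along the last dimension
tensor([2, 3, 4])
>>> t.diff(n=2)     # difference applied twice
tensor([1, 1])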
`dist(other, p=2) → Tensor` See [`torch.dist()`](generated/torch.dist#torch.dist "torch.dist") `div(value, *, rounding_mode=None) → Tensor` See [`torch.div()`](generated/torch.div#torch.div "torch.div") `div_(value, *, rounding_mode=None) → Tensor` In-place version of `div()` `divide(value, *, rounding_mode=None) → Tensor` See [`torch.divide()`](generated/torch.divide#torch.divide "torch.divide") `divide_(value, *, rounding_mode=None) → Tensor` In-place version of `divide()` `dot(other) → Tensor` See [`torch.dot()`](generated/torch.dot#torch.dot "torch.dot") `double(memory_format=torch.preserve_format) → Tensor` `self.double()` is equivalent to `self.to(torch.float64)`. See `to()`. Parameters **memory_format** ([`torch.memory_format`](tensor_attributes#torch.torch.memory_format "torch.torch.memory_format"), optional) – the desired memory format of returned Tensor. Default: `torch.preserve_format`. `eig(eigenvectors=False) -> (Tensor, Tensor)` See [`torch.eig()`](generated/torch.eig#torch.eig "torch.eig") `element_size() → int` Returns the size in bytes of an individual element. Example: >>> torch.tensor([]).element_size() 4 >>> torch.tensor([], dtype=torch.uint8).element_size() 1 `eq(other) → Tensor` See [`torch.eq()`](generated/torch.eq#torch.eq "torch.eq") `eq_(other) → Tensor` In-place version of `eq()` `equal(other) → bool` See [`torch.equal()`](generated/torch.equal#torch.equal "torch.equal") `erf() → Tensor` See [`torch.erf()`](generated/torch.erf#torch.erf "torch.erf") `erf_() → Tensor` In-place version of `erf()` `erfc() → Tensor` See [`torch.erfc()`](generated/torch.erfc#torch.erfc "torch.erfc") `erfc_() → Tensor` In-place version of `erfc()` `erfinv() → Tensor` See [`torch.erfinv()`](generated/torch.erfinv#torch.erfinv "torch.erfinv") `erfinv_() → Tensor` In-place version of `erfinv()` `exp() → Tensor` See [`torch.exp()`](generated/torch.exp#torch.exp "torch.exp") `exp_() → Tensor` In-place version of `exp()` `expm1() → Tensor` See [`torch.expm1()`](generated/torch.expm1#torch.expm1 "torch.expm1") `expm1_() → Tensor` In-place version of `expm1()` `expand(*sizes) → Tensor` Returns a new view of the `self` tensor with singleton dimensions expanded to a larger size. Passing -1 as the size for a dimension means not changing the size of that dimension. Tensor can be also expanded to a larger number of dimensions, and the new ones will be appended at the front. For the new dimensions, the size cannot be set to -1. Expanding a tensor does not allocate new memory, but only creates a new view on the existing tensor where a dimension of size one is expanded to a larger size by setting the `stride` to 0. Any dimension of size 1 can be expanded to an arbitrary value without allocating new memory. Parameters ***sizes** (_torch.Size_ _or_ _int..._) – the desired expanded size Warning More than one element of an expanded tensor may refer to a single memory location. As a result, in-place operations (especially ones that are vectorized) may result in incorrect behavior. If you need to write to the tensors, please clone them first. Example: >>> x = torch.tensor([[1], [2], [3]]) >>> x.size() torch.Size([3, 1]) >>> x.expand(3, 4) tensor([[ 1, 1, 1, 1], [ 2, 2, 2, 2], [ 3, 3, 3, 3]]) >>> x.expand(-1, 4) # -1 means not changing the size of that dimension tensor([[ 1, 1, 1, 1], [ 2, 2, 2, 2], [ 3, 3, 3, 3]]) `expand_as(other) → Tensor` Expand this tensor to the same size as `other`. `self.expand_as(other)` is equivalent to `self.expand(other.size())`. 
Please see `expand()` for more information about `expand`. Parameters **other** (`torch.Tensor`) – The result tensor has the same size as `other`. `exponential_(lambd=1, *, generator=None) → Tensor` Fills `self` tensor with elements drawn from the exponential distribution: f(x) = \lambda e^{-\lambda x} `fix() → Tensor` See [`torch.fix()`](generated/torch.fix#torch.fix "torch.fix"). `fix_() → Tensor` In-place version of `fix()` `fill_(value) → Tensor` Fills `self` tensor with the specified value. `flatten(input, start_dim=0, end_dim=-1) → Tensor` see [`torch.flatten()`](generated/torch.flatten#torch.flatten "torch.flatten") `flip(dims) → Tensor` See [`torch.flip()`](generated/torch.flip#torch.flip "torch.flip") `fliplr() → Tensor` See [`torch.fliplr()`](generated/torch.fliplr#torch.fliplr "torch.fliplr") `flipud() → Tensor` See [`torch.flipud()`](generated/torch.flipud#torch.flipud "torch.flipud") `float(memory_format=torch.preserve_format) → Tensor` `self.float()` is equivalent to `self.to(torch.float32)`. See `to()`. Parameters **memory_format** ([`torch.memory_format`](tensor_attributes#torch.torch.memory_format "torch.torch.memory_format"), optional) – the desired memory format of returned Tensor. Default: `torch.preserve_format`. `float_power(exponent) → Tensor` See [`torch.float_power()`](generated/torch.float_power#torch.float_power "torch.float_power") `float_power_(exponent) → Tensor` In-place version of `float_power()` `floor() → Tensor` See [`torch.floor()`](generated/torch.floor#torch.floor "torch.floor") `floor_() → Tensor` In-place version of `floor()` `floor_divide(value) → Tensor` See [`torch.floor_divide()`](generated/torch.floor_divide#torch.floor_divide "torch.floor_divide") `floor_divide_(value) → Tensor` In-place version of `floor_divide()` `fmod(divisor) → Tensor` See [`torch.fmod()`](generated/torch.fmod#torch.fmod "torch.fmod") `fmod_(divisor) → Tensor` In-place version of `fmod()` `frac() → Tensor` See [`torch.frac()`](generated/torch.frac#torch.frac "torch.frac") `frac_() → Tensor` In-place version of `frac()` `gather(dim, index) → Tensor` See [`torch.gather()`](generated/torch.gather#torch.gather "torch.gather") `gcd(other) → Tensor` See [`torch.gcd()`](generated/torch.gcd#torch.gcd "torch.gcd") `gcd_(other) → Tensor` In-place version of `gcd()` `ge(other) → Tensor` See [`torch.ge()`](generated/torch.ge#torch.ge "torch.ge"). `ge_(other) → Tensor` In-place version of `ge()`. `greater_equal(other) → Tensor` See [`torch.greater_equal()`](generated/torch.greater_equal#torch.greater_equal "torch.greater_equal"). `greater_equal_(other) → Tensor` In-place version of `greater_equal()`. `geometric_(p, *, generator=None) → Tensor` Fills `self` tensor with elements drawn from the geometric distribution: f(X=k) = p^{k - 1} (1 - p) `geqrf() -> (Tensor, Tensor)` See [`torch.geqrf()`](generated/torch.geqrf#torch.geqrf "torch.geqrf") `ger(vec2) → Tensor` See [`torch.ger()`](generated/torch.ger#torch.ger "torch.ger") `get_device() -> Device ordinal (Integer)` For CUDA tensors, this function returns the device ordinal of the GPU on which the tensor resides. For CPU tensors, an error is thrown. Example: >>> x = torch.randn(3, 4, 5, device='cuda:0') >>> x.get_device() 0 >>> x.cpu().get_device() # RuntimeError: get_device is not implemented for type torch.FloatTensor `gt(other) → Tensor` See [`torch.gt()`](generated/torch.gt#torch.gt "torch.gt"). `gt_(other) → Tensor` In-place version of `gt()`.
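The `gather(dim, index)` entry above only points to `torch.gather()`; as a hedged reminder of the rule it implements (for `dim=1`, `out[i][j] = self[i][index[i][j]]`), a small sketch with arbitrary values:

>>> import torch
>>> t = torch.tensor([[1, 2], [3, 4]])
>>> index = torch.tensor([[0, 0], [1, 0]])
>>> t.gather(1, index)       # out[i][j] = t[i][index[i][j]]
tensor([[1, 1],
        [4, 3]])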
`greater(other) → Tensor` See [`torch.greater()`](generated/torch.greater#torch.greater "torch.greater"). `greater_(other) → Tensor` In-place version of `greater()`. `half(memory_format=torch.preserve_format) → Tensor` `self.half()` is equivalent to `self.to(torch.float16)`. See `to()`. Parameters **memory_format** ([`torch.memory_format`](tensor_attributes#torch.torch.memory_format "torch.torch.memory_format"), optional) – the desired memory format of returned Tensor. Default: `torch.preserve_format`. `hardshrink(lambd=0.5) → Tensor` See [`torch.nn.functional.hardshrink()`](nn.functional#torch.nn.functional.hardshrink "torch.nn.functional.hardshrink") `heaviside(values) → Tensor` See [`torch.heaviside()`](generated/torch.heaviside#torch.heaviside "torch.heaviside") `histc(bins=100, min=0, max=0) → Tensor` See [`torch.histc()`](generated/torch.histc#torch.histc "torch.histc") `hypot(other) → Tensor` See [`torch.hypot()`](generated/torch.hypot#torch.hypot "torch.hypot") `hypot_(other) → Tensor` In-place version of `hypot()` `i0() → Tensor` See [`torch.i0()`](generated/torch.i0#torch.i0 "torch.i0") `i0_() → Tensor` In-place version of `i0()` `igamma(other) → Tensor` See [`torch.igamma()`](generated/torch.igamma#torch.igamma "torch.igamma") `igamma_(other) → Tensor` In-place version of `igamma()` `igammac(other) → Tensor` See [`torch.igammac()`](generated/torch.igammac#torch.igammac "torch.igammac") `igammac_(other) → Tensor` In-place version of `igammac()` `index_add_(dim, index, tensor) → Tensor` Accumulate the elements of [`tensor`](generated/torch.tensor#torch.tensor "torch.tensor") into the `self` tensor by adding to the indices in the order given in `index`. For example, if `dim == 0` and `index[i] == j`, then the `i`th row of [`tensor`](generated/torch.tensor#torch.tensor "torch.tensor") is added to the `j`th row of `self`. The `dim`th dimension of [`tensor`](generated/torch.tensor#torch.tensor "torch.tensor") must have the same size as the length of `index` (which must be a vector), and all other dimensions must match `self`, or an error will be raised. Note This operation may behave nondeterministically when given tensors on a CUDA device. See [Reproducibility](https://pytorch.org/docs/1.8.0/notes/randomness.html) for more information. Parameters * **dim** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – dimension along which to index * **index** (_IntTensor_ _or_ _LongTensor_) – indices of [`tensor`](generated/torch.tensor#torch.tensor "torch.tensor") to select from * **tensor** (Tensor) – the tensor containing values to add Example: >>> x = torch.ones(5, 3) >>> t = torch.tensor([[1, 2, 3], [4, 5, 6], [7, 8, 9]], dtype=torch.float) >>> index = torch.tensor([0, 4, 2]) >>> x.index_add_(0, index, t) tensor([[ 2., 3., 4.], [ 1., 1., 1.], [ 8., 9., 10.], [ 1., 1., 1.], [ 5., 6., 7.]]) `index_add(tensor1, dim, index, tensor2) → Tensor` Out-of-place version of `torch.Tensor.index_add_()`. `tensor1` corresponds to `self` in `torch.Tensor.index_add_()`. `index_copy_(dim, index, tensor) → Tensor` Copies the elements of [`tensor`](generated/torch.tensor#torch.tensor "torch.tensor") into the `self` tensor by selecting the indices in the order given in `index`. For example, if `dim == 0` and `index[i] == j`, then the `i`th row of [`tensor`](generated/torch.tensor#torch.tensor "torch.tensor") is copied to the `j`th row of `self`. 
The `dim`th dimension of [`tensor`](generated/torch.tensor#torch.tensor "torch.tensor") must have the same size as the length of `index` (which must be a vector), and all other dimensions must match `self`, or an error will be raised. Note If `index` contains duplicate entries, multiple elements from [`tensor`](generated/torch.tensor#torch.tensor "torch.tensor") will be copied to the same index of `self`. The result is nondeterministic since it depends on which copy occurs last. Parameters * **dim** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – dimension along which to index * **index** (_LongTensor_) – indices of [`tensor`](generated/torch.tensor#torch.tensor "torch.tensor") to select from * **tensor** (Tensor) – the tensor containing values to copy Example: >>> x = torch.zeros(5, 3) >>> t = torch.tensor([[1, 2, 3], [4, 5, 6], [7, 8, 9]], dtype=torch.float) >>> index = torch.tensor([0, 4, 2]) >>> x.index_copy_(0, index, t) tensor([[ 1., 2., 3.], [ 0., 0., 0.], [ 7., 8., 9.], [ 0., 0., 0.], [ 4., 5., 6.]]) `index_copy(tensor1, dim, index, tensor2) → Tensor` Out-of-place version of `torch.Tensor.index_copy_()`. `tensor1` corresponds to `self` in `torch.Tensor.index_copy_()`. `index_fill_(dim, index, val) → Tensor` Fills the elements of the `self` tensor with value `val` by selecting the indices in the order given in `index`. Parameters * **dim** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – dimension along which to index * **index** (_LongTensor_) – indices of `self` tensor to fill in * **val** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)")) – the value to fill with Example:: >>> x = torch.tensor([[1, 2, 3], [4, 5, 6], [7, 8, 9]], dtype=torch.float) >>> index = torch.tensor([0, 2]) >>> x.index_fill_(1, index, -1) tensor([[-1., 2., -1.], [-1., 5., -1.], [-1., 8., -1.]]) `index_fill(tensor1, dim, index, value) → Tensor` Out-of-place version of `torch.Tensor.index_fill_()`. `tensor1` corresponds to `self` in `torch.Tensor.index_fill_()`. `index_put_(indices, values, accumulate=False) → Tensor` Puts values from the tensor [`values`](sparse#torch.Tensor.values "torch.Tensor.values") into the tensor `self` using the indices specified in [`indices`](sparse#torch.Tensor.indices "torch.Tensor.indices") (which is a tuple of Tensors). The expression `tensor.index_put_(indices, values)` is equivalent to `tensor[indices] = values`. Returns `self`. If `accumulate` is `True`, the elements in [`values`](sparse#torch.Tensor.values "torch.Tensor.values") are added to `self`. If accumulate is `False`, the behavior is undefined if indices contain duplicate elements. Parameters * **indices** (_tuple of LongTensor_) – tensors used to index into `self`. * **values** (Tensor) – tensor of same dtype as `self`. * **accumulate** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – whether to accumulate into self `index_put(tensor1, indices, values, accumulate=False) → Tensor` Out-place version of `index_put_()`. `tensor1` corresponds to `self` in `torch.Tensor.index_put_()`. `index_select(dim, index) → Tensor` See [`torch.index_select()`](generated/torch.index_select#torch.index_select "torch.index_select") `indices() → Tensor` Return the indices tensor of a [sparse COO tensor](sparse#sparse-coo-docs). Warning Throws an error if `self` is not a sparse COO tensor. See also [`Tensor.values()`](sparse#torch.Tensor.values "torch.Tensor.values"). 
Note This method can only be called on a coalesced sparse tensor. See [`Tensor.coalesce()`](sparse#torch.Tensor.coalesce "torch.Tensor.coalesce") for details. `inner(other) → Tensor` See [`torch.inner()`](generated/torch.inner#torch.inner "torch.inner"). `int(memory_format=torch.preserve_format) → Tensor` `self.int()` is equivalent to `self.to(torch.int32)`. See `to()`. Parameters **memory_format** ([`torch.memory_format`](tensor_attributes#torch.torch.memory_format "torch.torch.memory_format"), optional) – the desired memory format of returned Tensor. Default: `torch.preserve_format`. `int_repr() → Tensor` Given a quantized Tensor, `self.int_repr()` returns a CPU Tensor with uint8_t as data type that stores the underlying uint8_t values of the given Tensor. `inverse() → Tensor` See [`torch.inverse()`](generated/torch.inverse#torch.inverse "torch.inverse") `isclose(other, rtol=1e-05, atol=1e-08, equal_nan=False) → Tensor` See [`torch.isclose()`](generated/torch.isclose#torch.isclose "torch.isclose") `isfinite() → Tensor` See [`torch.isfinite()`](generated/torch.isfinite#torch.isfinite "torch.isfinite") `isinf() → Tensor` See [`torch.isinf()`](generated/torch.isinf#torch.isinf "torch.isinf") `isposinf() → Tensor` See [`torch.isposinf()`](generated/torch.isposinf#torch.isposinf "torch.isposinf") `isneginf() → Tensor` See [`torch.isneginf()`](generated/torch.isneginf#torch.isneginf "torch.isneginf") `isnan() → Tensor` See [`torch.isnan()`](generated/torch.isnan#torch.isnan "torch.isnan") `is_contiguous(memory_format=torch.contiguous_format) → bool` Returns True if `self` tensor is contiguous in memory in the order specified by memory format. Parameters **memory_format** ([`torch.memory_format`](tensor_attributes#torch.torch.memory_format "torch.torch.memory_format"), optional) – Specifies memory allocation order. Default: `torch.contiguous_format`. `is_complex() → bool` Returns True if the data type of `self` is a complex data type. `is_floating_point() → bool` Returns True if the data type of `self` is a floating point data type. `is_leaf` All Tensors that have [`requires_grad`](autograd#torch.Tensor.requires_grad "torch.Tensor.requires_grad") which is `False` will be leaf Tensors by convention. For Tensors that have [`requires_grad`](autograd#torch.Tensor.requires_grad "torch.Tensor.requires_grad") which is `True`, they will be leaf Tensors if they were created by the user. This means that they are not the result of an operation and so `grad_fn` is None. Only leaf Tensors will have their [`grad`](autograd#torch.Tensor.grad "torch.Tensor.grad") populated during a call to [`backward()`](autograd#torch.Tensor.backward "torch.Tensor.backward"). To get [`grad`](autograd#torch.Tensor.grad "torch.Tensor.grad") populated for non- leaf Tensors, you can use [`retain_grad()`](autograd#torch.Tensor.retain_grad "torch.Tensor.retain_grad"). 
Example: >>> a = torch.rand(10, requires_grad=True) >>> a.is_leaf True >>> b = torch.rand(10, requires_grad=True).cuda() >>> b.is_leaf False # b was created by the operation that cast a cpu Tensor into a cuda Tensor >>> c = torch.rand(10, requires_grad=True) + 2 >>> c.is_leaf False # c was created by the addition operation >>> d = torch.rand(10).cuda() >>> d.is_leaf True # d does not require gradients and so has no operation creating it (that is tracked by the autograd engine) >>> e = torch.rand(10).cuda().requires_grad_() >>> e.is_leaf True # e requires gradients and has no operations creating it >>> f = torch.rand(10, requires_grad=True, device="cuda") >>> f.is_leaf True # f requires grad, has no operation creating it `is_pinned()` Returns true if this tensor resides in pinned memory. `is_set_to(tensor) → bool` Returns True if both tensors are pointing to the exact same memory (same storage, offset, size and stride). `is_shared()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/tensor.html#Tensor.is_shared) Checks if tensor is in shared memory. This is always `True` for CUDA tensors. `is_signed() → bool` Returns True if the data type of `self` is a signed data type. `is_sparse` Is `True` if the Tensor uses sparse storage layout, `False` otherwise. `istft(n_fft, hop_length=None, win_length=None, window=None, center=True, normalized=False, onesided=None, length=None, return_complex=False)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/tensor.html#Tensor.istft) See [`torch.istft()`](generated/torch.istft#torch.istft "torch.istft") `isreal() → Tensor` See [`torch.isreal()`](generated/torch.isreal#torch.isreal "torch.isreal") `item() → number` Returns the value of this tensor as a standard Python number. This only works for tensors with one element. For other cases, see `tolist()`. This operation is not differentiable. Example: >>> x = torch.tensor([1.0]) >>> x.item() 1.0 `kthvalue(k, dim=None, keepdim=False) -> (Tensor, LongTensor)` See [`torch.kthvalue()`](generated/torch.kthvalue#torch.kthvalue "torch.kthvalue") `lcm(other) → Tensor` See [`torch.lcm()`](generated/torch.lcm#torch.lcm "torch.lcm") `lcm_(other) → Tensor` In-place version of `lcm()` `ldexp(other) → Tensor` See [`torch.ldexp()`](generated/torch.ldexp#torch.ldexp "torch.ldexp") `ldexp_(other) → Tensor` In-place version of `ldexp()` `le(other) → Tensor` See [`torch.le()`](generated/torch.le#torch.le "torch.le"). `le_(other) → Tensor` In-place version of `le()`. `less_equal(other) → Tensor` See [`torch.less_equal()`](generated/torch.less_equal#torch.less_equal "torch.less_equal"). `less_equal_(other) → Tensor` In-place version of `less_equal()`. 
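A minimal sketch of `kthvalue()` from the entry above (arbitrary values); like several reductions on this page, it returns a `(values, indices)` pair:

>>> import torch
>>> x = torch.tensor([1., 6., 3., 2.])
>>> values, indices = x.kthvalue(2)      # 2nd smallest element and its position
>>> values, indices
(tensor(2.), tensor(3))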
`lerp(end, weight) → Tensor` See [`torch.lerp()`](generated/torch.lerp#torch.lerp "torch.lerp") `lerp_(end, weight) → Tensor` In-place version of `lerp()` `lgamma() → Tensor` See [`torch.lgamma()`](generated/torch.lgamma#torch.lgamma "torch.lgamma") `lgamma_() → Tensor` In-place version of `lgamma()` `log() → Tensor` See [`torch.log()`](generated/torch.log#torch.log "torch.log") `log_() → Tensor` In-place version of `log()` `logdet() → Tensor` See [`torch.logdet()`](generated/torch.logdet#torch.logdet "torch.logdet") `log10() → Tensor` See [`torch.log10()`](generated/torch.log10#torch.log10 "torch.log10") `log10_() → Tensor` In-place version of `log10()` `log1p() → Tensor` See [`torch.log1p()`](generated/torch.log1p#torch.log1p "torch.log1p") `log1p_() → Tensor` In-place version of `log1p()` `log2() → Tensor` See [`torch.log2()`](generated/torch.log2#torch.log2 "torch.log2") `log2_() → Tensor` In-place version of `log2()` `log_normal_(mean=1, std=2, *, generator=None)` Fills `self` tensor with numbers sampled from the log-normal distribution parameterized by the given mean \mu and standard deviation \sigma. Note that [`mean`](generated/torch.mean#torch.mean "torch.mean") and [`std`](generated/torch.std#torch.std "torch.std") are the mean and standard deviation of the underlying normal distribution, and not of the returned distribution: f(x) = \dfrac{1}{x \sigma \sqrt{2\pi}}\ e^{-\frac{(\ln x - \mu)^2}{2\sigma^2}} `logaddexp(other) → Tensor` See [`torch.logaddexp()`](generated/torch.logaddexp#torch.logaddexp "torch.logaddexp") `logaddexp2(other) → Tensor` See [`torch.logaddexp2()`](generated/torch.logaddexp2#torch.logaddexp2 "torch.logaddexp2") `logsumexp(dim, keepdim=False) → Tensor` See [`torch.logsumexp()`](generated/torch.logsumexp#torch.logsumexp "torch.logsumexp") `logical_and() → Tensor` See [`torch.logical_and()`](generated/torch.logical_and#torch.logical_and "torch.logical_and") `logical_and_() → Tensor` In-place version of `logical_and()` `logical_not() → Tensor` See [`torch.logical_not()`](generated/torch.logical_not#torch.logical_not "torch.logical_not") `logical_not_() → Tensor` In-place version of `logical_not()` `logical_or() → Tensor` See [`torch.logical_or()`](generated/torch.logical_or#torch.logical_or "torch.logical_or") `logical_or_() → Tensor` In-place version of `logical_or()` `logical_xor() → Tensor` See [`torch.logical_xor()`](generated/torch.logical_xor#torch.logical_xor "torch.logical_xor") `logical_xor_() → Tensor` In-place version of `logical_xor()` `logit() → Tensor` See [`torch.logit()`](generated/torch.logit#torch.logit "torch.logit") `logit_() → Tensor` In-place version of `logit()` `long(memory_format=torch.preserve_format) → Tensor` `self.long()` is equivalent to `self.to(torch.int64)`. See `to()`. Parameters **memory_format** ([`torch.memory_format`](tensor_attributes#torch.torch.memory_format "torch.torch.memory_format"), optional) – the desired memory format of returned Tensor. Default: `torch.preserve_format`. `lstsq(A) -> (Tensor, Tensor)` See [`torch.lstsq()`](generated/torch.lstsq#torch.lstsq "torch.lstsq") `lt(other) → Tensor` See [`torch.lt()`](generated/torch.lt#torch.lt "torch.lt"). `lt_(other) → Tensor` In-place version of `lt()`. `less()` lt(other) -> Tensor See [`torch.less()`](generated/torch.less#torch.less "torch.less"). `less_(other) → Tensor` In-place version of `less()`.
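As a quick sketch of the element-wise `logical_*` methods listed above (arbitrary values):

>>> import torch
>>> a = torch.tensor([True, False, True])
>>> b = torch.tensor([True, True, False])
>>> a.logical_and(b)
tensor([ True, False, False])
>>> a.logical_xor(b)
tensor([False,  True,  True])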
`lu(pivot=True, get_infos=False)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/tensor.html#Tensor.lu) See [`torch.lu()`](generated/torch.lu#torch.lu "torch.lu") `lu_solve(LU_data, LU_pivots) → Tensor` See [`torch.lu_solve()`](generated/torch.lu_solve#torch.lu_solve "torch.lu_solve") `as_subclass(cls) → Tensor` Makes a `cls` instance with the same data pointer as `self`. Changes in the output mirror changes in `self`, and the output stays attached to the autograd graph. `cls` must be a subclass of `Tensor`. `map_(tensor, callable)` Applies `callable` for each element in `self` tensor and the given [`tensor`](generated/torch.tensor#torch.tensor "torch.tensor") and stores the results in `self` tensor. `self` tensor and the given [`tensor`](generated/torch.tensor#torch.tensor "torch.tensor") must be [broadcastable](https://pytorch.org/docs/1.8.0/notes/broadcasting.html#broadcasting- semantics). The `callable` should have the signature: def callable(a, b) -> number `masked_scatter_(mask, source)` Copies elements from `source` into `self` tensor at positions where the `mask` is True. The shape of `mask` must be [broadcastable](https://pytorch.org/docs/1.8.0/notes/broadcasting.html#broadcasting- semantics) with the shape of the underlying tensor. The `source` should have at least as many elements as the number of ones in `mask` Parameters * **mask** (_BoolTensor_) – the boolean mask * **source** (Tensor) – the tensor to copy from Note The `mask` operates on the `self` tensor, not on the given `source` tensor. `masked_scatter(mask, tensor) → Tensor` Out-of-place version of `torch.Tensor.masked_scatter_()` `masked_fill_(mask, value)` Fills elements of `self` tensor with `value` where `mask` is True. The shape of `mask` must be [broadcastable](https://pytorch.org/docs/1.8.0/notes/broadcasting.html#broadcasting- semantics) with the shape of the underlying tensor. 
Parameters * **mask** (_BoolTensor_) – the boolean mask * **value** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)")) – the value to fill in with `masked_fill(mask, value) → Tensor` Out-of-place version of `torch.Tensor.masked_fill_()` `masked_select(mask) → Tensor` See [`torch.masked_select()`](generated/torch.masked_select#torch.masked_select "torch.masked_select") `matmul(tensor2) → Tensor` See [`torch.matmul()`](generated/torch.matmul#torch.matmul "torch.matmul") `matrix_power(n) → Tensor` See [`torch.matrix_power()`](generated/torch.matrix_power#torch.matrix_power "torch.matrix_power") `matrix_exp() → Tensor` See [`torch.matrix_exp()`](generated/torch.matrix_exp#torch.matrix_exp "torch.matrix_exp") `max(dim=None, keepdim=False) -> Tensor or (Tensor, Tensor)` See [`torch.max()`](generated/torch.max#torch.max "torch.max") `maximum(other) → Tensor` See [`torch.maximum()`](generated/torch.maximum#torch.maximum "torch.maximum") `mean(dim=None, keepdim=False) -> Tensor or (Tensor, Tensor)` See [`torch.mean()`](generated/torch.mean#torch.mean "torch.mean") `median(dim=None, keepdim=False) -> (Tensor, LongTensor)` See [`torch.median()`](generated/torch.median#torch.median "torch.median") `nanmedian(dim=None, keepdim=False) -> (Tensor, LongTensor)` See [`torch.nanmedian()`](generated/torch.nanmedian#torch.nanmedian "torch.nanmedian") `min(dim=None, keepdim=False) -> Tensor or (Tensor, Tensor)` See [`torch.min()`](generated/torch.min#torch.min "torch.min") `minimum(other) → Tensor` See [`torch.minimum()`](generated/torch.minimum#torch.minimum "torch.minimum") `mm(mat2) → Tensor` See [`torch.mm()`](generated/torch.mm#torch.mm "torch.mm") `smm(mat) → Tensor` See [`torch.smm()`](sparse#torch.smm "torch.smm") `mode(dim=None, keepdim=False) -> (Tensor, LongTensor)` See [`torch.mode()`](generated/torch.mode#torch.mode "torch.mode") `movedim(source, destination) → Tensor` See [`torch.movedim()`](generated/torch.movedim#torch.movedim "torch.movedim") `moveaxis(source, destination) → Tensor` See [`torch.moveaxis()`](generated/torch.moveaxis#torch.moveaxis "torch.moveaxis") `msort() → Tensor` See [`torch.msort()`](generated/torch.msort#torch.msort "torch.msort") `mul(value) → Tensor` See [`torch.mul()`](generated/torch.mul#torch.mul "torch.mul"). `mul_(value) → Tensor` In-place version of `mul()`. `multiply(value) → Tensor` See [`torch.multiply()`](generated/torch.multiply#torch.multiply "torch.multiply"). `multiply_(value) → Tensor` In-place version of `multiply()`. `multinomial(num_samples, replacement=False, *, generator=None) → Tensor` See [`torch.multinomial()`](generated/torch.multinomial#torch.multinomial "torch.multinomial") `mv(vec) → Tensor` See [`torch.mv()`](generated/torch.mv#torch.mv "torch.mv") `mvlgamma(p) → Tensor` See [`torch.mvlgamma()`](generated/torch.mvlgamma#torch.mvlgamma "torch.mvlgamma") `mvlgamma_(p) → Tensor` In-place version of `mvlgamma()` `nansum(dim=None, keepdim=False, dtype=None) → Tensor` See [`torch.nansum()`](generated/torch.nansum#torch.nansum "torch.nansum") `narrow(dimension, start, length) → Tensor` See [`torch.narrow()`](generated/torch.narrow#torch.narrow "torch.narrow") Example: >>> x = torch.tensor([[1, 2, 3], [4, 5, 6], [7, 8, 9]]) >>> x.narrow(0, 0, 2) tensor([[ 1, 2, 3], [ 4, 5, 6]]) >>> x.narrow(1, 1, 2) tensor([[ 2, 3], [ 5, 6], [ 8, 9]]) `narrow_copy(dimension, start, length) → Tensor` Same as `Tensor.narrow()` except returning a copy rather than shared storage. 
This is primarily for sparse tensors, which do not have a shared-storage narrow method. Calling `narrow_copy` with `dimension > self.sparse_dim()` will return a copy with the relevant dense dimension narrowed, and `self.shape` updated accordingly. `ndimension() → int` Alias for `dim()` `nan_to_num(nan=0.0, posinf=None, neginf=None) → Tensor` See [`torch.nan_to_num()`](generated/torch.nan_to_num#torch.nan_to_num "torch.nan_to_num"). `nan_to_num_(nan=0.0, posinf=None, neginf=None) → Tensor` In-place version of `nan_to_num()`. `ne(other) → Tensor` See [`torch.ne()`](generated/torch.ne#torch.ne "torch.ne"). `ne_(other) → Tensor` In-place version of `ne()`. `not_equal(other) → Tensor` See [`torch.not_equal()`](generated/torch.not_equal#torch.not_equal "torch.not_equal"). `not_equal_(other) → Tensor` In-place version of `not_equal()`. `neg() → Tensor` See [`torch.neg()`](generated/torch.neg#torch.neg "torch.neg") `neg_() → Tensor` In-place version of `neg()` `negative() → Tensor` See [`torch.negative()`](generated/torch.negative#torch.negative "torch.negative") `negative_() → Tensor` In-place version of `negative()` `nelement() → int` Alias for `numel()` `nextafter(other) → Tensor` See [`torch.nextafter()`](generated/torch.nextafter#torch.nextafter "torch.nextafter") `nextafter_(other) → Tensor` In-place version of `nextafter()` `nonzero() → LongTensor` See [`torch.nonzero()`](generated/torch.nonzero#torch.nonzero "torch.nonzero") `norm(p='fro', dim=None, keepdim=False, dtype=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/tensor.html#Tensor.norm) See [`torch.norm()`](generated/torch.norm#torch.norm "torch.norm") `normal_(mean=0, std=1, *, generator=None) → Tensor` Fills `self` tensor with elements sampled from the normal distribution parameterized by [`mean`](generated/torch.mean#torch.mean "torch.mean") and [`std`](generated/torch.std#torch.std "torch.std"). `numel() → int` See [`torch.numel()`](generated/torch.numel#torch.numel "torch.numel") `numpy() → numpy.ndarray` Returns `self` tensor as a NumPy `ndarray`. This tensor and the returned `ndarray` share the same underlying storage. Changes to `self` tensor will be reflected in the `ndarray` and vice versa. `orgqr(input2) → Tensor` See [`torch.orgqr()`](generated/torch.orgqr#torch.orgqr "torch.orgqr") `ormqr(input2, input3, left=True, transpose=False) → Tensor` See [`torch.ormqr()`](generated/torch.ormqr#torch.ormqr "torch.ormqr") `outer(vec2) → Tensor` See [`torch.outer()`](generated/torch.outer#torch.outer "torch.outer"). `permute(*dims) → Tensor` Returns a view of the original tensor with its dimensions permuted. Parameters ***dims** (_int..._) – The desired ordering of dimensions Example: >>> x = torch.randn(2, 3, 5) >>> x.size() torch.Size([2, 3, 5]) >>> x.permute(2, 0, 1).size() torch.Size([5, 2, 3]) `pin_memory() → Tensor` Copies the tensor to pinned memory, if it’s not already pinned.
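A minimal sketch of the shared-storage behaviour described for `numpy()` above (requires NumPy; values arbitrary):

>>> import torch
>>> t = torch.ones(3)
>>> a = t.numpy()            # a shares memory with t
>>> t.add_(1)                # an in-place change to t ...
tensor([2., 2., 2.])
>>> a                        # ... is visible through the ndarray
array([2., 2., 2.], dtype=float32)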
`pinverse() → Tensor` See [`torch.pinverse()`](generated/torch.pinverse#torch.pinverse "torch.pinverse") `polygamma(n) → Tensor` See [`torch.polygamma()`](generated/torch.polygamma#torch.polygamma "torch.polygamma") `polygamma_(n) → Tensor` In-place version of `polygamma()` `pow(exponent) → Tensor` See [`torch.pow()`](generated/torch.pow#torch.pow "torch.pow") `pow_(exponent) → Tensor` In-place version of `pow()` `prod(dim=None, keepdim=False, dtype=None) → Tensor` See [`torch.prod()`](generated/torch.prod#torch.prod "torch.prod") `put_(indices, tensor, accumulate=False) → Tensor` Copies the elements from [`tensor`](generated/torch.tensor#torch.tensor "torch.tensor") into the positions specified by indices. For the purpose of indexing, the `self` tensor is treated as if it were a 1-D tensor. If `accumulate` is `True`, the elements in [`tensor`](generated/torch.tensor#torch.tensor "torch.tensor") are added to `self`. If accumulate is `False`, the behavior is undefined if indices contain duplicate elements. Parameters * **indices** (_LongTensor_) – the indices into self * **tensor** (Tensor) – the tensor containing values to copy from * **accumulate** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – whether to accumulate into self Example: >>> src = torch.tensor([[4, 3, 5], ... [6, 7, 8]]) >>> src.put_(torch.tensor([1, 3]), torch.tensor([9, 10])) tensor([[ 4, 9, 5], [ 10, 7, 8]]) `qr(some=True) -> (Tensor, Tensor)` See [`torch.qr()`](generated/torch.qr#torch.qr "torch.qr") `qscheme() → torch.qscheme` Returns the quantization scheme of a given QTensor. `quantile(q, dim=None, keepdim=False) → Tensor` See [`torch.quantile()`](generated/torch.quantile#torch.quantile "torch.quantile") `nanquantile(q, dim=None, keepdim=False) → Tensor` See [`torch.nanquantile()`](generated/torch.nanquantile#torch.nanquantile "torch.nanquantile") `q_scale() → float` Given a Tensor quantized by linear(affine) quantization, returns the scale of the underlying quantizer(). `q_zero_point() → int` Given a Tensor quantized by linear(affine) quantization, returns the zero_point of the underlying quantizer(). `q_per_channel_scales() → Tensor` Given a Tensor quantized by linear (affine) per-channel quantization, returns a Tensor of scales of the underlying quantizer. It has the number of elements that matches the corresponding dimensions (from q_per_channel_axis) of the tensor. `q_per_channel_zero_points() → Tensor` Given a Tensor quantized by linear (affine) per-channel quantization, returns a tensor of zero_points of the underlying quantizer. It has the number of elements that matches the corresponding dimensions (from q_per_channel_axis) of the tensor. `q_per_channel_axis() → int` Given a Tensor quantized by linear (affine) per-channel quantization, returns the index of dimension on which per-channel quantization is applied. `rad2deg() → Tensor` See [`torch.rad2deg()`](generated/torch.rad2deg#torch.rad2deg "torch.rad2deg") `random_(from=0, to=None, *, generator=None) → Tensor` Fills `self` tensor with numbers sampled from the discrete uniform distribution over `[from, to - 1]`. If not specified, the values are usually only bounded by `self` tensor’s data type. However, for floating point types, if unspecified, range will be `[0, 2^mantissa]` to ensure that every value is representable. For example, `torch.tensor(1, dtype=torch.double).random_()` will be uniform in `[0, 2^53]`. 
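A minimal sketch of the quantization accessors above (`q_scale()`, `q_zero_point()`, together with `dequantize()`); it assumes `torch.quantize_per_tensor` for building the quantized tensor, and the scale/zero-point values are arbitrary:

>>> import torch
>>> x = torch.tensor([-1.0, 0.0, 1.0, 2.0])
>>> q = torch.quantize_per_tensor(x, scale=0.1, zero_point=10, dtype=torch.quint8)
>>> q.q_scale()
0.1
>>> q.q_zero_point()
10
>>> q.dequantize()           # back to a float tensor
tensor([-1.,  0.,  1.,  2.])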
`ravel(input) → Tensor` see [`torch.ravel()`](generated/torch.ravel#torch.ravel "torch.ravel") `reciprocal() → Tensor` See [`torch.reciprocal()`](generated/torch.reciprocal#torch.reciprocal "torch.reciprocal") `reciprocal_() → Tensor` In-place version of `reciprocal()` `record_stream(stream)` Ensures that the tensor memory is not reused for another tensor until all current work queued on `stream` are complete. Note The caching allocator is aware of only the stream where a tensor was allocated. Due to the awareness, it already correctly manages the life cycle of tensors on only one stream. But if a tensor is used on a stream different from the stream of origin, the allocator might reuse the memory unexpectedly. Calling this method lets the allocator know which streams have used the tensor. `register_hook(hook)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/tensor.html#Tensor.register_hook) Registers a backward hook. The hook will be called every time a gradient with respect to the Tensor is computed. The hook should have the following signature: hook(grad) -> Tensor or None The hook should not modify its argument, but it can optionally return a new gradient which will be used in place of [`grad`](autograd#torch.Tensor.grad "torch.Tensor.grad"). This function returns a handle with a method `handle.remove()` that removes the hook from the module. Example: >>> v = torch.tensor([0., 0., 0.], requires_grad=True) >>> h = v.register_hook(lambda grad: grad * 2) # double the gradient >>> v.backward(torch.tensor([1., 2., 3.])) >>> v.grad 2 4 6 [torch.FloatTensor of size (3,)] >>> h.remove() # removes the hook `remainder(divisor) → Tensor` See [`torch.remainder()`](generated/torch.remainder#torch.remainder "torch.remainder") `remainder_(divisor) → Tensor` In-place version of `remainder()` `renorm(p, dim, maxnorm) → Tensor` See [`torch.renorm()`](generated/torch.renorm#torch.renorm "torch.renorm") `renorm_(p, dim, maxnorm) → Tensor` In-place version of `renorm()` `repeat(*sizes) → Tensor` Repeats this tensor along the specified dimensions. Unlike `expand()`, this function copies the tensor’s data. Warning `repeat()` behaves differently from [numpy.repeat](https://docs.scipy.org/doc/numpy/reference/generated/numpy.repeat.html), but is more similar to [numpy.tile](https://docs.scipy.org/doc/numpy/reference/generated/numpy.tile.html). For the operator similar to `numpy.repeat`, see [`torch.repeat_interleave()`](generated/torch.repeat_interleave#torch.repeat_interleave "torch.repeat_interleave"). Parameters **sizes** (_torch.Size_ _or_ _int..._) – The number of times to repeat this tensor along each dimension Example: >>> x = torch.tensor([1, 2, 3]) >>> x.repeat(4, 2) tensor([[ 1, 2, 3, 1, 2, 3], [ 1, 2, 3, 1, 2, 3], [ 1, 2, 3, 1, 2, 3], [ 1, 2, 3, 1, 2, 3]]) >>> x.repeat(4, 2, 1).size() torch.Size([4, 2, 3]) `repeat_interleave(repeats, dim=None) → Tensor` See [`torch.repeat_interleave()`](generated/torch.repeat_interleave#torch.repeat_interleave "torch.repeat_interleave"). `requires_grad` Is `True` if gradients need to be computed for this Tensor, `False` otherwise. Note The fact that gradients need to be computed for a Tensor do not mean that the [`grad`](autograd#torch.Tensor.grad "torch.Tensor.grad") attribute will be populated, see [`is_leaf`](autograd#torch.Tensor.is_leaf "torch.Tensor.is_leaf") for more details. 
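To make the `repeat()` warning above concrete (numpy.tile-like versus numpy.repeat-like behaviour), a minimal sketch with arbitrary values:

>>> import torch
>>> x = torch.tensor([1, 2, 3])
>>> x.repeat(2)                  # tiles the whole tensor, like numpy.tile
tensor([1, 2, 3, 1, 2, 3])
>>> x.repeat_interleave(2)       # repeats each element, like numpy.repeat
tensor([1, 1, 2, 2, 3, 3])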
`requires_grad_(requires_grad=True) → Tensor` Change if autograd should record operations on this tensor: sets this tensor’s [`requires_grad`](autograd#torch.Tensor.requires_grad "torch.Tensor.requires_grad") attribute in-place. Returns this tensor. `requires_grad_()`’s main use case is to tell autograd to begin recording operations on a Tensor `tensor`. If `tensor` has `requires_grad=False` (because it was obtained through a DataLoader, or required preprocessing or initialization), `tensor.requires_grad_()` makes it so that autograd will begin to record operations on `tensor`. Parameters **requires_grad** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – If autograd should record operations on this tensor. Default: `True`. Example: >>> # Let's say we want to preprocess some saved weights and use >>> # the result as new weights. >>> saved_weights = [0.1, 0.2, 0.3, 0.25] >>> loaded_weights = torch.tensor(saved_weights) >>> weights = preprocess(loaded_weights) # some function >>> weights tensor([-0.5503, 0.4926, -2.1158, -0.8303]) >>> # Now, start to record operations done to weights >>> weights.requires_grad_() >>> out = weights.pow(2).sum() >>> out.backward() >>> weights.grad tensor([-1.1007, 0.9853, -4.2316, -1.6606]) `reshape(*shape) → Tensor` Returns a tensor with the same data and number of elements as `self` but with the specified shape. This method returns a view if `shape` is compatible with the current shape. See `torch.Tensor.view()` on when it is possible to return a view. See [`torch.reshape()`](generated/torch.reshape#torch.reshape "torch.reshape") Parameters **shape** (_tuple of python:ints_ _or_ _int..._) – the desired shape `reshape_as(other) → Tensor` Returns this tensor as the same shape as `other`. `self.reshape_as(other)` is equivalent to `self.reshape(other.sizes())`. This method returns a view if `other.sizes()` is compatible with the current shape. See `torch.Tensor.view()` on when it is possible to return a view. Please see [`reshape()`](generated/torch.reshape#torch.reshape "torch.reshape") for more information about `reshape`. Parameters **other** (`torch.Tensor`) – The result tensor has the same shape as `other`. `resize_(*sizes, memory_format=torch.contiguous_format) → Tensor` Resizes `self` tensor to the specified size. If the number of elements is larger than the current storage size, then the underlying storage is resized to fit the new number of elements. If the number of elements is smaller, the underlying storage is not changed. Existing elements are preserved but any new memory is uninitialized. Warning This is a low-level method. The storage is reinterpreted as C-contiguous, ignoring the current strides (unless the target size equals the current size, in which case the tensor is left unchanged). For most purposes, you will instead want to use `view()`, which checks for contiguity, or `reshape()`, which copies data if needed. To change the size in-place with custom strides, see `set_()`. Parameters * **sizes** (_torch.Size_ _or_ _int..._) – the desired size * **memory_format** ([`torch.memory_format`](tensor_attributes#torch.torch.memory_format "torch.torch.memory_format"), optional) – the desired memory format of Tensor. Default: `torch.contiguous_format`. Note that memory format of `self` is going to be unaffected if `self.size()` matches `sizes`. 
Example: >>> x = torch.tensor([[1, 2], [3, 4], [5, 6]]) >>> x.resize_(2, 2) tensor([[ 1, 2], [ 3, 4]]) `resize_as_(tensor, memory_format=torch.contiguous_format) → Tensor` Resizes the `self` tensor to be the same size as the specified [`tensor`](generated/torch.tensor#torch.tensor "torch.tensor"). This is equivalent to `self.resize_(tensor.size())`. Parameters **memory_format** ([`torch.memory_format`](tensor_attributes#torch.torch.memory_format "torch.torch.memory_format"), optional) – the desired memory format of Tensor. Default: `torch.contiguous_format`. Note that memory format of `self` is going to be unaffected if `self.size()` matches `tensor.size()`. `retain_grad()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/tensor.html#Tensor.retain_grad) Enables the `.grad` attribute for non-leaf Tensors. `roll(shifts, dims) → Tensor` See [`torch.roll()`](generated/torch.roll#torch.roll "torch.roll") `rot90(k, dims) → Tensor` See [`torch.rot90()`](generated/torch.rot90#torch.rot90 "torch.rot90") `round() → Tensor` See [`torch.round()`](generated/torch.round#torch.round "torch.round") `round_() → Tensor` In-place version of `round()` `rsqrt() → Tensor` See [`torch.rsqrt()`](generated/torch.rsqrt#torch.rsqrt "torch.rsqrt") `rsqrt_() → Tensor` In-place version of `rsqrt()` `scatter(dim, index, src) → Tensor` Out-of-place version of `torch.Tensor.scatter_()` `scatter_(dim, index, src, reduce=None) → Tensor` Writes all values from the tensor `src` into `self` at the indices specified in the `index` tensor. For each value in `src`, its output index is specified by its index in `src` for `dimension != dim` and by the corresponding value in `index` for `dimension = dim`. For a 3-D tensor, `self` is updated as: self[index[i][j][k]][j][k] = src[i][j][k] # if dim == 0 self[i][index[i][j][k]][k] = src[i][j][k] # if dim == 1 self[i][j][index[i][j][k]] = src[i][j][k] # if dim == 2 This is the reverse operation of the manner described in `gather()`. `self`, `index` and `src` (if it is a Tensor) should all have the same number of dimensions. It is also required that `index.size(d) <= src.size(d)` for all dimensions `d`, and that `index.size(d) <= self.size(d)` for all dimensions `d != dim`. Note that `index` and `src` do not broadcast. Moreover, as for `gather()`, the values of `index` must be between `0` and `self.size(dim) - 1` inclusive. Warning When indices are not unique, the behavior is non-deterministic (one of the values from `src` will be picked arbitrarily) and the gradient will be incorrect (it will be propagated to all locations in the source that correspond to the same index)! Note The backward pass is implemented only for `src.shape == index.shape`. Additionally accepts an optional `reduce` argument that specifies a reduction operation to apply to all values from the tensor `src` as they are scattered into `self` at the indices specified in `index`. For each value in `src`, the reduction operation is applied to an index in `self` which is specified by its index in `src` for `dimension != dim` and by the corresponding value in `index` for `dimension = dim`. Given a 3-D tensor and reduction using the multiplication operation, `self` is updated as: self[index[i][j][k]][j][k] *= src[i][j][k] # if dim == 0 self[i][index[i][j][k]][k] *= src[i][j][k] # if dim == 1 self[i][j][index[i][j][k]] *= src[i][j][k] # if dim == 2 Reducing with the addition operation is the same as using `scatter_add_()`.
Parameters * **dim** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – the axis along which to index * **index** (_LongTensor_) – the indices of elements to scatter, can be either empty or of the same dimensionality as `src`. When empty, the operation returns `self` unchanged. * **src** (Tensor _or_ [float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)")) – the source element(s) to scatter. * **reduce** ([str](https://docs.python.org/3/library/stdtypes.html#str "\(in Python v3.9\)") _,__optional_) – reduction operation to apply, can be either `'add'` or `'multiply'`. Example: >>> src = torch.arange(1, 11).reshape((2, 5)) >>> src tensor([[ 1, 2, 3, 4, 5], [ 6, 7, 8, 9, 10]]) >>> index = torch.tensor([[0, 1, 2, 0]]) >>> torch.zeros(3, 5, dtype=src.dtype).scatter_(0, index, src) tensor([[1, 0, 0, 4, 0], [0, 2, 0, 0, 0], [0, 0, 3, 0, 0]]) >>> index = torch.tensor([[0, 1, 2], [0, 1, 4]]) >>> torch.zeros(3, 5, dtype=src.dtype).scatter_(1, index, src) tensor([[1, 2, 3, 0, 0], [6, 7, 0, 0, 8], [0, 0, 0, 0, 0]]) >>> torch.full((2, 4), 2.).scatter_(1, torch.tensor([[2], [3]]), ... 1.23, reduce='multiply') tensor([[2.0000, 2.0000, 2.4600, 2.0000], [2.0000, 2.0000, 2.0000, 2.4600]]) >>> torch.full((2, 4), 2.).scatter_(1, torch.tensor([[2], [3]]), ... 1.23, reduce='add') tensor([[2.0000, 2.0000, 3.2300, 2.0000], [2.0000, 2.0000, 2.0000, 3.2300]]) `scatter_add_(dim, index, src) → Tensor` Adds all values from the tensor `src` into `self` at the indices specified in the `index` tensor in a similar fashion as `scatter_()`. For each value in `src`, it is added to an index in `self` which is specified by its index in `src` for `dimension != dim` and by the corresponding value in `index` for `dimension = dim`. For a 3-D tensor, `self` is updated as: self[index[i][j][k]][j][k] += src[i][j][k] # if dim == 0 self[i][index[i][j][k]][k] += src[i][j][k] # if dim == 1 self[i][j][index[i][j][k]] += src[i][j][k] # if dim == 2 `self`, `index` and `src` should have the same number of dimensions. It is also required that `index.size(d) <= src.size(d)` for all dimensions `d`, and that `index.size(d) <= self.size(d)` for all dimensions `d != dim`. Note that `index` and `src` do not broadcast. Note This operation may behave nondeterministically when given tensors on a CUDA device. See [Reproducibility](https://pytorch.org/docs/1.8.0/notes/randomness.html) for more information. Note The backward pass is implemented only for `src.shape == index.shape`. Parameters * **dim** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – the axis along which to index * **index** (_LongTensor_) – the indices of elements to scatter and add, can be either empty or of the same dimensionality as `src`. When empty, the operation returns `self` unchanged. * **src** (Tensor) – the source elements to scatter and add Example: >>> src = torch.ones((2, 5)) >>> index = torch.tensor([[0, 1, 2, 0, 0]]) >>> torch.zeros(3, 5, dtype=src.dtype).scatter_add_(0, index, src) tensor([[1., 0., 0., 1., 1.], [0., 1., 0., 0., 0.], [0., 0., 1., 0., 0.]]) >>> index = torch.tensor([[0, 1, 2, 0, 0], [0, 1, 2, 2, 2]]) >>> torch.zeros(3, 5, dtype=src.dtype).scatter_add_(0, index, src) tensor([[2., 0., 0., 1., 1.], [0., 2., 0., 0., 0.], [0., 0., 2., 1., 1.]]) `scatter_add(dim, index, src) → Tensor` Out-of-place version of `torch.Tensor.scatter_add_()` `select(dim, index) → Tensor` Slices the `self` tensor along the selected dimension at the given index.
This function returns a view of the original tensor with the given dimension removed. Parameters * **dim** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – the dimension to slice * **index** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – the index to select with Note `select()` is equivalent to slicing. For example, `tensor.select(0, index)` is equivalent to `tensor[index]` and `tensor.select(2, index)` is equivalent to `tensor[:,:,index]`. `set_(source=None, storage_offset=0, size=None, stride=None) → Tensor` Sets the underlying storage, size, and strides. If `source` is a tensor, `self` tensor will share the same storage and have the same size and strides as `source`. Changes to elements in one tensor will be reflected in the other. If `source` is a `Storage`, the method sets the underlying storage, offset, size, and stride. Parameters * **source** (Tensor _or_ _Storage_) – the tensor or storage to use * **storage_offset** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – the offset in the storage * **size** (_torch.Size_ _,__optional_) – the desired size. Defaults to the size of the source. * **stride** ([tuple](https://docs.python.org/3/library/stdtypes.html#tuple "\(in Python v3.9\)") _,__optional_) – the desired stride. Defaults to C-contiguous strides. `share_memory_()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/tensor.html#Tensor.share_memory_) Moves the underlying storage to shared memory. This is a no-op if the underlying storage is already in shared memory and for CUDA tensors. Tensors in shared memory cannot be resized. `short(memory_format=torch.preserve_format) → Tensor` `self.short()` is equivalent to `self.to(torch.int16)`. See `to()`. Parameters **memory_format** ([`torch.memory_format`](tensor_attributes#torch.torch.memory_format "torch.torch.memory_format"), optional) – the desired memory format of returned Tensor. Default: `torch.preserve_format`. `sigmoid() → Tensor` See [`torch.sigmoid()`](generated/torch.sigmoid#torch.sigmoid "torch.sigmoid") `sigmoid_() → Tensor` In-place version of `sigmoid()` `sign() → Tensor` See [`torch.sign()`](generated/torch.sign#torch.sign "torch.sign") `sign_() → Tensor` In-place version of `sign()` `signbit() → Tensor` See [`torch.signbit()`](generated/torch.signbit#torch.signbit "torch.signbit") `sgn() → Tensor` See [`torch.sgn()`](generated/torch.sgn#torch.sgn "torch.sgn") `sgn_() → Tensor` In-place version of `sgn()` `sin() → Tensor` See [`torch.sin()`](generated/torch.sin#torch.sin "torch.sin") `sin_() → Tensor` In-place version of `sin()` `sinc() → Tensor` See [`torch.sinc()`](generated/torch.sinc#torch.sinc "torch.sinc") `sinc_() → Tensor` In-place version of `sinc()` `sinh() → Tensor` See [`torch.sinh()`](generated/torch.sinh#torch.sinh "torch.sinh") `sinh_() → Tensor` In-place version of `sinh()` `asinh() → Tensor` See [`torch.asinh()`](generated/torch.asinh#torch.asinh "torch.asinh") `asinh_() → Tensor` In-place version of `asinh()` `arcsinh() → Tensor` See [`torch.arcsinh()`](generated/torch.arcsinh#torch.arcsinh "torch.arcsinh") `arcsinh_() → Tensor` In-place version of `arcsinh()` `size() → torch.Size` Returns the size of the `self` tensor. The returned value is a subclass of [`tuple`](https://docs.python.org/3/library/stdtypes.html#tuple "\(in Python v3.9\)"). 
Example: >>> torch.empty(3, 4, 5).size() torch.Size([3, 4, 5]) `slogdet() -> (Tensor, Tensor)` See [`torch.slogdet()`](generated/torch.slogdet#torch.slogdet "torch.slogdet") `solve(A) → Tensor, Tensor` See [`torch.solve()`](generated/torch.solve#torch.solve "torch.solve") `sort(dim=-1, descending=False) -> (Tensor, LongTensor)` See [`torch.sort()`](generated/torch.sort#torch.sort "torch.sort") `split(split_size, dim=0)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/tensor.html#Tensor.split) See [`torch.split()`](generated/torch.split#torch.split "torch.split") `sparse_mask(mask) → Tensor` Returns a new [sparse tensor](sparse#sparse-docs) with values from a strided tensor `self` filtered by the indices of the sparse tensor `mask`. The values of `mask` sparse tensor are ignored. `self` and `mask` tensors must have the same shape. Note The returned sparse tensor has the same indices as the sparse tensor `mask`, even when the corresponding values in `self` are zeros. Parameters **mask** (Tensor) – a sparse tensor whose indices are used as a filter Example: >>> nse = 5 >>> dims = (5, 5, 2, 2) >>> I = torch.cat([torch.randint(0, dims[0], size=(nse,)), ... torch.randint(0, dims[1], size=(nse,))], 0).reshape(2, nse) >>> V = torch.randn(nse, dims[2], dims[3]) >>> S = torch.sparse_coo_tensor(I, V, dims).coalesce() >>> D = torch.randn(dims) >>> D.sparse_mask(S) tensor(indices=tensor([[0, 0, 0, 2], [0, 1, 4, 3]]), values=tensor([[[ 1.6550, 0.2397], [-0.1611, -0.0779]], [[ 0.2326, -1.0558], [ 1.4711, 1.9678]], [[-0.5138, -0.0411], [ 1.9417, 0.5158]], [[ 0.0793, 0.0036], [-0.2569, -0.1055]]]), size=(5, 5, 2, 2), nnz=4, layout=torch.sparse_coo) `sparse_dim() → int` Return the number of sparse dimensions in a [sparse tensor](sparse#sparse- docs) `self`. Warning Throws an error if `self` is not a sparse tensor. See also [`Tensor.dense_dim()`](sparse#torch.Tensor.dense_dim "torch.Tensor.dense_dim") and [hybrid tensors](sparse#sparse-hybrid-coo-docs). `sqrt() → Tensor` See [`torch.sqrt()`](generated/torch.sqrt#torch.sqrt "torch.sqrt") `sqrt_() → Tensor` In-place version of `sqrt()` `square() → Tensor` See [`torch.square()`](generated/torch.square#torch.square "torch.square") `square_() → Tensor` In-place version of `square()` `squeeze(dim=None) → Tensor` See [`torch.squeeze()`](generated/torch.squeeze#torch.squeeze "torch.squeeze") `squeeze_(dim=None) → Tensor` In-place version of `squeeze()` `std(dim=None, unbiased=True, keepdim=False) → Tensor` See [`torch.std()`](generated/torch.std#torch.std "torch.std") `stft(n_fft, hop_length=None, win_length=None, window=None, center=True, pad_mode='reflect', normalized=False, onesided=None, return_complex=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/tensor.html#Tensor.stft) See [`torch.stft()`](generated/torch.stft#torch.stft "torch.stft") Warning This function changed signature at version 0.4.1. Calling with the previous signature may cause error or return incorrect result. `storage() → torch.Storage` Returns the underlying storage. `storage_offset() → int` Returns `self` tensor’s offset in the underlying storage in terms of number of storage elements (not bytes). Example: >>> x = torch.tensor([1, 2, 3, 4, 5]) >>> x.storage_offset() 0 >>> x[3:].storage_offset() 3 `storage_type() → type` Returns the type of the underlying storage. `stride(dim) → tuple or int` Returns the stride of `self` tensor. Stride is the jump necessary to go from one element to the next one in the specified dimension `dim`. 
A tuple of all strides is returned when no argument is passed in. Otherwise, an integer value is returned as the stride in the particular dimension `dim`. Parameters **dim** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – the desired dimension in which stride is required Example: >>> x = torch.tensor([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]]) >>> x.stride() (5, 1) >>> x.stride(0) 5 >>> x.stride(-1) 1 `sub(other, *, alpha=1) → Tensor` See [`torch.sub()`](generated/torch.sub#torch.sub "torch.sub"). `sub_(other, *, alpha=1) → Tensor` In-place version of `sub()` `subtract(other, *, alpha=1) → Tensor` See [`torch.subtract()`](generated/torch.subtract#torch.subtract "torch.subtract"). `subtract_(other, *, alpha=1) → Tensor` In-place version of `subtract()`. `sum(dim=None, keepdim=False, dtype=None) → Tensor` See [`torch.sum()`](generated/torch.sum#torch.sum "torch.sum") `sum_to_size(*size) → Tensor` Sum `this` tensor to `size`. `size` must be broadcastable to `this` tensor size. Parameters **size** (_int..._) – a sequence of integers defining the shape of the output tensor. `svd(some=True, compute_uv=True) -> (Tensor, Tensor, Tensor)` See [`torch.svd()`](generated/torch.svd#torch.svd "torch.svd") `swapaxes(axis0, axis1) → Tensor` See [`torch.swapaxes()`](generated/torch.swapaxes#torch.swapaxes "torch.swapaxes") `swapdims(dim0, dim1) → Tensor` See [`torch.swapdims()`](generated/torch.swapdims#torch.swapdims "torch.swapdims") `symeig(eigenvectors=False, upper=True) -> (Tensor, Tensor)` See [`torch.symeig()`](generated/torch.symeig#torch.symeig "torch.symeig") `t() → Tensor` See [`torch.t()`](generated/torch.t#torch.t "torch.t") `t_() → Tensor` In-place version of `t()` `tensor_split(indices_or_sections, dim=0) → List of Tensors` See [`torch.tensor_split()`](generated/torch.tensor_split#torch.tensor_split "torch.tensor_split") `tile(*reps) → Tensor` See [`torch.tile()`](generated/torch.tile#torch.tile "torch.tile") `to(*args, **kwargs) → Tensor` Performs Tensor dtype and/or device conversion. A [`torch.dtype`](tensor_attributes#torch.torch.dtype "torch.torch.dtype") and [`torch.device`](tensor_attributes#torch.torch.device "torch.torch.device") are inferred from the arguments of `self.to(*args, **kwargs)`. Note If the `self` Tensor already has the correct [`torch.dtype`](tensor_attributes#torch.torch.dtype "torch.torch.dtype") and [`torch.device`](tensor_attributes#torch.torch.device "torch.torch.device"), then `self` is returned. Otherwise, the returned tensor is a copy of `self` with the desired [`torch.dtype`](tensor_attributes#torch.torch.dtype "torch.torch.dtype") and [`torch.device`](tensor_attributes#torch.torch.device "torch.torch.device"). Here are the ways to call `to`: `to(dtype, non_blocking=False, copy=False, memory_format=torch.preserve_format) → Tensor` Returns a Tensor with the specified `dtype` Args: memory_format ([`torch.memory_format`](tensor_attributes#torch.torch.memory_format "torch.torch.memory_format"), optional): the desired memory format of returned Tensor. Default: `torch.preserve_format`. `to(device=None, dtype=None, non_blocking=False, copy=False, memory_format=torch.preserve_format) → Tensor` Returns a Tensor with the specified `device` and (optional) `dtype`. If `dtype` is `None` it is inferred to be `self.dtype`. When `non_blocking`, tries to convert asynchronously with respect to the host if possible, e.g., converting a CPU Tensor with pinned memory to a CUDA Tensor. 
When `copy` is set, a new Tensor is created even when the Tensor already matches the desired conversion. Args: memory_format ([`torch.memory_format`](tensor_attributes#torch.torch.memory_format "torch.torch.memory_format"), optional): the desired memory format of returned Tensor. Default: `torch.preserve_format`. `to(other, non_blocking=False, copy=False) → Tensor` Returns a Tensor with same [`torch.dtype`](tensor_attributes#torch.torch.dtype "torch.torch.dtype") and [`torch.device`](tensor_attributes#torch.torch.device "torch.torch.device") as the Tensor `other`. When `non_blocking`, tries to convert asynchronously with respect to the host if possible, e.g., converting a CPU Tensor with pinned memory to a CUDA Tensor. When `copy` is set, a new Tensor is created even when the Tensor already matches the desired conversion. Example: >>> tensor = torch.randn(2, 2) # Initially dtype=float32, device=cpu >>> tensor.to(torch.float64) tensor([[-0.5044, 0.0005], [ 0.3310, -0.0584]], dtype=torch.float64) >>> cuda0 = torch.device('cuda:0') >>> tensor.to(cuda0) tensor([[-0.5044, 0.0005], [ 0.3310, -0.0584]], device='cuda:0') >>> tensor.to(cuda0, dtype=torch.float64) tensor([[-0.5044, 0.0005], [ 0.3310, -0.0584]], dtype=torch.float64, device='cuda:0') >>> other = torch.randn((), dtype=torch.float64, device=cuda0) >>> tensor.to(other, non_blocking=True) tensor([[-0.5044, 0.0005], [ 0.3310, -0.0584]], dtype=torch.float64, device='cuda:0') `to_mkldnn() → Tensor` Returns a copy of the tensor in `torch.mkldnn` layout. `take(indices) → Tensor` See [`torch.take()`](generated/torch.take#torch.take "torch.take") `tan() → Tensor` See [`torch.tan()`](generated/torch.tan#torch.tan "torch.tan") `tan_() → Tensor` In-place version of `tan()` `tanh() → Tensor` See [`torch.tanh()`](generated/torch.tanh#torch.tanh "torch.tanh") `tanh_() → Tensor` In-place version of `tanh()` `atanh() → Tensor` See [`torch.atanh()`](generated/torch.atanh#torch.atanh "torch.atanh") `atanh_(other) → Tensor` In-place version of `atanh()` `arctanh() → Tensor` See [`torch.arctanh()`](generated/torch.arctanh#torch.arctanh "torch.arctanh") `arctanh_(other) → Tensor` In-place version of `arctanh()` `tolist() → list or number` Returns the tensor as a (nested) list. For scalars, a standard Python number is returned, just like with `item()`. Tensors are automatically moved to the CPU first if necessary. This operation is not differentiable. Examples: >>> a = torch.randn(2, 2) >>> a.tolist() [[0.012766935862600803, 0.5415473580360413], [-0.08909505605697632, 0.7729271650314331]] >>> a[0,0].tolist() 0.012766935862600803 `topk(k, dim=None, largest=True, sorted=True) -> (Tensor, LongTensor)` See [`torch.topk()`](generated/torch.topk#torch.topk "torch.topk") `to_sparse(sparseDims) → Tensor` Returns a sparse copy of the tensor. PyTorch supports sparse tensors in [coordinate format](sparse#sparse-coo-docs). 
Parameters **sparseDims** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – the number of sparse dimensions to include in the new sparse tensor Example: >>> d = torch.tensor([[0, 0, 0], [9, 0, 10], [0, 0, 0]]) >>> d tensor([[ 0, 0, 0], [ 9, 0, 10], [ 0, 0, 0]]) >>> d.to_sparse() tensor(indices=tensor([[1, 1], [0, 2]]), values=tensor([ 9, 10]), size=(3, 3), nnz=2, layout=torch.sparse_coo) >>> d.to_sparse(1) tensor(indices=tensor([[1]]), values=tensor([[ 9, 0, 10]]), size=(3, 3), nnz=1, layout=torch.sparse_coo) `trace() → Tensor` See [`torch.trace()`](generated/torch.trace#torch.trace "torch.trace") `transpose(dim0, dim1) → Tensor` See [`torch.transpose()`](generated/torch.transpose#torch.transpose "torch.transpose") `transpose_(dim0, dim1) → Tensor` In-place version of `transpose()` `triangular_solve(A, upper=True, transpose=False, unitriangular=False) -> (Tensor, Tensor)` See [`torch.triangular_solve()`](generated/torch.triangular_solve#torch.triangular_solve "torch.triangular_solve") `tril(k=0) → Tensor` See [`torch.tril()`](generated/torch.tril#torch.tril "torch.tril") `tril_(k=0) → Tensor` In-place version of `tril()` `triu(k=0) → Tensor` See [`torch.triu()`](generated/torch.triu#torch.triu "torch.triu") `triu_(k=0) → Tensor` In-place version of `triu()` `true_divide(value) → Tensor` See [`torch.true_divide()`](generated/torch.true_divide#torch.true_divide "torch.true_divide") `true_divide_(value) → Tensor` In-place version of `true_divide_()` `trunc() → Tensor` See [`torch.trunc()`](generated/torch.trunc#torch.trunc "torch.trunc") `trunc_() → Tensor` In-place version of `trunc()` `type(dtype=None, non_blocking=False, **kwargs) → str or Tensor` Returns the type if `dtype` is not provided, else casts this object to the specified type. If this is already of the correct type, no copy is performed and the original object is returned. Parameters * **dtype** ([type](https://docs.python.org/3/library/functions.html#type "\(in Python v3.9\)") _or_ _string_) – The desired type * **non_blocking** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – If `True`, and the source is in pinned memory and destination is on the GPU or vice versa, the copy is performed asynchronously with respect to the host. Otherwise, the argument has no effect. * ****kwargs** – For compatibility, may contain the key `async` in place of the `non_blocking` argument. The `async` arg is deprecated. `type_as(tensor) → Tensor` Returns this tensor cast to the type of the given tensor. This is a no-op if the tensor is already of the correct type. This is equivalent to `self.type(tensor.type())` Parameters **tensor** (Tensor) – the tensor which has the desired type `unbind(dim=0) → seq` See [`torch.unbind()`](generated/torch.unbind#torch.unbind "torch.unbind") `unfold(dimension, size, step) → Tensor` Returns a view of the original tensor which contains all slices of size `size` from `self` tensor in the dimension `dimension`. Step between two slices is given by `step`. If `sizedim` is the size of dimension `dimension` for `self`, the size of dimension `dimension` in the returned tensor will be `(sizedim - size) / step + 1`. An additional dimension of size `size` is appended in the returned tensor. 
Parameters * **dimension** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – dimension in which unfolding happens * **size** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – the size of each slice that is unfolded * **step** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – the step between each slice Example: >>> x = torch.arange(1., 8) >>> x tensor([ 1., 2., 3., 4., 5., 6., 7.]) >>> x.unfold(0, 2, 1) tensor([[ 1., 2.], [ 2., 3.], [ 3., 4.], [ 4., 5.], [ 5., 6.], [ 6., 7.]]) >>> x.unfold(0, 2, 2) tensor([[ 1., 2.], [ 3., 4.], [ 5., 6.]]) `uniform_(from=0, to=1) → Tensor` Fills `self` tensor with numbers sampled from the continuous uniform distribution: P(x) = \dfrac{1}{\text{to} - \text{from}} `unique(sorted=True, return_inverse=False, return_counts=False, dim=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/tensor.html#Tensor.unique) Returns the unique elements of the input tensor. See [`torch.unique()`](generated/torch.unique#torch.unique "torch.unique") `unique_consecutive(return_inverse=False, return_counts=False, dim=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/tensor.html#Tensor.unique_consecutive) Eliminates all but the first element from every consecutive group of equivalent elements. See [`torch.unique_consecutive()`](generated/torch.unique_consecutive#torch.unique_consecutive "torch.unique_consecutive") `unsqueeze(dim) → Tensor` See [`torch.unsqueeze()`](generated/torch.unsqueeze#torch.unsqueeze "torch.unsqueeze") `unsqueeze_(dim) → Tensor` In-place version of `unsqueeze()` `values() → Tensor` Return the values tensor of a [sparse COO tensor](sparse#sparse-coo-docs). Warning Throws an error if `self` is not a sparse COO tensor. See also [`Tensor.indices()`](sparse#torch.Tensor.indices "torch.Tensor.indices"). Note This method can only be called on a coalesced sparse tensor. See [`Tensor.coalesce()`](sparse#torch.Tensor.coalesce "torch.Tensor.coalesce") for details. `var(dim=None, unbiased=True, keepdim=False) → Tensor` See [`torch.var()`](generated/torch.var#torch.var "torch.var") `vdot(other) → Tensor` See [`torch.vdot()`](generated/torch.vdot#torch.vdot "torch.vdot") `view(*shape) → Tensor` Returns a new tensor with the same data as the `self` tensor but of a different `shape`. The returned tensor shares the same data and must have the same number of elements, but may have a different size. For a tensor to be viewed, the new view size must be compatible with its original size and stride, i.e., each new view dimension must either be a subspace of an original dimension, or only span across original dimensions d, d+1, \dots, d+k that satisfy the following contiguity-like condition: \forall i = d, \dots, d+k-1, \text{stride}[i] = \text{stride}[i+1] \times \text{size}[i+1]. Otherwise, it will not be possible to view `self` tensor as `shape` without copying it (e.g., via `contiguous()`). When it is unclear whether a `view()` can be performed, it is advisable to use [`reshape()`](generated/torch.reshape#torch.reshape "torch.reshape"), which returns a view if the shapes are compatible, and copies (equivalent to calling `contiguous()`) otherwise.
Parameters **shape** (_torch.Size_ _or_ _int..._) – the desired size Example: >>> x = torch.randn(4, 4) >>> x.size() torch.Size([4, 4]) >>> y = x.view(16) >>> y.size() torch.Size([16]) >>> z = x.view(-1, 8) # the size -1 is inferred from other dimensions >>> z.size() torch.Size([2, 8]) >>> a = torch.randn(1, 2, 3, 4) >>> a.size() torch.Size([1, 2, 3, 4]) >>> b = a.transpose(1, 2) # Swaps 2nd and 3rd dimension >>> b.size() torch.Size([1, 3, 2, 4]) >>> c = a.view(1, 3, 2, 4) # Does not change tensor layout in memory >>> c.size() torch.Size([1, 3, 2, 4]) >>> torch.equal(b, c) False `view(dtype) → Tensor` Returns a new tensor with the same data as the `self` tensor but of a different `dtype`. `dtype` must have the same number of bytes per element as `self`’s dtype. Warning This overload is not supported by TorchScript, and using it in a Torchscript program will cause undefined behavior. Parameters **dtype** ([`torch.dtype`](tensor_attributes#torch.torch.dtype "torch.torch.dtype")) – the desired dtype Example: >>> x = torch.randn(4, 4) >>> x tensor([[ 0.9482, -0.0310, 1.4999, -0.5316], [-0.1520, 0.7472, 0.5617, -0.8649], [-2.4724, -0.0334, -0.2976, -0.8499], [-0.2109, 1.9913, -0.9607, -0.6123]]) >>> x.dtype torch.float32 >>> y = x.view(torch.int32) >>> y tensor([[ 1064483442, -1124191867, 1069546515, -1089989247], [-1105482831, 1061112040, 1057999968, -1084397505], [-1071760287, -1123489973, -1097310419, -1084649136], [-1101533110, 1073668768, -1082790149, -1088634448]], dtype=torch.int32) >>> y[0, 0] = 1000000000 >>> x tensor([[ 0.0047, -0.0310, 1.4999, -0.5316], [-0.1520, 0.7472, 0.5617, -0.8649], [-2.4724, -0.0334, -0.2976, -0.8499], [-0.2109, 1.9913, -0.9607, -0.6123]]) >>> x.view(torch.int16) Traceback (most recent call last): File "", line 1, in RuntimeError: Viewing a tensor as a new dtype with a different number of bytes per element is not supported. `view_as(other) → Tensor` View this tensor as the same size as `other`. `self.view_as(other)` is equivalent to `self.view(other.size())`. Please see `view()` for more information about `view`. Parameters **other** (`torch.Tensor`) – The result tensor has the same size as `other`. `where(condition, y) → Tensor` `self.where(condition, y)` is equivalent to `torch.where(condition, self, y)`. See [`torch.where()`](generated/torch.where#torch.where "torch.where") `xlogy(other) → Tensor` See [`torch.xlogy()`](generated/torch.xlogy#torch.xlogy "torch.xlogy") `xlogy_(other) → Tensor` In-place version of `xlogy()` `zero_() → Tensor` Fills `self` tensor with zeros. # torch The torch package contains data structures for multi-dimensional tensors and defines mathematical operations over these tensors. Additionally, it provides many utilities for efficient serializing of Tensors and arbitrary types, and other useful utilities. It has a CUDA counterpart, that enables you to run your tensor computations on an NVIDIA GPU with compute capability >= 3.0 ## Tensors [`is_tensor`](generated/torch.is_tensor#torch.is_tensor "torch.is_tensor") | Returns True if `obj` is a PyTorch tensor. ---|--- [`is_storage`](generated/torch.is_storage#torch.is_storage "torch.is_storage") | Returns True if `obj` is a PyTorch storage object. [`is_complex`](generated/torch.is_complex#torch.is_complex "torch.is_complex") | Returns True if the data type of `input` is a complex data type i.e., one of `torch.complex64`, and `torch.complex128`. 
[`is_floating_point`](generated/torch.is_floating_point#torch.is_floating_point "torch.is_floating_point") | Returns True if the data type of `input` is a floating point data type i.e., one of `torch.float64`, `torch.float32`, `torch.float16`, and `torch.bfloat16`. [`is_nonzero`](generated/torch.is_nonzero#torch.is_nonzero "torch.is_nonzero") | Returns True if the `input` is a single element tensor which is not equal to zero after type conversions. [`set_default_dtype`](generated/torch.set_default_dtype#torch.set_default_dtype "torch.set_default_dtype") | Sets the default floating point dtype to `d`. [`get_default_dtype`](generated/torch.get_default_dtype#torch.get_default_dtype "torch.get_default_dtype") | Get the current default floating point [`torch.dtype`](tensor_attributes#torch.torch.dtype "torch.torch.dtype"). [`set_default_tensor_type`](generated/torch.set_default_tensor_type#torch.set_default_tensor_type "torch.set_default_tensor_type") | Sets the default `torch.Tensor` type to floating point tensor type `t`. [`numel`](generated/torch.numel#torch.numel "torch.numel") | Returns the total number of elements in the `input` tensor. [`set_printoptions`](generated/torch.set_printoptions#torch.set_printoptions "torch.set_printoptions") | Set options for printing. [`set_flush_denormal`](generated/torch.set_flush_denormal#torch.set_flush_denormal "torch.set_flush_denormal") | Disables denormal floating numbers on CPU. ### Creation Ops Note Random sampling creation ops are listed under Random sampling and include: [`torch.rand()`](generated/torch.rand#torch.rand "torch.rand") [`torch.rand_like()`](generated/torch.rand_like#torch.rand_like "torch.rand_like") [`torch.randn()`](generated/torch.randn#torch.randn "torch.randn") [`torch.randn_like()`](generated/torch.randn_like#torch.randn_like "torch.randn_like") [`torch.randint()`](generated/torch.randint#torch.randint "torch.randint") [`torch.randint_like()`](generated/torch.randint_like#torch.randint_like "torch.randint_like") [`torch.randperm()`](generated/torch.randperm#torch.randperm "torch.randperm") You may also use [`torch.empty()`](generated/torch.empty#torch.empty "torch.empty") with the In-place random sampling methods to create [`torch.Tensor`](tensors#torch.Tensor "torch.Tensor") s with values sampled from a broader range of distributions. [`tensor`](generated/torch.tensor#torch.tensor "torch.tensor") | Constructs a tensor with `data`. ---|--- [`sparse_coo_tensor`](generated/torch.sparse_coo_tensor#torch.sparse_coo_tensor "torch.sparse_coo_tensor") | Constructs a [sparse tensor in COO(rdinate) format](sparse#sparse-coo-docs) with specified values at the given `indices`. [`as_tensor`](generated/torch.as_tensor#torch.as_tensor "torch.as_tensor") | Convert the data into a `torch.Tensor`. [`as_strided`](generated/torch.as_strided#torch.as_strided "torch.as_strided") | Create a view of an existing `torch.Tensor` `input` with specified `size`, `stride` and `storage_offset`. [`from_numpy`](generated/torch.from_numpy#torch.from_numpy "torch.from_numpy") | Creates a [`Tensor`](tensors#torch.Tensor "torch.Tensor") from a [`numpy.ndarray`](https://numpy.org/doc/stable/reference/generated/numpy.ndarray.html#numpy.ndarray "\(in NumPy v1.20\)"). [`zeros`](generated/torch.zeros#torch.zeros "torch.zeros") | Returns a tensor filled with the scalar value `0`, with the shape defined by the variable argument `size`. 
[`zeros_like`](generated/torch.zeros_like#torch.zeros_like "torch.zeros_like") | Returns a tensor filled with the scalar value `0`, with the same size as `input`. [`ones`](generated/torch.ones#torch.ones "torch.ones") | Returns a tensor filled with the scalar value `1`, with the shape defined by the variable argument `size`. [`ones_like`](generated/torch.ones_like#torch.ones_like "torch.ones_like") | Returns a tensor filled with the scalar value `1`, with the same size as `input`. [`arange`](generated/torch.arange#torch.arange "torch.arange") | Returns a 1-D tensor of size \left\lceil \frac{\text{end} - \text{start}}{\text{step}} \right\rceil with values from the interval `[start, end)` taken with common difference `step` beginning from `start`. [`range`](generated/torch.range#torch.range "torch.range") | Returns a 1-D tensor of size \left\lfloor \frac{\text{end} - \text{start}}{\text{step}} \right\rfloor + 1 with values from `start` to `end` with step `step`. [`linspace`](generated/torch.linspace#torch.linspace "torch.linspace") | Creates a one-dimensional tensor of size `steps` whose values are evenly spaced from `start` to `end`, inclusive. [`logspace`](generated/torch.logspace#torch.logspace "torch.logspace") | Creates a one-dimensional tensor of size `steps` whose values are evenly spaced from \text{base}^{\text{start}} to \text{base}^{\text{end}}, inclusive, on a logarithmic scale with base `base`. [`eye`](generated/torch.eye#torch.eye "torch.eye") | Returns a 2-D tensor with ones on the diagonal and zeros elsewhere. [`empty`](generated/torch.empty#torch.empty "torch.empty") | Returns a tensor filled with uninitialized data. [`empty_like`](generated/torch.empty_like#torch.empty_like "torch.empty_like") | Returns an uninitialized tensor with the same size as `input`. [`empty_strided`](generated/torch.empty_strided#torch.empty_strided "torch.empty_strided") | Returns a tensor filled with uninitialized data. [`full`](generated/torch.full#torch.full "torch.full") | Creates a tensor of size `size` filled with `fill_value`. [`full_like`](generated/torch.full_like#torch.full_like "torch.full_like") | Returns a tensor with the same size as `input` filled with `fill_value`. [`quantize_per_tensor`](generated/torch.quantize_per_tensor#torch.quantize_per_tensor "torch.quantize_per_tensor") | Converts a float tensor to a quantized tensor with given scale and zero point. [`quantize_per_channel`](generated/torch.quantize_per_channel#torch.quantize_per_channel "torch.quantize_per_channel") | Converts a float tensor to a per-channel quantized tensor with given scales and zero points. [`dequantize`](generated/torch.dequantize#torch.dequantize "torch.dequantize") | Returns an fp32 Tensor by dequantizing a quantized Tensor [`complex`](generated/torch.complex#torch.complex "torch.complex") | Constructs a complex tensor with its real part equal to [`real`](generated/torch.real#torch.real "torch.real") and its imaginary part equal to [`imag`](generated/torch.imag#torch.imag "torch.imag"). [`polar`](generated/torch.polar#torch.polar "torch.polar") | Constructs a complex tensor whose elements are Cartesian coordinates corresponding to the polar coordinates with absolute value [`abs`](generated/torch.abs#torch.abs "torch.abs") and angle [`angle`](generated/torch.angle#torch.angle "torch.angle"). [`heaviside`](generated/torch.heaviside#torch.heaviside "torch.heaviside") | Computes the Heaviside step function for each element in `input`.
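As an informal illustration (a quick sketch, not part of the reference tables above), a few of these creation ops in use:

>>> torch.arange(0, 10, 3)         # 1-D, size = ceil((end - start) / step)
tensor([0, 3, 6, 9])
>>> torch.linspace(0, 1, steps=5)  # evenly spaced, endpoints inclusive
tensor([0.0000, 0.2500, 0.5000, 0.7500, 1.0000])
>>> torch.full((2, 3), 7.0)        # constant fill
tensor([[7., 7., 7.],
        [7., 7., 7.]])
>>> torch.eye(3)                   # 2-D identity
tensor([[1., 0., 0.],
        [0., 1., 0.],
        [0., 0., 1.]])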
### Indexing, Slicing, Joining, Mutating Ops [`cat`](generated/torch.cat#torch.cat "torch.cat") | Concatenates the given sequence of `seq` tensors in the given dimension. ---|--- [`chunk`](generated/torch.chunk#torch.chunk "torch.chunk") | Splits a tensor into a specific number of chunks. [`column_stack`](generated/torch.column_stack#torch.column_stack "torch.column_stack") | Creates a new tensor by horizontally stacking the tensors in `tensors`. [`dstack`](generated/torch.dstack#torch.dstack "torch.dstack") | Stack tensors in sequence depthwise (along third axis). [`gather`](generated/torch.gather#torch.gather "torch.gather") | Gathers values along an axis specified by `dim`. [`hstack`](generated/torch.hstack#torch.hstack "torch.hstack") | Stack tensors in sequence horizontally (column wise). [`index_select`](generated/torch.index_select#torch.index_select "torch.index_select") | Returns a new tensor which indexes the `input` tensor along dimension `dim` using the entries in `index` which is a `LongTensor`. [`masked_select`](generated/torch.masked_select#torch.masked_select "torch.masked_select") | Returns a new 1-D tensor which indexes the `input` tensor according to the boolean mask `mask` which is a `BoolTensor`. [`movedim`](generated/torch.movedim#torch.movedim "torch.movedim") | Moves the dimension(s) of `input` at the position(s) in `source` to the position(s) in `destination`. [`moveaxis`](generated/torch.moveaxis#torch.moveaxis "torch.moveaxis") | Alias for [`torch.movedim()`](generated/torch.movedim#torch.movedim "torch.movedim"). [`narrow`](generated/torch.narrow#torch.narrow "torch.narrow") | Returns a new tensor that is a narrowed version of `input` tensor. [`nonzero`](generated/torch.nonzero#torch.nonzero "torch.nonzero") | [`reshape`](generated/torch.reshape#torch.reshape "torch.reshape") | Returns a tensor with the same data and number of elements as `input`, but with the specified shape. [`row_stack`](generated/torch.row_stack#torch.row_stack "torch.row_stack") | Alias of [`torch.vstack()`](generated/torch.vstack#torch.vstack "torch.vstack"). [`scatter`](generated/torch.scatter#torch.scatter "torch.scatter") | Out-of-place version of [`torch.Tensor.scatter_()`](tensors#torch.Tensor.scatter_ "torch.Tensor.scatter_") [`scatter_add`](generated/torch.scatter_add#torch.scatter_add "torch.scatter_add") | Out-of-place version of [`torch.Tensor.scatter_add_()`](tensors#torch.Tensor.scatter_add_ "torch.Tensor.scatter_add_") [`split`](generated/torch.split#torch.split "torch.split") | Splits the tensor into chunks. [`squeeze`](generated/torch.squeeze#torch.squeeze "torch.squeeze") | Returns a tensor with all the dimensions of `input` of size `1` removed. [`stack`](generated/torch.stack#torch.stack "torch.stack") | Concatenates a sequence of tensors along a new dimension. [`swapaxes`](generated/torch.swapaxes#torch.swapaxes "torch.swapaxes") | Alias for [`torch.transpose()`](generated/torch.transpose#torch.transpose "torch.transpose"). [`swapdims`](generated/torch.swapdims#torch.swapdims "torch.swapdims") | Alias for [`torch.transpose()`](generated/torch.transpose#torch.transpose "torch.transpose"). [`t`](generated/torch.t#torch.t "torch.t") | Expects `input` to be <= 2-D tensor and transposes dimensions 0 and 1. [`take`](generated/torch.take#torch.take "torch.take") | Returns a new tensor with the elements of `input` at the given indices. 
[`tensor_split`](generated/torch.tensor_split#torch.tensor_split "torch.tensor_split") | Splits a tensor into multiple sub-tensors, all of which are views of `input`, along dimension `dim` according to the indices or number of sections specified by `indices_or_sections`. [`tile`](generated/torch.tile#torch.tile "torch.tile") | Constructs a tensor by repeating the elements of `input`. [`transpose`](generated/torch.transpose#torch.transpose "torch.transpose") | Returns a tensor that is a transposed version of `input`. [`unbind`](generated/torch.unbind#torch.unbind "torch.unbind") | Removes a tensor dimension. [`unsqueeze`](generated/torch.unsqueeze#torch.unsqueeze "torch.unsqueeze") | Returns a new tensor with a dimension of size one inserted at the specified position. [`vstack`](generated/torch.vstack#torch.vstack "torch.vstack") | Stack tensors in sequence vertically (row wise). [`where`](generated/torch.where#torch.where "torch.where") | Return a tensor of elements selected from either `x` or `y`, depending on `condition`. ## Generators [`Generator`](generated/torch.generator#torch.Generator "torch.Generator") | Creates and returns a generator object that manages the state of the algorithm which produces pseudo random numbers. ---|--- ## Random sampling [`seed`](generated/torch.seed#torch.seed "torch.seed") | Sets the seed for generating random numbers to a non-deterministic random number. ---|--- [`manual_seed`](generated/torch.manual_seed#torch.manual_seed "torch.manual_seed") | Sets the seed for generating random numbers. [`initial_seed`](generated/torch.initial_seed#torch.initial_seed "torch.initial_seed") | Returns the initial seed for generating random numbers as a Python `long`. [`get_rng_state`](generated/torch.get_rng_state#torch.get_rng_state "torch.get_rng_state") | Returns the random number generator state as a `torch.ByteTensor`. [`set_rng_state`](generated/torch.set_rng_state#torch.set_rng_state "torch.set_rng_state") | Sets the random number generator state. `torch.default_generator Returns the default CPU torch.Generator` [`bernoulli`](generated/torch.bernoulli#torch.bernoulli "torch.bernoulli") | Draws binary random numbers (0 or 1) from a Bernoulli distribution. ---|--- [`multinomial`](generated/torch.multinomial#torch.multinomial "torch.multinomial") | Returns a tensor where each row contains `num_samples` indices sampled from the multinomial probability distribution located in the corresponding row of tensor `input`. [`normal`](generated/torch.normal#torch.normal "torch.normal") | Returns a tensor of random numbers drawn from separate normal distributions whose mean and standard deviation are given. [`poisson`](generated/torch.poisson#torch.poisson "torch.poisson") | Returns a tensor of the same size as `input` with each element sampled from a Poisson distribution with rate parameter given by the corresponding element in `input` i.e., [`rand`](generated/torch.rand#torch.rand "torch.rand") | Returns a tensor filled with random numbers from a uniform distribution on the interval [0,1)[0, 1) [`rand_like`](generated/torch.rand_like#torch.rand_like "torch.rand_like") | Returns a tensor with the same size as `input` that is filled with random numbers from a uniform distribution on the interval [0,1)[0, 1) . [`randint`](generated/torch.randint#torch.randint "torch.randint") | Returns a tensor filled with random integers generated uniformly between `low` (inclusive) and `high` (exclusive). 
[`randint_like`](generated/torch.randint_like#torch.randint_like "torch.randint_like") | Returns a tensor with the same shape as Tensor `input` filled with random integers generated uniformly between `low` (inclusive) and `high` (exclusive). [`randn`](generated/torch.randn#torch.randn "torch.randn") | Returns a tensor filled with random numbers from a normal distribution with mean `0` and variance `1` (also called the standard normal distribution). [`randn_like`](generated/torch.randn_like#torch.randn_like "torch.randn_like") | Returns a tensor with the same size as `input` that is filled with random numbers from a normal distribution with mean 0 and variance 1. [`randperm`](generated/torch.randperm#torch.randperm "torch.randperm") | Returns a random permutation of integers from `0` to `n - 1`. ### In-place random sampling There are a few more in-place random sampling functions defined on Tensors as well. Click through to refer to their documentation: * [`torch.Tensor.bernoulli_()`](tensors#torch.Tensor.bernoulli_ "torch.Tensor.bernoulli_") \- in-place version of [`torch.bernoulli()`](generated/torch.bernoulli#torch.bernoulli "torch.bernoulli") * [`torch.Tensor.cauchy_()`](tensors#torch.Tensor.cauchy_ "torch.Tensor.cauchy_") \- numbers drawn from the Cauchy distribution * [`torch.Tensor.exponential_()`](tensors#torch.Tensor.exponential_ "torch.Tensor.exponential_") \- numbers drawn from the exponential distribution * [`torch.Tensor.geometric_()`](tensors#torch.Tensor.geometric_ "torch.Tensor.geometric_") \- elements drawn from the geometric distribution * [`torch.Tensor.log_normal_()`](tensors#torch.Tensor.log_normal_ "torch.Tensor.log_normal_") \- samples from the log-normal distribution * [`torch.Tensor.normal_()`](tensors#torch.Tensor.normal_ "torch.Tensor.normal_") \- in-place version of [`torch.normal()`](generated/torch.normal#torch.normal "torch.normal") * [`torch.Tensor.random_()`](tensors#torch.Tensor.random_ "torch.Tensor.random_") \- numbers sampled from the discrete uniform distribution * [`torch.Tensor.uniform_()`](tensors#torch.Tensor.uniform_ "torch.Tensor.uniform_") \- numbers sampled from the continuous uniform distribution ### Quasi-random sampling [`quasirandom.SobolEngine`](generated/torch.quasirandom.sobolengine#torch.quasirandom.SobolEngine "torch.quasirandom.SobolEngine") | The [`torch.quasirandom.SobolEngine`](generated/torch.quasirandom.sobolengine#torch.quasirandom.SobolEngine "torch.quasirandom.SobolEngine") is an engine for generating (scrambled) Sobol sequences. ---|--- ## Serialization [`save`](generated/torch.save#torch.save "torch.save") | Saves an object to a disk file. ---|--- [`load`](generated/torch.load#torch.load "torch.load") | Loads an object saved with [`torch.save()`](generated/torch.save#torch.save "torch.save") from a file. ## Parallelism [`get_num_threads`](generated/torch.get_num_threads#torch.get_num_threads "torch.get_num_threads") | Returns the number of threads used for parallelizing CPU operations ---|--- [`set_num_threads`](generated/torch.set_num_threads#torch.set_num_threads "torch.set_num_threads") | Sets the number of threads used for intraop parallelism on CPU. [`get_num_interop_threads`](generated/torch.get_num_interop_threads#torch.get_num_interop_threads "torch.get_num_interop_threads") | Returns the number of threads used for inter-op parallelism on CPU (e.g. 
[`set_num_interop_threads`](generated/torch.set_num_interop_threads#torch.set_num_interop_threads "torch.set_num_interop_threads") | Sets the number of threads used for interop parallelism (e.g. ## Locally disabling gradient computation The context managers [`torch.no_grad()`](generated/torch.no_grad#torch.no_grad "torch.no_grad"), [`torch.enable_grad()`](generated/torch.enable_grad#torch.enable_grad "torch.enable_grad"), and [`torch.set_grad_enabled()`](generated/torch.set_grad_enabled#torch.set_grad_enabled "torch.set_grad_enabled") are helpful for locally disabling and enabling gradient computation. See [Locally disabling gradient computation](autograd#locally-disable-grad) for more details on their usage. These context managers are thread local, so they won’t work if you send work to another thread using the `threading` module, etc. Examples: >>> x = torch.zeros(1, requires_grad=True) >>> with torch.no_grad(): ... y = x * 2 >>> y.requires_grad False >>> is_train = False >>> with torch.set_grad_enabled(is_train): ... y = x * 2 >>> y.requires_grad False >>> torch.set_grad_enabled(True) # this can also be used as a function >>> y = x * 2 >>> y.requires_grad True >>> torch.set_grad_enabled(False) >>> y = x * 2 >>> y.requires_grad False [`no_grad`](generated/torch.no_grad#torch.no_grad "torch.no_grad") | Context-manager that disables gradient calculation. ---|--- [`enable_grad`](generated/torch.enable_grad#torch.enable_grad "torch.enable_grad") | Context-manager that enables gradient calculation. [`set_grad_enabled`](generated/torch.set_grad_enabled#torch.set_grad_enabled "torch.set_grad_enabled") | Context-manager that sets gradient calculation to on or off. ## Math operations ### Pointwise Ops [`abs`](generated/torch.abs#torch.abs "torch.abs") | Computes the absolute value of each element in `input`. ---|--- [`absolute`](generated/torch.absolute#torch.absolute "torch.absolute") | Alias for [`torch.abs()`](generated/torch.abs#torch.abs "torch.abs") [`acos`](generated/torch.acos#torch.acos "torch.acos") | Computes the inverse cosine of each element in `input`. [`arccos`](generated/torch.arccos#torch.arccos "torch.arccos") | Alias for [`torch.acos()`](generated/torch.acos#torch.acos "torch.acos"). [`acosh`](generated/torch.acosh#torch.acosh "torch.acosh") | Returns a new tensor with the inverse hyperbolic cosine of the elements of `input`. [`arccosh`](generated/torch.arccosh#torch.arccosh "torch.arccosh") | Alias for [`torch.acosh()`](generated/torch.acosh#torch.acosh "torch.acosh"). [`add`](generated/torch.add#torch.add "torch.add") | Adds the scalar `other` to each element of the input `input` and returns a new resulting tensor. [`addcdiv`](generated/torch.addcdiv#torch.addcdiv "torch.addcdiv") | Performs the element-wise division of `tensor1` by `tensor2`, multiplies the result by the scalar `value` and adds it to `input`. [`addcmul`](generated/torch.addcmul#torch.addcmul "torch.addcmul") | Performs the element-wise multiplication of `tensor1` by `tensor2`, multiplies the result by the scalar `value` and adds it to `input`. [`angle`](generated/torch.angle#torch.angle "torch.angle") | Computes the element-wise angle (in radians) of the given `input` tensor. [`asin`](generated/torch.asin#torch.asin "torch.asin") | Returns a new tensor with the arcsine of the elements of `input`. [`arcsin`](generated/torch.arcsin#torch.arcsin "torch.arcsin") | Alias for [`torch.asin()`](generated/torch.asin#torch.asin "torch.asin").
[`asinh`](generated/torch.asinh#torch.asinh "torch.asinh") | Returns a new tensor with the inverse hyperbolic sine of the elements of `input`. [`arcsinh`](generated/torch.arcsinh#torch.arcsinh "torch.arcsinh") | Alias for [`torch.asinh()`](generated/torch.asinh#torch.asinh "torch.asinh"). [`atan`](generated/torch.atan#torch.atan "torch.atan") | Returns a new tensor with the arctangent of the elements of `input`. [`arctan`](generated/torch.arctan#torch.arctan "torch.arctan") | Alias for [`torch.atan()`](generated/torch.atan#torch.atan "torch.atan"). [`atanh`](generated/torch.atanh#torch.atanh "torch.atanh") | Returns a new tensor with the inverse hyperbolic tangent of the elements of `input`. [`arctanh`](generated/torch.arctanh#torch.arctanh "torch.arctanh") | Alias for [`torch.atanh()`](generated/torch.atanh#torch.atanh "torch.atanh"). [`atan2`](generated/torch.atan2#torch.atan2 "torch.atan2") | Element-wise arctangent of \text{input}_{i} / \text{other}_{i} with consideration of the quadrant. [`bitwise_not`](generated/torch.bitwise_not#torch.bitwise_not "torch.bitwise_not") | Computes the bitwise NOT of the given input tensor. [`bitwise_and`](generated/torch.bitwise_and#torch.bitwise_and "torch.bitwise_and") | Computes the bitwise AND of `input` and `other`. [`bitwise_or`](generated/torch.bitwise_or#torch.bitwise_or "torch.bitwise_or") | Computes the bitwise OR of `input` and `other`. [`bitwise_xor`](generated/torch.bitwise_xor#torch.bitwise_xor "torch.bitwise_xor") | Computes the bitwise XOR of `input` and `other`. [`ceil`](generated/torch.ceil#torch.ceil "torch.ceil") | Returns a new tensor with the ceil of the elements of `input`, the smallest integer greater than or equal to each element. [`clamp`](generated/torch.clamp#torch.clamp "torch.clamp") | Clamp all elements in `input` into the range `[` [`min`](generated/torch.min#torch.min "torch.min"), [`max`](generated/torch.max#torch.max "torch.max") `]`. [`clip`](generated/torch.clip#torch.clip "torch.clip") | Alias for [`torch.clamp()`](generated/torch.clamp#torch.clamp "torch.clamp"). [`conj`](generated/torch.conj#torch.conj "torch.conj") | Computes the element-wise conjugate of the given `input` tensor. [`copysign`](generated/torch.copysign#torch.copysign "torch.copysign") | Create a new floating-point tensor with the magnitude of `input` and the sign of `other`, elementwise. [`cos`](generated/torch.cos#torch.cos "torch.cos") | Returns a new tensor with the cosine of the elements of `input`. [`cosh`](generated/torch.cosh#torch.cosh "torch.cosh") | Returns a new tensor with the hyperbolic cosine of the elements of `input`. [`deg2rad`](generated/torch.deg2rad#torch.deg2rad "torch.deg2rad") | Returns a new tensor with each of the elements of `input` converted from angles in degrees to radians. [`div`](generated/torch.div#torch.div "torch.div") | Divides each element of the input `input` by the corresponding element of `other`. [`divide`](generated/torch.divide#torch.divide "torch.divide") | Alias for [`torch.div()`](generated/torch.div#torch.div "torch.div"). [`digamma`](generated/torch.digamma#torch.digamma "torch.digamma") | Computes the logarithmic derivative of the gamma function on `input`. [`erf`](generated/torch.erf#torch.erf "torch.erf") | Computes the error function of each element. [`erfc`](generated/torch.erfc#torch.erfc "torch.erfc") | Computes the complementary error function of each element of `input`.
[`erfinv`](generated/torch.erfinv#torch.erfinv "torch.erfinv") | Computes the inverse error function of each element of `input`. [`exp`](generated/torch.exp#torch.exp "torch.exp") | Returns a new tensor with the exponential of the elements of the input tensor `input`. [`exp2`](generated/torch.exp2#torch.exp2 "torch.exp2") | Computes the base two exponential function of `input`. [`expm1`](generated/torch.expm1#torch.expm1 "torch.expm1") | Returns a new tensor with the exponential of the elements minus 1 of `input`. [`fake_quantize_per_channel_affine`](generated/torch.fake_quantize_per_channel_affine#torch.fake_quantize_per_channel_affine "torch.fake_quantize_per_channel_affine") | Returns a new tensor with the data in `input` fake quantized per channel using `scale`, `zero_point`, `quant_min` and `quant_max`, across the channel specified by `axis`. [`fake_quantize_per_tensor_affine`](generated/torch.fake_quantize_per_tensor_affine#torch.fake_quantize_per_tensor_affine "torch.fake_quantize_per_tensor_affine") | Returns a new tensor with the data in `input` fake quantized using `scale`, `zero_point`, `quant_min` and `quant_max`. [`fix`](generated/torch.fix#torch.fix "torch.fix") | Alias for [`torch.trunc()`](generated/torch.trunc#torch.trunc "torch.trunc") [`float_power`](generated/torch.float_power#torch.float_power "torch.float_power") | Raises `input` to the power of `exponent`, elementwise, in double precision. [`floor`](generated/torch.floor#torch.floor "torch.floor") | Returns a new tensor with the floor of the elements of `input`, the largest integer less than or equal to each element. [`floor_divide`](generated/torch.floor_divide#torch.floor_divide "torch.floor_divide") | [`fmod`](generated/torch.fmod#torch.fmod "torch.fmod") | Computes the element-wise remainder of division. [`frac`](generated/torch.frac#torch.frac "torch.frac") | Computes the fractional portion of each element in `input`. [`imag`](generated/torch.imag#torch.imag "torch.imag") | Returns a new tensor containing imaginary values of the `self` tensor. [`ldexp`](generated/torch.ldexp#torch.ldexp "torch.ldexp") | Multiplies `input` by 2**:attr:`other`. [`lerp`](generated/torch.lerp#torch.lerp "torch.lerp") | Does a linear interpolation of two tensors `start` (given by `input`) and `end` based on a scalar or tensor `weight` and returns the resulting `out` tensor. [`lgamma`](generated/torch.lgamma#torch.lgamma "torch.lgamma") | Computes the logarithm of the gamma function on `input`. [`log`](generated/torch.log#torch.log "torch.log") | Returns a new tensor with the natural logarithm of the elements of `input`. [`log10`](generated/torch.log10#torch.log10 "torch.log10") | Returns a new tensor with the logarithm to the base 10 of the elements of `input`. [`log1p`](generated/torch.log1p#torch.log1p "torch.log1p") | Returns a new tensor with the natural logarithm of (1 + `input`). [`log2`](generated/torch.log2#torch.log2 "torch.log2") | Returns a new tensor with the logarithm to the base 2 of the elements of `input`. [`logaddexp`](generated/torch.logaddexp#torch.logaddexp "torch.logaddexp") | Logarithm of the sum of exponentiations of the inputs. [`logaddexp2`](generated/torch.logaddexp2#torch.logaddexp2 "torch.logaddexp2") | Logarithm of the sum of exponentiations of the inputs in base-2. [`logical_and`](generated/torch.logical_and#torch.logical_and "torch.logical_and") | Computes the element-wise logical AND of the given input tensors. 
[`logical_not`](generated/torch.logical_not#torch.logical_not "torch.logical_not") | Computes the element-wise logical NOT of the given input tensor. [`logical_or`](generated/torch.logical_or#torch.logical_or "torch.logical_or") | Computes the element-wise logical OR of the given input tensors. [`logical_xor`](generated/torch.logical_xor#torch.logical_xor "torch.logical_xor") | Computes the element-wise logical XOR of the given input tensors. [`logit`](generated/torch.logit#torch.logit "torch.logit") | Returns a new tensor with the logit of the elements of `input`. [`hypot`](generated/torch.hypot#torch.hypot "torch.hypot") | Given the legs of a right triangle, return its hypotenuse. [`i0`](generated/torch.i0#torch.i0 "torch.i0") | Computes the zeroth order modified Bessel function of the first kind for each element of `input`. [`igamma`](generated/torch.igamma#torch.igamma "torch.igamma") | Computes the regularized lower incomplete gamma function: [`igammac`](generated/torch.igammac#torch.igammac "torch.igammac") | Computes the regularized upper incomplete gamma function: [`mul`](generated/torch.mul#torch.mul "torch.mul") | Multiplies each element of the input `input` with the scalar `other` and returns a new resulting tensor. [`multiply`](generated/torch.multiply#torch.multiply "torch.multiply") | Alias for [`torch.mul()`](generated/torch.mul#torch.mul "torch.mul"). [`mvlgamma`](generated/torch.mvlgamma#torch.mvlgamma "torch.mvlgamma") | Computes the [multivariate log-gamma function](https://en.wikipedia.org/wiki/Multivariate_gamma_function)) with dimension pp element-wise, given by [`nan_to_num`](generated/torch.nan_to_num#torch.nan_to_num "torch.nan_to_num") | Replaces `NaN`, positive infinity, and negative infinity values in `input` with the values specified by `nan`, `posinf`, and `neginf`, respectively. [`neg`](generated/torch.neg#torch.neg "torch.neg") | Returns a new tensor with the negative of the elements of `input`. [`negative`](generated/torch.negative#torch.negative "torch.negative") | Alias for [`torch.neg()`](generated/torch.neg#torch.neg "torch.neg") [`nextafter`](generated/torch.nextafter#torch.nextafter "torch.nextafter") | Return the next floating-point value after `input` towards `other`, elementwise. [`polygamma`](generated/torch.polygamma#torch.polygamma "torch.polygamma") | Computes the nthn^{th} derivative of the digamma function on `input`. [`pow`](generated/torch.pow#torch.pow "torch.pow") | Takes the power of each element in `input` with `exponent` and returns a tensor with the result. [`rad2deg`](generated/torch.rad2deg#torch.rad2deg "torch.rad2deg") | Returns a new tensor with each of the elements of `input` converted from angles in radians to degrees. [`real`](generated/torch.real#torch.real "torch.real") | Returns a new tensor containing real values of the `self` tensor. [`reciprocal`](generated/torch.reciprocal#torch.reciprocal "torch.reciprocal") | Returns a new tensor with the reciprocal of the elements of `input` [`remainder`](generated/torch.remainder#torch.remainder "torch.remainder") | Computes the element-wise remainder of division. [`round`](generated/torch.round#torch.round "torch.round") | Returns a new tensor with each of the elements of `input` rounded to the closest integer. [`rsqrt`](generated/torch.rsqrt#torch.rsqrt "torch.rsqrt") | Returns a new tensor with the reciprocal of the square-root of each of the elements of `input`. 
[`sigmoid`](generated/torch.sigmoid#torch.sigmoid "torch.sigmoid") | Returns a new tensor with the sigmoid of the elements of `input`. [`sign`](generated/torch.sign#torch.sign "torch.sign") | Returns a new tensor with the signs of the elements of `input`. [`sgn`](generated/torch.sgn#torch.sgn "torch.sgn") | For complex tensors, this function returns a new tensor whose elements have the same angle as that of the elements of `input` and absolute value 1. [`signbit`](generated/torch.signbit#torch.signbit "torch.signbit") | Tests if each element of `input` has its sign bit set (is less than zero) or not. [`sin`](generated/torch.sin#torch.sin "torch.sin") | Returns a new tensor with the sine of the elements of `input`. [`sinc`](generated/torch.sinc#torch.sinc "torch.sinc") | Computes the normalized sinc of `input`. [`sinh`](generated/torch.sinh#torch.sinh "torch.sinh") | Returns a new tensor with the hyperbolic sine of the elements of `input`. [`sqrt`](generated/torch.sqrt#torch.sqrt "torch.sqrt") | Returns a new tensor with the square-root of the elements of `input`. [`square`](generated/torch.square#torch.square "torch.square") | Returns a new tensor with the square of the elements of `input`. [`sub`](generated/torch.sub#torch.sub "torch.sub") | Subtracts `other`, scaled by `alpha`, from `input`. [`subtract`](generated/torch.subtract#torch.subtract "torch.subtract") | Alias for [`torch.sub()`](generated/torch.sub#torch.sub "torch.sub"). [`tan`](generated/torch.tan#torch.tan "torch.tan") | Returns a new tensor with the tangent of the elements of `input`. [`tanh`](generated/torch.tanh#torch.tanh "torch.tanh") | Returns a new tensor with the hyperbolic tangent of the elements of `input`. [`true_divide`](generated/torch.true_divide#torch.true_divide "torch.true_divide") | Alias for [`torch.div()`](generated/torch.div#torch.div "torch.div") with `rounding_mode=None`. [`trunc`](generated/torch.trunc#torch.trunc "torch.trunc") | Returns a new tensor with the truncated integer values of the elements of `input`. [`xlogy`](generated/torch.xlogy#torch.xlogy "torch.xlogy") | Computes `input * log(other)` with the following cases. ### Reduction Ops [`argmax`](generated/torch.argmax#torch.argmax "torch.argmax") | Returns the indices of the maximum value of all elements in the `input` tensor. ---|--- [`argmin`](generated/torch.argmin#torch.argmin "torch.argmin") | Returns the indices of the minimum value(s) of the flattened tensor or along a dimension. [`amax`](generated/torch.amax#torch.amax "torch.amax") | Returns the maximum value of each slice of the `input` tensor in the given dimension(s) `dim`. [`amin`](generated/torch.amin#torch.amin "torch.amin") | Returns the minimum value of each slice of the `input` tensor in the given dimension(s) `dim`. [`all`](generated/torch.all#torch.all "torch.all") | Tests if all elements in `input` evaluate to `True`. [`any`](generated/torch.any#torch.any "torch.any") | Tests if any element in `input` evaluates to `True`. [`max`](generated/torch.max#torch.max "torch.max") | Returns the maximum value of all elements in the `input` tensor. [`min`](generated/torch.min#torch.min "torch.min") | Returns the minimum value of all elements in the `input` tensor. [`dist`](generated/torch.dist#torch.dist "torch.dist") | Returns the p-norm of (`input` \- `other`) [`logsumexp`](generated/torch.logsumexp#torch.logsumexp "torch.logsumexp") | Returns the log of summed exponentials of each row of the `input` tensor in the given dimension `dim`.
[`mean`](generated/torch.mean#torch.mean "torch.mean") | Returns the mean value of all elements in the `input` tensor. [`median`](generated/torch.median#torch.median "torch.median") | Returns the median of the values in `input`. [`nanmedian`](generated/torch.nanmedian#torch.nanmedian "torch.nanmedian") | Returns the median of the values in `input`, ignoring `NaN` values. [`mode`](generated/torch.mode#torch.mode "torch.mode") | Returns a namedtuple `(values, indices)` where `values` is the mode value of each row of the `input` tensor in the given dimension `dim`, i.e. [`norm`](generated/torch.norm#torch.norm "torch.norm") | Returns the matrix norm or vector norm of a given tensor. [`nansum`](generated/torch.nansum#torch.nansum "torch.nansum") | Returns the sum of all elements, treating Not a Numbers (NaNs) as zero. [`prod`](generated/torch.prod#torch.prod "torch.prod") | Returns the product of all elements in the `input` tensor. [`quantile`](generated/torch.quantile#torch.quantile "torch.quantile") | Returns the q-th quantiles of all elements in the `input` tensor, doing a linear interpolation when the q-th quantile lies between two data points. [`nanquantile`](generated/torch.nanquantile#torch.nanquantile "torch.nanquantile") | This is a variant of [`torch.quantile()`](generated/torch.quantile#torch.quantile "torch.quantile") that “ignores” `NaN` values, computing the quantiles `q` as if `NaN` values in `input` did not exist. [`std`](generated/torch.std#torch.std "torch.std") | Returns the standard-deviation of all elements in the `input` tensor. [`std_mean`](generated/torch.std_mean#torch.std_mean "torch.std_mean") | Returns the standard-deviation and mean of all elements in the `input` tensor. [`sum`](generated/torch.sum#torch.sum "torch.sum") | Returns the sum of all elements in the `input` tensor. [`unique`](generated/torch.unique#torch.unique "torch.unique") | Returns the unique elements of the input tensor. [`unique_consecutive`](generated/torch.unique_consecutive#torch.unique_consecutive "torch.unique_consecutive") | Eliminates all but the first element from every consecutive group of equivalent elements. [`var`](generated/torch.var#torch.var "torch.var") | Returns the variance of all elements in the `input` tensor. [`var_mean`](generated/torch.var_mean#torch.var_mean "torch.var_mean") | Returns the variance and mean of all elements in the `input` tensor. [`count_nonzero`](generated/torch.count_nonzero#torch.count_nonzero "torch.count_nonzero") | Counts the number of non-zero values in the tensor `input` along the given `dim`. ### Comparison Ops [`allclose`](generated/torch.allclose#torch.allclose "torch.allclose") | This function checks if all `input` and `other` satisfy the condition: ---|--- [`argsort`](generated/torch.argsort#torch.argsort "torch.argsort") | Returns the indices that sort a tensor along a given dimension in ascending order by value. [`eq`](generated/torch.eq#torch.eq "torch.eq") | Computes element-wise equality [`equal`](generated/torch.equal#torch.equal "torch.equal") | `True` if two tensors have the same size and elements, `False` otherwise. [`ge`](generated/torch.ge#torch.ge "torch.ge") | Computes input≥other\text{input} \geq \text{other} element-wise. [`greater_equal`](generated/torch.greater_equal#torch.greater_equal "torch.greater_equal") | Alias for [`torch.ge()`](generated/torch.ge#torch.ge "torch.ge"). [`gt`](generated/torch.gt#torch.gt "torch.gt") | Computes input>other\text{input} > \text{other} element-wise. 
[`greater`](generated/torch.greater#torch.greater "torch.greater") | Alias for [`torch.gt()`](generated/torch.gt#torch.gt "torch.gt"). [`isclose`](generated/torch.isclose#torch.isclose "torch.isclose") | Returns a new tensor with boolean elements representing if each element of `input` is “close” to the corresponding element of `other`. [`isfinite`](generated/torch.isfinite#torch.isfinite "torch.isfinite") | Returns a new tensor with boolean elements representing if each element is `finite` or not. [`isinf`](generated/torch.isinf#torch.isinf "torch.isinf") | Tests if each element of `input` is infinite (positive or negative infinity) or not. [`isposinf`](generated/torch.isposinf#torch.isposinf "torch.isposinf") | Tests if each element of `input` is positive infinity or not. [`isneginf`](generated/torch.isneginf#torch.isneginf "torch.isneginf") | Tests if each element of `input` is negative infinity or not. [`isnan`](generated/torch.isnan#torch.isnan "torch.isnan") | Returns a new tensor with boolean elements representing if each element of `input` is NaN or not. [`isreal`](generated/torch.isreal#torch.isreal "torch.isreal") | Returns a new tensor with boolean elements representing if each element of `input` is real-valued or not. [`kthvalue`](generated/torch.kthvalue#torch.kthvalue "torch.kthvalue") | Returns a namedtuple `(values, indices)` where `values` is the `k` th smallest element of each row of the `input` tensor in the given dimension `dim`. [`le`](generated/torch.le#torch.le "torch.le") | Computes input≤other\text{input} \leq \text{other} element-wise. [`less_equal`](generated/torch.less_equal#torch.less_equal "torch.less_equal") | Alias for [`torch.le()`](generated/torch.le#torch.le "torch.le"). [`lt`](generated/torch.lt#torch.lt "torch.lt") | Computes input<other\text{input} < \text{other} element-wise. # torch.nn.intrinsic.quantized This module implements the quantized implementations of fused operations like conv + relu. ## ConvReLU2d `class torch.nn.intrinsic.quantized.ConvReLU2d(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, bias=True, padding_mode='zeros')` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/intrinsic/quantized/modules/conv_relu.html#ConvReLU2d) A ConvReLU2d module is a fused module of Conv2d and ReLU. We adopt the same interface as [`torch.nn.quantized.Conv2d`](torch.nn.quantized#torch.nn.quantized.Conv2d "torch.nn.quantized.Conv2d"). Variables: Same as torch.nn.quantized.Conv2d ## ConvReLU3d `class torch.nn.intrinsic.quantized.ConvReLU3d(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, bias=True, padding_mode='zeros')` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/intrinsic/quantized/modules/conv_relu.html#ConvReLU3d) A ConvReLU3d module is a fused module of Conv3d and ReLU. We adopt the same interface as [`torch.nn.quantized.Conv3d`](torch.nn.quantized#torch.nn.quantized.Conv3d "torch.nn.quantized.Conv3d"). Attributes: Same as torch.nn.quantized.Conv3d ## LinearReLU `class torch.nn.intrinsic.quantized.LinearReLU(in_features, out_features, bias=True, dtype=torch.qint8)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/intrinsic/quantized/modules/linear_relu.html#LinearReLU) A LinearReLU module fused from Linear and ReLU modules. We adopt the same interface as [`torch.nn.quantized.Linear`](torch.nn.quantized#torch.nn.quantized.Linear "torch.nn.quantized.Linear").
Variables **as torch.nn.quantized.Linear** (_Same_) – Examples: >>> m = nn.intrinsic.LinearReLU(20, 30) >>> input = torch.randn(128, 20) >>> output = m(input) >>> print(output.size()) torch.Size([128, 30]) # torch.nn.qat This module implements versions of the key nn modules **Conv2d()** and **Linear()** which run in FP32 but with rounding applied to simulate the effect of INT8 quantization. ## Conv2d `class torch.nn.qat.Conv2d(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, bias=True, padding_mode='zeros', qconfig=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/qat/modules/conv.html#Conv2d) A Conv2d module attached with FakeQuantize modules for weight, used for quantization aware training. We adopt the same interface as `torch.nn.Conv2d`, please see for documentation. Similar to `torch.nn.Conv2d`, with FakeQuantize modules initialized to default. Variables **~Conv2d.weight_fake_quant** – fake quant module for weight `classmethod from_float(mod)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/qat/modules/conv.html#Conv2d.from_float) Create a qat module from a float module or qparams_dict Args: `mod` a float module, either produced by torch.quantization utilities or directly from user ## Linear `class torch.nn.qat.Linear(in_features, out_features, bias=True, qconfig=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/qat/modules/linear.html#Linear) A linear module attached with FakeQuantize modules for weight, used for quantization aware training. We adopt the same interface as `torch.nn.Linear`, please see for documentation. Similar to `torch.nn.Linear`, with FakeQuantize modules initialized to default. Variables **~Linear.weight** – fake quant module for weight `classmethod from_float(mod)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/qat/modules/linear.html#Linear.from_float) Create a qat module from a float module or qparams_dict Args: `mod` a float module, either produced by torch.quantization utilities or directly from user # torch.nn.quantized.dynamic ## Linear `class torch.nn.quantized.dynamic.Linear(in_features, out_features, bias_=True, dtype=torch.qint8)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/quantized/dynamic/modules/linear.html#Linear) A dynamic quantized linear module with floating point tensor as inputs and outputs. We adopt the same interface as `torch.nn.Linear`, please see for documentation. Similar to [`torch.nn.Linear`](generated/torch.nn.linear#torch.nn.Linear "torch.nn.Linear"), attributes will be randomly initialized at module creation time and will be overwritten later Variables * **~Linear.weight** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – the non-learnable quantized weights of the module which are of shape (out_features,in_features)(\text{out\\_features}, \text{in\\_features}) . * **~Linear.bias** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – the non-learnable floating point bias of the module of shape (out_features)(\text{out\\_features}) . If `bias` is `True`, the values are initialized to zero. 
Examples: >>> m = nn.quantized.dynamic.Linear(20, 30) >>> input = torch.randn(128, 20) >>> output = m(input) >>> print(output.size()) torch.Size([128, 30]) `classmethod from_float(mod)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/quantized/dynamic/modules/linear.html#Linear.from_float) Create a dynamic quantized module from a float module or qparams_dict Parameters **mod** ([Module](generated/torch.nn.module#torch.nn.Module "torch.nn.Module")) – a float module, either produced by torch.quantization utilities or provided by the user ## LSTM `class torch.nn.quantized.dynamic.LSTM(*args, **kwargs)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/quantized/dynamic/modules/rnn.html#LSTM) A dynamic quantized LSTM module with floating point tensor as inputs and outputs. We adopt the same interface as `torch.nn.LSTM`, please see for documentation. Examples: >>> rnn = nn.LSTM(10, 20, 2) >>> input = torch.randn(5, 3, 10) >>> h0 = torch.randn(2, 3, 20) >>> c0 = torch.randn(2, 3, 20) >>> output, (hn, cn) = rnn(input, (h0, c0)) ## LSTMCell `class torch.nn.quantized.dynamic.LSTMCell(*args, **kwargs)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/quantized/dynamic/modules/rnn.html#LSTMCell) A long short-term memory (LSTM) cell. A dynamic quantized LSTMCell module with floating point tensor as inputs and outputs. Weights are quantized to 8 bits. We adopt the same interface as `torch.nn.LSTMCell`, please see for documentation. Examples: >>> rnn = nn.LSTMCell(10, 20) >>> input = torch.randn(6, 3, 10) >>> hx = torch.randn(3, 20) >>> cx = torch.randn(3, 20) >>> output = [] >>> for i in range(6): hx, cx = rnn(input[i], (hx, cx)) output.append(hx) ## GRUCell `class torch.nn.quantized.dynamic.GRUCell(input_size, hidden_size, bias=True, dtype=torch.qint8)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/quantized/dynamic/modules/rnn.html#GRUCell) A gated recurrent unit (GRU) cell A dynamic quantized GRUCell module with floating point tensor as inputs and outputs. Weights are quantized to 8 bits. We adopt the same interface as `torch.nn.GRUCell`, please see for documentation. Examples: >>> rnn = nn.GRUCell(10, 20) >>> input = torch.randn(6, 3, 10) >>> hx = torch.randn(3, 20) >>> output = [] >>> for i in range(6): hx = rnn(input[i], hx) output.append(hx) ## RNNCell `class torch.nn.quantized.dynamic.RNNCell(input_size, hidden_size, bias=True, nonlinearity='tanh', dtype=torch.qint8)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/quantized/dynamic/modules/rnn.html#RNNCell) An Elman RNN cell with tanh or ReLU non-linearity. A dynamic quantized RNNCell module with floating point tensor as inputs and outputs. Weights are quantized to 8 bits. We adopt the same interface as `torch.nn.RNNCell`, please see for documentation. Examples: >>> rnn = nn.RNNCell(10, 20) >>> input = torch.randn(6, 3, 10) >>> hx = torch.randn(3, 20) >>> output = [] >>> for i in range(6): hx = rnn(input[i], hx) output.append(hx) # torch.nn.quantized This module implements the quantized versions of the nn modules and functionals. ## Functional interface Functional interface (quantized). `torch.nn.quantized.functional.linear(input, weight, bias=None, scale=None, zero_point=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/quantized/functional.html#linear) Applies a linear transformation to the incoming quantized data: y=xAT+by = xA^T + b . See `Linear` Note Current implementation packs weights on every call, which has penalty on performance. 
If you want to avoid the overhead, use `Linear`. Parameters * **input** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – Quantized input of type `torch.quint8` * **weight** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – Quantized weight of type `torch.qint8` * **bias** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – None or fp32 bias of type `torch.float` * **scale** (_double_) – output scale. If None, derived from the input scale * **zero_point** (_long_) – output zero point. If None, derived from the input zero_point Shape: * Input: (N,∗,in_features)(N, *, in\\_features) where `*` means any number of additional dimensions * Weight: (out_features,in_features)(out\\_features, in\\_features) * Bias: (out_features)(out\\_features) * Output: (N,∗,out_features)(N, *, out\\_features) `torch.nn.quantized.functional.conv1d(input, weight, bias, stride=1, padding=0, dilation=1, groups=1, padding_mode='zeros', scale=1.0, zero_point=0, dtype=torch.quint8)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/quantized/functional.html#conv1d) Applies a 1D convolution over a quantized 1D input composed of several input planes. See `Conv1d` for details and output shape. Parameters * **input** – quantized input tensor of shape (minibatch,in_channels,iW)(\text{minibatch} , \text{in\\_channels} , iW) * **weight** – quantized filters of shape (out_channels,in_channelsgroups,iW)(\text{out\\_channels} , \frac{\text{in\\_channels}}{\text{groups}} , iW) * **bias** – **non-quantized** bias tensor of shape (out_channels)(\text{out\\_channels}) . The tensor type must be `torch.float`. * **stride** – the stride of the convolving kernel. Can be a single number or a tuple `(sW,)`. Default: 1 * **padding** – implicit paddings on both sides of the input. Can be a single number or a tuple `(padW,)`. Default: 0 * **dilation** – the spacing between kernel elements. Can be a single number or a tuple `(dW,)`. Default: 1 * **groups** – split input into groups, in_channels\text{in\\_channels} should be divisible by the number of groups. Default: 1 * **padding_mode** – the padding mode to use. Only “zeros” is supported for quantized convolution at the moment. Default: “zeros” * **scale** – quantization scale for the output. Default: 1.0 * **zero_point** – quantization zero_point for the output. Default: 0 * **dtype** – quantization data type to use. Default: `torch.quint8` Examples: >>> from torch.nn.quantized import functional as qF >>> filters = torch.randn(33, 16, 3, dtype=torch.float) >>> inputs = torch.randn(20, 16, 50, dtype=torch.float) >>> bias = torch.randn(33, dtype=torch.float) >>> >>> scale, zero_point = 1.0, 0 >>> dtype_inputs = torch.quint8 >>> dtype_filters = torch.qint8 >>> >>> q_filters = torch.quantize_per_tensor(filters, scale, zero_point, dtype_filters) >>> q_inputs = torch.quantize_per_tensor(inputs, scale, zero_point, dtype_inputs) >>> qF.conv1d(q_inputs, q_filters, bias, padding=1, scale=scale, zero_point=zero_point) `torch.nn.quantized.functional.conv2d(input, weight, bias, stride=1, padding=0, dilation=1, groups=1, padding_mode='zeros', scale=1.0, zero_point=0, dtype=torch.quint8)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/quantized/functional.html#conv2d) Applies a 2D convolution over a quantized 2D input composed of several input planes. See `Conv2d` for details and output shape. 
Parameters * **input** – quantized input tensor of shape (minibatch,in_channels,iH,iW)(\text{minibatch} , \text{in\\_channels} , iH , iW) * **weight** – quantized filters of shape (out_channels,in_channelsgroups,kH,kW)(\text{out\\_channels} , \frac{\text{in\\_channels}}{\text{groups}} , kH , kW) * **bias** – **non-quantized** bias tensor of shape (out_channels)(\text{out\\_channels}) . The tensor type must be `torch.float`. * **stride** – the stride of the convolving kernel. Can be a single number or a tuple `(sH, sW)`. Default: 1 * **padding** – implicit paddings on both sides of the input. Can be a single number or a tuple `(padH, padW)`. Default: 0 * **dilation** – the spacing between kernel elements. Can be a single number or a tuple `(dH, dW)`. Default: 1 * **groups** – split input into groups, in_channels\text{in\\_channels} should be divisible by the number of groups. Default: 1 * **padding_mode** – the padding mode to use. Only “zeros” is supported for quantized convolution at the moment. Default: “zeros” * **scale** – quantization scale for the output. Default: 1.0 * **zero_point** – quantization zero_point for the output. Default: 0 * **dtype** – quantization data type to use. Default: `torch.quint8` Examples: >>> from torch.nn.quantized import functional as qF >>> filters = torch.randn(8, 4, 3, 3, dtype=torch.float) >>> inputs = torch.randn(1, 4, 5, 5, dtype=torch.float) >>> bias = torch.randn(8, dtype=torch.float) >>> >>> scale, zero_point = 1.0, 0 >>> dtype_inputs = torch.quint8 >>> dtype_filters = torch.qint8 >>> >>> q_filters = torch.quantize_per_tensor(filters, scale, zero_point, dtype_filters) >>> q_inputs = torch.quantize_per_tensor(inputs, scale, zero_point, dtype_inputs) >>> qF.conv2d(q_inputs, q_filters, bias, padding=1, scale=scale, zero_point=zero_point) `torch.nn.quantized.functional.conv3d(input, weight, bias, stride=1, padding=0, dilation=1, groups=1, padding_mode='zeros', scale=1.0, zero_point=0, dtype=torch.quint8)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/quantized/functional.html#conv3d) Applies a 3D convolution over a quantized 3D input composed of several input planes. See `Conv3d` for details and output shape. Parameters * **input** – quantized input tensor of shape (minibatch,in_channels,iD,iH,iW)(\text{minibatch} , \text{in\\_channels} , iD , iH , iW) * **weight** – quantized filters of shape (out_channels,in_channelsgroups,kD,kH,kW)(\text{out\\_channels} , \frac{\text{in\\_channels}}{\text{groups}} , kD , kH , kW) * **bias** – **non-quantized** bias tensor of shape (out_channels)(\text{out\\_channels}) . The tensor type must be `torch.float`. * **stride** – the stride of the convolving kernel. Can be a single number or a tuple `(sD, sH, sW)`. Default: 1 * **padding** – implicit paddings on both sides of the input. Can be a single number or a tuple `(padD, padH, padW)`. Default: 0 * **dilation** – the spacing between kernel elements. Can be a single number or a tuple `(dD, dH, dW)`. Default: 1 * **groups** – split input into groups, in_channels\text{in\\_channels} should be divisible by the number of groups. Default: 1 * **padding_mode** – the padding mode to use. Only “zeros” is supported for quantized convolution at the moment. Default: “zeros” * **scale** – quantization scale for the output. Default: 1.0 * **zero_point** – quantization zero_point for the output. Default: 0 * **dtype** – quantization data type to use. 
Default: `torch.quint8` Examples: >>> from torch.nn.quantized import functional as qF >>> filters = torch.randn(8, 4, 3, 3, 3, dtype=torch.float) >>> inputs = torch.randn(1, 4, 5, 5, 5, dtype=torch.float) >>> bias = torch.randn(8, dtype=torch.float) >>> >>> scale, zero_point = 1.0, 0 >>> dtype_inputs = torch.quint8 >>> dtype_filters = torch.qint8 >>> >>> q_filters = torch.quantize_per_tensor(filters, scale, zero_point, dtype_filters) >>> q_inputs = torch.quantize_per_tensor(inputs, scale, zero_point, dtype_inputs) >>> qF.conv3d(q_inputs, q_filters, bias, padding=1, scale=scale, zero_point=zero_point) `torch.nn.quantized.functional.max_pool2d(input, kernel_size, stride=None, padding=0, dilation=1, ceil_mode=False, return_indices=False)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/quantized/functional.html#max_pool2d) Applies a 2D max pooling over a quantized input signal composed of several quantized input planes. Note The input quantization parameters are propagated to the output. See `MaxPool2d` for details. `torch.nn.quantized.functional.adaptive_avg_pool2d(input, output_size)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/quantized/functional.html#adaptive_avg_pool2d) Applies a 2D adaptive average pooling over a quantized input signal composed of several quantized input planes. Note The input quantization parameters propagate to the output. See `AdaptiveAvgPool2d` for details and output shape. Parameters **output_size** – the target output size (single integer or double-integer tuple) `torch.nn.quantized.functional.avg_pool2d(input, kernel_size, stride=None, padding=0, ceil_mode=False, count_include_pad=True, divisor_override=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/quantized/functional.html#avg_pool2d) Applies 2D average-pooling operation in kH×kWkH \times kW regions by step size sH×sWsH \times sW steps. The number of output features is equal to the number of input planes. Note The input quantization parameters propagate to the output. See `AvgPool2d` for details and output shape. Parameters * **input** – quantized input tensor (minibatch,in_channels,iH,iW)(\text{minibatch} , \text{in\\_channels} , iH , iW) * **kernel_size** – size of the pooling region. Can be a single number or a tuple `(kH, kW)` * **stride** – stride of the pooling operation. Can be a single number or a tuple `(sH, sW)`. Default: `kernel_size` * **padding** – implicit zero paddings on both sides of the input. Can be a single number or a tuple `(padH, padW)`. Default: 0 * **ceil_mode** – when True, will use `ceil` instead of `floor` in the formula to compute the output shape. Default: `False` * **count_include_pad** – when True, will include the zero-padding in the averaging calculation. Default: `True` * **divisor_override** – if specified, it will be used as divisor, otherwise size of the pooling region will be used. Default: None `torch.nn.quantized.functional.interpolate(input, size=None, scale_factor=None, mode='nearest', align_corners=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/quantized/functional.html#interpolate) Down/up samples the input to either the given `size` or the given `scale_factor` See [`torch.nn.functional.interpolate()`](nn.functional#torch.nn.functional.interpolate "torch.nn.functional.interpolate") for implementation details. The input dimensions are interpreted in the form: `mini-batch x channels x [optional depth] x [optional height] x width`. Note The input quantization parameters propagate to the output. 
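For illustration, a minimal sketch of a call (the tensor shape, `scale`, and `zero_point` here are arbitrary example values, not values this API requires); the notes and parameters below spell out the supported inputs and modes:
>>> from torch.nn.quantized import functional as qF
>>> # quantize a float tensor, then resize it with nearest-neighbor interpolation
>>> x = torch.quantize_per_tensor(torch.randn(1, 3, 8, 8), scale=1.0, zero_point=0, dtype=torch.quint8)
>>> y = qF.interpolate(x, scale_factor=2.0, mode='nearest')
>>> y.shape
torch.Size([1, 3, 16, 16])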
Note Only 2D/3D input is supported for quantized inputs Note Only the following modes are supported for the quantized inputs: * `bilinear` * `nearest` Parameters * **input** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – the input tensor * **size** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _or_ _Tuple_ _[_[int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _] or_ _Tuple_ _[_[int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,_[int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _] or_ _Tuple_ _[_[int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,_[int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,_[int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _]_) – output spatial size. * **scale_factor** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _or_ _Tuple_ _[_[float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _]_) – multiplier for spatial size. Has to match input size if it is a tuple. * **mode** ([str](https://docs.python.org/3/library/stdtypes.html#str "\(in Python v3.9\)")) – algorithm used for upsampling: `'nearest'` | `'bilinear'` * **align_corners** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – Geometrically, we consider the pixels of the input and output as squares rather than points. If set to `True`, the input and output tensors are aligned by the center points of their corner pixels, preserving the values at the corner pixels. If set to `False`, the input and output tensors are aligned by the corner points of their corner pixels, and the interpolation uses edge value padding for out-of-boundary values, making this operation _independent_ of input size when `scale_factor` is kept the same. This only has an effect when `mode` is `'bilinear'`. Default: `False` `torch.nn.quantized.functional.hardswish(input, scale, zero_point)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/quantized/functional.html#hardswish) This is the quantized version of [`hardswish()`](nn.functional#torch.nn.functional.hardswish "torch.nn.functional.hardswish"). Parameters * **input** – quantized input * **scale** – quantization scale of the output tensor * **zero_point** – quantization zero point of the output tensor `torch.nn.quantized.functional.upsample(input, size=None, scale_factor=None, mode='nearest', align_corners=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/quantized/functional.html#upsample) Upsamples the input to either the given `size` or the given `scale_factor` Warning This function is deprecated in favor of `torch.nn.quantized.functional.interpolate()`. This is equivalent with `nn.quantized.functional.interpolate(...)`. See [`torch.nn.functional.interpolate()`](nn.functional#torch.nn.functional.interpolate "torch.nn.functional.interpolate") for implementation details. The input dimensions are interpreted in the form: `mini-batch x channels x [optional depth] x [optional height] x width`. Note The input quantization parameters propagate to the output. 
Note Only 2D input is supported for quantized inputs Note Only the following modes are supported for the quantized inputs: * `bilinear` * `nearest` Parameters * **input** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – quantized input tensor * **size** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _or_ _Tuple_ _[_[int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _] or_ _Tuple_ _[_[int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,_[int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _] or_ _Tuple_ _[_[int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,_[int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,_[int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _]_) – output spatial size. * **scale_factor** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _or_ _Tuple_ _[_[float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _]_) – multiplier for spatial size. Has to be an integer. * **mode** (_string_) – algorithm used for upsampling: `'nearest'` | `'bilinear'` * **align_corners** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – Geometrically, we consider the pixels of the input and output as squares rather than points. If set to `True`, the input and output tensors are aligned by the center points of their corner pixels, preserving the values at the corner pixels. If set to `False`, the input and output tensors are aligned by the corner points of their corner pixels, and the interpolation uses edge value padding for out-of-boundary values, making this operation _independent_ of input size when `scale_factor` is kept the same. This only has an effect when `mode` is `'bilinear'`. Default: `False` Warning With `align_corners = True`, the linearly interpolating modes (`bilinear`) don’t proportionally align the output and input pixels, and thus the output values can depend on the input size. This was the default behavior for these modes up to version 0.3.1. Since then, the default behavior is `align_corners = False`. See [`Upsample`](generated/torch.nn.upsample#torch.nn.Upsample "torch.nn.Upsample") for concrete examples on how this affects the outputs. `torch.nn.quantized.functional.upsample_bilinear(input, size=None, scale_factor=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/quantized/functional.html#upsample_bilinear) Upsamples the input, using bilinear upsampling. Warning This function is deprecated in favor of `torch.nn.quantized.functional.interpolate()`. This is equivalent with `nn.quantized.functional.interpolate(..., mode='bilinear', align_corners=True)`. Note The input quantization parameters propagate to the output. Note Only 2D inputs are supported Parameters * **input** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – quantized input * **size** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _or_ _Tuple_ _[_[int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,_[int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _]_) – output spatial size. 
* **scale_factor** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _or_ _Tuple_ _[_[int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,_[int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _]_) – multiplier for spatial size `torch.nn.quantized.functional.upsample_nearest(input, size=None, scale_factor=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/quantized/functional.html#upsample_nearest) Upsamples the input, using nearest neighbours’ pixel values. Warning This function is deprecated in favor of `torch.nn.quantized.functional.interpolate()`. This is equivalent with `nn.quantized.functional.interpolate(..., mode='nearest')`. Note The input quantization parameters propagate to the output. Note Only 2D inputs are supported Parameters * **input** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – quantized input * **size** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _or_ _Tuple_ _[_[int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,_[int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _] or_ _Tuple_ _[_[int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,_[int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,_[int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _]_) – output spatial size. * **scale_factor** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – multiplier for spatial size. Has to be an integer. ## ReLU6 `class torch.nn.quantized.ReLU6(inplace=False)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/quantized/modules/activation.html#ReLU6) Applies the element-wise function: ReLU6(x)=min⁡(max⁡(x0,x),q(6))\text{ReLU6}(x) = \min(\max(x_0, x), q(6)) , where x0x_0 is the zero_point, and q(6)q(6) is the quantized representation of number 6. Parameters **inplace** – can optionally do the operation in-place. Default: `False` Shape: * Input: (N,∗)(N, *) where `*` means, any number of additional dimensions * Output: (N,∗)(N, *) , same shape as the input Examples: >>> m = nn.quantized.ReLU6() >>> input = torch.randn(2) >>> input = torch.quantize_per_tensor(input, 1.0, 0, dtype=torch.qint32) >>> output = m(input) ## ELU `class torch.nn.quantized.ELU(scale, zero_point, alpha=1.0)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/quantized/modules/activation.html#ELU) This is the quantized equivalent of [`ELU`](generated/torch.nn.elu#torch.nn.ELU "torch.nn.ELU"). Parameters * **scale** – quantization scale of the output tensor * **zero_point** – quantization zero point of the output tensor * **alpha** – the alpha constant ## Hardswish `class torch.nn.quantized.Hardswish(scale, zero_point)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/quantized/modules/activation.html#Hardswish) This is the quantized version of [`Hardswish`](generated/torch.nn.hardswish#torch.nn.Hardswish "torch.nn.Hardswish"). 
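For illustration, a minimal usage sketch (the `scale` and `zero_point` values, as well as the input quantization parameters, are arbitrary example choices); the constructor parameters are described below:
>>> m = nn.quantized.Hardswish(scale=1.0, zero_point=0)
>>> input = torch.randn(4)
>>> # the module expects a quantized input and produces a quantized output
>>> q_input = torch.quantize_per_tensor(input, scale=1.0, zero_point=0, dtype=torch.quint8)
>>> output = m(q_input)  # output uses the requested scale and zero_point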
Parameters * **scale** – quantization scale of the output tensor * **zero_point** – quantization zero point of the output tensor ## Conv1d `class torch.nn.quantized.Conv1d(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, bias=True, padding_mode='zeros')` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/quantized/modules/conv.html#Conv1d) Applies a 1D convolution over a quantized input signal composed of several quantized input planes. For details on input arguments, parameters, and implementation see [`Conv1d`](generated/torch.nn.conv1d#torch.nn.Conv1d "torch.nn.Conv1d"). Note Only `zeros` is supported for the `padding_mode` argument. Note Only `torch.quint8` is supported for the input data type. Variables * **~Conv1d.weight** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – packed tensor derived from the learnable weight parameter. * **~Conv1d.scale** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – scalar for the output scale * **~Conv1d.zero_point** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – scalar for the output zero point See [`Conv1d`](generated/torch.nn.conv1d#torch.nn.Conv1d "torch.nn.Conv1d") for other attributes. Examples: >>> m = nn.quantized.Conv1d(16, 33, 3, stride=2) >>> input = torch.randn(20, 16, 100) >>> # quantize input to quint8 >>> q_input = torch.quantize_per_tensor(input, scale=1.0, zero_point=0, dtype=torch.quint8) >>> output = m(q_input) `classmethod from_float(mod)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/quantized/modules/conv.html#Conv1d.from_float) Creates a quantized module from a float module or qparams_dict. Parameters **mod** ([Module](generated/torch.nn.module#torch.nn.Module "torch.nn.Module")) – a float module, either produced by torch.quantization utilities or provided by the user ## Conv2d `class torch.nn.quantized.Conv2d(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, bias=True, padding_mode='zeros')` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/quantized/modules/conv.html#Conv2d) Applies a 2D convolution over a quantized input signal composed of several quantized input planes. For details on input arguments, parameters, and implementation see [`Conv2d`](generated/torch.nn.conv2d#torch.nn.Conv2d "torch.nn.Conv2d"). Note Only `zeros` is supported for the `padding_mode` argument. Note Only `torch.quint8` is supported for the input data type. Variables * **~Conv2d.weight** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – packed tensor derived from the learnable weight parameter. * **~Conv2d.scale** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – scalar for the output scale * **~Conv2d.zero_point** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – scalar for the output zero point See [`Conv2d`](generated/torch.nn.conv2d#torch.nn.Conv2d "torch.nn.Conv2d") for other attributes. 
Examples: >>> # With square kernels and equal stride >>> m = nn.quantized.Conv2d(16, 33, 3, stride=2) >>> # non-square kernels and unequal stride and with padding >>> m = nn.quantized.Conv2d(16, 33, (3, 5), stride=(2, 1), padding=(4, 2)) >>> # non-square kernels and unequal stride and with padding and dilation >>> m = nn.quantized.Conv2d(16, 33, (3, 5), stride=(2, 1), padding=(4, 2), dilation=(3, 1)) >>> input = torch.randn(20, 16, 50, 100) >>> # quantize input to quint8 >>> q_input = torch.quantize_per_tensor(input, scale=1.0, zero_point=0, dtype=torch.quint8) >>> output = m(q_input) `classmethod from_float(mod)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/quantized/modules/conv.html#Conv2d.from_float) Creates a quantized module from a float module or qparams_dict. Parameters **mod** ([Module](generated/torch.nn.module#torch.nn.Module "torch.nn.Module")) – a float module, either produced by torch.quantization utilities or provided by the user ## Conv3d `class torch.nn.quantized.Conv3d(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, bias=True, padding_mode='zeros')` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/quantized/modules/conv.html#Conv3d) Applies a 3D convolution over a quantized input signal composed of several quantized input planes. For details on input arguments, parameters, and implementation see [`Conv3d`](generated/torch.nn.conv3d#torch.nn.Conv3d "torch.nn.Conv3d"). Note Only `zeros` is supported for the `padding_mode` argument. Note Only `torch.quint8` is supported for the input data type. Variables * **~Conv3d.weight** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – packed tensor derived from the learnable weight parameter. * **~Conv3d.scale** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – scalar for the output scale * **~Conv3d.zero_point** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – scalar for the output zero point See [`Conv3d`](generated/torch.nn.conv3d#torch.nn.Conv3d "torch.nn.Conv3d") for other attributes. Examples: >>> # With square kernels and equal stride >>> m = nn.quantized.Conv3d(16, 33, 3, stride=2) >>> # non-square kernels and unequal stride and with padding >>> m = nn.quantized.Conv3d(16, 33, (3, 5, 5), stride=(1, 2, 2), padding=(1, 2, 2)) >>> # non-square kernels and unequal stride and with padding and dilation >>> m = nn.quantized.Conv3d(16, 33, (3, 5, 5), stride=(1, 2, 2), padding=(1, 2, 2), dilation=(1, 2, 2)) >>> input = torch.randn(20, 16, 56, 56, 56) >>> # quantize input to quint8 >>> q_input = torch.quantize_per_tensor(input, scale=1.0, zero_point=0, dtype=torch.quint8) >>> output = m(q_input) `classmethod from_float(mod)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/quantized/modules/conv.html#Conv3d.from_float) Creates a quantized module from a float module or qparams_dict. Parameters **mod** ([Module](generated/torch.nn.module#torch.nn.Module "torch.nn.Module")) – a float module, either produced by torch.quantization utilities or provided by the user ## FloatFunctional `class torch.nn.quantized.FloatFunctional` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/quantized/modules/functional_modules.html#FloatFunctional) State collector class for float operations. The instance of this class can be used instead of the `torch.` prefix for some operations. See example usage below. Note This class does not provide a `forward` hook. Instead, you must use one of the underlying functions (e.g. `add`). 
Examples: >>> f_add = FloatFunctional() >>> a = torch.tensor(3.0) >>> b = torch.tensor(4.0) >>> f_add.add(a, b) # Equivalent to ``torch.add(a, b)`` Valid operation names: * add * cat * mul * add_relu * add_scalar * mul_scalar ## QFunctional `class torch.nn.quantized.QFunctional` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/quantized/modules/functional_modules.html#QFunctional) Wrapper class for quantized operations. The instance of this class can be used instead of the `torch.ops.quantized` prefix. See example usage below. Note This class does not provide a `forward` hook. Instead, you must use one of the underlying functions (e.g. `add`). Examples: >>> q_add = QFunctional() >>> a = torch.quantize_per_tensor(torch.tensor(3.0), 1.0, 0, torch.qint32) >>> b = torch.quantize_per_tensor(torch.tensor(4.0), 1.0, 0, torch.qint32) >>> q_add.add(a, b) # Equivalent to ``torch.ops.quantized.add(a, b, 1.0, 0)`` Valid operation names: * add * cat * mul * add_relu * add_scalar * mul_scalar ## Quantize `class torch.nn.quantized.Quantize(scale, zero_point, dtype)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/quantized/modules.html#Quantize) Quantizes an incoming tensor Parameters * **scale** – scale of the output Quantized Tensor * **zero_point** – zero_point of output Quantized Tensor * **dtype** – data type of output Quantized Tensor Variables **zero_point, dtype** (_`scale`__,_) – Examples:: >>> t = torch.tensor([[1., -1.], [1., -1.]]) >>> scale, zero_point, dtype = 1.0, 2, torch.qint8 >>> qm = Quantize(scale, zero_point, dtype) >>> qt = qm(t) >>> print(qt) tensor([[ 1., -1.], [ 1., -1.]], size=(2, 2), dtype=torch.qint8, scale=1.0, zero_point=2) ## DeQuantize `class torch.nn.quantized.DeQuantize` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/quantized/modules.html#DeQuantize) Dequantizes an incoming tensor Examples:: >>> input = torch.tensor([[1., -1.], [1., -1.]]) >>> scale, zero_point, dtype = 1.0, 2, torch.qint8 >>> qm = Quantize(scale, zero_point, dtype) >>> quantized_input = qm(input) >>> dqm = DeQuantize() >>> dequantized = dqm(quantized_input) >>> print(dequantized) tensor([[ 1., -1.], [ 1., -1.]], dtype=torch.float32) ## Linear `class torch.nn.quantized.Linear(in_features, out_features, bias_=True, dtype=torch.qint8)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/quantized/modules/linear.html#Linear) A quantized linear module with quantized tensor as inputs and outputs. We adopt the same interface as `torch.nn.Linear`, please see for documentation. Similar to [`Linear`](generated/torch.nn.linear#torch.nn.Linear "torch.nn.Linear"), attributes will be randomly initialized at module creation time and will be overwritten later Variables * **~Linear.weight** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – the non-learnable quantized weights of the module of shape (out_features,in_features)(\text{out\\_features}, \text{in\\_features}) . * **~Linear.bias** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – the non-learnable bias of the module of shape (out_features)(\text{out\\_features}) . If `bias` is `True`, the values are initialized to zero. 
* **~Linear.scale** – `scale` parameter of output Quantized Tensor, type: double * **~Linear.zero_point** – `zero_point` parameter for output Quantized Tensor, type: long Examples: >>> m = nn.quantized.Linear(20, 30) >>> input = torch.randn(128, 20) >>> input = torch.quantize_per_tensor(input, 1.0, 0, torch.quint8) >>> output = m(input) >>> print(output.size()) torch.Size([128, 30]) `classmethod from_float(mod)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/quantized/modules/linear.html#Linear.from_float) Create a quantized module from a float module or qparams_dict Parameters **mod** ([Module](generated/torch.nn.module#torch.nn.Module "torch.nn.Module")) – a float module, either produced by torch.quantization utilities or provided by the user ## BatchNorm2d `class torch.nn.quantized.BatchNorm2d(num_features, eps=1e-05, momentum=0.1)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/quantized/modules/batchnorm.html#BatchNorm2d) This is the quantized version of [`BatchNorm2d`](generated/torch.nn.batchnorm2d#torch.nn.BatchNorm2d "torch.nn.BatchNorm2d"). ## BatchNorm3d `class torch.nn.quantized.BatchNorm3d(num_features, eps=1e-05, momentum=0.1)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/quantized/modules/batchnorm.html#BatchNorm3d) This is the quantized version of [`BatchNorm3d`](generated/torch.nn.batchnorm3d#torch.nn.BatchNorm3d "torch.nn.BatchNorm3d"). ## LayerNorm `class torch.nn.quantized.LayerNorm(normalized_shape, weight, bias, scale, zero_point, eps=1e-05, elementwise_affine=True)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/quantized/modules/normalization.html#LayerNorm) This is the quantized version of [`LayerNorm`](generated/torch.nn.layernorm#torch.nn.LayerNorm "torch.nn.LayerNorm"). Additional args: * **scale** \- quantization scale of the output, type: double. * **zero_point** \- quantization zero point of the output, type: long. ## GroupNorm `class torch.nn.quantized.GroupNorm(num_groups, num_channels, weight, bias, scale, zero_point, eps=1e-05, affine=True)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/quantized/modules/normalization.html#GroupNorm) This is the quantized version of [`GroupNorm`](generated/torch.nn.groupnorm#torch.nn.GroupNorm "torch.nn.GroupNorm"). Additional args: * **scale** \- quantization scale of the output, type: double. * **zero_point** \- quantization zero point of the output, type: long. ## InstanceNorm1d `class torch.nn.quantized.InstanceNorm1d(num_features, weight, bias, scale, zero_point, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/quantized/modules/normalization.html#InstanceNorm1d) This is the quantized version of [`InstanceNorm1d`](generated/torch.nn.instancenorm1d#torch.nn.InstanceNorm1d "torch.nn.InstanceNorm1d"). Additional args: * **scale** \- quantization scale of the output, type: double. * **zero_point** \- quantization zero point of the output, type: long. ## InstanceNorm2d `class torch.nn.quantized.InstanceNorm2d(num_features, weight, bias, scale, zero_point, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/quantized/modules/normalization.html#InstanceNorm2d) This is the quantized version of [`InstanceNorm2d`](generated/torch.nn.instancenorm2d#torch.nn.InstanceNorm2d "torch.nn.InstanceNorm2d"). Additional args: * **scale** \- quantization scale of the output, type: double. 
* **zero_point** \- quantization zero point of the output, type: long. ## InstanceNorm3d `class torch.nn.quantized.InstanceNorm3d(num_features, weight, bias, scale, zero_point, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/quantized/modules/normalization.html#InstanceNorm3d) This is the quantized version of [`InstanceNorm3d`](generated/torch.nn.instancenorm3d#torch.nn.InstanceNorm3d "torch.nn.InstanceNorm3d"). Additional args: * **scale** \- quantization scale of the output, type: double. * **zero_point** \- quantization zero point of the output, type: long. ## Embedding `class torch.nn.quantized.Embedding(num_embeddings, embedding_dim, padding_idx=None, max_norm=None, norm_type=2.0, scale_grad_by_freq=False, sparse=False, _weight=None, dtype=torch.quint8)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/quantized/modules/embedding_ops.html#Embedding) A quantized Embedding module with quantized packed weights as inputs. We adopt the same interface as `torch.nn.Embedding`, please see for documentation. Similar to [`Embedding`](generated/torch.nn.embedding#torch.nn.Embedding "torch.nn.Embedding"), attributes will be randomly initialized at module creation time and will be overwritten later Variables **~Embedding.weight** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – the non-learnable quantized weights of the module of shape (num_embeddings,embedding_dim)(\text{num\\_embeddings}, \text{embedding\\_dim}) . Examples:: >>> m = nn.quantized.Embedding(num_embeddings=10, embedding_dim=12) >>> indices = torch.tensor([9, 6, 5, 7, 8, 8, 9, 2, 8]) >>> output = m(indices) >>> print(output.size()) torch.Size([9, 12] `classmethod from_float(mod)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/quantized/modules/embedding_ops.html#Embedding.from_float) Create a quantized embedding module from a float module Parameters **mod** ([Module](generated/torch.nn.module#torch.nn.Module "torch.nn.Module")) – a float module, either produced by torch.quantization utilities or provided by user ## EmbeddingBag `class torch.nn.quantized.EmbeddingBag(num_embeddings, embedding_dim, max_norm=None, norm_type=2.0, scale_grad_by_freq=False, mode='sum', sparse=False, _weight=None, include_last_offset=False, dtype=torch.quint8)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/quantized/modules/embedding_ops.html#EmbeddingBag) A quantized EmbeddingBag module with quantized packed weights as inputs. We adopt the same interface as `torch.nn.EmbeddingBag`, please see for documentation. Similar to [`EmbeddingBag`](generated/torch.nn.embeddingbag#torch.nn.EmbeddingBag "torch.nn.EmbeddingBag"), attributes will be randomly initialized at module creation time and will be overwritten later Variables **~EmbeddingBag.weight** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – the non-learnable quantized weights of the module of shape (num_embeddings,embedding_dim)(\text{num\\_embeddings}, \text{embedding\\_dim}) . 
Examples:: >>> m = nn.quantized.EmbeddingBag(num_embeddings=10, embedding_dim=12, include_last_offset=True, mode='sum') >>> indices = torch.tensor([9, 6, 5, 7, 8, 8, 9, 2, 8, 6, 6, 9, 1, 6, 8, 8, 3, 2, 3, 6, 3, 6, 5, 7, 0, 8, 4, 6, 5, 8, 2, 3]) >>> offsets = torch.tensor([0, 19, 20, 28, 28, 32]) >>> output = m(indices, offsets) >>> print(output.size()) torch.Size([5, 12] `classmethod from_float(mod)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/quantized/modules/embedding_ops.html#EmbeddingBag.from_float) Create a quantized embedding_bag module from a float module Parameters **mod** ([Module](generated/torch.nn.module#torch.nn.Module "torch.nn.Module")) – a float module, either produced by torch.quantization utilities or provided by user # torch.overrides This module exposes various helper functions for the `__torch_function__` protocol. See [Extending torch](https://pytorch.org/docs/1.8.0/notes/extending.html#extending-torch) for more detail on the `__torch_function__` protocol. ## Functions `torch.overrides.get_ignored_functions()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/overrides.html#get_ignored_functions) Return public functions that cannot be overridden by `__torch_function__`. Returns A tuple of functions that are publicly available in the torch API but cannot be overridden with `__torch_function__`. Mostly this is because none of the arguments of these functions are tensors or tensor-likes. Return type Set[Callable] #### Examples >>> torch.Tensor.as_subclass in torch.overrides.get_ignored_functions() True >>> torch.add in torch.overrides.get_ignored_functions() False `torch.overrides.get_overridable_functions()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/overrides.html#get_overridable_functions) List functions that are overridable via __torch_function__ Returns A dictionary that maps namespaces that contain overridable functions to functions in that namespace that can be overridden. Return type Dict[Any, List[Callable]] `torch.overrides.get_testing_overrides()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/overrides.html#get_testing_overrides) Return a dict containing dummy overrides for all overridable functions Returns A dictionary that maps overridable functions in the PyTorch API to lambda functions that have the same signature as the real function and unconditionally return -1. These lambda functions are useful for testing API coverage for a type that defines `__torch_function__`. Return type Dict[Callable, Callable] #### Examples >>> import inspect >>> my_add = torch.overrides.get_testing_overrides()[torch.add] >>> inspect.signature(my_add) `torch.overrides.handle_torch_function(public_api, relevant_args, *args, **kwargs)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/overrides.html#handle_torch_function) Implement a function with checks for `__torch_function__` overrides. See torch::autograd::handle_torch_function for the equivalent of this function in the C++ implementation. Parameters * **public_api** (_function_) – Function exposed by the public torch API originally called like `public_api(*args, **kwargs)` on which arguments are now being checked. * **relevant_args** (_iterable_) – Iterable of arguments to check for __torch_function__ methods. * **args** ([tuple](https://docs.python.org/3/library/stdtypes.html#tuple "\(in Python v3.9\)")) – Arbitrary positional arguments originally passed into `public_api`. 
* **kwargs** ([tuple](https://docs.python.org/3/library/stdtypes.html#tuple "\(in Python v3.9\)")) – Arbitrary keyword arguments originally passed into `public_api`.

Returns

Result from calling `implementation` or an `__torch_function__` method, as appropriate.

Return type

[object](https://docs.python.org/3/library/functions.html#object "\(in Python v3.9\)")

Raises

**TypeError** – if no implementation is found.

#### Example

>>> def func(a):
...     if type(a) is not torch.Tensor:  # This will make func dispatchable by __torch_function__
...         return handle_torch_function(func, (a,), a)
...     return a + 0

`torch.overrides.has_torch_function()`

Check for __torch_function__ implementations in the elements of an iterable. Considers exact `Tensor`s and `Parameter`s non-dispatchable.

Parameters

**relevant_args** (_iterable_) – Iterable of arguments to check for __torch_function__ methods.

Returns

True if any of the elements of relevant_args have __torch_function__ implementations, False otherwise.

Return type

[bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")

See also

`torch.is_tensor_like()`

Checks if something is a Tensor-like, including an exact `Tensor`.

`torch.overrides.is_tensor_like(inp)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/overrides.html#is_tensor_like)

Returns `True` if the passed-in input is a Tensor-like.

Currently, this occurs whenever there’s a `__torch_function__` attribute on the type of the input.

#### Examples

A subclass of tensor is generally a Tensor-like.

>>> class SubTensor(torch.Tensor): ...
>>> is_tensor_like(SubTensor([0]))
True

Built-in or user types aren’t usually Tensor-like.

>>> is_tensor_like(6)
False
>>> is_tensor_like(None)
False
>>> class NotATensor: ...
>>> is_tensor_like(NotATensor())
False

But they can be made Tensor-like by implementing __torch_function__.

>>> class TensorLike:
...     def __torch_function__(self, func, types, args, kwargs):
...         return -1
>>> is_tensor_like(TensorLike())
True

`torch.overrides.is_tensor_method_or_property(func)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/overrides.html#is_tensor_method_or_property)

Returns True if the function passed in is a handler for a method or property belonging to `torch.Tensor`, as passed into `__torch_function__`.

Note

For properties, their `__get__` method must be passed in.

This may be needed, in particular, for the following reasons:

1. Methods/properties sometimes don’t contain a `__module__` slot.
2. They require that the first passed-in argument is an instance of `torch.Tensor`.

#### Examples

>>> is_tensor_method_or_property(torch.Tensor.add)
True
>>> is_tensor_method_or_property(torch.add)
False

`torch.overrides.wrap_torch_function(dispatcher)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/overrides.html#wrap_torch_function)

Wraps a given function with `__torch_function__`-related functionality.

Parameters

**dispatcher** (_Callable_) – A callable that returns an iterable of Tensor-likes passed into the function.

Note

This decorator may reduce the performance of your code. Generally, it’s enough to express your code as a series of functions that, themselves, support __torch_function__. If you find yourself in the rare situation where this is not the case, e.g. if you’re wrapping a low-level library and you also need it to work for Tensor-likes, then this function is available.

#### Examples

>>> def dispatcher(a):  # Must have the same signature as func
...     return (a,)
>>> @torch.overrides.wrap_torch_function(dispatcher)
... def func(a):  # This will make func dispatchable by __torch_function__
...     return a + 0

# torch.quantization

This module implements the functions you call directly to convert your model from FP32 to quantized form. For example, `prepare()` is used in post-training quantization to prepare your model for the calibration step, and `convert()` actually converts the weights to int8 and replaces the operations with their quantized counterparts. There are other helper functions for things like quantizing the input to your model and performing critical fusions like conv+relu.

## Top-level quantization APIs

`torch.quantization.quantize(model, run_fn, run_args, mapping=None, inplace=False)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/quantization/quantize.html#quantize)

Quantize the input float model with post training static quantization.

First it will prepare the model for calibration, then it calls `run_fn` which will run the calibration step, and after that it will convert the model to a quantized model.

Parameters

* **model** – input float model
* **run_fn** – a calibration function for calibrating the prepared model
* **run_args** – positional arguments for `run_fn`
* **inplace** – carry out model transformations in-place, the original module is mutated
* **mapping** – correspondence between original module types and quantized counterparts

Returns

Quantized model.

`torch.quantization.quantize_dynamic(model, qconfig_spec=None, dtype=torch.qint8, mapping=None, inplace=False)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/quantization/quantize.html#quantize_dynamic)

Converts a float model to a dynamic (i.e. weights-only) quantized model.

Replaces specified modules with dynamic weight-only quantized versions and outputs the quantized model.

For the simplest usage, provide the `dtype` argument, which can be float16 or qint8. Weight-only quantization is by default performed for layers with large weights, i.e. Linear and RNN variants (see the usage sketch after `quantize_qat()` below).

Fine-grained control is possible with `qconfig` and `mapping`, which act similarly to `quantize()`. If `qconfig` is provided, the `dtype` argument is ignored.

Parameters

* **model** – input model
* **qconfig_spec** – Either:
  * A dictionary that maps from name or type of submodule to quantization configuration; qconfig applies to all submodules of a given module unless qconfig for the submodules is specified (when the submodule already has a qconfig attribute). Entries in the dictionary need to be QConfigDynamic instances.
  * A set of types and/or submodule names to apply dynamic quantization to, in which case the `dtype` argument is used to specify the bit-width
* **inplace** – carry out model transformations in-place, the original module is mutated
* **mapping** – maps the type of a submodule to the type of the corresponding dynamically quantized version with which the submodule needs to be replaced

`torch.quantization.quantize_qat(model, run_fn, run_args, inplace=False)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/quantization/quantize.html#quantize_qat)

Do quantization aware training and output a quantized model.

Parameters

* **model** – input model
* **run_fn** – a function for evaluating the prepared model, can be a function that simply runs the prepared model or a training loop
* **run_args** – positional arguments for `run_fn`

Returns

Quantized model.
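To make the dynamic quantization API above concrete, here is a minimal sketch that quantizes the `nn.Linear` layers of a toy model to `qint8`; the `Net` module and tensor shapes are invented for illustration only.

import torch
import torch.nn as nn

# A toy float model (hypothetical, for illustration)
class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(64, 32)
        self.fc2 = nn.Linear(32, 10)

    def forward(self, x):
        return self.fc2(torch.relu(self.fc1(x)))

model_fp32 = Net().eval()

# Replace the Linear modules with dynamic (weights-only) quantized versions
model_int8 = torch.quantization.quantize_dynamic(
    model_fp32, qconfig_spec={nn.Linear}, dtype=torch.qint8
)

out = model_int8(torch.randn(4, 64))  # inference now uses int8 weights

Only the weights are quantized ahead of time; activations are quantized on the fly at runtime, which is why no calibration step is needed here.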
`torch.quantization.prepare(model, inplace=False, allow_list=None, observer_non_leaf_module_list=None, prepare_custom_config_dict=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/quantization/quantize.html#prepare)

Prepares a copy of the model for quantization calibration or quantization-aware training.

Quantization configuration should be assigned preemptively to individual submodules via the `.qconfig` attribute.

Observer or fake-quant modules will be attached to the model, and qconfig will be propagated.

Parameters

* **model** – input model to be modified in-place
* **inplace** – carry out model transformations in-place, the original module is mutated
* **allow_list** – list of quantizable modules
* **observer_non_leaf_module_list** – list of non-leaf modules we want to add observers to
* **prepare_custom_config_dict** – customization configuration dictionary for the prepare function

# Example of prepare_custom_config_dict:
prepare_custom_config_dict = {
    # user will manually define the corresponding observed
    # module class which has a from_float class method that converts
    # float custom module to observed custom module
    "float_to_observed_custom_module_class": {
        CustomModule: ObservedCustomModule
    }
}

`torch.quantization.prepare_qat(model, mapping=None, inplace=False)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/quantization/quantize.html#prepare_qat)

Prepares a copy of the model for quantization calibration or quantization-aware training and converts it to a quantized version.

Quantization configuration should be assigned preemptively to individual submodules via the `.qconfig` attribute.

Parameters

* **model** – input model to be modified in-place
* **mapping** – dictionary that maps float modules to the quantized modules that replace them
* **inplace** – carry out model transformations in-place, the original module is mutated

`torch.quantization.convert(module, mapping=None, inplace=False, remove_qconfig=True, convert_custom_config_dict=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/quantization/quantize.html#convert)

Converts submodules of the input module to a different module according to `mapping` by calling the `from_float` method on the target module class. Also removes qconfig at the end if `remove_qconfig` is set to `True`.

Parameters

* **module** – prepared and calibrated module
* **mapping** – a dictionary that maps from source module type to target module type; can be overwritten to allow swapping of user-defined Modules
* **inplace** – carry out model transformations in-place, the original module is mutated
* **convert_custom_config_dict** – custom configuration dictionary for the convert function

# Example of convert_custom_config_dict:
convert_custom_config_dict = {
    # user will manually define the corresponding quantized
    # module class which has a from_observed class method that converts
    # observed custom module to quantized custom module
    "observed_to_quantized_custom_module_class": {
        ObservedCustomModule: QuantizedCustomModule
    }
}

`class torch.quantization.QConfig` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/quantization/qconfig.html#QConfig)

Describes how to quantize a layer or a part of the network by providing settings (observer classes) for activations and weights respectively.

Note that QConfig needs to contain observer **classes** (like MinMaxObserver) or a callable that returns instances on invocation, not the concrete observer instances themselves. The quantization preparation function will instantiate observers multiple times for each of the layers.
Observer classes usually have reasonable default arguments, but they can be overridden with the `with_args` method (which behaves like functools.partial):

my_qconfig = QConfig(activation=MinMaxObserver.with_args(dtype=torch.qint8),
                     weight=default_observer.with_args(dtype=torch.qint8))

`class torch.quantization.QConfigDynamic` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/quantization/qconfig.html#QConfigDynamic)

Describes how to dynamically quantize a layer or a part of the network by providing settings (observer classes) for weights. It’s like QConfig, but for dynamic quantization.

Note that QConfigDynamic needs to contain observer **classes** (like MinMaxObserver) or a callable that returns instances on invocation, not the concrete observer instances themselves. The quantization function will instantiate observers multiple times for each of the layers.

Observer classes usually have reasonable default arguments, but they can be overridden with the `with_args` method (which behaves like functools.partial):

my_qconfig = QConfigDynamic(weight=default_observer.with_args(dtype=torch.qint8))

## Preparing model for quantization

`torch.quantization.fuse_modules(model, modules_to_fuse, inplace=False, fuser_func=<function fuse_known_modules>, fuse_custom_config_dict=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/quantization/fuse_modules.html#fuse_modules)

Fuses a list of modules into a single module.

Fuses only the following sequences of modules:

* conv, bn
* conv, bn, relu
* conv, relu
* linear, relu
* bn, relu

All other sequences are left unchanged. For these sequences, the first item in the list is replaced with the fused module, and the rest of the modules are replaced with identity.

Parameters

* **model** – Model containing the modules to be fused
* **modules_to_fuse** – list of lists of module names to fuse. Can also be a single list of strings if there is only one group of modules to fuse.
* **inplace** – bool specifying if fusion happens in place on the model; by default a new model is returned
* **fuser_func** – Function that takes in a list of modules and outputs a list of fused modules of the same length. For example, fuser_func([convModule, BNModule]) returns the list [ConvBNModule, nn.Identity()]. Defaults to torch.quantization.fuse_known_modules
* **fuse_custom_config_dict** – custom configuration for fusion

# Example of fuse_custom_config_dict
fuse_custom_config_dict = {
    # Additional fuser_method mapping
    "additional_fuser_method_mapping": {
        (torch.nn.Conv2d, torch.nn.BatchNorm2d): fuse_conv_bn
    },
}

Returns

Model with fused modules. A new copy is created if `inplace=False`.

Examples:

>>> m = myModel()
>>> # m is a module containing the sub-modules below
>>> modules_to_fuse = [['conv1', 'bn1', 'relu1'], ['submodule.conv', 'submodule.relu']]
>>> fused_m = torch.quantization.fuse_modules(m, modules_to_fuse)
>>> output = fused_m(input)

>>> m = myModel()
>>> # Alternately, provide a single list of modules to fuse
>>> modules_to_fuse = ['conv1', 'bn1', 'relu1']
>>> fused_m = torch.quantization.fuse_modules(m, modules_to_fuse)
>>> output = fused_m(input)

`class torch.quantization.QuantStub(qconfig=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/quantization/stubs.html#QuantStub)

Quantize stub module. Before calibration, this is the same as an observer; it will be swapped to `nnq.Quantize` in `convert`.
Parameters

**qconfig** – quantization configuration for the tensor; if qconfig is not provided, we will get the qconfig from the parent modules

`class torch.quantization.DeQuantStub` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/quantization/stubs.html#DeQuantStub)

Dequantize stub module. Before calibration, this is the same as identity; it will be swapped to `nnq.DeQuantize` in `convert`.

`class torch.quantization.QuantWrapper(module)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/quantization/stubs.html#QuantWrapper)

A wrapper class that wraps the input module, adds QuantStub and DeQuantStub, and surrounds the call to the module with calls to the quant and dequant modules.

This is used by the `quantization` utility functions to add the quant and dequant modules. Before `convert`, `QuantStub` is just an observer: it observes the input tensor. After `convert`, `QuantStub` is swapped to `nnq.Quantize`, which does the actual quantization. Similarly for `DeQuantStub`.

`torch.quantization.add_quant_dequant(module)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/quantization/quantize.html#add_quant_dequant)

Wrap the leaf child modules in QuantWrapper if they have a valid qconfig. Note that this function will modify the children of the module in place, and it can also return a new module which wraps the input module.

Parameters

**module** – input module with qconfig attributes for all the leaf modules that we want to quantize

Returns

Either the in-place modified module with submodules wrapped in `QuantWrapper` based on qconfig, or a new `QuantWrapper` module which wraps the input module; the latter case only happens when the input module is a leaf module and we want to quantize it.

## Utility functions

`torch.quantization.add_observer_(module, qconfig_propagation_list=None, non_leaf_module_list=None, device=None, custom_module_class_mapping=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/quantization/quantize.html#add_observer_)

Add observers for the leaf children of the module.

This function inserts an observer module into every leaf child module that has a valid qconfig attribute.

Parameters

* **module** – input module with qconfig attributes for all the leaf modules that we want to quantize
* **device** – parent device, if any
* **non_leaf_module_list** – list of non-leaf modules we want to add observers to

Returns

None, module is modified in place with added observer modules and forward_hooks

`torch.quantization.swap_module(mod, mapping, custom_module_class_mapping)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/quantization/quantize.html#swap_module)

Swaps the module if it has a quantized counterpart and it has an `observer` attached.
Parameters

* **mod** – input module
* **mapping** – a dictionary that maps from nn module to nnq module

Returns

The corresponding quantized module of `mod`

`torch.quantization.propagate_qconfig_(module, qconfig_dict=None, allow_list=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/quantization/quantize.html#propagate_qconfig_)

Propagate qconfig through the module hierarchy and assign a `qconfig` attribute on each leaf module.

Parameters

* **module** – input module
* **qconfig_dict** – dictionary that maps from name or type of submodule to quantization configuration; qconfig applies to all submodules of a given module unless qconfig for the submodules is specified (when the submodule already has a qconfig attribute)

Returns

None, module is modified in place with qconfig attached

`torch.quantization.default_eval_fn(model, calib_data)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/quantization.html#default_eval_fn)

Default evaluation function: takes a torch.utils.data.Dataset or a list of input Tensors and runs the model on the dataset.

## Observers

`class torch.quantization.ObserverBase(dtype)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/quantization/observer.html#ObserverBase)

Base observer Module. Any observer implementation should derive from this class.

Concrete observers should follow the same API. In `forward`, they will update the statistics of the observed Tensor, and they should provide a `calculate_qparams` function that computes the quantization parameters given the collected statistics.

Parameters

**dtype** – Quantized data type

`classmethod with_args(**kwargs)`

Wrapper that allows creation of class factories.

This can be useful when there is a need to create classes with the same constructor arguments, but different instances.

Example:

>>> Foo.with_args = classmethod(_with_args)
>>> foo_builder = Foo.with_args(a=3, b=4).with_args(answer=42)
>>> foo_instance1 = foo_builder()
>>> foo_instance2 = foo_builder()
>>> id(foo_instance1) == id(foo_instance2)
False

`class torch.quantization.MinMaxObserver(dtype=torch.quint8, qscheme=torch.per_tensor_affine, reduce_range=False, quant_min=None, quant_max=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/quantization/observer.html#MinMaxObserver)

Observer module for computing the quantization parameters based on the running min and max values.

This observer uses the tensor min/max statistics to compute the quantization parameters. The module records the running minimum and maximum of incoming tensors, and uses these statistics to compute the quantization parameters.

Parameters

* **dtype** – Quantized data type
* **qscheme** – Quantization scheme to be used
* **reduce_range** – Reduces the range of the quantized data type by 1 bit
* **quant_min** – Minimum quantization value. If unspecified, it will follow the 8-bit setup.
* **quant_max** – Maximum quantization value. If unspecified, it will follow the 8-bit setup.
Given the running min/max $x_\text{min}$ and $x_\text{max}$, the scale $s$ and zero point $z$ are computed as follows.

The running minimum/maximum $x_\text{min/max}$ is computed as:

$$
x_\text{min} = \begin{cases}
    \min(X) & \text{if } x_\text{min} = \text{None} \\
    \min\left(x_\text{min}, \min(X)\right) & \text{otherwise}
\end{cases}
\qquad
x_\text{max} = \begin{cases}
    \max(X) & \text{if } x_\text{max} = \text{None} \\
    \max\left(x_\text{max}, \max(X)\right) & \text{otherwise}
\end{cases}
$$

where $X$ is the observed tensor.

The scale $s$ and zero point $z$ are then computed as:

$$
\begin{aligned}
\text{if Symmetric:} \quad
    & s = 2 \max(|x_\text{min}|, x_\text{max}) / \left(Q_\text{max} - Q_\text{min}\right) \\
    & z = \begin{cases} 0 & \text{if dtype is qint8} \\ 128 & \text{otherwise} \end{cases} \\
\text{Otherwise:} \quad
    & s = \left(x_\text{max} - x_\text{min}\right) / \left(Q_\text{max} - Q_\text{min}\right) \\
    & z = Q_\text{min} - \text{round}(x_\text{min} / s)
\end{aligned}
$$

where $Q_\text{min}$ and $Q_\text{max}$ are the minimum and maximum of the quantized data type.

Warning

Only works with `torch.per_tensor_symmetric` quantization scheme

Warning

`dtype` can only take `torch.qint8` or `torch.quint8`.

Note

If the running minimum equals the running maximum, the scale and zero_point are set to 1.0 and 0.

`class torch.quantization.MovingAverageMinMaxObserver(averaging_constant=0.01, dtype=torch.quint8, qscheme=torch.per_tensor_affine, reduce_range=False, quant_min=None, quant_max=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/quantization/observer.html#MovingAverageMinMaxObserver)

Observer module for computing the quantization parameters based on the moving average of the min and max values.

This observer computes the quantization parameters based on the moving averages of the minimums and maximums of the incoming tensors. The module records the average minimum and maximum of incoming tensors, and uses these statistics to compute the quantization parameters.

Parameters

* **averaging_constant** – Averaging constant for min/max.
* **dtype** – Quantized data type
* **qscheme** – Quantization scheme to be used
* **reduce_range** – Reduces the range of the quantized data type by 1 bit
* **quant_min** – Minimum quantization value. If unspecified, it will follow the 8-bit setup.
* **quant_max** – Maximum quantization value. If unspecified, it will follow the 8-bit setup.

The moving average min/max is computed as follows:

$$
\begin{aligned}
x_\text{min} &= \begin{cases}
    \min(X) & \text{if } x_\text{min} = \text{None} \\
    (1 - c)\, x_\text{min} + c \min(X) & \text{otherwise}
\end{cases} \\
x_\text{max} &= \begin{cases}
    \max(X) & \text{if } x_\text{max} = \text{None} \\
    (1 - c)\, x_\text{max} + c \max(X) & \text{otherwise}
\end{cases}
\end{aligned}
$$

where $x_\text{min/max}$ is the running average min/max, $X$ is the incoming tensor, and $c$ is the `averaging_constant`.

The scale and zero point are then computed as in `MinMaxObserver`.

Note

Only works with `torch.per_tensor_affine` quantization scheme.

Note

If the running minimum equals the running maximum, the scale and zero_point are set to 1.0 and 0.
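As a concrete illustration of the observer API described above, the following sketch feeds a couple of arbitrary, made-up tensors through a `MinMaxObserver` and reads back the computed quantization parameters; the moving-average variant is used the same way.

import torch
from torch.quantization import MinMaxObserver, MovingAverageMinMaxObserver

# Record running min/max over a few batches of activations
obs = MinMaxObserver(dtype=torch.quint8, qscheme=torch.per_tensor_affine)
obs(torch.randn(4, 8))          # forward() updates the running statistics
obs(torch.randn(4, 8) * 2.0)

scale, zero_point = obs.calculate_qparams()
print(scale, zero_point)        # per-tensor scale and zero point

# The moving-average variant smooths the statistics across observations
ma_obs = MovingAverageMinMaxObserver(averaging_constant=0.01)
ma_obs(torch.randn(4, 8))
print(ma_obs.calculate_qparams())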
`class torch.quantization.PerChannelMinMaxObserver(ch_axis=0, dtype=torch.quint8, qscheme=torch.per_channel_affine, reduce_range=False, quant_min=None, quant_max=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/quantization/observer.html#PerChannelMinMaxObserver)

Observer module for computing the quantization parameters based on the running per-channel min and max values.

This observer uses the tensor min/max statistics to compute the per-channel quantization parameters. The module records the running minimum and maximum of incoming tensors, and uses these statistics to compute the quantization parameters.

Parameters

* **ch_axis** – Channel axis
* **dtype** – Quantized data type
* **qscheme** – Quantization scheme to be used
* **reduce_range** – Reduces the range of the quantized data type by 1 bit
* **quant_min** – Minimum quantization value. If unspecified, it will follow the 8-bit setup.
* **quant_max** – Maximum quantization value. If unspecified, it will follow the 8-bit setup.

The quantization parameters are computed the same way as in `MinMaxObserver`, with the difference that the running min/max values are stored per channel. Scales and zero points are thus computed per channel as well.

Note

If the running minimum equals the running maximum, the scales and zero_points are set to 1.0 and 0.

`class torch.quantization.MovingAveragePerChannelMinMaxObserver(averaging_constant=0.01, ch_axis=0, dtype=torch.quint8, qscheme=torch.per_channel_affine, reduce_range=False, quant_min=None, quant_max=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/quantization/observer.html#MovingAveragePerChannelMinMaxObserver)

Observer module for computing the quantization parameters based on the running per-channel min and max values.

This observer uses the tensor min/max statistics to compute the per-channel quantization parameters. The module records the running minimum and maximum of incoming tensors, and uses these statistics to compute the quantization parameters.

Parameters

* **averaging_constant** – Averaging constant for min/max.
* **ch_axis** – Channel axis
* **dtype** – Quantized data type
* **qscheme** – Quantization scheme to be used
* **reduce_range** – Reduces the range of the quantized data type by 1 bit
* **quant_min** – Minimum quantization value. If unspecified, it will follow the 8-bit setup.
* **quant_max** – Maximum quantization value. If unspecified, it will follow the 8-bit setup.

The quantization parameters are computed the same way as in `MovingAverageMinMaxObserver`, with the difference that the running min/max values are stored per channel. Scales and zero points are thus computed per channel as well.

Note

If the running minimum equals the running maximum, the scales and zero_points are set to 1.0 and 0.

`class torch.quantization.HistogramObserver(bins=2048, upsample_rate=128, dtype=torch.quint8, qscheme=torch.per_tensor_affine, reduce_range=False)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/quantization/observer.html#HistogramObserver)

The module records the running histogram of tensor values along with min/max values. `calculate_qparams` will calculate the scale and zero_point.
Parameters

* **bins** – Number of bins to use for the histogram
* **upsample_rate** – Factor by which the histograms are upsampled; this is used to interpolate histograms with varying ranges across observations
* **dtype** – Quantized data type
* **qscheme** – Quantization scheme to be used
* **reduce_range** – Reduces the range of the quantized data type by 1 bit

The scale and zero point are computed as follows:

1. Create the histogram of the incoming inputs. The histogram is computed continuously, and the ranges per bin change with every new tensor observed.
2. Search the distribution in the histogram for optimal min/max values. The search for the min/max values ensures the minimization of the quantization error with respect to the floating point model.
3. Compute the scale and zero point the same way as in the `MinMaxObserver`.

`class torch.quantization.FakeQuantize(observer=<class 'torch.quantization.observer.MovingAverageMinMaxObserver'>, quant_min=0, quant_max=255, **observer_kwargs)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/quantization/fake_quantize.html#FakeQuantize)

Simulates the quantize and dequantize operations at training time. The output of this module is given by

x_out = (clamp(round(x / scale + zero_point), quant_min, quant_max) - zero_point) * scale

* `scale` defines the scale factor used for quantization.
* `zero_point` specifies the quantized value to which 0 in floating point maps.
* `quant_min` specifies the minimum allowable quantized value.
* `quant_max` specifies the maximum allowable quantized value.
* `fake_quant_enable` controls the application of fake quantization on tensors; note that statistics can still be updated.
* `observer_enable` controls statistics collection on tensors.
* `dtype` specifies the quantized dtype that is being emulated with fake-quantization; allowable values are torch.qint8 and torch.quint8. The values of quant_min and quant_max should be chosen to be consistent with the dtype.

Parameters

* **observer** (_module_) – Module for observing statistics on input tensors and calculating scale and zero-point.
* **quant_min** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – The minimum allowable quantized value.
* **quant_max** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – The maximum allowable quantized value.
* **observer_kwargs** (_optional_) – Arguments for the observer module

Variables

**~FakeQuantize.observer** ([Module](generated/torch.nn.module#torch.nn.Module "torch.nn.Module")) – User provided module that collects statistics on the input tensor and provides a method to calculate scale and zero-point.

`class torch.quantization.NoopObserver(dtype=torch.float16, custom_op_name='')` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/quantization/observer.html#NoopObserver)

Observer that doesn’t do anything and just passes its configuration to the quantized module’s `.from_float()`.

Primarily used for quantization to float16, which doesn’t require determining ranges.

Parameters

* **dtype** – Quantized data type
* **custom_op_name** – (temporary) specify this observer for an operator that doesn’t require any observation (can be used in Graph Mode Passes for special-case ops).

## Debugging utilities

`torch.quantization.get_observer_dict(mod, target_dict, prefix='')` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/quantization/quantize.html#get_observer_dict)

Traverse the modules and save all observers into a dict.
This is mainly used for quantization accuracy debugging.

Parameters

* **mod** – the top module we want to save all observers from
* **prefix** – the prefix for the current module
* **target_dict** – the dictionary used to save all the observers

`class torch.quantization.RecordingObserver(**kwargs)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/quantization/observer.html#RecordingObserver)

The module is mainly for debugging and records the tensor values during runtime.

Parameters

* **dtype** – Quantized data type
* **qscheme** – Quantization scheme to be used
* **reduce_range** – Reduces the range of the quantized data type by 1 bit

[`nn.intrinsic`](torch.nn.intrinsic#module-torch.nn.intrinsic "torch.nn.intrinsic")

# Type Info

The numerical properties of a [`torch.dtype`](tensor_attributes#torch.torch.dtype "torch.torch.dtype") can be accessed through either `torch.finfo` or `torch.iinfo`.

## torch.finfo

`class torch.finfo`

A `torch.finfo` is an object that represents the numerical properties of a floating point [`torch.dtype`](tensor_attributes#torch.torch.dtype "torch.torch.dtype") (i.e. `torch.float32`, `torch.float64`, and `torch.float16`). This is similar to [numpy.finfo](https://docs.scipy.org/doc/numpy/reference/generated/numpy.finfo.html).

A `torch.finfo` provides the following attributes:

Name | Type | Description
---|---|---
bits | int | The number of bits occupied by the type.
eps | float | The smallest representable number such that `1.0 + eps != 1.0`.
max | float | The largest representable number.
min | float | The smallest representable number (typically `-max`).
tiny | float | The smallest positive representable number.
resolution | float | The approximate decimal resolution of this type, i.e., `10**-precision`.

Note

The constructor of `torch.finfo` can be called without argument, in which case the class is created for the pytorch default dtype (as returned by [`torch.get_default_dtype()`](generated/torch.get_default_dtype#torch.get_default_dtype "torch.get_default_dtype")).

## torch.iinfo

`class torch.iinfo`

A `torch.iinfo` is an object that represents the numerical properties of an integer [`torch.dtype`](tensor_attributes#torch.torch.dtype "torch.torch.dtype") (i.e. `torch.uint8`, `torch.int8`, `torch.int16`, `torch.int32`, and `torch.int64`). This is similar to [numpy.iinfo](https://docs.scipy.org/doc/numpy/reference/generated/numpy.iinfo.html).

A `torch.iinfo` provides the following attributes:

Name | Type | Description
---|---|---
bits | int | The number of bits occupied by the type.
max | int | The largest representable number.
min | int | The smallest representable number.
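A minimal sketch of reading the attributes listed above; the commented values assume the standard IEEE `float16` and two's-complement `int8` representations.

import torch

# Floating point properties
fi = torch.finfo(torch.float16)
print(fi.bits, fi.eps, fi.max)   # 16 0.0009765625 65504.0

# Integer properties
ii = torch.iinfo(torch.int8)
print(ii.bits, ii.min, ii.max)   # 8 -128 127

# Called without an argument, torch.finfo() describes the current default dtype
print(torch.finfo().eps == torch.finfo(torch.get_default_dtype()).eps)  # True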