# Automatic Mixed Precision package - torch.cuda.amp

`torch.cuda.amp` provides convenience methods for mixed precision, where some operations use the `torch.float32` (`float`) datatype and other operations use `torch.float16` (`half`). Some ops, like linear layers and convolutions, are much faster in `float16`. Other ops, like reductions, often require the dynamic range of `float32`. Mixed precision tries to match each op to its appropriate datatype.

Ordinarily, "automatic mixed precision training" uses `torch.cuda.amp.autocast` and `torch.cuda.amp.GradScaler` together, as shown in the [Automatic Mixed Precision examples](https://pytorch.org/docs/1.8.0/notes/amp_examples.html#amp-examples) and [Automatic Mixed Precision recipe](https://pytorch.org/tutorials/recipes/recipes/amp_recipe.html). However, `autocast` and `GradScaler` are modular, and may be used separately if desired.

* Autocasting
* Gradient Scaling
* Autocast Op Reference
  * Op Eligibility
  * Op-Specific Behavior
    * Ops that can autocast to `float16`
    * Ops that can autocast to `float32`
    * Ops that promote to the widest input type
    * Prefer `binary_cross_entropy_with_logits` over `binary_cross_entropy`

## Autocasting

`class torch.cuda.amp.autocast(enabled=True)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/cuda/amp/autocast_mode.html#autocast)

Instances of `autocast` serve as context managers or decorators that allow regions of your script to run in mixed precision.

In these regions, CUDA ops run in an op-specific dtype chosen by autocast to improve performance while maintaining accuracy. See the Autocast Op Reference for details.

When entering an autocast-enabled region, Tensors may be any type. You should not call `.half()` on your model(s) or inputs when using autocasting.

`autocast` should wrap only the forward pass(es) of your network, including the loss computation(s). Backward passes under autocast are not recommended. Backward ops run in the same type that autocast used for the corresponding forward ops.

Example:

```python
# Creates model and optimizer in default precision
model = Net().cuda()
optimizer = optim.SGD(model.parameters(), ...)

for input, target in data:
    optimizer.zero_grad()

    # Enables autocasting for the forward pass (model + loss)
    with autocast():
        output = model(input)
        loss = loss_fn(output, target)

    # Exits the context manager before backward()
    loss.backward()
    optimizer.step()
```

See the [Automatic Mixed Precision examples](https://pytorch.org/docs/1.8.0/notes/amp_examples.html#amp-examples) for usage (along with gradient scaling) in more complex scenarios (e.g., gradient penalty, multiple models/losses, custom autograd functions).

`autocast` can also be used as a decorator, e.g., on the `forward` method of your model:

```python
class AutocastModel(nn.Module):
    ...
    @autocast()
    def forward(self, input):
        ...
```

Floating-point Tensors produced in an autocast-enabled region may be `float16`. After returning to an autocast-disabled region, using them with floating-point Tensors of different dtypes may cause type mismatch errors. If so, cast the Tensor(s) produced in the autocast region back to `float32` (or another dtype if desired). If a Tensor from the autocast region is already `float32`, the cast is a no-op and incurs no additional overhead.
Example:

```python
# Creates some tensors in default dtype (here assumed to be float32)
a_float32 = torch.rand((8, 8), device="cuda")
b_float32 = torch.rand((8, 8), device="cuda")
c_float32 = torch.rand((8, 8), device="cuda")
d_float32 = torch.rand((8, 8), device="cuda")

with autocast():
    # torch.mm is on autocast's list of ops that should run in float16.
    # Inputs are float32, but the op runs in float16 and produces float16 output.
    # No manual casts are required.
    e_float16 = torch.mm(a_float32, b_float32)
    # Also handles mixed input types
    f_float16 = torch.mm(d_float32, e_float16)

# After exiting autocast, calls f_float16.float() to use with d_float32
g_float32 = torch.mm(d_float32, f_float16.float())
```

Type mismatch errors _in_ an autocast-enabled region are a bug; if this is what you observe, please file an issue.

`autocast(enabled=False)` subregions can be nested in autocast-enabled regions. Locally disabling autocast can be useful, for example, if you want to force a subregion to run in a particular `dtype`. Disabling autocast gives you explicit control over the execution type. In the subregion, inputs from the surrounding region should be cast to `dtype` before use:

```python
# Creates some tensors in default dtype (here assumed to be float32)
a_float32 = torch.rand((8, 8), device="cuda")
b_float32 = torch.rand((8, 8), device="cuda")
c_float32 = torch.rand((8, 8), device="cuda")
d_float32 = torch.rand((8, 8), device="cuda")

with autocast():
    e_float16 = torch.mm(a_float32, b_float32)

    with autocast(enabled=False):
        # Calls e_float16.float() to ensure float32 execution
        # (necessary because e_float16 was created in an autocasted region)
        f_float32 = torch.mm(c_float32, e_float16.float())

    # No manual casts are required when re-entering the autocast-enabled region.
    # torch.mm again runs in float16 and produces float16 output, regardless of input types.
    g_float16 = torch.mm(d_float32, f_float32)
```

The autocast state is thread-local. If you want it enabled in a new thread, the context manager or decorator must be invoked in that thread. This affects [`torch.nn.DataParallel`](generated/torch.nn.dataparallel#torch.nn.DataParallel) and [`torch.nn.parallel.DistributedDataParallel`](generated/torch.nn.parallel.distributeddataparallel#torch.nn.parallel.DistributedDataParallel) when used with more than one GPU per process (see [Working with Multiple GPUs](https://pytorch.org/docs/1.8.0/notes/amp_examples.html#amp-multigpu)).

Parameters

* **enabled** ([bool](https://docs.python.org/3/library/functions.html#bool), optional, default=True) – Whether autocasting should be enabled in the region.

`torch.cuda.amp.custom_fwd(fwd=None, **kwargs)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/cuda/amp/autocast_mode.html#custom_fwd)

Helper decorator for `forward` methods of custom autograd functions (subclasses of [`torch.autograd.Function`](autograd#torch.autograd.Function)). See the [example page](https://pytorch.org/docs/1.8.0/notes/amp_examples.html#amp-custom-examples) for more detail.

Parameters

* **cast_inputs** (`torch.dtype` or None, optional, default=None) – If not `None`, when `forward` runs in an autocast-enabled region, casts incoming floating-point CUDA Tensors to the target dtype (non-floating-point Tensors are not affected), then executes `forward` with autocast disabled. If `None`, `forward`'s internal ops execute with the current autocast state.
Note

If the decorated `forward` is called outside an autocast-enabled region, `custom_fwd` is a no-op and `cast_inputs` has no effect.

`torch.cuda.amp.custom_bwd(bwd)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/cuda/amp/autocast_mode.html#custom_bwd)

Helper decorator for backward methods of custom autograd functions (subclasses of [`torch.autograd.Function`](autograd#torch.autograd.Function)). Ensures that `backward` executes with the same autocast state as `forward`. See the [example page](https://pytorch.org/docs/1.8.0/notes/amp_examples.html#amp-custom-examples) for more detail.

## Gradient Scaling

If the forward pass for a particular op has `float16` inputs, the backward pass for that op will produce `float16` gradients. Gradient values with small magnitudes may not be representable in `float16`. These values will flush to zero ("underflow"), so the update for the corresponding parameters will be lost.

To prevent underflow, "gradient scaling" multiplies the network's loss(es) by a scale factor and invokes a backward pass on the scaled loss(es). Gradients flowing backward through the network are then scaled by the same factor. In other words, gradient values have a larger magnitude, so they don't flush to zero.

Each parameter's gradient (`.grad` attribute) should be unscaled before the optimizer updates the parameters, so the scale factor does not interfere with the learning rate.

`class torch.cuda.amp.GradScaler(init_scale=65536.0, growth_factor=2.0, backoff_factor=0.5, growth_interval=2000, enabled=True)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/cuda/amp/grad_scaler.html#GradScaler)

`get_backoff_factor()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/cuda/amp/grad_scaler.html#GradScaler.get_backoff_factor)

Returns a Python float containing the scale backoff factor.

`get_growth_factor()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/cuda/amp/grad_scaler.html#GradScaler.get_growth_factor)

Returns a Python float containing the scale growth factor.

`get_growth_interval()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/cuda/amp/grad_scaler.html#GradScaler.get_growth_interval)

Returns a Python int containing the growth interval.

`get_scale()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/cuda/amp/grad_scaler.html#GradScaler.get_scale)

Returns a Python float containing the current scale, or 1.0 if scaling is disabled.

Warning

`get_scale()` incurs a CPU-GPU sync.

`is_enabled()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/cuda/amp/grad_scaler.html#GradScaler.is_enabled)

Returns a bool indicating whether this instance is enabled.

`load_state_dict(state_dict)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/cuda/amp/grad_scaler.html#GradScaler.load_state_dict)

Loads the scaler state. If this instance is disabled, `load_state_dict()` is a no-op.

Parameters

**state_dict** ([dict](https://docs.python.org/3/library/stdtypes.html#dict)) – scaler state. Should be an object returned from a call to `state_dict()`.

`scale(outputs)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/cuda/amp/grad_scaler.html#GradScaler.scale)

Multiplies ("scales") a tensor or list of tensors by the scale factor.

Returns scaled outputs. If this instance of `GradScaler` is not enabled, outputs are returned unmodified.

Parameters

**outputs** ([Tensor](tensors#torch.Tensor) or iterable of Tensors) – Outputs to scale.
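For orientation, a minimal sketch of how `scale()` combines with `step()` and `update()` in a typical training loop (see the Automatic Mixed Precision examples for complete recipes); `model`, `optimizer`, `loss_fn`, and `data` are assumed to be defined elsewhere:

```python
from torch.cuda.amp import autocast, GradScaler

scaler = GradScaler()  # created once, outside the training loop

for input, target in data:
    optimizer.zero_grad()

    # Forward pass (and loss) run under autocast
    with autocast():
        output = model(input)
        loss = loss_fn(output, target)

    # scale() multiplies the loss by the current scale factor,
    # so backward() produces scaled gradients
    scaler.scale(loss).backward()

    # step() unscales gradients, checks them for infs/NaNs,
    # and calls optimizer.step() only if they are finite
    scaler.step(optimizer)

    # update() adjusts the scale factor for the next iteration
    scaler.update()
```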
`set_backoff_factor(new_factor)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/cuda/amp/grad_scaler.html#GradScaler.set_backoff_factor)

Parameters

**new_factor** ([float](https://docs.python.org/3/library/functions.html#float)) – Value to use as the new scale backoff factor.

`set_growth_factor(new_factor)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/cuda/amp/grad_scaler.html#GradScaler.set_growth_factor)

Parameters

**new_factor** ([float](https://docs.python.org/3/library/functions.html#float)) – Value to use as the new scale growth factor.

`set_growth_interval(new_interval)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/cuda/amp/grad_scaler.html#GradScaler.set_growth_interval)

Parameters

**new_interval** ([int](https://docs.python.org/3/library/functions.html#int)) – Value to use as the new growth interval.

`state_dict()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/cuda/amp/grad_scaler.html#GradScaler.state_dict)

Returns the state of the scaler as a [`dict`](https://docs.python.org/3/library/stdtypes.html#dict). It contains five entries:

* `"scale"` - a Python float containing the current scale
* `"growth_factor"` - a Python float containing the current growth factor
* `"backoff_factor"` - a Python float containing the current backoff factor
* `"growth_interval"` - a Python int containing the current growth interval
* `"_growth_tracker"` - a Python int containing the number of recent consecutive unskipped steps

If this instance is not enabled, returns an empty dict.

Note

If you wish to checkpoint the scaler's state after a particular iteration, `state_dict()` should be called after `update()`.

`step(optimizer, *args, **kwargs)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/cuda/amp/grad_scaler.html#GradScaler.step)

`step()` carries out the following two operations:

1. Internally invokes `unscale_(optimizer)` (unless `unscale_()` was explicitly called for `optimizer` earlier in the iteration). As part of the `unscale_()`, gradients are checked for infs/NaNs.
2. If no inf/NaN gradients are found, invokes `optimizer.step()` using the unscaled gradients. Otherwise, `optimizer.step()` is skipped to avoid corrupting the params.

`*args` and `**kwargs` are forwarded to `optimizer.step()`.

Returns the return value of `optimizer.step(*args, **kwargs)`.

Parameters

* **optimizer** ([torch.optim.Optimizer](optim#torch.optim.Optimizer)) – Optimizer that applies the gradients.
* **args** – Any arguments.
* **kwargs** – Any keyword arguments.

Warning

Closure use is not currently supported.

`unscale_(optimizer)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/cuda/amp/grad_scaler.html#GradScaler.unscale_)

Divides ("unscales") the optimizer's gradient tensors by the scale factor.

`unscale_()` is optional, serving cases where you need to [modify or inspect gradients](https://pytorch.org/docs/1.8.0/notes/amp_examples.html#working-with-unscaled-gradients) between the backward pass(es) and `step()`. If `unscale_()` is not called explicitly, gradients will be unscaled automatically during `step()`.

Simple example, using `unscale_()` to enable clipping of unscaled gradients:
```python
...
scaler.scale(loss).backward()
scaler.unscale_(optimizer)
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm)
scaler.step(optimizer)
scaler.update()
```

Parameters

**optimizer** ([torch.optim.Optimizer](optim#torch.optim.Optimizer)) – Optimizer that owns the gradients to be unscaled.

Note

`unscale_()` does not incur a CPU-GPU sync.

Warning

`unscale_()` should only be called once per optimizer per `step()` call, and only after all gradients for that optimizer's assigned parameters have been accumulated. Calling `unscale_()` twice for a given optimizer between each `step()` triggers a RuntimeError.

Warning

`unscale_()` may unscale sparse gradients out of place, replacing the `.grad` attribute.

`update(new_scale=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/cuda/amp/grad_scaler.html#GradScaler.update)

Updates the scale factor.

If any optimizer steps were skipped, the scale is multiplied by `backoff_factor` to reduce it. If `growth_interval` unskipped iterations occurred consecutively, the scale is multiplied by `growth_factor` to increase it.

Passing `new_scale` sets the scale directly.

Parameters

**new_scale** (float or `torch.cuda.FloatTensor`, optional, default=None) – New scale factor.

Warning

`update()` should only be called at the end of the iteration, after `scaler.step(optimizer)` has been invoked for all optimizers used this iteration.

## Autocast Op Reference

### Op Eligibility

Only CUDA ops are eligible for autocasting.

Ops that run in `float64` or non-floating-point dtypes are not eligible, and will run in these types whether or not autocast is enabled.

Only out-of-place ops and Tensor methods are eligible. In-place variants and calls that explicitly supply an `out=...` Tensor are allowed in autocast-enabled regions, but won't go through autocasting. For example, in an autocast-enabled region `a.addmm(b, c)` can autocast, but `a.addmm_(b, c)` and `a.addmm(b, c, out=d)` cannot. For best performance and stability, prefer out-of-place ops in autocast-enabled regions.

Ops called with an explicit `dtype=...` argument are not eligible, and will produce output that respects the `dtype` argument.

### Op-Specific Behavior

The following lists describe the behavior of eligible ops in autocast-enabled regions. These ops always go through autocasting whether they are invoked as part of a [`torch.nn.Module`](generated/torch.nn.module#torch.nn.Module), as a function, or as a [`torch.Tensor`](tensors#torch.Tensor) method. If functions are exposed in multiple namespaces, they go through autocasting regardless of the namespace.

Ops not listed below do not go through autocasting. They run in the type defined by their inputs. However, autocasting may still change the type in which unlisted ops run if they're downstream from autocasted ops.

If an op is unlisted, we assume it's numerically stable in `float16`. If you believe an unlisted op is numerically unstable in `float16`, please file an issue.
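Before the per-op lists, a small hedged sketch of how this behavior can be observed by inspecting output dtypes on a CUDA device; the expected dtypes are noted in comments, and the specific ops used (`torch.mm` and `sum`) appear in the `float16` and `float32` lists below:

```python
import torch
from torch.cuda.amp import autocast

a = torch.rand((4, 4), device="cuda")  # float32 inputs
b = torch.rand((4, 4), device="cuda")

with autocast():
    mm_out = torch.mm(a, b)   # mm is on the float16 list below
    sum_out = mm_out.sum()    # sum is on the float32 list below

    # Supplying out= keeps the call eligible to run, but it bypasses autocasting
    preallocated = torch.empty((4, 4), device="cuda")
    torch.mm(a, b, out=preallocated)

print(mm_out.dtype)        # torch.float16 (expected)
print(sum_out.dtype)       # torch.float32 (expected)
print(preallocated.dtype)  # torch.float32 (expected; not autocast)
```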
#### Ops that can autocast to `float16`

`__matmul__`, `addbmm`, `addmm`, `addmv`, `addr`, `baddbmm`, `bmm`, `chain_matmul`, `conv1d`, `conv2d`, `conv3d`, `conv_transpose1d`, `conv_transpose2d`, `conv_transpose3d`, `GRUCell`, `linear`, `LSTMCell`, `matmul`, `mm`, `mv`, `prelu`, `RNNCell`

#### Ops that can autocast to `float32`

`__pow__`, `__rdiv__`, `__rpow__`, `__rtruediv__`, `acos`, `asin`, `binary_cross_entropy_with_logits`, `cosh`, `cosine_embedding_loss`, `cdist`, `cosine_similarity`, `cross_entropy`, `cumprod`, `cumsum`, `dist`, `erfinv`, `exp`, `expm1`, `gelu`, `group_norm`, `hinge_embedding_loss`, `kl_div`, `l1_loss`, `layer_norm`, `log`, `log_softmax`, `log10`, `log1p`, `log2`, `margin_ranking_loss`, `mse_loss`, `multilabel_margin_loss`, `multi_margin_loss`, `nll_loss`, `norm`, `normalize`, `pdist`, `poisson_nll_loss`, `pow`, `prod`, `reciprocal`, `rsqrt`, `sinh`, `smooth_l1_loss`, `soft_margin_loss`, `softmax`, `softmin`, `softplus`, `sum`, `renorm`, `tan`, `triplet_margin_loss`

#### Ops that promote to the widest input type

These ops don't require a particular dtype for stability, but take multiple inputs and require that the inputs' dtypes match. If all of the inputs are `float16`, the op runs in `float16`. If any of the inputs is `float32`, autocast casts all inputs to `float32` and runs the op in `float32`.

`addcdiv`, `addcmul`, `atan2`, `bilinear`, `cat`, `cross`, `dot`, `equal`, `index_put`, `stack`, `tensordot`

Some ops not listed here (e.g., binary ops like `add`) natively promote inputs without autocasting's intervention. If inputs are a mixture of `float16` and `float32`, these ops run in `float32` and produce `float32` output, regardless of whether autocast is enabled.

#### Prefer `binary_cross_entropy_with_logits` over `binary_cross_entropy`

The backward passes of [`torch.nn.functional.binary_cross_entropy()`](nn.functional#torch.nn.functional.binary_cross_entropy) (and [`torch.nn.BCELoss`](generated/torch.nn.bceloss#torch.nn.BCELoss), which wraps it) can produce gradients that aren't representable in `float16`. In autocast-enabled regions, the forward input may be `float16`, which means the backward gradient must be representable in `float16` (autocasting `float16` forward inputs to `float32` doesn't help, because that cast must be reversed in backward). Therefore, `binary_cross_entropy` and `BCELoss` raise an error in autocast-enabled regions.

Many models use a sigmoid layer right before the binary cross entropy layer. In this case, combine the two layers using [`torch.nn.functional.binary_cross_entropy_with_logits()`](nn.functional#torch.nn.functional.binary_cross_entropy_with_logits) or [`torch.nn.BCEWithLogitsLoss`](generated/torch.nn.bcewithlogitsloss#torch.nn.BCEWithLogitsLoss). `binary_cross_entropy_with_logits` and `BCEWithLogitsLoss` are safe to autocast.

# Automatic differentiation package - torch.autograd

`torch.autograd` provides classes and functions implementing automatic differentiation of arbitrary scalar-valued functions. It requires minimal changes to the existing code - you only need to declare `Tensor`s for which gradients should be computed with the `requires_grad=True` keyword. As of now, we only support autograd for floating point `Tensor` types (half, float, double and bfloat16) and complex `Tensor` types (cfloat, cdouble).
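For example, a minimal sketch of declaring a tensor for gradient computation and backpropagating through a scalar result:

```python
import torch

x = torch.ones(2, 2, requires_grad=True)  # leaf tensor tracked by autograd
y = (x * 3).sum()                         # scalar result built from x

y.backward()   # computes dy/dx and accumulates it into x.grad
print(x.grad)  # tensor([[3., 3.], [3., 3.]])
```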
`torch.autograd.backward(tensors, grad_tensors=None, retain_graph=None, create_graph=False, grad_variables=None, inputs=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/autograd.html#backward)

Computes the sum of gradients of given tensors w.r.t. graph leaves.

The graph is differentiated using the chain rule. If any of `tensors` are non-scalar (i.e. their data has more than one element) and require gradient, then the Jacobian-vector product will be computed; in this case the function additionally requires specifying `grad_tensors`. It should be a sequence of matching length that contains the "vector" in the Jacobian-vector product, usually the gradient of the differentiated function w.r.t. the corresponding tensors (`None` is an acceptable value for all tensors that don't need gradient tensors).

This function accumulates gradients in the leaves - you might need to zero `.grad` attributes or set them to `None` before calling it. See Default gradient layouts for details on the memory layout of accumulated gradients.

Note

Using this method with `create_graph=True` will create a reference cycle between the parameter and its gradient which can cause a memory leak. We recommend using `autograd.grad` when creating the graph to avoid this. If you have to use this function, make sure to reset the `.grad` fields of your parameters to `None` after use to break the cycle and avoid the leak.

Note

If you run any forward ops, create `grad_tensors`, and/or call `backward` in a user-specified CUDA stream context, see [Stream semantics of backward passes](https://pytorch.org/docs/1.8.0/notes/cuda.html#bwd-cuda-stream-semantics).

Parameters

* **tensors** (sequence of Tensor) – Tensors of which the derivative will be computed.
* **grad_tensors** (sequence of ([Tensor](tensors#torch.Tensor) or [None](https://docs.python.org/3/library/constants.html#None))) – The "vector" in the Jacobian-vector product, usually gradients w.r.t. each element of corresponding tensors. None values can be specified for scalar Tensors or ones that don't require grad. If a None value would be acceptable for all grad_tensors, then this argument is optional.
* **retain_graph** ([bool](https://docs.python.org/3/library/functions.html#bool), optional) – If `False`, the graph used to compute the grad will be freed. Note that in nearly all cases setting this option to `True` is not needed and often can be worked around in a much more efficient way. Defaults to the value of `create_graph`.
* **create_graph** ([bool](https://docs.python.org/3/library/functions.html#bool), optional) – If `True`, graph of the derivative will be constructed, allowing to compute higher order derivative products. Defaults to `False`.
* **inputs** (sequence of Tensor) – Inputs w.r.t. which the gradient will be accumulated into `.grad`. All other Tensors will be ignored. If not provided, the gradient is accumulated into all the leaf Tensors that were used to compute `tensors`. All the provided inputs must be leaf Tensors.

`torch.autograd.grad(outputs, inputs, grad_outputs=None, retain_graph=None, create_graph=False, only_inputs=True, allow_unused=False)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/autograd.html#grad)

Computes and returns the sum of gradients of outputs w.r.t. the inputs.
`grad_outputs` should be a sequence of length matching `outputs`, containing the "vector" in the Jacobian-vector product, usually the pre-computed gradients w.r.t. each of the outputs. If an output doesn't require_grad, then the gradient can be `None`.

If `only_inputs` is `True`, the function will only return a list of gradients w.r.t. the specified inputs. If it's `False`, then gradients w.r.t. all remaining leaves will still be computed, and will be accumulated into their `.grad` attribute.

Note

If you run any forward ops, create `grad_outputs`, and/or call `grad` in a user-specified CUDA stream context, see [Stream semantics of backward passes](https://pytorch.org/docs/1.8.0/notes/cuda.html#bwd-cuda-stream-semantics).

Parameters

* **outputs** (sequence of Tensor) – outputs of the differentiated function.
* **inputs** (sequence of Tensor) – Inputs w.r.t. which the gradient will be returned (and not accumulated into `.grad`).
* **grad_outputs** (sequence of Tensor) – The "vector" in the Jacobian-vector product. Usually gradients w.r.t. each output. None values can be specified for scalar Tensors or ones that don't require grad. If a None value would be acceptable for all grad_outputs, then this argument is optional. Default: None.
* **retain_graph** ([bool](https://docs.python.org/3/library/functions.html#bool), optional) – If `False`, the graph used to compute the grad will be freed. Note that in nearly all cases setting this option to `True` is not needed and often can be worked around in a much more efficient way. Defaults to the value of `create_graph`.
* **create_graph** ([bool](https://docs.python.org/3/library/functions.html#bool), optional) – If `True`, graph of the derivative will be constructed, allowing to compute higher order derivative products. Default: `False`.
* **allow_unused** ([bool](https://docs.python.org/3/library/functions.html#bool), optional) – If `False`, specifying inputs that were not used when computing outputs (and therefore their grad is always zero) is an error. Defaults to `False`.

## Functional higher level API

Warning

This API is in beta. Even though the function signatures are very unlikely to change, major improvements to performance are planned before we consider this stable.

This section contains the higher level API for autograd that builds on the basic API above and allows you to compute jacobians, hessians, etc.

This API works with user-provided functions that take only Tensors as input and return only Tensors. If your function takes other arguments that are not Tensors, or Tensors that don't have requires_grad set, you can use a lambda to capture them. For example, for a function `f` that takes three inputs - a Tensor for which we want the jacobian, another tensor that should be considered constant, and a boolean flag - as `f(input, constant, flag=flag)`, you can use it as `functional.jacobian(lambda x: f(x, constant, flag=flag), input)`.

`torch.autograd.functional.jacobian(func, inputs, create_graph=False, strict=False, vectorize=False)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/autograd/functional.html#jacobian)

Function that computes the Jacobian of a given function.

Parameters

* **func** (function) – a Python function that takes Tensor inputs and returns a tuple of Tensors or a Tensor.
* **inputs** (tuple of Tensors or [Tensor](tensors#torch.Tensor)) – inputs to the function `func`.
* **create_graph** ([bool](https://docs.python.org/3/library/functions.html#bool), optional) – If `True`, the Jacobian will be computed in a differentiable manner. Note that when `strict` is `False`, the result can not require gradients or be disconnected from the inputs. Defaults to `False`.
* **strict** ([bool](https://docs.python.org/3/library/functions.html#bool), optional) – If `True`, an error will be raised when we detect that there exists an input such that all the outputs are independent of it. If `False`, we return a Tensor of zeros as the jacobian for said inputs, which is the expected mathematical value. Defaults to `False`.
* **vectorize** ([bool](https://docs.python.org/3/library/functions.html#bool), optional) – This feature is experimental, please use at your own risk. When computing the jacobian, usually we invoke `autograd.grad` once per row of the jacobian. If this flag is `True`, we use the vmap prototype feature as the backend to vectorize calls to `autograd.grad` so we only invoke it once instead of once per row. This should lead to performance improvements in many use cases, however, due to this feature being incomplete, there may be performance cliffs. Please use `torch._C._debug_only_display_vmap_fallback_warnings(True)` to show any performance warnings and file us issues if warnings exist for your use case. Defaults to `False`.

Returns

If there is a single input and output, this will be a single Tensor containing the Jacobian for the linearized inputs and output. If one of the two is a tuple, then the Jacobian will be a tuple of Tensors. If both of them are tuples, then the Jacobian will be a tuple of tuples of Tensors where `Jacobian[i][j]` will contain the Jacobian of the `i`th output and `j`th input and will have as size the concatenation of the sizes of the corresponding output and the corresponding input and will have the same dtype and device as the corresponding input.

Return type

Jacobian ([Tensor](tensors#torch.Tensor) or nested tuple of Tensors)

#### Example

```python
>>> def exp_reducer(x):
...     return x.exp().sum(dim=1)
>>> inputs = torch.rand(2, 2)
>>> jacobian(exp_reducer, inputs)
tensor([[[1.4917, 2.4352],
         [0.0000, 0.0000]],
        [[0.0000, 0.0000],
         [2.4369, 2.3799]]])
>>> jacobian(exp_reducer, inputs, create_graph=True)
tensor([[[1.4917, 2.4352],
         [0.0000, 0.0000]],
        [[0.0000, 0.0000],
         [2.4369, 2.3799]]], grad_fn=)
>>> def exp_adder(x, y):
...     return 2 * x.exp() + 3 * y
>>> inputs = (torch.rand(2), torch.rand(2))
>>> jacobian(exp_adder, inputs)
(tensor([[2.8052, 0.0000],
         [0.0000, 3.3963]]),
 tensor([[3., 0.],
         [0., 3.]]))
```

`torch.autograd.functional.hessian(func, inputs, create_graph=False, strict=False, vectorize=False)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/autograd/functional.html#hessian)

Function that computes the Hessian of a given scalar function.

Parameters

* **func** (function) – a Python function that takes Tensor inputs and returns a Tensor with a single element.
* **inputs** (tuple of Tensors or [Tensor](tensors#torch.Tensor)) – inputs to the function `func`.
* **create_graph** ([bool](https://docs.python.org/3/library/functions.html#bool), optional) – If `True`, the Hessian will be computed in a differentiable manner. Note that when `strict` is `False`, the result can not require gradients or be disconnected from the inputs. Defaults to `False`.
* **strict** ([bool](https://docs.python.org/3/library/functions.html#bool), optional) – If `True`, an error will be raised when we detect that there exists an input such that all the outputs are independent of it. If `False`, we return a Tensor of zeros as the hessian for said inputs, which is the expected mathematical value. Defaults to `False`.
* **vectorize** ([bool](https://docs.python.org/3/library/functions.html#bool), optional) – This feature is experimental, please use at your own risk. When computing the hessian, usually we invoke `autograd.grad` once per row of the hessian. If this flag is `True`, we use the vmap prototype feature as the backend to vectorize calls to `autograd.grad` so we only invoke it once instead of once per row. This should lead to performance improvements in many use cases, however, due to this feature being incomplete, there may be performance cliffs. Please use `torch._C._debug_only_display_vmap_fallback_warnings(True)` to show any performance warnings and file us issues if warnings exist for your use case. Defaults to `False`.

Returns

If there is a single input, this will be a single Tensor containing the Hessian for the input. If it is a tuple, then the Hessian will be a tuple of tuples where `Hessian[i][j]` will contain the Hessian of the `i`th input and `j`th input with size the sum of the size of the `i`th input plus the size of the `j`th input. `Hessian[i][j]` will have the same dtype and device as the corresponding `i`th input.

Return type

Hessian ([Tensor](tensors#torch.Tensor) or a tuple of tuple of Tensors)

#### Example

```python
>>> def pow_reducer(x):
...     return x.pow(3).sum()
>>> inputs = torch.rand(2, 2)
>>> hessian(pow_reducer, inputs)
tensor([[[[5.2265, 0.0000],
          [0.0000, 0.0000]],
         [[0.0000, 4.8221],
          [0.0000, 0.0000]]],
        [[[0.0000, 0.0000],
          [1.9456, 0.0000]],
         [[0.0000, 0.0000],
          [0.0000, 3.2550]]]])
>>> hessian(pow_reducer, inputs, create_graph=True)
tensor([[[[5.2265, 0.0000],
          [0.0000, 0.0000]],
         [[0.0000, 4.8221],
          [0.0000, 0.0000]]],
        [[[0.0000, 0.0000],
          [1.9456, 0.0000]],
         [[0.0000, 0.0000],
          [0.0000, 3.2550]]]], grad_fn=)
>>> def pow_adder_reducer(x, y):
...     return (2 * x.pow(2) + 3 * y.pow(2)).sum()
>>> inputs = (torch.rand(2), torch.rand(2))
>>> hessian(pow_adder_reducer, inputs)
((tensor([[4., 0.],
          [0., 4.]]),
  tensor([[0., 0.],
          [0., 0.]])),
 (tensor([[0., 0.],
          [0., 0.]]),
  tensor([[6., 0.],
          [0., 6.]])))
```

`torch.autograd.functional.vjp(func, inputs, v=None, create_graph=False, strict=False)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/autograd/functional.html#vjp)

Function that computes the dot product between a vector `v` and the Jacobian of the given function at the point given by the inputs.

Parameters

* **func** (function) – a Python function that takes Tensor inputs and returns a tuple of Tensors or a Tensor.
* **inputs** (tuple of Tensors or [Tensor](tensors#torch.Tensor)) – inputs to the function `func`.
* **v** (tuple of Tensors or [Tensor](tensors#torch.Tensor)) – The vector for which the vector Jacobian product is computed. Must be the same size as the output of `func`. This argument is optional when the output of `func` contains a single element and (if it is not provided) will be set as a Tensor containing a single `1`.
* **create_graph** ([bool](https://docs.python.org/3/library/functions.html#bool), optional) – If `True`, both the output and result will be computed in a differentiable way.
Note that when `strict` is `False`, the result can not require gradients or be disconnected from the inputs. Defaults to `False`.
* **strict** ([bool](https://docs.python.org/3/library/functions.html#bool), optional) – If `True`, an error will be raised when we detect that there exists an input such that all the outputs are independent of it. If `False`, we return a Tensor of zeros as the vjp for said inputs, which is the expected mathematical value. Defaults to `False`.

Returns

tuple with:

* func_output (tuple of Tensors or Tensor): output of `func(inputs)`
* vjp (tuple of Tensors or Tensor): result of the dot product with the same shape as the inputs.

Return type

output ([tuple](https://docs.python.org/3/library/stdtypes.html#tuple))

#### Example

```python
>>> def exp_reducer(x):
...     return x.exp().sum(dim=1)
>>> inputs = torch.rand(4, 4)
>>> v = torch.ones(4)
>>> vjp(exp_reducer, inputs, v)
(tensor([5.7817, 7.2458, 5.7830, 6.7782]),
 tensor([[1.4458, 1.3962, 1.3042, 1.6354],
         [2.1288, 1.0652, 1.5483, 2.5035],
         [2.2046, 1.1292, 1.1432, 1.3059],
         [1.3225, 1.6652, 1.7753, 2.0152]]))
>>> vjp(exp_reducer, inputs, v, create_graph=True)
(tensor([5.7817, 7.2458, 5.7830, 6.7782], grad_fn=),
 tensor([[1.4458, 1.3962, 1.3042, 1.6354],
         [2.1288, 1.0652, 1.5483, 2.5035],
         [2.2046, 1.1292, 1.1432, 1.3059],
         [1.3225, 1.6652, 1.7753, 2.0152]], grad_fn=))
>>> def adder(x, y):
...     return 2 * x + 3 * y
>>> inputs = (torch.rand(2), torch.rand(2))
>>> v = torch.ones(2)
>>> vjp(adder, inputs, v)
(tensor([2.4225, 2.3340]),
 (tensor([2., 2.]), tensor([3., 3.])))
```

`torch.autograd.functional.jvp(func, inputs, v=None, create_graph=False, strict=False)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/autograd/functional.html#jvp)

Function that computes the dot product between the Jacobian of the given function at the point given by the inputs and a vector `v`.

Parameters

* **func** (function) – a Python function that takes Tensor inputs and returns a tuple of Tensors or a Tensor.
* **inputs** (tuple of Tensors or [Tensor](tensors#torch.Tensor)) – inputs to the function `func`.
* **v** (tuple of Tensors or [Tensor](tensors#torch.Tensor)) – The vector for which the Jacobian vector product is computed. Must be the same size as the input of `func`. This argument is optional when the input to `func` contains a single element and (if it is not provided) will be set as a Tensor containing a single `1`.
* **create_graph** ([bool](https://docs.python.org/3/library/functions.html#bool), optional) – If `True`, both the output and result will be computed in a differentiable way. Note that when `strict` is `False`, the result can not require gradients or be disconnected from the inputs. Defaults to `False`.
* **strict** ([bool](https://docs.python.org/3/library/functions.html#bool), optional) – If `True`, an error will be raised when we detect that there exists an input such that all the outputs are independent of it. If `False`, we return a Tensor of zeros as the jvp for said inputs, which is the expected mathematical value. Defaults to `False`.

Returns

tuple with:

* func_output (tuple of Tensors or Tensor): output of `func(inputs)`
* jvp (tuple of Tensors or Tensor): result of the dot product with the same shape as the output.

Return type

output ([tuple](https://docs.python.org/3/library/stdtypes.html#tuple))

#### Example

```python
>>> def exp_reducer(x):
...     return x.exp().sum(dim=1)
>>> inputs = torch.rand(4, 4)
>>> v = torch.ones(4, 4)
>>> jvp(exp_reducer, inputs, v)
(tensor([6.3090, 4.6742, 7.9114, 8.2106]),
 tensor([6.3090, 4.6742, 7.9114, 8.2106]))
>>> jvp(exp_reducer, inputs, v, create_graph=True)
(tensor([6.3090, 4.6742, 7.9114, 8.2106], grad_fn=),
 tensor([6.3090, 4.6742, 7.9114, 8.2106], grad_fn=))
>>> def adder(x, y):
...     return 2 * x + 3 * y
>>> inputs = (torch.rand(2), torch.rand(2))
>>> v = (torch.ones(2), torch.ones(2))
>>> jvp(adder, inputs, v)
(tensor([2.2399, 2.5005]),
 tensor([5., 5.]))
```

Note

The jvp is currently computed by using the backward of the backward (sometimes called the double backwards trick) as we don't have support for forward mode AD in PyTorch at the moment.

`torch.autograd.functional.vhp(func, inputs, v=None, create_graph=False, strict=False)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/autograd/functional.html#vhp)

Function that computes the dot product between a vector `v` and the Hessian of a given scalar function at the point given by the inputs.

Parameters

* **func** (function) – a Python function that takes Tensor inputs and returns a Tensor with a single element.
* **inputs** (tuple of Tensors or [Tensor](tensors#torch.Tensor)) – inputs to the function `func`.
* **v** (tuple of Tensors or [Tensor](tensors#torch.Tensor)) – The vector for which the vector Hessian product is computed. Must be the same size as the input of `func`. This argument is optional when `func`'s input contains a single element and (if it is not provided) will be set as a Tensor containing a single `1`.
* **create_graph** ([bool](https://docs.python.org/3/library/functions.html#bool), optional) – If `True`, both the output and result will be computed in a differentiable way. Note that when `strict` is `False`, the result can not require gradients or be disconnected from the inputs. Defaults to `False`.
* **strict** ([bool](https://docs.python.org/3/library/functions.html#bool), optional) – If `True`, an error will be raised when we detect that there exists an input such that all the outputs are independent of it. If `False`, we return a Tensor of zeros as the vhp for said inputs, which is the expected mathematical value. Defaults to `False`.

Returns

tuple with:

* func_output (tuple of Tensors or Tensor): output of `func(inputs)`
* vhp (tuple of Tensors or Tensor): result of the dot product with the same shape as the inputs.

Return type

output ([tuple](https://docs.python.org/3/library/stdtypes.html#tuple))

#### Example

```python
>>> def pow_reducer(x):
...     return x.pow(3).sum()
>>> inputs = torch.rand(2, 2)
>>> v = torch.ones(2, 2)
>>> vhp(pow_reducer, inputs, v)
(tensor(0.5591),
 tensor([[1.0689, 1.2431],
         [3.0989, 4.4456]]))
>>> vhp(pow_reducer, inputs, v, create_graph=True)
(tensor(0.5591, grad_fn=),
 tensor([[1.0689, 1.2431],
         [3.0989, 4.4456]], grad_fn=))
>>> def pow_adder_reducer(x, y):
...     return (2 * x.pow(2) + 3 * y.pow(2)).sum()
>>> inputs = (torch.rand(2), torch.rand(2))
>>> v = (torch.zeros(2), torch.ones(2))
>>> vhp(pow_adder_reducer, inputs, v)
(tensor(4.8053),
 (tensor([0., 0.]), tensor([6., 6.])))
```

`torch.autograd.functional.hvp(func, inputs, v=None, create_graph=False, strict=False)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/autograd/functional.html#hvp)

Function that computes the dot product between the Hessian of a given scalar function and a vector `v` at the point given by the inputs.
Parameters

* **func** (function) – a Python function that takes Tensor inputs and returns a Tensor with a single element.
* **inputs** (tuple of Tensors or [Tensor](tensors#torch.Tensor)) – inputs to the function `func`.
* **v** (tuple of Tensors or [Tensor](tensors#torch.Tensor)) – The vector for which the Hessian vector product is computed. Must be the same size as the input of `func`. This argument is optional when `func`'s input contains a single element and (if it is not provided) will be set as a Tensor containing a single `1`.
* **create_graph** ([bool](https://docs.python.org/3/library/functions.html#bool), optional) – If `True`, both the output and result will be computed in a differentiable way. Note that when `strict` is `False`, the result can not require gradients or be disconnected from the inputs. Defaults to `False`.
* **strict** ([bool](https://docs.python.org/3/library/functions.html#bool), optional) – If `True`, an error will be raised when we detect that there exists an input such that all the outputs are independent of it. If `False`, we return a Tensor of zeros as the hvp for said inputs, which is the expected mathematical value. Defaults to `False`.

Returns

tuple with:

* func_output (tuple of Tensors or Tensor): output of `func(inputs)`
* hvp (tuple of Tensors or Tensor): result of the dot product with the same shape as the inputs.

Return type

output ([tuple](https://docs.python.org/3/library/stdtypes.html#tuple))

#### Example

```python
>>> def pow_reducer(x):
...     return x.pow(3).sum()
>>> inputs = torch.rand(2, 2)
>>> v = torch.ones(2, 2)
>>> hvp(pow_reducer, inputs, v)
(tensor(0.1448),
 tensor([[2.0239, 1.6456],
         [2.4988, 1.4310]]))
>>> hvp(pow_reducer, inputs, v, create_graph=True)
(tensor(0.1448, grad_fn=),
 tensor([[2.0239, 1.6456],
         [2.4988, 1.4310]], grad_fn=))
>>> def pow_adder_reducer(x, y):
...     return (2 * x.pow(2) + 3 * y.pow(2)).sum()
>>> inputs = (torch.rand(2), torch.rand(2))
>>> v = (torch.zeros(2), torch.ones(2))
>>> hvp(pow_adder_reducer, inputs, v)
(tensor(2.3030),
 (tensor([0., 0.]), tensor([6., 6.])))
```

Note

This function is significantly slower than `vhp` due to backward mode AD constraints. If your function is twice continuously differentiable, then hvp = vhp.t(). So if you know that your function satisfies this condition, you should use vhp instead, which is much faster with the current implementation.

## Locally disabling gradient computation

`class torch.autograd.no_grad` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/autograd/grad_mode.html#no_grad)

Context-manager that disables gradient calculation.

Disabling gradient calculation is useful for inference, when you are sure that you will not call `Tensor.backward()`. It will reduce memory consumption for computations that would otherwise have `requires_grad=True`.

In this mode, the result of every computation will have `requires_grad=False`, even when the inputs have `requires_grad=True`.

This context manager is thread local; it will not affect computation in other threads.

Also functions as a decorator. (Make sure to instantiate with parentheses.)

Example:

```python
>>> x = torch.tensor([1], requires_grad=True)
>>> with torch.no_grad():
...     y = x * 2
>>> y.requires_grad
False
>>> @torch.no_grad()
... def doubler(x):
...     return x * 2
>>> z = doubler(x)
>>> z.requires_grad
False
```

`class torch.autograd.enable_grad` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/autograd/grad_mode.html#enable_grad)

Context-manager that enables gradient calculation.

Enables gradient calculation, if it has been disabled via `no_grad` or `set_grad_enabled`.

This context manager is thread local; it will not affect computation in other threads.

Also functions as a decorator. (Make sure to instantiate with parentheses.)

Example:

```python
>>> x = torch.tensor([1], requires_grad=True)
>>> with torch.no_grad():
...     with torch.enable_grad():
...         y = x * 2
>>> y.requires_grad
True
>>> y.backward()
>>> x.grad
>>> @torch.enable_grad()
... def doubler(x):
...     return x * 2
>>> with torch.no_grad():
...     z = doubler(x)
>>> z.requires_grad
True
```

`class torch.autograd.set_grad_enabled(mode)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/autograd/grad_mode.html#set_grad_enabled)

Context-manager that sets gradient calculation to on or off.

`set_grad_enabled` will enable or disable grads based on its argument `mode`. It can be used as a context-manager or as a function.

This context manager is thread local; it will not affect computation in other threads.

Parameters

**mode** ([bool](https://docs.python.org/3/library/functions.html#bool)) – Flag whether to enable grad (`True`), or disable (`False`). This can be used to conditionally enable gradients.

Example:

```python
>>> x = torch.tensor([1], requires_grad=True)
>>> is_train = False
>>> with torch.set_grad_enabled(is_train):
...     y = x * 2
>>> y.requires_grad
False
>>> torch.set_grad_enabled(True)
>>> y = x * 2
>>> y.requires_grad
True
>>> torch.set_grad_enabled(False)
>>> y = x * 2
>>> y.requires_grad
False
```

## Default gradient layouts

When a non-sparse `param` receives a non-sparse gradient during `torch.autograd.backward()` or `torch.Tensor.backward()`, `param.grad` is accumulated as follows.

If `param.grad` is initially `None`:

1. If `param`'s memory is non-overlapping and dense, `.grad` is created with strides matching `param` (thus matching `param`'s layout).
2. Otherwise, `.grad` is created with rowmajor-contiguous strides.

If `param` already has a non-sparse `.grad` attribute:

3. If `create_graph=False`, `backward()` accumulates into `.grad` in-place, which preserves its strides.
4. If `create_graph=True`, `backward()` replaces `.grad` with a new tensor `.grad + new grad`, which attempts (but does not guarantee) matching the preexisting `.grad`'s strides.

The default behavior (letting `.grad`s be `None` before the first `backward()`, such that their layout is created according to 1 or 2, and retained over time according to 3 or 4) is recommended for best performance. Calls to `model.zero_grad()` or `optimizer.zero_grad()` will not affect `.grad` layouts.

In fact, resetting all `.grad`s to `None` before each accumulation phase, e.g.:

```python
for iterations...
    ...
    for param in model.parameters():
        param.grad = None
    loss.backward()
```

such that they're recreated according to 1 or 2 every time, is a valid alternative to `model.zero_grad()` or `optimizer.zero_grad()` that may improve performance for some networks.

### Manual gradient layouts

If you need manual control over `.grad`'s strides, assign `param.grad =` a zeroed tensor with desired strides before the first `backward()`, and never reset it to `None`. 3 guarantees your layout is preserved as long as `create_graph=False`. 4 indicates your layout is _likely_ preserved even if `create_graph=True`.
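As a hedged sketch of the manual-layout approach described above (the channels-last conv weight is just an illustrative assumption, not a recommendation from this reference):

```python
import torch

# Hypothetical conv layer whose weight gradient we want in channels_last layout.
conv = torch.nn.Conv2d(3, 8, kernel_size=3)

# Assign a zeroed gradient with the desired strides before the first backward(),
# and never reset it to None afterwards; rule 3 above then preserves the layout
# as long as create_graph=False.
conv.weight.grad = torch.zeros_like(conv.weight, memory_format=torch.channels_last)

out = conv(torch.randn(1, 3, 16, 16)).sum()
out.backward()  # accumulates in-place into the preexisting channels_last .grad

# True (expected), since in-place accumulation preserves the assigned strides
print(conv.weight.grad.is_contiguous(memory_format=torch.channels_last))
```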
## In-place operations on Tensors

Supporting in-place operations in autograd is a hard matter, and we discourage their use in most cases. Autograd's aggressive buffer freeing and reuse makes it very efficient and there are very few occasions when in-place operations actually lower memory usage by any significant amount. Unless you're operating under heavy memory pressure, you might never need to use them.

### In-place correctness checks

All `Tensor`s keep track of in-place operations applied to them, and if the implementation detects that a tensor was saved for backward in one of the functions, but it was modified in-place afterwards, an error will be raised once the backward pass is started. This ensures that if you're using in-place functions and not seeing any errors, you can be sure that the computed gradients are correct.

## Variable (deprecated)

Warning

The Variable API has been deprecated: Variables are no longer necessary to use autograd with tensors. Autograd automatically supports Tensors with `requires_grad` set to `True`. Below please find a quick guide on what has changed:

* `Variable(tensor)` and `Variable(tensor, requires_grad)` still work as expected, but they return Tensors instead of Variables.
* `var.data` is the same thing as `tensor.data`.
* Methods such as `var.backward(), var.detach(), var.register_hook()` now work on tensors with the same method names.

In addition, one can now create tensors with `requires_grad=True` using factory methods such as [`torch.randn()`](generated/torch.randn#torch.randn), [`torch.zeros()`](generated/torch.zeros#torch.zeros), [`torch.ones()`](generated/torch.ones#torch.ones), and others like the following:

`autograd_tensor = torch.randn((2, 3, 4), requires_grad=True)`

## Tensor autograd functions

`class torch.Tensor`

`grad`

This attribute is `None` by default and becomes a Tensor the first time a call to `backward()` computes gradients for `self`. The attribute will then contain the gradients computed and future calls to `backward()` will accumulate (add) gradients into it.

`requires_grad`

Is `True` if gradients need to be computed for this Tensor, `False` otherwise.

Note

The fact that gradients need to be computed for a Tensor does not mean that the `grad` attribute will be populated, see `is_leaf` for more details.

`is_leaf`

All Tensors that have `requires_grad` which is `False` will be leaf Tensors by convention.

For Tensors that have `requires_grad` which is `True`, they will be leaf Tensors if they were created by the user. This means that they are not the result of an operation and so `grad_fn` is None.

Only leaf Tensors will have their `grad` populated during a call to `backward()`. To get `grad` populated for non-leaf Tensors, you can use `retain_grad()`.
Example:

```python
>>> a = torch.rand(10, requires_grad=True)
>>> a.is_leaf
True
>>> b = torch.rand(10, requires_grad=True).cuda()
>>> b.is_leaf
False
# b was created by the operation that cast a cpu Tensor into a cuda Tensor
>>> c = torch.rand(10, requires_grad=True) + 2
>>> c.is_leaf
False
# c was created by the addition operation
>>> d = torch.rand(10).cuda()
>>> d.is_leaf
True
# d does not require gradients and so has no operation creating it (that is tracked by the autograd engine)
>>> e = torch.rand(10).cuda().requires_grad_()
>>> e.is_leaf
True
# e requires gradients and has no operations creating it
>>> f = torch.rand(10, requires_grad=True, device="cuda")
>>> f.is_leaf
True
# f requires grad, has no operation creating it
```

`backward(gradient=None, retain_graph=None, create_graph=False, inputs=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/tensor.html#Tensor.backward)

Computes the gradient of the current tensor w.r.t. graph leaves.

The graph is differentiated using the chain rule. If the tensor is non-scalar (i.e. its data has more than one element) and requires gradient, the function additionally requires specifying `gradient`. It should be a tensor of matching type and location, that contains the gradient of the differentiated function w.r.t. `self`.

This function accumulates gradients in the leaves - you might need to zero `.grad` attributes or set them to `None` before calling it. See Default gradient layouts for details on the memory layout of accumulated gradients.

Note

If you run any forward ops, create `gradient`, and/or call `backward` in a user-specified CUDA stream context, see [Stream semantics of backward passes](https://pytorch.org/docs/1.8.0/notes/cuda.html#bwd-cuda-stream-semantics).

Parameters

* **gradient** ([Tensor](tensors#torch.Tensor) or [None](https://docs.python.org/3/library/constants.html#None)) – Gradient w.r.t. the tensor. If it is a tensor, it will be automatically converted to a Tensor that does not require grad unless `create_graph` is True. None values can be specified for scalar Tensors or ones that don't require grad. If a None value would be acceptable then this argument is optional.
* **retain_graph** ([bool](https://docs.python.org/3/library/functions.html#bool), optional) – If `False`, the graph used to compute the grads will be freed. Note that in nearly all cases setting this option to True is not needed and often can be worked around in a much more efficient way. Defaults to the value of `create_graph`.
* **create_graph** ([bool](https://docs.python.org/3/library/functions.html#bool), optional) – If `True`, graph of the derivative will be constructed, allowing to compute higher order derivative products. Defaults to `False`.
* **inputs** (sequence of Tensor) – Inputs w.r.t. which the gradient will be accumulated into `.grad`. All other Tensors will be ignored. If not provided, the gradient is accumulated into all the leaf Tensors that were used to compute `tensors`. All the provided inputs must be leaf Tensors.

`detach()`

Returns a new Tensor, detached from the current graph.

The result will never require gradient.

Note

Returned Tensor shares the same storage with the original one. In-place modifications on either of them will be seen, and may trigger errors in correctness checks.
IMPORTANT NOTE: Previously, in-place size / stride / storage changes (such as `resize_` / `resize_as_` / `set_` / `transpose_`) to the returned tensor also updated the original tensor. Now, these in-place changes will not update the original tensor anymore, and will instead trigger an error. For sparse tensors: In-place indices / values changes (such as `zero_` / `copy_` / `add_`) to the returned tensor will not update the original tensor anymore, and will instead trigger an error.

`detach_()`

Detaches the Tensor from the graph that created it, making it a leaf. Views cannot be detached in-place.

`register_hook(hook)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/tensor.html#Tensor.register_hook)

Registers a backward hook.

The hook will be called every time a gradient with respect to the Tensor is computed. The hook should have the following signature:

```python
hook(grad) -> Tensor or None
```

The hook should not modify its argument, but it can optionally return a new gradient which will be used in place of `grad`.

This function returns a handle with a method `handle.remove()` that removes the hook from the module.

Example:

```python
>>> v = torch.tensor([0., 0., 0.], requires_grad=True)
>>> h = v.register_hook(lambda grad: grad * 2)  # double the gradient
>>> v.backward(torch.tensor([1., 2., 3.]))
>>> v.grad
 2
 4
 6
[torch.FloatTensor of size (3,)]
>>> h.remove()  # removes the hook
```

`retain_grad()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/tensor.html#Tensor.retain_grad)

Enables the `.grad` attribute for non-leaf Tensors.

## Function

`class torch.autograd.Function` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/autograd/function.html#Function)

Records operation history and defines formulas for differentiating ops.

See the Note on extending the autograd engine for more details on how to use this class: Every operation performed on `Tensor`s creates a new function object, that performs the computation, and records that it happened. The history is retained in the form of a DAG of functions, with edges denoting data dependencies (`input <- output`). Then, when backward is called, the graph is processed in the topological ordering, by calling `backward()` methods of each `Function` object, and passing returned gradients on to the next `Function`s.

Normally, the only way users interact with functions is by creating subclasses and defining new operations. This is the recommended way of extending torch.autograd.

Examples:

```python
>>> class Exp(Function):
>>>
>>>     @staticmethod
>>>     def forward(ctx, i):
>>>         result = i.exp()
>>>         ctx.save_for_backward(result)
>>>         return result
>>>
>>>     @staticmethod
>>>     def backward(ctx, grad_output):
>>>         result, = ctx.saved_tensors
>>>         return grad_output * result
>>>
>>> # Use it by calling the apply method:
>>> output = Exp.apply(input)
```

`static backward(ctx, *grad_outputs)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/autograd/function.html#Function.backward)

Defines a formula for differentiating the operation.

This function is to be overridden by all subclasses.

It must accept a context `ctx` as the first argument, followed by as many outputs as `forward()` returned, and it should return as many tensors as there were inputs to `forward()`. Each argument is the gradient w.r.t the given output, and each returned value should be the gradient w.r.t. the corresponding input.

The context can be used to retrieve tensors saved during the forward pass. It also has an attribute `ctx.needs_input_grad` as a tuple of booleans representing whether each input needs gradient.
E.g., `backward()` will have `ctx.needs_input_grad[0] = True` if the first input to `forward()` needs the gradient computed w.r.t. the output. `static forward(ctx, *args, **kwargs)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/autograd/function.html#Function.forward) Performs the operation. This function is to be overridden by all subclasses. It must accept a context ctx as the first argument, followed by any number of arguments (tensors or other types). The context can be used to store tensors that can then be retrieved during the backward pass. ## Context method mixins When creating a new `Function`, the following methods are available to `ctx`. `class torch.autograd.function._ContextMethodMixin` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/autograd/function.html#_ContextMethodMixin) `mark_dirty(*args)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/autograd/function.html#_ContextMethodMixin.mark_dirty) Marks given tensors as modified in an in-place operation. **This should be called at most once, only from inside the** `forward()` **method, and all arguments should be inputs.** Every tensor that’s been modified in-place in a call to `forward()` should be given to this function, to ensure correctness of our checks. It doesn’t matter whether the function is called before or after modification. `mark_non_differentiable(*args)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/autograd/function.html#_ContextMethodMixin.mark_non_differentiable) Marks outputs as non-differentiable. **This should be called at most once, only from inside the** `forward()` **method, and all arguments should be outputs.** This will mark outputs as not requiring gradients, increasing the efficiency of backward computation. You still need to accept a gradient for each output in `backward()`, but it’s always going to be a zero tensor with the same shape as the corresponding output. This is used e.g. for indices returned from a max `Function`. `save_for_backward(*tensors)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/autograd/function.html#_ContextMethodMixin.save_for_backward) Saves given tensors for a future call to `backward()`. **This should be called at most once, and only from inside the** `forward()` **method.** Later, saved tensors can be accessed through the `saved_tensors` attribute. Before returning them to the user, a check is made to ensure they weren’t used in any in-place operation that modified their content. Arguments can also be `None`. `set_materialize_grads(value)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/autograd/function.html#_ContextMethodMixin.set_materialize_grads) Sets whether to materialize output grad tensors. Default is true. **This should be called only from inside the** `forward()` **method.** If true, undefined output grad tensors will be expanded to tensors full of zeros prior to calling the `backward()` method. ## Numerical gradient checking `torch.autograd.gradcheck(func, inputs, eps=1e-06, atol=1e-05, rtol=0.001, raise_exception=True, check_sparse_nnz=False, nondet_tol=0.0, check_undefined_grad=True, check_grad_dtypes=False, check_batched_grad=False)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/autograd/gradcheck.html#gradcheck) Check gradients computed via small finite differences against analytical gradients w.r.t. tensors in `inputs` that are of floating point or complex type and with `requires_grad=True`.
The check between numerical and analytical gradients uses [`allclose()`](generated/torch.allclose#torch.allclose "torch.allclose"). For complex functions, no notion of Jacobian exists. Gradcheck verifies if the numerical and analytical values of Wirtinger and Conjugate Wirtinger derivative are consistent. The gradient computation is done under the assumption that the overall function has a real valued output. For functions with complex output, gradcheck compares the numerical and analytical gradients for two values of `grad_output`: 1 and 1j. For more details, check out [Autograd for Complex Numbers](https://pytorch.org/docs/1.8.0/notes/autograd.html#complex-autograd- doc). Note The default values are designed for `input` of double precision. This check will likely fail if `input` is of less precision, e.g., `FloatTensor`. Warning If any checked tensor in `input` has overlapping memory, i.e., different indices pointing to the same memory address (e.g., from `torch.expand()`), this check will likely fail because the numerical gradients computed by point perturbation at such indices will change values at all other indices that share the same memory address. Parameters * **func** (_function_) – a Python function that takes Tensor inputs and returns a Tensor or a tuple of Tensors * **inputs** (_tuple of Tensor_ _or_[Tensor](tensors#torch.Tensor "torch.Tensor")) – inputs to the function * **eps** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _,__optional_) – perturbation for finite differences * **atol** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _,__optional_) – absolute tolerance * **rtol** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _,__optional_) – relative tolerance * **raise_exception** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – indicating whether to raise an exception if the check fails. The exception gives more information about the exact nature of the failure. This is helpful when debugging gradchecks. * **check_sparse_nnz** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – if True, gradcheck allows for SparseTensor input, and for any SparseTensor at input, gradcheck will perform check at nnz positions only. * **nondet_tol** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _,__optional_) – tolerance for non-determinism. When running identical inputs through the differentiation, the results must either match exactly (default, 0.0) or be within this tolerance. * **check_undefined_grad** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – if True, check if undefined output grads are supported and treated as zeros, for `Tensor` outputs. * **check_batched_grad** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – if True, check if we can compute batched gradients using prototype vmap support. Defaults to False. 
Returns True if all differences satisfy allclose condition `torch.autograd.gradgradcheck(func, inputs, grad_outputs=None, eps=1e-06, atol=1e-05, rtol=0.001, gen_non_contig_grad_outputs=False, raise_exception=True, nondet_tol=0.0, check_undefined_grad=True, check_grad_dtypes=False, check_batched_grad=False)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/autograd/gradcheck.html#gradgradcheck) Check gradients of gradients computed via small finite differences against analytical gradients w.r.t. tensors in `inputs` and `grad_outputs` that are of floating point or complex type and with `requires_grad=True`. This function checks that backpropagating through the gradients computed to the given `grad_outputs` are correct. The check between numerical and analytical gradients uses [`allclose()`](generated/torch.allclose#torch.allclose "torch.allclose"). Note The default values are designed for `input` and `grad_outputs` of double precision. This check will likely fail if they are of less precision, e.g., `FloatTensor`. Warning If any checked tensor in `input` and `grad_outputs` has overlapping memory, i.e., different indices pointing to the same memory address (e.g., from `torch.expand()`), this check will likely fail because the numerical gradients computed by point perturbation at such indices will change values at all other indices that share the same memory address. Parameters * **func** (_function_) – a Python function that takes Tensor inputs and returns a Tensor or a tuple of Tensors * **inputs** (_tuple of Tensor_ _or_[Tensor](tensors#torch.Tensor "torch.Tensor")) – inputs to the function * **grad_outputs** (_tuple of Tensor_ _or_[Tensor](tensors#torch.Tensor "torch.Tensor") _,__optional_) – The gradients with respect to the function’s outputs. * **eps** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _,__optional_) – perturbation for finite differences * **atol** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _,__optional_) – absolute tolerance * **rtol** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _,__optional_) – relative tolerance * **gen_non_contig_grad_outputs** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – if `grad_outputs` is `None` and `gen_non_contig_grad_outputs` is `True`, the randomly generated gradient outputs are made to be noncontiguous * **raise_exception** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – indicating whether to raise an exception if the check fails. The exception gives more information about the exact nature of the failure. This is helpful when debugging gradchecks. * **nondet_tol** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _,__optional_) – tolerance for non-determinism. When running identical inputs through the differentiation, the results must either match exactly (default, 0.0) or be within this tolerance. Note that a small amount of nondeterminism in the gradient will lead to larger inaccuracies in the second derivative. 
* **check_undefined_grad** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – if True, check if undefined output grads are supported and treated as zeros * **check_batched_grad** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – if True, check if we can compute batched gradients using prototype vmap support. Defaults to False. Returns True if all differences satisfy allclose condition ## Profiler Autograd includes a profiler that lets you inspect the cost of different operators inside your model - both on the CPU and GPU. There are two modes implemented at the moment - CPU-only, using `profile`, and nvprof-based (registering both CPU and GPU activity), using `emit_nvtx`. `class torch.autograd.profiler.profile(enabled=True, *, use_cuda=False, record_shapes=False, with_flops=False, profile_memory=False, with_stack=False, use_kineto=False, use_cpu=True)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/autograd/profiler.html#profile) Context manager that manages autograd profiler state and holds a summary of results. Under the hood it just records events of functions being executed in C++ and exposes those events to Python. You can wrap any code into it and it will only report runtime of PyTorch functions. Note: the profiler is thread-local and is automatically propagated into async tasks. Parameters * **enabled** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – Setting this to False makes this context manager a no-op. * **use_cuda** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – Enables timing of CUDA events as well using the cudaEvent API. Adds approximately 4us of overhead to each tensor operation. * **record_shapes** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – If shapes recording is set, information about input dimensions will be collected. This allows one to see which dimensions have been used under the hood and further group by them using prof.key_averages(group_by_input_shape=True). Please note that shape recording might skew your profiling data. It is recommended to use separate runs with and without shape recording to validate the timing. Most likely the skew will be negligible for the bottom-most events (in the case of nested function calls). But for higher-level functions the total self cpu time might be artificially increased because of the shape collection. * **with_flops** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – If with_flops is set, the profiler will estimate the FLOPS (floating point operations per second) value using the operator’s input shape and total time. This allows one to estimate the hardware performance. Currently, this option only works for the matrix multiplication and 2D convolution operators. * **profile_memory** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – track tensor memory allocation/deallocation. * **with_stack** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – record source information (file and line number) for the ops. * **use_kineto** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – experimental, enable profiling with Kineto profiler.
* **use_cpu** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – profile CPU events; setting to `False` requires `use_kineto=True` and can be used to lower the overhead for GPU-only profiling. #### Example >>> x = torch.randn((1, 1), requires_grad=True) >>> with torch.autograd.profiler.profile() as prof: >>> for _ in range(100): # any normal python code, really! >>> y = x ** 2 >>> y.backward() >>> # NOTE: some columns were removed for brevity >>> print(prof.key_averages().table(sort_by="self_cpu_time_total")) ----------------------------------- --------------- --------------- --------------- Name Self CPU total CPU time avg Number of Calls ----------------------------------- --------------- --------------- --------------- mul 32.048ms 32.048ms 200 pow 27.041ms 27.041ms 200 PowBackward0 9.727ms 55.483ms 100 torch::autograd::AccumulateGrad 9.148ms 9.148ms 100 torch::autograd::GraphRoot 691.816us 691.816us 100 ----------------------------------- --------------- --------------- --------------- `export_chrome_trace(path)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/autograd/profiler.html#profile.export_chrome_trace) Exports an EventList as a Chrome tracing tools file. The checkpoint can be later loaded and inspected under `chrome://tracing` URL. Parameters **path** ([str](https://docs.python.org/3/library/stdtypes.html#str "\(in Python v3.9\)")) – Path where the trace will be written. `key_averages(group_by_input_shape=False, group_by_stack_n=0)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/autograd/profiler.html#profile.key_averages) Averages all function events over their keys. Parameters * **group_by_input_shape** – group entries by (event name, input shapes) rather than just event name. This is useful to see which input shapes contribute to the runtime the most and may help with size-specific optimizations or choosing the best candidates for quantization. * **group_by_stack_n** – group by top n stack trace entries Returns An EventList containing FunctionEventAvg objects. `property self_cpu_time_total` Returns total time spent on CPU obtained as a sum of all self times across all the events. `table(sort_by=None, row_limit=100, max_src_column_width=75, header=None, top_level_events_only=False)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/autograd/profiler.html#profile.table) Prints an EventList as a nicely formatted table. Parameters * **sort_by** ([str](https://docs.python.org/3/library/stdtypes.html#str "\(in Python v3.9\)") _,__optional_) – Attribute used to sort entries. By default they are printed in the same order as they were registered. Valid keys include: `cpu_time`, `cuda_time`, `cpu_time_total`, `cuda_time_total`, `cpu_memory_usage`, `cuda_memory_usage`, `self_cpu_memory_usage`, `self_cuda_memory_usage`, `count`. * **top_level_events_only** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – Boolean flag to determine the selection of events to display. If true, the profiler will only display events at top level like top-level invocation of python `lstm`, python `add` or other functions, nested events like low-level cpu/cuda ops events are omitted for profiler result readability. Returns A string containing the table. `total_average()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/autograd/profiler.html#profile.total_average) Averages all events. Returns A FunctionEventAvg object.
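As a rough sketch of how the methods above compose (the tensor shapes, module, and trace filename here are illustrative choices, not taken from the original docs), one might record shapes during profiling, group the averaged results by input shape, and export a Chrome trace:

>>> import torch
>>> x = torch.randn(64, 128)
>>> linear = torch.nn.Linear(128, 32)
>>> with torch.autograd.profiler.profile(record_shapes=True) as prof:
...     for _ in range(10):
...         y = linear(x)
>>> # group averaged events by (name, input shapes) to spot size-specific hot spots
>>> print(prof.key_averages(group_by_input_shape=True).table(sort_by="cpu_time_total"))
>>> # write a trace file that can be opened at chrome://tracing
>>> prof.export_chrome_trace("trace.json")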
`class torch.autograd.profiler.emit_nvtx(enabled=True, record_shapes=False)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/autograd/profiler.html#emit_nvtx) Context manager that makes every autograd operation emit an NVTX range. It is useful when running the program under nvprof: nvprof --profile-from-start off -o trace_name.prof -- <regular command here> Unfortunately, there’s no way to force nvprof to flush the data it collected to disk, so for CUDA profiling one has to use this context manager to annotate nvprof traces and wait for the process to exit before inspecting them. Then, either NVIDIA Visual Profiler (nvvp) can be used to visualize the timeline, or `torch.autograd.profiler.load_nvprof()` can load the results for inspection e.g. in Python REPL. Parameters * **enabled** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_ _,__default=True_) – Setting `enabled=False` makes this context manager a no-op. Default: `True`. * **record_shapes** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_ _,__default=False_) – If `record_shapes=True`, the nvtx range wrapping each autograd op will append information about the sizes of Tensor arguments received by that op, in the following format: `[[arg0.size(0), arg0.size(1), ...], [arg1.size(0), arg1.size(1), ...], ...]` Non-tensor arguments will be represented by `[]`. Arguments will be listed in the order they are received by the backend op. Please note that this order may not match the order in which those arguments were passed on the Python side. Also note that shape recording may increase the overhead of nvtx range creation. #### Example >>> with torch.cuda.profiler.profile(): ... model(x) # Warmup CUDA memory allocator and profiler ... with torch.autograd.profiler.emit_nvtx(): ... model(x) **Forward-backward correlation** When viewing a profile created using `emit_nvtx` in the Nvidia Visual Profiler, correlating each backward-pass op with the corresponding forward-pass op can be difficult. To ease this task, `emit_nvtx` appends sequence number information to the ranges it generates. During the forward pass, each function range is decorated with `seq=<N>`. `seq` is a running counter, incremented each time a new backward Function object is created and stashed for backward. Thus, the `seq=<N>` annotation associated with each forward function range tells you that if a backward Function object is created by this forward function, the backward object will receive sequence number N. During the backward pass, the top-level range wrapping each C++ backward Function’s `apply()` call is decorated with `stashed seq=<M>`. `M` is the sequence number that the backward object was created with. By comparing `stashed seq` numbers in backward with `seq` numbers in forward, you can track down which forward op created each backward Function. Any functions executed during the backward pass are also decorated with `seq=<N>`. During default backward (with `create_graph=False`) this information is irrelevant, and in fact, `N` may simply be 0 for all such functions. Only the top-level ranges associated with backward Function objects’ `apply()` methods are useful, as a way to correlate these Function objects with the earlier forward pass. **Double-backward** If, on the other hand, a backward pass with `create_graph=True` is underway (in other words, if you are setting up for a double-backward), each function’s execution during backward is given a nonzero, useful `seq=<N>`.
Those functions may themselves create Function objects to be executed later during double-backward, just as the original functions in the forward pass did. The relationship between backward and double-backward is conceptually the same as the relationship between forward and backward: The functions still emit current-sequence-number-tagged ranges, the Function objects they create still stash those sequence numbers, and during the eventual double-backward, the Function objects’ `apply()` ranges are still tagged with `stashed seq` numbers, which can be compared to `seq` numbers from the backward pass. `torch.autograd.profiler.load_nvprof(path)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/autograd/profiler.html#load_nvprof) Opens an nvprof trace file and parses autograd annotations. Parameters **path** ([str](https://docs.python.org/3/library/stdtypes.html#str "\(in Python v3.9\)")) – path to nvprof trace ## Anomaly detection `class torch.autograd.detect_anomaly` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/autograd/anomaly_mode.html#detect_anomaly) Context-manager that enable anomaly detection for the autograd engine. This does two things: * Running the forward pass with detection enabled will allow the backward pass to print the traceback of the forward operation that created the failing backward function. * Any backward computation that generate “nan” value will raise an error. Warning This mode should be enabled only for debugging as the different tests will slow down your program execution. #### Example >>> import torch >>> from torch import autograd >>> class MyFunc(autograd.Function): ... @staticmethod ... def forward(ctx, inp): ... return inp.clone() ... @staticmethod ... def backward(ctx, gO): ... # Error during the backward pass ... raise RuntimeError("Some error in backward") ... return gO.clone() >>> def run_fn(a): ... out = MyFunc.apply(a) ... return out.sum() >>> inp = torch.rand(10, 10, requires_grad=True) >>> out = run_fn(inp) >>> out.backward() Traceback (most recent call last): File "", line 1, in File "/your/pytorch/install/torch/tensor.py", line 93, in backward torch.autograd.backward(self, gradient, retain_graph, create_graph) File "/your/pytorch/install/torch/autograd/__init__.py", line 90, in backward allow_unreachable=True) # allow_unreachable flag File "/your/pytorch/install/torch/autograd/function.py", line 76, in apply return self._forward_cls.backward(self, *args) File "", line 8, in backward RuntimeError: Some error in backward >>> with autograd.detect_anomaly(): ... inp = torch.rand(10, 10, requires_grad=True) ... out = run_fn(inp) ... out.backward() Traceback of forward call that caused the error: File "tmp.py", line 53, in out = run_fn(inp) File "tmp.py", line 44, in run_fn out = MyFunc.apply(a) Traceback (most recent call last): File "", line 4, in File "/your/pytorch/install/torch/tensor.py", line 93, in backward torch.autograd.backward(self, gradient, retain_graph, create_graph) File "/your/pytorch/install/torch/autograd/__init__.py", line 90, in backward allow_unreachable=True) # allow_unreachable flag File "/your/pytorch/install/torch/autograd/function.py", line 76, in apply return self._forward_cls.backward(self, *args) File "", line 8, in backward RuntimeError: Some error in backward `class torch.autograd.set_detect_anomaly(mode)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/autograd/anomaly_mode.html#set_detect_anomaly) Context-manager that sets the anomaly detection for the autograd engine on or off. 
`set_detect_anomaly` will enable or disable the autograd anomaly detection based on its argument `mode`. It can be used as a context-manager or as a function. See `detect_anomaly` above for details of the anomaly detection behaviour. Parameters **mode** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – Flag whether to enable anomaly detection (`True`), or disable (`False`). # torch.backends `torch.backends` controls the behavior of various backends that PyTorch supports. These backends include: * `torch.backends.cuda` * `torch.backends.cudnn` * `torch.backends.mkl` * `torch.backends.mkldnn` * `torch.backends.openmp` ## torch.backends.cuda `torch.backends.cuda.is_built()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/backends/cuda.html#is_built) Returns whether PyTorch is built with CUDA support. Note that this doesn’t necessarily mean CUDA is available; just that if this PyTorch binary were run on a machine with working CUDA drivers and devices, we would be able to use it. `torch.backends.cuda.matmul.allow_tf32` A [`bool`](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") that controls whether TensorFloat-32 tensor cores may be used in matrix multiplications on Ampere or newer GPUs. See [TensorFloat-32(TF32) on Ampere devices](https://pytorch.org/docs/1.8.0/notes/cuda.html#tf32-on-ampere). `torch.backends.cuda.cufft_plan_cache` `cufft_plan_cache` caches the cuFFT plans. `size` A readonly [`int`](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") that shows the number of plans currently in the cuFFT plan cache. `max_size` An [`int`](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") that controls the capacity of the cuFFT plan cache. `clear()` Clears the cuFFT plan cache. ## torch.backends.cudnn `torch.backends.cudnn.version()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/backends/cudnn.html#version) Returns the version of cuDNN. `torch.backends.cudnn.is_available()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/backends/cudnn.html#is_available) Returns a bool indicating if CUDNN is currently available. `torch.backends.cudnn.enabled` A [`bool`](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") that controls whether cuDNN is enabled. `torch.backends.cudnn.allow_tf32` A [`bool`](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") that controls whether TensorFloat-32 tensor cores may be used in cuDNN convolutions on Ampere or newer GPUs. See [TensorFloat-32(TF32) on Ampere devices](https://pytorch.org/docs/1.8.0/notes/cuda.html#tf32-on-ampere). `torch.backends.cudnn.deterministic` A [`bool`](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") that, if True, causes cuDNN to only use deterministic convolution algorithms. See also [`torch.are_deterministic_algorithms_enabled()`](generated/torch.are_deterministic_algorithms_enabled#torch.are_deterministic_algorithms_enabled "torch.are_deterministic_algorithms_enabled") and [`torch.use_deterministic_algorithms()`](generated/torch.use_deterministic_algorithms#torch.use_deterministic_algorithms "torch.use_deterministic_algorithms"). `torch.backends.cudnn.benchmark` A [`bool`](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") that, if True, causes cuDNN to benchmark multiple convolution algorithms and select the fastest.
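The flags above are plain module attributes, so adjusting them is just attribute assignment. A minimal sketch (the particular values are chosen for illustration; whether you want determinism or TF32 depends on your workload):

>>> import torch
>>> torch.backends.cuda.is_built()        # built with CUDA support?
>>> torch.backends.cudnn.is_available()   # cuDNN usable at runtime?
>>> # trade speed for reproducibility: disable algorithm benchmarking and
>>> # restrict cuDNN to deterministic convolution algorithms
>>> torch.backends.cudnn.benchmark = False
>>> torch.backends.cudnn.deterministic = True
>>> # opt out of TF32 tensor cores on Ampere GPUs to keep full float32 precision
>>> torch.backends.cuda.matmul.allow_tf32 = False
>>> torch.backends.cudnn.allow_tf32 = False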
## torch.backends.mkl `torch.backends.mkl.is_available()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/backends/mkl.html#is_available) Returns whether PyTorch is built with MKL support. ## torch.backends.mkldnn `torch.backends.mkldnn.is_available()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/backends/mkldnn.html#is_available) Returns whether PyTorch is built with MKL-DNN support. ## torch.backends.openmp `torch.backends.openmp.is_available()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/backends/openmp.html#is_available) Returns whether PyTorch is built with OpenMP support. # Benchmark Utils - torch.utils.benchmark `class torch.utils.benchmark.Timer(stmt='pass', setup='pass', timer=<built-in function perf_counter>, globals=None, label=None, sub_label=None, description=None, env=None, num_threads=1, language=<Language.PYTHON: 0>)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/utils/benchmark/utils/timer.html#Timer) Helper class for measuring execution time of PyTorch statements. For a full tutorial on how to use this class, see the PyTorch benchmark recipe in the tutorials. The PyTorch Timer is based on `timeit.Timer` (and in fact uses `timeit.Timer` internally), but with several key differences: 1. Runtime aware: Timer will perform warmups (important as some elements of PyTorch are lazily initialized), set threadpool size so that comparisons are apples-to-apples, and synchronize asynchronous CUDA functions when necessary. 2. Focus on replicates: When measuring code, and particularly complex kernels / models, run-to-run variation is a significant confounding factor. It is expected that all measurements should include replicates to quantify noise and allow median computation, which is more robust than mean. To that effect, this class deviates from the `timeit` API by conceptually merging `timeit.Timer.repeat` and `timeit.Timer.autorange`. (Exact algorithms are discussed in method docstrings.) The `timeit` method is replicated for cases where an adaptive strategy is not desired. 3. Optional metadata: When defining a Timer, one can optionally specify `label`, `sub_label`, `description`, and `env`. (Defined later) These fields are included in the representation of the result object and used by the `Compare` class to group and display results for comparison. 4. Instruction counts: In addition to wall times, Timer can run a statement under Callgrind and report instructions executed. Directly analogous to `timeit.Timer` constructor arguments: `stmt`, `setup`, `timer`, `globals`. PyTorch Timer specific constructor arguments: `label`, `sub_label`, `description`, `env`, `num_threads`. Parameters * **stmt** – Code snippet to be run in a loop and timed. * **setup** – Optional setup code. Used to define variables used in `stmt`. * **timer** – Callable which returns the current time. If PyTorch was built without CUDA or there is no GPU present, this defaults to `timeit.default_timer`; otherwise it will synchronize CUDA before measuring the time. * **globals** – A dict which defines the global variables when `stmt` is being executed. This is the other method for providing variables which `stmt` needs. * **label** – String which summarizes `stmt`. For instance, if `stmt` is “torch.nn.functional.relu(torch.add(x, 1, out=out))” one might set label to “ReLU(x + 1)” to improve readability. * **sub_label** – Provide supplemental information to disambiguate measurements with identical stmt or label.
For instance, in our example above sub_label might be “float” or “int”, so that it is easy to differentiate: “ReLU(x + 1): (float)” “ReLU(x + 1): (int)” when printing Measurements or summarizing using `Compare`. * **description** – String to distinguish measurements with identical label and sub_label. The principal use of `description` is to signal to `Compare` the columns of data. For instance one might set it based on the input size to create a table of the form: | n=1 | n=4 | ... ------------- ... ReLU(x + 1): (float) | ... | ... | ... ReLU(x + 1): (int) | ... | ... | ... using `Compare`. It is also included when printing a Measurement. * **env** – This tag indicates that otherwise identical tasks were run in different environments, and are therefore not equivalent, for instance when A/B testing a change to a kernel. `Compare` will treat Measurements with different `env` specification as distinct when merging replicate runs. * **num_threads** – The size of the PyTorch threadpool when executing `stmt`. Single-threaded performance is important as both a key inference workload and a good indicator of intrinsic algorithmic efficiency, so the default is set to one. This is in contrast to the default PyTorch threadpool size which tries to utilize all cores. `blocked_autorange(callback=None, min_run_time=0.2)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/utils/benchmark/utils/timer.html#Timer.blocked_autorange) Measure many replicates while keeping timer overhead to a minimum. At a high level, blocked_autorange executes the following pseudo-code: `setup` total_time = 0 while total_time < min_run_time start = timer() for _ in range(block_size): `stmt` total_time += (timer() - start) Note the variable `block_size` in the inner loop. The choice of block size is important to measurement quality, and must balance two competing objectives: 1. A small block size results in more replicates and generally better statistics. 2. A large block size better amortizes the cost of `timer` invocation, and results in a less biased measurement. This is important because CUDA synchronization time is non-trivial (order single to low double digit microseconds) and would otherwise bias the measurement. blocked_autorange sets block_size by running a warmup period, increasing block size until timer overhead is less than 0.1% of the overall computation. This value is then used for the main measurement loop. Returns A `Measurement` object that contains measured runtimes and repetition counts, and can be used to compute statistics. (mean, median, etc.) `collect_callgrind(number=100, collect_baseline=True)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/utils/benchmark/utils/timer.html#Timer.collect_callgrind) Collect instruction counts using Callgrind. Unlike wall times, instruction counts are deterministic (modulo non-determinism in the program itself and small amounts of jitter from the Python interpreter.) This makes them ideal for detailed performance analysis. This method runs `stmt` in a separate process so that Valgrind can instrument the program. Performance is severely degraded due to the instrumentation, however this is ameliorated by the fact that a small number of iterations is generally sufficient to obtain good measurements. In order to use this method `valgrind`, `callgrind_control`, and `callgrind_annotate` must be installed. Because there is a process boundary between the caller (this process) and the `stmt` execution, `globals` cannot contain arbitrary in-memory data structures.
(Unlike timing methods) Instead, globals are restricted to builtins, `nn.Modules`’s, and TorchScripted functions/modules to reduce the surprise factor from serialization and subsequent deserialization. The `GlobalsBridge` class provides more detail on this subject. Take particular care with nn.Modules: they rely on pickle and you may need to add an import to `setup` for them to transfer properly. By default, a profile for an empty statement will be collected and cached to indicate how many instructions are from the Python loop which drives `stmt`. Returns A `CallgrindStats` object which provides instruction counts and some basic facilities for analyzing and manipulating results. `timeit(number=1000000)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/utils/benchmark/utils/timer.html#Timer.timeit) Mirrors the semantics of timeit.Timer.timeit(). Execute the main statement (`stmt`) `number` times. `class torch.utils.benchmark.Measurement(number_per_run, raw_times, task_spec, metadata=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/utils/benchmark/utils/common.html#Measurement) The result of a Timer measurement. This class stores one or more measurements of a given statement. It is serializable and provides several convenience methods (including a detailed __repr__) for downstream consumers. `static merge(measurements)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/utils/benchmark/utils/common.html#Measurement.merge) Convenience method for merging replicates. Merge will extrapolate times to `number_per_run=1` and will not transfer any metadata. (Since it might differ between replicates) `property significant_figures` Approximate significant figure estimate. This property is intended to give a convenient way to estimate the precision of a measurement. It only uses the interquartile region to estimate statistics to try to mitigate skew from the tails, and uses a static z value of 1.645 since it is not expected to be used for small values of `n`, so z can approximate `t`. The significant figure estimation used in conjunction with the `trim_sigfig` method to provide a more human interpretable data summary. __repr__ does not use this method; it simply displays raw values. Significant figure estimation is intended for `Compare`. `class torch.utils.benchmark.CallgrindStats(task_spec, number_per_run, built_with_debug_symbols, baseline_inclusive_stats, baseline_exclusive_stats, stmt_inclusive_stats, stmt_exclusive_stats)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/utils/benchmark/utils/valgrind_wrapper/timer_interface.html#CallgrindStats) Top level container for Callgrind results collected by Timer. Manipulation is generally done using the FunctionCounts class, which is obtained by calling `CallgrindStats.stats(…)`. Several convenience methods are provided as well; the most significant is `CallgrindStats.as_standardized()`. `as_standardized()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/utils/benchmark/utils/valgrind_wrapper/timer_interface.html#CallgrindStats.as_standardized) Strip library names and some prefixes from function strings. When comparing two different sets of instruction counts, on stumbling block can be path prefixes. Callgrind includes the full filepath when reporting a function (as it should). However, this can cause issues when diffing profiles. 
If a key component such as Python or PyTorch was built in separate locations in the two profiles, this can result in something resembling: 23234231 /tmp/first_build_dir/thing.c:foo(...) 9823794 /tmp/first_build_dir/thing.c:bar(...) ... 53453 .../aten/src/Aten/...:function_that_actually_changed(...) ... -9823794 /tmp/second_build_dir/thing.c:bar(...) -23234231 /tmp/second_build_dir/thing.c:foo(...) Stripping prefixes can ameliorate this issue by regularizing the strings and causing better cancellation of equivalent call sites when diffing. `counts(*, denoise=False)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/utils/benchmark/utils/valgrind_wrapper/timer_interface.html#CallgrindStats.counts) Returns the total number of instructions executed. See `FunctionCounts.denoise()` for an explanation of the `denoise` arg. `delta(other, inclusive=False, subtract_baselines=True)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/utils/benchmark/utils/valgrind_wrapper/timer_interface.html#CallgrindStats.delta) Diff two sets of counts. One common reason to collect instruction counts is to determine the effect that a particular change will have on the number of instructions needed to perform some unit of work. If a change increases that number, the next logical question is “why”. This generally involves looking at what part of the code increased in instruction count. This function automates that process so that one can easily diff counts on both an inclusive and exclusive basis. The `subtract_baselines` argument allows one to disable baseline correction, though in most cases it shouldn’t matter as the baselines are expected to more or less cancel out. `stats(inclusive=False)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/utils/benchmark/utils/valgrind_wrapper/timer_interface.html#CallgrindStats.stats) Returns detailed function counts. Conceptually, the FunctionCounts returned can be thought of as a tuple of (count, path_and_function_name) tuples. `inclusive` matches the semantics of callgrind. If True, the counts include instructions executed by children. `inclusive=True` is useful for identifying hot spots in code; `inclusive=False` is useful for reducing noise when diffing counts from two different runs. (See CallgrindStats.delta(…) for more details) `class torch.utils.benchmark.FunctionCounts(_data, inclusive, _linewidth=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/utils/benchmark/utils/valgrind_wrapper/timer_interface.html#FunctionCounts) Container for manipulating Callgrind results. It supports: 1. Addition and subtraction to combine or diff results. 2. Tuple-like indexing. 3. A `denoise` function which strips CPython calls which are known to be non-deterministic and quite noisy. 4. Two higher order methods (`filter` and `transform`) for custom manipulation. `denoise()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/utils/benchmark/utils/valgrind_wrapper/timer_interface.html#FunctionCounts.denoise) Remove known noisy instructions. Several instructions in the CPython interpreter are rather noisy. These instructions involve unicode to dictionary lookups which Python uses to map variable names. FunctionCounts is generally a content agnostic container, however this is sufficiently important for obtaining reliable results to warrant an exception.
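To make the `as_standardized` / `delta` / `denoise` workflow above concrete, here is a minimal sketch of diffing instruction counts for two statements. It assumes `valgrind`, `callgrind_control`, and `callgrind_annotate` are installed; the two statements (`x.mul(2)` versus `x * 2`) and the replicate count are arbitrary illustrations, not taken from the original docs:

>>> from torch.utils.benchmark import Timer
>>> t0 = Timer(stmt="x.mul(2)", setup="import torch; x = torch.ones(128)")
>>> t1 = Timer(stmt="x * 2", setup="import torch; x = torch.ones(128)")
>>> stats0 = t0.collect_callgrind(number=10)
>>> stats1 = t1.collect_callgrind(number=10)
>>> # strip build paths so equivalent call sites cancel, then diff exclusive counts
>>> delta = stats1.as_standardized().delta(stats0.as_standardized(), inclusive=False)
>>> print(delta.denoise())   # drop known-noisy CPython dictionary lookups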
`filter(filter_fn)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/utils/benchmark/utils/valgrind_wrapper/timer_interface.html#FunctionCounts.filter) Keep only the elements where `filter_fn` applied to function name returns True. `transform(map_fn)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/utils/benchmark/utils/valgrind_wrapper/timer_interface.html#FunctionCounts.transform) Apply `map_fn` to all of the function names. This can be used to regularize function names (e.g. stripping irrelevant parts of the file path), coalesce entries by mapping multiple functions to the same name (in which case the counts are added together), etc. # torch.utils.bottleneck `torch.utils.bottleneck` is a tool that can be used as an initial step for debugging bottlenecks in your program. It summarizes runs of your script with the Python profiler and PyTorch’s autograd profiler. Run it on the command line with python -m torch.utils.bottleneck /path/to/source/script.py [args] where [args] are any number of arguments to `script.py`, or run `python -m torch.utils.bottleneck -h` for more usage instructions. Warning Because your script will be profiled, please ensure that it exits in a finite amount of time. Warning Due to the asynchronous nature of CUDA kernels, when running against CUDA code, the cProfile output and CPU-mode autograd profilers may not show correct timings: the reported CPU time reports the amount of time used to launch the kernels but does not include the time the kernel spent executing on a GPU unless the operation does a synchronize. Ops that do synchronize appear to be extremely expensive under regular CPU-mode profilers. In these case where timings are incorrect, the CUDA-mode autograd profiler may be helpful. Note To decide which (CPU-only-mode or CUDA-mode) autograd profiler output to look at, you should first check if your script is CPU-bound (“CPU total time is much greater than CUDA total time”). If it is CPU-bound, looking at the results of the CPU-mode autograd profiler will help. If on the other hand your script spends most of its time executing on the GPU, then it makes sense to start looking for responsible CUDA operators in the output of the CUDA-mode autograd profiler. Of course the reality is much more complicated and your script might not be in one of those two extremes depending on the part of the model you’re evaluating. If the profiler outputs don’t help, you could try looking at the result of [`torch.autograd.profiler.emit_nvtx()`](autograd#torch.autograd.profiler.emit_nvtx "torch.autograd.profiler.emit_nvtx") with `nvprof`. However, please take into account that the NVTX overhead is very high and often gives a heavily skewed timeline. Warning If you are profiling CUDA code, the first profiler that `bottleneck` runs (cProfile) will include the CUDA startup time (CUDA buffer allocation cost) in its time reporting. This should not matter if your bottlenecks result in code much slower than the CUDA startup time. For more complicated uses of the profilers (like in a multi-GPU case), please see or [`torch.autograd.profiler.profile()`](autograd#torch.autograd.profiler.profile "torch.autograd.profiler.profile") for more information. # torch.utils.checkpoint Note Checkpointing is implemented by rerunning a forward-pass segment for each checkpointed segment during backward. This can cause persistent states like the RNG state to be advanced than they would without checkpointing. 
By default, checkpointing includes logic to juggle the RNG state such that checkpointed passes making use of RNG (through dropout for example) have deterministic output as compared to non-checkpointed passes. The logic to stash and restore RNG states can incur a moderate performance hit depending on the runtime of checkpointed operations. If deterministic output compared to non-checkpointed passes is not required, supply `preserve_rng_state=False` to `checkpoint` or `checkpoint_sequential` to omit stashing and restoring the RNG state during each checkpoint. The stashing logic saves and restores the RNG state for the current device and the device of all cuda Tensor arguments to the `run_fn`. However, the logic has no way to anticipate if the user will move Tensors to a new device within the `run_fn` itself. Therefore, if you move Tensors to a new device (“new” meaning not belonging to the set of [current device + devices of Tensor arguments]) within `run_fn`, deterministic output compared to non-checkpointed passes is never guaranteed. `torch.utils.checkpoint.checkpoint(function, *args, **kwargs)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/utils/checkpoint.html#checkpoint) Checkpoint a model or part of the model Checkpointing works by trading compute for memory. Rather than storing all intermediate activations of the entire computation graph for computing backward, the checkpointed part does **not** save intermediate activations, and instead recomputes them in backward pass. It can be applied on any part of a model. Specifically, in the forward pass, `function` will run in [`torch.no_grad()`](generated/torch.no_grad#torch.no_grad "torch.no_grad") manner, i.e., not storing the intermediate activations. Instead, the forward pass saves the inputs tuple and the `function` parameter. In the backwards pass, the saved inputs and `function` is retrieved, and the forward pass is computed on `function` again, now tracking the intermediate activations, and then the gradients are calculated using these activation values. Warning Checkpointing doesn’t work with [`torch.autograd.grad()`](autograd#torch.autograd.grad "torch.autograd.grad"), but only with [`torch.autograd.backward()`](autograd#torch.autograd.backward "torch.autograd.backward"). Warning If `function` invocation during backward does anything different than the one during forward, e.g., due to some global variable, the checkpointed version won’t be equivalent, and unfortunately it can’t be detected. Warning If checkpointed segment contains tensors detached from the computational graph by `detach()` or `torch.no_grad()`, the backward pass will raise an error. This is because `checkpoint` makes all the outputs require gradients which causes issues when a tensor is defined to have no gradient in the model. To circumvent this, detach the tensors outside of the `checkpoint` function. Parameters * **function** – describes what to run in the forward pass of the model or part of the model. It should also know how to handle the inputs passed as the tuple. For example, in LSTM, if user passes `(activation, hidden)`, `function` should correctly use the first input as `activation` and the second input as `hidden` * **preserve_rng_state** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_ _,__default=True_) – Omit stashing and restoring the RNG state during each checkpoint. 
* **args** – tuple containing inputs to the `function` Returns Output of running `function` on `*args` `torch.utils.checkpoint.checkpoint_sequential(functions, segments, input, **kwargs)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/utils/checkpoint.html#checkpoint_sequential) A helper function for checkpointing sequential models. Sequential models execute a list of modules/functions in order (sequentially). Therefore, we can divide such a model in various segments and checkpoint each segment. All segments except the last will run in [`torch.no_grad()`](generated/torch.no_grad#torch.no_grad "torch.no_grad") manner, i.e., not storing the intermediate activations. The inputs of each checkpointed segment will be saved for re-running the segment in the backward pass. See `checkpoint()` on how checkpointing works. Warning Checkpointing doesn’t work with [`torch.autograd.grad()`](autograd#torch.autograd.grad "torch.autograd.grad"), but only with [`torch.autograd.backward()`](autograd#torch.autograd.backward "torch.autograd.backward"). Parameters * **functions** – A [`torch.nn.Sequential`](generated/torch.nn.sequential#torch.nn.Sequential "torch.nn.Sequential") or the list of modules or functions (comprising the model) to run sequentially. * **segments** – Number of chunks to create in the model * **input** – A Tensor that is input to `functions` * **preserve_rng_state** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_ _,__default=True_) – Omit stashing and restoring the RNG state during each checkpoint. Returns Output of running `functions` sequentially on `*inputs` #### Example >>> model = nn.Sequential(...) >>> input_var = checkpoint_sequential(model, chunks, input_var) # torch.utils.cpp_extension `torch.utils.cpp_extension.CppExtension(name, sources, *args, **kwargs)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/utils/cpp_extension.html#CppExtension) Creates a `setuptools.Extension` for C++. Convenience method that creates a `setuptools.Extension` with the bare minimum (but often sufficient) arguments to build a C++ extension. All arguments are forwarded to the `setuptools.Extension` constructor. #### Example >>> from setuptools import setup >>> from torch.utils.cpp_extension import BuildExtension, CppExtension >>> setup( name='extension', ext_modules=[ CppExtension( name='extension', sources=['extension.cpp'], extra_compile_args=['-g']), ], cmdclass={ 'build_ext': BuildExtension }) `torch.utils.cpp_extension.CUDAExtension(name, sources, *args, **kwargs)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/utils/cpp_extension.html#CUDAExtension) Creates a `setuptools.Extension` for CUDA/C++. Convenience method that creates a `setuptools.Extension` with the bare minimum (but often sufficient) arguments to build a CUDA/C++ extension. This includes the CUDA include path, library path and runtime library. All arguments are forwarded to the `setuptools.Extension` constructor. #### Example >>> from setuptools import setup >>> from torch.utils.cpp_extension import BuildExtension, CUDAExtension >>> setup( name='cuda_extension', ext_modules=[ CUDAExtension( name='cuda_extension', sources=['extension.cpp', 'extension_kernel.cu'], extra_compile_args={'cxx': ['-g'], 'nvcc': ['-O2']}) ], cmdclass={ 'build_ext': BuildExtension }) Compute capabilities: By default the extension will be compiled to run on all archs of the cards visible during the building process of the extension, plus PTX. 
If a new card is installed down the road, the extension may need to be recompiled. If a visible card has a compute capability (CC) that’s newer than the newest version for which your nvcc can build fully-compiled binaries, PyTorch will make nvcc fall back to building kernels with the newest version of PTX your nvcc does support (see below for details on PTX). You can override the default behavior using `TORCH_CUDA_ARCH_LIST` to explicitly specify which CCs you want the extension to support: TORCH_CUDA_ARCH_LIST="6.1 8.6" python build_my_extension.py TORCH_CUDA_ARCH_LIST="5.2 6.0 6.1 7.0 7.5 8.0 8.6+PTX" python build_my_extension.py The +PTX option causes extension kernel binaries to include PTX instructions for the specified CC. PTX is an intermediate representation that allows kernels to runtime-compile for any CC >= the specified CC (for example, 8.6+PTX generates PTX that can runtime-compile for any GPU with CC >= 8.6). This improves your binary’s forward compatibility. However, relying on older PTX to provide forward compat by runtime-compiling for newer CCs can modestly reduce performance on those newer CCs. If you know exact CC(s) of the GPUs you want to target, you’re always better off specifying them individually. For example, if you want your extension to run on 8.0 and 8.6, “8.0+PTX” would work functionally because it includes PTX that can runtime-compile for 8.6, but “8.0 8.6” would be better. Note that while it’s possible to include all supported archs, the more archs get included the slower the building process will be, as it will build a separate kernel image for each arch. `torch.utils.cpp_extension.BuildExtension(*args, **kwargs)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/utils/cpp_extension.html#BuildExtension) A custom `setuptools` build extension. This `setuptools.build_ext` subclass takes care of passing the minimum required compiler flags (e.g. `-std=c++14`) as well as mixed C++/CUDA compilation (and support for CUDA files in general). When using `BuildExtension`, it is allowed to supply a dictionary for `extra_compile_args` (rather than the usual list) that maps from languages (`cxx` or `nvcc`) to a list of additional compiler flags to supply to the compiler. This makes it possible to supply different flags to the C++ and CUDA compiler during mixed compilation. `use_ninja` (bool): If `use_ninja` is `True` (default), then we attempt to build using the Ninja backend. Ninja greatly speeds up compilation compared to the standard `setuptools.build_ext`. Falls back to the standard distutils backend if Ninja is not available. Note By default, the Ninja backend uses #CPUS + 2 workers to build the extension. This may use up too many resources on some systems. One can control the number of workers by setting the `MAX_JOBS` environment variable to a non-negative number. `torch.utils.cpp_extension.load(name, sources, extra_cflags=None, extra_cuda_cflags=None, extra_ldflags=None, extra_include_paths=None, build_directory=None, verbose=False, with_cuda=None, is_python_module=True, is_standalone=False, keep_intermediates=True)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/utils/cpp_extension.html#load) Loads a PyTorch C++ extension just-in-time (JIT). To load an extension, a Ninja build file is emitted, which is used to compile the given sources into a dynamic library. This library is subsequently loaded into the current Python process as a module and returned from this function, ready for use.
By default, the directory to which the build file is emitted and the resulting library compiled to is `<tmp>/torch_extensions/<name>`, where `<tmp>` is the temporary folder on the current platform and `<name>` the name of the extension. This location can be overridden in two ways. First, if the `TORCH_EXTENSIONS_DIR` environment variable is set, it replaces `<tmp>/torch_extensions` and all extensions will be compiled into subfolders of this directory. Second, if the `build_directory` argument to this function is supplied, it overrides the entire path, i.e. the library will be compiled into that folder directly. To compile the sources, the default system compiler (`c++`) is used, which can be overridden by setting the `CXX` environment variable. To pass additional arguments to the compilation process, `extra_cflags` or `extra_ldflags` can be provided. For example, to compile your extension with optimizations, pass `extra_cflags=['-O3']`. You can also use `extra_cflags` to pass further include directories. CUDA support with mixed compilation is provided. Simply pass CUDA source files (`.cu` or `.cuh`) along with other sources. Such files will be detected and compiled with nvcc rather than the C++ compiler. This includes passing the CUDA lib64 directory as a library directory, and linking `cudart`. You can pass additional flags to nvcc via `extra_cuda_cflags`, just like with `extra_cflags` for C++. Various heuristics for finding the CUDA install directory are used, which usually work fine. If not, setting the `CUDA_HOME` environment variable is the safest option. Parameters * **name** – The name of the extension to build. This MUST be the same as the name of the pybind11 module! * **sources** – A list of relative or absolute paths to C++ source files. * **extra_cflags** – optional list of compiler flags to forward to the build. * **extra_cuda_cflags** – optional list of compiler flags to forward to nvcc when building CUDA sources. * **extra_ldflags** – optional list of linker flags to forward to the build. * **extra_include_paths** – optional list of include directories to forward to the build. * **build_directory** – optional path to use as build workspace. * **verbose** – If `True`, turns on verbose logging of load steps. * **with_cuda** – Determines whether CUDA headers and libraries are added to the build. If set to `None` (default), this value is automatically determined based on the existence of `.cu` or `.cuh` in `sources`. Set it to `True` to force CUDA headers and libraries to be included. * **is_python_module** – If `True` (default), imports the produced shared library as a Python module. If `False`, behavior depends on `is_standalone`. * **is_standalone** – If `False` (default) loads the constructed extension into the process as a plain dynamic library. If `True`, build a standalone executable. Returns Returns the loaded PyTorch extension as a Python module. If `is_python_module` is `False` and `is_standalone` is `False`: returns nothing. (The shared library is loaded into the process as a side effect.) If `is_standalone` is `True`: returns the path to the executable. (On Windows, TORCH_LIB_PATH is added to the PATH environment variable as a side effect.)
#### Example

    >>> from torch.utils.cpp_extension import load
    >>> module = load(
            name='extension',
            sources=['extension.cpp', 'extension_kernel.cu'],
            extra_cflags=['-O2'],
            verbose=True)

`torch.utils.cpp_extension.load_inline(name, cpp_sources, cuda_sources=None, functions=None, extra_cflags=None, extra_cuda_cflags=None, extra_ldflags=None, extra_include_paths=None, build_directory=None, verbose=False, with_cuda=None, is_python_module=True, with_pytorch_error_handling=True, keep_intermediates=True)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/utils/cpp_extension.html#load_inline)

Loads a PyTorch C++ extension just-in-time (JIT) from string sources.

This function behaves exactly like `load()`, but takes its sources as strings rather than filenames. These strings are stored to files in the build directory, after which the behavior of `load_inline()` is identical to `load()`. See [the tests](https://github.com/pytorch/pytorch/blob/master/test/test_cpp_extensions.py) for good examples of using this function.

Sources may omit two required parts of a typical non-inline C++ extension: the necessary header includes, as well as the (pybind11) binding code. More precisely, strings passed to `cpp_sources` are first concatenated into a single `.cpp` file. This file is then prepended with `#include <torch/extension.h>`. Furthermore, if the `functions` argument is supplied, bindings will be automatically generated for each function specified. `functions` can either be a list of function names, or a dictionary mapping from function names to docstrings. If a list is given, the name of each function is used as its docstring.

The sources in `cuda_sources` are concatenated into a separate `.cu` file and prepended with `torch/types.h`, `cuda.h` and `cuda_runtime.h` includes. The `.cpp` and `.cu` files are compiled separately, but ultimately linked into a single library. Note that no bindings are generated for functions in `cuda_sources` per se. To bind to a CUDA kernel, you must create a C++ function that calls it, and either declare or define this C++ function in one of the `cpp_sources` (and include its name in `functions`).

See `load()` for a description of arguments omitted below.

Parameters

* **cpp_sources** – A string, or list of strings, containing C++ source code.
* **cuda_sources** – A string, or list of strings, containing CUDA source code.
* **functions** – A list of function names for which to generate function bindings. If a dictionary is given, it should map function names to docstrings (which are otherwise just the function names).
* **with_cuda** – Determines whether CUDA headers and libraries are added to the build. If set to `None` (default), this value is automatically determined based on whether `cuda_sources` is provided. Set it to `True` to force CUDA headers and libraries to be included.
* **with_pytorch_error_handling** – Determines whether PyTorch error and warning macros are handled by PyTorch rather than pybind11. To do this, each function `foo` is called via an intermediary `_safe_foo` function. This redirection might cause issues in obscure cases of C++. This flag should be set to `False` when this redirect causes issues.
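To make the CUDA path concrete, here is a hedged sketch (the kernel, the `scale` wrapper, and the extension name are all hypothetical) in which `cuda_sources` defines a kernel plus a C++ launcher, `cpp_sources` merely declares the launcher, and `functions` lists it so a binding is generated. The example that follows shows the simpler C++-only case.

    import torch
    from torch.utils.cpp_extension import load_inline

    # Kernel and a host-side launcher; compiled by nvcc.
    cuda_source = '''
    __global__ void scale_kernel(const float* x, float* out, float a, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) out[i] = a * x[i];
    }

    at::Tensor scale(at::Tensor x, float a) {
        auto out = at::empty_like(x);
        int n = x.numel();
        int threads = 256;
        int blocks = (n + threads - 1) / threads;
        scale_kernel<<<blocks, threads>>>(x.data_ptr<float>(), out.data_ptr<float>(), a, n);
        return out;
    }
    '''

    # Declaration only; the binding is generated because "scale" appears in `functions`.
    cpp_source = 'at::Tensor scale(at::Tensor x, float a);'

    module = load_inline(name='scale_extension',
                         cpp_sources=[cpp_source],
                         cuda_sources=[cuda_source],
                         functions=['scale'])

    y = module.scale(torch.randn(16, device='cuda'), 2.0)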
#### Example

    >>> from torch.utils.cpp_extension import load_inline
    >>> source = '''
    at::Tensor sin_add(at::Tensor x, at::Tensor y) {
      return x.sin() + y.sin();
    }
    '''
    >>> module = load_inline(name='inline_extension',
                             cpp_sources=[source],
                             functions=['sin_add'])

Note By default, the Ninja backend uses #CPUS + 2 workers to build the extension. This may use up too many resources on some systems. One can control the number of workers by setting the `MAX_JOBS` environment variable to a non-negative number.

`torch.utils.cpp_extension.include_paths(cuda=False)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/utils/cpp_extension.html#include_paths)

Get the include paths required to build a C++ or CUDA extension.

Parameters **cuda** – If `True`, includes CUDA-specific include paths.

Returns A list of include path strings.

`torch.utils.cpp_extension.check_compiler_abi_compatibility(compiler)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/utils/cpp_extension.html#check_compiler_abi_compatibility)

Verifies that the given compiler is ABI-compatible with PyTorch.

Parameters **compiler** ([str](https://docs.python.org/3/library/stdtypes.html#str "\(in Python v3.9\)")) – The compiler executable name to check (e.g. `g++`). Must be executable in a shell process.

Returns False if the compiler is (likely) ABI-incompatible with PyTorch, else True.

`torch.utils.cpp_extension.verify_ninja_availability()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/utils/cpp_extension.html#verify_ninja_availability)

Raises `RuntimeError` if the [ninja](https://ninja-build.org/) build system is not available on the system; does nothing otherwise.

`torch.utils.cpp_extension.is_ninja_available()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/utils/cpp_extension.html#is_ninja_available)

Returns `True` if the [ninja](https://ninja-build.org/) build system is available on the system, `False` otherwise.

# torch.cuda

This package adds support for CUDA tensor types, which implement the same functions as CPU tensors but utilize GPUs for computation.

It is lazily initialized, so you can always import it, and use `is_available()` to determine if your system supports CUDA.

[CUDA semantics](https://pytorch.org/docs/1.8.0/notes/cuda.html#cuda-semantics) has more details about working with CUDA.

`torch.cuda.can_device_access_peer(device, peer_device)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/cuda.html#can_device_access_peer)

Checks if peer access between two devices is possible.

`torch.cuda.current_blas_handle()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/cuda.html#current_blas_handle)

Returns the cublasHandle_t pointer to the current cuBLAS handle.

`torch.cuda.current_device()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/cuda.html#current_device)

Returns the index of the currently selected device.

`torch.cuda.current_stream(device=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/cuda.html#current_stream)

Returns the currently selected `Stream` for a given device.

Parameters **device** ([torch.device](tensor_attributes#torch.torch.device "torch.torch.device") _or_[int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – selected device. Returns the currently selected `Stream` for the current device, given by `current_device()`, if `device` is `None` (default).
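As a quick, hedged illustration of the query functions above (it assumes nothing beyond a standard install and skips GPU work when CUDA is unavailable):

    import torch

    if torch.cuda.is_available():
        print("device count:", torch.cuda.device_count())
        print("current device index:", torch.cuda.current_device())
        print("current stream:", torch.cuda.current_stream())
    else:
        print("CUDA is not available; torch.cuda can still be imported safely.")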
`torch.cuda.default_stream(device=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/cuda.html#default_stream) Returns the default `Stream` for a given device. Parameters **device** ([torch.device](tensor_attributes#torch.torch.device "torch.torch.device") _or_[int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – selected device. Returns the default `Stream` for the current device, given by `current_device()`, if `device` is `None` (default). `class torch.cuda.device(device)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/cuda.html#device) Context-manager that changes the selected device. Parameters **device** ([torch.device](tensor_attributes#torch.torch.device "torch.torch.device") _or_[int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – device index to select. It’s a no-op if this argument is a negative integer or `None`. `torch.cuda.device_count()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/cuda.html#device_count) Returns the number of GPUs available. `class torch.cuda.device_of(obj)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/cuda.html#device_of) Context-manager that changes the current device to that of given object. You can use both tensors and storages as arguments. If a given object is not allocated on a GPU, this is a no-op. Parameters **obj** ([Tensor](tensors#torch.Tensor "torch.Tensor") _or_ _Storage_) – object allocated on the selected device. `torch.cuda.get_arch_list()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/cuda.html#get_arch_list) Returns list CUDA architectures this library was compiled for. `torch.cuda.get_device_capability(device=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/cuda.html#get_device_capability) Gets the cuda capability of a device. Parameters **device** ([torch.device](tensor_attributes#torch.torch.device "torch.torch.device") _or_[int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – device for which to return the device capability. This function is a no-op if this argument is a negative integer. It uses the current device, given by `current_device()`, if `device` is `None` (default). Returns the major and minor cuda capability of the device Return type [tuple](https://docs.python.org/3/library/stdtypes.html#tuple "\(in Python v3.9\)")([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)"), [int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) `torch.cuda.get_device_name(device=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/cuda.html#get_device_name) Gets the name of a device. Parameters **device** ([torch.device](tensor_attributes#torch.torch.device "torch.torch.device") _or_[int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – device for which to return the name. This function is a no-op if this argument is a negative integer. It uses the current device, given by `current_device()`, if `device` is `None` (default). Returns the name of the device Return type [str](https://docs.python.org/3/library/stdtypes.html#str "\(in Python v3.9\)") `torch.cuda.get_device_properties(device)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/cuda.html#get_device_properties) Gets the properties of a device. 
Parameters **device** ([torch.device](tensor_attributes#torch.torch.device "torch.torch.device") _or_[int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _or_[str](https://docs.python.org/3/library/stdtypes.html#str "\(in Python v3.9\)")) – device for which to return the properties of the device. Returns the properties of the device Return type _CudaDeviceProperties `torch.cuda.get_gencode_flags()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/cuda.html#get_gencode_flags) Returns NVCC gencode flags this library were compiled with. `torch.cuda.init()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/cuda.html#init) Initialize PyTorch’s CUDA state. You may need to call this explicitly if you are interacting with PyTorch via its C API, as Python bindings for CUDA functionality will not be available until this initialization takes place. Ordinary users should not need this, as all of PyTorch’s CUDA methods automatically initialize CUDA state on-demand. Does nothing if the CUDA state is already initialized. `torch.cuda.ipc_collect()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/cuda.html#ipc_collect) Force collects GPU memory after it has been released by CUDA IPC. Note Checks if any sent CUDA tensors could be cleaned from the memory. Force closes shared memory file used for reference counting if there is no active counters. Useful when the producer process stopped actively sending tensors and want to release unused memory. `torch.cuda.is_available()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/cuda.html#is_available) Returns a bool indicating if CUDA is currently available. `torch.cuda.is_initialized()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/cuda.html#is_initialized) Returns whether PyTorch’s CUDA state has been initialized. `torch.cuda.set_device(device)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/cuda.html#set_device) Sets the current device. Usage of this function is discouraged in favor of `device`. In most cases it’s better to use `CUDA_VISIBLE_DEVICES` environmental variable. Parameters **device** ([torch.device](tensor_attributes#torch.torch.device "torch.torch.device") _or_[int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – selected device. This function is a no-op if this argument is negative. `torch.cuda.stream(stream)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/cuda.html#stream) Context-manager that selects a given stream. All CUDA kernels queued within its context will be enqueued on a selected stream. Parameters **stream** (Stream) – selected stream. This manager is a no-op if it’s `None`. Note Streams are per-device. If the selected stream is not on the current device, this function will also change the current device to match the stream. `torch.cuda.synchronize(device=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/cuda.html#synchronize) Waits for all kernels in all streams on a CUDA device to complete. Parameters **device** ([torch.device](tensor_attributes#torch.torch.device "torch.torch.device") _or_[int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – device for which to synchronize. It uses the current device, given by `current_device()`, if `device` is `None` (default). 
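The sketch below ties together the device-selection and synchronization helpers documented above; it assumes at least one CUDA device is present:

    import torch

    # Query properties of device 0.
    name = torch.cuda.get_device_name(0)
    major, minor = torch.cuda.get_device_capability(0)
    print(f"{name}: compute capability {major}.{minor}")

    # Temporarily switch the selected device (preferred over set_device()).
    with torch.cuda.device(0):
        x = torch.randn(1024, 1024, device='cuda')  # allocated on device 0
        y = x @ x                                   # kernel is queued asynchronously
        torch.cuda.synchronize()                    # block until all queued work finishes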
## Random Number Generator `torch.cuda.get_rng_state(device='cuda')` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/cuda/random.html#get_rng_state) Returns the random number generator state of the specified GPU as a ByteTensor. Parameters **device** ([torch.device](tensor_attributes#torch.torch.device "torch.torch.device") _or_[int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – The device to return the RNG state of. Default: `'cuda'` (i.e., `torch.device('cuda')`, the current CUDA device). Warning This function eagerly initializes CUDA. `torch.cuda.get_rng_state_all()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/cuda/random.html#get_rng_state_all) Returns a list of ByteTensor representing the random number states of all devices. `torch.cuda.set_rng_state(new_state, device='cuda')` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/cuda/random.html#set_rng_state) Sets the random number generator state of the specified GPU. Parameters * **new_state** (_torch.ByteTensor_) – The desired state * **device** ([torch.device](tensor_attributes#torch.torch.device "torch.torch.device") _or_[int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – The device to set the RNG state. Default: `'cuda'` (i.e., `torch.device('cuda')`, the current CUDA device). `torch.cuda.set_rng_state_all(new_states)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/cuda/random.html#set_rng_state_all) Sets the random number generator state of all devices. Parameters **new_states** (_Iterable of torch.ByteTensor_) – The desired state for each device `torch.cuda.manual_seed(seed)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/cuda/random.html#manual_seed) Sets the seed for generating random numbers for the current GPU. It’s safe to call this function if CUDA is not available; in that case, it is silently ignored. Parameters **seed** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – The desired seed. Warning If you are working with a multi-GPU model, this function is insufficient to get determinism. To seed all GPUs, use `manual_seed_all()`. `torch.cuda.manual_seed_all(seed)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/cuda/random.html#manual_seed_all) Sets the seed for generating random numbers on all GPUs. It’s safe to call this function if CUDA is not available; in that case, it is silently ignored. Parameters **seed** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – The desired seed. `torch.cuda.seed()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/cuda/random.html#seed) Sets the seed for generating random numbers to a random number for the current GPU. It’s safe to call this function if CUDA is not available; in that case, it is silently ignored. Warning If you are working with a multi-GPU model, this function will only initialize the seed on one GPU. To initialize all GPUs, use `seed_all()`. `torch.cuda.seed_all()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/cuda/random.html#seed_all) Sets the seed for generating random numbers to a random number on all GPUs. It’s safe to call this function if CUDA is not available; in that case, it is silently ignored. `torch.cuda.initial_seed()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/cuda/random.html#initial_seed) Returns the current random seed of the current GPU. Warning This function eagerly initializes CUDA. 
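A minimal sketch of seeding and of checkpointing/restoring the CUDA RNG state with the functions above (the `_all` variants are used so the snippet also covers multi-GPU setups):

    import torch

    torch.cuda.manual_seed_all(1234)          # seed every visible GPU

    states = torch.cuda.get_rng_state_all()   # list of ByteTensors, one per device
    a = torch.randn(4, device='cuda')

    torch.cuda.set_rng_state_all(states)      # rewind the generators
    b = torch.randn(4, device='cuda')

    assert torch.equal(a, b)                  # identical draws after restoring the state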
## Communication collectives `torch.cuda.comm.broadcast(tensor, devices=None, *, out=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/parallel/comm.html#broadcast) Broadcasts a tensor to specified GPU devices. Parameters * **tensor** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – tensor to broadcast. Can be on CPU or GPU. * **devices** (_Iterable_ _[_[torch.device](tensor_attributes#torch.torch.device "torch.torch.device") _,_[str](https://docs.python.org/3/library/stdtypes.html#str "\(in Python v3.9\)") _or_[int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _]__,__optional_) – an iterable of GPU devices, among which to broadcast. * **out** (_Sequence_ _[_[Tensor](tensors#torch.Tensor "torch.Tensor") _]__,__optional_ _,__keyword-only_) – the GPU tensors to store output results. Note Exactly one of `devices` and `out` must be specified. Returns * `If devices is specified,` a tuple containing copies of `tensor`, placed on `devices`. * `If out is specified,` a tuple containing `out` tensors, each containing a copy of `tensor`. `torch.cuda.comm.broadcast_coalesced(tensors, devices, buffer_size=10485760)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/parallel/comm.html#broadcast_coalesced) Broadcasts a sequence tensors to the specified GPUs. Small tensors are first coalesced into a buffer to reduce the number of synchronizations. Parameters * **tensors** (_sequence_) – tensors to broadcast. Must be on the same device, either CPU or GPU. * **devices** (_Iterable_ _[_[torch.device](tensor_attributes#torch.torch.device "torch.torch.device") _,_[str](https://docs.python.org/3/library/stdtypes.html#str "\(in Python v3.9\)") _or_[int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _]_) – an iterable of GPU devices, among which to broadcast. * **buffer_size** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – maximum size of the buffer used for coalescing Returns A tuple containing copies of `tensor`, placed on `devices`. `torch.cuda.comm.reduce_add(inputs, destination=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/parallel/comm.html#reduce_add) Sums tensors from multiple GPUs. All inputs should have matching shapes, dtype, and layout. The output tensor will be of the same shape, dtype, and layout. Parameters * **inputs** (_Iterable_ _[_[Tensor](tensors#torch.Tensor "torch.Tensor") _]_) – an iterable of tensors to add. * **destination** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – a device on which the output will be placed (default: current device). Returns A tensor containing an elementwise sum of all inputs, placed on the `destination` device. `torch.cuda.comm.scatter(tensor, devices=None, chunk_sizes=None, dim=0, streams=None, *, out=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/parallel/comm.html#scatter) Scatters tensor across multiple GPUs. Parameters * **tensor** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – tensor to scatter. Can be on CPU or GPU. * **devices** (_Iterable_ _[_[torch.device](tensor_attributes#torch.torch.device "torch.torch.device") _,_[str](https://docs.python.org/3/library/stdtypes.html#str "\(in Python v3.9\)") _or_[int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _]__,__optional_) – an iterable of GPU devices, among which to scatter. 
* **chunk_sizes** (_Iterable_ _[_[int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _]__,__optional_) – sizes of chunks to be placed on each device. It should match `devices` in length and sums to `tensor.size(dim)`. If not specified, `tensor` will be divided into equal chunks. * **dim** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – A dimension along which to chunk `tensor`. Default: `0`. * **streams** (_Iterable_ _[_Stream _]__,__optional_) – an iterable of Streams, among which to execute the scatter. If not specified, the default stream will be utilized. * **out** (_Sequence_ _[_[Tensor](tensors#torch.Tensor "torch.Tensor") _]__,__optional_ _,__keyword-only_) – the GPU tensors to store output results. Sizes of these tensors must match that of `tensor`, except for `dim`, where the total size must sum to `tensor.size(dim)`. Note Exactly one of `devices` and `out` must be specified. When `out` is specified, `chunk_sizes` must not be specified and will be inferred from sizes of `out`. Returns * `If devices is specified,` a tuple containing chunks of `tensor`, placed on `devices`. * `If out is specified,` a tuple containing `out` tensors, each containing a chunk of `tensor`. `torch.cuda.comm.gather(tensors, dim=0, destination=None, *, out=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/parallel/comm.html#gather) Gathers tensors from multiple GPU devices. Parameters * **tensors** (_Iterable_ _[_[Tensor](tensors#torch.Tensor "torch.Tensor") _]_) – an iterable of tensors to gather. Tensor sizes in all dimensions other than `dim` have to match. * **dim** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – a dimension along which the tensors will be concatenated. Default: `0`. * **destination** ([torch.device](tensor_attributes#torch.torch.device "torch.torch.device") _,_[str](https://docs.python.org/3/library/stdtypes.html#str "\(in Python v3.9\)") _, or_[int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – the output device. Can be CPU or CUDA. Default: the current CUDA device. * **out** ([Tensor](tensors#torch.Tensor "torch.Tensor") _,__optional_ _,__keyword-only_) – the tensor to store gather result. Its sizes must match those of `tensors`, except for `dim`, where the size must equal `sum(tensor.size(dim) for tensor in tensors)`. Can be on CPU or CUDA. Note `destination` must not be specified when `out` is specified. Returns * `If destination is specified,` a tensor located on `destination` device, that is a result of concatenating `tensors` along `dim`. * `If out is specified,` the `out` tensor, now containing results of concatenating `tensors` along `dim`. ## Streams and events `class torch.cuda.Stream` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/cuda/streams.html#Stream) Wrapper around a CUDA stream. A CUDA stream is a linear sequence of execution that belongs to a specific device, independent from other streams. See [CUDA semantics](https://pytorch.org/docs/1.8.0/notes/cuda.html#cuda-semantics) for details. Parameters * **device** ([torch.device](tensor_attributes#torch.torch.device "torch.torch.device") _or_[int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – a device on which to allocate the stream. If `device` is `None` (default) or a negative integer, this will use the current device. 
* **priority** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – priority of the stream. Can be either -1 (high priority) or 0 (low priority). By default, streams have priority 0. Note Although CUDA versions >= 11 support more than two levels of priorities, in PyTorch, we only support two levels of priorities. `query()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/cuda/streams.html#Stream.query) Checks if all the work submitted has been completed. Returns A boolean indicating if all kernels in this stream are completed. `record_event(event=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/cuda/streams.html#Stream.record_event) Records an event. Parameters **event** (Event _,__optional_) – event to record. If not given, a new one will be allocated. Returns Recorded event. `synchronize()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/cuda/streams.html#Stream.synchronize) Wait for all the kernels in this stream to complete. Note This is a wrapper around `cudaStreamSynchronize()`: see [CUDA Stream documentation](https://docs.nvidia.com/cuda/cuda-runtime- api/group__CUDART__STREAM.html) for more info. `wait_event(event)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/cuda/streams.html#Stream.wait_event) Makes all future work submitted to the stream wait for an event. Parameters **event** (Event) – an event to wait for. Note This is a wrapper around `cudaStreamWaitEvent()`: see [CUDA Stream documentation](https://docs.nvidia.com/cuda/cuda-runtime- api/group__CUDART__STREAM.html) for more info. This function returns without waiting for `event`: only future operations are affected. `wait_stream(stream)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/cuda/streams.html#Stream.wait_stream) Synchronizes with another stream. All future work submitted to this stream will wait until all kernels submitted to a given stream at the time of call complete. Parameters **stream** (Stream) – a stream to synchronize. Note This function returns without waiting for currently enqueued kernels in `stream`: only future operations are affected. `class torch.cuda.Event` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/cuda/streams.html#Event) Wrapper around a CUDA event. CUDA events are synchronization markers that can be used to monitor the device’s progress, to accurately measure timing, and to synchronize CUDA streams. The underlying CUDA events are lazily initialized when the event is first recorded or exported to another process. After creation, only streams on the same device may record the event. However, streams on any device can wait on the event. Parameters * **enable_timing** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – indicates if the event should measure time (default: `False`) * **blocking** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – if `True`, `wait()` will be blocking (default: `False`) * **interprocess** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – if `True`, the event can be shared between processes (default: `False`) `elapsed_time(end_event)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/cuda/streams.html#Event.elapsed_time) Returns the time elapsed in milliseconds after the event was recorded and before the end_event was recorded. 
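Putting the stream and event APIs together, here is a hedged sketch that times a matrix multiply with `elapsed_time()` and runs an independent host-to-device copy on a side stream:

    import torch

    x = torch.randn(2048, 2048, device='cuda')

    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)

    start.record()                   # recorded on the current stream
    y = x @ x
    end.record()
    end.synchronize()                # wait for the work captured by `end`
    print("matmul took", start.elapsed_time(end), "ms")

    # Overlap an independent copy on a side stream.
    side = torch.cuda.Stream()
    side.wait_stream(torch.cuda.current_stream())  # don't run ahead of queued work
    with torch.cuda.stream(side):
        z = torch.ones(2048, 2048).pin_memory().to('cuda', non_blocking=True)
    torch.cuda.current_stream().wait_stream(side)  # re-join before using z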
`classmethod from_ipc_handle(device, handle)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/cuda/streams.html#Event.from_ipc_handle) Reconstruct an event from an IPC handle on the given device. `ipc_handle()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/cuda/streams.html#Event.ipc_handle) Returns an IPC handle of this event. If not recorded yet, the event will use the current device. `query()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/cuda/streams.html#Event.query) Checks if all work currently captured by event has completed. Returns A boolean indicating if all work currently captured by event has completed. `record(stream=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/cuda/streams.html#Event.record) Records the event in a given stream. Uses `torch.cuda.current_stream()` if no stream is specified. The stream’s device must match the event’s device. `synchronize()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/cuda/streams.html#Event.synchronize) Waits for the event to complete. Waits until the completion of all work currently captured in this event. This prevents the CPU thread from proceeding until the event completes. Note This is a wrapper around `cudaEventSynchronize()`: see [CUDA Event documentation](https://docs.nvidia.com/cuda/cuda-runtime- api/group__CUDART__EVENT.html) for more info. `wait(stream=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/cuda/streams.html#Event.wait) Makes all future work submitted to the given stream wait for this event. Use `torch.cuda.current_stream()` if no stream is specified. ## Memory management `torch.cuda.empty_cache()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/cuda/memory.html#empty_cache) Releases all unoccupied cached memory currently held by the caching allocator so that those can be used in other GPU application and visible in `nvidia- smi`. Note `empty_cache()` doesn’t increase the amount of GPU memory available for PyTorch. However, it may help reduce fragmentation of GPU memory in certain cases. See [Memory management](https://pytorch.org/docs/1.8.0/notes/cuda.html#cuda-memory- management) for more details about GPU memory management. `torch.cuda.list_gpu_processes(device=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/cuda/memory.html#list_gpu_processes) Returns a human-readable printout of the running processes and their GPU memory use for a given device. This can be useful to display periodically during training, or when handling out-of-memory exceptions. Parameters **device** ([torch.device](tensor_attributes#torch.torch.device "torch.torch.device") _or_[int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – selected device. Returns printout for the current device, given by `current_device()`, if `device` is `None` (default). `torch.cuda.memory_stats(device=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/cuda/memory.html#memory_stats) Returns a dictionary of CUDA memory allocator statistics for a given device. The return value of this function is a dictionary of statistics, each of which is a non-negative integer. Core statistics: * `"allocated.{all,large_pool,small_pool}.{current,peak,allocated,freed}"`: number of allocation requests received by the memory allocator. * `"allocated_bytes.{all,large_pool,small_pool}.{current,peak,allocated,freed}"`: amount of allocated memory. 
* `"segment.{all,large_pool,small_pool}.{current,peak,allocated,freed}"`: number of reserved segments from `cudaMalloc()`. * `"reserved_bytes.{all,large_pool,small_pool}.{current,peak,allocated,freed}"`: amount of reserved memory. * `"active.{all,large_pool,small_pool}.{current,peak,allocated,freed}"`: number of active memory blocks. * `"active_bytes.{all,large_pool,small_pool}.{current,peak,allocated,freed}"`: amount of active memory. * `"inactive_split.{all,large_pool,small_pool}.{current,peak,allocated,freed}"`: number of inactive, non-releasable memory blocks. * `"inactive_split_bytes.{all,large_pool,small_pool}.{current,peak,allocated,freed}"`: amount of inactive, non-releasable memory. For these core statistics, values are broken down as follows. Pool type: * `all`: combined statistics across all memory pools. * `large_pool`: statistics for the large allocation pool (as of October 2019, for size >= 1MB allocations). * `small_pool`: statistics for the small allocation pool (as of October 2019, for size < 1MB allocations). Metric type: * `current`: current value of this metric. * `peak`: maximum value of this metric. * `allocated`: historical total increase in this metric. * `freed`: historical total decrease in this metric. In addition to the core statistics, we also provide some simple event counters: * `"num_alloc_retries"`: number of failed `cudaMalloc` calls that result in a cache flush and retry. * `"num_ooms"`: number of out-of-memory errors thrown. Parameters **device** ([torch.device](tensor_attributes#torch.torch.device "torch.torch.device") _or_[int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – selected device. Returns statistics for the current device, given by `current_device()`, if `device` is `None` (default). Note See [Memory management](https://pytorch.org/docs/1.8.0/notes/cuda.html#cuda- memory-management) for more details about GPU memory management. `torch.cuda.memory_summary(device=None, abbreviated=False)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/cuda/memory.html#memory_summary) Returns a human-readable printout of the current memory allocator statistics for a given device. This can be useful to display periodically during training, or when handling out-of-memory exceptions. Parameters * **device** ([torch.device](tensor_attributes#torch.torch.device "torch.torch.device") _or_[int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – selected device. Returns printout for the current device, given by `current_device()`, if `device` is `None` (default). * **abbreviated** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – whether to return an abbreviated summary (default: False). Note See [Memory management](https://pytorch.org/docs/1.8.0/notes/cuda.html#cuda- memory-management) for more details about GPU memory management. `torch.cuda.memory_snapshot()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/cuda/memory.html#memory_snapshot) Returns a snapshot of the CUDA memory allocator state across all devices. Interpreting the output of this function requires familiarity with the memory allocator internals. Note See [Memory management](https://pytorch.org/docs/1.8.0/notes/cuda.html#cuda- memory-management) for more details about GPU memory management. 
`torch.cuda.memory_allocated(device=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/cuda/memory.html#memory_allocated) Returns the current GPU memory occupied by tensors in bytes for a given device. Parameters **device** ([torch.device](tensor_attributes#torch.torch.device "torch.torch.device") _or_[int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – selected device. Returns statistic for the current device, given by `current_device()`, if `device` is `None` (default). Note This is likely less than the amount shown in `nvidia-smi` since some unused memory can be held by the caching allocator and some context needs to be created on GPU. See [Memory management](https://pytorch.org/docs/1.8.0/notes/cuda.html#cuda-memory- management) for more details about GPU memory management. `torch.cuda.max_memory_allocated(device=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/cuda/memory.html#max_memory_allocated) Returns the maximum GPU memory occupied by tensors in bytes for a given device. By default, this returns the peak allocated memory since the beginning of this program. `reset_peak_stats()` can be used to reset the starting point in tracking this metric. For example, these two functions can measure the peak allocated memory usage of each iteration in a training loop. Parameters **device** ([torch.device](tensor_attributes#torch.torch.device "torch.torch.device") _or_[int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – selected device. Returns statistic for the current device, given by `current_device()`, if `device` is `None` (default). Note See [Memory management](https://pytorch.org/docs/1.8.0/notes/cuda.html#cuda- memory-management) for more details about GPU memory management. `torch.cuda.reset_max_memory_allocated(device=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/cuda/memory.html#reset_max_memory_allocated) Resets the starting point in tracking maximum GPU memory occupied by tensors for a given device. See `max_memory_allocated()` for details. Parameters **device** ([torch.device](tensor_attributes#torch.torch.device "torch.torch.device") _or_[int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – selected device. Returns statistic for the current device, given by `current_device()`, if `device` is `None` (default). Warning This function now calls `reset_peak_memory_stats()`, which resets /all/ peak memory stats. Note See [Memory management](https://pytorch.org/docs/1.8.0/notes/cuda.html#cuda- memory-management) for more details about GPU memory management. `torch.cuda.memory_reserved(device=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/cuda/memory.html#memory_reserved) Returns the current GPU memory managed by the caching allocator in bytes for a given device. Parameters **device** ([torch.device](tensor_attributes#torch.torch.device "torch.torch.device") _or_[int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – selected device. Returns statistic for the current device, given by `current_device()`, if `device` is `None` (default). Note See [Memory management](https://pytorch.org/docs/1.8.0/notes/cuda.html#cuda- memory-management) for more details about GPU memory management. 
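As noted above, `reset_peak_memory_stats()` and `max_memory_allocated()` can be combined to measure the peak memory of each training iteration. A minimal sketch (the tiny model and random data are stand-ins):

    import torch

    model = torch.nn.Linear(512, 512).cuda()
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
    loss_fn = torch.nn.MSELoss()

    for step in range(3):
        input = torch.randn(64, 512, device='cuda')
        target = torch.randn(64, 512, device='cuda')

        torch.cuda.reset_peak_memory_stats()          # restart peak tracking for this step
        output = model(input)
        loss = loss_fn(output, target)
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()

        peak = torch.cuda.max_memory_allocated()      # peak bytes since the reset
        print(f"step {step}: peak allocated {peak / 2**20:.1f} MiB")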
`torch.cuda.max_memory_reserved(device=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/cuda/memory.html#max_memory_reserved) Returns the maximum GPU memory managed by the caching allocator in bytes for a given device. By default, this returns the peak cached memory since the beginning of this program. `reset_peak_stats()` can be used to reset the starting point in tracking this metric. For example, these two functions can measure the peak cached memory amount of each iteration in a training loop. Parameters **device** ([torch.device](tensor_attributes#torch.torch.device "torch.torch.device") _or_[int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – selected device. Returns statistic for the current device, given by `current_device()`, if `device` is `None` (default). Note See [Memory management](https://pytorch.org/docs/1.8.0/notes/cuda.html#cuda- memory-management) for more details about GPU memory management. `torch.cuda.set_per_process_memory_fraction(fraction, device=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/cuda/memory.html#set_per_process_memory_fraction) Set memory fraction for a process. The fraction is used to limit an caching allocator to allocated memory on a CUDA device. The allowed value equals the total visible memory multiplied fraction. If trying to allocate more than the allowed value in a process, will raise an out of memory error in allocator. Parameters * **fraction** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)")) – Range: 0~1. Allowed memory equals total_memory * fraction. * **device** ([torch.device](tensor_attributes#torch.torch.device "torch.torch.device") _or_[int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – selected device. If it is `None` the default CUDA device is used. Note In general, the total available free memory is less than the total capacity. `torch.cuda.memory_cached(device=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/cuda/memory.html#memory_cached) Deprecated; see `memory_reserved()`. `torch.cuda.max_memory_cached(device=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/cuda/memory.html#max_memory_cached) Deprecated; see `max_memory_reserved()`. `torch.cuda.reset_max_memory_cached(device=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/cuda/memory.html#reset_max_memory_cached) Resets the starting point in tracking maximum GPU memory managed by the caching allocator for a given device. See `max_memory_cached()` for details. Parameters **device** ([torch.device](tensor_attributes#torch.torch.device "torch.torch.device") _or_[int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – selected device. Returns statistic for the current device, given by `current_device()`, if `device` is `None` (default). Warning This function now calls `reset_peak_memory_stats()`, which resets /all/ peak memory stats. Note See [Memory management](https://pytorch.org/docs/1.8.0/notes/cuda.html#cuda- memory-management) for more details about GPU memory management. ## NVIDIA Tools Extension (NVTX) `torch.cuda.nvtx.mark(msg)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/cuda/nvtx.html#mark) Describe an instantaneous event that occurred at some point. Parameters **msg** (_string_) – ASCII message to associate with the event. 
`torch.cuda.nvtx.range_push(msg)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/cuda/nvtx.html#range_push) Pushes a range onto a stack of nested range span. Returns zero-based depth of the range that is started. Parameters **msg** (_string_) – ASCII message to associate with range `torch.cuda.nvtx.range_pop()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/cuda/nvtx.html#range_pop) Pops a range off of a stack of nested range spans. Returns the zero-based depth of the range that is ended. # torch.utils.data At the heart of PyTorch data loading utility is the `torch.utils.data.DataLoader` class. It represents a Python iterable over a dataset, with support for * map-style and iterable-style datasets, * customizing data loading order, * automatic batching, * single- and multi-process data loading, * automatic memory pinning. These options are configured by the constructor arguments of a `DataLoader`, which has signature: DataLoader(dataset, batch_size=1, shuffle=False, sampler=None, batch_sampler=None, num_workers=0, collate_fn=None, pin_memory=False, drop_last=False, timeout=0, worker_init_fn=None, *, prefetch_factor=2, persistent_workers=False) The sections below describe in details the effects and usages of these options. ## Dataset Types The most important argument of `DataLoader` constructor is `dataset`, which indicates a dataset object to load data from. PyTorch supports two different types of datasets: * map-style datasets, * iterable-style datasets. ### Map-style datasets A map-style dataset is one that implements the `__getitem__()` and `__len__()` protocols, and represents a map from (possibly non-integral) indices/keys to data samples. For example, such a dataset, when accessed with `dataset[idx]`, could read the `idx`-th image and its corresponding label from a folder on the disk. See `Dataset` for more details. ### Iterable-style datasets An iterable-style dataset is an instance of a subclass of `IterableDataset` that implements the `__iter__()` protocol, and represents an iterable over data samples. This type of datasets is particularly suitable for cases where random reads are expensive or even improbable, and where the batch size depends on the fetched data. For example, such a dataset, when called `iter(dataset)`, could return a stream of data reading from a database, a remote server, or even logs generated in real time. See `IterableDataset` for more details. Note When using an `IterableDataset` with multi-process data loading. The same dataset object is replicated on each worker process, and thus the replicas must be configured differently to avoid duplicated data. See `IterableDataset` documentations for how to achieve this. ## Data Loading Order and Sampler For iterable-style datasets, data loading order is entirely controlled by the user-defined iterable. This allows easier implementations of chunk-reading and dynamic batch size (e.g., by yielding a batched sample at each time). The rest of this section concerns the case with map-style datasets. `torch.utils.data.Sampler` classes are used to specify the sequence of indices/keys used in data loading. They represent iterable objects over the indices to datasets. E.g., in the common case with stochastic gradient decent (SGD), a `Sampler` could randomly permute a list of indices and yield each one at a time, or yield a small number of them for mini-batch SGD. A sequential or shuffled sampler will be automatically constructed based on the `shuffle` argument to a `DataLoader`. 
Alternatively, users may use the `sampler` argument to specify a custom `Sampler` object that at each time yields the next index/key to fetch. A custom `Sampler` that yields a list of batch indices at a time can be passed as the `batch_sampler` argument. Automatic batching can also be enabled via `batch_size` and `drop_last` arguments. See the next section for more details on this. Note Neither `sampler` nor `batch_sampler` is compatible with iterable-style datasets, since such datasets have no notion of a key or an index. ## Loading Batched and Non-Batched Data `DataLoader` supports automatically collating individual fetched data samples into batches via arguments `batch_size`, `drop_last`, and `batch_sampler`. ### Automatic batching (default) This is the most common case, and corresponds to fetching a minibatch of data and collating them into batched samples, i.e., containing Tensors with one dimension being the batch dimension (usually the first). When `batch_size` (default `1`) is not `None`, the data loader yields batched samples instead of individual samples. `batch_size` and `drop_last` arguments are used to specify how the data loader obtains batches of dataset keys. For map-style datasets, users can alternatively specify `batch_sampler`, which yields a list of keys at a time. Note The `batch_size` and `drop_last` arguments essentially are used to construct a `batch_sampler` from `sampler`. For map-style datasets, the `sampler` is either provided by user or constructed based on the `shuffle` argument. For iterable-style datasets, the `sampler` is a dummy infinite one. See this section on more details on samplers. Note When fetching from iterable-style datasets with multi-processing, the `drop_last` argument drops the last non-full batch of each worker’s dataset replica. After fetching a list of samples using the indices from sampler, the function passed as the `collate_fn` argument is used to collate lists of samples into batches. In this case, loading from a map-style dataset is roughly equivalent with: for indices in batch_sampler: yield collate_fn([dataset[i] for i in indices]) and loading from an iterable-style dataset is roughly equivalent with: dataset_iter = iter(dataset) for indices in batch_sampler: yield collate_fn([next(dataset_iter) for _ in indices]) A custom `collate_fn` can be used to customize collation, e.g., padding sequential data to max length of a batch. See this section on more about `collate_fn`. ### Disable automatic batching In certain cases, users may want to handle batching manually in dataset code, or simply load individual samples. For example, it could be cheaper to directly load batched data (e.g., bulk reads from a database or reading continuous chunks of memory), or the batch size is data dependent, or the program is designed to work on individual samples. Under these scenarios, it’s likely better to not use automatic batching (where `collate_fn` is used to collate the samples), but let the data loader directly return each member of the `dataset` object. When both `batch_size` and `batch_sampler` are `None` (default value for `batch_sampler` is already `None`), automatic batching is disabled. Each sample obtained from the `dataset` is processed with the function passed as the `collate_fn` argument. **When automatic batching is disabled** , the default `collate_fn` simply converts NumPy arrays into PyTorch Tensors, and keeps everything else untouched. 
In this case, loading from a map-style dataset is roughly equivalent with: for index in sampler: yield collate_fn(dataset[index]) and loading from an iterable-style dataset is roughly equivalent with: for data in iter(dataset): yield collate_fn(data) See this section on more about `collate_fn`. ### Working with `collate_fn` The use of `collate_fn` is slightly different when automatic batching is enabled or disabled. **When automatic batching is disabled** , `collate_fn` is called with each individual data sample, and the output is yielded from the data loader iterator. In this case, the default `collate_fn` simply converts NumPy arrays in PyTorch tensors. **When automatic batching is enabled** , `collate_fn` is called with a list of data samples at each time. It is expected to collate the input samples into a batch for yielding from the data loader iterator. The rest of this section describes behavior of the default `collate_fn` in this case. For instance, if each data sample consists of a 3-channel image and an integral class label, i.e., each element of the dataset returns a tuple `(image, class_index)`, the default `collate_fn` collates a list of such tuples into a single tuple of a batched image tensor and a batched class label Tensor. In particular, the default `collate_fn` has the following properties: * It always prepends a new dimension as the batch dimension. * It automatically converts NumPy arrays and Python numerical values into PyTorch Tensors. * It preserves the data structure, e.g., if each sample is a dictionary, it outputs a dictionary with the same set of keys but batched Tensors as values (or lists if the values can not be converted into Tensors). Same for `list` s, `tuple` s, `namedtuple` s, etc. Users may use customized `collate_fn` to achieve custom batching, e.g., collating along a dimension other than the first, padding sequences of various lengths, or adding support for custom data types. ## Single- and Multi-process Data Loading A `DataLoader` uses single-process data loading by default. Within a Python process, the [Global Interpreter Lock (GIL)](https://wiki.python.org/moin/GlobalInterpreterLock) prevents true fully parallelizing Python code across threads. To avoid blocking computation code with data loading, PyTorch provides an easy switch to perform multi-process data loading by simply setting the argument `num_workers` to a positive integer. ### Single-process data loading (default) In this mode, data fetching is done in the same process a `DataLoader` is initialized. Therefore, data loading may block computing. However, this mode may be preferred when resource(s) used for sharing data among processes (e.g., shared memory, file descriptors) is limited, or when the entire dataset is small and can be loaded entirely in memory. Additionally, single-process loading often shows more readable error traces and thus is useful for debugging. ### Multi-process data loading Setting the argument `num_workers` as a positive integer will turn on multi- process data loading with the specified number of loader worker processes. In this mode, each time an iterator of a `DataLoader` is created (e.g., when you call `enumerate(dataloader)`), `num_workers` worker processes are created. At this point, the `dataset`, `collate_fn`, and `worker_init_fn` are passed to each worker, where they are used to initialize, and fetch data. This means that dataset access together with its internal IO, transforms (including `collate_fn`) runs in the worker process. 
`torch.utils.data.get_worker_info()` returns various useful information in a worker process (including the worker id, dataset replica, initial seed, etc.), and returns `None` in main process. Users may use this function in dataset code and/or `worker_init_fn` to individually configure each dataset replica, and to determine whether the code is running in a worker process. For example, this can be particularly helpful in sharding the dataset. For map-style datasets, the main process generates the indices using `sampler` and sends them to the workers. So any shuffle randomization is done in the main process which guides loading by assigning indices to load. For iterable-style datasets, since each worker process gets a replica of the `dataset` object, naive multi-process loading will often result in duplicated data. Using `torch.utils.data.get_worker_info()` and/or `worker_init_fn`, users may configure each replica independently. (See `IterableDataset` documentations for how to achieve this. ) For similar reasons, in multi- process loading, the `drop_last` argument drops the last non-full batch of each worker’s iterable-style dataset replica. Workers are shut down once the end of the iteration is reached, or when the iterator becomes garbage collected. Warning It is generally not recommended to return CUDA tensors in multi-process loading because of many subtleties in using CUDA and sharing CUDA tensors in multiprocessing (see [CUDA in multiprocessing](https://pytorch.org/docs/1.8.0/notes/multiprocessing.html#multiprocessing- cuda-note)). Instead, we recommend using automatic memory pinning (i.e., setting `pin_memory=True`), which enables fast data transfer to CUDA-enabled GPUs. #### Platform-specific behaviors Since workers rely on Python [`multiprocessing`](https://docs.python.org/3/library/multiprocessing.html#module- multiprocessing "\(in Python v3.9\)"), worker launch behavior is different on Windows compared to Unix. * On Unix, `fork()` is the default [`multiprocessing`](https://docs.python.org/3/library/multiprocessing.html#module-multiprocessing "\(in Python v3.9\)") start method. Using `fork()`, child workers typically can access the `dataset` and Python argument functions directly through the cloned address space. * On Windows, `spawn()` is the default [`multiprocessing`](https://docs.python.org/3/library/multiprocessing.html#module-multiprocessing "\(in Python v3.9\)") start method. Using `spawn()`, another interpreter is launched which runs your main script, followed by the internal worker function that receives the `dataset`, `collate_fn` and other arguments through [`pickle`](https://docs.python.org/3/library/pickle.html#module-pickle "\(in Python v3.9\)") serialization. This separate serialization means that you should take two steps to ensure you are compatible with Windows while using multi-process data loading: * Wrap most of you main script’s code within `if __name__ == '__main__':` block, to make sure it doesn’t run again (most likely generating error) when each worker process is launched. You can place your dataset and `DataLoader` instance creation logic here, as it doesn’t need to be re-executed in workers. * Make sure that any custom `collate_fn`, `worker_init_fn` or `dataset` code is declared as top level definitions, outside of the `__main__` check. This ensures that they are available in worker processes. (this is needed since functions are pickled as references only, not `bytecode`.) 
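A hedged sketch of the Windows-friendly layout described above: the dataset and `collate_fn` are defined at top level so worker processes can unpickle them, and the `DataLoader` is only created inside the `__main__` guard (all names are illustrative):

    import torch
    from torch.utils.data import Dataset, DataLoader

    class SquaresDataset(Dataset):        # top-level definition: picklable by reference
        def __len__(self):
            return 100

        def __getitem__(self, idx):
            return torch.tensor([idx, idx * idx], dtype=torch.float32)

    def my_collate(batch):                # also defined at top level
        return torch.stack(batch, 0)

    if __name__ == '__main__':            # not re-executed when worker processes start
        loader = DataLoader(SquaresDataset(), batch_size=8,
                            num_workers=2, collate_fn=my_collate)
        for batch in loader:
            pass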
#### Randomness in multi-process data loading By default, each worker will have its PyTorch seed set to `base_seed + worker_id`, where `base_seed` is a long generated by main process using its RNG (thereby, consuming a RNG state mandatorily). However, seeds for other libraries may be duplicated upon initializing workers (e.g., NumPy), causing each worker to return identical random numbers. (See [this section](https://pytorch.org/docs/1.8.0/notes/faq.html#dataloader-workers- random-seed) in FAQ.). In `worker_init_fn`, you may access the PyTorch seed set for each worker with either `torch.utils.data.get_worker_info().seed` or [`torch.initial_seed()`](generated/torch.initial_seed#torch.initial_seed "torch.initial_seed"), and use it to seed other libraries before data loading. ## Memory Pinning Host to GPU copies are much faster when they originate from pinned (page- locked) memory. See [Use pinned memory buffers](https://pytorch.org/docs/1.8.0/notes/cuda.html#cuda-memory-pinning) for more details on when and how to use pinned memory generally. For data loading, passing `pin_memory=True` to a `DataLoader` will automatically put the fetched data Tensors in pinned memory, and thus enables faster data transfer to CUDA-enabled GPUs. The default memory pinning logic only recognizes Tensors and maps and iterables containing Tensors. By default, if the pinning logic sees a batch that is a custom type (which will occur if you have a `collate_fn` that returns a custom batch type), or if each element of your batch is a custom type, the pinning logic will not recognize them, and it will return that batch (or those elements) without pinning the memory. To enable memory pinning for custom batch or data type(s), define a `pin_memory()` method on your custom type(s). See the example below. Example: class SimpleCustomBatch: def __init__(self, data): transposed_data = list(zip(*data)) self.inp = torch.stack(transposed_data[0], 0) self.tgt = torch.stack(transposed_data[1], 0) # custom memory pinning method on custom type def pin_memory(self): self.inp = self.inp.pin_memory() self.tgt = self.tgt.pin_memory() return self def collate_wrapper(batch): return SimpleCustomBatch(batch) inps = torch.arange(10 * 5, dtype=torch.float32).view(10, 5) tgts = torch.arange(10 * 5, dtype=torch.float32).view(10, 5) dataset = TensorDataset(inps, tgts) loader = DataLoader(dataset, batch_size=2, collate_fn=collate_wrapper, pin_memory=True) for batch_ndx, sample in enumerate(loader): print(sample.inp.is_pinned()) print(sample.tgt.is_pinned()) `class torch.utils.data.DataLoader(dataset, batch_size=1, shuffle=False, sampler=None, batch_sampler=None, num_workers=0, collate_fn=None, pin_memory=False, drop_last=False, timeout=0, worker_init_fn=None, multiprocessing_context=None, generator=None, *, prefetch_factor=2, persistent_workers=False)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/utils/data/dataloader.html#DataLoader) Data loader. Combines a dataset and a sampler, and provides an iterable over the given dataset. The `DataLoader` supports both map-style and iterable-style datasets with single- or multi-process loading, customizing loading order and optional automatic batching (collation) and memory pinning. See `torch.utils.data` documentation page for more details. Parameters * **dataset** (Dataset) – dataset from which to load the data. * **batch_size** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – how many samples per batch to load (default: `1`). 
* **shuffle** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – set to `True` to have the data reshuffled at every epoch (default: `False`). * **sampler** (Sampler _or_ _Iterable_ _,__optional_) – defines the strategy to draw samples from the dataset. Can be any `Iterable` with `__len__` implemented. If specified, `shuffle` must not be specified. * **batch_sampler** (Sampler _or_ _Iterable_ _,__optional_) – like `sampler`, but returns a batch of indices at a time. Mutually exclusive with `batch_size`, `shuffle`, `sampler`, and `drop_last`. * **num_workers** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – how many subprocesses to use for data loading. `0` means that the data will be loaded in the main process. (default: `0`) * **collate_fn** (_callable_ _,__optional_) – merges a list of samples to form a mini-batch of Tensor(s). Used when using batched loading from a map-style dataset. * **pin_memory** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – If `True`, the data loader will copy Tensors into CUDA pinned memory before returning them. If your data elements are a custom type, or your `collate_fn` returns a batch that is a custom type, see the example below. * **drop_last** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – set to `True` to drop the last incomplete batch, if the dataset size is not divisible by the batch size. If `False` and the size of dataset is not divisible by the batch size, then the last batch will be smaller. (default: `False`) * **timeout** (_numeric_ _,__optional_) – if positive, the timeout value for collecting a batch from workers. Should always be non-negative. (default: `0`) * **worker_init_fn** (_callable_ _,__optional_) – If not `None`, this will be called on each worker subprocess with the worker id (an int in `[0, num_workers - 1]`) as input, after seeding and before data loading. (default: `None`) * **prefetch_factor** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_ _,__keyword-only arg_) – Number of samples loaded in advance by each worker. `2` means there will be a total of 2 * num_workers samples prefetched across all workers. (default: `2`) * **persistent_workers** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – If `True`, the data loader will not shutdown the worker processes after a dataset has been consumed once. This allows to maintain the workers `Dataset` instances alive. (default: `False`) Warning If the `spawn` start method is used, `worker_init_fn` cannot be an unpicklable object, e.g., a lambda function. See [Multiprocessing best practices](https://pytorch.org/docs/1.8.0/notes/multiprocessing.html#multiprocessing- best-practices) on more details related to multiprocessing in PyTorch. Warning `len(dataloader)` heuristic is based on the length of the sampler used. When `dataset` is an `IterableDataset`, it instead returns an estimate based on `len(dataset) / batch_size`, with proper rounding depending on `drop_last`, regardless of multi-process loading configurations. This represents the best guess PyTorch can make because PyTorch trusts user `dataset` code in correctly handling multi-process loading to avoid duplicate data. 
However, if sharding results in multiple workers having incomplete last batches, this estimate can still be inaccurate, because (1) an otherwise complete batch can be broken into multiple ones and (2) more than one batch worth of samples can be dropped when `drop_last` is set. Unfortunately, PyTorch can not detect such cases in general. See Dataset Types for more details on these two types of datasets and how `IterableDataset` interacts with Multi-process data loading. Warning See [Reproducibility](https://pytorch.org/docs/1.8.0/notes/randomness.html#reproducibility), and [My data loader workers return identical random numbers](https://pytorch.org/docs/1.8.0/notes/faq.html#dataloader-workers- random-seed), and Randomness in multi-process data loading notes for random seed related questions. `class torch.utils.data.Dataset` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/utils/data/dataset.html#Dataset) An abstract class representing a `Dataset`. All datasets that represent a map from keys to data samples should subclass it. All subclasses should overwrite `__getitem__()`, supporting fetching a data sample for a given key. Subclasses could also optionally overwrite `__len__()`, which is expected to return the size of the dataset by many `Sampler` implementations and the default options of `DataLoader`. Note `DataLoader` by default constructs a index sampler that yields integral indices. To make it work with a map-style dataset with non-integral indices/keys, a custom sampler must be provided. `class torch.utils.data.IterableDataset` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/utils/data/dataset.html#IterableDataset) An iterable Dataset. All datasets that represent an iterable of data samples should subclass it. Such form of datasets is particularly useful when data come from a stream. All subclasses should overwrite `__iter__()`, which would return an iterator of samples in this dataset. When a subclass is used with `DataLoader`, each item in the dataset will be yielded from the `DataLoader` iterator. When `num_workers > 0`, each worker process will have a different copy of the dataset object, so it is often desired to configure each copy independently to avoid having duplicate data returned from the workers. `get_worker_info()`, when called in a worker process, returns information about the worker. It can be used in either the dataset’s `__iter__()` method or the `DataLoader` ‘s `worker_init_fn` option to modify each copy’s behavior. Example 1: splitting workload across all workers in `__iter__()`: >>> class MyIterableDataset(torch.utils.data.IterableDataset): ... def __init__(self, start, end): ... super(MyIterableDataset).__init__() ... assert end > start, "this example code only works with end >= start" ... self.start = start ... self.end = end ... ... def __iter__(self): ... worker_info = torch.utils.data.get_worker_info() ... if worker_info is None: # single-process data loading, return the full iterator ... iter_start = self.start ... iter_end = self.end ... else: # in a worker process ... # split workload ... per_worker = int(math.ceil((self.end - self.start) / float(worker_info.num_workers))) ... worker_id = worker_info.id ... iter_start = self.start + worker_id * per_worker ... iter_end = min(iter_start + per_worker, self.end) ... return iter(range(iter_start, iter_end)) ... >>> # should give same set of data as range(3, 7), i.e., [3, 4, 5, 6]. 
>>> ds = MyIterableDataset(start=3, end=7) >>> # Single-process loading >>> print(list(torch.utils.data.DataLoader(ds, num_workers=0))) [3, 4, 5, 6] >>> # Mult-process loading with two worker processes >>> # Worker 0 fetched [3, 4]. Worker 1 fetched [5, 6]. >>> print(list(torch.utils.data.DataLoader(ds, num_workers=2))) [3, 5, 4, 6] >>> # With even more workers >>> print(list(torch.utils.data.DataLoader(ds, num_workers=20))) [3, 4, 5, 6] Example 2: splitting workload across all workers using `worker_init_fn`: >>> class MyIterableDataset(torch.utils.data.IterableDataset): ... def __init__(self, start, end): ... super(MyIterableDataset).__init__() ... assert end > start, "this example code only works with end >= start" ... self.start = start ... self.end = end ... ... def __iter__(self): ... return iter(range(self.start, self.end)) ... >>> # should give same set of data as range(3, 7), i.e., [3, 4, 5, 6]. >>> ds = MyIterableDataset(start=3, end=7) >>> # Single-process loading >>> print(list(torch.utils.data.DataLoader(ds, num_workers=0))) [3, 4, 5, 6] >>> >>> # Directly doing multi-process loading yields duplicate data >>> print(list(torch.utils.data.DataLoader(ds, num_workers=2))) [3, 3, 4, 4, 5, 5, 6, 6] >>> # Define a `worker_init_fn` that configures each dataset copy differently >>> def worker_init_fn(worker_id): ... worker_info = torch.utils.data.get_worker_info() ... dataset = worker_info.dataset # the dataset copy in this worker process ... overall_start = dataset.start ... overall_end = dataset.end ... # configure the dataset to only process the split workload ... per_worker = int(math.ceil((overall_end - overall_start) / float(worker_info.num_workers))) ... worker_id = worker_info.id ... dataset.start = overall_start + worker_id * per_worker ... dataset.end = min(dataset.start + per_worker, overall_end) ... >>> # Mult-process loading with the custom `worker_init_fn` >>> # Worker 0 fetched [3, 4]. Worker 1 fetched [5, 6]. >>> print(list(torch.utils.data.DataLoader(ds, num_workers=2, worker_init_fn=worker_init_fn))) [3, 5, 4, 6] >>> # With even more workers >>> print(list(torch.utils.data.DataLoader(ds, num_workers=20, worker_init_fn=worker_init_fn))) [3, 4, 5, 6] `class torch.utils.data.TensorDataset(*tensors)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/utils/data/dataset.html#TensorDataset) Dataset wrapping tensors. Each sample will be retrieved by indexing tensors along the first dimension. Parameters ***tensors** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – tensors that have the same size of the first dimension. `class torch.utils.data.ConcatDataset(datasets)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/utils/data/dataset.html#ConcatDataset) Dataset as a concatenation of multiple datasets. This class is useful to assemble different existing datasets. Parameters **datasets** (_sequence_) – List of datasets to be concatenated `class torch.utils.data.ChainDataset(datasets)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/utils/data/dataset.html#ChainDataset) Dataset for chainning multiple `IterableDataset` s. This class is useful to assemble different existing dataset streams. The chainning operation is done on-the-fly, so concatenating large-scale datasets with this class will be efficient. 
Parameters **datasets** (_iterable of IterableDataset_) – datasets to be chained together `class torch.utils.data.BufferedShuffleDataset(dataset, buffer_size)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/utils/data/dataset.html#BufferedShuffleDataset) Dataset shuffled from the original dataset. This class is useful to shuffle an existing instance of an IterableDataset. The buffer with `buffer_size` is filled with the items from the dataset first. Then, each item will be yielded from the buffer by reservoir sampling via iterator. `buffer_size` is required to be larger than 0. For `buffer_size == 1`, the dataset is not shuffled. In order to fully shuffle the whole dataset, `buffer_size` is required to be greater than or equal to the size of dataset. When it is used with `DataLoader`, each item in the dataset will be yielded from the `DataLoader` iterator. And, the method to set up a random seed is different based on `num_workers`. For single-process mode (`num_workers == 0`), the random seed is required to be set before the `DataLoader` in the main process. >>> ds = BufferedShuffleDataset(dataset) >>> random.seed(...) >>> print(list(torch.utils.data.DataLoader(ds, num_workers=0))) For multi-process mode (`num_workers > 0`), the random seed is set by a callable function in each worker. >>> ds = BufferedShuffleDataset(dataset) >>> def init_fn(worker_id): ... random.seed(...) >>> print(list(torch.utils.data.DataLoader(ds, ..., num_workers=n, worker_init_fn=init_fn))) Parameters * **dataset** (IterableDataset) – The original IterableDataset. * **buffer_size** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – The buffer size for shuffling. `class torch.utils.data.Subset(dataset, indices)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/utils/data/dataset.html#Subset) Subset of a dataset at specified indices. Parameters * **dataset** (Dataset) – The whole Dataset * **indices** (_sequence_) – Indices in the whole set selected for subset `torch.utils.data.get_worker_info()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/utils/data/_utils/worker.html#get_worker_info) Returns the information about the current `DataLoader` iterator worker process. When called in a worker, this returns an object guaranteed to have the following attributes: * `id`: the current worker id. * `num_workers`: the total number of workers. * `seed`: the random seed set for the current worker. This value is determined by main process RNG and the worker id. See `DataLoader`’s documentation for more details. * `dataset`: the copy of the dataset object in **this** process. Note that this will be a different object in a different process than the one in the main process. When called in the main process, this returns `None`. Note When used in a `worker_init_fn` passed over to `DataLoader`, this method can be useful to set up each worker process differently, for instance, using `worker_id` to configure the `dataset` object to only read a specific fraction of a sharded dataset, or use `seed` to seed other libraries used in dataset code (e.g., NumPy). `torch.utils.data.random_split(dataset, lengths, generator=)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/utils/data/dataset.html#random_split) Randomly split a dataset into non-overlapping new datasets of given lengths. 
Optionally fix the generator for reproducible results, e.g.:

    >>> random_split(range(10), [3, 7], generator=torch.Generator().manual_seed(42))

Parameters

* **dataset** (Dataset) – Dataset to be split
* **lengths** (_sequence_) – lengths of splits to be produced
* **generator** ([Generator](generated/torch.generator#torch.Generator "torch.Generator")) – Generator used for the random permutation.

`class torch.utils.data.Sampler(data_source)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/utils/data/sampler.html#Sampler)

Base class for all Samplers.

Every Sampler subclass has to provide an `__iter__()` method, providing a way to iterate over indices of dataset elements, and a `__len__()` method that returns the length of the returned iterators.

Note The `__len__()` method isn’t strictly required by `DataLoader`, but is expected in any calculation involving the length of a `DataLoader`.

`class torch.utils.data.SequentialSampler(data_source)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/utils/data/sampler.html#SequentialSampler)

Samples elements sequentially, always in the same order.

Parameters **data_source** (Dataset) – dataset to sample from

`class torch.utils.data.RandomSampler(data_source, replacement=False, num_samples=None, generator=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/utils/data/sampler.html#RandomSampler)

Samples elements randomly. If sampling without replacement, samples are drawn from a shuffled dataset. If sampling with replacement, the user can specify `num_samples` to draw.

Parameters

* **data_source** (Dataset) – dataset to sample from
* **replacement** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – samples are drawn on-demand with replacement if `True`, default=`False`
* **num_samples** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – number of samples to draw, default=`len(dataset)`. This argument is supposed to be specified only when `replacement` is `True`.
* **generator** ([Generator](generated/torch.generator#torch.Generator "torch.Generator")) – Generator used in sampling.

`class torch.utils.data.SubsetRandomSampler(indices, generator=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/utils/data/sampler.html#SubsetRandomSampler)

Samples elements randomly from a given list of indices, without replacement.

Parameters

* **indices** (_sequence_) – a sequence of indices
* **generator** ([Generator](generated/torch.generator#torch.Generator "torch.Generator")) – Generator used in sampling.

`class torch.utils.data.WeightedRandomSampler(weights, num_samples, replacement=True, generator=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/utils/data/sampler.html#WeightedRandomSampler)

Samples elements from `[0,..,len(weights)-1]` with given probabilities (weights).

Parameters

* **weights** (_sequence_) – a sequence of weights, not necessarily summing up to one
* **num_samples** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – number of samples to draw
* **replacement** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – if `True`, samples are drawn with replacement. If not, they are drawn without replacement, which means that when a sample index is drawn for a row, it cannot be drawn again for that row.
* **generator** ([Generator](generated/torch.generator#torch.Generator "torch.Generator")) – Generator used in sampling.
#### Example >>> list(WeightedRandomSampler([0.1, 0.9, 0.4, 0.7, 3.0, 0.6], 5, replacement=True)) [4, 4, 1, 4, 5] >>> list(WeightedRandomSampler([0.9, 0.4, 0.05, 0.2, 0.3, 0.1], 5, replacement=False)) [0, 1, 4, 3, 2] `class torch.utils.data.BatchSampler(sampler, batch_size, drop_last)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/utils/data/sampler.html#BatchSampler) Wraps another sampler to yield a mini-batch of indices. Parameters * **sampler** (Sampler _or_ _Iterable_) – Base sampler. Can be any iterable object * **batch_size** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – Size of mini-batch. * **drop_last** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – If `True`, the sampler will drop the last batch if its size would be less than `batch_size` #### Example >>> list(BatchSampler(SequentialSampler(range(10)), batch_size=3, drop_last=False)) [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9]] >>> list(BatchSampler(SequentialSampler(range(10)), batch_size=3, drop_last=True)) [[0, 1, 2], [3, 4, 5], [6, 7, 8]] `class torch.utils.data.distributed.DistributedSampler(dataset, num_replicas=None, rank=None, shuffle=True, seed=0, drop_last=False)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/utils/data/distributed.html#DistributedSampler) Sampler that restricts data loading to a subset of the dataset. It is especially useful in conjunction with [`torch.nn.parallel.DistributedDataParallel`](generated/torch.nn.parallel.distributeddataparallel#torch.nn.parallel.DistributedDataParallel "torch.nn.parallel.DistributedDataParallel"). In such a case, each process can pass a `DistributedSampler` instance as a `DataLoader` sampler, and load a subset of the original dataset that is exclusive to it. Note Dataset is assumed to be of constant size. Parameters * **dataset** – Dataset used for sampling. * **num_replicas** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – Number of processes participating in distributed training. By default, `world_size` is retrieved from the current distributed group. * **rank** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – Rank of the current process within `num_replicas`. By default, `rank` is retrieved from the current distributed group. * **shuffle** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – If `True` (default), sampler will shuffle the indices. * **seed** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – random seed used to shuffle the sampler if `shuffle=True`. This number should be identical across all processes in the distributed group. Default: `0`. * **drop_last** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – if `True`, then the sampler will drop the tail of the data to make it evenly divisible across the number of replicas. If `False`, the sampler will add extra indices to make the data evenly divisible across the replicas. Default: `False`. Warning In distributed mode, calling the `set_epoch()` method at the beginning of each epoch **before** creating the `DataLoader` iterator is necessary to make shuffling work properly across multiple epochs. Otherwise, the same ordering will be always used. 
Example:

    >>> sampler = DistributedSampler(dataset) if is_distributed else None
    >>> loader = DataLoader(dataset, shuffle=(sampler is None),
    ...                     sampler=sampler)
    >>> for epoch in range(start_epoch, n_epochs):
    ...     if is_distributed:
    ...         sampler.set_epoch(epoch)
    ...     train(loader)

# DDP Communication Hooks

DDP communication hook is a generic interface to control how to communicate gradients across workers by overriding the vanilla allreduce in [DistributedDataParallel](https://pytorch.org/docs/stable/generated/torch.nn.parallel.DistributedDataParallel.html#torch.nn.parallel.DistributedDataParallel). A few built-in communication hooks are provided, and users can easily apply any of these hooks to optimize communication. In addition, the hook interface can also support user-defined communication strategies for more advanced use cases.

Warning DDP communication hook is experimental and subject to change.

Warning DDP communication hooks can only support single process single device mode on the NCCL backend.

## How to Use a Communication Hook?

To use a communication hook, the user just needs to let the DDP model register the hook before the training loop, using `torch.nn.parallel.DistributedDataParallel.register_comm_hook()`.

## Default Communication Hooks

Default communication hooks are simple **stateless** hooks, so the input state in `register_comm_hook` is either a process group or `None`.

`torch.distributed.algorithms.ddp_comm_hooks.default_hooks.allreduce_hook(process_group, bucket)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributed/algorithms/ddp_comm_hooks/default_hooks.html#allreduce_hook)

This DDP communication hook just calls `allreduce` using `GradBucket` tensors. Once gradient tensors are aggregated across all workers, its `then` callback takes the mean and returns the result. If a user registers this hook, the DDP results are expected to be the same as when no hook was registered. Hence, this hook does not change the behavior of DDP, and users can use it as a reference, or modify it to log useful information or for other purposes, without affecting DDP behavior.

Example::

    >>> ddp_model.register_comm_hook(process_group, allreduce_hook)

`torch.distributed.algorithms.ddp_comm_hooks.default_hooks.fp16_compress_hook(process_group, bucket)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributed/algorithms/ddp_comm_hooks/default_hooks.html#fp16_compress_hook)

This DDP communication hook implements a simple gradient compression approach that converts `GradBucket` tensors whose type is assumed to be `torch.float32` to half-precision floating point format (`torch.float16`). It allreduces those `float16` gradient tensors. Once the compressed gradient tensors are allreduced, its `then` callback, called `decompress`, converts the aggregated result back to `float32` and takes the mean.

Example::

    >>> ddp_model.register_comm_hook(process_group, fp16_compress_hook)

## PowerSGD Communication Hook

PowerSGD ([Vogels et al., NeurIPS 2019](https://arxiv.org/abs/1905.13727)) is a gradient compression algorithm, which can provide very high compression rates and accelerate bandwidth-bound distributed training. This algorithm needs to maintain both some hyperparameters and the internal state. Therefore, the PowerSGD communication hook is a **stateful** hook, and the user needs to provide a state object defined as below.
### PowerSGD State

`class torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook.PowerSGDState(process_group, matrix_approximation_rank=1, start_powerSGD_iter=10, use_error_feedback=True, warm_start=True, random_seed=0)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributed/algorithms/ddp_comm_hooks/powerSGD_hook.html#PowerSGDState)

Stores both the algorithm’s hyperparameters and the internal state for all the gradients during the training. Particularly, `matrix_approximation_rank` and `start_powerSGD_iter` are the main hyperparameters that should be tuned by the user. For performance, we suggest keeping the binary hyperparameters `use_error_feedback` and `warm_start` on.

1. `matrix_approximation_rank` controls the size of compressed low-rank tensors, which determines the compression rate. The lower the rank, the stronger the compression.

1.1. If `matrix_approximation_rank` is too low, the full model quality will need more training steps to reach, or will never be reached, yielding a loss in accuracy.

1.2. Increasing `matrix_approximation_rank` can substantially increase the computation costs of the compression, and the accuracy may not be further improved beyond a certain `matrix_approximation_rank` threshold.

To tune `matrix_approximation_rank`, we suggest starting from 1 and increasing by factors of 2 (like an exponential grid search: 1, 2, 4, …) until a satisfactory accuracy is reached. Typically only a small value (1-4) is used. For some NLP tasks (as shown in Appendix D of the original paper), this value has been increased to 32.

2. `start_powerSGD_iter` defers PowerSGD compression until step `start_powerSGD_iter`, and vanilla allreduce runs prior to step `start_powerSGD_iter`. This hybrid scheme of **vanilla allreduce + PowerSGD** can effectively improve the accuracy, even when a relatively small `matrix_approximation_rank` is used. This is because the beginning of the training phase is usually very sensitive to inaccurate gradients, and compressing gradients too early may quickly put the training on a suboptimal trajectory, which can have an irrecoverable impact on the accuracy.

To tune `start_powerSGD_iter`, we suggest starting with 10% of the total training steps, and increasing it until a satisfactory accuracy is reached.

Warning If error feedback or warm-up is enabled, the minimum value of `start_powerSGD_iter` allowed in DDP is 2. This is because there is another internal optimization that rebuilds buckets at iteration 1 in DDP, and this can conflict with any tensor memorized before the rebuild process.

### PowerSGD Hooks

Warning PowerSGD typically requires extra memory of the same size as the model’s gradients to enable error feedback, which can compensate for biased compressed communication and improve accuracy.

Warning The current implementation may cause gradient overflow for FP16 input.

`torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook.powerSGD_hook(state, bucket)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributed/algorithms/ddp_comm_hooks/powerSGD_hook.html#powerSGD_hook)

This DDP communication hook implements the PowerSGD gradient compression algorithm described in the [paper](https://arxiv.org/abs/1905.13727). Once gradient tensors are aggregated across all workers, this hook applies compression as follows:

1. Views the input flattened 1D gradient tensor as two groups of per-parameter tensors: high-rank tensors and vector-like rank-1 tensors (for biases).

2. Handles rank-1 tensors by allreducing them without compression:
2.1. Allocates contiguous memory for those rank-1 tensors, and allreduces all the rank-1 tensors as a batch, without compression;

2.2. Copies the individual rank-1 tensors from the contiguous memory back to the input tensor.

3. Handles high-rank tensors by PowerSGD compression:

3.1. For each high-rank tensor M, creates two low-rank tensors P and Q for decomposing M, such that M = PQ^T, where Q is initialized from a standard normal distribution and orthogonalized;

3.2. Computes each P in Ps, which is equal to MQ;

3.3. Allreduces Ps as a batch;

3.4. Orthogonalizes each P in Ps;

3.5. Computes each Q in Qs, which is approximately equal to M^TP;

3.6. Allreduces Qs as a batch;

3.7. Computes each M among all the high-rank tensors, which is approximately equal to PQ^T.

Note that this communication hook enforces vanilla allreduce for the first `state.start_powerSGD_iter` iterations. This not only gives the user more control over the tradeoff between speedup and accuracy, but also helps abstract away some complexity of the internal optimization of DDP for future communication hook developers.

Parameters

* **state** (PowerSGDState) – State information to configure the compression rate and support error feedback, warm start, etc. To tune the compression configs, the main things to tune are `matrix_approximation_rank` and `start_powerSGD_iter`.
* **bucket** (_dist._GradBucket_) – Bucket that stores a 1D flattened gradient tensor that batches multiple per-variable tensors. Note that since DDP comm hook only supports single process single device mode at this time, only exactly one tensor is stored in this bucket.

Returns Future handler of the communication, which updates the gradients in place.

Example::

    >>> state = PowerSGDState(process_group=process_group, matrix_approximation_rank=1, start_powerSGD_iter=10)
    >>> ddp_model.register_comm_hook(state, powerSGD_hook)

`torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook.batched_powerSGD_hook(state, bucket)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributed/algorithms/ddp_comm_hooks/powerSGD_hook.html#batched_powerSGD_hook)

This DDP communication hook implements a simplified PowerSGD gradient compression algorithm described in the [paper](https://arxiv.org/abs/1905.13727). This variant does not compress the gradients layer by layer, but instead compresses the flattened input tensor that batches all the gradients. Therefore, it is **faster** than `powerSGD_hook()`, but usually results in a **much lower accuracy**, unless `matrix_approximation_rank` is 1.

Warning Increasing `matrix_approximation_rank` here may not necessarily increase the accuracy, because batching per-parameter tensors without column/row alignment can destroy low-rank structure. Therefore, the user should always consider `powerSGD_hook()` first, and only consider this variant when a satisfactory accuracy can be achieved when `matrix_approximation_rank` is 1.

Once gradient tensors are aggregated across all workers, this hook applies compression as follows:

1. Views the input flattened 1D gradient tensor as a square-shaped tensor M with 0 paddings;

2. Creates two low-rank tensors P and Q for decomposing M, such that M = PQ^T, where Q is initialized from a standard normal distribution and orthogonalized;

3. Computes P, which is equal to MQ;

4. Allreduces P;

5. Orthogonalizes P;

6. Computes Q, which is approximately equal to M^TP;

7. Allreduces Q;

8. Computes M, which is approximately equal to PQ^T.

9. Truncates the input tensor to the original length.
Note that this communication hook enforces vanilla allreduce for the first `state.start_powerSGD_iter` iterations. This not only gives the user more control over the tradeoff between speedup and accuracy, but also helps abstract away some complexity of the internal optimization of DDP for future communication hook developers.

Parameters

* **state** (PowerSGDState) – State information to configure the compression rate and support error feedback, warm start, etc. To tune the compression configs, the main things to tune are `matrix_approximation_rank` and `start_powerSGD_iter`.
* **bucket** (_dist._GradBucket_) – Bucket that stores a 1D flattened gradient tensor that batches multiple per-variable tensors. Note that since DDP comm hook only supports single process single device mode at this time, only exactly one tensor is stored in this bucket.

Returns Future handler of the communication, which updates the gradients in place.

Example::

    >>> state = PowerSGDState(process_group=process_group, matrix_approximation_rank=1)
    >>> ddp_model.register_comm_hook(state, batched_powerSGD_hook)

## Acknowledgements

Many thanks to PowerSGD paper author **Thijs Vogels** for the code review on the PowerSGD communication hook, as well as the [comparison experiments](https://observablehq.com/@tvogels/powersgd-benchmark), which show that the performance of the PowerSGD communication hook is on par with the implementation in the original [paper](https://arxiv.org/abs/1905.13727).

# Distributed communication package - torch.distributed

Note Please refer to [PyTorch Distributed Overview](https://pytorch.org/tutorials/beginner/dist_overview.html) for a brief introduction to all features related to distributed training.

## Backends

`torch.distributed` supports three built-in backends, each with different capabilities. The table below shows which functions are available for use with CPU / CUDA tensors. MPI supports CUDA only if the implementation used to build PyTorch supports it.

Backend | `gloo` (CPU) | `gloo` (GPU) | `mpi` (CPU) | `mpi` (GPU) | `nccl` (CPU) | `nccl` (GPU)
---|---|---|---|---|---|---
send | ✓ | ✘ | ✓ | ? | ✘ | ✘
recv | ✓ | ✘ | ✓ | ? | ✘ | ✘
broadcast | ✓ | ✓ | ✓ | ? | ✘ | ✓
all_reduce | ✓ | ✓ | ✓ | ? | ✘ | ✓
reduce | ✓ | ✘ | ✓ | ? | ✘ | ✓
all_gather | ✓ | ✘ | ✓ | ? | ✘ | ✓
gather | ✓ | ✘ | ✓ | ? | ✘ | ✘
scatter | ✓ | ✘ | ✓ | ? | ✘ | ✘
reduce_scatter | ✘ | ✘ | ✘ | ✘ | ✘ | ✓
all_to_all | ✘ | ✘ | ✓ | ? | ✘ | ✘
barrier | ✓ | ✘ | ✓ | ? | ✘ | ✓

### Backends that come with PyTorch

PyTorch distributed package supports Linux (stable), MacOS (stable), and Windows (prototype). By default for Linux, the Gloo and NCCL backends are built and included in PyTorch distributed (NCCL only when building with CUDA). MPI is an optional backend that can only be included if you build PyTorch from source (e.g., building PyTorch on a host that has MPI installed).

Note As of PyTorch v1.8, Windows supports all collective communications backends but NCCL. If the `init_method` argument of `init_process_group()` points to a file, it must adhere to the following schema:

* Local file system, `init_method="file:///d:/tmp/some_file"`
* Shared file system, `init_method="file://////{machine_name}/{share_folder_name}/some_file"`

As on the Linux platform, you can enable TcpStore by setting the environment variables MASTER_ADDR and MASTER_PORT.

### Which backend to use?

In the past, we were often asked: “which backend should I use?”.

* Rule of thumb
* Use the NCCL backend for distributed **GPU** training
* Use the Gloo backend for distributed **CPU** training.
* GPU hosts with InfiniBand interconnect * Use NCCL, since it’s the only backend that currently supports InfiniBand and GPUDirect. * GPU hosts with Ethernet interconnect * Use NCCL, since it currently provides the best distributed GPU training performance, especially for multiprocess single-node or multi-node distributed training. If you encounter any problem with NCCL, use Gloo as the fallback option. (Note that Gloo currently runs slower than NCCL for GPUs.) * CPU hosts with InfiniBand interconnect * If your InfiniBand has enabled IP over IB, use Gloo, otherwise, use MPI instead. We are planning on adding InfiniBand support for Gloo in the upcoming releases. * CPU hosts with Ethernet interconnect * Use Gloo, unless you have specific reasons to use MPI. ### Common environment variables #### Choosing the network interface to use By default, both the NCCL and Gloo backends will try to find the right network interface to use. If the automatically detected interface is not correct, you can override it using the following environment variables (applicable to the respective backend): * **NCCL_SOCKET_IFNAME** , for example `export NCCL_SOCKET_IFNAME=eth0` * **GLOO_SOCKET_IFNAME** , for example `export GLOO_SOCKET_IFNAME=eth0` If you’re using the Gloo backend, you can specify multiple interfaces by separating them by a comma, like this: `export GLOO_SOCKET_IFNAME=eth0,eth1,eth2,eth3`. The backend will dispatch operations in a round-robin fashion across these interfaces. It is imperative that all processes specify the same number of interfaces in this variable. #### Other NCCL environment variables NCCL has also provided a number of environment variables for fine-tuning purposes. Commonly used ones include the following for debugging purposes: * `export NCCL_DEBUG=INFO` * `export NCCL_DEBUG_SUBSYS=ALL` For the full list of NCCL environment variables, please refer to [NVIDIA NCCL’s official documentation](https://docs.nvidia.com/deeplearning/sdk/nccl- developer-guide/docs/env.html) ## Basics The `torch.distributed` package provides PyTorch support and communication primitives for multiprocess parallelism across several computation nodes running on one or more machines. The class [`torch.nn.parallel.DistributedDataParallel()`](generated/torch.nn.parallel.distributeddataparallel#torch.nn.parallel.DistributedDataParallel "torch.nn.parallel.DistributedDataParallel") builds on this functionality to provide synchronous distributed training as a wrapper around any PyTorch model. This differs from the kinds of parallelism provided by [Multiprocessing package - torch.multiprocessing](multiprocessing) and [`torch.nn.DataParallel()`](generated/torch.nn.dataparallel#torch.nn.DataParallel "torch.nn.DataParallel") in that it supports multiple network-connected machines and in that the user must explicitly launch a separate copy of the main training script for each process. In the single-machine synchronous case, `torch.distributed` or the [`torch.nn.parallel.DistributedDataParallel()`](generated/torch.nn.parallel.distributeddataparallel#torch.nn.parallel.DistributedDataParallel "torch.nn.parallel.DistributedDataParallel") wrapper may still have advantages over other approaches to data-parallelism, including [`torch.nn.DataParallel()`](generated/torch.nn.dataparallel#torch.nn.DataParallel "torch.nn.DataParallel"): * Each process maintains its own optimizer and performs a complete optimization step with each iteration. 
While this may appear redundant, since the gradients have already been gathered together and averaged across processes and are thus the same for every process, this means that no parameter broadcast step is needed, reducing time spent transferring tensors between nodes. * Each process contains an independent Python interpreter, eliminating the extra interpreter overhead and “GIL-thrashing” that comes from driving several execution threads, model replicas, or GPUs from a single Python process. This is especially important for models that make heavy use of the Python runtime, including models with recurrent layers or many small components. ## Initialization The package needs to be initialized using the `torch.distributed.init_process_group()` function before calling any other methods. This blocks until all processes have joined. `torch.distributed.is_available()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributed.html#is_available) Returns `True` if the distributed package is available. Otherwise, `torch.distributed` does not expose any other APIs. Currently, `torch.distributed` is available on Linux, MacOS and Windows. Set `USE_DISTRIBUTED=1` to enable it when building PyTorch from source. Currently, the default value is `USE_DISTRIBUTED=1` for Linux and Windows, `USE_DISTRIBUTED=0` for MacOS. `torch.distributed.init_process_group(backend, init_method=None, timeout=datetime.timedelta(seconds=1800), world_size=-1, rank=-1, store=None, group_name='')` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributed/distributed_c10d.html#init_process_group) Initializes the default distributed process group, and this will also initialize the distributed package. There are 2 main ways to initialize a process group: 1. Specify `store`, `rank`, and `world_size` explicitly. 2. Specify `init_method` (a URL string) which indicates where/how to discover peers. Optionally specify `rank` and `world_size`, or encode all required parameters in the URL and omit them. If neither is specified, `init_method` is assumed to be “env://”. Parameters * **backend** ([str](https://docs.python.org/3/library/stdtypes.html#str "\(in Python v3.9\)") _or_Backend) – The backend to use. Depending on build-time configurations, valid values include `mpi`, `gloo`, and `nccl`. This field should be given as a lowercase string (e.g., `"gloo"`), which can also be accessed via `Backend` attributes (e.g., `Backend.GLOO`). If using multiple processes per machine with `nccl` backend, each process must have exclusive access to every GPU it uses, as sharing GPUs between processes can result in deadlocks. * **init_method** ([str](https://docs.python.org/3/library/stdtypes.html#str "\(in Python v3.9\)") _,__optional_) – URL specifying how to initialize the process group. Default is “env://” if no `init_method` or `store` is specified. Mutually exclusive with `store`. * **world_size** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – Number of processes participating in the job. Required if `store` is specified. * **rank** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – Rank of the current process (it should be a number between 0 and `world_size`-1). Required if `store` is specified. * **store** (Store _,__optional_) – Key/value store accessible to all workers, used to exchange connection/address information. Mutually exclusive with `init_method`. 
* **timeout** (_timedelta_ _,__optional_) – Timeout for operations executed against the process group. Default value equals 30 minutes. This is applicable for the `gloo` backend. For `nccl`, this is applicable only if the environment variable `NCCL_BLOCKING_WAIT` or `NCCL_ASYNC_ERROR_HANDLING` is set to 1. When `NCCL_BLOCKING_WAIT` is set, this is the duration for which the process will block and wait for collectives to complete before throwing an exception. When `NCCL_ASYNC_ERROR_HANDLING` is set, this is the duration after which collectives will be aborted asynchronously and the process will crash. `NCCL_BLOCKING_WAIT` will provide errors to the user which can be caught and handled, but due to its blocking nature, it has a performance overhead. On the other hand, `NCCL_ASYNC_ERROR_HANDLING` has very little performance overhead, but crashes the process on errors. This is done since CUDA execution is async and it is no longer safe to continue executing user code since failed async NCCL operations might result in subsequent CUDA operations running on corrupted data. Only one of these two environment variables should be set. * **group_name** ([str](https://docs.python.org/3/library/stdtypes.html#str "\(in Python v3.9\)") _,__optional_ _,__deprecated_) – Group name. To enable `backend == Backend.MPI`, PyTorch needs to be built from source on a system that supports MPI. `class torch.distributed.Backend` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributed/distributed_c10d.html#Backend) An enum-like class of available backends: GLOO, NCCL, MPI, and other registered backends. The values of this class are lowercase strings, e.g., `"gloo"`. They can be accessed as attributes, e.g., `Backend.NCCL`. This class can be directly called to parse the string, e.g., `Backend(backend_str)` will check if `backend_str` is valid, and return the parsed lowercase string if so. It also accepts uppercase strings, e.g., `Backend("GLOO")` returns `"gloo"`. Note The entry `Backend.UNDEFINED` is present but only used as initial value of some fields. Users should neither use it directly nor assume its existence. `torch.distributed.get_backend(group=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributed/distributed_c10d.html#get_backend) Returns the backend of the given process group. Parameters **group** (_ProcessGroup_ _,__optional_) – The process group to work on. The default is the general main process group. If another specific group is specified, the calling process must be part of `group`. Returns The backend of the given process group as a lower case string. `torch.distributed.get_rank(group=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributed/distributed_c10d.html#get_rank) Returns the rank of current process group Rank is a unique identifier assigned to each process within a distributed process group. They are always consecutive integers ranging from 0 to `world_size`. Parameters **group** (_ProcessGroup_ _,__optional_) – The process group to work on. If None, the default process group will be used. Returns The rank of the process group -1, if not part of the group `torch.distributed.get_world_size(group=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributed/distributed_c10d.html#get_world_size) Returns the number of processes in the current process group Parameters **group** (_ProcessGroup_ _,__optional_) – The process group to work on. If None, the default process group will be used. 
Returns The world size of the process group -1, if not part of the group `torch.distributed.is_initialized()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributed/distributed_c10d.html#is_initialized) Checking if the default process group has been initialized `torch.distributed.is_mpi_available()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributed/distributed_c10d.html#is_mpi_available) Checks if the MPI backend is available. `torch.distributed.is_nccl_available()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributed/distributed_c10d.html#is_nccl_available) Checks if the NCCL backend is available. Currently three initialization methods are supported: ### TCP initialization There are two ways to initialize using TCP, both requiring a network address reachable from all processes and a desired `world_size`. The first way requires specifying an address that belongs to the rank 0 process. This initialization method requires that all processes have manually specified ranks. Note that multicast address is not supported anymore in the latest distributed package. `group_name` is deprecated as well. import torch.distributed as dist # Use address of one of the machines dist.init_process_group(backend, init_method='tcp://10.1.1.20:23456', rank=args.rank, world_size=4) ### Shared file-system initialization Another initialization method makes use of a file system that is shared and visible from all machines in a group, along with a desired `world_size`. The URL should start with `file://` and contain a path to a non-existent file (in an existing directory) on a shared file system. File-system initialization will automatically create that file if it doesn’t exist, but will not delete the file. Therefore, it is your responsibility to make sure that the file is cleaned up before the next `init_process_group()` call on the same file path/name. Note that automatic rank assignment is not supported anymore in the latest distributed package and `group_name` is deprecated as well. Warning This method assumes that the file system supports locking using `fcntl` \- most local systems and NFS support it. Warning This method will always create the file and try its best to clean up and remove the file at the end of the program. In other words, each initialization with the file init method will need a brand new empty file in order for the initialization to succeed. If the same file used by the previous initialization (which happens not to get cleaned up) is used again, this is unexpected behavior and can often cause deadlocks and failures. Therefore, even though this method will try its best to clean up the file, if the auto- delete happens to be unsuccessful, it is your responsibility to ensure that the file is removed at the end of the training to prevent the same file to be reused again during the next time. This is especially important if you plan to call `init_process_group()` multiple times on the same file name. In other words, if the file is not removed/cleaned up and you call `init_process_group()` again on that file, failures are expected. The rule of thumb here is that, make sure that the file is non-existent or empty every time `init_process_group()` is called. 
import torch.distributed as dist # rank should always be specified dist.init_process_group(backend, init_method='file:///mnt/nfs/sharedfile', world_size=4, rank=args.rank) ### Environment variable initialization This method will read the configuration from environment variables, allowing one to fully customize how the information is obtained. The variables to be set are: * `MASTER_PORT` \- required; has to be a free port on machine with rank 0 * `MASTER_ADDR` \- required (except for rank 0); address of rank 0 node * `WORLD_SIZE` \- required; can be set either here, or in a call to init function * `RANK` \- required; can be set either here, or in a call to init function The machine with rank 0 will be used to set up all connections. This is the default method, meaning that `init_method` does not have to be specified (or can be `env://`). ## Distributed Key-Value Store The distributed package comes with a distributed key-value store, which can be used to share information between processes in the group as well as to initialize the distributed pacakge in `torch.distributed.init_process_group()` (by explicitly creating the store as an alternative to specifying `init_method`.) There are 3 choices for Key-Value Stores: `TCPStore`, `FileStore`, and `HashStore`. `class torch.distributed.Store` Base class for all store implementations, such as the 3 provided by PyTorch distributed: (`TCPStore`, `FileStore`, and `HashStore`). `class torch.distributed.TCPStore` A TCP-based distributed key-value store implementation. The server store holds the data, while the client stores can connect to the server store over TCP and perform actions such as `set()` to insert a key-value pair, `get()` to retrieve a key-value pair, etc. Parameters * **host_name** ([str](https://docs.python.org/3/library/stdtypes.html#str "\(in Python v3.9\)")) – The hostname or IP Address the server store should run on. * **port** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – The port on which the server store should listen for incoming requests. * **world_size** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – The total number of store users (number of clients + 1 for the server). * **is_master** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – True when initializing the server store, False for client stores. * **timeout** (_timedelta_) – Timeout used by the store during initialization and for methods such as `get()` and `wait()`. Example:: >>> import torch.distributed as dist >>> from datetime import timedelta >>> # Run on process 1 (server) >>> server_store = dist.TCPStore("127.0.0.1", 1234, 2, True, timedelta(seconds=30)) >>> # Run on process 2 (client) >>> client_store = dist.TCPStore("127.0.0.1", 1234, 2, False) >>> # Use any of the store methods from either the client or server after initialization >>> server_store.set("first_key", "first_value") >>> client_store.get("first_key") `class torch.distributed.HashStore` A thread-safe store implementation based on an underlying hashmap. This store can be used within the same process (for example, by other threads), but cannot be used across processes. 
Example:: >>> import torch.distributed as dist >>> store = dist.HashStore() >>> # store can be used from other threads >>> # Use any of the store methods after initialization >>> store.set("first_key", "first_value") `class torch.distributed.FileStore` A store implementation that uses a file to store the underlying key-value pairs. Parameters * **file_name** ([str](https://docs.python.org/3/library/stdtypes.html#str "\(in Python v3.9\)")) – path of the file in which to store the key-value pairs * **world_size** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – The total number of processes using the store Example:: >>> import torch.distributed as dist >>> store1 = dist.FileStore("/tmp/filestore", 2) >>> store2 = dist.FileStore("/tmp/filestore", 2) >>> # Use any of the store methods from either the client or server after initialization >>> store1.set("first_key", "first_value") >>> store2.get("first_key") `class torch.distributed.PrefixStore` A wrapper around any of the 3 key-value stores (`TCPStore`, `FileStore`, and `HashStore`) that adds a prefix to each key inserted to the store. Parameters * **prefix** ([str](https://docs.python.org/3/library/stdtypes.html#str "\(in Python v3.9\)")) – The prefix string that is prepended to each key before being inserted into the store. * **store** (_torch.distributed.store_) – A store object that forms the underlying key-value store. `torch.distributed.Store.set(self: torch._C._distributed_c10d.Store, arg0: str, arg1: str) → None` Inserts the key-value pair into the store based on the supplied `key` and `value`. If `key` already exists in the store, it will overwrite the old value with the new supplied `value`. Parameters * **key** ([str](https://docs.python.org/3/library/stdtypes.html#str "\(in Python v3.9\)")) – The key to be added to the store. * **value** ([str](https://docs.python.org/3/library/stdtypes.html#str "\(in Python v3.9\)")) – The value associated with `key` to be added to the store. Example:: >>> import torch.distributed as dist >>> from datetime import timedelta >>> store = dist.TCPStore("127.0.0.1", 0, 1, True, timedelta(seconds=30)) >>> store.set("first_key", "first_value") >>> # Should return "first_value" >>> store.get("first_key") `torch.distributed.Store.get(self: torch._C._distributed_c10d.Store, arg0: str) → bytes` Retrieves the value associated with the given `key` in the store. If `key` is not present in the store, the function will wait for `timeout`, which is defined when initializing the store, before throwing an exception. Parameters **key** ([str](https://docs.python.org/3/library/stdtypes.html#str "\(in Python v3.9\)")) – The function will return the value associated with this key. Returns Value associated with `key` if `key` is in the store. Example:: >>> import torch.distributed as dist >>> from datetime import timedelta >>> store = dist.TCPStore("127.0.0.1", 0, 1, True, timedelta(seconds=30)) >>> store.set("first_key", "first_value") >>> # Should return "first_value" >>> store.get("first_key") `torch.distributed.Store.add(self: torch._C._distributed_c10d.Store, arg0: str, arg1: int) → int` The first call to add for a given `key` creates a counter associated with `key` in the store, initialized to `amount`. Subsequent calls to add with the same `key` increment the counter by the specified `amount`. Calling `add()` with a key that has already been set in the store by `set()` will result in an exception. 
Parameters * **key** ([str](https://docs.python.org/3/library/stdtypes.html#str "\(in Python v3.9\)")) – The key in the store whose counter will be incremented. * **amount** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – The quantity by which the counter will be incremented. Example:: >>> import torch.distributed as dist >>> from datetime import timedelta >>> # Using TCPStore as an example, other store types can also be used >>> store = dist.TCPStore("127.0.0.1", 0, 1, True, timedelta(seconds=30)) >>> store.add("first_key", 1) >>> store.add("first_key", 6) >>> # Should return 7 >>> store.get("first_key") `torch.distributed.Store.wait(*args, **kwargs)` Overloaded function. 1. wait(self: torch._C._distributed_c10d.Store, arg0: List[str]) -> None Waits for each key in `keys` to be added to the store. If not all keys are set before the `timeout` (set during store initialization), then `wait` will throw an exception. Parameters **keys** ([list](https://docs.python.org/3/library/stdtypes.html#list "\(in Python v3.9\)")) – List of keys on which to wait until they are set in the store. Example:: >>> import torch.distributed as dist >>> from datetime import timedelta >>> # Using TCPStore as an example, other store types can also be used >>> store = dist.TCPStore("127.0.0.1", 0, 1, True, timedelta(seconds=30)) >>> # This will throw an exception after 30 seconds >>> store.wait(["bad_key"]) 2. wait(self: torch._C._distributed_c10d.Store, arg0: List[str], arg1: datetime.timedelta) -> None Waits for each key in `keys` to be added to the store, and throws an exception if the keys have not been set by the supplied `timeout`. Parameters * **keys** ([list](https://docs.python.org/3/library/stdtypes.html#list "\(in Python v3.9\)")) – List of keys on which to wait until they are set in the store. * **timeout** (_timedelta_) – Time to wait for the keys to be added before throwing an exception. Example:: >>> import torch.distributed as dist >>> from datetime import timedelta >>> # Using TCPStore as an example, other store types can also be used >>> store = dist.TCPStore("127.0.0.1", 0, 1, True, timedelta(seconds=30)) >>> # This will throw an exception after 10 seconds >>> store.wait(["bad_key"], timedelta(seconds=10)) `torch.distributed.Store.num_keys(self: torch._C._distributed_c10d.Store) → int` Returns the number of keys set in the store. Note that this number will typically be one greater than the number of keys added by `set()` and `add()` since one key is used to coordinate all the workers using the store. Warning When used with the `TCPStore`, `num_keys` returns the number of keys written to the underlying file. If the store is destructed and another store is created with the same file, the original keys will be retained. Returns The number of keys present in the store. Example:: >>> import torch.distributed as dist >>> from datetime import timedelta >>> # Using TCPStore as an example, other store types can also be used >>> store = dist.TCPStore("127.0.0.1", 0, 1, True, timedelta(seconds=30)) >>> store.set("first_key", "first_value") >>> # This should return 2 >>> store.num_keys() `torch.distributed.Store.delete_key(self: torch._C._distributed_c10d.Store, arg0: str) → bool` Deletes the key-value pair associated with `key` from the store. Returns `true` if the key was successfully deleted, and `false` if it was not. Warning The `delete_key` API is only supported by the `TCPStore` and `HashStore`. Using this API with the `FileStore` will result in an exception. 
Parameters **key** ([str](https://docs.python.org/3/library/stdtypes.html#str "\(in Python v3.9\)")) – The key to be deleted from the store Returns `True` if `key` was deleted, otherwise `False`. Example:: >>> import torch.distributed as dist >>> from datetime import timedelta >>> # Using TCPStore as an example, HashStore can also be used >>> store = dist.TCPStore("127.0.0.1", 0, 1, True, timedelta(seconds=30)) >>> store.set("first_key", "first_value") >>> # This should return true >>> store.delete_key("first_key") >>> # This should return false >>> store.delete_key("bad_key") `torch.distributed.Store.set_timeout(self: torch._C._distributed_c10d.Store, arg0: datetime.timedelta) → None` Sets the store’s default timeout. This timeout is used during initialization and in `wait()` and `get()`. Parameters **timeout** (_timedelta_) – Timeout to be set in the store. Example:: >>> import torch.distributed as dist >>> from datetime import timedelta >>> # Using TCPStore as an example, other store types can also be used >>> store = dist.TCPStore("127.0.0.1", 0, 1, True, timedelta(seconds=30)) >>> store.set_timeout(timedelta(seconds=10)) >>> # This will throw an exception after 10 seconds >>> store.wait(["bad_key"]) ## Groups By default collectives operate on the default group (also called the world) and require all processes to enter the distributed function call. However, some workloads can benefit from more fine-grained communication. This is where distributed groups come into play. The `new_group()` function can be used to create new groups, with arbitrary subsets of all processes. It returns an opaque group handle that can be given as a `group` argument to all collectives (collectives are distributed functions to exchange information in certain well-known programming patterns). `torch.distributed.new_group(ranks=None, timeout=datetime.timedelta(seconds=1800), backend=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributed/distributed_c10d.html#new_group) Creates a new distributed group. This function requires that all processes in the main group (i.e. all processes that are part of the distributed job) enter this function, even if they are not going to be members of the group. Additionally, groups should be created in the same order in all processes. Warning Using multiple process groups with the `NCCL` backend concurrently is not safe and the user should perform explicit synchronization in their application to ensure only one process group is used at a time. This means collectives from one process group should have completed execution on the device (not just enqueued since CUDA execution is async) before collectives from another process group are enqueued. See [Using multiple NCCL communicators concurrently](https://docs.nvidia.com/deeplearning/nccl/user-guide/docs/usage/communicators.html#using-multiple-nccl-communicators-concurrently) for more details. Parameters * **ranks** ([list](https://docs.python.org/3/library/stdtypes.html#list "\(in Python v3.9\)") _[_[int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _]_) – List of ranks of group members. If `None`, will be set to all ranks. Default is `None`. * **timeout** (_timedelta_ _,__optional_) – Timeout for operations executed against the process group. Default value equals 30 minutes. This is only applicable for the `gloo` backend. * **backend** ([str](https://docs.python.org/3/library/stdtypes.html#str "\(in Python v3.9\)") _or_Backend _,__optional_) – The backend to use. Depending on build-time configurations, valid values are `gloo` and `nccl`. By default uses the same backend as the global group. This field should be given as a lowercase string (e.g., `"gloo"`), which can also be accessed via `Backend` attributes (e.g., `Backend.GLOO`). Returns A handle of distributed group that can be given to collective calls.
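A minimal sketch of subgroup usage (illustrative only; it assumes a default process group with at least 2 ranks has already been initialized, and the tensor values are made up):

import torch
import torch.distributed as dist

# Every rank must call new_group(), even ranks that will not be members.
group = dist.new_group(ranks=[0, 1])

t = torch.ones(1)
if dist.get_rank() in (0, 1):
    # The collective runs only across the subgroup, so ranks 0 and 1
    # both end up with tensor([2.]).
    dist.all_reduce(t, group=group)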
## Point-to-point communication `torch.distributed.send(tensor, dst, group=None, tag=0)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributed/distributed_c10d.html#send) Sends a tensor synchronously. Parameters * **tensor** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – Tensor to send. * **dst** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – Destination rank. * **group** (_ProcessGroup_ _,__optional_) – The process group to work on. If None, the default process group will be used. * **tag** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – Tag to match send with remote recv `torch.distributed.recv(tensor, src=None, group=None, tag=0)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributed/distributed_c10d.html#recv) Receives a tensor synchronously. Parameters * **tensor** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – Tensor to fill with received data. * **src** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – Source rank. Will receive from any process if unspecified. * **group** (_ProcessGroup_ _,__optional_) – The process group to work on. If None, the default process group will be used. * **tag** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – Tag to match recv with remote send Returns Sender rank. -1, if not part of the group `isend()` and `irecv()` return distributed request objects when used. In general, the type of this object is unspecified as they should never be created manually, but they are guaranteed to support two methods: * `is_completed()` \- returns True if the operation has finished * `wait()` \- will block the process until the operation is finished. `is_completed()` is guaranteed to return True once it returns. `torch.distributed.isend(tensor, dst, group=None, tag=0)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributed/distributed_c10d.html#isend) Sends a tensor asynchronously. Parameters * **tensor** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – Tensor to send. * **dst** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – Destination rank. * **group** (_ProcessGroup_ _,__optional_) – The process group to work on. If None, the default process group will be used. * **tag** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – Tag to match send with remote recv Returns A distributed request object. None, if not part of the group `torch.distributed.irecv(tensor, src=None, group=None, tag=0)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributed/distributed_c10d.html#irecv) Receives a tensor asynchronously. Parameters * **tensor** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – Tensor to fill with received data. * **src** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – Source rank. Will receive from any process if unspecified. * **group** (_ProcessGroup_ _,__optional_) – The process group to work on. If None, the default process group will be used.
* **tag** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – Tag to match recv with remote send Returns A distributed request object. None, if not part of the group ## Synchronous and asynchronous collective operations Every collective operation function supports the following two kinds of operations, depending on the setting of the `async_op` flag passed into the collective: **Synchronous operation** \- the default mode, when `async_op` is set to `False`. When the function returns, it is guaranteed that the collective operation is performed. In the case of CUDA operations, it is not guaranteed that the CUDA operation is completed, since CUDA operations are asynchronous. For CPU collectives, any further function calls utilizing the output of the collective call will behave as expected. For CUDA collectives, function calls utilizing the output on the same CUDA stream will behave as expected. Users must take care of synchronization under the scenario of running under different streams. For details on CUDA semantics such as stream synchronization, see [CUDA Semantics](https://pytorch.org/docs/stable/notes/cuda.html). See the below script to see examples of differences in these semantics for CPU and CUDA operations. **Asynchronous operation** \- when `async_op` is set to True. The collective operation function returns a distributed request object. In general, you don’t need to create it manually and it is guaranteed to support two methods: * `is_completed()` \- in the case of CPU collectives, returns `True` if completed. In the case of CUDA operations, returns `True` if the operation has been successfully enqueued onto a CUDA stream and the output can be utilized on the default stream without further synchronization. * `wait()` \- in the case of CPU collectives, will block the process until the operation is completed. In the case of CUDA collectives, will block until the operation has been successfully enqueued onto a CUDA stream and the output can be utilized on the default stream without further synchronization. **Example** The following code can serve as a reference regarding semantics for CUDA operations when using distributed collectives. It shows the explicit need to synchronize when using collective outputs on different CUDA streams: # Code runs on each rank. dist.init_process_group("nccl", rank=rank, world_size=2) output = torch.tensor([rank]).cuda(rank) s = torch.cuda.Stream() handle = dist.all_reduce(output, async_op=True) # Wait ensures the operation is enqueued, but not necessarily complete. handle.wait() # Using result on non-default stream. with torch.cuda.stream(s): s.wait_stream(torch.cuda.default_stream()) output.add_(100) if rank == 0: # if the explicit call to wait_stream was omitted, the output below will be # non-deterministically 1 or 101, depending on whether the allreduce overwrote # the value after the add completed. print(output) ## Collective functions `torch.distributed.broadcast(tensor, src, group=None, async_op=False)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributed/distributed_c10d.html#broadcast) Broadcasts the tensor to the whole group. `tensor` must have the same number of elements in all processes participating in the collective. Parameters * **tensor** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – Data to be sent if `src` is the rank of current process, and tensor to be used to save received data otherwise. 
* **src** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – Source rank. * **group** (_ProcessGroup_ _,__optional_) – The process group to work on. If None, the default process group will be used. * **async_op** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – Whether this op should be an async op Returns Async work handle, if async_op is set to True. None, if not async_op or if not part of the group `torch.distributed.broadcast_object_list(object_list, src=0, group=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributed/distributed_c10d.html#broadcast_object_list) Broadcasts picklable objects in `object_list` to the whole group. Similar to `broadcast()`, but Python objects can be passed in. Note that all objects in `object_list` must be picklable in order to be broadcasted. Parameters * **object_list** (_List_ _[__Any_ _]_) – List of input objects to broadcast. Each object must be picklable. Only objects on the `src` rank will be broadcast, but each rank must provide lists of equal sizes. * **src** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – Source rank from which to broadcast `object_list`. * **group** – (ProcessGroup, optional): The process group to work on. If None, the default process group will be used. Default is `None`. Returns `None`. If rank is part of the group, `object_list` will contain the broadcasted objects from `src` rank. Note For NCCL-based process groups, internal tensor representations of objects must be moved to the GPU device before communication takes place. In this case, the device used is given by `torch.cuda.current_device()` and it is the user’s responsibility to ensure that this is set so that each rank has an individual GPU, via `torch.cuda.set_device()`. Note Note that this API differs slightly from the `all_gather()` collective since it does not provide an `async_op` handle and thus will be a blocking call. Warning `broadcast_object_list()` uses `pickle` module implicitly, which is known to be insecure. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling. Only call this function with data you trust. Example:: >>> # Note: Process group initialization omitted on each rank. >>> import torch.distributed as dist >>> if dist.get_rank() == 0: >>> # Assumes world_size of 3. >>> objects = ["foo", 12, {1: 2}] # any picklable object >>> else: >>> objects = [None, None, None] >>> dist.broadcast_object_list(objects, src=0) >>> objects ['foo', 12, {1: 2}] `torch.distributed.all_reduce(tensor, op=<ReduceOp.SUM: 0>, group=None, async_op=False)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributed/distributed_c10d.html#all_reduce) Reduces the tensor data across all machines in such a way that all get the final result. After the call `tensor` is going to be bitwise identical in all processes. Complex tensors are supported. Parameters * **tensor** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – Input and output of the collective. The function operates in-place. * **op** (_optional_) – One of the values from `torch.distributed.ReduceOp` enum. Specifies an operation used for element-wise reductions. * **group** (_ProcessGroup_ _,__optional_) – The process group to work on. If None, the default process group will be used.
* **async_op** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – Whether this op should be an async op Returns Async work handle, if async_op is set to True. None, if not async_op or if not part of the group #### Examples >>> # All tensors below are of torch.int64 type. >>> # We have 2 process groups, 2 ranks. >>> tensor = torch.arange(2, dtype=torch.int64) + 1 + 2 * rank >>> tensor tensor([1, 2]) # Rank 0 tensor([3, 4]) # Rank 1 >>> dist.all_reduce(tensor, op=ReduceOp.SUM) >>> tensor tensor([4, 6]) # Rank 0 tensor([4, 6]) # Rank 1 >>> # All tensors below are of torch.cfloat type. >>> # We have 2 process groups, 2 ranks. >>> tensor = torch.tensor([1+1j, 2+2j], dtype=torch.cfloat) + 2 * rank * (1+1j) >>> tensor tensor([1.+1.j, 2.+2.j]) # Rank 0 tensor([3.+3.j, 4.+4.j]) # Rank 1 >>> dist.all_reduce(tensor, op=ReduceOp.SUM) >>> tensor tensor([4.+4.j, 6.+6.j]) # Rank 0 tensor([4.+4.j, 6.+6.j]) # Rank 1 `torch.distributed.reduce(tensor, dst, op=<ReduceOp.SUM: 0>, group=None, async_op=False)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributed/distributed_c10d.html#reduce) Reduces the tensor data across all machines. Only the process with rank `dst` is going to receive the final result. Parameters * **tensor** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – Input and output of the collective. The function operates in-place. * **dst** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – Destination rank * **op** (_optional_) – One of the values from `torch.distributed.ReduceOp` enum. Specifies an operation used for element-wise reductions. * **group** (_ProcessGroup_ _,__optional_) – The process group to work on. If None, the default process group will be used. * **async_op** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – Whether this op should be an async op Returns Async work handle, if async_op is set to True. None, if not async_op or if not part of the group `torch.distributed.all_gather(tensor_list, tensor, group=None, async_op=False)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributed/distributed_c10d.html#all_gather) Gathers tensors from the whole group in a list. Complex tensors are supported. Parameters * **tensor_list** ([list](https://docs.python.org/3/library/stdtypes.html#list "\(in Python v3.9\)") _[_[Tensor](tensors#torch.Tensor "torch.Tensor") _]_) – Output list. It should contain correctly-sized tensors to be used for output of the collective. * **tensor** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – Tensor to be broadcast from current process. * **group** (_ProcessGroup_ _,__optional_) – The process group to work on. If None, the default process group will be used. * **async_op** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – Whether this op should be an async op Returns Async work handle, if async_op is set to True. None, if not async_op or if not part of the group #### Examples >>> # All tensors below are of torch.int64 dtype. >>> # We have 2 process groups, 2 ranks.
>>> tensor_list = [torch.zeros(2, dtype=torch.int64) for _ in range(2)] >>> tensor_list [tensor([0, 0]), tensor([0, 0])] # Rank 0 and 1 >>> tensor = torch.arange(2, dtype=torch.int64) + 1 + 2 * rank >>> tensor tensor([1, 2]) # Rank 0 tensor([3, 4]) # Rank 1 >>> dist.all_gather(tensor_list, tensor) >>> tensor_list [tensor([1, 2]), tensor([3, 4])] # Rank 0 [tensor([1, 2]), tensor([3, 4])] # Rank 1 >>> # All tensors below are of torch.cfloat dtype. >>> # We have 2 process groups, 2 ranks. >>> tensor_list = [torch.zeros(2, dtype=torch.cfloat) for _ in range(2)] >>> tensor_list [tensor([0.+0.j, 0.+0.j]), tensor([0.+0.j, 0.+0.j])] # Rank 0 and 1 >>> tensor = torch.tensor([1+1j, 2+2j], dtype=torch.cfloat) + 2 * rank * (1+1j) >>> tensor tensor([1.+1.j, 2.+2.j]) # Rank 0 tensor([3.+3.j, 4.+4.j]) # Rank 1 >>> dist.all_gather(tensor_list, tensor) >>> tensor_list [tensor([1.+1.j, 2.+2.j]), tensor([3.+3.j, 4.+4.j])] # Rank 0 [tensor([1.+1.j, 2.+2.j]), tensor([3.+3.j, 4.+4.j])] # Rank 1 `torch.distributed.all_gather_object(object_list, obj, group=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributed/distributed_c10d.html#all_gather_object) Gathers picklable objects from the whole group into a list. Similar to `all_gather()`, but Python objects can be passed in. Note that the object must be picklable in order to be gathered. Parameters * **object_list** ([list](https://docs.python.org/3/library/stdtypes.html#list "\(in Python v3.9\)") _[__Any_ _]_) – Output list. It should be correctly sized as the size of the group for this collective and will contain the output. * **obj** (_Any_) – Picklable Python object to be broadcast from current process. * **group** (_ProcessGroup_ _,__optional_) – The process group to work on. If None, the default process group will be used. Default is `None`. Returns None. If the calling rank is part of this group, the output of the collective will be populated into the input `object_list`. If the calling rank is not part of the group, the passed in `object_list` will be unmodified. Note Note that this API differs slightly from the `all_gather()` collective since it does not provide an `async_op` handle and thus will be a blocking call. Note For NCCL-based process groups, internal tensor representations of objects must be moved to the GPU device before communication takes place. In this case, the device used is given by `torch.cuda.current_device()` and it is the user’s responsibility to ensure that this is set so that each rank has an individual GPU, via `torch.cuda.set_device()`. Warning `all_gather_object()` uses `pickle` module implicitly, which is known to be insecure. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling. Only call this function with data you trust. Example:: >>> # Note: Process group initialization omitted on each rank. >>> import torch.distributed as dist >>> # Assumes world_size of 3. >>> gather_objects = ["foo", 12, {1: 2}] # any picklable object >>> output = [None for _ in gather_objects] >>> dist.all_gather_object(output, gather_objects[dist.get_rank()]) >>> output ['foo', 12, {1: 2}] `torch.distributed.gather(tensor, gather_list=None, dst=0, group=None, async_op=False)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributed/distributed_c10d.html#gather) Gathers a list of tensors in a single process. Parameters * **tensor** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – Input tensor.
* **gather_list** ([list](https://docs.python.org/3/library/stdtypes.html#list "\(in Python v3.9\)") _[_[Tensor](tensors#torch.Tensor "torch.Tensor") _]__,__optional_) – List of appropriately-sized tensors to use for gathered data (default is None, must be specified on the destination rank) * **dst** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – Destination rank (default is 0) * **group** (_ProcessGroup_ _,__optional_) – The process group to work on. If None, the default process group will be used. * **async_op** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – Whether this op should be an async op Returns Async work handle, if async_op is set to True. None, if not async_op or if not part of the group `torch.distributed.gather_object(obj, object_gather_list=None, dst=0, group=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributed/distributed_c10d.html#gather_object) Gathers picklable objects from the whole group in a single process. Similar to `gather()`, but Python objects can be passed in. Note that the object must be picklable in order to be gathered. Parameters * **obj** (_Any_) – Input object. Must be picklable. * **object_gather_list** ([list](https://docs.python.org/3/library/stdtypes.html#list "\(in Python v3.9\)") _[__Any_ _]_) – Output list. On the `dst` rank, it should be correctly sized as the size of the group for this collective and will contain the output. Must be `None` on non-dst ranks. (default is `None`) * **dst** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – Destination rank. (default is 0) * **group** – (ProcessGroup, optional): The process group to work on. If None, the default process group will be used. Default is `None`. Returns None. On the `dst` rank, `object_gather_list` will contain the output of the collective. Note Note that this API differs slightly from the gather collective since it does not provide an async_op handle and thus will be a blocking call. Note Note that this API is not supported when using the NCCL backend. Warning `gather_object()` uses `pickle` module implicitly, which is known to be insecure. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling. Only call this function with data you trust. Example:: >>> # Note: Process group initialization omitted on each rank. >>> import torch.distributed as dist >>> # Assumes world_size of 3. >>> gather_objects = ["foo", 12, {1: 2}] # any picklable object >>> output = [None for _ in gather_objects] >>> dist.gather_object( gather_objects[dist.get_rank()], output if dist.get_rank() == 0 else None, dst=0 ) >>> # On rank 0 >>> output ['foo', 12, {1: 2}] `torch.distributed.scatter(tensor, scatter_list=None, src=0, group=None, async_op=False)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributed/distributed_c10d.html#scatter) Scatters a list of tensors to all processes in a group. Each process will receive exactly one tensor and store its data in the `tensor` argument. Parameters * **tensor** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – Output tensor. 
* **scatter_list** ([list](https://docs.python.org/3/library/stdtypes.html#list "\(in Python v3.9\)") _[_[Tensor](tensors#torch.Tensor "torch.Tensor") _]_) – List of tensors to scatter (default is None, must be specified on the source rank) * **src** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – Source rank (default is 0) * **group** (_ProcessGroup_ _,__optional_) – The process group to work on. If None, the default process group will be used. * **async_op** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – Whether this op should be an async op Returns Async work handle, if async_op is set to True. None, if not async_op or if not part of the group `torch.distributed.scatter_object_list(scatter_object_output_list, scatter_object_input_list, src=0, group=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributed/distributed_c10d.html#scatter_object_list) Scatters picklable objects in `scatter_object_input_list` to the whole group. Similar to `scatter()`, but Python objects can be passed in. On each rank, the scattered object will be stored as the first element of `scatter_object_output_list`. Note that all objects in `scatter_object_input_list` must be picklable in order to be scattered. Parameters * **scatter_object_output_list** (_List_ _[__Any_ _]_) – Non-empty list whose first element will store the object scattered to this rank. * **scatter_object_input_list** (_List_ _[__Any_ _]_) – List of input objects to scatter. Each object must be picklable. Only objects on the `src` rank will be scattered, and the argument can be `None` for non-src ranks. * **src** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – Source rank from which to scatter `scatter_object_input_list`. * **group** – (ProcessGroup, optional): The process group to work on. If None, the default process group will be used. Default is `None`. Returns `None`. If rank is part of the group, `scatter_object_output_list` will have its first element set to the scattered object for this rank. Note Note that this API differs slightly from the scatter collective since it does not provide an `async_op` handle and thus will be a blocking call. Warning `scatter_object_list()` uses `pickle` module implicitly, which is known to be insecure. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling. Only call this function with data you trust. Example:: >>> # Note: Process group initialization omitted on each rank. >>> import torch.distributed as dist >>> if dist.get_rank() == 0: >>> # Assumes world_size of 3. >>> objects = ["foo", 12, {1: 2}] # any picklable object >>> else: >>> # Can be any list on non-src ranks, elements are not used. >>> objects = [None, None, None] >>> output_list = [None] >>> dist.scatter_object_list(output_list, objects, src=0) >>> # Rank i gets objects[i]. For example, on rank 2: >>> output_list [{1: 2}] `torch.distributed.reduce_scatter(output, input_list, op=<ReduceOp.SUM: 0>, group=None, async_op=False)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributed/distributed_c10d.html#reduce_scatter) Reduces, then scatters a list of tensors to all processes in a group. Parameters * **output** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – Output tensor. * **input_list** ([list](https://docs.python.org/3/library/stdtypes.html#list "\(in Python v3.9\)") _[_[Tensor](tensors#torch.Tensor "torch.Tensor") _]_) – List of tensors to reduce and scatter. * **group** (_ProcessGroup_ _,__optional_) – The process group to work on. If None, the default process group will be used. * **async_op** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – Whether this op should be an async op. Returns Async work handle, if async_op is set to True. None, if not async_op or if not part of the group.
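`reduce_scatter` has no example in this section, so here is a minimal sketch (an illustration only; it assumes a process group is already initialized and that the chosen backend supports `reduce_scatter`):

import torch
import torch.distributed as dist

world_size = dist.get_world_size()
rank = dist.get_rank()

# Each rank contributes one input tensor per rank in the group.
input_list = [torch.full((2,), float(rank)) for _ in range(world_size)]
output = torch.empty(2)

# After the call, output on rank i holds the element-wise sum of
# every rank's input_list[i].
dist.reduce_scatter(output, input_list, op=dist.ReduceOp.SUM)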
`torch.distributed.all_to_all(output_tensor_list, input_tensor_list, group=None, async_op=False)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributed/distributed_c10d.html#all_to_all) Each process scatters a list of input tensors to all processes in a group and returns the gathered list of tensors in the output list. Parameters * **output_tensor_list** ([list](https://docs.python.org/3/library/stdtypes.html#list "\(in Python v3.9\)") _[_[Tensor](tensors#torch.Tensor "torch.Tensor") _]_) – List of tensors to be gathered one per rank. * **input_tensor_list** ([list](https://docs.python.org/3/library/stdtypes.html#list "\(in Python v3.9\)") _[_[Tensor](tensors#torch.Tensor "torch.Tensor") _]_) – List of tensors to scatter one per rank. * **group** (_ProcessGroup_ _,__optional_) – The process group to work on. If None, the default process group will be used. * **async_op** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – Whether this op should be an async op. Returns Async work handle, if async_op is set to True. None, if not async_op or if not part of the group. Warning `all_to_all` is experimental and subject to change. #### Examples >>> input = torch.arange(4) + rank * 4 >>> input = list(input.chunk(4)) >>> input [tensor([0]), tensor([1]), tensor([2]), tensor([3])] # Rank 0 [tensor([4]), tensor([5]), tensor([6]), tensor([7])] # Rank 1 [tensor([8]), tensor([9]), tensor([10]), tensor([11])] # Rank 2 [tensor([12]), tensor([13]), tensor([14]), tensor([15])] # Rank 3 >>> output = list(torch.empty([4], dtype=torch.int64).chunk(4)) >>> dist.all_to_all(output, input) >>> output [tensor([0]), tensor([4]), tensor([8]), tensor([12])] # Rank 0 [tensor([1]), tensor([5]), tensor([9]), tensor([13])] # Rank 1 [tensor([2]), tensor([6]), tensor([10]), tensor([14])] # Rank 2 [tensor([3]), tensor([7]), tensor([11]), tensor([15])] # Rank 3 >>> # Essentially, it is similar to the following operation: >>> scatter_list = input >>> gather_list = output >>> for i in range(world_size): >>> dist.scatter(gather_list[i], scatter_list if i == rank else [], src = i) >>> input tensor([0, 1, 2, 3, 4, 5]) # Rank 0 tensor([10, 11, 12, 13, 14, 15, 16, 17, 18]) # Rank 1 tensor([20, 21, 22, 23, 24]) # Rank 2 tensor([30, 31, 32, 33, 34, 35, 36]) # Rank 3 >>> input_splits [2, 2, 1, 1] # Rank 0 [3, 2, 2, 2] # Rank 1 [2, 1, 1, 1] # Rank 2 [2, 2, 2, 1] # Rank 3 >>> output_splits [2, 3, 2, 2] # Rank 0 [2, 2, 1, 2] # Rank 1 [1, 2, 1, 2] # Rank 2 [1, 2, 1, 1] # Rank 3 >>> input = list(input.split(input_splits)) >>> input [tensor([0, 1]), tensor([2, 3]), tensor([4]), tensor([5])] # Rank 0 [tensor([10, 11, 12]), tensor([13, 14]), tensor([15, 16]), tensor([17, 18])] # Rank 1 [tensor([20, 21]), tensor([22]), tensor([23]), tensor([24])] # Rank 2 [tensor([30, 31]), tensor([32, 33]), tensor([34, 35]), tensor([36])] # Rank 3 >>> output = ...
>>> dist.all_to_all(output, input) >>> output [tensor([0, 1]), tensor([10, 11, 12]), tensor([20, 21]), tensor([30, 31])] # Rank 0 [tensor([2, 3]), tensor([13, 14]), tensor([22]), tensor([32, 33])] # Rank 1 [tensor([4]), tensor([15, 16]), tensor([23]), tensor([34, 35])] # Rank 2 [tensor([5]), tensor([17, 18]), tensor([24]), tensor([36])] # Rank 3 `torch.distributed.barrier(group=None, async_op=False, device_ids=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributed/distributed_c10d.html#barrier) Synchronizes all processes. This collective blocks processes until the whole group enters this function (if `async_op` is False), or until `wait()` is called on the returned async work handle. Parameters * **group** (_ProcessGroup_ _,__optional_) – The process group to work on. If None, the default process group will be used. * **async_op** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – Whether this op should be an async op * **device_ids** (_[_[int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _]__,__optional_) – List of device/GPU ids. Valid only for NCCL backend. Returns Async work handle, if async_op is set to True. None, if not async_op or if not part of the group `class torch.distributed.ReduceOp` An enum-like class for available reduction operations: `SUM`, `PRODUCT`, `MIN`, `MAX`, `BAND`, `BOR`, and `BXOR`. Note that `BAND`, `BOR`, and `BXOR` reductions are not available when using the `NCCL` backend. Additionally, `MAX`, `MIN` and `PRODUCT` are not supported for complex tensors. The values of this class can be accessed as attributes, e.g., `ReduceOp.SUM`. They are used in specifying strategies for reduction collectives, e.g., `reduce()`, `all_reduce_multigpu()`, etc. Members: SUM PRODUCT MIN MAX BAND BOR BXOR `class torch.distributed.reduce_op` Deprecated enum-like class for reduction operations: `SUM`, `PRODUCT`, `MIN`, and `MAX`. Use `ReduceOp` instead. ## Autograd-enabled communication primitives If you want to use collective communication functions supporting autograd, you can find an implementation of those in the `torch.distributed.nn.*` module. Functions here are synchronous and will be inserted in the autograd graph, so you need to ensure that all processes that participated in the collective operation also run the backward pass; otherwise the backward communication will not take place and the processes may deadlock. Please note that currently the only backend where all the functions are guaranteed to work is `gloo`. The following autograd-enabled collectives are provided: `torch.distributed.nn.broadcast`, `torch.distributed.nn.gather`, `torch.distributed.nn.scatter`, `torch.distributed.nn.reduce`, `torch.distributed.nn.all_gather`, `torch.distributed.nn.all_to_all`, and `torch.distributed.nn.all_reduce`.
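As a rough illustration (a sketch only; it assumes a `gloo` process group is already initialized and relies on the autograd-enabled `all_reduce` returning the reduced tensor rather than operating in-place):

import torch
import torch.distributed.nn as dist_nn

x = torch.ones(2, requires_grad=True)
# Differentiable all-reduce: the communication is recorded in the autograd graph.
y = dist_nn.all_reduce(x)
# Every participating rank must run backward, or the backward communication stalls.
y.sum().backward()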
## Multi-GPU collective functions If you have more than one GPU on each node, when using the NCCL and Gloo backend, `broadcast_multigpu()` `all_reduce_multigpu()` `reduce_multigpu()` `all_gather_multigpu()` and `reduce_scatter_multigpu()` support distributed collective operations among multiple GPUs within each node. These functions can potentially improve the overall distributed training performance and be easily used by passing a list of tensors. Each Tensor in the passed tensor list needs to be on a separate GPU device of the host where the function is called. Note that the length of the tensor list needs to be identical among all the distributed processes. Also note that currently the multi-GPU collective functions are only supported by the NCCL backend. For example, suppose the system we use for distributed training has 2 nodes, each of which has 8 GPUs. On each of the 16 GPUs, there is a tensor that we would like to all-reduce. The following code can serve as a reference: Code running on Node 0 import torch import torch.distributed as dist dist.init_process_group(backend="nccl", init_method="file:///distributed_test", world_size=2, rank=0) tensor_list = [] for dev_idx in range(torch.cuda.device_count()): tensor_list.append(torch.FloatTensor([1]).cuda(dev_idx)) dist.all_reduce_multigpu(tensor_list) Code running on Node 1 import torch import torch.distributed as dist dist.init_process_group(backend="nccl", init_method="file:///distributed_test", world_size=2, rank=1) tensor_list = [] for dev_idx in range(torch.cuda.device_count()): tensor_list.append(torch.FloatTensor([1]).cuda(dev_idx)) dist.all_reduce_multigpu(tensor_list) After the call, all 16 tensors on the two nodes will have the all-reduced value of 16. `torch.distributed.broadcast_multigpu(tensor_list, src, group=None, async_op=False, src_tensor=0)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributed/distributed_c10d.html#broadcast_multigpu) Broadcasts the tensor to the whole group with multiple GPU tensors per node. `tensor` must have the same number of elements in all the GPUs from all processes participating in the collective. Each tensor in the list must be on a different GPU. Only the nccl and gloo backends are currently supported; tensors should only be GPU tensors. Parameters * **tensor_list** (_List_ _[_[Tensor](tensors#torch.Tensor "torch.Tensor") _]_) – Tensors that participate in the collective operation. If `src` is the rank, then the specified `src_tensor` element of `tensor_list` (`tensor_list[src_tensor]`) will be broadcast to all other tensors (on different GPUs) in the src process and all tensors in `tensor_list` of other non-src processes. You also need to make sure that `len(tensor_list)` is the same for all the distributed processes calling this function. * **src** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – Source rank. * **group** (_ProcessGroup_ _,__optional_) – The process group to work on. If None, the default process group will be used. * **async_op** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – Whether this op should be an async op * **src_tensor** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – Source tensor rank within `tensor_list` Returns Async work handle, if async_op is set to True. None, if not async_op or if not part of the group `torch.distributed.all_reduce_multigpu(tensor_list, op=<ReduceOp.SUM: 0>, group=None, async_op=False)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributed/distributed_c10d.html#all_reduce_multigpu) Reduces the tensor data across all machines in such a way that all get the final result. This function reduces a number of tensors on every node, while each tensor resides on a different GPU. Therefore, the input tensors in the tensor list need to be GPU tensors, and each tensor in the tensor list needs to reside on a different GPU. After the call, every tensor in `tensor_list` is going to be bitwise identical in all processes. Complex tensors are supported.
Only the nccl and gloo backends are currently supported; tensors should only be GPU tensors. Parameters * **tensor_list** (_List_ _[_[Tensor](tensors#torch.Tensor "torch.Tensor") _]_) – List of input and output tensors of the collective. The function operates in-place and requires each tensor to be a GPU tensor on a different GPU. You also need to make sure that `len(tensor_list)` is the same for all the distributed processes calling this function. * **op** (_optional_) – One of the values from `torch.distributed.ReduceOp` enum. Specifies an operation used for element-wise reductions. * **group** (_ProcessGroup_ _,__optional_) – The process group to work on. If None, the default process group will be used. * **async_op** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – Whether this op should be an async op Returns Async work handle, if async_op is set to True. None, if not async_op or if not part of the group `torch.distributed.reduce_multigpu(tensor_list, dst, op=<ReduceOp.SUM: 0>, group=None, async_op=False, dst_tensor=0)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributed/distributed_c10d.html#reduce_multigpu) Reduces the tensor data on multiple GPUs across all machines. Each tensor in `tensor_list` should reside on a separate GPU. Only the GPU of `tensor_list[dst_tensor]` on the process with rank `dst` is going to receive the final result. Only the nccl backend is currently supported; tensors should only be GPU tensors. Parameters * **tensor_list** (_List_ _[_[Tensor](tensors#torch.Tensor "torch.Tensor") _]_) – Input and output GPU tensors of the collective. The function operates in-place. You also need to make sure that `len(tensor_list)` is the same for all the distributed processes calling this function. * **dst** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – Destination rank * **op** (_optional_) – One of the values from `torch.distributed.ReduceOp` enum. Specifies an operation used for element-wise reductions. * **group** (_ProcessGroup_ _,__optional_) – The process group to work on. If None, the default process group will be used. * **async_op** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – Whether this op should be an async op * **dst_tensor** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – Destination tensor rank within `tensor_list` Returns Async work handle, if async_op is set to True. None, otherwise `torch.distributed.all_gather_multigpu(output_tensor_lists, input_tensor_list, group=None, async_op=False)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributed/distributed_c10d.html#all_gather_multigpu) Gathers tensors from the whole group in a list. Each tensor in `tensor_list` should reside on a separate GPU. Only the nccl backend is currently supported; tensors should only be GPU tensors. Complex tensors are supported. Parameters * **output_tensor_lists** (_List_ _[__List_ _[_[Tensor](tensors#torch.Tensor "torch.Tensor") _]__]_) – Output lists. It should contain correctly-sized tensors on each GPU to be used for output of the collective, e.g. `output_tensor_lists[i]` contains the all_gather result that resides on the GPU of `input_tensor_list[i]`. Note that each element of `output_tensor_lists` has the size of `world_size * len(input_tensor_list)`, since the function all gathers the result from every single GPU in the group.
To interpret each element of `output_tensor_lists[i]`, note that `input_tensor_list[j]` of rank k will appear in `output_tensor_lists[i][k * world_size + j]` Also note that `len(output_tensor_lists)`, and the size of each element in `output_tensor_lists` (each element is a list, therefore `len(output_tensor_lists[i])`) need to be the same for all the distributed processes calling this function. * **input_tensor_list** (_List_ _[_[Tensor](tensors#torch.Tensor "torch.Tensor") _]_) – List of tensors (on different GPUs) to be broadcast from current process. Note that `len(input_tensor_list)` needs to be the same for all the distributed processes calling this function. * **group** (_ProcessGroup_ _,__optional_) – The process group to work on. If None, the default process group will be used. * **async_op** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – Whether this op should be an async op Returns Async work handle, if async_op is set to True. None, if not async_op or if not part of the group `torch.distributed.reduce_scatter_multigpu(output_tensor_list, input_tensor_lists, op=<ReduceOp.SUM: 0>, group=None, async_op=False)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributed/distributed_c10d.html#reduce_scatter_multigpu) Reduce and scatter a list of tensors to the whole group. Only the nccl backend is currently supported. Each tensor in `output_tensor_list` should reside on a separate GPU, as should each list of tensors in `input_tensor_lists`. Parameters * **output_tensor_list** (_List_ _[_[Tensor](tensors#torch.Tensor "torch.Tensor") _]_) – Output tensors (on different GPUs) to receive the result of the operation. Note that `len(output_tensor_list)` needs to be the same for all the distributed processes calling this function. * **input_tensor_lists** (_List_ _[__List_ _[_[Tensor](tensors#torch.Tensor "torch.Tensor") _]__]_) – Input lists. It should contain correctly-sized tensors on each GPU to be used for input of the collective, e.g. `input_tensor_lists[i]` contains the reduce_scatter input that resides on the GPU of `output_tensor_list[i]`. Note that each element of `input_tensor_lists` has the size of `world_size * len(output_tensor_list)`, since the function scatters the result from every single GPU in the group. To interpret each element of `input_tensor_lists[i]`, note that `output_tensor_list[j]` of rank k receives the reduce-scattered result from `input_tensor_lists[i][k * world_size + j]` Also note that `len(input_tensor_lists)`, and the size of each element in `input_tensor_lists` (each element is a list, therefore `len(input_tensor_lists[i])`) need to be the same for all the distributed processes calling this function. * **group** (_ProcessGroup_ _,__optional_) – The process group to work on. If None, the default process group will be used. * **async_op** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – Whether this op should be an async op. Returns Async work handle, if async_op is set to True. None, if not async_op or if not part of the group. ## Third-party backends Besides the GLOO/MPI/NCCL backends, PyTorch distributed supports third-party backends through a run-time register mechanism. For references on how to develop a third-party backend through C++ Extension, please refer to [Tutorials - Custom C++ and CUDA Extensions](https://pytorch.org/tutorials/advanced/cpp_extension.html) and `test/cpp_extensions/cpp_c10d_extension.cpp`. The capability of third-party backends is determined by their own implementations. The new backend derives from `c10d.ProcessGroup` and registers the backend name and the instantiating interface through `torch.distributed.Backend.register_backend()` when imported. When manually importing this backend and invoking `torch.distributed.init_process_group()` with the corresponding backend name, the `torch.distributed` package runs on the new backend.
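As a rough sketch of that registration flow (the backend name "dummy" and the factory function below are hypothetical placeholders, not a real extension):

import torch.distributed as dist

# A real extension would provide a c10d.ProcessGroup subclass and a factory
# for it; this stand-in only shows the shape of the registration call.
def create_dummy_process_group(*args, **kwargs):
    # a real factory receives the store, rank, world size, etc. from init_process_group
    raise NotImplementedError("placeholder for a real ProcessGroup factory")

dist.Backend.register_backend("dummy", create_dummy_process_group)
# Afterwards the registered name can be passed to init_process_group, e.g.
# dist.init_process_group("dummy", store=..., rank=..., world_size=...)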
Warning The support of third-party backends is experimental and subject to change. ## Launch utility The `torch.distributed` package also provides a launch utility in `torch.distributed.launch`. This helper utility can be used to launch multiple processes per node for distributed training. `torch.distributed.launch` is a module that spawns multiple distributed training processes on each of the training nodes. The utility can be used for single-node distributed training, in which one or more processes per node will be spawned. The utility can be used for either CPU training or GPU training. If the utility is used for GPU training, each distributed process will be operating on a single GPU. This can considerably improve single-node training performance. It can also be used in multi-node distributed training, by spawning multiple processes on each node, which likewise improves multi-node distributed training performance. This is especially beneficial for systems with multiple InfiniBand interfaces that have direct-GPU support, since all of them can be utilized for aggregated communication bandwidth. In both cases of single-node distributed training or multi-node distributed training, this utility will launch the given number of processes per node (`--nproc_per_node`). If used for GPU training, this number needs to be less than or equal to the number of GPUs on the current system (`nproc_per_node`), and each process will be operating on a single GPU from _GPU 0 to GPU (nproc_per_node - 1)_. **How to use this module:** 1. Single-Node multi-process distributed training >>> python -m torch.distributed.launch --nproc_per_node=NUM_GPUS_YOU_HAVE YOUR_TRAINING_SCRIPT.py (--arg1 --arg2 --arg3 and all other arguments of your training script) 2. Multi-Node multi-process distributed training: (e.g. two nodes) Node 1: _(IP: 192.168.1.1, and has a free port: 1234)_ >>> python -m torch.distributed.launch --nproc_per_node=NUM_GPUS_YOU_HAVE --nnodes=2 --node_rank=0 --master_addr="192.168.1.1" --master_port=1234 YOUR_TRAINING_SCRIPT.py (--arg1 --arg2 --arg3 and all other arguments of your training script) Node 2: >>> python -m torch.distributed.launch --nproc_per_node=NUM_GPUS_YOU_HAVE --nnodes=2 --node_rank=1 --master_addr="192.168.1.1" --master_port=1234 YOUR_TRAINING_SCRIPT.py (--arg1 --arg2 --arg3 and all other arguments of your training script) 3. To look up what optional arguments this module offers: >>> python -m torch.distributed.launch --help **Important Notices:** 1\. This utility and multi-process distributed (single-node or multi-node) GPU training currently only achieves the best performance using the NCCL distributed backend. Thus the NCCL backend is the recommended backend to use for GPU training. 2\. In your training program, you must parse the command-line argument: `--local_rank=LOCAL_PROCESS_RANK`, which will be provided by this module. If your training program uses GPUs, you should ensure that your code only runs on the GPU device of LOCAL_PROCESS_RANK.
This can be done by: Parsing the local_rank argument >>> import argparse >>> parser = argparse.ArgumentParser() >>> parser.add_argument("--local_rank", type=int) >>> args = parser.parse_args() Set your device to local rank using either >>> torch.cuda.set_device(args.local_rank) # before your code runs or >>> with torch.cuda.device(args.local_rank): >>> # your code to run 3\. In your training program, you are supposed to call the following function at the beginning to start the distributed backend. You need to make sure that the init_method uses `env://`, which is the only `init_method` supported by this module. torch.distributed.init_process_group(backend='YOUR BACKEND', init_method='env://') 4\. In your training program, you can either use regular distributed functions or use the [`torch.nn.parallel.DistributedDataParallel()`](generated/torch.nn.parallel.distributeddataparallel#torch.nn.parallel.DistributedDataParallel "torch.nn.parallel.DistributedDataParallel") module. If your training program uses GPUs for training and you would like to use the [`torch.nn.parallel.DistributedDataParallel()`](generated/torch.nn.parallel.distributeddataparallel#torch.nn.parallel.DistributedDataParallel "torch.nn.parallel.DistributedDataParallel") module, here is how to configure it. model = torch.nn.parallel.DistributedDataParallel(model, device_ids=[args.local_rank], output_device=args.local_rank) Please ensure that the `device_ids` argument is set to be the only GPU device id that your code will be operating on. This is generally the local rank of the process. In other words, `device_ids` needs to be `[args.local_rank]`, and `output_device` needs to be `args.local_rank` in order to use this utility. 5\. Another way to pass `local_rank` to the subprocesses is via the environment variable `LOCAL_RANK`. This behavior is enabled when you launch the script with `--use_env=True`. You must adjust the subprocess example above to replace `args.local_rank` with `os.environ['LOCAL_RANK']`; the launcher will not pass `--local_rank` when you specify this flag. Warning `local_rank` is NOT globally unique: it is only unique per process on a machine. Thus, don’t use it to decide if you should, e.g., write to a networked filesystem; processes on different machines can share the same `local_rank`, so getting this wrong can lead to them writing to the same path. ## Spawn utility The [Multiprocessing package - torch.multiprocessing](multiprocessing#multiprocessing-doc) also provides a `spawn` function in [`torch.multiprocessing.spawn()`](multiprocessing#torch.multiprocessing.spawn "torch.multiprocessing.spawn"). This helper function can be used to spawn multiple processes. It works by passing in the function that you want to run and spawns N processes to run it. This can be used for multiprocess distributed training as well. For a reference on how to use it, see the [PyTorch example - ImageNet implementation](https://github.com/pytorch/examples/tree/master/imagenet). Note that this function requires Python 3.4 or higher.
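A minimal sketch of that pattern (the worker function, the `gloo` backend, and the TCP address below are illustrative choices, not part of the API):

import torch.distributed as dist
import torch.multiprocessing as mp

def worker(rank, world_size):
    # mp.spawn passes the process index as the first argument.
    dist.init_process_group("gloo", init_method="tcp://127.0.0.1:23456",
                            rank=rank, world_size=world_size)
    # ... training code for this process ...
    dist.destroy_process_group()

if __name__ == "__main__":
    world_size = 4
    mp.spawn(worker, args=(world_size,), nprocs=world_size)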
# Probability distributions - torch.distributions The `distributions` package contains parameterizable probability distributions and sampling functions. This allows the construction of stochastic computation graphs and stochastic gradient estimators for optimization. This package generally follows the design of the [TensorFlow Distributions](https://arxiv.org/abs/1711.10604) package. It is not possible to directly backpropagate through random samples. However, there are two main methods for creating surrogate functions that can be backpropagated through. These are the score function estimator/likelihood ratio estimator/REINFORCE and the pathwise derivative estimator. REINFORCE is commonly seen as the basis for policy gradient methods in reinforcement learning, and the pathwise derivative estimator is commonly seen in the reparameterization trick in variational autoencoders. Whilst the score function only requires the value of samples f(x), the pathwise derivative requires the derivative f'(x). The next sections discuss these two in a reinforcement learning example. For more details see [Gradient Estimation Using Stochastic Computation Graphs](https://arxiv.org/abs/1506.05254). ## Score function When the probability density function is differentiable with respect to its parameters, we only need `sample()` and `log_prob()` to implement REINFORCE: \Delta\theta = \alpha r \frac{\partial\log p(a|\pi^\theta(s))}{\partial\theta} where \theta are the parameters, \alpha is the learning rate, r is the reward and p(a|\pi^\theta(s)) is the probability of taking action a in state s given policy \pi^\theta. In practice we would sample an action from the output of a network, apply this action in an environment, and then use `log_prob` to construct an equivalent loss function. Note that we use a negative because optimizers use gradient descent, whilst the rule above assumes gradient ascent. With a categorical policy, the code for implementing REINFORCE would be as follows: probs = policy_network(state) # Note that this is equivalent to what used to be called multinomial m = Categorical(probs) action = m.sample() next_state, reward = env.step(action) loss = -m.log_prob(action) * reward loss.backward() ## Pathwise derivative The other way to implement these stochastic/policy gradients would be to use the reparameterization trick from the `rsample()` method, where the parameterized random variable can be constructed via a parameterized deterministic function of a parameter-free random variable. The reparameterized sample therefore becomes differentiable. The code for implementing the pathwise derivative would be as follows: params = policy_network(state) m = Normal(*params) # Any distribution with .has_rsample == True could work based on the application action = m.rsample() next_state, reward = env.step(action) # Assuming that reward is differentiable loss = -reward loss.backward() ## Distribution `class torch.distributions.distribution.Distribution(batch_shape=torch.Size([]), event_shape=torch.Size([]), validate_args=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/distribution.html#Distribution) Bases: [`object`](https://docs.python.org/3/library/functions.html#object "\(in Python v3.9\)") Distribution is the abstract base class for probability distributions. `property arg_constraints` Returns a dictionary from argument names to `Constraint` objects that should be satisfied by each argument of this distribution. Args that are not tensors need not appear in this dict. `property batch_shape` Returns the shape over which parameters are batched. `cdf(value)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/distribution.html#Distribution.cdf) Returns the cumulative density/mass function evaluated at `value`. Parameters **value** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – `entropy()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/distribution.html#Distribution.entropy) Returns entropy of distribution, batched over batch_shape.
Returns Tensor of shape batch_shape. `enumerate_support(expand=True)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/distribution.html#Distribution.enumerate_support) Returns tensor containing all values supported by a discrete distribution. The result will enumerate over dimension 0, so the shape of the result will be `(cardinality,) + batch_shape + event_shape` (where `event_shape = ()` for univariate distributions). Note that this enumerates over all batched tensors in lock-step `[[0, 0], [1, 1], …]`. With `expand=False`, enumeration happens along dim 0, but with the remaining batch dimensions being singleton dimensions, `[[0], [1], ...]`. To iterate over the full Cartesian product use `itertools.product(m.enumerate_support())`. Parameters **expand** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – whether to expand the support over the batch dims to match the distribution’s `batch_shape`. Returns Tensor iterating over dimension 0. `property event_shape` Returns the shape of a single sample (without batching). `expand(batch_shape, _instance=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/distribution.html#Distribution.expand) Returns a new distribution instance (or populates an existing instance provided by a derived class) with batch dimensions expanded to `batch_shape`. This method calls [`expand`](tensors#torch.Tensor.expand "torch.Tensor.expand") on the distribution’s parameters. As such, this does not allocate new memory for the expanded distribution instance. Additionally, this does not repeat any args checking or parameter broadcasting in `__init__.py`, when an instance is first created. Parameters * **batch_shape** (_torch.Size_) – the desired expanded size. * **_instance** – new instance provided by subclasses that need to override `.expand`. Returns New distribution instance with batch dimensions expanded to `batch_shape`. `icdf(value)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/distribution.html#Distribution.icdf) Returns the inverse cumulative density/mass function evaluated at `value`. Parameters **value** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – `log_prob(value)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/distribution.html#Distribution.log_prob) Returns the log of the probability density/mass function evaluated at `value`. Parameters **value** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – `property mean` Returns the mean of the distribution. `perplexity()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/distribution.html#Distribution.perplexity) Returns perplexity of distribution, batched over batch_shape. Returns Tensor of shape batch_shape. `rsample(sample_shape=torch.Size([]))` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/distribution.html#Distribution.rsample) Generates a sample_shape shaped reparameterized sample or sample_shape shaped batch of reparameterized samples if the distribution parameters are batched. `sample(sample_shape=torch.Size([]))` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/distribution.html#Distribution.sample) Generates a sample_shape shaped sample or sample_shape shaped batch of samples if the distribution parameters are batched.
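For instance, a minimal sketch with a batched `Normal` (the parameter values are arbitrary):

import torch
from torch.distributions import Normal

# Two independent Normals batched together: batch_shape = (2,).
m = Normal(torch.tensor([0.0, 10.0]), torch.tensor([1.0, 2.0]))

x = m.sample()                 # shape (2,); no gradient path to the parameters
y = m.sample(torch.Size([5]))  # shape (5, 2): sample_shape + batch_shape
z = m.rsample()                # reparameterized sample, differentiable w.r.t. loc and scale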
`sample_n(n)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/distribution.html#Distribution.sample_n)

Generates n samples or n batches of samples if the distribution parameters are batched.

`static set_default_validate_args(value)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/distribution.html#Distribution.set_default_validate_args)

Sets whether validation is enabled or disabled. The default behavior mimics Python's `assert` statement: validation is on by default, but is disabled if Python is run in optimized mode (via `python -O`). Validation may be expensive, so you may want to disable it once a model is working.

Parameters **value** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – Whether to enable validation.

`property stddev`

Returns the standard deviation of the distribution.

`property support`

Returns a `Constraint` object representing this distribution's support.

`property variance`

Returns the variance of the distribution.

## ExponentialFamily

`class torch.distributions.exp_family.ExponentialFamily(batch_shape=torch.Size([]), event_shape=torch.Size([]), validate_args=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/exp_family.html#ExponentialFamily)

Bases: `torch.distributions.distribution.Distribution`

ExponentialFamily is the abstract base class for probability distributions belonging to an exponential family, whose probability mass/density function has the form

$$p_{F}(x; \theta) = \exp(\langle t(x), \theta\rangle - F(\theta) + k(x))$$

where $\theta$ denotes the natural parameters, $t(x)$ denotes the sufficient statistic, $F(\theta)$ is the log normalizer function for a given family and $k(x)$ is the carrier measure.

Note

This class is an intermediary between the `Distribution` class and distributions which belong to an exponential family mainly to check the correctness of the `.entropy()` and analytic KL divergence methods. We use this class to compute the entropy and KL divergence using the AD framework and Bregman divergences (courtesy of: Frank Nielsen and Richard Nock, Entropies and Cross-entropies of Exponential Families).

`entropy()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/exp_family.html#ExponentialFamily.entropy)

Method to compute the entropy using Bregman divergence of the log normalizer.

## Bernoulli

`class torch.distributions.bernoulli.Bernoulli(probs=None, logits=None, validate_args=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/bernoulli.html#Bernoulli)

Bases: `torch.distributions.exp_family.ExponentialFamily`

Creates a Bernoulli distribution parameterized by `probs` or `logits` (but not both).

Samples are binary (0 or 1). They take the value `1` with probability `p` and `0` with probability `1 - p`.
Example: >>> m = Bernoulli(torch.tensor([0.3])) >>> m.sample() # 30% chance 1; 70% chance 0 tensor([ 0.]) Parameters * **probs** (_Number_ _,_[Tensor](tensors#torch.Tensor "torch.Tensor")) – the probability of sampling `1` * **logits** (_Number_ _,_[Tensor](tensors#torch.Tensor "torch.Tensor")) – the log-odds of sampling `1` `arg_constraints = {'logits': Real(), 'probs': Interval(lower_bound=0.0, upper_bound=1.0)}` `entropy()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/bernoulli.html#Bernoulli.entropy) `enumerate_support(expand=True)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/bernoulli.html#Bernoulli.enumerate_support) `expand(batch_shape, _instance=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/bernoulli.html#Bernoulli.expand) `has_enumerate_support = True` `log_prob(value)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/bernoulli.html#Bernoulli.log_prob) `logits` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/bernoulli.html#Bernoulli.logits) `property mean` `property param_shape` `probs` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/bernoulli.html#Bernoulli.probs) `sample(sample_shape=torch.Size([]))` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/bernoulli.html#Bernoulli.sample) `support = Boolean()` `property variance` ## Beta `class torch.distributions.beta.Beta(concentration1, concentration0, validate_args=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/beta.html#Beta) Bases: `torch.distributions.exp_family.ExponentialFamily` Beta distribution parameterized by `concentration1` and `concentration0`. Example: >>> m = Beta(torch.tensor([0.5]), torch.tensor([0.5])) >>> m.sample() # Beta distributed with concentration concentration1 and concentration0 tensor([ 0.1046]) Parameters * **concentration1** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _or_[Tensor](tensors#torch.Tensor "torch.Tensor")) – 1st concentration parameter of the distribution (often referred to as alpha) * **concentration0** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _or_[Tensor](tensors#torch.Tensor "torch.Tensor")) – 2nd concentration parameter of the distribution (often referred to as beta) `arg_constraints = {'concentration0': GreaterThan(lower_bound=0.0), 'concentration1': GreaterThan(lower_bound=0.0)}` `property concentration0` `property concentration1` `entropy()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/beta.html#Beta.entropy) `expand(batch_shape, _instance=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/beta.html#Beta.expand) `has_rsample = True` `log_prob(value)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/beta.html#Beta.log_prob) `property mean` `rsample(sample_shape=())` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/beta.html#Beta.rsample) `support = Interval(lower_bound=0.0, upper_bound=1.0)` `property variance` ## Binomial `class torch.distributions.binomial.Binomial(total_count=1, probs=None, logits=None, validate_args=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/binomial.html#Binomial) Bases: `torch.distributions.distribution.Distribution` Creates a Binomial distribution parameterized by `total_count` and either `probs` or `logits` (but not both). 
`total_count` must be broadcastable with `probs`/`logits`.

Example:

    >>> m = Binomial(100, torch.tensor([0 , .2, .8, 1]))
    >>> x = m.sample()
    tensor([ 0., 22., 71., 100.])
    >>> m = Binomial(torch.tensor([[5.], [10.]]), torch.tensor([0.5, 0.8]))
    >>> x = m.sample()
    tensor([[ 4., 5.], [ 7., 6.]])

Parameters

* **total_count** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _or_ [Tensor](tensors#torch.Tensor "torch.Tensor")) – number of Bernoulli trials
* **probs** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – Event probabilities
* **logits** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – Event log-odds

`arg_constraints = {'logits': Real(), 'probs': Interval(lower_bound=0.0, upper_bound=1.0), 'total_count': IntegerGreaterThan(lower_bound=0)}` `enumerate_support(expand=True)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/binomial.html#Binomial.enumerate_support) `expand(batch_shape, _instance=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/binomial.html#Binomial.expand) `has_enumerate_support = True` `log_prob(value)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/binomial.html#Binomial.log_prob) `logits` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/binomial.html#Binomial.logits) `property mean` `property param_shape` `probs` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/binomial.html#Binomial.probs) `sample(sample_shape=torch.Size([]))` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/binomial.html#Binomial.sample) `property support` `property variance`

## Categorical

`class torch.distributions.categorical.Categorical(probs=None, logits=None, validate_args=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/categorical.html#Categorical)

Bases: `torch.distributions.distribution.Distribution`

Creates a categorical distribution parameterized by either `probs` or `logits` (but not both).

Note It is equivalent to the distribution that [`torch.multinomial()`](generated/torch.multinomial#torch.multinomial "torch.multinomial") samples from.

Samples are integers from $\{0, \ldots, K-1\}$ where `K` is `probs.size(-1)`.

If `probs` is 1-dimensional with length-`K`, each element is the relative probability of sampling the class at that index.

If `probs` is N-dimensional, the first N-1 dimensions are treated as a batch of relative probability vectors.

Note The `probs` argument must be non-negative, finite and have a non-zero sum, and it will be normalized to sum to 1 along the last dimension. `probs` will return this normalized value. The `logits` argument will be interpreted as unnormalized log probabilities and can therefore be any real number. It will likewise be normalized so that the resulting probabilities sum to 1 along the last dimension. `logits` will return this normalized value.
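Because both parameterizations are stored in normalized form, the note above can be checked directly. A minimal sketch (with arbitrary values) showing that `probs` is normalized to sum to 1 and that `logits` is returned as normalized log-probabilities:

    import torch
    from torch.distributions import Categorical

    # Unnormalized relative probabilities; the constructor normalizes them
    # along the last dimension.
    m = Categorical(probs=torch.tensor([1.0, 2.0, 7.0]))
    print(m.probs)                 # tensor([0.1000, 0.2000, 0.7000])

    # Unnormalized log probabilities are accepted too; .logits returns the
    # normalized log-probabilities.
    m2 = Categorical(logits=torch.tensor([1.0, 1.0, 1.0]))
    print(m2.probs)                # tensor([0.3333, 0.3333, 0.3333])
    print(m2.logits.exp().sum())   # ~1.0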
See also: [`torch.multinomial()`](generated/torch.multinomial#torch.multinomial "torch.multinomial") Example: >>> m = Categorical(torch.tensor([ 0.25, 0.25, 0.25, 0.25 ])) >>> m.sample() # equal probability of 0, 1, 2, 3 tensor(3) Parameters * **probs** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – event probabilities * **logits** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – event log probabilities (unnormalized) `arg_constraints = {'logits': IndependentConstraint(Real(), 1), 'probs': Simplex()}` `entropy()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/categorical.html#Categorical.entropy) `enumerate_support(expand=True)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/categorical.html#Categorical.enumerate_support) `expand(batch_shape, _instance=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/categorical.html#Categorical.expand) `has_enumerate_support = True` `log_prob(value)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/categorical.html#Categorical.log_prob) `logits` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/categorical.html#Categorical.logits) `property mean` `property param_shape` `probs` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/categorical.html#Categorical.probs) `sample(sample_shape=torch.Size([]))` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/categorical.html#Categorical.sample) `property support` `property variance` ## Cauchy `class torch.distributions.cauchy.Cauchy(loc, scale, validate_args=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/cauchy.html#Cauchy) Bases: `torch.distributions.distribution.Distribution` Samples from a Cauchy (Lorentz) distribution. The distribution of the ratio of independent normally distributed random variables with means `0` follows a Cauchy distribution. Example: >>> m = Cauchy(torch.tensor([0.0]), torch.tensor([1.0])) >>> m.sample() # sample from a Cauchy distribution with loc=0 and scale=1 tensor([ 2.3214]) Parameters * **loc** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _or_[Tensor](tensors#torch.Tensor "torch.Tensor")) – mode or median of the distribution. * **scale** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _or_[Tensor](tensors#torch.Tensor "torch.Tensor")) – half width at half maximum. 
`arg_constraints = {'loc': Real(), 'scale': GreaterThan(lower_bound=0.0)}` `cdf(value)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/cauchy.html#Cauchy.cdf) `entropy()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/cauchy.html#Cauchy.entropy) `expand(batch_shape, _instance=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/cauchy.html#Cauchy.expand) `has_rsample = True` `icdf(value)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/cauchy.html#Cauchy.icdf) `log_prob(value)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/cauchy.html#Cauchy.log_prob) `property mean` `rsample(sample_shape=torch.Size([]))` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/cauchy.html#Cauchy.rsample) `support = Real()` `property variance`

## Chi2

`class torch.distributions.chi2.Chi2(df, validate_args=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/chi2.html#Chi2)

Bases: `torch.distributions.gamma.Gamma`

Creates a Chi2 distribution parameterized by shape parameter `df`. This is exactly equivalent to `Gamma(alpha=0.5*df, beta=0.5)`

Example:

    >>> m = Chi2(torch.tensor([1.0]))
    >>> m.sample()  # Chi2 distributed with shape df=1
    tensor([ 0.1046])

Parameters **df** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _or_ [Tensor](tensors#torch.Tensor "torch.Tensor")) – shape parameter of the distribution

`arg_constraints = {'df': GreaterThan(lower_bound=0.0)}` `property df` `expand(batch_shape, _instance=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/chi2.html#Chi2.expand)

## ContinuousBernoulli

`class torch.distributions.continuous_bernoulli.ContinuousBernoulli(probs=None, logits=None, lims=(0.499, 0.501), validate_args=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/continuous_bernoulli.html#ContinuousBernoulli)

Bases: `torch.distributions.exp_family.ExponentialFamily`

Creates a continuous Bernoulli distribution parameterized by `probs` or `logits` (but not both).

The distribution is supported in [0, 1] and parameterized by 'probs' (in (0,1)) or 'logits' (real-valued). Note that, unlike the Bernoulli, 'probs' does not correspond to a probability and 'logits' does not correspond to log-odds, but the same names are used due to the similarity with the Bernoulli. See [1] for more details.

Example:

    >>> m = ContinuousBernoulli(torch.tensor([0.3]))
    >>> m.sample()
    tensor([ 0.2538])

Parameters

* **probs** (_Number_ _,_ [Tensor](tensors#torch.Tensor "torch.Tensor")) – (0,1) valued parameters
* **logits** (_Number_ _,_ [Tensor](tensors#torch.Tensor "torch.Tensor")) – real valued parameters whose sigmoid matches 'probs'

[1] The continuous Bernoulli: fixing a pervasive error in variational autoencoders, Loaiza-Ganem G and Cunningham JP, NeurIPS 2019.
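Since the continuous Bernoulli supports reparameterized sampling (`has_rsample = True` in the listing below), it can be used with the pathwise derivative estimator described earlier. A minimal sketch with an arbitrary scalar loss:

    import torch
    from torch.distributions import ContinuousBernoulli

    # Parameter we want gradients for (e.g. the output of a decoder network).
    probs = torch.tensor([0.3], requires_grad=True)
    m = ContinuousBernoulli(probs=probs)

    x = m.rsample()                   # reparameterized sample in [0, 1]
    loss = (x - 0.5).pow(2).sum()     # arbitrary differentiable loss
    loss.backward()

    print(probs.grad)                 # gradients flow through rsample()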
`arg_constraints = {'logits': Real(), 'probs': Interval(lower_bound=0.0, upper_bound=1.0)}` `cdf(value)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/continuous_bernoulli.html#ContinuousBernoulli.cdf) `entropy()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/continuous_bernoulli.html#ContinuousBernoulli.entropy) `expand(batch_shape, _instance=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/continuous_bernoulli.html#ContinuousBernoulli.expand) `has_rsample = True` `icdf(value)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/continuous_bernoulli.html#ContinuousBernoulli.icdf) `log_prob(value)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/continuous_bernoulli.html#ContinuousBernoulli.log_prob) `logits` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/continuous_bernoulli.html#ContinuousBernoulli.logits) `property mean` `property param_shape` `probs` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/continuous_bernoulli.html#ContinuousBernoulli.probs) `rsample(sample_shape=torch.Size([]))` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/continuous_bernoulli.html#ContinuousBernoulli.rsample) `sample(sample_shape=torch.Size([]))` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/continuous_bernoulli.html#ContinuousBernoulli.sample) `property stddev` `support = Interval(lower_bound=0.0, upper_bound=1.0)` `property variance`

## Dirichlet

`class torch.distributions.dirichlet.Dirichlet(concentration, validate_args=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/dirichlet.html#Dirichlet)

Bases: `torch.distributions.exp_family.ExponentialFamily`

Creates a Dirichlet distribution parameterized by concentration `concentration`.

Example:

    >>> m = Dirichlet(torch.tensor([0.5, 0.5]))
    >>> m.sample()  # Dirichlet distributed with concentration concentration
    tensor([ 0.1046, 0.8954])

Parameters **concentration** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – concentration parameter of the distribution (often referred to as alpha)

`arg_constraints = {'concentration': IndependentConstraint(GreaterThan(lower_bound=0.0), 1)}` `entropy()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/dirichlet.html#Dirichlet.entropy) `expand(batch_shape, _instance=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/dirichlet.html#Dirichlet.expand) `has_rsample = True` `log_prob(value)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/dirichlet.html#Dirichlet.log_prob) `property mean` `rsample(sample_shape=())` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/dirichlet.html#Dirichlet.rsample) `support = Simplex()` `property variance`

## Exponential

`class torch.distributions.exponential.Exponential(rate, validate_args=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/exponential.html#Exponential)

Bases: `torch.distributions.exp_family.ExponentialFamily`

Creates an Exponential distribution parameterized by `rate`.
Example: >>> m = Exponential(torch.tensor([1.0])) >>> m.sample() # Exponential distributed with rate=1 tensor([ 0.1046]) Parameters **rate** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _or_[Tensor](tensors#torch.Tensor "torch.Tensor")) – rate = 1 / scale of the distribution `arg_constraints = {'rate': GreaterThan(lower_bound=0.0)}` `cdf(value)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/exponential.html#Exponential.cdf) `entropy()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/exponential.html#Exponential.entropy) `expand(batch_shape, _instance=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/exponential.html#Exponential.expand) `has_rsample = True` `icdf(value)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/exponential.html#Exponential.icdf) `log_prob(value)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/exponential.html#Exponential.log_prob) `property mean` `rsample(sample_shape=torch.Size([]))` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/exponential.html#Exponential.rsample) `property stddev` `support = GreaterThan(lower_bound=0.0)` `property variance` ## FisherSnedecor `class torch.distributions.fishersnedecor.FisherSnedecor(df1, df2, validate_args=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/fishersnedecor.html#FisherSnedecor) Bases: `torch.distributions.distribution.Distribution` Creates a Fisher-Snedecor distribution parameterized by `df1` and `df2`. Example: >>> m = FisherSnedecor(torch.tensor([1.0]), torch.tensor([2.0])) >>> m.sample() # Fisher-Snedecor-distributed with df1=1 and df2=2 tensor([ 0.2453]) Parameters * **df1** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _or_[Tensor](tensors#torch.Tensor "torch.Tensor")) – degrees of freedom parameter 1 * **df2** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _or_[Tensor](tensors#torch.Tensor "torch.Tensor")) – degrees of freedom parameter 2 `arg_constraints = {'df1': GreaterThan(lower_bound=0.0), 'df2': GreaterThan(lower_bound=0.0)}` `expand(batch_shape, _instance=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/fishersnedecor.html#FisherSnedecor.expand) `has_rsample = True` `log_prob(value)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/fishersnedecor.html#FisherSnedecor.log_prob) `property mean` `rsample(sample_shape=torch.Size([]))` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/fishersnedecor.html#FisherSnedecor.rsample) `support = GreaterThan(lower_bound=0.0)` `property variance` ## Gamma `class torch.distributions.gamma.Gamma(concentration, rate, validate_args=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/gamma.html#Gamma) Bases: `torch.distributions.exp_family.ExponentialFamily` Creates a Gamma distribution parameterized by shape `concentration` and `rate`. 
Example:

    >>> m = Gamma(torch.tensor([1.0]), torch.tensor([1.0]))
    >>> m.sample()  # Gamma distributed with concentration=1 and rate=1
    tensor([ 0.1046])

Parameters

* **concentration** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _or_ [Tensor](tensors#torch.Tensor "torch.Tensor")) – shape parameter of the distribution (often referred to as alpha)
* **rate** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _or_ [Tensor](tensors#torch.Tensor "torch.Tensor")) – rate = 1 / scale of the distribution (often referred to as beta)

`arg_constraints = {'concentration': GreaterThan(lower_bound=0.0), 'rate': GreaterThan(lower_bound=0.0)}` `entropy()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/gamma.html#Gamma.entropy) `expand(batch_shape, _instance=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/gamma.html#Gamma.expand) `has_rsample = True` `log_prob(value)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/gamma.html#Gamma.log_prob) `property mean` `rsample(sample_shape=torch.Size([]))` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/gamma.html#Gamma.rsample) `support = GreaterThan(lower_bound=0.0)` `property variance`

## Geometric

`class torch.distributions.geometric.Geometric(probs=None, logits=None, validate_args=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/geometric.html#Geometric)

Bases: `torch.distributions.distribution.Distribution`

Creates a Geometric distribution parameterized by `probs`, where `probs` is the probability of success of Bernoulli trials. It represents the probability that in $k + 1$ Bernoulli trials, the first $k$ trials failed, before seeing a success.

Samples are non-negative integers $[0, \infty)$.

Example:

    >>> m = Geometric(torch.tensor([0.3]))
    >>> m.sample()  # underlying Bernoulli has 30% chance 1; 70% chance 0
    tensor([ 2.])

Parameters

* **probs** (_Number_ _,_ [Tensor](tensors#torch.Tensor "torch.Tensor")) – the probability of sampling `1`. Must be in range (0, 1]
* **logits** (_Number_ _,_ [Tensor](tensors#torch.Tensor "torch.Tensor")) – the log-odds of sampling `1`.

`arg_constraints = {'logits': Real(), 'probs': Interval(lower_bound=0.0, upper_bound=1.0)}` `entropy()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/geometric.html#Geometric.entropy) `expand(batch_shape, _instance=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/geometric.html#Geometric.expand) `log_prob(value)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/geometric.html#Geometric.log_prob) `logits` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/geometric.html#Geometric.logits) `property mean` `probs` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/geometric.html#Geometric.probs) `sample(sample_shape=torch.Size([]))` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/geometric.html#Geometric.sample) `support = IntegerGreaterThan(lower_bound=0)` `property variance`

## Gumbel

`class torch.distributions.gumbel.Gumbel(loc, scale, validate_args=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/gumbel.html#Gumbel)

Bases: `torch.distributions.transformed_distribution.TransformedDistribution`

Samples from a Gumbel Distribution.
Examples: >>> m = Gumbel(torch.tensor([1.0]), torch.tensor([2.0])) >>> m.sample() # sample from Gumbel distribution with loc=1, scale=2 tensor([ 1.0124]) Parameters * **loc** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _or_[Tensor](tensors#torch.Tensor "torch.Tensor")) – Location parameter of the distribution * **scale** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _or_[Tensor](tensors#torch.Tensor "torch.Tensor")) – Scale parameter of the distribution `arg_constraints: Dict[str, torch.distributions.constraints.Constraint] = {'loc': Real(), 'scale': GreaterThan(lower_bound=0.0)}` `entropy()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/gumbel.html#Gumbel.entropy) `expand(batch_shape, _instance=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/gumbel.html#Gumbel.expand) `log_prob(value)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/gumbel.html#Gumbel.log_prob) `property mean` `property stddev` `support = Real()` `property variance` ## HalfCauchy `class torch.distributions.half_cauchy.HalfCauchy(scale, validate_args=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/half_cauchy.html#HalfCauchy) Bases: `torch.distributions.transformed_distribution.TransformedDistribution` Creates a half-Cauchy distribution parameterized by `scale` where: X ~ Cauchy(0, scale) Y = |X| ~ HalfCauchy(scale) Example: >>> m = HalfCauchy(torch.tensor([1.0])) >>> m.sample() # half-cauchy distributed with scale=1 tensor([ 2.3214]) Parameters **scale** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _or_[Tensor](tensors#torch.Tensor "torch.Tensor")) – scale of the full Cauchy distribution `arg_constraints: Dict[str, torch.distributions.constraints.Constraint] = {'scale': GreaterThan(lower_bound=0.0)}` `cdf(value)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/half_cauchy.html#HalfCauchy.cdf) `entropy()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/half_cauchy.html#HalfCauchy.entropy) `expand(batch_shape, _instance=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/half_cauchy.html#HalfCauchy.expand) `has_rsample = True` `icdf(prob)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/half_cauchy.html#HalfCauchy.icdf) `log_prob(value)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/half_cauchy.html#HalfCauchy.log_prob) `property mean` `property scale` `support = GreaterThan(lower_bound=0.0)` `property variance` ## HalfNormal `class torch.distributions.half_normal.HalfNormal(scale, validate_args=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/half_normal.html#HalfNormal) Bases: `torch.distributions.transformed_distribution.TransformedDistribution` Creates a half-normal distribution parameterized by `scale` where: X ~ Normal(0, scale) Y = |X| ~ HalfNormal(scale) Example: >>> m = HalfNormal(torch.tensor([1.0])) >>> m.sample() # half-normal distributed with scale=1 tensor([ 0.1046]) Parameters **scale** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _or_[Tensor](tensors#torch.Tensor "torch.Tensor")) – scale of the full Normal distribution `arg_constraints: Dict[str, torch.distributions.constraints.Constraint] = {'scale': GreaterThan(lower_bound=0.0)}` `cdf(value)` 
[[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/half_normal.html#HalfNormal.cdf) `entropy()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/half_normal.html#HalfNormal.entropy) `expand(batch_shape, _instance=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/half_normal.html#HalfNormal.expand) `has_rsample = True` `icdf(prob)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/half_normal.html#HalfNormal.icdf) `log_prob(value)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/half_normal.html#HalfNormal.log_prob) `property mean` `property scale` `support = GreaterThan(lower_bound=0.0)` `property variance` ## Independent `class torch.distributions.independent.Independent(base_distribution, reinterpreted_batch_ndims, validate_args=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/independent.html#Independent) Bases: `torch.distributions.distribution.Distribution` Reinterprets some of the batch dims of a distribution as event dims. This is mainly useful for changing the shape of the result of `log_prob()`. For example to create a diagonal Normal distribution with the same shape as a Multivariate Normal distribution (so they are interchangeable), you can: >>> loc = torch.zeros(3) >>> scale = torch.ones(3) >>> mvn = MultivariateNormal(loc, scale_tril=torch.diag(scale)) >>> [mvn.batch_shape, mvn.event_shape] [torch.Size(()), torch.Size((3,))] >>> normal = Normal(loc, scale) >>> [normal.batch_shape, normal.event_shape] [torch.Size((3,)), torch.Size(())] >>> diagn = Independent(normal, 1) >>> [diagn.batch_shape, diagn.event_shape] [torch.Size(()), torch.Size((3,))] Parameters * **base_distribution** (torch.distributions.distribution.Distribution) – a base distribution * **reinterpreted_batch_ndims** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – the number of batch dims to reinterpret as event dims `arg_constraints: Dict[str, torch.distributions.constraints.Constraint] = {}` `entropy()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/independent.html#Independent.entropy) `enumerate_support(expand=True)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/independent.html#Independent.enumerate_support) `expand(batch_shape, _instance=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/independent.html#Independent.expand) `property has_enumerate_support` `property has_rsample` `log_prob(value)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/independent.html#Independent.log_prob) `property mean` `rsample(sample_shape=torch.Size([]))` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/independent.html#Independent.rsample) `sample(sample_shape=torch.Size([]))` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/independent.html#Independent.sample) `property support` `property variance` ## Kumaraswamy `class torch.distributions.kumaraswamy.Kumaraswamy(concentration1, concentration0, validate_args=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/kumaraswamy.html#Kumaraswamy) Bases: `torch.distributions.transformed_distribution.TransformedDistribution` Samples from a Kumaraswamy distribution. 
Example:

    >>> m = Kumaraswamy(torch.Tensor([1.0]), torch.Tensor([1.0]))
    >>> m.sample()  # sample from a Kumaraswamy distribution with concentration alpha=1 and beta=1
    tensor([ 0.1729])

Parameters

* **concentration1** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _or_ [Tensor](tensors#torch.Tensor "torch.Tensor")) – 1st concentration parameter of the distribution (often referred to as alpha)
* **concentration0** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _or_ [Tensor](tensors#torch.Tensor "torch.Tensor")) – 2nd concentration parameter of the distribution (often referred to as beta)

`arg_constraints: Dict[str, torch.distributions.constraints.Constraint] = {'concentration0': GreaterThan(lower_bound=0.0), 'concentration1': GreaterThan(lower_bound=0.0)}` `entropy()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/kumaraswamy.html#Kumaraswamy.entropy) `expand(batch_shape, _instance=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/kumaraswamy.html#Kumaraswamy.expand) `has_rsample = True` `property mean` `support = Interval(lower_bound=0.0, upper_bound=1.0)` `property variance`

## LKJCholesky

`class torch.distributions.lkj_cholesky.LKJCholesky(dim, concentration=1.0, validate_args=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/lkj_cholesky.html#LKJCholesky)

Bases: `torch.distributions.distribution.Distribution`

LKJ distribution for lower Cholesky factor of correlation matrices. The distribution is controlled by `concentration` parameter $\eta$ to make the probability of the correlation matrix $M$ generated from a Cholesky factor proportional to $\det(M)^{\eta - 1}$. Because of that, when `concentration == 1`, we have a uniform distribution over Cholesky factors of correlation matrices. Note that this distribution samples the Cholesky factor of correlation matrices and not the correlation matrices themselves and thereby differs slightly from the derivations in [1] for the `LKJCorr` distribution. For sampling, this uses the Onion method from [1] Section 3.

    L ~ LKJCholesky(dim, concentration)
    X = L @ L' ~ LKJCorr(dim, concentration)

Example:

    >>> l = LKJCholesky(3, 0.5)
    >>> l.sample()  # l @ l.T is a sample of a correlation 3x3 matrix
    tensor([[ 1.0000,  0.0000,  0.0000],
            [ 0.3516,  0.9361,  0.0000],
            [-0.1899,  0.4748,  0.8593]])

Parameters

* **dim** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – dimension of the matrices
* **concentration** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _or_ [Tensor](tensors#torch.Tensor "torch.Tensor")) – concentration/shape parameter of the distribution (often referred to as eta)

**References**

[1] `Generating random correlation matrices based on vines and extended onion method`, Daniel Lewandowski, Dorota Kurowicka, Harry Joe.
`arg_constraints = {'concentration': GreaterThan(lower_bound=0.0)}` `expand(batch_shape, _instance=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/lkj_cholesky.html#LKJCholesky.expand) `log_prob(value)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/lkj_cholesky.html#LKJCholesky.log_prob) `sample(sample_shape=torch.Size([]))` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/lkj_cholesky.html#LKJCholesky.sample) `support = CorrCholesky()` ## Laplace `class torch.distributions.laplace.Laplace(loc, scale, validate_args=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/laplace.html#Laplace) Bases: `torch.distributions.distribution.Distribution` Creates a Laplace distribution parameterized by `loc` and `scale`. Example: >>> m = Laplace(torch.tensor([0.0]), torch.tensor([1.0])) >>> m.sample() # Laplace distributed with loc=0, scale=1 tensor([ 0.1046]) Parameters * **loc** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _or_[Tensor](tensors#torch.Tensor "torch.Tensor")) – mean of the distribution * **scale** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _or_[Tensor](tensors#torch.Tensor "torch.Tensor")) – scale of the distribution `arg_constraints = {'loc': Real(), 'scale': GreaterThan(lower_bound=0.0)}` `cdf(value)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/laplace.html#Laplace.cdf) `entropy()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/laplace.html#Laplace.entropy) `expand(batch_shape, _instance=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/laplace.html#Laplace.expand) `has_rsample = True` `icdf(value)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/laplace.html#Laplace.icdf) `log_prob(value)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/laplace.html#Laplace.log_prob) `property mean` `rsample(sample_shape=torch.Size([]))` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/laplace.html#Laplace.rsample) `property stddev` `support = Real()` `property variance` ## LogNormal `class torch.distributions.log_normal.LogNormal(loc, scale, validate_args=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/log_normal.html#LogNormal) Bases: `torch.distributions.transformed_distribution.TransformedDistribution` Creates a log-normal distribution parameterized by `loc` and `scale` where: X ~ Normal(loc, scale) Y = exp(X) ~ LogNormal(loc, scale) Example: >>> m = LogNormal(torch.tensor([0.0]), torch.tensor([1.0])) >>> m.sample() # log-normal distributed with mean=0 and stddev=1 tensor([ 0.1046]) Parameters * **loc** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _or_[Tensor](tensors#torch.Tensor "torch.Tensor")) – mean of log of distribution * **scale** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _or_[Tensor](tensors#torch.Tensor "torch.Tensor")) – standard deviation of log of the distribution `arg_constraints: Dict[str, torch.distributions.constraints.Constraint] = {'loc': Real(), 'scale': GreaterThan(lower_bound=0.0)}` `entropy()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/log_normal.html#LogNormal.entropy) `expand(batch_shape, _instance=None)` 
[[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/log_normal.html#LogNormal.expand) `has_rsample = True` `property loc` `property mean` `property scale` `support = GreaterThan(lower_bound=0.0)` `property variance` ## LowRankMultivariateNormal `class torch.distributions.lowrank_multivariate_normal.LowRankMultivariateNormal(loc, cov_factor, cov_diag, validate_args=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/lowrank_multivariate_normal.html#LowRankMultivariateNormal) Bases: `torch.distributions.distribution.Distribution` Creates a multivariate normal distribution with covariance matrix having a low-rank form parameterized by `cov_factor` and `cov_diag`: covariance_matrix = cov_factor @ cov_factor.T + cov_diag #### Example >>> m = LowRankMultivariateNormal(torch.zeros(2), torch.tensor([[1.], [0.]]), torch.ones(2)) >>> m.sample() # normally distributed with mean=`[0,0]`, cov_factor=`[[1],[0]]`, cov_diag=`[1,1]` tensor([-0.2102, -0.5429]) Parameters * **loc** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – mean of the distribution with shape `batch_shape + event_shape` * **cov_factor** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – factor part of low-rank form of covariance matrix with shape `batch_shape + event_shape + (rank,)` * **cov_diag** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – diagonal part of low-rank form of covariance matrix with shape `batch_shape + event_shape` Note The computation for determinant and inverse of covariance matrix is avoided when `cov_factor.shape[1] << cov_factor.shape[0]` thanks to [Woodbury matrix identity](https://en.wikipedia.org/wiki/Woodbury_matrix_identity) and [matrix determinant lemma](https://en.wikipedia.org/wiki/Matrix_determinant_lemma). Thanks to these formulas, we just need to compute the determinant and inverse of the small size “capacitance” matrix: capacitance = I + cov_factor.T @ inv(cov_diag) @ cov_factor `arg_constraints = {'cov_diag': IndependentConstraint(GreaterThan(lower_bound=0.0), 1), 'cov_factor': IndependentConstraint(Real(), 2), 'loc': IndependentConstraint(Real(), 1)}` `covariance_matrix` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/lowrank_multivariate_normal.html#LowRankMultivariateNormal.covariance_matrix) `entropy()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/lowrank_multivariate_normal.html#LowRankMultivariateNormal.entropy) `expand(batch_shape, _instance=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/lowrank_multivariate_normal.html#LowRankMultivariateNormal.expand) `has_rsample = True` `log_prob(value)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/lowrank_multivariate_normal.html#LowRankMultivariateNormal.log_prob) `property mean` `precision_matrix` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/lowrank_multivariate_normal.html#LowRankMultivariateNormal.precision_matrix) `rsample(sample_shape=torch.Size([]))` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/lowrank_multivariate_normal.html#LowRankMultivariateNormal.rsample) `scale_tril` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/lowrank_multivariate_normal.html#LowRankMultivariateNormal.scale_tril) `support = IndependentConstraint(Real(), 1)` `variance` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/lowrank_multivariate_normal.html#LowRankMultivariateNormal.variance) ## MixtureSameFamily `class 
torch.distributions.mixture_same_family.MixtureSameFamily(mixture_distribution, component_distribution, validate_args=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/mixture_same_family.html#MixtureSameFamily)

Bases: `torch.distributions.distribution.Distribution`

The `MixtureSameFamily` distribution implements a (batch of) mixture distribution where all components are from different parameterizations of the same distribution type. It is parameterized by a `Categorical` "selecting distribution" (over `k` components) and a component distribution, i.e., a `Distribution` with a rightmost batch shape (equal to `[k]`) which indexes each (batch of) component.

Examples:

    # Construct Gaussian Mixture Model in 1D consisting of 5 equally
    # weighted normal distributions
    >>> mix = D.Categorical(torch.ones(5,))
    >>> comp = D.Normal(torch.randn(5,), torch.rand(5,))
    >>> gmm = MixtureSameFamily(mix, comp)

    # Construct Gaussian Mixture Model in 2D consisting of 5 equally
    # weighted bivariate normal distributions
    >>> mix = D.Categorical(torch.ones(5,))
    >>> comp = D.Independent(D.Normal(torch.randn(5,2), torch.rand(5,2)), 1)
    >>> gmm = MixtureSameFamily(mix, comp)

    # Construct a batch of 3 Gaussian Mixture Models in 2D each
    # consisting of 5 random weighted bivariate normal distributions
    >>> mix = D.Categorical(torch.rand(3,5))
    >>> comp = D.Independent(D.Normal(torch.randn(3,5,2), torch.rand(3,5,2)), 1)
    >>> gmm = MixtureSameFamily(mix, comp)

Parameters

* **mixture_distribution** – `torch.distributions.Categorical`-like instance. Manages the probability of selecting components. The number of categories must match the rightmost batch dimension of the `component_distribution`. Must have either scalar `batch_shape` or `batch_shape` matching `component_distribution.batch_shape[:-1]`
* **component_distribution** – `torch.distributions.Distribution`-like instance. Right-most batch dimension indexes components.

`arg_constraints: Dict[str, torch.distributions.constraints.Constraint] = {}` `cdf(x)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/mixture_same_family.html#MixtureSameFamily.cdf) `property component_distribution` `expand(batch_shape, _instance=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/mixture_same_family.html#MixtureSameFamily.expand) `has_rsample = False` `log_prob(x)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/mixture_same_family.html#MixtureSameFamily.log_prob) `property mean` `property mixture_distribution` `sample(sample_shape=torch.Size([]))` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/mixture_same_family.html#MixtureSameFamily.sample) `property support` `property variance`

## Multinomial

`class torch.distributions.multinomial.Multinomial(total_count=1, probs=None, logits=None, validate_args=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/multinomial.html#Multinomial)

Bases: `torch.distributions.distribution.Distribution`

Creates a Multinomial distribution parameterized by `total_count` and either `probs` or `logits` (but not both). The innermost dimension of `probs` indexes over categories. All other dimensions index over batches.

Note that `total_count` need not be specified if only `log_prob()` is called (see example below)

Note

The `probs` argument must be non-negative, finite and have a non-zero sum, and it will be normalized to sum to 1 along the last dimension. `probs` will return this normalized value.
The `logits` argument will be interpreted as unnormalized log probabilities and can therefore be any real number. It will likewise be normalized so that the resulting probabilities sum to 1 along the last dimension. `logits` will return this normalized value.

* `sample()` requires a single shared `total_count` for all parameters and samples.
* `log_prob()` allows different `total_count` for each parameter and sample.

Example:

    >>> m = Multinomial(100, torch.tensor([ 1., 1., 1., 1.]))
    >>> x = m.sample()  # equal probability of 0, 1, 2, 3
    tensor([ 21., 24., 30., 25.])
    >>> Multinomial(probs=torch.tensor([1., 1., 1., 1.])).log_prob(x)
    tensor([-4.1338])

Parameters

* **total_count** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – number of trials
* **probs** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – event probabilities
* **logits** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – event log probabilities (unnormalized)

`arg_constraints = {'logits': IndependentConstraint(Real(), 1), 'probs': Simplex()}` `expand(batch_shape, _instance=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/multinomial.html#Multinomial.expand) `log_prob(value)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/multinomial.html#Multinomial.log_prob) `property logits` `property mean` `property param_shape` `property probs` `sample(sample_shape=torch.Size([]))` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/multinomial.html#Multinomial.sample) `property support` `total_count: int = None` `property variance`

## MultivariateNormal

`class torch.distributions.multivariate_normal.MultivariateNormal(loc, covariance_matrix=None, precision_matrix=None, scale_tril=None, validate_args=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/multivariate_normal.html#MultivariateNormal)

Bases: `torch.distributions.distribution.Distribution`

Creates a multivariate normal (also called Gaussian) distribution parameterized by a mean vector and a covariance matrix.

The multivariate normal distribution can be parameterized either in terms of a positive definite covariance matrix $\mathbf{\Sigma}$ or a positive definite precision matrix $\mathbf{\Sigma}^{-1}$ or a lower-triangular matrix $\mathbf{L}$ with positive-valued diagonal entries, such that $\mathbf{\Sigma} = \mathbf{L}\mathbf{L}^\top$. This triangular matrix can be obtained via e.g. Cholesky decomposition of the covariance.

#### Example

    >>> m = MultivariateNormal(torch.zeros(2), torch.eye(2))
    >>> m.sample()  # normally distributed with mean=`[0,0]` and covariance_matrix=`I`
    tensor([-0.2102, -0.5429])

Parameters

* **loc** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – mean of the distribution
* **covariance_matrix** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – positive-definite covariance matrix
* **precision_matrix** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – positive-definite precision matrix
* **scale_tril** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – lower-triangular factor of covariance, with positive-valued diagonal

Note

Only one of `covariance_matrix` or `precision_matrix` or `scale_tril` can be specified. Using `scale_tril` will be more efficient: all computations internally are based on `scale_tril`. If `covariance_matrix` or `precision_matrix` is passed instead, it is only used to compute the corresponding lower triangular matrices using a Cholesky decomposition.
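A minimal sketch of the note above (the covariance values are arbitrary): passing `scale_tril` hands the Cholesky factor to the distribution directly, while passing `covariance_matrix` makes the distribution compute that factor itself; both define the same density:

    import torch
    from torch.distributions import MultivariateNormal

    loc = torch.zeros(2)
    cov = torch.tensor([[2.0, 0.5],
                        [0.5, 1.0]])   # positive definite

    m_cov = MultivariateNormal(loc, covariance_matrix=cov)
    m_tril = MultivariateNormal(loc, scale_tril=torch.linalg.cholesky(cov))

    x = m_cov.sample()
    # Same distribution, so the log densities agree (up to numerical precision).
    print(m_cov.log_prob(x), m_tril.log_prob(x))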
`arg_constraints = {'covariance_matrix': PositiveDefinite(), 'loc': IndependentConstraint(Real(), 1), 'precision_matrix': PositiveDefinite(), 'scale_tril': LowerCholesky()}` `covariance_matrix` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/multivariate_normal.html#MultivariateNormal.covariance_matrix) `entropy()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/multivariate_normal.html#MultivariateNormal.entropy) `expand(batch_shape, _instance=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/multivariate_normal.html#MultivariateNormal.expand) `has_rsample = True` `log_prob(value)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/multivariate_normal.html#MultivariateNormal.log_prob) `property mean` `precision_matrix` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/multivariate_normal.html#MultivariateNormal.precision_matrix) `rsample(sample_shape=torch.Size([]))` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/multivariate_normal.html#MultivariateNormal.rsample) `scale_tril` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/multivariate_normal.html#MultivariateNormal.scale_tril) `support = IndependentConstraint(Real(), 1)` `property variance` ## NegativeBinomial `class torch.distributions.negative_binomial.NegativeBinomial(total_count, probs=None, logits=None, validate_args=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/negative_binomial.html#NegativeBinomial) Bases: `torch.distributions.distribution.Distribution` Creates a Negative Binomial distribution, i.e. distribution of the number of successful independent and identical Bernoulli trials before `total_count` failures are achieved. The probability of failure of each Bernoulli trial is `probs`. 
Parameters * **total_count** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _or_[Tensor](tensors#torch.Tensor "torch.Tensor")) – non-negative number of negative Bernoulli trials to stop, although the distribution is still valid for real valued count * **probs** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – Event probabilities of failure in the half open interval [0, 1) * **logits** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – Event log-odds for probabilities of failure `arg_constraints = {'logits': Real(), 'probs': HalfOpenInterval(lower_bound=0.0, upper_bound=1.0), 'total_count': GreaterThanEq(lower_bound=0)}` `expand(batch_shape, _instance=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/negative_binomial.html#NegativeBinomial.expand) `log_prob(value)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/negative_binomial.html#NegativeBinomial.log_prob) `logits` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/negative_binomial.html#NegativeBinomial.logits) `property mean` `property param_shape` `probs` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/negative_binomial.html#NegativeBinomial.probs) `sample(sample_shape=torch.Size([]))` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/negative_binomial.html#NegativeBinomial.sample) `support = IntegerGreaterThan(lower_bound=0)` `property variance` ## Normal `class torch.distributions.normal.Normal(loc, scale, validate_args=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/normal.html#Normal) Bases: `torch.distributions.exp_family.ExponentialFamily` Creates a normal (also called Gaussian) distribution parameterized by `loc` and `scale`. 
Example: >>> m = Normal(torch.tensor([0.0]), torch.tensor([1.0])) >>> m.sample() # normally distributed with loc=0 and scale=1 tensor([ 0.1046]) Parameters * **loc** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _or_[Tensor](tensors#torch.Tensor "torch.Tensor")) – mean of the distribution (often referred to as mu) * **scale** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _or_[Tensor](tensors#torch.Tensor "torch.Tensor")) – standard deviation of the distribution (often referred to as sigma) `arg_constraints = {'loc': Real(), 'scale': GreaterThan(lower_bound=0.0)}` `cdf(value)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/normal.html#Normal.cdf) `entropy()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/normal.html#Normal.entropy) `expand(batch_shape, _instance=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/normal.html#Normal.expand) `has_rsample = True` `icdf(value)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/normal.html#Normal.icdf) `log_prob(value)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/normal.html#Normal.log_prob) `property mean` `rsample(sample_shape=torch.Size([]))` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/normal.html#Normal.rsample) `sample(sample_shape=torch.Size([]))` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/normal.html#Normal.sample) `property stddev` `support = Real()` `property variance` ## OneHotCategorical `class torch.distributions.one_hot_categorical.OneHotCategorical(probs=None, logits=None, validate_args=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/one_hot_categorical.html#OneHotCategorical) Bases: `torch.distributions.distribution.Distribution` Creates a one-hot categorical distribution parameterized by `probs` or `logits`. Samples are one-hot coded vectors of size `probs.size(-1)`. Note The `probs` argument must be non-negative, finite and have a non-zero sum, and it will be normalized to sum to 1 along the last dimension. attr:`probs` will return this normalized value. The `logits` argument will be interpreted as unnormalized log probabilities and can therefore be any real number. It will likewise be normalized so that the resulting probabilities sum to 1 along the last dimension. attr:`logits` will return this normalized value. See also: `torch.distributions.Categorical()` for specifications of `probs` and `logits`. 
Example: >>> m = OneHotCategorical(torch.tensor([ 0.25, 0.25, 0.25, 0.25 ])) >>> m.sample() # equal probability of 0, 1, 2, 3 tensor([ 0., 0., 0., 1.]) Parameters * **probs** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – event probabilities * **logits** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – event log probabilities (unnormalized) `arg_constraints = {'logits': IndependentConstraint(Real(), 1), 'probs': Simplex()}` `entropy()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/one_hot_categorical.html#OneHotCategorical.entropy) `enumerate_support(expand=True)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/one_hot_categorical.html#OneHotCategorical.enumerate_support) `expand(batch_shape, _instance=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/one_hot_categorical.html#OneHotCategorical.expand) `has_enumerate_support = True` `log_prob(value)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/one_hot_categorical.html#OneHotCategorical.log_prob) `property logits` `property mean` `property param_shape` `property probs` `sample(sample_shape=torch.Size([]))` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/one_hot_categorical.html#OneHotCategorical.sample) `support = OneHot()` `property variance` ## Pareto `class torch.distributions.pareto.Pareto(scale, alpha, validate_args=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/pareto.html#Pareto) Bases: `torch.distributions.transformed_distribution.TransformedDistribution` Samples from a Pareto Type 1 distribution. Example: >>> m = Pareto(torch.tensor([1.0]), torch.tensor([1.0])) >>> m.sample() # sample from a Pareto distribution with scale=1 and alpha=1 tensor([ 1.5623]) Parameters * **scale** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _or_[Tensor](tensors#torch.Tensor "torch.Tensor")) – Scale parameter of the distribution * **alpha** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _or_[Tensor](tensors#torch.Tensor "torch.Tensor")) – Shape parameter of the distribution `arg_constraints: Dict[str, torch.distributions.constraints.Constraint] = {'alpha': GreaterThan(lower_bound=0.0), 'scale': GreaterThan(lower_bound=0.0)}` `entropy()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/pareto.html#Pareto.entropy) `expand(batch_shape, _instance=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/pareto.html#Pareto.expand) `property mean` `property support` `property variance` ## Poisson `class torch.distributions.poisson.Poisson(rate, validate_args=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/poisson.html#Poisson) Bases: `torch.distributions.exp_family.ExponentialFamily` Creates a Poisson distribution parameterized by `rate`, the rate parameter. 
Samples are nonnegative integers, with a pmf given by $\mathrm{rate}^k \frac{e^{-\mathrm{rate}}}{k!}$ Example: >>> m = Poisson(torch.tensor([4])) >>> m.sample() tensor([ 3.]) Parameters **rate** (_Number_ _,_[Tensor](tensors#torch.Tensor "torch.Tensor")) – the rate parameter `arg_constraints = {'rate': GreaterThan(lower_bound=0.0)}` `expand(batch_shape, _instance=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/poisson.html#Poisson.expand) `log_prob(value)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/poisson.html#Poisson.log_prob) `property mean` `sample(sample_shape=torch.Size([]))` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/poisson.html#Poisson.sample) `support = IntegerGreaterThan(lower_bound=0)` `property variance` ## RelaxedBernoulli `class torch.distributions.relaxed_bernoulli.RelaxedBernoulli(temperature, probs=None, logits=None, validate_args=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/relaxed_bernoulli.html#RelaxedBernoulli) Bases: `torch.distributions.transformed_distribution.TransformedDistribution` Creates a RelaxedBernoulli distribution, parametrized by `temperature`, and either `probs` or `logits` (but not both). This is a relaxed version of the `Bernoulli` distribution, so its values are in (0, 1) and it has reparametrizable samples. Example: >>> m = RelaxedBernoulli(torch.tensor([2.2]), torch.tensor([0.1, 0.2, 0.3, 0.99])) >>> m.sample() tensor([ 0.2951, 0.3442, 0.8918, 0.9021]) Parameters * **temperature** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – relaxation temperature * **probs** (_Number_ _,_[Tensor](tensors#torch.Tensor "torch.Tensor")) – the probability of sampling `1` * **logits** (_Number_ _,_[Tensor](tensors#torch.Tensor "torch.Tensor")) – the log-odds of sampling `1` `arg_constraints: Dict[str, torch.distributions.constraints.Constraint] = {'logits': Real(), 'probs': Interval(lower_bound=0.0, upper_bound=1.0)}` `expand(batch_shape, _instance=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/relaxed_bernoulli.html#RelaxedBernoulli.expand) `has_rsample = True` `property logits` `property probs` `support = Interval(lower_bound=0.0, upper_bound=1.0)` `property temperature` ## LogitRelaxedBernoulli `class torch.distributions.relaxed_bernoulli.LogitRelaxedBernoulli(temperature, probs=None, logits=None, validate_args=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/relaxed_bernoulli.html#LogitRelaxedBernoulli) Bases: `torch.distributions.distribution.Distribution` Creates a LogitRelaxedBernoulli distribution parameterized by `probs` or `logits` (but not both), which is the logit of a RelaxedBernoulli distribution. Samples are logits of values in (0, 1). See [1] for more details.
Parameters * **temperature** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – relaxation temperature * **probs** (_Number_ _,_[Tensor](tensors#torch.Tensor "torch.Tensor")) – the probability of sampling `1` * **logits** (_Number_ _,_[Tensor](tensors#torch.Tensor "torch.Tensor")) – the log-odds of sampling `1` [1] The Concrete Distribution: A Continuous Relaxation of Discrete Random Variables (Maddison et al, 2017) [2] Categorical Reparametrization with Gumbel-Softmax (Jang et al, 2017) `arg_constraints = {'logits': Real(), 'probs': Interval(lower_bound=0.0, upper_bound=1.0)}` `expand(batch_shape, _instance=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/relaxed_bernoulli.html#LogitRelaxedBernoulli.expand) `log_prob(value)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/relaxed_bernoulli.html#LogitRelaxedBernoulli.log_prob) `logits` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/relaxed_bernoulli.html#LogitRelaxedBernoulli.logits) `property param_shape` `probs` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/relaxed_bernoulli.html#LogitRelaxedBernoulli.probs) `rsample(sample_shape=torch.Size([]))` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/relaxed_bernoulli.html#LogitRelaxedBernoulli.rsample) `support = Real()` ## RelaxedOneHotCategorical `class torch.distributions.relaxed_categorical.RelaxedOneHotCategorical(temperature, probs=None, logits=None, validate_args=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/relaxed_categorical.html#RelaxedOneHotCategorical) Bases: `torch.distributions.transformed_distribution.TransformedDistribution` Creates a RelaxedOneHotCategorical distribution parametrized by `temperature`, and either `probs` or `logits`. This is a relaxed version of the `OneHotCategorical` distribution, so its samples are on simplex, and are reparametrizable. Example: >>> m = RelaxedOneHotCategorical(torch.tensor([2.2]), torch.tensor([0.1, 0.2, 0.3, 0.4])) >>> m.sample() tensor([ 0.1294, 0.2324, 0.3859, 0.2523]) Parameters * **temperature** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – relaxation temperature * **probs** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – event probabilities * **logits** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – unnormalized log probability for each event `arg_constraints: Dict[str, torch.distributions.constraints.Constraint] = {'logits': IndependentConstraint(Real(), 1), 'probs': Simplex()}` `expand(batch_shape, _instance=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/relaxed_categorical.html#RelaxedOneHotCategorical.expand) `has_rsample = True` `property logits` `property probs` `support = Simplex()` `property temperature` ## StudentT `class torch.distributions.studentT.StudentT(df, loc=0.0, scale=1.0, validate_args=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/studentT.html#StudentT) Bases: `torch.distributions.distribution.Distribution` Creates a Student’s t-distribution parameterized by degree of freedom `df`, mean `loc` and scale `scale`. 
Example: >>> m = StudentT(torch.tensor([2.0])) >>> m.sample() # Student's t-distributed with degrees of freedom=2 tensor([ 0.1046]) Parameters * **df** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _or_[Tensor](tensors#torch.Tensor "torch.Tensor")) – degrees of freedom * **loc** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _or_[Tensor](tensors#torch.Tensor "torch.Tensor")) – mean of the distribution * **scale** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _or_[Tensor](tensors#torch.Tensor "torch.Tensor")) – scale of the distribution `arg_constraints = {'df': GreaterThan(lower_bound=0.0), 'loc': Real(), 'scale': GreaterThan(lower_bound=0.0)}` `entropy()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/studentT.html#StudentT.entropy) `expand(batch_shape, _instance=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/studentT.html#StudentT.expand) `has_rsample = True` `log_prob(value)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/studentT.html#StudentT.log_prob) `property mean` `rsample(sample_shape=torch.Size([]))` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/studentT.html#StudentT.rsample) `support = Real()` `property variance` ## TransformedDistribution `class torch.distributions.transformed_distribution.TransformedDistribution(base_distribution, transforms, validate_args=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/transformed_distribution.html#TransformedDistribution) Bases: `torch.distributions.distribution.Distribution` Extension of the Distribution class, which applies a sequence of Transforms to a base distribution. Let f be the composition of transforms applied: X ~ BaseDistribution Y = f(X) ~ TransformedDistribution(BaseDistribution, f) log p(Y) = log p(X) + log |det (dX/dY)| Note that the `.event_shape` of a `TransformedDistribution` is the maximum shape of its base distribution and its transforms, since transforms can introduce correlations among events. An example for the usage of `TransformedDistribution` would be: # Building a Logistic Distribution # X ~ Uniform(0, 1) # f = a + b * logit(X) # Y ~ f(X) ~ Logistic(a, b) base_distribution = Uniform(0, 1) transforms = [SigmoidTransform().inv, AffineTransform(loc=a, scale=b)] logistic = TransformedDistribution(base_distribution, transforms) For more examples, please look at the implementations of `Gumbel`, `HalfCauchy`, `HalfNormal`, `LogNormal`, `Pareto`, `Weibull`, `RelaxedBernoulli` and `RelaxedOneHotCategorical` `arg_constraints: Dict[str, torch.distributions.constraints.Constraint] = {}` `cdf(value)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/transformed_distribution.html#TransformedDistribution.cdf) Computes the cumulative distribution function by inverting the transform(s) and computing the score of the base distribution. `expand(batch_shape, _instance=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/transformed_distribution.html#TransformedDistribution.expand) `property has_rsample` `icdf(value)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/transformed_distribution.html#TransformedDistribution.icdf) Computes the inverse cumulative distribution function using transform(s) and computing the score of the base distribution. 
`log_prob(value)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/transformed_distribution.html#TransformedDistribution.log_prob) Scores the sample by inverting the transform(s) and computing the score using the score of the base distribution and the log abs det jacobian. `rsample(sample_shape=torch.Size([]))` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/transformed_distribution.html#TransformedDistribution.rsample) Generates a sample_shape shaped reparameterized sample or sample_shape shaped batch of reparameterized samples if the distribution parameters are batched. Samples first from base distribution and applies `transform()` for every transform in the list. `sample(sample_shape=torch.Size([]))` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/transformed_distribution.html#TransformedDistribution.sample) Generates a sample_shape shaped sample or sample_shape shaped batch of samples if the distribution parameters are batched. Samples first from base distribution and applies `transform()` for every transform in the list. `property support` ## Uniform `class torch.distributions.uniform.Uniform(low, high, validate_args=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/uniform.html#Uniform) Bases: `torch.distributions.distribution.Distribution` Generates uniformly distributed random samples from the half-open interval `[low, high)`. Example: >>> m = Uniform(torch.tensor([0.0]), torch.tensor([5.0])) >>> m.sample() # uniformly distributed in the range [0.0, 5.0) tensor([ 2.3418]) Parameters * **low** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _or_[Tensor](tensors#torch.Tensor "torch.Tensor")) – lower range (inclusive). * **high** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _or_[Tensor](tensors#torch.Tensor "torch.Tensor")) – upper range (exclusive). `arg_constraints = {'high': Dependent(), 'low': Dependent()}` `cdf(value)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/uniform.html#Uniform.cdf) `entropy()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/uniform.html#Uniform.entropy) `expand(batch_shape, _instance=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/uniform.html#Uniform.expand) `has_rsample = True` `icdf(value)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/uniform.html#Uniform.icdf) `log_prob(value)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/uniform.html#Uniform.log_prob) `property mean` `rsample(sample_shape=torch.Size([]))` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/uniform.html#Uniform.rsample) `property stddev` `property support` `property variance` ## VonMises `class torch.distributions.von_mises.VonMises(loc, concentration, validate_args=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/von_mises.html#VonMises) Bases: `torch.distributions.distribution.Distribution` A circular von Mises distribution. This implementation uses polar coordinates. The `loc` and `value` args can be any real number (to facilitate unconstrained optimization), but are interpreted as angles modulo 2 pi. 
Example: >>> m = dist.VonMises(torch.tensor([1.0]), torch.tensor([1.0])) >>> m.sample() # von Mises distributed with loc=1 and concentration=1 tensor([1.9777]) Parameters * **loc** ([torch.Tensor](tensors#torch.Tensor "torch.Tensor")) – an angle in radians. * **concentration** ([torch.Tensor](tensors#torch.Tensor "torch.Tensor")) – concentration parameter `arg_constraints = {'concentration': GreaterThan(lower_bound=0.0), 'loc': Real()}` `expand(batch_shape)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/von_mises.html#VonMises.expand) `has_rsample = False` `log_prob(value)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/von_mises.html#VonMises.log_prob) `property mean` The provided mean is the circular one. `sample(sample_shape=torch.Size([]))` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/von_mises.html#VonMises.sample) The sampling algorithm for the von Mises distribution is based on the following paper: Best, D. J., and Nicholas I. Fisher. “Efficient simulation of the von Mises distribution.” Applied Statistics (1979): 152-157. `support = Real()` `variance` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/von_mises.html#VonMises.variance) The provided variance is the circular one. ## Weibull `class torch.distributions.weibull.Weibull(scale, concentration, validate_args=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/weibull.html#Weibull) Bases: `torch.distributions.transformed_distribution.TransformedDistribution` Samples from a two-parameter Weibull distribution. #### Example >>> m = Weibull(torch.tensor([1.0]), torch.tensor([1.0])) >>> m.sample() # sample from a Weibull distribution with scale=1, concentration=1 tensor([ 0.4784]) Parameters * **scale** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _or_[Tensor](tensors#torch.Tensor "torch.Tensor")) – Scale parameter of distribution (lambda). * **concentration** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _or_[Tensor](tensors#torch.Tensor "torch.Tensor")) – Concentration parameter of distribution (k/shape). `arg_constraints: Dict[str, torch.distributions.constraints.Constraint] = {'concentration': GreaterThan(lower_bound=0.0), 'scale': GreaterThan(lower_bound=0.0)}` `entropy()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/weibull.html#Weibull.entropy) `expand(batch_shape, _instance=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/weibull.html#Weibull.expand) `property mean` `support = GreaterThan(lower_bound=0.0)` `property variance` ## `KL Divergence` `torch.distributions.kl.kl_divergence(p, q)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/kl.html#kl_divergence) Compute Kullback-Leibler divergence $KL(p \| q)$ between two distributions. $KL(p \| q) = \int p(x) \log\frac{p(x)}{q(x)} \,dx$ Parameters * **p** (Distribution) – A `Distribution` object. * **q** (Distribution) – A `Distribution` object. Returns A batch of KL divergences of shape `batch_shape`. Return type [Tensor](tensors#torch.Tensor "torch.Tensor") Raises [**NotImplementedError**](https://docs.python.org/3/library/exceptions.html#NotImplementedError "\(in Python v3.9\)") – If the distribution types have not been registered via `register_kl()`.
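To make the calling pattern concrete, here is a minimal sketch (not part of the original reference) that computes the KL divergence between two `Normal` distributions, a pair for which PyTorch ships a registered implementation; the parameter values are arbitrary:

    import torch
    from torch.distributions import Normal
    from torch.distributions.kl import kl_divergence

    # Both arguments are Distribution instances; the result has shape batch_shape.
    p = Normal(torch.zeros(3), torch.ones(3))
    q = Normal(torch.ones(3), 2 * torch.ones(3))
    kl = kl_divergence(p, q)  # tensor of shape (3,): one KL(p || q) value per batch element

If the pair of distribution types has no registered implementation, the call raises `NotImplementedError`, as described above.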
`torch.distributions.kl.register_kl(type_p, type_q)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/kl.html#register_kl) Decorator to register a pairwise function with `kl_divergence()`. Usage: @register_kl(Normal, Normal) def kl_normal_normal(p, q): # insert implementation here Lookup returns the most specific (type,type) match ordered by subclass. If the match is ambiguous, a `RuntimeWarning` is raised. For example, to resolve the ambiguous situation: @register_kl(BaseP, DerivedQ) def kl_version1(p, q): ... @register_kl(DerivedP, BaseQ) def kl_version2(p, q): ... you should register a third most-specific implementation, e.g.: register_kl(DerivedP, DerivedQ)(kl_version1) # Break the tie. Parameters * **type_p** ([type](https://docs.python.org/3/library/functions.html#type "\(in Python v3.9\)")) – A subclass of `Distribution`. * **type_q** ([type](https://docs.python.org/3/library/functions.html#type "\(in Python v3.9\)")) – A subclass of `Distribution`. ## `Transforms` `class torch.distributions.transforms.Transform(cache_size=0)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/transforms.html#Transform) Abstract class for invertible transformations with computable log det jacobians. They are primarily used in `torch.distributions.TransformedDistribution`. Caching is useful for transforms whose inverses are either expensive or numerically unstable. Note that care must be taken with memoized values since the autograd graph may be reversed. For example, while the following works with or without caching: y = t(x) t.log_abs_det_jacobian(x, y).backward() # x will receive gradients. However, the following will error when caching due to dependency reversal: y = t(x) z = t.inv(y) grad(z.sum(), [y]) # error because z is x Derived classes should implement one or both of `_call()` or `_inverse()`. Derived classes that set `bijective=True` should also implement `log_abs_det_jacobian()`. Parameters **cache_size** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – Size of cache. If zero, no caching is done. If one, the latest single value is cached. Only 0 and 1 are supported. Variables * **~Transform.domain** (`Constraint`) – The constraint representing valid inputs to this transform. * **~Transform.codomain** (`Constraint`) – The constraint representing valid outputs to this transform which are inputs to the inverse transform. * **~Transform.bijective** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – Whether this transform is bijective. A transform `t` is bijective iff `t.inv(t(x)) == x` and `t(t.inv(y)) == y` for every `x` in the domain and `y` in the codomain. Transforms that are not bijective should at least maintain the weaker pseudoinverse properties `t(t.inv(t(x))) == t(x)` and `t.inv(t(t.inv(y))) == t.inv(y)`. * **~Transform.sign** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _or_[Tensor](tensors#torch.Tensor "torch.Tensor")) – For bijective univariate transforms, this should be +1 or -1 depending on whether the transform is monotone increasing or decreasing. `property inv` Returns the inverse `Transform` of this transform. This should satisfy `t.inv.inv is t`. `property sign` Returns the sign of the determinant of the Jacobian, if applicable. In general this only makes sense for bijective transforms.
`log_abs_det_jacobian(x, y)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/transforms.html#Transform.log_abs_det_jacobian) Computes the log det jacobian `log |dy/dx|` given input and output. `forward_shape(shape)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/transforms.html#Transform.forward_shape) Infers the shape of the forward computation, given the input shape. Defaults to preserving shape. `inverse_shape(shape)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/transforms.html#Transform.inverse_shape) Infers the shapes of the inverse computation, given the output shape. Defaults to preserving shape. `class torch.distributions.transforms.ComposeTransform(parts, cache_size=0)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/transforms.html#ComposeTransform) Composes multiple transforms in a chain. The transforms being composed are responsible for caching. Parameters * **parts** (list of `Transform`) – A list of transforms to compose. * **cache_size** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – Size of cache. If zero, no caching is done. If one, the latest single value is cached. Only 0 and 1 are supported. `class torch.distributions.transforms.IndependentTransform(base_transform, reinterpreted_batch_ndims, cache_size=0)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/transforms.html#IndependentTransform) Wrapper around another transform to treat `reinterpreted_batch_ndims`-many extra of the rightmost dimensions as dependent. This has no effect on the forward or backward transforms, but does sum out `reinterpreted_batch_ndims`-many of the rightmost dimensions in `log_abs_det_jacobian()`. Parameters * **base_transform** (`Transform`) – A base transform. * **reinterpreted_batch_ndims** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – The number of extra rightmost dimensions to treat as dependent. `class torch.distributions.transforms.ReshapeTransform(in_shape, out_shape, cache_size=0)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/transforms.html#ReshapeTransform) Unit Jacobian transform to reshape the rightmost part of a tensor. Note that `in_shape` and `out_shape` must have the same number of elements, just as for [`torch.Tensor.reshape()`](tensors#torch.Tensor.reshape "torch.Tensor.reshape"). Parameters * **in_shape** (_torch.Size_) – The input event shape. * **out_shape** (_torch.Size_) – The output event shape. `class torch.distributions.transforms.ExpTransform(cache_size=0)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/transforms.html#ExpTransform) Transform via the mapping $y = \exp(x)$. `class torch.distributions.transforms.PowerTransform(exponent, cache_size=0)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/transforms.html#PowerTransform) Transform via the mapping $y = x^{\text{exponent}}$. `class torch.distributions.transforms.SigmoidTransform(cache_size=0)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/transforms.html#SigmoidTransform) Transform via the mapping $y = \frac{1}{1 + \exp(-x)}$ and $x = \text{logit}(y)$. `class torch.distributions.transforms.TanhTransform(cache_size=0)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/transforms.html#TanhTransform) Transform via the mapping $y = \tanh(x)$.
It is equivalent to `ComposeTransform([AffineTransform(0., 2.), SigmoidTransform(), AffineTransform(-1., 2.)])`. However, this might not be numerically stable, thus it is recommended to use `TanhTransform` instead. Note that one should use `cache_size=1` when it comes to `NaN/Inf` values. `class torch.distributions.transforms.AbsTransform(cache_size=0)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/transforms.html#AbsTransform) Transform via the mapping $y = |x|$. `class torch.distributions.transforms.AffineTransform(loc, scale, event_dim=0, cache_size=0)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/transforms.html#AffineTransform) Transform via the pointwise affine mapping $y = \text{loc} + \text{scale} \times x$. Parameters * **loc** ([Tensor](tensors#torch.Tensor "torch.Tensor") _or_[float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)")) – Location parameter. * **scale** ([Tensor](tensors#torch.Tensor "torch.Tensor") _or_[float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)")) – Scale parameter. * **event_dim** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – Optional size of `event_shape`. This should be zero for univariate random variables, 1 for distributions over vectors, 2 for distributions over matrices, etc. `class torch.distributions.transforms.CorrCholeskyTransform(cache_size=0)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/transforms.html#CorrCholeskyTransform) Transforms an unconstrained real vector $x$ with length $D*(D-1)/2$ into the Cholesky factor of a D-dimensional correlation matrix. This Cholesky factor is a lower triangular matrix with positive diagonals and unit Euclidean norm for each row. The transform is processed as follows: 1. First we convert $x$ into a lower triangular matrix in row order. 2. For each row $X_i$ of the lower triangular part, we apply a _signed_ version of class `StickBreakingTransform` to transform $X_i$ into a unit Euclidean length vector using the following steps: - Scales into the interval $(-1, 1)$ domain: $r_i = \tanh(X_i)$. - Transforms into an unsigned domain: $z_i = r_i^2$. - Applies $s_i = StickBreakingTransform(z_i)$. - Transforms back into signed domain: $y_i = sign(r_i) * \sqrt{s_i}$. `class torch.distributions.transforms.SoftmaxTransform(cache_size=0)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/transforms.html#SoftmaxTransform) Transform from unconstrained space to the simplex via $y = \exp(x)$ then normalizing. This is not bijective and cannot be used for HMC. However, this acts mostly coordinate-wise (except for the final normalization), and thus is appropriate for coordinate-wise optimization algorithms. `class torch.distributions.transforms.StickBreakingTransform(cache_size=0)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/transforms.html#StickBreakingTransform) Transform from unconstrained space to the simplex of one additional dimension via a stick-breaking process. This transform arises as an iterated sigmoid transform in a stick-breaking construction of the `Dirichlet` distribution: the first logit is transformed via sigmoid to the first probability and the probability of everything else, and then the process recurses.
This is bijective and appropriate for use in HMC; however, it mixes coordinates together and is less appropriate for optimization. `class torch.distributions.transforms.LowerCholeskyTransform(cache_size=0)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/transforms.html#LowerCholeskyTransform) Transform from unconstrained matrices to lower-triangular matrices with nonnegative diagonal entries. This is useful for parameterizing positive definite matrices in terms of their Cholesky factorization. `class torch.distributions.transforms.StackTransform(tseq, dim=0, cache_size=0)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/transforms.html#StackTransform) Transform functor that applies a sequence of transforms `tseq` component-wise to each submatrix at `dim` in a way compatible with [`torch.stack()`](generated/torch.stack#torch.stack "torch.stack"). Example: x = torch.stack([torch.range(1, 10), torch.range(1, 10)], dim=1) t = StackTransform([ExpTransform(), identity_transform], dim=1) y = t(x) ## `Constraints` The following constraints are implemented: * `constraints.boolean` * `constraints.cat` * `constraints.corr_cholesky` * `constraints.dependent` * `constraints.greater_than(lower_bound)` * `constraints.greater_than_eq(lower_bound)` * `constraints.independent(constraint, reinterpreted_batch_ndims)` * `constraints.integer_interval(lower_bound, upper_bound)` * `constraints.interval(lower_bound, upper_bound)` * `constraints.less_than(upper_bound)` * `constraints.lower_cholesky` * `constraints.lower_triangular` * `constraints.multinomial` * `constraints.nonnegative_integer` * `constraints.one_hot` * `constraints.positive_definite` * `constraints.positive_integer` * `constraints.positive` * `constraints.real_vector` * `constraints.real` * `constraints.simplex` * `constraints.stack` * `constraints.unit_interval` `class torch.distributions.constraints.Constraint` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/constraints.html#Constraint) Abstract base class for constraints. A constraint object represents a region over which a variable is valid, e.g. within which a variable can be optimized. Variables * **~Constraint.is_discrete** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – Whether constrained space is discrete. Defaults to False. * **~Constraint.event_dim** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – Number of rightmost dimensions that together define an event. The `check()` method will remove this many dimensions when computing validity. `check(value)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/constraints.html#Constraint.check) Returns a byte tensor of `sample_shape + batch_shape` indicating whether each event in `value` satisfies this constraint.
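As a small illustrative sketch (not part of the original reference), the singleton constraints listed above can be queried directly through `check()`; the sample values here are arbitrary:

    import torch
    from torch.distributions import constraints

    values = torch.tensor([-1.0, 0.5, 2.0])

    # check() returns a tensor of True/False entries of shape sample_shape + batch_shape.
    constraints.positive.check(values)       # -> [False, True, True]
    constraints.unit_interval.check(values)  # -> [False, True, False]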
`torch.distributions.constraints.dependent_property` alias of `torch.distributions.constraints._DependentProperty` `torch.distributions.constraints.independent` alias of `torch.distributions.constraints._IndependentConstraint` `torch.distributions.constraints.integer_interval` alias of `torch.distributions.constraints._IntegerInterval` `torch.distributions.constraints.greater_than` alias of `torch.distributions.constraints._GreaterThan` `torch.distributions.constraints.greater_than_eq` alias of `torch.distributions.constraints._GreaterThanEq` `torch.distributions.constraints.less_than` alias of `torch.distributions.constraints._LessThan` `torch.distributions.constraints.multinomial` alias of `torch.distributions.constraints._Multinomial` `torch.distributions.constraints.interval` alias of `torch.distributions.constraints._Interval` `torch.distributions.constraints.half_open_interval` alias of `torch.distributions.constraints._HalfOpenInterval` `torch.distributions.constraints.cat` alias of `torch.distributions.constraints._Cat` `torch.distributions.constraints.stack` alias of `torch.distributions.constraints._Stack` ## `Constraint Registry` PyTorch provides two global `ConstraintRegistry` objects that link `Constraint` objects to `Transform` objects. These objects both input constraints and return transforms, but they have different guarantees on bijectivity. 1. `biject_to(constraint)` looks up a bijective `Transform` from `constraints.real` to the given `constraint`. The returned transform is guaranteed to have `.bijective = True` and should implement `.log_abs_det_jacobian()`. 2. `transform_to(constraint)` looks up a not-necessarily bijective `Transform` from `constraints.real` to the given `constraint`. The returned transform is not guaranteed to implement `.log_abs_det_jacobian()`. The `transform_to()` registry is useful for performing unconstrained optimization on constrained parameters of probability distributions, which are indicated by each distribution’s `.arg_constraints` dict. These transforms often overparameterize a space in order to avoid rotation; they are thus more suitable for coordinate-wise optimization algorithms like Adam: loc = torch.zeros(100, requires_grad=True) unconstrained = torch.zeros(100, requires_grad=True) scale = transform_to(Normal.arg_constraints['scale'])(unconstrained) loss = -Normal(loc, scale).log_prob(data).sum() The `biject_to()` registry is useful for Hamiltonian Monte Carlo, where samples from a probability distribution with constrained `.support` are propagated in an unconstrained space, and algorithms are typically rotation invariant: dist = Exponential(rate) unconstrained = torch.zeros(100, requires_grad=True) sample = biject_to(dist.support)(unconstrained) potential_energy = -dist.log_prob(sample).sum() Note An example where `transform_to` and `biject_to` differ is `constraints.simplex`: `transform_to(constraints.simplex)` returns a `SoftmaxTransform` that simply exponentiates and normalizes its inputs; this is a cheap and mostly coordinate-wise operation appropriate for algorithms like SVI. In contrast, `biject_to(constraints.simplex)` returns a `StickBreakingTransform` that bijects its input down to a one-fewer-dimensional space; this is a more expensive, less numerically stable transform, but it is needed for algorithms like HMC.
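For concreteness, a brief sketch (not part of the original reference) of the `constraints.simplex` case from the note above; the input values are arbitrary, and the extra output coordinate of the stick-breaking parameterization matches the `StickBreakingTransform` description earlier:

    import torch
    from torch.distributions import biject_to, constraints, transform_to

    unconstrained = torch.randn(4)

    # SoftmaxTransform: exponentiate and normalize; same number of coordinates.
    via_softmax = transform_to(constraints.simplex)(unconstrained)  # shape (4,), sums to 1

    # StickBreakingTransform: bijective stick-breaking map; one extra coordinate.
    via_stick = biject_to(constraints.simplex)(unconstrained)       # shape (5,), sums to 1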
The `biject_to` and `transform_to` objects can be extended by user-defined constraints and transforms using their `.register()` method either as a function on singleton constraints: transform_to.register(my_constraint, my_transform) or as a decorator on parameterized constraints: @transform_to.register(MyConstraintClass) def my_factory(constraint): assert isinstance(constraint, MyConstraintClass) return MyTransform(constraint.param1, constraint.param2) You can create your own registry by creating a new `ConstraintRegistry` object. `class torch.distributions.constraint_registry.ConstraintRegistry` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/constraint_registry.html#ConstraintRegistry) Registry to link constraints to transforms. `register(constraint, factory=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributions/constraint_registry.html#ConstraintRegistry.register) Registers a `Constraint` subclass in this registry. Usage: @my_registry.register(MyConstraintClass) def construct_transform(constraint): assert isinstance(constraint, MyConstraint) return MyTransform(constraint.arg_constraints) Parameters * **constraint** (subclass of `Constraint`) – A subclass of `Constraint`, or a singleton object of the desired class. * **factory** (_callable_) – A callable that inputs a constraint object and returns a `Transform` object. # torch.utils.dlpack `torch.utils.dlpack.from_dlpack(dlpack) → Tensor` Decodes a DLPack to a tensor. Parameters **dlpack** – a PyCapsule object with the dltensor The tensor will share the memory with the object represented in the dlpack. Note that each dlpack can only be consumed once. `torch.utils.dlpack.to_dlpack(tensor) → PyCapsule` Returns a DLPack representing the tensor. Parameters **tensor** – a tensor to be exported The dlpack shares the tensors memory. Note that each dlpack can only be consumed once. # torch.fft Discrete Fourier transforms and related functions. ## Fast Fourier Transforms `torch.fft.fft(input, n=None, dim=-1, norm=None) → Tensor` Computes the one dimensional discrete Fourier transform of `input`. Note The Fourier domain representation of any real signal satisfies the Hermitian property: `X[i] = conj(X[-i])`. This function always returns both the positive and negative frequency terms even though, for real inputs, the negative frequencies are redundant. `rfft()` returns the more compact one-sided representation where only the positive frequencies are returned. Parameters * **input** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – the input tensor * **n** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – Signal length. If given, the input will either be zero-padded or trimmed to this length before computing the FFT. * **dim** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – The dimension along which to take the one dimensional FFT. * **norm** ([str](https://docs.python.org/3/library/stdtypes.html#str "\(in Python v3.9\)") _,__optional_) – Normalization mode. For the forward transform (`fft()`), these correspond to: * `"forward"` \- normalize by `1/n` * `"backward"` \- no normalization * `"ortho"` \- normalize by `1/sqrt(n)` (making the FFT orthonormal) Calling the backward transform (`ifft()`) with the same normalization mode will apply an overall normalization of `1/n` between the two transforms. This is required to make `ifft()` the exact inverse. Default is `"backward"` (no normalization). 
#### Example >>> t = torch.arange(4) >>> t tensor([0, 1, 2, 3]) >>> torch.fft.fft(t) tensor([ 6.+0.j, -2.+2.j, -2.+0.j, -2.-2.j]) >>> t = torch.tensor([0.+1.j, 2.+3.j, 4.+5.j, 6.+7.j]) >>> torch.fft.fft(t) tensor([12.+16.j, -8.+0.j, -4.-4.j, 0.-8.j]) `torch.fft.ifft(input, n=None, dim=-1, norm=None) → Tensor` Computes the one dimensional inverse discrete Fourier transform of `input`. Parameters * **input** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – the input tensor * **n** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – Signal length. If given, the input will either be zero-padded or trimmed to this length before computing the IFFT. * **dim** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – The dimension along which to take the one dimensional IFFT. * **norm** ([str](https://docs.python.org/3/library/stdtypes.html#str "\(in Python v3.9\)") _,__optional_) – Normalization mode. For the backward transform (`ifft()`), these correspond to: * `"forward"` \- no normalization * `"backward"` \- normalize by `1/n` * `"ortho"` \- normalize by `1/sqrt(n)` (making the IFFT orthonormal) Calling the forward transform (`fft()`) with the same normalization mode will apply an overall normalization of `1/n` between the two transforms. This is required to make `ifft()` the exact inverse. Default is `"backward"` (normalize by `1/n`). #### Example >>> t = torch.tensor([ 6.+0.j, -2.+2.j, -2.+0.j, -2.-2.j]) >>> torch.fft.ifft(t) tensor([0.+0.j, 1.+0.j, 2.+0.j, 3.+0.j]) `torch.fft.fft2(input, s=None, dim=(-2, -1), norm=None) → Tensor` Computes the 2 dimensional discrete Fourier transform of `input`. Equivalent to `fftn()` but FFTs only the last two dimensions by default. Note The Fourier domain representation of any real signal satisfies the Hermitian property: `X[i, j] = conj(X[-i, -j])`. This function always returns all positive and negative frequency terms even though, for real inputs, half of these values are redundant. `rfft2()` returns the more compact one-sided representation where only the positive frequencies of the last dimension are returned. Parameters * **input** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – the input tensor * **s** (_Tuple_ _[_[int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _]__,__optional_) – Signal size in the transformed dimensions. If given, each dimension `dim[i]` will either be zero-padded or trimmed to the length `s[i]` before computing the FFT. If a length `-1` is specified, no padding is done in that dimension. Default: `s = [input.size(d) for d in dim]` * **dim** (_Tuple_ _[_[int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _]__,__optional_) – Dimensions to be transformed. Default: last two dimensions. * **norm** ([str](https://docs.python.org/3/library/stdtypes.html#str "\(in Python v3.9\)") _,__optional_) – Normalization mode. For the forward transform (`fft2()`), these correspond to: * `"forward"` \- normalize by `1/n` * `"backward"` \- no normalization * `"ortho"` \- normalize by `1/sqrt(n)` (making the FFT orthonormal) Where `n = prod(s)` is the logical FFT size. Calling the backward transform (`ifft2()`) with the same normalization mode will apply an overall normalization of `1/n` between the two transforms. This is required to make `ifft2()` the exact inverse. Default is `"backward"` (no normalization).
#### Example >>> x = torch.rand(10, 10, dtype=torch.complex64) >>> fft2 = torch.fft.fft2(x) The discrete Fourier transform is separable, so `fft2()` here is equivalent to two one-dimensional `fft()` calls: >>> two_ffts = torch.fft.fft(torch.fft.fft(x, dim=0), dim=1) >>> torch.allclose(fft2, two_ffts) `torch.fft.ifft2(input, s=None, dim=(-2, -1), norm=None) → Tensor` Computes the 2 dimensional inverse discrete Fourier transform of `input`. Equivalent to `ifftn()` but IFFTs only the last two dimensions by default. Parameters * **input** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – the input tensor * **s** (_Tuple_ _[_[int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _]__,__optional_) – Signal size in the transformed dimensions. If given, each dimension `dim[i]` will either be zero-padded or trimmed to the length `s[i]` before computing the IFFT. If a length `-1` is specified, no padding is done in that dimension. Default: `s = [input.size(d) for d in dim]` * **dim** (_Tuple_ _[_[int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _]__,__optional_) – Dimensions to be transformed. Default: last two dimensions. * **norm** ([str](https://docs.python.org/3/library/stdtypes.html#str "\(in Python v3.9\)") _,__optional_) – Normalization mode. For the backward transform (`ifft2()`), these correspond to: * `"forward"` \- no normalization * `"backward"` \- normalize by `1/n` * `"ortho"` \- normalize by `1/sqrt(n)` (making the IFFT orthonormal) Where `n = prod(s)` is the logical IFFT size. Calling the forward transform (`fft2()`) with the same normalization mode will apply an overall normalization of `1/n` between the two transforms. This is required to make `ifft2()` the exact inverse. Default is `"backward"` (normalize by `1/n`). #### Example >>> x = torch.rand(10, 10, dtype=torch.complex64) >>> ifft2 = torch.fft.ifft2(x) The discrete Fourier transform is separable, so `ifft2()` here is equivalent to two one-dimensional `ifft()` calls: >>> two_iffts = torch.fft.ifft(torch.fft.ifft(x, dim=0), dim=1) >>> torch.allclose(ifft2, two_iffts) `torch.fft.fftn(input, s=None, dim=None, norm=None) → Tensor` Computes the N dimensional discrete Fourier transform of `input`. Note The Fourier domain representation of any real signal satisfies the Hermitian property: `X[i_1, ..., i_n] = conj(X[-i_1, ..., -i_n])`. This function always returns all positive and negative frequency terms even though, for real inputs, half of these values are redundant. `rfftn()` returns the more compact one-sided representation where only the positive frequencies of the last dimension are returned. Parameters * **input** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – the input tensor * **s** (_Tuple_ _[_[int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _]__,__optional_) – Signal size in the transformed dimensions. If given, each dimension `dim[i]` will either be zero-padded or trimmed to the length `s[i]` before computing the FFT. If a length `-1` is specified, no padding is done in that dimension. Default: `s = [input.size(d) for d in dim]` * **dim** (_Tuple_ _[_[int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _]__,__optional_) – Dimensions to be transformed. Default: all dimensions, or the last `len(s)` dimensions if `s` is given. * **norm** ([str](https://docs.python.org/3/library/stdtypes.html#str "\(in Python v3.9\)") _,__optional_) – Normalization mode.
For the forward transform (`fftn()`), these correspond to: * `"forward"` \- normalize by `1/n` * `"backward"` \- no normalization * `"ortho"` \- normalize by `1/sqrt(n)` (making the FFT orthonormal) Where `n = prod(s)` is the logical FFT size. Calling the backward transform (`ifftn()`) with the same normalization mode will apply an overall normalization of `1/n` between the two transforms. This is required to make `ifftn()` the exact inverse. Default is `"backward"` (no normalization). #### Example >>> x = torch.rand(10, 10, dtype=torch.complex64) >>> fftn = torch.fft.fftn(x) The discrete Fourier transform is separable, so `fftn()` here is equivalent to two one-dimensional `fft()` calls: >>> two_ffts = torch.fft.fft(torch.fft.fft(x, dim=0), dim=1) >>> torch.allclose(fftn, two_ffts) `torch.fft.ifftn(input, s=None, dim=None, norm=None) → Tensor` Computes the N dimensional inverse discrete Fourier transform of `input`. Parameters * **input** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – the input tensor * **s** (_Tuple_ _[_[int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _]__,__optional_) – Signal size in the transformed dimensions. If given, each dimension `dim[i]` will either be zero-padded or trimmed to the length `s[i]` before computing the IFFT. If a length `-1` is specified, no padding is done in that dimension. Default: `s = [input.size(d) for d in dim]` * **dim** (_Tuple_ _[_[int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _]__,__optional_) – Dimensions to be transformed. Default: all dimensions, or the last `len(s)` dimensions if `s` is given. * **norm** ([str](https://docs.python.org/3/library/stdtypes.html#str "\(in Python v3.9\)") _,__optional_) – Normalization mode. For the backward transform (`ifftn()`), these correspond to: * `"forward"` \- no normalization * `"backward"` \- normalize by `1/n` * `"ortho"` \- normalize by `1/sqrt(n)` (making the IFFT orthonormal) Where `n = prod(s)` is the logical IFFT size. Calling the forward transform (`fftn()`) with the same normalization mode will apply an overall normalization of `1/n` between the two transforms. This is required to make `ifftn()` the exact inverse. Default is `"backward"` (normalize by `1/n`). #### Example >>> x = torch.rand(10, 10, dtype=torch.complex64) >>> ifftn = torch.fft.ifftn(x) The discrete Fourier transform is separable, so `ifftn()` here is equivalent to two one-dimensional `ifft()` calls: >>> two_iffts = torch.fft.ifft(torch.fft.ifft(x, dim=0), dim=1) >>> torch.allclose(ifftn, two_iffts) `torch.fft.rfft(input, n=None, dim=-1, norm=None) → Tensor` Computes the one dimensional Fourier transform of real-valued `input`. The FFT of a real signal is Hermitian-symmetric, `X[i] = conj(X[-i])` so the output contains only the positive frequencies below the Nyquist frequency. To compute the full output, use `fft()`. Parameters * **input** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – the real input tensor * **n** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – Signal length. If given, the input will either be zero-padded or trimmed to this length before computing the real FFT. * **dim** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – The dimension along which to take the one dimensional real FFT. * **norm** ([str](https://docs.python.org/3/library/stdtypes.html#str "\(in Python v3.9\)") _,__optional_) – Normalization mode.
For the forward transform (`rfft()`), these correspond to: * `"forward"` \- normalize by `1/n` * `"backward"` \- no normalization * `"ortho"` \- normalize by `1/sqrt(n)` (making the FFT orthonormal) Calling the backward transform (`irfft()`) with the same normalization mode will apply an overall normalization of `1/n` between the two transforms. This is required to make `irfft()` the exact inverse. Default is `"backward"` (no normalization). #### Example >>> t = torch.arange(4) >>> t tensor([0, 1, 2, 3]) >>> torch.fft.rfft(t) tensor([ 6.+0.j, -2.+2.j, -2.+0.j]) Compare against the full output from `fft()`: >>> torch.fft.fft(t) tensor([ 6.+0.j, -2.+2.j, -2.+0.j, -2.-2.j]) Notice that the symmetric element `T[-1] == T[1].conj()` is omitted. At the Nyquist frequency, `T[-2] == T[2]` is its own symmetric pair, and therefore must always be real-valued. `torch.fft.irfft(input, n=None, dim=-1, norm=None) → Tensor` Computes the inverse of `rfft()`. `input` is interpreted as a one-sided Hermitian signal in the Fourier domain, as produced by `rfft()`. By the Hermitian property, the output will be real-valued. Note Some input frequencies must be real-valued to satisfy the Hermitian property. In these cases the imaginary component will be ignored. For example, any imaginary component in the zero-frequency term cannot be represented in a real output and so will always be ignored. Note The correct interpretation of the Hermitian input depends on the length of the original data, as given by `n`. This is because each input shape could correspond to either an odd or even length signal. By default, the signal is assumed to be even length and odd signals will not round-trip properly. So, it is recommended to always pass the signal length `n`. Parameters * **input** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – the input tensor representing a half-Hermitian signal * **n** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – Output signal length. This determines the length of the output signal. If given, the input will either be zero-padded or trimmed to this length before computing the real IFFT. Defaults to even output: `n=2*(input.size(dim) - 1)`. * **dim** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – The dimension along which to take the one dimensional real IFFT. * **norm** ([str](https://docs.python.org/3/library/stdtypes.html#str "\(in Python v3.9\)") _,__optional_) – Normalization mode. For the backward transform (`irfft()`), these correspond to: * `"forward"` \- no normalization * `"backward"` \- normalize by `1/n` * `"ortho"` \- normalize by `1/sqrt(n)` (making the real IFFT orthonormal) Calling the forward transform (`rfft()`) with the same normalization mode will apply an overall normalization of `1/n` between the two transforms. This is required to make `irfft()` the exact inverse. Default is `"backward"` (normalize by `1/n`).
#### Example >>> t = torch.arange(5) >>> t tensor([0, 1, 2, 3, 4]) >>> T = torch.fft.rfft(t) >>> T tensor([10.0000+0.0000j, -2.5000+3.4410j, -2.5000+0.8123j]) Without specifying the output length to `irfft()`, the output will not round-trip properly because the input is odd-length: >>> torch.fft.irfft(T) tensor([0.6250, 1.4045, 3.1250, 4.8455]) So, it is recommended to always pass the signal length `n`: >>> torch.fft.irfft(T, t.numel()) tensor([0.0000, 1.0000, 2.0000, 3.0000, 4.0000]) `torch.fft.rfft2(input, s=None, dim=(-2, -1), norm=None) → Tensor` Computes the 2-dimensional discrete Fourier transform of real `input`. Equivalent to `rfftn()` but FFTs only the last two dimensions by default. The FFT of a real signal is Hermitian-symmetric, `X[i, j] = conj(X[-i, -j])`, so the full `fft2()` output contains redundant information. `rfft2()` instead omits the negative frequencies in the last dimension. Parameters * **input** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – the input tensor * **s** (_Tuple_ _[_[int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _]__,__optional_) – Signal size in the transformed dimensions. If given, each dimension `dim[i]` will either be zero-padded or trimmed to the length `s[i]` before computing the real FFT. If a length `-1` is specified, no padding is done in that dimension. Default: `s = [input.size(d) for d in dim]` * **dim** (_Tuple_ _[_[int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _]__,__optional_) – Dimensions to be transformed. Default: last two dimensions. * **norm** ([str](https://docs.python.org/3/library/stdtypes.html#str "\(in Python v3.9\)") _,__optional_) – Normalization mode. For the forward transform (`rfft2()`), these correspond to: * `"forward"` \- normalize by `1/n` * `"backward"` \- no normalization * `"ortho"` \- normalize by `1/sqrt(n)` (making the real FFT orthonormal) Where `n = prod(s)` is the logical FFT size. Calling the backward transform (`irfft2()`) with the same normalization mode will apply an overall normalization of `1/n` between the two transforms. This is required to make `irfft2()` the exact inverse. Default is `"backward"` (no normalization). #### Example >>> t = torch.rand(10, 10) >>> rfft2 = torch.fft.rfft2(t) >>> rfft2.size() torch.Size([10, 6]) Compared against the full output from `fft2()`, we have all elements up to the Nyquist frequency. >>> fft2 = torch.fft.fft2(t) >>> torch.allclose(fft2[..., :6], rfft2) True The discrete Fourier transform is separable, so `rfft2()` here is equivalent to a combination of `fft()` and `rfft()`: >>> two_ffts = torch.fft.fft(torch.fft.rfft(t, dim=1), dim=0) >>> torch.allclose(rfft2, two_ffts) `torch.fft.irfft2(input, s=None, dim=(-2, -1), norm=None) → Tensor` Computes the inverse of `rfft2()`. Equivalent to `irfftn()` but IFFTs only the last two dimensions by default. `input` is interpreted as a one-sided Hermitian signal in the Fourier domain, as produced by `rfft2()`. By the Hermitian property, the output will be real-valued. Note Some input frequencies must be real-valued to satisfy the Hermitian property. In these cases the imaginary component will be ignored. For example, any imaginary component in the zero-frequency term cannot be represented in a real output and so will always be ignored. Note The correct interpretation of the Hermitian input depends on the length of the original data, as given by `s`. This is because each input shape could correspond to either an odd or even length signal.
By default, the signal is assumed to be even length and odd signals will not round-trip properly. So, it is recommended to always pass the signal shape `s`. Parameters * **input** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – the input tensor * **s** (_Tuple_ _[_[int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _]__,__optional_) – Signal size in the transformed dimensions. If given, each dimension `dim[i]` will either be zero-padded or trimmed to the length `s[i]` before computing the real FFT. If a length `-1` is specified, no padding is done in that dimension. Defaults to even output in the last dimension: `s[-1] = 2*(input.size(dim[-1]) - 1)`. * **dim** (_Tuple_ _[_[int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _]__,__optional_) – Dimensions to be transformed. The last dimension must be the half-Hermitian compressed dimension. Default: last two dimensions. * **norm** ([str](https://docs.python.org/3/library/stdtypes.html#str "\(in Python v3.9\)") _,__optional_) – Normalization mode. For the backward transform (`irfft2()`), these correspond to: * `"forward"` \- no normalization * `"backward"` \- normalize by `1/n` * `"ortho"` \- normalize by `1/sqrt(n)` (making the real IFFT orthonormal) Where `n = prod(s)` is the logical IFFT size. Calling the forward transform (`rfft2()`) with the same normalization mode will apply an overall normalization of `1/n` between the two transforms. This is required to make `irfft2()` the exact inverse. Default is `"backward"` (normalize by `1/n`). #### Example >>> t = torch.rand(10, 9) >>> T = torch.fft.rfft2(t) Without specifying the output length to `irfft2()`, the output will not round- trip properly because the input is odd-length in the last dimension: >>> torch.fft.irfft2(T).size() torch.Size([10, 10]) So, it is recommended to always pass the signal shape `s`. >>> roundtrip = torch.fft.irfft2(T, t.size()) >>> roundtrip.size() torch.Size([10, 9]) >>> torch.allclose(roundtrip, t) True `torch.fft.rfftn(input, s=None, dim=None, norm=None) → Tensor` Computes the N-dimensional discrete Fourier transform of real `input`. The FFT of a real signal is Hermitian-symmetric, `X[i_1, ..., i_n] = conj(X[-i_1, ..., -i_n])` so the full `fftn()` output contains redundant information. `rfftn()` instead omits the negative frequencies in the last dimension. Parameters * **input** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – the input tensor * **s** (_Tuple_ _[_[int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _]__,__optional_) – Signal size in the transformed dimensions. If given, each dimension `dim[i]` will either be zero-padded or trimmed to the length `s[i]` before computing the real FFT. If a length `-1` is specified, no padding is done in that dimension. Default: `s = [input.size(d) for d in dim]` * **dim** (_Tuple_ _[_[int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _]__,__optional_) – Dimensions to be transformed. Default: all dimensions, or the last `len(s)` dimensions if `s` is given. * **norm** ([str](https://docs.python.org/3/library/stdtypes.html#str "\(in Python v3.9\)") _,__optional_) – Normalization mode. For the forward transform (`rfftn()`), these correspond to: * `"forward"` \- normalize by `1/n` * `"backward"` \- no normalization * `"ortho"` \- normalize by `1/sqrt(n)` (making the real FFT orthonormal) Where `n = prod(s)` is the logical FFT size. 
Calling the backward transform (`irfftn()`) with the same normalization mode will apply an overall normalization of `1/n` between the two transforms. This is required to make `irfftn()` the exact inverse. Default is `"backward"` (no normalization). #### Example >>> t = torch.rand(10, 10) >>> rfftn = torch.fft.rfftn(t) >>> rfftn.size() torch.Size([10, 6]) Compared against the full output from `fftn()`, we have all elements up to the Nyquist frequency. >>> fftn = torch.fft.fftn(t) >>> torch.allclose(fftn[..., :6], rfftn) True The discrete Fourier transform is separable, so `rfftn()` here is equivalent to a combination of `fft()` and `rfft()`: >>> two_ffts = torch.fft.fft(torch.fft.rfft(t, dim=1), dim=0) >>> torch.allclose(rfftn, two_ffts) `torch.fft.irfftn(input, s=None, dim=None, norm=None) → Tensor` Computes the inverse of `rfftn()`. `input` is interpreted as a one-sided Hermitian signal in the Fourier domain, as produced by `rfftn()`. By the Hermitian property, the output will be real-valued. Note Some input frequencies must be real-valued to satisfy the Hermitian property. In these cases the imaginary component will be ignored. For example, any imaginary component in the zero-frequency term cannot be represented in a real output and so will always be ignored. Note The correct interpretation of the Hermitian input depends on the length of the original data, as given by `s`. This is because each input shape could correspond to either an odd or even length signal. By default, the signal is assumed to be even length and odd signals will not round-trip properly. So, it is recommended to always pass the signal shape `s`. Parameters * **input** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – the input tensor * **s** (_Tuple_ _[_[int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _]__,__optional_) – Signal size in the transformed dimensions. If given, each dimension `dim[i]` will either be zero-padded or trimmed to the length `s[i]` before computing the real FFT. If a length `-1` is specified, no padding is done in that dimension. Defaults to even output in the last dimension: `s[-1] = 2*(input.size(dim[-1]) - 1)`. * **dim** (_Tuple_ _[_[int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _]__,__optional_) – Dimensions to be transformed. The last dimension must be the half-Hermitian compressed dimension. Default: all dimensions, or the last `len(s)` dimensions if `s` is given. * **norm** ([str](https://docs.python.org/3/library/stdtypes.html#str "\(in Python v3.9\)") _,__optional_) – Normalization mode. For the backward transform (`irfftn()`), these correspond to: * `"forward"` \- no normalization * `"backward"` \- normalize by `1/n` * `"ortho"` \- normalize by `1/sqrt(n)` (making the real IFFT orthonormal) Where `n = prod(s)` is the logical IFFT size. Calling the forward transform (`rfftn()`) with the same normalization mode will apply an overall normalization of `1/n` between the two transforms. This is required to make `irfftn()` the exact inverse. Default is `"backward"` (normalize by `1/n`). #### Example >>> t = torch.rand(10, 9) >>> T = torch.fft.rfftn(t) Without specifying the output length to `irfftn()`, the output will not round-trip properly because the input is odd-length in the last dimension: >>> torch.fft.irfftn(T).size() torch.Size([10, 10]) So, it is recommended to always pass the signal shape `s`.
>>> roundtrip = torch.fft.irfftn(T, t.size()) >>> roundtrip.size() torch.Size([10, 9]) >>> torch.allclose(roundtrip, t) True `torch.fft.hfft(input, n=None, dim=-1, norm=None) → Tensor` Computes the one dimensional discrete Fourier transform of a Hermitian symmetric `input` signal. Note `hfft()`/`ihfft()` are analogous to `rfft()`/`irfft()`. The real FFT expects a real signal in the time-domain and gives a Hermitian symmetry in the frequency-domain. The Hermitian FFT is the opposite; Hermitian symmetric in the time-domain and real-valued in the frequency-domain. For this reason, special care needs to be taken with the length argument `n`, in the same way as with `irfft()`. Note Because the signal is Hermitian in the time-domain, the result will be real in the frequency domain. Note that some input frequencies must be real-valued to satisfy the Hermitian property. In these cases the imaginary component will be ignored. For example, any imaginary component in `input[0]` would result in one or more complex frequency terms which cannot be represented in a real output and so will always be ignored. Note The correct interpretation of the Hermitian input depends on the length of the original data, as given by `n`. This is because each input shape could correspond to either an odd or even length signal. By default, the signal is assumed to be even length and odd signals will not round-trip properly. So, it is recommended to always pass the signal length `n`. Parameters * **input** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – the input tensor representing a half-Hermitian signal * **n** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – Output signal length. This determines the length of the real output. If given, the input will either be zero-padded or trimmed to this length before computing the Hermitian FFT. Defaults to even output: `n=2*(input.size(dim) - 1)`. * **dim** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – The dimension along which to take the one dimensional Hermitian FFT. * **norm** ([str](https://docs.python.org/3/library/stdtypes.html#str "\(in Python v3.9\)") _,__optional_) – Normalization mode. For the forward transform (`hfft()`), these correspond to: * `"forward"` \- normalize by `1/n` * `"backward"` \- no normalization * `"ortho"` \- normalize by `1/sqrt(n)` (making the Hermitian FFT orthonormal) Calling the backward transform (`ihfft()`) with the same normalization mode will apply an overall normalization of `1/n` between the two transforms. This is required to make `ihfft()` the exact inverse. Default is `"backward"` (no normalization). #### Example Taking a real-valued frequency signal and bringing it into the time domain gives Hermitian symmetric output: >>> t = torch.arange(5) >>> t tensor([0, 1, 2, 3, 4]) >>> T = torch.fft.ifft(t) >>> T tensor([ 2.0000-0.0000j, -0.5000-0.6882j, -0.5000-0.1625j, -0.5000+0.1625j, -0.5000+0.6882j]) Note that `T[1] == T[-1].conj()` and `T[2] == T[-2].conj()`, so the negative frequency terms are redundant. We can thus compute the forward transform without considering negative frequencies: >>> torch.fft.hfft(T[:3], n=5) tensor([0., 1., 2., 3., 4.]) Like with `irfft()`, the output length must be given in order to recover an odd length output (otherwise an even length is assumed): >>> torch.fft.hfft(T[:3]) tensor([0.5000, 1.1236, 2.5000, 3.8764]) `torch.fft.ihfft(input, n=None, dim=-1, norm=None) → Tensor` Computes the inverse of `hfft()`. `input` must be a real-valued signal, interpreted in the Fourier domain.
The IFFT of a real signal is Hermitian-symmetric, `X[i] = conj(X[-i])`. `ihfft()` represents this in the one-sided form where only the positive frequencies below the Nyquist frequency are included. To compute the full output, use `ifft()`. Parameters * **input** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – the real input tensor * **n** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – Signal length. If given, the input will either be zero-padded or trimmed to this length before computing the Hermitian IFFT. * **dim** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – The dimension along which to take the one dimensional Hermitian IFFT. * **norm** ([str](https://docs.python.org/3/library/stdtypes.html#str "\(in Python v3.9\)") _,__optional_) – Normalization mode. For the backward transform (`ihfft()`), these correspond to: * `"forward"` \- no normalization * `"backward"` \- normalize by `1/n` * `"ortho"` \- normalize by `1/sqrt(n)` (making the IFFT orthonormal) Calling the forward transform (`hfft()`) with the same normalization mode will apply an overall normalization of `1/n` between the two transforms. This is required to make `ihfft()` the exact inverse. Default is `"backward"` (normalize by `1/n`). #### Example >>> t = torch.arange(5) >>> t tensor([0, 1, 2, 3, 4]) >>> torch.fft.ihfft(t) tensor([ 2.0000-0.0000j, -0.5000-0.6882j, -0.5000-0.1625j]) Compare against the full output from `ifft()`: >>> torch.fft.ifft(t) tensor([ 2.0000-0.0000j, -0.5000-0.6882j, -0.5000-0.1625j, -0.5000+0.1625j, -0.5000+0.6882j]) ## Helper Functions `torch.fft.fftfreq(n, d=1.0, *, dtype=None, layout=torch.strided, device=None, requires_grad=False) → Tensor` Computes the discrete Fourier Transform sample frequencies for a signal of size `n`. Note By convention, `fft()` returns positive frequency terms first, followed by the negative frequencies in reverse order, so that `f[-i]` for all `0 < i ≤ n/2` gives the negative frequency terms. For an FFT of length `n` and with inputs spaced in length unit `d`, the frequencies are: f = [0, 1, ..., (n - 1) // 2, -(n // 2), ..., -1] / (d * n) #### Example >>> torch.fft.fftfreq(5) tensor([ 0.0000, 0.2000, 0.4000, -0.4000, -0.2000]) For even input, we can see the Nyquist frequency at `f[2]` is given as negative: >>> torch.fft.fftfreq(4) tensor([ 0.0000, 0.2500, -0.5000, -0.2500]) `torch.fft.rfftfreq(n, d=1.0, *, dtype=None, layout=torch.strided, device=None, requires_grad=False) → Tensor` Computes the sample frequencies for `rfft()` with a signal of size `n`. Note `rfft()` returns Hermitian one-sided output, so only the positive frequency terms are returned. For a real FFT of length `n` and with inputs spaced in length unit `d`, the frequencies are: f = torch.arange((n + 1) // 2) / (d * n) Note For even lengths, the Nyquist frequency at `f[n/2]` can be thought of as either negative or positive. Unlike `fftfreq()`, `rfftfreq()` always returns it as positive. Parameters * **n** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – the real FFT length * **d** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _,__optional_) – The sampling length scale. The spacing between individual samples of the FFT input. The default assumes unit spacing, dividing that result by the actual spacing gives the result in physical frequency units. Keyword Arguments * **dtype** (`torch.dtype`, optional) – the desired data type of returned tensor. Default: if `None`, uses a global default (see [`torch.set_default_tensor_type()`](generated/torch.set_default_tensor_type#torch.set_default_tensor_type "torch.set_default_tensor_type")).
* **layout** (`torch.layout`, optional) – the desired layout of returned Tensor. Default: `torch.strided`. * **device** (`torch.device`, optional) – the desired device of returned tensor. Default: if `None`, uses the current device for the default tensor type (see [`torch.set_default_tensor_type()`](generated/torch.set_default_tensor_type#torch.set_default_tensor_type "torch.set_default_tensor_type")). `device` will be the CPU for CPU tensor types and the current CUDA device for CUDA tensor types. * **requires_grad** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – If autograd should record operations on the returned tensor. Default: `False`. #### Example >>> torch.fft.rfftfreq(5) tensor([ 0.0000, 0.2000, 0.4000]) >>> torch.fft.rfftfreq(4) tensor([ 0.0000, 0.2500, 0.5000]) Compared to the output from `fftfreq()`, we see that the Nyquist frequency at `f[2]` has changed sign: >>> torch.fft.fftfreq(4) tensor([ 0.0000, 0.2500, -0.5000, -0.2500]) `torch.fft.fftshift(input, dim=None) → Tensor` Reorders n-dimensional FFT data, as provided by `fftn()`, to have negative frequency terms first. This performs a periodic shift of n-dimensional data such that the origin `(0, ..., 0)` is moved to the center of the tensor. Specifically, to `input.shape[dim] // 2` in each selected dimension. Note By convention, the FFT returns positive frequency terms first, followed by the negative frequencies in reverse order, so that `f[-i]` for all `0 < i ≤ n/2` gives the negative frequency terms. `fftshift()` rearranges all frequencies into ascending order from negative to positive with the zero-frequency term in the center. #### Example >>> f = torch.fft.fftfreq(4) >>> f tensor([ 0.0000, 0.2500, -0.5000, -0.2500]) >>> torch.fft.fftshift(f) tensor([-0.5000, -0.2500, 0.0000, 0.2500]) Also notice that the Nyquist frequency term at `f[2]` was moved to the beginning of the tensor. This also works for multi-dimensional transforms: >>> x = torch.fft.fftfreq(5, d=1/5) + 0.1 * torch.fft.fftfreq(5, d=1/5).unsqueeze(1) >>> x tensor([[ 0.0000, 1.0000, 2.0000, -2.0000, -1.0000], [ 0.1000, 1.1000, 2.1000, -1.9000, -0.9000], [ 0.2000, 1.2000, 2.2000, -1.8000, -0.8000], [-0.2000, 0.8000, 1.8000, -2.2000, -1.2000], [-0.1000, 0.9000, 1.9000, -2.1000, -1.1000]]) >>> torch.fft.fftshift(x) tensor([[-2.2000, -1.2000, -0.2000, 0.8000, 1.8000], [-2.1000, -1.1000, -0.1000, 0.9000, 1.9000], [-2.0000, -1.0000, 0.0000, 1.0000, 2.0000], [-1.9000, -0.9000, 0.1000, 1.1000, 2.1000], [-1.8000, -0.8000, 0.2000, 1.2000, 2.2000]]) `fftshift()` can also be useful for spatial data. If our data is defined on a centered grid (`[-(N//2), (N-1)//2]`) then we can use the standard FFT defined on an uncentered grid (`[0, N)`) by first applying an `ifftshift()`. >>> x_centered = torch.arange(-5, 5) >>> x_uncentered = torch.fft.ifftshift(x_centered) >>> fft_uncentered = torch.fft.fft(x_uncentered) Similarly, we can convert the frequency domain components to centered convention by applying `fftshift()`. >>> fft_centered = torch.fft.fftshift(fft_uncentered) The inverse transform, from centered Fourier space back to centered spatial data, can be performed by applying the inverse shifts in reverse order: >>> x_centered_2 = torch.fft.fftshift(torch.fft.ifft(torch.fft.ifftshift(fft_centered))) >>> torch.allclose(x_centered.to(torch.complex64), x_centered_2) True `torch.fft.ifftshift(input, dim=None) → Tensor` Inverse of `fftshift()`.
Parameters * **input** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – the tensor in FFT order * **dim** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__Tuple_ _[_[int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _]__,__optional_) – The dimensions to rearrange. Only dimensions specified here will be rearranged, any other dimensions will be left in their original order. Default: All dimensions of `input`. #### Example >>> f = torch.fft.fftfreq(5) >>> f tensor([ 0.0000, 0.2000, 0.4000, -0.4000, -0.2000]) A round-trip through `fftshift()` and `ifftshift()` gives the same result: >>> shifted = torch.fft.fftshift(f) >>> torch.fft.ifftshift(shifted) tensor([ 0.0000, 0.2000, 0.4000, -0.4000, -0.2000]) # torch.futures Warning The `torch.futures` package is experimental and subject to change. This package provides a `Future` type that encapsulates an asynchronous execution and a set of utility functions to simplify operations on `Future` objects. Currently, the `Future` type is primarily used by the [Distributed RPC Framework](rpc#distributed-rpc-framework). `class torch.futures.Future` Wrapper around a `torch._C.Future` which encapsulates an asynchronous execution of a callable, e.g. [`rpc_async()`](rpc#torch.distributed.rpc.rpc_async "torch.distributed.rpc.rpc_async"). It also exposes a set of APIs to add callback functions and set results. `add_done_callback(self: torch._C.Future, arg0: function) → None` `done()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/futures.html#Future.done) Return `True` if this `Future` is done. A `Future` is done if it has a result or an exception. `set_exception(result)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/futures.html#Future.set_exception) Set an exception for this `Future`, which will mark this `Future` as completed with an error and trigger all attached callbacks. Note that when calling wait()/value() on this `Future`, the exception set here will be raised inline. Parameters **result** ([BaseException](https://docs.python.org/3/library/exceptions.html#BaseException "\(in Python v3.9\)")) – the exception for this `Future`. Example:: >>> import torch >>> >>> fut = torch.futures.Future() >>> fut.set_exception(ValueError("foo")) >>> fut.wait() >>> >>> # Output: >>> ValueError: foo `set_result(result)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/futures.html#Future.set_result) Set the result for this `Future`, which will mark this `Future` as completed and trigger all attached callbacks. Note that a `Future` cannot be marked completed twice. Parameters **result** ([object](https://docs.python.org/3/library/functions.html#object "\(in Python v3.9\)")) – the result object of this `Future`. Example:: >>> import threading >>> import time >>> import torch >>> >>> def slow_set_future(fut, value): >>> time.sleep(0.5) >>> fut.set_result(value) >>> >>> fut = torch.futures.Future() >>> t = threading.Thread( >>> target=slow_set_future, >>> args=(fut, torch.ones(2) * 3) >>> ) >>> t.start() >>> >>> print(fut.wait()) # tensor([3., 3.]) >>> t.join() `then(callback)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/futures.html#Future.then) Append the given callback function to this `Future`, which will be run when the `Future` is completed. Multiple callbacks can be added to the same `Future`, and will be invoked in the same order as they were added.
The callback must take one argument, which is the reference to this `Future`. The callback function can use the `Future.wait()` API to get the value. Note that if this `Future` is already completed, the given callback will be run immediately inline. Parameters **callback** (`Callable`) – a `Callable` that takes this `Future` as the only argument. Returns A new `Future` object that holds the return value of the `callback` and will be marked as completed when the given `callback` finishes. Example:: >>> import torch >>> >>> def callback(fut): >>> print(f"RPC return value is {fut.wait()}.") >>> >>> fut = torch.futures.Future() >>> # The inserted callback will print the return value when >>> # receiving the response from "worker1" >>> cb_fut = fut.then(callback) >>> chain_cb_fut = cb_fut.then( >>> lambda x : print(f"Chained cb done. {x.wait()}") >>> ) >>> fut.set_result(5) >>> >>> # Outputs are: >>> # RPC return value is 5. >>> # Chained cb done. None `value(self: torch._C.Future) → object` `wait()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/futures.html#Future.wait) Block until the value of this `Future` is ready. Returns The value held by this `Future`. If the function (callback or RPC) creating the value has thrown an error, this `wait` method will also throw an error. `torch.futures.collect_all(futures)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/futures.html#collect_all) Collects the provided `Future` objects into a single combined `Future` that is completed when all of the sub-futures are completed. Parameters **futures** ([list](https://docs.python.org/3/library/stdtypes.html#list "\(in Python v3.9\)")) – a list of `Future` objects. Returns Returns a `Future` object to a list of the passed in Futures. Example:: >>> import torch >>> >>> fut0 = torch.futures.Future() >>> fut1 = torch.futures.Future() >>> >>> fut = torch.futures.collect_all([fut0, fut1]) >>> >>> fut0.set_result(0) >>> fut1.set_result(1) >>> >>> fut_list = fut.wait() >>> print(f"fut0 result = {fut_list[0].wait()}") >>> print(f"fut1 result = {fut_list[1].wait()}") >>> # outputs: >>> # fut0 result = 0 >>> # fut1 result = 1 `torch.futures.wait_all(futures)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/futures.html#wait_all) Waits for all provided futures to be complete, and returns the list of completed values. Parameters **futures** ([list](https://docs.python.org/3/library/stdtypes.html#list "\(in Python v3.9\)")) – a list of `Future` object. Returns A list of the completed `Future` results. This method will throw an error if `wait` on any `Future` throws. # torch.fx ## Overview **This feature is under a Beta release and its API may change.** FX is a toolkit for developers to use to transform `nn.Module` instances. FX consists of three main components: a **symbolic tracer,** an **intermediate representation** , and **Python code generation**. 
A demonstration of these components in action: import torch # Simple module for demonstration class MyModule(torch.nn.Module): def __init__(self): super().__init__() self.param = torch.nn.Parameter(torch.rand(3, 4)) self.linear = torch.nn.Linear(4, 5) def forward(self, x): return self.linear(x + self.param).clamp(min=0.0, max=1.0) module = MyModule() from torch.fx import symbolic_trace # Symbolic tracing frontend - captures the semantics of the module symbolic_traced : torch.fx.GraphModule = symbolic_trace(module) # High-level intermediate representation (IR) - Graph representation print(symbolic_traced.graph) """ graph(x): %param : [#users=1] = self.param %add_1 : [#users=1] = call_function[target=<built-in function add>](args = (%x, %param), kwargs = {}) %linear_1 : [#users=1] = call_module[target=linear](args = (%add_1,), kwargs = {}) %clamp_1 : [#users=1] = call_method[target=clamp](args = (%linear_1,), kwargs = {min: 0.0, max: 1.0}) return clamp_1 """ # Code generation - valid Python code print(symbolic_traced.code) """ def forward(self, x): param = self.param add_1 = x + param; x = param = None linear_1 = self.linear(add_1); add_1 = None clamp_1 = linear_1.clamp(min = 0.0, max = 1.0); linear_1 = None return clamp_1 """ The **symbolic tracer** performs "symbolic execution" of the Python code. It feeds fake values, called Proxies, through the code. Operations on these Proxies are recorded. More information about symbolic tracing can be found in the `symbolic_trace()` and `Tracer` documentation. The **intermediate representation** is the container for the operations that were recorded during symbolic tracing. It consists of a list of Nodes that represent function inputs, callsites (to functions, methods, or [`torch.nn.Module`](generated/torch.nn.module#torch.nn.Module "torch.nn.Module") instances), and return values. More information about the IR can be found in the documentation for `Graph`. The IR is the format on which transformations are applied. **Python code generation** is what makes FX a Python-to-Python (or Module-to-Module) transformation toolkit. For each Graph IR, we can create valid Python code matching the Graph's semantics. This functionality is wrapped up in `GraphModule`, which is a [`torch.nn.Module`](generated/torch.nn.module#torch.nn.Module "torch.nn.Module") instance that holds a `Graph` as well as a `forward` method generated from the Graph. Taken together, this pipeline of components (symbolic tracing → intermediate representation → transforms → Python code generation) constitutes the Python-to-Python transformation pipeline of FX. In addition, these components can be used separately. For example, symbolic tracing can be used in isolation to capture a form of the code for analysis (and not transformation) purposes. Code generation can be used for programmatically generating models, for example from a config file. There are many uses for FX! Several example transformations can be found at the [examples](https://github.com/pytorch/examples/tree/master/fx) repository. ## Writing Transformations What is an FX transform? Essentially, it's a function that looks like this. import torch import torch.fx def transform(m: nn.Module, tracer_class : type = torch.fx.Tracer) -> torch.nn.Module: # Step 1: Acquire a Graph representing the code in `m` # NOTE: torch.fx.symbolic_trace is a wrapper around a call to # fx.Tracer.trace and constructing a GraphModule. We'll # split that out in our transform to allow the caller to # customize tracing behavior.
graph : torch.fx.Graph = tracer_class().trace(m) # Step 2: Modify this Graph or create a new one graph = ... # Step 3: Construct a Module to return return torch.fx.GraphModule(m, graph) Your transform will take in a [`torch.nn.Module`](generated/torch.nn.module#torch.nn.Module "torch.nn.Module"), acquire a `Graph` from it, do some modifications, and return a new [`torch.nn.Module`](generated/torch.nn.module#torch.nn.Module "torch.nn.Module"). You should think of the [`torch.nn.Module`](generated/torch.nn.module#torch.nn.Module "torch.nn.Module") that your FX transform returns as identical to a regular [`torch.nn.Module`](generated/torch.nn.module#torch.nn.Module "torch.nn.Module") – you can pass it to another FX transform, you can pass it to TorchScript, or you can run it. Ensuring that the inputs and outputs of your FX transform are a [`torch.nn.Module`](generated/torch.nn.module#torch.nn.Module "torch.nn.Module") will allow for composability. Note It is also possible to modify an existing `GraphModule` instead of creating a new one, like so: import torch import torch.fx def transform(m : nn.Module) -> nn.Module: gm : torch.fx.GraphModule = torch.fx.symbolic_trace(m) # Modify gm.graph # <...> # Recompile the forward() method of `gm` from its Graph gm.recompile() return gm Note that you MUST call `GraphModule.recompile()` to bring the generated `forward()` method on the `GraphModule` in sync with the modified `Graph`. Given that you've passed in a [`torch.nn.Module`](generated/torch.nn.module#torch.nn.Module "torch.nn.Module") that has been traced into a `Graph`, there are now two primary approaches you can take to building a new `Graph`. ### A Quick Primer on Graphs Full treatment of the semantics of graphs can be found in the `Graph` documentation, but we are going to cover the basics here. A `Graph` is a data structure that represents a method on a `GraphModule`. The information that this requires is: * What are the inputs to the method? * What are the operations that run inside the method? * What is the output (i.e. return) value from the method? All three of these concepts are represented with `Node` instances. Let's see what we mean by that with a short example: import torch import torch.fx class MyModule(torch.nn.Module): def __init__(self): super().__init__() self.param = torch.nn.Parameter(torch.rand(3, 4)) self.linear = torch.nn.Linear(4, 5) def forward(self, x): return torch.topk(torch.sum( self.linear(x + self.linear.weight).relu(), dim=-1), 3) m = MyModule() gm = torch.fx.symbolic_trace(m) gm.graph.print_tabular() Here we define a module `MyModule` for demonstration purposes, instantiate it, symbolically trace it, then call the `Graph.print_tabular()` method to print out a table showing the nodes of this `Graph`:

opcode | name | target | args | kwargs
---|---|---|---|---
placeholder | x | x | () | {}
get_attr | linear_weight | linear.weight | () | {}
call_function | add_1 | <built-in function add> | (x, linear_weight) | {}
call_module | linear_1 | linear | (add_1,) | {}
call_method | relu_1 | relu | (linear_1,) | {}
call_function | sum_1 | <built-in method sum of type object at 0x...> | (relu_1,) | {'dim': -1}
call_function | topk_1 | <built-in method topk of type object at 0x...> | (sum_1, 3) | {}
output | output | output | (topk_1,) | {}

We can use this information to answer the questions we posed above. * What are the inputs to the method? In FX, method inputs are specified via special `placeholder` nodes. In this case, we have a single `placeholder` node with a `target` of `x`, meaning we have a single (non-self) argument named x. * What are the operations within the method?
The `get_attr`, `call_function`, `call_module`, and `call_method` nodes represent the operations in the method. A full treatment of the semantics of all of these can be found in the `Node` documentation. * What is the return value of the method? The return value in a `Graph` is specified by a special `output` node. Given that we now know the basics of how code is represented in FX, we can now explore how we would edit a `Graph`. ### Graph Manipulation #### Direct Graph Manipulation One approach to building this new `Graph` is to directly manipulate your old one. To aid in this, we can simply take the `Graph` we obtain from symbolic tracing and modify it. For example, let’s say we desire to replace [`torch.add()`](generated/torch.add#torch.add "torch.add") calls with [`torch.mul()`](generated/torch.mul#torch.mul "torch.mul") calls. import torch import torch.fx # Sample module class M(torch.nn.Module): def forward(self, x, y): return torch.add(x, y) def transform(m: torch.nn.Module, tracer_class : type = fx.Tracer) -> torch.nn.Module: graph : fx.Graph = tracer_class().trace(m) # FX represents its Graph as an ordered list of # nodes, so we can iterate through them. for node in graph.nodes: # Checks if we're calling a function (i.e: # torch.add) if node.op == 'call_function': # The target attribute is the function # that call_function calls. if node.target == torch.add: node.target = torch.mul graph.lint() # Does some checks to make sure the # Graph is well-formed. return fx.GraphModule(m, graph) We can also do more involved `Graph` rewrites, such as deleting or appending nodes. To aid in these transformations, FX has utility functions for transforming the graph that can be found in the `Graph` documentation. An example of using these APIs to append a `torch.relu()` call can be found below. # Specifies the insertion point. Any nodes added to the # Graph within this scope will be inserted after `node` with traced.graph.inserting_after(node): # Insert a new `call_function` node calling `torch.relu` new_node = traced.graph.call_function( torch.relu, args=(node,)) # We want all places that used the value of `node` to # now use that value after the `relu` call we've added. # We use the `replace_all_uses_with` API to do this. node.replace_all_uses_with(new_node) For simple transformations that only consist of substitutions, you can also make use of the [subgraph rewriter.](https://github.com/pytorch/pytorch/blob/master/torch/fx/subgraph_rewriter.py) #### Subgraph Rewriting With replace_pattern() FX also provides another level of automation on top of direct graph manipulation. The `replace_pattern()` API is essentially a “find/replace” tool for editing `Graph`s. It allows you to specify a `pattern` and `replacement` function and it will trace through those functions, find instances of the group of operations in the `pattern` graph, and replace those instances with copies of the `replacement` graph. This can help to greatly automate tedious graph manipulation code, which can get unwieldy as the transformations get more complex. 
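As an illustration of that workflow, the following is a minimal sketch (not taken from the examples linked below) of rewriting `torch.add` to `torch.mul` with the subgraph rewriter; the module `M` and the `pattern`/`replacement` functions are hypothetical names, and the entry point is assumed to be `torch.fx.subgraph_rewriter.replace_pattern` as linked above:

import torch
import torch.fx
from torch.fx import subgraph_rewriter, symbolic_trace

class M(torch.nn.Module):
    def forward(self, x, y):
        # Contains the group of operations we want to find and rewrite
        return torch.add(x, y).relu()

def pattern(x, y):
    # Traced and matched against subgraphs of the target Graph
    return torch.add(x, y)

def replacement(x, y):
    # Every match is replaced with a copy of this traced graph
    return torch.mul(x, y)

traced = symbolic_trace(M())
subgraph_rewriter.replace_pattern(traced, pattern, replacement)
traced.recompile()  # keep the generated forward() in sync with the edited Graph
print(traced.code)  # the add call should now appear as a mul

The sketch leans on the same contract described above: `pattern` and `replacement` are themselves traced, so they must be expressible as static, traceable code.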
#### Graph Manipulation Examples * [Replace one op](https://github.com/pytorch/examples/blob/master/fx/replace_op.py) * [Conv/Batch Norm fusion](https://github.com/pytorch/pytorch/blob/master/torch/fx/experimental/fuser.py) * [replace_pattern: Basic usage](https://github.com/pytorch/examples/blob/master/fx/subgraph_rewriter_basic_use.py) * [Quantization](https://pytorch.org/docs/master/quantization.html#prototype-fx-graph-mode-quantization) * [Invert Transformation](https://github.com/pytorch/examples/blob/master/fx/invert.py) ### Proxy/Retracing Another way of manipulating `Graph`s is by reusing the `Proxy` machinery used in symbolic tracing. For example, let’s imagine that we wanted to write a transformation that decomposed PyTorch functions into smaller operations. It would transform every `F.relu(x)` call into `(x > 0) * x`. One possibility would be to perform the requisite graph rewriting to insert the comparison and multiplication after the `F.relu`, and then clean up the original `F.relu`. However, we can automate this process by using `Proxy` objects to automatically record operations into the `Graph`. To use this method, we write the operations that we want inserted as regular PyTorch code and invoke that code with `Proxy` objects as arugments. These `Proxy` objects will capture the operations that are performed on them and append them to the `Graph`. # Note that this decomposition rule can be read as regular Python def relu_decomposition(x): return (x > 0) * x decomposition_rules = {} decomposition_rules[F.relu] = relu_decomposition def decompose(model: torch.nn.Module, tracer_class : type = fx.Tracer) -> torch.nn.Module: """ Decompose `model` into smaller constituent operations. Currently,this only supports decomposing ReLU into its mathematical definition: (x > 0) * x """ graph : fx.Graph = tracer_class().trace(model) new_graph = fx.Graph() env = {} for node in graph.nodes: if node.op == 'call_function' and node.target in decomposition_rules: # By wrapping the arguments with proxies, # we can dispatch to the appropriate # decomposition rule and implicitly add it # to the Graph by symbolically tracing it. proxy_args = [ fx.Proxy(env[x.name]) if isinstance(x, fx.Node) else x for x in node.args] output_proxy = decomposition_rules[node.target](*proxy_args) # Operations on `Proxy` always yield new `Proxy`s, and the # return value of our decomposition rule is no exception. # We need to extract the underlying `Node` from the `Proxy` # to use it in subsequent iterations of this transform. new_node = output_proxy.node env[node.name] = new_node else: # Default case: we don't have a decomposition rule for this # node, so just copy the node over into the new graph. new_node = new_graph.node_copy(node, lambda x: env[x.name]) env[node.name] = new_node return fx.GraphModule(model, new_graph) In addition to avoiding explicit graph manipulation, using `Proxy`s also allows you to specify your rewrite rules as native Python code. For transformations that require a large amount of rewrite rules (such as vmap or grad), this can often improve readability and maintainability of the rules. A worked example of using `Proxy`s for `Graph` manipulation can be found [here](https://github.com/pytorch/examples/blob/master/fx/proxy_based_graph_creation.py). ### The Interpreter Pattern A useful code organizational pattern in FX is to loop over all the `Node`s in a `Graph` and execute them. 
This can be used for several things including runtime analysis of values flowing through the graph or transformation of the code via retracing with `Proxy`s. For example, suppose we want to run a `GraphModule` and record the [`torch.Tensor`](tensors#torch.Tensor "torch.Tensor") shape and dtype properties on the nodes as we see them at runtime. That might look like: import torch import torch.fx from torch.fx.node import Node from typing import Dict class ShapeProp: """ Shape propagation. This class takes a `GraphModule`. Then, its `propagate` method executes the `GraphModule` node-by-node with the given arguments. As each operation executes, the ShapeProp class stores away the shape and element type for the output values of each operation on the `shape` and `dtype` attributes of the operation's `Node`. """ def __init__(self, mod): self.mod = mod self.graph = mod.graph self.modules = dict(self.mod.named_modules()) def propagate(self, *args): args_iter = iter(args) env : Dict[str, Node] = {} def load_arg(a): return torch.fx.graph.map_arg(a, lambda n: env[n.name]) def fetch_attr(target : str): target_atoms = target.split('.') attr_itr = self.mod for i, atom in enumerate(target_atoms): if not hasattr(attr_itr, atom): raise RuntimeError(f"Node referenced nonexistant target {'.'.join(target_atoms[:i])}") attr_itr = getattr(attr_itr, atom) return attr_itr for node in self.graph.nodes: if node.op == 'placeholder': result = next(args_iter) elif node.op == 'get_attr': result = fetch_attr(node.target) elif node.op == 'call_function': result = node.target(*load_arg(node.args), **load_arg(node.kwargs)) elif node.op == 'call_method': self_obj, *args = load_arg(node.args) kwargs = load_arg(node.kwargs) result = getattr(self_obj, node.target)(*args, **kwargs) elif node.op == 'call_module': result = self.modules[node.target](*load_arg(node.args), **load_arg(node.kwargs)) # This is the only code specific to shape propagation. # you can delete this `if` branch and this becomes # a generic GraphModule interpreter. if isinstance(result, torch.Tensor): node.shape = result.shape node.dtype = result.dtype env[node.name] = result return load_arg(self.graph.result) As you can see, a full interpreter for FX is not that complicated but it can be very useful. To ease using this pattern, we provide the `Interpreter` class, which encompasses the above logic in a way that certain aspects of the interpreter’s execution can be overridden via method overrides. In addition to executing operations, we can also generate a new `Graph` by feeding `Proxy` values through an interpreter. Similarly, we provide the `Transformer` class to encompass this pattern. `Transformer` behaves similarly to `Interpreter`, but instead of calling the `run` method to get a concrete output value from the Module, you would call the `Transformer.transform()` method to return a new `GraphModule` which was subject to any transformation rules you installed as overridden methods. #### Examples of the Interpreter Pattern * [Shape Propagation](https://github.com/pytorch/pytorch/blob/master/torch/fx/experimental/shape_prop.py) * [Performance Profiler](https://github.com/pytorch/tutorials/pull/1319) ## Debugging ### Introduction Often in the course of authoring transformations, our code will not be quite right. In this case, we may need to do some debugging. The key is to work backwards: first, check the results of invoking the generated module to prove or disprove correctness. Then, inspect and debug the generated code. 
Then, debug the process of transformations that led to the generated code. If you're not familiar with debuggers, please see the auxiliary section Available Debuggers. ### Checking Correctness of Modules Because the output of most deep learning modules consists of floating point [`torch.Tensor`](tensors#torch.Tensor "torch.Tensor") instances, checking for equivalence between the results of two [`torch.nn.Module`](generated/torch.nn.module#torch.nn.Module "torch.nn.Module") is not as straightforward as doing a simple equality check. To motivate this, let's use an example: import torch import torch.fx import torchvision.models as models def transform(m : torch.nn.Module) -> torch.nn.Module: gm = torch.fx.symbolic_trace(m) # Imagine we're doing some transforms here # <...> gm.recompile() return gm resnet18 = models.resnet18() transformed_resnet18 = transform(resnet18) input_image = torch.randn(5, 3, 224, 224) assert resnet18(input_image) == transformed_resnet18(input_image) """ RuntimeError: Boolean value of Tensor with more than one value is ambiguous """ Here, we've tried to check equality of the values of two deep learning models with the `==` equality operator. However, this is not well-defined, both because that operator returns a tensor rather than a bool, and because comparison of floating point values should use a margin of error (or epsilon) to account for the non-associativity of floating point operations (see [here](https://floating-point-gui.de/errors/comparison/) for more details). We can use [`torch.allclose()`](generated/torch.allclose#torch.allclose "torch.allclose") instead, which will give us an approximate comparison taking into account a relative and absolute tolerance threshold: assert torch.allclose(resnet18(input_image), transformed_resnet18(input_image)) This is the first tool in our toolbox to check if transformed modules are behaving as we expect compared to a reference implementation. ### Debugging the Generated Code Because FX generates the `forward()` function on `GraphModule`s, using traditional debugging techniques like `print` statements or `pdb` is not as straightforward. Luckily, we have several techniques we can use for debugging the generated code. #### Use `pdb` Invoke `pdb` to step into the running program. Although the code that represents the `Graph` is not in any source file, we can still step into it manually using `pdb` when the forward pass is invoked. import torch import torch.fx import torchvision.models as models def my_pass(inp: torch.nn.Module, tracer_class : type = fx.Tracer) -> torch.nn.Module: graph = tracer_class().trace(inp) # Transformation logic here # <...> # Return new Module return fx.GraphModule(inp, graph) my_module = models.resnet18() my_module_transformed = my_pass(my_module) input_value = torch.randn(5, 3, 224, 224) # When this line is executed at runtime, we will be dropped into an # interactive `pdb` prompt. We can use the `step` or `s` command to # step into the execution of the next line import pdb; pdb.set_trace() my_module_transformed(input_value) #### Print the Generated Code If you'd like to run the same code multiple times, then it can be a bit tedious to step to the right code with `pdb`. In that case, one approach is to simply copy-paste the generated `forward` pass into your code and examine it from there. # Assume that `traced` is a GraphModule that has undergone some # number of transforms # Copy this code for later print(traced) # Print the code generated from symbolic tracing.
This outputs: """ def forward(self, y): x = self.x add_1 = x + y; x = y = None return add_1 """ # Subclass the original Module class SubclassM(M): def __init__(self): super().__init__() # Paste the generated `forward` function (the one we printed and # copied above) here def forward(self, y): x = self.x add_1 = x + y; x = y = None return add_1 # Create an instance of the original, untraced Module. Then, create an # instance of the Module with the copied `forward` function. We can # now compare the output of both the original and the traced version. pre_trace = M() post_trace = SubclassM() #### Use the `to_folder` Function From `GraphModule` `GraphModule.to_folder()` is a method in `GraphModule` that allows you to dump out the generated FX code to a folder. Although copying the forward pass into the code often suffices as in Print the Generated Code, it may be easier to examine modules and parameters using `to_folder`. m = symbolic_trace(M()) m.to_folder("foo", "Bar") from foo import Bar y = Bar() After running the above example, we can then look at the code within `foo/module.py` and modify it as desired (e.g. adding `print` statements or using `pdb`) to debug the generated code. ### Debugging the Transformation Now that we've identified that a transformation is creating incorrect code, it's time to debug the transformation itself. First, we'll check the Limitations of Symbolic Tracing section in the documentation. Once we verify that tracing is working as expected, the goal becomes figuring out what went wrong during our `GraphModule` transformation. There may be a quick answer in Writing Transformations, but, if not, there are several ways to examine our traced module: # Sample Module class M(torch.nn.Module): def forward(self, x, y): return x + y # Create an instance of `M` m = M() # Symbolically trace an instance of `M` (returns a GraphModule). In # this example, we'll only be discussing how to inspect a # GraphModule, so we aren't showing any sample transforms for the # sake of brevity. traced = symbolic_trace(m) # Print the code produced by tracing the module. print(traced) # The generated `forward` function is: """ def forward(self, x, y): add_1 = x + y; x = y = None return add_1 """ # Print the internal Graph. print(traced.graph) # This print-out returns: """ graph(x, y): %add_1 : [#users=1] = call_function[target=<built-in function add>](args = (%x, %y), kwargs = {}) return add_1 """ # Print a tabular representation of the internal Graph. traced.graph.print_tabular() # This gives us: """ opcode name target args kwargs ------------- ------ ----------------------- -------- -------- placeholder x x () {} placeholder y y () {} call_function add_1 <built-in function add> (x, y) {} """ Using the utility functions above, we can compare our traced Module before and after we've applied our transformations. Sometimes, a simple visual comparison is enough to trace down a bug. If it's still not clear what's going wrong, a debugger like `pdb` can be a good next step. Going off of the example above, consider the following code: # Sample user-defined function def transform_graph(module: torch.nn.Module, tracer_class : type = fx.Tracer) -> torch.nn.Module: # Get the Graph from our traced Module g = tracer_class().trace(module) """ Transformations on `g` go here """ return fx.GraphModule(module, g) # Transform the Graph transformed = transform_graph(traced) # Print the new code after our transforms.
Check to see if it was # what we expected print(transformed) Using the above example, let’s say that the call to `print(traced)` showed us that there was an error in our transforms. We want to find what goes wrong using a debugger. We start a `pdb` session. We can see what’s happening during the transform by breaking on `transform_graph(traced)`, then pressing `s` to “step into” the call to `transform_graph(traced)`. We may also have good luck by editing the `print_tabular` method to print different attributes of the Nodes in the Graph. (For example, we might want to see the Node’s `input_nodes` and `users`.) ### Available Debuggers The most common Python debugger is [pdb](https://docs.python.org/3/library/pdb.html). You can start your program in “debug mode” with `pdb` by typing `python -m pdb FILENAME.py` into the command line, where `FILENAME` is the name of the file you want to debug. After that, you can use the `pdb` [debugger commands](https://docs.python.org/3/library/pdb.html#debugger-commands) to move through your running program stepwise. It’s common to set a breakpoint (`b LINE-NUMBER`) when you start `pdb`, then call `c` to run the program until that point. This prevents you from having to step through each line of execution (using `s` or `n`) to get to the part of the code you want to examine. Alternatively, you can write `import pdb; pdb.set_trace()` before the line you want to break at. If you add `pdb.set_trace()`, your program will automatically start in debug mode when you run it. (In other words, you can just type `python FILENAME.py` into the command line instead of `python -m pdb FILENAME.py`.) Once you’re running your file in debug mode, you can step through the code and examine your program’s internal state using certain commands. There are many excellent tutorials on `pdb` online, including RealPython’s [“Python Debugging With Pdb”](https://realpython.com/python- debugging-pdb/). IDEs like PyCharm or VSCode usually have a debugger built in. In your IDE, you can choose to either a) use `pdb` by pulling up a terminal window in your IDE (e.g. View → Terminal in VSCode), or b) use the built-in debugger (usually a graphical wrapper around `pdb`). ## Limitations of Symbolic Tracing FX uses a system of **symbolic tracing** (a.k.a [symbolic execution](https://en.wikipedia.org/wiki/Symbolic_execution)) to capture the semantics of programs in a transformable/analyzable form. The system is **tracing** in that it executes the program (really a [`torch.nn.Module`](generated/torch.nn.module#torch.nn.Module "torch.nn.Module") or function) to record operations. It is **symbolic** in that the data flowing through the program during this execution is not real data, but rather symbols (`Proxy` in FX parlance). Although symbolic tracing works for most neural net code, it has some limitations. ### Dynamic Control Flow The main limitation of symbolic tracing is it does not currently support _dynamic control flow_. That is, loops or `if` statements where the condition may depend on the input values of the program. 
For example, let's examine the following program: def func_to_trace(x): dim0 = x.size(0) if dim0 == 3: return torch.relu(x) else: return torch.neg(x) traced = torch.fx.symbolic_trace(func_to_trace) """ <...> File "dyn.py", line 6, in func_to_trace if dim0 == 3: File "pytorch/torch/fx/proxy.py", line 155, in __bool__ return self.tracer.to_bool(self) File "pytorch/torch/fx/proxy.py", line 85, in to_bool raise TraceError('symbolically traced variables cannot be used as inputs to control flow') torch.fx.proxy.TraceError: symbolically traced variables cannot be used as inputs to control flow """ The condition to the `if` statement relies on the value of `dim0`, which eventually relies on the value of `x`, a function input. Since `x` can change (i.e. if you pass a new input tensor to the traced function), this is _dynamic control flow_. The traceback walks back up through your code to show you where this situation happens. #### Static Control Flow On the other hand, so-called _static control flow_ is supported. Static control flow is loops or `if` statements whose value cannot change across invocations. Typically, in PyTorch programs, this control flow arises for code making decisions about a model's architecture based on hyper-parameters. As a concrete example: import torch import torch.fx class MyModule(torch.nn.Module): def __init__(self, do_activation : bool = False): super().__init__() self.do_activation = do_activation self.linear = torch.nn.Linear(512, 512) def forward(self, x): x = self.linear(x) # This if-statement is so-called static control flow. # Its condition does not depend on any input values if self.do_activation: x = torch.relu(x) return x without_activation = MyModule(do_activation=False) with_activation = MyModule(do_activation=True) traced_without_activation = torch.fx.symbolic_trace(without_activation) print(traced_without_activation.code) """ def forward(self, x): linear_1 = self.linear(x); x = None return linear_1 """ traced_with_activation = torch.fx.symbolic_trace(with_activation) print(traced_with_activation.code) """ import torch def forward(self, x): linear_1 = self.linear(x); x = None relu_1 = torch.relu(linear_1); linear_1 = None return relu_1 """ The if-statement `if self.do_activation` does not depend on any function inputs, thus it is static. `do_activation` can be considered to be a hyper-parameter, and the traces of different instances of `MyModule` with different values for that parameter have different code. This is a valid pattern that is supported by symbolic tracing. Many instances of dynamic control flow are semantically static control flow. These instances can be made to support symbolic tracing by removing the data dependencies on input values, for example by moving values to `Module` attributes or by passing constant values during symbolic tracing: def f(x, flag): if flag: return x else: return x*2 fx.symbolic_trace(f) # Fails! def wrapper(flag): return lambda x: f(x, flag) new_f = wrapper(flag=True) fx.symbolic_trace(new_f) In the case of truly dynamic control flow, the sections of the program that contain this code can be traced as calls to the Method (see Customizing Tracing with the Tracer class) or function (see `wrap()`) rather than tracing through them. ### Non-`torch` Functions FX uses `__torch_function__` as the mechanism by which it intercepts calls (see the [technical overview](https://github.com/pytorch/pytorch/blob/master/torch/fx/OVERVIEW.md#technical-details) for more information about this).
Some functions, such as builtin Python functions or those in the `math` module, are things that are not covered by `__torch_function__`, but we would still like to capture them in symbolic tracing. For example: import torch import torch.fx from math import sqrt def normalize(x): """ Normalize `x` by the size of the batch dimension """ return x / sqrt(len(x)) # It's valid Python code normalize(torch.rand(3, 4)) traced = torch.fx.symbolic_trace(normalize) """ <...> File "sqrt.py", line 9, in normalize return x / sqrt(len(x)) File "pytorch/torch/fx/proxy.py", line 161, in __len__ raise RuntimeError("'len' is not supported in symbolic tracing by default. If you want " RuntimeError: 'len' is not supported in symbolic tracing by default. If you want this call to be recorded, please call torch.fx.wrap('len') at module scope """ The error tells us that the built-in function `len` is not supported. We can make it so that functions like this are recorded in the trace as direct calls using the `wrap()` API: torch.fx.wrap('len') torch.fx.wrap('sqrt') traced = torch.fx.symbolic_trace(normalize) print(traced.code) """ import math def forward(self, x): len_1 = len(x) sqrt_1 = math.sqrt(len_1); len_1 = None truediv = x / sqrt_1; x = sqrt_1 = None return truediv """ ### Customizing Tracing with the `Tracer` class The `Tracer` class is the class that underlies the implementation of `symbolic_trace`. The behavior of tracing can be customized by subclassing Tracer, like so: class MyCustomTracer(torch.fx.Tracer): # Inside here you can override various methods # to customize tracing. See the `Tracer` API # reference pass # Let's use this custom tracer to trace through this module class MyModule(torch.nn.Module): def forward(self, x): return torch.relu(x) + torch.ones(3, 4) mod = MyModule() traced_graph = MyCustomTracer().trace(mod) # trace() returns a Graph. Let's wrap it up in a # GraphModule to make it runnable traced = torch.fx.GraphModule(mod, traced_graph) #### Leaf Modules Leaf Modules are the modules that appear as calls in the symbolic trace rather than being traced through. The default set of leaf modules is the set of standard `torch.nn` module instances. For example: class MySpecialSubmodule(torch.nn.Module): def forward(self, x): return torch.neg(x) class MyModule(torch.nn.Module): def __init__(self): super().__init__() self.linear = torch.nn.Linear(3, 4) self.submod = MySpecialSubmodule() def forward(self, x): return self.submod(self.linear(x)) traced = torch.fx.symbolic_trace(MyModule()) print(traced.code) # `linear` is preserved as a call, yet `submod` is traced though. # This is because the default set of "Leaf Modules" includes all # standard `torch.nn` modules. """ import torch def forward(self, x): linear_1 = self.linear(x); x = None neg_1 = torch.neg(linear_1); linear_1 = None return neg_1 """ The set of leaf modules can be customized by overriding `Tracer.is_leaf_module()`. ### Miscellanea * Tensor constructors (e.g. `torch.zeros`, `torch.ones`, `torch.rand`, `torch.randn`, `torch.sparse_coo_tensor`) are currently not traceable. * The deterministic constructors (`zeros`, `ones`) can be used and the value they produce will be embedded in the trace as a constant. This is only problematic if the arguments to these constructors refers to dynamic input sizes. In this case, `ones_like` or `zeros_like` may be a viable substitute. * Nondeterministic constructors (`rand`, `randn`) will have a single random value embedded in the trace. This is likely not the intended behavior. 
* This behavior may be fixed in a future release. * Type annotations * Python 3-style type annotations (e.g. `func(x : torch.Tensor, y : int) -> torch.Tensor`) are supported and will be preserved by symbolic tracing. * Python 2-style comment type annotations `# type: (torch.Tensor, int) -> torch.Tensor` are not currently supported. * Annotations on local names within a function are not currently supported. ## API Reference `torch.fx.symbolic_trace(root, concrete_args=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/fx/symbolic_trace.html#symbolic_trace) Symbolic tracing API Given an `nn.Module` or function instance `root`, this function will return a `GraphModule` constructed by recording operations seen while tracing through `root`. Parameters * **root** (_Union_ _[_[torch.nn.Module](generated/torch.nn.module#torch.nn.Module "torch.nn.Module") _,__Callable_ _]_) – Module or function to be traced and converted into a Graph representation. * **concrete_args** (_Optional_ _[__Dict_ _[_[str](https://docs.python.org/3/library/stdtypes.html#str "\(in Python v3.9\)") _,__any_ _]__]_) – Concrete arguments that should not be treated as Proxies. Returns a Module created from the recorded operations from `root`. Return type GraphModule `torch.fx.wrap(fn_or_name)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/fx/symbolic_trace.html#wrap) This function can be called at module-level scope to register fn_or_name as a “leaf function”. A “leaf function” will be preserved as a CallFunction node in the FX trace instead of being traced through: # foo/bar/baz.py def my_custom_function(x, y): return x * x + y * y torch.fx.wrap('my_custom_function') def fn_to_be_traced(x, y): # When symbolic tracing, the below call to my_custom_function will be inserted into # the graph rather than tracing it. return my_custom_function(x, y) This function can also equivalently be used as a decorator: # foo/bar/baz.py @torch.fx.wrap def my_custom_function(x, y): return x * x + y * y A wrapped function can be thought of a “leaf function”, analogous to the concept of “leaf modules”, that is, they are functions that are left as calls in the FX trace rather than traced through. Parameters **fn_or_name** (_Union_ _[_[str](https://docs.python.org/3/library/stdtypes.html#str "\(in Python v3.9\)") _,__Callable_ _]_) – The function or name of the global function to insert into the graph when it’s called `class torch.fx.GraphModule(root, graph, class_name='GraphModule')` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/fx/graph_module.html#GraphModule) GraphModule is an nn.Module generated from an fx.Graph. Graphmodule has a `graph` attribute, as well as `code` and `forward` attributes generated from that `graph`. Warning When `graph` is reassigned, `code` and `forward` will be automatically regenerated. However, if you edit the contents of the `graph` without reassigning the `graph` attribute itself, you must call `recompile()` to update the generated code. `__init__(root, graph, class_name='GraphModule')` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/fx/graph_module.html#GraphModule.__init__) Construct a GraphModule. Parameters * **root** (_Union_ _[_[torch.nn.Module](generated/torch.nn.module#torch.nn.Module "torch.nn.Module") _,__Dict_ _[_[str](https://docs.python.org/3/library/stdtypes.html#str "\(in Python v3.9\)") _,__Any_ _]_) – `root` can either be an nn.Module instance or a Dict mapping strings to any attribute type. 
In the case that `root` is a Module, any references to Module-based objects (via qualified name) in the Graph's Nodes' `target` field will be copied over from the respective place within `root`'s Module hierarchy into the GraphModule's module hierarchy. In the case that `root` is a dict, the qualified name found in a Node's `target` will be looked up directly in the dict's keys. The object mapped to by the Dict will be copied over into the appropriate place within the GraphModule's module hierarchy. * **graph** (Graph) – `graph` contains the nodes this GraphModule should use for code generation * **class_name** ([str](https://docs.python.org/3/library/stdtypes.html#str "\(in Python v3.9\)")) – `class_name` denotes the name of this GraphModule for debugging purposes. If it's unset, all error messages will report as originating from `GraphModule`. It may be helpful to set this to `root`'s original name or a name that makes sense within the context of your transform. `property code` Return the Python code generated from the `Graph` underlying this `GraphModule`. `property graph` Return the `Graph` underlying this `GraphModule`. `recompile()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/fx/graph_module.html#GraphModule.recompile) Recompile this GraphModule from its `graph` attribute. This should be called after editing the contained `graph`, otherwise the generated code of this `GraphModule` will be out of date. `to_folder(folder, module_name='FxModule')` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/fx/graph_module.html#GraphModule.to_folder) Dumps out module to `folder` with `module_name` so that it can be imported with `from <folder> import <module_name>` Parameters * **folder** (_Union_ _[_[str](https://docs.python.org/3/library/stdtypes.html#str "\(in Python v3.9\)") _,_[os.PathLike](https://docs.python.org/3/library/os.html#os.PathLike "\(in Python v3.9\)") _]_) – The folder to write the code out to * **module_name** ([str](https://docs.python.org/3/library/stdtypes.html#str "\(in Python v3.9\)")) – Top-level name to use for the `Module` while writing out the code `class torch.fx.Graph` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/fx/graph.html#Graph) `Graph` is the main data structure used in the FX Intermediate Representation. It consists of a series of `Node` s, each representing callsites (or other syntactic constructs). The list of `Node` s, taken together, constitute a valid Python function. For example, the following code import torch import torch.fx class MyModule(torch.nn.Module): def __init__(self): super().__init__() self.param = torch.nn.Parameter(torch.rand(3, 4)) self.linear = torch.nn.Linear(4, 5) def forward(self, x): return torch.topk(torch.sum(self.linear(x + self.linear.weight).relu(), dim=-1), 3) m = MyModule() gm = torch.fx.symbolic_trace(m) Will produce the following Graph: print(gm.graph) graph(x): %linear_weight : [#users=1] = self.linear.weight %add_1 : [#users=1] = call_function[target=operator.add](args = (%x, %linear_weight), kwargs = {}) %linear_1 : [#users=1] = call_module[target=linear](args = (%add_1,), kwargs = {}) %relu_1 : [#users=1] = call_method[target=relu](args = (%linear_1,), kwargs = {}) %sum_1 : [#users=1] = call_function[target=torch.sum](args = (%relu_1,), kwargs = {dim: -1}) %topk_1 : [#users=1] = call_function[target=torch.topk](args = (%sum_1, 3), kwargs = {}) return topk_1 For the semantics of operations represented in the `Graph`, please see `Node`.
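To make the relationship between `Graph`, its nodes, and `GraphModule` concrete, here is a minimal sketch of assembling a `Graph` by hand using `Graph` methods such as `placeholder()`, `call_function()`, and `output()` and then wrapping it in a `GraphModule`; the small graph being built is purely illustrative:

import torch
import torch.fx

# Build a Graph equivalent to: def forward(x): return torch.relu(x) + 1.0
g = torch.fx.Graph()
x = g.placeholder('x')                         # method input
relu = g.call_function(torch.relu, (x,))       # call_function node for torch.relu
add = g.call_function(torch.add, (relu, 1.0))  # call_function node for torch.add
g.output(add)                                  # return value of the method

# Wrap the Graph in a GraphModule to obtain a runnable nn.Module.
# An empty root module suffices here since no get_attr/call_module nodes are used.
gm = torch.fx.GraphModule(torch.nn.Module(), g)
print(gm.code)
print(gm(torch.randn(4)))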
`__init__()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/fx/graph.html#Graph.__init__) Construct an empty Graph. `call_function(the_function, args=None, kwargs=None, type_expr=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/fx/graph.html#Graph.call_function) Insert a `call_function` `Node` into the `Graph`. A `call_function` node represents a call to a Python callable, specified by `the_function`. `the_function` can be Parameters * **the_function** (_Callable_ _[__..__,__Any_ _]_) – The function to be called. Can be any PyTorch operator, Python function, or member of the `builtins` or `operator` namespaces. * **args** (_Optional_ _[__Tuple_ _[__Argument_ _,__..__]__]_) – The positional arguments to be passed to the called function. * **kwargs** (_Optional_ _[__Dict_ _[_[str](https://docs.python.org/3/library/stdtypes.html#str "\(in Python v3.9\)") _,__Argument_ _]__]_) – The keyword arguments to be passed to the called function * **type_expr** (_Optional_ _[__Any_ _]_) – an optional type annotation representing the Python type the output of this node will have. Returns The newly created and inserted `call_function` node. Note The same insertion point and type expression rules apply for this method as `Graph.create_node()`. `call_method(method_name, args=None, kwargs=None, type_expr=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/fx/graph.html#Graph.call_method) Insert a `call_method` `Node` into the `Graph`. A `call_method` node represents a call to a given method on the 0th element of `args`. Parameters * **method_name** ([str](https://docs.python.org/3/library/stdtypes.html#str "\(in Python v3.9\)")) – The name of the method to apply to the self argument. For example, if args[0] is a `Node` representing a `Tensor`, then to call `relu()` on that `Tensor`, pass `relu` to `method_name`. * **args** (_Optional_ _[__Tuple_ _[__Argument_ _,__..__]__]_) – The positional arguments to be passed to the called method. Note that this _should_ include a `self` argument. * **kwargs** (_Optional_ _[__Dict_ _[_[str](https://docs.python.org/3/library/stdtypes.html#str "\(in Python v3.9\)") _,__Argument_ _]__]_) – The keyword arguments to be passed to the called method * **type_expr** (_Optional_ _[__Any_ _]_) – an optional type annotation representing the Python type the output of this node will have. Returns The newly created and inserted `call_method` node. Note The same insertion point and type expression rules apply for this method as `Graph.create_node()`. `call_module(module_name, args=None, kwargs=None, type_expr=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/fx/graph.html#Graph.call_module) Insert a `call_module` `Node` into the `Graph`. A `call_module` node represents a call to the forward() function of a `Module` in the `Module` hierarchy. Parameters * **module_name** ([str](https://docs.python.org/3/library/stdtypes.html#str "\(in Python v3.9\)")) – The qualified name of the `Module` in the `Module` hierarchy to be called. For example, if the traced `Module` has a submodule named `foo`, which has a submodule named `bar`, the qualified name `foo.bar` should be passed as `module_name` to call that module. * **args** (_Optional_ _[__Tuple_ _[__Argument_ _,__..__]__]_) – The positional arguments to be passed to the called method. Note that this should _not_ include a `self` argument. 
* **kwargs** (_Optional_ _[__Dict_ _[_[str](https://docs.python.org/3/library/stdtypes.html#str "\(in Python v3.9\)") _,__Argument_ _]__]_) – The keyword arguments to be passed to the called method * **type_expr** (_Optional_ _[__Any_ _]_) – an optional type annotation representing the Python type the output of this node will have. Returns The newly-created and inserted `call_module` node. Note The same insertion point and type expression rules apply for this method as `Graph.create_node()`. `create_node(op, target, args=None, kwargs=None, name=None, type_expr=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/fx/graph.html#Graph.create_node) Create a `Node` and add it to the `Graph` at the current insert-point. Note that the current insert-point can be set via `Graph.inserting_before()` and `Graph.inserting_after()`. Parameters * **op** ([str](https://docs.python.org/3/library/stdtypes.html#str "\(in Python v3.9\)")) – the opcode for this Node. One of ‘call_function’, ‘call_method’, ‘get_attr’, ‘call_module’, ‘placeholder’, or ‘output’. The semantics of these opcodes are described in the `Graph` docstring. * **args** (_Optional_ _[__Tuple_ _[__Argument_ _,__..__]__]_) – is a tuple of arguments to this node. * **kwargs** (_Optional_ _[__Dict_ _[_[str](https://docs.python.org/3/library/stdtypes.html#str "\(in Python v3.9\)") _,__Argument_ _]__]_) – the kwargs of this Node * **name** (_Optional_ _[_[str](https://docs.python.org/3/library/stdtypes.html#str "\(in Python v3.9\)") _]_) – an optional string name for the `Node`. This will influence the name of the value assigned to in the Python generated code. * **type_expr** (_Optional_ _[__Any_ _]_) – an optional type annotation representing the Python type the output of this node will have. Returns The newly-created and inserted node. `erase_node(to_erase)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/fx/graph.html#Graph.erase_node) Erases a `Node` from the `Graph`. Throws an exception if there are still users of that node in the `Graph`. Parameters **to_erase** (Node) – The `Node` to erase from the `Graph`. `get_attr(qualified_name, type_expr=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/fx/graph.html#Graph.get_attr) Insert a `get_attr` node into the Graph. A `get_attr` `Node` represents the fetch of an attribute from the `Module` hierarchy. Parameters * **qualified_name** ([str](https://docs.python.org/3/library/stdtypes.html#str "\(in Python v3.9\)")) – the fully-qualified name of the attribute to be retrieved. For example, if the traced Module has a submodule named `foo`, which has a submodule named `bar`, which has an attribute named `baz`, the qualified name `foo.bar.baz` should be passed as `qualified_name`. * **type_expr** (_Optional_ _[__Any_ _]_) – an optional type annotation representing the Python type the output of this node will have. Returns The newly-created and inserted `get_attr` node. Note The same insertion point and type expression rules apply for this method as `Graph.create_node`. `graph_copy(g, val_map)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/fx/graph.html#Graph.graph_copy) Copy all nodes from a given graph into `self`. Parameters * **g** (Graph) – The source graph from which to copy Nodes. * **val_map** (_Dict_ _[_Node _,_Node _]_) – a dictionary that will be populated with a mapping from nodes in `g` to nodes in `self`. Note that `val_map` can be passed in with values in it already to override copying of certain values. 
Returns The value in `self` that is now equivalent to the output value in `g`, if `g` had an `output` node. `None` otherwise.

`inserting_after(n=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/fx/graph.html#Graph.inserting_after) Set the point at which create_node and companion methods will insert into the graph. When used within a ‘with’ statement, this will temporarily set the insert point and then restore it when the with statement exits:

    with g.inserting_after(n):
        ... # inserting after node n
    ... # insert point restored to what it was previously
    g.inserting_after(n) # set the insert point permanently

Parameters **n** (_Optional_ _[_Node _]_) – The node after which to insert. If None this will insert after the beginning of the entire graph. Returns A resource manager that will restore the insert point on `__exit__`.

`inserting_before(n=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/fx/graph.html#Graph.inserting_before) Set the point at which create_node and companion methods will insert into the graph. When used within a ‘with’ statement, this will temporarily set the insert point and then restore it when the with statement exits:

    with g.inserting_before(n):
        ... # inserting before node n
    ... # insert point restored to what it was previously
    g.inserting_before(n) # set the insert point permanently

Parameters **n** (_Optional_ _[_Node _]_) – The node before which to insert. If None this will insert before the beginning of the entire graph. Returns A resource manager that will restore the insert point on `__exit__`.

`lint(root=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/fx/graph.html#Graph.lint) Runs various checks on this Graph to make sure it is well-formed. In particular:

  - Checks Nodes have correct ownership (owned by this graph)
  - Checks Nodes appear in topological order
  - If `root` is provided, checks that targets exist in `root`

Parameters **root** (_Optional_ _[_[torch.nn.Module](generated/torch.nn.module#torch.nn.Module "torch.nn.Module") _]_) – The root module with which to check for targets. This is equivalent to the `root` argument that is passed when constructing a `GraphModule`.

`node_copy(node, arg_transform=<function <lambda>>)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/fx/graph.html#Graph.node_copy) Copy a node from one graph into another. `arg_transform` needs to transform arguments from the graph of node to the graph of self. Example:

    # Copying all the nodes in `g` into `new_graph`
    g : torch.fx.Graph = ...
    new_graph = torch.fx.Graph()
    value_remap = {}
    for node in g.nodes:
        value_remap[node] = new_graph.node_copy(node, lambda n : value_remap[n])

Parameters * **node** (Node) – The node to copy into `self`. * **arg_transform** (_Callable_ _[__[_Node _]__,__Argument_ _]_) – A function that transforms `Node` arguments in node’s `args` and `kwargs` into the equivalent argument in `self`. In the simplest case, this should retrieve a value out of a table mapping Nodes in the original graph to `self`.

`property nodes` Get the list of Nodes that constitute this Graph. Note that this `Node` list representation is a doubly-linked list. Mutations during iteration (e.g. delete a Node, add a Node) are safe. Returns A doubly-linked list of Nodes. Note that `reversed` can be called on this list to switch iteration order.

`output(result, type_expr=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/fx/graph.html#Graph.output) Insert an `output` `Node` into the `Graph`.
An `output` node represents a `return` statement in Python code. `result` is the value that should be returned. Parameters * **result** (_Argument_) – The value to be returned. * **type_expr** (_Optional_ _[__Any_ _]_) – an optional type annotation representing the Python type the output of this node will have. Note The same insertion point and type expression rules apply for this method as `Graph.create_node`. `placeholder(name, type_expr=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/fx/graph.html#Graph.placeholder) Insert a `placeholder` node into the Graph. A `placeholder` represents a function input. Parameters * **name** ([str](https://docs.python.org/3/library/stdtypes.html#str "\(in Python v3.9\)")) – A name for the input value. This corresponds to the name of the positional argument to the function this `Graph` represents. * **type_expr** (_Optional_ _[__Any_ _]_) – an optional type annotation representing the Python type the output of this node will have. This is needed in some cases for proper code generation (e.g. when the function is used subsequently in TorchScript compilation). Note The same insertion point and type expression rules apply for this method as `Graph.create_node`. `print_tabular()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/fx/graph.html#Graph.print_tabular) Prints the intermediate representation of the graph in tabular format. `python_code(root_module)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/fx/graph.html#Graph.python_code) Turn this `Graph` into valid Python code. Parameters **root_module** ([str](https://docs.python.org/3/library/stdtypes.html#str "\(in Python v3.9\)")) – The name of the root module on which to look-up qualified name targets. This is usually ‘self’. Returns The string source code generated from this `Graph`. `class torch.fx.Node(graph, name, op, target, args, kwargs, type=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/fx/node.html#Node) `Node` is the data structure that represents individual operations within a `Graph`. For the most part, Nodes represent callsites to various entities, such as operators, methods, and Modules (some exceptions include nodes that specify function inputs and outputs). Each `Node` has a function specified by its `op` property. The `Node` semantics for each value of `op` are as follows: * `placeholder` represents a function input. The `name` attribute specifies the name this value will take on. `target` is similarly the name of the argument. `args` holds either: 1) nothing, or 2) a single argument denoting the default parameter of the function input. `kwargs` is don’t-care. Placeholders correspond to the function parameters (e.g. `x`) in the graph printout. * `get_attr` retrieves a parameter from the module hierarchy. `name` is similarly the name the result of the fetch is assigned to. `target` is the fully-qualified name of the parameter’s position in the module hierarchy. `args` and `kwargs` are don’t-care * `call_function` applies a free function to some values. `name` is similarly the name of the value to assign to. `target` is the function to be applied. `args` and `kwargs` represent the arguments to the function, following the Python calling convention * `call_module` applies a module in the module hierarchy’s `forward()` method to given arguments. `name` is as previous. `target` is the fully-qualified name of the module in the module hierarchy to call. 
`args` and `kwargs` represent the arguments to invoke the module on, _excluding the self argument_. * `call_method` calls a method on a value. `name` is similar to the above. `target` is the string name of the method to apply to the `self` argument. `args` and `kwargs` represent the arguments to invoke the method on, _including the self argument_. * `output` contains the output of the traced function in its `args[0]` attribute. This corresponds to the “return” statement in the Graph printout.

`property all_input_nodes` Return all Nodes that are inputs to this Node. This is equivalent to iterating over `args` and `kwargs` and only collecting the values that are Nodes. Returns List of `Nodes` that appear in the `args` and `kwargs` of this `Node`, in that order.

`append(x)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/fx/node.html#Node.append) Insert x after this node in the list of nodes in the graph. Equivalent to `self.next.prepend(x)`. Parameters **x** (Node) – The node to put after this node. Must be a member of the same graph.

`property args` The tuple of arguments to this `Node`. The interpretation of arguments depends on the node’s opcode. See the `Node` docstring for more information. Assignment to this property is allowed. All accounting of uses and users is updated automatically on assignment.

`property kwargs` The dict of keyword arguments to this `Node`. The interpretation of arguments depends on the node’s opcode. See the `Node` docstring for more information. Assignment to this property is allowed. All accounting of uses and users is updated automatically on assignment.

`property next` Returns the next `Node` in the linked list of Nodes. Returns The next `Node` in the linked list of Nodes.

`prepend(x)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/fx/node.html#Node.prepend) Insert x before this node in the list of nodes in the graph. Example:

    Before: p -> self
            bx -> x -> ax
    After:  p -> x -> self
            bx -> ax

Parameters **x** (Node) – The node to put before this node. Must be a member of the same graph.

`property prev` Returns the previous `Node` in the linked list of Nodes. Returns The previous `Node` in the linked list of Nodes.

`replace_all_uses_with(replace_with)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/fx/node.html#Node.replace_all_uses_with) Replace all uses of `self` in the Graph with the Node `replace_with`. Parameters **replace_with** (Node) – The node to replace all uses of `self` with. Returns The list of Nodes on which this change was made.

`class torch.fx.Tracer(autowrap_modules=(math,))` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/fx/symbolic_trace.html#Tracer) `Tracer` is the class that implements the symbolic tracing functionality of `torch.fx.symbolic_trace`. A call to `symbolic_trace(m)` is equivalent to `Tracer().trace(m)`. Tracer can be subclassed to override various behaviors of the tracing process. The different behaviors that can be overridden are described in the docstrings of the methods on this class.

`call_module(m, forward, args, kwargs)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/fx/symbolic_trace.html#Tracer.call_module) Method that specifies the behavior of this `Tracer` when it encounters a call to an `nn.Module` instance. By default, the behavior is to check if the called module is a leaf module via `is_leaf_module`. If it is, emit a `call_module` node referring to `m` in the `Graph`. Otherwise, call the `Module` normally, tracing through the operations in its `forward` function.
This method can be overridden to, for example, create nested traced GraphModules, or any other behavior you would want while tracing across `Module` boundaries. Parameters * **m** ([Module](generated/torch.nn.module#torch.nn.Module "torch.nn.Module")) – The module for which a call is being emitted * **forward** (_Callable_) – The forward() method of the `Module` to be invoked * **args** (_Tuple_) – args of the module callsite * **kwargs** (_Dict_) – kwargs of the module callsite Returns The return value from the Module call. In the case that a `call_module` node was emitted, this is a `Proxy` value. Otherwise, it is whatever value was returned from the `Module` invocation.

`create_arg(a)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/fx/symbolic_trace.html#Tracer.create_arg) A method to specify the behavior of tracing when preparing values to be used as arguments to nodes in the `Graph`. By default, the behavior includes:

  1. Iterate through collection types (e.g. tuple, list, dict) and recursively call `create_arg` on the elements.
  2. Given a Proxy object, return a reference to the underlying IR `Node`.
  3. Given a non-Proxy Tensor object, emit IR for various cases:
     * For a Parameter, emit a `get_attr` node referring to that Parameter.
     * For a non-Parameter Tensor, store the Tensor away in a special attribute referring to that attribute.

This method can be overridden to support more types. Parameters **a** (_Any_) – The value to be emitted as an `Argument` in the `Graph`. Returns The value `a` converted into the appropriate `Argument`

`create_args_for_root(root_fn, is_module, concrete_args=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/fx/symbolic_trace.html#Tracer.create_args_for_root) Create `placeholder` nodes corresponding to the signature of the `root` Module. This method introspects root’s signature and emits those nodes accordingly, also supporting `*args` and `**kwargs`.

`is_leaf_module(m, module_qualified_name)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/fx/symbolic_trace.html#Tracer.is_leaf_module) A method to specify whether a given `nn.Module` is a “leaf” module. Leaf modules are the atomic units that appear in the IR, referenced by `call_module` calls. By default, Modules in the PyTorch standard library namespace (torch.nn) are leaf modules. All other modules are traced through and their constituent ops are recorded, unless specified otherwise by overriding this method. Parameters * **m** ([Module](generated/torch.nn.module#torch.nn.Module "torch.nn.Module")) – The module being queried about * **module_qualified_name** ([str](https://docs.python.org/3/library/stdtypes.html#str "\(in Python v3.9\)")) – The path to root of this module. For example, if you have a module hierarchy where submodule `foo` contains submodule `bar`, which contains submodule `baz`, that module will appear with the qualified name `foo.bar.baz` here.

`path_of_module(mod)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/fx/symbolic_trace.html#Tracer.path_of_module) Helper method to find the qualified name of `mod` in the Module hierarchy of `root`. For example, if `root` has a submodule named `foo`, which has a submodule named `bar`, passing `bar` into this function will return the string “foo.bar”. Parameters **mod** ([Module](generated/torch.nn.module#torch.nn.Module "torch.nn.Module")) – The `Module` to retrieve the qualified name for.
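As a small sketch of the customization points above (the module and tracer names `MyBlock` and `LeafBlockTracer` are hypothetical), overriding `is_leaf_module` keeps a user-defined module as a single `call_module` node instead of tracing through it, using `Tracer.trace` as documented below:

    import torch
    import torch.fx

    class MyBlock(torch.nn.Module):              # hypothetical user module
        def forward(self, x):
            return x.relu() + 1

    class Net(torch.nn.Module):
        def __init__(self):
            super().__init__()
            self.block = MyBlock()
        def forward(self, x):
            return self.block(x) * 2

    class LeafBlockTracer(torch.fx.Tracer):      # hypothetical subclass
        def is_leaf_module(self, m, module_qualified_name):
            # Record MyBlock as an atomic call_module node rather than
            # tracing through its forward().
            if isinstance(m, MyBlock):
                return True
            return super().is_leaf_module(m, module_qualified_name)

    net = Net()
    graph = LeafBlockTracer().trace(net)
    gm = torch.fx.GraphModule(net, graph)
    print(gm.graph)   # contains a call_module node targeting 'block'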
`trace(root, concrete_args=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/fx/symbolic_trace.html#Tracer.trace) Trace `root` and return the corresponding FX `Graph` representation. `root` can either be an `nn.Module` instance or a Python callable. Note that after this call, `self.root` may be different from the `root` passed in here. For example, when a free function is passed to `trace()`, we will create an `nn.Module` instance to use as the root and add embedded constants to it. Parameters **root** (_Union_ _[_[Module](generated/torch.nn.module#torch.nn.Module "torch.nn.Module") _,__Callable_ _]_) – Either a `Module` or a function to be traced through. Returns A `Graph` representing the semantics of the passed-in `root`.

`class torch.fx.Proxy(node, tracer=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/fx/proxy.html#Proxy) `Proxy` objects are `Node` wrappers that flow through the program during symbolic tracing and record all the operations (`torch` function calls, method calls, operators) that they touch into the growing FX Graph. If you’re doing graph transforms, you can wrap your own `Proxy` around a raw `Node` so that you can use the overloaded operators to add additional things to a `Graph`.

`class torch.fx.Interpreter(module)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/fx/interpreter.html#Interpreter) An Interpreter executes an FX graph Node-by-Node. This pattern can be useful for many things, including writing code transformations as well as analysis passes. Methods in the Interpreter class can be overridden to customize the behavior of execution. The map of overrideable methods in terms of call hierarchy:

    run()
        +-- run_node
            +-- placeholder()
            +-- get_attr()
            +-- call_function()
            +-- call_method()
            +-- call_module()
            +-- output()

#### Example

Suppose we want to swap all instances of `torch.neg` with `torch.sigmoid` and vice versa (including their `Tensor` method equivalents). We could subclass Interpreter like so:

    class NegSigmSwapInterpreter(Interpreter):
        def call_function(self, target : Target, args : Tuple, kwargs : Dict) -> Any:
            if target == torch.sigmoid:
                return torch.neg(*args, **kwargs)
            return super().call_function(target, args, kwargs)

        def call_method(self, target : Target, args : Tuple, kwargs : Dict) -> Any:
            if target == 'neg':
                call_self, *args_tail = args
                return call_self.sigmoid(*args_tail, **kwargs)
            return super().call_method(target, args, kwargs)

    def fn(x):
        return torch.sigmoid(x).neg()

    gm = torch.fx.symbolic_trace(fn)
    input = torch.randn(3, 4)
    result = NegSigmSwapInterpreter(gm).run(input)
    torch.testing.assert_allclose(result, torch.neg(input).sigmoid())

Parameters **module** (GraphModule) – The module to be executed

`call_function(target, args, kwargs)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/fx/interpreter.html#Interpreter.call_function) Execute a `call_function` node and return the result. Parameters * **target** (_Target_) – The call target for this node. See [Node](https://pytorch.org/docs/master/fx.html#torch.fx.Node) for details on semantics * **args** (_Tuple_) – Tuple of positional args for this invocation * **kwargs** (_Dict_) – Dict of keyword arguments for this invocation Return Any: The value returned by the function invocation

`call_method(target, args, kwargs)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/fx/interpreter.html#Interpreter.call_method) Execute a `call_method` node and return the result. Parameters * **target** (_Target_) – The call target for this node.
See [Node](https://pytorch.org/docs/master/fx.html#torch.fx.Node) for details on semantics * **args** (_Tuple_) – Tuple of positional args for this invocation * **kwargs** (_Dict_) – Dict of keyword arguments for this invocation Return Any: The value returned by the method invocation `call_module(target, args, kwargs)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/fx/interpreter.html#Interpreter.call_module) Execute a `call_module` node and return the result. Parameters * **target** (_Target_) – The call target for this node. See [Node](https://pytorch.org/docs/master/fx.html#torch.fx.Node) for details on semantics * **args** (_Tuple_) – Tuple of positional args for this invocation * **kwargs** (_Dict_) – Dict of keyword arguments for this invocation Return Any: The value returned by the module invocation `fetch_args_kwargs_from_env(n)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/fx/interpreter.html#Interpreter.fetch_args_kwargs_from_env) Fetch the concrete values of `args` and `kwargs` of node `n` from the current execution environment. Parameters **n** (Node) – The node for which `args` and `kwargs` should be fetched. Returns `args` and `kwargs` with concrete values for `n`. Return type Tuple[Tuple, Dict] `fetch_attr(target)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/fx/interpreter.html#Interpreter.fetch_attr) Fetch an attribute from the `Module` hierarchy of `self.module`. Parameters **target** ([str](https://docs.python.org/3/library/stdtypes.html#str "\(in Python v3.9\)")) – The fully-qualfiied name of the attribute to fetch Returns The value of the attribute. Return type Any `get_attr(target, args, kwargs)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/fx/interpreter.html#Interpreter.get_attr) Execute a `get_attr` node. Will retrieve an attribute value from the `Module` hierarchy of `self.module`. Parameters * **target** (_Target_) – The call target for this node. See [Node](https://pytorch.org/docs/master/fx.html#torch.fx.Node) for details on semantics * **args** (_Tuple_) – Tuple of positional args for this invocation * **kwargs** (_Dict_) – Dict of keyword arguments for this invocation Returns The value of the attribute that was retrieved Return type Any `map_nodes_to_values(args, n)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/fx/interpreter.html#Interpreter.map_nodes_to_values) Recursively descend through `args` and look up the concrete value for each `Node` in the current execution environment. Parameters * **args** (_Argument_) – Data structure within which to look up concrete values * **n** (Node) – Node to which `args` belongs. This is only used for error reporting. `output(target, args, kwargs)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/fx/interpreter.html#Interpreter.output) Execute an `output` node. This really just retrieves the value referenced by the `output` node and returns it. Parameters * **target** (_Target_) – The call target for this node. See [Node](https://pytorch.org/docs/master/fx.html#torch.fx.Node) for details on semantics * **args** (_Tuple_) – Tuple of positional args for this invocation * **kwargs** (_Dict_) – Dict of keyword arguments for this invocation Returns The return value referenced by the output node Return type Any `placeholder(target, args, kwargs)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/fx/interpreter.html#Interpreter.placeholder) Execute a `placeholder` node. 
Note that this is stateful: `Interpreter` maintains an internal iterator over arguments passed to `run` and this method returns next() on that iterator. Parameters * **target** (_Target_) – The call target for this node. See [Node](https://pytorch.org/docs/master/fx.html#torch.fx.Node) for details on semantics * **args** (_Tuple_) – Tuple of positional args for this invocation * **kwargs** (_Dict_) – Dict of keyword arguments for this invocation Returns The argument value that was retrieved. Return type Any

`run(*args, initial_env=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/fx/interpreter.html#Interpreter.run) Run `module` via interpretation and return the result. Parameters * ***args** – The arguments to the Module to run, in positional order * **initial_env** (_Optional_ _[__Dict_ _[_Node _,__Any_ _]__]_) – An optional starting environment for execution. This is a dict mapping `Node` to any value. This can be used, for example, to pre-populate results for certain `Nodes` so as to do only partial evaluation within the interpreter. Returns The value returned from executing the Module Return type Any

`run_node(n)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/fx/interpreter.html#Interpreter.run_node) Run a specific node `n` and return the result. Calls into placeholder, get_attr, call_function, call_method, call_module, or output depending on `node.op` Parameters **n** (Node) – The Node to execute Returns The result of executing `n` Return type Any

`class torch.fx.Transformer(module)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/fx/interpreter.html#Transformer) `Transformer` is a special type of interpreter that produces a new `Module`. It exposes a `transform()` method that returns the transformed `Module`. Unlike `Interpreter`, `Transformer` does not require arguments to run, since it works entirely symbolically.

#### Example

Suppose we want to swap all instances of `torch.neg` with `torch.sigmoid` and vice versa (including their `Tensor` method equivalents). We could subclass `Transformer` like so:

    class NegSigmSwapXformer(Transformer):
        def call_function(self, target : 'Target', args : Tuple[Argument, ...], kwargs : Dict[str, Any]) -> Any:
            if target == torch.sigmoid:
                return torch.neg(*args, **kwargs)
            return super().call_function(target, args, kwargs)

        def call_method(self, target : 'Target', args : Tuple[Argument, ...], kwargs : Dict[str, Any]) -> Any:
            if target == 'neg':
                call_self, *args_tail = args
                return call_self.sigmoid(*args_tail, **kwargs)
            return super().call_method(target, args, kwargs)

    def fn(x):
        return torch.sigmoid(x).neg()

    gm = torch.fx.symbolic_trace(fn)

    transformed : torch.nn.Module = NegSigmSwapXformer(gm).transform()
    input = torch.randn(3, 4)
    torch.testing.assert_allclose(transformed(input), torch.neg(input).sigmoid())

Parameters **module** (GraphModule) – The `Module` to be transformed.

`get_attr(target, args, kwargs)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/fx/interpreter.html#Transformer.get_attr) Execute a `get_attr` node. In `Transformer`, this is overridden to insert a new `get_attr` node into the output graph. Parameters * **target** (_Target_) – The call target for this node.
See [Node](https://pytorch.org/docs/master/fx.html#torch.fx.Node) for details on semantics * **args** (_Tuple_) – Tuple of positional args for this invocation * **kwargs** (_Dict_) – Dict of keyword arguments for this invocation `placeholder(target, args, kwargs)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/fx/interpreter.html#Transformer.placeholder) Execute a `placeholder` node. In `Transformer`, this is overridden to insert a new `placeholder` into the output graph. Parameters * **target** (_Target_) – The call target for this node. See [Node](https://pytorch.org/docs/master/fx.html#torch.fx.Node) for details on semantics * **args** (_Tuple_) – Tuple of positional args for this invocation * **kwargs** (_Dict_) – Dict of keyword arguments for this invocation `transform()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/fx/interpreter.html#Transformer.transform) Transform `self.module` and return the transformed `GraphModule`. `torch.fx.replace_pattern(gm, pattern, replacement)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/fx/subgraph_rewriter.html#replace_pattern) Matches all possible non-overlapping sets of operators and their data dependencies (`pattern`) in the Graph of a GraphModule (`gm`), then replaces each of these matched subgraphs with another subgraph (`replacement`). Parameters * **gm** – The GraphModule that wraps the Graph to operate on * **pattern** – The subgraph to match in `gm` for replacement * **replacement** – The subgraph to replace `pattern` with Returns A list of `Match` objects representing the places in the original graph that `pattern` was matched to. The list is empty if there are no matches. `Match` is defined as: class Match(NamedTuple): # Node from which the match was found anchor: Node # Maps nodes in the pattern subgraph to nodes in the larger graph nodes_map: Dict[Node, Node] Return type List[Match] Examples: import torch from torch.fx import symbolic_trace, subgraph_rewriter class M(torch.nn.Module): def __init__(self): super().__init__() def forward(self, x, w1, w2): m1 = torch.cat([w1, w2]).sum() m2 = torch.cat([w1, w2]).sum() return x + torch.max(m1) + torch.max(m2) def pattern(w1, w2): return torch.cat([w1, w2]).sum() def replacement(w1, w2): return torch.stack([w1, w2]) traced_module = symbolic_trace(M()) subgraph_rewriter.replace_pattern(traced_module, pattern, replacement) The above code will first match `pattern` in the `forward` method of `traced_module`. Pattern-matching is done based on use-def relationships, not node names. For example, if you had `p = torch.cat([a, b])` in `pattern`, you could match `m = torch.cat([a, b])` in the original `forward` function, despite the variable names being different (`p` vs `m`). The `return` statement in `pattern` is matched based on its value only; it may or may not match to the `return` statement in the larger graph. In other words, the pattern doesn’t have to extend to the end of the larger graph. When the pattern is matched, it will be removed from the larger function and replaced by `replacement`. If there are multiple matches for `pattern` in the larger function, each non-overlapping match will be replaced. In the case of a match overlap, the first found match in the set of overlapping matches will be replaced. (“First” here being defined as the first in a topological ordering of the Nodes’ use-def relationships. In most cases, the first Node is the parameter that appears directly after `self`, while the last Node is whatever the function returns.) 
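As a small sketch (reusing `traced_module`, `pattern`, and `replacement` from the example above), the call shown above can capture its return value to check how many sites were rewritten:

    matches = subgraph_rewriter.replace_pattern(traced_module, pattern, replacement)
    # One Match per non-overlapping occurrence that was rewritten; for the
    # freshly traced module above this should be 2 (m1 and m2).
    print(len(matches))
    for match in matches:
        print(match.anchor.name)       # the node from which each match was found
    traced_module.graph.lint()         # optional sanity check on the rewritten graph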
One important thing to note is that the parameters of the `pattern` Callable must be used in the Callable itself, and the parameters of the `replacement` Callable must match the pattern. The first rule is why, in the above code block, the `forward` function has parameters `x, w1, w2`, but the `pattern` function only has parameters `w1, w2`. `pattern` doesn’t use `x`, so it shouldn’t specify `x` as a parameter. As an example of the second rule, consider replacing

    def pattern(x, y):
        return torch.neg(x) + torch.relu(y)

with

    def replacement(x, y):
        return torch.relu(x)

In this case, `replacement` needs the same number of parameters as `pattern` (both `x` and `y`), even though the parameter `y` isn’t used in `replacement`. After calling `subgraph_rewriter.replace_pattern`, the generated Python code looks like this:

    def forward(self, x, w1, w2):
        stack_1 = torch.stack([w1, w2])
        sum_1 = stack_1.sum()
        stack_2 = torch.stack([w1, w2])
        sum_2 = stack_2.sum()
        max_1 = torch.max(sum_1)
        add_1 = x + max_1
        max_2 = torch.max(sum_2)
        add_2 = add_1 + max_2
        return add_2

# torch._assert

`torch._assert(condition, message)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch.html#_assert) A wrapper around Python’s assert which is symbolically traceable.

# torch.abs

`torch.abs(input, *, out=None) → Tensor` Computes the absolute value of each element in `input`. \text{out}_{i} = |\text{input}_{i}| Parameters **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. Example:

    >>> torch.abs(torch.tensor([-1, -2, 3]))
    tensor([ 1, 2, 3])

# torch.absolute

`torch.absolute(input, *, out=None) → Tensor` Alias for [`torch.abs()`](torch.abs#torch.abs "torch.abs")

# torch.acos

`torch.acos(input, *, out=None) → Tensor` Computes the inverse cosine of each element in `input`. \text{out}_{i} = \cos^{-1}(\text{input}_{i}) Parameters **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. Example:

    >>> a = torch.randn(4)
    >>> a
    tensor([ 0.3348, -0.5889, 0.2005, -0.1584])
    >>> torch.acos(a)
    tensor([ 1.2294, 2.2004, 1.3690, 1.7298])

# torch.acosh

`torch.acosh(input, *, out=None) → Tensor` Returns a new tensor with the inverse hyperbolic cosine of the elements of `input`. Note The domain of the inverse hyperbolic cosine is `[1, inf)` and values outside this range will be mapped to `NaN`, except for `+ INF` for which the output is mapped to `+ INF`. \text{out}_{i} = \cosh^{-1}(\text{input}_{i}) Parameters **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. Example:

    >>> a = torch.randn(4).uniform_(1, 2)
    >>> a
    tensor([ 1.3192, 1.9915, 1.9674, 1.7151 ])
    >>> torch.acosh(a)
    tensor([ 0.7791, 1.3120, 1.2979, 1.1341 ])

# torch.add

`torch.add(input, other, *, out=None)` Adds the scalar `other` to each element of the input `input` and returns a new resulting tensor. \text{out} = \text{input} + \text{other} If `input` is of type FloatTensor or DoubleTensor, `other` must be a real number, otherwise it should be an integer. Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor.
* **value** (_Number_) – the number to be added to each element of `input` Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. Example: >>> a = torch.randn(4) >>> a tensor([ 0.0202, 1.0985, 1.3506, -0.6056]) >>> torch.add(a, 20) tensor([ 20.0202, 21.0985, 21.3506, 19.3944]) `torch.add(input, other, *, alpha=1, out=None)` Each element of the tensor `other` is multiplied by the scalar `alpha` and added to each element of the tensor `input`. The resulting tensor is returned. The shapes of `input` and `other` must be [broadcastable](https://pytorch.org/docs/1.8.0/notes/broadcasting.html#broadcasting- semantics). out=input+alpha×other\text{out} = \text{input} + \text{alpha} \times \text{other} If `other` is of type FloatTensor or DoubleTensor, `alpha` must be a real number, otherwise it should be an integer. Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the first input tensor * **other** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the second input tensor Keyword Arguments * **alpha** (_Number_) – the scalar multiplier for `other` * **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. Example: >>> a = torch.randn(4) >>> a tensor([-0.9732, -0.3497, 0.6245, 0.4022]) >>> b = torch.randn(4, 1) >>> b tensor([[ 0.3743], [-1.7724], [-0.5811], [-0.8017]]) >>> torch.add(a, b, alpha=10) tensor([[ 2.7695, 3.3930, 4.3672, 4.1450], [-18.6971, -18.0736, -17.0994, -17.3216], [ -6.7845, -6.1610, -5.1868, -5.4090], [ -8.9902, -8.3667, -7.3925, -7.6147]]) # torch.addbmm `torch.addbmm(input, batch1, batch2, *, beta=1, alpha=1, out=None) → Tensor` Performs a batch matrix-matrix product of matrices stored in `batch1` and `batch2`, with a reduced add step (all matrix multiplications get accumulated along the first dimension). `input` is added to the final result. `batch1` and `batch2` must be 3-D tensors each containing the same number of matrices. If `batch1` is a (b×n×m)(b \times n \times m) tensor, `batch2` is a (b×m×p)(b \times m \times p) tensor, `input` must be [broadcastable](https://pytorch.org/docs/1.8.0/notes/broadcasting.html#broadcasting- semantics) with a (n×p)(n \times p) tensor and `out` will be a (n×p)(n \times p) tensor. out=β input+α(∑i=0b−1batch1i@batch2i)out = \beta\ \text{input} + \alpha\ (\sum_{i=0}^{b-1} \text{batch1}_i \mathbin{@} \text{batch2}_i) If `beta` is 0, then `input` will be ignored, and `nan` and `inf` in it will not be propagated. For inputs of type `FloatTensor` or `DoubleTensor`, arguments `beta` and `alpha` must be real numbers, otherwise they should be integers. This operator supports [TensorFloat32](https://pytorch.org/docs/1.8.0/notes/cuda.html#tf32-on- ampere). Parameters * **batch1** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the first batch of matrices to be multiplied * **batch2** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the second batch of matrices to be multiplied Keyword Arguments * **beta** (_Number_ _,__optional_) – multiplier for `input` (β\beta ) * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – matrix to be added * **alpha** (_Number_ _,__optional_) – multiplier for `batch1 @ batch2` (α\alpha ) * **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. 
Example: >>> M = torch.randn(3, 5) >>> batch1 = torch.randn(10, 3, 4) >>> batch2 = torch.randn(10, 4, 5) >>> torch.addbmm(M, batch1, batch2) tensor([[ 6.6311, 0.0503, 6.9768, -12.0362, -2.1653], [ -4.8185, -1.4255, -6.6760, 8.9453, 2.5743], [ -3.8202, 4.3691, 1.0943, -1.1109, 5.4730]]) # torch.addcdiv `torch.addcdiv(input, tensor1, tensor2, *, value=1, out=None) → Tensor` Performs the element-wise division of `tensor1` by `tensor2`, multiply the result by the scalar `value` and add it to `input`. Warning Integer division with addcdiv is no longer supported, and in a future release addcdiv will perform a true division of tensor1 and tensor2. The historic addcdiv behavior can be implemented as (input + value * torch.trunc(tensor1 / tensor2)).to(input.dtype) for integer inputs and as (input + value * tensor1 / tensor2) for float inputs. The future addcdiv behavior is just the latter implementation: (input + value * tensor1 / tensor2), for all dtypes. outi=inputi+value×tensor1itensor2i\text{out}_i = \text{input}_i + \text{value} \times \frac{\text{tensor1}_i}{\text{tensor2}_i} The shapes of `input`, `tensor1`, and `tensor2` must be [broadcastable](https://pytorch.org/docs/1.8.0/notes/broadcasting.html#broadcasting- semantics). For inputs of type `FloatTensor` or `DoubleTensor`, `value` must be a real number, otherwise an integer. Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the tensor to be added * **tensor1** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the numerator tensor * **tensor2** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the denominator tensor Keyword Arguments * **value** (_Number_ _,__optional_) – multiplier for tensor1/tensor2\text{tensor1} / \text{tensor2} * **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. Example: >>> t = torch.randn(1, 3) >>> t1 = torch.randn(3, 1) >>> t2 = torch.randn(1, 3) >>> torch.addcdiv(t, t1, t2, value=0.1) tensor([[-0.2312, -3.6496, 0.1312], [-1.0428, 3.4292, -0.1030], [-0.5369, -0.9829, 0.0430]]) # torch.addcmul `torch.addcmul(input, tensor1, tensor2, *, value=1, out=None) → Tensor` Performs the element-wise multiplication of `tensor1` by `tensor2`, multiply the result by the scalar `value` and add it to `input`. outi=inputi+value×tensor1i×tensor2i\text{out}_i = \text{input}_i + \text{value} \times \text{tensor1}_i \times \text{tensor2}_i The shapes of [`tensor`](torch.tensor#torch.tensor "torch.tensor"), `tensor1`, and `tensor2` must be [broadcastable](https://pytorch.org/docs/1.8.0/notes/broadcasting.html#broadcasting- semantics). For inputs of type `FloatTensor` or `DoubleTensor`, `value` must be a real number, otherwise an integer. Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the tensor to be added * **tensor1** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the tensor to be multiplied * **tensor2** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the tensor to be multiplied Keyword Arguments * **value** (_Number_ _,__optional_) – multiplier for tensor1.∗tensor2tensor1 .* tensor2 * **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. 
Example: >>> t = torch.randn(1, 3) >>> t1 = torch.randn(3, 1) >>> t2 = torch.randn(1, 3) >>> torch.addcmul(t, t1, t2, value=0.1) tensor([[-0.8635, -0.6391, 1.6174], [-0.7617, -0.5879, 1.7388], [-0.8353, -0.6249, 1.6511]]) # torch.addmm `torch.addmm(input, mat1, mat2, *, beta=1, alpha=1, out=None) → Tensor` Performs a matrix multiplication of the matrices `mat1` and `mat2`. The matrix `input` is added to the final result. If `mat1` is a (n×m)(n \times m) tensor, `mat2` is a (m×p)(m \times p) tensor, then `input` must be [broadcastable](https://pytorch.org/docs/1.8.0/notes/broadcasting.html#broadcasting- semantics) with a (n×p)(n \times p) tensor and `out` will be a (n×p)(n \times p) tensor. `alpha` and `beta` are scaling factors on matrix-vector product between `mat1` and `mat2` and the added matrix `input` respectively. out=β input+α(mat1i@mat2i)\text{out} = \beta\ \text{input} + \alpha\ (\text{mat1}_i \mathbin{@} \text{mat2}_i) If `beta` is 0, then `input` will be ignored, and `nan` and `inf` in it will not be propagated. For inputs of type `FloatTensor` or `DoubleTensor`, arguments `beta` and `alpha` must be real numbers, otherwise they should be integers. This operator supports [TensorFloat32](https://pytorch.org/docs/1.8.0/notes/cuda.html#tf32-on- ampere). Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – matrix to be added * **mat1** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the first matrix to be matrix multiplied * **mat2** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the second matrix to be matrix multiplied Keyword Arguments * **beta** (_Number_ _,__optional_) – multiplier for `input` (β\beta ) * **alpha** (_Number_ _,__optional_) – multiplier for mat1@mat2mat1 @ mat2 (α\alpha ) * **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. Example: >>> M = torch.randn(2, 3) >>> mat1 = torch.randn(2, 3) >>> mat2 = torch.randn(3, 3) >>> torch.addmm(M, mat1, mat2) tensor([[-4.8716, 1.4671, -1.3746], [ 0.7573, -3.9555, -2.8681]]) # torch.addmv `torch.addmv(input, mat, vec, *, beta=1, alpha=1, out=None) → Tensor` Performs a matrix-vector product of the matrix `mat` and the vector `vec`. The vector `input` is added to the final result. If `mat` is a (n×m)(n \times m) tensor, `vec` is a 1-D tensor of size `m`, then `input` must be [broadcastable](https://pytorch.org/docs/1.8.0/notes/broadcasting.html#broadcasting- semantics) with a 1-D tensor of size `n` and `out` will be 1-D tensor of size `n`. `alpha` and `beta` are scaling factors on matrix-vector product between `mat` and `vec` and the added tensor `input` respectively. out=β input+α(mat@vec)\text{out} = \beta\ \text{input} + \alpha\ (\text{mat} \mathbin{@} \text{vec}) If `beta` is 0, then `input` will be ignored, and `nan` and `inf` in it will not be propagated. For inputs of type `FloatTensor` or `DoubleTensor`, arguments `beta` and `alpha` must be real numbers, otherwise they should be integers Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – vector to be added * **mat** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – matrix to be matrix multiplied * **vec** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – vector to be matrix multiplied Keyword Arguments * **beta** (_Number_ _,__optional_) – multiplier for `input` (β\beta ) * **alpha** (_Number_ _,__optional_) – multiplier for mat@vecmat @ vec (α\alpha ) * **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. 
Example: >>> M = torch.randn(2) >>> mat = torch.randn(2, 3) >>> vec = torch.randn(3) >>> torch.addmv(M, mat, vec) tensor([-0.3768, -5.5565]) # torch.addr `torch.addr(input, vec1, vec2, *, beta=1, alpha=1, out=None) → Tensor` Performs the outer-product of vectors `vec1` and `vec2` and adds it to the matrix `input`. Optional values `beta` and `alpha` are scaling factors on the outer product between `vec1` and `vec2` and the added matrix `input` respectively. out=β input+α(vec1⊗vec2)\text{out} = \beta\ \text{input} + \alpha\ (\text{vec1} \otimes \text{vec2}) If `beta` is 0, then `input` will be ignored, and `nan` and `inf` in it will not be propagated. If `vec1` is a vector of size `n` and `vec2` is a vector of size `m`, then `input` must be [broadcastable](https://pytorch.org/docs/1.8.0/notes/broadcasting.html#broadcasting- semantics) with a matrix of size (n×m)(n \times m) and `out` will be a matrix of size (n×m)(n \times m) . Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – matrix to be added * **vec1** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the first vector of the outer product * **vec2** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the second vector of the outer product Keyword Arguments * **beta** (_Number_ _,__optional_) – multiplier for `input` (β\beta ) * **alpha** (_Number_ _,__optional_) – multiplier for vec1⊗vec2\text{vec1} \otimes \text{vec2} (α\alpha ) * **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. Example: >>> vec1 = torch.arange(1., 4.) >>> vec2 = torch.arange(1., 3.) >>> M = torch.zeros(3, 2) >>> torch.addr(M, vec1, vec2) tensor([[ 1., 2.], [ 2., 4.], [ 3., 6.]]) # torch.all `torch.all(input) → Tensor` Tests if all elements in `input` evaluate to `True`. Note This function matches the behaviour of NumPy in returning output of dtype `bool` for all supported dtypes except `uint8`. For `uint8` the dtype of output is `uint8` itself. Example: >>> a = torch.rand(1, 2).bool() >>> a tensor([[False, True]], dtype=torch.bool) >>> torch.all(a) tensor(False, dtype=torch.bool) >>> a = torch.arange(0, 3) >>> a tensor([0, 1, 2]) >>> torch.all(a) tensor(False) `torch.all(input, dim, keepdim=False, *, out=None) → Tensor` For each row of `input` in the given dimension `dim`, returns `True` if all elements in the row evaluate to `True` and `False` otherwise. If `keepdim` is `True`, the output tensor is of the same size as `input` except in the dimension `dim` where it is of size 1. Otherwise, `dim` is squeezed (see [`torch.squeeze()`](torch.squeeze#torch.squeeze "torch.squeeze")), resulting in the output tensor having 1 fewer dimension than `input`. Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. * **dim** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – the dimension to reduce. * **keepdim** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – whether the output tensor has `dim` retained or not. Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. 
Example: >>> a = torch.rand(4, 2).bool() >>> a tensor([[True, True], [True, False], [True, True], [True, True]], dtype=torch.bool) >>> torch.all(a, dim=1) tensor([ True, False, True, True], dtype=torch.bool) >>> torch.all(a, dim=0) tensor([ True, False], dtype=torch.bool) # torch.allclose `torch.allclose(input, other, rtol=1e-05, atol=1e-08, equal_nan=False) → bool` This function checks if all `input` and `other` satisfy the condition: ∣input−other∣≤atol+rtol×∣other∣\lvert \text{input} - \text{other} \rvert \leq \texttt{atol} + \texttt{rtol} \times \lvert \text{other} \rvert elementwise, for all elements of `input` and `other`. The behaviour of this function is analogous to [numpy.allclose](https://docs.scipy.org/doc/numpy/reference/generated/numpy.allclose.html) Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – first tensor to compare * **other** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – second tensor to compare * **atol** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _,__optional_) – absolute tolerance. Default: 1e-08 * **rtol** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _,__optional_) – relative tolerance. Default: 1e-05 * **equal_nan** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – if `True`, then two `NaN` s will be considered equal. Default: `False` Example: >>> torch.allclose(torch.tensor([10000., 1e-07]), torch.tensor([10000.1, 1e-08])) False >>> torch.allclose(torch.tensor([10000., 1e-08]), torch.tensor([10000.1, 1e-09])) True >>> torch.allclose(torch.tensor([1.0, float('nan')]), torch.tensor([1.0, float('nan')])) False >>> torch.allclose(torch.tensor([1.0, float('nan')]), torch.tensor([1.0, float('nan')]), equal_nan=True) True # torch.amax `torch.amax(input, dim, keepdim=False, *, out=None) → Tensor` Returns the maximum value of each slice of the `input` tensor in the given dimension(s) `dim`. Note `The difference between max/min and amax/amin is:` * `amax`/`amin` supports reducing on multiple dimensions, * `amax`/`amin` does not return indices, * `amax`/`amin` evenly distributes gradient between equal values, while `max(dim)`/`min(dim)` propagates gradient only to a single index in the source tensor. If `keepdim is ``True``, the output tensors are of the same size as `input` except in the dimension(s) `dim` where they are of size 1. Otherwise, `dim`s are squeezed (see :func:`torch.squeeze`), resulting in the output tensors having fewer dimension than `input`. Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. * **dim** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _or_ _tuple of python:ints_) – the dimension or dimensions to reduce. * **keepdim** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – whether the output tensor has `dim` retained or not. Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. Example: >>> a = torch.randn(4, 4) >>> a tensor([[ 0.8177, 1.4878, -0.2491, 0.9130], [-0.7158, 1.1775, 2.0992, 0.4817], [-0.0053, 0.0164, -1.3738, -0.0507], [ 1.9700, 1.1106, -1.0318, -1.0816]]) >>> torch.amax(a, 1) tensor([1.4878, 2.0992, 0.0164, 1.9700]) # torch.amin `torch.amin(input, dim, keepdim=False, *, out=None) → Tensor` Returns the minimum value of each slice of the `input` tensor in the given dimension(s) `dim`. 
Note `The difference between max/min and amax/amin is:` * `amax`/`amin` supports reducing on multiple dimensions, * `amax`/`amin` does not return indices, * `amax`/`amin` evenly distributes gradient between equal values, while `max(dim)`/`min(dim)` propagates gradient only to a single index in the source tensor. If `keepdim` is `True`, the output tensors are of the same size as `input` except in the dimension(s) `dim` where they are of size 1. Otherwise, `dim`s are squeezed (see :func:`torch.squeeze`), resulting in the output tensors having fewer dimensions than `input`. Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. * **dim** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _or_ _tuple of python:ints_) – the dimension or dimensions to reduce. * **keepdim** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – whether the output tensor has `dim` retained or not. Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. Example: >>> a = torch.randn(4, 4) >>> a tensor([[ 0.6451, -0.4866, 0.2987, -1.3312], [-0.5744, 1.2980, 1.8397, -0.2713], [ 0.9128, 0.9214, -1.7268, -0.2995], [ 0.9023, 0.4853, 0.9075, -1.6165]]) >>> torch.amin(a, 1) tensor([-1.3312, -0.5744, -1.7268, -1.6165]) # torch.angle `torch.angle(input, *, out=None) → Tensor` Computes the element-wise angle (in radians) of the given `input` tensor. outi=angle(inputi)\text{out}_{i} = angle(\text{input}_{i}) Parameters **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. Note Starting in PyTorch 1.8, angle returns pi for negative real numbers, zero for non-negative real numbers, and propagates NaNs. Previously the function would return zero for all real numbers and not propagate floating-point NaNs. Example: >>> torch.angle(torch.tensor([-1 + 1j, -2 + 2j, 3 - 3j]))*180/3.14159 tensor([ 135., 135, -45]) # torch.any `torch.any(input) → Tensor` Parameters **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. Tests if any element in `input` evaluates to `True`. Note This function matches the behaviour of NumPy in returning output of dtype `bool` for all supported dtypes except `uint8`. For `uint8` the dtype of output is `uint8` itself. Example: >>> a = torch.rand(1, 2).bool() >>> a tensor([[False, True]], dtype=torch.bool) >>> torch.any(a) tensor(True, dtype=torch.bool) >>> a = torch.arange(0, 3) >>> a tensor([0, 1, 2]) >>> torch.any(a) tensor(True) `torch.any(input, dim, keepdim=False, *, out=None) → Tensor` For each row of `input` in the given dimension `dim`, returns `True` if any element in the row evaluate to `True` and `False` otherwise. If `keepdim` is `True`, the output tensor is of the same size as `input` except in the dimension `dim` where it is of size 1. Otherwise, `dim` is squeezed (see [`torch.squeeze()`](torch.squeeze#torch.squeeze "torch.squeeze")), resulting in the output tensor having 1 fewer dimension than `input`. Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. * **dim** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – the dimension to reduce. * **keepdim** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – whether the output tensor has `dim` retained or not. 
Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. Example: >>> a = torch.randn(4, 2) < 0 >>> a tensor([[ True, True], [False, True], [ True, True], [False, False]]) >>> torch.any(a, 1) tensor([ True, True, True, False]) >>> torch.any(a, 0) tensor([True, True]) # torch.arange `torch.arange(start=0, end, step=1, *, out=None, dtype=None, layout=torch.strided, device=None, requires_grad=False) → Tensor` Returns a 1-D tensor of size ⌈end−startstep⌉\left\lceil \frac{\text{end} - \text{start}}{\text{step}} \right\rceil with values from the interval `[start, end)` taken with common difference `step` beginning from `start`. Note that non-integer `step` is subject to floating point rounding errors when comparing against `end`; to avoid inconsistency, we advise adding a small epsilon to `end` in such cases. outi+1=outi+step\text{out}_{{i+1}} = \text{out}_{i} + \text{step} Parameters * **start** (_Number_) – the starting value for the set of points. Default: `0`. * **end** (_Number_) – the ending value for the set of points * **step** (_Number_) – the gap between each pair of adjacent points. Default: `1`. Keyword Arguments * **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. * **dtype** ([`torch.dtype`](../tensor_attributes#torch.torch.dtype "torch.torch.dtype"), optional) – the desired data type of returned tensor. Default: if `None`, uses a global default (see [`torch.set_default_tensor_type()`](torch.set_default_tensor_type#torch.set_default_tensor_type "torch.set_default_tensor_type")). If `dtype` is not given, infer the data type from the other input arguments. If any of `start`, `end`, or `stop` are floating-point, the `dtype` is inferred to be the default dtype, see [`get_default_dtype()`](torch.get_default_dtype#torch.get_default_dtype "torch.get_default_dtype"). Otherwise, the `dtype` is inferred to be `torch.int64`. * **layout** ([`torch.layout`](../tensor_attributes#torch.torch.layout "torch.torch.layout"), optional) – the desired layout of returned Tensor. Default: `torch.strided`. * **device** ([`torch.device`](../tensor_attributes#torch.torch.device "torch.torch.device"), optional) – the desired device of returned tensor. Default: if `None`, uses the current device for the default tensor type (see [`torch.set_default_tensor_type()`](torch.set_default_tensor_type#torch.set_default_tensor_type "torch.set_default_tensor_type")). `device` will be the CPU for CPU tensor types and the current CUDA device for CUDA tensor types. * **requires_grad** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – If autograd should record operations on the returned tensor. Default: `False`. Example: >>> torch.arange(5) tensor([ 0, 1, 2, 3, 4]) >>> torch.arange(1, 4) tensor([ 1, 2, 3]) >>> torch.arange(1, 2.5, 0.5) tensor([ 1.0000, 1.5000, 2.0000]) # torch.arccos `torch.arccos(input, *, out=None) → Tensor` Alias for [`torch.acos()`](torch.acos#torch.acos "torch.acos"). # torch.arccosh `torch.arccosh(input, *, out=None) → Tensor` Alias for [`torch.acosh()`](torch.acosh#torch.acosh "torch.acosh"). # torch.arcsin `torch.arcsin(input, *, out=None) → Tensor` Alias for [`torch.asin()`](torch.asin#torch.asin "torch.asin"). # torch.arcsinh `torch.arcsinh(input, *, out=None) → Tensor` Alias for [`torch.asinh()`](torch.asinh#torch.asinh "torch.asinh"). 
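Returning to the `torch.arange` entry above: its dtype inference and the floating-point-step caveat can be seen in a minimal sketch (the epsilon value below is an illustrative choice, not one prescribed by the documentation):

```python
import torch

# Integer arguments infer torch.int64; a floating-point argument infers the default dtype.
print(torch.arange(5).dtype)          # torch.int64
print(torch.arange(0, 5, 0.5).dtype)  # torch.float32 (the default dtype)

# With a non-integer step, rounding error in the comparison against `end` can
# change the length; adding a small epsilon to `end` keeps it consistent.
eps = 1e-6
t = torch.arange(0.0, 1.0 + eps, 0.1)
print(t.numel())  # 11 values: 0.0, 0.1, ..., 1.0
```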
# torch.arctan

`torch.arctan(input, *, out=None) → Tensor` Alias for [`torch.atan()`](torch.atan#torch.atan "torch.atan").

# torch.arctanh

`torch.arctanh(input, *, out=None) → Tensor` Alias for [`torch.atanh()`](torch.atanh#torch.atanh "torch.atanh").

# torch.are_deterministic_algorithms_enabled

`torch.are_deterministic_algorithms_enabled()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch.html#are_deterministic_algorithms_enabled) Returns True if the global deterministic flag is turned on. Refer to [`torch.use_deterministic_algorithms()`](torch.use_deterministic_algorithms#torch.use_deterministic_algorithms "torch.use_deterministic_algorithms") documentation for more details.

# torch.argmax

`torch.argmax(input) → LongTensor` Returns the indices of the maximum value of all elements in the `input` tensor. This is the second value returned by [`torch.max()`](torch.max#torch.max "torch.max"). See its documentation for the exact semantics of this method. Note If there are multiple maximal values then the indices of the first maximal value are returned. Parameters **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. Example: >>> a = torch.randn(4, 4) >>> a tensor([[ 1.3398, 0.2663, -0.2686, 0.2450], [-0.7401, -0.8805, -0.3402, -1.1936], [ 0.4907, -1.3948, -1.0691, -0.3132], [-1.6092, 0.5419, -0.2993, 0.3195]]) >>> torch.argmax(a) tensor(0) `torch.argmax(input, dim, keepdim=False) → LongTensor` Returns the indices of the maximum values of a tensor across a dimension. This is the second value returned by [`torch.max()`](torch.max#torch.max "torch.max"). See its documentation for the exact semantics of this method. Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. * **dim** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – the dimension to reduce. If `None`, the argmax of the flattened input is returned. * **keepdim** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – whether the output tensor has `dim` retained or not. Ignored if `dim=None`. Example: >>> a = torch.randn(4, 4) >>> a tensor([[ 1.3398, 0.2663, -0.2686, 0.2450], [-0.7401, -0.8805, -0.3402, -1.1936], [ 0.4907, -1.3948, -1.0691, -0.3132], [-1.6092, 0.5419, -0.2993, 0.3195]]) >>> torch.argmax(a, dim=1) tensor([ 0, 2, 0, 1])

# torch.argmin

`torch.argmin(input, dim=None, keepdim=False) → LongTensor` Returns the indices of the minimum value(s) of the flattened tensor or along a dimension. This is the second value returned by [`torch.min()`](torch.min#torch.min "torch.min"). See its documentation for the exact semantics of this method. Note If there are multiple minimal values then the indices of the first minimal value are returned. Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. * **dim** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – the dimension to reduce. If `None`, the argmin of the flattened input is returned. * **keepdim** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – whether the output tensor has `dim` retained or not. Ignored if `dim=None`.
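The tie-breaking rule in the notes above (the index of the first maximal or minimal value is returned) can be checked directly; a minimal sketch with deliberately repeated values (the tensor is illustrative):

```python
import torch

x = torch.tensor([3.0, 1.0, 3.0, 1.0])

# The maximum 3.0 appears at indices 0 and 2; the minimum 1.0 at indices 1 and 3.
print(torch.argmax(x))  # tensor(0) -- index of the first maximal value
print(torch.argmin(x))  # tensor(1) -- index of the first minimal value
```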
Example: >>> a = torch.randn(4, 4) >>> a tensor([[ 0.1139, 0.2254, -0.1381, 0.3687], [ 1.0100, -1.1975, -0.0102, -0.4732], [-0.9240, 0.1207, -0.7506, -1.0213], [ 1.7809, -1.2960, 0.9384, 0.1438]]) >>> torch.argmin(a) tensor(13) >>> torch.argmin(a, dim=1) tensor([ 2, 1, 3, 1]) >>> torch.argmin(a, dim=1, keepdim=True) tensor([[2], [1], [3], [1]]) # torch.argsort `torch.argsort(input, dim=-1, descending=False) → LongTensor` Returns the indices that sort a tensor along a given dimension in ascending order by value. This is the second value returned by [`torch.sort()`](torch.sort#torch.sort "torch.sort"). See its documentation for the exact semantics of this method. Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. * **dim** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – the dimension to sort along * **descending** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – controls the sorting order (ascending or descending) Example: >>> a = torch.randn(4, 4) >>> a tensor([[ 0.0785, 1.5267, -0.8521, 0.4065], [ 0.1598, 0.0788, -0.0745, -1.2700], [ 1.2208, 1.0722, -0.7064, 1.2564], [ 0.0669, -0.2318, -0.8229, -0.9280]]) >>> torch.argsort(a, dim=1) tensor([[2, 0, 3, 1], [3, 2, 1, 0], [2, 1, 0, 3], [3, 2, 1, 0]]) # torch.as_strided `torch.as_strided(input, size, stride, storage_offset=0) → Tensor` Create a view of an existing `torch.Tensor` `input` with specified `size`, `stride` and `storage_offset`. Warning More than one element of a created tensor may refer to a single memory location. As a result, in-place operations (especially ones that are vectorized) may result in incorrect behavior. If you need to write to the tensors, please clone them first. Many PyTorch functions, which return a view of a tensor, are internally implemented with this function. Those functions, like [`torch.Tensor.expand()`](../tensors#torch.Tensor.expand "torch.Tensor.expand"), are easier to read and are therefore more advisable to use. Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. * **size** ([tuple](https://docs.python.org/3/library/stdtypes.html#tuple "\(in Python v3.9\)") _or_ _ints_) – the shape of the output tensor * **stride** ([tuple](https://docs.python.org/3/library/stdtypes.html#tuple "\(in Python v3.9\)") _or_ _ints_) – the stride of the output tensor * **storage_offset** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – the offset in the underlying storage of the output tensor Example: >>> x = torch.randn(3, 3) >>> x tensor([[ 0.9039, 0.6291, 1.0795], [ 0.1586, 2.1939, -0.4900], [-0.1909, -0.7503, 1.9355]]) >>> t = torch.as_strided(x, (2, 2), (1, 2)) >>> t tensor([[0.9039, 1.0795], [0.6291, 0.1586]]) >>> t = torch.as_strided(x, (2, 2), (1, 2), 1) tensor([[0.6291, 0.1586], [1.0795, 2.1939]]) # torch.as_tensor `torch.as_tensor(data, dtype=None, device=None) → Tensor` Convert the data into a `torch.Tensor`. If the data is already a `Tensor` with the same `dtype` and `device`, no copy will be performed, otherwise a new `Tensor` will be returned with computational graph retained if data `Tensor` has `requires_grad=True`. Similarly, if the data is an `ndarray` of the corresponding `dtype` and the `device` is the cpu, no copy will be performed. Parameters * **data** (_array_like_) – Initial data for the tensor. Can be a list, tuple, NumPy `ndarray`, scalar, and other types. 
* **dtype** ([`torch.dtype`](../tensor_attributes#torch.torch.dtype "torch.torch.dtype"), optional) – the desired data type of returned tensor. Default: if `None`, infers data type from `data`. * **device** ([`torch.device`](../tensor_attributes#torch.torch.device "torch.torch.device"), optional) – the desired device of returned tensor. Default: if `None`, uses the current device for the default tensor type (see [`torch.set_default_tensor_type()`](torch.set_default_tensor_type#torch.set_default_tensor_type "torch.set_default_tensor_type")). `device` will be the CPU for CPU tensor types and the current CUDA device for CUDA tensor types. Example: >>> a = numpy.array([1, 2, 3]) >>> t = torch.as_tensor(a) >>> t tensor([ 1, 2, 3]) >>> t[0] = -1 >>> a array([-1, 2, 3]) >>> a = numpy.array([1, 2, 3]) >>> t = torch.as_tensor(a, device=torch.device('cuda')) >>> t tensor([ 1, 2, 3]) >>> t[0] = -1 >>> a array([1, 2, 3]) # torch.asin `torch.asin(input, *, out=None) → Tensor` Returns a new tensor with the arcsine of the elements of `input`. outi=sin⁡−1(inputi)\text{out}_{i} = \sin^{-1}(\text{input}_{i}) Parameters **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. Example: >>> a = torch.randn(4) >>> a tensor([-0.5962, 1.4985, -0.4396, 1.4525]) >>> torch.asin(a) tensor([-0.6387, nan, -0.4552, nan]) # torch.asinh `torch.asinh(input, *, out=None) → Tensor` Returns a new tensor with the inverse hyperbolic sine of the elements of `input`. outi=sinh⁡−1(inputi)\text{out}_{i} = \sinh^{-1}(\text{input}_{i}) Parameters **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. Example: >>> a = torch.randn(4) >>> a tensor([ 0.1606, -1.4267, -1.0899, -1.0250 ]) >>> torch.asinh(a) tensor([ 0.1599, -1.1534, -0.9435, -0.8990 ]) # torch.atan `torch.atan(input, *, out=None) → Tensor` Returns a new tensor with the arctangent of the elements of `input`. outi=tan⁡−1(inputi)\text{out}_{i} = \tan^{-1}(\text{input}_{i}) Parameters **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. Example: >>> a = torch.randn(4) >>> a tensor([ 0.2341, 0.2539, -0.6256, -0.6448]) >>> torch.atan(a) tensor([ 0.2299, 0.2487, -0.5591, -0.5727]) # torch.atan2 `torch.atan2(input, other, *, out=None) → Tensor` Element-wise arctangent of inputi/otheri\text{input}_{i} / \text{other}_{i} with consideration of the quadrant. Returns a new tensor with the signed angles in radians between vector (otheri,inputi)(\text{other}_{i}, \text{input}_{i}) and vector (1,0)(1, 0) . (Note that otheri\text{other}_{i} , the second parameter, is the x-coordinate, while inputi\text{input}_{i} , the first parameter, is the y-coordinate.) The shapes of `input` and `other` must be [broadcastable](https://pytorch.org/docs/1.8.0/notes/broadcasting.html#broadcasting- semantics). Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the first input tensor * **other** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the second input tensor Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. 
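Since `other` is the x-coordinate, `torch.atan2` can distinguish quadrants that the plain ratio passed to `torch.atan` cannot; a minimal sketch (the values are illustrative):

```python
import math
import torch

y = torch.tensor([ 1.0, -1.0])  # y-coordinates (first argument)
x = torch.tensor([-1.0, -1.0])  # x-coordinates (second argument)

# atan of the ratio folds opposite quadrants together ...
print(torch.atan(y / x) * 180 / math.pi)   # approximately [-45., 45.]
# ... while atan2 keeps the full signed angle.
print(torch.atan2(y, x) * 180 / math.pi)   # approximately [135., -135.]
```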
Example: >>> a = torch.randn(4) >>> a tensor([ 0.9041, 0.0196, -0.3108, -2.4423]) >>> torch.atan2(a, torch.randn(4)) tensor([ 0.9833, 0.0811, -1.9743, -1.4151]) # torch.atanh `torch.atanh(input, *, out=None) → Tensor` Returns a new tensor with the inverse hyperbolic tangent of the elements of `input`. Note The domain of the inverse hyperbolic tangent is `(-1, 1)` and values outside this range will be mapped to `NaN`, except for the values `1` and `-1` for which the output is mapped to `+/-INF` respectively. outi=tanh⁡−1(inputi)\text{out}_{i} = \tanh^{-1}(\text{input}_{i}) Parameters **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. Example: >>> a = torch.randn(4).uniform_(-1, 1) >>> a tensor([ -0.9385, 0.2968, -0.8591, -0.1871 ]) >>> torch.atanh(a) tensor([ -1.7253, 0.3060, -1.2899, -0.1893 ]) # torch.atleast_1d `torch.atleast_1d(*tensors)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/functional.html#atleast_1d) Returns a 1-dimensional view of each input tensor with zero dimensions. Input tensors with one or more dimensions are returned as-is. Parameters **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _or_ _list of Tensors_) – Returns output (Tensor or tuple of Tensors) Example:: >>> x = torch.randn(2) >>> x tensor([1.4584, 0.7583]) >>> torch.atleast_1d(x) tensor([1.4584, 0.7583]) >>> x = torch.tensor(1.) >>> x tensor(1.) >>> torch.atleast_1d(x) tensor([1.]) >>> x = torch.tensor(0.5) >>> y = torch.tensor(1.) >>> torch.atleast_1d((x,y)) (tensor([0.5000]), tensor([1.])) # torch.atleast_2d `torch.atleast_2d(*tensors)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/functional.html#atleast_2d) Returns a 2-dimensional view of each input tensor with zero dimensions. Input tensors with two or more dimensions are returned as-is. :param input: :type input: Tensor or list of Tensors Returns output (Tensor or tuple of Tensors) Example:: >>> x = torch.tensor(1.) >>> x tensor(1.) >>> torch.atleast_2d(x) tensor([[1.]]) >>> x = torch.randn(2,2) >>> x tensor([[2.2086, 2.5165], [0.1757, 0.5194]]) >>> torch.atleast_2d(x) tensor([[2.2086, 2.5165], [0.1757, 0.5194]]) >>> x = torch.tensor(0.5) >>> y = torch.tensor(1.) >>> torch.atleast_2d((x,y)) (tensor([[0.5000]]), tensor([[1.]])) # torch.atleast_3d `torch.atleast_3d(*tensors)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/functional.html#atleast_3d) Returns a 3-dimensional view of each input tensor with zero dimensions. Input tensors with three or more dimensions are returned as-is. :param input: :type input: Tensor or list of Tensors Returns output (Tensor or tuple of Tensors) #### Example >>> x = torch.tensor(0.5) >>> x tensor(0.5000) >>> torch.atleast_3d(x) tensor([[[0.5000]]]) >>> y = torch.randn(2,2) >>> y tensor([[-0.8079, 0.7460], [-1.1647, 1.4734]]) >>> torch.atleast_3d(y) tensor([[[-0.8079], [ 0.7460]], [[-1.1647], [ 1.4734]]]) >>> x = torch.randn(1,1,1) >>> x tensor([[[-1.5689]]]) >>> torch.atleast_3d(x) tensor([[[-1.5689]]]) >>> x = torch.tensor(0.5) >>> y = torch.tensor(1.) >>> torch.atleast_3d((x,y)) (tensor([[[0.5000]]]), tensor([[[1.]]])) # torch.baddbmm `torch.baddbmm(input, batch1, batch2, *, beta=1, alpha=1, out=None) → Tensor` Performs a batch matrix-matrix product of matrices in `batch1` and `batch2`. `input` is added to the final result. `batch1` and `batch2` must be 3-D tensors each containing the same number of matrices. 
If `batch1` is a (b×n×m)(b \times n \times m) tensor, `batch2` is a (b×m×p)(b \times m \times p) tensor, then `input` must be [broadcastable](https://pytorch.org/docs/1.8.0/notes/broadcasting.html#broadcasting- semantics) with a (b×n×p)(b \times n \times p) tensor and `out` will be a (b×n×p)(b \times n \times p) tensor. Both `alpha` and `beta` mean the same as the scaling factors used in [`torch.addbmm()`](torch.addbmm#torch.addbmm "torch.addbmm"). outi=βinputi+α(batch1i@batch2i)\text{out}_i = \beta\ \text{input}_i + \alpha\ (\text{batch1}_i \mathbin{@} \text{batch2}_i) If `beta` is 0, then `input` will be ignored, and `nan` and `inf` in it will not be propagated. For inputs of type `FloatTensor` or `DoubleTensor`, arguments `beta` and `alpha` must be real numbers, otherwise they should be integers. This operator supports [TensorFloat32](https://pytorch.org/docs/1.8.0/notes/cuda.html#tf32-on- ampere). Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the tensor to be added * **batch1** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the first batch of matrices to be multiplied * **batch2** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the second batch of matrices to be multiplied Keyword Arguments * **beta** (_Number_ _,__optional_) – multiplier for `input` (β\beta ) * **alpha** (_Number_ _,__optional_) – multiplier for batch1@batch2\text{batch1} \mathbin{@} \text{batch2} (α\alpha ) * **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. Example: >>> M = torch.randn(10, 3, 5) >>> batch1 = torch.randn(10, 3, 4) >>> batch2 = torch.randn(10, 4, 5) >>> torch.baddbmm(M, batch1, batch2).size() torch.Size([10, 3, 5])

# torch.bartlett_window

`torch.bartlett_window(window_length, periodic=True, *, dtype=None, layout=torch.strided, device=None, requires_grad=False) → Tensor` Bartlett window function. w[n] = 1 - \left| \frac{2n}{N - 1} - 1 \right| = \begin{cases} \frac{2n}{N - 1} & \text{if } 0 \leq n \leq \frac{N - 1}{2} \\ 2 - \frac{2n}{N - 1} & \text{if } \frac{N - 1}{2} < n < N \end{cases} where N is the full window size. The input `window_length` is a positive integer controlling the returned window size. The `periodic` flag determines whether the returned window trims off the last duplicate value from the symmetric window and is ready to be used as a periodic window with functions like [`torch.stft()`](torch.stft#torch.stft "torch.stft"). Therefore, if `periodic` is true, the N in the above formula is in fact window_length + 1. Also, we always have `torch.bartlett_window(L, periodic=True)` equal to `torch.bartlett_window(L + 1, periodic=False)[:-1]`. Note If `window_length` = 1, the returned window contains a single value 1. Parameters * **window_length** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – the size of returned window * **periodic** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – If True, returns a window to be used as periodic function. If False, return a symmetric window. Keyword Arguments * **dtype** ([`torch.dtype`](../tensor_attributes#torch.torch.dtype "torch.torch.dtype"), optional) – the desired data type of returned tensor. Default: if `None`, uses a global default (see [`torch.set_default_tensor_type()`](torch.set_default_tensor_type#torch.set_default_tensor_type "torch.set_default_tensor_type")). Only floating point types are supported. * **layout** ([`torch.layout`](../tensor_attributes#torch.torch.layout "torch.torch.layout"), optional) – the desired layout of returned window tensor. Only `torch.strided` (dense layout) is supported. * **device** ([`torch.device`](../tensor_attributes#torch.torch.device "torch.torch.device"), optional) – the desired device of returned tensor. Default: if `None`, uses the current device for the default tensor type (see [`torch.set_default_tensor_type()`](torch.set_default_tensor_type#torch.set_default_tensor_type "torch.set_default_tensor_type")). `device` will be the CPU for CPU tensor types and the current CUDA device for CUDA tensor types. * **requires_grad** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – If autograd should record operations on the returned tensor. Default: `False`. Returns A 1-D tensor of size (window_length,) containing the window Return type [Tensor](../tensors#torch.Tensor "torch.Tensor")

# torch.bernoulli

`torch.bernoulli(input, *, generator=None, out=None) → Tensor` Draws binary random numbers (0 or 1) from a Bernoulli distribution. The `input` tensor should be a tensor containing probabilities to be used for drawing the binary random number. Hence, all values in `input` have to be in the range 0 ≤ input_i ≤ 1. The i-th element of the output tensor will draw a value 1 according to the i-th probability value given in `input`: \text{out}_{i} \sim \mathrm{Bernoulli}(p = \text{input}_{i}) The returned `out` tensor only has values 0 or 1 and is of the same shape as `input`. `out` can have integral `dtype`, but `input` must have floating point `dtype`. Parameters **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor of probability values for the Bernoulli distribution Keyword Arguments * **generator** (`torch.Generator` _,__optional_) – a pseudorandom number generator for sampling * **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. Example: >>> a = torch.empty(3, 3).uniform_(0, 1) # generate a uniform random matrix with range [0, 1] >>> a tensor([[ 0.1737, 0.0950, 0.3609], [ 0.7148, 0.0289, 0.2676], [ 0.9456, 0.8937, 0.7202]]) >>> torch.bernoulli(a) tensor([[ 1., 0., 0.], [ 0., 0., 0.], [ 1., 1., 1.]]) >>> a = torch.ones(3, 3) # probability of drawing "1" is 1 >>> torch.bernoulli(a) tensor([[ 1., 1., 1.], [ 1., 1., 1.], [ 1., 1., 1.]]) >>> a = torch.zeros(3, 3) # probability of drawing "1" is 0 >>> torch.bernoulli(a) tensor([[ 0., 0., 0.], [ 0., 0., 0.], [ 0., 0., 0.]])

# torch.bincount

`torch.bincount(input, weights=None, minlength=0) → Tensor` Count the frequency of each value in an array of non-negative ints. The number of bins (size 1) is one larger than the largest value in `input` unless `input` is empty, in which case the result is a tensor of size 0. If `minlength` is specified, the number of bins is at least `minlength` and if `input` is empty, then the result is tensor of size `minlength` filled with zeros. If `n` is the value at position `i`, `out[n] += weights[i]` if `weights` is specified else `out[n] += 1`. Note This operation may produce nondeterministic gradients when given tensors on a CUDA device. See [Reproducibility](https://pytorch.org/docs/1.8.0/notes/randomness.html) for more information. Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – 1-d int tensor * **weights** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – optional, weight for each value in the input tensor. Should be of same size as input tensor.
* **minlength** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – optional, minimum number of bins. Should be non-negative. Returns a tensor of shape `Size([max(input) + 1])` if `input` is non-empty, else `Size(0)` Return type output ([Tensor](../tensors#torch.Tensor "torch.Tensor")) Example: >>> input = torch.randint(0, 8, (5,), dtype=torch.int64) >>> weights = torch.linspace(0, 1, steps=5) >>> input, weights (tensor([4, 3, 6, 3, 4]), tensor([ 0.0000, 0.2500, 0.5000, 0.7500, 1.0000]) >>> torch.bincount(input) tensor([0, 0, 0, 2, 2, 0, 1]) >>> input.bincount(weights) tensor([0.0000, 0.0000, 0.0000, 1.0000, 1.0000, 0.0000, 0.5000]) # torch.bitwise_and `torch.bitwise_and(input, other, *, out=None) → Tensor` Computes the bitwise AND of `input` and `other`. The input tensor must be of integral or Boolean types. For bool tensors, it computes the logical AND. Parameters * **input** – the first input tensor * **other** – the second input tensor Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. #### Example >>> torch.bitwise_and(torch.tensor([-1, -2, 3], dtype=torch.int8), torch.tensor([1, 0, 3], dtype=torch.int8)) tensor([1, 0, 3], dtype=torch.int8) >>> torch.bitwise_and(torch.tensor([True, True, False]), torch.tensor([False, True, False])) tensor([ False, True, False]) # torch.bitwise_not `torch.bitwise_not(input, *, out=None) → Tensor` Computes the bitwise NOT of the given input tensor. The input tensor must be of integral or Boolean types. For bool tensors, it computes the logical NOT. Parameters **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. #### Example >>> torch.bitwise_not(torch.tensor([-1, -2, 3], dtype=torch.int8)) tensor([ 0, 1, -4], dtype=torch.int8) # torch.bitwise_or `torch.bitwise_or(input, other, *, out=None) → Tensor` Computes the bitwise OR of `input` and `other`. The input tensor must be of integral or Boolean types. For bool tensors, it computes the logical OR. Parameters * **input** – the first input tensor * **other** – the second input tensor Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. #### Example >>> torch.bitwise_or(torch.tensor([-1, -2, 3], dtype=torch.int8), torch.tensor([1, 0, 3], dtype=torch.int8)) tensor([-1, -2, 3], dtype=torch.int8) >>> torch.bitwise_or(torch.tensor([True, True, False]), torch.tensor([False, True, False])) tensor([ True, True, False]) # torch.bitwise_xor `torch.bitwise_xor(input, other, *, out=None) → Tensor` Computes the bitwise XOR of `input` and `other`. The input tensor must be of integral or Boolean types. For bool tensors, it computes the logical XOR. Parameters * **input** – the first input tensor * **other** – the second input tensor Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. #### Example >>> torch.bitwise_xor(torch.tensor([-1, -2, 3], dtype=torch.int8), torch.tensor([1, 0, 3], dtype=torch.int8)) tensor([-2, -2, 0], dtype=torch.int8) >>> torch.bitwise_xor(torch.tensor([True, True, False]), torch.tensor([False, True, False])) tensor([ True, False, False]) # torch.blackman_window `torch.blackman_window(window_length, periodic=True, *, dtype=None, layout=torch.strided, device=None, requires_grad=False) → Tensor` Blackman window function. 
w[n]=0.42−0.5cos⁡(2πnN−1)+0.08cos⁡(4πnN−1)w[n] = 0.42 - 0.5 \cos \left( \frac{2 \pi n}{N - 1} \right) + 0.08 \cos \left( \frac{4 \pi n}{N - 1} \right) where NN is the full window size. The input `window_length` is a positive integer controlling the returned window size. `periodic` flag determines whether the returned window trims off the last duplicate value from the symmetric window and is ready to be used as a periodic window with functions like [`torch.stft()`](torch.stft#torch.stft "torch.stft"). Therefore, if `periodic` is true, the NN in above formula is in fact window_length+1\text{window\\_length} + 1 . Also, we always have `torch.blackman_window(L, periodic=True)` equal to `torch.blackman_window(L + 1, periodic=False)[:-1])`. Note If `window_length` =1=1 , the returned window contains a single value 1. Parameters * **window_length** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – the size of returned window * **periodic** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – If True, returns a window to be used as periodic function. If False, return a symmetric window. Keyword Arguments * **dtype** ([`torch.dtype`](../tensor_attributes#torch.torch.dtype "torch.torch.dtype"), optional) – the desired data type of returned tensor. Default: if `None`, uses a global default (see [`torch.set_default_tensor_type()`](torch.set_default_tensor_type#torch.set_default_tensor_type "torch.set_default_tensor_type")). Only floating point types are supported. * **layout** ([`torch.layout`](../tensor_attributes#torch.torch.layout "torch.torch.layout"), optional) – the desired layout of returned window tensor. Only `torch.strided` (dense layout) is supported. * **device** ([`torch.device`](../tensor_attributes#torch.torch.device "torch.torch.device"), optional) – the desired device of returned tensor. Default: if `None`, uses the current device for the default tensor type (see [`torch.set_default_tensor_type()`](torch.set_default_tensor_type#torch.set_default_tensor_type "torch.set_default_tensor_type")). `device` will be the CPU for CPU tensor types and the current CUDA device for CUDA tensor types. * **requires_grad** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – If autograd should record operations on the returned tensor. Default: `False`. Returns A 1-D tensor of size (window_length,)(\text{window\\_length},) containing the window Return type [Tensor](../tensors#torch.Tensor "torch.Tensor") # torch.block_diag `torch.block_diag(*tensors)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/functional.html#block_diag) Create a block diagonal matrix from provided tensors. Parameters ***tensors** – One or more tensors with 0, 1, or 2 dimensions. Returns A 2 dimensional tensor with all the input tensors arranged in order such that their upper left and lower right corners are diagonally adjacent. All other elements are set to 0. 
Return type [Tensor](../tensors#torch.Tensor "torch.Tensor") Example: >>> import torch >>> A = torch.tensor([[0, 1], [1, 0]]) >>> B = torch.tensor([[3, 4, 5], [6, 7, 8]]) >>> C = torch.tensor(7) >>> D = torch.tensor([1, 2, 3]) >>> E = torch.tensor([[4], [5], [6]]) >>> torch.block_diag(A, B, C, D, E) tensor([[0, 1, 0, 0, 0, 0, 0, 0, 0, 0], [1, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 3, 4, 5, 0, 0, 0, 0, 0], [0, 0, 6, 7, 8, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 7, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 1, 2, 3, 0], [0, 0, 0, 0, 0, 0, 0, 0, 0, 4], [0, 0, 0, 0, 0, 0, 0, 0, 0, 5], [0, 0, 0, 0, 0, 0, 0, 0, 0, 6]]) # torch.bmm `torch.bmm(input, mat2, *, deterministic=False, out=None) → Tensor` Performs a batch matrix-matrix product of matrices stored in `input` and `mat2`. `input` and `mat2` must be 3-D tensors each containing the same number of matrices. If `input` is a (b×n×m)(b \times n \times m) tensor, `mat2` is a (b×m×p)(b \times m \times p) tensor, `out` will be a (b×n×p)(b \times n \times p) tensor. outi=inputi@mat2i\text{out}_i = \text{input}_i \mathbin{@} \text{mat2}_i This operator supports [TensorFloat32](https://pytorch.org/docs/1.8.0/notes/cuda.html#tf32-on- ampere). Note This function does not [broadcast](https://pytorch.org/docs/1.8.0/notes/broadcasting.html#broadcasting- semantics). For broadcasting matrix products, see [`torch.matmul()`](torch.matmul#torch.matmul "torch.matmul"). Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the first batch of matrices to be multiplied * **mat2** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the second batch of matrices to be multiplied Keyword Arguments * **deterministic** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – flag to choose between a faster non-deterministic calculation, or a slower deterministic calculation. This argument is only available for sparse-dense CUDA bmm. Default: `False` * **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. Example: >>> input = torch.randn(10, 3, 4) >>> mat2 = torch.randn(10, 4, 5) >>> res = torch.bmm(input, mat2) >>> res.size() torch.Size([10, 3, 5]) # torch.broadcast_shapes `torch.broadcast_shapes(*shapes) → Size` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/functional.html#broadcast_shapes) Similar to [`broadcast_tensors()`](torch.broadcast_tensors#torch.broadcast_tensors "torch.broadcast_tensors") but for shapes. This is equivalent to `torch.broadcast_tensors(*map(torch.empty, shapes))[0].shape` but avoids the need create to intermediate tensors. This is useful for broadcasting tensors of common batch shape but different rightmost shape, e.g. to broadcast mean vectors with covariance matrices. Example: >>> torch.broadcast_shapes((2,), (3, 1), (1, 1, 1)) torch.Size([1, 3, 2]) Parameters ***shapes** (_torch.Size_) – Shapes of tensors. Returns A shape compatible with all input shapes. Return type shape (torch.Size) Raises [**RuntimeError**](https://docs.python.org/3/library/exceptions.html#RuntimeError "\(in Python v3.9\)") – If shapes are incompatible. # torch.broadcast_tensors `torch.broadcast_tensors(*tensors) → List of Tensors` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/functional.html#broadcast_tensors) Broadcasts the given tensors according to [Broadcasting semantics](https://pytorch.org/docs/1.8.0/notes/broadcasting.html#broadcasting- semantics). 
Parameters ***tensors** – any number of tensors of the same type Warning More than one element of a broadcasted tensor may refer to a single memory location. As a result, in-place operations (especially ones that are vectorized) may result in incorrect behavior. If you need to write to the tensors, please clone them first. Example: >>> x = torch.arange(3).view(1, 3) >>> y = torch.arange(2).view(2, 1) >>> a, b = torch.broadcast_tensors(x, y) >>> a.size() torch.Size([2, 3]) >>> a tensor([[0, 1, 2], [0, 1, 2]]) # torch.broadcast_to `torch.broadcast_to(input, shape) → Tensor` Broadcasts `input` to the shape `shape`. Equivalent to calling `input.expand(shape)`. See [`expand()`](../tensors#torch.Tensor.expand "torch.Tensor.expand") for details. Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. * **shape** (list, tuple, or `torch.Size`) – the new shape. Example: >>> x = torch.tensor([1, 2, 3]) >>> torch.broadcast_to(x, (3, 3)) tensor([[1, 2, 3], [1, 2, 3], [1, 2, 3]]) # torch.bucketize `torch.bucketize(input, boundaries, *, out_int32=False, right=False, out=None) → Tensor` Returns the indices of the buckets to which each value in the `input` belongs, where the boundaries of the buckets are set by `boundaries`. Return a new tensor with the same size as `input`. If `right` is False (default), then the left boundary is closed. More formally, the returned index satisfies the following rules: `right` | _returned index satisfies_ ---|--- False | `boundaries[i-1] < input[m][n]...[l][x] <= boundaries[i]` True | `boundaries[i-1] <= input[m][n]...[l][x] < boundaries[i]` Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _or_ _Scalar_) – N-D tensor or a Scalar containing the search value(s). * **boundaries** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – 1-D tensor, must contain a monotonically increasing sequence. Keyword Arguments * **out_int32** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – indicate the output data type. torch.int32 if True, torch.int64 otherwise. Default value is False, i.e. default output data type is torch.int64. * **right** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – if False, return the first suitable location that is found. If True, return the last such index. If no suitable index found, return 0 for non-numerical value (eg. nan, inf) or the size of `boundaries` (one pass the last index). In other words, if False, gets the lower bound index for each value in `input` from `boundaries`. If True, gets the upper bound index instead. Default value is False. * **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor, must be the same size as `input` if provided. Example: >>> boundaries = torch.tensor([1, 3, 5, 7, 9]) >>> boundaries tensor([1, 3, 5, 7, 9]) >>> v = torch.tensor([[3, 6, 9], [3, 6, 9]]) >>> v tensor([[3, 6, 9], [3, 6, 9]]) >>> torch.bucketize(v, boundaries) tensor([[1, 3, 4], [1, 3, 4]]) >>> torch.bucketize(v, boundaries, right=True) tensor([[2, 3, 5], [2, 3, 5]]) # torch.can_cast `torch.can_cast(from, to) → bool` Determines if a type conversion is allowed under PyTorch casting rules described in the type promotion [documentation](../tensor_attributes#type- promotion-doc). Parameters * **from** (_dpython:type_) – The original [`torch.dtype`](../tensor_attributes#torch.torch.dtype "torch.torch.dtype"). 
* **to** (_dpython:type_) – The target [`torch.dtype`](../tensor_attributes#torch.torch.dtype "torch.torch.dtype"). Example: >>> torch.can_cast(torch.double, torch.float) True >>> torch.can_cast(torch.float, torch.int) False # torch.cartesian_prod `torch.cartesian_prod(*tensors)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/functional.html#cartesian_prod) Do cartesian product of the given sequence of tensors. The behavior is similar to python’s `itertools.product`. Parameters ***tensors** – any number of 1 dimensional tensors. Returns A tensor equivalent to converting all the input tensors into lists, do `itertools.product` on these lists, and finally convert the resulting list into tensor. Return type [Tensor](../tensors#torch.Tensor "torch.Tensor") Example: >>> a = [1, 2, 3] >>> b = [4, 5] >>> list(itertools.product(a, b)) [(1, 4), (1, 5), (2, 4), (2, 5), (3, 4), (3, 5)] >>> tensor_a = torch.tensor(a) >>> tensor_b = torch.tensor(b) >>> torch.cartesian_prod(tensor_a, tensor_b) tensor([[1, 4], [1, 5], [2, 4], [2, 5], [3, 4], [3, 5]]) # torch.cat `torch.cat(tensors, dim=0, *, out=None) → Tensor` Concatenates the given sequence of `seq` tensors in the given dimension. All tensors must either have the same shape (except in the concatenating dimension) or be empty. `torch.cat()` can be seen as an inverse operation for [`torch.split()`](torch.split#torch.split "torch.split") and [`torch.chunk()`](torch.chunk#torch.chunk "torch.chunk"). `torch.cat()` can be best understood via examples. Parameters * **tensors** (_sequence of Tensors_) – any python sequence of tensors of the same type. Non-empty tensors provided must have the same shape, except in the cat dimension. * **dim** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – the dimension over which the tensors are concatenated Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. Example: >>> x = torch.randn(2, 3) >>> x tensor([[ 0.6580, -1.0969, -0.4614], [-0.1034, -0.5790, 0.1497]]) >>> torch.cat((x, x, x), 0) tensor([[ 0.6580, -1.0969, -0.4614], [-0.1034, -0.5790, 0.1497], [ 0.6580, -1.0969, -0.4614], [-0.1034, -0.5790, 0.1497], [ 0.6580, -1.0969, -0.4614], [-0.1034, -0.5790, 0.1497]]) >>> torch.cat((x, x, x), 1) tensor([[ 0.6580, -1.0969, -0.4614, 0.6580, -1.0969, -0.4614, 0.6580, -1.0969, -0.4614], [-0.1034, -0.5790, 0.1497, -0.1034, -0.5790, 0.1497, -0.1034, -0.5790, 0.1497]]) # torch.cdist `torch.cdist(x1, x2, p=2.0, compute_mode='use_mm_for_euclid_dist_if_necessary')` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/functional.html#cdist) Computes batched the p-norm distance between each pair of the two collections of row vectors. Parameters * **x1** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – input tensor of shape B×P×MB \times P \times M . * **x2** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – input tensor of shape B×R×MB \times R \times M . * **p** – p value for the p-norm distance to calculate between each vector pair ∈[0,∞]\in [0, \infty] . * **compute_mode** – ‘use_mm_for_euclid_dist_if_necessary’ - will use matrix multiplication approach to calculate euclidean distance (p = 2) if P > 25 or R > 25 ‘use_mm_for_euclid_dist’ - will always use matrix multiplication approach to calculate euclidean distance (p = 2) ‘donot_use_mm_for_euclid_dist’ - will never use matrix multiplication approach to calculate euclidean distance (p = 2) Default: use_mm_for_euclid_dist_if_necessary. 
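A minimal sketch of the `compute_mode` switch described above (shapes are illustrative; the modes differ only in how the p=2 case is evaluated, so the results should agree up to floating-point error):

```python
import torch

x1 = torch.randn(2, 30, 8)  # B x P x M, with P > 25, so the default mode may use matmul
x2 = torch.randn(2, 40, 8)  # B x R x M

d_default = torch.cdist(x1, x2)  # 'use_mm_for_euclid_dist_if_necessary'
d_direct = torch.cdist(x1, x2, compute_mode='donot_use_mm_for_euclid_dist')

print(d_default.shape)                                 # torch.Size([2, 30, 40])
print(torch.allclose(d_default, d_direct, atol=1e-5))  # True (up to numerical error)
```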
If x1 has shape B×P×MB \times P \times M and x2 has shape B×R×MB \times R \times M then the output will have shape B×P×RB \times P \times R . This function is equivalent to `scipy.spatial.distance.cdist(input,’minkowski’, p=p)` if p∈(0,∞)p \in (0, \infty) . When p=0p = 0 it is equivalent to `scipy.spatial.distance.cdist(input, ‘hamming’) * M`. When p=∞p = \infty , the closest scipy function is `scipy.spatial.distance.cdist(xn, lambda x, y: np.abs(x - y).max())`. #### Example >>> a = torch.tensor([[0.9041, 0.0196], [-0.3108, -2.4423], [-0.4821, 1.059]]) >>> a tensor([[ 0.9041, 0.0196], [-0.3108, -2.4423], [-0.4821, 1.0590]]) >>> b = torch.tensor([[-2.1763, -0.4713], [-0.6986, 1.3702]]) >>> b tensor([[-2.1763, -0.4713], [-0.6986, 1.3702]]) >>> torch.cdist(a, b, p=2) tensor([[3.1193, 2.0959], [2.7138, 3.8322], [2.2830, 0.3791]]) # torch.ceil `torch.ceil(input, *, out=None) → Tensor` Returns a new tensor with the ceil of the elements of `input`, the smallest integer greater than or equal to each element. outi=⌈inputi⌉=⌊inputi⌋+1\text{out}_{i} = \left\lceil \text{input}_{i} \right\rceil = \left\lfloor \text{input}_{i} \right\rfloor + 1 Parameters **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. Example: >>> a = torch.randn(4) >>> a tensor([-0.6341, -1.4208, -1.0900, 0.5826]) >>> torch.ceil(a) tensor([-0., -1., -1., 1.]) # torch.chain_matmul `torch.chain_matmul(*matrices)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/functional.html#chain_matmul) Returns the matrix product of the NN 2-D tensors. This product is efficiently computed using the matrix chain order algorithm which selects the order in which incurs the lowest cost in terms of arithmetic operations ([[CLRS]](https://mitpress.mit.edu/books/introduction-algorithms-third- edition)). Note that since this is a function to compute the product, NN needs to be greater than or equal to 2; if equal to 2 then a trivial matrix-matrix product is returned. If NN is 1, then this is a no-op - the original matrix is returned as is. Parameters **matrices** (_Tensors..._) – a sequence of 2 or more 2-D tensors whose product is to be determined. Returns if the ithi^{th} tensor was of dimensions pi×pi+1p_{i} \times p_{i + 1} , then the product would be of dimensions p1×pN+1p_{1} \times p_{N + 1} . Return type [Tensor](../tensors#torch.Tensor "torch.Tensor") Example: >>> a = torch.randn(3, 4) >>> b = torch.randn(4, 5) >>> c = torch.randn(5, 6) >>> d = torch.randn(6, 7) >>> torch.chain_matmul(a, b, c, d) tensor([[ -2.3375, -3.9790, -4.1119, -6.6577, 9.5609, -11.5095, -3.2614], [ 21.4038, 3.3378, -8.4982, -5.2457, -10.2561, -2.4684, 2.7163], [ -0.9647, -5.8917, -2.3213, -5.2284, 12.8615, -12.2816, -2.5095]]) # torch.cholesky `torch.cholesky(input, upper=False, *, out=None) → Tensor` Computes the Cholesky decomposition of a symmetric positive-definite matrix AA or for batches of symmetric positive-definite matrices. If `upper` is `True`, the returned matrix `U` is upper-triangular, and the decomposition has the form: A=UTUA = U^TU If `upper` is `False`, the returned matrix `L` is lower-triangular, and the decomposition has the form: A=LLTA = LL^T If `upper` is `True`, and AA is a batch of symmetric positive-definite matrices, then the returned tensor will be composed of upper-triangular Cholesky factors of each of the individual matrices. 
Similarly, when `upper` is `False`, the returned tensor will be composed of lower-triangular Cholesky factors of each of the individual matrices. Note [`torch.linalg.cholesky()`](../linalg#torch.linalg.cholesky "torch.linalg.cholesky") should be used over `torch.cholesky` when possible. Note however that [`torch.linalg.cholesky()`](../linalg#torch.linalg.cholesky "torch.linalg.cholesky") does not yet support the `upper` parameter and instead always returns the lower triangular matrix. Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor AA of size (∗,n,n)(*, n, n) where `*` is zero or more batch dimensions consisting of symmetric positive-definite matrices. * **upper** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – flag that indicates whether to return a upper or lower triangular matrix. Default: `False` Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output matrix Example: >>> a = torch.randn(3, 3) >>> a = torch.mm(a, a.t()) # make symmetric positive-definite >>> l = torch.cholesky(a) >>> a tensor([[ 2.4112, -0.7486, 1.4551], [-0.7486, 1.3544, 0.1294], [ 1.4551, 0.1294, 1.6724]]) >>> l tensor([[ 1.5528, 0.0000, 0.0000], [-0.4821, 1.0592, 0.0000], [ 0.9371, 0.5487, 0.7023]]) >>> torch.mm(l, l.t()) tensor([[ 2.4112, -0.7486, 1.4551], [-0.7486, 1.3544, 0.1294], [ 1.4551, 0.1294, 1.6724]]) >>> a = torch.randn(3, 2, 2) >>> a = torch.matmul(a, a.transpose(-1, -2)) + 1e-03 # make symmetric positive-definite >>> l = torch.cholesky(a) >>> z = torch.matmul(l, l.transpose(-1, -2)) >>> torch.max(torch.abs(z - a)) # Max non-zero tensor(2.3842e-07) # torch.cholesky_inverse `torch.cholesky_inverse(input, upper=False, *, out=None) → Tensor` Computes the inverse of a symmetric positive-definite matrix AA using its Cholesky factor uu : returns matrix `inv`. The inverse is computed using LAPACK routines `dpotri` and `spotri` (and the corresponding MAGMA routines). If `upper` is `False`, uu is lower triangular such that the returned tensor is inv=(uuT)−1inv = (uu^{{T}})^{{-1}} If `upper` is `True` or not provided, uu is upper triangular such that the returned tensor is inv=(uTu)−1inv = (u^T u)^{{-1}} Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input 2-D tensor uu , a upper or lower triangular Cholesky factor * **upper** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – whether to return a lower (default) or upper triangular matrix Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor for `inv` Example: >>> a = torch.randn(3, 3) >>> a = torch.mm(a, a.t()) + 1e-05 * torch.eye(3) # make symmetric positive definite >>> u = torch.cholesky(a) >>> a tensor([[ 0.9935, -0.6353, 1.5806], [ -0.6353, 0.8769, -1.7183], [ 1.5806, -1.7183, 10.6618]]) >>> torch.cholesky_inverse(u) tensor([[ 1.9314, 1.2251, -0.0889], [ 1.2251, 2.4439, 0.2122], [-0.0889, 0.2122, 0.1412]]) >>> a.inverse() tensor([[ 1.9314, 1.2251, -0.0889], [ 1.2251, 2.4439, 0.2122], [-0.0889, 0.2122, 0.1412]]) # torch.cholesky_solve `torch.cholesky_solve(input, input2, upper=False, *, out=None) → Tensor` Solves a linear system of equations with a positive semidefinite matrix to be inverted given its Cholesky factor matrix uu . 
If `upper` is `False`, uu is lower triangular and `c` is returned such that: c=(uuT)−1bc = (u u^T)^{{-1}} b If `upper` is `True` or not provided, uu is upper triangular and `c` is returned such that: c=(uTu)−1bc = (u^T u)^{{-1}} b `torch.cholesky_solve(b, u)` can take in 2D inputs `b, u` or inputs that are batches of 2D matrices. If the inputs are batches, then batched outputs `c` are returned. Supports real-valued and complex-valued inputs. For the complex-valued inputs the transpose operator above is the conjugate transpose. Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – input matrix bb of size (∗,m,k)(*, m, k) , where ∗* is zero or more batch dimensions * **input2** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – input matrix uu of size (∗,m,m)(*, m, m) , where ∗* is zero or more batch dimensions composed of upper or lower triangular Cholesky factor * **upper** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – whether to consider the Cholesky factor as a lower or upper triangular matrix. Default: `False`. Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor for `c` Example: >>> a = torch.randn(3, 3) >>> a = torch.mm(a, a.t()) # make symmetric positive definite >>> u = torch.cholesky(a) >>> a tensor([[ 0.7747, -1.9549, 1.3086], [-1.9549, 6.7546, -5.4114], [ 1.3086, -5.4114, 4.8733]]) >>> b = torch.randn(3, 2) >>> b tensor([[-0.6355, 0.9891], [ 0.1974, 1.4706], [-0.4115, -0.6225]]) >>> torch.cholesky_solve(b, u) tensor([[ -8.1625, 19.6097], [ -5.8398, 14.2387], [ -4.3771, 10.4173]]) >>> torch.mm(a.inverse(), b) tensor([[ -8.1626, 19.6097], [ -5.8398, 14.2387], [ -4.3771, 10.4173]])

# torch.chunk

`torch.chunk(input, chunks, dim=0) → List of Tensors` Splits a tensor into a specific number of chunks. Each chunk is a view of the input tensor. The last chunk will be smaller if the tensor size along the given dimension `dim` is not divisible by `chunks`. Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the tensor to split * **chunks** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – number of chunks to return * **dim** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – dimension along which to split the tensor

# torch.clamp

`torch.clamp(input, min, max, *, out=None) → Tensor` Clamp all elements in `input` into the range `[` [`min`](torch.min#torch.min "torch.min"), [`max`](torch.max#torch.max "torch.max") `]`. Letting min_value and max_value be [`min`](torch.min#torch.min "torch.min") and [`max`](torch.max#torch.max "torch.max"), respectively, this returns: yi=min⁡(max⁡(xi,min_value),max_value)y_i = \min(\max(x_i, \text{min\\_value}), \text{max\\_value}) Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. * **min** (_Number_) – lower-bound of the range to be clamped to * **max** (_Number_) – upper-bound of the range to be clamped to Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. Example: >>> a = torch.randn(4) >>> a tensor([-1.7120, 0.1734, -0.0478, -0.0922]) >>> torch.clamp(a, min=-0.5, max=0.5) tensor([-0.5000, 0.1734, -0.0478, -0.0922]) `torch.clamp(input, *, min, out=None) → Tensor` Clamps all elements in `input` to be larger than or equal to [`min`](torch.min#torch.min "torch.min").
Parameters **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. Keyword Arguments * **min** (_Number_) – minimal value of each element in the output * **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. Example: >>> a = torch.randn(4) >>> a tensor([-0.0299, -2.3184, 2.1593, -0.8883]) >>> torch.clamp(a, min=0.5) tensor([ 0.5000, 0.5000, 2.1593, 0.5000]) `torch.clamp(input, *, max, out=None) → Tensor` Clamps all elements in `input` to be smaller or equal [`max`](torch.max#torch.max "torch.max"). Parameters **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. Keyword Arguments * **max** (_Number_) – maximal value of each element in the output * **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. Example: >>> a = torch.randn(4) >>> a tensor([ 0.7753, -0.4702, -0.4599, 1.1899]) >>> torch.clamp(a, max=0.5) tensor([ 0.5000, -0.4702, -0.4599, 0.5000]) # torch.clip `torch.clip(input, min, max, *, out=None) → Tensor` Alias for [`torch.clamp()`](torch.clamp#torch.clamp "torch.clamp"). # torch.clone `torch.clone(input, *, memory_format=torch.preserve_format) → Tensor` Returns a copy of `input`. Note This function is differentiable, so gradients will flow back from the result of this operation to `input`. To create a tensor without an autograd relationship to `input` see [`detach()`](../autograd#torch.Tensor.detach "torch.Tensor.detach"). Parameters **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. Keyword Arguments **memory_format** ([`torch.memory_format`](../tensor_attributes#torch.torch.memory_format "torch.torch.memory_format"), optional) – the desired memory format of returned tensor. Default: `torch.preserve_format`. # torch.column_stack `torch.column_stack(tensors, *, out=None) → Tensor` Creates a new tensor by horizontally stacking the tensors in `tensors`. Equivalent to `torch.hstack(tensors)`, except each zero or one dimensional tensor `t` in `tensors` is first reshaped into a `(t.numel(), 1)` column before being stacked horizontally. Parameters **tensors** (_sequence of Tensors_) – sequence of tensors to concatenate Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. Example: >>> a = torch.tensor([1, 2, 3]) >>> b = torch.tensor([4, 5, 6]) >>> torch.column_stack((a, b)) tensor([[1, 4], [2, 5], [3, 6]]) >>> a = torch.arange(5) >>> b = torch.arange(10).reshape(5, 2) >>> torch.column_stack((a, b, b)) tensor([[0, 0, 1, 0, 1], [1, 2, 3, 2, 3], [2, 4, 5, 4, 5], [3, 6, 7, 6, 7], [4, 8, 9, 8, 9]]) # torch.combinations `torch.combinations(input, r=2, with_replacement=False) → seq` Compute combinations of length rr of the given tensor. The behavior is similar to python’s `itertools.combinations` when `with_replacement` is set to `False`, and `itertools.combinations_with_replacement` when `with_replacement` is set to `True`. Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – 1D vector. * **r** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – number of elements to combine * **with_replacement** (_boolean_ _,__optional_) – whether to allow duplication in combination Returns A tensor equivalent to converting all the input tensors into lists, do `itertools.combinations` or `itertools.combinations_with_replacement` on these lists, and finally convert the resulting list into tensor. 
Return type [Tensor](../tensors#torch.Tensor "torch.Tensor") Example: >>> a = [1, 2, 3] >>> list(itertools.combinations(a, r=2)) [(1, 2), (1, 3), (2, 3)] >>> list(itertools.combinations(a, r=3)) [(1, 2, 3)] >>> list(itertools.combinations_with_replacement(a, r=2)) [(1, 1), (1, 2), (1, 3), (2, 2), (2, 3), (3, 3)] >>> tensor_a = torch.tensor(a) >>> torch.combinations(tensor_a) tensor([[1, 2], [1, 3], [2, 3]]) >>> torch.combinations(tensor_a, r=3) tensor([[1, 2, 3]]) >>> torch.combinations(tensor_a, with_replacement=True) tensor([[1, 1], [1, 2], [1, 3], [2, 2], [2, 3], [3, 3]]) # torch.compiled_with_cxx11_abi `torch.compiled_with_cxx11_abi()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch.html#compiled_with_cxx11_abi) Returns whether PyTorch was built with _GLIBCXX_USE_CXX11_ABI=1 # torch.complex `torch.complex(real, imag, *, out=None) → Tensor` Constructs a complex tensor with its real part equal to [`real`](torch.real#torch.real "torch.real") and its imaginary part equal to [`imag`](torch.imag#torch.imag "torch.imag"). Parameters * **real** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – The real part of the complex tensor. Must be float or double. * **imag** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – The imaginary part of the complex tensor. Must be same dtype as [`real`](torch.real#torch.real "torch.real"). Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – If the inputs are `torch.float32`, must be `torch.complex64`. If the inputs are `torch.float64`, must be `torch.complex128`. Example:: >>> real = torch.tensor([1, 2], dtype=torch.float32) >>> imag = torch.tensor([3, 4], dtype=torch.float32) >>> z = torch.complex(real, imag) >>> z tensor([(1.+3.j), (2.+4.j)]) >>> z.dtype torch.complex64 # torch.conj `torch.conj(input, *, out=None) → Tensor` Computes the element-wise conjugate of the given `input` tensor. If :attr`input` has a non-complex dtype, this function just returns `input`. Warning In the future, `torch.conj()` may return a non-writeable view for an `input` of non-complex dtype. It’s recommended that programs not modify the tensor returned by `torch.conj()` when `input` is of non-complex dtype to be compatible with this change. outi=conj(inputi)\text{out}_{i} = conj(\text{input}_{i}) Parameters **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. Example: >>> torch.conj(torch.tensor([-1 + 1j, -2 + 2j, 3 - 3j])) tensor([-1 - 1j, -2 - 2j, 3 + 3j]) # torch.copysign `torch.copysign(input, other, *, out=None) → Tensor` Create a new floating-point tensor with the magnitude of `input` and the sign of `other`, elementwise. outi={−∣inputi∣ifotheri≤−0.0∣inputi∣ifotheri≥0.0\text{out}_{i} = \begin{cases} -|\text{input}_{i}| & \text{if} \text{other}_{i} \leq -0.0 \\\ |\text{input}_{i}| & \text{if} \text{other}_{i} \geq 0.0 \\\ \end{cases} Supports [broadcasting to a common shape](https://pytorch.org/docs/1.8.0/notes/broadcasting.html#broadcasting- semantics), and integer and float inputs. Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – magnitudes. * **other** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _or_ _Number_) – contains value(s) whose signbit(s) are applied to the magnitudes in `input`. Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. 
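One consequence of the case analysis above is that a negative zero in `other` counts as negative; a minimal sketch (the values are illustrative):

```python
import torch

mag = torch.tensor([1.0, 2.0, 3.0])
sgn = torch.tensor([-0.0, 0.0, -5.0])

# -0.0 satisfies "other <= -0.0", so the first magnitude comes out negative.
print(torch.copysign(mag, sgn))  # tensor([-1.,  2., -3.])
```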
Example: >>> a = torch.randn(5) >>> a tensor([-1.2557, -0.0026, -0.5387, 0.4740, -0.9244]) >>> torch.copysign(a, 1) tensor([1.2557, 0.0026, 0.5387, 0.4740, 0.9244]) >>> a = torch.randn(4, 4) >>> a tensor([[ 0.7079, 0.2778, -1.0249, 0.5719], [-0.0059, -0.2600, -0.4475, -1.3948], [ 0.3667, -0.9567, -2.5757, -0.1751], [ 0.2046, -0.0742, 0.2998, -0.1054]]) >>> b = torch.randn(4) tensor([ 0.2373, 0.3120, 0.3190, -1.1128]) >>> torch.copysign(a, b) tensor([[ 0.7079, 0.2778, 1.0249, -0.5719], [ 0.0059, 0.2600, 0.4475, -1.3948], [ 0.3667, 0.9567, 2.5757, -0.1751], [ 0.2046, 0.0742, 0.2998, -0.1054]]) # torch.cos `torch.cos(input, *, out=None) → Tensor` Returns a new tensor with the cosine of the elements of `input`. outi=cos⁡(inputi)\text{out}_{i} = \cos(\text{input}_{i}) Parameters **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. Example: >>> a = torch.randn(4) >>> a tensor([ 1.4309, 1.2706, -0.8562, 0.9796]) >>> torch.cos(a) tensor([ 0.1395, 0.2957, 0.6553, 0.5574]) # torch.cosh `torch.cosh(input, *, out=None) → Tensor` Returns a new tensor with the hyperbolic cosine of the elements of `input`. outi=cosh⁡(inputi)\text{out}_{i} = \cosh(\text{input}_{i}) Parameters **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. Example: >>> a = torch.randn(4) >>> a tensor([ 0.1632, 1.1835, -0.6979, -0.7325]) >>> torch.cosh(a) tensor([ 1.0133, 1.7860, 1.2536, 1.2805]) Note When `input` is on the CPU, the implementation of torch.cosh may use the Sleef library, which rounds very large results to infinity or negative infinity. See [here](https://sleef.org/purec.xhtml) for details. # torch.count_nonzero `torch.count_nonzero(input, dim=None) → Tensor` Counts the number of non-zero values in the tensor `input` along the given `dim`. If no dim is specified then all non-zeros in the tensor are counted. Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. * **dim** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _or_ _tuple of python:ints_ _,__optional_) – Dim or tuple of dims along which to count non-zeros. Example: >>> x = torch.zeros(3,3) >>> x[torch.randn(3,3) > 0.5] = 1 >>> x tensor([[0., 1., 1.], [0., 0., 0.], [0., 0., 1.]]) >>> torch.count_nonzero(x) tensor(3) >>> torch.count_nonzero(x, dim=0) tensor([0, 1, 2]) # torch.cross `torch.cross(input, other, dim=None, *, out=None) → Tensor` Returns the cross product of vectors in dimension `dim` of `input` and `other`. `input` and `other` must have the same size, and the size of their `dim` dimension should be 3. If `dim` is not given, it defaults to the first dimension found with the size 3. Note that this might be unexpected. Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. * **other** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the second input tensor * **dim** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – the dimension to take the cross-product in. Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. 
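The default-`dim` behavior of `torch.cross` noted above (the first dimension of size 3 is used, which may not be the trailing one) can be made visible; a minimal sketch (the shapes are illustrative):

```python
import torch

a = torch.randn(3, 4, 3)
b = torch.randn(3, 4, 3)

# With dim omitted, the cross product is taken over dim 0 (the first size-3 dim),
# not over the last dimension.
print(torch.allclose(torch.cross(a, b), torch.cross(a, b, dim=0)))   # True
print(torch.allclose(torch.cross(a, b), torch.cross(a, b, dim=-1)))  # False (with random inputs)
```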
Example:

>>> a = torch.randn(4, 3)
>>> a
tensor([[-0.3956, 1.1455, 1.6895], [-0.5849, 1.3672, 0.3599], [-1.1626, 0.7180, -0.0521], [-0.1339, 0.9902, -2.0225]])
>>> b = torch.randn(4, 3)
>>> b
tensor([[-0.0257, -1.4725, -1.2251], [-1.1479, -0.7005, -1.9757], [-1.3904, 0.3726, -1.1836], [-0.9688, -0.7153, 0.2159]])
>>> torch.cross(a, b, dim=1)
tensor([[ 1.0844, -0.5281, 0.6120], [-2.4490, -1.5687, 1.9792], [-0.8304, -1.3037, 0.5650], [-1.2329, 1.9883, 1.0551]])
>>> torch.cross(a, b)
tensor([[ 1.0844, -0.5281, 0.6120], [-2.4490, -1.5687, 1.9792], [-0.8304, -1.3037, 0.5650], [-1.2329, 1.9883, 1.0551]])

# torch.cummax

`torch.cummax(input, dim, *, out=None) -> (Tensor, LongTensor)`

Returns a namedtuple `(values, indices)` where `values` is the cumulative maximum of elements of `input` in the dimension `dim`, and `indices` is the index location of each maximum value found in the dimension `dim`.

y_i = \max(x_1, x_2, x_3, \dots, x_i)

Parameters

* **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor.
* **dim** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – the dimension to do the operation over

Keyword Arguments

**out** ([tuple](https://docs.python.org/3/library/stdtypes.html#tuple "\(in Python v3.9\)") _,__optional_) – the result tuple of two output tensors (values, indices)

Example:

>>> a = torch.randn(10)
>>> a
tensor([-0.3449, -1.5447, 0.0685, -1.5104, -1.1706, 0.2259, 1.4696, -1.3284, 1.9946, -0.8209])
>>> torch.cummax(a, dim=0)
torch.return_types.cummax( values=tensor([-0.3449, -0.3449, 0.0685, 0.0685, 0.0685, 0.2259, 1.4696, 1.4696, 1.9946, 1.9946]), indices=tensor([0, 0, 2, 2, 2, 5, 6, 6, 8, 8]))

# torch.cummin

`torch.cummin(input, dim, *, out=None) -> (Tensor, LongTensor)`

Returns a namedtuple `(values, indices)` where `values` is the cumulative minimum of elements of `input` in the dimension `dim`, and `indices` is the index location of each minimum value found in the dimension `dim`.

y_i = \min(x_1, x_2, x_3, \dots, x_i)

Parameters

* **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor.
* **dim** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – the dimension to do the operation over

Keyword Arguments

**out** ([tuple](https://docs.python.org/3/library/stdtypes.html#tuple "\(in Python v3.9\)") _,__optional_) – the result tuple of two output tensors (values, indices)

Example:

>>> a = torch.randn(10)
>>> a
tensor([-0.2284, -0.6628, 0.0975, 0.2680, -1.3298, -0.4220, -0.3885, 1.1762, 0.9165, 1.6684])
>>> torch.cummin(a, dim=0)
torch.return_types.cummin( values=tensor([-0.2284, -0.6628, -0.6628, -0.6628, -1.3298, -1.3298, -1.3298, -1.3298, -1.3298, -1.3298]), indices=tensor([0, 1, 1, 1, 4, 4, 4, 4, 4, 4]))

# torch.cumprod

`torch.cumprod(input, dim, *, dtype=None, out=None) → Tensor`

Returns the cumulative product of elements of `input` in the dimension `dim`. For example, if `input` is a vector of size N, the result will also be a vector of size N, with elements

y_i = x_1 \times x_2 \times x_3 \times \dots \times x_i

Parameters

* **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor.
* **dim** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – the dimension to do the operation over

Keyword Arguments

* **dtype** ([`torch.dtype`](../tensor_attributes#torch.torch.dtype "torch.torch.dtype"), optional) – the desired data type of returned tensor.
If specified, the input tensor is cast to `dtype` before the operation is performed. This is useful for preventing data type overflows. Default: None.
* **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor.

Example:

>>> a = torch.randn(10)
>>> a
tensor([ 0.6001, 0.2069, -0.1919, 0.9792, 0.6727, 1.0062, 0.4126, -0.2129, -0.4206, 0.1968])
>>> torch.cumprod(a, dim=0)
tensor([ 0.6001, 0.1241, -0.0238, -0.0233, -0.0157, -0.0158, -0.0065, 0.0014, -0.0006, -0.0001])
>>> a[5] = 0.0
>>> torch.cumprod(a, dim=0)
tensor([ 0.6001, 0.1241, -0.0238, -0.0233, -0.0157, -0.0000, -0.0000, 0.0000, -0.0000, -0.0000])

# torch.cumsum

`torch.cumsum(input, dim, *, dtype=None, out=None) → Tensor`

Returns the cumulative sum of elements of `input` in the dimension `dim`. For example, if `input` is a vector of size N, the result will also be a vector of size N, with elements

y_i = x_1 + x_2 + x_3 + \dots + x_i

Parameters

* **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor.
* **dim** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – the dimension to do the operation over

Keyword Arguments

* **dtype** ([`torch.dtype`](../tensor_attributes#torch.torch.dtype "torch.torch.dtype"), optional) – the desired data type of returned tensor. If specified, the input tensor is cast to `dtype` before the operation is performed. This is useful for preventing data type overflows. Default: None.
* **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor.

Example:

>>> a = torch.randn(10)
>>> a
tensor([-0.8286, -0.4890, 0.5155, 0.8443, 0.1865, -0.1752, -2.0595, 0.1850, -1.1571, -0.4243])
>>> torch.cumsum(a, dim=0)
tensor([-0.8286, -1.3175, -0.8020, 0.0423, 0.2289, 0.0537, -2.0058, -1.8209, -2.9780, -3.4022])

# torch.deg2rad

`torch.deg2rad(input, *, out=None) → Tensor`

Returns a new tensor with each of the elements of `input` converted from angles in degrees to radians.

Parameters

**input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor.

Keyword Arguments

**out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor.

Example:

>>> a = torch.tensor([[180.0, -180.0], [360.0, -360.0], [90.0, -90.0]])
>>> torch.deg2rad(a)
tensor([[ 3.1416, -3.1416], [ 6.2832, -6.2832], [ 1.5708, -1.5708]])

# torch.dequantize

`torch.dequantize(tensor) → Tensor`

Returns an fp32 Tensor by dequantizing a quantized Tensor.

Parameters

**tensor** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – A quantized Tensor

`torch.dequantize(tensors) → sequence of Tensors`

Given a list of quantized Tensors, dequantize them and return a list of fp32 Tensors.

Parameters

**tensors** (_sequence of Tensors_) – A list of quantized Tensors

# torch.det

`torch.det(input) → Tensor`

Calculates the determinant of a square matrix or batches of square matrices.

Note

`torch.det()` is deprecated. Please use [`torch.linalg.det()`](../linalg#torch.linalg.det "torch.linalg.det") instead.

Note

Backward through det internally uses SVD results when `input` is not invertible. In this case, double backward through det will be unstable when `input` doesn't have distinct singular values. See [`torch.svd()`](torch.svd#torch.svd "torch.svd") for details.

Parameters

**input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor of size `(*, n, n)` where `*` is zero or more batch dimensions.
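Since `torch.det()` is deprecated in favor of `torch.linalg.det()`, a minimal migration sketch (assuming the `torch.linalg` namespace is available, as it is from 1.8 onwards):

>>> A = torch.randn(3, 3)
>>> # both spellings compute the same determinant
>>> torch.allclose(torch.det(A), torch.linalg.det(A))
True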
Example: >>> A = torch.randn(3, 3) >>> torch.det(A) tensor(3.7641) >>> A = torch.randn(3, 2, 2) >>> A tensor([[[ 0.9254, -0.6213], [-0.5787, 1.6843]], [[ 0.3242, -0.9665], [ 0.4539, -0.0887]], [[ 1.1336, -0.4025], [-0.7089, 0.9032]]]) >>> A.det() tensor([1.1990, 0.4099, 0.7386]) # torch.diag `torch.diag(input, diagonal=0, *, out=None) → Tensor` * If `input` is a vector (1-D tensor), then returns a 2-D square tensor with the elements of `input` as the diagonal. * If `input` is a matrix (2-D tensor), then returns a 1-D tensor with the diagonal elements of `input`. The argument [`diagonal`](torch.diagonal#torch.diagonal "torch.diagonal") controls which diagonal to consider: * If [`diagonal`](torch.diagonal#torch.diagonal "torch.diagonal") = 0, it is the main diagonal. * If [`diagonal`](torch.diagonal#torch.diagonal "torch.diagonal") > 0, it is above the main diagonal. * If [`diagonal`](torch.diagonal#torch.diagonal "torch.diagonal") < 0, it is below the main diagonal. Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. * **diagonal** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – the diagonal to consider Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. See also [`torch.diagonal()`](torch.diagonal#torch.diagonal "torch.diagonal") always returns the diagonal of its input. [`torch.diagflat()`](torch.diagflat#torch.diagflat "torch.diagflat") always constructs a tensor with diagonal elements specified by the input. Examples: Get the square matrix where the input vector is the diagonal: >>> a = torch.randn(3) >>> a tensor([ 0.5950,-0.0872, 2.3298]) >>> torch.diag(a) tensor([[ 0.5950, 0.0000, 0.0000], [ 0.0000,-0.0872, 0.0000], [ 0.0000, 0.0000, 2.3298]]) >>> torch.diag(a, 1) tensor([[ 0.0000, 0.5950, 0.0000, 0.0000], [ 0.0000, 0.0000,-0.0872, 0.0000], [ 0.0000, 0.0000, 0.0000, 2.3298], [ 0.0000, 0.0000, 0.0000, 0.0000]]) Get the k-th diagonal of a given matrix: >>> a = torch.randn(3, 3) >>> a tensor([[-0.4264, 0.0255,-0.1064], [ 0.8795,-0.2429, 0.1374], [ 0.1029,-0.6482,-1.6300]]) >>> torch.diag(a, 0) tensor([-0.4264,-0.2429,-1.6300]) >>> torch.diag(a, 1) tensor([ 0.0255, 0.1374]) # torch.diag_embed `torch.diag_embed(input, offset=0, dim1=-2, dim2=-1) → Tensor` Creates a tensor whose diagonals of certain 2D planes (specified by `dim1` and `dim2`) are filled by `input`. To facilitate creating batched diagonal matrices, the 2D planes formed by the last two dimensions of the returned tensor are chosen by default. The argument `offset` controls which diagonal to consider: * If `offset` = 0, it is the main diagonal. * If `offset` > 0, it is above the main diagonal. * If `offset` < 0, it is below the main diagonal. The size of the new matrix will be calculated to make the specified diagonal of the size of the last input dimension. Note that for `offset` other than 00 , the order of `dim1` and `dim2` matters. Exchanging them is equivalent to changing the sign of `offset`. Applying [`torch.diagonal()`](torch.diagonal#torch.diagonal "torch.diagonal") to the output of this function with the same arguments yields a matrix identical to input. However, [`torch.diagonal()`](torch.diagonal#torch.diagonal "torch.diagonal") has different default dimensions, so those need to be explicitly specified. Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. Must be at least 1-dimensional. 
* **offset** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – which diagonal to consider. Default: 0 (main diagonal). * **dim1** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – first dimension with respect to which to take diagonal. Default: -2. * **dim2** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – second dimension with respect to which to take diagonal. Default: -1. Example: >>> a = torch.randn(2, 3) >>> torch.diag_embed(a) tensor([[[ 1.5410, 0.0000, 0.0000], [ 0.0000, -0.2934, 0.0000], [ 0.0000, 0.0000, -2.1788]], [[ 0.5684, 0.0000, 0.0000], [ 0.0000, -1.0845, 0.0000], [ 0.0000, 0.0000, -1.3986]]]) >>> torch.diag_embed(a, offset=1, dim1=0, dim2=2) tensor([[[ 0.0000, 1.5410, 0.0000, 0.0000], [ 0.0000, 0.5684, 0.0000, 0.0000]], [[ 0.0000, 0.0000, -0.2934, 0.0000], [ 0.0000, 0.0000, -1.0845, 0.0000]], [[ 0.0000, 0.0000, 0.0000, -2.1788], [ 0.0000, 0.0000, 0.0000, -1.3986]], [[ 0.0000, 0.0000, 0.0000, 0.0000], [ 0.0000, 0.0000, 0.0000, 0.0000]]]) # torch.diagflat `torch.diagflat(input, offset=0) → Tensor` * If `input` is a vector (1-D tensor), then returns a 2-D square tensor with the elements of `input` as the diagonal. * If `input` is a tensor with more than one dimension, then returns a 2-D tensor with diagonal elements equal to a flattened `input`. The argument `offset` controls which diagonal to consider: * If `offset` = 0, it is the main diagonal. * If `offset` > 0, it is above the main diagonal. * If `offset` < 0, it is below the main diagonal. Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. * **offset** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – the diagonal to consider. Default: 0 (main diagonal). Examples: >>> a = torch.randn(3) >>> a tensor([-0.2956, -0.9068, 0.1695]) >>> torch.diagflat(a) tensor([[-0.2956, 0.0000, 0.0000], [ 0.0000, -0.9068, 0.0000], [ 0.0000, 0.0000, 0.1695]]) >>> torch.diagflat(a, 1) tensor([[ 0.0000, -0.2956, 0.0000, 0.0000], [ 0.0000, 0.0000, -0.9068, 0.0000], [ 0.0000, 0.0000, 0.0000, 0.1695], [ 0.0000, 0.0000, 0.0000, 0.0000]]) >>> a = torch.randn(2, 2) >>> a tensor([[ 0.2094, -0.3018], [-0.1516, 1.9342]]) >>> torch.diagflat(a) tensor([[ 0.2094, 0.0000, 0.0000, 0.0000], [ 0.0000, -0.3018, 0.0000, 0.0000], [ 0.0000, 0.0000, -0.1516, 0.0000], [ 0.0000, 0.0000, 0.0000, 1.9342]]) # torch.diagonal `torch.diagonal(input, offset=0, dim1=0, dim2=1) → Tensor` Returns a partial view of `input` with the its diagonal elements with respect to `dim1` and `dim2` appended as a dimension at the end of the shape. The argument `offset` controls which diagonal to consider: * If `offset` = 0, it is the main diagonal. * If `offset` > 0, it is above the main diagonal. * If `offset` < 0, it is below the main diagonal. Applying [`torch.diag_embed()`](torch.diag_embed#torch.diag_embed "torch.diag_embed") to the output of this function with the same arguments yields a diagonal matrix with the diagonal entries of the input. However, [`torch.diag_embed()`](torch.diag_embed#torch.diag_embed "torch.diag_embed") has different default dimensions, so those need to be explicitly specified. Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. Must be at least 2-dimensional. 
* **offset** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – which diagonal to consider. Default: 0 (main diagonal).
* **dim1** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – first dimension with respect to which to take diagonal. Default: 0.
* **dim2** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – second dimension with respect to which to take diagonal. Default: 1.

Note

To take a batch diagonal, pass in dim1=-2, dim2=-1.

Examples:

>>> a = torch.randn(3, 3)
>>> a
tensor([[-1.0854, 1.1431, -0.1752], [ 0.8536, -0.0905, 0.0360], [ 0.6927, -0.3735, -0.4945]])
>>> torch.diagonal(a, 0)
tensor([-1.0854, -0.0905, -0.4945])
>>> torch.diagonal(a, 1)
tensor([ 1.1431, 0.0360])
>>> x = torch.randn(2, 5, 4, 2)
>>> torch.diagonal(x, offset=-1, dim1=1, dim2=2)
tensor([[[-1.2631, 0.3755, -1.5977, -1.8172], [-1.1065, 1.0401, -0.2235, -0.7938]], [[-1.7325, -0.3081, 0.6166, 0.2335], [ 1.0500, 0.7336, -0.3836, -1.1015]]])

# torch.diff

`torch.diff(input, n=1, dim=-1, prepend=None, append=None) → Tensor`

Computes the n-th forward difference along the given dimension. The first-order differences are given by `out[i] = input[i + 1] - input[i]`. Higher-order differences are calculated by using `torch.diff()` recursively.

Note

Only `n = 1` is currently supported.

Parameters

* **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the tensor to compute the differences on
* **n** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – the number of times to recursively compute the difference
* **dim** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – the dimension to compute the difference along. Default is the last dimension.
* **prepend, append** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – values to prepend or append to `input` along `dim` before computing the difference. Their dimensions must be equivalent to that of input, and their shapes must match input’s shape except on `dim`.

Keyword Arguments

**out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor.

Example:

>>> a = torch.tensor([1, 3, 2])
>>> torch.diff(a)
tensor([ 2, -1])
>>> b = torch.tensor([4, 5])
>>> torch.diff(a, append=b)
tensor([ 2, -1, 2, 1])
>>> c = torch.tensor([[1, 2, 3], [3, 4, 5]])
>>> torch.diff(c, dim=0)
tensor([[2, 2, 2]])
>>> torch.diff(c, dim=1)
tensor([[1, 1], [1, 1]])

# torch.digamma

`torch.digamma(input, *, out=None) → Tensor`

Computes the logarithmic derivative of the gamma function on `input`.

\psi(x) = \frac{d}{dx} \ln\left(\Gamma\left(x\right)\right) = \frac{\Gamma'(x)}{\Gamma(x)}

Parameters

**input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the tensor to compute the digamma function on

Keyword Arguments

**out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor.

Note

This function is similar to SciPy’s `scipy.special.digamma`.

Note

From PyTorch 1.8 onwards, the digamma function returns `-Inf` for `0`. Previously it returned `NaN` for `0`.

Example:

>>> a = torch.tensor([1, 0.5])
>>> torch.digamma(a)
tensor([-0.5772, -1.9635])

# torch.dist

`torch.dist(input, other, p=2) → Tensor`

Returns the p-norm of (`input` - `other`). The shapes of `input` and `other` must be [broadcastable](https://pytorch.org/docs/1.8.0/notes/broadcasting.html#broadcasting- semantics).
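Equivalently, `torch.dist(x, y, p)` can be thought of as the p-norm of the broadcasted elementwise difference; a minimal cross-check sketch (assuming floating-point inputs):

>>> x = torch.randn(4)
>>> y = torch.randn(4)
>>> torch.allclose(torch.dist(x, y, 3), torch.norm(x - y, p=3))
True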
Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. * **other** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the Right-hand-side input tensor * **p** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _,__optional_) – the norm to be computed Example: >>> x = torch.randn(4) >>> x tensor([-1.5393, -0.8675, 0.5916, 1.6321]) >>> y = torch.randn(4) >>> y tensor([ 0.0967, -1.0511, 0.6295, 0.8360]) >>> torch.dist(x, y, 3.5) tensor(1.6727) >>> torch.dist(x, y, 3) tensor(1.6973) >>> torch.dist(x, y, 0) tensor(inf) >>> torch.dist(x, y, 1) tensor(2.6537) # torch.div `torch.div(input, other, *, rounding_mode=None, out=None) → Tensor` Divides each element of the input `input` by the corresponding element of `other`. outi=inputiotheri\text{out}_i = \frac{\text{input}_i}{\text{other}_i} Note By default, this performs a “true” division like Python 3. See the `rounding_mode` argument for floor division. Supports [broadcasting to a common shape](https://pytorch.org/docs/1.8.0/notes/broadcasting.html#broadcasting- semantics), [type promotion](../tensor_attributes#type-promotion-doc), and integer, float, and complex inputs. Always promotes integer types to the default scalar type. Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the dividend * **other** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _or_ _Number_) – the divisor Keyword Arguments * **rounding_mode** ([str](https://docs.python.org/3/library/stdtypes.html#str "\(in Python v3.9\)") _,__optional_) – Type of rounding applied to the result: * None - default behavior. Performs no rounding and, if both `input` and `other` are integer types, promotes the inputs to the default scalar type. Equivalent to true division in Python (the `/` operator) and NumPy’s `np.true_divide`. * `"trunc"` \- rounds the results of the division towards zero. Equivalent to C-style integer division. * `"floor"` \- rounds the results of the division down. Equivalent to floor division in Python (the `//` operator) and NumPy’s `np.floor_divide`. * **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. Examples: >>> x = torch.tensor([ 0.3810, 1.2774, -0.2972, -0.3719, 0.4637]) >>> torch.div(x, 0.5) tensor([ 0.7620, 2.5548, -0.5944, -0.7438, 0.9274]) >>> a = torch.tensor([[-0.3711, -1.9353, -0.4605, -0.2917], ... [ 0.1815, -1.0111, 0.9805, -1.5923], ... [ 0.1062, 1.4581, 0.7759, -1.2344], ... [-0.1830, -0.0313, 1.1908, -1.4757]]) >>> b = torch.tensor([ 0.8032, 0.2930, -0.8113, -0.2308]) >>> torch.div(a, b) tensor([[-0.4620, -6.6051, 0.5676, 1.2639], [ 0.2260, -3.4509, -1.2086, 6.8990], [ 0.1322, 4.9764, -0.9564, 5.3484], [-0.2278, -0.1068, -1.4678, 6.3938]]) >>> torch.div(a, b, rounding_mode='trunc') tensor([[-0., -6., 0., 1.], [ 0., -3., -1., 6.], [ 0., 4., -0., 5.], [-0., -0., -1., 6.]]) >>> torch.div(a, b, rounding_mode='floor') tensor([[-1., -7., 0., 1.], [ 0., -4., -2., 6.], [ 0., 4., -1., 5.], [-1., -1., -2., 6.]]) # torch.divide `torch.divide(input, other, *, rounding_mode=None, out=None) → Tensor` Alias for [`torch.div()`](torch.div#torch.div "torch.div"). # torch.dot `torch.dot(input, other, *, out=None) → Tensor` Computes the dot product of two 1D tensors. Note Unlike NumPy’s dot, torch.dot intentionally only supports computing the dot product of two 1D tensors with the same number of elements. Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – first tensor in the dot product, must be 1D. 
* **other** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – second tensor in the dot product, must be 1D.

Keyword Arguments

**out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor.

Example:

>>> torch.dot(torch.tensor([2, 3]), torch.tensor([2, 1]))
tensor(7)

# torch.dstack

`torch.dstack(tensors, *, out=None) → Tensor`

Stack tensors in sequence depthwise (along third axis). This is equivalent to concatenation along the third axis after 1-D and 2-D tensors have been reshaped by [`torch.atleast_3d()`](torch.atleast_3d#torch.atleast_3d "torch.atleast_3d").

Parameters

**tensors** (_sequence of Tensors_) – sequence of tensors to concatenate

Keyword Arguments

**out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor.

Example:

>>> a = torch.tensor([1, 2, 3])
>>> b = torch.tensor([4, 5, 6])
>>> torch.dstack((a,b))
tensor([[[1, 4], [2, 5], [3, 6]]])
>>> a = torch.tensor([[1],[2],[3]])
>>> b = torch.tensor([[4],[5],[6]])
>>> torch.dstack((a,b))
tensor([[[1, 4]], [[2, 5]], [[3, 6]]])

# torch.eig

`torch.eig(input, eigenvectors=False, *, out=None) -> (Tensor, Tensor)`

Computes the eigenvalues and eigenvectors of a real square matrix.

Note

Since eigenvalues and eigenvectors might be complex, backward pass is supported only if eigenvalues and eigenvectors are all real valued. When `input` is on CUDA, `torch.eig()` causes host-device synchronization.

Parameters

* **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the square matrix of shape (n \times n) for which the eigenvalues and eigenvectors will be computed
* **eigenvectors** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – `True` to compute both eigenvalues and eigenvectors; otherwise, only eigenvalues will be computed

Keyword Arguments

**out** ([tuple](https://docs.python.org/3/library/stdtypes.html#tuple "\(in Python v3.9\)") _,__optional_) – the output tensors

Returns

A namedtuple (eigenvalues, eigenvectors) containing

* **eigenvalues** (_Tensor_): Shape (n \times 2). Each row is an eigenvalue of `input`, where the first element is the real part and the second element is the imaginary part. The eigenvalues are not necessarily ordered.
* **eigenvectors** (_Tensor_): If `eigenvectors=False`, it’s an empty tensor. Otherwise, this tensor of shape (n \times n) can be used to compute normalized (unit length) eigenvectors of corresponding eigenvalues as follows. If the corresponding `eigenvalues[j]` is a real number, column `eigenvectors[:, j]` is the eigenvector corresponding to `eigenvalues[j]`. If the corresponding `eigenvalues[j]` and `eigenvalues[j + 1]` form a complex conjugate pair, then the true eigenvectors can be computed as \text{true eigenvector}[j] = \text{eigenvectors}[:, j] + i \times \text{eigenvectors}[:, j + 1], \text{true eigenvector}[j + 1] = \text{eigenvectors}[:, j] - i \times \text{eigenvectors}[:, j + 1].

Return type

([Tensor](../tensors#torch.Tensor "torch.Tensor"), [Tensor](../tensors#torch.Tensor "torch.Tensor"))

Example:

Trivial example with a diagonal matrix.
By default, only eigenvalues are computed: >>> a = torch.diag(torch.tensor([1, 2, 3], dtype=torch.double)) >>> e, v = torch.eig(a) >>> e tensor([[1., 0.], [2., 0.], [3., 0.]], dtype=torch.float64) >>> v tensor([], dtype=torch.float64) Compute also the eigenvectors: >>> e, v = torch.eig(a, eigenvectors=True) >>> e tensor([[1., 0.], [2., 0.], [3., 0.]], dtype=torch.float64) >>> v tensor([[1., 0., 0.], [0., 1., 0.], [0., 0., 1.]], dtype=torch.float64) # torch.einsum `torch.einsum(equation, *operands) → Tensor` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/functional.html#einsum) Sums the product of the elements of the input `operands` along dimensions specified using a notation based on the Einstein summation convention. Einsum allows computing many common multi-dimensional linear algebraic array operations by representing them in a short-hand format based on the Einstein summation convention, given by `equation`. The details of this format are described below, but the general idea is to label every dimension of the input `operands` with some subscript and define which subscripts are part of the output. The output is then computed by summing the product of the elements of the `operands` along the dimensions whose subscripts are not part of the output. For example, matrix multiplication can be computed using einsum as `torch.einsum(“ij,jk->ik”, A, B)`. Here, j is the summation subscript and i and k the output subscripts (see section below for more details on why). Equation: The `equation` string specifies the subscripts (lower case letters `[‘a’, ‘z’]`) for each dimension of the input `operands` in the same order as the dimensions, separating subcripts for each operand by a comma (‘,’), e.g. `‘ij,jk’` specify subscripts for two 2D operands. The dimensions labeled with the same subscript must be broadcastable, that is, their size must either match or be `1`. The exception is if a subscript is repeated for the same input operand, in which case the dimensions labeled with this subscript for this operand must match in size and the operand will be replaced by its diagonal along these dimensions. The subscripts that appear exactly once in the `equation` will be part of the output, sorted in increasing alphabetical order. The output is computed by multiplying the input `operands` element- wise, with their dimensions aligned based on the subscripts, and then summing out the dimensions whose subscripts are not part of the output. Optionally, the output subscripts can be explicitly defined by adding an arrow (‘->’) at the end of the equation followed by the subscripts for the output. For instance, the following equation computes the transpose of a matrix multiplication: ‘ij,jk->ki’. The output subscripts must appear at least once for some input operand and at most once for the output. Ellipsis (‘…’) can be used in place of subscripts to broadcast the dimensions covered by the ellipsis. Each input operand may contain at most one ellipsis which will cover the dimensions not covered by subscripts, e.g. for an input operand with 5 dimensions, the ellipsis in the equation `‘ab…c’` cover the third and fourth dimensions. The ellipsis does not need to cover the same number of dimensions across the `operands` but the ‘shape’ of the ellipsis (the size of the dimensions covered by them) must broadcast together. 
If the output is not explicitly defined with the arrow (‘->’) notation, the ellipsis will come first in the output (left-most dimensions), before the subscript labels that appear exactly once for the input operands. e.g. the following equation implements batch matrix multiplication `‘…ij,…jk’`. A few final notes: the equation may contain whitespaces between the different elements (subscripts, ellipsis, arrow and comma) but something like `‘…’` is not valid. An empty string `‘’` is valid for scalar operands. Note `torch.einsum` handles ellipsis (‘…’) differently from NumPy in that it allows dimensions covered by the ellipsis to be summed over, that is, ellipsis are not required to be part of the output. Note This function does not optimize the given expression, so a different formula for the same computation may run faster or consume less memory. Projects like opt_einsum () can optimize the formula for you. Parameters * **equation** (_string_) – The subscripts for the Einstein summation. * **operands** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – The operands to compute the Einstein sum of. Examples: # trace >>> torch.einsum('ii', torch.randn(4, 4)) tensor(-1.2104) # diagonal >>> torch.einsum('ii->i', torch.randn(4, 4)) tensor([-0.1034, 0.7952, -0.2433, 0.4545]) # outer product >>> x = torch.randn(5) >>> y = torch.randn(4) >>> torch.einsum('i,j->ij', x, y) tensor([[ 0.1156, -0.2897, -0.3918, 0.4963], [-0.3744, 0.9381, 1.2685, -1.6070], [ 0.7208, -1.8058, -2.4419, 3.0936], [ 0.1713, -0.4291, -0.5802, 0.7350], [ 0.5704, -1.4290, -1.9323, 2.4480]]) # batch matrix multiplication >>> As = torch.randn(3,2,5) >>> Bs = torch.randn(3,5,4) >>> torch.einsum('bij,bjk->bik', As, Bs) tensor([[[-1.0564, -1.5904, 3.2023, 3.1271], [-1.6706, -0.8097, -0.8025, -2.1183]], [[ 4.2239, 0.3107, -0.5756, -0.2354], [-1.4558, -0.3460, 1.5087, -0.8530]], [[ 2.8153, 1.8787, -4.3839, -1.2112], [ 0.3728, -2.1131, 0.0921, 0.8305]]]) # batch permute >>> A = torch.randn(2, 3, 4, 5) >>> torch.einsum('...ij->...ji', A).shape torch.Size([2, 3, 5, 4]) # equivalent to torch.nn.functional.bilinear >>> A = torch.randn(3,5,4) >>> l = torch.randn(2,5) >>> r = torch.randn(2,4) >>> torch.einsum('bn,anm,bm->ba', l, A, r) tensor([[-0.3430, -5.2405, 0.4494], [ 0.3311, 5.5201, -3.0356]]) # torch.empty `torch.empty(*size, *, out=None, dtype=None, layout=torch.strided, device=None, requires_grad=False, pin_memory=False) → Tensor` Returns a tensor filled with uninitialized data. The shape of the tensor is defined by the variable argument `size`. Parameters **size** (_int..._) – a sequence of integers defining the shape of the output tensor. Can be a variable number of arguments or a collection like a list or tuple. Keyword Arguments * **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. * **dtype** ([`torch.dtype`](../tensor_attributes#torch.torch.dtype "torch.torch.dtype"), optional) – the desired data type of returned tensor. Default: if `None`, uses a global default (see [`torch.set_default_tensor_type()`](torch.set_default_tensor_type#torch.set_default_tensor_type "torch.set_default_tensor_type")). * **layout** ([`torch.layout`](../tensor_attributes#torch.torch.layout "torch.torch.layout"), optional) – the desired layout of returned Tensor. Default: `torch.strided`. * **device** ([`torch.device`](../tensor_attributes#torch.torch.device "torch.torch.device"), optional) – the desired device of returned tensor. 
Default: if `None`, uses the current device for the default tensor type (see [`torch.set_default_tensor_type()`](torch.set_default_tensor_type#torch.set_default_tensor_type "torch.set_default_tensor_type")). `device` will be the CPU for CPU tensor types and the current CUDA device for CUDA tensor types.
* **requires_grad** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – If autograd should record operations on the returned tensor. Default: `False`.
* **pin_memory** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – If set, returned tensor would be allocated in the pinned memory. Works only for CPU tensors. Default: `False`.
* **memory_format** ([`torch.memory_format`](../tensor_attributes#torch.torch.memory_format "torch.torch.memory_format"), optional) – the desired memory format of returned Tensor. Default: `torch.contiguous_format`.

Example:

>>> torch.empty(2, 3)
tensor(1.00000e-08 * [[ 6.3984, 0.0000, 0.0000], [ 0.0000, 0.0000, 0.0000]])

# torch.empty_like

`torch.empty_like(input, *, dtype=None, layout=None, device=None, requires_grad=False, memory_format=torch.preserve_format) → Tensor`

Returns an uninitialized tensor with the same size as `input`. `torch.empty_like(input)` is equivalent to `torch.empty(input.size(), dtype=input.dtype, layout=input.layout, device=input.device)`.

Parameters

**input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the size of `input` will determine the size of the output tensor.

Keyword Arguments

* **dtype** ([`torch.dtype`](../tensor_attributes#torch.torch.dtype "torch.torch.dtype"), optional) – the desired data type of returned Tensor. Default: if `None`, defaults to the dtype of `input`.
* **layout** ([`torch.layout`](../tensor_attributes#torch.torch.layout "torch.torch.layout"), optional) – the desired layout of returned tensor. Default: if `None`, defaults to the layout of `input`.
* **device** ([`torch.device`](../tensor_attributes#torch.torch.device "torch.torch.device"), optional) – the desired device of returned tensor. Default: if `None`, defaults to the device of `input`.
* **requires_grad** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – If autograd should record operations on the returned tensor. Default: `False`.
* **memory_format** ([`torch.memory_format`](../tensor_attributes#torch.torch.memory_format "torch.torch.memory_format"), optional) – the desired memory format of returned Tensor. Default: `torch.preserve_format`.

Example:

>>> a = torch.empty(2, 3)
>>> torch.empty_like(a)
tensor([[ 9.4064e+13, 2.8000e+01, 9.3493e+13], [ 7.5751e+18, 7.1428e+18, 7.5955e+18]])

# torch.empty_strided

`torch.empty_strided(size, stride, *, dtype=None, layout=None, device=None, requires_grad=False, pin_memory=False) → Tensor`

Returns a tensor filled with uninitialized data. The shape and strides of the tensor are defined by the variable arguments `size` and `stride` respectively. `torch.empty_strided(size, stride)` is equivalent to `torch.empty(size).as_strided(size, stride)`.

Warning

More than one element of the created tensor may refer to a single memory location. As a result, in-place operations (especially ones that are vectorized) may result in incorrect behavior. If you need to write to the tensors, please clone them first.
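To illustrate the warning above, here is a hedged sketch of how overlapping strides make distinct indices alias the same storage element (the chosen size and strides are illustrative):

>>> t = torch.empty_strided((2, 2), (1, 1)).zero_()
>>> t[0, 1] = 5.
>>> t   # the write to t[0, 1] also appears at t[1, 0]: both index storage element 1
tensor([[0., 5.],
        [5., 0.]])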
Parameters * **size** (_tuple of python:ints_) – the shape of the output tensor * **stride** (_tuple of python:ints_) – the strides of the output tensor Keyword Arguments * **dtype** ([`torch.dtype`](../tensor_attributes#torch.torch.dtype "torch.torch.dtype"), optional) – the desired data type of returned tensor. Default: if `None`, uses a global default (see [`torch.set_default_tensor_type()`](torch.set_default_tensor_type#torch.set_default_tensor_type "torch.set_default_tensor_type")). * **layout** ([`torch.layout`](../tensor_attributes#torch.torch.layout "torch.torch.layout"), optional) – the desired layout of returned Tensor. Default: `torch.strided`. * **device** ([`torch.device`](../tensor_attributes#torch.torch.device "torch.torch.device"), optional) – the desired device of returned tensor. Default: if `None`, uses the current device for the default tensor type (see [`torch.set_default_tensor_type()`](torch.set_default_tensor_type#torch.set_default_tensor_type "torch.set_default_tensor_type")). `device` will be the CPU for CPU tensor types and the current CUDA device for CUDA tensor types. * **requires_grad** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – If autograd should record operations on the returned tensor. Default: `False`. * **pin_memory** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – If set, returned tensor would be allocated in the pinned memory. Works only for CPU tensors. Default: `False`. Example: >>> a = torch.empty_strided((2, 3), (1, 2)) >>> a tensor([[8.9683e-44, 4.4842e-44, 5.1239e+07], [0.0000e+00, 0.0000e+00, 3.0705e-41]]) >>> a.stride() (1, 2) >>> a.size() torch.Size([2, 3]) # enable_grad `class torch.enable_grad` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/autograd/grad_mode.html#enable_grad) Context-manager that enables gradient calculation. Enables gradient calculation, if it has been disabled via [`no_grad`](torch.no_grad#torch.no_grad "torch.no_grad") or [`set_grad_enabled`](torch.set_grad_enabled#torch.set_grad_enabled "torch.set_grad_enabled"). This context manager is thread local; it will not affect computation in other threads. Also functions as a decorator. (Make sure to instantiate with parenthesis.) Example: >>> x = torch.tensor([1], requires_grad=True) >>> with torch.no_grad(): ... with torch.enable_grad(): ... y = x * 2 >>> y.requires_grad True >>> y.backward() >>> x.grad >>> @torch.enable_grad() ... def doubler(x): ... return x * 2 >>> with torch.no_grad(): ... z = doubler(x) >>> z.requires_grad True # torch.eq `torch.eq(input, other, *, out=None) → Tensor` Computes element-wise equality The second argument can be a number or a tensor whose shape is [broadcastable](https://pytorch.org/docs/1.8.0/notes/broadcasting.html#broadcasting- semantics) with the first argument. Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the tensor to compare * **other** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _or_[float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)")) – the tensor or value to compare Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. 
Returns A boolean tensor that is True where `input` is equal to `other` and False elsewhere Example: >>> torch.eq(torch.tensor([[1, 2], [3, 4]]), torch.tensor([[1, 1], [4, 4]])) tensor([[ True, False], [False, True]]) # torch.equal `torch.equal(input, other) → bool` `True` if two tensors have the same size and elements, `False` otherwise. Example: >>> torch.equal(torch.tensor([1, 2]), torch.tensor([1, 2])) True # torch.erf `torch.erf(input, *, out=None) → Tensor` Computes the error function of each element. The error function is defined as follows: erf(x)=2π∫0xe−t2dt\mathrm{erf}(x) = \frac{2}{\sqrt{\pi}} \int_{0}^{x} e^{-t^2} dt Parameters **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. Example: >>> torch.erf(torch.tensor([0, -1., 10.])) tensor([ 0.0000, -0.8427, 1.0000]) # torch.erfc `torch.erfc(input, *, out=None) → Tensor` Computes the complementary error function of each element of `input`. The complementary error function is defined as follows: erfc(x)=1−2π∫0xe−t2dt\mathrm{erfc}(x) = 1 - \frac{2}{\sqrt{\pi}} \int_{0}^{x} e^{-t^2} dt Parameters **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. Example: >>> torch.erfc(torch.tensor([0, -1., 10.])) tensor([ 1.0000, 1.8427, 0.0000]) # torch.erfinv `torch.erfinv(input, *, out=None) → Tensor` Computes the inverse error function of each element of `input`. The inverse error function is defined in the range (−1,1)(-1, 1) as: erfinv(erf(x))=x\mathrm{erfinv}(\mathrm{erf}(x)) = x Parameters **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. Example: >>> torch.erfinv(torch.tensor([0, 0.5, -1.])) tensor([ 0.0000, 0.4769, -inf]) # torch.exp `torch.exp(input, *, out=None) → Tensor` Returns a new tensor with the exponential of the elements of the input tensor `input`. yi=exiy_{i} = e^{x_{i}} Parameters **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. Example: >>> torch.exp(torch.tensor([0, math.log(2.)])) tensor([ 1., 2.]) # torch.exp2 `torch.exp2(input, *, out=None) → Tensor` Computes the base two exponential function of `input`. yi=2xiy_{i} = 2^{x_{i}} Parameters **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. Example: >>> torch.exp2(torch.tensor([0, math.log2(2.), 3, 4])) tensor([ 1., 2., 8., 16.]) # torch.expm1 `torch.expm1(input, *, out=None) → Tensor` Returns a new tensor with the exponential of the elements minus 1 of `input`. yi=exi−1y_{i} = e^{x_{i}} - 1 Parameters **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. Example: >>> torch.expm1(torch.tensor([0, math.log(2.)])) tensor([ 0., 1.]) # torch.eye `torch.eye(n, m=None, *, out=None, dtype=None, layout=torch.strided, device=None, requires_grad=False) → Tensor` Returns a 2-D tensor with ones on the diagonal and zeros elsewhere. 
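Passing `m` yields a rectangular matrix, and `dtype` can be used to build a boolean diagonal mask; a small sketch:

>>> torch.eye(2, 4)
tensor([[1., 0., 0., 0.],
        [0., 1., 0., 0.]])
>>> torch.eye(3, dtype=torch.bool)
tensor([[ True, False, False],
        [False,  True, False],
        [False, False,  True]])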
Parameters

* **n** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – the number of rows
* **m** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – the number of columns, with default being `n`

Keyword Arguments

* **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor.
* **dtype** ([`torch.dtype`](../tensor_attributes#torch.torch.dtype "torch.torch.dtype"), optional) – the desired data type of returned tensor. Default: if `None`, uses a global default (see [`torch.set_default_tensor_type()`](torch.set_default_tensor_type#torch.set_default_tensor_type "torch.set_default_tensor_type")).
* **layout** ([`torch.layout`](../tensor_attributes#torch.torch.layout "torch.torch.layout"), optional) – the desired layout of returned Tensor. Default: `torch.strided`.
* **device** ([`torch.device`](../tensor_attributes#torch.torch.device "torch.torch.device"), optional) – the desired device of returned tensor. Default: if `None`, uses the current device for the default tensor type (see [`torch.set_default_tensor_type()`](torch.set_default_tensor_type#torch.set_default_tensor_type "torch.set_default_tensor_type")). `device` will be the CPU for CPU tensor types and the current CUDA device for CUDA tensor types.
* **requires_grad** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – If autograd should record operations on the returned tensor. Default: `False`.

Returns

A 2-D tensor with ones on the diagonal and zeros elsewhere

Return type

[Tensor](../tensors#torch.Tensor "torch.Tensor")

Example:

>>> torch.eye(3)
tensor([[ 1., 0., 0.], [ 0., 1., 0.], [ 0., 0., 1.]])

# torch.fake_quantize_per_channel_affine

`torch.fake_quantize_per_channel_affine(input, scale, zero_point, axis, quant_min, quant_max) → Tensor`

Returns a new tensor with the data in `input` fake quantized per channel using `scale`, `zero_point`, `quant_min` and `quant_max`, across the channel specified by `axis`.

\text{output} = \min(\text{quant\_max}, \max(\text{quant\_min}, \text{std::nearby\_int}(\text{input} / \text{scale}) + \text{zero\_point}))

Parameters

* **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input value(s), in `torch.float32`.
* **scale** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – quantization scale, per channel * **zero_point** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – quantization zero_point, per channel * **axis** (_int32_) – channel axis * **quant_min** (_int64_) – lower bound of the quantized domain * **quant_max** (_int64_) – upper bound of the quantized domain Returns A newly fake_quantized per channel tensor Return type [Tensor](../tensors#torch.Tensor "torch.Tensor") Example: >>> x = torch.randn(2, 2, 2) >>> x tensor([[[-0.2525, -0.0466], [ 0.3491, -0.2168]], [[-0.5906, 1.6258], [ 0.6444, -0.0542]]]) >>> scales = (torch.randn(2) + 1) * 0.05 >>> scales tensor([0.0475, 0.0486]) >>> zero_points = torch.zeros(2).to(torch.long) >>> zero_points tensor([0, 0]) >>> torch.fake_quantize_per_channel_affine(x, scales, zero_points, 1, 0, 255) tensor([[[0.0000, 0.0000], [0.3405, 0.0000]], [[0.0000, 1.6134], [0.6323, 0.0000]]]) # torch.fake_quantize_per_tensor_affine `torch.fake_quantize_per_tensor_affine(input, scale, zero_point, quant_min, quant_max) → Tensor` Returns a new tensor with the data in `input` fake quantized using `scale`, `zero_point`, `quant_min` and `quant_max`. output=min(quant_max,max(quant_min,std::nearby_int(input/scale)+zero_point))\text{output} = min( \text{quant\\_max}, max( \text{quant\\_min}, \text{std::nearby\\_int}(\text{input} / \text{scale}) + \text{zero\\_point} ) ) Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input value(s), in `torch.float32`. * **scale** (_double_) – quantization scale * **zero_point** (_int64_) – quantization zero_point * **quant_min** (_int64_) – lower bound of the quantized domain * **quant_max** (_int64_) – upper bound of the quantized domain Returns A newly fake_quantized tensor Return type [Tensor](../tensors#torch.Tensor "torch.Tensor") Example: >>> x = torch.randn(4) >>> x tensor([ 0.0552, 0.9730, 0.3973, -1.0780]) >>> torch.fake_quantize_per_tensor_affine(x, 0.1, 0, 0, 255) tensor([0.1000, 1.0000, 0.4000, 0.0000]) # torch.fix `torch.fix(input, *, out=None) → Tensor` Alias for [`torch.trunc()`](torch.trunc#torch.trunc "torch.trunc") # torch.flatten `torch.flatten(input, start_dim=0, end_dim=-1) → Tensor` Flattens `input` by reshaping it into a one-dimensional tensor. If `start_dim` or `end_dim` are passed, only dimensions starting with `start_dim` and ending with `end_dim` are flattened. The order of elements in `input` is unchanged. Unlike NumPy’s flatten, which always copies input’s data, this function may return the original object, a view, or copy. If no dimensions are flattened, then the original object `input` is returned. Otherwise, if input can be viewed as the flattened shape, then that view is returned. Finally, only if the input cannot be viewed as the flattened shape is input’s data copied. See [`torch.Tensor.view()`](../tensors#torch.Tensor.view "torch.Tensor.view") for details on when a view will be returned. Note Flattening a zero-dimensional tensor will return a one-dimensional view. Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. * **start_dim** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – the first dim to flatten * **end_dim** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – the last dim to flatten Example: >>> t = torch.tensor([[[1, 2], ... [3, 4]], ... [[5, 6], ... 
[7, 8]]]) >>> torch.flatten(t) tensor([1, 2, 3, 4, 5, 6, 7, 8]) >>> torch.flatten(t, start_dim=1) tensor([[1, 2, 3, 4], [5, 6, 7, 8]]) # torch.flip `torch.flip(input, dims) → Tensor` Reverse the order of a n-D tensor along given axis in dims. Note `torch.flip` makes a copy of `input`’s data. This is different from NumPy’s `np.flip`, which returns a view in constant time. Since copying a tensor’s data is more work than viewing that data, `torch.flip` is expected to be slower than `np.flip`. Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. * **dims** (_a list_ _or_[tuple](https://docs.python.org/3/library/stdtypes.html#tuple "\(in Python v3.9\)")) – axis to flip on Example: >>> x = torch.arange(8).view(2, 2, 2) >>> x tensor([[[ 0, 1], [ 2, 3]], [[ 4, 5], [ 6, 7]]]) >>> torch.flip(x, [0, 1]) tensor([[[ 6, 7], [ 4, 5]], [[ 2, 3], [ 0, 1]]]) # torch.fliplr `torch.fliplr(input) → Tensor` Flip tensor in the left/right direction, returning a new tensor. Flip the entries in each row in the left/right direction. Columns are preserved, but appear in a different order than before. Note Requires the tensor to be at least 2-D. Note `torch.fliplr` makes a copy of `input`’s data. This is different from NumPy’s `np.fliplr`, which returns a view in constant time. Since copying a tensor’s data is more work than viewing that data, `torch.fliplr` is expected to be slower than `np.fliplr`. Parameters **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – Must be at least 2-dimensional. Example: >>> x = torch.arange(4).view(2, 2) >>> x tensor([[0, 1], [2, 3]]) >>> torch.fliplr(x) tensor([[1, 0], [3, 2]]) # torch.flipud `torch.flipud(input) → Tensor` Flip tensor in the up/down direction, returning a new tensor. Flip the entries in each column in the up/down direction. Rows are preserved, but appear in a different order than before. Note Requires the tensor to be at least 1-D. Note `torch.flipud` makes a copy of `input`’s data. This is different from NumPy’s `np.flipud`, which returns a view in constant time. Since copying a tensor’s data is more work than viewing that data, `torch.flipud` is expected to be slower than `np.flipud`. Parameters **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – Must be at least 1-dimensional. Example: >>> x = torch.arange(4).view(2, 2) >>> x tensor([[0, 1], [2, 3]]) >>> torch.flipud(x) tensor([[2, 3], [0, 1]]) # torch.float_power `torch.float_power(input, exponent, *, out=None) → Tensor` Raises `input` to the power of `exponent`, elementwise, in double precision. If neither input is complex returns a `torch.float64` tensor, and if one or more inputs is complex returns a `torch.complex128` tensor. Note This function always computes in double precision, unlike [`torch.pow()`](torch.pow#torch.pow "torch.pow"), which implements more typical [type promotion](../tensor_attributes#type-promotion-doc). This is useful when the computation needs to be performed in a wider or more precise dtype, or the results of the computation may contain fractional values not representable in the input dtypes, like when an integer base is raised to a negative integer exponent. Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _or_ _Number_) – the base value(s) * **exponent** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _or_ _Number_) – the exponent value(s) Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. 
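One situation where this matters is an integer base raised to a negative integer exponent, which fractional results cannot represent in an integer dtype; a hedged comparison sketch:

>>> a = torch.arange(1, 5)              # int64
>>> torch.float_power(a, -2)            # computed in float64, fractional results preserved
tensor([1.0000, 0.2500, 0.1111, 0.0625], dtype=torch.float64)
>>> torch.pow(a.double(), -2)           # comparable only after casting manually
tensor([1.0000, 0.2500, 0.1111, 0.0625], dtype=torch.float64)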
Example: >>> a = torch.randint(10, (4,)) >>> a tensor([6, 4, 7, 1]) >>> torch.float_power(a, 2) tensor([36., 16., 49., 1.], dtype=torch.float64) >>> a = torch.arange(1, 5) >>> a tensor([ 1, 2, 3, 4]) >>> exp = torch.tensor([2, -3, 4, -5]) >>> exp tensor([ 2, -3, 4, -5]) >>> torch.float_power(a, exp) tensor([1.0000e+00, 1.2500e-01, 8.1000e+01, 9.7656e-04], dtype=torch.float64) # torch.floor `torch.floor(input, *, out=None) → Tensor` Returns a new tensor with the floor of the elements of `input`, the largest integer less than or equal to each element. outi=⌊inputi⌋\text{out}_{i} = \left\lfloor \text{input}_{i} \right\rfloor Parameters **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. Example: >>> a = torch.randn(4) >>> a tensor([-0.8166, 1.5308, -0.2530, -0.2091]) >>> torch.floor(a) tensor([-1., 1., -1., -1.]) # torch.floor_divide `torch.floor_divide(input, other, *, out=None) → Tensor` Warning This function’s name is a misnomer. It actually rounds the quotient towards zero instead of taking its floor. This behavior will be deprecated in a future PyTorch release. Computes `input` divided by `other`, elementwise, and rounds each quotient towards zero. Equivalently, it truncates the quotient(s): outi=trunc(inputiotheri)\text{{out}}_i = \text{trunc} \left( \frac{{\text{{input}}_i}}{{\text{{other}}_i}} \right) Supports broadcasting to a common shape, type promotion, and integer and float inputs. Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _or_ _Number_) – the dividend * **other** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _or_ _Number_) – the divisor Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. Example: >>> a = torch.tensor([4.0, 3.0]) >>> b = torch.tensor([2.0, 2.0]) >>> torch.floor_divide(a, b) tensor([2.0, 1.0]) >>> torch.floor_divide(a, 1.4) tensor([2.0, 2.0]) # torch.fmax `torch.fmax(input, other, *, out=None) → Tensor` Computes the element-wise maximum of `input` and `other`. This is like [`torch.maximum()`](torch.maximum#torch.maximum "torch.maximum") except it handles NaNs differently: if exactly one of the two elements being compared is a NaN then the non-NaN element is taken as the maximum. Only if both elements are NaN is NaN propagated. This function is a wrapper around C++’s `std::fmax` and is similar to NumPy’s `fmax` function. Supports [broadcasting to a common shape](https://pytorch.org/docs/1.8.0/notes/broadcasting.html#broadcasting- semantics), [type promotion](../tensor_attributes#type-promotion-doc), and integer and floating-point inputs. Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. * **other** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the second input tensor Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. Example: >>> a = torch.tensor([9.7, float('nan'), 3.1, float('nan')]) >>> b = torch.tensor([-2.2, 0.5, float('nan'), float('nan')]) >>> torch.fmax(a, b) tensor([9.7000, 0.5000, 3.1000, nan]) # torch.fmin `torch.fmin(input, other, *, out=None) → Tensor` Computes the element-wise minimum of `input` and `other`. 
This is like [`torch.minimum()`](torch.minimum#torch.minimum "torch.minimum") except it handles NaNs differently: if exactly one of the two elements being compared is a NaN then the non-NaN element is taken as the minimum. Only if both elements are NaN is NaN propagated. This function is a wrapper around C++’s `std::fmin` and is similar to NumPy’s `fmin` function. Supports [broadcasting to a common shape](https://pytorch.org/docs/1.8.0/notes/broadcasting.html#broadcasting- semantics), [type promotion](../tensor_attributes#type-promotion-doc), and integer and floating-point inputs. Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. * **other** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the second input tensor Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. Example: >>> a = torch.tensor([2.2, float('nan'), 2.1, float('nan')]) >>> b = torch.tensor([-9.3, 0.1, float('nan'), float('nan')]) >>> torch.fmin(a, b) tensor([-9.3000, 0.1000, 2.1000, nan]) # torch.fmod `torch.fmod(input, other, *, out=None) → Tensor` Computes the element-wise remainder of division. The dividend and divisor may contain both for integer and floating point numbers. The remainder has the same sign as the dividend `input`. Supports [broadcasting to a common shape](https://pytorch.org/docs/1.8.0/notes/broadcasting.html#broadcasting- semantics), [type promotion](../tensor_attributes#type-promotion-doc), and integer and float inputs. Note When the divisor is zero, returns `NaN` for floating point dtypes on both CPU and GPU; raises `RuntimeError` for integer division by zero on CPU; Integer division by zero on GPU may return any value. Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the dividend * **other** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _or_ _Scalar_) – the divisor Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. Example: >>> torch.fmod(torch.tensor([-3., -2, -1, 1, 2, 3]), 2) tensor([-1., -0., -1., 1., 0., 1.]) >>> torch.fmod(torch.tensor([1, 2, 3, 4, 5]), 1.5) tensor([1.0000, 0.5000, 0.0000, 1.0000, 0.5000]) # torch.frac `torch.frac(input, *, out=None) → Tensor` Computes the fractional portion of each element in `input`. outi=inputi−⌊∣inputi∣⌋∗sgn⁡(inputi)\text{out}_{i} = \text{input}_{i} - \left\lfloor |\text{input}_{i}| \right\rfloor * \operatorname{sgn}(\text{input}_{i}) Example: >>> torch.frac(torch.tensor([1, 2.5, -3.2])) tensor([ 0.0000, 0.5000, -0.2000]) # torch.from_numpy `torch.from_numpy(ndarray) → Tensor` Creates a [`Tensor`](../tensors#torch.Tensor "torch.Tensor") from a [`numpy.ndarray`](https://numpy.org/doc/stable/reference/generated/numpy.ndarray.html#numpy.ndarray "\(in NumPy v1.20\)"). The returned tensor and `ndarray` share the same memory. Modifications to the tensor will be reflected in the `ndarray` and vice versa. The returned tensor is not resizable. It currently accepts `ndarray` with dtypes of `numpy.float64`, `numpy.float32`, `numpy.float16`, `numpy.complex64`, `numpy.complex128`, `numpy.int64`, `numpy.int32`, `numpy.int16`, `numpy.int8`, `numpy.uint8`, and `numpy.bool`. 
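If you do not want the returned tensor to share memory with the array, clone it; a minimal sketch (assuming NumPy is imported as `numpy`, matching the example below):

>>> a = numpy.array([1., 2., 3.])
>>> t = torch.from_numpy(a).clone()   # clone() copies the data, breaking the shared storage
>>> t[0] = -1
>>> a
array([1., 2., 3.])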
Example: >>> a = numpy.array([1, 2, 3]) >>> t = torch.from_numpy(a) >>> t tensor([ 1, 2, 3]) >>> t[0] = -1 >>> a array([-1, 2, 3]) # torch.full `torch.full(size, fill_value, *, out=None, dtype=None, layout=torch.strided, device=None, requires_grad=False) → Tensor` Creates a tensor of size `size` filled with `fill_value`. The tensor’s dtype is inferred from `fill_value`. Parameters * **size** (_int..._) – a list, tuple, or `torch.Size` of integers defining the shape of the output tensor. * **fill_value** (_Scalar_) – the value to fill the output tensor with. Keyword Arguments * **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. * **dtype** ([`torch.dtype`](../tensor_attributes#torch.torch.dtype "torch.torch.dtype"), optional) – the desired data type of returned tensor. Default: if `None`, uses a global default (see [`torch.set_default_tensor_type()`](torch.set_default_tensor_type#torch.set_default_tensor_type "torch.set_default_tensor_type")). * **layout** ([`torch.layout`](../tensor_attributes#torch.torch.layout "torch.torch.layout"), optional) – the desired layout of returned Tensor. Default: `torch.strided`. * **device** ([`torch.device`](../tensor_attributes#torch.torch.device "torch.torch.device"), optional) – the desired device of returned tensor. Default: if `None`, uses the current device for the default tensor type (see [`torch.set_default_tensor_type()`](torch.set_default_tensor_type#torch.set_default_tensor_type "torch.set_default_tensor_type")). `device` will be the CPU for CPU tensor types and the current CUDA device for CUDA tensor types. * **requires_grad** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – If autograd should record operations on the returned tensor. Default: `False`. Example: >>> torch.full((2, 3), 3.141592) tensor([[ 3.1416, 3.1416, 3.1416], [ 3.1416, 3.1416, 3.1416]]) # torch.full_like `torch.full_like(input, fill_value, *, dtype=None, layout=torch.strided, device=None, requires_grad=False, memory_format=torch.preserve_format) → Tensor` Returns a tensor with the same size as `input` filled with `fill_value`. `torch.full_like(input, fill_value)` is equivalent to `torch.full(input.size(), fill_value, dtype=input.dtype, layout=input.layout, device=input.device)`. Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the size of `input` will determine size of the output tensor. * **fill_value** – the number to fill the output tensor with. Keyword Arguments * **dtype** ([`torch.dtype`](../tensor_attributes#torch.torch.dtype "torch.torch.dtype"), optional) – the desired data type of returned Tensor. Default: if `None`, defaults to the dtype of `input`. * **layout** ([`torch.layout`](../tensor_attributes#torch.torch.layout "torch.torch.layout"), optional) – the desired layout of returned tensor. Default: if `None`, defaults to the layout of `input`. * **device** ([`torch.device`](../tensor_attributes#torch.torch.device "torch.torch.device"), optional) – the desired device of returned tensor. Default: if `None`, defaults to the device of `input`. * **requires_grad** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – If autograd should record operations on the returned tensor. Default: `False`. * **memory_format** ([`torch.memory_format`](../tensor_attributes#torch.torch.memory_format "torch.torch.memory_format"), optional) – the desired memory format of returned Tensor. 
Default: `torch.preserve_format`. # torch.gather `torch.gather(input, dim, index, *, sparse_grad=False, out=None) → Tensor` Gathers values along an axis specified by `dim`. For a 3-D tensor the output is specified by: out[i][j][k] = input[index[i][j][k]][j][k] # if dim == 0 out[i][j][k] = input[i][index[i][j][k]][k] # if dim == 1 out[i][j][k] = input[i][j][index[i][j][k]] # if dim == 2 `input` and `index` must have the same number of dimensions. It is also required that `index.size(d) <= input.size(d)` for all dimensions `d != dim`. `out` will have the same shape as `index`. Note that `input` and `index` do not broadcast against each other. Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the source tensor * **dim** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – the axis along which to index * **index** (_LongTensor_) – the indices of elements to gather Keyword Arguments * **sparse_grad** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – If `True`, gradient w.r.t. `input` will be a sparse tensor. * **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the destination tensor Example: >>> t = torch.tensor([[1, 2], [3, 4]]) >>> torch.gather(t, 1, torch.tensor([[0, 0], [1, 0]])) tensor([[ 1, 1], [ 4, 3]]) # torch.gcd `torch.gcd(input, other, *, out=None) → Tensor` Computes the element-wise greatest common divisor (GCD) of `input` and `other`. Both `input` and `other` must have integer types. Note This defines gcd(0,0)=0gcd(0, 0) = 0 . Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. * **other** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the second input tensor Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. Example: >>> a = torch.tensor([5, 10, 15]) >>> b = torch.tensor([3, 4, 5]) >>> torch.gcd(a, b) tensor([1, 2, 5]) >>> c = torch.tensor([3]) >>> torch.gcd(a, c) tensor([1, 1, 3]) # torch.ge `torch.ge(input, other, *, out=None) → Tensor` Computes input≥other\text{input} \geq \text{other} element-wise. The second argument can be a number or a tensor whose shape is [broadcastable](https://pytorch.org/docs/1.8.0/notes/broadcasting.html#broadcasting- semantics) with the first argument. Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the tensor to compare * **other** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _or_[float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)")) – the tensor or value to compare Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. Returns A boolean tensor that is True where `input` is greater than or equal to `other` and False elsewhere Example: >>> torch.ge(torch.tensor([[1, 2], [3, 4]]), torch.tensor([[1, 1], [4, 4]])) tensor([[True, True], [False, True]]) # Generator `class torch.Generator(device='cpu') → Generator` Creates and returns a generator object that manages the state of the algorithm which produces pseudo random numbers. Used as a keyword argument in many [In- place random sampling](../torch#inplace-random-sampling) functions. Parameters **device** ([`torch.device`](../tensor_attributes#torch.torch.device "torch.torch.device"), optional) – the desired device for the generator. Returns An torch.Generator object. 
Return type Generator Example: >>> g_cpu = torch.Generator() >>> g_cuda = torch.Generator(device='cuda') `device` Generator.device -> device Gets the current device of the generator. Example: >>> g_cpu = torch.Generator() >>> g_cpu.device device(type='cpu') `get_state() → Tensor` Returns the Generator state as a `torch.ByteTensor`. Returns A `torch.ByteTensor` which contains all the necessary bits to restore a Generator to a specific point in time. Return type [Tensor](../tensors#torch.Tensor "torch.Tensor") Example: >>> g_cpu = torch.Generator() >>> g_cpu.get_state() `initial_seed() → int` Returns the initial seed for generating random numbers. Example: >>> g_cpu = torch.Generator() >>> g_cpu.initial_seed() 2147483647 `manual_seed(seed) → Generator` Sets the seed for generating random numbers. Returns a `torch.Generator` object. It is recommended to set a large seed, i.e. a number that has a good balance of 0 and 1 bits. Avoid having many 0 bits in the seed. Parameters **seed** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – The desired seed. Value must be within the inclusive range `[-0x8000_0000_0000_0000, 0xffff_ffff_ffff_ffff]`. Otherwise, a RuntimeError is raised. Negative inputs are remapped to positive values with the formula `0xffff_ffff_ffff_ffff + seed`. Returns An torch.Generator object. Return type Generator Example: >>> g_cpu = torch.Generator() >>> g_cpu.manual_seed(2147483647) `seed() → int` Gets a non-deterministic random number from std::random_device or the current time and uses it to seed a Generator. Example: >>> g_cpu = torch.Generator() >>> g_cpu.seed() 1516516984916 `set_state(new_state) → void` Sets the Generator state. Parameters **new_state** (_torch.ByteTensor_) – The desired state. Example: >>> g_cpu = torch.Generator() >>> g_cpu_other = torch.Generator() >>> g_cpu.set_state(g_cpu_other.get_state()) # torch.geqrf `torch.geqrf(input, *, out=None) -> (Tensor, Tensor)` This is a low-level function for calling LAPACK directly. This function returns a namedtuple (a, tau) as defined in [LAPACK documentation for geqrf](https://software.intel.com/en-us/node/521004) . You’ll generally want to use [`torch.qr()`](torch.qr#torch.qr "torch.qr") instead. Computes a QR decomposition of `input`, but without constructing QQ and RR as explicit separate matrices. Rather, this directly calls the underlying LAPACK function `?geqrf` which produces a sequence of ‘elementary reflectors’. See [LAPACK documentation for geqrf](https://software.intel.com/en- us/node/521004) for further details. Parameters **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input matrix Keyword Arguments **out** ([tuple](https://docs.python.org/3/library/stdtypes.html#tuple "\(in Python v3.9\)") _,__optional_) – the output tuple of (Tensor, Tensor) # torch.ger `torch.ger(input, vec2, *, out=None) → Tensor` Alias of [`torch.outer()`](torch.outer#torch.outer "torch.outer"). Warning This function is deprecated and will be removed in a future PyTorch release. Use [`torch.outer()`](torch.outer#torch.outer "torch.outer") instead. # torch.get_default_dtype `torch.get_default_dtype() → torch.dtype` Get the current default floating point [`torch.dtype`](../tensor_attributes#torch.torch.dtype "torch.torch.dtype"). 
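As a supplement to the example below, a small sketch (assuming a fresh session with the stock default of `torch.float32`) showing how the default dtype is picked up by factory calls with Python floats:

import torch

print(torch.get_default_dtype())        # torch.float32 on a fresh session
x = torch.tensor([1.0, 2.0])            # Python floats follow the default floating point dtype
print(x.dtype)                          # torch.float32
torch.set_default_dtype(torch.float64)
print(torch.tensor([1.0, 2.0]).dtype)   # torch.float64 after changing the default
torch.set_default_dtype(torch.float32)  # restore the default for the examples that follow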
Example: >>> torch.get_default_dtype() # initial default for floating point is torch.float32 torch.float32 >>> torch.set_default_dtype(torch.float64) >>> torch.get_default_dtype() # default is now changed to torch.float64 torch.float64 >>> torch.set_default_tensor_type(torch.FloatTensor) # setting tensor type also affects this >>> torch.get_default_dtype() # changed to torch.float32, the dtype for torch.FloatTensor torch.float32 # torch.get_num_interop_threads `torch.get_num_interop_threads() → int` Returns the number of threads used for inter-op parallelism on CPU (e.g. in JIT interpreter) # torch.get_num_threads `torch.get_num_threads() → int` Returns the number of threads used for parallelizing CPU operations # torch.get_rng_state `torch.get_rng_state()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/random.html#get_rng_state) Returns the random number generator state as a `torch.ByteTensor`. # torch.greater `torch.greater(input, other, *, out=None) → Tensor` Alias for [`torch.gt()`](torch.gt#torch.gt "torch.gt"). # torch.greater_equal `torch.greater_equal(input, other, *, out=None) → Tensor` Alias for [`torch.ge()`](torch.ge#torch.ge "torch.ge"). # torch.gt `torch.gt(input, other, *, out=None) → Tensor` Computes input>other\text{input} > \text{other} element-wise. The second argument can be a number or a tensor whose shape is [broadcastable](https://pytorch.org/docs/1.8.0/notes/broadcasting.html#broadcasting- semantics) with the first argument. Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the tensor to compare * **other** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _or_[float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)")) – the tensor or value to compare Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. Returns A boolean tensor that is True where `input` is greater than `other` and False elsewhere Example: >>> torch.gt(torch.tensor([[1, 2], [3, 4]]), torch.tensor([[1, 1], [4, 4]])) tensor([[False, True], [False, False]]) # torch.hamming_window `torch.hamming_window(window_length, periodic=True, alpha=0.54, beta=0.46, *, dtype=None, layout=torch.strided, device=None, requires_grad=False) → Tensor` Hamming window function. w[n]=α−βcos⁡(2πnN−1),w[n] = \alpha - \beta\ \cos \left( \frac{2 \pi n}{N - 1} \right), where NN is the full window size. The input `window_length` is a positive integer controlling the returned window size. `periodic` flag determines whether the returned window trims off the last duplicate value from the symmetric window and is ready to be used as a periodic window with functions like [`torch.stft()`](torch.stft#torch.stft "torch.stft"). Therefore, if `periodic` is true, the NN in above formula is in fact window_length+1\text{window\\_length} + 1 . Also, we always have `torch.hamming_window(L, periodic=True)` equal to `torch.hamming_window(L + 1, periodic=False)[:-1])`. Note If `window_length` =1=1 , the returned window contains a single value 1. Note This is a generalized version of [`torch.hann_window()`](torch.hann_window#torch.hann_window "torch.hann_window"). Parameters * **window_length** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – the size of returned window * **periodic** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – If True, returns a window to be used as periodic function. If False, return a symmetric window. 
* **alpha** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _,__optional_) – The coefficient α\alpha in the equation above * **beta** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _,__optional_) – The coefficient β\beta in the equation above Keyword Arguments * **dtype** ([`torch.dtype`](../tensor_attributes#torch.torch.dtype "torch.torch.dtype"), optional) – the desired data type of returned tensor. Default: if `None`, uses a global default (see [`torch.set_default_tensor_type()`](torch.set_default_tensor_type#torch.set_default_tensor_type "torch.set_default_tensor_type")). Only floating point types are supported. * **layout** ([`torch.layout`](../tensor_attributes#torch.torch.layout "torch.torch.layout"), optional) – the desired layout of returned window tensor. Only `torch.strided` (dense layout) is supported. * **device** ([`torch.device`](../tensor_attributes#torch.torch.device "torch.torch.device"), optional) – the desired device of returned tensor. Default: if `None`, uses the current device for the default tensor type (see [`torch.set_default_tensor_type()`](torch.set_default_tensor_type#torch.set_default_tensor_type "torch.set_default_tensor_type")). `device` will be the CPU for CPU tensor types and the current CUDA device for CUDA tensor types. * **requires_grad** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – If autograd should record operations on the returned tensor. Default: `False`. Returns A 1-D tensor of size (window_length,)(\text{window\\_length},) containing the window Return type [Tensor](../tensors#torch.Tensor "torch.Tensor") # torch.hann_window `torch.hann_window(window_length, periodic=True, *, dtype=None, layout=torch.strided, device=None, requires_grad=False) → Tensor` Hann window function. w[n]=12[1−cos⁡(2πnN−1)]=sin⁡2(πnN−1),w[n] = \frac{1}{2}\ \left[1 - \cos \left( \frac{2 \pi n}{N - 1} \right)\right] = \sin^2 \left( \frac{\pi n}{N - 1} \right), where NN is the full window size. The input `window_length` is a positive integer controlling the returned window size. `periodic` flag determines whether the returned window trims off the last duplicate value from the symmetric window and is ready to be used as a periodic window with functions like [`torch.stft()`](torch.stft#torch.stft "torch.stft"). Therefore, if `periodic` is true, the NN in above formula is in fact window_length+1\text{window\\_length} + 1 . Also, we always have `torch.hann_window(L, periodic=True)` equal to `torch.hann_window(L + 1, periodic=False)[:-1])`. Note If `window_length` =1=1 , the returned window contains a single value 1. Parameters * **window_length** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – the size of returned window * **periodic** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – If True, returns a window to be used as periodic function. If False, return a symmetric window. Keyword Arguments * **dtype** ([`torch.dtype`](../tensor_attributes#torch.torch.dtype "torch.torch.dtype"), optional) – the desired data type of returned tensor. Default: if `None`, uses a global default (see [`torch.set_default_tensor_type()`](torch.set_default_tensor_type#torch.set_default_tensor_type "torch.set_default_tensor_type")). Only floating point types are supported. 
* **layout** ([`torch.layout`](../tensor_attributes#torch.torch.layout "torch.torch.layout"), optional) – the desired layout of returned window tensor. Only `torch.strided` (dense layout) is supported. * **device** ([`torch.device`](../tensor_attributes#torch.torch.device "torch.torch.device"), optional) – the desired device of returned tensor. Default: if `None`, uses the current device for the default tensor type (see [`torch.set_default_tensor_type()`](torch.set_default_tensor_type#torch.set_default_tensor_type "torch.set_default_tensor_type")). `device` will be the CPU for CPU tensor types and the current CUDA device for CUDA tensor types. * **requires_grad** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – If autograd should record operations on the returned tensor. Default: `False`. Returns A 1-D tensor of size (window_length,)(\text{window\\_length},) containing the window Return type [Tensor](../tensors#torch.Tensor "torch.Tensor") # torch.heaviside `torch.heaviside(input, values, *, out=None) → Tensor` Computes the Heaviside step function for each element in `input`. The Heaviside step function is defined as: heaviside(input,values)={0,if input < 0values,if input == 01,if input > 0\text{{heaviside}}(input, values) = \begin{cases} 0, & \text{if input < 0}\\\ values, & \text{if input == 0}\\\ 1, & \text{if input > 0} \end{cases} Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. * **values** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – The values to use where `input` is zero. Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. Example: >>> input = torch.tensor([-1.5, 0, 2.0]) >>> values = torch.tensor([0.5]) >>> torch.heaviside(input, values) tensor([0.0000, 0.5000, 1.0000]) >>> values = torch.tensor([1.2, -2.0, 3.5]) >>> torch.heaviside(input, values) tensor([0., -2., 1.]) # torch.histc `torch.histc(input, bins=100, min=0, max=0, *, out=None) → Tensor` Computes the histogram of a tensor. The elements are sorted into equal width bins between [`min`](torch.min#torch.min "torch.min") and [`max`](torch.max#torch.max "torch.max"). If [`min`](torch.min#torch.min "torch.min") and [`max`](torch.max#torch.max "torch.max") are both zero, the minimum and maximum values of the data are used. Elements lower than min and higher than max are ignored. Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. * **bins** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – number of histogram bins * **min** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – lower end of the range (inclusive) * **max** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – upper end of the range (inclusive) Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. Returns Histogram represented as a tensor Return type [Tensor](../tensors#torch.Tensor "torch.Tensor") Example: >>> torch.histc(torch.tensor([1., 2, 1]), bins=4, min=0, max=3) tensor([ 0., 2., 1., 0.]) # torch.hstack `torch.hstack(tensors, *, out=None) → Tensor` Stack tensors in sequence horizontally (column wise). This is equivalent to concatenation along the first axis for 1-D tensors, and along the second axis for all other tensors. 
Parameters **tensors** (_sequence of Tensors_) – sequence of tensors to concatenate Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. Example: >>> a = torch.tensor([1, 2, 3]) >>> b = torch.tensor([4, 5, 6]) >>> torch.hstack((a,b)) tensor([1, 2, 3, 4, 5, 6]) >>> a = torch.tensor([[1],[2],[3]]) >>> b = torch.tensor([[4],[5],[6]]) >>> torch.hstack((a,b)) tensor([[1, 4], [2, 5], [3, 6]]) # torch.hypot `torch.hypot(input, other, *, out=None) → Tensor` Given the legs of a right triangle, return its hypotenuse. outi=inputi2+otheri2\text{out}_{i} = \sqrt{\text{input}_{i}^{2} + \text{other}_{i}^{2}} The shapes of `input` and `other` must be [broadcastable](https://pytorch.org/docs/1.8.0/notes/broadcasting.html#broadcasting- semantics). Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the first input tensor * **other** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the second input tensor Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. Example: >>> a = torch.hypot(torch.tensor([4.0]), torch.tensor([3.0, 4.0, 5.0])) tensor([5.0000, 5.6569, 6.4031]) # torch.i0 `torch.i0(input, *, out=None) → Tensor` Computes the zeroth order modified Bessel function of the first kind for each element of `input`. outi=I0(inputi)=∑k=0∞(inputi2/4)k(k!)2\text{out}_{i} = I_0(\text{input}_{i}) = \sum_{k=0}^{\infty} \frac{(\text{input}_{i}^2/4)^k}{(k!)^2} Parameters **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. Example: >>> torch.i0(torch.arange(5, dtype=torch.float32)) tensor([ 1.0000, 1.2661, 2.2796, 4.8808, 11.3019]) # torch.igamma `torch.igamma(input, other, *, out=None) → Tensor` Computes the regularized lower incomplete gamma function: outi=1Γ(inputi)∫0otheritinputi−1e−tdt\text{out}_{i} = \frac{1}{\Gamma(\text{input}_i)} \int_0^{\text{other}_i} t^{\text{input}_i-1} e^{-t} dt where both inputi\text{input}_i and otheri\text{other}_i are weakly positive and at least one is strictly positive. If both are zero or either is negative then outi=nan\text{out}_i=\text{nan} . Γ(⋅)\Gamma(\cdot) in the equation above is the gamma function, Γ(inputi)=∫0∞t(inputi−1)e−tdt.\Gamma(\text{input}_i) = \int_0^\infty t^{(\text{input}_i-1)} e^{-t} dt. See [`torch.igammac()`](torch.igammac#torch.igammac "torch.igammac") and [`torch.lgamma()`](torch.lgamma#torch.lgamma "torch.lgamma") for related functions. Supports [broadcasting to a common shape](https://pytorch.org/docs/1.8.0/notes/broadcasting.html#broadcasting- semantics) and float inputs. Note The backward pass with respect to `input` is not yet supported. Please open an issue on PyTorch’s Github to request it. Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the first non-negative input tensor * **other** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the second non-negative input tensor Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. 
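As a hedged illustration (in addition to the example below) of how the regularized lower and upper incomplete gamma functions complement each other:

import torch

a = torch.tensor([4.0])
x = torch.tensor([3.0, 4.0, 5.0])
lower = torch.igamma(a, x)    # regularized lower incomplete gamma
upper = torch.igammac(a, x)   # regularized upper incomplete gamma (see torch.igammac below)
# For non-negative inputs the two results are expected to sum to one elementwise
assert torch.allclose(lower + upper, torch.ones_like(lower))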
Example: >>> a1 = torch.tensor([4.0]) >>> a2 = torch.tensor([3.0, 4.0, 5.0]) >>> a = torch.igamma(a1, a2) tensor([0.3528, 0.5665, 0.7350]) >>> b = torch.igamma(a1, a2) + torch.igammac(a1, a2) tensor([1., 1., 1.]) # torch.igammac `torch.igammac(input, other, *, out=None) → Tensor` Computes the regularized upper incomplete gamma function: \text{out}_{i} = \frac{1}{\Gamma(\text{input}_i)} \int_{\text{other}_i}^{\infty} t^{\text{input}_i-1} e^{-t} dt where both \text{input}_i and \text{other}_i are weakly positive and at least one is strictly positive. If both are zero or either is negative then \text{out}_i=\text{nan} . \Gamma(\cdot) in the equation above is the gamma function, \Gamma(\text{input}_i) = \int_0^\infty t^{(\text{input}_i-1)} e^{-t} dt. See [`torch.igamma()`](torch.igamma#torch.igamma "torch.igamma") and [`torch.lgamma()`](torch.lgamma#torch.lgamma "torch.lgamma") for related functions. Supports [broadcasting to a common shape](https://pytorch.org/docs/1.8.0/notes/broadcasting.html#broadcasting- semantics) and float inputs. Note The backward pass with respect to `input` is not yet supported. Please open an issue on PyTorch’s Github to request it. Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the first non-negative input tensor * **other** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the second non-negative input tensor Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. Example: >>> a1 = torch.tensor([4.0]) >>> a2 = torch.tensor([3.0, 4.0, 5.0]) >>> a = torch.igammac(a1, a2) tensor([0.6472, 0.4335, 0.2650]) >>> b = torch.igamma(a1, a2) + torch.igammac(a1, a2) tensor([1., 1., 1.]) # torch.imag `torch.imag(input) → Tensor` Returns a new tensor containing the imaginary values of the `input` tensor. The returned tensor and `input` share the same underlying storage. Warning `imag()` is only supported for tensors with complex dtypes. Parameters **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. Example: >>> x=torch.randn(4, dtype=torch.cfloat) >>> x tensor([(0.3100+0.3553j), (-0.5445-0.7896j), (-1.6492-0.0633j), (-0.0638-0.8119j)]) >>> x.imag tensor([ 0.3553, -0.7896, -0.0633, -0.8119]) # torch.index_select `torch.index_select(input, dim, index, *, out=None) → Tensor` Returns a new tensor which indexes the `input` tensor along dimension `dim` using the entries in `index` which is a `LongTensor`. The returned tensor has the same number of dimensions as the original tensor (`input`). The `dim`th dimension has the same size as the length of `index`; other dimensions have the same size as in the original tensor. Note The returned tensor does **not** use the same storage as the original tensor. If `out` has a different shape than expected, we silently change it to the correct shape, reallocating the underlying storage if necessary. Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. * **dim** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – the dimension in which we index * **index** (_IntTensor_ _or_ _LongTensor_) – the 1-D tensor containing the indices to index Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor.
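A small sketch (with illustrative values) of the copy semantics called out in the Note above: the returned tensor owns its own storage, so writes to it do not touch `input`:

import torch

x = torch.arange(12).reshape(3, 4)
idx = torch.tensor([0, 2])
rows = torch.index_select(x, 0, idx)   # select rows 0 and 2
rows[0, 0] = -1                        # modify the selection...
print(x[0, 0])                         # ...the original is unchanged: tensor(0)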
Example: >>> x = torch.randn(3, 4) >>> x tensor([[ 0.1427, 0.0231, -0.5414, -1.0009], [-0.4664, 0.2647, -0.1228, -1.1068], [-1.1734, -0.6571, 0.7230, -0.6004]]) >>> indices = torch.tensor([0, 2]) >>> torch.index_select(x, 0, indices) tensor([[ 0.1427, 0.0231, -0.5414, -1.0009], [-1.1734, -0.6571, 0.7230, -0.6004]]) >>> torch.index_select(x, 1, indices) tensor([[ 0.1427, -0.5414], [-0.4664, -0.1228], [-1.1734, 0.7230]]) # torch.initial_seed `torch.initial_seed()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/random.html#initial_seed) Returns the initial seed for generating random numbers as a Python `long`. # torch.inner `torch.inner(input, other, *, out=None) → Tensor` Computes the dot product for 1D tensors. For higher dimensions, sums the product of elements from `input` and `other` along their last dimension. Note If either `input` or `other` is a scalar, the result is equivalent to `torch.mul(input, other)`. If both `input` and `other` are non-scalars, the size of their last dimension must match and the result is equivalent to `torch.tensordot(input, other, dims=([-1], [-1]))` Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – First input tensor * **other** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – Second input tensor Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – Optional output tensor to write result into. The output shape is `input.shape[:-1] + other.shape[:-1]`. Example: # Dot product >>> torch.inner(torch.tensor([1, 2, 3]), torch.tensor([0, 2, 1])) tensor(7) # Multidimensional input tensors >>> a = torch.randn(2, 3) >>> a tensor([[0.8173, 1.0874, 1.1784], [0.3279, 0.1234, 2.7894]]) >>> b = torch.randn(2, 4, 3) >>> b tensor([[[-0.4682, -0.7159, 0.1506], [ 0.4034, -0.3657, 1.0387], [ 0.9892, -0.6684, 0.1774], [ 0.9482, 1.3261, 0.3917]], [[ 0.4537, 0.7493, 1.1724], [ 0.2291, 0.5749, -0.2267], [-0.7920, 0.3607, -0.3701], [ 1.3666, -0.5850, -1.7242]]]) >>> torch.inner(a, b) tensor([[[-0.9837, 1.1560, 0.2907, 2.6785], [ 2.5671, 0.5452, -0.6912, -1.5509]], [[ 0.1782, 2.9843, 0.7366, 1.5672], [ 3.5115, -0.4864, -1.2476, -4.4337]]]) # Scalar input >>> torch.inner(a, torch.tensor(2)) tensor([[1.6347, 2.1748, 2.3567], [0.6558, 0.2469, 5.5787]]) # torch.inverse `torch.inverse(input, *, out=None) → Tensor` Takes the inverse of the square matrix `input`. `input` can be batches of 2D square tensors, in which case this function would return a tensor composed of individual inverses. Supports real and complex input. Note `torch.inverse()` is deprecated. Please use [`torch.linalg.inv()`](../linalg#torch.linalg.inv "torch.linalg.inv") instead. Note Irrespective of the original strides, the returned tensors will be transposed, i.e. with strides like `input.contiguous().transpose(-2, -1).stride()` Parameters **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor of size (∗,n,n)(*, n, n) where `*` is zero or more batch dimensions Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. 
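Since the note above deprecates `torch.inverse()` in favor of `torch.linalg.inv()`, here is a minimal sketch comparing the two (illustrative only):

import torch

x = torch.rand(4, 4)
old = torch.inverse(x)        # deprecated entry point
new = torch.linalg.inv(x)     # recommended replacement per the note above
assert torch.allclose(old, new)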
Examples: >>> x = torch.rand(4, 4) >>> y = torch.inverse(x) >>> z = torch.mm(x, y) >>> z tensor([[ 1.0000, -0.0000, -0.0000, 0.0000], [ 0.0000, 1.0000, 0.0000, 0.0000], [ 0.0000, 0.0000, 1.0000, 0.0000], [ 0.0000, -0.0000, -0.0000, 1.0000]]) >>> torch.max(torch.abs(z - torch.eye(4))) # Max non-zero tensor(1.1921e-07) >>> # Batched inverse example >>> x = torch.randn(2, 3, 4, 4) >>> y = torch.inverse(x) >>> z = torch.matmul(x, y) >>> torch.max(torch.abs(z - torch.eye(4).expand_as(x))) # Max non-zero tensor(1.9073e-06) >>> x = torch.rand(4, 4, dtype=torch.cdouble) >>> y = torch.inverse(x) >>> z = torch.mm(x, y) >>> z tensor([[ 1.0000e+00+0.0000e+00j, -1.3878e-16+3.4694e-16j, 5.5511e-17-1.1102e-16j, 0.0000e+00-1.6653e-16j], [ 5.5511e-16-1.6653e-16j, 1.0000e+00+6.9389e-17j, 2.2204e-16-1.1102e-16j, -2.2204e-16+1.1102e-16j], [ 3.8858e-16-1.2490e-16j, 2.7756e-17+3.4694e-17j, 1.0000e+00+0.0000e+00j, -4.4409e-16+5.5511e-17j], [ 4.4409e-16+5.5511e-16j, -3.8858e-16+1.8041e-16j, 2.2204e-16+0.0000e+00j, 1.0000e+00-3.4694e-16j]], dtype=torch.complex128) >>> torch.max(torch.abs(z - torch.eye(4, dtype=torch.cdouble))) # Max non-zero tensor(7.5107e-16, dtype=torch.float64) # torch.is_complex `torch.is_complex(input) -> (bool)` Returns True if the data type of `input` is a complex data type i.e., one of `torch.complex64`, and `torch.complex128`. Parameters **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. # torch.is_floating_point `torch.is_floating_point(input) -> (bool)` Returns True if the data type of `input` is a floating point data type i.e., one of `torch.float64`, `torch.float32`, `torch.float16`, and `torch.bfloat16`. Parameters **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. # torch.is_nonzero `torch.is_nonzero(input) -> (bool)` Returns True if the `input` is a single element tensor which is not equal to zero after type conversions. i.e. not equal to `torch.tensor([0.])` or `torch.tensor([0])` or `torch.tensor([False])`. Throws a `RuntimeError` if `torch.numel() != 1` (even in case of sparse tensors). Parameters **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. Examples: >>> torch.is_nonzero(torch.tensor([0.])) False >>> torch.is_nonzero(torch.tensor([1.5])) True >>> torch.is_nonzero(torch.tensor([False])) False >>> torch.is_nonzero(torch.tensor([3])) True >>> torch.is_nonzero(torch.tensor([1, 3, 5])) Traceback (most recent call last): ... RuntimeError: bool value of Tensor with more than one value is ambiguous >>> torch.is_nonzero(torch.tensor([])) Traceback (most recent call last): ... RuntimeError: bool value of Tensor with no values is ambiguous # torch.is_storage `torch.is_storage(obj)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch.html#is_storage) Returns True if `obj` is a PyTorch storage object. Parameters **obj** (_Object_) – Object to test # torch.is_tensor `torch.is_tensor(obj)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch.html#is_tensor) Returns True if `obj` is a PyTorch tensor. Note that this function is simply doing `isinstance(obj, Tensor)`. Using that `isinstance` check is better for typechecking with mypy, and more explicit - so it’s recommended to use that instead of `is_tensor`. Parameters **obj** (_Object_) – Object to test # torch.isclose `torch.isclose(input, other, rtol=1e-05, atol=1e-08, equal_nan=False) → Tensor` Returns a new tensor with boolean elements representing if each element of `input` is “close” to the corresponding element of `other`. 
Closeness is defined as: \lvert \text{input} - \text{other} \rvert \leq \texttt{atol} + \texttt{rtol} \times \lvert \text{other} \rvert where `input` and `other` are finite. Where `input` and/or `other` are nonfinite they are close if and only if they are equal, with NaNs being considered equal to each other when `equal_nan` is True. Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – first tensor to compare * **other** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – second tensor to compare * **atol** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _,__optional_) – absolute tolerance. Default: 1e-08 * **rtol** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _,__optional_) – relative tolerance. Default: 1e-05 * **equal_nan** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – if `True`, then two `NaN`s will be considered equal. Default: `False` Examples: >>> torch.isclose(torch.tensor((1., 2, 3)), torch.tensor((1 + 1e-10, 3, 4))) tensor([ True, False, False]) >>> torch.isclose(torch.tensor((float('inf'), 4)), torch.tensor((float('inf'), 6)), rtol=.5) tensor([True, True]) # torch.isfinite `torch.isfinite(input) → Tensor` Returns a new tensor with boolean elements representing if each element is `finite` or not. Real values are finite when they are not NaN, negative infinity, or infinity. Complex values are finite when both their real and imaginary parts are finite. Parameters **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. Returns A boolean tensor that is True where `input` is finite and False elsewhere Example: >>> torch.isfinite(torch.tensor([1, float('inf'), 2, float('-inf'), float('nan')])) tensor([True, False, True, False, False]) # torch.isinf `torch.isinf(input) → Tensor` Tests if each element of `input` is infinite (positive or negative infinity) or not. Note Complex values are infinite when their real or imaginary part is infinite. Parameters **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. Returns A boolean tensor that is True where `input` is infinite and False elsewhere Example: >>> torch.isinf(torch.tensor([1, float('inf'), 2, float('-inf'), float('nan')])) tensor([False, True, False, True, False]) # torch.isnan `torch.isnan(input) → Tensor` Returns a new tensor with boolean elements representing if each element of `input` is NaN or not. Complex values are considered NaN when their real and/or imaginary part is NaN. Parameters **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. Returns A boolean tensor that is True where `input` is NaN and False elsewhere Example: >>> torch.isnan(torch.tensor([1, float('nan'), 2])) tensor([False, True, False]) # torch.isneginf `torch.isneginf(input, *, out=None) → Tensor` Tests if each element of `input` is negative infinity or not. Parameters **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. Example: >>> a = torch.tensor([-float('inf'), float('inf'), 1.2]) >>> torch.isneginf(a) tensor([ True, False, False]) # torch.isposinf `torch.isposinf(input, *, out=None) → Tensor` Tests if each element of `input` is positive infinity or not. Parameters **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor.
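A hedged sketch (ahead of the example below) showing how `torch.isposinf` and `torch.isneginf` are expected to partition the elements flagged by `torch.isinf`:

import torch

a = torch.tensor([-float('inf'), float('inf'), 1.2, float('nan')])
pos = torch.isposinf(a)       # tensor([False,  True, False, False])
neg = torch.isneginf(a)       # tensor([ True, False, False, False])
assert torch.equal(pos | neg, torch.isinf(a))   # NaN and finite values are flagged by neither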
Example:: >>> a = torch.tensor([-float('inf'), float('inf'), 1.2]) >>> torch.isposinf(a) tensor([False, True, False]) # torch.isreal `torch.isreal(input) → Tensor` Returns a new tensor with boolean elements representing if each element of `input` is real-valued or not. All real-valued types are considered real. Complex values are considered real when their imaginary part is 0. Parameters **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. Returns A boolean tensor that is True where `input` is real and False elsewhere Example: >>> torch.isreal(torch.tensor([1, 1+1j, 2+0j])) tensor([True, False, True]) # torch.istft `torch.istft(input, n_fft, hop_length=None, win_length=None, window=None, center=True, normalized=False, onesided=None, length=None, return_complex=False)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/functional.html#istft) Inverse short time Fourier Transform. This is expected to be the inverse of [`stft()`](torch.stft#torch.stft "torch.stft"). It has the same parameters (+ additional optional parameter of `length`) and it should return the least squares estimation of the original signal. The algorithm will check using the NOLA condition ( nonzero overlap). Important consideration in the parameters `window` and `center` so that the envelop created by the summation of all the windows is never zero at certain point in time. Specifically, ∑t=−∞∞∣w∣2[n−t×hop_length]=0\sum_{t=-\infty}^{\infty} |w|^2[n-t\times hop\\_length] \cancel{=} 0 . Since [`stft()`](torch.stft#torch.stft "torch.stft") discards elements at the end of the signal if they do not fit in a frame, `istft` may return a shorter signal than the original signal (can occur if `center` is False since the signal isn’t padded). If `center` is `True`, then there will be padding e.g. `'constant'`, `'reflect'`, etc. Left padding can be trimmed off exactly because they can be calculated but right padding cannot be calculated without additional information. Example: Suppose the last window is: `[17, 18, 0, 0, 0]` vs `[18, 0, 0, 0, 0]` The `n_fft`, `hop_length`, `win_length` are all the same which prevents the calculation of right padding. These additional values could be zeros or a reflection of the signal so providing `length` could be useful. If `length` is `None` then padding will be aggressively removed (some loss of signal). [1] D. W. Griffin and J. S. Lim, “Signal estimation from modified short-time Fourier transform,” IEEE Trans. ASSP, vol.32, no.2, pp.236-243, Apr. 1984. Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – The input tensor. Expected to be output of [`stft()`](torch.stft#torch.stft "torch.stft"), can either be complex (`channel`, `fft_size`, `n_frame`), or real (`channel`, `fft_size`, `n_frame`, 2) where the `channel` dimension is optional. Deprecated since version 1.8.0: Real input is deprecated, use complex inputs as returned by `stft(..., return_complex=True)` instead. * **n_fft** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – Size of Fourier transform * **hop_length** (_Optional_ _[_[int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _]_) – The distance between neighboring sliding window frames. (Default: `n_fft // 4`) * **win_length** (_Optional_ _[_[int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _]_) – The size of window frame and STFT filter. 
(Default: `n_fft`) * **window** (_Optional_ _[_[torch.Tensor](../tensors#torch.Tensor "torch.Tensor") _]_) – The optional window function. (Default: `torch.ones(win_length)`) * **center** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – Whether `input` was padded on both sides so that the t-th frame is centered at time t \times \text{hop\\_length} . (Default: `True`) * **normalized** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – Whether the STFT was normalized. (Default: `False`) * **onesided** (_Optional_ _[_[bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _]_) – Whether the STFT was onesided. (Default: `True` if `n_fft != fft_size` in the input size) * **length** (_Optional_ _[_[int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _]_) – The amount to trim the signal by (i.e. the original signal length). (Default: whole signal) * **return_complex** (_Optional_ _[_[bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _]_) – Whether the output should be complex, or if the input should be assumed to derive from a real signal and window. Note that this is incompatible with `onesided=True`. (Default: `False`) Returns Least squares estimation of the original signal of size (…, signal_length) Return type [Tensor](../tensors#torch.Tensor "torch.Tensor") # torch.jit.fork `torch.jit.fork(func, *args, **kwargs)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/jit/_async.html#fork) Creates an asynchronous task executing `func` and a reference to the value of the result of this execution. `fork` will return immediately, so the return value of `func` may not have been computed yet. To force completion of the task and access the return value invoke `torch.jit.wait` on the Future. `fork` invoked with a `func` which returns `T` is typed as `torch.jit.Future[T]`. `fork` calls can be arbitrarily nested, and may be invoked with positional and keyword arguments. Asynchronous execution will only occur when run in TorchScript. If run in pure Python, `fork` will not execute in parallel. `fork` will also not execute in parallel when invoked while tracing, however the `fork` and `wait` calls will be captured in the exported IR Graph. Warning `fork` tasks will execute non-deterministically. We recommend only spawning parallel fork tasks for pure functions that do not modify their inputs, module attributes, or global state. Parameters * **func** (_callable_ _or_[torch.nn.Module](torch.nn.module#torch.nn.Module "torch.nn.Module")) – A Python function or `torch.nn.Module` that will be invoked. If executed in TorchScript, it will execute asynchronously, otherwise it will not. Traced invocations of fork will be captured in the IR. * ***args**_,_ ****kwargs** – arguments to invoke `func` with. Returns a reference to the execution of `func`. The value `T` can only be accessed by forcing completion of `func` through `torch.jit.wait`.
Return type `torch.jit.Future[T]` Example (fork a free function): import torch from torch import Tensor def foo(a : Tensor, b : int) -> Tensor: return a + b def bar(a): fut : torch.jit.Future[Tensor] = torch.jit.fork(foo, a, b=2) return torch.jit.wait(fut) script_bar = torch.jit.script(bar) input = torch.tensor(2) # only the scripted version executes asynchronously assert script_bar(input) == bar(input) # trace is not run asynchronously, but fork is captured in IR graph = torch.jit.trace(bar, (input,)).graph assert "fork" in str(graph) Example (fork a module method): import torch from torch import Tensor class AddMod(torch.nn.Module): def forward(self, a: Tensor, b : int): return a + b class Mod(torch.nn.Module): def __init__(self): super(Mod, self).__init__() self.mod = AddMod() def forward(self, input): fut = torch.jit.fork(self.mod, input, b=2) return torch.jit.wait(fut) input = torch.tensor(2) mod = Mod() assert mod(input) == torch.jit.script(mod).forward(input) # torch.jit.freeze `torch.jit.freeze(mod, preserved_attrs=None, optimize_numerics=True)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/jit/_freeze.html#freeze) Freezing a [`ScriptModule`](torch.jit.scriptmodule#torch.jit.ScriptModule "torch.jit.ScriptModule") will clone it and attempt to inline the cloned module’s submodules, parameters, and attributes as constants in the TorchScript IR Graph. By default, `forward` will be preserved, as well as attributes & methods specified in `preserved_attrs`. Additionally, any attribute that is modified within a preserved method will be preserved. Freezing currently only accepts ScriptModules that are in eval mode. Parameters * **mod** ([`ScriptModule`](torch.jit.scriptmodule#torch.jit.ScriptModule "torch.jit.ScriptModule")) – a module to be frozen * **preserved_attrs** (_Optional_ _[__List_ _[_[str](https://docs.python.org/3/library/stdtypes.html#str "\(in Python v3.9\)") _]__]_) – a list of attributes to preserve in addition to the forward method. Attributes modified in preserved methods will also be preserved. * **optimize_numerics** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – If `True`, a set of optimization passes will be run that does not strictly preserve numerics. Full details of the optimization can be found at `torch.jit.optimize_frozen_module`. Returns Frozen [`ScriptModule`](torch.jit.scriptmodule#torch.jit.ScriptModule "torch.jit.ScriptModule").
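Before the examples below, a hedged sketch of the eval-mode requirement stated above (the module name is illustrative and the exact error text may differ):

import torch

class TinyModule(torch.nn.Module):
    def forward(self, x):
        return x + 1

scripted = torch.jit.script(TinyModule())        # modules start out in training mode
try:
    torch.jit.freeze(scripted)                   # expected to be rejected: not in eval mode
except RuntimeError as err:
    print("freeze refused a training-mode module:", err)

frozen = torch.jit.freeze(torch.jit.script(TinyModule().eval()))   # accepted once in eval mode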
Example (Freezing a simple module with a Parameter): import torch class MyModule(torch.nn.Module): def __init__(self, N, M): super(MyModule, self).__init__() self.weight = torch.nn.Parameter(torch.rand(N, M)) self.linear = torch.nn.Linear(N, M) def forward(self, input): output = self.weight.mm(input) output = self.linear(output) return output scripted_module = torch.jit.script(MyModule(2, 3).eval()) frozen_module = torch.jit.freeze(scripted_module) # parameters have been removed and inlined into the Graph as constants assert len(list(frozen_module.named_parameters())) == 0 # See the compiled graph as Python code print(frozen_module.code) Example (Freezing a module with preserved attributes): import torch class MyModule2(torch.nn.Module): def __init__(self): super(MyModule2, self).__init__() self.modified_tensor = torch.tensor(10.) self.version = 1 def forward(self, input): self.modified_tensor += 1 return input + self.modified_tensor scripted_module = torch.jit.script(MyModule2().eval()) frozen_module = torch.jit.freeze(scripted_module, preserved_attrs=["version"]) # we've manually preserved `version`, so it still exists on the frozen module and can be modified assert frozen_module.version == 1 frozen_module.version = 2 # `modified_tensor` is detected as being mutated in the forward, so freezing preserves # it to retain model semantics assert frozen_module(torch.tensor(1)) == torch.tensor(12) # now that we've run it once, the next result will be incremented by one assert frozen_module(torch.tensor(1)) == torch.tensor(13) Note If you’re not sure why an attribute is not being inlined as a constant, you can run `dump_alias_db` on frozen_module.forward.graph to see if freezing has detected the attribute is being modified. # torch.jit.ignore `torch.jit.ignore(drop=False, **kwargs)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/_jit_internal.html#ignore) This decorator indicates to the compiler that a function or method should be ignored and left as a Python function. This allows you to leave code in your model that is not yet TorchScript compatible. If called from TorchScript, ignored functions will dispatch the call to the Python interpreter. Models with ignored functions cannot be exported; use [`@torch.jit.unused`](torch.jit.unused#torch.jit.unused "torch.jit.unused") instead. Example (using `@torch.jit.ignore` on a method): import torch import torch.nn as nn class MyModule(nn.Module): @torch.jit.ignore def debugger(self, x): import pdb pdb.set_trace() def forward(self, x): x += 10 # The compiler would normally try to compile `debugger`, # but since it is `@ignore`d, it will be left as a call # to Python self.debugger(x) return x m = torch.jit.script(MyModule()) # Error! The call `debugger` cannot be saved since it calls into Python m.save("m.pt") Example (using `@torch.jit.ignore(drop=True)` on a method): import torch import torch.nn as nn class MyModule(nn.Module): @torch.jit.ignore(drop=True) def training_method(self, x): import pdb pdb.set_trace() def forward(self, x): if self.training: self.training_method(x) return x m = torch.jit.script(MyModule()) # This is OK since `training_method` is not saved, the call is replaced # with a `raise`. m.save("m.pt") # torch.jit.isinstance `torch.jit.isinstance(obj, target_type)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/jit.html#isinstance) This function provides for container type refinement in TorchScript. It can refine parameterized containers of the List, Dict, Tuple, and Optional types. E.g. `List[str]`, `Dict[str, List[torch.Tensor]]`, `Optional[Tuple[int,str,int]]`. It can also refine basic types such as bools and ints that are available in TorchScript.
Parameters * **obj** – object to refine the type of * **target_type** – type to try to refine obj to Returns True if obj was successfully refined to the type of target_type, False otherwise with no new type refinement Return type `bool` Example (using `torch.jit.isinstance` for type refinement): .. testcode: import torch from typing import Any, Dict, List class MyModule(torch.nn.Module): def __init__(self): super(MyModule, self).__init__() def forward(self, input: Any): # note the Any type if torch.jit.isinstance(input, List[torch.Tensor]): for t in input: y = t.clamp(0, 0.5) elif torch.jit.isinstance(input, Dict[str, str]): for val in input.values(): print(val) m = torch.jit.script(MyModule()) x = [torch.rand(3,3), torch.rand(4,3)] m(x) y = {"key1":"val1","key2":"val2"} m(y) # torch.jit.load `torch.jit.load(f, map_location=None, _extra_files=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/jit/_serialization.html#load) Load a [`ScriptModule`](torch.jit.scriptmodule#torch.jit.ScriptModule "torch.jit.ScriptModule") or [`ScriptFunction`](torch.jit.scriptfunction#torch.jit.ScriptFunction "torch.jit.ScriptFunction") previously saved with [`torch.jit.save`](torch.jit.save#torch.jit.save "torch.jit.save") All previously saved modules, no matter their device, are first loaded onto CPU, and then are moved to the devices they were saved from. If this fails (e.g. because the run time system doesn’t have certain devices), an exception is raised. Parameters * **f** – a file-like object (has to implement read, readline, tell, and seek), or a string containing a file name * **map_location** (_string_ _or_[torch.device](../tensor_attributes#torch.torch.device "torch.torch.device")) – A simplified version of `map_location` in `torch.jit.save` used to dynamically remap storages to an alternative set of devices. * **_extra_files** (_dictionary of filename to content_) – The extra filenames given in the map would be loaded and their content would be stored in the provided map. Returns A [`ScriptModule`](torch.jit.scriptmodule#torch.jit.ScriptModule "torch.jit.ScriptModule") object. Example: import torch import io torch.jit.load('scriptmodule.pt') # Load ScriptModule from io.BytesIO object with open('scriptmodule.pt', 'rb') as f: buffer = io.BytesIO(f.read()) # Load all tensors to the original device torch.jit.load(buffer) # Load all tensors onto CPU, using a device buffer.seek(0) torch.jit.load(buffer, map_location=torch.device('cpu')) # Load all tensors onto CPU, using a string buffer.seek(0) torch.jit.load(buffer, map_location='cpu') # Load with extra files. extra_files = {'foo.txt': ''} # values will be replaced with data torch.jit.load('scriptmodule.pt', _extra_files=extra_files) print(extra_files['foo.txt']) # torch.jit.save `torch.jit.save(m, f, _extra_files=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/jit/_serialization.html#save) Save an offline version of this module for use in a separate process. The saved module serializes all of the methods, submodules, parameters, and attributes of this module. It can be loaded into the C++ API using `torch::jit::load(filename)` or into the Python API with [`torch.jit.load`](torch.jit.load#torch.jit.load "torch.jit.load"). To be able to save a module, it must not make any calls to native Python functions. This means that all submodules must be subclasses of [`ScriptModule`](torch.jit.scriptmodule#torch.jit.ScriptModule "torch.jit.ScriptModule") as well. 
Danger All modules, no matter their device, are always loaded onto the CPU during loading. This is different from [`torch.load()`](torch.load#torch.load "torch.load")’s semantics and may change in the future. Parameters * **m** – A [`ScriptModule`](torch.jit.scriptmodule#torch.jit.ScriptModule "torch.jit.ScriptModule") to save. * **f** – A file-like object (has to implement write and flush) or a string containing a file name. * **_extra_files** – Map from filename to contents which will be stored as part of `f`. Note torch.jit.save attempts to preserve the behavior of some operators across versions. For example, dividing two integer tensors in PyTorch 1.5 performed floor division, and if the module containing that code is saved in PyTorch 1.5 and loaded in PyTorch 1.6 its division behavior will be preserved. The same module saved in PyTorch 1.6 will fail to load in PyTorch 1.5, however, since the behavior of division changed in 1.6, and 1.5 does not know how to replicate the 1.6 behavior. Example: import torch import io class MyModule(torch.nn.Module): def forward(self, x): return x + 10 m = torch.jit.script(MyModule()) # Save to file torch.jit.save(m, 'scriptmodule.pt') # This line is equivalent to the previous m.save("scriptmodule.pt") # Save to io.BytesIO buffer buffer = io.BytesIO() torch.jit.save(m, buffer) # Save with extra files extra_files = {'foo.txt': b'bar'} torch.jit.save(m, 'scriptmodule.pt', _extra_files=extra_files) # torch.jit.script `torch.jit.script(obj, optimize=None, _frames_up=0, _rcb=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/jit/_script.html#script) Scripting a function or `nn.Module` will inspect the source code, compile it as TorchScript code using the TorchScript compiler, and return a [`ScriptModule`](torch.jit.scriptmodule#torch.jit.ScriptModule "torch.jit.ScriptModule") or [`ScriptFunction`](torch.jit.scriptfunction#torch.jit.ScriptFunction "torch.jit.ScriptFunction"). TorchScript itself is a subset of the Python language, so not all features in Python work, but we provide enough functionality to compute on tensors and do control-dependent operations. For a complete guide, see the [TorchScript Language Reference](../jit_language_reference#language-reference). `torch.jit.script` can be used as a function for modules and functions, and as a decorator `@torch.jit.script` for [TorchScript Classes](../jit_language_reference#id2) and functions. Parameters **obj** (callable, class, or `nn.Module`) – The `nn.Module`, function, or class type to compile. Returns If `obj` is `nn.Module`, `script` returns a [`ScriptModule`](torch.jit.scriptmodule#torch.jit.ScriptModule "torch.jit.ScriptModule") object. The returned [`ScriptModule`](torch.jit.scriptmodule#torch.jit.ScriptModule "torch.jit.ScriptModule") will have the same set of sub-modules and parameters as the original `nn.Module`. If `obj` is a standalone function, a [`ScriptFunction`](torch.jit.scriptfunction#torch.jit.ScriptFunction "torch.jit.ScriptFunction") will be returned. **Scripting a function** The `@torch.jit.script` decorator will construct a [`ScriptFunction`](torch.jit.scriptfunction#torch.jit.ScriptFunction "torch.jit.ScriptFunction") by compiling the body of the function. 
Example (scripting a function): import torch @torch.jit.script def foo(x, y): if x.max() > y.max(): r = x else: r = y return r print(type(foo)) # torch.jit.ScriptFuncion # See the compiled graph as Python code print(foo.code) # Call the function using the TorchScript interpreter foo(torch.ones(2, 2), torch.ones(2, 2)) **Scripting an nn.Module** Scripting an `nn.Module` by default will compile the `forward` method and recursively compile any methods, submodules, and functions called by `forward`. If a `nn.Module` only uses features supported in TorchScript, no changes to the original module code should be necessary. `script` will construct [`ScriptModule`](torch.jit.scriptmodule#torch.jit.ScriptModule "torch.jit.ScriptModule") that has copies of the attributes, parameters, and methods of the original module. Example (scripting a simple module with a Parameter): import torch class MyModule(torch.nn.Module): def __init__(self, N, M): super(MyModule, self).__init__() # This parameter will be copied to the new ScriptModule self.weight = torch.nn.Parameter(torch.rand(N, M)) # When this submodule is used, it will be compiled self.linear = torch.nn.Linear(N, M) def forward(self, input): output = self.weight.mv(input) # This calls the `forward` method of the `nn.Linear` module, which will # cause the `self.linear` submodule to be compiled to a `ScriptModule` here output = self.linear(output) return output scripted_module = torch.jit.script(MyModule(2, 3)) Example (scripting a module with traced submodules): import torch import torch.nn as nn import torch.nn.functional as F class MyModule(nn.Module): def __init__(self): super(MyModule, self).__init__() # torch.jit.trace produces a ScriptModule's conv1 and conv2 self.conv1 = torch.jit.trace(nn.Conv2d(1, 20, 5), torch.rand(1, 1, 16, 16)) self.conv2 = torch.jit.trace(nn.Conv2d(20, 20, 5), torch.rand(1, 20, 16, 16)) def forward(self, input): input = F.relu(self.conv1(input)) input = F.relu(self.conv2(input)) return input scripted_module = torch.jit.script(MyModule()) To compile a method other than `forward` (and recursively compile anything it calls), add the [`@torch.jit.export`](../jit#torch.jit.export "torch.jit.export") decorator to the method. To opt out of compilation use [`@torch.jit.ignore`](torch.jit.ignore#torch.jit.ignore "torch.jit.ignore") or [`@torch.jit.unused`](torch.jit.unused#torch.jit.unused "torch.jit.unused"). Example (an exported and ignored method in a module): import torch import torch.nn as nn class MyModule(nn.Module): def __init__(self): super(MyModule, self).__init__() @torch.jit.export def some_entry_point(self, input): return input + 10 @torch.jit.ignore def python_only_fn(self, input): # This function won't be compiled, so any # Python APIs can be used import pdb pdb.set_trace() def forward(self, input): if self.training: self.python_only_fn(input) return input * 99 scripted_module = torch.jit.script(MyModule()) print(scripted_module.some_entry_point(torch.randn(2, 2))) print(scripted_module(torch.randn(2, 2))) # torch.jit.script_if_tracing `torch.jit.script_if_tracing(fn)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/jit.html#script_if_tracing) Compiles `fn` when it is first called during tracing. `torch.jit.script` has a non-negligible start up time when it is first called due to lazy- initializations of many compiler builtins. Therefore you should not use it in library code. However, you may want to have parts of your library work in tracing even if they use control flow. 
In these cases, you should use `@torch.jit.script_if_tracing` to substitute for `torch.jit.script`. Parameters **fn** – A function to compile. Returns If called during tracing, a [`ScriptFunction`](torch.jit.scriptfunction#torch.jit.ScriptFunction "torch.jit.ScriptFunction") created by `torch.jit.script` is returned. Otherwise, the original function `fn` is returned. # ScriptFunction `class torch.jit.ScriptFunction` Functionally equivalent to a [`ScriptModule`](torch.jit.scriptmodule#torch.jit.ScriptModule "torch.jit.ScriptModule"), but represents a single function and does not have any attributes or Parameters. `get_debug_state(self: torch._C.ScriptFunction) → torch._C.GraphExecutorState` `save(self: torch._C.ScriptFunction, filename: str, _extra_files: Dict[str, str] = {}) → None` `save_to_buffer(self: torch._C.ScriptFunction, _extra_files: Dict[str, str] = {}) → bytes` # ScriptModule `class torch.jit.ScriptModule` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/jit/_script.html#ScriptModule) A wrapper around C++ `torch::jit::Module`. `ScriptModule`s contain methods, attributes, parameters, and constants. These can be accessed the same as on a normal `nn.Module`. `add_module(name, module)` Adds a child module to the current module. The module can be accessed as an attribute using the given name. Parameters * **name** (_string_) – name of the child module. The child module can be accessed from this module using the given name * **module** ([Module](torch.nn.module#torch.nn.Module "torch.nn.Module")) – child module to be added to the module. `apply(fn)` Applies `fn` recursively to every submodule (as returned by `.children()`) as well as self. Typical use includes initializing the parameters of a model (see also [torch.nn.init](../nn.init#nn-init-doc)). Parameters **fn** (`Module` -> None) – function to be applied to each submodule Returns self Return type [Module](torch.nn.module#torch.nn.Module "torch.nn.Module") Example: >>> @torch.no_grad() >>> def init_weights(m): >>> print(m) >>> if type(m) == nn.Linear: >>> m.weight.fill_(1.0) >>> print(m.weight) >>> net = nn.Sequential(nn.Linear(2, 2), nn.Linear(2, 2)) >>> net.apply(init_weights) Linear(in_features=2, out_features=2, bias=True) Parameter containing: tensor([[ 1., 1.], [ 1., 1.]]) Linear(in_features=2, out_features=2, bias=True) Parameter containing: tensor([[ 1., 1.], [ 1., 1.]]) Sequential( (0): Linear(in_features=2, out_features=2, bias=True) (1): Linear(in_features=2, out_features=2, bias=True) ) Sequential( (0): Linear(in_features=2, out_features=2, bias=True) (1): Linear(in_features=2, out_features=2, bias=True) ) `bfloat16()` Casts all floating point parameters and buffers to `bfloat16` datatype. Returns self Return type [Module](torch.nn.module#torch.nn.Module "torch.nn.Module") `buffers(recurse=True)` Returns an iterator over module buffers. Parameters **recurse** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – if True, then yields buffers of this module and all submodules. Otherwise, yields only buffers that are direct members of this module. Yields _torch.Tensor_ – module buffer Example: >>> for buf in model.buffers(): >>> print(type(buf), buf.size()) (20L,) (20L, 1L, 5L, 5L) `children()` Returns an iterator over immediate children modules. Yields _Module_ – a child module `property code` Returns a pretty-printed representation (as valid Python syntax) of the internal graph for the `forward` method. See [Inspecting Code](../jit#inspecting-code) for details. 
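For instance, a minimal sketch of inspecting the generated code (the module below is a made-up example for illustration, not part of the API):

import torch

class MyCell(torch.nn.Module):
    def forward(self, x):
        return torch.tanh(x) + 1

scripted = torch.jit.script(MyCell())
# Prints the compiler's pretty-printed, Python-syntax representation of `forward`
print(scripted.code)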
`property code_with_constants` Returns a tuple of: [0] a pretty-printed representation (as valid Python syntax) of the internal graph for the `forward` method. See `code`. [1] a ConstMap following the CONSTANT.cN format of the output in [0]. The indices in the [0] output are keys to the underlying constant’s values. See [Inspecting Code](../jit#inspecting-code) for details. `cpu()` Moves all model parameters and buffers to the CPU. Returns self Return type [Module](torch.nn.module#torch.nn.Module "torch.nn.Module") `cuda(device=None)` Moves all model parameters and buffers to the GPU. This also makes associated parameters and buffers different objects. So it should be called before constructing optimizer if the module will live on GPU while being optimized. Parameters **device** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – if specified, all parameters will be copied to that device Returns self Return type [Module](torch.nn.module#torch.nn.Module "torch.nn.Module") `double()` Casts all floating point parameters and buffers to `double` datatype. Returns self Return type [Module](torch.nn.module#torch.nn.Module "torch.nn.Module") `eval()` Sets the module in evaluation mode. This has any effect only on certain modules. See documentations of particular modules for details of their behaviors in training/evaluation mode, if they are affected, e.g. `Dropout`, `BatchNorm`, etc. This is equivalent with [`self.train(False)`](torch.nn.module#torch.nn.Module.train "torch.nn.Module.train"). Returns self Return type [Module](torch.nn.module#torch.nn.Module "torch.nn.Module") `extra_repr()` Set the extra representation of the module To print customized extra information, you should re-implement this method in your own modules. Both single-line and multi-line strings are acceptable. `float()` Casts all floating point parameters and buffers to float datatype. Returns self Return type [Module](torch.nn.module#torch.nn.Module "torch.nn.Module") `property graph` Returns a string representation of the internal graph for the `forward` method. See [Interpreting Graphs](../jit#interpreting-graphs) for details. `half()` Casts all floating point parameters and buffers to `half` datatype. Returns self Return type [Module](torch.nn.module#torch.nn.Module "torch.nn.Module") `property inlined_graph` Returns a string representation of the internal graph for the `forward` method. This graph will be preprocessed to inline all function and method calls. See [Interpreting Graphs](../jit#interpreting-graphs) for details. `load_state_dict(state_dict, strict=True)` Copies parameters and buffers from `state_dict` into this module and its descendants. If `strict` is `True`, then the keys of `state_dict` must exactly match the keys returned by this module’s [`state_dict()`](torch.nn.module#torch.nn.Module.state_dict "torch.nn.Module.state_dict") function. Parameters * **state_dict** ([dict](https://docs.python.org/3/library/stdtypes.html#dict "\(in Python v3.9\)")) – a dict containing parameters and persistent buffers. * **strict** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – whether to strictly enforce that the keys in `state_dict` match the keys returned by this module’s [`state_dict()`](torch.nn.module#torch.nn.Module.state_dict "torch.nn.Module.state_dict") function. 
Default: `True` Returns * **missing_keys** is a list of str containing the missing keys * **unexpected_keys** is a list of str containing the unexpected keys Return type `NamedTuple` with `missing_keys` and `unexpected_keys` fields `modules()` Returns an iterator over all modules in the network. Yields _Module_ – a module in the network Note Duplicate modules are returned only once. In the following example, `l` will be returned only once. Example: >>> l = nn.Linear(2, 2) >>> net = nn.Sequential(l, l) >>> for idx, m in enumerate(net.modules()): print(idx, '->', m) 0 -> Sequential( (0): Linear(in_features=2, out_features=2, bias=True) (1): Linear(in_features=2, out_features=2, bias=True) ) 1 -> Linear(in_features=2, out_features=2, bias=True) `named_buffers(prefix='', recurse=True)` Returns an iterator over module buffers, yielding both the name of the buffer as well as the buffer itself. Parameters * **prefix** ([str](https://docs.python.org/3/library/stdtypes.html#str "\(in Python v3.9\)")) – prefix to prepend to all buffer names. * **recurse** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – if True, then yields buffers of this module and all submodules. Otherwise, yields only buffers that are direct members of this module. Yields _(string, torch.Tensor)_ – Tuple containing the name and buffer Example: >>> for name, buf in self.named_buffers(): >>> if name in ['running_var']: >>> print(buf.size()) `named_children()` Returns an iterator over immediate children modules, yielding both the name of the module as well as the module itself. Yields _(string, Module)_ – Tuple containing a name and child module Example: >>> for name, module in model.named_children(): >>> if name in ['conv4', 'conv5']: >>> print(module) `named_modules(memo=None, prefix='')` Returns an iterator over all modules in the network, yielding both the name of the module as well as the module itself. Yields _(string, Module)_ – Tuple of name and module Note Duplicate modules are returned only once. In the following example, `l` will be returned only once. Example: >>> l = nn.Linear(2, 2) >>> net = nn.Sequential(l, l) >>> for idx, m in enumerate(net.named_modules()): print(idx, '->', m) 0 -> ('', Sequential( (0): Linear(in_features=2, out_features=2, bias=True) (1): Linear(in_features=2, out_features=2, bias=True) )) 1 -> ('0', Linear(in_features=2, out_features=2, bias=True)) `named_parameters(prefix='', recurse=True)` Returns an iterator over module parameters, yielding both the name of the parameter as well as the parameter itself. Parameters * **prefix** ([str](https://docs.python.org/3/library/stdtypes.html#str "\(in Python v3.9\)")) – prefix to prepend to all parameter names. * **recurse** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – if True, then yields parameters of this module and all submodules. Otherwise, yields only parameters that are direct members of this module. Yields _(string, Parameter)_ – Tuple containing the name and parameter Example: >>> for name, param in self.named_parameters(): >>> if name in ['bias']: >>> print(param.size()) `parameters(recurse=True)` Returns an iterator over module parameters. This is typically passed to an optimizer. Parameters **recurse** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – if True, then yields parameters of this module and all submodules. Otherwise, yields only parameters that are direct members of this module. 
Yields _Parameter_ – module parameter Example: >>> for param in model.parameters(): >>> print(type(param), param.size()) (20L,) (20L, 1L, 5L, 5L) `register_backward_hook(hook)` Registers a backward hook on the module. This function is deprecated in favor of `nn.Module.register_full_backward_hook()` and the behavior of this function will change in future versions. Returns a handle that can be used to remove the added hook by calling `handle.remove()` Return type `torch.utils.hooks.RemovableHandle` `register_buffer(name, tensor, persistent=True)` Adds a buffer to the module. This is typically used to register a buffer that should not to be considered a model parameter. For example, BatchNorm’s `running_mean` is not a parameter, but is part of the module’s state. Buffers, by default, are persistent and will be saved alongside parameters. This behavior can be changed by setting `persistent` to `False`. The only difference between a persistent buffer and a non-persistent buffer is that the latter will not be a part of this module’s `state_dict`. Buffers can be accessed as attributes using given names. Parameters * **name** (_string_) – name of the buffer. The buffer can be accessed from this module using the given name * **tensor** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – buffer to be registered. * **persistent** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – whether the buffer is part of this module’s `state_dict`. Example: >>> self.register_buffer('running_mean', torch.zeros(num_features)) `register_forward_hook(hook)` Registers a forward hook on the module. The hook will be called every time after `forward()` has computed an output. It should have the following signature: hook(module, input, output) -> None or modified output The input contains only the positional arguments given to the module. Keyword arguments won’t be passed to the hooks and only to the `forward`. The hook can modify the output. It can modify the input inplace but it will not have effect on forward since this is called after `forward()` is called. Returns a handle that can be used to remove the added hook by calling `handle.remove()` Return type `torch.utils.hooks.RemovableHandle` `register_forward_pre_hook(hook)` Registers a forward pre-hook on the module. The hook will be called every time before `forward()` is invoked. It should have the following signature: hook(module, input) -> None or modified input The input contains only the positional arguments given to the module. Keyword arguments won’t be passed to the hooks and only to the `forward`. The hook can modify the input. User can either return a tuple or a single modified value in the hook. We will wrap the value into a tuple if a single value is returned(unless that value is already a tuple). Returns a handle that can be used to remove the added hook by calling `handle.remove()` Return type `torch.utils.hooks.RemovableHandle` `register_full_backward_hook(hook)` Registers a backward hook on the module. The hook will be called every time the gradients with respect to module inputs are computed. The hook should have the following signature: hook(module, grad_input, grad_output) -> tuple(Tensor) or None The `grad_input` and `grad_output` are tuples that contain the gradients with respect to the inputs and outputs respectively. The hook should not modify its arguments, but it can optionally return a new gradient with respect to the input that will be used in place of `grad_input` in subsequent computations. 
`grad_input` will only correspond to the inputs given as positional arguments and all kwarg arguments are ignored. Entries in `grad_input` and `grad_output` will be `None` for all non-Tensor arguments. Warning Modifying inputs or outputs inplace is not allowed when using backward hooks and will raise an error. Returns a handle that can be used to remove the added hook by calling `handle.remove()` Return type `torch.utils.hooks.RemovableHandle` `register_parameter(name, param)` Adds a parameter to the module. The parameter can be accessed as an attribute using given name. Parameters * **name** (_string_) – name of the parameter. The parameter can be accessed from this module using the given name * **param** ([Parameter](torch.nn.parameter.parameter#torch.nn.parameter.Parameter "torch.nn.parameter.Parameter")) – parameter to be added to the module. `requires_grad_(requires_grad=True)` Change if autograd should record operations on parameters in this module. This method sets the parameters’ `requires_grad` attributes in-place. This method is helpful for freezing part of the module for finetuning or training parts of a model individually (e.g., GAN training). Parameters **requires_grad** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – whether autograd should record operations on parameters in this module. Default: `True`. Returns self Return type [Module](torch.nn.module#torch.nn.Module "torch.nn.Module") `save(f, _extra_files={})` See [`torch.jit.save`](torch.jit.save#torch.jit.save "torch.jit.save") for details. `state_dict(destination=None, prefix='', keep_vars=False)` Returns a dictionary containing a whole state of the module. Both parameters and persistent buffers (e.g. running averages) are included. Keys are corresponding parameter and buffer names. Returns a dictionary containing a whole state of the module Return type [dict](https://docs.python.org/3/library/stdtypes.html#dict "\(in Python v3.9\)") Example: >>> module.state_dict().keys() ['bias', 'weight'] `to(*args, **kwargs)` Moves and/or casts the parameters and buffers. This can be called as `to(device=None, dtype=None, non_blocking=False)` `to(dtype, non_blocking=False)` `to(tensor, non_blocking=False)` `to(memory_format=torch.channels_last)` Its signature is similar to [`torch.Tensor.to()`](../tensors#torch.Tensor.to "torch.Tensor.to"), but only accepts floating point or complex `dtype`s. In addition, this method will only cast the floating point or complex parameters and buffers to :attr:`dtype` (if given). The integral parameters and buffers will be moved `device`, if that is given, but with dtypes unchanged. When `non_blocking` is set, it tries to convert/move asynchronously with respect to the host if possible, e.g., moving CPU Tensors with pinned memory to CUDA devices. See below for examples. Note This method modifies the module in-place. 
Parameters * **device** (`torch.device`) – the desired device of the parameters and buffers in this module * **dtype** (`torch.dtype`) – the desired floating point or complex dtype of the parameters and buffers in this module * **tensor** ([torch.Tensor](../tensors#torch.Tensor "torch.Tensor")) – Tensor whose dtype and device are the desired dtype and device for all parameters and buffers in this module * **memory_format** (`torch.memory_format`) – the desired memory format for 4D parameters and buffers in this module (keyword only argument) Returns self Return type [Module](torch.nn.module#torch.nn.Module "torch.nn.Module") Examples: >>> linear = nn.Linear(2, 2) >>> linear.weight Parameter containing: tensor([[ 0.1913, -0.3420], [-0.5113, -0.2325]]) >>> linear.to(torch.double) Linear(in_features=2, out_features=2, bias=True) >>> linear.weight Parameter containing: tensor([[ 0.1913, -0.3420], [-0.5113, -0.2325]], dtype=torch.float64) >>> gpu1 = torch.device("cuda:1") >>> linear.to(gpu1, dtype=torch.half, non_blocking=True) Linear(in_features=2, out_features=2, bias=True) >>> linear.weight Parameter containing: tensor([[ 0.1914, -0.3420], [-0.5112, -0.2324]], dtype=torch.float16, device='cuda:1') >>> cpu = torch.device("cpu") >>> linear.to(cpu) Linear(in_features=2, out_features=2, bias=True) >>> linear.weight Parameter containing: tensor([[ 0.1914, -0.3420], [-0.5112, -0.2324]], dtype=torch.float16) >>> linear = nn.Linear(2, 2, bias=None).to(torch.cdouble) >>> linear.weight Parameter containing: tensor([[ 0.3741+0.j, 0.2382+0.j], [ 0.5593+0.j, -0.4443+0.j]], dtype=torch.complex128) >>> linear(torch.ones(3, 2, dtype=torch.cdouble)) tensor([[0.6122+0.j, 0.1150+0.j], [0.6122+0.j, 0.1150+0.j], [0.6122+0.j, 0.1150+0.j]], dtype=torch.complex128) `train(mode=True)` Sets the module in training mode. This has any effect only on certain modules. See documentations of particular modules for details of their behaviors in training/evaluation mode, if they are affected, e.g. `Dropout`, `BatchNorm`, etc. Parameters **mode** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – whether to set training mode (`True`) or evaluation mode (`False`). Default: `True`. Returns self Return type [Module](torch.nn.module#torch.nn.Module "torch.nn.Module") `type(dst_type)` Casts all parameters and buffers to `dst_type`. Parameters **dst_type** ([type](https://docs.python.org/3/library/functions.html#type "\(in Python v3.9\)") _or_ _string_) – the desired type Returns self Return type [Module](torch.nn.module#torch.nn.Module "torch.nn.Module") `xpu(device=None)` Moves all model parameters and buffers to the XPU. This also makes associated parameters and buffers different objects. So it should be called before constructing optimizer if the module will live on XPU while being optimized. Parameters **device** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – if specified, all parameters will be copied to that device Returns self Return type [Module](torch.nn.module#torch.nn.Module "torch.nn.Module") `zero_grad(set_to_none=False)` Sets gradients of all model parameters to zero. See similar function under [`torch.optim.Optimizer`](../optim#torch.optim.Optimizer "torch.optim.Optimizer") for more context. Parameters **set_to_none** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – instead of setting to zero, set the grads to None. 
See [`torch.optim.Optimizer.zero_grad()`](../optim#torch.optim.Optimizer.zero_grad "torch.optim.Optimizer.zero_grad") for details. # torch.jit.trace `torch.jit.trace(func, example_inputs, optimize=None, check_trace=True, check_inputs=None, check_tolerance=1e-05, strict=True, _force_outplace=False, _module_class=None, _compilation_unit=)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/jit/_trace.html#trace) Trace a function and return an executable or [`ScriptFunction`](torch.jit.scriptfunction#torch.jit.ScriptFunction "torch.jit.ScriptFunction") that will be optimized using just-in-time compilation. Tracing is ideal for code that operates only on `Tensor`s and lists, dictionaries, and tuples of `Tensor`s. Using `torch.jit.trace` and `torch.jit.trace_module`, you can turn an existing module or Python function into a TorchScript [`ScriptFunction`](torch.jit.scriptfunction#torch.jit.ScriptFunction "torch.jit.ScriptFunction") or [`ScriptModule`](torch.jit.scriptmodule#torch.jit.ScriptModule "torch.jit.ScriptModule"). You must provide example inputs, and we run the function, recording the operations performed on all the tensors. * The resulting recording of a standalone function produces `ScriptFunction`. * The resulting recording of `nn.Module.forward` or `nn.Module` produces `ScriptModule`. This module also contains any parameters that the original module had as well. Warning Tracing only correctly records functions and modules which are not data dependent (e.g., do not have conditionals on data in tensors) and do not have any untracked external dependencies (e.g., perform input/output or access global variables). Tracing only records operations done when the given function is run on the given tensors. Therefore, the returned `ScriptModule` will always run the same traced graph on any input. This has some important implications when your module is expected to run different sets of operations, depending on the input and/or the module state. For example, * Tracing will not record any control-flow like if-statements or loops. When this control-flow is constant across your module, this is fine and it often inlines the control-flow decisions. But sometimes the control-flow is actually part of the model itself. For instance, a recurrent network is a loop over the (possibly dynamic) length of an input sequence. * In the returned [`ScriptModule`](torch.jit.scriptmodule#torch.jit.ScriptModule "torch.jit.ScriptModule"), operations that have different behaviors in `training` and `eval` modes will always behave as if it is in the mode it was in during tracing, no matter which mode the `ScriptModule` is in. In cases like these, tracing would not be appropriate and [`scripting`](torch.jit.script#torch.jit.script "torch.jit.script") is a better choice. If you trace such models, you may silently get incorrect results on subsequent invocations of the model. The tracer will try to emit warnings when doing something that may cause an incorrect trace to be produced. Parameters * **func** (_callable_ _or_[torch.nn.Module](torch.nn.module#torch.nn.Module "torch.nn.Module")) – A Python function or `torch.nn.Module` that will be run with `example_inputs`. `func` arguments and return values must be tensors or (possibly nested) tuples that contain tensors. When a module is passed `torch.jit.trace`, only the `forward` method is run and traced (see [`torch.jit.trace`](torch.jit.trace_module#torch.jit.trace_module "torch.jit.trace_module") for details). 
* **example_inputs** ([tuple](https://docs.python.org/3/library/stdtypes.html#tuple "\(in Python v3.9\)") _or_[torch.Tensor](../tensors#torch.Tensor "torch.Tensor")) – A tuple of example inputs that will be passed to the function while tracing. The resulting trace can be run with inputs of different types and shapes assuming the traced operations support those types and shapes. `example_inputs` may also be a single Tensor in which case it is automatically wrapped in a tuple. Keyword Arguments * **check_trace** (`bool`, optional) – Check if the same inputs run through traced code produce the same outputs. Default: `True`. You might want to disable this if, for example, your network contains non- deterministic ops or if you are sure that the network is correct despite a checker failure. * **check_inputs** (_list of tuples_ _,__optional_) – A list of tuples of input arguments that should be used to check the trace against what is expected. Each tuple is equivalent to a set of input arguments that would be specified in `example_inputs`. For best results, pass in a set of checking inputs representative of the space of shapes and types of inputs you expect the network to see. If not specified, the original `example_inputs` are used for checking * **check_tolerance** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _,__optional_) – Floating-point comparison tolerance to use in the checker procedure. This can be used to relax the checker strictness in the event that results diverge numerically for a known reason, such as operator fusion. * **strict** (`bool`, optional) – run the tracer in a strict mode or not (default: `True`). Only turn this off when you want the tracer to record your mutable container types (currently `list`/`dict`) and you are sure that the container you are using in your problem is a `constant` structure and does not get used as control flow (if, for) conditions. Returns If `func` is `nn.Module` or `forward` of `nn.Module`, `trace` returns a [`ScriptModule`](torch.jit.scriptmodule#torch.jit.ScriptModule "torch.jit.ScriptModule") object with a single `forward` method containing the traced code. The returned `ScriptModule` will have the same set of sub-modules and parameters as the original `nn.Module`. If `func` is a standalone function, `trace` returns `ScriptFunction`. 
Example (tracing a function): import torch def foo(x, y): return 2 * x + y # Run `foo` with the provided inputs and record the tensor operations traced_foo = torch.jit.trace(foo, (torch.rand(3), torch.rand(3))) # `traced_foo` can now be run with the TorchScript interpreter or saved # and loaded in a Python-free environment Example (tracing an existing module): import torch import torch.nn as nn class Net(nn.Module): def __init__(self): super(Net, self).__init__() self.conv = nn.Conv2d(1, 1, 3) def forward(self, x): return self.conv(x) n = Net() example_weight = torch.rand(1, 1, 3, 3) example_forward_input = torch.rand(1, 1, 3, 3) # Trace a specific method and construct `ScriptModule` with # a single `forward` method module = torch.jit.trace(n.forward, example_forward_input) # Trace a module (implicitly traces `forward`) and construct a # `ScriptModule` with a single `forward` method module = torch.jit.trace(n, example_forward_input) # torch.jit.trace_module `torch.jit.trace_module(mod, inputs, optimize=None, check_trace=True, check_inputs=None, check_tolerance=1e-05, strict=True, _force_outplace=False, _module_class=None, _compilation_unit=)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/jit/_trace.html#trace_module) Trace a module and return an executable [`ScriptModule`](torch.jit.scriptmodule#torch.jit.ScriptModule "torch.jit.ScriptModule") that will be optimized using just-in-time compilation. When a module is passed to [`torch.jit.trace`](torch.jit.trace#torch.jit.trace "torch.jit.trace"), only the `forward` method is run and traced. With `trace_module`, you can specify a dictionary of method names to example inputs to trace (see the `inputs`) argument below. See [`torch.jit.trace`](torch.jit.trace#torch.jit.trace "torch.jit.trace") for more information on tracing. Parameters * **mod** ([torch.nn.Module](torch.nn.module#torch.nn.Module "torch.nn.Module")) – A `torch.nn.Module` containing methods whose names are specified in `inputs`. The given methods will be compiled as a part of a single `ScriptModule`. * **inputs** ([dict](https://docs.python.org/3/library/stdtypes.html#dict "\(in Python v3.9\)")) – A dict containing sample inputs indexed by method names in `mod`. The inputs will be passed to methods whose names correspond to inputs’ keys while tracing. `{ 'forward' : example_forward_input, 'method2': example_method2_input}` Keyword Arguments * **check_trace** (`bool`, optional) – Check if the same inputs run through traced code produce the same outputs. Default: `True`. You might want to disable this if, for example, your network contains non- deterministic ops or if you are sure that the network is correct despite a checker failure. * **check_inputs** (_list of dicts_ _,__optional_) – A list of dicts of input arguments that should be used to check the trace against what is expected. Each tuple is equivalent to a set of input arguments that would be specified in `inputs`. For best results, pass in a set of checking inputs representative of the space of shapes and types of inputs you expect the network to see. If not specified, the original `inputs` are used for checking * **check_tolerance** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _,__optional_) – Floating-point comparison tolerance to use in the checker procedure. This can be used to relax the checker strictness in the event that results diverge numerically for a known reason, such as operator fusion. 
Returns A [`ScriptModule`](torch.jit.scriptmodule#torch.jit.ScriptModule "torch.jit.ScriptModule") object with a single `forward` method containing the traced code. When `func` is a `torch.nn.Module`, the returned [`ScriptModule`](torch.jit.scriptmodule#torch.jit.ScriptModule "torch.jit.ScriptModule") will have the same set of sub-modules and parameters as `func`. Example (tracing a module with multiple methods): import torch import torch.nn as nn class Net(nn.Module): def __init__(self): super(Net, self).__init__() self.conv = nn.Conv2d(1, 1, 3) def forward(self, x): return self.conv(x) def weighted_kernel_sum(self, weight): return weight * self.conv.weight n = Net() example_weight = torch.rand(1, 1, 3, 3) example_forward_input = torch.rand(1, 1, 3, 3) # Trace a specific method and construct `ScriptModule` with # a single `forward` method module = torch.jit.trace(n.forward, example_forward_input) # Trace a module (implicitly traces `forward`) and construct a # `ScriptModule` with a single `forward` method module = torch.jit.trace(n, example_forward_input) # Trace specific methods on a module (specified in `inputs`), constructs # a `ScriptModule` with `forward` and `weighted_kernel_sum` methods inputs = {'forward' : example_forward_input, 'weighted_kernel_sum' : example_weight} module = torch.jit.trace_module(n, inputs) # torch.jit.unused `torch.jit.unused(fn)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/_jit_internal.html#unused) This decorator indicates to the compiler that a function or method should be ignored and replaced with the raising of an exception. This allows you to leave code in your model that is not yet TorchScript compatible and still export your model. Example (using `@torch.jit.unused` on a method): import torch import torch.nn as nn class MyModule(nn.Module): def __init__(self, use_memory_efficient): super(MyModule, self).__init__() self.use_memory_efficient = use_memory_efficient @torch.jit.unused def memory_efficient(self, x): import pdb pdb.set_trace() return x + 10 def forward(self, x): # Use not-yet-scriptable memory efficient mode if self.use_memory_efficient: return self.memory_efficient(x) else: return x + 10 m = torch.jit.script(MyModule(use_memory_efficient=False)) m.save("m.pt") m = torch.jit.script(MyModule(use_memory_efficient=True)) # exception raised m(torch.rand(100)) # torch.jit.wait `torch.jit.wait(future)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/jit/_async.html#wait) Forces completion of a `torch.jit.Future[T]` asynchronous task, returning the result of the task. See [`fork()`](torch.jit.fork#torch.jit.fork "torch.jit.fork") for docs and examples. :param func: an asynchronous task reference, created through `torch.jit.fork` :type func: torch.jit.Future[T] Returns the return value of the the completed task Return type `T` # torch.kaiser_window `torch.kaiser_window(window_length, periodic=True, beta=12.0, *, dtype=None, layout=torch.strided, device=None, requires_grad=False) → Tensor` Computes the Kaiser window with window length `window_length` and shape parameter `beta`. Let I_0 be the zeroth order modified Bessel function of the first kind (see [`torch.i0()`](torch.i0#torch.i0 "torch.i0")) and `N = L - 1` if `periodic` is False and `L` if `periodic` is True, where `L` is the `window_length`. 
This function computes:

out_i = I_0 \left( \beta \sqrt{1 - \left( \frac{i - N/2}{N/2} \right)^2 } \right) / I_0(\beta)

Calling `torch.kaiser_window(L, B, periodic=True)` is equivalent to calling `torch.kaiser_window(L + 1, B, periodic=False)[:-1]`. The `periodic` argument is intended as a helpful shorthand to produce a periodic window as input to functions like [`torch.stft()`](torch.stft#torch.stft "torch.stft").

Note

If `window_length` is one, then the returned window is a single element tensor containing a one.

Parameters

* **window_length** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – length of the window.
* **periodic** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – If True, returns a periodic window suitable for use in spectral analysis. If False, returns a symmetric window suitable for use in filter design.
* **beta** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _,__optional_) – shape parameter for the window.

Keyword Arguments

* **dtype** ([`torch.dtype`](../tensor_attributes#torch.torch.dtype "torch.torch.dtype"), optional) – the desired data type of returned tensor. Default: if `None`, uses a global default (see [`torch.set_default_tensor_type()`](torch.set_default_tensor_type#torch.set_default_tensor_type "torch.set_default_tensor_type")).
* **layout** ([`torch.layout`](../tensor_attributes#torch.torch.layout "torch.torch.layout"), optional) – the desired layout of returned window tensor. Only `torch.strided` (dense layout) is supported.
* **device** ([`torch.device`](../tensor_attributes#torch.torch.device "torch.torch.device"), optional) – the desired device of returned tensor. Default: if `None`, uses the current device for the default tensor type (see [`torch.set_default_tensor_type()`](torch.set_default_tensor_type#torch.set_default_tensor_type "torch.set_default_tensor_type")). `device` will be the CPU for CPU tensor types and the current CUDA device for CUDA tensor types.
* **requires_grad** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – If autograd should record operations on the returned tensor. Default: `False`.
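A minimal usage sketch (output values omitted, since the exact numbers depend on the Bessel-function evaluation); the last line simply checks the periodic/symmetric equivalence stated above:

import torch

# Symmetric 5-point Kaiser window with the default shape parameter beta=12.0
w_sym = torch.kaiser_window(5, periodic=False)

# Periodic window of the same length, e.g. as input to torch.stft
w_per = torch.kaiser_window(5, periodic=True)

# A periodic window of length L matches the first L points of a
# symmetric window of length L + 1 (the equivalence described above)
assert torch.allclose(w_per, torch.kaiser_window(6, periodic=False)[:-1])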
# torch.kron

`torch.kron(input, other, *, out=None) → Tensor`

Computes the Kronecker product, denoted by \otimes , of `input` and `other`.

If `input` is a (a_0 \times a_1 \times \dots \times a_n) tensor and `other` is a (b_0 \times b_1 \times \dots \times b_n) tensor, the result will be a (a_0*b_0 \times a_1*b_1 \times \dots \times a_n*b_n) tensor with the following entries:

(\text{input} \otimes \text{other})_{k_0, k_1, \dots, k_n} = \text{input}_{i_0, i_1, \dots, i_n} * \text{other}_{j_0, j_1, \dots, j_n},

where k_t = i_t * b_t + j_t for 0 \leq t \leq n . If one tensor has fewer dimensions than the other it is unsqueezed until it has the same number of dimensions.

Supports real-valued and complex-valued inputs.

Note

This function generalizes the typical definition of the Kronecker product for two matrices to two tensors, as described above. When `input` is a (m \times n) matrix and `other` is a (p \times q) matrix, the result will be a (p*m \times q*n) block matrix:

\mathbf{A} \otimes \mathbf{B} = \begin{bmatrix} a_{11} \mathbf{B} & \cdots & a_{1n} \mathbf{B} \\ \vdots & \ddots & \vdots \\ a_{m1} \mathbf{B} & \cdots & a_{mn} \mathbf{B} \end{bmatrix}

where `input` is \mathbf{A} and `other` is \mathbf{B} .

Parameters

* **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the first tensor in the Kronecker product.
* **other** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the second tensor in the Kronecker product.

Keyword Arguments

**out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – The output tensor. Ignored if `None`. Default: `None`

Examples:

>>> mat1 = torch.eye(2)
>>> mat2 = torch.ones(2, 2)
>>> torch.kron(mat1, mat2)
tensor([[1., 1., 0., 0.],
        [1., 1., 0., 0.],
        [0., 0., 1., 1.],
        [0., 0., 1., 1.]])

>>> mat1 = torch.eye(2)
>>> mat2 = torch.arange(1, 5).reshape(2, 2)
>>> torch.kron(mat1, mat2)
tensor([[1., 2., 0., 0.],
        [3., 4., 0., 0.],
        [0., 0., 1., 2.],
        [0., 0., 3., 4.]])

# torch.kthvalue

`torch.kthvalue(input, k, dim=None, keepdim=False, *, out=None) -> (Tensor, LongTensor)`

Returns a namedtuple `(values, indices)` where `values` is the `k` th smallest element of each row of the `input` tensor in the given dimension `dim`, and `indices` is the index location of each element found.

If `dim` is not given, the last dimension of the `input` is chosen.

If `keepdim` is `True`, both the `values` and `indices` tensors are the same size as `input`, except in the dimension `dim` where they are of size 1. Otherwise, `dim` is squeezed (see [`torch.squeeze()`](torch.squeeze#torch.squeeze "torch.squeeze")), resulting in both the `values` and `indices` tensors having 1 fewer dimension than the `input` tensor.

Note

When `input` is a CUDA tensor and there are multiple valid `k` th values, this function may nondeterministically return `indices` for any of them.

Parameters

* **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor.
* **k** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – k for the k-th smallest element
* **dim** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – the dimension to find the kth value along
* **keepdim** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – whether the output tensor has `dim` retained or not.

Keyword Arguments

**out** ([tuple](https://docs.python.org/3/library/stdtypes.html#tuple "\(in Python v3.9\)") _,__optional_) – the output tuple of (Tensor, LongTensor) can be optionally given to be used as output buffers

Example:

>>> x = torch.arange(1., 6.)
>>> x
tensor([ 1., 2., 3., 4., 5.])
>>> torch.kthvalue(x, 4)
torch.return_types.kthvalue(values=tensor(4.), indices=tensor(3))

>>> x = torch.arange(1., 7.).resize_(2, 3)
>>> x
tensor([[ 1., 2., 3.],
        [ 4., 5., 6.]])
>>> torch.kthvalue(x, 2, 0, True)
torch.return_types.kthvalue(values=tensor([[4., 5., 6.]]), indices=tensor([[1, 1, 1]]))

# torch.lcm

`torch.lcm(input, other, *, out=None) → Tensor`

Computes the element-wise least common multiple (LCM) of `input` and `other`.

Both `input` and `other` must have integer types.

Note

This defines lcm(0, 0) = 0 and lcm(0, a) = 0 .

Parameters

* **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor.
* **other** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the second input tensor

Keyword Arguments

**out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor.

Example:

>>> a = torch.tensor([5, 10, 15])
>>> b = torch.tensor([3, 4, 5])
>>> torch.lcm(a, b)
tensor([15, 20, 15])
>>> c = torch.tensor([3])
>>> torch.lcm(a, c)
tensor([15, 30, 15])

# torch.ldexp

`torch.ldexp(input, other, *, out=None) → Tensor`

Multiplies `input` by `2 ** other`.

\text{out}_i = \text{input}_i * 2^{\text{other}_i}

Typically this function is used to construct floating point numbers by multiplying mantissas in `input` with integral powers of two created from the exponents in `other`.

Parameters

* **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor.
* **other** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – a tensor of exponents, typically integers.

Keyword Arguments

**out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor.

Example:

>>> torch.ldexp(torch.tensor([1.]), torch.tensor([1]))
tensor([2.])
>>> torch.ldexp(torch.tensor([1.0]), torch.tensor([1, 2, 3, 4]))
tensor([ 2., 4., 8., 16.])

# torch.le

`torch.le(input, other, *, out=None) → Tensor`

Computes \text{input} \leq \text{other} element-wise.

The second argument can be a number or a tensor whose shape is [broadcastable](https://pytorch.org/docs/1.8.0/notes/broadcasting.html#broadcasting-semantics) with the first argument.

Parameters

* **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the tensor to compare
* **other** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _or_ _Scalar_) – the tensor or value to compare

Keyword Arguments

**out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor.

Returns

A boolean tensor that is True where `input` is less than or equal to `other` and False elsewhere

Example:

>>> torch.le(torch.tensor([[1, 2], [3, 4]]), torch.tensor([[1, 1], [4, 4]]))
tensor([[True, False],
        [True, True]])

# torch.lerp

`torch.lerp(input, end, weight, *, out=None)`

Does a linear interpolation of two tensors `start` (given by `input`) and `end` based on a scalar or tensor `weight` and returns the resulting `out` tensor.

\text{out}_i = \text{start}_i + \text{weight}_i \times (\text{end}_i - \text{start}_i)

The shapes of `start` and `end` must be [broadcastable](https://pytorch.org/docs/1.8.0/notes/broadcasting.html#broadcasting-semantics). If `weight` is a tensor, then the shapes of `weight`, `start`, and `end` must be [broadcastable](https://pytorch.org/docs/1.8.0/notes/broadcasting.html#broadcasting-semantics).

Parameters

* **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the tensor with the starting points
* **end** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the tensor with the ending points
* **weight** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _or_ _tensor_) – the weight for the interpolation formula

Keyword Arguments

**out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor.

Example:

>>> start = torch.arange(1., 5.)
>>> end = torch.empty(4).fill_(10)
>>> start
tensor([ 1., 2., 3., 4.])
>>> end
tensor([ 10., 10., 10., 10.])
>>> torch.lerp(start, end, 0.5)
tensor([ 5.5000, 6.0000, 6.5000, 7.0000])
>>> torch.lerp(start, end, torch.full_like(start, 0.5))
tensor([ 5.5000, 6.0000, 6.5000, 7.0000])

# torch.less

`torch.less(input, other, *, out=None) → Tensor`

Alias for [`torch.lt()`](torch.lt#torch.lt "torch.lt").

# torch.less_equal

`torch.less_equal(input, other, *, out=None) → Tensor`

Alias for [`torch.le()`](torch.le#torch.le "torch.le").

# torch.lgamma

`torch.lgamma(input, *, out=None) → Tensor`

Computes the logarithm of the gamma function on `input`.

\text{out}_i = \log \Gamma(\text{input}_i)

Parameters

**input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor.

Keyword Arguments

**out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor.

Example:

>>> a = torch.arange(0.5, 2, 0.5)
>>> torch.lgamma(a)
tensor([ 0.5724, 0.0000, -0.1208])

# torch.linspace

`torch.linspace(start, end, steps, *, out=None, dtype=None, layout=torch.strided, device=None, requires_grad=False) → Tensor`

Creates a one-dimensional tensor of size `steps` whose values are evenly spaced from `start` to `end`, inclusive. That is, the values are:

(\text{start}, \text{start} + \frac{\text{end} - \text{start}}{\text{steps} - 1}, \ldots, \text{start} + (\text{steps} - 2) * \frac{\text{end} - \text{start}}{\text{steps} - 1}, \text{end})

Warning

Not providing a value for `steps` is deprecated. For backwards compatibility, not providing a value for `steps` will create a tensor with 100 elements. Note that this behavior is not reflected in the documented function signature and should not be relied on. In a future PyTorch release, failing to provide a value for `steps` will throw a runtime error.

Parameters

* **start** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)")) – the starting value for the set of points
* **end** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)")) – the ending value for the set of points
* **steps** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – size of the constructed tensor

Keyword Arguments

* **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor.
* **dtype** ([`torch.dtype`](../tensor_attributes#torch.torch.dtype "torch.torch.dtype"), optional) – the desired data type of returned tensor. Default: if `None`, uses a global default (see [`torch.set_default_tensor_type()`](torch.set_default_tensor_type#torch.set_default_tensor_type "torch.set_default_tensor_type")).
* **layout** ([`torch.layout`](../tensor_attributes#torch.torch.layout "torch.torch.layout"), optional) – the desired layout of returned Tensor. Default: `torch.strided`.
* **device** ([`torch.device`](../tensor_attributes#torch.torch.device "torch.torch.device"), optional) – the desired device of returned tensor. Default: if `None`, uses the current device for the default tensor type (see [`torch.set_default_tensor_type()`](torch.set_default_tensor_type#torch.set_default_tensor_type "torch.set_default_tensor_type")). `device` will be the CPU for CPU tensor types and the current CUDA device for CUDA tensor types.
* **requires_grad** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – If autograd should record operations on the returned tensor. Default: `False`. Example: >>> torch.linspace(3, 10, steps=5) tensor([ 3.0000, 4.7500, 6.5000, 8.2500, 10.0000]) >>> torch.linspace(-10, 10, steps=5) tensor([-10., -5., 0., 5., 10.]) >>> torch.linspace(start=-10, end=10, steps=5) tensor([-10., -5., 0., 5., 10.]) >>> torch.linspace(start=-10, end=10, steps=1) tensor([-10.]) # torch.load `torch.load(f, map_location=None, pickle_module=, **pickle_load_args)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/serialization.html#load) Loads an object saved with [`torch.save()`](torch.save#torch.save "torch.save") from a file. `torch.load()` uses Python’s unpickling facilities but treats storages, which underlie tensors, specially. They are first deserialized on the CPU and are then moved to the device they were saved from. If this fails (e.g. because the run time system doesn’t have certain devices), an exception is raised. However, storages can be dynamically remapped to an alternative set of devices using the `map_location` argument. If `map_location` is a callable, it will be called once for each serialized storage with two arguments: storage and location. The storage argument will be the initial deserialization of the storage, residing on the CPU. Each serialized storage has a location tag associated with it which identifies the device it was saved from, and this tag is the second argument passed to `map_location`. The builtin location tags are `'cpu'` for CPU tensors and `'cuda:device_id'` (e.g. `'cuda:2'`) for CUDA tensors. `map_location` should return either `None` or a storage. If `map_location` returns a storage, it will be used as the final deserialized object, already moved to the right device. Otherwise, `torch.load()` will fall back to the default behavior, as if `map_location` wasn’t specified. If `map_location` is a [`torch.device`](../tensor_attributes#torch.torch.device "torch.torch.device") object or a string containing a device tag, it indicates the location where all tensors should be loaded. Otherwise, if `map_location` is a dict, it will be used to remap location tags appearing in the file (keys), to ones that specify where to put the storages (values). User extensions can register their own location tags and tagging and deserialization methods using `torch.serialization.register_package()`. Parameters * **f** – a file-like object (has to implement `read()`, `readline()`, `tell()`, and `seek()`), or a string or os.PathLike object containing a file name * **map_location** – a function, [`torch.device`](../tensor_attributes#torch.torch.device "torch.torch.device"), string or a dict specifying how to remap storage locations * **pickle_module** – module used for unpickling metadata and objects (has to match the `pickle_module` used to serialize file) * **pickle_load_args** – (Python 3 only) optional keyword arguments passed over to `pickle_module.load()` and `pickle_module.Unpickler()`, e.g., `errors=...`. Warning `torch.load()` uses `pickle` module implicitly, which is known to be insecure. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling. Never load data that could have come from an untrusted source, or that could have been tampered with. **Only load data you trust**. Note When you call `torch.load()` on a file which contains GPU tensors, those tensors will be loaded to GPU by default. 
You can call `torch.load(.., map_location='cpu')` and then `load_state_dict()` to avoid GPU RAM surge when loading a model checkpoint. Note By default, we decode byte strings as `utf-8`. This is to avoid a common error case `UnicodeDecodeError: 'ascii' codec can't decode byte 0x...` when loading files saved by Python 2 in Python 3. If this default is incorrect, you may use an extra `encoding` keyword argument to specify how these objects should be loaded, e.g., `encoding='latin1'` decodes them to strings using `latin1` encoding, and `encoding='bytes'` keeps them as byte arrays which can be decoded later with `byte_array.decode(...)`. #### Example >>> torch.load('tensors.pt') # Load all tensors onto the CPU >>> torch.load('tensors.pt', map_location=torch.device('cpu')) # Load all tensors onto the CPU, using a function >>> torch.load('tensors.pt', map_location=lambda storage, loc: storage) # Load all tensors onto GPU 1 >>> torch.load('tensors.pt', map_location=lambda storage, loc: storage.cuda(1)) # Map tensors from GPU 1 to GPU 0 >>> torch.load('tensors.pt', map_location={'cuda:1':'cuda:0'}) # Load tensor from io.BytesIO object >>> with open('tensor.pt', 'rb') as f: ... buffer = io.BytesIO(f.read()) >>> torch.load(buffer) # Load a module with 'ascii' encoding for unpickling >>> torch.load('module.pt', encoding='ascii') # torch.lobpcg `torch.lobpcg(A, k=None, B=None, X=None, n=None, iK=None, niter=None, tol=None, largest=None, method=None, tracker=None, ortho_iparams=None, ortho_fparams=None, ortho_bparams=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/_lobpcg.html#lobpcg) Find the k largest (or smallest) eigenvalues and the corresponding eigenvectors of a symmetric positive defined generalized eigenvalue problem using matrix-free LOBPCG methods. This function is a front-end to the following LOBPCG algorithms selectable via `method` argument: `method=”basic”` \- the LOBPCG method introduced by Andrew Knyazev, see [Knyazev2001]. A less robust method, may fail when Cholesky is applied to singular input. `method=”ortho”` \- the LOBPCG method with orthogonal basis selection [StathopoulosEtal2002]. A robust method. Supported inputs are dense, sparse, and batches of dense matrices. Note In general, the basic method spends least time per iteration. However, the robust methods converge much faster and are more stable. So, the usage of the basic method is generally not recommended but there exist cases where the usage of the basic method may be preferred. Warning The backward method does not support sparse and complex inputs. It works only when `B` is not provided (i.e. `B == None`). We are actively working on extensions, and the details of the algorithms are going to be published promptly. Warning While it is assumed that `A` is symmetric, `A.grad` is not. To make sure that `A.grad` is symmetric, so that `A - t * A.grad` is symmetric in first-order optimization routines, prior to running `lobpcg` we do the following symmetrization map: `A -> (A + A.t()) / 2`. The map is performed only when the `A` requires gradients. Parameters * **A** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor of size (∗,m,m)(*, m, m) * **B** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the input tensor of size (∗,m,m)(*, m, m) . When not specified, `B` is interpereted as identity matrix. * **X** (_tensor_ _,__optional_) – the input tensor of size (∗,m,n)(*, m, n) where `k <= n <= m`. When specified, it is used as initial approximation of eigenvectors. 
X must be a dense tensor. * **iK** (_tensor_ _,__optional_) – the input tensor of size (∗,m,m)(*, m, m) . When specified, it will be used as preconditioner. * **k** (_integer_ _,__optional_) – the number of requested eigenpairs. Default is the number of XX columns (when specified) or `1`. * **n** (_integer_ _,__optional_) – if XX is not specified then `n` specifies the size of the generated random approximation of eigenvectors. Default value for `n` is `k`. If XX is specified, the value of `n` (when specified) must be the number of XX columns. * **tol** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _,__optional_) – residual tolerance for stopping criterion. Default is `feps ** 0.5` where `feps` is smallest non-zero floating-point number of the given input tensor `A` data type. * **largest** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – when True, solve the eigenproblem for the largest eigenvalues. Otherwise, solve the eigenproblem for smallest eigenvalues. Default is `True`. * **method** ([str](https://docs.python.org/3/library/stdtypes.html#str "\(in Python v3.9\)") _,__optional_) – select LOBPCG method. See the description of the function above. Default is “ortho”. * **niter** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – maximum number of iterations. When reached, the iteration process is hard-stopped and the current approximation of eigenpairs is returned. For infinite iteration but until convergence criteria is met, use `-1`. * **tracker** (_callable_ _,__optional_) – a function for tracing the iteration process. When specified, it is called at each iteration step with LOBPCG instance as an argument. The LOBPCG instance holds the full state of the iteration process in the following attributes: `iparams`, `fparams`, `bparams` \- dictionaries of integer, float, and boolean valued input parameters, respectively `ivars`, `fvars`, `bvars`, `tvars` \- dictionaries of integer, float, boolean, and Tensor valued iteration variables, respectively. `A`, `B`, `iK` \- input Tensor arguments. `E`, `X`, `S`, `R` \- iteration Tensor variables. For instance: `ivars[“istep”]` \- the current iteration step `X` \- the current approximation of eigenvectors `E` \- the current approximation of eigenvalues `R` \- the current residual `ivars[“converged_count”]` \- the current number of converged eigenpairs `tvars[“rerr”]` \- the current state of convergence criteria Note that when `tracker` stores Tensor objects from the LOBPCG instance, it must make copies of these. If `tracker` sets `bvars[“force_stop”] = True`, the iteration process will be hard-stopped. * **ortho_fparams, ortho_bparams** (_ortho_iparams_ _,_) – various parameters to LOBPCG algorithm when using `method=”ortho”`. Returns tensor of eigenvalues of size (∗,k)(*, k) X (Tensor): tensor of eigenvectors of size (∗,m,k)(*, m, k) Return type E ([Tensor](../tensors#torch.Tensor "torch.Tensor")) #### References [Knyazev2001] Andrew V. Knyazev. (2001) Toward the Optimal Preconditioned Eigensolver: Locally Optimal Block Preconditioned Conjugate Gradient Method. SIAM J. Sci. Comput., 23(2), 517-541. (25 pages) [StathopoulosEtal2002] Andreas Stathopoulos and Kesheng Wu. (2002) A Block Orthogonalization Procedure with Constant Synchronization Requirements. SIAM J. Sci. Comput., 23(6), 2165-2182. (18 pages) [DuerschEtal2018] Jed A. Duersch, Meiyue Shao, Chao Yang, Ming Gu. 
(2018) A Robust and Efficient Implementation of LOBPCG. SIAM J. Sci. Comput., 40(5), C655-C676. (22 pages) # torch.log `torch.log(input, *, out=None) → Tensor` Returns a new tensor with the natural logarithm of the elements of `input`. yi=log⁡e(xi)y_{i} = \log_{e} (x_{i}) Parameters **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. Example: >>> a = torch.randn(5) >>> a tensor([-0.7168, -0.5471, -0.8933, -1.4428, -0.1190]) >>> torch.log(a) tensor([ nan, nan, nan, nan, nan]) # torch.log10 `torch.log10(input, *, out=None) → Tensor` Returns a new tensor with the logarithm to the base 10 of the elements of `input`. yi=log⁡10(xi)y_{i} = \log_{10} (x_{i}) Parameters **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. Example: >>> a = torch.rand(5) >>> a tensor([ 0.5224, 0.9354, 0.7257, 0.1301, 0.2251]) >>> torch.log10(a) tensor([-0.2820, -0.0290, -0.1392, -0.8857, -0.6476]) # torch.log1p `torch.log1p(input, *, out=None) → Tensor` Returns a new tensor with the natural logarithm of (1 + `input`). yi=log⁡e(xi+1)y_i = \log_{e} (x_i + 1) Note This function is more accurate than [`torch.log()`](torch.log#torch.log "torch.log") for small values of `input` Parameters **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. Example: >>> a = torch.randn(5) >>> a tensor([-1.0090, -0.9923, 1.0249, -0.5372, 0.2492]) >>> torch.log1p(a) tensor([ nan, -4.8653, 0.7055, -0.7705, 0.2225]) # torch.log2 `torch.log2(input, *, out=None) → Tensor` Returns a new tensor with the logarithm to the base 2 of the elements of `input`. yi=log⁡2(xi)y_{i} = \log_{2} (x_{i}) Parameters **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. Example: >>> a = torch.rand(5) >>> a tensor([ 0.8419, 0.8003, 0.9971, 0.5287, 0.0490]) >>> torch.log2(a) tensor([-0.2483, -0.3213, -0.0042, -0.9196, -4.3504]) # torch.logaddexp `torch.logaddexp(input, other, *, out=None) → Tensor` Logarithm of the sum of exponentiations of the inputs. Calculates pointwise log⁡(ex+ey)\log\left(e^x + e^y\right) . This function is useful in statistics where the calculated probabilities of events may be so small as to exceed the range of normal floating point numbers. In such cases the logarithm of the calculated probability is stored. This function allows adding probabilities stored in such a fashion. This op should be disambiguated with [`torch.logsumexp()`](torch.logsumexp#torch.logsumexp "torch.logsumexp") which performs a reduction on a single tensor. Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. * **other** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the second input tensor Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. 
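Note For intuition, `logaddexp(x, y)` is mathematically equal to log(exp(x) + exp(y)); a numerically stable way to evaluate this by hand is the usual log-sum-exp trick. The following is a minimal illustrative sketch (it is not the library's actual kernel), using the same inputs as the example below:

>>> import torch
>>> x = torch.tensor([-100.0, -200, -300])
>>> y = torch.tensor([-1.0, -2, -3])
>>> m = torch.maximum(x, y)
>>> # factor out the larger argument so the remaining exponent cannot overflow
>>> m + torch.log1p(torch.exp(-torch.abs(x - y)))
tensor([-1., -2., -3.])
>>> torch.logaddexp(x, y)  # same result
tensor([-1., -2., -3.])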
Example: >>> torch.logaddexp(torch.tensor([-1.0]), torch.tensor([-1.0, -2, -3])) tensor([-0.3069, -0.6867, -0.8731]) >>> torch.logaddexp(torch.tensor([-100.0, -200, -300]), torch.tensor([-1.0, -2, -3])) tensor([-1., -2., -3.]) >>> torch.logaddexp(torch.tensor([1.0, 2000, 30000]), torch.tensor([-1.0, -2, -3])) tensor([1.1269e+00, 2.0000e+03, 3.0000e+04]) # torch.logaddexp2 `torch.logaddexp2(input, other, *, out=None) → Tensor` Logarithm of the sum of exponentiations of the inputs in base-2. Calculates pointwise log⁡2(2x+2y)\log_2\left(2^x + 2^y\right) . See [`torch.logaddexp()`](torch.logaddexp#torch.logaddexp "torch.logaddexp") for more details. Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. * **other** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the second input tensor Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. # torch.logcumsumexp `torch.logcumsumexp(input, dim, *, out=None) → Tensor` Returns the logarithm of the cumulative summation of the exponentiation of elements of `input` in the dimension `dim`. For summation index jj given by `dim` and other indices ii , the result is logcumsumexp(x)ij=log⁡∑j=0iexp⁡(xij)\text{logcumsumexp}(x)_{ij} = \log \sum\limits_{j=0}^{i} \exp(x_{ij}) Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. * **dim** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – the dimension to do the operation over Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. Example:: >>> a = torch.randn(10) >>> torch.logcumsumexp(a, dim=0) tensor([-0.42296738, -0.04462666, 0.86278635, 0.94622083, 1.05277811, 1.39202815, 1.83525007, 1.84492621, 2.06084887, 2.06844475])) # torch.logdet `torch.logdet(input) → Tensor` Calculates log determinant of a square matrix or batches of square matrices. Note Result is `-inf` if `input` has zero log determinant, and is `nan` if `input` has negative determinant. Note Backward through `logdet()` internally uses SVD results when `input` is not invertible. In this case, double backward through `logdet()` will be unstable in when `input` doesn’t have distinct singular values. See [`svd()`](torch.svd#torch.svd "torch.svd") for details. Parameters **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor of size `(*, n, n)` where `*` is zero or more batch dimensions. Example: >>> A = torch.randn(3, 3) >>> torch.det(A) tensor(0.2611) >>> torch.logdet(A) tensor(-1.3430) >>> A tensor([[[ 0.9254, -0.6213], [-0.5787, 1.6843]], [[ 0.3242, -0.9665], [ 0.4539, -0.0887]], [[ 1.1336, -0.4025], [-0.7089, 0.9032]]]) >>> A.det() tensor([1.1990, 0.4099, 0.7386]) >>> A.det().log() tensor([ 0.1815, -0.8917, -0.3031]) # torch.logical_and `torch.logical_and(input, other, *, out=None) → Tensor` Computes the element-wise logical AND of the given input tensors. Zeros are treated as `False` and nonzeros are treated as `True`. Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. * **other** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the tensor to compute AND with Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. 
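Note A common use of the logical ops is combining boolean masks for indexing. A small illustrative sketch (variable names and values are arbitrary):

>>> import torch
>>> x = torch.tensor([-2.0, -0.5, 0.5, 2.0])
>>> mask = torch.logical_and(x > -1, x < 1)  # True only where both conditions hold
>>> x[mask]
tensor([-0.5000,  0.5000])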
Example: >>> torch.logical_and(torch.tensor([True, False, True]), torch.tensor([True, False, False])) tensor([ True, False, False]) >>> a = torch.tensor([0, 1, 10, 0], dtype=torch.int8) >>> b = torch.tensor([4, 0, 1, 0], dtype=torch.int8) >>> torch.logical_and(a, b) tensor([False, False, True, False]) >>> torch.logical_and(a.double(), b.double()) tensor([False, False, True, False]) >>> torch.logical_and(a.double(), b) tensor([False, False, True, False]) >>> torch.logical_and(a, b, out=torch.empty(4, dtype=torch.bool)) tensor([False, False, True, False]) # torch.logical_not `torch.logical_not(input, *, out=None) → Tensor` Computes the element-wise logical NOT of the given input tensor. If not specified, the output tensor will have the bool dtype. If the input tensor is not a bool tensor, zeros are treated as `False` and non-zeros are treated as `True`. Parameters **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. Example: >>> torch.logical_not(torch.tensor([True, False])) tensor([False, True]) >>> torch.logical_not(torch.tensor([0, 1, -10], dtype=torch.int8)) tensor([ True, False, False]) >>> torch.logical_not(torch.tensor([0., 1.5, -10.], dtype=torch.double)) tensor([ True, False, False]) >>> torch.logical_not(torch.tensor([0., 1., -10.], dtype=torch.double), out=torch.empty(3, dtype=torch.int16)) tensor([1, 0, 0], dtype=torch.int16) # torch.logical_or `torch.logical_or(input, other, *, out=None) → Tensor` Computes the element-wise logical OR of the given input tensors. Zeros are treated as `False` and nonzeros are treated as `True`. Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. * **other** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the tensor to compute OR with Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. Example: >>> torch.logical_or(torch.tensor([True, False, True]), torch.tensor([True, False, False])) tensor([ True, False, True]) >>> a = torch.tensor([0, 1, 10, 0], dtype=torch.int8) >>> b = torch.tensor([4, 0, 1, 0], dtype=torch.int8) >>> torch.logical_or(a, b) tensor([ True, True, True, False]) >>> torch.logical_or(a.double(), b.double()) tensor([ True, True, True, False]) >>> torch.logical_or(a.double(), b) tensor([ True, True, True, False]) >>> torch.logical_or(a, b, out=torch.empty(4, dtype=torch.bool)) tensor([ True, True, True, False]) # torch.logical_xor `torch.logical_xor(input, other, *, out=None) → Tensor` Computes the element-wise logical XOR of the given input tensors. Zeros are treated as `False` and nonzeros are treated as `True`. Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. * **other** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the tensor to compute XOR with Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. 
Example: >>> torch.logical_xor(torch.tensor([True, False, True]), torch.tensor([True, False, False])) tensor([False, False, True]) >>> a = torch.tensor([0, 1, 10, 0], dtype=torch.int8) >>> b = torch.tensor([4, 0, 1, 0], dtype=torch.int8) >>> torch.logical_xor(a, b) tensor([ True, True, False, False]) >>> torch.logical_xor(a.double(), b.double()) tensor([ True, True, False, False]) >>> torch.logical_xor(a.double(), b) tensor([ True, True, False, False]) >>> torch.logical_xor(a, b, out=torch.empty(4, dtype=torch.bool)) tensor([ True, True, False, False]) # torch.logit `torch.logit(input, eps=None, *, out=None) → Tensor` Returns a new tensor with the logit of the elements of `input`. `input` is clamped to [eps, 1 - eps] when eps is not None. When eps is None and `input` < 0 or `input` > 1, the function yields NaN. y_{i} = \ln\left(\frac{z_{i}}{1 - z_{i}}\right), \quad z_{i} = \begin{cases} x_{i} & \text{if eps is None} \\ \text{eps} & \text{if } x_{i} < \text{eps} \\ x_{i} & \text{if } \text{eps} \leq x_{i} \leq 1 - \text{eps} \\ 1 - \text{eps} & \text{if } x_{i} > 1 - \text{eps} \end{cases} Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. * **eps** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _,__optional_) – the epsilon for input clamp bound. Default: `None` Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. Example: >>> a = torch.rand(5) >>> a tensor([0.2796, 0.9331, 0.6486, 0.1523, 0.6516]) >>> torch.logit(a, eps=1e-6) tensor([-0.9466, 2.6352, 0.6131, -1.7169, 0.6261]) # torch.logspace `torch.logspace(start, end, steps, base=10.0, *, out=None, dtype=None, layout=torch.strided, device=None, requires_grad=False) → Tensor` Creates a one-dimensional tensor of size `steps` whose values are evenly spaced from \text{base}^{\text{start}} to \text{base}^{\text{end}}, inclusive, on a logarithmic scale with base `base`. That is, the values are: (\text{base}^{\text{start}}, \text{base}^{(\text{start} + \frac{\text{end} - \text{start}}{\text{steps} - 1})}, \ldots, \text{base}^{(\text{start} + (\text{steps} - 2) * \frac{\text{end} - \text{start}}{\text{steps} - 1})}, \text{base}^{\text{end}}) Warning Not providing a value for `steps` is deprecated. For backwards compatibility, not providing a value for `steps` will create a tensor with 100 elements. Note that this behavior is not reflected in the documented function signature and should not be relied on. In a future PyTorch release, failing to provide a value for `steps` will throw a runtime error. Parameters * **start** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)")) – the starting value for the set of points * **end** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)")) – the ending value for the set of points * **steps** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – size of the constructed tensor * **base** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _,__optional_) – base of the logarithm function. Default: `10.0`. Keyword Arguments * **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor.
* **dtype** ([`torch.dtype`](../tensor_attributes#torch.torch.dtype "torch.torch.dtype"), optional) – the desired data type of returned tensor. Default: if `None`, uses a global default (see [`torch.set_default_tensor_type()`](torch.set_default_tensor_type#torch.set_default_tensor_type "torch.set_default_tensor_type")). * **layout** ([`torch.layout`](../tensor_attributes#torch.torch.layout "torch.torch.layout"), optional) – the desired layout of returned Tensor. Default: `torch.strided`. * **device** ([`torch.device`](../tensor_attributes#torch.torch.device "torch.torch.device"), optional) – the desired device of returned tensor. Default: if `None`, uses the current device for the default tensor type (see [`torch.set_default_tensor_type()`](torch.set_default_tensor_type#torch.set_default_tensor_type "torch.set_default_tensor_type")). `device` will be the CPU for CPU tensor types and the current CUDA device for CUDA tensor types. * **requires_grad** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – If autograd should record operations on the returned tensor. Default: `False`. Example: >>> torch.logspace(start=-10, end=10, steps=5) tensor([ 1.0000e-10, 1.0000e-05, 1.0000e+00, 1.0000e+05, 1.0000e+10]) >>> torch.logspace(start=0.1, end=1.0, steps=5) tensor([ 1.2589, 2.1135, 3.5481, 5.9566, 10.0000]) >>> torch.logspace(start=0.1, end=1.0, steps=1) tensor([1.2589]) >>> torch.logspace(start=2, end=2, steps=1, base=2) tensor([4.0]) # torch.logsumexp `torch.logsumexp(input, dim, keepdim=False, *, out=None)` Returns the log of summed exponentials of each row of the `input` tensor in the given dimension `dim`. The computation is numerically stabilized. For summation index j given by `dim` and other indices i, the result is \text{logsumexp}(x)_{i} = \log \sum_j \exp(x_{ij}) If `keepdim` is `True`, the output tensor is of the same size as `input` except in the dimension(s) `dim` where it is of size 1. Otherwise, `dim` is squeezed (see [`torch.squeeze()`](torch.squeeze#torch.squeeze "torch.squeeze")), resulting in the output tensor having 1 (or `len(dim)`) fewer dimension(s). Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. * **dim** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _or_ _tuple of python:ints_) – the dimension or dimensions to reduce. * **keepdim** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – whether the output tensor has `dim` retained or not. Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. Example: >>> a = torch.randn(3, 3) >>> torch.logsumexp(a, 1) tensor([ 0.8442, 1.4322, 0.8711]) # torch.lstsq `torch.lstsq(input, A, *, out=None) → Tensor` Computes the solution to the least squares and least norm problems for a full rank matrix A of size (m \times n) and a matrix B of size (m \times k). If m \geq n, `lstsq()` solves the least-squares problem: \min_X \|AX-B\|_2. If m < n, `lstsq()` solves the least-norm problem: \min_X \|X\|_2 \quad \text{subject to} \quad AX = B. The returned tensor X has shape (\max(m, n) \times k); the first n rows of X contain the solution. If m \geq n, the residual sum of squares for the solution in each column is given by the sum of squares of elements in the remaining m - n rows of that column. Note The case when m < n is not supported on the GPU. Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the matrix B * **A** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the m by n matrix A Keyword Arguments **out** ([tuple](https://docs.python.org/3/library/stdtypes.html#tuple "\(in Python v3.9\)") _,__optional_) – the optional destination tensor Returns A namedtuple `(solution, QR)` containing: * **solution** (_Tensor_): the least squares solution * **QR** (_Tensor_): the details of the QR factorization Return type ([Tensor](../tensors#torch.Tensor "torch.Tensor"), [Tensor](../tensors#torch.Tensor "torch.Tensor")) Example: >>> A = torch.tensor([[1., 1, 1], ... [2, 3, 4], ... [3, 5, 2], ... [4, 2, 5], ... [5, 4, 3]]) >>> B = torch.tensor([[-10., -3], ... [ 12, 14], ... [ 14, 12], ... [ 16, 16], ...
[ 18, 16]]) >>> X, _ = torch.lstsq(B, A) >>> X tensor([[ 2.0000, 1.0000], [ 1.0000, 1.0000], [ 1.0000, 2.0000], [ 10.9635, 4.8501], [ 8.9332, 5.2418]]) # torch.lt `torch.lt(input, other, *, out=None) → Tensor` Computes \text{input} < \text{other} element-wise. The second argument can be a number or a tensor whose shape is [broadcastable](https://pytorch.org/docs/1.8.0/notes/broadcasting.html#broadcasting-semantics) with the first argument. Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the tensor to compare * **other** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _or_[float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)")) – the tensor or value to compare Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. Returns A boolean tensor that is True where `input` is less than `other` and False elsewhere Example: >>> torch.lt(torch.tensor([[1, 2], [3, 4]]), torch.tensor([[1, 1], [4, 4]])) tensor([[False, False], [True, False]]) # torch.lu `torch.lu(*args, **kwargs)` Computes the LU factorization of a matrix or batches of matrices `A`. Returns a tuple containing the LU factorization and pivots of `A`. Pivoting is done if `pivot` is set to `True`. Note The pivots returned by the function are 1-indexed. If `pivot` is `False`, then the returned pivots tensor is filled with zeros of the appropriate size. Note LU factorization with `pivot` = `False` is not available for CPU, and attempting to do so will throw an error. However, LU factorization with `pivot` = `False` is available for CUDA. Note This function does not check whether the factorization was successful if `get_infos` is `True`, since the status of the factorization is present in the third element of the return tuple. Note In the case of batches of square matrices with size less than or equal to 32 on a CUDA device, the LU factorization is repeated for singular matrices due to a bug in the MAGMA library (see magma issue 13). Note `L`, `U`, and `P` can be derived using [`torch.lu_unpack()`](torch.lu_unpack#torch.lu_unpack "torch.lu_unpack"). Warning The LU factorization does have backward support, but only for square inputs of full rank. Parameters * **A** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the tensor to factor of size (*, m, n) * **pivot** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – controls whether pivoting is done. Default: `True` * **get_infos** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – if set to `True`, returns an info IntTensor. Default: `False` * **out** ([tuple](https://docs.python.org/3/library/stdtypes.html#tuple "\(in Python v3.9\)") _,__optional_) – optional output tuple. If `get_infos` is `True`, then the elements in the tuple are Tensor, IntTensor, and IntTensor. If `get_infos` is `False`, then the elements in the tuple are Tensor, IntTensor. Default: `None` Returns A tuple of tensors containing * **factorization** (_Tensor_): the factorization of size (*, m, n) * **pivots** (_IntTensor_): the pivots of size (*, \text{min}(m, n)). `pivots` stores all the intermediate transpositions of rows. The final permutation `perm` could be reconstructed by applying `swap(perm[i], perm[pivots[i] - 1])` for `i = 0, ..., pivots.size(-1) - 1`, where `perm` is initially the identity permutation of m elements (essentially this is what [`torch.lu_unpack()`](torch.lu_unpack#torch.lu_unpack "torch.lu_unpack") is doing).
* **infos** (_IntTensor_ , _optional_): if `get_infos` is `True`, this is a tensor of size (∗)(*) where non-zero values indicate whether factorization for the matrix or each minibatch has succeeded or failed Return type ([Tensor](../tensors#torch.Tensor "torch.Tensor"), IntTensor, IntTensor (optional)) Example: >>> A = torch.randn(2, 3, 3) >>> A_LU, pivots = torch.lu(A) >>> A_LU tensor([[[ 1.3506, 2.5558, -0.0816], [ 0.1684, 1.1551, 0.1940], [ 0.1193, 0.6189, -0.5497]], [[ 0.4526, 1.2526, -0.3285], [-0.7988, 0.7175, -0.9701], [ 0.2634, -0.9255, -0.3459]]]) >>> pivots tensor([[ 3, 3, 3], [ 3, 3, 3]], dtype=torch.int32) >>> A_LU, pivots, info = torch.lu(A, get_infos=True) >>> if info.nonzero().size(0) == 0: ... print('LU factorization succeeded for all samples!') LU factorization succeeded for all samples! # torch.lu_solve `torch.lu_solve(b, LU_data, LU_pivots, *, out=None) → Tensor` Returns the LU solve of the linear system Ax=bAx = b using the partially pivoted LU factorization of A from [`torch.lu()`](torch.lu#torch.lu "torch.lu"). This function supports `float`, `double`, `cfloat` and `cdouble` dtypes for `input`. Parameters * **b** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the RHS tensor of size (∗,m,k)(*, m, k) , where ∗* is zero or more batch dimensions. * **LU_data** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the pivoted LU factorization of A from [`torch.lu()`](torch.lu#torch.lu "torch.lu") of size (∗,m,m)(*, m, m) , where ∗* is zero or more batch dimensions. * **LU_pivots** (_IntTensor_) – the pivots of the LU factorization from [`torch.lu()`](torch.lu#torch.lu "torch.lu") of size (∗,m)(*, m) , where ∗* is zero or more batch dimensions. The batch dimensions of `LU_pivots` must be equal to the batch dimensions of `LU_data`. Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. Example: >>> A = torch.randn(2, 3, 3) >>> b = torch.randn(2, 3, 1) >>> A_LU = torch.lu(A) >>> x = torch.lu_solve(b, *A_LU) >>> torch.norm(torch.bmm(A, x) - b) tensor(1.00000e-07 * 2.8312) # torch.lu_unpack `torch.lu_unpack(LU_data, LU_pivots, unpack_data=True, unpack_pivots=True)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/functional.html#lu_unpack) Unpacks the data and pivots from a LU factorization of a tensor. Returns a tuple of tensors as `(the pivots, the L tensor, the U tensor)`. 
Parameters * **LU_data** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the packed LU factorization data * **LU_pivots** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the packed LU factorization pivots * **unpack_data** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – flag indicating if the data should be unpacked * **unpack_pivots** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – flag indicating if the pivots should be unpacked Examples: >>> A = torch.randn(2, 3, 3) >>> A_LU, pivots = A.lu() >>> P, A_L, A_U = torch.lu_unpack(A_LU, pivots) >>> >>> # can recover A from factorization >>> A_ = torch.bmm(P, torch.bmm(A_L, A_U)) >>> # LU factorization of a rectangular matrix: >>> A = torch.randn(2, 3, 2) >>> A_LU, pivots = A.lu() >>> P, A_L, A_U = torch.lu_unpack(A_LU, pivots) >>> P tensor([[[1., 0., 0.], [0., 1., 0.], [0., 0., 1.]], [[0., 0., 1.], [0., 1., 0.], [1., 0., 0.]]]) >>> A_L tensor([[[ 1.0000, 0.0000], [ 0.4763, 1.0000], [ 0.3683, 0.1135]], [[ 1.0000, 0.0000], [ 0.2957, 1.0000], [-0.9668, -0.3335]]]) >>> A_U tensor([[[ 2.1962, 1.0881], [ 0.0000, -0.8681]], [[-1.0947, 0.3736], [ 0.0000, 0.5718]]]) >>> A_ = torch.bmm(P, torch.bmm(A_L, A_U)) >>> torch.norm(A_ - A) tensor(2.9802e-08) # torch.manual_seed `torch.manual_seed(seed)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/random.html#manual_seed) Sets the seed for generating random numbers. Returns a `torch.Generator` object. Parameters **seed** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – The desired seed. Value must be within the inclusive range `[-0x8000_0000_0000_0000, 0xffff_ffff_ffff_ffff]`. Otherwise, a RuntimeError is raised. Negative inputs are remapped to positive values with the formula `0xffff_ffff_ffff_ffff + seed`. # torch.masked_select `torch.masked_select(input, mask, *, out=None) → Tensor` Returns a new 1-D tensor which indexes the `input` tensor according to the boolean mask `mask` which is a `BoolTensor`. The shapes of the `mask` tensor and the `input` tensor don’t need to match, but they must be [broadcastable](https://pytorch.org/docs/1.8.0/notes/broadcasting.html#broadcasting- semantics). Note The returned tensor does **not** use the same storage as the original tensor Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. * **mask** (_BoolTensor_) – the tensor containing the binary mask to index with Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. Example: >>> x = torch.randn(3, 4) >>> x tensor([[ 0.3552, -2.3825, -0.8297, 0.3477], [-1.2035, 1.2252, 0.5002, 0.6248], [ 0.1307, -2.0608, 0.1244, 2.0139]]) >>> mask = x.ge(0.5) >>> mask tensor([[False, False, False, False], [False, True, True, True], [False, False, False, True]]) >>> torch.masked_select(x, mask) tensor([ 1.2252, 0.5002, 0.6248, 2.0139]) # torch.matmul `torch.matmul(input, other, *, out=None) → Tensor` Matrix product of two tensors. The behavior depends on the dimensionality of the tensors as follows: * If both tensors are 1-dimensional, the dot product (scalar) is returned. * If both arguments are 2-dimensional, the matrix-matrix product is returned. * If the first argument is 1-dimensional and the second argument is 2-dimensional, a 1 is prepended to its dimension for the purpose of the matrix multiply. After the matrix multiply, the prepended dimension is removed. 
* If the first argument is 2-dimensional and the second argument is 1-dimensional, the matrix-vector product is returned. * If both arguments are at least 1-dimensional and at least one argument is N-dimensional (where N > 2), then a batched matrix multiply is returned. If the first argument is 1-dimensional, a 1 is prepended to its dimension for the purpose of the batched matrix multiply and removed after. If the second argument is 1-dimensional, a 1 is appended to its dimension for the purpose of the batched matrix multiple and removed after. The non-matrix (i.e. batch) dimensions are [broadcasted](https://pytorch.org/docs/1.8.0/notes/broadcasting.html#broadcasting-semantics) (and thus must be broadcastable). For example, if `input` is a (j×1×n×n)(j \times 1 \times n \times n) tensor and `other` is a (k×n×n)(k \times n \times n) tensor, `out` will be a (j×k×n×n)(j \times k \times n \times n) tensor. Note that the broadcasting logic only looks at the batch dimensions when determining if the inputs are broadcastable, and not the matrix dimensions. For example, if `input` is a (j×1×n×m)(j \times 1 \times n \times m) tensor and `other` is a (k×m×p)(k \times m \times p) tensor, these inputs are valid for broadcasting even though the final two dimensions (i.e. the matrix dimensions) are different. `out` will be a (j×k×n×p)(j \times k \times n \times p) tensor. This operator supports [TensorFloat32](https://pytorch.org/docs/1.8.0/notes/cuda.html#tf32-on- ampere). Note The 1-dimensional dot product version of this function does not support an `out` parameter. Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the first tensor to be multiplied * **other** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the second tensor to be multiplied Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. Example: >>> # vector x vector >>> tensor1 = torch.randn(3) >>> tensor2 = torch.randn(3) >>> torch.matmul(tensor1, tensor2).size() torch.Size([]) >>> # matrix x vector >>> tensor1 = torch.randn(3, 4) >>> tensor2 = torch.randn(4) >>> torch.matmul(tensor1, tensor2).size() torch.Size([3]) >>> # batched matrix x broadcasted vector >>> tensor1 = torch.randn(10, 3, 4) >>> tensor2 = torch.randn(4) >>> torch.matmul(tensor1, tensor2).size() torch.Size([10, 3]) >>> # batched matrix x batched matrix >>> tensor1 = torch.randn(10, 3, 4) >>> tensor2 = torch.randn(10, 4, 5) >>> torch.matmul(tensor1, tensor2).size() torch.Size([10, 3, 5]) >>> # batched matrix x broadcasted matrix >>> tensor1 = torch.randn(10, 3, 4) >>> tensor2 = torch.randn(4, 5) >>> torch.matmul(tensor1, tensor2).size() torch.Size([10, 3, 5]) # torch.matrix_exp `torch.matrix_exp()` Returns the matrix exponential. Supports batched input. For a matrix `A`, the matrix exponential is defined as eA=∑k=0∞Ak/k!\mathrm{e}^A = \sum_{k=0}^\infty A^k / k! The implementation is based on: Bader, P.; Blanes, S.; Casas, F. Computing the Matrix Exponential with an Optimized Taylor Polynomial Approximation. Mathematics 2019, 7, 1174. Parameters **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. 
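Note As an informal sanity check (not the library's actual algorithm), the result can be compared against a truncated Taylor series \sum_{k} A^k / k!; a short sketch with an arbitrary 2x2 matrix:

>>> import math
>>> import torch
>>> A = torch.tensor([[0.0, 1.0], [-1.0, 0.0]])
>>> # truncated Taylor series; 20 terms are plenty for a matrix of this size
>>> approx = sum(torch.matrix_power(A, k) / math.factorial(k) for k in range(20))
>>> torch.allclose(torch.matrix_exp(A), approx, atol=1e-6)
True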
Example: >>> a = torch.randn(2, 2, 2) >>> a[0, :, :] = torch.eye(2, 2) >>> a[1, :, :] = 2 * torch.eye(2, 2) >>> a tensor([[[1., 0.], [0., 1.]], [[2., 0.], [0., 2.]]]) >>> torch.matrix_exp(a) tensor([[[2.7183, 0.0000], [0.0000, 2.7183]], [[7.3891, 0.0000], [0.0000, 7.3891]]]) >>> import math >>> x = torch.tensor([[0, math.pi/3], [-math.pi/3, 0]]) >>> x.matrix_exp() # should be [[cos(pi/3), sin(pi/3)], [-sin(pi/3), cos(pi/3)]] tensor([[ 0.5000, 0.8660], [-0.8660, 0.5000]]) # torch.matrix_power `torch.matrix_power(input, n) → Tensor` Returns the matrix raised to the power `n` for square matrices. For batch of matrices, each individual matrix is raised to the power `n`. If `n` is negative, then the inverse of the matrix (if invertible) is raised to the power `n`. For a batch of matrices, the batched inverse (if invertible) is raised to the power `n`. If `n` is 0, then an identity matrix is returned. Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. * **n** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – the power to raise the matrix to Example: >>> a = torch.randn(2, 2, 2) >>> a tensor([[[-1.9975, -1.9610], [ 0.9592, -2.3364]], [[-1.2534, -1.3429], [ 0.4153, -1.4664]]]) >>> torch.matrix_power(a, 3) tensor([[[ 3.9392, -23.9916], [ 11.7357, -0.2070]], [[ 0.2468, -6.7168], [ 2.0774, -0.8187]]]) # torch.matrix_rank `torch.matrix_rank(input, tol=None, symmetric=False, *, out=None) → Tensor` Returns the numerical rank of a 2-D tensor. The method to compute the matrix rank is done using SVD by default. If `symmetric` is `True`, then `input` is assumed to be symmetric, and the computation of the rank is done by obtaining the eigenvalues. `tol` is the threshold below which the singular values (or the eigenvalues when `symmetric` is `True`) are considered to be 0. If `tol` is not specified, `tol` is set to `S.max() * max(S.size()) * eps` where `S` is the singular values (or the eigenvalues when `symmetric` is `True`), and `eps` is the epsilon value for the datatype of `input`. Note `torch.matrix_rank()` is deprecated. Please use [`torch.linalg.matrix_rank()`](../linalg#torch.linalg.matrix_rank "torch.linalg.matrix_rank") instead. The parameter `symmetric` was renamed in [`torch.linalg.matrix_rank()`](../linalg#torch.linalg.matrix_rank "torch.linalg.matrix_rank") to `hermitian`. Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input 2-D tensor * **tol** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _,__optional_) – the tolerance value. Default: `None` * **symmetric** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – indicates whether `input` is symmetric. Default: `False` Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. Example: >>> a = torch.eye(10) >>> torch.matrix_rank(a) tensor(10) >>> b = torch.eye(10) >>> b[0, 0] = 0 >>> torch.matrix_rank(b) tensor(9) # torch.max `torch.max(input) → Tensor` Returns the maximum value of all elements in the `input` tensor. Warning This function produces deterministic (sub)gradients unlike `max(dim=0)` Parameters **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. 
Example: >>> a = torch.randn(1, 3) >>> a tensor([[ 0.6763, 0.7445, -2.2369]]) >>> torch.max(a) tensor(0.7445) `torch.max(input, dim, keepdim=False, *, out=None) -> (Tensor, LongTensor)` Returns a namedtuple `(values, indices)` where `values` is the maximum value of each row of the `input` tensor in the given dimension `dim`. And `indices` is the index location of each maximum value found (argmax). If `keepdim` is `True`, the output tensors are of the same size as `input` except in the dimension `dim` where they are of size 1. Otherwise, `dim` is squeezed (see [`torch.squeeze()`](torch.squeeze#torch.squeeze "torch.squeeze")), resulting in the output tensors having 1 fewer dimension than `input`. Note If there are multiple maximal values in a reduced row then the indices of the first maximal value are returned. Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. * **dim** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – the dimension to reduce. * **keepdim** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – whether the output tensor has `dim` retained or not. Default: `False`. Keyword Arguments **out** ([tuple](https://docs.python.org/3/library/stdtypes.html#tuple "\(in Python v3.9\)") _,__optional_) – the result tuple of two output tensors (max, max_indices) Example: >>> a = torch.randn(4, 4) >>> a tensor([[-1.2360, -0.2942, -0.1222, 0.8475], [ 1.1949, -1.1127, -2.2379, -0.6702], [ 1.5717, -0.9207, 0.1297, -1.8768], [-0.6172, 1.0036, -0.6060, -0.2432]]) >>> torch.max(a, 1) torch.return_types.max(values=tensor([0.8475, 1.1949, 1.5717, 1.0036]), indices=tensor([3, 0, 0, 1])) `torch.max(input, other, *, out=None) → Tensor` See [`torch.maximum()`](torch.maximum#torch.maximum "torch.maximum"). # torch.maximum `torch.maximum(input, other, *, out=None) → Tensor` Computes the element-wise maximum of `input` and `other`. Note If one of the elements being compared is a NaN, then that element is returned. `maximum()` is not supported for tensors with complex dtypes. Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. * **other** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the second input tensor Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. Example: >>> a = torch.tensor((1, 2, -1)) >>> b = torch.tensor((3, 0, 4)) >>> torch.maximum(a, b) tensor([3, 2, 4]) # torch.mean `torch.mean(input) → Tensor` Returns the mean value of all elements in the `input` tensor. Parameters **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. Example: >>> a = torch.randn(1, 3) >>> a tensor([[ 0.2294, -0.5481, 1.3288]]) >>> torch.mean(a) tensor(0.3367) `torch.mean(input, dim, keepdim=False, *, out=None) → Tensor` Returns the mean value of each row of the `input` tensor in the given dimension `dim`. If `dim` is a list of dimensions, reduce over all of them. If `keepdim` is `True`, the output tensor is of the same size as `input` except in the dimension(s) `dim` where it is of size 1. Otherwise, `dim` is squeezed (see [`torch.squeeze()`](torch.squeeze#torch.squeeze "torch.squeeze")), resulting in the output tensor having 1 (or `len(dim)`) fewer dimension(s). Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. 
* **dim** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _or_ _tuple of python:ints_) – the dimension or dimensions to reduce. * **keepdim** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – whether the output tensor has `dim` retained or not. Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. Example: >>> a = torch.randn(4, 4) >>> a tensor([[-0.3841, 0.6320, 0.4254, -0.7384], [-0.9644, 1.0131, -0.6549, -1.4279], [-0.2951, -1.3350, -0.7694, 0.5600], [ 1.0842, -0.9580, 0.3623, 0.2343]]) >>> torch.mean(a, 1) tensor([-0.0163, -0.5085, -0.4599, 0.1807]) >>> torch.mean(a, 1, True) tensor([[-0.0163], [-0.5085], [-0.4599], [ 0.1807]]) # torch.median `torch.median(input) → Tensor` Returns the median of the values in `input`. Note The median is not unique for `input` tensors with an even number of elements. In this case the lower of the two medians is returned. To compute the mean of both medians, use [`torch.quantile()`](torch.quantile#torch.quantile "torch.quantile") with `q=0.5` instead. Warning This function produces deterministic (sub)gradients unlike `median(dim=0)` Parameters **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. Example: >>> a = torch.randn(1, 3) >>> a tensor([[ 1.5219, -1.5212, 0.2202]]) >>> torch.median(a) tensor(0.2202) `torch.median(input, dim=-1, keepdim=False, *, out=None) -> (Tensor, LongTensor)` Returns a namedtuple `(values, indices)` where `values` contains the median of each row of `input` in the dimension `dim`, and `indices` contains the index of the median values found in the dimension `dim`. By default, `dim` is the last dimension of the `input` tensor. If `keepdim` is `True`, the output tensors are of the same size as `input` except in the dimension `dim` where they are of size 1. Otherwise, `dim` is squeezed (see [`torch.squeeze()`](torch.squeeze#torch.squeeze "torch.squeeze")), resulting in the outputs tensor having 1 fewer dimension than `input`. Note The median is not unique for `input` tensors with an even number of elements in the dimension `dim`. In this case the lower of the two medians is returned. To compute the mean of both medians in `input`, use [`torch.quantile()`](torch.quantile#torch.quantile "torch.quantile") with `q=0.5` instead. Warning `indices` does not necessarily contain the first occurrence of each median value found, unless it is unique. The exact implementation details are device- specific. Do not expect the same result when run on CPU and GPU in general. For the same reason do not expect the gradients to be deterministic. Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. * **dim** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – the dimension to reduce. * **keepdim** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – whether the output tensor has `dim` retained or not. Keyword Arguments **out** (_(_[Tensor](../tensors#torch.Tensor "torch.Tensor") _,_[Tensor](../tensors#torch.Tensor "torch.Tensor") _)__,__optional_) – The first tensor will be populated with the median values and the second tensor, which must have dtype long, with their indices in the dimension `dim` of `input`. 
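Note To make the "lower of the two medians" behavior above concrete, a brief sketch with an even number of elements (values chosen for illustration):

>>> import torch
>>> t = torch.tensor([1., 2., 3., 4.])
>>> torch.median(t)         # lower of the two middle values
tensor(2.)
>>> torch.quantile(t, 0.5)  # mean of the two middle values
tensor(2.5000)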
Example: >>> a = torch.randn(4, 5) >>> a tensor([[ 0.2505, -0.3982, -0.9948, 0.3518, -1.3131], [ 0.3180, -0.6993, 1.0436, 0.0438, 0.2270], [-0.2751, 0.7303, 0.2192, 0.3321, 0.2488], [ 1.0778, -1.9510, 0.7048, 0.4742, -0.7125]]) >>> torch.median(a, 1) torch.return_types.median(values=tensor([-0.3982, 0.2270, 0.2488, 0.4742]), indices=tensor([1, 4, 4, 3])) # torch.meshgrid `torch.meshgrid(*tensors)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/functional.html#meshgrid) Take NN tensors, each of which can be either scalar or 1-dimensional vector, and create NN N-dimensional grids, where the ii th grid is defined by expanding the ii th input over dimensions defined by other inputs. Parameters **tensors** (_list of Tensor_) – list of scalars or 1 dimensional tensors. Scalars will be treated as tensors of size (1,)(1,) automatically Returns If the input has kk tensors of size (N1,),(N2,),…,(Nk,)(N_1,), (N_2,), \ldots , (N_k,) , then the output would also have kk tensors, where all tensors are of size (N1,N2,…,Nk)(N_1, N_2, \ldots , N_k) . Return type seq (sequence of Tensors) Example: >>> x = torch.tensor([1, 2, 3]) >>> y = torch.tensor([4, 5, 6]) >>> grid_x, grid_y = torch.meshgrid(x, y) >>> grid_x tensor([[1, 1, 1], [2, 2, 2], [3, 3, 3]]) >>> grid_y tensor([[4, 5, 6], [4, 5, 6], [4, 5, 6]]) # torch.min `torch.min(input) → Tensor` Returns the minimum value of all elements in the `input` tensor. Warning This function produces deterministic (sub)gradients unlike `min(dim=0)` Parameters **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. Example: >>> a = torch.randn(1, 3) >>> a tensor([[ 0.6750, 1.0857, 1.7197]]) >>> torch.min(a) tensor(0.6750) `torch.min(input, dim, keepdim=False, *, out=None) -> (Tensor, LongTensor)` Returns a namedtuple `(values, indices)` where `values` is the minimum value of each row of the `input` tensor in the given dimension `dim`. And `indices` is the index location of each minimum value found (argmin). If `keepdim` is `True`, the output tensors are of the same size as `input` except in the dimension `dim` where they are of size 1. Otherwise, `dim` is squeezed (see [`torch.squeeze()`](torch.squeeze#torch.squeeze "torch.squeeze")), resulting in the output tensors having 1 fewer dimension than `input`. Note If there are multiple minimal values in a reduced row then the indices of the first minimal value are returned. Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. * **dim** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – the dimension to reduce. * **keepdim** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – whether the output tensor has `dim` retained or not. Keyword Arguments **out** ([tuple](https://docs.python.org/3/library/stdtypes.html#tuple "\(in Python v3.9\)") _,__optional_) – the tuple of two output tensors (min, min_indices) Example: >>> a = torch.randn(4, 4) >>> a tensor([[-0.6248, 1.1334, -1.1899, -0.2803], [-1.4644, -0.2635, -0.3651, 0.6134], [ 0.2457, 0.0384, 1.0128, 0.7015], [-0.1153, 2.9849, 2.1458, 0.5788]]) >>> torch.min(a, 1) torch.return_types.min(values=tensor([-1.1899, -1.4644, 0.0384, -0.1153]), indices=tensor([2, 0, 1, 0])) `torch.min(input, other, *, out=None) → Tensor` See [`torch.minimum()`](torch.minimum#torch.minimum "torch.minimum"). # torch.minimum `torch.minimum(input, other, *, out=None) → Tensor` Computes the element-wise minimum of `input` and `other`. 
Note If one of the elements being compared is a NaN, then that element is returned. `minimum()` is not supported for tensors with complex dtypes. Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. * **other** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the second input tensor Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. Example: >>> a = torch.tensor((1, 2, -1)) >>> b = torch.tensor((3, 0, 4)) >>> torch.minimum(a, b) tensor([1, 0, -1]) # torch.mm `torch.mm(input, mat2, *, out=None) → Tensor` Performs a matrix multiplication of the matrices `input` and `mat2`. If `input` is a (n×m)(n \times m) tensor, `mat2` is a (m×p)(m \times p) tensor, `out` will be a (n×p)(n \times p) tensor. Note This function does not [broadcast](https://pytorch.org/docs/1.8.0/notes/broadcasting.html#broadcasting- semantics). For broadcasting matrix products, see [`torch.matmul()`](torch.matmul#torch.matmul "torch.matmul"). Supports strided and sparse 2-D tensors as inputs, autograd with respect to strided inputs. This operator supports [TensorFloat32](https://pytorch.org/docs/1.8.0/notes/cuda.html#tf32-on- ampere). Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the first matrix to be matrix multiplied * **mat2** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the second matrix to be matrix multiplied Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. Example: >>> mat1 = torch.randn(2, 3) >>> mat2 = torch.randn(3, 3) >>> torch.mm(mat1, mat2) tensor([[ 0.4851, 0.5037, -0.3633], [-0.0760, -3.6705, 2.4784]]) # torch.mode `torch.mode(input, dim=-1, keepdim=False, *, out=None) -> (Tensor, LongTensor)` Returns a namedtuple `(values, indices)` where `values` is the mode value of each row of the `input` tensor in the given dimension `dim`, i.e. a value which appears most often in that row, and `indices` is the index location of each mode value found. By default, `dim` is the last dimension of the `input` tensor. If `keepdim` is `True`, the output tensors are of the same size as `input` except in the dimension `dim` where they are of size 1. Otherwise, `dim` is squeezed (see [`torch.squeeze()`](torch.squeeze#torch.squeeze "torch.squeeze")), resulting in the output tensors having 1 fewer dimension than `input`. Note This function is not defined for `torch.cuda.Tensor` yet. Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. * **dim** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – the dimension to reduce. * **keepdim** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – whether the output tensor has `dim` retained or not. Keyword Arguments **out** ([tuple](https://docs.python.org/3/library/stdtypes.html#tuple "\(in Python v3.9\)") _,__optional_) – the result tuple of two output tensors (values, indices) Example: >>> a = torch.randint(10, (5,)) >>> a tensor([6, 5, 1, 0, 2]) >>> b = a + (torch.randn(50, 1) * 5).long() >>> torch.mode(b, 0) torch.return_types.mode(values=tensor([6, 5, 1, 0, 2]), indices=tensor([2, 2, 2, 2, 2])) # torch.moveaxis `torch.moveaxis(input, source, destination) → Tensor` Alias for [`torch.movedim()`](torch.movedim#torch.movedim "torch.movedim"). This function is equivalent to NumPy’s moveaxis function. 
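Note For a tensor with a known number of dimensions, moving an axis is equivalent to an explicit permutation of the dimensions; an illustrative sketch for the 3-D case (not an official equivalence claim beyond this shape):

>>> import torch
>>> t = torch.randn(3, 2, 1)
>>> torch.equal(torch.moveaxis(t, 1, 0), t.permute(1, 0, 2))
True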
Examples: >>> t = torch.randn(3,2,1) >>> t tensor([[[-0.3362], [-0.8437]], [[-0.9627], [ 0.1727]], [[ 0.5173], [-0.1398]]]) >>> torch.moveaxis(t, 1, 0).shape torch.Size([2, 3, 1]) >>> torch.moveaxis(t, 1, 0) tensor([[[-0.3362], [-0.9627], [ 0.5173]], [[-0.8437], [ 0.1727], [-0.1398]]]) >>> torch.moveaxis(t, (1, 2), (0, 1)).shape torch.Size([2, 1, 3]) >>> torch.moveaxis(t, (1, 2), (0, 1)) tensor([[[-0.3362, -0.9627, 0.5173]], [[-0.8437, 0.1727, -0.1398]]]) # torch.movedim `torch.movedim(input, source, destination) → Tensor` Moves the dimension(s) of `input` at the position(s) in `source` to the position(s) in `destination`. Other dimensions of `input` that are not explicitly moved remain in their original order and appear at the positions not specified in `destination`. Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. * **source** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _or_ _tuple of python:ints_) – Original positions of the dims to move. These must be unique. * **destination** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _or_ _tuple of python:ints_) – Destination positions for each of the original dims. These must also be unique. Examples: >>> t = torch.randn(3,2,1) >>> t tensor([[[-0.3362], [-0.8437]], [[-0.9627], [ 0.1727]], [[ 0.5173], [-0.1398]]]) >>> torch.movedim(t, 1, 0).shape torch.Size([2, 3, 1]) >>> torch.movedim(t, 1, 0) tensor([[[-0.3362], [-0.9627], [ 0.5173]], [[-0.8437], [ 0.1727], [-0.1398]]]) >>> torch.movedim(t, (1, 2), (0, 1)).shape torch.Size([2, 1, 3]) >>> torch.movedim(t, (1, 2), (0, 1)) tensor([[[-0.3362, -0.9627, 0.5173]], [[-0.8437, 0.1727, -0.1398]]]) # torch.msort `torch.msort(input, *, out=None) → Tensor` Sorts the elements of the `input` tensor along its first dimension in ascending order by value. Note `torch.msort(t)` is equivalent to `torch.sort(t, dim=0)[0]`. See also [`torch.sort()`](torch.sort#torch.sort "torch.sort"). Parameters **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. Example: >>> t = torch.randn(3, 4) >>> t tensor([[-0.1321, 0.4370, -1.2631, -1.1289], [-2.0527, -1.1250, 0.2275, 0.3077], [-0.0881, -0.1259, -0.5495, 1.0284]]) >>> torch.msort(t) tensor([[-2.0527, -1.1250, -1.2631, -1.1289], [-0.1321, -0.1259, -0.5495, 0.3077], [-0.0881, 0.4370, 0.2275, 1.0284]]) # torch.mul `torch.mul(input, other, *, out=None)` Multiplies each element of the input `input` with the scalar `other` and returns a new resulting tensor. outi=other×inputi\text{out}_i = \text{other} \times \text{input}_i If `input` is of type `FloatTensor` or `DoubleTensor`, `other` should be a real number, otherwise it should be an integer Parameters * **{input}** – * **other** (_Number_) – the number to be multiplied to each element of `input` Keyword Arguments **{out}** – Example: >>> a = torch.randn(3) >>> a tensor([ 0.2015, -0.4255, 2.6087]) >>> torch.mul(a, 100) tensor([ 20.1494, -42.5491, 260.8663]) `torch.mul(input, other, *, out=None)` Each element of the tensor `input` is multiplied by the corresponding element of the Tensor `other`. The resulting tensor is returned. The shapes of `input` and `other` must be [broadcastable](https://pytorch.org/docs/1.8.0/notes/broadcasting.html#broadcasting- semantics). 
outi=inputi×otheri\text{out}_i = \text{input}_i \times \text{other}_i Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the first multiplicand tensor * **other** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the second multiplicand tensor Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. Example: >>> a = torch.randn(4, 1) >>> a tensor([[ 1.1207], [-0.3137], [ 0.0700], [ 0.8378]]) >>> b = torch.randn(1, 4) >>> b tensor([[ 0.5146, 0.1216, -0.5244, 2.2382]]) >>> torch.mul(a, b) tensor([[ 0.5767, 0.1363, -0.5877, 2.5083], [-0.1614, -0.0382, 0.1645, -0.7021], [ 0.0360, 0.0085, -0.0367, 0.1567], [ 0.4312, 0.1019, -0.4394, 1.8753]]) # torch.multinomial `torch.multinomial(input, num_samples, replacement=False, *, generator=None, out=None) → LongTensor` Returns a tensor where each row contains `num_samples` indices sampled from the multinomial probability distribution located in the corresponding row of tensor `input`. Note The rows of `input` do not need to sum to one (in which case we use the values as weights), but must be non-negative, finite and have a non-zero sum. Indices are ordered from left to right according to when each was sampled (first samples are placed in first column). If `input` is a vector, `out` is a vector of size `num_samples`. If `input` is a matrix with `m` rows, `out` is an matrix of shape (m×num_samples)(m \times \text{num\\_samples}) . If replacement is `True`, samples are drawn with replacement. If not, they are drawn without replacement, which means that when a sample index is drawn for a row, it cannot be drawn again for that row. Note When drawn without replacement, `num_samples` must be lower than number of non-zero elements in `input` (or the min number of non-zero elements in each row of `input` if it is a matrix). Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor containing probabilities * **num_samples** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – number of samples to draw * **replacement** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – whether to draw with replacement or not Keyword Arguments * **generator** ([`torch.Generator`](torch.generator#torch.Generator "torch.Generator"), optional) – a pseudorandom number generator for sampling * **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. Example: >>> weights = torch.tensor([0, 10, 3, 0], dtype=torch.float) # create a tensor of weights >>> torch.multinomial(weights, 2) tensor([1, 2]) >>> torch.multinomial(weights, 4) # ERROR! RuntimeError: invalid argument 2: invalid multinomial distribution (with replacement=False, not enough non-negative category to sample) at ../aten/src/TH/generic/THTensorRandom.cpp:320 >>> torch.multinomial(weights, 4, replacement=True) tensor([ 2, 1, 1, 1]) # torch.multiply `torch.multiply(input, other, *, out=None)` Alias for [`torch.mul()`](torch.mul#torch.mul "torch.mul"). # torch.mv `torch.mv(input, vec, *, out=None) → Tensor` Performs a matrix-vector product of the matrix `input` and the vector `vec`. If `input` is a (n×m)(n \times m) tensor, `vec` is a 1-D tensor of size mm , `out` will be 1-D of size nn . Note This function does not [broadcast](https://pytorch.org/docs/1.8.0/notes/broadcasting.html#broadcasting- semantics). 
Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – matrix to be multiplied * **vec** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – vector to be multiplied Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. Example: >>> mat = torch.randn(2, 3) >>> vec = torch.randn(3) >>> torch.mv(mat, vec) tensor([ 1.0404, -0.6361]) # torch.mvlgamma `torch.mvlgamma(input, p) → Tensor` Computes the [multivariate log-gamma function](https://en.wikipedia.org/wiki/Multivariate_gamma_function)) with dimension pp element-wise, given by log⁡(Γp(a))=C+∑i=1plog⁡(Γ(a−i−12))\log(\Gamma_{p}(a)) = C + \displaystyle \sum_{i=1}^{p} \log\left(\Gamma\left(a - \frac{i - 1}{2}\right)\right) where C=log⁡(π)×p(p−1)4C = \log(\pi) \times \frac{p (p - 1)}{4} and Γ(⋅)\Gamma(\cdot) is the Gamma function. All elements must be greater than p−12\frac{p - 1}{2} , otherwise an error would be thrown. Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the tensor to compute the multivariate log-gamma function * **p** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – the number of dimensions Example: >>> a = torch.empty(2, 3).uniform_(1, 2) >>> a tensor([[1.6835, 1.8474, 1.1929], [1.0475, 1.7162, 1.4180]]) >>> torch.mvlgamma(a, 2) tensor([[0.3928, 0.4007, 0.7586], [1.0311, 0.3901, 0.5049]]) # torch.nan_to_num `torch.nan_to_num(input, nan=0.0, posinf=None, neginf=None, *, out=None) → Tensor` Replaces `NaN`, positive infinity, and negative infinity values in `input` with the values specified by `nan`, `posinf`, and `neginf`, respectively. By default, `NaN`s are replaced with zero, positive infinity is replaced with the greatest finite value representable by :attr:`input`’s dtype, and negative infinity is replaced with the least finite value representable by `input`’s dtype. Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. * **nan** (_Number_ _,__optional_) – the value to replace `NaN`s with. Default is zero. * **posinf** (_Number_ _,__optional_) – if a Number, the value to replace positive infinity values with. If None, positive infinity values are replaced with the greatest finite value representable by `input`’s dtype. Default is None. * **neginf** (_Number_ _,__optional_) – if a Number, the value to replace negative infinity values with. If None, negative infinity values are replaced with the lowest finite value representable by `input`’s dtype. Default is None. Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. Example: >>> x = torch.tensor([float('nan'), float('inf'), -float('inf'), 3.14]) >>> torch.nan_to_num(x) tensor([ 0.0000e+00, 3.4028e+38, -3.4028e+38, 3.1400e+00]) >>> torch.nan_to_num(x, nan=2.0) tensor([ 2.0000e+00, 3.4028e+38, -3.4028e+38, 3.1400e+00]) >>> torch.nan_to_num(x, nan=2.0, posinf=1.0) tensor([ 2.0000e+00, 1.0000e+00, -3.4028e+38, 3.1400e+00]) # torch.nanmedian `torch.nanmedian(input) → Tensor` Returns the median of the values in `input`, ignoring `NaN` values. This function is identical to [`torch.median()`](torch.median#torch.median "torch.median") when there are no `NaN` values in `input`. When `input` has one or more `NaN` values, [`torch.median()`](torch.median#torch.median "torch.median") will always return `NaN`, while this function will return the median of the non-`NaN` elements in `input`. 
If all the elements in `input` are `NaN` it will also return `NaN`. Parameters **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. Example: >>> a = torch.tensor([1, float('nan'), 3, 2]) >>> a.median() tensor(nan) >>> a.nanmedian() tensor(2.) `torch.nanmedian(input, dim=-1, keepdim=False, *, out=None) -> (Tensor, LongTensor)` Returns a namedtuple `(values, indices)` where `values` contains the median of each row of `input` in the dimension `dim`, ignoring `NaN` values, and `indices` contains the index of the median values found in the dimension `dim`. This function is identical to [`torch.median()`](torch.median#torch.median "torch.median") when there are no `NaN` values in a reduced row. When a reduced row has one or more `NaN` values, [`torch.median()`](torch.median#torch.median "torch.median") will always reduce it to `NaN`, while this function will reduce it to the median of the non-`NaN` elements. If all the elements in a reduced row are `NaN` then it will be reduced to `NaN`, too. Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. * **dim** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – the dimension to reduce. * **keepdim** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – whether the output tensor has `dim` retained or not. Keyword Arguments **out** (_(_[Tensor](../tensors#torch.Tensor "torch.Tensor") _,_[Tensor](../tensors#torch.Tensor "torch.Tensor") _)__,__optional_) – The first tensor will be populated with the median values and the second tensor, which must have dtype long, with their indices in the dimension `dim` of `input`. Example: >>> a = torch.tensor([[2, 3, 1], [float('nan'), 1, float('nan')]]) >>> a tensor([[2., 3., 1.], [nan, 1., nan]]) >>> a.median(0) torch.return_types.median(values=tensor([nan, 1., nan]), indices=tensor([1, 1, 1])) >>> a.nanmedian(0) torch.return_types.nanmedian(values=tensor([2., 1., 1.]), indices=tensor([0, 1, 0])) # torch.nanquantile `torch.nanquantile(input, q, dim=None, keepdim=False, *, out=None) → Tensor` This is a variant of [`torch.quantile()`](torch.quantile#torch.quantile "torch.quantile") that “ignores” `NaN` values, computing the quantiles `q` as if `NaN` values in `input` did not exist. If all values in a reduced row are `NaN` then the quantiles for that reduction will be `NaN`. See the documentation for [`torch.quantile()`](torch.quantile#torch.quantile "torch.quantile"). Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. * **q** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _or_[Tensor](../tensors#torch.Tensor "torch.Tensor")) – a scalar or 1D tensor of quantile values in the range [0, 1] * **dim** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – the dimension to reduce. * **keepdim** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – whether the output tensor has `dim` retained or not. Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. 
Example: >>> t = torch.tensor([float('nan'), 1, 2]) >>> t.quantile(0.5) tensor(nan) >>> t.nanquantile(0.5) tensor(1.5000) >>> t = torch.tensor([[float('nan'), float('nan')], [1, 2]]) >>> t tensor([[nan, nan], [1., 2.]]) >>> t.nanquantile(0.5, dim=0) tensor([1., 2.]) >>> t.nanquantile(0.5, dim=1) tensor([ nan, 1.5000]) # torch.nansum `torch.nansum(input, *, dtype=None) → Tensor` Returns the sum of all elements, treating Not a Numbers (NaNs) as zero. Parameters **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. Keyword Arguments **dtype** ([`torch.dtype`](../tensor_attributes#torch.torch.dtype "torch.torch.dtype"), optional) – the desired data type of returned tensor. If specified, the input tensor is casted to `dtype` before the operation is performed. This is useful for preventing data type overflows. Default: None. Example: >>> a = torch.tensor([1., 2., float('nan'), 4.]) >>> torch.nansum(a) tensor(7.) `torch.nansum(input, dim, keepdim=False, *, dtype=None) → Tensor` Returns the sum of each row of the `input` tensor in the given dimension `dim`, treating Not a Numbers (NaNs) as zero. If `dim` is a list of dimensions, reduce over all of them. If `keepdim` is `True`, the output tensor is of the same size as `input` except in the dimension(s) `dim` where it is of size 1. Otherwise, `dim` is squeezed (see [`torch.squeeze()`](torch.squeeze#torch.squeeze "torch.squeeze")), resulting in the output tensor having 1 (or `len(dim)`) fewer dimension(s). Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. * **dim** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _or_ _tuple of python:ints_) – the dimension or dimensions to reduce. * **keepdim** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – whether the output tensor has `dim` retained or not. Keyword Arguments **dtype** ([`torch.dtype`](../tensor_attributes#torch.torch.dtype "torch.torch.dtype"), optional) – the desired data type of returned tensor. If specified, the input tensor is casted to `dtype` before the operation is performed. This is useful for preventing data type overflows. Default: None. Example: >>> torch.nansum(torch.tensor([1., float("nan")])) 1.0 >>> a = torch.tensor([[1, 2], [3., float("nan")]]) >>> torch.nansum(a) tensor(6.) >>> torch.nansum(a, dim=0) tensor([4., 2.]) >>> torch.nansum(a, dim=1) tensor([3., 3.]) # torch.narrow `torch.narrow(input, dim, start, length) → Tensor` Returns a new tensor that is a narrowed version of `input` tensor. The dimension `dim` is input from `start` to `start + length`. The returned tensor and `input` tensor share the same underlying storage. Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the tensor to narrow * **dim** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – the dimension along which to narrow * **start** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – the starting dimension * **length** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – the distance to the ending dimension Example: >>> x = torch.tensor([[1, 2, 3], [4, 5, 6], [7, 8, 9]]) >>> torch.narrow(x, 0, 0, 2) tensor([[ 1, 2, 3], [ 4, 5, 6]]) >>> torch.narrow(x, 1, 1, 2) tensor([[ 2, 3], [ 5, 6], [ 8, 9]]) # torch.ne `torch.ne(input, other, *, out=None) → Tensor` Computes input≠other\text{input} \neq \text{other} element-wise. 
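A one-line illustration with a scalar `other` (not from the original examples):

>>> import torch
>>> torch.ne(torch.tensor([1, 2, 3]), 2)
tensor([ True, False,  True])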
The second argument can be a number or a tensor whose shape is [broadcastable](https://pytorch.org/docs/1.8.0/notes/broadcasting.html#broadcasting- semantics) with the first argument. Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the tensor to compare * **other** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _or_[float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)")) – the tensor or value to compare Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. Returns A boolean tensor that is True where `input` is not equal to `other` and False elsewhere Example: >>> torch.ne(torch.tensor([[1, 2], [3, 4]]), torch.tensor([[1, 1], [4, 4]])) tensor([[False, True], [True, False]]) # torch.neg `torch.neg(input, *, out=None) → Tensor` Returns a new tensor with the negative of the elements of `input`. out=−1×input\text{out} = -1 \times \text{input} Parameters **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. Example: >>> a = torch.randn(5) >>> a tensor([ 0.0090, -0.2262, -0.0682, -0.2866, 0.3940]) >>> torch.neg(a) tensor([-0.0090, 0.2262, 0.0682, 0.2866, -0.3940]) # torch.negative `torch.negative(input, *, out=None) → Tensor` Alias for [`torch.neg()`](torch.neg#torch.neg "torch.neg") # torch.nextafter `torch.nextafter(input, other, *, out=None) → Tensor` Return the next floating-point value after `input` towards `other`, elementwise. The shapes of `input` and `other` must be [broadcastable](https://pytorch.org/docs/1.8.0/notes/broadcasting.html#broadcasting- semantics). Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the first input tensor * **other** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the second input tensor Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. Example:: >>> eps = torch.finfo(torch.float32).eps >>> torch.nextafter(torch.Tensor([1, 2]), torch.Tensor([2, 1])) == torch.Tensor([eps + 1, 2 - eps]) tensor([True, True]) # AdaptiveAvgPool1d `class torch.nn.AdaptiveAvgPool1d(output_size)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/pooling.html#AdaptiveAvgPool1d) Applies a 1D adaptive average pooling over an input signal composed of several input planes. The output size is H, for any input size. The number of output features is equal to the number of input planes. Parameters **output_size** – the target output size H #### Examples >>> # target output size of 5 >>> m = nn.AdaptiveAvgPool1d(5) >>> input = torch.randn(1, 64, 8) >>> output = m(input) # AdaptiveAvgPool2d `class torch.nn.AdaptiveAvgPool2d(output_size)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/pooling.html#AdaptiveAvgPool2d) Applies a 2D adaptive average pooling over an input signal composed of several input planes. The output is of size H x W, for any input size. The number of output features is equal to the number of input planes. Parameters **output_size** – the target output size of the image of the form H x W. Can be a tuple (H, W) or a single H for a square image H x H. H and W can be either a `int`, or `None` which means the size will be the same as that of the input. 
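As an informal cross-check (a sketch, not from the original docs): when each input dimension is an integer multiple of the corresponding target output dimension, adaptive average pooling coincides with a fixed `AvgPool2d` whose kernel size and stride equal that ratio:

>>> import torch
>>> import torch.nn as nn
>>> x = torch.randn(1, 3, 8, 12)
>>> adaptive = nn.AdaptiveAvgPool2d((4, 6))
>>> fixed = nn.AvgPool2d(kernel_size=(2, 2), stride=(2, 2))  # 8/4 == 12/6 == 2
>>> torch.allclose(adaptive(x), fixed(x))
True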
#### Examples >>> # target output size of 5x7 >>> m = nn.AdaptiveAvgPool2d((5,7)) >>> input = torch.randn(1, 64, 8, 9) >>> output = m(input) >>> # target output size of 7x7 (square) >>> m = nn.AdaptiveAvgPool2d(7) >>> input = torch.randn(1, 64, 10, 9) >>> output = m(input) >>> # target output size of 10x7 >>> m = nn.AdaptiveAvgPool2d((None, 7)) >>> input = torch.randn(1, 64, 10, 9) >>> output = m(input) # AdaptiveAvgPool3d `class torch.nn.AdaptiveAvgPool3d(output_size)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/pooling.html#AdaptiveAvgPool3d) Applies a 3D adaptive average pooling over an input signal composed of several input planes. The output is of size D x H x W, for any input size. The number of output features is equal to the number of input planes. Parameters **output_size** – the target output size of the form D x H x W. Can be a tuple (D, H, W) or a single number D for a cube D x D x D. D, H and W can be either a `int`, or `None` which means the size will be the same as that of the input. #### Examples >>> # target output size of 5x7x9 >>> m = nn.AdaptiveAvgPool3d((5,7,9)) >>> input = torch.randn(1, 64, 8, 9, 10) >>> output = m(input) >>> # target output size of 7x7x7 (cube) >>> m = nn.AdaptiveAvgPool3d(7) >>> input = torch.randn(1, 64, 10, 9, 8) >>> output = m(input) >>> # target output size of 7x9x8 >>> m = nn.AdaptiveAvgPool3d((7, None, None)) >>> input = torch.randn(1, 64, 10, 9, 8) >>> output = m(input) # AdaptiveLogSoftmaxWithLoss `class torch.nn.AdaptiveLogSoftmaxWithLoss(in_features, n_classes, cutoffs, div_value=4.0, head_bias=False)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/adaptive.html#AdaptiveLogSoftmaxWithLoss) Efficient softmax approximation as described in [Efficient softmax approximation for GPUs by Edouard Grave, Armand Joulin, Moustapha Cissé, David Grangier, and Hervé Jégou](https://arxiv.org/abs/1609.04309). Adaptive softmax is an approximate strategy for training models with large output spaces. It is most effective when the label distribution is highly imbalanced, for example in natural language modelling, where the word frequency distribution approximately follows the [Zipf’s law](https://en.wikipedia.org/wiki/Zipf%27s_law). Adaptive softmax partitions the labels into several clusters, according to their frequency. These clusters may contain different number of targets each. Additionally, clusters containing less frequent labels assign lower dimensional embeddings to those labels, which speeds up the computation. For each minibatch, only clusters for which at least one target is present are evaluated. The idea is that the clusters which are accessed frequently (like the first one, containing most frequent labels), should also be cheap to compute – that is, contain a small number of assigned labels. We highly recommend taking a look at the original paper for more details. * `cutoffs` should be an ordered Sequence of integers sorted in the increasing order. It controls number of clusters and the partitioning of targets into clusters. For example setting `cutoffs = [10, 100, 1000]` means that first `10` targets will be assigned to the ‘head’ of the adaptive softmax, targets `11, 12, …, 100` will be assigned to the first cluster, and targets `101, 102, …, 1000` will be assigned to the second cluster, while targets `1001, 1002, …, n_classes - 1` will be assigned to the last, third cluster. 
* `div_value` is used to compute the size of each additional cluster, which is given as ⌊in_featuresdiv_valueidx⌋\left\lfloor\frac{\texttt{in\\_features}}{\texttt{div\\_value}^{idx}}\right\rfloor , where idxidx is the cluster index (with clusters for less frequent words having larger indices, and indices starting from 11 ). * `head_bias` if set to True, adds a bias term to the ‘head’ of the adaptive softmax. See paper for details. Set to False in the official implementation. Warning Labels passed as inputs to this module should be sorted according to their frequency. This means that the most frequent label should be represented by the index `0`, and the least frequent label should be represented by the index `n_classes - 1`. Note This module returns a `NamedTuple` with `output` and `loss` fields. See further documentation for details. Note To compute log-probabilities for all classes, the `log_prob` method can be used. Parameters * **in_features** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – Number of features in the input tensor * **n_classes** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – Number of classes in the dataset * **cutoffs** (_Sequence_) – Cutoffs used to assign targets to their buckets * **div_value** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _,__optional_) – value used as an exponent to compute sizes of the clusters. Default: 4.0 * **head_bias** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – If `True`, adds a bias term to the ‘head’ of the adaptive softmax. Default: `False` Returns * **output** is a Tensor of size `N` containing computed target log probabilities for each example * **loss** is a Scalar representing the computed negative log likelihood loss Return type `NamedTuple` with `output` and `loss` fields Shape: * input: (N,in_features)(N, \texttt{in\\_features}) * target: (N)(N) where each value satisfies 0<=target[i]<=n_classes0 <= \texttt{target[i]} <= \texttt{n\\_classes} * output1: (N)(N) * output2: `Scalar` `log_prob(input)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/adaptive.html#AdaptiveLogSoftmaxWithLoss.log_prob) Computes log probabilities for all n_classes\texttt{n\\_classes} Parameters **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – a minibatch of examples Returns log-probabilities for each class cc in range 0<=c<=n_classes0 <= c <= \texttt{n\\_classes} , where n_classes\texttt{n\\_classes} is a parameter passed to `AdaptiveLogSoftmaxWithLoss` constructor. Shape: * Input: (N,in_features)(N, \texttt{in\\_features}) * Output: (N,n_classes)(N, \texttt{n\\_classes}) `predict(input)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/adaptive.html#AdaptiveLogSoftmaxWithLoss.predict) This is equivalent to `self.log_prob(input).argmax(dim=1)`, but is more efficient in some cases.
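A quick, informal check of this equivalence (the layer sizes below are arbitrary and chosen only for illustration):

>>> import torch
>>> import torch.nn as nn
>>> asm = nn.AdaptiveLogSoftmaxWithLoss(in_features=64, n_classes=1000, cutoffs=[10, 100, 500])
>>> hidden = torch.randn(128, 64)
>>> out, loss = asm(hidden, torch.randint(0, 1000, (128,)))  # forward returns a NamedTuple (output, loss)
>>> torch.equal(asm.predict(hidden), asm.log_prob(hidden).argmax(dim=1))
True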
Parameters **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – a minibatch of examples Returns a class with the highest probability for each example Return type output ([Tensor](../tensors#torch.Tensor "torch.Tensor")) Shape: * Input: (N,in_features)(N, \texttt{in\\_features}) * Output: (N)(N) # AdaptiveMaxPool1d `class torch.nn.AdaptiveMaxPool1d(output_size, return_indices=False)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/pooling.html#AdaptiveMaxPool1d) Applies a 1D adaptive max pooling over an input signal composed of several input planes. The output size is H, for any input size. The number of output features is equal to the number of input planes. Parameters * **output_size** – the target output size H * **return_indices** – if `True`, will return the indices along with the outputs. Useful to pass to nn.MaxUnpool1d. Default: `False` #### Examples >>> # target output size of 5 >>> m = nn.AdaptiveMaxPool1d(5) >>> input = torch.randn(1, 64, 8) >>> output = m(input) # AdaptiveMaxPool2d `class torch.nn.AdaptiveMaxPool2d(output_size, return_indices=False)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/pooling.html#AdaptiveMaxPool2d) Applies a 2D adaptive max pooling over an input signal composed of several input planes. The output is of size H x W, for any input size. The number of output features is equal to the number of input planes. Parameters * **output_size** – the target output size of the image of the form H x W. Can be a tuple (H, W) or a single H for a square image H x H. H and W can be either a `int`, or `None` which means the size will be the same as that of the input. * **return_indices** – if `True`, will return the indices along with the outputs. Useful to pass to nn.MaxUnpool2d. Default: `False` #### Examples >>> # target output size of 5x7 >>> m = nn.AdaptiveMaxPool2d((5,7)) >>> input = torch.randn(1, 64, 8, 9) >>> output = m(input) >>> # target output size of 7x7 (square) >>> m = nn.AdaptiveMaxPool2d(7) >>> input = torch.randn(1, 64, 10, 9) >>> output = m(input) >>> # target output size of 10x7 >>> m = nn.AdaptiveMaxPool2d((None, 7)) >>> input = torch.randn(1, 64, 10, 9) >>> output = m(input) # AdaptiveMaxPool3d `class torch.nn.AdaptiveMaxPool3d(output_size, return_indices=False)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/pooling.html#AdaptiveMaxPool3d) Applies a 3D adaptive max pooling over an input signal composed of several input planes. The output is of size D x H x W, for any input size. The number of output features is equal to the number of input planes. Parameters * **output_size** – the target output size of the image of the form D x H x W. Can be a tuple (D, H, W) or a single D for a cube D x D x D. D, H and W can be either a `int`, or `None` which means the size will be the same as that of the input. * **return_indices** – if `True`, will return the indices along with the outputs. Useful to pass to nn.MaxUnpool3d. 
Default: `False` #### Examples >>> # target output size of 5x7x9 >>> m = nn.AdaptiveMaxPool3d((5,7,9)) >>> input = torch.randn(1, 64, 8, 9, 10) >>> output = m(input) >>> # target output size of 7x7x7 (cube) >>> m = nn.AdaptiveMaxPool3d(7) >>> input = torch.randn(1, 64, 10, 9, 8) >>> output = m(input) >>> # target output size of 7x9x8 >>> m = nn.AdaptiveMaxPool3d((7, None, None)) >>> input = torch.randn(1, 64, 10, 9, 8) >>> output = m(input) # AlphaDropout `class torch.nn.AlphaDropout(p=0.5, inplace=False)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/dropout.html#AlphaDropout) Applies Alpha Dropout over the input. Alpha Dropout is a type of Dropout that maintains the self-normalizing property. For an input with zero mean and unit standard deviation, the output of Alpha Dropout maintains the original mean and standard deviation of the input. Alpha Dropout goes hand-in-hand with SELU activation function, which ensures that the outputs have zero mean and unit standard deviation. During training, it randomly masks some of the elements of the input tensor with probability _p_ using samples from a bernoulli distribution. The elements to masked are randomized on every forward call, and scaled and shifted to maintain zero mean and unit standard deviation. During evaluation the module simply computes an identity function. More details can be found in the paper [Self-Normalizing Neural Networks](https://arxiv.org/abs/1706.02515) . Parameters * **p** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)")) – probability of an element to be dropped. Default: 0.5 * **inplace** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – If set to `True`, will do this operation in-place Shape: * Input: (∗)(*) . Input can be of any shape * Output: (∗)(*) . Output is of the same shape as input Examples: >>> m = nn.AlphaDropout(p=0.2) >>> input = torch.randn(20, 16) >>> output = m(input) # AvgPool1d `class torch.nn.AvgPool1d(kernel_size, stride=None, padding=0, ceil_mode=False, count_include_pad=True)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/pooling.html#AvgPool1d) Applies a 1D average pooling over an input signal composed of several input planes. In the simplest case, the output value of the layer with input size (N,C,L)(N, C, L) , output (N,C,Lout)(N, C, L_{out}) and `kernel_size` kk can be precisely described as: out(Ni,Cj,l)=1k∑m=0k−1input(Ni,Cj,stride×l+m)\text{out}(N_i, C_j, l) = \frac{1}{k} \sum_{m=0}^{k-1} \text{input}(N_i, C_j, \text{stride} \times l + m) If `padding` is non-zero, then the input is implicitly zero-padded on both sides for `padding` number of points. Note When ceil_mode=True, sliding windows are allowed to go off-bounds if they start within the left padding or the input. Sliding windows that would start in the right padded region are ignored. The parameters `kernel_size`, `stride`, `padding` can each be an `int` or a one-element tuple. Parameters * **kernel_size** – the size of the window * **stride** – the stride of the window. 
Default value is `kernel_size` * **padding** – implicit zero padding to be added on both sides * **ceil_mode** – when True, will use `ceil` instead of `floor` to compute the output shape * **count_include_pad** – when True, will include the zero-padding in the averaging calculation Shape: * Input: (N,C,Lin)(N, C, L_{in}) * Output: (N,C,Lout)(N, C, L_{out}) , where Lout=⌊Lin+2×padding−kernel_sizestride+1⌋L_{out} = \left\lfloor \frac{L_{in} + 2 \times \text{padding} - \text{kernel\\_size}}{\text{stride}} + 1\right\rfloor Examples: >>> # pool with window of size=3, stride=2 >>> m = nn.AvgPool1d(3, stride=2) >>> m(torch.tensor([[[1.,2,3,4,5,6,7]]])) tensor([[[ 2., 4., 6.]]]) # AvgPool2d `class torch.nn.AvgPool2d(kernel_size, stride=None, padding=0, ceil_mode=False, count_include_pad=True, divisor_override=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/pooling.html#AvgPool2d) Applies a 2D average pooling over an input signal composed of several input planes. In the simplest case, the output value of the layer with input size (N,C,H,W)(N, C, H, W) , output (N,C,Hout,Wout)(N, C, H_{out}, W_{out}) and `kernel_size` (kH,kW)(kH, kW) can be precisely described as: out(Ni,Cj,h,w)=1kH∗kW∑m=0kH−1∑n=0kW−1input(Ni,Cj,stride[0]×h+m,stride[1]×w+n)out(N_i, C_j, h, w) = \frac{1}{kH * kW} \sum_{m=0}^{kH-1} \sum_{n=0}^{kW-1} input(N_i, C_j, stride[0] \times h + m, stride[1] \times w + n) If `padding` is non-zero, then the input is implicitly zero-padded on both sides for `padding` number of points. Note When ceil_mode=True, sliding windows are allowed to go off-bounds if they start within the left padding or the input. Sliding windows that would start in the right padded region are ignored. The parameters `kernel_size`, `stride`, `padding` can either be: * a single `int` – in which case the same value is used for the height and width dimension * a `tuple` of two ints – in which case, the first `int` is used for the height dimension, and the second `int` for the width dimension Parameters * **kernel_size** – the size of the window * **stride** – the stride of the window. Default value is `kernel_size` * **padding** – implicit zero padding to be added on both sides * **ceil_mode** – when True, will use `ceil` instead of `floor` to compute the output shape * **count_include_pad** – when True, will include the zero-padding in the averaging calculation * **divisor_override** – if specified, it will be used as divisor, otherwise `kernel_size` will be used Shape: * Input: (N,C,Hin,Win)(N, C, H_{in}, W_{in}) * Output: (N,C,Hout,Wout)(N, C, H_{out}, W_{out}) , where Hout=⌊Hin+2×padding[0]−kernel_size[0]stride[0]+1⌋H_{out} = \left\lfloor\frac{H_{in} + 2 \times \text{padding}[0] - \text{kernel\\_size}[0]}{\text{stride}[0]} + 1\right\rfloor Wout=⌊Win+2×padding[1]−kernel_size[1]stride[1]+1⌋W_{out} = \left\lfloor\frac{W_{in} + 2 \times \text{padding}[1] - \text{kernel\\_size}[1]}{\text{stride}[1]} + 1\right\rfloor Examples: >>> # pool of square window of size=3, stride=2 >>> m = nn.AvgPool2d(3, stride=2) >>> # pool of non-square window >>> m = nn.AvgPool2d((3, 2), stride=(2, 1)) >>> input = torch.randn(20, 16, 50, 32) >>> output = m(input) # AvgPool3d `class torch.nn.AvgPool3d(kernel_size, stride=None, padding=0, ceil_mode=False, count_include_pad=True, divisor_override=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/pooling.html#AvgPool3d) Applies a 3D average pooling over an input signal composed of several input planes. 
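Both the 2D and 3D average-pooling layers accept `count_include_pad` and `divisor_override`; their effect on the divisor can be seen on a tiny all-ones input (an illustrative sketch using `AvgPool2d`, not from the original examples). With `padding=1`, the top-left window below covers three padded zeros and a single one:

>>> import torch
>>> import torch.nn as nn
>>> x = torch.ones(1, 1, 2, 2)
>>> nn.AvgPool2d(2, stride=2, padding=1, count_include_pad=True)(x)[0, 0, 0, 0]   # sum 1 / 4 window cells
tensor(0.2500)
>>> nn.AvgPool2d(2, stride=2, padding=1, count_include_pad=False)(x)[0, 0, 0, 0]  # sum 1 / 1 non-padded cell
tensor(1.)
>>> nn.AvgPool2d(2, stride=2, padding=1, divisor_override=2)(x)[0, 0, 0, 0]       # sum 1 / fixed divisor 2
tensor(0.5000)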
In the simplest case, the output value of the layer with input size (N,C,D,H,W)(N, C, D, H, W) , output (N,C,Dout,Hout,Wout)(N, C, D_{out}, H_{out}, W_{out}) and `kernel_size` (kD,kH,kW)(kD, kH, kW) can be precisely described as: out(Ni,Cj,d,h,w)=∑k=0kD−1∑m=0kH−1∑n=0kW−1input(Ni,Cj,stride[0]×d+k,stride[1]×h+m,stride[2]×w+n)kD×kH×kW\begin{aligned} \text{out}(N_i, C_j, d, h, w) ={} & \sum_{k=0}^{kD-1} \sum_{m=0}^{kH-1} \sum_{n=0}^{kW-1} \\\ & \frac{\text{input}(N_i, C_j, \text{stride}[0] \times d + k, \text{stride}[1] \times h + m, \text{stride}[2] \times w + n)} {kD \times kH \times kW} \end{aligned} If `padding` is non-zero, then the input is implicitly zero-padded on all three sides for `padding` number of points. Note When ceil_mode=True, sliding windows are allowed to go off-bounds if they start within the left padding or the input. Sliding windows that would start in the right padded region are ignored. The parameters `kernel_size`, `stride` can either be: * a single `int` – in which case the same value is used for the depth, height and width dimension * a `tuple` of three ints – in which case, the first `int` is used for the depth dimension, the second `int` for the height dimension and the third `int` for the width dimension Parameters * **kernel_size** – the size of the window * **stride** – the stride of the window. Default value is `kernel_size` * **padding** – implicit zero padding to be added on all three sides * **ceil_mode** – when True, will use `ceil` instead of `floor` to compute the output shape * **count_include_pad** – when True, will include the zero-padding in the averaging calculation * **divisor_override** – if specified, it will be used as divisor, otherwise `kernel_size` will be used Shape: * Input: (N,C,Din,Hin,Win)(N, C, D_{in}, H_{in}, W_{in}) * Output: (N,C,Dout,Hout,Wout)(N, C, D_{out}, H_{out}, W_{out}) , where Dout=⌊Din+2×padding[0]−kernel_size[0]stride[0]+1⌋D_{out} = \left\lfloor\frac{D_{in} + 2 \times \text{padding}[0] - \text{kernel\\_size}[0]}{\text{stride}[0]} + 1\right\rfloor Hout=⌊Hin+2×padding[1]−kernel_size[1]stride[1]+1⌋H_{out} = \left\lfloor\frac{H_{in} + 2 \times \text{padding}[1] - \text{kernel\\_size}[1]}{\text{stride}[1]} + 1\right\rfloor Wout=⌊Win+2×padding[2]−kernel_size[2]stride[2]+1⌋W_{out} = \left\lfloor\frac{W_{in} + 2 \times \text{padding}[2] - \text{kernel\\_size}[2]}{\text{stride}[2]} + 1\right\rfloor Examples: >>> # pool of square window of size=3, stride=2 >>> m = nn.AvgPool3d(3, stride=2) >>> # pool of non-square window >>> m = nn.AvgPool3d((3, 2, 2), stride=(2, 1, 2)) >>> input = torch.randn(20, 16, 50,44, 31) >>> output = m(input) # BatchNorm1d `class torch.nn.BatchNorm1d(num_features, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/batchnorm.html#BatchNorm1d) Applies Batch Normalization over a 2D or 3D input (a mini-batch of 1D inputs with optional additional channel dimension) as described in the paper [Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift](https://arxiv.org/abs/1502.03167) . y=x−E[x]Var[x]+ϵ∗γ+βy = \frac{x - \mathrm{E}[x]}{\sqrt{\mathrm{Var}[x] + \epsilon}} * \gamma + \beta The mean and standard-deviation are calculated per-dimension over the mini- batches and γ\gamma and β\beta are learnable parameter vectors of size `C` (where `C` is the input size). By default, the elements of γ\gamma are set to 1 and the elements of β\beta are set to 0. 
The standard-deviation is calculated via the biased estimator, equivalent to `torch.var(input, unbiased=False)`. Also by default, during training this layer keeps running estimates of its computed mean and variance, which are then used for normalization during evaluation. The running estimates are kept with a default `momentum` of 0.1. If `track_running_stats` is set to `False`, this layer then does not keep running estimates, and batch statistics are instead used during evaluation time as well. Note This `momentum` argument is different from one used in optimizer classes and the conventional notion of momentum. Mathematically, the update rule for running statistics here is x^new=(1−momentum)×x^+momentum×xt\hat{x}_\text{new} = (1 - \text{momentum}) \times \hat{x} + \text{momentum} \times x_t , where x^\hat{x} is the estimated statistic and xtx_t is the new observed value. Because the Batch Normalization is done over the `C` dimension, computing statistics on `(N, L)` slices, it’s common terminology to call this Temporal Batch Normalization. Parameters * **num_features** – CC from an expected input of size (N,C,L)(N, C, L) or LL from input of size (N,L)(N, L) * **eps** – a value added to the denominator for numerical stability. Default: 1e-5 * **momentum** – the value used for the running_mean and running_var computation. Can be set to `None` for cumulative moving average (i.e. simple average). Default: 0.1 * **affine** – a boolean value that when set to `True`, this module has learnable affine parameters. Default: `True` * **track_running_stats** – a boolean value that when set to `True`, this module tracks the running mean and variance, and when set to `False`, this module does not track such statistics, and initializes statistics buffers `running_mean` and `running_var` as `None`. When these buffers are `None`, this module always uses batch statistics. in both training and eval modes. Default: `True` Shape: * Input: (N,C)(N, C) or (N,C,L)(N, C, L) * Output: (N,C)(N, C) or (N,C,L)(N, C, L) (same shape as input) Examples: >>> # With Learnable Parameters >>> m = nn.BatchNorm1d(100) >>> # Without Learnable Parameters >>> m = nn.BatchNorm1d(100, affine=False) >>> input = torch.randn(20, 100) >>> output = m(input) # BatchNorm2d `class torch.nn.BatchNorm2d(num_features, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/batchnorm.html#BatchNorm2d) Applies Batch Normalization over a 4D input (a mini-batch of 2D inputs with additional channel dimension) as described in the paper [Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift](https://arxiv.org/abs/1502.03167) . y=x−E[x]Var[x]+ϵ∗γ+βy = \frac{x - \mathrm{E}[x]}{ \sqrt{\mathrm{Var}[x] + \epsilon}} * \gamma + \beta The mean and standard-deviation are calculated per-dimension over the mini- batches and γ\gamma and β\beta are learnable parameter vectors of size `C` (where `C` is the input size). By default, the elements of γ\gamma are set to 1 and the elements of β\beta are set to 0. The standard-deviation is calculated via the biased estimator, equivalent to `torch.var(input, unbiased=False)`. Also by default, during training this layer keeps running estimates of its computed mean and variance, which are then used for normalization during evaluation. The running estimates are kept with a default `momentum` of 0.1. 
If `track_running_stats` is set to `False`, this layer then does not keep running estimates, and batch statistics are instead used during evaluation time as well. Note This `momentum` argument is different from one used in optimizer classes and the conventional notion of momentum. Mathematically, the update rule for running statistics here is x^new=(1−momentum)×x^+momentum×xt\hat{x}_\text{new} = (1 - \text{momentum}) \times \hat{x} + \text{momentum} \times x_t , where x^\hat{x} is the estimated statistic and xtx_t is the new observed value. Because the Batch Normalization is done over the `C` dimension, computing statistics on `(N, H, W)` slices, it’s common terminology to call this Spatial Batch Normalization. Parameters * **num_features** – CC from an expected input of size (N,C,H,W)(N, C, H, W) * **eps** – a value added to the denominator for numerical stability. Default: 1e-5 * **momentum** – the value used for the running_mean and running_var computation. Can be set to `None` for cumulative moving average (i.e. simple average). Default: 0.1 * **affine** – a boolean value that when set to `True`, this module has learnable affine parameters. Default: `True` * **track_running_stats** – a boolean value that when set to `True`, this module tracks the running mean and variance, and when set to `False`, this module does not track such statistics, and initializes statistics buffers `running_mean` and `running_var` as `None`. When these buffers are `None`, this module always uses batch statistics. in both training and eval modes. Default: `True` Shape: * Input: (N,C,H,W)(N, C, H, W) * Output: (N,C,H,W)(N, C, H, W) (same shape as input) Examples: >>> # With Learnable Parameters >>> m = nn.BatchNorm2d(100) >>> # Without Learnable Parameters >>> m = nn.BatchNorm2d(100, affine=False) >>> input = torch.randn(20, 100, 35, 45) >>> output = m(input) # BatchNorm3d `class torch.nn.BatchNorm3d(num_features, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/batchnorm.html#BatchNorm3d) Applies Batch Normalization over a 5D input (a mini-batch of 3D inputs with additional channel dimension) as described in the paper [Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift](https://arxiv.org/abs/1502.03167) . y=x−E[x]Var[x]+ϵ∗γ+βy = \frac{x - \mathrm{E}[x]}{ \sqrt{\mathrm{Var}[x] + \epsilon}} * \gamma + \beta The mean and standard-deviation are calculated per-dimension over the mini- batches and γ\gamma and β\beta are learnable parameter vectors of size `C` (where `C` is the input size). By default, the elements of γ\gamma are set to 1 and the elements of β\beta are set to 0. The standard-deviation is calculated via the biased estimator, equivalent to `torch.var(input, unbiased=False)`. Also by default, during training this layer keeps running estimates of its computed mean and variance, which are then used for normalization during evaluation. The running estimates are kept with a default `momentum` of 0.1. If `track_running_stats` is set to `False`, this layer then does not keep running estimates, and batch statistics are instead used during evaluation time as well. Note This `momentum` argument is different from one used in optimizer classes and the conventional notion of momentum. 
Mathematically, the update rule for running statistics here is x^new=(1−momentum)×x^+momentum×xt\hat{x}_\text{new} = (1 - \text{momentum}) \times \hat{x} + \text{momentum} \times x_t , where x^\hat{x} is the estimated statistic and xtx_t is the new observed value. Because the Batch Normalization is done over the `C` dimension, computing statistics on `(N, D, H, W)` slices, it’s common terminology to call this Volumetric Batch Normalization or Spatio-temporal Batch Normalization. Parameters * **num_features** – CC from an expected input of size (N,C,D,H,W)(N, C, D, H, W) * **eps** – a value added to the denominator for numerical stability. Default: 1e-5 * **momentum** – the value used for the running_mean and running_var computation. Can be set to `None` for cumulative moving average (i.e. simple average). Default: 0.1 * **affine** – a boolean value that when set to `True`, this module has learnable affine parameters. Default: `True` * **track_running_stats** – a boolean value that when set to `True`, this module tracks the running mean and variance, and when set to `False`, this module does not track such statistics, and initializes statistics buffers `running_mean` and `running_var` as `None`. When these buffers are `None`, this module always uses batch statistics. in both training and eval modes. Default: `True` Shape: * Input: (N,C,D,H,W)(N, C, D, H, W) * Output: (N,C,D,H,W)(N, C, D, H, W) (same shape as input) Examples: >>> # With Learnable Parameters >>> m = nn.BatchNorm3d(100) >>> # Without Learnable Parameters >>> m = nn.BatchNorm3d(100, affine=False) >>> input = torch.randn(20, 100, 35, 45, 10) >>> output = m(input) # BCELoss `class torch.nn.BCELoss(weight=None, size_average=None, reduce=None, reduction='mean')` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/loss.html#BCELoss) Creates a criterion that measures the Binary Cross Entropy between the target and the output: The unreduced (i.e. with `reduction` set to `'none'`) loss can be described as: ℓ(x,y)=L={l1,…,lN}⊤,ln=−wn[yn⋅log⁡xn+(1−yn)⋅log⁡(1−xn)],\ell(x, y) = L = \\{l_1,\dots,l_N\\}^\top, \quad l_n = - w_n \left[ y_n \cdot \log x_n + (1 - y_n) \cdot \log (1 - x_n) \right], where NN is the batch size. If `reduction` is not `'none'` (default `'mean'`), then ℓ(x,y)={mean⁡(L),if reduction=‘mean’;sum⁡(L),if reduction=‘sum’.\ell(x, y) = \begin{cases} \operatorname{mean}(L), & \text{if reduction} = \text{`mean';}\\\ \operatorname{sum}(L), & \text{if reduction} = \text{`sum'.} \end{cases} This is used for measuring the error of a reconstruction in for example an auto-encoder. Note that the targets yy should be numbers between 0 and 1. Notice that if xnx_n is either 0 or 1, one of the log terms would be mathematically undefined in the above loss equation. PyTorch chooses to set log⁡(0)=−∞\log (0) = -\infty , since lim⁡x→0log⁡(x)=−∞\lim_{x\to 0} \log (x) = -\infty . However, an infinite term in the loss equation is not desirable for several reasons. For one, if either yn=0y_n = 0 or (1−yn)=0(1 - y_n) = 0 , then we would be multiplying 0 with infinity. Secondly, if we have an infinite loss value, then we would also have an infinite term in our gradient, since lim⁡x→0ddxlog⁡(x)=∞\lim_{x\to 0} \frac{d}{dx} \log (x) = \infty . This would make BCELoss’s backward method nonlinear with respect to xnx_n , and using it for things like linear regression would not be straight-forward. Our solution is that BCELoss clamps its log function outputs to be greater than or equal to -100. 
This way, we can always have a finite loss value and a linear backward method. Parameters * **weight** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – a manual rescaling weight given to the loss of each batch element. If given, has to be a Tensor of size `nbatch`. * **size_average** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – Deprecated (see `reduction`). By default, the losses are averaged over each loss element in the batch. Note that for some losses, there are multiple elements per sample. If the field `size_average` is set to `False`, the losses are instead summed for each minibatch. Ignored when `reduce` is `False`. Default: `True` * **reduce** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – Deprecated (see `reduction`). By default, the losses are averaged or summed over observations for each minibatch depending on `size_average`. When `reduce` is `False`, returns a loss per batch element instead and ignores `size_average`. Default: `True` * **reduction** (_string_ _,__optional_) – Specifies the reduction to apply to the output: `'none'` | `'mean'` | `'sum'`. `'none'`: no reduction will be applied, `'mean'`: the sum of the output will be divided by the number of elements in the output, `'sum'`: the output will be summed. Note: `size_average` and `reduce` are in the process of being deprecated, and in the meantime, specifying either of those two args will override `reduction`. Default: `'mean'` Shape: * Input: (N,∗)(N, *) where ∗* means, any number of additional dimensions * Target: (N,∗)(N, *) , same shape as the input * Output: scalar. If `reduction` is `'none'`, then (N,∗)(N, *) , same shape as input. Examples: >>> m = nn.Sigmoid() >>> loss = nn.BCELoss() >>> input = torch.randn(3, requires_grad=True) >>> target = torch.empty(3).random_(2) >>> output = loss(m(input), target) >>> output.backward() # BCEWithLogitsLoss `class torch.nn.BCEWithLogitsLoss(weight=None, size_average=None, reduce=None, reduction='mean', pos_weight=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/loss.html#BCEWithLogitsLoss) This loss combines a `Sigmoid` layer and the `BCELoss` in one single class. This version is more numerically stable than using a plain `Sigmoid` followed by a `BCELoss` as, by combining the operations into one layer, we take advantage of the log-sum-exp trick for numerical stability. The unreduced (i.e. with `reduction` set to `'none'`) loss can be described as: ℓ(x,y)=L={l1,…,lN}⊤,ln=−wn[yn⋅log⁡σ(xn)+(1−yn)⋅log⁡(1−σ(xn))],\ell(x, y) = L = \\{l_1,\dots,l_N\\}^\top, \quad l_n = - w_n \left[ y_n \cdot \log \sigma(x_n) + (1 - y_n) \cdot \log (1 - \sigma(x_n)) \right], where NN is the batch size. If `reduction` is not `'none'` (default `'mean'`), then ℓ(x,y)={mean⁡(L),if reduction=‘mean’;sum⁡(L),if reduction=‘sum’.\ell(x, y) = \begin{cases} \operatorname{mean}(L), & \text{if reduction} = \text{`mean';}\\\ \operatorname{sum}(L), & \text{if reduction} = \text{`sum'.} \end{cases} This is used for measuring the error of a reconstruction in for example an auto-encoder. Note that the targets `t[i]` should be numbers between 0 and 1. It’s possible to trade off recall and precision by adding weights to positive examples. 
In the case of multi-label classification the loss can be described as: ℓc(x,y)=Lc={l1,c,…,lN,c}⊤,ln,c=−wn,c[pcyn,c⋅log⁡σ(xn,c)+(1−yn,c)⋅log⁡(1−σ(xn,c))],\ell_c(x, y) = L_c = \\{l_{1,c},\dots,l_{N,c}\\}^\top, \quad l_{n,c} = - w_{n,c} \left[ p_c y_{n,c} \cdot \log \sigma(x_{n,c}) + (1 - y_{n,c}) \cdot \log (1 - \sigma(x_{n,c})) \right], where cc is the class number (c>1c > 1 for multi-label binary classification, c=1c = 1 for single-label binary classification), nn is the number of the sample in the batch and pcp_c is the weight of the positive answer for the class cc . pc>1p_c > 1 increases the recall, pc<1p_c < 1 increases the precision. For example, if a dataset contains 100 positive and 300 negative examples of a single class, then `pos_weight` for the class should be equal to 300100=3\frac{300}{100}=3 . The loss would act as if the dataset contains 3×100=3003\times 100=300 positive examples. Examples: >>> target = torch.ones([10, 64], dtype=torch.float32) # 64 classes, batch size = 10 >>> output = torch.full([10, 64], 1.5) # A prediction (logit) >>> pos_weight = torch.ones([64]) # All weights are equal to 1 >>> criterion = torch.nn.BCEWithLogitsLoss(pos_weight=pos_weight) >>> criterion(output, target) # -log(sigmoid(1.5)) tensor(0.2014) Parameters * **weight** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – a manual rescaling weight given to the loss of each batch element. If given, has to be a Tensor of size `nbatch`. * **size_average** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – Deprecated (see `reduction`). By default, the losses are averaged over each loss element in the batch. Note that for some losses, there are multiple elements per sample. If the field `size_average` is set to `False`, the losses are instead summed for each minibatch. Ignored when `reduce` is `False`. Default: `True` * **reduce** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – Deprecated (see `reduction`). By default, the losses are averaged or summed over observations for each minibatch depending on `size_average`. When `reduce` is `False`, returns a loss per batch element instead and ignores `size_average`. Default: `True` * **reduction** (_string_ _,__optional_) – Specifies the reduction to apply to the output: `'none'` | `'mean'` | `'sum'`. `'none'`: no reduction will be applied, `'mean'`: the sum of the output will be divided by the number of elements in the output, `'sum'`: the output will be summed. Note: `size_average` and `reduce` are in the process of being deprecated, and in the meantime, specifying either of those two args will override `reduction`. Default: `'mean'` * **pos_weight** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – a weight of positive examples. Must be a vector with length equal to the number of classes. Shape: * Input: (N,∗)(N, *) where ∗* means, any number of additional dimensions * Target: (N,∗)(N, *) , same shape as the input * Output: scalar. If `reduction` is `'none'`, then (N,∗)(N, *) , same shape as input. 
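Following the rule above, `pos_weight` is typically set to the per-class ratio of negative to positive examples; a small sketch with made-up counts:

>>> import torch
>>> import torch.nn as nn
>>> num_pos = torch.tensor([100., 200.,  50.])  # hypothetical count of positive targets per class
>>> num_neg = torch.tensor([300., 100., 950.])  # hypothetical count of negative targets per class
>>> pos_weight = num_neg / num_pos              # e.g. 300 / 100 = 3 for the first class
>>> criterion = nn.BCEWithLogitsLoss(pos_weight=pos_weight)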
Examples: >>> loss = nn.BCEWithLogitsLoss() >>> input = torch.randn(3, requires_grad=True) >>> target = torch.empty(3).random_(2) >>> output = loss(input, target) >>> output.backward() # Bilinear `class torch.nn.Bilinear(in1_features, in2_features, out_features, bias=True)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/linear.html#Bilinear) Applies a bilinear transformation to the incoming data: y=x1TAx2+by = x_1^T A x_2 + b Parameters * **in1_features** – size of each first input sample * **in2_features** – size of each second input sample * **out_features** – size of each output sample * **bias** – If set to False, the layer will not learn an additive bias. Default: `True` Shape: * Input1: (N,∗,Hin1)(N, *, H_{in1}) where Hin1=in1_featuresH_{in1}=\text{in1\\_features} and ∗* means any number of additional dimensions. All but the last dimension of the inputs should be the same. * Input2: (N,∗,Hin2)(N, *, H_{in2}) where Hin2=in2_featuresH_{in2}=\text{in2\\_features} . * Output: (N,∗,Hout)(N, *, H_{out}) where Hout=out_featuresH_{out}=\text{out\\_features} and all but the last dimension are the same shape as the input. Variables * **~Bilinear.weight** – the learnable weights of the module of shape (out_features,in1_features,in2_features)(\text{out\\_features}, \text{in1\\_features}, \text{in2\\_features}) . The values are initialized from U(−k,k)\mathcal{U}(-\sqrt{k}, \sqrt{k}) , where k=1in1_featuresk = \frac{1}{\text{in1\\_features}} * **~Bilinear.bias** – the learnable bias of the module of shape (out_features)(\text{out\\_features}) . If `bias` is `True`, the values are initialized from U(−k,k)\mathcal{U}(-\sqrt{k}, \sqrt{k}) , where k=1in1_featuresk = \frac{1}{\text{in1\\_features}} Examples: >>> m = nn.Bilinear(20, 30, 40) >>> input1 = torch.randn(128, 20) >>> input2 = torch.randn(128, 30) >>> output = m(input1, input2) >>> print(output.size()) torch.Size([128, 40]) # CELU `class torch.nn.CELU(alpha=1.0, inplace=False)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/activation.html#CELU) Applies the element-wise function: CELU(x)=max⁡(0,x)+min⁡(0,α∗(exp⁡(x/α)−1))\text{CELU}(x) = \max(0,x) + \min(0, \alpha * (\exp(x/\alpha) - 1)) More details can be found in the paper [Continuously Differentiable Exponential Linear Units](https://arxiv.org/abs/1704.07483) . Parameters * **alpha** – the α\alpha value for the CELU formulation. Default: 1.0 * **inplace** – can optionally do the operation in-place. Default: `False` Shape: * Input: (N,∗)(N, *) where `*` means, any number of additional dimensions * Output: (N,∗)(N, *) , same shape as the input Examples: >>> m = nn.CELU() >>> input = torch.randn(2) >>> output = m(input) # ChannelShuffle `class torch.nn.ChannelShuffle(groups)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/channelshuffle.html#ChannelShuffle) Divide the channels in a tensor of shape (∗,C,H,W)(*, C , H, W) into g groups and rearrange them as (∗,Cg,g,H,W)(*, C \frac g, g, H, W) , while keeping the original tensor shape. Parameters **groups** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – number of groups to divide channels in. 
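The rearrangement can equivalently be expressed as a reshape of the channel dimension into `(groups, C // groups)`, a transpose of those two axes, and a reshape back; an informal sketch of this equivalence (not from the original examples):

>>> import torch
>>> import torch.nn as nn
>>> x = torch.arange(16.).reshape(1, 4, 2, 2)
>>> g = 2
>>> n, c, h, w = x.shape
>>> manual = x.reshape(n, g, c // g, h, w).transpose(1, 2).reshape(n, c, h, w)
>>> torch.equal(nn.ChannelShuffle(g)(x), manual)
True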
Examples: >>> channel_shuffle = nn.ChannelShuffle(2) >>> input = torch.randn(1, 4, 2, 2) >>> print(input) [[[[1, 2], [3, 4]], [[5, 6], [7, 8]], [[9, 10], [11, 12]], [[13, 14], [15, 16]], ]] >>> output = channel_shuffle(input) >>> print(output) [[[[1, 2], [3, 4]], [[9, 10], [11, 12]], [[5, 6], [7, 8]], [[13, 14], [15, 16]], ]] # ConstantPad1d `class torch.nn.ConstantPad1d(padding, value)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/padding.html#ConstantPad1d) Pads the input tensor boundaries with a constant value. For `N`-dimensional padding, use [`torch.nn.functional.pad()`](../nn.functional#torch.nn.functional.pad "torch.nn.functional.pad"). Parameters **padding** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,_[tuple](https://docs.python.org/3/library/stdtypes.html#tuple "\(in Python v3.9\)")) – the size of the padding. If is `int`, uses the same padding in both boundaries. If a 2-`tuple`, uses (padding_left\text{padding\\_left} , padding_right\text{padding\\_right} ) Shape: * Input: (N,C,Win)(N, C, W_{in}) * Output: (N,C,Wout)(N, C, W_{out}) where Wout=Win+padding_left+padding_rightW_{out} = W_{in} + \text{padding\\_left} + \text{padding\\_right} Examples: >>> m = nn.ConstantPad1d(2, 3.5) >>> input = torch.randn(1, 2, 4) >>> input tensor([[[-1.0491, -0.7152, -0.0749, 0.8530], [-1.3287, 1.8966, 0.1466, -0.2771]]]) >>> m(input) tensor([[[ 3.5000, 3.5000, -1.0491, -0.7152, -0.0749, 0.8530, 3.5000, 3.5000], [ 3.5000, 3.5000, -1.3287, 1.8966, 0.1466, -0.2771, 3.5000, 3.5000]]]) >>> m = nn.ConstantPad1d(2, 3.5) >>> input = torch.randn(1, 2, 3) >>> input tensor([[[ 1.6616, 1.4523, -1.1255], [-3.6372, 0.1182, -1.8652]]]) >>> m(input) tensor([[[ 3.5000, 3.5000, 1.6616, 1.4523, -1.1255, 3.5000, 3.5000], [ 3.5000, 3.5000, -3.6372, 0.1182, -1.8652, 3.5000, 3.5000]]]) >>> # using different paddings for different sides >>> m = nn.ConstantPad1d((3, 1), 3.5) >>> m(input) tensor([[[ 3.5000, 3.5000, 3.5000, 1.6616, 1.4523, -1.1255, 3.5000], [ 3.5000, 3.5000, 3.5000, -3.6372, 0.1182, -1.8652, 3.5000]]]) # ConstantPad2d `class torch.nn.ConstantPad2d(padding, value)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/padding.html#ConstantPad2d) Pads the input tensor boundaries with a constant value. For `N`-dimensional padding, use [`torch.nn.functional.pad()`](../nn.functional#torch.nn.functional.pad "torch.nn.functional.pad"). Parameters **padding** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,_[tuple](https://docs.python.org/3/library/stdtypes.html#tuple "\(in Python v3.9\)")) – the size of the padding. If is `int`, uses the same padding in all boundaries. 
If a 4-`tuple`, uses (padding_left\text{padding\\_left} , padding_right\text{padding\\_right} , padding_top\text{padding\\_top} , padding_bottom\text{padding\\_bottom} ) Shape: * Input: (N,C,Hin,Win)(N, C, H_{in}, W_{in}) * Output: (N,C,Hout,Wout)(N, C, H_{out}, W_{out}) where Hout=Hin+padding_top+padding_bottomH_{out} = H_{in} + \text{padding\\_top} + \text{padding\\_bottom} Wout=Win+padding_left+padding_rightW_{out} = W_{in} + \text{padding\\_left} + \text{padding\\_right} Examples: >>> m = nn.ConstantPad2d(2, 3.5) >>> input = torch.randn(1, 2, 2) >>> input tensor([[[ 1.6585, 0.4320], [-0.8701, -0.4649]]]) >>> m(input) tensor([[[ 3.5000, 3.5000, 3.5000, 3.5000, 3.5000, 3.5000], [ 3.5000, 3.5000, 3.5000, 3.5000, 3.5000, 3.5000], [ 3.5000, 3.5000, 1.6585, 0.4320, 3.5000, 3.5000], [ 3.5000, 3.5000, -0.8701, -0.4649, 3.5000, 3.5000], [ 3.5000, 3.5000, 3.5000, 3.5000, 3.5000, 3.5000], [ 3.5000, 3.5000, 3.5000, 3.5000, 3.5000, 3.5000]]]) >>> # using different paddings for different sides >>> m = nn.ConstantPad2d((3, 0, 2, 1), 3.5) >>> m(input) tensor([[[ 3.5000, 3.5000, 3.5000, 3.5000, 3.5000], [ 3.5000, 3.5000, 3.5000, 3.5000, 3.5000], [ 3.5000, 3.5000, 3.5000, 1.6585, 0.4320], [ 3.5000, 3.5000, 3.5000, -0.8701, -0.4649], [ 3.5000, 3.5000, 3.5000, 3.5000, 3.5000]]]) # ConstantPad3d `class torch.nn.ConstantPad3d(padding, value)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/padding.html#ConstantPad3d) Pads the input tensor boundaries with a constant value. For `N`-dimensional padding, use [`torch.nn.functional.pad()`](../nn.functional#torch.nn.functional.pad "torch.nn.functional.pad"). Parameters **padding** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,_[tuple](https://docs.python.org/3/library/stdtypes.html#tuple "\(in Python v3.9\)")) – the size of the padding. If is `int`, uses the same padding in all boundaries. If a 6-`tuple`, uses (padding_left\text{padding\\_left} , padding_right\text{padding\\_right} , padding_top\text{padding\\_top} , padding_bottom\text{padding\\_bottom} , padding_front\text{padding\\_front} , padding_back\text{padding\\_back} ) Shape: * Input: (N,C,Din,Hin,Win)(N, C, D_{in}, H_{in}, W_{in}) * Output: (N,C,Dout,Hout,Wout)(N, C, D_{out}, H_{out}, W_{out}) where Dout=Din+padding_front+padding_backD_{out} = D_{in} + \text{padding\\_front} + \text{padding\\_back} Hout=Hin+padding_top+padding_bottomH_{out} = H_{in} + \text{padding\\_top} + \text{padding\\_bottom} Wout=Win+padding_left+padding_rightW_{out} = W_{in} + \text{padding\\_left} + \text{padding\\_right} Examples: >>> m = nn.ConstantPad3d(3, 3.5) >>> input = torch.randn(16, 3, 10, 20, 30) >>> output = m(input) >>> # using different paddings for different sides >>> m = nn.ConstantPad3d((3, 3, 6, 6, 0, 1), 3.5) >>> output = m(input) # Conv1d `class torch.nn.Conv1d(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, bias=True, padding_mode='zeros')` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/conv.html#Conv1d) Applies a 1D convolution over an input signal composed of several input planes. 
In the simplest case, the output value of the layer with input size (N,Cin,L)(N, C_{\text{in}}, L) and output (N,Cout,Lout)(N, C_{\text{out}}, L_{\text{out}}) can be precisely described as: out(Ni,Coutj)=bias(Coutj)+∑k=0Cin−1weight(Coutj,k)⋆input(Ni,k)\text{out}(N_i, C_{\text{out}_j}) = \text{bias}(C_{\text{out}_j}) + \sum_{k = 0}^{C_{in} - 1} \text{weight}(C_{\text{out}_j}, k) \star \text{input}(N_i, k) where ⋆\star is the valid [cross- correlation](https://en.wikipedia.org/wiki/Cross-correlation) operator, NN is a batch size, CC denotes a number of channels, LL is a length of signal sequence. This module supports [TensorFloat32](https://pytorch.org/docs/1.8.0/notes/cuda.html#tf32-on- ampere). * `stride` controls the stride for the cross-correlation, a single number or a one-element tuple. * `padding` controls the amount of implicit padding on both sides for `padding` number of points. * `dilation` controls the spacing between the kernel points; also known as the à trous algorithm. It is harder to describe, but this [link](https://github.com/vdumoulin/conv_arithmetic/blob/master/README.md) has a nice visualization of what `dilation` does. * `groups` controls the connections between inputs and outputs. `in_channels` and `out_channels` must both be divisible by `groups`. For example, * At groups=1, all inputs are convolved to all outputs. * At groups=2, the operation becomes equivalent to having two conv layers side by side, each seeing half the input channels and producing half the output channels, and both subsequently concatenated. * At groups= `in_channels`, each input channel is convolved with its own set of filters (of size out_channelsin_channels\frac{\text{out\\_channels}}{\text{in\\_channels}} ). Note When `groups == in_channels` and `out_channels == K * in_channels`, where `K` is a positive integer, this operation is also known as a “depthwise convolution”. In other words, for an input of size (N,Cin,Lin)(N, C_{in}, L_{in}) , a depthwise convolution with a depthwise multiplier `K` can be performed with the arguments (Cin=Cin,Cout=Cin×K,...,groups=Cin)(C_\text{in}=C_\text{in}, C_\text{out}=C_\text{in} \times \text{K}, ..., \text{groups}=C_\text{in}) . Note In some circumstances when given tensors on a CUDA device and using CuDNN, this operator may select a nondeterministic algorithm to increase performance. If this is undesirable, you can try to make the operation deterministic (potentially at a performance cost) by setting `torch.backends.cudnn.deterministic = True`. See [Reproducibility](https://pytorch.org/docs/1.8.0/notes/randomness.html) for more information. Parameters * **in_channels** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – Number of channels in the input image * **out_channels** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – Number of channels produced by the convolution * **kernel_size** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _or_[tuple](https://docs.python.org/3/library/stdtypes.html#tuple "\(in Python v3.9\)")) – Size of the convolving kernel * **stride** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _or_[tuple](https://docs.python.org/3/library/stdtypes.html#tuple "\(in Python v3.9\)") _,__optional_) – Stride of the convolution. 
Default: 1 * **padding** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _or_[tuple](https://docs.python.org/3/library/stdtypes.html#tuple "\(in Python v3.9\)") _,__optional_) – Zero-padding added to both sides of the input. Default: 0 * **padding_mode** (_string_ _,__optional_) – `'zeros'`, `'reflect'`, `'replicate'` or `'circular'`. Default: `'zeros'` * **dilation** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _or_[tuple](https://docs.python.org/3/library/stdtypes.html#tuple "\(in Python v3.9\)") _,__optional_) – Spacing between kernel elements. Default: 1 * **groups** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – Number of blocked connections from input channels to output channels. Default: 1 * **bias** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – If `True`, adds a learnable bias to the output. Default: `True` Shape: * Input: (N,Cin,Lin)(N, C_{in}, L_{in}) * Output: (N,Cout,Lout)(N, C_{out}, L_{out}) where Lout=⌊Lin+2×padding−dilation×(kernel_size−1)−1stride+1⌋L_{out} = \left\lfloor\frac{L_{in} + 2 \times \text{padding} - \text{dilation} \times (\text{kernel\\_size} - 1) - 1}{\text{stride}} + 1\right\rfloor Variables * **~Conv1d.weight** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the learnable weights of the module of shape (out_channels,in_channelsgroups,kernel_size)(\text{out\\_channels}, \frac{\text{in\\_channels}}{\text{groups}}, \text{kernel\\_size}) . The values of these weights are sampled from U(−k,k)\mathcal{U}(-\sqrt{k}, \sqrt{k}) where k=groupsCin∗kernel_sizek = \frac{groups}{C_\text{in} * \text{kernel\\_size}} * **~Conv1d.bias** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the learnable bias of the module of shape (out_channels). If `bias` is `True`, then the values of these weights are sampled from U(−k,k)\mathcal{U}(-\sqrt{k}, \sqrt{k}) where k=groupsCin∗kernel_sizek = \frac{groups}{C_\text{in} * \text{kernel\\_size}} Examples: >>> m = nn.Conv1d(16, 33, 3, stride=2) >>> input = torch.randn(20, 16, 50) >>> output = m(input) # Conv2d `class torch.nn.Conv2d(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, bias=True, padding_mode='zeros')` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/conv.html#Conv2d) Applies a 2D convolution over an input signal composed of several input planes. In the simplest case, the output value of the layer with input size (N,Cin,H,W)(N, C_{\text{in}}, H, W) and output (N,Cout,Hout,Wout)(N, C_{\text{out}}, H_{\text{out}}, W_{\text{out}}) can be precisely described as: out(Ni,Coutj)=bias(Coutj)+∑k=0Cin−1weight(Coutj,k)⋆input(Ni,k)\text{out}(N_i, C_{\text{out}_j}) = \text{bias}(C_{\text{out}_j}) + \sum_{k = 0}^{C_{\text{in}} - 1} \text{weight}(C_{\text{out}_j}, k) \star \text{input}(N_i, k) where ⋆\star is the valid 2D [cross- correlation](https://en.wikipedia.org/wiki/Cross-correlation) operator, NN is a batch size, CC denotes a number of channels, HH is a height of input planes in pixels, and WW is width in pixels. This module supports [TensorFloat32](https://pytorch.org/docs/1.8.0/notes/cuda.html#tf32-on- ampere). * `stride` controls the stride for the cross-correlation, a single number or a tuple. * `padding` controls the amount of implicit padding on both sides for `padding` number of points for each dimension. 
* `dilation` controls the spacing between the kernel points; also known as the à trous algorithm. It is harder to describe, but this [link](https://github.com/vdumoulin/conv_arithmetic/blob/master/README.md) has a nice visualization of what `dilation` does. * `groups` controls the connections between inputs and outputs. `in_channels` and `out_channels` must both be divisible by `groups`. For example, * At groups=1, all inputs are convolved to all outputs. * At groups=2, the operation becomes equivalent to having two conv layers side by side, each seeing half the input channels and producing half the output channels, and both subsequently concatenated. * At groups= `in_channels`, each input channel is convolved with its own set of filters (of size out_channelsin_channels\frac{\text{out\\_channels}}{\text{in\\_channels}} ). The parameters `kernel_size`, `stride`, `padding`, `dilation` can either be: * a single `int` – in which case the same value is used for the height and width dimension * a `tuple` of two ints – in which case, the first `int` is used for the height dimension, and the second `int` for the width dimension Note When `groups == in_channels` and `out_channels == K * in_channels`, where `K` is a positive integer, this operation is also known as a “depthwise convolution”. In other words, for an input of size (N,Cin,Hin,Win)(N, C_{in}, H_{in}, W_{in}) , a depthwise convolution with a depthwise multiplier `K` can be performed with the arguments (Cin=Cin,Cout=Cin×K,...,groups=Cin)(C_\text{in}=C_\text{in}, C_\text{out}=C_\text{in} \times \text{K}, ..., \text{groups}=C_\text{in}) . Note In some circumstances when given tensors on a CUDA device and using CuDNN, this operator may select a nondeterministic algorithm to increase performance. If this is undesirable, you can try to make the operation deterministic (potentially at a performance cost) by setting `torch.backends.cudnn.deterministic = True`. See [Reproducibility](https://pytorch.org/docs/1.8.0/notes/randomness.html) for more information. Parameters * **in_channels** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – Number of channels in the input image * **out_channels** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – Number of channels produced by the convolution * **kernel_size** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _or_[tuple](https://docs.python.org/3/library/stdtypes.html#tuple "\(in Python v3.9\)")) – Size of the convolving kernel * **stride** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _or_[tuple](https://docs.python.org/3/library/stdtypes.html#tuple "\(in Python v3.9\)") _,__optional_) – Stride of the convolution. Default: 1 * **padding** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _or_[tuple](https://docs.python.org/3/library/stdtypes.html#tuple "\(in Python v3.9\)") _,__optional_) – Zero-padding added to both sides of the input. Default: 0 * **padding_mode** (_string_ _,__optional_) – `'zeros'`, `'reflect'`, `'replicate'` or `'circular'`. Default: `'zeros'` * **dilation** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _or_[tuple](https://docs.python.org/3/library/stdtypes.html#tuple "\(in Python v3.9\)") _,__optional_) – Spacing between kernel elements.
Default: 1 * **groups** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – Number of blocked connections from input channels to output channels. Default: 1 * **bias** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – If `True`, adds a learnable bias to the output. Default: `True` Shape: * Input: (N,Cin,Hin,Win)(N, C_{in}, H_{in}, W_{in}) * Output: (N,Cout,Hout,Wout)(N, C_{out}, H_{out}, W_{out}) where Hout=⌊Hin+2×padding[0]−dilation[0]×(kernel_size[0]−1)−1stride[0]+1⌋H_{out} = \left\lfloor\frac{H_{in} + 2 \times \text{padding}[0] - \text{dilation}[0] \times (\text{kernel\\_size}[0] - 1) - 1}{\text{stride}[0]} + 1\right\rfloor Wout=⌊Win+2×padding[1]−dilation[1]×(kernel_size[1]−1)−1stride[1]+1⌋W_{out} = \left\lfloor\frac{W_{in} + 2 \times \text{padding}[1] - \text{dilation}[1] \times (\text{kernel\\_size}[1] - 1) - 1}{\text{stride}[1]} + 1\right\rfloor Variables * **~Conv2d.weight** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the learnable weights of the module of shape (out_channels,in_channelsgroups,(\text{out\\_channels}, \frac{\text{in\\_channels}}{\text{groups}}, kernel_size[0],kernel_size[1])\text{kernel\\_size[0]}, \text{kernel\\_size[1]}) . The values of these weights are sampled from U(−k,k)\mathcal{U}(-\sqrt{k}, \sqrt{k}) where k=groupsCin∗∏i=01kernel_size[i]k = \frac{groups}{C_\text{in} * \prod_{i=0}^{1}\text{kernel\\_size}[i]} * **~Conv2d.bias** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the learnable bias of the module of shape (out_channels). If `bias` is `True`, then the values of these weights are sampled from U(−k,k)\mathcal{U}(-\sqrt{k}, \sqrt{k}) where k=groupsCin∗∏i=01kernel_size[i]k = \frac{groups}{C_\text{in} * \prod_{i=0}^{1}\text{kernel\\_size}[i]} #### Examples >>> # With square kernels and equal stride >>> m = nn.Conv2d(16, 33, 3, stride=2) >>> # non-square kernels and unequal stride and with padding >>> m = nn.Conv2d(16, 33, (3, 5), stride=(2, 1), padding=(4, 2)) >>> # non-square kernels and unequal stride and with padding and dilation >>> m = nn.Conv2d(16, 33, (3, 5), stride=(2, 1), padding=(4, 2), dilation=(3, 1)) >>> input = torch.randn(20, 16, 50, 100) >>> output = m(input) # Conv3d `class torch.nn.Conv3d(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, bias=True, padding_mode='zeros')` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/conv.html#Conv3d) Applies a 3D convolution over an input signal composed of several input planes. In the simplest case, the output value of the layer with input size (N,Cin,D,H,W)(N, C_{in}, D, H, W) and output (N,Cout,Dout,Hout,Wout)(N, C_{out}, D_{out}, H_{out}, W_{out}) can be precisely described as: out(Ni,Coutj)=bias(Coutj)+∑k=0Cin−1weight(Coutj,k)⋆input(Ni,k)out(N_i, C_{out_j}) = bias(C_{out_j}) + \sum_{k = 0}^{C_{in} - 1} weight(C_{out_j}, k) \star input(N_i, k) where ⋆\star is the valid 3D [cross- correlation](https://en.wikipedia.org/wiki/Cross-correlation) operator This module supports [TensorFloat32](https://pytorch.org/docs/1.8.0/notes/cuda.html#tf32-on- ampere). * `stride` controls the stride for the cross-correlation. * `padding` controls the amount of implicit padding on both sides for `padding` number of points for each dimension. * `dilation` controls the spacing between the kernel points; also known as the à trous algorithm. 
It is harder to describe, but this [link](https://github.com/vdumoulin/conv_arithmetic/blob/master/README.md) has a nice visualization of what `dilation` does. * `groups` controls the connections between inputs and outputs. `in_channels` and `out_channels` must both be divisible by `groups`. For example, * At groups=1, all inputs are convolved to all outputs. * At groups=2, the operation becomes equivalent to having two conv layers side by side, each seeing half the input channels and producing half the output channels, and both subsequently concatenated. * At groups= `in_channels`, each input channel is convolved with its own set of filters (of size out_channelsin_channels\frac{\text{out\\_channels}}{\text{in\\_channels}} ). The parameters `kernel_size`, `stride`, `padding`, `dilation` can either be: * a single `int` – in which case the same value is used for the depth, height and width dimension * a `tuple` of three ints – in which case, the first `int` is used for the depth dimension, the second `int` for the height dimension and the third `int` for the width dimension Note When `groups == in_channels` and `out_channels == K * in_channels`, where `K` is a positive integer, this operation is also known as a “depthwise convolution”. In other words, for an input of size (N,Cin,Din,Hin,Win)(N, C_{in}, D_{in}, H_{in}, W_{in}) , a depthwise convolution with a depthwise multiplier `K` can be performed with the arguments (Cin=Cin,Cout=Cin×K,...,groups=Cin)(C_\text{in}=C_\text{in}, C_\text{out}=C_\text{in} \times \text{K}, ..., \text{groups}=C_\text{in}) . Note In some circumstances when given tensors on a CUDA device and using CuDNN, this operator may select a nondeterministic algorithm to increase performance. If this is undesirable, you can try to make the operation deterministic (potentially at a performance cost) by setting `torch.backends.cudnn.deterministic = True`. See [Reproducibility](https://pytorch.org/docs/1.8.0/notes/randomness.html) for more information. Parameters * **in_channels** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – Number of channels in the input image * **out_channels** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – Number of channels produced by the convolution * **kernel_size** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _or_[tuple](https://docs.python.org/3/library/stdtypes.html#tuple "\(in Python v3.9\)")) – Size of the convolving kernel * **stride** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _or_[tuple](https://docs.python.org/3/library/stdtypes.html#tuple "\(in Python v3.9\)") _,__optional_) – Stride of the convolution. Default: 1 * **padding** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _or_[tuple](https://docs.python.org/3/library/stdtypes.html#tuple "\(in Python v3.9\)") _,__optional_) – Zero-padding added to all three sides of the input. Default: 0 * **padding_mode** (_string_ _,__optional_) – `'zeros'`, `'reflect'`, `'replicate'` or `'circular'`. Default: `'zeros'` * **dilation** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _or_[tuple](https://docs.python.org/3/library/stdtypes.html#tuple "\(in Python v3.9\)") _,__optional_) – Spacing between kernel elements. Default: 1 * **groups** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – Number of blocked connections from input channels to output channels.
Default: 1 * **bias** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – If `True`, adds a learnable bias to the output. Default: `True` Shape: * Input: (N,Cin,Din,Hin,Win)(N, C_{in}, D_{in}, H_{in}, W_{in}) * Output: (N,Cout,Dout,Hout,Wout)(N, C_{out}, D_{out}, H_{out}, W_{out}) where Dout=⌊Din+2×padding[0]−dilation[0]×(kernel_size[0]−1)−1stride[0]+1⌋D_{out} = \left\lfloor\frac{D_{in} + 2 \times \text{padding}[0] - \text{dilation}[0] \times (\text{kernel\\_size}[0] - 1) - 1}{\text{stride}[0]} + 1\right\rfloor Hout=⌊Hin+2×padding[1]−dilation[1]×(kernel_size[1]−1)−1stride[1]+1⌋H_{out} = \left\lfloor\frac{H_{in} + 2 \times \text{padding}[1] - \text{dilation}[1] \times (\text{kernel\\_size}[1] - 1) - 1}{\text{stride}[1]} + 1\right\rfloor Wout=⌊Win+2×padding[2]−dilation[2]×(kernel_size[2]−1)−1stride[2]+1⌋W_{out} = \left\lfloor\frac{W_{in} + 2 \times \text{padding}[2] - \text{dilation}[2] \times (\text{kernel\\_size}[2] - 1) - 1}{\text{stride}[2]} + 1\right\rfloor Variables * **~Conv3d.weight** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the learnable weights of the module of shape (out_channels,in_channelsgroups,(\text{out\\_channels}, \frac{\text{in\\_channels}}{\text{groups}}, kernel_size[0],kernel_size[1],kernel_size[2])\text{kernel\\_size[0]}, \text{kernel\\_size[1]}, \text{kernel\\_size[2]}) . The values of these weights are sampled from U(−k,k)\mathcal{U}(-\sqrt{k}, \sqrt{k}) where k=groupsCin∗∏i=02kernel_size[i]k = \frac{groups}{C_\text{in} * \prod_{i=0}^{2}\text{kernel\\_size}[i]} * **~Conv3d.bias** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the learnable bias of the module of shape (out_channels). If `bias` is `True`, then the values of these weights are sampled from U(−k,k)\mathcal{U}(-\sqrt{k}, \sqrt{k}) where k=groupsCin∗∏i=02kernel_size[i]k = \frac{groups}{C_\text{in} * \prod_{i=0}^{2}\text{kernel\\_size}[i]} Examples: >>> # With square kernels and equal stride >>> m = nn.Conv3d(16, 33, 3, stride=2) >>> # non-square kernels and unequal stride and with padding >>> m = nn.Conv3d(16, 33, (3, 5, 2), stride=(2, 1, 1), padding=(4, 2, 0)) >>> input = torch.randn(20, 16, 10, 50, 100) >>> output = m(input) # ConvTranspose1d `class torch.nn.ConvTranspose1d(in_channels, out_channels, kernel_size, stride=1, padding=0, output_padding=0, groups=1, bias=True, dilation=1, padding_mode='zeros')` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/conv.html#ConvTranspose1d) Applies a 1D transposed convolution operator over an input image composed of several input planes. This module can be seen as the gradient of Conv1d with respect to its input. It is also known as a fractionally-strided convolution or a deconvolution (although it is not an actual deconvolution operation). This module supports [TensorFloat32](https://pytorch.org/docs/1.8.0/notes/cuda.html#tf32-on- ampere). * `stride` controls the stride for the cross-correlation. * `padding` controls the amount of implicit zero padding on both sides for `dilation * (kernel_size - 1) - padding` number of points. See note below for details. * `output_padding` controls the additional size added to one side of the output shape. See note below for details. * `dilation` controls the spacing between the kernel points; also known as the à trous algorithm. It is harder to describe, but this [link](https://github.com/vdumoulin/conv_arithmetic/blob/master/README.md) has a nice visualization of what `dilation` does. 
* `groups` controls the connections between inputs and outputs. `in_channels` and `out_channels` must both be divisible by `groups`. For example, * At groups=1, all inputs are convolved to all outputs. * At groups=2, the operation becomes equivalent to having two conv layers side by side, each seeing half the input channels and producing half the output channels, and both subsequently concatenated. * At groups= `in_channels`, each input channel is convolved with its own set of filters (of size out_channelsin_channels\frac{\text{out\\_channels}}{\text{in\\_channels}} ). Note The `padding` argument effectively adds `dilation * (kernel_size - 1) - padding` amount of zero padding to both sizes of the input. This is set so that when a [`Conv1d`](torch.nn.conv1d#torch.nn.Conv1d "torch.nn.Conv1d") and a `ConvTranspose1d` are initialized with same parameters, they are inverses of each other in regard to the input and output shapes. However, when `stride > 1`, [`Conv1d`](torch.nn.conv1d#torch.nn.Conv1d "torch.nn.Conv1d") maps multiple input shapes to the same output shape. `output_padding` is provided to resolve this ambiguity by effectively increasing the calculated output shape on one side. Note that `output_padding` is only used to find output shape, but does not actually add zero-padding to output. Note In some circumstances when using the CUDA backend with CuDNN, this operator may select a nondeterministic algorithm to increase performance. If this is undesirable, you can try to make the operation deterministic (potentially at a performance cost) by setting `torch.backends.cudnn.deterministic = True`. Please see the notes on [Reproducibility](https://pytorch.org/docs/1.8.0/notes/randomness.html) for background. Parameters * **in_channels** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – Number of channels in the input image * **out_channels** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – Number of channels produced by the convolution * **kernel_size** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _or_[tuple](https://docs.python.org/3/library/stdtypes.html#tuple "\(in Python v3.9\)")) – Size of the convolving kernel * **stride** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _or_[tuple](https://docs.python.org/3/library/stdtypes.html#tuple "\(in Python v3.9\)") _,__optional_) – Stride of the convolution. Default: 1 * **padding** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _or_[tuple](https://docs.python.org/3/library/stdtypes.html#tuple "\(in Python v3.9\)") _,__optional_) – `dilation * (kernel_size - 1) - padding` zero-padding will be added to both sides of the input. Default: 0 * **output_padding** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _or_[tuple](https://docs.python.org/3/library/stdtypes.html#tuple "\(in Python v3.9\)") _,__optional_) – Additional size added to one side of the output shape. Default: 0 * **groups** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – Number of blocked connections from input channels to output channels. Default: 1 * **bias** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – If `True`, adds a learnable bias to the output. 
Default: `True` * **dilation** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _or_[tuple](https://docs.python.org/3/library/stdtypes.html#tuple "\(in Python v3.9\)") _,__optional_) – Spacing between kernel elements. Default: 1 Shape: * Input: (N,Cin,Lin)(N, C_{in}, L_{in}) * Output: (N,Cout,Lout)(N, C_{out}, L_{out}) where Lout=(Lin−1)×stride−2×padding+dilation×(kernel_size−1)+output_padding+1L_{out} = (L_{in} - 1) \times \text{stride} - 2 \times \text{padding} + \text{dilation} \times (\text{kernel\\_size} - 1) + \text{output\\_padding} + 1 Variables * **~ConvTranspose1d.weight** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the learnable weights of the module of shape (in_channels,out_channelsgroups,(\text{in\\_channels}, \frac{\text{out\\_channels}}{\text{groups}}, kernel_size)\text{kernel\\_size}) . The values of these weights are sampled from U(−k,k)\mathcal{U}(-\sqrt{k}, \sqrt{k}) where k=groupsCout∗kernel_sizek = \frac{groups}{C_\text{out} * \text{kernel\\_size}} * **~ConvTranspose1d.bias** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the learnable bias of the module of shape (out_channels). If `bias` is `True`, then the values of these weights are sampled from U(−k,k)\mathcal{U}(-\sqrt{k}, \sqrt{k}) where k=groupsCout∗kernel_sizek = \frac{groups}{C_\text{out} * \text{kernel\\_size}} # ConvTranspose2d `class torch.nn.ConvTranspose2d(in_channels, out_channels, kernel_size, stride=1, padding=0, output_padding=0, groups=1, bias=True, dilation=1, padding_mode='zeros')` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/conv.html#ConvTranspose2d) Applies a 2D transposed convolution operator over an input image composed of several input planes. This module can be seen as the gradient of Conv2d with respect to its input. It is also known as a fractionally-strided convolution or a deconvolution (although it is not an actual deconvolution operation). This module supports [TensorFloat32](https://pytorch.org/docs/1.8.0/notes/cuda.html#tf32-on- ampere). * `stride` controls the stride for the cross-correlation. * `padding` controls the amount of implicit zero padding on both sides for `dilation * (kernel_size - 1) - padding` number of points. See note below for details. * `output_padding` controls the additional size added to one side of the output shape. See note below for details. * `dilation` controls the spacing between the kernel points; also known as the à trous algorithm. It is harder to describe, but this [link](https://github.com/vdumoulin/conv_arithmetic/blob/master/README.md) has a nice visualization of what `dilation` does. * `groups` controls the connections between inputs and outputs. `in_channels` and `out_channels` must both be divisible by `groups`. For example, * At groups=1, all inputs are convolved to all outputs. * At groups=2, the operation becomes equivalent to having two conv layers side by side, each seeing half the input channels and producing half the output channels, and both subsequently concatenated. * At groups= `in_channels`, each input channel is convolved with its own set of filters (of size out_channelsin_channels\frac{\text{out\\_channels}}{\text{in\\_channels}} ). 
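For instance (an illustrative sketch with arbitrarily chosen sizes, not taken from the reference above), the effect of `groups` is visible in the shape of the learnable weight, which holds `out_channels // groups` kernels per input channel:

>>> # groups=2 splits the 16 input channels into two groups of 8,
>>> # each group producing half of the 32 output channels
>>> m = nn.ConvTranspose2d(16, 32, 3, groups=2)
>>> m.weight.shape  # (in_channels, out_channels // groups, kH, kW)
torch.Size([16, 16, 3, 3])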
The parameters `kernel_size`, `stride`, `padding`, `output_padding` can either be: * a single `int` – in which case the same value is used for the height and width dimensions * a `tuple` of two ints – in which case, the first `int` is used for the height dimension, and the second `int` for the width dimension Note The `padding` argument effectively adds `dilation * (kernel_size - 1) - padding` amount of zero padding to both sizes of the input. This is set so that when a [`Conv2d`](torch.nn.conv2d#torch.nn.Conv2d "torch.nn.Conv2d") and a `ConvTranspose2d` are initialized with same parameters, they are inverses of each other in regard to the input and output shapes. However, when `stride > 1`, [`Conv2d`](torch.nn.conv2d#torch.nn.Conv2d "torch.nn.Conv2d") maps multiple input shapes to the same output shape. `output_padding` is provided to resolve this ambiguity by effectively increasing the calculated output shape on one side. Note that `output_padding` is only used to find output shape, but does not actually add zero-padding to output. Note In some circumstances when given tensors on a CUDA device and using CuDNN, this operator may select a nondeterministic algorithm to increase performance. If this is undesirable, you can try to make the operation deterministic (potentially at a performance cost) by setting `torch.backends.cudnn.deterministic = True`. See [Reproducibility](https://pytorch.org/docs/1.8.0/notes/randomness.html) for more information. Parameters * **in_channels** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – Number of channels in the input image * **out_channels** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – Number of channels produced by the convolution * **kernel_size** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _or_[tuple](https://docs.python.org/3/library/stdtypes.html#tuple "\(in Python v3.9\)")) – Size of the convolving kernel * **stride** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _or_[tuple](https://docs.python.org/3/library/stdtypes.html#tuple "\(in Python v3.9\)") _,__optional_) – Stride of the convolution. Default: 1 * **padding** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _or_[tuple](https://docs.python.org/3/library/stdtypes.html#tuple "\(in Python v3.9\)") _,__optional_) – `dilation * (kernel_size - 1) - padding` zero-padding will be added to both sides of each dimension in the input. Default: 0 * **output_padding** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _or_[tuple](https://docs.python.org/3/library/stdtypes.html#tuple "\(in Python v3.9\)") _,__optional_) – Additional size added to one side of each dimension in the output shape. Default: 0 * **groups** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – Number of blocked connections from input channels to output channels. Default: 1 * **bias** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – If `True`, adds a learnable bias to the output. Default: `True` * **dilation** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _or_[tuple](https://docs.python.org/3/library/stdtypes.html#tuple "\(in Python v3.9\)") _,__optional_) – Spacing between kernel elements. 
Default: 1 Shape: * Input: (N,Cin,Hin,Win)(N, C_{in}, H_{in}, W_{in}) * Output: (N,Cout,Hout,Wout)(N, C_{out}, H_{out}, W_{out}) where Hout=(Hin−1)×stride[0]−2×padding[0]+dilation[0]×(kernel_size[0]−1)+output_padding[0]+1H_{out} = (H_{in} - 1) \times \text{stride}[0] - 2 \times \text{padding}[0] + \text{dilation}[0] \times (\text{kernel\\_size}[0] - 1) + \text{output\\_padding}[0] + 1 Wout=(Win−1)×stride[1]−2×padding[1]+dilation[1]×(kernel_size[1]−1)+output_padding[1]+1W_{out} = (W_{in} - 1) \times \text{stride}[1] - 2 \times \text{padding}[1] + \text{dilation}[1] \times (\text{kernel\\_size}[1] - 1) + \text{output\\_padding}[1] + 1 Variables * **~ConvTranspose2d.weight** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the learnable weights of the module of shape (in_channels,out_channelsgroups,(\text{in\\_channels}, \frac{\text{out\\_channels}}{\text{groups}}, kernel_size[0],kernel_size[1])\text{kernel\\_size[0]}, \text{kernel\\_size[1]}) . The values of these weights are sampled from U(−k,k)\mathcal{U}(-\sqrt{k}, \sqrt{k}) where k=groupsCout∗∏i=01kernel_size[i]k = \frac{groups}{C_\text{out} * \prod_{i=0}^{1}\text{kernel\\_size}[i]} * **~ConvTranspose2d.bias** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the learnable bias of the module of shape (out_channels) If `bias` is `True`, then the values of these weights are sampled from U(−k,k)\mathcal{U}(-\sqrt{k}, \sqrt{k}) where k=groupsCout∗∏i=01kernel_size[i]k = \frac{groups}{C_\text{out} * \prod_{i=0}^{1}\text{kernel\\_size}[i]} Examples: >>> # With square kernels and equal stride >>> m = nn.ConvTranspose2d(16, 33, 3, stride=2) >>> # non-square kernels and unequal stride and with padding >>> m = nn.ConvTranspose2d(16, 33, (3, 5), stride=(2, 1), padding=(4, 2)) >>> input = torch.randn(20, 16, 50, 100) >>> output = m(input) >>> # exact output size can be also specified as an argument >>> input = torch.randn(1, 16, 12, 12) >>> downsample = nn.Conv2d(16, 16, 3, stride=2, padding=1) >>> upsample = nn.ConvTranspose2d(16, 16, 3, stride=2, padding=1) >>> h = downsample(input) >>> h.size() torch.Size([1, 16, 6, 6]) >>> output = upsample(h, output_size=input.size()) >>> output.size() torch.Size([1, 16, 12, 12]) # ConvTranspose3d `class torch.nn.ConvTranspose3d(in_channels, out_channels, kernel_size, stride=1, padding=0, output_padding=0, groups=1, bias=True, dilation=1, padding_mode='zeros')` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/conv.html#ConvTranspose3d) Applies a 3D transposed convolution operator over an input image composed of several input planes. The transposed convolution operator multiplies each input value element-wise by a learnable kernel, and sums over the outputs from all input feature planes. This module can be seen as the gradient of Conv3d with respect to its input. It is also known as a fractionally-strided convolution or a deconvolution (although it is not an actual deconvolution operation). This module supports [TensorFloat32](https://pytorch.org/docs/1.8.0/notes/cuda.html#tf32-on- ampere). * `stride` controls the stride for the cross-correlation. * `padding` controls the amount of implicit zero padding on both sides for `dilation * (kernel_size - 1) - padding` number of points. See note below for details. * `output_padding` controls the additional size added to one side of the output shape. See note below for details. * `dilation` controls the spacing between the kernel points; also known as the à trous algorithm. 
It is harder to describe, but this [link](https://github.com/vdumoulin/conv_arithmetic/blob/master/README.md) has a nice visualization of what `dilation` does. * `groups` controls the connections between inputs and outputs. `in_channels` and `out_channels` must both be divisible by `groups`. For example, * At groups=1, all inputs are convolved to all outputs. * At groups=2, the operation becomes equivalent to having two conv layers side by side, each seeing half the input channels and producing half the output channels, and both subsequently concatenated. * At groups= `in_channels`, each input channel is convolved with its own set of filters (of size out_channelsin_channels\frac{\text{out\\_channels}}{\text{in\\_channels}} ). The parameters `kernel_size`, `stride`, `padding`, `output_padding` can either be: * a single `int` – in which case the same value is used for the depth, height and width dimensions * a `tuple` of three ints – in which case, the first `int` is used for the depth dimension, the second `int` for the height dimension and the third `int` for the width dimension Note The `padding` argument effectively adds `dilation * (kernel_size - 1) - padding` amount of zero padding to both sizes of the input. This is set so that when a [`Conv3d`](torch.nn.conv3d#torch.nn.Conv3d "torch.nn.Conv3d") and a `ConvTranspose3d` are initialized with same parameters, they are inverses of each other in regard to the input and output shapes. However, when `stride > 1`, [`Conv3d`](torch.nn.conv3d#torch.nn.Conv3d "torch.nn.Conv3d") maps multiple input shapes to the same output shape. `output_padding` is provided to resolve this ambiguity by effectively increasing the calculated output shape on one side. Note that `output_padding` is only used to find output shape, but does not actually add zero-padding to output. Note In some circumstances when given tensors on a CUDA device and using CuDNN, this operator may select a nondeterministic algorithm to increase performance. If this is undesirable, you can try to make the operation deterministic (potentially at a performance cost) by setting `torch.backends.cudnn.deterministic = True`. See [Reproducibility](https://pytorch.org/docs/1.8.0/notes/randomness.html) for more information. Parameters * **in_channels** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – Number of channels in the input image * **out_channels** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – Number of channels produced by the convolution * **kernel_size** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _or_[tuple](https://docs.python.org/3/library/stdtypes.html#tuple "\(in Python v3.9\)")) – Size of the convolving kernel * **stride** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _or_[tuple](https://docs.python.org/3/library/stdtypes.html#tuple "\(in Python v3.9\)") _,__optional_) – Stride of the convolution. Default: 1 * **padding** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _or_[tuple](https://docs.python.org/3/library/stdtypes.html#tuple "\(in Python v3.9\)") _,__optional_) – `dilation * (kernel_size - 1) - padding` zero-padding will be added to both sides of each dimension in the input. 
Default: 0 * **output_padding** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _or_[tuple](https://docs.python.org/3/library/stdtypes.html#tuple "\(in Python v3.9\)") _,__optional_) – Additional size added to one side of each dimension in the output shape. Default: 0 * **groups** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – Number of blocked connections from input channels to output channels. Default: 1 * **bias** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – If `True`, adds a learnable bias to the output. Default: `True` * **dilation** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _or_[tuple](https://docs.python.org/3/library/stdtypes.html#tuple "\(in Python v3.9\)") _,__optional_) – Spacing between kernel elements. Default: 1 Shape: * Input: (N,Cin,Din,Hin,Win)(N, C_{in}, D_{in}, H_{in}, W_{in}) * Output: (N,Cout,Dout,Hout,Wout)(N, C_{out}, D_{out}, H_{out}, W_{out}) where Dout=(Din−1)×stride[0]−2×padding[0]+dilation[0]×(kernel_size[0]−1)+output_padding[0]+1D_{out} = (D_{in} - 1) \times \text{stride}[0] - 2 \times \text{padding}[0] + \text{dilation}[0] \times (\text{kernel\\_size}[0] - 1) + \text{output\\_padding}[0] + 1 Hout=(Hin−1)×stride[1]−2×padding[1]+dilation[1]×(kernel_size[1]−1)+output_padding[1]+1H_{out} = (H_{in} - 1) \times \text{stride}[1] - 2 \times \text{padding}[1] + \text{dilation}[1] \times (\text{kernel\\_size}[1] - 1) + \text{output\\_padding}[1] + 1 Wout=(Win−1)×stride[2]−2×padding[2]+dilation[2]×(kernel_size[2]−1)+output_padding[2]+1W_{out} = (W_{in} - 1) \times \text{stride}[2] - 2 \times \text{padding}[2] + \text{dilation}[2] \times (\text{kernel\\_size}[2] - 1) + \text{output\\_padding}[2] + 1 Variables * **~ConvTranspose3d.weight** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the learnable weights of the module of shape (in_channels,out_channelsgroups,(\text{in\\_channels}, \frac{\text{out\\_channels}}{\text{groups}}, kernel_size[0],kernel_size[1],kernel_size[2])\text{kernel\\_size[0]}, \text{kernel\\_size[1]}, \text{kernel\\_size[2]}) . The values of these weights are sampled from U(−k,k)\mathcal{U}(-\sqrt{k}, \sqrt{k}) where k=groupsCout∗∏i=02kernel_size[i]k = \frac{groups}{C_\text{out} * \prod_{i=0}^{2}\text{kernel\\_size}[i]} * **~ConvTranspose3d.bias** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the learnable bias of the module of shape (out_channels) If `bias` is `True`, then the values of these weights are sampled from U(−k,k)\mathcal{U}(-\sqrt{k}, \sqrt{k}) where k=groupsCout∗∏i=02kernel_size[i]k = \frac{groups}{C_\text{out} * \prod_{i=0}^{2}\text{kernel\\_size}[i]} Examples: >>> # With square kernels and equal stride >>> m = nn.ConvTranspose3d(16, 33, 3, stride=2) >>> # non-square kernels and unequal stride and with padding >>> m = nn.ConvTranspose3d(16, 33, (3, 5, 2), stride=(2, 1, 1), padding=(0, 4, 2)) >>> input = torch.randn(20, 16, 10, 50, 100) >>> output = m(input) # CosineEmbeddingLoss `class torch.nn.CosineEmbeddingLoss(margin=0.0, size_average=None, reduce=None, reduction='mean')` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/loss.html#CosineEmbeddingLoss) Creates a criterion that measures the loss given input tensors x1x_1 , x2x_2 and a `Tensor` label yy with values 1 or -1. 
This is used for measuring whether two inputs are similar or dissimilar, using the cosine distance, and is typically used for learning nonlinear embeddings or semi-supervised learning. The loss function for each sample is: loss(x,y)={1−cos⁡(x1,x2),if y=1max⁡(0,cos⁡(x1,x2)−margin),if y=−1\text{loss}(x, y) = \begin{cases} 1 - \cos(x_1, x_2), & \text{if } y = 1 \\\ \max(0, \cos(x_1, x_2) - \text{margin}), & \text{if } y = -1 \end{cases} Parameters * **margin** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _,__optional_) – Should be a number from −1-1 to 11 , 00 to 0.50.5 is suggested. If `margin` is missing, the default value is 00 . * **size_average** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – Deprecated (see `reduction`). By default, the losses are averaged over each loss element in the batch. Note that for some losses, there are multiple elements per sample. If the field `size_average` is set to `False`, the losses are instead summed for each minibatch. Ignored when `reduce` is `False`. Default: `True` * **reduce** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – Deprecated (see `reduction`). By default, the losses are averaged or summed over observations for each minibatch depending on `size_average`. When `reduce` is `False`, returns a loss per batch element instead and ignores `size_average`. Default: `True` * **reduction** (_string_ _,__optional_) – Specifies the reduction to apply to the output: `'none'` | `'mean'` | `'sum'`. `'none'`: no reduction will be applied, `'mean'`: the sum of the output will be divided by the number of elements in the output, `'sum'`: the output will be summed. Note: `size_average` and `reduce` are in the process of being deprecated, and in the meantime, specifying either of those two args will override `reduction`. Default: `'mean'` # CosineSimilarity `class torch.nn.CosineSimilarity(dim=1, eps=1e-08)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/distance.html#CosineSimilarity) Returns cosine similarity between x1x_1 and x2x_2 , computed along dim. similarity=x1⋅x2max⁡(∥x1∥2⋅∥x2∥2,ϵ).\text{similarity} = \dfrac{x_1 \cdot x_2}{\max(\Vert x_1 \Vert _2 \cdot \Vert x_2 \Vert _2, \epsilon)}. Parameters * **dim** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – Dimension where cosine similarity is computed. Default: 1 * **eps** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _,__optional_) – Small value to avoid division by zero. Default: 1e-8 Shape: * Input1: (∗1,D,∗2)(\ast_1, D, \ast_2) where D is at position `dim` * Input2: (∗1,D,∗2)(\ast_1, D, \ast_2) , same shape as the Input1 * Output: (∗1,∗2)(\ast_1, \ast_2) Examples:: >>> input1 = torch.randn(100, 128) >>> input2 = torch.randn(100, 128) >>> cos = nn.CosineSimilarity(dim=1, eps=1e-6) >>> output = cos(input1, input2) # CrossEntropyLoss `class torch.nn.CrossEntropyLoss(weight=None, size_average=None, ignore_index=-100, reduce=None, reduction='mean')` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/loss.html#CrossEntropyLoss) This criterion combines [`LogSoftmax`](torch.nn.logsoftmax#torch.nn.LogSoftmax "torch.nn.LogSoftmax") and [`NLLLoss`](torch.nn.nllloss#torch.nn.NLLLoss "torch.nn.NLLLoss") in one single class. It is useful when training a classification problem with `C` classes. 
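As a minimal illustration of that combination (a sketch added here for clarity, not part of the upstream reference), cross entropy computed on raw scores matches `NLLLoss` applied to `LogSoftmax` outputs:

>>> x = torch.randn(3, 5)
>>> target = torch.tensor([1, 0, 4])
>>> ce = nn.CrossEntropyLoss()(x, target)
>>> nll = nn.NLLLoss()(nn.LogSoftmax(dim=1)(x), target)
>>> torch.allclose(ce, nll)
True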
If provided, the optional argument `weight` should be a 1D `Tensor` assigning weight to each of the classes. This is particularly useful when you have an unbalanced training set. The `input` is expected to contain raw, unnormalized scores for each class. `input` has to be a Tensor of size either (minibatch,C)(minibatch, C) or (minibatch,C,d1,d2,...,dK)(minibatch, C, d_1, d_2, ..., d_K) with K≥1K \geq 1 for the `K`-dimensional case (described later). This criterion expects a class index in the range [0,C−1][0, C-1] as the `target` for each value of a 1D tensor of size `minibatch`; if `ignore_index` is specified, this criterion also accepts this class index (this index may not necessarily be in the class range). The loss can be described as: loss(x,class)=−log⁡(exp⁡(x[class])∑jexp⁡(x[j]))=−x[class]+log⁡(∑jexp⁡(x[j]))\text{loss}(x, class) = -\log\left(\frac{\exp(x[class])}{\sum_j \exp(x[j])}\right) = -x[class] + \log\left(\sum_j \exp(x[j])\right) or in the case of the `weight` argument being specified: loss(x,class)=weight[class](−x[class]+log⁡(∑jexp⁡(x[j])))\text{loss}(x, class) = weight[class] \left(-x[class] + \log\left(\sum_j \exp(x[j])\right)\right) The losses are averaged across observations for each minibatch. If the `weight` argument is specified then this is a weighted average: loss=∑i=1Nloss(i,class[i])∑i=1Nweight[class[i]]\text{loss} = \frac{\sum^{N}_{i=1} loss(i, class[i])}{\sum^{N}_{i=1} weight[class[i]]} Can also be used for higher dimension inputs, such as 2D images, by providing an input of size (minibatch,C,d1,d2,...,dK)(minibatch, C, d_1, d_2, ..., d_K) with K≥1K \geq 1 , where KK is the number of dimensions, and a target of appropriate shape (see below). Parameters * **weight** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – a manual rescaling weight given to each class. If given, has to be a Tensor of size `C` * **size_average** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – Deprecated (see `reduction`). By default, the losses are averaged over each loss element in the batch. Note that for some losses, there are multiple elements per sample. If the field `size_average` is set to `False`, the losses are instead summed for each minibatch. Ignored when `reduce` is `False`. Default: `True` * **ignore_index** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – Specifies a target value that is ignored and does not contribute to the input gradient. When `size_average` is `True`, the loss is averaged over non-ignored targets. * **reduce** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – Deprecated (see `reduction`). By default, the losses are averaged or summed over observations for each minibatch depending on `size_average`. When `reduce` is `False`, returns a loss per batch element instead and ignores `size_average`. Default: `True` * **reduction** (_string_ _,__optional_) – Specifies the reduction to apply to the output: `'none'` | `'mean'` | `'sum'`. `'none'`: no reduction will be applied, `'mean'`: the weighted mean of the output is taken, `'sum'`: the output will be summed. Note: `size_average` and `reduce` are in the process of being deprecated, and in the meantime, specifying either of those two args will override `reduction`. Default: `'mean'` Shape: * Input: (N,C)(N, C) where `C = number of classes`, or (N,C,d1,d2,...,dK)(N, C, d_1, d_2, ..., d_K) with K≥1K \geq 1 in the case of `K`-dimensional loss. 
* Target: (N)(N) where each value is 0≤targets[i]≤C−10 \leq \text{targets}[i] \leq C-1 , or (N,d1,d2,...,dK)(N, d_1, d_2, ..., d_K) with K≥1K \geq 1 in the case of K-dimensional loss. * Output: scalar. If `reduction` is `'none'`, then the same size as the target: (N)(N) , or (N,d1,d2,...,dK)(N, d_1, d_2, ..., d_K) with K≥1K \geq 1 in the case of K-dimensional loss. Examples: >>> loss = nn.CrossEntropyLoss() >>> input = torch.randn(3, 5, requires_grad=True) >>> target = torch.empty(3, dtype=torch.long).random_(5) >>> output = loss(input, target) >>> output.backward() # CTCLoss `class torch.nn.CTCLoss(blank=0, reduction='mean', zero_infinity=False)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/loss.html#CTCLoss) The Connectionist Temporal Classification loss. Calculates loss between a continuous (unsegmented) time series and a target sequence. CTCLoss sums over the probability of possible alignments of input to target, producing a loss value which is differentiable with respect to each input node. The alignment of input to target is assumed to be “many-to-one”, which limits the length of the target sequence such that it must be ≤\leq the input length. Parameters * **blank** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – blank label. Default 00 . * **reduction** (_string_ _,__optional_) – Specifies the reduction to apply to the output: `'none'` | `'mean'` | `'sum'`. `'none'`: no reduction will be applied, `'mean'`: the output losses will be divided by the target lengths and then the mean over the batch is taken, `'sum'`: the output losses will be summed. Default: `'mean'` * **zero_infinity** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – Whether to zero infinite losses and the associated gradients. Default: `False` Infinite losses mainly occur when the inputs are too short to be aligned to the targets. Shape: * Log_probs: Tensor of size (T,N,C)(T, N, C) , where T=input lengthT = \text{input length} , N=batch sizeN = \text{batch size} , and C=number of classes (including blank)C = \text{number of classes (including blank)} . The logarithmized probabilities of the outputs (e.g. obtained with [`torch.nn.functional.log_softmax()`](../nn.functional#torch.nn.functional.log_softmax "torch.nn.functional.log_softmax")). * Targets: Tensor of size (N,S)(N, S) or (sum⁡(target_lengths))(\operatorname{sum}(\text{target\\_lengths})) , where N=batch sizeN = \text{batch size} and S=max target length, if shape is (N,S)S = \text{max target length, if shape is } (N, S) . It represents the target sequences. Each element in the target sequence is a class index. The target index cannot be blank (default=0). In the (N,S)(N, S) form, targets are padded to the length of the longest sequence, and stacked. In the (sum⁡(target_lengths))(\operatorname{sum}(\text{target\\_lengths})) form, the targets are assumed to be un-padded and concatenated within 1 dimension. * Input_lengths: Tuple or tensor of size (N)(N) , where N=batch sizeN = \text{batch size} . It represents the lengths of the inputs (must each be ≤T\leq T ). The lengths are specified for each sequence to achieve masking under the assumption that sequences are padded to equal lengths. * Target_lengths: Tuple or tensor of size (N)(N) , where N=batch sizeN = \text{batch size} . It represents the lengths of the targets. Lengths are specified for each sequence to achieve masking under the assumption that sequences are padded to equal lengths.
If target shape is (N,S)(N,S) , target_lengths are effectively the stop index sns_n for each target sequence, such that `target_n = targets[n,0:s_n]` for each target in a batch. Lengths must each be ≤S\leq S If the targets are given as a 1d tensor that is the concatenation of individual targets, the target_lengths must add up to the total length of the tensor. * Output: scalar. If `reduction` is `'none'`, then (N)(N) , where N=batch sizeN = \text{batch size} . Examples: >>> # Target are to be padded >>> T = 50 # Input sequence length >>> C = 20 # Number of classes (including blank) >>> N = 16 # Batch size >>> S = 30 # Target sequence length of longest target in batch (padding length) >>> S_min = 10 # Minimum target length, for demonstration purposes >>> >>> # Initialize random batch of input vectors, for *size = (T,N,C) >>> input = torch.randn(T, N, C).log_softmax(2).detach().requires_grad_() >>> >>> # Initialize random batch of targets (0 = blank, 1:C = classes) >>> target = torch.randint(low=1, high=C, size=(N, S), dtype=torch.long) >>> >>> input_lengths = torch.full(size=(N,), fill_value=T, dtype=torch.long) >>> target_lengths = torch.randint(low=S_min, high=S, size=(N,), dtype=torch.long) >>> ctc_loss = nn.CTCLoss() >>> loss = ctc_loss(input, target, input_lengths, target_lengths) >>> loss.backward() >>> >>> >>> # Target are to be un-padded >>> T = 50 # Input sequence length >>> C = 20 # Number of classes (including blank) >>> N = 16 # Batch size >>> >>> # Initialize random batch of input vectors, for *size = (T,N,C) >>> input = torch.randn(T, N, C).log_softmax(2).detach().requires_grad_() >>> input_lengths = torch.full(size=(N,), fill_value=T, dtype=torch.long) >>> >>> # Initialize random batch of targets (0 = blank, 1:C = classes) >>> target_lengths = torch.randint(low=1, high=T, size=(N,), dtype=torch.long) >>> target = torch.randint(low=1, high=C, size=(sum(target_lengths),), dtype=torch.long) >>> ctc_loss = nn.CTCLoss() >>> loss = ctc_loss(input, target, input_lengths, target_lengths) >>> loss.backward() Reference: A. Graves et al.: Connectionist Temporal Classification: Labelling Unsegmented Sequence Data with Recurrent Neural Networks: Note In order to use CuDNN, the following must be satisfied: `targets` must be in concatenated format, all `input_lengths` must be `T`. blank=0blank=0 , `target_lengths` ≤256\leq 256 , the integer arguments must be of dtype `torch.int32`. The regular implementation uses the (more common in PyTorch) `torch.long` dtype. Note In some circumstances when using the CUDA backend with CuDNN, this operator may select a nondeterministic algorithm to increase performance. If this is undesirable, you can try to make the operation deterministic (potentially at a performance cost) by setting `torch.backends.cudnn.deterministic = True`. Please see the notes on [Reproducibility](https://pytorch.org/docs/1.8.0/notes/randomness.html) for background. # DataParallel `class torch.nn.DataParallel(module, device_ids=None, output_device=None, dim=0)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/parallel/data_parallel.html#DataParallel) Implements data parallelism at the module level. This container parallelizes the application of the given `module` by splitting the input across the specified devices by chunking in the batch dimension (other objects will be copied once per device). In the forward pass, the module is replicated on each device, and each replica handles a portion of the input. 
During the backwards pass, gradients from each replica are summed into the original module. The batch size should be larger than the number of GPUs used. Warning It is recommended to use [`DistributedDataParallel`](torch.nn.parallel.distributeddataparallel#torch.nn.parallel.DistributedDataParallel "torch.nn.parallel.DistributedDataParallel"), instead of this class, to do multi-GPU training, even if there is only a single node. See: [Use nn.parallel.DistributedDataParallel instead of multiprocessing or nn.DataParallel](https://pytorch.org/docs/1.8.0/notes/cuda.html#cuda-nn-ddp- instead) and [Distributed Data Parallel](https://pytorch.org/docs/1.8.0/notes/ddp.html#ddp). Arbitrary positional and keyword inputs are allowed to be passed into DataParallel but some types are specially handled. tensors will be **scattered** on dim specified (default 0). tuple, list and dict types will be shallow copied. The other types will be shared among different threads and can be corrupted if written to in the model’s forward pass. The parallelized `module` must have its parameters and buffers on `device_ids[0]` before running this `DataParallel` module. Warning In each forward, `module` is **replicated** on each device, so any updates to the running module in `forward` will be lost. For example, if `module` has a counter attribute that is incremented in each `forward`, it will always stay at the initial value because the update is done on the replicas which are destroyed after `forward`. However, `DataParallel` guarantees that the replica on `device[0]` will have its parameters and buffers sharing storage with the base parallelized `module`. So **in-place** updates to the parameters or buffers on `device[0]` will be recorded. E.g., [`BatchNorm2d`](torch.nn.batchnorm2d#torch.nn.BatchNorm2d "torch.nn.BatchNorm2d") and [`spectral_norm()`](torch.nn.utils.spectral_norm#torch.nn.utils.spectral_norm "torch.nn.utils.spectral_norm") rely on this behavior to update the buffers. Warning Forward and backward hooks defined on `module` and its submodules will be invoked `len(device_ids)` times, each with inputs located on a particular device. Particularly, the hooks are only guaranteed to be executed in correct order with respect to operations on corresponding devices. For example, it is not guaranteed that hooks set via [`register_forward_pre_hook()`](torch.nn.module#torch.nn.Module.register_forward_pre_hook "torch.nn.Module.register_forward_pre_hook") be executed before `all` `len(device_ids)` [`forward()`](torch.nn.module#torch.nn.Module.forward "torch.nn.Module.forward") calls, but that each such hook be executed before the corresponding [`forward()`](torch.nn.module#torch.nn.Module.forward "torch.nn.Module.forward") call of that device. Warning When `module` returns a scalar (i.e., 0-dimensional tensor) in `forward()`, this wrapper will return a vector of length equal to number of devices used in data parallelism, containing the result from each device. Note There is a subtlety in using the `pack sequence -> recurrent network -> unpack sequence` pattern in a [`Module`](torch.nn.module#torch.nn.Module "torch.nn.Module") wrapped in `DataParallel`. See [My recurrent network doesn’t work with data parallelism](https://pytorch.org/docs/1.8.0/notes/faq.html#pack-rnn-unpack- with-data-parallelism) section in FAQ for details. 
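To make the scalar-return warning above concrete, here is a small sketch (it assumes at least two visible CUDA devices; `ScalarNet` is a hypothetical module introduced only for illustration):

>>> class ScalarNet(nn.Module):
...     def forward(self, x):
...         return x.sum()  # each replica returns a 0-dim tensor
...
>>> net = nn.DataParallel(ScalarNet().cuda(), device_ids=[0, 1])
>>> out = net(torch.randn(8, 4))  # may emit a UserWarning about gathering scalars
>>> out.shape  # one entry per device rather than a single scalar
torch.Size([2])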
Parameters * **module** ([Module](torch.nn.module#torch.nn.Module "torch.nn.Module")) – module to be parallelized * **device_ids** (_list of python:int_ _or_[torch.device](../tensor_attributes#torch.torch.device "torch.torch.device")) – CUDA devices (default: all devices) * **output_device** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _or_[torch.device](../tensor_attributes#torch.torch.device "torch.torch.device")) – device location of output (default: device_ids[0]) Variables **~DataParallel.module** ([Module](torch.nn.module#torch.nn.Module "torch.nn.Module")) – the module to be parallelized Example: >>> net = torch.nn.DataParallel(model, device_ids=[0, 1, 2]) >>> output = net(input_var) # input_var can be on any device, including CPU # Dropout `class torch.nn.Dropout(p=0.5, inplace=False)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/dropout.html#Dropout) During training, randomly zeroes some of the elements of the input tensor with probability `p` using samples from a Bernoulli distribution. Each channel will be zeroed out independently on every forward call. This has proven to be an effective technique for regularization and preventing the co-adaptation of neurons as described in the paper [Improving neural networks by preventing co-adaptation of feature detectors](https://arxiv.org/abs/1207.0580) . Furthermore, the outputs are scaled by a factor of 11−p\frac{1}{1-p} during training. This means that during evaluation the module simply computes an identity function. Parameters * **p** – probability of an element to be zeroed. Default: 0.5 * **inplace** – If set to `True`, will do this operation in-place. Default: `False` Shape: * Input: (∗)(*) . Input can be of any shape * Output: (∗)(*) . Output is of the same shape as input Examples: >>> m = nn.Dropout(p=0.2) >>> input = torch.randn(20, 16) >>> output = m(input) # Dropout2d `class torch.nn.Dropout2d(p=0.5, inplace=False)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/dropout.html#Dropout2d) Randomly zero out entire channels (a channel is a 2D feature map, e.g., the jj -th channel of the ii -th sample in the batched input is a 2D tensor input[i,j]\text{input}[i, j] ). Each channel will be zeroed out independently on every forward call with probability `p` using samples from a Bernoulli distribution. Usually the input comes from `nn.Conv2d` modules. As described in the paper [Efficient Object Localization Using Convolutional Networks](https://arxiv.org/abs/1411.4280) , if adjacent pixels within feature maps are strongly correlated (as is normally the case in early convolution layers) then i.i.d. dropout will not regularize the activations and will otherwise just result in an effective learning rate decrease. In this case, `nn.Dropout2d()` will help promote independence between feature maps and should be used instead. Parameters * **p** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _,__optional_) – probability of an element to be zero-ed. 
* **inplace** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – If set to `True`, will do this operation in-place Shape: * Input: (N,C,H,W)(N, C, H, W) * Output: (N,C,H,W)(N, C, H, W) (same shape as input) Examples: >>> m = nn.Dropout2d(p=0.2) >>> input = torch.randn(20, 16, 32, 32) >>> output = m(input) # Dropout3d `class torch.nn.Dropout3d(p=0.5, inplace=False)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/dropout.html#Dropout3d) Randomly zero out entire channels (a channel is a 3D feature map, e.g., the jj -th channel of the ii -th sample in the batched input is a 3D tensor input[i,j]\text{input}[i, j] ). Each channel will be zeroed out independently on every forward call with probability `p` using samples from a Bernoulli distribution. Usually the input comes from `nn.Conv3d` modules. As described in the paper [Efficient Object Localization Using Convolutional Networks](https://arxiv.org/abs/1411.4280) , if adjacent pixels within feature maps are strongly correlated (as is normally the case in early convolution layers) then i.i.d. dropout will not regularize the activations and will otherwise just result in an effective learning rate decrease. In this case, `nn.Dropout3d()` will help promote independence between feature maps and should be used instead. Parameters * **p** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _,__optional_) – probability of an element to be zeroed. * **inplace** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – If set to `True`, will do this operation in-place Shape: * Input: (N,C,D,H,W)(N, C, D, H, W) * Output: (N,C,D,H,W)(N, C, D, H, W) (same shape as input) Examples: >>> m = nn.Dropout3d(p=0.2) >>> input = torch.randn(20, 16, 4, 32, 32) >>> output = m(input) # ELU `class torch.nn.ELU(alpha=1.0, inplace=False)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/activation.html#ELU) Applies the element-wise function: ELU(x)={x, if x>0α∗(exp⁡(x)−1), if x≤0\text{ELU}(x) = \begin{cases} x, & \text{ if } x > 0\\\ \alpha * (\exp(x) - 1), & \text{ if } x \leq 0 \end{cases} Parameters * **alpha** – the α\alpha value for the ELU formulation. Default: 1.0 * **inplace** – can optionally do the operation in-place. Default: `False` Shape: * Input: (N,∗)(N, *) where `*` means, any number of additional dimensions * Output: (N,∗)(N, *) , same shape as the input Examples: >>> m = nn.ELU() >>> input = torch.randn(2) >>> output = m(input) # Embedding `class torch.nn.Embedding(num_embeddings, embedding_dim, padding_idx=None, max_norm=None, norm_type=2.0, scale_grad_by_freq=False, sparse=False, _weight=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/sparse.html#Embedding) A simple lookup table that stores embeddings of a fixed dictionary and size. This module is often used to store word embeddings and retrieve them using indices. The input to the module is a list of indices, and the output is the corresponding word embeddings. 
Parameters * **num_embeddings** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – size of the dictionary of embeddings * **embedding_dim** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – the size of each embedding vector * **padding_idx** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – If given, pads the output with the embedding vector at `padding_idx` (initialized to zeros) whenever it encounters the index. * **max_norm** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _,__optional_) – If given, each embedding vector with norm larger than `max_norm` is renormalized to have norm `max_norm`. * **norm_type** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _,__optional_) – The p of the p-norm to compute for the `max_norm` option. Default `2`. * **scale_grad_by_freq** (_boolean_ _,__optional_) – If given, this will scale gradients by the inverse of frequency of the words in the mini-batch. Default `False`. * **sparse** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – If `True`, gradient w.r.t. `weight` matrix will be a sparse tensor. See Notes for more details regarding sparse gradients. Variables **~Embedding.weight** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the learnable weights of the module of shape (num_embeddings, embedding_dim) initialized from N(0,1)\mathcal{N}(0, 1) Shape: * Input: (∗)(*) , IntTensor or LongTensor of arbitrary shape containing the indices to extract * Output: (∗,H)(*, H) , where `*` is the input shape and H=embedding_dimH=\text{embedding\\_dim} Note Keep in mind that only a limited number of optimizers support sparse gradients: currently it’s `optim.SGD` (`CUDA` and `CPU`), `optim.SparseAdam` (`CUDA` and `CPU`) and `optim.Adagrad` (`CPU`) Note With `padding_idx` set, the embedding vector at `padding_idx` is initialized to all zeros. However, note that this vector can be modified afterwards, e.g., using a customized initialization method, and thus changing the vector used to pad the output. The gradient for this vector from `Embedding` is always zero. Note When `max_norm` is not `None`, `Embedding`’s forward method will modify the `weight` tensor in-place. Since tensors needed for gradient computations cannot be modified in-place, performing a differentiable operation on `Embedding.weight` before calling `Embedding`’s forward method requires cloning `Embedding.weight` when `max_norm` is not `None`. 
For example: n, d, m = 3, 5, 7 embedding = nn.Embedding(n, d, max_norm=True) W = torch.randn((m, d), requires_grad=True) idx = torch.tensor([1, 2]) a = embedding.weight.clone() @ W.t() # weight must be cloned for this to be differentiable b = embedding(idx) @ W.t() # modifies weight in-place out = (a.unsqueeze(0) + b.unsqueeze(1)) loss = out.sigmoid().prod() loss.backward() Examples: >>> # an Embedding module containing 10 tensors of size 3 >>> embedding = nn.Embedding(10, 3) >>> # a batch of 2 samples of 4 indices each >>> input = torch.LongTensor([[1,2,4,5],[4,3,2,9]]) >>> embedding(input) tensor([[[-0.0251, -1.6902, 0.7172], [-0.6431, 0.0748, 0.6969], [ 1.4970, 1.3448, -0.9685], [-0.3677, -2.7265, -0.1685]], [[ 1.4970, 1.3448, -0.9685], [ 0.4362, -0.4004, 0.9400], [-0.6431, 0.0748, 0.6969], [ 0.9124, -2.3616, 1.1151]]]) >>> # example with padding_idx >>> embedding = nn.Embedding(10, 3, padding_idx=0) >>> input = torch.LongTensor([[0,2,0,5]]) >>> embedding(input) tensor([[[ 0.0000, 0.0000, 0.0000], [ 0.1535, -2.0309, 0.9315], [ 0.0000, 0.0000, 0.0000], [-0.1655, 0.9897, 0.0635]]]) `classmethod from_pretrained(embeddings, freeze=True, padding_idx=None, max_norm=None, norm_type=2.0, scale_grad_by_freq=False, sparse=False)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/sparse.html#Embedding.from_pretrained) Creates Embedding instance from given 2-dimensional FloatTensor. Parameters * **embeddings** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – FloatTensor containing weights for the Embedding. First dimension is being passed to Embedding as `num_embeddings`, second as `embedding_dim`. * **freeze** (_boolean_ _,__optional_) – If `True`, the tensor does not get updated in the learning process. Equivalent to `embedding.weight.requires_grad = False`. Default: `True` * **padding_idx** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – See module initialization documentation. * **max_norm** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _,__optional_) – See module initialization documentation. * **norm_type** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _,__optional_) – See module initialization documentation. Default `2`. * **scale_grad_by_freq** (_boolean_ _,__optional_) – See module initialization documentation. Default `False`. * **sparse** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – See module initialization documentation. Examples: >>> # FloatTensor containing pretrained weights >>> weight = torch.FloatTensor([[1, 2.3, 3], [4, 5.1, 6.3]]) >>> embedding = nn.Embedding.from_pretrained(weight) >>> # Get embeddings for index 1 >>> input = torch.LongTensor([1]) >>> embedding(input) tensor([[ 4.0000, 5.1000, 6.3000]]) # EmbeddingBag `class torch.nn.EmbeddingBag(num_embeddings, embedding_dim, max_norm=None, norm_type=2.0, scale_grad_by_freq=False, mode='mean', sparse=False, _weight=None, include_last_offset=False)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/sparse.html#EmbeddingBag) Computes sums or means of ‘bags’ of embeddings, without instantiating the intermediate embeddings. 
For bags of constant length and no `per_sample_weights` and 2D inputs, this class * with `mode="sum"` is equivalent to [`Embedding`](torch.nn.embedding#torch.nn.Embedding "torch.nn.Embedding") followed by `torch.sum(dim=1)`, * with `mode="mean"` is equivalent to [`Embedding`](torch.nn.embedding#torch.nn.Embedding "torch.nn.Embedding") followed by `torch.mean(dim=1)`, * with `mode="max"` is equivalent to [`Embedding`](torch.nn.embedding#torch.nn.Embedding "torch.nn.Embedding") followed by `torch.max(dim=1)`. However, `EmbeddingBag` is much more time and memory efficient than using a chain of these operations. EmbeddingBag also supports per-sample weights as an argument to the forward pass. This scales the output of the Embedding before performing a weighted reduction as specified by `mode`. If `per_sample_weights`` is passed, the only supported `mode` is `"sum"`, which computes a weighted sum according to `per_sample_weights`. Parameters * **num_embeddings** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – size of the dictionary of embeddings * **embedding_dim** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – the size of each embedding vector * **max_norm** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _,__optional_) – If given, each embedding vector with norm larger than `max_norm` is renormalized to have norm `max_norm`. * **norm_type** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _,__optional_) – The p of the p-norm to compute for the `max_norm` option. Default `2`. * **scale_grad_by_freq** (_boolean_ _,__optional_) – if given, this will scale gradients by the inverse of frequency of the words in the mini-batch. Default `False`. Note: this option is not supported when `mode="max"`. * **mode** (_string_ _,__optional_) – `"sum"`, `"mean"` or `"max"`. Specifies the way to reduce the bag. `"sum"` computes the weighted sum, taking `per_sample_weights` into consideration. `"mean"` computes the average of the values in the bag, `"max"` computes the max value over each bag. Default: `"mean"` * **sparse** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – if `True`, gradient w.r.t. `weight` matrix will be a sparse tensor. See Notes for more details regarding sparse gradients. Note: this option is not supported when `mode="max"`. * **include_last_offset** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – if `True`, `offsets` has one additional element, where the last element is equivalent to the size of `indices`. This matches the CSR format. Variables **~EmbeddingBag.weight** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the learnable weights of the module of shape `(num_embeddings, embedding_dim)` initialized from N(0,1)\mathcal{N}(0, 1) . `Inputs: input (IntTensor or LongTensor), offsets (IntTensor or LongTensor, optional), and` `per_index_weights` (Tensor, optional) * `input` and `offsets` have to be of the same type, either int or long * If `input` is 2D of shape `(B, N)`, it will be treated as `B` bags (sequences) each of fixed length `N`, and this will return `B` values aggregated in a way depending on the `mode`. `offsets` is ignored and required to be `None` in this case. * If `input` is 1D of shape `(N)`, it will be treated as a concatenation of multiple bags (sequences). 
`offsets` is required to be a 1D tensor containing the starting index positions of each bag in `input`. Therefore, for `offsets` of shape `(B)`, `input` will be viewed as having `B` bags. Empty bags (i.e., having 0-length) will have returned vectors filled by zeros. per_sample_weights (Tensor, optional): a tensor of float / double weights, or None to indicate all weights should be taken to be `1`. If specified, `per_sample_weights` must have exactly the same shape as input and is treated as having the same `offsets`, if those are not `None`. Only supported for `mode='sum'`. Output shape: `(B, embedding_dim)` Examples: >>> # an Embedding module containing 10 tensors of size 3 >>> embedding_sum = nn.EmbeddingBag(10, 3, mode='sum') >>> # a batch of 2 samples of 4 indices each >>> input = torch.LongTensor([1,2,4,5,4,3,2,9]) >>> offsets = torch.LongTensor([0,4]) >>> embedding_sum(input, offsets) tensor([[-0.8861, -5.4350, -0.0523], [ 1.1306, -2.5798, -1.0044]]) `classmethod from_pretrained(embeddings, freeze=True, max_norm=None, norm_type=2.0, scale_grad_by_freq=False, mode='mean', sparse=False, include_last_offset=False)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/sparse.html#EmbeddingBag.from_pretrained) Creates EmbeddingBag instance from given 2-dimensional FloatTensor. Parameters * **embeddings** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – FloatTensor containing weights for the EmbeddingBag. First dimension is being passed to EmbeddingBag as ‘num_embeddings’, second as ‘embedding_dim’. * **freeze** (_boolean_ _,__optional_) – If `True`, the tensor does not get updated in the learning process. Equivalent to `embeddingbag.weight.requires_grad = False`. Default: `True` * **max_norm** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _,__optional_) – See module initialization documentation. Default: `None` * **norm_type** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _,__optional_) – See module initialization documentation. Default `2`. * **scale_grad_by_freq** (_boolean_ _,__optional_) – See module initialization documentation. Default `False`. * **mode** (_string_ _,__optional_) – See module initialization documentation. Default: `"mean"` * **sparse** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – See module initialization documentation. Default: `False`. * **include_last_offset** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – See module initialization documentation. Default: `False`. Examples: >>> # FloatTensor containing pretrained weights >>> weight = torch.FloatTensor([[1, 2.3, 3], [4, 5.1, 6.3]]) >>> embeddingbag = nn.EmbeddingBag.from_pretrained(weight) >>> # Get embeddings for index 1 >>> input = torch.LongTensor([[1, 0]]) >>> embeddingbag(input) tensor([[ 2.5000, 3.7000, 4.6500]]) # Flatten `class torch.nn.Flatten(start_dim=1, end_dim=-1)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/flatten.html#Flatten) Flattens a contiguous range of dims into a tensor. For use with `Sequential`. Shape: * Input: (N,∗dims)(N, *dims) * Output: (N,∏∗dims)(N, \prod *dims) (for the default case). Parameters * **start_dim** – first dim to flatten (default = 1). * **end_dim** – last dim to flatten (default = -1). 
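A brief sketch of how `start_dim` and `end_dim` select which range of dimensions is collapsed (the tensor sizes here are arbitrary):

    import torch
    import torch.nn as nn

    x = torch.randn(2, 3, 4, 5)

    # Default: keep dim 0 (the batch dimension) and flatten dims 1..-1.
    print(nn.Flatten()(x).shape)             # torch.Size([2, 60])

    # Flatten only dims 2..3, leaving the leading dimensions untouched.
    print(nn.Flatten(start_dim=2)(x).shape)  # torch.Size([2, 3, 20])

    # Flatten dims 0..1, e.g. to merge batch and channel dimensions.
    print(nn.Flatten(0, 1)(x).shape)         # torch.Size([6, 4, 5])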
Examples:: >>> input = torch.randn(32, 1, 5, 5) >>> m = nn.Sequential( >>> nn.Conv2d(1, 32, 5, 1, 1), >>> nn.Flatten() >>> ) >>> output = m(input) >>> output.size() torch.Size([32, 288]) `add_module(name, module)` Adds a child module to the current module. The module can be accessed as an attribute using the given name. Parameters * **name** (_string_) – name of the child module. The child module can be accessed from this module using the given name * **module** ([Module](torch.nn.module#torch.nn.Module "torch.nn.Module")) – child module to be added to the module. `apply(fn)` Applies `fn` recursively to every submodule (as returned by `.children()`) as well as self. Typical use includes initializing the parameters of a model (see also [torch.nn.init](../nn.init#nn-init-doc)). Parameters **fn** ([`Module`](torch.nn.module#torch.nn.Module "torch.nn.Module") -> None) – function to be applied to each submodule Returns self Return type [Module](torch.nn.module#torch.nn.Module "torch.nn.Module") Example: >>> @torch.no_grad() >>> def init_weights(m): >>> print(m) >>> if type(m) == nn.Linear: >>> m.weight.fill_(1.0) >>> print(m.weight) >>> net = nn.Sequential(nn.Linear(2, 2), nn.Linear(2, 2)) >>> net.apply(init_weights) Linear(in_features=2, out_features=2, bias=True) Parameter containing: tensor([[ 1., 1.], [ 1., 1.]]) Linear(in_features=2, out_features=2, bias=True) Parameter containing: tensor([[ 1., 1.], [ 1., 1.]]) Sequential( (0): Linear(in_features=2, out_features=2, bias=True) (1): Linear(in_features=2, out_features=2, bias=True) ) Sequential( (0): Linear(in_features=2, out_features=2, bias=True) (1): Linear(in_features=2, out_features=2, bias=True) ) `bfloat16()` Casts all floating point parameters and buffers to `bfloat16` datatype. Returns self Return type [Module](torch.nn.module#torch.nn.Module "torch.nn.Module") `buffers(recurse=True)` Returns an iterator over module buffers. Parameters **recurse** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – if True, then yields buffers of this module and all submodules. Otherwise, yields only buffers that are direct members of this module. Yields _torch.Tensor_ – module buffer Example: >>> for buf in model.buffers(): >>> print(type(buf), buf.size()) (20L,) (20L, 1L, 5L, 5L) `children()` Returns an iterator over immediate children modules. Yields _Module_ – a child module `cpu()` Moves all model parameters and buffers to the CPU. Returns self Return type [Module](torch.nn.module#torch.nn.Module "torch.nn.Module") `cuda(device=None)` Moves all model parameters and buffers to the GPU. This also makes associated parameters and buffers different objects. So it should be called before constructing optimizer if the module will live on GPU while being optimized. Parameters **device** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – if specified, all parameters will be copied to that device Returns self Return type [Module](torch.nn.module#torch.nn.Module "torch.nn.Module") `double()` Casts all floating point parameters and buffers to `double` datatype. Returns self Return type [Module](torch.nn.module#torch.nn.Module "torch.nn.Module") `eval()` Sets the module in evaluation mode. This has any effect only on certain modules. See documentations of particular modules for details of their behaviors in training/evaluation mode, if they are affected, e.g. [`Dropout`](torch.nn.dropout#torch.nn.Dropout "torch.nn.Dropout"), `BatchNorm`, etc. 
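For instance, a minimal sketch of the effect on `Dropout` (names here are illustrative): in training mode the surviving elements are rescaled by 1/(1-p), while in evaluation mode the module acts as an identity:

    import torch
    import torch.nn as nn

    drop = nn.Dropout(p=0.5)
    x = torch.ones(8)

    drop.train()     # training mode: elements are zeroed or scaled by 1/(1-p) = 2
    print(drop(x))   # e.g. tensor([2., 0., 2., 2., 0., 0., 2., 2.]) (random)

    drop.eval()      # evaluation mode: dropout is a no-op
    assert torch.equal(drop(x), x)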
This is equivalent with [`self.train(False)`](torch.nn.module#torch.nn.Module.train "torch.nn.Module.train"). Returns self Return type [Module](torch.nn.module#torch.nn.Module "torch.nn.Module") `float()` Casts all floating point parameters and buffers to float datatype. Returns self Return type [Module](torch.nn.module#torch.nn.Module "torch.nn.Module") `half()` Casts all floating point parameters and buffers to `half` datatype. Returns self Return type [Module](torch.nn.module#torch.nn.Module "torch.nn.Module") `load_state_dict(state_dict, strict=True)` Copies parameters and buffers from `state_dict` into this module and its descendants. If `strict` is `True`, then the keys of `state_dict` must exactly match the keys returned by this module’s [`state_dict()`](torch.nn.module#torch.nn.Module.state_dict "torch.nn.Module.state_dict") function. Parameters * **state_dict** ([dict](https://docs.python.org/3/library/stdtypes.html#dict "\(in Python v3.9\)")) – a dict containing parameters and persistent buffers. * **strict** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – whether to strictly enforce that the keys in `state_dict` match the keys returned by this module’s [`state_dict()`](torch.nn.module#torch.nn.Module.state_dict "torch.nn.Module.state_dict") function. Default: `True` Returns * **missing_keys** is a list of str containing the missing keys * **unexpected_keys** is a list of str containing the unexpected keys Return type `NamedTuple` with `missing_keys` and `unexpected_keys` fields `modules()` Returns an iterator over all modules in the network. Yields _Module_ – a module in the network Note Duplicate modules are returned only once. In the following example, `l` will be returned only once. Example: >>> l = nn.Linear(2, 2) >>> net = nn.Sequential(l, l) >>> for idx, m in enumerate(net.modules()): print(idx, '->', m) 0 -> Sequential( (0): Linear(in_features=2, out_features=2, bias=True) (1): Linear(in_features=2, out_features=2, bias=True) ) 1 -> Linear(in_features=2, out_features=2, bias=True) `named_buffers(prefix='', recurse=True)` Returns an iterator over module buffers, yielding both the name of the buffer as well as the buffer itself. Parameters * **prefix** ([str](https://docs.python.org/3/library/stdtypes.html#str "\(in Python v3.9\)")) – prefix to prepend to all buffer names. * **recurse** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – if True, then yields buffers of this module and all submodules. Otherwise, yields only buffers that are direct members of this module. Yields _(string, torch.Tensor)_ – Tuple containing the name and buffer Example: >>> for name, buf in self.named_buffers(): >>> if name in ['running_var']: >>> print(buf.size()) `named_children()` Returns an iterator over immediate children modules, yielding both the name of the module as well as the module itself. Yields _(string, Module)_ – Tuple containing a name and child module Example: >>> for name, module in model.named_children(): >>> if name in ['conv4', 'conv5']: >>> print(module) `named_modules(memo=None, prefix='')` Returns an iterator over all modules in the network, yielding both the name of the module as well as the module itself. Yields _(string, Module)_ – Tuple of name and module Note Duplicate modules are returned only once. In the following example, `l` will be returned only once. 
Example: >>> l = nn.Linear(2, 2) >>> net = nn.Sequential(l, l) >>> for idx, m in enumerate(net.named_modules()): print(idx, '->', m) 0 -> ('', Sequential( (0): Linear(in_features=2, out_features=2, bias=True) (1): Linear(in_features=2, out_features=2, bias=True) )) 1 -> ('0', Linear(in_features=2, out_features=2, bias=True)) `named_parameters(prefix='', recurse=True)` Returns an iterator over module parameters, yielding both the name of the parameter as well as the parameter itself. Parameters * **prefix** ([str](https://docs.python.org/3/library/stdtypes.html#str "\(in Python v3.9\)")) – prefix to prepend to all parameter names. * **recurse** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – if True, then yields parameters of this module and all submodules. Otherwise, yields only parameters that are direct members of this module. Yields _(string, Parameter)_ – Tuple containing the name and parameter Example: >>> for name, param in self.named_parameters(): >>> if name in ['bias']: >>> print(param.size()) `parameters(recurse=True)` Returns an iterator over module parameters. This is typically passed to an optimizer. Parameters **recurse** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – if True, then yields parameters of this module and all submodules. Otherwise, yields only parameters that are direct members of this module. Yields _Parameter_ – module parameter Example: >>> for param in model.parameters(): >>> print(type(param), param.size()) (20L,) (20L, 1L, 5L, 5L) `register_backward_hook(hook)` Registers a backward hook on the module. This function is deprecated in favor of `nn.Module.register_full_backward_hook()` and the behavior of this function will change in future versions. Returns a handle that can be used to remove the added hook by calling `handle.remove()` Return type `torch.utils.hooks.RemovableHandle` `register_buffer(name, tensor, persistent=True)` Adds a buffer to the module. This is typically used to register a buffer that should not to be considered a model parameter. For example, BatchNorm’s `running_mean` is not a parameter, but is part of the module’s state. Buffers, by default, are persistent and will be saved alongside parameters. This behavior can be changed by setting `persistent` to `False`. The only difference between a persistent buffer and a non-persistent buffer is that the latter will not be a part of this module’s `state_dict`. Buffers can be accessed as attributes using given names. Parameters * **name** (_string_) – name of the buffer. The buffer can be accessed from this module using the given name * **tensor** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – buffer to be registered. * **persistent** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – whether the buffer is part of this module’s `state_dict`. Example: >>> self.register_buffer('running_mean', torch.zeros(num_features)) `register_forward_hook(hook)` Registers a forward hook on the module. The hook will be called every time after `forward()` has computed an output. It should have the following signature: hook(module, input, output) -> None or modified output The input contains only the positional arguments given to the module. Keyword arguments won’t be passed to the hooks and only to the `forward`. The hook can modify the output. It can modify the input inplace but it will not have effect on forward since this is called after `forward()` is called. 
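A minimal sketch of a forward hook that records output shapes (the module and hook names are illustrative; returning `None` leaves the outputs unchanged):

    import torch
    import torch.nn as nn

    net = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))
    shapes = []

    def record_shape(module, input, output):
        # Called after each forward() of the hooked module;
        # returning None keeps the output as-is.
        shapes.append((type(module).__name__, tuple(output.shape)))

    handles = [m.register_forward_hook(record_shape) for m in net]

    net(torch.randn(3, 4))
    print(shapes)    # [('Linear', (3, 8)), ('ReLU', (3, 8)), ('Linear', (3, 2))]

    for h in handles:
        h.remove()   # detach the hooks when they are no longer needed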
Returns a handle that can be used to remove the added hook by calling `handle.remove()` Return type `torch.utils.hooks.RemovableHandle` `register_forward_pre_hook(hook)` Registers a forward pre-hook on the module. The hook will be called every time before `forward()` is invoked. It should have the following signature: hook(module, input) -> None or modified input The input contains only the positional arguments given to the module. Keyword arguments won’t be passed to the hooks and only to the `forward`. The hook can modify the input. User can either return a tuple or a single modified value in the hook. We will wrap the value into a tuple if a single value is returned(unless that value is already a tuple). Returns a handle that can be used to remove the added hook by calling `handle.remove()` Return type `torch.utils.hooks.RemovableHandle` `register_full_backward_hook(hook)` Registers a backward hook on the module. The hook will be called every time the gradients with respect to module inputs are computed. The hook should have the following signature: hook(module, grad_input, grad_output) -> tuple(Tensor) or None The `grad_input` and `grad_output` are tuples that contain the gradients with respect to the inputs and outputs respectively. The hook should not modify its arguments, but it can optionally return a new gradient with respect to the input that will be used in place of `grad_input` in subsequent computations. `grad_input` will only correspond to the inputs given as positional arguments and all kwarg arguments are ignored. Entries in `grad_input` and `grad_output` will be `None` for all non-Tensor arguments. Warning Modifying inputs or outputs inplace is not allowed when using backward hooks and will raise an error. Returns a handle that can be used to remove the added hook by calling `handle.remove()` Return type `torch.utils.hooks.RemovableHandle` `register_parameter(name, param)` Adds a parameter to the module. The parameter can be accessed as an attribute using given name. Parameters * **name** (_string_) – name of the parameter. The parameter can be accessed from this module using the given name * **param** ([Parameter](torch.nn.parameter.parameter#torch.nn.parameter.Parameter "torch.nn.parameter.Parameter")) – parameter to be added to the module. `requires_grad_(requires_grad=True)` Change if autograd should record operations on parameters in this module. This method sets the parameters’ `requires_grad` attributes in-place. This method is helpful for freezing part of the module for finetuning or training parts of a model individually (e.g., GAN training). Parameters **requires_grad** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – whether autograd should record operations on parameters in this module. Default: `True`. Returns self Return type [Module](torch.nn.module#torch.nn.Module "torch.nn.Module") `state_dict(destination=None, prefix='', keep_vars=False)` Returns a dictionary containing a whole state of the module. Both parameters and persistent buffers (e.g. running averages) are included. Keys are corresponding parameter and buffer names. Returns a dictionary containing a whole state of the module Return type [dict](https://docs.python.org/3/library/stdtypes.html#dict "\(in Python v3.9\)") Example: >>> module.state_dict().keys() ['bias', 'weight'] `to(*args, **kwargs)` Moves and/or casts the parameters and buffers. 
This can be called as `to(device=None, dtype=None, non_blocking=False)` `to(dtype, non_blocking=False)` `to(tensor, non_blocking=False)` `to(memory_format=torch.channels_last)` Its signature is similar to [`torch.Tensor.to()`](../tensors#torch.Tensor.to "torch.Tensor.to"), but only accepts floating point or complex `dtype`s. In addition, this method will only cast the floating point or complex parameters and buffers to :attr:`dtype` (if given). The integral parameters and buffers will be moved `device`, if that is given, but with dtypes unchanged. When `non_blocking` is set, it tries to convert/move asynchronously with respect to the host if possible, e.g., moving CPU Tensors with pinned memory to CUDA devices. See below for examples. Note This method modifies the module in-place. Parameters * **device** (`torch.device`) – the desired device of the parameters and buffers in this module * **dtype** (`torch.dtype`) – the desired floating point or complex dtype of the parameters and buffers in this module * **tensor** ([torch.Tensor](../tensors#torch.Tensor "torch.Tensor")) – Tensor whose dtype and device are the desired dtype and device for all parameters and buffers in this module * **memory_format** (`torch.memory_format`) – the desired memory format for 4D parameters and buffers in this module (keyword only argument) Returns self Return type [Module](torch.nn.module#torch.nn.Module "torch.nn.Module") Examples: >>> linear = nn.Linear(2, 2) >>> linear.weight Parameter containing: tensor([[ 0.1913, -0.3420], [-0.5113, -0.2325]]) >>> linear.to(torch.double) Linear(in_features=2, out_features=2, bias=True) >>> linear.weight Parameter containing: tensor([[ 0.1913, -0.3420], [-0.5113, -0.2325]], dtype=torch.float64) >>> gpu1 = torch.device("cuda:1") >>> linear.to(gpu1, dtype=torch.half, non_blocking=True) Linear(in_features=2, out_features=2, bias=True) >>> linear.weight Parameter containing: tensor([[ 0.1914, -0.3420], [-0.5112, -0.2324]], dtype=torch.float16, device='cuda:1') >>> cpu = torch.device("cpu") >>> linear.to(cpu) Linear(in_features=2, out_features=2, bias=True) >>> linear.weight Parameter containing: tensor([[ 0.1914, -0.3420], [-0.5112, -0.2324]], dtype=torch.float16) >>> linear = nn.Linear(2, 2, bias=None).to(torch.cdouble) >>> linear.weight Parameter containing: tensor([[ 0.3741+0.j, 0.2382+0.j], [ 0.5593+0.j, -0.4443+0.j]], dtype=torch.complex128) >>> linear(torch.ones(3, 2, dtype=torch.cdouble)) tensor([[0.6122+0.j, 0.1150+0.j], [0.6122+0.j, 0.1150+0.j], [0.6122+0.j, 0.1150+0.j]], dtype=torch.complex128) `train(mode=True)` Sets the module in training mode. This has any effect only on certain modules. See documentations of particular modules for details of their behaviors in training/evaluation mode, if they are affected, e.g. [`Dropout`](torch.nn.dropout#torch.nn.Dropout "torch.nn.Dropout"), `BatchNorm`, etc. Parameters **mode** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – whether to set training mode (`True`) or evaluation mode (`False`). Default: `True`. Returns self Return type [Module](torch.nn.module#torch.nn.Module "torch.nn.Module") `type(dst_type)` Casts all parameters and buffers to `dst_type`. Parameters **dst_type** ([type](https://docs.python.org/3/library/functions.html#type "\(in Python v3.9\)") _or_ _string_) – the desired type Returns self Return type [Module](torch.nn.module#torch.nn.Module "torch.nn.Module") `xpu(device=None)` Moves all model parameters and buffers to the XPU. 
This also makes associated parameters and buffers different objects. So it should be called before constructing optimizer if the module will live on XPU while being optimized. Parameters **device** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – if specified, all parameters will be copied to that device Returns self Return type [Module](torch.nn.module#torch.nn.Module "torch.nn.Module") `zero_grad(set_to_none=False)` Sets gradients of all model parameters to zero. See similar function under [`torch.optim.Optimizer`](../optim#torch.optim.Optimizer "torch.optim.Optimizer") for more context. Parameters **set_to_none** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – instead of setting to zero, set the grads to None. See [`torch.optim.Optimizer.zero_grad()`](../optim#torch.optim.Optimizer.zero_grad "torch.optim.Optimizer.zero_grad") for details. # Fold `class torch.nn.Fold(output_size, kernel_size, dilation=1, padding=0, stride=1)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/fold.html#Fold) Combines an array of sliding local blocks into a large containing tensor. Consider a batched `input` tensor containing sliding local blocks, e.g., patches of images, of shape (N,C×∏(kernel_size),L)(N, C \times \prod(\text{kernel\\_size}), L) , where NN is batch dimension, C×∏(kernel_size)C \times \prod(\text{kernel\\_size}) is the number of values within a block (a block has ∏(kernel_size)\prod(\text{kernel\\_size}) spatial locations each containing a CC -channeled vector), and LL is the total number of blocks. (This is exactly the same specification as the output shape of [`Unfold`](torch.nn.unfold#torch.nn.Unfold "torch.nn.Unfold").) This operation combines these local blocks into the large `output` tensor of shape (N,C,output_size[0],output_size[1],…)(N, C, \text{output\\_size}[0], \text{output\\_size}[1], \dots) by summing the overlapping values. Similar to [`Unfold`](torch.nn.unfold#torch.nn.Unfold "torch.nn.Unfold"), the arguments must satisfy L=∏d⌊output_size[d]+2×padding[d]−dilation[d]×(kernel_size[d]−1)−1stride[d]+1⌋,L = \prod_d \left\lfloor\frac{\text{output\\_size}[d] + 2 \times \text{padding}[d] % - \text{dilation}[d] \times (\text{kernel\\_size}[d] - 1) - 1}{\text{stride}[d]} + 1\right\rfloor, where dd is over all spatial dimensions. * `output_size` describes the spatial shape of the large containing tensor of the sliding local blocks. It is useful to resolve the ambiguity when multiple input shapes map to same number of sliding blocks, e.g., with `stride > 0`. The `padding`, `stride` and `dilation` arguments specify how the sliding blocks are retrieved. * `stride` controls the stride for the sliding blocks. * `padding` controls the amount of implicit zero-paddings on both sides for `padding` number of points for each dimension before reshaping. * `dilation` controls the spacing between the kernel points; also known as the à trous algorithm. It is harder to describe, but this [link](https://github.com/vdumoulin/conv_arithmetic/blob/master/README.md) has a nice visualization of what `dilation` does. 
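A small sketch of how these arguments determine the number of blocks L (here with a 4 x 5 spatial size, a 2 x 2 kernel, stride 1, no padding, and no dilation, matching the `Fold` example further below):

    import torch
    import torch.nn as nn

    # With spatial size 4 x 5, kernel 2 x 2, stride 1, padding 0, dilation 1:
    #   L = floor((4 - 2)/1 + 1) * floor((5 - 2)/1 + 1) = 3 * 4 = 12
    unfold = nn.Unfold(kernel_size=(2, 2))
    blocks = unfold(torch.randn(1, 3, 4, 5))
    print(blocks.shape)        # torch.Size([1, 12, 12]) -> (N, C * 2 * 2, L) with L = 12

    # Folding back onto a 4 x 5 output consumes exactly those L = 12 blocks.
    fold = nn.Fold(output_size=(4, 5), kernel_size=(2, 2))
    print(fold(blocks).shape)  # torch.Size([1, 3, 4, 5])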
Parameters * **output_size** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _or_[tuple](https://docs.python.org/3/library/stdtypes.html#tuple "\(in Python v3.9\)")) – the shape of the spatial dimensions of the output (i.e., `output.sizes()[2:]`) * **kernel_size** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _or_[tuple](https://docs.python.org/3/library/stdtypes.html#tuple "\(in Python v3.9\)")) – the size of the sliding blocks * **stride** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _or_[tuple](https://docs.python.org/3/library/stdtypes.html#tuple "\(in Python v3.9\)")) – the stride of the sliding blocks in the input spatial dimensions. Default: 1 * **padding** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _or_[tuple](https://docs.python.org/3/library/stdtypes.html#tuple "\(in Python v3.9\)") _,__optional_) – implicit zero padding to be added on both sides of input. Default: 0 * **dilation** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _or_[tuple](https://docs.python.org/3/library/stdtypes.html#tuple "\(in Python v3.9\)") _,__optional_) – a parameter that controls the stride of elements within the neighborhood. Default: 1 * If `output_size`, `kernel_size`, `dilation`, `padding` or `stride` is an int or a tuple of length 1 then their values will be replicated across all spatial dimensions. * For the case of two output spatial dimensions this operation is sometimes called `col2im`. Note `Fold` calculates each combined value in the resulting large tensor by summing all values from all containing blocks. [`Unfold`](torch.nn.unfold#torch.nn.Unfold "torch.nn.Unfold") extracts the values in the local blocks by copying from the large tensor. So, if the blocks overlap, they are not inverses of each other. In general, folding and unfolding operations are related as follows. Consider `Fold` and [`Unfold`](torch.nn.unfold#torch.nn.Unfold "torch.nn.Unfold") instances created with the same parameters: >>> fold_params = dict(kernel_size=..., dilation=..., padding=..., stride=...) >>> fold = nn.Fold(output_size=..., **fold_params) >>> unfold = nn.Unfold(**fold_params) Then for any (supported) `input` tensor the following equality holds: fold(unfold(input)) == divisor * input where `divisor` is a tensor that depends only on the shape and dtype of the `input`: >>> input_ones = torch.ones(input.shape, dtype=input.dtype) >>> divisor = fold(unfold(input_ones)) When the `divisor` tensor contains no zero elements, then `fold` and `unfold` operations are inverses of each other (up to constant divisor). Warning Currently, only 4-D output tensors (batched image-like tensors) are supported. Shape: * Input: (N,C×∏(kernel_size),L)(N, C \times \prod(\text{kernel\\_size}), L) * Output: (N,C,output_size[0],output_size[1],…)(N, C, \text{output\\_size}[0], \text{output\\_size}[1], \dots) as described above Examples: >>> fold = nn.Fold(output_size=(4, 5), kernel_size=(2, 2)) >>> input = torch.randn(1, 3 * 2 * 2, 12) >>> output = fold(input) >>> output.size() torch.Size([1, 3, 4, 5]) # FractionalMaxPool2d `class torch.nn.FractionalMaxPool2d(kernel_size, output_size=None, output_ratio=None, return_indices=False, _random_samples=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/pooling.html#FractionalMaxPool2d) Applies a 2D fractional max pooling over an input signal composed of several input planes. 
Fractional MaxPooling is described in detail in the paper [Fractional MaxPooling](https://arxiv.org/abs/1412.6071) by Ben Graham The max-pooling operation is applied in kH×kWkH \times kW regions by a stochastic step size determined by the target output size. The number of output features is equal to the number of input planes. Parameters * **kernel_size** – the size of the window to take a max over. Can be a single number k (for a square kernel of k x k) or a tuple `(kh, kw)` * **output_size** – the target output size of the image of the form `oH x oW`. Can be a tuple `(oH, oW)` or a single number oH for a square image `oH x oH` * **output_ratio** – If one wants to have an output size as a ratio of the input size, this option can be given. This has to be a number or tuple in the range (0, 1) * **return_indices** – if `True`, will return the indices along with the outputs. Useful to pass to `nn.MaxUnpool2d()`. Default: `False` #### Examples >>> # pool of square window of size=3, and target output size 13x12 >>> m = nn.FractionalMaxPool2d(3, output_size=(13, 12)) >>> # pool of square window and target output size being half of input image size >>> m = nn.FractionalMaxPool2d(3, output_ratio=(0.5, 0.5)) >>> input = torch.randn(20, 16, 50, 32) >>> output = m(input) # GaussianNLLLoss `class torch.nn.GaussianNLLLoss(*, full=False, eps=1e-06, reduction='mean')` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/loss.html#GaussianNLLLoss) Gaussian negative log likelihood loss. The targets are treated as samples from Gaussian distributions with expectations and variances predicted by the neural network. For a D-dimensional `target` tensor modelled as having heteroscedastic Gaussian distributions with a D-dimensional tensor of expectations `input` and a D-dimensional tensor of positive variances `var` the loss is: loss=12∑i=1D(log⁡(max(var[i], eps))+(input[i]−target[i])2max(var[i], eps))+const.\text{loss} = \frac{1}{2}\sum_{i=1}^D \left(\log\left(\text{max}\left(\text{var}[i], \ \text{eps}\right)\right) + \frac{\left(\text{input}[i] - \text{target}[i]\right)^2} {\text{max}\left(\text{var}[i], \ \text{eps}\right)}\right) + \text{const.} where `eps` is used for stability. By default, the constant term of the loss function is omitted unless `full` is `True`. If `var` is a scalar (implying `target` tensor has homoscedastic Gaussian distributions) it is broadcasted to be the same size as the input. Parameters * **full** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – include the constant term in the loss calculation. Default: `False`. * **eps** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _,__optional_) – value used to clamp `var` (see note below), for stability. Default: 1e-6. * **reduction** (_string_ _,__optional_) – specifies the reduction to apply to the output:`'none'` | `'mean'` | `'sum'`. `'none'`: no reduction will be applied, `'mean'`: the output is the average of all batch member losses, `'sum'`: the output is the sum of all batch member losses. Default: `'mean'`. Shape: * Input: (N,∗)(N, *) where ∗* means any number of additional dimensions * Target: (N,∗)(N, *) , same shape as the input * Var: (N,1)(N, 1) or (N,∗)(N, *) , same shape as the input * Output: scalar if `reduction` is `'mean'` (default) or `'sum'`. 
If `reduction` is `'none'`, then (N)(N) Examples: >>> loss = nn.GaussianNLLLoss() >>> input = torch.randn(5, 2, requires_grad=True) >>> target = torch.randn(5, 2) >>> var = torch.ones(5, 2, requires_grad=True) #heteroscedastic >>> output = loss(input, target, var) >>> output.backward() >>> loss = nn.GaussianNLLLoss() >>> input = torch.randn(5, 2, requires_grad=True) >>> target = torch.randn(5, 2) >>> var = torch.ones(5, 1, requires_grad=True) #homoscedastic >>> output = loss(input, target, var) >>> output.backward() Note The clamping of `var` is ignored with respect to autograd, and so the gradients are unaffected by it. Reference: Nix, D. A. and Weigend, A. S., “Estimating the mean and variance of the target probability distribution”, Proceedings of 1994 IEEE International Conference on Neural Networks (ICNN’94), Orlando, FL, USA, 1994, pp. 55-60 vol.1, doi: 10.1109/ICNN.1994.374138. # GELU `class torch.nn.GELU` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/activation.html#GELU) Applies the Gaussian Error Linear Units function: GELU(x)=x∗Φ(x)\text{GELU}(x) = x * \Phi(x) where Φ(x)\Phi(x) is the Cumulative Distribution Function for Gaussian Distribution. Shape: * Input: (N,∗)(N, *) where `*` means, any number of additional dimensions * Output: (N,∗)(N, *) , same shape as the input Examples: >>> m = nn.GELU() >>> input = torch.randn(2) >>> output = m(input) # GroupNorm `class torch.nn.GroupNorm(num_groups, num_channels, eps=1e-05, affine=True)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/normalization.html#GroupNorm) Applies Group Normalization over a mini-batch of inputs as described in the paper [Group Normalization](https://arxiv.org/abs/1803.08494) y=x−E[x]Var[x]+ϵ∗γ+βy = \frac{x - \mathrm{E}[x]}{ \sqrt{\mathrm{Var}[x] + \epsilon}} * \gamma + \beta The input channels are separated into `num_groups` groups, each containing `num_channels / num_groups` channels. The mean and standard-deviation are calculated separately over the each group. γ\gamma and β\beta are learnable per-channel affine transform parameter vectors of size `num_channels` if `affine` is `True`. The standard-deviation is calculated via the biased estimator, equivalent to `torch.var(input, unbiased=False)`. This layer uses statistics computed from input data in both training and evaluation modes. Parameters * **num_groups** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – number of groups to separate the channels into * **num_channels** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – number of channels expected in input * **eps** – a value added to the denominator for numerical stability. Default: 1e-5 * **affine** – a boolean value that when set to `True`, this module has learnable per-channel affine parameters initialized to ones (for weights) and zeros (for biases). Default: `True`. 
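The grouping can be made concrete by reproducing the normalization by hand; a sketch with `affine=False` so that only the statistics matter (the sizes are arbitrary):

    import torch
    import torch.nn as nn

    N, C, H, W, groups = 4, 6, 10, 10, 3
    x = torch.randn(N, C, H, W)

    gn = nn.GroupNorm(groups, C, affine=False)

    # Manual computation: mean and (biased) variance are taken over each
    # group of C / groups channels, separately per sample.
    xg = x.view(N, groups, -1)
    mean = xg.mean(dim=-1, keepdim=True)
    var = xg.var(dim=-1, unbiased=False, keepdim=True)
    manual = ((xg - mean) / torch.sqrt(var + gn.eps)).view(N, C, H, W)

    assert torch.allclose(gn(x), manual, atol=1e-5)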
Shape: * Input: (N,C,∗)(N, C, *) where C=num_channelsC=\text{num\\_channels} * Output: (N,C,∗)(N, C, *) (same shape as input) Examples: >>> input = torch.randn(20, 6, 10, 10) >>> # Separate 6 channels into 3 groups >>> m = nn.GroupNorm(3, 6) >>> # Separate 6 channels into 6 groups (equivalent with InstanceNorm) >>> m = nn.GroupNorm(6, 6) >>> # Put all 6 channels into a single group (equivalent with LayerNorm) >>> m = nn.GroupNorm(1, 6) >>> # Activating the module >>> output = m(input) # GRU `class torch.nn.GRU(*args, **kwargs)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/rnn.html#GRU) Applies a multi-layer gated recurrent unit (GRU) RNN to an input sequence. For each element in the input sequence, each layer computes the following function: rt=σ(Wirxt+bir+Whrh(t−1)+bhr)zt=σ(Wizxt+biz+Whzh(t−1)+bhz)nt=tanh⁡(Winxt+bin+rt∗(Whnh(t−1)+bhn))ht=(1−zt)∗nt+zt∗h(t−1)\begin{array}{ll} r_t = \sigma(W_{ir} x_t + b_{ir} + W_{hr} h_{(t-1)} + b_{hr}) \\\ z_t = \sigma(W_{iz} x_t + b_{iz} + W_{hz} h_{(t-1)} + b_{hz}) \\\ n_t = \tanh(W_{in} x_t + b_{in} + r_t * (W_{hn} h_{(t-1)}+ b_{hn})) \\\ h_t = (1 - z_t) * n_t + z_t * h_{(t-1)} \end{array} where hth_t is the hidden state at time `t`, xtx_t is the input at time `t`, h(t−1)h_{(t-1)} is the hidden state of the layer at time `t-1` or the initial hidden state at time `0`, and rtr_t , ztz_t , ntn_t are the reset, update, and new gates, respectively. σ\sigma is the sigmoid function, and ∗* is the Hadamard product. In a multilayer GRU, the input xt(l)x^{(l)}_t of the ll -th layer (l>=2l >= 2 ) is the hidden state ht(l−1)h^{(l-1)}_t of the previous layer multiplied by dropout δt(l−1)\delta^{(l-1)}_t where each δt(l−1)\delta^{(l-1)}_t is a Bernoulli random variable which is 00 with probability `dropout`. Parameters * **input_size** – The number of expected features in the input `x` * **hidden_size** – The number of features in the hidden state `h` * **num_layers** – Number of recurrent layers. E.g., setting `num_layers=2` would mean stacking two GRUs together to form a `stacked GRU`, with the second GRU taking in outputs of the first GRU and computing the final results. Default: 1 * **bias** – If `False`, then the layer does not use bias weights `b_ih` and `b_hh`. Default: `True` * **batch_first** – If `True`, then the input and output tensors are provided as (batch, seq, feature). Default: `False` * **dropout** – If non-zero, introduces a `Dropout` layer on the outputs of each GRU layer except the last layer, with dropout probability equal to `dropout`. Default: 0 * **bidirectional** – If `True`, becomes a bidirectional GRU. Default: `False` Inputs: input, h_0 * **input** of shape `(seq_len, batch, input_size)`: tensor containing the features of the input sequence. The input can also be a packed variable length sequence. See [`torch.nn.utils.rnn.pack_padded_sequence()`](torch.nn.utils.rnn.pack_padded_sequence#torch.nn.utils.rnn.pack_padded_sequence "torch.nn.utils.rnn.pack_padded_sequence") for details. * **h_0** of shape `(num_layers * num_directions, batch, hidden_size)`: tensor containing the initial hidden state for each element in the batch. Defaults to zero if not provided. If the RNN is bidirectional, num_directions should be 2, else it should be 1. Outputs: output, h_n * **output** of shape `(seq_len, batch, num_directions * hidden_size)`: tensor containing the output features h_t from the last layer of the GRU, for each `t`. 
If a [`torch.nn.utils.rnn.PackedSequence`](torch.nn.utils.rnn.packedsequence#torch.nn.utils.rnn.PackedSequence "torch.nn.utils.rnn.PackedSequence") has been given as the input, the output will also be a packed sequence. For the unpacked case, the directions can be separated using `output.view(seq_len, batch, num_directions, hidden_size)`, with forward and backward being direction `0` and `1` respectively. Similarly, the directions can be separated in the packed case. * **h_n** of shape `(num_layers * num_directions, batch, hidden_size)`: tensor containing the hidden state for `t = seq_len` Like _output_ , the layers can be separated using `h_n.view(num_layers, num_directions, batch, hidden_size)`. Shape: * Input1: (L,N,Hin)(L, N, H_{in}) tensor containing input features where Hin=input_sizeH_{in}=\text{input\\_size} and `L` represents a sequence length. * Input2: (S,N,Hout)(S, N, H_{out}) tensor containing the initial hidden state for each element in the batch. Hout=hidden_sizeH_{out}=\text{hidden\\_size} Defaults to zero if not provided. where S=num_layers∗num_directionsS=\text{num\\_layers} * \text{num\\_directions} If the RNN is bidirectional, num_directions should be 2, else it should be 1. * Output1: (L,N,Hall)(L, N, H_{all}) where Hall=num_directions∗hidden_sizeH_{all}=\text{num\\_directions} * \text{hidden\\_size} * Output2: (S,N,Hout)(S, N, H_{out}) tensor containing the next hidden state for each element in the batch Variables * **~GRU.weight_ih_l[k]** – the learnable input-hidden weights of the kth\text{k}^{th} layer (W_ir|W_iz|W_in), of shape `(3*hidden_size, input_size)` for `k = 0`. Otherwise, the shape is `(3*hidden_size, num_directions * hidden_size)` * **~GRU.weight_hh_l[k]** – the learnable hidden-hidden weights of the kth\text{k}^{th} layer (W_hr|W_hz|W_hn), of shape `(3*hidden_size, hidden_size)` * **~GRU.bias_ih_l[k]** – the learnable input-hidden bias of the kth\text{k}^{th} layer (b_ir|b_iz|b_in), of shape `(3*hidden_size)` * **~GRU.bias_hh_l[k]** – the learnable hidden-hidden bias of the kth\text{k}^{th} layer (b_hr|b_hz|b_hn), of shape `(3*hidden_size)` Note All the weights and biases are initialized from U(−k,k)\mathcal{U}(-\sqrt{k}, \sqrt{k}) where k=1hidden_sizek = \frac{1}{\text{hidden\\_size}} Orphan Note If the following conditions are satisfied: 1) cudnn is enabled, 2) input data is on the GPU 3) input data has dtype `torch.float16` 4) V100 GPU is used, 5) input data is not in `PackedSequence` format persistent algorithm can be selected to improve performance. Examples: >>> rnn = nn.GRU(10, 20, 2) >>> input = torch.randn(5, 3, 10) >>> h0 = torch.randn(2, 3, 20) >>> output, hn = rnn(input, h0) # GRUCell `class torch.nn.GRUCell(input_size, hidden_size, bias=True)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/rnn.html#GRUCell) A gated recurrent unit (GRU) cell r=σ(Wirx+bir+Whrh+bhr)z=σ(Wizx+biz+Whzh+bhz)n=tanh⁡(Winx+bin+r∗(Whnh+bhn))h′=(1−z)∗n+z∗h\begin{array}{ll} r = \sigma(W_{ir} x + b_{ir} + W_{hr} h + b_{hr}) \\\ z = \sigma(W_{iz} x + b_{iz} + W_{hz} h + b_{hz}) \\\ n = \tanh(W_{in} x + b_{in} + r * (W_{hn} h + b_{hn})) \\\ h' = (1 - z) * n + z * h \end{array} where σ\sigma is the sigmoid function, and ∗* is the Hadamard product. Parameters * **input_size** – The number of expected features in the input `x` * **hidden_size** – The number of features in the hidden state `h` * **bias** – If `False`, then the layer does not use bias weights `b_ih` and `b_hh`. 
Default: `True` Inputs: input, hidden * **input** of shape `(batch, input_size)`: tensor containing input features * **hidden** of shape `(batch, hidden_size)`: tensor containing the initial hidden state for each element in the batch. Defaults to zero if not provided. Outputs: h’ * **h’** of shape `(batch, hidden_size)`: tensor containing the next hidden state for each element in the batch Shape: * Input1: (N,Hin)(N, H_{in}) tensor containing input features where HinH_{in} = `input_size` * Input2: (N,Hout)(N, H_{out}) tensor containing the initial hidden state for each element in the batch where HoutH_{out} = `hidden_size` Defaults to zero if not provided. * Output: (N,Hout)(N, H_{out}) tensor containing the next hidden state for each element in the batch Variables * **~GRUCell.weight_ih** – the learnable input-hidden weights, of shape `(3*hidden_size, input_size)` * **~GRUCell.weight_hh** – the learnable hidden-hidden weights, of shape `(3*hidden_size, hidden_size)` * **~GRUCell.bias_ih** – the learnable input-hidden bias, of shape `(3*hidden_size)` * **~GRUCell.bias_hh** – the learnable hidden-hidden bias, of shape `(3*hidden_size)` Note All the weights and biases are initialized from U(−k,k)\mathcal{U}(-\sqrt{k}, \sqrt{k}) where k=1hidden_sizek = \frac{1}{\text{hidden\\_size}} Examples: >>> rnn = nn.GRUCell(10, 20) >>> input = torch.randn(6, 3, 10) >>> hx = torch.randn(3, 20) >>> output = [] >>> for i in range(6): hx = rnn(input[i], hx) output.append(hx) # Hardshrink `class torch.nn.Hardshrink(lambd=0.5)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/activation.html#Hardshrink) Applies the hard shrinkage function element-wise: HardShrink(x)={x, if x>λx, if x<−λ0, otherwise \text{HardShrink}(x) = \begin{cases} x, & \text{ if } x > \lambda \\\ x, & \text{ if } x < -\lambda \\\ 0, & \text{ otherwise } \end{cases} Parameters **lambd** – the λ\lambda value for the Hardshrink formulation. Default: 0.5 Shape: * Input: (N,∗)(N, *) where `*` means, any number of additional dimensions * Output: (N,∗)(N, *) , same shape as the input Examples: >>> m = nn.Hardshrink() >>> input = torch.randn(2) >>> output = m(input) # Hardsigmoid `class torch.nn.Hardsigmoid(inplace=False)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/activation.html#Hardsigmoid) Applies the element-wise function: Hardsigmoid(x)={0if x≤−3,1if x≥+3,x/6+1/2otherwise\text{Hardsigmoid}(x) = \begin{cases} 0 & \text{if~} x \le -3, \\\ 1 & \text{if~} x \ge +3, \\\ x / 6 + 1 / 2 & \text{otherwise} \end{cases} Parameters **inplace** – can optionally do the operation in-place. Default: `False` Shape: * Input: (N,∗)(N, *) where `*` means, any number of additional dimensions * Output: (N,∗)(N, *) , same shape as the input Examples: >>> m = nn.Hardsigmoid() >>> input = torch.randn(2) >>> output = m(input) # Hardswish `class torch.nn.Hardswish(inplace=False)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/activation.html#Hardswish) Applies the hardswish function, element-wise, as described in the paper: [Searching for MobileNetV3](https://arxiv.org/abs/1905.02244). Hardswish(x)={0if x≤−3,xif x≥+3,x⋅(x+3)/6otherwise\text{Hardswish}(x) = \begin{cases} 0 & \text{if~} x \le -3, \\\ x & \text{if~} x \ge +3, \\\ x \cdot (x + 3) /6 & \text{otherwise} \end{cases} Parameters **inplace** – can optionally do the operation in-place. 
Default: `False` Shape: * Input: (N,∗)(N, *) where `*` means, any number of additional dimensions * Output: (N,∗)(N, *) , same shape as the input Examples: >>> m = nn.Hardswish() >>> input = torch.randn(2) >>> output = m(input) # Hardtanh `class torch.nn.Hardtanh(min_val=-1.0, max_val=1.0, inplace=False, min_value=None, max_value=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/activation.html#Hardtanh) Applies the HardTanh function element-wise HardTanh is defined as: HardTanh(x)={1 if x>1−1 if x<−1x otherwise \text{HardTanh}(x) = \begin{cases} 1 & \text{ if } x > 1 \\\ -1 & \text{ if } x < -1 \\\ x & \text{ otherwise } \\\ \end{cases} The range of the linear region [−1,1][-1, 1] can be adjusted using `min_val` and `max_val`. Parameters * **min_val** – minimum value of the linear region range. Default: -1 * **max_val** – maximum value of the linear region range. Default: 1 * **inplace** – can optionally do the operation in-place. Default: `False` Keyword arguments `min_value` and `max_value` have been deprecated in favor of `min_val` and `max_val`. Shape: * Input: (N,∗)(N, *) where `*` means, any number of additional dimensions * Output: (N,∗)(N, *) , same shape as the input Examples: >>> m = nn.Hardtanh(-2, 2) >>> input = torch.randn(2) >>> output = m(input) # HingeEmbeddingLoss `class torch.nn.HingeEmbeddingLoss(margin=1.0, size_average=None, reduce=None, reduction='mean')` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/loss.html#HingeEmbeddingLoss) Measures the loss given an input tensor xx and a labels tensor yy (containing 1 or -1). This is usually used for measuring whether two inputs are similar or dissimilar, e.g. using the L1 pairwise distance as xx , and is typically used for learning nonlinear embeddings or semi-supervised learning. The loss function for nn -th sample in the mini-batch is ln={xn,ifyn=1,max⁡{0,Δ−xn},ifyn=−1,l_n = \begin{cases} x_n, & \text{if}\; y_n = 1,\\\ \max \\{0, \Delta - x_n\\}, & \text{if}\; y_n = -1, \end{cases} and the total loss functions is ℓ(x,y)={mean⁡(L),if reduction=‘mean’;sum⁡(L),if reduction=‘sum’.\ell(x, y) = \begin{cases} \operatorname{mean}(L), & \text{if reduction} = \text{`mean';}\\\ \operatorname{sum}(L), & \text{if reduction} = \text{`sum'.} \end{cases} where L={l1,…,lN}⊤L = \\{l_1,\dots,l_N\\}^\top . Parameters * **margin** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _,__optional_) – Has a default value of `1`. * **size_average** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – Deprecated (see `reduction`). By default, the losses are averaged over each loss element in the batch. Note that for some losses, there are multiple elements per sample. If the field `size_average` is set to `False`, the losses are instead summed for each minibatch. Ignored when `reduce` is `False`. Default: `True` * **reduce** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – Deprecated (see `reduction`). By default, the losses are averaged or summed over observations for each minibatch depending on `size_average`. When `reduce` is `False`, returns a loss per batch element instead and ignores `size_average`. Default: `True` * **reduction** (_string_ _,__optional_) – Specifies the reduction to apply to the output: `'none'` | `'mean'` | `'sum'`. 
`'none'`: no reduction will be applied, `'mean'`: the sum of the output will be divided by the number of elements in the output, `'sum'`: the output will be summed. Note: `size_average` and `reduce` are in the process of being deprecated, and in the meantime, specifying either of those two args will override `reduction`. Default: `'mean'` Shape: * Input: (∗)(*) where ∗* means, any number of dimensions. The sum operation operates over all the elements. * Target: (∗)(*) , same shape as the input * Output: scalar. If `reduction` is `'none'`, then same shape as the input # Identity `class torch.nn.Identity(*args, **kwargs)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/linear.html#Identity) A placeholder identity operator that is argument-insensitive. Parameters * **args** – any argument (unused) * **kwargs** – any keyword argument (unused) Examples: >>> m = nn.Identity(54, unused_argument1=0.1, unused_argument2=False) >>> input = torch.randn(128, 20) >>> output = m(input) >>> print(output.size()) torch.Size([128, 20]) # InstanceNorm1d `class torch.nn.InstanceNorm1d(num_features, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/instancenorm.html#InstanceNorm1d) Applies Instance Normalization over a 3D input (a mini-batch of 1D inputs with optional additional channel dimension) as described in the paper [Instance Normalization: The Missing Ingredient for Fast Stylization](https://arxiv.org/abs/1607.08022). y=x−E[x]Var[x]+ϵ∗γ+βy = \frac{x - \mathrm{E}[x]}{ \sqrt{\mathrm{Var}[x] + \epsilon}} * \gamma + \beta The mean and standard-deviation are calculated per-dimension separately for each object in a mini-batch. γ\gamma and β\beta are learnable parameter vectors of size `C` (where `C` is the input size) if `affine` is `True`. The standard-deviation is calculated via the biased estimator, equivalent to `torch.var(input, unbiased=False)`. By default, this layer uses instance statistics computed from input data in both training and evaluation modes. If `track_running_stats` is set to `True`, during training this layer keeps running estimates of its computed mean and variance, which are then used for normalization during evaluation. The running estimates are kept with a default `momentum` of 0.1. Note This `momentum` argument is different from one used in optimizer classes and the conventional notion of momentum. Mathematically, the update rule for running statistics here is x^new=(1−momentum)×x^+momentum×xt\hat{x}_\text{new} = (1 - \text{momentum}) \times \hat{x} + \text{momentum} \times x_t , where x^\hat{x} is the estimated statistic and xtx_t is the new observed value. Note `InstanceNorm1d` and [`LayerNorm`](torch.nn.layernorm#torch.nn.LayerNorm "torch.nn.LayerNorm") are very similar, but have some subtle differences. `InstanceNorm1d` is applied on each channel of channeled data like multidimensional time series, but [`LayerNorm`](torch.nn.layernorm#torch.nn.LayerNorm "torch.nn.LayerNorm") is usually applied on entire sample and often in NLP tasks. Additionally, [`LayerNorm`](torch.nn.layernorm#torch.nn.LayerNorm "torch.nn.LayerNorm") applies elementwise affine transform, while `InstanceNorm1d` usually don’t apply affine transform. Parameters * **num_features** – CC from an expected input of size (N,C,L)(N, C, L) or LL from input of size (N,L)(N, L) * **eps** – a value added to the denominator for numerical stability. 
Default: 1e-5 * **momentum** – the value used for the running_mean and running_var computation. Default: 0.1 * **affine** – a boolean value that when set to `True`, this module has learnable affine parameters, initialized the same way as done for batch normalization. Default: `False`. * **track_running_stats** – a boolean value that when set to `True`, this module tracks the running mean and variance, and when set to `False`, this module does not track such statistics and always uses batch statistics in both training and eval modes. Default: `False` Shape: * Input: (N,C,L)(N, C, L) * Output: (N,C,L)(N, C, L) (same shape as input) Examples: >>> # Without Learnable Parameters >>> m = nn.InstanceNorm1d(100) >>> # With Learnable Parameters >>> m = nn.InstanceNorm1d(100, affine=True) >>> input = torch.randn(20, 100, 40) >>> output = m(input) # InstanceNorm2d `class torch.nn.InstanceNorm2d(num_features, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/instancenorm.html#InstanceNorm2d) Applies Instance Normalization over a 4D input (a mini-batch of 2D inputs with additional channel dimension) as described in the paper [Instance Normalization: The Missing Ingredient for Fast Stylization](https://arxiv.org/abs/1607.08022). y=x−E[x]Var[x]+ϵ∗γ+βy = \frac{x - \mathrm{E}[x]}{ \sqrt{\mathrm{Var}[x] + \epsilon}} * \gamma + \beta The mean and standard-deviation are calculated per-dimension separately for each object in a mini-batch. γ\gamma and β\beta are learnable parameter vectors of size `C` (where `C` is the input size) if `affine` is `True`. The standard-deviation is calculated via the biased estimator, equivalent to `torch.var(input, unbiased=False)`. By default, this layer uses instance statistics computed from input data in both training and evaluation modes. If `track_running_stats` is set to `True`, during training this layer keeps running estimates of its computed mean and variance, which are then used for normalization during evaluation. The running estimates are kept with a default `momentum` of 0.1. Note This `momentum` argument is different from one used in optimizer classes and the conventional notion of momentum. Mathematically, the update rule for running statistics here is x^new=(1−momentum)×x^+momentum×xt\hat{x}_\text{new} = (1 - \text{momentum}) \times \hat{x} + \text{momentum} \times x_t , where x^\hat{x} is the estimated statistic and xtx_t is the new observed value. Note `InstanceNorm2d` and [`LayerNorm`](torch.nn.layernorm#torch.nn.LayerNorm "torch.nn.LayerNorm") are very similar, but have some subtle differences. `InstanceNorm2d` is applied on each channel of channeled data like RGB images, but [`LayerNorm`](torch.nn.layernorm#torch.nn.LayerNorm "torch.nn.LayerNorm") is usually applied on entire sample and often in NLP tasks. Additionally, [`LayerNorm`](torch.nn.layernorm#torch.nn.LayerNorm "torch.nn.LayerNorm") applies elementwise affine transform, while `InstanceNorm2d` usually don’t apply affine transform. Parameters * **num_features** – CC from an expected input of size (N,C,H,W)(N, C, H, W) * **eps** – a value added to the denominator for numerical stability. Default: 1e-5 * **momentum** – the value used for the running_mean and running_var computation. Default: 0.1 * **affine** – a boolean value that when set to `True`, this module has learnable affine parameters, initialized the same way as done for batch normalization. Default: `False`. 
* **track_running_stats** – a boolean value that when set to `True`, this module tracks the running mean and variance, and when set to `False`, this module does not track such statistics and always uses batch statistics in both training and eval modes. Default: `False` Shape: * Input: (N,C,H,W)(N, C, H, W) * Output: (N,C,H,W)(N, C, H, W) (same shape as input) Examples: >>> # Without Learnable Parameters >>> m = nn.InstanceNorm2d(100) >>> # With Learnable Parameters >>> m = nn.InstanceNorm2d(100, affine=True) >>> input = torch.randn(20, 100, 35, 45) >>> output = m(input) # InstanceNorm3d `class torch.nn.InstanceNorm3d(num_features, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/instancenorm.html#InstanceNorm3d) Applies Instance Normalization over a 5D input (a mini-batch of 3D inputs with additional channel dimension) as described in the paper [Instance Normalization: The Missing Ingredient for Fast Stylization](https://arxiv.org/abs/1607.08022). y=x−E[x]Var[x]+ϵ∗γ+βy = \frac{x - \mathrm{E}[x]}{ \sqrt{\mathrm{Var}[x] + \epsilon}} * \gamma + \beta The mean and standard-deviation are calculated per-dimension separately for each object in a mini-batch. γ\gamma and β\beta are learnable parameter vectors of size C (where C is the input size) if `affine` is `True`. The standard-deviation is calculated via the biased estimator, equivalent to `torch.var(input, unbiased=False)`. By default, this layer uses instance statistics computed from input data in both training and evaluation modes. If `track_running_stats` is set to `True`, during training this layer keeps running estimates of its computed mean and variance, which are then used for normalization during evaluation. The running estimates are kept with a default `momentum` of 0.1. Note This `momentum` argument is different from one used in optimizer classes and the conventional notion of momentum. Mathematically, the update rule for running statistics here is x^new=(1−momentum)×x^+momentum×xt\hat{x}_\text{new} = (1 - \text{momentum}) \times \hat{x} + \text{momentum} \times x_t , where x^\hat{x} is the estimated statistic and xtx_t is the new observed value. Note `InstanceNorm3d` and [`LayerNorm`](torch.nn.layernorm#torch.nn.LayerNorm "torch.nn.LayerNorm") are very similar, but have some subtle differences. `InstanceNorm3d` is applied on each channel of channeled data like 3D models with RGB color, but [`LayerNorm`](torch.nn.layernorm#torch.nn.LayerNorm "torch.nn.LayerNorm") is usually applied on entire sample and often in NLP tasks. Additionally, [`LayerNorm`](torch.nn.layernorm#torch.nn.LayerNorm "torch.nn.LayerNorm") applies elementwise affine transform, while `InstanceNorm3d` usually don’t apply affine transform. Parameters * **num_features** – CC from an expected input of size (N,C,D,H,W)(N, C, D, H, W) * **eps** – a value added to the denominator for numerical stability. Default: 1e-5 * **momentum** – the value used for the running_mean and running_var computation. Default: 0.1 * **affine** – a boolean value that when set to `True`, this module has learnable affine parameters, initialized the same way as done for batch normalization. Default: `False`. * **track_running_stats** – a boolean value that when set to `True`, this module tracks the running mean and variance, and when set to `False`, this module does not track such statistics and always uses batch statistics in both training and eval modes. 
Default: `False` Shape: * Input: (N,C,D,H,W)(N, C, D, H, W) * Output: (N,C,D,H,W)(N, C, D, H, W) (same shape as input) Examples: >>> # Without Learnable Parameters >>> m = nn.InstanceNorm3d(100) >>> # With Learnable Parameters >>> m = nn.InstanceNorm3d(100, affine=True) >>> input = torch.randn(20, 100, 35, 45, 10) >>> output = m(input) # KLDivLoss `class torch.nn.KLDivLoss(size_average=None, reduce=None, reduction='mean', log_target=False)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/loss.html#KLDivLoss) The Kullback-Leibler divergence loss measure [Kullback-Leibler divergence](https://en.wikipedia.org/wiki/Kullback- Leibler_divergence) is a useful distance measure for continuous distributions and is often useful when performing direct regression over the space of (discretely sampled) continuous output distributions. As with [`NLLLoss`](torch.nn.nllloss#torch.nn.NLLLoss "torch.nn.NLLLoss"), the `input` given is expected to contain _log-probabilities_ and is not restricted to a 2D Tensor. The targets are interpreted as _probabilities_ by default, but could be considered as _log-probabilities_ with `log_target` set to `True`. This criterion expects a `target` `Tensor` of the same size as the `input` `Tensor`. The unreduced (i.e. with `reduction` set to `'none'`) loss can be described as: l(x,y)=L={l1,…,lN},ln=yn⋅(log⁡yn−xn)l(x,y) = L = \\{ l_1,\dots,l_N \\}, \quad l_n = y_n \cdot \left( \log y_n - x_n \right) where the index NN spans all dimensions of `input` and LL has the same shape as `input`. If `reduction` is not `'none'` (default `'mean'`), then: ℓ(x,y)={mean⁡(L),if reduction=‘mean’;sum⁡(L),if reduction=‘sum’.\ell(x, y) = \begin{cases} \operatorname{mean}(L), & \text{if reduction} = \text{`mean';} \\\ \operatorname{sum}(L), & \text{if reduction} = \text{`sum'.} \end{cases} In default `reduction` mode `'mean'`, the losses are averaged for each minibatch over observations **as well as** over dimensions. `'batchmean'` mode gives the correct KL divergence where losses are averaged over batch dimension only. `'mean'` mode’s behavior will be changed to the same as `'batchmean'` in the next major release. Parameters * **size_average** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – Deprecated (see `reduction`). By default, the losses are averaged over each loss element in the batch. Note that for some losses, there are multiple elements per sample. If the field `size_average` is set to `False`, the losses are instead summed for each minibatch. Ignored when `reduce` is `False`. Default: `True` * **reduce** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – Deprecated (see `reduction`). By default, the losses are averaged or summed over observations for each minibatch depending on `size_average`. When `reduce` is `False`, returns a loss per batch element instead and ignores `size_average`. Default: `True` * **reduction** (_string_ _,__optional_) – Specifies the reduction to apply to the output: `'none'` | `'batchmean'` | `'sum'` | `'mean'`. `'none'`: no reduction will be applied. `'batchmean'`: the sum of the output will be divided by batchsize. `'sum'`: the output will be summed. `'mean'`: the output will be divided by the number of elements in the output. Default: `'mean'` * **log_target** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – Specifies whether `target` is passed in the log space. 
Default: `False`

Note

`size_average` and `reduce` are in the process of being deprecated, and in the meantime, specifying either of those two args will override `reduction`.

Note

`reduction` = `'mean'` does not return the true KL divergence value; please use `reduction` = `'batchmean'`, which aligns with the mathematical definition of KL divergence. In the next major release, `'mean'` will be changed to be the same as `'batchmean'`.

Shape:

* Input: (N, *) where * means any number of additional dimensions
* Target: (N, *), same shape as the input
* Output: scalar by default. If `reduction` is `'none'`, then (N, *), the same shape as the input

# L1Loss

`class torch.nn.L1Loss(size_average=None, reduce=None, reduction='mean')` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/loss.html#L1Loss)

Creates a criterion that measures the mean absolute error (MAE) between each element in the input x and target y.

The unreduced (i.e. with `reduction` set to `'none'`) loss can be described as:

\ell(x, y) = L = \{l_1, \dots, l_N\}^\top, \quad l_n = \left| x_n - y_n \right|,

where N is the batch size. If `reduction` is not `'none'` (default `'mean'`), then:

\ell(x, y) = \begin{cases} \operatorname{mean}(L), & \text{if reduction} = \text{'mean';} \\ \operatorname{sum}(L), & \text{if reduction} = \text{'sum'.} \end{cases}

x and y are tensors of arbitrary shapes with a total of n elements each.

The sum operation still operates over all the elements, and divides by n. The division by n can be avoided if one sets `reduction = 'sum'`.

Supports real-valued and complex-valued inputs.

Parameters

* **size_average** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)"), optional) – Deprecated (see `reduction`). By default, the losses are averaged over each loss element in the batch. Note that for some losses, there are multiple elements per sample. If the field `size_average` is set to `False`, the losses are instead summed for each minibatch. Ignored when `reduce` is `False`. Default: `True`
* **reduce** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)"), optional) – Deprecated (see `reduction`). By default, the losses are averaged or summed over observations for each minibatch depending on `size_average`. When `reduce` is `False`, returns a loss per batch element instead and ignores `size_average`. Default: `True`
* **reduction** (string, optional) – Specifies the reduction to apply to the output: `'none'` | `'mean'` | `'sum'`. `'none'`: no reduction will be applied, `'mean'`: the sum of the output will be divided by the number of elements in the output, `'sum'`: the output will be summed. Note: `size_average` and `reduce` are in the process of being deprecated, and in the meantime, specifying either of those two args will override `reduction`. Default: `'mean'`

Shape:

* Input: (N, *) where * means any number of additional dimensions
* Target: (N, *), same shape as the input
* Output: scalar.
If `reduction` is `'none'`, then (N,∗)(N, *) , same shape as the input Examples: >>> loss = nn.L1Loss() >>> input = torch.randn(3, 5, requires_grad=True) >>> target = torch.randn(3, 5) >>> output = loss(input, target) >>> output.backward() # LayerNorm `class torch.nn.LayerNorm(normalized_shape, eps=1e-05, elementwise_affine=True)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/normalization.html#LayerNorm) Applies Layer Normalization over a mini-batch of inputs as described in the paper [Layer Normalization](https://arxiv.org/abs/1607.06450) y=x−E[x]Var[x]+ϵ∗γ+βy = \frac{x - \mathrm{E}[x]}{ \sqrt{\mathrm{Var}[x] + \epsilon}} * \gamma + \beta The mean and standard-deviation are calculated separately over the last certain number dimensions which have to be of the shape specified by `normalized_shape`. γ\gamma and β\beta are learnable affine transform parameters of `normalized_shape` if `elementwise_affine` is `True`. The standard-deviation is calculated via the biased estimator, equivalent to `torch.var(input, unbiased=False)`. Note Unlike Batch Normalization and Instance Normalization, which applies scalar scale and bias for each entire channel/plane with the `affine` option, Layer Normalization applies per-element scale and bias with `elementwise_affine`. This layer uses statistics computed from input data in both training and evaluation modes. Parameters * **normalized_shape** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _or_[list](https://docs.python.org/3/library/stdtypes.html#list "\(in Python v3.9\)") _or_ _torch.Size_) – input shape from an expected input of size [∗×normalized_shape[0]×normalized_shape[1]×…×normalized_shape[−1]][* \times \text{normalized\\_shape}[0] \times \text{normalized\\_shape}[1] \times \ldots \times \text{normalized\\_shape}[-1]] If a single integer is used, it is treated as a singleton list, and this module will normalize over the last dimension which is expected to be of that specific size. * **eps** – a value added to the denominator for numerical stability. Default: 1e-5 * **elementwise_affine** – a boolean value that when set to `True`, this module has learnable per-element affine parameters initialized to ones (for weights) and zeros (for biases). Default: `True`. Shape: * Input: (N,∗)(N, *) * Output: (N,∗)(N, *) (same shape as input) Examples: >>> input = torch.randn(20, 5, 10, 10) >>> # With Learnable Parameters >>> m = nn.LayerNorm(input.size()[1:]) >>> # Without Learnable Parameters >>> m = nn.LayerNorm(input.size()[1:], elementwise_affine=False) >>> # Normalize over last two dimensions >>> m = nn.LayerNorm([10, 10]) >>> # Normalize over last dimension of size 10 >>> m = nn.LayerNorm(10) >>> # Activating the module >>> output = m(input) # LazyConv1d `class torch.nn.LazyConv1d(out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, bias=True, padding_mode='zeros')` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/conv.html#LazyConv1d) A [`torch.nn.Conv1d`](torch.nn.conv1d#torch.nn.Conv1d "torch.nn.Conv1d") module with lazy initialization of the `in_channels` argument of the [`Conv1d`](torch.nn.conv1d#torch.nn.Conv1d "torch.nn.Conv1d") that is inferred from the `input.size(1)`. 
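For illustration, a minimal usage sketch of the lazy `in_channels` inference (the batch size, channel count, and length below are arbitrary; only `input.size(1)` determines the inferred `in_channels`):

>>> import torch
>>> import torch.nn as nn
>>> conv = nn.LazyConv1d(out_channels=32, kernel_size=3)  # in_channels left unspecified
>>> x = torch.randn(8, 16, 50)   # (batch, channels, length); 16 is inferred as in_channels
>>> y = conv(x)                  # the first forward call materializes the parameters
>>> y.shape
torch.Size([8, 32, 48])
>>> conv.weight.shape            # now a regular Conv1d weight of shape (out, in, kernel)
torch.Size([32, 16, 3])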
Parameters * **out_channels** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – Number of channels produced by the convolution * **kernel_size** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _or_[tuple](https://docs.python.org/3/library/stdtypes.html#tuple "\(in Python v3.9\)")) – Size of the convolving kernel * **stride** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _or_[tuple](https://docs.python.org/3/library/stdtypes.html#tuple "\(in Python v3.9\)") _,__optional_) – Stride of the convolution. Default: 1 * **padding** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _or_[tuple](https://docs.python.org/3/library/stdtypes.html#tuple "\(in Python v3.9\)") _,__optional_) – Zero-padding added to both sides of the input. Default: 0 * **padding_mode** (_string_ _,__optional_) – `'zeros'`, `'reflect'`, `'replicate'` or `'circular'`. Default: `'zeros'` * **dilation** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _or_[tuple](https://docs.python.org/3/library/stdtypes.html#tuple "\(in Python v3.9\)") _,__optional_) – Spacing between kernel elements. Default: 1 * **groups** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – Number of blocked connections from input channels to output channels. Default: 1 * **bias** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – If `True`, adds a learnable bias to the output. Default: `True` See also [`torch.nn.Conv1d`](torch.nn.conv1d#torch.nn.Conv1d "torch.nn.Conv1d") and [`torch.nn.modules.lazy.LazyModuleMixin`](torch.nn.modules.lazy.lazymodulemixin#torch.nn.modules.lazy.LazyModuleMixin "torch.nn.modules.lazy.LazyModuleMixin") `cls_to_become` alias of [`Conv1d`](torch.nn.conv1d#torch.nn.Conv1d "torch.nn.Conv1d") # LazyConv2d `class torch.nn.LazyConv2d(out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, bias=True, padding_mode='zeros')` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/conv.html#LazyConv2d) A [`torch.nn.Conv2d`](torch.nn.conv2d#torch.nn.Conv2d "torch.nn.Conv2d") module with lazy initialization of the `in_channels` argument of the [`Conv2d`](torch.nn.conv2d#torch.nn.Conv2d "torch.nn.Conv2d") that is inferred from the `input.size(1)`. Parameters * **out_channels** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – Number of channels produced by the convolution * **kernel_size** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _or_[tuple](https://docs.python.org/3/library/stdtypes.html#tuple "\(in Python v3.9\)")) – Size of the convolving kernel * **stride** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _or_[tuple](https://docs.python.org/3/library/stdtypes.html#tuple "\(in Python v3.9\)") _,__optional_) – Stride of the convolution. Default: 1 * **padding** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _or_[tuple](https://docs.python.org/3/library/stdtypes.html#tuple "\(in Python v3.9\)") _,__optional_) – Zero-padding added to both sides of the input. Default: 0 * **padding_mode** (_string_ _,__optional_) – `'zeros'`, `'reflect'`, `'replicate'` or `'circular'`. 
Default: `'zeros'` * **dilation** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _or_[tuple](https://docs.python.org/3/library/stdtypes.html#tuple "\(in Python v3.9\)") _,__optional_) – Spacing between kernel elements. Default: 1 * **groups** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – Number of blocked connections from input channels to output channels. Default: 1 * **bias** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – If `True`, adds a learnable bias to the output. Default: `True` See also [`torch.nn.Conv2d`](torch.nn.conv2d#torch.nn.Conv2d "torch.nn.Conv2d") and [`torch.nn.modules.lazy.LazyModuleMixin`](torch.nn.modules.lazy.lazymodulemixin#torch.nn.modules.lazy.LazyModuleMixin "torch.nn.modules.lazy.LazyModuleMixin") `cls_to_become` alias of [`Conv2d`](torch.nn.conv2d#torch.nn.Conv2d "torch.nn.Conv2d") # LazyConv3d `class torch.nn.LazyConv3d(out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, bias=True, padding_mode='zeros')` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/conv.html#LazyConv3d) A [`torch.nn.Conv3d`](torch.nn.conv3d#torch.nn.Conv3d "torch.nn.Conv3d") module with lazy initialization of the `in_channels` argument of the [`Conv3d`](torch.nn.conv3d#torch.nn.Conv3d "torch.nn.Conv3d") that is inferred from the `input.size(1)`. Parameters * **out_channels** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – Number of channels produced by the convolution * **kernel_size** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _or_[tuple](https://docs.python.org/3/library/stdtypes.html#tuple "\(in Python v3.9\)")) – Size of the convolving kernel * **stride** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _or_[tuple](https://docs.python.org/3/library/stdtypes.html#tuple "\(in Python v3.9\)") _,__optional_) – Stride of the convolution. Default: 1 * **padding** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _or_[tuple](https://docs.python.org/3/library/stdtypes.html#tuple "\(in Python v3.9\)") _,__optional_) – Zero-padding added to both sides of the input. Default: 0 * **padding_mode** (_string_ _,__optional_) – `'zeros'`, `'reflect'`, `'replicate'` or `'circular'`. Default: `'zeros'` * **dilation** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _or_[tuple](https://docs.python.org/3/library/stdtypes.html#tuple "\(in Python v3.9\)") _,__optional_) – Spacing between kernel elements. Default: 1 * **groups** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – Number of blocked connections from input channels to output channels. Default: 1 * **bias** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – If `True`, adds a learnable bias to the output. 
Default: `True` See also [`torch.nn.Conv3d`](torch.nn.conv3d#torch.nn.Conv3d "torch.nn.Conv3d") and [`torch.nn.modules.lazy.LazyModuleMixin`](torch.nn.modules.lazy.lazymodulemixin#torch.nn.modules.lazy.LazyModuleMixin "torch.nn.modules.lazy.LazyModuleMixin") `cls_to_become` alias of [`Conv3d`](torch.nn.conv3d#torch.nn.Conv3d "torch.nn.Conv3d") # LazyConvTranspose1d `class torch.nn.LazyConvTranspose1d(out_channels, kernel_size, stride=1, padding=0, output_padding=0, groups=1, bias=True, dilation=1, padding_mode='zeros')` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/conv.html#LazyConvTranspose1d) A [`torch.nn.ConvTranspose1d`](torch.nn.convtranspose1d#torch.nn.ConvTranspose1d "torch.nn.ConvTranspose1d") module with lazy initialization of the `in_channels` argument of the [`ConvTranspose1d`](torch.nn.convtranspose1d#torch.nn.ConvTranspose1d "torch.nn.ConvTranspose1d") that is inferred from the `input.size(1)`. Parameters * **out_channels** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – Number of channels produced by the convolution * **kernel_size** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _or_[tuple](https://docs.python.org/3/library/stdtypes.html#tuple "\(in Python v3.9\)")) – Size of the convolving kernel * **stride** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _or_[tuple](https://docs.python.org/3/library/stdtypes.html#tuple "\(in Python v3.9\)") _,__optional_) – Stride of the convolution. Default: 1 * **padding** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _or_[tuple](https://docs.python.org/3/library/stdtypes.html#tuple "\(in Python v3.9\)") _,__optional_) – `dilation * (kernel_size - 1) - padding` zero-padding will be added to both sides of the input. Default: 0 * **output_padding** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _or_[tuple](https://docs.python.org/3/library/stdtypes.html#tuple "\(in Python v3.9\)") _,__optional_) – Additional size added to one side of the output shape. Default: 0 * **groups** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – Number of blocked connections from input channels to output channels. Default: 1 * **bias** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – If `True`, adds a learnable bias to the output. Default: `True` * **dilation** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _or_[tuple](https://docs.python.org/3/library/stdtypes.html#tuple "\(in Python v3.9\)") _,__optional_) – Spacing between kernel elements. 
Default: 1 See also [`torch.nn.ConvTranspose1d`](torch.nn.convtranspose1d#torch.nn.ConvTranspose1d "torch.nn.ConvTranspose1d") and [`torch.nn.modules.lazy.LazyModuleMixin`](torch.nn.modules.lazy.lazymodulemixin#torch.nn.modules.lazy.LazyModuleMixin "torch.nn.modules.lazy.LazyModuleMixin") `cls_to_become` alias of [`ConvTranspose1d`](torch.nn.convtranspose1d#torch.nn.ConvTranspose1d "torch.nn.ConvTranspose1d") # LazyConvTranspose2d `class torch.nn.LazyConvTranspose2d(out_channels, kernel_size, stride=1, padding=0, output_padding=0, groups=1, bias=True, dilation=1, padding_mode='zeros')` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/conv.html#LazyConvTranspose2d) A [`torch.nn.ConvTranspose2d`](torch.nn.convtranspose2d#torch.nn.ConvTranspose2d "torch.nn.ConvTranspose2d") module with lazy initialization of the `in_channels` argument of the [`ConvTranspose2d`](torch.nn.convtranspose2d#torch.nn.ConvTranspose2d "torch.nn.ConvTranspose2d") that is inferred from the `input.size(1)`. Parameters * **out_channels** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – Number of channels produced by the convolution * **kernel_size** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _or_[tuple](https://docs.python.org/3/library/stdtypes.html#tuple "\(in Python v3.9\)")) – Size of the convolving kernel * **stride** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _or_[tuple](https://docs.python.org/3/library/stdtypes.html#tuple "\(in Python v3.9\)") _,__optional_) – Stride of the convolution. Default: 1 * **padding** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _or_[tuple](https://docs.python.org/3/library/stdtypes.html#tuple "\(in Python v3.9\)") _,__optional_) – `dilation * (kernel_size - 1) - padding` zero-padding will be added to both sides of each dimension in the input. Default: 0 * **output_padding** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _or_[tuple](https://docs.python.org/3/library/stdtypes.html#tuple "\(in Python v3.9\)") _,__optional_) – Additional size added to one side of each dimension in the output shape. Default: 0 * **groups** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – Number of blocked connections from input channels to output channels. Default: 1 * **bias** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – If `True`, adds a learnable bias to the output. Default: `True` * **dilation** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _or_[tuple](https://docs.python.org/3/library/stdtypes.html#tuple "\(in Python v3.9\)") _,__optional_) – Spacing between kernel elements. 
Default: 1 See also [`torch.nn.ConvTranspose2d`](torch.nn.convtranspose2d#torch.nn.ConvTranspose2d "torch.nn.ConvTranspose2d") and [`torch.nn.modules.lazy.LazyModuleMixin`](torch.nn.modules.lazy.lazymodulemixin#torch.nn.modules.lazy.LazyModuleMixin "torch.nn.modules.lazy.LazyModuleMixin") `cls_to_become` alias of [`ConvTranspose2d`](torch.nn.convtranspose2d#torch.nn.ConvTranspose2d "torch.nn.ConvTranspose2d") # LazyConvTranspose3d `class torch.nn.LazyConvTranspose3d(out_channels, kernel_size, stride=1, padding=0, output_padding=0, groups=1, bias=True, dilation=1, padding_mode='zeros')` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/conv.html#LazyConvTranspose3d) A [`torch.nn.ConvTranspose3d`](torch.nn.convtranspose3d#torch.nn.ConvTranspose3d "torch.nn.ConvTranspose3d") module with lazy initialization of the `in_channels` argument of the [`ConvTranspose3d`](torch.nn.convtranspose3d#torch.nn.ConvTranspose3d "torch.nn.ConvTranspose3d") that is inferred from the `input.size(1)`. Parameters * **out_channels** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – Number of channels produced by the convolution * **kernel_size** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _or_[tuple](https://docs.python.org/3/library/stdtypes.html#tuple "\(in Python v3.9\)")) – Size of the convolving kernel * **stride** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _or_[tuple](https://docs.python.org/3/library/stdtypes.html#tuple "\(in Python v3.9\)") _,__optional_) – Stride of the convolution. Default: 1 * **padding** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _or_[tuple](https://docs.python.org/3/library/stdtypes.html#tuple "\(in Python v3.9\)") _,__optional_) – `dilation * (kernel_size - 1) - padding` zero-padding will be added to both sides of each dimension in the input. Default: 0 * **output_padding** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _or_[tuple](https://docs.python.org/3/library/stdtypes.html#tuple "\(in Python v3.9\)") _,__optional_) – Additional size added to one side of each dimension in the output shape. Default: 0 * **groups** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – Number of blocked connections from input channels to output channels. Default: 1 * **bias** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – If `True`, adds a learnable bias to the output. Default: `True` * **dilation** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _or_[tuple](https://docs.python.org/3/library/stdtypes.html#tuple "\(in Python v3.9\)") _,__optional_) – Spacing between kernel elements. Default: 1 See also [`torch.nn.ConvTranspose3d`](torch.nn.convtranspose3d#torch.nn.ConvTranspose3d "torch.nn.ConvTranspose3d") and [`torch.nn.modules.lazy.LazyModuleMixin`](torch.nn.modules.lazy.lazymodulemixin#torch.nn.modules.lazy.LazyModuleMixin "torch.nn.modules.lazy.LazyModuleMixin") `cls_to_become` alias of [`ConvTranspose3d`](torch.nn.convtranspose3d#torch.nn.ConvTranspose3d "torch.nn.ConvTranspose3d") # LazyLinear `class torch.nn.LazyLinear(out_features, bias=True)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/linear.html#LazyLinear) A [`torch.nn.Linear`](torch.nn.linear#torch.nn.Linear "torch.nn.Linear") module with lazy initialization. 
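A brief usage sketch of this lazy behavior (the sizes below are arbitrary); the mechanics are described in the paragraph that follows:

>>> import torch
>>> import torch.nn as nn
>>> fc = nn.LazyLinear(out_features=10)   # in_features is not given here
>>> x = torch.randn(4, 128)
>>> out = fc(x)                           # in_features=128 is inferred on the first call
>>> out.shape
torch.Size([4, 10])
>>> fc.weight.shape                       # materialized as a regular Linear weight
torch.Size([10, 128])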
In this module, the `weight` and `bias` are of `torch.nn.UninitializedParameter` class. They will be initialized after the first call to `forward` is done and the module will become a regular [`torch.nn.Linear`](torch.nn.linear#torch.nn.Linear "torch.nn.Linear") module. Check the [`torch.nn.modules.lazy.LazyModuleMixin`](torch.nn.modules.lazy.lazymodulemixin#torch.nn.modules.lazy.LazyModuleMixin "torch.nn.modules.lazy.LazyModuleMixin") for further documentation on lazy modules and their limitations. Parameters * **out_features** – size of each output sample * **bias** – If set to `False`, the layer will not learn an additive bias. Default: `True` Variables * **~LazyLinear.weight** – the learnable weights of the module of shape (out_features,in_features)(\text{out\\_features}, \text{in\\_features}) . The values are initialized from U(−k,k)\mathcal{U}(-\sqrt{k}, \sqrt{k}) , where k=1in_featuresk = \frac{1}{\text{in\\_features}} * **~LazyLinear.bias** – the learnable bias of the module of shape (out_features)(\text{out\\_features}) . If `bias` is `True`, the values are initialized from U(−k,k)\mathcal{U}(-\sqrt{k}, \sqrt{k}) where k=1in_featuresk = \frac{1}{\text{in\\_features}} `cls_to_become` alias of [`Linear`](torch.nn.linear#torch.nn.Linear "torch.nn.Linear") # LeakyReLU `class torch.nn.LeakyReLU(negative_slope=0.01, inplace=False)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/activation.html#LeakyReLU) Applies the element-wise function: LeakyReLU(x)=max⁡(0,x)+negative_slope∗min⁡(0,x)\text{LeakyReLU}(x) = \max(0, x) + \text{negative\\_slope} * \min(0, x) or LeakyRELU(x)={x, if x≥0negative_slope×x, otherwise \text{LeakyRELU}(x) = \begin{cases} x, & \text{ if } x \geq 0 \\\ \text{negative\\_slope} \times x, & \text{ otherwise } \end{cases} Parameters * **negative_slope** – Controls the angle of the negative slope. Default: 1e-2 * **inplace** – can optionally do the operation in-place. Default: `False` Shape: * Input: (N,∗)(N, *) where `*` means, any number of additional dimensions * Output: (N,∗)(N, *) , same shape as the input Examples: >>> m = nn.LeakyReLU(0.1) >>> input = torch.randn(2) >>> output = m(input) # Linear `class torch.nn.Linear(in_features, out_features, bias=True)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/linear.html#Linear) Applies a linear transformation to the incoming data: y=xAT+by = xA^T + b This module supports [TensorFloat32](https://pytorch.org/docs/1.8.0/notes/cuda.html#tf32-on- ampere). Parameters * **in_features** – size of each input sample * **out_features** – size of each output sample * **bias** – If set to `False`, the layer will not learn an additive bias. Default: `True` Shape: * Input: (N,∗,Hin)(N, *, H_{in}) where ∗* means any number of additional dimensions and Hin=in_featuresH_{in} = \text{in\\_features} * Output: (N,∗,Hout)(N, *, H_{out}) where all but the last dimension are the same shape as the input and Hout=out_featuresH_{out} = \text{out\\_features} . Variables * **~Linear.weight** – the learnable weights of the module of shape (out_features,in_features)(\text{out\\_features}, \text{in\\_features}) . The values are initialized from U(−k,k)\mathcal{U}(-\sqrt{k}, \sqrt{k}) , where k=1in_featuresk = \frac{1}{\text{in\\_features}} * **~Linear.bias** – the learnable bias of the module of shape (out_features)(\text{out\\_features}) . 
If `bias` is `True`, the values are initialized from U(−k,k)\mathcal{U}(-\sqrt{k}, \sqrt{k}) where k=1in_featuresk = \frac{1}{\text{in\\_features}} Examples: >>> m = nn.Linear(20, 30) >>> input = torch.randn(128, 20) >>> output = m(input) >>> print(output.size()) torch.Size([128, 30]) # LocalResponseNorm `class torch.nn.LocalResponseNorm(size, alpha=0.0001, beta=0.75, k=1.0)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/normalization.html#LocalResponseNorm) Applies local response normalization over an input signal composed of several input planes, where channels occupy the second dimension. Applies normalization across channels. bc=ac(k+αn∑c′=max⁡(0,c−n/2)min⁡(N−1,c+n/2)ac′2)−βb_{c} = a_{c}\left(k + \frac{\alpha}{n} \sum_{c'=\max(0, c-n/2)}^{\min(N-1,c+n/2)}a_{c'}^2\right)^{-\beta} Parameters * **size** – amount of neighbouring channels used for normalization * **alpha** – multiplicative factor. Default: 0.0001 * **beta** – exponent. Default: 0.75 * **k** – additive factor. Default: 1 Shape: * Input: (N,C,∗)(N, C, *) * Output: (N,C,∗)(N, C, *) (same shape as input) Examples: >>> lrn = nn.LocalResponseNorm(2) >>> signal_2d = torch.randn(32, 5, 24, 24) >>> signal_4d = torch.randn(16, 5, 7, 7, 7, 7) >>> output_2d = lrn(signal_2d) >>> output_4d = lrn(signal_4d) # LogSigmoid `class torch.nn.LogSigmoid` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/activation.html#LogSigmoid) Applies the element-wise function: LogSigmoid(x)=log⁡(11+exp⁡(−x))\text{LogSigmoid}(x) = \log\left(\frac{ 1 }{ 1 + \exp(-x)}\right) Shape: * Input: (N,∗)(N, *) where `*` means, any number of additional dimensions * Output: (N,∗)(N, *) , same shape as the input Examples: >>> m = nn.LogSigmoid() >>> input = torch.randn(2) >>> output = m(input) # LogSoftmax `class torch.nn.LogSoftmax(dim=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/activation.html#LogSoftmax) Applies the log⁡(Softmax(x))\log(\text{Softmax}(x)) function to an n-dimensional input Tensor. The LogSoftmax formulation can be simplified as: LogSoftmax(xi)=log⁡(exp⁡(xi)∑jexp⁡(xj))\text{LogSoftmax}(x_{i}) = \log\left(\frac{\exp(x_i) }{ \sum_j \exp(x_j)} \right) Shape: * Input: (∗)(*) where `*` means, any number of additional dimensions * Output: (∗)(*) , same shape as the input Parameters **dim** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – A dimension along which LogSoftmax will be computed. Returns a Tensor of the same dimension and shape as the input with values in the range [-inf, 0) Examples: >>> m = nn.LogSoftmax() >>> input = torch.randn(2, 3) >>> output = m(input) # LPPool1d `class torch.nn.LPPool1d(norm_type, kernel_size, stride=None, ceil_mode=False)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/pooling.html#LPPool1d) Applies a 1D power-average pooling over an input signal composed of several input planes. On each window, the function computed is: f(X)=∑x∈Xxppf(X) = \sqrt[p]{\sum_{x \in X} x^{p}} * At p = ∞\infty , one gets Max Pooling * At p = 1, one gets Sum Pooling (which is proportional to Average Pooling) Note If the sum to the power of `p` is zero, the gradient of this function is not defined. This implementation will set the gradient to zero in this case. Parameters * **kernel_size** – a single int, the size of the window * **stride** – a single int, the stride of the window. 
Default value is `kernel_size` * **ceil_mode** – when True, will use `ceil` instead of `floor` to compute the output shape Shape: * Input: (N,C,Lin)(N, C, L_{in}) * Output: (N,C,Lout)(N, C, L_{out}) , where Lout=⌊Lin−kernel_sizestride+1⌋L_{out} = \left\lfloor\frac{L_{in} - \text{kernel\\_size}}{\text{stride}} + 1\right\rfloor Examples:: >>> # power-2 pool of window of length 3, with stride 2. >>> m = nn.LPPool1d(2, 3, stride=2) >>> input = torch.randn(20, 16, 50) >>> output = m(input) # LPPool2d `class torch.nn.LPPool2d(norm_type, kernel_size, stride=None, ceil_mode=False)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/pooling.html#LPPool2d) Applies a 2D power-average pooling over an input signal composed of several input planes. On each window, the function computed is: f(X)=∑x∈Xxppf(X) = \sqrt[p]{\sum_{x \in X} x^{p}} * At p = ∞\infty , one gets Max Pooling * At p = 1, one gets Sum Pooling (which is proportional to average pooling) The parameters `kernel_size`, `stride` can either be: * a single `int` – in which case the same value is used for the height and width dimension * a `tuple` of two ints – in which case, the first `int` is used for the height dimension, and the second `int` for the width dimension Note If the sum to the power of `p` is zero, the gradient of this function is not defined. This implementation will set the gradient to zero in this case. Parameters * **kernel_size** – the size of the window * **stride** – the stride of the window. Default value is `kernel_size` * **ceil_mode** – when True, will use `ceil` instead of `floor` to compute the output shape Shape: * Input: (N,C,Hin,Win)(N, C, H_{in}, W_{in}) * Output: (N,C,Hout,Wout)(N, C, H_{out}, W_{out}) , where Hout=⌊Hin−kernel_size[0]stride[0]+1⌋H_{out} = \left\lfloor\frac{H_{in} - \text{kernel\\_size}[0]}{\text{stride}[0]} + 1\right\rfloor Wout=⌊Win−kernel_size[1]stride[1]+1⌋W_{out} = \left\lfloor\frac{W_{in} - \text{kernel\\_size}[1]}{\text{stride}[1]} + 1\right\rfloor Examples: >>> # power-2 pool of square window of size=3, stride=2 >>> m = nn.LPPool2d(2, 3, stride=2) >>> # pool of non-square window of power 1.2 >>> m = nn.LPPool2d(1.2, (3, 2), stride=(2, 1)) >>> input = torch.randn(20, 16, 50, 32) >>> output = m(input) # LSTM `class torch.nn.LSTM(*args, **kwargs)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/rnn.html#LSTM) Applies a multi-layer long short-term memory (LSTM) RNN to an input sequence. For each element in the input sequence, each layer computes the following function: it=σ(Wiixt+bii+Whiht−1+bhi)ft=σ(Wifxt+bif+Whfht−1+bhf)gt=tanh⁡(Wigxt+big+Whght−1+bhg)ot=σ(Wioxt+bio+Whoht−1+bho)ct=ft⊙ct−1+it⊙gtht=ot⊙tanh⁡(ct)\begin{array}{ll} \\\ i_t = \sigma(W_{ii} x_t + b_{ii} + W_{hi} h_{t-1} + b_{hi}) \\\ f_t = \sigma(W_{if} x_t + b_{if} + W_{hf} h_{t-1} + b_{hf}) \\\ g_t = \tanh(W_{ig} x_t + b_{ig} + W_{hg} h_{t-1} + b_{hg}) \\\ o_t = \sigma(W_{io} x_t + b_{io} + W_{ho} h_{t-1} + b_{ho}) \\\ c_t = f_t \odot c_{t-1} + i_t \odot g_t \\\ h_t = o_t \odot \tanh(c_t) \\\ \end{array} where hth_t is the hidden state at time `t`, ctc_t is the cell state at time `t`, xtx_t is the input at time `t`, ht−1h_{t-1} is the hidden state of the layer at time `t-1` or the initial hidden state at time `0`, and iti_t , ftf_t , gtg_t , oto_t are the input, forget, cell, and output gates, respectively. σ\sigma is the sigmoid function, and ⊙\odot is the Hadamard product. 
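As an illustrative check of these equations (the sizes below are arbitrary and chosen for brevity), one step of a single-layer `nn.LSTM` can be recomputed directly from its weights, whose gate ordering `(W_ii|W_if|W_ig|W_io)` is documented under Variables below:

>>> import torch
>>> import torch.nn as nn
>>> lstm = nn.LSTM(input_size=4, hidden_size=3, num_layers=1)
>>> x = torch.randn(1, 1, 4)                        # (seq_len, batch, input_size)
>>> h0, c0 = torch.randn(1, 1, 3), torch.randn(1, 1, 3)
>>> out, (hn, cn) = lstm(x, (h0, c0))
>>> # Recompute the same step from the gate equations above.
>>> gates = (lstm.weight_ih_l0 @ x[0, 0] + lstm.bias_ih_l0
...          + lstm.weight_hh_l0 @ h0[0, 0] + lstm.bias_hh_l0)
>>> i, f, g, o = gates.chunk(4)                     # gate order: input, forget, cell, output
>>> c1 = f.sigmoid() * c0[0, 0] + i.sigmoid() * g.tanh()
>>> h1 = o.sigmoid() * c1.tanh()
>>> torch.allclose(h1, hn[0, 0], atol=1e-6)
True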
In a multilayer LSTM, the input xt(l)x^{(l)}_t of the ll -th layer (l>=2l >= 2 ) is the hidden state ht(l−1)h^{(l-1)}_t of the previous layer multiplied by dropout δt(l−1)\delta^{(l-1)}_t where each δt(l−1)\delta^{(l-1)}_t is a Bernoulli random variable which is 00 with probability `dropout`. If `proj_size > 0` is specified, LSTM with projections will be used. This changes the LSTM cell in the following way. First, the dimension of hth_t will be changed from `hidden_size` to `proj_size` (dimensions of WhiW_{hi} will be changed accordingly). Second, the output hidden state of each layer will be multiplied by a learnable projection matrix: ht=Whrhth_t = W_{hr}h_t . Note that as a consequence of this, the output of LSTM network will be of different shape as well. See Inputs/Outputs sections below for exact dimensions of all variables. You can find more details in . Parameters * **input_size** – The number of expected features in the input `x` * **hidden_size** – The number of features in the hidden state `h` * **num_layers** – Number of recurrent layers. E.g., setting `num_layers=2` would mean stacking two LSTMs together to form a `stacked LSTM`, with the second LSTM taking in outputs of the first LSTM and computing the final results. Default: 1 * **bias** – If `False`, then the layer does not use bias weights `b_ih` and `b_hh`. Default: `True` * **batch_first** – If `True`, then the input and output tensors are provided as (batch, seq, feature). Default: `False` * **dropout** – If non-zero, introduces a `Dropout` layer on the outputs of each LSTM layer except the last layer, with dropout probability equal to `dropout`. Default: 0 * **bidirectional** – If `True`, becomes a bidirectional LSTM. Default: `False` * **proj_size** – If `> 0`, will use LSTM with projections of corresponding size. Default: 0 Inputs: input, (h_0, c_0) * **input** of shape `(seq_len, batch, input_size)`: tensor containing the features of the input sequence. The input can also be a packed variable length sequence. See [`torch.nn.utils.rnn.pack_padded_sequence()`](torch.nn.utils.rnn.pack_padded_sequence#torch.nn.utils.rnn.pack_padded_sequence "torch.nn.utils.rnn.pack_padded_sequence") or [`torch.nn.utils.rnn.pack_sequence()`](torch.nn.utils.rnn.pack_sequence#torch.nn.utils.rnn.pack_sequence "torch.nn.utils.rnn.pack_sequence") for details. * **h_0** of shape `(num_layers * num_directions, batch, hidden_size)`: tensor containing the initial hidden state for each element in the batch. If the LSTM is bidirectional, num_directions should be 2, else it should be 1. If `proj_size > 0` was specified, the shape has to be `(num_layers * num_directions, batch, proj_size)`. * **c_0** of shape `(num_layers * num_directions, batch, hidden_size)`: tensor containing the initial cell state for each element in the batch. If `(h_0, c_0)` is not provided, both **h_0** and **c_0** default to zero. Outputs: output, (h_n, c_n) * **output** of shape `(seq_len, batch, num_directions * hidden_size)`: tensor containing the output features `(h_t)` from the last layer of the LSTM, for each `t`. If a [`torch.nn.utils.rnn.PackedSequence`](torch.nn.utils.rnn.packedsequence#torch.nn.utils.rnn.PackedSequence "torch.nn.utils.rnn.PackedSequence") has been given as the input, the output will also be a packed sequence. If `proj_size > 0` was specified, output shape will be `(seq_len, batch, num_directions * proj_size)`. 
For the unpacked case, the directions can be separated using `output.view(seq_len, batch, num_directions, hidden_size)`, with forward and backward being direction `0` and `1` respectively. Similarly, the directions can be separated in the packed case.

* **h_n** of shape `(num_layers * num_directions, batch, hidden_size)`: tensor containing the hidden state for `t = seq_len`. If `proj_size > 0` was specified, `h_n` shape will be `(num_layers * num_directions, batch, proj_size)`. Like _output_, the layers can be separated using `h_n.view(num_layers, num_directions, batch, hidden_size)` and similarly for _c_n_.
* **c_n** of shape `(num_layers * num_directions, batch, hidden_size)`: tensor containing the cell state for `t = seq_len`.

Variables

* **~LSTM.weight_ih_l[k]** – the learnable input-hidden weights of the \text{k}^{th} layer `(W_ii|W_if|W_ig|W_io)`, of shape `(4*hidden_size, input_size)` for `k = 0`. Otherwise, the shape is `(4*hidden_size, num_directions * hidden_size)`
* **~LSTM.weight_hh_l[k]** – the learnable hidden-hidden weights of the \text{k}^{th} layer `(W_hi|W_hf|W_hg|W_ho)`, of shape `(4*hidden_size, hidden_size)`. If `proj_size > 0` was specified, the shape will be `(4*hidden_size, proj_size)`.
* **~LSTM.bias_ih_l[k]** – the learnable input-hidden bias of the \text{k}^{th} layer `(b_ii|b_if|b_ig|b_io)`, of shape `(4*hidden_size)`
* **~LSTM.bias_hh_l[k]** – the learnable hidden-hidden bias of the \text{k}^{th} layer `(b_hi|b_hf|b_hg|b_ho)`, of shape `(4*hidden_size)`
* **~LSTM.weight_hr_l[k]** – the learnable projection weights of the \text{k}^{th} layer, of shape `(proj_size, hidden_size)`. Only present when `proj_size > 0` was specified.

Note

All the weights and biases are initialized from \mathcal{U}(-\sqrt{k}, \sqrt{k}) where k = \frac{1}{\text{hidden\_size}}

Warning

There are known non-determinism issues for RNN functions on some versions of cuDNN and CUDA. You can enforce deterministic behavior by setting the following environment variables:

On CUDA 10.1, set environment variable `CUDA_LAUNCH_BLOCKING=1`. This may affect performance.

On CUDA 10.2 or later, set environment variable (note the leading colon symbol) `CUBLAS_WORKSPACE_CONFIG=:16:8` or `CUBLAS_WORKSPACE_CONFIG=:4096:2`.

See the [cuDNN 8 Release Notes](https://docs.nvidia.com/deeplearning/sdk/cudnn-release-notes/rel_8.html) for more information.

Note

If the following conditions are satisfied: 1) cudnn is enabled, 2) input data is on the GPU, 3) input data has dtype `torch.float16`, 4) a V100 GPU is used, and 5) input data is not in `PackedSequence` format, then the persistent algorithm can be selected to improve performance.

Examples:

>>> rnn = nn.LSTM(10, 20, 2)
>>> input = torch.randn(5, 3, 10)
>>> h0 = torch.randn(2, 3, 20)
>>> c0 = torch.randn(2, 3, 20)
>>> output, (hn, cn) = rnn(input, (h0, c0))

# LSTMCell

`class torch.nn.LSTMCell(input_size, hidden_size, bias=True)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/rnn.html#LSTMCell)

A long short-term memory (LSTM) cell.

\begin{array}{ll} i = \sigma(W_{ii} x + b_{ii} + W_{hi} h + b_{hi}) \\ f = \sigma(W_{if} x + b_{if} + W_{hf} h + b_{hf}) \\ g = \tanh(W_{ig} x + b_{ig} + W_{hg} h + b_{hg}) \\ o = \sigma(W_{io} x + b_{io} + W_{ho} h + b_{ho}) \\ c' = f * c + i * g \\ h' = o * \tanh(c') \end{array}

where \sigma is the sigmoid function, and * is the Hadamard product.
Parameters

* **input_size** – The number of expected features in the input `x`
* **hidden_size** – The number of features in the hidden state `h`
* **bias** – If `False`, then the layer does not use bias weights `b_ih` and `b_hh`. Default: `True`

Inputs: input, (h_0, c_0)

* **input** of shape `(batch, input_size)`: tensor containing input features
* **h_0** of shape `(batch, hidden_size)`: tensor containing the initial hidden state for each element in the batch.
* **c_0** of shape `(batch, hidden_size)`: tensor containing the initial cell state for each element in the batch.

If `(h_0, c_0)` is not provided, both **h_0** and **c_0** default to zero.

Outputs: (h_1, c_1)

* **h_1** of shape `(batch, hidden_size)`: tensor containing the next hidden state for each element in the batch
* **c_1** of shape `(batch, hidden_size)`: tensor containing the next cell state for each element in the batch

Variables

* **~LSTMCell.weight_ih** – the learnable input-hidden weights, of shape `(4*hidden_size, input_size)`
* **~LSTMCell.weight_hh** – the learnable hidden-hidden weights, of shape `(4*hidden_size, hidden_size)`
* **~LSTMCell.bias_ih** – the learnable input-hidden bias, of shape `(4*hidden_size)`
* **~LSTMCell.bias_hh** – the learnable hidden-hidden bias, of shape `(4*hidden_size)`

Note

All the weights and biases are initialized from \mathcal{U}(-\sqrt{k}, \sqrt{k}) where k = \frac{1}{\text{hidden\_size}}

Examples:

>>> rnn = nn.LSTMCell(10, 20)
>>> input = torch.randn(6, 3, 10)
>>> hx = torch.randn(3, 20)
>>> cx = torch.randn(3, 20)
>>> output = []
>>> for i in range(6):
        hx, cx = rnn(input[i], (hx, cx))
        output.append(hx)

# MarginRankingLoss

`class torch.nn.MarginRankingLoss(margin=0.0, size_average=None, reduce=None, reduction='mean')` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/loss.html#MarginRankingLoss)

Creates a criterion that measures the loss given inputs x1, x2, two 1D mini-batch `Tensors`, and a label 1D mini-batch tensor y (containing 1 or -1).

If y = 1 then it is assumed the first input should be ranked higher (have a larger value) than the second input, and vice versa for y = -1.

The loss function for each pair of samples in the mini-batch is:

\text{loss}(x1, x2, y) = \max(0, -y * (x1 - x2) + \text{margin})

Parameters

* **margin** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)"), optional) – Has a default value of 0.
* **size_average** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)"), optional) – Deprecated (see `reduction`). By default, the losses are averaged over each loss element in the batch. Note that for some losses, there are multiple elements per sample. If the field `size_average` is set to `False`, the losses are instead summed for each minibatch. Ignored when `reduce` is `False`. Default: `True`
* **reduce** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)"), optional) – Deprecated (see `reduction`). By default, the losses are averaged or summed over observations for each minibatch depending on `size_average`. When `reduce` is `False`, returns a loss per batch element instead and ignores `size_average`. Default: `True`
* **reduction** (string, optional) – Specifies the reduction to apply to the output: `'none'` | `'mean'` | `'sum'`.
`'none'`: no reduction will be applied, `'mean'`: the sum of the output will be divided by the number of elements in the output, `'sum'`: the output will be summed. Note: `size_average` and `reduce` are in the process of being deprecated, and in the meantime, specifying either of those two args will override `reduction`. Default: `'mean'` Shape: * Input1: (N)(N) where `N` is the batch size. * Input2: (N)(N) , same shape as the Input1. * Target: (N)(N) , same shape as the inputs. * Output: scalar. If `reduction` is `'none'`, then (N)(N) . Examples: >>> loss = nn.MarginRankingLoss() >>> input1 = torch.randn(3, requires_grad=True) >>> input2 = torch.randn(3, requires_grad=True) >>> target = torch.randn(3).sign() >>> output = loss(input1, input2, target) >>> output.backward() # MaxPool1d `class torch.nn.MaxPool1d(kernel_size, stride=None, padding=0, dilation=1, return_indices=False, ceil_mode=False)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/pooling.html#MaxPool1d) Applies a 1D max pooling over an input signal composed of several input planes. In the simplest case, the output value of the layer with input size (N,C,L)(N, C, L) and output (N,C,Lout)(N, C, L_{out}) can be precisely described as: out(Ni,Cj,k)=max⁡m=0,…,kernel_size−1input(Ni,Cj,stride×k+m)out(N_i, C_j, k) = \max_{m=0, \ldots, \text{kernel\\_size} - 1} input(N_i, C_j, stride \times k + m) If `padding` is non-zero, then the input is implicitly padded with negative infinity on both sides for `padding` number of points. `dilation` is the stride between the elements within the sliding window. This [link](https://github.com/vdumoulin/conv_arithmetic/blob/master/README.md) has a nice visualization of the pooling parameters. Note When ceil_mode=True, sliding windows are allowed to go off-bounds if they start within the left padding or the input. Sliding windows that would start in the right padded region are ignored. Parameters * **kernel_size** – The size of the sliding window, must be > 0. * **stride** – The stride of the sliding window, must be > 0\. Default value is `kernel_size`. * **padding** – Implicit negative infinity padding to be added on both sides, must be >= 0 and <= kernel_size / 2. * **dilation** – The stride between elements within a sliding window, must be > 0. * **return_indices** – If `True`, will return the argmax along with the max values. Useful for [`torch.nn.MaxUnpool1d`](torch.nn.maxunpool1d#torch.nn.MaxUnpool1d "torch.nn.MaxUnpool1d") later * **ceil_mode** – If `True`, will use `ceil` instead of `floor` to compute the output shape. This ensures that every element in the input tensor is covered by a sliding window. Shape: * Input: (N,C,Lin)(N, C, L_{in}) * Output: (N,C,Lout)(N, C, L_{out}) , where Lout=⌊Lin+2×padding−dilation×(kernel_size−1)−1stride+1⌋L_{out} = \left\lfloor \frac{L_{in} + 2 \times \text{padding} - \text{dilation} \times (\text{kernel\\_size} - 1) - 1}{\text{stride}} + 1\right\rfloor Examples: >>> # pool of size=3, stride=2 >>> m = nn.MaxPool1d(3, stride=2) >>> input = torch.randn(20, 16, 50) >>> output = m(input) # MaxPool2d `class torch.nn.MaxPool2d(kernel_size, stride=None, padding=0, dilation=1, return_indices=False, ceil_mode=False)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/pooling.html#MaxPool2d) Applies a 2D max pooling over an input signal composed of several input planes. 
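Before the formal description, a quick shape sketch (sizes are arbitrary) that matches the output-size formula given below:

>>> import torch
>>> import torch.nn as nn
>>> m = nn.MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1)
>>> x = torch.randn(1, 8, 50, 32)
>>> m(x).shape
torch.Size([1, 8, 25, 16])
>>> # H_out = floor((50 + 2*1 - 1*(3-1) - 1)/2 + 1) = floor(25.5) = 25
>>> # W_out = floor((32 + 2*1 - 1*(3-1) - 1)/2 + 1) = floor(16.5) = 16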
In the simplest case, the output value of the layer with input size (N,C,H,W)(N, C, H, W) , output (N,C,Hout,Wout)(N, C, H_{out}, W_{out}) and `kernel_size` (kH,kW)(kH, kW) can be precisely described as: out(Ni,Cj,h,w)=max⁡m=0,…,kH−1max⁡n=0,…,kW−1input(Ni,Cj,stride[0]×h+m,stride[1]×w+n)\begin{aligned} out(N_i, C_j, h, w) ={} & \max_{m=0, \ldots, kH-1} \max_{n=0, \ldots, kW-1} \\\ & \text{input}(N_i, C_j, \text{stride[0]} \times h + m, \text{stride[1]} \times w + n) \end{aligned} If `padding` is non-zero, then the input is implicitly zero-padded on both sides for `padding` number of points. `dilation` controls the spacing between the kernel points. It is harder to describe, but this [link](https://github.com/vdumoulin/conv_arithmetic/blob/master/README.md) has a nice visualization of what `dilation` does. Note When ceil_mode=True, sliding windows are allowed to go off-bounds if they start within the left padding or the input. Sliding windows that would start in the right padded region are ignored. The parameters `kernel_size`, `stride`, `padding`, `dilation` can either be: * a single `int` – in which case the same value is used for the height and width dimension * a `tuple` of two ints – in which case, the first `int` is used for the height dimension, and the second `int` for the width dimension Parameters * **kernel_size** – the size of the window to take a max over * **stride** – the stride of the window. Default value is `kernel_size` * **padding** – implicit zero padding to be added on both sides * **dilation** – a parameter that controls the stride of elements in the window * **return_indices** – if `True`, will return the max indices along with the outputs. Useful for [`torch.nn.MaxUnpool2d`](torch.nn.maxunpool2d#torch.nn.MaxUnpool2d "torch.nn.MaxUnpool2d") later * **ceil_mode** – when True, will use `ceil` instead of `floor` to compute the output shape Shape: * Input: (N,C,Hin,Win)(N, C, H_{in}, W_{in}) * Output: (N,C,Hout,Wout)(N, C, H_{out}, W_{out}) , where Hout=⌊Hin+2∗padding[0]−dilation[0]×(kernel_size[0]−1)−1stride[0]+1⌋H_{out} = \left\lfloor\frac{H_{in} + 2 * \text{padding[0]} - \text{dilation[0]} \times (\text{kernel\\_size[0]} - 1) - 1}{\text{stride[0]}} + 1\right\rfloor Wout=⌊Win+2∗padding[1]−dilation[1]×(kernel_size[1]−1)−1stride[1]+1⌋W_{out} = \left\lfloor\frac{W_{in} + 2 * \text{padding[1]} - \text{dilation[1]} \times (\text{kernel\\_size[1]} - 1) - 1}{\text{stride[1]}} + 1\right\rfloor Examples: >>> # pool of square window of size=3, stride=2 >>> m = nn.MaxPool2d(3, stride=2) >>> # pool of non-square window >>> m = nn.MaxPool2d((3, 2), stride=(2, 1)) >>> input = torch.randn(20, 16, 50, 32) >>> output = m(input) # MaxPool3d `class torch.nn.MaxPool3d(kernel_size, stride=None, padding=0, dilation=1, return_indices=False, ceil_mode=False)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/pooling.html#MaxPool3d) Applies a 3D max pooling over an input signal composed of several input planes. 
In the simplest case, the output value of the layer with input size (N,C,D,H,W)(N, C, D, H, W) , output (N,C,Dout,Hout,Wout)(N, C, D_{out}, H_{out}, W_{out}) and `kernel_size` (kD,kH,kW)(kD, kH, kW) can be precisely described as: out(Ni,Cj,d,h,w)=max⁡k=0,…,kD−1max⁡m=0,…,kH−1max⁡n=0,…,kW−1input(Ni,Cj,stride[0]×d+k,stride[1]×h+m,stride[2]×w+n)\begin{aligned} \text{out}(N_i, C_j, d, h, w) ={} & \max_{k=0, \ldots, kD-1} \max_{m=0, \ldots, kH-1} \max_{n=0, \ldots, kW-1} \\\ & \text{input}(N_i, C_j, \text{stride[0]} \times d + k, \text{stride[1]} \times h + m, \text{stride[2]} \times w + n) \end{aligned} If `padding` is non-zero, then the input is implicitly zero-padded on both sides for `padding` number of points. `dilation` controls the spacing between the kernel points. It is harder to describe, but this [link](https://github.com/vdumoulin/conv_arithmetic/blob/master/README.md) has a nice visualization of what `dilation` does. Note When ceil_mode=True, sliding windows are allowed to go off-bounds if they start within the left padding or the input. Sliding windows that would start in the right padded region are ignored. The parameters `kernel_size`, `stride`, `padding`, `dilation` can either be: * a single `int` – in which case the same value is used for the depth, height and width dimension * a `tuple` of three ints – in which case, the first `int` is used for the depth dimension, the second `int` for the height dimension and the third `int` for the width dimension Parameters * **kernel_size** – the size of the window to take a max over * **stride** – the stride of the window. Default value is `kernel_size` * **padding** – implicit zero padding to be added on all three sides * **dilation** – a parameter that controls the stride of elements in the window * **return_indices** – if `True`, will return the max indices along with the outputs. Useful for [`torch.nn.MaxUnpool3d`](torch.nn.maxunpool3d#torch.nn.MaxUnpool3d "torch.nn.MaxUnpool3d") later * **ceil_mode** – when True, will use `ceil` instead of `floor` to compute the output shape Shape: * Input: (N,C,Din,Hin,Win)(N, C, D_{in}, H_{in}, W_{in}) * Output: (N,C,Dout,Hout,Wout)(N, C, D_{out}, H_{out}, W_{out}) , where Dout=⌊Din+2×padding[0]−dilation[0]×(kernel_size[0]−1)−1stride[0]+1⌋D_{out} = \left\lfloor\frac{D_{in} + 2 \times \text{padding}[0] - \text{dilation}[0] \times (\text{kernel\\_size}[0] - 1) - 1}{\text{stride}[0]} + 1\right\rfloor Hout=⌊Hin+2×padding[1]−dilation[1]×(kernel_size[1]−1)−1stride[1]+1⌋H_{out} = \left\lfloor\frac{H_{in} + 2 \times \text{padding}[1] - \text{dilation}[1] \times (\text{kernel\\_size}[1] - 1) - 1}{\text{stride}[1]} + 1\right\rfloor Wout=⌊Win+2×padding[2]−dilation[2]×(kernel_size[2]−1)−1stride[2]+1⌋W_{out} = \left\lfloor\frac{W_{in} + 2 \times \text{padding}[2] - \text{dilation}[2] \times (\text{kernel\\_size}[2] - 1) - 1}{\text{stride}[2]} + 1\right\rfloor Examples: >>> # pool of square window of size=3, stride=2 >>> m = nn.MaxPool3d(3, stride=2) >>> # pool of non-square window >>> m = nn.MaxPool3d((3, 2, 2), stride=(2, 1, 2)) >>> input = torch.randn(20, 16, 50,44, 31) >>> output = m(input) # MaxUnpool1d `class torch.nn.MaxUnpool1d(kernel_size, stride=None, padding=0)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/pooling.html#MaxUnpool1d) Computes a partial inverse of [`MaxPool1d`](torch.nn.maxpool1d#torch.nn.MaxPool1d "torch.nn.MaxPool1d"). [`MaxPool1d`](torch.nn.maxpool1d#torch.nn.MaxPool1d "torch.nn.MaxPool1d") is not fully invertible, since the non-maximal values are lost. 
`MaxUnpool1d` takes in as input the output of [`MaxPool1d`](torch.nn.maxpool1d#torch.nn.MaxPool1d "torch.nn.MaxPool1d") including the indices of the maximal values and computes a partial inverse in which all non-maximal values are set to zero. Note [`MaxPool1d`](torch.nn.maxpool1d#torch.nn.MaxPool1d "torch.nn.MaxPool1d") can map several input sizes to the same output sizes. Hence, the inversion process can get ambiguous. To accommodate this, you can provide the needed output size as an additional argument `output_size` in the forward call. See the Inputs and Example below. Parameters * **kernel_size** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _or_[tuple](https://docs.python.org/3/library/stdtypes.html#tuple "\(in Python v3.9\)")) – Size of the max pooling window. * **stride** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _or_[tuple](https://docs.python.org/3/library/stdtypes.html#tuple "\(in Python v3.9\)")) – Stride of the max pooling window. It is set to `kernel_size` by default. * **padding** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _or_[tuple](https://docs.python.org/3/library/stdtypes.html#tuple "\(in Python v3.9\)")) – Padding that was added to the input Inputs: * `input`: the input Tensor to invert * `indices`: the indices given out by [`MaxPool1d`](torch.nn.maxpool1d#torch.nn.MaxPool1d "torch.nn.MaxPool1d") * `output_size` (optional): the targeted output size Shape: * Input: (N,C,Hin)(N, C, H_{in}) * Output: (N,C,Hout)(N, C, H_{out}) , where Hout=(Hin−1)×stride[0]−2×padding[0]+kernel_size[0]H_{out} = (H_{in} - 1) \times \text{stride}[0] - 2 \times \text{padding}[0] + \text{kernel\\_size}[0] or as given by `output_size` in the call operator Example: >>> pool = nn.MaxPool1d(2, stride=2, return_indices=True) >>> unpool = nn.MaxUnpool1d(2, stride=2) >>> input = torch.tensor([[[1., 2, 3, 4, 5, 6, 7, 8]]]) >>> output, indices = pool(input) >>> unpool(output, indices) tensor([[[ 0., 2., 0., 4., 0., 6., 0., 8.]]]) >>> # Example showcasing the use of output_size >>> input = torch.tensor([[[1., 2, 3, 4, 5, 6, 7, 8, 9]]]) >>> output, indices = pool(input) >>> unpool(output, indices, output_size=input.size()) tensor([[[ 0., 2., 0., 4., 0., 6., 0., 8., 0.]]]) >>> unpool(output, indices) tensor([[[ 0., 2., 0., 4., 0., 6., 0., 8.]]]) # MaxUnpool2d `class torch.nn.MaxUnpool2d(kernel_size, stride=None, padding=0)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/pooling.html#MaxUnpool2d) Computes a partial inverse of [`MaxPool2d`](torch.nn.maxpool2d#torch.nn.MaxPool2d "torch.nn.MaxPool2d"). [`MaxPool2d`](torch.nn.maxpool2d#torch.nn.MaxPool2d "torch.nn.MaxPool2d") is not fully invertible, since the non-maximal values are lost. `MaxUnpool2d` takes in as input the output of [`MaxPool2d`](torch.nn.maxpool2d#torch.nn.MaxPool2d "torch.nn.MaxPool2d") including the indices of the maximal values and computes a partial inverse in which all non-maximal values are set to zero. Note [`MaxPool2d`](torch.nn.maxpool2d#torch.nn.MaxPool2d "torch.nn.MaxPool2d") can map several input sizes to the same output sizes. Hence, the inversion process can get ambiguous. To accommodate this, you can provide the needed output size as an additional argument `output_size` in the forward call. See the Inputs and Example below. 
Parameters * **kernel_size** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _or_[tuple](https://docs.python.org/3/library/stdtypes.html#tuple "\(in Python v3.9\)")) – Size of the max pooling window. * **stride** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _or_[tuple](https://docs.python.org/3/library/stdtypes.html#tuple "\(in Python v3.9\)")) – Stride of the max pooling window. It is set to `kernel_size` by default. * **padding** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _or_[tuple](https://docs.python.org/3/library/stdtypes.html#tuple "\(in Python v3.9\)")) – Padding that was added to the input Inputs: * `input`: the input Tensor to invert * `indices`: the indices given out by [`MaxPool2d`](torch.nn.maxpool2d#torch.nn.MaxPool2d "torch.nn.MaxPool2d") * `output_size` (optional): the targeted output size Shape: * Input: (N,C,Hin,Win)(N, C, H_{in}, W_{in}) * Output: (N,C,Hout,Wout)(N, C, H_{out}, W_{out}) , where Hout=(Hin−1)×stride[0]−2×padding[0]+kernel_size[0]H_{out} = (H_{in} - 1) \times \text{stride[0]} - 2 \times \text{padding[0]} + \text{kernel\\_size[0]} Wout=(Win−1)×stride[1]−2×padding[1]+kernel_size[1]W_{out} = (W_{in} - 1) \times \text{stride[1]} - 2 \times \text{padding[1]} + \text{kernel\\_size[1]} or as given by `output_size` in the call operator Example: >>> pool = nn.MaxPool2d(2, stride=2, return_indices=True) >>> unpool = nn.MaxUnpool2d(2, stride=2) >>> input = torch.tensor([[[[ 1., 2, 3, 4], [ 5, 6, 7, 8], [ 9, 10, 11, 12], [13, 14, 15, 16]]]]) >>> output, indices = pool(input) >>> unpool(output, indices) tensor([[[[ 0., 0., 0., 0.], [ 0., 6., 0., 8.], [ 0., 0., 0., 0.], [ 0., 14., 0., 16.]]]]) >>> # specify a different output size than input size >>> unpool(output, indices, output_size=torch.Size([1, 1, 5, 5])) tensor([[[[ 0., 0., 0., 0., 0.], [ 6., 0., 8., 0., 0.], [ 0., 0., 0., 14., 0.], [ 16., 0., 0., 0., 0.], [ 0., 0., 0., 0., 0.]]]]) # MaxUnpool3d `class torch.nn.MaxUnpool3d(kernel_size, stride=None, padding=0)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/pooling.html#MaxUnpool3d) Computes a partial inverse of [`MaxPool3d`](torch.nn.maxpool3d#torch.nn.MaxPool3d "torch.nn.MaxPool3d"). [`MaxPool3d`](torch.nn.maxpool3d#torch.nn.MaxPool3d "torch.nn.MaxPool3d") is not fully invertible, since the non-maximal values are lost. `MaxUnpool3d` takes in as input the output of [`MaxPool3d`](torch.nn.maxpool3d#torch.nn.MaxPool3d "torch.nn.MaxPool3d") including the indices of the maximal values and computes a partial inverse in which all non-maximal values are set to zero. Note [`MaxPool3d`](torch.nn.maxpool3d#torch.nn.MaxPool3d "torch.nn.MaxPool3d") can map several input sizes to the same output sizes. Hence, the inversion process can get ambiguous. To accommodate this, you can provide the needed output size as an additional argument `output_size` in the forward call. See the Inputs section below. Parameters * **kernel_size** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _or_[tuple](https://docs.python.org/3/library/stdtypes.html#tuple "\(in Python v3.9\)")) – Size of the max pooling window. * **stride** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _or_[tuple](https://docs.python.org/3/library/stdtypes.html#tuple "\(in Python v3.9\)")) – Stride of the max pooling window. It is set to `kernel_size` by default. 
* **padding** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _or_[tuple](https://docs.python.org/3/library/stdtypes.html#tuple "\(in Python v3.9\)")) – Padding that was added to the input Inputs: * `input`: the input Tensor to invert * `indices`: the indices given out by [`MaxPool3d`](torch.nn.maxpool3d#torch.nn.MaxPool3d "torch.nn.MaxPool3d") * `output_size` (optional): the targeted output size Shape: * Input: (N,C,Din,Hin,Win)(N, C, D_{in}, H_{in}, W_{in}) * Output: (N,C,Dout,Hout,Wout)(N, C, D_{out}, H_{out}, W_{out}) , where Dout=(Din−1)×stride[0]−2×padding[0]+kernel_size[0]D_{out} = (D_{in} - 1) \times \text{stride[0]} - 2 \times \text{padding[0]} + \text{kernel\\_size[0]} Hout=(Hin−1)×stride[1]−2×padding[1]+kernel_size[1]H_{out} = (H_{in} - 1) \times \text{stride[1]} - 2 \times \text{padding[1]} + \text{kernel\\_size[1]} Wout=(Win−1)×stride[2]−2×padding[2]+kernel_size[2]W_{out} = (W_{in} - 1) \times \text{stride[2]} - 2 \times \text{padding[2]} + \text{kernel\\_size[2]} or as given by `output_size` in the call operator Example: >>> # pool of square window of size=3, stride=2 >>> pool = nn.MaxPool3d(3, stride=2, return_indices=True) >>> unpool = nn.MaxUnpool3d(3, stride=2) >>> output, indices = pool(torch.randn(20, 16, 51, 33, 15)) >>> unpooled_output = unpool(output, indices) >>> unpooled_output.size() torch.Size([20, 16, 51, 33, 15]) # Module `class torch.nn.Module` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/module.html#Module) Base class for all neural network modules. Your models should also subclass this class. Modules can also contain other Modules, allowing to nest them in a tree structure. You can assign the submodules as regular attributes: import torch.nn as nn import torch.nn.functional as F class Model(nn.Module): def __init__(self): super(Model, self).__init__() self.conv1 = nn.Conv2d(1, 20, 5) self.conv2 = nn.Conv2d(20, 20, 5) def forward(self, x): x = F.relu(self.conv1(x)) return F.relu(self.conv2(x)) Submodules assigned in this way will be registered, and will have their parameters converted too when you call `to()`, etc. Variables **training** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – Boolean represents whether this module is in training or evaluation mode. `add_module(name, module)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/module.html#Module.add_module) Adds a child module to the current module. The module can be accessed as an attribute using the given name. Parameters * **name** (_string_) – name of the child module. The child module can be accessed from this module using the given name * **module** (Module) – child module to be added to the module. `apply(fn)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/module.html#Module.apply) Applies `fn` recursively to every submodule (as returned by `.children()`) as well as self. Typical use includes initializing the parameters of a model (see also [torch.nn.init](../nn.init#nn-init-doc)). 
Parameters **fn** (`Module` -> None) – function to be applied to each submodule Returns self Return type Module Example: >>> @torch.no_grad() >>> def init_weights(m): >>> print(m) >>> if type(m) == nn.Linear: >>> m.weight.fill_(1.0) >>> print(m.weight) >>> net = nn.Sequential(nn.Linear(2, 2), nn.Linear(2, 2)) >>> net.apply(init_weights) Linear(in_features=2, out_features=2, bias=True) Parameter containing: tensor([[ 1., 1.], [ 1., 1.]]) Linear(in_features=2, out_features=2, bias=True) Parameter containing: tensor([[ 1., 1.], [ 1., 1.]]) Sequential( (0): Linear(in_features=2, out_features=2, bias=True) (1): Linear(in_features=2, out_features=2, bias=True) ) Sequential( (0): Linear(in_features=2, out_features=2, bias=True) (1): Linear(in_features=2, out_features=2, bias=True) ) `bfloat16()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/module.html#Module.bfloat16) Casts all floating point parameters and buffers to `bfloat16` datatype. Returns self Return type Module `buffers(recurse=True)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/module.html#Module.buffers) Returns an iterator over module buffers. Parameters **recurse** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – if True, then yields buffers of this module and all submodules. Otherwise, yields only buffers that are direct members of this module. Yields _torch.Tensor_ – module buffer Example: >>> for buf in model.buffers(): >>> print(type(buf), buf.size()) (20L,) (20L, 1L, 5L, 5L) `children()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/module.html#Module.children) Returns an iterator over immediate children modules. Yields _Module_ – a child module `cpu()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/module.html#Module.cpu) Moves all model parameters and buffers to the CPU. Returns self Return type Module `cuda(device=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/module.html#Module.cuda) Moves all model parameters and buffers to the GPU. This also makes associated parameters and buffers different objects. So it should be called before constructing optimizer if the module will live on GPU while being optimized. Parameters **device** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – if specified, all parameters will be copied to that device Returns self Return type Module `double()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/module.html#Module.double) Casts all floating point parameters and buffers to `double` datatype. Returns self Return type Module `dump_patches: bool = False` This allows better BC support for `load_state_dict()`. In `state_dict()`, the version number will be saved as in the attribute `_metadata` of the returned state dict, and thus pickled. `_metadata` is a dictionary with keys that follow the naming convention of state dict. See `_load_from_state_dict` on how to use this information in loading. If new parameters/buffers are added/removed from a module, this number shall be bumped, and the module’s `_load_from_state_dict` method can compare the version number and do appropriate changes if the state dict is from before the change. `eval()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/module.html#Module.eval) Sets the module in evaluation mode. This has any effect only on certain modules. 
See documentations of particular modules for details of their behaviors in training/evaluation mode, if they are affected, e.g. [`Dropout`](torch.nn.dropout#torch.nn.Dropout "torch.nn.Dropout"), `BatchNorm`, etc. This is equivalent with `self.train(False)`. Returns self Return type Module `extra_repr()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/module.html#Module.extra_repr) Set the extra representation of the module To print customized extra information, you should re-implement this method in your own modules. Both single-line and multi-line strings are acceptable. `float()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/module.html#Module.float) Casts all floating point parameters and buffers to float datatype. Returns self Return type Module `forward(*input)` Defines the computation performed at every call. Should be overridden by all subclasses. Note Although the recipe for forward pass needs to be defined within this function, one should call the `Module` instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them. `half()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/module.html#Module.half) Casts all floating point parameters and buffers to `half` datatype. Returns self Return type Module `load_state_dict(state_dict, strict=True)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/module.html#Module.load_state_dict) Copies parameters and buffers from `state_dict` into this module and its descendants. If `strict` is `True`, then the keys of `state_dict` must exactly match the keys returned by this module’s `state_dict()` function. Parameters * **state_dict** ([dict](https://docs.python.org/3/library/stdtypes.html#dict "\(in Python v3.9\)")) – a dict containing parameters and persistent buffers. * **strict** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – whether to strictly enforce that the keys in `state_dict` match the keys returned by this module’s `state_dict()` function. Default: `True` Returns * **missing_keys** is a list of str containing the missing keys * **unexpected_keys** is a list of str containing the unexpected keys Return type `NamedTuple` with `missing_keys` and `unexpected_keys` fields `modules()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/module.html#Module.modules) Returns an iterator over all modules in the network. Yields _Module_ – a module in the network Note Duplicate modules are returned only once. In the following example, `l` will be returned only once. Example: >>> l = nn.Linear(2, 2) >>> net = nn.Sequential(l, l) >>> for idx, m in enumerate(net.modules()): print(idx, '->', m) 0 -> Sequential( (0): Linear(in_features=2, out_features=2, bias=True) (1): Linear(in_features=2, out_features=2, bias=True) ) 1 -> Linear(in_features=2, out_features=2, bias=True) `named_buffers(prefix='', recurse=True)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/module.html#Module.named_buffers) Returns an iterator over module buffers, yielding both the name of the buffer as well as the buffer itself. Parameters * **prefix** ([str](https://docs.python.org/3/library/stdtypes.html#str "\(in Python v3.9\)")) – prefix to prepend to all buffer names. * **recurse** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – if True, then yields buffers of this module and all submodules. 
Otherwise, yields only buffers that are direct members of this module. Yields _(string, torch.Tensor)_ – Tuple containing the name and buffer Example: >>> for name, buf in self.named_buffers(): >>> if name in ['running_var']: >>> print(buf.size()) `named_children()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/module.html#Module.named_children) Returns an iterator over immediate children modules, yielding both the name of the module as well as the module itself. Yields _(string, Module)_ – Tuple containing a name and child module Example: >>> for name, module in model.named_children(): >>> if name in ['conv4', 'conv5']: >>> print(module) `named_modules(memo=None, prefix='')` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/module.html#Module.named_modules) Returns an iterator over all modules in the network, yielding both the name of the module as well as the module itself. Yields _(string, Module)_ – Tuple of name and module Note Duplicate modules are returned only once. In the following example, `l` will be returned only once. Example: >>> l = nn.Linear(2, 2) >>> net = nn.Sequential(l, l) >>> for idx, m in enumerate(net.named_modules()): print(idx, '->', m) 0 -> ('', Sequential( (0): Linear(in_features=2, out_features=2, bias=True) (1): Linear(in_features=2, out_features=2, bias=True) )) 1 -> ('0', Linear(in_features=2, out_features=2, bias=True)) `named_parameters(prefix='', recurse=True)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/module.html#Module.named_parameters) Returns an iterator over module parameters, yielding both the name of the parameter as well as the parameter itself. Parameters * **prefix** ([str](https://docs.python.org/3/library/stdtypes.html#str "\(in Python v3.9\)")) – prefix to prepend to all parameter names. * **recurse** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – if True, then yields parameters of this module and all submodules. Otherwise, yields only parameters that are direct members of this module. Yields _(string, Parameter)_ – Tuple containing the name and parameter Example: >>> for name, param in self.named_parameters(): >>> if name in ['bias']: >>> print(param.size()) `parameters(recurse=True)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/module.html#Module.parameters) Returns an iterator over module parameters. This is typically passed to an optimizer. Parameters **recurse** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – if True, then yields parameters of this module and all submodules. Otherwise, yields only parameters that are direct members of this module. Yields _Parameter_ – module parameter Example: >>> for param in model.parameters(): >>> print(type(param), param.size()) (20L,) (20L, 1L, 5L, 5L) `register_backward_hook(hook)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/module.html#Module.register_backward_hook) Registers a backward hook on the module. This function is deprecated in favor of `nn.Module.register_full_backward_hook()` and the behavior of this function will change in future versions. Returns a handle that can be used to remove the added hook by calling `handle.remove()` Return type `torch.utils.hooks.RemovableHandle` `register_buffer(name, tensor, persistent=True)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/module.html#Module.register_buffer) Adds a buffer to the module. 
This is typically used to register a buffer that should not to be considered a model parameter. For example, BatchNorm’s `running_mean` is not a parameter, but is part of the module’s state. Buffers, by default, are persistent and will be saved alongside parameters. This behavior can be changed by setting `persistent` to `False`. The only difference between a persistent buffer and a non-persistent buffer is that the latter will not be a part of this module’s `state_dict`. Buffers can be accessed as attributes using given names. Parameters * **name** (_string_) – name of the buffer. The buffer can be accessed from this module using the given name * **tensor** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – buffer to be registered. * **persistent** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – whether the buffer is part of this module’s `state_dict`. Example: >>> self.register_buffer('running_mean', torch.zeros(num_features)) `register_forward_hook(hook)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/module.html#Module.register_forward_hook) Registers a forward hook on the module. The hook will be called every time after `forward()` has computed an output. It should have the following signature: hook(module, input, output) -> None or modified output The input contains only the positional arguments given to the module. Keyword arguments won’t be passed to the hooks and only to the `forward`. The hook can modify the output. It can modify the input inplace but it will not have effect on forward since this is called after `forward()` is called. Returns a handle that can be used to remove the added hook by calling `handle.remove()` Return type `torch.utils.hooks.RemovableHandle` `register_forward_pre_hook(hook)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/module.html#Module.register_forward_pre_hook) Registers a forward pre-hook on the module. The hook will be called every time before `forward()` is invoked. It should have the following signature: hook(module, input) -> None or modified input The input contains only the positional arguments given to the module. Keyword arguments won’t be passed to the hooks and only to the `forward`. The hook can modify the input. User can either return a tuple or a single modified value in the hook. We will wrap the value into a tuple if a single value is returned(unless that value is already a tuple). Returns a handle that can be used to remove the added hook by calling `handle.remove()` Return type `torch.utils.hooks.RemovableHandle` `register_full_backward_hook(hook)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/module.html#Module.register_full_backward_hook) Registers a backward hook on the module. The hook will be called every time the gradients with respect to module inputs are computed. The hook should have the following signature: hook(module, grad_input, grad_output) -> tuple(Tensor) or None The `grad_input` and `grad_output` are tuples that contain the gradients with respect to the inputs and outputs respectively. The hook should not modify its arguments, but it can optionally return a new gradient with respect to the input that will be used in place of `grad_input` in subsequent computations. `grad_input` will only correspond to the inputs given as positional arguments and all kwarg arguments are ignored. Entries in `grad_input` and `grad_output` will be `None` for all non-Tensor arguments. 
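As a rough, illustrative sketch (the hook, layer, and tensor sizes below are invented for this example and are not part of the documented API), a full backward hook can inspect the gradients and then be detached through the returned handle:

>>> import torch
>>> import torch.nn as nn
>>> def report_grad_norms(module, grad_input, grad_output):
...     # print the norm of each gradient flowing out of the module;
...     # returning None leaves the gradients unchanged
...     print([g.norm().item() for g in grad_output if g is not None])
>>> layer = nn.Linear(4, 2)
>>> handle = layer.register_full_backward_hook(report_grad_norms)
>>> layer(torch.randn(3, 4)).sum().backward()  # triggers the hook during the backward pass
>>> handle.remove()  # detach the hook once it is no longer needed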
Warning Modifying inputs or outputs inplace is not allowed when using backward hooks and will raise an error. Returns a handle that can be used to remove the added hook by calling `handle.remove()` Return type `torch.utils.hooks.RemovableHandle` `register_parameter(name, param)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/module.html#Module.register_parameter) Adds a parameter to the module. The parameter can be accessed as an attribute using given name. Parameters * **name** (_string_) – name of the parameter. The parameter can be accessed from this module using the given name * **param** ([Parameter](torch.nn.parameter.parameter#torch.nn.parameter.Parameter "torch.nn.parameter.Parameter")) – parameter to be added to the module. `requires_grad_(requires_grad=True)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/module.html#Module.requires_grad_) Change if autograd should record operations on parameters in this module. This method sets the parameters’ `requires_grad` attributes in-place. This method is helpful for freezing part of the module for finetuning or training parts of a model individually (e.g., GAN training). Parameters **requires_grad** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – whether autograd should record operations on parameters in this module. Default: `True`. Returns self Return type Module `state_dict(destination=None, prefix='', keep_vars=False)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/module.html#Module.state_dict) Returns a dictionary containing a whole state of the module. Both parameters and persistent buffers (e.g. running averages) are included. Keys are corresponding parameter and buffer names. Returns a dictionary containing a whole state of the module Return type [dict](https://docs.python.org/3/library/stdtypes.html#dict "\(in Python v3.9\)") Example: >>> module.state_dict().keys() ['bias', 'weight'] `to(*args, **kwargs)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/module.html#Module.to) Moves and/or casts the parameters and buffers. This can be called as `to(device=None, dtype=None, non_blocking=False)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/module.html#Module.to) `to(dtype, non_blocking=False)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/module.html#Module.to) `to(tensor, non_blocking=False)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/module.html#Module.to) `to(memory_format=torch.channels_last)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/module.html#Module.to) Its signature is similar to [`torch.Tensor.to()`](../tensors#torch.Tensor.to "torch.Tensor.to"), but only accepts floating point or complex `dtype`s. In addition, this method will only cast the floating point or complex parameters and buffers to :attr:`dtype` (if given). The integral parameters and buffers will be moved `device`, if that is given, but with dtypes unchanged. When `non_blocking` is set, it tries to convert/move asynchronously with respect to the host if possible, e.g., moving CPU Tensors with pinned memory to CUDA devices. See below for examples. Note This method modifies the module in-place. 
Parameters * **device** (`torch.device`) – the desired device of the parameters and buffers in this module * **dtype** (`torch.dtype`) – the desired floating point or complex dtype of the parameters and buffers in this module * **tensor** ([torch.Tensor](../tensors#torch.Tensor "torch.Tensor")) – Tensor whose dtype and device are the desired dtype and device for all parameters and buffers in this module * **memory_format** (`torch.memory_format`) – the desired memory format for 4D parameters and buffers in this module (keyword only argument) Returns self Return type Module Examples: >>> linear = nn.Linear(2, 2) >>> linear.weight Parameter containing: tensor([[ 0.1913, -0.3420], [-0.5113, -0.2325]]) >>> linear.to(torch.double) Linear(in_features=2, out_features=2, bias=True) >>> linear.weight Parameter containing: tensor([[ 0.1913, -0.3420], [-0.5113, -0.2325]], dtype=torch.float64) >>> gpu1 = torch.device("cuda:1") >>> linear.to(gpu1, dtype=torch.half, non_blocking=True) Linear(in_features=2, out_features=2, bias=True) >>> linear.weight Parameter containing: tensor([[ 0.1914, -0.3420], [-0.5112, -0.2324]], dtype=torch.float16, device='cuda:1') >>> cpu = torch.device("cpu") >>> linear.to(cpu) Linear(in_features=2, out_features=2, bias=True) >>> linear.weight Parameter containing: tensor([[ 0.1914, -0.3420], [-0.5112, -0.2324]], dtype=torch.float16) >>> linear = nn.Linear(2, 2, bias=None).to(torch.cdouble) >>> linear.weight Parameter containing: tensor([[ 0.3741+0.j, 0.2382+0.j], [ 0.5593+0.j, -0.4443+0.j]], dtype=torch.complex128) >>> linear(torch.ones(3, 2, dtype=torch.cdouble)) tensor([[0.6122+0.j, 0.1150+0.j], [0.6122+0.j, 0.1150+0.j], [0.6122+0.j, 0.1150+0.j]], dtype=torch.complex128) `train(mode=True)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/module.html#Module.train) Sets the module in training mode. This has any effect only on certain modules. See documentations of particular modules for details of their behaviors in training/evaluation mode, if they are affected, e.g. [`Dropout`](torch.nn.dropout#torch.nn.Dropout "torch.nn.Dropout"), `BatchNorm`, etc. Parameters **mode** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – whether to set training mode (`True`) or evaluation mode (`False`). Default: `True`. Returns self Return type Module `type(dst_type)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/module.html#Module.type) Casts all parameters and buffers to `dst_type`. Parameters **dst_type** ([type](https://docs.python.org/3/library/functions.html#type "\(in Python v3.9\)") _or_ _string_) – the desired type Returns self Return type Module `xpu(device=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/module.html#Module.xpu) Moves all model parameters and buffers to the XPU. This also makes associated parameters and buffers different objects. So it should be called before constructing optimizer if the module will live on XPU while being optimized. Parameters **device** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – if specified, all parameters will be copied to that device Returns self Return type Module `zero_grad(set_to_none=False)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/module.html#Module.zero_grad) Sets gradients of all model parameters to zero. See similar function under [`torch.optim.Optimizer`](../optim#torch.optim.Optimizer "torch.optim.Optimizer") for more context. 
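As a small sketch (the model, optimizer, and data below are placeholders, not part of the documented API), gradients are typically cleared once per training step; the `set_to_none` flag is described just below:

>>> import torch
>>> import torch.nn as nn
>>> model = nn.Linear(4, 2)
>>> optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
>>> loss = model(torch.randn(8, 4)).sum()
>>> loss.backward()
>>> optimizer.step()
>>> model.zero_grad(set_to_none=True)  # gradients become None instead of zero-filled tensors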
Parameters **set_to_none** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – instead of setting to zero, set the grads to None. See [`torch.optim.Optimizer.zero_grad()`](../optim#torch.optim.Optimizer.zero_grad "torch.optim.Optimizer.zero_grad") for details. # ModuleDict `class torch.nn.ModuleDict(modules=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/container.html#ModuleDict) Holds submodules in a dictionary. `ModuleDict` can be indexed like a regular Python dictionary, but modules it contains are properly registered, and will be visible by all [`Module`](torch.nn.module#torch.nn.Module "torch.nn.Module") methods. `ModuleDict` is an **ordered** dictionary that respects * the order of insertion, and * in `update()`, the order of the merged `OrderedDict`, `dict` (started from Python 3.6) or another `ModuleDict` (the argument to `update()`). Note that `update()` with other unordered mapping types (e.g., Python’s plain `dict` before Python version 3.6) does not preserve the order of the merged mapping. Parameters **modules** (_iterable_ _,__optional_) – a mapping (dictionary) of (string: module) or an iterable of key-value pairs of type (string, module) Example: class MyModule(nn.Module): def __init__(self): super(MyModule, self).__init__() self.choices = nn.ModuleDict({ 'conv': nn.Conv2d(10, 10, 3), 'pool': nn.MaxPool2d(3) }) self.activations = nn.ModuleDict([ ['lrelu', nn.LeakyReLU()], ['prelu', nn.PReLU()] ]) def forward(self, x, choice, act): x = self.choices[choice](x) x = self.activations[act](x) return x `clear()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/container.html#ModuleDict.clear) Remove all items from the ModuleDict. `items()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/container.html#ModuleDict.items) Return an iterable of the ModuleDict key/value pairs. `keys()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/container.html#ModuleDict.keys) Return an iterable of the ModuleDict keys. `pop(key)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/container.html#ModuleDict.pop) Remove key from the ModuleDict and return its module. Parameters **key** (_string_) – key to pop from the ModuleDict `update(modules)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/container.html#ModuleDict.update) Update the `ModuleDict` with the key-value pairs from a mapping or an iterable, overwriting existing keys. Note If `modules` is an `OrderedDict`, a `ModuleDict`, or an iterable of key-value pairs, the order of new elements in it is preserved. Parameters **modules** (_iterable_) – a mapping (dictionary) from string to [`Module`](torch.nn.module#torch.nn.Module "torch.nn.Module"), or an iterable of key-value pairs of type (string, [`Module`](torch.nn.module#torch.nn.Module "torch.nn.Module")) `values()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/container.html#ModuleDict.values) Return an iterable of the ModuleDict values. # ModuleList `class torch.nn.ModuleList(modules=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/container.html#ModuleList) Holds submodules in a list. `ModuleList` can be indexed like a regular Python list, but modules it contains are properly registered, and will be visible by all [`Module`](torch.nn.module#torch.nn.Module "torch.nn.Module") methods. 
Parameters **modules** (_iterable_ _,__optional_) – an iterable of modules to add Example: class MyModule(nn.Module): def __init__(self): super(MyModule, self).__init__() self.linears = nn.ModuleList([nn.Linear(10, 10) for i in range(10)]) def forward(self, x): # ModuleList can act as an iterable, or be indexed using ints for i, l in enumerate(self.linears): x = self.linears[i // 2](x) + l(x) return x `append(module)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/container.html#ModuleList.append) Appends a given module to the end of the list. Parameters **module** ([nn.Module](torch.nn.module#torch.nn.Module "torch.nn.Module")) – module to append `extend(modules)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/container.html#ModuleList.extend) Appends modules from a Python iterable to the end of the list. Parameters **modules** (_iterable_) – iterable of modules to append `insert(index, module)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/container.html#ModuleList.insert) Insert a given module before a given index in the list. Parameters * **index** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – index to insert. * **module** ([nn.Module](torch.nn.module#torch.nn.Module "torch.nn.Module")) – module to insert # LazyModuleMixin `class torch.nn.modules.lazy.LazyModuleMixin(*args, **kwargs)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/lazy.html#LazyModuleMixin) A mixin for modules that lazily initialize parameters, also known as “lazy modules.” Modules that lazily initialize parameters, or “lazy modules”, derive the shapes of their parameters from the first input(s) to their forward method. Until that first forward they contain `torch.nn.UninitializedParameter`s that should not be accessed or used, and afterward they contain regular :class:`torch.nn.Parameter`s. Lazy modules are convenient since they don't require computing some module arguments, like the `in_features` argument of a typical [`torch.nn.Linear`](torch.nn.linear#torch.nn.Linear "torch.nn.Linear"). After construction, networks with lazy modules should first be converted to the desired dtype and placed on the desired device. The lazy modules should then be initialized with one or more “dry runs”. These “dry runs” send inputs of the correct size, dtype, and device through the network and to each one of its lazy modules. After this the network can be used as usual. >>> class LazyMLP(torch.nn.Module): ... def __init__(self): ... super().__init__() ... self.fc1 = torch.nn.LazyLinear(10) ... self.relu1 = torch.nn.ReLU() ... self.fc2 = torch.nn.LazyLinear(1) ... self.relu2 = torch.nn.ReLU() ... ... def forward(self, input): ... x = self.relu1(self.fc1(input)) ... y = self.relu2(self.fc2(x)) ... 
return y

>>> # constructs a network with lazy modules
>>> lazy_mlp = LazyMLP()
>>> # transforms the network's device and dtype
>>> # NOTE: these transforms can and should be applied after construction and before any 'dry runs'
>>> lazy_mlp = lazy_mlp.cuda().double()
>>> lazy_mlp
LazyMLP(
  (fc1): LazyLinear(in_features=0, out_features=10, bias=True)
  (relu1): ReLU()
  (fc2): LazyLinear(in_features=0, out_features=1, bias=True)
  (relu2): ReLU()
)
>>> # performs a dry run to initialize the network's lazy modules
>>> lazy_mlp(torch.ones(10, 10).cuda().double())
>>> # after initialization, LazyLinear modules become regular Linear modules
>>> lazy_mlp
LazyMLP(
  (fc1): Linear(in_features=10, out_features=10, bias=True)
  (relu1): ReLU()
  (fc2): Linear(in_features=10, out_features=1, bias=True)
  (relu2): ReLU()
)
>>> # attaches an optimizer, since parameters can now be used as usual
>>> optim = torch.optim.SGD(lazy_mlp.parameters(), lr=0.01)

A final caveat when using lazy modules is that the order of initialization of a network’s parameters may change, since the lazy modules are always initialized after other modules. This can cause the parameters of a network using lazy modules to be initialized differently than the parameters of a network without lazy modules. For example, if the LazyMLP class defined above had a [`torch.nn.LazyLinear`](torch.nn.lazylinear#torch.nn.LazyLinear "torch.nn.LazyLinear") module first and then a regular [`torch.nn.Linear`](torch.nn.linear#torch.nn.Linear "torch.nn.Linear") second, the second module would be initialized on construction and the first module would be initialized during the first dry run.

Lazy modules can be serialized with a state dict like other modules. For example:

>>> lazy_mlp = LazyMLP()
>>> # The state dict shows the uninitialized parameters
>>> lazy_mlp.state_dict()
OrderedDict([('fc1.weight', Uninitialized parameter),
             ('fc1.bias',
              tensor([-1.8832e+25, 4.5636e-41, -1.8832e+25, 4.5636e-41, -6.1598e-30,
                      4.5637e-41, -1.8788e+22, 4.5636e-41, -2.0042e-31, 4.5637e-41])),
             ('fc2.weight', Uninitialized parameter),
             ('fc2.bias', tensor([0.0019]))])

Lazy modules can also load regular `torch.nn.Parameter`s, which replace their `torch.nn.UninitializedParameter`s:

>>> full_mlp = LazyMLP()
>>> # Dry run to initialize another module
>>> full_mlp.forward(torch.ones(10, 1))
>>> # Load an initialized state into a lazy module
>>> lazy_mlp.load_state_dict(full_mlp.state_dict())
>>> # The state dict now holds valid values
>>> lazy_mlp.state_dict()
OrderedDict([('fc1.weight',
              tensor([[-0.3837], [ 0.0907], [ 0.6708], [-0.5223], [-0.9028], [ 0.2851],
                      [-0.4537], [ 0.6813], [ 0.5766], [-0.8678]])),
             ('fc1.bias',
              tensor([-1.8832e+25, 4.5636e-41, -1.8832e+25, 4.5636e-41, -6.1598e-30,
                      4.5637e-41, -1.8788e+22, 4.5636e-41, -2.0042e-31, 4.5637e-41])),
             ('fc2.weight',
              tensor([[ 0.1320, 0.2938, 0.0679, 0.2793, 0.1088, -0.1795, -0.2301, 0.2807,
                        0.2479, 0.1091]])),
             ('fc2.bias', tensor([0.0019]))])

Note, however, that lazy modules cannot validate that the shape of parameters they load is correct.

`has_uninitialized_params()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/lazy.html#LazyModuleMixin.has_uninitialized_params)

Check if a module has parameters that are not initialized.

`initialize_parameters(*args, **kwargs)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/lazy.html#LazyModuleMixin.initialize_parameters)

Initialize parameters according to the input batch properties.
This adds an interface to isolate parameter initialization from the forward pass when doing parameter shape inference. # torch.nn.modules.module.register_module_backward_hook `torch.nn.modules.module.register_module_backward_hook(hook)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/module.html#register_module_backward_hook) Registers a backward hook common to all the modules. This function is deprecated in favor of `nn.module.register_module_full_backward_hook()` and the behavior of this function will change in future versions. Returns a handle that can be used to remove the added hook by calling `handle.remove()` Return type `torch.utils.hooks.RemovableHandle` # torch.nn.modules.module.register_module_forward_hook `torch.nn.modules.module.register_module_forward_hook(hook)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/module.html#register_module_forward_hook) Registers a global forward hook for all the modules Warning This adds global state to the `nn.module` module and it is only intended for debugging/profiling purposes. The hook will be called every time after `forward()` has computed an output. It should have the following signature: hook(module, input, output) -> None or modified output The input contains only the positional arguments given to the module. Keyword arguments won’t be passed to the hooks and only to the `forward`. The hook can modify the output. It can modify the input inplace but it will not have effect on forward since this is called after `forward()` is called. Returns a handle that can be used to remove the added hook by calling `handle.remove()` Return type `torch.utils.hooks.RemovableHandle` This hook will be executed before specific module hooks registered with `register_forward_hook`. # torch.nn.modules.module.register_module_forward_pre_hook `torch.nn.modules.module.register_module_forward_pre_hook(hook)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/module.html#register_module_forward_pre_hook) Registers a forward pre-hook common to all modules. Warning This adds global state to the `nn.module` module and it is only intended for debugging/profiling purposes. The hook will be called every time before `forward()` is invoked. It should have the following signature: hook(module, input) -> None or modified input The input contains only the positional arguments given to the module. Keyword arguments won’t be passed to the hooks and only to the `forward`. The hook can modify the input. User can either return a tuple or a single modified value in the hook. We will wrap the value into a tuple if a single value is returned(unless that value is already a tuple). This hook has precedence over the specific module hooks registered with `register_forward_pre_hook`. Returns a handle that can be used to remove the added hook by calling `handle.remove()` Return type `torch.utils.hooks.RemovableHandle` # MSELoss `class torch.nn.MSELoss(size_average=None, reduce=None, reduction='mean')` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/loss.html#MSELoss) Creates a criterion that measures the mean squared error (squared L2 norm) between each element in the input xx and target yy . The unreduced (i.e. with `reduction` set to `'none'`) loss can be described as: ℓ(x,y)=L={l1,…,lN}⊤,ln=(xn−yn)2,\ell(x, y) = L = \\{l_1,\dots,l_N\\}^\top, \quad l_n = \left( x_n - y_n \right)^2, where NN is the batch size. 
If `reduction` is not `'none'` (default `'mean'`), then: ℓ(x,y)={mean⁡(L),if reduction=‘mean’;sum⁡(L),if reduction=‘sum’.\ell(x, y) = \begin{cases} \operatorname{mean}(L), & \text{if reduction} = \text{`mean';}\\\ \operatorname{sum}(L), & \text{if reduction} = \text{`sum'.} \end{cases} xx and yy are tensors of arbitrary shapes with a total of nn elements each. The mean operation still operates over all the elements, and divides by nn . The division by nn can be avoided if one sets `reduction = 'sum'`. Parameters * **size_average** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – Deprecated (see `reduction`). By default, the losses are averaged over each loss element in the batch. Note that for some losses, there are multiple elements per sample. If the field `size_average` is set to `False`, the losses are instead summed for each minibatch. Ignored when `reduce` is `False`. Default: `True` * **reduce** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – Deprecated (see `reduction`). By default, the losses are averaged or summed over observations for each minibatch depending on `size_average`. When `reduce` is `False`, returns a loss per batch element instead and ignores `size_average`. Default: `True` * **reduction** (_string_ _,__optional_) – Specifies the reduction to apply to the output: `'none'` | `'mean'` | `'sum'`. `'none'`: no reduction will be applied, `'mean'`: the sum of the output will be divided by the number of elements in the output, `'sum'`: the output will be summed. Note: `size_average` and `reduce` are in the process of being deprecated, and in the meantime, specifying either of those two args will override `reduction`. Default: `'mean'` Shape: * Input: (N,∗)(N, *) where ∗* means, any number of additional dimensions * Target: (N,∗)(N, *) , same shape as the input Examples: >>> loss = nn.MSELoss() >>> input = torch.randn(3, 5, requires_grad=True) >>> target = torch.randn(3, 5) >>> output = loss(input, target) >>> output.backward() # MultiheadAttention `class torch.nn.MultiheadAttention(embed_dim, num_heads, dropout=0.0, bias=True, add_bias_kv=False, add_zero_attn=False, kdim=None, vdim=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/activation.html#MultiheadAttention) Allows the model to jointly attend to information from different representation subspaces. See [Attention Is All You Need](https://arxiv.org/abs/1706.03762) MultiHead(Q,K,V)=Concat(head1,…,headh)WO\text{MultiHead}(Q, K, V) = \text{Concat}(head_1,\dots,head_h)W^O where headi=Attention(QWiQ,KWiK,VWiV)head_i = \text{Attention}(QW_i^Q, KW_i^K, VW_i^V) . Parameters * **embed_dim** – total dimension of the model. * **num_heads** – parallel attention heads. * **dropout** – a Dropout layer on attn_output_weights. Default: 0.0. * **bias** – add bias as module parameter. Default: True. * **add_bias_kv** – add bias to the key and value sequences at dim=0. * **add_zero_attn** – add a new batch of zeros to the key and value sequences at dim=1. * **kdim** – total number of features in key. Default: None. * **vdim** – total number of features in value. Default: None. Note that if `kdim` and `vdim` are None, they will be set to `embed_dim` such that query, key, and value have the same number of features. 
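As a supplement to the brief example that follows, here is a self-contained sketch with concrete, made-up sizes; in this release inputs are laid out as (L, N, E) for the query and (S, N, E) for the key and value, as detailed in the shape section further below:

>>> import torch
>>> import torch.nn as nn
>>> embed_dim, num_heads = 16, 4   # embed_dim must be divisible by num_heads
>>> multihead_attn = nn.MultiheadAttention(embed_dim, num_heads)
>>> L, S, N = 5, 7, 2              # target length, source length, batch size (illustrative values)
>>> query = torch.randn(L, N, embed_dim)
>>> key = torch.randn(S, N, embed_dim)
>>> value = torch.randn(S, N, embed_dim)
>>> attn_output, attn_output_weights = multihead_attn(query, key, value)
>>> attn_output.shape, attn_output_weights.shape
(torch.Size([5, 2, 16]), torch.Size([2, 5, 7]))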
Examples: >>> multihead_attn = nn.MultiheadAttention(embed_dim, num_heads) >>> attn_output, attn_output_weights = multihead_attn(query, key, value) `forward(query, key, value, key_padding_mask=None, need_weights=True, attn_mask=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/activation.html#MultiheadAttention.forward) Parameters * **key, value** (_query_ _,_) – map a query and a set of key-value pairs to an output. See “Attention Is All You Need” for more details. * **key_padding_mask** – if provided, specified padding elements in the key will be ignored by the attention. When given a binary mask and a value is True, the corresponding value on the attention layer will be ignored. When given a byte mask and a value is non-zero, the corresponding value on the attention layer will be ignored * **need_weights** – output attn_output_weights. * **attn_mask** – 2D or 3D mask that prevents attention to certain positions. A 2D mask will be broadcasted for all the batches while a 3D mask allows to specify a different mask for the entries of each batch. Shapes for inputs: * query: (L,N,E)(L, N, E) where L is the target sequence length, N is the batch size, E is the embedding dimension. * key: (S,N,E)(S, N, E) , where S is the source sequence length, N is the batch size, E is the embedding dimension. * value: (S,N,E)(S, N, E) where S is the source sequence length, N is the batch size, E is the embedding dimension. * key_padding_mask: (N,S)(N, S) where N is the batch size, S is the source sequence length. If a ByteTensor is provided, the non-zero positions will be ignored while the position with the zero positions will be unchanged. If a BoolTensor is provided, the positions with the value of `True` will be ignored while the position with the value of `False` will be unchanged. * attn_mask: if a 2D mask: (L,S)(L, S) where L is the target sequence length, S is the source sequence length. If a 3D mask: (N⋅num_heads,L,S)(N\cdot\text{num\\_heads}, L, S) where N is the batch size, L is the target sequence length, S is the source sequence length. `attn_mask` ensure that position i is allowed to attend the unmasked positions. If a ByteTensor is provided, the non-zero positions are not allowed to attend while the zero positions will be unchanged. If a BoolTensor is provided, positions with `True` is not allowed to attend while `False` values will be unchanged. If a FloatTensor is provided, it will be added to the attention weight. Shapes for outputs: * attn_output: (L,N,E)(L, N, E) where L is the target sequence length, N is the batch size, E is the embedding dimension. * attn_output_weights: (N,L,S)(N, L, S) where N is the batch size, L is the target sequence length, S is the source sequence length. # MultiLabelMarginLoss `class torch.nn.MultiLabelMarginLoss(size_average=None, reduce=None, reduction='mean')` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/loss.html#MultiLabelMarginLoss) Creates a criterion that optimizes a multi-class multi-classification hinge loss (margin-based loss) between input xx (a 2D mini-batch `Tensor`) and output yy (which is a 2D `Tensor` of target class indices). 
For each sample in the mini-batch: loss(x,y)=∑ijmax⁡(0,1−(x[y[j]]−x[i]))x.size(0)\text{loss}(x, y) = \sum_{ij}\frac{\max(0, 1 - (x[y[j]] - x[i]))}{\text{x.size}(0)} where x∈{0,⋯,x.size(0)−1}x \in \left\\{0, \; \cdots , \; \text{x.size}(0) - 1\right\\} , y∈{0,⋯,y.size(0)−1}y \in \left\\{0, \; \cdots , \; \text{y.size}(0) - 1\right\\} , 0≤y[j]≤x.size(0)−10 \leq y[j] \leq \text{x.size}(0)-1 , and i≠y[j]i \neq y[j] for all ii and jj . yy and xx must have the same size. The criterion only considers a contiguous block of non-negative targets that starts at the front. This allows for different samples to have variable amounts of target classes. Parameters * **size_average** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – Deprecated (see `reduction`). By default, the losses are averaged over each loss element in the batch. Note that for some losses, there are multiple elements per sample. If the field `size_average` is set to `False`, the losses are instead summed for each minibatch. Ignored when `reduce` is `False`. Default: `True` * **reduce** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – Deprecated (see `reduction`). By default, the losses are averaged or summed over observations for each minibatch depending on `size_average`. When `reduce` is `False`, returns a loss per batch element instead and ignores `size_average`. Default: `True` * **reduction** (_string_ _,__optional_) – Specifies the reduction to apply to the output: `'none'` | `'mean'` | `'sum'`. `'none'`: no reduction will be applied, `'mean'`: the sum of the output will be divided by the number of elements in the output, `'sum'`: the output will be summed. Note: `size_average` and `reduce` are in the process of being deprecated, and in the meantime, specifying either of those two args will override `reduction`. Default: `'mean'` Shape: * Input: (C)(C) or (N,C)(N, C) where `N` is the batch size and `C` is the number of classes. * Target: (C)(C) or (N,C)(N, C) , label targets padded by -1 ensuring same shape as the input. * Output: scalar. If `reduction` is `'none'`, then (N)(N) . Examples: >>> loss = nn.MultiLabelMarginLoss() >>> x = torch.FloatTensor([[0.1, 0.2, 0.4, 0.8]]) >>> # for target y, only consider labels 3 and 0, not after label -1 >>> y = torch.LongTensor([[3, 0, -1, 1]]) >>> loss(x, y) >>> # 0.25 * ((1-(0.1-0.2)) + (1-(0.1-0.4)) + (1-(0.8-0.2)) + (1-(0.8-0.4))) tensor(0.8500) # MultiLabelSoftMarginLoss `class torch.nn.MultiLabelSoftMarginLoss(weight=None, size_average=None, reduce=None, reduction='mean')` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/loss.html#MultiLabelSoftMarginLoss) Creates a criterion that optimizes a multi-label one-versus-all loss based on max-entropy, between input xx and target yy of size (N,C)(N, C) . For each sample in the minibatch: loss(x,y)=−1C∗∑iy[i]∗log⁡((1+exp⁡(−x[i]))−1)+(1−y[i])∗log⁡(exp⁡(−x[i])(1+exp⁡(−x[i])))loss(x, y) = - \frac{1}{C} * \sum_i y[i] * \log((1 + \exp(-x[i]))^{-1}) + (1-y[i]) * \log\left(\frac{\exp(-x[i])}{(1 + \exp(-x[i]))}\right) where i∈{0,⋯,x.nElement()−1}i \in \left\\{0, \; \cdots , \; \text{x.nElement}() - 1\right\\} , y[i]∈{0,1}y[i] \in \left\\{0, \; 1\right\\} . Parameters * **weight** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – a manual rescaling weight given to each class. If given, it has to be a Tensor of size `C`. Otherwise, it is treated as if having all ones. 
* **size_average** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – Deprecated (see `reduction`). By default, the losses are averaged over each loss element in the batch. Note that for some losses, there are multiple elements per sample. If the field `size_average` is set to `False`, the losses are instead summed for each minibatch. Ignored when `reduce` is `False`. Default: `True` * **reduce** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – Deprecated (see `reduction`). By default, the losses are averaged or summed over observations for each minibatch depending on `size_average`. When `reduce` is `False`, returns a loss per batch element instead and ignores `size_average`. Default: `True` * **reduction** (_string_ _,__optional_) – Specifies the reduction to apply to the output: `'none'` | `'mean'` | `'sum'`. `'none'`: no reduction will be applied, `'mean'`: the sum of the output will be divided by the number of elements in the output, `'sum'`: the output will be summed. Note: `size_average` and `reduce` are in the process of being deprecated, and in the meantime, specifying either of those two args will override `reduction`. Default: `'mean'` Shape: * Input: (N,C)(N, C) where `N` is the batch size and `C` is the number of classes. * Target: (N,C)(N, C) , label targets padded by -1 ensuring same shape as the input. * Output: scalar. If `reduction` is `'none'`, then (N)(N) . # MultiMarginLoss `class torch.nn.MultiMarginLoss(p=1, margin=1.0, weight=None, size_average=None, reduce=None, reduction='mean')` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/loss.html#MultiMarginLoss) Creates a criterion that optimizes a multi-class classification hinge loss (margin-based loss) between input xx (a 2D mini-batch `Tensor`) and output yy (which is a 1D tensor of target class indices, 0≤y≤x.size(1)−10 \leq y \leq \text{x.size}(1)-1 ): For each mini-batch sample, the loss in terms of the 1D input xx and scalar output yy is: loss(x,y)=∑imax⁡(0,margin−x[y]+x[i]))px.size(0)\text{loss}(x, y) = \frac{\sum_i \max(0, \text{margin} - x[y] + x[i]))^p}{\text{x.size}(0)} where x∈{0,⋯,x.size(0)−1}x \in \left\\{0, \; \cdots , \; \text{x.size}(0) - 1\right\\} and i≠yi \neq y . Optionally, you can give non-equal weighting on the classes by passing a 1D `weight` tensor into the constructor. The loss function then becomes: loss(x,y)=∑imax⁡(0,w[y]∗(margin−x[y]+x[i]))p)x.size(0)\text{loss}(x, y) = \frac{\sum_i \max(0, w[y] * (\text{margin} - x[y] + x[i]))^p)}{\text{x.size}(0)} Parameters * **p** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – Has a default value of 11 . 11 and 22 are the only supported values. * **margin** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _,__optional_) – Has a default value of 11 . * **weight** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – a manual rescaling weight given to each class. If given, it has to be a Tensor of size `C`. Otherwise, it is treated as if having all ones. * **size_average** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – Deprecated (see `reduction`). By default, the losses are averaged over each loss element in the batch. Note that for some losses, there are multiple elements per sample. If the field `size_average` is set to `False`, the losses are instead summed for each minibatch. 
Ignored when `reduce` is `False`. Default: `True` * **reduce** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – Deprecated (see `reduction`). By default, the losses are averaged or summed over observations for each minibatch depending on `size_average`. When `reduce` is `False`, returns a loss per batch element instead and ignores `size_average`. Default: `True` * **reduction** (_string_ _,__optional_) – Specifies the reduction to apply to the output: `'none'` | `'mean'` | `'sum'`. `'none'`: no reduction will be applied, `'mean'`: the sum of the output will be divided by the number of elements in the output, `'sum'`: the output will be summed. Note: `size_average` and `reduce` are in the process of being deprecated, and in the meantime, specifying either of those two args will override `reduction`. Default: `'mean'` # NLLLoss `class torch.nn.NLLLoss(weight=None, size_average=None, ignore_index=-100, reduce=None, reduction='mean')` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/loss.html#NLLLoss) The negative log likelihood loss. It is useful to train a classification problem with `C` classes. If provided, the optional argument `weight` should be a 1D Tensor assigning weight to each of the classes. This is particularly useful when you have an unbalanced training set. The `input` given through a forward call is expected to contain log- probabilities of each class. `input` has to be a Tensor of size either (minibatch,C)(minibatch, C) or (minibatch,C,d1,d2,...,dK)(minibatch, C, d_1, d_2, ..., d_K) with K≥1K \geq 1 for the `K`-dimensional case (described later). Obtaining log-probabilities in a neural network is easily achieved by adding a `LogSoftmax` layer in the last layer of your network. You may use `CrossEntropyLoss` instead, if you prefer not to add an extra layer. The `target` that this loss expects should be a class index in the range [0,C−1][0, C-1] where `C = number of classes`; if `ignore_index` is specified, this loss also accepts this class index (this index may not necessarily be in the class range). The unreduced (i.e. with `reduction` set to `'none'`) loss can be described as: ℓ(x,y)=L={l1,…,lN}⊤,ln=−wynxn,yn,wc=weight[c]⋅1{c≠ignore_index},\ell(x, y) = L = \\{l_1,\dots,l_N\\}^\top, \quad l_n = - w_{y_n} x_{n,y_n}, \quad w_{c} = \text{weight}[c] \cdot \mathbb{1}\\{c \not= \text{ignore\\_index}\\}, where xx is the input, yy is the target, ww is the weight, and NN is the batch size. If `reduction` is not `'none'` (default `'mean'`), then ℓ(x,y)={∑n=1N1∑n=1Nwynln,if reduction=‘mean’;∑n=1Nln,if reduction=‘sum’.\ell(x, y) = \begin{cases} \sum_{n=1}^N \frac{1}{\sum_{n=1}^N w_{y_n}} l_n, & \text{if reduction} = \text{`mean';}\\\ \sum_{n=1}^N l_n, & \text{if reduction} = \text{`sum'.} \end{cases} Can also be used for higher dimension inputs, such as 2D images, by providing an input of size (minibatch,C,d1,d2,...,dK)(minibatch, C, d_1, d_2, ..., d_K) with K≥1K \geq 1 , where KK is the number of dimensions, and a target of appropriate shape (see below). In the case of images, it computes NLL loss per-pixel. Parameters * **weight** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – a manual rescaling weight given to each class. If given, it has to be a Tensor of size `C`. Otherwise, it is treated as if having all ones. * **size_average** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – Deprecated (see `reduction`). 
By default, the losses are averaged over each loss element in the batch. Note that for some losses, there are multiple elements per sample. If the field `size_average` is set to `False`, the losses are instead summed for each minibatch. Ignored when `reduce` is `False`. Default: `True` * **ignore_index** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – Specifies a target value that is ignored and does not contribute to the input gradient. When `size_average` is `True`, the loss is averaged over non-ignored targets. * **reduce** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – Deprecated (see `reduction`). By default, the losses are averaged or summed over observations for each minibatch depending on `size_average`. When `reduce` is `False`, returns a loss per batch element instead and ignores `size_average`. Default: `True` * **reduction** (_string_ _,__optional_) – Specifies the reduction to apply to the output: `'none'` | `'mean'` | `'sum'`. `'none'`: no reduction will be applied, `'mean'`: the weighted mean of the output is taken, `'sum'`: the output will be summed. Note: `size_average` and `reduce` are in the process of being deprecated, and in the meantime, specifying either of those two args will override `reduction`. Default: `'mean'` Shape: * Input: (N,C)(N, C) where `C = number of classes`, or (N,C,d1,d2,...,dK)(N, C, d_1, d_2, ..., d_K) with K≥1K \geq 1 in the case of `K`-dimensional loss. * Target: (N)(N) where each value is 0≤targets[i]≤C−10 \leq \text{targets}[i] \leq C-1 , or (N,d1,d2,...,dK)(N, d_1, d_2, ..., d_K) with K≥1K \geq 1 in the case of K-dimensional loss. * Output: scalar. If `reduction` is `'none'`, then the same size as the target: (N)(N) , or (N,d1,d2,...,dK)(N, d_1, d_2, ..., d_K) with K≥1K \geq 1 in the case of K-dimensional loss. Examples: >>> m = nn.LogSoftmax(dim=1) >>> loss = nn.NLLLoss() >>> # input is of size N x C = 3 x 5 >>> input = torch.randn(3, 5, requires_grad=True) >>> # each element in target has to have 0 <= value < C >>> target = torch.tensor([1, 0, 4]) >>> output = loss(m(input), target) >>> output.backward() >>> >>> >>> # 2D loss example (used, for example, with image inputs) >>> N, C = 5, 4 >>> loss = nn.NLLLoss() >>> # input is of size N x C x height x width >>> data = torch.randn(N, 16, 10, 10) >>> conv = nn.Conv2d(16, C, (3, 3)) >>> m = nn.LogSoftmax(dim=1) >>> # each element in target has to have 0 <= value < C >>> target = torch.empty(N, 8, 8, dtype=torch.long).random_(0, C) >>> output = loss(m(conv(data)), target) >>> output.backward() # PairwiseDistance `class torch.nn.PairwiseDistance(p=2.0, eps=1e-06, keepdim=False)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/distance.html#PairwiseDistance) Computes the batchwise pairwise distance between vectors v1v_1 , v2v_2 using the p-norm: ∥x∥p=(∑i=1n∣xi∣p)1/p.\Vert x \Vert _p = \left( \sum_{i=1}^n \vert x_i \vert ^ p \right) ^ {1/p}. Parameters * **p** (_real_) – the norm degree. Default: 2 * **eps** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _,__optional_) – Small value to avoid division by zero. Default: 1e-6 * **keepdim** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – Determines whether or not to keep the vector dimension. Default: False Shape: * Input1: (N,D)(N, D) where `D = vector dimension` * Input2: (N,D)(N, D) , same shape as the Input1 * Output: (N)(N) . 
If `keepdim` is `True`, then (N,1)(N, 1) . Examples:: >>> pdist = nn.PairwiseDistance(p=2) >>> input1 = torch.randn(100, 128) >>> input2 = torch.randn(100, 128) >>> output = pdist(input1, input2) # DistributedDataParallel `class torch.nn.parallel.DistributedDataParallel(module, device_ids=None, output_device=None, dim=0, broadcast_buffers=True, process_group=None, bucket_cap_mb=25, find_unused_parameters=False, check_reduction=False, gradient_as_bucket_view=False)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/parallel/distributed.html#DistributedDataParallel) Implements distributed data parallelism that is based on `torch.distributed` package at the module level. This container parallelizes the application of the given module by splitting the input across the specified devices by chunking in the batch dimension. The module is replicated on each machine and each device, and each such replica handles a portion of the input. During the backwards pass, gradients from each node are averaged. The batch size should be larger than the number of GPUs used locally. See also: [Basics](../distributed#distributed-basics) and [Use nn.parallel.DistributedDataParallel instead of multiprocessing or nn.DataParallel](https://pytorch.org/docs/1.8.0/notes/cuda.html#cuda-nn-ddp- instead). The same constraints on input as in [`torch.nn.DataParallel`](torch.nn.dataparallel#torch.nn.DataParallel "torch.nn.DataParallel") apply. Creation of this class requires that `torch.distributed` to be already initialized, by calling [`torch.distributed.init_process_group()`](../distributed#torch.distributed.init_process_group "torch.distributed.init_process_group"). `DistributedDataParallel` is proven to be significantly faster than [`torch.nn.DataParallel`](torch.nn.dataparallel#torch.nn.DataParallel "torch.nn.DataParallel") for single-node multi-GPU data parallel training. To use `DistributedDataParallel` on a host with N GPUs, you should spawn up `N` processes, ensuring that each process exclusively works on a single GPU from 0 to N-1. This can be done by either setting `CUDA_VISIBLE_DEVICES` for every process or by calling: >>> torch.cuda.set_device(i) where i is from 0 to N-1. In each process, you should refer the following to construct this module: >>> torch.distributed.init_process_group( >>> backend='nccl', world_size=N, init_method='...' >>> ) >>> model = DistributedDataParallel(model, device_ids=[i], output_device=i) In order to spawn up multiple processes per node, you can use either `torch.distributed.launch` or `torch.multiprocessing.spawn`. Note Please refer to [PyTorch Distributed Overview](https://pytorch.org/tutorials/beginner/dist_overview.html) for a brief introduction to all features related to distributed training. Note `nccl` backend is currently the fastest and highly recommended backend when using GPUs. This applies to both single-node and multi-node distributed training. Note This module also supports mixed-precision distributed training. This means that your model can have different types of parameters such as mixed types of `fp16` and `fp32`, the gradient reduction on these mixed types of parameters will just work fine. Note If you use `torch.save` on one process to checkpoint the module, and `torch.load` on some other processes to recover it, make sure that `map_location` is configured properly for every process. Without `map_location`, `torch.load` would recover the module to devices where the module was saved from. 
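As a hedged sketch of the `map_location` note above (the checkpoint path and the `ddp_model` variable are placeholders, not part of the official API):

import torch
import torch.distributed as dist

CHECKPOINT_PATH = "ddp_checkpoint.pt"  # placeholder path
rank = dist.get_rank()

# Save from a single process only.
if rank == 0:
    torch.save(ddp_model.state_dict(), CHECKPOINT_PATH)

# Make sure the file exists before the other ranks try to read it.
dist.barrier()

# Map tensors saved from rank 0's GPU onto this process's GPU.
map_location = {"cuda:0": "cuda:%d" % rank}
ddp_model.load_state_dict(torch.load(CHECKPOINT_PATH, map_location=map_location))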
Note When a model is trained on `M` nodes with `batch=N`, the gradient will be `M` times smaller when compared to the same model trained on a single node with `batch=M*N` if the loss is summed (NOT averaged as usual) across instances in a batch (because the gradients between different nodes are averaged). You should take this into consideration when you want to obtain a mathematically equivalent training process compared to the local training counterpart. But in most cases, you can just treat a DistributedDataParallel wrapped model, a DataParallel wrapped model and an ordinary model on a single GPU as the same (E.g. using the same learning rate for equivalent batch size). Note Parameters are never broadcast between processes. The module performs an all- reduce step on gradients and assumes that they will be modified by the optimizer in all processes in the same way. Buffers (e.g. BatchNorm stats) are broadcast from the module in process of rank 0, to all other replicas in the system in every iteration. Note If you are using DistributedDataParallel in conjunction with the [Distributed RPC Framework](../rpc#distributed-rpc-framework), you should always use [`torch.distributed.autograd.backward()`](../rpc#torch.distributed.autograd.backward "torch.distributed.autograd.backward") to compute gradients and [`torch.distributed.optim.DistributedOptimizer`](../rpc#torch.distributed.optim.DistributedOptimizer "torch.distributed.optim.DistributedOptimizer") for optimizing parameters. Example: >>> import torch.distributed.autograd as dist_autograd >>> from torch.nn.parallel import DistributedDataParallel as DDP >>> from torch import optim >>> from torch.distributed.optim import DistributedOptimizer >>> from torch.distributed.rpc import RRef >>> >>> t1 = torch.rand((3, 3), requires_grad=True) >>> t2 = torch.rand((3, 3), requires_grad=True) >>> rref = rpc.remote("worker1", torch.add, args=(t1, t2)) >>> ddp_model = DDP(my_model) >>> >>> # Setup optimizer >>> optimizer_params = [rref] >>> for param in ddp_model.parameters(): >>> optimizer_params.append(RRef(param)) >>> >>> dist_optim = DistributedOptimizer( >>> optim.SGD, >>> optimizer_params, >>> lr=0.05, >>> ) >>> >>> with dist_autograd.context() as context_id: >>> pred = ddp_model(rref.to_here()) >>> loss = loss_func(pred, loss) >>> dist_autograd.backward(context_id, loss) >>> dist_optim.step() Warning Constructor, forward method, and differentiation of the output (or a function of the output of this module) are distributed synchronization points. Take that into account in case different processes might be executing different code. Warning This module assumes all parameters are registered in the model by the time it is created. No parameters should be added nor removed later. Same applies to buffers. Warning This module assumes all parameters are registered in the model of each distributed processes are in the same order. The module itself will conduct gradient `allreduce` following the reverse order of the registered parameters of the model. In other words, it is users’ responsibility to ensure that each distributed process has the exact same model and thus the exact same parameter registration order. Warning This module allows parameters with non-rowmajor-contiguous strides. For example, your model may contain some parameters whose `torch.memory_format` is `torch.contiguous_format` and others whose format is `torch.channels_last`. However, corresponding parameters in different processes must have the same strides. 
Warning This module doesn’t work with [`torch.autograd.grad()`](../autograd#torch.autograd.grad "torch.autograd.grad") (i.e. it will only work if gradients are to be accumulated in `.grad` attributes of parameters). Warning If you plan on using this module with a `nccl` backend or a `gloo` backend (that uses Infiniband), together with a DataLoader that uses multiple workers, please change the multiprocessing start method to `forkserver` (Python 3 only) or `spawn`. Unfortunately Gloo (that uses Infiniband) and NCCL2 are not fork safe, and you will likely experience deadlocks if you don’t change this setting. Warning Forward and backward hooks defined on `module` and its submodules won’t be invoked anymore, unless the hooks are initialized in the `forward()` method. Warning You should never try to change your model’s parameters after wrapping up your model with `DistributedDataParallel`. Because, when wrapping up your model with `DistributedDataParallel`, the constructor of `DistributedDataParallel` will register the additional gradient reduction functions on all the parameters of the model itself at the time of construction. If you change the model’s parameters afterwards, gradient redunction functions no longer match the correct set of parameters. Warning Using `DistributedDataParallel` in conjunction with the [Distributed RPC Framework](../rpc#distributed-rpc-framework) is experimental and subject to change. Warning The `gradient_as_bucket_view` mode does not yet work with Automatic Mixed Precision (AMP). AMP maintains stashed gradients that are used for unscaling gradients. With `gradient_as_bucket_view=True`, these stashed gradients will point to communication buckets in the first iteration. In the next iteration, the communication buckets are mutated and thus these stashed gradients will be unexpectedly mutated as well, which might lead to wrong results. Parameters * **module** ([Module](torch.nn.module#torch.nn.Module "torch.nn.Module")) – module to be parallelized * **device_ids** (_list of python:int_ _or_[torch.device](../tensor_attributes#torch.torch.device "torch.torch.device")) – CUDA devices. This should only be provided when the input module resides on a single CUDA device. For single-device modules, the i’th `module` replica is placed on `device_ids[i]`. For multi-device modules and CPU modules, `device_ids` must be `None` or an empty list, and input data for the forward pass must be placed on the correct device. (default: all visible devices for single-device modules) * **output_device** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _or_[torch.device](../tensor_attributes#torch.torch.device "torch.torch.device")) – Device location of output for single-device CUDA modules. For multi-device modules and CPU modules, it must be `None`, and the module itself dictates the output location. (default: `device_ids[0]` for single-device modules) * **broadcast_buffers** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – Flag that enables syncing (broadcasting) buffers of the module at beginning of the `forward` function. (default: `True`) * **process_group** – The process group to be used for distributed data all-reduction. If `None`, the default process group, which is created by [`torch.distributed.init_process_group()`](../distributed#torch.distributed.init_process_group "torch.distributed.init_process_group"), will be used. 
(default: `None`) * **bucket_cap_mb** – `DistributedDataParallel` will bucket parameters into multiple buckets so that gradient reduction of each bucket can potentially overlap with backward computation. `bucket_cap_mb` controls the bucket size in MegaBytes (MB). (default: 25) * **find_unused_parameters** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – Traverse the autograd graph from all tensors contained in the return value of the wrapped module’s `forward` function. Parameters that don’t receive gradients as part of this graph are preemptively marked as being ready to be reduced. Note that all `forward` outputs that are derived from module parameters must participate in calculating loss and later the gradient computation. If they don’t, this wrapper will hang waiting for autograd to produce gradients for those parameters. Any outputs derived from module parameters that are otherwise unused can be detached from the autograd graph using `torch.Tensor.detach`. (default: `False`) * **check_reduction** – This argument is deprecated. * **gradient_as_bucket_view** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – This is a prototype feature and subject to changes. When set to `True`, gradients will be views pointing to different offsets of `allreduce` communication buckets. This can reduce peak memory usage, where the saved memory size will be equal to the total gradients size. Moreover, it avoids the overhead of copying between gradients and `allreduce` communication buckets. When gradients are views, `detach_()` cannot be called on the gradients. If hitting such errors, please fix it by referring to the [`zero_grad()`](../optim#torch.optim.Optimizer.zero_grad "torch.optim.Optimizer.zero_grad") function in `torch/optim/optimizer.py` as a solution. Variables **~DistributedDataParallel.module** ([Module](torch.nn.module#torch.nn.Module "torch.nn.Module")) – the module to be parallelized. Example: >>> torch.distributed.init_process_group(backend='nccl', world_size=4, init_method='...') >>> net = torch.nn.parallel.DistributedDataParallel(model, pg) `join(divide_by_initial_world_size=True, enable=True)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/parallel/distributed.html#DistributedDataParallel.join) A context manager to be used in conjunction with an instance of `torch.nn.parallel.DistributedDataParallel` to be able to train with uneven inputs across participating processes. This context manager will keep track of already-joined DDP processes, and “shadow” the forward and backward passes by inserting collective communication operations to match with the ones created by non-joined DDP processes. This will ensure each collective call has a corresponding call by already-joined DDP processes, preventing hangs or errors that would otherwise happen when training with uneven inputs across processes. Once all DDP processes have joined, the context manager will broadcast the model corresponding to the last joined process to all processes to ensure the model is the same across all processes (which is guaranteed by DDP). To use this to enable training with uneven inputs across processes, simply wrap this context manager around your training loop. No further modifications to the model or data loading is required. Warning This module works only with the multi-process, single-device usage of `torch.nn.parallel.DistributedDataParallel`, which means that a single process works on a single GPU. 
Warning This module currently does not support custom distributed collective operations in the forward pass, such as `SyncBatchNorm` or other custom defined collectives in the model’s forward pass. Parameters * **divide_by_initial_world_size** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – If `True`, will divide gradients by the initial `world_size` DDP training was launched with. If `False`, will compute the effective world size (number of ranks that have not depleted their inputs yet) and divide gradients by that during allreduce. Set `divide_by_initial_world_size=True` to ensure every input sample including the uneven inputs have equal weight in terms of how much they contribute to the global gradient. This is achieved by always dividing the gradient by the initial `world_size` even when we encounter uneven inputs. If you set this to `False`, we divide the gradient by the remaining number of nodes. This ensures parity with training on a smaller `world_size` although it also means the uneven inputs would contribute more towards the global gradient. Typically, you would want to set this to `True` for cases where the last few inputs of your training job are uneven. In extreme cases, where there is a large discrepancy in the number of inputs, setting this to `False` might provide better results. * **enable** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – Whether to enable uneven input detection or not. Pass in `enable=False` to disable in cases where you know that inputs are even across participating processes. Default is `True`. Example: >>> import torch >>> import torch.distributed as dist >>> import os >>> import torch.multiprocessing as mp >>> import torch.nn as nn >>> # On each spawned worker >>> def worker(rank): >>> dist.init_process_group("nccl", rank=rank, world_size=2) >>> torch.cuda.set_device(rank) >>> model = nn.Linear(1, 1, bias=False).to(rank) >>> model = torch.nn.parallel.DistributedDataParallel( >>> model, device_ids=[rank], output_device=rank >>> ) >>> # Rank 1 gets one more input than rank 0. >>> inputs = [torch.tensor([1]).float() for _ in range(10 + rank)] >>> with model.join(): >>> for _ in range(5): >>> for inp in inputs: >>> loss = model(inp).sum() >>> loss.backward() >>> # Without the join() API, the below synchronization will hang >>> # blocking for rank 1's allreduce to complete. >>> torch.cuda.synchronize(device=rank) `no_sync()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/parallel/distributed.html#DistributedDataParallel.no_sync) A context manager to disable gradient synchronizations across DDP processes. Within this context, gradients will be accumulated on module variables, which will later be synchronized in the first forward-backward pass exiting the context. Example: >>> ddp = torch.nn.parallel.DistributedDataParallel(model, pg) >>> with ddp.no_sync(): >>> for input in inputs: >>> ddp(input).backward() # no synchronization, accumulate grads >>> ddp(another_input).backward() # synchronize grads `register_comm_hook(state, hook)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/parallel/distributed.html#DistributedDataParallel.register_comm_hook) Registers a communication hook which is an enhancement that provides a flexible hook to users where they can specify how DDP aggregates gradients across multiple workers. This hook would be very useful for researchers to try out new ideas. 
For example, this hook can be used to implement several algorithms like GossipGrad and gradient compression which involve different communication strategies for parameter syncs while running Distributed DataParallel training.

Parameters

* **state** ([object](https://docs.python.org/3/library/functions.html#object "\(in Python v3.9\)")) – Passed to the hook to maintain any state information during the training process. Examples include error feedback in gradient compression, peers to communicate with next in GossipGrad, etc. It is locally stored by each worker and shared by all the gradient tensors on the worker.
* **hook** (_callable_) – a callable that aggregates (typically averages) gradient tensors across workers, defined as: `hook(state: object, bucket: dist._GradBucket) -> torch.futures.Future`. This function is called once the bucket is ready. The hook can perform whatever processing is needed and return a Future indicating completion of any async work (ex: allreduce). If the hook doesn’t perform any communication, it can also just return a completed Future. The Future should hold the new value of the grad bucket’s tensors. Once a bucket is ready, the c10d reducer would call this hook and use the tensors returned by the Future and copy grads to individual parameters. We also provide an API called `get_future` to retrieve a Future associated with the completion of `c10d.ProcessGroup.work`.

Warning Grad bucket’s tensors will not be predivided by world_size. The user is responsible for dividing by the world_size in the case of operations like allreduce.

Warning The DDP communication hook can only be registered once and should be registered before calling backward.

Warning The Future object that the hook returns should contain a result that has the same shape as the tensors inside the grad bucket.

Warning The DDP communication hook does not support single-process multiple-device mode. GradBucket tensors should consist of only a single tensor.

Warning The `get_future` API supports only the NCCL backend and will return a `torch._C.Future` which is an internal type and should be used with caution. It can still be used by the `register_comm_hook` API, but it is subject to some subtle differences compared to `torch.futures.Future`.

Warning The DDP communication hook is experimental and subject to change.

Example: Below is an example of a noop hook that returns the same tensors.

>>> def noop(state: object, bucket: dist._GradBucket) -> torch.futures.Future:
>>>     fut = torch.futures.Future()
>>>     fut.set_result(bucket.get_tensors())
>>>     return fut

>>> ddp.register_comm_hook(state=None, hook=noop)

Example: Below is an example of a Parallel SGD algorithm where gradients are encoded before allreduce, and then decoded after allreduce (`encode` and `decode` are user-defined functions).

>>> def encode_and_decode(state: object, bucket: dist._GradBucket) -> torch.futures.Future:
>>>     tensors = [t / process_group.world_size for t in bucket.get_tensors()]
>>>     encoded_tensors = encode(tensors)  # encode gradients
>>>     fut = process_group.allreduce(encoded_tensors).get_future()
>>>     # Define the then callback to decode (renamed so it does not shadow decode()).
>>>     def decode_callback(fut):
>>>         decoded_tensors = decode(fut.value())  # decode gradients
>>>         return decoded_tensors
>>>     return fut.then(decode_callback)

>>> ddp.register_comm_hook(state=None, hook=encode_and_decode)

# Parameter

`class torch.nn.parameter.Parameter` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/parameter.html#Parameter)

A kind of Tensor that is to be considered a module parameter.
Parameters are [`Tensor`](../tensors#torch.Tensor "torch.Tensor") subclasses that have a very special property when used with `Module`s: when they’re assigned as Module attributes, they are automatically added to the list of its parameters and will appear, e.g., in the `parameters()` iterator. Assigning a plain Tensor does not have such an effect. This is because one might want to cache some temporary state, like the last hidden state of an RNN, in the model. If there were no such class as `Parameter`, these temporaries would get registered too.

Parameters

* **data** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – parameter tensor.
* **requires_grad** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – if the parameter requires gradient. See [Excluding subgraphs from backward](https://pytorch.org/docs/1.8.0/notes/autograd.html#excluding-subgraphs) for more details. Default: `True`

# UninitializedParameter

`class torch.nn.parameter.UninitializedParameter` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/parameter.html#UninitializedParameter)

A parameter that is not initialized. Uninitialized parameters are a special case of `torch.nn.Parameter` where the shape of the data is still unknown. Unlike a `torch.nn.Parameter`, uninitialized parameters hold no data, and attempting to access some properties, like their shape, will throw a runtime error. The only operations that can be performed on an uninitialized parameter are changing its datatype, moving it to a different device, and converting it to a regular `torch.nn.Parameter`.

`materialize(shape, device=None, dtype=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/parameter.html#UninitializedParameter.materialize)

Creates a Parameter with the same properties as the uninitialized one. Given a shape, it materializes a parameter on the same device and with the same `dtype` as the current one, or with those specified in the arguments.

Parameters

* **shape** (tuple) – the shape for the materialized tensor.
* **device** (`torch.device`) – the desired device of the parameters and buffers in this module. Optional.
* **dtype** (`torch.dtype`) – the desired floating point type of the floating point parameters and buffers in this module. Optional.

# ParameterDict

`class torch.nn.ParameterDict(parameters=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/container.html#ParameterDict)

Holds parameters in a dictionary.

ParameterDict can be indexed like a regular Python dictionary, but the parameters it contains are properly registered and will be visible to all Module methods.

`ParameterDict` is an **ordered** dictionary that respects

* the order of insertion, and
* in `update()`, the order of the merged `OrderedDict` or another `ParameterDict` (the argument to `update()`).

Note that `update()` with other unordered mapping types (e.g., Python’s plain `dict`) does not preserve the order of the merged mapping.
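A minimal sketch of the ordering behavior described above (parameter names and sizes are arbitrary):

import torch
import torch.nn as nn
from collections import OrderedDict

params = nn.ParameterDict(OrderedDict([
    ('left', nn.Parameter(torch.randn(5, 10))),
    ('right', nn.Parameter(torch.randn(5, 10))),
]))

# update() with an OrderedDict (or another ParameterDict) keeps the
# order of the merged mapping.
params.update(OrderedDict([
    ('up', nn.Parameter(torch.randn(5, 10))),
    ('down', nn.Parameter(torch.randn(5, 10))),
]))
print(list(params.keys()))  # ['left', 'right', 'up', 'down']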
Parameters **parameters** (_iterable_ _,__optional_) – a mapping (dictionary) of (string : `Parameter`) or an iterable of key-value pairs of type (string, `Parameter`) Example: class MyModule(nn.Module): def __init__(self): super(MyModule, self).__init__() self.params = nn.ParameterDict({ 'left': nn.Parameter(torch.randn(5, 10)), 'right': nn.Parameter(torch.randn(5, 10)) }) def forward(self, x, choice): x = self.params[choice].mm(x) return x `clear()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/container.html#ParameterDict.clear) Remove all items from the ParameterDict. `items()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/container.html#ParameterDict.items) Return an iterable of the ParameterDict key/value pairs. `keys()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/container.html#ParameterDict.keys) Return an iterable of the ParameterDict keys. `pop(key)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/container.html#ParameterDict.pop) Remove key from the ParameterDict and return its parameter. Parameters **key** (_string_) – key to pop from the ParameterDict `update(parameters)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/container.html#ParameterDict.update) Update the `ParameterDict` with the key-value pairs from a mapping or an iterable, overwriting existing keys. Note If `parameters` is an `OrderedDict`, a `ParameterDict`, or an iterable of key- value pairs, the order of new elements in it is preserved. Parameters **parameters** (_iterable_) – a mapping (dictionary) from string to `Parameter`, or an iterable of key-value pairs of type (string, `Parameter`) `values()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/container.html#ParameterDict.values) Return an iterable of the ParameterDict values. # ParameterList `class torch.nn.ParameterList(parameters=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/container.html#ParameterList) Holds parameters in a list. `ParameterList` can be indexed like a regular Python list, but parameters it contains are properly registered, and will be visible by all [`Module`](torch.nn.module#torch.nn.Module "torch.nn.Module") methods. Parameters **parameters** (_iterable_ _,__optional_) – an iterable of `Parameter` to add Example: class MyModule(nn.Module): def __init__(self): super(MyModule, self).__init__() self.params = nn.ParameterList([nn.Parameter(torch.randn(10, 10)) for i in range(10)]) def forward(self, x): # ParameterList can act as an iterable, or be indexed using ints for i, p in enumerate(self.params): x = self.params[i // 2].mm(x) + p.mm(x) return x `append(parameter)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/container.html#ParameterList.append) Appends a given parameter at the end of the list. Parameters **parameter** (_nn.Parameter_) – parameter to append `extend(parameters)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/container.html#ParameterList.extend) Appends parameters from a Python iterable to the end of the list. Parameters **parameters** (_iterable_) – iterable of parameters to append # PixelShuffle `class torch.nn.PixelShuffle(upscale_factor)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/pixelshuffle.html#PixelShuffle) Rearranges elements in a tensor of shape (∗,C×r2,H,W)(*, C \times r^2, H, W) to a tensor of shape (∗,C,H×r,W×r)(*, C, H \times r, W \times r) , where r is an upscale factor. 
This is useful for implementing efficient sub-pixel convolution with a stride of 1/r1/r . See the paper: [Real-Time Single Image and Video Super-Resolution Using an Efficient Sub-Pixel Convolutional Neural Network](https://arxiv.org/abs/1609.05158) by Shi et. al (2016) for more details. Parameters **upscale_factor** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – factor to increase spatial resolution by Shape: * Input: (∗,Cin,Hin,Win)(*, C_{in}, H_{in}, W_{in}) , where * is zero or more batch dimensions * Output: (∗,Cout,Hout,Wout)(*, C_{out}, H_{out}, W_{out}) , where Cout=Cin÷upscale_factor2C_{out} = C_{in} \div \text{upscale\\_factor}^2 Hout=Hin×upscale_factorH_{out} = H_{in} \times \text{upscale\\_factor} Wout=Win×upscale_factorW_{out} = W_{in} \times \text{upscale\\_factor} Examples: >>> pixel_shuffle = nn.PixelShuffle(3) >>> input = torch.randn(1, 9, 4, 4) >>> output = pixel_shuffle(input) >>> print(output.size()) torch.Size([1, 1, 12, 12]) # PixelUnshuffle `class torch.nn.PixelUnshuffle(downscale_factor)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/pixelshuffle.html#PixelUnshuffle) Reverses the [`PixelShuffle`](torch.nn.pixelshuffle#torch.nn.PixelShuffle "torch.nn.PixelShuffle") operation by rearranging elements in a tensor of shape (∗,C,H×r,W×r)(*, C, H \times r, W \times r) to a tensor of shape (∗,C×r2,H,W)(*, C \times r^2, H, W) , where r is a downscale factor. See the paper: [Real-Time Single Image and Video Super-Resolution Using an Efficient Sub-Pixel Convolutional Neural Network](https://arxiv.org/abs/1609.05158) by Shi et. al (2016) for more details. Parameters **downscale_factor** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – factor to decrease spatial resolution by Shape: * Input: (∗,Cin,Hin,Win)(*, C_{in}, H_{in}, W_{in}) , where * is zero or more batch dimensions * Output: (∗,Cout,Hout,Wout)(*, C_{out}, H_{out}, W_{out}) , where Cout=Cin×downscale_factor2C_{out} = C_{in} \times \text{downscale\\_factor}^2 Hout=Hin÷downscale_factorH_{out} = H_{in} \div \text{downscale\\_factor} Wout=Win÷downscale_factorW_{out} = W_{in} \div \text{downscale\\_factor} Examples: >>> pixel_unshuffle = nn.PixelUnshuffle(3) >>> input = torch.randn(1, 1, 12, 12) >>> output = pixel_unshuffle(input) >>> print(output.size()) torch.Size([1, 9, 4, 4]) # PoissonNLLLoss `class torch.nn.PoissonNLLLoss(log_input=True, full=False, size_average=None, eps=1e-08, reduce=None, reduction='mean')` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/loss.html#PoissonNLLLoss) Negative log likelihood loss with Poisson distribution of target. The loss can be described as: target∼Poisson(input)loss(input,target)=input−target∗log⁡(input)+log⁡(target!)\text{target} \sim \mathrm{Poisson}(\text{input}) \text{loss}(\text{input}, \text{target}) = \text{input} - \text{target} * \log(\text{input}) + \log(\text{target!}) The last term can be omitted or approximated with Stirling formula. The approximation is used for target values more than 1. For targets less or equal to 1 zeros are added to the loss. Parameters * **log_input** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – if `True` the loss is computed as exp⁡(input)−target∗input\exp(\text{input}) - \text{target}*\text{input} , if `False` the loss is input−target∗log⁡(input+eps)\text{input} - \text{target}*\log(\text{input}+\text{eps}) . 
* **full** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – whether to compute the full loss, i.e., to add the Stirling approximation term target∗log⁡(target)−target+0.5∗log⁡(2πtarget).\text{target}*\log(\text{target}) - \text{target} + 0.5 * \log(2\pi\text{target}).
* **size_average** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – Deprecated (see `reduction`). By default, the losses are averaged over each loss element in the batch. Note that for some losses, there are multiple elements per sample. If the field `size_average` is set to `False`, the losses are instead summed for each minibatch. Ignored when `reduce` is `False`. Default: `True`
* **eps** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _,__optional_) – Small value to avoid evaluation of log⁡(0)\log(0) when `log_input = False`. Default: 1e-8
* **reduce** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – Deprecated (see `reduction`). By default, the losses are averaged or summed over observations for each minibatch depending on `size_average`. When `reduce` is `False`, returns a loss per batch element instead and ignores `size_average`. Default: `True`
* **reduction** (_string_ _,__optional_) – Specifies the reduction to apply to the output: `'none'` | `'mean'` | `'sum'`. `'none'`: no reduction will be applied, `'mean'`: the sum of the output will be divided by the number of elements in the output, `'sum'`: the output will be summed. Note: `size_average` and `reduce` are in the process of being deprecated, and in the meantime, specifying either of those two args will override `reduction`. Default: `'mean'`

Examples:

>>> loss = nn.PoissonNLLLoss()
>>> log_input = torch.randn(5, 2, requires_grad=True)
>>> target = torch.randn(5, 2)
>>> output = loss(log_input, target)
>>> output.backward()

Shape:

* Input: (N,∗)(N, *) where ∗* means any number of additional dimensions
* Target: (N,∗)(N, *) , same shape as the input
* Output: scalar by default. If `reduction` is `'none'`, then (N,∗)(N, *) , the same shape as the input

# PReLU

`class torch.nn.PReLU(num_parameters=1, init=0.25)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/activation.html#PReLU)

Applies the element-wise function: PReLU(x)=max⁡(0,x)+a∗min⁡(0,x)\text{PReLU}(x) = \max(0,x) + a * \min(0,x) or PReLU(x)={x, if x≥0ax, otherwise \text{PReLU}(x) = \begin{cases} x, & \text{ if } x \geq 0 \\\ ax, & \text{ otherwise } \end{cases} Here aa is a learnable parameter. When called without arguments, `nn.PReLU()` uses a single parameter aa across all input channels. If called with `nn.PReLU(nChannels)`, a separate aa is used for each input channel.

Note Weight decay should not be used when learning aa for good performance.

Note The channel dim is the 2nd dim of the input. When the input has fewer than 2 dims, there is no channel dim and the number of channels = 1.

Parameters

* **num_parameters** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – number of aa to learn. Although it takes an int as input, only two values are legitimate: 1, or the number of channels of the input. Default: 1
* **init** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)")) – the initial value of aa .
Default: 0.25 Shape: * Input: (N,∗)(N, *) where `*` means, any number of additional dimensions * Output: (N,∗)(N, *) , same shape as the input Variables **~PReLU.weight** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the learnable weights of shape (`num_parameters`). Examples: >>> m = nn.PReLU() >>> input = torch.randn(2) >>> output = m(input) # ReflectionPad1d `class torch.nn.ReflectionPad1d(padding)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/padding.html#ReflectionPad1d) Pads the input tensor using the reflection of the input boundary. For `N`-dimensional padding, use [`torch.nn.functional.pad()`](../nn.functional#torch.nn.functional.pad "torch.nn.functional.pad"). Parameters **padding** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,_[tuple](https://docs.python.org/3/library/stdtypes.html#tuple "\(in Python v3.9\)")) – the size of the padding. If is `int`, uses the same padding in all boundaries. If a 2-`tuple`, uses (padding_left\text{padding\\_left} , padding_right\text{padding\\_right} ) Shape: * Input: (N,C,Win)(N, C, W_{in}) * Output: (N,C,Wout)(N, C, W_{out}) where Wout=Win+padding_left+padding_rightW_{out} = W_{in} + \text{padding\\_left} + \text{padding\\_right} Examples: >>> m = nn.ReflectionPad1d(2) >>> input = torch.arange(8, dtype=torch.float).reshape(1, 2, 4) >>> input tensor([[[0., 1., 2., 3.], [4., 5., 6., 7.]]]) >>> m(input) tensor([[[2., 1., 0., 1., 2., 3., 2., 1.], [6., 5., 4., 5., 6., 7., 6., 5.]]]) >>> # using different paddings for different sides >>> m = nn.ReflectionPad1d((3, 1)) >>> m(input) tensor([[[3., 2., 1., 0., 1., 2., 3., 2.], [7., 6., 5., 4., 5., 6., 7., 6.]]]) # ReflectionPad2d `class torch.nn.ReflectionPad2d(padding)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/padding.html#ReflectionPad2d) Pads the input tensor using the reflection of the input boundary. For `N`-dimensional padding, use [`torch.nn.functional.pad()`](../nn.functional#torch.nn.functional.pad "torch.nn.functional.pad"). Parameters **padding** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,_[tuple](https://docs.python.org/3/library/stdtypes.html#tuple "\(in Python v3.9\)")) – the size of the padding. If is `int`, uses the same padding in all boundaries. 
If a 4-`tuple`, uses (padding_left\text{padding\\_left} , padding_right\text{padding\\_right} , padding_top\text{padding\\_top} , padding_bottom\text{padding\\_bottom} ) Shape: * Input: (N,C,Hin,Win)(N, C, H_{in}, W_{in}) * Output: (N,C,Hout,Wout)(N, C, H_{out}, W_{out}) where Hout=Hin+padding_top+padding_bottomH_{out} = H_{in} + \text{padding\\_top} + \text{padding\\_bottom} Wout=Win+padding_left+padding_rightW_{out} = W_{in} + \text{padding\\_left} + \text{padding\\_right} Examples: >>> m = nn.ReflectionPad2d(2) >>> input = torch.arange(9, dtype=torch.float).reshape(1, 1, 3, 3) >>> input tensor([[[[0., 1., 2.], [3., 4., 5.], [6., 7., 8.]]]]) >>> m(input) tensor([[[[8., 7., 6., 7., 8., 7., 6.], [5., 4., 3., 4., 5., 4., 3.], [2., 1., 0., 1., 2., 1., 0.], [5., 4., 3., 4., 5., 4., 3.], [8., 7., 6., 7., 8., 7., 6.], [5., 4., 3., 4., 5., 4., 3.], [2., 1., 0., 1., 2., 1., 0.]]]]) >>> # using different paddings for different sides >>> m = nn.ReflectionPad2d((1, 1, 2, 0)) >>> m(input) tensor([[[[7., 6., 7., 8., 7.], [4., 3., 4., 5., 4.], [1., 0., 1., 2., 1.], [4., 3., 4., 5., 4.], [7., 6., 7., 8., 7.]]]]) # ReLU `class torch.nn.ReLU(inplace=False)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/activation.html#ReLU) Applies the rectified linear unit function element-wise: ReLU(x)=(x)+=max⁡(0,x)\text{ReLU}(x) = (x)^+ = \max(0, x) Parameters **inplace** – can optionally do the operation in-place. Default: `False` Shape: * Input: (N,∗)(N, *) where `*` means, any number of additional dimensions * Output: (N,∗)(N, *) , same shape as the input Examples: >>> m = nn.ReLU() >>> input = torch.randn(2) >>> output = m(input) An implementation of CReLU - https://arxiv.org/abs/1603.05201 >>> m = nn.ReLU() >>> input = torch.randn(2).unsqueeze(0) >>> output = torch.cat((m(input),m(-input))) # ReLU6 `class torch.nn.ReLU6(inplace=False)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/activation.html#ReLU6) Applies the element-wise function: ReLU6(x)=min⁡(max⁡(0,x),6)\text{ReLU6}(x) = \min(\max(0,x), 6) Parameters **inplace** – can optionally do the operation in-place. Default: `False` Shape: * Input: (N,∗)(N, *) where `*` means, any number of additional dimensions * Output: (N,∗)(N, *) , same shape as the input Examples: >>> m = nn.ReLU6() >>> input = torch.randn(2) >>> output = m(input) # ReplicationPad1d `class torch.nn.ReplicationPad1d(padding)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/padding.html#ReplicationPad1d) Pads the input tensor using replication of the input boundary. For `N`-dimensional padding, use [`torch.nn.functional.pad()`](../nn.functional#torch.nn.functional.pad "torch.nn.functional.pad"). Parameters **padding** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,_[tuple](https://docs.python.org/3/library/stdtypes.html#tuple "\(in Python v3.9\)")) – the size of the padding. If is `int`, uses the same padding in all boundaries. 
If a 2-`tuple`, uses (padding_left\text{padding\\_left} , padding_right\text{padding\\_right} ) Shape: * Input: (N,C,Win)(N, C, W_{in}) * Output: (N,C,Wout)(N, C, W_{out}) where Wout=Win+padding_left+padding_rightW_{out} = W_{in} + \text{padding\\_left} + \text{padding\\_right} Examples: >>> m = nn.ReplicationPad1d(2) >>> input = torch.arange(8, dtype=torch.float).reshape(1, 2, 4) >>> input tensor([[[0., 1., 2., 3.], [4., 5., 6., 7.]]]) >>> m(input) tensor([[[0., 0., 0., 1., 2., 3., 3., 3.], [4., 4., 4., 5., 6., 7., 7., 7.]]]) >>> # using different paddings for different sides >>> m = nn.ReplicationPad1d((3, 1)) >>> m(input) tensor([[[0., 0., 0., 0., 1., 2., 3., 3.], [4., 4., 4., 4., 5., 6., 7., 7.]]]) # ReplicationPad2d `class torch.nn.ReplicationPad2d(padding)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/padding.html#ReplicationPad2d) Pads the input tensor using replication of the input boundary. For `N`-dimensional padding, use [`torch.nn.functional.pad()`](../nn.functional#torch.nn.functional.pad "torch.nn.functional.pad"). Parameters **padding** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,_[tuple](https://docs.python.org/3/library/stdtypes.html#tuple "\(in Python v3.9\)")) – the size of the padding. If is `int`, uses the same padding in all boundaries. If a 4-`tuple`, uses (padding_left\text{padding\\_left} , padding_right\text{padding\\_right} , padding_top\text{padding\\_top} , padding_bottom\text{padding\\_bottom} ) Shape: * Input: (N,C,Hin,Win)(N, C, H_{in}, W_{in}) * Output: (N,C,Hout,Wout)(N, C, H_{out}, W_{out}) where Hout=Hin+padding_top+padding_bottomH_{out} = H_{in} + \text{padding\\_top} + \text{padding\\_bottom} Wout=Win+padding_left+padding_rightW_{out} = W_{in} + \text{padding\\_left} + \text{padding\\_right} Examples: >>> m = nn.ReplicationPad2d(2) >>> input = torch.arange(9, dtype=torch.float).reshape(1, 1, 3, 3) >>> input tensor([[[[0., 1., 2.], [3., 4., 5.], [6., 7., 8.]]]]) >>> m(input) tensor([[[[0., 0., 0., 1., 2., 2., 2.], [0., 0., 0., 1., 2., 2., 2.], [0., 0., 0., 1., 2., 2., 2.], [3., 3., 3., 4., 5., 5., 5.], [6., 6., 6., 7., 8., 8., 8.], [6., 6., 6., 7., 8., 8., 8.], [6., 6., 6., 7., 8., 8., 8.]]]]) >>> # using different paddings for different sides >>> m = nn.ReplicationPad2d((1, 1, 2, 0)) >>> m(input) tensor([[[[0., 0., 1., 2., 2.], [0., 0., 1., 2., 2.], [0., 0., 1., 2., 2.], [3., 3., 4., 5., 5.], [6., 6., 7., 8., 8.]]]]) # ReplicationPad3d `class torch.nn.ReplicationPad3d(padding)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/padding.html#ReplicationPad3d) Pads the input tensor using replication of the input boundary. For `N`-dimensional padding, use [`torch.nn.functional.pad()`](../nn.functional#torch.nn.functional.pad "torch.nn.functional.pad"). Parameters **padding** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,_[tuple](https://docs.python.org/3/library/stdtypes.html#tuple "\(in Python v3.9\)")) – the size of the padding. If is `int`, uses the same padding in all boundaries. 
If a 6-`tuple`, uses (padding_left\text{padding\\_left} , padding_right\text{padding\\_right} , padding_top\text{padding\\_top} , padding_bottom\text{padding\\_bottom} , padding_front\text{padding\\_front} , padding_back\text{padding\\_back} ) Shape: * Input: (N,C,Din,Hin,Win)(N, C, D_{in}, H_{in}, W_{in}) * Output: (N,C,Dout,Hout,Wout)(N, C, D_{out}, H_{out}, W_{out}) where Dout=Din+padding_front+padding_backD_{out} = D_{in} + \text{padding\\_front} + \text{padding\\_back} Hout=Hin+padding_top+padding_bottomH_{out} = H_{in} + \text{padding\\_top} + \text{padding\\_bottom} Wout=Win+padding_left+padding_rightW_{out} = W_{in} + \text{padding\\_left} + \text{padding\\_right} Examples: >>> m = nn.ReplicationPad3d(3) >>> input = torch.randn(16, 3, 8, 320, 480) >>> output = m(input) >>> # using different paddings for different sides >>> m = nn.ReplicationPad3d((3, 3, 6, 6, 1, 1)) >>> output = m(input) # RNN `class torch.nn.RNN(*args, **kwargs)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/rnn.html#RNN) Applies a multi-layer Elman RNN with tanh⁡\tanh or ReLU\text{ReLU} non- linearity to an input sequence. For each element in the input sequence, each layer computes the following function: ht=tanh⁡(Wihxt+bih+Whhh(t−1)+bhh)h_t = \tanh(W_{ih} x_t + b_{ih} + W_{hh} h_{(t-1)} + b_{hh}) where hth_t is the hidden state at time `t`, xtx_t is the input at time `t`, and h(t−1)h_{(t-1)} is the hidden state of the previous layer at time `t-1` or the initial hidden state at time `0`. If `nonlinearity` is `'relu'`, then ReLU\text{ReLU} is used instead of tanh⁡\tanh . Parameters * **input_size** – The number of expected features in the input `x` * **hidden_size** – The number of features in the hidden state `h` * **num_layers** – Number of recurrent layers. E.g., setting `num_layers=2` would mean stacking two RNNs together to form a `stacked RNN`, with the second RNN taking in outputs of the first RNN and computing the final results. Default: 1 * **nonlinearity** – The non-linearity to use. Can be either `'tanh'` or `'relu'`. Default: `'tanh'` * **bias** – If `False`, then the layer does not use bias weights `b_ih` and `b_hh`. Default: `True` * **batch_first** – If `True`, then the input and output tensors are provided as `(batch, seq, feature)`. Default: `False` * **dropout** – If non-zero, introduces a `Dropout` layer on the outputs of each RNN layer except the last layer, with dropout probability equal to `dropout`. Default: 0 * **bidirectional** – If `True`, becomes a bidirectional RNN. Default: `False` Inputs: input, h_0 * **input** of shape `(seq_len, batch, input_size)`: tensor containing the features of the input sequence. The input can also be a packed variable length sequence. See [`torch.nn.utils.rnn.pack_padded_sequence()`](torch.nn.utils.rnn.pack_padded_sequence#torch.nn.utils.rnn.pack_padded_sequence "torch.nn.utils.rnn.pack_padded_sequence") or [`torch.nn.utils.rnn.pack_sequence()`](torch.nn.utils.rnn.pack_sequence#torch.nn.utils.rnn.pack_sequence "torch.nn.utils.rnn.pack_sequence") for details. * **h_0** of shape `(num_layers * num_directions, batch, hidden_size)`: tensor containing the initial hidden state for each element in the batch. Defaults to zero if not provided. If the RNN is bidirectional, num_directions should be 2, else it should be 1. Outputs: output, h_n * **output** of shape `(seq_len, batch, num_directions * hidden_size)`: tensor containing the output features (`h_t`) from the last layer of the RNN, for each `t`. 
If a [`torch.nn.utils.rnn.PackedSequence`](torch.nn.utils.rnn.packedsequence#torch.nn.utils.rnn.PackedSequence "torch.nn.utils.rnn.PackedSequence") has been given as the input, the output will also be a packed sequence. For the unpacked case, the directions can be separated using `output.view(seq_len, batch, num_directions, hidden_size)`, with forward and backward being direction `0` and `1` respectively. Similarly, the directions can be separated in the packed case. * **h_n** of shape `(num_layers * num_directions, batch, hidden_size)`: tensor containing the hidden state for `t = seq_len`. Like _output_ , the layers can be separated using `h_n.view(num_layers, num_directions, batch, hidden_size)`. Shape: * Input1: (L,N,Hin)(L, N, H_{in}) tensor containing input features where Hin=input_sizeH_{in}=\text{input\\_size} and `L` represents a sequence length. * Input2: (S,N,Hout)(S, N, H_{out}) tensor containing the initial hidden state for each element in the batch. Hout=hidden_sizeH_{out}=\text{hidden\\_size} Defaults to zero if not provided. where S=num_layers∗num_directionsS=\text{num\\_layers} * \text{num\\_directions} If the RNN is bidirectional, num_directions should be 2, else it should be 1. * Output1: (L,N,Hall)(L, N, H_{all}) where Hall=num_directions∗hidden_sizeH_{all}=\text{num\\_directions} * \text{hidden\\_size} * Output2: (S,N,Hout)(S, N, H_{out}) tensor containing the next hidden state for each element in the batch Variables * **~RNN.weight_ih_l[k]** – the learnable input-hidden weights of the k-th layer, of shape `(hidden_size, input_size)` for `k = 0`. Otherwise, the shape is `(hidden_size, num_directions * hidden_size)` * **~RNN.weight_hh_l[k]** – the learnable hidden-hidden weights of the k-th layer, of shape `(hidden_size, hidden_size)` * **~RNN.bias_ih_l[k]** – the learnable input-hidden bias of the k-th layer, of shape `(hidden_size)` * **~RNN.bias_hh_l[k]** – the learnable hidden-hidden bias of the k-th layer, of shape `(hidden_size)` Note All the weights and biases are initialized from U(−k,k)\mathcal{U}(-\sqrt{k}, \sqrt{k}) where k=1hidden_sizek = \frac{1}{\text{hidden\\_size}} Warning There are known non-determinism issues for RNN functions on some versions of cuDNN and CUDA. You can enforce deterministic behavior by setting the following environment variables: On CUDA 10.1, set environment variable `CUDA_LAUNCH_BLOCKING=1`. This may affect performance. On CUDA 10.2 or later, set environment variable (note the leading colon symbol) `CUBLAS_WORKSPACE_CONFIG=:16:8` or `CUBLAS_WORKSPACE_CONFIG=:4096:2`. See the [cuDNN 8 Release Notes](https://docs.nvidia.com/deeplearning/sdk/cudnn-release- notes/rel_8.html) for more information. Orphan Note If the following conditions are satisfied: 1) cudnn is enabled, 2) input data is on the GPU 3) input data has dtype `torch.float16` 4) V100 GPU is used, 5) input data is not in `PackedSequence` format persistent algorithm can be selected to improve performance. Examples: >>> rnn = nn.RNN(10, 20, 2) >>> input = torch.randn(5, 3, 10) >>> h0 = torch.randn(2, 3, 20) >>> output, hn = rnn(input, h0) # RNNBase `class torch.nn.RNNBase(mode, input_size, hidden_size, num_layers=1, bias=True, batch_first=False, dropout=0.0, bidirectional=False, proj_size=0)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/rnn.html#RNNBase) `flatten_parameters()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/rnn.html#RNNBase.flatten_parameters) Resets parameter data pointer so that they can use faster code paths. 
Right now, this works only if the module is on the GPU and cuDNN is enabled. Otherwise, it’s a no-op. # RNNCell `class torch.nn.RNNCell(input_size, hidden_size, bias=True, nonlinearity='tanh')` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/rnn.html#RNNCell) An Elman RNN cell with tanh or ReLU non-linearity. h′=tanh⁡(Wihx+bih+Whhh+bhh)h' = \tanh(W_{ih} x + b_{ih} + W_{hh} h + b_{hh}) If `nonlinearity` is `‘relu’`, then ReLU is used in place of tanh. Parameters * **input_size** – The number of expected features in the input `x` * **hidden_size** – The number of features in the hidden state `h` * **bias** – If `False`, then the layer does not use bias weights `b_ih` and `b_hh`. Default: `True` * **nonlinearity** – The non-linearity to use. Can be either `'tanh'` or `'relu'`. Default: `'tanh'` Inputs: input, hidden * **input** of shape `(batch, input_size)`: tensor containing input features * **hidden** of shape `(batch, hidden_size)`: tensor containing the initial hidden state for each element in the batch. Defaults to zero if not provided. Outputs: h’ * **h’** of shape `(batch, hidden_size)`: tensor containing the next hidden state for each element in the batch Shape: * Input1: (N,Hin)(N, H_{in}) tensor containing input features where HinH_{in} = `input_size` * Input2: (N,Hout)(N, H_{out}) tensor containing the initial hidden state for each element in the batch where HoutH_{out} = `hidden_size` Defaults to zero if not provided. * Output: (N,Hout)(N, H_{out}) tensor containing the next hidden state for each element in the batch Variables * **~RNNCell.weight_ih** – the learnable input-hidden weights, of shape `(hidden_size, input_size)` * **~RNNCell.weight_hh** – the learnable hidden-hidden weights, of shape `(hidden_size, hidden_size)` * **~RNNCell.bias_ih** – the learnable input-hidden bias, of shape `(hidden_size)` * **~RNNCell.bias_hh** – the learnable hidden-hidden bias, of shape `(hidden_size)` Note All the weights and biases are initialized from U(−k,k)\mathcal{U}(-\sqrt{k}, \sqrt{k}) where k=1hidden_sizek = \frac{1}{\text{hidden\\_size}} Examples: >>> rnn = nn.RNNCell(10, 20) >>> input = torch.randn(6, 3, 10) >>> hx = torch.randn(3, 20) >>> output = [] >>> for i in range(6): hx = rnn(input[i], hx) output.append(hx) # RReLU `class torch.nn.RReLU(lower=0.125, upper=0.3333333333333333, inplace=False)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/activation.html#RReLU) Applies the randomized leaky rectified liner unit function, element-wise, as described in the paper: [Empirical Evaluation of Rectified Activations in Convolutional Network](https://arxiv.org/abs/1505.00853). The function is defined as: RReLU(x)={xif x≥0ax otherwise \text{RReLU}(x) = \begin{cases} x & \text{if } x \geq 0 \\\ ax & \text{ otherwise } \end{cases} where aa is randomly sampled from uniform distribution U(lower,upper)\mathcal{U}(\text{lower}, \text{upper}) . See: Parameters * **lower** – lower bound of the uniform distribution. Default: 18\frac{1}{8} * **upper** – upper bound of the uniform distribution. Default: 13\frac{1}{3} * **inplace** – can optionally do the operation in-place. 
Default: `False` Shape: * Input: (N,∗)(N, *) where `*` means, any number of additional dimensions * Output: (N,∗)(N, *) , same shape as the input Examples: >>> m = nn.RReLU(0.1, 0.3) >>> input = torch.randn(2) >>> output = m(input) # SELU `class torch.nn.SELU(inplace=False)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/activation.html#SELU) Applied element-wise, as: SELU(x)=scale∗(max⁡(0,x)+min⁡(0,α∗(exp⁡(x)−1)))\text{SELU}(x) = \text{scale} * (\max(0,x) + \min(0, \alpha * (\exp(x) - 1))) with α=1.6732632423543772848170429916717\alpha = 1.6732632423543772848170429916717 and scale=1.0507009873554804934193349852946\text{scale} = 1.0507009873554804934193349852946 . More details can be found in the paper [Self-Normalizing Neural Networks](https://arxiv.org/abs/1706.02515) . Parameters **inplace** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – can optionally do the operation in- place. Default: `False` Shape: * Input: (N,∗)(N, *) where `*` means, any number of additional dimensions * Output: (N,∗)(N, *) , same shape as the input Examples: >>> m = nn.SELU() >>> input = torch.randn(2) >>> output = m(input) # Sequential `class torch.nn.Sequential(*args)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/container.html#Sequential) A sequential container. Modules will be added to it in the order they are passed in the constructor. Alternatively, an ordered dict of modules can also be passed in. To make it easier to understand, here is a small example: # Example of using Sequential model = nn.Sequential( nn.Conv2d(1,20,5), nn.ReLU(), nn.Conv2d(20,64,5), nn.ReLU() ) # Example of using Sequential with OrderedDict model = nn.Sequential(OrderedDict([ ('conv1', nn.Conv2d(1,20,5)), ('relu1', nn.ReLU()), ('conv2', nn.Conv2d(20,64,5)), ('relu2', nn.ReLU()) ])) # Sigmoid `class torch.nn.Sigmoid` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/activation.html#Sigmoid) Applies the element-wise function: Sigmoid(x)=σ(x)=11+exp⁡(−x)\text{Sigmoid}(x) = \sigma(x) = \frac{1}{1 + \exp(-x)} Shape: * Input: (N,∗)(N, *) where `*` means, any number of additional dimensions * Output: (N,∗)(N, *) , same shape as the input Examples: >>> m = nn.Sigmoid() >>> input = torch.randn(2) >>> output = m(input) # SiLU `class torch.nn.SiLU(inplace=False)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/activation.html#SiLU) Applies the silu function, element-wise. silu(x)=x∗σ(x),where σ(x) is the logistic sigmoid.\text{silu}(x) = x * \sigma(x), \text{where } \sigma(x) \text{ is the logistic sigmoid.} Note See [Gaussian Error Linear Units (GELUs)](https://arxiv.org/abs/1606.08415) where the SiLU (Sigmoid Linear Unit) was originally coined, and see [Sigmoid- Weighted Linear Units for Neural Network Function Approximation in Reinforcement Learning](https://arxiv.org/abs/1702.03118) and [Swish: a Self- Gated Activation Function](https://arxiv.org/abs/1710.05941v1) where the SiLU was experimented with later. 
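As a quick check of the definition above (a sketch, not one of the library's own examples), `nn.SiLU` should agree with multiplying the input by its logistic sigmoid:

>>> import torch
>>> import torch.nn as nn
>>> x = torch.randn(4)
>>> m = nn.SiLU()
>>> # silu(x) = x * sigmoid(x), so the two computations should match
>>> torch.allclose(m(x), x * torch.sigmoid(x))
True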
Shape: * Input: (N, *) where `*` means any number of additional dimensions * Output: (N, *), same shape as the input Examples: >>> m = nn.SiLU() >>> input = torch.randn(2) >>> output = m(input) # SmoothL1Loss `class torch.nn.SmoothL1Loss(size_average=None, reduce=None, reduction='mean', beta=1.0)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/loss.html#SmoothL1Loss) Creates a criterion that uses a squared term if the absolute element-wise error falls below beta and an L1 term otherwise. It is less sensitive to outliers than the [`torch.nn.MSELoss`](torch.nn.mseloss#torch.nn.MSELoss "torch.nn.MSELoss") and in some cases prevents exploding gradients (e.g. see the `Fast R-CNN` paper by Ross Girshick). Omitting a scaling factor of `beta`, this loss is also known as the Huber loss: \text{loss}(x, y) = \frac{1}{n} \sum_{i} z_{i} where z_{i} is given by: z_{i} = \begin{cases} 0.5 (x_i - y_i)^2 / \text{beta}, & \text{if } |x_i - y_i| < \text{beta} \\ |x_i - y_i| - 0.5 \cdot \text{beta}, & \text{otherwise} \end{cases} As with the other loss criteria, `size_average`, `reduce`, and `reduction` control how the per-element losses are reduced; `beta` (default 1.0) specifies the threshold at which to change between the squared and L1 terms. # Softmax `class torch.nn.Softmax(dim=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/activation.html#Softmax) Applies the Softmax function to an n-dimensional input Tensor rescaling it so that the elements of the n-dimensional output Tensor lie in the range `[0, 1]` and sum to 1. Softmax is defined as: \text{Softmax}(x_{i}) = \frac{\exp(x_i)}{\sum_j \exp(x_j)} Shape: * Input: (*) where `*` means any number of additional dimensions * Output: (*), same shape as the input Parameters **dim** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – A dimension along which Softmax will be computed (so every slice along dim will sum to 1). Returns a Tensor of the same dimension and shape as the input, with values in the range [0, 1] Examples: >>> m = nn.Softmax(dim=1) >>> input = torch.randn(2, 3) >>> output = m(input) # Softmax2d `class torch.nn.Softmax2d` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/activation.html#Softmax2d) Applies SoftMax over features to each spatial location. When given an image of `Channels x Height x Width`, it will apply `Softmax` to each location (Channels, h_i, w_j) Shape: * Input: (N, C, H, W) * Output: (N, C, H, W) (same shape as input) Returns a Tensor of the same dimension and shape as the input with values in the range [0, 1] Examples: >>> m = nn.Softmax2d() >>> # you softmax over the 2nd dimension >>> input = torch.randn(2, 3, 12, 13) >>> output = m(input) # Softmin `class torch.nn.Softmin(dim=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/activation.html#Softmin) Applies the Softmin function to an n-dimensional input Tensor rescaling it so that the elements of the n-dimensional output Tensor lie in the range `[0, 1]` and sum to 1. Softmin is defined as: \text{Softmin}(x_{i}) = \frac{\exp(-x_i)}{\sum_j \exp(-x_j)} Shape: * Input: (*) where `*` means any number of additional dimensions * Output: (*), same shape as the input Parameters **dim** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – A dimension along which Softmin will be computed (so every slice along dim will sum to 1). Returns a Tensor of the same dimension and shape as the input, with values in the range [0, 1] Examples: >>> m = nn.Softmin() >>> input = torch.randn(2, 3) >>> output = m(input) # Softplus `class torch.nn.Softplus(beta=1, threshold=20)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/activation.html#Softplus) Applies the element-wise function: \text{Softplus}(x) = \frac{1}{\beta} * \log(1 + \exp(\beta * x)) SoftPlus is a smooth approximation to the ReLU function and can be used to constrain the output of a machine to always be positive. For numerical stability the implementation reverts to the linear function when \text{input} \times \beta > \text{threshold}. Parameters * **beta** – the β value for the Softplus formulation. Default: 1 * **threshold** – values above this revert to a linear function.
Default: 20 Shape: * Input: (N,∗)(N, *) where `*` means, any number of additional dimensions * Output: (N,∗)(N, *) , same shape as the input Examples: >>> m = nn.Softplus() >>> input = torch.randn(2) >>> output = m(input) # Softshrink `class torch.nn.Softshrink(lambd=0.5)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/activation.html#Softshrink) Applies the soft shrinkage function elementwise: SoftShrinkage(x)={x−λ, if x>λx+λ, if x<−λ0, otherwise \text{SoftShrinkage}(x) = \begin{cases} x - \lambda, & \text{ if } x > \lambda \\\ x + \lambda, & \text{ if } x < -\lambda \\\ 0, & \text{ otherwise } \end{cases} Parameters **lambd** – the λ\lambda (must be no less than zero) value for the Softshrink formulation. Default: 0.5 Shape: * Input: (N,∗)(N, *) where `*` means, any number of additional dimensions * Output: (N,∗)(N, *) , same shape as the input Examples: >>> m = nn.Softshrink() >>> input = torch.randn(2) >>> output = m(input) # Softsign `class torch.nn.Softsign` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/activation.html#Softsign) Applies the element-wise function: SoftSign(x)=x1+∣x∣\text{SoftSign}(x) = \frac{x}{ 1 + |x|} Shape: * Input: (N,∗)(N, *) where `*` means, any number of additional dimensions * Output: (N,∗)(N, *) , same shape as the input Examples: >>> m = nn.Softsign() >>> input = torch.randn(2) >>> output = m(input) # SyncBatchNorm `class torch.nn.SyncBatchNorm(num_features, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True, process_group=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/batchnorm.html#SyncBatchNorm) Applies Batch Normalization over a N-Dimensional input (a mini-batch of [N-2]D inputs with additional channel dimension) as described in the paper [Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift](https://arxiv.org/abs/1502.03167) . y=x−E[x]Var[x]+ϵ∗γ+βy = \frac{x - \mathrm{E}[x]}{ \sqrt{\mathrm{Var}[x] + \epsilon}} * \gamma + \beta The mean and standard-deviation are calculated per-dimension over all mini- batches of the same process groups. γ\gamma and β\beta are learnable parameter vectors of size `C` (where `C` is the input size). By default, the elements of γ\gamma are sampled from U(0,1)\mathcal{U}(0, 1) and the elements of β\beta are set to 0. The standard-deviation is calculated via the biased estimator, equivalent to `torch.var(input, unbiased=False)`. Also by default, during training this layer keeps running estimates of its computed mean and variance, which are then used for normalization during evaluation. The running estimates are kept with a default `momentum` of 0.1. If `track_running_stats` is set to `False`, this layer then does not keep running estimates, and batch statistics are instead used during evaluation time as well. Note This `momentum` argument is different from one used in optimizer classes and the conventional notion of momentum. Mathematically, the update rule for running statistics here is x^new=(1−momentum)×x^+momentum×xt\hat{x}_\text{new} = (1 - \text{momentum}) \times \hat{x} + \text{momentum} \times x_t , where x^\hat{x} is the estimated statistic and xtx_t is the new observed value. Because the Batch Normalization is done for each channel in the `C` dimension, computing statistics on `(N, +)` slices, it’s common terminology to call this Volumetric Batch Normalization or Spatio-temporal Batch Normalization. 
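To make the running-statistics update rule in the note above concrete, here is a minimal sketch. It uses `nn.BatchNorm1d` (which follows the same update rule) rather than a full distributed `SyncBatchNorm` setup, so it can run on a single process:

>>> import torch
>>> import torch.nn as nn
>>> bn = nn.BatchNorm1d(1, momentum=0.1)
>>> x = torch.full((4, 1), 2.0)   # batch mean is 2.0
>>> _ = bn(x)                     # one training-mode forward pass
>>> # running_mean starts at 0: (1 - 0.1) * 0.0 + 0.1 * 2.0 = 0.2
>>> bn.running_mean
tensor([0.2000])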
Currently `SyncBatchNorm` only supports `DistributedDataParallel` (DDP) with a single GPU per process. Use `torch.nn.SyncBatchNorm.convert_sync_batchnorm()` to convert a `BatchNorm*D` layer to `SyncBatchNorm` before wrapping the network with DDP. Parameters * **num_features** – C from an expected input of size (N, C, +) * **eps** – a value added to the denominator for numerical stability. Default: `1e-5` * **momentum** – the value used for the running_mean and running_var computation. Can be set to `None` for cumulative moving average (i.e. simple average). Default: 0.1 * **affine** – a boolean value that when set to `True`, this module has learnable affine parameters. Default: `True` * **track_running_stats** – a boolean value that when set to `True`, this module tracks the running mean and variance, and when set to `False`, this module does not track such statistics, and initializes statistics buffers `running_mean` and `running_var` as `None`. When these buffers are `None`, this module always uses batch statistics in both training and eval modes. Default: `True` * **process_group** – synchronization of stats happens within each process group individually. Default behavior is synchronization across the whole world. Shape: * Input: (N, C, +) * Output: (N, C, +) (same shape as input) Examples: >>> # With Learnable Parameters >>> m = nn.SyncBatchNorm(100) >>> # creating process group (optional) >>> # ranks is a list of int identifying rank ids. >>> ranks = list(range(8)) >>> r1, r2 = ranks[:4], ranks[4:] >>> # Note: every rank calls into new_group for every >>> # process group created, even if that rank is not >>> # part of the group. >>> process_groups = [torch.distributed.new_group(pids) for pids in [r1, r2]] >>> process_group = process_groups[0 if dist.get_rank() <= 3 else 1] >>> # Without Learnable Parameters >>> m = nn.SyncBatchNorm(100, affine=False, process_group=process_group) >>> input = torch.randn(20, 100, 35, 45, 10) >>> output = m(input) >>> # network is nn.BatchNorm layer >>> sync_bn_network = nn.SyncBatchNorm.convert_sync_batchnorm(network, process_group) >>> # only single gpu per process is currently supported >>> ddp_sync_bn_network = torch.nn.parallel.DistributedDataParallel( >>> sync_bn_network, >>> device_ids=[args.local_rank], >>> output_device=args.local_rank) `classmethod convert_sync_batchnorm(module, process_group=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/batchnorm.html#SyncBatchNorm.convert_sync_batchnorm) Helper function to convert all `BatchNorm*D` layers in the model to `torch.nn.SyncBatchNorm` layers. Parameters * **module** ([nn.Module](torch.nn.module#torch.nn.Module "torch.nn.Module")) – module containing one or more `BatchNorm*D` layers * **process_group** (_optional_) – process group to scope synchronization, default is the whole world Returns The original `module` with the converted `torch.nn.SyncBatchNorm` layers. If the original `module` is a `BatchNorm*D` layer, a new `torch.nn.SyncBatchNorm` layer object will be returned instead. Example: >>> # Network with nn.BatchNorm layer >>> module = torch.nn.Sequential( >>> torch.nn.Linear(20, 100), >>> torch.nn.BatchNorm1d(100), >>> ).cuda() >>> # creating process group (optional) >>> # ranks is a list of int identifying rank ids. >>> ranks = list(range(8)) >>> r1, r2 = ranks[:4], ranks[4:] >>> # Note: every rank calls into new_group for every >>> # process group created, even if that rank is not >>> # part of the group.
>>> process_groups = [torch.distributed.new_group(pids) for pids in [r1, r2]] >>> process_group = process_groups[0 if dist.get_rank() <= 3 else 1] >>> sync_bn_module = torch.nn.SyncBatchNorm.convert_sync_batchnorm(module, process_group) # Tanh `class torch.nn.Tanh` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/activation.html#Tanh) Applies the element-wise function: Tanh(x)=tanh⁡(x)=exp⁡(x)−exp⁡(−x)exp⁡(x)+exp⁡(−x)\text{Tanh}(x) = \tanh(x) = \frac{\exp(x) - \exp(-x)} {\exp(x) + \exp(-x)} Shape: * Input: (N,∗)(N, *) where `*` means, any number of additional dimensions * Output: (N,∗)(N, *) , same shape as the input Examples: >>> m = nn.Tanh() >>> input = torch.randn(2) >>> output = m(input) # Tanhshrink `class torch.nn.Tanhshrink` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/activation.html#Tanhshrink) Applies the element-wise function: Tanhshrink(x)=x−tanh⁡(x)\text{Tanhshrink}(x) = x - \tanh(x) Shape: * Input: (N,∗)(N, *) where `*` means, any number of additional dimensions * Output: (N,∗)(N, *) , same shape as the input Examples: >>> m = nn.Tanhshrink() >>> input = torch.randn(2) >>> output = m(input) # Threshold `class torch.nn.Threshold(threshold, value, inplace=False)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/activation.html#Threshold) Thresholds each element of the input Tensor. Threshold is defined as: y={x, if x>thresholdvalue, otherwise y = \begin{cases} x, &\text{ if } x > \text{threshold} \\\ \text{value}, &\text{ otherwise } \end{cases} Parameters * **threshold** – The value to threshold at * **value** – The value to replace with * **inplace** – can optionally do the operation in-place. Default: `False` Shape: * Input: (N,∗)(N, *) where `*` means, any number of additional dimensions * Output: (N,∗)(N, *) , same shape as the input Examples: >>> m = nn.Threshold(0.1, 20) >>> input = torch.randn(2) >>> output = m(input) # Transformer `class torch.nn.Transformer(d_model=512, nhead=8, num_encoder_layers=6, num_decoder_layers=6, dim_feedforward=2048, dropout=0.1, activation='relu', custom_encoder=None, custom_decoder=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/transformer.html#Transformer) A transformer model. User is able to modify the attributes as needed. The architecture is based on the paper “Attention Is All You Need”. Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Lukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In Advances in Neural Information Processing Systems, pages 6000-6010. Users can build the BERT() model with corresponding parameters. Parameters * **d_model** – the number of expected features in the encoder/decoder inputs (default=512). * **nhead** – the number of heads in the multiheadattention models (default=8). * **num_encoder_layers** – the number of sub-encoder-layers in the encoder (default=6). * **num_decoder_layers** – the number of sub-decoder-layers in the decoder (default=6). * **dim_feedforward** – the dimension of the feedforward network model (default=2048). * **dropout** – the dropout value (default=0.1). * **activation** – the activation function of encoder/decoder intermediate layer, relu or gelu (default=relu). * **custom_encoder** – custom encoder (default=None). * **custom_decoder** – custom decoder (default=None). 
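In addition to the basic example below, here is a hedged sketch of passing a causal target mask and a key padding mask to `forward()` (both mask arguments and `generate_square_subsequent_mask()` are documented further down); the shapes follow the (S, N, E) / (T, N, E) convention:

>>> import torch
>>> import torch.nn as nn
>>> model = nn.Transformer(d_model=512, nhead=8)
>>> src = torch.rand(10, 32, 512)   # (S, N, E)
>>> tgt = torch.rand(20, 32, 512)   # (T, N, E)
>>> # additive causal mask: position i in tgt may only attend to positions <= i
>>> tgt_mask = model.generate_square_subsequent_mask(20)
>>> # boolean padding mask of shape (N, S); True marks source positions to ignore
>>> src_key_padding_mask = torch.zeros(32, 10, dtype=torch.bool)
>>> out = model(src, tgt, tgt_mask=tgt_mask, src_key_padding_mask=src_key_padding_mask)
>>> out.shape
torch.Size([20, 32, 512])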
Examples:: >>> transformer_model = nn.Transformer(nhead=16, num_encoder_layers=12) >>> src = torch.rand((10, 32, 512)) >>> tgt = torch.rand((20, 32, 512)) >>> out = transformer_model(src, tgt) Note: A full example applying the nn.Transformer module to a word language model is available in the PyTorch examples repository. `forward(src, tgt, src_mask=None, tgt_mask=None, memory_mask=None, src_key_padding_mask=None, tgt_key_padding_mask=None, memory_key_padding_mask=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/transformer.html#Transformer.forward) Take in and process masked source/target sequences. Parameters * **src** – the sequence to the encoder (required). * **tgt** – the sequence to the decoder (required). * **src_mask** – the additive mask for the src sequence (optional). * **tgt_mask** – the additive mask for the tgt sequence (optional). * **memory_mask** – the additive mask for the encoder output (optional). * **src_key_padding_mask** – the ByteTensor mask for src keys per batch (optional). * **tgt_key_padding_mask** – the ByteTensor mask for tgt keys per batch (optional). * **memory_key_padding_mask** – the ByteTensor mask for memory keys per batch (optional). Shape: * src: (S, N, E). * tgt: (T, N, E). * src_mask: (S, S). * tgt_mask: (T, T). * memory_mask: (T, S). * src_key_padding_mask: (N, S). * tgt_key_padding_mask: (N, T). * memory_key_padding_mask: (N, S). Note: [src/tgt/memory]_mask ensures that position i is allowed to attend to the unmasked positions. If a ByteTensor is provided, the non-zero positions are not allowed to attend while the zero positions will be unchanged. If a BoolTensor is provided, positions with `True` are not allowed to attend while `False` values will be unchanged. If a FloatTensor is provided, it will be added to the attention weight. [src/tgt/memory]_key_padding_mask provides specified elements in the key to be ignored by the attention. If a ByteTensor is provided, the non-zero positions will be ignored while the zero positions will be unchanged. If a BoolTensor is provided, the positions with the value of `True` will be ignored while the positions with the value of `False` will be unchanged. * output: (T, N, E). Note: Due to the multi-head attention architecture in the transformer model, the output sequence length of a transformer is the same as the input sequence (i.e. target) length of the decoder. where S is the source sequence length, T is the target sequence length, N is the batch size, E is the feature number. #### Examples >>> output = transformer_model(src, tgt, src_mask=src_mask, tgt_mask=tgt_mask) `generate_square_subsequent_mask(sz)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/transformer.html#Transformer.generate_square_subsequent_mask) Generate a square mask for the sequence. The masked positions are filled with float('-inf'). Unmasked positions are filled with float(0.0). # TransformerDecoder `class torch.nn.TransformerDecoder(decoder_layer, num_layers, norm=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/transformer.html#TransformerDecoder) TransformerDecoder is a stack of N decoder layers. Parameters * **decoder_layer** – an instance of the TransformerDecoderLayer() class (required). * **num_layers** – the number of sub-decoder-layers in the decoder (required). * **norm** – the layer normalization component (optional).
Examples:: >>> decoder_layer = nn.TransformerDecoderLayer(d_model=512, nhead=8) >>> transformer_decoder = nn.TransformerDecoder(decoder_layer, num_layers=6) >>> memory = torch.rand(10, 32, 512) >>> tgt = torch.rand(20, 32, 512) >>> out = transformer_decoder(tgt, memory) `forward(tgt, memory, tgt_mask=None, memory_mask=None, tgt_key_padding_mask=None, memory_key_padding_mask=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/transformer.html#TransformerDecoder.forward) Pass the inputs (and mask) through the decoder layer in turn. Parameters * **tgt** – the sequence to the decoder (required). * **memory** – the sequence from the last layer of the encoder (required). * **tgt_mask** – the mask for the tgt sequence (optional). * **memory_mask** – the mask for the memory sequence (optional). * **tgt_key_padding_mask** – the mask for the tgt keys per batch (optional). * **memory_key_padding_mask** – the mask for the memory keys per batch (optional). Shape: see the docs in Transformer class. # TransformerDecoderLayer `class torch.nn.TransformerDecoderLayer(d_model, nhead, dim_feedforward=2048, dropout=0.1, activation='relu')` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/transformer.html#TransformerDecoderLayer) TransformerDecoderLayer is made up of self-attn, multi-head-attn and feedforward network. This standard decoder layer is based on the paper “Attention Is All You Need”. Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Lukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In Advances in Neural Information Processing Systems, pages 6000-6010. Users may modify or implement in a different way during application. Parameters * **d_model** – the number of expected features in the input (required). * **nhead** – the number of heads in the multiheadattention models (required). * **dim_feedforward** – the dimension of the feedforward network model (default=2048). * **dropout** – the dropout value (default=0.1). * **activation** – the activation function of intermediate layer, relu or gelu (default=relu). Examples:: >>> decoder_layer = nn.TransformerDecoderLayer(d_model=512, nhead=8) >>> memory = torch.rand(10, 32, 512) >>> tgt = torch.rand(20, 32, 512) >>> out = decoder_layer(tgt, memory) `forward(tgt, memory, tgt_mask=None, memory_mask=None, tgt_key_padding_mask=None, memory_key_padding_mask=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/transformer.html#TransformerDecoderLayer.forward) Pass the inputs (and mask) through the decoder layer. Parameters * **tgt** – the sequence to the decoder layer (required). * **memory** – the sequence from the last layer of the encoder (required). * **tgt_mask** – the mask for the tgt sequence (optional). * **memory_mask** – the mask for the memory sequence (optional). * **tgt_key_padding_mask** – the mask for the tgt keys per batch (optional). * **memory_key_padding_mask** – the mask for the memory keys per batch (optional). Shape: see the docs in Transformer class. # TransformerEncoder `class torch.nn.TransformerEncoder(encoder_layer, num_layers, norm=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/transformer.html#TransformerEncoder) TransformerEncoder is a stack of N encoder layers Parameters * **encoder_layer** – an instance of the TransformerEncoderLayer() class (required). * **num_layers** – the number of sub-encoder-layers in the encoder (required). 
* **norm** – the layer normalization component (optional). Examples:: >>> encoder_layer = nn.TransformerEncoderLayer(d_model=512, nhead=8) >>> transformer_encoder = nn.TransformerEncoder(encoder_layer, num_layers=6) >>> src = torch.rand(10, 32, 512) >>> out = transformer_encoder(src) `forward(src, mask=None, src_key_padding_mask=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/transformer.html#TransformerEncoder.forward) Pass the input through the encoder layers in turn. Parameters * **src** – the sequence to the encoder (required). * **mask** – the mask for the src sequence (optional). * **src_key_padding_mask** – the mask for the src keys per batch (optional). Shape: see the docs in Transformer class. # TransformerEncoderLayer `class torch.nn.TransformerEncoderLayer(d_model, nhead, dim_feedforward=2048, dropout=0.1, activation='relu')` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/transformer.html#TransformerEncoderLayer) TransformerEncoderLayer is made up of self-attn and feedforward network. This standard encoder layer is based on the paper “Attention Is All You Need”. Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Lukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In Advances in Neural Information Processing Systems, pages 6000-6010. Users may modify or implement in a different way during application. Parameters * **d_model** – the number of expected features in the input (required). * **nhead** – the number of heads in the multiheadattention models (required). * **dim_feedforward** – the dimension of the feedforward network model (default=2048). * **dropout** – the dropout value (default=0.1). * **activation** – the activation function of intermediate layer, relu or gelu (default=relu). Examples:: >>> encoder_layer = nn.TransformerEncoderLayer(d_model=512, nhead=8) >>> src = torch.rand(10, 32, 512) >>> out = encoder_layer(src) `forward(src, src_mask=None, src_key_padding_mask=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/transformer.html#TransformerEncoderLayer.forward) Pass the input through the encoder layer. Parameters * **src** – the sequence to the encoder layer (required). * **src_mask** – the mask for the src sequence (optional). * **src_key_padding_mask** – the mask for the src keys per batch (optional). Shape: see the docs in Transformer class. # TripletMarginLoss `class torch.nn.TripletMarginLoss(margin=1.0, p=2.0, eps=1e-06, swap=False, size_average=None, reduce=None, reduction='mean')` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/loss.html#TripletMarginLoss) Creates a criterion that measures the triplet loss given an input tensors x1x1 , x2x2 , x3x3 and a margin with a value greater than 00 . This is used for measuring a relative similarity between samples. A triplet is composed by `a`, `p` and `n` (i.e., `anchor`, `positive examples` and `negative examples` respectively). The shapes of all input tensors should be (N,D)(N, D) . The distance swap is described in detail in the paper [Learning shallow convolutional feature descriptors with triplet losses](http://www.bmva.org/bmvc/2016/papers/paper119/index.html) by V. Balntas, E. Riba et al. 
The loss function for each sample in the mini-batch is: L(a,p,n)=max⁡{d(ai,pi)−d(ai,ni)+margin,0}L(a, p, n) = \max \\{d(a_i, p_i) - d(a_i, n_i) + {\rm margin}, 0\\} where d(xi,yi)=∥xi−yi∥pd(x_i, y_i) = \left\lVert {\bf x}_i - {\bf y}_i \right\rVert_p See also [`TripletMarginWithDistanceLoss`](torch.nn.tripletmarginwithdistanceloss#torch.nn.TripletMarginWithDistanceLoss "torch.nn.TripletMarginWithDistanceLoss"), which computes the triplet margin loss for input tensors using a custom distance function. Parameters * **margin** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _,__optional_) – Default: 11 . * **p** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – The norm degree for pairwise distance. Default: 22 . * **swap** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – The distance swap is described in detail in the paper `Learning shallow convolutional feature descriptors with triplet losses` by V. Balntas, E. Riba et al. Default: `False`. * **size_average** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – Deprecated (see `reduction`). By default, the losses are averaged over each loss element in the batch. Note that for some losses, there are multiple elements per sample. If the field `size_average` is set to `False`, the losses are instead summed for each minibatch. Ignored when `reduce` is `False`. Default: `True` * **reduce** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – Deprecated (see `reduction`). By default, the losses are averaged or summed over observations for each minibatch depending on `size_average`. When `reduce` is `False`, returns a loss per batch element instead and ignores `size_average`. Default: `True` * **reduction** (_string_ _,__optional_) – Specifies the reduction to apply to the output: `'none'` | `'mean'` | `'sum'`. `'none'`: no reduction will be applied, `'mean'`: the sum of the output will be divided by the number of elements in the output, `'sum'`: the output will be summed. Note: `size_average` and `reduce` are in the process of being deprecated, and in the meantime, specifying either of those two args will override `reduction`. Default: `'mean'` Shape: * Input: (N,D)(N, D) where DD is the vector dimension. * `Output: A Tensor of shape (N)(N) if reduction is 'none', or a scalar` otherwise. >>> triplet_loss = nn.TripletMarginLoss(margin=1.0, p=2) >>> anchor = torch.randn(100, 128, requires_grad=True) >>> positive = torch.randn(100, 128, requires_grad=True) >>> negative = torch.randn(100, 128, requires_grad=True) >>> output = triplet_loss(anchor, positive, negative) >>> output.backward() # TripletMarginWithDistanceLoss `class torch.nn.TripletMarginWithDistanceLoss(*, distance_function=None, margin=1.0, swap=False, reduction='mean')` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/loss.html#TripletMarginWithDistanceLoss) Creates a criterion that measures the triplet loss given input tensors aa , pp , and nn (representing anchor, positive, and negative examples, respectively), and a nonnegative, real-valued function (“distance function”) used to compute the relationship between the anchor and positive example (“positive distance”) and the anchor and negative example (“negative distance”). 
The unreduced loss (i.e., with `reduction` set to `'none'`) can be described as: ℓ(a,p,n)=L={l1,…,lN}⊤,li=max⁡{d(ai,pi)−d(ai,ni)+margin,0}\ell(a, p, n) = L = \\{l_1,\dots,l_N\\}^\top, \quad l_i = \max \\{d(a_i, p_i) - d(a_i, n_i) + {\rm margin}, 0\\} where NN is the batch size; dd is a nonnegative, real-valued function quantifying the closeness of two tensors, referred to as the `distance_function`; and marginmargin is a nonnegative margin representing the minimum difference between the positive and negative distances that is required for the loss to be 0. The input tensors have NN elements each and can be of any shape that the distance function can handle. If `reduction` is not `'none'` (default `'mean'`), then: ℓ(x,y)={mean⁡(L),if reduction=‘mean’;sum⁡(L),if reduction=‘sum’.\ell(x, y) = \begin{cases} \operatorname{mean}(L), & \text{if reduction} = \text{`mean';}\\\ \operatorname{sum}(L), & \text{if reduction} = \text{`sum'.} \end{cases} See also [`TripletMarginLoss`](torch.nn.tripletmarginloss#torch.nn.TripletMarginLoss "torch.nn.TripletMarginLoss"), which computes the triplet loss for input tensors using the lpl_p distance as the distance function. Parameters * **distance_function** (_callable_ _,__optional_) – A nonnegative, real-valued function that quantifies the closeness of two tensors. If not specified, `nn.PairwiseDistance` will be used. Default: `None` * **margin** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _,__optional_) – A nonnegative margin representing the minimum difference between the positive and negative distances required for the loss to be 0. Larger margins penalize cases where the negative examples are not distant enough from the anchors, relative to the positives. Default: 11 . * **swap** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – Whether to use the distance swap described in the paper `Learning shallow convolutional feature descriptors with triplet losses` by V. Balntas, E. Riba et al. If True, and if the positive example is closer to the negative example than the anchor is, swaps the positive example and the anchor in the loss computation. Default: `False`. * **reduction** (_string_ _,__optional_) – Specifies the (optional) reduction to apply to the output: `'none'` | `'mean'` | `'sum'`. `'none'`: no reduction will be applied, `'mean'`: the sum of the output will be divided by the number of elements in the output, `'sum'`: the output will be summed. Default: `'mean'` Shape: * Input: (N,∗)(N, *) where ∗* represents any number of additional dimensions as supported by the distance function. * Output: A Tensor of shape (N)(N) if `reduction` is `'none'`, or a scalar otherwise. 
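As a sketch (under the assumption that the default `nn.PairwiseDistance` is used when no `distance_function` is given, as stated in the `distance_function` parameter above), the unreduced loss can be reproduced directly from the formula:

>>> import torch
>>> import torch.nn as nn
>>> a, p, n = torch.randn(5, 8), torch.randn(5, 8), torch.randn(5, 8)
>>> loss_fn = nn.TripletMarginWithDistanceLoss(margin=1.0, reduction='none')
>>> d = nn.PairwiseDistance()
>>> # l_i = max(d(a_i, p_i) - d(a_i, n_i) + margin, 0)
>>> manual = torch.clamp(d(a, p) - d(a, n) + 1.0, min=0)
>>> torch.allclose(loss_fn(a, p, n), manual)
True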
Examples: >>> # Initialize embeddings >>> embedding = nn.Embedding(1000, 128) >>> anchor_ids = torch.randint(0, 1000, (1,)) >>> positive_ids = torch.randint(0, 1000, (1,)) >>> negative_ids = torch.randint(0, 1000, (1,)) >>> anchor = embedding(anchor_ids) >>> positive = embedding(positive_ids) >>> negative = embedding(negative_ids) >>> >>> # Built-in Distance Function >>> triplet_loss = \ >>> nn.TripletMarginWithDistanceLoss(distance_function=nn.PairwiseDistance()) >>> output = triplet_loss(anchor, positive, negative) >>> output.backward() >>> >>> # Custom Distance Function >>> def l_infinity(x1, x2): >>> return torch.max(torch.abs(x1 - x2), dim=1).values >>> >>> triplet_loss = \ >>> nn.TripletMarginWithDistanceLoss(distance_function=l_infinity, margin=1.5) >>> output = triplet_loss(anchor, positive, negative) >>> output.backward() >>> >>> # Custom Distance Function (Lambda) >>> triplet_loss = \ >>> nn.TripletMarginWithDistanceLoss( >>> distance_function=lambda x, y: 1.0 - F.cosine_similarity(x, y)) >>> output = triplet_loss(anchor, positive, negative) >>> output.backward() Reference: V. Balntas, et al.: Learning shallow convolutional feature descriptors with triplet losses: # Unflatten `class torch.nn.Unflatten(dim, unflattened_size)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/flatten.html#Unflatten) Unflattens a tensor dim expanding it to a desired shape. For use with `Sequential`. * `dim` specifies the dimension of the input tensor to be unflattened, and it can be either `int` or `str` when `Tensor` or `NamedTensor` is used, respectively. * `unflattened_size` is the new shape of the unflattened dimension of the tensor and it can be a `tuple` of ints or a `list` of ints or `torch.Size` for `Tensor` input; a `NamedShape` (tuple of `(name, size)` tuples) for `NamedTensor` input. Shape: * Input: (N,∗dims)(N, *dims) * Output: (N,Cout,Hout,Wout)(N, C_{\text{out}}, H_{\text{out}}, W_{\text{out}}) Parameters * **dim** (_Union_ _[_[int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,_[str](https://docs.python.org/3/library/stdtypes.html#str "\(in Python v3.9\)") _]_) – Dimension to be unflattened * **unflattened_size** (_Union_ _[__torch.Size_ _,__Tuple_ _,__List_ _,__NamedShape_ _]_) – New shape of the unflattened dimension #### Examples >>> input = torch.randn(2, 50) >>> # With tuple of ints >>> m = nn.Sequential( >>> nn.Linear(50, 50), >>> nn.Unflatten(1, (2, 5, 5)) >>> ) >>> output = m(input) >>> output.size() torch.Size([2, 2, 5, 5]) >>> # With torch.Size >>> m = nn.Sequential( >>> nn.Linear(50, 50), >>> nn.Unflatten(1, torch.Size([2, 5, 5])) >>> ) >>> output = m(input) >>> output.size() torch.Size([2, 2, 5, 5]) >>> # With namedshape (tuple of tuples) >>> input = torch.randn(2, 50, names=('N', 'features')) >>> unflatten = nn.Unflatten('features', (('C', 2), ('H', 5), ('W', 5))) >>> output = unflatten(input) >>> output.size() torch.Size([2, 2, 5, 5]) `add_module(name, module)` Adds a child module to the current module. The module can be accessed as an attribute using the given name. Parameters * **name** (_string_) – name of the child module. The child module can be accessed from this module using the given name * **module** ([Module](torch.nn.module#torch.nn.Module "torch.nn.Module")) – child module to be added to the module. `apply(fn)` Applies `fn` recursively to every submodule (as returned by `.children()`) as well as self. Typical use includes initializing the parameters of a model (see also [torch.nn.init](../nn.init#nn-init-doc)). 
Parameters **fn** ([`Module`](torch.nn.module#torch.nn.Module "torch.nn.Module") -> None) – function to be applied to each submodule Returns self Return type [Module](torch.nn.module#torch.nn.Module "torch.nn.Module") Example: >>> @torch.no_grad() >>> def init_weights(m): >>> print(m) >>> if type(m) == nn.Linear: >>> m.weight.fill_(1.0) >>> print(m.weight) >>> net = nn.Sequential(nn.Linear(2, 2), nn.Linear(2, 2)) >>> net.apply(init_weights) Linear(in_features=2, out_features=2, bias=True) Parameter containing: tensor([[ 1., 1.], [ 1., 1.]]) Linear(in_features=2, out_features=2, bias=True) Parameter containing: tensor([[ 1., 1.], [ 1., 1.]]) Sequential( (0): Linear(in_features=2, out_features=2, bias=True) (1): Linear(in_features=2, out_features=2, bias=True) ) Sequential( (0): Linear(in_features=2, out_features=2, bias=True) (1): Linear(in_features=2, out_features=2, bias=True) ) `bfloat16()` Casts all floating point parameters and buffers to `bfloat16` datatype. Returns self Return type [Module](torch.nn.module#torch.nn.Module "torch.nn.Module") `buffers(recurse=True)` Returns an iterator over module buffers. Parameters **recurse** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – if True, then yields buffers of this module and all submodules. Otherwise, yields only buffers that are direct members of this module. Yields _torch.Tensor_ – module buffer Example: >>> for buf in model.buffers(): >>> print(type(buf), buf.size()) (20L,) (20L, 1L, 5L, 5L) `children()` Returns an iterator over immediate children modules. Yields _Module_ – a child module `cpu()` Moves all model parameters and buffers to the CPU. Returns self Return type [Module](torch.nn.module#torch.nn.Module "torch.nn.Module") `cuda(device=None)` Moves all model parameters and buffers to the GPU. This also makes associated parameters and buffers different objects. So it should be called before constructing optimizer if the module will live on GPU while being optimized. Parameters **device** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – if specified, all parameters will be copied to that device Returns self Return type [Module](torch.nn.module#torch.nn.Module "torch.nn.Module") `double()` Casts all floating point parameters and buffers to `double` datatype. Returns self Return type [Module](torch.nn.module#torch.nn.Module "torch.nn.Module") `eval()` Sets the module in evaluation mode. This has any effect only on certain modules. See documentations of particular modules for details of their behaviors in training/evaluation mode, if they are affected, e.g. [`Dropout`](torch.nn.dropout#torch.nn.Dropout "torch.nn.Dropout"), `BatchNorm`, etc. This is equivalent with [`self.train(False)`](torch.nn.module#torch.nn.Module.train "torch.nn.Module.train"). Returns self Return type [Module](torch.nn.module#torch.nn.Module "torch.nn.Module") `float()` Casts all floating point parameters and buffers to float datatype. Returns self Return type [Module](torch.nn.module#torch.nn.Module "torch.nn.Module") `half()` Casts all floating point parameters and buffers to `half` datatype. Returns self Return type [Module](torch.nn.module#torch.nn.Module "torch.nn.Module") `load_state_dict(state_dict, strict=True)` Copies parameters and buffers from `state_dict` into this module and its descendants. 
If `strict` is `True`, then the keys of `state_dict` must exactly match the keys returned by this module’s [`state_dict()`](torch.nn.module#torch.nn.Module.state_dict "torch.nn.Module.state_dict") function. Parameters * **state_dict** ([dict](https://docs.python.org/3/library/stdtypes.html#dict "\(in Python v3.9\)")) – a dict containing parameters and persistent buffers. * **strict** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – whether to strictly enforce that the keys in `state_dict` match the keys returned by this module’s [`state_dict()`](torch.nn.module#torch.nn.Module.state_dict "torch.nn.Module.state_dict") function. Default: `True` Returns * **missing_keys** is a list of str containing the missing keys * **unexpected_keys** is a list of str containing the unexpected keys Return type `NamedTuple` with `missing_keys` and `unexpected_keys` fields `modules()` Returns an iterator over all modules in the network. Yields _Module_ – a module in the network Note Duplicate modules are returned only once. In the following example, `l` will be returned only once. Example: >>> l = nn.Linear(2, 2) >>> net = nn.Sequential(l, l) >>> for idx, m in enumerate(net.modules()): print(idx, '->', m) 0 -> Sequential( (0): Linear(in_features=2, out_features=2, bias=True) (1): Linear(in_features=2, out_features=2, bias=True) ) 1 -> Linear(in_features=2, out_features=2, bias=True) `named_buffers(prefix='', recurse=True)` Returns an iterator over module buffers, yielding both the name of the buffer as well as the buffer itself. Parameters * **prefix** ([str](https://docs.python.org/3/library/stdtypes.html#str "\(in Python v3.9\)")) – prefix to prepend to all buffer names. * **recurse** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – if True, then yields buffers of this module and all submodules. Otherwise, yields only buffers that are direct members of this module. Yields _(string, torch.Tensor)_ – Tuple containing the name and buffer Example: >>> for name, buf in self.named_buffers(): >>> if name in ['running_var']: >>> print(buf.size()) `named_children()` Returns an iterator over immediate children modules, yielding both the name of the module as well as the module itself. Yields _(string, Module)_ – Tuple containing a name and child module Example: >>> for name, module in model.named_children(): >>> if name in ['conv4', 'conv5']: >>> print(module) `named_modules(memo=None, prefix='')` Returns an iterator over all modules in the network, yielding both the name of the module as well as the module itself. Yields _(string, Module)_ – Tuple of name and module Note Duplicate modules are returned only once. In the following example, `l` will be returned only once. Example: >>> l = nn.Linear(2, 2) >>> net = nn.Sequential(l, l) >>> for idx, m in enumerate(net.named_modules()): print(idx, '->', m) 0 -> ('', Sequential( (0): Linear(in_features=2, out_features=2, bias=True) (1): Linear(in_features=2, out_features=2, bias=True) )) 1 -> ('0', Linear(in_features=2, out_features=2, bias=True)) `named_parameters(prefix='', recurse=True)` Returns an iterator over module parameters, yielding both the name of the parameter as well as the parameter itself. Parameters * **prefix** ([str](https://docs.python.org/3/library/stdtypes.html#str "\(in Python v3.9\)")) – prefix to prepend to all parameter names. 
* **recurse** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – if True, then yields parameters of this module and all submodules. Otherwise, yields only parameters that are direct members of this module. Yields _(string, Parameter)_ – Tuple containing the name and parameter Example: >>> for name, param in self.named_parameters(): >>> if name in ['bias']: >>> print(param.size()) `parameters(recurse=True)` Returns an iterator over module parameters. This is typically passed to an optimizer. Parameters **recurse** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – if True, then yields parameters of this module and all submodules. Otherwise, yields only parameters that are direct members of this module. Yields _Parameter_ – module parameter Example: >>> for param in model.parameters(): >>> print(type(param), param.size()) (20L,) (20L, 1L, 5L, 5L) `register_backward_hook(hook)` Registers a backward hook on the module. This function is deprecated in favor of `nn.Module.register_full_backward_hook()` and the behavior of this function will change in future versions. Returns a handle that can be used to remove the added hook by calling `handle.remove()` Return type `torch.utils.hooks.RemovableHandle` `register_buffer(name, tensor, persistent=True)` Adds a buffer to the module. This is typically used to register a buffer that should not to be considered a model parameter. For example, BatchNorm’s `running_mean` is not a parameter, but is part of the module’s state. Buffers, by default, are persistent and will be saved alongside parameters. This behavior can be changed by setting `persistent` to `False`. The only difference between a persistent buffer and a non-persistent buffer is that the latter will not be a part of this module’s `state_dict`. Buffers can be accessed as attributes using given names. Parameters * **name** (_string_) – name of the buffer. The buffer can be accessed from this module using the given name * **tensor** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – buffer to be registered. * **persistent** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – whether the buffer is part of this module’s `state_dict`. Example: >>> self.register_buffer('running_mean', torch.zeros(num_features)) `register_forward_hook(hook)` Registers a forward hook on the module. The hook will be called every time after `forward()` has computed an output. It should have the following signature: hook(module, input, output) -> None or modified output The input contains only the positional arguments given to the module. Keyword arguments won’t be passed to the hooks and only to the `forward`. The hook can modify the output. It can modify the input inplace but it will not have effect on forward since this is called after `forward()` is called. Returns a handle that can be used to remove the added hook by calling `handle.remove()` Return type `torch.utils.hooks.RemovableHandle` `register_forward_pre_hook(hook)` Registers a forward pre-hook on the module. The hook will be called every time before `forward()` is invoked. It should have the following signature: hook(module, input) -> None or modified input The input contains only the positional arguments given to the module. Keyword arguments won’t be passed to the hooks and only to the `forward`. The hook can modify the input. User can either return a tuple or a single modified value in the hook. 
We will wrap the value into a tuple if a single value is returned(unless that value is already a tuple). Returns a handle that can be used to remove the added hook by calling `handle.remove()` Return type `torch.utils.hooks.RemovableHandle` `register_full_backward_hook(hook)` Registers a backward hook on the module. The hook will be called every time the gradients with respect to module inputs are computed. The hook should have the following signature: hook(module, grad_input, grad_output) -> tuple(Tensor) or None The `grad_input` and `grad_output` are tuples that contain the gradients with respect to the inputs and outputs respectively. The hook should not modify its arguments, but it can optionally return a new gradient with respect to the input that will be used in place of `grad_input` in subsequent computations. `grad_input` will only correspond to the inputs given as positional arguments and all kwarg arguments are ignored. Entries in `grad_input` and `grad_output` will be `None` for all non-Tensor arguments. Warning Modifying inputs or outputs inplace is not allowed when using backward hooks and will raise an error. Returns a handle that can be used to remove the added hook by calling `handle.remove()` Return type `torch.utils.hooks.RemovableHandle` `register_parameter(name, param)` Adds a parameter to the module. The parameter can be accessed as an attribute using given name. Parameters * **name** (_string_) – name of the parameter. The parameter can be accessed from this module using the given name * **param** ([Parameter](torch.nn.parameter.parameter#torch.nn.parameter.Parameter "torch.nn.parameter.Parameter")) – parameter to be added to the module. `requires_grad_(requires_grad=True)` Change if autograd should record operations on parameters in this module. This method sets the parameters’ `requires_grad` attributes in-place. This method is helpful for freezing part of the module for finetuning or training parts of a model individually (e.g., GAN training). Parameters **requires_grad** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – whether autograd should record operations on parameters in this module. Default: `True`. Returns self Return type [Module](torch.nn.module#torch.nn.Module "torch.nn.Module") `state_dict(destination=None, prefix='', keep_vars=False)` Returns a dictionary containing a whole state of the module. Both parameters and persistent buffers (e.g. running averages) are included. Keys are corresponding parameter and buffer names. Returns a dictionary containing a whole state of the module Return type [dict](https://docs.python.org/3/library/stdtypes.html#dict "\(in Python v3.9\)") Example: >>> module.state_dict().keys() ['bias', 'weight'] `to(*args, **kwargs)` Moves and/or casts the parameters and buffers. This can be called as `to(device=None, dtype=None, non_blocking=False)` `to(dtype, non_blocking=False)` `to(tensor, non_blocking=False)` `to(memory_format=torch.channels_last)` Its signature is similar to [`torch.Tensor.to()`](../tensors#torch.Tensor.to "torch.Tensor.to"), but only accepts floating point or complex `dtype`s. In addition, this method will only cast the floating point or complex parameters and buffers to :attr:`dtype` (if given). The integral parameters and buffers will be moved `device`, if that is given, but with dtypes unchanged. When `non_blocking` is set, it tries to convert/move asynchronously with respect to the host if possible, e.g., moving CPU Tensors with pinned memory to CUDA devices. 
See below for examples. Note This method modifies the module in-place. Parameters * **device** (`torch.device`) – the desired device of the parameters and buffers in this module * **dtype** (`torch.dtype`) – the desired floating point or complex dtype of the parameters and buffers in this module * **tensor** ([torch.Tensor](../tensors#torch.Tensor "torch.Tensor")) – Tensor whose dtype and device are the desired dtype and device for all parameters and buffers in this module * **memory_format** (`torch.memory_format`) – the desired memory format for 4D parameters and buffers in this module (keyword only argument) Returns self Return type [Module](torch.nn.module#torch.nn.Module "torch.nn.Module") Examples: >>> linear = nn.Linear(2, 2) >>> linear.weight Parameter containing: tensor([[ 0.1913, -0.3420], [-0.5113, -0.2325]]) >>> linear.to(torch.double) Linear(in_features=2, out_features=2, bias=True) >>> linear.weight Parameter containing: tensor([[ 0.1913, -0.3420], [-0.5113, -0.2325]], dtype=torch.float64) >>> gpu1 = torch.device("cuda:1") >>> linear.to(gpu1, dtype=torch.half, non_blocking=True) Linear(in_features=2, out_features=2, bias=True) >>> linear.weight Parameter containing: tensor([[ 0.1914, -0.3420], [-0.5112, -0.2324]], dtype=torch.float16, device='cuda:1') >>> cpu = torch.device("cpu") >>> linear.to(cpu) Linear(in_features=2, out_features=2, bias=True) >>> linear.weight Parameter containing: tensor([[ 0.1914, -0.3420], [-0.5112, -0.2324]], dtype=torch.float16) >>> linear = nn.Linear(2, 2, bias=None).to(torch.cdouble) >>> linear.weight Parameter containing: tensor([[ 0.3741+0.j, 0.2382+0.j], [ 0.5593+0.j, -0.4443+0.j]], dtype=torch.complex128) >>> linear(torch.ones(3, 2, dtype=torch.cdouble)) tensor([[0.6122+0.j, 0.1150+0.j], [0.6122+0.j, 0.1150+0.j], [0.6122+0.j, 0.1150+0.j]], dtype=torch.complex128) `train(mode=True)` Sets the module in training mode. This has any effect only on certain modules. See documentations of particular modules for details of their behaviors in training/evaluation mode, if they are affected, e.g. [`Dropout`](torch.nn.dropout#torch.nn.Dropout "torch.nn.Dropout"), `BatchNorm`, etc. Parameters **mode** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – whether to set training mode (`True`) or evaluation mode (`False`). Default: `True`. Returns self Return type [Module](torch.nn.module#torch.nn.Module "torch.nn.Module") `type(dst_type)` Casts all parameters and buffers to `dst_type`. Parameters **dst_type** ([type](https://docs.python.org/3/library/functions.html#type "\(in Python v3.9\)") _or_ _string_) – the desired type Returns self Return type [Module](torch.nn.module#torch.nn.Module "torch.nn.Module") `xpu(device=None)` Moves all model parameters and buffers to the XPU. This also makes associated parameters and buffers different objects. So it should be called before constructing optimizer if the module will live on XPU while being optimized. Parameters **device** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – if specified, all parameters will be copied to that device Returns self Return type [Module](torch.nn.module#torch.nn.Module "torch.nn.Module") `zero_grad(set_to_none=False)` Sets gradients of all model parameters to zero. See similar function under [`torch.optim.Optimizer`](../optim#torch.optim.Optimizer "torch.optim.Optimizer") for more context. 
Parameters **set_to_none** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – instead of setting to zero, set the grads to None. See [`torch.optim.Optimizer.zero_grad()`](../optim#torch.optim.Optimizer.zero_grad "torch.optim.Optimizer.zero_grad") for details. # Unfold `class torch.nn.Unfold(kernel_size, dilation=1, padding=0, stride=1)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/fold.html#Unfold) Extracts sliding local blocks from a batched input tensor. Consider a batched `input` tensor of shape (N, C, *), where N is the batch dimension, C is the channel dimension, and * represents arbitrary spatial dimensions. This operation flattens each sliding `kernel_size`-sized block within the spatial dimensions of `input` into a column (i.e., last dimension) of a 3-D `output` tensor of shape (N, C \times \prod(\text{kernel\_size}), L), where C \times \prod(\text{kernel\_size}) is the total number of values within each block (a block has \prod(\text{kernel\_size}) spatial locations, each containing a C-channeled vector), and L is the total number of such blocks: L = \prod_d \left\lfloor\frac{\text{spatial\_size}[d] + 2 \times \text{padding}[d] - \text{dilation}[d] \times (\text{kernel\_size}[d] - 1) - 1}{\text{stride}[d]} + 1\right\rfloor, where \text{spatial\_size} is formed by the spatial dimensions of `input` (* above), and d is over all spatial dimensions. Therefore, indexing `output` at the last dimension (column dimension) gives all values within a certain block. The `padding`, `stride` and `dilation` arguments specify how the sliding blocks are retrieved. * `stride` controls the stride for the sliding blocks. * `padding` controls the amount of implicit zero-paddings on both sides for `padding` number of points for each dimension before reshaping. * `dilation` controls the spacing between the kernel points; also known as the à trous algorithm. It is harder to describe, but this [link](https://github.com/vdumoulin/conv_arithmetic/blob/master/README.md) has a nice visualization of what `dilation` does. Parameters * **kernel_size** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _or_[tuple](https://docs.python.org/3/library/stdtypes.html#tuple "\(in Python v3.9\)")) – the size of the sliding blocks * **stride** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _or_[tuple](https://docs.python.org/3/library/stdtypes.html#tuple "\(in Python v3.9\)") _,__optional_) – the stride of the sliding blocks in the input spatial dimensions. Default: 1 * **padding** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _or_[tuple](https://docs.python.org/3/library/stdtypes.html#tuple "\(in Python v3.9\)") _,__optional_) – implicit zero padding to be added on both sides of input. Default: 0 * **dilation** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _or_[tuple](https://docs.python.org/3/library/stdtypes.html#tuple "\(in Python v3.9\)") _,__optional_) – a parameter that controls the stride of elements within the neighborhood. Default: 1 * If `kernel_size`, `dilation`, `padding` or `stride` is an int or a tuple of length 1, their values will be replicated across all spatial dimensions.
* For the case of two input spatial dimensions this operation is sometimes called `im2col`. Note [`Fold`](torch.nn.fold#torch.nn.Fold "torch.nn.Fold") calculates each combined value in the resulting large tensor by summing all values from all containing blocks. `Unfold` extracts the values in the local blocks by copying from the large tensor. So, if the blocks overlap, they are not inverses of each other. In general, folding and unfolding operations are related as follows. Consider [`Fold`](torch.nn.fold#torch.nn.Fold "torch.nn.Fold") and `Unfold` instances created with the same parameters: >>> fold_params = dict(kernel_size=..., dilation=..., padding=..., stride=...) >>> fold = nn.Fold(output_size=..., **fold_params) >>> unfold = nn.Unfold(**fold_params) Then for any (supported) `input` tensor the following equality holds: fold(unfold(input)) == divisor * input where `divisor` is a tensor that depends only on the shape and dtype of the `input`: >>> input_ones = torch.ones(input.shape, dtype=input.dtype) >>> divisor = fold(unfold(input_ones)) When the `divisor` tensor contains no zero elements, then `fold` and `unfold` operations are inverses of each other (up to constant divisor). Warning Currently, only 4-D input tensors (batched image-like tensors) are supported. Shape: * Input: (N,C,∗)(N, C, *) * Output: (N,C×∏(kernel_size),L)(N, C \times \prod(\text{kernel\\_size}), L) as described above Examples: >>> unfold = nn.Unfold(kernel_size=(2, 3)) >>> input = torch.randn(2, 5, 3, 4) >>> output = unfold(input) >>> # each patch contains 30 values (2x3=6 vectors, each of 5 channels) >>> # 4 blocks (2x3 kernels) in total in the 3x4 input >>> output.size() torch.Size([2, 30, 4]) >>> # Convolution is equivalent with Unfold + Matrix Multiplication + Fold (or view to output shape) >>> inp = torch.randn(1, 3, 10, 12) >>> w = torch.randn(2, 3, 4, 5) >>> inp_unf = torch.nn.functional.unfold(inp, (4, 5)) >>> out_unf = inp_unf.transpose(1, 2).matmul(w.view(w.size(0), -1).t()).transpose(1, 2) >>> out = torch.nn.functional.fold(out_unf, (7, 8), (1, 1)) >>> # or equivalently (and avoiding a copy), >>> # out = out_unf.view(1, 2, 7, 8) >>> (torch.nn.functional.conv2d(inp, w) - out).abs().max() tensor(1.9073e-06) # Upsample `class torch.nn.Upsample(size=None, scale_factor=None, mode='nearest', align_corners=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/upsampling.html#Upsample) Upsamples a given multi-channel 1D (temporal), 2D (spatial) or 3D (volumetric) data. The input data is assumed to be of the form `minibatch x channels x [optional depth] x [optional height] x width`. Hence, for spatial inputs, we expect a 4D Tensor and for volumetric inputs, we expect a 5D Tensor. The algorithms available for upsampling are nearest neighbor and linear, bilinear, bicubic and trilinear for 3D, 4D and 5D input Tensor, respectively. One can either give a `scale_factor` or the target output `size` to calculate the output size. 
(You cannot give both, as it is ambiguous) Parameters * **size** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _or_ _Tuple_ _[_[int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _] or_ _Tuple_ _[_[int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,_[int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _] or_ _Tuple_ _[_[int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,_[int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,_[int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _]__,__optional_) – output spatial sizes * **scale_factor** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _or_ _Tuple_ _[_[float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _] or_ _Tuple_ _[_[float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _,_[float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _] or_ _Tuple_ _[_[float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _,_[float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _,_[float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _]__,__optional_) – multiplier for spatial size. Has to match input size if it is a tuple. * **mode** ([str](https://docs.python.org/3/library/stdtypes.html#str "\(in Python v3.9\)") _,__optional_) – the upsampling algorithm: one of `'nearest'`, `'linear'`, `'bilinear'`, `'bicubic'` and `'trilinear'`. Default: `'nearest'` * **align_corners** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – if `True`, the corner pixels of the input and output tensors are aligned, and thus preserving the values at those pixels. This only has effect when `mode` is `'linear'`, `'bilinear'`, or `'trilinear'`. Default: `False` Shape: * Input: (N,C,Win)(N, C, W_{in}) , (N,C,Hin,Win)(N, C, H_{in}, W_{in}) or (N,C,Din,Hin,Win)(N, C, D_{in}, H_{in}, W_{in}) * Output: (N,C,Wout)(N, C, W_{out}) , (N,C,Hout,Wout)(N, C, H_{out}, W_{out}) or (N,C,Dout,Hout,Wout)(N, C, D_{out}, H_{out}, W_{out}) , where Dout=⌊Din×scale_factor⌋D_{out} = \left\lfloor D_{in} \times \text{scale\\_factor} \right\rfloor Hout=⌊Hin×scale_factor⌋H_{out} = \left\lfloor H_{in} \times \text{scale\\_factor} \right\rfloor Wout=⌊Win×scale_factor⌋W_{out} = \left\lfloor W_{in} \times \text{scale\\_factor} \right\rfloor Warning With `align_corners = True`, the linearly interpolating modes (`linear`, `bilinear`, `bicubic`, and `trilinear`) don’t proportionally align the output and input pixels, and thus the output values can depend on the input size. This was the default behavior for these modes up to version 0.3.1. Since then, the default behavior is `align_corners = False`. See below for concrete examples on how this affects the outputs. Note If you want downsampling/general resizing, you should use `interpolate()`. 
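Since `Upsample` is implemented in terms of `interpolate()`, the module form and the functional form can be used interchangeably; the following minimal sketch (assuming the usual `torch.nn.functional` import alias `F`) is illustrative rather than canonical:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    x = torch.arange(1, 5, dtype=torch.float32).view(1, 1, 2, 2)

    # Module form...
    up = nn.Upsample(scale_factor=2, mode='bilinear', align_corners=False)
    y_module = up(x)

    # ...and the equivalent functional call
    y_functional = F.interpolate(x, scale_factor=2, mode='bilinear',
                                 align_corners=False)

    print(torch.equal(y_module, y_functional))  # True: the module wraps the same functional call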
Examples: >>> input = torch.arange(1, 5, dtype=torch.float32).view(1, 1, 2, 2) >>> input tensor([[[[ 1., 2.], [ 3., 4.]]]]) >>> m = nn.Upsample(scale_factor=2, mode='nearest') >>> m(input) tensor([[[[ 1., 1., 2., 2.], [ 1., 1., 2., 2.], [ 3., 3., 4., 4.], [ 3., 3., 4., 4.]]]]) >>> m = nn.Upsample(scale_factor=2, mode='bilinear') # align_corners=False >>> m(input) tensor([[[[ 1.0000, 1.2500, 1.7500, 2.0000], [ 1.5000, 1.7500, 2.2500, 2.5000], [ 2.5000, 2.7500, 3.2500, 3.5000], [ 3.0000, 3.2500, 3.7500, 4.0000]]]]) >>> m = nn.Upsample(scale_factor=2, mode='bilinear', align_corners=True) >>> m(input) tensor([[[[ 1.0000, 1.3333, 1.6667, 2.0000], [ 1.6667, 2.0000, 2.3333, 2.6667], [ 2.3333, 2.6667, 3.0000, 3.3333], [ 3.0000, 3.3333, 3.6667, 4.0000]]]]) >>> # Try scaling the same data in a larger tensor >>> >>> input_3x3 = torch.zeros(3, 3).view(1, 1, 3, 3) >>> input_3x3[:, :, :2, :2].copy_(input) tensor([[[[ 1., 2.], [ 3., 4.]]]]) >>> input_3x3 tensor([[[[ 1., 2., 0.], [ 3., 4., 0.], [ 0., 0., 0.]]]]) >>> m = nn.Upsample(scale_factor=2, mode='bilinear') # align_corners=False >>> # Notice that values in top left corner are the same with the small input (except at boundary) >>> m(input_3x3) tensor([[[[ 1.0000, 1.2500, 1.7500, 1.5000, 0.5000, 0.0000], [ 1.5000, 1.7500, 2.2500, 1.8750, 0.6250, 0.0000], [ 2.5000, 2.7500, 3.2500, 2.6250, 0.8750, 0.0000], [ 2.2500, 2.4375, 2.8125, 2.2500, 0.7500, 0.0000], [ 0.7500, 0.8125, 0.9375, 0.7500, 0.2500, 0.0000], [ 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000]]]]) >>> m = nn.Upsample(scale_factor=2, mode='bilinear', align_corners=True) >>> # Notice that values in top left corner are now changed >>> m(input_3x3) tensor([[[[ 1.0000, 1.4000, 1.8000, 1.6000, 0.8000, 0.0000], [ 1.8000, 2.2000, 2.6000, 2.2400, 1.1200, 0.0000], [ 2.6000, 3.0000, 3.4000, 2.8800, 1.4400, 0.0000], [ 2.4000, 2.7200, 3.0400, 2.5600, 1.2800, 0.0000], [ 1.2000, 1.3600, 1.5200, 1.2800, 0.6400, 0.0000], [ 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000]]]]) # UpsamplingBilinear2d `class torch.nn.UpsamplingBilinear2d(size=None, scale_factor=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/upsampling.html#UpsamplingBilinear2d) Applies a 2D bilinear upsampling to an input signal composed of several input channels. To specify the scale, it takes either the `size` or the `scale_factor` as it’s constructor argument. When `size` is given, it is the output size of the image `(h, w)`. Parameters * **size** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _or_ _Tuple_ _[_[int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,_[int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _]__,__optional_) – output spatial sizes * **scale_factor** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _or_ _Tuple_ _[_[float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _,_[float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _]__,__optional_) – multiplier for spatial size. Warning This class is deprecated in favor of `interpolate()`. It is equivalent to `nn.functional.interpolate(..., mode='bilinear', align_corners=True)`. 
Shape: * Input: (N, C, H_{in}, W_{in}) * Output: (N, C, H_{out}, W_{out}) where H_{out} = \left\lfloor H_{in} \times \text{scale\_factor} \right\rfloor W_{out} = \left\lfloor W_{in} \times \text{scale\_factor} \right\rfloor Examples: >>> input = torch.arange(1, 5, dtype=torch.float32).view(1, 1, 2, 2) >>> input tensor([[[[ 1., 2.], [ 3., 4.]]]]) >>> m = nn.UpsamplingBilinear2d(scale_factor=2) >>> m(input) tensor([[[[ 1.0000, 1.3333, 1.6667, 2.0000], [ 1.6667, 2.0000, 2.3333, 2.6667], [ 2.3333, 2.6667, 3.0000, 3.3333], [ 3.0000, 3.3333, 3.6667, 4.0000]]]]) # UpsamplingNearest2d `class torch.nn.UpsamplingNearest2d(size=None, scale_factor=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/upsampling.html#UpsamplingNearest2d) Applies a 2D nearest neighbor upsampling to an input signal composed of several input channels. To specify the scale, it takes either the `size` or the `scale_factor` as its constructor argument. When `size` is given, it is the output size of the image `(h, w)`. Parameters * **size** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _or_ _Tuple_ _[_[int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,_[int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _]__,__optional_) – output spatial sizes * **scale_factor** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _or_ _Tuple_ _[_[float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _,_[float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _]__,__optional_) – multiplier for spatial size. Warning This class is deprecated in favor of `interpolate()`. Shape: * Input: (N, C, H_{in}, W_{in}) * Output: (N, C, H_{out}, W_{out}) where H_{out} = \left\lfloor H_{in} \times \text{scale\_factor} \right\rfloor W_{out} = \left\lfloor W_{in} \times \text{scale\_factor} \right\rfloor Examples: >>> input = torch.arange(1, 5, dtype=torch.float32).view(1, 1, 2, 2) >>> input tensor([[[[ 1., 2.], [ 3., 4.]]]]) >>> m = nn.UpsamplingNearest2d(scale_factor=2) >>> m(input) tensor([[[[ 1., 1., 2., 2.], [ 1., 1., 2., 2.], [ 3., 3., 4., 4.], [ 3., 3., 4., 4.]]]]) # torch.nn.utils.clip_grad_norm_ `torch.nn.utils.clip_grad_norm_(parameters, max_norm, norm_type=2.0)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/utils/clip_grad.html#clip_grad_norm_) Clips gradient norm of an iterable of parameters. The norm is computed over all gradients together, as if they were concatenated into a single vector. Gradients are modified in-place. Parameters * **parameters** (_Iterable_ _[_[Tensor](../tensors#torch.Tensor "torch.Tensor") _] or_[Tensor](../tensors#torch.Tensor "torch.Tensor")) – an iterable of Tensors or a single Tensor that will have gradients normalized * **max_norm** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _or_[int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – max norm of the gradients * **norm_type** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _or_[int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – type of the used p-norm. Can be `'inf'` for infinity norm. Returns Total norm of the parameters (viewed as a single vector).
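A common pattern is to clip gradients between `backward()` and `optimizer.step()`. The sketch below is a minimal, self-contained illustration (the tiny `nn.Linear` model, learning rate, and `max_norm=1.0` are arbitrary choices, not recommendations):

    import torch
    import torch.nn as nn

    model = nn.Linear(10, 2)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
    loss_fn = nn.MSELoss()

    for _ in range(3):
        input = torch.randn(8, 10)
        target = torch.randn(8, 2)
        optimizer.zero_grad()
        loss = loss_fn(model(input), target)
        loss.backward()
        # Rescales all gradients in-place so that their combined 2-norm
        # does not exceed max_norm; returns the pre-clipping total norm.
        total_norm = torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
        optimizer.step()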
# torch.nn.utils.clip_grad_value_ `torch.nn.utils.clip_grad_value_(parameters, clip_value)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/utils/clip_grad.html#clip_grad_value_) Clips gradient of an iterable of parameters at specified value. Gradients are modified in-place. Parameters * **parameters** (_Iterable_ _[_[Tensor](../tensors#torch.Tensor "torch.Tensor") _] or_[Tensor](../tensors#torch.Tensor "torch.Tensor")) – an iterable of Tensors or a single Tensor that will have gradients normalized * **clip_value** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _or_[int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – maximum allowed value of the gradients. The gradients are clipped in the range [-clip_value,clip_value]\left[\text{-clip\\_value}, \text{clip\\_value}\right] # torch.nn.utils.parameters_to_vector `torch.nn.utils.parameters_to_vector(parameters)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/utils/convert_parameters.html#parameters_to_vector) Convert parameters to one vector Parameters **parameters** (_Iterable_ _[_[Tensor](../tensors#torch.Tensor "torch.Tensor") _]_) – an iterator of Tensors that are the parameters of a model. Returns The parameters represented by a single vector # BasePruningMethod `class torch.nn.utils.prune.BasePruningMethod` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/utils/prune.html#BasePruningMethod) Abstract base class for creation of new pruning techniques. Provides a skeleton for customization requiring the overriding of methods such as `compute_mask()` and `apply()`. `classmethod apply(module, name, *args, importance_scores=None, **kwargs)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/utils/prune.html#BasePruningMethod.apply) Adds the forward pre-hook that enables pruning on the fly and the reparametrization of a tensor in terms of the original tensor and the pruning mask. Parameters * **module** ([nn.Module](torch.nn.module#torch.nn.Module "torch.nn.Module")) – module containing the tensor to prune * **name** ([str](https://docs.python.org/3/library/stdtypes.html#str "\(in Python v3.9\)")) – parameter name within `module` on which pruning will act. * **args** – arguments passed on to a subclass of `BasePruningMethod` * **importance_scores** ([torch.Tensor](../tensors#torch.Tensor "torch.Tensor")) – tensor of importance scores (of same shape as module parameter) used to compute mask for pruning. The values in this tensor indicate the importance of the corresponding elements in the parameter being pruned. If unspecified or None, the parameter will be used in its place. * **kwargs** – keyword arguments passed on to a subclass of a `BasePruningMethod` `apply_mask(module)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/utils/prune.html#BasePruningMethod.apply_mask) Simply handles the multiplication between the parameter being pruned and the generated mask. Fetches the mask and the original tensor from the module and returns the pruned version of the tensor. Parameters **module** ([nn.Module](torch.nn.module#torch.nn.Module "torch.nn.Module")) – module containing the tensor to prune Returns pruned version of the input tensor Return type pruned_tensor ([torch.Tensor](../tensors#torch.Tensor "torch.Tensor")) `abstract compute_mask(t, default_mask)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/utils/prune.html#BasePruningMethod.compute_mask) Computes and returns a mask for the input tensor `t`. 
Starting from a base `default_mask` (which should be a mask of ones if the tensor has not been pruned yet), generate a new mask to apply on top of the `default_mask` according to the specific pruning method recipe. Parameters * **t** ([torch.Tensor](../tensors#torch.Tensor "torch.Tensor")) – tensor representing the importance scores of the parameter to prune. * **default_mask** ([torch.Tensor](../tensors#torch.Tensor "torch.Tensor")) – Base mask from previous pruning iterations, that need to be respected after the new mask is applied. Same dims as `t`. Returns mask to apply to `t`, of same dims as `t` Return type mask ([torch.Tensor](../tensors#torch.Tensor "torch.Tensor")) `prune(t, default_mask=None, importance_scores=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/utils/prune.html#BasePruningMethod.prune) Computes and returns a pruned version of input tensor `t` according to the pruning rule specified in `compute_mask()`. Parameters * **t** ([torch.Tensor](../tensors#torch.Tensor "torch.Tensor")) – tensor to prune (of same dimensions as `default_mask`). * **importance_scores** ([torch.Tensor](../tensors#torch.Tensor "torch.Tensor")) – tensor of importance scores (of same shape as `t`) used to compute mask for pruning `t`. The values in this tensor indicate the importance of the corresponding elements in the `t` that is being pruned. If unspecified or None, the tensor `t` will be used in its place. * **default_mask** ([torch.Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – mask from previous pruning iteration, if any. To be considered when determining what portion of the tensor that pruning should act on. If None, default to a mask of ones. Returns pruned version of tensor `t`. `remove(module)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/utils/prune.html#BasePruningMethod.remove) Removes the pruning reparameterization from a module. The pruned parameter named `name` remains permanently pruned, and the parameter named `name+'_orig'` is removed from the parameter list. Similarly, the buffer named `name+'_mask'` is removed from the buffers. Note Pruning itself is NOT undone or reversed! # torch.nn.utils.prune.custom_from_mask `torch.nn.utils.prune.custom_from_mask(module, name, mask)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/utils/prune.html#custom_from_mask) Prunes tensor corresponding to parameter called `name` in `module` by applying the pre-computed mask in `mask`. Modifies module in place (and also returns the modified module) by: 1) adding a named buffer called `name+'_mask'` corresponding to the binary mask applied to the parameter `name` by the pruning method. 2) replacing the parameter `name` by its pruned version, while the original (unpruned) parameter is stored in a new parameter named `name+'_orig'`. Parameters * **module** ([nn.Module](torch.nn.module#torch.nn.Module "torch.nn.Module")) – module containing the tensor to prune * **name** ([str](https://docs.python.org/3/library/stdtypes.html#str "\(in Python v3.9\)")) – parameter name within `module` on which pruning will act. * **mask** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – binary mask to be applied to the parameter. Returns modified (i.e.
pruned) version of the input module Return type module ([nn.Module](torch.nn.module#torch.nn.Module "torch.nn.Module")) #### Examples >>> m = prune.custom_from_mask( nn.Linear(5, 3), name='bias', mask=torch.Tensor([0, 1, 0]) ) >>> print(m.bias_mask) tensor([0., 1., 0.]) # CustomFromMask `class torch.nn.utils.prune.CustomFromMask(mask)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/utils/prune.html#CustomFromMask) `classmethod apply(module, name, mask)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/utils/prune.html#CustomFromMask.apply) Adds the forward pre-hook that enables pruning on the fly and the reparametrization of a tensor in terms of the original tensor and the pruning mask. Parameters * **module** ([nn.Module](torch.nn.module#torch.nn.Module "torch.nn.Module")) – module containing the tensor to prune * **name** ([str](https://docs.python.org/3/library/stdtypes.html#str "\(in Python v3.9\)")) – parameter name within `module` on which pruning will act. `apply_mask(module)` Simply handles the multiplication between the parameter being pruned and the generated mask. Fetches the mask and the original tensor from the module and returns the pruned version of the tensor. Parameters **module** ([nn.Module](torch.nn.module#torch.nn.Module "torch.nn.Module")) – module containing the tensor to prune Returns pruned version of the input tensor Return type pruned_tensor ([torch.Tensor](../tensors#torch.Tensor "torch.Tensor")) `prune(t, default_mask=None, importance_scores=None)` Computes and returns a pruned version of input tensor `t` according to the pruning rule specified in `compute_mask()`. Parameters * **t** ([torch.Tensor](../tensors#torch.Tensor "torch.Tensor")) – tensor to prune (of same dimensions as `default_mask`). * **importance_scores** ([torch.Tensor](../tensors#torch.Tensor "torch.Tensor")) – tensor of importance scores (of same shape as `t`) used to compute mask for pruning `t`. The values in this tensor indicate the importance of the corresponding elements in the `t` that is being pruned. If unspecified or None, the tensor `t` will be used in its place. * **default_mask** ([torch.Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – mask from previous pruning iteration, if any. To be considered when determining what portion of the tensor that pruning should act on. If None, default to a mask of ones. Returns pruned version of tensor `t`. `remove(module)` Removes the pruning reparameterization from a module. The pruned parameter named `name` remains permanently pruned, and the parameter named `name+'_orig'` is removed from the parameter list. Similarly, the buffer named `name+'_mask'` is removed from the buffers. Note Pruning itself is NOT undone or reversed! # torch.nn.utils.prune.global_unstructured `torch.nn.utils.prune.global_unstructured(parameters, pruning_method, importance_scores=None, **kwargs)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/utils/prune.html#global_unstructured) Globally prunes tensors corresponding to all parameters in `parameters` by applying the specified `pruning_method`. Modifies modules in place by: 1) adding a named buffer called `name+'_mask'` corresponding to the binary mask applied to the parameter `name` by the pruning method. 2) replacing the parameter `name` by its pruned version, while the original (unpruned) parameter is stored in a new parameter named `name+'_orig'`. 
Parameters * **parameters** (_Iterable of_ _(__module_ _,__name_ _)__tuples_) – parameters of the model to prune in a global fashion, i.e. by aggregating all weights prior to deciding which ones to prune. module must be of type `nn.Module`, and name must be a string. * **pruning_method** (_function_) – a valid pruning function from this module, or a custom one implemented by the user that satisfies the implementation guidelines and has `PRUNING_TYPE='unstructured'`. * **importance_scores** ([dict](https://docs.python.org/3/library/stdtypes.html#dict "\(in Python v3.9\)")) – a dictionary mapping (module, name) tuples to the corresponding parameter’s importance scores tensor. The tensor should be the same shape as the parameter, and is used for computing mask for pruning. If unspecified or None, the parameter will be used in place of its importance scores. * **kwargs** – other keyword arguments such as: amount (int or float): quantity of parameters to prune across the specified parameters. If `float`, should be between 0.0 and 1.0 and represent the fraction of parameters to prune. If `int`, it represents the absolute number of parameters to prune. Raises [**TypeError**](https://docs.python.org/3/library/exceptions.html#TypeError "\(in Python v3.9\)") – if `PRUNING_TYPE != 'unstructured'` Note Since global structured pruning doesn’t make much sense unless the norm is normalized by the size of the parameter, we now limit the scope of global pruning to unstructured methods. #### Examples >>> net = nn.Sequential(OrderedDict([ ('first', nn.Linear(10, 4)), ('second', nn.Linear(4, 1)), ])) >>> parameters_to_prune = ( (net.first, 'weight'), (net.second, 'weight'), ) >>> prune.global_unstructured( parameters_to_prune, pruning_method=prune.L1Unstructured, amount=10, ) >>> print(sum(torch.nn.utils.parameters_to_vector(net.buffers()) == 0)) tensor(10, dtype=torch.uint8) # Identity `class torch.nn.utils.prune.Identity` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/utils/prune.html#Identity) Utility pruning method that does not prune any units but generates the pruning parametrization with a mask of ones. `classmethod apply(module, name)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/utils/prune.html#Identity.apply) Adds the forward pre-hook that enables pruning on the fly and the reparametrization of a tensor in terms of the original tensor and the pruning mask. Parameters * **module** ([nn.Module](torch.nn.module#torch.nn.Module "torch.nn.Module")) – module containing the tensor to prune * **name** ([str](https://docs.python.org/3/library/stdtypes.html#str "\(in Python v3.9\)")) – parameter name within `module` on which pruning will act. `apply_mask(module)` Simply handles the multiplication between the parameter being pruned and the generated mask. Fetches the mask and the original tensor from the module and returns the pruned version of the tensor. Parameters **module** ([nn.Module](torch.nn.module#torch.nn.Module "torch.nn.Module")) – module containing the tensor to prune Returns pruned version of the input tensor Return type pruned_tensor ([torch.Tensor](../tensors#torch.Tensor "torch.Tensor")) `prune(t, default_mask=None, importance_scores=None)` Computes and returns a pruned version of input tensor `t` according to the pruning rule specified in `compute_mask()`. Parameters * **t** ([torch.Tensor](../tensors#torch.Tensor "torch.Tensor")) – tensor to prune (of same dimensions as `default_mask`). 
* **importance_scores** ([torch.Tensor](../tensors#torch.Tensor "torch.Tensor")) – tensor of importance scores (of same shape as `t`) used to compute mask for pruning `t`. The values in this tensor indicate the importance of the corresponding elements in the `t` that is being pruned. If unspecified or None, the tensor `t` will be used in its place. * **default_mask** ([torch.Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – mask from previous pruning iteration, if any. To be considered when determining what portion of the tensor that pruning should act on. If None, default to a mask of ones. Returns pruned version of tensor `t`. `remove(module)` Removes the pruning reparameterization from a module. The pruned parameter named `name` remains permanently pruned, and the parameter named `name+'_orig'` is removed from the parameter list. Similarly, the buffer named `name+'_mask'` is removed from the buffers. Note Pruning itself is NOT undone or reversed! # torch.nn.utils.prune.is_pruned `torch.nn.utils.prune.is_pruned(module)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/utils/prune.html#is_pruned) Check whether `module` is pruned by looking for `forward_pre_hooks` in its modules that inherit from the [`BasePruningMethod`](torch.nn.utils.prune.basepruningmethod#torch.nn.utils.prune.BasePruningMethod "torch.nn.utils.prune.BasePruningMethod"). Parameters **module** ([nn.Module](torch.nn.module#torch.nn.Module "torch.nn.Module")) – object that is either pruned or unpruned Returns binary answer to whether `module` is pruned. #### Examples >>> m = nn.Linear(5, 7) >>> print(prune.is_pruned(m)) False >>> prune.random_unstructured(m, name='weight', amount=0.2) >>> print(prune.is_pruned(m)) True # torch.nn.utils.prune.l1_unstructured `torch.nn.utils.prune.l1_unstructured(module, name, amount, importance_scores=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/utils/prune.html#l1_unstructured) Prunes tensor corresponding to parameter called `name` in `module` by removing the specified `amount` of (currently unpruned) units with the lowest L1-norm. Modifies module in place (and also return the modified module) by: 1) adding a named buffer called `name+'_mask'` corresponding to the binary mask applied to the parameter `name` by the pruning method. 2) replacing the parameter `name` by its pruned version, while the original (unpruned) parameter is stored in a new parameter named `name+'_orig'`. Parameters * **module** ([nn.Module](torch.nn.module#torch.nn.Module "torch.nn.Module")) – module containing the tensor to prune * **name** ([str](https://docs.python.org/3/library/stdtypes.html#str "\(in Python v3.9\)")) – parameter name within `module` on which pruning will act. * **amount** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _or_[float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)")) – quantity of parameters to prune. If `float`, should be between 0.0 and 1.0 and represent the fraction of parameters to prune. If `int`, it represents the absolute number of parameters to prune. * **importance_scores** ([torch.Tensor](../tensors#torch.Tensor "torch.Tensor")) – tensor of importance scores (of same shape as module parameter) used to compute mask for pruning. The values in this tensor indicate the importance of the corresponding elements in the parameter being pruned. If unspecified or None, the module parameter will be used in its place. Returns modified (i.e. 
pruned) version of the input module Return type module ([nn.Module](torch.nn.module#torch.nn.Module "torch.nn.Module")) #### Examples >>> m = prune.l1_unstructured(nn.Linear(2, 3), 'weight', amount=0.2) >>> m.state_dict().keys() odict_keys(['bias', 'weight_orig', 'weight_mask']) # L1Unstructured `class torch.nn.utils.prune.L1Unstructured(amount)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/utils/prune.html#L1Unstructured) Prune (currently unpruned) units in a tensor by zeroing out the ones with the lowest L1-norm. Parameters **amount** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _or_[float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)")) – quantity of parameters to prune. If `float`, should be between 0.0 and 1.0 and represent the fraction of parameters to prune. If `int`, it represents the absolute number of parameters to prune. `classmethod apply(module, name, amount, importance_scores=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/utils/prune.html#L1Unstructured.apply) Adds the forward pre-hook that enables pruning on the fly and the reparametrization of a tensor in terms of the original tensor and the pruning mask. Parameters * **module** ([nn.Module](torch.nn.module#torch.nn.Module "torch.nn.Module")) – module containing the tensor to prune * **name** ([str](https://docs.python.org/3/library/stdtypes.html#str "\(in Python v3.9\)")) – parameter name within `module` on which pruning will act. * **amount** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _or_[float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)")) – quantity of parameters to prune. If `float`, should be between 0.0 and 1.0 and represent the fraction of parameters to prune. If `int`, it represents the absolute number of parameters to prune. * **importance_scores** ([torch.Tensor](../tensors#torch.Tensor "torch.Tensor")) – tensor of importance scores (of same shape as module parameter) used to compute mask for pruning. The values in this tensor indicate the importance of the corresponding elements in the parameter being pruned. If unspecified or None, the module parameter will be used in its place. `apply_mask(module)` Simply handles the multiplication between the parameter being pruned and the generated mask. Fetches the mask and the original tensor from the module and returns the pruned version of the tensor. Parameters **module** ([nn.Module](torch.nn.module#torch.nn.Module "torch.nn.Module")) – module containing the tensor to prune Returns pruned version of the input tensor Return type pruned_tensor ([torch.Tensor](../tensors#torch.Tensor "torch.Tensor")) `prune(t, default_mask=None, importance_scores=None)` Computes and returns a pruned version of input tensor `t` according to the pruning rule specified in `compute_mask()`. Parameters * **t** ([torch.Tensor](../tensors#torch.Tensor "torch.Tensor")) – tensor to prune (of same dimensions as `default_mask`). * **importance_scores** ([torch.Tensor](../tensors#torch.Tensor "torch.Tensor")) – tensor of importance scores (of same shape as `t`) used to compute mask for pruning `t`. The values in this tensor indicate the importance of the corresponding elements in the `t` that is being pruned. If unspecified or None, the tensor `t` will be used in its place. * **default_mask** ([torch.Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – mask from previous pruning iteration, if any. 
To be considered when determining what portion of the tensor that pruning should act on. If None, default to a mask of ones. Returns pruned version of tensor `t`. `remove(module)` Removes the pruning reparameterization from a module. The pruned parameter named `name` remains permanently pruned, and the parameter named `name+'_orig'` is removed from the parameter list. Similarly, the buffer named `name+'_mask'` is removed from the buffers. Note Pruning itself is NOT undone or reversed! # torch.nn.utils.prune.ln_structured `torch.nn.utils.prune.ln_structured(module, name, amount, n, dim, importance_scores=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/utils/prune.html#ln_structured) Prunes tensor corresponding to parameter called `name` in `module` by removing the specified `amount` of (currently unpruned) channels along the specified `dim` with the lowest L``n``-norm. Modifies module in place (and also return the modified module) by: 1) adding a named buffer called `name+'_mask'` corresponding to the binary mask applied to the parameter `name` by the pruning method. 2) replacing the parameter `name` by its pruned version, while the original (unpruned) parameter is stored in a new parameter named `name+'_orig'`. Parameters * **module** ([nn.Module](torch.nn.module#torch.nn.Module "torch.nn.Module")) – module containing the tensor to prune * **name** ([str](https://docs.python.org/3/library/stdtypes.html#str "\(in Python v3.9\)")) – parameter name within `module` on which pruning will act. * **amount** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _or_[float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)")) – quantity of parameters to prune. If `float`, should be between 0.0 and 1.0 and represent the fraction of parameters to prune. If `int`, it represents the absolute number of parameters to prune. * **n** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,_[float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _,__inf_ _,__-inf_ _,__'fro'__,__'nuc'_) – See documentation of valid entries for argument `p` in [`torch.norm()`](torch.norm#torch.norm "torch.norm"). * **dim** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – index of the dim along which we define channels to prune. * **importance_scores** ([torch.Tensor](../tensors#torch.Tensor "torch.Tensor")) – tensor of importance scores (of same shape as module parameter) used to compute mask for pruning. The values in this tensor indicate the importance of the corresponding elements in the parameter being pruned. If unspecified or None, the module parameter will be used in its place. Returns modified (i.e. pruned) version of the input module Return type module ([nn.Module](torch.nn.module#torch.nn.Module "torch.nn.Module")) #### Examples >>> m = prune.ln_structured( nn.Conv2d(5, 3, 2), 'weight', amount=0.3, dim=1, n=float('-inf') ) # LnStructured `class torch.nn.utils.prune.LnStructured(amount, n, dim=-1)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/utils/prune.html#LnStructured) Prune entire (currently unpruned) channels in a tensor based on their Ln-norm. Parameters * **amount** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _or_[float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)")) – quantity of channels to prune. 
If `float`, should be between 0.0 and 1.0 and represent the fraction of parameters to prune. If `int`, it represents the absolute number of parameters to prune. * **n** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,_[float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _,__inf_ _,__-inf_ _,__'fro'__,__'nuc'_) – See documentation of valid entries for argument `p` in [`torch.norm()`](torch.norm#torch.norm "torch.norm"). * **dim** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – index of the dim along which we define channels to prune. Default: -1. `classmethod apply(module, name, amount, n, dim, importance_scores=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/utils/prune.html#LnStructured.apply) Adds the forward pre-hook that enables pruning on the fly and the reparametrization of a tensor in terms of the original tensor and the pruning mask. Parameters * **module** ([nn.Module](torch.nn.module#torch.nn.Module "torch.nn.Module")) – module containing the tensor to prune * **name** ([str](https://docs.python.org/3/library/stdtypes.html#str "\(in Python v3.9\)")) – parameter name within `module` on which pruning will act. * **amount** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _or_[float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)")) – quantity of parameters to prune. If `float`, should be between 0.0 and 1.0 and represent the fraction of parameters to prune. If `int`, it represents the absolute number of parameters to prune. * **n** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,_[float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _,__inf_ _,__-inf_ _,__'fro'__,__'nuc'_) – See documentation of valid entries for argument `p` in [`torch.norm()`](torch.norm#torch.norm "torch.norm"). * **dim** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – index of the dim along which we define channels to prune. * **importance_scores** ([torch.Tensor](../tensors#torch.Tensor "torch.Tensor")) – tensor of importance scores (of same shape as module parameter) used to compute mask for pruning. The values in this tensor indicate the importance of the corresponding elements in the parameter being pruned. If unspecified or None, the module parameter will be used in its place. `apply_mask(module)` Simply handles the multiplication between the parameter being pruned and the generated mask. Fetches the mask and the original tensor from the module and returns the pruned version of the tensor. Parameters **module** ([nn.Module](torch.nn.module#torch.nn.Module "torch.nn.Module")) – module containing the tensor to prune Returns pruned version of the input tensor Return type pruned_tensor ([torch.Tensor](../tensors#torch.Tensor "torch.Tensor")) `compute_mask(t, default_mask)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/utils/prune.html#LnStructured.compute_mask) Computes and returns a mask for the input tensor `t`. Starting from a base `default_mask` (which should be a mask of ones if the tensor has not been pruned yet), generate a mask to apply on top of the `default_mask` by zeroing out the channels along the specified dim with the lowest Ln-norm. 
Parameters * **t** ([torch.Tensor](../tensors#torch.Tensor "torch.Tensor")) – tensor representing the parameter to prune * **default_mask** ([torch.Tensor](../tensors#torch.Tensor "torch.Tensor")) – Base mask from previous pruning iterations, that need to be respected after the new mask is applied. Same dims as `t`. Returns mask to apply to `t`, of same dims as `t` Return type mask ([torch.Tensor](../tensors#torch.Tensor "torch.Tensor")) Raises [**IndexError**](https://docs.python.org/3/library/exceptions.html#IndexError "\(in Python v3.9\)") – if `self.dim >= len(t.shape)` `prune(t, default_mask=None, importance_scores=None)` Computes and returns a pruned version of input tensor `t` according to the pruning rule specified in `compute_mask()`. Parameters * **t** ([torch.Tensor](../tensors#torch.Tensor "torch.Tensor")) – tensor to prune (of same dimensions as `default_mask`). * **importance_scores** ([torch.Tensor](../tensors#torch.Tensor "torch.Tensor")) – tensor of importance scores (of same shape as `t`) used to compute mask for pruning `t`. The values in this tensor indicate the importance of the corresponding elements in the `t` that is being pruned. If unspecified or None, the tensor `t` will be used in its place. * **default_mask** ([torch.Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – mask from previous pruning iteration, if any. To be considered when determining what portion of the tensor that pruning should act on. If None, default to a mask of ones. Returns pruned version of tensor `t`. `remove(module)` Removes the pruning reparameterization from a module. The pruned parameter named `name` remains permanently pruned, and the parameter named `name+'_orig'` is removed from the parameter list. Similarly, the buffer named `name+'_mask'` is removed from the buffers. Note Pruning itself is NOT undone or reversed! # PruningContainer `class torch.nn.utils.prune.PruningContainer(*args)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/utils/prune.html#PruningContainer) Container holding a sequence of pruning methods for iterative pruning. Keeps track of the order in which pruning methods are applied and handles combining successive pruning calls. Accepts as argument an instance of a BasePruningMethod or an iterable of them. `add_pruning_method(method)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/utils/prune.html#PruningContainer.add_pruning_method) Adds a child pruning `method` to the container. Parameters **method** (_subclass of BasePruningMethod_) – child pruning method to be added to the container. `classmethod apply(module, name, *args, importance_scores=None, **kwargs)` Adds the forward pre-hook that enables pruning on the fly and the reparametrization of a tensor in terms of the original tensor and the pruning mask. Parameters * **module** ([nn.Module](torch.nn.module#torch.nn.Module "torch.nn.Module")) – module containing the tensor to prune * **name** ([str](https://docs.python.org/3/library/stdtypes.html#str "\(in Python v3.9\)")) – parameter name within `module` on which pruning will act. * **args** – arguments passed on to a subclass of [`BasePruningMethod`](torch.nn.utils.prune.basepruningmethod#torch.nn.utils.prune.BasePruningMethod "torch.nn.utils.prune.BasePruningMethod") * **importance_scores** ([torch.Tensor](../tensors#torch.Tensor "torch.Tensor")) – tensor of importance scores (of same shape as module parameter) used to compute mask for pruning. 
The values in this tensor indicate the importance of the corresponding elements in the parameter being pruned. If unspecified or None, the parameter will be used in its place. * **kwargs** – keyword arguments passed on to a subclass of a [`BasePruningMethod`](torch.nn.utils.prune.basepruningmethod#torch.nn.utils.prune.BasePruningMethod "torch.nn.utils.prune.BasePruningMethod") `apply_mask(module)` Simply handles the multiplication between the parameter being pruned and the generated mask. Fetches the mask and the original tensor from the module and returns the pruned version of the tensor. Parameters **module** ([nn.Module](torch.nn.module#torch.nn.Module "torch.nn.Module")) – module containing the tensor to prune Returns pruned version of the input tensor Return type pruned_tensor ([torch.Tensor](../tensors#torch.Tensor "torch.Tensor")) `compute_mask(t, default_mask)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/utils/prune.html#PruningContainer.compute_mask) Applies the latest `method` by computing the new partial masks and returning its combination with the `default_mask`. The new partial mask should be computed on the entries or channels that were not zeroed out by the `default_mask`. Which portions of the tensor `t` the new mask will be calculated from depends on the `PRUNING_TYPE` (handled by the type handler): * for ‘unstructured’, the mask will be computed from the raveled list of nonmasked entries; * for ‘structured’, the mask will be computed from the nonmasked channels in the tensor; * for ‘global’, the mask will be computed across all entries. Parameters * **t** ([torch.Tensor](../tensors#torch.Tensor "torch.Tensor")) – tensor representing the parameter to prune (of same dimensions as `default_mask`). * **default_mask** ([torch.Tensor](../tensors#torch.Tensor "torch.Tensor")) – mask from previous pruning iteration. Returns new mask that combines the effects of the `default_mask` and the new mask from the current pruning `method` (of same dimensions as `default_mask` and `t`). Return type mask ([torch.Tensor](../tensors#torch.Tensor "torch.Tensor")) `prune(t, default_mask=None, importance_scores=None)` Computes and returns a pruned version of input tensor `t` according to the pruning rule specified in `compute_mask()`. Parameters * **t** ([torch.Tensor](../tensors#torch.Tensor "torch.Tensor")) – tensor to prune (of same dimensions as `default_mask`). * **importance_scores** ([torch.Tensor](../tensors#torch.Tensor "torch.Tensor")) – tensor of importance scores (of same shape as `t`) used to compute mask for pruning `t`. The values in this tensor indicate the importance of the corresponding elements in the `t` that is being pruned. If unspecified or None, the tensor `t` will be used in its place. * **default_mask** ([torch.Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – mask from previous pruning iteration, if any. To be considered when determining what portion of the tensor that pruning should act on. If None, default to a mask of ones. Returns pruned version of tensor `t`. `remove(module)` Removes the pruning reparameterization from a module. The pruned parameter named `name` remains permanently pruned, and the parameter named `name+'_orig'` is removed from the parameter list. Similarly, the buffer named `name+'_mask'` is removed from the buffers. Note Pruning itself is NOT undone or reversed! 
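To make the iterative behavior concrete, here is a minimal sketch of how a `PruningContainer` arises when the same parameter is pruned twice (the `nn.Linear(8, 4)` module and the 50%/25% amounts are arbitrary; `_forward_pre_hooks` is an internal attribute, inspected here purely for illustration):

    import torch
    import torch.nn as nn
    import torch.nn.utils.prune as prune

    m = nn.Linear(8, 4)

    # Round one: zero out 50% of the 32 weight entries with the lowest L1 norm.
    prune.l1_unstructured(m, name='weight', amount=0.5)

    # Round two on the same parameter: the pruning hook is replaced by a
    # PruningContainer that combines the old mask with a new partial mask,
    # computed only over the entries that survived round one.
    prune.l1_unstructured(m, name='weight', amount=0.25)

    hook = next(iter(m._forward_pre_hooks.values()))
    print(isinstance(hook, prune.PruningContainer))  # True

    # 16 entries are pruned in round one, then 4 of the remaining 16,
    # so 20/32 = 62.5% of the weight is now zero.
    print(float((m.weight == 0).sum()) / m.weight.nelement())  # 0.625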
# torch.nn.utils.prune.random_structured `torch.nn.utils.prune.random_structured(module, name, amount, dim)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/utils/prune.html#random_structured) Prunes tensor corresponding to parameter called `name` in `module` by removing the specified `amount` of (currently unpruned) channels along the specified `dim` selected at random. Modifies module in place (and also return the modified module) by: 1) adding a named buffer called `name+'_mask'` corresponding to the binary mask applied to the parameter `name` by the pruning method. 2) replacing the parameter `name` by its pruned version, while the original (unpruned) parameter is stored in a new parameter named `name+'_orig'`. Parameters * **module** ([nn.Module](torch.nn.module#torch.nn.Module "torch.nn.Module")) – module containing the tensor to prune * **name** ([str](https://docs.python.org/3/library/stdtypes.html#str "\(in Python v3.9\)")) – parameter name within `module` on which pruning will act. * **amount** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _or_[float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)")) – quantity of parameters to prune. If `float`, should be between 0.0 and 1.0 and represent the fraction of parameters to prune. If `int`, it represents the absolute number of parameters to prune. * **dim** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – index of the dim along which we define channels to prune. Returns modified (i.e. pruned) version of the input module Return type module ([nn.Module](torch.nn.module#torch.nn.Module "torch.nn.Module")) #### Examples >>> m = prune.random_structured( nn.Linear(5, 3), 'weight', amount=3, dim=1 ) >>> columns_pruned = int(sum(torch.sum(m.weight, dim=0) == 0)) >>> print(columns_pruned) 3 # torch.nn.utils.prune.random_unstructured `torch.nn.utils.prune.random_unstructured(module, name, amount)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/utils/prune.html#random_unstructured) Prunes tensor corresponding to parameter called `name` in `module` by removing the specified `amount` of (currently unpruned) units selected at random. Modifies module in place (and also return the modified module) by: 1) adding a named buffer called `name+'_mask'` corresponding to the binary mask applied to the parameter `name` by the pruning method. 2) replacing the parameter `name` by its pruned version, while the original (unpruned) parameter is stored in a new parameter named `name+'_orig'`. Parameters * **module** ([nn.Module](torch.nn.module#torch.nn.Module "torch.nn.Module")) – module containing the tensor to prune * **name** ([str](https://docs.python.org/3/library/stdtypes.html#str "\(in Python v3.9\)")) – parameter name within `module` on which pruning will act. * **amount** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _or_[float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)")) – quantity of parameters to prune. If `float`, should be between 0.0 and 1.0 and represent the fraction of parameters to prune. If `int`, it represents the absolute number of parameters to prune. Returns modified (i.e. 
pruned) version of the input module Return type module ([nn.Module](torch.nn.module#torch.nn.Module "torch.nn.Module")) #### Examples >>> m = prune.random_unstructured(nn.Linear(2, 3), 'weight', amount=1) >>> torch.sum(m.weight_mask == 0) tensor(1) # RandomStructured `class torch.nn.utils.prune.RandomStructured(amount, dim=-1)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/utils/prune.html#RandomStructured) Prune entire (currently unpruned) channels in a tensor at random. Parameters * **amount** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _or_[float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)")) – quantity of parameters to prune. If `float`, should be between 0.0 and 1.0 and represent the fraction of parameters to prune. If `int`, it represents the absolute number of parameters to prune. * **dim** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – index of the dim along which we define channels to prune. Default: -1. `classmethod apply(module, name, amount, dim=-1)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/utils/prune.html#RandomStructured.apply) Adds the forward pre-hook that enables pruning on the fly and the reparametrization of a tensor in terms of the original tensor and the pruning mask. Parameters * **module** ([nn.Module](torch.nn.module#torch.nn.Module "torch.nn.Module")) – module containing the tensor to prune * **name** ([str](https://docs.python.org/3/library/stdtypes.html#str "\(in Python v3.9\)")) – parameter name within `module` on which pruning will act. * **amount** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _or_[float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)")) – quantity of parameters to prune. If `float`, should be between 0.0 and 1.0 and represent the fraction of parameters to prune. If `int`, it represents the absolute number of parameters to prune. * **dim** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – index of the dim along which we define channels to prune. Default: -1. `apply_mask(module)` Simply handles the multiplication between the parameter being pruned and the generated mask. Fetches the mask and the original tensor from the module and returns the pruned version of the tensor. Parameters **module** ([nn.Module](torch.nn.module#torch.nn.Module "torch.nn.Module")) – module containing the tensor to prune Returns pruned version of the input tensor Return type pruned_tensor ([torch.Tensor](../tensors#torch.Tensor "torch.Tensor")) `compute_mask(t, default_mask)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/utils/prune.html#RandomStructured.compute_mask) Computes and returns a mask for the input tensor `t`. Starting from a base `default_mask` (which should be a mask of ones if the tensor has not been pruned yet), generate a random mask to apply on top of the `default_mask` by randomly zeroing out channels along the specified dim of the tensor. Parameters * **t** ([torch.Tensor](../tensors#torch.Tensor "torch.Tensor")) – tensor representing the parameter to prune * **default_mask** ([torch.Tensor](../tensors#torch.Tensor "torch.Tensor")) – Base mask from previous pruning iterations, that need to be respected after the new mask is applied. Same dims as `t`. 
Returns mask to apply to `t`, of same dims as `t` Return type mask ([torch.Tensor](../tensors#torch.Tensor "torch.Tensor")) Raises [**IndexError**](https://docs.python.org/3/library/exceptions.html#IndexError "\(in Python v3.9\)") – if `self.dim >= len(t.shape)` `prune(t, default_mask=None, importance_scores=None)` Computes and returns a pruned version of input tensor `t` according to the pruning rule specified in `compute_mask()`. Parameters * **t** ([torch.Tensor](../tensors#torch.Tensor "torch.Tensor")) – tensor to prune (of same dimensions as `default_mask`). * **importance_scores** ([torch.Tensor](../tensors#torch.Tensor "torch.Tensor")) – tensor of importance scores (of same shape as `t`) used to compute mask for pruning `t`. The values in this tensor indicate the importance of the corresponding elements in the `t` that is being pruned. If unspecified or None, the tensor `t` will be used in its place. * **default_mask** ([torch.Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – mask from previous pruning iteration, if any. To be considered when determining what portion of the tensor that pruning should act on. If None, default to a mask of ones. Returns pruned version of tensor `t`. `remove(module)` Removes the pruning reparameterization from a module. The pruned parameter named `name` remains permanently pruned, and the parameter named `name+'_orig'` is removed from the parameter list. Similarly, the buffer named `name+'_mask'` is removed from the buffers. Note Pruning itself is NOT undone or reversed! # RandomUnstructured `class torch.nn.utils.prune.RandomUnstructured(amount)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/utils/prune.html#RandomUnstructured) Prune (currently unpruned) units in a tensor at random. Parameters * **name** ([str](https://docs.python.org/3/library/stdtypes.html#str "\(in Python v3.9\)")) – parameter name within `module` on which pruning will act. * **amount** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _or_[float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)")) – quantity of parameters to prune. If `float`, should be between 0.0 and 1.0 and represent the fraction of parameters to prune. If `int`, it represents the absolute number of parameters to prune. `classmethod apply(module, name, amount)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/utils/prune.html#RandomUnstructured.apply) Adds the forward pre-hook that enables pruning on the fly and the reparametrization of a tensor in terms of the original tensor and the pruning mask. Parameters * **module** ([nn.Module](torch.nn.module#torch.nn.Module "torch.nn.Module")) – module containing the tensor to prune * **name** ([str](https://docs.python.org/3/library/stdtypes.html#str "\(in Python v3.9\)")) – parameter name within `module` on which pruning will act. * **amount** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _or_[float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)")) – quantity of parameters to prune. If `float`, should be between 0.0 and 1.0 and represent the fraction of parameters to prune. If `int`, it represents the absolute number of parameters to prune. `apply_mask(module)` Simply handles the multiplication between the parameter being pruned and the generated mask. Fetches the mask and the original tensor from the module and returns the pruned version of the tensor. 
Parameters **module** ([nn.Module](torch.nn.module#torch.nn.Module "torch.nn.Module")) – module containing the tensor to prune Returns pruned version of the input tensor Return type pruned_tensor ([torch.Tensor](../tensors#torch.Tensor "torch.Tensor")) `prune(t, default_mask=None, importance_scores=None)` Computes and returns a pruned version of input tensor `t` according to the pruning rule specified in `compute_mask()`. Parameters * **t** ([torch.Tensor](../tensors#torch.Tensor "torch.Tensor")) – tensor to prune (of same dimensions as `default_mask`). * **importance_scores** ([torch.Tensor](../tensors#torch.Tensor "torch.Tensor")) – tensor of importance scores (of same shape as `t`) used to compute mask for pruning `t`. The values in this tensor indicate the importance of the corresponding elements in the `t` that is being pruned. If unspecified or None, the tensor `t` will be used in its place. * **default_mask** ([torch.Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – mask from previous pruning iteration, if any. To be considered when determining what portion of the tensor that pruning should act on. If None, default to a mask of ones. Returns pruned version of tensor `t`. `remove(module)` Removes the pruning reparameterization from a module. The pruned parameter named `name` remains permanently pruned, and the parameter named `name+'_orig'` is removed from the parameter list. Similarly, the buffer named `name+'_mask'` is removed from the buffers. Note Pruning itself is NOT undone or reversed! # torch.nn.utils.prune.remove `torch.nn.utils.prune.remove(module, name)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/utils/prune.html#remove) Removes the pruning reparameterization from a module and the pruning method from the forward hook. The pruned parameter named `name` remains permanently pruned, and the parameter named `name+'_orig'` is removed from the parameter list. Similarly, the buffer named `name+'_mask'` is removed from the buffers. Note Pruning itself is NOT undone or reversed! Parameters * **module** ([nn.Module](torch.nn.module#torch.nn.Module "torch.nn.Module")) – module containing the tensor to prune * **name** ([str](https://docs.python.org/3/library/stdtypes.html#str "\(in Python v3.9\)")) – parameter name within `module` on which pruning will act. #### Examples >>> m = random_unstructured(nn.Linear(5, 7), name='weight', amount=0.2) >>> m = remove(m, name='weight') # torch.nn.utils.remove_spectral_norm `torch.nn.utils.remove_spectral_norm(module, name='weight')` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/utils/spectral_norm.html#remove_spectral_norm) Removes the spectral normalization reparameterization from a module. Parameters * **module** ([Module](torch.nn.module#torch.nn.Module "torch.nn.Module")) – containing module * **name** ([str](https://docs.python.org/3/library/stdtypes.html#str "\(in Python v3.9\)") _,__optional_) – name of weight parameter #### Example >>> m = spectral_norm(nn.Linear(40, 10)) >>> remove_spectral_norm(m) # torch.nn.utils.remove_weight_norm `torch.nn.utils.remove_weight_norm(module, name='weight')` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/utils/weight_norm.html#remove_weight_norm) Removes the weight normalization reparameterization from a module. 
Parameters * **module** ([Module](torch.nn.module#torch.nn.Module "torch.nn.Module")) – containing module * **name** ([str](https://docs.python.org/3/library/stdtypes.html#str "\(in Python v3.9\)") _,__optional_) – name of weight parameter #### Example >>> m = weight_norm(nn.Linear(20, 40)) >>> remove_weight_norm(m) # torch.nn.utils.rnn.pack_padded_sequence `torch.nn.utils.rnn.pack_padded_sequence(input, lengths, batch_first=False, enforce_sorted=True)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/utils/rnn.html#pack_padded_sequence) Packs a Tensor containing padded sequences of variable length. `input` can be of size `T x B x *` where `T` is the length of the longest sequence (equal to `lengths[0]`), `B` is the batch size, and `*` is any number of dimensions (including 0). If `batch_first` is `True`, `B x T x *` `input` is expected. For unsorted sequences, use `enforce_sorted = False`. If `enforce_sorted` is `True`, the sequences should be sorted by length in a decreasing order, i.e. `input[:,0]` should be the longest sequence, and `input[:,B-1]` the shortest one. `enforce_sorted = True` is only necessary for ONNX export. Note This function accepts any input that has at least two dimensions. You can apply it to pack the labels, and use the output of the RNN with them to compute the loss directly. A Tensor can be retrieved from a [`PackedSequence`](torch.nn.utils.rnn.packedsequence#torch.nn.utils.rnn.PackedSequence "torch.nn.utils.rnn.PackedSequence") object by accessing its `.data` attribute. Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – padded batch of variable length sequences. * **lengths** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _or_[list](https://docs.python.org/3/library/stdtypes.html#list "\(in Python v3.9\)") _(_[int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _)_) – list of sequence lengths of each batch element (must be on the CPU if provided as a tensor). * **batch_first** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – if `True`, the input is expected in `B x T x *` format. * **enforce_sorted** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – if `True`, the input is expected to contain sequences sorted by length in a decreasing order. If `False`, the input will get sorted unconditionally. Default: `True`. Returns a [`PackedSequence`](torch.nn.utils.rnn.packedsequence#torch.nn.utils.rnn.PackedSequence "torch.nn.utils.rnn.PackedSequence") object # torch.nn.utils.rnn.pack_sequence `torch.nn.utils.rnn.pack_sequence(sequences, enforce_sorted=True)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/utils/rnn.html#pack_sequence) Packs a list of variable length Tensors `sequences` should be a list of Tensors of size `L x *`, where `L` is the length of a sequence and `*` is any number of trailing dimensions, including zero. For unsorted sequences, use `enforce_sorted = False`. If `enforce_sorted` is `True`, the sequences should be sorted in the order of decreasing length. `enforce_sorted = True` is only necessary for ONNX export. 
#### Example

>>> from torch.nn.utils.rnn import pack_sequence
>>> a = torch.tensor([1,2,3])
>>> b = torch.tensor([4,5])
>>> c = torch.tensor([6])
>>> pack_sequence([a, b, c])
PackedSequence(data=tensor([ 1, 4, 6, 2, 5, 3]), batch_sizes=tensor([ 3, 2, 1]))

Parameters

* **sequences** ([list](https://docs.python.org/3/library/stdtypes.html#list "\(in Python v3.9\)") _[_[Tensor](../tensors#torch.Tensor "torch.Tensor") _]_) – A list of sequences of decreasing length.
* **enforce_sorted** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – if `True`, checks that the input contains sequences sorted by length in a decreasing order. If `False`, this condition is not checked. Default: `True`.

Returns

a [`PackedSequence`](torch.nn.utils.rnn.packedsequence#torch.nn.utils.rnn.PackedSequence "torch.nn.utils.rnn.PackedSequence") object

# PackedSequence

`class torch.nn.utils.rnn.PackedSequence` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/utils/rnn.html#PackedSequence)

Holds the data and list of `batch_sizes` of a packed sequence. All RNN modules accept packed sequences as inputs.

Note

Instances of this class should never be created manually. They are meant to be instantiated by functions like [`pack_padded_sequence()`](torch.nn.utils.rnn.pack_padded_sequence#torch.nn.utils.rnn.pack_padded_sequence "torch.nn.utils.rnn.pack_padded_sequence").

Batch sizes represent the number of elements at each sequence step in the batch, not the varying sequence lengths passed to [`pack_padded_sequence()`](torch.nn.utils.rnn.pack_padded_sequence#torch.nn.utils.rnn.pack_padded_sequence "torch.nn.utils.rnn.pack_padded_sequence"). For instance, given data `abc` and `x` the `PackedSequence` would contain data `axbc` with `batch_sizes=[2,1,1]`.

Variables

* **data** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – Tensor containing the packed sequence
* **batch_sizes** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – Tensor of integers holding information about the batch size at each sequence step
* **sorted_indices** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – Tensor of integers describing how this `PackedSequence` is constructed from the original sequences.
* **unsorted_indices** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – Tensor of integers describing how to recover the original sequences in their correct order.

Note

`data` can be on an arbitrary device and of an arbitrary dtype. `sorted_indices` and `unsorted_indices` must be `torch.int64` tensors on the same device as `data`. However, `batch_sizes` should always be a CPU `torch.int64` tensor. This invariant is maintained throughout the `PackedSequence` class, and all functions that construct a `PackedSequence` in PyTorch only pass in tensors conforming to this constraint.

`property batch_sizes`

Alias for field number 1

`count()`

Return number of occurrences of value.

`property data`

Alias for field number 0

`index()`

Return first index of value. Raises ValueError if the value is not present.
`property is_cuda`

Returns true if `self.data` is stored on a GPU

`is_pinned()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/utils/rnn.html#PackedSequence.is_pinned)

Returns true if `self.data` is stored in pinned memory

`property sorted_indices`

Alias for field number 2

`to(*args, **kwargs)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/utils/rnn.html#PackedSequence.to)

Performs dtype and/or device conversion on `self.data`. It has a similar signature to [`torch.Tensor.to()`](../tensors#torch.Tensor.to "torch.Tensor.to"), except that optional arguments like `non_blocking` and `copy` should be passed as kwargs, not args, or they will not apply to the index tensors.

Note

If the `self.data` Tensor already has the correct `torch.dtype` and `torch.device`, then `self` is returned. Otherwise, returns a copy with the desired configuration.

`property unsorted_indices`

Alias for field number 3

# torch.nn.utils.rnn.pad_packed_sequence

`torch.nn.utils.rnn.pad_packed_sequence(sequence, batch_first=False, padding_value=0.0, total_length=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/utils/rnn.html#pad_packed_sequence)

Pads a packed batch of variable length sequences.

It is an inverse operation to [`pack_padded_sequence()`](torch.nn.utils.rnn.pack_padded_sequence#torch.nn.utils.rnn.pack_padded_sequence "torch.nn.utils.rnn.pack_padded_sequence").

The returned Tensor’s data will be of size `T x B x *`, where `T` is the length of the longest sequence and `B` is the batch size. If `batch_first` is True, the data will be transposed into `B x T x *` format.

#### Example

>>> from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence
>>> seq = torch.tensor([[1,2,0], [3,0,0], [4,5,6]])
>>> lens = [2, 1, 3]
>>> packed = pack_padded_sequence(seq, lens, batch_first=True, enforce_sorted=False)
>>> packed
PackedSequence(data=tensor([4, 1, 3, 5, 2, 6]), batch_sizes=tensor([3, 2, 1]),
               sorted_indices=tensor([2, 0, 1]), unsorted_indices=tensor([1, 2, 0]))
>>> seq_unpacked, lens_unpacked = pad_packed_sequence(packed, batch_first=True)
>>> seq_unpacked
tensor([[1, 2, 0],
        [3, 0, 0],
        [4, 5, 6]])
>>> lens_unpacked
tensor([2, 1, 3])

Note

`total_length` is useful to implement the `pack sequence -> recurrent network -> unpack sequence` pattern in a [`Module`](torch.nn.module#torch.nn.Module "torch.nn.Module") wrapped in [`DataParallel`](torch.nn.dataparallel#torch.nn.DataParallel "torch.nn.DataParallel"). See [this FAQ section](https://pytorch.org/docs/1.8.0/notes/faq.html#pack-rnn-unpack-with-data-parallelism) for details.

Parameters

* **sequence** ([PackedSequence](torch.nn.utils.rnn.packedsequence#torch.nn.utils.rnn.PackedSequence "torch.nn.utils.rnn.PackedSequence")) – batch to pad
* **batch_first** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – if `True`, the output will be in `B x T x *` format.
* **padding_value** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _,__optional_) – values for padded elements.
* **total_length** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – if not `None`, the output will be padded to have length `total_length`. This method will throw [`ValueError`](https://docs.python.org/3/library/exceptions.html#ValueError "\(in Python v3.9\)") if `total_length` is less than the max sequence length in `sequence`.
Returns

Tuple of Tensor containing the padded sequence, and a Tensor containing the list of lengths of each sequence in the batch. Batch elements will be re-ordered as they were ordered originally when the batch was passed to `pack_padded_sequence` or `pack_sequence`.

# torch.nn.utils.rnn.pad_sequence

`torch.nn.utils.rnn.pad_sequence(sequences, batch_first=False, padding_value=0.0)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/utils/rnn.html#pad_sequence)

Pad a list of variable length Tensors with `padding_value`.

`pad_sequence` stacks a list of Tensors along a new dimension, and pads them to equal length. For example, if the input is a list of sequences with size `L x *`, the output is of size `T x B x *` if `batch_first` is `False`, and `B x T x *` otherwise. `B` is the batch size, equal to the number of elements in `sequences`. `T` is the length of the longest sequence. `L` is the length of each sequence. `*` is any number of trailing dimensions, including none.

#### Example

>>> from torch.nn.utils.rnn import pad_sequence
>>> a = torch.ones(25, 300)
>>> b = torch.ones(22, 300)
>>> c = torch.ones(15, 300)
>>> pad_sequence([a, b, c]).size()
torch.Size([25, 3, 300])

Note

This function returns a Tensor of size `T x B x *` or `B x T x *` where `T` is the length of the longest sequence. This function assumes that the trailing dimensions and type of all the Tensors in `sequences` are the same.

Parameters

* **sequences** ([list](https://docs.python.org/3/library/stdtypes.html#list "\(in Python v3.9\)") _[_[Tensor](../tensors#torch.Tensor "torch.Tensor") _]_) – list of variable length sequences.
* **batch_first** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – output will be in `B x T x *` if True, or in `T x B x *` otherwise
* **padding_value** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _,__optional_) – value for padded elements. Default: 0.

Returns

Tensor of size `T x B x *` if `batch_first` is `False`. Tensor of size `B x T x *` otherwise

# torch.nn.utils.spectral_norm

`torch.nn.utils.spectral_norm(module, name='weight', n_power_iterations=1, eps=1e-12, dim=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/utils/spectral_norm.html#spectral_norm)

Applies spectral normalization to a parameter in the given module.

\mathbf{W}_{SN} = \dfrac{\mathbf{W}}{\sigma(\mathbf{W})}, \sigma(\mathbf{W}) = \max_{\mathbf{h}: \mathbf{h} \ne 0} \dfrac{\|\mathbf{W} \mathbf{h}\|_2}{\|\mathbf{h}\|_2}

Spectral normalization stabilizes the training of discriminators (critics) in Generative Adversarial Networks (GANs) by rescaling the weight tensor with the spectral norm σ of the weight matrix, calculated using the power iteration method. If the dimension of the weight tensor is greater than 2, it is reshaped to 2D in the power iteration method to get the spectral norm. This is implemented via a hook that calculates the spectral norm and rescales the weight before every `forward()` call.

See [Spectral Normalization for Generative Adversarial Networks](https://arxiv.org/abs/1802.05957).
Parameters

* **module** ([nn.Module](torch.nn.module#torch.nn.Module "torch.nn.Module")) – containing module
* **name** ([str](https://docs.python.org/3/library/stdtypes.html#str "\(in Python v3.9\)") _,__optional_) – name of weight parameter
* **n_power_iterations** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – number of power iterations to calculate spectral norm
* **eps** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _,__optional_) – epsilon for numerical stability in calculating norms
* **dim** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – dimension corresponding to number of outputs, the default is `0`, except for modules that are instances of ConvTranspose{1,2,3}d, when it is `1`

Returns

The original module with the spectral norm hook

Example:

>>> m = spectral_norm(nn.Linear(20, 40))
>>> m
Linear(in_features=20, out_features=40, bias=True)
>>> m.weight_u.size()
torch.Size([40])

# torch.nn.utils.vector_to_parameters

`torch.nn.utils.vector_to_parameters(vec, parameters)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/utils/convert_parameters.html#vector_to_parameters)

Convert one vector to the parameters.

Parameters

* **vec** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – a single vector representing the parameters of a model.
* **parameters** (_Iterable_ _[_[Tensor](../tensors#torch.Tensor "torch.Tensor") _]_) – an iterator of Tensors that are the parameters of a model.

# torch.nn.utils.weight_norm

`torch.nn.utils.weight_norm(module, name='weight', dim=0)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/utils/weight_norm.html#weight_norm)

Applies weight normalization to a parameter in the given module.

\mathbf{w} = g \dfrac{\mathbf{v}}{\|\mathbf{v}\|}

Weight normalization is a reparameterization that decouples the magnitude of a weight tensor from its direction. This replaces the parameter specified by `name` (e.g. `'weight'`) with two parameters: one specifying the magnitude (e.g. `'weight_g'`) and one specifying the direction (e.g. `'weight_v'`). Weight normalization is implemented via a hook that recomputes the weight tensor from the magnitude and direction before every `forward()` call.

By default, with `dim=0`, the norm is computed independently per output channel/plane. To compute a norm over the entire weight tensor, use `dim=None`.

See [https://arxiv.org/abs/1602.07868](https://arxiv.org/abs/1602.07868)

Parameters

* **module** ([Module](torch.nn.module#torch.nn.Module "torch.nn.Module")) – containing module
* **name** ([str](https://docs.python.org/3/library/stdtypes.html#str "\(in Python v3.9\)") _,__optional_) – name of weight parameter
* **dim** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – dimension over which to compute the norm

Returns

The original module with the weight norm hook

Example:

>>> m = weight_norm(nn.Linear(20, 40), name='weight')
>>> m
Linear(in_features=20, out_features=40, bias=True)
>>> m.weight_g.size()
torch.Size([40, 1])
>>> m.weight_v.size()
torch.Size([40, 20])

# ZeroPad2d

`class torch.nn.ZeroPad2d(padding)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/modules/padding.html#ZeroPad2d)

Pads the input tensor boundaries with zero.

For `N`-dimensional padding, use [`torch.nn.functional.pad()`](../nn.functional#torch.nn.functional.pad "torch.nn.functional.pad").
Parameters

**padding** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,_[tuple](https://docs.python.org/3/library/stdtypes.html#tuple "\(in Python v3.9\)")) – the size of the padding. If it is an `int`, uses the same padding in all boundaries. If a 4-`tuple`, uses (padding_left, padding_right, padding_top, padding_bottom)

Shape:

* Input: (N, C, H_in, W_in)
* Output: (N, C, H_out, W_out) where H_out = H_in + padding_top + padding_bottom and W_out = W_in + padding_left + padding_right

Examples:

>>> m = nn.ZeroPad2d(2)
>>> input = torch.randn(1, 1, 3, 3)
>>> input
tensor([[[[-0.1678, -0.4418,  1.9466],
          [ 0.9604, -0.4219, -0.5241],
          [-0.9162, -0.5436, -0.6446]]]])
>>> m(input)
tensor([[[[ 0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000],
          [ 0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000],
          [ 0.0000,  0.0000, -0.1678, -0.4418,  1.9466,  0.0000,  0.0000],
          [ 0.0000,  0.0000,  0.9604, -0.4219, -0.5241,  0.0000,  0.0000],
          [ 0.0000,  0.0000, -0.9162, -0.5436, -0.6446,  0.0000,  0.0000],
          [ 0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000],
          [ 0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000]]]])
>>> # using different paddings for different sides
>>> m = nn.ZeroPad2d((1, 1, 2, 0))
>>> m(input)
tensor([[[[ 0.0000,  0.0000,  0.0000,  0.0000,  0.0000],
          [ 0.0000,  0.0000,  0.0000,  0.0000,  0.0000],
          [ 0.0000, -0.1678, -0.4418,  1.9466,  0.0000],
          [ 0.0000,  0.9604, -0.4219, -0.5241,  0.0000],
          [ 0.0000, -0.9162, -0.5436, -0.6446,  0.0000]]]])

# no_grad

`class torch.no_grad` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/autograd/grad_mode.html#no_grad)

Context-manager that disables gradient calculation.

Disabling gradient calculation is useful for inference, when you are sure that you will not call [`Tensor.backward()`](../autograd#torch.Tensor.backward "torch.Tensor.backward"). It will reduce memory consumption for computations that would otherwise have `requires_grad=True`.

In this mode, the result of every computation will have `requires_grad=False`, even when the inputs have `requires_grad=True`.

This context manager is thread local; it will not affect computation in other threads.

Also functions as a decorator. (Make sure to instantiate with parentheses.)

Example:

>>> x = torch.tensor([1], requires_grad=True)
>>> with torch.no_grad():
...     y = x * 2
>>> y.requires_grad
False
>>> @torch.no_grad()
... def doubler(x):
...     return x * 2
>>> z = doubler(x)
>>> z.requires_grad
False

# torch.nonzero

`torch.nonzero(input, *, out=None, as_tuple=False) → LongTensor or tuple of LongTensors`

Note

`torch.nonzero(..., as_tuple=False)` (default) returns a 2-D tensor where each row is the index for a nonzero value.

`torch.nonzero(..., as_tuple=True)` returns a tuple of 1-D index tensors, allowing for advanced indexing, so `x[x.nonzero(as_tuple=True)]` gives all nonzero values of tensor `x`. Of the returned tuple, each index tensor contains nonzero indices for a certain dimension.

See below for more details on the two behaviors.

When `input` is on CUDA, `torch.nonzero()` causes host-device synchronization.

**When `as_tuple` is `False` (default):**

Returns a tensor containing the indices of all non-zero elements of `input`. Each row in the result contains the indices of a non-zero element in `input`.
The result is sorted lexicographically, with the last index changing the fastest (C-style).

If `input` has n dimensions, then the resulting indices tensor `out` is of size (z × n), where z is the total number of non-zero elements in the `input` tensor.

**When `as_tuple` is `True`:**

Returns a tuple of 1-D tensors, one for each dimension in `input`, each containing the indices (in that dimension) of all non-zero elements of `input`.

If `input` has n dimensions, then the resulting tuple contains n tensors of size z, where z is the total number of non-zero elements in the `input` tensor.

As a special case, when `input` has zero dimensions and a nonzero scalar value, it is treated as a one-dimensional tensor with one element.

Parameters

**input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor.

Keyword Arguments

**out** (_LongTensor_ _,__optional_) – the output tensor containing indices

Returns

If `as_tuple` is `False`, the output tensor containing indices. If `as_tuple` is `True`, one 1-D tensor for each dimension, containing the indices of each nonzero element along that dimension.

Return type

LongTensor or tuple of LongTensor

Example:

>>> torch.nonzero(torch.tensor([1, 1, 1, 0, 1]))
tensor([[ 0],
        [ 1],
        [ 2],
        [ 4]])
>>> torch.nonzero(torch.tensor([[0.6, 0.0, 0.0, 0.0],
...                             [0.0, 0.4, 0.0, 0.0],
...                             [0.0, 0.0, 1.2, 0.0],
...                             [0.0, 0.0, 0.0,-0.4]]))
tensor([[ 0,  0],
        [ 1,  1],
        [ 2,  2],
        [ 3,  3]])
>>> torch.nonzero(torch.tensor([1, 1, 1, 0, 1]), as_tuple=True)
(tensor([0, 1, 2, 4]),)
>>> torch.nonzero(torch.tensor([[0.6, 0.0, 0.0, 0.0],
...                             [0.0, 0.4, 0.0, 0.0],
...                             [0.0, 0.0, 1.2, 0.0],
...                             [0.0, 0.0, 0.0,-0.4]]), as_tuple=True)
(tensor([0, 1, 2, 3]), tensor([0, 1, 2, 3]))
>>> torch.nonzero(torch.tensor(5), as_tuple=True)
(tensor([0]),)

# torch.norm

`torch.norm(input, p='fro', dim=None, keepdim=False, out=None, dtype=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/functional.html#norm)

Returns the matrix norm or vector norm of a given tensor.

Warning

torch.norm is deprecated and may be removed in a future PyTorch release. Use [`torch.linalg.norm()`](../linalg#torch.linalg.norm "torch.linalg.norm") instead, but note that [`torch.linalg.norm()`](../linalg#torch.linalg.norm "torch.linalg.norm") has a different signature and slightly different behavior that is more consistent with NumPy’s numpy.linalg.norm.

Parameters

* **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – The input tensor. Its data type must be either a floating point or complex type. For complex inputs, the norm is calculated using the absolute value of each element. If the input is complex and neither `dtype` nor `out` is specified, the result’s data type will be the corresponding floating point type (e.g. float if `input` is complexfloat).
* **p** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,_[float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _,__inf_ _,__-inf_ _,__'fro'__,__'nuc'__,__optional_) – the order of norm. Default: `'fro'` The following norms can be calculated:

ord | matrix norm | vector norm
---|---|---
'fro' | Frobenius norm | –
'nuc' | nuclear norm | –
Number | – | sum(abs(x)**ord)**(1./ord)

The vector norm can be calculated across any number of dimensions. The corresponding dimensions of `input` are flattened into one dimension, and the norm is calculated on the flattened dimension.
Frobenius norm produces the same result as `p=2` in all cases except when `dim` is a list of three or more dims, in which case Frobenius norm throws an error. Nuclear norm can only be calculated across exactly two dimensions.

* **dim** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__tuple of ints_ _,__list of ints_ _,__optional_) – Specifies which dimension or dimensions of `input` to calculate the norm across. If `dim` is `None`, the norm will be calculated across all dimensions of `input`. If the norm type indicated by `p` does not support the specified number of dimensions, an error will occur.
* **keepdim** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – whether the output tensors have `dim` retained or not. Ignored if `dim` = `None` and `out` = `None`. Default: `False`
* **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. Ignored if `dim` = `None` and `out` = `None`.
* **dtype** ([`torch.dtype`](../tensor_attributes#torch.torch.dtype "torch.torch.dtype"), optional) – the desired data type of returned tensor. If specified, the input tensor is cast to `dtype` while performing the operation. Default: None.

Note

Even though `p='fro'` supports any number of dimensions, the true mathematical definition of Frobenius norm only applies to tensors with exactly two dimensions. [`torch.linalg.norm()`](../linalg#torch.linalg.norm "torch.linalg.norm") with `ord='fro'` aligns with the mathematical definition, since it can only be applied across exactly two dimensions.

Example:

>>> import torch
>>> a = torch.arange(9, dtype= torch.float) - 4
>>> b = a.reshape((3, 3))
>>> torch.norm(a)
tensor(7.7460)
>>> torch.norm(b)
tensor(7.7460)
>>> torch.norm(a, float('inf'))
tensor(4.)
>>> torch.norm(b, float('inf'))
tensor(4.)
>>> c = torch.tensor([[ 1, 2, 3],[-1, 1, 4]] , dtype= torch.float)
>>> torch.norm(c, dim=0)
tensor([1.4142, 2.2361, 5.0000])
>>> torch.norm(c, dim=1)
tensor([3.7417, 4.2426])
>>> torch.norm(c, p=1, dim=1)
tensor([6., 6.])
>>> d = torch.arange(8, dtype= torch.float).reshape(2,2,2)
>>> torch.norm(d, dim=(1,2))
tensor([ 3.7417, 11.2250])
>>> torch.norm(d[0, :, :]), torch.norm(d[1, :, :])
(tensor(3.7417), tensor(11.2250))

# torch.normal

`torch.normal(mean, std, *, generator=None, out=None) → Tensor`

Returns a tensor of random numbers drawn from separate normal distributions whose mean and standard deviation are given.

The [`mean`](torch.mean#torch.mean "torch.mean") is a tensor with the mean of each output element’s normal distribution.

The [`std`](torch.std#torch.std "torch.std") is a tensor with the standard deviation of each output element’s normal distribution.

The shapes of [`mean`](torch.mean#torch.mean "torch.mean") and [`std`](torch.std#torch.std "torch.std") don’t need to match, but the total number of elements in each tensor needs to be the same.
Note When the shapes do not match, the shape of [`mean`](torch.mean#torch.mean "torch.mean") is used as the shape for the returned output tensor Parameters * **mean** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the tensor of per-element means * **std** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the tensor of per-element standard deviations Keyword Arguments * **generator** ([`torch.Generator`](torch.generator#torch.Generator "torch.Generator"), optional) – a pseudorandom number generator for sampling * **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. Example: >>> torch.normal(mean=torch.arange(1., 11.), std=torch.arange(1, 0, -0.1)) tensor([ 1.0425, 3.5672, 2.7969, 4.2925, 4.7229, 6.2134, 8.0505, 8.1408, 9.0563, 10.0566]) `torch.normal(mean=0.0, std, *, out=None) → Tensor` Similar to the function above, but the means are shared among all drawn elements. Parameters * **mean** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _,__optional_) – the mean for all distributions * **std** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the tensor of per-element standard deviations Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. Example: >>> torch.normal(mean=0.5, std=torch.arange(1., 6.)) tensor([-1.2793, -1.0732, -2.0687, 5.1177, -1.2303]) `torch.normal(mean, std=1.0, *, out=None) → Tensor` Similar to the function above, but the standard-deviations are shared among all drawn elements. Parameters * **mean** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the tensor of per-element means * **std** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _,__optional_) – the standard deviation for all distributions Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor Example: >>> torch.normal(mean=torch.arange(1., 6.)) tensor([ 1.1552, 2.6148, 2.6535, 5.8318, 4.2361]) `torch.normal(mean, std, size, *, out=None) → Tensor` Similar to the function above, but the means and standard deviations are shared among all drawn elements. The resulting tensor has size given by `size`. Parameters * **mean** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)")) – the mean for all distributions * **std** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)")) – the standard deviation for all distributions * **size** (_int..._) – a sequence of integers defining the shape of the output tensor. Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. Example: >>> torch.normal(2, 3, size=(1, 4)) tensor([[-1.3987, -1.9544, 3.6048, 0.7909]]) # torch.not_equal `torch.not_equal(input, other, *, out=None) → Tensor` Alias for [`torch.ne()`](torch.ne#torch.ne "torch.ne"). # torch.numel `torch.numel(input) → int` Returns the total number of elements in the `input` tensor. Parameters **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. Example: >>> a = torch.randn(1, 2, 3, 4, 5) >>> torch.numel(a) 120 >>> a = torch.zeros(4,4) >>> torch.numel(a) 16 # torch.ones `torch.ones(*size, *, out=None, dtype=None, layout=torch.strided, device=None, requires_grad=False) → Tensor` Returns a tensor filled with the scalar value `1`, with the shape defined by the variable argument `size`. 
Parameters **size** (_int..._) – a sequence of integers defining the shape of the output tensor. Can be a variable number of arguments or a collection like a list or tuple. Keyword Arguments * **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. * **dtype** ([`torch.dtype`](../tensor_attributes#torch.torch.dtype "torch.torch.dtype"), optional) – the desired data type of returned tensor. Default: if `None`, uses a global default (see [`torch.set_default_tensor_type()`](torch.set_default_tensor_type#torch.set_default_tensor_type "torch.set_default_tensor_type")). * **layout** ([`torch.layout`](../tensor_attributes#torch.torch.layout "torch.torch.layout"), optional) – the desired layout of returned Tensor. Default: `torch.strided`. * **device** ([`torch.device`](../tensor_attributes#torch.torch.device "torch.torch.device"), optional) – the desired device of returned tensor. Default: if `None`, uses the current device for the default tensor type (see [`torch.set_default_tensor_type()`](torch.set_default_tensor_type#torch.set_default_tensor_type "torch.set_default_tensor_type")). `device` will be the CPU for CPU tensor types and the current CUDA device for CUDA tensor types. * **requires_grad** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – If autograd should record operations on the returned tensor. Default: `False`. Example: >>> torch.ones(2, 3) tensor([[ 1., 1., 1.], [ 1., 1., 1.]]) >>> torch.ones(5) tensor([ 1., 1., 1., 1., 1.]) # torch.ones_like `torch.ones_like(input, *, dtype=None, layout=None, device=None, requires_grad=False, memory_format=torch.preserve_format) → Tensor` Returns a tensor filled with the scalar value `1`, with the same size as `input`. `torch.ones_like(input)` is equivalent to `torch.ones(input.size(), dtype=input.dtype, layout=input.layout, device=input.device)`. Warning As of 0.4, this function does not support an `out` keyword. As an alternative, the old `torch.ones_like(input, out=output)` is equivalent to `torch.ones(input.size(), out=output)`. Parameters **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the size of `input` will determine size of the output tensor. Keyword Arguments * **dtype** ([`torch.dtype`](../tensor_attributes#torch.torch.dtype "torch.torch.dtype"), optional) – the desired data type of returned Tensor. Default: if `None`, defaults to the dtype of `input`. * **layout** ([`torch.layout`](../tensor_attributes#torch.torch.layout "torch.torch.layout"), optional) – the desired layout of returned tensor. Default: if `None`, defaults to the layout of `input`. * **device** ([`torch.device`](../tensor_attributes#torch.torch.device "torch.torch.device"), optional) – the desired device of returned tensor. Default: if `None`, defaults to the device of `input`. * **requires_grad** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – If autograd should record operations on the returned tensor. Default: `False`. * **memory_format** ([`torch.memory_format`](../tensor_attributes#torch.torch.memory_format "torch.torch.memory_format"), optional) – the desired memory format of returned Tensor. Default: `torch.preserve_format`. 
Example: >>> input = torch.empty(2, 3) >>> torch.ones_like(input) tensor([[ 1., 1., 1.], [ 1., 1., 1.]]) # torch.orgqr `torch.orgqr(input, input2) → Tensor` Computes the orthogonal matrix `Q` of a QR factorization, from the `(input, input2)` tuple returned by [`torch.geqrf()`](torch.geqrf#torch.geqrf "torch.geqrf"). This directly calls the underlying LAPACK function `?orgqr`. See [LAPACK documentation for orgqr](https://software.intel.com/en-us/mkl-developer- reference-c-orgqr) for further details. Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the `a` from [`torch.geqrf()`](torch.geqrf#torch.geqrf "torch.geqrf"). * **input2** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the `tau` from [`torch.geqrf()`](torch.geqrf#torch.geqrf "torch.geqrf"). # torch.ormqr `torch.ormqr(input, input2, input3, left=True, transpose=False) → Tensor` Multiplies `mat` (given by `input3`) by the orthogonal `Q` matrix of the QR factorization formed by [`torch.geqrf()`](torch.geqrf#torch.geqrf "torch.geqrf") that is represented by `(a, tau)` (given by (`input`, `input2`)). This directly calls the underlying LAPACK function `?ormqr`. See [LAPACK documentation for ormqr](https://software.intel.com/en-us/mkl-developer- reference-c-ormqr) for further details. Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the `a` from [`torch.geqrf()`](torch.geqrf#torch.geqrf "torch.geqrf"). * **input2** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the `tau` from [`torch.geqrf()`](torch.geqrf#torch.geqrf "torch.geqrf"). * **input3** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the matrix to be multiplied. # torch.outer `torch.outer(input, vec2, *, out=None) → Tensor` Outer product of `input` and `vec2`. If `input` is a vector of size nn and `vec2` is a vector of size mm , then `out` must be a matrix of size (n×m)(n \times m) . Note This function does not [broadcast](https://pytorch.org/docs/1.8.0/notes/broadcasting.html#broadcasting- semantics). Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – 1-D input vector * **vec2** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – 1-D input vector Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – optional output matrix Example: >>> v1 = torch.arange(1., 5.) >>> v2 = torch.arange(1., 4.) >>> torch.outer(v1, v2) tensor([[ 1., 2., 3.], [ 2., 4., 6.], [ 3., 6., 9.], [ 4., 8., 12.]]) # torch.pca_lowrank `torch.pca_lowrank(A, q=None, center=True, niter=2)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/_lowrank.html#pca_lowrank) Performs linear Principal Component Analysis (PCA) on a low-rank matrix, batches of such matrices, or sparse matrix. This function returns a namedtuple `(U, S, V)` which is the nearly optimal approximation of a singular value decomposition of a centered matrix AA such that A=Udiag(S)VTA = U diag(S) V^T . Note The relation of `(U, S, V)` to PCA is as follows: * AA is a data matrix with `m` samples and `n` features * the VV columns represent the principal directions * S∗∗2/(m−1)S ** 2 / (m - 1) contains the eigenvalues of ATA/(m−1)A^T A / (m - 1) which is the covariance of `A` when `center=True` is provided. 
* `matmul(A, V[:, :k])` projects data to the first k principal components Note Different from the standard SVD, the size of returned matrices depend on the specified rank and q values as follows: * UU is m x q matrix * SS is q-vector * VV is n x q matrix Note To obtain repeatable results, reset the seed for the pseudorandom number generator Parameters * **A** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor of size (∗,m,n)(*, m, n) * **q** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – a slightly overestimated rank of AA . By default, `q = min(6, m, n)`. * **center** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – if True, center the input tensor, otherwise, assume that the input is centered. * **niter** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – the number of subspace iterations to conduct; niter must be a nonnegative integer, and defaults to 2. References: - Nathan Halko, Per-Gunnar Martinsson, and Joel Tropp, Finding structure with randomness: probabilistic algorithms for constructing approximate matrix decompositions, arXiv:0909.4061 [math.NA; math.PR], 2009 (available at `arXiv `_). # torch.pinverse `torch.pinverse(input, rcond=1e-15) → Tensor` Calculates the pseudo-inverse (also known as the Moore-Penrose inverse) of a 2D tensor. Please look at [Moore-Penrose inverse](https://en.wikipedia.org/wiki/Moore%E2%80%93Penrose_inverse) for more details Note `torch.pinverse()` is deprecated. Please use [`torch.linalg.pinv()`](../linalg#torch.linalg.pinv "torch.linalg.pinv") instead which includes new parameters `hermitian` and `out`. Note This method is implemented using the Singular Value Decomposition. Note The pseudo-inverse is not necessarily a continuous function in the elements of the matrix [[1]](https://epubs.siam.org/doi/10.1137/0117004). Therefore, derivatives are not always existent, and exist for a constant rank only [[2]](https://www.jstor.org/stable/2156365). However, this method is backprop- able due to the implementation by using SVD results, and could be unstable. Double-backward will also be unstable due to the usage of SVD internally. See [`svd()`](torch.svd#torch.svd "torch.svd") for more details. Note Supports real and complex inputs. Batched version for complex inputs is only supported on the CPU. Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – The input tensor of size (∗,m,n)(*, m, n) where ∗* is zero or more batch dimensions. * **rcond** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _,__optional_) – A floating point value to determine the cutoff for small singular values. Default: `1e-15`. 
Returns The pseudo-inverse of `input` of dimensions (∗,n,m)(*, n, m) Example: >>> input = torch.randn(3, 5) >>> input tensor([[ 0.5495, 0.0979, -1.4092, -0.1128, 0.4132], [-1.1143, -0.3662, 0.3042, 1.6374, -0.9294], [-0.3269, -0.5745, -0.0382, -0.5922, -0.6759]]) >>> torch.pinverse(input) tensor([[ 0.0600, -0.1933, -0.2090], [-0.0903, -0.0817, -0.4752], [-0.7124, -0.1631, -0.2272], [ 0.1356, 0.3933, -0.5023], [-0.0308, -0.1725, -0.5216]]) >>> # Batched pinverse example >>> a = torch.randn(2,6,3) >>> b = torch.pinverse(a) >>> torch.matmul(b, a) tensor([[[ 1.0000e+00, 1.6391e-07, -1.1548e-07], [ 8.3121e-08, 1.0000e+00, -2.7567e-07], [ 3.5390e-08, 1.4901e-08, 1.0000e+00]], [[ 1.0000e+00, -8.9407e-08, 2.9802e-08], [-2.2352e-07, 1.0000e+00, 1.1921e-07], [ 0.0000e+00, 8.9407e-08, 1.0000e+00]]]) # torch.poisson `torch.poisson(input, generator=None) → Tensor` Returns a tensor of the same size as `input` with each element sampled from a Poisson distribution with rate parameter given by the corresponding element in `input` i.e., outi∼Poisson(inputi)\text{out}_i \sim \text{Poisson}(\text{input}_i) Parameters **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor containing the rates of the Poisson distribution Keyword Arguments **generator** ([`torch.Generator`](torch.generator#torch.Generator "torch.Generator"), optional) – a pseudorandom number generator for sampling Example: >>> rates = torch.rand(4, 4) * 5 # rate parameter between 0 and 5 >>> torch.poisson(rates) tensor([[9., 1., 3., 5.], [8., 6., 6., 0.], [0., 4., 5., 3.], [2., 1., 4., 2.]]) # torch.polar `torch.polar(abs, angle, *, out=None) → Tensor` Constructs a complex tensor whose elements are Cartesian coordinates corresponding to the polar coordinates with absolute value [`abs`](torch.abs#torch.abs "torch.abs") and angle [`angle`](torch.angle#torch.angle "torch.angle"). out=abs⋅cos⁡(angle)+abs⋅sin⁡(angle)⋅j\text{out} = \text{abs} \cdot \cos(\text{angle}) + \text{abs} \cdot \sin(\text{angle}) \cdot j Parameters * **abs** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – The absolute value the complex tensor. Must be float or double. * **angle** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – The angle of the complex tensor. Must be same dtype as [`abs`](torch.abs#torch.abs "torch.abs"). Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – If the inputs are `torch.float32`, must be `torch.complex64`. If the inputs are `torch.float64`, must be `torch.complex128`. Example:: >>> import numpy as np >>> abs = torch.tensor([1, 2], dtype=torch.float64) >>> angle = torch.tensor([np.pi / 2, 5 * np.pi / 4], dtype=torch.float64) >>> z = torch.polar(abs, angle) >>> z tensor([(0.0000+1.0000j), (-1.4142-1.4142j)], dtype=torch.complex128) # torch.polygamma `torch.polygamma(n, input, *, out=None) → Tensor` Computes the nthn^{th} derivative of the digamma function on `input`. n≥0n \geq 0 is called the order of the polygamma function. ψ(n)(x)=d(n)dx(n)ψ(x)\psi^{(n)}(x) = \frac{d^{(n)}}{dx^{(n)}} \psi(x) Note This function is implemented only for nonnegative integers n≥0n \geq 0 . Parameters * **n** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – the order of the polygamma function * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. 
Example:: >>> a = torch.tensor([1, 0.5]) >>> torch.polygamma(1, a) tensor([1.64493, 4.9348]) >>> torch.polygamma(2, a) tensor([ -2.4041, -16.8288]) >>> torch.polygamma(3, a) tensor([ 6.4939, 97.4091]) >>> torch.polygamma(4, a) tensor([ -24.8863, -771.4742]) # torch.pow `torch.pow(input, exponent, *, out=None) → Tensor` Takes the power of each element in `input` with `exponent` and returns a tensor with the result. `exponent` can be either a single `float` number or a `Tensor` with the same number of elements as `input`. When `exponent` is a scalar value, the operation applied is: outi=xiexponent\text{out}_i = x_i ^ \text{exponent} When `exponent` is a tensor, the operation applied is: outi=xiexponenti\text{out}_i = x_i ^ {\text{exponent}_i} When `exponent` is a tensor, the shapes of `input` and `exponent` must be [broadcastable](https://pytorch.org/docs/1.8.0/notes/broadcasting.html#broadcasting- semantics). Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. * **exponent** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _or_ _tensor_) – the exponent value Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. Example: >>> a = torch.randn(4) >>> a tensor([ 0.4331, 1.2475, 0.6834, -0.2791]) >>> torch.pow(a, 2) tensor([ 0.1875, 1.5561, 0.4670, 0.0779]) >>> exp = torch.arange(1., 5.) >>> a = torch.arange(1., 5.) >>> a tensor([ 1., 2., 3., 4.]) >>> exp tensor([ 1., 2., 3., 4.]) >>> torch.pow(a, exp) tensor([ 1., 4., 27., 256.]) `torch.pow(self, exponent, *, out=None) → Tensor` `self` is a scalar `float` value, and `exponent` is a tensor. The returned tensor `out` is of the same shape as `exponent` The operation applied is: outi=selfexponenti\text{out}_i = \text{self} ^ {\text{exponent}_i} Parameters * **self** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)")) – the scalar base value for the power operation * **exponent** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the exponent tensor Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. Example: >>> exp = torch.arange(1., 5.) >>> base = 2 >>> torch.pow(base, exp) tensor([ 2., 4., 8., 16.]) # torch.prod `torch.prod(input, *, dtype=None) → Tensor` Returns the product of all elements in the `input` tensor. Parameters **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. Keyword Arguments **dtype** ([`torch.dtype`](../tensor_attributes#torch.torch.dtype "torch.torch.dtype"), optional) – the desired data type of returned tensor. If specified, the input tensor is casted to `dtype` before the operation is performed. This is useful for preventing data type overflows. Default: None. Example: >>> a = torch.randn(1, 3) >>> a tensor([[-0.8020, 0.5428, -1.5854]]) >>> torch.prod(a) tensor(0.6902) `torch.prod(input, dim, keepdim=False, *, dtype=None) → Tensor` Returns the product of each row of the `input` tensor in the given dimension `dim`. If `keepdim` is `True`, the output tensor is of the same size as `input` except in the dimension `dim` where it is of size 1. Otherwise, `dim` is squeezed (see [`torch.squeeze()`](torch.squeeze#torch.squeeze "torch.squeeze")), resulting in the output tensor having 1 fewer dimension than `input`. Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. 
* **dim** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – the dimension to reduce.
* **keepdim** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – whether the output tensor has `dim` retained or not.

Keyword Arguments

**dtype** ([`torch.dtype`](../tensor_attributes#torch.torch.dtype "torch.torch.dtype"), optional) – the desired data type of returned tensor. If specified, the input tensor is cast to `dtype` before the operation is performed. This is useful for preventing data type overflows. Default: None.

Example:

>>> a = torch.randn(4, 2)
>>> a
tensor([[ 0.5261, -0.3837],
        [ 1.1857, -0.2498],
        [-1.1646,  0.0705],
        [ 1.1131, -1.0629]])
>>> torch.prod(a, 1)
tensor([-0.2018, -0.2962, -0.0821, -1.1831])

# torch.promote_types

`torch.promote_types(type1, type2) → dtype`

Returns the [`torch.dtype`](../tensor_attributes#torch.torch.dtype "torch.torch.dtype") with the smallest size and scalar kind that is not smaller nor of lower kind than either `type1` or `type2`. See type promotion [documentation](../tensor_attributes#type-promotion-doc) for more information on the type promotion logic.

Parameters

* **type1** ([`torch.dtype`](../tensor_attributes#torch.torch.dtype "torch.torch.dtype")) –
* **type2** ([`torch.dtype`](../tensor_attributes#torch.torch.dtype "torch.torch.dtype")) –

Example:

>>> torch.promote_types(torch.int32, torch.float32)
torch.float32
>>> torch.promote_types(torch.uint8, torch.long)
torch.long

# torch.qr

`torch.qr(input, some=True, *, out=None) -> (Tensor, Tensor)`

Computes the QR decomposition of a matrix or a batch of matrices `input`, and returns a namedtuple (Q, R) of tensors such that input = QR, with Q being an orthogonal matrix or batch of orthogonal matrices and R being an upper triangular matrix or batch of upper triangular matrices.

If `some` is `True`, then this function returns the thin (reduced) QR factorization. Otherwise, if `some` is `False`, this function returns the complete QR factorization.

Warning

`torch.qr` is deprecated. Please use [`torch.linalg.qr()`](../linalg#torch.linalg.qr "torch.linalg.qr") instead.

**Differences with** `torch.linalg.qr`:

* `torch.linalg.qr` takes a string parameter `mode` instead of `some`:
  * `some=True` is equivalent to `mode='reduced'`: both are the default
  * `some=False` is equivalent to `mode='complete'`.

Warning

If you plan to backpropagate through QR, note that the current backward implementation is only well-defined when the first min(input.size(-1), input.size(-2)) columns of `input` are linearly independent. This behavior will probably change once QR supports pivoting.

Note

This function uses LAPACK for CPU inputs and MAGMA for CUDA inputs, and may produce different (valid) decompositions on different device types or different platforms.

Parameters

* **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor of size (*, m, n) where `*` is zero or more batch dimensions consisting of matrices of dimension m × n.
* **some** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – Set to `True` for reduced QR decomposition and `False` for complete QR decomposition.
If `k = min(m, n)` then: * `some=True` : returns `(Q, R)` with dimensions (m, k), (k, n) (default) * `'some=False'`: returns `(Q, R)` with dimensions (m, m), (m, n) Keyword Arguments **out** ([tuple](https://docs.python.org/3/library/stdtypes.html#tuple "\(in Python v3.9\)") _,__optional_) – tuple of `Q` and `R` tensors. The dimensions of `Q` and `R` are detailed in the description of `some` above. Example: >>> a = torch.tensor([[12., -51, 4], [6, 167, -68], [-4, 24, -41]]) >>> q, r = torch.qr(a) >>> q tensor([[-0.8571, 0.3943, 0.3314], [-0.4286, -0.9029, -0.0343], [ 0.2857, -0.1714, 0.9429]]) >>> r tensor([[ -14.0000, -21.0000, 14.0000], [ 0.0000, -175.0000, 70.0000], [ 0.0000, 0.0000, -35.0000]]) >>> torch.mm(q, r).round() tensor([[ 12., -51., 4.], [ 6., 167., -68.], [ -4., 24., -41.]]) >>> torch.mm(q.t(), q).round() tensor([[ 1., 0., 0.], [ 0., 1., -0.], [ 0., -0., 1.]]) >>> a = torch.randn(3, 4, 5) >>> q, r = torch.qr(a, some=False) >>> torch.allclose(torch.matmul(q, r), a) True >>> torch.allclose(torch.matmul(q.transpose(-2, -1), q), torch.eye(5)) True # torch.quantile `torch.quantile(input, q) → Tensor` Returns the q-th quantiles of all elements in the `input` tensor, doing a linear interpolation when the q-th quantile lies between two data points. Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. * **q** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _or_[Tensor](../tensors#torch.Tensor "torch.Tensor")) – a scalar or 1D tensor of quantile values in the range [0, 1] Example: >>> a = torch.randn(1, 3) >>> a tensor([[ 0.0700, -0.5446, 0.9214]]) >>> q = torch.tensor([0, 0.5, 1]) >>> torch.quantile(a, q) tensor([-0.5446, 0.0700, 0.9214]) `torch.quantile(input, q, dim=None, keepdim=False, *, out=None) → Tensor` Returns the q-th quantiles of each row of the `input` tensor along the dimension `dim`, doing a linear interpolation when the q-th quantile lies between two data points. By default, `dim` is `None` resulting in the `input` tensor being flattened before computation. If `keepdim` is `True`, the output dimensions are of the same size as `input` except in the dimensions being reduced (`dim` or all if `dim` is `None`) where they have size 1. Otherwise, the dimensions being reduced are squeezed (see [`torch.squeeze()`](torch.squeeze#torch.squeeze "torch.squeeze")). If `q` is a 1D tensor, an extra dimension is prepended to the output tensor with the same size as `q` which represents the quantiles. Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. * **q** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _or_[Tensor](../tensors#torch.Tensor "torch.Tensor")) – a scalar or 1D tensor of quantile values in the range [0, 1] * **dim** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – the dimension to reduce. * **keepdim** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – whether the output tensor has `dim` retained or not. Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. 
Example: >>> a = torch.randn(2, 3) >>> a tensor([[ 0.0795, -1.2117, 0.9765], [ 1.1707, 0.6706, 0.4884]]) >>> q = torch.tensor([0.25, 0.5, 0.75]) >>> torch.quantile(a, q, dim=1, keepdim=True) tensor([[[-0.5661], [ 0.5795]], [[ 0.0795], [ 0.6706]], [[ 0.5280], [ 0.9206]]]) >>> torch.quantile(a, q, dim=1, keepdim=True).shape torch.Size([3, 2, 1]) # torch.quantize_per_channel `torch.quantize_per_channel(input, scales, zero_points, axis, dtype) → Tensor` Converts a float tensor to a per-channel quantized tensor with given scales and zero points. Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – float tensor to quantize * **scales** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – float 1D tensor of scales to use, size should match `input.size(axis)` * **zero_points** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – integer 1D tensor of offset to use, size should match `input.size(axis)` * **axis** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – dimension on which apply per-channel quantization * **dtype** ([`torch.dtype`](../tensor_attributes#torch.torch.dtype "torch.torch.dtype")) – the desired data type of returned tensor. Has to be one of the quantized dtypes: `torch.quint8`, `torch.qint8`, `torch.qint32` Returns A newly quantized tensor Return type [Tensor](../tensors#torch.Tensor "torch.Tensor") Example: >>> x = torch.tensor([[-1.0, 0.0], [1.0, 2.0]]) >>> torch.quantize_per_channel(x, torch.tensor([0.1, 0.01]), torch.tensor([10, 0]), 0, torch.quint8) tensor([[-1., 0.], [ 1., 2.]], size=(2, 2), dtype=torch.quint8, quantization_scheme=torch.per_channel_affine, scale=tensor([0.1000, 0.0100], dtype=torch.float64), zero_point=tensor([10, 0]), axis=0) >>> torch.quantize_per_channel(x, torch.tensor([0.1, 0.01]), torch.tensor([10, 0]), 0, torch.quint8).int_repr() tensor([[ 0, 10], [100, 200]], dtype=torch.uint8) # torch.quantize_per_tensor `torch.quantize_per_tensor(input, scale, zero_point, dtype) → Tensor` Converts a float tensor to a quantized tensor with given scale and zero point. Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – float tensor to quantize * **scale** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)")) – scale to apply in quantization formula * **zero_point** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – offset in integer value that maps to float zero * **dtype** ([`torch.dtype`](../tensor_attributes#torch.torch.dtype "torch.torch.dtype")) – the desired data type of returned tensor. Has to be one of the quantized dtypes: `torch.quint8`, `torch.qint8`, `torch.qint32` Returns A newly quantized tensor Return type [Tensor](../tensors#torch.Tensor "torch.Tensor") Example: >>> torch.quantize_per_tensor(torch.tensor([-1.0, 0.0, 1.0, 2.0]), 0.1, 10, torch.quint8) tensor([-1., 0., 1., 2.], size=(4,), dtype=torch.quint8, quantization_scheme=torch.per_tensor_affine, scale=0.1, zero_point=10) >>> torch.quantize_per_tensor(torch.tensor([-1.0, 0.0, 1.0, 2.0]), 0.1, 10, torch.quint8).int_repr() tensor([ 0, 10, 20, 30], dtype=torch.uint8) # SobolEngine `class torch.quasirandom.SobolEngine(dimension, scramble=False, seed=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/quasirandom.html#SobolEngine) The `torch.quasirandom.SobolEngine` is an engine for generating (scrambled) Sobol sequences. Sobol sequences are an example of low discrepancy quasi- random sequences. 
This implementation of an engine for Sobol sequences is capable of sampling sequences up to a maximum dimension of 21201. It uses direction numbers obtained using the search criterion D(6) up to dimension 21201, which is the recommended choice by the authors. #### References * Art B. Owen. Scrambling Sobol and Niederreiter-Xing points. Journal of Complexity, 14(4):466-489, December 1998. * I. M. Sobol. The distribution of points in a cube and the accurate evaluation of integrals. Zh. Vychisl. Mat. i Mat. Phys., 7:784-802, 1967. Parameters * **dimension** (_Int_) – The dimensionality of the sequence to be drawn * **scramble** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – Setting this to `True` will produce scrambled Sobol sequences. Scrambling is capable of producing better Sobol sequences. Default: `False`. * **seed** (_Int_ _,__optional_) – This is the seed for the scrambling. The seed of the random number generator is set to this, if specified. Otherwise, it uses a random seed. Default: `None` Examples: >>> soboleng = torch.quasirandom.SobolEngine(dimension=5) >>> soboleng.draw(3) tensor([[0.5000, 0.5000, 0.5000, 0.5000, 0.5000], [0.7500, 0.2500, 0.7500, 0.2500, 0.7500], [0.2500, 0.7500, 0.2500, 0.7500, 0.2500]]) `draw(n=1, out=None, dtype=torch.float32)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/quasirandom.html#SobolEngine.draw) Function to draw a sequence of `n` points from a Sobol sequence. Note that the samples are dependent on the previous samples. The size of the result is (n, dimension). Parameters * **n** (_Int_ _,__optional_) – The length of sequence of points to draw. Default: 1 * **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – The output tensor * **dtype** (`torch.dtype`, optional) – the desired data type of the returned tensor. Default: `torch.float32` `draw_base2(m, out=None, dtype=torch.float32)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/quasirandom.html#SobolEngine.draw_base2) Function to draw a sequence of `2**m` points from a Sobol sequence. Note that the samples are dependent on the previous samples. The size of the result is (2**m, dimension). Parameters * **m** (_Int_) – The (base2) exponent of the number of points to draw. * **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – The output tensor * **dtype** (`torch.dtype`, optional) – the desired data type of the returned tensor. Default: `torch.float32` `fast_forward(n)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/quasirandom.html#SobolEngine.fast_forward) Function to fast-forward the state of the `SobolEngine` by `n` steps. This is equivalent to drawing `n` samples without using the samples. Parameters **n** (_Int_) – The number of steps to fast-forward by. `reset()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/quasirandom.html#SobolEngine.reset) Function to reset the `SobolEngine` to base state. # torch.rad2deg `torch.rad2deg(input, *, out=None) → Tensor` Returns a new tensor with each of the elements of `input` converted from angles in radians to degrees. Parameters **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor.
Example: >>> a = torch.tensor([[3.142, -3.142], [6.283, -6.283], [1.570, -1.570]]) >>> torch.rad2deg(a) tensor([[ 180.0233, -180.0233], [ 359.9894, -359.9894], [ 89.9544, -89.9544]]) # torch.rand `torch.rand(*size, *, out=None, dtype=None, layout=torch.strided, device=None, requires_grad=False) → Tensor` Returns a tensor filled with random numbers from a uniform distribution on the interval [0,1)[0, 1) The shape of the tensor is defined by the variable argument `size`. Parameters **size** (_int..._) – a sequence of integers defining the shape of the output tensor. Can be a variable number of arguments or a collection like a list or tuple. Keyword Arguments * **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. * **dtype** ([`torch.dtype`](../tensor_attributes#torch.torch.dtype "torch.torch.dtype"), optional) – the desired data type of returned tensor. Default: if `None`, uses a global default (see [`torch.set_default_tensor_type()`](torch.set_default_tensor_type#torch.set_default_tensor_type "torch.set_default_tensor_type")). * **layout** ([`torch.layout`](../tensor_attributes#torch.torch.layout "torch.torch.layout"), optional) – the desired layout of returned Tensor. Default: `torch.strided`. * **device** ([`torch.device`](../tensor_attributes#torch.torch.device "torch.torch.device"), optional) – the desired device of returned tensor. Default: if `None`, uses the current device for the default tensor type (see [`torch.set_default_tensor_type()`](torch.set_default_tensor_type#torch.set_default_tensor_type "torch.set_default_tensor_type")). `device` will be the CPU for CPU tensor types and the current CUDA device for CUDA tensor types. * **requires_grad** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – If autograd should record operations on the returned tensor. Default: `False`. Example: >>> torch.rand(4) tensor([ 0.5204, 0.2503, 0.3525, 0.5673]) >>> torch.rand(2, 3) tensor([[ 0.8237, 0.5781, 0.6879], [ 0.3816, 0.7249, 0.0998]]) # torch.rand_like `torch.rand_like(input, *, dtype=None, layout=None, device=None, requires_grad=False, memory_format=torch.preserve_format) → Tensor` Returns a tensor with the same size as `input` that is filled with random numbers from a uniform distribution on the interval [0,1)[0, 1) . `torch.rand_like(input)` is equivalent to `torch.rand(input.size(), dtype=input.dtype, layout=input.layout, device=input.device)`. Parameters **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the size of `input` will determine size of the output tensor. Keyword Arguments * **dtype** ([`torch.dtype`](../tensor_attributes#torch.torch.dtype "torch.torch.dtype"), optional) – the desired data type of returned Tensor. Default: if `None`, defaults to the dtype of `input`. * **layout** ([`torch.layout`](../tensor_attributes#torch.torch.layout "torch.torch.layout"), optional) – the desired layout of returned tensor. Default: if `None`, defaults to the layout of `input`. * **device** ([`torch.device`](../tensor_attributes#torch.torch.device "torch.torch.device"), optional) – the desired device of returned tensor. Default: if `None`, defaults to the device of `input`. * **requires_grad** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – If autograd should record operations on the returned tensor. Default: `False`. 
* **memory_format** ([`torch.memory_format`](../tensor_attributes#torch.torch.memory_format "torch.torch.memory_format"), optional) – the desired memory format of returned Tensor. Default: `torch.preserve_format`. # torch.randint `torch.randint(low=0, high, size, *, generator=None, out=None, dtype=None, layout=torch.strided, device=None, requires_grad=False) → Tensor` Returns a tensor filled with random integers generated uniformly between `low` (inclusive) and `high` (exclusive). The shape of the tensor is defined by the variable argument `size`. Note With the global dtype default (`torch.float32`), this function returns a tensor with dtype `torch.int64`. Parameters * **low** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – Lowest integer to be drawn from the distribution. Default: 0. * **high** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – One above the highest integer to be drawn from the distribution. * **size** ([tuple](https://docs.python.org/3/library/stdtypes.html#tuple "\(in Python v3.9\)")) – a tuple defining the shape of the output tensor. Keyword Arguments * **generator** ([`torch.Generator`](torch.generator#torch.Generator "torch.Generator"), optional) – a pseudorandom number generator for sampling * **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. * **dtype** ([`torch.dtype`](../tensor_attributes#torch.torch.dtype "torch.torch.dtype"), optional) – the desired data type of returned tensor. Default: if `None`, uses a global default (see [`torch.set_default_tensor_type()`](torch.set_default_tensor_type#torch.set_default_tensor_type "torch.set_default_tensor_type")). * **layout** ([`torch.layout`](../tensor_attributes#torch.torch.layout "torch.torch.layout"), optional) – the desired layout of returned Tensor. Default: `torch.strided`. * **device** ([`torch.device`](../tensor_attributes#torch.torch.device "torch.torch.device"), optional) – the desired device of returned tensor. Default: if `None`, uses the current device for the default tensor type (see [`torch.set_default_tensor_type()`](torch.set_default_tensor_type#torch.set_default_tensor_type "torch.set_default_tensor_type")). `device` will be the CPU for CPU tensor types and the current CUDA device for CUDA tensor types. * **requires_grad** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – If autograd should record operations on the returned tensor. Default: `False`. Example: >>> torch.randint(3, 5, (3,)) tensor([4, 3, 4]) >>> torch.randint(10, (2, 2)) tensor([[0, 2], [5, 5]]) >>> torch.randint(3, 10, (2, 2)) tensor([[4, 5], [6, 7]]) # torch.randint_like `torch.randint_like(input, low=0, high, *, dtype=None, layout=torch.strided, device=None, requires_grad=False, memory_format=torch.preserve_format) → Tensor` Returns a tensor with the same shape as Tensor `input` filled with random integers generated uniformly between `low` (inclusive) and `high` (exclusive). Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the size of `input` will determine size of the output tensor. * **low** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – Lowest integer to be drawn from the distribution. Default: 0. * **high** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – One above the highest integer to be drawn from the distribution. 
Keyword Arguments * **dtype** ([`torch.dtype`](../tensor_attributes#torch.torch.dtype "torch.torch.dtype"), optional) – the desired data type of returned Tensor. Default: if `None`, defaults to the dtype of `input`. * **layout** ([`torch.layout`](../tensor_attributes#torch.torch.layout "torch.torch.layout"), optional) – the desired layout of returned tensor. Default: if `None`, defaults to the layout of `input`. * **device** ([`torch.device`](../tensor_attributes#torch.torch.device "torch.torch.device"), optional) – the desired device of returned tensor. Default: if `None`, defaults to the device of `input`. * **requires_grad** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – If autograd should record operations on the returned tensor. Default: `False`. * **memory_format** ([`torch.memory_format`](../tensor_attributes#torch.torch.memory_format "torch.torch.memory_format"), optional) – the desired memory format of returned Tensor. Default: `torch.preserve_format`. # torch.randn `torch.randn(*size, *, out=None, dtype=None, layout=torch.strided, device=None, requires_grad=False) → Tensor` Returns a tensor filled with random numbers from a normal distribution with mean `0` and variance `1` (also called the standard normal distribution). outi∼N(0,1)\text{out}_{i} \sim \mathcal{N}(0, 1) The shape of the tensor is defined by the variable argument `size`. Parameters **size** (_int..._) – a sequence of integers defining the shape of the output tensor. Can be a variable number of arguments or a collection like a list or tuple. Keyword Arguments * **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. * **dtype** ([`torch.dtype`](../tensor_attributes#torch.torch.dtype "torch.torch.dtype"), optional) – the desired data type of returned tensor. Default: if `None`, uses a global default (see [`torch.set_default_tensor_type()`](torch.set_default_tensor_type#torch.set_default_tensor_type "torch.set_default_tensor_type")). * **layout** ([`torch.layout`](../tensor_attributes#torch.torch.layout "torch.torch.layout"), optional) – the desired layout of returned Tensor. Default: `torch.strided`. * **device** ([`torch.device`](../tensor_attributes#torch.torch.device "torch.torch.device"), optional) – the desired device of returned tensor. Default: if `None`, uses the current device for the default tensor type (see [`torch.set_default_tensor_type()`](torch.set_default_tensor_type#torch.set_default_tensor_type "torch.set_default_tensor_type")). `device` will be the CPU for CPU tensor types and the current CUDA device for CUDA tensor types. * **requires_grad** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – If autograd should record operations on the returned tensor. Default: `False`. Example: >>> torch.randn(4) tensor([-2.1436, 0.9966, 2.3426, -0.6366]) >>> torch.randn(2, 3) tensor([[ 1.5954, 2.8929, -1.0923], [ 1.1719, -0.4709, -0.1996]]) # torch.randn_like `torch.randn_like(input, *, dtype=None, layout=None, device=None, requires_grad=False, memory_format=torch.preserve_format) → Tensor` Returns a tensor with the same size as `input` that is filled with random numbers from a normal distribution with mean 0 and variance 1. `torch.randn_like(input)` is equivalent to `torch.randn(input.size(), dtype=input.dtype, layout=input.layout, device=input.device)`. 
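A minimal usage sketch (the sampled values are random; what is guaranteed is that the size, dtype, layout and device follow `input`):

>>> base = torch.empty(2, 3, dtype=torch.float64)
>>> noise = torch.randn_like(base)
>>> noise.shape, noise.dtype
(torch.Size([2, 3]), torch.float64)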
Parameters **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the size of `input` will determine size of the output tensor. Keyword Arguments * **dtype** ([`torch.dtype`](../tensor_attributes#torch.torch.dtype "torch.torch.dtype"), optional) – the desired data type of returned Tensor. Default: if `None`, defaults to the dtype of `input`. * **layout** ([`torch.layout`](../tensor_attributes#torch.torch.layout "torch.torch.layout"), optional) – the desired layout of returned tensor. Default: if `None`, defaults to the layout of `input`. * **device** ([`torch.device`](../tensor_attributes#torch.torch.device "torch.torch.device"), optional) – the desired device of returned tensor. Default: if `None`, defaults to the device of `input`. * **requires_grad** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – If autograd should record operations on the returned tensor. Default: `False`. * **memory_format** ([`torch.memory_format`](../tensor_attributes#torch.torch.memory_format "torch.torch.memory_format"), optional) – the desired memory format of returned Tensor. Default: `torch.preserve_format`. # torch.randperm `torch.randperm(n, *, generator=None, out=None, dtype=torch.int64, layout=torch.strided, device=None, requires_grad=False, pin_memory=False) → Tensor` Returns a random permutation of integers from `0` to `n - 1`. Parameters **n** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – the upper bound (exclusive) Keyword Arguments * **generator** ([`torch.Generator`](torch.generator#torch.Generator "torch.Generator"), optional) – a pseudorandom number generator for sampling * **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. * **dtype** ([`torch.dtype`](../tensor_attributes#torch.torch.dtype "torch.torch.dtype"), optional) – the desired data type of returned tensor. Default: `torch.int64`. * **layout** ([`torch.layout`](../tensor_attributes#torch.torch.layout "torch.torch.layout"), optional) – the desired layout of returned Tensor. Default: `torch.strided`. * **device** ([`torch.device`](../tensor_attributes#torch.torch.device "torch.torch.device"), optional) – the desired device of returned tensor. Default: if `None`, uses the current device for the default tensor type (see [`torch.set_default_tensor_type()`](torch.set_default_tensor_type#torch.set_default_tensor_type "torch.set_default_tensor_type")). `device` will be the CPU for CPU tensor types and the current CUDA device for CUDA tensor types. * **requires_grad** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – If autograd should record operations on the returned tensor. Default: `False`. * **pin_memory** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – If set, returned tensor would be allocated in the pinned memory. Works only for CPU tensors. Default: `False`. Example: >>> torch.randperm(4) tensor([2, 1, 0, 3]) # torch.range `torch.range(start=0, end, step=1, *, out=None, dtype=None, layout=torch.strided, device=None, requires_grad=False) → Tensor` Returns a 1-D tensor of size ⌊end−startstep⌋+1\left\lfloor \frac{\text{end} - \text{start}}{\text{step}} \right\rfloor + 1 with values from `start` to `end` with step `step`. Step is the gap between two values in the tensor. outi+1=outi+step.\text{out}_{i+1} = \text{out}_i + \text{step}. 
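As a quick check of the size formula above: `torch.range(1, 4, 0.5)` yields ⌊(4 − 1) / 0.5⌋ + 1 = 7 values (see the example further below), whereas `torch.arange(1, 4, 0.5)` excludes the endpoint and returns only 6.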
Warning This function is deprecated and will be removed in a future release because its behavior is inconsistent with Python’s range builtin. Instead, use [`torch.arange()`](torch.arange#torch.arange "torch.arange"), which produces values in [start, end). Parameters * **start** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)")) – the starting value for the set of points. Default: `0`. * **end** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)")) – the ending value for the set of points * **step** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)")) – the gap between each pair of adjacent points. Default: `1`. Keyword Arguments * **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. * **dtype** ([`torch.dtype`](../tensor_attributes#torch.torch.dtype "torch.torch.dtype"), optional) – the desired data type of returned tensor. Default: if `None`, uses a global default (see [`torch.set_default_tensor_type()`](torch.set_default_tensor_type#torch.set_default_tensor_type "torch.set_default_tensor_type")). If `dtype` is not given, infer the data type from the other input arguments. If any of `start`, `end`, or `stop` are floating-point, the `dtype` is inferred to be the default dtype, see [`get_default_dtype()`](torch.get_default_dtype#torch.get_default_dtype "torch.get_default_dtype"). Otherwise, the `dtype` is inferred to be `torch.int64`. * **layout** ([`torch.layout`](../tensor_attributes#torch.torch.layout "torch.torch.layout"), optional) – the desired layout of returned Tensor. Default: `torch.strided`. * **device** ([`torch.device`](../tensor_attributes#torch.torch.device "torch.torch.device"), optional) – the desired device of returned tensor. Default: if `None`, uses the current device for the default tensor type (see [`torch.set_default_tensor_type()`](torch.set_default_tensor_type#torch.set_default_tensor_type "torch.set_default_tensor_type")). `device` will be the CPU for CPU tensor types and the current CUDA device for CUDA tensor types. * **requires_grad** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – If autograd should record operations on the returned tensor. Default: `False`. Example: >>> torch.range(1, 4) tensor([ 1., 2., 3., 4.]) >>> torch.range(1, 4, 0.5) tensor([ 1.0000, 1.5000, 2.0000, 2.5000, 3.0000, 3.5000, 4.0000]) # torch.ravel `torch.ravel(input) → Tensor` Return a contiguous flattened tensor. A copy is made only if needed. Parameters **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. Example: >>> t = torch.tensor([[[1, 2], ... [3, 4]], ... [[5, 6], ... [7, 8]]]) >>> torch.ravel(t) tensor([1, 2, 3, 4, 5, 6, 7, 8]) # torch.real `torch.real(input) → Tensor` Returns a new tensor containing real values of the `self` tensor. The returned tensor and `self` share the same underlying storage. Warning `real()` is only supported for tensors with complex dtypes. Parameters **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. Example:: >>> x=torch.randn(4, dtype=torch.cfloat) >>> x tensor([(0.3100+0.3553j), (-0.5445-0.7896j), (-1.6492-0.0633j), (-0.0638-0.8119j)]) >>> x.real tensor([ 0.3100, -0.5445, -1.6492, -0.0638]) # torch.reciprocal `torch.reciprocal(input, *, out=None) → Tensor` Returns a new tensor with the reciprocal of the elements of `input` Note Unlike NumPy’s reciprocal, torch.reciprocal supports integral inputs. 
Integral inputs to reciprocal are automatically [promoted](../tensor_attributes#type-promotion-doc) to the default scalar type. \text{out}_{i} = \frac{1}{\text{input}_{i}} Parameters **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. Example: >>> a = torch.randn(4) >>> a tensor([-0.4595, -2.1219, -1.4314, 0.7298]) >>> torch.reciprocal(a) tensor([-2.1763, -0.4713, -0.6986, 1.3702]) # torch.remainder `torch.remainder(input, other, *, out=None) → Tensor` Computes the element-wise remainder of division. The dividend and divisor may contain both integer and floating point numbers. The remainder has the same sign as the divisor `other`. Supports [broadcasting to a common shape](https://pytorch.org/docs/1.8.0/notes/broadcasting.html#broadcasting-semantics), [type promotion](../tensor_attributes#type-promotion-doc), and integer and float inputs. Note Complex inputs are not supported. In some cases, it is not mathematically possible to satisfy the definition of a modulo operation with complex numbers. See [`torch.fmod()`](torch.fmod#torch.fmod "torch.fmod") for how division by zero is handled. Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the dividend * **other** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _or_ _Scalar_) – the divisor Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. Example: >>> torch.remainder(torch.tensor([-3., -2, -1, 1, 2, 3]), 2) tensor([ 1., 0., 1., 1., 0., 1.]) >>> torch.remainder(torch.tensor([1, 2, 3, 4, 5]), 1.5) tensor([ 1.0000, 0.5000, 0.0000, 1.0000, 0.5000]) See also [`torch.fmod()`](torch.fmod#torch.fmod "torch.fmod"), which computes the element-wise remainder of division equivalently to the C library function `fmod()`. # torch.renorm `torch.renorm(input, p, dim, maxnorm, *, out=None) → Tensor` Returns a tensor where each sub-tensor of `input` along dimension `dim` is normalized such that the `p`-norm of the sub-tensor is lower than the value `maxnorm`. Note If the norm of a row is lower than `maxnorm`, the row is unchanged. Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. * **p** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)")) – the power for the norm computation * **dim** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – the dimension to slice over to get the sub-tensors * **maxnorm** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)")) – the maximum norm to keep each sub-tensor under Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. Example: >>> x = torch.ones(3, 3) >>> x[1].fill_(2) tensor([ 2., 2., 2.]) >>> x[2].fill_(3) tensor([ 3., 3., 3.]) >>> x tensor([[ 1., 1., 1.], [ 2., 2., 2.], [ 3., 3., 3.]]) >>> torch.renorm(x, 1, 0, 5) tensor([[ 1.0000, 1.0000, 1.0000], [ 1.6667, 1.6667, 1.6667], [ 1.6667, 1.6667, 1.6667]]) # torch.repeat_interleave `torch.repeat_interleave(input, repeats, dim=None) → Tensor` Repeat elements of a tensor. Warning This is different from [`torch.Tensor.repeat()`](../tensors#torch.Tensor.repeat "torch.Tensor.repeat") but similar to `numpy.repeat`. Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor.
* **repeats** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _or_[int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – The number of repetitions for each element. repeats is broadcasted to fit the shape of the given axis. * **dim** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – The dimension along which to repeat values. By default, use the flattened input array, and return a flat output array. Returns Repeated tensor which has the same shape as input, except along the given axis. Return type [Tensor](../tensors#torch.Tensor "torch.Tensor") Example: >>> x = torch.tensor([1, 2, 3]) >>> x.repeat_interleave(2) tensor([1, 1, 2, 2, 3, 3]) >>> y = torch.tensor([[1, 2], [3, 4]]) >>> torch.repeat_interleave(y, 2) tensor([1, 1, 2, 2, 3, 3, 4, 4]) >>> torch.repeat_interleave(y, 3, dim=1) tensor([[1, 1, 1, 2, 2, 2], [3, 3, 3, 4, 4, 4]]) >>> torch.repeat_interleave(y, torch.tensor([1, 2]), dim=0) tensor([[1, 2], [3, 4], [3, 4]]) `torch.repeat_interleave(repeats) → Tensor` If the `repeats` is `tensor([n1, n2, n3, …])`, then the output will be `tensor([0, 0, …, 1, 1, …, 2, 2, …, …])` where `0` appears `n1` times, `1` appears `n2` times, `2` appears `n3` times, etc. # torch.reshape `torch.reshape(input, shape) → Tensor` Returns a tensor with the same data and number of elements as `input`, but with the specified shape. When possible, the returned tensor will be a view of `input`. Otherwise, it will be a copy. Contiguous inputs and inputs with compatible strides can be reshaped without copying, but you should not depend on the copying vs. viewing behavior. See [`torch.Tensor.view()`](../tensors#torch.Tensor.view "torch.Tensor.view") on when it is possible to return a view. A single dimension may be -1, in which case it’s inferred from the remaining dimensions and the number of elements in `input`. Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the tensor to be reshaped * **shape** (_tuple of python:ints_) – the new shape Example: >>> a = torch.arange(4.) >>> torch.reshape(a, (2, 2)) tensor([[ 0., 1.], [ 2., 3.]]) >>> b = torch.tensor([[0, 1], [2, 3]]) >>> torch.reshape(b, (-1,)) tensor([ 0, 1, 2, 3]) # torch.result_type `torch.result_type(tensor1, tensor2) → dtype` Returns the [`torch.dtype`](../tensor_attributes#torch.torch.dtype "torch.torch.dtype") that would result from performing an arithmetic operation on the provided input tensors. See type promotion [documentation](../tensor_attributes#type-promotion-doc) for more information on the type promotion logic. Parameters * **tensor1** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _or_ _Number_) – an input tensor or number * **tensor2** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _or_ _Number_) – an input tensor or number Example: >>> torch.result_type(torch.tensor([1, 2], dtype=torch.int), 1.0) torch.float32 >>> torch.result_type(torch.tensor([1, 2], dtype=torch.uint8), torch.tensor(1)) torch.uint8 # torch.roll `torch.roll(input, shifts, dims=None) → Tensor` Roll the tensor along the given dimension(s). Elements that are shifted beyond the last position are re-introduced at the first position. If a dimension is not specified, the tensor will be flattened before rolling and then restored to the original shape. Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. 
* **shifts** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _or_ _tuple of python:ints_) – The number of places by which the elements of the tensor are shifted. If shifts is a tuple, dims must be a tuple of the same size, and each dimension will be rolled by the corresponding value * **dims** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _or_ _tuple of python:ints_) – Axis along which to roll Example: >>> x = torch.tensor([1, 2, 3, 4, 5, 6, 7, 8]).view(4, 2) >>> x tensor([[1, 2], [3, 4], [5, 6], [7, 8]]) >>> torch.roll(x, 1, 0) tensor([[7, 8], [1, 2], [3, 4], [5, 6]]) >>> torch.roll(x, -1, 0) tensor([[3, 4], [5, 6], [7, 8], [1, 2]]) >>> torch.roll(x, shifts=(2, 1), dims=(0, 1)) tensor([[6, 5], [8, 7], [2, 1], [4, 3]]) # torch.rot90 `torch.rot90(input, k, dims) → Tensor` Rotate an n-D tensor by 90 degrees in the plane specified by the `dims` axes. Rotation direction is from the first towards the second axis if k > 0, and from the second towards the first for k < 0. Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. * **k** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – number of times to rotate * **dims** (_a list_ _or_[tuple](https://docs.python.org/3/library/stdtypes.html#tuple "\(in Python v3.9\)")) – axes to rotate Example: >>> x = torch.arange(4).view(2, 2) >>> x tensor([[0, 1], [2, 3]]) >>> torch.rot90(x, 1, [0, 1]) tensor([[1, 3], [0, 2]]) >>> x = torch.arange(8).view(2, 2, 2) >>> x tensor([[[0, 1], [2, 3]], [[4, 5], [6, 7]]]) >>> torch.rot90(x, 1, [1, 2]) tensor([[[1, 3], [0, 2]], [[5, 7], [4, 6]]]) # torch.round `torch.round(input, *, out=None) → Tensor` Returns a new tensor with each of the elements of `input` rounded to the closest integer. Parameters **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. Example: >>> a = torch.randn(4) >>> a tensor([ 0.9920, 0.6077, 0.9734, -1.0362]) >>> torch.round(a) tensor([ 1., 1., 1., -1.]) # torch.row_stack `torch.row_stack(tensors, *, out=None) → Tensor` Alias of [`torch.vstack()`](torch.vstack#torch.vstack "torch.vstack"). # torch.rsqrt `torch.rsqrt(input, *, out=None) → Tensor` Returns a new tensor with the reciprocal of the square-root of each of the elements of `input`. \text{out}_{i} = \frac{1}{\sqrt{\text{input}_{i}}} Parameters **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. Example: >>> a = torch.randn(4) >>> a tensor([-0.0370, 0.2970, 1.5420, -0.9105]) >>> torch.rsqrt(a) tensor([ nan, 1.8351, 0.8053, nan]) # torch.save `torch.save(obj, f, pickle_module=pickle, pickle_protocol=2, _use_new_zipfile_serialization=True)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/serialization.html#save) Saves an object to a disk file. See also: `saving-loading-tensors` Parameters * **obj** – saved object * **f** – a file-like object (has to implement write and flush) or a string or os.PathLike object containing a file name * **pickle_module** – module used for pickling metadata and objects * **pickle_protocol** – can be specified to override the default protocol Note A common PyTorch convention is to save tensors using .pt file extension. Note PyTorch preserves storage sharing across serialization.
See `preserve-storage-sharing` for more details. Note The 1.6 release of PyTorch switched `torch.save` to use a new zipfile-based file format. `torch.load` still retains the ability to load files in the old format. If for any reason you want `torch.save` to use the old format, pass the kwarg `_use_new_zipfile_serialization=False`. #### Example >>> # Save to file >>> x = torch.tensor([0, 1, 2, 3, 4]) >>> torch.save(x, 'tensor.pt') >>> # Save to io.BytesIO buffer >>> buffer = io.BytesIO() >>> torch.save(x, buffer) # torch.scatter `torch.scatter(input, dim, index, src) → Tensor` Out-of-place version of [`torch.Tensor.scatter_()`](../tensors#torch.Tensor.scatter_ "torch.Tensor.scatter_") # torch.scatter_add `torch.scatter_add(input, dim, index, src) → Tensor` Out-of-place version of [`torch.Tensor.scatter_add_()`](../tensors#torch.Tensor.scatter_add_ "torch.Tensor.scatter_add_") # torch.searchsorted `torch.searchsorted(sorted_sequence, values, *, out_int32=False, right=False, out=None) → Tensor` Find the indices from the _innermost_ dimension of `sorted_sequence` such that, if the corresponding values in `values` were inserted before the indices, the order of the corresponding _innermost_ dimension within `sorted_sequence` would be preserved. Return a new tensor with the same size as `values`. If `right` is False (default), then the left boundary of `sorted_sequence` is closed. More formally, the returned index satisfies the following rules:

`sorted_sequence` | `right` | _returned index satisfies_
---|---|---
1-D | False | `sorted_sequence[i-1] < values[m][n]...[l][x] <= sorted_sequence[i]`
1-D | True | `sorted_sequence[i-1] <= values[m][n]...[l][x] < sorted_sequence[i]`
N-D | False | `sorted_sequence[m][n]...[l][i-1] < values[m][n]...[l][x] <= sorted_sequence[m][n]...[l][i]`
N-D | True | `sorted_sequence[m][n]...[l][i-1] <= values[m][n]...[l][x] < sorted_sequence[m][n]...[l][i]`

Parameters * **sorted_sequence** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – N-D or 1-D tensor, containing monotonically increasing sequence on the _innermost_ dimension. * **values** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _or_ _Scalar_) – N-D tensor or a Scalar containing the search value(s). Keyword Arguments * **out_int32** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – indicate the output data type. torch.int32 if True, torch.int64 otherwise. Default value is False, i.e. default output data type is torch.int64. * **right** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – if False, return the first suitable location that is found. If True, return the last such index. If no suitable index is found, return 0 for non-numerical value (e.g. nan, inf) or the size of _innermost_ dimension within `sorted_sequence` (one past the last index of the _innermost_ dimension). In other words, if False, gets the lower bound index for each value in `values` on the corresponding _innermost_ dimension of the `sorted_sequence`. If True, gets the upper bound index instead. Default value is False. * **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor, must be the same size as `values` if provided. Note If your use case is always 1-D sorted sequence, [`torch.bucketize()`](torch.bucketize#torch.bucketize "torch.bucketize") is preferred, because it has fewer dimension checks resulting in slightly better performance.
Example: >>> sorted_sequence = torch.tensor([[1, 3, 5, 7, 9], [2, 4, 6, 8, 10]]) >>> sorted_sequence tensor([[ 1, 3, 5, 7, 9], [ 2, 4, 6, 8, 10]]) >>> values = torch.tensor([[3, 6, 9], [3, 6, 9]]) >>> values tensor([[3, 6, 9], [3, 6, 9]]) >>> torch.searchsorted(sorted_sequence, values) tensor([[1, 3, 4], [1, 2, 4]]) >>> torch.searchsorted(sorted_sequence, values, right=True) tensor([[2, 3, 5], [1, 3, 4]]) >>> sorted_sequence_1d = torch.tensor([1, 3, 5, 7, 9]) >>> sorted_sequence_1d tensor([1, 3, 5, 7, 9]) >>> torch.searchsorted(sorted_sequence_1d, values) tensor([[1, 3, 4], [1, 3, 4]]) # torch.seed `torch.seed()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/random.html#seed) Sets the seed for generating random numbers to a non-deterministic random number. Returns a 64 bit number used to seed the RNG. # torch.set_default_dtype `torch.set_default_dtype(d)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch.html#set_default_dtype) Sets the default floating point dtype to `d`. This dtype is: 1. The inferred dtype for python floats in [`torch.tensor()`](torch.tensor#torch.tensor "torch.tensor"). 2. Used to infer dtype for python complex numbers. The default complex dtype is set to `torch.complex128` if default floating point dtype is `torch.float64`, otherwise it’s set to `torch.complex64` The default floating point dtype is initially `torch.float32`. Parameters **d** ([`torch.dtype`](../tensor_attributes#torch.torch.dtype "torch.torch.dtype")) – the floating point dtype to make the default #### Example >>> # initial default for floating point is torch.float32 >>> torch.tensor([1.2, 3]).dtype torch.float32 >>> # initial default for floating point is torch.complex64 >>> torch.tensor([1.2, 3j]).dtype torch.complex64 >>> torch.set_default_dtype(torch.float64) >>> torch.tensor([1.2, 3]).dtype # a new floating point tensor torch.float64 >>> torch.tensor([1.2, 3j]).dtype # a new complex tensor torch.complex128 # torch.set_default_tensor_type `torch.set_default_tensor_type(t)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch.html#set_default_tensor_type) Sets the default `torch.Tensor` type to floating point tensor type `t`. This type will also be used as default floating point type for type inference in [`torch.tensor()`](torch.tensor#torch.tensor "torch.tensor"). The default floating point tensor type is initially `torch.FloatTensor`. Parameters **t** ([type](https://docs.python.org/3/library/functions.html#type "\(in Python v3.9\)") _or_ _string_) – the floating point tensor type or its name Example: >>> torch.tensor([1.2, 3]).dtype # initial default for floating point is torch.float32 torch.float32 >>> torch.set_default_tensor_type(torch.DoubleTensor) >>> torch.tensor([1.2, 3]).dtype # a new floating point tensor torch.float64 # torch.set_flush_denormal `torch.set_flush_denormal(mode) → bool` Disables denormal floating numbers on CPU. Returns `True` if your system supports flushing denormal numbers and it successfully configures flush denormal mode. `set_flush_denormal()` is only supported on x86 architectures supporting SSE3. 
Parameters **mode** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – Controls whether to enable flush denormal mode or not Example: >>> torch.set_flush_denormal(True) True >>> torch.tensor([1e-323], dtype=torch.float64) tensor([ 0.], dtype=torch.float64) >>> torch.set_flush_denormal(False) True >>> torch.tensor([1e-323], dtype=torch.float64) tensor(9.88131e-324 * [ 1.0000], dtype=torch.float64) # set_grad_enabled `class torch.set_grad_enabled(mode)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/autograd/grad_mode.html#set_grad_enabled) Context-manager that sets gradient calculation to on or off. `set_grad_enabled` will enable or disable grads based on its argument `mode`. It can be used as a context manager or as a function. This context manager is thread local; it will not affect computation in other threads. Parameters **mode** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – Flag whether to enable grad (`True`), or disable (`False`). This can be used to conditionally enable gradients. Example: >>> x = torch.tensor([1], requires_grad=True) >>> is_train = False >>> with torch.set_grad_enabled(is_train): ... y = x * 2 >>> y.requires_grad False >>> torch.set_grad_enabled(True) >>> y = x * 2 >>> y.requires_grad True >>> torch.set_grad_enabled(False) >>> y = x * 2 >>> y.requires_grad False # torch.set_num_interop_threads `torch.set_num_interop_threads(int)` Sets the number of threads used for interop parallelism (e.g. in JIT interpreter) on CPU. Warning Can only be called once and before any inter-op parallel work is started (e.g. JIT execution). # torch.set_num_threads `torch.set_num_threads(int)` Sets the number of threads used for intraop parallelism on CPU. Warning To ensure that the correct number of threads is used, set_num_threads must be called before running eager, JIT or autograd code. # torch.set_printoptions `torch.set_printoptions(precision=None, threshold=None, edgeitems=None, linewidth=None, profile=None, sci_mode=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/_tensor_str.html#set_printoptions) Set options for printing. Items shamelessly taken from NumPy. Parameters * **precision** – Number of digits of precision for floating point output (default = 4). * **threshold** – Total number of array elements which trigger summarization rather than full `repr` (default = 1000). * **edgeitems** – Number of array items in summary at beginning and end of each dimension (default = 3). * **linewidth** – The number of characters per line for the purpose of inserting line breaks (default = 80). Thresholded matrices will ignore this parameter. * **profile** – Sane defaults for pretty printing. Can override with any of the above options. (any one of `default`, `short`, `full`) * **sci_mode** – Enable (True) or disable (False) scientific notation. If None (default) is specified, the value is defined by `torch._tensor_str._Formatter`. This value is automatically chosen by the framework. # torch.set_rng_state `torch.set_rng_state(new_state)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/random.html#set_rng_state) Sets the random number generator state. Parameters **new_state** (_torch.ByteTensor_) – The desired state # torch.sgn `torch.sgn(input, *, out=None) → Tensor` For complex tensors, this function returns a new tensor whose elements have the same angle as that of the elements of `input` and absolute value 1.
For a non-complex tensor, this function returns the signs of the elements of `input` (see [`torch.sign()`](torch.sign#torch.sign "torch.sign")). outi=0\text{out}_{i} = 0 , if ∣inputi∣==0|{\text{{input}}_i}| == 0 outi=inputi∣inputi∣\text{out}_{i} = \frac{{\text{{input}}_i}}{|{\text{{input}}_i}|} , otherwise Parameters **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. Example: >>> x=torch.tensor([3+4j, 7-24j, 0, 1+2j]) >>> x.sgn() tensor([0.6000+0.8000j, 0.2800-0.9600j, 0.0000+0.0000j, 0.4472+0.8944j]) # torch.sigmoid `torch.sigmoid(input, *, out=None) → Tensor` Returns a new tensor with the sigmoid of the elements of `input`. outi=11+e−inputi\text{out}_{i} = \frac{1}{1 + e^{-\text{input}_{i}}} Parameters **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. Example: >>> a = torch.randn(4) >>> a tensor([ 0.9213, 1.0887, -0.8858, -1.7683]) >>> torch.sigmoid(a) tensor([ 0.7153, 0.7481, 0.2920, 0.1458]) # torch.sign `torch.sign(input, *, out=None) → Tensor` Returns a new tensor with the signs of the elements of `input`. outi=sgn⁡(inputi)\text{out}_{i} = \operatorname{sgn}(\text{input}_{i}) Parameters **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. Example: >>> a = torch.tensor([0.7, -1.2, 0., 2.3]) >>> a tensor([ 0.7000, -1.2000, 0.0000, 2.3000]) >>> torch.sign(a) tensor([ 1., -1., 0., 1.]) # torch.signbit `torch.signbit(input, *, out=None) → Tensor` Tests if each element of `input` has its sign bit set (is less than zero) or not. Parameters **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. Example: >>> a = torch.tensor([0.7, -1.2, 0., 2.3]) >>> torch.signbit(a) tensor([ False, True, False, False]) # torch.sin `torch.sin(input, *, out=None) → Tensor` Returns a new tensor with the sine of the elements of `input`. outi=sin⁡(inputi)\text{out}_{i} = \sin(\text{input}_{i}) Parameters **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. Example: >>> a = torch.randn(4) >>> a tensor([-0.5461, 0.1347, -2.7266, -0.2746]) >>> torch.sin(a) tensor([-0.5194, 0.1343, -0.4032, -0.2711]) # torch.sinc `torch.sinc(input, *, out=None) → Tensor` Computes the normalized sinc of `input.` outi={1,if inputi=0sin⁡(πinputi)/(πinputi),otherwise\text{out}_{i} = \begin{cases} 1, & \text{if}\ \text{input}_{i}=0 \\\ \sin(\pi \text{input}_{i}) / (\pi \text{input}_{i}), & \text{otherwise} \end{cases} Parameters **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. Example: >>> a = torch.randn(4) >>> a tensor([ 0.2252, -0.2948, 1.0267, -1.1566]) >>> torch.sinc(a) tensor([ 0.9186, 0.8631, -0.0259, -0.1300]) # torch.sinh `torch.sinh(input, *, out=None) → Tensor` Returns a new tensor with the hyperbolic sine of the elements of `input`. 
\text{out}_{i} = \sinh(\text{input}_{i}) Parameters **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. Example: >>> a = torch.randn(4) >>> a tensor([ 0.5380, -0.8632, -0.1265, 0.9399]) >>> torch.sinh(a) tensor([ 0.5644, -0.9744, -0.1268, 1.0845]) Note When `input` is on the CPU, the implementation of torch.sinh may use the Sleef library, which rounds very large results to infinity or negative infinity. See [here](https://sleef.org/purec.xhtml) for details. # torch.slogdet `torch.slogdet(input) -> (Tensor, Tensor)` Calculates the sign and log absolute value of the determinant(s) of a square matrix or batches of square matrices. Note `torch.slogdet()` is deprecated. Please use [`torch.linalg.slogdet()`](../linalg#torch.linalg.slogdet "torch.linalg.slogdet") instead. Note If `input` has zero determinant, this returns `(0, -inf)`. Note Backward through `slogdet()` internally uses SVD results when `input` is not invertible. In this case, double backward through `slogdet()` will be unstable when `input` doesn’t have distinct singular values. See [`svd()`](torch.svd#torch.svd "torch.svd") for details. Parameters **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor of size `(*, n, n)` where `*` is zero or more batch dimensions. Returns A namedtuple (sign, logabsdet) containing the sign of the determinant, and the log value of the absolute determinant. Example: >>> A = torch.randn(3, 3) >>> A tensor([[ 0.0032, -0.2239, -1.1219], [-0.6690, 0.1161, 0.4053], [-1.6218, -0.9273, -0.0082]]) >>> torch.det(A) tensor(-0.7576) >>> torch.logdet(A) tensor(nan) >>> torch.slogdet(A) torch.return_types.slogdet(sign=tensor(-1.), logabsdet=tensor(-0.2776)) # torch.solve `torch.solve(input, A, *, out=None) -> (Tensor, Tensor)` This function returns the solution to the system of linear equations represented by AX = B and the LU factorization of A, in order as a namedtuple `solution, LU`. `LU` contains `L` and `U` factors for LU factorization of `A`. `torch.solve(B, A)` can take in 2D inputs `B, A` or inputs that are batches of 2D matrices. If the inputs are batches, then returns batched outputs `solution, LU`. Supports real-valued and complex-valued inputs. Note Irrespective of the original strides, the returned matrices `solution` and `LU` will be transposed, i.e. with strides like `B.contiguous().transpose(-1, -2).stride()` and `A.contiguous().transpose(-1, -2).stride()` respectively. Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – input matrix B of size (*, m, k), where * is zero or more batch dimensions. * **A** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – input square matrix of size (*, m, m), where * is zero or more batch dimensions. Keyword Arguments **out** (_(_[Tensor](../tensors#torch.Tensor "torch.Tensor") _,_[Tensor](../tensors#torch.Tensor "torch.Tensor") _)__,__optional_) – optional output tuple. Example: >>> A = torch.tensor([[6.80, -2.11, 5.66, 5.97, 8.23], ... [-6.05, -3.30, 5.36, -4.44, 1.08], ... [-0.45, 2.58, -2.70, 0.27, 9.04], ... [8.32, 2.71, 4.35, -7.17, 2.14], ... [-9.67, -5.14, -7.26, 6.08, -6.87]]).t() >>> B = torch.tensor([[4.02, 6.19, -8.22, -7.57, -3.03], ... [-1.56, 4.00, -8.67, 1.75, 2.86], ...
[9.81, -4.09, -4.57, -8.61, 8.99]]).t() >>> X, LU = torch.solve(B, A) >>> torch.dist(B, torch.mm(A, X)) tensor(1.00000e-06 * 7.0977) >>> # Batched solver example >>> A = torch.randn(2, 3, 1, 4, 4) >>> B = torch.randn(2, 3, 1, 4, 6) >>> X, LU = torch.solve(B, A) >>> torch.dist(B, A.matmul(X)) tensor(1.00000e-06 * 3.6386) # torch.sort `torch.sort(input, dim=-1, descending=False, *, out=None) -> (Tensor, LongTensor)` Sorts the elements of the `input` tensor along a given dimension in ascending order by value. If `dim` is not given, the last dimension of the `input` is chosen. If `descending` is `True` then the elements are sorted in descending order by value. A namedtuple of (values, indices) is returned, where the `values` are the sorted values and `indices` are the indices of the elements in the original `input` tensor. Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. * **dim** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – the dimension to sort along * **descending** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – controls the sorting order (ascending or descending) Keyword Arguments **out** ([tuple](https://docs.python.org/3/library/stdtypes.html#tuple "\(in Python v3.9\)") _,__optional_) – the output tuple of (`Tensor`, `LongTensor`) that can be optionally given to be used as output buffers Example: >>> x = torch.randn(3, 4) >>> sorted, indices = torch.sort(x) >>> sorted tensor([[-0.2162, 0.0608, 0.6719, 2.3332], [-0.5793, 0.0061, 0.6058, 0.9497], [-0.5071, 0.3343, 0.9553, 1.0960]]) >>> indices tensor([[ 1, 0, 2, 3], [ 3, 1, 0, 2], [ 0, 3, 1, 2]]) >>> sorted, indices = torch.sort(x, 0) >>> sorted tensor([[-0.5071, -0.2162, 0.6719, -0.5793], [ 0.0608, 0.0061, 0.9497, 0.3343], [ 0.6058, 0.9553, 1.0960, 2.3332]]) >>> indices tensor([[ 2, 0, 0, 1], [ 0, 1, 1, 2], [ 1, 2, 2, 0]]) # torch.sparse_coo_tensor `torch.sparse_coo_tensor(indices, values, size=None, *, dtype=None, device=None, requires_grad=False) → Tensor` Constructs a [sparse tensor in COO(rdinate) format](../sparse#sparse-coo-docs) with specified values at the given `indices`. Note This function returns an [uncoalesced tensor](../sparse#sparse-uncoalesced- coo-docs). Parameters * **indices** (_array_like_) – Initial data for the tensor. Can be a list, tuple, NumPy `ndarray`, scalar, and other types. Will be cast to a `torch.LongTensor` internally. The indices are the coordinates of the non-zero values in the matrix, and thus should be two-dimensional where the first dimension is the number of tensor dimensions and the second dimension is the number of non-zero values. * **values** (_array_like_) – Initial values for the tensor. Can be a list, tuple, NumPy `ndarray`, scalar, and other types. * **size** (list, tuple, or `torch.Size`, optional) – Size of the sparse tensor. If not provided the size will be inferred as the minimum size big enough to hold all non-zero elements. Keyword Arguments * **dtype** ([`torch.dtype`](../tensor_attributes#torch.torch.dtype "torch.torch.dtype"), optional) – the desired data type of returned tensor. Default: if None, infers data type from `values`. * **device** ([`torch.device`](../tensor_attributes#torch.torch.device "torch.torch.device"), optional) – the desired device of returned tensor. 
Default: if None, uses the current device for the default tensor type (see [`torch.set_default_tensor_type()`](torch.set_default_tensor_type#torch.set_default_tensor_type "torch.set_default_tensor_type")). `device` will be the CPU for CPU tensor types and the current CUDA device for CUDA tensor types. * **requires_grad** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – If autograd should record operations on the returned tensor. Default: `False`. Example: >>> i = torch.tensor([[0, 1, 1], ... [2, 0, 2]]) >>> v = torch.tensor([3, 4, 5], dtype=torch.float32) >>> torch.sparse_coo_tensor(i, v, [2, 4]) tensor(indices=tensor([[0, 1, 1], [2, 0, 2]]), values=tensor([3., 4., 5.]), size=(2, 4), nnz=3, layout=torch.sparse_coo) >>> torch.sparse_coo_tensor(i, v) # Shape inference tensor(indices=tensor([[0, 1, 1], [2, 0, 2]]), values=tensor([3., 4., 5.]), size=(2, 3), nnz=3, layout=torch.sparse_coo) >>> torch.sparse_coo_tensor(i, v, [2, 4], ... dtype=torch.float64, ... device=torch.device('cuda:0')) tensor(indices=tensor([[0, 1, 1], [2, 0, 2]]), values=tensor([3., 4., 5.]), device='cuda:0', size=(2, 4), nnz=3, dtype=torch.float64, layout=torch.sparse_coo) # Create an empty sparse tensor with the following invariants: # 1. sparse_dim + dense_dim = len(SparseTensor.shape) # 2. SparseTensor._indices().shape = (sparse_dim, nnz) # 3. SparseTensor._values().shape = (nnz, SparseTensor.shape[sparse_dim:]) # # For instance, to create an empty sparse tensor with nnz = 0, dense_dim = 0 and # sparse_dim = 1 (hence indices is a 2D tensor of shape = (1, 0)) >>> S = torch.sparse_coo_tensor(torch.empty([1, 0]), [], [1]) tensor(indices=tensor([], size=(1, 0)), values=tensor([], size=(0,)), size=(1,), nnz=0, layout=torch.sparse_coo) # and to create an empty sparse tensor with nnz = 0, dense_dim = 1 and # sparse_dim = 1 >>> S = torch.sparse_coo_tensor(torch.empty([1, 0]), torch.empty([0, 2]), [1, 2]) tensor(indices=tensor([], size=(1, 0)), values=tensor([], size=(0, 2)), size=(1, 2), nnz=0, layout=torch.sparse_coo) # torch.split `torch.split(tensor, split_size_or_sections, dim=0)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/functional.html#split) Splits the tensor into chunks. Each chunk is a view of the original tensor. If `split_size_or_sections` is an integer type, then [`tensor`](torch.tensor#torch.tensor "torch.tensor") will be split into equally sized chunks (if possible). Last chunk will be smaller if the tensor size along the given dimension `dim` is not divisible by `split_size`. If `split_size_or_sections` is a list, then [`tensor`](torch.tensor#torch.tensor "torch.tensor") will be split into `len(split_size_or_sections)` chunks with sizes in `dim` according to `split_size_or_sections`. Parameters * **tensor** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – tensor to split. * **split_size_or_sections** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _) or_ _(_[list](https://docs.python.org/3/library/stdtypes.html#list "\(in Python v3.9\)") _(_[int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _)_) – size of a single chunk or list of sizes for each chunk * **dim** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – dimension along which to split the tensor. 
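Because each chunk is a view of `tensor`, writing into a chunk also modifies the original tensor. The following is an illustrative sketch of that behavior (it is not part of the reference example, which follows below):

>>> import torch
>>> t = torch.arange(6)
>>> first, second = torch.split(t, 3)
>>> first.fill_(-1)        # in-place write through the view
tensor([-1, -1, -1])
>>> t                      # the original tensor reflects the change
tensor([-1, -1, -1,  3,  4,  5])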
Example:: >>> a = torch.arange(10).reshape(5,2) >>> a tensor([[0, 1], [2, 3], [4, 5], [6, 7], [8, 9]]) >>> torch.split(a, 2) (tensor([[0, 1], [2, 3]]), tensor([[4, 5], [6, 7]]), tensor([[8, 9]])) >>> torch.split(a, [1,4]) (tensor([[0, 1]]), tensor([[2, 3], [4, 5], [6, 7], [8, 9]])) # torch.sqrt `torch.sqrt(input, *, out=None) → Tensor` Returns a new tensor with the square-root of the elements of `input`. outi=inputi\text{out}_{i} = \sqrt{\text{input}_{i}} Parameters **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. Example: >>> a = torch.randn(4) >>> a tensor([-2.0755, 1.0226, 0.0831, 0.4806]) >>> torch.sqrt(a) tensor([ nan, 1.0112, 0.2883, 0.6933]) # torch.square `torch.square(input, *, out=None) → Tensor` Returns a new tensor with the square of the elements of `input`. Parameters **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. Example: >>> a = torch.randn(4) >>> a tensor([-2.0755, 1.0226, 0.0831, 0.4806]) >>> torch.square(a) tensor([ 4.3077, 1.0457, 0.0069, 0.2310]) # torch.squeeze `torch.squeeze(input, dim=None, *, out=None) → Tensor` Returns a tensor with all the dimensions of `input` of size `1` removed. For example, if `input` is of shape: (A×1×B×C×1×D)(A \times 1 \times B \times C \times 1 \times D) then the `out` tensor will be of shape: (A×B×C×D)(A \times B \times C \times D) . When `dim` is given, a squeeze operation is done only in the given dimension. If `input` is of shape: (A×1×B)(A \times 1 \times B) , `squeeze(input, 0)` leaves the tensor unchanged, but `squeeze(input, 1)` will squeeze the tensor to the shape (A×B)(A \times B) . Note The returned tensor shares the storage with the input tensor, so changing the contents of one will change the contents of the other. Warning If the tensor has a batch dimension of size 1, then `squeeze(input)` will also remove the batch dimension, which can lead to unexpected errors. Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. * **dim** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – if given, the input will be squeezed only in this dimension Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. Example: >>> x = torch.zeros(2, 1, 2, 1, 2) >>> x.size() torch.Size([2, 1, 2, 1, 2]) >>> y = torch.squeeze(x) >>> y.size() torch.Size([2, 2, 2]) >>> y = torch.squeeze(x, 0) >>> y.size() torch.Size([2, 1, 2, 1, 2]) >>> y = torch.squeeze(x, 1) >>> y.size() torch.Size([2, 2, 1, 2]) # torch.stack `torch.stack(tensors, dim=0, *, out=None) → Tensor` Concatenates a sequence of tensors along a new dimension. All tensors need to be of the same size. Parameters * **tensors** (_sequence of Tensors_) – sequence of tensors to concatenate * **dim** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – dimension to insert. Has to be between 0 and the number of dimensions of concatenated tensors (inclusive) Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. # torch.std `torch.std(input, unbiased=True) → Tensor` Returns the standard-deviation of all elements in the `input` tensor. 
If `unbiased` is `False`, then the standard-deviation will be calculated via the biased estimator. Otherwise, Bessel’s correction will be used. Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. * **unbiased** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – whether to use the unbiased estimation or not Example: >>> a = torch.randn(1, 3) >>> a tensor([[-0.8166, -1.3802, -0.3560]]) >>> torch.std(a) tensor(0.5130) `torch.std(input, dim, unbiased=True, keepdim=False, *, out=None) → Tensor` Returns the standard-deviation of each row of the `input` tensor in the dimension `dim`. If `dim` is a list of dimensions, reduce over all of them. If `keepdim` is `True`, the output tensor is of the same size as `input` except in the dimension(s) `dim` where it is of size 1. Otherwise, `dim` is squeezed (see [`torch.squeeze()`](torch.squeeze#torch.squeeze "torch.squeeze")), resulting in the output tensor having 1 (or `len(dim)`) fewer dimension(s). If `unbiased` is `False`, then the standard-deviation will be calculated via the biased estimator. Otherwise, Bessel’s correction will be used. Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. * **dim** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _or_ _tuple of python:ints_) – the dimension or dimensions to reduce. * **unbiased** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – whether to use the unbiased estimation or not * **keepdim** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – whether the output tensor has `dim` retained or not. Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. Example: >>> a = torch.randn(4, 4) >>> a tensor([[ 0.2035, 1.2959, 1.8101, -0.4644], [ 1.5027, -0.3270, 0.5905, 0.6538], [-1.5745, 1.3330, -0.5596, -0.6548], [ 0.1264, -0.5080, 1.6420, 0.1992]]) >>> torch.std(a, dim=1) tensor([ 1.0311, 0.7477, 1.2204, 0.9087]) # torch.std_mean `torch.std_mean(input, unbiased=True) -> (Tensor, Tensor)` Returns the standard-deviation and mean of all elements in the `input` tensor. If `unbiased` is `False`, then the standard-deviation will be calculated via the biased estimator. Otherwise, Bessel’s correction will be used. Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. * **unbiased** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – whether to use the unbiased estimation or not Example: >>> a = torch.randn(1, 3) >>> a tensor([[0.3364, 0.3591, 0.9462]]) >>> torch.std_mean(a) (tensor(0.3457), tensor(0.5472)) `torch.std_mean(input, dim, unbiased=True, keepdim=False) -> (Tensor, Tensor)` Returns the standard-deviation and mean of each row of the `input` tensor in the dimension `dim`. If `dim` is a list of dimensions, reduce over all of them. If `keepdim` is `True`, the output tensor is of the same size as `input` except in the dimension(s) `dim` where it is of size 1. Otherwise, `dim` is squeezed (see [`torch.squeeze()`](torch.squeeze#torch.squeeze "torch.squeeze")), resulting in the output tensor having 1 (or `len(dim)`) fewer dimension(s). If `unbiased` is `False`, then the standard-deviation will be calculated via the biased estimator. Otherwise, Bessel’s correction will be used. Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. 
* **dim** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _or_ _tuple of python:ints_) – the dimension or dimensions to reduce. * **unbiased** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – whether to use the unbiased estimation or not * **keepdim** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – whether the output tensor has `dim` retained or not. Example: >>> a = torch.randn(4, 4) >>> a tensor([[ 0.5648, -0.5984, -1.2676, -1.4471], [ 0.9267, 1.0612, 1.1050, -0.6014], [ 0.0154, 1.9301, 0.0125, -1.0904], [-1.9711, -0.7748, -1.3840, 0.5067]]) >>> torch.std_mean(a, 1) (tensor([0.9110, 0.8197, 1.2552, 1.0608]), tensor([-0.6871, 0.6229, 0.2169, -0.9058])) # torch.stft `torch.stft(input, n_fft, hop_length=None, win_length=None, window=None, center=True, pad_mode='reflect', normalized=False, onesided=None, return_complex=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/functional.html#stft) Short-time Fourier transform (STFT). Warning From version 1.8.0, `return_complex` must always be given explicitly for real inputs and `return_complex=False` has been deprecated. Strongly prefer `return_complex=True`, as in a future PyTorch release this function will only return complex tensors. Note that [`torch.view_as_real()`](torch.view_as_real#torch.view_as_real "torch.view_as_real") can be used to recover a real tensor with an extra last dimension for real and imaginary components. The STFT computes the Fourier transform of short overlapping windows of the input, giving the frequency components of the signal as they change over time. The interface of this function is modeled after the [librosa](https://librosa.org/doc/latest/generated/librosa.stft.html) stft function. Ignoring the optional batch dimension, this method computes the following expression: X[m, \omega] = \sum_{k = 0}^{\text{win\_length}-1} \text{window}[k]\ \text{input}[m \times \text{hop\_length} + k]\ \exp\left(- j \frac{2 \pi \cdot \omega k}{\text{win\_length}}\right), where m is the index of the sliding window and \omega is the frequency, 0 \leq \omega < \text{n\_fft}. # torch.sub `torch.sub(input, other, *, alpha=1, out=None) → Tensor` Subtracts `other`, scaled by `alpha`, from `input`: \text{out}_i = \text{input}_i - \text{alpha} \times \text{other}_i Example: >>> a = torch.tensor((1, 2)) >>> b = torch.tensor((0, 1)) >>> torch.sub(a, b, alpha=2) tensor([1, 0]) # torch.subtract `torch.subtract(input, other, *, alpha=1, out=None) → Tensor` Alias for [`torch.sub()`](torch.sub#torch.sub "torch.sub"). # torch.sum `torch.sum(input, *, dtype=None) → Tensor` Returns the sum of all elements in the `input` tensor. Parameters **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. Keyword Arguments **dtype** ([`torch.dtype`](../tensor_attributes#torch.torch.dtype "torch.torch.dtype"), optional) – the desired data type of returned tensor. If specified, the input tensor is cast to `dtype` before the operation is performed. This is useful for preventing data type overflows. Default: None. Example: >>> a = torch.randn(1, 3) >>> a tensor([[ 0.1133, -0.9567, 0.2958]]) >>> torch.sum(a) tensor(-0.5475) `torch.sum(input, dim, keepdim=False, *, dtype=None) → Tensor` Returns the sum of each row of the `input` tensor in the given dimension `dim`. If `dim` is a list of dimensions, reduce over all of them. If `keepdim` is `True`, the output tensor is of the same size as `input` except in the dimension(s) `dim` where it is of size 1.
Otherwise, `dim` is squeezed (see [`torch.squeeze()`](torch.squeeze#torch.squeeze "torch.squeeze")), resulting in the output tensor having 1 (or `len(dim)`) fewer dimension(s). Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. * **dim** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _or_ _tuple of python:ints_) – the dimension or dimensions to reduce. * **keepdim** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – whether the output tensor has `dim` retained or not. Keyword Arguments **dtype** ([`torch.dtype`](../tensor_attributes#torch.torch.dtype "torch.torch.dtype"), optional) – the desired data type of returned tensor. If specified, the input tensor is casted to `dtype` before the operation is performed. This is useful for preventing data type overflows. Default: None. Example: >>> a = torch.randn(4, 4) >>> a tensor([[ 0.0569, -0.2475, 0.0737, -0.3429], [-0.2993, 0.9138, 0.9337, -1.6864], [ 0.1132, 0.7892, -0.1003, 0.5688], [ 0.3637, -0.9906, -0.4752, -1.5197]]) >>> torch.sum(a, 1) tensor([-0.4598, -0.1381, 1.3708, -2.6217]) >>> b = torch.arange(4 * 5 * 6).view(4, 5, 6) >>> torch.sum(b, (2, 1)) tensor([ 435., 1335., 2235., 3135.]) # torch.svd `torch.svd(input, some=True, compute_uv=True, *, out=None) -> (Tensor, Tensor, Tensor)` Computes the singular value decomposition of either a matrix or batch of matrices `input`. The singular value decomposition is represented as a namedtuple (`U,S,V`), such that `input` = `U` diag(`S`) `Vᴴ`, where `Vᴴ` is the transpose of `V` for the real-valued inputs, or the conjugate transpose of `V` for the complex-valued inputs. If `input` is a batch of tensors, then `U`, `S`, and `V` are also batched with the same batch dimensions as `input`. If `some` is `True` (default), the method returns the reduced singular value decomposition i.e., if the last two dimensions of `input` are `m` and `n`, then the returned `U` and `V` matrices will contain only min(`n, m`) orthonormal columns. If `compute_uv` is `False`, the returned `U` and `V` will be zero-filled matrices of shape `(m × m)` and `(n × n)` respectively, and the same device as `input`. The `some` argument has no effect when `compute_uv` is `False`. Supports input of float, double, cfloat and cdouble data types. The dtypes of `U` and `V` are the same as `input`’s. `S` will always be real-valued, even if `input` is complex. Warning `torch.svd()` is deprecated. Please use [`torch.linalg.svd()`](../linalg#torch.linalg.svd "torch.linalg.svd") instead, which is similar to NumPy’s `numpy.linalg.svd`. Note Differences with [`torch.linalg.svd()`](../linalg#torch.linalg.svd "torch.linalg.svd"): * `some` is the opposite of [`torch.linalg.svd()`](../linalg#torch.linalg.svd "torch.linalg.svd")’s `full_matricies`. Note that default value for both is `True`, so the default behavior is effectively the opposite. * `torch.svd()` returns `V`, whereas [`torch.linalg.svd()`](../linalg#torch.linalg.svd "torch.linalg.svd") returns `Vᴴ`. * If `compute_uv=False`, `torch.svd()` returns zero-filled tensors for `U` and `Vh`, whereas [`torch.linalg.svd()`](../linalg#torch.linalg.svd "torch.linalg.svd") returns empty tensors. Note The singular values are returned in descending order. If `input` is a batch of matrices, then the singular values of each matrix in the batch is returned in descending order. 
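Given the deprecation warning and the differences listed above, the following is a minimal migration sketch (illustrative only; it assumes the `torch.linalg.svd` API with the `full_matrices` argument referenced in the note above):

>>> import torch
>>> a = torch.randn(5, 3)
>>> u, s, v = torch.svd(a, some=True)                        # deprecated API; returns V
>>> u2, s2, vh = torch.linalg.svd(a, full_matrices=False)    # replacement API; returns Vᴴ
>>> v_from_linalg = vh.transpose(-2, -1).conj()              # recover V in the torch.svd convention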
Note The implementation of SVD on CPU uses the LAPACK routine `?gesdd` (a divide- and-conquer algorithm) instead of `?gesvd` for speed. Analogously, the SVD on GPU uses the cuSOLVER routines `gesvdj` and `gesvdjBatched` on CUDA 10.1.243 and later, and uses the MAGMA routine `gesdd` on earlier versions of CUDA. Note The returned matrix `U` will be transposed, i.e. with strides `U.contiguous().transpose(-2, -1).stride()`. Note Gradients computed using `U` and `V` may be unstable if `input` is not full rank or has non-unique singular values. Note When `some` = `False`, the gradients on `U[..., :, min(m, n):]` and `V[..., :, min(m, n):]` will be ignored in backward as those vectors can be arbitrary bases of the subspaces. Note The `S` tensor can only be used to compute gradients if `compute_uv` is True. Note With the complex-valued input the backward operation works correctly only for gauge invariant loss functions. Please look at [Gauge problem in AD](https://re-ra.xyz/Gauge-Problem-in-Automatic-Differentiation/) for more details. Note Since `U` and `V` of an SVD is not unique, each vector can be multiplied by an arbitrary phase factor eiϕe^{i \phi} while the SVD result is still correct. Different platforms, like Numpy, or inputs on different device types, may produce different `U` and `V` tensors. Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor of size `(*, m, n)` where `*` is zero or more batch dimensions consisting of `(m × n)` matrices. * **some** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – controls whether to compute the reduced or full decomposition, and consequently the shape of returned `U` and `V`. Defaults to True. * **compute_uv** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – option whether to compute `U` and `V` or not. Defaults to True. Keyword Arguments **out** ([tuple](https://docs.python.org/3/library/stdtypes.html#tuple "\(in Python v3.9\)") _,__optional_) – the output tuple of tensors Example: >>> a = torch.randn(5, 3) >>> a tensor([[ 0.2364, -0.7752, 0.6372], [ 1.7201, 0.7394, -0.0504], [-0.3371, -1.0584, 0.5296], [ 0.3550, -0.4022, 1.5569], [ 0.2445, -0.0158, 1.1414]]) >>> u, s, v = torch.svd(a) >>> u tensor([[ 0.4027, 0.0287, 0.5434], [-0.1946, 0.8833, 0.3679], [ 0.4296, -0.2890, 0.5261], [ 0.6604, 0.2717, -0.2618], [ 0.4234, 0.2481, -0.4733]]) >>> s tensor([2.3289, 2.0315, 0.7806]) >>> v tensor([[-0.0199, 0.8766, 0.4809], [-0.5080, 0.4054, -0.7600], [ 0.8611, 0.2594, -0.4373]]) >>> torch.dist(a, torch.mm(torch.mm(u, torch.diag(s)), v.t())) tensor(8.6531e-07) >>> a_big = torch.randn(7, 5, 3) >>> u, s, v = torch.svd(a_big) >>> torch.dist(a_big, torch.matmul(torch.matmul(u, torch.diag_embed(s)), v.transpose(-2, -1))) tensor(2.6503e-06) # torch.svd_lowrank `torch.svd_lowrank(A, q=6, niter=2, M=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/_lowrank.html#svd_lowrank) Return the singular value decomposition `(U, S, V)` of a matrix, batches of matrices, or a sparse matrix AA such that A≈Udiag(S)VTA \approx U diag(S) V^T . In case MM is given, then SVD is computed for the matrix A−MA - M . Note The implementation is based on the Algorithm 5.1 from Halko et al, 2009. Note To obtain repeatable results, reset the seed for the pseudorandom number generator Note The input is assumed to be a low-rank matrix. 
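This entry ships without an example, so the following is a minimal usage sketch (illustrative values; shapes follow the parameters listed below):

>>> import torch
>>> _ = torch.manual_seed(0)                          # reset the RNG for repeatable results (see the note above)
>>> A = torch.randn(100, 20) @ torch.randn(20, 80)    # a matrix of shape (100, 80) with rank at most 20
>>> U, S, V = torch.svd_lowrank(A, q=25)              # q slightly overestimates the rank
>>> U.shape, S.shape, V.shape
(torch.Size([100, 25]), torch.Size([25]), torch.Size([80, 25]))
>>> err = torch.dist(A, U @ torch.diag(S) @ V.t())    # small, since rank(A) <= 20 < q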
Note In general, use the full-rank SVD implementation `torch.svd` for dense matrices due to its 10-fold higher performance characteristics. The low-rank SVD will be useful for huge sparse matrices that `torch.svd` cannot handle. Parameters * **A** (_Tensor_) – the input tensor of size (*, m, n) * **q** (_int, optional_) – a slightly overestimated rank of A. * **niter** (_int, optional_) – the number of subspace iterations to conduct; niter must be a nonnegative integer, and defaults to 2 * **M** (_Tensor, optional_) – the input tensor’s mean of size (*, 1, n). References * Nathan Halko, Per-Gunnar Martinsson, and Joel Tropp, Finding structure with randomness: probabilistic algorithms for constructing approximate matrix decompositions, arXiv:0909.4061 [math.NA; math.PR], 2009 (available at [arXiv](https://arxiv.org/abs/0909.4061)). # torch.swapaxes `torch.swapaxes(input, axis0, axis1) → Tensor` Alias for [`torch.transpose()`](torch.transpose#torch.transpose "torch.transpose"). This function is equivalent to NumPy’s swapaxes function. Examples: >>> x = torch.tensor([[[0,1],[2,3]],[[4,5],[6,7]]]) >>> x tensor([[[0, 1], [2, 3]], [[4, 5], [6, 7]]]) >>> torch.swapaxes(x, 0, 1) tensor([[[0, 1], [4, 5]], [[2, 3], [6, 7]]]) >>> torch.swapaxes(x, 0, 2) tensor([[[0, 4], [2, 6]], [[1, 5], [3, 7]]]) # torch.swapdims `torch.swapdims(input, dim0, dim1) → Tensor` Alias for [`torch.transpose()`](torch.transpose#torch.transpose "torch.transpose"). This function is equivalent to NumPy’s swapaxes function. Examples: >>> x = torch.tensor([[[0,1],[2,3]],[[4,5],[6,7]]]) >>> x tensor([[[0, 1], [2, 3]], [[4, 5], [6, 7]]]) >>> torch.swapdims(x, 0, 1) tensor([[[0, 1], [4, 5]], [[2, 3], [6, 7]]]) >>> torch.swapdims(x, 0, 2) tensor([[[0, 4], [2, 6]], [[1, 5], [3, 7]]]) # torch.symeig `torch.symeig(input, eigenvectors=False, upper=True, *, out=None) -> (Tensor, Tensor)` This function returns eigenvalues and eigenvectors of a real symmetric matrix `input` or a batch of real symmetric matrices, represented by a namedtuple (eigenvalues, eigenvectors). This function calculates all eigenvalues (and vectors) of `input` such that \text{input} = V \text{diag}(e) V^T . The boolean argument `eigenvectors` defines computation of both eigenvectors and eigenvalues or eigenvalues only. If it is `False`, only eigenvalues are computed. If it is `True`, both eigenvalues and eigenvectors are computed. Since the input matrix `input` is supposed to be symmetric, only the upper triangular portion is used by default. If `upper` is `False`, then the lower triangular portion is used. Note The eigenvalues are returned in ascending order. If `input` is a batch of matrices, then the eigenvalues of each matrix in the batch are returned in ascending order. Note Irrespective of the original strides, the returned matrix `V` will be transposed, i.e. with strides `V.contiguous().transpose(-1, -2).stride()`. Warning Extra care needs to be taken when backpropagating through the outputs. Such an operation is only stable when all eigenvalues are distinct and becomes less stable the smaller \min_{i \neq j} |\lambda_i - \lambda_j| is. Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor of size (*, n, n) where `*` is zero or more batch dimensions consisting of symmetric matrices.
* **eigenvectors** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – controls whether eigenvectors have to be computed * **upper** (_boolean_ _,__optional_) – controls whether to consider upper-triangular or lower-triangular region Keyword Arguments **out** ([tuple](https://docs.python.org/3/library/stdtypes.html#tuple "\(in Python v3.9\)") _,__optional_) – the output tuple of (Tensor, Tensor) Returns A namedtuple (eigenvalues, eigenvectors) containing * **eigenvalues** (_Tensor_): Shape (∗,m)(*, m) . The eigenvalues in ascending order. * **eigenvectors** (_Tensor_): Shape (∗,m,m)(*, m, m) . If `eigenvectors=False`, it’s an empty tensor. Otherwise, this tensor contains the orthonormal eigenvectors of the `input`. Return type ([Tensor](../tensors#torch.Tensor "torch.Tensor"), [Tensor](../tensors#torch.Tensor "torch.Tensor")) Examples: >>> a = torch.randn(5, 5) >>> a = a + a.t() # To make a symmetric >>> a tensor([[-5.7827, 4.4559, -0.2344, -1.7123, -1.8330], [ 4.4559, 1.4250, -2.8636, -3.2100, -0.1798], [-0.2344, -2.8636, 1.7112, -5.5785, 7.1988], [-1.7123, -3.2100, -5.5785, -2.6227, 3.1036], [-1.8330, -0.1798, 7.1988, 3.1036, -5.1453]]) >>> e, v = torch.symeig(a, eigenvectors=True) >>> e tensor([-13.7012, -7.7497, -2.3163, 5.2477, 8.1050]) >>> v tensor([[ 0.1643, 0.9034, -0.0291, 0.3508, 0.1817], [-0.2417, -0.3071, -0.5081, 0.6534, 0.4026], [-0.5176, 0.1223, -0.0220, 0.3295, -0.7798], [-0.4850, 0.2695, -0.5773, -0.5840, 0.1337], [ 0.6415, -0.0447, -0.6381, -0.0193, -0.4230]]) >>> a_big = torch.randn(5, 2, 2) >>> a_big = a_big + a_big.transpose(-2, -1) # To make a_big symmetric >>> e, v = a_big.symeig(eigenvectors=True) >>> torch.allclose(torch.matmul(v, torch.matmul(e.diag_embed(), v.transpose(-2, -1))), a_big) True # torch.t `torch.t(input) → Tensor` Expects `input` to be <= 2-D tensor and transposes dimensions 0 and 1. 0-D and 1-D tensors are returned as is. When input is a 2-D tensor this is equivalent to `transpose(input, 0, 1)`. Parameters **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. Example: >>> x = torch.randn(()) >>> x tensor(0.1995) >>> torch.t(x) tensor(0.1995) >>> x = torch.randn(3) >>> x tensor([ 2.4320, -0.4608, 0.7702]) >>> torch.t(x) tensor([ 2.4320, -0.4608, 0.7702]) >>> x = torch.randn(2, 3) >>> x tensor([[ 0.4875, 0.9158, -0.5872], [ 0.3938, -0.6929, 0.6932]]) >>> torch.t(x) tensor([[ 0.4875, 0.3938], [ 0.9158, -0.6929], [-0.5872, 0.6932]]) # torch.take `torch.take(input, index) → Tensor` Returns a new tensor with the elements of `input` at the given indices. The input tensor is treated as if it were viewed as a 1-D tensor. The result takes the same shape as the indices. Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. * **indices** (_LongTensor_) – the indices into tensor Example: >>> src = torch.tensor([[4, 3, 5], ... [6, 7, 8]]) >>> torch.take(src, torch.tensor([0, 2, 5])) tensor([ 4, 5, 8]) # torch.tan `torch.tan(input, *, out=None) → Tensor` Returns a new tensor with the tangent of the elements of `input`. outi=tan⁡(inputi)\text{out}_{i} = \tan(\text{input}_{i}) Parameters **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. 
Example: >>> a = torch.randn(4) >>> a tensor([-1.2027, -1.7687, 0.4412, -1.3856]) >>> torch.tan(a) tensor([-2.5930, 4.9859, 0.4722, -5.3366]) # torch.tanh `torch.tanh(input, *, out=None) → Tensor` Returns a new tensor with the hyperbolic tangent of the elements of `input`. outi=tanh⁡(inputi)\text{out}_{i} = \tanh(\text{input}_{i}) Parameters **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. Example: >>> a = torch.randn(4) >>> a tensor([ 0.8986, -0.7279, 1.1745, 0.2611]) >>> torch.tanh(a) tensor([ 0.7156, -0.6218, 0.8257, 0.2553]) # torch.tensor `torch.tensor(data, *, dtype=None, device=None, requires_grad=False, pin_memory=False) → Tensor` Constructs a tensor with `data`. Warning `torch.tensor()` always copies `data`. If you have a Tensor `data` and want to avoid a copy, use [`torch.Tensor.requires_grad_()`](../tensors#torch.Tensor.requires_grad_ "torch.Tensor.requires_grad_") or [`torch.Tensor.detach()`](../autograd#torch.Tensor.detach "torch.Tensor.detach"). If you have a NumPy `ndarray` and want to avoid a copy, use [`torch.as_tensor()`](torch.as_tensor#torch.as_tensor "torch.as_tensor"). Warning When data is a tensor `x`, `torch.tensor()` reads out ‘the data’ from whatever it is passed, and constructs a leaf variable. Therefore `torch.tensor(x)` is equivalent to `x.clone().detach()` and `torch.tensor(x, requires_grad=True)` is equivalent to `x.clone().detach().requires_grad_(True)`. The equivalents using `clone()` and `detach()` are recommended. Parameters **data** (_array_like_) – Initial data for the tensor. Can be a list, tuple, NumPy `ndarray`, scalar, and other types. Keyword Arguments * **dtype** ([`torch.dtype`](../tensor_attributes#torch.torch.dtype "torch.torch.dtype"), optional) – the desired data type of returned tensor. Default: if `None`, infers data type from `data`. * **device** ([`torch.device`](../tensor_attributes#torch.torch.device "torch.torch.device"), optional) – the desired device of returned tensor. Default: if `None`, uses the current device for the default tensor type (see [`torch.set_default_tensor_type()`](torch.set_default_tensor_type#torch.set_default_tensor_type "torch.set_default_tensor_type")). `device` will be the CPU for CPU tensor types and the current CUDA device for CUDA tensor types. * **requires_grad** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – If autograd should record operations on the returned tensor. Default: `False`. * **pin_memory** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – If set, returned tensor would be allocated in the pinned memory. Works only for CPU tensors. Default: `False`. Example: >>> torch.tensor([[0.1, 1.2], [2.2, 3.1], [4.9, 5.2]]) tensor([[ 0.1000, 1.2000], [ 2.2000, 3.1000], [ 4.9000, 5.2000]]) >>> torch.tensor([0, 1]) # Type inference on data tensor([ 0, 1]) >>> torch.tensor([[0.11111, 0.222222, 0.3333333]], ... dtype=torch.float64, ... 
device=torch.device('cuda:0')) # creates a torch.cuda.DoubleTensor tensor([[ 0.1111, 0.2222, 0.3333]], dtype=torch.float64, device='cuda:0') >>> torch.tensor(3.14159) # Create a scalar (zero-dimensional tensor) tensor(3.1416) >>> torch.tensor([]) # Create an empty tensor (of size (0,)) tensor([]) # torch.tensor_split `torch.tensor_split(input, indices_or_sections, dim=0) → List of Tensors` Splits a tensor into multiple sub-tensors, all of which are views of `input`, along dimension `dim` according to the indices or number of sections specified by `indices_or_sections`. This function is based on NumPy’s [`numpy.array_split()`](https://numpy.org/doc/stable/reference/generated/numpy.array_split.html#numpy.array_split "\(in NumPy v1.20\)"). Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the tensor to split * **indices_or_sections** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,_[int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _or_[list](https://docs.python.org/3/library/stdtypes.html#list "\(in Python v3.9\)") _or_ _tuple of python:ints_) – If `indices_or_sections` is an integer `n` or a zero dimensional long tensor with value `n`, `input` is split into `n` sections along dimension `dim`. If `input` is divisible by `n` along dimension `dim`, each section will be of equal size, `input.size(dim) / n`. If `input` is not divisible by `n`, the sizes of the first `int(input.size(dim) % n)` sections will have size `int(input.size(dim) / n) + 1`, and the rest will have size `int(input.size(dim) / n)`. If `indices_or_sections` is a list or tuple of ints, or a one-dimensional long tensor, then `input` is split along dimension `dim` at each of the indices in the list, tuple or tensor. For instance, `indices_or_sections=[2, 3]` and `dim=0` would result in the tensors `input[:2]`, `input[2:3]`, and `input[3:]`. If indices_or_sections is a tensor, it must be a zero-dimensional or one- dimensional long tensor on the CPU. * **dim** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – dimension along which to split the tensor. Default: `0` Example:: >>> x = torch.arange(8) >>> torch.tensor_split(x, 3) (tensor([0, 1, 2]), tensor([3, 4, 5]), tensor([6, 7])) >>> x = torch.arange(7) >>> torch.tensor_split(x, 3) (tensor([0, 1, 2]), tensor([3, 4]), tensor([5, 6])) >>> torch.tensor_split(x, (1, 6)) (tensor([0]), tensor([1, 2, 3, 4, 5]), tensor([6])) >>> x = torch.arange(14).reshape(2, 7) >>> x tensor([[ 0, 1, 2, 3, 4, 5, 6], [ 7, 8, 9, 10, 11, 12, 13]]) >>> torch.tensor_split(x, 3, dim=1) (tensor([[0, 1, 2], [7, 8, 9]]), tensor([[ 3, 4], [10, 11]]), tensor([[ 5, 6], [12, 13]])) >>> torch.tensor_split(x, (1, 6), dim=1) (tensor([[0], [7]]), tensor([[ 1, 2, 3, 4, 5], [ 8, 9, 10, 11, 12]]), tensor([[ 6], [13]])) # torch.tensordot `torch.tensordot(a, b, dims=2, out=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/functional.html#tensordot) Returns a contraction of a and b over multiple dimensions. `tensordot` implements a generalized matrix product. 
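For intuition: with 2-D inputs and a single contracted dimension, `tensordot` reduces to an ordinary matrix product. An illustrative sketch (not part of the reference examples further below):

>>> import torch
>>> a = torch.randn(2, 3)
>>> b = torch.randn(3, 4)
>>> torch.allclose(torch.tensordot(a, b, dims=1), a @ b)           # contract last dim of a with first dim of b
True
>>> torch.allclose(torch.tensordot(a, b, dims=([1], [0])), a @ b)  # the explicit list form names the same pair
True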
Parameters * **a** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – Left tensor to contract * **b** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – Right tensor to contract * **dims** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _or_ _Tuple_ _[__List_ _[_[int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _]__]__containing two lists_) – number of dimensions to contract or explicit lists of dimensions for `a` and `b` respectively When called with a non-negative integer argument `dims` = dd , and the number of dimensions of `a` and `b` is mm and nn , respectively, `tensordot()` computes ri0,...,im−d,id,...,in=∑k0,...,kd−1ai0,...,im−d,k0,...,kd−1×bk0,...,kd−1,id,...,in.r_{i_0,...,i_{m-d}, i_d,...,i_n} = \sum_{k_0,...,k_{d-1}} a_{i_0,...,i_{m-d},k_0,...,k_{d-1}} \times b_{k_0,...,k_{d-1}, i_d,...,i_n}. When called with `dims` of the list form, the given dimensions will be contracted in place of the last dd of `a` and the first dd of bb . The sizes in these dimensions must match, but `tensordot()` will deal with broadcasted dimensions. Examples: >>> a = torch.arange(60.).reshape(3, 4, 5) >>> b = torch.arange(24.).reshape(4, 3, 2) >>> torch.tensordot(a, b, dims=([1, 0], [0, 1])) tensor([[4400., 4730.], [4532., 4874.], [4664., 5018.], [4796., 5162.], [4928., 5306.]]) >>> a = torch.randn(3, 4, 5, device='cuda') >>> b = torch.randn(4, 5, 6, device='cuda') >>> c = torch.tensordot(a, b, dims=2).cpu() tensor([[ 8.3504, -2.5436, 6.2922, 2.7556, -1.0732, 3.2741], [ 3.3161, 0.0704, 5.0187, -0.4079, -4.3126, 4.8744], [ 0.8223, 3.9445, 3.2168, -0.2400, 3.4117, 1.7780]]) >>> a = torch.randn(3, 5, 4, 6) >>> b = torch.randn(6, 4, 5, 3) >>> torch.tensordot(a, b, dims=([2, 1, 3], [1, 2, 0])) tensor([[ 7.7193, -2.4867, -10.3204], [ 1.5513, -14.4737, -6.5113], [ -0.2850, 4.2573, -3.5997]]) # torch.tile `torch.tile(input, reps) → Tensor` Constructs a tensor by repeating the elements of `input`. The `reps` argument specifies the number of repetitions in each dimension. If `reps` specifies fewer dimensions than `input` has, then ones are prepended to `reps` until all dimensions are specified. For example, if `input` has shape (8, 6, 4, 2) and `reps` is (2, 2), then `reps` is treated as (1, 1, 2, 2). Analogously, if `input` has fewer dimensions than `reps` specifies, then `input` is treated as if it were unsqueezed at dimension zero until it has as many dimensions as `reps` specifies. For example, if `input` has shape (4, 2) and `reps` is (3, 3, 2, 2), then `input` is treated as if it had the shape (1, 1, 4, 2). Note This function is similar to NumPy’s tile function. Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the tensor whose elements to repeat. * **reps** ([tuple](https://docs.python.org/3/library/stdtypes.html#tuple "\(in Python v3.9\)")) – the number of repetitions per dimension. Example: >>> x = torch.tensor([1, 2, 3]) >>> x.tile((2,)) tensor([1, 2, 3, 1, 2, 3]) >>> y = torch.tensor([[1, 2], [3, 4]]) >>> torch.tile(y, (2, 2)) tensor([[1, 2, 1, 2], [3, 4, 3, 4], [1, 2, 1, 2], [3, 4, 3, 4]]) # torch.topk `torch.topk(input, k, dim=None, largest=True, sorted=True, *, out=None) -> (Tensor, LongTensor)` Returns the `k` largest elements of the given `input` tensor along a given dimension. If `dim` is not given, the last dimension of the `input` is chosen. If `largest` is `False` then the `k` smallest elements are returned. 
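For example (an illustrative sketch; the reference example below uses the default `largest=True`):

>>> import torch
>>> x = torch.arange(1., 6.)
>>> torch.topk(x, 3, largest=False)   # the 3 smallest elements, sorted
torch.return_types.topk(values=tensor([1., 2., 3.]), indices=tensor([0, 1, 2]))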
A namedtuple of `(values, indices)` is returned, where the `indices` are the indices of the elements in the original `input` tensor. The boolean option `sorted` if `True`, will make sure that the returned `k` elements are themselves sorted Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. * **k** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – the k in “top-k” * **dim** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – the dimension to sort along * **largest** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – controls whether to return largest or smallest elements * **sorted** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – controls whether to return the elements in sorted order Keyword Arguments **out** ([tuple](https://docs.python.org/3/library/stdtypes.html#tuple "\(in Python v3.9\)") _,__optional_) – the output tuple of (Tensor, LongTensor) that can be optionally given to be used as output buffers Example: >>> x = torch.arange(1., 6.) >>> x tensor([ 1., 2., 3., 4., 5.]) >>> torch.topk(x, 3) torch.return_types.topk(values=tensor([5., 4., 3.]), indices=tensor([4, 3, 2])) # torch.trace `torch.trace(input) → Tensor` Returns the sum of the elements of the diagonal of the input 2-D matrix. Example: >>> x = torch.arange(1., 10.).view(3, 3) >>> x tensor([[ 1., 2., 3.], [ 4., 5., 6.], [ 7., 8., 9.]]) >>> torch.trace(x) tensor(15.) # torch.transpose `torch.transpose(input, dim0, dim1) → Tensor` Returns a tensor that is a transposed version of `input`. The given dimensions `dim0` and `dim1` are swapped. The resulting `out` tensor shares its underlying storage with the `input` tensor, so changing the content of one would change the content of the other. Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. * **dim0** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – the first dimension to be transposed * **dim1** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – the second dimension to be transposed Example: >>> x = torch.randn(2, 3) >>> x tensor([[ 1.0028, -0.9893, 0.5809], [-0.1669, 0.7299, 0.4942]]) >>> torch.transpose(x, 0, 1) tensor([[ 1.0028, -0.1669], [-0.9893, 0.7299], [ 0.5809, 0.4942]]) # torch.trapz `torch.trapz(y, x, *, dim=-1) → Tensor` Estimate ∫ydx\int y\,dx along `dim`, using the trapezoid rule. Parameters * **y** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – The values of the function to integrate * **x** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – The points at which the function `y` is sampled. If `x` is not in ascending order, intervals on which it is decreasing contribute negatively to the estimated integral (i.e., the convention ∫abf=−∫baf\int_a^b f = -\int_b^a f is followed). * **dim** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – The dimension along which to integrate. By default, use the last dimension. Returns A Tensor with the same shape as the input, except with `dim` removed. Each element of the returned tensor represents the estimated integral ∫ydx\int y\,dx along `dim`. 
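As a hand-checkable sketch, independent of the reference example that follows: for y = [1, 2, 3] sampled at x = [0, 1, 2], the trapezoid rule gives (1 + 2)/2 + (2 + 3)/2 = 4.

>>> import torch
>>> y = torch.tensor([1., 2., 3.])
>>> x = torch.tensor([0., 1., 2.])
>>> torch.trapz(y, x)
tensor(4.)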
Example: >>> y = torch.randn((2, 3)) >>> y tensor([[-2.1156, 0.6857, -0.2700], [-1.2145, 0.5540, 2.0431]]) >>> x = torch.tensor([[1, 3, 4], [1, 2, 3]]) >>> torch.trapz(y, x) tensor([-1.2220, 0.9683]) `torch.trapz(y, *, dx=1, dim=-1) → Tensor` As above, but the sample points are spaced uniformly at a distance of `dx`. Parameters **y** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – The values of the function to integrate Keyword Arguments * **dx** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)")) – The distance between points at which `y` is sampled. * **dim** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – The dimension along which to integrate. By default, use the last dimension. Returns A Tensor with the same shape as the input, except with `dim` removed. Each element of the returned tensor represents the estimated integral ∫ydx\int y\,dx along `dim`. # torch.triangular_solve `torch.triangular_solve(input, A, upper=True, transpose=False, unitriangular=False) -> (Tensor, Tensor)` Solves a system of equations with a triangular coefficient matrix AA and multiple right-hand sides bb . In particular, solves AX=bAX = b and assumes AA is upper-triangular with the default keyword arguments. `torch.triangular_solve(b, A)` can take in 2D inputs `b, A` or inputs that are batches of 2D matrices. If the inputs are batches, then returns batched outputs `X` Supports real-valued and complex-valued inputs. Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – multiple right-hand sides of size (∗,m,k)(*, m, k) where ∗* is zero of more batch dimensions (bb ) * **A** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input triangular coefficient matrix of size (∗,m,m)(*, m, m) where ∗* is zero or more batch dimensions * **upper** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – whether to solve the upper-triangular system of equations (default) or the lower-triangular system of equations. Default: `True`. * **transpose** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – whether AA should be transposed before being sent into the solver. Default: `False`. * **unitriangular** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – whether AA is unit triangular. If True, the diagonal elements of AA are assumed to be 1 and not referenced from AA . Default: `False`. Returns A namedtuple `(solution, cloned_coefficient)` where `cloned_coefficient` is a clone of AA and `solution` is the solution XX to AX=bAX = b (or whatever variant of the system of equations, depending on the keyword arguments.) Examples: >>> A = torch.randn(2, 2).triu() >>> A tensor([[ 1.1527, -1.0753], [ 0.0000, 0.7986]]) >>> b = torch.randn(2, 3) >>> b tensor([[-0.0210, 2.3513, -1.5492], [ 1.5429, 0.7403, -1.0243]]) >>> torch.triangular_solve(b, A) torch.return_types.triangular_solve( solution=tensor([[ 1.7841, 2.9046, -2.5405], [ 1.9320, 0.9270, -1.2826]]), cloned_coefficient=tensor([[ 1.1527, -1.0753], [ 0.0000, 0.7986]])) # torch.tril `torch.tril(input, diagonal=0, *, out=None) → Tensor` Returns the lower triangular part of the matrix (2-D tensor) or batch of matrices `input`, the other elements of the result tensor `out` are set to 0. The lower triangular part of the matrix is defined as the elements on and below the diagonal. 
The argument [`diagonal`](torch.diagonal#torch.diagonal "torch.diagonal") controls which diagonal to consider. If [`diagonal`](torch.diagonal#torch.diagonal "torch.diagonal") = 0, all elements on and below the main diagonal are retained. A positive value includes just as many diagonals above the main diagonal, and similarly a negative value excludes just as many diagonals below the main diagonal. The main diagonal are the set of indices {(i,i)}\lbrace (i, i) \rbrace for i∈[0,min⁡{d1,d2}−1]i \in [0, \min\\{d_{1}, d_{2}\\} - 1] where d1,d2d_{1}, d_{2} are the dimensions of the matrix. Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. * **diagonal** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – the diagonal to consider Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. Example: >>> a = torch.randn(3, 3) >>> a tensor([[-1.0813, -0.8619, 0.7105], [ 0.0935, 0.1380, 2.2112], [-0.3409, -0.9828, 0.0289]]) >>> torch.tril(a) tensor([[-1.0813, 0.0000, 0.0000], [ 0.0935, 0.1380, 0.0000], [-0.3409, -0.9828, 0.0289]]) >>> b = torch.randn(4, 6) >>> b tensor([[ 1.2219, 0.5653, -0.2521, -0.2345, 1.2544, 0.3461], [ 0.4785, -0.4477, 0.6049, 0.6368, 0.8775, 0.7145], [ 1.1502, 3.2716, -1.1243, -0.5413, 0.3615, 0.6864], [-0.0614, -0.7344, -1.3164, -0.7648, -1.4024, 0.0978]]) >>> torch.tril(b, diagonal=1) tensor([[ 1.2219, 0.5653, 0.0000, 0.0000, 0.0000, 0.0000], [ 0.4785, -0.4477, 0.6049, 0.0000, 0.0000, 0.0000], [ 1.1502, 3.2716, -1.1243, -0.5413, 0.0000, 0.0000], [-0.0614, -0.7344, -1.3164, -0.7648, -1.4024, 0.0000]]) >>> torch.tril(b, diagonal=-1) tensor([[ 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000], [ 0.4785, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000], [ 1.1502, 3.2716, 0.0000, 0.0000, 0.0000, 0.0000], [-0.0614, -0.7344, -1.3164, 0.0000, 0.0000, 0.0000]]) # torch.tril_indices `torch.tril_indices(row, col, offset=0, *, dtype=torch.long, device='cpu', layout=torch.strided) → Tensor` Returns the indices of the lower triangular part of a `row`-by- `col` matrix in a 2-by-N Tensor, where the first row contains row coordinates of all indices and the second row contains column coordinates. Indices are ordered based on rows and then columns. The lower triangular part of the matrix is defined as the elements on and below the diagonal. The argument `offset` controls which diagonal to consider. If `offset` = 0, all elements on and below the main diagonal are retained. A positive value includes just as many diagonals above the main diagonal, and similarly a negative value excludes just as many diagonals below the main diagonal. The main diagonal are the set of indices {(i,i)}\lbrace (i, i) \rbrace for i∈[0,min⁡{d1,d2}−1]i \in [0, \min\\{d_{1}, d_{2}\\} - 1] where d1,d2d_{1}, d_{2} are the dimensions of the matrix. Note When running on CUDA, `row * col` must be less than 2592^{59} to prevent overflow during calculation. Parameters * **row** (`int`) – number of rows in the 2-D matrix. * **col** (`int`) – number of columns in the 2-D matrix. * **offset** (`int`) – diagonal offset from the main diagonal. Default: if not provided, 0. Keyword Arguments * **dtype** ([`torch.dtype`](../tensor_attributes#torch.torch.dtype "torch.torch.dtype"), optional) – the desired data type of returned tensor. Default: if `None`, `torch.long`. * **device** ([`torch.device`](../tensor_attributes#torch.torch.device "torch.torch.device"), optional) – the desired device of returned tensor. 
Default: if `None`, uses the current device for the default tensor type (see [`torch.set_default_tensor_type()`](torch.set_default_tensor_type#torch.set_default_tensor_type "torch.set_default_tensor_type")). `device` will be the CPU for CPU tensor types and the current CUDA device for CUDA tensor types. * **layout** ([`torch.layout`](../tensor_attributes#torch.torch.layout "torch.torch.layout"), optional) – currently only support `torch.strided`. Example:: >>> a = torch.tril_indices(3, 3) >>> a tensor([[0, 1, 1, 2, 2, 2], [0, 0, 1, 0, 1, 2]]) >>> a = torch.tril_indices(4, 3, -1) >>> a tensor([[1, 2, 2, 3, 3, 3], [0, 0, 1, 0, 1, 2]]) >>> a = torch.tril_indices(4, 3, 1) >>> a tensor([[0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3], [0, 1, 0, 1, 2, 0, 1, 2, 0, 1, 2]]) # torch.triu `torch.triu(input, diagonal=0, *, out=None) → Tensor` Returns the upper triangular part of a matrix (2-D tensor) or batch of matrices `input`, the other elements of the result tensor `out` are set to 0. The upper triangular part of the matrix is defined as the elements on and above the diagonal. The argument [`diagonal`](torch.diagonal#torch.diagonal "torch.diagonal") controls which diagonal to consider. If [`diagonal`](torch.diagonal#torch.diagonal "torch.diagonal") = 0, all elements on and above the main diagonal are retained. A positive value excludes just as many diagonals above the main diagonal, and similarly a negative value includes just as many diagonals below the main diagonal. The main diagonal are the set of indices {(i,i)}\lbrace (i, i) \rbrace for i∈[0,min⁡{d1,d2}−1]i \in [0, \min\\{d_{1}, d_{2}\\} - 1] where d1,d2d_{1}, d_{2} are the dimensions of the matrix. Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. * **diagonal** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – the diagonal to consider Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. 
Example: >>> a = torch.randn(3, 3) >>> a tensor([[ 0.2309, 0.5207, 2.0049], [ 0.2072, -1.0680, 0.6602], [ 0.3480, -0.5211, -0.4573]]) >>> torch.triu(a) tensor([[ 0.2309, 0.5207, 2.0049], [ 0.0000, -1.0680, 0.6602], [ 0.0000, 0.0000, -0.4573]]) >>> torch.triu(a, diagonal=1) tensor([[ 0.0000, 0.5207, 2.0049], [ 0.0000, 0.0000, 0.6602], [ 0.0000, 0.0000, 0.0000]]) >>> torch.triu(a, diagonal=-1) tensor([[ 0.2309, 0.5207, 2.0049], [ 0.2072, -1.0680, 0.6602], [ 0.0000, -0.5211, -0.4573]]) >>> b = torch.randn(4, 6) >>> b tensor([[ 0.5876, -0.0794, -1.8373, 0.6654, 0.2604, 1.5235], [-0.2447, 0.9556, -1.2919, 1.3378, -0.1768, -1.0857], [ 0.4333, 0.3146, 0.6576, -1.0432, 0.9348, -0.4410], [-0.9888, 1.0679, -1.3337, -1.6556, 0.4798, 0.2830]]) >>> torch.triu(b, diagonal=1) tensor([[ 0.0000, -0.0794, -1.8373, 0.6654, 0.2604, 1.5235], [ 0.0000, 0.0000, -1.2919, 1.3378, -0.1768, -1.0857], [ 0.0000, 0.0000, 0.0000, -1.0432, 0.9348, -0.4410], [ 0.0000, 0.0000, 0.0000, 0.0000, 0.4798, 0.2830]]) >>> torch.triu(b, diagonal=-1) tensor([[ 0.5876, -0.0794, -1.8373, 0.6654, 0.2604, 1.5235], [-0.2447, 0.9556, -1.2919, 1.3378, -0.1768, -1.0857], [ 0.0000, 0.3146, 0.6576, -1.0432, 0.9348, -0.4410], [ 0.0000, 0.0000, -1.3337, -1.6556, 0.4798, 0.2830]]) # torch.triu_indices `torch.triu_indices(row, col, offset=0, *, dtype=torch.long, device='cpu', layout=torch.strided) → Tensor` Returns the indices of the upper triangular part of a `row` by `col` matrix in a 2-by-N Tensor, where the first row contains row coordinates of all indices and the second row contains column coordinates. Indices are ordered based on rows and then columns. The upper triangular part of the matrix is defined as the elements on and above the diagonal. The argument `offset` controls which diagonal to consider. If `offset` = 0, all elements on and above the main diagonal are retained. A positive value excludes just as many diagonals above the main diagonal, and similarly a negative value includes just as many diagonals below the main diagonal. The main diagonal are the set of indices {(i,i)}\lbrace (i, i) \rbrace for i∈[0,min⁡{d1,d2}−1]i \in [0, \min\\{d_{1}, d_{2}\\} - 1] where d1,d2d_{1}, d_{2} are the dimensions of the matrix. Note When running on CUDA, `row * col` must be less than 2592^{59} to prevent overflow during calculation. Parameters * **row** (`int`) – number of rows in the 2-D matrix. * **col** (`int`) – number of columns in the 2-D matrix. * **offset** (`int`) – diagonal offset from the main diagonal. Default: if not provided, 0. Keyword Arguments * **dtype** ([`torch.dtype`](../tensor_attributes#torch.torch.dtype "torch.torch.dtype"), optional) – the desired data type of returned tensor. Default: if `None`, `torch.long`. * **device** ([`torch.device`](../tensor_attributes#torch.torch.device "torch.torch.device"), optional) – the desired device of returned tensor. Default: if `None`, uses the current device for the default tensor type (see [`torch.set_default_tensor_type()`](torch.set_default_tensor_type#torch.set_default_tensor_type "torch.set_default_tensor_type")). `device` will be the CPU for CPU tensor types and the current CUDA device for CUDA tensor types. * **layout** ([`torch.layout`](../tensor_attributes#torch.torch.layout "torch.torch.layout"), optional) – currently only support `torch.strided`. 
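In practice the returned coordinate pairs are used to gather or assign the upper-triangular entries of a matrix. An illustrative sketch (the reference example of the raw indices follows):

>>> import torch
>>> m = torch.arange(9.).reshape(3, 3)
>>> idx = torch.triu_indices(3, 3)
>>> m[idx[0], idx[1]]    # the six entries on and above the main diagonal
tensor([0., 1., 2., 4., 5., 8.])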
Example:: >>> a = torch.triu_indices(3, 3) >>> a tensor([[0, 0, 0, 1, 1, 2], [0, 1, 2, 1, 2, 2]]) >>> a = torch.triu_indices(4, 3, -1) >>> a tensor([[0, 0, 0, 1, 1, 1, 2, 2, 3], [0, 1, 2, 0, 1, 2, 1, 2, 2]]) >>> a = torch.triu_indices(4, 3, 1) >>> a tensor([[0, 0, 1], [1, 2, 2]]) # torch.true_divide `torch.true_divide(dividend, divisor, *, out) → Tensor` Alias for [`torch.div()`](torch.div#torch.div "torch.div") with `rounding_mode=None`. # torch.trunc `torch.trunc(input, *, out=None) → Tensor` Returns a new tensor with the truncated integer values of the elements of `input`. Parameters **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. Example: >>> a = torch.randn(4) >>> a tensor([ 3.4742, 0.5466, -0.8008, -0.9079]) >>> torch.trunc(a) tensor([ 3., 0., -0., -0.]) # torch.unbind `torch.unbind(input, dim=0) → seq` Removes a tensor dimension. Returns a tuple of all slices along a given dimension, already without it. Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the tensor to unbind * **dim** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – dimension to remove Example: >>> torch.unbind(torch.tensor([[1, 2, 3], >>> [4, 5, 6], >>> [7, 8, 9]])) (tensor([1, 2, 3]), tensor([4, 5, 6]), tensor([7, 8, 9])) # torch.unique `torch.unique(*args, **kwargs)` Returns the unique elements of the input tensor. Note This function is different from [`torch.unique_consecutive()`](torch.unique_consecutive#torch.unique_consecutive "torch.unique_consecutive") in the sense that this function also eliminates non-consecutive duplicate values. Note Currently in the CUDA implementation and the CPU implementation when dim is specified, `torch.unique` always sort the tensor at the beginning regardless of the `sort` argument. Sorting could be slow, so if your input tensor is already sorted, it is recommended to use [`torch.unique_consecutive()`](torch.unique_consecutive#torch.unique_consecutive "torch.unique_consecutive") which avoids the sorting. Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor * **sorted** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – Whether to sort the unique elements in ascending order before returning as output. * **return_inverse** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – Whether to also return the indices for where elements in the original input ended up in the returned unique list. * **return_counts** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – Whether to also return the counts for each unique element. * **dim** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – the dimension to apply unique. If `None`, the unique of the flattened input is returned. default: `None` Returns A tensor or a tuple of tensors containing * **output** (_Tensor_): the output list of unique scalar elements. * **inverse_indices** (_Tensor_): (optional) if `return_inverse` is True, there will be an additional returned tensor (same shape as input) representing the indices for where elements in the original input map to in the output; otherwise, this function will only return a single tensor. 
* **counts** (_Tensor_): (optional) if `return_counts` is True, there will be an additional returned tensor (same shape as output or output.size(dim), if dim was specified) representing the number of occurrences for each unique value or tensor. Return type ([Tensor](../tensors#torch.Tensor "torch.Tensor"), [Tensor](../tensors#torch.Tensor "torch.Tensor") (optional), [Tensor](../tensors#torch.Tensor "torch.Tensor") (optional)) Example: >>> output = torch.unique(torch.tensor([1, 3, 2, 3], dtype=torch.long)) >>> output tensor([ 2, 3, 1]) >>> output, inverse_indices = torch.unique( ... torch.tensor([1, 3, 2, 3], dtype=torch.long), sorted=True, return_inverse=True) >>> output tensor([ 1, 2, 3]) >>> inverse_indices tensor([ 0, 2, 1, 2]) >>> output, inverse_indices = torch.unique( ... torch.tensor([[1, 3], [2, 3]], dtype=torch.long), sorted=True, return_inverse=True) >>> output tensor([ 1, 2, 3]) >>> inverse_indices tensor([[ 0, 2], [ 1, 2]]) # torch.unique_consecutive `torch.unique_consecutive(*args, **kwargs)` Eliminates all but the first element from every consecutive group of equivalent elements. Note This function is different from [`torch.unique()`](torch.unique#torch.unique "torch.unique") in the sense that this function only eliminates consecutive duplicate values. This semantics is similar to `std::unique` in C++. Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor * **return_inverse** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – Whether to also return the indices for where elements in the original input ended up in the returned unique list. * **return_counts** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – Whether to also return the counts for each unique element. * **dim** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – the dimension to apply unique. If `None`, the unique of the flattened input is returned. default: `None` Returns A tensor or a tuple of tensors containing * **output** (_Tensor_): the output list of unique scalar elements. * **inverse_indices** (_Tensor_): (optional) if `return_inverse` is True, there will be an additional returned tensor (same shape as input) representing the indices for where elements in the original input map to in the output; otherwise, this function will only return a single tensor. * **counts** (_Tensor_): (optional) if `return_counts` is True, there will be an additional returned tensor (same shape as output or output.size(dim), if dim was specified) representing the number of occurrences for each unique value or tensor. Return type ([Tensor](../tensors#torch.Tensor "torch.Tensor"), [Tensor](../tensors#torch.Tensor "torch.Tensor") (optional), [Tensor](../tensors#torch.Tensor "torch.Tensor") (optional)) Example: >>> x = torch.tensor([1, 1, 2, 2, 3, 1, 1, 2]) >>> output = torch.unique_consecutive(x) >>> output tensor([1, 2, 3, 1, 2]) >>> output, inverse_indices = torch.unique_consecutive(x, return_inverse=True) >>> output tensor([1, 2, 3, 1, 2]) >>> inverse_indices tensor([0, 0, 1, 1, 2, 3, 3, 4]) >>> output, counts = torch.unique_consecutive(x, return_counts=True) >>> output tensor([1, 2, 3, 1, 2]) >>> counts tensor([2, 2, 1, 2, 1]) # torch.unsqueeze `torch.unsqueeze(input, dim) → Tensor` Returns a new tensor with a dimension of size one inserted at the specified position. The returned tensor shares the same underlying data with this tensor. 
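For instance (an illustrative sketch), writing through the unsqueezed view is visible in the original tensor:

>>> import torch
>>> x = torch.tensor([1, 2, 3, 4])
>>> y = torch.unsqueeze(x, 0)   # shape (1, 4), same underlying data as x
>>> y[0, 0] = 9
>>> x
tensor([9, 2, 3, 4])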
A `dim` value within the range `[-input.dim() - 1, input.dim() + 1)` can be used. Negative `dim` will correspond to `unsqueeze()` applied at `dim` = `dim + input.dim() + 1`. Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. * **dim** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – the index at which to insert the singleton dimension Example: >>> x = torch.tensor([1, 2, 3, 4]) >>> torch.unsqueeze(x, 0) tensor([[ 1, 2, 3, 4]]) >>> torch.unsqueeze(x, 1) tensor([[ 1], [ 2], [ 3], [ 4]]) # torch.use_deterministic_algorithms `torch.use_deterministic_algorithms(d)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch.html#use_deterministic_algorithms) Sets whether PyTorch operations must use “deterministic” algorithms. That is, algorithms which, given the same input, and when run on the same software and hardware, always produce the same output. When True, operations will use deterministic algorithms when available, and if only nondeterministic algorithms are available they will throw a :class:RuntimeError when called. Warning This feature is in beta, and its design and implementation may change in the future. The following normally-nondeterministic operations will act deterministically when `d=True`: * [`torch.nn.Conv1d`](torch.nn.conv1d#torch.nn.Conv1d "torch.nn.Conv1d") when called on CUDA tensor * [`torch.nn.Conv2d`](torch.nn.conv2d#torch.nn.Conv2d "torch.nn.Conv2d") when called on CUDA tensor * [`torch.nn.Conv3d`](torch.nn.conv3d#torch.nn.Conv3d "torch.nn.Conv3d") when called on CUDA tensor * [`torch.nn.ConvTranspose1d`](torch.nn.convtranspose1d#torch.nn.ConvTranspose1d "torch.nn.ConvTranspose1d") when called on CUDA tensor * [`torch.nn.ConvTranspose2d`](torch.nn.convtranspose2d#torch.nn.ConvTranspose2d "torch.nn.ConvTranspose2d") when called on CUDA tensor * [`torch.nn.ConvTranspose3d`](torch.nn.convtranspose3d#torch.nn.ConvTranspose3d "torch.nn.ConvTranspose3d") when called on CUDA tensor * [`torch.bmm()`](torch.bmm#torch.bmm "torch.bmm") when called on sparse-dense CUDA tensors * `torch.__getitem__()` backward when `self` is a CPU tensor and `indices` is a list of tensors * `torch.index_put()` with `accumulate=True` when called on a CPU tensor The following normally-nondeterministic operations will throw a [`RuntimeError`](https://docs.python.org/3/library/exceptions.html#RuntimeError "\(in Python v3.9\)") when `d=True`: * [`torch.nn.AvgPool3d`](torch.nn.avgpool3d#torch.nn.AvgPool3d "torch.nn.AvgPool3d") when called on a CUDA tensor that requires grad * [`torch.nn.AdaptiveAvgPool2d`](torch.nn.adaptiveavgpool2d#torch.nn.AdaptiveAvgPool2d "torch.nn.AdaptiveAvgPool2d") when called on a CUDA tensor that requires grad * [`torch.nn.AdaptiveAvgPool3d`](torch.nn.adaptiveavgpool3d#torch.nn.AdaptiveAvgPool3d "torch.nn.AdaptiveAvgPool3d") when called on a CUDA tensor that requires grad * [`torch.nn.MaxPool3d`](torch.nn.maxpool3d#torch.nn.MaxPool3d "torch.nn.MaxPool3d") when called on a CUDA tensor that requires grad * [`torch.nn.AdaptiveMaxPool2d`](torch.nn.adaptivemaxpool2d#torch.nn.AdaptiveMaxPool2d "torch.nn.AdaptiveMaxPool2d") when called on a CUDA tensor that requires grad * [`torch.nn.FractionalMaxPool2d`](torch.nn.fractionalmaxpool2d#torch.nn.FractionalMaxPool2d "torch.nn.FractionalMaxPool2d") when called on a CUDA tensor that requires grad * `torch.nn.FractionalMaxPool3d` when called on a CUDA tensor that requires grad * [`torch.nn.functional.interpolate()`](../nn.functional#torch.nn.functional.interpolate 
"torch.nn.functional.interpolate") when called on a CUDA tensor that requires grad and one of the following modes is used: * `linear` * `bilinear` * `bicubic` * `trilinear` * [`torch.nn.ReflectionPad1d`](torch.nn.reflectionpad1d#torch.nn.ReflectionPad1d "torch.nn.ReflectionPad1d") when called on a CUDA tensor that requires grad * [`torch.nn.ReflectionPad2d`](torch.nn.reflectionpad2d#torch.nn.ReflectionPad2d "torch.nn.ReflectionPad2d") when called on a CUDA tensor that requires grad * [`torch.nn.ReplicationPad1d`](torch.nn.replicationpad1d#torch.nn.ReplicationPad1d "torch.nn.ReplicationPad1d") when called on a CUDA tensor that requires grad * [`torch.nn.ReplicationPad2d`](torch.nn.replicationpad2d#torch.nn.ReplicationPad2d "torch.nn.ReplicationPad2d") when called on a CUDA tensor that requires grad * [`torch.nn.ReplicationPad3d`](torch.nn.replicationpad3d#torch.nn.ReplicationPad3d "torch.nn.ReplicationPad3d") when called on a CUDA tensor that requires grad * [`torch.nn.NLLLoss`](torch.nn.nllloss#torch.nn.NLLLoss "torch.nn.NLLLoss") when called on a CUDA tensor that requires grad * [`torch.nn.CTCLoss`](torch.nn.ctcloss#torch.nn.CTCLoss "torch.nn.CTCLoss") when called on a CUDA tensor that requires grad * [`torch.nn.EmbeddingBag`](torch.nn.embeddingbag#torch.nn.EmbeddingBag "torch.nn.EmbeddingBag") when called on a CUDA tensor that requires grad * `torch.scatter_add_()` when called on a CUDA tensor * `torch.index_add_()` when called on a CUDA tensor * `torch.index_copy()` * [`torch.index_select()`](torch.index_select#torch.index_select "torch.index_select") when called on a CUDA tensor that requires grad * [`torch.repeat_interleave()`](torch.repeat_interleave#torch.repeat_interleave "torch.repeat_interleave") when called on a CUDA tensor that requires grad * [`torch.histc()`](torch.histc#torch.histc "torch.histc") when called on a CUDA tensor * [`torch.bincount()`](torch.bincount#torch.bincount "torch.bincount") when called on a CUDA tensor * [`torch.kthvalue()`](torch.kthvalue#torch.kthvalue "torch.kthvalue") with called on a CUDA tensor * [`torch.median()`](torch.median#torch.median "torch.median") with indices output when called on a CUDA tensor A handful of CUDA operations are nondeterministic if the CUDA version is 10.2 or greater, unless the environment variable `CUBLAS_WORKSPACE_CONFIG=:4096:8` or `CUBLAS_WORKSPACE_CONFIG=:16:8` is set. See the CUDA documentation for more details: If one of these environment variable configurations is not set, a [`RuntimeError`](https://docs.python.org/3/library/exceptions.html#RuntimeError "\(in Python v3.9\)") will be raised from these operations when called with CUDA tensors: * [`torch.mm()`](torch.mm#torch.mm "torch.mm") * [`torch.mv()`](torch.mv#torch.mv "torch.mv") * [`torch.bmm()`](torch.bmm#torch.bmm "torch.bmm") Note that deterministic operations tend to have worse performance than non- deterministic operations. Parameters **d** ([`bool`](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – If True, force operations to be deterministic. If False, allow non-deterministic operations. # torch.vander `torch.vander(x, N=None, increasing=False) → Tensor` Generates a Vandermonde matrix. The columns of the output matrix are elementwise powers of the input vector x(N−1),x(N−2),...,x0x^{(N-1)}, x^{(N-2)}, ..., x^0 . If increasing is True, the order of the columns is reversed x0,x1,...,x(N−1)x^0, x^1, ..., x^{(N-1)} . Such a matrix with a geometric progression in each row is named for Alexandre-Theophile Vandermonde. 
Parameters * **x** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – 1-D input tensor. * **N** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – Number of columns in the output. If N is not specified, a square array is returned (N=len(x))(N = len(x)) . * **increasing** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – Order of the powers of the columns. If True, the powers increase from left to right, if False (the default) they are reversed. Returns Vandermonde matrix. If increasing is False, the first column is x(N−1)x^{(N-1)} , the second x(N−2)x^{(N-2)} and so forth. If increasing is True, the columns are x0,x1,...,x(N−1)x^0, x^1, ..., x^{(N-1)} . Return type [Tensor](../tensors#torch.Tensor "torch.Tensor") Example: >>> x = torch.tensor([1, 2, 3, 5]) >>> torch.vander(x) tensor([[ 1, 1, 1, 1], [ 8, 4, 2, 1], [ 27, 9, 3, 1], [125, 25, 5, 1]]) >>> torch.vander(x, N=3) tensor([[ 1, 1, 1], [ 4, 2, 1], [ 9, 3, 1], [25, 5, 1]]) >>> torch.vander(x, N=3, increasing=True) tensor([[ 1, 1, 1], [ 1, 2, 4], [ 1, 3, 9], [ 1, 5, 25]]) # torch.var `torch.var(input, unbiased=True) → Tensor` Returns the variance of all elements in the `input` tensor. If `unbiased` is `False`, then the variance will be calculated via the biased estimator. Otherwise, Bessel’s correction will be used. Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. * **unbiased** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – whether to use the unbiased estimation or not Example: >>> a = torch.randn(1, 3) >>> a tensor([[-0.3425, -1.2636, -0.4864]]) >>> torch.var(a) tensor(0.2455) `torch.var(input, dim, unbiased=True, keepdim=False, *, out=None) → Tensor` Returns the variance of each row of the `input` tensor in the given dimension `dim`. If `keepdim` is `True`, the output tensor is of the same size as `input` except in the dimension(s) `dim` where it is of size 1. Otherwise, `dim` is squeezed (see [`torch.squeeze()`](torch.squeeze#torch.squeeze "torch.squeeze")), resulting in the output tensor having 1 (or `len(dim)`) fewer dimension(s). If `unbiased` is `False`, then the variance will be calculated via the biased estimator. Otherwise, Bessel’s correction will be used. Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. * **dim** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _or_ _tuple of python:ints_) – the dimension or dimensions to reduce. * **unbiased** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – whether to use the unbiased estimation or not * **keepdim** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – whether the output tensor has `dim` retained or not. Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. Example: >>> a = torch.randn(4, 4) >>> a tensor([[-0.3567, 1.7385, -1.3042, 0.7423], [ 1.3436, -0.1015, -0.9834, -0.8438], [ 0.6056, 0.1089, -0.3112, -1.4085], [-0.7700, 0.6074, -0.1469, 0.7777]]) >>> torch.var(a, 1) tensor([ 1.7444, 1.1363, 0.7356, 0.5112]) # torch.var_mean `torch.var_mean(input, unbiased=True) -> (Tensor, Tensor)` Returns the variance and mean of all elements in the `input` tensor. If `unbiased` is `False`, then the variance will be calculated via the biased estimator. Otherwise, Bessel’s correction will be used. 
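The sketch below (illustrative, not from the original reference) spells out what `unbiased` changes, by recomputing both estimators by hand:

    import torch

    a = torch.randn(6)
    n = a.numel()

    var_u, mean = torch.var_mean(a)                  # unbiased: divides by n - 1
    var_b, _ = torch.var_mean(a, unbiased=False)     # biased: divides by n

    mu = a.sum() / n
    assert torch.allclose(mean, mu)
    assert torch.allclose(var_b, ((a - mu) ** 2).sum() / n)
    assert torch.allclose(var_u, ((a - mu) ** 2).sum() / (n - 1))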
Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. * **unbiased** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – whether to use the unbiased estimation or not Example: >>> a = torch.randn(1, 3) >>> a tensor([[0.0146, 0.4258, 0.2211]]) >>> torch.var_mean(a) (tensor(0.0423), tensor(0.2205)) `torch.var_mean(input, dim, keepdim=False, unbiased=True) -> (Tensor, Tensor)` Returns the variance and mean of each row of the `input` tensor in the given dimension `dim`. If `keepdim` is `True`, the output tensor is of the same size as `input` except in the dimension(s) `dim` where it is of size 1. Otherwise, `dim` is squeezed (see [`torch.squeeze()`](torch.squeeze#torch.squeeze "torch.squeeze")), resulting in the output tensor having 1 (or `len(dim)`) fewer dimension(s). If `unbiased` is `False`, then the variance will be calculated via the biased estimator. Otherwise, Bessel’s correction will be used. Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. * **dim** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _or_ _tuple of python:ints_) – the dimension or dimensions to reduce. * **keepdim** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – whether the output tensor has `dim` retained or not. * **unbiased** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – whether to use the unbiased estimation or not Example: >>> a = torch.randn(4, 4) >>> a tensor([[-1.5650, 2.0415, -0.1024, -0.5790], [ 0.2325, -2.6145, -1.6428, -0.3537], [-0.2159, -1.1069, 1.2882, -1.3265], [-0.6706, -1.5893, 0.6827, 1.6727]]) >>> torch.var_mean(a, 1) (tensor([2.3174, 1.6403, 1.4092, 2.0791]), tensor([-0.0512, -1.0946, -0.3403, 0.0239])) # torch.vdot `torch.vdot(input, other, *, out=None) → Tensor` Computes the dot product of two 1D tensors. The vdot(a, b) function handles complex numbers differently than dot(a, b). If the first argument is complex, the complex conjugate of the first argument is used for the calculation of the dot product. Note Unlike NumPy’s vdot, torch.vdot intentionally only supports computing the dot product of two 1D tensors with the same number of elements. Parameters * **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – first tensor in the dot product, must be 1D. Its conjugate is used if it’s complex. * **other** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – second tensor in the dot product, must be 1D. Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. Example: >>> torch.vdot(torch.tensor([2, 3]), torch.tensor([2, 1])) tensor(7) >>> a = torch.tensor((1 +2j, 3 - 1j)) >>> b = torch.tensor((2 +1j, 4 - 0j)) >>> torch.vdot(a, b) tensor([16.+1.j]) >>> torch.vdot(b, a) tensor([16.-1.j]) # torch.view_as_complex `torch.view_as_complex(input) → Tensor` Returns a view of `input` as a complex tensor. For an input complex tensor of `size` m1,m2,…,mi,2m1, m2, \dots, mi, 2 , this function returns a new complex tensor of `size` m1,m2,…,mim1, m2, \dots, mi where the last dimension of the input tensor is expected to represent the real and imaginary components of complex numbers. Warning `view_as_complex()` is only supported for tensors with [`torch.dtype`](../tensor_attributes#torch.torch.dtype "torch.torch.dtype") `torch.float64` and `torch.float32`. The input is expected to have the last dimension of `size` 2\. 
In addition, the tensor must have a `stride` of 1 for its last dimension. The strides of all other dimensions must be even numbers. Parameters **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. Example:: >>> x=torch.randn(4, 2) >>> x tensor([[ 1.6116, -0.5772], [-1.4606, -0.9120], [ 0.0786, -1.7497], [-0.6561, -1.6623]]) >>> torch.view_as_complex(x) tensor([(1.6116-0.5772j), (-1.4606-0.9120j), (0.0786-1.7497j), (-0.6561-1.6623j)]) # torch.view_as_real `torch.view_as_real(input) → Tensor` Returns a view of `input` as a real tensor. For an input complex tensor of `size` m1,m2,…,mim1, m2, \dots, mi , this function returns a new real tensor of size m1,m2,…,mi,2m1, m2, \dots, mi, 2 , where the last dimension of size 2 represents the real and imaginary components of complex numbers. Warning `view_as_real()` is only supported for tensors with `complex dtypes`. Parameters **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the input tensor. Example:: >>> x=torch.randn(4, dtype=torch.cfloat) >>> x tensor([(0.4737-0.3839j), (-0.2098-0.6699j), (0.3470-0.9451j), (-0.5174-1.3136j)]) >>> torch.view_as_real(x) tensor([[ 0.4737, -0.3839], [-0.2098, -0.6699], [ 0.3470, -0.9451], [-0.5174, -1.3136]]) # torch.vstack `torch.vstack(tensors, *, out=None) → Tensor` Stack tensors in sequence vertically (row wise). This is equivalent to concatenation along the first axis after all 1-D tensors have been reshaped by [`torch.atleast_2d()`](torch.atleast_2d#torch.atleast_2d "torch.atleast_2d"). Parameters **tensors** (_sequence of Tensors_) – sequence of tensors to concatenate Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. Example: >>> a = torch.tensor([1, 2, 3]) >>> b = torch.tensor([4, 5, 6]) >>> torch.vstack((a,b)) tensor([[1, 2, 3], [4, 5, 6]]) >>> a = torch.tensor([[1],[2],[3]]) >>> b = torch.tensor([[4],[5],[6]]) >>> torch.vstack((a,b)) tensor([[1], [2], [3], [4], [5], [6]]) # torch.where `torch.where(condition, x, y) → Tensor` Return a tensor of elements selected from either `x` or `y`, depending on `condition`. The operation is defined as: outi={xiif conditioniyiotherwise\text{out}_i = \begin{cases} \text{x}_i & \text{if } \text{condition}_i \\\ \text{y}_i & \text{otherwise} \\\ \end{cases} Note The tensors `condition`, `x`, `y` must be [broadcastable](https://pytorch.org/docs/1.8.0/notes/broadcasting.html#broadcasting- semantics). Note Currently valid scalar and tensor combination are 1. Scalar of floating dtype and torch.double 2. Scalar of integral dtype and torch.long 3. 
Scalar of complex dtype and torch.complex128 Parameters * **condition** (_BoolTensor_) – When True (nonzero), yield x, otherwise yield y * **x** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _or_ _Scalar_) – value (if `x` is a scalar) or values selected at indices where `condition` is `True` * **y** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _or_ _Scalar_) – value (if `y` is a scalar) or values selected at indices where `condition` is `False` Returns A tensor of shape equal to the broadcasted shape of `condition`, `x`, `y` Return type [Tensor](../tensors#torch.Tensor "torch.Tensor") Example: >>> x = torch.randn(3, 2) >>> y = torch.ones(3, 2) >>> x tensor([[-0.4620, 0.3139], [ 0.3898, -0.7197], [ 0.0478, -0.1657]]) >>> torch.where(x > 0, x, y) tensor([[ 1.0000, 0.3139], [ 0.3898, 1.0000], [ 0.0478, 1.0000]]) >>> x = torch.randn(2, 2, dtype=torch.double) >>> x tensor([[ 1.0779, 0.0383], [-0.8785, -1.1089]], dtype=torch.float64) >>> torch.where(x > 0, x, 0.) tensor([[1.0779, 0.0383], [0.0000, 0.0000]], dtype=torch.float64) `torch.where(condition) → tuple of LongTensor` `torch.where(condition)` is identical to `torch.nonzero(condition, as_tuple=True)`. Note See also [`torch.nonzero()`](torch.nonzero#torch.nonzero "torch.nonzero"). # torch.xlogy `torch.xlogy(input, other, *, out=None) → Tensor` Computes `input * log(other)` with the following cases: \text{out}_{i} = \begin{cases} \text{NaN} & \text{if } \text{other}_{i} = \text{NaN} \\ 0 & \text{if } \text{input}_{i} = 0.0 \\ \text{input}_{i} * \log(\text{other}_{i}) & \text{otherwise} \end{cases} Similar to SciPy's `scipy.special.xlogy`. Parameters * **input** (_Number_ _or_ [Tensor](../tensors#torch.Tensor "torch.Tensor")) – the multiplier * **other** (_Number_ _or_ [Tensor](../tensors#torch.Tensor "torch.Tensor")) – the argument of the logarithm Note At least one of `input` or `other` must be a tensor. Keyword Arguments **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. Example: >>> x = torch.zeros(5,) >>> y = torch.tensor([-1, 0, 1, float('inf'), float('nan')]) >>> torch.xlogy(x, y) tensor([0., 0., 0., 0., nan]) >>> x = torch.tensor([1, 2, 3]) >>> y = torch.tensor([3, 2, 1]) >>> torch.xlogy(x, y) tensor([1.0986, 1.3863, 0.0000]) >>> torch.xlogy(x, 4) tensor([1.3863, 2.7726, 4.1589]) >>> torch.xlogy(2, y) tensor([2.1972, 1.3863, 0.0000]) # torch.zeros `torch.zeros(*size, *, out=None, dtype=None, layout=torch.strided, device=None, requires_grad=False) → Tensor` Returns a tensor filled with the scalar value `0`, with the shape defined by the variable argument `size`. Parameters **size** (_int..._) – a sequence of integers defining the shape of the output tensor. Can be a variable number of arguments or a collection like a list or tuple. Keyword Arguments * **out** ([Tensor](../tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. * **dtype** ([`torch.dtype`](../tensor_attributes#torch.torch.dtype "torch.torch.dtype"), optional) – the desired data type of returned tensor. Default: if `None`, uses a global default (see [`torch.set_default_tensor_type()`](torch.set_default_tensor_type#torch.set_default_tensor_type "torch.set_default_tensor_type")). * **layout** ([`torch.layout`](../tensor_attributes#torch.torch.layout "torch.torch.layout"), optional) – the desired layout of returned Tensor. Default: `torch.strided`.
* **device** ([`torch.device`](../tensor_attributes#torch.torch.device "torch.torch.device"), optional) – the desired device of returned tensor. Default: if `None`, uses the current device for the default tensor type (see [`torch.set_default_tensor_type()`](torch.set_default_tensor_type#torch.set_default_tensor_type "torch.set_default_tensor_type")). `device` will be the CPU for CPU tensor types and the current CUDA device for CUDA tensor types. * **requires_grad** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – If autograd should record operations on the returned tensor. Default: `False`. Example: >>> torch.zeros(2, 3) tensor([[ 0., 0., 0.], [ 0., 0., 0.]]) >>> torch.zeros(5) tensor([ 0., 0., 0., 0., 0.]) # torch.zeros_like `torch.zeros_like(input, *, dtype=None, layout=None, device=None, requires_grad=False, memory_format=torch.preserve_format) → Tensor` Returns a tensor filled with the scalar value `0`, with the same size as `input`. `torch.zeros_like(input)` is equivalent to `torch.zeros(input.size(), dtype=input.dtype, layout=input.layout, device=input.device)`. Warning As of 0.4, this function does not support an `out` keyword. As an alternative, the old `torch.zeros_like(input, out=output)` is equivalent to `torch.zeros(input.size(), out=output)`. Parameters **input** ([Tensor](../tensors#torch.Tensor "torch.Tensor")) – the size of `input` will determine size of the output tensor. Keyword Arguments * **dtype** ([`torch.dtype`](../tensor_attributes#torch.torch.dtype "torch.torch.dtype"), optional) – the desired data type of returned Tensor. Default: if `None`, defaults to the dtype of `input`. * **layout** ([`torch.layout`](../tensor_attributes#torch.torch.layout "torch.torch.layout"), optional) – the desired layout of returned tensor. Default: if `None`, defaults to the layout of `input`. * **device** ([`torch.device`](../tensor_attributes#torch.torch.device "torch.torch.device"), optional) – the desired device of returned tensor. Default: if `None`, defaults to the device of `input`. * **requires_grad** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – If autograd should record operations on the returned tensor. Default: `False`. * **memory_format** ([`torch.memory_format`](../tensor_attributes#torch.torch.memory_format "torch.torch.memory_format"), optional) – the desired memory format of returned Tensor. Default: `torch.preserve_format`. Example: >>> input = torch.empty(2, 3) >>> torch.zeros_like(input) tensor([[ 0., 0., 0.], [ 0., 0., 0.]]) # torch.hub Pytorch Hub is a pre-trained model repository designed to facilitate research reproducibility. ## Publishing models Pytorch Hub supports publishing pre-trained models(model definitions and pre- trained weights) to a github repository by adding a simple `hubconf.py` file; `hubconf.py` can have multiple entrypoints. Each entrypoint is defined as a python function (example: a pre-trained model you want to publish). def entrypoint_name(*args, **kwargs): # args & kwargs are optional, for models which take positional/keyword arguments. ... ### How to implement an entrypoint? Here is a code snippet specifies an entrypoint for `resnet18` model if we expand the implementation in `pytorch/vision/hubconf.py`. In most case importing the right function in `hubconf.py` is sufficient. Here we just want to use the expanded version as an example to show how it works. 
You can see the full script in the [pytorch/vision repo](https://github.com/pytorch/vision/blob/master/hubconf.py). dependencies = ['torch'] from torchvision.models.resnet import resnet18 as _resnet18 # resnet18 is the name of entrypoint def resnet18(pretrained=False, **kwargs): """ # This docstring shows up in hub.help() Resnet18 model pretrained (bool): kwargs, load pretrained weights into the model """ # Call the model, load pretrained weights model = _resnet18(pretrained=pretrained, **kwargs) return model * The `dependencies` variable is a **list** of package names required to **load** the model. Note that this might be slightly different from the dependencies required to train the model. * `args` and `kwargs` are passed along to the real callable function. * The docstring of the function works as a help message. It explains what the model does and what the allowed positional/keyword arguments are. It's highly recommended to add a few examples here. * An entrypoint function can either return a model (`nn.Module`), or auxiliary tools to make the user workflow smoother, e.g. tokenizers. * Callables prefixed with an underscore are considered helper functions which won't show up in `torch.hub.list()`. * Pretrained weights can either be stored locally in the github repo, or be loadable by `torch.hub.load_state_dict_from_url()`. If less than 2GB, it's recommended to attach them to a [project release](https://help.github.com/en/articles/distributing-large-binaries) and use the url from the release. In the example above, `torchvision.models.resnet.resnet18` handles `pretrained`; alternatively, you can put the following logic in the entrypoint definition: if pretrained: # For a checkpoint saved in the local github repo, e.g. <RELATIVE_PATH_TO_CHECKPOINT>=weights/save.pth dirname = os.path.dirname(__file__) checkpoint = os.path.join(dirname, <RELATIVE_PATH_TO_CHECKPOINT>) state_dict = torch.load(checkpoint) model.load_state_dict(state_dict) # For a checkpoint saved elsewhere checkpoint = 'https://download.pytorch.org/models/resnet18-5c106cde.pth' model.load_state_dict(torch.hub.load_state_dict_from_url(checkpoint, progress=False)) ### Important Notice * The published models should be in at least a branch/tag. They can't point to a random commit. ## Loading models from Hub Pytorch Hub provides convenient APIs to explore all available models in hub through `torch.hub.list()`, show docstrings and examples through `torch.hub.help()`, and load the pre-trained models using `torch.hub.load()`. `torch.hub.list(github, force_reload=False)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/hub.html#list) List all entrypoints available in `github` hubconf. Parameters * **github** (_string_) – a string with format "repo_owner/repo_name[:tag_name]" with an optional tag/branch. The default branch is `master` if not specified. Example: 'pytorch/vision[:hub]' * **force_reload** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – whether to discard the existing cache and force a fresh download. Default is `False`. Returns a list of available entrypoint names Return type entrypoints #### Example >>> entrypoints = torch.hub.list('pytorch/vision', force_reload=True) `torch.hub.help(github, model, force_reload=False)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/hub.html#help) Show the docstring of entrypoint `model`. Parameters * **github** (_string_) – a string with format "repo_owner/repo_name[:tag_name]" with an optional tag/branch. The default branch is `master` if not specified.
Example: ‘pytorch/vision[:hub]’ * **model** (_string_) – a string of entrypoint name defined in repo’s hubconf.py * **force_reload** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – whether to discard the existing cache and force a fresh download. Default is `False`. #### Example >>> print(torch.hub.help('pytorch/vision', 'resnet18', force_reload=True)) `torch.hub.load(repo_or_dir, model, *args, **kwargs)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/hub.html#load) Load a model from a github repo or a local directory. Note: Loading a model is the typical use case, but this can also be used to for loading other objects such as tokenizers, loss functions, etc. If `source` is `'github'`, `repo_or_dir` is expected to be of the form `repo_owner/repo_name[:tag_name]` with an optional tag/branch. If `source` is `'local'`, `repo_or_dir` is expected to be a path to a local directory. Parameters * **repo_or_dir** (_string_) – repo name (`repo_owner/repo_name[:tag_name]`), if `source = 'github'`; or a path to a local directory, if `source = 'local'`. * **model** (_string_) – the name of a callable (entrypoint) defined in the repo/dir’s `hubconf.py`. * ***args** (_optional_) – the corresponding args for callable `model`. * **source** (_string_ _,__optional_) – `'github'` | `'local'`. Specifies how `repo_or_dir` is to be interpreted. Default is `'github'`. * **force_reload** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – whether to force a fresh download of the github repo unconditionally. Does not have any effect if `source = 'local'`. Default is `False`. * **verbose** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – If `False`, mute messages about hitting local caches. Note that the message about first download cannot be muted. Does not have any effect if `source = 'local'`. Default is `True`. * ****kwargs** (_optional_) – the corresponding kwargs for callable `model`. Returns The output of the `model` callable when called with the given `*args` and `**kwargs`. #### Example >>> # from a github repo >>> repo = 'pytorch/vision' >>> model = torch.hub.load(repo, 'resnet50', pretrained=True) >>> # from a local directory >>> path = '/some/local/path/pytorch/vision' >>> model = torch.hub.load(path, 'resnet50', pretrained=True) `torch.hub.download_url_to_file(url, dst, hash_prefix=None, progress=True)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/hub.html#download_url_to_file) Download object at the given URL to a local path. Parameters * **url** (_string_) – URL of the object to download * **dst** (_string_) – Full path where object will be saved, e.g. `/tmp/temporary_file` * **hash_prefix** (_string_ _,__optional_) – If not None, the SHA256 downloaded file should start with `hash_prefix`. Default: None * **progress** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – whether or not to display a progress bar to stderr Default: True #### Example >>> torch.hub.download_url_to_file('https://s3.amazonaws.com/pytorch/models/resnet18-5c106cde.pth', '/tmp/temporary_file') `torch.hub.load_state_dict_from_url(url, model_dir=None, map_location=None, progress=True, check_hash=False, file_name=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/hub.html#load_state_dict_from_url) Loads the Torch serialized object at the given URL. 
If the downloaded file is a zip file, it will be automatically decompressed. If the object is already present in `model_dir`, it's deserialized and returned. The default value of `model_dir` is `<hub_dir>/checkpoints`, where `<hub_dir>` is the directory returned by `get_dir()`. Parameters * **url** (_string_) – URL of the object to download * **model_dir** (_string_ _,__optional_) – directory in which to save the object * **map_location** (_optional_) – a function or a dict specifying how to remap storage locations (see torch.load) * **progress** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – whether or not to display a progress bar to stderr. Default: True * **check_hash** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – If True, the filename part of the URL should follow the naming convention `filename-<sha256>.ext`, where `<sha256>` is the first eight or more digits of the SHA256 hash of the contents of the file. The hash is used to ensure unique names and to verify the contents of the file. Default: False * **file_name** (_string_ _,__optional_) – name for the downloaded file. The filename from `url` will be used if not set. #### Example >>> state_dict = torch.hub.load_state_dict_from_url('https://s3.amazonaws.com/pytorch/models/resnet18-5c106cde.pth') ### Running a loaded model Note that `*args` and `**kwargs` in `torch.hub.load()` are used to **instantiate** a model. After you have loaded a model, how can you find out what you can do with it? A suggested workflow is: * `dir(model)` to see all available methods of the model. * `help(model.foo)` to check what arguments `model.foo` takes to run. To help users explore without referring back and forth to the documentation, we strongly recommend repo owners make function help messages clear and succinct. It's also helpful to include a minimal working example. ### Where are my downloaded models saved? The locations are used in the order of: * Calling `hub.set_dir()` * `$TORCH_HOME/hub`, if environment variable `TORCH_HOME` is set. * `$XDG_CACHE_HOME/torch/hub`, if environment variable `XDG_CACHE_HOME` is set. * `~/.cache/torch/hub` `torch.hub.get_dir()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/hub.html#get_dir) Get the Torch Hub cache directory used for storing downloaded models & weights. If `set_dir()` is not called, the default path is `$TORCH_HOME/hub`, where the environment variable `$TORCH_HOME` defaults to `$XDG_CACHE_HOME/torch`. `$XDG_CACHE_HOME` follows the X Desktop Group (XDG) specification of the Linux filesystem layout, with a default value of `~/.cache` if the environment variable is not set. `torch.hub.set_dir(d)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/hub.html#set_dir) Optionally set the Torch Hub directory used to save downloaded models & weights. Parameters **d** (_string_) – path to a local folder to save downloaded models & weights. ### Caching logic By default, files are not cleaned up after loading. Hub uses the cache by default if it already exists in the directory returned by `get_dir()`. Users can force a reload by calling `hub.load(..., force_reload=True)`. This will delete the existing github folder and downloaded weights and reinitialize a fresh download. This is useful when updates are published to the same branch, so users can keep up with the latest release. ### Known limitations Torch hub works by importing the package as if it were installed. There are some side effects introduced by importing in Python.
For example, you can see new items in Python caches `sys.modules` and `sys.path_importer_cache` which is normal Python behavior. A known limitation that worth mentioning here is user **CANNOT** load two different branches of the same repo in the **same python process**. It’s just like installing two packages with the same name in Python, which is not good. Cache might join the party and give you surprises if you actually try that. Of course it’s totally fine to load them in separate processes. # PyTorch documentation PyTorch is an optimized tensor library for deep learning using GPUs and CPUs. Features described in this documentation are classified by release status: _Stable:_ These features will be maintained long-term and there should generally be no major performance limitations or gaps in documentation. We also expect to maintain backwards compatibility (although breaking changes can happen and notice will be given one release ahead of time). _Beta:_ Features are tagged as Beta because the API may change based on user feedback, because the performance needs to improve, or because coverage across operators is not yet complete. For Beta features, we are committing to seeing the feature through to the Stable classification. We are not, however, committing to backwards compatibility. _Prototype:_ These features are typically not available as part of binary distributions like PyPI or Conda, except sometimes behind run-time flags, and are at an early stage for feedback and testing. Notes * [Automatic Mixed Precision examples](https://pytorch.org/docs/1.8.0/notes/amp_examples.html) * [Autograd mechanics](https://pytorch.org/docs/1.8.0/notes/autograd.html) * [Broadcasting semantics](https://pytorch.org/docs/1.8.0/notes/broadcasting.html) * [CPU threading and TorchScript inference](https://pytorch.org/docs/1.8.0/notes/cpu_threading_torchscript_inference.html) * [CUDA semantics](https://pytorch.org/docs/1.8.0/notes/cuda.html) * [Distributed Data Parallel](https://pytorch.org/docs/1.8.0/notes/ddp.html) * [Extending PyTorch](https://pytorch.org/docs/1.8.0/notes/extending.html) * [Frequently Asked Questions](https://pytorch.org/docs/1.8.0/notes/faq.html) * [Features for large-scale deployments](https://pytorch.org/docs/1.8.0/notes/large_scale_deployments.html) * [Modules](https://pytorch.org/docs/1.8.0/notes/modules.html) * [Multiprocessing best practices](https://pytorch.org/docs/1.8.0/notes/multiprocessing.html) * [Reproducibility](https://pytorch.org/docs/1.8.0/notes/randomness.html) * [Serialization semantics](https://pytorch.org/docs/1.8.0/notes/serialization.html) * [Windows FAQ](https://pytorch.org/docs/1.8.0/notes/windows.html) Language Bindings * [C++](https://pytorch.org/docs/1.8.0/cpp_index.html) * [Javadoc](https://pytorch.org/javadoc/) Python API * [torch](torch) * [torch.nn](nn) * [torch.nn.functional](nn.functional) * [torch.Tensor](tensors) * [Tensor Attributes](tensor_attributes) * [Tensor Views](tensor_view) * [torch.autograd](autograd) * [torch.cuda](cuda) * [torch.cuda.amp](amp) * [torch.backends](backends) * [torch.distributed](distributed) * [torch.distributions](distributions) * [torch.fft](fft) * [torch.futures](futures) * [torch.fx](fx) * [torch.hub](hub) * [torch.jit](jit) * [torch.linalg](linalg) * [torch.overrides](torch.overrides) * [torch.nn.init](nn.init) * [torch.onnx](onnx) * [torch.optim](optim) * [Complex Numbers](complex_numbers) * [DDP Communication Hooks](ddp_comm_hooks) * [Pipeline Parallelism](pipeline) * [Quantization](quantization) * [Distributed RPC 
Framework](rpc) * [torch.random](random) * [torch.sparse](sparse) * [torch.Storage](storage) * [torch.utils.benchmark](benchmark_utils) * [torch.utils.bottleneck](bottleneck) * [torch.utils.checkpoint](checkpoint) * [torch.utils.cpp_extension](cpp_extension) * [torch.utils.data](data) * [torch.utils.dlpack](dlpack) * [torch.utils.mobile_optimizer](mobile_optimizer) * [torch.utils.model_zoo](model_zoo) * [torch.utils.tensorboard](tensorboard) * [Type Info](type_info) * [Named Tensors](named_tensor) * [Named Tensors operator coverage](name_inference) * [torch.__config__](__config__) Libraries * [torchaudio](https://pytorch.org/audio/stable) * [torchtext](https://pytorch.org/text/stable) * [torchvision](https://pytorch.org/vision/stable) * [TorchElastic](https://pytorch.org/elastic/) * [TorchServe](https://pytorch.org/serve) * [PyTorch on XLA Devices](http://pytorch.org/xla/) Community * [PyTorch Contribution Guide](https://pytorch.org/docs/1.8.0/community/contribution_guide.html) * [PyTorch Governance](https://pytorch.org/docs/1.8.0/community/governance.html) * [PyTorch Governance | Persons of Interest](https://pytorch.org/docs/1.8.0/community/persons_of_interest.html) # Indices and tables * [Index](https://pytorch.org/docs/1.8.0/genindex.html) * [Module Index](https://pytorch.org/docs/1.8.0/py-modindex.html) # TorchScript * Creating TorchScript Code * Mixing Tracing and Scripting * TorchScript Language * Built-in Functions and Modules * PyTorch Functions and Modules * Python Functions and Modules * Python Language Reference Comparison * Debugging * Disable JIT for Debugging * Inspecting Code * Interpreting Graphs * Tracer * Frequently Asked Questions * Appendix * Migrating to PyTorch 1.2 Recursive Scripting API * References TorchScript is a way to create serializable and optimizable models from PyTorch code. Any TorchScript program can be saved from a Python process and loaded in a process where there is no Python dependency. We provide tools to incrementally transition a model from a pure Python program to a TorchScript program that can be run independently from Python, such as in a standalone C++ program. This makes it possible to train models in PyTorch using familiar tools in Python and then export the model via TorchScript to a production environment where Python programs may be disadvantageous for performance and multi-threading reasons. For a gentle introduction to TorchScript, see the [Introduction to TorchScript](https://pytorch.org/tutorials/beginner/Intro_to_TorchScript_tutorial.html) tutorial. For an end-to-end example of converting a PyTorch model to TorchScript and running it in C++, see the [Loading a PyTorch Model in C++](https://pytorch.org/tutorials/advanced/cpp_export.html) tutorial. ## Creating TorchScript Code [`script`](generated/torch.jit.script#torch.jit.script "torch.jit.script")(obj[, optimize, _frames_up, _rcb]) | Scripting a function or `nn.Module` will inspect the source code, compile it as TorchScript code using the TorchScript compiler, and return a [`ScriptModule`](generated/torch.jit.scriptmodule#torch.jit.ScriptModule "torch.jit.ScriptModule") or [`ScriptFunction`](generated/torch.jit.scriptfunction#torch.jit.ScriptFunction "torch.jit.ScriptFunction"). 
---|--- [`trace`](generated/torch.jit.trace#torch.jit.trace "torch.jit.trace")(func, example_inputs[, optimize, …]) | Trace a function and return an executable or [`ScriptFunction`](generated/torch.jit.scriptfunction#torch.jit.ScriptFunction "torch.jit.ScriptFunction") that will be optimized using just-in-time compilation. [`script_if_tracing`](generated/torch.jit.script_if_tracing#torch.jit.script_if_tracing "torch.jit.script_if_tracing")(fn) | Compiles `fn` when it is first called during tracing. [`trace_module`](generated/torch.jit.trace_module#torch.jit.trace_module "torch.jit.trace_module")(mod, inputs[, optimize, …]) | Trace a module and return an executable [`ScriptModule`](generated/torch.jit.scriptmodule#torch.jit.ScriptModule "torch.jit.ScriptModule") that will be optimized using just-in-time compilation. [`fork`](generated/torch.jit.fork#torch.jit.fork "torch.jit.fork")(func, *args, **kwargs) | Creates an asynchronous task executing `func` and a reference to the value of the result of this execution. [`wait`](generated/torch.jit.wait#torch.jit.wait "torch.jit.wait")(future) | Forces completion of a `torch.jit.Future[T]` asynchronous task, returning the result of the task. [`ScriptModule`](generated/torch.jit.scriptmodule#torch.jit.ScriptModule "torch.jit.ScriptModule")() | A wrapper around C++ `torch::jit::Module`. [`ScriptFunction`](generated/torch.jit.scriptfunction#torch.jit.ScriptFunction "torch.jit.ScriptFunction") | Functionally equivalent to a [`ScriptModule`](generated/torch.jit.scriptmodule#torch.jit.ScriptModule "torch.jit.ScriptModule"), but represents a single function and does not have any attributes or Parameters. [`freeze`](generated/torch.jit.freeze#torch.jit.freeze "torch.jit.freeze")(mod[, preserved_attrs, optimize_numerics]) | Freezing a [`ScriptModule`](generated/torch.jit.scriptmodule#torch.jit.ScriptModule "torch.jit.ScriptModule") will clone it and attempt to inline the cloned module’s submodules, parameters, and attributes as constants in the TorchScript IR Graph. [`save`](generated/torch.jit.save#torch.jit.save "torch.jit.save")(m, f[, _extra_files]) | Save an offline version of this module for use in a separate process. [`load`](generated/torch.jit.load#torch.jit.load "torch.jit.load")(f[, map_location, _extra_files]) | Load a [`ScriptModule`](generated/torch.jit.scriptmodule#torch.jit.ScriptModule "torch.jit.ScriptModule") or [`ScriptFunction`](generated/torch.jit.scriptfunction#torch.jit.ScriptFunction "torch.jit.ScriptFunction") previously saved with [`torch.jit.save`](generated/torch.jit.save#torch.jit.save "torch.jit.save") [`ignore`](generated/torch.jit.ignore#torch.jit.ignore "torch.jit.ignore")([drop]) | This decorator indicates to the compiler that a function or method should be ignored and left as a Python function. [`unused`](generated/torch.jit.unused#torch.jit.unused "torch.jit.unused")(fn) | This decorator indicates to the compiler that a function or method should be ignored and replaced with the raising of an exception. [`isinstance`](generated/torch.jit.isinstance#torch.jit.isinstance "torch.jit.isinstance")(obj, target_type) | This function provides for conatiner type refinement in TorchScript. ## Mixing Tracing and Scripting In many cases either tracing or scripting is an easier approach for converting a model to TorchScript. Tracing and scripting can be composed to suit the particular requirements of a part of a model. Scripted functions can call traced functions. 
This is particularly useful when you need to use control-flow around a simple feed-forward model. For instance the beam search of a sequence to sequence model will typically be written in script but can call an encoder module generated using tracing. Example (calling a traced function in script): import torch def foo(x, y): return 2 * x + y traced_foo = torch.jit.trace(foo, (torch.rand(3), torch.rand(3))) @torch.jit.script def bar(x): return traced_foo(x, x) Traced functions can call script functions. This is useful when a small part of a model requires some control-flow even though most of the model is just a feed-forward network. Control-flow inside of a script function called by a traced function is preserved correctly. Example (calling a script function in a traced function): import torch @torch.jit.script def foo(x, y): if x.max() > y.max(): r = x else: r = y return r def bar(x, y, z): return foo(x, y) + z traced_bar = torch.jit.trace(bar, (torch.rand(3), torch.rand(3), torch.rand(3))) This composition also works for `nn.Module`s as well, where it can be used to generate a submodule using tracing that can be called from the methods of a script module. Example (using a traced module): import torch import torchvision class MyScriptModule(torch.nn.Module): def __init__(self): super(MyScriptModule, self).__init__() self.means = torch.nn.Parameter(torch.tensor([103.939, 116.779, 123.68]) .resize_(1, 3, 1, 1)) self.resnet = torch.jit.trace(torchvision.models.resnet18(), torch.rand(1, 3, 224, 224)) def forward(self, input): return self.resnet(input - self.means) my_script_module = torch.jit.script(MyScriptModule()) ## TorchScript Language TorchScript is a statically typed subset of Python, so many Python features apply directly to TorchScript. See the full [TorchScript Language Reference](jit_language_reference#language-reference) for details. ## Built-in Functions and Modules TorchScript supports the use of most PyTorch functions and many Python built- ins. See [TorchScript Builtins](jit_builtin_functions#builtin-functions) for a full reference of supported functions. ### PyTorch Functions and Modules TorchScript supports a subset of the tensor and neural network functions that PyTorch provides. Most methods on Tensor as well as functions in the `torch` namespace, all functions in `torch.nn.functional` and most modules from `torch.nn` are supported in TorchScript. See [TorchScript Unsupported Pytorch Constructs](jit_unsupported#jit- unsupported) for a list of unsupported PyTorch functions and modules. ### Python Functions and Modules Many of Python’s [built-in functions](https://docs.python.org/3/library/functions.html) are supported in TorchScript. The [`math`](https://docs.python.org/3/library/math.html#module- math "\(in Python v3.9\)") module is also supported (see [math Module](jit_builtin_functions#math-module) for details), but no other Python modules (built-in or third party) are supported. ### Python Language Reference Comparison For a full listing of supported Python features, see [Python Language Reference Coverage](jit_python_reference#python-language-reference). ## Debugging ### Disable JIT for Debugging `PYTORCH_JIT` Setting the environment variable `PYTORCH_JIT=0` will disable all script and tracing annotations. If there is hard-to-debug error in one of your TorchScript models, you can use this flag to force everything to run using native Python. Since TorchScript (scripting and tracing) is disabled with this flag, you can use tools like `pdb` to debug the model code. 
For example: @torch.jit.script def scripted_fn(x : torch.Tensor): for i in range(12): x = x + x return x def fn(x): x = torch.neg(x) import pdb; pdb.set_trace() return scripted_fn(x) traced_fn = torch.jit.trace(fn, (torch.rand(4, 5),)) traced_fn(torch.rand(3, 4)) Debugging this script with `pdb` works except for when we invoke the [`@torch.jit.script`](generated/torch.jit.script#torch.jit.script "torch.jit.script") function. We can globally disable JIT, so that we can call the [`@torch.jit.script`](generated/torch.jit.script#torch.jit.script "torch.jit.script") function as a normal Python function and not compile it. If the above script is called `disable_jit_example.py`, we can invoke it like so: $ PYTORCH_JIT=0 python disable_jit_example.py and we will be able to step into the [`@torch.jit.script`](generated/torch.jit.script#torch.jit.script "torch.jit.script") function as a normal Python function. To disable the TorchScript compiler for a specific function, see [`@torch.jit.ignore`](generated/torch.jit.ignore#torch.jit.ignore "torch.jit.ignore"). ### Inspecting Code TorchScript provides a code pretty-printer for all [`ScriptModule`](generated/torch.jit.scriptmodule#torch.jit.ScriptModule "torch.jit.ScriptModule") instances. This pretty-printer gives an interpretation of the script method’s code as valid Python syntax. For example: @torch.jit.script def foo(len): # type: (int) -> torch.Tensor rv = torch.zeros(3, 4) for i in range(len): if i < 10: rv = rv - 1.0 else: rv = rv + 1.0 return rv print(foo.code) A [`ScriptModule`](generated/torch.jit.scriptmodule#torch.jit.ScriptModule "torch.jit.ScriptModule") with a single `forward` method will have an attribute `code`, which you can use to inspect the [`ScriptModule`](generated/torch.jit.scriptmodule#torch.jit.ScriptModule "torch.jit.ScriptModule")’s code. If the [`ScriptModule`](generated/torch.jit.scriptmodule#torch.jit.ScriptModule "torch.jit.ScriptModule") has more than one method, you will need to access `.code` on the method itself and not the module. We can inspect the code of a method named `foo` on a [`ScriptModule`](generated/torch.jit.scriptmodule#torch.jit.ScriptModule "torch.jit.ScriptModule") by accessing `.foo.code`. The example above produces this output: def foo(len: int) -> Tensor: rv = torch.zeros([3, 4], dtype=None, layout=None, device=None, pin_memory=None) rv0 = rv for i in range(len): if torch.lt(i, 10): rv1 = torch.sub(rv0, 1., 1) else: rv1 = torch.add(rv0, 1., 1) rv0 = rv1 return rv0 This is TorchScript’s compilation of the code for the `forward` method. You can use this to ensure TorchScript (tracing or scripting) has captured your model code correctly. ### Interpreting Graphs TorchScript also has a representation at a lower level than the code pretty- printer, in the form of IR graphs. TorchScript uses a static single assignment (SSA) intermediate representation (IR) to represent computation. The instructions in this format consist of ATen (the C++ backend of PyTorch) operators and other primitive operators, including control flow operators for loops and conditionals. As an example: @torch.jit.script def foo(len): # type: (int) -> torch.Tensor rv = torch.zeros(3, 4) for i in range(len): if i < 10: rv = rv - 1.0 else: rv = rv + 1.0 return rv print(foo.graph) `graph` follows the same rules described in the Inspecting Code section with regard to `forward` method lookup. 
The example script above produces the graph: graph(%len.1 : int): %24 : int = prim::Constant[value=1]() %17 : bool = prim::Constant[value=1]() # test.py:10:5 %12 : bool? = prim::Constant() %10 : Device? = prim::Constant() %6 : int? = prim::Constant() %1 : int = prim::Constant[value=3]() # test.py:9:22 %2 : int = prim::Constant[value=4]() # test.py:9:25 %20 : int = prim::Constant[value=10]() # test.py:11:16 %23 : float = prim::Constant[value=1]() # test.py:12:23 %4 : int[] = prim::ListConstruct(%1, %2) %rv.1 : Tensor = aten::zeros(%4, %6, %6, %10, %12) # test.py:9:10 %rv : Tensor = prim::Loop(%len.1, %17, %rv.1) # test.py:10:5 block0(%i.1 : int, %rv.14 : Tensor): %21 : bool = aten::lt(%i.1, %20) # test.py:11:12 %rv.13 : Tensor = prim::If(%21) # test.py:11:9 block0(): %rv.3 : Tensor = aten::sub(%rv.14, %23, %24) # test.py:12:18 -> (%rv.3) block1(): %rv.6 : Tensor = aten::add(%rv.14, %23, %24) # test.py:14:18 -> (%rv.6) -> (%17, %rv.13) return (%rv) Take the instruction `%rv.1 : Tensor = aten::zeros(%4, %6, %6, %10, %12) # test.py:9:10` for example. * `%rv.1 : Tensor` means we assign the output to a (unique) value named `rv.1`, that value is of `Tensor` type and that we do not know its concrete shape. * `aten::zeros` is the operator (equivalent to `torch.zeros`) and the input list `(%4, %6, %6, %10, %12)` specifies which values in scope should be passed as inputs. The schema for built-in functions like `aten::zeros` can be found at Builtin Functions. * `# test.py:9:10` is the location in the original source file that generated this instruction. In this case, it is a file named `test.py`, on line 9, and at character 10. Notice that operators can also have associated `blocks`, namely the `prim::Loop` and `prim::If` operators. In the graph print-out, these operators are formatted to reflect their equivalent source code forms to facilitate easy debugging. Graphs can be inspected as shown to confirm that the computation described by a [`ScriptModule`](generated/torch.jit.scriptmodule#torch.jit.ScriptModule "torch.jit.ScriptModule") is correct, in both automated and manual fashion, as described below. ### Tracer #### Tracing Edge Cases There are some edge cases that exist where the trace of a given Python function/module will not be representative of the underlying code. These cases can include: * Tracing of control flow that is dependent on inputs (e.g. tensor shapes) * Tracing of in-place operations of tensor views (e.g. indexing on the left-hand side of an assignment) Note that these cases may in fact be traceable in the future. #### Automatic Trace Checking One way to automatically catch many errors in traces is by using `check_inputs` on the `torch.jit.trace()` API. `check_inputs` takes a list of tuples of inputs that will be used to re-trace the computation and verify the results. For example: def loop_in_traced_fn(x): result = x[0] for i in range(x.size(0)): result = result * x[i] return result inputs = (torch.rand(3, 4, 5),) check_inputs = [(torch.rand(4, 5, 6),), (torch.rand(2, 3, 4),)] traced = torch.jit.trace(loop_in_traced_fn, inputs, check_inputs=check_inputs) Gives us the following diagnostic information: ERROR: Graphs differed across invocations! 
Graph diff: graph(%x : Tensor) { %1 : int = prim::Constant[value=0]() %2 : int = prim::Constant[value=0]() %result.1 : Tensor = aten::select(%x, %1, %2) %4 : int = prim::Constant[value=0]() %5 : int = prim::Constant[value=0]() %6 : Tensor = aten::select(%x, %4, %5) %result.2 : Tensor = aten::mul(%result.1, %6) %8 : int = prim::Constant[value=0]() %9 : int = prim::Constant[value=1]() %10 : Tensor = aten::select(%x, %8, %9) - %result : Tensor = aten::mul(%result.2, %10) + %result.3 : Tensor = aten::mul(%result.2, %10) ? ++ %12 : int = prim::Constant[value=0]() %13 : int = prim::Constant[value=2]() %14 : Tensor = aten::select(%x, %12, %13) + %result : Tensor = aten::mul(%result.3, %14) + %16 : int = prim::Constant[value=0]() + %17 : int = prim::Constant[value=3]() + %18 : Tensor = aten::select(%x, %16, %17) - %15 : Tensor = aten::mul(%result, %14) ? ^ ^ + %19 : Tensor = aten::mul(%result, %18) ? ^ ^ - return (%15); ? ^ + return (%19); ? ^ } This message indicates to us that the computation differed between when we first traced it and when we traced it with the `check_inputs`. Indeed, the loop within the body of `loop_in_traced_fn` depends on the shape of the input `x`, and thus when we try another `x` with a different shape, the trace differs. In this case, data-dependent control flow like this can be captured using [`torch.jit.script()`](generated/torch.jit.script#torch.jit.script "torch.jit.script") instead: def fn(x): result = x[0] for i in range(x.size(0)): result = result * x[i] return result inputs = (torch.rand(3, 4, 5),) check_inputs = [(torch.rand(4, 5, 6),), (torch.rand(2, 3, 4),)] scripted_fn = torch.jit.script(fn) print(scripted_fn.graph) #print(str(scripted_fn.graph).strip()) for input_tuple in [inputs] + check_inputs: torch.testing.assert_allclose(fn(*input_tuple), scripted_fn(*input_tuple)) Which produces: graph(%x : Tensor) { %5 : bool = prim::Constant[value=1]() %1 : int = prim::Constant[value=0]() %result.1 : Tensor = aten::select(%x, %1, %1) %4 : int = aten::size(%x, %1) %result : Tensor = prim::Loop(%4, %5, %result.1) block0(%i : int, %7 : Tensor) { %10 : Tensor = aten::select(%x, %1, %i) %result.2 : Tensor = aten::mul(%7, %10) -> (%5, %result.2) } return (%result); } #### Tracer Warnings The tracer produces warnings for several problematic patterns in traced computation. As an example, take a trace of a function that contains an in- place assignment on a slice (a view) of a Tensor: def fill_row_zero(x): x[0] = torch.rand(*x.shape[1:2]) return x traced = torch.jit.trace(fill_row_zero, (torch.rand(3, 4),)) print(traced.graph) Produces several warnings and a graph which simply returns the input: fill_row_zero.py:4: TracerWarning: There are 2 live references to the data region being modified when tracing in-place operator copy_ (possibly due to an assignment). This might cause the trace to be incorrect, because all other views that also reference this data will not reflect this change in the trace! On the other hand, if all other views use the same memory chunk, but are disjoint (e.g. are outputs of torch.split), this might still be safe. x[0] = torch.rand(*x.shape[1:2]) fill_row_zero.py:6: TracerWarning: Output nr 1. of the traced function does not match the corresponding output of the Python function. Detailed error: Not within tolerance rtol=1e-05 atol=1e-05 at input[0, 1] (0.09115803241729736 vs. 
0.6782537698745728) and 3 other locations (33.00%) traced = torch.jit.trace(fill_row_zero, (torch.rand(3, 4),)) graph(%0 : Float(3, 4)) { return (%0); } We can fix this by modifying the code to not use the in-place update, but rather build up the result tensor out-of-place with `torch.cat`: def fill_row_zero(x): x = torch.cat((torch.rand(1, *x.shape[1:2]), x[1:2]), dim=0) return x traced = torch.jit.trace(fill_row_zero, (torch.rand(3, 4),)) print(traced.graph) ## Frequently Asked Questions Q: I would like to train a model on GPU and do inference on CPU. What are the best practices? First convert your model from GPU to CPU and then save it, like so: cpu_model = gpu_model.cpu() sample_input_cpu = sample_input_gpu.cpu() traced_cpu = torch.jit.trace(cpu_model, sample_input_cpu) torch.jit.save(traced_cpu, "cpu.pt") traced_gpu = torch.jit.trace(gpu_model, sample_input_gpu) torch.jit.save(traced_gpu, "gpu.pt") # ... later, when using the model: if use_gpu: model = torch.jit.load("gpu.pt") else: model = torch.jit.load("cpu.pt") model(input) This is recommended because the tracer may witness tensor creation on a specific device, so casting an already-loaded model may have unexpected effects. Casting the model _before_ saving it ensures that the tracer has the correct device information. Q: How do I store attributes on a [`ScriptModule`](generated/torch.jit.scriptmodule#torch.jit.ScriptModule "torch.jit.ScriptModule")? Say we have a model like: import torch class Model(torch.nn.Module): def __init__(self): super(Model, self).__init__() self.x = 2 def forward(self): return self.x m = torch.jit.script(Model()) If `Model` is instantiated it will result in a compilation error since the compiler doesn’t know about `x`. There are 4 ways to inform the compiler of attributes on [`ScriptModule`](generated/torch.jit.scriptmodule#torch.jit.ScriptModule "torch.jit.ScriptModule"): 1\. `nn.Parameter` \- Values wrapped in `nn.Parameter` will work as they do on `nn.Module`s 2\. `register_buffer` \- Values wrapped in `register_buffer` will work as they do on `nn.Module`s. This is equivalent to an attribute (see 4) of type `Tensor`. 3\. Constants - Annotating a class member as `Final` (or adding it to a list called `__constants__` at the class definition level) will mark the contained names as constants. Constants are saved directly in the code of the model. See `builtin-constants` for details. 4\. Attributes - Values that are a `supported type` can be added as mutable attributes. Most types can be inferred but some may need to be specified, see `module attributes` for details. Q: I would like to trace module’s method but I keep getting this error: `RuntimeError: Cannot insert a Tensor that requires grad as a constant. Consider making it a parameter or input, or detaching the gradient` This error usually means that the method you are tracing uses a module’s parameters and you are passing the module’s method instead of the module instance (e.g. `my_module_instance.forward` vs `my_module_instance`). * Invoking `trace` with a module’s method captures module parameters (which may require gradients) as **constants**. * On the other hand, invoking `trace` with module’s instance (e.g. `my_module`) creates a new module and correctly copies parameters into the new module, so they can accumulate gradients if required. 
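For concreteness, a minimal sketch contrasting the two call forms (the module, layer sizes, and input below are illustrative, not from the original example):

import torch
import torch.nn as nn

class MyModule(nn.Module):
    def __init__(self):
        super(MyModule, self).__init__()
        self.linear = nn.Linear(4, 4)

    def forward(self, x):
        return self.linear(x)

my_module_instance = MyModule()
example_input = torch.rand(1, 4)

# Passing the bound method captures the module's parameters as constants and
# typically triggers the "requires grad as a constant" error:
#   torch.jit.trace(my_module_instance.forward, example_input)

# Passing the module instance copies the parameters into the traced module:
traced = torch.jit.trace(my_module_instance, example_input)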
To trace a specific method on a module, see [`torch.jit.trace_module`](generated/torch.jit.trace_module#torch.jit.trace_module "torch.jit.trace_module"). ## Appendix ### Migrating to PyTorch 1.2 Recursive Scripting API This section details the changes to TorchScript in PyTorch 1.2. If you are new to TorchScript you can skip this section. There are two main changes to the TorchScript API with PyTorch 1.2. 1\. [`torch.jit.script`](generated/torch.jit.script#torch.jit.script "torch.jit.script") will now attempt to recursively compile functions, methods, and classes that it encounters. Once you call `torch.jit.script`, compilation is “opt-out”, rather than “opt-in”. 2\. `torch.jit.script(nn_module_instance)` is now the preferred way to create [`ScriptModule`](generated/torch.jit.scriptmodule#torch.jit.ScriptModule "torch.jit.ScriptModule")s, instead of inheriting from `torch.jit.ScriptModule`. These changes combine to provide a simpler, easier-to-use API for converting your `nn.Module`s into [`ScriptModule`](generated/torch.jit.scriptmodule#torch.jit.ScriptModule "torch.jit.ScriptModule")s, ready to be optimized and executed in a non-Python environment. The new usage looks like this: import torch import torch.nn as nn import torch.nn.functional as F class Model(nn.Module): def __init__(self): super(Model, self).__init__() self.conv1 = nn.Conv2d(1, 20, 5) self.conv2 = nn.Conv2d(20, 20, 5) def forward(self, x): x = F.relu(self.conv1(x)) return F.relu(self.conv2(x)) my_model = Model() my_scripted_model = torch.jit.script(my_model) * The module’s `forward` is compiled by default. Methods called from `forward` are lazily compiled in the order they are used in `forward`. * To compile a method other than `forward` that is not called from `forward`, add `@torch.jit.export`. * To stop the compiler from compiling a method, add [`@torch.jit.ignore`](generated/torch.jit.ignore#torch.jit.ignore "torch.jit.ignore") or [`@torch.jit.unused`](generated/torch.jit.unused#torch.jit.unused "torch.jit.unused"). `@ignore` leaves the method as a call to Python, and `@unused` replaces it with an exception. Methods marked `@ignore` cannot be exported; methods marked `@unused` can. * Most attribute types can be inferred, so `torch.jit.Attribute` is not necessary. For empty container types, annotate their types using [PEP 526-style](https://www.python.org/dev/peps/pep-0526/#class-and-instance-variable-annotations) class annotations. * Constants can be marked with a `Final` class annotation instead of adding the name of the member to `__constants__`. * Python 3 type hints can be used in place of `torch.jit.annotate`. As a result of these changes, the following items are considered deprecated and should not appear in new code: * The `@torch.jit.script_method` decorator * Classes that inherit from `torch.jit.ScriptModule` * The `torch.jit.Attribute` wrapper class * The `__constants__` array * The `torch.jit.annotate` function #### Modules Warning The [`@torch.jit.ignore`](generated/torch.jit.ignore#torch.jit.ignore "torch.jit.ignore") annotation’s behavior changes in PyTorch 1.2. Before PyTorch 1.2 the @ignore decorator was used to make a function or method callable from code that is exported. To get this functionality back, use `@torch.jit.unused()`. `@torch.jit.ignore` is now equivalent to `@torch.jit.ignore(drop=False)`. See [`@torch.jit.ignore`](generated/torch.jit.ignore#torch.jit.ignore "torch.jit.ignore") and [`@torch.jit.unused`](generated/torch.jit.unused#torch.jit.unused "torch.jit.unused") for details.
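As a rough sketch of the post-1.2 behavior (the module and method names here are hypothetical), a method that is not TorchScript-compatible can be excluded with `@torch.jit.unused`; scripting still succeeds, and calling the excluded method from compiled code raises an exception:

import torch
import torch.nn as nn

class MyModule(nn.Module):
    def forward(self, x):
        return x + 1

    @torch.jit.unused
    def debug_only(self, x):
        # Not TorchScript-compatible; replaced with a raising stub when scripted.
        import pdb; pdb.set_trace()
        return x

m = torch.jit.script(MyModule())  # compiles `forward`, skips `debug_only`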
When passed to the [`torch.jit.script`](generated/torch.jit.script#torch.jit.script "torch.jit.script") function, a `torch.nn.Module`’s data is copied to a [`ScriptModule`](generated/torch.jit.scriptmodule#torch.jit.ScriptModule "torch.jit.ScriptModule") and the TorchScript compiler compiles the module. The module’s `forward` is compiled by default. Methods called from `forward` are lazily compiled in the order they are used in `forward`, as well as any `@torch.jit.export` methods. `torch.jit.export(fn)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/_jit_internal.html#export) This decorator indicates that a method on an `nn.Module` is used as an entry point into a [`ScriptModule`](generated/torch.jit.scriptmodule#torch.jit.ScriptModule "torch.jit.ScriptModule") and should be compiled. `forward` implicitly is assumed to be an entry point, so it does not need this decorator. Functions and methods called from `forward` are compiled as they are seen by the compiler, so they do not need this decorator either. Example (using `@torch.jit.export` on a method): import torch import torch.nn as nn class MyModule(nn.Module): def implicitly_compiled_method(self, x): return x + 99 # `forward` is implicitly decorated with `@torch.jit.export`, # so adding it here would have no effect def forward(self, x): return x + 10 @torch.jit.export def another_forward(self, x): # When the compiler sees this call, it will compile # `implicitly_compiled_method` return self.implicitly_compiled_method(x) def unused_method(self, x): return x - 20 # `m` will contain compiled methods: # `forward` # `another_forward` # `implicitly_compiled_method` # `unused_method` will not be compiled since it was not called from # any compiled methods and wasn't decorated with `@torch.jit.export` m = torch.jit.script(MyModule()) #### Functions Functions don’t change much, they can be decorated with [`@torch.jit.ignore`](generated/torch.jit.ignore#torch.jit.ignore "torch.jit.ignore") or [`torch.jit.unused`](generated/torch.jit.unused#torch.jit.unused "torch.jit.unused") if needed. # Same behavior as pre-PyTorch 1.2 @torch.jit.script def some_fn(): return 2 # Marks a function as ignored, if nothing # ever calls it then this has no effect @torch.jit.ignore def some_fn2(): return 2 # As with ignore, if nothing calls it then it has no effect. # If it is called in script it is replaced with an exception. @torch.jit.unused def some_fn3(): import pdb; pdb.set_trace() return 4 # Doesn't do anything, this function is already # the main entry point @torch.jit.export def some_fn4(): return 2 #### TorchScript Classes Warning TorchScript class support is experimental. Currently it is best suited for simple record-like types (think a `NamedTuple` with methods attached). Everything in a user defined [TorchScript Class](torchscript-class) is exported by default, functions can be decorated with [`@torch.jit.ignore`](generated/torch.jit.ignore#torch.jit.ignore "torch.jit.ignore") if needed. #### Attributes The TorchScript compiler needs to know the types of `module attributes`. Most types can be inferred from the value of the member. Empty lists and dicts cannot have their types inferred and must have their types annotated with [PEP 526-style](https://www.python.org/dev/peps/pep-0526/#class-and-instance- variable-annotations) class annotations. 
If a type cannot be inferred and is not explicitly annotated, it will not be added as an attribute to the resulting [`ScriptModule`](generated/torch.jit.scriptmodule#torch.jit.ScriptModule "torch.jit.ScriptModule") Old API: from typing import Dict import torch class MyModule(torch.jit.ScriptModule): def __init__(self): super(MyModule, self).__init__() self.my_dict = torch.jit.Attribute({}, Dict[str, int]) self.my_int = torch.jit.Attribute(20, int) m = MyModule() New API: from typing import Dict class MyModule(torch.nn.Module): my_dict: Dict[str, int] def __init__(self): super(MyModule, self).__init__() # This type cannot be inferred and must be specified self.my_dict = {} # The attribute type here is inferred to be `int` self.my_int = 20 def forward(self): pass m = torch.jit.script(MyModule()) #### Constants The `Final` type constructor can be used to mark members as `constant`. If members are not marked constant, they will be copied to the resulting [`ScriptModule`](generated/torch.jit.scriptmodule#torch.jit.ScriptModule "torch.jit.ScriptModule") as an attribute. Using `Final` opens opportunities for optimization if the value is known to be fixed and gives additional type safety. Old API: class MyModule(torch.jit.ScriptModule): __constants__ = ['my_constant'] def __init__(self): super(MyModule, self).__init__() self.my_constant = 2 def forward(self): pass m = MyModule() New API: try: from typing_extensions import Final except: # If you don't have `typing_extensions` installed, you can use a # polyfill from `torch.jit`. from torch.jit import Final class MyModule(torch.nn.Module): my_constant: Final[int] def __init__(self): super(MyModule, self).__init__() self.my_constant = 2 def forward(self): pass m = torch.jit.script(MyModule()) #### Variables Containers are assumed to have type `Tensor` and be non-optional (see `Default Types` for more information). Previously, `torch.jit.annotate` was used to tell the TorchScript compiler what the type should be. Python 3 style type hints are now supported. import torch from typing import Dict, Optional @torch.jit.script def make_dict(flag: bool): x: Dict[str, int] = {} x['hi'] = 2 b: Optional[int] = None if flag: b = 2 return x, b ### References * [Python Language Reference Coverage](jit_python_reference) * [TorchScript Unsupported Pytorch Constructs](jit_unsupported) * Types * Expressions * Statements * Variable Resolution * Use of Python Values # TorchScript Language Reference TorchScript is a statically typed subset of Python that can either be written directly (using the [`@torch.jit.script`](generated/torch.jit.script#torch.jit.script "torch.jit.script") decorator) or generated automatically from Python code via tracing. When using tracing, code is automatically converted into this subset of Python by recording only the actual operators on tensors and simply executing and discarding the other surrounding Python code. When writing TorchScript directly using `@torch.jit.script` decorator, the programmer must only use the subset of Python supported in TorchScript. This section documents what is supported in TorchScript as if it were a language reference for a stand alone language. Any features of Python not mentioned in this reference are not part of TorchScript. See `Builtin Functions` for a complete reference of available Pytorch tensor methods, modules, and functions. As a subset of Python, any valid TorchScript function is also a valid Python function. 
This makes it possible to `disable TorchScript` and debug the function using standard Python tools like `pdb`. The reverse is not true: there are many valid Python programs that are not valid TorchScript programs. Instead, TorchScript focuses specifically on the features of Python that are needed to represent neural network models in PyTorch. ## Types The largest difference between TorchScript and the full Python language is that TorchScript only supports a small set of types that are needed to express neural net models. In particular, TorchScript supports:

Type | Description
---|---
`Tensor` | A PyTorch tensor of any dtype, dimension, or backend
`Tuple[T0, T1, ..., TN]` | A tuple containing subtypes `T0`, `T1`, etc. (e.g. `Tuple[Tensor, Tensor]`)
`bool` | A boolean value
`int` | A scalar integer
`float` | A scalar floating point number
`str` | A string
`List[T]` | A list of which all members are type `T`
`Optional[T]` | A value which is either None or type `T`
`Dict[K, V]` | A dict with key type `K` and value type `V`. Only `str`, `int`, and `float` are allowed as key types.
`T` | A TorchScript Class
`E` | A TorchScript Enum
`NamedTuple[T0, T1, ...]` | A [`collections.namedtuple`](https://docs.python.org/3/library/collections.html#collections.namedtuple "\(in Python v3.9\)") tuple type

Unlike Python, each variable in a TorchScript function must have a single static type. This makes it easier to optimize TorchScript functions. Example (a type mismatch): import torch @torch.jit.script def an_error(x): if x: r = torch.rand(1) else: r = 4 return r Traceback (most recent call last): ... RuntimeError: ... Type mismatch: r is set to type Tensor in the true branch and type int in the false branch: @torch.jit.script def an_error(x): if x: ~~~~~ r = torch.rand(1) ~~~~~~~~~~~~~~~~~ else: ~~~~~ r = 4 ~~~~~ <--- HERE return r and was used here: else: r = 4 return r ~ <--- HERE... ### Unsupported Typing Constructs TorchScript does not support all features and types of the [`typing`](https://docs.python.org/3/library/typing.html#module-typing "\(in Python v3.9\)") module. Some of these are more fundamental things that are unlikely to be added in the future while others may be added if there is enough user demand to make it a priority. These types and features from the [`typing`](https://docs.python.org/3/library/typing.html#module-typing "\(in Python v3.9\)") module are unavailable in TorchScript.
Item | Description
---|---
[`typing.Any`](https://docs.python.org/3/library/typing.html#typing.Any "\(in Python v3.9\)") | [`typing.Any`](https://docs.python.org/3/library/typing.html#typing.Any "\(in Python v3.9\)") is currently in development but not yet released
[`typing.NoReturn`](https://docs.python.org/3/library/typing.html#typing.NoReturn "\(in Python v3.9\)") | Not implemented
[`typing.Union`](https://docs.python.org/3/library/typing.html#typing.Union "\(in Python v3.9\)") | Unlikely to be implemented (however [`typing.Optional`](https://docs.python.org/3/library/typing.html#typing.Optional "\(in Python v3.9\)") is supported)
[`typing.Sequence`](https://docs.python.org/3/library/typing.html#typing.Sequence "\(in Python v3.9\)") | Not implemented
[`typing.Callable`](https://docs.python.org/3/library/typing.html#typing.Callable "\(in Python v3.9\)") | Not implemented
[`typing.Literal`](https://docs.python.org/3/library/typing.html#typing.Literal "\(in Python v3.9\)") | Not implemented
[`typing.ClassVar`](https://docs.python.org/3/library/typing.html#typing.ClassVar "\(in Python v3.9\)") | Not implemented
[`typing.Final`](https://docs.python.org/3/library/typing.html#typing.Final "\(in Python v3.9\)") | This is supported for module attribute class annotations, but not for functions
[`typing.AnyStr`](https://docs.python.org/3/library/typing.html#typing.AnyStr "\(in Python v3.9\)") | TorchScript does not support [`bytes`](https://docs.python.org/3/library/stdtypes.html#bytes "\(in Python v3.9\)") so this type is not used
[`typing.overload`](https://docs.python.org/3/library/typing.html#typing.overload "\(in Python v3.9\)") | [`typing.overload`](https://docs.python.org/3/library/typing.html#typing.overload "\(in Python v3.9\)") is currently in development but not yet released
Type aliases | Not implemented
Nominal vs structural subtyping | Nominal typing is in development, but structural typing is not
NewType | Unlikely to be implemented
Generics | Unlikely to be implemented

Any other functionality from the [`typing`](https://docs.python.org/3/library/typing.html#module-typing "\(in Python v3.9\)") module not explicitly listed in this documentation is unsupported. ### Default Types By default, all parameters to a TorchScript function are assumed to be Tensor. To specify that an argument to a TorchScript function is another type, it is possible to use MyPy-style type annotations using the types listed above. import torch @torch.jit.script def foo(x, tup): # type: (int, Tuple[Tensor, Tensor]) -> Tensor t0, t1 = tup return t0 + t1 + x print(foo(3, (torch.rand(3), torch.rand(3)))) Note It is also possible to annotate types with Python 3 type hints from the `typing` module. import torch from typing import Tuple @torch.jit.script def foo(x: int, tup: Tuple[torch.Tensor, torch.Tensor]) -> torch.Tensor: t0, t1 = tup return t0 + t1 + x print(foo(3, (torch.rand(3), torch.rand(3)))) An empty list is assumed to be `List[Tensor]` and empty dicts `Dict[str, Tensor]`. To instantiate an empty list or dict of other types, use `Python 3 type hints`.
Example (type annotations for Python 3): import torch import torch.nn as nn from typing import Dict, List, Tuple class EmptyDataStructures(torch.nn.Module): def __init__(self): super(EmptyDataStructures, self).__init__() def forward(self, x: torch.Tensor) -> Tuple[List[Tuple[int, float]], Dict[str, int]]: # This annotates the list to be a `List[Tuple[int, float]]` my_list: List[Tuple[int, float]] = [] for i in range(10): my_list.append((i, x.item())) my_dict: Dict[str, int] = {} return my_list, my_dict x = torch.jit.script(EmptyDataStructures()) ### Optional Type Refinement TorchScript will refine the type of a variable of type `Optional[T]` when a comparison to `None` is made inside the conditional of an if-statement or checked in an `assert`. The compiler can reason about multiple `None` checks that are combined with `and`, `or`, and `not`. Refinement will also occur for else blocks of if-statements that are not explicitly written. The `None` check must be within the if-statement’s condition; assigning a `None` check to a variable and using it in the if-statement’s condition will not refine the types of variables in the check. Only local variables will be refined, an attribute like `self.x` will not and must assigned to a local variable to be refined. Example (refining types on parameters and locals): import torch import torch.nn as nn from typing import Optional class M(nn.Module): z: Optional[int] def __init__(self, z): super(M, self).__init__() # If `z` is None, its type cannot be inferred, so it must # be specified (above) self.z = z def forward(self, x, y, z): # type: (Optional[int], Optional[int], Optional[int]) -> int if x is None: x = 1 x = x + 1 # Refinement for an attribute by assigning it to a local z = self.z if y is not None and z is not None: x = y + z # Refinement via an `assert` assert z is not None x += z return x module = torch.jit.script(M(2)) module = torch.jit.script(M(None)) ### TorchScript Classes Warning TorchScript class support is experimental. Currently it is best suited for simple record-like types (think a `NamedTuple` with methods attached). Python classes can be used in TorchScript if they are annotated with [`@torch.jit.script`](generated/torch.jit.script#torch.jit.script "torch.jit.script"), similar to how you would declare a TorchScript function: @torch.jit.script class Foo: def __init__(self, x, y): self.x = x def aug_add_x(self, inc): self.x += inc This subset is restricted: * All functions must be valid TorchScript functions (including `__init__()`). * Classes must be new-style classes, as we use `__new__()` to construct them with pybind11. * TorchScript classes are statically typed. Members can only be declared by assigning to self in the `__init__()` method. For example, assigning to `self` outside of the `__init__()` method: @torch.jit.script class Foo: def assign_x(self): self.x = torch.rand(2, 3) Will result in: RuntimeError: Tried to set nonexistent attribute: x. Did you forget to initialize it in __init__()?: def assign_x(self): self.x = torch.rand(2, 3) ~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE * No expressions except method definitions are allowed in the body of the class. * No support for inheritance or any other polymorphism strategy, except for inheriting from `object` to specify a new-style class. 
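A minimal sketch of the corrected pattern for the `RuntimeError` example above (not from the original docs): declare the attribute in `__init__()`, after which methods may reassign it:

import torch

@torch.jit.script
class Foo:
    def __init__(self):
        # Declaring `x` here makes it a known, statically typed member (Tensor).
        self.x = torch.rand(2, 3)

    def assign_x(self):
        # Reassigning a member declared in __init__() is allowed.
        self.x = torch.rand(2, 3)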
After a class is defined, it can be used in both TorchScript and Python interchangeably like any other TorchScript type: # Declare a TorchScript class @torch.jit.script class Pair: def __init__(self, first, second): self.first = first self.second = second @torch.jit.script def sum_pair(p): # type: (Pair) -> Tensor return p.first + p.second p = Pair(torch.rand(2, 3), torch.rand(2, 3)) print(sum_pair(p)) ### TorchScript Enums Python enums can be used in TorchScript without any extra annotation or code: from enum import Enum class Color(Enum): RED = 1 GREEN = 2 @torch.jit.script def enum_fn(x: Color, y: Color) -> bool: if x == Color.RED: return True return x == y After an enum is defined, it can be used in both TorchScript and Python interchangeably like any other TorchScript type. The type of the values of an enum must be `int`, `float`, or `str`. All values must be of the same type; heterogenous types for enum values are not supported. ### Named Tuples Types produced by [`collections.namedtuple`](https://docs.python.org/3/library/collections.html#collections.namedtuple "\(in Python v3.9\)") can be used in TorchScript. import torch import collections Point = collections.namedtuple('Point', ['x', 'y']) @torch.jit.script def total(point): # type: (Point) -> Tensor return point.x + point.y p = Point(x=torch.rand(3), y=torch.rand(3)) print(total(p)) ### Iterables Some functions (for example, [`zip`](https://docs.python.org/3/library/functions.html#zip "\(in Python v3.9\)") and [`enumerate`](https://docs.python.org/3/library/functions.html#enumerate "\(in Python v3.9\)")) can only operate on iterable types. Iterable types in TorchScript include `Tensor`s, lists, tuples, dictionaries, strings, [`torch.nn.ModuleList`](generated/torch.nn.modulelist#torch.nn.ModuleList "torch.nn.ModuleList") and [`torch.nn.ModuleDict`](generated/torch.nn.moduledict#torch.nn.ModuleDict "torch.nn.ModuleDict"). ## Expressions The following Python Expressions are supported. ### Literals True False None 'string literals' "string literals" 3 # interpreted as int 3.4 # interpreted as a float #### List Construction An empty list is assumed have type `List[Tensor]`. The types of other list literals are derived from the type of the members. See Default Types for more details. [3, 4] [] [torch.rand(3), torch.rand(4)] #### Tuple Construction (3, 4) (3,) #### Dict Construction An empty dict is assumed have type `Dict[str, Tensor]`. The types of other dict literals are derived from the type of the members. See Default Types for more details. {'hello': 3} {} {'a': torch.rand(3), 'b': torch.rand(4)} ### Variables See Variable Resolution for how variables are resolved. my_variable_name ### Arithmetic Operators a + b a - b a * b a / b a ^ b a @ b ### Comparison Operators a == b a != b a < b a > b a <= b a >= b ### Logical Operators a and b a or b not b ### Subscripts and Slicing t[0] t[-1] t[0:2] t[1:] t[:1] t[:] t[0, 1] t[0, 1:2] t[0, :1] t[-1, 1:, 0] t[1:, -1, 0] t[i:j, i] ### Function Calls Calls to `builtin functions` torch.rand(3, dtype=torch.int) Calls to other script functions: import torch @torch.jit.script def foo(x): return x + 1 @torch.jit.script def bar(x): return foo(x) ### Method Calls Calls to methods of builtin types like tensor: `x.mm(y)` On modules, methods must be compiled before they can be called. The TorchScript compiler recursively compiles methods it sees when compiling other methods. By default, compilation starts on the `forward` method. 
Any methods called by `forward` will be compiled, and any methods called by those methods, and so on. To start compilation at a method other than `forward`, use the [`@torch.jit.export`](jit#torch.jit.export "torch.jit.export") decorator (`forward` implicitly is marked `@torch.jit.export`). Calling a submodule directly (e.g. `self.resnet(input)`) is equivalent to calling its `forward` method (e.g. `self.resnet.forward(input)`). import torch import torch.nn as nn import torchvision class MyModule(nn.Module): def __init__(self): super(MyModule, self).__init__() means = torch.tensor([103.939, 116.779, 123.68]) self.means = torch.nn.Parameter(means.resize_(1, 3, 1, 1)) resnet = torchvision.models.resnet18() self.resnet = torch.jit.trace(resnet, torch.rand(1, 3, 224, 224)) def helper(self, input): return self.resnet(input - self.means) def forward(self, input): return self.helper(input) # Since nothing in the model calls `top_level_method`, the compiler # must be explicitly told to compile this method @torch.jit.export def top_level_method(self, input): return self.other_helper(input) def other_helper(self, input): return input + 10 # `my_script_module` will have the compiled methods `forward`, `helper`, # `top_level_method`, and `other_helper` my_script_module = torch.jit.script(MyModule()) ### Ternary Expressions x if x > y else y ### Casts float(ten) int(3.5) bool(ten) str(2)`` ### Accessing Module Parameters self.my_parameter self.my_submodule.my_parameter ## Statements TorchScript supports the following types of statements: ### Simple Assignments a = b a += b # short-hand for a = a + b, does not operate in-place on a a -= b ### Pattern Matching Assignments a, b = tuple_or_list a, b, *c = a_tuple Multiple Assignments a = b, c = tup ### Print Statements print("the result of an add:", a + b) ### If Statements if a < 4: r = -a elif a < 3: r = a + a else: r = 3 * a In addition to bools, floats, ints, and Tensors can be used in a conditional and will be implicitly casted to a boolean. ### While Loops a = 0 while a < 4: print(a) a += 1 ### For loops with range x = 0 for i in range(10): x *= i ### For loops over tuples These unroll the loop, generating a body for each member of the tuple. The body must type-check correctly for each member. tup = (3, torch.rand(4)) for x in tup: print(x) ### For loops over constant nn.ModuleList To use a `nn.ModuleList` inside a compiled method, it must be marked constant by adding the name of the attribute to the `__constants__` list for the type. For loops over a `nn.ModuleList` will unroll the body of the loop at compile time, with each member of the constant module list. class SubModule(torch.nn.Module): def __init__(self): super(SubModule, self).__init__() self.weight = nn.Parameter(torch.randn(2)) def forward(self, input): return self.weight + input class MyModule(torch.nn.Module): __constants__ = ['mods'] def __init__(self): super(MyModule, self).__init__() self.mods = torch.nn.ModuleList([SubModule() for i in range(10)]) def forward(self, v): for module in self.mods: v = module(v) return v m = torch.jit.script(MyModule()) ### Break and Continue for i in range(5): if i == 1: continue if i == 3: break print(i) ### Return return a, b ## Variable Resolution TorchScript supports a subset of Python’s variable resolution (i.e. scoping) rules. Local variables behave the same as in Python, except for the restriction that a variable must have the same type along all paths through a function. 
If a variable has a different type on different branches of an if statement, it is an error to use it after the end of the if statement. Similarly, a variable is not allowed to be used if it is only _defined_ along some paths through the function. Example: @torch.jit.script def foo(x): if x < 0: y = 4 print(y) Traceback (most recent call last): ... RuntimeError: ... y is not defined in the false branch... @torch.jit.script... def foo(x): if x < 0: ~~~~~~~~~ y = 4 ~~~~~ <--- HERE print(y) and was used here: if x < 0: y = 4 print(y) ~ <--- HERE... Non-local variables are resolved to Python values at compile time when the function is defined. These values are then converted into TorchScript values using the rules described in Use of Python Values. ## Use of Python Values To make writing TorchScript more convenient, we allow script code to refer to Python values in the surrounding scope. For instance, any time there is a reference to `torch`, the TorchScript compiler is actually resolving it to the `torch` Python module when the function is declared. These Python values are not a first class part of TorchScript. Instead they are de-sugared at compile- time into the primitive types that TorchScript supports. This depends on the dynamic type of the Python valued referenced when compilation occurs. This section describes the rules that are used when accessing Python values in TorchScript. ### Functions TorchScript can call Python functions. This functionality is very useful when incrementally converting a model to TorchScript. The model can be moved function-by-function to TorchScript, leaving calls to Python functions in place. This way you can incrementally check the correctness of the model as you go. `torch.jit.is_scripting()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/_jit_internal.html#is_scripting) Function that returns True when in compilation and False otherwise. This is useful especially with the @unused decorator to leave code in your model that is not yet TorchScript compatible. .. testcode: import torch @torch.jit.unused def unsupported_linear_op(x): return x def linear(x): if not torch.jit.is_scripting(): return torch.linear(x) else: return unsupported_linear_op(x) ### Attribute Lookup On Python Modules TorchScript can lookup attributes on modules. `Builtin functions` like `torch.add` are accessed this way. This allows TorchScript to call functions defined in other modules. ### Python-defined Constants TorchScript also provides a way to use constants that are defined in Python. These can be used to hard-code hyper-parameters into the function, or to define universal constants. There are two ways of specifying that a Python value should be treated as a constant. 1. Values looked up as attributes of a module are assumed to be constant: import math import torch @torch.jit.script def fn(): return math.pi 2. 
Attributes of a ScriptModule can be marked constant by annotating them with `Final[T]`. import torch import torch.nn as nn class Foo(nn.Module): # `Final` from the `typing_extensions` module can also be used a: torch.jit.Final[int] def __init__(self): super(Foo, self).__init__() self.a = 1 + 4 def forward(self, input): return self.a + input f = torch.jit.script(Foo()) Supported constant Python types are: * `int` * `float` * `bool` * `torch.device` * `torch.layout` * `torch.dtype` * tuples containing supported types * `torch.nn.ModuleList` which can be used in a TorchScript for loop ### Module Attributes The `torch.nn.Parameter` wrapper and `register_buffer` can be used to assign tensors to a module. Other values assigned to a module that is compiled will be added to the compiled module if their types can be inferred. All types available in TorchScript can be used as module attributes. Tensor attributes are semantically the same as buffers. The type of empty lists and dictionaries and `None` values cannot be inferred and must be specified via [PEP 526-style](https://www.python.org/dev/peps/pep-0526/#class-and-instance-variable-annotations) class annotations. If a type cannot be inferred and is not explicitly annotated, it will not be added as an attribute to the resulting `ScriptModule`. Example: from typing import List, Dict class Foo(nn.Module): # `words` is initialized as an empty list, so its type must be specified words: List[str] # The type could potentially be inferred if `a_dict` (below) was not # empty, but this annotation ensures `some_dict` will be made into the # proper type some_dict: Dict[str, int] def __init__(self, a_dict): super(Foo, self).__init__() self.words = [] self.some_dict = a_dict # `int`s can be inferred self.my_int = 10 def forward(self, input): # type: (str) -> int self.words.append(input) return self.some_dict[input] + self.my_int f = torch.jit.script(Foo({'hi': 2})) # torch.linalg Common linear algebra operations. This module is in BETA. New functions are still being added, and some functions may change in future PyTorch releases. See the documentation of each function for details. ## Functions `torch.linalg.cholesky(input, *, out=None) → Tensor` Computes the Cholesky decomposition of a Hermitian (or symmetric for real-valued matrices) positive-definite matrix or the Cholesky decompositions for a batch of such matrices. Each decomposition has the form `input = L Lᴴ`, where `L` is a lower-triangular matrix and `Lᴴ` is the conjugate transpose of `L`, which is just a transpose for the case of real-valued input matrices. In code it translates to `input = L @ L.t()` if `input` is real-valued and `input = L @ L.conj().t()` if `input` is complex-valued. The batch of `L` matrices is returned. Supports real-valued and complex-valued inputs. Note When given inputs on a CUDA device, this function synchronizes that device with the CPU. Note LAPACK’s `potrf` is used for CPU inputs, and MAGMA’s `potrf` is used for CUDA inputs. Note If `input` is not a Hermitian positive-definite matrix, or if it’s a batch of matrices and one or more of them is not a Hermitian positive-definite matrix, then a RuntimeError will be thrown. If `input` is a batch of matrices, then the error message will include the batch index of the first matrix that is not Hermitian positive-definite.
Parameters **input** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – the input tensor of size (∗,n,n)(*, n, n) consisting of Hermitian positive-definite n×nn \times n matrices, where ∗* is zero or more batch dimensions. Keyword Arguments **out** ([Tensor](tensors#torch.Tensor "torch.Tensor") _,__optional_) – The output tensor. Ignored if `None`. Default: `None` Examples: >>> a = torch.randn(2, 2, dtype=torch.complex128) >>> a = torch.mm(a, a.t().conj()) # creates a Hermitian positive-definite matrix >>> l = torch.linalg.cholesky(a) >>> a tensor([[2.5266+0.0000j, 1.9586-2.0626j], [1.9586+2.0626j, 9.4160+0.0000j]], dtype=torch.complex128) >>> l tensor([[1.5895+0.0000j, 0.0000+0.0000j], [1.2322+1.2976j, 2.4928+0.0000j]], dtype=torch.complex128) >>> torch.mm(l, l.t().conj()) tensor([[2.5266+0.0000j, 1.9586-2.0626j], [1.9586+2.0626j, 9.4160+0.0000j]], dtype=torch.complex128) >>> a = torch.randn(3, 2, 2, dtype=torch.float64) >>> a = torch.matmul(a, a.transpose(-2, -1)) # creates a symmetric positive-definite matrix >>> l = torch.linalg.cholesky(a) >>> a tensor([[[ 1.1629, 2.0237], [ 2.0237, 6.6593]], [[ 0.4187, 0.1830], [ 0.1830, 0.1018]], [[ 1.9348, -2.5744], [-2.5744, 4.6386]]], dtype=torch.float64) >>> l tensor([[[ 1.0784, 0.0000], [ 1.8766, 1.7713]], [[ 0.6471, 0.0000], [ 0.2829, 0.1477]], [[ 1.3910, 0.0000], [-1.8509, 1.1014]]], dtype=torch.float64) >>> torch.allclose(torch.matmul(l, l.transpose(-2, -1)), a) True `torch.linalg.cond(input, p=None, *, out=None) → Tensor` Computes the condition number of a matrix `input`, or of each matrix in a batched `input`, using the matrix norm defined by `p`. For norms `{‘fro’, ‘nuc’, inf, -inf, 1, -1}` this is defined as the matrix norm of `input` times the matrix norm of the inverse of `input` computed using `torch.linalg.norm()`. While for norms `{None, 2, -2}` this is defined as the ratio between the largest and smallest singular values computed using `torch.linalg.svd()`. This function supports float, double, cfloat and cdouble dtypes. Note When given inputs on a CUDA device, this function may synchronize that device with the CPU depending on which norm `p` is used. Note For norms `{None, 2, -2}`, `input` may be a non-square matrix or batch of non- square matrices. For other norms, however, `input` must be a square matrix or a batch of square matrices, and if this requirement is not satisfied a RuntimeError will be thrown. Note For norms `{‘fro’, ‘nuc’, inf, -inf, 1, -1}` if `input` is a non-invertible matrix then a tensor containing infinity will be returned. If `input` is a batch of matrices and one or more of them is not invertible then a RuntimeError will be thrown. Parameters * **input** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – the input matrix of size `(m, n)` or the batch of matrices of size `(*, m, n)` where `*` is one or more batch dimensions. * **p** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,_[float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _,__inf_ _,__-inf_ _,__'fro'__,__'nuc'__,__optional_) – the type of the matrix norm to use in the computations. inf refers to `float('inf')`, numpy’s `inf` object, or any equivalent object. 
The following norms can be used: p | norm for matrices ---|--- None | ratio of the largest singular value to the smallest singular value ’fro’ | Frobenius norm ’nuc’ | nuclear norm inf | max(sum(abs(x), dim=1)) -inf | min(sum(abs(x), dim=1)) 1 | max(sum(abs(x), dim=0)) -1 | min(sum(abs(x), dim=0)) 2 | ratio of the largest singular value to the smallest singular value -2 | ratio of the smallest singular value to the largest singular value Default: `None` Keyword Arguments **out** ([Tensor](tensors#torch.Tensor "torch.Tensor") _,__optional_) – tensor to write the output to. Default is `None`. Returns The condition number of `input`. The output dtype is always real valued even for complex inputs (e.g. float if `input` is cfloat). Examples: >>> a = torch.randn(3, 4, 4, dtype=torch.complex64) >>> torch.linalg.cond(a) >>> a = torch.tensor([[1., 0, -1], [0, 1, 0], [1, 0, 1]]) >>> torch.linalg.cond(a) tensor([1.4142]) >>> torch.linalg.cond(a, 'fro') tensor(3.1623) >>> torch.linalg.cond(a, 'nuc') tensor(9.2426) >>> torch.linalg.cond(a, float('inf')) tensor(2.) >>> torch.linalg.cond(a, float('-inf')) tensor(1.) >>> torch.linalg.cond(a, 1) tensor(2.) >>> torch.linalg.cond(a, -1) tensor(1.) >>> torch.linalg.cond(a, 2) tensor([1.4142]) >>> torch.linalg.cond(a, -2) tensor([0.7071]) >>> a = torch.randn(2, 3, 3) >>> a tensor([[[-0.9204, 1.1140, 1.2055], [ 0.3988, -0.2395, -0.7441], [-0.5160, 0.3115, 0.2619]], [[-2.2128, 0.9241, 2.1492], [-1.1277, 2.7604, -0.8760], [ 1.2159, 0.5960, 0.0498]]]) >>> torch.linalg.cond(a) tensor([[9.5917], [3.2538]]) >>> a = torch.randn(2, 3, 3, dtype=torch.complex64) >>> a tensor([[[-0.4671-0.2137j, -0.1334-0.9508j, 0.6252+0.1759j], [-0.3486-0.2991j, -0.1317+0.1252j, 0.3025-0.1604j], [-0.5634+0.8582j, 0.1118-0.4677j, -0.1121+0.7574j]], [[ 0.3964+0.2533j, 0.9385-0.6417j, -0.0283-0.8673j], [ 0.2635+0.2323j, -0.8929-1.1269j, 0.3332+0.0733j], [ 0.1151+0.1644j, -1.1163+0.3471j, -0.5870+0.1629j]]]) >>> torch.linalg.cond(a) tensor([[4.6245], [4.5671]]) >>> torch.linalg.cond(a, 1) tensor([9.2589, 9.3486]) `torch.linalg.det(input) → Tensor` Computes the determinant of a square matrix `input`, or of each square matrix in a batched `input`. This function supports float, double, cfloat and cdouble dtypes. Note When given inputs on a CUDA device, this function synchronizes that device with the CPU. Note The determinant is computed using LU factorization. LAPACK’s `getrf` is used for CPU inputs, and MAGMA’s `getrf` is used for CUDA inputs. Note Backward through `det` internally uses `torch.linalg.svd()` when `input` is not invertible. In this case, double backward through `det` will be unstable when `input` doesn’t have distinct singular values. See `torch.linalg.svd()` for more details. Parameters **input** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – the input matrix of size `(n, n)` or the batch of matrices of size `(*, n, n)` where `*` is one or more batch dimensions. 
Example: >>> a = torch.randn(3, 3) >>> a tensor([[ 0.9478, 0.9158, -1.1295], [ 0.9701, 0.7346, -1.8044], [-0.2337, 0.0557, 0.6929]]) >>> torch.linalg.det(a) tensor(0.0934) >>> a = torch.randn(3, 2, 2) >>> a tensor([[[ 0.9254, -0.6213], [-0.5787, 1.6843]], [[ 0.3242, -0.9665], [ 0.4539, -0.0887]], [[ 1.1336, -0.4025], [-0.7089, 0.9032]]]) >>> torch.linalg.det(a) tensor([1.1990, 0.4099, 0.7386]) `torch.linalg.slogdet(input, *, out=None) -> (Tensor, Tensor)` Calculates the sign and natural logarithm of the absolute value of a square matrix’s determinant, or of the absolute values of the determinants of a batch of square matrices `input`. The determinant can be computed with `sign * exp(logabsdet)`. Supports input of float, double, cfloat and cdouble datatypes. Note When given inputs on a CUDA device, this function synchronizes that device with the CPU. Note The determinant is computed using LU factorization. LAPACK’s `getrf` is used for CPU inputs, and MAGMA’s `getrf` is used for CUDA inputs. Note For matrices that have zero determinant, this returns `(0, -inf)`. If `input` is batched then the entries in the result tensors corresponding to matrices with the zero determinant have sign 0 and the natural logarithm of the absolute value of the determinant -inf. Parameters **input** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – the input matrix of size (n,n)(n, n) or the batch of matrices of size (∗,n,n)(*, n, n) where ∗* is one or more batch dimensions. Keyword Arguments **out** ([tuple](https://docs.python.org/3/library/stdtypes.html#tuple "\(in Python v3.9\)") _,__optional_) – tuple of two tensors to write the output to. Returns A namedtuple (sign, logabsdet) containing the sign of the determinant and the natural logarithm of the absolute value of determinant, respectively. Example: >>> A = torch.randn(3, 3) >>> A tensor([[ 0.0032, -0.2239, -1.1219], [-0.6690, 0.1161, 0.4053], [-1.6218, -0.9273, -0.0082]]) >>> torch.linalg.det(A) tensor(-0.7576) >>> torch.linalg.logdet(A) tensor(nan) >>> torch.linalg.slogdet(A) torch.return_types.linalg_slogdet(sign=tensor(-1.), logabsdet=tensor(-0.2776)) `torch.linalg.eigh(input, UPLO='L', *, out=None) -> (Tensor, Tensor)` Computes the eigenvalues and eigenvectors of a complex Hermitian (or real symmetric) matrix `input`, or of each such matrix in a batched `input`. For a single matrix `input`, the tensor of eigenvalues `w` and the tensor of eigenvectors `V` decompose the `input` such that `input = V diag(w) Vᴴ`, where `Vᴴ` is the transpose of `V` for real-valued `input`, or the conjugate transpose of `V` for complex-valued `input`. Since the matrix or matrices in `input` are assumed to be Hermitian, the imaginary part of their diagonals is always treated as zero. When `UPLO` is “L”, its default value, only the lower triangular part of each matrix is used in the computation. When `UPLO` is “U” only the upper triangular part of each matrix is used. Supports input of float, double, cfloat and cdouble dtypes. Note When given inputs on a CUDA device, this function synchronizes that device with the CPU. Note The eigenvalues/eigenvectors are computed using LAPACK’s `syevd` and `heevd` routines for CPU inputs, and MAGMA’s `syevd` and `heevd` routines for CUDA inputs. Note The eigenvalues of real symmetric or complex Hermitian matrices are always real. Note The eigenvectors of matrices are not unique, so any eigenvector multiplied by a constant remains a valid eigenvector. This function may compute different eigenvector representations on different device types. 
Usually the difference is only in the sign of the eigenvector. Note See `torch.linalg.eigvalsh()` for a related function that computes only eigenvalues. However, that function is not differentiable. Parameters * **input** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – the Hermitian `n times n` matrix or the batch of such matrices of size `(*, n, n)` where `*` is one or more batch dimensions. * **UPLO** (_'L'__,__'U'__,__optional_) – controls whether to use the upper-triangular or the lower-triangular part of `input` in the computations. Default is `'L'`. Keyword Arguments **out** ([tuple](https://docs.python.org/3/library/stdtypes.html#tuple "\(in Python v3.9\)") _,__optional_) – tuple of two tensors to write the output to. Default is `None`. Returns A namedtuple (eigenvalues, eigenvectors) containing * `eigenvalues (Tensor): Shape (*, m).` The eigenvalues in ascending order. * `eigenvectors (Tensor): Shape (*, m, m).` The orthonormal eigenvectors of the `input`. Return type ([Tensor](tensors#torch.Tensor "torch.Tensor"), [Tensor](tensors#torch.Tensor "torch.Tensor")) Examples: >>> a = torch.randn(2, 2, dtype=torch.complex128) >>> a = a + a.t().conj() # creates a Hermitian matrix >>> a tensor([[2.9228+0.0000j, 0.2029-0.0862j], [0.2029+0.0862j, 0.3464+0.0000j]], dtype=torch.complex128) >>> w, v = torch.linalg.eigh(a) >>> w tensor([0.3277, 2.9415], dtype=torch.float64) >>> v tensor([[-0.0846+-0.0000j, -0.9964+0.0000j], [ 0.9170+0.3898j, -0.0779-0.0331j]], dtype=torch.complex128) >>> torch.allclose(torch.matmul(v, torch.matmul(w.to(v.dtype).diag_embed(), v.t().conj())), a) True >>> a = torch.randn(3, 2, 2, dtype=torch.float64) >>> a = a + a.transpose(-2, -1) # creates a symmetric matrix >>> w, v = torch.linalg.eigh(a) >>> torch.allclose(torch.matmul(v, torch.matmul(w.diag_embed(), v.transpose(-2, -1))), a) True `torch.linalg.eigvalsh(input, UPLO='L', *, out=None) → Tensor` Computes the eigenvalues of a complex Hermitian (or real symmetric) matrix `input`, or of each such matrix in a batched `input`. The eigenvalues are returned in ascending order. Since the matrix or matrices in `input` are assumed to be Hermitian, the imaginary part of their diagonals is always treated as zero. When `UPLO` is “L”, its default value, only the lower triangular part of each matrix is used in the computation. When `UPLO` is “U” only the upper triangular part of each matrix is used. Supports input of float, double, cfloat and cdouble dtypes. Note When given inputs on a CUDA device, this function synchronizes that device with the CPU. Note The eigenvalues are computed using LAPACK’s `syevd` and `heevd` routines for CPU inputs, and MAGMA’s `syevd` and `heevd` routines for CUDA inputs. Note The eigenvalues of real symmetric or complex Hermitian matrices are always real. Note This function doesn’t support backpropagation, please use `torch.linalg.eigh()` instead, which also computes the eigenvectors. Note See `torch.linalg.eigh()` for a related function that computes both eigenvalues and eigenvectors. Parameters * **input** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – the Hermitian `n times n` matrix or the batch of such matrices of size `(*, n, n)` where `*` is one or more batch dimensions. * **UPLO** (_'L'__,__'U'__,__optional_) – controls whether to use the upper-triangular or the lower-triangular part of `input` in the computations. Default is `'L'`. Keyword Arguments **out** ([Tensor](tensors#torch.Tensor "torch.Tensor") _,__optional_) – tensor to write the output to. Default is `None`. 
Examples: >>> a = torch.randn(2, 2, dtype=torch.complex128) >>> a = a + a.t().conj() # creates a Hermitian matrix >>> a tensor([[2.9228+0.0000j, 0.2029-0.0862j], [0.2029+0.0862j, 0.3464+0.0000j]], dtype=torch.complex128) >>> w = torch.linalg.eigvalsh(a) >>> w tensor([0.3277, 2.9415], dtype=torch.float64) >>> a = torch.randn(3, 2, 2, dtype=torch.float64) >>> a = a + a.transpose(-2, -1) # creates a symmetric matrix >>> a tensor([[[ 2.8050, -0.3850], [-0.3850, 3.2376]], [[-1.0307, -2.7457], [-2.7457, -1.7517]], [[ 1.7166, 2.2207], [ 2.2207, -2.0898]]], dtype=torch.float64) >>> w = torch.linalg.eigvalsh(a) >>> w tensor([[ 2.5797, 3.4629], [-4.1605, 1.3780], [-3.1113, 2.7381]], dtype=torch.float64) `torch.linalg.matrix_rank(input, tol=None, hermitian=False, *, out=None) → Tensor` Computes the numerical rank of a matrix `input`, or of each matrix in a batched `input`. The matrix rank is computed as the number of singular values (or absolute eigenvalues when `hermitian` is `True`) that are greater than the specified `tol` threshold. If `tol` is not specified, `tol` is set to `S.max(dim=-1)*max(input.shape[-2:])*eps`, where `S` is the singular values (or absolute eigenvalues when `hermitian` is `True`), and `eps` is the epsilon value for the datatype of `input`. The epsilon value can be obtained using the `eps` attribute of `torch.finfo`. Supports input of float, double, cfloat and cdouble dtypes. Note When given inputs on a CUDA device, this function synchronizes that device with the CPU. Note The matrix rank is computed using singular value decomposition (see `torch.linalg.svd()`) by default. If `hermitian` is `True`, then `input` is assumed to be Hermitian (symmetric if real-valued), and the computation is done by obtaining the eigenvalues (see `torch.linalg.eigvalsh()`). Parameters * **input** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – the input matrix of size `(m, n)` or the batch of matrices of size `(*, m, n)` where `*` is one or more batch dimensions. * **tol** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _,__optional_) – the tolerance value. Default is `None` * **hermitian** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – indicates whether `input` is Hermitian. Default is `False`. Keyword Arguments **out** ([Tensor](tensors#torch.Tensor "torch.Tensor") _,__optional_) – tensor to write the output to. Default is `None`. Examples: >>> a = torch.eye(10) >>> torch.linalg.matrix_rank(a) tensor(10) >>> b = torch.eye(10) >>> b[0, 0] = 0 >>> torch.linalg.matrix_rank(b) tensor(9) >>> a = torch.randn(4, 3, 2) >>> torch.linalg.matrix_rank(a) tensor([2, 2, 2, 2]) >>> a = torch.randn(2, 4, 2, 3) >>> torch.linalg.matrix_rank(a) tensor([[2, 2, 2, 2], [2, 2, 2, 2]]) >>> a = torch.randn(2, 4, 3, 3, dtype=torch.complex64) >>> torch.linalg.matrix_rank(a) tensor([[3, 3, 3, 3], [3, 3, 3, 3]]) >>> torch.linalg.matrix_rank(a, hermitian=True) tensor([[3, 3, 3, 3], [3, 3, 3, 3]]) >>> torch.linalg.matrix_rank(a, tol=1.0) tensor([[3, 2, 2, 2], [1, 2, 1, 2]]) >>> torch.linalg.matrix_rank(a, tol=1.0, hermitian=True) tensor([[2, 2, 2, 1], [1, 2, 2, 2]]) `torch.linalg.norm(input, ord=None, dim=None, keepdim=False, *, out=None, dtype=None) → Tensor` Returns the matrix norm or vector norm of a given tensor. This function can calculate one of eight different types of matrix norms, or one of an infinite number of vector norms, depending on both the number of reduction dimensions and the value of the `ord` parameter. 
Parameters * **input** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – The input tensor. If dim is None, x must be 1-D or 2-D, unless `ord` is None. If both `dim` and `ord` are None, the 2-norm of the input flattened to 1-D will be returned. Its data type must be either a floating point or complex type. For complex inputs, the norm is calculated on of the absolute values of each element. If the input is complex and neither `dtype` nor `out` is specified, the result’s data type will be the corresponding floating point type (e.g. float if `input` is complexfloat). * **ord** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,_[float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _,__inf_ _,__-inf_ _,__'fro'__,__'nuc'__,__optional_) – The order of norm. inf refers to `float('inf')`, numpy’s `inf` object, or any equivalent object. The following norms can be calculated: ord | norm for matrices | norm for vectors ---|---|--- None | Frobenius norm | 2-norm ’fro’ | Frobenius norm | – not supported – ‘nuc’ | nuclear norm | – not supported – inf | max(sum(abs(x), dim=1)) | max(abs(x)) -inf | min(sum(abs(x), dim=1)) | min(abs(x)) 0 | – not supported – | sum(x != 0) 1 | max(sum(abs(x), dim=0)) | as below -1 | min(sum(abs(x), dim=0)) | as below 2 | 2-norm (largest sing. value) | as below -2 | smallest singular value | as below other | – not supported – | sum(abs(x)**ord)**(1./ord) Default: `None` * **dim** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__2-tuple of python:ints_ _,__2-list of python:ints_ _,__optional_) – If `dim` is an int, vector norm will be calculated over the specified dimension. If `dim` is a 2-tuple of ints, matrix norm will be calculated over the specified dimensions. If `dim` is None, matrix norm will be calculated when the input tensor has two dimensions, and vector norm will be calculated when the input tensor has one dimension. Default: `None` * **keepdim** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – If set to True, the reduced dimensions are retained in the result as dimensions with size one. Default: `False` Keyword Arguments * **out** ([Tensor](tensors#torch.Tensor "torch.Tensor") _,__optional_) – The output tensor. Ignored if `None`. Default: `None` * **dtype** (`torch.dtype`, optional) – If specified, the input tensor is cast to `dtype` before performing the operation, and the returned tensor’s type will be `dtype`. If this argument is used in conjunction with the `out` argument, the output tensor’s type must match this argument or a RuntimeError will be raised. Default: `None` Examples: >>> import torch >>> from torch import linalg as LA >>> a = torch.arange(9, dtype=torch.float) - 4 >>> a tensor([-4., -3., -2., -1., 0., 1., 2., 3., 4.]) >>> b = a.reshape((3, 3)) >>> b tensor([[-4., -3., -2.], [-1., 0., 1.], [ 2., 3., 4.]]) >>> LA.norm(a) tensor(7.7460) >>> LA.norm(b) tensor(7.7460) >>> LA.norm(b, 'fro') tensor(7.7460) >>> LA.norm(a, float('inf')) tensor(4.) >>> LA.norm(b, float('inf')) tensor(9.) >>> LA.norm(a, -float('inf')) tensor(0.) >>> LA.norm(b, -float('inf')) tensor(2.) >>> LA.norm(a, 1) tensor(20.) >>> LA.norm(b, 1) tensor(7.) >>> LA.norm(a, -1) tensor(0.) >>> LA.norm(b, -1) tensor(6.) >>> LA.norm(a, 2) tensor(7.7460) >>> LA.norm(b, 2) tensor(7.3485) >>> LA.norm(a, -2) tensor(0.) 
>>> LA.norm(b.double(), -2) tensor(1.8570e-16, dtype=torch.float64) >>> LA.norm(a, 3) tensor(5.8480) >>> LA.norm(a, -3) tensor(0.) Using the `dim` argument to compute vector norms: >>> c = torch.tensor([[1., 2., 3.], ... [-1, 1, 4]]) >>> LA.norm(c, dim=0) tensor([1.4142, 2.2361, 5.0000]) >>> LA.norm(c, dim=1) tensor([3.7417, 4.2426]) >>> LA.norm(c, ord=1, dim=1) tensor([6., 6.]) Using the `dim` argument to compute matrix norms: >>> m = torch.arange(8, dtype=torch.float).reshape(2, 2, 2) >>> LA.norm(m, dim=(1,2)) tensor([ 3.7417, 11.2250]) >>> LA.norm(m[0, :, :]), LA.norm(m[1, :, :]) (tensor(3.7417), tensor(11.2250)) `torch.linalg.pinv(input, rcond=1e-15, hermitian=False, *, out=None) → Tensor` Computes the pseudo-inverse (also known as the Moore-Penrose inverse) of a matrix `input`, or of each matrix in a batched `input`. The singular values (or the absolute values of the eigenvalues when `hermitian` is `True`) that are below the specified `rcond` threshold are treated as zero and discarded in the computation. Supports input of float, double, cfloat and cdouble datatypes. Note When given inputs on a CUDA device, this function synchronizes that device with the CPU. Note The pseudo-inverse is computed using singular value decomposition (see `torch.linalg.svd()`) by default. If `hermitian` is `True`, then `input` is assumed to be Hermitian (symmetric if real-valued), and the computation of the pseudo-inverse is done by obtaining the eigenvalues and eigenvectors (see `torch.linalg.eigh()`). Note If singular value decomposition or eigenvalue decomposition algorithms do not converge then a RuntimeError will be thrown. Parameters * **input** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – the input matrix of size `(m, n)` or the batch of matrices of size `(*, m, n)` where `*` is one or more batch dimensions. * **rcond** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _,_[Tensor](tensors#torch.Tensor "torch.Tensor") _,__optional_) – the tolerance value to determine the cutoff for small singular values. Must be broadcastable to the singular values of `input` as returned by [`torch.svd()`](generated/torch.svd#torch.svd "torch.svd"). Default is `1e-15`. * **hermitian** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – indicates whether `input` is Hermitian. Default is `False`. Keyword Arguments **out** ([Tensor](tensors#torch.Tensor "torch.Tensor") _,__optional_) – The output tensor. Ignored if `None`. Default is `None`. 
Examples:

>>> input = torch.randn(3, 5)
>>> input
tensor([[ 0.5495, 0.0979, -1.4092, -0.1128, 0.4132],
        [-1.1143, -0.3662, 0.3042, 1.6374, -0.9294],
        [-0.3269, -0.5745, -0.0382, -0.5922, -0.6759]])
>>> torch.linalg.pinv(input)
tensor([[ 0.0600, -0.1933, -0.2090],
        [-0.0903, -0.0817, -0.4752],
        [-0.7124, -0.1631, -0.2272],
        [ 0.1356, 0.3933, -0.5023],
        [-0.0308, -0.1725, -0.5216]])

Batched linalg.pinv example

>>> a = torch.randn(2, 6, 3)
>>> b = torch.linalg.pinv(a)
>>> torch.matmul(b, a)
tensor([[[ 1.0000e+00, 1.6391e-07, -1.1548e-07],
         [ 8.3121e-08, 1.0000e+00, -2.7567e-07],
         [ 3.5390e-08, 1.4901e-08, 1.0000e+00]],

        [[ 1.0000e+00, -8.9407e-08, 2.9802e-08],
         [-2.2352e-07, 1.0000e+00, 1.1921e-07],
         [ 0.0000e+00, 8.9407e-08, 1.0000e+00]]])

Hermitian input example

>>> a = torch.randn(3, 3, dtype=torch.complex64)
>>> a = a + a.t().conj()  # creates a Hermitian matrix
>>> b = torch.linalg.pinv(a, hermitian=True)
>>> torch.matmul(b, a)
tensor([[ 1.0000e+00+0.0000e+00j, -1.1921e-07-2.3842e-07j, 5.9605e-08-2.3842e-07j],
        [ 5.9605e-08+2.3842e-07j, 1.0000e+00+2.3842e-07j, -4.7684e-07+1.1921e-07j],
        [-1.1921e-07+0.0000e+00j, -2.3842e-07-2.9802e-07j, 1.0000e+00-1.7897e-07j]])

Non-default rcond example

>>> rcond = 0.5
>>> a = torch.randn(3, 3)
>>> torch.linalg.pinv(a)
tensor([[ 0.2971, -0.4280, -2.0111],
        [-0.0090, 0.6426, -0.1116],
        [-0.7832, -0.2465, 1.0994]])
>>> torch.linalg.pinv(a, rcond)
tensor([[-0.2672, -0.2351, -0.0539],
        [-0.0211, 0.6467, -0.0698],
        [-0.4400, -0.3638, -0.0910]])

Matrix-wise rcond example

>>> a = torch.randn(5, 6, 2, 3, 3)
>>> rcond = torch.rand(2)  # different rcond values for each matrix in a[:, :, 0] and a[:, :, 1]
>>> torch.linalg.pinv(a, rcond)
>>> rcond = torch.randn(5, 6, 2)  # different rcond value for each matrix in 'a'
>>> torch.linalg.pinv(a, rcond)

`torch.linalg.svd(input, full_matrices=True, compute_uv=True, *, out=None) -> (Tensor, Tensor, Tensor)`

Computes the singular value decomposition of either a matrix or batch of matrices `input`.

The singular value decomposition is represented as a namedtuple `(U, S, Vh)`, such that `input = U @ diag(S) @ Vh`. If `input` is a batch of tensors, then `U`, `S`, and `Vh` are also batched with the same batch dimensions as `input`.

If `full_matrices` is `False`, the method returns the reduced singular value decomposition, i.e., if the last two dimensions of `input` are `m` and `n`, then the returned `U` and `V` matrices will contain only `min(n, m)` orthonormal columns.

If `compute_uv` is `False`, the returned `U` and `Vh` will be empty tensors with no elements and the same device as `input`. The `full_matrices` argument has no effect when `compute_uv` is False.

The dtypes of `U` and `V` are the same as `input`'s. `S` will always be real-valued, even if `input` is complex.

Note Unlike NumPy's `linalg.svd`, this always returns a namedtuple of three tensors, even when `compute_uv=False`. This behavior may change in a future PyTorch release.

Note The singular values are returned in descending order. If `input` is a batch of matrices, then the singular values of each matrix in the batch are returned in descending order.

Note The implementation of SVD on CPU uses the LAPACK routine `?gesdd` (a divide-and-conquer algorithm) instead of `?gesvd` for speed. Analogously, the SVD on GPU uses the cuSOLVER routines `gesvdj` and `gesvdjBatched` on CUDA 10.1.243 and later, and uses the MAGMA routine `gesdd` on earlier versions of CUDA.

Note The returned matrix `U` will be transposed, i.e.
with strides `U.contiguous().transpose(-2, -1).stride()`.

Note Gradients computed using `U` and `Vh` may be unstable if `input` is not full rank or has non-unique singular values.

Note When `full_matrices` = `True`, the gradients on `U[..., :, min(m, n):]` and `V[..., :, min(m, n):]` will be ignored in backward as those vectors can be arbitrary bases of the subspaces.

Note The `S` tensor can only be used to compute gradients if `compute_uv` is True.

Note Since the `U` and `V` of an SVD are not unique, each vector can be multiplied by an arbitrary phase factor `e^{i phi}` while the SVD result is still correct. Different platforms, like NumPy, or inputs on different device types, may produce different `U` and `V` tensors.

Parameters

* **input** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – the input tensor of size `(*, m, n)` where `*` is zero or more batch dimensions consisting of `m × n` matrices.
* **full_matrices** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – controls whether to compute the full or reduced decomposition, and consequently the shape of returned `U` and `V`. Defaults to True.
* **compute_uv** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – whether to compute `U` and `V` or not. Defaults to True.
* **out** ([tuple](https://docs.python.org/3/library/stdtypes.html#tuple "\(in Python v3.9\)") _,__optional_) – a tuple of three tensors to use for the outputs. If compute_uv=False, the 1st and 3rd arguments must be tensors, but they are ignored. E.g. you can pass `(torch.Tensor(), out_S, torch.Tensor())`

Example:

>>> import torch
>>> a = torch.randn(5, 3)
>>> a
tensor([[-0.3357, -0.2987, -1.1096],
        [ 1.4894, 1.0016, -0.4572],
        [-1.9401, 0.7437, 2.0968],
        [ 0.1515, 1.3812, 1.5491],
        [-1.8489, -0.5907, -2.5673]])
>>>
>>> # reconstruction in the full_matrices=False case
>>> u, s, vh = torch.linalg.svd(a, full_matrices=False)
>>> u.shape, s.shape, vh.shape
(torch.Size([5, 3]), torch.Size([3]), torch.Size([3, 3]))
>>> torch.dist(a, u @ torch.diag(s) @ vh)
tensor(1.0486e-06)
>>>
>>> # reconstruction in the full_matrices=True case
>>> u, s, vh = torch.linalg.svd(a)
>>> u.shape, s.shape, vh.shape
(torch.Size([5, 5]), torch.Size([3]), torch.Size([3, 3]))
>>> torch.dist(a, u[:, :3] @ torch.diag(s) @ vh)
tensor(1.0486e-06)
>>>
>>> # extra dimensions
>>> a_big = torch.randn(7, 5, 3)
>>> u, s, vh = torch.linalg.svd(a_big, full_matrices=False)
>>> torch.dist(a_big, u @ torch.diag_embed(s) @ vh)
tensor(3.0957e-06)

`torch.linalg.solve(input, other, *, out=None) → Tensor`

Computes the solution `x` to the matrix equation `matmul(input, x) = other` with a square matrix, or batches of such matrices, `input` and one or more right-hand side vectors `other`. If `input` is batched and `other` is not, then `other` is broadcast to have the same batch dimensions as `input`. The resulting tensor has the same shape as the (possibly broadcast) `other`.

Supports input of `float`, `double`, `cfloat` and `cdouble` dtypes.

Note If `input` is a non-square or non-invertible matrix, or a batch containing non-square matrices or one or more non-invertible matrices, then a RuntimeError will be thrown.

Note When given inputs on a CUDA device, this function synchronizes that device with the CPU.
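To see the RuntimeError mentioned in the first note above, here is a minimal sketch (the all-zeros matrix is just a convenient non-invertible input):

import torch

A = torch.zeros(3, 3)          # singular, hence not invertible
b = torch.randn(3)
try:
    torch.linalg.solve(A, b)
except RuntimeError as e:      # raised because A is non-invertible, as noted above
    print("RuntimeError:", e)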
Parameters * **input** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – the square n×nn \times n matrix or the batch of such matrices of size (∗,n,n)(*, n, n) where `*` is one or more batch dimensions. * **other** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – right-hand side tensor of shape (∗,n)(*, n) or (∗,n,k)(*, n, k) , where kk is the number of right-hand side vectors. Keyword Arguments **out** ([Tensor](tensors#torch.Tensor "torch.Tensor") _,__optional_) – The output tensor. Ignored if `None`. Default: `None` Examples: >>> A = torch.eye(3) >>> b = torch.randn(3) >>> x = torch.linalg.solve(A, b) >>> torch.allclose(A @ x, b) True Batched input: >>> A = torch.randn(2, 3, 3) >>> b = torch.randn(3, 1) >>> x = torch.linalg.solve(A, b) >>> torch.allclose(A @ x, b) True >>> b = torch.rand(3) # b is broadcast internally to (*A.shape[:-2], 3) >>> x = torch.linalg.solve(A, b) >>> x.shape torch.Size([2, 3]) >>> Ax = A @ x.unsqueeze(-1) >>> torch.allclose(Ax, b.unsqueeze(-1).expand_as(Ax)) True `torch.linalg.tensorinv(input, ind=2, *, out=None) → Tensor` Computes a tensor `input_inv` such that `tensordot(input_inv, input, ind) == I_n` (inverse tensor equation), where `I_n` is the n-dimensional identity tensor and `n` is equal to `input.ndim`. The resulting tensor `input_inv` has shape equal to `input.shape[ind:] + input.shape[:ind]`. Supports input of `float`, `double`, `cfloat` and `cdouble` data types. Note If `input` is not invertible or does not satisfy the requirement `prod(input.shape[ind:]) == prod(input.shape[:ind])`, then a RuntimeError will be thrown. Note When `input` is a 2-dimensional tensor and `ind=1`, this function computes the (multiplicative) inverse of `input`, equivalent to calling [`torch.inverse()`](generated/torch.inverse#torch.inverse "torch.inverse"). Parameters * **input** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – A tensor to invert. Its shape must satisfy `prod(input.shape[:ind]) == prod(input.shape[ind:])`. * **ind** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – A positive integer that describes the inverse tensor equation. See [`torch.tensordot()`](generated/torch.tensordot#torch.tensordot "torch.tensordot") for details. Default: 2. Keyword Arguments **out** ([Tensor](tensors#torch.Tensor "torch.Tensor") _,__optional_) – The output tensor. Ignored if `None`. Default: `None` Examples: >>> a = torch.eye(4 * 6).reshape((4, 6, 8, 3)) >>> ainv = torch.linalg.tensorinv(a, ind=2) >>> ainv.shape torch.Size([8, 3, 4, 6]) >>> b = torch.randn(4, 6) >>> torch.allclose(torch.tensordot(ainv, b), torch.linalg.tensorsolve(a, b)) True >>> a = torch.randn(4, 4) >>> a_tensorinv = torch.linalg.tensorinv(a, ind=1) >>> a_inv = torch.inverse(a) >>> torch.allclose(a_tensorinv, a_inv) True `torch.linalg.tensorsolve(input, other, dims=None, *, out=None) → Tensor` Computes a tensor `x` such that `tensordot(input, x, dims=x.ndim) = other`. The resulting tensor `x` has the same shape as `input[other.ndim:]`. Supports real-valued and complex-valued inputs. Note If `input` does not satisfy the requirement `prod(input.shape[other.ndim:]) == prod(input.shape[:other.ndim])` after (optionally) moving the dimensions using `dims`, then a RuntimeError will be thrown. Parameters * **input** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – “left-hand-side” tensor, it must satisfy the requirement `prod(input.shape[other.ndim:]) == prod(input.shape[:other.ndim])`. 
* **other** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – “right-hand-side” tensor of shape `input.shape[other.ndim]`. * **dims** (_Tuple_ _[_[int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _]_) – dimensions of `input` to be moved before the computation. Equivalent to calling `input = movedim(input, dims, range(len(dims) - input.ndim, 0))`. If None (default), no dimensions are moved. Keyword Arguments **out** ([Tensor](tensors#torch.Tensor "torch.Tensor") _,__optional_) – The output tensor. Ignored if `None`. Default: `None` Examples: >>> a = torch.eye(2 * 3 * 4).reshape((2 * 3, 4, 2, 3, 4)) >>> b = torch.randn(2 * 3, 4) >>> x = torch.linalg.tensorsolve(a, b) >>> x.shape torch.Size([2, 3, 4]) >>> torch.allclose(torch.tensordot(a, x, dims=x.ndim), b) True >>> a = torch.randn(6, 4, 4, 3, 2) >>> b = torch.randn(4, 3, 2) >>> x = torch.linalg.tensorsolve(a, b, dims=(0, 2)) >>> x.shape torch.Size([6, 4]) >>> a = a.permute(1, 3, 4, 0, 2) >>> a.shape[b.ndim:] torch.Size([6, 4]) >>> torch.allclose(torch.tensordot(a, x, dims=x.ndim), b, atol=1e-6) True `torch.linalg.inv(input, *, out=None) → Tensor` Computes the multiplicative inverse matrix of a square matrix `input`, or of each square matrix in a batched `input`. The result satisfies the relation: `matmul(inv(input),input)` = `matmul(input,inv(input))` = `eye(input.shape[0]).expand_as(input)`. Supports input of float, double, cfloat and cdouble data types. Note When given inputs on a CUDA device, this function synchronizes that device with the CPU. Note The inverse matrix is computed using LAPACK’s `getrf` and `getri` routines for CPU inputs. For CUDA inputs, cuSOLVER’s `getrf` and `getrs` routines as well as cuBLAS’ `getrf` and `getri` routines are used if CUDA version >= 10.1.243, otherwise MAGMA’s `getrf` and `getri` routines are used instead. Note If `input` is a non-invertible matrix or non-square matrix, or batch with at least one such matrix, then a RuntimeError will be thrown. Parameters **input** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – the square `(n, n)` matrix or the batch of such matrices of size `(*, n, n)` where `*` is one or more batch dimensions. Keyword Arguments **out** ([Tensor](tensors#torch.Tensor "torch.Tensor") _,__optional_) – The output tensor. Ignored if `None`. Default is `None`. 
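Before the built-in examples, a small cross-check (a sketch with arbitrary shapes; the diagonal shift just keeps the random matrix well conditioned) relating `torch.linalg.inv` to `torch.linalg.solve`, documented earlier in this section:

import torch

A = torch.randn(4, 4) + 4 * torch.eye(4)   # shift the diagonal to keep A well conditioned
b = torch.randn(4, 2)

x_inv = torch.linalg.inv(A) @ b            # explicit inverse, then multiply
x_solve = torch.linalg.solve(A, b)         # solve A @ x = b directly
assert torch.allclose(x_inv, x_solve, atol=1e-5)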
Examples: >>> x = torch.rand(4, 4) >>> y = torch.linalg.inv(x) >>> z = torch.mm(x, y) >>> z tensor([[ 1.0000, -0.0000, -0.0000, 0.0000], [ 0.0000, 1.0000, 0.0000, 0.0000], [ 0.0000, 0.0000, 1.0000, 0.0000], [ 0.0000, -0.0000, -0.0000, 1.0000]]) >>> torch.max(torch.abs(z - torch.eye(4))) # Max non-zero tensor(1.1921e-07) >>> # Batched inverse example >>> x = torch.randn(2, 3, 4, 4) >>> y = torch.linalg.inv(x) >>> z = torch.matmul(x, y) >>> torch.max(torch.abs(z - torch.eye(4).expand_as(x))) # Max non-zero tensor(1.9073e-06) >>> x = torch.rand(4, 4, dtype=torch.cdouble) >>> y = torch.linalg.inv(x) >>> z = torch.mm(x, y) >>> z tensor([[ 1.0000e+00+0.0000e+00j, -1.3878e-16+3.4694e-16j, 5.5511e-17-1.1102e-16j, 0.0000e+00-1.6653e-16j], [ 5.5511e-16-1.6653e-16j, 1.0000e+00+6.9389e-17j, 2.2204e-16-1.1102e-16j, -2.2204e-16+1.1102e-16j], [ 3.8858e-16-1.2490e-16j, 2.7756e-17+3.4694e-17j, 1.0000e+00+0.0000e+00j, -4.4409e-16+5.5511e-17j], [ 4.4409e-16+5.5511e-16j, -3.8858e-16+1.8041e-16j, 2.2204e-16+0.0000e+00j, 1.0000e+00-3.4694e-16j]], dtype=torch.complex128) >>> torch.max(torch.abs(z - torch.eye(4, dtype=torch.cdouble))) # Max non-zero tensor(7.5107e-16, dtype=torch.float64) `torch.linalg.qr(input, mode='reduced', *, out=None) -> (Tensor, Tensor)` Computes the QR decomposition of a matrix or a batch of matrices `input`, and returns a namedtuple (Q, R) of tensors such that input=QR\text{input} = Q R with QQ being an orthogonal matrix or batch of orthogonal matrices and RR being an upper triangular matrix or batch of upper triangular matrices. Depending on the value of `mode` this function returns the reduced or complete QR factorization. See below for a list of valid modes. Note **Differences with** `numpy.linalg.qr`: * `mode='raw'` is not implemented * unlike `numpy.linalg.qr`, this function always returns a tuple of two tensors. When `mode='r'`, the `Q` tensor is an empty tensor. This behavior may change in a future PyTorch release. Note Backpropagation is not supported for `mode='r'`. Use `mode='reduced'` instead. Backpropagation is also not supported if the first min⁡(input.size(−1),input.size(−2))\min(input.size(-1), input.size(-2)) columns of any matrix in `input` are not linearly independent. While no error will be thrown when this occurs the values of the “gradient” produced may be anything. This behavior may change in the future. Note This function uses LAPACK for CPU inputs and MAGMA for CUDA inputs, and may produce different (valid) decompositions on different device types or different platforms. Parameters * **input** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – the input tensor of size (∗,m,n)(*, m, n) where `*` is zero or more batch dimensions consisting of matrices of dimension m×nm \times n . * **mode** ([str](https://docs.python.org/3/library/stdtypes.html#str "\(in Python v3.9\)") _,__optional_) – if `k = min(m, n)` then: * `'reduced'` : returns `(Q, R)` with dimensions (m, k), (k, n) (default) * `'complete'`: returns `(Q, R)` with dimensions (m, m), (m, n) * `'r'`: computes only `R`; returns `(Q, R)` where `Q` is empty and `R` has dimensions (k, n) Keyword Arguments **out** ([tuple](https://docs.python.org/3/library/stdtypes.html#tuple "\(in Python v3.9\)") _,__optional_) – tuple of `Q` and `R` tensors. The dimensions of `Q` and `R` are detailed in the description of `mode` above. 
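The shapes promised by each `mode` can be checked directly; a minimal sketch (with m=5, n=3, so k=3):

import torch

m, n = 5, 3                                  # k = min(m, n) = 3
a = torch.randn(m, n)

q, r = torch.linalg.qr(a, mode='reduced')    # Q: (m, k), R: (k, n)
print(q.shape, r.shape)                      # torch.Size([5, 3]) torch.Size([3, 3])

q, r = torch.linalg.qr(a, mode='complete')   # Q: (m, m), R: (m, n)
print(q.shape, r.shape)                      # torch.Size([5, 5]) torch.Size([5, 3])

q, r = torch.linalg.qr(a, mode='r')          # Q is empty, R: (k, n)
print(q.numel(), r.shape)                    # 0 torch.Size([3, 3])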
Example:

>>> a = torch.tensor([[12., -51, 4], [6, 167, -68], [-4, 24, -41]])
>>> q, r = torch.linalg.qr(a)
>>> q
tensor([[-0.8571, 0.3943, 0.3314],
        [-0.4286, -0.9029, -0.0343],
        [ 0.2857, -0.1714, 0.9429]])
>>> r
tensor([[ -14.0000, -21.0000, 14.0000],
        [ 0.0000, -175.0000, 70.0000],
        [ 0.0000, 0.0000, -35.0000]])
>>> torch.mm(q, r).round()
tensor([[ 12., -51., 4.],
        [ 6., 167., -68.],
        [ -4., 24., -41.]])
>>> torch.mm(q.t(), q).round()
tensor([[ 1., 0., 0.],
        [ 0., 1., -0.],
        [ 0., -0., 1.]])
>>> q2, r2 = torch.linalg.qr(a, mode='r')
>>> q2
tensor([])
>>> torch.equal(r, r2)
True
>>> a = torch.randn(3, 4, 5)
>>> q, r = torch.linalg.qr(a, mode='complete')
>>> torch.allclose(torch.matmul(q, r), a)
True
>>> torch.allclose(torch.matmul(q.transpose(-2, -1), q), torch.eye(4))
True

# torch.utils.mobile_optimizer

Warning This API is in beta and may change in the near future.

Torch mobile supports the `torch.utils.mobile_optimizer.optimize_for_mobile` utility to run a list of optimization passes on modules in eval mode. The method takes the following parameters: a torch.jit.ScriptModule object, a blocklisting optimization set, and a preserved method list.

By default, if the optimization blocklist is None or empty, `optimize_for_mobile` will run the following optimizations:

* **Conv2D + BatchNorm fusion** (blocklisting option `MobileOptimizerType::CONV_BN_FUSION`): This optimization pass folds `Conv2d-BatchNorm2d` into `Conv2d` in the `forward` method of this module and all its submodules. The weight and bias of the `Conv2d` are correspondingly updated.
* **Insert and Fold prepacked ops** (blocklisting option `MobileOptimizerType::INSERT_FOLD_PREPACK_OPS`): This optimization pass rewrites the graph to replace 2D convolutions and linear ops with their prepacked counterparts. Prepacked ops are stateful: they require some state, such as prepacked weights, to be created ahead of time and then used during op execution. XNNPACK is one such backend that provides prepacked ops, with kernels optimized for mobile platforms (such as ARM CPUs). Prepacking of weights enables efficient memory access and thus faster kernel execution. At the moment the `optimize_for_mobile` pass rewrites the graph to replace `Conv2D/Linear` with 1) an op that pre-packs weights for the XNNPACK conv2d/linear ops and 2) an op that takes the pre-packed weight and activation as input and generates output activations. Since 1 needs to be done only once, the weight pre-packing is folded so that it happens only once, at model load time. This pass of `optimize_for_mobile` performs 1 and 2 and then folds, i.e. removes, the weight pre-packing ops.
* **ReLU/Hardtanh fusion**: XNNPACK ops support fusion of clamping; that is, clamping of the output activation is done as part of the kernel, including for 2D convolution and linear op kernels, so clamping effectively comes for free. Any op that can be expressed as a clamping op, such as `ReLU` or `hardtanh`, can therefore be fused with the preceding `Conv2D` or `linear` op in XNNPACK. This pass rewrites the graph by finding `ReLU`/`hardtanh` ops that follow the XNNPACK `Conv2D`/`linear` ops written by the previous pass, and fuses them together.
* **Dropout removal** (blocklisting option `MobileOptimizerType::REMOVE_DROPOUT`): This optimization pass removes `dropout` and `dropout_` nodes from this module when training is false.
* **Conv packed params hoisting** (blocklisting option `MobileOptimizerType::HOIST_CONV_PACKED_PARAMS`): This optimization pass moves convolution packed params to the root module, so that the convolution structs can be deleted. This decreases model size without impacting numerics.

`optimize_for_mobile` will also invoke the freeze_module pass, which only preserves the `forward` method. If you have other methods that need to be preserved, add them to the preserved method list and pass it into the method.

`torch.utils.mobile_optimizer.optimize_for_mobile(script_module, optimization_blocklist=None, preserved_methods=None, backend='CPU')` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/utils/mobile_optimizer.html#optimize_for_mobile)

Parameters

* **script_module** – An instance of a torch script module with type ScriptModule.
* **optimization_blocklist** – A set with type MobileOptimizerType. When the set is not passed, the optimization method will run all the optimization passes; otherwise, it will run only the passes that are not included in optimization_blocklist.
* **preserved_methods** – A list of methods that need to be preserved when the freeze_module pass is invoked
* **backend** – Device type to use for running the resulting model ('CPU' (default), 'Vulkan' or 'Metal').

Returns A new optimized torch script module

# torch.utils.model_zoo

Moved to `torch.hub`.

`torch.utils.model_zoo.load_url(url, model_dir=None, map_location=None, progress=True, check_hash=False, file_name=None)`

Loads the Torch serialized object at the given URL. If the downloaded file is a zip file, it will be automatically decompressed. If the object is already present in `model_dir`, it's deserialized and returned. The default value of `model_dir` is `<hub_dir>/checkpoints` where `hub_dir` is the directory returned by [`get_dir()`](hub#torch.hub.get_dir "torch.hub.get_dir").

Parameters

* **url** (_string_) – URL of the object to download
* **model_dir** (_string_ _,__optional_) – directory in which to save the object
* **map_location** (_optional_) – a function or a dict specifying how to remap storage locations (see torch.load)
* **progress** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – whether or not to display a progress bar to stderr. Default: True
* **check_hash** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – If True, the filename part of the URL should follow the naming convention `filename-<sha256>.ext` where `<sha256>` is the first eight or more digits of the SHA256 hash of the contents of the file. The hash is used to ensure unique names and to verify the contents of the file. Default: False
* **file_name** (_string_ _,__optional_) – name for the downloaded file. Filename from `url` will be used if not set.

#### Example

>>> state_dict = torch.hub.load_state_dict_from_url('https://s3.amazonaws.com/pytorch/models/resnet18-5c106cde.pth')

# Multiprocessing package - torch.multiprocessing

torch.multiprocessing is a wrapper around the native [`multiprocessing`](https://docs.python.org/3/library/multiprocessing.html#module-multiprocessing "\(in Python v3.9\)") module. It registers custom reducers that use shared memory to provide shared views on the same data in different processes. Once the tensor/storage is moved to shared memory (see [`share_memory_()`](tensors#torch.Tensor.share_memory_ "torch.Tensor.share_memory_")), it will be possible to send it to other processes without making any copies.
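For example, a minimal sketch of this sharing behaviour for a CPU tensor (the in-place update made in the child process is visible in the parent because both processes see the same shared storage):

import torch
import torch.multiprocessing as mp

def worker(t):
    t.add_(1)                      # modifies the shared storage in place

if __name__ == '__main__':
    x = torch.zeros(3)
    x.share_memory_()              # move the underlying storage to shared memory
    p = mp.Process(target=worker, args=(x,))
    p.start()
    p.join()
    print(x)                       # tensor([1., 1., 1.])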
The API is 100% compatible with the original module: it's enough to change `import multiprocessing` to `import torch.multiprocessing` to have all tensors sent through the queues, or shared via other mechanisms, moved to shared memory.

Because the APIs are so similar, we do not document most of this package's contents, and we recommend referring to the very good docs of the original module.

Warning If the main process exits abruptly (e.g. because of an incoming signal), Python's `multiprocessing` sometimes fails to clean up its children. It's a known caveat, so if you're seeing any resource leaks after interrupting the interpreter, it probably means that this has just happened to you.

## Strategy management

`torch.multiprocessing.get_all_sharing_strategies()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/multiprocessing.html#get_all_sharing_strategies) Returns a set of sharing strategies supported on the current system.

`torch.multiprocessing.get_sharing_strategy()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/multiprocessing.html#get_sharing_strategy) Returns the current strategy for sharing CPU tensors.

`torch.multiprocessing.set_sharing_strategy(new_strategy)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/multiprocessing.html#set_sharing_strategy) Sets the strategy for sharing CPU tensors.

Parameters **new_strategy** ([str](https://docs.python.org/3/library/stdtypes.html#str "\(in Python v3.9\)")) – Name of the selected strategy. Should be one of the values returned by `get_all_sharing_strategies()`.

## Sharing CUDA tensors

Sharing CUDA tensors between processes is supported only in Python 3, using the `spawn` or `forkserver` start method. Unlike CPU tensors, the sending process is required to keep the original tensor as long as the receiving process retains a copy of the tensor. The refcounting is implemented under the hood but requires users to follow the best practices below.

Warning If the consumer process dies abnormally due to a fatal signal, the shared tensor could be kept in memory forever as long as the sending process is running.

1. Release memory ASAP in the consumer.

## Good
x = queue.get()
# do something with x
del x

## Bad
x = queue.get()
# do something with x
# do everything else (the producer has to keep x in memory)

2. Keep the producer process running until all consumers exit. This prevents the situation where the producer releases memory that is still in use by a consumer.

## producer
# send tensors, do something
event.wait()

## consumer
# receive tensors and use them
event.set()

3. Don't pass received tensors.

# not going to work
x = queue.get()
queue_2.put(x)

# you need to create a process-local copy
x = queue.get()
x_clone = x.clone()
queue_2.put(x_clone)

# putting and getting from the same queue in the same process will likely result in a segfault
queue.put(tensor)
x = queue.get()

## Sharing strategies

This section provides a brief overview of how the different sharing strategies work. Note that it applies only to CPU tensors; CUDA tensors will always use the CUDA API, as that's the only way they can be shared.

### File descriptor - `file_descriptor`

Note This is the default strategy (except for macOS and OS X where it's not supported).

This strategy will use file descriptors as shared memory handles. Whenever a storage is moved to shared memory, a file descriptor obtained from `shm_open` is cached with the object, and when it's going to be sent to other processes, the file descriptor will be transferred (e.g.
via UNIX sockets) to it. The receiver will also cache the file descriptor and `mmap` it, to obtain a shared view onto the storage data. Note that if there will be a lot of tensors shared, this strategy will keep a large number of file descriptors open most of the time. If your system has low limits for the number of open file descriptors, and you can’t raise them, you should use the `file_system` strategy. ### File system - `file_system` This strategy will use file names given to `shm_open` to identify the shared memory regions. This has a benefit of not requiring the implementation to cache the file descriptors obtained from it, but at the same time is prone to shared memory leaks. The file can’t be deleted right after its creation, because other processes need to access it to open their views. If the processes fatally crash, or are killed, and don’t call the storage destructors, the files will remain in the system. This is very serious, because they keep using up the memory until the system is restarted, or they’re freed manually. To counter the problem of shared memory file leaks, `torch.multiprocessing` will spawn a daemon named `torch_shm_manager` that will isolate itself from the current process group, and will keep track of all shared memory allocations. Once all processes connected to it exit, it will wait a moment to ensure there will be no new connections, and will iterate over all shared memory files allocated by the group. If it finds that any of them still exist, they will be deallocated. We’ve tested this method and it proved to be robust to various failures. Still, if your system has high enough limits, and `file_descriptor` is a supported strategy, we do not recommend switching to this one. ## Spawning subprocesses Note Available for Python >= 3.4. This depends on the `spawn` start method in Python’s `multiprocessing` package. Spawning a number of subprocesses to perform some function can be done by creating `Process` instances and calling `join` to wait for their completion. This approach works fine when dealing with a single subprocess but presents potential issues when dealing with multiple processes. Namely, joining processes sequentially implies they will terminate sequentially. If they don’t, and the first process does not terminate, the process termination will go unnoticed. Also, there are no native facilities for error propagation. The `spawn` function below addresses these concerns and takes care of error propagation, out of order termination, and will actively terminate processes upon detecting an error in one of them. `torch.multiprocessing.spawn(fn, args=(), nprocs=1, join=True, daemon=False, start_method='spawn')` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/multiprocessing/spawn.html#spawn) Spawns `nprocs` processes that run `fn` with `args`. If one of the processes exits with a non-zero exit status, the remaining processes are killed and an exception is raised with the cause of termination. In the case an exception was caught in the child process, it is forwarded and its traceback is included in the exception raised in the parent process. Parameters * **fn** (_function_) – Function is called as the entrypoint of the spawned process. This function must be defined at the top level of a module so it can be pickled and spawned. This is a requirement imposed by multiprocessing. The function is called as `fn(i, *args)`, where `i` is the process index and `args` is the passed through tuple of arguments. 
* **args** ([tuple](https://docs.python.org/3/library/stdtypes.html#tuple "\(in Python v3.9\)")) – Arguments passed to `fn`. * **nprocs** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – Number of processes to spawn. * **join** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – Perform a blocking join on all processes. * **daemon** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – The spawned processes’ daemon flag. If set to True, daemonic processes will be created. * **start_method** (_string_) – (deprecated) this method will always use `spawn` as the start method. To use a different start method use `start_processes()`. Returns None if `join` is `True`, `ProcessContext` if `join` is `False` `class torch.multiprocessing.SpawnContext` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/multiprocessing/spawn.html#SpawnContext) Returned by `spawn()` when called with `join=False`. `join(timeout=None)` Tries to join one or more processes in this spawn context. If one of them exited with a non-zero exit status, this function kills the remaining processes and raises an exception with the cause of the first process exiting. Returns `True` if all processes have been joined successfully, `False` if there are more processes that need to be joined. Parameters **timeout** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)")) – Wait this long before giving up on waiting. # Named Tensors Named Tensors allow users to give explicit names to tensor dimensions. In most cases, operations that take dimension parameters will accept dimension names, avoiding the need to track dimensions by position. In addition, named tensors use names to automatically check that APIs are being used correctly at runtime, providing extra safety. Names can also be used to rearrange dimensions, for example, to support “broadcasting by name” rather than “broadcasting by position”. Warning The named tensor API is a prototype feature and subject to change. ## Creating named tensors Factory functions now take a new `names` argument that associates a name with each dimension. >>> torch.zeros(2, 3, names=('N', 'C')) tensor([[0., 0., 0.], [0., 0., 0.]], names=('N', 'C')) Named dimensions, like regular Tensor dimensions, are ordered. `tensor.names[i]` is the name of dimension `i` of `tensor`. The following factory functions support named tensors: * [`torch.empty()`](generated/torch.empty#torch.empty "torch.empty") * [`torch.rand()`](generated/torch.rand#torch.rand "torch.rand") * [`torch.randn()`](generated/torch.randn#torch.randn "torch.randn") * [`torch.ones()`](generated/torch.ones#torch.ones "torch.ones") * [`torch.tensor()`](generated/torch.tensor#torch.tensor "torch.tensor") * [`torch.zeros()`](generated/torch.zeros#torch.zeros "torch.zeros") ## Named dimensions See `names` for restrictions on tensor names. Use `names` to access the dimension names of a tensor and `rename()` to rename named dimensions. >>> imgs = torch.randn(1, 2, 2, 3 , names=('N', 'C', 'H', 'W')) >>> imgs.names ('N', 'C', 'H', 'W') >>> renamed_imgs = imgs.rename(H='height', W='width') >>> renamed_imgs.names ('N', 'C', 'height', 'width) Named tensors can coexist with unnamed tensors; named tensors are instances of [`torch.Tensor`](tensors#torch.Tensor "torch.Tensor"). Unnamed tensors have `None`-named dimensions. Named tensors do not require all dimensions to be named. 
>>> imgs = torch.randn(1, 2, 2, 3, names=(None, 'C', 'H', 'W'))
>>> imgs.names
(None, 'C', 'H', 'W')

## Name propagation semantics

Named tensors use names to automatically check that APIs are being called correctly at runtime. This occurs in a process called _name inference_. More formally, name inference consists of the following two steps:

* **Check names**: an operator may perform automatic checks at runtime that check that certain dimension names must match.
* **Propagate names**: name inference propagates names to output tensors.

All operations that support named tensors propagate names.

>>> x = torch.randn(3, 3, names=('N', 'C'))
>>> x.abs().names
('N', 'C')

### Match semantics

Two names _match_ if they are equal (string equality) or if at least one is `None`. Nones are essentially a special "wildcard" name.

`unify(A, B)` determines which of the names `A` and `B` to propagate to the outputs. It returns the more _specific_ of the two names, if they match. If the names do not match, then it errors.

Note In practice, when working with named tensors, one should avoid having unnamed dimensions because their handling can be complicated. It is recommended to lift all unnamed dimensions to be named dimensions by using `refine_names()`.

### Basic name inference rules

Let's see how `match` and `unify` are used in name inference in the case of adding two one-dim tensors with no broadcasting.

x = torch.randn(3, names=('X',))
y = torch.randn(3)
z = torch.randn(3, names=('Z',))

**Check names**: check that the names of the two tensors _match_. For the following examples:

>>> # x + y  # match('X', None) is True
>>> # x + z  # match('X', 'Z') is False
>>> # x + x  # match('X', 'X') is True
>>> x + z
Error when attempting to broadcast dims ['X'] and dims ['Z']: dim 'X' and dim 'Z' are at the same position from the right but do not match.

**Propagate names**: _unify_ the names to select which one to propagate. In the case of `x + y`, `unify('X', None) = 'X'` because `'X'` is more specific than `None`.

>>> (x + y).names
('X',)
>>> (x + x).names
('X',)

For a comprehensive list of name inference rules, see [Named Tensors operator coverage](name_inference#name-inference-reference-doc). Here are two common operations that may be useful to go over:

* Binary arithmetic ops: [Unifies names from inputs](name_inference#unifies-names-from-inputs-doc)
* Matrix multiplication ops: [Contracts away dims](name_inference#contracts-away-dims-doc)

## Explicit alignment by names

Use `align_as()` or `align_to()` to align tensor dimensions by name to a specified ordering. This is useful for performing "broadcasting by names".

# This function is agnostic to the dimension ordering of `input`,
# as long as it has a `C` dimension somewhere.
def scale_channels(input, scale):
    scale = scale.refine_names('C')
    return input * scale.align_as(input)

>>> num_channels = 3
>>> scale = torch.randn(num_channels, names=('C',))
>>> imgs = torch.rand(3, 3, 3, num_channels, names=('N', 'H', 'W', 'C'))
>>> more_imgs = torch.rand(3, num_channels, 3, 3, names=('N', 'C', 'H', 'W'))
>>> videos = torch.randn(3, num_channels, 3, 3, 3, names=('N', 'C', 'H', 'W', 'D'))
>>> scale_channels(imgs, scale)
>>> scale_channels(more_imgs, scale)
>>> scale_channels(videos, scale)

## Manipulating dimensions

Use `align_to()` to permute large amounts of dimensions without mentioning all of them as required by [`permute()`](tensors#torch.Tensor.permute "torch.Tensor.permute").
>>> tensor = torch.randn(2, 2, 2, 2, 2, 2) >>> named_tensor = tensor.refine_names('A', 'B', 'C', 'D', 'E', 'F') # Move the F (dim 5) and E dimension (dim 4) to the front while keeping # the rest in the same order >>> tensor.permute(5, 4, 0, 1, 2, 3) >>> named_tensor.align_to('F', 'E', ...) Use [`flatten()`](tensors#torch.Tensor.flatten "torch.Tensor.flatten") and `unflatten()` to flatten and unflatten dimensions, respectively. These methods are more verbose than [`view()`](tensors#torch.Tensor.view "torch.Tensor.view") and [`reshape()`](tensors#torch.Tensor.reshape "torch.Tensor.reshape"), but have more semantic meaning to someone reading the code. >>> imgs = torch.randn(32, 3, 128, 128) >>> named_imgs = imgs.refine_names('N', 'C', 'H', 'W') >>> flat_imgs = imgs.view(32, -1) >>> named_flat_imgs = named_imgs.flatten(['C', 'H', 'W'], 'features') >>> named_flat_imgs.names ('N', 'features') >>> unflattened_imgs = imgs.view(32, 3, 128, 128) >>> unflattened_named_imgs = named_flat_imgs.unflatten( 'features', [('C', 3), ('H', 128), ('W', 128)]) ## Autograd support Autograd currently supports named tensors in a limited manner: autograd ignores names on all tensors. Gradient computation is still correct but we lose the safety that names give us. >>> x = torch.randn(3, names=('D',)) >>> weight = torch.randn(3, names=('D',), requires_grad=True) >>> loss = (x - weight).abs() >>> grad_loss = torch.randn(3) >>> loss.backward(grad_loss) >>> weight.grad # Unnamed for now. Will be named in the future tensor([-1.8107, -0.6357, 0.0783]) >>> weight.grad.zero_() >>> grad_loss = grad_loss.refine_names('C') >>> loss = (x - weight).abs() # Ideally we'd check that the names of loss and grad_loss match but we don't yet. >>> loss.backward(grad_loss) >>> weight.grad tensor([-1.8107, -0.6357, 0.0783]) ## Currently supported operations and subsystems ### Operators See [Named Tensors operator coverage](name_inference#name-inference-reference- doc) for a full list of the supported torch and tensor operations. We do not yet support the following that is not covered by the link: * indexing, advanced indexing. For `torch.nn.functional` operators, we support the following: * [`torch.nn.functional.relu()`](nn.functional#torch.nn.functional.relu "torch.nn.functional.relu") * [`torch.nn.functional.softmax()`](nn.functional#torch.nn.functional.softmax "torch.nn.functional.softmax") * [`torch.nn.functional.log_softmax()`](nn.functional#torch.nn.functional.log_softmax "torch.nn.functional.log_softmax") * [`torch.nn.functional.tanh()`](nn.functional#torch.nn.functional.tanh "torch.nn.functional.tanh") * [`torch.nn.functional.sigmoid()`](nn.functional#torch.nn.functional.sigmoid "torch.nn.functional.sigmoid") * [`torch.nn.functional.dropout()`](nn.functional#torch.nn.functional.dropout "torch.nn.functional.dropout") ### Subsystems Autograd is supported, see Autograd support. Because gradients are currently unnamed, optimizers may work but are untested. NN modules are currently unsupported. This can lead to the following when calling modules with named tensor inputs: * NN module parameters are unnamed, so outputs may be partially named. * NN module forward passes have code that don’t support named tensors and will error out appropriately. 
We also do not support the following subsystems, though some may work out of the box:

* distributions
* serialization ([`torch.load()`](generated/torch.load#torch.load "torch.load"), [`torch.save()`](generated/torch.save#torch.save "torch.save"))
* multiprocessing
* JIT
* distributed
* ONNX

If any of these would help your use case, please [search if an issue has already been filed](https://github.com/pytorch/pytorch/issues?q=is%3Aopen+is%3Aissue+label%3A%22module%3A+named+tensor%22) and if not, [file one](https://github.com/pytorch/pytorch/issues/new/choose).

## Named tensor API reference

In this section you can find the documentation for named-tensor-specific APIs. For a comprehensive reference on how names are propagated through other PyTorch operators, see [Named Tensors operator coverage](name_inference#name-inference-reference-doc).

`class torch.Tensor`

`names`

Stores names for each of this tensor's dimensions. `names[idx]` corresponds to the name of tensor dimension `idx`. Names are either a string if the dimension is named or `None` if the dimension is unnamed. Dimension names may contain characters or underscore. Furthermore, a dimension name must be a valid Python variable name (i.e., does not start with underscore). Tensors may not have two named dimensions with the same name.

Warning The named tensor API is experimental and subject to change.

`rename(*names, **rename_map)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/tensor.html#Tensor.rename)

Renames dimension names of `self`. There are two main usages:

`self.rename(**rename_map)` returns a view on the tensor that has dims renamed as specified in the mapping `rename_map`.

`self.rename(*names)` returns a view on the tensor, renaming all dimensions positionally using `names`. Use `self.rename(None)` to drop names on a tensor.

One cannot specify both positional args `names` and keyword args `rename_map`.

Examples:

>>> imgs = torch.rand(2, 3, 5, 7, names=('N', 'C', 'H', 'W'))
>>> renamed_imgs = imgs.rename(N='batch', C='channels')
>>> renamed_imgs.names
('batch', 'channels', 'H', 'W')
>>> renamed_imgs = imgs.rename(None)
>>> renamed_imgs.names
(None, None, None, None)
>>> renamed_imgs = imgs.rename('batch', 'channel', 'height', 'width')
>>> renamed_imgs.names
('batch', 'channel', 'height', 'width')

Warning The named tensor API is experimental and subject to change.

`rename_(*names, **rename_map)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/tensor.html#Tensor.rename_)

In-place version of `rename()`.

`refine_names(*names)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/tensor.html#Tensor.refine_names)

Refines the dimension names of `self` according to `names`. Refining is a special case of renaming that "lifts" unnamed dimensions. A `None` dim can be refined to have any name; a named dim can only be refined to have the same name.

Because named tensors can coexist with unnamed tensors, refining names gives a nice way to write named-tensor-aware code that works with both named and unnamed tensors.

`names` may contain up to one Ellipsis (`...`). The Ellipsis is expanded greedily; it is expanded in-place to fill `names` to the same length as `self.dim()` using names from the corresponding indices of `self.names`. Python 2 does not support Ellipsis but one may use a string literal instead (`'...'`).

Parameters **names** (_iterable of str_) – The desired names of the output tensor. May contain up to one Ellipsis.
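As a quick illustration of the refinement rule above, before the examples below (a sketch; the dimension names are arbitrary): unnamed (`None`) dims can be lifted to any name, while an already-named dim can only be "refined" to the very same name:

import torch

x = torch.randn(2, 3)              # fully unnamed tensor
y = x.refine_names('N', 'C')       # None dims may be lifted to any name
print(y.names)                     # ('N', 'C')

z = y.refine_names('N', 'C')       # a named dim may only be refined to the same name
# y.refine_names('N', 'K')         # would raise an error: 'C' cannot be refined to 'K'
print(z.names)                     # ('N', 'C')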
Examples: >>> imgs = torch.randn(32, 3, 128, 128) >>> named_imgs = imgs.refine_names('N', 'C', 'H', 'W') >>> named_imgs.names ('N', 'C', 'H', 'W') >>> tensor = torch.randn(2, 3, 5, 7, 11) >>> tensor = tensor.refine_names('A', ..., 'B', 'C') >>> tensor.names ('A', None, None, 'B', 'C') Warning The named tensor API is experimental and subject to change. `align_as(other) → Tensor` Permutes the dimensions of the `self` tensor to match the dimension order in the `other` tensor, adding size-one dims for any new names. This operation is useful for explicit broadcasting by names (see examples). All of the dims of `self` must be named in order to use this method. The resulting tensor is a view on the original tensor. All dimension names of `self` must be present in `other.names`. `other` may contain named dimensions that are not in `self.names`; the output tensor has a size-one dimension for each of those new names. To align a tensor to a specific order, use `align_to()`. Examples: # Example 1: Applying a mask >>> mask = torch.randint(2, [127, 128], dtype=torch.bool).refine_names('W', 'H') >>> imgs = torch.randn(32, 128, 127, 3, names=('N', 'H', 'W', 'C')) >>> imgs.masked_fill_(mask.align_as(imgs), 0) # Example 2: Applying a per-channel-scale >>> def scale_channels(input, scale): >>> scale = scale.refine_names('C') >>> return input * scale.align_as(input) >>> num_channels = 3 >>> scale = torch.randn(num_channels, names=('C',)) >>> imgs = torch.rand(32, 128, 128, num_channels, names=('N', 'H', 'W', 'C')) >>> more_imgs = torch.rand(32, num_channels, 128, 128, names=('N', 'C', 'H', 'W')) >>> videos = torch.randn(3, num_channels, 128, 128, 128, names=('N', 'C', 'H', 'W', 'D')) # scale_channels is agnostic to the dimension order of the input >>> scale_channels(imgs, scale) >>> scale_channels(more_imgs, scale) >>> scale_channels(videos, scale) Warning The named tensor API is experimental and subject to change. `align_to(*names)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/tensor.html#Tensor.align_to) Permutes the dimensions of the `self` tensor to match the order specified in `names`, adding size-one dims for any new names. All of the dims of `self` must be named in order to use this method. The resulting tensor is a view on the original tensor. All dimension names of `self` must be present in `names`. `names` may contain additional names that are not in `self.names`; the output tensor has a size- one dimension for each of those new names. `names` may contain up to one Ellipsis (`...`). The Ellipsis is expanded to be equal to all dimension names of `self` that are not mentioned in `names`, in the order that they appear in `self`. Python 2 does not support Ellipsis but one may use a string literal instead (`'...'`). Parameters **names** (_iterable of str_) – The desired dimension ordering of the output tensor. May contain up to one Ellipsis that is expanded to all unmentioned dim names of `self`. Examples: >>> tensor = torch.randn(2, 2, 2, 2, 2, 2) >>> named_tensor = tensor.refine_names('A', 'B', 'C', 'D', 'E', 'F') # Move the F and E dims to the front while keeping the rest in order >>> named_tensor.align_to('F', 'E', ...) Warning The named tensor API is experimental and subject to change. `unflatten(dim, sizes)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/tensor.html#Tensor.unflatten) Expands the dimension [`dim`](tensors#torch.Tensor.dim "torch.Tensor.dim") of the `self` tensor over multiple dimensions of sizes given by `sizes`. 
* `sizes` is the new shape of the unflattened dimension and it can be a `Tuple[int]` as well as `torch.Size` if `self` is a `Tensor`, or `namedshape` (Tuple[(name: str, size: int)]) if `self` is a `NamedTensor`. The total number of elements in sizes must match the number of elements in the original dim being unflattened. Parameters * **dim** (_Union_ _[_[int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,_[str](https://docs.python.org/3/library/stdtypes.html#str "\(in Python v3.9\)") _]_) – Dimension to unflatten * **sizes** (_Union_ _[__Tuple_ _[_[int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _] or_ _torch.Size_ _,__Tuple_ _[__Tuple_ _[_[str](https://docs.python.org/3/library/stdtypes.html#str "\(in Python v3.9\)") _,_[int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _]__]__]_) – New shape of the unflattened dimension #### Examples >>> torch.randn(3, 4, 1).unflatten(1, (2, 2)).shape torch.Size([3, 2, 2, 1]) >>> torch.randn(2, 4, names=('A', 'B')).unflatten('B', (('B1', 2), ('B2', 2))) tensor([[[-1.1772, 0.0180], [ 0.2412, 0.1431]], [[-1.1819, -0.8899], [ 1.5813, 0.2274]]], names=(‘A’, ‘B1’, ‘B2’)) Warning The named tensor API is experimental and subject to change. `flatten(dims, out_dim) → Tensor` Flattens `dims` into a single dimension with name `out_dim`. All of `dims` must be consecutive in order in the `self` tensor, but not necessary contiguous in memory. Examples: >>> imgs = torch.randn(32, 3, 128, 128, names=('N', 'C', 'H', 'W')) >>> flat_imgs = imgs.flatten(['C', 'H', 'W'], 'features') >>> flat_imgs.names, flat_imgs.shape (('N', 'features'), torch.Size([32, 49152])) Warning The named tensor API is experimental and subject to change. # torch.nn.functional ## Convolution functions ### conv1d `torch.nn.functional.conv1d(input, weight, bias=None, stride=1, padding=0, dilation=1, groups=1) → Tensor` Applies a 1D convolution over an input signal composed of several input planes. This operator supports [TensorFloat32](https://pytorch.org/docs/1.8.0/notes/cuda.html#tf32-on- ampere). See [`Conv1d`](generated/torch.nn.conv1d#torch.nn.Conv1d "torch.nn.Conv1d") for details and output shape. Note In some circumstances when given tensors on a CUDA device and using CuDNN, this operator may select a nondeterministic algorithm to increase performance. If this is undesirable, you can try to make the operation deterministic (potentially at a performance cost) by setting `torch.backends.cudnn.deterministic = True`. See [Reproducibility](https://pytorch.org/docs/1.8.0/notes/randomness.html) for more information. Parameters * **input** – input tensor of shape (minibatch,in_channels,iW)(\text{minibatch} , \text{in\\_channels} , iW) * **weight** – filters of shape (out_channels,in_channelsgroups,kW)(\text{out\\_channels} , \frac{\text{in\\_channels}}{\text{groups}} , kW) * **bias** – optional bias of shape (out_channels)(\text{out\\_channels}) . Default: `None` * **stride** – the stride of the convolving kernel. Can be a single number or a one-element tuple `(sW,)`. Default: 1 * **padding** – implicit paddings on both sides of the input. Can be a single number or a one-element tuple `(padW,)`. Default: 0 * **dilation** – the spacing between kernel elements. Can be a single number or a one-element tuple `(dW,)`. Default: 1 * **groups** – split input into groups, in_channels\text{in\\_channels} should be divisible by the number of groups. 
Default: 1 Examples: >>> filters = torch.randn(33, 16, 3) >>> inputs = torch.randn(20, 16, 50) >>> F.conv1d(inputs, filters) ### conv2d `torch.nn.functional.conv2d(input, weight, bias=None, stride=1, padding=0, dilation=1, groups=1) → Tensor` Applies a 2D convolution over an input image composed of several input planes. This operator supports [TensorFloat32](https://pytorch.org/docs/1.8.0/notes/cuda.html#tf32-on- ampere). See [`Conv2d`](generated/torch.nn.conv2d#torch.nn.Conv2d "torch.nn.Conv2d") for details and output shape. Note In some circumstances when given tensors on a CUDA device and using CuDNN, this operator may select a nondeterministic algorithm to increase performance. If this is undesirable, you can try to make the operation deterministic (potentially at a performance cost) by setting `torch.backends.cudnn.deterministic = True`. See [Reproducibility](https://pytorch.org/docs/1.8.0/notes/randomness.html) for more information. Parameters * **input** – input tensor of shape (minibatch,in_channels,iH,iW)(\text{minibatch} , \text{in\\_channels} , iH , iW) * **weight** – filters of shape (out_channels,in_channelsgroups,kH,kW)(\text{out\\_channels} , \frac{\text{in\\_channels}}{\text{groups}} , kH , kW) * **bias** – optional bias tensor of shape (out_channels)(\text{out\\_channels}) . Default: `None` * **stride** – the stride of the convolving kernel. Can be a single number or a tuple `(sH, sW)`. Default: 1 * **padding** – implicit paddings on both sides of the input. Can be a single number or a tuple `(padH, padW)`. Default: 0 * **dilation** – the spacing between kernel elements. Can be a single number or a tuple `(dH, dW)`. Default: 1 * **groups** – split input into groups, in_channels\text{in\\_channels} should be divisible by the number of groups. Default: 1 Examples: >>> # With square kernels and equal stride >>> filters = torch.randn(8,4,3,3) >>> inputs = torch.randn(1,4,5,5) >>> F.conv2d(inputs, filters, padding=1) ### conv3d `torch.nn.functional.conv3d(input, weight, bias=None, stride=1, padding=0, dilation=1, groups=1) → Tensor` Applies a 3D convolution over an input image composed of several input planes. This operator supports [TensorFloat32](https://pytorch.org/docs/1.8.0/notes/cuda.html#tf32-on- ampere). See [`Conv3d`](generated/torch.nn.conv3d#torch.nn.Conv3d "torch.nn.Conv3d") for details and output shape. Note In some circumstances when given tensors on a CUDA device and using CuDNN, this operator may select a nondeterministic algorithm to increase performance. If this is undesirable, you can try to make the operation deterministic (potentially at a performance cost) by setting `torch.backends.cudnn.deterministic = True`. See [Reproducibility](https://pytorch.org/docs/1.8.0/notes/randomness.html) for more information. Parameters * **input** – input tensor of shape (minibatch,in_channels,iT,iH,iW)(\text{minibatch} , \text{in\\_channels} , iT , iH , iW) * **weight** – filters of shape (out_channels,in_channelsgroups,kT,kH,kW)(\text{out\\_channels} , \frac{\text{in\\_channels}}{\text{groups}} , kT , kH , kW) * **bias** – optional bias tensor of shape (out_channels)(\text{out\\_channels}) . Default: None * **stride** – the stride of the convolving kernel. Can be a single number or a tuple `(sT, sH, sW)`. Default: 1 * **padding** – implicit paddings on both sides of the input. Can be a single number or a tuple `(padT, padH, padW)`. Default: 0 * **dilation** – the spacing between kernel elements. Can be a single number or a tuple `(dT, dH, dW)`. 
Default: 1 * **groups** – split input into groups, in_channels\text{in\\_channels} should be divisible by the number of groups. Default: 1 Examples: >>> filters = torch.randn(33, 16, 3, 3, 3) >>> inputs = torch.randn(20, 16, 50, 10, 20) >>> F.conv3d(inputs, filters) ### conv_transpose1d `torch.nn.functional.conv_transpose1d(input, weight, bias=None, stride=1, padding=0, output_padding=0, groups=1, dilation=1) → Tensor` Applies a 1D transposed convolution operator over an input signal composed of several input planes, sometimes also called “deconvolution”. This operator supports [TensorFloat32](https://pytorch.org/docs/1.8.0/notes/cuda.html#tf32-on- ampere). See [`ConvTranspose1d`](generated/torch.nn.convtranspose1d#torch.nn.ConvTranspose1d "torch.nn.ConvTranspose1d") for details and output shape. Note In some circumstances when given tensors on a CUDA device and using CuDNN, this operator may select a nondeterministic algorithm to increase performance. If this is undesirable, you can try to make the operation deterministic (potentially at a performance cost) by setting `torch.backends.cudnn.deterministic = True`. See [Reproducibility](https://pytorch.org/docs/1.8.0/notes/randomness.html) for more information. Parameters * **input** – input tensor of shape (minibatch,in_channels,iW)(\text{minibatch} , \text{in\\_channels} , iW) * **weight** – filters of shape (in_channels,out_channelsgroups,kW)(\text{in\\_channels} , \frac{\text{out\\_channels}}{\text{groups}} , kW) * **bias** – optional bias of shape (out_channels)(\text{out\\_channels}) . Default: None * **stride** – the stride of the convolving kernel. Can be a single number or a tuple `(sW,)`. Default: 1 * **padding** – `dilation * (kernel_size - 1) - padding` zero-padding will be added to both sides of each dimension in the input. Can be a single number or a tuple `(padW,)`. Default: 0 * **output_padding** – additional size added to one side of each dimension in the output shape. Can be a single number or a tuple `(out_padW)`. Default: 0 * **groups** – split input into groups, in_channels\text{in\\_channels} should be divisible by the number of groups. Default: 1 * **dilation** – the spacing between kernel elements. Can be a single number or a tuple `(dW,)`. Default: 1 Examples: >>> inputs = torch.randn(20, 16, 50) >>> weights = torch.randn(16, 33, 5) >>> F.conv_transpose1d(inputs, weights) ### conv_transpose2d `torch.nn.functional.conv_transpose2d(input, weight, bias=None, stride=1, padding=0, output_padding=0, groups=1, dilation=1) → Tensor` Applies a 2D transposed convolution operator over an input image composed of several input planes, sometimes also called “deconvolution”. This operator supports [TensorFloat32](https://pytorch.org/docs/1.8.0/notes/cuda.html#tf32-on- ampere). See [`ConvTranspose2d`](generated/torch.nn.convtranspose2d#torch.nn.ConvTranspose2d "torch.nn.ConvTranspose2d") for details and output shape. Note In some circumstances when given tensors on a CUDA device and using CuDNN, this operator may select a nondeterministic algorithm to increase performance. If this is undesirable, you can try to make the operation deterministic (potentially at a performance cost) by setting `torch.backends.cudnn.deterministic = True`. See [Reproducibility](https://pytorch.org/docs/1.8.0/notes/randomness.html) for more information. 
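Before the parameter list, a quick sketch of the upsampling behaviour implied by "transposed convolution" (the shapes here are arbitrary; the commented output size follows the ConvTranspose2d output-shape formula):

import torch
import torch.nn.functional as F

x = torch.randn(1, 4, 8, 8)        # (minibatch, in_channels, iH, iW)
w = torch.randn(4, 8, 3, 3)        # (in_channels, out_channels/groups, kH, kW)
y = F.conv_transpose2d(x, w, stride=2)
print(y.shape)                     # torch.Size([1, 8, 17, 17]): the spatial size roughly doubles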
Parameters * **input** – input tensor of shape (minibatch,in_channels,iH,iW)(\text{minibatch} , \text{in\\_channels} , iH , iW) * **weight** – filters of shape (in_channels,out_channelsgroups,kH,kW)(\text{in\\_channels} , \frac{\text{out\\_channels}}{\text{groups}} , kH , kW) * **bias** – optional bias of shape (out_channels)(\text{out\\_channels}) . Default: None * **stride** – the stride of the convolving kernel. Can be a single number or a tuple `(sH, sW)`. Default: 1 * **padding** – `dilation * (kernel_size - 1) - padding` zero-padding will be added to both sides of each dimension in the input. Can be a single number or a tuple `(padH, padW)`. Default: 0 * **output_padding** – additional size added to one side of each dimension in the output shape. Can be a single number or a tuple `(out_padH, out_padW)`. Default: 0 * **groups** – split input into groups, in_channels\text{in\\_channels} should be divisible by the number of groups. Default: 1 * **dilation** – the spacing between kernel elements. Can be a single number or a tuple `(dH, dW)`. Default: 1 Examples: >>> # With square kernels and equal stride >>> inputs = torch.randn(1, 4, 5, 5) >>> weights = torch.randn(4, 8, 3, 3) >>> F.conv_transpose2d(inputs, weights, padding=1) ### conv_transpose3d `torch.nn.functional.conv_transpose3d(input, weight, bias=None, stride=1, padding=0, output_padding=0, groups=1, dilation=1) → Tensor` Applies a 3D transposed convolution operator over an input image composed of several input planes, sometimes also called “deconvolution” This operator supports [TensorFloat32](https://pytorch.org/docs/1.8.0/notes/cuda.html#tf32-on- ampere). See [`ConvTranspose3d`](generated/torch.nn.convtranspose3d#torch.nn.ConvTranspose3d "torch.nn.ConvTranspose3d") for details and output shape. Note In some circumstances when given tensors on a CUDA device and using CuDNN, this operator may select a nondeterministic algorithm to increase performance. If this is undesirable, you can try to make the operation deterministic (potentially at a performance cost) by setting `torch.backends.cudnn.deterministic = True`. See [Reproducibility](https://pytorch.org/docs/1.8.0/notes/randomness.html) for more information. Parameters * **input** – input tensor of shape (minibatch,in_channels,iT,iH,iW)(\text{minibatch} , \text{in\\_channels} , iT , iH , iW) * **weight** – filters of shape (in_channels,out_channelsgroups,kT,kH,kW)(\text{in\\_channels} , \frac{\text{out\\_channels}}{\text{groups}} , kT , kH , kW) * **bias** – optional bias of shape (out_channels)(\text{out\\_channels}) . Default: None * **stride** – the stride of the convolving kernel. Can be a single number or a tuple `(sT, sH, sW)`. Default: 1 * **padding** – `dilation * (kernel_size - 1) - padding` zero-padding will be added to both sides of each dimension in the input. Can be a single number or a tuple `(padT, padH, padW)`. Default: 0 * **output_padding** – additional size added to one side of each dimension in the output shape. Can be a single number or a tuple `(out_padT, out_padH, out_padW)`. Default: 0 * **groups** – split input into groups, in_channels\text{in\\_channels} should be divisible by the number of groups. Default: 1 * **dilation** – the spacing between kernel elements. Can be a single number or a tuple `(dT, dH, dW)`. 
Default: 1 Examples: >>> inputs = torch.randn(20, 16, 50, 10, 20) >>> weights = torch.randn(16, 33, 3, 3, 3) >>> F.conv_transpose3d(inputs, weights) ### unfold `torch.nn.functional.unfold(input, kernel_size, dilation=1, padding=0, stride=1)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/functional.html#unfold) Extracts sliding local blocks from a batched input tensor. Warning Currently, only 4-D input tensors (batched image-like tensors) are supported. Warning More than one element of the unfolded tensor may refer to a single memory location. As a result, in-place operations (especially ones that are vectorized) may result in incorrect behavior. If you need to write to the tensor, please clone it first. See [`torch.nn.Unfold`](generated/torch.nn.unfold#torch.nn.Unfold "torch.nn.Unfold") for details ### fold `torch.nn.functional.fold(input, output_size, kernel_size, dilation=1, padding=0, stride=1)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/functional.html#fold) Combines an array of sliding local blocks into a large containing tensor. Warning Currently, only 3-D output tensors (unfolded batched image-like tensors) are supported. See [`torch.nn.Fold`](generated/torch.nn.fold#torch.nn.Fold "torch.nn.Fold") for details ## Pooling functions ### avg_pool1d `torch.nn.functional.avg_pool1d(input, kernel_size, stride=None, padding=0, ceil_mode=False, count_include_pad=True) → Tensor` Applies a 1D average pooling over an input signal composed of several input planes. See [`AvgPool1d`](generated/torch.nn.avgpool1d#torch.nn.AvgPool1d "torch.nn.AvgPool1d") for details and output shape. Parameters * **input** – input tensor of shape (minibatch,in_channels,iW)(\text{minibatch} , \text{in\\_channels} , iW) * **kernel_size** – the size of the window. Can be a single number or a tuple `(kW,)` * **stride** – the stride of the window. Can be a single number or a tuple `(sW,)`. Default: `kernel_size` * **padding** – implicit zero paddings on both sides of the input. Can be a single number or a tuple `(padW,)`. Default: 0 * **ceil_mode** – when True, will use `ceil` instead of `floor` to compute the output shape. Default: `False` * **count_include_pad** – when True, will include the zero-padding in the averaging calculation. Default: `True` Examples: >>> # pool of square window of size=3, stride=2 >>> input = torch.tensor([[[1, 2, 3, 4, 5, 6, 7]]], dtype=torch.float32) >>> F.avg_pool1d(input, kernel_size=3, stride=2) tensor([[[ 2., 4., 6.]]]) ### avg_pool2d `torch.nn.functional.avg_pool2d(input, kernel_size, stride=None, padding=0, ceil_mode=False, count_include_pad=True, divisor_override=None) → Tensor` Applies 2D average-pooling operation in kH×kWkH \times kW regions by step size sH×sWsH \times sW steps. The number of output features is equal to the number of input planes. See [`AvgPool2d`](generated/torch.nn.avgpool2d#torch.nn.AvgPool2d "torch.nn.AvgPool2d") for details and output shape. Parameters * **input** – input tensor (minibatch,in_channels,iH,iW)(\text{minibatch} , \text{in\\_channels} , iH , iW) * **kernel_size** – size of the pooling region. Can be a single number or a tuple `(kH, kW)` * **stride** – stride of the pooling operation. Can be a single number or a tuple `(sH, sW)`. Default: `kernel_size` * **padding** – implicit zero paddings on both sides of the input. Can be a single number or a tuple `(padH, padW)`. Default: 0 * **ceil_mode** – when True, will use `ceil` instead of `floor` in the formula to compute the output shape. 
Default: `False`
* **count_include_pad** – when True, will include the zero-padding in the averaging calculation. Default: `True`
* **divisor_override** – if specified, it will be used as the divisor, otherwise the size of the pooling region will be used. Default: None

### avg_pool3d

`torch.nn.functional.avg_pool3d(input, kernel_size, stride=None, padding=0, ceil_mode=False, count_include_pad=True, divisor_override=None) → Tensor`

Applies a 3D average-pooling operation in $kT \times kH \times kW$ regions with step size $sT \times sH \times sW$. The number of output features is equal to $\lfloor\frac{\text{input planes}}{sT}\rfloor$.

See [`AvgPool3d`](generated/torch.nn.avgpool3d#torch.nn.AvgPool3d "torch.nn.AvgPool3d") for details and output shape.

Parameters

* **input** – input tensor of shape (minibatch, in_channels, iT, iH, iW)
* **kernel_size** – size of the pooling region. Can be a single number or a tuple `(kT, kH, kW)`
* **stride** – stride of the pooling operation. Can be a single number or a tuple `(sT, sH, sW)`. Default: `kernel_size`
* **padding** – implicit zero paddings on both sides of the input. Can be a single number or a tuple `(padT, padH, padW)`. Default: 0
* **ceil_mode** – when True, will use `ceil` instead of `floor` in the formula to compute the output shape. Default: `False`
* **count_include_pad** – when True, will include the zero-padding in the averaging calculation. Default: `True`
* **divisor_override** – if specified, it will be used as the divisor, otherwise the size of the pooling region will be used. Default: None

### max_pool1d

`torch.nn.functional.max_pool1d(*args, **kwargs)`

Applies a 1D max pooling over an input signal composed of several input planes.

See [`MaxPool1d`](generated/torch.nn.maxpool1d#torch.nn.MaxPool1d "torch.nn.MaxPool1d") for details.

### max_pool2d

`torch.nn.functional.max_pool2d(*args, **kwargs)`

Applies a 2D max pooling over an input signal composed of several input planes.

See [`MaxPool2d`](generated/torch.nn.maxpool2d#torch.nn.MaxPool2d "torch.nn.MaxPool2d") for details.

### max_pool3d

`torch.nn.functional.max_pool3d(*args, **kwargs)`

Applies a 3D max pooling over an input signal composed of several input planes.

See [`MaxPool3d`](generated/torch.nn.maxpool3d#torch.nn.MaxPool3d "torch.nn.MaxPool3d") for details.

### max_unpool1d

`torch.nn.functional.max_unpool1d(input, indices, kernel_size, stride=None, padding=0, output_size=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/functional.html#max_unpool1d)

Computes a partial inverse of `MaxPool1d`.

See [`MaxUnpool1d`](generated/torch.nn.maxunpool1d#torch.nn.MaxUnpool1d "torch.nn.MaxUnpool1d") for details.

### max_unpool2d

`torch.nn.functional.max_unpool2d(input, indices, kernel_size, stride=None, padding=0, output_size=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/functional.html#max_unpool2d)

Computes a partial inverse of `MaxPool2d`.

See [`MaxUnpool2d`](generated/torch.nn.maxunpool2d#torch.nn.MaxUnpool2d "torch.nn.MaxUnpool2d") for details.

### max_unpool3d

`torch.nn.functional.max_unpool3d(input, indices, kernel_size, stride=None, padding=0, output_size=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/functional.html#max_unpool3d)

Computes a partial inverse of `MaxPool3d`.

See [`MaxUnpool3d`](generated/torch.nn.maxunpool3d#torch.nn.MaxUnpool3d "torch.nn.MaxUnpool3d") for details.
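As a rough illustration of how the pooling and unpooling functions above fit together, the following sketch pools a small tensor with `max_pool2d(..., return_indices=True)` and then reconstructs its shape with `max_unpool2d`. The returned `indices` record which element of each window held the maximum, so the unpooled tensor places the pooled values back at those positions and fills everything else with zeros (the printed values are deterministic for this input; the formatting is illustrative):

>>> import torch
>>> import torch.nn.functional as F
>>> x = torch.arange(16.).reshape(1, 1, 4, 4)   # (N, C, H, W)
>>> pooled, indices = F.max_pool2d(x, kernel_size=2, return_indices=True)
>>> pooled
tensor([[[[ 5.,  7.],
          [13., 15.]]]])
>>> # max_unpool2d is only a partial inverse: non-maximal entries come back as zeros
>>> F.max_unpool2d(pooled, indices, kernel_size=2).shape
torch.Size([1, 1, 4, 4])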
### lp_pool1d `torch.nn.functional.lp_pool1d(input, norm_type, kernel_size, stride=None, ceil_mode=False)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/functional.html#lp_pool1d) Applies a 1D power-average pooling over an input signal composed of several input planes. If the sum of all inputs to the power of `p` is zero, the gradient is set to zero as well. See [`LPPool1d`](generated/torch.nn.lppool1d#torch.nn.LPPool1d "torch.nn.LPPool1d") for details. ### lp_pool2d `torch.nn.functional.lp_pool2d(input, norm_type, kernel_size, stride=None, ceil_mode=False)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/functional.html#lp_pool2d) Applies a 2D power-average pooling over an input signal composed of several input planes. If the sum of all inputs to the power of `p` is zero, the gradient is set to zero as well. See [`LPPool2d`](generated/torch.nn.lppool2d#torch.nn.LPPool2d "torch.nn.LPPool2d") for details. ### adaptive_max_pool1d `torch.nn.functional.adaptive_max_pool1d(*args, **kwargs)` Applies a 1D adaptive max pooling over an input signal composed of several input planes. See [`AdaptiveMaxPool1d`](generated/torch.nn.adaptivemaxpool1d#torch.nn.AdaptiveMaxPool1d "torch.nn.AdaptiveMaxPool1d") for details and output shape. Parameters * **output_size** – the target output size (single integer) * **return_indices** – whether to return pooling indices. Default: `False` ### adaptive_max_pool2d `torch.nn.functional.adaptive_max_pool2d(*args, **kwargs)` Applies a 2D adaptive max pooling over an input signal composed of several input planes. See [`AdaptiveMaxPool2d`](generated/torch.nn.adaptivemaxpool2d#torch.nn.AdaptiveMaxPool2d "torch.nn.AdaptiveMaxPool2d") for details and output shape. Parameters * **output_size** – the target output size (single integer or double-integer tuple) * **return_indices** – whether to return pooling indices. Default: `False` ### adaptive_max_pool3d `torch.nn.functional.adaptive_max_pool3d(*args, **kwargs)` Applies a 3D adaptive max pooling over an input signal composed of several input planes. See [`AdaptiveMaxPool3d`](generated/torch.nn.adaptivemaxpool3d#torch.nn.AdaptiveMaxPool3d "torch.nn.AdaptiveMaxPool3d") for details and output shape. Parameters * **output_size** – the target output size (single integer or triple-integer tuple) * **return_indices** – whether to return pooling indices. Default: `False` ### adaptive_avg_pool1d `torch.nn.functional.adaptive_avg_pool1d(input, output_size) → Tensor` Applies a 1D adaptive average pooling over an input signal composed of several input planes. See [`AdaptiveAvgPool1d`](generated/torch.nn.adaptiveavgpool1d#torch.nn.AdaptiveAvgPool1d "torch.nn.AdaptiveAvgPool1d") for details and output shape. Parameters **output_size** – the target output size (single integer) ### adaptive_avg_pool2d `torch.nn.functional.adaptive_avg_pool2d(input, output_size)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/functional.html#adaptive_avg_pool2d) Applies a 2D adaptive average pooling over an input signal composed of several input planes. See [`AdaptiveAvgPool2d`](generated/torch.nn.adaptiveavgpool2d#torch.nn.AdaptiveAvgPool2d "torch.nn.AdaptiveAvgPool2d") for details and output shape. 
Parameters

**output_size** – the target output size (single integer or double-integer tuple)

### adaptive_avg_pool3d

`torch.nn.functional.adaptive_avg_pool3d(input, output_size)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/functional.html#adaptive_avg_pool3d)

Applies a 3D adaptive average pooling over an input signal composed of several input planes.

See [`AdaptiveAvgPool3d`](generated/torch.nn.adaptiveavgpool3d#torch.nn.AdaptiveAvgPool3d "torch.nn.AdaptiveAvgPool3d") for details and output shape.

Parameters

**output_size** – the target output size (single integer or triple-integer tuple)

## Non-linear activation functions

### threshold

`torch.nn.functional.threshold(input, threshold, value, inplace=False)`

Thresholds each element of the input Tensor.

See [`Threshold`](generated/torch.nn.threshold#torch.nn.Threshold "torch.nn.Threshold") for more details.

`torch.nn.functional.threshold_(input, threshold, value) → Tensor`

In-place version of `threshold()`.

### relu

`torch.nn.functional.relu(input, inplace=False) → Tensor` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/functional.html#relu)

Applies the rectified linear unit function element-wise. See [`ReLU`](generated/torch.nn.relu#torch.nn.ReLU "torch.nn.ReLU") for more details.

`torch.nn.functional.relu_(input) → Tensor`

In-place version of `relu()`.

### hardtanh

`torch.nn.functional.hardtanh(input, min_val=-1., max_val=1., inplace=False) → Tensor` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/functional.html#hardtanh)

Applies the HardTanh function element-wise. See [`Hardtanh`](generated/torch.nn.hardtanh#torch.nn.Hardtanh "torch.nn.Hardtanh") for more details.

`torch.nn.functional.hardtanh_(input, min_val=-1., max_val=1.) → Tensor`

In-place version of `hardtanh()`.

### hardswish

`torch.nn.functional.hardswish(input, inplace=False)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/functional.html#hardswish)

Applies the hardswish function, element-wise, as described in the paper: [Searching for MobileNetV3](https://arxiv.org/abs/1905.02244).

$$\text{Hardswish}(x) = \begin{cases} 0 & \text{if } x \le -3, \\ x & \text{if } x \ge +3, \\ x \cdot (x + 3) / 6 & \text{otherwise} \end{cases}$$

See [`Hardswish`](generated/torch.nn.hardswish#torch.nn.Hardswish "torch.nn.Hardswish") for more details.

### relu6

`torch.nn.functional.relu6(input, inplace=False) → Tensor` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/functional.html#relu6)

Applies the element-wise function $\text{ReLU6}(x) = \min(\max(0, x), 6)$.

See [`ReLU6`](generated/torch.nn.relu6#torch.nn.ReLU6 "torch.nn.ReLU6") for more details.

### elu

`torch.nn.functional.elu(input, alpha=1.0, inplace=False)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/functional.html#elu)

Applies element-wise, $\text{ELU}(x) = \max(0, x) + \min(0, \alpha * (\exp(x) - 1))$.

See [`ELU`](generated/torch.nn.elu#torch.nn.ELU "torch.nn.ELU") for more details.

`torch.nn.functional.elu_(input, alpha=1.) → Tensor`

In-place version of `elu()`.
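For a quick sense of how the clamping-style activations above differ, here is a small sketch (as in the other Examples in this reference, `torch` and `torch.nn.functional as F` are assumed to be imported); the printed values follow directly from the element-wise definitions given above:

>>> x = torch.tensor([-4., -1., 0., 2., 8.])
>>> F.relu(x)                   # max(0, x)
tensor([0., 0., 0., 2., 8.])
>>> F.relu6(x)                  # min(max(0, x), 6)
tensor([0., 0., 0., 2., 6.])
>>> F.hardtanh(x)               # clamp to [min_val, max_val] = [-1, 1]
tensor([-1., -1.,  0.,  1.,  1.])
>>> F.elu(x, alpha=1.0)         # negative side saturates towards -alpha
tensor([-0.9817, -0.6321,  0.0000,  2.0000,  8.0000])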
### selu

`torch.nn.functional.selu(input, inplace=False) → Tensor` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/functional.html#selu)

Applies element-wise, $\text{SELU}(x) = \text{scale} * (\max(0, x) + \min(0, \alpha * (\exp(x) - 1)))$, with $\alpha = 1.6732632423543772848170429916717$ and $\text{scale} = 1.0507009873554804934193349852946$.

See [`SELU`](generated/torch.nn.selu#torch.nn.SELU "torch.nn.SELU") for more details.

### celu

`torch.nn.functional.celu(input, alpha=1., inplace=False) → Tensor` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/functional.html#celu)

Applies element-wise, $\text{CELU}(x) = \max(0, x) + \min(0, \alpha * (\exp(x/\alpha) - 1))$.

See [`CELU`](generated/torch.nn.celu#torch.nn.CELU "torch.nn.CELU") for more details.

### leaky_relu

`torch.nn.functional.leaky_relu(input, negative_slope=0.01, inplace=False) → Tensor` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/functional.html#leaky_relu)

Applies element-wise, $\text{LeakyReLU}(x) = \max(0, x) + \text{negative\_slope} * \min(0, x)$.

See [`LeakyReLU`](generated/torch.nn.leakyrelu#torch.nn.LeakyReLU "torch.nn.LeakyReLU") for more details.

`torch.nn.functional.leaky_relu_(input, negative_slope=0.01) → Tensor`

In-place version of `leaky_relu()`.

### prelu

`torch.nn.functional.prelu(input, weight) → Tensor` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/functional.html#prelu)

Applies element-wise the function $\text{PReLU}(x) = \max(0, x) + \text{weight} * \min(0, x)$, where weight is a learnable parameter.

See [`PReLU`](generated/torch.nn.prelu#torch.nn.PReLU "torch.nn.PReLU") for more details.

### rrelu

`torch.nn.functional.rrelu(input, lower=1./8, upper=1./3, training=False, inplace=False) → Tensor` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/functional.html#rrelu)

Randomized leaky ReLU.

See [`RReLU`](generated/torch.nn.rrelu#torch.nn.RReLU "torch.nn.RReLU") for more details.

`torch.nn.functional.rrelu_(input, lower=1./8, upper=1./3, training=False) → Tensor`

In-place version of `rrelu()`.

### glu

`torch.nn.functional.glu(input, dim=-1) → Tensor` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/functional.html#glu)

The gated linear unit. Computes:

$$\text{GLU}(a, b) = a \otimes \sigma(b)$$

where `input` is split in half along `dim` to form `a` and `b`, $\sigma$ is the sigmoid function, and $\otimes$ is the element-wise product between matrices.

See [Language Modeling with Gated Convolutional Networks](https://arxiv.org/abs/1612.08083).

Parameters

* **input** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – input tensor
* **dim** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – dimension on which to split the input. Default: -1

### gelu

`torch.nn.functional.gelu(input) → Tensor` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/functional.html#gelu)

Applies element-wise the function $\text{GELU}(x) = x * \Phi(x)$, where $\Phi(x)$ is the cumulative distribution function of the Gaussian distribution.

See [Gaussian Error Linear Units (GELUs)](https://arxiv.org/abs/1606.08415).
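The way `glu` consumes its input can be easy to miss: it halves the chosen dimension, using the first half as `a` and the second half as the gate `b`. A minimal sketch (assuming `torch` and `torch.nn.functional as F` are imported):

>>> x = torch.randn(4, 6)
>>> F.glu(x, dim=-1).shape          # the last dimension is halved: a ⊗ sigmoid(b)
torch.Size([4, 3])
>>> a, b = x.chunk(2, dim=-1)       # same split that glu performs internally
>>> torch.allclose(F.glu(x, dim=-1), a * torch.sigmoid(b))
True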
### logsigmoid

`torch.nn.functional.logsigmoid(input) → Tensor`

Applies element-wise $\text{LogSigmoid}(x_i) = \log\left(\frac{1}{1 + \exp(-x_i)}\right)$.

See [`LogSigmoid`](generated/torch.nn.logsigmoid#torch.nn.LogSigmoid "torch.nn.LogSigmoid") for more details.

### hardshrink

`torch.nn.functional.hardshrink(input, lambd=0.5) → Tensor` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/functional.html#hardshrink)

Applies the hard shrinkage function element-wise.

See [`Hardshrink`](generated/torch.nn.hardshrink#torch.nn.Hardshrink "torch.nn.Hardshrink") for more details.

### tanhshrink

`torch.nn.functional.tanhshrink(input) → Tensor` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/functional.html#tanhshrink)

Applies element-wise, $\text{Tanhshrink}(x) = x - \text{Tanh}(x)$.

See [`Tanhshrink`](generated/torch.nn.tanhshrink#torch.nn.Tanhshrink "torch.nn.Tanhshrink") for more details.

### softsign

`torch.nn.functional.softsign(input) → Tensor` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/functional.html#softsign)

Applies element-wise the function $\text{SoftSign}(x) = \frac{x}{1 + |x|}$.

See [`Softsign`](generated/torch.nn.softsign#torch.nn.Softsign "torch.nn.Softsign") for more details.

### softplus

`torch.nn.functional.softplus(input, beta=1, threshold=20) → Tensor`

Applies element-wise the function $\text{Softplus}(x) = \frac{1}{\beta} * \log(1 + \exp(\beta * x))$.

For numerical stability the implementation reverts to the linear function when $\text{input} \times \beta > \text{threshold}$.

See [`Softplus`](generated/torch.nn.softplus#torch.nn.Softplus "torch.nn.Softplus") for more details.

### softmin

`torch.nn.functional.softmin(input, dim=None, _stacklevel=3, dtype=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/functional.html#softmin)

Applies a softmin function.

Note that $\text{Softmin}(x) = \text{Softmax}(-x)$. See the softmax definition for the mathematical formula.

See [`Softmin`](generated/torch.nn.softmin#torch.nn.Softmin "torch.nn.Softmin") for more details.

Parameters

* **input** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – input
* **dim** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – A dimension along which softmin will be computed (so every slice along dim will sum to 1).
* **dtype** (`torch.dtype`, optional) – the desired data type of the returned tensor. If specified, the input tensor is cast to `dtype` before the operation is performed. This is useful for preventing data type overflows. Default: None.

### softmax

`torch.nn.functional.softmax(input, dim=None, _stacklevel=3, dtype=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/functional.html#softmax)

Applies a softmax function.

Softmax is defined as:

$$\text{Softmax}(x_i) = \frac{\exp(x_i)}{\sum_j \exp(x_j)}$$

It is applied to all slices along dim, and will re-scale them so that the elements lie in the range `[0, 1]` and sum to 1.

See [`Softmax`](generated/torch.nn.softmax#torch.nn.Softmax "torch.nn.Softmax") for more details.

Parameters

* **input** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – input
* **dim** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – A dimension along which softmax will be computed.
* **dtype** (`torch.dtype`, optional) – the desired data type of the returned tensor.
If specified, the input tensor is cast to `dtype` before the operation is performed. This is useful for preventing data type overflows. Default: None.

Note

This function doesn't work directly with NLLLoss, which expects the Log to be computed between the Softmax and itself. Use log_softmax instead (it's faster and has better numerical properties).

### softshrink

`torch.nn.functional.softshrink(input, lambd=0.5) → Tensor`

Applies the soft shrinkage function element-wise.

See [`Softshrink`](generated/torch.nn.softshrink#torch.nn.Softshrink "torch.nn.Softshrink") for more details.

### gumbel_softmax

`torch.nn.functional.gumbel_softmax(logits, tau=1, hard=False, eps=1e-10, dim=-1)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/functional.html#gumbel_softmax)

Samples from the Gumbel-Softmax distribution ([Link 1](https://arxiv.org/abs/1611.00712), [Link 2](https://arxiv.org/abs/1611.01144)) and optionally discretizes.

Parameters

* **logits** – `[…, num_features]` unnormalized log probabilities
* **tau** – non-negative scalar temperature
* **hard** – if `True`, the returned samples will be discretized as one-hot vectors, but will be differentiated as if they were the soft samples in autograd
* **dim** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – A dimension along which softmax will be computed. Default: -1.

Returns

Sampled tensor of the same shape as `logits` from the Gumbel-Softmax distribution. If `hard=True`, the returned samples will be one-hot, otherwise they will be probability distributions that sum to 1 across `dim`.

Note

This function is here for legacy reasons and may be removed from nn.Functional in the future.

Note

The main trick for `hard` is to do `y_hard - y_soft.detach() + y_soft`. It achieves two things: it makes the output value exactly one-hot (since we add and then subtract the y_soft value), and it makes the gradient equal to the y_soft gradient (since we strip all other gradients).

Examples:

>>> logits = torch.randn(20, 32)
>>> # Sample soft categorical using reparametrization trick:
>>> F.gumbel_softmax(logits, tau=1, hard=False)
>>> # Sample hard categorical using "Straight-through" trick:
>>> F.gumbel_softmax(logits, tau=1, hard=True)

### log_softmax

`torch.nn.functional.log_softmax(input, dim=None, _stacklevel=3, dtype=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/functional.html#log_softmax)

Applies a softmax followed by a logarithm.

While mathematically equivalent to log(softmax(x)), doing these two operations separately is slower and numerically unstable. This function uses an alternative formulation to compute the output and gradient correctly.

See [`LogSoftmax`](generated/torch.nn.logsoftmax#torch.nn.LogSoftmax "torch.nn.LogSoftmax") for more details.

Parameters

* **input** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – input
* **dim** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – A dimension along which log_softmax will be computed.
* **dtype** (`torch.dtype`, optional) – the desired data type of the returned tensor. If specified, the input tensor is cast to `dtype` before the operation is performed. This is useful for preventing data type overflows. Default: None.
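To connect `softmax`, `log_softmax`, and the note above about `NLLLoss`, here is a minimal sketch (assuming `torch` and `torch.nn.functional as F` are imported; `nll_loss` and `cross_entropy` are covered in the Loss functions section below):

>>> x = torch.randn(2, 5)
>>> p = F.softmax(x, dim=1)
>>> torch.allclose(p.sum(dim=1), torch.ones(2))        # each row sums to 1
True
>>> torch.allclose(F.log_softmax(x, dim=1), p.log())   # same value, computed more stably
True
>>> target = torch.tensor([0, 3])
>>> torch.allclose(F.nll_loss(F.log_softmax(x, dim=1), target),
...                F.cross_entropy(x, target))
True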
### tanh

`torch.nn.functional.tanh(input) → Tensor` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/functional.html#tanh)

Applies element-wise, $\text{Tanh}(x) = \tanh(x) = \frac{\exp(x) - \exp(-x)}{\exp(x) + \exp(-x)}$.

See [`Tanh`](generated/torch.nn.tanh#torch.nn.Tanh "torch.nn.Tanh") for more details.

### sigmoid

`torch.nn.functional.sigmoid(input) → Tensor` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/functional.html#sigmoid)

Applies the element-wise function $\text{Sigmoid}(x) = \frac{1}{1 + \exp(-x)}$.

See [`Sigmoid`](generated/torch.nn.sigmoid#torch.nn.Sigmoid "torch.nn.Sigmoid") for more details.

### hardsigmoid

`torch.nn.functional.hardsigmoid(input, inplace=False) → Tensor` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/functional.html#hardsigmoid)

Applies the element-wise function

$$\text{Hardsigmoid}(x) = \begin{cases} 0 & \text{if } x \le -3, \\ 1 & \text{if } x \ge +3, \\ x/6 + 1/2 & \text{otherwise} \end{cases}$$

Parameters

**inplace** – If set to `True`, will do this operation in-place. Default: `False`

See [`Hardsigmoid`](generated/torch.nn.hardsigmoid#torch.nn.Hardsigmoid "torch.nn.Hardsigmoid") for more details.

### silu

`torch.nn.functional.silu(input, inplace=False)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/functional.html#silu)

Applies the SiLU function, element-wise:

$$\text{silu}(x) = x * \sigma(x), \text{ where } \sigma(x) \text{ is the logistic sigmoid.}$$

Note

See [Gaussian Error Linear Units (GELUs)](https://arxiv.org/abs/1606.08415), where the SiLU (Sigmoid Linear Unit) was originally coined, and see [Sigmoid-Weighted Linear Units for Neural Network Function Approximation in Reinforcement Learning](https://arxiv.org/abs/1702.03118) and [Swish: a Self-Gated Activation Function](https://arxiv.org/abs/1710.05941v1) where the SiLU was experimented with later.

See [`SiLU`](generated/torch.nn.silu#torch.nn.SiLU "torch.nn.SiLU") for more details.

## Normalization functions

### batch_norm

`torch.nn.functional.batch_norm(input, running_mean, running_var, weight=None, bias=None, training=False, momentum=0.1, eps=1e-05)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/functional.html#batch_norm)

Applies Batch Normalization for each channel across a batch of data.

See [`BatchNorm1d`](generated/torch.nn.batchnorm1d#torch.nn.BatchNorm1d "torch.nn.BatchNorm1d"), [`BatchNorm2d`](generated/torch.nn.batchnorm2d#torch.nn.BatchNorm2d "torch.nn.BatchNorm2d"), [`BatchNorm3d`](generated/torch.nn.batchnorm3d#torch.nn.BatchNorm3d "torch.nn.BatchNorm3d") for details.

### instance_norm

`torch.nn.functional.instance_norm(input, running_mean=None, running_var=None, weight=None, bias=None, use_input_stats=True, momentum=0.1, eps=1e-05)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/functional.html#instance_norm)

Applies Instance Normalization for each channel in each data sample in a batch.

See [`InstanceNorm1d`](generated/torch.nn.instancenorm1d#torch.nn.InstanceNorm1d "torch.nn.InstanceNorm1d"), [`InstanceNorm2d`](generated/torch.nn.instancenorm2d#torch.nn.InstanceNorm2d "torch.nn.InstanceNorm2d"), [`InstanceNorm3d`](generated/torch.nn.instancenorm3d#torch.nn.InstanceNorm3d "torch.nn.InstanceNorm3d") for details.
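As a rough sketch of how the functional `batch_norm` is typically called directly, outside of an `nn.BatchNorm*` module (assuming `torch` and `torch.nn.functional as F` are imported; the variable names are only illustrative): with `training=True` it normalizes using batch statistics and updates the running buffers in place, while with `training=False` it normalizes using the running statistics instead.

>>> x = torch.randn(8, 3, 4, 4)                      # (N, C, H, W)
>>> running_mean, running_var = torch.zeros(3), torch.ones(3)
>>> y = F.batch_norm(x, running_mean, running_var, training=True, momentum=0.1)
>>> torch.allclose(y.mean(dim=(0, 2, 3)), torch.zeros(3), atol=1e-5)   # per-channel mean ~ 0
True
>>> # training=True also updated running_mean / running_var in place;
>>> # the same call with training=False would normalize using those running statistics:
>>> y_eval = F.batch_norm(x, running_mean, running_var, training=False)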
### layer_norm

`torch.nn.functional.layer_norm(input, normalized_shape, weight=None, bias=None, eps=1e-05)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/functional.html#layer_norm)

Applies Layer Normalization over the last certain number of dimensions.

See [`LayerNorm`](generated/torch.nn.layernorm#torch.nn.LayerNorm "torch.nn.LayerNorm") for details.

### local_response_norm

`torch.nn.functional.local_response_norm(input, size, alpha=0.0001, beta=0.75, k=1.0)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/functional.html#local_response_norm)

Applies local response normalization over an input signal composed of several input planes, where channels occupy the second dimension. Applies normalization across channels.

See [`LocalResponseNorm`](generated/torch.nn.localresponsenorm#torch.nn.LocalResponseNorm "torch.nn.LocalResponseNorm") for details.

### normalize

`torch.nn.functional.normalize(input, p=2, dim=1, eps=1e-12, out=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/functional.html#normalize)

Performs $L_p$ normalization of inputs over the specified dimension.

For a tensor `input` of sizes $(n_0, ..., n_{dim}, ..., n_k)$, each $n_{dim}$-element vector $v$ along dimension `dim` is transformed as

$$v = \frac{v}{\max(\lVert v \rVert_p, \epsilon)}.$$

With the default arguments it uses the Euclidean norm over vectors along dimension 1 for normalization.

Parameters

* **input** – input tensor of any shape
* **p** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)")) – the exponent value in the norm formulation. Default: 2
* **dim** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – the dimension to reduce. Default: 1
* **eps** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)")) – small value to avoid division by zero. Default: 1e-12
* **out** ([Tensor](tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor. If `out` is used, this operation won't be differentiable.

## Linear functions

### linear

`torch.nn.functional.linear(input, weight, bias=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/functional.html#linear)

Applies a linear transformation to the incoming data: $y = xA^T + b$.

This operator supports [TensorFloat32](https://pytorch.org/docs/1.8.0/notes/cuda.html#tf32-on-ampere).

Shape:

* Input: $(N, *, \text{in\_features})$ where N is the batch size and `*` means any number of additional dimensions
* Weight: $(\text{out\_features}, \text{in\_features})$
* Bias: $(\text{out\_features})$
* Output: $(N, *, \text{out\_features})$

### bilinear

`torch.nn.functional.bilinear(input1, input2, weight, bias=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/functional.html#bilinear)

Applies a bilinear transformation to the incoming data: $y = x_1^T A x_2 + b$.

Shape:

* input1: $(N, *, H_{in1})$ where $H_{in1} = \text{in1\_features}$ and $*$ means any number of additional dimensions. All but the last dimension of the inputs should be the same.
* input2: (N,∗,Hin2)(N, *, H_{in2}) where Hin2=in2_featuresH_{in2}=\text{in2\\_features} * weight: (out_features,in1_features,in2_features)(\text{out\\_features}, \text{in1\\_features}, \text{in2\\_features}) * bias: (out_features)(\text{out\\_features}) * output: (N,∗,Hout)(N, *, H_{out}) where Hout=out_featuresH_{out}=\text{out\\_features} and all but the last dimension are the same shape as the input. ## Dropout functions ### dropout `torch.nn.functional.dropout(input, p=0.5, training=True, inplace=False)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/functional.html#dropout) During training, randomly zeroes some of the elements of the input tensor with probability `p` using samples from a Bernoulli distribution. See [`Dropout`](generated/torch.nn.dropout#torch.nn.Dropout "torch.nn.Dropout") for details. Parameters * **p** – probability of an element to be zeroed. Default: 0.5 * **training** – apply dropout if is `True`. Default: `True` * **inplace** – If set to `True`, will do this operation in-place. Default: `False` ### alpha_dropout `torch.nn.functional.alpha_dropout(input, p=0.5, training=False, inplace=False)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/functional.html#alpha_dropout) Applies alpha dropout to the input. See [`AlphaDropout`](generated/torch.nn.alphadropout#torch.nn.AlphaDropout "torch.nn.AlphaDropout") for details. ### feature_alpha_dropout `torch.nn.functional.feature_alpha_dropout(input, p=0.5, training=False, inplace=False)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/functional.html#feature_alpha_dropout) Randomly masks out entire channels (a channel is a feature map, e.g. the jj -th channel of the ii -th sample in the batch input is a tensor input[i,j]\text{input}[i, j] ) of the input tensor). Instead of setting activations to zero, as in regular Dropout, the activations are set to the negative saturation value of the SELU activation function. Each element will be masked independently on every forward call with probability `p` using samples from a Bernoulli distribution. The elements to be masked are randomized on every forward call, and scaled and shifted to maintain zero mean and unit variance. See `FeatureAlphaDropout` for details. Parameters * **p** – dropout probability of a channel to be zeroed. Default: 0.5 * **training** – apply dropout if is `True`. Default: `True` * **inplace** – If set to `True`, will do this operation in-place. Default: `False` ### dropout2d `torch.nn.functional.dropout2d(input, p=0.5, training=True, inplace=False)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/functional.html#dropout2d) Randomly zero out entire channels (a channel is a 2D feature map, e.g., the jj -th channel of the ii -th sample in the batched input is a 2D tensor input[i,j]\text{input}[i, j] ) of the input tensor). Each channel will be zeroed out independently on every forward call with probability `p` using samples from a Bernoulli distribution. See [`Dropout2d`](generated/torch.nn.dropout2d#torch.nn.Dropout2d "torch.nn.Dropout2d") for details. Parameters * **p** – probability of a channel to be zeroed. Default: 0.5 * **training** – apply dropout if is `True`. Default: `True` * **inplace** – If set to `True`, will do this operation in-place. 
Default: `False` ### dropout3d `torch.nn.functional.dropout3d(input, p=0.5, training=True, inplace=False)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/functional.html#dropout3d) Randomly zero out entire channels (a channel is a 3D feature map, e.g., the jj -th channel of the ii -th sample in the batched input is a 3D tensor input[i,j]\text{input}[i, j] ) of the input tensor). Each channel will be zeroed out independently on every forward call with probability `p` using samples from a Bernoulli distribution. See [`Dropout3d`](generated/torch.nn.dropout3d#torch.nn.Dropout3d "torch.nn.Dropout3d") for details. Parameters * **p** – probability of a channel to be zeroed. Default: 0.5 * **training** – apply dropout if is `True`. Default: `True` * **inplace** – If set to `True`, will do this operation in-place. Default: `False` ## Sparse functions ### embedding `torch.nn.functional.embedding(input, weight, padding_idx=None, max_norm=None, norm_type=2.0, scale_grad_by_freq=False, sparse=False)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/functional.html#embedding) A simple lookup table that looks up embeddings in a fixed dictionary and size. This module is often used to retrieve word embeddings using indices. The input to the module is a list of indices, and the embedding matrix, and the output is the corresponding word embeddings. See [`torch.nn.Embedding`](generated/torch.nn.embedding#torch.nn.Embedding "torch.nn.Embedding") for more details. Parameters * **input** (_LongTensor_) – Tensor containing indices into the embedding matrix * **weight** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – The embedding matrix with number of rows equal to the maximum possible index + 1, and number of columns equal to the embedding size * **padding_idx** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – If given, pads the output with the embedding vector at `padding_idx` (initialized to zeros) whenever it encounters the index. * **max_norm** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _,__optional_) – If given, each embedding vector with norm larger than `max_norm` is renormalized to have norm `max_norm`. Note: this will modify `weight` in-place. * **norm_type** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _,__optional_) – The p of the p-norm to compute for the `max_norm` option. Default `2`. * **scale_grad_by_freq** (_boolean_ _,__optional_) – If given, this will scale gradients by the inverse of frequency of the words in the mini-batch. Default `False`. * **sparse** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – If `True`, gradient w.r.t. `weight` will be a sparse tensor. See Notes under [`torch.nn.Embedding`](generated/torch.nn.embedding#torch.nn.Embedding "torch.nn.Embedding") for more details regarding sparse gradients. 
Shape: * Input: LongTensor of arbitrary shape containing the indices to extract * `Weight: Embedding matrix of floating point type with shape (V, embedding_dim),` where V = maximum index + 1 and embedding_dim = the embedding size * Output: `(*, embedding_dim)`, where `*` is the input shape Examples: >>> # a batch of 2 samples of 4 indices each >>> input = torch.tensor([[1,2,4,5],[4,3,2,9]]) >>> # an embedding matrix containing 10 tensors of size 3 >>> embedding_matrix = torch.rand(10, 3) >>> F.embedding(input, embedding_matrix) tensor([[[ 0.8490, 0.9625, 0.6753], [ 0.9666, 0.7761, 0.6108], [ 0.6246, 0.9751, 0.3618], [ 0.4161, 0.2419, 0.7383]], [[ 0.6246, 0.9751, 0.3618], [ 0.0237, 0.7794, 0.0528], [ 0.9666, 0.7761, 0.6108], [ 0.3385, 0.8612, 0.1867]]]) >>> # example with padding_idx >>> weights = torch.rand(10, 3) >>> weights[0, :].zero_() >>> embedding_matrix = weights >>> input = torch.tensor([[0,2,0,5]]) >>> F.embedding(input, embedding_matrix, padding_idx=0) tensor([[[ 0.0000, 0.0000, 0.0000], [ 0.5609, 0.5384, 0.8720], [ 0.0000, 0.0000, 0.0000], [ 0.6262, 0.2438, 0.7471]]]) ### embedding_bag `torch.nn.functional.embedding_bag(input, weight, offsets=None, max_norm=None, norm_type=2, scale_grad_by_freq=False, mode='mean', sparse=False, per_sample_weights=None, include_last_offset=False)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/functional.html#embedding_bag) Computes sums, means or maxes of `bags` of embeddings, without instantiating the intermediate embeddings. See [`torch.nn.EmbeddingBag`](generated/torch.nn.embeddingbag#torch.nn.EmbeddingBag "torch.nn.EmbeddingBag") for more details. Note This operation may produce nondeterministic gradients when given tensors on a CUDA device. See [Reproducibility](https://pytorch.org/docs/1.8.0/notes/randomness.html) for more information. Parameters * **input** (_LongTensor_) – Tensor containing bags of indices into the embedding matrix * **weight** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – The embedding matrix with number of rows equal to the maximum possible index + 1, and number of columns equal to the embedding size * **offsets** (_LongTensor_ _,__optional_) – Only used when `input` is 1D. `offsets` determines the starting index position of each bag (sequence) in `input`. * **max_norm** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _,__optional_) – If given, each embedding vector with norm larger than `max_norm` is renormalized to have norm `max_norm`. Note: this will modify `weight` in-place. * **norm_type** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _,__optional_) – The `p` in the `p`-norm to compute for the `max_norm` option. Default `2`. * **scale_grad_by_freq** (_boolean_ _,__optional_) – if given, this will scale gradients by the inverse of frequency of the words in the mini-batch. Default `False`. Note: this option is not supported when `mode="max"`. * **mode** (_string_ _,__optional_) – `"sum"`, `"mean"` or `"max"`. Specifies the way to reduce the bag. Default: `"mean"` * **sparse** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – if `True`, gradient w.r.t. `weight` will be a sparse tensor. See Notes under [`torch.nn.Embedding`](generated/torch.nn.embedding#torch.nn.Embedding "torch.nn.Embedding") for more details regarding sparse gradients. Note: this option is not supported when `mode="max"`. 
* **per_sample_weights** ([Tensor](tensors#torch.Tensor "torch.Tensor") _,__optional_) – a tensor of float / double weights, or None to indicate all weights should be taken to be 1. If specified, `per_sample_weights` must have exactly the same shape as input and is treated as having the same `offsets`, if those are not None.
* **include_last_offset** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – if `True`, the size of offsets is equal to the number of bags + 1. The last element is the size of the input, or the ending index position of the last bag (sequence).

Shape:

* `input` (LongTensor) and `offsets` (LongTensor, optional)
  * If `input` is 2D of shape `(B, N)`, it will be treated as `B` bags (sequences) each of fixed length `N`, and this will return `B` values aggregated in a way depending on the `mode`. `offsets` is ignored and required to be `None` in this case.
  * If `input` is 1D of shape `(N)`, it will be treated as a concatenation of multiple bags (sequences). `offsets` is required to be a 1D tensor containing the starting index positions of each bag in `input`. Therefore, for `offsets` of shape `(B)`, `input` will be viewed as having `B` bags. Empty bags (i.e., having 0-length) will have returned vectors filled by zeros.
* `weight` (Tensor): the learnable weights of the module of shape `(num_embeddings, embedding_dim)`
* `per_sample_weights` (Tensor, optional). Has the same shape as `input`.
* `output`: aggregated embedding values of shape `(B, embedding_dim)`

Examples:

>>> # an Embedding module containing 10 tensors of size 3
>>> embedding_matrix = torch.rand(10, 3)
>>> # a batch of 2 samples of 4 indices each
>>> input = torch.tensor([1,2,4,5,4,3,2,9])
>>> offsets = torch.tensor([0,4])
>>> F.embedding_bag(input, embedding_matrix, offsets)
tensor([[ 0.3397,  0.3552,  0.5545],
        [ 0.5893,  0.4386,  0.5882]])

### one_hot

`torch.nn.functional.one_hot(tensor, num_classes=-1) → LongTensor`

Takes a LongTensor with index values of shape `(*)` and returns a tensor of shape `(*, num_classes)` that has zeros everywhere except where the index of the last dimension matches the corresponding value of the input tensor, in which case it will be 1.

See also [One-hot on Wikipedia](https://en.wikipedia.org/wiki/One-hot).

Parameters

* **tensor** (_LongTensor_) – class values of any shape.
* **num_classes** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – Total number of classes. If set to -1, the number of classes will be inferred as one greater than the largest class value in the input tensor.

Returns

LongTensor that has one more dimension with 1 values at the index of the last dimension indicated by the input, and 0 everywhere else.
#### Examples >>> F.one_hot(torch.arange(0, 5) % 3) tensor([[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 0, 0], [0, 1, 0]]) >>> F.one_hot(torch.arange(0, 5) % 3, num_classes=5) tensor([[1, 0, 0, 0, 0], [0, 1, 0, 0, 0], [0, 0, 1, 0, 0], [1, 0, 0, 0, 0], [0, 1, 0, 0, 0]]) >>> F.one_hot(torch.arange(0, 6).view(3,2) % 3) tensor([[[1, 0, 0], [0, 1, 0]], [[0, 0, 1], [1, 0, 0]], [[0, 1, 0], [0, 0, 1]]]) ## Distance functions ### pairwise_distance `torch.nn.functional.pairwise_distance(x1, x2, p=2.0, eps=1e-06, keepdim=False)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/functional.html#pairwise_distance) See [`torch.nn.PairwiseDistance`](generated/torch.nn.pairwisedistance#torch.nn.PairwiseDistance "torch.nn.PairwiseDistance") for details ### cosine_similarity `torch.nn.functional.cosine_similarity(x1, x2, dim=1, eps=1e-8) → Tensor` Returns cosine similarity between x1 and x2, computed along dim. similarity=x1⋅x2max⁡(∥x1∥2⋅∥x2∥2,ϵ)\text{similarity} = \dfrac{x_1 \cdot x_2}{\max(\Vert x_1 \Vert _2 \cdot \Vert x_2 \Vert _2, \epsilon)} Parameters * **x1** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – First input. * **x2** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – Second input (of size matching x1). * **dim** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – Dimension of vectors. Default: 1 * **eps** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _,__optional_) – Small value to avoid division by zero. Default: 1e-8 Shape: * Input: (∗1,D,∗2)(\ast_1, D, \ast_2) where D is at position `dim`. * Output: (∗1,∗2)(\ast_1, \ast_2) where 1 is at position `dim`. Example: >>> input1 = torch.randn(100, 128) >>> input2 = torch.randn(100, 128) >>> output = F.cosine_similarity(input1, input2) >>> print(output) ### pdist `torch.nn.functional.pdist(input, p=2) → Tensor` Computes the p-norm distance between every pair of row vectors in the input. This is identical to the upper triangular portion, excluding the diagonal, of `torch.norm(input[:, None] - input, dim=2, p=p)`. This function will be faster if the rows are contiguous. If input has shape N×MN \times M then the output will have shape 12N(N−1)\frac{1}{2} N (N - 1) . This function is equivalent to `scipy.spatial.distance.pdist(input, ‘minkowski’, p=p)` if p∈(0,∞)p \in (0, \infty) . When p=0p = 0 it is equivalent to `scipy.spatial.distance.pdist(input, ‘hamming’) * M`. When p=∞p = \infty , the closest scipy function is `scipy.spatial.distance.pdist(xn, lambda x, y: np.abs(x - y).max())`. Parameters * **input** – input tensor of shape N×MN \times M . * **p** – p value for the p-norm distance to calculate between each vector pair ∈[0,∞]\in [0, \infty] . ## Loss functions ### binary_cross_entropy `torch.nn.functional.binary_cross_entropy(input, target, weight=None, size_average=None, reduce=None, reduction='mean')` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/functional.html#binary_cross_entropy) Function that measures the Binary Cross Entropy between the target and the output. See [`BCELoss`](generated/torch.nn.bceloss#torch.nn.BCELoss "torch.nn.BCELoss") for details. 
Parameters * **input** – Tensor of arbitrary shape * **target** – Tensor of the same shape as input * **weight** ([Tensor](tensors#torch.Tensor "torch.Tensor") _,__optional_) – a manual rescaling weight if provided it’s repeated to match input tensor shape * **size_average** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – Deprecated (see `reduction`). By default, the losses are averaged over each loss element in the batch. Note that for some losses, there multiple elements per sample. If the field `size_average` is set to `False`, the losses are instead summed for each minibatch. Ignored when reduce is `False`. Default: `True` * **reduce** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – Deprecated (see `reduction`). By default, the losses are averaged or summed over observations for each minibatch depending on `size_average`. When `reduce` is `False`, returns a loss per batch element instead and ignores `size_average`. Default: `True` * **reduction** (_string_ _,__optional_) – Specifies the reduction to apply to the output: `'none'` | `'mean'` | `'sum'`. `'none'`: no reduction will be applied, `'mean'`: the sum of the output will be divided by the number of elements in the output, `'sum'`: the output will be summed. Note: `size_average` and `reduce` are in the process of being deprecated, and in the meantime, specifying either of those two args will override `reduction`. Default: `'mean'` Examples: >>> input = torch.randn((3, 2), requires_grad=True) >>> target = torch.rand((3, 2), requires_grad=False) >>> loss = F.binary_cross_entropy(F.sigmoid(input), target) >>> loss.backward() ### binary_cross_entropy_with_logits `torch.nn.functional.binary_cross_entropy_with_logits(input, target, weight=None, size_average=None, reduce=None, reduction='mean', pos_weight=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/functional.html#binary_cross_entropy_with_logits) Function that measures Binary Cross Entropy between target and output logits. See [`BCEWithLogitsLoss`](generated/torch.nn.bcewithlogitsloss#torch.nn.BCEWithLogitsLoss "torch.nn.BCEWithLogitsLoss") for details. Parameters * **input** – Tensor of arbitrary shape * **target** – Tensor of the same shape as input * **weight** ([Tensor](tensors#torch.Tensor "torch.Tensor") _,__optional_) – a manual rescaling weight if provided it’s repeated to match input tensor shape * **size_average** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – Deprecated (see `reduction`). By default, the losses are averaged over each loss element in the batch. Note that for some losses, there multiple elements per sample. If the field `size_average` is set to `False`, the losses are instead summed for each minibatch. Ignored when reduce is `False`. Default: `True` * **reduce** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – Deprecated (see `reduction`). By default, the losses are averaged or summed over observations for each minibatch depending on `size_average`. When `reduce` is `False`, returns a loss per batch element instead and ignores `size_average`. Default: `True` * **reduction** (_string_ _,__optional_) – Specifies the reduction to apply to the output: `'none'` | `'mean'` | `'sum'`. 
`'none'`: no reduction will be applied, `'mean'`: the sum of the output will be divided by the number of elements in the output, `'sum'`: the output will be summed. Note: `size_average` and `reduce` are in the process of being deprecated, and in the meantime, specifying either of those two args will override `reduction`. Default: `'mean'` * **pos_weight** ([Tensor](tensors#torch.Tensor "torch.Tensor") _,__optional_) – a weight of positive examples. Must be a vector with length equal to the number of classes. Examples: >>> input = torch.randn(3, requires_grad=True) >>> target = torch.empty(3).random_(2) >>> loss = F.binary_cross_entropy_with_logits(input, target) >>> loss.backward() ### poisson_nll_loss `torch.nn.functional.poisson_nll_loss(input, target, log_input=True, full=False, size_average=None, eps=1e-08, reduce=None, reduction='mean')` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/functional.html#poisson_nll_loss) Poisson negative log likelihood loss. See [`PoissonNLLLoss`](generated/torch.nn.poissonnllloss#torch.nn.PoissonNLLLoss "torch.nn.PoissonNLLLoss") for details. Parameters * **input** – expectation of underlying Poisson distribution. * **target** – random sample target∼Poisson(input)target \sim \text{Poisson}(input) . * **log_input** – if `True` the loss is computed as exp⁡(input)−target∗input\exp(\text{input}) - \text{target} * \text{input} , if `False` then loss is input−target∗log⁡(input+eps)\text{input} - \text{target} * \log(\text{input}+\text{eps}) . Default: `True` * **full** – whether to compute full loss, i. e. to add the Stirling approximation term. Default: `False` target∗log⁡(target)−target+0.5∗log⁡(2∗π∗target)\text{target} * \log(\text{target}) - \text{target} + 0.5 * \log(2 * \pi * \text{target}) . * **size_average** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – Deprecated (see `reduction`). By default, the losses are averaged over each loss element in the batch. Note that for some losses, there multiple elements per sample. If the field `size_average` is set to `False`, the losses are instead summed for each minibatch. Ignored when reduce is `False`. Default: `True` * **eps** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _,__optional_) – Small value to avoid evaluation of log⁡(0)\log(0) when `log_input`=``False``. Default: 1e-8 * **reduce** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – Deprecated (see `reduction`). By default, the losses are averaged or summed over observations for each minibatch depending on `size_average`. When `reduce` is `False`, returns a loss per batch element instead and ignores `size_average`. Default: `True` * **reduction** (_string_ _,__optional_) – Specifies the reduction to apply to the output: `'none'` | `'mean'` | `'sum'`. `'none'`: no reduction will be applied, `'mean'`: the sum of the output will be divided by the number of elements in the output, `'sum'`: the output will be summed. Note: `size_average` and `reduce` are in the process of being deprecated, and in the meantime, specifying either of those two args will override `reduction`. 
Default: `'mean'` ### cosine_embedding_loss `torch.nn.functional.cosine_embedding_loss(input1, input2, target, margin=0, size_average=None, reduce=None, reduction='mean') → Tensor` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/functional.html#cosine_embedding_loss) See [`CosineEmbeddingLoss`](generated/torch.nn.cosineembeddingloss#torch.nn.CosineEmbeddingLoss "torch.nn.CosineEmbeddingLoss") for details. ### cross_entropy `torch.nn.functional.cross_entropy(input, target, weight=None, size_average=None, ignore_index=-100, reduce=None, reduction='mean')` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/functional.html#cross_entropy) This criterion combines `log_softmax` and `nll_loss` in a single function. See [`CrossEntropyLoss`](generated/torch.nn.crossentropyloss#torch.nn.CrossEntropyLoss "torch.nn.CrossEntropyLoss") for details. Parameters * **input** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – (N,C)(N, C) where `C = number of classes` or (N,C,H,W)(N, C, H, W) in case of 2D Loss, or (N,C,d1,d2,...,dK)(N, C, d_1, d_2, ..., d_K) where K≥1K \geq 1 in the case of K-dimensional loss. * **target** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – (N)(N) where each value is 0≤targets[i]≤C−10 \leq \text{targets}[i] \leq C-1 , or (N,d1,d2,...,dK)(N, d_1, d_2, ..., d_K) where K≥1K \geq 1 for K-dimensional loss. * **weight** ([Tensor](tensors#torch.Tensor "torch.Tensor") _,__optional_) – a manual rescaling weight given to each class. If given, has to be a Tensor of size `C` * **size_average** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – Deprecated (see `reduction`). By default, the losses are averaged over each loss element in the batch. Note that for some losses, there multiple elements per sample. If the field `size_average` is set to `False`, the losses are instead summed for each minibatch. Ignored when reduce is `False`. Default: `True` * **ignore_index** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – Specifies a target value that is ignored and does not contribute to the input gradient. When `size_average` is `True`, the loss is averaged over non-ignored targets. Default: -100 * **reduce** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – Deprecated (see `reduction`). By default, the losses are averaged or summed over observations for each minibatch depending on `size_average`. When `reduce` is `False`, returns a loss per batch element instead and ignores `size_average`. Default: `True` * **reduction** (_string_ _,__optional_) – Specifies the reduction to apply to the output: `'none'` | `'mean'` | `'sum'`. `'none'`: no reduction will be applied, `'mean'`: the sum of the output will be divided by the number of elements in the output, `'sum'`: the output will be summed. Note: `size_average` and `reduce` are in the process of being deprecated, and in the meantime, specifying either of those two args will override `reduction`. Default: `'mean'` Examples: >>> input = torch.randn(3, 5, requires_grad=True) >>> target = torch.randint(5, (3,), dtype=torch.int64) >>> loss = F.cross_entropy(input, target) >>> loss.backward() ### ctc_loss `torch.nn.functional.ctc_loss(log_probs, targets, input_lengths, target_lengths, blank=0, reduction='mean', zero_infinity=False)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/functional.html#ctc_loss) The Connectionist Temporal Classification loss. 
See [`CTCLoss`](generated/torch.nn.ctcloss#torch.nn.CTCLoss "torch.nn.CTCLoss") for details. Note In some circumstances when given tensors on a CUDA device and using CuDNN, this operator may select a nondeterministic algorithm to increase performance. If this is undesirable, you can try to make the operation deterministic (potentially at a performance cost) by setting `torch.backends.cudnn.deterministic = True`. See [Reproducibility](https://pytorch.org/docs/1.8.0/notes/randomness.html) for more information. Note This operation may produce nondeterministic gradients when given tensors on a CUDA device. See [Reproducibility](https://pytorch.org/docs/1.8.0/notes/randomness.html) for more information. Parameters * **log_probs** – (T,N,C)(T, N, C) where `C = number of characters in alphabet including blank`, `T = input length`, and `N = batch size`. The logarithmized probabilities of the outputs (e.g. obtained with `torch.nn.functional.log_softmax()`). * **targets** – (N,S)(N, S) or `(sum(target_lengths))`. Targets cannot be blank. In the second form, the targets are assumed to be concatenated. * **input_lengths** – (N)(N) . Lengths of the inputs (must each be ≤T\leq T ) * **target_lengths** – (N)(N) . Lengths of the targets * **blank** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – Blank label. Default 00 . * **reduction** (_string_ _,__optional_) – Specifies the reduction to apply to the output: `'none'` | `'mean'` | `'sum'`. `'none'`: no reduction will be applied, `'mean'`: the output losses will be divided by the target lengths and then the mean over the batch is taken, `'sum'`: the output will be summed. Default: `'mean'` * **zero_infinity** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – Whether to zero infinite losses and the associated gradients. Default: `False` Infinite losses mainly occur when the inputs are too short to be aligned to the targets. Example: >>> log_probs = torch.randn(50, 16, 20).log_softmax(2).detach().requires_grad_() >>> targets = torch.randint(1, 20, (16, 30), dtype=torch.long) >>> input_lengths = torch.full((16,), 50, dtype=torch.long) >>> target_lengths = torch.randint(10,30,(16,), dtype=torch.long) >>> loss = F.ctc_loss(log_probs, targets, input_lengths, target_lengths) >>> loss.backward() ### hinge_embedding_loss `torch.nn.functional.hinge_embedding_loss(input, target, margin=1.0, size_average=None, reduce=None, reduction='mean') → Tensor` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/functional.html#hinge_embedding_loss) See [`HingeEmbeddingLoss`](generated/torch.nn.hingeembeddingloss#torch.nn.HingeEmbeddingLoss "torch.nn.HingeEmbeddingLoss") for details. ### kl_div `torch.nn.functional.kl_div(input, target, size_average=None, reduce=None, reduction='mean', log_target=False)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/functional.html#kl_div) The [Kullback-Leibler divergence Loss](https://en.wikipedia.org/wiki/Kullback- Leibler_divergence) See [`KLDivLoss`](generated/torch.nn.kldivloss#torch.nn.KLDivLoss "torch.nn.KLDivLoss") for details. Parameters * **input** – Tensor of arbitrary shape * **target** – Tensor of the same shape as input * **size_average** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – Deprecated (see `reduction`). By default, the losses are averaged over each loss element in the batch. Note that for some losses, there multiple elements per sample. 
If the field `size_average` is set to `False`, the losses are instead summed for each minibatch. Ignored when reduce is `False`. Default: `True`
* **reduce** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – Deprecated (see `reduction`). By default, the losses are averaged or summed over observations for each minibatch depending on `size_average`. When `reduce` is `False`, returns a loss per batch element instead and ignores `size_average`. Default: `True`
* **reduction** (_string_ _,__optional_) – Specifies the reduction to apply to the output: `'none'` | `'batchmean'` | `'sum'` | `'mean'`. `'none'`: no reduction will be applied, `'batchmean'`: the sum of the output will be divided by the batch size, `'sum'`: the output will be summed, `'mean'`: the output will be divided by the number of elements in the output. Default: `'mean'`
* **log_target** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – A flag indicating whether `target` is passed in the log space. It is recommended to pass certain distributions (like `softmax`) in the log space to avoid numerical issues caused by explicit `log`. Default: `False`

Note

`size_average` and `reduce` are in the process of being deprecated, and in the meantime, specifying either of those two args will override `reduction`.

Note

`reduction` = `'mean'` doesn't return the true KL divergence value; please use `reduction` = `'batchmean'`, which aligns with the mathematical definition of KL divergence. In the next major release, `'mean'` will be changed to behave the same as `'batchmean'`.

### l1_loss

`torch.nn.functional.l1_loss(input, target, size_average=None, reduce=None, reduction='mean') → Tensor` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/functional.html#l1_loss)

Function that takes the mean element-wise absolute value difference.

See [`L1Loss`](generated/torch.nn.l1loss#torch.nn.L1Loss "torch.nn.L1Loss") for details.

### mse_loss

`torch.nn.functional.mse_loss(input, target, size_average=None, reduce=None, reduction='mean') → Tensor` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/functional.html#mse_loss)

Measures the element-wise mean squared error.

See [`MSELoss`](generated/torch.nn.mseloss#torch.nn.MSELoss "torch.nn.MSELoss") for details.

### margin_ranking_loss

`torch.nn.functional.margin_ranking_loss(input1, input2, target, margin=0, size_average=None, reduce=None, reduction='mean') → Tensor` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/functional.html#margin_ranking_loss)

See [`MarginRankingLoss`](generated/torch.nn.marginrankingloss#torch.nn.MarginRankingLoss "torch.nn.MarginRankingLoss") for details.

### multilabel_margin_loss

`torch.nn.functional.multilabel_margin_loss(input, target, size_average=None, reduce=None, reduction='mean') → Tensor` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/functional.html#multilabel_margin_loss)

See [`MultiLabelMarginLoss`](generated/torch.nn.multilabelmarginloss#torch.nn.MultiLabelMarginLoss "torch.nn.MultiLabelMarginLoss") for details.

### multilabel_soft_margin_loss

`torch.nn.functional.multilabel_soft_margin_loss(input, target, weight=None, size_average=None) → Tensor` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/functional.html#multilabel_soft_margin_loss)

See [`MultiLabelSoftMarginLoss`](generated/torch.nn.multilabelsoftmarginloss#torch.nn.MultiLabelSoftMarginLoss "torch.nn.MultiLabelSoftMarginLoss") for details.
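The functional losses above (`kl_div`, `l1_loss`, `mse_loss`, `margin_ranking_loss`, and the other losses in this section) follow the same calling pattern as the `binary_cross_entropy_with_logits` and `cross_entropy` examples earlier. A minimal sketch for `kl_div`, with the predictions in log space and `reduction='batchmean'` as recommended in the note above (the tensor shapes are illustrative only):

    >>> import torch
    >>> import torch.nn.functional as F
    >>> # Predictions must be provided as log-probabilities
    >>> input = F.log_softmax(torch.randn(3, 5, requires_grad=True), dim=1)
    >>> # With log_target=False (the default), the target is a probability distribution
    >>> target = F.softmax(torch.randn(3, 5), dim=1)
    >>> loss = F.kl_div(input, target, reduction='batchmean')
    >>> loss.backward()
    >>> # Alternatively, pass the target in log space to avoid an explicit log
    >>> log_target = F.log_softmax(torch.randn(3, 5), dim=1)
    >>> loss = F.kl_div(input, log_target, reduction='batchmean', log_target=True)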
### multi_margin_loss `torch.nn.functional.multi_margin_loss(input, target, p=1, margin=1.0, weight=None, size_average=None, reduce=None, reduction='mean')` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/functional.html#multi_margin_loss) multi_margin_loss(input, target, p=1, margin=1, weight=None, size_average=None, reduce=None, reduction=’mean’) -> Tensor See [`MultiMarginLoss`](generated/torch.nn.multimarginloss#torch.nn.MultiMarginLoss "torch.nn.MultiMarginLoss") for details. ### nll_loss `torch.nn.functional.nll_loss(input, target, weight=None, size_average=None, ignore_index=-100, reduce=None, reduction='mean')` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/functional.html#nll_loss) The negative log likelihood loss. See [`NLLLoss`](generated/torch.nn.nllloss#torch.nn.NLLLoss "torch.nn.NLLLoss") for details. Parameters * **input** – (N,C)(N, C) where `C = number of classes` or (N,C,H,W)(N, C, H, W) in case of 2D Loss, or (N,C,d1,d2,...,dK)(N, C, d_1, d_2, ..., d_K) where K≥1K \geq 1 in the case of K-dimensional loss. * **target** – (N)(N) where each value is 0≤targets[i]≤C−10 \leq \text{targets}[i] \leq C-1 , or (N,d1,d2,...,dK)(N, d_1, d_2, ..., d_K) where K≥1K \geq 1 for K-dimensional loss. * **weight** ([Tensor](tensors#torch.Tensor "torch.Tensor") _,__optional_) – a manual rescaling weight given to each class. If given, has to be a Tensor of size `C` * **size_average** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – Deprecated (see `reduction`). By default, the losses are averaged over each loss element in the batch. Note that for some losses, there multiple elements per sample. If the field `size_average` is set to `False`, the losses are instead summed for each minibatch. Ignored when reduce is `False`. Default: `True` * **ignore_index** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – Specifies a target value that is ignored and does not contribute to the input gradient. When `size_average` is `True`, the loss is averaged over non-ignored targets. Default: -100 * **reduce** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – Deprecated (see `reduction`). By default, the losses are averaged or summed over observations for each minibatch depending on `size_average`. When `reduce` is `False`, returns a loss per batch element instead and ignores `size_average`. Default: `True` * **reduction** (_string_ _,__optional_) – Specifies the reduction to apply to the output: `'none'` | `'mean'` | `'sum'`. `'none'`: no reduction will be applied, `'mean'`: the sum of the output will be divided by the number of elements in the output, `'sum'`: the output will be summed. Note: `size_average` and `reduce` are in the process of being deprecated, and in the meantime, specifying either of those two args will override `reduction`. 
Default: `'mean'`

Example:

    >>> # input is of size N x C = 3 x 5
    >>> input = torch.randn(3, 5, requires_grad=True)
    >>> # each element in target has to have 0 <= value < C
    >>> target = torch.tensor([1, 0, 4])
    >>> output = F.nll_loss(F.log_softmax(input, dim=1), target)
    >>> output.backward()

### smooth_l1_loss

`torch.nn.functional.smooth_l1_loss(input, target, size_average=None, reduce=None, reduction='mean', beta=1.0)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/functional.html#smooth_l1_loss)

Function that uses a squared term if the absolute element-wise error falls below beta and an L1 term otherwise.

See [`SmoothL1Loss`](generated/torch.nn.smoothl1loss#torch.nn.SmoothL1Loss "torch.nn.SmoothL1Loss") for details.

### soft_margin_loss

`torch.nn.functional.soft_margin_loss(input, target, size_average=None, reduce=None, reduction='mean') → Tensor` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/functional.html#soft_margin_loss)

See [`SoftMarginLoss`](generated/torch.nn.softmarginloss#torch.nn.SoftMarginLoss "torch.nn.SoftMarginLoss") for details.

### triplet_margin_loss

`torch.nn.functional.triplet_margin_loss(anchor, positive, negative, margin=1.0, p=2, eps=1e-06, swap=False, size_average=None, reduce=None, reduction='mean')` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/functional.html#triplet_margin_loss)

See [`TripletMarginLoss`](generated/torch.nn.tripletmarginloss#torch.nn.TripletMarginLoss "torch.nn.TripletMarginLoss") for details.

### triplet_margin_with_distance_loss

`torch.nn.functional.triplet_margin_with_distance_loss(anchor, positive, negative, *, distance_function=None, margin=1.0, swap=False, reduction='mean')` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/functional.html#triplet_margin_with_distance_loss)

See [`TripletMarginWithDistanceLoss`](generated/torch.nn.tripletmarginwithdistanceloss#torch.nn.TripletMarginWithDistanceLoss "torch.nn.TripletMarginWithDistanceLoss") for details.

## Vision functions

### pixel_shuffle

`torch.nn.functional.pixel_shuffle(input, upscale_factor) → Tensor`

Rearranges elements in a tensor of shape (*, C \times r^2, H, W) to a tensor of shape (*, C, H \times r, W \times r), where r is the `upscale_factor`.

See [`PixelShuffle`](generated/torch.nn.pixelshuffle#torch.nn.PixelShuffle "torch.nn.PixelShuffle") for details.

Parameters

* **input** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – the input tensor
* **upscale_factor** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – factor to increase spatial resolution by

Examples:

    >>> input = torch.randn(1, 9, 4, 4)
    >>> output = torch.nn.functional.pixel_shuffle(input, 3)
    >>> print(output.size())
    torch.Size([1, 1, 12, 12])

### pixel_unshuffle

`torch.nn.functional.pixel_unshuffle(input, downscale_factor) → Tensor`

Reverses the [`PixelShuffle`](generated/torch.nn.pixelshuffle#torch.nn.PixelShuffle "torch.nn.PixelShuffle") operation by rearranging elements in a tensor of shape (*, C, H \times r, W \times r) to a tensor of shape (*, C \times r^2, H, W), where r is the `downscale_factor`.

See [`PixelUnshuffle`](generated/torch.nn.pixelunshuffle#torch.nn.PixelUnshuffle "torch.nn.PixelUnshuffle") for details.
Parameters

* **input** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – the input tensor
* **downscale_factor** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – factor to decrease spatial resolution by

Examples:

    >>> input = torch.randn(1, 1, 12, 12)
    >>> output = torch.nn.functional.pixel_unshuffle(input, 3)
    >>> print(output.size())
    torch.Size([1, 9, 4, 4])

### pad

`torch.nn.functional.pad(input, pad, mode='constant', value=0)`

Pads tensor.

Padding size: The padding size by which to pad some dimensions of `input` is described starting from the last dimension and moving forward. \left\lfloor\frac{\text{len(pad)}}{2}\right\rfloor dimensions of `input` will be padded. For example, to pad only the last dimension of the input tensor, then `pad` has the form (\text{padding\_left}, \text{padding\_right}); to pad the last 2 dimensions of the input tensor, then use (\text{padding\_left}, \text{padding\_right}, \text{padding\_top}, \text{padding\_bottom}); to pad the last 3 dimensions, use (\text{padding\_left}, \text{padding\_right}, \text{padding\_top}, \text{padding\_bottom}, \text{padding\_front}, \text{padding\_back}).

Padding mode: See [`torch.nn.ConstantPad2d`](generated/torch.nn.constantpad2d#torch.nn.ConstantPad2d "torch.nn.ConstantPad2d"), [`torch.nn.ReflectionPad2d`](generated/torch.nn.reflectionpad2d#torch.nn.ReflectionPad2d "torch.nn.ReflectionPad2d"), and [`torch.nn.ReplicationPad2d`](generated/torch.nn.replicationpad2d#torch.nn.ReplicationPad2d "torch.nn.ReplicationPad2d") for concrete examples on how each of the padding modes works. Constant padding is implemented for arbitrary dimensions. Replicate padding is implemented for padding the last 3 dimensions of a 5D input tensor, the last 2 dimensions of a 4D input tensor, or the last dimension of a 3D input tensor. Reflect padding is only implemented for padding the last 2 dimensions of a 4D input tensor, or the last dimension of a 3D input tensor.

Note

When using the CUDA backend, this operation may induce nondeterministic behaviour in its backward pass that is not easily switched off. Please see the notes on [Reproducibility](https://pytorch.org/docs/1.8.0/notes/randomness.html) for background.

Parameters

* **input** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – N-dimensional tensor
* **pad** ([tuple](https://docs.python.org/3/library/stdtypes.html#tuple "\(in Python v3.9\)")) – m-element tuple, where \frac{m}{2} \leq input dimensions and m is even.
* **mode** – `'constant'`, `'reflect'`, `'replicate'` or `'circular'`. Default: `'constant'`
* **value** – fill value for `'constant'` padding.
Default: `0` Examples: >>> t4d = torch.empty(3, 3, 4, 2) >>> p1d = (1, 1) # pad last dim by 1 on each side >>> out = F.pad(t4d, p1d, "constant", 0) # effectively zero padding >>> print(out.size()) torch.Size([3, 3, 4, 4]) >>> p2d = (1, 1, 2, 2) # pad last dim by (1, 1) and 2nd to last by (2, 2) >>> out = F.pad(t4d, p2d, "constant", 0) >>> print(out.size()) torch.Size([3, 3, 8, 4]) >>> t4d = torch.empty(3, 3, 4, 2) >>> p3d = (0, 1, 2, 1, 3, 3) # pad by (0, 1), (2, 1), and (3, 3) >>> out = F.pad(t4d, p3d, "constant", 0) >>> print(out.size()) torch.Size([3, 9, 7, 3]) ### interpolate `torch.nn.functional.interpolate(input, size=None, scale_factor=None, mode='nearest', align_corners=None, recompute_scale_factor=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/functional.html#interpolate) Down/up samples the input to either the given `size` or the given `scale_factor` The algorithm used for interpolation is determined by `mode`. Currently temporal, spatial and volumetric sampling are supported, i.e. expected inputs are 3-D, 4-D or 5-D in shape. The input dimensions are interpreted in the form: `mini-batch x channels x [optional depth] x [optional height] x width`. The modes available for resizing are: `nearest`, `linear` (3D-only), `bilinear`, `bicubic` (4D-only), `trilinear` (5D-only), `area` Parameters * **input** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – the input tensor * **size** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _or_ _Tuple_ _[_[int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _] or_ _Tuple_ _[_[int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,_[int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _] or_ _Tuple_ _[_[int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,_[int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,_[int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _]_) – output spatial size. * **scale_factor** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _or_ _Tuple_ _[_[float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _]_) – multiplier for spatial size. Has to match input size if it is a tuple. * **mode** ([str](https://docs.python.org/3/library/stdtypes.html#str "\(in Python v3.9\)")) – algorithm used for upsampling: `'nearest'` | `'linear'` | `'bilinear'` | `'bicubic'` | `'trilinear'` | `'area'`. Default: `'nearest'` * **align_corners** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – Geometrically, we consider the pixels of the input and output as squares rather than points. If set to `True`, the input and output tensors are aligned by the center points of their corner pixels, preserving the values at the corner pixels. If set to `False`, the input and output tensors are aligned by the corner points of their corner pixels, and the interpolation uses edge value padding for out-of-boundary values, making this operation _independent_ of input size when `scale_factor` is kept the same. This only has an effect when `mode` is `'linear'`, `'bilinear'`, `'bicubic'` or `'trilinear'`. Default: `False` * **recompute_scale_factor** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – recompute the scale_factor for use in the interpolation calculation. 
When `scale_factor` is passed as a parameter, it is used to compute the `output_size`. If `recompute_scale_factor` is `False` or not specified, the passed-in `scale_factor` will be used in the interpolation computation. Otherwise, a new `scale_factor` will be computed based on the output and input sizes for use in the interpolation computation (i.e. the computation will be identical to if the computed `output_size` were passed-in explicitly). Note that when `scale_factor` is floating-point, the recomputed scale_factor may differ from the one passed in due to rounding and precision issues. Note With `mode='bicubic'`, it’s possible to cause overshoot, in other words it can produce negative values or values greater than 255 for images. Explicitly call `result.clamp(min=0, max=255)` if you want to reduce the overshoot when displaying the image. Warning With `align_corners = True`, the linearly interpolating modes (`linear`, `bilinear`, and `trilinear`) don’t proportionally align the output and input pixels, and thus the output values can depend on the input size. This was the default behavior for these modes up to version 0.3.1. Since then, the default behavior is `align_corners = False`. See [`Upsample`](generated/torch.nn.upsample#torch.nn.Upsample "torch.nn.Upsample") for concrete examples on how this affects the outputs. Warning When scale_factor is specified, if recompute_scale_factor=True, scale_factor is used to compute the output_size which will then be used to infer new scales for the interpolation. The default behavior for recompute_scale_factor changed to False in 1.6.0, and scale_factor is used in the interpolation calculation. Note This operation may produce nondeterministic gradients when given tensors on a CUDA device. See [Reproducibility](https://pytorch.org/docs/1.8.0/notes/randomness.html) for more information. ### upsample `torch.nn.functional.upsample(input, size=None, scale_factor=None, mode='nearest', align_corners=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/functional.html#upsample) Upsamples the input to either the given `size` or the given `scale_factor` Warning This function is deprecated in favor of `torch.nn.functional.interpolate()`. This is equivalent with `nn.functional.interpolate(...)`. Note This operation may produce nondeterministic gradients when given tensors on a CUDA device. See [Reproducibility](https://pytorch.org/docs/1.8.0/notes/randomness.html) for more information. The algorithm used for upsampling is determined by `mode`. Currently temporal, spatial and volumetric upsampling are supported, i.e. expected inputs are 3-D, 4-D or 5-D in shape. The input dimensions are interpreted in the form: `mini-batch x channels x [optional depth] x [optional height] x width`. 
The modes available for upsampling are: `nearest`, `linear` (3D-only), `bilinear`, `bicubic` (4D-only), `trilinear` (5D-only) Parameters * **input** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – the input tensor * **size** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _or_ _Tuple_ _[_[int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _] or_ _Tuple_ _[_[int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,_[int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _] or_ _Tuple_ _[_[int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,_[int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,_[int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _]_) – output spatial size. * **scale_factor** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _or_ _Tuple_ _[_[float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _]_) – multiplier for spatial size. Has to match input size if it is a tuple. * **mode** (_string_) – algorithm used for upsampling: `'nearest'` | `'linear'` | `'bilinear'` | `'bicubic'` | `'trilinear'`. Default: `'nearest'` * **align_corners** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – Geometrically, we consider the pixels of the input and output as squares rather than points. If set to `True`, the input and output tensors are aligned by the center points of their corner pixels, preserving the values at the corner pixels. If set to `False`, the input and output tensors are aligned by the corner points of their corner pixels, and the interpolation uses edge value padding for out-of-boundary values, making this operation _independent_ of input size when `scale_factor` is kept the same. This only has an effect when `mode` is `'linear'`, `'bilinear'`, `'bicubic'` or `'trilinear'`. Default: `False` Note With `mode='bicubic'`, it’s possible to cause overshoot, in other words it can produce negative values or values greater than 255 for images. Explicitly call `result.clamp(min=0, max=255)` if you want to reduce the overshoot when displaying the image. Warning With `align_corners = True`, the linearly interpolating modes (`linear`, `bilinear`, and `trilinear`) don’t proportionally align the output and input pixels, and thus the output values can depend on the input size. This was the default behavior for these modes up to version 0.3.1. Since then, the default behavior is `align_corners = False`. See [`Upsample`](generated/torch.nn.upsample#torch.nn.Upsample "torch.nn.Upsample") for concrete examples on how this affects the outputs. ### upsample_nearest `torch.nn.functional.upsample_nearest(input, size=None, scale_factor=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/functional.html#upsample_nearest) Upsamples the input, using nearest neighbours’ pixel values. Warning This function is deprecated in favor of `torch.nn.functional.interpolate()`. This is equivalent with `nn.functional.interpolate(..., mode='nearest')`. Currently spatial and volumetric upsampling are supported (i.e. expected inputs are 4 or 5 dimensional). 
Parameters

* **input** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – input
* **size** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _or_ _Tuple[int, int] or Tuple[int, int, int]_) – output spatial size.
* **scale_factor** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – multiplier for spatial size. Has to be an integer.

Note

This operation may produce nondeterministic gradients when given tensors on a CUDA device. See [Reproducibility](https://pytorch.org/docs/1.8.0/notes/randomness.html) for more information.

### upsample_bilinear

`torch.nn.functional.upsample_bilinear(input, size=None, scale_factor=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/functional.html#upsample_bilinear)

Upsamples the input, using bilinear upsampling.

Warning

This function is deprecated in favor of `torch.nn.functional.interpolate()`. This is equivalent to `nn.functional.interpolate(..., mode='bilinear', align_corners=True)`.

Expected inputs are spatial (4 dimensional). Use `upsample_trilinear` for volumetric (5 dimensional) inputs.

Parameters

* **input** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – input
* **size** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _or_ _Tuple[int, int]_) – output spatial size.
* **scale_factor** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _or_ _Tuple[int, int]_) – multiplier for spatial size

Note

This operation may produce nondeterministic gradients when given tensors on a CUDA device. See [Reproducibility](https://pytorch.org/docs/1.8.0/notes/randomness.html) for more information.

### grid_sample

`torch.nn.functional.grid_sample(input, grid, mode='bilinear', padding_mode='zeros', align_corners=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/functional.html#grid_sample)

Given an `input` and a flow-field `grid`, computes the `output` using `input` values and pixel locations from `grid`.

Currently, only spatial (4-D) and volumetric (5-D) `input` are supported.

In the spatial (4-D) case, for `input` with shape (N, C, H_\text{in}, W_\text{in}) and `grid` with shape (N, H_\text{out}, W_\text{out}, 2), the output will have shape (N, C, H_\text{out}, W_\text{out}).

For each output location `output[n, :, h, w]`, the size-2 vector `grid[n, h, w]` specifies `input` pixel locations `x` and `y`, which are used to interpolate the output value `output[n, :, h, w]`. In the case of 5D inputs, `grid[n, d, h, w]` specifies the `x`, `y`, `z` pixel locations for interpolating `output[n, :, d, h, w]`.

The `mode` argument specifies `nearest` or `bilinear` interpolation method to sample the input pixels.
`grid` specifies the sampling pixel locations normalized by the `input` spatial dimensions. Therefore, it should have most values in the range of `[-1, 1]`. For example, values `x = -1, y = -1` is the left-top pixel of `input`, and values `x = 1, y = 1` is the right-bottom pixel of `input`. If `grid` has values outside the range of `[-1, 1]`, the corresponding outputs are handled as defined by `padding_mode`. Options are * `padding_mode="zeros"`: use `0` for out-of-bound grid locations, * `padding_mode="border"`: use border values for out-of-bound grid locations, * `padding_mode="reflection"`: use values at locations reflected by the border for out-of-bound grid locations. For location far away from the border, it will keep being reflected until becoming in bound, e.g., (normalized) pixel location `x = -3.5` reflects by border `-1` and becomes `x' = 1.5`, then reflects by border `1` and becomes `x'' = -0.5`. Note This function is often used in conjunction with `affine_grid()` to build [Spatial Transformer Networks](https://arxiv.org/abs/1506.02025) . Note When using the CUDA backend, this operation may induce nondeterministic behaviour in its backward pass that is not easily switched off. Please see the notes on [Reproducibility](https://pytorch.org/docs/1.8.0/notes/randomness.html) for background. Note NaN values in `grid` would be interpreted as `-1`. Parameters * **input** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – input of shape (N,C,Hin,Win)(N, C, H_\text{in}, W_\text{in}) (4-D case) or (N,C,Din,Hin,Win)(N, C, D_\text{in}, H_\text{in}, W_\text{in}) (5-D case) * **grid** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – flow-field of shape (N,Hout,Wout,2)(N, H_\text{out}, W_\text{out}, 2) (4-D case) or (N,Dout,Hout,Wout,3)(N, D_\text{out}, H_\text{out}, W_\text{out}, 3) (5-D case) * **mode** ([str](https://docs.python.org/3/library/stdtypes.html#str "\(in Python v3.9\)")) – interpolation mode to calculate output values `'bilinear'` | `'nearest'` | `'bicubic'`. Default: `'bilinear'` Note: `mode='bicubic'` supports only 4-D input. When `mode='bilinear'` and the input is 5-D, the interpolation mode used internally will actually be trilinear. However, when the input is 4-D, the interpolation mode will legitimately be bilinear. * **padding_mode** ([str](https://docs.python.org/3/library/stdtypes.html#str "\(in Python v3.9\)")) – padding mode for outside grid values `'zeros'` | `'border'` | `'reflection'`. Default: `'zeros'` * **align_corners** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – Geometrically, we consider the pixels of the input as squares rather than points. If set to `True`, the extrema (`-1` and `1`) are considered as referring to the center points of the input’s corner pixels. If set to `False`, they are instead considered as referring to the corner points of the input’s corner pixels, making the sampling more resolution agnostic. This option parallels the `align_corners` option in `interpolate()`, and so whichever option is used here should also be used there to resize the input image before grid sampling. Default: `False` Returns output Tensor Return type output ([Tensor](tensors#torch.Tensor "torch.Tensor")) Warning When `align_corners = True`, the grid positions depend on the pixel size relative to the input image size, and so the locations sampled by `grid_sample()` will differ for the same input given at different resolutions (that is, after being upsampled or downsampled). 
The default behavior up to version 1.2.0 was `align_corners = True`. Since then, the default behavior has been changed to `align_corners = False`, in order to bring it in line with the default for `interpolate()`.

Note

`mode='bicubic'` is implemented using the [cubic convolution algorithm](https://en.wikipedia.org/wiki/Bicubic_interpolation) with \alpha = -0.75. The constant \alpha might differ from package to package. For example, [PIL](https://github.com/python-pillow/Pillow/blob/4634eafe3c695a014267eefdce830b4a825beed7/src/libImaging/Resample.c#L51) and [OpenCV](https://github.com/opencv/opencv/blob/f345ed564a06178670750bad59526cfa4033be55/modules/imgproc/src/resize.cpp#L908) use -0.5 and -0.75, respectively. This algorithm may “overshoot” the range of values it’s interpolating. For example, it may produce negative values or values greater than 255 when interpolating input in [0, 255]. Clamp the results with `torch.clamp()` to ensure they are within the valid range.

### affine_grid

`torch.nn.functional.affine_grid(theta, size, align_corners=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/functional.html#affine_grid)

Generates a 2D or 3D flow field (sampling grid), given a batch of affine matrices `theta`.

Note

This function is often used in conjunction with `grid_sample()` to build [Spatial Transformer Networks](https://arxiv.org/abs/1506.02025).

Parameters

* **theta** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – input batch of affine matrices with shape (N \times 2 \times 3) for 2D or (N \times 3 \times 4) for 3D
* **size** (_torch.Size_) – the target output image size. (N \times C \times H \times W for 2D or N \times C \times D \times H \times W for 3D) Example: torch.Size((32, 3, 24, 24))
* **align_corners** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – if `True`, consider `-1` and `1` to refer to the centers of the corner pixels rather than the image corners. Refer to `grid_sample()` for a more complete description. A grid generated by `affine_grid()` should be passed to `grid_sample()` with the same setting for this option. Default: `False`

Returns

output Tensor of size (N \times H \times W \times 2)

Return type

output ([Tensor](tensors#torch.Tensor "torch.Tensor"))

Warning

When `align_corners = True`, the grid positions depend on the pixel size relative to the input image size, and so the locations sampled by `grid_sample()` will differ for the same input given at different resolutions (that is, after being upsampled or downsampled). The default behavior up to version 1.2.0 was `align_corners = True`. Since then, the default behavior has been changed to `align_corners = False`, in order to bring it in line with the default for `interpolate()`.

Warning

When `align_corners = True`, 2D affine transforms on 1D data and 3D affine transforms on 2D data (that is, when one of the spatial dimensions has unit size) are ill-defined, and not an intended use case. This is not a problem when `align_corners = False`. Up to version 1.2.0, all grid points along a unit dimension were considered arbitrarily to be at `-1`. From version 1.3.0, under `align_corners = True` all grid points along a unit dimension are considered to be at `0` (the center of the input image).
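As the notes above point out, `affine_grid()` and `grid_sample()` are typically used together. A minimal sketch of the round trip with an identity affine transform; the shapes and the `align_corners=False` setting below are chosen purely for illustration:

    >>> import torch
    >>> import torch.nn.functional as F
    >>> input = torch.arange(16.).reshape(1, 1, 4, 4)
    >>> # Identity 2D affine transform for a batch of one image: theta has shape (N, 2, 3)
    >>> theta = torch.tensor([[[1., 0., 0.],
    ...                        [0., 1., 0.]]])
    >>> grid = F.affine_grid(theta, size=(1, 1, 4, 4), align_corners=False)
    >>> grid.shape  # (N, H_out, W_out, 2), ready to be passed to grid_sample()
    torch.Size([1, 4, 4, 2])
    >>> # Sampling the input with the identity grid reproduces the input
    >>> output = F.grid_sample(input, grid, align_corners=False)
    >>> torch.allclose(output, input)
    True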
## DataParallel functions (multi-GPU, distributed) ### data_parallel `torch.nn.parallel.data_parallel(module, inputs, device_ids=None, output_device=None, dim=0, module_kwargs=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/parallel/data_parallel.html#data_parallel) Evaluates module(input) in parallel across the GPUs given in device_ids. This is the functional version of the DataParallel module. Parameters * **module** ([Module](generated/torch.nn.module#torch.nn.Module "torch.nn.Module")) – the module to evaluate in parallel * **inputs** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – inputs to the module * **device_ids** (_list of python:int_ _or_[torch.device](tensor_attributes#torch.torch.device "torch.torch.device")) – GPU ids on which to replicate module * **output_device** (_list of python:int_ _or_[torch.device](tensor_attributes#torch.torch.device "torch.torch.device")) – GPU location of the output Use -1 to indicate the CPU. (default: device_ids[0]) Returns a Tensor containing the result of module(input) located on output_device # torch.nn These are the basic building block for graphs torch.nn * Containers * Convolution Layers * Pooling layers * Padding Layers * Non-linear Activations (weighted sum, nonlinearity) * Non-linear Activations (other) * Normalization Layers * Recurrent Layers * Transformer Layers * Linear Layers * Dropout Layers * Sparse Layers * Distance Functions * Loss Functions * Vision Layers * Shuffle Layers * DataParallel Layers (multi-GPU, distributed) * Utilities * Quantized Functions * Lazy Modules Initialization [`Parameter`](generated/torch.nn.parameter.parameter#torch.nn.parameter.Parameter "torch.nn.parameter.Parameter") | A kind of Tensor that is to be considered a module parameter. ---|--- [`UninitializedParameter`](generated/torch.nn.parameter.uninitializedparameter#torch.nn.parameter.UninitializedParameter "torch.nn.parameter.UninitializedParameter") | A parameter that is not initialized. ## Containers [`Module`](generated/torch.nn.module#torch.nn.Module "torch.nn.Module") | Base class for all neural network modules. ---|--- [`Sequential`](generated/torch.nn.sequential#torch.nn.Sequential "torch.nn.Sequential") | A sequential container. [`ModuleList`](generated/torch.nn.modulelist#torch.nn.ModuleList "torch.nn.ModuleList") | Holds submodules in a list. [`ModuleDict`](generated/torch.nn.moduledict#torch.nn.ModuleDict "torch.nn.ModuleDict") | Holds submodules in a dictionary. [`ParameterList`](generated/torch.nn.parameterlist#torch.nn.ParameterList "torch.nn.ParameterList") | Holds parameters in a list. [`ParameterDict`](generated/torch.nn.parameterdict#torch.nn.ParameterDict "torch.nn.ParameterDict") | Holds parameters in a dictionary. Global Hooks For Module [`register_module_forward_pre_hook`](generated/torch.nn.modules.module.register_module_forward_pre_hook#torch.nn.modules.module.register_module_forward_pre_hook "torch.nn.modules.module.register_module_forward_pre_hook") | Registers a forward pre-hook common to all modules. 
---|--- [`register_module_forward_hook`](generated/torch.nn.modules.module.register_module_forward_hook#torch.nn.modules.module.register_module_forward_hook "torch.nn.modules.module.register_module_forward_hook") | Registers a global forward hook for all the modules [`register_module_backward_hook`](generated/torch.nn.modules.module.register_module_backward_hook#torch.nn.modules.module.register_module_backward_hook "torch.nn.modules.module.register_module_backward_hook") | Registers a backward hook common to all the modules. ## Convolution Layers [`nn.Conv1d`](generated/torch.nn.conv1d#torch.nn.Conv1d "torch.nn.Conv1d") | Applies a 1D convolution over an input signal composed of several input planes. ---|--- [`nn.Conv2d`](generated/torch.nn.conv2d#torch.nn.Conv2d "torch.nn.Conv2d") | Applies a 2D convolution over an input signal composed of several input planes. [`nn.Conv3d`](generated/torch.nn.conv3d#torch.nn.Conv3d "torch.nn.Conv3d") | Applies a 3D convolution over an input signal composed of several input planes. [`nn.ConvTranspose1d`](generated/torch.nn.convtranspose1d#torch.nn.ConvTranspose1d "torch.nn.ConvTranspose1d") | Applies a 1D transposed convolution operator over an input image composed of several input planes. [`nn.ConvTranspose2d`](generated/torch.nn.convtranspose2d#torch.nn.ConvTranspose2d "torch.nn.ConvTranspose2d") | Applies a 2D transposed convolution operator over an input image composed of several input planes. [`nn.ConvTranspose3d`](generated/torch.nn.convtranspose3d#torch.nn.ConvTranspose3d "torch.nn.ConvTranspose3d") | Applies a 3D transposed convolution operator over an input image composed of several input planes. [`nn.LazyConv1d`](generated/torch.nn.lazyconv1d#torch.nn.LazyConv1d "torch.nn.LazyConv1d") | A [`torch.nn.Conv1d`](generated/torch.nn.conv1d#torch.nn.Conv1d "torch.nn.Conv1d") module with lazy initialization of the `in_channels` argument of the `Conv1d` that is inferred from the `input.size(1)`. [`nn.LazyConv2d`](generated/torch.nn.lazyconv2d#torch.nn.LazyConv2d "torch.nn.LazyConv2d") | A [`torch.nn.Conv2d`](generated/torch.nn.conv2d#torch.nn.Conv2d "torch.nn.Conv2d") module with lazy initialization of the `in_channels` argument of the `Conv2d` that is inferred from the `input.size(1)`. [`nn.LazyConv3d`](generated/torch.nn.lazyconv3d#torch.nn.LazyConv3d "torch.nn.LazyConv3d") | A [`torch.nn.Conv3d`](generated/torch.nn.conv3d#torch.nn.Conv3d "torch.nn.Conv3d") module with lazy initialization of the `in_channels` argument of the `Conv3d` that is inferred from the `input.size(1)`. [`nn.LazyConvTranspose1d`](generated/torch.nn.lazyconvtranspose1d#torch.nn.LazyConvTranspose1d "torch.nn.LazyConvTranspose1d") | A [`torch.nn.ConvTranspose1d`](generated/torch.nn.convtranspose1d#torch.nn.ConvTranspose1d "torch.nn.ConvTranspose1d") module with lazy initialization of the `in_channels` argument of the `ConvTranspose1d` that is inferred from the `input.size(1)`. [`nn.LazyConvTranspose2d`](generated/torch.nn.lazyconvtranspose2d#torch.nn.LazyConvTranspose2d "torch.nn.LazyConvTranspose2d") | A [`torch.nn.ConvTranspose2d`](generated/torch.nn.convtranspose2d#torch.nn.ConvTranspose2d "torch.nn.ConvTranspose2d") module with lazy initialization of the `in_channels` argument of the `ConvTranspose2d` that is inferred from the `input.size(1)`. 
[`nn.LazyConvTranspose3d`](generated/torch.nn.lazyconvtranspose3d#torch.nn.LazyConvTranspose3d "torch.nn.LazyConvTranspose3d") | A [`torch.nn.ConvTranspose3d`](generated/torch.nn.convtranspose3d#torch.nn.ConvTranspose3d "torch.nn.ConvTranspose3d") module with lazy initialization of the `in_channels` argument of the `ConvTranspose3d` that is inferred from the `input.size(1)`. [`nn.Unfold`](generated/torch.nn.unfold#torch.nn.Unfold "torch.nn.Unfold") | Extracts sliding local blocks from a batched input tensor. [`nn.Fold`](generated/torch.nn.fold#torch.nn.Fold "torch.nn.Fold") | Combines an array of sliding local blocks into a large containing tensor. ## Pooling layers [`nn.MaxPool1d`](generated/torch.nn.maxpool1d#torch.nn.MaxPool1d "torch.nn.MaxPool1d") | Applies a 1D max pooling over an input signal composed of several input planes. ---|--- [`nn.MaxPool2d`](generated/torch.nn.maxpool2d#torch.nn.MaxPool2d "torch.nn.MaxPool2d") | Applies a 2D max pooling over an input signal composed of several input planes. [`nn.MaxPool3d`](generated/torch.nn.maxpool3d#torch.nn.MaxPool3d "torch.nn.MaxPool3d") | Applies a 3D max pooling over an input signal composed of several input planes. [`nn.MaxUnpool1d`](generated/torch.nn.maxunpool1d#torch.nn.MaxUnpool1d "torch.nn.MaxUnpool1d") | Computes a partial inverse of `MaxPool1d`. [`nn.MaxUnpool2d`](generated/torch.nn.maxunpool2d#torch.nn.MaxUnpool2d "torch.nn.MaxUnpool2d") | Computes a partial inverse of `MaxPool2d`. [`nn.MaxUnpool3d`](generated/torch.nn.maxunpool3d#torch.nn.MaxUnpool3d "torch.nn.MaxUnpool3d") | Computes a partial inverse of `MaxPool3d`. [`nn.AvgPool1d`](generated/torch.nn.avgpool1d#torch.nn.AvgPool1d "torch.nn.AvgPool1d") | Applies a 1D average pooling over an input signal composed of several input planes. [`nn.AvgPool2d`](generated/torch.nn.avgpool2d#torch.nn.AvgPool2d "torch.nn.AvgPool2d") | Applies a 2D average pooling over an input signal composed of several input planes. [`nn.AvgPool3d`](generated/torch.nn.avgpool3d#torch.nn.AvgPool3d "torch.nn.AvgPool3d") | Applies a 3D average pooling over an input signal composed of several input planes. [`nn.FractionalMaxPool2d`](generated/torch.nn.fractionalmaxpool2d#torch.nn.FractionalMaxPool2d "torch.nn.FractionalMaxPool2d") | Applies a 2D fractional max pooling over an input signal composed of several input planes. [`nn.LPPool1d`](generated/torch.nn.lppool1d#torch.nn.LPPool1d "torch.nn.LPPool1d") | Applies a 1D power-average pooling over an input signal composed of several input planes. [`nn.LPPool2d`](generated/torch.nn.lppool2d#torch.nn.LPPool2d "torch.nn.LPPool2d") | Applies a 2D power-average pooling over an input signal composed of several input planes. [`nn.AdaptiveMaxPool1d`](generated/torch.nn.adaptivemaxpool1d#torch.nn.AdaptiveMaxPool1d "torch.nn.AdaptiveMaxPool1d") | Applies a 1D adaptive max pooling over an input signal composed of several input planes. [`nn.AdaptiveMaxPool2d`](generated/torch.nn.adaptivemaxpool2d#torch.nn.AdaptiveMaxPool2d "torch.nn.AdaptiveMaxPool2d") | Applies a 2D adaptive max pooling over an input signal composed of several input planes. [`nn.AdaptiveMaxPool3d`](generated/torch.nn.adaptivemaxpool3d#torch.nn.AdaptiveMaxPool3d "torch.nn.AdaptiveMaxPool3d") | Applies a 3D adaptive max pooling over an input signal composed of several input planes. [`nn.AdaptiveAvgPool1d`](generated/torch.nn.adaptiveavgpool1d#torch.nn.AdaptiveAvgPool1d "torch.nn.AdaptiveAvgPool1d") | Applies a 1D adaptive average pooling over an input signal composed of several input planes. 
[`nn.AdaptiveAvgPool2d`](generated/torch.nn.adaptiveavgpool2d#torch.nn.AdaptiveAvgPool2d "torch.nn.AdaptiveAvgPool2d") | Applies a 2D adaptive average pooling over an input signal composed of several input planes. [`nn.AdaptiveAvgPool3d`](generated/torch.nn.adaptiveavgpool3d#torch.nn.AdaptiveAvgPool3d "torch.nn.AdaptiveAvgPool3d") | Applies a 3D adaptive average pooling over an input signal composed of several input planes. ## Padding Layers [`nn.ReflectionPad1d`](generated/torch.nn.reflectionpad1d#torch.nn.ReflectionPad1d "torch.nn.ReflectionPad1d") | Pads the input tensor using the reflection of the input boundary. ---|--- [`nn.ReflectionPad2d`](generated/torch.nn.reflectionpad2d#torch.nn.ReflectionPad2d "torch.nn.ReflectionPad2d") | Pads the input tensor using the reflection of the input boundary. [`nn.ReplicationPad1d`](generated/torch.nn.replicationpad1d#torch.nn.ReplicationPad1d "torch.nn.ReplicationPad1d") | Pads the input tensor using replication of the input boundary. [`nn.ReplicationPad2d`](generated/torch.nn.replicationpad2d#torch.nn.ReplicationPad2d "torch.nn.ReplicationPad2d") | Pads the input tensor using replication of the input boundary. [`nn.ReplicationPad3d`](generated/torch.nn.replicationpad3d#torch.nn.ReplicationPad3d "torch.nn.ReplicationPad3d") | Pads the input tensor using replication of the input boundary. [`nn.ZeroPad2d`](generated/torch.nn.zeropad2d#torch.nn.ZeroPad2d "torch.nn.ZeroPad2d") | Pads the input tensor boundaries with zero. [`nn.ConstantPad1d`](generated/torch.nn.constantpad1d#torch.nn.ConstantPad1d "torch.nn.ConstantPad1d") | Pads the input tensor boundaries with a constant value. [`nn.ConstantPad2d`](generated/torch.nn.constantpad2d#torch.nn.ConstantPad2d "torch.nn.ConstantPad2d") | Pads the input tensor boundaries with a constant value. [`nn.ConstantPad3d`](generated/torch.nn.constantpad3d#torch.nn.ConstantPad3d "torch.nn.ConstantPad3d") | Pads the input tensor boundaries with a constant value. ## Non-linear Activations (weighted sum, nonlinearity) [`nn.ELU`](generated/torch.nn.elu#torch.nn.ELU "torch.nn.ELU") | Applies the element-wise function: ---|--- [`nn.Hardshrink`](generated/torch.nn.hardshrink#torch.nn.Hardshrink "torch.nn.Hardshrink") | Applies the hard shrinkage function element-wise: [`nn.Hardsigmoid`](generated/torch.nn.hardsigmoid#torch.nn.Hardsigmoid "torch.nn.Hardsigmoid") | Applies the element-wise function: [`nn.Hardtanh`](generated/torch.nn.hardtanh#torch.nn.Hardtanh "torch.nn.Hardtanh") | Applies the HardTanh function element-wise [`nn.Hardswish`](generated/torch.nn.hardswish#torch.nn.Hardswish "torch.nn.Hardswish") | Applies the hardswish function, element-wise, as described in the paper: [`nn.LeakyReLU`](generated/torch.nn.leakyrelu#torch.nn.LeakyReLU "torch.nn.LeakyReLU") | Applies the element-wise function: [`nn.LogSigmoid`](generated/torch.nn.logsigmoid#torch.nn.LogSigmoid "torch.nn.LogSigmoid") | Applies the element-wise function: [`nn.MultiheadAttention`](generated/torch.nn.multiheadattention#torch.nn.MultiheadAttention "torch.nn.MultiheadAttention") | Allows the model to jointly attend to information from different representation subspaces. 
[`nn.PReLU`](generated/torch.nn.prelu#torch.nn.PReLU "torch.nn.PReLU") | Applies the element-wise function: [`nn.ReLU`](generated/torch.nn.relu#torch.nn.ReLU "torch.nn.ReLU") | Applies the rectified linear unit function element-wise: [`nn.ReLU6`](generated/torch.nn.relu6#torch.nn.ReLU6 "torch.nn.ReLU6") | Applies the element-wise function: [`nn.RReLU`](generated/torch.nn.rrelu#torch.nn.RReLU "torch.nn.RReLU") | Applies the randomized leaky rectified liner unit function, element-wise, as described in the paper: [`nn.SELU`](generated/torch.nn.selu#torch.nn.SELU "torch.nn.SELU") | Applied element-wise, as: [`nn.CELU`](generated/torch.nn.celu#torch.nn.CELU "torch.nn.CELU") | Applies the element-wise function: [`nn.GELU`](generated/torch.nn.gelu#torch.nn.GELU "torch.nn.GELU") | Applies the Gaussian Error Linear Units function: [`nn.Sigmoid`](generated/torch.nn.sigmoid#torch.nn.Sigmoid "torch.nn.Sigmoid") | Applies the element-wise function: [`nn.SiLU`](generated/torch.nn.silu#torch.nn.SiLU "torch.nn.SiLU") | Applies the silu function, element-wise. [`nn.Softplus`](generated/torch.nn.softplus#torch.nn.Softplus "torch.nn.Softplus") | Applies the element-wise function: [`nn.Softshrink`](generated/torch.nn.softshrink#torch.nn.Softshrink "torch.nn.Softshrink") | Applies the soft shrinkage function elementwise: [`nn.Softsign`](generated/torch.nn.softsign#torch.nn.Softsign "torch.nn.Softsign") | Applies the element-wise function: [`nn.Tanh`](generated/torch.nn.tanh#torch.nn.Tanh "torch.nn.Tanh") | Applies the element-wise function: [`nn.Tanhshrink`](generated/torch.nn.tanhshrink#torch.nn.Tanhshrink "torch.nn.Tanhshrink") | Applies the element-wise function: [`nn.Threshold`](generated/torch.nn.threshold#torch.nn.Threshold "torch.nn.Threshold") | Thresholds each element of the input Tensor. ## Non-linear Activations (other) [`nn.Softmin`](generated/torch.nn.softmin#torch.nn.Softmin "torch.nn.Softmin") | Applies the Softmin function to an n-dimensional input Tensor rescaling them so that the elements of the n-dimensional output Tensor lie in the range `[0, 1]` and sum to 1. ---|--- [`nn.Softmax`](generated/torch.nn.softmax#torch.nn.Softmax "torch.nn.Softmax") | Applies the Softmax function to an n-dimensional input Tensor rescaling them so that the elements of the n-dimensional output Tensor lie in the range [0,1] and sum to 1. [`nn.Softmax2d`](generated/torch.nn.softmax2d#torch.nn.Softmax2d "torch.nn.Softmax2d") | Applies SoftMax over features to each spatial location. [`nn.LogSoftmax`](generated/torch.nn.logsoftmax#torch.nn.LogSoftmax "torch.nn.LogSoftmax") | Applies the log⁡(Softmax(x))\log(\text{Softmax}(x)) function to an n-dimensional input Tensor. [`nn.AdaptiveLogSoftmaxWithLoss`](generated/torch.nn.adaptivelogsoftmaxwithloss#torch.nn.AdaptiveLogSoftmaxWithLoss "torch.nn.AdaptiveLogSoftmaxWithLoss") | Efficient softmax approximation as described in [Efficient softmax approximation for GPUs by Edouard Grave, Armand Joulin, Moustapha Cissé, David Grangier, and Hervé Jégou](https://arxiv.org/abs/1609.04309). ## Normalization Layers [`nn.BatchNorm1d`](generated/torch.nn.batchnorm1d#torch.nn.BatchNorm1d "torch.nn.BatchNorm1d") | Applies Batch Normalization over a 2D or 3D input (a mini-batch of 1D inputs with optional additional channel dimension) as described in the paper [Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift](https://arxiv.org/abs/1502.03167) . 
---|--- [`nn.BatchNorm2d`](generated/torch.nn.batchnorm2d#torch.nn.BatchNorm2d "torch.nn.BatchNorm2d") | Applies Batch Normalization over a 4D input (a mini-batch of 2D inputs with additional channel dimension) as described in the paper [Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift](https://arxiv.org/abs/1502.03167) . [`nn.BatchNorm3d`](generated/torch.nn.batchnorm3d#torch.nn.BatchNorm3d "torch.nn.BatchNorm3d") | Applies Batch Normalization over a 5D input (a mini-batch of 3D inputs with additional channel dimension) as described in the paper [Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift](https://arxiv.org/abs/1502.03167) . [`nn.GroupNorm`](generated/torch.nn.groupnorm#torch.nn.GroupNorm "torch.nn.GroupNorm") | Applies Group Normalization over a mini-batch of inputs as described in the paper [Group Normalization](https://arxiv.org/abs/1803.08494) [`nn.SyncBatchNorm`](generated/torch.nn.syncbatchnorm#torch.nn.SyncBatchNorm "torch.nn.SyncBatchNorm") | Applies Batch Normalization over a N-Dimensional input (a mini-batch of [N-2]D inputs with additional channel dimension) as described in the paper [Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift](https://arxiv.org/abs/1502.03167) . [`nn.InstanceNorm1d`](generated/torch.nn.instancenorm1d#torch.nn.InstanceNorm1d "torch.nn.InstanceNorm1d") | Applies Instance Normalization over a 3D input (a mini-batch of 1D inputs with optional additional channel dimension) as described in the paper [Instance Normalization: The Missing Ingredient for Fast Stylization](https://arxiv.org/abs/1607.08022). [`nn.InstanceNorm2d`](generated/torch.nn.instancenorm2d#torch.nn.InstanceNorm2d "torch.nn.InstanceNorm2d") | Applies Instance Normalization over a 4D input (a mini-batch of 2D inputs with additional channel dimension) as described in the paper [Instance Normalization: The Missing Ingredient for Fast Stylization](https://arxiv.org/abs/1607.08022). [`nn.InstanceNorm3d`](generated/torch.nn.instancenorm3d#torch.nn.InstanceNorm3d "torch.nn.InstanceNorm3d") | Applies Instance Normalization over a 5D input (a mini-batch of 3D inputs with additional channel dimension) as described in the paper [Instance Normalization: The Missing Ingredient for Fast Stylization](https://arxiv.org/abs/1607.08022). [`nn.LayerNorm`](generated/torch.nn.layernorm#torch.nn.LayerNorm "torch.nn.LayerNorm") | Applies Layer Normalization over a mini-batch of inputs as described in the paper [Layer Normalization](https://arxiv.org/abs/1607.06450) [`nn.LocalResponseNorm`](generated/torch.nn.localresponsenorm#torch.nn.LocalResponseNorm "torch.nn.LocalResponseNorm") | Applies local response normalization over an input signal composed of several input planes, where channels occupy the second dimension. ## Recurrent Layers [`nn.RNNBase`](generated/torch.nn.rnnbase#torch.nn.RNNBase "torch.nn.RNNBase") | ---|--- [`nn.RNN`](generated/torch.nn.rnn#torch.nn.RNN "torch.nn.RNN") | Applies a multi-layer Elman RNN with tanh⁡\tanh or ReLU\text{ReLU} non-linearity to an input sequence. [`nn.LSTM`](generated/torch.nn.lstm#torch.nn.LSTM "torch.nn.LSTM") | Applies a multi-layer long short-term memory (LSTM) RNN to an input sequence. [`nn.GRU`](generated/torch.nn.gru#torch.nn.GRU "torch.nn.GRU") | Applies a multi-layer gated recurrent unit (GRU) RNN to an input sequence. 
[`nn.RNNCell`](generated/torch.nn.rnncell#torch.nn.RNNCell "torch.nn.RNNCell") | An Elman RNN cell with tanh or ReLU non-linearity. [`nn.LSTMCell`](generated/torch.nn.lstmcell#torch.nn.LSTMCell "torch.nn.LSTMCell") | A long short-term memory (LSTM) cell. [`nn.GRUCell`](generated/torch.nn.grucell#torch.nn.GRUCell "torch.nn.GRUCell") | A gated recurrent unit (GRU) cell ## Transformer Layers [`nn.Transformer`](generated/torch.nn.transformer#torch.nn.Transformer "torch.nn.Transformer") | A transformer model. ---|--- [`nn.TransformerEncoder`](generated/torch.nn.transformerencoder#torch.nn.TransformerEncoder "torch.nn.TransformerEncoder") | TransformerEncoder is a stack of N encoder layers [`nn.TransformerDecoder`](generated/torch.nn.transformerdecoder#torch.nn.TransformerDecoder "torch.nn.TransformerDecoder") | TransformerDecoder is a stack of N decoder layers [`nn.TransformerEncoderLayer`](generated/torch.nn.transformerencoderlayer#torch.nn.TransformerEncoderLayer "torch.nn.TransformerEncoderLayer") | TransformerEncoderLayer is made up of self-attn and feedforward network. [`nn.TransformerDecoderLayer`](generated/torch.nn.transformerdecoderlayer#torch.nn.TransformerDecoderLayer "torch.nn.TransformerDecoderLayer") | TransformerDecoderLayer is made up of self-attn, multi-head-attn and feedforward network. ## Linear Layers [`nn.Identity`](generated/torch.nn.identity#torch.nn.Identity "torch.nn.Identity") | A placeholder identity operator that is argument-insensitive. ---|--- [`nn.Linear`](generated/torch.nn.linear#torch.nn.Linear "torch.nn.Linear") | Applies a linear transformation to the incoming data: y=xAT+by = xA^T + b [`nn.Bilinear`](generated/torch.nn.bilinear#torch.nn.Bilinear "torch.nn.Bilinear") | Applies a bilinear transformation to the incoming data: y=x1TAx2+by = x_1^T A x_2 + b [`nn.LazyLinear`](generated/torch.nn.lazylinear#torch.nn.LazyLinear "torch.nn.LazyLinear") | A [`torch.nn.Linear`](generated/torch.nn.linear#torch.nn.Linear "torch.nn.Linear") module with lazy initialization. ## Dropout Layers [`nn.Dropout`](generated/torch.nn.dropout#torch.nn.Dropout "torch.nn.Dropout") | During training, randomly zeroes some of the elements of the input tensor with probability `p` using samples from a Bernoulli distribution. ---|--- [`nn.Dropout2d`](generated/torch.nn.dropout2d#torch.nn.Dropout2d "torch.nn.Dropout2d") | Randomly zero out entire channels (a channel is a 2D feature map, e.g., the jj -th channel of the ii -th sample in the batched input is a 2D tensor input[i,j]\text{input}[i, j] ). [`nn.Dropout3d`](generated/torch.nn.dropout3d#torch.nn.Dropout3d "torch.nn.Dropout3d") | Randomly zero out entire channels (a channel is a 3D feature map, e.g., the jj -th channel of the ii -th sample in the batched input is a 3D tensor input[i,j]\text{input}[i, j] ). [`nn.AlphaDropout`](generated/torch.nn.alphadropout#torch.nn.AlphaDropout "torch.nn.AlphaDropout") | Applies Alpha Dropout over the input. ## Sparse Layers [`nn.Embedding`](generated/torch.nn.embedding#torch.nn.Embedding "torch.nn.Embedding") | A simple lookup table that stores embeddings of a fixed dictionary and size. ---|--- [`nn.EmbeddingBag`](generated/torch.nn.embeddingbag#torch.nn.EmbeddingBag "torch.nn.EmbeddingBag") | Computes sums or means of ‘bags’ of embeddings, without instantiating the intermediate embeddings. ## Distance Functions [`nn.CosineSimilarity`](generated/torch.nn.cosinesimilarity#torch.nn.CosineSimilarity "torch.nn.CosineSimilarity") | Returns cosine similarity between x1x_1 and x2x_2 , computed along dim. 
---|--- [`nn.PairwiseDistance`](generated/torch.nn.pairwisedistance#torch.nn.PairwiseDistance "torch.nn.PairwiseDistance") | Computes the batchwise pairwise distance between vectors v1v_1 , v2v_2 using the p-norm: ## Loss Functions [`nn.L1Loss`](generated/torch.nn.l1loss#torch.nn.L1Loss "torch.nn.L1Loss") | Creates a criterion that measures the mean absolute error (MAE) between each element in the input xx and target yy . ---|--- [`nn.MSELoss`](generated/torch.nn.mseloss#torch.nn.MSELoss "torch.nn.MSELoss") | Creates a criterion that measures the mean squared error (squared L2 norm) between each element in the input xx and target yy . [`nn.CrossEntropyLoss`](generated/torch.nn.crossentropyloss#torch.nn.CrossEntropyLoss "torch.nn.CrossEntropyLoss") | This criterion combines [`LogSoftmax`](generated/torch.nn.logsoftmax#torch.nn.LogSoftmax "torch.nn.LogSoftmax") and [`NLLLoss`](generated/torch.nn.nllloss#torch.nn.NLLLoss "torch.nn.NLLLoss") in one single class. [`nn.CTCLoss`](generated/torch.nn.ctcloss#torch.nn.CTCLoss "torch.nn.CTCLoss") | The Connectionist Temporal Classification loss. [`nn.NLLLoss`](generated/torch.nn.nllloss#torch.nn.NLLLoss "torch.nn.NLLLoss") | The negative log likelihood loss. [`nn.PoissonNLLLoss`](generated/torch.nn.poissonnllloss#torch.nn.PoissonNLLLoss "torch.nn.PoissonNLLLoss") | Negative log likelihood loss with Poisson distribution of target. [`nn.GaussianNLLLoss`](generated/torch.nn.gaussiannllloss#torch.nn.GaussianNLLLoss "torch.nn.GaussianNLLLoss") | Gaussian negative log likelihood loss. [`nn.KLDivLoss`](generated/torch.nn.kldivloss#torch.nn.KLDivLoss "torch.nn.KLDivLoss") | The Kullback-Leibler divergence loss measure [`nn.BCELoss`](generated/torch.nn.bceloss#torch.nn.BCELoss "torch.nn.BCELoss") | Creates a criterion that measures the Binary Cross Entropy between the target and the output: [`nn.BCEWithLogitsLoss`](generated/torch.nn.bcewithlogitsloss#torch.nn.BCEWithLogitsLoss "torch.nn.BCEWithLogitsLoss") | This loss combines a `Sigmoid` layer and the `BCELoss` in one single class. [`nn.MarginRankingLoss`](generated/torch.nn.marginrankingloss#torch.nn.MarginRankingLoss "torch.nn.MarginRankingLoss") | Creates a criterion that measures the loss given inputs x1x1 , x2x2 , two 1D mini-batch `Tensors`, and a label 1D mini-batch tensor yy (containing 1 or -1). [`nn.HingeEmbeddingLoss`](generated/torch.nn.hingeembeddingloss#torch.nn.HingeEmbeddingLoss "torch.nn.HingeEmbeddingLoss") | Measures the loss given an input tensor xx and a labels tensor yy (containing 1 or -1). [`nn.MultiLabelMarginLoss`](generated/torch.nn.multilabelmarginloss#torch.nn.MultiLabelMarginLoss "torch.nn.MultiLabelMarginLoss") | Creates a criterion that optimizes a multi-class multi-classification hinge loss (margin-based loss) between input xx (a 2D mini-batch `Tensor`) and output yy (which is a 2D `Tensor` of target class indices). [`nn.SmoothL1Loss`](generated/torch.nn.smoothl1loss#torch.nn.SmoothL1Loss "torch.nn.SmoothL1Loss") | Creates a criterion that uses a squared term if the absolute element-wise error falls below beta and an L1 term otherwise. [`nn.SoftMarginLoss`](generated/torch.nn.softmarginloss#torch.nn.SoftMarginLoss "torch.nn.SoftMarginLoss") | Creates a criterion that optimizes a two-class classification logistic loss between input tensor xx and target tensor yy (containing 1 or -1). 
[`nn.MultiLabelSoftMarginLoss`](generated/torch.nn.multilabelsoftmarginloss#torch.nn.MultiLabelSoftMarginLoss "torch.nn.MultiLabelSoftMarginLoss") | Creates a criterion that optimizes a multi-label one-versus-all loss based on max-entropy, between input xx and target yy of size (N,C)(N, C) . [`nn.CosineEmbeddingLoss`](generated/torch.nn.cosineembeddingloss#torch.nn.CosineEmbeddingLoss "torch.nn.CosineEmbeddingLoss") | Creates a criterion that measures the loss given input tensors x1x_1 , x2x_2 and a `Tensor` label yy with values 1 or -1. [`nn.MultiMarginLoss`](generated/torch.nn.multimarginloss#torch.nn.MultiMarginLoss "torch.nn.MultiMarginLoss") | Creates a criterion that optimizes a multi-class classification hinge loss (margin-based loss) between input xx (a 2D mini-batch `Tensor`) and output yy (which is a 1D tensor of target class indices, 0≤y≤x.size(1)−10 \leq y \leq \text{x.size}(1)-1 ): [`nn.TripletMarginLoss`](generated/torch.nn.tripletmarginloss#torch.nn.TripletMarginLoss "torch.nn.TripletMarginLoss") | Creates a criterion that measures the triplet loss given an input tensors x1x1 , x2x2 , x3x3 and a margin with a value greater than 00 . [`nn.TripletMarginWithDistanceLoss`](generated/torch.nn.tripletmarginwithdistanceloss#torch.nn.TripletMarginWithDistanceLoss "torch.nn.TripletMarginWithDistanceLoss") | Creates a criterion that measures the triplet loss given input tensors aa , pp , and nn (representing anchor, positive, and negative examples, respectively), and a nonnegative, real-valued function (“distance function”) used to compute the relationship between the anchor and positive example (“positive distance”) and the anchor and negative example (“negative distance”). ## Vision Layers [`nn.PixelShuffle`](generated/torch.nn.pixelshuffle#torch.nn.PixelShuffle "torch.nn.PixelShuffle") | Rearranges elements in a tensor of shape (∗,C×r2,H,W)(*, C \times r^2, H, W) to a tensor of shape (∗,C,H×r,W×r)(*, C, H \times r, W \times r) , where r is an upscale factor. ---|--- [`nn.PixelUnshuffle`](generated/torch.nn.pixelunshuffle#torch.nn.PixelUnshuffle "torch.nn.PixelUnshuffle") | Reverses the [`PixelShuffle`](generated/torch.nn.pixelshuffle#torch.nn.PixelShuffle "torch.nn.PixelShuffle") operation by rearranging elements in a tensor of shape (∗,C,H×r,W×r)(*, C, H \times r, W \times r) to a tensor of shape (∗,C×r2,H,W)(*, C \times r^2, H, W) , where r is a downscale factor. [`nn.Upsample`](generated/torch.nn.upsample#torch.nn.Upsample "torch.nn.Upsample") | Upsamples a given multi-channel 1D (temporal), 2D (spatial) or 3D (volumetric) data. [`nn.UpsamplingNearest2d`](generated/torch.nn.upsamplingnearest2d#torch.nn.UpsamplingNearest2d "torch.nn.UpsamplingNearest2d") | Applies a 2D nearest neighbor upsampling to an input signal composed of several input channels. [`nn.UpsamplingBilinear2d`](generated/torch.nn.upsamplingbilinear2d#torch.nn.UpsamplingBilinear2d "torch.nn.UpsamplingBilinear2d") | Applies a 2D bilinear upsampling to an input signal composed of several input channels. ## Shuffle Layers [`nn.ChannelShuffle`](generated/torch.nn.channelshuffle#torch.nn.ChannelShuffle "torch.nn.ChannelShuffle") | Divide the channels in a tensor of shape (∗,C,H,W)(*, C , H, W) into g groups and rearrange them as (∗,Cg,g,H,W)(*, C \frac g, g, H, W) , while keeping the original tensor shape. ---|--- ## DataParallel Layers (multi-GPU, distributed) [`nn.DataParallel`](generated/torch.nn.dataparallel#torch.nn.DataParallel "torch.nn.DataParallel") | Implements data parallelism at the module level. 
---|--- [`nn.parallel.DistributedDataParallel`](generated/torch.nn.parallel.distributeddataparallel#torch.nn.parallel.DistributedDataParallel "torch.nn.parallel.DistributedDataParallel") | Implements distributed data parallelism that is based on `torch.distributed` package at the module level. ## Utilities From the `torch.nn.utils` module [`clip_grad_norm_`](generated/torch.nn.utils.clip_grad_norm_#torch.nn.utils.clip_grad_norm_ "torch.nn.utils.clip_grad_norm_") | Clips gradient norm of an iterable of parameters. ---|--- [`clip_grad_value_`](generated/torch.nn.utils.clip_grad_value_#torch.nn.utils.clip_grad_value_ "torch.nn.utils.clip_grad_value_") | Clips gradient of an iterable of parameters at specified value. [`parameters_to_vector`](generated/torch.nn.utils.parameters_to_vector#torch.nn.utils.parameters_to_vector "torch.nn.utils.parameters_to_vector") | Convert parameters to one vector [`vector_to_parameters`](generated/torch.nn.utils.vector_to_parameters#torch.nn.utils.vector_to_parameters "torch.nn.utils.vector_to_parameters") | Convert one vector to the parameters [`prune.BasePruningMethod`](generated/torch.nn.utils.prune.basepruningmethod#torch.nn.utils.prune.BasePruningMethod "torch.nn.utils.prune.BasePruningMethod") | Abstract base class for creation of new pruning techniques. ---|--- [`prune.PruningContainer`](generated/torch.nn.utils.prune.pruningcontainer#torch.nn.utils.prune.PruningContainer "torch.nn.utils.prune.PruningContainer") | Container holding a sequence of pruning methods for iterative pruning. ---|--- [`prune.Identity`](generated/torch.nn.utils.prune.identity#torch.nn.utils.prune.Identity "torch.nn.utils.prune.Identity") | Utility pruning method that does not prune any units but generates the pruning parametrization with a mask of ones. [`prune.RandomUnstructured`](generated/torch.nn.utils.prune.randomunstructured#torch.nn.utils.prune.RandomUnstructured "torch.nn.utils.prune.RandomUnstructured") | Prune (currently unpruned) units in a tensor at random. [`prune.L1Unstructured`](generated/torch.nn.utils.prune.l1unstructured#torch.nn.utils.prune.L1Unstructured "torch.nn.utils.prune.L1Unstructured") | Prune (currently unpruned) units in a tensor by zeroing out the ones with the lowest L1-norm. [`prune.RandomStructured`](generated/torch.nn.utils.prune.randomstructured#torch.nn.utils.prune.RandomStructured "torch.nn.utils.prune.RandomStructured") | Prune entire (currently unpruned) channels in a tensor at random. [`prune.LnStructured`](generated/torch.nn.utils.prune.lnstructured#torch.nn.utils.prune.LnStructured "torch.nn.utils.prune.LnStructured") | Prune entire (currently unpruned) channels in a tensor based on their Ln-norm. [`prune.CustomFromMask`](generated/torch.nn.utils.prune.customfrommask#torch.nn.utils.prune.CustomFromMask "torch.nn.utils.prune.CustomFromMask") | [`prune.identity`](generated/torch.nn.utils.prune.identity#torch.nn.utils.prune.identity "torch.nn.utils.prune.identity") | Applies pruning reparametrization to the tensor corresponding to the parameter called `name` in `module` without actually pruning any units. [`prune.random_unstructured`](generated/torch.nn.utils.prune.random_unstructured#torch.nn.utils.prune.random_unstructured "torch.nn.utils.prune.random_unstructured") | Prunes tensor corresponding to parameter called `name` in `module` by removing the specified `amount` of (currently unpruned) units selected at random. 
[`prune.l1_unstructured`](generated/torch.nn.utils.prune.l1_unstructured#torch.nn.utils.prune.l1_unstructured "torch.nn.utils.prune.l1_unstructured") | Prunes tensor corresponding to parameter called `name` in `module` by removing the specified `amount` of (currently unpruned) units with the lowest L1-norm. [`prune.random_structured`](generated/torch.nn.utils.prune.random_structured#torch.nn.utils.prune.random_structured "torch.nn.utils.prune.random_structured") | Prunes tensor corresponding to parameter called `name` in `module` by removing the specified `amount` of (currently unpruned) channels along the specified `dim` selected at random. [`prune.ln_structured`](generated/torch.nn.utils.prune.ln_structured#torch.nn.utils.prune.ln_structured "torch.nn.utils.prune.ln_structured") | Prunes tensor corresponding to parameter called `name` in `module` by removing the specified `amount` of (currently unpruned) channels along the specified `dim` with the lowest L``n``-norm. [`prune.global_unstructured`](generated/torch.nn.utils.prune.global_unstructured#torch.nn.utils.prune.global_unstructured "torch.nn.utils.prune.global_unstructured") | Globally prunes tensors corresponding to all parameters in `parameters` by applying the specified `pruning_method`. [`prune.custom_from_mask`](generated/torch.nn.utils.prune.custom_from_mask#torch.nn.utils.prune.custom_from_mask "torch.nn.utils.prune.custom_from_mask") | Prunes tensor corresponding to parameter called `name` in `module` by applying the pre-computed mask in `mask`. [`prune.remove`](generated/torch.nn.utils.prune.remove#torch.nn.utils.prune.remove "torch.nn.utils.prune.remove") | Removes the pruning reparameterization from a module and the pruning method from the forward hook. [`prune.is_pruned`](generated/torch.nn.utils.prune.is_pruned#torch.nn.utils.prune.is_pruned "torch.nn.utils.prune.is_pruned") | Check whether `module` is pruned by looking for `forward_pre_hooks` in its modules that inherit from the `BasePruningMethod`. [`weight_norm`](generated/torch.nn.utils.weight_norm#torch.nn.utils.weight_norm "torch.nn.utils.weight_norm") | Applies weight normalization to a parameter in the given module. [`remove_weight_norm`](generated/torch.nn.utils.remove_weight_norm#torch.nn.utils.remove_weight_norm "torch.nn.utils.remove_weight_norm") | Removes the weight normalization reparameterization from a module. [`spectral_norm`](generated/torch.nn.utils.spectral_norm#torch.nn.utils.spectral_norm "torch.nn.utils.spectral_norm") | Applies spectral normalization to a parameter in the given module. [`remove_spectral_norm`](generated/torch.nn.utils.remove_spectral_norm#torch.nn.utils.remove_spectral_norm "torch.nn.utils.remove_spectral_norm") | Removes the spectral normalization reparameterization from a module. Utility functions in other modules [`nn.utils.rnn.PackedSequence`](generated/torch.nn.utils.rnn.packedsequence#torch.nn.utils.rnn.PackedSequence "torch.nn.utils.rnn.PackedSequence") | Holds the data and list of `batch_sizes` of a packed sequence. ---|--- [`nn.utils.rnn.pack_padded_sequence`](generated/torch.nn.utils.rnn.pack_padded_sequence#torch.nn.utils.rnn.pack_padded_sequence "torch.nn.utils.rnn.pack_padded_sequence") | Packs a Tensor containing padded sequences of variable length. [`nn.utils.rnn.pad_packed_sequence`](generated/torch.nn.utils.rnn.pad_packed_sequence#torch.nn.utils.rnn.pad_packed_sequence "torch.nn.utils.rnn.pad_packed_sequence") | Pads a packed batch of variable length sequences. 
[`nn.utils.rnn.pad_sequence`](generated/torch.nn.utils.rnn.pad_sequence#torch.nn.utils.rnn.pad_sequence "torch.nn.utils.rnn.pad_sequence") | Pad a list of variable length Tensors with `padding_value` [`nn.utils.rnn.pack_sequence`](generated/torch.nn.utils.rnn.pack_sequence#torch.nn.utils.rnn.pack_sequence "torch.nn.utils.rnn.pack_sequence") | Packs a list of variable length Tensors [`nn.Flatten`](generated/torch.nn.flatten#torch.nn.Flatten "torch.nn.Flatten") | Flattens a contiguous range of dims into a tensor. [`nn.Unflatten`](generated/torch.nn.unflatten#torch.nn.Unflatten "torch.nn.Unflatten") | Unflattens a tensor dim expanding it to a desired shape. ## Quantized Functions Quantization refers to techniques for performing computations and storing tensors at lower bitwidths than floating point precision. PyTorch supports both per tensor and per channel asymmetric linear quantization. To learn more how to use quantized functions in PyTorch, please refer to the [Quantization](quantization#quantization-doc) documentation. ## Lazy Modules Initialization [`nn.modules.lazy.LazyModuleMixin`](generated/torch.nn.modules.lazy.lazymodulemixin#torch.nn.modules.lazy.LazyModuleMixin "torch.nn.modules.lazy.LazyModuleMixin") | A mixin for modules that lazily initialize parameters, also known as “lazy modules.” ---|--- # torch.nn.init `torch.nn.init.calculate_gain(nonlinearity, param=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/init.html#calculate_gain) Return the recommended gain value for the given nonlinearity function. The values are as follows: nonlinearity | gain ---|--- Linear / Identity | 11 Conv{1,2,3}D | 11 Sigmoid | 11 Tanh | 53\frac{5}{3} ReLU | 2\sqrt{2} Leaky Relu | 21+negative_slope2\sqrt{\frac{2}{1 + \text{negative\\_slope}^2}} SELU | 34\frac{3}{4} Parameters * **nonlinearity** – the non-linear function (`nn.functional` name) * **param** – optional parameter for the non-linear function #### Examples >>> gain = nn.init.calculate_gain('leaky_relu', 0.2) # leaky_relu with negative_slope=0.2 `torch.nn.init.uniform_(tensor, a=0.0, b=1.0)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/init.html#uniform_) Fills the input Tensor with values drawn from the uniform distribution U(a,b)\mathcal{U}(a, b) . Parameters * **tensor** – an n-dimensional `torch.Tensor` * **a** – the lower bound of the uniform distribution * **b** – the upper bound of the uniform distribution #### Examples >>> w = torch.empty(3, 5) >>> nn.init.uniform_(w) `torch.nn.init.normal_(tensor, mean=0.0, std=1.0)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/init.html#normal_) Fills the input Tensor with values drawn from the normal distribution N(mean,std2)\mathcal{N}(\text{mean}, \text{std}^2) . Parameters * **tensor** – an n-dimensional `torch.Tensor` * **mean** – the mean of the normal distribution * **std** – the standard deviation of the normal distribution #### Examples >>> w = torch.empty(3, 5) >>> nn.init.normal_(w) `torch.nn.init.constant_(tensor, val)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/init.html#constant_) Fills the input Tensor with the value val\text{val} . Parameters * **tensor** – an n-dimensional `torch.Tensor` * **val** – the value to fill the tensor with #### Examples >>> w = torch.empty(3, 5) >>> nn.init.constant_(w, 0.3) `torch.nn.init.ones_(tensor)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/init.html#ones_) Fills the input Tensor with the scalar value `1`. 
Parameters **tensor** – an n-dimensional `torch.Tensor` #### Examples >>> w = torch.empty(3, 5) >>> nn.init.ones_(w) `torch.nn.init.zeros_(tensor)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/init.html#zeros_) Fills the input Tensor with the scalar value `0`. Parameters **tensor** – an n-dimensional `torch.Tensor` #### Examples >>> w = torch.empty(3, 5) >>> nn.init.zeros_(w) `torch.nn.init.eye_(tensor)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/init.html#eye_) Fills the 2-dimensional input `Tensor` with the identity matrix. Preserves the identity of the inputs in `Linear` layers, where as many inputs are preserved as possible. Parameters **tensor** – a 2-dimensional `torch.Tensor` #### Examples >>> w = torch.empty(3, 5) >>> nn.init.eye_(w) `torch.nn.init.dirac_(tensor, groups=1)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/init.html#dirac_) Fills the {3, 4, 5}-dimensional input `Tensor` with the Dirac delta function. Preserves the identity of the inputs in `Convolutional` layers, where as many input channels are preserved as possible. In case of groups>1, each group of channels preserves identity Parameters * **tensor** – a {3, 4, 5}-dimensional `torch.Tensor` * **groups** (_optional_) – number of groups in the conv layer (default: 1) #### Examples >>> w = torch.empty(3, 16, 5, 5) >>> nn.init.dirac_(w) >>> w = torch.empty(3, 24, 5, 5) >>> nn.init.dirac_(w, 3) `torch.nn.init.xavier_uniform_(tensor, gain=1.0)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/init.html#xavier_uniform_) Fills the input `Tensor` with values according to the method described in `Understanding the difficulty of training deep feedforward neural networks` \- Glorot, X. & Bengio, Y. (2010), using a uniform distribution. The resulting tensor will have values sampled from U(−a,a)\mathcal{U}(-a, a) where a=gain×6fan_in+fan_outa = \text{gain} \times \sqrt{\frac{6}{\text{fan\\_in} + \text{fan\\_out}}} Also known as Glorot initialization. Parameters * **tensor** – an n-dimensional `torch.Tensor` * **gain** – an optional scaling factor #### Examples >>> w = torch.empty(3, 5) >>> nn.init.xavier_uniform_(w, gain=nn.init.calculate_gain('relu')) `torch.nn.init.xavier_normal_(tensor, gain=1.0)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/init.html#xavier_normal_) Fills the input `Tensor` with values according to the method described in `Understanding the difficulty of training deep feedforward neural networks` \- Glorot, X. & Bengio, Y. (2010), using a normal distribution. The resulting tensor will have values sampled from N(0,std2)\mathcal{N}(0, \text{std}^2) where std=gain×2fan_in+fan_out\text{std} = \text{gain} \times \sqrt{\frac{2}{\text{fan\\_in} + \text{fan\\_out}}} Also known as Glorot initialization. Parameters * **tensor** – an n-dimensional `torch.Tensor` * **gain** – an optional scaling factor #### Examples >>> w = torch.empty(3, 5) >>> nn.init.xavier_normal_(w) `torch.nn.init.kaiming_uniform_(tensor, a=0, mode='fan_in', nonlinearity='leaky_relu')` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/init.html#kaiming_uniform_) Fills the input `Tensor` with values according to the method described in `Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification` \- He, K. et al. (2015), using a uniform distribution. 
The resulting tensor will have values sampled from U(−bound,bound)\mathcal{U}(-\text{bound}, \text{bound}) where bound=gain×3fan_mode\text{bound} = \text{gain} \times \sqrt{\frac{3}{\text{fan\\_mode}}} Also known as He initialization. Parameters * **tensor** – an n-dimensional `torch.Tensor` * **a** – the negative slope of the rectifier used after this layer (only used with `'leaky_relu'`) * **mode** – either `'fan_in'` (default) or `'fan_out'`. Choosing `'fan_in'` preserves the magnitude of the variance of the weights in the forward pass. Choosing `'fan_out'` preserves the magnitudes in the backwards pass. * **nonlinearity** – the non-linear function (`nn.functional` name), recommended to use only with `'relu'` or `'leaky_relu'` (default). #### Examples >>> w = torch.empty(3, 5) >>> nn.init.kaiming_uniform_(w, mode='fan_in', nonlinearity='relu') `torch.nn.init.kaiming_normal_(tensor, a=0, mode='fan_in', nonlinearity='leaky_relu')` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/init.html#kaiming_normal_) Fills the input `Tensor` with values according to the method described in `Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification` \- He, K. et al. (2015), using a normal distribution. The resulting tensor will have values sampled from N(0,std2)\mathcal{N}(0, \text{std}^2) where std=gainfan_mode\text{std} = \frac{\text{gain}}{\sqrt{\text{fan\\_mode}}} Also known as He initialization. Parameters * **tensor** – an n-dimensional `torch.Tensor` * **a** – the negative slope of the rectifier used after this layer (only used with `'leaky_relu'`) * **mode** – either `'fan_in'` (default) or `'fan_out'`. Choosing `'fan_in'` preserves the magnitude of the variance of the weights in the forward pass. Choosing `'fan_out'` preserves the magnitudes in the backwards pass. * **nonlinearity** – the non-linear function (`nn.functional` name), recommended to use only with `'relu'` or `'leaky_relu'` (default). #### Examples >>> w = torch.empty(3, 5) >>> nn.init.kaiming_normal_(w, mode='fan_out', nonlinearity='relu') `torch.nn.init.orthogonal_(tensor, gain=1)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/init.html#orthogonal_) Fills the input `Tensor` with a (semi) orthogonal matrix, as described in `Exact solutions to the nonlinear dynamics of learning in deep linear neural networks` \- Saxe, A. et al. (2013). The input tensor must have at least 2 dimensions, and for tensors with more than 2 dimensions the trailing dimensions are flattened. Parameters * **tensor** – an n-dimensional `torch.Tensor`, where n≥2n \geq 2 * **gain** – optional scaling factor #### Examples >>> w = torch.empty(3, 5) >>> nn.init.orthogonal_(w) `torch.nn.init.sparse_(tensor, sparsity, std=0.01)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/init.html#sparse_) Fills the 2D input `Tensor` as a sparse matrix, where the non-zero elements will be drawn from the normal distribution N(0,0.01)\mathcal{N}(0, 0.01) , as described in `Deep learning via Hessian-free optimization` \- Martens, J. (2010). 
Parameters * **tensor** – an n-dimensional `torch.Tensor` * **sparsity** – The fraction of elements in each column to be set to zero * **std** – the standard deviation of the normal distribution used to generate the non-zero values #### Examples >>> w = torch.empty(3, 5) >>> nn.init.sparse_(w, sparsity=0.1) # torch.onnx * Example: End-to-end AlexNet from PyTorch to ONNX * Tracing vs Scripting * Write PyTorch model in Torch way * Using dictionaries to handle Named Arguments as model inputs * Indexing * Getter * Setter * TorchVision support * Limitations * Supported operators * Adding support for operators * ATen operators * Non-ATen operators * Custom operators * Operator Export Type * ONNX * ONNX_ATEN * ONNX_ATEN_FALLBACK * RAW * ONNX_FALLTHROUGH * Frequently Asked Questions * Use external data format * Training * Functions ## Example: End-to-end AlexNet from PyTorch to ONNX Here is a simple script which exports a pretrained AlexNet as defined in torchvision into ONNX. It runs a single round of inference and then saves the resulting traced model to `alexnet.onnx`: import torch import torchvision dummy_input = torch.randn(10, 3, 224, 224, device='cuda') model = torchvision.models.alexnet(pretrained=True).cuda() # Providing input and output names sets the display names for values # within the model's graph. Setting these does not change the semantics # of the graph; it is only for readability. # # The inputs to the network consist of the flat list of inputs (i.e. # the values you would pass to the forward() method) followed by the # flat list of parameters. You can partially specify names, i.e. provide # a list here shorter than the number of inputs to the model, and we will # only set that subset of names, starting from the beginning. input_names = [ "actual_input_1" ] + [ "learned_%d" % i for i in range(16) ] output_names = [ "output1" ] torch.onnx.export(model, dummy_input, "alexnet.onnx", verbose=True, input_names=input_names, output_names=output_names) The resulting `alexnet.onnx` is a binary protobuf file which contains both the network structure and parameters of the model you exported (in this case, AlexNet). The keyword argument `verbose=True` causes the exporter to print out a human-readable representation of the network: # These are the inputs and parameters to the network, which have taken on # the names we specified earlier. graph(%actual_input_1 : Float(10, 3, 224, 224) %learned_0 : Float(64, 3, 11, 11) %learned_1 : Float(64) %learned_2 : Float(192, 64, 5, 5) %learned_3 : Float(192) # ---- omitted for brevity ---- %learned_14 : Float(1000, 4096) %learned_15 : Float(1000)) { # Every statement consists of some output tensors (and their types), # the operator to be run (with its attributes, e.g., kernels, strides, # etc.), its input tensors (%actual_input_1, %learned_0, %learned_1) %17 : Float(10, 64, 55, 55) = onnx::Conv[dilations=[1, 1], group=1, kernel_shape=[11, 11], pads=[2, 2, 2, 2], strides=[4, 4]](%actual_input_1, %learned_0, %learned_1), scope: AlexNet/Sequential[features]/Conv2d[0] %18 : Float(10, 64, 55, 55) = onnx::Relu(%17), scope: AlexNet/Sequential[features]/ReLU[1] %19 : Float(10, 64, 27, 27) = onnx::MaxPool[kernel_shape=[3, 3], pads=[0, 0, 0, 0], strides=[2, 2]](%18), scope: AlexNet/Sequential[features]/MaxPool2d[2] # ---- omitted for brevity ---- %29 : Float(10, 256, 6, 6) = onnx::MaxPool[kernel_shape=[3, 3], pads=[0, 0, 0, 0], strides=[2, 2]](%28), scope: AlexNet/Sequential[features]/MaxPool2d[12] # Dynamic means that the shape is not known. 
This may be because of a # limitation of our implementation (which we would like to fix in a # future release) or shapes which are truly dynamic. %30 : Dynamic = onnx::Shape(%29), scope: AlexNet %31 : Dynamic = onnx::Slice[axes=[0], ends=[1], starts=[0]](%30), scope: AlexNet %32 : Long() = onnx::Squeeze[axes=[0]](%31), scope: AlexNet %33 : Long() = onnx::Constant[value={9216}](), scope: AlexNet # ---- omitted for brevity ---- %output1 : Float(10, 1000) = onnx::Gemm[alpha=1, beta=1, broadcast=1, transB=1](%45, %learned_14, %learned_15), scope: AlexNet/Sequential[classifier]/Linear[6] return (%output1); } You can also verify the protobuf using the [ONNX](https://github.com/onnx/onnx/) library. You can install `ONNX` with conda: conda install -c conda-forge onnx Then, you can run: import onnx # Load the ONNX model model = onnx.load("alexnet.onnx") # Check that the IR is well formed onnx.checker.check_model(model) # Print a human readable representation of the graph onnx.helper.printable_graph(model.graph) To run the exported script with [caffe2](https://caffe2.ai/), you will need to install `caffe2`: If you don’t have one already, Please [follow the install instructions](https://caffe2.ai/docs/getting-started.html). Once these are installed, you can use the backend for Caffe2: # ...continuing from above import caffe2.python.onnx.backend as backend import numpy as np rep = backend.prepare(model, device="CUDA:0") # or "CPU" # For the Caffe2 backend: # rep.predict_net is the Caffe2 protobuf for the network # rep.workspace is the Caffe2 workspace for the network # (see the class caffe2.python.onnx.backend.Workspace) outputs = rep.run(np.random.randn(10, 3, 224, 224).astype(np.float32)) # To run networks with more than one input, pass a tuple # rather than a single numpy ndarray. print(outputs[0]) You can also run the exported model with [ONNX Runtime](https://github.com/microsoft/onnxruntime), you will need to install `ONNX Runtime`: please [follow these instructions](https://github.com/microsoft/onnxruntime#installation). Once these are installed, you can use the backend for ONNX Runtime: # ...continuing from above import onnxruntime as ort ort_session = ort.InferenceSession('alexnet.onnx') outputs = ort_session.run(None, {'actual_input_1': np.random.randn(10, 3, 224, 224).astype(np.float32)}) print(outputs[0]) Here is another [tutorial of exporting the SuperResolution model to ONNX.](https://pytorch.org/tutorials/advanced/super_resolution_with_onnxruntime.html). In the future, there will be backends for other frameworks as well. ## Tracing vs Scripting The ONNX exporter can be both _trace-based_ and _script-based_ exporter. * _trace-based_ means that it operates by executing your model once, and exporting the operators which were actually run during this run. This means that if your model is dynamic, e.g., changes behavior depending on input data, the export won’t be accurate. Similarly, a trace is likely to be valid only for a specific input size (which is one reason why we require explicit inputs on tracing.) We recommend examining the model trace and making sure the traced operators look reasonable. If your model contains control flows like for loops and if conditions, _trace-based_ exporter will unroll the loops and if conditions, exporting a static graph that is exactly the same as this run. If you want to export your model with dynamic control flows, you will need to use the _script-based_ exporter. * _script-based_ means that the model you are trying to export is a [ScriptModule](jit). 
`ScriptModule` is the core data structure in `TorchScript`, and `TorchScript` is a subset of Python language, that creates serializable and optimizable models from PyTorch code. We allow mixing tracing and scripting. You can compose tracing and scripting to suit the particular requirements of a part of a model. Checkout this example: import torch # Trace-based only class LoopModel(torch.nn.Module): def forward(self, x, y): for i in range(y): x = x + i return x model = LoopModel() dummy_input = torch.ones(2, 3, dtype=torch.long) loop_count = torch.tensor(5, dtype=torch.long) torch.onnx.export(model, (dummy_input, loop_count), 'loop.onnx', verbose=True) With _trace-based_ exporter, we get the result ONNX graph which unrolls the for loop: graph(%0 : Long(2, 3), %1 : Long()): %2 : Tensor = onnx::Constant[value={1}]() %3 : Tensor = onnx::Add(%0, %2) %4 : Tensor = onnx::Constant[value={2}]() %5 : Tensor = onnx::Add(%3, %4) %6 : Tensor = onnx::Constant[value={3}]() %7 : Tensor = onnx::Add(%5, %6) %8 : Tensor = onnx::Constant[value={4}]() %9 : Tensor = onnx::Add(%7, %8) return (%9) To utilize _script-based_ exporter for capturing the dynamic loop, we can write the loop in script, and call it from the regular nn.Module: # Mixing tracing and scripting @torch.jit.script def loop(x, y): for i in range(int(y)): x = x + i return x class LoopModel2(torch.nn.Module): def forward(self, x, y): return loop(x, y) model = LoopModel2() dummy_input = torch.ones(2, 3, dtype=torch.long) loop_count = torch.tensor(5, dtype=torch.long) torch.onnx.export(model, (dummy_input, loop_count), 'loop.onnx', verbose=True, input_names=['input_data', 'loop_range']) Now the exported ONNX graph becomes: graph(%input_data : Long(2, 3), %loop_range : Long()): %2 : Long() = onnx::Constant[value={1}](), scope: LoopModel2/loop %3 : Tensor = onnx::Cast[to=9](%2) %4 : Long(2, 3) = onnx::Loop(%loop_range, %3, %input_data), scope: LoopModel2/loop # custom_loop.py:240:5 block0(%i.1 : Long(), %cond : bool, %x.6 : Long(2, 3)): %8 : Long(2, 3) = onnx::Add(%x.6, %i.1), scope: LoopModel2/loop # custom_loop.py:241:13 %9 : Tensor = onnx::Cast[to=9](%2) -> (%9, %8) return (%4) The dynamic control flow is captured correctly. We can verify in backends with different loop range. import caffe2.python.onnx.backend as backend import numpy as np import onnx model = onnx.load('loop.onnx') rep = backend.prepare(model) outputs = rep.run((dummy_input.numpy(), np.array(9).astype(np.int64))) print(outputs[0]) #[[37 37 37] # [37 37 37]] import onnxruntime as ort ort_sess = ort.InferenceSession('loop.onnx') outputs = ort_sess.run(None, {'input_data': dummy_input.numpy(), 'loop_range': np.array(9).astype(np.int64)}) print(outputs) #[array([[37, 37, 37], # [37, 37, 37]], dtype=int64)] To avoid exporting a variable scalar tensor as a fixed value constant as part of the ONNX model, please avoid use of `torch.Tensor.item()`. Torch supports implicit cast of single-element tensors to numbers. E.g.: class LoopModel(torch.nn.Module): def forward(self, x, y): res = [] arr = x.split(2, 0) for i in range(int(y)): res += [arr[i].sum(0, False)] return torch.stack(res) model = torch.jit.script(LoopModel()) inputs = (torch.randn(16), torch.tensor(8)) out = model(*inputs) torch.onnx.export(model, inputs, 'loop_and_list.onnx', opset_version=11, example_outputs=out) ## Write PyTorch model in Torch way PyTorch models can be written using numpy manipulations, but this is not proper when we convert to the ONNX model. 
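For example, the two modules below compute the same concatenation, but only the second one traces into a graph that stays correct for new inputs (a minimal sketch; the module names are illustrative):

import numpy as np
import torch

class NumpyConcat(torch.nn.Module):
    def forward(self, x, y):
        # The round-trip through numpy is invisible to the tracer, so the
        # concatenated values computed from the example inputs are baked
        # into the exported graph as a constant.
        return torch.from_numpy(np.concatenate((x.numpy(), y.numpy()), axis=1))

class TorchConcat(torch.nn.Module):
    def forward(self, x, y):
        # torch.cat is recorded as a real operator and exports correctly.
        return torch.cat((x, y), dim=1)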
The trace-based exporter records numpy values as constant nodes, so the exported graph computes the wrong result when the input changes. The PyTorch model therefore needs to be implemented with torch operators. For example, do not use numpy operators on numpy arrays:

np.concatenate((x, y, z), axis=1)

and do not convert to numpy types:

y = x.astype(np.int)

Always use torch tensors and torch operators: torch.cat, etc. In addition, the Dropout layer needs to be defined in the constructor (`__init__`) so that inference handles it properly, i.e.:

class MyModule(nn.Module):
    def __init__(self):
        super().__init__()
        self.dropout = nn.Dropout(0.5)

    def forward(self, x):
        x = self.dropout(x)
        return x

## Using dictionaries to handle Named Arguments as model inputs

There are two ways to handle models whose inputs include named parameters or keyword arguments:

* The first method is to pass all the inputs in the same order as required by the model and pass None values for the keyword arguments that do not require a value to be passed.
* The second and more intuitive method is to represent the keyword arguments as key-value pairs, where the key is the name of the argument in the model signature and the value is the argument to be passed.

For example, in the model:

class Model(torch.nn.Module):
    def forward(self, x, y=None, z=None):
        if y is not None:
            return x + y
        if z is not None:
            return x + z
        return x

model = Model()
x = torch.randn(2, 3)
z = torch.randn(2, 3)

There are two ways of exporting the model:

* Not using a dictionary for the keyword arguments and passing all the inputs in the same order as required by the model:

torch.onnx.export(model, (x, None, z), 'test.onnx')

* Using a dictionary to represent the keyword arguments. This dictionary is always passed in addition to the non-keyword arguments and is always the last argument in the args tuple:

torch.onnx.export(model, (x, {'y': None, 'z': z}), 'test.onnx')

For cases in which there are no keyword arguments, models can be exported with either an empty dictionary or no dictionary. For example,

torch.onnx.export(model, (x, {}), 'test.onnx')

or

torch.onnx.export(model, (x, ), 'test.onnx')

The exception to this rule is the case in which the last input is itself of a dictionary type. In that case it is mandatory to pass an empty dictionary as the last argument in the args tuple. For example,

class Model(torch.nn.Module):
    def forward(self, k, x):
        ...
        return x

model = Model()
k = torch.randn(2, 3)
x = {torch.tensor(1.): torch.randn(2, 3)}

Without the empty dictionary, the export call would assume that the 'x' input is the optional dictionary of named arguments. To prevent this from being an issue, a constraint is placed requiring an empty dictionary as the last input in the args tuple in such cases. The new call would look like this:

torch.onnx.export(model, (k, x, {}), 'test.onnx')

## Indexing

Tensor indexing in PyTorch is very flexible and complicated. There are two categories of indexing, and both are largely supported in export today. If you are experiencing issues exporting indexing that belongs to the supported patterns below, please double check that you are exporting with the latest opset (opset_version=12).

### Getter

This type of indexing occurs on the RHS. Export is supported for ONNX opset version >= 9. E.g.:

data = torch.randn(3, 4)
index = torch.tensor([1, 2])

# RHS indexing is supported in ONNX opset >= 11.
class RHSIndexing(torch.nn.Module): def forward(self, data, index): return data[index] out = RHSIndexing()(data, index) torch.onnx.export(RHSIndexing(), (data, index), 'indexing.onnx', opset_version=9) # onnxruntime import onnxruntime sess = onnxruntime.InferenceSession('indexing.onnx') out_ort = sess.run(None, { sess.get_inputs()[0].name: data.numpy(), sess.get_inputs()[1].name: index.numpy(), }) assert torch.all(torch.eq(out, torch.tensor(out_ort))) Below is the list of supported patterns for RHS indexing. # Scalar indices data[0, 1] # Slice indices data[:3] # Tensor indices data[torch.tensor([[1, 2], [2, 3]])] data[torch.tensor([2, 3]), torch.tensor([1, 2])] data[torch.tensor([[1, 2], [2, 3]]), torch.tensor([2, 3])] data[torch.tensor([2, 3]), :, torch.tensor([1, 2])] # Ellipsis # Not supported in scripting # i.e. torch.jit.script(model) will fail if model contains this pattern. # Export is supported under tracing # i.e. torch.onnx.export(model) data[...] # The combination of above data[2, ..., torch.tensor([2, 1, 3]), 2:4, torch.tensor([[1], [2]])] # Boolean mask (supported for ONNX opset version >= 11) data[data != 1] And below is the list of unsupported patterns for RHS indexing. # Tensor indices that includes negative values. data[torch.tensor([[1, 2], [2, -3]]), torch.tensor([-2, 3])] ### Setter In code, this type of indexing occurs on the LHS. Export is supported for ONNX opset version >= 11. E.g.: data = torch.zeros(3, 4) new_data = torch.arange(4).to(torch.float32) # LHS indexing is supported in ONNX opset >= 11. class LHSIndexing(torch.nn.Module): def forward(self, data, new_data): data[1] = new_data return data out = LHSIndexing()(data, new_data) data = torch.zeros(3, 4) new_data = torch.arange(4).to(torch.float32) torch.onnx.export(LHSIndexing(), (data, new_data), 'inplace_assign.onnx', opset_version=11) # onnxruntime import onnxruntime sess = onnxruntime.InferenceSession('inplace_assign.onnx') out_ort = sess.run(None, { sess.get_inputs()[0].name: torch.zeros(3, 4).numpy(), sess.get_inputs()[1].name: new_data.numpy(), }) assert torch.all(torch.eq(out, torch.tensor(out_ort))) Below is the list of supported patterns for LHS indexing. # Scalar indices data[0, 1] = new_data # Slice indices data[:3] = new_data # Tensor indices # If more than one tensor are used as indices, only consecutive 1-d tensor indices are supported. data[torch.tensor([[1, 2], [2, 3]])] = new_data data[torch.tensor([2, 3]), torch.tensor([1, 2])] = new_data # Ellipsis # Not supported to export in script modules # i.e. torch.onnx.export(torch.jit.script(model)) will fail if model contains this pattern. # Export is supported under tracing # i.e. torch.onnx.export(model) data[...] = new_data # The combination of above data[2, ..., torch.tensor([2, 1, 3]), 2:4] += update # Boolean mask data[data != 1] = new_data And below is the list of unsupported patterns for LHS indexing. # Multiple tensor indices if any has rank >= 2 data[torch.tensor([[1, 2], [2, 3]]), torch.tensor([2, 3])] = new_data # Multiple tensor indices that are not consecutive data[torch.tensor([2, 3]), :, torch.tensor([1, 2])] = new_data # Tensor indices that includes negative values. data[torch.tensor([1, -2]), torch.tensor([-2, 3])] = new_data If you are experiencing issues exporting indexing that belongs to the above supported patterns, please double check that you are exporting with the latest opset (opset_version=12). ## TorchVision support All TorchVision models, except for quantized versions, are exportable to ONNX. 
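For instance, a standard classification model can be exported directly; a minimal sketch (the model choice, file name, input shape, and opset version here are arbitrary):

import torch
import torchvision

# Any non-quantized TorchVision model can be exported the same way.
model = torchvision.models.resnet18(pretrained=True).eval()
dummy_input = torch.randn(1, 3, 224, 224)
torch.onnx.export(model, dummy_input, "resnet18.onnx", opset_version=11)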
More details can be found in [TorchVision](torchvision/models). ## Limitations * Only tuples, lists and Variables are supported as JIT inputs/outputs. Dictionaries and strings are also accepted but their usage is not recommended. Users need to verify their dict inputs carefully, and keep in mind that dynamic lookups are not available. * PyTorch and ONNX backends(Caffe2, ONNX Runtime, etc) often have implementations of operators with some numeric differences. Depending on model structure, these differences may be negligible, but they can also cause major divergences in behavior (especially on untrained models.) We allow Caffe2 to call directly to Torch implementations of operators, to help you smooth over these differences when precision is important, and to also document these differences. ## Supported operators The following operators are supported: * BatchNorm * ConstantPadNd * Conv * Dropout * Embedding (no optional arguments supported) * EmbeddingBag * FeatureDropout (training mode not supported) * Index * MaxPool1d * MaxPool2d * MaxPool3d * RNN * abs * absolute * acos * adaptive_avg_pool1d * adaptive_avg_pool2d * adaptive_avg_pool3d * adaptive_max_pool1d * adaptive_max_pool2d * adaptive_max_pool3d * add (nonzero alpha not supported) * addmm * and * arange * argmax * argmin * asin * atan * avg_pool1d * avg_pool2d * avg_pool2d * avg_pool3d * as_strided * baddbmm * bitshift * cat * ceil * celu * clamp * clamp_max * clamp_min * concat * copy * cos * cumsum * det * dim_arange * div * dropout * einsum * elu * empty * empty_like * eq * erf * exp * expand * expand_as * eye * flatten * floor * floor_divide * frobenius_norm * full * full_like * gather * ge * gelu * glu * group_norm * gt * hardswish * hardtanh * im2col * index_copy * index_fill * index_put * index_select * instance_norm * interpolate * isnan * KLDivLoss * layer_norm * le * leaky_relu * len * log * log1p * log2 * log_sigmoid * log_softmax * logdet * logsumexp * lt * masked_fill * masked_scatter * masked_select * max * mean * min * mm * mul * multinomial * narrow * ne * neg * new_empty * new_full * new_zeros * nll_loss * nonzero * norm * ones * ones_like * or * permute * pixel_shuffle * pow * prelu (single weight shared among input channels not supported) * prod * rand * randn * randn_like * reciprocal * reflection_pad * relu * repeat * replication_pad * reshape * reshape_as * round * rrelu * rsqrt * rsub * scalar_tensor * scatter * scatter_add * select * selu * sigmoid * sign * sin * size * slice * softmax * softplus * sort * split * sqrt * squeeze * stack * std * sub (nonzero alpha not supported) * sum * t * tan * tanh * threshold (non-zero threshold/non-zero value not supported) * to * topk * transpose * true_divide * type_as * unbind * unfold (experimental support with ATen-Caffe2 integration) * unique * unsqueeze * upsample_nearest1d * upsample_nearest2d * upsample_nearest3d * view * weight_norm * where * zeros * zeros_like The operator set above is sufficient to export the following models: * AlexNet * DCGAN * DenseNet * Inception (warning: this model is highly sensitive to changes in operator implementation) * ResNet * SuperResolution * VGG * [word_language_model](https://github.com/pytorch/examples/tree/master/word_language_model) ## Adding support for operators Adding export support for operators is an _advance usage_. To achieve this, developers need to touch the source code of PyTorch. Please follow the [instructions](https://github.com/pytorch/pytorch#from-source) for installing PyTorch from source. 
If the wanted operator is standardized in ONNX, adding export support for it (i.e., adding a symbolic function for the operator) should be easy. To confirm whether the operator is standardized, please check the [ONNX operator list](https://github.com/onnx/onnx/blob/master/docs/Operators.md).

### ATen operators

If the operator is an ATen operator, which means you can find the declaration of the function in `torch/csrc/autograd/generated/VariableType.h` (available in the generated code in the PyTorch install dir), you should add the symbolic function in `torch/onnx/symbolic_opset<version>.py` and follow the instructions listed below:

* Define the symbolic function in `torch/onnx/symbolic_opset<version>.py`, for example [torch/onnx/symbolic_opset9.py](https://github.com/pytorch/pytorch/blob/master/torch/onnx/symbolic_opset9.py). Make sure the function has the same name as the ATen operator/function defined in `VariableType.h`.
* The first parameter is always the exported ONNX graph. Parameter names must EXACTLY match the names in `VariableType.h`, because dispatch is done with keyword arguments.
* Parameter ordering does NOT necessarily match what is in `VariableType.h`: tensors (inputs) always come first, followed by non-tensor arguments.
* In the symbolic function, if the operator is already standardized in ONNX, we only need to create a node to represent the ONNX operator in the graph.
* If an input argument is a tensor, but ONNX asks for a scalar, we have to do the conversion explicitly. The helper function `_scalar` can convert a scalar tensor into a Python scalar, and `_if_scalar_type_as` can turn a Python scalar into a PyTorch tensor.

### Non-ATen operators

If the operator is a non-ATen operator, the symbolic function has to be added in the corresponding PyTorch Function class. Please follow these instructions:

* Create a symbolic function named `symbolic` in the corresponding Function class.
* The first parameter is always the exported ONNX graph.
* Parameter names except the first must EXACTLY match the names in `forward`.
* The output tuple size must match the outputs of `forward`.
* In the symbolic function, if the operator is already standardized in ONNX, we just need to create a node to represent the ONNX operator in the graph.

Symbolic functions should be implemented in Python. All of these functions interact with Python methods which are implemented via C++-Python bindings, but intuitively the interface they provide looks like this:

def operator/symbolic(g, *inputs):
    """
    Modifies Graph (e.g., using "op"), adding the ONNX operations representing this PyTorch function, and returning a Value or tuple of Values specifying the ONNX outputs whose values correspond to the original PyTorch return values of the autograd Function (or None if an output is not supported by ONNX).
Args: g (Graph): graph to write the ONNX representation into inputs (Value...): list of values representing the variables which contain the inputs for this function """ class Value(object): """Represents an intermediate tensor value computed in ONNX.""" def type(self): """Returns the Type of the value.""" class Type(object): def sizes(self): """Returns a tuple of ints representing the shape of a tensor this describes.""" class Graph(object): def op(self, opname, *inputs, **attrs): """ Create an ONNX operator 'opname', taking 'args' as inputs and attributes 'kwargs' and add it as a node to the current graph, returning the value representing the single output of this operator (see the `outputs` keyword argument for multi-return nodes). The set of operators and the inputs/attributes they take is documented at https://github.com/onnx/onnx/blob/master/docs/Operators.md Args: opname (string): The ONNX operator name, e.g., `Abs` or `Add`. args (Value...): The inputs to the operator; usually provided as arguments to the `symbolic` definition. kwargs: The attributes of the ONNX operator, with keys named according to the following convention: `alpha_f` indicates the `alpha` attribute with type `f`. The valid type specifiers are `f` (float), `i` (int), `s` (string) or `t` (Tensor). An attribute specified with type float accepts either a single float, or a list of floats (e.g., you would say `dims_i` for a `dims` attribute that takes a list of integers). outputs (int, optional): The number of outputs this operator returns; by default an operator is assumed to return a single output. If `outputs` is greater than one, this functions returns a tuple of output `Value`, representing each output of the ONNX operator in positional. """ The ONNX graph C++ definition is in `torch/csrc/jit/ir/ir.h`. Here is an example of handling missing symbolic function for `elu` operator. We try to export the model and see the error message as below: UserWarning: ONNX export failed on elu because torch.onnx.symbolic_opset9.elu does not exist RuntimeError: ONNX export failed: Couldn't export operator elu The export fails because PyTorch does not support exporting `elu` operator. We find `virtual Tensor elu(const Tensor & input, Scalar alpha, bool inplace) const override;` in `VariableType.h`. This means `elu` is an ATen operator. We check the [ONNX operator list](https://github.com/onnx/onnx/blob/master/docs/Operators.md), and confirm that `Elu` is standardized in ONNX. We add the following lines to `symbolic_opset9.py`: def elu(g, input, alpha, inplace=False): return g.op("Elu", input, alpha_f=_scalar(alpha)) Now PyTorch is able to export `elu` operator. There are more examples in [symbolic_opset9.py](https://github.com/pytorch/pytorch/blob/master/torch/onnx/symbolic_opset9.py), [symbolic_opset10.py](https://github.com/pytorch/pytorch/blob/master/torch/onnx/symbolic_opset10.py). The interface for specifying operator definitions is experimental; adventurous users should note that the APIs will probably change in a future interface. ### Custom operators Following this tutorial [Extending TorchScript with Custom C++ Operators](https://pytorch.org/tutorials/advanced/torch_script_custom_ops.html), you can create and register your own custom ops implementation in PyTorch. 
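Before the export example below can run, the compiled extension that registers `custom_ops::foo_forward` has to be loaded into the current process, for example (a minimal sketch; the library path is hypothetical and depends on how the tutorial's extension was built):

import torch

# Loading the shared library registers custom_ops::foo_forward with the
# PyTorch dispatcher, making torch.ops.custom_ops.foo_forward callable.
torch.ops.load_library("build/libcustom_ops.so")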
Here's how to export such a model to ONNX:

# Create custom symbolic function
from torch.onnx.symbolic_helper import parse_args

@parse_args('v', 'v', 'f', 'i')
def symbolic_foo_forward(g, input1, input2, attr1, attr2):
    return g.op("Foo", input1, input2, attr1_f=attr1, attr2_i=attr2)

# Register custom symbolic function
from torch.onnx import register_custom_op_symbolic
register_custom_op_symbolic('custom_ops::foo_forward', symbolic_foo_forward, 9)

class FooModel(torch.nn.Module):
    def __init__(self, attr1, attr2):
        super(FooModel, self).__init__()
        self.attr1 = attr1
        self.attr2 = attr2

    def forward(self, input1, input2):
        # Calling custom op
        return torch.ops.custom_ops.foo_forward(input1, input2, self.attr1, self.attr2)

model = FooModel(attr1, attr2)
torch.onnx.export(model, (dummy_input1, dummy_input2), 'model.onnx', custom_opsets={"custom_domain": 2})

Depending on the custom operator, you can export it as one or a combination of existing ONNX ops, or as a custom op in ONNX. In the latter case, you can specify the custom domain and version (custom opset) using the `custom_opsets` dictionary at export. If not explicitly specified, the custom opset version is set to 1 by default. If you use custom ONNX ops, you will need to extend the backend of your choice with a matching custom op implementation, e.g. [Caffe2 custom ops](https://caffe2.ai/docs/custom-operators.html), [ONNX Runtime custom ops](https://github.com/microsoft/onnxruntime/blob/master/docs/AddingCustomOp.md).

## Operator Export Type

Exporting models with unsupported ONNX operators can be achieved using the `operator_export_type` flag in the export API. This flag is useful when users try to export ATen and non-ATen operators that are not registered and supported in ONNX.

### ONNX

This mode is used to export all operators as regular ONNX operators. This is the default `operator_export_type` mode. Example torch ir graph:

graph(%0 : Float(2, 3, 4, strides=[12, 4, 1])):
  %3 : Float(2, 3, 4, strides=[12, 4, 1]) = aten::exp(%0)
  %4 : Float(2, 3, 4, strides=[12, 4, 1]) = aten::div(%0, %3)
  return (%4)

is exported as:

graph(%0 : Float(2, 3, 4, strides=[12, 4, 1])):
  %1 : Float(2, 3, 4, strides=[12, 4, 1]) = onnx::Exp(%0)
  %2 : Float(2, 3, 4, strides=[12, 4, 1]) = onnx::Div(%0, %1)
  return (%2)

### ONNX_ATEN

This mode is used to export all operators as ATen ops, avoiding conversion to ONNX. Example torch ir graph:

graph(%0 : Float(2, 3, 4, strides=[12, 4, 1])):
  %3 : Float(2, 3, 4, strides=[12, 4, 1]) = aten::exp(%0)
  %4 : Float(2, 3, 4, strides=[12, 4, 1]) = aten::div(%0, %3)
  return (%4)

is exported as:

graph(%0 : Float(2, 3, 4, strides=[12, 4, 1])):
  %1 : Float(2, 3, 4, strides=[12, 4, 1]) = aten::ATen[operator="exp"](%0)
  %2 : Float(2, 3, 4, strides=[12, 4, 1]) = aten::ATen[operator="div"](%0, %1)
  return (%2)

### ONNX_ATEN_FALLBACK

This mode falls back to ATen for operators that are not supported in ONNX; supported operators are exported to ONNX regularly. In the following example, aten::triu is not supported in ONNX, so the exporter falls back to ATen for this operator. Example torch ir graph:

graph(%0 : Float):
  %3 : int = prim::Constant[value=0]()
  %4 : Float = aten::triu(%0, %3) # unsupported op
  %5 : Float = aten::mul(%4, %0) # registered op
  return (%5)

is exported as:

graph(%0 : Float):
  %1 : Long() = onnx::Constant[value={0}]()
  %2 : Float = aten::ATen[operator="triu"](%0, %1) # unsupported op
  %3 : Float = onnx::Mul(%2, %0) # registered op
  return (%3)

### RAW

To export the raw IR.
Example torch ir graph: graph(%x.1 : Float(1, strides=[1])): %1 : Tensor = aten::exp(%x.1) %2 : Tensor = aten::div(%x.1, %1) %y.1 : Tensor[] = prim::ListConstruct(%2) return (%y.1) is exported as: graph(%x.1 : Float(1, strides=[1])): %1 : Tensor = aten::exp(%x.1) %2 : Tensor = aten::div(%x.1, %1) %y.1 : Tensor[] = prim::ListConstruct(%2) return (%y.1) ### ONNX_FALLTHROUGH This mode can be used to export any operator (ATen or non-ATen) that is not registered and supported in ONNX. Exported falls through and exports the operator as is, as custom op. Exporting custom operators enables users to register and implement the operator as part of their runtime backend. Example torch ir graph: graph(%0 : Float(2, 3, 4, strides=[12, 4, 1]), %1 : Float(2, 3, 4, strides=[12, 4, 1])): %6 : Float(2, 3, 4, strides=[12, 4, 1]) = foo_namespace::bar(%0, %1) # custom op %7 : Float(2, 3, 4, strides=[12, 4, 1]) = aten::div(%6, %0) # registered op return (%7)) is exported as: graph(%0 : Float(2, 3, 4, strides=[12, 4, 1]), %1 : Float(2, 3, 4, strides=[12, 4, 1])): %2 : Float(2, 3, 4, strides=[12, 4, 1]) = foo_namespace::bar(%0, %1) # custom op %3 : Float(2, 3, 4, strides=[12, 4, 1]) = onnx::Div(%2, %0) # registered op return (%3 ## Frequently Asked Questions Q: I have exported my lstm model, but its input size seems to be fixed? The tracer records the example inputs shape in the graph. In case the model should accept inputs of dynamic shape, you can utilize the parameter `dynamic_axes` in export api. layer_count = 4 model = nn.LSTM(10, 20, num_layers=layer_count, bidirectional=True) model.eval() with torch.no_grad(): input = torch.randn(5, 3, 10) h0 = torch.randn(layer_count * 2, 3, 20) c0 = torch.randn(layer_count * 2, 3, 20) output, (hn, cn) = model(input, (h0, c0)) # default export torch.onnx.export(model, (input, (h0, c0)), 'lstm.onnx') onnx_model = onnx.load('lstm.onnx') # input shape [5, 3, 10] print(onnx_model.graph.input[0]) # export with `dynamic_axes` torch.onnx.export(model, (input, (h0, c0)), 'lstm.onnx', input_names=['input', 'h0', 'c0'], output_names=['output', 'hn', 'cn'], dynamic_axes={'input': {0: 'sequence'}, 'output': {0: 'sequence'}}) onnx_model = onnx.load('lstm.onnx') # input shape ['sequence', 3, 10] print(onnx_model.graph.input[0]) Q: How to export models with loops in it? Please checkout Tracing vs Scripting. Q: Does ONNX support implicit scalar datatype casting? No, but the exporter will try to handle that part. Scalars are converted to constant tensors in ONNX. The exporter will try to figure out the right datatype for scalars. However for cases that it failed to do so, you will need to manually provide the datatype information. This often happens with scripted models, where the datatypes are not recorded. We are trying to improve the datatype propagation in the exporter such that manual changes are not required in the future. class ImplicitCastType(torch.jit.ScriptModule): @torch.jit.script_method def forward(self, x): # Exporter knows x is float32, will export '2' as float32 as well. y = x + 2 # Without type propagation, exporter doesn't know the datatype of y. # Thus '3' is exported as int64 by default. return y + 3 # The following will export correctly. # return y + torch.tensor([3], dtype=torch.float32) x = torch.tensor([1.0], dtype=torch.float32) torch.onnx.export(ImplicitCastType(), x, 'models/implicit_cast.onnx', example_outputs=ImplicitCastType()(x)) Q: Is tensor in-place indexed assignment like `data[index] = new_data` supported? 
Yes, this is supported for ONNX opset version >= 11. Please check out Indexing.

Q: Is tensor list exportable to ONNX?

Yes, this is supported for ONNX opset version >= 11. ONNX introduced the concept of Sequence in opset 11. Similar to a list, a Sequence is a data type that contains an arbitrary number of Tensors. Associated operators are also introduced in ONNX, such as SequenceInsert, SequenceAt, etc. However, in-place list append within loops is not exportable to ONNX. To implement this, please use the in-place add operator. E.g.:

class ListLoopModel(torch.nn.Module):
    def forward(self, x):
        res = []
        res1 = []
        arr = x.split(2, 0)
        res2 = torch.zeros(3, 4, dtype=torch.long)
        for i in range(len(arr)):
            res += [arr[i].sum(0, False)]
            res1 += [arr[-1 - i].sum(0, False)]
            res2 += 1
        return torch.stack(res), torch.stack(res1), res2

model = torch.jit.script(ListLoopModel())
inputs = torch.randn(16)
out = model(inputs)
torch.onnx.export(model, (inputs, ), 'loop_and_list.onnx', opset_version=11, example_outputs=out)

# onnxruntime
import onnxruntime
sess = onnxruntime.InferenceSession('loop_and_list.onnx')
out_ort = sess.run(None, {
    sess.get_inputs()[0].name: inputs.numpy(),
})
assert all(torch.allclose(o, torch.tensor(o_ort)) for o, o_ort in zip(out, out_ort))

## Use external data format

The `use_external_data_format` argument in the export API enables export of models in the ONNX external data format. With this option enabled, the exporter stores some model parameters in external binary files rather than in the ONNX file itself. These external binary files are stored in the same location as the ONNX file. The argument ‘f’ must be a string specifying the location of the model.

model = torchvision.models.mobilenet_v2(pretrained=True)
input = torch.randn(2, 3, 224, 224, requires_grad=True)
torch.onnx.export(model, (input, ), './large_model.onnx', use_external_data_format=True)

This argument enables export of large models to ONNX. Models larger than 2GB cannot be exported in a single file because of the protobuf size limit. Users should set `use_external_data_format` to `True` to successfully export such models.

## Training

The `training` argument in the export API allows users to export models in a training-friendly mode. `TrainingMode.TRAINING` exports the model in a training-friendly mode that avoids certain model optimizations which might interfere with model parameter training. `TrainingMode.PRESERVE` exports the model in inference mode if `model.training` is `False`; otherwise, it exports the model in a training-friendly mode. The default mode for this argument is `TrainingMode.EVAL`, which exports the model in inference mode.

## Functions

`torch.onnx.export(model, args, f, export_params=True, verbose=False, training=TrainingMode.EVAL, input_names=None, output_names=None, aten=False, export_raw_ir=False, operator_export_type=None, opset_version=None, _retain_param_name=True, do_constant_folding=True, example_outputs=None, strip_doc_string=True, dynamic_axes=None, keep_initializers_as_inputs=None, custom_opsets=None, enable_onnx_checker=True, use_external_data_format=False)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/onnx.html#export)

Export a model into ONNX format. This exporter runs your model once in order to get a trace of its execution to be exported; at the moment, it supports a limited set of dynamic models (e.g., RNNs).

Parameters

* **model** ([torch.nn.Module](generated/torch.nn.module#torch.nn.Module "torch.nn.Module")) – the model to be exported.
* **args** (_tuple of arguments_ or [torch.Tensor](tensors#torch.Tensor "torch.Tensor"), _optionally ending with a dictionary of named arguments_) – the inputs to the model; an optional trailing dictionary maps each named parameter (KEY: str) to its corresponding input (VALUE). args can be structured in either of the following ways:

  1. ONLY A TUPLE OF ARGUMENTS or a torch.Tensor: `args = (x, y, z)`. The inputs to the model, e.g., such that `model(*args)` is a valid invocation of the model. Any non-Tensor arguments will be hard-coded into the exported model; any Tensor arguments will become inputs of the exported model, in the order they occur in args. If args is a Tensor, this is equivalent to having called it with a 1-ary tuple of that Tensor.

  2. A TUPLE OF ARGUMENTS WITH A DICTIONARY OF NAMED PARAMETERS: `args = (x, {'y': input_y, 'z': input_z})`. The inputs to the model are structured as a tuple of non-keyword arguments, with the last value of the tuple being a dictionary of named parameters and the corresponding inputs as key-value pairs. If a certain named argument is not present in the dictionary, it is assigned the default value, or None if a default value is not provided.

  Cases in which a dictionary input is the last input of the args tuple cause a conflict when a dictionary of named parameters is used. The model below provides such an example.

  class Model(torch.nn.Module):
      def forward(self, k, x):
          ...
          return x

  m = Model()
  k = torch.randn(2, 3)
  x = {torch.tensor(1.): torch.randn(2, 3)}

  Previously, the call to the export API would look like

  torch.onnx.export(model, (k, x), 'test.onnx')

  This would work as intended. However, the export function now assumes that the ‘x’ input is intended to represent the optional dictionary of named arguments. To prevent this from being an issue, a constraint is placed: provide an empty dictionary as the last input in the args tuple in such cases. The new call looks like this:

  torch.onnx.export(model, (k, x, {}), 'test.onnx')

* **f** – a file-like object (has to implement fileno that returns a file descriptor) or a string containing a file name. A binary Protobuf will be written to this file.

* **export_params** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)"), _default True_) – if specified, all parameters will be exported. Set this to False if you want to export an untrained model. In this case, the exported model will first take all of its parameters as arguments, with the ordering as specified by `model.state_dict().values()`

* **verbose** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)"), _default False_) – if specified, we will print out a debug description of the trace being exported.

* **training** (_enum_, _default TrainingMode.EVAL_) – TrainingMode.EVAL: export the model in inference mode. TrainingMode.PRESERVE: export the model in inference mode if model.training is False and in a training-friendly mode if model.training is True. TrainingMode.TRAINING: export the model in a training-friendly mode.

* **input_names** (_list of strings_, _default empty list_) – names to assign to the input nodes of the graph, in order

* **output_names** (_list of strings_, _default empty list_) – names to assign to the output nodes of the graph, in order

* **aten** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)"), _default False_) – [DEPRECATED.
use operator_export_type] export the model in ATen mode. If using ATen mode, all the ops originally exported by the functions in symbolic_opset.py are exported as ATen ops.

* **export_raw_ir** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)"), _default False_) – [DEPRECATED. use operator_export_type] export the internal IR directly instead of converting it to ONNX ops.

* **operator_export_type** (_enum_, _default OperatorExportTypes.ONNX_) –

  OperatorExportTypes.ONNX: All ops are exported as regular ONNX ops (with ONNX namespace).

  OperatorExportTypes.ONNX_ATEN: All ops are exported as ATen ops (with aten namespace).

  OperatorExportTypes.ONNX_ATEN_FALLBACK: If an ATen op is not supported in ONNX or its symbolic is missing, fall back on the ATen op. Registered ops are exported to ONNX regularly. Example graph:

  graph(%0 : Float):
    %3 : int = prim::Constant[value=0]()
    %4 : Float = aten::triu(%0, %3) # missing op
    %5 : Float = aten::mul(%4, %0) # registered op
    return (%5)

  is exported as:

  graph(%0 : Float):
    %1 : Long() = onnx::Constant[value={0}]()
    %2 : Float = aten::ATen[operator="triu"](%0, %1) # missing op
    %3 : Float = onnx::Mul(%2, %0) # registered op
    return (%3)

  In the above example, aten::triu is not supported in ONNX, hence the exporter falls back on this op.

  OperatorExportTypes.RAW: Export raw ir.

  OperatorExportTypes.ONNX_FALLTHROUGH: If an op is not supported in ONNX, fall through and export the operator as is, as a custom ONNX op. Using this mode, the op can be exported and implemented by the user for their runtime backend. Example graph:

  graph(%x.1 : Long(1, strides=[1])):
    %1 : None = prim::Constant()
    %2 : Tensor = aten::sum(%x.1, %1)
    %y.1 : Tensor[] = prim::ListConstruct(%2)
    return (%y.1)

  is exported as:

  graph(%x.1 : Long(1, strides=[1])):
    %1 : Tensor = onnx::ReduceSum[keepdims=0](%x.1)
    %y.1 : Long() = prim::ListConstruct(%1)
    return (%y.1)

  In the above example, prim::ListConstruct is not supported, hence the exporter falls through.

* **opset_version** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)"), _default is 9_) – by default we export the model to the opset version of the onnx submodule. Since ONNX’s latest opset may evolve before the next stable release, by default we export to one stable opset version. Right now, the supported stable opset version is 9. The opset_version must be _onnx_main_opset or in _onnx_stable_opsets, which are defined in torch/onnx/symbolic_helper.py

* **do_constant_folding** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)"), _default True_) – If True, the constant-folding optimization is applied to the model during export. Constant-folding optimization will replace some of the ops that have all constant inputs with pre-computed constant nodes.

* **example_outputs** (_tuple of Tensors_, _default None_) – Model’s example outputs being exported. example_outputs must be provided when exporting a ScriptModule or TorchScript Function.

* **strip_doc_string** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)"), _default True_) – if True, strips the field “doc_string” from the exported model, which contains information about the stack trace.

* **dynamic_axes** (_dict<string, dict<int, string>> or dict<string, list(int)>_, _default empty dict_) – a dictionary to specify dynamic axes of input/output, such that: - KEY: input and/or output names - VALUE: index of dynamic axes for given key and potentially the name to be used for exported dynamic axes.
  In general the value is defined according to one of the following ways, or a combination of both: (1) a list of integers specifying the dynamic axes of the provided input; in this scenario automated names will be generated and applied to the dynamic axes of the provided input/output during export. OR (2) an inner dictionary that specifies a mapping FROM the index of the dynamic axis in the corresponding input/output TO the name that is desired to be applied on such axis of such input/output during export.

  For example, if we have the following shapes for inputs and outputs:

  shape(input_1) = ('b', 3, 'w', 'h')
  shape(input_2) = ('b', 4)
  shape(output) = ('b', 'd', 5)

  Then `dynamic_axes` can be defined in one of the following ways:

  1. ONLY INDICES: `dynamic_axes = {'input_1':[0, 2, 3], 'input_2':[0], 'output':[0, 1]}` where automatic names will be generated for the exported dynamic axes

  2. INDICES WITH CORRESPONDING NAMES: `dynamic_axes = {'input_1':{0:'batch', 2:'width', 3:'height'}, 'input_2':{0:'batch'}, 'output':{0:'batch', 1:'detections'}}` where the provided names will be applied to the exported dynamic axes

  3. MIXED MODE OF (1) and (2): `dynamic_axes = {'input_1':[0, 2, 3], 'input_2':{0:'batch'}, 'output':[0,1]}`

* **keep_initializers_as_inputs** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)"), _default None_) – If True, all the initializers (typically corresponding to parameters) in the exported graph will also be added as inputs to the graph. If False, then initializers are not added as inputs to the graph, and only the non-parameter inputs are added as inputs. This may allow for better optimizations (such as constant folding, etc.) by backends/runtimes that execute these graphs. If unspecified (default None), then the behavior is chosen automatically as follows: if operator_export_type is OperatorExportTypes.ONNX, the behavior is equivalent to setting this argument to False; for other values of operator_export_type, the behavior is equivalent to setting this argument to True. Note that for ONNX opset version < 9, initializers MUST be part of graph inputs. Therefore, if the opset_version argument is set to 8 or lower, this argument will be ignored.

* **custom_opsets** (_dict<string, int>_, _default empty dict_) – A dictionary to indicate the custom opset domain and version at export. If the model contains a custom opset, it is optional to specify the domain and opset version in the dictionary: - KEY: opset domain name - VALUE: opset version. If a custom opset is not provided in this dictionary, the opset version is set to 1 by default.

* **enable_onnx_checker** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)"), _default True_) – If True, the onnx model checker will be run as part of the export, to ensure the exported model is a valid ONNX model.

* **use_external_data_format** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)"), _default False_) – If True, then the model is exported in ONNX external data format, in which case some of the model parameters are stored in external binary files and not in the ONNX model file itself. In this case, the argument ‘f’ must be a string specifying the location of the model, and the external binary files will be stored in the same location specified by ‘f’. If False, then the model is stored in regular format, i.e. model and parameters are all in one file. This argument is ignored for all export types other than ONNX.
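As an illustration of how the arguments above combine, the following sketch exports a torchvision model with a dynamic batch dimension; the model choice, file name, and input shape are arbitrary placeholders, and only arguments documented above are used:

import torch
import torchvision

# Placeholder model and example input; any traceable nn.Module works the same way.
model = torchvision.models.mobilenet_v2(pretrained=True).eval()
dummy_input = torch.randn(1, 3, 224, 224)

torch.onnx.export(
    model,
    (dummy_input,),                       # args: a 1-ary tuple of example inputs
    'mobilenet_v2.onnx',                  # f: where the binary Protobuf is written
    export_params=True,                   # store the trained weights in the exported file
    opset_version=11,                     # target a specific stable ONNX opset
    do_constant_folding=True,             # pre-compute nodes with all-constant inputs
    input_names=['input'],
    output_names=['output'],
    dynamic_axes={'input': {0: 'batch'},  # mark the batch dimension as dynamic
                  'output': {0: 'batch'}})

As in the LSTM example above, the exported file can then be loaded with `onnx.load(...)` to confirm that the batch dimension appears as a named dynamic axis.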
`torch.onnx.export_to_pretty_string(*args, **kwargs)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/onnx.html#export_to_pretty_string)

`torch.onnx.register_custom_op_symbolic(symbolic_name, symbolic_fn, opset_version)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/onnx.html#register_custom_op_symbolic)

`torch.onnx.operators.shape_as_tensor(x)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/onnx/operators.html#shape_as_tensor)

`torch.onnx.select_model_mode_for_export(model, mode)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/onnx.html#select_model_mode_for_export)

A context manager to temporarily set the training mode of ‘model’ to ‘mode’, resetting it when we exit the with-block. A no-op if mode is None. Changed in version 1.6: renamed from set_training.

`torch.onnx.is_in_onnx_export()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/onnx.html#is_in_onnx_export)

Checks whether it is in the middle of the ONNX export. This function returns True in the middle of torch.onnx.export(). torch.onnx.export should be executed with a single thread.

# torch.optim

`torch.optim` is a package implementing various optimization algorithms. Most commonly used methods are already supported, and the interface is general enough that more sophisticated ones can also be easily integrated in the future.

## How to use an optimizer

To use `torch.optim` you have to construct an optimizer object that will hold the current state and will update the parameters based on the computed gradients.

### Constructing it

To construct an `Optimizer` you have to give it an iterable containing the parameters (all should be `Variable` s) to optimize. Then, you can specify optimizer-specific options such as the learning rate, weight decay, etc.

Note

If you need to move a model to GPU via `.cuda()`, please do so before constructing optimizers for it. Parameters of a model after `.cuda()` will be different objects from those before the call. In general, you should make sure that optimized parameters live in consistent locations when optimizers are constructed and used.

Example:

optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
optimizer = optim.Adam([var1, var2], lr=0.0001)

### Per-parameter options

`Optimizer` s also support specifying per-parameter options. To do this, instead of passing an iterable of `Variable` s, pass in an iterable of [`dict`](https://docs.python.org/3/library/stdtypes.html#dict "\(in Python v3.9\)") s. Each of them will define a separate parameter group, and should contain a `params` key, containing a list of parameters belonging to it. Other keys should match the keyword arguments accepted by the optimizers, and will be used as optimization options for this group.

Note

You can still pass options as keyword arguments. They will be used as defaults, in the groups that didn’t override them. This is useful when you only want to vary a single option, while keeping all others consistent between parameter groups.

For example, this is very useful when one wants to specify per-layer learning rates:

optim.SGD([
    {'params': model.base.parameters()},
    {'params': model.classifier.parameters(), 'lr': 1e-3}
], lr=1e-2, momentum=0.9)

This means that `model.base`’s parameters will use the default learning rate of `1e-2`, `model.classifier`’s parameters will use a learning rate of `1e-3`, and a momentum of `0.9` will be used for all parameters.

### Taking an optimization step

All optimizers implement a `step()` method that updates the parameters.
It can be used in two ways:

#### `optimizer.step()`

This is a simplified version supported by most optimizers. The function can be called once the gradients are computed using e.g. `backward()`.

Example:

for input, target in dataset:
    optimizer.zero_grad()
    output = model(input)
    loss = loss_fn(output, target)
    loss.backward()
    optimizer.step()

#### `optimizer.step(closure)`

Some optimization algorithms such as Conjugate Gradient and LBFGS need to reevaluate the function multiple times, so you have to pass in a closure that allows them to recompute your model. The closure should clear the gradients, compute the loss, and return it.

Example:

for input, target in dataset:
    def closure():
        optimizer.zero_grad()
        output = model(input)
        loss = loss_fn(output, target)
        loss.backward()
        return loss
    optimizer.step(closure)

## Algorithms

`class torch.optim.Optimizer(params, defaults)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/optim/optimizer.html#Optimizer)

Base class for all optimizers.

Warning

Parameters need to be specified as collections that have a deterministic ordering that is consistent between runs. Examples of objects that don’t satisfy those properties are sets and iterators over values of dictionaries.

Parameters

* **params** (_iterable_) – an iterable of [`torch.Tensor`](tensors#torch.Tensor "torch.Tensor") s or [`dict`](https://docs.python.org/3/library/stdtypes.html#dict "\(in Python v3.9\)") s. Specifies what Tensors should be optimized.
* **defaults** ([dict](https://docs.python.org/3/library/stdtypes.html#dict "\(in Python v3.9\)")) – a dict containing default values of optimization options (used when a parameter group doesn’t specify them).

`add_param_group(param_group)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/optim/optimizer.html#Optimizer.add_param_group)

Add a param group to the `Optimizer`’s `param_groups`.

This can be useful when fine-tuning a pre-trained network, as frozen layers can be made trainable and added to the `Optimizer` as training progresses.

Parameters

**param_group** ([dict](https://docs.python.org/3/library/stdtypes.html#dict "\(in Python v3.9\)")) – Specifies what Tensors should be optimized, along with group-specific optimization options.

`load_state_dict(state_dict)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/optim/optimizer.html#Optimizer.load_state_dict)

Loads the optimizer state.

Parameters

**state_dict** ([dict](https://docs.python.org/3/library/stdtypes.html#dict "\(in Python v3.9\)")) – optimizer state. Should be an object returned from a call to `state_dict()`.

`state_dict()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/optim/optimizer.html#Optimizer.state_dict)

Returns the state of the optimizer as a [`dict`](https://docs.python.org/3/library/stdtypes.html#dict "\(in Python v3.9\)").

It contains two entries:

* state - a dict holding current optimization state. Its content differs between optimizer classes.
* param_groups - a dict containing all parameter groups

`step(closure)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/optim/optimizer.html#Optimizer.step)

Performs a single optimization step (parameter update).

Parameters

**closure** (_callable_) – A closure that reevaluates the model and returns the loss. Optional for most optimizers.

Note

Unless otherwise specified, this function should not modify the `.grad` field of the parameters.
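The `state_dict()` / `load_state_dict()` pair above is what is typically used to checkpoint and resume training. A minimal sketch follows; the model, optimizer, and file name are illustrative placeholders rather than part of the API:

import torch
import torch.nn as nn
import torch.optim as optim

model = nn.Linear(10, 2)                      # placeholder model
optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

# ... train for a while, then save both model and optimizer state
torch.save({'model': model.state_dict(),
            'optimizer': optimizer.state_dict()}, 'checkpoint.pt')

# Later: rebuild the same objects, then restore their state before resuming
checkpoint = torch.load('checkpoint.pt')
model.load_state_dict(checkpoint['model'])
optimizer.load_state_dict(checkpoint['optimizer'])

Note that the optimizer must be constructed over the same parameters (in the same order) before `load_state_dict()` is called.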
`zero_grad(set_to_none=False)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/optim/optimizer.html#Optimizer.zero_grad) Sets the gradients of all optimized [`torch.Tensor`](tensors#torch.Tensor "torch.Tensor") s to zero. Parameters **set_to_none** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – instead of setting to zero, set the grads to None. This will in general have lower memory footprint, and can modestly improve performance. However, it changes certain behaviors. For example: 1. When the user tries to access a gradient and perform manual ops on it, a None attribute or a Tensor full of 0s will behave differently. 2. If the user requests `zero_grad(set_to_none=True)` followed by a backward pass, `.grad`s are guaranteed to be None for params that did not receive a gradient. 3. `torch.optim` optimizers have a different behavior if the gradient is 0 or None (in one case it does the step with a gradient of 0 and in the other it skips the step altogether). `class torch.optim.Adadelta(params, lr=1.0, rho=0.9, eps=1e-06, weight_decay=0)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/optim/adadelta.html#Adadelta) Implements Adadelta algorithm. It has been proposed in [ADADELTA: An Adaptive Learning Rate Method](https://arxiv.org/abs/1212.5701). Parameters * **params** (_iterable_) – iterable of parameters to optimize or dicts defining parameter groups * **rho** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _,__optional_) – coefficient used for computing a running average of squared gradients (default: 0.9) * **eps** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _,__optional_) – term added to the denominator to improve numerical stability (default: 1e-6) * **lr** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _,__optional_) – coefficient that scale delta before it is applied to the parameters (default: 1.0) * **weight_decay** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _,__optional_) – weight decay (L2 penalty) (default: 0) `step(closure=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/optim/adadelta.html#Adadelta.step) Performs a single optimization step. Parameters **closure** (_callable_ _,__optional_) – A closure that reevaluates the model and returns the loss. `class torch.optim.Adagrad(params, lr=0.01, lr_decay=0, weight_decay=0, initial_accumulator_value=0, eps=1e-10)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/optim/adagrad.html#Adagrad) Implements Adagrad algorithm. It has been proposed in [Adaptive Subgradient Methods for Online Learning and Stochastic Optimization](http://jmlr.org/papers/v12/duchi11a.html). 
Parameters * **params** (_iterable_) – iterable of parameters to optimize or dicts defining parameter groups * **lr** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _,__optional_) – learning rate (default: 1e-2) * **lr_decay** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _,__optional_) – learning rate decay (default: 0) * **weight_decay** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _,__optional_) – weight decay (L2 penalty) (default: 0) * **eps** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _,__optional_) – term added to the denominator to improve numerical stability (default: 1e-10) `step(closure=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/optim/adagrad.html#Adagrad.step) Performs a single optimization step. Parameters **closure** (_callable_ _,__optional_) – A closure that reevaluates the model and returns the loss. `class torch.optim.Adam(params, lr=0.001, betas=(0.9, 0.999), eps=1e-08, weight_decay=0, amsgrad=False)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/optim/adam.html#Adam) Implements Adam algorithm. It has been proposed in [Adam: A Method for Stochastic Optimization](https://arxiv.org/abs/1412.6980). The implementation of the L2 penalty follows changes proposed in [Decoupled Weight Decay Regularization](https://arxiv.org/abs/1711.05101). Parameters * **params** (_iterable_) – iterable of parameters to optimize or dicts defining parameter groups * **lr** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _,__optional_) – learning rate (default: 1e-3) * **betas** (_Tuple_ _[_[float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _,_[float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _]__,__optional_) – coefficients used for computing running averages of gradient and its square (default: (0.9, 0.999)) * **eps** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _,__optional_) – term added to the denominator to improve numerical stability (default: 1e-8) * **weight_decay** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _,__optional_) – weight decay (L2 penalty) (default: 0) * **amsgrad** (_boolean_ _,__optional_) – whether to use the AMSGrad variant of this algorithm from the paper [On the Convergence of Adam and Beyond](https://openreview.net/forum?id=ryQu7f-RZ) (default: False) `step(closure=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/optim/adam.html#Adam.step) Performs a single optimization step. Parameters **closure** (_callable_ _,__optional_) – A closure that reevaluates the model and returns the loss. `class torch.optim.AdamW(params, lr=0.001, betas=(0.9, 0.999), eps=1e-08, weight_decay=0.01, amsgrad=False)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/optim/adamw.html#AdamW) Implements AdamW algorithm. The original Adam algorithm was proposed in [Adam: A Method for Stochastic Optimization](https://arxiv.org/abs/1412.6980). The AdamW variant was proposed in [Decoupled Weight Decay Regularization](https://arxiv.org/abs/1711.05101). 
Parameters * **params** (_iterable_) – iterable of parameters to optimize or dicts defining parameter groups * **lr** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _,__optional_) – learning rate (default: 1e-3) * **betas** (_Tuple_ _[_[float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _,_[float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _]__,__optional_) – coefficients used for computing running averages of gradient and its square (default: (0.9, 0.999)) * **eps** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _,__optional_) – term added to the denominator to improve numerical stability (default: 1e-8) * **weight_decay** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _,__optional_) – weight decay coefficient (default: 1e-2) * **amsgrad** (_boolean_ _,__optional_) – whether to use the AMSGrad variant of this algorithm from the paper [On the Convergence of Adam and Beyond](https://openreview.net/forum?id=ryQu7f-RZ) (default: False) `step(closure=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/optim/adamw.html#AdamW.step) Performs a single optimization step. Parameters **closure** (_callable_ _,__optional_) – A closure that reevaluates the model and returns the loss. `class torch.optim.SparseAdam(params, lr=0.001, betas=(0.9, 0.999), eps=1e-08)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/optim/sparse_adam.html#SparseAdam) Implements lazy version of Adam algorithm suitable for sparse tensors. In this variant, only moments that show up in the gradient get updated, and only those portions of the gradient get applied to the parameters. Parameters * **params** (_iterable_) – iterable of parameters to optimize or dicts defining parameter groups * **lr** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _,__optional_) – learning rate (default: 1e-3) * **betas** (_Tuple_ _[_[float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _,_[float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _]__,__optional_) – coefficients used for computing running averages of gradient and its square (default: (0.9, 0.999)) * **eps** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _,__optional_) – term added to the denominator to improve numerical stability (default: 1e-8) `step(closure=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/optim/sparse_adam.html#SparseAdam.step) Performs a single optimization step. Parameters **closure** (_callable_ _,__optional_) – A closure that reevaluates the model and returns the loss. `class torch.optim.Adamax(params, lr=0.002, betas=(0.9, 0.999), eps=1e-08, weight_decay=0)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/optim/adamax.html#Adamax) Implements Adamax algorithm (a variant of Adam based on infinity norm). It has been proposed in [Adam: A Method for Stochastic Optimization](https://arxiv.org/abs/1412.6980). 
Parameters * **params** (_iterable_) – iterable of parameters to optimize or dicts defining parameter groups * **lr** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _,__optional_) – learning rate (default: 2e-3) * **betas** (_Tuple_ _[_[float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _,_[float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _]__,__optional_) – coefficients used for computing running averages of gradient and its square * **eps** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _,__optional_) – term added to the denominator to improve numerical stability (default: 1e-8) * **weight_decay** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _,__optional_) – weight decay (L2 penalty) (default: 0) `step(closure=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/optim/adamax.html#Adamax.step) Performs a single optimization step. Parameters **closure** (_callable_ _,__optional_) – A closure that reevaluates the model and returns the loss. `class torch.optim.ASGD(params, lr=0.01, lambd=0.0001, alpha=0.75, t0=1000000.0, weight_decay=0)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/optim/asgd.html#ASGD) Implements Averaged Stochastic Gradient Descent. It has been proposed in [Acceleration of stochastic approximation by averaging](https://dl.acm.org/citation.cfm?id=131098). Parameters * **params** (_iterable_) – iterable of parameters to optimize or dicts defining parameter groups * **lr** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _,__optional_) – learning rate (default: 1e-2) * **lambd** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _,__optional_) – decay term (default: 1e-4) * **alpha** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _,__optional_) – power for eta update (default: 0.75) * **t0** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _,__optional_) – point at which to start averaging (default: 1e6) * **weight_decay** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _,__optional_) – weight decay (L2 penalty) (default: 0) `step(closure=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/optim/asgd.html#ASGD.step) Performs a single optimization step. Parameters **closure** (_callable_ _,__optional_) – A closure that reevaluates the model and returns the loss. `class torch.optim.LBFGS(params, lr=1, max_iter=20, max_eval=None, tolerance_grad=1e-07, tolerance_change=1e-09, history_size=100, line_search_fn=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/optim/lbfgs.html#LBFGS) Implements L-BFGS algorithm, heavily inspired by `minFunc `. Warning This optimizer doesn’t support per-parameter options and parameter groups (there can be only one). Warning Right now all parameters have to be on a single device. This will be improved in the future. Note This is a very memory intensive optimizer (it requires additional `param_bytes * (history_size + 1)` bytes). If it doesn’t fit in memory try reducing the history size, or use a different algorithm. 
Parameters

* **lr** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)")) – learning rate (default: 1)
* **max_iter** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – maximal number of iterations per optimization step (default: 20)
* **max_eval** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – maximal number of function evaluations per optimization step (default: max_iter * 1.25).
* **tolerance_grad** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)")) – termination tolerance on first order optimality (default: 1e-7).
* **tolerance_change** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)")) – termination tolerance on function value/parameter changes (default: 1e-9).
* **history_size** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – update history size (default: 100).
* **line_search_fn** ([str](https://docs.python.org/3/library/stdtypes.html#str "\(in Python v3.9\)")) – either ‘strong_wolfe’ or None (default: None).

`step(closure)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/optim/lbfgs.html#LBFGS.step)

Performs a single optimization step.

Parameters

**closure** (_callable_) – A closure that reevaluates the model and returns the loss.

`class torch.optim.RMSprop(params, lr=0.01, alpha=0.99, eps=1e-08, weight_decay=0, momentum=0, centered=False)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/optim/rmsprop.html#RMSprop)

Implements RMSprop algorithm.

Proposed by G. Hinton in his [course](https://www.cs.toronto.edu/~tijmen/csc321/slides/lecture_slides_lec6.pdf). The centered version first appears in [Generating Sequences With Recurrent Neural Networks](https://arxiv.org/pdf/1308.0850v5.pdf). The implementation here takes the square root of the gradient average before adding epsilon (note that TensorFlow interchanges these two operations). The effective learning rate is thus α/(√v + ε), where α is the scheduled learning rate and v is the weighted moving average of the squared gradient.

Parameters

* **params** (_iterable_) – iterable of parameters to optimize or dicts defining parameter groups
* **lr** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)"), _optional_) – learning rate (default: 1e-2)
* **momentum** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)"), _optional_) – momentum factor (default: 0)
* **alpha** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)"), _optional_) – smoothing constant (default: 0.99)
* **eps** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)"), _optional_) – term added to the denominator to improve numerical stability (default: 1e-8)
* **centered** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)"), _optional_) – if `True`, compute the centered RMSProp; the gradient is normalized by an estimation of its variance
* **weight_decay** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)"), _optional_) – weight decay (L2 penalty) (default: 0)

`step(closure=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/optim/rmsprop.html#RMSprop.step)

Performs a single optimization step.
Parameters

**closure** (_callable_, _optional_) – A closure that reevaluates the model and returns the loss.

`class torch.optim.Rprop(params, lr=0.01, etas=(0.5, 1.2), step_sizes=(1e-06, 50))` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/optim/rprop.html#Rprop)

Implements the resilient backpropagation algorithm.

Parameters

* **params** (_iterable_) – iterable of parameters to optimize or dicts defining parameter groups
* **lr** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)"), _optional_) – learning rate (default: 1e-2)
* **etas** (_Tuple_[[float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)"), [float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)")], _optional_) – pair of (etaminus, etaplus), which are multiplicative increase and decrease factors (default: (0.5, 1.2))
* **step_sizes** (_Tuple_[[float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)"), [float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)")], _optional_) – a pair of minimal and maximal allowed step sizes (default: (1e-6, 50))

`step(closure=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/optim/rprop.html#Rprop.step)

Performs a single optimization step.

Parameters

**closure** (_callable_, _optional_) – A closure that reevaluates the model and returns the loss.

`class torch.optim.SGD(params, lr=<required parameter>, momentum=0, dampening=0, weight_decay=0, nesterov=False)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/optim/sgd.html#SGD)

Implements stochastic gradient descent (optionally with momentum).

Nesterov momentum is based on the formula from [On the importance of initialization and momentum in deep learning](http://www.cs.toronto.edu/%7Ehinton/absps/momentum.pdf).

Parameters

* **params** (_iterable_) – iterable of parameters to optimize or dicts defining parameter groups
* **lr** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)")) – learning rate
* **momentum** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)"), _optional_) – momentum factor (default: 0)
* **weight_decay** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)"), _optional_) – weight decay (L2 penalty) (default: 0)
* **dampening** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)"), _optional_) – dampening for momentum (default: 0)
* **nesterov** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)"), _optional_) – enables Nesterov momentum (default: False)

#### Example

>>> optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
>>> optimizer.zero_grad()
>>> loss_fn(model(input), target).backward()
>>> optimizer.step()

Note

The implementation of SGD with Momentum/Nesterov subtly differs from Sutskever et al. and implementations in some other frameworks. Considering the specific case of Momentum, the update can be written as

\begin{aligned} v_{t+1} & = \mu * v_{t} + g_{t+1}, \\ p_{t+1} & = p_{t} - \text{lr} * v_{t+1}, \end{aligned}

where p, g, v and \mu denote the parameters, gradient, velocity, and momentum respectively. This is in contrast to Sutskever et al. and other frameworks, which employ an update of the form

\begin{aligned} v_{t+1} & = \mu * v_{t} + \text{lr} * g_{t+1}, \\ p_{t+1} & = p_{t} - v_{t+1}. \end{aligned}
The Nesterov version is analogously modified.

`step(closure=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/optim/sgd.html#SGD.step)

Performs a single optimization step.

Parameters

**closure** (_callable_, _optional_) – A closure that reevaluates the model and returns the loss.

## How to adjust learning rate

`torch.optim.lr_scheduler` provides several methods to adjust the learning rate based on the number of epochs. `torch.optim.lr_scheduler.ReduceLROnPlateau` allows dynamic learning rate reduction based on some validation measurements.

Learning rate scheduling should be applied after the optimizer’s update; e.g., you should write your code this way:

>>> scheduler = ...
>>> for epoch in range(100):
>>>     train(...)
>>>     validate(...)
>>>     scheduler.step()

Warning

Prior to PyTorch 1.1.0, the learning rate scheduler was expected to be called before the optimizer’s update; 1.1.0 changed this behavior in a BC-breaking way. If you use the learning rate scheduler (calling `scheduler.step()`) before the optimizer’s update (calling `optimizer.step()`), this will skip the first value of the learning rate schedule. If you are unable to reproduce results after upgrading to PyTorch 1.1.0, please check if you are calling `scheduler.step()` at the wrong time.

`class torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda, last_epoch=-1, verbose=False)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/optim/lr_scheduler.html#LambdaLR)

Sets the learning rate of each parameter group to the initial lr times a given function. When last_epoch=-1, sets initial lr as lr.

Parameters

* **optimizer** (Optimizer) – Wrapped optimizer.
* **lr_lambda** (_function_ or [list](https://docs.python.org/3/library/stdtypes.html#list "\(in Python v3.9\)")) – A function which computes a multiplicative factor given an integer parameter epoch, or a list of such functions, one for each group in optimizer.param_groups.
* **last_epoch** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – The index of the last epoch. Default: -1.
* **verbose** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – If `True`, prints a message to stdout for each update. Default: `False`.

#### Example

>>> # Assuming optimizer has two groups.
>>> lambda1 = lambda epoch: epoch // 30
>>> lambda2 = lambda epoch: 0.95 ** epoch
>>> scheduler = LambdaLR(optimizer, lr_lambda=[lambda1, lambda2])
>>> for epoch in range(100):
>>>     train(...)
>>>     validate(...)
>>>     scheduler.step()

`load_state_dict(state_dict)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/optim/lr_scheduler.html#LambdaLR.load_state_dict)

Loads the scheduler’s state.

When saving or loading the scheduler, please make sure to also save or load the state of the optimizer.

Parameters

**state_dict** ([dict](https://docs.python.org/3/library/stdtypes.html#dict "\(in Python v3.9\)")) – scheduler state. Should be an object returned from a call to `state_dict()`.

`state_dict()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/optim/lr_scheduler.html#LambdaLR.state_dict)

Returns the state of the scheduler as a [`dict`](https://docs.python.org/3/library/stdtypes.html#dict "\(in Python v3.9\)"). It contains an entry for every variable in self.__dict__ which is not the optimizer. The learning rate lambda functions will only be saved if they are callable objects and not if they are functions or lambdas.
When saving or loading the scheduler, please make sure to also save or load the state of the optimizer. `class torch.optim.lr_scheduler.MultiplicativeLR(optimizer, lr_lambda, last_epoch=-1, verbose=False)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/optim/lr_scheduler.html#MultiplicativeLR) Multiply the learning rate of each parameter group by the factor given in the specified function. When last_epoch=-1, sets initial lr as lr. Parameters * **optimizer** (Optimizer) – Wrapped optimizer. * **lr_lambda** (_function_ _or_[list](https://docs.python.org/3/library/stdtypes.html#list "\(in Python v3.9\)")) – A function which computes a multiplicative factor given an integer parameter epoch, or a list of such functions, one for each group in optimizer.param_groups. * **last_epoch** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – The index of last epoch. Default: -1. * **verbose** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – If `True`, prints a message to stdout for each update. Default: `False`. #### Example >>> lmbda = lambda epoch: 0.95 >>> scheduler = MultiplicativeLR(optimizer, lr_lambda=lmbda) >>> for epoch in range(100): >>> train(...) >>> validate(...) >>> scheduler.step() `load_state_dict(state_dict)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/optim/lr_scheduler.html#MultiplicativeLR.load_state_dict) Loads the schedulers state. Parameters **state_dict** ([dict](https://docs.python.org/3/library/stdtypes.html#dict "\(in Python v3.9\)")) – scheduler state. Should be an object returned from a call to `state_dict()`. `state_dict()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/optim/lr_scheduler.html#MultiplicativeLR.state_dict) Returns the state of the scheduler as a [`dict`](https://docs.python.org/3/library/stdtypes.html#dict "\(in Python v3.9\)"). It contains an entry for every variable in self.__dict__ which is not the optimizer. The learning rate lambda functions will only be saved if they are callable objects and not if they are functions or lambdas. `class torch.optim.lr_scheduler.StepLR(optimizer, step_size, gamma=0.1, last_epoch=-1, verbose=False)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/optim/lr_scheduler.html#StepLR) Decays the learning rate of each parameter group by gamma every step_size epochs. Notice that such decay can happen simultaneously with other changes to the learning rate from outside this scheduler. When last_epoch=-1, sets initial lr as lr. Parameters * **optimizer** (Optimizer) – Wrapped optimizer. * **step_size** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – Period of learning rate decay. * **gamma** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)")) – Multiplicative factor of learning rate decay. Default: 0.1. * **last_epoch** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – The index of last epoch. Default: -1. * **verbose** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – If `True`, prints a message to stdout for each update. Default: `False`. #### Example >>> # Assuming optimizer uses lr = 0.05 for all groups >>> # lr = 0.05 if epoch < 30 >>> # lr = 0.005 if 30 <= epoch < 60 >>> # lr = 0.0005 if 60 <= epoch < 90 >>> # ... >>> scheduler = StepLR(optimizer, step_size=30, gamma=0.1) >>> for epoch in range(100): >>> train(...) >>> validate(...) 
>>>     scheduler.step()

`class torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones, gamma=0.1, last_epoch=-1, verbose=False)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/optim/lr_scheduler.html#MultiStepLR)

Decays the learning rate of each parameter group by gamma once the number of epochs reaches one of the milestones. Notice that such decay can happen simultaneously with other changes to the learning rate from outside this scheduler. When last_epoch=-1, sets initial lr as lr.

Parameters

* **optimizer** (Optimizer) – Wrapped optimizer.
* **milestones** ([list](https://docs.python.org/3/library/stdtypes.html#list "\(in Python v3.9\)")) – List of epoch indices. Must be increasing.
* **gamma** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)")) – Multiplicative factor of learning rate decay. Default: 0.1.
* **last_epoch** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – The index of the last epoch. Default: -1.
* **verbose** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – If `True`, prints a message to stdout for each update. Default: `False`.

#### Example

>>> # Assuming optimizer uses lr = 0.05 for all groups
>>> # lr = 0.05 if epoch < 30
>>> # lr = 0.005 if 30 <= epoch < 80
>>> # lr = 0.0005 if epoch >= 80
>>> scheduler = MultiStepLR(optimizer, milestones=[30,80], gamma=0.1)
>>> for epoch in range(100):
>>>     train(...)
>>>     validate(...)
>>>     scheduler.step()

`class torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma, last_epoch=-1, verbose=False)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/optim/lr_scheduler.html#ExponentialLR)

Decays the learning rate of each parameter group by gamma every epoch. When last_epoch=-1, sets initial lr as lr.

Parameters

* **optimizer** (Optimizer) – Wrapped optimizer.
* **gamma** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)")) – Multiplicative factor of learning rate decay.
* **last_epoch** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – The index of the last epoch. Default: -1.
* **verbose** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – If `True`, prints a message to stdout for each update. Default: `False`.

`class torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max, eta_min=0, last_epoch=-1, verbose=False)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/optim/lr_scheduler.html#CosineAnnealingLR)

Set the learning rate of each parameter group using a cosine annealing schedule, where \eta_{max} is set to the initial lr and T_{cur} is the number of epochs since the last restart in SGDR:

\begin{aligned} \eta_t & = \eta_{min} + \frac{1}{2}(\eta_{max} - \eta_{min})\left(1 + \cos\left(\frac{T_{cur}}{T_{max}}\pi\right)\right), & T_{cur} \neq (2k+1)T_{max}; \\ \eta_{t+1} & = \eta_{t} + \frac{1}{2}(\eta_{max} - \eta_{min})\left(1 - \cos\left(\frac{1}{T_{max}}\pi\right)\right), & T_{cur} = (2k+1)T_{max}. \end{aligned}

When last_epoch=-1, sets initial lr as lr. Notice that because the schedule is defined recursively, the learning rate can be simultaneously modified outside this scheduler by other operators.
If the learning rate is set solely by this scheduler, the learning rate at each step becomes:

\eta_t = \eta_{min} + \frac{1}{2}(\eta_{max} - \eta_{min})\left(1 + \cos\left(\frac{T_{cur}}{T_{max}}\pi\right)\right)

It has been proposed in [SGDR: Stochastic Gradient Descent with Warm Restarts](https://arxiv.org/abs/1608.03983). Note that this only implements the cosine annealing part of SGDR, and not the restarts.

Parameters

* **optimizer** (Optimizer) – Wrapped optimizer.
* **T_max** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – Maximum number of iterations.
* **eta_min** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)")) – Minimum learning rate. Default: 0.
* **last_epoch** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – The index of the last epoch. Default: -1.
* **verbose** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – If `True`, prints a message to stdout for each update. Default: `False`.

`class torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, mode='min', factor=0.1, patience=10, threshold=0.0001, threshold_mode='rel', cooldown=0, min_lr=0, eps=1e-08, verbose=False)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/optim/lr_scheduler.html#ReduceLROnPlateau)

Reduce learning rate when a metric has stopped improving. Models often benefit from reducing the learning rate by a factor of 2-10 once learning stagnates. This scheduler reads a metric quantity and, if no improvement is seen for a ‘patience’ number of epochs, the learning rate is reduced.

Parameters

* **optimizer** (Optimizer) – Wrapped optimizer.
* **mode** ([str](https://docs.python.org/3/library/stdtypes.html#str "\(in Python v3.9\)")) – One of `min`, `max`. In `min` mode, lr will be reduced when the quantity monitored has stopped decreasing; in `max` mode it will be reduced when the quantity monitored has stopped increasing. Default: ‘min’.
* **factor** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)")) – Factor by which the learning rate will be reduced. new_lr = lr * factor. Default: 0.1.
* **patience** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – Number of epochs with no improvement after which learning rate will be reduced. For example, if `patience = 2`, then we will ignore the first 2 epochs with no improvement, and will only decrease the LR after the 3rd epoch if the loss still hasn’t improved then. Default: 10.
* **threshold** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)")) – Threshold for measuring the new optimum, to only focus on significant changes. Default: 1e-4.
* **threshold_mode** ([str](https://docs.python.org/3/library/stdtypes.html#str "\(in Python v3.9\)")) – One of `rel`, `abs`. In `rel` mode, dynamic_threshold = best * ( 1 + threshold ) in ‘max’ mode or best * ( 1 - threshold ) in `min` mode. In `abs` mode, dynamic_threshold = best + threshold in `max` mode or best - threshold in `min` mode. Default: ‘rel’.
* **cooldown** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – Number of epochs to wait before resuming normal operation after lr has been reduced. Default: 0.
* **min_lr** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _or_[list](https://docs.python.org/3/library/stdtypes.html#list "\(in Python v3.9\)")) – A scalar or a list of scalars. A lower bound on the learning rate of all param groups or each group respectively. Default: 0. * **eps** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)")) – Minimal decay applied to lr. If the difference between new and old lr is smaller than eps, the update is ignored. Default: 1e-8. * **verbose** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – If `True`, prints a message to stdout for each update. Default: `False`. #### Example >>> optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9) >>> scheduler = ReduceLROnPlateau(optimizer, 'min') >>> for epoch in range(10): >>> train(...) >>> val_loss = validate(...) >>> # Note that step should be called after validate() >>> scheduler.step(val_loss) `class torch.optim.lr_scheduler.CyclicLR(optimizer, base_lr, max_lr, step_size_up=2000, step_size_down=None, mode='triangular', gamma=1.0, scale_fn=None, scale_mode='cycle', cycle_momentum=True, base_momentum=0.8, max_momentum=0.9, last_epoch=-1, verbose=False)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/optim/lr_scheduler.html#CyclicLR) Sets the learning rate of each parameter group according to cyclical learning rate policy (CLR). The policy cycles the learning rate between two boundaries with a constant frequency, as detailed in the paper [Cyclical Learning Rates for Training Neural Networks](https://arxiv.org/abs/1506.01186). The distance between the two boundaries can be scaled on a per-iteration or per-cycle basis. Cyclical learning rate policy changes the learning rate after every batch. `step` should be called after a batch has been used for training. This class has three built-in policies, as put forth in the paper: * “triangular”: A basic triangular cycle without amplitude scaling. * “triangular2”: A basic triangular cycle that scales initial amplitude by half each cycle. * “exp_range”: A cycle that scales initial amplitude by gammacycle iterations\text{gamma}^{\text{cycle iterations}} at each cycle iteration. This implementation was adapted from the github repo: [bckenstler/CLR](https://github.com/bckenstler/CLR) Parameters * **optimizer** (Optimizer) – Wrapped optimizer. * **base_lr** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _or_[list](https://docs.python.org/3/library/stdtypes.html#list "\(in Python v3.9\)")) – Initial learning rate which is the lower boundary in the cycle for each parameter group. * **max_lr** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _or_[list](https://docs.python.org/3/library/stdtypes.html#list "\(in Python v3.9\)")) – Upper learning rate boundaries in the cycle for each parameter group. Functionally, it defines the cycle amplitude (max_lr - base_lr). The lr at any cycle is the sum of base_lr and some scaling of the amplitude; therefore max_lr may not actually be reached depending on scaling function. * **step_size_up** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – Number of training iterations in the increasing half of a cycle. Default: 2000 * **step_size_down** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – Number of training iterations in the decreasing half of a cycle. 
If step_size_down is None, it is set to step_size_up. Default: None * **mode** ([str](https://docs.python.org/3/library/stdtypes.html#str "\(in Python v3.9\)")) – One of {triangular, triangular2, exp_range}. Values correspond to policies detailed above. If scale_fn is not None, this argument is ignored. Default: ‘triangular’ * **gamma** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)")) – Constant in ‘exp_range’ scaling function: gamma**(cycle iterations) Default: 1.0 * **scale_fn** (_function_) – Custom scaling policy defined by a single argument lambda function, where 0 <= scale_fn(x) <= 1 for all x >= 0. If specified, then ‘mode’ is ignored. Default: None * **scale_mode** ([str](https://docs.python.org/3/library/stdtypes.html#str "\(in Python v3.9\)")) – {‘cycle’, ‘iterations’}. Defines whether scale_fn is evaluated on cycle number or cycle iterations (training iterations since start of cycle). Default: ‘cycle’ * **cycle_momentum** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – If `True`, momentum is cycled inversely to learning rate between ‘base_momentum’ and ‘max_momentum’. Default: True * **base_momentum** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _or_[list](https://docs.python.org/3/library/stdtypes.html#list "\(in Python v3.9\)")) – Lower momentum boundaries in the cycle for each parameter group. Note that momentum is cycled inversely to learning rate; at the peak of a cycle, momentum is ‘base_momentum’ and learning rate is ‘max_lr’. Default: 0.8 * **max_momentum** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _or_[list](https://docs.python.org/3/library/stdtypes.html#list "\(in Python v3.9\)")) – Upper momentum boundaries in the cycle for each parameter group. Functionally, it defines the cycle amplitude (max_momentum - base_momentum). The momentum at any cycle is the difference of max_momentum and some scaling of the amplitude; therefore base_momentum may not actually be reached depending on scaling function. Note that momentum is cycled inversely to learning rate; at the start of a cycle, momentum is ‘max_momentum’ and learning rate is ‘base_lr’ Default: 0.9 * **last_epoch** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – The index of the last batch. This parameter is used when resuming a training job. Since `step()` should be invoked after each batch instead of after each epoch, this number represents the total number of _batches_ computed, not the total number of epochs computed. When last_epoch=-1, the schedule is started from the beginning. Default: -1 * **verbose** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – If `True`, prints a message to stdout for each update. Default: `False`. #### Example >>> optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9) >>> scheduler = torch.optim.lr_scheduler.CyclicLR(optimizer, base_lr=0.01, max_lr=0.1) >>> data_loader = torch.utils.data.DataLoader(...) >>> for epoch in range(10): >>> for batch in data_loader: >>> train_batch(...) >>> scheduler.step() `get_lr()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/optim/lr_scheduler.html#CyclicLR.get_lr) Calculates the learning rate at batch index. This function treats `self.last_epoch` as the last batch index. If `self.cycle_momentum` is `True`, this function has a side effect of updating the optimizer’s momentum. 
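To illustrate the `scale_fn` and `scale_mode` parameters of `CyclicLR` documented above, the following sketch (the decay constant 0.99 is arbitrary) defines a custom per-iteration scaling policy; when `scale_fn` is given, `mode` is ignored:

>>> optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
>>> # Amplitude decays exponentially with the iteration count;
>>> # scale_fn(x) stays within [0, 1] for all x >= 0, as required.
>>> scale_fn = lambda x: 0.99 ** x
>>> scheduler = torch.optim.lr_scheduler.CyclicLR(optimizer, base_lr=0.01, max_lr=0.1,
>>>                                               scale_fn=scale_fn, scale_mode='iterations')
>>> data_loader = torch.utils.data.DataLoader(...)
>>> for epoch in range(10):
>>>     for batch in data_loader:
>>>         train_batch(...)
>>>         scheduler.step()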
`class torch.optim.lr_scheduler.OneCycleLR(optimizer, max_lr, total_steps=None, epochs=None, steps_per_epoch=None, pct_start=0.3, anneal_strategy='cos', cycle_momentum=True, base_momentum=0.85, max_momentum=0.95, div_factor=25.0, final_div_factor=10000.0, three_phase=False, last_epoch=-1, verbose=False)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/optim/lr_scheduler.html#OneCycleLR) Sets the learning rate of each parameter group according to the 1cycle learning rate policy. The 1cycle policy anneals the learning rate from an initial learning rate to some maximum learning rate and then from that maximum learning rate to some minimum learning rate much lower than the initial learning rate. This policy was initially described in the paper [Super- Convergence: Very Fast Training of Neural Networks Using Large Learning Rates](https://arxiv.org/abs/1708.07120). The 1cycle learning rate policy changes the learning rate after every batch. `step` should be called after a batch has been used for training. This scheduler is not chainable. Note also that the total number of steps in the cycle can be determined in one of two ways (listed in order of precedence): 1. A value for total_steps is explicitly provided. 2. A number of epochs (epochs) and a number of steps per epoch (steps_per_epoch) are provided. In this case, the number of total steps is inferred by total_steps = epochs * steps_per_epoch You must either provide a value for total_steps or provide a value for both epochs and steps_per_epoch. The default behaviour of this scheduler follows the fastai implementation of 1cycle, which claims that “unpublished work has shown even better results by using only two phases”. To mimic the behaviour of the original paper instead, set `three_phase=True`. Parameters * **optimizer** (Optimizer) – Wrapped optimizer. * **max_lr** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _or_[list](https://docs.python.org/3/library/stdtypes.html#list "\(in Python v3.9\)")) – Upper learning rate boundaries in the cycle for each parameter group. * **total_steps** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – The total number of steps in the cycle. Note that if a value is not provided here, then it must be inferred by providing a value for epochs and steps_per_epoch. Default: None * **epochs** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – The number of epochs to train for. This is used along with steps_per_epoch in order to infer the total number of steps in the cycle if a value for total_steps is not provided. Default: None * **steps_per_epoch** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – The number of steps per epoch to train for. This is used along with epochs in order to infer the total number of steps in the cycle if a value for total_steps is not provided. Default: None * **pct_start** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)")) – The percentage of the cycle (in number of steps) spent increasing the learning rate. Default: 0.3 * **anneal_strategy** ([str](https://docs.python.org/3/library/stdtypes.html#str "\(in Python v3.9\)")) – {‘cos’, ‘linear’} Specifies the annealing strategy: “cos” for cosine annealing, “linear” for linear annealing. 
Default: ‘cos’ * **cycle_momentum** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – If `True`, momentum is cycled inversely to learning rate between ‘base_momentum’ and ‘max_momentum’. Default: True * **base_momentum** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _or_[list](https://docs.python.org/3/library/stdtypes.html#list "\(in Python v3.9\)")) – Lower momentum boundaries in the cycle for each parameter group. Note that momentum is cycled inversely to learning rate; at the peak of a cycle, momentum is ‘base_momentum’ and learning rate is ‘max_lr’. Default: 0.85 * **max_momentum** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _or_[list](https://docs.python.org/3/library/stdtypes.html#list "\(in Python v3.9\)")) – Upper momentum boundaries in the cycle for each parameter group. Functionally, it defines the cycle amplitude (max_momentum - base_momentum). Note that momentum is cycled inversely to learning rate; at the start of a cycle, momentum is ‘max_momentum’ and learning rate is ‘base_lr’ Default: 0.95 * **div_factor** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)")) – Determines the initial learning rate via initial_lr = max_lr/div_factor Default: 25 * **final_div_factor** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)")) – Determines the minimum learning rate via min_lr = initial_lr/final_div_factor Default: 1e4 * **three_phase** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – If `True`, use a third phase of the schedule to annihilate the learning rate according to ‘final_div_factor’ instead of modifying the second phase (the first two phases will be symmetrical about the step indicated by ‘pct_start’). * **last_epoch** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – The index of the last batch. This parameter is used when resuming a training job. Since `step()` should be invoked after each batch instead of after each epoch, this number represents the total number of _batches_ computed, not the total number of epochs computed. When last_epoch=-1, the schedule is started from the beginning. Default: -1 * **verbose** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – If `True`, prints a message to stdout for each update. Default: `False`. #### Example >>> data_loader = torch.utils.data.DataLoader(...) >>> optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9) >>> scheduler = torch.optim.lr_scheduler.OneCycleLR(optimizer, max_lr=0.01, steps_per_epoch=len(data_loader), epochs=10) >>> for epoch in range(10): >>> for batch in data_loader: >>> train_batch(...) 
>>> scheduler.step()

`class torch.optim.lr_scheduler.CosineAnnealingWarmRestarts(optimizer, T_0, T_mult=1, eta_min=0, last_epoch=-1, verbose=False)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/optim/lr_scheduler.html#CosineAnnealingWarmRestarts)

Set the learning rate of each parameter group using a cosine annealing schedule, where \eta_{max} is set to the initial lr, T_{cur} is the number of epochs since the last restart and T_{i} is the number of epochs between two warm restarts in SGDR:

\eta_t = \eta_{min} + \frac{1}{2}(\eta_{max} - \eta_{min})\left(1 + \cos\left(\frac{T_{cur}}{T_{i}}\pi\right)\right)

When T_{cur} = T_{i}, set \eta_t = \eta_{min}. When T_{cur} = 0 after a restart, set \eta_t = \eta_{max}.

It has been proposed in [SGDR: Stochastic Gradient Descent with Warm Restarts](https://arxiv.org/abs/1608.03983).

Parameters

* **optimizer** (Optimizer) – Wrapped optimizer.
* **T_0** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – Number of iterations for the first restart.
* **T_mult** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – A factor by which T_{i} increases after a restart. Default: 1.
* **eta_min** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _,__optional_) – Minimum learning rate. Default: 0.
* **last_epoch** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – The index of the last epoch. Default: -1.
* **verbose** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – If `True`, prints a message to stdout for each update. Default: `False`.

`step(epoch=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/optim/lr_scheduler.html#CosineAnnealingWarmRestarts.step)

Step can be called after every batch update.

#### Example

>>> scheduler = CosineAnnealingWarmRestarts(optimizer, T_0, T_mult)
>>> iters = len(dataloader)
>>> for epoch in range(20):
>>>     for i, sample in enumerate(dataloader):
>>>         inputs, labels = sample['inputs'], sample['labels']
>>>         optimizer.zero_grad()
>>>         outputs = net(inputs)
>>>         loss = criterion(outputs, labels)
>>>         loss.backward()
>>>         optimizer.step()
>>>         scheduler.step(epoch + i / iters)

This function can be called in an interleaved way.

#### Example

>>> scheduler = CosineAnnealingWarmRestarts(optimizer, T_0, T_mult)
>>> for epoch in range(20):
>>>     scheduler.step()
>>> scheduler.step(26)
>>> scheduler.step()  # equivalent to scheduler.step(27), continuing from the interleaved call above

## Stochastic Weight Averaging

`torch.optim.swa_utils` implements Stochastic Weight Averaging (SWA). In particular, the `torch.optim.swa_utils.AveragedModel` class implements SWA models, `torch.optim.swa_utils.SWALR` implements the SWA learning rate scheduler and `torch.optim.swa_utils.update_bn()` is a utility function used to update SWA batch normalization statistics at the end of training.

SWA has been proposed in [Averaging Weights Leads to Wider Optima and Better Generalization](https://arxiv.org/abs/1803.05407).

### Constructing averaged models

The `AveragedModel` class serves to compute the weights of the SWA model. You can create an averaged model by running:

>>> swa_model = AveragedModel(model)

Here the model `model` can be an arbitrary [`torch.nn.Module`](generated/torch.nn.module#torch.nn.Module "torch.nn.Module") object. `swa_model` will keep track of the running averages of the parameters of the `model`.
To update these averages, you can use the `update_parameters()` function:

>>> swa_model.update_parameters(model)

### SWA learning rate schedules

Typically, in SWA the learning rate is set to a high constant value. `SWALR` is a learning rate scheduler that anneals the learning rate to a fixed value, and then keeps it constant. For example, the following code creates a scheduler that linearly anneals the learning rate from its initial value to 0.05 in 5 epochs within each parameter group:

>>> swa_scheduler = torch.optim.swa_utils.SWALR(optimizer, \
>>>         anneal_strategy="linear", anneal_epochs=5, swa_lr=0.05)

You can also use cosine annealing to a fixed value instead of linear annealing by setting `anneal_strategy="cos"`.

### Taking care of batch normalization

`update_bn()` is a utility function that allows you to compute the batchnorm statistics for the SWA model on a given dataloader `loader` at the end of training:

>>> torch.optim.swa_utils.update_bn(loader, swa_model)

`update_bn()` applies the `swa_model` to every element in the dataloader and computes the activation statistics for each batch normalization layer in the model.

Warning

`update_bn()` assumes that each batch in the dataloader `loader` is either a tensor or a list of tensors where the first element is the tensor that the network `swa_model` should be applied to. If your dataloader has a different structure, you can update the batch normalization statistics of the `swa_model` by doing a forward pass with the `swa_model` on each element of the dataset.

### Custom averaging strategies

By default, `torch.optim.swa_utils.AveragedModel` computes a running equal average of the parameters that you provide, but you can also use custom averaging functions with the `avg_fn` parameter. In the following example `ema_model` computes an exponential moving average.

Example:

>>> ema_avg = lambda averaged_model_parameter, model_parameter, num_averaged:\
>>>         0.1 * averaged_model_parameter + 0.9 * model_parameter
>>> ema_model = torch.optim.swa_utils.AveragedModel(model, avg_fn=ema_avg)

### Putting it all together

In the example below, `swa_model` is the SWA model that accumulates the averages of the weights. We train the model for a total of 300 epochs and we switch to the SWA learning rate schedule and start to collect SWA averages of the parameters at epoch 160:

>>> loader, optimizer, model, loss_fn = ...
>>> swa_model = torch.optim.swa_utils.AveragedModel(model)
>>> scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=300)
>>> swa_start = 160
>>> swa_scheduler = SWALR(optimizer, swa_lr=0.05)
>>>
>>> for epoch in range(300):
>>>     for input, target in loader:
>>>         optimizer.zero_grad()
>>>         loss_fn(model(input), target).backward()
>>>         optimizer.step()
>>>     if epoch > swa_start:
>>>         swa_model.update_parameters(model)
>>>         swa_scheduler.step()
>>>     else:
>>>         scheduler.step()
>>>
>>> # Update bn statistics for the swa_model at the end
>>> torch.optim.swa_utils.update_bn(loader, swa_model)
>>> # Use swa_model to make predictions on test data
>>> preds = swa_model(test_input)

# Pipeline Parallelism

Pipeline parallelism was originally introduced in the [Gpipe](https://arxiv.org/abs/1811.06965) paper and is an efficient technique to train large models on multiple GPUs.

Warning

Pipeline Parallelism is experimental and subject to change.

## Model Parallelism using multiple GPUs

Typically for large models which don’t fit on a single GPU, model parallelism is employed where certain parts of the model are placed on different GPUs.
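For instance, a two-layer model might be split across two GPUs as in the following minimal sketch (the layer sizes and device indices are purely illustrative):

>>> import torch
>>> import torch.nn as nn
>>>
>>> # Naive model parallelism: consecutive parts of the model live on different GPUs.
>>> part1 = nn.Linear(16, 8).cuda(0)   # first part on GPU 0
>>> part2 = nn.Linear(8, 4).cuda(1)    # second part on GPU 1
>>>
>>> x = torch.rand(32, 16).cuda(0)
>>> out = part2(part1(x).cuda(1))      # GPU 1 only starts once GPU 0 has finished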
However, if this is done naively for sequential models, the training process suffers from GPU under-utilization, since only one GPU is active at a time, as shown in the figure below:

The figure represents a model with 4 layers placed on 4 different GPUs (vertical axis). The horizontal axis represents training this model through time, demonstrating that only 1 GPU is utilized at a time ([image source](https://arxiv.org/abs/1811.06965)).

## Pipelined Execution

To alleviate this problem, pipeline parallelism splits the input minibatch into multiple microbatches and pipelines the execution of these microbatches across multiple GPUs. This is outlined in the figure below:

The figure represents a model with 4 layers placed on 4 different GPUs (vertical axis). The horizontal axis represents training this model through time, demonstrating that the GPUs are utilized much more efficiently. However, there still exists a bubble (as demonstrated in the figure) where certain GPUs are not utilized ([image source](https://arxiv.org/abs/1811.06965)).

## Pipe APIs in PyTorch

`class torch.distributed.pipeline.sync.Pipe(module, chunks=1, checkpoint='except_last', deferred_batch_norm=False)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributed/pipeline/sync/pipe.html#Pipe)

Wraps an arbitrary [`nn.Sequential`](generated/torch.nn.sequential#torch.nn.Sequential "torch.nn.Sequential") module to train on using synchronous pipeline parallelism. If the module requires lots of memory and doesn’t fit on a single GPU, pipeline parallelism is a useful technique to employ for training.

The implementation is based on the [torchgpipe](https://arxiv.org/abs/2004.09910) paper.

Pipe combines pipeline parallelism with checkpointing to reduce peak memory required to train while minimizing device under-utilization.

You should place all the modules on the appropriate devices and wrap them into an [`nn.Sequential`](generated/torch.nn.sequential#torch.nn.Sequential "torch.nn.Sequential") module defining the desired order of execution.

Parameters

* **module** ([`nn.Sequential`](generated/torch.nn.sequential#torch.nn.Sequential "torch.nn.Sequential")) – sequential module to be parallelized using pipelining. Each module in the sequence has to have all of its parameters on a single device. Each module in the sequence has to either be an nn.Module or [`nn.Sequential`](generated/torch.nn.sequential#torch.nn.Sequential "torch.nn.Sequential") (to combine multiple sequential modules on a single device)
* **chunks** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – number of micro-batches (default: `1`)
* **checkpoint** ([str](https://docs.python.org/3/library/stdtypes.html#str "\(in Python v3.9\)")) – when to enable checkpointing, one of `'always'`, `'except_last'`, or `'never'` (default: `'except_last'`). `'never'` disables checkpointing completely, `'except_last'` enables checkpointing for all micro-batches except the last one and `'always'` enables checkpointing for all micro-batches.
* **deferred_batch_norm** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – whether to use deferred `BatchNorm` moving statistics (default: [`False`](https://docs.python.org/3/library/constants.html#False "\(in Python v3.9\)")). If set to [`True`](https://docs.python.org/3/library/constants.html#True "\(in Python v3.9\)"), we track statistics across multiple micro-batches to update the running statistics per mini-batch.
Raises

* [**TypeError**](https://docs.python.org/3/library/exceptions.html#TypeError "\(in Python v3.9\)") – the module is not a [`nn.Sequential`](generated/torch.nn.sequential#torch.nn.Sequential "torch.nn.Sequential").
* [**ValueError**](https://docs.python.org/3/library/exceptions.html#ValueError "\(in Python v3.9\)") – invalid arguments

Example:: Pipeline of two FC layers across GPUs 0 and 1.

>>> fc1 = nn.Linear(16, 8).cuda(0)
>>> fc2 = nn.Linear(8, 4).cuda(1)
>>> model = nn.Sequential(fc1, fc2)
>>> model = Pipe(model, chunks=8)
>>> input = torch.rand(16, 16).cuda(0)
>>> output_rref = model(input)

Note

You can wrap a `Pipe` model with [`torch.nn.parallel.DistributedDataParallel`](generated/torch.nn.parallel.distributeddataparallel#torch.nn.parallel.DistributedDataParallel "torch.nn.parallel.DistributedDataParallel") only when the checkpoint parameter of `Pipe` is `'never'`.

Note

`Pipe` only supports intra-node pipelining currently, but will be expanded to support inter-node pipelining in the future. The forward function returns an [`RRef`](rpc#torch.distributed.rpc.RRef "torch.distributed.rpc.RRef") to allow for inter-node pipelining in the future, where the output might be on a remote host. For intra-node pipelining you can use [`local_value()`](rpc#torch.distributed.rpc.RRef.local_value "torch.distributed.rpc.RRef.local_value") to retrieve the output locally.

Warning

`Pipe` is experimental and subject to change.

`forward(input)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributed/pipeline/sync/pipe.html#Pipe.forward)

Processes a single input mini-batch through the pipe and returns an [`RRef`](rpc#torch.distributed.rpc.RRef "torch.distributed.rpc.RRef") pointing to the output. `Pipe` is a fairly transparent module wrapper. It doesn’t modify the input and output signature of the underlying module, but there is a type restriction: input and output have to be a [`Tensor`](tensors#torch.Tensor "torch.Tensor") or a sequence of tensors. This restriction is applied at partition boundaries too.

The input tensor is split into multiple micro-batches based on the `chunks` parameter used to initialize `Pipe`. The batch size is assumed to be the first dimension of the tensor and if the batch size is less than `chunks`, the number of micro-batches is equal to the batch size.

Parameters

**input** (torch.Tensor or sequence of [`Tensor`](tensors#torch.Tensor "torch.Tensor")) – input mini-batch

Returns

[`RRef`](rpc#torch.distributed.rpc.RRef "torch.distributed.rpc.RRef") to the output of the mini-batch

Raises

[**TypeError**](https://docs.python.org/3/library/exceptions.html#TypeError "\(in Python v3.9\)") – input is not a tensor or sequence of tensors.

### Skip connections

Certain models like ResNeXt are not completely sequential and have skip connections between layers. Naively implementing this as part of pipeline parallelism would imply that we need to copy outputs for certain layers through multiple GPUs until we eventually reach the GPU where the layer for the skip connection resides. To avoid this copy overhead, we provide the APIs below to stash and pop Tensors in different layers of the model.

`torch.distributed.pipeline.sync.skip.skippable.skippable(stash=(), pop=())` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributed/pipeline/sync/skip/skippable.html#skippable)

The decorator to define a [`nn.Module`](generated/torch.nn.module#torch.nn.Module "torch.nn.Module") with skip connections. Decorated modules are called “skippable”.
This functionality works perfectly fine even when the module is not wrapped by `Pipe`. Each skip tensor is managed by its name. Before manipulating skip tensors, a skippable module must statically declare the names for skip tensors by `stash` and/or `pop` parameters. Skip tensors with pre-declared name can be stashed by `yield stash(name, tensor)` or popped by `tensor = yield pop(name)`. Here is an example with three layers. A skip tensor named “1to3” is stashed and popped at the first and last layer, respectively: @skippable(stash=['1to3']) class Layer1(nn.Module): def forward(self, input): yield stash('1to3', input) return f1(input) class Layer2(nn.Module): def forward(self, input): return f2(input) @skippable(pop=['1to3']) class Layer3(nn.Module): def forward(self, input): skip_1to3 = yield pop('1to3') return f3(input) + skip_1to3 model = nn.Sequential(Layer1(), Layer2(), Layer3()) One skippable module can stash or pop multiple skip tensors: @skippable(stash=['alice', 'bob'], pop=['carol']) class StashStashPop(nn.Module): def forward(self, input): yield stash('alice', f_alice(input)) yield stash('bob', f_bob(input)) carol = yield pop('carol') return input + carol Every skip tensor must be associated with exactly one pair of `stash` and `pop`. `Pipe` checks this restriction automatically when wrapping a module. You can also check the restriction by `verify_skippables()` without `Pipe`. `class torch.distributed.pipeline.sync.skip.skippable.stash(name, tensor)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributed/pipeline/sync/skip/skippable.html#stash) The command to stash a skip tensor. def forward(self, input): yield stash('name', input) return f(input) Parameters * **name** ([str](https://docs.python.org/3/library/stdtypes.html#str "\(in Python v3.9\)")) – name of skip tensor * **input** ([torch.Tensor](tensors#torch.Tensor "torch.Tensor") _or_[None](https://docs.python.org/3/library/constants.html#None "\(in Python v3.9\)")) – tensor to pass to the skip connection `class torch.distributed.pipeline.sync.skip.skippable.pop(name)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributed/pipeline/sync/skip/skippable.html#pop) The command to pop a skip tensor. def forward(self, input): skip = yield pop('name') return f(input) + skip Parameters **name** ([str](https://docs.python.org/3/library/stdtypes.html#str "\(in Python v3.9\)")) – name of skip tensor Returns the skip tensor previously stashed by another layer under the same name `torch.distributed.pipeline.sync.skip.skippable.verify_skippables(module)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributed/pipeline/sync/skip/skippable.html#verify_skippables) Verifies if the underlying skippable modules satisfy integrity. Every skip tensor must have only one pair of `stash` and `pop`. If there are one or more unmatched pairs, it will raise [`TypeError`](https://docs.python.org/3/library/exceptions.html#TypeError "\(in Python v3.9\)") with the detailed messages. Here are a few failure cases. `verify_skippables()` will report failure for these cases: # Layer1 stashes "1to3". # Layer3 pops "1to3". nn.Sequential(Layer1(), Layer2()) # └──── ? nn.Sequential(Layer2(), Layer3()) # ? ────┘ nn.Sequential(Layer1(), Layer2(), Layer3(), Layer3()) # └───────────────────┘ ^^^^^^ nn.Sequential(Layer1(), Layer1(), Layer2(), Layer3()) # ^^^^^^ └───────────────────┘ To use the same name for multiple skip tensors, they must be isolated by different namespaces. See `isolate()`. 
Raises [**TypeError**](https://docs.python.org/3/library/exceptions.html#TypeError "\(in Python v3.9\)") – one or more pairs of `stash` and `pop` are not matched. ## Acknowledgements The implementation for pipeline parallelism is based on [fairscale’s pipe implementation](https://github.com/facebookresearch/fairscale/tree/master/fairscale/nn/pipe) and [torchgpipe](https://github.com/kakaobrain/torchgpipe). We would like to thank both teams for their contributions and guidance towards bringing pipeline parallelism into PyTorch. # torch.random `torch.random.fork_rng(devices=None, enabled=True, _caller='fork_rng', _devices_kw='devices')` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/random.html#fork_rng) Forks the RNG, so that when you return, the RNG is reset to the state that it was previously in. Parameters * **devices** (_iterable of CUDA IDs_) – CUDA devices for which to fork the RNG. CPU RNG state is always forked. By default, `fork_rng()` operates on all devices, but will emit a warning if your machine has a lot of devices, since this function will run very slowly in that case. If you explicitly specify devices, this warning will be suppressed * **enabled** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – if `False`, the RNG is not forked. This is a convenience argument for easily disabling the context manager without having to delete it and unindent your Python code under it. `torch.random.get_rng_state()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/random.html#get_rng_state) Returns the random number generator state as a `torch.ByteTensor`. `torch.random.initial_seed()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/random.html#initial_seed) Returns the initial seed for generating random numbers as a Python `long`. `torch.random.manual_seed(seed)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/random.html#manual_seed) Sets the seed for generating random numbers. Returns a `torch.Generator` object. Parameters **seed** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – The desired seed. Value must be within the inclusive range `[-0x8000_0000_0000_0000, 0xffff_ffff_ffff_ffff]`. Otherwise, a RuntimeError is raised. Negative inputs are remapped to positive values with the formula `0xffff_ffff_ffff_ffff + seed`. `torch.random.seed()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/random.html#seed) Sets the seed for generating random numbers to a non-deterministic random number. Returns a 64 bit number used to seed the RNG. `torch.random.set_rng_state(new_state)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/random.html#set_rng_state) Sets the random number generator state. Parameters **new_state** (_torch.ByteTensor_) – The desired state # Distributed RPC Framework The distributed RPC framework provides mechanisms for multi-machine model training through a set of primitives to allow for remote communication, and a higher-level API to automatically differentiate models split across several machines. Warning APIs in the RPC package are stable. There are multiple ongoing work items to improve performance and error handling, which will ship in future releases. Note Please refer to [PyTorch Distributed Overview](https://pytorch.org/tutorials/beginner/dist_overview.html) for a brief introduction to all features related to distributed training. 
## Basics

The distributed RPC framework makes it easy to run functions remotely, supports referencing remote objects without copying the real data around, and provides autograd and optimizer APIs to transparently run backward passes and update parameters across RPC boundaries. These features can be categorized into four sets of APIs.

1. **Remote Procedure Call (RPC)** supports running a function on the specified destination worker with the given arguments and getting the return value back or creating a reference to the return value. There are three main RPC APIs: `rpc_sync()` (synchronous), `rpc_async()` (asynchronous), and `remote()` (asynchronous, returning a reference to the remote return value); a short sketch contrasting the three call styles follows this list. Use the synchronous API if the user code cannot proceed without the return value. Otherwise, use the asynchronous API to get a future, and wait on the future when the return value is needed on the caller. The `remote()` API is useful when the requirement is to create something remotely but never fetch it to the caller. Imagine the case that a driver process is setting up a parameter server and a trainer. The driver can create an embedding table on the parameter server and then share the reference to the embedding table with the trainer, but it will never use the embedding table locally. In this case, `rpc_sync()` and `rpc_async()` are no longer appropriate, as they always imply that the return value will be returned to the caller immediately or in the future.
2. **Remote Reference (RRef)** serves as a distributed shared pointer to a local or remote object. It can be shared with other workers and reference counting will be handled transparently. Each RRef only has one owner and the object only lives on that owner. Non-owner workers holding RRefs can get copies of the object from the owner by explicitly requesting it. This is useful when a worker needs to access some data object, but is itself neither the creator (the caller of `remote()`) nor the owner of the object. The distributed optimizer, as we will discuss below, is one example of such use cases.
3. **Distributed Autograd** stitches together the local autograd engines on all the workers involved in the forward pass, and automatically reaches out to them during the backward pass to compute gradients. This is especially helpful if the forward pass needs to span multiple machines, e.g., in distributed model parallel training, parameter-server training, etc. With this feature, user code no longer needs to worry about how to send gradients across RPC boundaries and in which order the local autograd engines should be launched, which can become quite complicated when there are nested and inter-dependent RPC calls in the forward pass.
4. The **Distributed Optimizer**’s constructor takes an [`Optimizer()`](optim#torch.optim.Optimizer "torch.optim.Optimizer") (e.g., [`SGD()`](optim#torch.optim.SGD "torch.optim.SGD"), [`Adagrad()`](optim#torch.optim.Adagrad "torch.optim.Adagrad"), etc.) and a list of parameter RRefs, creates an [`Optimizer()`](optim#torch.optim.Optimizer "torch.optim.Optimizer") instance on each distinct RRef owner, and updates parameters accordingly when running `step()`. When you have distributed forward and backward passes, parameters and gradients will be scattered across multiple workers, and hence an optimizer is needed on each of the involved workers. The Distributed Optimizer wraps all those local optimizers into one, and provides a concise constructor and `step()` API.
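As a minimal illustration of the three call styles above (this sketch assumes two already-initialized workers, with the peer named "ps" purely for illustration; see the `init_rpc()` documentation below for the actual setup):

>>> import torch
>>> import torch.distributed.rpc as rpc
>>>
>>> # Blocking call: waits for the result of torch.add on worker "ps".
>>> ret = rpc.rpc_sync("ps", torch.add, args=(torch.ones(2), 1))
>>> # Non-blocking call: returns a Future immediately.
>>> fut = rpc.rpc_async("ps", torch.add, args=(torch.ones(2), 1))
>>> # Remote call: returns an RRef; the result stays on "ps" until fetched.
>>> rref = rpc.remote("ps", torch.add, args=(torch.ones(2), 1))
>>>
>>> total = ret + fut.wait() + rref.to_here()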
## RPC Before using RPC and distributed autograd primitives, initialization must take place. To initialize the RPC framework we need to use `init_rpc()` which would initialize the RPC framework, RRef framework and distributed autograd. `torch.distributed.rpc.init_rpc(name, backend=None, rank=-1, world_size=None, rpc_backend_options=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributed/rpc.html#init_rpc) Initializes RPC primitives such as the local RPC agent and distributed autograd, which immediately makes the current process ready to send and receive RPCs. Parameters * **name** ([str](https://docs.python.org/3/library/stdtypes.html#str "\(in Python v3.9\)")) – a globally unique name of this node. (e.g., `Trainer3`, `ParameterServer2`, `Master`, `Worker1`) Name can only contain number, alphabet, underscore, colon, and/or dash, and must be shorter than 128 characters. * **backend** (BackendType _,__optional_) – The type of RPC backend implementation. Supported values include `BackendType.TENSORPIPE` (the default) and `BackendType.PROCESS_GROUP`. See Backends for more information. * **rank** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – a globally unique id/rank of this node. * **world_size** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – The number of workers in the group. * **rpc_backend_options** (RpcBackendOptions _,__optional_) – The options passed to the RpcAgent constructor. It must be an agent-specific subclass of `RpcBackendOptions` and contains agent-specific initialization configurations. By default, for all agents, it sets the default timeout to 60 seconds and performs the rendezvous with an underlying process group initialized using `init_method = "env://"`, meaning that environment variables `MASTER_ADDR` and `MASTER_PORT` need to be set properly. See Backends for more information and find which options are available. The following APIs allow users to remotely execute functions as well as create references (RRefs) to remote data objects. In these APIs, when passing a `Tensor` as an argument or a return value, the destination worker will try to create a `Tensor` with the same meta (i.e., shape, stride, etc.). We intentionally disallow transmitting CUDA tensors because it might crash if the device lists on source and destination workers do not match. In such cases, applications can always explicitly move the input tensors to CPU on the caller and move it to the desired devices on the callee if necessary. Warning TorchScript support in RPC is a prototype feature and subject to change. Since v1.5.0, `torch.distributed.rpc` supports calling TorchScript functions as RPC target functions, and this will help improve parallelism on the callee side as executing TorchScript functions does not require GIL. `torch.distributed.rpc.rpc_sync(to, func, args=None, kwargs=None, timeout=-1.0)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributed/rpc/api.html#rpc_sync) Make a blocking RPC call to run function `func` on worker `to`. RPC messages are sent and received in parallel to execution of Python code. This method is thread-safe. Parameters * **to** ([str](https://docs.python.org/3/library/stdtypes.html#str "\(in Python v3.9\)") _or_WorkerInfo _or_[int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – name/rank/`WorkerInfo` of the destination worker. * **func** (_callable_) – a callable function, such as Python callables, builtin operators (e.g. 
[`add()`](generated/torch.add#torch.add "torch.add")) and annotated TorchScript functions. * **args** ([tuple](https://docs.python.org/3/library/stdtypes.html#tuple "\(in Python v3.9\)")) – the argument tuple for the `func` invocation. * **kwargs** ([dict](https://docs.python.org/3/library/stdtypes.html#dict "\(in Python v3.9\)")) – is a dictionary of keyword arguments for the `func` invocation. * **timeout** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _,__optional_) – timeout in seconds to use for this RPC. If the RPC does not complete in this amount of time, an exception indicating it has timed out will be raised. A value of 0 indicates an infinite timeout, i.e. a timeout error will never be raised. If not provided, the default value set during initialization or with `_set_rpc_timeout` is used. Returns Returns the result of running `func` with `args` and `kwargs`. Warning Using GPU tensors as arguments or return values of `func` is not supported since we don’t support sending GPU tensors over the wire. You need to explicitly copy GPU tensors to CPU before using them as arguments or return values of `func`. Example:: Make sure that `MASTER_ADDR` and `MASTER_PORT` are set properly on both workers. Refer to [`init_process_group()`](distributed#torch.distributed.init_process_group "torch.distributed.init_process_group") API for more details. For example, >>> export MASTER_ADDR=localhost >>> export MASTER_PORT=5678 Then run the following code in two different processes: >>> # On worker 0: >>> import torch >>> import torch.distributed.rpc as rpc >>> rpc.init_rpc("worker0", rank=0, world_size=2) >>> ret = rpc.rpc_sync("worker1", torch.add, args=(torch.ones(2), 3)) >>> rpc.shutdown() >>> # On worker 1: >>> import torch.distributed.rpc as rpc >>> rpc.init_rpc("worker1", rank=1, world_size=2) >>> rpc.shutdown() Below is an example of running a TorchScript function using RPC. >>> # On both workers: >>> @torch.jit.script >>> def my_script_add(t1, t2): >>> return torch.add(t1, t2) >>> # On worker 0: >>> import torch.distributed.rpc as rpc >>> rpc.init_rpc("worker0", rank=0, world_size=2) >>> ret = rpc.rpc_sync("worker1", my_script_add, args=(torch.ones(2), 3)) >>> rpc.shutdown() >>> # On worker 1: >>> import torch.distributed.rpc as rpc >>> rpc.init_rpc("worker1", rank=1, world_size=2) >>> rpc.shutdown() `torch.distributed.rpc.rpc_async(to, func, args=None, kwargs=None, timeout=-1.0)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributed/rpc/api.html#rpc_async) Make a non-blocking RPC call to run function `func` on worker `to`. RPC messages are sent and received in parallel to execution of Python code. This method is thread-safe. This method will immediately return a [`Future`](futures#torch.futures.Future "torch.futures.Future") that can be awaited on. Parameters * **to** ([str](https://docs.python.org/3/library/stdtypes.html#str "\(in Python v3.9\)") _or_WorkerInfo _or_[int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – name/rank/`WorkerInfo` of the destination worker. * **func** (_callable_) – a callable function, such as Python callables, builtin operators (e.g. [`add()`](generated/torch.add#torch.add "torch.add")) and annotated TorchScript functions. * **args** ([tuple](https://docs.python.org/3/library/stdtypes.html#tuple "\(in Python v3.9\)")) – the argument tuple for the `func` invocation. 
* **kwargs** ([dict](https://docs.python.org/3/library/stdtypes.html#dict "\(in Python v3.9\)")) – is a dictionary of keyword arguments for the `func` invocation. * **timeout** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _,__optional_) – timeout in seconds to use for this RPC. If the RPC does not complete in this amount of time, an exception indicating it has timed out will be raised. A value of 0 indicates an infinite timeout, i.e. a timeout error will never be raised. If not provided, the default value set during initialization or with `_set_rpc_timeout` is used. Returns Returns a [`Future`](futures#torch.futures.Future "torch.futures.Future") object that can be waited on. When completed, the return value of `func` on `args` and `kwargs` can be retrieved from the [`Future`](futures#torch.futures.Future "torch.futures.Future") object. Warning Using GPU tensors as arguments or return values of `func` is not supported since we don’t support sending GPU tensors over the wire. You need to explicitly copy GPU tensors to CPU before using them as arguments or return values of `func`. Warning The `rpc_async` API does not copy storages of argument tensors until sending them over the wire, which could be done by a different thread depending on the RPC backend type. The caller should make sure that the contents of those tensors stay intact until the returned [`Future`](futures#torch.futures.Future "torch.futures.Future") completes. Example:: Make sure that `MASTER_ADDR` and `MASTER_PORT` are set properly on both workers. Refer to [`init_process_group()`](distributed#torch.distributed.init_process_group "torch.distributed.init_process_group") API for more details. For example, >>> export MASTER_ADDR=localhost >>> export MASTER_PORT=5678 Then run the following code in two different processes: >>> # On worker 0: >>> import torch >>> import torch.distributed.rpc as rpc >>> rpc.init_rpc("worker0", rank=0, world_size=2) >>> fut1 = rpc.rpc_async("worker1", torch.add, args=(torch.ones(2), 3)) >>> fut2 = rpc.rpc_async("worker1", min, args=(1, 2)) >>> result = fut1.wait() + fut2.wait() >>> rpc.shutdown() >>> # On worker 1: >>> import torch.distributed.rpc as rpc >>> rpc.init_rpc("worker1", rank=1, world_size=2) >>> rpc.shutdown() Below is an example of running a TorchScript function using RPC. >>> # On both workers: >>> @torch.jit.script >>> def my_script_add(t1, t2): >>> return torch.add(t1, t2) >>> # On worker 0: >>> import torch.distributed.rpc as rpc >>> rpc.init_rpc("worker0", rank=0, world_size=2) >>> fut = rpc.rpc_async("worker1", my_script_add, args=(torch.ones(2), 3)) >>> ret = fut.wait() >>> rpc.shutdown() >>> # On worker 1: >>> import torch.distributed.rpc as rpc >>> rpc.init_rpc("worker1", rank=1, world_size=2) >>> rpc.shutdown() `torch.distributed.rpc.remote(to, func, args=None, kwargs=None, timeout=-1.0)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributed/rpc/api.html#remote) Make a remote call to run `func` on worker `to` and return an `RRef` to the result value immediately. Worker `to` will be the owner of the returned `RRef`, and the worker calling `remote` is a user. The owner manages the global reference count of its `RRef`, and the owner `RRef` is only destructed when globally there are no living references to it. 
Parameters * **to** ([str](https://docs.python.org/3/library/stdtypes.html#str "\(in Python v3.9\)") _or_WorkerInfo _or_[int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – name/rank/`WorkerInfo` of the destination worker. * **func** (_callable_) – a callable function, such as Python callables, builtin operators (e.g. [`add()`](generated/torch.add#torch.add "torch.add")) and annotated TorchScript functions. * **args** ([tuple](https://docs.python.org/3/library/stdtypes.html#tuple "\(in Python v3.9\)")) – the argument tuple for the `func` invocation. * **kwargs** ([dict](https://docs.python.org/3/library/stdtypes.html#dict "\(in Python v3.9\)")) – is a dictionary of keyword arguments for the `func` invocation. * **timeout** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _,__optional_) – timeout in seconds for this remote call. If the creation of this `RRef` on worker `to` is not successfully processed on this worker within this timeout, then the next time there is an attempt to use the RRef (such as `to_here()`), a timeout will be raised indicating this failure. A value of 0 indicates an infinite timeout, i.e. a timeout error will never be raised. If not provided, the default value set during initialization or with `_set_rpc_timeout` is used. Returns A user `RRef` instance to the result value. Use the blocking API `torch.distributed.rpc.RRef.to_here()` to retrieve the result value locally. Warning Using GPU tensors as arguments or return values of `func` is not supported since we don’t support sending GPU tensors over the wire. You need to explicitly copy GPU tensors to CPU before using them as arguments or return values of `func`. Warning The `remote` API does not copy storages of argument tensors until sending them over the wire, which could be done by a different thread depending on the RPC backend type. The caller should make sure that the contents of those tensors stay intact until the returned RRef is confirmed by the owner, which can be checked using the `torch.distributed.rpc.RRef.confirmed_by_owner()` API. Warning Errors such as timeouts for the `remote` API are handled on a best-effort basis. This means that when remote calls initiated by `remote` fail, such as with a timeout error, we take a best-effort approach to error handling. This means that errors are handled and set on the resulting RRef on an asynchronous basis. If the RRef has not been used by the application before this handling (such as `to_here` or fork call), then future uses of the `RRef` will appropriately raise errors. However, it is possible that the user application will use the `RRef` before the errors are handled. In this case, errors may not be raised as they have not yet been handled. Example:: Make sure that `MASTER_ADDR` and `MASTER_PORT` are set properly on both workers. Refer to [`init_process_group()`](distributed#torch.distributed.init_process_group "torch.distributed.init_process_group") API for more details. 
For example, >>> export MASTER_ADDR=localhost >>> export MASTER_PORT=5678 Then run the following code in two different processes: >>> # On worker 0: >>> import torch >>> import torch.distributed.rpc as rpc >>> rpc.init_rpc("worker0", rank=0, world_size=2) >>> rref1 = rpc.remote("worker1", torch.add, args=(torch.ones(2), 3)) >>> rref2 = rpc.remote("worker1", torch.add, args=(torch.ones(2), 1)) >>> x = rref1.to_here() + rref2.to_here() >>> rpc.shutdown() >>> # On worker 1: >>> import torch.distributed.rpc as rpc >>> rpc.init_rpc("worker1", rank=1, world_size=2) >>> rpc.shutdown() Below is an example of running a TorchScript function using RPC. >>> # On both workers: >>> @torch.jit.script >>> def my_script_add(t1, t2): >>> return torch.add(t1, t2) >>> # On worker 0: >>> import torch.distributed.rpc as rpc >>> rpc.init_rpc("worker0", rank=0, world_size=2) >>> rref = rpc.remote("worker1", my_script_add, args=(torch.ones(2), 3)) >>> rref.to_here() >>> rpc.shutdown() >>> # On worker 1: >>> import torch.distributed.rpc as rpc >>> rpc.init_rpc("worker1", rank=1, world_size=2) >>> rpc.shutdown() `torch.distributed.rpc.get_worker_info(worker_name=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributed/rpc/api.html#get_worker_info) Get `WorkerInfo` of a given worker name. Use this `WorkerInfo` to avoid passing an expensive string on every invocation. Parameters **worker_name** ([str](https://docs.python.org/3/library/stdtypes.html#str "\(in Python v3.9\)")) – the string name of a worker. If `None`, return the the id of the current worker. (default `None`) Returns `WorkerInfo` instance for the given `worker_name` or `WorkerInfo` of the current worker if `worker_name` is `None`. `torch.distributed.rpc.shutdown(graceful=True)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributed/rpc/api.html#shutdown) Perform a shutdown of the RPC agent, and then destroy the RPC agent. This stops the local agent from accepting outstanding requests, and shuts down the RPC framework by terminating all RPC threads. If `graceful=True`, this will block until all local and remote RPC processes reach this method and wait for all outstanding work to complete. Otherwise, if `graceful=False`, this is a local shutdown, and it does not wait for other RPC processes to reach this method. Warning For [`Future`](futures#torch.futures.Future "torch.futures.Future") objects returned by `rpc_async()`, `future.wait()` should not be called after `shutdown()`. Parameters **graceful** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – Whether to do a graceful shutdown or not. If True, this will 1) wait until there is no pending system messages for `UserRRefs` and delete them; 2) block until all local and remote RPC processes have reached this method and wait for all outstanding work to complete. Example:: Make sure that `MASTER_ADDR` and `MASTER_PORT` are set properly on both workers. Refer to [`init_process_group()`](distributed#torch.distributed.init_process_group "torch.distributed.init_process_group") API for more details. 
For example, >>> export MASTER_ADDR=localhost >>> export MASTER_PORT=5678 Then run the following code in two different processes: >>> # On worker 0: >>> import torch >>> import torch.distributed.rpc as rpc >>> rpc.init_rpc("worker0", rank=0, world_size=2) >>> # do some work >>> result = rpc.rpc_sync("worker1", torch.add, args=(torch.ones(1), 1)) >>> # ready to shutdown >>> rpc.shutdown() >>> # On worker 1: >>> import torch.distributed.rpc as rpc >>> rpc.init_rpc("worker1", rank=1, world_size=2) >>> # wait for worker 0 to finish work, and then shutdown. >>> rpc.shutdown() `class torch.distributed.rpc.WorkerInfo` A structure that encapsulates information of a worker in the system. Contains the name and ID of the worker. This class is not meant to be constructed directly, rather, an instance can be retrieved through `get_worker_info()` and the result can be passed in to functions such as `rpc_sync()`, `rpc_async()`, `remote()` to avoid copying a string on every invocation. `property id` Globally unique id to identify the worker. `property name` The name of the worker. The RPC package also provides decorators which allow applications to specify how a given function should be treated on the callee side. `torch.distributed.rpc.functions.async_execution(fn)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributed/rpc/functions.html#async_execution) A decorator for a function indicating that the return value of the function is guaranteed to be a [`Future`](futures#torch.futures.Future "torch.futures.Future") object and this function can run asynchronously on the RPC callee. More specifically, the callee extracts the [`Future`](futures#torch.futures.Future "torch.futures.Future") returned by the wrapped function and installs subsequent processing steps as a callback to that [`Future`](futures#torch.futures.Future "torch.futures.Future"). The installed callback will read the value from the [`Future`](futures#torch.futures.Future "torch.futures.Future") when completed and send the value back as the RPC response. That also means the returned [`Future`](futures#torch.futures.Future "torch.futures.Future") only exists on the callee side and is never sent through RPC. This decorator is useful when the wrapped function’s (`fn`) execution needs to pause and resume due to, e.g., containing `rpc_async()` or waiting for other signals. Note To enable asynchronous execution, applications must pass the function object returned by this decorator to RPC APIs. If RPC detected attributes installed by this decorator, it knows that this function returns a `Future` object and will handle that accordingly. However, this does not mean this decorator has to be outmost one when defining a function. For example, when combined with `@staticmethod` or `@classmethod`, `@rpc.functions.async_execution` needs to be the inner decorator to allow the target function be recognized as a static or class function. This target function can still execute asynchronously because, when accessed, the static or class method preserves attributes installed by `@rpc.functions.async_execution`. Example:: The returned [`Future`](futures#torch.futures.Future "torch.futures.Future") object can come from `rpc_async()`, [`then()`](futures#torch.futures.Future.then "torch.futures.Future.then"), or [`Future`](futures#torch.futures.Future "torch.futures.Future") constructor. 
The example below shows directly using the [`Future`](futures#torch.futures.Future "torch.futures.Future") returned by [`then()`](futures#torch.futures.Future.then "torch.futures.Future.then"). >>> from torch.distributed import rpc >>> >>> # omitting setup and shutdown RPC >>> >>> # On all workers >>> @rpc.functions.async_execution >>> def async_add_chained(to, x, y, z): >>> # This function runs on "worker1" and returns immediately when >>> # the callback is installed through the `then(cb)` API. In the >>> # mean time, the `rpc_async` to "worker2" can run concurrently. >>> # When the return value of that `rpc_async` arrives at >>> # "worker1", "worker1" will run the lambda function accordingly >>> # and set the value for the previously returned `Future`, which >>> # will then trigger RPC to send the result back to "worker0". >>> return rpc.rpc_async(to, torch.add, args=(x, y)).then( >>> lambda fut: fut.wait() + z >>> ) >>> >>> # On worker0 >>> ret = rpc.rpc_sync( >>> "worker1", >>> async_add_chained, >>> args=("worker2", torch.ones(2), 1, 1) >>> ) >>> print(ret) # prints tensor([3., 3.]) When combined with TorchScript decorators, this decorator must be the outmost one. >>> from torch import Tensor >>> from torch.futures import Future >>> from torch.distributed import rpc >>> >>> # omitting setup and shutdown RPC >>> >>> # On all workers >>> @torch.jit.script >>> def script_add(x: Tensor, y: Tensor) -> Tensor: >>> return x + y >>> >>> @rpc.functions.async_execution >>> @torch.jit.script >>> def async_add(to: str, x: Tensor, y: Tensor) -> Future[Tensor]: >>> return rpc.rpc_async(to, script_add, (x, y)) >>> >>> # On worker0 >>> ret = rpc.rpc_sync( >>> "worker1", >>> async_add, >>> args=("worker2", torch.ones(2), 1) >>> ) >>> print(ret) # prints tensor([2., 2.]) When combined with static or class method, this decorator must be the inner one. >>> from torch.distributed import rpc >>> >>> # omitting setup and shutdown RPC >>> >>> # On all workers >>> class AsyncExecutionClass: >>> >>> @staticmethod >>> @rpc.functions.async_execution >>> def static_async_add(to, x, y, z): >>> return rpc.rpc_async(to, torch.add, args=(x, y)).then( >>> lambda fut: fut.wait() + z >>> ) >>> >>> @classmethod >>> @rpc.functions.async_execution >>> def class_async_add(cls, to, x, y, z): >>> ret_fut = torch.futures.Future() >>> rpc.rpc_async(to, torch.add, args=(x, y)).then( >>> lambda fut: ret_fut.set_result(fut.wait() + z) >>> ) >>> return ret_fut >>> >>> @rpc.functions.async_execution >>> def bound_async_add(self, to, x, y, z): >>> return rpc.rpc_async(to, torch.add, args=(x, y)).then( >>> lambda fut: fut.wait() + z >>> ) >>> >>> # On worker0 >>> ret = rpc.rpc_sync( >>> "worker1", >>> AsyncExecutionClass.static_async_add, >>> args=("worker2", torch.ones(2), 1, 2) >>> ) >>> print(ret) # prints tensor([4., 4.]) >>> >>> ret = rpc.rpc_sync( >>> "worker1", >>> AsyncExecutionClass.class_async_add, >>> args=("worker2", torch.ones(2), 1, 2) >>> ) >>> print(ret) # prints tensor([4., 4.]) This decorator also works with RRef helpers, i.e., . `torch.distributed.rpc.RRef.rpc_sync()`, `torch.distributed.rpc.RRef.rpc_async()`, and `torch.distributed.rpc.RRef.remote()`. 
>>> from torch.distributed import rpc >>> >>> # reuse the AsyncExecutionClass class above >>> rref = rpc.remote("worker1", AsyncExecutionClass) >>> ret = rref.rpc_sync().static_async_add("worker2", torch.ones(2), 1, 2) >>> print(ret) # prints tensor([4., 4.]) >>> >>> rref = rpc.remote("worker1", AsyncExecutionClass) >>> ret = rref.rpc_async().static_async_add("worker2", torch.ones(2), 1, 2).wait() >>> print(ret) # prints tensor([4., 4.]) >>> >>> rref = rpc.remote("worker1", AsyncExecutionClass) >>> ret = rref.remote().static_async_add("worker2", torch.ones(2), 1, 2).to_here() >>> print(ret) # prints tensor([4., 4.]) ### Backends The RPC module can leverage different backends to perform the communication between the nodes. The backend to be used can be specified in the `init_rpc()` function, by passing a certain value of the `BackendType` enum. Regardless of what backend is used, the rest of the RPC API won’t change. Each backend also defines its own subclass of the `RpcBackendOptions` class, an instance of which can also be passed to `init_rpc()` to configure the backend’s behavior. `class torch.distributed.rpc.BackendType` An enum class of available backends. PyTorch ships with two builtin backends: `BackendType.TENSORPIPE` and `BackendType.PROCESS_GROUP`. Additional ones can be registered using the `register_backend()` function. `class torch.distributed.rpc.RpcBackendOptions` An abstract structure encapsulating the options passed into the RPC backend. An instance of this class can be passed in to `init_rpc()` in order to initialize RPC with specific configurations, such as the RPC timeout and `init_method` to be used. `property init_method` URL specifying how to initialize the process group. Default is `env://` `property rpc_timeout` A float indicating the timeout to use for all RPCs. If an RPC does not complete in this timeframe, it will complete with an exception indicating that it has timed out. #### TensorPipe Backend The TensorPipe agent, which is the default, leverages [the TensorPipe library](https://github.com/pytorch/tensorpipe), which provides a natively point-to-point communication primitive specifically suited for machine learning that fundamentally addresses some of the limitations of Gloo. Compared to Gloo, it has the advantage of being asynchronous, which allows a large number of transfers to occur simultaneously, each at their own speed, without blocking each other. It will only open pipes between pairs of nodes when needed, on demand, and when one node fails only its incident pipes will be closed, while all other ones will keep working as normal. In addition, it is able to support multiple different transports (TCP, of course, but also shared memory, NVLink, InfiniBand, …) and can automatically detect their availability and negotiate the best transport to use for each pipe. The TensorPipe backend has been introduced in PyTorch v1.6 and is being actively developed. At the moment, it only supports CPU tensors, with GPU support coming soon. It comes with a TCP-based transport, just like Gloo. It is also able to automatically chunk and multiplex large tensors over multiple sockets and threads in order to achieve very high bandwidths. The agent will be able to pick the best transport on its own, with no intervention required. 
Example: >>> import os >>> from torch.distributed import rpc >>> os.environ['MASTER_ADDR'] = 'localhost' >>> os.environ['MASTER_PORT'] = '29500' >>> >>> rpc.init_rpc( >>> "worker1", >>> rank=0, >>> world_size=2, >>> rpc_backend_options=rpc.TensorPipeRpcBackendOptions( >>> num_worker_threads=8, >>> rpc_timeout=20 # 20 second timeout >>> ) >>> ) >>> >>> # omitting init_rpc invocation on worker2 `class torch.distributed.rpc.TensorPipeRpcBackendOptions(*, num_worker_threads=16, rpc_timeout=60.0, init_method='env://', device_maps=None, _transports=None, _channels=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributed/rpc/options.html#TensorPipeRpcBackendOptions) The backend options for `TensorPipeAgent`, derived from `RpcBackendOptions`. Parameters * **num_worker_threads** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – The number of threads in the thread-pool used by `TensorPipeAgent` to execute requests (default: 16). * **rpc_timeout** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _,__optional_) – The default timeout, in seconds, for RPC requests (default: 60 seconds). If the RPC has not completed in this timeframe, an exception indicating so will be raised. Callers can override this timeout for individual RPCs in `rpc_sync()` and `rpc_async()` if necessary. * **init_method** ([str](https://docs.python.org/3/library/stdtypes.html#str "\(in Python v3.9\)") _,__optional_) – The URL to initialize the distributed store used for rendezvous. It takes any value accepted for the same argument of [`init_process_group()`](distributed#torch.distributed.init_process_group "torch.distributed.init_process_group") (default: `env://`). * **device_maps** (_Dict_ _[_[str](https://docs.python.org/3/library/stdtypes.html#str "\(in Python v3.9\)") _,__Dict_ _]_) – Device placement mappings from this worker to the callee. Key is the callee worker name and value the dictionary (`Dict` of `int`, `str`, or `torch.device`) that maps this worker’s devices to the callee worker’s devices. (default: `None`) `property device_maps` The device map locations. `property init_method` URL specifying how to initialize the process group. Default is `env://` `property num_worker_threads` The number of threads in the thread-pool used by `TensorPipeAgent` to execute requests. `property rpc_timeout` A float indicating the timeout to use for all RPCs. If an RPC does not complete in this timeframe, it will complete with an exception indicating that it has timed out. `set_device_map(to, device_map)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributed/rpc/options.html#TensorPipeRpcBackendOptions.set_device_map) Set device mapping between each RPC caller and callee pair. This function can be called multiple times to incrementally add device placement configurations. Parameters * **worker_name** ([str](https://docs.python.org/3/library/stdtypes.html#str "\(in Python v3.9\)")) – Callee name. * **device_map** (_Dict of python:int_ _,_[str](https://docs.python.org/3/library/stdtypes.html#str "\(in Python v3.9\)") _, or_[torch.device](tensor_attributes#torch.torch.device "torch.torch.device")) – Device placement mappings from this worker to the callee. This map must be invertible. 
Example::

>>> # both workers
>>> def add(x, y):
>>>     print(x)  # tensor([1., 1.], device='cuda:1')
>>>     return x + y, (x + y).to(2)
>>>
>>> # on worker 0
>>> options = TensorPipeRpcBackendOptions(
>>>     num_worker_threads=8,
>>>     device_maps={"worker1": {0: 1}}
>>>     # maps worker0's cuda:0 to worker1's cuda:1
>>> )
>>> options.set_device_map("worker1", {1: 2})
>>> # maps worker0's cuda:1 to worker1's cuda:2
>>>
>>> rpc.init_rpc(
>>>     "worker0",
>>>     rank=0,
>>>     world_size=2,
>>>     backend=rpc.BackendType.TENSORPIPE,
>>>     rpc_backend_options=options
>>> )
>>>
>>> x = torch.ones(2)
>>> rets = rpc.rpc_sync("worker1", add, args=(x.to(0), 1))
>>> # The first argument will be moved to cuda:1 on worker1. When
>>> # sending the return values back, they will follow the inverse of
>>> # the device map, and hence will be moved back to cuda:0 and
>>> # cuda:1 on worker0
>>> print(rets[0])  # tensor([2., 2.], device='cuda:0')
>>> print(rets[1])  # tensor([2., 2.], device='cuda:1')

#### Process Group Backend

Warning

The Process Group Backend will be deprecated soon; we recommend using the TensorPipe Backend instead.

The Process Group agent instantiates a process group from the [`distributed`](distributed#module-torch.distributed "torch.distributed") module and utilizes its point-to-point communication capabilities to send RPC messages. Internally, the process group uses [the Gloo library](https://github.com/facebookincubator/gloo/). Gloo has been hardened by years of extensive use in PyTorch and is thus very reliable. However, as it was designed to perform collective communication, it may not always be the best fit for RPC. For example, each networking operation is synchronous and blocking, which means that it cannot be run in parallel with others. Moreover, it opens a connection between all pairs of nodes, and brings down all of them when one fails, thus reducing the resiliency and the elasticity of the system.

Example:

>>> import os
>>> from torch.distributed import rpc
>>> os.environ['MASTER_ADDR'] = 'localhost'
>>> os.environ['MASTER_PORT'] = '29500'
>>>
>>> rpc.init_rpc(
>>>     "worker1",
>>>     rank=0,
>>>     world_size=2,
>>>     backend=rpc.BackendType.PROCESS_GROUP,
>>>     rpc_backend_options=rpc.ProcessGroupRpcBackendOptions(
>>>         num_send_recv_threads=16,
>>>         rpc_timeout=20  # 20 second timeout
>>>     )
>>> )
>>>
>>> # omitting init_rpc invocation on worker2

`class torch.distributed.rpc.ProcessGroupRpcBackendOptions`

The backend options class for `ProcessGroupAgent`, which is derived from `RpcBackendOptions`.

Parameters

* **num_send_recv_threads** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – The number of threads in the thread-pool used by `ProcessGroupAgent` (default: 4).
* **rpc_timeout** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _,__optional_) – The default timeout, in seconds, for RPC requests (default: 60 seconds). If the RPC has not completed in this timeframe, an exception indicating so will be raised. Callers can override this timeout for individual RPCs in `rpc_sync()` and `rpc_async()` if necessary.
* **init_method** ([str](https://docs.python.org/3/library/stdtypes.html#str "\(in Python v3.9\)") _,__optional_) – The URL to initialize `ProcessGroupGloo` (default: `env://`).

`property init_method`

URL specifying how to initialize the process group. Default is `env://`

`property num_send_recv_threads`

The number of threads in the thread-pool used by ProcessGroupAgent.
`property rpc_timeout`

A float indicating the timeout to use for all RPCs. If an RPC does not complete in this timeframe, it will complete with an exception indicating that it has timed out.

## RRef

An `RRef` (Remote REFerence) is a reference to a value of some type `T` (e.g. `Tensor`) on a remote worker. This handle keeps the referenced remote value alive on the owner, but there is no implication that the value will be transferred to the local worker in the future. RRefs can be used in multi-machine training by holding references to [nn.Modules](https://pytorch.org/docs/stable/nn.html#torch.nn.Module) that exist on other workers, and calling the appropriate functions to retrieve or modify their parameters during training. See [Remote Reference Protocol](rpc/rref#remote-reference-protocol) for more details.

`class torch.distributed.rpc.RRef` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributed/rpc/api.html#RRef)

`backward(self: torch._C._distributed_rpc.PyRRef, dist_autograd_ctx_id: int = -1, retain_graph: bool = False) → None`

Runs the backward pass using the RRef as the root of the backward pass. If `dist_autograd_ctx_id` is provided, we perform a distributed backward pass using the provided ctx_id starting from the owner of the RRef. In this case, `get_gradients()` should be used to retrieve the gradients. If `dist_autograd_ctx_id` is `None`, it is assumed that this is a local autograd graph and we only perform a local backward pass. In the local case, the node calling this API has to be the owner of the RRef. The value of the RRef is expected to be a scalar Tensor.

Parameters

* **dist_autograd_ctx_id** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – The distributed autograd context id for which we should retrieve the gradients (default: -1).
* **retain_graph** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – If `False`, the graph used to compute the grad will be freed. Note that in nearly all cases setting this option to `True` is not needed and often can be worked around in a much more efficient way. Usually, you need to set this to `True` to run backward multiple times (default: False).

Example::

>>> import torch.distributed.autograd as dist_autograd
>>> with dist_autograd.context() as context_id:
>>>     rref.backward(context_id)

`confirmed_by_owner(self: torch._C._distributed_rpc.PyRRef) → bool`

Returns whether this `RRef` has been confirmed by the owner. `OwnerRRef` always returns true, while `UserRRef` only returns true when the owner knows about this `UserRRef`.

`is_owner(self: torch._C._distributed_rpc.PyRRef) → bool`

Returns whether or not the current node is the owner of this `RRef`.

`local_value(self: torch._C._distributed_rpc.PyRRef) → object`

If the current node is the owner, returns a reference to the local value. Otherwise, throws an exception.

`owner(self: torch._C._distributed_rpc.PyRRef) → torch._C._distributed_rpc.WorkerInfo`

Returns worker information of the node that owns this `RRef`.

`owner_name(self: torch._C._distributed_rpc.PyRRef) → str`

Returns the worker name of the node that owns this `RRef`.

`remote(self: torch._C._distributed_rpc.PyRRef, timeout: float = -1.0) → object`

Create a helper proxy to easily launch a `remote` using the owner of the RRef as the destination to run functions on the object referenced by this RRef.
More specifically, `rref.remote().func_name(*args, **kwargs)` is the same as the following: >>> def run(rref, func_name, args, kwargs): >>> return getattr(rref.local_value(), func_name)(*args, **kwargs) >>> >>> rpc.remote(rref.owner(), run, args=(rref, func_name, args, kwargs)) Parameters **timeout** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _,__optional_) – Timeout for `rref.remote()`. If the creation of this `RRef` is not successfully completed within the timeout, then the next time there is an attempt to use the RRef (such as `to_here`), a timeout will be raised. If not provided, the default RPC timeout will be used. Please see `rpc.remote()` for specific timeout semantics for `RRef`. Example:: >>> from torch.distributed import rpc >>> rref = rpc.remote("worker1", torch.add, args=(torch.zeros(2, 2), 1)) >>> rref.remote().size().to_here() # returns torch.Size([2, 2]) >>> rref.remote().view(1, 4).to_here() # returns tensor([[1., 1., 1., 1.]]) `rpc_async(self: torch._C._distributed_rpc.PyRRef, timeout: float = -1.0) → object` Create a helper proxy to easily launch an `rpc_async` using the owner of the RRef as the destination to run functions on the object referenced by this RRef. More specifically, `rref.rpc_async().func_name(*args, **kwargs)` is the same as the following: >>> def run(rref, func_name, args, kwargs): >>> return getattr(rref.local_value(), func_name)(*args, **kwargs) >>> >>> rpc.rpc_async(rref.owner(), run, args=(rref, func_name, args, kwargs)) Parameters **timeout** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _,__optional_) – Timeout for `rref.rpc_async()`. If the call does not complete within this timeframe, an exception indicating so will be raised. If this argument is not provided, the default RPC timeout will be used. Example:: >>> from torch.distributed import rpc >>> rref = rpc.remote("worker1", torch.add, args=(torch.zeros(2, 2), 1)) >>> rref.rpc_async().size().wait() # returns torch.Size([2, 2]) >>> rref.rpc_async().view(1, 4).wait() # returns tensor([[1., 1., 1., 1.]]) `rpc_sync(self: torch._C._distributed_rpc.PyRRef, timeout: float = -1.0) → object` Create a helper proxy to easily launch an `rpc_sync` using the owner of the RRef as the destination to run functions on the object referenced by this RRef. More specifically, `rref.rpc_sync().func_name(*args, **kwargs)` is the same as the following: >>> def run(rref, func_name, args, kwargs): >>> return getattr(rref.local_value(), func_name)(*args, **kwargs) >>> >>> rpc.rpc_sync(rref.owner(), run, args=(rref, func_name, args, kwargs)) Parameters **timeout** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _,__optional_) – Timeout for `rref.rpc_sync()`. If the call does not complete within this timeframe, an exception indicating so will be raised. If this argument is not provided, the default RPC timeout will be used. Example:: >>> from torch.distributed import rpc >>> rref = rpc.remote("worker1", torch.add, args=(torch.zeros(2, 2), 1)) >>> rref.rpc_sync().size() # returns torch.Size([2, 2]) >>> rref.rpc_sync().view(1, 4) # returns tensor([[1., 1., 1., 1.]]) `to_here(self: torch._C._distributed_rpc.PyRRef, timeout: float = -1.0) → object` Blocking call that copies the value of the RRef from the owner to the local node and returns it. If the current node is the owner, returns a reference to the local value. 
Parameters **timeout** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _,__optional_) – Timeout for `to_here`. If the call does not complete within this timeframe, an exception indicating so will be raised. If this argument is not provided, the default RPC timeout (60s) will be used. More Information about RRef * [Remote Reference Protocol](rpc/rref) * [Background](rpc/rref#background) * [Assumptions](rpc/rref#assumptions) * [RRef Lifetime](rpc/rref#rref-lifetime) * [Design Reasoning](rpc/rref#design-reasoning) * [Implementation](rpc/rref#implementation) * [Protocol Scenarios](rpc/rref#protocol-scenarios) * [User Share RRef with Owner as Return Value](rpc/rref#user-share-rref-with-owner-as-return-value) * [User Share RRef with Owner as Argument](rpc/rref#user-share-rref-with-owner-as-argument) * [Owner Share RRef with User](rpc/rref#owner-share-rref-with-user) * [User Share RRef with User](rpc/rref#user-share-rref-with-user) ## Distributed Autograd Framework This module provides an RPC-based distributed autograd framework that can be used for applications such as model parallel training. In short, applications may send and receive gradient recording tensors over RPC. In the forward pass, we record when gradient recording tensors are sent over RPC and during the backward pass we use this information to perform a distributed backward pass using RPC. For more details see [Distributed Autograd Design](rpc/distributed_autograd#distributed-autograd-design). `class torch.distributed.autograd.context` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributed/autograd.html#context) Context object to wrap forward and backward passes when using distributed autograd. The `context_id` generated in the `with` statement is required to uniquely identify a distributed backward pass on all workers. Each worker stores metadata associated with this `context_id`, which is required to correctly execute a distributed autograd pass. Example:: >>> import torch.distributed.autograd as dist_autograd >>> with dist_autograd.context() as context_id: >>> t1 = torch.rand((3, 3), requires_grad=True) >>> t2 = torch.rand((3, 3), requires_grad=True) >>> loss = rpc.rpc_sync("worker1", torch.add, args=(t1, t2)).sum() >>> dist_autograd.backward(context_id, [loss]) `torch.distributed.autograd.backward(context_id: int, roots: List[Tensor], retain_graph = False) → None` Kicks off the distributed backward pass using the provided roots. This currently implements the [FAST mode algorithm](rpc/distributed_autograd#fast- mode-algorithm) which assumes all RPC messages sent in the same distributed autograd context across workers would be part of the autograd graph during the backward pass. We use the provided roots to discover the autograd graph and compute appropriate dependencies. This method blocks until the entire autograd computation is done. We accumulate the gradients in the appropriate `torch.distributed.autograd.context` on each of the nodes. The autograd context to be used is looked up given the `context_id` that is passed in when `torch.distributed.autograd.backward()` is called. If there is no valid autograd context corresponding to the given ID, we throw an error. You can retrieve the accumulated gradients using the `get_gradients()` API. Parameters * **context_id** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – The autograd context id for which we should retrieve the gradients. 
* **roots** ([list](https://docs.python.org/3/library/stdtypes.html#list "\(in Python v3.9\)")) – Tensors which represent the roots of the autograd computation. All the tensors should be scalars.
* **retain_graph** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – If False, the graph used to compute the grad will be freed. Note that in nearly all cases setting this option to True is not needed and often can be worked around in a much more efficient way. Usually, you need to set this to True to run backward multiple times.

Example::

>>> import torch.distributed.autograd as dist_autograd
>>> with dist_autograd.context() as context_id:
>>>     pred = model.forward()
>>>     loss = loss_func(pred, target)
>>>     dist_autograd.backward(context_id, [loss])

`torch.distributed.autograd.get_gradients(context_id: int) → Dict[Tensor, Tensor]`

Retrieves a map from Tensor to the appropriate gradient for that Tensor accumulated in the provided context corresponding to the given `context_id` as part of the distributed autograd backward pass.

Parameters

**context_id** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – The autograd context id for which we should retrieve the gradients.

Returns

A map where the key is the Tensor and the value is the associated gradient for that Tensor.

Example::

>>> import torch.distributed.autograd as dist_autograd
>>> with dist_autograd.context() as context_id:
>>>     t1 = torch.rand((3, 3), requires_grad=True)
>>>     t2 = torch.rand((3, 3), requires_grad=True)
>>>     loss = t1 + t2
>>>     dist_autograd.backward(context_id, [loss.sum()])
>>>     grads = dist_autograd.get_gradients(context_id)
>>>     print(grads[t1])
>>>     print(grads[t2])

More Information about RPC Autograd

* [Distributed Autograd Design](rpc/distributed_autograd)
  * [Background](rpc/distributed_autograd#background)
  * [Autograd recording during the forward pass](rpc/distributed_autograd#autograd-recording-during-the-forward-pass)
  * [Distributed Autograd Context](rpc/distributed_autograd#distributed-autograd-context)
  * [Distributed Backward Pass](rpc/distributed_autograd#distributed-backward-pass)
  * [Computing dependencies](rpc/distributed_autograd#computing-dependencies)
  * [FAST mode algorithm](rpc/distributed_autograd#fast-mode-algorithm)
  * [SMART mode algorithm](rpc/distributed_autograd#smart-mode-algorithm)
  * [Distributed Optimizer](rpc/distributed_autograd#distributed-optimizer)
  * [Simple end to end example](rpc/distributed_autograd#simple-end-to-end-example)

## Distributed Optimizer

`torch.distributed.optim` exposes DistributedOptimizer, which takes a list of remote parameters (`RRef`) and runs the optimizer locally on the workers where the parameters live. The distributed optimizer can use any of the local optimizer [Algorithms](optim#optimizer-algorithms) to apply the gradients on each worker.

`class torch.distributed.optim.DistributedOptimizer(optimizer_class, params_rref, *args, **kwargs)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributed/optim/optimizer.html#DistributedOptimizer)

DistributedOptimizer takes remote references to parameters scattered across workers and applies the given optimizer locally for each parameter. This class uses `get_gradients()` in order to retrieve the gradients for specific parameters. Concurrent calls to `step()`, either from the same or different clients, will be serialized on each worker – as each worker's optimizer can only work on one set of gradients at a time.
However, there is no guarantee that the full forward-backward-optimizer sequence will execute for one client at a time. This means that the gradients being applied may not correspond to the latest forward pass executed on a given worker. Also, there is no guaranteed ordering across workers. `DistributedOptimizer` creates the local optimizer with TorchScript enabled by default, so that optimizer updates are not blocked by the Python Global Interpreter Lock (GIL) during multithreaded training (e.g. Distributed Model Parallel). This feature is currently in beta stage, enabled for optimizers including `Adagrad`, `Adam`, `SGD`, `RMSprop`, `AdamW` and `Adadelta`. We are increasing the coverage to all optimizers in future releases. Parameters * **optimizer_class** ([optim.Optimizer](optim#torch.optim.Optimizer "torch.optim.Optimizer")) – the class of optimizer to instantiate on each worker. * **params_rref** ([list](https://docs.python.org/3/library/stdtypes.html#list "\(in Python v3.9\)") _[_RRef _]_) – list of RRefs to local or remote parameters to optimize. * **args** – arguments to pass to the optimizer constructor on each worker. * **kwargs** – arguments to pass to the optimizer constructor on each worker. Example:: >>> import torch.distributed.autograd as dist_autograd >>> import torch.distributed.rpc as rpc >>> from torch import optim >>> from torch.distributed.optim import DistributedOptimizer >>> >>> with dist_autograd.context() as context_id: >>> # Forward pass. >>> rref1 = rpc.remote("worker1", torch.add, args=(torch.ones(2), 3)) >>> rref2 = rpc.remote("worker1", torch.add, args=(torch.ones(2), 1)) >>> loss = rref1.to_here() + rref2.to_here() >>> >>> # Backward pass. >>> dist_autograd.backward(context_id, [loss.sum()]) >>> >>> # Optimizer. >>> dist_optim = DistributedOptimizer( >>> optim.SGD, >>> [rref1, rref2], >>> lr=0.05, >>> ) >>> dist_optim.step(context_id) `step(context_id)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/distributed/optim/optimizer.html#DistributedOptimizer.step) Performs a single optimization step. This will call [`torch.optim.Optimizer.step()`](optim#torch.optim.Optimizer.step "torch.optim.Optimizer.step") on each worker containing parameters to be optimized, and will block until all workers return. The provided `context_id` will be used to retrieve the corresponding `context` that contains the gradients that should be applied to the parameters. Parameters **context_id** – the autograd context id for which we should run the optimizer step. ## Design Notes The distributed autograd design note covers the design of the RPC-based distributed autograd framework that is useful for applications such as model parallel training. * [Distributed Autograd Design](rpc/distributed_autograd#distributed-autograd-design) The RRef design note covers the design of the RRef (Remote REFerence) protocol used to refer to values on remote workers by the framework. * [Remote Reference Protocol](rpc/rref#remote-reference-protocol) ## Tutorials The RPC tutorials introduce users to the RPC framework, provide several example applications using torch.distributed.rpc APIs, and demonstrate how to use [the profiler](https://pytorch.org/docs/stable/autograd.html#profiler) to profile RPC-based workloads. 
* [Getting started with Distributed RPC Framework](https://pytorch.org/tutorials/intermediate/rpc_tutorial.html)
* [Implementing a Parameter Server using Distributed RPC Framework](https://pytorch.org/tutorials/intermediate/rpc_param_server_tutorial.html)
* [Combining Distributed DataParallel with Distributed RPC Framework](https://pytorch.org/tutorials/advanced/rpc_ddp_tutorial.html)
* [Profiling RPC-based Workloads](https://pytorch.org/tutorials/recipes/distributed_rpc_profiling.html)
* [Implementing batch RPC processing](https://pytorch.org/tutorials/intermediate/rpc_async_execution.html)
* [Distributed Pipeline Parallel](https://pytorch.org/tutorials/intermediate/dist_pipeline_parallel_tutorial.html)

# torch.sparse

## Introduction

PyTorch provides [`torch.Tensor`](tensors#torch.Tensor "torch.Tensor") to represent a multi-dimensional array containing elements of a single data type. By default, array elements are stored contiguously in memory, leading to efficient implementations of various array processing algorithms that rely on fast access to array elements. However, there exists an important class of multi-dimensional arrays, so-called sparse arrays, where the contiguous memory storage of array elements turns out to be suboptimal. Sparse arrays have the property that a vast portion of their elements are equal to zero, which means that a lot of memory as well as processor resources can be spared if only the non-zero elements are stored and/or processed. Various sparse storage formats ([such as COO, CSR/CSC, LIL, etc.](https://en.wikipedia.org/wiki/Sparse_matrix)) have been developed that are optimized for a particular structure of non-zero elements in sparse arrays as well as for specific operations on the arrays.

Note

When talking about storing only non-zero elements of a sparse array, the usage of the adjective "non-zero" is not strict: one is allowed to also store zeros in the sparse array data structure. Hence, in the following, we use "specified elements" for those array elements that are actually stored. In addition, the unspecified elements are typically, but not necessarily, assumed to have zero value; hence we use the term "fill value" to denote such elements.

Note

Using a sparse storage format for storing sparse arrays can be advantageous only when the size and sparsity levels of arrays are high. Otherwise, for small-sized or low-sparsity arrays, using the contiguous memory storage format is likely the most efficient approach.

Warning

The PyTorch API of sparse tensors is in beta and may change in the near future.

## Sparse COO tensors

Currently, PyTorch implements the so-called Coordinate format, or COO format, as the default sparse storage format for storing sparse tensors. In COO format, the specified elements are stored as tuples of element indices and the corresponding values. In particular,

* the indices of specified elements are collected in an `indices` tensor of size `(ndim, nse)` and with element type `torch.int64`,
* the corresponding values are collected in a `values` tensor of size `(nse,)` and with an arbitrary integer or floating point number element type,

where `ndim` is the dimensionality of the tensor and `nse` is the number of specified elements.

Note

The memory consumption of a sparse COO tensor is at least `(ndim * 8 + <size of element type in bytes>) * nse` bytes (plus a constant overhead from storing other tensor data). The memory consumption of a strided tensor is at least `product(<tensor shape>) * <size of element type in bytes>`.
For example, the memory consumption of a 10 000 x 10 000 tensor with 100 000 non-zero 32-bit floating point numbers is at least `(2 * 8 + 4) * 100 000 = 2 000 000` bytes when using COO tensor layout and `10 000 * 10 000 * 4 = 400 000 000` bytes when using the default strided tensor layout. Notice the 200-fold memory saving from using the COO storage format.

### Construction

A sparse COO tensor can be constructed by providing the two tensors of indices and values, as well as the size of the sparse tensor (when it cannot be inferred from the indices and values tensors), to the function [`torch.sparse_coo_tensor()`](generated/torch.sparse_coo_tensor#torch.sparse_coo_tensor "torch.sparse_coo_tensor").

Suppose we want to define a sparse tensor with the entry 3 at location (0, 2), entry 4 at location (1, 0), and entry 5 at location (1, 2). Unspecified elements are assumed to have the same value, the fill value, which is zero by default. We would then write:

>>> i = [[0, 1, 1], [2, 0, 2]]
>>> v = [3, 4, 5]
>>> s = torch.sparse_coo_tensor(i, v, (2, 3))
>>> s
tensor(indices=tensor([[0, 1, 1],
                       [2, 0, 2]]),
       values=tensor([3, 4, 5]),
       size=(2, 3), nnz=3, layout=torch.sparse_coo)
>>> s.to_dense()
tensor([[0, 0, 3],
        [4, 0, 5]])

Note that the input `i` is NOT a list of index tuples. If you want to write your indices this way, you should transpose before passing them to the sparse constructor:

>>> i = [[0, 2], [1, 0], [1, 2]]
>>> v = [3, 4, 5]
>>> s = torch.sparse_coo_tensor(list(zip(*i)), v, (2, 3))
>>> # Or another equivalent formulation to get s
>>> s = torch.sparse_coo_tensor(torch.tensor(i).t(), v, (2, 3))
>>> torch.sparse_coo_tensor(torch.tensor(i).t(), v, torch.Size([2, 3])).to_dense()
tensor([[0, 0, 3],
        [4, 0, 5]])

An empty sparse COO tensor can be constructed by specifying its size only:

>>> torch.sparse_coo_tensor(size=(2, 3))
tensor(indices=tensor([], size=(2, 0)),
       values=tensor([], size=(0,)),
       size=(2, 3), nnz=0, layout=torch.sparse_coo)

### Hybrid sparse COO tensors

PyTorch implements an extension of sparse tensors with scalar values to sparse tensors with (contiguous) tensor values. Such tensors are called hybrid tensors. A PyTorch hybrid COO tensor extends the sparse COO tensor by allowing the `values` tensor to be a multi-dimensional tensor, so that we have:

* the indices of specified elements are collected in an `indices` tensor of size `(sparse_dims, nse)` and with element type `torch.int64`,
* the corresponding (tensor) values are collected in a `values` tensor of size `(nse, dense_dims)` and with an arbitrary integer or floating point number element type.

Note

We use an (M + K)-dimensional tensor to denote an N-dimensional hybrid sparse tensor, where M and K are the numbers of sparse and dense dimensions, respectively, such that M + K == N holds.

Suppose we want to create a (2 + 1)-dimensional tensor with the entry [3, 4] at location (0, 2), entry [5, 6] at location (1, 0), and entry [7, 8] at location (1, 2).
We would write:

>>> i = [[0, 1, 1], [2, 0, 2]]
>>> v = [[3, 4], [5, 6], [7, 8]]
>>> s = torch.sparse_coo_tensor(i, v, (2, 3, 2))
>>> s
tensor(indices=tensor([[0, 1, 1],
                       [2, 0, 2]]),
       values=tensor([[3, 4],
                      [5, 6],
                      [7, 8]]),
       size=(2, 3, 2), nnz=3, layout=torch.sparse_coo)
>>> s.to_dense()
tensor([[[0, 0],
         [0, 0],
         [3, 4]],
        [[5, 6],
         [0, 0],
         [7, 8]]])

In general, if `s` is a sparse COO tensor and `M = s.sparse_dim()`, `K = s.dense_dim()`, then we have the following invariants:

* `M + K == len(s.shape) == s.ndim` - the dimensionality of a tensor is the sum of the number of sparse and dense dimensions,
* `s.indices().shape == (M, nse)` - sparse indices are stored explicitly,
* `s.values().shape == (nse,) + s.shape[M : M + K]` - the values of a hybrid tensor are K-dimensional tensors,
* `s.values().layout == torch.strided` - values are stored as strided tensors.

Note

Dense dimensions always follow sparse dimensions, that is, mixing of dense and sparse dimensions is not supported.

### Uncoalesced sparse COO tensors

The PyTorch sparse COO tensor format permits _uncoalesced_ sparse tensors, where there may be duplicate coordinates in the indices; in this case, the interpretation is that the value at that index is the sum of all duplicate value entries. For example, one can specify multiple values, `3` and `4`, for the same index `1`, which leads to a 1-D uncoalesced tensor:

>>> i = [[1, 1]]
>>> v = [3, 4]
>>> s = torch.sparse_coo_tensor(i, v, (3,))
>>> s
tensor(indices=tensor([[1, 1]]),
       values=tensor([3, 4]),
       size=(3,), nnz=2, layout=torch.sparse_coo)

while the coalescing process will accumulate the multi-valued elements into a single value using summation:

>>> s.coalesce()
tensor(indices=tensor([[1]]),
       values=tensor([7]),
       size=(3,), nnz=1, layout=torch.sparse_coo)

In general, the output of the `torch.Tensor.coalesce()` method is a sparse tensor with the following properties:

* the indices of specified tensor elements are unique,
* the indices are sorted in lexicographical order,
* `torch.Tensor.is_coalesced()` returns `True`.

Note

For the most part, you shouldn't have to care whether or not a sparse tensor is coalesced, as most operations will work identically given a coalesced or uncoalesced sparse tensor. However, some operations can be implemented more efficiently on uncoalesced tensors, and some on coalesced tensors. For instance, addition of sparse COO tensors is implemented by simply concatenating the indices and values tensors:

>>> a = torch.sparse_coo_tensor([[1, 1]], [5, 6], (2,))
>>> b = torch.sparse_coo_tensor([[0, 0]], [7, 8], (2,))
>>> a + b
tensor(indices=tensor([[0, 0, 1, 1]]),
       values=tensor([7, 8, 5, 6]),
       size=(2,), nnz=4, layout=torch.sparse_coo)

If you repeatedly perform an operation that can produce duplicate entries (e.g., [`torch.Tensor.add()`](tensors#torch.Tensor.add "torch.Tensor.add")), you should occasionally coalesce your sparse tensors to prevent them from growing too large. On the other hand, the lexicographical ordering of indices can be advantageous for implementing algorithms that involve many element selection operations, such as slicing or matrix products.
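To illustrate the note above, here is a minimal sketch (not part of the original API reference) of how repeated additions let duplicate entries accumulate and how `coalesce()` folds them back into a single specified element:

>>> s = torch.sparse_coo_tensor([[0]], [1.0], (2,))
>>> for _ in range(3):
>>>     s = s + s  # each addition concatenates indices and values
>>> s._nnz()  # eight duplicate entries, all at index 0
8
>>> s = s.coalesce()  # duplicates are summed into one specified element
>>> s._nnz()
1
>>> s.values()
tensor([8.])

Coalescing periodically in such a loop keeps the memory footprint proportional to the number of distinct indices rather than to the number of additions performed.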
### Working with sparse COO tensors

Let's consider the following example:

>>> i = [[0, 1, 1], [2, 0, 2]]
>>> v = [[3, 4], [5, 6], [7, 8]]
>>> s = torch.sparse_coo_tensor(i, v, (2, 3, 2))

As mentioned above, a sparse COO tensor is a [`torch.Tensor`](tensors#torch.Tensor "torch.Tensor") instance, and to distinguish it from `Tensor` instances that use some other layout, one can use the `torch.Tensor.is_sparse` or `torch.Tensor.layout` properties:

>>> isinstance(s, torch.Tensor)
True
>>> s.is_sparse
True
>>> s.layout == torch.sparse_coo
True

The number of sparse and dense dimensions can be acquired using the methods `torch.Tensor.sparse_dim()` and `torch.Tensor.dense_dim()`, respectively. For instance:

>>> s.sparse_dim(), s.dense_dim()
(2, 1)

If `s` is a sparse COO tensor, then its COO format data can be acquired using the methods `torch.Tensor.indices()` and `torch.Tensor.values()`.

Note

Currently, one can acquire the COO format data only when the tensor instance is coalesced:

>>> s.indices()
RuntimeError: Cannot get indices on an uncoalesced tensor, please call .coalesce() first

For acquiring the COO format data of an uncoalesced tensor, use `torch.Tensor._values()` and `torch.Tensor._indices()`:

>>> s._indices()
tensor([[0, 1, 1],
        [2, 0, 2]])

Constructing a new sparse COO tensor results in a tensor that is not coalesced:

>>> s.is_coalesced()
False

but one can construct a coalesced copy of a sparse COO tensor using the `torch.Tensor.coalesce()` method:

>>> s2 = s.coalesce()
>>> s2.indices()
tensor([[0, 1, 1],
        [2, 0, 2]])

When working with uncoalesced sparse COO tensors, one must take into account the additive nature of uncoalesced data: the values at the same indices are the terms of a sum whose evaluation gives the value of the corresponding tensor element. For example, scalar multiplication on an uncoalesced sparse tensor could be implemented by multiplying all the uncoalesced values with the scalar, because `c * (a + b) == c * a + c * b` holds. However, any nonlinear operation, say, a square root, cannot be implemented by applying the operation to uncoalesced data, because `sqrt(a + b) == sqrt(a) + sqrt(b)` does not hold in general.

Slicing (with positive step) of a sparse COO tensor is supported only for dense dimensions. Indexing is supported for both sparse and dense dimensions:

>>> s[1]
tensor(indices=tensor([[0, 2]]),
       values=tensor([[5, 6],
                      [7, 8]]),
       size=(3, 2), nnz=2, layout=torch.sparse_coo)
>>> s[1, 0, 1]
tensor(6)
>>> s[1, 0, 1:]
tensor([6])

In PyTorch, the fill value of a sparse tensor cannot be specified explicitly and is assumed to be zero in general. However, there exist operations that may interpret the fill value differently. For instance, `torch.sparse.softmax()` computes the softmax with the assumption that the fill value is negative infinity.

## Supported Linear Algebra operations

The following table summarizes supported Linear Algebra operations on sparse matrices where the operand layouts may vary. Here `T[layout]` denotes a tensor with a given layout. Similarly, `M[layout]` denotes a matrix (2-D PyTorch tensor), and `V[layout]` denotes a vector (1-D PyTorch tensor). In addition, `f` denotes a scalar (float or 0-D PyTorch tensor), `*` is element-wise multiplication, and `@` is matrix multiplication.
PyTorch operation | Sparse grad? | Layout signature
---|---|---
[`torch.mv()`](generated/torch.mv#torch.mv "torch.mv") | no | `M[sparse_coo] @ V[strided] -> V[strided]`
[`torch.matmul()`](generated/torch.matmul#torch.matmul "torch.matmul") | no | `M[sparse_coo] @ M[strided] -> M[strided]`
[`torch.mm()`](generated/torch.mm#torch.mm "torch.mm") | no | `M[sparse_coo] @ M[strided] -> M[strided]`
`torch.sparse.mm()` | yes | `M[sparse_coo] @ M[strided] -> M[strided]`
`torch.smm()` | no | `M[sparse_coo] @ M[strided] -> M[sparse_coo]`
`torch.hspmm()` | no | `M[sparse_coo] @ M[strided] -> M[hybrid sparse_coo]`
[`torch.bmm()`](generated/torch.bmm#torch.bmm "torch.bmm") | no | `T[sparse_coo] @ T[strided] -> T[strided]`
[`torch.addmm()`](generated/torch.addmm#torch.addmm "torch.addmm") | no | `f * M[strided] + f * (M[sparse_coo] @ M[strided]) -> M[strided]`
`torch.sparse.addmm()` | yes | `f * M[strided] + f * (M[sparse_coo] @ M[strided]) -> M[strided]`
`torch.sspaddmm()` | no | `f * M[sparse_coo] + f * (M[sparse_coo] @ M[strided]) -> M[sparse_coo]`
[`torch.lobpcg()`](generated/torch.lobpcg#torch.lobpcg "torch.lobpcg") | no | `GENEIG(M[sparse_coo]) -> M[strided], M[strided]`
[`torch.pca_lowrank()`](generated/torch.pca_lowrank#torch.pca_lowrank "torch.pca_lowrank") | yes | `PCA(M[sparse_coo]) -> M[strided], M[strided], M[strided]`
[`torch.svd_lowrank()`](generated/torch.svd_lowrank#torch.svd_lowrank "torch.svd_lowrank") | yes | `SVD(M[sparse_coo]) -> M[strided], M[strided], M[strided]`

The "Sparse grad?" column indicates whether the PyTorch operation supports backward with respect to the sparse matrix argument. All PyTorch operations, except `torch.smm()`, support backward with respect to strided matrix arguments.

Note

Currently, PyTorch does not support matrix multiplication with the layout signature `M[strided] @ M[sparse_coo]`. However, applications can still compute this using the matrix relation `D @ S == (S.t() @ D.t()).t()`.

`class torch.Tensor`

The following methods are specific to sparse tensors:

`is_sparse`

Is `True` if the Tensor uses sparse storage layout, `False` otherwise.

`dense_dim() → int`

Return the number of dense dimensions in a sparse tensor `self`.

Warning

Throws an error if `self` is not a sparse tensor.

See also `Tensor.sparse_dim()` and hybrid tensors.

`sparse_dim() → int`

Return the number of sparse dimensions in a sparse tensor `self`.

Warning

Throws an error if `self` is not a sparse tensor.

See also `Tensor.dense_dim()` and hybrid tensors.

`sparse_mask(mask) → Tensor`

Returns a new sparse tensor with values from a strided tensor `self` filtered by the indices of the sparse tensor `mask`. The values of the `mask` sparse tensor are ignored. `self` and `mask` tensors must have the same shape.

Note

The returned sparse tensor has the same indices as the sparse tensor `mask`, even when the corresponding values in `self` are zeros.

Parameters

**mask** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – a sparse tensor whose indices are used as a filter

Example:

>>> nse = 5
>>> dims = (5, 5, 2, 2)
>>> I = torch.cat([torch.randint(0, dims[0], size=(nse,)), ...
torch.randint(0, dims[1], size=(nse,))], 0).reshape(2, nse) >>> V = torch.randn(nse, dims[2], dims[3]) >>> S = torch.sparse_coo_tensor(I, V, dims).coalesce() >>> D = torch.randn(dims) >>> D.sparse_mask(S) tensor(indices=tensor([[0, 0, 0, 2], [0, 1, 4, 3]]), values=tensor([[[ 1.6550, 0.2397], [-0.1611, -0.0779]], [[ 0.2326, -1.0558], [ 1.4711, 1.9678]], [[-0.5138, -0.0411], [ 1.9417, 0.5158]], [[ 0.0793, 0.0036], [-0.2569, -0.1055]]]), size=(5, 5, 2, 2), nnz=4, layout=torch.sparse_coo) `sparse_resize_(size, sparse_dim, dense_dim) → Tensor` Resizes `self` sparse tensor to the desired size and the number of sparse and dense dimensions. Note If the number of specified elements in `self` is zero, then [`size`](tensors#torch.Tensor.size "torch.Tensor.size"), `sparse_dim`, and `dense_dim` can be any size and positive integers such that `len(size) == sparse_dim + dense_dim`. If `self` specifies one or more elements, however, then each dimension in [`size`](tensors#torch.Tensor.size "torch.Tensor.size") must not be smaller than the corresponding dimension of `self`, `sparse_dim` must equal the number of sparse dimensions in `self`, and `dense_dim` must equal the number of dense dimensions in `self`. Warning Throws an error if `self` is not a sparse tensor. Parameters * **size** (_torch.Size_) – the desired size. If `self` is non-empty sparse tensor, the desired size cannot be smaller than the original size. * **sparse_dim** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – the number of sparse dimensions * **dense_dim** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – the number of dense dimensions `sparse_resize_and_clear_(size, sparse_dim, dense_dim) → Tensor` Removes all specified elements from a sparse tensor `self` and resizes `self` to the desired size and the number of sparse and dense dimensions. Parameters * **size** (_torch.Size_) – the desired size. * **sparse_dim** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – the number of sparse dimensions * **dense_dim** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – the number of dense dimensions `to_dense() → Tensor` Creates a strided copy of `self`. Warning Throws an error if `self` is a strided tensor. Example: >>> s = torch.sparse_coo_tensor( ... torch.tensor([[1, 1], ... [0, 2]]), ... torch.tensor([9, 10]), ... size=(3, 3)) >>> s.to_dense() tensor([[ 0, 0, 0], [ 9, 0, 10], [ 0, 0, 0]]) `to_sparse(sparseDims) → Tensor` Returns a sparse copy of the tensor. PyTorch supports sparse tensors in coordinate format. Parameters **sparseDims** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – the number of sparse dimensions to include in the new sparse tensor Example: >>> d = torch.tensor([[0, 0, 0], [9, 0, 10], [0, 0, 0]]) >>> d tensor([[ 0, 0, 0], [ 9, 0, 10], [ 0, 0, 0]]) >>> d.to_sparse() tensor(indices=tensor([[1, 1], [0, 2]]), values=tensor([ 9, 10]), size=(3, 3), nnz=2, layout=torch.sparse_coo) >>> d.to_sparse(1) tensor(indices=tensor([[1]]), values=tensor([[ 9, 0, 10]]), size=(3, 3), nnz=1, layout=torch.sparse_coo) `coalesce() → Tensor` Returns a coalesced copy of `self` if `self` is an uncoalesced tensor. Returns `self` if `self` is a coalesced tensor. Warning Throws an error if `self` is not a sparse COO tensor. `is_coalesced() → bool` Returns `True` if `self` is a sparse COO tensor that is coalesced, `False` otherwise. 
Warning Throws an error if `self` is not a sparse COO tensor. See `coalesce()` and uncoalesced tensors. `indices() → Tensor` Return the indices tensor of a sparse COO tensor. Warning Throws an error if `self` is not a sparse COO tensor. See also `Tensor.values()`. Note This method can only be called on a coalesced sparse tensor. See `Tensor.coalesce()` for details. `values() → Tensor` Return the values tensor of a sparse COO tensor. Warning Throws an error if `self` is not a sparse COO tensor. See also `Tensor.indices()`. Note This method can only be called on a coalesced sparse tensor. See `Tensor.coalesce()` for details. The following [`torch.Tensor`](tensors#torch.Tensor "torch.Tensor") methods support sparse COO tensors: [`add()`](tensors#torch.Tensor.add "torch.Tensor.add") [`add_()`](tensors#torch.Tensor.add_ "torch.Tensor.add_") [`addmm()`](tensors#torch.Tensor.addmm "torch.Tensor.addmm") [`addmm_()`](tensors#torch.Tensor.addmm_ "torch.Tensor.addmm_") [`any()`](tensors#torch.Tensor.any "torch.Tensor.any") [`asin()`](tensors#torch.Tensor.asin "torch.Tensor.asin") [`asin_()`](tensors#torch.Tensor.asin_ "torch.Tensor.asin_") [`arcsin()`](tensors#torch.Tensor.arcsin "torch.Tensor.arcsin") [`arcsin_()`](tensors#torch.Tensor.arcsin_ "torch.Tensor.arcsin_") [`bmm()`](tensors#torch.Tensor.bmm "torch.Tensor.bmm") [`clone()`](tensors#torch.Tensor.clone "torch.Tensor.clone") [`deg2rad()`](tensors#torch.Tensor.deg2rad "torch.Tensor.deg2rad") `deg2rad_()` [`detach()`](autograd#torch.Tensor.detach "torch.Tensor.detach") [`detach_()`](autograd#torch.Tensor.detach_ "torch.Tensor.detach_") [`dim()`](tensors#torch.Tensor.dim "torch.Tensor.dim") [`div()`](tensors#torch.Tensor.div "torch.Tensor.div") [`div_()`](tensors#torch.Tensor.div_ "torch.Tensor.div_") [`floor_divide()`](tensors#torch.Tensor.floor_divide "torch.Tensor.floor_divide") [`floor_divide_()`](tensors#torch.Tensor.floor_divide_ "torch.Tensor.floor_divide_") [`get_device()`](tensors#torch.Tensor.get_device "torch.Tensor.get_device") [`index_select()`](tensors#torch.Tensor.index_select "torch.Tensor.index_select") [`isnan()`](tensors#torch.Tensor.isnan "torch.Tensor.isnan") [`log1p()`](tensors#torch.Tensor.log1p "torch.Tensor.log1p") [`log1p_()`](tensors#torch.Tensor.log1p_ "torch.Tensor.log1p_") [`mm()`](tensors#torch.Tensor.mm "torch.Tensor.mm") [`mul()`](tensors#torch.Tensor.mul "torch.Tensor.mul") [`mul_()`](tensors#torch.Tensor.mul_ "torch.Tensor.mul_") [`mv()`](tensors#torch.Tensor.mv "torch.Tensor.mv") [`narrow_copy()`](tensors#torch.Tensor.narrow_copy "torch.Tensor.narrow_copy") [`neg()`](tensors#torch.Tensor.neg "torch.Tensor.neg") [`neg_()`](tensors#torch.Tensor.neg_ "torch.Tensor.neg_") [`negative()`](tensors#torch.Tensor.negative "torch.Tensor.negative") [`negative_()`](tensors#torch.Tensor.negative_ "torch.Tensor.negative_") [`numel()`](tensors#torch.Tensor.numel "torch.Tensor.numel") [`rad2deg()`](tensors#torch.Tensor.rad2deg "torch.Tensor.rad2deg") `rad2deg_()` [`resize_as_()`](tensors#torch.Tensor.resize_as_ "torch.Tensor.resize_as_") [`size()`](tensors#torch.Tensor.size "torch.Tensor.size") [`pow()`](tensors#torch.Tensor.pow "torch.Tensor.pow") [`sqrt()`](tensors#torch.Tensor.sqrt "torch.Tensor.sqrt") [`square()`](tensors#torch.Tensor.square "torch.Tensor.square") `smm()` `sspaddmm()` [`sub()`](tensors#torch.Tensor.sub "torch.Tensor.sub") [`sub_()`](tensors#torch.Tensor.sub_ "torch.Tensor.sub_") [`t()`](tensors#torch.Tensor.t "torch.Tensor.t") [`t_()`](tensors#torch.Tensor.t_ "torch.Tensor.t_") 
[`transpose()`](tensors#torch.Tensor.transpose "torch.Tensor.transpose") [`transpose_()`](tensors#torch.Tensor.transpose_ "torch.Tensor.transpose_") [`zero_()`](tensors#torch.Tensor.zero_ "torch.Tensor.zero_") ## Sparse tensor functions `torch.sparse_coo_tensor(indices, values, size=None, *, dtype=None, device=None, requires_grad=False) → Tensor` Constructs a sparse tensor in COO(rdinate) format with specified values at the given `indices`. Note This function returns an uncoalesced tensor. Parameters * **indices** (_array_like_) – Initial data for the tensor. Can be a list, tuple, NumPy `ndarray`, scalar, and other types. Will be cast to a `torch.LongTensor` internally. The indices are the coordinates of the non-zero values in the matrix, and thus should be two-dimensional where the first dimension is the number of tensor dimensions and the second dimension is the number of non-zero values. * **values** (_array_like_) – Initial values for the tensor. Can be a list, tuple, NumPy `ndarray`, scalar, and other types. * **size** (list, tuple, or `torch.Size`, optional) – Size of the sparse tensor. If not provided the size will be inferred as the minimum size big enough to hold all non-zero elements. Keyword Arguments * **dtype** ([`torch.dtype`](tensor_attributes#torch.torch.dtype "torch.torch.dtype"), optional) – the desired data type of returned tensor. Default: if None, infers data type from `values`. * **device** ([`torch.device`](tensor_attributes#torch.torch.device "torch.torch.device"), optional) – the desired device of returned tensor. Default: if None, uses the current device for the default tensor type (see [`torch.set_default_tensor_type()`](generated/torch.set_default_tensor_type#torch.set_default_tensor_type "torch.set_default_tensor_type")). `device` will be the CPU for CPU tensor types and the current CUDA device for CUDA tensor types. * **requires_grad** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – If autograd should record operations on the returned tensor. Default: `False`. Example: >>> i = torch.tensor([[0, 1, 1], ... [2, 0, 2]]) >>> v = torch.tensor([3, 4, 5], dtype=torch.float32) >>> torch.sparse_coo_tensor(i, v, [2, 4]) tensor(indices=tensor([[0, 1, 1], [2, 0, 2]]), values=tensor([3., 4., 5.]), size=(2, 4), nnz=3, layout=torch.sparse_coo) >>> torch.sparse_coo_tensor(i, v) # Shape inference tensor(indices=tensor([[0, 1, 1], [2, 0, 2]]), values=tensor([3., 4., 5.]), size=(2, 3), nnz=3, layout=torch.sparse_coo) >>> torch.sparse_coo_tensor(i, v, [2, 4], ... dtype=torch.float64, ... device=torch.device('cuda:0')) tensor(indices=tensor([[0, 1, 1], [2, 0, 2]]), values=tensor([3., 4., 5.]), device='cuda:0', size=(2, 4), nnz=3, dtype=torch.float64, layout=torch.sparse_coo) # Create an empty sparse tensor with the following invariants: # 1. sparse_dim + dense_dim = len(SparseTensor.shape) # 2. SparseTensor._indices().shape = (sparse_dim, nnz) # 3. 
SparseTensor._values().shape = (nnz, SparseTensor.shape[sparse_dim:])
#
# For instance, to create an empty sparse tensor with nnz = 0, dense_dim = 0 and
# sparse_dim = 1 (hence indices is a 2D tensor of shape = (1, 0))
>>> S = torch.sparse_coo_tensor(torch.empty([1, 0]), [], [1])
tensor(indices=tensor([], size=(1, 0)),
       values=tensor([], size=(0,)),
       size=(1,), nnz=0, layout=torch.sparse_coo)

# and to create an empty sparse tensor with nnz = 0, dense_dim = 1 and
# sparse_dim = 1
>>> S = torch.sparse_coo_tensor(torch.empty([1, 0]), torch.empty([0, 2]), [1, 2])
tensor(indices=tensor([], size=(1, 0)),
       values=tensor([], size=(0, 2)),
       size=(1, 2), nnz=0, layout=torch.sparse_coo)

`torch.sparse.sum(input, dim=None, dtype=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/sparse.html#sum)

Returns the sum of each row of the sparse tensor `input` in the given dimensions `dim`. If `dim` is a list of dimensions, reduce over all of them. When summing over all `sparse_dim`, this method returns a dense tensor instead of a sparse tensor.

All summed `dim` are squeezed (see [`torch.squeeze()`](generated/torch.squeeze#torch.squeeze "torch.squeeze")), resulting in an output tensor having `dim` fewer dimensions than `input`.

During backward, only gradients at `nnz` locations of `input` will propagate back. Note that the gradient of `input` is coalesced.

Parameters

* **input** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – the input sparse tensor
* **dim** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _or_ _tuple of python:ints_) – a dimension or a list of dimensions to reduce. Default: reduce over all dims.
* **dtype** (`torch.dtype`, optional) – the desired data type of returned Tensor. Default: dtype of `input`.

Example:

>>> nnz = 3
>>> dims = [5, 5, 2, 3]
>>> I = torch.cat([torch.randint(0, dims[0], size=(nnz,)),
                   torch.randint(0, dims[1], size=(nnz,))], 0).reshape(2, nnz)
>>> V = torch.randn(nnz, dims[2], dims[3])
>>> size = torch.Size(dims)
>>> S = torch.sparse_coo_tensor(I, V, size)
>>> S
tensor(indices=tensor([[2, 0, 3],
                       [2, 4, 1]]),
       values=tensor([[[-0.6438, -1.6467,  1.4004],
                       [ 0.3411,  0.0918, -0.2312]],
                      [[ 0.5348,  0.0634, -2.0494],
                       [-0.7125, -1.0646,  2.1844]],
                      [[ 0.1276,  0.1874, -0.6334],
                       [-1.9682, -0.5340,  0.7483]]]),
       size=(5, 5, 2, 3), nnz=3, layout=torch.sparse_coo)

# when sum over only part of sparse_dims, return a sparse tensor
>>> torch.sparse.sum(S, [1, 3])
tensor(indices=tensor([[0, 2, 3]]),
       values=tensor([[-1.4512,  0.4073],
                      [-0.8901,  0.2017],
                      [-0.3183, -1.7539]]),
       size=(5, 2), nnz=3, layout=torch.sparse_coo)

# when sum over all sparse dim, return a dense tensor
# with summed dims squeezed
>>> torch.sparse.sum(S, [0, 1, 3])
tensor([-2.6596, -1.1450])

`torch.sparse.addmm(mat, mat1, mat2, beta=1.0, alpha=1.0)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/sparse.html#addmm)

This function does the exact same thing as [`torch.addmm()`](generated/torch.addmm#torch.addmm "torch.addmm") in the forward, except that it supports backward for the sparse matrix `mat1`. `mat1` needs to have `sparse_dim = 2`. Note that the gradient of `mat1` is a coalesced sparse tensor.
Parameters

* **mat** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – a dense matrix to be added
* **mat1** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – a sparse matrix to be multiplied
* **mat2** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – a dense matrix to be multiplied
* **beta** (_Number_ _,__optional_) – multiplier for `mat` (β)
* **alpha** (_Number_ _,__optional_) – multiplier for `mat1 @ mat2` (α)

`torch.sparse.mm(mat1, mat2)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/sparse.html#mm)

Performs a matrix multiplication of the sparse matrix `mat1` and the (sparse or strided) matrix `mat2`. Similar to [`torch.mm()`](generated/torch.mm#torch.mm "torch.mm"), if `mat1` is an (n × m) tensor and `mat2` is an (m × p) tensor, `out` will be an (n × p) tensor. `mat1` needs to have `sparse_dim = 2`. This function also supports backward for both matrices. Note that the gradient of `mat1` is a coalesced sparse tensor.

Parameters

* **mat1** (_SparseTensor_) – the first sparse matrix to be multiplied
* **mat2** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – the second matrix to be multiplied, which could be sparse or dense

Shape: The format of the output tensor of this function follows:

- sparse x sparse -> sparse
- sparse x dense -> dense

Example:

>>> a = torch.randn(2, 3).to_sparse().requires_grad_(True)
>>> a
tensor(indices=tensor([[0, 0, 0, 1, 1, 1],
                       [0, 1, 2, 0, 1, 2]]),
       values=tensor([ 1.5901,  0.0183, -0.6146,  1.8061, -0.0112,  0.6302]),
       size=(2, 3), nnz=6, layout=torch.sparse_coo, requires_grad=True)
>>> b = torch.randn(3, 2, requires_grad=True)
>>> b
tensor([[-0.6479,  0.7874],
        [-1.2056,  0.5641],
        [-1.1716, -0.9923]], requires_grad=True)
>>> y = torch.sparse.mm(a, b)
>>> y
tensor([[-0.3323,  1.8723],
        [-1.8951,  0.7904]], grad_fn=<SparseAddmmBackward>)
>>> y.sum().backward()
>>> a.grad
tensor(indices=tensor([[0, 0, 0, 1, 1, 1],
                       [0, 1, 2, 0, 1, 2]]),
       values=tensor([ 0.1394, -0.6415, -2.1639,  0.1394, -0.6415, -2.1639]),
       size=(2, 3), nnz=6, layout=torch.sparse_coo)

`torch.sspaddmm(input, mat1, mat2, *, beta=1, alpha=1, out=None) → Tensor`

Matrix multiplies a sparse tensor `mat1` with a dense tensor `mat2`, then adds the sparse tensor `input` to the result.

Note: This function is equivalent to [`torch.addmm()`](generated/torch.addmm#torch.addmm "torch.addmm"), except `input` and `mat1` are sparse.

Parameters

* **input** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – a sparse matrix to be added
* **mat1** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – a sparse matrix to be matrix multiplied
* **mat2** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – a dense matrix to be matrix multiplied

Keyword Arguments

* **beta** (_Number_ _,__optional_) – multiplier for `input` (β)
* **alpha** (_Number_ _,__optional_) – multiplier for `mat1 @ mat2` (α)
* **out** ([Tensor](tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor.

`torch.hspmm(mat1, mat2, *, out=None) → Tensor`

Performs a matrix multiplication of a sparse COO matrix `mat1` and a strided matrix `mat2`. The result is a (1 + 1)-dimensional hybrid COO matrix.

Parameters

* **mat1** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – the first sparse matrix to be matrix multiplied
* **mat2** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – the second strided matrix to be matrix multiplied

Keyword Arguments

* **out** ([Tensor](tensors#torch.Tensor "torch.Tensor") _,__optional_) – the output tensor.

`torch.smm(input, mat) → Tensor`

Performs a matrix multiplication of the sparse matrix `input` with the dense matrix `mat`.
Parameters * **input** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – a sparse matrix to be matrix multiplied * **mat** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – a dense matrix to be matrix multiplied `torch.sparse.softmax(input, dim, dtype=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/sparse.html#softmax) Applies a softmax function. Softmax is defined as: \text{Softmax}(x_{i}) = \frac{\exp(x_i)}{\sum_j \exp(x_j)} where i, j run over sparse tensor indices and unspecified entries are ignored. This is equivalent to defining unspecified entries as negative infinity, so that \exp(x_k) = 0 when the entry with index k has not been specified. It is applied to all slices along `dim`, and will re-scale them so that the elements lie in the range `[0, 1]` and sum to 1. Parameters * **input** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – input * **dim** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – A dimension along which softmax will be computed. * **dtype** (`torch.dtype`, optional) – the desired data type of the returned tensor. If specified, the input tensor is cast to `dtype` before the operation is performed. This is useful for preventing data type overflows. Default: None `torch.sparse.log_softmax(input, dim, dtype=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/sparse.html#log_softmax) Applies a softmax function followed by logarithm. See `softmax` for more details. Parameters * **input** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – input * **dim** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – A dimension along which softmax will be computed. * **dtype** (`torch.dtype`, optional) – the desired data type of the returned tensor. If specified, the input tensor is cast to `dtype` before the operation is performed. This is useful for preventing data type overflows.
Default: None ## Other functions The following `torch` functions support sparse COO tensors: [`cat()`](generated/torch.cat#torch.cat "torch.cat") [`dstack()`](generated/torch.dstack#torch.dstack "torch.dstack") [`empty()`](generated/torch.empty#torch.empty "torch.empty") [`empty_like()`](generated/torch.empty_like#torch.empty_like "torch.empty_like") [`hstack()`](generated/torch.hstack#torch.hstack "torch.hstack") [`index_select()`](generated/torch.index_select#torch.index_select "torch.index_select") [`is_complex()`](generated/torch.is_complex#torch.is_complex "torch.is_complex") [`is_floating_point()`](generated/torch.is_floating_point#torch.is_floating_point "torch.is_floating_point") [`is_nonzero()`](generated/torch.is_nonzero#torch.is_nonzero "torch.is_nonzero") `is_same_size()` `is_signed()` [`is_tensor()`](generated/torch.is_tensor#torch.is_tensor "torch.is_tensor") [`lobpcg()`](generated/torch.lobpcg#torch.lobpcg "torch.lobpcg") [`mm()`](generated/torch.mm#torch.mm "torch.mm") `native_norm()` [`pca_lowrank()`](generated/torch.pca_lowrank#torch.pca_lowrank "torch.pca_lowrank") `select()` [`stack()`](generated/torch.stack#torch.stack "torch.stack") [`svd_lowrank()`](generated/torch.svd_lowrank#torch.svd_lowrank "torch.svd_lowrank") [`unsqueeze()`](generated/torch.unsqueeze#torch.unsqueeze "torch.unsqueeze") [`vstack()`](generated/torch.vstack#torch.vstack "torch.vstack") [`zeros()`](generated/torch.zeros#torch.zeros "torch.zeros") [`zeros_like()`](generated/torch.zeros_like#torch.zeros_like "torch.zeros_like") # torch.Storage A `torch.Storage` is a contiguous, one-dimensional array of a single data type. Every [`torch.Tensor`](tensors#torch.Tensor "torch.Tensor") has a corresponding storage of the same data type. `class torch.FloatStorage(*args, **kwargs)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch.html#FloatStorage) `bfloat16()` Casts this storage to bfloat16 type `bool()` Casts this storage to bool type `byte()` Casts this storage to byte type `char()` Casts this storage to char type `clone()` Returns a copy of this storage `complex_double()` Casts this storage to complex double type `complex_float()` Casts this storage to complex float type `copy_()` `cpu()` Returns a CPU copy of this storage if it’s not already on the CPU `cuda(device=None, non_blocking=False, **kwargs)` Returns a copy of this object in CUDA memory. If this object is already in CUDA memory and on the correct device, then no copy is performed and the original object is returned. Parameters * **device** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – The destination GPU id. Defaults to the current device. * **non_blocking** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – If `True` and the source is in pinned memory, the copy will be asynchronous with respect to the host. Otherwise, the argument has no effect. * ****kwargs** – For compatibility, may contain the key `async` in place of the `non_blocking` argument. `data_ptr()` `device` `double()` Casts this storage to double type `dtype` `element_size()` `fill_()` `float()` Casts this storage to float type `static from_buffer()` `static from_file(filename, shared=False, size=0) → Storage` If `shared` is `True`, then memory is shared between all processes. All changes are written to the file. If `shared` is `False`, then the changes on the storage do not affect the file. `size` is the number of elements in the storage. 
If `shared` is `False`, then the file must contain at least `size * sizeof(Type)` bytes (`Type` is the type of storage). If `shared` is `True` the file will be created if needed. Parameters * **filename** ([str](https://docs.python.org/3/library/stdtypes.html#str "\(in Python v3.9\)")) – file name to map * **shared** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – whether to share memory * **size** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – number of elements in the storage `get_device()` `half()` Casts this storage to half type `int()` Casts this storage to int type `is_cuda: bool = False` `is_pinned()` `is_shared()` `is_sparse: bool = False` `long()` Casts this storage to long type `new()` `pin_memory()` Copies the storage to pinned memory, if it’s not already pinned. `resize_()` `share_memory_()` Moves the storage to shared memory. This is a no-op for storages already in shared memory and for CUDA storages, which do not need to be moved for sharing across processes. Storages in shared memory cannot be resized. Returns: self `short()` Casts this storage to short type `size()` `tolist()` Returns a list containing the elements of this storage `type(dtype=None, non_blocking=False, **kwargs)` Returns the type if `dtype` is not provided, else casts this object to the specified type. If this is already of the correct type, no copy is performed and the original object is returned. Parameters * **dtype** ([type](https://docs.python.org/3/library/functions.html#type "\(in Python v3.9\)") _or_ _string_) – The desired type * **non_blocking** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – If `True`, and the source is in pinned memory and destination is on the GPU or vice versa, the copy is performed asynchronously with respect to the host. Otherwise, the argument has no effect. * ****kwargs** – For compatibility, may contain the key `async` in place of the `non_blocking` argument. The `async` arg is deprecated. # Tensor Attributes Each `torch.Tensor` has a `torch.dtype`, `torch.device`, and `torch.layout`. ## torch.dtype `class torch.dtype` A `torch.dtype` is an object that represents the data type of a [`torch.Tensor`](tensors#torch.Tensor "torch.Tensor"). PyTorch has twelve different data types: Data type | dtype | Legacy Constructors ---|---|--- 32-bit floating point | `torch.float32` or `torch.float` | `torch.*.FloatTensor` 64-bit floating point | `torch.float64` or `torch.double` | `torch.*.DoubleTensor` 64-bit complex | `torch.complex64` or `torch.cfloat` | 128-bit complex | `torch.complex128` or `torch.cdouble` | 16-bit floating point 1 | `torch.float16` or `torch.half` | `torch.*.HalfTensor` 16-bit floating point 2 | `torch.bfloat16` | `torch.*.BFloat16Tensor` 8-bit integer (unsigned) | `torch.uint8` | `torch.*.ByteTensor` 8-bit integer (signed) | `torch.int8` | `torch.*.CharTensor` 16-bit integer (signed) | `torch.int16` or `torch.short` | `torch.*.ShortTensor` 32-bit integer (signed) | `torch.int32` or `torch.int` | `torch.*.IntTensor` 64-bit integer (signed) | `torch.int64` or `torch.long` | `torch.*.LongTensor` Boolean | `torch.bool` | `torch.*.BoolTensor` `1` Sometimes referred to as binary16: uses 1 sign, 5 exponent, and 10 significand bits. Useful when precision is important. `2` Sometimes referred to as Brain Floating Point: use 1 sign, 8 exponent and 7 significand bits. 
Useful when range is important, since it has the same number of exponent bits as `float32` To find out if a `torch.dtype` is a floating point data type, the property [`is_floating_point`](generated/torch.is_floating_point#torch.is_floating_point "torch.is_floating_point") can be used, which returns `True` if the data type is a floating point data type. To find out if a `torch.dtype` is a complex data type, the property [`is_complex`](generated/torch.is_complex#torch.is_complex "torch.is_complex") can be used, which returns `True` if the data type is a complex data type. When the dtypes of inputs to an arithmetic operation (`add`, `sub`, `div`, `mul`) differ, we promote by finding the minimum dtype that satisfies the following rules: * If the type of a scalar operand is of a higher category than tensor operands (where complex > floating > integral > boolean), we promote to a type with sufficient size to hold all scalar operands of that category. * If a zero-dimension tensor operand has a higher category than dimensioned operands, we promote to a type with sufficient size and category to hold all zero-dim tensor operands of that category. * If there are no higher-category zero-dim operands, we promote to a type with sufficient size and category to hold all dimensioned operands. A floating point scalar operand has dtype `torch.get_default_dtype()` and an integral non-boolean scalar operand has dtype `torch.int64`. Unlike numpy, we do not inspect values when determining the minimum `dtypes` of an operand. Quantized and complex types are not yet supported. Promotion Examples: >>> float_tensor = torch.ones(1, dtype=torch.float) >>> double_tensor = torch.ones(1, dtype=torch.double) >>> complex_float_tensor = torch.ones(1, dtype=torch.complex64) >>> complex_double_tensor = torch.ones(1, dtype=torch.complex128) >>> int_tensor = torch.ones(1, dtype=torch.int) >>> long_tensor = torch.ones(1, dtype=torch.long) >>> uint_tensor = torch.ones(1, dtype=torch.uint8) >>> double_tensor = torch.ones(1, dtype=torch.double) >>> bool_tensor = torch.ones(1, dtype=torch.bool) # zero-dim tensors >>> long_zerodim = torch.tensor(1, dtype=torch.long) >>> int_zerodim = torch.tensor(1, dtype=torch.int) >>> torch.add(5, 5).dtype torch.int64 # 5 is an int64, but does not have higher category than int_tensor so is not considered. >>> (int_tensor + 5).dtype torch.int32 >>> (int_tensor + long_zerodim).dtype torch.int32 >>> (long_tensor + int_tensor).dtype torch.int64 >>> (bool_tensor + long_tensor).dtype torch.int64 >>> (bool_tensor + uint_tensor).dtype torch.uint8 >>> (float_tensor + double_tensor).dtype torch.float64 >>> (complex_float_tensor + complex_double_tensor).dtype torch.complex128 >>> (bool_tensor + int_tensor).dtype torch.int32 # Since long is a different kind than float, result dtype only needs to be large enough # to hold the float. >>> torch.add(long_tensor, float_tensor).dtype torch.float32 `When the output tensor of an arithmetic operation is specified, we allow casting to its dtype except that:` * An integral output tensor cannot accept a floating point tensor. * A boolean output tensor cannot accept a non-boolean tensor. 
* A non-complex output tensor cannot accept a complex tensor Casting Examples: # allowed: >>> float_tensor *= double_tensor >>> float_tensor *= int_tensor >>> float_tensor *= uint_tensor >>> float_tensor *= bool_tensor >>> float_tensor *= double_tensor >>> int_tensor *= long_tensor >>> int_tensor *= uint_tensor >>> uint_tensor *= int_tensor # disallowed (RuntimeError: result type can't be cast to the desired output type): >>> int_tensor *= float_tensor >>> bool_tensor *= int_tensor >>> bool_tensor *= uint_tensor >>> float_tensor *= complex_float_tensor ## torch.device `class torch.device` A `torch.device` is an object representing the device on which a [`torch.Tensor`](tensors#torch.Tensor "torch.Tensor") is or will be allocated. The `torch.device` contains a device type (`'cpu'` or `'cuda'`) and optional device ordinal for the device type. If the device ordinal is not present, this object will always represent the current device for the device type, even after [`torch.cuda.set_device()`](cuda#torch.cuda.set_device "torch.cuda.set_device") is called; e.g., a [`torch.Tensor`](tensors#torch.Tensor "torch.Tensor") constructed with device `'cuda'` is equivalent to `'cuda:X'` where X is the result of [`torch.cuda.current_device()`](cuda#torch.cuda.current_device "torch.cuda.current_device"). A [`torch.Tensor`](tensors#torch.Tensor "torch.Tensor")’s device can be accessed via the [`Tensor.device`](tensors#torch.Tensor.device "torch.Tensor.device") property. A `torch.device` can be constructed via a string or via a string and device ordinal Via a string: >>> torch.device('cuda:0') device(type='cuda', index=0) >>> torch.device('cpu') device(type='cpu') >>> torch.device('cuda') # current cuda device device(type='cuda') Via a string and device ordinal: >>> torch.device('cuda', 0) device(type='cuda', index=0) >>> torch.device('cpu', 0) device(type='cpu', index=0) Note The `torch.device` argument in functions can generally be substituted with a string. This allows for fast prototyping of code. >>> # Example of a function that takes in a torch.device >>> cuda1 = torch.device('cuda:1') >>> torch.randn((2,3), device=cuda1) >>> # You can substitute the torch.device with a string >>> torch.randn((2,3), device='cuda:1') Note For legacy reasons, a device can be constructed via a single device ordinal, which is treated as a cuda device. This matches [`Tensor.get_device()`](tensors#torch.Tensor.get_device "torch.Tensor.get_device"), which returns an ordinal for cuda tensors and is not supported for cpu tensors. >>> torch.device(1) device(type='cuda', index=1) Note Methods which take a device will generally accept a (properly formatted) string or (legacy) integer device ordinal, i.e. the following are all equivalent: >>> torch.randn((2,3), device=torch.device('cuda:1')) >>> torch.randn((2,3), device='cuda:1') >>> torch.randn((2,3), device=1) # legacy ## torch.layout `class torch.layout` Warning The `torch.layout` class is in beta and subject to change. A `torch.layout` is an object that represents the memory layout of a [`torch.Tensor`](tensors#torch.Tensor "torch.Tensor"). Currently, we support `torch.strided` (dense Tensors) and have beta support for `torch.sparse_coo` (sparse COO Tensors). `torch.strided` represents dense Tensors and is the memory layout that is most commonly used. Each strided tensor has an associated `torch.Storage`, which holds its data. These tensors provide multi-dimensional, [strided](https://en.wikipedia.org/wiki/Stride_of_an_array) view of a storage. 
Strides are a list of integers: the k-th stride represents the jump in the memory necessary to go from one element to the next one in the k-th dimension of the Tensor. This concept makes it possible to perform many tensor operations efficiently. Example: >>> x = torch.Tensor([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]]) >>> x.stride() (5, 1) >>> x.t().stride() (1, 5) For more information on `torch.sparse_coo` tensors, see [torch.sparse](sparse#sparse-docs). ## torch.memory_format `class torch.memory_format` A `torch.memory_format` is an object representing the memory format on which a [`torch.Tensor`](tensors#torch.Tensor "torch.Tensor") is or will be allocated. Possible values are: * `torch.contiguous_format`: Tensor is or will be allocated in dense non-overlapping memory. Strides are represented by values in decreasing order. * `torch.channels_last`: Tensor is or will be allocated in dense non-overlapping memory. Strides are represented by values satisfying `strides[0] > strides[2] > strides[3] > strides[1] == 1`, aka NHWC order. * `torch.preserve_format`: Used in functions like `clone` to preserve the memory format of the input tensor. If the input tensor is allocated in dense non-overlapping memory, the output tensor strides will be copied from the input. Otherwise, the output strides will follow `torch.contiguous_format`.
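For illustration, a minimal sketch (shapes chosen arbitrarily) of the stride orderings described above:

import torch

x = torch.empty(2, 3, 4, 5)                          # NCHW tensor, torch.contiguous_format
x.stride()                                           # (60, 20, 5, 1): strictly decreasing
y = x.contiguous(memory_format=torch.channels_last)
y.stride()                                           # (60, 1, 15, 3): strides[0] > strides[2] > strides[3] > strides[1] == 1
z = y.clone(memory_format=torch.preserve_format)     # clone keeps y's channels_last strides
z.stride()                                           # (60, 1, 15, 3)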
# torch.utils.tensorboard Before going further, more details on TensorBoard can be found at https://www.tensorflow.org/tensorboard/. Once you’ve installed TensorBoard, these utilities let you log PyTorch models and metrics into a directory for visualization within the TensorBoard UI. Scalars, images, histograms, graphs, and embedding visualizations are all supported for PyTorch models and tensors as well as Caffe2 nets and blobs. The SummaryWriter class is your main entry to log data for consumption and visualization by TensorBoard. For example: import torch import torchvision from torch.utils.tensorboard import SummaryWriter from torchvision import datasets, transforms # Writer will output to ./runs/ directory by default writer = SummaryWriter() transform = transforms.Compose([transforms.ToTensor(), transforms.Normalize((0.5,), (0.5,))]) trainset = datasets.MNIST('mnist_train', train=True, download=True, transform=transform) trainloader = torch.utils.data.DataLoader(trainset, batch_size=64, shuffle=True) model = torchvision.models.resnet50(False) # Have ResNet model take in grayscale rather than RGB model.conv1 = torch.nn.Conv2d(1, 64, kernel_size=7, stride=2, padding=3, bias=False) images, labels = next(iter(trainloader)) grid = torchvision.utils.make_grid(images) writer.add_image('images', grid, 0) writer.add_graph(model, images) writer.close() This can then be visualized with TensorBoard, which should be installable and runnable with: pip install tensorboard tensorboard --logdir=runs Lots of information can be logged for one experiment. To avoid cluttering the UI and have better result clustering, we can group plots by naming them hierarchically. For example, “Loss/train” and “Loss/test” will be grouped together, while “Accuracy/train” and “Accuracy/test” will be grouped separately in the TensorBoard interface. from torch.utils.tensorboard import SummaryWriter import numpy as np writer = SummaryWriter() for n_iter in range(100): writer.add_scalar('Loss/train', np.random.random(), n_iter) writer.add_scalar('Loss/test', np.random.random(), n_iter) writer.add_scalar('Accuracy/train', np.random.random(), n_iter) writer.add_scalar('Accuracy/test', np.random.random(), n_iter) Expected result: [](_images/hier_tags.png) `class torch.utils.tensorboard.writer.SummaryWriter(log_dir=None, comment='', purge_step=None, max_queue=10, flush_secs=120, filename_suffix='')` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/utils/tensorboard/writer.html#SummaryWriter) Writes entries directly to event files in the log_dir to be consumed by TensorBoard. The `SummaryWriter` class provides a high-level API to create an event file in a given directory and add summaries and events to it. The class updates the file contents asynchronously. This allows a training program to call methods to add data to the file directly from the training loop, without slowing down training. `__init__(log_dir=None, comment='', purge_step=None, max_queue=10, flush_secs=120, filename_suffix='')` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/utils/tensorboard/writer.html#SummaryWriter.__init__) Creates a `SummaryWriter` that will write out events and summaries to the event file. Parameters * **log_dir** (_string_) – Save directory location. Default is runs/**CURRENT_DATETIME_HOSTNAME**, which changes after each run. Use a hierarchical folder structure to compare between runs easily, e.g. pass in ‘runs/exp1’, ‘runs/exp2’, etc. for each new experiment to compare across them. * **comment** (_string_) – Comment log_dir suffix appended to the default `log_dir`. If `log_dir` is assigned, this argument has no effect. * **purge_step** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – When logging crashes at step T+X and restarts at step T, any events whose global_step is larger than or equal to T will be purged and hidden from TensorBoard. Note that crashed and resumed experiments should have the same `log_dir`. * **max_queue** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – Size of the queue for pending events and summaries before one of the ‘add’ calls forces a flush to disk. Default is ten items. * **flush_secs** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – How often, in seconds, to flush the pending events and summaries to disk. Default is every two minutes. * **filename_suffix** (_string_) – Suffix added to all event filenames in the log_dir directory. More details on filename construction in tensorboard.summary.writer.event_file_writer.EventFileWriter. Examples: from torch.utils.tensorboard import SummaryWriter # create a summary writer with automatically generated folder name. writer = SummaryWriter() # folder location: runs/May04_22-14-54_s-MacBook-Pro.local/ # create a summary writer using the specified folder name. writer = SummaryWriter("my_experiment") # folder location: my_experiment # create a summary writer with comment appended. writer = SummaryWriter(comment="LR_0.1_BATCH_16") # folder location: runs/May04_22-14-54_s-MacBook-Pro.localLR_0.1_BATCH_16/ `add_scalar(tag, scalar_value, global_step=None, walltime=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/utils/tensorboard/writer.html#SummaryWriter.add_scalar) Add scalar data to summary.
Parameters * **tag** (_string_) – Data identifier * **scalar_value** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _or_ _string/blobname_) – Value to save * **global_step** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – Global step value to record * **walltime** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)")) – Optional override default walltime (time.time()) with seconds after epoch of event Examples: from torch.utils.tensorboard import SummaryWriter writer = SummaryWriter() x = range(100) for i in x: writer.add_scalar('y=2x', i * 2, i) writer.close() Expected result: [](_images/add_scalar.png) `add_scalars(main_tag, tag_scalar_dict, global_step=None, walltime=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/utils/tensorboard/writer.html#SummaryWriter.add_scalars) Adds many scalar values to summary. Parameters * **main_tag** (_string_) – The parent name for the tags * **tag_scalar_dict** ([dict](https://docs.python.org/3/library/stdtypes.html#dict "\(in Python v3.9\)")) – Key-value pair storing the tag and corresponding values * **global_step** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – Global step value to record * **walltime** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)")) – Optional override default walltime (time.time()) seconds after epoch of event Examples: from torch.utils.tensorboard import SummaryWriter import numpy as np writer = SummaryWriter() r = 5 for i in range(100): writer.add_scalars('run_14h', {'xsinx':i*np.sin(i/r), 'xcosx':i*np.cos(i/r), 'tanx': np.tan(i/r)}, i) writer.close() # This call adds three values to the same scalar plot with the tag # 'run_14h' in TensorBoard's scalar section. Expected result: [](_images/add_scalars.png) `add_histogram(tag, values, global_step=None, bins='tensorflow', walltime=None, max_bins=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/utils/tensorboard/writer.html#SummaryWriter.add_histogram) Add histogram to summary. Parameters * **tag** (_string_) – Data identifier * **values** ([torch.Tensor](tensors#torch.Tensor "torch.Tensor") _,__numpy.array_ _, or_ _string/blobname_) – Values to build histogram * **global_step** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – Global step value to record * **bins** (_string_) – One of {‘tensorflow’, ‘auto’, ‘fd’, …}. This determines how the bins are made. You can find other options in the `numpy.histogram` documentation. * **walltime** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)")) – Optional override default walltime (time.time()) seconds after epoch of event Examples: from torch.utils.tensorboard import SummaryWriter import numpy as np writer = SummaryWriter() for i in range(10): x = np.random.random(1000) writer.add_histogram('distribution centers', x + i, i) writer.close() Expected result: [](_images/add_histogram.png) `add_image(tag, img_tensor, global_step=None, walltime=None, dataformats='CHW')` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/utils/tensorboard/writer.html#SummaryWriter.add_image) Add image data to summary. Note that this requires the `pillow` package.
Parameters * **tag** (_string_) – Data identifier * **img_tensor** ([torch.Tensor](tensors#torch.Tensor "torch.Tensor") _,__numpy.array_ _, or_ _string/blobname_) – Image data * **global_step** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – Global step value to record * **walltime** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)")) – Optional override default walltime (time.time()) seconds after epoch of event Shape: img_tensor: Default is (3,H,W)(3, H, W) . You can use `torchvision.utils.make_grid()` to convert a batch of tensor into 3xHxW format or call `add_images` and let us do the job. Tensor with (1,H,W)(1, H, W) , (H,W)(H, W) , (H,W,3)(H, W, 3) is also suitable as long as corresponding `dataformats` argument is passed, e.g. `CHW`, `HWC`, `HW`. Examples: from torch.utils.tensorboard import SummaryWriter import numpy as np img = np.zeros((3, 100, 100)) img[0] = np.arange(0, 10000).reshape(100, 100) / 10000 img[1] = 1 - np.arange(0, 10000).reshape(100, 100) / 10000 img_HWC = np.zeros((100, 100, 3)) img_HWC[:, :, 0] = np.arange(0, 10000).reshape(100, 100) / 10000 img_HWC[:, :, 1] = 1 - np.arange(0, 10000).reshape(100, 100) / 10000 writer = SummaryWriter() writer.add_image('my_image', img, 0) # If you have non-default dimension setting, set the dataformats argument. writer.add_image('my_image_HWC', img_HWC, 0, dataformats='HWC') writer.close() Expected result: [](_images/add_image.png) `add_images(tag, img_tensor, global_step=None, walltime=None, dataformats='NCHW')` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/utils/tensorboard/writer.html#SummaryWriter.add_images) Add batched image data to summary. Note that this requires the `pillow` package. Parameters * **tag** (_string_) – Data identifier * **img_tensor** ([torch.Tensor](tensors#torch.Tensor "torch.Tensor") _,__numpy.array_ _, or_ _string/blobname_) – Image data * **global_step** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – Global step value to record * **walltime** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)")) – Optional override default walltime (time.time()) seconds after epoch of event * **dataformats** (_string_) – Image data format specification of the form NCHW, NHWC, CHW, HWC, HW, WH, etc. Shape: img_tensor: Default is (N,3,H,W)(N, 3, H, W) . If `dataformats` is specified, other shape will be accepted. e.g. NCHW or NHWC. Examples: from torch.utils.tensorboard import SummaryWriter import numpy as np img_batch = np.zeros((16, 3, 100, 100)) for i in range(16): img_batch[i, 0] = np.arange(0, 10000).reshape(100, 100) / 10000 / 16 * i img_batch[i, 1] = (1 - np.arange(0, 10000).reshape(100, 100) / 10000) / 16 * i writer = SummaryWriter() writer.add_images('my_image_batch', img_batch, 0) writer.close() Expected result: [](_images/add_images.png) `add_figure(tag, figure, global_step=None, close=True, walltime=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/utils/tensorboard/writer.html#SummaryWriter.add_figure) Render matplotlib figure into an image and add it to summary. Note that this requires the `matplotlib` package. 
Parameters * **tag** (_string_) – Data identifier * **figure** (_matplotlib.pyplot.figure_) – Figure or a list of figures * **global_step** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – Global step value to record * **close** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – Flag to automatically close the figure * **walltime** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)")) – Optional override default walltime (time.time()) seconds after epoch of event `add_video(tag, vid_tensor, global_step=None, fps=4, walltime=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/utils/tensorboard/writer.html#SummaryWriter.add_video) Add video data to summary. Note that this requires the `moviepy` package. Parameters * **tag** (_string_) – Data identifier * **vid_tensor** ([torch.Tensor](tensors#torch.Tensor "torch.Tensor")) – Video data * **global_step** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – Global step value to record * **fps** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _or_[int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – Frames per second * **walltime** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)")) – Optional override default walltime (time.time()) seconds after epoch of event Shape: vid_tensor: (N,T,C,H,W)(N, T, C, H, W) . The values should lie in [0, 255] for type `uint8` or [0, 1] for type `float`. `add_audio(tag, snd_tensor, global_step=None, sample_rate=44100, walltime=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/utils/tensorboard/writer.html#SummaryWriter.add_audio) Add audio data to summary. Parameters * **tag** (_string_) – Data identifier * **snd_tensor** ([torch.Tensor](tensors#torch.Tensor "torch.Tensor")) – Sound data * **global_step** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – Global step value to record * **sample_rate** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – sample rate in Hz * **walltime** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)")) – Optional override default walltime (time.time()) seconds after epoch of event Shape: snd_tensor: (1,L)(1, L) . The values should lie between [-1, 1]. `add_text(tag, text_string, global_step=None, walltime=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/utils/tensorboard/writer.html#SummaryWriter.add_text) Add text data to summary. Parameters * **tag** (_string_) – Data identifier * **text_string** (_string_) – String to save * **global_step** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – Global step value to record * **walltime** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)")) – Optional override default walltime (time.time()) seconds after epoch of event Examples: writer.add_text('lstm', 'This is an lstm', 0) writer.add_text('rnn', 'This is an rnn', 10) `add_graph(model, input_to_model=None, verbose=False)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/utils/tensorboard/writer.html#SummaryWriter.add_graph) Add graph data to summary. Parameters * **model** ([torch.nn.Module](generated/torch.nn.module#torch.nn.Module "torch.nn.Module")) – Model to draw. 
* **input_to_model** ([torch.Tensor](tensors#torch.Tensor "torch.Tensor") _or_ _list of torch.Tensor_) – A variable or a tuple of variables to be fed. * **verbose** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – Whether to print graph structure in console. `add_embedding(mat, metadata=None, label_img=None, global_step=None, tag='default', metadata_header=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/utils/tensorboard/writer.html#SummaryWriter.add_embedding) Add embedding projector data to summary. Parameters * **mat** ([torch.Tensor](tensors#torch.Tensor "torch.Tensor") _or_ _numpy.array_) – A matrix in which each row is the feature vector of a data point * **metadata** ([list](https://docs.python.org/3/library/stdtypes.html#list "\(in Python v3.9\)")) – A list of labels; each element will be converted to a string * **label_img** ([torch.Tensor](tensors#torch.Tensor "torch.Tensor")) – Images corresponding to each data point * **global_step** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – Global step value to record * **tag** (_string_) – Name for the embedding Shape: mat: (N, D), where N is the number of data points and D is the feature dimension label_img: (N, C, H, W) Examples: import keyword import torch from torch.utils.tensorboard import SummaryWriter writer = SummaryWriter() meta = [] while len(meta)<100: meta = meta+keyword.kwlist # get some strings meta = meta[:100] for i, v in enumerate(meta): meta[i] = v+str(i) label_img = torch.rand(100, 3, 10, 32) for i in range(100): label_img[i]*=i/100.0 writer.add_embedding(torch.randn(100, 5), metadata=meta, label_img=label_img) writer.add_embedding(torch.randn(100, 5), label_img=label_img) writer.add_embedding(torch.randn(100, 5), metadata=meta) `add_pr_curve(tag, labels, predictions, global_step=None, num_thresholds=127, weights=None, walltime=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/utils/tensorboard/writer.html#SummaryWriter.add_pr_curve) Adds a precision-recall curve. Plotting a precision-recall curve lets you understand your model’s performance under different threshold settings. With this function, you provide the ground truth labeling (T/F) and prediction confidence (usually the output of your model) for each target. The TensorBoard UI will let you choose the threshold interactively. Parameters * **tag** (_string_) – Data identifier * **labels** ([torch.Tensor](tensors#torch.Tensor "torch.Tensor") _,__numpy.array_ _, or_ _string/blobname_) – Ground truth data. Binary label for each element. * **predictions** ([torch.Tensor](tensors#torch.Tensor "torch.Tensor") _,__numpy.array_ _, or_ _string/blobname_) – The probability that an element is classified as true. Value should be in [0, 1] * **global_step** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – Global step value to record * **num_thresholds** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – Number of thresholds used to draw the curve.
* **walltime** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)")) – Optional override default walltime (time.time()) seconds after epoch of event Examples: from torch.utils.tensorboard import SummaryWriter import numpy as np labels = np.random.randint(2, size=100) # binary label predictions = np.random.rand(100) writer = SummaryWriter() writer.add_pr_curve('pr_curve', labels, predictions, 0) writer.close() `add_custom_scalars(layout)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/utils/tensorboard/writer.html#SummaryWriter.add_custom_scalars) Creates a special chart by collecting chart tags in ‘scalars’. Note that this function can only be called once for each SummaryWriter() object. Because it only provides metadata to tensorboard, the function can be called before or after the training loop. Parameters **layout** ([dict](https://docs.python.org/3/library/stdtypes.html#dict "\(in Python v3.9\)")) – {categoryName: _charts_}, where _charts_ is also a dictionary {chartName: _ListOfProperties_}. The first element in _ListOfProperties_ is the chart’s type (one of **Multiline** or **Margin**) and the second element should be a list containing the tags you have used in the add_scalar function, which will be collected into the new chart. Examples: layout = {'Taiwan':{'twse':['Multiline',['twse/0050', 'twse/2330']]}, 'USA':{ 'dow':['Margin', ['dow/aaa', 'dow/bbb', 'dow/ccc']], 'nasdaq':['Margin', ['nasdaq/aaa', 'nasdaq/bbb', 'nasdaq/ccc']]}} writer.add_custom_scalars(layout) `add_mesh(tag, vertices, colors=None, faces=None, config_dict=None, global_step=None, walltime=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/utils/tensorboard/writer.html#SummaryWriter.add_mesh) Add meshes or 3D point clouds to TensorBoard. The visualization is based on Three.js, so it allows users to interact with the rendered object. Besides basic definitions such as vertices and faces, users can further provide camera parameters, lighting conditions, etc. Please check the Three.js documentation for advanced usage. Parameters * **tag** (_string_) – Data identifier * **vertices** ([torch.Tensor](tensors#torch.Tensor "torch.Tensor")) – List of the 3D coordinates of vertices. * **colors** ([torch.Tensor](tensors#torch.Tensor "torch.Tensor")) – Colors for each vertex * **faces** ([torch.Tensor](tensors#torch.Tensor "torch.Tensor")) – Indices of vertices within each triangle. (Optional) * **config_dict** – Dictionary with Three.js class names and configuration. * **global_step** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – Global step value to record * **walltime** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)")) – Optional override default walltime (time.time()) seconds after epoch of event Shape: vertices: (B, N, 3). (batch, number_of_vertices, channels) colors: (B, N, 3). The values should lie in [0, 255] for type `uint8` or [0, 1] for type `float`. faces: (B, N, 3). The values should lie in [0, number_of_vertices] for type `uint8`.
Examples: from torch.utils.tensorboard import SummaryWriter vertices_tensor = torch.as_tensor([ [1, 1, 1], [-1, -1, 1], [1, -1, -1], [-1, 1, -1], ], dtype=torch.float).unsqueeze(0) colors_tensor = torch.as_tensor([ [255, 0, 0], [0, 255, 0], [0, 0, 255], [255, 0, 255], ], dtype=torch.int).unsqueeze(0) faces_tensor = torch.as_tensor([ [0, 2, 3], [0, 3, 1], [0, 1, 2], [1, 3, 2], ], dtype=torch.int).unsqueeze(0) writer = SummaryWriter() writer.add_mesh('my_mesh', vertices=vertices_tensor, colors=colors_tensor, faces=faces_tensor) writer.close() `add_hparams(hparam_dict, metric_dict, hparam_domain_discrete=None, run_name=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/utils/tensorboard/writer.html#SummaryWriter.add_hparams) Add a set of hyperparameters to be compared in TensorBoard. Parameters * **hparam_dict** ([dict](https://docs.python.org/3/library/stdtypes.html#dict "\(in Python v3.9\)")) – Each key-value pair in the dictionary is the name of the hyper parameter and it’s corresponding value. The type of the value can be one of `bool`, `string`, `float`, `int`, or `None`. * **metric_dict** ([dict](https://docs.python.org/3/library/stdtypes.html#dict "\(in Python v3.9\)")) – Each key-value pair in the dictionary is the name of the metric and it’s corresponding value. Note that the key used here should be unique in the tensorboard record. Otherwise the value you added by `add_scalar` will be displayed in hparam plugin. In most cases, this is unwanted. * **hparam_domain_discrete** – (Optional[Dict[str, List[Any]]]) A dictionary that contains names of the hyperparameters and all discrete values they can hold * **run_name** ([str](https://docs.python.org/3/library/stdtypes.html#str "\(in Python v3.9\)")) – Name of the run, to be included as part of the logdir. If unspecified, will use current timestamp. Examples: from torch.utils.tensorboard import SummaryWriter with SummaryWriter() as w: for i in range(5): w.add_hparams({'lr': 0.1*i, 'bsize': i}, {'hparam/accuracy': 10*i, 'hparam/loss': 10*i}) Expected result: [](_images/add_hparam.png) `flush()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/utils/tensorboard/writer.html#SummaryWriter.flush) Flushes the event file to disk. Call this method to make sure that all pending events have been written to disk. `close()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/utils/tensorboard/writer.html#SummaryWriter.close) # torch.Tensor A `torch.Tensor` is a multi-dimensional matrix containing elements of a single data type. 
Torch defines 10 tensor types with CPU and GPU variants which are as follows: Data type | dtype | CPU tensor | GPU tensor ---|---|---|--- 32-bit floating point | `torch.float32` or `torch.float` | `torch.FloatTensor` | `torch.cuda.FloatTensor` 64-bit floating point | `torch.float64` or `torch.double` | `torch.DoubleTensor` | `torch.cuda.DoubleTensor` 16-bit floating point 1 | `torch.float16` or `torch.half` | `torch.HalfTensor` | `torch.cuda.HalfTensor` 16-bit floating point 2 | `torch.bfloat16` | `torch.BFloat16Tensor` | `torch.cuda.BFloat16Tensor` 32-bit complex | `torch.complex32` | | 64-bit complex | `torch.complex64` | | 128-bit complex | `torch.complex128` or `torch.cdouble` | | 8-bit integer (unsigned) | `torch.uint8` | `torch.ByteTensor` | `torch.cuda.ByteTensor` 8-bit integer (signed) | `torch.int8` | `torch.CharTensor` | `torch.cuda.CharTensor` 16-bit integer (signed) | `torch.int16` or `torch.short` | `torch.ShortTensor` | `torch.cuda.ShortTensor` 32-bit integer (signed) | `torch.int32` or `torch.int` | `torch.IntTensor` | `torch.cuda.IntTensor` 64-bit integer (signed) | `torch.int64` or `torch.long` | `torch.LongTensor` | `torch.cuda.LongTensor` Boolean | `torch.bool` | `torch.BoolTensor` | `torch.cuda.BoolTensor` `1` Sometimes referred to as binary16: uses 1 sign, 5 exponent, and 10 significand bits. Useful when precision is important at the expense of range. `2` Sometimes referred to as Brain Floating Point: uses 1 sign, 8 exponent, and 7 significand bits. Useful when range is important, since it has the same number of exponent bits as `float32` `torch.Tensor` is an alias for the default tensor type (`torch.FloatTensor`). A tensor can be constructed from a Python [`list`](https://docs.python.org/3/library/stdtypes.html#list "\(in Python v3.9\)") or sequence using the [`torch.tensor()`](generated/torch.tensor#torch.tensor "torch.tensor") constructor: >>> torch.tensor([[1., -1.], [1., -1.]]) tensor([[ 1.0000, -1.0000], [ 1.0000, -1.0000]]) >>> torch.tensor(np.array([[1, 2, 3], [4, 5, 6]])) tensor([[ 1, 2, 3], [ 4, 5, 6]]) Warning [`torch.tensor()`](generated/torch.tensor#torch.tensor "torch.tensor") always copies `data`. If you have a Tensor `data` and just want to change its `requires_grad` flag, use `requires_grad_()` or [`detach()`](autograd#torch.Tensor.detach "torch.Tensor.detach") to avoid a copy. If you have a numpy array and want to avoid a copy, use [`torch.as_tensor()`](generated/torch.as_tensor#torch.as_tensor "torch.as_tensor"). 
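For illustration, a minimal sketch (values chosen arbitrarily) of the copy-avoiding alternatives mentioned in the warning above:

import numpy as np
import torch

a = np.array([1.0, 2.0, 3.0])
t = torch.as_tensor(a)      # shares memory with `a`, no copy (same dtype, CPU)
a[0] = 10.0                 # the change is visible through `t` as well

x = torch.randn(3)
x.requires_grad_()          # flips the flag in place, no copy
y = x.detach()              # new Tensor sharing the same storage, without autograd history
z = torch.tensor(x)         # copies the data (and warns); prefer x.clone().detach()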
A tensor of specific data type can be constructed by passing a [`torch.dtype`](tensor_attributes#torch.torch.dtype "torch.torch.dtype") and/or a [`torch.device`](tensor_attributes#torch.torch.device "torch.torch.device") to a constructor or tensor creation op: >>> torch.zeros([2, 4], dtype=torch.int32) tensor([[ 0, 0, 0, 0], [ 0, 0, 0, 0]], dtype=torch.int32) >>> cuda0 = torch.device('cuda:0') >>> torch.ones([2, 4], dtype=torch.float64, device=cuda0) tensor([[ 1.0000, 1.0000, 1.0000, 1.0000], [ 1.0000, 1.0000, 1.0000, 1.0000]], dtype=torch.float64, device='cuda:0') The contents of a tensor can be accessed and modified using Python’s indexing and slicing notation: >>> x = torch.tensor([[1, 2, 3], [4, 5, 6]]) >>> print(x[1][2]) tensor(6) >>> x[0][1] = 8 >>> print(x) tensor([[ 1, 8, 3], [ 4, 5, 6]]) Use `torch.Tensor.item()` to get a Python number from a tensor containing a single value: >>> x = torch.tensor([[1]]) >>> x tensor([[ 1]]) >>> x.item() 1 >>> x = torch.tensor(2.5) >>> x tensor(2.5000) >>> x.item() 2.5 A tensor can be created with `requires_grad=True` so that [`torch.autograd`](autograd#module-torch.autograd "torch.autograd") records operations on them for automatic differentiation. >>> x = torch.tensor([[1., -1.], [1., 1.]], requires_grad=True) >>> out = x.pow(2).sum() >>> out.backward() >>> x.grad tensor([[ 2.0000, -2.0000], [ 2.0000, 2.0000]]) Each tensor has an associated `torch.Storage`, which holds its data. The tensor class also provides multi-dimensional, [strided](https://en.wikipedia.org/wiki/Stride_of_an_array) view of a storage and defines numeric operations on it. Note For more information on tensor views, see [Tensor Views](tensor_view#tensor- view-doc). Note For more information on the [`torch.dtype`](tensor_attributes#torch.torch.dtype "torch.torch.dtype"), [`torch.device`](tensor_attributes#torch.torch.device "torch.torch.device"), and [`torch.layout`](tensor_attributes#torch.torch.layout "torch.torch.layout") attributes of a `torch.Tensor`, see [Tensor Attributes](tensor_attributes#tensor-attributes-doc). Note Methods which mutate a tensor are marked with an underscore suffix. For example, `torch.FloatTensor.abs_()` computes the absolute value in-place and returns the modified tensor, while `torch.FloatTensor.abs()` computes the result in a new tensor. Note To change an existing tensor’s [`torch.device`](tensor_attributes#torch.torch.device "torch.torch.device") and/or [`torch.dtype`](tensor_attributes#torch.torch.dtype "torch.torch.dtype"), consider using `to()` method on the tensor. Warning Current implementation of `torch.Tensor` introduces memory overhead, thus it might lead to unexpectedly high memory usage in the applications with many tiny tensors. If this is your case, consider using one large structure. `class torch.Tensor` There are a few main ways to create a tensor, depending on your use case. * To create a tensor with pre-existing data, use [`torch.tensor()`](generated/torch.tensor#torch.tensor "torch.tensor"). * To create a tensor with specific size, use `torch.*` tensor creation ops (see [Creation Ops](torch#tensor-creation-ops)). * To create a tensor with the same size (and similar types) as another tensor, use `torch.*_like` tensor creation ops (see [Creation Ops](torch#tensor-creation-ops)). * To create a tensor with similar type but different size as another tensor, use `tensor.new_*` creation ops. `new_tensor(data, dtype=None, device=None, requires_grad=False) → Tensor` Returns a new Tensor with `data` as the tensor data. 
By default, the returned Tensor has the same [`torch.dtype`](tensor_attributes#torch.torch.dtype "torch.torch.dtype") and [`torch.device`](tensor_attributes#torch.torch.device "torch.torch.device") as this tensor. Warning `new_tensor()` always copies `data`. If you have a Tensor `data` and want to avoid a copy, use `torch.Tensor.requires_grad_()` or [`torch.Tensor.detach()`](autograd#torch.Tensor.detach "torch.Tensor.detach"). If you have a numpy array and want to avoid a copy, use [`torch.from_numpy()`](generated/torch.from_numpy#torch.from_numpy "torch.from_numpy"). Warning When data is a tensor `x`, `new_tensor()` reads out ‘the data’ from whatever it is passed, and constructs a leaf variable. Therefore `tensor.new_tensor(x)` is equivalent to `x.clone().detach()` and `tensor.new_tensor(x, requires_grad=True)` is equivalent to `x.clone().detach().requires_grad_(True)`. The equivalents using `clone()` and `detach()` are recommended. Parameters * **data** (_array_like_) – The returned Tensor copies `data`. * **dtype** ([`torch.dtype`](tensor_attributes#torch.torch.dtype "torch.torch.dtype"), optional) – the desired type of returned tensor. Default: if None, same [`torch.dtype`](tensor_attributes#torch.torch.dtype "torch.torch.dtype") as this tensor. * **device** ([`torch.device`](tensor_attributes#torch.torch.device "torch.torch.device"), optional) – the desired device of returned tensor. Default: if None, same [`torch.device`](tensor_attributes#torch.torch.device "torch.torch.device") as this tensor. * **requires_grad** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – If autograd should record operations on the returned tensor. Default: `False`. Example: >>> tensor = torch.ones((2,), dtype=torch.int8) >>> data = [[0, 1], [2, 3]] >>> tensor.new_tensor(data) tensor([[ 0, 1], [ 2, 3]], dtype=torch.int8) `new_full(size, fill_value, dtype=None, device=None, requires_grad=False) → Tensor` Returns a Tensor of size `size` filled with `fill_value`. By default, the returned Tensor has the same [`torch.dtype`](tensor_attributes#torch.torch.dtype "torch.torch.dtype") and [`torch.device`](tensor_attributes#torch.torch.device "torch.torch.device") as this tensor. Parameters * **fill_value** (_scalar_) – the number to fill the output tensor with. * **dtype** ([`torch.dtype`](tensor_attributes#torch.torch.dtype "torch.torch.dtype"), optional) – the desired type of returned tensor. Default: if None, same [`torch.dtype`](tensor_attributes#torch.torch.dtype "torch.torch.dtype") as this tensor. * **device** ([`torch.device`](tensor_attributes#torch.torch.device "torch.torch.device"), optional) – the desired device of returned tensor. Default: if None, same [`torch.device`](tensor_attributes#torch.torch.device "torch.torch.device") as this tensor. * **requires_grad** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – If autograd should record operations on the returned tensor. Default: `False`. Example: >>> tensor = torch.ones((2,), dtype=torch.float64) >>> tensor.new_full((3, 4), 3.141592) tensor([[ 3.1416, 3.1416, 3.1416, 3.1416], [ 3.1416, 3.1416, 3.1416, 3.1416], [ 3.1416, 3.1416, 3.1416, 3.1416]], dtype=torch.float64) `new_empty(size, dtype=None, device=None, requires_grad=False) → Tensor` Returns a Tensor of size `size` filled with uninitialized data. 
By default, the returned Tensor has the same [`torch.dtype`](tensor_attributes#torch.torch.dtype "torch.torch.dtype") and [`torch.device`](tensor_attributes#torch.torch.device "torch.torch.device") as this tensor. Parameters * **dtype** ([`torch.dtype`](tensor_attributes#torch.torch.dtype "torch.torch.dtype"), optional) – the desired type of returned tensor. Default: if None, same [`torch.dtype`](tensor_attributes#torch.torch.dtype "torch.torch.dtype") as this tensor. * **device** ([`torch.device`](tensor_attributes#torch.torch.device "torch.torch.device"), optional) – the desired device of returned tensor. Default: if None, same [`torch.device`](tensor_attributes#torch.torch.device "torch.torch.device") as this tensor. * **requires_grad** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – If autograd should record operations on the returned tensor. Default: `False`. Example: >>> tensor = torch.ones(()) >>> tensor.new_empty((2, 3)) tensor([[ 5.8182e-18, 4.5765e-41, -1.0545e+30], [ 3.0949e-41, 4.4842e-44, 0.0000e+00]]) `new_ones(size, dtype=None, device=None, requires_grad=False) → Tensor` Returns a Tensor of size `size` filled with `1`. By default, the returned Tensor has the same [`torch.dtype`](tensor_attributes#torch.torch.dtype "torch.torch.dtype") and [`torch.device`](tensor_attributes#torch.torch.device "torch.torch.device") as this tensor. Parameters * **size** (_int..._) – a list, tuple, or `torch.Size` of integers defining the shape of the output tensor. * **dtype** ([`torch.dtype`](tensor_attributes#torch.torch.dtype "torch.torch.dtype"), optional) – the desired type of returned tensor. Default: if None, same [`torch.dtype`](tensor_attributes#torch.torch.dtype "torch.torch.dtype") as this tensor. * **device** ([`torch.device`](tensor_attributes#torch.torch.device "torch.torch.device"), optional) – the desired device of returned tensor. Default: if None, same [`torch.device`](tensor_attributes#torch.torch.device "torch.torch.device") as this tensor. * **requires_grad** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – If autograd should record operations on the returned tensor. Default: `False`. Example: >>> tensor = torch.tensor((), dtype=torch.int32) >>> tensor.new_ones((2, 3)) tensor([[ 1, 1, 1], [ 1, 1, 1]], dtype=torch.int32) `new_zeros(size, dtype=None, device=None, requires_grad=False) → Tensor` Returns a Tensor of size `size` filled with `0`. By default, the returned Tensor has the same [`torch.dtype`](tensor_attributes#torch.torch.dtype "torch.torch.dtype") and [`torch.device`](tensor_attributes#torch.torch.device "torch.torch.device") as this tensor. Parameters * **size** (_int..._) – a list, tuple, or `torch.Size` of integers defining the shape of the output tensor. * **dtype** ([`torch.dtype`](tensor_attributes#torch.torch.dtype "torch.torch.dtype"), optional) – the desired type of returned tensor. Default: if None, same [`torch.dtype`](tensor_attributes#torch.torch.dtype "torch.torch.dtype") as this tensor. * **device** ([`torch.device`](tensor_attributes#torch.torch.device "torch.torch.device"), optional) – the desired device of returned tensor. Default: if None, same [`torch.device`](tensor_attributes#torch.torch.device "torch.torch.device") as this tensor. * **requires_grad** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – If autograd should record operations on the returned tensor. Default: `False`. 
Example: >>> tensor = torch.tensor((), dtype=torch.float64) >>> tensor.new_zeros((2, 3)) tensor([[ 0., 0., 0.], [ 0., 0., 0.]], dtype=torch.float64) `is_cuda` Is `True` if the Tensor is stored on the GPU, `False` otherwise. `is_quantized` Is `True` if the Tensor is quantized, `False` otherwise. `is_meta` Is `True` if the Tensor is a meta tensor, `False` otherwise. Meta tensors are like normal tensors, but they carry no data. `device` Is the [`torch.device`](tensor_attributes#torch.torch.device "torch.torch.device") where this Tensor is. `grad` This attribute is `None` by default and becomes a Tensor the first time a call to [`backward()`](autograd#torch.Tensor.backward "torch.Tensor.backward") computes gradients for `self`. The attribute will then contain the gradients computed and future calls to [`backward()`](autograd#torch.Tensor.backward "torch.Tensor.backward") will accumulate (add) gradients into it. `ndim` Alias for `dim()` `T` Is this Tensor with its dimensions reversed. If `n` is the number of dimensions in `x`, `x.T` is equivalent to `x.permute(n-1, n-2, ..., 0)`. `real` Returns a new tensor containing real values of the `self` tensor. The returned tensor and `self` share the same underlying storage. Warning [`real()`](generated/torch.real#torch.real "torch.real") is only supported for tensors with complex dtypes. Example:: >>> x=torch.randn(4, dtype=torch.cfloat) >>> x tensor([(0.3100+0.3553j), (-0.5445-0.7896j), (-1.6492-0.0633j), (-0.0638-0.8119j)]) >>> x.real tensor([ 0.3100, -0.5445, -1.6492, -0.0638]) `imag` Returns a new tensor containing imaginary values of the `self` tensor. The returned tensor and `self` share the same underlying storage. Warning [`imag()`](generated/torch.imag#torch.imag "torch.imag") is only supported for tensors with complex dtypes. Example:: >>> x=torch.randn(4, dtype=torch.cfloat) >>> x tensor([(0.3100+0.3553j), (-0.5445-0.7896j), (-1.6492-0.0633j), (-0.0638-0.8119j)]) >>> x.imag tensor([ 0.3553, -0.7896, -0.0633, -0.8119]) `abs() → Tensor` See [`torch.abs()`](generated/torch.abs#torch.abs "torch.abs") `abs_() → Tensor` In-place version of `abs()` `absolute() → Tensor` Alias for [`abs()`](generated/torch.abs#torch.abs "torch.abs") `absolute_() → Tensor` In-place version of `absolute()` Alias for `abs_()` `acos() → Tensor` See [`torch.acos()`](generated/torch.acos#torch.acos "torch.acos") `acos_() → Tensor` In-place version of `acos()` `arccos() → Tensor` See [`torch.arccos()`](generated/torch.arccos#torch.arccos "torch.arccos") `arccos_() → Tensor` In-place version of `arccos()` `add(other, *, alpha=1) → Tensor` Add a scalar or tensor to `self` tensor. If both `alpha` and `other` are specified, each element of `other` is scaled by `alpha` before being used. 
When `other` is a tensor, the shape of `other` must be [broadcastable](https://pytorch.org/docs/1.8.0/notes/broadcasting.html#broadcasting- semantics) with the shape of the underlying tensor See [`torch.add()`](generated/torch.add#torch.add "torch.add") `add_(other, *, alpha=1) → Tensor` In-place version of `add()` `addbmm(batch1, batch2, *, beta=1, alpha=1) → Tensor` See [`torch.addbmm()`](generated/torch.addbmm#torch.addbmm "torch.addbmm") `addbmm_(batch1, batch2, *, beta=1, alpha=1) → Tensor` In-place version of `addbmm()` `addcdiv(tensor1, tensor2, *, value=1) → Tensor` See [`torch.addcdiv()`](generated/torch.addcdiv#torch.addcdiv "torch.addcdiv") `addcdiv_(tensor1, tensor2, *, value=1) → Tensor` In-place version of `addcdiv()` `addcmul(tensor1, tensor2, *, value=1) → Tensor` See [`torch.addcmul()`](generated/torch.addcmul#torch.addcmul "torch.addcmul") `addcmul_(tensor1, tensor2, *, value=1) → Tensor` In-place version of `addcmul()` `addmm(mat1, mat2, *, beta=1, alpha=1) → Tensor` See [`torch.addmm()`](generated/torch.addmm#torch.addmm "torch.addmm") `addmm_(mat1, mat2, *, beta=1, alpha=1) → Tensor` In-place version of `addmm()` `sspaddmm(mat1, mat2, *, beta=1, alpha=1) → Tensor` See [`torch.sspaddmm()`](sparse#torch.sspaddmm "torch.sspaddmm") `addmv(mat, vec, *, beta=1, alpha=1) → Tensor` See [`torch.addmv()`](generated/torch.addmv#torch.addmv "torch.addmv") `addmv_(mat, vec, *, beta=1, alpha=1) → Tensor` In-place version of `addmv()` `addr(vec1, vec2, *, beta=1, alpha=1) → Tensor` See [`torch.addr()`](generated/torch.addr#torch.addr "torch.addr") `addr_(vec1, vec2, *, beta=1, alpha=1) → Tensor` In-place version of `addr()` `allclose(other, rtol=1e-05, atol=1e-08, equal_nan=False) → Tensor` See [`torch.allclose()`](generated/torch.allclose#torch.allclose "torch.allclose") `amax(dim=None, keepdim=False) → Tensor` See [`torch.amax()`](generated/torch.amax#torch.amax "torch.amax") `amin(dim=None, keepdim=False) → Tensor` See [`torch.amin()`](generated/torch.amin#torch.amin "torch.amin") `angle() → Tensor` See [`torch.angle()`](generated/torch.angle#torch.angle "torch.angle") `apply_(callable) → Tensor` Applies the function `callable` to each element in the tensor, replacing each element with the value returned by `callable`. Note This function only works with CPU tensors and should not be used in code sections that require high performance. 
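For illustration, a minimal sketch (values chosen arbitrarily) of `apply_` on a small CPU tensor; as the note above says, it is meant for convenience rather than performance:

import torch

t = torch.tensor([1.0, 2.0, 3.0])   # CPU tensor
t.apply_(lambda v: v * v + 1.0)     # the callable receives and returns Python numbers
# t is now tensor([ 2.,  5., 10.]), modified in place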
`argmax(dim=None, keepdim=False) → LongTensor` See [`torch.argmax()`](generated/torch.argmax#torch.argmax "torch.argmax") `argmin(dim=None, keepdim=False) → LongTensor` See [`torch.argmin()`](generated/torch.argmin#torch.argmin "torch.argmin") `argsort(dim=-1, descending=False) → LongTensor` See [`torch.argsort()`](generated/torch.argsort#torch.argsort "torch.argsort") `asin() → Tensor` See [`torch.asin()`](generated/torch.asin#torch.asin "torch.asin") `asin_() → Tensor` In-place version of `asin()` `arcsin() → Tensor` See [`torch.arcsin()`](generated/torch.arcsin#torch.arcsin "torch.arcsin") `arcsin_() → Tensor` In-place version of `arcsin()` `as_strided(size, stride, storage_offset=0) → Tensor` See [`torch.as_strided()`](generated/torch.as_strided#torch.as_strided "torch.as_strided") `atan() → Tensor` See [`torch.atan()`](generated/torch.atan#torch.atan "torch.atan") `atan_() → Tensor` In-place version of `atan()` `arctan() → Tensor` See [`torch.arctan()`](generated/torch.arctan#torch.arctan "torch.arctan") `arctan_() → Tensor` In-place version of `arctan()` `atan2(other) → Tensor` See [`torch.atan2()`](generated/torch.atan2#torch.atan2 "torch.atan2") `atan2_(other) → Tensor` In-place version of `atan2()` `all(dim=None, keepdim=False) → Tensor` See [`torch.all()`](generated/torch.all#torch.all "torch.all") `any(dim=None, keepdim=False) → Tensor` See [`torch.any()`](generated/torch.any#torch.any "torch.any") `backward(gradient=None, retain_graph=None, create_graph=False, inputs=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/tensor.html#Tensor.backward) Computes the gradient of current tensor w.r.t. graph leaves. The graph is differentiated using the chain rule. If the tensor is non-scalar (i.e. its data has more than one element) and requires gradient, the function additionally requires specifying `gradient`. It should be a tensor of matching type and location, that contains the gradient of the differentiated function w.r.t. `self`. This function accumulates gradients in the leaves - you might need to zero `.grad` attributes or set them to `None` before calling it. See [Default gradient layouts](autograd#default-grad-layouts) for details on the memory layout of accumulated gradients. Note If you run any forward ops, create `gradient`, and/or call `backward` in a user-specified CUDA stream context, see [Stream semantics of backward passes](https://pytorch.org/docs/1.8.0/notes/cuda.html#bwd-cuda-stream- semantics). Parameters * **gradient** (Tensor _or_[None](https://docs.python.org/3/library/constants.html#None "\(in Python v3.9\)")) – Gradient w.r.t. the tensor. If it is a tensor, it will be automatically converted to a Tensor that does not require grad unless `create_graph` is True. None values can be specified for scalar Tensors or ones that don’t require grad. If a None value would be acceptable then this argument is optional. * **retain_graph** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – If `False`, the graph used to compute the grads will be freed. Note that in nearly all cases setting this option to True is not needed and often can be worked around in a much more efficient way. Defaults to the value of `create_graph`. * **create_graph** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – If `True`, graph of the derivative will be constructed, allowing to compute higher order derivative products. Defaults to `False`. 
* **inputs** (_sequence of Tensor_) – Inputs w.r.t. which the gradient will be accumulated into `.grad`. All other Tensors will be ignored. If not provided, the gradient is accumulated into all the leaf Tensors that were used to compute the current tensor. All the provided inputs must be leaf Tensors. `baddbmm(batch1, batch2, *, beta=1, alpha=1) → Tensor` See [`torch.baddbmm()`](generated/torch.baddbmm#torch.baddbmm "torch.baddbmm") `baddbmm_(batch1, batch2, *, beta=1, alpha=1) → Tensor` In-place version of `baddbmm()` `bernoulli(*, generator=None) → Tensor` Returns a result tensor where each `result[i]` is independently sampled from \text{Bernoulli}(\texttt{self[i]}). `self` must have floating point `dtype`, and the result will have the same `dtype`. See [`torch.bernoulli()`](generated/torch.bernoulli#torch.bernoulli "torch.bernoulli") `bernoulli_()` `bernoulli_(p=0.5, *, generator=None) → Tensor` Fills each location of `self` with an independent sample from \text{Bernoulli}(\texttt{p}). `self` can have integral `dtype`. `bernoulli_(p_tensor, *, generator=None) → Tensor` `p_tensor` should be a tensor containing probabilities to be used for drawing the binary random number. The i-th element of `self` tensor will be set to a value sampled from \text{Bernoulli}(\texttt{p\_tensor[i]}). `self` can have integral `dtype`, but `p_tensor` must have floating point `dtype`. See also `bernoulli()` and [`torch.bernoulli()`](generated/torch.bernoulli#torch.bernoulli "torch.bernoulli") `bfloat16(memory_format=torch.preserve_format) → Tensor` `self.bfloat16()` is equivalent to `self.to(torch.bfloat16)`. See `to()`. Parameters **memory_format** ([`torch.memory_format`](tensor_attributes#torch.torch.memory_format "torch.torch.memory_format"), optional) – the desired memory format of returned Tensor. Default: `torch.preserve_format`. `bincount(weights=None, minlength=0) → Tensor` See [`torch.bincount()`](generated/torch.bincount#torch.bincount "torch.bincount") `bitwise_not() → Tensor` See [`torch.bitwise_not()`](generated/torch.bitwise_not#torch.bitwise_not "torch.bitwise_not") `bitwise_not_() → Tensor` In-place version of `bitwise_not()` `bitwise_and() → Tensor` See [`torch.bitwise_and()`](generated/torch.bitwise_and#torch.bitwise_and "torch.bitwise_and") `bitwise_and_() → Tensor` In-place version of `bitwise_and()` `bitwise_or() → Tensor` See [`torch.bitwise_or()`](generated/torch.bitwise_or#torch.bitwise_or "torch.bitwise_or") `bitwise_or_() → Tensor` In-place version of `bitwise_or()` `bitwise_xor() → Tensor` See [`torch.bitwise_xor()`](generated/torch.bitwise_xor#torch.bitwise_xor "torch.bitwise_xor") `bitwise_xor_() → Tensor` In-place version of `bitwise_xor()` `bmm(batch2) → Tensor` See [`torch.bmm()`](generated/torch.bmm#torch.bmm "torch.bmm") `bool(memory_format=torch.preserve_format) → Tensor` `self.bool()` is equivalent to `self.to(torch.bool)`. See `to()`. Parameters **memory_format** ([`torch.memory_format`](tensor_attributes#torch.torch.memory_format "torch.torch.memory_format"), optional) – the desired memory format of returned Tensor. Default: `torch.preserve_format`. `byte(memory_format=torch.preserve_format) → Tensor` `self.byte()` is equivalent to `self.to(torch.uint8)`. See `to()`. Parameters **memory_format** ([`torch.memory_format`](tensor_attributes#torch.torch.memory_format "torch.torch.memory_format"), optional) – the desired memory format of returned Tensor. Default: `torch.preserve_format`.
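A quick, hedged illustration of the `to()`-style conversion helpers above (`bool()`, `byte()`); the input values are arbitrary:

>>> import torch
>>> t = torch.tensor([0.0, 1.7, 2.0])
>>> t.bool()                 # same as t.to(torch.bool)
tensor([False,  True,  True])
>>> t.byte()                 # same as t.to(torch.uint8); fractional parts are truncated
tensor([0, 1, 2], dtype=torch.uint8)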
`broadcast_to(shape) → Tensor` See [`torch.broadcast_to()`](generated/torch.broadcast_to#torch.broadcast_to "torch.broadcast_to"). `cauchy_(median=0, sigma=1, *, generator=None) → Tensor` Fills the tensor with numbers drawn from the Cauchy distribution: f(x) = \dfrac{1}{\pi} \dfrac{\sigma}{(x - \text{median})^2 + \sigma^2} `ceil() → Tensor` See [`torch.ceil()`](generated/torch.ceil#torch.ceil "torch.ceil") `ceil_() → Tensor` In-place version of `ceil()` `char(memory_format=torch.preserve_format) → Tensor` `self.char()` is equivalent to `self.to(torch.int8)`. See `to()`. Parameters **memory_format** ([`torch.memory_format`](tensor_attributes#torch.torch.memory_format "torch.torch.memory_format"), optional) – the desired memory format of returned Tensor. Default: `torch.preserve_format`. `cholesky(upper=False) → Tensor` See [`torch.cholesky()`](generated/torch.cholesky#torch.cholesky "torch.cholesky") `cholesky_inverse(upper=False) → Tensor` See [`torch.cholesky_inverse()`](generated/torch.cholesky_inverse#torch.cholesky_inverse "torch.cholesky_inverse") `cholesky_solve(input2, upper=False) → Tensor` See [`torch.cholesky_solve()`](generated/torch.cholesky_solve#torch.cholesky_solve "torch.cholesky_solve") `chunk(chunks, dim=0) → List of Tensors` See [`torch.chunk()`](generated/torch.chunk#torch.chunk "torch.chunk") `clamp(min, max) → Tensor` See [`torch.clamp()`](generated/torch.clamp#torch.clamp "torch.clamp") `clamp_(min, max) → Tensor` In-place version of `clamp()` `clip(min, max) → Tensor` Alias for `clamp()`. `clip_(min, max) → Tensor` Alias for `clamp_()`. `clone(*, memory_format=torch.preserve_format) → Tensor` See [`torch.clone()`](generated/torch.clone#torch.clone "torch.clone") `contiguous(memory_format=torch.contiguous_format) → Tensor` Returns a contiguous in memory tensor containing the same data as `self` tensor. If `self` tensor is already in the specified memory format, this function returns the `self` tensor. Parameters **memory_format** ([`torch.memory_format`](tensor_attributes#torch.torch.memory_format "torch.torch.memory_format"), optional) – the desired memory format of returned Tensor. Default: `torch.contiguous_format`. `copy_(src, non_blocking=False) → Tensor` Copies the elements from `src` into `self` tensor and returns `self`. The `src` tensor must be [broadcastable](https://pytorch.org/docs/1.8.0/notes/broadcasting.html#broadcasting-semantics) with the `self` tensor. It may be of a different data type or reside on a different device. Parameters * **src** (Tensor) – the source tensor to copy from * **non_blocking** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – if `True` and this copy is between CPU and GPU, the copy may occur asynchronously with respect to the host. For other cases, this argument has no effect.
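A minimal sketch of `copy_()` (arbitrary values), showing the broadcasting and dtype conversion described above:

>>> import torch
>>> dst = torch.zeros(2, 3)
>>> src = torch.tensor([1., 2., 3.])          # broadcastable to dst's shape
>>> dst.copy_(src)                            # returns dst
tensor([[1., 2., 3.],
        [1., 2., 3.]])
>>> dst.copy_(torch.arange(6).reshape(2, 3))  # int64 source is converted to dst's float dtype
tensor([[0., 1., 2.],
        [3., 4., 5.]])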
`conj() → Tensor` See [`torch.conj()`](generated/torch.conj#torch.conj "torch.conj") `copysign(other) → Tensor` See [`torch.copysign()`](generated/torch.copysign#torch.copysign "torch.copysign") `copysign_(other) → Tensor` In-place version of `copysign()` `cos() → Tensor` See [`torch.cos()`](generated/torch.cos#torch.cos "torch.cos") `cos_() → Tensor` In-place version of `cos()` `cosh() → Tensor` See [`torch.cosh()`](generated/torch.cosh#torch.cosh "torch.cosh") `cosh_() → Tensor` In-place version of `cosh()` `count_nonzero(dim=None) → Tensor` See [`torch.count_nonzero()`](generated/torch.count_nonzero#torch.count_nonzero "torch.count_nonzero") `acosh() → Tensor` See [`torch.acosh()`](generated/torch.acosh#torch.acosh "torch.acosh") `acosh_() → Tensor` In-place version of `acosh()` `arccosh()` acosh() -> Tensor See [`torch.arccosh()`](generated/torch.arccosh#torch.arccosh "torch.arccosh") `arccosh_()` acosh_() -> Tensor In-place version of `arccosh()` `cpu(memory_format=torch.preserve_format) → Tensor` Returns a copy of this object in CPU memory. If this object is already in CPU memory and on the correct device, then no copy is performed and the original object is returned. Parameters **memory_format** ([`torch.memory_format`](tensor_attributes#torch.torch.memory_format "torch.torch.memory_format"), optional) – the desired memory format of returned Tensor. Default: `torch.preserve_format`. `cross(other, dim=-1) → Tensor` See [`torch.cross()`](generated/torch.cross#torch.cross "torch.cross") `cuda(device=None, non_blocking=False, memory_format=torch.preserve_format) → Tensor` Returns a copy of this object in CUDA memory. If this object is already in CUDA memory and on the correct device, then no copy is performed and the original object is returned. Parameters * **device** ([`torch.device`](tensor_attributes#torch.torch.device "torch.torch.device")) – The destination GPU device. Defaults to the current CUDA device. * **non_blocking** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – If `True` and the source is in pinned memory, the copy will be asynchronous with respect to the host. Otherwise, the argument has no effect. Default: `False`. * **memory_format** ([`torch.memory_format`](tensor_attributes#torch.torch.memory_format "torch.torch.memory_format"), optional) – the desired memory format of returned Tensor. Default: `torch.preserve_format`. `logcumsumexp(dim) → Tensor` See [`torch.logcumsumexp()`](generated/torch.logcumsumexp#torch.logcumsumexp "torch.logcumsumexp") `cummax(dim) -> (Tensor, Tensor)` See [`torch.cummax()`](generated/torch.cummax#torch.cummax "torch.cummax") `cummin(dim) -> (Tensor, Tensor)` See [`torch.cummin()`](generated/torch.cummin#torch.cummin "torch.cummin") `cumprod(dim, dtype=None) → Tensor` See [`torch.cumprod()`](generated/torch.cumprod#torch.cumprod "torch.cumprod") `cumprod_(dim, dtype=None) → Tensor` In-place version of `cumprod()` `cumsum(dim, dtype=None) → Tensor` See [`torch.cumsum()`](generated/torch.cumsum#torch.cumsum "torch.cumsum") `cumsum_(dim, dtype=None) → Tensor` In-place version of `cumsum()` `data_ptr() → int` Returns the address of the first element of `self` tensor. `deg2rad() → Tensor` See [`torch.deg2rad()`](generated/torch.deg2rad#torch.deg2rad "torch.deg2rad") `dequantize() → Tensor` Given a quantized Tensor, dequantize it and return the dequantized float Tensor. 
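Since `cummax()`/`cummin()` above return a `(values, indices)` pair, here is a minimal sketch with arbitrary values:

>>> import torch
>>> t = torch.tensor([1., 3., 2., 5., 4.])
>>> values, indices = t.cummax(dim=0)     # running maximum and where it was reached
>>> values
tensor([1., 3., 3., 5., 5.])
>>> indices
tensor([0, 1, 1, 3, 3])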
`det() → Tensor` See [`torch.det()`](generated/torch.det#torch.det "torch.det") `dense_dim() → int` Return the number of dense dimensions in a [sparse tensor](sparse#sparse-docs) `self`. Warning Throws an error if `self` is not a sparse tensor. See also [`Tensor.sparse_dim()`](sparse#torch.Tensor.sparse_dim "torch.Tensor.sparse_dim") and [hybrid tensors](sparse#sparse-hybrid-coo- docs). `detach()` Returns a new Tensor, detached from the current graph. The result will never require gradient. Note Returned Tensor shares the same storage with the original one. In-place modifications on either of them will be seen, and may trigger errors in correctness checks. IMPORTANT NOTE: Previously, in-place size / stride / storage changes (such as `resize_` / `resize_as_` / `set_` / `transpose_`) to the returned tensor also update the original tensor. Now, these in-place changes will not update the original tensor anymore, and will instead trigger an error. For sparse tensors: In-place indices / values changes (such as `zero_` / `copy_` / `add_`) to the returned tensor will not update the original tensor anymore, and will instead trigger an error. `detach_()` Detaches the Tensor from the graph that created it, making it a leaf. Views cannot be detached in-place. `diag(diagonal=0) → Tensor` See [`torch.diag()`](generated/torch.diag#torch.diag "torch.diag") `diag_embed(offset=0, dim1=-2, dim2=-1) → Tensor` See [`torch.diag_embed()`](generated/torch.diag_embed#torch.diag_embed "torch.diag_embed") `diagflat(offset=0) → Tensor` See [`torch.diagflat()`](generated/torch.diagflat#torch.diagflat "torch.diagflat") `diagonal(offset=0, dim1=0, dim2=1) → Tensor` See [`torch.diagonal()`](generated/torch.diagonal#torch.diagonal "torch.diagonal") `fill_diagonal_(fill_value, wrap=False) → Tensor` Fill the main diagonal of a tensor that has at least 2-dimensions. When dims>2, all dimensions of input must be of equal length. This function modifies the input tensor in-place, and returns the input tensor. Parameters * **fill_value** (_Scalar_) – the fill value * **wrap** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – the diagonal ‘wrapped’ after N columns for tall matrices. Example: >>> a = torch.zeros(3, 3) >>> a.fill_diagonal_(5) tensor([[5., 0., 0.], [0., 5., 0.], [0., 0., 5.]]) >>> b = torch.zeros(7, 3) >>> b.fill_diagonal_(5) tensor([[5., 0., 0.], [0., 5., 0.], [0., 0., 5.], [0., 0., 0.], [0., 0., 0.], [0., 0., 0.], [0., 0., 0.]]) >>> c = torch.zeros(7, 3) >>> c.fill_diagonal_(5, wrap=True) tensor([[5., 0., 0.], [0., 5., 0.], [0., 0., 5.], [0., 0., 0.], [5., 0., 0.], [0., 5., 0.], [0., 0., 5.]]) `fmax(other) → Tensor` See [`torch.fmax()`](generated/torch.fmax#torch.fmax "torch.fmax") `fmin(other) → Tensor` See [`torch.fmin()`](generated/torch.fmin#torch.fmin "torch.fmin") `diff(n=1, dim=-1, prepend=None, append=None) → Tensor` See [`torch.diff()`](generated/torch.diff#torch.diff "torch.diff") `digamma() → Tensor` See [`torch.digamma()`](generated/torch.digamma#torch.digamma "torch.digamma") `digamma_() → Tensor` In-place version of `digamma()` `dim() → int` Returns the number of dimensions of `self` tensor. 
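A minimal sketch of `diff()` from the entry above (arbitrary values); `n` controls how many times the difference is applied:

>>> import torch
>>> t = torch.tensor([1, 3, 6, 10])
>>> t.diff()        # first-order differences along the last dimension
tensor([2, 3, 4])
>>> t.diff(n=2)     # difference applied twice
tensor([1, 1])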
`dist(other, p=2) → Tensor` See [`torch.dist()`](generated/torch.dist#torch.dist "torch.dist") `div(value, *, rounding_mode=None) → Tensor` See [`torch.div()`](generated/torch.div#torch.div "torch.div") `div_(value, *, rounding_mode=None) → Tensor` In-place version of `div()` `divide(value, *, rounding_mode=None) → Tensor` See [`torch.divide()`](generated/torch.divide#torch.divide "torch.divide") `divide_(value, *, rounding_mode=None) → Tensor` In-place version of `divide()` `dot(other) → Tensor` See [`torch.dot()`](generated/torch.dot#torch.dot "torch.dot") `double(memory_format=torch.preserve_format) → Tensor` `self.double()` is equivalent to `self.to(torch.float64)`. See `to()`. Parameters **memory_format** ([`torch.memory_format`](tensor_attributes#torch.torch.memory_format "torch.torch.memory_format"), optional) – the desired memory format of returned Tensor. Default: `torch.preserve_format`. `eig(eigenvectors=False) -> (Tensor, Tensor)` See [`torch.eig()`](generated/torch.eig#torch.eig "torch.eig") `element_size() → int` Returns the size in bytes of an individual element. Example: >>> torch.tensor([]).element_size() 4 >>> torch.tensor([], dtype=torch.uint8).element_size() 1 `eq(other) → Tensor` See [`torch.eq()`](generated/torch.eq#torch.eq "torch.eq") `eq_(other) → Tensor` In-place version of `eq()` `equal(other) → bool` See [`torch.equal()`](generated/torch.equal#torch.equal "torch.equal") `erf() → Tensor` See [`torch.erf()`](generated/torch.erf#torch.erf "torch.erf") `erf_() → Tensor` In-place version of `erf()` `erfc() → Tensor` See [`torch.erfc()`](generated/torch.erfc#torch.erfc "torch.erfc") `erfc_() → Tensor` In-place version of `erfc()` `erfinv() → Tensor` See [`torch.erfinv()`](generated/torch.erfinv#torch.erfinv "torch.erfinv") `erfinv_() → Tensor` In-place version of `erfinv()` `exp() → Tensor` See [`torch.exp()`](generated/torch.exp#torch.exp "torch.exp") `exp_() → Tensor` In-place version of `exp()` `expm1() → Tensor` See [`torch.expm1()`](generated/torch.expm1#torch.expm1 "torch.expm1") `expm1_() → Tensor` In-place version of `expm1()` `expand(*sizes) → Tensor` Returns a new view of the `self` tensor with singleton dimensions expanded to a larger size. Passing -1 as the size for a dimension means not changing the size of that dimension. Tensor can be also expanded to a larger number of dimensions, and the new ones will be appended at the front. For the new dimensions, the size cannot be set to -1. Expanding a tensor does not allocate new memory, but only creates a new view on the existing tensor where a dimension of size one is expanded to a larger size by setting the `stride` to 0. Any dimension of size 1 can be expanded to an arbitrary value without allocating new memory. Parameters ***sizes** (_torch.Size_ _or_ _int..._) – the desired expanded size Warning More than one element of an expanded tensor may refer to a single memory location. As a result, in-place operations (especially ones that are vectorized) may result in incorrect behavior. If you need to write to the tensors, please clone them first. Example: >>> x = torch.tensor([[1], [2], [3]]) >>> x.size() torch.Size([3, 1]) >>> x.expand(3, 4) tensor([[ 1, 1, 1, 1], [ 2, 2, 2, 2], [ 3, 3, 3, 3]]) >>> x.expand(-1, 4) # -1 means not changing the size of that dimension tensor([[ 1, 1, 1, 1], [ 2, 2, 2, 2], [ 3, 3, 3, 3]]) `expand_as(other) → Tensor` Expand this tensor to the same size as `other`. `self.expand_as(other)` is equivalent to `self.expand(other.size())`. 
Please see `expand()` for more information about `expand`. Parameters **other** (`torch.Tensor`) – The result tensor has the same size as `other`. `exponential_(lambd=1, *, generator=None) → Tensor` Fills `self` tensor with elements drawn from the exponential distribution: f(x) = \lambda e^{-\lambda x} `fix() → Tensor` See [`torch.fix()`](generated/torch.fix#torch.fix "torch.fix"). `fix_() → Tensor` In-place version of `fix()` `fill_(value) → Tensor` Fills `self` tensor with the specified value. `flatten(input, start_dim=0, end_dim=-1) → Tensor` see [`torch.flatten()`](generated/torch.flatten#torch.flatten "torch.flatten") `flip(dims) → Tensor` See [`torch.flip()`](generated/torch.flip#torch.flip "torch.flip") `fliplr() → Tensor` See [`torch.fliplr()`](generated/torch.fliplr#torch.fliplr "torch.fliplr") `flipud() → Tensor` See [`torch.flipud()`](generated/torch.flipud#torch.flipud "torch.flipud") `float(memory_format=torch.preserve_format) → Tensor` `self.float()` is equivalent to `self.to(torch.float32)`. See `to()`. Parameters **memory_format** ([`torch.memory_format`](tensor_attributes#torch.torch.memory_format "torch.torch.memory_format"), optional) – the desired memory format of returned Tensor. Default: `torch.preserve_format`. `float_power(exponent) → Tensor` See [`torch.float_power()`](generated/torch.float_power#torch.float_power "torch.float_power") `float_power_(exponent) → Tensor` In-place version of `float_power()` `floor() → Tensor` See [`torch.floor()`](generated/torch.floor#torch.floor "torch.floor") `floor_() → Tensor` In-place version of `floor()` `floor_divide(value) → Tensor` See [`torch.floor_divide()`](generated/torch.floor_divide#torch.floor_divide "torch.floor_divide") `floor_divide_(value) → Tensor` In-place version of `floor_divide()` `fmod(divisor) → Tensor` See [`torch.fmod()`](generated/torch.fmod#torch.fmod "torch.fmod") `fmod_(divisor) → Tensor` In-place version of `fmod()` `frac() → Tensor` See [`torch.frac()`](generated/torch.frac#torch.frac "torch.frac") `frac_() → Tensor` In-place version of `frac()` `gather(dim, index) → Tensor` See [`torch.gather()`](generated/torch.gather#torch.gather "torch.gather") `gcd(other) → Tensor` See [`torch.gcd()`](generated/torch.gcd#torch.gcd "torch.gcd") `gcd_(other) → Tensor` In-place version of `gcd()` `ge(other) → Tensor` See [`torch.ge()`](generated/torch.ge#torch.ge "torch.ge"). `ge_(other) → Tensor` In-place version of `ge()`. `greater_equal(other) → Tensor` See [`torch.greater_equal()`](generated/torch.greater_equal#torch.greater_equal "torch.greater_equal"). `greater_equal_(other) → Tensor` In-place version of `greater_equal()`. `geometric_(p, *, generator=None) → Tensor` Fills `self` tensor with elements drawn from the geometric distribution: f(X=k) = p^{k - 1} (1 - p) `geqrf() -> (Tensor, Tensor)` See [`torch.geqrf()`](generated/torch.geqrf#torch.geqrf "torch.geqrf") `ger(vec2) → Tensor` See [`torch.ger()`](generated/torch.ger#torch.ger "torch.ger") `get_device() -> Device ordinal (Integer)` For CUDA tensors, this function returns the device ordinal of the GPU on which the tensor resides. For CPU tensors, an error is thrown. Example: >>> x = torch.randn(3, 4, 5, device='cuda:0') >>> x.get_device() 0 >>> x.cpu().get_device() # RuntimeError: get_device is not implemented for type torch.FloatTensor `gt(other) → Tensor` See [`torch.gt()`](generated/torch.gt#torch.gt "torch.gt"). `gt_(other) → Tensor` In-place version of `gt()`.
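The `gather(dim, index)` entry above only points to `torch.gather()`; as a hedged reminder of the rule it implements (for `dim=1`, `out[i][j] = self[i][index[i][j]]`), a small sketch with arbitrary values:

>>> import torch
>>> t = torch.tensor([[1, 2], [3, 4]])
>>> index = torch.tensor([[0, 0], [1, 0]])
>>> t.gather(1, index)       # out[i][j] = t[i][index[i][j]]
tensor([[1, 1],
        [4, 3]])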
`greater(other) → Tensor` See [`torch.greater()`](generated/torch.greater#torch.greater "torch.greater"). `greater_(other) → Tensor` In-place version of `greater()`. `half(memory_format=torch.preserve_format) → Tensor` `self.half()` is equivalent to `self.to(torch.float16)`. See `to()`. Parameters **memory_format** ([`torch.memory_format`](tensor_attributes#torch.torch.memory_format "torch.torch.memory_format"), optional) – the desired memory format of returned Tensor. Default: `torch.preserve_format`. `hardshrink(lambd=0.5) → Tensor` See [`torch.nn.functional.hardshrink()`](nn.functional#torch.nn.functional.hardshrink "torch.nn.functional.hardshrink") `heaviside(values) → Tensor` See [`torch.heaviside()`](generated/torch.heaviside#torch.heaviside "torch.heaviside") `histc(bins=100, min=0, max=0) → Tensor` See [`torch.histc()`](generated/torch.histc#torch.histc "torch.histc") `hypot(other) → Tensor` See [`torch.hypot()`](generated/torch.hypot#torch.hypot "torch.hypot") `hypot_(other) → Tensor` In-place version of `hypot()` `i0() → Tensor` See [`torch.i0()`](generated/torch.i0#torch.i0 "torch.i0") `i0_() → Tensor` In-place version of `i0()` `igamma(other) → Tensor` See [`torch.igamma()`](generated/torch.igamma#torch.igamma "torch.igamma") `igamma_(other) → Tensor` In-place version of `igamma()` `igammac(other) → Tensor` See [`torch.igammac()`](generated/torch.igammac#torch.igammac "torch.igammac") `igammac_(other) → Tensor` In-place version of `igammac()` `index_add_(dim, index, tensor) → Tensor` Accumulate the elements of [`tensor`](generated/torch.tensor#torch.tensor "torch.tensor") into the `self` tensor by adding to the indices in the order given in `index`. For example, if `dim == 0` and `index[i] == j`, then the `i`th row of [`tensor`](generated/torch.tensor#torch.tensor "torch.tensor") is added to the `j`th row of `self`. The `dim`th dimension of [`tensor`](generated/torch.tensor#torch.tensor "torch.tensor") must have the same size as the length of `index` (which must be a vector), and all other dimensions must match `self`, or an error will be raised. Note This operation may behave nondeterministically when given tensors on a CUDA device. See [Reproducibility](https://pytorch.org/docs/1.8.0/notes/randomness.html) for more information. Parameters * **dim** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – dimension along which to index * **index** (_IntTensor_ _or_ _LongTensor_) – indices of [`tensor`](generated/torch.tensor#torch.tensor "torch.tensor") to select from * **tensor** (Tensor) – the tensor containing values to add Example: >>> x = torch.ones(5, 3) >>> t = torch.tensor([[1, 2, 3], [4, 5, 6], [7, 8, 9]], dtype=torch.float) >>> index = torch.tensor([0, 4, 2]) >>> x.index_add_(0, index, t) tensor([[ 2., 3., 4.], [ 1., 1., 1.], [ 8., 9., 10.], [ 1., 1., 1.], [ 5., 6., 7.]]) `index_add(tensor1, dim, index, tensor2) → Tensor` Out-of-place version of `torch.Tensor.index_add_()`. `tensor1` corresponds to `self` in `torch.Tensor.index_add_()`. `index_copy_(dim, index, tensor) → Tensor` Copies the elements of [`tensor`](generated/torch.tensor#torch.tensor "torch.tensor") into the `self` tensor by selecting the indices in the order given in `index`. For example, if `dim == 0` and `index[i] == j`, then the `i`th row of [`tensor`](generated/torch.tensor#torch.tensor "torch.tensor") is copied to the `j`th row of `self`. 
The `dim`th dimension of [`tensor`](generated/torch.tensor#torch.tensor "torch.tensor") must have the same size as the length of `index` (which must be a vector), and all other dimensions must match `self`, or an error will be raised. Note If `index` contains duplicate entries, multiple elements from [`tensor`](generated/torch.tensor#torch.tensor "torch.tensor") will be copied to the same index of `self`. The result is nondeterministic since it depends on which copy occurs last. Parameters * **dim** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – dimension along which to index * **index** (_LongTensor_) – indices of [`tensor`](generated/torch.tensor#torch.tensor "torch.tensor") to select from * **tensor** (Tensor) – the tensor containing values to copy Example: >>> x = torch.zeros(5, 3) >>> t = torch.tensor([[1, 2, 3], [4, 5, 6], [7, 8, 9]], dtype=torch.float) >>> index = torch.tensor([0, 4, 2]) >>> x.index_copy_(0, index, t) tensor([[ 1., 2., 3.], [ 0., 0., 0.], [ 7., 8., 9.], [ 0., 0., 0.], [ 4., 5., 6.]]) `index_copy(tensor1, dim, index, tensor2) → Tensor` Out-of-place version of `torch.Tensor.index_copy_()`. `tensor1` corresponds to `self` in `torch.Tensor.index_copy_()`. `index_fill_(dim, index, val) → Tensor` Fills the elements of the `self` tensor with value `val` by selecting the indices in the order given in `index`. Parameters * **dim** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – dimension along which to index * **index** (_LongTensor_) – indices of `self` tensor to fill in * **val** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)")) – the value to fill with Example:: >>> x = torch.tensor([[1, 2, 3], [4, 5, 6], [7, 8, 9]], dtype=torch.float) >>> index = torch.tensor([0, 2]) >>> x.index_fill_(1, index, -1) tensor([[-1., 2., -1.], [-1., 5., -1.], [-1., 8., -1.]]) `index_fill(tensor1, dim, index, value) → Tensor` Out-of-place version of `torch.Tensor.index_fill_()`. `tensor1` corresponds to `self` in `torch.Tensor.index_fill_()`. `index_put_(indices, values, accumulate=False) → Tensor` Puts values from the tensor [`values`](sparse#torch.Tensor.values "torch.Tensor.values") into the tensor `self` using the indices specified in [`indices`](sparse#torch.Tensor.indices "torch.Tensor.indices") (which is a tuple of Tensors). The expression `tensor.index_put_(indices, values)` is equivalent to `tensor[indices] = values`. Returns `self`. If `accumulate` is `True`, the elements in [`values`](sparse#torch.Tensor.values "torch.Tensor.values") are added to `self`. If accumulate is `False`, the behavior is undefined if indices contain duplicate elements. Parameters * **indices** (_tuple of LongTensor_) – tensors used to index into `self`. * **values** (Tensor) – tensor of same dtype as `self`. * **accumulate** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – whether to accumulate into self `index_put(tensor1, indices, values, accumulate=False) → Tensor` Out-place version of `index_put_()`. `tensor1` corresponds to `self` in `torch.Tensor.index_put_()`. `index_select(dim, index) → Tensor` See [`torch.index_select()`](generated/torch.index_select#torch.index_select "torch.index_select") `indices() → Tensor` Return the indices tensor of a [sparse COO tensor](sparse#sparse-coo-docs). Warning Throws an error if `self` is not a sparse COO tensor. See also [`Tensor.values()`](sparse#torch.Tensor.values "torch.Tensor.values"). 
Note This method can only be called on a coalesced sparse tensor. See [`Tensor.coalesce()`](sparse#torch.Tensor.coalesce "torch.Tensor.coalesce") for details. `inner(other) → Tensor` See [`torch.inner()`](generated/torch.inner#torch.inner "torch.inner"). `int(memory_format=torch.preserve_format) → Tensor` `self.int()` is equivalent to `self.to(torch.int32)`. See `to()`. Parameters **memory_format** ([`torch.memory_format`](tensor_attributes#torch.torch.memory_format "torch.torch.memory_format"), optional) – the desired memory format of returned Tensor. Default: `torch.preserve_format`. `int_repr() → Tensor` Given a quantized Tensor, `self.int_repr()` returns a CPU Tensor with uint8_t as data type that stores the underlying uint8_t values of the given Tensor. `inverse() → Tensor` See [`torch.inverse()`](generated/torch.inverse#torch.inverse "torch.inverse") `isclose(other, rtol=1e-05, atol=1e-08, equal_nan=False) → Tensor` See [`torch.isclose()`](generated/torch.isclose#torch.isclose "torch.isclose") `isfinite() → Tensor` See [`torch.isfinite()`](generated/torch.isfinite#torch.isfinite "torch.isfinite") `isinf() → Tensor` See [`torch.isinf()`](generated/torch.isinf#torch.isinf "torch.isinf") `isposinf() → Tensor` See [`torch.isposinf()`](generated/torch.isposinf#torch.isposinf "torch.isposinf") `isneginf() → Tensor` See [`torch.isneginf()`](generated/torch.isneginf#torch.isneginf "torch.isneginf") `isnan() → Tensor` See [`torch.isnan()`](generated/torch.isnan#torch.isnan "torch.isnan") `is_contiguous(memory_format=torch.contiguous_format) → bool` Returns True if `self` tensor is contiguous in memory in the order specified by memory format. Parameters **memory_format** ([`torch.memory_format`](tensor_attributes#torch.torch.memory_format "torch.torch.memory_format"), optional) – Specifies memory allocation order. Default: `torch.contiguous_format`. `is_complex() → bool` Returns True if the data type of `self` is a complex data type. `is_floating_point() → bool` Returns True if the data type of `self` is a floating point data type. `is_leaf` All Tensors that have [`requires_grad`](autograd#torch.Tensor.requires_grad "torch.Tensor.requires_grad") which is `False` will be leaf Tensors by convention. For Tensors that have [`requires_grad`](autograd#torch.Tensor.requires_grad "torch.Tensor.requires_grad") which is `True`, they will be leaf Tensors if they were created by the user. This means that they are not the result of an operation and so `grad_fn` is None. Only leaf Tensors will have their [`grad`](autograd#torch.Tensor.grad "torch.Tensor.grad") populated during a call to [`backward()`](autograd#torch.Tensor.backward "torch.Tensor.backward"). To get [`grad`](autograd#torch.Tensor.grad "torch.Tensor.grad") populated for non- leaf Tensors, you can use [`retain_grad()`](autograd#torch.Tensor.retain_grad "torch.Tensor.retain_grad"). 
Example: >>> a = torch.rand(10, requires_grad=True) >>> a.is_leaf True >>> b = torch.rand(10, requires_grad=True).cuda() >>> b.is_leaf False # b was created by the operation that cast a cpu Tensor into a cuda Tensor >>> c = torch.rand(10, requires_grad=True) + 2 >>> c.is_leaf False # c was created by the addition operation >>> d = torch.rand(10).cuda() >>> d.is_leaf True # d does not require gradients and so has no operation creating it (that is tracked by the autograd engine) >>> e = torch.rand(10).cuda().requires_grad_() >>> e.is_leaf True # e requires gradients and has no operations creating it >>> f = torch.rand(10, requires_grad=True, device="cuda") >>> f.is_leaf True # f requires grad, has no operation creating it `is_pinned()` Returns true if this tensor resides in pinned memory. `is_set_to(tensor) → bool` Returns True if both tensors are pointing to the exact same memory (same storage, offset, size and stride). `is_shared()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/tensor.html#Tensor.is_shared) Checks if tensor is in shared memory. This is always `True` for CUDA tensors. `is_signed() → bool` Returns True if the data type of `self` is a signed data type. `is_sparse` Is `True` if the Tensor uses sparse storage layout, `False` otherwise. `istft(n_fft, hop_length=None, win_length=None, window=None, center=True, normalized=False, onesided=None, length=None, return_complex=False)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/tensor.html#Tensor.istft) See [`torch.istft()`](generated/torch.istft#torch.istft "torch.istft") `isreal() → Tensor` See [`torch.isreal()`](generated/torch.isreal#torch.isreal "torch.isreal") `item() → number` Returns the value of this tensor as a standard Python number. This only works for tensors with one element. For other cases, see `tolist()`. This operation is not differentiable. Example: >>> x = torch.tensor([1.0]) >>> x.item() 1.0 `kthvalue(k, dim=None, keepdim=False) -> (Tensor, LongTensor)` See [`torch.kthvalue()`](generated/torch.kthvalue#torch.kthvalue "torch.kthvalue") `lcm(other) → Tensor` See [`torch.lcm()`](generated/torch.lcm#torch.lcm "torch.lcm") `lcm_(other) → Tensor` In-place version of `lcm()` `ldexp(other) → Tensor` See [`torch.ldexp()`](generated/torch.ldexp#torch.ldexp "torch.ldexp") `ldexp_(other) → Tensor` In-place version of `ldexp()` `le(other) → Tensor` See [`torch.le()`](generated/torch.le#torch.le "torch.le"). `le_(other) → Tensor` In-place version of `le()`. `less_equal(other) → Tensor` See [`torch.less_equal()`](generated/torch.less_equal#torch.less_equal "torch.less_equal"). `less_equal_(other) → Tensor` In-place version of `less_equal()`. 
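A minimal sketch of `kthvalue()` from the entry above (arbitrary values); like several reductions on this page, it returns a `(values, indices)` pair:

>>> import torch
>>> x = torch.tensor([1., 6., 3., 2.])
>>> values, indices = x.kthvalue(2)      # 2nd smallest element and its position
>>> values, indices
(tensor(2.), tensor(3))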
`lerp(end, weight) → Tensor` See [`torch.lerp()`](generated/torch.lerp#torch.lerp "torch.lerp") `lerp_(end, weight) → Tensor` In-place version of `lerp()` `lgamma() → Tensor` See [`torch.lgamma()`](generated/torch.lgamma#torch.lgamma "torch.lgamma") `lgamma_() → Tensor` In-place version of `lgamma()` `log() → Tensor` See [`torch.log()`](generated/torch.log#torch.log "torch.log") `log_() → Tensor` In-place version of `log()` `logdet() → Tensor` See [`torch.logdet()`](generated/torch.logdet#torch.logdet "torch.logdet") `log10() → Tensor` See [`torch.log10()`](generated/torch.log10#torch.log10 "torch.log10") `log10_() → Tensor` In-place version of `log10()` `log1p() → Tensor` See [`torch.log1p()`](generated/torch.log1p#torch.log1p "torch.log1p") `log1p_() → Tensor` In-place version of `log1p()` `log2() → Tensor` See [`torch.log2()`](generated/torch.log2#torch.log2 "torch.log2") `log2_() → Tensor` In-place version of `log2()` `log_normal_(mean=1, std=2, *, generator=None)` Fills `self` tensor with numbers sampled from the log-normal distribution parameterized by the given mean \mu and standard deviation \sigma. Note that [`mean`](generated/torch.mean#torch.mean "torch.mean") and [`std`](generated/torch.std#torch.std "torch.std") are the mean and standard deviation of the underlying normal distribution, and not of the returned distribution: f(x) = \dfrac{1}{x \sigma \sqrt{2\pi}}\ e^{-\frac{(\ln x - \mu)^2}{2\sigma^2}} `logaddexp(other) → Tensor` See [`torch.logaddexp()`](generated/torch.logaddexp#torch.logaddexp "torch.logaddexp") `logaddexp2(other) → Tensor` See [`torch.logaddexp2()`](generated/torch.logaddexp2#torch.logaddexp2 "torch.logaddexp2") `logsumexp(dim, keepdim=False) → Tensor` See [`torch.logsumexp()`](generated/torch.logsumexp#torch.logsumexp "torch.logsumexp") `logical_and() → Tensor` See [`torch.logical_and()`](generated/torch.logical_and#torch.logical_and "torch.logical_and") `logical_and_() → Tensor` In-place version of `logical_and()` `logical_not() → Tensor` See [`torch.logical_not()`](generated/torch.logical_not#torch.logical_not "torch.logical_not") `logical_not_() → Tensor` In-place version of `logical_not()` `logical_or() → Tensor` See [`torch.logical_or()`](generated/torch.logical_or#torch.logical_or "torch.logical_or") `logical_or_() → Tensor` In-place version of `logical_or()` `logical_xor() → Tensor` See [`torch.logical_xor()`](generated/torch.logical_xor#torch.logical_xor "torch.logical_xor") `logical_xor_() → Tensor` In-place version of `logical_xor()` `logit() → Tensor` See [`torch.logit()`](generated/torch.logit#torch.logit "torch.logit") `logit_() → Tensor` In-place version of `logit()` `long(memory_format=torch.preserve_format) → Tensor` `self.long()` is equivalent to `self.to(torch.int64)`. See `to()`. Parameters **memory_format** ([`torch.memory_format`](tensor_attributes#torch.torch.memory_format "torch.torch.memory_format"), optional) – the desired memory format of returned Tensor. Default: `torch.preserve_format`. `lstsq(A) -> (Tensor, Tensor)` See [`torch.lstsq()`](generated/torch.lstsq#torch.lstsq "torch.lstsq") `lt(other) → Tensor` See [`torch.lt()`](generated/torch.lt#torch.lt "torch.lt"). `lt_(other) → Tensor` In-place version of `lt()`. `less()` lt(other) -> Tensor See [`torch.less()`](generated/torch.less#torch.less "torch.less"). `less_(other) → Tensor` In-place version of `less()`.
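As a quick sketch of the element-wise `logical_*` methods listed above (arbitrary values):

>>> import torch
>>> a = torch.tensor([True, False, True])
>>> b = torch.tensor([True, True, False])
>>> a.logical_and(b)
tensor([ True, False, False])
>>> a.logical_xor(b)
tensor([False,  True,  True])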
`lu(pivot=True, get_infos=False)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/tensor.html#Tensor.lu) See [`torch.lu()`](generated/torch.lu#torch.lu "torch.lu") `lu_solve(LU_data, LU_pivots) → Tensor` See [`torch.lu_solve()`](generated/torch.lu_solve#torch.lu_solve "torch.lu_solve") `as_subclass(cls) → Tensor` Makes a `cls` instance with the same data pointer as `self`. Changes in the output mirror changes in `self`, and the output stays attached to the autograd graph. `cls` must be a subclass of `Tensor`. `map_(tensor, callable)` Applies `callable` for each element in `self` tensor and the given [`tensor`](generated/torch.tensor#torch.tensor "torch.tensor") and stores the results in `self` tensor. `self` tensor and the given [`tensor`](generated/torch.tensor#torch.tensor "torch.tensor") must be [broadcastable](https://pytorch.org/docs/1.8.0/notes/broadcasting.html#broadcasting- semantics). The `callable` should have the signature: def callable(a, b) -> number `masked_scatter_(mask, source)` Copies elements from `source` into `self` tensor at positions where the `mask` is True. The shape of `mask` must be [broadcastable](https://pytorch.org/docs/1.8.0/notes/broadcasting.html#broadcasting- semantics) with the shape of the underlying tensor. The `source` should have at least as many elements as the number of ones in `mask` Parameters * **mask** (_BoolTensor_) – the boolean mask * **source** (Tensor) – the tensor to copy from Note The `mask` operates on the `self` tensor, not on the given `source` tensor. `masked_scatter(mask, tensor) → Tensor` Out-of-place version of `torch.Tensor.masked_scatter_()` `masked_fill_(mask, value)` Fills elements of `self` tensor with `value` where `mask` is True. The shape of `mask` must be [broadcastable](https://pytorch.org/docs/1.8.0/notes/broadcasting.html#broadcasting- semantics) with the shape of the underlying tensor. 
Parameters * **mask** (_BoolTensor_) – the boolean mask * **value** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)")) – the value to fill in with `masked_fill(mask, value) → Tensor` Out-of-place version of `torch.Tensor.masked_fill_()` `masked_select(mask) → Tensor` See [`torch.masked_select()`](generated/torch.masked_select#torch.masked_select "torch.masked_select") `matmul(tensor2) → Tensor` See [`torch.matmul()`](generated/torch.matmul#torch.matmul "torch.matmul") `matrix_power(n) → Tensor` See [`torch.matrix_power()`](generated/torch.matrix_power#torch.matrix_power "torch.matrix_power") `matrix_exp() → Tensor` See [`torch.matrix_exp()`](generated/torch.matrix_exp#torch.matrix_exp "torch.matrix_exp") `max(dim=None, keepdim=False) -> Tensor or (Tensor, Tensor)` See [`torch.max()`](generated/torch.max#torch.max "torch.max") `maximum(other) → Tensor` See [`torch.maximum()`](generated/torch.maximum#torch.maximum "torch.maximum") `mean(dim=None, keepdim=False) -> Tensor or (Tensor, Tensor)` See [`torch.mean()`](generated/torch.mean#torch.mean "torch.mean") `median(dim=None, keepdim=False) -> (Tensor, LongTensor)` See [`torch.median()`](generated/torch.median#torch.median "torch.median") `nanmedian(dim=None, keepdim=False) -> (Tensor, LongTensor)` See [`torch.nanmedian()`](generated/torch.nanmedian#torch.nanmedian "torch.nanmedian") `min(dim=None, keepdim=False) -> Tensor or (Tensor, Tensor)` See [`torch.min()`](generated/torch.min#torch.min "torch.min") `minimum(other) → Tensor` See [`torch.minimum()`](generated/torch.minimum#torch.minimum "torch.minimum") `mm(mat2) → Tensor` See [`torch.mm()`](generated/torch.mm#torch.mm "torch.mm") `smm(mat) → Tensor` See [`torch.smm()`](sparse#torch.smm "torch.smm") `mode(dim=None, keepdim=False) -> (Tensor, LongTensor)` See [`torch.mode()`](generated/torch.mode#torch.mode "torch.mode") `movedim(source, destination) → Tensor` See [`torch.movedim()`](generated/torch.movedim#torch.movedim "torch.movedim") `moveaxis(source, destination) → Tensor` See [`torch.moveaxis()`](generated/torch.moveaxis#torch.moveaxis "torch.moveaxis") `msort() → Tensor` See [`torch.msort()`](generated/torch.msort#torch.msort "torch.msort") `mul(value) → Tensor` See [`torch.mul()`](generated/torch.mul#torch.mul "torch.mul"). `mul_(value) → Tensor` In-place version of `mul()`. `multiply(value) → Tensor` See [`torch.multiply()`](generated/torch.multiply#torch.multiply "torch.multiply"). `multiply_(value) → Tensor` In-place version of `multiply()`. `multinomial(num_samples, replacement=False, *, generator=None) → Tensor` See [`torch.multinomial()`](generated/torch.multinomial#torch.multinomial "torch.multinomial") `mv(vec) → Tensor` See [`torch.mv()`](generated/torch.mv#torch.mv "torch.mv") `mvlgamma(p) → Tensor` See [`torch.mvlgamma()`](generated/torch.mvlgamma#torch.mvlgamma "torch.mvlgamma") `mvlgamma_(p) → Tensor` In-place version of `mvlgamma()` `nansum(dim=None, keepdim=False, dtype=None) → Tensor` See [`torch.nansum()`](generated/torch.nansum#torch.nansum "torch.nansum") `narrow(dimension, start, length) → Tensor` See [`torch.narrow()`](generated/torch.narrow#torch.narrow "torch.narrow") Example: >>> x = torch.tensor([[1, 2, 3], [4, 5, 6], [7, 8, 9]]) >>> x.narrow(0, 0, 2) tensor([[ 1, 2, 3], [ 4, 5, 6]]) >>> x.narrow(1, 1, 2) tensor([[ 2, 3], [ 5, 6], [ 8, 9]]) `narrow_copy(dimension, start, length) → Tensor` Same as `Tensor.narrow()` except returning a copy rather than shared storage. 
This is primarily for sparse tensors, which do not have a shared-storage narrow method. Calling `narrow_copy` with `dimension > self.sparse_dim()` will return a copy with the relevant dense dimension narrowed, and `self.shape` updated accordingly. `ndimension() → int` Alias for `dim()` `nan_to_num(nan=0.0, posinf=None, neginf=None) → Tensor` See [`torch.nan_to_num()`](generated/torch.nan_to_num#torch.nan_to_num "torch.nan_to_num"). `nan_to_num_(nan=0.0, posinf=None, neginf=None) → Tensor` In-place version of `nan_to_num()`. `ne(other) → Tensor` See [`torch.ne()`](generated/torch.ne#torch.ne "torch.ne"). `ne_(other) → Tensor` In-place version of `ne()`. `not_equal(other) → Tensor` See [`torch.not_equal()`](generated/torch.not_equal#torch.not_equal "torch.not_equal"). `not_equal_(other) → Tensor` In-place version of `not_equal()`. `neg() → Tensor` See [`torch.neg()`](generated/torch.neg#torch.neg "torch.neg") `neg_() → Tensor` In-place version of `neg()` `negative() → Tensor` See [`torch.negative()`](generated/torch.negative#torch.negative "torch.negative") `negative_() → Tensor` In-place version of `negative()` `nelement() → int` Alias for `numel()` `nextafter(other) → Tensor` See [`torch.nextafter()`](generated/torch.nextafter#torch.nextafter "torch.nextafter") `nextafter_(other) → Tensor` In-place version of `nextafter()` `nonzero() → LongTensor` See [`torch.nonzero()`](generated/torch.nonzero#torch.nonzero "torch.nonzero") `norm(p='fro', dim=None, keepdim=False, dtype=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/tensor.html#Tensor.norm) See [`torch.norm()`](generated/torch.norm#torch.norm "torch.norm") `normal_(mean=0, std=1, *, generator=None) → Tensor` Fills `self` tensor with elements sampled from the normal distribution parameterized by [`mean`](generated/torch.mean#torch.mean "torch.mean") and [`std`](generated/torch.std#torch.std "torch.std"). `numel() → int` See [`torch.numel()`](generated/torch.numel#torch.numel "torch.numel") `numpy() → numpy.ndarray` Returns `self` tensor as a NumPy `ndarray`. This tensor and the returned `ndarray` share the same underlying storage. Changes to `self` tensor will be reflected in the `ndarray` and vice versa. `orgqr(input2) → Tensor` See [`torch.orgqr()`](generated/torch.orgqr#torch.orgqr "torch.orgqr") `ormqr(input2, input3, left=True, transpose=False) → Tensor` See [`torch.ormqr()`](generated/torch.ormqr#torch.ormqr "torch.ormqr") `outer(vec2) → Tensor` See [`torch.outer()`](generated/torch.outer#torch.outer "torch.outer"). `permute(*dims) → Tensor` Returns a view of the original tensor with its dimensions permuted. Parameters ***dims** (_int..._) – The desired ordering of dimensions Example: >>> x = torch.randn(2, 3, 5) >>> x.size() torch.Size([2, 3, 5]) >>> x.permute(2, 0, 1).size() torch.Size([5, 2, 3]) `pin_memory() → Tensor` Copies the tensor to pinned memory, if it’s not already pinned.
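A minimal sketch of the shared-storage behaviour described for `numpy()` above (requires NumPy; values arbitrary):

>>> import torch
>>> t = torch.ones(3)
>>> a = t.numpy()            # a shares memory with t
>>> t.add_(1)                # an in-place change to t ...
tensor([2., 2., 2.])
>>> a                        # ... is visible through the ndarray
array([2., 2., 2.], dtype=float32)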
`pinverse() → Tensor` See [`torch.pinverse()`](generated/torch.pinverse#torch.pinverse "torch.pinverse") `polygamma(n) → Tensor` See [`torch.polygamma()`](generated/torch.polygamma#torch.polygamma "torch.polygamma") `polygamma_(n) → Tensor` In-place version of `polygamma()` `pow(exponent) → Tensor` See [`torch.pow()`](generated/torch.pow#torch.pow "torch.pow") `pow_(exponent) → Tensor` In-place version of `pow()` `prod(dim=None, keepdim=False, dtype=None) → Tensor` See [`torch.prod()`](generated/torch.prod#torch.prod "torch.prod") `put_(indices, tensor, accumulate=False) → Tensor` Copies the elements from [`tensor`](generated/torch.tensor#torch.tensor "torch.tensor") into the positions specified by indices. For the purpose of indexing, the `self` tensor is treated as if it were a 1-D tensor. If `accumulate` is `True`, the elements in [`tensor`](generated/torch.tensor#torch.tensor "torch.tensor") are added to `self`. If accumulate is `False`, the behavior is undefined if indices contain duplicate elements. Parameters * **indices** (_LongTensor_) – the indices into self * **tensor** (Tensor) – the tensor containing values to copy from * **accumulate** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – whether to accumulate into self Example: >>> src = torch.tensor([[4, 3, 5], ... [6, 7, 8]]) >>> src.put_(torch.tensor([1, 3]), torch.tensor([9, 10])) tensor([[ 4, 9, 5], [ 10, 7, 8]]) `qr(some=True) -> (Tensor, Tensor)` See [`torch.qr()`](generated/torch.qr#torch.qr "torch.qr") `qscheme() → torch.qscheme` Returns the quantization scheme of a given QTensor. `quantile(q, dim=None, keepdim=False) → Tensor` See [`torch.quantile()`](generated/torch.quantile#torch.quantile "torch.quantile") `nanquantile(q, dim=None, keepdim=False) → Tensor` See [`torch.nanquantile()`](generated/torch.nanquantile#torch.nanquantile "torch.nanquantile") `q_scale() → float` Given a Tensor quantized by linear(affine) quantization, returns the scale of the underlying quantizer(). `q_zero_point() → int` Given a Tensor quantized by linear(affine) quantization, returns the zero_point of the underlying quantizer(). `q_per_channel_scales() → Tensor` Given a Tensor quantized by linear (affine) per-channel quantization, returns a Tensor of scales of the underlying quantizer. It has the number of elements that matches the corresponding dimensions (from q_per_channel_axis) of the tensor. `q_per_channel_zero_points() → Tensor` Given a Tensor quantized by linear (affine) per-channel quantization, returns a tensor of zero_points of the underlying quantizer. It has the number of elements that matches the corresponding dimensions (from q_per_channel_axis) of the tensor. `q_per_channel_axis() → int` Given a Tensor quantized by linear (affine) per-channel quantization, returns the index of dimension on which per-channel quantization is applied. `rad2deg() → Tensor` See [`torch.rad2deg()`](generated/torch.rad2deg#torch.rad2deg "torch.rad2deg") `random_(from=0, to=None, *, generator=None) → Tensor` Fills `self` tensor with numbers sampled from the discrete uniform distribution over `[from, to - 1]`. If not specified, the values are usually only bounded by `self` tensor’s data type. However, for floating point types, if unspecified, range will be `[0, 2^mantissa]` to ensure that every value is representable. For example, `torch.tensor(1, dtype=torch.double).random_()` will be uniform in `[0, 2^53]`. 
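A minimal sketch of the quantization accessors above (`q_scale()`, `q_zero_point()`, together with `dequantize()`); it assumes `torch.quantize_per_tensor` for building the quantized tensor, and the scale/zero-point values are arbitrary:

>>> import torch
>>> x = torch.tensor([-1.0, 0.0, 1.0, 2.0])
>>> q = torch.quantize_per_tensor(x, scale=0.1, zero_point=10, dtype=torch.quint8)
>>> q.q_scale()
0.1
>>> q.q_zero_point()
10
>>> q.dequantize()           # back to a float tensor
tensor([-1.,  0.,  1.,  2.])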
`ravel(input) → Tensor` see [`torch.ravel()`](generated/torch.ravel#torch.ravel "torch.ravel") `reciprocal() → Tensor` See [`torch.reciprocal()`](generated/torch.reciprocal#torch.reciprocal "torch.reciprocal") `reciprocal_() → Tensor` In-place version of `reciprocal()` `record_stream(stream)` Ensures that the tensor memory is not reused for another tensor until all current work queued on `stream` are complete. Note The caching allocator is aware of only the stream where a tensor was allocated. Due to the awareness, it already correctly manages the life cycle of tensors on only one stream. But if a tensor is used on a stream different from the stream of origin, the allocator might reuse the memory unexpectedly. Calling this method lets the allocator know which streams have used the tensor. `register_hook(hook)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/tensor.html#Tensor.register_hook) Registers a backward hook. The hook will be called every time a gradient with respect to the Tensor is computed. The hook should have the following signature: hook(grad) -> Tensor or None The hook should not modify its argument, but it can optionally return a new gradient which will be used in place of [`grad`](autograd#torch.Tensor.grad "torch.Tensor.grad"). This function returns a handle with a method `handle.remove()` that removes the hook from the module. Example: >>> v = torch.tensor([0., 0., 0.], requires_grad=True) >>> h = v.register_hook(lambda grad: grad * 2) # double the gradient >>> v.backward(torch.tensor([1., 2., 3.])) >>> v.grad 2 4 6 [torch.FloatTensor of size (3,)] >>> h.remove() # removes the hook `remainder(divisor) → Tensor` See [`torch.remainder()`](generated/torch.remainder#torch.remainder "torch.remainder") `remainder_(divisor) → Tensor` In-place version of `remainder()` `renorm(p, dim, maxnorm) → Tensor` See [`torch.renorm()`](generated/torch.renorm#torch.renorm "torch.renorm") `renorm_(p, dim, maxnorm) → Tensor` In-place version of `renorm()` `repeat(*sizes) → Tensor` Repeats this tensor along the specified dimensions. Unlike `expand()`, this function copies the tensor’s data. Warning `repeat()` behaves differently from [numpy.repeat](https://docs.scipy.org/doc/numpy/reference/generated/numpy.repeat.html), but is more similar to [numpy.tile](https://docs.scipy.org/doc/numpy/reference/generated/numpy.tile.html). For the operator similar to `numpy.repeat`, see [`torch.repeat_interleave()`](generated/torch.repeat_interleave#torch.repeat_interleave "torch.repeat_interleave"). Parameters **sizes** (_torch.Size_ _or_ _int..._) – The number of times to repeat this tensor along each dimension Example: >>> x = torch.tensor([1, 2, 3]) >>> x.repeat(4, 2) tensor([[ 1, 2, 3, 1, 2, 3], [ 1, 2, 3, 1, 2, 3], [ 1, 2, 3, 1, 2, 3], [ 1, 2, 3, 1, 2, 3]]) >>> x.repeat(4, 2, 1).size() torch.Size([4, 2, 3]) `repeat_interleave(repeats, dim=None) → Tensor` See [`torch.repeat_interleave()`](generated/torch.repeat_interleave#torch.repeat_interleave "torch.repeat_interleave"). `requires_grad` Is `True` if gradients need to be computed for this Tensor, `False` otherwise. Note The fact that gradients need to be computed for a Tensor do not mean that the [`grad`](autograd#torch.Tensor.grad "torch.Tensor.grad") attribute will be populated, see [`is_leaf`](autograd#torch.Tensor.is_leaf "torch.Tensor.is_leaf") for more details. 
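To make the `repeat()` warning above concrete (numpy.tile-like versus numpy.repeat-like behaviour), a minimal sketch with arbitrary values:

>>> import torch
>>> x = torch.tensor([1, 2, 3])
>>> x.repeat(2)                  # tiles the whole tensor, like numpy.tile
tensor([1, 2, 3, 1, 2, 3])
>>> x.repeat_interleave(2)       # repeats each element, like numpy.repeat
tensor([1, 1, 2, 2, 3, 3])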
`requires_grad_(requires_grad=True) → Tensor` Change if autograd should record operations on this tensor: sets this tensor’s [`requires_grad`](autograd#torch.Tensor.requires_grad "torch.Tensor.requires_grad") attribute in-place. Returns this tensor. `requires_grad_()`’s main use case is to tell autograd to begin recording operations on a Tensor `tensor`. If `tensor` has `requires_grad=False` (because it was obtained through a DataLoader, or required preprocessing or initialization), `tensor.requires_grad_()` makes it so that autograd will begin to record operations on `tensor`. Parameters **requires_grad** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – If autograd should record operations on this tensor. Default: `True`. Example: >>> # Let's say we want to preprocess some saved weights and use >>> # the result as new weights. >>> saved_weights = [0.1, 0.2, 0.3, 0.25] >>> loaded_weights = torch.tensor(saved_weights) >>> weights = preprocess(loaded_weights) # some function >>> weights tensor([-0.5503, 0.4926, -2.1158, -0.8303]) >>> # Now, start to record operations done to weights >>> weights.requires_grad_() >>> out = weights.pow(2).sum() >>> out.backward() >>> weights.grad tensor([-1.1007, 0.9853, -4.2316, -1.6606]) `reshape(*shape) → Tensor` Returns a tensor with the same data and number of elements as `self` but with the specified shape. This method returns a view if `shape` is compatible with the current shape. See `torch.Tensor.view()` on when it is possible to return a view. See [`torch.reshape()`](generated/torch.reshape#torch.reshape "torch.reshape") Parameters **shape** (_tuple of python:ints_ _or_ _int..._) – the desired shape `reshape_as(other) → Tensor` Returns this tensor as the same shape as `other`. `self.reshape_as(other)` is equivalent to `self.reshape(other.sizes())`. This method returns a view if `other.sizes()` is compatible with the current shape. See `torch.Tensor.view()` on when it is possible to return a view. Please see [`reshape()`](generated/torch.reshape#torch.reshape "torch.reshape") for more information about `reshape`. Parameters **other** (`torch.Tensor`) – The result tensor has the same shape as `other`. `resize_(*sizes, memory_format=torch.contiguous_format) → Tensor` Resizes `self` tensor to the specified size. If the number of elements is larger than the current storage size, then the underlying storage is resized to fit the new number of elements. If the number of elements is smaller, the underlying storage is not changed. Existing elements are preserved but any new memory is uninitialized. Warning This is a low-level method. The storage is reinterpreted as C-contiguous, ignoring the current strides (unless the target size equals the current size, in which case the tensor is left unchanged). For most purposes, you will instead want to use `view()`, which checks for contiguity, or `reshape()`, which copies data if needed. To change the size in-place with custom strides, see `set_()`. Parameters * **sizes** (_torch.Size_ _or_ _int..._) – the desired size * **memory_format** ([`torch.memory_format`](tensor_attributes#torch.torch.memory_format "torch.torch.memory_format"), optional) – the desired memory format of Tensor. Default: `torch.contiguous_format`. Note that memory format of `self` is going to be unaffected if `self.size()` matches `sizes`. 
Example: >>> x = torch.tensor([[1, 2], [3, 4], [5, 6]]) >>> x.resize_(2, 2) tensor([[ 1, 2], [ 3, 4]]) `resize_as_(tensor, memory_format=torch.contiguous_format) → Tensor` Resizes the `self` tensor to be the same size as the specified [`tensor`](generated/torch.tensor#torch.tensor "torch.tensor"). This is equivalent to `self.resize_(tensor.size())`. Parameters **memory_format** ([`torch.memory_format`](tensor_attributes#torch.torch.memory_format "torch.torch.memory_format"), optional) – the desired memory format of Tensor. Default: `torch.contiguous_format`. Note that memory format of `self` is going to be unaffected if `self.size()` matches `tensor.size()`. `retain_grad()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/tensor.html#Tensor.retain_grad) Enables the `.grad` attribute for non-leaf Tensors. `roll(shifts, dims) → Tensor` See [`torch.roll()`](generated/torch.roll#torch.roll "torch.roll") `rot90(k, dims) → Tensor` See [`torch.rot90()`](generated/torch.rot90#torch.rot90 "torch.rot90") `round() → Tensor` See [`torch.round()`](generated/torch.round#torch.round "torch.round") `round_() → Tensor` In-place version of `round()` `rsqrt() → Tensor` See [`torch.rsqrt()`](generated/torch.rsqrt#torch.rsqrt "torch.rsqrt") `rsqrt_() → Tensor` In-place version of `rsqrt()` `scatter(dim, index, src) → Tensor` Out-of-place version of `torch.Tensor.scatter_()` `scatter_(dim, index, src, reduce=None) → Tensor` Writes all values from the tensor `src` into `self` at the indices specified in the `index` tensor. For each value in `src`, its output index is specified by its index in `src` for `dimension != dim` and by the corresponding value in `index` for `dimension = dim`. For a 3-D tensor, `self` is updated as: self[index[i][j][k]][j][k] = src[i][j][k] # if dim == 0 self[i][index[i][j][k]][k] = src[i][j][k] # if dim == 1 self[i][j][index[i][j][k]] = src[i][j][k] # if dim == 2 This is the reverse operation of the manner described in `gather()`. `self`, `index` and `src` (if it is a Tensor) should all have the same number of dimensions. It is also required that `index.size(d) <= src.size(d)` for all dimensions `d`, and that `index.size(d) <= self.size(d)` for all dimensions `d != dim`. Note that `index` and `src` do not broadcast. Moreover, as for `gather()`, the values of `index` must be between `0` and `self.size(dim) - 1` inclusive. Warning When indices are not unique, the behavior is non-deterministic (one of the values from `src` will be picked arbitrarily) and the gradient will be incorrect (it will be propagated to all locations in the source that correspond to the same index)! Note The backward pass is implemented only for `src.shape == index.shape`. Additionally accepts an optional `reduce` argument that specifies a reduction operation to apply to all values from the tensor `src` as they are scattered into `self` at the indices specified in `index`. For each value in `src`, the reduction operation is applied to an index in `self` which is specified by its index in `src` for `dimension != dim` and by the corresponding value in `index` for `dimension = dim`. Given a 3-D tensor and reduction using the multiplication operation, `self` is updated as: self[index[i][j][k]][j][k] *= src[i][j][k] # if dim == 0 self[i][index[i][j][k]][k] *= src[i][j][k] # if dim == 1 self[i][j][index[i][j][k]] *= src[i][j][k] # if dim == 2 Reducing with the addition operation is the same as using `scatter_add_()`.
Parameters * **dim** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – the axis along which to index * **index** (_LongTensor_) – the indices of elements to scatter, can be either empty or of the same dimensionality as `src`. When empty, the operation returns `self` unchanged. * **src** (Tensor _or_ [float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)")) – the source element(s) to scatter. * **reduce** ([str](https://docs.python.org/3/library/stdtypes.html#str "\(in Python v3.9\)") _,__optional_) – reduction operation to apply, can be either `'add'` or `'multiply'`. Example: >>> src = torch.arange(1, 11).reshape((2, 5)) >>> src tensor([[ 1, 2, 3, 4, 5], [ 6, 7, 8, 9, 10]]) >>> index = torch.tensor([[0, 1, 2, 0]]) >>> torch.zeros(3, 5, dtype=src.dtype).scatter_(0, index, src) tensor([[1, 0, 0, 4, 0], [0, 2, 0, 0, 0], [0, 0, 3, 0, 0]]) >>> index = torch.tensor([[0, 1, 2], [0, 1, 4]]) >>> torch.zeros(3, 5, dtype=src.dtype).scatter_(1, index, src) tensor([[1, 2, 3, 0, 0], [6, 7, 0, 0, 8], [0, 0, 0, 0, 0]]) >>> torch.full((2, 4), 2.).scatter_(1, torch.tensor([[2], [3]]), ... 1.23, reduce='multiply') tensor([[2.0000, 2.0000, 2.4600, 2.0000], [2.0000, 2.0000, 2.0000, 2.4600]]) >>> torch.full((2, 4), 2.).scatter_(1, torch.tensor([[2], [3]]), ... 1.23, reduce='add') tensor([[2.0000, 2.0000, 3.2300, 2.0000], [2.0000, 2.0000, 2.0000, 3.2300]]) `scatter_add_(dim, index, src) → Tensor` Adds all values from the tensor `src` into `self` at the indices specified in the `index` tensor in a similar fashion as `scatter_()`. For each value in `src`, it is added to an index in `self` which is specified by its index in `src` for `dimension != dim` and by the corresponding value in `index` for `dimension = dim`. For a 3-D tensor, `self` is updated as: self[index[i][j][k]][j][k] += src[i][j][k] # if dim == 0 self[i][index[i][j][k]][k] += src[i][j][k] # if dim == 1 self[i][j][index[i][j][k]] += src[i][j][k] # if dim == 2 `self`, `index` and `src` should have the same number of dimensions. It is also required that `index.size(d) <= src.size(d)` for all dimensions `d`, and that `index.size(d) <= self.size(d)` for all dimensions `d != dim`. Note that `index` and `src` do not broadcast. Note This operation may behave nondeterministically when given tensors on a CUDA device. See [Reproducibility](https://pytorch.org/docs/1.8.0/notes/randomness.html) for more information. Note The backward pass is implemented only for `src.shape == index.shape`. Parameters * **dim** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – the axis along which to index * **index** (_LongTensor_) – the indices of elements to scatter and add, can be either empty or of the same dimensionality as `src`. When empty, the operation returns `self` unchanged. * **src** (Tensor) – the source elements to scatter and add Example: >>> src = torch.ones((2, 5)) >>> index = torch.tensor([[0, 1, 2, 0, 0]]) >>> torch.zeros(3, 5, dtype=src.dtype).scatter_add_(0, index, src) tensor([[1., 0., 0., 1., 1.], [0., 1., 0., 0., 0.], [0., 0., 1., 0., 0.]]) >>> index = torch.tensor([[0, 1, 2, 0, 0], [0, 1, 2, 2, 2]]) >>> torch.zeros(3, 5, dtype=src.dtype).scatter_add_(0, index, src) tensor([[2., 0., 0., 1., 1.], [0., 2., 0., 0., 0.], [0., 0., 2., 1., 1.]]) `scatter_add(dim, index, src) → Tensor` Out-of-place version of `torch.Tensor.scatter_add_()` `select(dim, index) → Tensor` Slices the `self` tensor along the selected dimension at the given index.
This function returns a view of the original tensor with the given dimension removed. Parameters * **dim** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – the dimension to slice * **index** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – the index to select with Note `select()` is equivalent to slicing. For example, `tensor.select(0, index)` is equivalent to `tensor[index]` and `tensor.select(2, index)` is equivalent to `tensor[:,:,index]`. `set_(source=None, storage_offset=0, size=None, stride=None) → Tensor` Sets the underlying storage, size, and strides. If `source` is a tensor, `self` tensor will share the same storage and have the same size and strides as `source`. Changes to elements in one tensor will be reflected in the other. If `source` is a `Storage`, the method sets the underlying storage, offset, size, and stride. Parameters * **source** (Tensor _or_ _Storage_) – the tensor or storage to use * **storage_offset** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – the offset in the storage * **size** (_torch.Size_ _,__optional_) – the desired size. Defaults to the size of the source. * **stride** ([tuple](https://docs.python.org/3/library/stdtypes.html#tuple "\(in Python v3.9\)") _,__optional_) – the desired stride. Defaults to C-contiguous strides. `share_memory_()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/tensor.html#Tensor.share_memory_) Moves the underlying storage to shared memory. This is a no-op if the underlying storage is already in shared memory and for CUDA tensors. Tensors in shared memory cannot be resized. `short(memory_format=torch.preserve_format) → Tensor` `self.short()` is equivalent to `self.to(torch.int16)`. See `to()`. Parameters **memory_format** ([`torch.memory_format`](tensor_attributes#torch.torch.memory_format "torch.torch.memory_format"), optional) – the desired memory format of returned Tensor. Default: `torch.preserve_format`. `sigmoid() → Tensor` See [`torch.sigmoid()`](generated/torch.sigmoid#torch.sigmoid "torch.sigmoid") `sigmoid_() → Tensor` In-place version of `sigmoid()` `sign() → Tensor` See [`torch.sign()`](generated/torch.sign#torch.sign "torch.sign") `sign_() → Tensor` In-place version of `sign()` `signbit() → Tensor` See [`torch.signbit()`](generated/torch.signbit#torch.signbit "torch.signbit") `sgn() → Tensor` See [`torch.sgn()`](generated/torch.sgn#torch.sgn "torch.sgn") `sgn_() → Tensor` In-place version of `sgn()` `sin() → Tensor` See [`torch.sin()`](generated/torch.sin#torch.sin "torch.sin") `sin_() → Tensor` In-place version of `sin()` `sinc() → Tensor` See [`torch.sinc()`](generated/torch.sinc#torch.sinc "torch.sinc") `sinc_() → Tensor` In-place version of `sinc()` `sinh() → Tensor` See [`torch.sinh()`](generated/torch.sinh#torch.sinh "torch.sinh") `sinh_() → Tensor` In-place version of `sinh()` `asinh() → Tensor` See [`torch.asinh()`](generated/torch.asinh#torch.asinh "torch.asinh") `asinh_() → Tensor` In-place version of `asinh()` `arcsinh() → Tensor` See [`torch.arcsinh()`](generated/torch.arcsinh#torch.arcsinh "torch.arcsinh") `arcsinh_() → Tensor` In-place version of `arcsinh()` `size() → torch.Size` Returns the size of the `self` tensor. The returned value is a subclass of [`tuple`](https://docs.python.org/3/library/stdtypes.html#tuple "\(in Python v3.9\)"). 
Example: >>> torch.empty(3, 4, 5).size() torch.Size([3, 4, 5]) `slogdet() -> (Tensor, Tensor)` See [`torch.slogdet()`](generated/torch.slogdet#torch.slogdet "torch.slogdet") `solve(A) → Tensor, Tensor` See [`torch.solve()`](generated/torch.solve#torch.solve "torch.solve") `sort(dim=-1, descending=False) -> (Tensor, LongTensor)` See [`torch.sort()`](generated/torch.sort#torch.sort "torch.sort") `split(split_size, dim=0)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/tensor.html#Tensor.split) See [`torch.split()`](generated/torch.split#torch.split "torch.split") `sparse_mask(mask) → Tensor` Returns a new [sparse tensor](sparse#sparse-docs) with values from a strided tensor `self` filtered by the indices of the sparse tensor `mask`. The values of `mask` sparse tensor are ignored. `self` and `mask` tensors must have the same shape. Note The returned sparse tensor has the same indices as the sparse tensor `mask`, even when the corresponding values in `self` are zeros. Parameters **mask** (Tensor) – a sparse tensor whose indices are used as a filter Example: >>> nse = 5 >>> dims = (5, 5, 2, 2) >>> I = torch.cat([torch.randint(0, dims[0], size=(nse,)), ... torch.randint(0, dims[1], size=(nse,))], 0).reshape(2, nse) >>> V = torch.randn(nse, dims[2], dims[3]) >>> S = torch.sparse_coo_tensor(I, V, dims).coalesce() >>> D = torch.randn(dims) >>> D.sparse_mask(S) tensor(indices=tensor([[0, 0, 0, 2], [0, 1, 4, 3]]), values=tensor([[[ 1.6550, 0.2397], [-0.1611, -0.0779]], [[ 0.2326, -1.0558], [ 1.4711, 1.9678]], [[-0.5138, -0.0411], [ 1.9417, 0.5158]], [[ 0.0793, 0.0036], [-0.2569, -0.1055]]]), size=(5, 5, 2, 2), nnz=4, layout=torch.sparse_coo) `sparse_dim() → int` Return the number of sparse dimensions in a [sparse tensor](sparse#sparse- docs) `self`. Warning Throws an error if `self` is not a sparse tensor. See also [`Tensor.dense_dim()`](sparse#torch.Tensor.dense_dim "torch.Tensor.dense_dim") and [hybrid tensors](sparse#sparse-hybrid-coo-docs). `sqrt() → Tensor` See [`torch.sqrt()`](generated/torch.sqrt#torch.sqrt "torch.sqrt") `sqrt_() → Tensor` In-place version of `sqrt()` `square() → Tensor` See [`torch.square()`](generated/torch.square#torch.square "torch.square") `square_() → Tensor` In-place version of `square()` `squeeze(dim=None) → Tensor` See [`torch.squeeze()`](generated/torch.squeeze#torch.squeeze "torch.squeeze") `squeeze_(dim=None) → Tensor` In-place version of `squeeze()` `std(dim=None, unbiased=True, keepdim=False) → Tensor` See [`torch.std()`](generated/torch.std#torch.std "torch.std") `stft(n_fft, hop_length=None, win_length=None, window=None, center=True, pad_mode='reflect', normalized=False, onesided=None, return_complex=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/tensor.html#Tensor.stft) See [`torch.stft()`](generated/torch.stft#torch.stft "torch.stft") Warning This function changed signature at version 0.4.1. Calling with the previous signature may cause error or return incorrect result. `storage() → torch.Storage` Returns the underlying storage. `storage_offset() → int` Returns `self` tensor’s offset in the underlying storage in terms of number of storage elements (not bytes). Example: >>> x = torch.tensor([1, 2, 3, 4, 5]) >>> x.storage_offset() 0 >>> x[3:].storage_offset() 3 `storage_type() → type` Returns the type of the underlying storage. `stride(dim) → tuple or int` Returns the stride of `self` tensor. Stride is the jump necessary to go from one element to the next one in the specified dimension `dim`. 
A tuple of all strides is returned when no argument is passed in. Otherwise, an integer value is returned as the stride in the particular dimension `dim`. Parameters **dim** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – the desired dimension in which stride is required Example: >>> x = torch.tensor([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]]) >>> x.stride() (5, 1) >>> x.stride(0) 5 >>> x.stride(-1) 1 `sub(other, *, alpha=1) → Tensor` See [`torch.sub()`](generated/torch.sub#torch.sub "torch.sub"). `sub_(other, *, alpha=1) → Tensor` In-place version of `sub()` `subtract(other, *, alpha=1) → Tensor` See [`torch.subtract()`](generated/torch.subtract#torch.subtract "torch.subtract"). `subtract_(other, *, alpha=1) → Tensor` In-place version of `subtract()`. `sum(dim=None, keepdim=False, dtype=None) → Tensor` See [`torch.sum()`](generated/torch.sum#torch.sum "torch.sum") `sum_to_size(*size) → Tensor` Sum `this` tensor to `size`. `size` must be broadcastable to `this` tensor size. Parameters **size** (_int..._) – a sequence of integers defining the shape of the output tensor. `svd(some=True, compute_uv=True) -> (Tensor, Tensor, Tensor)` See [`torch.svd()`](generated/torch.svd#torch.svd "torch.svd") `swapaxes(axis0, axis1) → Tensor` See [`torch.swapaxes()`](generated/torch.swapaxes#torch.swapaxes "torch.swapaxes") `swapdims(dim0, dim1) → Tensor` See [`torch.swapdims()`](generated/torch.swapdims#torch.swapdims "torch.swapdims") `symeig(eigenvectors=False, upper=True) -> (Tensor, Tensor)` See [`torch.symeig()`](generated/torch.symeig#torch.symeig "torch.symeig") `t() → Tensor` See [`torch.t()`](generated/torch.t#torch.t "torch.t") `t_() → Tensor` In-place version of `t()` `tensor_split(indices_or_sections, dim=0) → List of Tensors` See [`torch.tensor_split()`](generated/torch.tensor_split#torch.tensor_split "torch.tensor_split") `tile(*reps) → Tensor` See [`torch.tile()`](generated/torch.tile#torch.tile "torch.tile") `to(*args, **kwargs) → Tensor` Performs Tensor dtype and/or device conversion. A [`torch.dtype`](tensor_attributes#torch.torch.dtype "torch.torch.dtype") and [`torch.device`](tensor_attributes#torch.torch.device "torch.torch.device") are inferred from the arguments of `self.to(*args, **kwargs)`. Note If the `self` Tensor already has the correct [`torch.dtype`](tensor_attributes#torch.torch.dtype "torch.torch.dtype") and [`torch.device`](tensor_attributes#torch.torch.device "torch.torch.device"), then `self` is returned. Otherwise, the returned tensor is a copy of `self` with the desired [`torch.dtype`](tensor_attributes#torch.torch.dtype "torch.torch.dtype") and [`torch.device`](tensor_attributes#torch.torch.device "torch.torch.device"). Here are the ways to call `to`: `to(dtype, non_blocking=False, copy=False, memory_format=torch.preserve_format) → Tensor` Returns a Tensor with the specified `dtype` Args: memory_format ([`torch.memory_format`](tensor_attributes#torch.torch.memory_format "torch.torch.memory_format"), optional): the desired memory format of returned Tensor. Default: `torch.preserve_format`. `to(device=None, dtype=None, non_blocking=False, copy=False, memory_format=torch.preserve_format) → Tensor` Returns a Tensor with the specified `device` and (optional) `dtype`. If `dtype` is `None` it is inferred to be `self.dtype`. When `non_blocking`, tries to convert asynchronously with respect to the host if possible, e.g., converting a CPU Tensor with pinned memory to a CUDA Tensor. 
When `copy` is set, a new Tensor is created even when the Tensor already matches the desired conversion. Args: memory_format ([`torch.memory_format`](tensor_attributes#torch.torch.memory_format "torch.torch.memory_format"), optional): the desired memory format of returned Tensor. Default: `torch.preserve_format`. `to(other, non_blocking=False, copy=False) → Tensor` Returns a Tensor with same [`torch.dtype`](tensor_attributes#torch.torch.dtype "torch.torch.dtype") and [`torch.device`](tensor_attributes#torch.torch.device "torch.torch.device") as the Tensor `other`. When `non_blocking`, tries to convert asynchronously with respect to the host if possible, e.g., converting a CPU Tensor with pinned memory to a CUDA Tensor. When `copy` is set, a new Tensor is created even when the Tensor already matches the desired conversion. Example: >>> tensor = torch.randn(2, 2) # Initially dtype=float32, device=cpu >>> tensor.to(torch.float64) tensor([[-0.5044, 0.0005], [ 0.3310, -0.0584]], dtype=torch.float64) >>> cuda0 = torch.device('cuda:0') >>> tensor.to(cuda0) tensor([[-0.5044, 0.0005], [ 0.3310, -0.0584]], device='cuda:0') >>> tensor.to(cuda0, dtype=torch.float64) tensor([[-0.5044, 0.0005], [ 0.3310, -0.0584]], dtype=torch.float64, device='cuda:0') >>> other = torch.randn((), dtype=torch.float64, device=cuda0) >>> tensor.to(other, non_blocking=True) tensor([[-0.5044, 0.0005], [ 0.3310, -0.0584]], dtype=torch.float64, device='cuda:0') `to_mkldnn() → Tensor` Returns a copy of the tensor in `torch.mkldnn` layout. `take(indices) → Tensor` See [`torch.take()`](generated/torch.take#torch.take "torch.take") `tan() → Tensor` See [`torch.tan()`](generated/torch.tan#torch.tan "torch.tan") `tan_() → Tensor` In-place version of `tan()` `tanh() → Tensor` See [`torch.tanh()`](generated/torch.tanh#torch.tanh "torch.tanh") `tanh_() → Tensor` In-place version of `tanh()` `atanh() → Tensor` See [`torch.atanh()`](generated/torch.atanh#torch.atanh "torch.atanh") `atanh_(other) → Tensor` In-place version of `atanh()` `arctanh() → Tensor` See [`torch.arctanh()`](generated/torch.arctanh#torch.arctanh "torch.arctanh") `arctanh_(other) → Tensor` In-place version of `arctanh()` `tolist() → list or number` Returns the tensor as a (nested) list. For scalars, a standard Python number is returned, just like with `item()`. Tensors are automatically moved to the CPU first if necessary. This operation is not differentiable. Examples: >>> a = torch.randn(2, 2) >>> a.tolist() [[0.012766935862600803, 0.5415473580360413], [-0.08909505605697632, 0.7729271650314331]] >>> a[0,0].tolist() 0.012766935862600803 `topk(k, dim=None, largest=True, sorted=True) -> (Tensor, LongTensor)` See [`torch.topk()`](generated/torch.topk#torch.topk "torch.topk") `to_sparse(sparseDims) → Tensor` Returns a sparse copy of the tensor. PyTorch supports sparse tensors in [coordinate format](sparse#sparse-coo-docs). 
Parameters **sparseDims** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,__optional_) – the number of sparse dimensions to include in the new sparse tensor Example: >>> d = torch.tensor([[0, 0, 0], [9, 0, 10], [0, 0, 0]]) >>> d tensor([[ 0, 0, 0], [ 9, 0, 10], [ 0, 0, 0]]) >>> d.to_sparse() tensor(indices=tensor([[1, 1], [0, 2]]), values=tensor([ 9, 10]), size=(3, 3), nnz=2, layout=torch.sparse_coo) >>> d.to_sparse(1) tensor(indices=tensor([[1]]), values=tensor([[ 9, 0, 10]]), size=(3, 3), nnz=1, layout=torch.sparse_coo) `trace() → Tensor` See [`torch.trace()`](generated/torch.trace#torch.trace "torch.trace") `transpose(dim0, dim1) → Tensor` See [`torch.transpose()`](generated/torch.transpose#torch.transpose "torch.transpose") `transpose_(dim0, dim1) → Tensor` In-place version of `transpose()` `triangular_solve(A, upper=True, transpose=False, unitriangular=False) -> (Tensor, Tensor)` See [`torch.triangular_solve()`](generated/torch.triangular_solve#torch.triangular_solve "torch.triangular_solve") `tril(k=0) → Tensor` See [`torch.tril()`](generated/torch.tril#torch.tril "torch.tril") `tril_(k=0) → Tensor` In-place version of `tril()` `triu(k=0) → Tensor` See [`torch.triu()`](generated/torch.triu#torch.triu "torch.triu") `triu_(k=0) → Tensor` In-place version of `triu()` `true_divide(value) → Tensor` See [`torch.true_divide()`](generated/torch.true_divide#torch.true_divide "torch.true_divide") `true_divide_(value) → Tensor` In-place version of `true_divide_()` `trunc() → Tensor` See [`torch.trunc()`](generated/torch.trunc#torch.trunc "torch.trunc") `trunc_() → Tensor` In-place version of `trunc()` `type(dtype=None, non_blocking=False, **kwargs) → str or Tensor` Returns the type if `dtype` is not provided, else casts this object to the specified type. If this is already of the correct type, no copy is performed and the original object is returned. Parameters * **dtype** ([type](https://docs.python.org/3/library/functions.html#type "\(in Python v3.9\)") _or_ _string_) – The desired type * **non_blocking** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")) – If `True`, and the source is in pinned memory and destination is on the GPU or vice versa, the copy is performed asynchronously with respect to the host. Otherwise, the argument has no effect. * ****kwargs** – For compatibility, may contain the key `async` in place of the `non_blocking` argument. The `async` arg is deprecated. `type_as(tensor) → Tensor` Returns this tensor cast to the type of the given tensor. This is a no-op if the tensor is already of the correct type. This is equivalent to `self.type(tensor.type())` Parameters **tensor** (Tensor) – the tensor which has the desired type `unbind(dim=0) → seq` See [`torch.unbind()`](generated/torch.unbind#torch.unbind "torch.unbind") `unfold(dimension, size, step) → Tensor` Returns a view of the original tensor which contains all slices of size `size` from `self` tensor in the dimension `dimension`. Step between two slices is given by `step`. If `sizedim` is the size of dimension `dimension` for `self`, the size of dimension `dimension` in the returned tensor will be `(sizedim - size) / step + 1`. An additional dimension of size `size` is appended in the returned tensor. 
Parameters * **dimension** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – dimension in which unfolding happens * **size** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – the size of each slice that is unfolded * **step** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – the step between each slice Example: >>> x = torch.arange(1., 8) >>> x tensor([ 1., 2., 3., 4., 5., 6., 7.]) >>> x.unfold(0, 2, 1) tensor([[ 1., 2.], [ 2., 3.], [ 3., 4.], [ 4., 5.], [ 5., 6.], [ 6., 7.]]) >>> x.unfold(0, 2, 2) tensor([[ 1., 2.], [ 3., 4.], [ 5., 6.]]) `uniform_(from=0, to=1) → Tensor` Fills `self` tensor with numbers sampled from the continuous uniform distribution: P(x) = \dfrac{1}{\text{to} - \text{from}} `unique(sorted=True, return_inverse=False, return_counts=False, dim=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/tensor.html#Tensor.unique) Returns the unique elements of the input tensor. See [`torch.unique()`](generated/torch.unique#torch.unique "torch.unique") `unique_consecutive(return_inverse=False, return_counts=False, dim=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/tensor.html#Tensor.unique_consecutive) Eliminates all but the first element from every consecutive group of equivalent elements. See [`torch.unique_consecutive()`](generated/torch.unique_consecutive#torch.unique_consecutive "torch.unique_consecutive") `unsqueeze(dim) → Tensor` See [`torch.unsqueeze()`](generated/torch.unsqueeze#torch.unsqueeze "torch.unsqueeze") `unsqueeze_(dim) → Tensor` In-place version of `unsqueeze()` `values() → Tensor` Return the values tensor of a [sparse COO tensor](sparse#sparse-coo-docs). Warning Throws an error if `self` is not a sparse COO tensor. See also [`Tensor.indices()`](sparse#torch.Tensor.indices "torch.Tensor.indices"). Note This method can only be called on a coalesced sparse tensor. See [`Tensor.coalesce()`](sparse#torch.Tensor.coalesce "torch.Tensor.coalesce") for details. `var(dim=None, unbiased=True, keepdim=False) → Tensor` See [`torch.var()`](generated/torch.var#torch.var "torch.var") `vdot(other) → Tensor` See [`torch.vdot()`](generated/torch.vdot#torch.vdot "torch.vdot") `view(*shape) → Tensor` Returns a new tensor with the same data as the `self` tensor but of a different `shape`. The returned tensor shares the same data and must have the same number of elements, but may have a different size. For a tensor to be viewed, the new view size must be compatible with its original size and stride, i.e., each new view dimension must either be a subspace of an original dimension, or only span across original dimensions d, d+1, \dots, d+k that satisfy the following contiguity-like condition: \forall i = d, \dots, d+k-1, \text{stride}[i] = \text{stride}[i+1] \times \text{size}[i+1]. Otherwise, it will not be possible to view `self` tensor as `shape` without copying it (e.g., via `contiguous()`). When it is unclear whether a `view()` can be performed, it is advisable to use [`reshape()`](generated/torch.reshape#torch.reshape "torch.reshape"), which returns a view if the shapes are compatible, and copies (equivalent to calling `contiguous()`) otherwise.
Parameters **shape** (_torch.Size_ _or_ _int..._) – the desired size Example: >>> x = torch.randn(4, 4) >>> x.size() torch.Size([4, 4]) >>> y = x.view(16) >>> y.size() torch.Size([16]) >>> z = x.view(-1, 8) # the size -1 is inferred from other dimensions >>> z.size() torch.Size([2, 8]) >>> a = torch.randn(1, 2, 3, 4) >>> a.size() torch.Size([1, 2, 3, 4]) >>> b = a.transpose(1, 2) # Swaps 2nd and 3rd dimension >>> b.size() torch.Size([1, 3, 2, 4]) >>> c = a.view(1, 3, 2, 4) # Does not change tensor layout in memory >>> c.size() torch.Size([1, 3, 2, 4]) >>> torch.equal(b, c) False `view(dtype) → Tensor` Returns a new tensor with the same data as the `self` tensor but of a different `dtype`. `dtype` must have the same number of bytes per element as `self`’s dtype. Warning This overload is not supported by TorchScript, and using it in a Torchscript program will cause undefined behavior. Parameters **dtype** ([`torch.dtype`](tensor_attributes#torch.torch.dtype "torch.torch.dtype")) – the desired dtype Example: >>> x = torch.randn(4, 4) >>> x tensor([[ 0.9482, -0.0310, 1.4999, -0.5316], [-0.1520, 0.7472, 0.5617, -0.8649], [-2.4724, -0.0334, -0.2976, -0.8499], [-0.2109, 1.9913, -0.9607, -0.6123]]) >>> x.dtype torch.float32 >>> y = x.view(torch.int32) >>> y tensor([[ 1064483442, -1124191867, 1069546515, -1089989247], [-1105482831, 1061112040, 1057999968, -1084397505], [-1071760287, -1123489973, -1097310419, -1084649136], [-1101533110, 1073668768, -1082790149, -1088634448]], dtype=torch.int32) >>> y[0, 0] = 1000000000 >>> x tensor([[ 0.0047, -0.0310, 1.4999, -0.5316], [-0.1520, 0.7472, 0.5617, -0.8649], [-2.4724, -0.0334, -0.2976, -0.8499], [-0.2109, 1.9913, -0.9607, -0.6123]]) >>> x.view(torch.int16) Traceback (most recent call last): File "", line 1, in RuntimeError: Viewing a tensor as a new dtype with a different number of bytes per element is not supported. `view_as(other) → Tensor` View this tensor as the same size as `other`. `self.view_as(other)` is equivalent to `self.view(other.size())`. Please see `view()` for more information about `view`. Parameters **other** (`torch.Tensor`) – The result tensor has the same size as `other`. `where(condition, y) → Tensor` `self.where(condition, y)` is equivalent to `torch.where(condition, self, y)`. See [`torch.where()`](generated/torch.where#torch.where "torch.where") `xlogy(other) → Tensor` See [`torch.xlogy()`](generated/torch.xlogy#torch.xlogy "torch.xlogy") `xlogy_(other) → Tensor` In-place version of `xlogy()` `zero_() → Tensor` Fills `self` tensor with zeros. # torch The torch package contains data structures for multi-dimensional tensors and defines mathematical operations over these tensors. Additionally, it provides many utilities for efficient serializing of Tensors and arbitrary types, and other useful utilities. It has a CUDA counterpart, that enables you to run your tensor computations on an NVIDIA GPU with compute capability >= 3.0 ## Tensors [`is_tensor`](generated/torch.is_tensor#torch.is_tensor "torch.is_tensor") | Returns True if `obj` is a PyTorch tensor. ---|--- [`is_storage`](generated/torch.is_storage#torch.is_storage "torch.is_storage") | Returns True if `obj` is a PyTorch storage object. [`is_complex`](generated/torch.is_complex#torch.is_complex "torch.is_complex") | Returns True if the data type of `input` is a complex data type i.e., one of `torch.complex64`, and `torch.complex128`. 
[`is_floating_point`](generated/torch.is_floating_point#torch.is_floating_point "torch.is_floating_point") | Returns True if the data type of `input` is a floating point data type i.e., one of `torch.float64`, `torch.float32`, `torch.float16`, and `torch.bfloat16`. [`is_nonzero`](generated/torch.is_nonzero#torch.is_nonzero "torch.is_nonzero") | Returns True if the `input` is a single element tensor which is not equal to zero after type conversions. [`set_default_dtype`](generated/torch.set_default_dtype#torch.set_default_dtype "torch.set_default_dtype") | Sets the default floating point dtype to `d`. [`get_default_dtype`](generated/torch.get_default_dtype#torch.get_default_dtype "torch.get_default_dtype") | Get the current default floating point [`torch.dtype`](tensor_attributes#torch.torch.dtype "torch.torch.dtype"). [`set_default_tensor_type`](generated/torch.set_default_tensor_type#torch.set_default_tensor_type "torch.set_default_tensor_type") | Sets the default `torch.Tensor` type to floating point tensor type `t`. [`numel`](generated/torch.numel#torch.numel "torch.numel") | Returns the total number of elements in the `input` tensor. [`set_printoptions`](generated/torch.set_printoptions#torch.set_printoptions "torch.set_printoptions") | Set options for printing. [`set_flush_denormal`](generated/torch.set_flush_denormal#torch.set_flush_denormal "torch.set_flush_denormal") | Disables denormal floating numbers on CPU. ### Creation Ops Note Random sampling creation ops are listed under Random sampling and include: [`torch.rand()`](generated/torch.rand#torch.rand "torch.rand") [`torch.rand_like()`](generated/torch.rand_like#torch.rand_like "torch.rand_like") [`torch.randn()`](generated/torch.randn#torch.randn "torch.randn") [`torch.randn_like()`](generated/torch.randn_like#torch.randn_like "torch.randn_like") [`torch.randint()`](generated/torch.randint#torch.randint "torch.randint") [`torch.randint_like()`](generated/torch.randint_like#torch.randint_like "torch.randint_like") [`torch.randperm()`](generated/torch.randperm#torch.randperm "torch.randperm") You may also use [`torch.empty()`](generated/torch.empty#torch.empty "torch.empty") with the In-place random sampling methods to create [`torch.Tensor`](tensors#torch.Tensor "torch.Tensor") s with values sampled from a broader range of distributions. [`tensor`](generated/torch.tensor#torch.tensor "torch.tensor") | Constructs a tensor with `data`. ---|--- [`sparse_coo_tensor`](generated/torch.sparse_coo_tensor#torch.sparse_coo_tensor "torch.sparse_coo_tensor") | Constructs a [sparse tensor in COO(rdinate) format](sparse#sparse-coo-docs) with specified values at the given `indices`. [`as_tensor`](generated/torch.as_tensor#torch.as_tensor "torch.as_tensor") | Convert the data into a `torch.Tensor`. [`as_strided`](generated/torch.as_strided#torch.as_strided "torch.as_strided") | Create a view of an existing `torch.Tensor` `input` with specified `size`, `stride` and `storage_offset`. [`from_numpy`](generated/torch.from_numpy#torch.from_numpy "torch.from_numpy") | Creates a [`Tensor`](tensors#torch.Tensor "torch.Tensor") from a [`numpy.ndarray`](https://numpy.org/doc/stable/reference/generated/numpy.ndarray.html#numpy.ndarray "\(in NumPy v1.20\)"). [`zeros`](generated/torch.zeros#torch.zeros "torch.zeros") | Returns a tensor filled with the scalar value `0`, with the shape defined by the variable argument `size`. 
[`zeros_like`](generated/torch.zeros_like#torch.zeros_like "torch.zeros_like") | Returns a tensor filled with the scalar value `0`, with the same size as `input`. [`ones`](generated/torch.ones#torch.ones "torch.ones") | Returns a tensor filled with the scalar value `1`, with the shape defined by the variable argument `size`. [`ones_like`](generated/torch.ones_like#torch.ones_like "torch.ones_like") | Returns a tensor filled with the scalar value `1`, with the same size as `input`. [`arange`](generated/torch.arange#torch.arange "torch.arange") | Returns a 1-D tensor of size \left\lceil \frac{\text{end} - \text{start}}{\text{step}} \right\rceil with values from the interval `[start, end)` taken with common difference `step` beginning from `start`. [`range`](generated/torch.range#torch.range "torch.range") | Returns a 1-D tensor of size \left\lfloor \frac{\text{end} - \text{start}}{\text{step}} \right\rfloor + 1 with values from `start` to `end` with step `step`. [`linspace`](generated/torch.linspace#torch.linspace "torch.linspace") | Creates a one-dimensional tensor of size `steps` whose values are evenly spaced from `start` to `end`, inclusive. [`logspace`](generated/torch.logspace#torch.logspace "torch.logspace") | Creates a one-dimensional tensor of size `steps` whose values are evenly spaced from \text{base}^{\text{start}} to \text{base}^{\text{end}}, inclusive, on a logarithmic scale with base `base`. [`eye`](generated/torch.eye#torch.eye "torch.eye") | Returns a 2-D tensor with ones on the diagonal and zeros elsewhere. [`empty`](generated/torch.empty#torch.empty "torch.empty") | Returns a tensor filled with uninitialized data. [`empty_like`](generated/torch.empty_like#torch.empty_like "torch.empty_like") | Returns an uninitialized tensor with the same size as `input`. [`empty_strided`](generated/torch.empty_strided#torch.empty_strided "torch.empty_strided") | Returns a tensor filled with uninitialized data. [`full`](generated/torch.full#torch.full "torch.full") | Creates a tensor of size `size` filled with `fill_value`. [`full_like`](generated/torch.full_like#torch.full_like "torch.full_like") | Returns a tensor with the same size as `input` filled with `fill_value`. [`quantize_per_tensor`](generated/torch.quantize_per_tensor#torch.quantize_per_tensor "torch.quantize_per_tensor") | Converts a float tensor to a quantized tensor with given scale and zero point. [`quantize_per_channel`](generated/torch.quantize_per_channel#torch.quantize_per_channel "torch.quantize_per_channel") | Converts a float tensor to a per-channel quantized tensor with given scales and zero points. [`dequantize`](generated/torch.dequantize#torch.dequantize "torch.dequantize") | Returns an fp32 Tensor by dequantizing a quantized Tensor [`complex`](generated/torch.complex#torch.complex "torch.complex") | Constructs a complex tensor with its real part equal to [`real`](generated/torch.real#torch.real "torch.real") and its imaginary part equal to [`imag`](generated/torch.imag#torch.imag "torch.imag"). [`polar`](generated/torch.polar#torch.polar "torch.polar") | Constructs a complex tensor whose elements are Cartesian coordinates corresponding to the polar coordinates with absolute value [`abs`](generated/torch.abs#torch.abs "torch.abs") and angle [`angle`](generated/torch.angle#torch.angle "torch.angle"). [`heaviside`](generated/torch.heaviside#torch.heaviside "torch.heaviside") | Computes the Heaviside step function for each element in `input`.
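As an informal illustration (a quick sketch, not part of the reference tables above), a few of these creation ops in use:

>>> torch.arange(0, 10, 3)         # 1-D, size = ceil((end - start) / step)
tensor([0, 3, 6, 9])
>>> torch.linspace(0, 1, steps=5)  # evenly spaced, endpoints inclusive
tensor([0.0000, 0.2500, 0.5000, 0.7500, 1.0000])
>>> torch.full((2, 3), 7.0)        # constant fill
tensor([[7., 7., 7.],
        [7., 7., 7.]])
>>> torch.eye(3)                   # 2-D identity
tensor([[1., 0., 0.],
        [0., 1., 0.],
        [0., 0., 1.]])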
### Indexing, Slicing, Joining, Mutating Ops [`cat`](generated/torch.cat#torch.cat "torch.cat") | Concatenates the given sequence of `seq` tensors in the given dimension. ---|--- [`chunk`](generated/torch.chunk#torch.chunk "torch.chunk") | Splits a tensor into a specific number of chunks. [`column_stack`](generated/torch.column_stack#torch.column_stack "torch.column_stack") | Creates a new tensor by horizontally stacking the tensors in `tensors`. [`dstack`](generated/torch.dstack#torch.dstack "torch.dstack") | Stack tensors in sequence depthwise (along third axis). [`gather`](generated/torch.gather#torch.gather "torch.gather") | Gathers values along an axis specified by `dim`. [`hstack`](generated/torch.hstack#torch.hstack "torch.hstack") | Stack tensors in sequence horizontally (column wise). [`index_select`](generated/torch.index_select#torch.index_select "torch.index_select") | Returns a new tensor which indexes the `input` tensor along dimension `dim` using the entries in `index` which is a `LongTensor`. [`masked_select`](generated/torch.masked_select#torch.masked_select "torch.masked_select") | Returns a new 1-D tensor which indexes the `input` tensor according to the boolean mask `mask` which is a `BoolTensor`. [`movedim`](generated/torch.movedim#torch.movedim "torch.movedim") | Moves the dimension(s) of `input` at the position(s) in `source` to the position(s) in `destination`. [`moveaxis`](generated/torch.moveaxis#torch.moveaxis "torch.moveaxis") | Alias for [`torch.movedim()`](generated/torch.movedim#torch.movedim "torch.movedim"). [`narrow`](generated/torch.narrow#torch.narrow "torch.narrow") | Returns a new tensor that is a narrowed version of `input` tensor. [`nonzero`](generated/torch.nonzero#torch.nonzero "torch.nonzero") | [`reshape`](generated/torch.reshape#torch.reshape "torch.reshape") | Returns a tensor with the same data and number of elements as `input`, but with the specified shape. [`row_stack`](generated/torch.row_stack#torch.row_stack "torch.row_stack") | Alias of [`torch.vstack()`](generated/torch.vstack#torch.vstack "torch.vstack"). [`scatter`](generated/torch.scatter#torch.scatter "torch.scatter") | Out-of-place version of [`torch.Tensor.scatter_()`](tensors#torch.Tensor.scatter_ "torch.Tensor.scatter_") [`scatter_add`](generated/torch.scatter_add#torch.scatter_add "torch.scatter_add") | Out-of-place version of [`torch.Tensor.scatter_add_()`](tensors#torch.Tensor.scatter_add_ "torch.Tensor.scatter_add_") [`split`](generated/torch.split#torch.split "torch.split") | Splits the tensor into chunks. [`squeeze`](generated/torch.squeeze#torch.squeeze "torch.squeeze") | Returns a tensor with all the dimensions of `input` of size `1` removed. [`stack`](generated/torch.stack#torch.stack "torch.stack") | Concatenates a sequence of tensors along a new dimension. [`swapaxes`](generated/torch.swapaxes#torch.swapaxes "torch.swapaxes") | Alias for [`torch.transpose()`](generated/torch.transpose#torch.transpose "torch.transpose"). [`swapdims`](generated/torch.swapdims#torch.swapdims "torch.swapdims") | Alias for [`torch.transpose()`](generated/torch.transpose#torch.transpose "torch.transpose"). [`t`](generated/torch.t#torch.t "torch.t") | Expects `input` to be <= 2-D tensor and transposes dimensions 0 and 1. [`take`](generated/torch.take#torch.take "torch.take") | Returns a new tensor with the elements of `input` at the given indices. 
[`tensor_split`](generated/torch.tensor_split#torch.tensor_split "torch.tensor_split") | Splits a tensor into multiple sub-tensors, all of which are views of `input`, along dimension `dim` according to the indices or number of sections specified by `indices_or_sections`. [`tile`](generated/torch.tile#torch.tile "torch.tile") | Constructs a tensor by repeating the elements of `input`. [`transpose`](generated/torch.transpose#torch.transpose "torch.transpose") | Returns a tensor that is a transposed version of `input`. [`unbind`](generated/torch.unbind#torch.unbind "torch.unbind") | Removes a tensor dimension. [`unsqueeze`](generated/torch.unsqueeze#torch.unsqueeze "torch.unsqueeze") | Returns a new tensor with a dimension of size one inserted at the specified position. [`vstack`](generated/torch.vstack#torch.vstack "torch.vstack") | Stack tensors in sequence vertically (row wise). [`where`](generated/torch.where#torch.where "torch.where") | Return a tensor of elements selected from either `x` or `y`, depending on `condition`. ## Generators [`Generator`](generated/torch.generator#torch.Generator "torch.Generator") | Creates and returns a generator object that manages the state of the algorithm which produces pseudo random numbers. ---|--- ## Random sampling [`seed`](generated/torch.seed#torch.seed "torch.seed") | Sets the seed for generating random numbers to a non-deterministic random number. ---|--- [`manual_seed`](generated/torch.manual_seed#torch.manual_seed "torch.manual_seed") | Sets the seed for generating random numbers. [`initial_seed`](generated/torch.initial_seed#torch.initial_seed "torch.initial_seed") | Returns the initial seed for generating random numbers as a Python `long`. [`get_rng_state`](generated/torch.get_rng_state#torch.get_rng_state "torch.get_rng_state") | Returns the random number generator state as a `torch.ByteTensor`. [`set_rng_state`](generated/torch.set_rng_state#torch.set_rng_state "torch.set_rng_state") | Sets the random number generator state. `torch.default_generator Returns the default CPU torch.Generator` [`bernoulli`](generated/torch.bernoulli#torch.bernoulli "torch.bernoulli") | Draws binary random numbers (0 or 1) from a Bernoulli distribution. ---|--- [`multinomial`](generated/torch.multinomial#torch.multinomial "torch.multinomial") | Returns a tensor where each row contains `num_samples` indices sampled from the multinomial probability distribution located in the corresponding row of tensor `input`. [`normal`](generated/torch.normal#torch.normal "torch.normal") | Returns a tensor of random numbers drawn from separate normal distributions whose mean and standard deviation are given. [`poisson`](generated/torch.poisson#torch.poisson "torch.poisson") | Returns a tensor of the same size as `input` with each element sampled from a Poisson distribution with rate parameter given by the corresponding element in `input` i.e., [`rand`](generated/torch.rand#torch.rand "torch.rand") | Returns a tensor filled with random numbers from a uniform distribution on the interval [0,1)[0, 1) [`rand_like`](generated/torch.rand_like#torch.rand_like "torch.rand_like") | Returns a tensor with the same size as `input` that is filled with random numbers from a uniform distribution on the interval [0,1)[0, 1) . [`randint`](generated/torch.randint#torch.randint "torch.randint") | Returns a tensor filled with random integers generated uniformly between `low` (inclusive) and `high` (exclusive). 
[`randint_like`](generated/torch.randint_like#torch.randint_like "torch.randint_like") | Returns a tensor with the same shape as Tensor `input` filled with random integers generated uniformly between `low` (inclusive) and `high` (exclusive). [`randn`](generated/torch.randn#torch.randn "torch.randn") | Returns a tensor filled with random numbers from a normal distribution with mean `0` and variance `1` (also called the standard normal distribution). [`randn_like`](generated/torch.randn_like#torch.randn_like "torch.randn_like") | Returns a tensor with the same size as `input` that is filled with random numbers from a normal distribution with mean 0 and variance 1. [`randperm`](generated/torch.randperm#torch.randperm "torch.randperm") | Returns a random permutation of integers from `0` to `n - 1`. ### In-place random sampling There are a few more in-place random sampling functions defined on Tensors as well. Click through to refer to their documentation: * [`torch.Tensor.bernoulli_()`](tensors#torch.Tensor.bernoulli_ "torch.Tensor.bernoulli_") \- in-place version of [`torch.bernoulli()`](generated/torch.bernoulli#torch.bernoulli "torch.bernoulli") * [`torch.Tensor.cauchy_()`](tensors#torch.Tensor.cauchy_ "torch.Tensor.cauchy_") \- numbers drawn from the Cauchy distribution * [`torch.Tensor.exponential_()`](tensors#torch.Tensor.exponential_ "torch.Tensor.exponential_") \- numbers drawn from the exponential distribution * [`torch.Tensor.geometric_()`](tensors#torch.Tensor.geometric_ "torch.Tensor.geometric_") \- elements drawn from the geometric distribution * [`torch.Tensor.log_normal_()`](tensors#torch.Tensor.log_normal_ "torch.Tensor.log_normal_") \- samples from the log-normal distribution * [`torch.Tensor.normal_()`](tensors#torch.Tensor.normal_ "torch.Tensor.normal_") \- in-place version of [`torch.normal()`](generated/torch.normal#torch.normal "torch.normal") * [`torch.Tensor.random_()`](tensors#torch.Tensor.random_ "torch.Tensor.random_") \- numbers sampled from the discrete uniform distribution * [`torch.Tensor.uniform_()`](tensors#torch.Tensor.uniform_ "torch.Tensor.uniform_") \- numbers sampled from the continuous uniform distribution ### Quasi-random sampling [`quasirandom.SobolEngine`](generated/torch.quasirandom.sobolengine#torch.quasirandom.SobolEngine "torch.quasirandom.SobolEngine") | The [`torch.quasirandom.SobolEngine`](generated/torch.quasirandom.sobolengine#torch.quasirandom.SobolEngine "torch.quasirandom.SobolEngine") is an engine for generating (scrambled) Sobol sequences. ---|--- ## Serialization [`save`](generated/torch.save#torch.save "torch.save") | Saves an object to a disk file. ---|--- [`load`](generated/torch.load#torch.load "torch.load") | Loads an object saved with [`torch.save()`](generated/torch.save#torch.save "torch.save") from a file. ## Parallelism [`get_num_threads`](generated/torch.get_num_threads#torch.get_num_threads "torch.get_num_threads") | Returns the number of threads used for parallelizing CPU operations ---|--- [`set_num_threads`](generated/torch.set_num_threads#torch.set_num_threads "torch.set_num_threads") | Sets the number of threads used for intraop parallelism on CPU. [`get_num_interop_threads`](generated/torch.get_num_interop_threads#torch.get_num_interop_threads "torch.get_num_interop_threads") | Returns the number of threads used for inter-op parallelism on CPU (e.g. 
[`set_num_interop_threads`](generated/torch.set_num_interop_threads#torch.set_num_interop_threads "torch.set_num_interop_threads") | Sets the number of threads used for interop parallelism (e.g. ## Locally disabling gradient computation The context managers [`torch.no_grad()`](generated/torch.no_grad#torch.no_grad "torch.no_grad"), [`torch.enable_grad()`](generated/torch.enable_grad#torch.enable_grad "torch.enable_grad"), and [`torch.set_grad_enabled()`](generated/torch.set_grad_enabled#torch.set_grad_enabled "torch.set_grad_enabled") are helpful for locally disabling and enabling gradient computation. See [Locally disabling gradient computation](autograd#locally-disable-grad) for more details on their usage. These context managers are thread local, so they won’t work if you send work to another thread using the `threading` module, etc. Examples: >>> x = torch.zeros(1, requires_grad=True) >>> with torch.no_grad(): ... y = x * 2 >>> y.requires_grad False >>> is_train = False >>> with torch.set_grad_enabled(is_train): ... y = x * 2 >>> y.requires_grad False >>> torch.set_grad_enabled(True) # this can also be used as a function >>> y = x * 2 >>> y.requires_grad True >>> torch.set_grad_enabled(False) >>> y = x * 2 >>> y.requires_grad False [`no_grad`](generated/torch.no_grad#torch.no_grad "torch.no_grad") | Context-manager that disables gradient calculation. ---|--- [`enable_grad`](generated/torch.enable_grad#torch.enable_grad "torch.enable_grad") | Context-manager that enables gradient calculation. [`set_grad_enabled`](generated/torch.set_grad_enabled#torch.set_grad_enabled "torch.set_grad_enabled") | Context-manager that sets gradient calculation to on or off. ## Math operations ### Pointwise Ops [`abs`](generated/torch.abs#torch.abs "torch.abs") | Computes the absolute value of each element in `input`. ---|--- [`absolute`](generated/torch.absolute#torch.absolute "torch.absolute") | Alias for [`torch.abs()`](generated/torch.abs#torch.abs "torch.abs") [`acos`](generated/torch.acos#torch.acos "torch.acos") | Computes the inverse cosine of each element in `input`. [`arccos`](generated/torch.arccos#torch.arccos "torch.arccos") | Alias for [`torch.acos()`](generated/torch.acos#torch.acos "torch.acos"). [`acosh`](generated/torch.acosh#torch.acosh "torch.acosh") | Returns a new tensor with the inverse hyperbolic cosine of the elements of `input`. [`arccosh`](generated/torch.arccosh#torch.arccosh "torch.arccosh") | Alias for [`torch.acosh()`](generated/torch.acosh#torch.acosh "torch.acosh"). [`add`](generated/torch.add#torch.add "torch.add") | Adds the scalar `other` to each element of the input `input` and returns a new resulting tensor. [`addcdiv`](generated/torch.addcdiv#torch.addcdiv "torch.addcdiv") | Performs the element-wise division of `tensor1` by `tensor2`, multiplies the result by the scalar `value` and adds it to `input`. [`addcmul`](generated/torch.addcmul#torch.addcmul "torch.addcmul") | Performs the element-wise multiplication of `tensor1` by `tensor2`, multiplies the result by the scalar `value` and adds it to `input`. [`angle`](generated/torch.angle#torch.angle "torch.angle") | Computes the element-wise angle (in radians) of the given `input` tensor. [`asin`](generated/torch.asin#torch.asin "torch.asin") | Returns a new tensor with the arcsine of the elements of `input`. [`arcsin`](generated/torch.arcsin#torch.arcsin "torch.arcsin") | Alias for [`torch.asin()`](generated/torch.asin#torch.asin "torch.asin").
[`asinh`](generated/torch.asinh#torch.asinh "torch.asinh") | Returns a new tensor with the inverse hyperbolic sine of the elements of `input`. [`arcsinh`](generated/torch.arcsinh#torch.arcsinh "torch.arcsinh") | Alias for [`torch.asinh()`](generated/torch.asinh#torch.asinh "torch.asinh"). [`atan`](generated/torch.atan#torch.atan "torch.atan") | Returns a new tensor with the arctangent of the elements of `input`. [`arctan`](generated/torch.arctan#torch.arctan "torch.arctan") | Alias for [`torch.atan()`](generated/torch.atan#torch.atan "torch.atan"). [`atanh`](generated/torch.atanh#torch.atanh "torch.atanh") | Returns a new tensor with the inverse hyperbolic tangent of the elements of `input`. [`arctanh`](generated/torch.arctanh#torch.arctanh "torch.arctanh") | Alias for [`torch.atanh()`](generated/torch.atanh#torch.atanh "torch.atanh"). [`atan2`](generated/torch.atan2#torch.atan2 "torch.atan2") | Element-wise arctangent of \text{input}_{i} / \text{other}_{i} with consideration of the quadrant. [`bitwise_not`](generated/torch.bitwise_not#torch.bitwise_not "torch.bitwise_not") | Computes the bitwise NOT of the given input tensor. [`bitwise_and`](generated/torch.bitwise_and#torch.bitwise_and "torch.bitwise_and") | Computes the bitwise AND of `input` and `other`. [`bitwise_or`](generated/torch.bitwise_or#torch.bitwise_or "torch.bitwise_or") | Computes the bitwise OR of `input` and `other`. [`bitwise_xor`](generated/torch.bitwise_xor#torch.bitwise_xor "torch.bitwise_xor") | Computes the bitwise XOR of `input` and `other`. [`ceil`](generated/torch.ceil#torch.ceil "torch.ceil") | Returns a new tensor with the ceil of the elements of `input`, the smallest integer greater than or equal to each element. [`clamp`](generated/torch.clamp#torch.clamp "torch.clamp") | Clamp all elements in `input` into the range `[` [`min`](generated/torch.min#torch.min "torch.min"), [`max`](generated/torch.max#torch.max "torch.max") `]`. [`clip`](generated/torch.clip#torch.clip "torch.clip") | Alias for [`torch.clamp()`](generated/torch.clamp#torch.clamp "torch.clamp"). [`conj`](generated/torch.conj#torch.conj "torch.conj") | Computes the element-wise conjugate of the given `input` tensor. [`copysign`](generated/torch.copysign#torch.copysign "torch.copysign") | Create a new floating-point tensor with the magnitude of `input` and the sign of `other`, elementwise. [`cos`](generated/torch.cos#torch.cos "torch.cos") | Returns a new tensor with the cosine of the elements of `input`. [`cosh`](generated/torch.cosh#torch.cosh "torch.cosh") | Returns a new tensor with the hyperbolic cosine of the elements of `input`. [`deg2rad`](generated/torch.deg2rad#torch.deg2rad "torch.deg2rad") | Returns a new tensor with each of the elements of `input` converted from angles in degrees to radians. [`div`](generated/torch.div#torch.div "torch.div") | Divides each element of the input `input` by the corresponding element of `other`. [`divide`](generated/torch.divide#torch.divide "torch.divide") | Alias for [`torch.div()`](generated/torch.div#torch.div "torch.div"). [`digamma`](generated/torch.digamma#torch.digamma "torch.digamma") | Computes the logarithmic derivative of the gamma function on `input`. [`erf`](generated/torch.erf#torch.erf "torch.erf") | Computes the error function of each element. [`erfc`](generated/torch.erfc#torch.erfc "torch.erfc") | Computes the complementary error function of each element of `input`.
[`erfinv`](generated/torch.erfinv#torch.erfinv "torch.erfinv") | Computes the inverse error function of each element of `input`. [`exp`](generated/torch.exp#torch.exp "torch.exp") | Returns a new tensor with the exponential of the elements of the input tensor `input`. [`exp2`](generated/torch.exp2#torch.exp2 "torch.exp2") | Computes the base two exponential function of `input`. [`expm1`](generated/torch.expm1#torch.expm1 "torch.expm1") | Returns a new tensor with the exponential of the elements minus 1 of `input`. [`fake_quantize_per_channel_affine`](generated/torch.fake_quantize_per_channel_affine#torch.fake_quantize_per_channel_affine "torch.fake_quantize_per_channel_affine") | Returns a new tensor with the data in `input` fake quantized per channel using `scale`, `zero_point`, `quant_min` and `quant_max`, across the channel specified by `axis`. [`fake_quantize_per_tensor_affine`](generated/torch.fake_quantize_per_tensor_affine#torch.fake_quantize_per_tensor_affine "torch.fake_quantize_per_tensor_affine") | Returns a new tensor with the data in `input` fake quantized using `scale`, `zero_point`, `quant_min` and `quant_max`. [`fix`](generated/torch.fix#torch.fix "torch.fix") | Alias for [`torch.trunc()`](generated/torch.trunc#torch.trunc "torch.trunc") [`float_power`](generated/torch.float_power#torch.float_power "torch.float_power") | Raises `input` to the power of `exponent`, elementwise, in double precision. [`floor`](generated/torch.floor#torch.floor "torch.floor") | Returns a new tensor with the floor of the elements of `input`, the largest integer less than or equal to each element. [`floor_divide`](generated/torch.floor_divide#torch.floor_divide "torch.floor_divide") | [`fmod`](generated/torch.fmod#torch.fmod "torch.fmod") | Computes the element-wise remainder of division. [`frac`](generated/torch.frac#torch.frac "torch.frac") | Computes the fractional portion of each element in `input`. [`imag`](generated/torch.imag#torch.imag "torch.imag") | Returns a new tensor containing imaginary values of the `self` tensor. [`ldexp`](generated/torch.ldexp#torch.ldexp "torch.ldexp") | Multiplies `input` by 2**:attr:`other`. [`lerp`](generated/torch.lerp#torch.lerp "torch.lerp") | Does a linear interpolation of two tensors `start` (given by `input`) and `end` based on a scalar or tensor `weight` and returns the resulting `out` tensor. [`lgamma`](generated/torch.lgamma#torch.lgamma "torch.lgamma") | Computes the logarithm of the gamma function on `input`. [`log`](generated/torch.log#torch.log "torch.log") | Returns a new tensor with the natural logarithm of the elements of `input`. [`log10`](generated/torch.log10#torch.log10 "torch.log10") | Returns a new tensor with the logarithm to the base 10 of the elements of `input`. [`log1p`](generated/torch.log1p#torch.log1p "torch.log1p") | Returns a new tensor with the natural logarithm of (1 + `input`). [`log2`](generated/torch.log2#torch.log2 "torch.log2") | Returns a new tensor with the logarithm to the base 2 of the elements of `input`. [`logaddexp`](generated/torch.logaddexp#torch.logaddexp "torch.logaddexp") | Logarithm of the sum of exponentiations of the inputs. [`logaddexp2`](generated/torch.logaddexp2#torch.logaddexp2 "torch.logaddexp2") | Logarithm of the sum of exponentiations of the inputs in base-2. [`logical_and`](generated/torch.logical_and#torch.logical_and "torch.logical_and") | Computes the element-wise logical AND of the given input tensors. 
[`logical_not`](generated/torch.logical_not#torch.logical_not "torch.logical_not") | Computes the element-wise logical NOT of the given input tensor. [`logical_or`](generated/torch.logical_or#torch.logical_or "torch.logical_or") | Computes the element-wise logical OR of the given input tensors. [`logical_xor`](generated/torch.logical_xor#torch.logical_xor "torch.logical_xor") | Computes the element-wise logical XOR of the given input tensors. [`logit`](generated/torch.logit#torch.logit "torch.logit") | Returns a new tensor with the logit of the elements of `input`. [`hypot`](generated/torch.hypot#torch.hypot "torch.hypot") | Given the legs of a right triangle, return its hypotenuse. [`i0`](generated/torch.i0#torch.i0 "torch.i0") | Computes the zeroth order modified Bessel function of the first kind for each element of `input`. [`igamma`](generated/torch.igamma#torch.igamma "torch.igamma") | Computes the regularized lower incomplete gamma function: [`igammac`](generated/torch.igammac#torch.igammac "torch.igammac") | Computes the regularized upper incomplete gamma function: [`mul`](generated/torch.mul#torch.mul "torch.mul") | Multiplies each element of the input `input` with the scalar `other` and returns a new resulting tensor. [`multiply`](generated/torch.multiply#torch.multiply "torch.multiply") | Alias for [`torch.mul()`](generated/torch.mul#torch.mul "torch.mul"). [`mvlgamma`](generated/torch.mvlgamma#torch.mvlgamma "torch.mvlgamma") | Computes the [multivariate log-gamma function](https://en.wikipedia.org/wiki/Multivariate_gamma_function)) with dimension pp element-wise, given by [`nan_to_num`](generated/torch.nan_to_num#torch.nan_to_num "torch.nan_to_num") | Replaces `NaN`, positive infinity, and negative infinity values in `input` with the values specified by `nan`, `posinf`, and `neginf`, respectively. [`neg`](generated/torch.neg#torch.neg "torch.neg") | Returns a new tensor with the negative of the elements of `input`. [`negative`](generated/torch.negative#torch.negative "torch.negative") | Alias for [`torch.neg()`](generated/torch.neg#torch.neg "torch.neg") [`nextafter`](generated/torch.nextafter#torch.nextafter "torch.nextafter") | Return the next floating-point value after `input` towards `other`, elementwise. [`polygamma`](generated/torch.polygamma#torch.polygamma "torch.polygamma") | Computes the nthn^{th} derivative of the digamma function on `input`. [`pow`](generated/torch.pow#torch.pow "torch.pow") | Takes the power of each element in `input` with `exponent` and returns a tensor with the result. [`rad2deg`](generated/torch.rad2deg#torch.rad2deg "torch.rad2deg") | Returns a new tensor with each of the elements of `input` converted from angles in radians to degrees. [`real`](generated/torch.real#torch.real "torch.real") | Returns a new tensor containing real values of the `self` tensor. [`reciprocal`](generated/torch.reciprocal#torch.reciprocal "torch.reciprocal") | Returns a new tensor with the reciprocal of the elements of `input` [`remainder`](generated/torch.remainder#torch.remainder "torch.remainder") | Computes the element-wise remainder of division. [`round`](generated/torch.round#torch.round "torch.round") | Returns a new tensor with each of the elements of `input` rounded to the closest integer. [`rsqrt`](generated/torch.rsqrt#torch.rsqrt "torch.rsqrt") | Returns a new tensor with the reciprocal of the square-root of each of the elements of `input`. 
[`sigmoid`](generated/torch.sigmoid#torch.sigmoid "torch.sigmoid") | Returns a new tensor with the sigmoid of the elements of `input`. [`sign`](generated/torch.sign#torch.sign "torch.sign") | Returns a new tensor with the signs of the elements of `input`. [`sgn`](generated/torch.sgn#torch.sgn "torch.sgn") | For complex tensors, this function returns a new tensor whose elements have the same angle as that of the elements of `input` and absolute value 1. [`signbit`](generated/torch.signbit#torch.signbit "torch.signbit") | Tests if each element of `input` has its sign bit set (is less than zero) or not. [`sin`](generated/torch.sin#torch.sin "torch.sin") | Returns a new tensor with the sine of the elements of `input`. [`sinc`](generated/torch.sinc#torch.sinc "torch.sinc") | Computes the normalized sinc of `input`. [`sinh`](generated/torch.sinh#torch.sinh "torch.sinh") | Returns a new tensor with the hyperbolic sine of the elements of `input`. [`sqrt`](generated/torch.sqrt#torch.sqrt "torch.sqrt") | Returns a new tensor with the square-root of the elements of `input`. [`square`](generated/torch.square#torch.square "torch.square") | Returns a new tensor with the square of the elements of `input`. [`sub`](generated/torch.sub#torch.sub "torch.sub") | Subtracts `other`, scaled by `alpha`, from `input`. [`subtract`](generated/torch.subtract#torch.subtract "torch.subtract") | Alias for [`torch.sub()`](generated/torch.sub#torch.sub "torch.sub"). [`tan`](generated/torch.tan#torch.tan "torch.tan") | Returns a new tensor with the tangent of the elements of `input`. [`tanh`](generated/torch.tanh#torch.tanh "torch.tanh") | Returns a new tensor with the hyperbolic tangent of the elements of `input`. [`true_divide`](generated/torch.true_divide#torch.true_divide "torch.true_divide") | Alias for [`torch.div()`](generated/torch.div#torch.div "torch.div") with `rounding_mode=None`. [`trunc`](generated/torch.trunc#torch.trunc "torch.trunc") | Returns a new tensor with the truncated integer values of the elements of `input`. [`xlogy`](generated/torch.xlogy#torch.xlogy "torch.xlogy") | Computes `input * log(other)` with the following cases. ### Reduction Ops [`argmax`](generated/torch.argmax#torch.argmax "torch.argmax") | Returns the indices of the maximum value of all elements in the `input` tensor. ---|--- [`argmin`](generated/torch.argmin#torch.argmin "torch.argmin") | Returns the indices of the minimum value(s) of the flattened tensor or along a dimension. [`amax`](generated/torch.amax#torch.amax "torch.amax") | Returns the maximum value of each slice of the `input` tensor in the given dimension(s) `dim`. [`amin`](generated/torch.amin#torch.amin "torch.amin") | Returns the minimum value of each slice of the `input` tensor in the given dimension(s) `dim`. [`all`](generated/torch.all#torch.all "torch.all") | Tests if all elements in `input` evaluate to `True`. [`any`](generated/torch.any#torch.any "torch.any") | Tests if any element in `input` evaluates to `True`. [`max`](generated/torch.max#torch.max "torch.max") | Returns the maximum value of all elements in the `input` tensor. [`min`](generated/torch.min#torch.min "torch.min") | Returns the minimum value of all elements in the `input` tensor. [`dist`](generated/torch.dist#torch.dist "torch.dist") | Returns the p-norm of (`input` \- `other`) [`logsumexp`](generated/torch.logsumexp#torch.logsumexp "torch.logsumexp") | Returns the log of summed exponentials of each row of the `input` tensor in the given dimension `dim`.
[`mean`](generated/torch.mean#torch.mean "torch.mean") | Returns the mean value of all elements in the `input` tensor. [`median`](generated/torch.median#torch.median "torch.median") | Returns the median of the values in `input`. [`nanmedian`](generated/torch.nanmedian#torch.nanmedian "torch.nanmedian") | Returns the median of the values in `input`, ignoring `NaN` values. [`mode`](generated/torch.mode#torch.mode "torch.mode") | Returns a namedtuple `(values, indices)` where `values` is the mode value of each row of the `input` tensor in the given dimension `dim`, i.e. [`norm`](generated/torch.norm#torch.norm "torch.norm") | Returns the matrix norm or vector norm of a given tensor. [`nansum`](generated/torch.nansum#torch.nansum "torch.nansum") | Returns the sum of all elements, treating Not a Numbers (NaNs) as zero. [`prod`](generated/torch.prod#torch.prod "torch.prod") | Returns the product of all elements in the `input` tensor. [`quantile`](generated/torch.quantile#torch.quantile "torch.quantile") | Returns the q-th quantiles of all elements in the `input` tensor, doing a linear interpolation when the q-th quantile lies between two data points. [`nanquantile`](generated/torch.nanquantile#torch.nanquantile "torch.nanquantile") | This is a variant of [`torch.quantile()`](generated/torch.quantile#torch.quantile "torch.quantile") that “ignores” `NaN` values, computing the quantiles `q` as if `NaN` values in `input` did not exist. [`std`](generated/torch.std#torch.std "torch.std") | Returns the standard-deviation of all elements in the `input` tensor. [`std_mean`](generated/torch.std_mean#torch.std_mean "torch.std_mean") | Returns the standard-deviation and mean of all elements in the `input` tensor. [`sum`](generated/torch.sum#torch.sum "torch.sum") | Returns the sum of all elements in the `input` tensor. [`unique`](generated/torch.unique#torch.unique "torch.unique") | Returns the unique elements of the input tensor. [`unique_consecutive`](generated/torch.unique_consecutive#torch.unique_consecutive "torch.unique_consecutive") | Eliminates all but the first element from every consecutive group of equivalent elements. [`var`](generated/torch.var#torch.var "torch.var") | Returns the variance of all elements in the `input` tensor. [`var_mean`](generated/torch.var_mean#torch.var_mean "torch.var_mean") | Returns the variance and mean of all elements in the `input` tensor. [`count_nonzero`](generated/torch.count_nonzero#torch.count_nonzero "torch.count_nonzero") | Counts the number of non-zero values in the tensor `input` along the given `dim`. ### Comparison Ops [`allclose`](generated/torch.allclose#torch.allclose "torch.allclose") | This function checks if all `input` and `other` satisfy the condition: ---|--- [`argsort`](generated/torch.argsort#torch.argsort "torch.argsort") | Returns the indices that sort a tensor along a given dimension in ascending order by value. [`eq`](generated/torch.eq#torch.eq "torch.eq") | Computes element-wise equality [`equal`](generated/torch.equal#torch.equal "torch.equal") | `True` if two tensors have the same size and elements, `False` otherwise. [`ge`](generated/torch.ge#torch.ge "torch.ge") | Computes input≥other\text{input} \geq \text{other} element-wise. [`greater_equal`](generated/torch.greater_equal#torch.greater_equal "torch.greater_equal") | Alias for [`torch.ge()`](generated/torch.ge#torch.ge "torch.ge"). [`gt`](generated/torch.gt#torch.gt "torch.gt") | Computes input>other\text{input} > \text{other} element-wise. 
[`greater`](generated/torch.greater#torch.greater "torch.greater") | Alias for [`torch.gt()`](generated/torch.gt#torch.gt "torch.gt"). [`isclose`](generated/torch.isclose#torch.isclose "torch.isclose") | Returns a new tensor with boolean elements representing if each element of `input` is “close” to the corresponding element of `other`. [`isfinite`](generated/torch.isfinite#torch.isfinite "torch.isfinite") | Returns a new tensor with boolean elements representing if each element is `finite` or not. [`isinf`](generated/torch.isinf#torch.isinf "torch.isinf") | Tests if each element of `input` is infinite (positive or negative infinity) or not. [`isposinf`](generated/torch.isposinf#torch.isposinf "torch.isposinf") | Tests if each element of `input` is positive infinity or not. [`isneginf`](generated/torch.isneginf#torch.isneginf "torch.isneginf") | Tests if each element of `input` is negative infinity or not. [`isnan`](generated/torch.isnan#torch.isnan "torch.isnan") | Returns a new tensor with boolean elements representing if each element of `input` is NaN or not. [`isreal`](generated/torch.isreal#torch.isreal "torch.isreal") | Returns a new tensor with boolean elements representing if each element of `input` is real-valued or not. [`kthvalue`](generated/torch.kthvalue#torch.kthvalue "torch.kthvalue") | Returns a namedtuple `(values, indices)` where `values` is the `k` th smallest element of each row of the `input` tensor in the given dimension `dim`. [`le`](generated/torch.le#torch.le "torch.le") | Computes input≤other\text{input} \leq \text{other} element-wise. [`less_equal`](generated/torch.less_equal#torch.less_equal "torch.less_equal") | Alias for [`torch.le()`](generated/torch.le#torch.le "torch.le"). [`lt`](generated/torch.lt#torch.lt "torch.lt") | Computes input<other\text{input} < \text{other} element-wise. # torch.nn.intrinsic.quantized This module implements the quantized implementations of fused operations like conv + relu. ## ConvReLU2d `class torch.nn.intrinsic.quantized.ConvReLU2d(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, bias=True, padding_mode='zeros')` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/intrinsic/quantized/modules/conv_relu.html#ConvReLU2d) A ConvReLU2d module is a fused module of Conv2d and ReLU. We adopt the same interface as [`torch.nn.quantized.Conv2d`](torch.nn.quantized#torch.nn.quantized.Conv2d "torch.nn.quantized.Conv2d"). Variables: Same as torch.nn.quantized.Conv2d ## ConvReLU3d `class torch.nn.intrinsic.quantized.ConvReLU3d(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, bias=True, padding_mode='zeros')` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/intrinsic/quantized/modules/conv_relu.html#ConvReLU3d) A ConvReLU3d module is a fused module of Conv3d and ReLU. We adopt the same interface as [`torch.nn.quantized.Conv3d`](torch.nn.quantized#torch.nn.quantized.Conv3d "torch.nn.quantized.Conv3d"). Attributes: Same as torch.nn.quantized.Conv3d ## LinearReLU `class torch.nn.intrinsic.quantized.LinearReLU(in_features, out_features, bias=True, dtype=torch.qint8)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/intrinsic/quantized/modules/linear_relu.html#LinearReLU) A LinearReLU module fused from Linear and ReLU modules. We adopt the same interface as [`torch.nn.quantized.Linear`](torch.nn.quantized#torch.nn.quantized.Linear "torch.nn.quantized.Linear").
Variables **as torch.nn.quantized.Linear** (_Same_) – Examples: >>> m = nn.intrinsic.LinearReLU(20, 30) >>> input = torch.randn(128, 20) >>> output = m(input) >>> print(output.size()) torch.Size([128, 30]) # torch.nn.qat This module implements versions of the key nn modules **Conv2d()** and **Linear()** which run in FP32 but with rounding applied to simulate the effect of INT8 quantization. ## Conv2d `class torch.nn.qat.Conv2d(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, bias=True, padding_mode='zeros', qconfig=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/qat/modules/conv.html#Conv2d) A Conv2d module attached with FakeQuantize modules for weight, used for quantization aware training. We adopt the same interface as `torch.nn.Conv2d`, please see for documentation. Similar to `torch.nn.Conv2d`, with FakeQuantize modules initialized to default. Variables **~Conv2d.weight_fake_quant** – fake quant module for weight `classmethod from_float(mod)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/qat/modules/conv.html#Conv2d.from_float) Create a qat module from a float module or qparams_dict Args: `mod` a float module, either produced by torch.quantization utilities or directly from user ## Linear `class torch.nn.qat.Linear(in_features, out_features, bias=True, qconfig=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/qat/modules/linear.html#Linear) A linear module attached with FakeQuantize modules for weight, used for quantization aware training. We adopt the same interface as `torch.nn.Linear`, please see for documentation. Similar to `torch.nn.Linear`, with FakeQuantize modules initialized to default. Variables **~Linear.weight** – fake quant module for weight `classmethod from_float(mod)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/qat/modules/linear.html#Linear.from_float) Create a qat module from a float module or qparams_dict Args: `mod` a float module, either produced by torch.quantization utilities or directly from user # torch.nn.quantized.dynamic ## Linear `class torch.nn.quantized.dynamic.Linear(in_features, out_features, bias_=True, dtype=torch.qint8)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/quantized/dynamic/modules/linear.html#Linear) A dynamic quantized linear module with floating point tensor as inputs and outputs. We adopt the same interface as `torch.nn.Linear`, please see for documentation. Similar to [`torch.nn.Linear`](generated/torch.nn.linear#torch.nn.Linear "torch.nn.Linear"), attributes will be randomly initialized at module creation time and will be overwritten later Variables * **~Linear.weight** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – the non-learnable quantized weights of the module which are of shape (out_features,in_features)(\text{out\\_features}, \text{in\\_features}) . * **~Linear.bias** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – the non-learnable floating point bias of the module of shape (out_features)(\text{out\\_features}) . If `bias` is `True`, the values are initialized to zero. 
Examples: >>> m = nn.quantized.dynamic.Linear(20, 30) >>> input = torch.randn(128, 20) >>> output = m(input) >>> print(output.size()) torch.Size([128, 30]) `classmethod from_float(mod)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/quantized/dynamic/modules/linear.html#Linear.from_float) Create a dynamic quantized module from a float module or qparams_dict Parameters **mod** ([Module](generated/torch.nn.module#torch.nn.Module "torch.nn.Module")) – a float module, either produced by torch.quantization utilities or provided by the user ## LSTM `class torch.nn.quantized.dynamic.LSTM(*args, **kwargs)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/quantized/dynamic/modules/rnn.html#LSTM) A dynamic quantized LSTM module with floating point tensor as inputs and outputs. We adopt the same interface as `torch.nn.LSTM`, please see for documentation. Examples: >>> rnn = nn.LSTM(10, 20, 2) >>> input = torch.randn(5, 3, 10) >>> h0 = torch.randn(2, 3, 20) >>> c0 = torch.randn(2, 3, 20) >>> output, (hn, cn) = rnn(input, (h0, c0)) ## LSTMCell `class torch.nn.quantized.dynamic.LSTMCell(*args, **kwargs)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/quantized/dynamic/modules/rnn.html#LSTMCell) A long short-term memory (LSTM) cell. A dynamic quantized LSTMCell module with floating point tensor as inputs and outputs. Weights are quantized to 8 bits. We adopt the same interface as `torch.nn.LSTMCell`, please see for documentation. Examples: >>> rnn = nn.LSTMCell(10, 20) >>> input = torch.randn(6, 3, 10) >>> hx = torch.randn(3, 20) >>> cx = torch.randn(3, 20) >>> output = [] >>> for i in range(6): hx, cx = rnn(input[i], (hx, cx)) output.append(hx) ## GRUCell `class torch.nn.quantized.dynamic.GRUCell(input_size, hidden_size, bias=True, dtype=torch.qint8)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/quantized/dynamic/modules/rnn.html#GRUCell) A gated recurrent unit (GRU) cell A dynamic quantized GRUCell module with floating point tensor as inputs and outputs. Weights are quantized to 8 bits. We adopt the same interface as `torch.nn.GRUCell`, please see for documentation. Examples: >>> rnn = nn.GRUCell(10, 20) >>> input = torch.randn(6, 3, 10) >>> hx = torch.randn(3, 20) >>> output = [] >>> for i in range(6): hx = rnn(input[i], hx) output.append(hx) ## RNNCell `class torch.nn.quantized.dynamic.RNNCell(input_size, hidden_size, bias=True, nonlinearity='tanh', dtype=torch.qint8)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/quantized/dynamic/modules/rnn.html#RNNCell) An Elman RNN cell with tanh or ReLU non-linearity. A dynamic quantized RNNCell module with floating point tensor as inputs and outputs. Weights are quantized to 8 bits. We adopt the same interface as `torch.nn.RNNCell`, please see for documentation. Examples: >>> rnn = nn.RNNCell(10, 20) >>> input = torch.randn(6, 3, 10) >>> hx = torch.randn(3, 20) >>> output = [] >>> for i in range(6): hx = rnn(input[i], hx) output.append(hx) # torch.nn.quantized This module implements the quantized versions of the nn modules and functionals. ## Functional interface Functional interface (quantized). `torch.nn.quantized.functional.linear(input, weight, bias=None, scale=None, zero_point=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/quantized/functional.html#linear) Applies a linear transformation to the incoming quantized data: y=xAT+by = xA^T + b . See `Linear` Note Current implementation packs weights on every call, which has penalty on performance. 
If you want to avoid the overhead, use `Linear`. Parameters * **input** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – Quantized input of type `torch.quint8` * **weight** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – Quantized weight of type `torch.qint8` * **bias** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – None or fp32 bias of type `torch.float` * **scale** (_double_) – output scale. If None, derived from the input scale * **zero_point** (_long_) – output zero point. If None, derived from the input zero_point Shape: * Input: (N,∗,in_features)(N, *, in\\_features) where `*` means any number of additional dimensions * Weight: (out_features,in_features)(out\\_features, in\\_features) * Bias: (out_features)(out\\_features) * Output: (N,∗,out_features)(N, *, out\\_features) `torch.nn.quantized.functional.conv1d(input, weight, bias, stride=1, padding=0, dilation=1, groups=1, padding_mode='zeros', scale=1.0, zero_point=0, dtype=torch.quint8)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/quantized/functional.html#conv1d) Applies a 1D convolution over a quantized 1D input composed of several input planes. See `Conv1d` for details and output shape. Parameters * **input** – quantized input tensor of shape (minibatch,in_channels,iW)(\text{minibatch} , \text{in\\_channels} , iW) * **weight** – quantized filters of shape (out_channels,in_channelsgroups,iW)(\text{out\\_channels} , \frac{\text{in\\_channels}}{\text{groups}} , iW) * **bias** – **non-quantized** bias tensor of shape (out_channels)(\text{out\\_channels}) . The tensor type must be `torch.float`. * **stride** – the stride of the convolving kernel. Can be a single number or a tuple `(sW,)`. Default: 1 * **padding** – implicit paddings on both sides of the input. Can be a single number or a tuple `(padW,)`. Default: 0 * **dilation** – the spacing between kernel elements. Can be a single number or a tuple `(dW,)`. Default: 1 * **groups** – split input into groups, in_channels\text{in\\_channels} should be divisible by the number of groups. Default: 1 * **padding_mode** – the padding mode to use. Only “zeros” is supported for quantized convolution at the moment. Default: “zeros” * **scale** – quantization scale for the output. Default: 1.0 * **zero_point** – quantization zero_point for the output. Default: 0 * **dtype** – quantization data type to use. Default: `torch.quint8` Examples: >>> from torch.nn.quantized import functional as qF >>> filters = torch.randn(33, 16, 3, dtype=torch.float) >>> inputs = torch.randn(20, 16, 50, dtype=torch.float) >>> bias = torch.randn(33, dtype=torch.float) >>> >>> scale, zero_point = 1.0, 0 >>> dtype_inputs = torch.quint8 >>> dtype_filters = torch.qint8 >>> >>> q_filters = torch.quantize_per_tensor(filters, scale, zero_point, dtype_filters) >>> q_inputs = torch.quantize_per_tensor(inputs, scale, zero_point, dtype_inputs) >>> qF.conv1d(q_inputs, q_filters, bias, padding=1, scale=scale, zero_point=zero_point) `torch.nn.quantized.functional.conv2d(input, weight, bias, stride=1, padding=0, dilation=1, groups=1, padding_mode='zeros', scale=1.0, zero_point=0, dtype=torch.quint8)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/quantized/functional.html#conv2d) Applies a 2D convolution over a quantized 2D input composed of several input planes. See `Conv2d` for details and output shape. 
Parameters * **input** – quantized input tensor of shape (minibatch,in_channels,iH,iW)(\text{minibatch} , \text{in\\_channels} , iH , iW) * **weight** – quantized filters of shape (out_channels,in_channelsgroups,kH,kW)(\text{out\\_channels} , \frac{\text{in\\_channels}}{\text{groups}} , kH , kW) * **bias** – **non-quantized** bias tensor of shape (out_channels)(\text{out\\_channels}) . The tensor type must be `torch.float`. * **stride** – the stride of the convolving kernel. Can be a single number or a tuple `(sH, sW)`. Default: 1 * **padding** – implicit paddings on both sides of the input. Can be a single number or a tuple `(padH, padW)`. Default: 0 * **dilation** – the spacing between kernel elements. Can be a single number or a tuple `(dH, dW)`. Default: 1 * **groups** – split input into groups, in_channels\text{in\\_channels} should be divisible by the number of groups. Default: 1 * **padding_mode** – the padding mode to use. Only “zeros” is supported for quantized convolution at the moment. Default: “zeros” * **scale** – quantization scale for the output. Default: 1.0 * **zero_point** – quantization zero_point for the output. Default: 0 * **dtype** – quantization data type to use. Default: `torch.quint8` Examples: >>> from torch.nn.quantized import functional as qF >>> filters = torch.randn(8, 4, 3, 3, dtype=torch.float) >>> inputs = torch.randn(1, 4, 5, 5, dtype=torch.float) >>> bias = torch.randn(8, dtype=torch.float) >>> >>> scale, zero_point = 1.0, 0 >>> dtype_inputs = torch.quint8 >>> dtype_filters = torch.qint8 >>> >>> q_filters = torch.quantize_per_tensor(filters, scale, zero_point, dtype_filters) >>> q_inputs = torch.quantize_per_tensor(inputs, scale, zero_point, dtype_inputs) >>> qF.conv2d(q_inputs, q_filters, bias, padding=1, scale=scale, zero_point=zero_point) `torch.nn.quantized.functional.conv3d(input, weight, bias, stride=1, padding=0, dilation=1, groups=1, padding_mode='zeros', scale=1.0, zero_point=0, dtype=torch.quint8)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/quantized/functional.html#conv3d) Applies a 3D convolution over a quantized 3D input composed of several input planes. See `Conv3d` for details and output shape. Parameters * **input** – quantized input tensor of shape (minibatch,in_channels,iD,iH,iW)(\text{minibatch} , \text{in\\_channels} , iD , iH , iW) * **weight** – quantized filters of shape (out_channels,in_channelsgroups,kD,kH,kW)(\text{out\\_channels} , \frac{\text{in\\_channels}}{\text{groups}} , kD , kH , kW) * **bias** – **non-quantized** bias tensor of shape (out_channels)(\text{out\\_channels}) . The tensor type must be `torch.float`. * **stride** – the stride of the convolving kernel. Can be a single number or a tuple `(sD, sH, sW)`. Default: 1 * **padding** – implicit paddings on both sides of the input. Can be a single number or a tuple `(padD, padH, padW)`. Default: 0 * **dilation** – the spacing between kernel elements. Can be a single number or a tuple `(dD, dH, dW)`. Default: 1 * **groups** – split input into groups, in_channels\text{in\\_channels} should be divisible by the number of groups. Default: 1 * **padding_mode** – the padding mode to use. Only “zeros” is supported for quantized convolution at the moment. Default: “zeros” * **scale** – quantization scale for the output. Default: 1.0 * **zero_point** – quantization zero_point for the output. Default: 0 * **dtype** – quantization data type to use. 
Default: `torch.quint8` Examples: >>> from torch.nn.quantized import functional as qF >>> filters = torch.randn(8, 4, 3, 3, 3, dtype=torch.float) >>> inputs = torch.randn(1, 4, 5, 5, 5, dtype=torch.float) >>> bias = torch.randn(8, dtype=torch.float) >>> >>> scale, zero_point = 1.0, 0 >>> dtype_inputs = torch.quint8 >>> dtype_filters = torch.qint8 >>> >>> q_filters = torch.quantize_per_tensor(filters, scale, zero_point, dtype_filters) >>> q_inputs = torch.quantize_per_tensor(inputs, scale, zero_point, dtype_inputs) >>> qF.conv3d(q_inputs, q_filters, bias, padding=1, scale=scale, zero_point=zero_point) `torch.nn.quantized.functional.max_pool2d(input, kernel_size, stride=None, padding=0, dilation=1, ceil_mode=False, return_indices=False)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/quantized/functional.html#max_pool2d) Applies a 2D max pooling over a quantized input signal composed of several quantized input planes. Note The input quantization parameters are propagated to the output. See `MaxPool2d` for details. `torch.nn.quantized.functional.adaptive_avg_pool2d(input, output_size)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/quantized/functional.html#adaptive_avg_pool2d) Applies a 2D adaptive average pooling over a quantized input signal composed of several quantized input planes. Note The input quantization parameters propagate to the output. See `AdaptiveAvgPool2d` for details and output shape. Parameters **output_size** – the target output size (single integer or double-integer tuple) `torch.nn.quantized.functional.avg_pool2d(input, kernel_size, stride=None, padding=0, ceil_mode=False, count_include_pad=True, divisor_override=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/quantized/functional.html#avg_pool2d) Applies 2D average-pooling operation in kH×kWkH \times kW regions by step size sH×sWsH \times sW steps. The number of output features is equal to the number of input planes. Note The input quantization parameters propagate to the output. See `AvgPool2d` for details and output shape. Parameters * **input** – quantized input tensor (minibatch,in_channels,iH,iW)(\text{minibatch} , \text{in\\_channels} , iH , iW) * **kernel_size** – size of the pooling region. Can be a single number or a tuple `(kH, kW)` * **stride** – stride of the pooling operation. Can be a single number or a tuple `(sH, sW)`. Default: `kernel_size` * **padding** – implicit zero paddings on both sides of the input. Can be a single number or a tuple `(padH, padW)`. Default: 0 * **ceil_mode** – when True, will use `ceil` instead of `floor` in the formula to compute the output shape. Default: `False` * **count_include_pad** – when True, will include the zero-padding in the averaging calculation. Default: `True` * **divisor_override** – if specified, it will be used as divisor, otherwise size of the pooling region will be used. Default: None `torch.nn.quantized.functional.interpolate(input, size=None, scale_factor=None, mode='nearest', align_corners=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/quantized/functional.html#interpolate) Down/up samples the input to either the given `size` or the given `scale_factor` See [`torch.nn.functional.interpolate()`](nn.functional#torch.nn.functional.interpolate "torch.nn.functional.interpolate") for implementation details. The input dimensions are interpreted in the form: `mini-batch x channels x [optional depth] x [optional height] x width`. Note The input quantization parameters propagate to the output. 
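For illustration, a minimal sketch of a call (the tensor shape, `scale`, and `zero_point` here are arbitrary example values, not values this API requires); the notes and parameters below spell out the supported inputs and modes:
>>> from torch.nn.quantized import functional as qF
>>> # quantize a float tensor, then resize it with nearest-neighbor interpolation
>>> x = torch.quantize_per_tensor(torch.randn(1, 3, 8, 8), scale=1.0, zero_point=0, dtype=torch.quint8)
>>> y = qF.interpolate(x, scale_factor=2.0, mode='nearest')
>>> y.shape
torch.Size([1, 3, 16, 16])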
Note Only 2D/3D input is supported for quantized inputs Note Only the following modes are supported for the quantized inputs: * `bilinear` * `nearest` Parameters * **input** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – the input tensor * **size** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _or_ _Tuple_ _[_[int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _] or_ _Tuple_ _[_[int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,_[int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _] or_ _Tuple_ _[_[int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,_[int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,_[int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _]_) – output spatial size. * **scale_factor** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _or_ _Tuple_ _[_[float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _]_) – multiplier for spatial size. Has to match input size if it is a tuple. * **mode** ([str](https://docs.python.org/3/library/stdtypes.html#str "\(in Python v3.9\)")) – algorithm used for upsampling: `'nearest'` | `'bilinear'` * **align_corners** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – Geometrically, we consider the pixels of the input and output as squares rather than points. If set to `True`, the input and output tensors are aligned by the center points of their corner pixels, preserving the values at the corner pixels. If set to `False`, the input and output tensors are aligned by the corner points of their corner pixels, and the interpolation uses edge value padding for out-of-boundary values, making this operation _independent_ of input size when `scale_factor` is kept the same. This only has an effect when `mode` is `'bilinear'`. Default: `False` `torch.nn.quantized.functional.hardswish(input, scale, zero_point)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/quantized/functional.html#hardswish) This is the quantized version of [`hardswish()`](nn.functional#torch.nn.functional.hardswish "torch.nn.functional.hardswish"). Parameters * **input** – quantized input * **scale** – quantization scale of the output tensor * **zero_point** – quantization zero point of the output tensor `torch.nn.quantized.functional.upsample(input, size=None, scale_factor=None, mode='nearest', align_corners=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/quantized/functional.html#upsample) Upsamples the input to either the given `size` or the given `scale_factor` Warning This function is deprecated in favor of `torch.nn.quantized.functional.interpolate()`. This is equivalent with `nn.quantized.functional.interpolate(...)`. See [`torch.nn.functional.interpolate()`](nn.functional#torch.nn.functional.interpolate "torch.nn.functional.interpolate") for implementation details. The input dimensions are interpreted in the form: `mini-batch x channels x [optional depth] x [optional height] x width`. Note The input quantization parameters propagate to the output. 
Note Only 2D input is supported for quantized inputs Note Only the following modes are supported for the quantized inputs: * `bilinear` * `nearest` Parameters * **input** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – quantized input tensor * **size** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _or_ _Tuple_ _[_[int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _] or_ _Tuple_ _[_[int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,_[int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _] or_ _Tuple_ _[_[int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,_[int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,_[int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _]_) – output spatial size. * **scale_factor** ([float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _or_ _Tuple_ _[_[float](https://docs.python.org/3/library/functions.html#float "\(in Python v3.9\)") _]_) – multiplier for spatial size. Has to be an integer. * **mode** (_string_) – algorithm used for upsampling: `'nearest'` | `'bilinear'` * **align_corners** ([bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)") _,__optional_) – Geometrically, we consider the pixels of the input and output as squares rather than points. If set to `True`, the input and output tensors are aligned by the center points of their corner pixels, preserving the values at the corner pixels. If set to `False`, the input and output tensors are aligned by the corner points of their corner pixels, and the interpolation uses edge value padding for out-of-boundary values, making this operation _independent_ of input size when `scale_factor` is kept the same. This only has an effect when `mode` is `'bilinear'`. Default: `False` Warning With `align_corners = True`, the linearly interpolating modes (`bilinear`) don’t proportionally align the output and input pixels, and thus the output values can depend on the input size. This was the default behavior for these modes up to version 0.3.1. Since then, the default behavior is `align_corners = False`. See [`Upsample`](generated/torch.nn.upsample#torch.nn.Upsample "torch.nn.Upsample") for concrete examples on how this affects the outputs. `torch.nn.quantized.functional.upsample_bilinear(input, size=None, scale_factor=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/quantized/functional.html#upsample_bilinear) Upsamples the input, using bilinear upsampling. Warning This function is deprecated in favor of `torch.nn.quantized.functional.interpolate()`. This is equivalent with `nn.quantized.functional.interpolate(..., mode='bilinear', align_corners=True)`. Note The input quantization parameters propagate to the output. Note Only 2D inputs are supported Parameters * **input** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – quantized input * **size** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _or_ _Tuple_ _[_[int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,_[int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _]_) – output spatial size. 
* **scale_factor** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _or_ _Tuple_ _[_[int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,_[int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _]_) – multiplier for spatial size `torch.nn.quantized.functional.upsample_nearest(input, size=None, scale_factor=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/quantized/functional.html#upsample_nearest) Upsamples the input, using nearest neighbours’ pixel values. Warning This function is deprecated in favor of `torch.nn.quantized.functional.interpolate()`. This is equivalent with `nn.quantized.functional.interpolate(..., mode='nearest')`. Note The input quantization parameters propagate to the output. Note Only 2D inputs are supported Parameters * **input** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – quantized input * **size** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _or_ _Tuple_ _[_[int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,_[int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _] or_ _Tuple_ _[_[int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,_[int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _,_[int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)") _]_) – output spatial size. * **scale_factor** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – multiplier for spatial size. Has to be an integer. ## ReLU6 `class torch.nn.quantized.ReLU6(inplace=False)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/quantized/modules/activation.html#ReLU6) Applies the element-wise function: ReLU6(x)=min⁡(max⁡(x0,x),q(6))\text{ReLU6}(x) = \min(\max(x_0, x), q(6)) , where x0x_0 is the zero_point, and q(6)q(6) is the quantized representation of number 6. Parameters **inplace** – can optionally do the operation in-place. Default: `False` Shape: * Input: (N,∗)(N, *) where `*` means, any number of additional dimensions * Output: (N,∗)(N, *) , same shape as the input Examples: >>> m = nn.quantized.ReLU6() >>> input = torch.randn(2) >>> input = torch.quantize_per_tensor(input, 1.0, 0, dtype=torch.qint32) >>> output = m(input) ## ELU `class torch.nn.quantized.ELU(scale, zero_point, alpha=1.0)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/quantized/modules/activation.html#ELU) This is the quantized equivalent of [`ELU`](generated/torch.nn.elu#torch.nn.ELU "torch.nn.ELU"). Parameters * **scale** – quantization scale of the output tensor * **zero_point** – quantization zero point of the output tensor * **alpha** – the alpha constant ## Hardswish `class torch.nn.quantized.Hardswish(scale, zero_point)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/quantized/modules/activation.html#Hardswish) This is the quantized version of [`Hardswish`](generated/torch.nn.hardswish#torch.nn.Hardswish "torch.nn.Hardswish"). 
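For illustration, a minimal usage sketch (the `scale` and `zero_point` values, as well as the input quantization parameters, are arbitrary example choices); the constructor parameters are described below:
>>> m = nn.quantized.Hardswish(scale=1.0, zero_point=0)
>>> input = torch.randn(4)
>>> # the module expects a quantized input and produces a quantized output
>>> q_input = torch.quantize_per_tensor(input, scale=1.0, zero_point=0, dtype=torch.quint8)
>>> output = m(q_input)  # output uses the requested scale and zero_point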
Parameters * **scale** – quantization scale of the output tensor * **zero_point** – quantization zero point of the output tensor ## Conv1d `class torch.nn.quantized.Conv1d(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, bias=True, padding_mode='zeros')` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/quantized/modules/conv.html#Conv1d) Applies a 1D convolution over a quantized input signal composed of several quantized input planes. For details on input arguments, parameters, and implementation see [`Conv1d`](generated/torch.nn.conv1d#torch.nn.Conv1d "torch.nn.Conv1d"). Note Only `zeros` is supported for the `padding_mode` argument. Note Only `torch.quint8` is supported for the input data type. Variables * **~Conv1d.weight** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – packed tensor derived from the learnable weight parameter. * **~Conv1d.scale** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – scalar for the output scale * **~Conv1d.zero_point** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – scalar for the output zero point See [`Conv1d`](generated/torch.nn.conv1d#torch.nn.Conv1d "torch.nn.Conv1d") for other attributes. Examples: >>> m = nn.quantized.Conv1d(16, 33, 3, stride=2) >>> input = torch.randn(20, 16, 100) >>> # quantize input to quint8 >>> q_input = torch.quantize_per_tensor(input, scale=1.0, zero_point=0, dtype=torch.quint8) >>> output = m(q_input) `classmethod from_float(mod)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/quantized/modules/conv.html#Conv1d.from_float) Creates a quantized module from a float module or qparams_dict. Parameters **mod** ([Module](generated/torch.nn.module#torch.nn.Module "torch.nn.Module")) – a float module, either produced by torch.quantization utilities or provided by the user ## Conv2d `class torch.nn.quantized.Conv2d(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, bias=True, padding_mode='zeros')` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/quantized/modules/conv.html#Conv2d) Applies a 2D convolution over a quantized input signal composed of several quantized input planes. For details on input arguments, parameters, and implementation see [`Conv2d`](generated/torch.nn.conv2d#torch.nn.Conv2d "torch.nn.Conv2d"). Note Only `zeros` is supported for the `padding_mode` argument. Note Only `torch.quint8` is supported for the input data type. Variables * **~Conv2d.weight** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – packed tensor derived from the learnable weight parameter. * **~Conv2d.scale** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – scalar for the output scale * **~Conv2d.zero_point** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – scalar for the output zero point See [`Conv2d`](generated/torch.nn.conv2d#torch.nn.Conv2d "torch.nn.Conv2d") for other attributes. 
Examples: >>> # With square kernels and equal stride >>> m = nn.quantized.Conv2d(16, 33, 3, stride=2) >>> # non-square kernels and unequal stride and with padding >>> m = nn.quantized.Conv2d(16, 33, (3, 5), stride=(2, 1), padding=(4, 2)) >>> # non-square kernels and unequal stride and with padding and dilation >>> m = nn.quantized.Conv2d(16, 33, (3, 5), stride=(2, 1), padding=(4, 2), dilation=(3, 1)) >>> input = torch.randn(20, 16, 50, 100) >>> # quantize input to quint8 >>> q_input = torch.quantize_per_tensor(input, scale=1.0, zero_point=0, dtype=torch.quint8) >>> output = m(q_input) `classmethod from_float(mod)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/quantized/modules/conv.html#Conv2d.from_float) Creates a quantized module from a float module or qparams_dict. Parameters **mod** ([Module](generated/torch.nn.module#torch.nn.Module "torch.nn.Module")) – a float module, either produced by torch.quantization utilities or provided by the user ## Conv3d `class torch.nn.quantized.Conv3d(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, bias=True, padding_mode='zeros')` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/quantized/modules/conv.html#Conv3d) Applies a 3D convolution over a quantized input signal composed of several quantized input planes. For details on input arguments, parameters, and implementation see [`Conv3d`](generated/torch.nn.conv3d#torch.nn.Conv3d "torch.nn.Conv3d"). Note Only `zeros` is supported for the `padding_mode` argument. Note Only `torch.quint8` is supported for the input data type. Variables * **~Conv3d.weight** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – packed tensor derived from the learnable weight parameter. * **~Conv3d.scale** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – scalar for the output scale * **~Conv3d.zero_point** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – scalar for the output zero point See [`Conv3d`](generated/torch.nn.conv3d#torch.nn.Conv3d "torch.nn.Conv3d") for other attributes. Examples: >>> # With square kernels and equal stride >>> m = nn.quantized.Conv3d(16, 33, 3, stride=2) >>> # non-square kernels and unequal stride and with padding >>> m = nn.quantized.Conv3d(16, 33, (3, 5, 5), stride=(1, 2, 2), padding=(1, 2, 2)) >>> # non-square kernels and unequal stride and with padding and dilation >>> m = nn.quantized.Conv3d(16, 33, (3, 5, 5), stride=(1, 2, 2), padding=(1, 2, 2), dilation=(1, 2, 2)) >>> input = torch.randn(20, 16, 56, 56, 56) >>> # quantize input to quint8 >>> q_input = torch.quantize_per_tensor(input, scale=1.0, zero_point=0, dtype=torch.quint8) >>> output = m(q_input) `classmethod from_float(mod)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/quantized/modules/conv.html#Conv3d.from_float) Creates a quantized module from a float module or qparams_dict. Parameters **mod** ([Module](generated/torch.nn.module#torch.nn.Module "torch.nn.Module")) – a float module, either produced by torch.quantization utilities or provided by the user ## FloatFunctional `class torch.nn.quantized.FloatFunctional` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/quantized/modules/functional_modules.html#FloatFunctional) State collector class for float operations. The instance of this class can be used instead of the `torch.` prefix for some operations. See example usage below. Note This class does not provide a `forward` hook. Instead, you must use one of the underlying functions (e.g. `add`). 
Examples: >>> f_add = FloatFunctional() >>> a = torch.tensor(3.0) >>> b = torch.tensor(4.0) >>> f_add.add(a, b) # Equivalent to ``torch.add(a, b)`` Valid operation names: * add * cat * mul * add_relu * add_scalar * mul_scalar ## QFunctional `class torch.nn.quantized.QFunctional` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/quantized/modules/functional_modules.html#QFunctional) Wrapper class for quantized operations. The instance of this class can be used instead of the `torch.ops.quantized` prefix. See example usage below. Note This class does not provide a `forward` hook. Instead, you must use one of the underlying functions (e.g. `add`). Examples: >>> q_add = QFunctional() >>> a = torch.quantize_per_tensor(torch.tensor(3.0), 1.0, 0, torch.qint32) >>> b = torch.quantize_per_tensor(torch.tensor(4.0), 1.0, 0, torch.qint32) >>> q_add.add(a, b) # Equivalent to ``torch.ops.quantized.add(a, b, 1.0, 0)`` Valid operation names: * add * cat * mul * add_relu * add_scalar * mul_scalar ## Quantize `class torch.nn.quantized.Quantize(scale, zero_point, dtype)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/quantized/modules.html#Quantize) Quantizes an incoming tensor Parameters * **scale** – scale of the output Quantized Tensor * **zero_point** – zero_point of output Quantized Tensor * **dtype** – data type of output Quantized Tensor Variables **zero_point, dtype** (_`scale`__,_) – Examples:: >>> t = torch.tensor([[1., -1.], [1., -1.]]) >>> scale, zero_point, dtype = 1.0, 2, torch.qint8 >>> qm = Quantize(scale, zero_point, dtype) >>> qt = qm(t) >>> print(qt) tensor([[ 1., -1.], [ 1., -1.]], size=(2, 2), dtype=torch.qint8, scale=1.0, zero_point=2) ## DeQuantize `class torch.nn.quantized.DeQuantize` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/quantized/modules.html#DeQuantize) Dequantizes an incoming tensor Examples:: >>> input = torch.tensor([[1., -1.], [1., -1.]]) >>> scale, zero_point, dtype = 1.0, 2, torch.qint8 >>> qm = Quantize(scale, zero_point, dtype) >>> quantized_input = qm(input) >>> dqm = DeQuantize() >>> dequantized = dqm(quantized_input) >>> print(dequantized) tensor([[ 1., -1.], [ 1., -1.]], dtype=torch.float32) ## Linear `class torch.nn.quantized.Linear(in_features, out_features, bias_=True, dtype=torch.qint8)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/quantized/modules/linear.html#Linear) A quantized linear module with quantized tensor as inputs and outputs. We adopt the same interface as `torch.nn.Linear`, please see for documentation. Similar to [`Linear`](generated/torch.nn.linear#torch.nn.Linear "torch.nn.Linear"), attributes will be randomly initialized at module creation time and will be overwritten later Variables * **~Linear.weight** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – the non-learnable quantized weights of the module of shape (out_features,in_features)(\text{out\\_features}, \text{in\\_features}) . * **~Linear.bias** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – the non-learnable bias of the module of shape (out_features)(\text{out\\_features}) . If `bias` is `True`, the values are initialized to zero. 
* **~Linear.scale** – `scale` parameter of output Quantized Tensor, type: double * **~Linear.zero_point** – `zero_point` parameter for output Quantized Tensor, type: long Examples: >>> m = nn.quantized.Linear(20, 30) >>> input = torch.randn(128, 20) >>> input = torch.quantize_per_tensor(input, 1.0, 0, torch.quint8) >>> output = m(input) >>> print(output.size()) torch.Size([128, 30]) `classmethod from_float(mod)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/quantized/modules/linear.html#Linear.from_float) Create a quantized module from a float module or qparams_dict Parameters **mod** ([Module](generated/torch.nn.module#torch.nn.Module "torch.nn.Module")) – a float module, either produced by torch.quantization utilities or provided by the user ## BatchNorm2d `class torch.nn.quantized.BatchNorm2d(num_features, eps=1e-05, momentum=0.1)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/quantized/modules/batchnorm.html#BatchNorm2d) This is the quantized version of [`BatchNorm2d`](generated/torch.nn.batchnorm2d#torch.nn.BatchNorm2d "torch.nn.BatchNorm2d"). ## BatchNorm3d `class torch.nn.quantized.BatchNorm3d(num_features, eps=1e-05, momentum=0.1)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/quantized/modules/batchnorm.html#BatchNorm3d) This is the quantized version of [`BatchNorm3d`](generated/torch.nn.batchnorm3d#torch.nn.BatchNorm3d "torch.nn.BatchNorm3d"). ## LayerNorm `class torch.nn.quantized.LayerNorm(normalized_shape, weight, bias, scale, zero_point, eps=1e-05, elementwise_affine=True)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/quantized/modules/normalization.html#LayerNorm) This is the quantized version of [`LayerNorm`](generated/torch.nn.layernorm#torch.nn.LayerNorm "torch.nn.LayerNorm"). Additional args: * **scale** \- quantization scale of the output, type: double. * **zero_point** \- quantization zero point of the output, type: long. ## GroupNorm `class torch.nn.quantized.GroupNorm(num_groups, num_channels, weight, bias, scale, zero_point, eps=1e-05, affine=True)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/quantized/modules/normalization.html#GroupNorm) This is the quantized version of [`GroupNorm`](generated/torch.nn.groupnorm#torch.nn.GroupNorm "torch.nn.GroupNorm"). Additional args: * **scale** \- quantization scale of the output, type: double. * **zero_point** \- quantization zero point of the output, type: long. ## InstanceNorm1d `class torch.nn.quantized.InstanceNorm1d(num_features, weight, bias, scale, zero_point, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/quantized/modules/normalization.html#InstanceNorm1d) This is the quantized version of [`InstanceNorm1d`](generated/torch.nn.instancenorm1d#torch.nn.InstanceNorm1d "torch.nn.InstanceNorm1d"). Additional args: * **scale** \- quantization scale of the output, type: double. * **zero_point** \- quantization zero point of the output, type: long. ## InstanceNorm2d `class torch.nn.quantized.InstanceNorm2d(num_features, weight, bias, scale, zero_point, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/quantized/modules/normalization.html#InstanceNorm2d) This is the quantized version of [`InstanceNorm2d`](generated/torch.nn.instancenorm2d#torch.nn.InstanceNorm2d "torch.nn.InstanceNorm2d"). Additional args: * **scale** \- quantization scale of the output, type: double. 
* **zero_point** \- quantization zero point of the output, type: long. ## InstanceNorm3d `class torch.nn.quantized.InstanceNorm3d(num_features, weight, bias, scale, zero_point, eps=1e-05, momentum=0.1, affine=False, track_running_stats=False)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/quantized/modules/normalization.html#InstanceNorm3d) This is the quantized version of [`InstanceNorm3d`](generated/torch.nn.instancenorm3d#torch.nn.InstanceNorm3d "torch.nn.InstanceNorm3d"). Additional args: * **scale** \- quantization scale of the output, type: double. * **zero_point** \- quantization zero point of the output, type: long. ## Embedding `class torch.nn.quantized.Embedding(num_embeddings, embedding_dim, padding_idx=None, max_norm=None, norm_type=2.0, scale_grad_by_freq=False, sparse=False, _weight=None, dtype=torch.quint8)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/quantized/modules/embedding_ops.html#Embedding) A quantized Embedding module with quantized packed weights as inputs. We adopt the same interface as `torch.nn.Embedding`, please see for documentation. Similar to [`Embedding`](generated/torch.nn.embedding#torch.nn.Embedding "torch.nn.Embedding"), attributes will be randomly initialized at module creation time and will be overwritten later Variables **~Embedding.weight** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – the non-learnable quantized weights of the module of shape (num_embeddings,embedding_dim)(\text{num\\_embeddings}, \text{embedding\\_dim}) . Examples:: >>> m = nn.quantized.Embedding(num_embeddings=10, embedding_dim=12) >>> indices = torch.tensor([9, 6, 5, 7, 8, 8, 9, 2, 8]) >>> output = m(indices) >>> print(output.size()) torch.Size([9, 12] `classmethod from_float(mod)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/quantized/modules/embedding_ops.html#Embedding.from_float) Create a quantized embedding module from a float module Parameters **mod** ([Module](generated/torch.nn.module#torch.nn.Module "torch.nn.Module")) – a float module, either produced by torch.quantization utilities or provided by user ## EmbeddingBag `class torch.nn.quantized.EmbeddingBag(num_embeddings, embedding_dim, max_norm=None, norm_type=2.0, scale_grad_by_freq=False, mode='sum', sparse=False, _weight=None, include_last_offset=False, dtype=torch.quint8)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/quantized/modules/embedding_ops.html#EmbeddingBag) A quantized EmbeddingBag module with quantized packed weights as inputs. We adopt the same interface as `torch.nn.EmbeddingBag`, please see for documentation. Similar to [`EmbeddingBag`](generated/torch.nn.embeddingbag#torch.nn.EmbeddingBag "torch.nn.EmbeddingBag"), attributes will be randomly initialized at module creation time and will be overwritten later Variables **~EmbeddingBag.weight** ([Tensor](tensors#torch.Tensor "torch.Tensor")) – the non-learnable quantized weights of the module of shape (num_embeddings,embedding_dim)(\text{num\\_embeddings}, \text{embedding\\_dim}) . 
Examples:: >>> m = nn.quantized.EmbeddingBag(num_embeddings=10, embedding_dim=12, include_last_offset=True, mode='sum') >>> indices = torch.tensor([9, 6, 5, 7, 8, 8, 9, 2, 8, 6, 6, 9, 1, 6, 8, 8, 3, 2, 3, 6, 3, 6, 5, 7, 0, 8, 4, 6, 5, 8, 2, 3]) >>> offsets = torch.tensor([0, 19, 20, 28, 28, 32]) >>> output = m(indices, offsets) >>> print(output.size()) torch.Size([5, 12] `classmethod from_float(mod)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/nn/quantized/modules/embedding_ops.html#EmbeddingBag.from_float) Create a quantized embedding_bag module from a float module Parameters **mod** ([Module](generated/torch.nn.module#torch.nn.Module "torch.nn.Module")) – a float module, either produced by torch.quantization utilities or provided by user # torch.overrides This module exposes various helper functions for the `__torch_function__` protocol. See [Extending torch](https://pytorch.org/docs/1.8.0/notes/extending.html#extending-torch) for more detail on the `__torch_function__` protocol. ## Functions `torch.overrides.get_ignored_functions()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/overrides.html#get_ignored_functions) Return public functions that cannot be overridden by `__torch_function__`. Returns A tuple of functions that are publicly available in the torch API but cannot be overridden with `__torch_function__`. Mostly this is because none of the arguments of these functions are tensors or tensor-likes. Return type Set[Callable] #### Examples >>> torch.Tensor.as_subclass in torch.overrides.get_ignored_functions() True >>> torch.add in torch.overrides.get_ignored_functions() False `torch.overrides.get_overridable_functions()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/overrides.html#get_overridable_functions) List functions that are overridable via __torch_function__ Returns A dictionary that maps namespaces that contain overridable functions to functions in that namespace that can be overridden. Return type Dict[Any, List[Callable]] `torch.overrides.get_testing_overrides()` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/overrides.html#get_testing_overrides) Return a dict containing dummy overrides for all overridable functions Returns A dictionary that maps overridable functions in the PyTorch API to lambda functions that have the same signature as the real function and unconditionally return -1. These lambda functions are useful for testing API coverage for a type that defines `__torch_function__`. Return type Dict[Callable, Callable] #### Examples >>> import inspect >>> my_add = torch.overrides.get_testing_overrides()[torch.add] >>> inspect.signature(my_add) `torch.overrides.handle_torch_function(public_api, relevant_args, *args, **kwargs)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/overrides.html#handle_torch_function) Implement a function with checks for `__torch_function__` overrides. See torch::autograd::handle_torch_function for the equivalent of this function in the C++ implementation. Parameters * **public_api** (_function_) – Function exposed by the public torch API originally called like `public_api(*args, **kwargs)` on which arguments are now being checked. * **relevant_args** (_iterable_) – Iterable of arguments to check for __torch_function__ methods. * **args** ([tuple](https://docs.python.org/3/library/stdtypes.html#tuple "\(in Python v3.9\)")) – Arbitrary positional arguments originally passed into `public_api`. 
* **kwargs** ([tuple](https://docs.python.org/3/library/stdtypes.html#tuple "\(in Python v3.9\)")) – Arbitrary keyword arguments originally passed into `public_api`.

Returns

Result from calling `implementation` or an `__torch_function__` method, as appropriate.

Return type

[object](https://docs.python.org/3/library/functions.html#object "\(in Python v3.9\)")

Raises

**TypeError** – if no implementation is found.

#### Example

>>> def func(a):
...     if type(a) is not torch.Tensor:  # This will make func dispatchable by __torch_function__
...         return handle_torch_function(func, (a,), a)
...     return a + 0

`torch.overrides.has_torch_function()`

Check for __torch_function__ implementations in the elements of an iterable. Considers exact `Tensor`s and `Parameter`s non-dispatchable.

Parameters

**relevant_args** (_iterable_) – Iterable of arguments to check for __torch_function__ methods.

Returns

True if any of the elements of relevant_args have __torch_function__ implementations, False otherwise.

Return type

[bool](https://docs.python.org/3/library/functions.html#bool "\(in Python v3.9\)")

See also

`torch.is_tensor_like()`

Checks if something is a Tensor-like, including an exact `Tensor`.

`torch.overrides.is_tensor_like(inp)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/overrides.html#is_tensor_like)

Returns `True` if the passed-in input is a Tensor-like.

Currently, this occurs whenever there’s a `__torch_function__` attribute on the type of the input.

#### Examples

A subclass of tensor is generally a Tensor-like.

>>> class SubTensor(torch.Tensor): ...
>>> is_tensor_like(SubTensor([0]))
True

Built-in or user types aren’t usually Tensor-like.

>>> is_tensor_like(6)
False
>>> is_tensor_like(None)
False
>>> class NotATensor: ...
>>> is_tensor_like(NotATensor())
False

But they can be made Tensor-like by implementing __torch_function__.

>>> class TensorLike:
...     def __torch_function__(self, func, types, args, kwargs):
...         return -1
>>> is_tensor_like(TensorLike())
True

`torch.overrides.is_tensor_method_or_property(func)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/overrides.html#is_tensor_method_or_property)

Returns True if the function passed in is a handler for a method or property belonging to `torch.Tensor`, as passed into `__torch_function__`.

Note

For properties, their `__get__` method must be passed in.

This may be needed, in particular, for the following reasons:

1. Methods/properties sometimes don’t contain a `__module__` slot.
2. They require that the first passed-in argument is an instance of `torch.Tensor`.

#### Examples

>>> is_tensor_method_or_property(torch.Tensor.add)
True
>>> is_tensor_method_or_property(torch.add)
False

`torch.overrides.wrap_torch_function(dispatcher)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/overrides.html#wrap_torch_function)

Wraps a given function with `__torch_function__`-related functionality.

Parameters

**dispatcher** (_Callable_) – A callable that returns an iterable of Tensor-likes passed into the function.

Note

This decorator may reduce the performance of your code. Generally, it’s enough to express your code as a series of functions that, themselves, support __torch_function__. If you find yourself in the rare situation where this is not the case, e.g. if you’re wrapping a low-level library and you also need it to work for Tensor-likes, then this function is available.

#### Examples

>>> def dispatcher(a):  # Must have the same signature as func
...     return (a,)
>>> @torch.overrides.wrap_torch_function(dispatcher)
... def func(a):  # This will make func dispatchable by __torch_function__
...     return a + 0

# torch.quantization

This module implements the functions you call directly to convert your model from FP32 to quantized form. For example, `prepare()` is used in post-training quantization to prepare your model for the calibration step, and `convert()` actually converts the weights to int8 and replaces the operations with their quantized counterparts. There are other helper functions for things like quantizing the input to your model and performing critical fusions like conv+relu.

## Top-level quantization APIs

`torch.quantization.quantize(model, run_fn, run_args, mapping=None, inplace=False)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/quantization/quantize.html#quantize)

Quantize the input float model with post training static quantization.

First it will prepare the model for calibration, then it calls `run_fn` which will run the calibration step, and after that it will convert the model to a quantized model.

Parameters

* **model** – input float model
* **run_fn** – a calibration function for calibrating the prepared model
* **run_args** – positional arguments for `run_fn`
* **inplace** – carry out model transformations in-place, the original module is mutated
* **mapping** – correspondence between original module types and quantized counterparts

Returns

Quantized model.

`torch.quantization.quantize_dynamic(model, qconfig_spec=None, dtype=torch.qint8, mapping=None, inplace=False)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/quantization/quantize.html#quantize_dynamic)

Converts a float model to a dynamic (i.e. weights-only) quantized model.

Replaces specified modules with dynamic weight-only quantized versions and outputs the quantized model.

For the simplest usage, provide the `dtype` argument, which can be float16 or qint8. Weight-only quantization is by default performed for layers with large weights, i.e. Linear and RNN variants (see the usage sketch after `quantize_qat()` below).

Fine-grained control is possible with `qconfig` and `mapping`, which act similarly to `quantize()`. If `qconfig` is provided, the `dtype` argument is ignored.

Parameters

* **model** – input model
* **qconfig_spec** – Either:
  * A dictionary that maps from name or type of submodule to quantization configuration; qconfig applies to all submodules of a given module unless qconfig for the submodules is specified (when the submodule already has a qconfig attribute). Entries in the dictionary need to be QConfigDynamic instances.
  * A set of types and/or submodule names to apply dynamic quantization to, in which case the `dtype` argument is used to specify the bit-width
* **inplace** – carry out model transformations in-place, the original module is mutated
* **mapping** – maps the type of a submodule to the type of the corresponding dynamically quantized version with which the submodule needs to be replaced

`torch.quantization.quantize_qat(model, run_fn, run_args, inplace=False)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/quantization/quantize.html#quantize_qat)

Do quantization aware training and output a quantized model.

Parameters

* **model** – input model
* **run_fn** – a function for evaluating the prepared model, can be a function that simply runs the prepared model or a training loop
* **run_args** – positional arguments for `run_fn`

Returns

Quantized model.
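To make the dynamic quantization API above concrete, here is a minimal sketch that quantizes the `nn.Linear` layers of a toy model to `qint8`; the `Net` module and tensor shapes are invented for illustration only.

import torch
import torch.nn as nn

# A toy float model (hypothetical, for illustration)
class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(64, 32)
        self.fc2 = nn.Linear(32, 10)

    def forward(self, x):
        return self.fc2(torch.relu(self.fc1(x)))

model_fp32 = Net().eval()

# Replace the Linear modules with dynamic (weights-only) quantized versions
model_int8 = torch.quantization.quantize_dynamic(
    model_fp32, qconfig_spec={nn.Linear}, dtype=torch.qint8
)

out = model_int8(torch.randn(4, 64))  # inference now uses int8 weights

Only the weights are quantized ahead of time; activations are quantized on the fly at runtime, which is why no calibration step is needed here.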
`torch.quantization.prepare(model, inplace=False, allow_list=None, observer_non_leaf_module_list=None, prepare_custom_config_dict=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/quantization/quantize.html#prepare)

Prepares a copy of the model for quantization calibration or quantization-aware training.

Quantization configuration should be assigned preemptively to individual submodules via the `.qconfig` attribute.

Observer or fake-quant modules will be attached to the model, and qconfig will be propagated.

Parameters

* **model** – input model to be modified in-place
* **inplace** – carry out model transformations in-place, the original module is mutated
* **allow_list** – list of quantizable modules
* **observer_non_leaf_module_list** – list of non-leaf modules we want to add observers to
* **prepare_custom_config_dict** – customization configuration dictionary for the prepare function

# Example of prepare_custom_config_dict:
prepare_custom_config_dict = {
    # user will manually define the corresponding observed
    # module class which has a from_float class method that converts
    # float custom module to observed custom module
    "float_to_observed_custom_module_class": {
        CustomModule: ObservedCustomModule
    }
}

`torch.quantization.prepare_qat(model, mapping=None, inplace=False)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/quantization/quantize.html#prepare_qat)

Prepares a copy of the model for quantization calibration or quantization-aware training and converts it to a quantized version.

Quantization configuration should be assigned preemptively to individual submodules via the `.qconfig` attribute.

Parameters

* **model** – input model to be modified in-place
* **mapping** – dictionary that maps float modules to the quantized modules that replace them
* **inplace** – carry out model transformations in-place, the original module is mutated

`torch.quantization.convert(module, mapping=None, inplace=False, remove_qconfig=True, convert_custom_config_dict=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/quantization/quantize.html#convert)

Converts submodules of the input module to a different module according to `mapping` by calling the `from_float` method on the target module class. Also removes qconfig at the end if `remove_qconfig` is set to `True`.

Parameters

* **module** – prepared and calibrated module
* **mapping** – a dictionary that maps from source module type to target module type; can be overwritten to allow swapping of user-defined Modules
* **inplace** – carry out model transformations in-place, the original module is mutated
* **convert_custom_config_dict** – custom configuration dictionary for the convert function

# Example of convert_custom_config_dict:
convert_custom_config_dict = {
    # user will manually define the corresponding quantized
    # module class which has a from_observed class method that converts
    # observed custom module to quantized custom module
    "observed_to_quantized_custom_module_class": {
        ObservedCustomModule: QuantizedCustomModule
    }
}

`class torch.quantization.QConfig` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/quantization/qconfig.html#QConfig)

Describes how to quantize a layer or a part of the network by providing settings (observer classes) for activations and weights respectively.

Note that QConfig needs to contain observer **classes** (like MinMaxObserver) or a callable that returns instances on invocation, not the concrete observer instances themselves. The quantization preparation function will instantiate observers multiple times for each of the layers.
Observer classes usually have reasonable default arguments, but they can be overridden with the `with_args` method (which behaves like functools.partial):

my_qconfig = QConfig(activation=MinMaxObserver.with_args(dtype=torch.qint8),
                     weight=default_observer.with_args(dtype=torch.qint8))

`class torch.quantization.QConfigDynamic` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/quantization/qconfig.html#QConfigDynamic)

Describes how to dynamically quantize a layer or a part of the network by providing settings (observer classes) for weights. It’s like QConfig, but for dynamic quantization.

Note that QConfigDynamic needs to contain observer **classes** (like MinMaxObserver) or a callable that returns instances on invocation, not the concrete observer instances themselves. The quantization function will instantiate observers multiple times for each of the layers.

Observer classes usually have reasonable default arguments, but they can be overridden with the `with_args` method (which behaves like functools.partial):

my_qconfig = QConfigDynamic(weight=default_observer.with_args(dtype=torch.qint8))

## Preparing model for quantization

`torch.quantization.fuse_modules(model, modules_to_fuse, inplace=False, fuser_func=<function fuse_known_modules>, fuse_custom_config_dict=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/quantization/fuse_modules.html#fuse_modules)

Fuses a list of modules into a single module.

Fuses only the following sequences of modules:

* conv, bn
* conv, bn, relu
* conv, relu
* linear, relu
* bn, relu

All other sequences are left unchanged. For these sequences, the first item in the list is replaced with the fused module, and the rest of the modules are replaced with identity.

Parameters

* **model** – Model containing the modules to be fused
* **modules_to_fuse** – list of lists of module names to fuse. Can also be a single list of strings if there is only one group of modules to fuse.
* **inplace** – bool specifying if fusion happens in place on the model; by default a new model is returned
* **fuser_func** – Function that takes in a list of modules and outputs a list of fused modules of the same length. For example, fuser_func([convModule, BNModule]) returns the list [ConvBNModule, nn.Identity()]. Defaults to torch.quantization.fuse_known_modules
* **fuse_custom_config_dict** – custom configuration for fusion

# Example of fuse_custom_config_dict
fuse_custom_config_dict = {
    # Additional fuser_method mapping
    "additional_fuser_method_mapping": {
        (torch.nn.Conv2d, torch.nn.BatchNorm2d): fuse_conv_bn
    },
}

Returns

Model with fused modules. A new copy is created if `inplace=False`.

Examples:

>>> m = myModel()
>>> # m is a module containing the sub-modules below
>>> modules_to_fuse = [['conv1', 'bn1', 'relu1'], ['submodule.conv', 'submodule.relu']]
>>> fused_m = torch.quantization.fuse_modules(m, modules_to_fuse)
>>> output = fused_m(input)

>>> m = myModel()
>>> # Alternately, provide a single list of modules to fuse
>>> modules_to_fuse = ['conv1', 'bn1', 'relu1']
>>> fused_m = torch.quantization.fuse_modules(m, modules_to_fuse)
>>> output = fused_m(input)

`class torch.quantization.QuantStub(qconfig=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/quantization/stubs.html#QuantStub)

Quantize stub module. Before calibration, this is the same as an observer; it will be swapped to `nnq.Quantize` in `convert`.
Parameters

**qconfig** – quantization configuration for the tensor; if qconfig is not provided, we will get the qconfig from the parent modules

`class torch.quantization.DeQuantStub` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/quantization/stubs.html#DeQuantStub)

Dequantize stub module. Before calibration, this is the same as identity; it will be swapped to `nnq.DeQuantize` in `convert`.

`class torch.quantization.QuantWrapper(module)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/quantization/stubs.html#QuantWrapper)

A wrapper class that wraps the input module, adds QuantStub and DeQuantStub, and surrounds the call to the module with calls to the quant and dequant modules.

This is used by the `quantization` utility functions to add the quant and dequant modules. Before `convert`, `QuantStub` is just an observer: it observes the input tensor. After `convert`, `QuantStub` is swapped to `nnq.Quantize`, which does the actual quantization. Similarly for `DeQuantStub`.

`torch.quantization.add_quant_dequant(module)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/quantization/quantize.html#add_quant_dequant)

Wrap the leaf child modules in QuantWrapper if they have a valid qconfig. Note that this function will modify the children of the module in place, and it can also return a new module which wraps the input module.

Parameters

**module** – input module with qconfig attributes for all the leaf modules that we want to quantize

Returns

Either the in-place modified module with submodules wrapped in `QuantWrapper` based on qconfig, or a new `QuantWrapper` module which wraps the input module; the latter case only happens when the input module is a leaf module and we want to quantize it.

## Utility functions

`torch.quantization.add_observer_(module, qconfig_propagation_list=None, non_leaf_module_list=None, device=None, custom_module_class_mapping=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/quantization/quantize.html#add_observer_)

Add observers for the leaf children of the module.

This function inserts an observer module into every leaf child module that has a valid qconfig attribute.

Parameters

* **module** – input module with qconfig attributes for all the leaf modules that we want to quantize
* **device** – parent device, if any
* **non_leaf_module_list** – list of non-leaf modules we want to add observers to

Returns

None, module is modified in place with added observer modules and forward_hooks

`torch.quantization.swap_module(mod, mapping, custom_module_class_mapping)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/quantization/quantize.html#swap_module)

Swaps the module if it has a quantized counterpart and it has an `observer` attached.
Parameters

* **mod** – input module
* **mapping** – a dictionary that maps from nn module to nnq module

Returns

The corresponding quantized module of `mod`

`torch.quantization.propagate_qconfig_(module, qconfig_dict=None, allow_list=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/quantization/quantize.html#propagate_qconfig_)

Propagate qconfig through the module hierarchy and assign a `qconfig` attribute on each leaf module.

Parameters

* **module** – input module
* **qconfig_dict** – dictionary that maps from name or type of submodule to quantization configuration; qconfig applies to all submodules of a given module unless qconfig for the submodules is specified (when the submodule already has a qconfig attribute)

Returns

None, module is modified in place with qconfig attached

`torch.quantization.default_eval_fn(model, calib_data)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/quantization.html#default_eval_fn)

Default evaluation function: takes a torch.utils.data.Dataset or a list of input Tensors and runs the model on the dataset.

## Observers

`class torch.quantization.ObserverBase(dtype)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/quantization/observer.html#ObserverBase)

Base observer Module. Any observer implementation should derive from this class.

Concrete observers should follow the same API. In `forward`, they will update the statistics of the observed Tensor, and they should provide a `calculate_qparams` function that computes the quantization parameters given the collected statistics.

Parameters

**dtype** – Quantized data type

`classmethod with_args(**kwargs)`

Wrapper that allows creation of class factories.

This can be useful when there is a need to create classes with the same constructor arguments, but different instances.

Example:

>>> Foo.with_args = classmethod(_with_args)
>>> foo_builder = Foo.with_args(a=3, b=4).with_args(answer=42)
>>> foo_instance1 = foo_builder()
>>> foo_instance2 = foo_builder()
>>> id(foo_instance1) == id(foo_instance2)
False

`class torch.quantization.MinMaxObserver(dtype=torch.quint8, qscheme=torch.per_tensor_affine, reduce_range=False, quant_min=None, quant_max=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/quantization/observer.html#MinMaxObserver)

Observer module for computing the quantization parameters based on the running min and max values.

This observer uses the tensor min/max statistics to compute the quantization parameters. The module records the running minimum and maximum of incoming tensors, and uses these statistics to compute the quantization parameters.

Parameters

* **dtype** – Quantized data type
* **qscheme** – Quantization scheme to be used
* **reduce_range** – Reduces the range of the quantized data type by 1 bit
* **quant_min** – Minimum quantization value. If unspecified, it will follow the 8-bit setup.
* **quant_max** – Maximum quantization value. If unspecified, it will follow the 8-bit setup.
Given the running min/max $x_\text{min}$ and $x_\text{max}$, the scale $s$ and zero point $z$ are computed as follows.

The running minimum/maximum $x_\text{min/max}$ is computed as:

$$
x_\text{min} = \begin{cases}
    \min(X) & \text{if } x_\text{min} = \text{None} \\
    \min\left(x_\text{min}, \min(X)\right) & \text{otherwise}
\end{cases}
\qquad
x_\text{max} = \begin{cases}
    \max(X) & \text{if } x_\text{max} = \text{None} \\
    \max\left(x_\text{max}, \max(X)\right) & \text{otherwise}
\end{cases}
$$

where $X$ is the observed tensor.

The scale $s$ and zero point $z$ are then computed as:

$$
\begin{aligned}
\text{if Symmetric:} \quad
    & s = 2 \max(|x_\text{min}|, x_\text{max}) / \left(Q_\text{max} - Q_\text{min}\right) \\
    & z = \begin{cases} 0 & \text{if dtype is qint8} \\ 128 & \text{otherwise} \end{cases} \\
\text{Otherwise:} \quad
    & s = \left(x_\text{max} - x_\text{min}\right) / \left(Q_\text{max} - Q_\text{min}\right) \\
    & z = Q_\text{min} - \text{round}(x_\text{min} / s)
\end{aligned}
$$

where $Q_\text{min}$ and $Q_\text{max}$ are the minimum and maximum of the quantized data type.

Warning

Only works with `torch.per_tensor_symmetric` quantization scheme

Warning

`dtype` can only take `torch.qint8` or `torch.quint8`.

Note

If the running minimum equals the running maximum, the scale and zero_point are set to 1.0 and 0.

`class torch.quantization.MovingAverageMinMaxObserver(averaging_constant=0.01, dtype=torch.quint8, qscheme=torch.per_tensor_affine, reduce_range=False, quant_min=None, quant_max=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/quantization/observer.html#MovingAverageMinMaxObserver)

Observer module for computing the quantization parameters based on the moving average of the min and max values.

This observer computes the quantization parameters based on the moving averages of the minimums and maximums of the incoming tensors. The module records the average minimum and maximum of incoming tensors, and uses these statistics to compute the quantization parameters.

Parameters

* **averaging_constant** – Averaging constant for min/max.
* **dtype** – Quantized data type
* **qscheme** – Quantization scheme to be used
* **reduce_range** – Reduces the range of the quantized data type by 1 bit
* **quant_min** – Minimum quantization value. If unspecified, it will follow the 8-bit setup.
* **quant_max** – Maximum quantization value. If unspecified, it will follow the 8-bit setup.

The moving average min/max is computed as follows:

$$
\begin{aligned}
x_\text{min} &= \begin{cases}
    \min(X) & \text{if } x_\text{min} = \text{None} \\
    (1 - c)\, x_\text{min} + c \min(X) & \text{otherwise}
\end{cases} \\
x_\text{max} &= \begin{cases}
    \max(X) & \text{if } x_\text{max} = \text{None} \\
    (1 - c)\, x_\text{max} + c \max(X) & \text{otherwise}
\end{cases}
\end{aligned}
$$

where $x_\text{min/max}$ is the running average min/max, $X$ is the incoming tensor, and $c$ is the `averaging_constant`.

The scale and zero point are then computed as in `MinMaxObserver`.

Note

Only works with `torch.per_tensor_affine` quantization scheme.

Note

If the running minimum equals the running maximum, the scale and zero_point are set to 1.0 and 0.
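As a concrete illustration of the observer API described above, the following sketch feeds a couple of arbitrary, made-up tensors through a `MinMaxObserver` and reads back the computed quantization parameters; the moving-average variant is used the same way.

import torch
from torch.quantization import MinMaxObserver, MovingAverageMinMaxObserver

# Record running min/max over a few batches of activations
obs = MinMaxObserver(dtype=torch.quint8, qscheme=torch.per_tensor_affine)
obs(torch.randn(4, 8))          # forward() updates the running statistics
obs(torch.randn(4, 8) * 2.0)

scale, zero_point = obs.calculate_qparams()
print(scale, zero_point)        # per-tensor scale and zero point

# The moving-average variant smooths the statistics across observations
ma_obs = MovingAverageMinMaxObserver(averaging_constant=0.01)
ma_obs(torch.randn(4, 8))
print(ma_obs.calculate_qparams())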
`class torch.quantization.PerChannelMinMaxObserver(ch_axis=0, dtype=torch.quint8, qscheme=torch.per_channel_affine, reduce_range=False, quant_min=None, quant_max=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/quantization/observer.html#PerChannelMinMaxObserver)

Observer module for computing the quantization parameters based on the running per-channel min and max values.

This observer uses the tensor min/max statistics to compute the per-channel quantization parameters. The module records the running minimum and maximum of incoming tensors, and uses these statistics to compute the quantization parameters.

Parameters

* **ch_axis** – Channel axis
* **dtype** – Quantized data type
* **qscheme** – Quantization scheme to be used
* **reduce_range** – Reduces the range of the quantized data type by 1 bit
* **quant_min** – Minimum quantization value. If unspecified, it will follow the 8-bit setup.
* **quant_max** – Maximum quantization value. If unspecified, it will follow the 8-bit setup.

The quantization parameters are computed the same way as in `MinMaxObserver`, with the difference that the running min/max values are stored per channel. Scales and zero points are thus computed per channel as well.

Note

If the running minimum equals the running maximum, the scales and zero_points are set to 1.0 and 0.

`class torch.quantization.MovingAveragePerChannelMinMaxObserver(averaging_constant=0.01, ch_axis=0, dtype=torch.quint8, qscheme=torch.per_channel_affine, reduce_range=False, quant_min=None, quant_max=None)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/quantization/observer.html#MovingAveragePerChannelMinMaxObserver)

Observer module for computing the quantization parameters based on the running per-channel min and max values.

This observer uses the tensor min/max statistics to compute the per-channel quantization parameters. The module records the running minimum and maximum of incoming tensors, and uses these statistics to compute the quantization parameters.

Parameters

* **averaging_constant** – Averaging constant for min/max.
* **ch_axis** – Channel axis
* **dtype** – Quantized data type
* **qscheme** – Quantization scheme to be used
* **reduce_range** – Reduces the range of the quantized data type by 1 bit
* **quant_min** – Minimum quantization value. If unspecified, it will follow the 8-bit setup.
* **quant_max** – Maximum quantization value. If unspecified, it will follow the 8-bit setup.

The quantization parameters are computed the same way as in `MovingAverageMinMaxObserver`, with the difference that the running min/max values are stored per channel. Scales and zero points are thus computed per channel as well.

Note

If the running minimum equals the running maximum, the scales and zero_points are set to 1.0 and 0.

`class torch.quantization.HistogramObserver(bins=2048, upsample_rate=128, dtype=torch.quint8, qscheme=torch.per_tensor_affine, reduce_range=False)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/quantization/observer.html#HistogramObserver)

The module records the running histogram of tensor values along with min/max values. `calculate_qparams` will calculate the scale and zero_point.
Parameters

* **bins** – Number of bins to use for the histogram
* **upsample_rate** – Factor by which the histograms are upsampled; this is used to interpolate histograms with varying ranges across observations
* **dtype** – Quantized data type
* **qscheme** – Quantization scheme to be used
* **reduce_range** – Reduces the range of the quantized data type by 1 bit

The scale and zero point are computed as follows:

1. Create the histogram of the incoming inputs. The histogram is computed continuously, and the ranges per bin change with every new tensor observed.
2. Search the distribution in the histogram for optimal min/max values. The search for the min/max values ensures the minimization of the quantization error with respect to the floating point model.
3. Compute the scale and zero point the same way as in the `MinMaxObserver`.

`class torch.quantization.FakeQuantize(observer=<class 'torch.quantization.observer.MovingAverageMinMaxObserver'>, quant_min=0, quant_max=255, **observer_kwargs)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/quantization/fake_quantize.html#FakeQuantize)

Simulates the quantize and dequantize operations at training time. The output of this module is given by

x_out = (clamp(round(x / scale + zero_point), quant_min, quant_max) - zero_point) * scale

* `scale` defines the scale factor used for quantization.
* `zero_point` specifies the quantized value to which 0 in floating point maps.
* `quant_min` specifies the minimum allowable quantized value.
* `quant_max` specifies the maximum allowable quantized value.
* `fake_quant_enable` controls the application of fake quantization on tensors; note that statistics can still be updated.
* `observer_enable` controls statistics collection on tensors.
* `dtype` specifies the quantized dtype that is being emulated with fake-quantization; allowable values are torch.qint8 and torch.quint8. The values of quant_min and quant_max should be chosen to be consistent with the dtype.

Parameters

* **observer** (_module_) – Module for observing statistics on input tensors and calculating scale and zero-point.
* **quant_min** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – The minimum allowable quantized value.
* **quant_max** ([int](https://docs.python.org/3/library/functions.html#int "\(in Python v3.9\)")) – The maximum allowable quantized value.
* **observer_kwargs** (_optional_) – Arguments for the observer module

Variables

**~FakeQuantize.observer** ([Module](generated/torch.nn.module#torch.nn.Module "torch.nn.Module")) – User provided module that collects statistics on the input tensor and provides a method to calculate scale and zero-point.

`class torch.quantization.NoopObserver(dtype=torch.float16, custom_op_name='')` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/quantization/observer.html#NoopObserver)

Observer that doesn’t do anything and just passes its configuration to the quantized module’s `.from_float()`.

Primarily used for quantization to float16, which doesn’t require determining ranges.

Parameters

* **dtype** – Quantized data type
* **custom_op_name** – (temporary) specify this observer for an operator that doesn’t require any observation (can be used in Graph Mode Passes for special-case ops).

## Debugging utilities

`torch.quantization.get_observer_dict(mod, target_dict, prefix='')` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/quantization/quantize.html#get_observer_dict)

Traverse the modules and save all observers into a dict.
This is mainly used for quantization accuracy debugging.

Parameters

* **mod** – the top module we want to save all observers from
* **prefix** – the prefix for the current module
* **target_dict** – the dictionary used to save all the observers

`class torch.quantization.RecordingObserver(**kwargs)` [[source]](https://pytorch.org/docs/1.8.0/_modules/torch/quantization/observer.html#RecordingObserver)

The module is mainly for debugging and records the tensor values during runtime.

Parameters

* **dtype** – Quantized data type
* **qscheme** – Quantization scheme to be used
* **reduce_range** – Reduces the range of the quantized data type by 1 bit

[`nn.intrinsic`](torch.nn.intrinsic#module-torch.nn.intrinsic "torch.nn.intrinsic")

# Type Info

The numerical properties of a [`torch.dtype`](tensor_attributes#torch.torch.dtype "torch.torch.dtype") can be accessed through either `torch.finfo` or `torch.iinfo`.

## torch.finfo

`class torch.finfo`

A `torch.finfo` is an object that represents the numerical properties of a floating point [`torch.dtype`](tensor_attributes#torch.torch.dtype "torch.torch.dtype") (i.e. `torch.float32`, `torch.float64`, and `torch.float16`). This is similar to [numpy.finfo](https://docs.scipy.org/doc/numpy/reference/generated/numpy.finfo.html).

A `torch.finfo` provides the following attributes:

Name | Type | Description
---|---|---
bits | int | The number of bits occupied by the type.
eps | float | The smallest representable number such that `1.0 + eps != 1.0`.
max | float | The largest representable number.
min | float | The smallest representable number (typically `-max`).
tiny | float | The smallest positive representable number.
resolution | float | The approximate decimal resolution of this type, i.e., `10**-precision`.

Note

The constructor of `torch.finfo` can be called without argument, in which case the class is created for the pytorch default dtype (as returned by [`torch.get_default_dtype()`](generated/torch.get_default_dtype#torch.get_default_dtype "torch.get_default_dtype")).

## torch.iinfo

`class torch.iinfo`

A `torch.iinfo` is an object that represents the numerical properties of an integer [`torch.dtype`](tensor_attributes#torch.torch.dtype "torch.torch.dtype") (i.e. `torch.uint8`, `torch.int8`, `torch.int16`, `torch.int32`, and `torch.int64`). This is similar to [numpy.iinfo](https://docs.scipy.org/doc/numpy/reference/generated/numpy.iinfo.html).

A `torch.iinfo` provides the following attributes:

Name | Type | Description
---|---|---
bits | int | The number of bits occupied by the type.
max | int | The largest representable number.
min | int | The smallest representable number.
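A minimal sketch of reading the attributes listed above; the commented values assume the standard IEEE `float16` and two's-complement `int8` representations.

import torch

# Floating point properties
fi = torch.finfo(torch.float16)
print(fi.bits, fi.eps, fi.max)   # 16 0.0009765625 65504.0

# Integer properties
ii = torch.iinfo(torch.int8)
print(ii.bits, ii.min, ii.max)   # 8 -128 127

# Called without an argument, torch.finfo() describes the current default dtype
print(torch.finfo().eps == torch.finfo(torch.get_default_dtype()).eps)  # True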