ScaledDotProductAttention

Defined in fynance.models.attention

class ScaledDotProductAttention(dropout=0.0)

Bases: Module

Scaled Dot-Product Attention.

Computes \(\text{Attention}(Q, K, V) = \text{softmax}\!\left(\frac{QK^T}{\sqrt{d_k}}\right)V\), where \(d_k\) is the size of the last dimension of the queries and keys.

Parameters:
dropout : float, optional

Dropout probability applied to the attention weights. Default is 0.0.

References

Vaswani et al., “Attention Is All You Need”, arXiv:1706.03762, 2017.
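The formula above can be sketched in a few lines of plain PyTorch. This is an illustrative reference only, not the library's source: `attention_reference` is a hypothetical helper name, and the tensor sizes are arbitrary.

```python
import math
import torch


def attention_reference(Q, K, V):
    """Hypothetical helper: softmax(Q K^T / sqrt(d_k)) V in plain PyTorch."""
    d_k = Q.size(-1)
    # Scaled similarity between every query and every key: (..., T, S).
    scores = Q @ K.transpose(-2, -1) / math.sqrt(d_k)
    # Normalize over the key axis so each query's weights sum to 1.
    weights = torch.softmax(scores, dim=-1)
    # Weighted average of the values: (..., T, d_v).
    return weights @ V, weights


torch.manual_seed(0)
Q = torch.randn(2, 5, 8)  # (B, T, d_k)
K = torch.randn(2, 7, 8)  # (B, S, d_k)
V = torch.randn(2, 7, 4)  # (B, S, d_v)
out, weights = attention_reference(Q, K, V)
```

Dividing by \(\sqrt{d_k}\) keeps the scores from growing with the key dimension, which would otherwise push the softmax toward near one-hot weights.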

forward(Q, K, V, mask=None)

Compute the attention output and the attention weights.

Parameters:
Q : torch.Tensor

Queries, shape (B, ..., T, d_k).

K : torch.Tensor

Keys, shape (B, ..., S, d_k).

V : torch.Tensor

Values, shape (B, ..., S, d_v).

mask : torch.Tensor, optional

Boolean mask of shape (B, ..., T, S). Scores at positions where mask == 0 are set to -inf before the softmax, so those positions receive zero attention weight.
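As an illustration of the mask convention, the sketch below builds a causal (lower-triangular) mask and applies it the way the description states: fill with -inf where `mask == 0`, then take the softmax. The causal pattern and the sizes are assumptions chosen for the example; the library may build or apply masks differently.

```python
import math
import torch

torch.manual_seed(0)
T, d_k = 4, 8
Q = torch.randn(1, T, d_k)
K = torch.randn(1, T, d_k)

# Causal mask (assumed for illustration): True where query t may see key s <= t.
mask = torch.tril(torch.ones(T, T, dtype=torch.bool)).unsqueeze(0)  # (1, T, T)

scores = Q @ K.transpose(-2, -1) / math.sqrt(d_k)
# Masked positions get -inf, so exp(-inf) = 0 inside the softmax.
scores = scores.masked_fill(mask == 0, float("-inf"))
weights = torch.softmax(scores, dim=-1)
```

In plain PyTorch, a row in which every position is masked yields NaN weights, because the softmax is taken over -inf only. Whether this class guards against that case is not stated here.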

Returns:
torch.Tensor

Output of shape (B, ..., T, d_v).

torch.Tensor

Attention weights of shape (B, ..., T, S).
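To make the documented shapes concrete, here is a minimal stand-in module that mirrors the signature above. `AttentionSketch` is a hypothetical name, not the library class. Returning the weights after dropout is an assumption, and the extra head dimension `H` only illustrates the `...` in the shapes.

```python
import math
import torch
from torch import nn


class AttentionSketch(nn.Module):
    """Hypothetical stand-in with the documented signature (not the library source)."""

    def __init__(self, dropout=0.0):
        super().__init__()
        self.dropout = nn.Dropout(dropout)

    def forward(self, Q, K, V, mask=None):
        scores = Q @ K.transpose(-2, -1) / math.sqrt(Q.size(-1))
        if mask is not None:
            scores = scores.masked_fill(mask == 0, float("-inf"))
        # Assumption: dropout is applied to the weights that are also returned.
        weights = self.dropout(torch.softmax(scores, dim=-1))
        return weights @ V, weights


attn = AttentionSketch(dropout=0.1).eval()  # eval() disables dropout
B, H, T, S, d_k, d_v = 2, 3, 5, 7, 16, 32
Q = torch.randn(B, H, T, d_k)
K = torch.randn(B, H, S, d_k)
V = torch.randn(B, H, S, d_v)
output, weights = attn(Q, K, V)
```

With these inputs, `output` has shape (B, H, T, d_v) and `weights` has shape (B, H, T, S), matching the Returns entries.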