ScaledDotProductAttention

Defined in fynance.models.attention

class ScaledDotProductAttention(dropout=0.0)

Scaled Dot-Product Attention.

Computes \(\text{Attention}(Q, K, V) = \text{softmax}\!\left(\frac{QK^T}{\sqrt{d_k}}\right)V\).
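As a rough illustration of the formula, the following dependency-free Python sketch evaluates it on small nested lists. The helper names `softmax` and `attention` are hypothetical and not part of fynance; the class itself works on torch.Tensor inputs and may differ in detail (batching, dropout, masking).

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(Q, K, V):
    """softmax(Q K^T / sqrt(d_k)) V for 2-D inputs given as nested lists.

    Q: T x d_k, K: S x d_k, V: S x d_v. Returns a T x d_v nested list.
    """
    d_k = len(K[0])
    scale = 1.0 / math.sqrt(d_k)
    out = []
    for q in Q:
        # Scaled dot product of one query with every key.
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in K]
        weights = softmax(scores)  # weights over the S keys sum to 1
        # Weighted average of the value rows.
        out.append([sum(w * v[j] for w, v in zip(weights, V))
                    for j in range(len(V[0]))])
    return out


# Toy inputs (illustrative values only): T = S = 2, d_k = d_v = 2.
Q = [[1.0, 0.0], [0.0, 1.0]]
K = [[1.0, 0.0], [0.0, 1.0]]
V = [[1.0, 2.0], [3.0, 4.0]]
result = attention(Q, K, V)
```

Each output row is a convex combination of the rows of V, weighted toward the value whose key best matches the query.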

Parameters:

dropout (float, optional): Dropout probability applied to the attention weights. Default: 0.0.

References

Vaswani et al., “Attention Is All You Need”, arXiv:1706.03762, 2017.

forward(Q, K, V, mask=None)

Compute attention.

Parameters:

Q (torch.Tensor): Queries, shape (B, ..., T, d_k).
K (torch.Tensor): Keys, shape (B, ..., S, d_k).
V (torch.Tensor): Values, shape (B, ..., S, d_v).
mask (torch.Tensor, optional): Boolean mask of shape (B, ..., T, S). Positions where mask is False (0) have their attention scores set to -inf before the softmax, so they receive zero attention weight.
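The effect of the mask on a single query can be sketched in plain Python. This is a minimal illustration under the semantics described above, not the library's code; `masked_softmax` is a hypothetical helper, and the scores are made-up values.

```python
import math


def masked_softmax(scores, mask):
    """Softmax over `scores` after setting entries with a falsy mask to -inf."""
    masked = [s if keep else float("-inf") for s, keep in zip(scores, mask)]
    m = max(masked)  # assumes at least one position is kept
    exps = [math.exp(s - m) for s in masked]  # exp(-inf) evaluates to 0.0
    total = sum(exps)
    return [e / total for e in exps]


scores = [2.0, 1.0, 0.5]    # scaled dot products of one query with S = 3 keys
mask = [True, True, False]  # the last key is hidden from this query
weights = masked_softmax(scores, mask)
```

The masked position receives exactly zero weight, and the remaining weights are renormalized to sum to 1. A row in which every position is masked has no valid normalization, so callers would typically want to avoid that case.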

Returns: