ScaledDotProductAttention
Defined in fynance.models.attention
- class ScaledDotProductAttention(dropout=0.0)
Bases: Module
Scaled Dot-Product Attention.
Computes \(\text{Attention}(Q, K, V) = \text{softmax}\!\left(\frac{QK^T}{\sqrt{d_k}}\right)V\).
- Parameters:
- dropout : float, optional
Dropout probability applied to the attention weights. Default is 0.
References
Vaswani et al., “Attention is All You Need”, arXiv 2017.
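The formula above can be sketched in a few lines of plain PyTorch. This is an illustrative sketch only, not the fynance implementation: the function name `attention_sketch` is hypothetical, and masking and dropout are left out to keep the three steps (scaled scores, softmax, weighted sum) visible.

```python
import math

import torch


def attention_sketch(Q, K, V):
    """Hypothetical helper: softmax(Q K^T / sqrt(d_k)) V, no mask, no dropout."""
    d_k = Q.size(-1)
    # Similarity of every query with every key, scaled by sqrt(d_k).
    scores = Q @ K.transpose(-2, -1) / math.sqrt(d_k)  # (B, ..., T, S)
    # Normalise over the key axis so each query's weights sum to 1.
    weights = torch.softmax(scores, dim=-1)  # (B, ..., T, S)
    # Weighted sum of the values.
    return weights @ V, weights  # (B, ..., T, d_v), (B, ..., T, S)


torch.manual_seed(0)
Q = torch.randn(2, 5, 8)  # B=2, T=5, d_k=8
K = torch.randn(2, 7, 8)  # S=7
V = torch.randn(2, 7, 4)  # d_v=4
out, weights = attention_sketch(Q, K, V)
```

The division by sqrt(d_k) keeps the dot products from growing with the key dimension, which would otherwise push the softmax into regions with very small gradients (Vaswani et al.).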
- forward(Q, K, V, mask=None)
Compute attention.
- Parameters:
- Q : torch.Tensor
Queries, shape (B, ..., T, d_k).
- K : torch.Tensor
Keys, shape (B, ..., S, d_k).
- V : torch.Tensor
Values, shape (B, ..., S, d_v).
- mask : torch.Tensor, optional
Boolean mask of shape (B, ..., T, S). Attention scores at positions where mask == 0 are set to -inf before the softmax.
- Returns:
- torch.Tensor
Output of shape (B, ..., T, d_v).
- torch.Tensor
Attention weights of shape (B, ..., T, S).
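To illustrate the masking behaviour and the two return values described above, here is a minimal sketch in plain PyTorch. It is written against the documented contract rather than copied from the fynance source; the function name `masked_attention` and the causal mask are assumptions made for the example, and dropout is omitted.

```python
import math

import torch


def masked_attention(Q, K, V, mask=None):
    """Hypothetical stand-in for the documented forward contract (no dropout)."""
    scores = Q @ K.transpose(-2, -1) / math.sqrt(Q.size(-1))
    if mask is not None:
        # Positions where mask == 0 get -inf, hence zero weight after softmax.
        scores = scores.masked_fill(mask == 0, float("-inf"))
    weights = torch.softmax(scores, dim=-1)
    return weights @ V, weights


torch.manual_seed(0)
B, T, d_k, d_v = 2, 4, 8, 3
Q = torch.randn(B, T, d_k)
K = torch.randn(B, T, d_k)  # self-attention, so S == T
V = torch.randn(B, T, d_v)

# Causal mask: query t may only attend to keys s <= t.
# Shape (T, S) broadcasts over the batch dimension.
causal = torch.tril(torch.ones(T, T, dtype=torch.bool))

out, weights = masked_attention(Q, K, V, mask=causal)
```

With this mask the first query can only see the first key, so its output row equals the first value row. Note that a query whose mask row is entirely 0 would yield NaN weights in this sketch, because the softmax is then taken over -inf values only.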