AttentionΒΆ

Scaled dot-product and multi-head attention modules for Transformer-based architectures.

ScaledDotProductAttention([dropout])

Scaled Dot-Product Attention.

MultiHeadAttention(d_model, num_heads[, dropout])

Multi-Head Self-Attention.