Attention¶

Scaled dot-product and multi-head attention modules for Transformer-based architectures.

`ScaledDotProductAttention`([dropout])	Scaled Dot-Product Attention.
`MultiHeadAttention`(d_model, num_heads[, dropout])	Multi-Head Self-Attention.