Transformer Neural Networks – EXPLAINED! (Attention is all you need)

REFERENCES
[1] The main Paper:
[2] Tensor2Tensor has some code with a tutorial:
[3] Transformer very intuitively explained – Amazing:
[4] Medium Blog on intuitive explanation:
[5] Pretrained word embeddings:
[6] Intuitive explanation of Layer normalization:
[7] Paper that gives even better results than transformers (Pervasive Attention):
[8] BERT uses transformers to pretrain neural nets for common NLP tasks:
[9] Stanford Lecture on RNN:
[10] Colah’s Blog:
[11] Wiki for timeseries of events: (machine_learning_model)