Transformer Neural Networks – EXPLAINED! (Attention is all you need)

Please subscribe to keep me alive:

BLOG:

MATH COURSES (7 day free trial)
📕 Mathematics for Machine Learning:
📕 Calculus:
📕 Statistics for Data Science:
📕 Bayesian Statistics:
📕 Linear Algebra:
📕 Probability:

OTHER RELATED COURSES (7 day free trial)
📕 ⭐ Deep Learning Specialization:
📕 Python for Everybody:
📕 MLOps Course:
📕 Natural Language Processing (NLP):
📕 Machine Learning in Production:
📕 Data Science Specialization:
📕 Tensorflow:

REFERENCES
[1] The main Paper:
[2] Tensor2Tensor has some code with a tutorial:
[3] Transformer very intuitively explained – Amazing:
[4] Medium Blog on intuitive explanation:
[5] Pretrained word embeddings:
[6] Intuitive explanation of Layer normalization:
[7] Paper that gives even better results than transformers (Pervasive Attention):
[8] BERT uses transformers to pretrain neural nets for common NLP tasks:
[9] Stanford Lecture on RNN:
[10] Colah's Blog:
[11] Wiki for timeseries of events: (machine_learning_model)
