Variance Reduction with a Baseline

Generated by author

This is an advanced theoretical blog which focusses on one of the most intriguing and complex aspect of policy gradient algorithms. The reader is assumed to have some basic understanding of policy gradient algorithms: A popular class of reinforcement learning algorithms which estimates the gradient for a function approximation. You can refer to chapter 13 of Reinforcement Learning: An Introduction for understanding policy gradient algorithms.

Quick Revision of Policy Gradients !

In policy gradient setup, the idea is to directly parameterise the policy. The optimal policy is the policy with highest value function. This is easier and certainly different from value-based method, where we first find…

Thanks for responding. Its h(planks)/(2*pi). 6.6/(2*3.14) ~ 1.05 ! I will leave a comment in the code.

Hey, thanks for suggestion. I tried to type it in latex, its too much effort so I resorted to writing with stylus. Anyway I will try to redo the blog with latex eqs.

Bidirectional Encoder Representations from Transformers

Generated by Author


BERT (Bidirectional Encoder Representations from Transformers) is a language representation model. It is a recent success in NLP which proved to outperform many existing state-of-art models in many NLP tasks. These pre-trained models can then be fine-tuned for many NLP problems like question&answering and sentiment analysis.

Pre-trained language representations can either be context-free or context-based. Context-free models like word2vec generate a single word embedding representation (a vector of numbers) for each word in the vocabulary. Like other dense representation models like Word2Vec, Bert also employs an unsupervised learning setup, therefore eliminating the need for labelled data. BERT tries to predict…

Scaled Dot-product Attention, Multi-Headed Self Attention,

Generated by Author


Transformers is a novel neural architecture that proved to be a recent success in machine learning translation. Like encoder-decoder models, Transformer is an architecture for transforming one sequence into another using the encoder and decoder. The difference is from the previously RNN based sequence-to-sequence models is that transformer does-not use any Recurrent Networks (GRU, LSTM, etc.) as neither encoder nor decoder. So the transformers eliminated the need for using the RNN connections in the encoder and decoder networks.

The idea of transformer is that instead of using the RNN for accumulating the memory, transformers uses multi-headed attention directly on the…

Bottle Neck Problem, Dot-Product Attention

Generated by Author


The RNN based encoder and decoder models are proved to be very powerful neural architecture which provides a practical solution to many sequence to sequence predictions problems like machine translation, question answering model and text summarization.
The encoder in the model is tasked with building a contextual representation if input sequence. The decoder , which uses the context to generate the output sequence. In RNN context we described in the last blog, the context vector is essentially the last hidden state of the last time step“ hn” in the chain of input sequence. …

Sequence to Sequence Network, Contextual Representation

Generated by Author


In tasks like machine translation, we must map from a sequence of input words to a sequence of output words. The reader must note that this is not similar to “sequence labelling”, where that task it to to map each word in the sequence to a predefined classes, like part-of-speech or named entity task.

Constant Error Carousel

Generated by Author

RNN or LSTM is only concerned about the accumulation of a memory context in the forward direction. But we would also want the model to allow for both the “forward” context and “backward” context to be incorporated into a prediction. This can be achieved if we have a model architecture that run over forward sequence (“He is a good person”) and backward sequence(“person good a is He”). The kind of RNN’s that is specifically built for this kind of bi-directional sequences is called Bidirectional RNN. …

Kowshik chilamkurthy


Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store