Autoregressive models are sequential deep generative models that form a constituent part of the state-of-the-art models in different domains such as Image Generation, Audio Generation, Gesture Generation, and Large Language Models. Since autoregressive models are sequential, they are similar to RNNs, but the two differ in how the dependence on previous steps is realized: RNNs are recurrent, whereas autoregressive models are feed-forward, yet both methods are supervised.
Figure 1: Both RNNs and Autoregressive Models process data sequentially. (Left) The RNN output at time-step T depends not only on the current input, but also on the previous inputs, and the dependency is carried through hidden states. (Right) The output of an autoregressive model also depends on the current and previous inputs, but in this case the previous outputs are explicitly given to the model as inputs.
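To make the distinction concrete, below is a minimal PyTorch sketch of the two designs. The module names (`TinyRNN`, `TinyAutoregressive`), the context size, and the layer widths are illustrative assumptions rather than anything from a specific published model: the RNN carries information forward through a hidden state, while the autoregressive model is a plain feed-forward network whose input explicitly contains the preceding values of the sequence.

```python
import torch
import torch.nn as nn

# Recurrent: the dependence on x_1..x_{t-1} is carried implicitly through the hidden state h_t.
class TinyRNN(nn.Module):
    def __init__(self, input_dim=1, hidden_dim=16):
        super().__init__()
        self.hidden_dim = hidden_dim
        self.cell = nn.RNNCell(input_dim, hidden_dim)
        self.readout = nn.Linear(hidden_dim, 1)

    def forward(self, x):                               # x: (batch, T, input_dim)
        h = torch.zeros(x.size(0), self.hidden_dim)
        outputs = []
        for t in range(x.size(1)):
            h = self.cell(x[:, t], h)                   # hidden state summarizes all past inputs
            outputs.append(self.readout(h))
        return torch.stack(outputs, dim=1)              # (batch, T, 1)


# Feed-forward autoregressive: the previous values are handed to the network explicitly as inputs.
class TinyAutoregressive(nn.Module):
    def __init__(self, context=3, hidden_dim=16):
        super().__init__()
        self.context = context
        self.net = nn.Sequential(
            nn.Linear(context, hidden_dim), nn.ReLU(), nn.Linear(hidden_dim, 1)
        )

    def forward(self, x):                               # x: (batch, T)
        padded = torch.cat([torch.zeros(x.size(0), self.context), x], dim=1)
        outputs = []
        for t in range(x.size(1)):
            window = padded[:, t:t + self.context]      # the `context` values preceding step t
            outputs.append(self.net(window))
        return torch.stack(outputs, dim=1)              # (batch, T, 1)
```

At sampling time the window of the feed-forward model would be filled with previously generated outputs; the point of the contrast is that the past enters as ordinary inputs rather than as a recurrent state.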
On the other hand, because autoregressive models are a type of generative model, their expressiveness is often compared against that of VAEs and GANs, but there are some core differences. First of all, although autoregressive models and VAEs are both explicit models, VAEs optimize the Evidence Lower Bound (ELBO) because their likelihood computation is intractable, whereas autoregressive models optimize the likelihood directly. When it comes to GANs, they are implicit models that optimize a minimax objective, as opposed to the explicit, likelihood-based autoregressive models.
| Model | Main Objective | Explicit/Implicit |
| --- | --- | --- |
| Autoregressive Models | Log-Likelihood | Explicit |
| Variational Autoencoders (VAEs) | Evidence Lower Bound (ELBO) | Explicit |
| Generative Adversarial Networks (GANs) | Minimax | Implicit |

Table 1: General Properties of Deep Generative Models
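As a rough illustration of the first row of the table, here is what direct likelihood optimization typically looks like for an autoregressive model over binary data. The function name and the toy tensors are made up for this sketch; the model is assumed to output one conditional Bernoulli probability per dimension, and the training loss is simply the negative log-likelihood summed over dimensions.

```python
import torch
import torch.nn.functional as F

def autoregressive_nll(cond_probs, x):
    """Negative log-likelihood of binary data under an autoregressive factorization.

    cond_probs: (batch, N) tensor where cond_probs[:, i] = p(x_i = 1 | x_1, ..., x_{i-1})
    x:          (batch, N) tensor of 0/1 values
    """
    # log p(x) = sum_i log p(x_i | x_1, ..., x_{i-1}); minimizing the NLL maximizes the likelihood directly.
    log_px = torch.sum(x * torch.log(cond_probs) + (1 - x) * torch.log(1 - cond_probs), dim=1)
    return -log_px.mean()

# Toy usage with made-up conditionals for a 4-dimensional binary datapoint.
x = torch.tensor([[1., 0., 1., 1.]])
cond_probs = torch.tensor([[0.7, 0.2, 0.9, 0.6]])
loss = autoregressive_nll(cond_probs, x)

# Cross-check against PyTorch's built-in binary cross-entropy (summed, then averaged over the batch).
bce = F.binary_cross_entropy(cond_probs, x, reduction='sum') / x.size(0)
assert torch.isclose(loss, bce)
```

There is no lower bound and no adversary involved; the loss is the (negative) log-likelihood itself, which is what the "Explicit" label in the table refers to.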
🍁 Chain Rule
We start our discussion of autoregressive models with the chain rule of probability. From now on, we assume access to a dataset of M binary datapoints, each N-dimensional:

$$\mathcal{D} = \{x^{(1)}, x^{(2)}, \dots, x^{(M)}\}, \qquad x^{(j)} \in \{0, 1\}^N$$
The chain rule of probability states that a joint probability distribution can be factorized into a product of conditional probability distributions. This fact constitutes the backbone of Bayesian Networks and Autoregressive Models:

$$p(x_1, x_2, \dots, x_N) = p(x_1)\, p(x_2 \mid x_1) \cdots p(x_N \mid x_1, \dots, x_{N-1}) = \prod_{i=1}^{N} p(x_i \mid x_1, \dots, x_{i-1})$$
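As a quick numerical check of the factorization, consider a 3-dimensional binary example with made-up conditional probabilities, chosen purely for illustration:

```python
# Chain rule for a 3-dimensional binary example x = (x1, x2, x3):
# p(x1, x2, x3) = p(x1) * p(x2 | x1) * p(x3 | x1, x2)
# The conditional values below are made-up numbers, not learned quantities.
p_x1       = 0.6   # p(x1 = 1)
p_x2_given = 0.3   # p(x2 = 1 | x1 = 1)
p_x3_given = 0.8   # p(x3 = 1 | x1 = 1, x2 = 0)

# Joint probability of the particular outcome x = (1, 0, 1):
p_joint = p_x1 * (1 - p_x2_given) * p_x3_given
print(p_joint)     # 0.6 * 0.7 * 0.8 = 0.336
```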
This factorization means that the probability of any random variable in the set is conditionally dependent on all the random variables preceding it; furthermore, the dependence order is predefined and fixed. Such a dependence relationship can be graphically described as: