The course covers key concepts for understanding and processing deep learning-based multimedia sequence models. It begins with an overview of sequence models and recurrent neural networks (RNNs), then continues with the details of training RNNs. By introducing different sequence modelling problems, recurrent architectures, and variants of gated units (LSTMs, GRUs), the course covers the fundamental concepts of sequence learning in intelligent multimedia systems. In addition, it covers both recurrent and non-recurrent models of attention (e.g., Transformer self-attention) applied to multimedia signals such as vision and sound.