How to Take Advantage of the New Disruptive AI Technology Called Transformers

Transformer neural networks are shaking up Artificial Intelligence

3 min read · Oct 9, 2021

Since 2017, Transformers have driven impressive progress in deep learning. Many of us consider them the most important development of recent years and the one with the greatest potential in the field. For this reason, I believe it is worthwhile to keep a close watch on their progress.

The new normal that changes the way we do NLP

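Transformers were introduced in the seminal paper “Attention Is All You Need” by Vaswani et al. (2017). The gist of the paper is an architecture built entirely on a mechanism called “neural attention” (more precisely, self-attention), dispensing with the recurrence and convolutions used by earlier sequence models. Attention has quickly become one of the most influential ideas in deep learning applied to the NLP domain.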
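To make the idea of attention more concrete, here is a minimal NumPy sketch of scaled dot-product attention, softmax(QKᵀ/√d_k)·V, the core operation described in the paper. It is a simplification and not the full architecture: the function name, the toy input sizes, and the choice to use the same matrix for queries, keys, and values (with no learned projections and a single head) are my own assumptions for illustration.

```python
import numpy as np


def scaled_dot_product_attention(Q, K, V):
    """Return (output, weights) for softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    # Similarity of every query with every key, scaled by sqrt(d_k).
    scores = Q @ K.T / np.sqrt(d_k)
    # Numerically stable softmax over the key axis.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    # Each output row is a weighted average of the value vectors.
    return weights @ V, weights


# Toy input (assumed sizes): a "sentence" of 4 tokens, each an 8-dim vector.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))

# Self-attention: queries, keys, and values all come from the same tokens.
out, weights = scaled_dot_product_attention(x, x, x)
print(out.shape)      # one output vector per token
print(weights.shape)  # one attention weight per (query, key) pair
```

Each row of `weights` sums to 1, so every token's output is a mixture of all the tokens in the sequence. A real Transformer layer would add learned projections for Q, K, and V, several attention heads, and positional information.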

The Transformer model architecture, from the seminal paper “Attention Is All You Need” by Vaswani et al.

It can be applied to other domains like computer vision

The same attention mechanisms that make Transformers so effective for language models can also be used in other domains, and Transformers have started to find tremendous success in areas such as computer vision.

As an example of the use of Transformers in computer vision, an ICLR 2021 paper splits an image into fixed-size patches and feeds them to a standard Transformer encoder.
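The patch-splitting step can be sketched in a few lines of NumPy. This is only an illustration of how an image can become a sequence of "tokens", not the paper's implementation: the function name `image_to_patches`, the 32×32 toy image, and the 16-pixel patch size are assumptions I chose for the example.

```python
import numpy as np


def image_to_patches(image, patch_size):
    """Split an (H, W, C) image into flattened, non-overlapping square patches."""
    h, w, c = image.shape
    if h % patch_size or w % patch_size:
        raise ValueError("image height and width must be multiples of patch_size")
    gh, gw = h // patch_size, w // patch_size
    # (gh, p, gw, p, c) -> (gh, gw, p, p, c): group the pixels of each patch together.
    patches = image.reshape(gh, patch_size, gw, patch_size, c)
    patches = patches.transpose(0, 2, 1, 3, 4)
    # One row per patch, read left to right and top to bottom.
    return patches.reshape(gh * gw, patch_size * patch_size * c)


# Toy RGB image (assumed size) standing in for a real photo.
image = np.arange(32 * 32 * 3, dtype=np.float32).reshape(32, 32, 3)
tokens = image_to_patches(image, patch_size=16)
print(tokens.shape)  # (4, 768): 4 patches, each 16 * 16 * 3 values
```

Each row of `tokens` plays the role that a word embedding plays in NLP. In practice the flattened patches would typically be passed through a learned linear layer and combined with position information before reaching the encoder.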

One of the advantages of Transformers is their ability to learn without labeled data. For example, a Transformer can develop representations of language through unsupervised (self-supervised) learning on raw text, and then apply those representations to fill in the blanks in incomplete sentences or to generate coherent text from a prompt.
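To see why no labels are needed, consider how a "fill in the blanks" training example can be built from raw text alone: hide some words and ask the model to recover them. The sketch below uses only the Python standard library; the function name, the `[MASK]` placeholder, and the masking probability are illustrative assumptions, not the recipe of any particular model.

```python
import random

MASK = "[MASK]"  # placeholder token (an assumed convention for this sketch)


def make_masked_example(tokens, mask_prob=0.15, seed=None):
    """Hide random tokens; the hidden originals become the training targets."""
    rng = random.Random(seed)
    inputs, targets = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            inputs.append(MASK)
            targets.append(tok)   # the model must predict this word
        else:
            inputs.append(tok)
            targets.append(None)  # nothing to predict at this position
    return inputs, targets


sentence = "transformers can learn useful representations from raw text".split()
inputs, targets = make_masked_example(sentence, mask_prob=0.3, seed=0)
print(inputs)
print(targets)
```

The "labels" here are simply the words that were hidden, so any large collection of text can serve as training data without human annotation.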

Computing cost of training transformers

However, training and deploying Transformers remains largely a privilege of the big technology companies with access to vast data sources and compute resources. For example, OpenAI’s popular GPT-3 model is estimated to have cost around 10 million dollars to train, an amount of…




Professor at UPC Barcelona Tech & Barcelona Supercomputing Center.