A pre-trained transformer encoder that uses bidirectional self-attention to learn a contextual representation of each word in a sentence, conditioning on both its left and right context, which enabled state-of-the-art performance on a wide range of natural language processing tasks.
For example, BERT can be fine-tuned on a task like sentiment analysis to classify text as positive, negative, or neutral, using the contextual representation of the input rather than individual words in isolation.
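The sketch below illustrates this fine-tuning setup, assuming the Hugging Face transformers library. The checkpoint name, label mapping, example sentences, and learning rate are illustrative placeholders, and a single gradient step stands in for a full training loop.

```python
# A minimal sketch of fine-tuning BERT for three-class sentiment analysis.
# Assumes the Hugging Face transformers library; dataset and hyperparameters
# are placeholders, not a definitive recipe.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=3  # positive, negative, neutral
)

# Tokenize a small batch of example sentences with sentiment labels.
texts = ["The movie was wonderful.", "Terrible service.", "It opens at nine."]
labels = torch.tensor([0, 1, 2])  # 0=positive, 1=negative, 2=neutral (assumed mapping)
batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")

# One gradient step: the classification head added on top of the encoder
# is trained jointly with the pre-trained BERT weights.
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
outputs = model(**batch, labels=labels)
outputs.loss.backward()
optimizer.step()

# After fine-tuning, predictions are the argmax over the three class logits.
with torch.no_grad():
    preds = model(**batch).logits.argmax(dim=-1)
```

Because the entire encoder is updated rather than just the new classification head, the contextual representations themselves adapt to the sentiment task, which is what makes fine-tuning effective with relatively little labeled data.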