BERT = Bidirectional Encoder Representations from Transformers.

Core idea

  • BERT is a pretrained NLP model built from Transformer encoder layers.
  • It is designed for language understanding tasks rather than text generation.
  • During pretraining (masked language modeling, plus next-sentence prediction), it learns bidirectional context for each token: a token's representation draws on the words both before and after it.
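The bidirectional point can be illustrated with a toy self-attention step in plain Python. This is a minimal sketch with identity Q/K/V projections (a real model uses learned projection matrices and multiple heads); the key property shown is that, with no causal mask, every token's output is a weighted mix of ALL tokens, to its left and right.

```python
import math

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def self_attention(x):
    # x: list of token vectors; identity Q/K/V projections for simplicity
    # (a real encoder layer learns separate projection matrices)
    d = len(x[0])
    out = []
    for q in x:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in x]
        w = softmax(scores)  # weights over ALL positions: no causal mask
        out.append([sum(wj * v[i] for wj, v in zip(w, x)) for i in range(d)])
    return out

toks = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # 3 toy token vectors
ctx = self_attention(toks)
# each output vector mixes information from every position in the sequence
```

A decoder-style (GPT-like) model would zero out the weights for positions to the right of each query token; BERT's encoder keeps them all, which is what "bidirectional" means here.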

Architecture

  • BERT is built from a stack of Transformer encoder layers.
  • Input text is split into subword tokens, each mapped to an embedding vector (token + position + segment embeddings).
  • These vectors pass through the stack of encoder layers.
  • In each layer, every token can attend to every other token in the sequence, in both directions.
  • The final output is one contextual representation vector per token.
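The stacking described above can be sketched in plain Python. This is a heavily simplified stand-in, not real BERT: the attention uses identity projections, the feed-forward sublayer is a bare ReLU, and layer normalization and learned weights are omitted. What it does show is the shape contract of the encoder stack: n token vectors go in, n contextual vectors of the same dimensionality come out, and each layer lets every token mix with every other.

```python
import math

def softmax(xs):
    m = max(xs)
    es = [math.exp(v - m) for v in xs]
    s = sum(es)
    return [e / s for e in es]

def attend(x):
    # bidirectional self-attention with identity projections (toy stand-in
    # for the learned multi-head attention in a real encoder layer)
    d = len(x[0])
    out = []
    for q in x:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in x]
        w = softmax(scores)
        out.append([sum(wj * v[i] for wj, v in zip(w, x)) for i in range(d)])
    return out

def encoder_layer(x):
    # attention sublayer + residual, then a toy position-wise feed-forward
    # (ReLU) + residual; real BERT also applies layer norm and learned weights
    h = [[xi + ai for xi, ai in zip(xv, av)] for xv, av in zip(x, attend(x))]
    return [[hi + max(0.0, hi) for hi in hv] for hv in h]

def encode(x, num_layers=2):
    for _ in range(num_layers):
        x = encoder_layer(x)
    return x

emb = [[0.1, 0.2], [0.3, 0.1], [0.2, 0.4]]  # 3 tokens, toy d_model = 2
reps = encode(emb)
# one contextual vector per input token, same dimensionality throughout
```

Real BERT-base stacks 12 such layers with hidden size 768; the per-token, shape-preserving structure is the same.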

Typical workflow

input text -> BERT -> representation or task-specific prediction
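The "task-specific prediction" step is usually a small head placed on top of BERT's output. Below is a hypothetical binary classification head in plain Python: it mean-pools the per-token vectors into one sentence vector, then applies a linear layer and a sigmoid. The token vectors and weights are made-up stand-ins; in practice the pooled `[CLS]` vector and weights learned during fine-tuning are used.

```python
import math

def classify(token_vectors, weights, bias):
    # mean-pool the contextual token vectors into one sentence vector
    d = len(token_vectors[0])
    n = len(token_vectors)
    pooled = [sum(v[i] for v in token_vectors) / n for i in range(d)]
    # linear layer + sigmoid -> probability of the positive class
    logit = sum(w * p for w, p in zip(weights, pooled)) + bias
    return 1.0 / (1.0 + math.exp(-logit))

# made-up contextual vectors (stand-in for BERT output) and head parameters
reps = [[0.2, 0.5, 0.1], [0.4, 0.3, 0.0], [0.1, 0.6, 0.2]]
prob = classify(reps, weights=[1.0, -0.5, 2.0], bias=0.0)
# prob is a probability in (0, 1) for the positive class
```

For token-level tasks such as named entity recognition, the head is instead applied to each token's vector individually rather than to a pooled summary.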

Common use cases

  • text classification
  • question answering
  • named entity recognition
  • sentence-pair understanding