BERT = Bidirectional Encoder Representations from Transformers.
Core idea
- BERT is a pretrained NLP model based on Transformer encoder layers.
- It is designed mainly for text-understanding tasks, not text generation.
- It learns bidirectional context for each token.
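"Bidirectional context" can be made concrete with a toy contrast between full attention and left-to-right attention. This is a minimal pure-Python sketch with made-up 2-D token vectors, not BERT's real parameters:

```python
# Toy sketch: bidirectional attention (BERT-style) vs left-to-right
# attention. Token vectors are hypothetical 2-D embeddings.
import math

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def attend(query, keys, visible):
    # Dot-product scores over only the positions the query may see.
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    masked = [s if ok else float("-inf") for s, ok in zip(scores, visible)]
    return softmax(masked)

tokens = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]  # three toy token vectors
i = 0  # look at the first token

bidirectional = attend(tokens[i], tokens, [True, True, True])
left_to_right = attend(tokens[i], tokens, [j <= i for j in range(3)])

# Bidirectionally, the first token draws on all 3 positions; with a
# left-to-right mask it can only see itself.
print(bidirectional)
print(left_to_right)  # [1.0, 0.0, 0.0]
```

The left-to-right variant (as in GPT-style decoders) gives the first token no access to what follows it; BERT's encoder lets every token use both sides.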
Architecture
- BERT is built from a stack of Transformer encoder layers.
- Input text is tokenized, and each token is converted into an embedding vector.
- These vectors pass through multiple encoder layers.
- In each layer, every token attends to all other tokens in the sequence, in both directions.
- The final output is a contextual representation for each token.
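The architecture steps above can be sketched with one round of self-attention, the core operation inside an encoder layer. This is a simplified pure-Python illustration (no learned projection matrices, feed-forward sublayer, or normalization); the vectors are invented for the example:

```python
# Minimal self-attention sketch: each token's output is an
# attention-weighted mix of all token vectors in the sequence.
import math

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def self_attention(vectors):
    out = []
    for q in vectors:
        # Score every position against the current token, then mix.
        weights = softmax([sum(a * b for a, b in zip(q, k)) for k in vectors])
        mixed = [sum(w * v[d] for w, v in zip(weights, vectors))
                 for d in range(len(q))]
        out.append(mixed)
    return out

# The same token vector placed next to two different neighbors:
sent_a = [[1.0, 0.0], [0.0, 2.0]]  # e.g. "bank" near a "river"-like vector
sent_b = [[1.0, 0.0], [2.0, 0.0]]  # e.g. "bank" near a "money"-like vector

bank_a = self_attention(sent_a)[0]
bank_b = self_attention(sent_b)[0]
print(bank_a != bank_b)  # True
```

The identical input vector comes out different in the two sentences, which is exactly what "contextual representation for each token" means.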
Typical workflow
input text -> BERT -> representation or task-specific prediction
Common use cases
- text classification
- question answering
- named entity recognition
- sentence-pair understanding
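For the text-classification use case, the workflow ends with a small task head on top of BERT's output. The sketch below stands in a made-up pooled sentence vector and hypothetical untrained weights for the real model, just to show the shape of the final step:

```python
# Hedged sketch of a task head: a linear score over a pooled
# sentence vector, thresholded into a binary label. `pooled` and
# `weights` are invented values, not real BERT outputs or weights.
def classify(pooled, weights, bias):
    score = sum(p * w for p, w in zip(pooled, weights)) + bias
    return 1 if score > 0 else 0

pooled = [0.2, -0.4, 0.9]   # stand-in for BERT's sentence representation
weights = [0.5, 0.1, 0.3]   # hypothetical classifier weights
label = classify(pooled, weights, bias=-0.1)
print(label)  # 1
```

In practice this head is a trained linear layer (plus softmax for more classes), fine-tuned together with BERT on labeled data.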