BERT = Bidirectional Encoder Representations from Transformers.

Core idea

  • BERT is a pretrained NLP model built from Transformer encoder layers.
  • It is designed for language understanding tasks rather than text generation.
  • During pretraining (masked language modeling, plus next-sentence prediction), it learns bidirectional context for each token: a token's representation draws on the words both before and after it.
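The bidirectional point can be illustrated with a toy self-attention step in plain Python. This is a minimal sketch with identity Q/K/V projections (a real model uses learned projection matrices and multiple heads); the key property shown is that, with no causal mask, every token's output is a weighted mix of ALL tokens, to its left and right.

```python
import math

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def self_attention(x):
    # x: list of token vectors; identity Q/K/V projections for simplicity
    # (a real encoder layer learns separate projection matrices)
    d = len(x[0])
    out = []
    for q in x:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in x]
        w = softmax(scores)  # weights over ALL positions: no causal mask
        out.append([sum(wj * v[i] for wj, v in zip(w, x)) for i in range(d)])
    return out

toks = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # 3 toy token vectors
ctx = self_attention(toks)
# each output vector mixes information from every position in the sequence
```

A decoder-style (GPT-like) model would zero out the weights for positions to the right of each query token; BERT's encoder keeps them all, which is what "bidirectional" means here.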

Architecture

  • BERT is built from a stack of Transformer encoder layers.
  • Input text is split into subword tokens, each mapped to an embedding vector (token + position + segment embeddings).
  • These vectors pass through the stack of encoder layers.
  • In each layer, every token can attend to every other token in the sequence, in both directions.
  • The final output is one contextual representation vector per token.
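The stacking described above can be sketched in plain Python. This is a heavily simplified stand-in, not real BERT: the attention uses identity projections, the feed-forward sublayer is a bare ReLU, and layer normalization and learned weights are omitted. What it does show is the shape contract of the encoder stack: n token vectors go in, n contextual vectors of the same dimensionality come out, and each layer lets every token mix with every other.

```python
import math

def softmax(xs):
    m = max(xs)
    es = [math.exp(v - m) for v in xs]
    s = sum(es)
    return [e / s for e in es]

def attend(x):
    # bidirectional self-attention with identity projections (toy stand-in
    # for the learned multi-head attention in a real encoder layer)
    d = len(x[0])
    out = []
    for q in x:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in x]
        w = softmax(scores)
        out.append([sum(wj * v[i] for wj, v in zip(w, x)) for i in range(d)])
    return out

def encoder_layer(x):
    # attention sublayer + residual, then a toy position-wise feed-forward
    # (ReLU) + residual; real BERT also applies layer norm and learned weights
    h = [[xi + ai for xi, ai in zip(xv, av)] for xv, av in zip(x, attend(x))]
    return [[hi + max(0.0, hi) for hi in hv] for hv in h]

def encode(x, num_layers=2):
    for _ in range(num_layers):
        x = encoder_layer(x)
    return x

emb = [[0.1, 0.2], [0.3, 0.1], [0.2, 0.4]]  # 3 tokens, toy d_model = 2
reps = encode(emb)
# one contextual vector per input token, same dimensionality throughout
```

Real BERT-base stacks 12 such layers with hidden size 768; the per-token, shape-preserving structure is the same.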

Typical workflow

input text -> BERT -> representation or task-specific prediction
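The "task-specific prediction" step is usually a small head placed on top of BERT's output. Below is a hypothetical binary classification head in plain Python: it mean-pools the per-token vectors into one sentence vector, then applies a linear layer and a sigmoid. The token vectors and weights are made-up stand-ins; in practice the pooled `[CLS]` vector and weights learned during fine-tuning are used.

```python
import math

def classify(token_vectors, weights, bias):
    # mean-pool the contextual token vectors into one sentence vector
    d = len(token_vectors[0])
    n = len(token_vectors)
    pooled = [sum(v[i] for v in token_vectors) / n for i in range(d)]
    # linear layer + sigmoid -> probability of the positive class
    logit = sum(w * p for w, p in zip(weights, pooled)) + bias
    return 1.0 / (1.0 + math.exp(-logit))

# made-up contextual vectors (stand-in for BERT output) and head parameters
reps = [[0.2, 0.5, 0.1], [0.4, 0.3, 0.0], [0.1, 0.6, 0.2]]
prob = classify(reps, weights=[1.0, -0.5, 2.0], bias=0.0)
# prob is a probability in (0, 1) for the positive class
```

For token-level tasks such as named entity recognition, the head is instead applied to each token's vector individually rather than to a pooled summary.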

Common use cases

  • text classification
  • question answering
  • named entity recognition
  • sentence-pair understanding