BERT (Bidirectional Encoder Representations from Transformers) is a natural language processing (NLP) model released by Google in 2018. It is used for tasks such as question answering, language understanding, and sentiment analysis. Rather than examining words in isolation, this deep learning model represents each word in light of the sentence around it, which significantly improved how well models capture context.

How BERT works

BERT is built on the Transformer encoder, whose attention mechanism learns contextual relations between words (or sub-words) in a text. Unlike earlier models, which read text sequentially (either left-to-right or right-to-left), BERT processes the entire sequence at once. This lets the model represent each word using all of its surroundings, both to the left and to the right of the word.

The attention mechanism in the Transformer architecture lets the model weigh how much every other word in the input matters when it computes the representation of a given word. This is what allows the same word to be represented differently depending on the sentence it appears in.
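
The weighting described above can be illustrated with a minimal sketch of scaled dot-product attention in plain Python. This is a simplification, not BERT's actual implementation: the real model applies learned projections to produce the query, key, and value vectors and runs several attention heads in parallel. The function names and the toy vectors are made up for illustration.

```python
import math

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(queries, keys, values):
    """Scaled dot-product attention over one sequence.

    Each argument is a list of equal-length vectors, one per token.
    Returns (outputs, weights), where weights[i][j] is how much
    token i attends to token j.
    """
    dim = len(keys[0])
    outputs, weights = [], []
    for q in queries:
        # Similarity of this token's query to every token's key.
        scores = [
            sum(a * b for a, b in zip(q, k)) / math.sqrt(dim)
            for k in keys
        ]
        row = softmax(scores)
        weights.append(row)
        # Output is the attention-weighted average of the value vectors.
        outputs.append([
            sum(w * v[d] for w, v in zip(row, values))
            for d in range(len(values[0]))
        ])
    return outputs, weights

# Toy 2-dimensional vectors for three tokens (made-up numbers).
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = self_attention(tokens, tokens, tokens)
```

Every token attends to every position in the sequence, to its left and to its right alike, which is the sense in which the model is bidirectional.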

BERT is pre-trained on two unsupervised tasks: Masked Language Model (MLM) and Next Sentence Prediction (NSP). MLM randomly masks some of the tokens in the input and trains the model to predict them from the context provided by the unmasked tokens. NSP trains the model to predict whether the second of two sentences actually follows the first in the original text. This pre-training step gives BERT a general understanding of language before it is fine-tuned for specific tasks.
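
As a rough sketch of how training examples for these two tasks might be built, the Python below masks tokens at random and pairs sentences with either their true successor or a random one. The function names are hypothetical. The 15% masking rate and the 50/50 split follow the original BERT recipe but are not stated in this article, and the sketch omits details of that recipe, such as sometimes substituting a random token instead of the mask.

```python
import random

MASK = "[MASK]"

def mask_tokens(tokens, mask_prob=0.15, seed=None):
    """Replace a random subset of tokens with [MASK].

    Returns the masked sequence and a dict mapping each masked
    position to the original token the model should predict.
    """
    rng = random.Random(seed)
    masked = list(tokens)
    labels = {}
    for i, tok in enumerate(tokens):
        if rng.random() < mask_prob:
            labels[i] = tok
            masked[i] = MASK
    return masked, labels

def make_nsp_pair(sentences, index, rng):
    """Build one NSP example starting from sentences[index].

    Needs at least three sentences and index < len(sentences) - 1.
    Returns (first, second, is_next).
    """
    first = sentences[index]
    if rng.random() < 0.5:
        # Positive example: the sentence that really comes next.
        return first, sentences[index + 1], True
    # Negative example: any sentence other than the first or its successor.
    others = [s for j, s in enumerate(sentences) if j not in (index, index + 1)]
    return first, rng.choice(others), False

words = "the cat sat on the mat".split()
masked, labels = mask_tokens(words, mask_prob=0.5, seed=0)

doc = ["It rained all day.", "The match was cancelled.",
       "Tickets were refunded.", "Fans went home."]
pair = make_nsp_pair(doc, 0, random.Random(0))
```

During pre-training, the model would see `masked` and be scored on recovering the tokens in `labels`, and would see each sentence pair and be scored on predicting the `is_next` flag.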
