Augmentation
Data augmentation refers to techniques that artificially increase the size and diversity of a dataset by applying transformations to the original data, typically ones that leave the label unchanged. A larger, more varied training set can help a model generalize better.
How Data Augmentation Works
This section looks at the types of data that can be augmented, and at the purpose and implementation of data augmentation techniques.
1. Types of Data:
Image Data:
- Rotations, zooming, flips (horizontal & vertical), color variations, cropping, and more.
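- More advanced techniques include Cutout (masking out a random square region of an image) and Mixup (blending pairs of images together with their labels).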
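As a rough illustration of the image techniques listed above, here is a minimal sketch in plain Python that treats an image as a list of pixel rows. The function names `horizontal_flip`, `cutout`, and `mixup` are hypothetical helpers written for this example, not a library API. The cutout here always keeps the masked square fully inside the image, and the mixup weight `lam` is passed in directly, whereas in practice it is usually drawn at random for each pair. Real pipelines would normally work on tensors with a library such as PyTorch or TensorFlow.

```python
import random

def horizontal_flip(image):
    """Mirror each row of a 2D image (a list of rows) left to right."""
    return [row[::-1] for row in image]

def cutout(image, size, rng, fill=0):
    """Return a copy with one randomly placed size x size square set to `fill`."""
    height, width = len(image), len(image[0])
    top = rng.randrange(height - size + 1)
    left = rng.randrange(width - size + 1)
    out = [row[:] for row in image]  # copy so the original stays untouched
    for r in range(top, top + size):
        for c in range(left, left + size):
            out[r][c] = fill
    return out

def mixup(image_a, label_a, image_b, label_b, lam):
    """Blend two images and their one-hot labels with weight `lam` in [0, 1]."""
    mixed_image = [
        [lam * a + (1 - lam) * b for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(image_a, image_b)
    ]
    mixed_label = [lam * a + (1 - lam) * b for a, b in zip(label_a, label_b)]
    return mixed_image, mixed_label

rng = random.Random(0)  # seeded so the example is reproducible
image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

flipped = horizontal_flip(image)
masked = cutout(image, size=2, rng=rng)
mixed_image, mixed_label = mixup(image, [1, 0], flipped, [0, 1], lam=0.5)
```

Note that the flip and cutout leave the label unchanged, while mixup produces a soft label that reflects both source images.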
Text Data:
- Back translation (translating a sentence from the original language to another language and then back to the original language), synonym replacement, and sentence shuffling.
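Two of the text techniques above can be sketched with the standard library alone. The `SYNONYMS` table and the helper names below are assumptions made up for this example; a real system would draw synonyms from a thesaurus or a language model. Back translation is left out because it requires a translation model.

```python
import random

# Hypothetical toy synonym table, for illustration only.
SYNONYMS = {"quick": ["fast", "rapid"], "happy": ["glad", "joyful"]}

def replace_synonyms(sentence, rng, prob=0.5):
    """Swap each word that has known synonyms with probability `prob`."""
    out = []
    for word in sentence.split():
        options = SYNONYMS.get(word.lower())
        if options and rng.random() < prob:
            out.append(rng.choice(options))
        else:
            out.append(word)
    return " ".join(out)

def shuffle_sentences(text, rng):
    """Reorder the sentences of a text (naively split on periods)."""
    sentences = [s.strip() for s in text.split(".") if s.strip()]
    rng.shuffle(sentences)
    return ". ".join(sentences) + "."

rng = random.Random(42)  # seeded so the example is reproducible
augmented = replace_synonyms("the quick dog is happy", rng, prob=1.0)
shuffled = shuffle_sentences("First one. Second one. Third one.", rng)
```

Sentence shuffling is only safe when the task does not depend on sentence order, so it suits topic classification better than, say, summarization.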
Audio Data:
- Changing pitch, speed, or adding noise.
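For audio, a minimal sketch of noise injection and a speed change might look like the following, assuming a waveform stored as a list of float samples. The helper names and the 8 kHz test tone are illustrative assumptions. The speed change uses plain linear-interpolation resampling, which also shifts the pitch; changing pitch and speed independently needs more elaborate signal processing than is shown here.

```python
import math
import random

def add_noise(samples, rng, noise_std=0.01):
    """Add zero-mean Gaussian noise to every sample."""
    return [s + rng.gauss(0.0, noise_std) for s in samples]

def change_speed(samples, rate):
    """Resample by linear interpolation; rate > 1 plays faster (shorter output).

    This naive approach changes pitch along with speed.
    """
    n_out = int(len(samples) / rate)
    out = []
    for i in range(n_out):
        pos = i * rate                       # fractional position in the input
        lo = int(pos)
        hi = min(lo + 1, len(samples) - 1)   # clamp at the last sample
        frac = pos - lo
        out.append((1 - frac) * samples[lo] + frac * samples[hi])
    return out

rng = random.Random(0)  # seeded so the example is reproducible
# 0.1 seconds of a 440 Hz sine tone at an assumed 8000 Hz sample rate.
tone = [math.sin(2 * math.pi * 440 * t / 8000) for t in range(800)]

noisy = add_noise(tone, rng)
faster = change_speed(tone, 2.0)
slower = change_speed(tone, 0.5)
```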
2. Purpose:
- Enhance Generalization: By exposing the model to various modifications of the original data, the model becomes more robust and can generalize better to new, unseen data.
- Balance Datasets: Augmentation can be used to balance classes in datasets by artificially increasing the number of examples in underrepresented classes.
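The class-balancing idea above can be sketched as oversampling with augmentation: for each underrepresented class, pick existing examples at random and add augmented copies until every class matches the largest one. The function `balance_with_augmentation` and the `jitter` transform are hypothetical names for this example, and the feature vectors are made-up toy data. A sketch like this would normally be applied to the training split only.

```python
import random
from collections import Counter

def balance_with_augmentation(samples, labels, augment, rng):
    """Add augmented copies of minority-class samples until all classes match the largest."""
    counts = Counter(labels)
    target = max(counts.values())
    out_samples, out_labels = list(samples), list(labels)
    for label, count in counts.items():
        pool = [s for s, l in zip(samples, labels) if l == label]
        for _ in range(target - count):
            out_samples.append(augment(rng.choice(pool), rng))
            out_labels.append(label)
    return out_samples, out_labels

def jitter(sample, rng):
    """Toy augmentation: add small Gaussian noise to each feature."""
    return [x + rng.gauss(0.0, 0.05) for x in sample]

rng = random.Random(0)  # seeded so the example is reproducible
samples = [[0.0, 0.1], [0.2, 0.1], [0.1, 0.3], [0.9, 1.0]]
labels = ["cat", "cat", "cat", "dog"]

balanced_samples, balanced_labels = balance_with_augmentation(samples, labels, jitter, rng)
label_counts = Counter(balanced_labels)
```

Because the new minority examples are all derived from the same few originals, this adds variety but no genuinely new information, so it complements rather than replaces collecting more data.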
3. Implementation:
- Most machine learning libraries, such as TensorFlow and PyTorch, have built-in functions or modules for data augmentation.
- Data augmentation is typically applied during the training process. When a training batch is requested, raw data samples are fetched and augmented on-the-fly before being fed into the model.
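The on-the-fly pattern described above can be sketched as a generator that shuffles the dataset, fetches raw samples for each batch, and augments them just before they are yielded. The names `augmented_batches` and `random_flip` are hypothetical; in practice, the data-loading utilities of TensorFlow or PyTorch would fill this role.

```python
import random

def augmented_batches(dataset, batch_size, augment, rng):
    """Yield batches for one epoch, augmenting each sample as it is fetched."""
    indices = list(range(len(dataset)))
    rng.shuffle(indices)
    for start in range(0, len(indices), batch_size):
        batch_indices = indices[start:start + batch_size]
        # The stored dataset is never modified; only the yielded copies are.
        yield [augment(dataset[i], rng) for i in batch_indices]

def random_flip(sample, rng):
    """Toy augmentation: reverse the sample with probability 0.5."""
    return sample[::-1] if rng.random() < 0.5 else sample[:]

rng = random.Random(0)  # seeded so the example is reproducible
dataset = [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12], [13, 14, 15]]

epoch_1 = list(augmented_batches(dataset, 2, random_flip, rng))
epoch_2 = list(augmented_batches(dataset, 2, random_flip, rng))
```

Since the random transformations are redrawn every epoch, the model can see a different variant of each sample each time, without any augmented copies being stored on disk.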