VOCABULARY

Synthetic Data Generation

Synthetic data generation is the process of creating artificial data that mimics the statistical properties of real-world data. The technique is widely used in machine learning and AI to produce large training datasets, especially when real data is scarce, expensive to obtain, or sensitive.

How Synthetic Data Generation Works

To create synthetic data, algorithms typically start from existing data and learn its patterns and distributions. They then generate new data points that are statistically similar to the originals but not identical to them. For example, in image processing, synthetic data might consist of new images that are variations of existing ones, with altered lighting, angles, or backgrounds. In finance, it could mean generating transactional data that mirrors real customer behavior without using actual customer records.
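The finance example can be made concrete with a minimal sketch. It assumes that transaction amounts are roughly log-normally distributed, which is a simplification chosen for illustration and not a claim about any real dataset. The sample values and the helper names `fit_lognormal` and `generate_synthetic` are hypothetical. The code estimates the distribution's parameters from a handful of "real" amounts and then samples new amounts from the fitted distribution.

```python
import math
import random
import statistics

# Hypothetical "real" transaction amounts; stand-in values for illustration only.
real_amounts = [12.50, 8.99, 45.00, 23.10, 5.75, 110.00, 19.99, 34.20, 7.40, 61.30]

def fit_lognormal(values):
    """Estimate the mean and standard deviation of log(values)."""
    logs = [math.log(v) for v in values]
    return statistics.mean(logs), statistics.stdev(logs)

def generate_synthetic(values, n, seed=0):
    """Draw n new amounts from a log-normal distribution fitted to values."""
    mu, sigma = fit_lognormal(values)
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    return [rng.lognormvariate(mu, sigma) for _ in range(n)]

# 5000 synthetic amounts that follow the fitted distribution
# without copying any of the original values.
synthetic_amounts = generate_synthetic(real_amounts, n=5000)
```

Practical generators usually model many columns and their correlations at once, but the pattern is the same: learn a distribution from existing data, then sample from it.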
