
Data-Centric AI

Data-centric AI is an approach to artificial intelligence that places the main emphasis on the quality and management of the data used to train the AI system, rather than focusing solely on improving the algorithm or model. It recognizes the crucial role that data plays in AI and holds that refining and curating data can lead to significant improvements in system performance.

Data-centric AI approach in practice

In a data-centric AI approach, a large portion of the work revolves around preparing and maintaining the data used for training models. This includes tasks like data collection, cleaning, labeling, and ongoing curation. In essence, it means investing time and resources to ensure that the data feeding into the system is accurate, reliable, diverse, and truly representative.

One method used in data-centric AI is consistent and precise labeling, which can significantly impact the performance of AI models. Robust label management systems are therefore a crucial part of the process, ensuring that labels across datasets are uniform and accurate.
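As a rough illustration of a label consistency check, the sketch below flags items that received conflicting labels from different annotators. The function names, the `(item_id, label)` tuple format, and the sample data are all assumptions made for this example; they do not come from any particular label management system.

```python
from collections import defaultdict


def normalize_label(label):
    """Collapse case and surrounding whitespace so 'Cat ' and 'cat' match."""
    return label.strip().lower()


def find_label_conflicts(annotations):
    """Return {item_id: sorted distinct labels} for inconsistently labeled items.

    `annotations` is an iterable of (item_id, label) pairs (hypothetical format).
    """
    labels_by_item = defaultdict(set)
    for item_id, label in annotations:
        labels_by_item[item_id].add(normalize_label(label))
    # Keep only items that still have more than one label after normalization.
    return {
        item_id: sorted(labels)
        for item_id, labels in labels_by_item.items()
        if len(labels) > 1
    }


# Hypothetical annotations from several labelers.
annotations = [
    ("img_001", "cat"),
    ("img_001", "Cat "),   # same label, different formatting
    ("img_002", "dog"),
    ("img_002", "wolf"),   # genuine disagreement
    ("img_003", "bird"),
]

conflicts = find_label_conflicts(annotations)
print(conflicts)
```

Normalizing before comparing separates formatting noise from real disagreement, so only items such as `img_002` are sent back for review.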

Another method is continuous data auditing, which checks that the data is of high quality and as free as possible of the biases and errors that could degrade the model's performance. The process may also be iterative: the data is refined and the model retrained, so that any errors or biases found are corrected in the data itself.
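A data audit can start with very simple checks. The following sketch counts rows with missing values, exact duplicate rows, and the number of rows per label. The `audit_dataset` function, the row format (a list of dictionaries), and the sample rows are assumptions for illustration; label counts are only a crude proxy for bias, and a real audit would go further.

```python
from collections import Counter


def audit_dataset(rows, label_key="label"):
    """Summarize basic quality problems in a list of dict rows (hypothetical format)."""
    # A row counts as incomplete if any field is None or an empty string.
    rows_with_missing = sum(
        1 for row in rows if any(v is None or v == "" for v in row.values())
    )

    # Count rows that exactly repeat an earlier row.
    seen = set()
    duplicates = 0
    for row in rows:
        key = tuple(sorted(row.items(), key=lambda kv: kv[0]))
        if key in seen:
            duplicates += 1
        seen.add(key)

    # Rows per label, to reveal heavily skewed classes.
    label_counts = Counter(row.get(label_key) for row in rows)

    return {
        "rows": len(rows),
        "rows_with_missing": rows_with_missing,
        "duplicates": duplicates,
        "label_counts": dict(label_counts),
    }


# Hypothetical sentiment dataset.
rows = [
    {"text": "great product", "label": "positive"},
    {"text": "great product", "label": "positive"},  # exact duplicate
    {"text": "", "label": "negative"},               # missing text
    {"text": "arrived late", "label": "negative"},
    {"text": "works fine", "label": "positive"},
]

report = audit_dataset(rows)
print(report)
```

Running a report like this before each retraining round gives a concrete list of issues to fix in the data itself.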

Lastly, synthetic data generation and data augmentation can be used to increase the quantity and diversity of the data, enabling models to learn from a broader set of instances.
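To make augmentation concrete, here is a minimal sketch that creates extra text examples by swapping one pair of adjacent words while keeping the original label. The technique, the function names, and the sample data are chosen purely for illustration; word swaps can change meaning in some tasks, so whether this transformation is label-preserving is an assumption to verify for each dataset.

```python
import random


def swap_adjacent_words(text, rng):
    """Return a copy of `text` with one random pair of adjacent words swapped."""
    words = text.split()
    if len(words) < 2:
        return text
    i = rng.randrange(len(words) - 1)
    words[i], words[i + 1] = words[i + 1], words[i]
    return " ".join(words)


def augment(examples, copies=2, seed=0):
    """Return the original (text, label) pairs plus `copies` variants of each."""
    rng = random.Random(seed)  # seeded so the augmentation is reproducible
    augmented = list(examples)
    for text, label in examples:
        for _ in range(copies):
            augmented.append((swap_adjacent_words(text, rng), label))
    return augmented


# Hypothetical labeled examples.
examples = [
    ("the delivery was very fast", "positive"),
    ("the screen cracked after one day", "negative"),
]

augmented = augment(examples, copies=2)
for text, label in augmented:
    print(label, "|", text)
```

Each original example yields two variants here, so the two inputs grow to six examples with unchanged labels.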

In conclusion, data-centric AI works by putting the focus on the data used to train the model, ensuring the consistency and quality of this data to improve the overall performance and reliability of the AI system.
