The F-score, also known as the F1 score, is a measure used to evaluate binary classification systems. It combines precision and recall into a single number. Precision is the number of true positive results divided by the number of all results predicted as positive, including those incorrectly identified as positive (false positives), while recall is the number of true positive results divided by the number of all samples that should have been identified as positive (true positives plus false negatives). The F-score reaches its best value at 1 (perfect precision and recall) and its worst at 0.
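
To make the two definitions concrete, here is a minimal sketch that computes precision and recall from raw counts. The function name `precision_recall` and the example counts are illustrative assumptions, as is the choice to return 0 when a denominator is empty.

```python
def precision_recall(tp, fp, fn):
    """Return (precision, recall) from true positive, false positive,
    and false negative counts."""
    predicted_positive = tp + fp  # everything the classifier flagged as positive
    actual_positive = tp + fn     # everything that really is positive
    # Assumption: report 0.0 when a denominator is empty, rather than raising.
    precision = tp / predicted_positive if predicted_positive else 0.0
    recall = tp / actual_positive if actual_positive else 0.0
    return precision, recall


# Hypothetical counts: 8 true positives, 2 false positives, 4 false negatives.
precision, recall = precision_recall(tp=8, fp=2, fn=4)
print(f"precision={precision:.3f}, recall={recall:.3f}")
```

With these counts, precision is 8 out of 10 predicted positives and recall is 8 out of 12 actual positives.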

How F-score works

The F-score is the harmonic mean of precision and recall. While the regular (arithmetic) mean treats all values equally, the harmonic mean gives much more weight to low values. As a result, a classifier gets a high F-score only if both recall and precision are high. To perform well, a model must therefore have both high precision (few false positives) and high recall (few false negatives).
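
The effect of the harmonic mean is easiest to see with a lopsided classifier. The sketch below uses Python's standard `statistics` module; the precision and recall values are made up for illustration.

```python
from statistics import harmonic_mean, mean

# Hypothetical classifier: very precise, but it misses most positives.
precision, recall = 0.9, 0.1

arithmetic = mean([precision, recall])         # treats both values equally
harmonic = harmonic_mean([precision, recall])  # pulled toward the lower value

print(f"arithmetic mean: {arithmetic:.2f}")
print(f"harmonic mean:   {harmonic:.2f}")
```

The arithmetic mean comes out at about 0.5, which looks acceptable, while the harmonic mean is about 0.18, which reflects the poor recall.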

Here is the formula for F-score:

2 * (precision * recall) / (precision + recall)

When both precision and recall are 0, the denominator is 0 and the formula is undefined. To handle this, the F-score is defined to be 0 in that case. If only one of the two is 0, the formula already evaluates to 0 without any special handling.
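
A minimal sketch of the formula with the zero case handled explicitly might look like this. The function name `f1_score` is an illustrative choice here and is not meant to mirror any particular library's API.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall, defined as 0 when both are 0."""
    denominator = precision + recall
    if denominator == 0:
        # Both inputs are 0, so the formula would divide by zero.
        return 0.0
    return 2 * (precision * recall) / denominator


print(f1_score(0.8, 0.5))  # both reasonably high
print(f1_score(0.9, 0.0))  # one value is 0, so the score is 0
print(f1_score(0.0, 0.0))  # guarded case, also 0
```

Checking the denominator rather than each input separately keeps the guard to a single condition, since precision and recall are never negative.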
