Precision and Recall in AI

Written by
Andy Pandharikar
March 12, 2018

What are Precision and Recall?

In Artificial Intelligence, precision and recall can be understood intuitively: precision answers the question “how reliable are the results?” and recall answers “how complete are the results?” An example makes this concrete. Suppose a computer program for recognizing shoes flags 8 items as shoes in a picture containing 12 shoes and some socks. Of the eight items flagged, five are actually shoes (true positives) while the other three are socks (false positives). The remaining seven shoes in the picture go undetected (false negatives). The program’s precision is 5 / 8 while its recall is 5 / 12.
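The arithmetic in the shoe example can be checked with a few lines of Python. This is only a sketch of the calculation; the variable names are illustrative, and the counts are taken directly from the example above.

```python
from fractions import Fraction

# Counts from the shoe example (variable names are illustrative).
shoes_in_picture = 12   # all actual shoes in the picture
items_flagged = 8       # everything the program labeled "shoe"
true_positives = 5      # flagged items that really are shoes

# Socks mistaken for shoes, and shoes the program missed.
false_positives = items_flagged - true_positives
false_negatives = shoes_in_picture - true_positives

precision = Fraction(true_positives, true_positives + false_positives)
recall = Fraction(true_positives, true_positives + false_negatives)

print(f"precision = {precision} = {float(precision):.3f}")  # 5/8 = 0.625
print(f"recall    = {recall} = {float(recall):.3f}")        # 5/12 = 0.417
```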

Why does it Matter?

In simple terms, high precision means an algorithm returned substantially more relevant results than irrelevant ones, while a high recall means the algorithm returned most of the relevant results.

Precision measures the quality of our predictions based only on what the predictor claims to be positive, regardless of what it might miss.

Precision = True positives / (True positives + False positives)

Recall, by contrast, measures quality with respect to the positives we missed, that is, the items that should have been predicted as positive but that we flagged as negative (false negatives):

Recall = True positives / (True positives + False negatives)
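As a minimal sketch of how these two formulas might be computed from raw labels, the helper below counts true positives, false positives and false negatives from two lists. The function name `precision_recall`, the sample labels, and the choice to return 0.0 when a denominator is zero are all assumptions made for illustration.

```python
def precision_recall(actual, predicted):
    """Return (precision, recall) for two equal-length lists of booleans.

    True means "positive". Returning 0.0 when a denominator is zero is a
    convention assumed here, not the only reasonable choice.
    """
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")

    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a and p)        # predicted positive, correct
    fp = sum(1 for a, p in pairs if not a and p)    # predicted positive, wrong
    fn = sum(1 for a, p in pairs if a and not p)    # should have been positive

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical labels, invented purely for illustration.
actual = [True, True, True, False, False, True]
predicted = [True, False, True, True, False, False]

p, r = precision_recall(actual, predicted)
print(f"precision = {p:.3f}, recall = {r:.3f}")
```

With these made-up labels there are 2 true positives, 1 false positive and 2 false negatives, so precision is 2 / 3 and recall is 2 / 4.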


When Precision matters more than Recall

For example, the fundamental concept behind many countries’ judicial systems is to risk letting some culprits go unpunished so that no innocent person is wrongly punished. When you let more culprits go than you should, your recall is low, but if you punish only those you are absolutely sure are criminals, your precision is high.

Higher precision is also valued over recall whenever the cost of false positives is far larger than that of false negatives, as in military AI systems that identify an enemy, a stock-buying system that identifies profitable stocks, or a gold-mining survey system that identifies mining sites.

When Recall matters more than Precision

Suppose we had a weather forecasting model that classifies days as rainy or non-rainy. If we are in a business where missing a rainy day is costly, it is more important to catch all rainy days, even at the expense of predicting more rainy days than actually occur.
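One common way this trade-off shows up in practice is through a decision threshold: if a model outputs a probability of rain, lowering the threshold for calling a day "rainy" tends to raise recall and lower precision. The sketch below assumes such a probabilistic model; the scores, outcomes and the helper name `precision_recall_at` are hypothetical and invented only to illustrate the effect.

```python
def precision_recall_at(threshold, scores, actual):
    """Precision and recall when days with score >= threshold are called rainy."""
    predicted = [s >= threshold for s in scores]
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a and p)        # rainy days we caught
    fp = sum(1 for a, p in pairs if not a and p)    # false alarms
    fn = sum(1 for a, p in pairs if a and not p)    # rainy days we missed
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical predicted rain probabilities and what actually happened.
scores = [0.95, 0.80, 0.65, 0.55, 0.40, 0.30, 0.20, 0.10]
rained = [True, True, False, True, True, False, False, False]

for threshold in (0.70, 0.35):
    p, r = precision_recall_at(threshold, scores, rained)
    print(f"threshold {threshold:.2f}: precision = {p:.2f}, recall = {r:.2f}")
```

On this toy data the strict threshold (0.70) gives perfect precision but catches only half of the rainy days, while the lenient threshold (0.35) catches every rainy day at the cost of one false alarm.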

An AI model that is good at recall is not necessarily good at precision. That is why we use the F1 score, the harmonic mean of precision and recall, to evaluate an algorithm with a single number.
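The harmonic mean can be written as F1 = 2 × Precision × Recall / (Precision + Recall). A minimal sketch follows; the function name `f1_score` and the decision to return 0.0 when both inputs are zero are assumptions for illustration.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Shoe example from earlier in the article: precision 5/8, recall 5/12.
print(f"shoe example F1 = {f1_score(5 / 8, 5 / 12):.3f}")

# A hypothetical model that flags everything: perfect recall, poor precision.
p, r = 0.1, 1.0
print(f"arithmetic mean = {(p + r) / 2:.3f}, F1 = {f1_score(p, r):.3f}")
```

The second case hints at why a harmonic mean is used: the arithmetic mean of 0.1 and 1.0 looks respectable at 0.55, while F1 stays close to the weaker of the two values (about 0.18), so a model cannot score well by excelling at only one of them.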

Commerce AI’s models perform better than state-of-the-art technologies and are more consistent than human performance. On large-scale data, which is the reality in today’s world, we have surpassed human performance. We continue to improve our techniques, data sets and computation infrastructure. With 10B unique products and over 100B points of consumer reviews per year to learn from, we invite your business to embark on a new revolution with Commerce AI.
