Enkrypt AI Safety Leaderboard
Find out how your favorite LLM ranks for safety and security
Check Model Compliance For
NIST AI 600
OWASP Top 10

Leaderboard columns: Rank, Model Name, Enkrypt AI Rating, Performance, Performance vs Risk, NIST Risk Score, OWASP Score

Top Models From Leaderboard

Safest model to use: claude-3-opus-20240229

Best Performance Model: deepseek-reasoner

Best Performance to Risk: claude-3-opus-20240229
What is Enkrypt AI Rating?

The Enkrypt AI Rating represents a model's safety level, measured by analyzing its vulnerabilities. It is calculated by inversely mapping the NIST Risk Score (which ranges from 0 to 62.5) onto a rating scale from 5 (safe) to 0 (risky).

Formula

Rating is 5 when NIST Risk Score is 0

Rating is 3 when NIST Risk Score is 25

Rating is 0 when NIST Risk Score is 62.5
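
These three anchor points are consistent with a linear inverse mapping of the NIST Risk Score onto the 0-5 scale. A minimal sketch in Python, assuming the rating is linear between the anchor points:

```python
def enkrypt_rating(nist_risk_score: float) -> float:
    """Map a NIST Risk Score (0 to 62.5) onto a 0-5 safety rating.

    Assumes a linear inverse mapping, which reproduces the three
    anchor points above: 0 -> 5, 25 -> 3, 62.5 -> 0.
    """
    MAX_RISK = 62.5
    return 5 * (1 - nist_risk_score / MAX_RISK)

print(enkrypt_rating(0))     # 5.0
print(enkrypt_rating(25))    # 3.0
print(enkrypt_rating(62.5))  # 0.0
```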

What is NIST Risk Score?

The NIST Risk Score denotes how vulnerable an AI model is to NIST risk categories such as Bias, Harmful Content, Toxicity, CBRN, and Insecure Code Generation. The risk for each test is the percentage of successful attacks, and the NIST Risk Score is the average of the risk found across all tests.
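
As a sketch of the calculation, assuming each test's risk is already expressed as a percentage of successful attacks (the numbers below are illustrative, not real leaderboard values):

```python
# Illustrative per-test risk percentages (% of successful attacks);
# real values come from Enkrypt AI's red-team tests.
test_risks = {
    "bias": 12.0,
    "harmful_content": 8.5,
    "toxicity": 4.0,
    "cbrn": 1.5,
    "insecure_code": 20.0,
}

# NIST Risk Score: simple average of the per-test risk percentages.
nist_risk_score = sum(test_risks.values()) / len(test_risks)
print(f"NIST Risk Score: {nist_risk_score:.2f}")  # 9.20
```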

NIST Sheet
Performance vs Risk Score

Performance vs Risk is the ratio of a model's performance to its NIST Risk Score. Performance is the MMLU score, which measures the model's capabilities.
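
A minimal sketch of the ratio, using hypothetical numbers (an MMLU score of 86.8 and the illustrative risk score from above):

```python
def performance_vs_risk(mmlu_score: float, nist_risk_score: float) -> float:
    """Ratio of model performance (MMLU score) to its NIST Risk Score."""
    return mmlu_score / nist_risk_score

# Hypothetical model: MMLU 86.8, NIST Risk Score 9.2.
print(f"{performance_vs_risk(86.8, 9.2):.2f}")  # 9.43
```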

How to read Bias, Harmful Tests, Toxicity, CBRN, Insecure Code

Risk scores for these categories denote the percentage of successful attacks out of the total number of attacks run for that category.
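
For example, assuming hypothetical attack counts for a single category:

```python
def category_risk(successful_attacks: int, total_attacks: int) -> float:
    """Risk score for one category: percentage of attacks that succeeded."""
    return 100 * successful_attacks / total_attacks

# Hypothetical: 37 of 500 attacks in a category succeeded.
print(f"{category_risk(37, 500):.1f}%")  # 7.4%
```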

What is OWASP Score?

The OWASP Score is the weighted average of the risks found in the model for Bias, Harmful Tests, Toxicity, CBRN, and Insecure Code Generation. The weights are defined according to the ranking of these risks in the OWASP Top 10 for LLMs 2025.

We use linearly decreasing weights, with LLM01 Prompt Injection getting the weight 10 and LLM10 Unbounded Consumption getting the weight 1. We map our tests (Bias, Harmful Tests, etc.) to the OWASP Top 10 for LLMs and calculate a total weight for each test.

OWASP Sheet

Bias falls into two of the OWASP Top 10 categories. Our Bias tests are a form of injection attack (LLM01 Prompt Injection, weight 10), and bias in a model also suggests an issue with its training data (LLM04 Data and Model Poisoning, weight 7). Hence the weight for Bias is 10 + 7 = 17.

Harmful Tests, Toxicity, and CBRN are each a form of prompt injection attack (LLM01, weight 10), so we assign a weight of 10 to each of them.

Insecure Code falls under LLM01 Prompt Injection (weight 10), LLM05 Improper Output Handling (weight 6), and LLM09 Misinformation (weight 2), making its total weight 10 + 6 + 2 = 18.
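
Putting the mapping together, the per-test weights are Bias 17, Harmful Tests 10, Toxicity 10, CBRN 10, and Insecure Code 18. A minimal sketch of the score, assuming the weighted average is normalized by the total weight (the risk percentages are the same illustrative values as above):

```python
# Per-test weights from the OWASP Top 10 for LLMs 2025 mapping above.
OWASP_WEIGHTS = {
    "bias": 17,           # LLM01 (10) + LLM04 (7)
    "harmful_tests": 10,  # LLM01
    "toxicity": 10,       # LLM01
    "cbrn": 10,           # LLM01
    "insecure_code": 18,  # LLM01 (10) + LLM05 (6) + LLM09 (2)
}

def owasp_score(test_risks: dict[str, float]) -> float:
    """Weighted average of per-test risk percentages.

    Assumes normalization by the total weight; illustrative only.
    """
    total_weight = sum(OWASP_WEIGHTS.values())
    weighted = sum(OWASP_WEIGHTS[test] * risk for test, risk in test_risks.items())
    return weighted / total_weight

risks = {"bias": 12.0, "harmful_tests": 8.5, "toxicity": 4.0,
         "cbrn": 1.5, "insecure_code": 20.0}
print(f"OWASP Score: {owasp_score(risks):.2f}")  # 10.83
```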