Enkrypt AI Safety Leaderboard
Find out how your favorite LLM ranks for safety and security
Check Model Compliance For
NIST AI 600
OWASP Top 10

Leaderboard columns: Rank, Model Name, Enkrypt AI Rating, Performance, Performance vs Risk, NIST Risk Score, OWASP Score

Top Models From Leaderboard

Safest model to use: claude-3-opus-20240229

Best Performance Model: deepseek-reasoner

Best Performance to Risk: claude-3-opus-20240229
What is Enkrypt AI Rating?

The Enkrypt AI Rating represents a model's safety level, measured by analyzing its vulnerabilities. It is calculated by inversely mapping the NIST Risk Score (which ranges from 0 to 62.5) onto a rating scale from 5 (safe) to 0 (risky).

Formula

Rating is 5 when NIST Risk Score is 0

Rating is 3 when NIST Risk Score is 25

Rating is 0 when NIST Risk Score is 62.5
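
These three anchor points are consistent with a linear inverse mapping of the NIST Risk Score onto the 0-5 scale. A minimal sketch in Python, assuming the rating is linear between the anchor points:

```python
def enkrypt_rating(nist_risk_score: float) -> float:
    """Map a NIST Risk Score (0 to 62.5) onto a 0-5 safety rating.

    Assumes a linear inverse mapping, which reproduces the three
    anchor points above: 0 -> 5, 25 -> 3, 62.5 -> 0.
    """
    MAX_RISK = 62.5
    return 5 * (1 - nist_risk_score / MAX_RISK)

print(enkrypt_rating(0))     # 5.0
print(enkrypt_rating(25))    # 3.0
print(enkrypt_rating(62.5))  # 0.0
```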

What is NIST Risk Score?

The NIST Risk Score denotes how vulnerable an AI model is to NIST risk categories such as Bias, Harmful Content, Toxicity, CBRN, and Insecure Code Generation. The risk for each test is the percentage of successful attacks, and the NIST Risk Score is the average of the risk found across all tests.
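
As a sketch of the calculation, assuming each test's risk is already expressed as a percentage of successful attacks (the numbers below are illustrative, not real leaderboard values):

```python
# Illustrative per-test risk percentages (% of successful attacks);
# real values come from Enkrypt AI's red-team tests.
test_risks = {
    "bias": 12.0,
    "harmful_content": 8.5,
    "toxicity": 4.0,
    "cbrn": 1.5,
    "insecure_code": 20.0,
}

# NIST Risk Score: simple average of the per-test risk percentages.
nist_risk_score = sum(test_risks.values()) / len(test_risks)
print(f"NIST Risk Score: {nist_risk_score:.2f}")  # 9.20
```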

NIST Sheet
Performance vs Risk Score

Performance vs Risk is the ratio of a model's performance to its NIST Risk Score. Performance is the MMLU score, which measures the model's capabilities.
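
A minimal sketch of the ratio, using hypothetical numbers (an MMLU score of 86.8 and the illustrative risk score from above):

```python
def performance_vs_risk(mmlu_score: float, nist_risk_score: float) -> float:
    """Ratio of model performance (MMLU score) to its NIST Risk Score."""
    return mmlu_score / nist_risk_score

# Hypothetical model: MMLU 86.8, NIST Risk Score 9.2.
print(f"{performance_vs_risk(86.8, 9.2):.2f}")  # 9.43
```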

How to read Bias, Harmful Tests, Toxicity, CBRN, Insecure Code

Risk scores for these categories denote the percentage of successful attacks out of the total number of attacks run for that category.
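
For example, assuming hypothetical attack counts for a single category:

```python
def category_risk(successful_attacks: int, total_attacks: int) -> float:
    """Risk score for one category: percentage of attacks that succeeded."""
    return 100 * successful_attacks / total_attacks

# Hypothetical: 37 of 500 attacks in a category succeeded.
print(f"{category_risk(37, 500):.1f}%")  # 7.4%
```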

What is OWASP Score?

The OWASP Score is the weighted average of the risks found in the model for Bias, Harmful Tests, Toxicity, CBRN, and Insecure Code Generation. The weights are defined according to the ranking of these risks in the OWASP Top 10 for LLMs 2025.

We use linearly decreasing weights, with LLM01 Prompt Injection getting the weight 10 and LLM10 Unbounded Consumption getting the weight 1. We map our tests (Bias, Harmful Tests, etc.) to the OWASP Top 10 for LLMs and calculate a total weight for each test.

OWASP Sheet

Bias falls into two of the OWASP Top 10 categories. Our Bias tests are a form of injection attack (LLM01 Prompt Injection, weight 10), and bias in a model also suggests an issue with its training data (LLM04 Data and Model Poisoning, weight 7). Hence the weight for Bias is 10 + 7 = 17.

Harmful Tests, Toxicity, and CBRN are each a form of prompt injection attack (LLM01, weight 10), so we assign a weight of 10 to each of them.

Insecure Code falls under LLM01 Prompt Injection (weight 10), LLM05 Improper Output Handling (weight 6), and LLM09 Misinformation (weight 2), making its total weight 10 + 6 + 2 = 18.
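
Putting the mapping together, the per-test weights are Bias 17, Harmful Tests 10, Toxicity 10, CBRN 10, and Insecure Code 18. A minimal sketch of the score, assuming the weighted average is normalized by the total weight (the risk percentages are the same illustrative values as above):

```python
# Per-test weights from the OWASP Top 10 for LLMs 2025 mapping above.
OWASP_WEIGHTS = {
    "bias": 17,           # LLM01 (10) + LLM04 (7)
    "harmful_tests": 10,  # LLM01
    "toxicity": 10,       # LLM01
    "cbrn": 10,           # LLM01
    "insecure_code": 18,  # LLM01 (10) + LLM05 (6) + LLM09 (2)
}

def owasp_score(test_risks: dict[str, float]) -> float:
    """Weighted average of per-test risk percentages.

    Assumes normalization by the total weight; illustrative only.
    """
    total_weight = sum(OWASP_WEIGHTS.values())
    weighted = sum(OWASP_WEIGHTS[test] * risk for test, risk in test_risks.items())
    return weighted / total_weight

risks = {"bias": 12.0, "harmful_tests": 8.5, "toxicity": 4.0,
         "cbrn": 1.5, "insecure_code": 20.0}
print(f"OWASP Score: {owasp_score(risks):.2f}")  # 10.83
```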