For more information about the various scores and how this benchmark works, please see the original page: Ayumi's LLM Role Play & ERP Ranking. If you find the benchmark data useful and want to contribute, you can donate via Ko-fi.

Not satisfied with this benchmark? Check out this site with community-driven ratings and reviews of LLMs: BestERP.

Filter Guide
  • Syntax:
                filter-expr ::= or-expr
                or-expr ::= (and-expr)+
                          | (and-expr)+ "|" or-expr
                and-expr ::= negative-match | match
                # a match word with a "-" in front:
                negative-match ::= "-" match
                # every character except " " or "|"
                match ::= [^| ]+ 
  • Example (LLaMA-2 models): "L2 | llama 2"
  • Example (Airoboros GPT4 models, excluding version 1.4): "airoboros GPT4 -1.4"
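
The grammar above can be read as "alternatives separated by |, each a list of words that must all match, where a leading - excludes a word". The following is a minimal sketch of that reading, not the site's actual implementation. It assumes that a match word is a case-insensitive substring test against the model name and that an empty filter matches everything; the function names parse_filter and matches are hypothetical.

```python
def parse_filter(expr: str) -> list[list[tuple[bool, str]]]:
    """Parse a filter into OR-alternatives, each a list of (negated, word) terms."""
    alternatives = []
    for part in expr.split("|"):          # or-expr: alternatives separated by "|"
        terms = []
        for token in part.split():        # and-expr: words separated by spaces
            # negative-match: a "-" followed by at least one more character
            negated = token.startswith("-") and len(token) > 1
            word = token[1:] if negated else token
            terms.append((negated, word.lower()))
        if terms:
            alternatives.append(terms)
    return alternatives


def matches(name: str, expr: str) -> bool:
    """Return True if the model name satisfies the filter expression."""
    alternatives = parse_filter(expr)
    if not alternatives:
        return True  # assumption: an empty filter shows every model
    lowered = name.lower()  # assumption: matching ignores case
    # One alternative must hold; within it, every positive word must occur
    # and every negated word must be absent.
    return any(
        all((word in lowered) != negated for negated, word in terms)
        for terms in alternatives
    )
```

With these assumptions, matches("Llama-2-13B", "L2 | llama 2") is true because the second alternative finds both "llama" and "2", while matches("airoboros-33B-GPT4-1.4", "airoboros GPT4 -1.4") is false because the excluded word "1.4" occurs in the name.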
Column Description
ALC-IQ3: The third version of the ALC-IQ. It tries to determine how well a model understands a character card. Higher is better; the best score is 100.
IQ Entropy: Not part of the ranking. It is the (normalized) entropy of the Yes/No answer probabilities, a slightly different measure of the ALC-IQ3.
ERP3 Score: The average ratio of lewd words to total words in a response. Higher is better.
Var Score: The lewd word variety score. It counts how many different lewd words occur across all ERP responses.
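
To make the column descriptions more concrete, here is a rough sketch of how an entropy, a word ratio, and a variety count of this kind could be computed. It is not the benchmark's actual code. The whitespace tokenization, the base-2 logarithm, the placeholder vocabulary, and all function names are assumptions made for illustration; the original page does not specify them here.

```python
import math


def binary_entropy(p_yes: float) -> float:
    """Entropy in bits of a Yes/No distribution.

    With two outcomes the maximum is 1 bit, so the value already lies in [0, 1].
    """
    if p_yes <= 0.0 or p_yes >= 1.0:
        return 0.0  # a certain answer carries no entropy
    p_no = 1.0 - p_yes
    return -(p_yes * math.log2(p_yes) + p_no * math.log2(p_no))


def word_ratio(response: str, vocabulary: set[str]) -> float:
    """Fraction of words in one response that belong to the vocabulary."""
    words = response.lower().split()  # assumption: naive whitespace tokenization
    if not words:
        return 0.0
    return sum(w in vocabulary for w in words) / len(words)


def average_ratio(responses: list[str], vocabulary: set[str]) -> float:
    """Mean of the per-response ratios (the "average ratio" in the description)."""
    if not responses:
        return 0.0
    return sum(word_ratio(r, vocabulary) for r in responses) / len(responses)


def variety(responses: list[str], vocabulary: set[str]) -> int:
    """Number of distinct vocabulary words seen across all responses."""
    seen: set[str] = set()
    for r in responses:
        seen.update(w for w in r.lower().split() if w in vocabulary)
    return len(seen)
```

For example, with the placeholder vocabulary {"alpha", "gamma", "omega"} and the responses "alpha beta" and "gamma alpha", the per-response ratios are 0.5 and 1.0, the average ratio is 0.75, and the variety is 2. A model that is fully undecided (p_yes = 0.5) gets an entropy of 1.0, and a fully certain one gets 0.0.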
Rank ERP3 Response Link Size Q ALC-IQ3 IQ3 Entropy ERP3 Score ERP3 Variety