For more information about the various scores and how this benchmark works, please see the original page: Ayumi's LLM Role Play & ERP Ranking - https://rentry.co/ayumi_erp_rating. If you find the benchmark data useful and want to contribute, you can donate via Ko-fi - https://ko-fi.com/weicon
Not satisfied with this benchmark? Check out this site with community-driven ratings and reviews of LLMs: BestERP - https://besterp.ai/
Filter Guide
- Syntax (words separated by spaces must all match; "|" separates alternatives, any one of which may match):
  - filter-expr ::= or-expr
  - or-expr ::= (and-expr)+ | (and-expr)+ "|" or-expr
  - and-expr ::= negative-match | match
  - negative-match ::= "-" match (a match word with a "-" in front)
  - match ::= [^| ]+ (every character except " " or "|")
- Example for LLaMA-2 models: "L2 | llama 2"
- Example for Airoboros GPT4 models, excluding version 1.4: "airoboros GPT4 -1.4"
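The filter grammar above can be sketched as a small Python matcher. This is an illustration only, not the site's actual implementation: it assumes each word is matched as a case-insensitive substring of the model name and that an empty filter keeps every model. The function names parse_filter and matches are hypothetical.

```python
def parse_filter(expr: str) -> list[list[tuple[bool, str]]]:
    """Split a filter expression into OR-alternatives of AND-ed (negated, word) terms."""
    alternatives = []
    for part in expr.split("|"):  # "|" separates the or-expr alternatives
        terms = []
        for word in part.split():  # spaces separate the and-expr terms
            if word.startswith("-") and len(word) > 1:
                terms.append((True, word[1:]))  # negative-match: "-" match
            else:
                terms.append((False, word))  # plain match
        if terms:
            alternatives.append(terms)
    return alternatives


def matches(model_name: str, expr: str) -> bool:
    """Return True if model_name satisfies the filter expression.

    Hypothetical semantics: words match as case-insensitive substrings.
    """
    alternatives = parse_filter(expr)
    if not alternatives:
        return True  # assumption: an empty filter keeps every model
    name = model_name.lower()
    # One alternative must have every positive word present and every negated word absent.
    return any(
        all((word.lower() in name) != negated for negated, word in terms)
        for terms in alternatives
    )


models = ["Llama-2-13B-chat", "MythoMax-L2-13B", "Mistral-7B"]
print([m for m in models if matches(m, "L2 | llama 2")])
# prints ['Llama-2-13B-chat', 'MythoMax-L2-13B']
```

Under the substring assumption, "-1.4" also excludes names such as "...-1.4.1", since "1.4" occurs inside them.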
Column Description
Column | Description |
---|---|
ALC-IQ3 | The ALC-IQ3 is the 3rd version of the ALC-IQ. It tries to determine how well a model understands a character card. The higher, the better; the best score is 100. |
IQ Entropy | The IQ Entropy is not part of the ranking. It is the (normalized) entropy of the Yes/No answer probabilities, a slightly different measure of what the ALC-IQ3 captures. |
ERP3 Score | The average ratio of lewd words to total words in a response. The higher, the better. |
Var Score | The lewd word variety score: the number of distinct lewd words that occur across all ERP responses. |
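The column definitions above can be read as the following Python sketch. It is a hedged interpretation, not the benchmark's code: the tokenizer, the word lexicon (passed in as a parameter because the actual word list is not given here), skipping empty responses, and renormalizing the Yes/No probabilities to sum to 1 before taking the entropy are all assumptions, and every function name is hypothetical. With two outcomes and a base-2 logarithm, the entropy already lies between 0 and 1.

```python
import math
import re

# Assumption: lowercase alphabetic word tokens; the real tokenizer is not documented.
WORD_RE = re.compile(r"[a-z']+")


def tokenize(text: str) -> list[str]:
    return WORD_RE.findall(text.lower())


def erp_score(responses: list[str], lexicon: set[str]) -> float:
    """Hypothetical ERP3 Score: mean over responses of (lexicon words / total words)."""
    ratios = []
    for text in responses:
        words = tokenize(text)
        if words:  # assumption: responses without words are skipped
            ratios.append(sum(w in lexicon for w in words) / len(words))
    return sum(ratios) / len(ratios) if ratios else 0.0


def variety_score(responses: list[str], lexicon: set[str]) -> int:
    """Hypothetical Var Score: number of distinct lexicon words across all responses."""
    seen: set[str] = set()
    for text in responses:
        seen.update(w for w in tokenize(text) if w in lexicon)
    return len(seen)


def yes_no_entropy(p_yes: float, p_no: float) -> float:
    """Hypothetical IQ Entropy term: binary entropy in bits of renormalized Yes/No probabilities."""
    total = p_yes + p_no
    h = 0.0
    for p in (p_yes / total, p_no / total):
        if p > 0:  # 0 * log(0) is taken as 0
            h -= p * math.log2(p)
    return h
```

How per-question entropies are aggregated into the table column is not stated on this page, so the sketch stops at a single Yes/No pair.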
Rank | ERP3 Response Link | Size | Q | ALC-IQ3 | IQ3 Entropy | ERP3 Score | ERP3 Variety |
---|---|---|---|---|---|---|---|