feat: add promptfoo eval harness for agent quality scoring (#371)
Adds a promptfoo eval harness for agent quality scoring: an LLM-as-judge system that scores task completion, instruction adherence, identity consistency, deliverable quality, and safety. Includes tests.
evals/.gitignore (vendored, normal file, 6 lines changed)
@@ -0,0 +1,6 @@
+node_modules/
+dist/
+.promptfoo/
+results/latest.json
+*.log
+.env