feat: add promptfoo eval harness for agent quality scoring (#371)

Adds a promptfoo eval harness for scoring agent quality. An LLM-as-judge scores agents on five dimensions: task completion, instruction adherence, identity consistency, deliverable quality, and safety. Includes tests.
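As a rough illustration of how five per-dimension judge scores could be combined into a single verdict, here is a hypothetical TypeScript sketch. Only the five dimension names come from the description above. The 0 to 1 score scale, the weights, the thresholds, and the `aggregate` function are assumptions made for illustration and are not taken from the harness.

```typescript
// Hypothetical sketch: only the dimension names come from the commit message.
// The 0 to 1 scale, weights, and thresholds are illustrative assumptions.
type Dimension =
  | "taskCompletion"
  | "instructionAdherence"
  | "identityConsistency"
  | "deliverableQuality"
  | "safety";

// One judge score per dimension, assumed to lie in [0, 1].
type JudgeScores = Record<Dimension, number>;

interface EvalResult {
  pass: boolean;
  score: number;
  reason: string;
}

// Assumed weights for the quality dimensions (they sum to 1).
const WEIGHTS: Record<Exclude<Dimension, "safety">, number> = {
  taskCompletion: 0.4,
  instructionAdherence: 0.25,
  identityConsistency: 0.15,
  deliverableQuality: 0.2,
};

const PASS_THRESHOLD = 0.7; // assumed minimum weighted quality score
const SAFETY_FLOOR = 0.9; // assumed: safety is a hard gate, not a weighted term

function aggregate(scores: JudgeScores): EvalResult {
  // Reject out-of-range or NaN scores before combining them.
  for (const [dim, s] of Object.entries(scores)) {
    if (!(s >= 0 && s <= 1)) {
      throw new RangeError(`score for ${dim} must be in [0, 1], got ${s}`);
    }
  }

  // Weighted mean over the four quality dimensions.
  let weighted = 0;
  for (const dim of Object.keys(WEIGHTS) as (keyof typeof WEIGHTS)[]) {
    weighted += WEIGHTS[dim] * scores[dim];
  }

  // A safety score below the floor fails the case regardless of quality.
  if (scores.safety < SAFETY_FLOOR) {
    return {
      pass: false,
      score: weighted,
      reason: `safety ${scores.safety} is below floor ${SAFETY_FLOOR}`,
    };
  }

  const pass = weighted >= PASS_THRESHOLD;
  return {
    pass,
    score: weighted,
    reason: `weighted score ${weighted.toFixed(2)} ${pass ? ">=" : "<"} ${PASS_THRESHOLD}`,
  };
}
```

For example, quality scores of 0.9, 0.8, 1 and 0.7 with a safety score of 1 yield a weighted score of about 0.85, which passes. Safety is treated as a gate rather than averaged in so that strong quality scores cannot mask an unsafe response. This is a design choice of the sketch, not something the commit specifies.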
Russell Jones
2026-04-10 22:54:31 -04:00
committed by GitHub
parent 1e73b5be0d
commit b456845e85
11 changed files with 796 additions and 0 deletions

evals/.gitignore (new file)

@@ -0,0 +1,6 @@
node_modules/
dist/
.promptfoo/
results/latest.json
*.log
.env