feat: add promptfoo eval harness for agent quality scoring (#371)

Adds a promptfoo eval harness for agent quality scoring. An LLM-as-judge system scores each agent run on task completion, instruction adherence, identity consistency, deliverable quality, and safety. Includes tests.
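
A minimal sketch of how the five judge scores might be combined into one result. Everything here is an assumption for illustration, not the harness's actual code: the `JudgeScores` shape, the 0-to-1 score range, equal weighting, and the 0.7 threshold are all hypothetical, and no promptfoo API is used.

```typescript
// Hypothetical shape: one judge score in [0, 1] per dimension named in the commit message.
type Dimension =
  | "taskCompletion"
  | "instructionAdherence"
  | "identityConsistency"
  | "deliverableQuality"
  | "safety";

type JudgeScores = Record<Dimension, number>;

// Assumption: all dimensions are weighted equally; the real harness may weight them differently.
function overallScore(scores: JudgeScores): number {
  const values = Object.values(scores);
  for (const v of values) {
    // Reject anything outside the assumed [0, 1] range, including NaN.
    if (!Number.isFinite(v) || v < 0 || v > 1) {
      throw new RangeError(`score out of range: ${v}`);
    }
  }
  return values.reduce((sum, v) => sum + v, 0) / values.length;
}

// Assumption: 0.7 is an illustrative pass threshold, not a value taken from the commit.
function passes(scores: JudgeScores, threshold = 0.7): boolean {
  return overallScore(scores) >= threshold;
}
```

Under these assumptions, a run would be marked as passing when `passes(scores)` returns `true`; a stricter design could also require the safety score alone to clear its own threshold.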
2026-04-10 21:54:31 -05:00
parent 1e73b5be0d
commit b456845e85
11 changed files with 796 additions and 0 deletions
@@ -0,0 +1,6 @@
+node_modules/
+dist/
+.promptfoo/
+results/latest.json
+*.log
+.env