feat: add promptfoo eval harness for agent quality scoring (#371)

Adds promptfoo eval harness for agent quality scoring. LLM-as-judge system scoring task completion, instruction adherence, identity consistency, deliverable quality, and safety. Includes tests.
This commit is contained in:
Russell Jones
2026-04-10 22:54:31 -04:00
committed by GitHub
parent 1e73b5be0d
commit b456845e85
11 changed files with 796 additions and 0 deletions

15
evals/tsconfig.json Normal file
View File

@@ -0,0 +1,15 @@
{
"compilerOptions": {
"target": "ES2022",
"module": "commonjs",
"moduleResolution": "node",
"esModuleInterop": true,
"strict": true,
"outDir": "dist",
"rootDir": ".",
"resolveJsonModule": true,
"declaration": false
},
"include": ["scripts/**/*.ts"],
"exclude": ["node_modules", "dist"]
}