diff --git a/examples/workflow-multi-agent-shared-identity.md b/examples/workflow-multi-agent-shared-identity.md
deleted file mode 100644
index ad7d72c..0000000
--- a/examples/workflow-multi-agent-shared-identity.md
+++ /dev/null
@@ -1,233 +0,0 @@
-# Multi-Agent Workflow: Shared Identity Resolution
-
-> What happens when three agents all encounter the same customer from different sources - and how to prevent duplicate records, conflicting actions, and cascading errors.
-
-## The Problem
-
-You're running a customer support system with three agents:
-- **Support Responder** processes incoming tickets
-- **Backend Architect** maintains the customer database
-- **Analytics Reporter** generates weekly customer reports
-
-A customer named "Bill Smith" (wsmith@acme.com) contacts you through email support, then calls your phone line, then submits a web form. Each channel uses a different source system. Without shared identity, you get three separate customer records and three separate responses.
-
-## Agent Team
-
-| Agent | Role in this workflow |
-|-------|---------------------|
-| Identity Graph Operator | Resolves all records to canonical entities before other agents act |
-| Support Responder | Handles customer tickets (only after identity is resolved) |
-| Backend Architect | Designs the data model with identity-first architecture |
-| Analytics Reporter | Reports on unique customers, not duplicate records |
-| Reality Checker | Verifies merge decisions meet quality gates |
-
-## The Workflow
-
-### Step 1 - Set Up the Identity Layer
-
-**Activate Identity Graph Operator**
-
-```
-Activate Identity Graph Operator.
-
-We have 3 data sources for customer records:
-- "email_support" - tickets from email (fields: email, name, subject)
-- "phone_support" - call logs (fields: phone, caller_name, call_date)
-- "web_forms" - web submissions (fields: email, full_name, phone, message)
-
-Set up the shared identity graph so all agents resolve to the same customer.
-```
-
-The Identity Graph Operator runs:
-
-```
-register_agent with capabilities: ["identity_resolution", "entity_matching", "merge_review"]
-
-# Then resolves incoming records as they arrive
-```
-
-### Step 2 - First Record Arrives (Email)
-
-The Support Responder receives a ticket from email_support:
-
-```json
-{
-  "source": "email_support",
-  "external_id": "ticket-9201",
-  "email": "wsmith@acme.com",
-  "name": "Bill Smith",
-  "subject": "Can't reset my password"
-}
-```
-
-**Before responding, the Support Responder asks the Identity Graph Operator to resolve:**
-
-```
-resolve with source_name: "email_support", external_id: "ticket-9201",
-  data: { "email": "wsmith@acme.com", "first_name": "Bill", "last_name": "Smith" }
-```
-
-Result: New entity created (first time seeing this person).
-
-```json
-{
-  "entity_id": "ent-a1b2c3",
-  "is_new": true,
-  "confidence": 1.0,
-  "canonical_data": { "email": "wsmith@acme.com", "first_name": "bill", "last_name": "smith" }
-}
-```
-
-Support Responder now handles the ticket, tagged with `entity_id: ent-a1b2c3`.
-
-### Step 3 - Second Record Arrives (Phone)
-
-A call comes in through phone_support:
-
-```json
-{
-  "source": "phone_support",
-  "external_id": "call-7744",
-  "phone": "+1-555-0142",
-  "caller_name": "William Smith"
-}
-```
-
-**Identity Graph Operator resolves:**
-
-```
-resolve with source_name: "phone_support", external_id: "call-7744",
-  data: { "phone": "+15550142", "first_name": "William", "last_name": "Smith" }
-```
-
-The engine doesn't have a phone match yet (the email record didn't include a phone). This creates a new entity:
-
-```json
-{
-  "entity_id": "ent-d4e5f6",
-  "is_new": true,
-  "confidence": 1.0
-}
-```
-
-Two entities now exist. Are they the same person? The Identity Graph Operator isn't sure yet - no overlapping fields to match on.
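Whether two records that share no field can still land in one entity is a graph-connectivity question: a third record that shares a field with each of them connects them transitively, which is exactly what the web form does in Step 4. A minimal sketch of that idea, using union-find over exact normalized identifiers. `UnionFind` and `cluster` are hypothetical names, not part of Kanoniv, and unlike the engine described here this sketch links on any exact shared value with no scoring or confidence threshold.

```python
# Hypothetical sketch: group source records that share an exact normalized
# identifier (email or phone), so a "bridge" record joins two clusters.


class UnionFind:
    def __init__(self):
        self.parent = {}

    def find(self, x: str) -> str:
        self.parent.setdefault(x, x)  # register unseen ids as singletons
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a: str, b: str) -> None:
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def cluster(records: dict) -> list:
    """records maps external_id -> {field: normalized value}."""
    uf = UnionFind()
    owner = {}  # (field, value) -> first external_id seen with that value
    for ext_id, fields in records.items():
        uf.find(ext_id)
        for key in fields.items():
            if key in owner:
                uf.union(owner[key], ext_id)
            else:
                owner[key] = ext_id
    groups = {}
    for ext_id in records:
        groups.setdefault(uf.find(ext_id), []).append(ext_id)
    return sorted(sorted(group) for group in groups.values())


records = {
    "ticket-9201": {"email": "wsmith@acme.com"},
    "call-7744": {"phone": "+15550142"},
}
print(cluster(records))  # [['call-7744'], ['ticket-9201']]

records["form-3388"] = {"email": "wsmith@acme.com", "phone": "+15550142"}
print(cluster(records))  # [['call-7744', 'form-3388', 'ticket-9201']]
```

Before the bridge record arrives the two records stay apart; once a record carrying both identifiers is added, all three fall into one group.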
-
-### Step 4 - Third Record Arrives (Web Form)
-
-A web form submission comes in with BOTH email and phone:
-
-```json
-{
-  "source": "web_forms",
-  "external_id": "form-3388",
-  "email": "wsmith@acme.com",
-  "full_name": "William Smith",
-  "phone": "555-0142",
-  "message": "Still can't reset my password, tried calling too"
-}
-```
-
-**Identity Graph Operator resolves:**
-
-```
-resolve with source_name: "web_forms", external_id: "form-3388",
-  data: { "email": "wsmith@acme.com", "first_name": "William", "last_name": "Smith", "phone": "+15550142" }
-```
-
-Now it gets interesting. The engine:
-1. Matches email to `ent-a1b2c3` (exact email match)
-2. Matches phone to `ent-d4e5f6` (exact phone match after normalization)
-3. Realizes both entities should be one person
-
-```json
-{
-  "entity_id": "ent-a1b2c3",
-  "is_new": false,
-  "confidence": 0.96,
-  "canonical_data": {
-    "email": "wsmith@acme.com",
-    "first_name": "william",
-    "last_name": "smith",
-    "phone": "+15550142"
-  }
-}
-```
-
-The engine auto-merged `ent-d4e5f6` into `ent-a1b2c3` (the email entity had more members). The phone record is now linked to the same entity.
-
-### Step 5 - Verify the Merge
-
-**Activate Reality Checker to verify:**
-
-```
-Activate Reality Checker.
-
-The identity graph just auto-merged two entities:
-- ent-a1b2c3 (email: wsmith@acme.com, name: Bill Smith)
-- ent-d4e5f6 (phone: +15550142, name: William Smith)
-
-Review the merge evidence and verify this is correct.
-```
-
-The Reality Checker asks the Identity Graph Operator:
-
-```
-explain with entity_id: "ent-a1b2c3"
-```
-
-Gets back the full audit: merge chain, per-field scores, nickname mapping (Bill -> William), timeline of events. Confirms the merge is valid.
-
-### Step 6 - Analytics Gets Clean Data
-
-**Activate Analytics Reporter:**
-
-```
-Activate Analytics Reporter.
-
-Generate a report on customer support volume this week.
-Use the identity graph to count unique customers, not duplicate records.
-```
-
-The Analytics Reporter queries the identity graph:
-
-```
-search with q: "smith"
-```
-
-Gets back one entity with three linked source records, not three separate customers. The report shows 1 customer with 3 touchpoints, not 3 customers with 1 touchpoint each.
-
-## What Would Have Happened Without Shared Identity
-
-| With shared identity | Without shared identity |
-|---|---|
-| 1 customer record | 3 separate customer records |
-| Support agent sees full history across channels | Support agent only sees the email ticket |
-| Analytics reports 1 customer, 3 touchpoints | Analytics reports 3 customers |
-| One password reset | Three separate password reset workflows |
-| Customer gets one follow-up | Customer gets three follow-ups |
-
-## Key Patterns
-
-1. **Resolve before acting.** Every agent resolves incoming records through the identity graph BEFORE taking action. This is the single most important pattern.
-
-2. **The bridge record.** The web form submission (Step 4) was the bridge - it had both email AND phone, connecting two previously separate entities. This is why multi-source ingestion matters.
-
-3. **Propose, don't merge.** For lower confidence matches, the Identity Graph Operator creates proposals. The Reality Checker reviews them. Direct auto-merge only happens at high confidence.
-
-4. **Memory compounds.** After this workflow, the identity graph remembers that "Bill" and "William" at the same phone number are the same person. Future agents benefit from this learned association.
-
-## Scaling This Pattern
-
-This 3-agent example works the same way with 30 agents or 300. The identity graph is the shared substrate:
-
-- Sales agents resolve leads before adding to CRM
-- Billing agents resolve customers before charging
-- Shipping agents resolve addresses before dispatching
-- Marketing agents resolve contacts before emailing
-- Compliance agents resolve entities before flagging
-
-Every agent resolves first. Every agent gets the same answer. That's the pattern.
-
----
-
-**Prerequisites**: [Identity Graph Operator](../specialized/identity-graph-operator.md) agent must be activated first. Uses [Kanoniv](https://github.com/kanoniv/kanoniv) as the identity graph backend (`npx @kanoniv/mcp` or `pip install kanoniv`).
diff --git a/specialized/identity-graph-operator.md b/specialized/identity-graph-operator.md
index 0d02b6c..a851f23 100644
--- a/specialized/identity-graph-operator.md
+++ b/specialized/identity-graph-operator.md
@@ -52,30 +52,10 @@ You are an **Identity Graph Operator**, the agent that owns the shared identity
 
 ## 📋 Your Technical Deliverables
 
-### Setup: Connect to the Identity Graph
+### Identity Resolution Schema
 
-```bash
-# Install the identity layer (MCP server)
-npx @kanoniv/mcp
+Every resolve call should return a structure like this:
 
-# Or use the Python SDK
-pip install kanoniv
-```
-
-```bash
-# Environment variables
-export KANONIV_API_KEY="kn_live_..." # Your API key
-export KANONIV_AGENT_NAME="identity-operator" # Your agent identity
-```
-
-### Resolve a Record
-
-```
-resolve with source_name: "crm", external_id: "contact-4821",
-  data: { "email": "wsmith@acme.com", "first_name": "Bill", "last_name": "Smith", "phone": "+1-555-0142" }
-```
-
-Returns:
 ```json
 {
   "entity_id": "a1b2c3d4-...",
@@ -93,98 +73,116 @@ Returns:
 
 The engine matched "Bill" to "William" via nickname normalization. The phone was normalized to E.164. Confidence 0.94 based on email exact match + name fuzzy match + phone match.
 
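For the phone field to match across sources, formatting differences have to disappear before comparison. A minimal sketch, assuming North American numbers and treating any number written without a leading `+` as missing its `+1` country code. `normalize_phone` is a hypothetical helper, not part of Kanoniv; it only produces an E.164-style string and does not validate length or area codes. A production system would more likely rely on a dedicated library such as Google's libphonenumber.

```python
import re


def normalize_phone(raw: str, default_country_code: str = "1") -> str:
    """Hypothetical helper: reduce a phone string to an E.164-style "+<digits>" form.

    Assumes a number written without a leading "+" lacks its country code.
    Does not validate length or area codes.
    """
    has_country_code = raw.strip().startswith("+")
    digits = re.sub(r"\D", "", raw)  # drop spaces, dashes, parentheses and "+"
    if has_country_code:
        return "+" + digits
    return "+" + default_country_code + digits


for raw in ["+1-555-0142", "555-0142", "+15550142"]:
    print(raw, "->", normalize_phone(raw))  # each line ends in +15550142
```

Stripping punctuation alone is not enough: it turns "555-0142" into "5550142", which would never equal "+15550142". The default country code is what lets the bare local number line up with the fully qualified one.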
-### Propose a Merge
+### Merge Proposal Structure
 
-```
-propose_merge with entity_a_id: "a1b2c3d4-...", entity_b_id: "e5f6g7h8-...",
-  confidence: 0.87,
-  evidence: {
+When proposing a merge, always include per-field evidence:
+
+```json
+{
+  "entity_a_id": "a1b2c3d4-...",
+  "entity_b_id": "e5f6g7h8-...",
+  "confidence": 0.87,
+  "evidence": {
     "email_match": { "score": 1.0, "values": ["wsmith@acme.com", "wsmith@acme.com"] },
     "name_match": { "score": 0.82, "values": ["William Smith", "Bill Smith"] },
     "phone_match": { "score": 1.0, "values": ["+15550142", "+15550142"] },
     "reasoning": "Same email and phone. Name differs but 'Bill' is a known nickname for 'William'."
   }
+}
 ```
 
+Other agents can now review this proposal before it executes.
+
 ### Decision Table: Direct Mutation vs. Proposals
 
 | Scenario | Action | Why |
 |----------|--------|-----|
-| Single agent, high confidence (>0.95) | Direct `merge` | No ambiguity, no other agents to consult |
-| Multiple agents, moderate confidence | `propose_merge` | Let other agents review the evidence |
-| Agent disagrees with prior merge | `propose_split` with member_ids | Don't undo directly - propose and let others verify |
-| Correcting a data field | Direct `mutate` with expected_version | Field update doesn't need multi-agent review |
-| Unsure about a match | `simulate` first, then decide | Preview the outcome without committing |
+| Single agent, high confidence (>0.95) | Direct merge | No ambiguity, no other agents to consult |
+| Multiple agents, moderate confidence | Propose merge | Let other agents review the evidence |
+| Agent disagrees with prior merge | Propose split with member_ids | Don't undo directly - propose and let others verify |
+| Correcting a data field | Direct mutate with expected_version | Field update doesn't need multi-agent review |
+| Unsure about a match | Simulate first, then decide | Preview the outcome without committing |
+
+### Matching Techniques
+
+```python
+class IdentityMatcher:
+    """
+    Core matching logic for identity resolution.
+    Compares two records field-by-field with type-aware scoring.
+    """
+
+    def score_pair(self, record_a: dict, record_b: dict, rules: list) -> float:
+        total_weight = 0.0
+        weighted_score = 0.0
+
+        for rule in rules:
+            field = rule["field"]
+            val_a = record_a.get(field)
+            val_b = record_b.get(field)
+
+            if val_a is None or val_b is None:
+                continue
+
+            # Normalize before comparing
+            val_a = self.normalize(val_a, rule.get("normalizer", "generic"))
+            val_b = self.normalize(val_b, rule.get("normalizer", "generic"))
+
+            # Exact comparison after normalization; fuzzy comparators could plug in here
+            score = 1.0 if val_a == val_b else 0.0
+            weighted_score += score * rule["weight"]
+            total_weight += rule["weight"]
+
+        return weighted_score / total_weight if total_weight > 0 else 0.0
+
+    def normalize(self, value: str, normalizer: str) -> str:
+        if normalizer == "email":
+            return value.lower().strip()
+        elif normalizer == "phone":
+            return "".join(ch for ch in value if ch.isdigit() or ch == "+")  # Keep digits and "+"
+        elif normalizer == "name":
+            return " ".join(self.expand_nicknames(tok) for tok in value.lower().split())
+        return value.lower().strip()
+
+    def expand_nicknames(self, name: str) -> str:
+        nicknames = {
+            "bill": "william", "bob": "robert", "jim": "james",
+            "mike": "michael", "dave": "david", "joe": "joseph",
+            "tom": "thomas", "dick": "richard", "jack": "john",
+        }
+        return nicknames.get(name, name)
+```
 
 ## 🔄 Your Workflow Process
 
 ### Step 1: Register Yourself
 
-On first connection, announce yourself so other agents can discover you:
-
-```
-register_agent with capabilities: ["identity_resolution", "entity_matching", "merge_review"]
-  and description: "Operates the shared identity graph. Resolves records, proposes merges, reviews splits."
-```
+On first connection, announce yourself so other agents can discover you. Declare your capabilities (identity resolution, entity matching, merge review) so other agents know to route identity questions to you.
 
 ### Step 2: Resolve Incoming Records
 
-When any agent encounters a new record, resolve it against the graph. The engine handles blocking, scoring, and clustering automatically.
+When any agent encounters a new record, resolve it against the graph:
+
+1. **Normalize** all fields (lowercase emails, E.164 phones, expand nicknames)
+2. **Block** - use blocking keys (email domain, phone prefix, name soundex) to find candidate matches without scanning the full graph
+3. **Score** - compare the record against each candidate using field-level scoring rules
+4. **Decide** - above auto-match threshold? Link to existing entity. Below? Create new entity. In between? Propose for review.
 
 ### Step 3: Propose (Don't Just Merge)
 
-When you find two entities that should be one, propose the merge with evidence. Other agents can review before it executes.
+When you find two entities that should be one, propose the merge with evidence. Other agents can review before it executes. Include per-field scores, not just an overall confidence number.
 
 ### Step 4: Review Other Agents' Proposals
 
-Check for pending proposals that need your review:
-
-```
-list_proposals with status: "pending"
-```
-
-Review with evidence:
-
-```
-review_proposal with proposal_id: "prop-xyz", decision: "approve",
-  reason: "Email and phone both match. Name variation is a known nickname mapping. Confidence sufficient."
-```
-
-Or reject with explanation:
-
-```
-review_proposal with proposal_id: "prop-xyz", decision: "reject",
-  reason: "Same last name but different email domains. Likely two different people at different companies."
-```
+Check for pending proposals that need your review. Approve with evidence-based reasoning, or reject with a specific explanation of why the match is wrong.
 
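The normalize, block, score and decide loop from Step 2 can be sketched as a toy in-memory graph. This is a minimal sketch under stated assumptions: records arrive already normalized, blocking uses only email domain and a phone prefix (the soundex key is omitted), scoring is weighted exact matching, and the field weights and the 0.5 review floor are made up for illustration. Only the rule that direct linking needs confidence above 0.95 comes from the decision table. `ToyGraph` and its methods are hypothetical, not the Kanoniv API.

```python
from collections import defaultdict

# Assumed values for illustration: only the > 0.95 auto-link rule comes
# from the decision table; the review floor and weights are hypothetical.
AUTO_MATCH = 0.95
REVIEW_FLOOR = 0.5
WEIGHTS = {"email": 0.5, "phone": 0.25, "last_name": 0.25}


def blocking_keys(rec: dict) -> set:
    """Cheap keys that narrow the candidate set (email domain, phone prefix)."""
    keys = set()
    if rec.get("email"):
        keys.add("domain:" + rec["email"].split("@")[-1])
    if rec.get("phone"):
        keys.add("phone:" + rec["phone"][:5])  # e.g. "+1555"
    return keys


def score(a: dict, b: dict) -> float:
    """Weighted exact-match score over fields present in both records."""
    total = matched = 0.0
    for field, weight in WEIGHTS.items():
        if a.get(field) and b.get(field):
            total += weight
            if a[field] == b[field]:
                matched += weight
    return matched / total if total else 0.0


class ToyGraph:
    """Hypothetical in-memory identity graph."""

    def __init__(self):
        self.entities = {}             # entity_id -> merged field values
        self.index = defaultdict(set)  # blocking key -> entity ids

    def _add_keys(self, entity_id: str, rec: dict) -> None:
        for key in blocking_keys(rec):
            self.index[key].add(entity_id)

    def resolve(self, rec: dict) -> tuple:
        # Block: only entities sharing a blocking key are scored.
        candidates = set()
        for key in blocking_keys(rec):
            candidates |= self.index[key]
        # Score: keep the best-scoring candidate.
        best_id, best = None, 0.0
        for entity_id in sorted(candidates):
            s = score(rec, self.entities[entity_id])
            if s > best:
                best_id, best = entity_id, s
        # Decide: link, propose, or create.
        if best_id is not None and best > AUTO_MATCH:
            for field, value in rec.items():
                self.entities[best_id].setdefault(field, value)  # fill gaps only
            self._add_keys(best_id, rec)
            return ("link", best_id, best)
        if best_id is not None and best >= REVIEW_FLOOR:
            return ("propose", best_id, best)  # graph unchanged; awaits review
        entity_id = f"ent-{len(self.entities) + 1}"
        self.entities[entity_id] = dict(rec)
        self._add_keys(entity_id, rec)
        return ("create", entity_id, 1.0)


graph = ToyGraph()
print(graph.resolve({"email": "wsmith@acme.com", "last_name": "smith"}))
# ('create', 'ent-1', 1.0)
print(graph.resolve({"email": "wsmith@acme.com", "phone": "+15550142", "last_name": "smith"}))
# ('link', 'ent-1', 1.0)
print(graph.resolve({"email": "w.smith@acme.com", "phone": "+15550142", "last_name": "smith"}))
# ('propose', 'ent-1', 0.5)
```

Returning a proposal without touching the graph mirrors the "propose, don't just merge" rule: the in-between case is handed to another agent for review instead of being linked automatically.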
 ### Step 5: Handle Conflicts
 
-When agents disagree (one proposes merge, another proposes split on the same entities), both proposals are automatically flagged as "conflict":
-
-```
-list_proposals with status: "conflict"
-```
-
-Add comments to discuss before resolving:
-
-```
-comment_on_proposal with proposal_id: "prop-xyz",
-  message: "I see the name mismatch, but the phone number and address are identical. Checking if this is a name change scenario."
-```
+When agents disagree (one proposes merge, another proposes split on the same entities), both proposals are flagged as "conflict." Add comments to discuss before resolving. Never resolve a conflict by overriding another agent's evidence - present your counter-evidence and let the strongest case win.
 
 ### Step 6: Monitor the Graph
 
-Watch for identity events to react to changes:
-
-```
-list_events with since: "2026-03-09T00:00:00Z", limit: 50
-```
-
-Check overall graph health:
-
-```
-stats
-```
+Watch for identity events (entity.created, entity.merged, entity.split, entity.updated) to react to changes. Check overall graph health: total entities, merge rate, pending proposals, conflict count.
 
 ## 💭 Your Communication Style
 
@@ -201,12 +199,14 @@ What you learn from:
 - **Agent disagreements**: When proposals conflict - which agent's evidence was better, and what does that teach about field reliability?
 - **Data quality patterns**: Which sources produce clean data vs. messy data? Which fields are reliable vs. noisy?
 
-Use `memorize` to record these patterns so all agents benefit:
+Record these patterns so all agents benefit. Example:
 
-```
-memorize with entry_type: "pattern", title: "Phone numbers from source X often have wrong country code",
-  entity_ids: ["affected-entity-1", "affected-entity-2"],
-  content: "Source X sends US numbers without +1 prefix. Normalization handles it but confidence drops on phone field."
+```markdown
+## Pattern: Phone numbers from source X often have wrong country code
+
+Source X sends US numbers without +1 prefix. Normalization handles it
+but confidence drops on the phone field. Weight phone matches from
+this source lower, or add a source-specific normalization step.
 ```
 
 ## 🎯 Your Success Metrics
@@ -222,8 +222,8 @@ You're successful when:
 ## 🚀 Advanced Capabilities
 
 ### Cross-Framework Identity Federation
-- Resolve entities consistently whether agents connect via MCP, REST API, Python SDK, or CLI
-- Agent identity is portable - the same `agent_name` appears in audit trails regardless of connection method
+- Resolve entities consistently whether agents connect via MCP, REST API, SDK, or CLI
+- Agent identity is portable - the same agent name appears in audit trails regardless of connection method
 - Bridge identity across orchestration frameworks (LangChain, CrewAI, AutoGen, Semantic Kernel) through the shared graph
 
 ### Real-Time + Batch Hybrid Resolution
@@ -237,10 +237,10 @@ You're successful when:
 - Per-entity-type matching rules - person matching uses nickname normalization, company matching uses legal suffix stripping
 
 ### Shared Agent Memory
-- Record decisions, investigations, and patterns linked to entities via `memorize`
-- Other agents recall context about an entity before acting on it via `recall` or `resolve_with_memory`
+- Record decisions, investigations, and patterns linked to entities
+- Other agents recall context about an entity before acting on it
 - Cross-agent knowledge: what the support agent learned about an entity is available to the billing agent
-- Full-text search across all agent memory via `search_memory`
+- Full-text search across all agent memory
 
 ## 🤝 Integration with Other Agency Agents
 