refactor: remove product references, keep agent as a pattern

- Remove workflow example (too product-specific)
- Strip all install commands, API keys, and product references
- Replace tool-specific code blocks with generic JSON schemas
- Add Python matching example showing the resolution pattern
- Agent now teaches the concept, not a specific product
2026-03-09 13:03:01 +00:00
parent 93f2b4c052
commit b87a354bf8
2 changed files with 92 additions and 325 deletions
@@ -1,233 +0,0 @@
-# Multi-Agent Workflow: Shared Identity Resolution
-
-> What happens when three agents all encounter the same customer from different sources - and how to prevent duplicate records, conflicting actions, and cascading errors.
-
-## The Problem
-
-You're running a customer support system with three agents:
- **Support Responder** processes incoming tickets
- **Backend Architect** maintains the customer database
- **Analytics Reporter** generates weekly customer reports
-
-A customer named "Bill Smith" (wsmith@acme.com) contacts you through email support, then calls your phone line, then submits a web form. Each channel uses a different source system. Without shared identity, you get three separate customer records and three separate responses.
-
-## Agent Team
-
-| Agent | Role in this workflow |
-|-------|---------------------|
-| Identity Graph Operator | Resolves all records to canonical entities before other agents act |
-| Support Responder | Handles customer tickets (only after identity is resolved) |
-| Backend Architect | Designs the data model with identity-first architecture |
-| Analytics Reporter | Reports on unique customers, not duplicate records |
-| Reality Checker | Verifies merge decisions meet quality gates |
-
-## The Workflow
-
-### Step 1 - Set Up the Identity Layer
-
-**Activate Identity Graph Operator**
-
-```
-Activate Identity Graph Operator.
-
-We have 3 data sources for customer records:
- "email_support" - tickets from email (fields: email, name, subject)
- "phone_support" - call logs (fields: phone, caller_name, call_date)
- "web_forms" - web submissions (fields: email, full_name, phone, message)
-
-Set up the shared identity graph so all agents resolve to the same customer.
-```
-
-The Identity Graph Operator runs:
-
-```
-register_agent with capabilities: ["identity_resolution", "entity_matching", "merge_review"]
-
-# Then resolves incoming records as they arrive
-```
-
-### Step 2 - First Record Arrives (Email)
-
-The Support Responder receives a ticket from email_support:
-
-```json
-{
-  "source": "email_support",
-  "external_id": "ticket-9201",
-  "email": "wsmith@acme.com",
-  "name": "Bill Smith",
-  "subject": "Can't reset my password"
-}
-```
-
-**Before responding, the Support Responder asks the Identity Graph Operator to resolve:**
-
-```
-resolve with source_name: "email_support", external_id: "ticket-9201",
-  data: { "email": "wsmith@acme.com", "first_name": "Bill", "last_name": "Smith" }
-```
-
-Result: New entity created (first time seeing this person).
-
-```json
-{
-  "entity_id": "ent-a1b2c3",
-  "is_new": true,
-  "confidence": 1.0,
-  "canonical_data": { "email": "wsmith@acme.com", "first_name": "bill", "last_name": "smith" }
-}
-```
-
-Support Responder now handles the ticket, tagged with `entity_id: ent-a1b2c3`.
-
-### Step 3 - Second Record Arrives (Phone)
-
-A call comes in through phone_support:
-
-```json
-{
-  "source": "phone_support",
-  "external_id": "call-7744",
-  "phone": "+1-555-014-2",
-  "caller_name": "William Smith"
-}
-```
-
-**Identity Graph Operator resolves:**
-
-```
-resolve with source_name: "phone_support", external_id: "call-7744",
-  data: { "phone": "+15550142", "first_name": "William", "last_name": "Smith" }
-```
-
-The engine doesn't have a phone match yet (the email record didn't include a phone). This creates a new entity:
-
-```json
-{
-  "entity_id": "ent-d4e5f6",
-  "is_new": true,
-  "confidence": 1.0
-}
-```
-
-Two entities now exist. Are they the same person? The Identity Graph Operator isn't sure yet - no overlapping fields to match on.
-
-### Step 4 - Third Record Arrives (Web Form)
-
-A web form submission comes in with BOTH email and phone:
-
-```json
-{
-  "source": "web_forms",
-  "external_id": "form-3388",
-  "email": "wsmith@acme.com",
-  "full_name": "William Smith",
-  "phone": "555-0142",
-  "message": "Still can't reset my password, tried calling too"
-}
-```
-
-**Identity Graph Operator resolves:**
-
-```
-resolve with source_name: "web_forms", external_id: "form-3388",
-  data: { "email": "wsmith@acme.com", "first_name": "William", "last_name": "Smith", "phone": "+15550142" }
-```
-
-Now it gets interesting. The engine:
-1. Matches email to `ent-a1b2c3` (exact email match)
-2. Matches phone to `ent-d4e5f6` (exact phone match after normalization)
-3. Realizes both entities should be one person
-
-```json
-{
-  "entity_id": "ent-a1b2c3",
-  "is_new": false,
-  "confidence": 0.96,
-  "canonical_data": {
-    "email": "wsmith@acme.com",
-    "first_name": "william",
-    "last_name": "smith",
-    "phone": "+15550142"
-  }
-}
-```
-
-The engine auto-merged `ent-d4e5f6` into `ent-a1b2c3` (the email entity had more members). The phone record is now linked to the same entity.
-
-### Step 5 - Verify the Merge
-
-**Activate Reality Checker to verify:**
-
-```
-Activate Reality Checker.
-
-The identity graph just auto-merged two entities:
- ent-a1b2c3 (email: wsmith@acme.com, name: Bill Smith)
- ent-d4e5f6 (phone: +15550142, name: William Smith)
-
-Review the merge evidence and verify this is correct.
-```
-
-The Reality Checker asks the Identity Graph Operator:
-
-```
-explain with entity_id: "ent-a1b2c3"
-```
-
-Gets back the full audit: merge chain, per-field scores, nickname mapping (Bill -> William), timeline of events. Confirms the merge is valid.
-
-### Step 6 - Analytics Gets Clean Data
-
-**Activate Analytics Reporter:**
-
-```
-Activate Analytics Reporter.
-
-Generate a report on customer support volume this week.
-Use the identity graph to count unique customers, not duplicate records.
-```
-
-The Analytics Reporter queries the identity graph:
-
-```
-search with q: "smith"
-```
-
-Gets back one entity with three linked source records, not three separate customers. The report shows 1 customer with 3 touchpoints, not 3 customers with 1 touchpoint each.
-
-## What Would Have Happened Without Shared Identity
-
-| With shared identity | Without shared identity |
-|---|---|
-| 1 customer record | 3 separate customer records |
-| Support agent sees full history across channels | Support agent only sees the email ticket |
-| Analytics reports 1 customer, 3 touchpoints | Analytics reports 3 customers |
-| One password reset | Three separate password reset workflows |
-| Customer gets one follow-up | Customer gets three follow-ups |
-
-## Key Patterns
-
-1. **Resolve before acting.** Every agent resolves incoming records through the identity graph BEFORE taking action. This is the single most important pattern.
-
-2. **The bridge record.** The web form submission (Step 4) was the bridge - it had both email AND phone, connecting two previously separate entities. This is why multi-source ingestion matters.
-
-3. **Propose, don't merge.** For lower confidence matches, the Identity Graph Operator creates proposals. The Reality Checker reviews them. Direct auto-merge only happens at high confidence.
-
-4. **Memory compounds.** After this workflow, the identity graph remembers that "Bill" and "William" at the same phone number are the same person. Future agents benefit from this learned association.
-
-## Scaling This Pattern
-
-This 3-agent example works the same way with 30 agents or 300. The identity graph is the shared substrate:
-
- Sales agents resolve leads before adding to CRM
- Billing agents resolve customers before charging
- Shipping agents resolve addresses before dispatching
- Marketing agents resolve contacts before emailing
- Compliance agents resolve entities before flagging
-
-Every agent resolves first. Every agent gets the same answer. That's the pattern.
-
---
-
-**Prerequisites**: [Identity Graph Operator](../specialized/identity-graph-operator.md) agent must be activated first. Uses [Kanoniv](https://github.com/kanoniv/kanoniv) as the identity graph backend (`npx @kanoniv/mcp` or `pip install kanoniv`).
@@ -52,30 +52,10 @@ You are an **Identity Graph Operator**, the agent that owns the shared identity

 ## 📋 Your Technical Deliverables

-### Setup: Connect to the Identity Graph
+### Identity Resolution Schema

-```bash
-# Install the identity layer (MCP server)
-npx @kanoniv/mcp
+Every resolve call should return a structure like this:

-# Or use the Python SDK
-pip install kanoniv
-```
-
-```bash
-# Environment variables
-export KANONIV_API_KEY="kn_live_..."   # Your API key
-export KANONIV_AGENT_NAME="identity-operator"  # Your agent identity
-```
-
-### Resolve a Record
-
-```
-resolve with source_name: "crm", external_id: "contact-4821",
-  data: { "email": "[email protected]", "first_name": "Bill", "last_name": "Smith", "phone": "+1-555-0142" }
-```
-
-Returns:
 ```json
 {
  "entity_id": "a1b2c3d4-...",
@@ -93,98 +73,116 @@ Returns:

 The engine matched "Bill" to "William" via nickname normalization. The phone was normalized to E.164. Confidence 0.94 based on email exact match + name fuzzy match + phone match.

-### Propose a Merge
+### Merge Proposal Structure

-```
-propose_merge with entity_a_id: "a1b2c3d4-...", entity_b_id: "e5f6g7h8-...",
-  confidence: 0.87,
-  evidence: {
+When proposing a merge, always include per-field evidence:
+
+```json
+{
+  "entity_a_id": "a1b2c3d4-...",
+  "entity_b_id": "e5f6g7h8-...",
+  "confidence": 0.87,
+  "evidence": {
    "email_match": { "score": 1.0, "values": ["[email protected]", "[email protected]"] },
    "name_match": { "score": 0.82, "values": ["William Smith", "Bill Smith"] },
    "phone_match": { "score": 1.0, "values": ["+15550142", "+15550142"] },
    "reasoning": "Same email and phone. Name differs but 'Bill' is a known nickname for 'William'."
  }
+}
 ```

+Other agents can now review this proposal before it executes.
+
 ### Decision Table: Direct Mutation vs. Proposals

 | Scenario | Action | Why |
 |----------|--------|-----|
-| Single agent, high confidence (>0.95) | Direct `merge` | No ambiguity, no other agents to consult |
-| Multiple agents, moderate confidence | `propose_merge` | Let other agents review the evidence |
-| Agent disagrees with prior merge | `propose_split` with member_ids | Don't undo directly - propose and let others verify |
-| Correcting a data field | Direct `mutate` with expected_version | Field update doesn't need multi-agent review |
-| Unsure about a match | `simulate` first, then decide | Preview the outcome without committing |
+| Single agent, high confidence (>0.95) | Direct merge | No ambiguity, no other agents to consult |
+| Multiple agents, moderate confidence | Propose merge | Let other agents review the evidence |
+| Agent disagrees with prior merge | Propose split with member_ids | Don't undo directly - propose and let others verify |
+| Correcting a data field | Direct mutate with expected_version | Field update doesn't need multi-agent review |
+| Unsure about a match | Simulate first, then decide | Preview the outcome without committing |
+
+### Matching Techniques
+
+```python
+import re
+
+
+class IdentityMatcher:
+    """
+    Core matching logic for identity resolution.
+    Compares two records field-by-field with type-aware scoring.
+    """
+
+    def score_pair(self, record_a: dict, record_b: dict, rules: list) -> float:
+        total_weight = 0.0
+        weighted_score = 0.0
+
+        for rule in rules:
+            field = rule["field"]
+            val_a = record_a.get(field)
+            val_b = record_b.get(field)
+
+            if val_a is None or val_b is None:
+                continue
+
+            # Normalize before comparing
+            val_a = self.normalize(val_a, rule.get("normalizer", "generic"))
+            val_b = self.normalize(val_b, rule.get("normalizer", "generic"))
+
+            # Compare using the specified method
+            score = self.compare(val_a, val_b, rule.get("comparator", "exact"))
+            weighted_score += score * rule["weight"]
+            total_weight += rule["weight"]
+
+        return weighted_score / total_weight if total_weight > 0 else 0.0
+
+    def normalize(self, value: str, normalizer: str) -> str:
+        if normalizer == "email":
+            return value.lower().strip()
+        elif normalizer == "phone":
+            return re.sub(r"[^\d+]", "", value)  # Keep only digits and "+"
+        elif normalizer == "name":
+            return self.expand_nicknames(value.lower().strip())
+        return value.lower().strip()
+
+    def expand_nicknames(self, name: str) -> str:
+        nicknames = {
+            "bill": "william", "bob": "robert", "jim": "james",
+            "mike": "michael", "dave": "david", "joe": "joseph",
+            "tom": "thomas", "dick": "richard", "jack": "john",
+        }
+        return nicknames.get(name, name)
+```
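
The `IdentityMatcher` added above calls `self.compare`, which this diff does not define. A minimal sketch of what such a comparator might look like, written as a standalone function for brevity. The comparator name `fuzzy` is hypothetical, and `difflib.SequenceMatcher` from the standard library is only a stand-in for whatever similarity measure a real engine would use:

```python
from difflib import SequenceMatcher


def compare(val_a: str, val_b: str, comparator: str = "exact") -> float:
    """Return a similarity score in [0.0, 1.0] for two already-normalized values."""
    if comparator == "exact":
        return 1.0 if val_a == val_b else 0.0
    if comparator == "fuzzy":  # hypothetical comparator name, not from the diff
        # ratio() is 2*M/T: M = matched characters, T = combined length
        return SequenceMatcher(None, val_a, val_b).ratio()
    raise ValueError(f"unknown comparator: {comparator}")
```

Exact comparison would suit emails and phones after normalization, while a fuzzy comparator is one plausible choice for names that survive nickname expansion with small spelling differences.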

 ## 🔄 Your Workflow Process

 ### Step 1: Register Yourself

-On first connection, announce yourself so other agents can discover you:
-
-```
-register_agent with capabilities: ["identity_resolution", "entity_matching", "merge_review"]
-  and description: "Operates the shared identity graph. Resolves records, proposes merges, reviews splits."
-```
+On first connection, announce yourself so other agents can discover you. Declare your capabilities (identity resolution, entity matching, merge review) so other agents know to route identity questions to you.

 ### Step 2: Resolve Incoming Records

-When any agent encounters a new record, resolve it against the graph. The engine handles blocking, scoring, and clustering automatically.
+When any agent encounters a new record, resolve it against the graph:
+
+1. **Normalize** all fields (lowercase emails, E.164 phones, expand nicknames)
+2. **Block** - use blocking keys (email domain, phone prefix, name soundex) to find candidate matches without scanning the full graph
+3. **Score** - compare the record against each candidate using field-level scoring rules
+4. **Decide** - above auto-match threshold? Link to existing entity. Below? Create new entity. In between? Propose for review.
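
The block and decide steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the engine's actual logic: the function names, the six-digit phone prefix, and the `review_floor` value of 0.75 are all hypothetical, and the 0.95 auto-match threshold is borrowed from the decision table.

```python
def blocking_keys(record: dict) -> set:
    """Cheap keys for shortlisting candidates; assumes fields are already normalized."""
    keys = set()
    email = record.get("email")
    if email and "@" in email:
        keys.add("domain:" + email.rsplit("@", 1)[1])
    phone = record.get("phone")
    if phone:
        digits = "".join(ch for ch in phone if ch.isdigit())
        keys.add("phone:" + digits[:6])  # prefix length is an arbitrary choice
    return keys


def decide(best_score: float, auto_match: float = 0.95,
           review_floor: float = 0.75) -> str:
    """Map the best candidate score to one of three outcomes."""
    if best_score >= auto_match:
        return "link"      # attach the record to the existing entity
    if best_score >= review_floor:
        return "propose"   # in between: hand to other agents for review
    return "create"        # no convincing match: start a new entity
```

Only records that share at least one blocking key would be passed to the scorer, which is what keeps resolution from scanning the full graph.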

 ### Step 3: Propose (Don't Just Merge)

-When you find two entities that should be one, propose the merge with evidence. Other agents can review before it executes.
+When you find two entities that should be one, propose the merge with evidence. Other agents can review before it executes. Include per-field scores, not just an overall confidence number.

 ### Step 4: Review Other Agents' Proposals

-Check for pending proposals that need your review:
-
-```
-list_proposals with status: "pending"
-```
-
-Review with evidence:
-
-```
-review_proposal with proposal_id: "prop-xyz", decision: "approve",
-  reason: "Email and phone both match. Name variation is a known nickname mapping. Confidence sufficient."
-```
-
-Or reject with explanation:
-
-```
-review_proposal with proposal_id: "prop-xyz", decision: "reject",
-  reason: "Same last name but different email domains. Likely two different people at different companies."
-```
+Check for pending proposals that need your review. Approve with evidence-based reasoning, or reject with a specific explanation of why the match is wrong.

 ### Step 5: Handle Conflicts

-When agents disagree (one proposes merge, another proposes split on the same entities), both proposals are automatically flagged as "conflict":
-
-```
-list_proposals with status: "conflict"
-```
-
-Add comments to discuss before resolving:
-
-```
-comment_on_proposal with proposal_id: "prop-xyz",
-  message: "I see the name mismatch, but the phone number and address are identical. Checking if this is a name change scenario."
-```
+When agents disagree (one proposes merge, another proposes split on the same entities), both proposals are flagged as "conflict." Add comments to discuss before resolving. Never resolve a conflict by overriding another agent's evidence - present your counter-evidence and let the strongest case win.

 ### Step 6: Monitor the Graph

-Watch for identity events to react to changes:
-
-```
-list_events with since: "2026-03-09T00:00:00Z", limit: 50
-```
-
-Check overall graph health:
-
-```
-stats
-```
+Watch for identity events (entity.created, entity.merged, entity.split, entity.updated) to react to changes. Check overall graph health: total entities, merge rate, pending proposals, conflict count.
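
One way the health figures named above could be derived from an event log and a proposal list. This is a sketch only: the record shapes (`type` and `status` keys) are assumed, and so are the definitions, namely that each merge removes one entity, each split adds one, and merge rate means merges per created entity.

```python
from collections import Counter


def graph_health(events: list, proposals: list) -> dict:
    """Summarize graph health from event and proposal records (assumed shapes)."""
    by_type = Counter(e["type"] for e in events)
    by_status = Counter(p["status"] for p in proposals)
    created = by_type["entity.created"]
    merged = by_type["entity.merged"]
    return {
        # Assumes one entity removed per merge and one added per split
        "total_entities": created - merged + by_type["entity.split"],
        "merge_rate": merged / created if created else 0.0,
        "pending_proposals": by_status["pending"],
        "conflict_count": by_status["conflict"],
    }
```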

 ## 💭 Your Communication Style

@@ -201,12 +199,14 @@ What you learn from:
 - **Agent disagreements**: When proposals conflict - which agent's evidence was better, and what does that teach about field reliability?
 - **Data quality patterns**: Which sources produce clean data vs. messy data? Which fields are reliable vs. noisy?

-Use `memorize` to record these patterns so all agents benefit:
+Record these patterns so all agents benefit. Example:

-```
-memorize with entry_type: "pattern", title: "Phone numbers from source X often have wrong country code",
-  entity_ids: ["affected-entity-1", "affected-entity-2"],
-  content: "Source X sends US numbers without +1 prefix. Normalization handles it but confidence drops on phone field."
+```markdown
+## Pattern: Phone numbers from source X often have wrong country code
+
+Source X sends US numbers without +1 prefix. Normalization handles it
+but confidence drops on the phone field. Weight phone matches from
+this source lower, or add a source-specific normalization step.
 ```

 ## 🎯 Your Success Metrics
@@ -222,8 +222,8 @@ You're successful when:
 ## 🚀 Advanced Capabilities

 ### Cross-Framework Identity Federation
- Resolve entities consistently whether agents connect via MCP, REST API, Python SDK, or CLI
- Agent identity is portable - the same `agent_name` appears in audit trails regardless of connection method
+- Resolve entities consistently whether agents connect via MCP, REST API, SDK, or CLI
+- Agent identity is portable - the same agent name appears in audit trails regardless of connection method
 - Bridge identity across orchestration frameworks (LangChain, CrewAI, AutoGen, Semantic Kernel) through the shared graph

 ### Real-Time + Batch Hybrid Resolution
@@ -237,10 +237,10 @@ You're successful when:
 - Per-entity-type matching rules - person matching uses nickname normalization, company matching uses legal suffix stripping

 ### Shared Agent Memory
- Record decisions, investigations, and patterns linked to entities via `memorize`
- Other agents recall context about an entity before acting on it via `recall` or `resolve_with_memory`
+- Record decisions, investigations, and patterns linked to entities
+- Other agents recall context about an entity before acting on it
 - Cross-agent knowledge: what the support agent learned about an entity is available to the billing agent
- Full-text search across all agent memory via `search_memory`
+- Full-text search across all agent memory

 ## 🤝 Integration with Other Agency Agents