vExpertAI
✓ What Works Well
Redis for A2A
LPUSH/BRPOP queue pattern for agent-to-agent messaging: simple, reliable, observable. Under 10 ms latency.
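A minimal sketch of the queue pattern with redis-py; the queue name, message fields, and connection details are illustrative, not the project's actual values.

```python
import json
import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)
QUEUE = "a2a:diagnostics"  # one list per consumer agent (illustrative name)

# Producer agent: push a JSON-encoded task onto the head of the list.
task = {"trace_id": "abc123", "device_id": "r1", "command": "show ip interface brief"}
r.lpush(QUEUE, json.dumps(task))

# Consumer agent: BRPOP blocks until a task arrives; the timeout lets a
# worker loop wake up periodically for shutdown or health checks.
item = r.brpop(QUEUE, timeout=5)
if item is not None:
    _, raw = item
    print("got task:", json.loads(raw))
```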
Idempotency Cache
24-hour TTL prevents duplicate command execution. Key: hash of device_id + command + timestamp.
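A minimal sketch of the idempotency check, assuming a Redis-backed cache; the idem: key prefix and field separator are assumptions.

```python
import hashlib
import redis

r = redis.Redis(decode_responses=True)
TTL_SECONDS = 24 * 60 * 60  # 24-hour window

def already_executed(device_id: str, command: str, timestamp: str) -> bool:
    """Return True if this exact command was already dispatched in the window."""
    digest = hashlib.sha256(f"{device_id}|{command}|{timestamp}".encode()).hexdigest()
    # SET NX only creates the key if it does not exist; None means it was
    # already there, i.e. a duplicate request.
    created = r.set(f"idem:{digest}", "1", nx=True, ex=TTL_SECONDS)
    return created is None
```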
TextFSM Parsing
Converts raw Cisco CLI output to structured JSON. Templates from the ntc-templates repo.
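A small sketch using the parse_output helper from the ntc-templates package; the sample CLI output is trimmed for illustration.

```python
from ntc_templates.parse import parse_output

raw = (
    "Interface              IP-Address      OK? Method Status                Protocol\n"
    "GigabitEthernet0/0     10.0.0.1        YES NVRAM  up                    up\n"
    "GigabitEthernet0/1     unassigned      YES NVRAM  administratively down down\n"
)

# parse_output picks the matching TextFSM template based on platform + command
# and returns a list of dicts keyed by the template's field names.
records = parse_output(platform="cisco_ios", command="show ip interface brief", data=raw)
print(records)
```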
Graceful Fallback
HF endpoint fails → auto-switch to OpenAI. Zero-downtime demos.
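A minimal sketch of the provider switch; call_hf and call_openai stand in for thin wrappers around each provider's client and are assumptions, not the project's actual names.

```python
import logging

log = logging.getLogger("llm")

def generate(prompt: str, call_hf, call_openai) -> str:
    """Try the Hugging Face endpoint first; on any error, fall back to OpenAI."""
    try:
        return call_hf(prompt)
    except Exception as exc:  # timeouts, 503s while the endpoint scales up, auth errors
        log.warning("HF endpoint failed (%s); falling back to OpenAI", exc)
        return call_openai(prompt)
```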
⚠ Challenges & Solutions
Challenge: HF Cold Starts
Dedicated endpoints sleep after 15 min idle → 60-90 s for the first call
Solution: Warmup ping every 10min + OpenAI fallback
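A sketch of the keep-warm loop, assuming a TGI-style text-generation endpoint; the URL and token are placeholders, and the 10-minute interval mirrors the number above.

```python
import time
import requests

ENDPOINT_URL = "https://<your-endpoint>.endpoints.huggingface.cloud"  # placeholder
HEADERS = {"Authorization": "Bearer <hf_token>"}                      # placeholder

def keep_warm(interval_s: int = 600) -> None:
    """Send a tiny generation request every 10 minutes so the endpoint never idles out."""
    while True:
        try:
            requests.post(
                ENDPOINT_URL,
                headers=HEADERS,
                json={"inputs": "ping", "parameters": {"max_new_tokens": 1}},
                timeout=120,  # generous: the first ping after a sleep may hit a cold start
            )
        except requests.RequestException:
            pass  # the OpenAI fallback covers calls made while the endpoint is still cold
        time.sleep(interval_s)
```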
Challenge: LLM Token Noise
Llama 3 emits raw <|eot_id|> tokens, and system messages leak through
Solution: Regex cleanup + structured prompts with examples
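A sketch of the post-processing step; the exact special tokens depend on the model's chat template, so treat the patterns below as assumptions.

```python
import re

# Llama 3 chat-template tokens that sometimes leak into plain-text output.
SPECIAL_TOKENS = re.compile(
    r"<\|eot_id\|>|<\|start_header_id\|>.*?<\|end_header_id\|>", re.DOTALL
)

def clean_response(text: str) -> str:
    text = SPECIAL_TOKENS.sub("", text)
    # Drop a leaked leading "system"/"assistant" role label if present.
    text = re.sub(r"^\s*(system|assistant)\s*:?\s*", "", text, flags=re.IGNORECASE)
    return text.strip()

print(clean_response("assistant\n\nInterface Gi0/1 is down.<|eot_id|>"))
# -> "Interface Gi0/1 is down."
```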
Challenge: SSH Tunnel Instability
Azure VM → Router connections drop randomly
Solution: Auto-reconnect with exponential backoff + health checks
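A sketch of the reconnect logic, assuming Netmiko for the SSH session and a local tunnel port; host details and retry limits are illustrative.

```python
import time
from netmiko import ConnectHandler

DEVICE = {
    "device_type": "cisco_ios",
    "host": "127.0.0.1",     # local end of the SSH tunnel from the Azure VM
    "port": 2222,            # forwarded tunnel port (placeholder)
    "username": "admin",
    "password": "<secret>",
}

def run_command(command: str, max_retries: int = 5) -> str:
    """Run a show command, reconnecting with exponential backoff on failure."""
    delay = 1
    for attempt in range(max_retries):
        try:
            with ConnectHandler(**DEVICE) as conn:
                return conn.send_command(command)
        except Exception:
            if attempt == max_retries - 1:
                raise
            time.sleep(delay)
            delay = min(delay * 2, 60)  # 1 s, 2 s, 4 s, ... capped at 60 s
```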
Challenge: Duplicate Approvals
Same incident → multiple agents → N approval cards
Solution: Dedup by (device_id + agent_type), show latest only
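A sketch of the dedup rule: key pending cards by (device_id, agent_type) and keep only the newest one. Field names are assumptions.

```python
from typing import Dict, Tuple

pending: Dict[Tuple[str, str], dict] = {}

def upsert_approval(card: dict) -> None:
    """Newer card replaces the older one, so the operator sees a single,
    up-to-date approval per device/agent pair."""
    key = (card["device_id"], card["agent_type"])
    existing = pending.get(key)
    if existing is None or card["created_at"] > existing["created_at"]:
        pending[key] = card
```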
💡 Best Practices
1. Start Simple
Single agent + OpenAI first. Add specialists & fine-tuning later.
2. Log Everything
Trace IDs, timestamps, provider used. Essential for debugging; see the logging sketch after this list.
3. Test with Mocks
Mock routers and LLM responses. E2E tests without production risk; see the test sketch after this list.
4. Version Everything
LoRA adapters, prompts, schemas. Roll back on regression; see the config sketch after this list.
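For practice 2, a sketch of a structured log line carrying trace ID, timestamp, and provider; the logger name and fields are illustrative.

```python
import json
import logging
import time
import uuid

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("vexpertai")

def log_llm_call(provider: str, prompt: str, trace_id=None) -> str:
    """Emit one structured line per LLM call; returns the trace ID to propagate."""
    trace_id = trace_id or uuid.uuid4().hex
    log.info(json.dumps({
        "trace_id": trace_id,
        "ts": time.time(),
        "provider": provider,      # which backend actually served the call ("hf" / "openai")
        "prompt_chars": len(prompt),
    }))
    return trace_id
```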
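For practice 3, a sketch of an E2E test with the router and the LLM both mocked via unittest.mock; all module paths and the diagnose_interface entry point are hypothetical stand-ins.

```python
from unittest.mock import patch

CANNED_CLI = "GigabitEthernet0/1 is down, line protocol is down"
CANNED_LLM = "Interface Gi0/1 appears administratively down; recommend 'no shutdown'."

# app.llm.generate, app.network.run_command, and app.workflows.diagnose_interface
# are hypothetical module paths for the LLM wrapper, router access, and workflow.
@patch("app.llm.generate", return_value=CANNED_LLM)
@patch("app.network.run_command", return_value=CANNED_CLI)
def test_diagnosis_flow(mock_router, mock_llm):
    from app.workflows import diagnose_interface
    result = diagnose_interface(device_id="r1", interface="Gi0/1")
    assert "no shutdown" in result["recommendation"]
```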
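For practice 4, a sketch of pinning adapter, prompt, and schema versions in one config so a regression rolls back with a one-line change; names and version strings are illustrative.

```python
# All runtime artifacts pinned in one place and kept in version control.
RUNTIME_CONFIG = {
    "lora_adapter": "netops-lora@v0.3.1",        # fine-tuned adapter weights
    "prompt_template": "diagnose_interface@v7",  # prompt text
    "output_schema": "diagnosis_result@v2",      # JSON schema the agent must emit
}
```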