Mistral-7B

Architecture: Grouped Query Attention (GQA)

Strength: Fast pattern matching & classification

Why: Rapid incident triage and routing (2-3s)

"Interface down" → Keyword analysis → Route to Troubleshooting

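The triage flow above can be sketched as a plain keyword router. This is only an illustrative stand-in for the model's classification step: the `ROUTES` table, the agent names other than Troubleshooting, and the `route_incident` helper are all assumptions, not part of the described system.

```python
from __future__ import annotations

# Hypothetical keyword table: agent names and keywords are illustrative assumptions.
ROUTES: dict[str, tuple[str, ...]] = {
    "troubleshooting": ("down", "flapping", "unreachable", "timeout"),
    "security": ("failed ssh", "unauthorized", "brute force"),
    "configuration": ("configure", "enable", "vlan"),
    "design": ("redundant", "topology", "backup path"),
}


def route_incident(text: str, default: str = "troubleshooting") -> str:
    """Return the agent whose keywords match the incident text most often."""
    lowered = text.lower()
    # Count how many of each agent's keywords appear in the text.
    scores = {agent: sum(kw in lowered for kw in kws) for agent, kws in ROUTES.items()}
    best = max(scores, key=scores.get)
    # Fall back to the default agent when nothing matches.
    return best if scores[best] > 0 else default
```

Calling `route_incident("Interface down")` selects the troubleshooting agent; a real deployment would let the model handle phrasing that a fixed keyword list misses.
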
Qwen2.5-7B

Architecture: Transformer decoder with Grouped Query Attention (GQA)

Strength: Multi-step technical reasoning

Why: OSPF/BGP requires relationship analysis (7-10s)

LOADING→FULL: Check neighbors → Config → Conclusion: Normal ✓
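
The LOADING→FULL conclusion follows from the ordering of OSPF neighbor states. A minimal sketch of that check, assuming the standard state sequence from the OSPF specification; the `assess_transition` helper and its return labels are hypothetical:

```python
# OSPF neighbor states in the order an adjacency normally progresses.
OSPF_STATES = ["DOWN", "ATTEMPT", "INIT", "2-WAY", "EXSTART", "EXCHANGE", "LOADING", "FULL"]


def assess_transition(old: str, new: str) -> str:
    """Classify a neighbor state change as normal progress, no change, or a regression."""
    i = OSPF_STATES.index(old.upper())
    j = OSPF_STATES.index(new.upper())
    if j > i:
        return "normal"      # moving toward FULL, e.g. LOADING -> FULL
    if j == i:
        return "unchanged"
    return "regression"      # e.g. FULL -> INIT, worth investigating
```

A forward move such as `assess_transition("LOADING", "FULL")` is classified as normal, which matches the conclusion in the example; the neighbor and config checks in the described flow are not modeled here.
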

CodeLlama-7B

Pre-training: Llama 2 base, further trained on 500B tokens of code and code-related data

Strength: Strong CLI syntax understanding

Why: Cisco IOS commands must be syntactically perfect

conf t → interface Gi0/1 → no shut → end
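
Because generated commands must be exact, one option is to validate the interface name and emit the full command forms rather than abbreviations. The sketch below is an assumption about how that could look; `INTERFACE_RE` covers only a few common interface prefixes and `enable_interface_commands` is a hypothetical helper.

```python
from __future__ import annotations

import re

# Accepts names such as Gi0/1 or GigabitEthernet1/0/24 (illustrative subset of prefixes).
INTERFACE_RE = re.compile(
    r"^(GigabitEthernet|FastEthernet|TenGigabitEthernet|Gi|Fa|Te)\d+(/\d+)*$"
)


def enable_interface_commands(interface: str) -> list[str]:
    """Return the IOS command sequence that brings an interface up."""
    if not INTERFACE_RE.match(interface):
        # Reject anything that is not a plain interface name before it reaches a device.
        raise ValueError(f"unexpected interface name: {interface!r}")
    return [
        "configure terminal",      # long form of "conf t"
        f"interface {interface}",
        "no shutdown",             # long form of "no shut"
        "end",
    ]
```

For `Gi0/1` this yields the same four steps as the example, spelled out in full.
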

Mistral-7B

Architecture: Grouped Query Attention (GQA)

Strength: Policy compliance & structured output

Why: Security requires rigid rule following

Failed SSH → Rule: Block → Create ACL → Log action
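
The Failed SSH → Block → ACL → Log chain can be expressed as a small deterministic rule. This is a sketch under stated assumptions: the threshold of three failures, ACL number 101, and the `block_rules` helper are illustrative and do not come from the described system.

```python
from __future__ import annotations

import ipaddress
import logging
from collections import Counter

logger = logging.getLogger("security_agent")  # hypothetical logger name


def block_rules(failed_sources: list[str], threshold: int = 3, acl_number: int = 101) -> list[str]:
    """Build extended-ACL deny lines for sources with repeated failed SSH logins."""
    counts = Counter(failed_sources)
    rules = []
    for source, count in sorted(counts.items()):
        if count < threshold:
            continue
        ip = ipaddress.IPv4Address(source)  # raises ValueError on malformed input
        rules.append(f"access-list {acl_number} deny tcp host {ip} any eq 22")
        logger.warning("Blocking %s after %d failed SSH logins", ip, count)
    return rules
```

Keeping the rule in code rather than in a prompt makes the output reproducible; the model's role would then be limited to recognizing the event and filling in the parameters.
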

Llama3-8B

Architecture: 8B parameters, Grouped Query Attention (GQA)

Context: 8K tokens (128K in the Llama 3.1 release)

Strength: Holistic design thinking

Pre-training: Over 15T tokens of publicly available data, broad enough to cover design patterns

Why: Network design needs big picture

Time: 10-15s for comprehensive design

Example: "Redundant R1→R3 path" → Current: R1-R2-R3 → Design: Add R1-R4-R3 backup → EIGRP load balancing → Dual-homed failover
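
The redundancy argument in this example can be checked mechanically: a design is redundant for R1→R3 if a path survives the loss of the transit router. A minimal sketch, assuming an undirected link list; `has_path` is a hypothetical helper and does not model EIGRP metrics or load balancing.

```python
from collections import deque


def has_path(links, src, dst, failed=frozenset()):
    """Breadth-first search for a path from src to dst, skipping failed routers."""
    adjacency = {}
    for a, b in links:
        if a in failed or b in failed:
            continue  # a link is unusable if either end is down
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    seen, queue = {src}, deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            return True
        for nxt in adjacency.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


current = [("R1", "R2"), ("R2", "R3")]
proposed = current + [("R1", "R4"), ("R4", "R3")]
```

With `failed={"R2"}`, the current topology has no R1→R3 path while the proposed one still does, which is the property the backup path is meant to provide.
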

Why Different LLMs? The Right Tool for the Right Job
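
One way to encode this model-per-task split is a small lookup table. The task keys and the `ModelChoice` and `pick_model` names are assumptions made for illustration; the model names and latency ranges are the ones quoted in the cards.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class ModelChoice:
    model: str
    latency_s: Optional[Tuple[int, int]]  # (min, max) seconds as quoted; None if not stated


# Hypothetical task keys mapped to the models named in the cards.
MODEL_FOR_TASK = {
    "triage": ModelChoice("Mistral-7B", (2, 3)),
    "troubleshooting": ModelChoice("Qwen2.5-7B", (7, 10)),
    "configuration": ModelChoice("CodeLlama-7B", None),
    "security": ModelChoice("Mistral-7B", None),
    "design": ModelChoice("Llama3-8B", (10, 15)),
}


def pick_model(task: str) -> ModelChoice:
    """Look up the model assigned to a task, failing loudly on unknown tasks."""
    try:
        return MODEL_FOR_TASK[task]
    except KeyError:
        raise ValueError(f"no model registered for task {task!r}") from None
```

Keeping the mapping in one place makes it straightforward to swap a model for a single task without touching the others.
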