ORI [ ROUTING INTELLIGENCE ]

Why ORI is Needed

  • Single-model gap: Large language models vary in strengths; no single model excels at everything.
  • Existing routers rely on preferences: Many routing methods depend on human preference data, risking bias.
  • ORI advantage: Routes via embedding-based segmentation of queries instead, minimizing reliance on human input.


How ORI Works

  • Vector embeddings: Transforms each query into a vector using a Sentence Transformer, capturing its semantic meaning.
  • Segmentation: Clusters similar queries (K-Means, agglomerative clustering, or KNN) so that each cluster maps to a dominant benchmark.
  • Model selection: Identifies the top-performing LLM for each cluster's dominant task or benchmark.
  • Dynamic routing: Automatically sends each incoming query to the most capable model (a minimal end-to-end sketch follows this list).
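
  The pipeline above reduces to a short Python sketch. This is illustrative, not the paper's published implementation: the embedding model name, cluster count, toy queries, and the benchmark-to-model table are all assumptions made for the example.

    # ORI-style routing sketch: embed -> cluster -> select -> route.
    from collections import Counter

    from sentence_transformers import SentenceTransformer
    from sklearn.cluster import KMeans

    # 1. Embed queries with a Sentence Transformer (model choice is assumed).
    encoder = SentenceTransformer("all-MiniLM-L6-v2")

    # Toy training queries, each tagged with the benchmark it came from.
    train_queries = [
        ("Solve for x: 2x + 3 = 11",               "MATH"),
        ("What is the capital of Australia?",      "MMLU"),
        ("Track the object after three swaps ...", "BBH"),
        ("Who had the motive and the alibi? ...",  "MuSR"),
    ]
    texts, benchmarks = zip(*train_queries)
    X = encoder.encode(list(texts))

    # 2. Segment the embedding space (K-Means here; the notes also list
    #    agglomerative clustering and KNN as alternatives).
    kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

    # 3. For each cluster, find its dominant benchmark and pick the LLM
    #    that scores best on it (this table is entirely made up).
    best_model_for_benchmark = {
        "MATH": "model-a", "MMLU": "model-b",
        "BBH": "model-c", "MuSR": "model-a",
    }
    cluster_to_model = {}
    for c in range(kmeans.n_clusters):
        members = [b for b, lbl in zip(benchmarks, kmeans.labels_) if lbl == c]
        dominant = Counter(members).most_common(1)[0][0]
        cluster_to_model[c] = best_model_for_benchmark[dominant]

    # 4. Dynamic routing: embed the incoming query, assign it to the
    #    nearest cluster, and dispatch to that cluster's top model.
    def route(query: str) -> str:
        cluster = int(kmeans.predict(encoder.encode([query]))[0])
        return cluster_to_model[cluster]

    print(route("Compute 3x - 7 = 2"))  # routes to a math-leaning model

  One design consequence worth noting: at inference time, routing costs only an embedding pass and a nearest-centroid lookup, which is negligible next to an LLM call.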


Key Outcomes & Takeaways

  • Performance gains: Up to +2.7 points on MMLU and +1.8 on MuSR vs. the best single models.
  • Broad effectiveness: Excels on diverse benchmarks like BBH, ARC, and MMLU.
  • Cost-speed balance: Maintains near-top token-generation speed while keeping latency in check.
  • Scalable approach: Easily extends to new tasks and additional LLMs without re-engineering.