ORI: O Routing Intelligence
https://arxiv.org/abs/2502.10051
Why ORI is Needed
Single-model gap:
Large language models vary in strengths; no single model excels at everything.
Existing routers rely on preferences:
Many routing methods depend on human preference data, risking bias.
ORI advantage:
Relies on embedding-based segmentation of queries, avoiding heavy reliance on human preference data.
How ORI Works
Vector embeddings:
Transforms each query into a vector with a Sentence Transformer, capturing its semantic meaning.
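A minimal sketch of this step. The toy `embed` below is a stand-in for a real Sentence Transformer (which would produce a much higher-dimensional learned embedding): it just hashes tokens into a small fixed-size, L2-normalized vector so that similar queries land near each other.

```python
import hashlib

DIM = 16  # toy dimensionality; real sentence embeddings are typically 384+ dims


def embed(query: str) -> list[float]:
    """Stand-in for a Sentence Transformer: hash each token into a
    fixed-size bucket, then L2-normalize the resulting vector."""
    vec = [0.0] * DIM
    for tok in query.lower().split():
        h = int(hashlib.md5(tok.encode()).hexdigest(), 16)
        vec[h % DIM] += 1.0
    norm = sum(v * v for v in vec) ** 0.5 or 1.0
    return [v / norm for v in vec]


query_vec = embed("What is the capital of France?")
```

In the real pipeline, `embed` would be replaced by a call to a pretrained sentence-embedding model; the rest of the routing logic only needs vectors it can compare by distance.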
Segmentation:
Groups similar queries (K-Means/Agglomerative/KNN) to map them to dominant benchmarks.
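The clustering step can be sketched with a minimal pure-Python K-Means (one of the algorithms named above); the 2-D points and cluster count below are illustrative, not from the paper.

```python
import random


def kmeans(points, k, iters=20, seed=0):
    """Minimal K-Means: repeatedly assign each point to its nearest
    centroid, then move each centroid to the mean of its members."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k),
                          key=lambda c: sum((a - b) ** 2
                                            for a, b in zip(p, centroids[c])))
            clusters[nearest].append(p)
        for c, members in enumerate(clusters):
            if members:  # leave an empty cluster's centroid where it is
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    labels = [min(range(k),
                  key=lambda c: sum((a - b) ** 2
                                    for a, b in zip(p, centroids[c])))
              for p in points]
    return centroids, labels


# Two well-separated groups of toy 2-D "query embeddings"
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
          (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
centroids, labels = kmeans(points, k=2)
```

Each resulting cluster would then be mapped to whichever benchmark dominates it.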
Model selection:
Identifies top-performing LLM for each cluster’s dominant task or benchmark.
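Model selection then reduces to an argmax over per-cluster benchmark scores. The model names and accuracies below are made up to illustrate the lookup, not taken from the paper.

```python
# Hypothetical per-cluster benchmark accuracies: cluster id -> {model: score}
cluster_scores = {
    0: {"model-A": 0.71, "model-B": 0.65},
    1: {"model-A": 0.58, "model-B": 0.80},
}


def best_model(cluster_id: int) -> str:
    """Pick the model with the highest score on this cluster's dominant task."""
    scores = cluster_scores[cluster_id]
    return max(scores, key=scores.get)
```

This table is built once, offline, from benchmark results; routing at inference time only reads it.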
Dynamic routing:
Sends incoming queries to the most capable model automatically.
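Putting the steps together, routing a new query reduces to a nearest-centroid lookup followed by a table read. The centroids and model names below are hypothetical placeholders.

```python
def route(query_vec, centroids, cluster_to_model):
    """Dispatch a query embedding to the model assigned to its nearest cluster."""
    def sqdist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    cid = min(range(len(centroids)),
              key=lambda c: sqdist(query_vec, centroids[c]))
    return cluster_to_model[cid]


# Hypothetical cluster centroids and per-cluster model assignments
centroids = [[0.0, 0.0], [5.0, 5.0]]
cluster_to_model = {0: "model-A", 1: "model-B"}
choice = route([0.2, 0.1], centroids, cluster_to_model)
```

Because the decision is a single nearest-neighbor lookup, the routing overhead per query stays negligible next to LLM inference.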
Key Outcomes & Takeaways
Performance gains:
Up to +2.7 on MMLU, +1.8 on MuSR vs. best single models.
Broad effectiveness:
Excels on diverse benchmarks like BBH, ARC, and MMLU.
Cost-speed balance:
Maintains near-top token generation speed while keeping latency in check.
Scalable approach:
Easily extends to new tasks and additional LLMs without re-engineering.