Engages third-party hardware partners for capacity expansion.
Supports TEEs (Trusted Execution Environments) via confidential-computing-capable chips.
Ensures deployment on HIPAA- and SOC 2-compliant infrastructure with TEE support.
Scalability
Features:
Currently handles 1,000 RPS (Requests Per Second).
Targets 1M RPS as a future goal.
How:
We will host the most frequently routed LLM and ML models on Cerebras clusters, which provide fast inference and sufficient compute and memory capacity to handle 1K-1M concurrent calls and processes. This requires collaboration and support from the Cerebras software team.
Less frequently routed models can be hosted on io.net Ray clusters for slower-inference, higher-latency use cases, balancing hosting costs across our hardware infrastructure.
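The two-tier hosting strategy above can be sketched as a demand-based router: frequently requested models are served from the Cerebras cluster, while long-tail models fall back to io.net Ray clusters. This is a minimal illustrative sketch, not the actual implementation; the class name, backend labels, and threshold are all assumptions.

```python
# Hypothetical model router for the two-tier hosting strategy.
# Hot (frequently routed) models -> Cerebras cluster (fast inference);
# cold (rarely routed) models -> io.net Ray clusters (cost-efficient,
# latency-tolerant). Names and thresholds are illustrative assumptions.
from collections import Counter


class ModelRouter:
    def __init__(self, threshold: int = 100):
        # threshold: requests per observation window above which a
        # model is considered "hot" (assumed cutoff, not a real figure).
        self.threshold = threshold
        self.request_counts: Counter = Counter()

    def record_request(self, model_id: str) -> None:
        # Track per-model demand within the current window.
        self.request_counts[model_id] += 1

    def backend_for(self, model_id: str) -> str:
        # Route by observed demand: hot models get low-latency hardware,
        # cold models get the cheaper, higher-latency tier.
        if self.request_counts[model_id] >= self.threshold:
            return "cerebras"
        return "io_net_ray"


router = ModelRouter(threshold=3)
for _ in range(3):
    router.record_request("popular-llm")
router.record_request("rare-model")
print(router.backend_for("popular-llm"))  # → cerebras
print(router.backend_for("rare-model"))   # → io_net_ray
```

In practice the demand counter would be windowed (e.g. a rolling count) and models would be migrated between tiers asynchronously, but the routing decision itself reduces to this kind of threshold check.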