Requirement | Specification |
---|---|
Cores | Minimum 4 physical CPU cores per GPU + 2 for system operations |
Clock Speed | Minimum 3.5 GHz base clock, with boost clock of at least 4.0 GHz |
Recommended CPUs | AMD EPYC 9654 (96 cores, up to 3.7 GHz), Intel Xeon Platinum 8490H (60 cores, up to 4.8 GHz), AMD EPYC 9474F (48 cores, up to 4.1 GHz) |
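
The core-count rule in the table above is plain arithmetic: four physical cores per GPU plus two for system operations. A minimal sketch of that calculation follows; the helper name `min_physical_cores` is hypothetical and not part of any Runpod tooling.

```python
def min_physical_cores(gpu_count: int, cores_per_gpu: int = 4, system_cores: int = 2) -> int:
    """Return the minimum physical CPU cores for a server with `gpu_count` GPUs."""
    if gpu_count < 1:
        raise ValueError("gpu_count must be at least 1")
    # 4 physical cores per GPU, plus 2 reserved for system operations.
    return gpu_count * cores_per_gpu + system_cores


print(min_physical_cores(8))  # 8 * 4 + 2 = 34
print(min_physical_cores(4))  # 4 * 4 + 2 = 18
```
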
GPU VRAM | Minimum Bandwidth |
---|---|
8/10/12/16 GB | PCIe 3.0 x16 |
20/24/32/40/48 GB | PCIe 4.0 x16 |
80 GB | PCIe 5.0 x16

GPU Configuration | Recommended RAM |
---|---|
8x 80 GB VRAM | >= 2048 GB DDR5 |
8x 40/48 GB VRAM | >= 1024 GB DDR5 |
8x 24 GB VRAM | >= 512 GB DDR4/5 |
8x 16 GB VRAM | >= 256 GB DDR4/5

Requirement | Specification |
---|---|
Redundancy | >= 2n redundancy (RAID 1) |
Size | >= 500 GB (post-RAID) |
Disk Perf - Sequential read | 2,000 MB/s |
Disk Perf - Sequential write | 2,000 MB/s |
Disk Perf - Random Read (4K QD32) | 100,000 IOPS |
Disk Perf - Random Write (4K QD32) | 10,000 IOPS

Component | Requirement |
---|---|
Redundancy | >= 2n redundancy (RAID 1 or RAID 10) |
Size | 2 TB+ NVMe per GPU for 24/48 GB GPUs; 4 TB+ NVMe per GPU for 80 GB GPUs (post-RAID)
Disk Perf - Sequential read | 6,000 MB/s |
Disk Perf - Sequential write | 5,000 MB/s |
Disk Perf - Random Read (4K QD32) | 400,000 IOPS |
Disk Perf - Random Write (4K QD32) | 40,000 IOPS |
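
The per-GPU sizing rule in the table above can be turned into a quick capacity estimate. The sketch below uses hypothetical helper names and covers only the VRAM sizes the table lists. The raw-capacity figure assumes two-way mirroring (RAID 1 or RAID 10), where raw capacity is twice the post-RAID capacity.

```python
# Post-RAID NVMe capacity per GPU in TB, keyed by GPU VRAM in GB (values from the table above).
PER_GPU_USABLE_TB = {24: 2, 48: 2, 80: 4}


def min_usable_tb(gpu_count: int, vram_gb: int) -> int:
    """Minimum post-RAID NVMe capacity in TB for a server."""
    if vram_gb not in PER_GPU_USABLE_TB:
        raise ValueError(f"no sizing rule listed for {vram_gb} GB GPUs")
    return gpu_count * PER_GPU_USABLE_TB[vram_gb]


def min_raw_tb(gpu_count: int, vram_gb: int, mirror_copies: int = 2) -> int:
    """Raw capacity needed before mirroring (assumption: two-way mirror by default)."""
    return min_usable_tb(gpu_count, vram_gb) * mirror_copies


print(min_usable_tb(8, 80))  # 8 * 4 = 32 TB usable
print(min_raw_tb(8, 80))     # 32 * 2 = 64 TB raw
```
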
Component | Requirement |
---|---|
Minimum Servers | 4 |
Minimum Storage size | 200 TB raw (100 TB usable) |
Connectivity | 200 Gbps between servers/data-plane |
Network | Private subnet

Component | Requirement |
---|---|
CPU | AMD Genoa: EPYC 9354P (32-Core, 3.25-3.8 GHz), EPYC 9534 (64-Core, 2.45-3.7 GHz), or EPYC 9554 (64-Core, 3.1-3.75 GHz) |
RAM | 256 GB or higher, DDR5/ECC

Requirement | Specification |
---|---|
Redundancy | >= 2n redundancy (RAID 1) |
Size | >= 500 GB (post-RAID) |
Disk Perf - Sequential read | 2,000 MB/s |
Disk Perf - Sequential write | 2,000 MB/s |
Disk Perf - Random Read (4K QD32) | 100,000 IOPS |
Disk Perf - Random Write (4K QD32) | 10,000 IOPS

Component | Requirement |
---|---|
Redundancy | None (JBOD); Runpod will assemble the disks into an array. Disk sizes of 7 to 14 TB are recommended.
Disk Perf - Sequential read | 6,000 MB/s |
Disk Perf - Sequential write | 5,000 MB/s |
Disk Perf - Random Read (4K QD32) | 400,000 IOPS |
Disk Perf - Random Write (4K QD32) | 40,000 IOPS

Component | Requirement |
---|---|
CPU | AMD Ryzen Threadripper 7960X (24-Core, 4.2-5.3 GHz)
RAM | 128 GB or higher, DDR5/ECC |
Boot disk | >= 500 GB, RAID 1

Component | Requirement |
---|---|
Minimum Servers | 2 |
Minimum Storage size | 8 TB usable |
Connectivity | 200 Gbps between servers/data-plane |
Network | Private subnet; public IP and >990 ports open

Component | Requirement |
---|---|
CPU | AMD EPYC 9004 ‘Genoa’ Zen 4 or better with minimum 32 cores. 3+ GHz clock speed. |
RAM | 1 TB or higher, DDR5/ECC

Component | Requirement |
---|---|
Redundancy | >= 2n redundancy (RAID 1 or RAID 10) |
Size | 8 TB+ |
Disk Perf - Sequential read | 6,000 MB/s |
Disk Perf - Sequential write | 5,000 MB/s |
Disk Perf - Random Read (4K QD32) | 400,000 IOPS |
Disk Perf - Random Write (4K QD32) | 40,000 IOPS

Component | Requirement |
---|---|
Redundancy | >= 2n redundancy (RAID 1) |
Size | >= 500 GB (post-RAID) |
Disk Perf - Sequential read | 2,000 MB/s |
Disk Perf - Sequential write | 2,000 MB/s |
Disk Perf - Random Read (4K QD32) | 100,000 IOPS |
Disk Perf - Random Write (4K QD32) | 10,000 IOPS

Component | Requirement |
---|---|
NVIDIA Drivers | Version 550.54.15 or later production version |
CUDA | Version 12.4 or later production version |
NVIDIA Persistence | Activated for GPUs of 48 GB or more |
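
The minimum versions in the table above can be checked with a simple numeric comparison of dotted version strings. The sketch below is illustrative only: `parse_version` and `meets_minimum` are hypothetical helpers, and the installed version strings are sample inputs rather than output captured from a real host.

```python
MIN_DRIVER = "550.54.15"  # minimum NVIDIA driver version from the table above
MIN_CUDA = "12.4"         # minimum CUDA version from the table above


def parse_version(text: str) -> tuple[int, ...]:
    """Turn a dotted version such as '550.54.15' into (550, 54, 15)."""
    return tuple(int(part) for part in text.strip().split("."))


def meets_minimum(installed: str, minimum: str) -> bool:
    """True when `installed` is the same as or newer than `minimum`."""
    return parse_version(installed) >= parse_version(minimum)


# Sample inputs (assumed values, not read from a real machine).
print(meets_minimum("550.90.07", MIN_DRIVER))   # True
print(meets_minimum("535.183.01", MIN_DRIVER))  # False
print(meets_minimum("12.6", MIN_CUDA))          # True
```
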
Requirement | Specification |
---|---|
Utility Feeds | - Minimum of two independent utility feeds from separate substations<br>- Each feed capable of supporting 100% of the data center’s power load<br>- Automatic transfer switches (ATS) for seamless switchover between feeds with UL 1008 certification (or regional equivalent)
UPS | - N+1 redundancy for UPS systems<br>- Minimum of 15 minutes runtime at full load
Generators | - N+1 redundancy for generator systems<br>- Generators must be able to support 100% of the data center’s power load<br>- Minimum of 48 hours of on-site fuel storage at full load<br>- Automatic transfer to generator power within 10 seconds of utility failure
Power Distribution | - Redundant power distribution paths (2N) from utility to rack level<br>- Redundant Power Distribution Units (PDUs) in each rack<br>- Remote power monitoring and management capabilities at rack level
Testing and Maintenance | - Monthly generator tests under load for a minimum of 30 minutes<br>- Quarterly full-load tests of the entire backup power system, including UPS and generators<br>- Annual full-facility power outage test (coordinated with Runpod)<br>- Regular thermographic scanning of electrical systems<br>- Detailed maintenance logs for all power equipment<br>- 24/7 on-site facilities team for immediate response to power issues
Monitoring and Alerting | - Real-time monitoring of all power systems<br>- Automated alerting for any power anomalies or threshold breaches
Capacity Planning | - Maintain a minimum of 20% spare power capacity for future growth<br>- Annual power capacity audits and forecasting
Fire Suppression | - Maintain data center fire suppression systems in compliance with NFPA 75 and 76 (or regional equivalent)

Requirement | Specification |
---|---|
Internet Connectivity | - Minimum of two diverse and redundant internet circuits from separate providers<br>- Each connection should be capable of supporting 100% of the data center’s bandwidth requirements<br>- BGP routing implemented for automatic failover between circuit providers<br>- 100 Gbps minimum total bandwidth capacity
Core Infrastructure | - Redundant core switches in a high-availability configuration (e.g., stacking, VSS, or equivalent)
Distribution Layer | - Redundant distribution switches with multi-chassis link aggregation (MLAG) or equivalent technology<br>- Minimum 100 Gbps uplinks to core switches
Access Layer | - Redundant top-of-rack switches in each cabinet<br>- Minimum 100 Gbps server connections for high-performance compute nodes
DDoS Protection | - Must have a DDoS mitigation solution, either on-premises or on-demand cloud-based
Quality of Service | Maintain network performance within the following parameters:<br>- Network utilization must remain below 80% on any link during peak hours<br>- Packet loss must not exceed 0.1% (1 in 1,000) on any network segment<br>- P95 round-trip time (RTT) within the data center should not exceed 4 ms<br>- P95 jitter within the data center should not exceed 3 ms
Testing and Maintenance | - Regular failover testing of all redundant components (minimum semi-annually)<br>- Annual full-scale disaster recovery test<br>- Maintenance windows for network updates and patches, scheduled at least 1 week in advance and planned for minimal service disruption
Capacity Planning | - Maintain a minimum of 40% spare network capacity for future growth<br>- Regular network performance audits and capacity forecasting
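
The quality-of-service limits in the table above (utilization below 80%, packet loss at most 0.1%, P95 RTT at most 4 ms, P95 jitter at most 3 ms) lend themselves to an automated check. The sketch below is one possible shape for such a check. The `LinkSample` fields and the `qos_violations` function are hypothetical names, and how the measurements are collected is outside its scope.

```python
from dataclasses import dataclass


@dataclass
class LinkSample:
    """One set of measurements for a link (hypothetical structure)."""
    peak_utilization_pct: float
    packet_loss_pct: float
    p95_rtt_ms: float
    p95_jitter_ms: float


def qos_violations(sample: LinkSample) -> list[str]:
    """Return a description of each threshold from the table that the sample breaks."""
    problems = []
    if sample.peak_utilization_pct >= 80:  # must remain below 80%
        problems.append("peak utilization is not below 80%")
    if sample.packet_loss_pct > 0.1:       # must not exceed 0.1%
        problems.append("packet loss exceeds 0.1%")
    if sample.p95_rtt_ms > 4:              # should not exceed 4 ms
        problems.append("P95 RTT exceeds 4 ms")
    if sample.p95_jitter_ms > 3:           # should not exceed 3 ms
        problems.append("P95 jitter exceeds 3 ms")
    return problems


healthy = LinkSample(65.0, 0.02, 1.5, 0.4)
congested = LinkSample(85.0, 0.5, 6.0, 4.0)
print(qos_violations(healthy))         # []
print(len(qos_violations(congested)))  # 4
```
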
Requirement | Description |
---|---|
Data Center Tier | Abide by Tier III+ Data Center Standards |
Security | 24/7 on-site security and technical staff |
Physical Security | Runpod servers must be housed in an isolated, secure rack or cage in an area accessible only to partner staff and approved data center personnel. Physical access to this area must be tracked and logged.
Maintenance | All maintenance resulting in disruption or downtime must be scheduled at least 1 week in advance. Large disruptions must be coordinated with Runpod at least 1 month in advance. |