OpenShift CNF Reference Test Harness¶
SIG-GW / IBCF Validation Architecture
Author: Daniel
Platform: OpenShift CNF
1. Architecture Overview¶
```
+--------------------------------+
| Traffic Generator (SIPp)       |
+--------------------------------+
| Chaos Injector (Litmus)        |
+--------------------------------+
| Metrics Stack (Prom + Grafana) |
+--------------------------------+
| SIG-GW Pods (IBCF)             |
+--------------------------------+
| STIR Service + Vault           |
+--------------------------------+
| Media Relay Pods               |
+--------------------------------+
```
1.1 Component Placement¶
- Traffic Generator: separate namespace (`testrig`)
- Chaos Injector: separate namespace (`chaos`)
- Metrics Stack: monitoring namespace (`monitoring`)
- SIG-GW Pods: production namespace (`ims`)
- STIR Service: production namespace (`ims`)
- Media Relay: production namespace (`ims`)
2. Components¶
2.1 Traffic Generation¶
SIPp¶
- Purpose: SIP traffic generation
- Deployment: StatefulSet or DaemonSet
- Configuration: Test scenario scripts
- Metrics: Generated CPS, call success rate
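One way to run SIPp against the gateway from the `testrig` namespace is a batch Job. This is a minimal sketch; the container image, scenario path, and target service name are illustrative assumptions, not fixed by this document.

```yaml
# Sketch of a SIPp load job; image, scenario file, and target service
# are assumed names for illustration.
apiVersion: batch/v1
kind: Job
metadata:
  name: sipp-uac-baseline
  namespace: testrig
spec:
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: sipp
          image: example/sipp:latest          # assumed image
          args:
            - "-sf"
            - "/scenarios/uac-invite.xml"     # assumed scenario script
            - "-r"
            - "100"                           # target rate: 100 CPS
            - "-m"
            - "360000"                        # stop after this many calls
            - "-trace_stat"                   # periodic CSV statistics
            - "ibcf-signaling.ims.svc:5060"   # assumed SIG-GW service
```

The `-trace_stat` output feeds the generated-CPS and call-success-rate metrics listed above.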
Custom Load Injector¶
- Purpose: Advanced load patterns
- Features:
- Ramp-up/ramp-down
- Burst patterns
- Emergency call injection
- STIR-aware traffic
STIR-Aware Traffic Scripts¶
- Purpose: Generate STIR-signed traffic
- Features:
- A/B/C attestation levels
- Certificate rotation scenarios
- Invalid signature testing
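The A/B/C attestation levels travel in the PASSporT payload of the SIP Identity header (RFC 8588). A minimal sketch of building the unsigned `header.payload` portion that a traffic script could vary per call; the field values and `origid` are illustrative, and a real harness would append an ES256 signature:

```python
import base64
import json
import time

def b64url(data: bytes) -> str:
    """Base64url without padding, as used for JWS/PASSporT segments."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def build_passport_segments(orig_tn: str, dest_tn: str, attest: str,
                            cert_url: str) -> str:
    """Return the unsigned header.payload portion of a SHAKEN PASSporT.

    attest must be "A", "B", or "C". Invalid-signature tests can append
    a garbage third segment; rotation tests can point cert_url at an
    expired or freshly rotated certificate.
    """
    if attest not in ("A", "B", "C"):
        raise ValueError("attestation must be A, B, or C")
    header = {"alg": "ES256", "ppt": "shaken", "typ": "passport",
              "x5u": cert_url}
    payload = {"attest": attest,
               "dest": {"tn": [dest_tn]},
               "iat": int(time.time()),
               "orig": {"tn": orig_tn},
               "origid": "test-call-0001"}  # illustrative unique call ID
    return (b64url(json.dumps(header, separators=(",", ":")).encode())
            + "."
            + b64url(json.dumps(payload, separators=(",", ":")).encode()))
```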
2.2 Chaos Engineering¶
Litmus Chaos¶
- Purpose: Chaos injection framework
- Experiments:
- Pod kill
- Network partition
- CPU throttling
- Memory pressure
- DNS corruption
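A pod-kill experiment against the signaling pods can be expressed as a Litmus ChaosEngine. Sketch only: the label selector and service-account name are assumptions for this environment.

```yaml
# Litmus pod-delete experiment targeting the IBCF signaling deployment;
# applabel and chaosServiceAccount are assumed names.
apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: ibcf-pod-kill
  namespace: chaos
spec:
  appinfo:
    appns: ims
    applabel: app=ibcf-signaling
    appkind: deployment
  chaosServiceAccount: litmus-admin
  engineState: active
  experiments:
    - name: pod-delete
      spec:
        components:
          env:
            - name: TOTAL_CHAOS_DURATION
              value: "60"   # seconds of chaos
            - name: CHAOS_INTERVAL
              value: "10"   # seconds between pod deletions
```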
Custom Chaos Scenarios¶
- Purpose: IMS-specific chaos
- Scenarios:
- STIR service crash
- LI mediation device failure
- Certificate expiration
- State store partition
2.3 Metrics Stack¶
Prometheus¶
- Purpose: Metrics collection
- Scrape Interval: 15 seconds
- Retention: 30 days
- Targets: All IMS components
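A scrape job covering the `ims` namespace could look like the following sketch; it assumes pods carry the conventional `prometheus.io/scrape` annotation, which this document does not mandate.

```yaml
# Illustrative Prometheus scrape job for the IMS components.
scrape_configs:
  - job_name: ims-components
    scrape_interval: 15s
    kubernetes_sd_configs:
      - role: pod
        namespaces:
          names:
            - ims
    relabel_configs:
      # Keep only pods annotated prometheus.io/scrape: "true" (assumed
      # convention, not specified by this document).
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: "true"
```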
Grafana Dashboards¶
- Purpose: Visualization
- Dashboards:
- IBCF Signaling
- STIR/SHAKEN
- LI Intercepts
- Emergency Calls
- Infrastructure
Alertmanager¶
- Purpose: Alert routing
- Channels: PagerDuty, Slack, Email
- Rules: KPI threshold violations
3. CI/CD Integration¶
3.1 Automated Regression Pipeline¶
Trigger: Push to main/develop
Stages:
1. Build: container images
2. Deploy: test environment
3. Functional Tests: unit + integration
4. Load Tests: baseline performance
5. Chaos Tests: resilience validation
6. Report: test results
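On OpenShift these stages map naturally onto an OpenShift Pipelines (Tekton) pipeline. A sketch of the stage ordering; the task names are placeholders the project would define, not existing tasks:

```yaml
# Stage ordering as a Tekton pipeline; all taskRef names are assumed.
apiVersion: tekton.dev/v1
kind: Pipeline
metadata:
  name: cnf-regression
spec:
  tasks:
    - name: build
      taskRef:
        name: build-images
    - name: deploy
      taskRef:
        name: deploy-test-env
      runAfter: [build]
    - name: functional-tests
      taskRef:
        name: run-functional-tests
      runAfter: [deploy]
    - name: load-tests
      taskRef:
        name: run-load-tests
      runAfter: [functional-tests]
    - name: chaos-tests
      taskRef:
        name: run-chaos-tests
      runAfter: [load-tests]
    - name: report
      taskRef:
        name: publish-report
      runAfter: [chaos-tests]
```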
3.2 Performance Baseline Comparison¶
Purpose: Detect performance regressions
Comparison:
- Current run vs. baseline
- PDD trends
- CPU/Memory trends
- Latency trends
Action: Alert if > 10% degradation
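The 10% degradation check can be sketched as a small comparison function; the metric names in the usage example are illustrative.

```python
def detect_regressions(baseline, current, threshold=0.10):
    """Compare a current run against a baseline run.

    baseline/current map metric name -> value, where higher is worse
    (e.g. PDD p95 in seconds, CPU %). Returns the metrics whose value
    degraded by more than `threshold` (default 10%), with the relative
    change rounded to 4 places.
    """
    regressions = {}
    for metric, base in baseline.items():
        if metric not in current or base <= 0:
            continue
        change = (current[metric] - base) / base
        if change > threshold:
            regressions[metric] = round(change, 4)
    return regressions
```

For example, a PDD p95 that moves from 0.30 s to 0.36 s is a 20% regression and would be flagged, while a CPU move from 50% to 52% (4%) would not.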
3.3 Canary Testing¶
Purpose: Validate new releases
Process:
1. Deploy canary (10% traffic)
2. Monitor KPIs
3. Compare to baseline
4. Promote or rollback
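One way to drive this process declaratively is Argo Rollouts (an assumption; the document does not name a tool). A sketch of the 10% step with a monitoring pause:

```yaml
# Canary rollout sketch using Argo Rollouts; tool choice and pause
# duration are assumptions.
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: ibcf-signaling
  namespace: ims
spec:
  strategy:
    canary:
      steps:
        - setWeight: 10             # 10% of traffic to the new release
        - pause: {duration: 30m}    # monitor KPIs against baseline
        - setWeight: 100            # promote; abort instead to roll back
```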
4. Key Metrics Collected¶
4.1 Signaling Metrics¶
- CPS: Calls per second
- PDD: Post-dial delay (p50, p95, p99)
- CSSR: Call setup success rate
- Dialog Count: Active dialogs
- Error Rate: 4xx/5xx responses
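Assuming counters like `ibcf_invites_total` and `ibcf_responses_total` with a `code` label (illustrative names, not confirmed by this document), CSSR and error rate map onto PromQL such as:

```promql
# CSSR: fraction of INVITEs answered 2xx over 5 minutes
sum(rate(ibcf_responses_total{code=~"2.."}[5m]))
  / sum(rate(ibcf_invites_total[5m]))

# Error rate: 4xx/5xx responses per second
sum(rate(ibcf_responses_total{code=~"[45].."}[5m]))
```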
4.2 Infrastructure Metrics¶
- CPU: Per pod, per node
- Memory: Per pod, per node
- Network: Bandwidth, packet loss
- Disk: I/O, space
4.3 STIR/SHAKEN Metrics¶
- Signing Latency: p50, p95, p99
- Verification Latency: p50, p95, p99
- Certificate Cache Hit Rate: %
- OCSP Response Time: p50, p95, p99
- Attestation Distribution: A/B/C counts
4.4 LI Metrics¶
- Intercept Coverage: % of warrants
- Intercept Latency: Overhead
- Media Duplication Rate: %
- Audit Log Completeness: %
4.5 Emergency Metrics¶
- Emergency PDD: p50, p95, p99
- Emergency Drop Rate: %
- Priority Preemption: Success rate
- Location Preservation: %
4.6 TLS Metrics¶
- TLS Handshake Time: p50, p95, p99
- mTLS Success Rate: %
- Certificate Rotation Time: Seconds
- OCSP Validation Time: p50, p95, p99
5. Scaling Strategy¶
5.1 HPA for Signaling Pods¶
Configuration:
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: ibcf-signaling
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: ibcf-signaling
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
```
Triggers:
- CPU > 70%
- Memory > 80%
- Custom metric (CPS)
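The CPS trigger needs a custom-metrics adapter exposing a per-pod rate; assuming a metric named `ibcf_calls_per_second` (illustrative) is available, it would be appended to the HPA `metrics:` list as a Pods-type entry:

```yaml
# Pods-type CPS metric for the HPA; metric name and target value are
# assumptions for the sketch.
- type: Pods
  pods:
    metric:
      name: ibcf_calls_per_second
    target:
      type: AverageValue
      averageValue: "150"
```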
5.2 Dedicated Media Plane¶
Architecture:
- Separate namespace for media
- Dedicated nodes (optional)
- Independent scaling
- SR-IOV enabled
5.3 SR-IOV Enabled Nodes¶
Purpose: Low-latency media handling
Configuration:
- SR-IOV network attachments
- DPDK support (optional)
- CPU pinning
- NUMA awareness
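The network attachment itself can be declared with a NetworkAttachmentDefinition; the resource name, network name, and subnet below are assumptions for illustration.

```yaml
# Illustrative SR-IOV attachment for the media-relay pods.
apiVersion: k8s.cni.cncf.io/v1
kind: NetworkAttachmentDefinition
metadata:
  name: media-sriov
  namespace: ims
  annotations:
    k8s.v1.cni.cncf.io/resourceName: openshift.io/media_vfs  # assumed VF pool
spec:
  config: |
    {
      "cniVersion": "0.3.1",
      "type": "sriov",
      "name": "media-sriov",
      "ipam": {
        "type": "host-local",
        "subnet": "192.0.2.0/24"
      }
    }
```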
5.4 Node Affinity for Latency¶
Purpose: Optimize for low latency
Configuration:
```yaml
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
        - matchExpressions:
            - key: node-role.kubernetes.io/latency-optimized
              operator: In
              values:
                - "true"
```
6. Test Harness Execution Flow¶
6.1 Deploy CNF¶
Steps:
1. Apply Kubernetes manifests
2. Wait for pods ready
3. Verify health checks
4. Confirm metrics collection
6.2 Deploy Test Harness¶
Steps:
1. Deploy traffic generator
2. Deploy chaos injector
3. Deploy metrics stack (if not existing)
4. Configure test scenarios
6.3 Inject Load¶
Steps:
1. Start baseline load (50% rated)
2. Ramp to 100% rated
3. Sustain for test duration
4. Monitor KPIs
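The 50%-to-100% ramp can be sketched as a load profile generator that the traffic driver polls; the step interval is an assumption.

```python
def ramp_profile(rated_cps, ramp_seconds, hold_seconds, step_seconds=10):
    """Yield (elapsed_seconds, target_cps) pairs.

    Starts at 50% of rated CPS, ramps linearly to 100% over
    ramp_seconds, then holds the rated load for hold_seconds.
    """
    start = rated_cps * 0.5
    t = 0
    while t < ramp_seconds:
        frac = t / ramp_seconds
        yield t, start + (rated_cps - start) * frac
        t += step_seconds
    while t < ramp_seconds + hold_seconds:
        yield t, float(rated_cps)
        t += step_seconds
```

For a 200 CPS rated system with a 60 s ramp, the profile starts at 100 CPS, passes 150 CPS at the midpoint, and sustains 200 CPS thereafter.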
6.4 Inject Chaos¶
Steps:
1. Select chaos scenario
2. Execute chaos experiment
3. Monitor system behavior
4. Validate recovery
6.5 Validate KPIs¶
Steps:
1. Check all KPIs within target
2. Verify no degradation
3. Confirm recovery
4. Document results
6.6 Generate Report¶
Steps:
1. Collect metrics
2. Generate dashboards
3. Create test report
4. Archive results
7. Test Scenarios¶
7.1 Baseline Performance¶
Configuration:
- Load: 100% rated CPS
- Duration: 1 hour
- Metrics: all KPIs

Validation:
- All KPIs within target
- Stable performance
- No errors
7.2 Burst Load¶
Configuration:
- Load: 2× rated CPS
- Duration: 60 seconds
- Metrics: CPS, PDD, CPU

Validation:
- No packet loss
- PDD within target
- No crash
7.3 Failover Test¶
Configuration:
- Load: 100% rated CPS
- Event: kill primary pod
- Metrics: failover time, call loss

Validation:
- Failover ≤ 1 second
- Zero call loss
- No emergency call drops
7.4 Chaos Test¶
Configuration:
- Load: 100% rated CPS
- Chaos: pod kill, network partition
- Metrics: recovery time, call handling

Validation:
- Graceful degradation
- Fast recovery
- No data loss
8. Monitoring & Alerting¶
8.1 Prometheus Rules¶
Example:
```yaml
groups:
  - name: ibcf_alerts
    rules:
      - alert: HighPDD
        expr: histogram_quantile(0.95, rate(ibcf_pdd_bucket[5m])) > 0.4
        for: 5m
        annotations:
          summary: "PDD above threshold"
```
8.2 Grafana Dashboards¶
Dashboards:
- Real-time IBCF metrics
- Historical trends
- Component health
- KPI compliance
8.3 Alert Routing¶
Channels:
- Critical: PagerDuty
- Warning: Slack
- Info: Email
9. Test Data Management¶
9.1 Test Scenarios¶
Storage: Git repository
Format: YAML/JSON
Versioning: Git tags
9.2 PIXIT Configurations¶
Storage: testrig/pixit/
Format: YAML
Usage: Test execution parameters
9.3 Test Results¶
Storage: Object storage (S3)
Retention: 90 days
Format: JSON, PCAP, logs
10. Continuous Improvement¶
10.1 Performance Optimization¶
- Identify bottlenecks
- Optimize hot paths
- Reduce latency
- Improve throughput
10.2 Test Coverage¶
- Add new test scenarios
- Cover edge cases
- Validate new features
- Improve chaos scenarios