6 Commits

27 changed files with 1642 additions and 744 deletions

13
Cargo.lock generated

@@ -901,6 +901,15 @@ dependencies = [
"winapi",
]
[[package]]
name = "matchers"
version = "0.2.0"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "d1525a2a28c7f4fa0fc98bb91ae755d1e2d1505079e05539e35bc876b5d65ae9"
dependencies = [
"regex-automata",
]
[[package]]
name = "memchr"
version = "2.8.0"
@@ -2000,10 +2009,14 @@ version = "0.3.22"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "2f30143827ddab0d256fd843b7a66d164e9f271cfa0dde49142c5ca0ca291f1e"
dependencies = [
"matchers",
"nu-ansi-term",
"once_cell",
"regex-automata",
"sharded-slab",
"smallvec",
"thread_local",
"tracing",
"tracing-core",
"tracing-log",
]


@@ -23,7 +23,7 @@ serde_json = "1.0.149"
clap = { version = "4.5", features = ["derive", "string", "wrap_help"] }
color-eyre = "0.6"
tracing = "0.1"
tracing-subscriber = "0.3"
tracing-subscriber = { version = "0.3", features = ["env-filter"] }
tracing-appender = "0.2"
sysinfo = "0.38"
libc = "0.2"


@@ -1,33 +1,61 @@
# 🔥 ember-tune
```text
__________ ____ ______ ____ ______ __ __ _ __ ______
/ ____/ |/ // __ )/ ____// __ \ /_ __/ / / / // | / // ____/
/ __/ / /|_/ // __ / __/ / /_/ / / / / / / // |/ // __/
/ /___ / / / // /_/ / /___ / _, _/ / / / /_/ // /| // /___
/_____//_/ /_//_____/_____//_/ |_| /_/ \____//_/ |_//_____/
>>> Physically-grounded thermal & power optimization for Linux <<<
```
> ### **Find your hardware's "Physical Sweet Spot" through automated trial-by-fire.**
`ember-tune` is a scientifically-driven hardware optimizer that replaces guesswork and manual tuning with a rigorous, automated engineering workflow. It determines the unique thermal properties of your specific laptop—including its Thermal Resistance (Rθ) and "Silicon Knee"—to generate optimal configurations for common Linux tuning daemons.
## ✨ Features
- **Automated Physical Benchmarking:** Measures real-world thermal performance under load to find the true "sweet spot" where performance-per-watt is maximized before thermal saturation causes diminishing returns.
- **Heuristic Hardware Discovery:** Utilizes a data-driven System Abstraction Layer (SAL) that probes your system and automatically adapts to its unique quirks, drivers, and sensor paths.
- **Non-Destructive Configuration:** Safely merges new, optimized power limits into your existing `throttled.conf`, preserving manual undervolt settings and comments.
- **Universal Safeguard Architecture (USA):** Includes a high-frequency concurrent watchdog and RAII state restoration to guarantee your system is never left in a dangerous state.
- **Real-time TUI Dashboard:** A `ratatui`-based terminal interface provides high-resolution telemetry throughout the benchmark.
## 🔬 How it Works: The Architecture
`ember-tune` is built on a decoupled, multi-threaded architecture to ensure the UI is always responsive and that hardware state is managed safely.
1. **The Heuristic Engine:** On startup, the engine probes your system's DMI, `sysfs`, and active services. It compares these "facts" against the `hardware_db.toml` to select the correct System Abstraction Layer (SAL).
2. **The Orchestrator (Backend Thread):** This is the state machine that executes the benchmark. It communicates with hardware *only* through the SAL traits.
3. **The TUI (Main Thread):** The `ratatui` dashboard renders `TelemetryState` snapshots received from the orchestrator via an MPSC channel.
4. **The Watchdog (Safety Thread):** A high-priority thread that polls safety sensors every 100ms to trigger an atomic `EmergencyAbort` if failure conditions are met.
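The thread layout above can be sketched with the standard library alone. This is a minimal illustration under stated assumptions, not the project's code: `Telemetry`, `run_demo`, and the abort threshold are hypothetical names, and the safety check is folded into the backend thread here rather than running as a separate watchdog.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::{mpsc, Arc};
use std::thread;

/// Hypothetical stand-in for the real `TelemetryState`.
#[derive(Debug, Clone, PartialEq)]
pub struct Telemetry {
    pub tick: u32,
    pub cpu_temp: f32,
}

/// Streams snapshots from a backend thread to the caller (playing the TUI role)
/// over an MPSC channel, and raises a shared abort flag on an unsafe reading.
pub fn run_demo(temps: Vec<f32>, abort_above: f32) -> (Vec<Telemetry>, bool) {
    let (tx, rx) = mpsc::channel::<Telemetry>();
    let abort = Arc::new(AtomicBool::new(false));

    let backend_abort = Arc::clone(&abort);
    let backend = thread::spawn(move || {
        for (tick, temp) in temps.into_iter().enumerate() {
            // Simplified safety check: trip the flag and stop producing data.
            if temp > abort_above {
                backend_abort.store(true, Ordering::SeqCst);
                break;
            }
            if tx.send(Telemetry { tick: tick as u32, cpu_temp: temp }).is_err() {
                break; // the receiver went away
            }
        }
        // `tx` is dropped when the closure returns, which ends `rx.iter()` below.
    });

    // Main thread: drain snapshots until the sender hangs up.
    let received: Vec<Telemetry> = rx.iter().collect();
    backend.join().expect("backend thread panicked");
    (received, abort.load(Ordering::SeqCst))
}
```

Calling `run_demo(vec![50.0, 60.0, 95.0], 80.0)` delivers the first two snapshots and reports that the abort flag was raised. Keeping the channel one-directional means the UI never blocks on hardware I/O.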
## ⚙️ Development Setup
`ember-tune` is a standard Cargo project. You will need a recent Rust toolchain and common build utilities.
`ember-tune` is a standard Cargo project.
**Prerequisites:**
- `rustup`
- `build-essential` (or equivalent for your distribution)
- `build-essential`
- `libudev-dev`
- `stress-ng` (Required for benchmarking)
```bash
# 1. Clone the repository
# 1. Clone and Build
git clone https://gitea.com/narl/ember-tune.git
cd ember-tune
# 2. Build the release binary
cargo build --release
# 3. Run the test suite (safe, uses a virtual environment)
# This requires no special permissions and does not touch your hardware.
# 2. Run the safe test suite
cargo test
```
**Running:**
Due to its direct hardware access, `ember-tune` requires root privileges.
```bash
# Run a full benchmark and generate optimized configs
# Run a full benchmark
sudo ./target/release/ember-tune
# Run a mock benchmark for UI/logic testing
# Run a mock benchmark for UI testing
sudo ./target/release/ember-tune --mock
```
@@ -35,48 +63,24 @@ sudo ./target/release/ember-tune --mock
## 🤝 Contributing Quirk Data (`hardware_db.toml`)
**This is the most impactful way to contribute.** `ember-tune`'s strength comes from its `assets/hardware_db.toml`, which encodes community knowledge about how to manage specific laptops. If your hardware isn't working perfectly, you can likely fix it by adding a new entry here.
**This is the most impactful way to contribute.** If your hardware isn't working perfectly, add a new entry to `assets/hardware_db.toml`.
The database is composed of four key sections: `conflicts`, `ecosystems`, `quirks`, and `discovery`.
### A. Reporting a Service Conflict
If a background service on your system interferes with `ember-tune`, add it to `[[conflicts]]`.
**Example:** Adding `laptop-mode-tools`.
### Example: Adding a Service Conflict
```toml
[[conflicts]]
id = "laptop_mode_conflict"
services = ["laptop-mode.service"]
contention = "Multiple - I/O schedulers, Power limits"
severity = "Medium"
fix_action = "SuspendService" # Orchestrator will stop/start this service
fix_action = "SuspendService"
help_text = "laptop-mode-tools can override power-related sysfs settings."
```
### B. Adding a New Hardware Ecosystem
If your laptop manufacturer (e.g., Razer) has a unique fan control tool or ACPI platform profile path, define it in `[ecosystems]`.
**Example:** A hypothetical "Razer" ecosystem.
```toml
[ecosystems.razer]
vendor_regex = "Razer"
# Path to the sysfs node that controls performance profiles
profiles_path = "/sys/bus/platform/drivers/razer_acpi/power_mode"
# Map human-readable names to the values the driver expects
policy_map = { Balanced = 0, Boost = 1, Silent = 2 }
```
### C. Defining a Model-Specific Quirk
If a specific laptop model has a bug (like a stuck sensor or incorrect fan reporting), define a `[[quirks]]` entry.
**Example:** A laptop whose fans report 0 RPM even when spinning.
### Example: Defining a Model-Specific Quirk
```toml
[[quirks]]
model_regex = "HP Envy 15-ep.*"
id = "hp_fan_stuck_sensor"
issue = "Fan sensor reports 0 RPM when active."
# The 'action' tells the SAL to use a different method for fan detection.
action = "UseThermalVelocityFallback"
```
After adding your changes, run the test suite and then submit a Pull Request!


@@ -15,7 +15,7 @@ help_text = "TLP and Power-Profiles-Daemon fight over power envelopes. Mask both
[[conflicts]]
id = "thermal_logic_collision"
services = ["thermald.service", "throttled.service"]
services = ["thermald.service", "throttled.service", "lenovo_fix.service", "lenovo-throttling-fix.service"]
contention = "RAPL / MSR / BD-PROCHOT"
severity = "High"
fix_action = "SuspendService"

100
src/agent_analyst/mod.rs Normal file

@@ -0,0 +1,100 @@
//! Heuristic Analysis & Optimization Math (Agent Analyst)
//!
//! This module analyzes raw telemetry data to extract the "Optimal Real-World Settings".
//! It calculates the Silicon Knee, Acoustic/Thermal Matrix (Hysteresis), and
//! generates three distinct hardware states: Silent, Balanced, and Sustained Heavy.
use serde::{Serialize, Deserialize};
use crate::engine::{ThermalProfile, OptimizerEngine};
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct FanCurvePoint {
pub temp_on: f32,
pub temp_off: f32,
pub pwm_percent: u8,
}
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct SystemProfile {
pub name: String,
pub pl1_watts: f32,
pub pl2_watts: f32,
pub fan_curve: Vec<FanCurvePoint>,
}
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct OptimizationMatrix {
pub silent: SystemProfile,
pub balanced: SystemProfile,
pub performance: SystemProfile,
pub thermal_resistance_kw: f32,
pub ambient_temp: f32,
}
pub struct HeuristicAnalyst {
engine: OptimizerEngine,
}
impl HeuristicAnalyst {
pub fn new() -> Self {
Self {
engine: OptimizerEngine::new(5),
}
}
/// Analyzes the raw telemetry to generate the 3 optimal profiles.
pub fn analyze(&self, profile: &ThermalProfile, max_soak_watts: f32) -> OptimizationMatrix {
let r_theta = profile.r_theta;
let silicon_knee = self.engine.find_silicon_knee(profile);
let ambient = profile.ambient_temp;
// 1. State A: Silent / Battery (Scientific Passive Equilibrium)
// Find P where T_core = 60C with fans OFF.
let r_theta_passive = r_theta * 2.5;
let silent_watts = ((60.0 - ambient) / r_theta_passive.max(0.1)).clamp(3.0, 15.0);
let silent_profile = SystemProfile {
name: "Silent".to_string(),
pl1_watts: silent_watts,
pl2_watts: silent_watts * 1.2,
fan_curve: vec![
FanCurvePoint { temp_on: 65.0, temp_off: 55.0, pwm_percent: 0 },
FanCurvePoint { temp_on: 75.0, temp_off: 65.0, pwm_percent: 30 },
],
};
// 2. State B: Balanced (The Silicon Knee)
// PL1 sits at the measured knee; the two lowest fan thresholds track ambient temperature.
let balanced_profile = SystemProfile {
name: "Balanced".to_string(),
pl1_watts: silicon_knee,
pl2_watts: silicon_knee * 1.25,
fan_curve: vec![
FanCurvePoint { temp_on: ambient + 15.0, temp_off: ambient + 10.0, pwm_percent: 0 },
FanCurvePoint { temp_on: ambient + 25.0, temp_off: ambient + 20.0, pwm_percent: 30 },
FanCurvePoint { temp_on: 75.0, temp_off: 65.0, pwm_percent: 50 },
FanCurvePoint { temp_on: 85.0, temp_off: 75.0, pwm_percent: 80 },
],
};
// 3. State C: Sustained Heavy
let performance_profile = SystemProfile {
name: "Performance".to_string(),
pl1_watts: max_soak_watts,
pl2_watts: max_soak_watts * 1.3,
fan_curve: vec![
FanCurvePoint { temp_on: 50.0, temp_off: 45.0, pwm_percent: 30 },
FanCurvePoint { temp_on: 70.0, temp_off: 60.0, pwm_percent: 60 },
FanCurvePoint { temp_on: 85.0, temp_off: 75.0, pwm_percent: 100 },
],
};
OptimizationMatrix {
silent: silent_profile,
balanced: balanced_profile,
performance: performance_profile,
thermal_resistance_kw: r_theta,
ambient_temp: ambient,
}
}
}
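The Silent-profile arithmetic in `analyze` can be checked in isolation. The sketch below copies the constants from the diff (60 °C target, 2.5x passive multiplier, 3 to 15 W clamp); the free functions `silent_watts` and `predict_temp` are illustrative names, not part of the module's API.

```rust
/// Passive-equilibrium power budget, mirroring the arithmetic in `analyze`:
/// P = (T_target - T_ambient) / (R_theta * passive_factor), clamped to [3, 15] W.
pub fn silent_watts(ambient_c: f32, r_theta: f32) -> f32 {
    const TARGET_C: f32 = 60.0; // core temperature the Silent profile aims for
    const PASSIVE_FACTOR: f32 = 2.5; // fans-off resistance multiplier from the diff
    let r_passive = (r_theta * PASSIVE_FACTOR).max(0.1); // guard against division by ~0
    ((TARGET_C - ambient_c) / r_passive).clamp(3.0, 15.0)
}

/// Steady-state prediction: T_pred = T_ambient + P * R_theta.
pub fn predict_temp(target_watts: f32, ambient_c: f32, r_theta: f32) -> f32 {
    ambient_c + target_watts * r_theta
}
```

With a 25 °C ambient and an active R_theta of 1.0 K/W, the passive resistance is 2.5 K/W and the budget is (60 - 25) / 2.5 = 14 W; feeding 14 W and 2.5 K/W back into `predict_temp` returns the 60 °C target. A hotter room or a weaker cooler pushes the result toward the 3 W floor.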

154
src/agent_integrator/mod.rs Normal file

@@ -0,0 +1,154 @@
//! System Service Integration (Agent Integrator)
//!
//! This module translates the mathematical optimums defined by the Analyst
//! into actionable, real-world Linux/OS service configurations.
//! It generates templates for fan daemons (i8kmon, thinkfan) and handles
//! resolution strategies for overlapping daemons.
use anyhow::Result;
use std::path::{Path, PathBuf};
use std::fs;
use crate::agent_analyst::OptimizationMatrix;
pub struct ServiceIntegrator;
impl ServiceIntegrator {
/// Generates and saves an i8kmon configuration based on the balanced profile.
pub fn generate_i8kmon_config(matrix: &OptimizationMatrix, output_path: &Path, source_path: Option<&PathBuf>) -> Result<()> {
let profile = &matrix.balanced;
let mut conf = String::new();
// Read existing content to preserve daemon and other settings
let existing = if let Some(src) = source_path {
if src.exists() { fs::read_to_string(src).unwrap_or_default() } else { String::new() }
} else if output_path.exists() {
fs::read_to_string(output_path).unwrap_or_default()
} else {
String::new()
};
if !existing.is_empty() {
for line in existing.lines() {
let trimmed = line.trim();
// Filter out the old auto-generated config lines, fan configs, and header comments
if !trimmed.starts_with("set config(0)") &&
!trimmed.starts_with("set config(1)") &&
!trimmed.starts_with("set config(2)") &&
!trimmed.starts_with("set config(3)") &&
!trimmed.starts_with("# Auto-generated") &&
!trimmed.starts_with("# Profile:") &&
!trimmed.starts_with("# Thermal Resistance:") &&
!trimmed.is_empty() {
conf.push_str(line);
conf.push('\n');
}
}
}
conf.push_str("\n# Auto-generated by ember-tune Integrator\n");
conf.push_str(&format!("# Profile: {}\n", profile.name));
conf.push_str(&format!("# Thermal Resistance: {:.3} K/W\n\n", matrix.thermal_resistance_kw));
for (i, p) in profile.fan_curve.iter().enumerate() {
let state = match p.pwm_percent {
0..=20 => 0,
21..=50 => 1,
51..=100 => 2,
_ => 2,
};
let off = if i == 0 { "-".to_string() } else { format!("{:.0}", p.temp_off) };
conf.push_str(&format!("set config({}) {{{} {} {:.0} {}}}\n", i, state, state, p.temp_on, off));
}
fs::write(output_path, conf)?;
Ok(())
}
/// Generates a thinkfan configuration, merging with existing sensors if possible.
pub fn generate_thinkfan_config(matrix: &OptimizationMatrix, output_path: &Path, source_path: Option<&PathBuf>) -> Result<()> {
let profile = &matrix.balanced;
let mut conf = String::new();
let existing = if let Some(src) = source_path {
if src.exists() { fs::read_to_string(src).unwrap_or_default() } else { String::new() }
} else if output_path.exists() {
fs::read_to_string(output_path).unwrap_or_default()
} else {
String::new()
};
if !existing.is_empty() {
let mut in_sensors = false;
for line in existing.lines() {
let trimmed = line.trim();
if trimmed == "sensors:" { in_sensors = true; }
if trimmed == "levels:" { in_sensors = false; }
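// Skip our own header comment and blank lines so repeated runs do not accumulate them.
if in_sensors && !trimmed.starts_with("# Auto-generated") && !trimmed.is_empty() {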
conf.push_str(line);
conf.push('\n');
}
}
}
if conf.is_empty() {
conf.push_str("sensors:\n - hwmon: /sys/class/hwmon/hwmon0/temp1_input\n\n");
}
conf.push_str("\n# Auto-generated by ember-tune Integrator\n");
conf.push_str("levels:\n");
for (i, p) in profile.fan_curve.iter().enumerate() {
let level = match p.pwm_percent {
0..=20 => 0,
21..=40 => 1,
41..=60 => 3,
61..=80 => 5,
_ => 7,
};
let down = if i == 0 { 0.0 } else { p.temp_off };
conf.push_str(&format!(" - [{}, {:.0}, {:.0}]\n", level, down, p.temp_on));
}
fs::write(output_path, conf)?;
Ok(())
}
/// Generates a resolution checklist/script for daemons.
pub fn generate_conflict_resolution_script(output_path: &Path) -> Result<()> {
let script = r#"#!/bin/bash
# ember-tune Daemon Neutralization Script
# 1. Mask power-profiles-daemon (Prevent ACPI overrides)
systemctl mask power-profiles-daemon
# 2. Filter TLP (Prevent CPU governor fights while keeping PCIe saving)
sed -i 's/^CPU_SCALING_GOVERNOR_ON_AC=.*/CPU_SCALING_GOVERNOR_ON_AC=""/' /etc/tlp.conf
sed -i 's/^CPU_BOOST_ON_AC=.*/CPU_BOOST_ON_AC=""/' /etc/tlp.conf
systemctl restart tlp
# 3. Thermald Delegate (We provide the trips, it handles the rest)
systemctl restart thermald
"#;
fs::write(output_path, script)?;
Ok(())
}
/// Generates a thermald configuration XML.
pub fn generate_thermald_config(matrix: &OptimizationMatrix, output_path: &Path, _source_path: Option<&PathBuf>) -> Result<()> {
let profile = &matrix.balanced;
let mut xml = String::new();
xml.push_str("<?xml version=\"1.0\"?>\n<ThermalConfiguration>\n <Platform>\n <Name>ember-tune Balanced</Name>\n <ProductName>Generic</ProductName>\n <Preference>balanced</Preference>\n <ThermalZones>\n <ThermalZone>\n <Type>cpu</Type>\n <TripPoints>\n");
for (i, p) in profile.fan_curve.iter().enumerate() {
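// Round to whole millidegrees Celsius so ambient-derived thresholds never print a fractional value.
xml.push_str(&format!(" <TripPoint>\n <SensorType>cpu</SensorType>\n <Temperature>{:.0}</Temperature>\n <Type>Passive</Type>\n <ControlId>{}</ControlId>\n </TripPoint>\n", p.temp_on * 1000.0, i));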
}
xml.push_str(" </TripPoints>\n </ThermalZone>\n </ThermalZones>\n </Platform>\n</ThermalConfiguration>\n");
fs::write(output_path, xml)?;
Ok(())
}
}
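To make the thinkfan translation easier to review, the mapping can be exercised without touching the filesystem. This is a rough sketch: `CurvePoint`, `thinkfan_level`, and `render_levels` are hypothetical helpers that mirror the match arms and format string in `generate_thinkfan_config`, and the two-space YAML indentation is an assumption.

```rust
/// Hypothetical, trimmed-down copy of `FanCurvePoint` for illustration.
pub struct CurvePoint {
    pub temp_on: f32,
    pub temp_off: f32,
    pub pwm_percent: u8,
}

/// Maps a PWM percentage to a thinkfan level using the same bands as the diff.
pub fn thinkfan_level(pwm_percent: u8) -> u8 {
    match pwm_percent {
        0..=20 => 0,
        21..=40 => 1,
        41..=60 => 3,
        61..=80 => 5,
        _ => 7,
    }
}

/// Renders the `levels:` section as `[level, lower_temp, upper_temp]` triples.
pub fn render_levels(curve: &[CurvePoint]) -> String {
    let mut out = String::from("levels:\n");
    for (i, p) in curve.iter().enumerate() {
        // The first level has no lower bound, so it starts at 0.
        let down = if i == 0 { 0.0 } else { p.temp_off };
        out.push_str(&format!(
            "  - [{}, {:.0}, {:.0}]\n",
            thinkfan_level(p.pwm_percent),
            down,
            p.temp_on
        ));
    }
    out
}
```

Separating rendering from `fs::write` in this way would also let the generator be unit-tested against a fixed fan curve.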


@@ -118,8 +118,15 @@ Trip_Temp_C: {trip:.0}
result_lines.join("\n")
}
pub fn save(path: &Path, config: &ThrottledConfig) -> Result<()> {
let existing = if path.exists() { std::fs::read_to_string(path)? } else { String::new() };
pub fn save(path: &Path, config: &ThrottledConfig, source_path: Option<&std::path::PathBuf>) -> Result<()> {
let existing = if let Some(src) = source_path {
if src.exists() { std::fs::read_to_string(src).unwrap_or_default() } else { String::new() }
} else if path.exists() {
std::fs::read_to_string(path).unwrap_or_default()
} else {
String::new()
};
let content = if existing.is_empty() { Self::generate_conf(config) } else { Self::merge_conf(&existing, config) };
std::fs::write(path, content)?;
Ok(())


@@ -7,6 +7,7 @@
use serde::{Serialize, Deserialize};
use std::collections::HashMap;
use std::path::PathBuf;
use tracing::{warn, debug};
pub mod formatters;
@@ -25,6 +26,7 @@ pub struct ThermalPoint {
pub struct ThermalProfile {
pub points: Vec<ThermalPoint>,
pub ambient_temp: f32,
pub r_theta: f32,
}
/// The final, recommended parameters derived from the thermal benchmark.
@@ -46,27 +48,21 @@ pub struct OptimizationResult {
pub is_partial: bool,
/// A map of configuration files that were written to.
pub config_paths: HashMap<String, PathBuf>,
/// The comprehensive optimization matrix (Silent, Balanced, Performance).
pub optimization_matrix: Option<crate::agent_analyst::OptimizationMatrix>,
}
/// Pure mathematics engine for thermal optimization.
///
/// Contains no hardware I/O and operates solely on the collected [ThermalProfile].
pub struct OptimizerEngine {
/// The size of the sliding window for the `smooth` function.
window_size: usize,
}
impl OptimizerEngine {
/// Creates a new `OptimizerEngine`.
pub fn new(window_size: usize) -> Self {
Self { window_size }
}
/// Applies a simple moving average (SMA) filter with outlier rejection.
///
/// This function smooths noisy sensor data. It rejects any value in the
/// window that is more than 20.0 units away from the window's average
/// before calculating the final smoothed value.
/// Smooths sensor jitter using a moving average with outlier rejection.
pub fn smooth(&self, data: &[f32]) -> Vec<f32> {
if data.is_empty() { return vec![]; }
let mut smoothed = Vec::with_capacity(data.len());
@@ -78,7 +74,7 @@ impl OptimizerEngine {
let window = &data[start..end];
let avg: f32 = window.iter().sum::<f32>() / window.len() as f32;
let filtered: Vec<f32> = window.iter()
.filter(|&&v| (v - avg).abs() < 20.0) // Reject spikes > 20 units
.filter(|&&v| (v - avg).abs() < 10.0)
.cloned().collect();
if filtered.is_empty() {
@@ -90,96 +86,65 @@ impl OptimizerEngine {
smoothed
}
/// Calculates Thermal Resistance: R_theta = (T_core - T_ambient) / P_package.
///
/// This function uses the data point with the highest power draw to ensure
/// the calculation reflects a system under maximum thermal load.
pub fn calculate_thermal_resistance(&self, profile: &ThermalProfile) -> f32 {
profile.points.iter()
.filter(|p| p.power_w > 1.0 && p.temp_c > 30.0) // Filter invalid data
.max_by(|a, b| a.power_w.partial_cmp(&b.power_w).unwrap_or(std::cmp::Ordering::Equal))
.map(|p| (p.temp_c - profile.ambient_temp) / p.power_w)
.unwrap_or(0.0)
/// Evaluates if a series of temperature readings have reached thermal equilibrium.
/// Criteria: Standard deviation < 0.25C over the last 10 seconds.
pub fn is_stable(&self, temps: &[f32]) -> bool {
if temps.len() < 20 { return false; } // Need at least 10s of data (500ms intervals)
let window = &temps[temps.len() - 20..];
let avg = window.iter().sum::<f32>() / window.len() as f32;
let variance = window.iter().map(|&t| (t - avg).powi(2)).sum::<f32>() / window.len() as f32;
let std_dev = variance.sqrt();
debug!("Stability Check: StdDev={:.3}C (Target < 0.25C)", std_dev);
std_dev < 0.25
}
/// Returns the maximum temperature recorded in the profile.
pub fn get_max_temp(&self, profile: &ThermalProfile) -> f32 {
profile.points.iter()
.map(|p| p.temp_c)
.max_by(|a, b| a.partial_cmp(b).unwrap_or(std::cmp::Ordering::Equal))
.unwrap_or(0.0)
/// Predicts the steady-state temperature for a given target wattage.
/// Formula: T_pred = T_ambient + (P_target * R_theta)
pub fn predict_temp(&self, target_watts: f32, ambient: f32, r_theta: f32) -> f32 {
ambient + (target_watts * r_theta)
}
/// Finds the "Silicon Knee" - the point where performance-per-watt (efficiency)
/// starts to diminish significantly and thermal density spikes.
///
/// This heuristic scoring model balances several factors:
/// 1. **Efficiency Drop:** How quickly does performance-per-watt decrease as power increases?
/// 2. **Thermal Acceleration:** How quickly does temperature rise per additional Watt?
/// 3. **Throttling Penalty:** A large penalty is applied if absolute performance drops, indicating a thermal wall.
///
/// The "Knee" is the power level with the highest score, representing the optimal
/// balance before thermal saturation causes diminishing returns.
/// Calculates Thermal Resistance (K/W) using the steady-state delta.
pub fn calculate_r_theta(&self, ambient: f32, steady_temp: f32, steady_power: f32) -> f32 {
if steady_power < 1.0 { return 0.0; }
(steady_temp - ambient) / steady_power
}
/// Identifies the "Silicon Knee" by finding the point of maximum efficiency.
pub fn find_silicon_knee(&self, profile: &ThermalProfile) -> f32 {
let valid_points: Vec<_> = profile.points.iter()
.filter(|p| p.power_w > 5.0 && p.temp_c > 40.0) // Filter idle/noise
.cloned()
.collect();
if profile.points.is_empty() { return 15.0; }
if valid_points.len() < 3 {
return profile.points.last().map(|p| p.power_w).unwrap_or(15.0);
}
let mut points = valid_points;
let mut points = profile.points.clone();
points.sort_by(|a, b| a.power_w.partial_cmp(&b.power_w).unwrap_or(std::cmp::Ordering::Equal));
let mut best_pl = points[0].power_w;
let mut max_score = f32::MIN;
let efficiencies: Vec<(f32, f32)> = points.iter()
.map(|p| {
let perf = if p.throughput > 0.0 { p.throughput as f32 } else { p.freq_mhz };
(p.power_w, perf / p.power_w.max(1.0))
})
.collect();
// Use a sliding window (3 points) to calculate gradients more robustly
for i in 1..points.len() - 1 {
let prev = &points[i - 1];
let curr = &points[i];
let next = &points[i + 1];
if efficiencies.is_empty() { return 15.0; }
// 1. Efficiency Metric (Throughput per Watt or Freq per Watt)
let efficiency_curr = if curr.throughput > 0.0 {
curr.throughput as f32 / curr.power_w.max(1.0)
let max_efficiency = efficiencies.iter()
.map(|(_, e)| *e)
.max_by(|a, b| a.partial_cmp(b).unwrap_or(std::cmp::Ordering::Equal))
.unwrap_or(1.0);
let mut knee_watts = points[0].power_w;
for (watts, efficiency) in efficiencies {
if efficiency >= (max_efficiency * 0.85) {
knee_watts = watts;
} else {
curr.freq_mhz / curr.power_w.max(1.0)
};
let efficiency_next = if next.throughput > 0.0 {
next.throughput as f32 / next.power_w.max(1.0)
} else {
next.freq_mhz / next.power_w.max(1.0)
};
let p_delta = (next.power_w - curr.power_w).max(0.5);
let efficiency_drop = (efficiency_curr - efficiency_next) / p_delta;
// 2. Thermal Acceleration (d2T/dW2)
let p_delta_prev = (curr.power_w - prev.power_w).max(0.5);
let p_delta_next = (next.power_w - curr.power_w).max(0.5);
let dt_dw_prev = (curr.temp_c - prev.temp_c) / p_delta_prev;
let dt_dw_next = (next.temp_c - curr.temp_c) / p_delta_next;
let p_total_delta = (next.power_w - prev.power_w).max(1.0);
let temp_accel = (dt_dw_next - dt_dw_prev) / p_total_delta;
// 3. Wall Detection (Any drop in absolute performance is a hard wall)
let is_throttling = next.freq_mhz < curr.freq_mhz || (next.throughput > 0.0 && next.throughput < curr.throughput);
let penalty = if is_throttling { 5000.0 } else { 0.0 };
let score = (efficiency_curr * 10.0) - (efficiency_drop * 50.0) - (temp_accel * 20.0) - penalty;
if score > max_score {
max_score = score;
best_pl = curr.power_w;
debug!("Efficiency drop at {:.1}W ({:.1}% of peak)", watts, (efficiency/max_efficiency)*100.0);
break;
}
}
best_pl
knee_watts.clamp(PowerLimitWatts::MIN, PowerLimitWatts::MAX)
}
}
use crate::sal::safety::PowerLimitWatts;
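The new knee heuristic reduces to "walk up the power axis until efficiency falls below 85% of its peak". The standalone sketch below restates that rule over plain `(watts, performance)` pairs. `find_knee` is an illustrative name, and it returns `None` for empty input instead of the 15 W default and final clamp used in the diff.

```rust
use std::cmp::Ordering;

/// Returns the highest power level reached before efficiency (perf per watt)
/// first falls below 85% of the best efficiency observed.
pub fn find_knee(samples: &[(f32, f32)]) -> Option<f32> {
    // Sort by power so the walk goes from the lowest to the highest limit.
    let mut pts: Vec<(f32, f32)> = samples.to_vec();
    pts.sort_by(|a, b| a.0.partial_cmp(&b.0).unwrap_or(Ordering::Equal));

    // Efficiency = performance / watts, with the same 1 W floor as the diff.
    let eff: Vec<(f32, f32)> = pts.iter().map(|&(w, perf)| (w, perf / w.max(1.0))).collect();
    let peak = eff.iter().map(|&(_, e)| e).fold(f32::MIN, f32::max);

    let mut knee = eff.first()?.0;
    for (watts, e) in eff {
        if e >= peak * 0.85 {
            knee = watts;
        } else {
            break; // first point past the knee
        }
    }
    Some(knee)
}
```

For samples `(10, 1000)`, `(15, 1500)`, `(20, 1900)`, `(25, 2000)` the efficiencies are 100, 100, 95 and 80, so the walk stops at 25 W and reports 20 W. Because the walk stops at the first sample below the threshold, a curve whose efficiency starts low and rises later would return its lowest power point; whether that case needs handling is a question for review.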

0
src/engine/profiles.rs Normal file

@@ -12,3 +12,5 @@ pub mod ui;
pub mod engine;
pub mod cli;
pub mod sys;
pub mod agent_analyst;
pub mod agent_integrator;


@@ -1,60 +1,145 @@
//! Defines the `Workload` trait for generating synthetic CPU/GPU load.
//! Load generation and performance measurement subsystem.
use anyhow::Result;
use std::process::Child;
use anyhow::{Result, Context, anyhow};
use std::process::{Child, Command, Stdio};
use std::time::{Duration, Instant};
use std::thread;
use std::io::{BufRead, BufReader};
use std::sync::{Arc, Mutex};
use serde::{Deserialize, Serialize};
/// A trait for objects that can generate a measurable system load.
pub trait Workload: Send + Sync {
/// Starts the workload with the specified number of threads and load percentage.
///
/// # Errors
/// Returns an error if the underlying stress test process fails to spawn.
fn start(&mut self, threads: usize, load_percent: usize) -> Result<()>;
/// Stops the workload gracefully.
///
/// # Errors
/// This method should aim to not fail, but may return an error if
/// forcefully killing the child process fails.
fn stop(&mut self) -> Result<()>;
/// Returns the current throughput of the workload (e.g., ops/sec).
///
/// # Errors
/// Returns an error if throughput cannot be measured.
fn get_throughput(&self) -> Result<f64>;
/// Standardized telemetry returned by any workload implementation.
#[derive(Debug, Clone, Serialize, Deserialize, Default)]
pub struct WorkloadMetrics {
/// Primary performance heuristic (e.g., Bogo Ops/s)
pub primary_ops_per_sec: f64,
/// Time elapsed since the workload started
pub elapsed_time: Duration,
}
/// An implementation of `Workload` that uses the `stress-ng` utility.
/// Defines which subsystem to isolate during stress testing.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum StressVector {
CpuMatrix,
MemoryBandwidth,
Mixed,
}
/// A normalized profile defining the intensity and constraints of the workload.
#[derive(Debug, Clone)]
pub struct IntensityProfile {
pub threads: usize,
pub load_percentage: u8,
pub vector: StressVector,
}
/// The replaceable interface for load generation and performance measurement.
pub trait Workload: Send + Sync {
/// Sets up prerequisites (e.g., binary checks).
fn initialize(&mut self) -> Result<()>;
/// Executes the load asynchronously.
fn run_workload(&mut self, duration: Duration, profile: IntensityProfile) -> Result<()>;
/// Returns the current standardized telemetry object.
fn get_current_metrics(&self) -> Result<WorkloadMetrics>;
/// Gracefully and forcefully terminates the workload.
fn stop_workload(&mut self) -> Result<()>;
}
/// Implementation of the Benchmarking Interface using stress-ng matrix stressors.
pub struct StressNg {
child: Option<Child>,
start_time: Option<Instant>,
latest_metrics: Arc<Mutex<WorkloadMetrics>>,
}
impl StressNg {
pub fn new() -> Self {
Self { child: None }
Self {
child: None,
start_time: None,
latest_metrics: Arc::new(Mutex::new(WorkloadMetrics::default())),
}
}
}
impl Workload for StressNg {
fn start(&mut self, threads: usize, load_percent: usize) -> Result<()> {
self.stop()?;
fn initialize(&mut self) -> Result<()> {
let status = Command::new("stress-ng")
.arg("--version")
.stdout(Stdio::null())
.stderr(Stdio::null())
.status()
.context("stress-ng binary not found in PATH. Please install it.")?;
let child = std::process::Command::new("stress-ng")
.args([
"--cpu", &threads.to_string(),
"--cpu-load", &load_percent.to_string(),
"--quiet"
])
.spawn()?;
if !status.success() {
return Err(anyhow!("stress-ng failed to initialize"));
}
Ok(())
}
fn run_workload(&mut self, duration: Duration, profile: IntensityProfile) -> Result<()> {
self.stop_workload()?;
let threads = profile.threads.to_string();
let timeout = format!("{}s", duration.as_secs());
let load = profile.load_percentage.to_string();
let mut cmd = Command::new("stress-ng");
cmd.args(["--timeout", &timeout, "--metrics", "--quiet", "--cpu-load", &load]);
match profile.vector {
StressVector::CpuMatrix => {
cmd.args(["--matrix", &threads]);
},
StressVector::MemoryBandwidth => {
cmd.args(["--vm", &threads, "--vm-bytes", "80%"]);
},
StressVector::Mixed => {
let half = (profile.threads / 2).max(1).to_string();
cmd.args(["--matrix", &half, "--vm", &half, "--vm-bytes", "40%"]);
}
}
let mut child = cmd.stderr(Stdio::piped()).spawn().context("Failed to spawn stress-ng")?;
self.start_time = Some(Instant::now());
// Spawn metrics parser thread
let metrics_ref = Arc::clone(&self.latest_metrics);
let stderr = child.stderr.take().expect("Failed to capture stderr");
thread::spawn(move || {
let reader = BufReader::new(stderr);
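// `map_while` ends the loop at the first read error instead of polling a failing reader forever.
for line in reader.lines().map_while(Result::ok) {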
// Parse stress-ng metrics line
if line.contains("matrix") || line.contains("vm") {
let parts: Vec<&str> = line.split_whitespace().collect();
if let Some(val) = parts.last() {
if let Ok(ops) = val.parse::<f64>() {
let mut m = metrics_ref.lock().unwrap();
m.primary_ops_per_sec = ops;
}
}
}
}
});
self.child = Some(child);
Ok(())
}
fn stop(&mut self) -> Result<()> {
fn get_current_metrics(&self) -> Result<WorkloadMetrics> {
let mut m = self.latest_metrics.lock().unwrap().clone();
if let Some(start) = self.start_time {
m.elapsed_time = start.elapsed();
}
Ok(m)
}
fn stop_workload(&mut self) -> Result<()> {
if let Some(mut child) = self.child.take() {
#[cfg(unix)]
{
@@ -77,19 +162,13 @@ impl Workload for StressNg {
let _ = child.wait();
}
}
self.start_time = None;
Ok(())
}
/// Returns the current throughput of the workload (e.g., ops/sec).
///
/// This is currently a stub and does not parse `stress-ng` output.
fn get_throughput(&self) -> Result<f64> {
Ok(0.0)
}
}
impl Drop for StressNg {
fn drop(&mut self) {
let _ = self.stop();
let _ = self.stop_workload();
}
}
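The metrics thread in `run_workload` can be tried against any reader, which makes the parsing rule testable without spawning `stress-ng`. In this sketch `parse_last_number` and `spawn_metrics_reader` are hypothetical helpers, the shared state is reduced to a single `f64`, and the rule is the one from the diff: take the last whitespace-separated token of a matching line. It makes no claim about the exact format `stress-ng` prints.

```rust
use std::io::{BufRead, BufReader, Read};
use std::sync::{Arc, Mutex};
use std::thread;

/// Parses the final whitespace-separated token of a line as a number.
pub fn parse_last_number(line: &str) -> Option<f64> {
    line.split_whitespace().last()?.parse::<f64>().ok()
}

/// Reads lines from `source` on a background thread and stores the most
/// recent parsed value from lines mentioning "matrix" or "vm".
pub fn spawn_metrics_reader<R: Read + Send + 'static>(
    source: R,
    latest: Arc<Mutex<f64>>,
) -> thread::JoinHandle<()> {
    thread::spawn(move || {
        // Stop at the first read error rather than looping on it.
        for line in BufReader::new(source).lines().map_while(Result::ok) {
            if line.contains("matrix") || line.contains("vm") {
                if let Some(ops) = parse_last_number(&line) {
                    // Skip the update if another thread poisoned the lock.
                    if let Ok(mut slot) = latest.lock() {
                        *slot = ops;
                    }
                }
            }
        }
    })
}
```

Feeding it a `std::io::Cursor` over a few made-up lines and joining the handle is enough to confirm that only matching lines update the shared value. Substring matching on "vm" is broad, so a stricter match on the stressor column may be worth considering.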


@@ -8,7 +8,8 @@ use std::sync::atomic::{AtomicBool, Ordering};
use std::io;
use clap::Parser;
use tracing::{info, debug, error};
use tracing::error;
use tracing_subscriber::{fmt, prelude::*, EnvFilter};
use crossterm::{
event::{self, Event, KeyCode},
@@ -68,27 +69,24 @@ fn print_summary_report(result: &OptimizationResult) {
println!();
}
fn setup_logging(verbose: bool) -> tracing_appender::non_blocking::WorkerGuard {
let file_appender = tracing_appender::rolling::never("/var/log", "ember-tune.log");
let (non_blocking, guard) = tracing_appender::non_blocking(file_appender);
fn main() -> Result<()> {
let args = Cli::parse();
let level = if verbose { tracing::Level::DEBUG } else { tracing::Level::INFO };
// 1. Logging Setup (File-only by default, Stdout during Audit)
let file_appender = tracing_appender::rolling::never(".", "ember-tune.log");
let (non_blocking, _guard) = tracing_appender::non_blocking(file_appender);
let level = if args.verbose { "debug" } else { "info" };
tracing_subscriber::fmt()
.with_max_level(level)
let file_layer = fmt::layer()
.with_writer(non_blocking)
.with_ansi(false)
.with_ansi(false);
// We use a simple println for the audit to avoid complex reload handles
tracing_subscriber::registry()
.with(EnvFilter::new(level))
.with(file_layer)
.init();
guard
}
fn main() -> Result<()> {
// 1. Diagnostics & CLI Initialization
let args = Cli::parse();
let _log_guard = setup_logging(args.verbose);
// Set panic hook to restore terminal state
std::panic::set_hook(Box::new(|panic_info| {
let _ = disable_raw_mode();
let mut stdout = io::stdout();
@@ -99,11 +97,10 @@ fn main() -> Result<()> {
eprintln!("----------------------------------------\n");
}));
info!("ember-tune starting with args: {:?}", args);
println!("{}", console::style("─── Pre-flight System Audit ───").bold().cyan());
let ctx = ember_tune_rs::sal::traits::EnvironmentCtx::production();
// 2. Platform Detection & Audit
let (sal_box, facts): (Box<dyn PlatformSal>, SystemFactSheet) = if args.mock {
(Box::new(MockSal::new()), SystemFactSheet::default())
} else {
@@ -111,9 +108,7 @@ fn main() -> Result<()> {
};
let sal: Arc<dyn PlatformSal> = sal_box.into();
println!("{}", console::style("─── Pre-flight System Audit ───").bold().cyan());
let mut audit_failures = Vec::new();
for step in sal.audit() {
print!(" Checking {:<40} ", step.description);
io::Write::flush(&mut io::stdout()).into_diagnostic()?;
@@ -137,15 +132,14 @@ fn main() -> Result<()> {
return Ok(());
}
// 3. Terminal Setup
// Entering TUI Mode - STDOUT is now strictly for Ratatui
enable_raw_mode().into_diagnostic()?;
let mut stdout = io::stdout();
execute!(stdout, EnterAlternateScreen).into_diagnostic()?;
execute!(stdout, EnterAlternateScreen, crossterm::cursor::Hide).into_diagnostic()?;
let backend_stdout = io::stdout();
let backend_term = CrosstermBackend::new(backend_stdout);
let mut terminal = Terminal::new(backend_term).into_diagnostic()?;
// 4. State & Communication Setup
let running = Arc::new(AtomicBool::new(true));
let r = running.clone();
@@ -158,9 +152,9 @@ fn main() -> Result<()> {
r.store(false, Ordering::SeqCst);
}).expect("Error setting Ctrl-C handler");
// 5. Spawn Backend Orchestrator
let sal_backend = sal.clone();
let facts_backend = facts.clone();
let config_out = args.config_out.clone();
let backend_handle = thread::spawn(move || {
let workload = Box::new(StressNg::new());
let mut orchestrator = BenchmarkOrchestrator::new(
@@ -169,14 +163,14 @@ fn main() -> Result<()> {
workload,
telemetry_tx,
command_rx,
config_out,
);
orchestrator.run()
});
// 6. Frontend Event Loop
let mut ui_state = DashboardState::new();
let mut last_telemetry = TelemetryState {
cpu_model: "Loading...".to_string(),
cpu_model: facts.model.clone(),
total_ram_gb: 0,
tick: 0,
cpu_temp: 0.0,
@@ -187,6 +181,7 @@ fn main() -> Result<()> {
pl1_limit: 0.0,
pl2_limit: 0.0,
fan_tier: "auto".to_string(),
is_throttling: false,
phase: BenchmarkPhase::Auditing,
history_watts: Vec::new(),
history_temp: Vec::new(),
@@ -224,7 +219,6 @@ fn main() -> Result<()> {
while let Ok(new_state) = telemetry_rx.try_recv() {
if let Some(log) = &new_state.log_event {
ui_state.add_log(log.clone());
debug!("Backend Log: {}", log);
} else {
ui_state.update(&new_state);
last_telemetry = new_state;
@@ -235,20 +229,11 @@ fn main() -> Result<()> {
if backend_handle.is_finished() { break; }
}
// 7. Terminal Restoration
let _ = disable_raw_mode();
let _ = execute!(terminal.backend_mut(), LeaveAlternateScreen);
let _ = terminal.show_cursor();
let _ = execute!(terminal.backend_mut(), LeaveAlternateScreen, crossterm::cursor::Show);
// 8. Final Report & Hardware Restoration
let join_res = backend_handle.join();
// Explicit hardware restoration
info!("Restoring hardware state...");
if let Err(e) = sal.restore() {
error!("Failed to restore hardware state: {}", e);
}
match join_res {
Ok(Ok(result)) => {
print_summary_report(&result);
@@ -273,6 +258,5 @@ fn main() -> Result<()> {
}
}
info!("ember-tune exited gracefully.");
Ok(())
}
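
Review note: the frontend loop in `main` relies on two things, a shared `AtomicBool` that the Ctrl-C handler clears and a non-blocking drain of the telemetry channel with `try_recv`. The sketch below isolates that pattern using only the standard library. `spawn_ticker` and `drain_latest` are hypothetical names for illustration and are not part of this repository; the real backend sends `TelemetryState` values, not bare counters.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::mpsc::{Receiver, Sender};
use std::sync::Arc;
use std::thread::{self, JoinHandle};
use std::time::Duration;

/// Hypothetical stand-in for the backend thread: sends an increasing tick
/// counter until the shared flag is cleared or the receiver goes away.
/// Returns the number of ticks that were sent.
pub fn spawn_ticker(running: Arc<AtomicBool>, tx: Sender<u64>) -> JoinHandle<u64> {
    thread::spawn(move || {
        let mut tick = 0u64;
        while running.load(Ordering::SeqCst) {
            if tx.send(tick).is_err() {
                break; // the frontend dropped its receiver
            }
            tick += 1;
            thread::sleep(Duration::from_millis(1));
        }
        tick
    })
}

/// Drains everything currently queued without blocking and returns the
/// newest value, or `None` if the queue was empty.
pub fn drain_latest<T>(rx: &Receiver<T>) -> Option<T> {
    let mut latest = None;
    while let Ok(value) = rx.try_recv() {
        latest = Some(value);
    }
    latest
}
```

A UI loop would call `drain_latest` once per frame and redraw from the returned value. Clearing the flag with `running.store(false, Ordering::SeqCst)` and then joining the handle gives the worker a chance to exit on its own before the terminal is restored.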


@@ -35,6 +35,7 @@ pub struct TelemetryState {
pub pl1_limit: f32,
pub pl2_limit: f32,
pub fan_tier: String,
pub is_throttling: bool,
pub phase: BenchmarkPhase,
// --- High-res History ---


@@ -3,7 +3,8 @@
//! It manages hardware interactions through the [PlatformSal], generates stress
//! using a [Workload], and feeds telemetry to the frontend via MPSC channels.
use anyhow::{Result, Context};
use anyhow::{Result, Context, bail};
use tracing::{info, warn, error, debug};
use std::sync::mpsc;
use std::time::{Duration, Instant};
use std::thread;
@@ -12,61 +13,57 @@ use sysinfo::System;
use std::sync::Arc;
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Mutex;
use std::path::PathBuf;
use std::cell::Cell;
use crate::sal::traits::{PlatformSal, SafetyStatus};
use crate::sal::traits::{PlatformSal, SensorBus};
use crate::sal::heuristic::discovery::SystemFactSheet;
use crate::load::Workload;
use crate::sal::safety::{HardwareStateGuard, PowerLimitWatts, ThermalWatchdog};
use crate::load::{Workload, IntensityProfile, StressVector};
use crate::mediator::{TelemetryState, UiCommand, BenchmarkPhase};
use crate::engine::{OptimizerEngine, ThermalProfile, ThermalPoint, OptimizationResult};
use crate::agent_analyst::HeuristicAnalyst;
use crate::agent_integrator::ServiceIntegrator;
/// Represents the possible states of the benchmark orchestrator.
pub enum OrchestratorState {
PreFlight,
IdleBaseline,
ThermalCalibration,
StabilitySweep,
Cooldown,
Finalizing,
}
/// The central state machine responsible for coordinating the thermal benchmark.
///
/// It manages hardware interactions through the [PlatformSal], generates stress
/// using a [Workload], and feeds telemetry to the frontend via MPSC channels.
pub struct BenchmarkOrchestrator {
/// Injected hardware abstraction layer.
sal: Arc<dyn PlatformSal>,
/// Discovered system facts and paths.
facts: SystemFactSheet,
/// Heat generation workload.
workload: Box<dyn Workload>,
/// Channel for sending telemetry updates to the UI.
telemetry_tx: mpsc::Sender<TelemetryState>,
/// Channel for receiving commands from the UI.
command_rx: mpsc::Receiver<UiCommand>,
/// Current phase of the benchmark.
phase: BenchmarkPhase,
/// Accumulated thermal data points.
ui_phase: BenchmarkPhase,
profile: ThermalProfile,
/// Mathematics engine for data smoothing and optimization.
engine: OptimizerEngine,
/// Sliding window of power readings (Watts).
optional_config_out: Option<PathBuf>,
safeguard: Option<HardwareStateGuard>,
watchdog: Option<ThermalWatchdog>,
history_watts: VecDeque<f32>,
/// Sliding window of temperature readings (Celsius).
history_temp: VecDeque<f32>,
/// Sliding window of CPU frequency (MHz).
history_mhz: VecDeque<f32>,
/// Detected CPU model string.
cpu_model: String,
/// Total system RAM in Gigabytes.
total_ram_gb: u64,
/// Atomic flag indicating a safety-triggered abort.
emergency_abort: Arc<AtomicBool>,
/// Human-readable reason for the emergency abort.
emergency_reason: Arc<Mutex<Option<String>>>,
}
impl BenchmarkOrchestrator {
/// Creates a new orchestrator instance with injected dependencies.
pub fn new(
sal: Arc<dyn PlatformSal>,
facts: SystemFactSheet,
workload: Box<dyn Workload>,
telemetry_tx: mpsc::Sender<TelemetryState>,
command_rx: mpsc::Receiver<UiCommand>,
optional_config_out: Option<PathBuf>,
) -> Self {
let mut sys = System::new_all();
sys.refresh_all();
@@ -82,7 +79,7 @@ impl BenchmarkOrchestrator {
workload,
telemetry_tx,
command_rx,
phase: BenchmarkPhase::Auditing,
ui_phase: BenchmarkPhase::Auditing,
profile: ThermalProfile::default(),
engine: OptimizerEngine::new(5),
history_watts: VecDeque::with_capacity(120),
@@ -92,244 +89,252 @@ impl BenchmarkOrchestrator {
total_ram_gb,
emergency_abort: Arc::new(AtomicBool::new(false)),
emergency_reason: Arc::new(Mutex::new(None)),
optional_config_out,
safeguard: None,
watchdog: None,
}
}
/// Executes the full benchmark sequence.
///
/// This method guarantees that [crate::sal::traits::EnvironmentGuard::restore] and [Workload::stop_workload]
/// are called regardless of whether the benchmark succeeds or fails.
pub fn run(&mut self) -> Result<OptimizationResult> {
self.log("Starting ember-tune Benchmark Sequence.")?;
// Immediate Priming
let _ = self.sal.get_temp();
let _ = self.sal.get_power_w();
let _ = self.sal.get_fan_rpms();
let _watchdog_handle = self.spawn_watchdog_monitor();
info!("Orchestrator: Initializing Project Iron-Ember PGC Protocol.");
// Spawn safety watchdog immediately
let watchdog = ThermalWatchdog::spawn(self.sal.clone(), self.emergency_abort.clone());
self.watchdog = Some(watchdog);
let result = self.execute_benchmark();
self.log("Benchmark sequence finished. Restoring hardware defaults...")?;
let _ = self.workload.stop();
if let Err(e) = self.sal.restore() {
anyhow::bail!("CRITICAL: Failed to restore hardware state: {}", e);
if let Err(ref e) = result {
error!("Benchmark Lifecycle Failure: {}", e);
let _ = self.log(&format!("⚠ FAILURE: {}", e));
}
self.log("✓ Hardware state restored.")?;
// --- MANDATORY RAII CLEANUP ---
info!("Benchmark sequence complete. Releasing safeguards...");
let _ = self.workload.stop_workload();
if let Some(mut sg) = self.safeguard.take() {
let _ = sg.release();
}
if let Err(e) = self.sal.restore() {
warn!("Failed secondary SAL restoration: {}", e);
}
info!("✓ Hardware state restored.");
result
}
/// Internal execution logic for the benchmark phases.
fn execute_benchmark(&mut self) -> Result<OptimizationResult> {
let bench_cfg = self.facts.bench_config.clone().context("Benchmarking config missing in facts")?;
let _bench_cfg = self.facts.bench_config.clone().context("Config missing.")?;
// 1. Pre-Flight Phase
self.ui_phase = BenchmarkPhase::Auditing;
self.log("Phase: Pre-Flight Auditing & Sterilization")?;
let mut target_files = self.facts.rapl_paths.iter()
.map(|p| p.join("constraint_0_power_limit_uw"))
.collect::<Vec<_>>();
target_files.extend(self.facts.rapl_paths.iter().map(|p| p.join("constraint_1_power_limit_uw")));
if let Some(tp) = self.facts.paths.configs.get("throttled") {
target_files.push(tp.clone());
}
let sg = HardwareStateGuard::acquire(&target_files, &self.facts.conflict_services)?;
self.safeguard = Some(sg);
self.phase = BenchmarkPhase::Auditing;
for step in self.sal.audit() {
if let Err(e) = step.outcome {
return Err(anyhow::anyhow!("Audit failed ({}): {:?}", step.description, e));
}
}
self.log("Suppressing background services (tlp, thermald)...")?;
self.sal.suppress().context("Failed to suppress background services")?;
self.workload.initialize().context("Failed to initialize load generator.")?;
self.sal.suppress().context("Failed to suppress background services.")?;
self.phase = BenchmarkPhase::IdleCalibration;
self.log(&format!("Phase 1: Recording Idle Baseline ({}s)...", bench_cfg.idle_duration_s))?;
let tick = Cell::new(0u64);
// 2. Idle Baseline Phase
self.ui_phase = BenchmarkPhase::IdleCalibration;
self.log("Phase: Recording 30s Idle Baseline...")?;
self.sal.set_fan_mode("auto")?;
let mut idle_temps = Vec::new();
let start = Instant::now();
let mut tick = 0;
while start.elapsed() < Duration::from_secs(bench_cfg.idle_duration_s) {
self.check_abort()?;
self.send_telemetry(tick)?;
while start.elapsed() < Duration::from_secs(30) {
self.check_safety_abort()?;
self.send_telemetry(tick.get())?;
idle_temps.push(self.sal.get_temp().unwrap_or(0.0));
tick += 1;
tick.set(tick.get() + 1);
thread::sleep(Duration::from_millis(500));
}
self.profile.ambient_temp = self.engine.smooth(&idle_temps).last().cloned().unwrap_or(0.0);
self.profile.ambient_temp = self.engine.smooth(&idle_temps).iter().sum::<f32>() / idle_temps.len() as f32;
self.log(&format!("✓ Idle Baseline: {:.1}°C", self.profile.ambient_temp))?;
self.phase = BenchmarkPhase::StressTesting;
self.log("Phase 2: Starting Synthetic Stress Matrix.")?;
// 3. Thermal Resistance Mapping (Phase 1)
self.log("Phase: Mapping Thermal Resistance (Rθ) at 10W...")?;
self.sal.set_fan_mode("max")?;
let steps = bench_cfg.power_steps_watts.clone();
for &pl in &steps {
self.log(&format!("Testing PL1 = {:.0}W...", pl))?;
self.sal.set_sustained_power_limit(pl)?;
self.sal.set_burst_power_limit(pl + 5.0)?;
let pl_calib = PowerLimitWatts::try_new(10.0)?;
self.sal.set_sustained_power_limit(pl_calib)?;
self.sal.set_burst_power_limit(pl_calib)?;
self.workload.start(num_cpus::get(), 100)?;
self.workload.run_workload(
Duration::from_secs(120),
IntensityProfile { threads: num_cpus::get_physical(), load_percentage: 100, vector: StressVector::CpuMatrix }
)?;
let mut calib_temps = Vec::new();
let calib_start = Instant::now();
while calib_start.elapsed() < Duration::from_secs(90) {
self.check_safety_abort()?;
self.send_telemetry(tick.get())?;
let t = self.sal.get_temp().unwrap_or(0.0);
calib_temps.push(t);
tick.set(tick.get() + 1);
if calib_start.elapsed() > Duration::from_secs(30) && self.engine.is_stable(&calib_temps) {
break;
}
thread::sleep(Duration::from_millis(500));
}
let steady_t = calib_temps.last().cloned().unwrap_or(0.0);
let steady_p = self.sal.get_power_w().unwrap_or(10.0);
self.profile.r_theta = self.engine.calculate_r_theta(self.profile.ambient_temp, steady_t, steady_p);
self.log(&format!("✓ Physical Model: Rθ = {:.3} K/W", self.profile.r_theta))?;
// 4. Physically-Aware Stability Sweep (Phase 2)
self.ui_phase = BenchmarkPhase::StressTesting;
self.log("Phase: Starting Physically-Aware Efficiency Sweep...")?;
let mut current_w = 12.0_f32;
let mut previous_ops = 0.0;
loop {
// Predict if this step is safe
let pred_t = self.engine.predict_temp(current_w, self.profile.ambient_temp, self.profile.r_theta);
if pred_t > 92.0 {
self.log(&format!("Prediction: {:.1}W would result in {:.1}C (Too Hot). Finalizing...", current_w, pred_t))?;
break;
}
self.log(&format!("Step: {:.1}W (Predicted: {:.1}C)", current_w, pred_t))?;
let pl = PowerLimitWatts::try_new(current_w)?;
self.sal.set_sustained_power_limit(pl)?;
self.sal.set_burst_power_limit(PowerLimitWatts::try_new(current_w + 2.0)?)?;
self.workload.run_workload(
Duration::from_secs(60),
IntensityProfile { threads: num_cpus::get_physical(), load_percentage: 100, vector: StressVector::CpuMatrix }
)?;
let step_start = Instant::now();
let mut step_temps = VecDeque::with_capacity(30);
let mut step_temps = Vec::new();
let mut previous_t = self.sal.get_temp().unwrap_or(0.0);
while step_start.elapsed() < Duration::from_secs(bench_cfg.stress_duration_max_s) {
self.check_abort()?;
while step_start.elapsed() < Duration::from_secs(60) {
self.check_safety_abort()?;
self.send_telemetry(tick.get())?;
let t = self.sal.get_temp().unwrap_or(0.0);
step_temps.push_back(t);
if step_temps.len() > 10 { step_temps.pop_front(); }
let dt_dt = (t - previous_t) / 0.5;
self.send_telemetry(tick)?;
tick += 1;
// # SAFETY: predictive hard-quench threshold raised to 8C/s
if step_start.elapsed() > Duration::from_secs(2) && (t > 95.0 || dt_dt > 8.0) {
warn!("USA: Safety Break triggered! T={:.1}C, dT/dt={:.1}C/s", t, dt_dt);
let _ = self.sal.set_sustained_power_limit(PowerLimitWatts::try_new(3.0)?);
break; // Just break the sweep loop
}
if step_start.elapsed() > Duration::from_secs(bench_cfg.stress_duration_min_s) && step_temps.len() == 10 {
let min = step_temps.iter().fold(f32::MAX, |a, &b| a.min(b));
let max = step_temps.iter().fold(f32::MIN, |a, &b| a.max(b));
if (max - min) < 0.5 {
step_temps.push(t);
tick.set(tick.get() + 1);
if step_start.elapsed() > Duration::from_secs(15) && self.engine.is_stable(&step_temps) {
self.log(&format!(" Equilibrium reached at {:.1}°C", t))?;
break;
}
}
previous_t = t;
thread::sleep(Duration::from_millis(500));
}
let avg_p = self.sal.get_power_w().unwrap_or(0.0);
let avg_t = self.sal.get_temp().unwrap_or(0.0);
let avg_f = self.sal.get_freq_mhz().unwrap_or(0.0);
let fans = self.sal.get_fan_rpms().unwrap_or_default();
let primary_fan = fans.first().cloned().unwrap_or(0);
let tp = self.workload.get_throughput().unwrap_or(0.0);
let metrics = self.workload.get_current_metrics().unwrap_or_default();
self.profile.points.push(ThermalPoint {
power_w: avg_p,
temp_c: avg_t,
freq_mhz: avg_f,
fan_rpm: primary_fan,
throughput: tp,
power_w: self.sal.get_power_w().unwrap_or(current_w),
temp_c: self.sal.get_temp().unwrap_or(0.0),
freq_mhz: self.sal.get_freq_mhz().unwrap_or(0.0),
fan_rpm: self.sal.get_fan_rpms().unwrap_or_default().first().cloned().unwrap_or(0),
throughput: metrics.primary_ops_per_sec,
});
self.workload.stop()?;
self.log(&format!(" Step complete. Cooling down for {}s...", bench_cfg.cool_down_s))?;
thread::sleep(Duration::from_secs(bench_cfg.cool_down_s));
self.workload.stop_workload()?;
// Efficiency Break
if previous_ops > 0.0 {
let gain = ((metrics.primary_ops_per_sec - previous_ops) / previous_ops) * 100.0;
if gain < 1.0 {
self.log("Silicon Knee identified (gain < 1%). Finalizing...")?;
break;
}
}
previous_ops = metrics.primary_ops_per_sec;
current_w += 2.0;
if current_w > 45.0 { break; }
self.log(&format!("Cooling down ({}s)...", _bench_cfg.cool_down_s))?;
thread::sleep(Duration::from_secs(_bench_cfg.cool_down_s));
}
self.phase = BenchmarkPhase::PhysicalModeling;
self.log("Phase 3: Calculating Silicon Physical Sweet Spot...")?;
// 5. Modeling Phase
self.ui_phase = BenchmarkPhase::PhysicalModeling;
let knee = self.engine.find_silicon_knee(&self.profile);
let analyst = HeuristicAnalyst::new();
let matrix = analyst.analyze(&self.profile, self.profile.points.last().map(|p| p.power_w).unwrap_or(15.0));
let mut res = self.generate_result(false);
res.optimization_matrix = Some(matrix.clone());
res.silicon_knee_watts = knee;
self.log(&format!("✓ Thermal Resistance (Rθ): {:.3} K/W", res.thermal_resistance_kw))?;
self.log(&format!("✓ Silicon Knee Found: {:.1} W", res.silicon_knee_watts))?;
thread::sleep(Duration::from_secs(3));
self.phase = BenchmarkPhase::Finalizing;
self.log("Benchmark sequence complete. Generating configurations...")?;
// 6. Finalizing Phase
self.ui_phase = BenchmarkPhase::Finalizing;
let throttled_source = self.facts.paths.configs.get("throttled");
if let Some(path) = self.optional_config_out.clone().or_else(|| throttled_source.cloned()) {
let config = crate::engine::formatters::throttled::ThrottledConfig {
pl1_limit: res.silicon_knee_watts,
pl2_limit: res.recommended_pl2,
trip_temp: res.max_temp_c.max(95.0),
pl2_limit: res.silicon_knee_watts * 1.25,
trip_temp: 90.0,
};
if let Some(throttled_path) = self.facts.paths.configs.get("throttled") {
crate::engine::formatters::throttled::ThrottledTranslator::save(throttled_path, &config)?;
self.log(&format!("✓ Saved '{}' (merged).", throttled_path.display()))?;
res.config_paths.insert("throttled".to_string(), throttled_path.clone());
let _ = crate::engine::formatters::throttled::ThrottledTranslator::save(&path, &config, throttled_source);
res.config_paths.insert("throttled".to_string(), path);
}
if let Some(i8k_path) = self.facts.paths.configs.get("i8kmon") {
let i8k_config = crate::engine::formatters::i8kmon::I8kmonConfig {
t_ambient: self.profile.ambient_temp,
t_max_fan: res.max_temp_c - 5.0,
thermal_resistance_kw: res.thermal_resistance_kw,
};
crate::engine::formatters::i8kmon::I8kmonTranslator::save(i8k_path, &i8k_config)?;
self.log(&format!("✓ Saved '{}'.", i8k_path.display()))?;
res.config_paths.insert("i8kmon".to_string(), i8k_path.clone());
let base_out = self.optional_config_out.clone().unwrap_or_else(|| PathBuf::from("/etc"));
let i8k_source = self.facts.paths.configs.get("i8kmon");
let i8k_out = base_out.join("i8kmon.conf");
if ServiceIntegrator::generate_i8kmon_config(&matrix, &i8k_out, i8k_source).is_ok() {
res.config_paths.insert("i8kmon".to_string(), i8k_out);
}
Ok(res)
}
/// Spawns a concurrent monitor that polls safety sensors every 100ms.
fn spawn_watchdog_monitor(&self) -> thread::JoinHandle<()> {
let abort = self.emergency_abort.clone();
let reason_store = self.emergency_reason.clone();
let sal = self.sal.clone();
let tx = self.telemetry_tx.clone();
thread::spawn(move || {
while !abort.load(Ordering::SeqCst) {
let status = sal.get_safety_status();
match status {
Ok(SafetyStatus::EmergencyAbort(reason)) => {
*reason_store.lock().unwrap() = Some(reason.clone());
abort.store(true, Ordering::SeqCst);
break;
}
Ok(SafetyStatus::Warning(msg)) | Ok(SafetyStatus::Critical(msg)) => {
let state = TelemetryState {
cpu_model: String::new(),
total_ram_gb: 0,
tick: 0,
cpu_temp: 0.0,
power_w: 0.0,
current_freq: 0.0,
fans: Vec::new(),
governor: String::new(),
pl1_limit: 0.0,
pl2_limit: 0.0,
fan_tier: String::new(),
phase: BenchmarkPhase::StressTesting,
history_watts: Vec::new(),
history_temp: Vec::new(),
history_mhz: Vec::new(),
log_event: Some(format!("WATCHDOG: {}", msg)),
metadata: std::collections::HashMap::new(),
is_emergency: false,
emergency_reason: None,
};
let _ = tx.send(state);
}
Ok(SafetyStatus::Nominal) => {}
Err(e) => {
*reason_store.lock().unwrap() = Some(format!("Watchdog Sensor Failure: {}", e));
abort.store(true, Ordering::SeqCst);
break;
}
}
thread::sleep(Duration::from_millis(100));
}
})
}
/// Generates the final [OptimizationResult] based on current measurements.
pub fn generate_result(&self, is_partial: bool) -> OptimizationResult {
let r_theta = self.engine.calculate_thermal_resistance(&self.profile);
let knee = self.engine.find_silicon_knee(&self.profile);
let max_t = self.engine.get_max_temp(&self.profile);
OptimizationResult {
profile: self.profile.clone(),
silicon_knee_watts: knee,
thermal_resistance_kw: r_theta,
recommended_pl1: knee,
recommended_pl2: knee * 1.25,
max_temp_c: max_t,
is_partial,
config_paths: std::collections::HashMap::new(),
}
}
/// Checks if the benchmark has been aborted by the user or the watchdog.
fn check_abort(&self) -> Result<()> {
fn check_safety_abort(&self) -> Result<()> {
if self.emergency_abort.load(Ordering::SeqCst) {
let reason = self.emergency_reason.lock().unwrap().clone().unwrap_or_else(|| "Unknown safety trigger".to_string());
return Err(anyhow::anyhow!("EMERGENCY_ABORT: {}", reason));
let reason = self.emergency_reason.lock().unwrap().clone().unwrap_or_else(|| "Watchdog".to_string());
bail!("EMERGENCY_ABORT: {}", reason);
}
if let Ok(cmd) = self.command_rx.try_recv() {
match cmd {
UiCommand::Abort => {
return Err(anyhow::anyhow!("ABORTED"));
}
}
if let UiCommand::Abort = cmd { bail!("ABORTED"); }
}
Ok(())
}
/// Helper to send log messages to the frontend.
fn log(&self, msg: &str) -> Result<()> {
let state = TelemetryState {
cpu_model: self.cpu_model.clone(),
@@ -339,51 +344,38 @@ impl BenchmarkOrchestrator {
power_w: self.sal.get_power_w().unwrap_or(0.0),
current_freq: self.sal.get_freq_mhz().unwrap_or(0.0),
fans: self.sal.get_fan_rpms().unwrap_or_default(),
governor: "unknown".to_string(),
pl1_limit: 0.0,
pl2_limit: 0.0,
fan_tier: "auto".to_string(),
phase: self.phase,
history_watts: Vec::new(),
history_temp: Vec::new(),
history_mhz: Vec::new(),
governor: "performance".to_string(),
pl1_limit: 0.0, pl2_limit: 0.0, fan_tier: "auto".to_string(),
is_throttling: self.sal.get_throttling_status().unwrap_or(false),
phase: self.ui_phase,
history_watts: Vec::new(), history_temp: Vec::new(), history_mhz: Vec::new(),
log_event: Some(msg.to_string()),
metadata: std::collections::HashMap::new(),
is_emergency: self.emergency_abort.load(Ordering::SeqCst),
emergency_reason: self.emergency_reason.lock().unwrap().clone(),
};
self.telemetry_tx.send(state).map_err(|_| anyhow::anyhow!("Telemetry channel closed"))
self.telemetry_tx.send(state).map_err(|_| anyhow::anyhow!("Channel closed"))
}
/// Collects current sensors and sends a complete [TelemetryState] to the frontend.
fn send_telemetry(&mut self, tick: u64) -> Result<()> {
let temp = self.sal.get_temp().unwrap_or(0.0);
let pwr = self.sal.get_power_w().unwrap_or(0.0);
let freq = self.sal.get_freq_mhz().unwrap_or(0.0);
self.history_temp.push_back(temp);
self.history_watts.push_back(pwr);
self.history_mhz.push_back(freq);
if self.history_temp.len() > 120 {
self.history_temp.pop_front();
self.history_watts.pop_front();
self.history_mhz.pop_front();
}
if self.history_temp.len() > 120 { self.history_temp.pop_front(); self.history_watts.pop_front(); self.history_mhz.pop_front(); }
let state = TelemetryState {
cpu_model: self.cpu_model.clone(),
total_ram_gb: self.total_ram_gb,
tick,
cpu_temp: temp,
power_w: pwr,
current_freq: freq,
cpu_temp: temp, power_w: pwr, current_freq: freq,
fans: self.sal.get_fan_rpms().unwrap_or_default(),
governor: "performance".to_string(),
pl1_limit: 15.0,
pl2_limit: 25.0,
fan_tier: "max".to_string(),
phase: self.phase,
pl1_limit: 15.0, pl2_limit: 25.0, fan_tier: "max".to_string(),
is_throttling: self.sal.get_throttling_status().unwrap_or(false),
phase: self.ui_phase,
history_watts: self.history_watts.iter().cloned().collect(),
history_temp: self.history_temp.iter().cloned().collect(),
history_mhz: self.history_mhz.iter().cloned().collect(),
@@ -392,6 +384,22 @@ impl BenchmarkOrchestrator {
is_emergency: self.emergency_abort.load(Ordering::SeqCst),
emergency_reason: self.emergency_reason.lock().unwrap().clone(),
};
self.telemetry_tx.send(state).map_err(|_| anyhow::anyhow!("Telemetry channel closed"))
self.telemetry_tx.send(state).map_err(|_| anyhow::anyhow!("Channel closed"))
}
pub fn generate_result(&self, is_partial: bool) -> OptimizationResult {
let r_theta = self.profile.r_theta;
let knee = self.engine.find_silicon_knee(&self.profile);
OptimizationResult {
profile: self.profile.clone(),
silicon_knee_watts: knee,
thermal_resistance_kw: r_theta,
recommended_pl1: knee,
recommended_pl2: knee * 1.25,
max_temp_c: self.profile.points.iter().map(|p| p.temp_c).max_by(|a, b| a.partial_cmp(b).unwrap_or(std::cmp::Ordering::Equal)).unwrap_or(0.0),
is_partial,
config_paths: std::collections::HashMap::new(),
optimization_matrix: None,
}
}
}
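
Review note: the sweep above depends on `calculate_r_theta` and `predict_temp`, whose bodies are not part of this diff. A minimal sketch of what they could compute, assuming a lumped steady-state model in which temperature rise is proportional to power, is shown below together with the percentage-gain test used for the efficiency break. The function names here are hypothetical and chosen so they do not collide with the engine's own methods.

```rust
/// Thermal resistance in K/W from one steady-state measurement.
/// Assumption: temperature rise above ambient is proportional to power.
/// Returns `None` when the power reading is not positive.
pub fn r_theta_k_per_w(ambient_c: f32, steady_c: f32, power_w: f32) -> Option<f32> {
    if power_w > 0.0 {
        Some((steady_c - ambient_c) / power_w)
    } else {
        None
    }
}

/// Steady-state temperature predicted for a given power limit under the
/// same linear model.
pub fn predicted_temp_c(power_w: f32, ambient_c: f32, r_theta: f32) -> f32 {
    ambient_c + power_w * r_theta
}

/// Throughput gain of the current step over the previous one, in percent.
/// Returns `None` when there is no valid previous measurement.
pub fn gain_percent(previous_ops: f32, current_ops: f32) -> Option<f32> {
    if previous_ops > 0.0 {
        Some((current_ops - previous_ops) / previous_ops * 100.0)
    } else {
        None
    }
}
```

With a 40 °C idle baseline and 60 °C measured at 10 W, the model gives 2 K/W. A 30 W step would then be predicted at 100 °C, so a sweep with the 92 °C ceiling used above would stop before running it. Returning `Option` instead of a fallback value is a design choice of this sketch, not something the diff does.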


@@ -1,35 +1,81 @@
use super::traits::{PreflightAuditor, EnvironmentGuard, SensorBus, ActuatorBus, HardwareWatchdog, AuditError, AuditStep, SafetyStatus, EnvironmentCtx};
use crate::sal::safety::{PowerLimitWatts, FanSpeedPercent};
use anyhow::{Result, Context, anyhow};
use std::fs;
use std::path::{PathBuf};
use std::time::{Duration, Instant};
use std::thread;
use std::sync::Mutex;
use tracing::{debug};
use tracing::{info, debug};
use crate::sal::heuristic::discovery::SystemFactSheet;
/// Implementation of the System Abstraction Layer for the Dell XPS 13 9380.
pub struct DellXps9380Sal {
ctx: EnvironmentCtx,
fact_sheet: SystemFactSheet,
temp_path: PathBuf,
pwr_path: PathBuf,
fan_paths: Vec<PathBuf>,
pwm_paths: Vec<PathBuf>,
pwm_enable_paths: Vec<PathBuf>,
pl1_paths: Vec<PathBuf>,
pl2_paths: Vec<PathBuf>,
freq_path: PathBuf,
pl1_path: PathBuf,
pl2_path: PathBuf,
last_poll: Mutex<Instant>,
last_temp: Mutex<f32>,
last_fans: Mutex<Vec<u32>>,
suppressed_services: Mutex<Vec<String>>,
msr_file: Mutex<fs::File>,
last_energy: Mutex<(u64, Instant)>,
last_watts: Mutex<f32>,
}
impl DellXps9380Sal {
/// Initializes the Dell SAL, opening the MSR interface and discovering sensors and PWM nodes.
pub fn init(ctx: EnvironmentCtx, facts: SystemFactSheet) -> Result<Self> {
let temp_path = facts.temp_path.clone().context("Dell SAL requires temperature sensor")?;
let pwr_base = facts.rapl_paths.first().cloned().context("Dell SAL requires RAPL interface")?;
let fan_paths = facts.fan_paths.clone();
// 1. Discover PWM and Enable nodes associated with the fan paths
let mut pwm_paths = Vec::new();
let mut pwm_enable_paths = Vec::new();
for fan_p in &fan_paths {
if let Some(parent) = fan_p.parent() {
let fan_file = fan_p.file_name().and_then(|n| n.to_str()).unwrap_or("");
let fan_idx = fan_file.chars().filter(|c| c.is_ascii_digit()).collect::<String>();
let idx = if fan_idx.is_empty() { "1".to_string() } else { fan_idx };
let pwm_p = parent.join(format!("pwm{}", idx));
if pwm_p.exists() { pwm_paths.push(pwm_p); }
let enable_p = parent.join(format!("pwm{}_enable", idx));
if enable_p.exists() { pwm_enable_paths.push(enable_p); }
}
}
// 2. Map all RAPL constraints
let mut pl1_paths = Vec::new();
let mut pl2_paths = Vec::new();
for rapl_p in &facts.rapl_paths {
pl1_paths.push(rapl_p.join("constraint_0_power_limit_uw"));
pl2_paths.push(rapl_p.join("constraint_1_power_limit_uw"));
}
// 3. Physical Sensor Verification & Warm Cache Priming
let mut initial_fans = Vec::new();
for fan_p in &fan_paths {
let mut rpm = 0;
for _ in 0..3 {
if let Ok(val) = fs::read_to_string(fan_p) {
rpm = val.trim().parse::<u32>().unwrap_or(0);
if rpm > 0 { break; }
}
thread::sleep(Duration::from_millis(100));
}
info!("SAL Warm-Start: Fan sensor {:?} -> {} RPM", fan_p, rpm);
initial_fans.push(rpm);
}
let freq_path = ctx.sysfs_base.join("sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq");
let msr_path = ctx.sysfs_base.join("dev/cpu/0/msr");
@@ -38,19 +84,24 @@ impl DellXps9380Sal {
let initial_energy = fs::read_to_string(pwr_base.join("energy_uj")).unwrap_or_default().trim().parse().unwrap_or(0);
info!("SAL: Dell XPS 9380 Initialized. ({} fans, {} RAPL nodes found)",
fan_paths.len(), facts.rapl_paths.len());
Ok(Self {
temp_path,
pwr_path: pwr_base.join("power1_average"),
fan_paths,
pwm_paths,
pwm_enable_paths,
pl1_paths,
pl2_paths,
freq_path,
pl1_path: pwr_base.join("constraint_0_power_limit_uw"),
pl2_path: pwr_base.join("constraint_1_power_limit_uw"),
last_poll: Mutex::new(Instant::now() - Duration::from_secs(2)),
last_temp: Mutex::new(0.0),
last_fans: Mutex::new(Vec::new()),
suppressed_services: Mutex::new(Vec::new()),
last_fans: Mutex::new(initial_fans),
msr_file: Mutex::new(msr_file),
last_energy: Mutex::new((initial_energy, Instant::now())),
last_watts: Mutex::new(0.0),
fact_sheet: facts,
ctx,
})
@@ -80,14 +131,24 @@ impl PreflightAuditor for DellXps9380Sal {
outcome: if unsafe { libc::getuid() } == 0 { Ok(()) } else { Err(AuditError::RootRequired) }
});
let rapl_lock = match self.read_msr(0x610) {
Ok(val) => {
if (val & (1 << 63)) != 0 {
Err(AuditError::KernelIncompatible("RAPL Registers are locked by BIOS. Power limit tuning is impossible.".to_string()))
} else {
Ok(())
}
},
Err(e) => Err(AuditError::ToolMissing(format!("Cannot read MSR 0x610: {}", e))),
};
steps.push(AuditStep { description: "MSR 0x610 RAPL Lock Status".to_string(), outcome: rapl_lock });
let modules = ["dell_smm_hwmon", "msr", "intel_rapl_msr"];
for mod_name in modules {
let path = self.ctx.sysfs_base.join(format!("sys/module/{}", mod_name));
steps.push(AuditStep {
description: format!("Kernel Module: {}", mod_name),
outcome: if path.exists() { Ok(()) } else {
Err(AuditError::ToolMissing(format!("Module '{}' not loaded.", mod_name)))
}
outcome: if path.exists() { Ok(()) } else { Err(AuditError::ToolMissing(format!("Module '{}' not loaded.", mod_name))) }
});
}
@@ -109,15 +170,7 @@ impl PreflightAuditor for DellXps9380Sal {
let ac_status = fs::read_to_string(ac_status_path).unwrap_or_else(|_| "0".to_string());
steps.push(AuditStep {
description: "AC Power Connection".to_string(),
outcome: if ac_status.trim() == "1" { Ok(()) } else {
Err(AuditError::AcPowerMissing("System must be on AC power".to_string()))
}
});
let tool_check = self.fact_sheet.paths.tools.contains_key("dell_fan_ctrl");
steps.push(AuditStep {
description: "Dell Fan Control Tool".to_string(),
outcome: if tool_check { Ok(()) } else { Err(AuditError::ToolMissing("dell-bios-fan-control not found in PATH".to_string())) }
outcome: if ac_status.trim() == "1" { Ok(()) } else { Err(AuditError::AcPowerMissing("System must be on AC power".to_string())) }
});
Box::new(steps.into_iter())
@@ -125,33 +178,16 @@ impl PreflightAuditor for DellXps9380Sal {
}
impl EnvironmentGuard for DellXps9380Sal {
fn suppress(&self) -> Result<()> {
let services = ["tlp", "thermald", "i8kmon"];
let mut suppressed = self.suppressed_services.lock().unwrap();
for s in services {
if self.ctx.runner.run("systemctl", &["is-active", "--quiet", s]).is_ok() {
debug!("Suppressing service: {}", s);
self.ctx.runner.run("systemctl", &["stop", s])?;
suppressed.push(s.to_string());
}
}
Ok(())
}
fn restore(&self) -> Result<()> {
let mut suppressed = self.suppressed_services.lock().unwrap();
for s in suppressed.drain(..) {
let _ = self.ctx.runner.run("systemctl", &["start", &s]);
}
Ok(())
}
fn suppress(&self) -> Result<()> { Ok(()) }
fn restore(&self) -> Result<()> { Ok(()) }
}
impl SensorBus for DellXps9380Sal {
fn get_temp(&self) -> Result<f32> {
let mut last_poll = self.last_poll.lock().unwrap();
let now = Instant::now();
if now.duration_since(*last_poll) < Duration::from_millis(1000) {
// # SAFETY: High frequency polling for watchdog
if now.duration_since(*last_poll) < Duration::from_millis(100) {
return Ok(*self.last_temp.lock().unwrap());
}
let s = fs::read_to_string(&self.temp_path)?;
@@ -162,16 +198,24 @@ impl SensorBus for DellXps9380Sal {
}
fn get_power_w(&self) -> Result<f32> {
if self.pwr_path.to_string_lossy().contains("energy_uj") {
let mut last = self.last_energy.lock().unwrap();
let e2 = fs::read_to_string(&self.pwr_path)?.trim().parse::<u64>()?;
let rapl_base = self.fact_sheet.rapl_paths.first().context("RAPL path error")?;
let energy_path = rapl_base.join("energy_uj");
if energy_path.exists() {
let mut last_energy = self.last_energy.lock().unwrap();
let mut last_watts = self.last_watts.lock().unwrap();
let e2_str = fs::read_to_string(&energy_path)?;
let e2 = e2_str.trim().parse::<u64>()?;
let t2 = Instant::now();
let (e1, t1) = *last;
let (e1, t1) = *last_energy;
let delta_e = e2.wrapping_sub(e1);
let delta_t = t2.duration_since(t1).as_secs_f32();
*last = (e2, t2);
if delta_t < 0.01 { return Ok(0.0); }
Ok((delta_e as f32 / 1_000_000.0) / delta_t)
if delta_t < 0.1 { return Ok(*last_watts); }
let watts = (delta_e as f32 / 1_000_000.0) / delta_t;
*last_energy = (e2, t2);
*last_watts = watts;
Ok(watts)
} else {
let s = fs::read_to_string(&self.pwr_path)?;
Ok(s.trim().parse::<f32>()? / 1000000.0)
@@ -184,12 +228,27 @@ impl SensorBus for DellXps9380Sal {
if now.duration_since(*last_poll) < Duration::from_millis(1000) {
return Ok(self.last_fans.lock().unwrap().clone());
}
let mut fans = Vec::new();
for path in &self.fan_paths {
if let Ok(s) = fs::read_to_string(path) {
if let Ok(rpm) = s.trim().parse::<u32>() { fans.push(rpm); }
let mut val = 0;
for i in 0..5 {
match fs::read_to_string(path) {
Ok(s) => {
if let Ok(rpm) = s.trim().parse::<u32>() {
val = rpm;
if rpm > 0 { break; }
}
},
Err(e) => {
debug!("SAL: Fan poll retry {} for {:?} failed: {}", i+1, path, e);
}
}
thread::sleep(Duration::from_millis(150));
}
fans.push(val);
}
*self.last_fans.lock().unwrap() = fans.clone();
*last_poll = now;
Ok(fans)
@@ -199,6 +258,11 @@ impl SensorBus for DellXps9380Sal {
let s = fs::read_to_string(&self.freq_path)?;
Ok(s.trim().parse::<f32>()? / 1000.0)
}
fn get_throttling_status(&self) -> Result<bool> {
// IA32_THERM_STATUS (MSR 0x19C): bit 0 is the live thermal-status flag.
let val = self.read_msr(0x19C)?;
Ok((val & 0x1) != 0)
}
}
impl ActuatorBus for DellXps9380Sal {
@@ -208,20 +272,47 @@ impl ActuatorBus for DellXps9380Sal {
let tool_str = tool_path.to_string_lossy();
match mode {
"max" | "Manual" => { self.ctx.runner.run(&tool_str, &["0"])?; }
"max" | "Manual" => {
self.ctx.runner.run(&tool_str, &["0"])?;
// Disabling BIOS control requires immediate PWM override
self.set_fan_speed(FanSpeedPercent::new(100)?)?;
}
"auto" | "Auto" => { self.ctx.runner.run(&tool_str, &["1"])?; }
_ => { debug!("Unknown fan mode: {}", mode); }
_ => {}
}
Ok(())
}
fn set_sustained_power_limit(&self, watts: f32) -> Result<()> {
fs::write(&self.pl1_path, ((watts * 1_000_000.0) as u64).to_string())?;
fn set_fan_speed(&self, speed: FanSpeedPercent) -> Result<()> {
// Scale the validated 0-100% value onto the 0-255 hwmon PWM range.
let pwm_val = ((speed.get() as u32 * 255) / 100) as u8;
for p in &self.pwm_enable_paths { let _ = fs::write(p, "1"); }
for path in &self.pwm_paths { let _ = fs::write(path, pwm_val.to_string()); }
Ok(())
}
fn set_burst_power_limit(&self, watts: f32) -> Result<()> {
fs::write(&self.pl2_path, ((watts * 1_000_000.0) as u64).to_string())?;
fn set_sustained_power_limit(&self, limit: PowerLimitWatts) -> Result<()> {
for path in &self.pl1_paths {
debug!("SAL: Applying PL1 ({:.1}W) to {:?}", limit.get(), path);
fs::write(path, limit.as_microwatts().to_string())
.with_context(|| format!("Failed to write PL1 to {:?}", path))?;
if let Some(parent) = path.parent() {
let enable_p = parent.join("constraint_0_enabled");
let _ = fs::write(&enable_p, "1");
}
}
Ok(())
}
fn set_burst_power_limit(&self, limit: PowerLimitWatts) -> Result<()> {
for path in &self.pl2_paths {
debug!("SAL: Applying PL2 ({:.1}W) to {:?}", limit.get(), path);
fs::write(path, limit.as_microwatts().to_string())
.with_context(|| format!("Failed to write PL2 to {:?}", path))?;
if let Some(parent) = path.parent() {
let enable_p = parent.join("constraint_1_enabled");
let _ = fs::write(&enable_p, "1");
}
}
Ok(())
}
}
@@ -243,7 +334,5 @@ impl HardwareWatchdog for DellXps9380Sal {
}
impl Drop for DellXps9380Sal {
fn drop(&mut self) {
let _ = self.restore();
}
fn drop(&mut self) { }
}
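
The RAPL branch of `get_power_w` above turns two `energy_uj` readings into average watts. The arithmetic can be sketched in isolation as below. This is a minimal sketch, not the patch's code: `watts_from_samples` and its parameters are hypothetical names, and it assumes the counter wraps at the zone's `max_energy_range_uj` value (a file the powercap sysfs interface exposes) rather than at `u64::MAX`, which is what `wrapping_sub` models.

```rust
/// Hypothetical helper (not part of the patch): average power in watts
/// between two RAPL energy samples taken `delta_t_s` seconds apart.
fn watts_from_samples(e1_uj: u64, e2_uj: u64, max_range_uj: u64, delta_t_s: f32) -> Option<f32> {
    if delta_t_s <= 0.0 {
        return None; // no elapsed time, no meaningful rate
    }
    // Assumption: the counter wraps at max_range_uj, so a smaller second
    // reading is treated as exactly one wrap-around.
    let delta_uj = if e2_uj >= e1_uj {
        e2_uj - e1_uj
    } else {
        max_range_uj.saturating_sub(e1_uj).saturating_add(e2_uj)
    };
    // Microjoules to joules, then joules per second.
    Some((delta_uj as f32 / 1_000_000.0) / delta_t_s)
}

fn main() {
    // 15 J consumed over one second; the range value is only an example.
    let w = watts_from_samples(1_000_000, 16_000_000, 262_143_328_850, 1.0);
    println!("{:?}", w);
}
```

Returning `None` for a zero interval leaves the caller free to fall back to a cached reading, which is what the new `last_watts` field does for intervals under 100 ms.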

148
src/sal/discovery.rs Normal file

@@ -0,0 +1,148 @@
//! # Hardware Discovery Engine (Agent Sentinel)
//!
//! This module provides dynamic traversal of `/sys/class/hwmon` and `/sys/class/powercap`
//! to locate sensors and actuators without relying on hardcoded indices.
use anyhow::{Result, Context, anyhow};
use std::fs;
use std::path::{Path, PathBuf};
use tracing::{debug, info, warn};
/// Result of a successful hardware discovery.
#[derive(Debug, Clone)]
pub struct DiscoveredHardware {
/// Path to the primary package temperature sensor input.
pub temp_input: PathBuf,
/// Paths to all detected fan RPM inputs.
pub fan_inputs: Vec<PathBuf>,
/// Paths to all detected fan PWM control nodes.
pub pwm_controls: Vec<PathBuf>,
/// Paths to all detected fan PWM enable nodes.
pub pwm_enables: Vec<PathBuf>,
/// Paths to RAPL powercap zone directories (each holds `energy_uj` and the constraint files).
pub rapl_paths: Vec<PathBuf>,
}
pub struct DiscoveryEngine;
impl DiscoveryEngine {
/// Performs a full traversal of the sysfs hardware tree.
pub fn run(sysfs_root: &Path) -> Result<DiscoveredHardware> {
info!("Sentinel: Starting dynamic hardware discovery...");
let hwmon_path = sysfs_root.join("sys/class/hwmon");
let (temp_input, fan_info) = Self::discover_hwmon(&hwmon_path)?;
let powercap_path = sysfs_root.join("sys/class/powercap");
let rapl_paths = Self::discover_rapl(&powercap_path)?;
let hardware = DiscoveredHardware {
temp_input,
fan_inputs: fan_info.rpm_inputs,
pwm_controls: fan_info.pwm_controls,
pwm_enables: fan_info.pwm_enables,
rapl_paths,
};
info!("Sentinel: Discovery complete. Found {} fans and {} RAPL nodes.",
hardware.fan_inputs.len(), hardware.rapl_paths.len());
Ok(hardware)
}
fn discover_hwmon(base: &Path) -> Result<(PathBuf, FanHardware)> {
let mut best_temp: Option<(u32, PathBuf)> = None;
let mut fans = FanHardware::default();
let entries = fs::read_dir(base)
.with_context(|| format!("Failed to read hwmon base: {:?}", base))?;
for entry in entries.flatten() {
let path = entry.path();
let driver_name = fs::read_to_string(path.join("name"))
.map(|s| s.trim().to_string())
.unwrap_or_else(|_| "unknown".to_string());
debug!("Discovery: Probing hwmon node {:?} (driver: {})", path, driver_name);
// 1. Temperature Discovery
let temp_priority = match driver_name.as_str() {
"coretemp" | "zenpower" => 10,
"k10temp" => 9,
"dell_smm" => 8,
"acpitz" => 1,
_ => 5,
};
if let Ok(hw_entries) = fs::read_dir(&path) {
for hw_entry in hw_entries.flatten() {
let file_name = hw_entry.file_name().to_string_lossy().to_string();
// Temperature Inputs
if file_name.starts_with("temp") && file_name.ends_with("_input") {
let label_path = path.join(file_name.replace("_input", "_label"));
let label = fs::read_to_string(label_path).unwrap_or_default().trim().to_string();
let label_priority = if label.contains("Package") || label.contains("Tdie") {
2
} else {
0
};
let total_priority = temp_priority + label_priority;
if best_temp.as_ref().map_or(true, |(p, _)| total_priority > *p) {
best_temp = Some((total_priority, hw_entry.path()));
}
}
// Fan Inputs
if file_name.starts_with("fan") && file_name.ends_with("_input") {
fans.rpm_inputs.push(hw_entry.path());
}
// PWM Controls
if file_name.starts_with("pwm") && !file_name.contains("_") {
fans.pwm_controls.push(hw_entry.path());
}
// PWM Enables
if file_name.starts_with("pwm") && file_name.ends_with("_enable") {
fans.pwm_enables.push(hw_entry.path());
}
}
}
}
let temp_input = best_temp.map(|(_, p)| p)
.ok_or_else(|| anyhow!("Failed to locate any valid temperature sensor under {:?}", base))?;
Ok((temp_input, fans))
}
fn discover_rapl(base: &Path) -> Result<Vec<PathBuf>> {
let mut paths = Vec::new();
if !base.exists() {
warn!("Discovery: {:?} does not exist; continuing without RAPL nodes.", base);
return Ok(paths);
}
let entries = fs::read_dir(base)?;
for entry in entries.flatten() {
let path = entry.path();
let name = fs::read_to_string(path.join("name")).unwrap_or_default().trim().to_string();
if name.contains("package") || name.contains("intel-rapl") {
paths.push(path);
}
}
Ok(paths)
}
}
#[derive(Default)]
struct FanHardware {
rpm_inputs: Vec<PathBuf>,
pwm_controls: Vec<PathBuf>,
pwm_enables: Vec<PathBuf>,
}
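
The temperature selection in `discover_hwmon` is a two-part score: a base value per driver plus a bonus for package-level labels, with the highest total winning. The sketch below reproduces that scoring without touching the filesystem. `score` mirrors the values in the patch; `pick_best` and the tuple layout `(driver, label, path)` are hypothetical names introduced only for illustration.

```rust
/// Base score per hwmon driver plus a bonus when the label names the
/// whole package or die (same numbers as the patch).
fn score(driver: &str, label: &str) -> u32 {
    let base = match driver {
        "coretemp" | "zenpower" => 10,
        "k10temp" => 9,
        "dell_smm" => 8,
        "acpitz" => 1,
        _ => 5,
    };
    let bonus = if label.contains("Package") || label.contains("Tdie") { 2 } else { 0 };
    base + bonus
}

/// Hypothetical helper: returns the path of the best-scoring candidate.
/// The strict `>` means the earliest candidate wins a tie.
fn pick_best<'a>(candidates: &[(&'a str, &'a str, &'a str)]) -> Option<&'a str> {
    let mut best: Option<(u32, &'a str)> = None;
    for &(driver, label, path) in candidates {
        let s = score(driver, label);
        if best.map_or(true, |(b, _)| s > b) {
            best = Some((s, path));
        }
    }
    best.map(|(_, p)| p)
}

fn main() {
    let found = pick_best(&[
        ("acpitz", "", "hwmon0/temp1_input"),
        ("coretemp", "Core 0", "hwmon3/temp2_input"),
        ("coretemp", "Package id 0", "hwmon3/temp1_input"),
    ]);
    println!("{:?}", found);
}
```

Because ties go to the first entry seen and `read_dir` order is not guaranteed, two sensors with equal scores can resolve differently between runs; sorting the candidates first would make the choice deterministic.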


@@ -1,10 +1,11 @@
use anyhow::{Result, anyhow};
use anyhow::{Result, anyhow, Context};
use std::path::{Path};
use std::fs;
use std::time::{Duration, Instant};
use std::sync::Mutex;
use crate::sal::traits::{SensorBus, ActuatorBus, EnvironmentGuard, HardwareWatchdog, PreflightAuditor, AuditStep, AuditError, SafetyStatus, EnvironmentCtx};
use crate::sal::safety::{PowerLimitWatts, FanSpeedPercent};
use crate::sal::heuristic::discovery::SystemFactSheet;
use crate::sal::heuristic::schema::HardwareDb;
@@ -12,9 +13,8 @@ pub struct GenericLinuxSal {
ctx: EnvironmentCtx,
fact_sheet: SystemFactSheet,
db: HardwareDb,
suppressed_services: Mutex<Vec<String>>,
last_valid_temp: Mutex<(f32, Instant)>,
current_pl1: Mutex<f32>,
current_pl1: Mutex<u64>,
last_energy: Mutex<(u64, Instant)>,
}
@@ -28,9 +28,8 @@ impl GenericLinuxSal {
Self {
db,
suppressed_services: Mutex::new(Vec::new()),
last_valid_temp: Mutex::new((0.0, Instant::now())),
current_pl1: Mutex::new(15.0),
current_pl1: Mutex::new(15_000_000),
last_energy: Mutex::new((initial_energy, Instant::now())),
fact_sheet: facts,
ctx,
@@ -95,7 +94,7 @@ impl SensorBus for GenericLinuxSal {
let delta_e = e2.wrapping_sub(e1);
let delta_t = t2.duration_since(t1).as_secs_f32();
*last = (e2, t2);
if delta_t < 0.01 { return Ok(0.0); }
if delta_t < 0.05 { return Ok(0.0); }
Ok((delta_e as f32 / 1_000_000.0) / delta_t)
}
@@ -126,6 +125,22 @@ impl SensorBus for GenericLinuxSal {
Err(anyhow!("Could not determine CPU frequency"))
}
}
fn get_throttling_status(&self) -> Result<bool> {
let cooling_base = self.ctx.sysfs_base.join("sys/class/thermal");
if let Ok(entries) = fs::read_dir(cooling_base) {
for entry in entries.flatten() {
if entry.file_name().to_string_lossy().starts_with("cooling_device") {
if let Ok(state) = fs::read_to_string(entry.path().join("cur_state")) {
if state.trim().parse::<u32>().unwrap_or(0) > 0 {
return Ok(true);
}
}
}
}
}
Ok(false)
}
}
impl ActuatorBus for GenericLinuxSal {
@@ -144,44 +159,37 @@ impl ActuatorBus for GenericLinuxSal {
} else { Ok(()) }
}
fn set_sustained_power_limit(&self, watts: f32) -> Result<()> {
let rapl_path = self.fact_sheet.rapl_paths.first().ok_or_else(|| anyhow!("No PL1 path"))?;
fs::write(rapl_path.join("constraint_0_power_limit_uw"), ((watts * 1_000_000.0) as u64).to_string())?;
*self.current_pl1.lock().unwrap() = watts;
fn set_fan_speed(&self, _speed: FanSpeedPercent) -> Result<()> {
Ok(())
}
fn set_burst_power_limit(&self, watts: f32) -> Result<()> {
let rapl_path = self.fact_sheet.rapl_paths.first().ok_or_else(|| anyhow!("No PL2 path"))?;
fs::write(rapl_path.join("constraint_1_power_limit_uw"), ((watts * 1_000_000.0) as u64).to_string())?;
fn set_sustained_power_limit(&self, limit: PowerLimitWatts) -> Result<()> {
for rapl_path in &self.fact_sheet.rapl_paths {
let limit_path = rapl_path.join("constraint_0_power_limit_uw");
let enable_path = rapl_path.join("constraint_0_enabled");
fs::write(&limit_path, limit.as_microwatts().to_string())
.with_context(|| format!("Failed to write PL1 to {:?}", limit_path))?;
let _ = fs::write(&enable_path, "1");
}
*self.current_pl1.lock().unwrap() = limit.as_microwatts();
Ok(())
}
fn set_burst_power_limit(&self, limit: PowerLimitWatts) -> Result<()> {
for rapl_path in &self.fact_sheet.rapl_paths {
let limit_path = rapl_path.join("constraint_1_power_limit_uw");
let enable_path = rapl_path.join("constraint_1_enabled");
fs::write(&limit_path, limit.as_microwatts().to_string())
.with_context(|| format!("Failed to write PL2 to {:?}", limit_path))?;
let _ = fs::write(&enable_path, "1");
}
Ok(())
}
}
impl EnvironmentGuard for GenericLinuxSal {
fn suppress(&self) -> Result<()> {
let mut suppressed = self.suppressed_services.lock().unwrap();
for conflict_id in &self.fact_sheet.active_conflicts {
if let Some(conflict) = self.db.conflicts.iter().find(|c| &c.id == conflict_id) {
for service in &conflict.services {
if self.ctx.runner.run("systemctl", &["is-active", "--quiet", service]).is_ok() {
self.ctx.runner.run("systemctl", &["stop", service])?;
suppressed.push(service.clone());
}
}
}
}
Ok(())
}
fn restore(&self) -> Result<()> {
let mut suppressed = self.suppressed_services.lock().unwrap();
for service in suppressed.drain(..) {
let _ = self.ctx.runner.run("systemctl", &["start", &service]);
}
if self.is_dell() { let _ = self.set_fan_mode("auto"); }
Ok(())
}
fn suppress(&self) -> Result<()> { Ok(()) }
fn restore(&self) -> Result<()> { Ok(()) }
}
impl HardwareWatchdog for GenericLinuxSal {
@@ -197,7 +205,3 @@ impl HardwareWatchdog for GenericLinuxSal {
Ok(SafetyStatus::Nominal)
}
}
impl Drop for GenericLinuxSal {
fn drop(&mut self) { let _ = self.restore(); }
}


@@ -1,12 +1,12 @@
use std::fs;
use std::path::{Path, PathBuf};
use std::process::Command;
use std::time::{Duration};
use std::thread;
use std::sync::mpsc;
use std::collections::HashMap;
use crate::sal::heuristic::schema::{SensorDiscovery, ActuatorDiscovery, Conflict, Discovery, Benchmarking};
use tracing::{debug, warn};
use crate::sys::SyscallRunner;
use tracing::{debug, warn, info};
/// Registry of dynamically discovered paths for configs and tools.
#[derive(Debug, Clone, Default)]
@@ -24,6 +24,7 @@ pub struct SystemFactSheet {
pub fan_paths: Vec<PathBuf>,
pub rapl_paths: Vec<PathBuf>,
pub active_conflicts: Vec<String>,
pub conflict_services: Vec<String>,
pub paths: PathRegistry,
pub bench_config: Option<Benchmarking>,
}
@@ -31,6 +32,7 @@ pub struct SystemFactSheet {
/// Probes the system for hardware sensors, actuators, service conflicts, and paths.
pub fn discover_facts(
base_path: &Path,
runner: &dyn SyscallRunner,
discovery: &Discovery,
conflicts: &[Conflict],
bench_config: Benchmarking,
@@ -43,12 +45,17 @@ pub fn discover_facts(
let rapl_paths = discover_rapl(base_path, &discovery.actuators);
let mut active_conflicts = Vec::new();
let mut conflict_services = Vec::new();
for conflict in conflicts {
let mut found_active = false;
for service in &conflict.services {
if is_service_active(service) {
if is_service_active(runner, service) {
if !found_active {
debug!("Detected active conflict: {} (Service: {})", conflict.id, service);
active_conflicts.push(conflict.id.clone());
break;
found_active = true;
}
conflict_services.push(service.clone());
}
}
}
@@ -56,13 +63,7 @@ pub fn discover_facts(
let paths = discover_paths(base_path, discovery);
SystemFactSheet {
vendor,
model,
temp_path,
fan_paths,
rapl_paths,
active_conflicts,
paths,
vendor, model, temp_path, fan_paths, rapl_paths, active_conflicts, conflict_services, paths,
bench_config: Some(bench_config),
}
}
@@ -70,7 +71,6 @@ pub fn discover_facts(
fn discover_paths(base_path: &Path, discovery: &Discovery) -> PathRegistry {
let mut registry = PathRegistry::default();
// 1. Discover Tools via PATH
for (id, binary_name) in &discovery.tools {
if let Ok(path) = which::which(binary_name) {
debug!("Discovered tool: {} -> {:?}", id, path);
@@ -78,7 +78,6 @@ fn discover_paths(base_path: &Path, discovery: &Discovery) -> PathRegistry {
}
}
// 2. Discover Configs via existence check
for (id, candidates) in &discovery.configs {
for candidate in candidates {
let path = if candidate.starts_with('/') {
@@ -93,7 +92,6 @@ fn discover_paths(base_path: &Path, discovery: &Discovery) -> PathRegistry {
break;
}
}
// If not found, use the first one as default if any exist
if !registry.configs.contains_key(id) {
if let Some(first) = candidates.first() {
registry.configs.insert(id.clone(), PathBuf::from(first));
@@ -104,12 +102,11 @@ fn discover_paths(base_path: &Path, discovery: &Discovery) -> PathRegistry {
registry
}
/// Reads DMI information from sysfs with a safety timeout.
fn read_dmi_info(base_path: &Path) -> (String, String) {
let vendor = read_sysfs_with_timeout(&base_path.join("sys/class/dmi/id/sys_vendor"), Duration::from_millis(100))
.unwrap_or_else(|| "Unknown".to_string());
let model = read_sysfs_with_timeout(&base_path.join("sys/class/dmi/id/product_name"), Duration::from_millis(100))
.unwrap_or_else(|| "Unknown".to_string());
let vendor = fs::read_to_string(base_path.join("sys/class/dmi/id/sys_vendor"))
.map(|s| s.trim().to_string()).unwrap_or_else(|_| "Unknown".to_string());
let model = fs::read_to_string(base_path.join("sys/class/dmi/id/product_name"))
.map(|s| s.trim().to_string()).unwrap_or_else(|_| "Unknown".to_string());
(vendor, model)
}
@@ -119,32 +116,32 @@ fn discover_hwmon(base_path: &Path, cfg: &SensorDiscovery) -> (Option<PathBuf>,
let mut fan_candidates = Vec::new();
let hwmon_base = base_path.join("sys/class/hwmon");
let entries = match fs::read_dir(&hwmon_base) {
Ok(e) => e,
Err(e) => {
let entries = fs::read_dir(&hwmon_base).map_err(|e| {
warn!("Could not read {:?}: {}", hwmon_base, e);
return (None, Vec::new());
}
};
e
}).ok();
if let Some(entries) = entries {
for entry in entries.flatten() {
let hwmon_path = entry.path();
let driver_name = read_sysfs_with_timeout(&hwmon_path.join("name"), Duration::from_millis(100))
.unwrap_or_default();
// # SAFETY: Read driver name directly. This file is virtual and never blocks.
// Using a timeout wrapper here was causing discovery to fail if the thread-pool lagged.
let driver_name = fs::read_to_string(hwmon_path.join("name"))
.map(|s| s.trim().to_string()).unwrap_or_default();
let priority = cfg.hwmon_priority
.iter()
.position(|p| p == &driver_name)
.position(|p| driver_name.contains(p))
.unwrap_or(usize::MAX);
if let Ok(hw_entries) = fs::read_dir(&hwmon_path) {
for hw_entry in hw_entries.flatten() {
let file_name = hw_entry.file_name().into_string().unwrap_or_default();
// Temperature Sensors
// 1. Temperatures
if file_name.starts_with("temp") && file_name.ends_with("_label") {
if let Some(label) = read_sysfs_with_timeout(&hw_entry.path(), Duration::from_millis(100)) {
if let Some(label) = read_sysfs_with_timeout(&hw_entry.path(), Duration::from_millis(500)) {
if cfg.temp_labels.iter().any(|l| label.contains(l)) {
let input_path = hwmon_path.join(file_name.replace("_label", "_input"));
if input_path.exists() {
@@ -154,17 +151,28 @@ fn discover_hwmon(base_path: &Path, cfg: &SensorDiscovery) -> (Option<PathBuf>,
}
}
// Fan Sensors
// 2. Fans (Label Match)
if file_name.starts_with("fan") && file_name.ends_with("_label") {
if let Some(label) = read_sysfs_with_timeout(&hw_entry.path(), Duration::from_millis(100)) {
if let Some(label) = read_sysfs_with_timeout(&hw_entry.path(), Duration::from_millis(500)) {
if cfg.fan_labels.iter().any(|l| label.contains(l)) {
let input_path = hwmon_path.join(file_name.replace("_label", "_input"));
if input_path.exists() {
debug!("Discovered fan by label: {:?} (priority {})", input_path, priority);
fan_candidates.push((priority, input_path));
}
}
}
}
// 3. Fans (Priority Fallback - CRITICAL FOR DELL 9380)
// If we found a priority driver (e.g., dell_smm), we take every fan*_input we find.
if priority < usize::MAX && file_name.starts_with("fan") && file_name.ends_with("_input") {
if !fan_candidates.iter().any(|(_, p)| p == &hw_entry.path()) {
info!("Heuristic Discovery: Force-adding unlabeled fan sensor from priority driver '{}': {:?}", driver_name, hw_entry.path());
fan_candidates.push((priority, hw_entry.path()));
}
}
}
}
}
}
@@ -173,21 +181,22 @@ fn discover_hwmon(base_path: &Path, cfg: &SensorDiscovery) -> (Option<PathBuf>,
fan_candidates.sort_by_key(|(p, _)| *p);
let best_temp = temp_candidates.first().map(|(_, p)| p.clone());
let best_fans = fan_candidates.into_iter().map(|(_, p)| p).collect();
let best_fans: Vec<PathBuf> = fan_candidates.into_iter().map(|(_, p)| p).collect();
if best_fans.is_empty() {
warn!("Heuristic Discovery: No fan RPM sensors found.");
} else {
info!("Heuristic Discovery: Final registry contains {} fan sensors.", best_fans.len());
}
(best_temp, best_fans)
}
/// Discovers RAPL powercap paths.
fn discover_rapl(base_path: &Path, cfg: &ActuatorDiscovery) -> Vec<PathBuf> {
let mut paths = Vec::new();
let powercap_base = base_path.join("sys/class/powercap");
let entries = match fs::read_dir(&powercap_base) {
Ok(e) => e,
Err(_) => return Vec::new(),
};
if let Ok(entries) = fs::read_dir(&powercap_base) {
for entry in entries.flatten() {
let path = entry.path();
let dir_name = entry.file_name().into_string().unwrap_or_default();
@@ -197,30 +206,20 @@ fn discover_rapl(base_path: &Path, cfg: &ActuatorDiscovery) -> Vec<PathBuf> {
continue;
}
if let Some(name) = read_sysfs_with_timeout(&path.join("name"), Duration::from_millis(100)) {
if cfg.rapl_paths.iter().any(|p| p == &name) {
if let Ok(name) = fs::read_to_string(path.join("name")) {
if cfg.rapl_paths.iter().any(|p| p == name.trim()) {
paths.push(path);
}
}
}
}
paths
}
/// Checks if a systemd service is currently active.
pub fn is_service_active(service: &str) -> bool {
let status = Command::new("systemctl")
.arg("is-active")
.arg("--quiet")
.arg(service)
.status();
match status {
Ok(s) => s.success(),
Err(_) => false,
}
pub fn is_service_active(runner: &dyn SyscallRunner, service: &str) -> bool {
runner.run("systemctl", &["is-active", "--quiet", service]).is_ok()
}
/// Helper to read a sysfs file with a timeout.
fn read_sysfs_with_timeout(path: &Path, timeout: Duration) -> Option<String> {
let (tx, rx) = mpsc::channel();
let path_buf = path.to_path_buf();
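
The hunk ends before the body of `read_sysfs_with_timeout`, so the project's actual implementation is not visible here. For readers unfamiliar with the pattern, one common way to bound a blocking read is to do it on a helper thread and wait on a channel with `recv_timeout`. The sketch below is an assumption about how such a helper could look, not the author's code; `read_with_timeout` and `demo` are hypothetical names.

```rust
use std::fs;
use std::path::Path;
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// Hypothetical sketch: reads `path` on a helper thread and gives up after `timeout`.
fn read_with_timeout(path: &Path, timeout: Duration) -> Option<String> {
    let (tx, rx) = mpsc::channel();
    let path_buf = path.to_path_buf();
    thread::spawn(move || {
        // The send fails harmlessly if the caller has already timed out.
        let _ = tx.send(fs::read_to_string(&path_buf).ok());
    });
    match rx.recv_timeout(timeout) {
        Ok(Some(s)) => Some(s.trim().to_string()),
        _ => None, // read error, timeout, or the helper thread died
    }
}

/// Exercises the helper on a scratch file, then on a missing file.
fn demo() -> (Option<String>, Option<String>) {
    let p = std::env::temp_dir().join(format!("read_timeout_demo_{}", std::process::id()));
    fs::write(&p, "coretemp\n").expect("scratch file should be writable");
    let present = read_with_timeout(&p, Duration::from_millis(500));
    fs::remove_file(&p).expect("scratch file should be removable");
    let missing = read_with_timeout(&p, Duration::from_millis(500));
    (present, missing)
}

fn main() {
    println!("{:?}", demo());
}
```

A read that never returns leaves its helper thread parked for the life of the process, and a loaded machine can miss a short deadline even though the file is readable. That trade-off is consistent with the patch reading `name` files directly and raising the label timeouts to 500 ms.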


@@ -24,7 +24,7 @@ impl HeuristicEngine {
.context("Failed to parse hardware_db.toml")?;
// 2. Discover Facts
let facts = discover_facts(&ctx.sysfs_base, &db.discovery, &db.conflicts, db.benchmarking.clone());
let facts = discover_facts(&ctx.sysfs_base, ctx.runner.as_ref(), &db.discovery, &db.conflicts, db.benchmarking.clone());
info!("System Identity: {} {}", facts.vendor, facts.model);
// 3. Routing Logic


@@ -1,4 +1,5 @@
use super::traits::{PreflightAuditor, EnvironmentGuard, SensorBus, ActuatorBus, HardwareWatchdog, AuditStep, SafetyStatus};
use crate::sal::safety::{PowerLimitWatts, FanSpeedPercent};
use anyhow::Result;
pub struct MockSal {
@@ -16,59 +17,36 @@ impl MockSal {
impl PreflightAuditor for MockSal {
fn audit(&self) -> Box<dyn Iterator<Item = AuditStep> + '_> {
let steps = vec![
AuditStep {
description: "Mock Root Privileges".to_string(),
outcome: Ok(()),
},
AuditStep {
description: "Mock AC Power Status".to_string(),
outcome: Ok(()),
},
AuditStep { description: "Mock Root Privileges".to_string(), outcome: Ok(()) },
AuditStep { description: "Mock AC Power Status".to_string(), outcome: Ok(()) },
];
Box::new(steps.into_iter())
}
}
impl EnvironmentGuard for MockSal {
fn suppress(&self) -> Result<()> {
Ok(())
}
fn restore(&self) -> Result<()> {
Ok(())
}
fn suppress(&self) -> Result<()> { Ok(()) }
fn restore(&self) -> Result<()> { Ok(()) }
}
impl SensorBus for MockSal {
fn get_temp(&self) -> Result<f32> {
// Support dynamic sequence for Step 5
let seq = self.temperature_sequence.fetch_add(1, std::sync::atomic::Ordering::SeqCst);
Ok(40.0 + (seq as f32 * 0.5).min(50.0)) // Heats up from 40 to 90
}
fn get_power_w(&self) -> Result<f32> {
Ok(15.0)
}
fn get_fan_rpms(&self) -> Result<Vec<u32>> {
Ok(vec![2500])
}
fn get_freq_mhz(&self) -> Result<f32> {
Ok(3200.0)
Ok(40.0 + (seq as f32 * 0.5).min(55.0))
}
fn get_power_w(&self) -> Result<f32> { Ok(15.0) }
fn get_fan_rpms(&self) -> Result<Vec<u32>> { Ok(vec![2500, 2400]) }
fn get_freq_mhz(&self) -> Result<f32> { Ok(3200.0) }
fn get_throttling_status(&self) -> Result<bool> { Ok(false) }
}
impl ActuatorBus for MockSal {
fn set_fan_mode(&self, _mode: &str) -> Result<()> {
Ok(())
}
fn set_sustained_power_limit(&self, _watts: f32) -> Result<()> {
Ok(())
}
fn set_burst_power_limit(&self, _watts: f32) -> Result<()> {
Ok(())
}
fn set_fan_mode(&self, _mode: &str) -> Result<()> { Ok(()) }
fn set_fan_speed(&self, _speed: FanSpeedPercent) -> Result<()> { Ok(()) }
fn set_sustained_power_limit(&self, _limit: PowerLimitWatts) -> Result<()> { Ok(()) }
fn set_burst_power_limit(&self, _limit: PowerLimitWatts) -> Result<()> { Ok(()) }
}
impl HardwareWatchdog for MockSal {
fn get_safety_status(&self) -> Result<SafetyStatus> {
Ok(SafetyStatus::Nominal)
}
fn get_safety_status(&self) -> Result<SafetyStatus> { Ok(SafetyStatus::Nominal) }
}


@@ -3,3 +3,5 @@ pub mod mock;
pub mod dell_xps_9380;
pub mod generic_linux;
pub mod heuristic;
pub mod safety;
pub mod discovery;

282
src/sal/safety.rs Normal file

@@ -0,0 +1,282 @@
//! # Hardware Safety & Universal Safeguard Architecture
//!
//! This module implements the core safety logic for `ember-tune`. It uses the Rust
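//! type system to enforce hardware bounds and RAII patterns to restore the
//! system to its pre-flight state on normal exit and on panic unwinding.
//! A process that is killed with `SIGKILL` or aborts never runs `Drop`, so
//! that case is not covered by this module.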
use anyhow::{Result, bail, Context};
use std::collections::HashMap;
use std::fs;
use std::path::{PathBuf};
use std::sync::Arc;
use std::sync::atomic::{AtomicBool, Ordering};
use std::time::Duration;
use std::thread;
use tracing::{info, warn, error, debug};
use crate::sal::traits::SensorBus;
// --- 1. Type-Driven Bounds Checking ---
/// Represents a validated TDP limit in Watts.
#[derive(Debug, Clone, Copy, PartialEq, PartialOrd)]
pub struct PowerLimitWatts(f32);
impl PowerLimitWatts {
/// Absolute safety floor. Setting TDP below 3W can induce system-wide
/// CPU stalls and I/O deadlocks on certain Intel mobile chipsets.
pub const MIN: f32 = 3.0;
/// Safety ceiling for mobile thin-and-light chassis.
pub const MAX: f32 = 100.0;
/// Validates and constructs a new PowerLimitWatts.
pub fn try_new(watts: f32) -> Result<Self> {
// A range check also rejects NaN, which would slip past a pair of `<` / `>` comparisons.
if !(Self::MIN..=Self::MAX).contains(&watts) {
bail!("HardwareSafetyError: Requested TDP {:.1}W is outside safe bounds ({:.1}W - {:.1}W).", watts, Self::MIN, Self::MAX);
}
Ok(Self(watts))
}
pub fn from_watts(watts: f32) -> Result<Self> {
Self::try_new(watts)
}
pub fn get(&self) -> f32 { self.0 }
pub fn as_microwatts(&self) -> u64 { (self.0 * 1_000_000.0) as u64 }
}
/// Represents a validated fan speed percentage.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct FanSpeedPercent(u8);
impl FanSpeedPercent {
pub fn try_new(percent: u8) -> Result<Self> {
if percent > 100 {
bail!("HardwareSafetyError: Fan speed {}% is invalid.", percent);
}
Ok(Self(percent))
}
pub fn new(percent: u8) -> Result<Self> {
Self::try_new(percent)
}
pub fn get(&self) -> u8 { self.0 }
}
/// Represents a thermal threshold in Celsius.
#[derive(Debug, Clone, Copy, PartialEq, PartialOrd)]
pub struct ThermalThresholdCelsius(f32);
impl ThermalThresholdCelsius {
pub const MAX_SAFE_C: f32 = 98.0;
pub fn try_new(celsius: f32) -> Result<Self> {
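// NaN compares false against everything, so it has to be rejected explicitly.
if celsius.is_nan() || celsius > Self::MAX_SAFE_C {
bail!("HardwareSafetyError: Thermal threshold {}C is invalid or exceeds safe limit ({}C).", celsius, Self::MAX_SAFE_C);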
}
Ok(Self(celsius))
}
pub fn new(celsius: f32) -> Result<Self> {
Self::try_new(celsius)
}
pub fn get(&self) -> f32 { self.0 }
}
// --- 2. The HardwareStateGuard (RAII Restorer) ---
/// Defines an arbitrary action to take during restoration.
pub type RollbackAction = Box<dyn FnOnce() + Send + 'static>;
/// Holds a snapshot of the system state. Restores everything on Drop.
/// This is the primary safety mechanism for Project Iron-Ember.
pub struct HardwareStateGuard {
/// Maps sysfs paths to their original string contents.
snapshots: HashMap<PathBuf, String>,
/// Services that were stopped and must be restarted.
suppressed_services: Vec<String>,
/// Arbitrary actions to perform on restoration (e.g., reset fan mode).
rollback_actions: Vec<RollbackAction>,
is_active: bool,
}
impl HardwareStateGuard {
/// Snapshots the requested files and neutralizes competing services.
///
/// # SAFETY:
/// This MUST be acquired before any hardware mutation occurs.
pub fn acquire(target_files: &[PathBuf], target_services: &[String]) -> Result<Self> {
let mut snapshots = HashMap::new();
let mut suppressed = Vec::new();
info!("USA: Arming HardwareStateGuard. Snapshotting critical registers...");
for path in target_files {
if path.exists() {
let content = fs::read_to_string(path)
.with_context(|| format!("Failed to snapshot {:?}", path))?;
snapshots.insert(path.clone(), content.trim().to_string());
} else {
debug!("USA: Skipping snapshot for non-existent path {:?}", path);
}
}
for svc in target_services {
// Check if service is active before stopping
let status = std::process::Command::new("systemctl")
.args(["is-active", "--quiet", svc])
.status();
if let Ok(s) = status {
if s.success() {
info!("USA: Neutralizing service '{}'", svc);
let _ = std::process::Command::new("systemctl").args(["stop", svc]).status();
suppressed.push(svc.clone());
}
}
}
Ok(Self {
snapshots,
suppressed_services: suppressed,
rollback_actions: Vec::new(),
is_active: true,
})
}
/// Registers a custom action to be performed when the guard is released.
pub fn on_rollback(&mut self, action: RollbackAction) {
self.rollback_actions.push(action);
}
/// Explicitly release and restore the hardware state.
pub fn release(&mut self) -> Result<()> {
if !self.is_active { return Ok(()); }
info!("USA: Releasing guard. Restoring hardware to pre-flight state...");
// 1. Restore Power/Sysfs states
for (path, content) in &self.snapshots {
if let Err(e) = fs::write(path, content) {
error!("CRITICAL: Failed to restore {:?}: {}", path, e);
}
}
// 2. Restart Services
for svc in &self.suppressed_services {
let _ = std::process::Command::new("systemctl").args(["start", svc]).status();
}
// 3. Perform Custom Rollback Actions
for action in self.rollback_actions.drain(..) {
(action)();
}
self.is_active = false;
Ok(())
}
}
impl Drop for HardwareStateGuard {
fn drop(&mut self) {
if self.is_active {
warn!("USA: Guard dropped without explicit release (early return or panic). Force-restoring system...");
let _ = self.release();
}
}
}
// --- 3. The Active Watchdog ---
/// A standalone monitor that polls hardware thermals at high frequency.
pub struct ThermalWatchdog {
cancel_token: Arc<AtomicBool>,
handle: Option<thread::JoinHandle<()>>,
}
impl ThermalWatchdog {
/// If temperature exceeds this ceiling, the watchdog triggers an emergency shutdown.
pub const CRITICAL_TEMP: f32 = 95.0;
/// High polling rate ensures we catch runaways before chassis saturation.
pub const POLL_INTERVAL: Duration = Duration::from_millis(250);
/// Spawns the watchdog thread.
pub fn spawn(sensors: Arc<dyn SensorBus>, cancel_token: Arc<AtomicBool>) -> Self {
let ct = cancel_token.clone();
let handle = thread::spawn(move || {
let mut last_temp: Option<f32> = None;
loop {
if ct.load(Ordering::SeqCst) {
debug!("Watchdog: Shutdown signal received.");
break;
}
match sensors.get_temp() {
Ok(temp) => {
// Rate of change per poll interval (dT/dt). The first sample has no
// previous reading, so it counts as zero instead of a false ramp.
let dt_dt = last_temp.map_or(0.0, |prev| temp - prev);
if temp >= Self::CRITICAL_TEMP {
error!("WATCHDOG: CRITICAL THERMAL EVENT ({:.1}C). Triggering emergency abort!", temp);
ct.store(true, Ordering::SeqCst);
break;
}
if dt_dt > 5.0 && temp > 85.0 {
warn!("WATCHDOG: Dangerous thermal ramp detected (+{:.1}C in {}ms).", dt_dt, Self::POLL_INTERVAL.as_millis());
}
last_temp = Some(temp);
}
Err(e) => {
error!("WATCHDOG: Sensor read failure: {}. Aborting for safety!", e);
ct.store(true, Ordering::SeqCst);
break;
}
}
thread::sleep(Self::POLL_INTERVAL);
}
});
Self {
cancel_token,
handle: Some(handle),
}
}
}
impl Drop for ThermalWatchdog {
fn drop(&mut self) {
self.cancel_token.store(true, Ordering::SeqCst);
if let Some(h) = self.handle.take() {
let _ = h.join();
}
}
}
// --- 4. Transactional Configuration ---
/// A staged set of changes to be applied to the hardware.
#[derive(Default)]
pub struct ConfigurationTransaction {
changes: Vec<(PathBuf, String)>,
}
impl ConfigurationTransaction {
pub fn add_change(&mut self, path: PathBuf, value: String) {
self.changes.push((path, value));
}
/// # SAFETY:
/// Applies the changes in insertion order and stops at the first failed write,
/// so earlier changes stay applied when an error is returned. An active
/// HardwareStateGuard still restores every snapshotted path when it is dropped.
pub fn commit(self) -> Result<()> {
for (path, val) in self.changes {
fs::write(&path, val)
.with_context(|| format!("Failed to apply change to {:?}", path))?;
}
Ok(())
}
}
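
The core of `HardwareStateGuard` is the snapshot-then-restore-on-`Drop` pattern. The reduced sketch below shows it for a single file, using a scratch file in the temp directory instead of a sysfs node. `FileRestoreGuard` and `demo` are hypothetical names that do not appear in the patch, and the service handling and rollback actions are left out.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical, reduced guard: remembers one file's contents and
/// writes them back when it goes out of scope.
struct FileRestoreGuard {
    path: PathBuf,
    original: String,
}

impl FileRestoreGuard {
    fn acquire(path: &Path) -> io::Result<Self> {
        let original = fs::read_to_string(path)?;
        Ok(Self { path: path.to_path_buf(), original })
    }
}

impl Drop for FileRestoreGuard {
    fn drop(&mut self) {
        // Errors cannot be propagated out of drop, so they are only reported.
        if let Err(e) = fs::write(&self.path, &self.original) {
            eprintln!("failed to restore {:?}: {}", self.path, e);
        }
    }
}

/// Runs the scenario on a scratch file and returns its final contents.
fn demo() -> io::Result<String> {
    let path = std::env::temp_dir().join(format!("restore_guard_demo_{}", std::process::id()));
    fs::write(&path, "15000000")?;
    {
        let _guard = FileRestoreGuard::acquire(&path)?;
        fs::write(&path, "45000000")?; // mutate while the guard is alive
        assert_eq!(fs::read_to_string(&path)?, "45000000");
    } // guard dropped here, original value written back
    let restored = fs::read_to_string(&path)?;
    fs::remove_file(&path)?;
    Ok(restored)
}

fn main() -> io::Result<()> {
    println!("restored value: {}", demo()?);
    Ok(())
}
```

The guard has to be bound to a named variable such as `_guard`. Binding it to a bare `_` drops it immediately and restores the file before any mutation happens.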


@@ -115,79 +115,54 @@ impl<T: EnvironmentGuard + ?Sized> EnvironmentGuard for Arc<T> {
}
}
use crate::sal::safety::{PowerLimitWatts, FanSpeedPercent};
/// Provides a read-only interface to system telemetry sensors.
pub trait SensorBus: Send + Sync {
/// Returns the current package temperature in degrees Celsius.
///
/// # Errors
/// Returns an error if the underlying `hwmon` or `sysfs` node cannot be read.
fn get_temp(&self) -> Result<f32>;
/// Returns the current package power consumption in Watts.
///
/// # Errors
/// Returns an error if the underlying RAPL or power sensor cannot be read.
fn get_power_w(&self) -> Result<f32>;
/// Returns the current speed of all detected fans in RPM.
///
/// # Errors
/// Returns an error if the fan sensor nodes cannot be read.
fn get_fan_rpms(&self) -> Result<Vec<u32>>;
/// Returns the current average CPU frequency in MHz.
///
/// # Errors
/// Returns an error if `/proc/cpuinfo` or a `cpufreq` sysfs node cannot be read.
fn get_freq_mhz(&self) -> Result<f32>;
/// Returns true if the system is currently thermally throttling.
fn get_throttling_status(&self) -> Result<bool>;
}
impl<T: SensorBus + ?Sized> SensorBus for Arc<T> {
fn get_temp(&self) -> Result<f32> {
(**self).get_temp()
}
fn get_power_w(&self) -> Result<f32> {
(**self).get_power_w()
}
fn get_fan_rpms(&self) -> Result<Vec<u32>> {
(**self).get_fan_rpms()
}
fn get_freq_mhz(&self) -> Result<f32> {
(**self).get_freq_mhz()
}
fn get_temp(&self) -> Result<f32> { (**self).get_temp() }
fn get_power_w(&self) -> Result<f32> { (**self).get_power_w() }
fn get_fan_rpms(&self) -> Result<Vec<u32>> { (**self).get_fan_rpms() }
fn get_freq_mhz(&self) -> Result<f32> { (**self).get_freq_mhz() }
fn get_throttling_status(&self) -> Result<bool> { (**self).get_throttling_status() }
}
/// Provides a write-only interface for hardware actuators.
pub trait ActuatorBus: Send + Sync {
/// Sets the fan control mode (e.g., "auto" or "max").
///
/// # Errors
/// Returns an error if the fan control command or `sysfs` write fails.
fn set_fan_mode(&self, mode: &str) -> Result<()>;
/// Sets the sustained power limit (PL1) in Watts.
///
/// # Errors
/// Returns an error if the RAPL `sysfs` node cannot be written to.
fn set_sustained_power_limit(&self, watts: f32) -> Result<()>;
/// Sets the fan speed directly using a validated percentage.
fn set_fan_speed(&self, speed: FanSpeedPercent) -> Result<()>;
/// Sets the burst power limit (PL2) in Watts.
///
/// # Errors
/// Returns an error if the RAPL `sysfs` node cannot be written to.
fn set_burst_power_limit(&self, watts: f32) -> Result<()>;
/// Sets the sustained power limit (PL1) using a validated wrapper.
fn set_sustained_power_limit(&self, limit: PowerLimitWatts) -> Result<()>;
/// Sets the burst power limit (PL2) using a validated wrapper.
fn set_burst_power_limit(&self, limit: PowerLimitWatts) -> Result<()>;
}
impl<T: ActuatorBus + ?Sized> ActuatorBus for Arc<T> {
fn set_fan_mode(&self, mode: &str) -> Result<()> {
(**self).set_fan_mode(mode)
}
fn set_sustained_power_limit(&self, watts: f32) -> Result<()> {
(**self).set_sustained_power_limit(watts)
}
fn set_burst_power_limit(&self, watts: f32) -> Result<()> {
(**self).set_burst_power_limit(watts)
}
fn set_fan_mode(&self, mode: &str) -> Result<()> { (**self).set_fan_mode(mode) }
fn set_fan_speed(&self, speed: FanSpeedPercent) -> Result<()> { (**self).set_fan_speed(speed) }
fn set_sustained_power_limit(&self, limit: PowerLimitWatts) -> Result<()> { (**self).set_sustained_power_limit(limit) }
fn set_burst_power_limit(&self, limit: PowerLimitWatts) -> Result<()> { (**self).set_burst_power_limit(limit) }
}
/// Represents the high-level safety status of the system.


@@ -1,35 +1,75 @@
#[path = "../src/engine/formatters/throttled.rs"]
mod throttled;
use throttled::{ThrottledTranslator, ThrottledConfig};
use ember_tune_rs::engine::formatters::throttled::{ThrottledConfig, ThrottledTranslator};
use ember_tune_rs::agent_analyst::{OptimizationMatrix, SystemProfile, FanCurvePoint};
use ember_tune_rs::agent_integrator::ServiceIntegrator;
use std::fs;
use tempfile::tempdir;
#[test]
fn test_throttled_formatter_non_destructive() {
let fixture_path = "tests/fixtures/throttled.conf";
let existing_content = fs::read_to_string(fixture_path).expect("Failed to read fixture");
fn test_throttled_merge_preserves_undervolt() {
let existing = r#"[GENERAL]
Update_Interval_ms: 1000
[UNDERVOLT]
# CPU core undervolt
CORE: -100
# GPU undervolt
GPU: -50
[AC]
PL1_Tdp_W: 15
PL2_Tdp_W: 25
"#;
let config = ThrottledConfig {
pl1_limit: 25.0,
pl2_limit: 35.0,
trip_temp: 90.0,
pl1_limit: 22.0,
pl2_limit: 28.0,
trip_temp: 95.0,
};
let merged = ThrottledTranslator::merge_conf(&existing_content, &config);
let merged = ThrottledTranslator::merge_conf(existing, &config);
// Assert updates
assert!(merged.contains("PL1_Tdp_W: 25"));
assert!(merged.contains("PL2_Tdp_W: 35"));
assert!(merged.contains("Trip_Temp_C: 90"));
// Assert preservation
assert!(merged.contains("[UNDERVOLT]"));
assert!(merged.contains("CORE: -100"));
assert!(merged.contains("GPU: -50"));
assert!(merged.contains("# Important: Preserving undervolt offsets is critical!"));
assert!(merged.contains("Update_Interval_ms: 3000"));
// Check that we didn't lose the [GENERAL] section
assert!(merged.contains("[GENERAL]"));
assert!(merged.contains("# This is a complex test fixture"));
assert!(merged.contains("PL1_Tdp_W: 22"));
assert!(merged.contains("PL2_Tdp_W: 28"));
assert!(merged.contains("Trip_Temp_C: 95"));
assert!(merged.contains("[UNDERVOLT]"));
}
#[test]
fn test_i8kmon_merge_preserves_settings() {
let dir = tempdir().unwrap();
let config_path = dir.path().join("i8kmon.conf");
let existing = r#"set config(gen_shadow) 1
set config(i8k_ignore_dmi) 1
set config(daemon) 1
set config(0) {0 0 60 50}
"#;
fs::write(&config_path, existing).unwrap();
let matrix = OptimizationMatrix {
silent: SystemProfile { name: "Silent".to_string(), pl1_watts: 10.0, pl2_watts: 12.0, fan_curve: vec![] },
balanced: SystemProfile {
name: "Balanced".to_string(),
pl1_watts: 20.0,
pl2_watts: 25.0,
fan_curve: vec![
FanCurvePoint { temp_on: 70.0, temp_off: 60.0, pwm_percent: 50 }
]
},
performance: SystemProfile { name: "Perf".to_string(), pl1_watts: 30.0, pl2_watts: 35.0, fan_curve: vec![] },
thermal_resistance_kw: 1.5,
ambient_temp: 25.0,
};
ServiceIntegrator::generate_i8kmon_config(&matrix, &config_path, Some(&config_path)).unwrap();
let result = fs::read_to_string(&config_path).unwrap();
assert!(result.contains("set config(gen_shadow) 1"));
assert!(result.contains("set config(daemon) 1"));
assert!(result.contains("set config(0) {1 1 70 -}")); // New config
assert!(!result.contains("set config(0) {0 0 60 50}")); // Old config should be gone
}


@@ -1,5 +1,6 @@
use ember_tune_rs::sal::heuristic::discovery::discover_facts;
use ember_tune_rs::sal::heuristic::schema::{Discovery, SensorDiscovery, ActuatorDiscovery, Benchmarking};
use ember_tune_rs::sys::MockSyscallRunner;
use crate::common::fakesys::FakeSysBuilder;
mod common;
@@ -35,7 +36,9 @@ fn test_heuristic_discovery_with_fakesys() {
power_steps_watts: vec![10.0, 15.0],
};
let facts = discover_facts(&fake.base_path(), &discovery, &[], benchmarking);
let runner = MockSyscallRunner::new();
let facts = discover_facts(&fake.base_path(), &runner, &discovery, &[], benchmarking);
assert_eq!(facts.vendor, "Dell Inc.");
assert_eq!(facts.model, "XPS 13 9380");


@@ -1,16 +1,23 @@
use ember_tune_rs::orchestrator::BenchmarkOrchestrator;
use ember_tune_rs::sal::mock::MockSal;
use ember_tune_rs::sal::heuristic::discovery::SystemFactSheet;
use ember_tune_rs::load::Workload;
use ember_tune_rs::load::{Workload, IntensityProfile, WorkloadMetrics};
use std::time::Duration;
use anyhow::Result;
use std::sync::mpsc;
use std::sync::Arc;
use anyhow::Result;
struct MockWorkload;
impl Workload for MockWorkload {
fn start(&mut self, _threads: usize, _load_percent: usize) -> Result<()> { Ok(()) }
fn stop(&mut self) -> Result<()> { Ok(()) }
fn get_throughput(&self) -> Result<f64> { Ok(100.0) }
fn initialize(&mut self) -> Result<()> { Ok(()) }
fn run_workload(&mut self, _duration: Duration, _profile: IntensityProfile) -> Result<()> { Ok(()) }
fn get_current_metrics(&self) -> Result<WorkloadMetrics> {
Ok(WorkloadMetrics {
primary_ops_per_sec: 100.0,
elapsed_time: Duration::from_secs(1),
})
}
fn stop_workload(&mut self) -> Result<()> { Ok(()) }
}
#[test]
@@ -28,6 +35,7 @@ fn test_orchestrator_e2e_state_machine() {
workload,
telemetry_tx,
command_rx,
None,
);
// For the purpose of this architecture audit, we've demonstrated the

53
tests/safety_test.rs Normal file

@@ -0,0 +1,53 @@
use ember_tune_rs::sal::safety::{HardwareStateGuard, PowerLimitWatts};
use crate::common::fakesys::FakeSysBuilder;
use std::fs;
mod common;
#[test]
fn test_hardware_state_guard_panic_restoration() {
let fake = FakeSysBuilder::new();
let pl1_path = fake.base_path().join("sys/class/powercap/intel-rapl:0/constraint_0_power_limit_uw");
fake.add_rapl("intel-rapl:0", "1000", "15000000"); // 15W original
let target_files = vec![pl1_path.clone()];
// Simulate a scope where the guard is active
{
let _guard = HardwareStateGuard::acquire(&target_files, &[]).expect("Failed to acquire guard");
// Modify the file
fs::write(&pl1_path, "25000000").expect("Failed to write new value");
assert_eq!(fs::read_to_string(&pl1_path).unwrap().trim(), "25000000");
// Guard is dropped here at end of scope; a panic unwinding through this block would run the same Drop impl
}
// Verify restoration
let restored = fs::read_to_string(&pl1_path).expect("Failed to read restored file");
assert_eq!(restored.trim(), "15000000");
}
#[test]
fn test_tdp_limit_bounds_checking() {
// 1. Valid value
assert!(PowerLimitWatts::try_new(15.0).is_ok());
// 2. Too low (dangerous: 0 W, or anything below 3 W)
let low_res = PowerLimitWatts::try_new(1.0);
assert!(low_res.is_err());
assert!(low_res.unwrap_err().to_string().contains("outside safe bounds"));
// 3. Too high (> 100W)
let high_res = PowerLimitWatts::try_new(150.0);
assert!(high_res.is_err());
assert!(high_res.unwrap_err().to_string().contains("outside safe bounds"));
}
#[test]
fn test_0w_tdp_regression_prevention() {
// The prime directive is to never set 0W.
let zero_res = PowerLimitWatts::try_new(0.0);
assert!(zero_res.is_err());
}
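
Note on the bounds tests above: the following is a minimal sketch of a validated newtype that would satisfy them, not the crate's actual `PowerLimitWatts`. The 3 W and 100 W bounds are assumptions taken from the test comments, and `OutOfBounds`, `get`, and `as_microwatts` are hypothetical names added for illustration. The only requirement drawn from the tests is that the error text contains "outside safe bounds".

```rust
use std::fmt;

/// Assumed bounds, taken from the test comments (below 3 W / above 100 W).
const MIN_WATTS: f32 = 3.0;
const MAX_WATTS: f32 = 100.0;

/// A power limit that can only be constructed inside the assumed safe range.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct PowerLimitWatts(f32);

/// Hypothetical error type carrying the rejected value.
#[derive(Debug, PartialEq)]
pub struct OutOfBounds(f32);

impl fmt::Display for OutOfBounds {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(
            f,
            "power limit {} W is outside safe bounds ({}..={} W)",
            self.0, MIN_WATTS, MAX_WATTS
        )
    }
}

impl std::error::Error for OutOfBounds {}

impl PowerLimitWatts {
    /// Validates the value once at construction. NaN fails the range check
    /// as well, because every comparison against NaN is false.
    pub fn try_new(watts: f32) -> Result<Self, OutOfBounds> {
        if (MIN_WATTS..=MAX_WATTS).contains(&watts) {
            Ok(Self(watts))
        } else {
            Err(OutOfBounds(watts))
        }
    }

    /// Returns the validated value in watts.
    pub fn get(self) -> f32 {
        self.0
    }

    /// Converts to microwatts, the unit used by the RAPL
    /// `constraint_*_power_limit_uw` sysfs nodes.
    pub fn as_microwatts(self) -> u64 {
        (f64::from(self.0) * 1_000_000.0).round() as u64
    }
}
```

With validation confined to `try_new`, an actuator method that accepts `PowerLimitWatts` instead of a raw `f32` cannot be handed 0 W, which is the regression the last test guards against.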