update docs + agents
82
README.md
@@ -1,33 +1,61 @@
|
||||
# 🔥 ember-tune
|
||||
```text
|
||||
__________ ____ ______ ____ ______ __ __ _ __ ______
|
||||
/ ____/ |/ // __ )/ ____// __ \ /_ __/ / / / // | / // ____/
|
||||
/ __/ / /|_/ // __ / __/ / /_/ / / / / / / // |/ // __/
|
||||
/ /___ / / / // /_/ / /___ / _, _/ / / / /_/ // /| // /___
|
||||
/_____//_/ /_//_____/_____//_/ |_| /_/ \____//_/ |_//_____/
|
||||
|
||||
>>> Physically-grounded thermal & power optimization for Linux <<<
|
||||
```
|
||||
|
||||
> ### **Find your hardware's "Physical Sweet Spot" through automated trial-by-fire.**
|
||||
|
||||
`ember-tune` is a scientifically-driven hardware optimizer that replaces guesswork and manual tuning with a rigorous, automated engineering workflow. It determines the unique thermal properties of your specific laptop—including its Thermal Resistance (Rθ) and "Silicon Knee"—to generate optimal configurations for common Linux tuning daemons.
|
||||
|
||||
## ✨ Features

- **Automated Physical Benchmarking:** Measures real-world thermal performance under load to find the true "sweet spot" where performance-per-watt is maximized before thermal saturation causes diminishing returns.
- **Heuristic Hardware Discovery:** Uses a data-driven System Abstraction Layer (SAL) that probes your system and automatically adapts to its unique quirks, drivers, and sensor paths.
- **Non-Destructive Configuration:** Safely merges new, optimized power limits into your existing `throttled.conf`, preserving manual undervolt settings and comments.
- **Universal Safeguard Architecture (USA):** Includes a high-frequency concurrent watchdog and RAII state restoration to guarantee your system is never left in a dangerous state.
- **Real-time TUI Dashboard:** A `ratatui`-based terminal interface provides high-resolution telemetry throughout the benchmark.
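
The RAII state restoration mentioned in the feature list can be illustrated with a minimal sketch. `RestoreGuard` is a hypothetical name, and the `Cell<u64>` is only a stand-in for a hardware setting such as a power limit; the project's real guard types may be structured differently.

```rust
use std::cell::Cell;

/// Hypothetical guard: applies a temporary value and remembers the original.
pub struct RestoreGuard<'a> {
    slot: &'a Cell<u64>,
    original: u64,
}

impl<'a> RestoreGuard<'a> {
    /// Writes `temporary` into the slot and keeps the previous value for later.
    pub fn apply(slot: &'a Cell<u64>, temporary: u64) -> Self {
        let original = slot.replace(temporary);
        RestoreGuard { slot, original }
    }
}

impl<'a> Drop for RestoreGuard<'a> {
    /// Runs on every exit path (normal return, early `?`, or panic unwind),
    /// so the original value is written back without explicit cleanup code.
    fn drop(&mut self) {
        self.slot.set(self.original);
    }
}
```

Because the restore happens in `Drop`, a benchmark that bails out early still puts the original value back as soon as the guard goes out of scope.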
|
||||
|
||||
## 🔬 How it Works: The Architecture

`ember-tune` is built on a decoupled, multi-threaded architecture to ensure the UI is always responsive and that hardware state is managed safely.

1. **The Heuristic Engine:** On startup, the engine probes your system's DMI, `sysfs`, and active services. It compares these "facts" against the `hardware_db.toml` to select the correct System Abstraction Layer (SAL).
2. **The Orchestrator (Backend Thread):** This is the state machine that executes the benchmark. It communicates with hardware *only* through the SAL traits.
3. **The TUI (Main Thread):** The `ratatui` dashboard renders `TelemetryState` snapshots received from the orchestrator via an MPSC channel.
4. **The Watchdog (Safety Thread):** A high-priority thread that polls safety sensors every 100 ms to trigger an atomic `EmergencyAbort` if failure conditions are met.
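
The thread layout above can be sketched with the standard library alone. This is a simplified illustration, not the project's implementation: `Snapshot`, `watchdog_check`, and `run_backend` are hypothetical names, the snapshot carries only two fields, and the watchdog is reduced to a single check that latches an atomic abort flag.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::{mpsc, Arc};
use std::thread;

/// Hypothetical, cut-down stand-in for a telemetry snapshot.
#[derive(Debug, Clone, PartialEq)]
pub struct Snapshot {
    pub temp_c: f32,
    pub watts: f32,
}

/// Watchdog check: latches the shared abort flag once the limit is reached.
pub fn watchdog_check(temp_c: f32, limit_c: f32, abort: &AtomicBool) {
    if temp_c >= limit_c {
        abort.store(true, Ordering::SeqCst);
    }
}

/// Spawns a backend thread that publishes one snapshot per reading and
/// collects them on the calling ("UI") thread through an MPSC channel.
pub fn run_backend(readings: Vec<(f32, f32)>, abort: Arc<AtomicBool>) -> Vec<Snapshot> {
    let (tx, rx) = mpsc::channel();
    let backend = thread::spawn(move || {
        for (temp_c, watts) in readings {
            // The backend stops as soon as the abort flag has been set.
            if abort.load(Ordering::SeqCst) {
                break;
            }
            if tx.send(Snapshot { temp_c, watts }).is_err() {
                break; // receiver went away
            }
        }
        // `tx` is dropped here, which closes the channel for the receiver.
    });
    let received: Vec<Snapshot> = rx.iter().collect();
    backend.join().expect("backend thread panicked");
    received
}
```

The receiving side never touches hardware; it only drains snapshots, which is what keeps a UI loop responsive while the backend is busy.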
|
||||
|
||||
## ⚙️ Development Setup
|
||||
|
||||
`ember-tune` is a standard Cargo project. You will need a recent Rust toolchain and common build utilities.
|
||||
`ember-tune` is a standard Cargo project.
|
||||
|
||||
**Prerequisites:**
|
||||
- `rustup`
|
||||
- `build-essential` (or equivalent for your distribution)
|
||||
- `build-essential`
|
||||
- `libudev-dev`
|
||||
- `stress-ng` (Required for benchmarking)
|
||||
|
||||
```bash
|
||||
# 1. Clone the repository
|
||||
# 1. Clone and Build
|
||||
git clone https://gitea.com/narl/ember-tune.git
|
||||
cd ember-tune
|
||||
|
||||
# 2. Build the release binary
|
||||
cargo build --release
|
||||
|
||||
# 3. Run the test suite (safe, uses a virtual environment)
|
||||
# This requires no special permissions and does not touch your hardware.
|
||||
# 2. Run the safe test suite
|
||||
cargo test
|
||||
```
|
||||
|
||||
**Running:**
|
||||
Due to its direct hardware access, `ember-tune` requires root privileges.
|
||||
|
||||
```bash
|
||||
# Run a full benchmark and generate optimized configs
|
||||
# Run a full benchmark
|
||||
sudo ./target/release/ember-tune
|
||||
|
||||
# Run a mock benchmark for UI/logic testing
|
||||
# Run a mock benchmark for UI testing
|
||||
sudo ./target/release/ember-tune --mock
|
||||
```
|
||||
|
||||
@@ -35,48 +63,24 @@ sudo ./target/release/ember-tune --mock
|
||||
|
||||
## 🤝 Contributing Quirk Data (`hardware_db.toml`)
|
||||
|
||||
**This is the most impactful way to contribute.** `ember-tune`'s strength comes from its `assets/hardware_db.toml`, which encodes community knowledge about how to manage specific laptops. If your hardware isn't working perfectly, you can likely fix it by adding a new entry here.
|
||||
**This is the most impactful way to contribute.** If your hardware isn't working perfectly, add a new entry to `assets/hardware_db.toml`.
|
||||
|
||||
The database is composed of four key sections: `conflicts`, `ecosystems`, `quirks`, and `discovery`.
|
||||
|
||||
### A. Reporting a Service Conflict
|
||||
If a background service on your system interferes with `ember-tune`, add it to `[[conflicts]]`.
|
||||
|
||||
**Example:** Adding `laptop-mode-tools`.
|
||||
### Example: Adding a Service Conflict
|
||||
```toml
|
||||
[[conflicts]]
|
||||
id = "laptop_mode_conflict"
|
||||
services = ["laptop-mode.service"]
|
||||
contention = "Multiple - I/O schedulers, Power limits"
|
||||
severity = "Medium"
|
||||
fix_action = "SuspendService" # Orchestrator will stop/start this service
|
||||
fix_action = "SuspendService"
|
||||
help_text = "laptop-mode-tools can override power-related sysfs settings."
|
||||
```
|
||||
|
||||
### B. Adding a New Hardware Ecosystem
|
||||
If your laptop manufacturer (e.g., Razer) has a unique fan control tool or ACPI platform profile path, define it in `[ecosystems]`.
|
||||
|
||||
**Example:** A hypothetical "Razer" ecosystem.
|
||||
```toml
|
||||
[ecosystems.razer]
|
||||
vendor_regex = "Razer"
|
||||
# Path to the sysfs node that controls performance profiles
|
||||
profiles_path = "/sys/bus/platform/drivers/razer_acpi/power_mode"
|
||||
# Map human-readable names to the values the driver expects
|
||||
policy_map = { Balanced = 0, Boost = 1, Silent = 2 }
|
||||
```
|
||||
|
||||
### C. Defining a Model-Specific Quirk
|
||||
If a specific laptop model has a bug (like a stuck sensor or incorrect fan reporting), define a `[[quirks]]` entry.
|
||||
|
||||
**Example:** A laptop whose fans report 0 RPM even when spinning.
|
||||
### Example: Defining a Model-Specific Quirk
|
||||
```toml
|
||||
[[quirks]]
|
||||
model_regex = "HP Envy 15-ep.*"
|
||||
id = "hp_fan_stuck_sensor"
|
||||
issue = "Fan sensor reports 0 RPM when active."
|
||||
# The 'action' tells the SAL to use a different method for fan detection.
|
||||
action = "UseThermalVelocityFallback"
|
||||
```
|
||||
|
||||
After adding your changes, run the test suite and then submit a Pull Request!
|
||||
|
||||
100
src/agent_analyst/mod.rs
Normal file
@@ -0,0 +1,100 @@
|
||||
//! Heuristic Analysis & Optimization Math (Agent Analyst)
|
||||
//!
|
||||
//! This module analyzes raw telemetry data to extract the "Optimal Real-World Settings".
|
||||
//! It calculates the Silicon Knee, Acoustic/Thermal Matrix (Hysteresis), and
|
||||
//! generates three distinct hardware states: Silent, Balanced, and Sustained Heavy.
|
||||
|
||||
use serde::{Serialize, Deserialize};
|
||||
use crate::engine::{ThermalProfile, OptimizerEngine};
|
||||
|
||||
#[derive(Debug, Clone, Serialize, Deserialize)]
|
||||
pub struct FanCurvePoint {
|
||||
pub temp_on: f32,
|
||||
pub temp_off: f32,
|
||||
pub pwm_percent: u8,
|
||||
}
|
||||
|
||||
#[derive(Debug, Clone, Serialize, Deserialize)]
|
||||
pub struct SystemProfile {
|
||||
pub name: String,
|
||||
pub pl1_watts: f32,
|
||||
pub pl2_watts: f32,
|
||||
pub fan_curve: Vec<FanCurvePoint>,
|
||||
}
|
||||
|
||||
#[derive(Debug, Clone, Serialize, Deserialize)]
|
||||
pub struct OptimizationMatrix {
|
||||
pub silent: SystemProfile,
|
||||
pub balanced: SystemProfile,
|
||||
pub performance: SystemProfile,
|
||||
pub thermal_resistance_kw: f32,
|
||||
}
|
||||
|
||||
pub struct HeuristicAnalyst {
|
||||
engine: OptimizerEngine,
|
||||
}
|
||||
|
||||
impl HeuristicAnalyst {
|
||||
pub fn new() -> Self {
|
||||
Self {
|
||||
engine: OptimizerEngine::new(5),
|
||||
}
|
||||
}
|
||||
|
||||
/// Analyzes the raw telemetry to generate the 3 optimal profiles.
|
||||
pub fn analyze(&self, profile: &ThermalProfile, max_soak_watts: f32) -> OptimizationMatrix {
|
||||
let r_theta = self.engine.calculate_thermal_resistance(profile);
|
||||
let silicon_knee = self.engine.find_silicon_knee(profile);
|
||||
|
||||
// 1. State A: Silent / Battery (Scientific Passive Equilibrium)
|
||||
// Objective: Find P where T_core = 60C with fans OFF.
|
||||
// T_core = T_ambient + (P * R_theta_passive)
|
||||
// Note: R_theta measured during benchmark was with fans MAX.
|
||||
// Passive R_theta is typically 2-3x higher.
|
||||
let r_theta_passive = r_theta * 2.5;
|
||||
let silent_watts = ((60.0 - profile.ambient_temp) / r_theta_passive.max(0.1)).clamp(5.0, 15.0);
|
||||
|
||||
let silent_profile = SystemProfile {
|
||||
name: "Silent".to_string(),
|
||||
pl1_watts: silent_watts,
|
||||
pl2_watts: silent_watts * 1.2,
|
||||
fan_curve: vec![
|
||||
FanCurvePoint { temp_on: 65.0, temp_off: 55.0, pwm_percent: 0 },
|
||||
FanCurvePoint { temp_on: 75.0, temp_off: 65.0, pwm_percent: 30 },
|
||||
],
|
||||
};
|
||||
|
||||
// 2. State B: Balanced
|
||||
// The exact calculated Silicon Knee
|
||||
let balanced_profile = SystemProfile {
|
||||
name: "Balanced".to_string(),
|
||||
pl1_watts: silicon_knee,
|
||||
pl2_watts: silicon_knee * 1.25,
|
||||
fan_curve: vec![
|
||||
FanCurvePoint { temp_on: 60.0, temp_off: 55.0, pwm_percent: 0 },
|
||||
FanCurvePoint { temp_on: 75.0, temp_off: 65.0, pwm_percent: 40 },
|
||||
FanCurvePoint { temp_on: 85.0, temp_off: 75.0, pwm_percent: 70 },
|
||||
],
|
||||
};
|
||||
|
||||
// 3. State C: Sustained Heavy
|
||||
// Based on the max soak watts from Phase 1.
|
||||
let performance_profile = SystemProfile {
|
||||
name: "Performance".to_string(),
|
||||
pl1_watts: max_soak_watts,
|
||||
pl2_watts: max_soak_watts * 1.3,
|
||||
fan_curve: vec![
|
||||
FanCurvePoint { temp_on: 50.0, temp_off: 45.0, pwm_percent: 30 },
|
||||
FanCurvePoint { temp_on: 70.0, temp_off: 60.0, pwm_percent: 60 },
|
||||
FanCurvePoint { temp_on: 85.0, temp_off: 75.0, pwm_percent: 100 },
|
||||
],
|
||||
};
|
||||
|
||||
OptimizationMatrix {
|
||||
silent: silent_profile,
|
||||
balanced: balanced_profile,
|
||||
performance: performance_profile,
|
||||
thermal_resistance_kw: r_theta,
|
||||
}
|
||||
}
|
||||
}
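
The Silent-profile power budget in `analyze` follows from the steady-state relation in its comments, `T_core = T_ambient + P * R_theta_passive`, solved for `P`. The sketch below isolates that arithmetic so it can be checked by hand. `silent_watts` and `PASSIVE_FACTOR` are hypothetical names; the 2.5 multiplier, the 60 °C target, the 0.1 floor, and the 5 to 15 W clamp are taken from the code above.

```rust
/// Multiplier used in the diff to estimate fans-off thermal resistance
/// from the value measured with fans at maximum.
const PASSIVE_FACTOR: f32 = 2.5;

/// Power (in watts) at which the core is expected to settle at 60 °C with the
/// fans off, clamped to the 5..=15 W range used by the Silent profile.
pub fn silent_watts(ambient_c: f32, r_theta_kw: f32) -> f32 {
    // Guard against a zero or tiny resistance before dividing.
    let r_passive = (r_theta_kw * PASSIVE_FACTOR).max(0.1);
    ((60.0 - ambient_c) / r_passive).clamp(5.0, 15.0)
}
```

For example, with a 25 °C ambient and a measured resistance of 1.0 K/W, the passive estimate is 2.5 K/W and the budget is 35 / 2.5 = 14 W. A lower resistance of 0.5 K/W would give 28 W, which the clamp reduces to 15 W.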
|
||||
115
src/agent_integrator/mod.rs
Normal file
@@ -0,0 +1,115 @@
|
||||
//! System Service Integration (Agent Integrator)
|
||||
//!
|
||||
//! This module translates the mathematical optimums defined by the Analyst
|
||||
//! into actionable, real-world Linux/OS service configurations.
|
||||
//! It generates templates for fan daemons (i8kmon, thinkfan) and handles
|
||||
//! resolution strategies for overlapping daemons.
|
||||
|
||||
use anyhow::Result;
|
||||
use std::path::Path;
|
||||
use std::fs;
|
||||
use crate::agent_analyst::OptimizationMatrix;
|
||||
|
||||
pub struct ServiceIntegrator;
|
||||
|
||||
impl ServiceIntegrator {
|
||||
/// Generates and saves an i8kmon configuration based on the balanced profile.
|
||||
pub fn generate_i8kmon_config(matrix: &OptimizationMatrix, output_path: &Path) -> Result<()> {
|
||||
let profile = &matrix.balanced;
|
||||
|
||||
let mut conf = String::new();
|
||||
conf.push_str("# Auto-generated by ember-tune Integrator\n");
conf.push_str(&format!("# Profile: {}\n\n", profile.name));
|
||||
|
||||
for (i, p) in profile.fan_curve.iter().enumerate() {
|
||||
// i8kmon syntax: set config(state) {left_fan right_fan temp_on temp_off}
|
||||
// States 0, 1, 2 correspond to the BIOS fan states (off, low, high)
|
||||
|
||||
let state = match p.pwm_percent {
|
||||
0..=20 => 0,
|
||||
21..=50 => 1,
|
||||
51..=100 => 2,
|
||||
_ => 2,
|
||||
};
|
||||
|
||||
let off = if i == 0 { "-".to_string() } else { format!("{}", p.temp_off) };
|
||||
conf.push_str(&format!("set config({}) {{{} {} {} {}}}\n", i, state, state, p.temp_on, off));
|
||||
}
|
||||
|
||||
fs::write(output_path, conf)?;
|
||||
Ok(())
|
||||
}
|
||||
|
||||
/// Generates a thinkfan configuration.
|
||||
pub fn generate_thinkfan_config(matrix: &OptimizationMatrix, output_path: &Path) -> Result<()> {
|
||||
let profile = &matrix.balanced;
|
||||
|
||||
let mut conf = String::new();
|
||||
conf.push_str("# Auto-generated by ember-tune Integrator\n");
conf.push_str("sensors:\n  - hwmon: /sys/class/hwmon/hwmon0/temp1_input\n\n");
conf.push_str("levels:\n");
|
||||
|
||||
for (i, p) in profile.fan_curve.iter().enumerate() {
|
||||
// thinkfan syntax: - [level, temp_down, temp_up]
|
||||
let level = match p.pwm_percent {
|
||||
0..=20 => 0,
|
||||
21..=40 => 1,
|
||||
41..=60 => 3,
|
||||
61..=80 => 5,
|
||||
_ => 7,
|
||||
};
|
||||
|
||||
let down = if i == 0 { 0.0 } else { p.temp_off };
|
||||
conf.push_str(&format!(" - [{}, {}, {}]\n", level, down, p.temp_on));
|
||||
}
|
||||
|
||||
fs::write(output_path, conf)?;
|
||||
Ok(())
|
||||
}
|
||||
|
||||
/// Generates a resolution checklist/script for daemons.
|
||||
pub fn generate_conflict_resolution_script(output_path: &Path) -> Result<()> {
|
||||
let script = r#"#!/bin/bash
|
||||
# ember-tune Daemon Neutralization Script
|
||||
|
||||
# 1. Mask power-profiles-daemon (Prevent ACPI overrides)
|
||||
systemctl mask power-profiles-daemon
|
||||
|
||||
# 2. Filter TLP (Prevent CPU governor fights while keeping PCIe saving)
|
||||
sed -i 's/^CPU_SCALING_GOVERNOR_ON_AC=.*/CPU_SCALING_GOVERNOR_ON_AC=""/' /etc/tlp.conf
|
||||
sed -i 's/^CPU_BOOST_ON_AC=.*/CPU_BOOST_ON_AC=""/' /etc/tlp.conf
|
||||
systemctl restart tlp
|
||||
|
||||
# 3. Thermald Delegate (We provide the trips, it handles the rest)
|
||||
# (Ensure your custom thermal-conf.xml is in /etc/thermald/)
|
||||
systemctl restart thermald
|
||||
"#;
|
||||
fs::write(output_path, script)?;
|
||||
Ok(())
|
||||
}
|
||||
|
||||
/// Generates a thermald configuration XML.
|
||||
pub fn generate_thermald_config(matrix: &OptimizationMatrix, output_path: &Path) -> Result<()> {
|
||||
let profile = &matrix.balanced;
|
||||
let mut xml = String::new();
|
||||
xml.push_str("<?xml version=\"1.0\"?>\n<ThermalConfiguration>\n <Platform>\n <Name>ember-tune Balanced</Name>\n <ProductName>Generic</ProductName>\n <Preference>balanced</Preference>\n <ThermalZones>\n <ThermalZone>\n <Type>cpu</Type>\n <TripPoints>\n");
|
||||
|
||||
for (i, p) in profile.fan_curve.iter().enumerate() {
|
||||
xml.push_str(&format!(" <TripPoint>\n <SensorType>cpu</SensorType>\n <Temperature>{}</Temperature>\n <Type>Passive</Type>\n <ControlId>{}</ControlId>\n </TripPoint>\n", p.temp_on * 1000.0, i));
|
||||
}
|
||||
|
||||
xml.push_str(" </TripPoints>\n </ThermalZone>\n </ThermalZones>\n </Platform>\n</ThermalConfiguration>\n");
|
||||
fs::write(output_path, xml)?;
|
||||
Ok(())
|
||||
}
|
||||
}
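
The generators above build their output and write it to disk in one step, which makes the mapping hard to inspect. As a rough sketch, the thinkfan level mapping can be separated into pure functions that return a string. `pwm_to_thinkfan_level` and `render_levels` are hypothetical helpers, `FanCurvePoint` is a reduced copy of the struct from `agent_analyst`, and the two-space YAML indentation is an assumption.

```rust
/// Reduced copy of the analyst's fan curve point, for illustration only.
pub struct FanCurvePoint {
    pub temp_on: f32,
    pub temp_off: f32,
    pub pwm_percent: u8,
}

/// Same PWM-to-level buckets as `generate_thinkfan_config` in the diff.
pub fn pwm_to_thinkfan_level(pwm_percent: u8) -> u8 {
    match pwm_percent {
        0..=20 => 0,
        21..=40 => 1,
        41..=60 => 3,
        61..=80 => 5,
        _ => 7,
    }
}

/// Renders the `levels:` section as `[level, temp_down, temp_up]` entries.
pub fn render_levels(curve: &[FanCurvePoint]) -> String {
    let mut out = String::from("levels:\n");
    for (i, p) in curve.iter().enumerate() {
        // The first level has no lower bound, so it starts at 0.
        let down = if i == 0 { 0.0 } else { p.temp_off };
        let level = pwm_to_thinkfan_level(p.pwm_percent);
        out.push_str(&format!("  - [{}, {}, {}]\n", level, down, p.temp_on));
    }
    out
}
```

Fed the Balanced curve from the analyst (0 % at 60 °C, 40 % at 75 °C, 70 % at 85 °C), this yields levels 0, 1, and 5. Returning a `String` also lets a caller decide where, or whether, to write the file.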
|
||||
66
src/agent_metrology/mod.rs
Normal file
@@ -0,0 +1,66 @@
|
||||
//! Telemetry & Benchmarking Methodology (Agent Metrology)
|
||||
//!
|
||||
//! This module defines the execution flow to extract flawless hardware telemetry.
|
||||
//! It isolates specific subsystems (CPU Core, Memory) and executes the Sweep Protocol
|
||||
//! and Thermal Soak to map the physical limits of the hardware.
|
||||
|
||||
use anyhow::Result;
|
||||
use std::time::{Duration, Instant};
|
||||
use std::thread;
|
||||
use crate::sal::traits::PlatformSal;
|
||||
use crate::load::{Workload, IntensityProfile, StressVector};
|
||||
use tracing::info;
|
||||
|
||||
pub struct MetrologyAgent<'a> {
|
||||
sal: &'a dyn PlatformSal,
|
||||
workload: &'a mut Box<dyn Workload>,
|
||||
}
|
||||
|
||||
impl<'a> MetrologyAgent<'a> {
|
||||
pub fn new(sal: &'a dyn PlatformSal, workload: &'a mut Box<dyn Workload>) -> Self {
|
||||
Self { sal, workload }
|
||||
}
|
||||
|
||||
/// Performs a prolonged mixed-load test to achieve chassis thermal saturation.
|
||||
/// Bypasses short-term PL2/boost metrics to find the true steady-state dissipation capacity.
|
||||
pub fn perform_thermal_soak(&mut self, duration_minutes: u64) -> Result<f32> {
|
||||
info!("Metrology: Starting {} minute Thermal Soak...", duration_minutes);
|
||||
|
||||
self.sal.set_fan_mode("max")?;
|
||||
|
||||
// Mixed load: matrix math + memory stressors to saturate entire SoC and Chassis.
|
||||
self.workload.run_workload(
|
||||
Duration::from_secs(duration_minutes * 60),
|
||||
IntensityProfile {
|
||||
threads: num_cpus::get(),
|
||||
load_percentage: 100,
|
||||
vector: StressVector::Mixed
|
||||
}
|
||||
)?;
|
||||
|
||||
let start = Instant::now();
|
||||
let target = Duration::from_secs(duration_minutes * 60);
|
||||
let mut max_sustained_watts = 0.0;
|
||||
|
||||
while start.elapsed() < target {
|
||||
thread::sleep(Duration::from_secs(5));
|
||||
let temp = self.sal.get_temp().unwrap_or(0.0);
|
||||
let watts = self.sal.get_power_w().unwrap_or(0.0);
|
||||
|
||||
if watts > max_sustained_watts {
|
||||
max_sustained_watts = watts;
|
||||
}
|
||||
|
||||
// Abort if dangerously hot
|
||||
if temp >= 98.0 {
|
||||
info!("Metrology: Thermal ceiling hit during soak ({}C). Stopping early.", temp);
|
||||
break;
|
||||
}
|
||||
}
|
||||
|
||||
self.workload.stop_workload()?;
|
||||
info!("Metrology: Thermal Soak complete. Max sustained: {:.1}W", max_sustained_watts);
|
||||
|
||||
Ok(max_sustained_watts)
|
||||
}
|
||||
}
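
The sampling logic inside `perform_thermal_soak` can be exercised without hardware if the sensor readings are passed in as data. The sketch below is a hypothetical reduction of that loop: `max_sustained_watts` takes `(temperature, watts)` pairs instead of calling the SAL, and it leaves out the sleep, the workload control, and the logging.

```rust
/// Tracks the highest power reading and stops at the first sample whose
/// temperature reaches the ceiling, mirroring the order of checks in the
/// soak loop (the power of the aborting sample is still counted).
pub fn max_sustained_watts<I>(samples: I, ceiling_c: f32) -> f32
where
    I: IntoIterator<Item = (f32, f32)>,
{
    let mut max_w = 0.0_f32;
    for (temp_c, watts) in samples {
        if watts > max_w {
            max_w = watts;
        }
        if temp_c >= ceiling_c {
            break; // thermal ceiling hit, stop early
        }
    }
    max_w
}
```

Note that the value is the peak reading seen during the soak, not an average, so a single transient spike raises the result.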
|
||||
@@ -47,6 +47,8 @@ pub struct OptimizationResult {
|
||||
pub is_partial: bool,
|
||||
/// A map of configuration files that were written to.
|
||||
pub config_paths: HashMap<String, PathBuf>,
|
||||
/// The comprehensive optimization matrix (Silent, Balanced, Performance).
|
||||
pub optimization_matrix: Option<crate::agent_analyst::OptimizationMatrix>,
|
||||
}
|
||||
|
||||
/// Pure mathematics engine for thermal optimization.
|
||||
|
||||
0
src/engine/profiles.rs
Normal file
@@ -12,3 +12,6 @@ pub mod ui;
|
||||
pub mod engine;
|
||||
pub mod cli;
|
||||
pub mod sys;
|
||||
pub mod agent_metrology;
|
||||
pub mod agent_analyst;
|
||||
pub mod agent_integrator;
|
||||
|
||||
@@ -17,11 +17,20 @@ pub struct WorkloadMetrics {
|
||||
pub elapsed_time: Duration,
|
||||
}
|
||||
|
||||
/// Defines which subsystem to isolate during stress testing.
|
||||
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
|
||||
pub enum StressVector {
|
||||
CpuMatrix,
|
||||
MemoryBandwidth,
|
||||
Mixed,
|
||||
}
|
||||
|
||||
/// A normalized profile defining the intensity and constraints of the workload.
|
||||
#[derive(Debug, Clone)]
|
||||
pub struct IntensityProfile {
|
||||
pub threads: usize,
|
||||
pub load_percentage: u8,
|
||||
pub vector: StressVector,
|
||||
}
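
How a `StressVector` might translate into a `stress-ng` command line can be sketched as a pure function. `stress_ng_args` is a hypothetical helper, not part of the codebase; the flags and percentages are the ones that appear in `run_workload` later in this diff, and the enum is repeated here only to keep the example self-contained.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum StressVector {
    CpuMatrix,
    MemoryBandwidth,
    Mixed,
}

/// Builds the argument list for one run; nothing is executed here.
pub fn stress_ng_args(
    vector: StressVector,
    threads: usize,
    load_percentage: u8,
    timeout_s: u64,
) -> Vec<String> {
    let mut args: Vec<String> = vec![
        "--timeout".into(),
        format!("{}s", timeout_s),
        "--metrics".into(),
        "--quiet".into(),
    ];
    let extra: Vec<String> = match vector {
        StressVector::CpuMatrix => vec![
            "--matrix".into(),
            threads.to_string(),
            "--cpu-load".into(),
            load_percentage.to_string(),
        ],
        StressVector::MemoryBandwidth => vec![
            "--vm".into(),
            threads.to_string(),
            "--vm-bytes".into(),
            "80%".into(),
        ],
        StressVector::Mixed => {
            // Split the workers between both stressors, with at least one each.
            let half = (threads / 2).max(1).to_string();
            vec![
                "--matrix".into(),
                half.clone(),
                "--vm".into(),
                half,
                "--vm-bytes".into(),
                "40%".into(),
            ]
        }
    };
    args.extend(extra);
    args
}
```

Keeping argument construction separate from process spawning makes the mapping testable on machines where `stress-ng` is not installed.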
|
||||
|
||||
/// The replaceable interface for load generation and performance measurement.
|
||||
@@ -63,7 +72,7 @@ impl Workload for StressNg {
|
||||
.stdout(Stdio::null())
|
||||
.stderr(Stdio::null())
|
||||
.status()
|
||||
.context("stress-ng binary not found in PATH")?;
|
||||
.context("stress-ng binary not found in PATH. Please install it.")?;
|
||||
|
||||
if !status.success() {
|
||||
return Err(anyhow!("stress-ng failed to initialize"));
|
||||
@@ -72,24 +81,29 @@ impl Workload for StressNg {
|
||||
}
|
||||
|
||||
fn run_workload(&mut self, duration: Duration, profile: IntensityProfile) -> Result<()> {
|
||||
self.stop_workload()?; // Ensure clean state
|
||||
self.stop_workload()?;
|
||||
|
||||
let threads = profile.threads.to_string();
|
||||
let timeout = format!("{}s", duration.as_secs());
|
||||
let load = profile.load_percentage.to_string();
|
||||
|
||||
let mut child = Command::new("stress-ng")
|
||||
.args([
|
||||
"--matrix", &threads,
|
||||
"--cpu-load", &load,
|
||||
"--timeout", &timeout,
|
||||
"--metrics-brief",
|
||||
"--metrics-brief", // Repeat for stderr/stdout consistency
|
||||
])
|
||||
.stdout(Stdio::piped())
|
||||
.stderr(Stdio::piped())
|
||||
.spawn()
|
||||
.context("Failed to spawn stress-ng")?;
|
||||
let mut cmd = Command::new("stress-ng");
|
||||
cmd.args(["--timeout", &timeout, "--metrics", "--quiet"]);
|
||||
|
||||
match profile.vector {
|
||||
StressVector::CpuMatrix => {
|
||||
cmd.args(["--matrix", &threads, "--cpu-load", &load]);
|
||||
},
|
||||
StressVector::MemoryBandwidth => {
|
||||
cmd.args(["--vm", &threads, "--vm-bytes", "80%"]);
|
||||
},
|
||||
StressVector::Mixed => {
|
||||
let half = (profile.threads / 2).max(1).to_string();
|
||||
cmd.args(["--matrix", &half, "--vm", &half, "--vm-bytes", "40%"]);
|
||||
}
|
||||
}
|
||||
|
||||
let mut child = cmd.stderr(Stdio::piped()).spawn().context("Failed to spawn stress-ng")?;
|
||||
|
||||
self.start_time = Some(Instant::now());
|
||||
|
||||
@@ -100,16 +114,13 @@ impl Workload for StressNg {
|
||||
thread::spawn(move || {
|
||||
let reader = BufReader::new(stderr);
|
||||
for line in reader.lines().flatten() {
|
||||
// Parse stress-ng metrics line:
|
||||
// stress-ng: info: [PID] matrix [OPS] [TIME] [BOGO OPS/S]
|
||||
if line.contains("matrix") && line.contains("bogo ops/s") {
|
||||
// Parse stress-ng metrics line
|
||||
if line.contains("matrix") || line.contains("vm") {
|
||||
let parts: Vec<&str> = line.split_whitespace().collect();
|
||||
if let Some(ops_idx) = parts.iter().position(|&p| p == "ops/s") {
|
||||
if let Some(ops_val) = parts.get(ops_idx - 1) {
|
||||
if let Ok(ops) = ops_val.parse::<f64>() {
|
||||
let mut m = metrics_ref.lock().unwrap();
|
||||
m.primary_ops_per_sec = ops;
|
||||
}
|
||||
if let Some(val) = parts.last() {
|
||||
if let Ok(ops) = val.parse::<f64>() {
|
||||
let mut m = metrics_ref.lock().unwrap();
|
||||
m.primary_ops_per_sec = ops;
|
||||
}
|
||||
}
|
||||
}
|
||||
@@ -130,7 +141,6 @@ impl Workload for StressNg {
|
||||
|
||||
fn stop_workload(&mut self) -> Result<()> {
|
||||
if let Some(mut child) = self.child.take() {
|
||||
// Polite SIGTERM
|
||||
#[cfg(unix)]
|
||||
{
|
||||
use libc::{kill, SIGTERM};
|
||||
|
||||
@@ -189,6 +189,7 @@ fn main() -> Result<()> {
|
||||
pl1_limit: 0.0,
|
||||
pl2_limit: 0.0,
|
||||
fan_tier: "auto".to_string(),
|
||||
is_throttling: false,
|
||||
phase: BenchmarkPhase::Auditing,
|
||||
history_watts: Vec::new(),
|
||||
history_temp: Vec::new(),
|
||||
|
||||
@@ -35,6 +35,7 @@ pub struct TelemetryState {
|
||||
pub pl1_limit: f32,
|
||||
pub pl2_limit: f32,
|
||||
pub fan_tier: String,
|
||||
pub is_throttling: bool,
|
||||
pub phase: BenchmarkPhase,
|
||||
|
||||
// --- High-res History ---
|
||||
|
||||
@@ -18,9 +18,12 @@ use std::path::PathBuf;
|
||||
use crate::sal::traits::{PlatformSal, SafetyStatus};
|
||||
use crate::sal::heuristic::discovery::SystemFactSheet;
|
||||
use crate::sal::safety::{HardwareStateGuard, TdpLimitMicroWatts, ConfigurationTransaction, ThermalThresholdCelsius};
|
||||
use crate::load::{Workload, IntensityProfile};
|
||||
use crate::load::{Workload, IntensityProfile, StressVector};
|
||||
use crate::mediator::{TelemetryState, UiCommand, BenchmarkPhase};
|
||||
use crate::engine::{OptimizerEngine, ThermalProfile, ThermalPoint, OptimizationResult};
|
||||
use crate::agent_metrology::MetrologyAgent;
|
||||
use crate::agent_analyst::{HeuristicAnalyst, OptimizationMatrix};
|
||||
use crate::agent_integrator::ServiceIntegrator;
|
||||
|
||||
/// The central state machine responsible for coordinating the thermal benchmark.
|
||||
pub struct BenchmarkOrchestrator {
|
||||
@@ -189,6 +192,13 @@ impl BenchmarkOrchestrator {
|
||||
self.profile.ambient_temp = self.engine.smooth(&idle_temps).last().cloned().unwrap_or(0.0);
|
||||
self.log(&format!("✓ Idle Baseline: {:.1}°C", self.profile.ambient_temp))?;
|
||||
|
||||
// Phase 1.5: Thermal Soak (Agent Metrology)
|
||||
self.log("Phase 1.5: Executing Thermal Soak to achieve chassis saturation...")?;
|
||||
let soak_duration_minutes = 1;
|
||||
let mut metrology = MetrologyAgent::new(self.sal.as_ref(), &mut self.workload);
|
||||
let max_soak_watts = metrology.perform_thermal_soak(soak_duration_minutes)?;
|
||||
self.log(&format!("✓ Max sustained wattage during soak: {:.1}W", max_soak_watts))?;
|
||||
|
||||
// Phase 2: Stress Stepping
|
||||
self.phase = BenchmarkPhase::StressTesting;
|
||||
self.log("Phase 2: Starting Synthetic Stress Matrix.")?;
|
||||
@@ -213,7 +223,7 @@ impl BenchmarkOrchestrator {
|
||||
|
||||
self.workload.run_workload(
|
||||
Duration::from_secs(bench_cfg.stress_duration_max_s),
|
||||
IntensityProfile { threads: num_cpus::get(), load_percentage: 100 }
|
||||
IntensityProfile { threads: num_cpus::get(), load_percentage: 100, vector: StressVector::CpuMatrix }
|
||||
)?;
|
||||
|
||||
let step_start = Instant::now();
|
||||
@@ -287,18 +297,22 @@ impl BenchmarkOrchestrator {
|
||||
thread::sleep(Duration::from_secs(bench_cfg.cool_down_s));
|
||||
}
|
||||
|
||||
// Phase 4: Physical Modeling
|
||||
// Phase 4: Physical Modeling (Agent Analyst)
|
||||
self.phase = BenchmarkPhase::PhysicalModeling;
|
||||
self.log("Phase 3: Calculating Silicon Physical Sweet Spot...")?;
|
||||
self.log("Phase 4: Calculating Silicon Physical Sweet Spot & Profiles...")?;
|
||||
|
||||
let analyst = HeuristicAnalyst::new();
|
||||
let matrix = analyst.analyze(&self.profile, max_soak_watts);
|
||||
|
||||
let mut res = self.generate_result(false);
|
||||
res.optimization_matrix = Some(matrix.clone());
|
||||
|
||||
self.log(&format!("✓ Thermal Resistance (Rθ): {:.3} K/W", res.thermal_resistance_kw))?;
|
||||
self.log(&format!("✓ Silicon Knee Found: {:.1} W", res.silicon_knee_watts))?;
|
||||
|
||||
thread::sleep(Duration::from_secs(3));
|
||||
|
||||
// Phase 5: Finalizing
|
||||
// Phase 5: Finalizing (Agent Integrator)
|
||||
self.phase = BenchmarkPhase::Finalizing;
|
||||
self.log("Benchmark sequence complete. Generating configurations...")?;
|
||||
|
||||
@@ -317,15 +331,31 @@ impl BenchmarkOrchestrator {
|
||||
res.config_paths.insert("throttled".to_string(), path.clone());
|
||||
}
|
||||
|
||||
if let Some(i8k_path) = self.facts.paths.configs.get("i8kmon") {
|
||||
let i8k_config = crate::engine::formatters::i8kmon::I8kmonConfig {
|
||||
t_ambient: self.profile.ambient_temp,
|
||||
t_max_fan: res.max_temp_c - 5.0,
|
||||
thermal_resistance_kw: res.thermal_resistance_kw,
|
||||
};
|
||||
crate::engine::formatters::i8kmon::I8kmonTranslator::save(i8k_path, &i8k_config)?;
|
||||
self.log(&format!("✓ Saved '{}'.", i8k_path.display()))?;
|
||||
res.config_paths.insert("i8kmon".to_string(), i8k_path.clone());
|
||||
// Generate Fan configs via Agent Integrator
|
||||
let base_out = self.optional_config_out.clone().unwrap_or_else(|| PathBuf::from("/etc"));
|
||||
|
||||
let i8k_out = base_out.join("i8kmon.conf");
|
||||
if ServiceIntegrator::generate_i8kmon_config(&matrix, &i8k_out).is_ok() {
|
||||
self.log(&format!("✓ Saved '{}'.", i8k_out.display()))?;
|
||||
res.config_paths.insert("i8kmon".to_string(), i8k_out);
|
||||
}
|
||||
|
||||
let thinkfan_out = base_out.join("thinkfan.conf");
|
||||
if ServiceIntegrator::generate_thinkfan_config(&matrix, &thinkfan_out).is_ok() {
|
||||
self.log(&format!("✓ Saved '{}'.", thinkfan_out.display()))?;
|
||||
res.config_paths.insert("thinkfan".to_string(), thinkfan_out);
|
||||
}
|
||||
|
||||
let thermald_out = base_out.join("thermal-conf.xml");
|
||||
if ServiceIntegrator::generate_thermald_config(&matrix, &thermald_out).is_ok() {
|
||||
self.log(&format!("✓ Saved '{}'.", thermald_out.display()))?;
|
||||
res.config_paths.insert("thermald".to_string(), thermald_out);
|
||||
}
|
||||
|
||||
let script_out = base_out.join("ember-tune-neutralize.sh");
|
||||
if ServiceIntegrator::generate_conflict_resolution_script(&script_out).is_ok() {
|
||||
self.log(&format!("✓ Saved conflict resolution script: '{}'", script_out.display()))?;
|
||||
res.config_paths.insert("conflict_script".to_string(), script_out);
|
||||
}
|
||||
|
||||
Ok(res)
|
||||
@@ -359,6 +389,7 @@ impl BenchmarkOrchestrator {
|
||||
pl1_limit: 0.0,
|
||||
pl2_limit: 0.0,
|
||||
fan_tier: String::new(),
|
||||
is_throttling: sal.get_throttling_status().unwrap_or(false),
|
||||
phase: BenchmarkPhase::StressTesting,
|
||||
history_watts: Vec::new(),
|
||||
history_temp: Vec::new(),
|
||||
@@ -396,6 +427,7 @@ impl BenchmarkOrchestrator {
|
||||
max_temp_c: max_t,
|
||||
is_partial,
|
||||
config_paths: std::collections::HashMap::new(),
|
||||
optimization_matrix: None,
|
||||
}
|
||||
}
|
||||
|
||||
@@ -428,6 +460,7 @@ impl BenchmarkOrchestrator {
|
||||
pl1_limit: 0.0,
|
||||
pl2_limit: 0.0,
|
||||
fan_tier: "auto".to_string(),
|
||||
is_throttling: self.sal.get_throttling_status().unwrap_or(false),
|
||||
phase: self.phase,
|
||||
history_watts: Vec::new(),
|
||||
history_temp: Vec::new(),
|
||||
@@ -444,6 +477,7 @@ impl BenchmarkOrchestrator {
|
||||
let temp = self.sal.get_temp().unwrap_or(0.0);
|
||||
let pwr = self.sal.get_power_w().unwrap_or(0.0);
|
||||
let freq = self.sal.get_freq_mhz().unwrap_or(0.0);
|
||||
let throttling = self.sal.get_throttling_status().unwrap_or(false);
|
||||
|
||||
self.history_temp.push_back(temp);
|
||||
self.history_watts.push_back(pwr);
|
||||
@@ -467,6 +501,7 @@ impl BenchmarkOrchestrator {
|
||||
pl1_limit: 15.0,
|
||||
pl2_limit: 25.0,
|
||||
fan_tier: "max".to_string(),
|
||||
is_throttling: throttling,
|
||||
phase: self.phase,
|
||||
history_watts: self.history_watts.iter().cloned().collect(),
|
||||
history_temp: self.history_temp.iter().cloned().collect(),
|
||||
|
||||
@@ -5,9 +5,10 @@ use std::fs;
|
||||
use std::path::{PathBuf};
|
||||
use std::time::{Duration, Instant};
|
||||
use std::sync::Mutex;
|
||||
use tracing::{debug};
|
||||
use tracing::{debug, warn};
|
||||
use crate::sal::heuristic::discovery::SystemFactSheet;
|
||||
|
||||
/// Implementation of the System Abstraction Layer for the Dell XPS 13 9380.
|
||||
pub struct DellXps9380Sal {
|
||||
ctx: EnvironmentCtx,
|
||||
fact_sheet: SystemFactSheet,
|
||||
@@ -23,9 +24,16 @@ pub struct DellXps9380Sal {
|
||||
suppressed_services: Mutex<Vec<String>>,
|
||||
msr_file: Mutex<fs::File>,
|
||||
last_energy: Mutex<(u64, Instant)>,
|
||||
last_watts: Mutex<f32>,
|
||||
|
||||
// --- Original State for Restoration ---
|
||||
original_pl1: Mutex<Option<u64>>,
|
||||
original_pl2: Mutex<Option<u64>>,
|
||||
original_fan_mode: Mutex<Option<String>>,
|
||||
}
|
||||
|
||||
impl DellXps9380Sal {
|
||||
/// Initializes the Dell SAL, opening the MSR interface and discovering sensors.
|
||||
pub fn init(ctx: EnvironmentCtx, facts: SystemFactSheet) -> Result<Self> {
|
||||
let temp_path = facts.temp_path.clone().context("Dell SAL requires temperature sensor")?;
|
||||
let pwr_base = facts.rapl_paths.first().cloned().context("Dell SAL requires RAPL interface")?;
|
||||
@@ -52,8 +60,12 @@ impl DellXps9380Sal {
|
||||
suppressed_services: Mutex::new(Vec::new()),
|
||||
msr_file: Mutex::new(msr_file),
|
||||
last_energy: Mutex::new((initial_energy, Instant::now())),
|
||||
last_watts: Mutex::new(0.0),
|
||||
fact_sheet: facts,
|
||||
ctx,
|
||||
original_pl1: Mutex::new(None),
|
||||
original_pl2: Mutex::new(None),
|
||||
original_fan_mode: Mutex::new(None),
|
||||
})
|
||||
}
|
||||
|
||||
@@ -81,6 +93,22 @@ impl PreflightAuditor for DellXps9380Sal {
            outcome: if unsafe { libc::getuid() } == 0 { Ok(()) } else { Err(AuditError::RootRequired) }
        });

        // RAPL Lock Check (MSR 0x610)
        let rapl_lock = match self.read_msr(0x610) {
            Ok(val) => {
                if (val & (1 << 63)) != 0 {
                    Err(AuditError::KernelIncompatible("RAPL Registers are locked by BIOS. Power limit tuning is impossible.".to_string()))
                } else {
                    Ok(())
                }
            },
            Err(e) => Err(AuditError::ToolMissing(format!("Cannot read MSR 0x610: {}", e))),
        };
        steps.push(AuditStep {
            description: "MSR 0x610 RAPL Lock Status".to_string(),
            outcome: rapl_lock,
        });

        let modules = ["dell_smm_hwmon", "msr", "intel_rapl_msr"];
        for mod_name in modules {
            let path = self.ctx.sysfs_base.join(format!("sys/module/{}", mod_name));
@@ -115,23 +143,24 @@ impl PreflightAuditor for DellXps9380Sal {
            }
        });

        let tool_check = self.fact_sheet.paths.tools.contains_key("dell_fan_ctrl");
        steps.push(AuditStep {
            description: "Dell Fan Control Tool".to_string(),
            outcome: if tool_check { Ok(()) } else { Err(AuditError::ToolMissing("dell-bios-fan-control not found in PATH".to_string())) }
        });

        Box::new(steps.into_iter())
    }
}

impl EnvironmentGuard for DellXps9380Sal {
    fn suppress(&self) -> Result<()> {
        let mut suppressed = self.suppressed_services.lock().unwrap();
        if let Ok(pl1) = fs::read_to_string(&self.pl1_path) {
            *self.original_pl1.lock().unwrap() = pl1.trim().parse().ok();
        }
        if let Ok(pl2) = fs::read_to_string(&self.pl2_path) {
            *self.original_pl2.lock().unwrap() = pl2.trim().parse().ok();
        }
        *self.original_fan_mode.lock().unwrap() = Some("1".to_string());

        let services = ["tlp", "thermald", "i8kmon"];
        let mut suppressed = self.suppressed_services.lock().unwrap();
        for s in services {
            if self.ctx.runner.run("systemctl", &["is-active", "--quiet", s]).is_ok() {
                debug!("Suppressing service: {}", s);
                let _ = self.ctx.runner.run("systemctl", &["stop", s]);
                suppressed.push(s.to_string());
            }
@@ -140,6 +169,15 @@ impl EnvironmentGuard for DellXps9380Sal {
    }

    fn restore(&self) -> Result<()> {
        if let Some(pl1) = *self.original_pl1.lock().unwrap() {
            let _ = fs::write(&self.pl1_path, pl1.to_string());
        }
        if let Some(pl2) = *self.original_pl2.lock().unwrap() {
            let _ = fs::write(&self.pl2_path, pl2.to_string());
        }
        if let Some(tool_path) = self.fact_sheet.paths.tools.get("dell_fan_ctrl") {
            let _ = self.ctx.runner.run(&tool_path.to_string_lossy(), &["1"]);
        }
        let mut suppressed = self.suppressed_services.lock().unwrap();
        for s in suppressed.drain(..) {
            let _ = self.ctx.runner.run("systemctl", &["start", &s]);
@@ -167,16 +205,25 @@ impl SensorBus for DellXps9380Sal {
        let energy_path = rapl_base.join("energy_uj");

        if energy_path.exists() {
            let mut last = self.last_energy.lock().unwrap();
            let mut last_energy = self.last_energy.lock().unwrap();
            let mut last_watts = self.last_watts.lock().unwrap();

            let e2_str = fs::read_to_string(&energy_path)?;
            let e2 = e2_str.trim().parse::<u64>()?;
            let t2 = Instant::now();
            let (e1, t1) = *last;
            let (e1, t1) = *last_energy;

            let delta_e = e2.wrapping_sub(e1);
            let delta_t = t2.duration_since(t1).as_secs_f32();
            *last = (e2, t2);
            if delta_t < 0.05 { return Ok(0.0); }
            Ok((delta_e as f32 / 1_000_000.0) / delta_t)

            if delta_t < 0.1 {
                return Ok(*last_watts); // Return cached if polled too fast
            }

            let watts = (delta_e as f32 / 1_000_000.0) / delta_t;
            *last_energy = (e2, t2);
            *last_watts = watts;
            Ok(watts)
        } else {
            let s = fs::read_to_string(&self.pwr_path)?;
            Ok(s.trim().parse::<f32>()? / 1000000.0)
@@ -204,6 +251,12 @@ impl SensorBus for DellXps9380Sal {
        let s = fs::read_to_string(&self.freq_path)?;
        Ok(s.trim().parse::<f32>()? / 1000.0)
    }

    fn get_throttling_status(&self) -> Result<bool> {
        // MSR 0x19C bit 0 is "Thermal Status", bit 1 is "Thermal Log"
        let val = self.read_msr(0x19C)?;
        Ok((val & 0x1) != 0)
    }
}
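
Review note: the `energy_uj` power reading earlier in this file's diff derives watts from two counter samples and returns the cached value when polled faster than 100 ms. The sketch below restates that logic as a small, testable type. `PowerSampler`, `sample`, and `max_range_uj` are hypothetical names, not part of ember-tune. One assumption to flag loudly: as I understand the kernel's powercap sysfs interface, the counter wraps at the value reported in `max_energy_range_uj`, not at `u64::MAX`, so this sketch computes the wrapped delta from that range instead of using `wrapping_sub`.

```rust
use std::time::{Duration, Instant};

/// Hypothetical helper (not from ember-tune): converts successive
/// energy-counter samples (microjoules) into average power in watts.
pub struct PowerSampler {
    last_energy_uj: u64,
    last_time: Instant,
    last_watts: f32,
    /// Assumed to come from the RAPL zone's `max_energy_range_uj` file.
    max_range_uj: u64,
    /// Samples closer together than this return the cached reading.
    min_interval: Duration,
}

impl PowerSampler {
    pub fn new(initial_energy_uj: u64, now: Instant, max_range_uj: u64) -> Self {
        Self {
            last_energy_uj: initial_energy_uj,
            last_time: now,
            last_watts: 0.0,
            max_range_uj,
            min_interval: Duration::from_millis(100),
        }
    }

    /// Feeds one sample and returns the average power since the last
    /// accepted sample. State is only advanced when a sample is accepted.
    pub fn sample(&mut self, energy_uj: u64, now: Instant) -> f32 {
        let dt = now.duration_since(self.last_time);
        if dt < self.min_interval {
            // Polled too fast: keep the old baseline, return the cached value.
            return self.last_watts;
        }

        let delta_uj = if energy_uj >= self.last_energy_uj {
            energy_uj - self.last_energy_uj
        } else {
            // Counter wrapped once: distance to the top of the range,
            // plus whatever has accumulated since the wrap (approximate).
            self.max_range_uj.saturating_sub(self.last_energy_uj) + energy_uj
        };

        let watts = (delta_uj as f32 / 1_000_000.0) / dt.as_secs_f32();
        self.last_energy_uj = energy_uj;
        self.last_time = now;
        self.last_watts = watts;
        watts
    }
}
```

Passing `now` in explicitly, rather than calling `Instant::now()` inside `sample`, keeps the arithmetic deterministic under test. A caller would read `energy_uj` from sysfs and pass `Instant::now()`.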

impl ActuatorBus for DellXps9380Sal {
@@ -220,14 +273,7 @@ impl ActuatorBus for DellXps9380Sal {
        Ok(())
    }

    fn set_fan_speed(&self, speed: FanSpeedPercentage) -> Result<()> {
        let tool_path = self.fact_sheet.paths.tools.get("dell_fan_ctrl")
            .ok_or_else(|| anyhow!("Dell fan control tool not found in PATH"))?;
        let tool_str = tool_path.to_string_lossy();

        if speed.as_u8() > 50 {
            let _ = self.ctx.runner.run(&tool_str, &["0"]);
        }
    fn set_fan_speed(&self, _speed: FanSpeedPercentage) -> Result<()> {
        Ok(())
}
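
Review note: this file's diff tests two MSR bits inline, bit 63 of `0x610` for the RAPL lock and bit 0 of `0x19C` for the live thermal status. A minimal sketch of how those checks could be named and unit-tested follows. The function and constant names are hypothetical, and `read_msr` here is an assumed stand-in for the method the diff calls (its real body is not shown in this commit). It relies on the Linux `msr` driver convention that the file offset selects the register.

```rust
use std::fs::File;
use std::io;
use std::os::unix::fs::FileExt;

/// Register addresses used in the diff (names follow Intel's SDM).
pub const MSR_PKG_POWER_LIMIT: u32 = 0x610;
pub const IA32_THERM_STATUS: u32 = 0x19C;

const RAPL_LOCK_BIT: u64 = 1 << 63;
const THERMAL_STATUS_BIT: u64 = 1 << 0;
const THERMAL_LOG_BIT: u64 = 1 << 1;

/// Hypothetical reader: with the `msr` driver, an 8-byte read at
/// offset `addr` of /dev/cpu/N/msr returns that register's value.
pub fn read_msr(dev: &File, addr: u32) -> io::Result<u64> {
    let mut buf = [0u8; 8];
    dev.read_exact_at(&mut buf, u64::from(addr))?;
    Ok(u64::from_le_bytes(buf))
}

/// True if the package power-limit register is locked until reset.
pub fn rapl_locked(pkg_power_limit: u64) -> bool {
    pkg_power_limit & RAPL_LOCK_BIT != 0
}

/// True if the core is thermally throttling right now (bit 0).
pub fn throttling_now(therm_status: u64) -> bool {
    therm_status & THERMAL_STATUS_BIT != 0
}

/// True if throttling has occurred since the sticky log bit (bit 1)
/// was last cleared, even if the core is not throttling at this instant.
pub fn throttled_since_clear(therm_status: u64) -> bool {
    therm_status & THERMAL_LOG_BIT != 0
}
```

Splitting the bit tests from the I/O means the audit and sensor paths can be tested with plain integers. Whether the benchmark should also consult the sticky log bit between samples is a design question this commit leaves open.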
@@ -133,6 +133,23 @@ impl SensorBus for GenericLinuxSal {
            Err(anyhow!("Could not determine CPU frequency"))
        }
    }

    fn get_throttling_status(&self) -> Result<bool> {
        // Fallback: check if any cooling device is active (cur_state > 0)
        let cooling_base = self.ctx.sysfs_base.join("sys/class/thermal");
        if let Ok(entries) = fs::read_dir(cooling_base) {
            for entry in entries.flatten() {
                if entry.file_name().to_string_lossy().starts_with("cooling_device") {
                    if let Ok(state) = fs::read_to_string(entry.path().join("cur_state")) {
                        if state.trim().parse::<u32>().unwrap_or(0) > 0 {
                            return Ok(true);
                        }
                    }
                }
            }
        }
        Ok(false)
    }
}
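
Review note: the generic fallback above reports throttling as soon as any `cooling_device*` has `cur_state > 0`. A possible refinement, sketched under assumptions, is to return which devices are active together with the contents of their sysfs `type` file, so the caller can decide whether, for example, a spinning fan should count as throttling. `ActiveCoolingDevice` and `active_cooling_devices` are hypothetical names, and how to classify each `type` string is left to the caller.

```rust
use std::fs;
use std::path::Path;

/// Hypothetical record for one cooling device with a non-zero state.
#[derive(Debug, PartialEq)]
pub struct ActiveCoolingDevice {
    pub name: String,
    /// Trimmed contents of the device's `type` file, empty if unreadable.
    pub kind: String,
    pub cur_state: u32,
}

/// Scans `thermal_dir` (normally /sys/class/thermal) and returns every
/// `cooling_device*` entry whose `cur_state` parses to a value above zero.
/// Unreadable or unparsable entries are treated as inactive, matching
/// the `unwrap_or(0)` behaviour in the diff.
pub fn active_cooling_devices(thermal_dir: &Path) -> Vec<ActiveCoolingDevice> {
    let mut active = Vec::new();
    if let Ok(entries) = fs::read_dir(thermal_dir) {
        for entry in entries.flatten() {
            let name = entry.file_name().to_string_lossy().into_owned();
            if !name.starts_with("cooling_device") {
                continue;
            }
            let cur_state = fs::read_to_string(entry.path().join("cur_state"))
                .ok()
                .and_then(|s| s.trim().parse::<u32>().ok())
                .unwrap_or(0);
            if cur_state == 0 {
                continue;
            }
            let kind = fs::read_to_string(entry.path().join("type"))
                .map(|s| s.trim().to_string())
                .unwrap_or_default();
            active.push(ActiveCoolingDevice { name, kind, cur_state });
        }
    }
    // read_dir order is unspecified; sort for stable output.
    active.sort_by(|a, b| a.name.cmp(&b.name));
    active
}
```

The boolean in the diff is then equivalent to `!active_cooling_devices(&base).is_empty()`. Taking the directory as a parameter mirrors the diff's use of `ctx.sysfs_base` and lets tests point at a temporary tree.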

impl ActuatorBus for GenericLinuxSal {
@@ -54,6 +54,9 @@ impl SensorBus for MockSal {
    fn get_freq_mhz(&self) -> Result<f32> {
        Ok(3200.0)
    }
    fn get_throttling_status(&self) -> Result<bool> {
        Ok(self.get_temp()? > 90.0)
    }
}

impl ActuatorBus for MockSal {
@@ -140,6 +140,9 @@ pub trait SensorBus: Send + Sync {
    /// # Errors
    /// Returns an error if `/proc/cpuinfo` or a `cpufreq` sysfs node cannot be read.
    fn get_freq_mhz(&self) -> Result<f32>;

    /// Returns true if the system is currently thermally throttling.
    fn get_throttling_status(&self) -> Result<bool>;
}

impl<T: SensorBus + ?Sized> SensorBus for Arc<T> {
@@ -155,6 +158,9 @@ impl<T: SensorBus + ?Sized> SensorBus for Arc<T> {
    fn get_freq_mhz(&self) -> Result<f32> {
        (**self).get_freq_mhz()
    }
    fn get_throttling_status(&self) -> Result<bool> {
        (**self).get_throttling_status()
    }
}

use crate::sal::safety::{TdpLimitMicroWatts, FanSpeedPercentage};