diff --git a/README.md b/README.md index eeb4b38..4aa18b9 100644 --- a/README.md +++ b/README.md @@ -1,33 +1,61 @@ +# 🔥 ember-tune +```text + __________ ____ ______ ____ ______ __ __ _ __ ______ + / ____/ |/ // __ )/ ____// __ \ /_ __/ / / / // | / // ____/ + / __/ / /|_/ // __ / __/ / /_/ / / / / / / // |/ // __/ +/ /___ / / / // /_/ / /___ / _, _/ / / / /_/ // /| // /___ +/_____//_/ /_//_____/_____//_/ |_| /_/ \____//_/ |_//_____/ + + >>> Physically-grounded thermal & power optimization for Linux <<< +``` + +> ### **Find your hardware's "Physical Sweet Spot" through automated trial-by-fire.** + +`ember-tune` is a scientifically driven hardware optimizer that replaces guesswork and manual tuning with a rigorous, automated engineering workflow. It determines the unique thermal properties of your specific laptop, including its Thermal Resistance (Rθ) and "Silicon Knee", to generate optimal configurations for common Linux tuning daemons. + +## ✨ Features + +- **Automated Physical Benchmarking:** Measures real-world thermal performance under load to find the true "sweet spot" where performance-per-watt is maximized before thermal saturation causes diminishing returns. +- **Heuristic Hardware Discovery:** Uses a data-driven System Abstraction Layer (SAL) that probes your system and automatically adapts to its unique quirks, drivers, and sensor paths. +- **Non-Destructive Configuration:** Safely merges new, optimized power limits into your existing `throttled.conf`, preserving manual undervolt settings and comments. +- **Universal Safeguard Architecture (USA):** Includes a high-frequency concurrent watchdog and RAII state restoration to guarantee your system is never left in a dangerous state. +- **Real-time TUI Dashboard:** A `ratatui`-based terminal interface provides high-resolution telemetry throughout the benchmark. 
+ +## 🔬 How it Works: The Architecture + +`ember-tune` is built on a decoupled, multi-threaded architecture to ensure the UI is always responsive and that hardware state is managed safely. + +1. **The Heuristic Engine:** On startup, the engine probes your system's DMI, `sysfs`, and active services. It compares these "facts" against the `hardware_db.toml` to select the correct System Abstraction Layer (SAL). +2. **The Orchestrator (Backend Thread):** This is the state machine that executes the benchmark. It communicates with hardware *only* through the SAL traits. +3. **The TUI (Main Thread):** The `ratatui` dashboard renders `TelemetryState` snapshots received from the orchestrator via an MPSC channel. +4. **The Watchdog (Safety Thread):** A high-priority thread that polls safety sensors every 100ms to trigger an atomic `EmergencyAbort` if failure conditions are met. + ## ⚙️ Development Setup -`ember-tune` is a standard Cargo project. You will need a recent Rust toolchain and common build utilities. +`ember-tune` is a standard Cargo project. **Prerequisites:** - `rustup` -- `build-essential` (or equivalent for your distribution) +- `build-essential` - `libudev-dev` +- `stress-ng` (Required for benchmarking) ```bash -# 1. Clone the repository +# 1. Clone and Build git clone https://gitea.com/narl/ember-tune.git cd ember-tune - -# 2. Build the release binary cargo build --release -# 3. Run the test suite (safe, uses a virtual environment) -# This requires no special permissions and does not touch your hardware. +# 2. Run the safe test suite cargo test ``` **Running:** -Due to its direct hardware access, `ember-tune` requires root privileges. 
- ```bash -# Run a full benchmark and generate optimized configs +# Run a full benchmark sudo ./target/release/ember-tune -# Run a mock benchmark for UI/logic testing +# Run a mock benchmark for UI testing sudo ./target/release/ember-tune --mock ``` @@ -35,48 +63,24 @@ sudo ./target/release/ember-tune --mock ## 🤝 Contributing Quirk Data (`hardware_db.toml`) -**This is the most impactful way to contribute.** `ember-tune`'s strength comes from its `assets/hardware_db.toml`, which encodes community knowledge about how to manage specific laptops. If your hardware isn't working perfectly, you can likely fix it by adding a new entry here. +**This is the most impactful way to contribute.** If your hardware isn't working perfectly, add a new entry to `assets/hardware_db.toml`. -The database is composed of four key sections: `conflicts`, `ecosystems`, `quirks`, and `discovery`. - -### A. Reporting a Service Conflict -If a background service on your system interferes with `ember-tune`, add it to `[[conflicts]]`. - -**Example:** Adding `laptop-mode-tools`. +### Example: Adding a Service Conflict ```toml [[conflicts]] id = "laptop_mode_conflict" services = ["laptop-mode.service"] contention = "Multiple - I/O schedulers, Power limits" severity = "Medium" -fix_action = "SuspendService" # Orchestrator will stop/start this service +fix_action = "SuspendService" help_text = "laptop-mode-tools can override power-related sysfs settings." ``` -### B. Adding a New Hardware Ecosystem -If your laptop manufacturer (e.g., Razer) has a unique fan control tool or ACPI platform profile path, define it in `[ecosystems]`. - -**Example:** A hypothetical "Razer" ecosystem. -```toml -[ecosystems.razer] -vendor_regex = "Razer" -# Path to the sysfs node that controls performance profiles -profiles_path = "/sys/bus/platform/drivers/razer_acpi/power_mode" -# Map human-readable names to the values the driver expects -policy_map = { Balanced = 0, Boost = 1, Silent = 2 } -``` - -### C. 
Defining a Model-Specific Quirk -If a specific laptop model has a bug (like a stuck sensor or incorrect fan reporting), define a `[[quirks]]` entry. - -**Example:** A laptop whose fans report 0 RPM even when spinning. +### Example: Defining a Model-Specific Quirk ```toml [[quirks]] model_regex = "HP Envy 15-ep.*" id = "hp_fan_stuck_sensor" issue = "Fan sensor reports 0 RPM when active." -# The 'action' tells the SAL to use a different method for fan detection. action = "UseThermalVelocityFallback" ``` - -After adding your changes, run the test suite and then submit a Pull Request! diff --git a/src/agent_analyst/mod.rs b/src/agent_analyst/mod.rs new file mode 100644 index 0000000..c5b3b33 --- /dev/null +++ b/src/agent_analyst/mod.rs @@ -0,0 +1,100 @@ +//! Heuristic Analysis & Optimization Math (Agent Analyst) +//! +//! This module analyzes raw telemetry data to extract the "Optimal Real-World Settings". +//! It calculates the Silicon Knee, Acoustic/Thermal Matrix (Hysteresis), and +//! generates three distinct hardware states: Silent, Balanced, and Sustained Heavy. + +use serde::{Serialize, Deserialize}; +use crate::engine::{ThermalProfile, OptimizerEngine}; + +#[derive(Debug, Clone, Serialize, Deserialize)] +pub struct FanCurvePoint { + pub temp_on: f32, + pub temp_off: f32, + pub pwm_percent: u8, +} + +#[derive(Debug, Clone, Serialize, Deserialize)] +pub struct SystemProfile { + pub name: String, + pub pl1_watts: f32, + pub pl2_watts: f32, + pub fan_curve: Vec, +} + +#[derive(Debug, Clone, Serialize, Deserialize)] +pub struct OptimizationMatrix { + pub silent: SystemProfile, + pub balanced: SystemProfile, + pub performance: SystemProfile, + pub thermal_resistance_kw: f32, +} + +pub struct HeuristicAnalyst { + engine: OptimizerEngine, +} + +impl HeuristicAnalyst { + pub fn new() -> Self { + Self { + engine: OptimizerEngine::new(5), + } + } + + /// Analyzes the raw telemetry to generate the 3 optimal profiles. 
+ pub fn analyze(&self, profile: &ThermalProfile, max_soak_watts: f32) -> OptimizationMatrix { + let r_theta = self.engine.calculate_thermal_resistance(profile); + let silicon_knee = self.engine.find_silicon_knee(profile); + + // 1. State A: Silent / Battery (Scientific Passive Equilibrium) + // Objective: Find P where T_core = 60C with fans OFF. + // T_core = T_ambient + (P * R_theta_passive) + // Note: R_theta measured during benchmark was with fans MAX. + // Passive R_theta is typically 2-3x higher. + let r_theta_passive = r_theta * 2.5; + let silent_watts = ((60.0 - profile.ambient_temp) / r_theta_passive.max(0.1)).clamp(5.0, 15.0); + + let silent_profile = SystemProfile { + name: "Silent".to_string(), + pl1_watts: silent_watts, + pl2_watts: silent_watts * 1.2, + fan_curve: vec![ + FanCurvePoint { temp_on: 65.0, temp_off: 55.0, pwm_percent: 0 }, + FanCurvePoint { temp_on: 75.0, temp_off: 65.0, pwm_percent: 30 }, + ], + }; + + // 2. State B: Balanced + // The exact calculated Silicon Knee + let balanced_profile = SystemProfile { + name: "Balanced".to_string(), + pl1_watts: silicon_knee, + pl2_watts: silicon_knee * 1.25, + fan_curve: vec![ + FanCurvePoint { temp_on: 60.0, temp_off: 55.0, pwm_percent: 0 }, + FanCurvePoint { temp_on: 75.0, temp_off: 65.0, pwm_percent: 40 }, + FanCurvePoint { temp_on: 85.0, temp_off: 75.0, pwm_percent: 70 }, + ], + }; + + // 3. State C: Sustained Heavy + // Based on the max soak watts from Phase 1. 
+ let performance_profile = SystemProfile { + name: "Performance".to_string(), + pl1_watts: max_soak_watts, + pl2_watts: max_soak_watts * 1.3, + fan_curve: vec![ + FanCurvePoint { temp_on: 50.0, temp_off: 45.0, pwm_percent: 30 }, + FanCurvePoint { temp_on: 70.0, temp_off: 60.0, pwm_percent: 60 }, + FanCurvePoint { temp_on: 85.0, temp_off: 75.0, pwm_percent: 100 }, + ], + }; + + OptimizationMatrix { + silent: silent_profile, + balanced: balanced_profile, + performance: performance_profile, + thermal_resistance_kw: r_theta, + } + } +} diff --git a/src/agent_integrator/mod.rs b/src/agent_integrator/mod.rs new file mode 100644 index 0000000..8328498 --- /dev/null +++ b/src/agent_integrator/mod.rs @@ -0,0 +1,115 @@ +//! System Service Integration (Agent Integrator) +//! +//! This module translates the mathematical optimums defined by the Analyst +//! into actionable, real-world Linux/OS service configurations. +//! It generates templates for fan daemons (i8kmon, thinkfan) and handles +//! resolution strategies for overlapping daemons. + +use anyhow::Result; +use std::path::Path; +use std::fs; +use crate::agent_analyst::OptimizationMatrix; + +pub struct ServiceIntegrator; + +impl ServiceIntegrator { + /// Generates and saves an i8kmon configuration based on the balanced profile. 
+ pub fn generate_i8kmon_config(matrix: &OptimizationMatrix, output_path: &Path) -> Result<()> { + let profile = &matrix.balanced; + + let mut conf = String::new(); + conf.push_str("# Auto-generated by ember-tune Integrator +"); + conf.push_str(&format!("# Profile: {} + +", profile.name)); + + for (i, p) in profile.fan_curve.iter().enumerate() { + // i8kmon syntax: set config(state) {left_fan right_fan temp_on temp_off} + // State 0, 1, 2, 3 correspond to BIOS fan states (off, low, high) + + let state = match p.pwm_percent { + 0..=20 => 0, + 21..=50 => 1, + 51..=100 => 2, + _ => 2, + }; + + let off = if i == 0 { "-".to_string() } else { format!("{}", p.temp_off) }; + conf.push_str(&format!("set config({}) {{{} {} {} {}}} +", i, state, state, p.temp_on, off)); + } + + fs::write(output_path, conf)?; + Ok(()) + } + + /// Generates a thinkfan configuration. + pub fn generate_thinkfan_config(matrix: &OptimizationMatrix, output_path: &Path) -> Result<()> { + let profile = &matrix.balanced; + + let mut conf = String::new(); + conf.push_str("# Auto-generated by ember-tune Integrator +"); + conf.push_str("sensors: + - hwmon: /sys/class/hwmon/hwmon0/temp1_input + +"); + conf.push_str("levels: +"); + + for (i, p) in profile.fan_curve.iter().enumerate() { + // thinkfan syntax: - [level, temp_down, temp_up] + let level = match p.pwm_percent { + 0..=20 => 0, + 21..=40 => 1, + 41..=60 => 3, + 61..=80 => 5, + _ => 7, + }; + + let down = if i == 0 { 0.0 } else { p.temp_off }; + conf.push_str(&format!(" - [{}, {}, {}] +", level, down, p.temp_on)); + } + + fs::write(output_path, conf)?; + Ok(()) + } + + /// Generates a resolution checklist/script for daemons. + pub fn generate_conflict_resolution_script(output_path: &Path) -> Result<()> { + let script = r#"#!/bin/bash +# ember-tune Daemon Neutralization Script + +# 1. Mask power-profiles-daemon (Prevent ACPI overrides) +systemctl mask power-profiles-daemon + +# 2. 
Filter TLP (Prevent CPU governor fights while keeping PCIe saving) +sed -i 's/^CPU_SCALING_GOVERNOR_ON_AC=.*/CPU_SCALING_GOVERNOR_ON_AC=""/' /etc/tlp.conf +sed -i 's/^CPU_BOOST_ON_AC=.*/CPU_BOOST_ON_AC=""/' /etc/tlp.conf +systemctl restart tlp + +# 3. Thermald Delegate (We provide the trips, it handles the rest) +# (Ensure your custom thermal-conf.xml is in /etc/thermald/) +systemctl restart thermald +"#; + fs::write(output_path, script)?; + Ok(()) + } + + /// Generates a thermald configuration XML. + pub fn generate_thermald_config(matrix: &OptimizationMatrix, output_path: &Path) -> Result<()> { + let profile = &matrix.balanced; + let mut xml = String::new(); + xml.push_str("\n\n \n ember-tune Balanced\n Generic\n balanced\n \n \n cpu\n \n"); + + for (i, p) in profile.fan_curve.iter().enumerate() { + xml.push_str(&format!(" \n cpu\n {}\n Passive\n {}\n \n", p.temp_on * 1000.0, i)); + } + + xml.push_str(" \n \n \n \n\n"); + fs::write(output_path, xml)?; + Ok(()) + } +} diff --git a/src/agent_metrology/mod.rs b/src/agent_metrology/mod.rs new file mode 100644 index 0000000..7bc4946 --- /dev/null +++ b/src/agent_metrology/mod.rs @@ -0,0 +1,66 @@ +//! Telemetry & Benchmarking Methodology (Agent Metrology) +//! +//! This module defines the execution flow to extract flawless hardware telemetry. +//! It isolates specific subsystems (CPU Core, Memory) and executes the Sweep Protocol +//! and Thermal Soak to map the physical limits of the hardware. + +use anyhow::Result; +use std::time::{Duration, Instant}; +use std::thread; +use crate::sal::traits::PlatformSal; +use crate::load::{Workload, IntensityProfile, StressVector}; +use tracing::info; + +pub struct MetrologyAgent<'a> { + sal: &'a dyn PlatformSal, + workload: &'a mut Box, +} + +impl<'a> MetrologyAgent<'a> { + pub fn new(sal: &'a dyn PlatformSal, workload: &'a mut Box) -> Self { + Self { sal, workload } + } + + /// Performs a prolonged mixed-load test to achieve chassis thermal saturation. 
+ /// Bypasses short-term PL2/boost metrics to find the true steady-state dissipation capacity. + pub fn perform_thermal_soak(&mut self, duration_minutes: u64) -> Result<f32> { + info!("Metrology: Starting {} minute Thermal Soak...", duration_minutes); + + self.sal.set_fan_mode("max")?; + + // Mixed load: matrix math + memory stressors to saturate entire SoC and Chassis. + self.workload.run_workload( + Duration::from_secs(duration_minutes * 60), + IntensityProfile { + threads: num_cpus::get(), + load_percentage: 100, + vector: StressVector::Mixed + } + )?; + + let start = Instant::now(); + let target = Duration::from_secs(duration_minutes * 60); + let mut max_sustained_watts = 0.0; + + while start.elapsed() < target { + thread::sleep(Duration::from_secs(5)); + let temp = self.sal.get_temp().unwrap_or(0.0); + let watts = self.sal.get_power_w().unwrap_or(0.0); + + if watts > max_sustained_watts { + max_sustained_watts = watts; + } + + // Abort if dangerously hot + if temp >= 98.0 { + info!("Metrology: Thermal ceiling hit during soak ({}C). Stopping early.", temp); + break; + } + } + + self.workload.stop_workload()?; + info!("Metrology: Thermal Soak complete. Max sustained: {:.1}W", max_sustained_watts); + + Ok(max_sustained_watts) + } +} diff --git a/src/engine/mod.rs b/src/engine/mod.rs index 07997d8..e65a992 100644 --- a/src/engine/mod.rs +++ b/src/engine/mod.rs @@ -47,6 +47,8 @@ pub struct OptimizationResult { pub is_partial: bool, /// A map of configuration files that were written to. pub config_paths: HashMap<String, PathBuf>, + /// The comprehensive optimization matrix (Silent, Balanced, Performance). + pub optimization_matrix: Option<crate::agent_analyst::OptimizationMatrix>, } /// Pure mathematics engine for thermal optimization. 
diff --git a/src/engine/profiles.rs b/src/engine/profiles.rs new file mode 100644 index 0000000..e69de29 diff --git a/src/lib.rs b/src/lib.rs index 0f4aa6a..99103a3 100644 --- a/src/lib.rs +++ b/src/lib.rs @@ -12,3 +12,6 @@ pub mod ui; pub mod engine; pub mod cli; pub mod sys; +pub mod agent_metrology; +pub mod agent_analyst; +pub mod agent_integrator; diff --git a/src/load/mod.rs b/src/load/mod.rs index 3ec7956..a19ed48 100644 --- a/src/load/mod.rs +++ b/src/load/mod.rs @@ -17,11 +17,20 @@ pub struct WorkloadMetrics { pub elapsed_time: Duration, } +/// Defines which subsystem to isolate during stress testing. +#[derive(Debug, Clone, Copy, PartialEq, Eq)] +pub enum StressVector { + CpuMatrix, + MemoryBandwidth, + Mixed, +} + /// A normalized profile defining the intensity and constraints of the workload. #[derive(Debug, Clone)] pub struct IntensityProfile { pub threads: usize, pub load_percentage: u8, + pub vector: StressVector, } /// The replaceable interface for load generation and performance measurement. @@ -63,7 +72,7 @@ impl Workload for StressNg { .stdout(Stdio::null()) .stderr(Stdio::null()) .status() - .context("stress-ng binary not found in PATH")?; + .context("stress-ng binary not found in PATH. 
Please install it.")?; if !status.success() { return Err(anyhow!("stress-ng failed to initialize")); @@ -72,24 +81,29 @@ impl Workload for StressNg { } fn run_workload(&mut self, duration: Duration, profile: IntensityProfile) -> Result<()> { - self.stop_workload()?; // Ensure clean state + self.stop_workload()?; let threads = profile.threads.to_string(); let timeout = format!("{}s", duration.as_secs()); let load = profile.load_percentage.to_string(); - let mut child = Command::new("stress-ng") - .args([ - "--matrix", &threads, - "--cpu-load", &load, - "--timeout", &timeout, - "--metrics-brief", - "--metrics-brief", // Repeat for stderr/stdout consistency - ]) - .stdout(Stdio::piped()) - .stderr(Stdio::piped()) - .spawn() - .context("Failed to spawn stress-ng")?; + let mut cmd = Command::new("stress-ng"); + cmd.args(["--timeout", &timeout, "--metrics", "--quiet"]); + + match profile.vector { + StressVector::CpuMatrix => { + cmd.args(["--matrix", &threads, "--cpu-load", &load]); + }, + StressVector::MemoryBandwidth => { + cmd.args(["--vm", &threads, "--vm-bytes", "80%"]); + }, + StressVector::Mixed => { + let half = (profile.threads / 2).max(1).to_string(); + cmd.args(["--matrix", &half, "--vm", &half, "--vm-bytes", "40%"]); + } + } + + let mut child = cmd.stderr(Stdio::piped()).spawn().context("Failed to spawn stress-ng")?; self.start_time = Some(Instant::now()); @@ -100,16 +114,13 @@ impl Workload for StressNg { thread::spawn(move || { let reader = BufReader::new(stderr); for line in reader.lines().flatten() { - // Parse stress-ng metrics line: - // stress-ng: info: [PID] matrix [OPS] [TIME] [BOGO OPS/S] - if line.contains("matrix") && line.contains("bogo ops/s") { + // Parse stress-ng metrics line + if line.contains("matrix") || line.contains("vm") { let parts: Vec<&str> = line.split_whitespace().collect(); - if let Some(ops_idx) = parts.iter().position(|&p| p == "ops/s") { - if let Some(ops_val) = parts.get(ops_idx - 1) { - if let Ok(ops) = ops_val.parse::() { - 
let mut m = metrics_ref.lock().unwrap(); - m.primary_ops_per_sec = ops; - } + if let Some(val) = parts.last() { + if let Ok(ops) = val.parse::() { + let mut m = metrics_ref.lock().unwrap(); + m.primary_ops_per_sec = ops; } } } @@ -130,7 +141,6 @@ impl Workload for StressNg { fn stop_workload(&mut self) -> Result<()> { if let Some(mut child) = self.child.take() { - // Polite SIGTERM #[cfg(unix)] { use libc::{kill, SIGTERM}; diff --git a/src/main.rs b/src/main.rs index bc962f4..8e09e73 100644 --- a/src/main.rs +++ b/src/main.rs @@ -189,6 +189,7 @@ fn main() -> Result<()> { pl1_limit: 0.0, pl2_limit: 0.0, fan_tier: "auto".to_string(), + is_throttling: false, phase: BenchmarkPhase::Auditing, history_watts: Vec::new(), history_temp: Vec::new(), diff --git a/src/mediator.rs b/src/mediator.rs index a2d4266..4d1acdc 100644 --- a/src/mediator.rs +++ b/src/mediator.rs @@ -35,6 +35,7 @@ pub struct TelemetryState { pub pl1_limit: f32, pub pl2_limit: f32, pub fan_tier: String, + pub is_throttling: bool, pub phase: BenchmarkPhase, // --- High-res History --- diff --git a/src/orchestrator/mod.rs b/src/orchestrator/mod.rs index 9fe8341..b3f1071 100644 --- a/src/orchestrator/mod.rs +++ b/src/orchestrator/mod.rs @@ -18,9 +18,12 @@ use std::path::PathBuf; use crate::sal::traits::{PlatformSal, SafetyStatus}; use crate::sal::heuristic::discovery::SystemFactSheet; use crate::sal::safety::{HardwareStateGuard, TdpLimitMicroWatts, ConfigurationTransaction, ThermalThresholdCelsius}; -use crate::load::{Workload, IntensityProfile}; +use crate::load::{Workload, IntensityProfile, StressVector}; use crate::mediator::{TelemetryState, UiCommand, BenchmarkPhase}; use crate::engine::{OptimizerEngine, ThermalProfile, ThermalPoint, OptimizationResult}; +use crate::agent_metrology::MetrologyAgent; +use crate::agent_analyst::{HeuristicAnalyst, OptimizationMatrix}; +use crate::agent_integrator::ServiceIntegrator; /// The central state machine responsible for coordinating the thermal benchmark. 
pub struct BenchmarkOrchestrator { @@ -189,6 +192,13 @@ impl BenchmarkOrchestrator { self.profile.ambient_temp = self.engine.smooth(&idle_temps).last().cloned().unwrap_or(0.0); self.log(&format!("✓ Idle Baseline: {:.1}°C", self.profile.ambient_temp))?; + // Phase 1.5: Thermal Soak (Agent Metrology) + self.log("Phase 1.5: Executing Thermal Soak to achieve chassis saturation...")?; + let soak_duration_minutes = 1; + let mut metrology = MetrologyAgent::new(self.sal.as_ref(), &mut self.workload); + let max_soak_watts = metrology.perform_thermal_soak(soak_duration_minutes)?; + self.log(&format!("✓ Max sustained wattage during soak: {:.1}W", max_soak_watts))?; + // Phase 2: Stress Stepping self.phase = BenchmarkPhase::StressTesting; self.log("Phase 2: Starting Synthetic Stress Matrix.")?; @@ -213,7 +223,7 @@ impl BenchmarkOrchestrator { self.workload.run_workload( Duration::from_secs(bench_cfg.stress_duration_max_s), - IntensityProfile { threads: num_cpus::get(), load_percentage: 100 } + IntensityProfile { threads: num_cpus::get(), load_percentage: 100, vector: StressVector::CpuMatrix } )?; let step_start = Instant::now(); @@ -287,18 +297,22 @@ impl BenchmarkOrchestrator { thread::sleep(Duration::from_secs(bench_cfg.cool_down_s)); } - // Phase 4: Physical Modeling + // Phase 4: Physical Modeling (Agent Analyst) self.phase = BenchmarkPhase::PhysicalModeling; - self.log("Phase 3: Calculating Silicon Physical Sweet Spot...")?; + self.log("Phase 3: Calculating Silicon Physical Sweet Spot & Profiles...")?; + + let analyst = HeuristicAnalyst::new(); + let matrix = analyst.analyze(&self.profile, max_soak_watts); let mut res = self.generate_result(false); + res.optimization_matrix = Some(matrix.clone()); self.log(&format!("✓ Thermal Resistance (Rθ): {:.3} K/W", res.thermal_resistance_kw))?; self.log(&format!("✓ Silicon Knee Found: {:.1} W", res.silicon_knee_watts))?; thread::sleep(Duration::from_secs(3)); - // Phase 5: Finalizing + // Phase 5: Finalizing (Agent Integrator) 
self.phase = BenchmarkPhase::Finalizing; self.log("Benchmark sequence complete. Generating configurations...")?; @@ -317,15 +331,31 @@ impl BenchmarkOrchestrator { res.config_paths.insert("throttled".to_string(), path.clone()); } - if let Some(i8k_path) = self.facts.paths.configs.get("i8kmon") { - let i8k_config = crate::engine::formatters::i8kmon::I8kmonConfig { - t_ambient: self.profile.ambient_temp, - t_max_fan: res.max_temp_c - 5.0, - thermal_resistance_kw: res.thermal_resistance_kw, - }; - crate::engine::formatters::i8kmon::I8kmonTranslator::save(i8k_path, &i8k_config)?; - self.log(&format!("✓ Saved '{}'.", i8k_path.display()))?; - res.config_paths.insert("i8kmon".to_string(), i8k_path.clone()); + // Generate Fan configs via Agent Integrator + let base_out = self.optional_config_out.clone().unwrap_or_else(|| PathBuf::from("/etc")); + + let i8k_out = base_out.join("i8kmon.conf"); + if ServiceIntegrator::generate_i8kmon_config(&matrix, &i8k_out).is_ok() { + self.log(&format!("✓ Saved '{}'.", i8k_out.display()))?; + res.config_paths.insert("i8kmon".to_string(), i8k_out); + } + + let thinkfan_out = base_out.join("thinkfan.conf"); + if ServiceIntegrator::generate_thinkfan_config(&matrix, &thinkfan_out).is_ok() { + self.log(&format!("✓ Saved '{}'.", thinkfan_out.display()))?; + res.config_paths.insert("thinkfan".to_string(), thinkfan_out); + } + + let thermald_out = base_out.join("thermal-conf.xml"); + if ServiceIntegrator::generate_thermald_config(&matrix, &thermald_out).is_ok() { + self.log(&format!("✓ Saved '{}'.", thermald_out.display()))?; + res.config_paths.insert("thermald".to_string(), thermald_out); + } + + let script_out = base_out.join("ember-tune-neutralize.sh"); + if ServiceIntegrator::generate_conflict_resolution_script(&script_out).is_ok() { + self.log(&format!("✓ Saved conflict resolution script: '{}'", script_out.display()))?; + res.config_paths.insert("conflict_script".to_string(), script_out); } Ok(res) @@ -359,6 +389,7 @@ impl 
BenchmarkOrchestrator { pl1_limit: 0.0, pl2_limit: 0.0, fan_tier: String::new(), + is_throttling: sal.get_throttling_status().unwrap_or(false), phase: BenchmarkPhase::StressTesting, history_watts: Vec::new(), history_temp: Vec::new(), @@ -396,6 +427,7 @@ impl BenchmarkOrchestrator { max_temp_c: max_t, is_partial, config_paths: std::collections::HashMap::new(), + optimization_matrix: None, } } @@ -428,6 +460,7 @@ impl BenchmarkOrchestrator { pl1_limit: 0.0, pl2_limit: 0.0, fan_tier: "auto".to_string(), + is_throttling: self.sal.get_throttling_status().unwrap_or(false), phase: self.phase, history_watts: Vec::new(), history_temp: Vec::new(), @@ -444,6 +477,7 @@ impl BenchmarkOrchestrator { let temp = self.sal.get_temp().unwrap_or(0.0); let pwr = self.sal.get_power_w().unwrap_or(0.0); let freq = self.sal.get_freq_mhz().unwrap_or(0.0); + let throttling = self.sal.get_throttling_status().unwrap_or(false); self.history_temp.push_back(temp); self.history_watts.push_back(pwr); @@ -467,6 +501,7 @@ impl BenchmarkOrchestrator { pl1_limit: 15.0, pl2_limit: 25.0, fan_tier: "max".to_string(), + is_throttling: throttling, phase: self.phase, history_watts: self.history_watts.iter().cloned().collect(), history_temp: self.history_temp.iter().cloned().collect(), diff --git a/src/sal/dell_xps_9380.rs b/src/sal/dell_xps_9380.rs index 6c81d1a..be78de1 100644 --- a/src/sal/dell_xps_9380.rs +++ b/src/sal/dell_xps_9380.rs @@ -5,9 +5,10 @@ use std::fs; use std::path::{PathBuf}; use std::time::{Duration, Instant}; use std::sync::Mutex; -use tracing::{debug}; +use tracing::{debug, warn}; use crate::sal::heuristic::discovery::SystemFactSheet; +/// Implementation of the System Abstraction Layer for the Dell XPS 13 9380. 
pub struct DellXps9380Sal { ctx: EnvironmentCtx, fact_sheet: SystemFactSheet, @@ -23,9 +24,16 @@ pub struct DellXps9380Sal { suppressed_services: Mutex>, msr_file: Mutex, last_energy: Mutex<(u64, Instant)>, + last_watts: Mutex, + + // --- Original State for Restoration --- + original_pl1: Mutex>, + original_pl2: Mutex>, + original_fan_mode: Mutex>, } impl DellXps9380Sal { + /// Initializes the Dell SAL, opening the MSR interface and discovering sensors. pub fn init(ctx: EnvironmentCtx, facts: SystemFactSheet) -> Result { let temp_path = facts.temp_path.clone().context("Dell SAL requires temperature sensor")?; let pwr_base = facts.rapl_paths.first().cloned().context("Dell SAL requires RAPL interface")?; @@ -52,8 +60,12 @@ impl DellXps9380Sal { suppressed_services: Mutex::new(Vec::new()), msr_file: Mutex::new(msr_file), last_energy: Mutex::new((initial_energy, Instant::now())), + last_watts: Mutex::new(0.0), fact_sheet: facts, ctx, + original_pl1: Mutex::new(None), + original_pl2: Mutex::new(None), + original_fan_mode: Mutex::new(None), }) } @@ -81,6 +93,22 @@ impl PreflightAuditor for DellXps9380Sal { outcome: if unsafe { libc::getuid() } == 0 { Ok(()) } else { Err(AuditError::RootRequired) } }); + // RAPL Lock Check (MSR 0x610) + let rapl_lock = match self.read_msr(0x610) { + Ok(val) => { + if (val & (1 << 63)) != 0 { + Err(AuditError::KernelIncompatible("RAPL Registers are locked by BIOS. 
Power limit tuning is impossible.".to_string())) + } else { + Ok(()) + } + }, + Err(e) => Err(AuditError::ToolMissing(format!("Cannot read MSR 0x610: {}", e))), + }; + steps.push(AuditStep { + description: "MSR 0x610 RAPL Lock Status".to_string(), + outcome: rapl_lock, + }); + let modules = ["dell_smm_hwmon", "msr", "intel_rapl_msr"]; for mod_name in modules { let path = self.ctx.sysfs_base.join(format!("sys/module/{}", mod_name)); @@ -115,23 +143,24 @@ impl PreflightAuditor for DellXps9380Sal { } }); - let tool_check = self.fact_sheet.paths.tools.contains_key("dell_fan_ctrl"); - steps.push(AuditStep { - description: "Dell Fan Control Tool".to_string(), - outcome: if tool_check { Ok(()) } else { Err(AuditError::ToolMissing("dell-bios-fan-control not found in PATH".to_string())) } - }); - Box::new(steps.into_iter()) } } impl EnvironmentGuard for DellXps9380Sal { fn suppress(&self) -> Result<()> { - let mut suppressed = self.suppressed_services.lock().unwrap(); + if let Ok(pl1) = fs::read_to_string(&self.pl1_path) { + *self.original_pl1.lock().unwrap() = pl1.trim().parse().ok(); + } + if let Ok(pl2) = fs::read_to_string(&self.pl2_path) { + *self.original_pl2.lock().unwrap() = pl2.trim().parse().ok(); + } + *self.original_fan_mode.lock().unwrap() = Some("1".to_string()); + let services = ["tlp", "thermald", "i8kmon"]; + let mut suppressed = self.suppressed_services.lock().unwrap(); for s in services { if self.ctx.runner.run("systemctl", &["is-active", "--quiet", s]).is_ok() { - debug!("Suppressing service: {}", s); let _ = self.ctx.runner.run("systemctl", &["stop", s]); suppressed.push(s.to_string()); } @@ -140,6 +169,15 @@ impl EnvironmentGuard for DellXps9380Sal { } fn restore(&self) -> Result<()> { + if let Some(pl1) = *self.original_pl1.lock().unwrap() { + let _ = fs::write(&self.pl1_path, pl1.to_string()); + } + if let Some(pl2) = *self.original_pl2.lock().unwrap() { + let _ = fs::write(&self.pl2_path, pl2.to_string()); + } + if let Some(tool_path) = 
self.fact_sheet.paths.tools.get("dell_fan_ctrl") { + let _ = self.ctx.runner.run(&tool_path.to_string_lossy(), &["1"]); + } let mut suppressed = self.suppressed_services.lock().unwrap(); for s in suppressed.drain(..) { let _ = self.ctx.runner.run("systemctl", &["start", &s]); @@ -167,16 +205,25 @@ impl SensorBus for DellXps9380Sal { let energy_path = rapl_base.join("energy_uj"); if energy_path.exists() { - let mut last = self.last_energy.lock().unwrap(); + let mut last_energy = self.last_energy.lock().unwrap(); + let mut last_watts = self.last_watts.lock().unwrap(); + let e2_str = fs::read_to_string(&energy_path)?; let e2 = e2_str.trim().parse::()?; let t2 = Instant::now(); - let (e1, t1) = *last; + let (e1, t1) = *last_energy; + let delta_e = e2.wrapping_sub(e1); let delta_t = t2.duration_since(t1).as_secs_f32(); - *last = (e2, t2); - if delta_t < 0.05 { return Ok(0.0); } - Ok((delta_e as f32 / 1_000_000.0) / delta_t) + + if delta_t < 0.1 { + return Ok(*last_watts); // Return cached if polled too fast + } + + let watts = (delta_e as f32 / 1_000_000.0) / delta_t; + *last_energy = (e2, t2); + *last_watts = watts; + Ok(watts) } else { let s = fs::read_to_string(&self.pwr_path)?; Ok(s.trim().parse::()? / 1000000.0) @@ -204,6 +251,12 @@ impl SensorBus for DellXps9380Sal { let s = fs::read_to_string(&self.freq_path)?; Ok(s.trim().parse::()? 
/ 1000.0) } + + fn get_throttling_status(&self) -> Result { + // MSR 0x19C bit 0 is "Thermal Status", bit 1 is "Thermal Log" + let val = self.read_msr(0x19C)?; + Ok((val & 0x1) != 0) + } } impl ActuatorBus for DellXps9380Sal { @@ -220,14 +273,7 @@ impl ActuatorBus for DellXps9380Sal { Ok(()) } - fn set_fan_speed(&self, speed: FanSpeedPercentage) -> Result<()> { - let tool_path = self.fact_sheet.paths.tools.get("dell_fan_ctrl") - .ok_or_else(|| anyhow!("Dell fan control tool not found in PATH"))?; - let tool_str = tool_path.to_string_lossy(); - - if speed.as_u8() > 50 { - let _ = self.ctx.runner.run(&tool_str, &["0"]); - } + fn set_fan_speed(&self, _speed: FanSpeedPercentage) -> Result<()> { Ok(()) } diff --git a/src/sal/generic_linux.rs b/src/sal/generic_linux.rs index e003ce6..767dbe7 100644 --- a/src/sal/generic_linux.rs +++ b/src/sal/generic_linux.rs @@ -133,6 +133,23 @@ impl SensorBus for GenericLinuxSal { Err(anyhow!("Could not determine CPU frequency")) } } + + fn get_throttling_status(&self) -> Result { + // Fallback: check if any cooling device is active (cur_state > 0) + let cooling_base = self.ctx.sysfs_base.join("sys/class/thermal"); + if let Ok(entries) = fs::read_dir(cooling_base) { + for entry in entries.flatten() { + if entry.file_name().to_string_lossy().starts_with("cooling_device") { + if let Ok(state) = fs::read_to_string(entry.path().join("cur_state")) { + if state.trim().parse::().unwrap_or(0) > 0 { + return Ok(true); + } + } + } + } + } + Ok(false) + } } impl ActuatorBus for GenericLinuxSal { diff --git a/src/sal/mock.rs b/src/sal/mock.rs index ecddb91..079a982 100644 --- a/src/sal/mock.rs +++ b/src/sal/mock.rs @@ -54,6 +54,9 @@ impl SensorBus for MockSal { fn get_freq_mhz(&self) -> Result { Ok(3200.0) } + fn get_throttling_status(&self) -> Result { + Ok(self.get_temp()? 
> 90.0) + } } impl ActuatorBus for MockSal { diff --git a/src/sal/traits.rs b/src/sal/traits.rs index 235f6b1..996b4e6 100644 --- a/src/sal/traits.rs +++ b/src/sal/traits.rs @@ -140,6 +140,9 @@ pub trait SensorBus: Send + Sync { /// # Errors /// Returns an error if `/proc/cpuinfo` or a `cpufreq` sysfs node cannot be read. fn get_freq_mhz(&self) -> Result; + + /// Returns true if the system is currently thermally throttling. + fn get_throttling_status(&self) -> Result<bool>; } impl SensorBus for Arc { @@ -155,6 +158,9 @@ impl SensorBus for Arc { fn get_freq_mhz(&self) -> Result { (**self).get_freq_mhz() } + fn get_throttling_status(&self) -> Result<bool> { + (**self).get_throttling_status() + } } use crate::sal::safety::{TdpLimitMicroWatts, FanSpeedPercentage};
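
Reviewer sketch (not part of the patch): the two MSR bit tests this change introduces, the RAPL lock check on MSR 0x610 and the thermal status check on MSR 0x19C, can be factored into pure functions that are unit-testable without root or real hardware. The constant and function names below are hypothetical and do not exist in the repository; only the register addresses and the tested bits come from the diff.

```rust
// Hypothetical helpers, assuming the raw 64-bit MSR value has already been
// read (for example by the SAL's `read_msr`). None of these names are in the patch.

/// IA32_THERM_STATUS (MSR 0x19C), bit 0: live "Thermal Status" flag.
const THERM_STATUS_ACTIVE: u64 = 1 << 0;
/// IA32_THERM_STATUS (MSR 0x19C), bit 1: sticky "Thermal Status Log" flag.
const THERM_STATUS_LOG: u64 = 1 << 1;
/// MSR_PKG_POWER_LIMIT (MSR 0x610), bit 63: lock bit set by firmware.
const RAPL_LOCK: u64 = 1 << 63;

/// True if the core is thermally throttling at the moment of the read.
pub fn is_throttling_now(therm_status: u64) -> bool {
    (therm_status & THERM_STATUS_ACTIVE) != 0
}

/// True if a thermal event has been logged since the log bit was last cleared.
pub fn has_throttled_since_clear(therm_status: u64) -> bool {
    (therm_status & THERM_STATUS_LOG) != 0
}

/// True if the package power limit register is locked against writes.
pub fn is_rapl_locked(pkg_power_limit: u64) -> bool {
    (pkg_power_limit & RAPL_LOCK) != 0
}
```

Keeping the bit masks separate from the MSR file I/O would let the mock SAL and the test suite exercise the same decoding logic as the Dell SAL. Whether the sticky log bit is worth surfacing in `TelemetryState` is a design question for the author.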