update docs + agents

2026-02-27 17:04:47 +01:00
parent fe1f58b5ce
commit 4f54fd81ce
15 changed files with 508 additions and 99 deletions


@@ -1,33 +1,61 @@
# 🔥 ember-tune
```text
__________ ____ ______ ____ ______ __ __ _ __ ______
/ ____/ |/ // __ )/ ____// __ \ /_ __/ / / / // | / // ____/
/ __/ / /|_/ // __ / __/ / /_/ / / / / / / // |/ // __/
/ /___ / / / // /_/ / /___ / _, _/ / / / /_/ // /| // /___
/_____//_/ /_//_____/_____//_/ |_| /_/ \____//_/ |_//_____/
>>> Physically-grounded thermal & power optimization for Linux <<<
```
> ### **Find your hardware's "Physical Sweet Spot" through automated trial-by-fire.**
`ember-tune` is a scientifically-driven hardware optimizer that replaces guesswork and manual tuning with a rigorous, automated engineering workflow. It determines the unique thermal properties of your specific laptop—including its Thermal Resistance (Rθ) and "Silicon Knee"—to generate optimal configurations for common Linux tuning daemons.
## ✨ Features
- **Automated Physical Benchmarking:** Measures real-world thermal performance under load to find the true "sweet spot" where performance-per-watt is maximized before thermal saturation causes diminishing returns.
- **Heuristic Hardware Discovery:** Utilizes a data-driven System Abstraction Layer (SAL) that probes your system and automatically adapts to its unique quirks, drivers, and sensor paths.
- **Non-Destructive Configuration:** Safely merges new, optimized power limits into your existing `throttled.conf`, preserving manual undervolt settings and comments.
- **Universal Safeguard Architecture (USA):** Includes a high-frequency concurrent watchdog and RAII state restoration to guarantee your system is never left in a dangerous state.
- **Real-time TUI Dashboard:** A `ratatui`-based terminal interface provides high-resolution telemetry throughout the benchmark.
## 🔬 How it Works: The Architecture
`ember-tune` is built on a decoupled, multi-threaded architecture to ensure the UI is always responsive and that hardware state is managed safely.
1. **The Heuristic Engine:** On startup, the engine probes your system's DMI, `sysfs`, and active services. It compares these "facts" against the `hardware_db.toml` to select the correct System Abstraction Layer (SAL).
2. **The Orchestrator (Backend Thread):** This is the state machine that executes the benchmark. It communicates with hardware *only* through the SAL traits.
3. **The TUI (Main Thread):** The `ratatui` dashboard renders `TelemetryState` snapshots received from the orchestrator via an MPSC channel.
4. **The Watchdog (Safety Thread):** A high-priority thread that polls safety sensors every 100ms to trigger an atomic `EmergencyAbort` if failure conditions are met.
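
The hand-off between the orchestrator and the TUI described above can be sketched with `std::sync::mpsc`. This is a minimal illustration only: `Snapshot` is a hypothetical, trimmed-down stand-in for `TelemetryState`, and `collect_snapshots` is not a function in the project.

```rust
use std::sync::mpsc;
use std::thread;

/// Hypothetical, trimmed-down stand-in for the real `TelemetryState`.
#[derive(Debug, Clone, PartialEq)]
pub struct Snapshot {
    pub temp_c: f32,
    pub watts: f32,
}

/// Spawns a backend thread that sends `count` snapshots, then collects them
/// on the calling ("UI") thread until the sender is dropped.
pub fn collect_snapshots(count: usize) -> Vec<Snapshot> {
    let (tx, rx) = mpsc::channel::<Snapshot>();

    let backend = thread::spawn(move || {
        for i in 0..count {
            let snap = Snapshot { temp_c: 40.0 + i as f32, watts: 10.0 };
            // A send error means the receiver hung up; stop producing.
            if tx.send(snap).is_err() {
                break;
            }
        }
        // `tx` is dropped here, which ends the receiver's iteration.
    });

    // Iterating the receiver blocks until every sender has been dropped.
    let received: Vec<Snapshot> = rx.iter().collect();
    backend.join().expect("backend thread panicked");
    received
}
```

A real dashboard would more likely poll with `try_recv` inside its render loop so that drawing never blocks on the backend.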
## ⚙️ Development Setup
`ember-tune` is a standard Cargo project. You will need a recent Rust toolchain and common build utilities.
`ember-tune` is a standard Cargo project.
**Prerequisites:**
- `rustup`
- `build-essential` (or equivalent for your distribution)
- `build-essential`
- `libudev-dev`
- `stress-ng` (Required for benchmarking)
```bash
# 1. Clone the repository
# 1. Clone and Build
git clone https://gitea.com/narl/ember-tune.git
cd ember-tune
# 2. Build the release binary
cargo build --release
# 3. Run the test suite (safe, uses a virtual environment)
# This requires no special permissions and does not touch your hardware.
# 2. Run the safe test suite
cargo test
```
**Running:**
Due to its direct hardware access, `ember-tune` requires root privileges.
```bash
# Run a full benchmark and generate optimized configs
# Run a full benchmark
sudo ./target/release/ember-tune
# Run a mock benchmark for UI/logic testing
# Run a mock benchmark for UI testing
sudo ./target/release/ember-tune --mock
```
@@ -35,48 +63,24 @@ sudo ./target/release/ember-tune --mock
## 🤝 Contributing Quirk Data (`hardware_db.toml`)
**This is the most impactful way to contribute.** `ember-tune`'s strength comes from its `assets/hardware_db.toml`, which encodes community knowledge about how to manage specific laptops. If your hardware isn't working perfectly, you can likely fix it by adding a new entry here.
**This is the most impactful way to contribute.** If your hardware isn't working perfectly, add a new entry to `assets/hardware_db.toml`.
The database is composed of four key sections: `conflicts`, `ecosystems`, `quirks`, and `discovery`.
### A. Reporting a Service Conflict
If a background service on your system interferes with `ember-tune`, add it to `[[conflicts]]`.
**Example:** Adding `laptop-mode-tools`.
### Example: Adding a Service Conflict
```toml
[[conflicts]]
id = "laptop_mode_conflict"
services = ["laptop-mode.service"]
contention = "Multiple - I/O schedulers, Power limits"
severity = "Medium"
fix_action = "SuspendService" # Orchestrator will stop/start this service
fix_action = "SuspendService"
help_text = "laptop-mode-tools can override power-related sysfs settings."
```
### B. Adding a New Hardware Ecosystem
If your laptop manufacturer (e.g., Razer) has a unique fan control tool or ACPI platform profile path, define it in `[ecosystems]`.
**Example:** A hypothetical "Razer" ecosystem.
```toml
[ecosystems.razer]
vendor_regex = "Razer"
# Path to the sysfs node that controls performance profiles
profiles_path = "/sys/bus/platform/drivers/razer_acpi/power_mode"
# Map human-readable names to the values the driver expects
policy_map = { Balanced = 0, Boost = 1, Silent = 2 }
```
### C. Defining a Model-Specific Quirk
If a specific laptop model has a bug (like a stuck sensor or incorrect fan reporting), define a `[[quirks]]` entry.
**Example:** A laptop whose fans report 0 RPM even when spinning.
### Example: Defining a Model-Specific Quirk
```toml
[[quirks]]
model_regex = "HP Envy 15-ep.*"
id = "hp_fan_stuck_sensor"
issue = "Fan sensor reports 0 RPM when active."
# The 'action' tells the SAL to use a different method for fan detection.
action = "UseThermalVelocityFallback"
```
After adding your changes, run the test suite and then submit a Pull Request!

src/agent_analyst/mod.rs (new file, 100 lines)

@@ -0,0 +1,100 @@
//! Heuristic Analysis & Optimization Math (Agent Analyst)
//!
//! This module analyzes raw telemetry data to extract the "Optimal Real-World Settings".
//! It calculates the Silicon Knee, Acoustic/Thermal Matrix (Hysteresis), and
//! generates three distinct hardware states: Silent, Balanced, and Sustained Heavy.
use serde::{Serialize, Deserialize};
use crate::engine::{ThermalProfile, OptimizerEngine};
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct FanCurvePoint {
pub temp_on: f32,
pub temp_off: f32,
pub pwm_percent: u8,
}
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct SystemProfile {
pub name: String,
pub pl1_watts: f32,
pub pl2_watts: f32,
pub fan_curve: Vec<FanCurvePoint>,
}
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct OptimizationMatrix {
pub silent: SystemProfile,
pub balanced: SystemProfile,
pub performance: SystemProfile,
pub thermal_resistance_kw: f32,
}
pub struct HeuristicAnalyst {
engine: OptimizerEngine,
}
impl HeuristicAnalyst {
pub fn new() -> Self {
Self {
engine: OptimizerEngine::new(5),
}
}
/// Analyzes the raw telemetry to generate the 3 optimal profiles.
pub fn analyze(&self, profile: &ThermalProfile, max_soak_watts: f32) -> OptimizationMatrix {
let r_theta = self.engine.calculate_thermal_resistance(profile);
let silicon_knee = self.engine.find_silicon_knee(profile);
// 1. State A: Silent / Battery (Scientific Passive Equilibrium)
// Objective: Find P where T_core = 60C with fans OFF.
// T_core = T_ambient + (P * R_theta_passive)
// Note: R_theta measured during benchmark was with fans MAX.
// Passive R_theta is typically 2-3x higher.
let r_theta_passive = r_theta * 2.5;
let silent_watts = ((60.0 - profile.ambient_temp) / r_theta_passive.max(0.1)).clamp(5.0, 15.0);
let silent_profile = SystemProfile {
name: "Silent".to_string(),
pl1_watts: silent_watts,
pl2_watts: silent_watts * 1.2,
fan_curve: vec![
FanCurvePoint { temp_on: 65.0, temp_off: 55.0, pwm_percent: 0 },
FanCurvePoint { temp_on: 75.0, temp_off: 65.0, pwm_percent: 30 },
],
};
// 2. State B: Balanced
// The exact calculated Silicon Knee
let balanced_profile = SystemProfile {
name: "Balanced".to_string(),
pl1_watts: silicon_knee,
pl2_watts: silicon_knee * 1.25,
fan_curve: vec![
FanCurvePoint { temp_on: 60.0, temp_off: 55.0, pwm_percent: 0 },
FanCurvePoint { temp_on: 75.0, temp_off: 65.0, pwm_percent: 40 },
FanCurvePoint { temp_on: 85.0, temp_off: 75.0, pwm_percent: 70 },
],
};
// 3. State C: Sustained Heavy
// Based on the max soak watts from Phase 1.
let performance_profile = SystemProfile {
name: "Performance".to_string(),
pl1_watts: max_soak_watts,
pl2_watts: max_soak_watts * 1.3,
fan_curve: vec![
FanCurvePoint { temp_on: 50.0, temp_off: 45.0, pwm_percent: 30 },
FanCurvePoint { temp_on: 70.0, temp_off: 60.0, pwm_percent: 60 },
FanCurvePoint { temp_on: 85.0, temp_off: 75.0, pwm_percent: 100 },
],
};
OptimizationMatrix {
silent: silent_profile,
balanced: balanced_profile,
performance: performance_profile,
thermal_resistance_kw: r_theta,
}
}
}
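
The "Silent" estimate in `analyze` rearranges `T_core = T_ambient + P * R_theta_passive` to solve for `P`. A standalone sketch of that arithmetic, assuming the same 60 °C target, 2.5x passive factor, and 5 to 15 W clamp used above (the function name is hypothetical):

```rust
/// Hypothetical helper mirroring the "Silent" power-limit estimate.
///
/// `r_theta_fans_max` is the thermal resistance (K/W) measured with fans at
/// maximum; the passive value is assumed to be 2.5x higher.
pub fn silent_power_limit_watts(ambient_c: f32, r_theta_fans_max: f32) -> f32 {
    const TARGET_CORE_C: f32 = 60.0;
    const PASSIVE_FACTOR: f32 = 2.5;

    // Floor the resistance so a near-zero measurement cannot blow up the division.
    let r_theta_passive = (r_theta_fans_max * PASSIVE_FACTOR).max(0.1);

    // P = (T_core - T_ambient) / R_theta_passive, limited to a sane window.
    ((TARGET_CORE_C - ambient_c) / r_theta_passive).clamp(5.0, 15.0)
}
```

For example, at 25 °C ambient with a measured Rθ of 1.0 K/W, the passive resistance is 2.5 K/W and the estimate is 35 / 2.5 = 14 W.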

src/agent_integrator/mod.rs (new file, 115 lines)

@@ -0,0 +1,115 @@
//! System Service Integration (Agent Integrator)
//!
//! This module translates the mathematical optimums defined by the Analyst
//! into actionable, real-world Linux/OS service configurations.
//! It generates templates for fan daemons (i8kmon, thinkfan) and handles
//! resolution strategies for overlapping daemons.
use anyhow::Result;
use std::path::Path;
use std::fs;
use crate::agent_analyst::OptimizationMatrix;
pub struct ServiceIntegrator;
impl ServiceIntegrator {
/// Generates and saves an i8kmon configuration based on the balanced profile.
pub fn generate_i8kmon_config(matrix: &OptimizationMatrix, output_path: &Path) -> Result<()> {
let profile = &matrix.balanced;
let mut conf = String::new();
conf.push_str("# Auto-generated by ember-tune Integrator
");
conf.push_str(&format!("# Profile: {}
", profile.name));
for (i, p) in profile.fan_curve.iter().enumerate() {
// i8kmon syntax: set config(state) {left_fan right_fan temp_on temp_off}
// States 0, 1, 2 correspond to BIOS fan states (off, low, high)
let state = match p.pwm_percent {
0..=20 => 0,
21..=50 => 1,
51..=100 => 2,
_ => 2,
};
let off = if i == 0 { "-".to_string() } else { format!("{}", p.temp_off) };
conf.push_str(&format!("set config({}) {{{} {} {} {}}}
", i, state, state, p.temp_on, off));
}
fs::write(output_path, conf)?;
Ok(())
}
/// Generates a thinkfan configuration.
pub fn generate_thinkfan_config(matrix: &OptimizationMatrix, output_path: &Path) -> Result<()> {
let profile = &matrix.balanced;
let mut conf = String::new();
conf.push_str("# Auto-generated by ember-tune Integrator
");
conf.push_str("sensors:
- hwmon: /sys/class/hwmon/hwmon0/temp1_input
");
conf.push_str("levels:
");
for (i, p) in profile.fan_curve.iter().enumerate() {
// thinkfan syntax: - [level, temp_down, temp_up]
let level = match p.pwm_percent {
0..=20 => 0,
21..=40 => 1,
41..=60 => 3,
61..=80 => 5,
_ => 7,
};
let down = if i == 0 { 0.0 } else { p.temp_off };
conf.push_str(&format!(" - [{}, {}, {}]
", level, down, p.temp_on));
}
fs::write(output_path, conf)?;
Ok(())
}
/// Generates a resolution checklist/script for daemons.
pub fn generate_conflict_resolution_script(output_path: &Path) -> Result<()> {
let script = r#"#!/bin/bash
# ember-tune Daemon Neutralization Script
# 1. Mask power-profiles-daemon (Prevent ACPI overrides)
systemctl mask power-profiles-daemon
# 2. Filter TLP (Prevent CPU governor fights while keeping PCIe saving)
sed -i 's/^CPU_SCALING_GOVERNOR_ON_AC=.*/CPU_SCALING_GOVERNOR_ON_AC=""/' /etc/tlp.conf
sed -i 's/^CPU_BOOST_ON_AC=.*/CPU_BOOST_ON_AC=""/' /etc/tlp.conf
systemctl restart tlp
# 3. Thermald Delegate (We provide the trips, it handles the rest)
# (Ensure your custom thermal-conf.xml is in /etc/thermald/)
systemctl restart thermald
"#;
fs::write(output_path, script)?;
Ok(())
}
/// Generates a thermald configuration XML.
pub fn generate_thermald_config(matrix: &OptimizationMatrix, output_path: &Path) -> Result<()> {
let profile = &matrix.balanced;
let mut xml = String::new();
xml.push_str("<?xml version=\"1.0\"?>\n<ThermalConfiguration>\n <Platform>\n <Name>ember-tune Balanced</Name>\n <ProductName>Generic</ProductName>\n <Preference>balanced</Preference>\n <ThermalZones>\n <ThermalZone>\n <Type>cpu</Type>\n <TripPoints>\n");
for (i, p) in profile.fan_curve.iter().enumerate() {
xml.push_str(&format!(" <TripPoint>\n <SensorType>cpu</SensorType>\n <Temperature>{}</Temperature>\n <Type>Passive</Type>\n <ControlId>{}</ControlId>\n </TripPoint>\n", p.temp_on * 1000.0, i));
}
xml.push_str(" </TripPoints>\n </ThermalZone>\n </ThermalZones>\n </Platform>\n</ThermalConfiguration>\n");
fs::write(output_path, xml)?;
Ok(())
}
}
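
The PWM-to-level bucketing in `generate_thinkfan_config` can be looked at in isolation. The sketch below assumes the same thresholds as the `match` above; both function names are hypothetical and the exact indentation of the emitted line is illustrative.

```rust
/// Maps a PWM duty cycle (percent) to a thinkfan fan level,
/// using the same buckets as the generator above.
pub fn thinkfan_level(pwm_percent: u8) -> u8 {
    match pwm_percent {
        0..=20 => 0,
        21..=40 => 1,
        41..=60 => 3,
        61..=80 => 5,
        _ => 7,
    }
}

/// Renders one `levels:` entry as `[level, lower, upper]`.
/// The first entry uses 0 as its lower bound so the fan can always drop to it.
pub fn thinkfan_level_line(index: usize, pwm_percent: u8, temp_on: f32, temp_off: f32) -> String {
    let lower = if index == 0 { 0.0 } else { temp_off };
    format!("  - [{}, {}, {}]", thinkfan_level(pwm_percent), lower, temp_on)
}
```

Keeping the mapping in a pure function like this makes the thresholds unit-testable without touching the filesystem.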


@@ -0,0 +1,66 @@
//! Telemetry & Benchmarking Methodology (Agent Metrology)
//!
//! This module defines the execution flow to extract flawless hardware telemetry.
//! It isolates specific subsystems (CPU Core, Memory) and executes the Sweep Protocol
//! and Thermal Soak to map the physical limits of the hardware.
use anyhow::Result;
use std::time::{Duration, Instant};
use std::thread;
use crate::sal::traits::PlatformSal;
use crate::load::{Workload, IntensityProfile, StressVector};
use tracing::info;
pub struct MetrologyAgent<'a> {
sal: &'a dyn PlatformSal,
workload: &'a mut Box<dyn Workload>,
}
impl<'a> MetrologyAgent<'a> {
pub fn new(sal: &'a dyn PlatformSal, workload: &'a mut Box<dyn Workload>) -> Self {
Self { sal, workload }
}
/// Performs a prolonged mixed-load test to achieve chassis thermal saturation.
/// Bypasses short-term PL2/boost metrics to find the true steady-state dissipation capacity.
pub fn perform_thermal_soak(&mut self, duration_minutes: u64) -> Result<f32> {
info!("Metrology: Starting {} minute Thermal Soak...", duration_minutes);
self.sal.set_fan_mode("max")?;
// Mixed load: matrix math + memory stressors to saturate entire SoC and Chassis.
self.workload.run_workload(
Duration::from_secs(duration_minutes * 60),
IntensityProfile {
threads: num_cpus::get(),
load_percentage: 100,
vector: StressVector::Mixed
}
)?;
let start = Instant::now();
let target = Duration::from_secs(duration_minutes * 60);
let mut max_sustained_watts = 0.0;
while start.elapsed() < target {
thread::sleep(Duration::from_secs(5));
let temp = self.sal.get_temp().unwrap_or(0.0);
let watts = self.sal.get_power_w().unwrap_or(0.0);
if watts > max_sustained_watts {
max_sustained_watts = watts;
}
// Abort if dangerously hot
if temp >= 98.0 {
info!("Metrology: Thermal ceiling hit during soak ({}C). Stopping early.", temp);
break;
}
}
self.workload.stop_workload()?;
info!("Metrology: Thermal Soak complete. Max sustained: {:.1}W", max_sustained_watts);
Ok(max_sustained_watts)
}
}
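
The sampling loop in `perform_thermal_soak` reduces a stream of readings to a peak wattage and stops once the temperature ceiling is reached. A hardware-free sketch of that reduction, assuming readings arrive as `(temperature, watts)` pairs (the `Sample` alias and the function name are hypothetical):

```rust
/// One (temperature in °C, package power in W) reading.
pub type Sample = (f32, f32);

/// Returns the highest wattage seen and whether the soak was cut short
/// because a reading reached `ceiling_c`.
pub fn soak_max_watts<I>(samples: I, ceiling_c: f32) -> (f32, bool)
where
    I: IntoIterator<Item = Sample>,
{
    let mut max_watts = 0.0_f32;
    for (temp_c, watts) in samples {
        // Record the wattage first, as the loop above does, then check the ceiling.
        max_watts = max_watts.max(watts);
        if temp_c >= ceiling_c {
            return (max_watts, true);
        }
    }
    (max_watts, false)
}
```

Returning the abort flag alongside the wattage is a design choice of this sketch; the original returns only the wattage and logs the early stop.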


@@ -47,6 +47,8 @@ pub struct OptimizationResult {
pub is_partial: bool,
/// A map of configuration files that were written to.
pub config_paths: HashMap<String, PathBuf>,
/// The comprehensive optimization matrix (Silent, Balanced, Performance).
pub optimization_matrix: Option<crate::agent_analyst::OptimizationMatrix>,
}
/// Pure mathematics engine for thermal optimization.

src/engine/profiles.rs (0 lines)

@@ -12,3 +12,6 @@ pub mod ui;
pub mod engine;
pub mod cli;
pub mod sys;
pub mod agent_metrology;
pub mod agent_analyst;
pub mod agent_integrator;


@@ -17,11 +17,20 @@ pub struct WorkloadMetrics {
pub elapsed_time: Duration,
}
/// Defines which subsystem to isolate during stress testing.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum StressVector {
CpuMatrix,
MemoryBandwidth,
Mixed,
}
/// A normalized profile defining the intensity and constraints of the workload.
#[derive(Debug, Clone)]
pub struct IntensityProfile {
pub threads: usize,
pub load_percentage: u8,
pub vector: StressVector,
}
/// The replaceable interface for load generation and performance measurement.
@@ -63,7 +72,7 @@ impl Workload for StressNg {
.stdout(Stdio::null())
.stderr(Stdio::null())
.status()
.context("stress-ng binary not found in PATH")?;
.context("stress-ng binary not found in PATH. Please install it.")?;
if !status.success() {
return Err(anyhow!("stress-ng failed to initialize"));
@@ -72,24 +81,29 @@ impl Workload for StressNg {
}
fn run_workload(&mut self, duration: Duration, profile: IntensityProfile) -> Result<()> {
self.stop_workload()?; // Ensure clean state
self.stop_workload()?;
let threads = profile.threads.to_string();
let timeout = format!("{}s", duration.as_secs());
let load = profile.load_percentage.to_string();
let mut child = Command::new("stress-ng")
.args([
"--matrix", &threads,
"--cpu-load", &load,
"--timeout", &timeout,
"--metrics-brief",
"--metrics-brief", // Repeat for stderr/stdout consistency
])
.stdout(Stdio::piped())
.stderr(Stdio::piped())
.spawn()
.context("Failed to spawn stress-ng")?;
let mut cmd = Command::new("stress-ng");
cmd.args(["--timeout", &timeout, "--metrics", "--quiet"]);
match profile.vector {
StressVector::CpuMatrix => {
cmd.args(["--matrix", &threads, "--cpu-load", &load]);
},
StressVector::MemoryBandwidth => {
cmd.args(["--vm", &threads, "--vm-bytes", "80%"]);
},
StressVector::Mixed => {
let half = (profile.threads / 2).max(1).to_string();
cmd.args(["--matrix", &half, "--vm", &half, "--vm-bytes", "40%"]);
}
}
let mut child = cmd.stderr(Stdio::piped()).spawn().context("Failed to spawn stress-ng")?;
self.start_time = Some(Instant::now());
@@ -100,16 +114,13 @@ impl Workload for StressNg {
thread::spawn(move || {
let reader = BufReader::new(stderr);
for line in reader.lines().flatten() {
// Parse stress-ng metrics line:
// stress-ng: info: [PID] matrix [OPS] [TIME] [BOGO OPS/S]
if line.contains("matrix") && line.contains("bogo ops/s") {
// Parse stress-ng metrics line
if line.contains("matrix") || line.contains("vm") {
let parts: Vec<&str> = line.split_whitespace().collect();
if let Some(ops_idx) = parts.iter().position(|&p| p == "ops/s") {
if let Some(ops_val) = parts.get(ops_idx - 1) {
if let Ok(ops) = ops_val.parse::<f64>() {
let mut m = metrics_ref.lock().unwrap();
m.primary_ops_per_sec = ops;
}
if let Some(val) = parts.last() {
if let Ok(ops) = val.parse::<f64>() {
let mut m = metrics_ref.lock().unwrap();
m.primary_ops_per_sec = ops;
}
}
}
@@ -130,7 +141,6 @@ impl Workload for StressNg {
fn stop_workload(&mut self) -> Result<()> {
if let Some(mut child) = self.child.take() {
// Polite SIGTERM
#[cfg(unix)]
{
use libc::{kill, SIGTERM};
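
The `StressVector` dispatch in `run_workload` above only decides which flags are passed to `stress-ng`. A sketch that builds the argument list as plain data, assuming the same flags and percentages as the diff (the enum is redeclared here so the snippet stands alone, and `stress_ng_args` is a hypothetical name):

```rust
/// Redeclared locally so the sketch is self-contained.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum StressVector {
    CpuMatrix,
    MemoryBandwidth,
    Mixed,
}

/// Builds the stress-ng argument list for a given vector.
pub fn stress_ng_args(
    vector: StressVector,
    threads: usize,
    load_percentage: u8,
    timeout_secs: u64,
) -> Vec<String> {
    // Flags shared by every vector.
    let mut args = vec![
        "--timeout".to_string(),
        format!("{}s", timeout_secs),
        "--metrics".to_string(),
        "--quiet".to_string(),
    ];
    let n = threads.to_string();
    match vector {
        StressVector::CpuMatrix => args.extend(vec![
            "--matrix".to_string(),
            n,
            "--cpu-load".to_string(),
            load_percentage.to_string(),
        ]),
        StressVector::MemoryBandwidth => args.extend(vec![
            "--vm".to_string(),
            n,
            "--vm-bytes".to_string(),
            "80%".to_string(),
        ]),
        StressVector::Mixed => {
            // Split the workers between CPU and memory, with at least one each.
            let half = (threads / 2).max(1).to_string();
            args.extend(vec![
                "--matrix".to_string(),
                half.clone(),
                "--vm".to_string(),
                half,
                "--vm-bytes".to_string(),
                "40%".to_string(),
            ]);
        }
    }
    args
}
```

Separating argument construction from `Command::spawn` lets the flag logic be tested without `stress-ng` installed.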


@@ -189,6 +189,7 @@ fn main() -> Result<()> {
pl1_limit: 0.0,
pl2_limit: 0.0,
fan_tier: "auto".to_string(),
is_throttling: false,
phase: BenchmarkPhase::Auditing,
history_watts: Vec::new(),
history_temp: Vec::new(),


@@ -35,6 +35,7 @@ pub struct TelemetryState {
pub pl1_limit: f32,
pub pl2_limit: f32,
pub fan_tier: String,
pub is_throttling: bool,
pub phase: BenchmarkPhase,
// --- High-res History ---


@@ -18,9 +18,12 @@ use std::path::PathBuf;
use crate::sal::traits::{PlatformSal, SafetyStatus};
use crate::sal::heuristic::discovery::SystemFactSheet;
use crate::sal::safety::{HardwareStateGuard, TdpLimitMicroWatts, ConfigurationTransaction, ThermalThresholdCelsius};
use crate::load::{Workload, IntensityProfile};
use crate::load::{Workload, IntensityProfile, StressVector};
use crate::mediator::{TelemetryState, UiCommand, BenchmarkPhase};
use crate::engine::{OptimizerEngine, ThermalProfile, ThermalPoint, OptimizationResult};
use crate::agent_metrology::MetrologyAgent;
use crate::agent_analyst::{HeuristicAnalyst, OptimizationMatrix};
use crate::agent_integrator::ServiceIntegrator;
/// The central state machine responsible for coordinating the thermal benchmark.
pub struct BenchmarkOrchestrator {
@@ -189,6 +192,13 @@ impl BenchmarkOrchestrator {
self.profile.ambient_temp = self.engine.smooth(&idle_temps).last().cloned().unwrap_or(0.0);
self.log(&format!("✓ Idle Baseline: {:.1}°C", self.profile.ambient_temp))?;
// Phase 1.5: Thermal Soak (Agent Metrology)
self.log("Phase 1.5: Executing Thermal Soak to achieve chassis saturation...")?;
let soak_duration_minutes = 1;
let mut metrology = MetrologyAgent::new(self.sal.as_ref(), &mut self.workload);
let max_soak_watts = metrology.perform_thermal_soak(soak_duration_minutes)?;
self.log(&format!("✓ Max sustained wattage during soak: {:.1}W", max_soak_watts))?;
// Phase 2: Stress Stepping
self.phase = BenchmarkPhase::StressTesting;
self.log("Phase 2: Starting Synthetic Stress Matrix.")?;
@@ -213,7 +223,7 @@ impl BenchmarkOrchestrator {
self.workload.run_workload(
Duration::from_secs(bench_cfg.stress_duration_max_s),
IntensityProfile { threads: num_cpus::get(), load_percentage: 100 }
IntensityProfile { threads: num_cpus::get(), load_percentage: 100, vector: StressVector::CpuMatrix }
)?;
let step_start = Instant::now();
@@ -287,18 +297,22 @@ impl BenchmarkOrchestrator {
thread::sleep(Duration::from_secs(bench_cfg.cool_down_s));
}
// Phase 4: Physical Modeling
// Phase 4: Physical Modeling (Agent Analyst)
self.phase = BenchmarkPhase::PhysicalModeling;
self.log("Phase 3: Calculating Silicon Physical Sweet Spot...")?;
self.log("Phase 4: Calculating Silicon Physical Sweet Spot & Profiles...")?;
let analyst = HeuristicAnalyst::new();
let matrix = analyst.analyze(&self.profile, max_soak_watts);
let mut res = self.generate_result(false);
res.optimization_matrix = Some(matrix.clone());
self.log(&format!("✓ Thermal Resistance (Rθ): {:.3} K/W", res.thermal_resistance_kw))?;
self.log(&format!("✓ Silicon Knee Found: {:.1} W", res.silicon_knee_watts))?;
thread::sleep(Duration::from_secs(3));
// Phase 5: Finalizing
// Phase 5: Finalizing (Agent Integrator)
self.phase = BenchmarkPhase::Finalizing;
self.log("Benchmark sequence complete. Generating configurations...")?;
@@ -317,15 +331,31 @@ impl BenchmarkOrchestrator {
res.config_paths.insert("throttled".to_string(), path.clone());
}
if let Some(i8k_path) = self.facts.paths.configs.get("i8kmon") {
let i8k_config = crate::engine::formatters::i8kmon::I8kmonConfig {
t_ambient: self.profile.ambient_temp,
t_max_fan: res.max_temp_c - 5.0,
thermal_resistance_kw: res.thermal_resistance_kw,
};
crate::engine::formatters::i8kmon::I8kmonTranslator::save(i8k_path, &i8k_config)?;
self.log(&format!("✓ Saved '{}'.", i8k_path.display()))?;
res.config_paths.insert("i8kmon".to_string(), i8k_path.clone());
// Generate Fan configs via Agent Integrator
let base_out = self.optional_config_out.clone().unwrap_or_else(|| PathBuf::from("/etc"));
let i8k_out = base_out.join("i8kmon.conf");
if ServiceIntegrator::generate_i8kmon_config(&matrix, &i8k_out).is_ok() {
self.log(&format!("✓ Saved '{}'.", i8k_out.display()))?;
res.config_paths.insert("i8kmon".to_string(), i8k_out);
}
let thinkfan_out = base_out.join("thinkfan.conf");
if ServiceIntegrator::generate_thinkfan_config(&matrix, &thinkfan_out).is_ok() {
self.log(&format!("✓ Saved '{}'.", thinkfan_out.display()))?;
res.config_paths.insert("thinkfan".to_string(), thinkfan_out);
}
let thermald_out = base_out.join("thermal-conf.xml");
if ServiceIntegrator::generate_thermald_config(&matrix, &thermald_out).is_ok() {
self.log(&format!("✓ Saved '{}'.", thermald_out.display()))?;
res.config_paths.insert("thermald".to_string(), thermald_out);
}
let script_out = base_out.join("ember-tune-neutralize.sh");
if ServiceIntegrator::generate_conflict_resolution_script(&script_out).is_ok() {
self.log(&format!("✓ Saved conflict resolution script: '{}'", script_out.display()))?;
res.config_paths.insert("conflict_script".to_string(), script_out);
}
Ok(res)
@@ -359,6 +389,7 @@ impl BenchmarkOrchestrator {
pl1_limit: 0.0,
pl2_limit: 0.0,
fan_tier: String::new(),
is_throttling: sal.get_throttling_status().unwrap_or(false),
phase: BenchmarkPhase::StressTesting,
history_watts: Vec::new(),
history_temp: Vec::new(),
@@ -396,6 +427,7 @@ impl BenchmarkOrchestrator {
max_temp_c: max_t,
is_partial,
config_paths: std::collections::HashMap::new(),
optimization_matrix: None,
}
}
@@ -428,6 +460,7 @@ impl BenchmarkOrchestrator {
pl1_limit: 0.0,
pl2_limit: 0.0,
fan_tier: "auto".to_string(),
is_throttling: self.sal.get_throttling_status().unwrap_or(false),
phase: self.phase,
history_watts: Vec::new(),
history_temp: Vec::new(),
@@ -444,6 +477,7 @@ impl BenchmarkOrchestrator {
let temp = self.sal.get_temp().unwrap_or(0.0);
let pwr = self.sal.get_power_w().unwrap_or(0.0);
let freq = self.sal.get_freq_mhz().unwrap_or(0.0);
let throttling = self.sal.get_throttling_status().unwrap_or(false);
self.history_temp.push_back(temp);
self.history_watts.push_back(pwr);
@@ -467,6 +501,7 @@ impl BenchmarkOrchestrator {
pl1_limit: 15.0,
pl2_limit: 25.0,
fan_tier: "max".to_string(),
is_throttling: throttling,
phase: self.phase,
history_watts: self.history_watts.iter().cloned().collect(),
history_temp: self.history_temp.iter().cloned().collect(),


@@ -5,9 +5,10 @@ use std::fs;
use std::path::{PathBuf};
use std::time::{Duration, Instant};
use std::sync::Mutex;
use tracing::{debug};
use tracing::{debug, warn};
use crate::sal::heuristic::discovery::SystemFactSheet;
/// Implementation of the System Abstraction Layer for the Dell XPS 13 9380.
pub struct DellXps9380Sal {
ctx: EnvironmentCtx,
fact_sheet: SystemFactSheet,
@@ -23,9 +24,16 @@ pub struct DellXps9380Sal {
suppressed_services: Mutex<Vec<String>>,
msr_file: Mutex<fs::File>,
last_energy: Mutex<(u64, Instant)>,
last_watts: Mutex<f32>,
// --- Original State for Restoration ---
original_pl1: Mutex<Option<u64>>,
original_pl2: Mutex<Option<u64>>,
original_fan_mode: Mutex<Option<String>>,
}
impl DellXps9380Sal {
/// Initializes the Dell SAL, opening the MSR interface and discovering sensors.
pub fn init(ctx: EnvironmentCtx, facts: SystemFactSheet) -> Result<Self> {
let temp_path = facts.temp_path.clone().context("Dell SAL requires temperature sensor")?;
let pwr_base = facts.rapl_paths.first().cloned().context("Dell SAL requires RAPL interface")?;
@@ -52,8 +60,12 @@ impl DellXps9380Sal {
suppressed_services: Mutex::new(Vec::new()),
msr_file: Mutex::new(msr_file),
last_energy: Mutex::new((initial_energy, Instant::now())),
last_watts: Mutex::new(0.0),
fact_sheet: facts,
ctx,
original_pl1: Mutex::new(None),
original_pl2: Mutex::new(None),
original_fan_mode: Mutex::new(None),
})
}
@@ -81,6 +93,22 @@ impl PreflightAuditor for DellXps9380Sal {
outcome: if unsafe { libc::getuid() } == 0 { Ok(()) } else { Err(AuditError::RootRequired) }
});
// RAPL Lock Check (MSR 0x610, MSR_PKG_POWER_LIMIT; bit 63 is the lock bit)
let rapl_lock = match self.read_msr(0x610) {
Ok(val) => {
if (val & (1 << 63)) != 0 {
Err(AuditError::KernelIncompatible("RAPL Registers are locked by BIOS. Power limit tuning is impossible.".to_string()))
} else {
Ok(())
}
},
Err(e) => Err(AuditError::ToolMissing(format!("Cannot read MSR 0x610: {}", e))),
};
steps.push(AuditStep {
description: "MSR 0x610 RAPL Lock Status".to_string(),
outcome: rapl_lock,
});
let modules = ["dell_smm_hwmon", "msr", "intel_rapl_msr"];
for mod_name in modules {
let path = self.ctx.sysfs_base.join(format!("sys/module/{}", mod_name));
@@ -115,23 +143,24 @@ impl PreflightAuditor for DellXps9380Sal {
}
});
let tool_check = self.fact_sheet.paths.tools.contains_key("dell_fan_ctrl");
steps.push(AuditStep {
description: "Dell Fan Control Tool".to_string(),
outcome: if tool_check { Ok(()) } else { Err(AuditError::ToolMissing("dell-bios-fan-control not found in PATH".to_string())) }
});
Box::new(steps.into_iter())
}
}
impl EnvironmentGuard for DellXps9380Sal {
fn suppress(&self) -> Result<()> {
let mut suppressed = self.suppressed_services.lock().unwrap();
if let Ok(pl1) = fs::read_to_string(&self.pl1_path) {
*self.original_pl1.lock().unwrap() = pl1.trim().parse().ok();
}
if let Ok(pl2) = fs::read_to_string(&self.pl2_path) {
*self.original_pl2.lock().unwrap() = pl2.trim().parse().ok();
}
*self.original_fan_mode.lock().unwrap() = Some("1".to_string());
let services = ["tlp", "thermald", "i8kmon"];
let mut suppressed = self.suppressed_services.lock().unwrap();
for s in services {
if self.ctx.runner.run("systemctl", &["is-active", "--quiet", s]).is_ok() {
debug!("Suppressing service: {}", s);
let _ = self.ctx.runner.run("systemctl", &["stop", s]);
suppressed.push(s.to_string());
}
@@ -140,6 +169,15 @@ impl EnvironmentGuard for DellXps9380Sal {
}
fn restore(&self) -> Result<()> {
if let Some(pl1) = *self.original_pl1.lock().unwrap() {
let _ = fs::write(&self.pl1_path, pl1.to_string());
}
if let Some(pl2) = *self.original_pl2.lock().unwrap() {
let _ = fs::write(&self.pl2_path, pl2.to_string());
}
if let Some(tool_path) = self.fact_sheet.paths.tools.get("dell_fan_ctrl") {
let _ = self.ctx.runner.run(&tool_path.to_string_lossy(), &["1"]);
}
let mut suppressed = self.suppressed_services.lock().unwrap();
for s in suppressed.drain(..) {
let _ = self.ctx.runner.run("systemctl", &["start", &s]);
@@ -167,16 +205,25 @@ impl SensorBus for DellXps9380Sal {
let energy_path = rapl_base.join("energy_uj");
if energy_path.exists() {
let mut last = self.last_energy.lock().unwrap();
let mut last_energy = self.last_energy.lock().unwrap();
let mut last_watts = self.last_watts.lock().unwrap();
let e2_str = fs::read_to_string(&energy_path)?;
let e2 = e2_str.trim().parse::<u64>()?;
let t2 = Instant::now();
let (e1, t1) = *last;
let (e1, t1) = *last_energy;
let delta_e = e2.wrapping_sub(e1);
let delta_t = t2.duration_since(t1).as_secs_f32();
*last = (e2, t2);
if delta_t < 0.05 { return Ok(0.0); }
Ok((delta_e as f32 / 1_000_000.0) / delta_t)
if delta_t < 0.1 {
return Ok(*last_watts); // Return cached if polled too fast
}
let watts = (delta_e as f32 / 1_000_000.0) / delta_t;
*last_energy = (e2, t2);
*last_watts = watts;
Ok(watts)
} else {
let s = fs::read_to_string(&self.pwr_path)?;
Ok(s.trim().parse::<f32>()? / 1000000.0)
@@ -204,6 +251,12 @@ impl SensorBus for DellXps9380Sal {
let s = fs::read_to_string(&self.freq_path)?;
Ok(s.trim().parse::<f32>()? / 1000.0)
}
fn get_throttling_status(&self) -> Result<bool> {
// MSR 0x19C (IA32_THERM_STATUS): bit 0 is "Thermal Status", bit 1 is the sticky "Thermal Status Log"
let val = self.read_msr(0x19C)?;
Ok((val & 0x1) != 0)
}
}
impl ActuatorBus for DellXps9380Sal {
@@ -220,14 +273,7 @@ impl ActuatorBus for DellXps9380Sal {
Ok(())
}
fn set_fan_speed(&self, speed: FanSpeedPercentage) -> Result<()> {
let tool_path = self.fact_sheet.paths.tools.get("dell_fan_ctrl")
.ok_or_else(|| anyhow!("Dell fan control tool not found in PATH"))?;
let tool_str = tool_path.to_string_lossy();
if speed.as_u8() > 50 {
let _ = self.ctx.runner.run(&tool_str, &["0"]);
}
fn set_fan_speed(&self, _speed: FanSpeedPercentage) -> Result<()> {
Ok(())
}
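
The power calculation and the two MSR bit tests in this file can be exercised as pure functions. The sketch below assumes the same 0.1 s minimum interval and `wrapping_sub` handling as the diff; all three function names are hypothetical. Note that `wrapping_sub` assumes the counter wraps at `u64::MAX`, whereas the powercap sysfs counter wraps at the value exposed in `max_energy_range_uj`, so a production version may need to account for that.

```rust
use std::time::Duration;

/// Converts two `energy_uj` readings into average watts.
/// Returns `None` when the interval is too short to give a stable value,
/// in which case the caller would fall back to its cached reading.
pub fn watts_from_energy_uj(prev_uj: u64, curr_uj: u64, elapsed: Duration) -> Option<f32> {
    let dt = elapsed.as_secs_f32();
    if dt < 0.1 {
        return None;
    }
    // Microjoules consumed since the previous sample, tolerating u64 wrap-around.
    let delta_uj = curr_uj.wrapping_sub(prev_uj);
    Some((delta_uj as f32 / 1_000_000.0) / dt)
}

/// True when bit 63 (the lock bit) of MSR 0x610 is set.
pub fn rapl_locked(msr_0x610: u64) -> bool {
    (msr_0x610 & (1u64 << 63)) != 0
}

/// True when bit 0 ("Thermal Status") of MSR 0x19C is set.
pub fn thermal_status_active(msr_0x19c: u64) -> bool {
    (msr_0x19c & 0x1) != 0
}
```

For instance, 2,000,000 µJ consumed over one second corresponds to 2 W.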


@@ -133,6 +133,23 @@ impl SensorBus for GenericLinuxSal {
Err(anyhow!("Could not determine CPU frequency"))
}
}
fn get_throttling_status(&self) -> Result<bool> {
// Fallback: check if any cooling device is active (cur_state > 0)
let cooling_base = self.ctx.sysfs_base.join("sys/class/thermal");
if let Ok(entries) = fs::read_dir(cooling_base) {
for entry in entries.flatten() {
if entry.file_name().to_string_lossy().starts_with("cooling_device") {
if let Ok(state) = fs::read_to_string(entry.path().join("cur_state")) {
if state.trim().parse::<u32>().unwrap_or(0) > 0 {
return Ok(true);
}
}
}
}
}
Ok(false)
}
}
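
The generic fallback above treats any cooling device with a non-zero `cur_state` as evidence of throttling. A sketch of the same scan with the directory passed in, so it can run against a fixture instead of `/sys` (the function name is hypothetical). This is a coarse heuristic, since it cannot tell which kind of cooling device is active.

```rust
use std::fs;
use std::path::Path;

/// Returns true if any `cooling_device*` entry under `thermal_dir`
/// reports a `cur_state` greater than zero.
pub fn any_cooling_device_active(thermal_dir: &Path) -> bool {
    let entries = match fs::read_dir(thermal_dir) {
        Ok(entries) => entries,
        // An unreadable directory is treated as "not throttling".
        Err(_) => return false,
    };
    entries.flatten().any(|entry| {
        let name = entry.file_name();
        if !name.to_string_lossy().starts_with("cooling_device") {
            return false;
        }
        // Missing or unparsable state files count as inactive.
        fs::read_to_string(entry.path().join("cur_state"))
            .ok()
            .and_then(|s| s.trim().parse::<u32>().ok())
            .map_or(false, |state| state > 0)
    })
}
```

Taking the base path as a parameter mirrors how the SAL joins paths onto `ctx.sysfs_base`, which is what makes the logic testable in a virtual environment.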
impl ActuatorBus for GenericLinuxSal {


@@ -54,6 +54,9 @@ impl SensorBus for MockSal {
fn get_freq_mhz(&self) -> Result<f32> {
Ok(3200.0)
}
fn get_throttling_status(&self) -> Result<bool> {
Ok(self.get_temp()? > 90.0)
}
}
impl ActuatorBus for MockSal {


@@ -140,6 +140,9 @@ pub trait SensorBus: Send + Sync {
/// # Errors
/// Returns an error if `/proc/cpuinfo` or a `cpufreq` sysfs node cannot be read.
fn get_freq_mhz(&self) -> Result<f32>;
/// Returns true if the system is currently thermally throttling.
fn get_throttling_status(&self) -> Result<bool>;
}
impl<T: SensorBus + ?Sized> SensorBus for Arc<T> {
@@ -155,6 +158,9 @@ impl<T: SensorBus + ?Sized> SensorBus for Arc<T> {
fn get_freq_mhz(&self) -> Result<f32> {
(**self).get_freq_mhz()
}
fn get_throttling_status(&self) -> Result<bool> {
(**self).get_throttling_status()
}
}
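
The blanket impl above is what lets an `Arc<dyn SensorBus>` be passed wherever a `SensorBus` is expected, and why every new trait method needs a forwarding line. A reduced sketch of the pattern, using a hypothetical one-method `Sensor` trait and a `FixedTemp` type that are not part of the project:

```rust
use std::sync::Arc;

/// Hypothetical one-method stand-in for `SensorBus`.
pub trait Sensor: Send + Sync {
    fn is_throttling(&self) -> Result<bool, String>;
}

/// Forwarding impl: an `Arc` around any sensor (sized or `dyn`) is itself a sensor.
impl<T: Sensor + ?Sized> Sensor for Arc<T> {
    fn is_throttling(&self) -> Result<bool, String> {
        // Deref through `&Arc<T>` to `T` and call the inner implementation.
        (**self).is_throttling()
    }
}

/// Test double that reports throttling above 90 °C, like the mock SAL above.
pub struct FixedTemp(pub f32);

impl Sensor for FixedTemp {
    fn is_throttling(&self) -> Result<bool, String> {
        Ok(self.0 > 90.0)
    }
}

/// Generic consumer that treats a read error as "not throttling".
pub fn check<S: Sensor>(sensor: S) -> bool {
    sensor.is_throttling().unwrap_or(false)
}
```

The `?Sized` bound is what allows `T = dyn Sensor`; without it the impl would cover only concrete types.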
use crate::sal::safety::{TdpLimitMicroWatts, FanSpeedPercentage};