YAML rules → Pandas DataFrames (under the hood)
This note is for engineers who want to see exactly how Open-FDD turns rule YAML on disk into pandas operations on time-series DataFrames, and where Brick TTL fits in. It complements the Fault rules overview and standalone CSV / pandas pages.
1. YAML never becomes “one giant DataFrame of rules”
Rule files are configuration, not tabular data. Open-FDD:
- Reads each *.yaml in rules_dir with PyYAML (yaml.safe_load) into a Python dict per file (open_fdd.engine.runner.load_rule / load_rules_from_dir).
- Keeps those dicts in a list[dict] inside RuleRunner (RuleRunner(rules=...) or RuleRunner(rules_path=...)).
- At evaluation time, walks that list and, for each rule, computes a boolean mask (pandas.Series) aligned to the sensor DataFrame’s index, then writes a new column for the fault flag (e.g. my_rule_flag).
So: rules = list of dicts in memory; data = one wide-ish DataFrame of timestamps and point columns.
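The split between rules and data can be sketched in a few lines. This is a minimal illustration, not Open-FDD's loader: load_rule_dicts is a hypothetical stand-in for load_rules_from_dir, and the YAML keys are assumptions chosen for the example rather than the full rule schema.

```python
import tempfile
from pathlib import Path

import pandas as pd
import yaml  # PyYAML

# Two illustrative rule files; the keys are assumptions, not the full Open-FDD schema.
RULE_FILES = {
    "sat_bounds.yaml": "name: sat_bounds\ntype: bounds\nflag: sat_bounds_flag\n",
    "oat_flatline.yaml": "name: oat_flatline\ntype: flatline\n",
}


def load_rule_dicts(rules_dir: Path) -> list[dict]:
    """Hypothetical stand-in for load_rules_from_dir: one dict per *.yaml file."""
    rules = []
    for path in sorted(rules_dir.glob("*.yaml")):
        with path.open() as fh:
            rules.append(yaml.safe_load(fh))
    return rules


with tempfile.TemporaryDirectory() as tmp:
    rules_dir = Path(tmp)
    for filename, text in RULE_FILES.items():
        (rules_dir / filename).write_text(text)
    rules = load_rule_dicts(rules_dir)

# The data lives separately, as an ordinary time-indexed DataFrame.
df = pd.DataFrame(
    {"sat": [55.0, 56.0, 120.0]},
    index=pd.date_range("2024-01-01", periods=3, freq="5min"),
)

print(type(rules).__name__, [rule["name"] for rule in rules])
print(df.shape)
```

The rules end up as plain Python dicts in a list, while the telemetry stays in its own DataFrame; nothing about the rules is tabular.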
2. From Postgres rows to a site/equipment DataFrame
The continuous loop (open_fdd.platform.loop.run_fdd_loop) loads telemetry with SQL, then reshapes it for pandas:
- Query timeseries_readings joined to points for the time window (load_timeseries_for_site / load_timeseries_for_equipment).
- Build a long table: pd.DataFrame(rows) with columns like ts, external_id, value.
- Pivot to wide: df.pivot_table(index="ts", columns="external_id", values="value") so each point is a column (one column per external_id).
- Rename columns using a column map derived from Brick TTL (resolve_from_ttl): external refs or labels become the names rules expect (Brick-class-driven mapping).
- Add timestamp = pd.to_datetime(df["ts"]) for time-based checks.
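Under the hood, the pivot performs a grouped aggregation: duplicate (ts, external_id) pairs collapse into a single cell (pivot_table averages them by default, aggfunc="mean"). All point series then share the pivot's index, so they are aligned on a common DatetimeIndex without an explicit join.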
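The reshaping steps can be tried on a handful of made-up rows. This is a sketch under stated assumptions: the readings and the column_map dict are invented for illustration (in Open-FDD the map would come from the Brick TTL), and the reset_index call is one way to get ts back as a column after the pivot, not necessarily what the loop does.

```python
import pandas as pd

# Rows as they might come back from the SQL join (long format); values are made up.
rows = [
    {"ts": "2024-01-01 00:00", "external_id": "AHU1/SAT", "value": 55.0},
    {"ts": "2024-01-01 00:00", "external_id": "AHU1/OAT", "value": 30.0},
    {"ts": "2024-01-01 00:05", "external_id": "AHU1/SAT", "value": 56.0},
    {"ts": "2024-01-01 00:05", "external_id": "AHU1/SAT", "value": 58.0},  # duplicate key
    {"ts": "2024-01-01 00:05", "external_id": "AHU1/OAT", "value": 31.0},
]
long_df = pd.DataFrame(rows)
long_df["ts"] = pd.to_datetime(long_df["ts"])

# Long -> wide: one column per external_id; duplicates are averaged (default aggfunc="mean").
wide = long_df.pivot_table(index="ts", columns="external_id", values="value")

# Hypothetical column map; in Open-FDD this would be derived from the Brick TTL.
column_map = {"AHU1/SAT": "sat", "AHU1/OAT": "oat"}
wide = wide.rename(columns=column_map)

# "ts" is the index after the pivot, so restore it as a column before deriving "timestamp".
wide = wide.reset_index()
wide["timestamp"] = pd.to_datetime(wide["ts"])

print(wide)
```

The two SAT readings at 00:05 collapse into their mean, which is worth knowing if duplicate rows can reach the pivot.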
3. What RuleRunner.run does to that DataFrame
RuleRunner.run (open_fdd.engine.runner):
- result = df.copy(): rules never mutate the caller’s frame in place (callers can still hold a reference to the original).
- For each rule dict:
  - Derives flag_name from flag or {name}_flag.
  - Calls _evaluate_rule → returns a boolean Series (fault mask) aligned to result.index.
  - Optionally applies a rolling window on the mask: mask.astype(int).rolling(window=rw).sum() >= rw so a fault must persist for rw consecutive samples.
  - Assigns result[flag_name] = ... as integer 0/1.
So each rule adds one column of flags; the frame grows width-wise, not row-wise.
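The rolling-window step is easy to check by hand on a short series. The sketch below uses an invented sat column, a made-up threshold of 80, and rw = 3; only the rolling expression itself is taken from the description above.

```python
import pandas as pd

index = pd.date_range("2024-01-01", periods=8, freq="5min")
df = pd.DataFrame({"sat": [55, 90, 55, 90, 90, 90, 55, 90]}, index=index)

result = df.copy()           # the caller's frame is left untouched
mask = result["sat"] > 80    # raw fault condition, one boolean per sample

rw = 3  # illustrative window length
# The windowed sum equals rw only when all rw samples in the window are True.
persistent = mask.astype(int).rolling(window=rw).sum() >= rw

result["sat_high_flag"] = persistent.astype(int)
print(result["sat_high_flag"].tolist())  # [0, 0, 0, 0, 0, 1, 0, 0]
```

Isolated spikes never raise the flag; only the third consecutive high sample does. The first rw - 1 rows have an incomplete window (NaN sum), which compares as False and so yields 0.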
4. How a YAML rule dict becomes pandas expressions
Inside _evaluate_rule, Open-FDD branches on rule["type"] (e.g. bounds, flatline, expression, …):
- column_map resolution: For each logical input key, the runner picks a DataFrame column name. If the YAML input has a brick class, the global column_map from TTL is consulted first (brick → column label), so the same YAML can run against different exports as long as TTL maps Brick classes to the right columns.
- Bounds / thresholds: Typical pattern is (series < low) | (series > high) using vectorized comparisons; these are numpy ufuncs under pandas, with no Python for loop over rows.
- Expressions: String expressions may be evaluated in a restricted eval context (open_fdd.engine.checks.check_expression) with named series bound to result[col]; this is still vectorized.
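A bounds rule resolved through a column map might look roughly like this. The rule dict layout (inputs, params, low, high), the resolve_column helper, and the fallback to the input key are assumptions made for this sketch; only the brick-first lookup and the comparison pattern come from the description above.

```python
import pandas as pd

df = pd.DataFrame(
    {"AHU1_SAT": [54.0, 38.0, 71.0, 60.0]},
    index=pd.date_range("2024-01-01", periods=4, freq="5min"),
)

# Illustrative rule dict and column map; key names are assumptions for this sketch.
rule = {
    "name": "sat_bounds",
    "type": "bounds",
    "inputs": {"sat": {"brick": "Supply_Air_Temperature_Sensor"}},
    "params": {"low": 40.0, "high": 70.0},
}
column_map = {"Supply_Air_Temperature_Sensor": "AHU1_SAT"}


def resolve_column(key: str, spec: dict, column_map: dict) -> str:
    """Prefer the Brick-class mapping, then fall back to the input key itself."""
    brick = spec.get("brick")
    if brick in column_map:
        return column_map[brick]
    return key


col = resolve_column("sat", rule["inputs"]["sat"], column_map)
series = df[col]
low, high = rule["params"]["low"], rule["params"]["high"]

# One vectorized comparison per bound; no per-row Python loop.
mask = (series < low) | (series > high)
print(col, mask.tolist())  # AHU1_SAT [False, True, True, False]
```

Swapping in a different column_map is enough to run the same rule dict against a frame whose supply-air column has another name.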
If a required column is missing, the runner either raises or skips the rule (skip_missing_columns=True), depending on the call site.
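Expression evaluation and the missing-column behaviour can be sketched together. evaluate_expression is a hypothetical helper, not check_expression, and emptying __builtins__ is shown only to illustrate the idea of a restricted namespace; it is not a real sandbox and should not be treated as safe for untrusted input.

```python
import pandas as pd

df = pd.DataFrame({"sat": [55.0, 75.0, 58.0], "fan_cmd": [1.0, 1.0, 0.0]})


def evaluate_expression(df, expression, inputs, skip_missing_columns=False):
    """Evaluate a boolean expression over named columns (hypothetical helper)."""
    missing = [col for col in inputs.values() if col not in df.columns]
    if missing:
        if skip_missing_columns:
            return None  # signal to the caller that this rule is skipped
        raise KeyError(f"missing columns: {missing}")
    # Bind each logical name to its Series; the expression sees nothing else.
    namespace = {name: df[col] for name, col in inputs.items()}
    return eval(expression, {"__builtins__": {}}, namespace)


mask = evaluate_expression(
    df, "(sat > 70) & (fan_cmd > 0)", {"sat": "sat", "fan_cmd": "fan_cmd"}
)
skipped = evaluate_expression(
    df, "zone_temp > 80", {"zone_temp": "zone_temp"}, skip_missing_columns=True
)
print(mask.tolist(), skipped)  # [False, True, False] None
```

The comparison and the & operator dispatch to Series methods, so the whole expression is evaluated column-wise in one pass.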
5. Hot reload: YAML vs DataFrame lifetime
run_fdd_loop calls load_rules_from_dir(rules_path) every run, then filters by equipment types from TTL, then RuleRunner(rules=rules). There is no long-lived compiled rule object on disk—editing YAML affects the next scheduled run (or the next manual POST /run-fdd). The DataFrame exists only for the duration of that run’s Python call stack (load → run → persist results).
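The reload-every-run behaviour can be illustrated with a toy loop. load_rules and run_once are hypothetical names, and the params/high keys are invented for the example; the point is only that nothing is cached between calls.

```python
import tempfile
from pathlib import Path

import yaml  # PyYAML


def load_rules(rules_dir: Path) -> list[dict]:
    """Hypothetical loader: re-reads every *.yaml file on each call."""
    return [yaml.safe_load(p.read_text()) for p in sorted(rules_dir.glob("*.yaml"))]


def run_once(rules_dir: Path) -> list[float]:
    """One 'scheduled run': load rules fresh, then report each rule's threshold."""
    rules = load_rules(rules_dir)  # nothing is cached between runs
    return [rule["params"]["high"] for rule in rules]


with tempfile.TemporaryDirectory() as tmp:
    rules_dir = Path(tmp)
    rule_file = rules_dir / "sat_bounds.yaml"

    rule_file.write_text("name: sat_bounds\ntype: bounds\nparams:\n  high: 70.0\n")
    first_run = run_once(rules_dir)

    # Edit the YAML on disk between runs; no restart or cache invalidation is needed.
    rule_file.write_text("name: sat_bounds\ntype: bounds\nparams:\n  high: 65.0\n")
    second_run = run_once(rules_dir)

print(first_run, second_run)  # [70.0] [65.0]
```

The trade-off is a small amount of file I/O and parsing per run in exchange for never serving stale rules.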
6. Mental model checklist
| Artifact | In-memory shape | pandas role |
|---|---|---|
| Rule YAML files | N/A (on disk) | None until loaded |
| Loaded rules | list[dict] | None |
| Telemetry window | DataFrame (time × points) | pivot, datetime index, column rename |
| Rule output | Same index, +flag columns | vectorized masks, optional rolling |
| fault_results / DB | Written from row-wise FDDResult | After flags are computed |
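The last row of the table, going from wide flag columns to row-wise records, could be sketched as follows. FaultRow is a hypothetical record type, not the actual FDDResult, and keeping only active flags (value 1) is an assumption of this sketch rather than documented behaviour.

```python
from dataclasses import dataclass

import pandas as pd


@dataclass
class FaultRow:
    """Hypothetical row-wise record; not the actual FDDResult definition."""
    ts: pd.Timestamp
    fault_id: str
    flag_value: int


index = pd.date_range("2024-01-01", periods=3, freq="5min")
result = pd.DataFrame(
    {
        "sat": [55.0, 90.0, 91.0],
        "sat_high_flag": [0, 1, 1],
        "oat_flatline_flag": [0, 0, 1],
    },
    index=index,
)

# Wide flag columns -> long (ts, fault_id, flag_value) rows.
flag_cols = [col for col in result.columns if col.endswith("_flag")]
long_flags = (
    result[flag_cols]
    .rename_axis("ts")
    .reset_index()
    .melt(id_vars="ts", var_name="fault_id", value_name="flag_value")
)

# Assumption: only active faults are turned into records.
active = long_flags[long_flags["flag_value"] == 1]
fault_rows = [
    FaultRow(ts=row.ts, fault_id=row.fault_id, flag_value=int(row.flag_value))
    for row in active.itertuples(index=False)
]
print(len(fault_rows), fault_rows[0])
```

This is the only step where per-row Python objects appear; everything before it stays vectorized.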
For a minimal example of rules + CSV (no DB), see standalone CSV / pandas and unit tests in open_fdd/tests/engine/test_runner.py.