YAML rules → Pandas DataFrames (under the hood)

This note is for engineers who want to see exactly how Open-FDD turns rule YAML on disk into pandas operations on time-series DataFrames—and where Brick TTL fits in. It complements Fault rules overview and standalone CSV / pandas.

1. YAML never becomes “one giant DataFrame of rules”

Rule files are configuration, not tabular data. Open-FDD:

Reads each *.yaml in rules_dir with PyYAML (yaml.safe_load) into a Python dict per file (open_fdd.engine.runner.load_rule / load_rules_from_dir).
Keeps those dicts in a list[dict] inside RuleRunner (RuleRunner(rules=...) or RuleRunner(rules_path=...)).
At evaluation time, walks that list and, for each rule, computes a boolean mask (pandas.Series) aligned to the sensor DataFrame’s index, then writes a new column for the fault flag (e.g. my_rule_flag).

So: rules = list of dicts in memory; data = one wide-ish DataFrame of timestamps and point columns.

2. From long telemetry rows to a wide DataFrame

A common pattern (warehouse export, historian CSV, SQL query) is a long table (timestamp, point_id, value). For RuleRunner you typically:

Load rows into pandas.
Pivot to wide: df.pivot_table(index="timestamp", columns="point_id", values="value") (or equivalent) so each sensor is a column.
Rename columns using column_map (dict or manifest) so rule inputs line up with Brick-style or logical keys.
Parse the index with pd.to_datetime when you need time-based checks.

pandas handles aggregation during the pivot when duplicate index/column pairs exist; the result is one row per timestamp.

3. What `RuleRunner.run` does to that DataFrame

RuleRunner.run (open_fdd.engine.runner):

result = df.copy() — rules never mutate the caller’s frame in place (callers can still hold a reference to the original).
For each rule dict:
- Derives flag_name from flag or {name}_flag.
- Calls _evaluate_rule → returns a boolean Series (fault mask) aligned to result.index.
- Optionally applies a rolling window on the mask: mask.astype(int).rolling(window=rw).sum() >= rw so a fault must persist for N consecutive samples.
- Assigns result[flag_name] = ... as integer 0/1.

So each rule adds one column of flags; the frame grows width-wise, not row-wise.

4. How a YAML rule dict becomes pandas expressions

Inside _evaluate_rule, Open-FDD branches on rule["type"] (e.g. bounds, flatline, expression, …):

column_map resolution: For each logical input key, the runner picks a DataFrame column name. If the YAML input has a brick class, the supplied column_map dict is consulted (brick class key → column label), so the same YAML can run against different exports as long as the map matches your frame.
Bounds / thresholds: Typical pattern is (series < low) | (series > high) using vectorized comparisons — these are numpy ufuncs under pandas, no Python for over rows.
Expressions: String expressions may be evaluated in a restricted eval context (open_fdd.engine.checks.check_expression) with named series bound to result[col] — still vectorized.

If a required column is missing, the runner either raises or skips the rule (skip_missing_columns=True), depending on the call site.

5. Reloading rules vs DataFrame lifetime

If you call load_rules_from_dir before each RuleRunner construction, edits to YAML on disk take effect on the next run. The DataFrame you pass in exists only for that evaluation; construct a fresh frame for each batch or window as your pipeline requires.

6. Mental model checklist

Artifact	In-memory shape	pandas role
Rule YAML files	N/A (on disk)	None until loaded
Loaded rules	`list[dict]`	None
Telemetry window	`DataFrame` (time × points)	pivot, datetime index, column rename
Rule output	Same index, +flag columns	vectorized masks, optional `rolling`
Downstream store	Your app persists outputs if needed	After flags are computed

For a minimal example of rules + CSV (no DB), see standalone CSV / pandas and unit tests in open_fdd/tests/engine/test_runner.py.