YAML rules → Pandas DataFrames (under the hood)
This note is for engineers who want to see exactly how Open-FDD turns rule YAML on disk into pandas operations on time-series DataFrames—and where Brick TTL fits in. It complements Fault rules overview and standalone CSV / pandas.
1. YAML never becomes “one giant DataFrame of rules”
Rule files are configuration, not tabular data. Open-FDD:
- Reads each `*.yaml` in `rules_dir` with PyYAML (`yaml.safe_load`) into a Python `dict` per file (`open_fdd.engine.runner.load_rule` / `load_rules_from_dir`).
- Keeps those dicts in a `list[dict]` inside `RuleRunner` (`RuleRunner(rules=...)` or `RuleRunner(rules_path=...)`).
- At evaluation time, walks that list and, for each rule, computes a boolean mask (a `pandas.Series`) aligned to the sensor DataFrame's index, then writes a new column for the fault flag (e.g. `my_rule_flag`).
So: rules = list of dicts in memory; data = one wide-ish DataFrame of timestamps and point columns.
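That loading step can be sketched in a few lines (a minimal sketch with a hypothetical helper name; Open-FDD's own loaders are `load_rule` / `load_rules_from_dir`):

```python
from pathlib import Path

import yaml  # PyYAML


def load_rules_sketch(rules_dir: str) -> list[dict]:
    """Parse every *.yaml file in rules_dir into a plain dict, one per file."""
    rules = []
    for path in sorted(Path(rules_dir).glob("*.yaml")):
        with open(path) as f:
            rules.append(yaml.safe_load(f))  # configuration, not tabular data
    return rules
```

The result is exactly the in-memory shape described above: a `list[dict]`, never a DataFrame.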
2. From long telemetry rows to a wide DataFrame
A common pattern (warehouse export, historian CSV, SQL query) is a long table (timestamp, point_id, value). For RuleRunner you typically:
- Load rows into pandas.
- Pivot to wide: `df.pivot_table(index="timestamp", columns="point_id", values="value")` (or equivalent) so each sensor is a column.
- Rename columns using `column_map` (dict or manifest) so rule inputs line up with Brick-style or logical keys.
- Parse the index with `pd.to_datetime` when you need time-based checks.
pandas handles aggregation during the pivot when duplicate index/column pairs exist; the result is one row per timestamp.
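The long-to-wide steps above look like this in plain pandas (point names and the `column_map` entries are made up for illustration):

```python
import pandas as pd

# Long telemetry rows: one row per (timestamp, point) reading.
long_df = pd.DataFrame(
    {
        "timestamp": ["2024-01-01 00:00", "2024-01-01 00:00",
                      "2024-01-01 00:05", "2024-01-01 00:05"],
        "point_id": ["sat", "oat", "sat", "oat"],
        "value": [55.0, 40.0, 54.5, 41.0],
    }
)

# Pivot so each sensor becomes a column; duplicates would be aggregated (mean).
wide = long_df.pivot_table(index="timestamp", columns="point_id", values="value")
wide.index = pd.to_datetime(wide.index)  # datetime index for time-based checks
wide = wide.rename(columns={"sat": "supply_air_temp",
                            "oat": "outside_air_temp"})  # the column_map step
```

The frame now has one row per timestamp and one column per point, ready for rule evaluation.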
3. What RuleRunner.run does to that DataFrame
`RuleRunner.run` (`open_fdd.engine.runner`):
- `result = df.copy()`: rules never mutate the caller's frame in place (callers can still hold a reference to the original).
- For each rule dict:
  - Derives `flag_name` from `flag` or `{name}_flag`.
  - Calls `_evaluate_rule`, which returns a boolean `Series` (the fault mask) aligned to `result.index`.
  - Optionally applies a rolling window on the mask, `mask.astype(int).rolling(window=rw).sum() >= rw`, so a fault must persist for N consecutive samples.
  - Assigns `result[flag_name] = ...` as integer 0/1.
So each rule adds one column of flags; the frame grows width-wise, not row-wise.
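The loop can be approximated as follows (a simplified sketch, not Open-FDD's actual code; rule evaluation is stubbed to a bounds check, and the `input`/`low`/`high`/`rolling_window` keys are invented for the example):

```python
import pandas as pd


def run_rules_sketch(df: pd.DataFrame, rules: list[dict]) -> pd.DataFrame:
    result = df.copy()  # never mutate the caller's frame in place
    for rule in rules:
        flag_name = rule.get("flag", f"{rule['name']}_flag")
        series = result[rule["input"]]
        # Stand-in for _evaluate_rule: a vectorized bounds check.
        mask = (series < rule["low"]) | (series > rule["high"])
        rw = rule.get("rolling_window")
        if rw:
            # Fault must persist for rw consecutive samples.
            mask = mask.astype(int).rolling(window=rw).sum() >= rw
        result[flag_name] = mask.astype(int)  # one 0/1 flag column per rule
    return result
```

Each iteration widens the frame by one flag column; the row count never changes.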
4. How a YAML rule dict becomes pandas expressions
Inside `_evaluate_rule`, Open-FDD branches on `rule["type"]` (e.g. `bounds`, `flatline`, `expression`, …):
- `column_map` resolution: for each logical input key, the runner picks a DataFrame column name. If the YAML input has a `brick` class, the supplied `column_map` dict is consulted (brick class key → column label), so the same YAML can run against different exports as long as the map matches your frame.
- Bounds / thresholds: the typical pattern is `(series < low) | (series > high)` using vectorized comparisons; these are numpy ufuncs under pandas, with no Python `for` loop over rows.
- Expressions: string expressions may be evaluated in a restricted eval context (`open_fdd.engine.checks.check_expression`) with named series bound to `result[col]`, still vectorized.
If a required column is missing, the runner either raises or skips the rule (`skip_missing_columns=True`), depending on the call site.
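The `column_map` resolution step is worth seeing concretely (hypothetical point label and thresholds; the rule-input shape is illustrative, not Open-FDD's exact schema):

```python
import pandas as pd

# This export happens to call the sensor "SA_TEMP_01".
df = pd.DataFrame({"SA_TEMP_01": [55.0, 120.0, 54.0]})

# The rule input names a Brick class; column_map binds it to this export's label.
rule_input = {"key": "supply_air_temp", "brick": "Supply_Air_Temperature_Sensor"}
column_map = {"Supply_Air_Temperature_Sensor": "SA_TEMP_01"}

col = column_map[rule_input["brick"]]      # resolves to "SA_TEMP_01"
series = df[col]
fault = (series < 40.0) | (series > 90.0)  # vectorized bounds check, no row loop
```

Swapping in a different `column_map` lets the same YAML rule run against a differently labeled export.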
5. Reloading rules vs DataFrame lifetime
If you call `load_rules_from_dir` before each `RuleRunner` construction, edits to YAML on disk take effect on the next run. The DataFrame you pass in exists only for that evaluation; construct a fresh frame for each batch or window as your pipeline requires.
6. Mental model checklist
| Artifact | In-memory shape | pandas role |
|---|---|---|
| Rule YAML files | N/A (on disk) | None until loaded |
| Loaded rules | list[dict] | None |
| Telemetry window | DataFrame (time × points) | pivot, datetime index, column rename |
| Rule output | Same index, +flag columns | vectorized masks, optional rolling |
| Downstream store | Your app persists outputs if needed | After flags are computed |
For a minimal example of rules + CSV (no DB), see standalone CSV / pandas and the unit tests in `open_fdd/tests/engine/test_runner.py`.