Glossary — Ai26.10 Technical Reference

High confidence

Updated 26 Apr 2026 by David Olsson

An extensive, opinionated reference for the technical vocabulary of Ai26.10 — what each term means in general, and what it means in this project specifically. Sections are tagged by domain: SCI (fermentation science), PROC (process control), AI (machine learning), TWIN (digital twin), MATH (modeling and statistics), ANO (anomaly detection), MPC (optimization & control), SYS (system architecture), GOV (project & governance).

This page is meant to be readable end-to-end, but most readers will land on a specific section. Use the table of contents (Cmd-F is your friend until we add a sidebar).

Where each term sits in the system

The signal flow underneath the whole stack:

Each block in this diagram has its own glossary entry below.

Fermentation Science (SCI)

Fermentation

A microbially-driven biochemical conversion in a controlled vessel. In this project: CDI's fermenters convert grape pomace and other agricultural sidestreams into premium ingredients via a patented bioconversion that activates polyphenols and fibre.

Batch

A single run of the fermentation process from inoculation to harvest. In this project: the unit of measurement for everything — yield, cycle time, failure rate are all per-batch quantities. Each batch has a recipe, feedstock lot, operator, and a complete time-series of sensor data.

Feedstock

The input material being fermented. In this project: grape pomace (skins, seeds, stems) plus other agricultural sidestreams. Variability in pomace composition (varietal, vintage, harvest practices) is one of the hardest problems for the AI — the system has to generalize across inputs that are never quite identical.

Polyphenols

A broad class of plant-derived molecules with antioxidant, antimicrobial, and flavor-modulating properties. In this project: the activation of polyphenols during fermentation is the value-add — the difference between waste pomace and a premium food ingredient.

Biomass

The total mass of microbial cells in the fermenter. Usually measured as cell density (g/L dry weight) or via optical density (OD). In this project: biomass is one of the primary targets for the soft-sensor — measuring it directly requires sampling and lab work, but predicting it from pH/DO/temperature/off-gas in real-time enables much tighter control.

Yield

Mass of product per mass of feedstock (g/g) or per mass of substrate consumed. In this project: one of the four core baseline KPIs. Improving yield by even a few percent at industrial scale is the commercial case for the AI.

Cycle Time

Total elapsed time from start of one batch to start of the next, including fermentation, harvest, CIP. In this project: another core baseline KPI. Faster cycles directly translate to more annual production from the same plant.

Batch Failure Rate

Fraction of batches that fail to meet specification. In this project: baseline KPI. Anomaly detection directly attacks this number — early intervention on failing batches either rescues them or terminates them before more energy is spent.

Off-Gas

The exhaust gas leaving the fermenter: CO₂ produced by metabolism, O₂ depleted relative to the inlet air, and sometimes ethanol vapors. In this project: off-gas composition is a rich, non-invasive signal of metabolic state and a key input to the soft-sensor for biomass.

Optical Density (OD)

Light absorbance at a specified wavelength (commonly 600 nm), used as a fast proxy for cell density. In this project: an inline OD probe is one of the cheap-and-frequent sensors the soft-sensor learns to interpret.

Dissolved Oxygen (DO)

Oxygen concentration in the fermentation broth, usually measured as % saturation. In this project: a primary control variable; aeration setpoints are computed against a desired DO trajectory.

pH

Acid/base balance of the fermentation broth. In this project: controlled actively via base/acid addition; deviations are early indicators of metabolic shift.

Microbial Community

The collection of organisms doing the fermentation. In this project: CDI's process uses a defined consortium — knowing this matters for the digital twin, because the kinetics depend on what's growing.

Lag / Log / Stationary Phase

The first three phases of the classical microbial growth curve: slow start (lag), exponential growth (log), and growth halt (stationary). A death (decline) phase follows if the culture is left to run. In this project: the soft-sensor and the twin both have to handle phase transitions cleanly, because those are the moments when models tend to disagree most.

Primary vs. Secondary Metabolites

Primary metabolites are produced during exponential growth and are essential for the cell (amino acids, ATP). Secondary metabolites are produced in stationary phase and are often the commercially valuable ones (polyphenol derivatives in our case). In this project: the timing of secondary metabolite production is a key target for control.

CIP (Clean-in-Place)

Automated cleaning cycle between batches. In this project: captured for completeness in the data pipeline but not part of the fermentation baseline.

Process Control (PROC)

PLC (Programmable Logic Controller)

The deterministic, real-time control hardware that actually drives valves, pumps, and heating elements. In this project: CDI's existing PLCs are unchanged — Ai26.10 sits over them, sending advisory setpoint changes that the PLC executes.

SCADA (Supervisory Control and Data Acquisition)

The software layer above PLCs that operators interact with. Provides visualization, historian functions, alarms. In this project: the existing SCADA is the system of record for setpoints; our advisory overlay sits next to it.

MES (Manufacturing Execution System)

A higher layer above SCADA that handles batch records, recipe management, and integrates with ERP. In this project: scope only as far as we need to read batch metadata.

OPC-UA

The standard industrial protocol for sensor and control data. In this project: the language the edge gateway speaks to talk to the existing PLC/SCADA.

Setpoint

The target value for a controlled variable (e.g. "DO = 30%"). In this project: the MPC's outputs are setpoint changes, not raw actuator commands. The PLC enforces the setpoint via existing PID loops.

PID Controller

A classical feedback controller that adjusts an actuator based on Proportional, Integral, and Derivative terms of the error. In this project: every existing CDI control loop is PID; our MPC sits above them and adjusts their setpoints rather than re-tuning the loops themselves.
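
For readers who have not met PID before, here is a minimal sketch of the textbook discrete form. The gains and the DO numbers are hypothetical; the real loops run on CDI's PLCs and are not reproduced here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PID:
    """Textbook discrete-time PID controller (positional form)."""
    kp: float
    ki: float
    kd: float
    integral: float = 0.0
    prev_error: Optional[float] = None

    def step(self, setpoint: float, measurement: float, dt: float) -> float:
        error = setpoint - measurement
        self.integral += error * dt
        # No derivative term on the very first sample (no previous error yet).
        if self.prev_error is None:
            derivative = 0.0
        else:
            derivative = (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


# Hypothetical gains and readings, for illustration only.
do_loop = PID(kp=2.0, ki=0.5, kd=0.0)
u = do_loop.step(setpoint=30.0, measurement=28.0, dt=1.0)
```

The supervisory layer only ever changes the `setpoint` argument; everything inside `step` stays with the existing control system.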

Advisory Control

The AI computes a recommendation; the operator (or PLC) decides whether to apply it. In this project: the default mode through the entire project. We don't move to closed-loop autonomous control without explicit operator validation gates.

Supervisory Control

The AI sits above lower-level controllers (PIDs), changing their setpoints rather than driving actuators directly. In this project: the architectural choice for the MPC — it interacts with the existing control system at the setpoint level, not at the device level.

Sensor Calibration

The process of mapping a sensor's raw electrical output to a physical quantity, accounting for offset and slope drift. In this project: a tracked risk (R03). The soft-sensor and twin assume sensors are well-calibrated; we add drift detection as a safety net.

Sensor Drift

Slow change in a sensor's calibration over time. In this project: modelled explicitly — the data infrastructure flags drift via repeated lab-vs-sensor comparisons, and the soft-sensor's confidence bounds widen when drift is suspected.

Machine Learning & AI (AI)

Neural Network

A parameterized function approximator built from layers of weighted, non-linear units. In this project: the architectural choice for the soft-sensor models. Ensembles of small networks rather than a single large one — easier to quantify uncertainty, faster to retrain.

Deep Learning

The use of multi-layer neural networks with modern training techniques. In this project: soft-sensor and anomaly-detection models. The digital twin uses ML primarily as a residual on top of first-principles equations, not as a replacement.

Soft Sensor

A model that infers an expensive-to-measure quantity from cheap-to-measure ones. In this project: networks that infer biomass, key metabolites, and off-gas composition from the routinely-available temperature, pH, DO, and gas-phase sensors. Validated against periodic wet-chem analytics.

Ensemble

Multiple models combined into one prediction, typically for robustness and uncertainty estimation. In this project: the soft-sensors are ensembles of ~5-10 networks trained on bootstrap-resampled data; ensemble disagreement gives a confidence bound.
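
A rough sketch of the bootstrap-ensemble idea, using straight-line fits as stand-ins for the small networks. The data is synthetic and the function names are hypothetical; only the resample-fit-compare pattern is the point.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in data: one cheap signal x, one expensive target y.
x = np.linspace(0.0, 1.0, 60)
y = 2.0 * x + 0.5 + rng.normal(0.0, 0.1, size=x.size)


def fit_bootstrap_ensemble(x, y, n_members=8, degree=1):
    """Fit one small model per bootstrap resample of the training data."""
    members = []
    for _ in range(n_members):
        idx = rng.integers(0, x.size, size=x.size)  # sample with replacement
        members.append(np.polyfit(x[idx], y[idx], degree))
    return members


def predict_with_bound(members, x_new):
    """Return the ensemble mean and the member-to-member standard deviation."""
    preds = np.array([np.polyval(coeffs, x_new) for coeffs in members])
    return preds.mean(axis=0), preds.std(axis=0)


members = fit_bootstrap_ensemble(x, y)
# 0.5 is inside the training range; 3.0 is far outside it.
mean, spread = predict_with_bound(members, np.array([0.5, 3.0]))
```

The members agree closely inside the training range and diverge when extrapolating, which is the behaviour that makes ensemble disagreement usable as a confidence signal.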

Confidence Bound

A statistically-defined range of plausible values around a prediction. In this project: every soft-sensor output ships with a confidence bound. The MPC weights its trust in the prediction by the bound width; when the bound is wider than a threshold, it falls back to operator advisory.

Drift Detection

Monitoring the input distribution and prediction distribution of a deployed model for departure from training conditions. In this project: a first-class feature of the ML platform — the model registry tracks reference distributions for every model, and triggers retraining when drift exceeds a threshold.
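
One possible way to compare a live feature distribution against its stored reference is the two-sample Kolmogorov-Smirnov statistic. This is an illustrative choice, not a statement of which test the platform uses; the threshold value is likewise hypothetical.

```python
import numpy as np


def ks_statistic(reference, live):
    """Largest gap between the two empirical CDFs (0 = identical, 1 = disjoint)."""
    reference = np.sort(np.asarray(reference, dtype=float))
    live = np.sort(np.asarray(live, dtype=float))
    grid = np.concatenate([reference, live])
    cdf_ref = np.searchsorted(reference, grid, side="right") / reference.size
    cdf_live = np.searchsorted(live, grid, side="right") / live.size
    return float(np.max(np.abs(cdf_ref - cdf_live)))


DRIFT_THRESHOLD = 0.2  # hypothetical; would be tuned per feature

reference = np.arange(100.0)          # stand-in for training-time values
ks_same = ks_statistic(reference, reference.copy())
ks_shifted = ks_statistic(reference, reference + 50.0)
needs_retraining = ks_shifted > DRIFT_THRESHOLD
```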

Retraining Trigger

The signal that a model needs to be re-fit. In this project: triggered by sensor drift, change in feedstock composition, statistically-significant shift in prediction error, or scheduled cadence (whichever comes first).

Feature Store

A versioned repository of engineered features used as model inputs. In this project: sits between the time-series DB and the models — feature definitions are code-versioned, so every model knows exactly what it was trained on.

Time-Series Database (TSDB)

A database optimized for write-heavy, time-stamped numeric data. In this project: the primary store for sensor and PLC data — designed for years of data at sub-second sampling.

Cross-Validation

A model-evaluation technique where training data is repeatedly split into train/test folds. In this project: all model performance numbers in the validation reports use leave-one-batch-out cross-validation — we never train and test on the same batch.
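
A minimal sketch of leave-one-batch-out splitting, assuming each sensor sample carries a batch label. The batch IDs below are made up.

```python
import numpy as np


def leave_one_batch_out(batch_ids):
    """Yield (held_out_batch, train_idx, test_idx), holding each batch out once."""
    batch_ids = np.asarray(batch_ids)
    for batch in np.unique(batch_ids):
        test_mask = batch_ids == batch
        yield batch, np.flatnonzero(~test_mask), np.flatnonzero(test_mask)


# Hypothetical labels: one entry per sensor sample.
batch_ids = ["B001", "B001", "B002", "B002", "B002", "B003"]
splits = list(leave_one_batch_out(batch_ids))
```

Splitting by batch rather than by row matters because samples within one batch are strongly correlated; a random row split would leak information from the test batch into training.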

MAE / RMSE / R²

Mean Absolute Error, Root Mean Squared Error, coefficient of determination. In this project: the standard accuracy metrics reported for soft-sensors. MAE for "how big are typical errors", RMSE for "what error should I plan around" (it weights large misses more heavily), R² for "how much variance does the model explain".
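
A small worked example of the three metrics. The biomass values are invented to show how a single large miss moves RMSE more than MAE.

```python
import numpy as np


def regression_metrics(y_true, y_pred):
    """Return MAE, RMSE and R² for paired true/predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_pred - y_true
    ss_res = np.sum(err ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return {
        "mae": float(np.mean(np.abs(err))),
        "rmse": float(np.sqrt(np.mean(err ** 2))),
        "r2": float(1.0 - ss_res / ss_tot),
    }


# Made-up biomass values (g/L): three perfect predictions, one miss of 2 g/L.
m = regression_metrics([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 6.0])
# m["mae"] is 0.5, m["rmse"] is 1.0, m["r2"] is 0.2
```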

Hyperparameter

A model parameter set by the trainer rather than learned from data (learning rate, layer width, regularization strength). In this project: swept via Bayesian optimization on an internal grid; final values logged in the model registry.

Regularization

Techniques (L1/L2 weight penalties, dropout, early stopping) that prevent overfitting. In this project: important because our datasets are small relative to deep-learning standards; historical CDI data is measured in hundreds of batches, not millions.

Digital Twin (TWIN)

Digital Twin

A live model of a physical system that runs in parallel and predicts its evolution. In this project: a hybrid first-principles + ML model of CDI's fermentation that the MPC queries for "what if I change setpoint X?".

Hybrid Model

A model combining mechanistic equations with data-driven components. In this project: the digital twin uses a first-principles ODE system for mass balance, energy balance, and basic kinetics, plus an ML residual that captures unmodelled dynamics.
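
The hybrid pattern in miniature: a mechanistic rate law plus a fitted correction on whatever it misses. Everything here is synthetic, and a straight-line fit stands in for the ML residual; the parameter values and the "unmodelled" effect are assumptions for illustration.

```python
import numpy as np


def first_principles_rate(S, mu_max=0.4, K_S=0.5):
    """Mechanistic part: Monod growth rate as a function of substrate S."""
    return mu_max * S / (K_S + S)


# Synthetic "observations": Monod plus an effect the mechanistic model lacks.
S = np.linspace(0.1, 20.0, 50)
observed = first_principles_rate(S) - 0.005 * S

# Residual = observation minus mechanistic prediction, fitted with a line.
residual_coeffs = np.polyfit(S, observed - first_principles_rate(S), 1)


def hybrid_rate(S):
    """Mechanistic prediction plus the learned correction."""
    return first_principles_rate(S) + np.polyval(residual_coeffs, S)


err_mechanistic = float(np.max(np.abs(first_principles_rate(S) - observed)))
err_hybrid = float(np.max(np.abs(hybrid_rate(S) - observed)))
```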

First-Principles Model

A model derived from physical/chemical/biological theory (conservation laws, kinetic mechanisms). In this project: the bones of the digital twin. Always interpretable, and it never predicts something physically impossible (like negative biomass).

ML Residual

The difference between observed reality and the first-principles prediction, learned by an ML model. In this project: the flesh of the digital twin. Captures the parts of fermentation that physics alone can't predict — micro-scale heterogeneity, sensor characteristics, recipe-specific quirks.

Calibration (Twin)

Tuning the parameters of the digital twin so its predictions match observed data. In this project: done per-fermenter and per-recipe. Re-calibration is automated whenever sufficient new data is available.

Validation (Twin)

Comparing twin predictions against held-out batches the twin wasn't tuned on. In this project: every release of the twin is gated by a documented validation report — out-of-sample RMSE on biomass trajectory, prediction-vs-actual scatter for end-of-batch yield.

Parameter Estimation

Fitting the unknown parameters of a mechanistic model to observed data. In this project: maximum a posteriori (MAP) estimation, i.e. maximum likelihood combined with informative priors. Priors come from the literature where available (e.g. Monod parameters for grape-derived sugars), updated against CDI's actual batches.

Uncertainty Quantification (UQ)

Putting error bars on a model's outputs, not just point estimates. In this project: the twin always reports prediction intervals, not just trajectories. The MPC respects them — narrow intervals → trust the model, wide intervals → fall back to conservative control.

Surrogate Model

A cheap-to-evaluate model that approximates an expensive one. In this project: if the full twin is too slow for the MPC's optimization loop, we fit a surrogate to it offline.

Mathematical Modeling (MATH)

ODE (Ordinary Differential Equation)

An equation relating a function to its derivatives in one variable. In this project: the backbone of the first-principles model. Each species concentration $c_i$ evolves as $\frac{dc_i}{dt} = f_i(\mathbf{c}, T, \text{setpoints})$.

Mass Balance

Conservation of mass: accumulation = in − out + generation − consumption. In this project: written for substrate, biomass, and key product species, these balances give the structural equations of the twin.

Energy Balance

Conservation of energy. In this project: the heat released by metabolism plus the heat exchanged with the jacket sets the rate of temperature change of the broth, which is needed for accurate temperature trajectory prediction.

Monod Kinetics

The classic empirical relationship for microbial growth rate vs. substrate: $\mu = \mu_\text{max} \cdot \frac{S}{K_S + S}$. In this project: the starting point for the kinetic submodel; extended with substrate-inhibition and product-inhibition terms as needed.
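
To make the formula concrete, here is a sketch of a batch simulation that couples Monod growth to a substrate balance using plain Euler steps. The parameter values and the yield coefficient `Y_xs` are illustrative assumptions, not CDI's fitted values, and the twin's actual integrator is not specified here.

```python
def monod_mu(S, mu_max, K_S):
    """Specific growth rate (1/h) at substrate concentration S (g/L)."""
    return mu_max * S / (K_S + S)


def simulate_batch(X0, S0, mu_max, K_S, Y_xs, dt=0.01, n_steps=2400):
    """Euler integration of dX/dt = mu*X and dS/dt = -mu*X / Y_xs."""
    X, S = X0, S0
    for _ in range(n_steps):
        growth = monod_mu(S, mu_max, K_S) * X * dt
        growth = min(growth, Y_xs * S)      # cannot use more substrate than remains
        X += growth
        S = max(S - growth / Y_xs, 0.0)     # guard against float round-off below zero
    return X, S


# Hypothetical parameters: biomass and substrate in g/L, time in hours.
X_end, S_end = simulate_batch(X0=0.1, S0=20.0, mu_max=0.4, K_S=0.5, Y_xs=0.5)
```

When `S` equals `K_S` the growth rate is exactly half of `mu_max`, which is the usual way to read the half-saturation constant.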

Michaelis-Menten Kinetics

The classical enzyme-rate equation: $v = \frac{V_\text{max} \cdot [S]}{K_M + [S]}$. In this project: used for specific enzymatic steps in the polyphenol-activation pathway.

Arrhenius Equation

Temperature dependence of reaction rate: $k = A \cdot \exp\left(-\frac{E_a}{RT}\right)$. In this project: every kinetic parameter in the twin has an Arrhenius temperature correction.
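
A quick numerical feel for the equation, assuming an activation energy of 50 kJ/mol (an illustrative value, not one taken from the twin).

```python
import math

R = 8.314  # universal gas constant, J/(mol*K)


def arrhenius(A, Ea, T_kelvin):
    """Rate constant k = A * exp(-Ea / (R * T)), with Ea in J/mol."""
    return A * math.exp(-Ea / (R * T_kelvin))


# Ratio of rate constants at 35 °C versus 25 °C; the pre-factor A cancels.
ratio = arrhenius(1.0, 50_000.0, 308.15) / arrhenius(1.0, 50_000.0, 298.15)
```

With this assumed activation energy, a 10 K increase roughly doubles the rate constant, which is why small temperature errors matter for kinetic predictions.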

Bayesian Inference

Updating beliefs given data: $P(\theta | D) \propto P(D | \theta) \cdot P(\theta)$. In this project: how parameter estimation works. Priors encode literature/expert knowledge; the likelihood comes from observed batch data.

Kalman Filter

A recursive Bayesian estimator for linear dynamic systems. In this project: a candidate for the state-estimation layer that fuses noisy sensor readings with the twin's prediction in real time. Likely extended to EKF (Extended Kalman Filter) for the non-linear fermentation dynamics.
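
A minimal scalar Kalman filter, shown only to illustrate the predict/update cycle. The model coefficients and noise variances are placeholder assumptions; a real state estimator for the twin would be multivariate and, as noted, probably an EKF.

```python
def kalman_step(x_est, p_est, z, a=1.0, q=1e-4, h=1.0, r=0.04):
    """One predict/update cycle for x[k+1] = a*x[k] + w, z = h*x + v."""
    # Predict: push the estimate and its variance through the model.
    x_pred = a * x_est
    p_pred = a * p_est * a + q
    # Update: blend prediction and measurement using the Kalman gain.
    gain = p_pred * h / (h * p_pred * h + r)
    x_new = x_pred + gain * (z - h * x_pred)
    p_new = (1.0 - gain * h) * p_pred
    return x_new, p_new


# Start from a poor guess with high uncertainty, then feed constant readings.
x, p = 0.0, 1.0
for _ in range(50):
    x, p = kalman_step(x, p, z=5.0)
```

The gain starts near 1 (trust the sensor) and shrinks as the estimate's variance `p` falls, which is the fusion behaviour described above.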

Particle Filter

A Monte-Carlo state estimator that handles non-Gaussian distributions. In this project: considered as an alternative to EKF when the posterior is multi-modal (e.g. during phase transitions).

Gradient Descent

Iterative parameter update by stepping along the negative gradient of a loss function. In this project: Adam optimizer for neural-network training; L-BFGS for parameter estimation in the first-principles model.

Backpropagation

The chain-rule-based algorithm that computes gradients of a neural-network loss. In this project: the internal mechanic of how the soft-sensors and the ML residual learn.

Anomaly Detection (ANO)

Anomaly Detection

Identifying observations that depart from a learned notion of "normal". In this project: flags batch-failure precursors and quality excursions early enough for operator intervention.

PCA (Principal Component Analysis)

A linear dimension-reduction technique that finds the directions of maximum variance. In this project: the baseline anomaly detector — distance from the in-control PCA subspace as the first alarm metric.

Autoencoder

A neural network trained to reconstruct its input through a low-dimensional bottleneck. In this project: the non-linear cousin to the PCA detector. Reconstruction error becomes the anomaly score.

Multivariate Control Charts

Statistical charts (Hotelling T², SPE) that monitor multivariate processes. In this project: the control-room-readable form of the anomaly signal.

Hotelling T²

A multivariate generalization of the Student's t statistic for monitoring multiple variables jointly. In this project: combined with SPE (Squared Prediction Error) for two-stage anomaly detection — T² flags departures within the model subspace, SPE flags departures from it.
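
A sketch of how T² and SPE can be computed from a PCA model fitted on in-control data. The three-variable dataset is synthetic (the third variable is built as the sum of the first two), and the function names are hypothetical.

```python
import numpy as np


def fit_pca_monitor(X_normal, n_components):
    """Fit a PCA model on standardized in-control data."""
    mean = X_normal.mean(axis=0)
    std = X_normal.std(axis=0)
    Z = (X_normal - mean) / std
    _, s, Vt = np.linalg.svd(Z, full_matrices=False)
    P = Vt[:n_components].T                           # loadings, (n_vars, k)
    var = (s[:n_components] ** 2) / (Z.shape[0] - 1)  # variance of each score
    return mean, std, P, var


def t2_and_spe(x, mean, std, P, var):
    """Return (Hotelling T², SPE) for a single observation x."""
    z = (x - mean) / std
    t = z @ P                  # scores inside the model subspace
    residual = z - t @ P.T     # part the model cannot reconstruct
    return float(np.sum(t ** 2 / var)), float(residual @ residual)


rng = np.random.default_rng(1)
a, b = rng.normal(size=(2, 200))
X_normal = np.column_stack([a, b, a + b + 0.01 * rng.normal(size=200)])
model = fit_pca_monitor(X_normal, n_components=2)

t2_ok, spe_ok = t2_and_spe(np.array([1.0, 1.0, 2.0]), *model)    # keeps the correlation
t2_bad, spe_bad = t2_and_spe(np.array([1.0, 1.0, -2.0]), *model)  # breaks it
```

The second point has ordinary values on each variable taken alone; only SPE reveals that the relationship between them has broken.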

SPC (Statistical Process Control)

The classical framework of control charts and process capability. In this project: anomaly detection results are surfaced in SPC-format charts familiar to plant operators, not as opaque ML scores.

False Positive Rate (FPR)

Fraction of normal observations incorrectly flagged. In this project: the dominant tuning knob. A 1% FPR per minute of monitoring still produces a false alarm roughly every 100 minutes on average, and operators quickly stop trusting alarms at that rate. Real targets are much lower.
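
The arithmetic behind that figure, assuming one independent check per minute (a simplification; real consecutive checks are correlated). The 8-hour shift length is an assumption for illustration.

```python
def mean_minutes_between_false_alarms(fpr_per_check, checks_per_minute=1.0):
    """Expected spacing of false alarms when every check is independent."""
    return 1.0 / (fpr_per_check * checks_per_minute)


def false_alarms_per_shift(fpr_per_check, shift_hours=8.0, checks_per_minute=1.0):
    """Expected number of false alarms over one shift."""
    return fpr_per_check * checks_per_minute * 60.0 * shift_hours


spacing = mean_minutes_between_false_alarms(0.01)   # about 100 minutes
per_shift = false_alarms_per_shift(0.01)            # about 4.8 per 8-hour shift
```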

True Positive Rate (TPR) / Recall

Fraction of actual anomalies correctly flagged. In this project: the other axis of the operating-point trade-off. Tracked per known failure mode separately.

Alerting Threshold

The numerical cutoff above which a score becomes an alarm. In this project: tuned per-alarm-channel, with operator-in-the-loop calibration over weeks of pilot operation.

MPC & Optimization (MPC)

MPC (Model-Predictive Control)

A control strategy that, at each step, solves an optimization problem over a prediction horizon to choose the best next setpoint. In this project: the supervisory controller that uses the digital twin to decide setpoint changes.

Cost Function

The scalar quantity the MPC minimizes (or, equivalently, whose negative it maximizes). In this project: a weighted combination of negative yield, cycle-time penalty, energy penalty, plus large penalties on constraint violation.

Constraint

A condition the optimizer must respect (e.g. "DO never below 10%"). In this project: safety constraints (temperature, pressure, pH bounds) are hard; quality constraints (yield window) are soft via penalty.

Prediction Horizon

The number of future steps the MPC simulates and optimizes over. In this project: typically the remainder of the current batch — fermentation is a finite-horizon problem.

Control Horizon

The number of future setpoint changes the MPC actually plans. In this project: shorter than the prediction horizon — the MPC plans a few moves ahead, applies one, replans.

Receding Horizon

The procedure of replanning at every step using the latest observations. In this project: the MPC re-optimizes every minute or so. The "plan" from a minute ago is replaced by a fresh one informed by what just happened.
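
A toy receding-horizon loop to show how prediction horizon, control horizon, and replanning fit together. It assumes a one-state linear model in place of the digital twin and brute-force enumeration over a few candidate setpoints in place of a QP/NLP solver; all numbers are made up.

```python
import itertools


def simulate(x, moves, horizon, a=0.9, b=0.1):
    """Toy model x[k+1] = a*x[k] + b*u[k]; the last planned move is held."""
    trajectory = []
    for k in range(horizon):
        u = moves[min(k, len(moves) - 1)]
        x = a * x + b * u
        trajectory.append(x)
    return trajectory


def mpc_step(x, target, candidates, prediction_horizon=10, control_horizon=2):
    """Search all move sequences, return only the first move of the best one."""
    best_cost, best_moves = float("inf"), None
    for moves in itertools.product(candidates, repeat=control_horizon):
        trajectory = simulate(x, moves, prediction_horizon)
        cost = sum((xi - target) ** 2 for xi in trajectory)
        if cost < best_cost:
            best_cost, best_moves = cost, moves
    return best_moves[0]


x, target = 0.0, 30.0
candidates = [0.0, 10.0, 20.0, 30.0, 40.0, 50.0]
for _ in range(40):
    u = mpc_step(x, target, candidates)  # replan from the latest state
    x = 0.9 * x + 0.1 * u                # the "plant" is the same toy model here
```

Each pass plans two moves over a ten-step prediction, applies one, and discards the rest, which is the plan, apply, replan cycle described above.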

Quadratic Programming (QP)

Optimization with a quadratic objective and linear constraints. In this project: used when the control problem can be linearized into a convex QP, where solvers are fast and any optimum found is the global one.

Non-Linear Programming (NLP)

Optimization with a non-linear objective or non-linear constraints. In this project: the general case. Solved with interior-point methods (IPOPT) when the problem is smooth, sequential quadratic programming (SQP) when initial guesses are good.

Convex Optimization

A class of optimization problems where the objective and feasible set are convex. In this project: we work hard to keep the MPC convex — globally-optimal solutions, fast solve times.

KKT Conditions

The Karush-Kuhn-Tucker first-order optimality conditions. In this project: mostly under the hood — the way our solver knows it's at an optimum.

System Architecture (SYS)

Edge Gateway

Small computer at the plant boundary that buffers and forwards sensor data. In this project: the only piece of new hardware in CDI's facility. Buffers locally during connectivity loss, replays on reconnection, talks OPC-UA inward and HTTPS outward.
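
The buffer-and-replay behaviour can be sketched as a small store-and-forward queue. The class name, the `send` callback, and the drop-oldest policy when the buffer fills are all assumptions for illustration, not a description of the gateway's actual software.

```python
from collections import deque


class StoreAndForwardBuffer:
    """Queue samples locally and replay them in order once the uplink works."""

    def __init__(self, send, max_items=10_000):
        self._send = send                       # callable: True on success
        self._queue = deque(maxlen=max_items)   # oldest samples dropped when full

    def publish(self, sample):
        self._queue.append(sample)
        self.flush()

    def flush(self):
        """Send buffered samples oldest-first; stop at the first failure."""
        while self._queue:
            if not self._send(self._queue[0]):
                return False
            self._queue.popleft()
        return True


# Fake uplink for demonstration: succeeds only while link["up"] is True.
link = {"up": False}
delivered = []


def fake_uplink(sample):
    if link["up"]:
        delivered.append(sample)
    return link["up"]


buffer = StoreAndForwardBuffer(fake_uplink)
buffer.publish({"t": 1, "pH": 4.1})   # link down: held locally
buffer.publish({"t": 2, "pH": 4.0})
link["up"] = True
buffer.publish({"t": 3, "pH": 4.0})   # link back: all three replayed in order
```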

OT/IT Segmentation

Architectural separation between Operational Technology (control systems) and Information Technology (cloud, business systems). In this project: strictly enforced. The cloud control plane never speaks directly to PLCs. Data flows OT → DMZ (edge) → IT. Setpoint suggestions flow back through the same path with operator approval gates.

Time-Series DB (TSDB)

See AI section. Architecturally, this is the first persistent store for incoming sensor data.

Feature Store

See AI section. Architecturally, sits between TSDB and the model layer.

Data Lake

A versioned object store for everything that doesn't fit cleanly into a relational schema — raw lab files, batch reports, archived models. In this project: the long-term archive, alongside the TSDB.

Model Registry

A versioned repository of trained ML models with their training data lineage, evaluation metrics, and deployment status. In this project: every soft-sensor and anomaly model that's ever served traffic is in the registry — important for auditability and rollback.

Observability

The instrumentation that lets you understand what a running system is doing — metrics, logs, traces. In this project: every model inference call, every MPC solve, every alarm is logged with a unique trace ID so we can reconstruct any incident.

Audit Trail

A tamper-evident log of every action taken by the system. In this project: required by the OT/safety posture. Every advisory issued, every operator override, every retraining event.

SaaS (Software as a Service)

Software delivered as a hosted, multi-tenant service. In this project: the eventual commercialization path for A47 — the same stack deployed across multiple fermentation operators.

Multi-Tenancy

Architectural pattern where one application instance serves multiple isolated customers. In this project: designed in from the start. CDI is tenant zero; future operators get isolated data, models, and configurations behind the same code base.

Zero-Trust

A network architecture where no actor is trusted by default; every request is authenticated and authorized. In this project: the security model between the cloud control plane and the plant edge gateway. Every message is mTLS-authenticated; secrets rotate.

Project & Governance (GOV)

PIC (Protein Industries Canada)

The Global Innovation Cluster funding this project under the PCAIS (Protein Consortia AI Stream) program.

MPA (Master Project Agreement)

The contract between PIC, CDI, and A47 that governs Ai26.10. In this project: the source of truth for all obligations not derivable from the proposal — IP rights, claim procedures, change controls, publication clearance.

Schedule A

The workbook attached to the MPA defining milestones, deliverables, financials, and timelines. In this project: every milestone in the YAML traces to a Schedule A entry; every claim package reconciles against Schedule A.

Steering Committee (SC)

The governance body for the consortium. In this project: meets at minimum monthly, with members from CDI, A47, and PIC. Decisions ratified here are recorded in decisions/ with sc_meeting references.

Claim Package

The quarterly submission to PIC reconciling actual spend against Schedule A and providing milestone evidence. In this project: automated by the project-claim-prep skill against .project-state/.

RAG Status (Red / Amber / Green)

A traffic-light indicator of project health on a given dimension. In this project: state.json carries RAG status for schedule, budget, scope, risk, and overall.

Change Order

A formal modification to Schedule A. In this project: classified as material (requires PIC approval and Schedule A amendment) or minor (in-consortium, logged but not requiring PIC sign-off). Routed by project-change-register.

IP Disclosure

A formal notification to PIC of newly-created intellectual property. In this project: required within a defined window of creation. Tracked by project-ip-tracker.

ADR (Architecture Decision Record)

A documented record of a significant decision and its rationale. In this project: the format used in the decisions/ log — context, decision, rationale, consequences.

Milestone

A unit of project work with planned dates, owner, deliverables, and completion criteria. In this project: 13 of them, each in its own YAML file under milestones/. The unit of progress reporting.

Lessons Learned

A captured insight (positive or negative) recorded for future reference. In this project: captured continuously through the project, then summarized at closeout per the PIC PM Guide.

How to add to this glossary

This page is a wiki entry. To add a term:

1. Decide which section it lives in.
2. Add the term as a ### Term heading.
3. Provide a general definition (one sentence ideally).
4. Add an "In this project:" line explaining the project-specific meaning.
5. If math or a diagram clarifies things, add it. Mermaid diagrams render natively.

Or just send a request to the wiki maintainer and they'll add it via the scsiwyg MCP.