Why Nimbus SDK? Beyond scikit-learn and pyRiemann for BCI

Nimbus SDK is not trying to be another wrapper around LinearDiscriminantAnalysis. It starts from a different assumption: BCI models are not just classifiers — they are decision systems that must work under uncertainty, adapt online, and explain their confidence in real time.
That makes Nimbus SDK most useful when a project moves beyond offline accuracy tables and into calibration sessions, streaming EEG, rejection thresholds, ITR tradeoffs, and uncertainty-aware user feedback.
Python library: PyPI — nimbus-bci · Documentation — docs.nimbusbci.com
The short version
If you are training a quick baseline on a static feature matrix, scikit-learn may be enough — but once you need calibrated confidence outputs for thresholding and abstention, you will usually want a Bayesian classifier head.
If your core representation is trial covariance geometry, pyRiemann is the right tool for that part of the pipeline.
If you need Bayesian classifier heads, online updates, BCI-oriented diagnostics, and a consistent batch or streaming report format, Nimbus SDK is designed for that layer.
🧠 Use Nimbus SDK when the question is not only “What class did the model predict?” but also “How certain is the model, should this trial be accepted, and how does that affect the BCI loop?”
When to use what (checklist)
- Offline baseline only? If you need a quick frequentist model on a static matrix and you will not close a BCI loop on confidence, entropy, or session adaptation, scikit-learn (optionally with calibration wrappers) is often enough.
- Covariance / manifold geometry? If trial covariances, tangent space, or MDM-style classifiers are the core of your method, use pyRiemann (or similar) for that stage.
- Euclidean features + a BCI decision layer? If your inputs are already a feature matrix (CSP, band power, DWT, embeddings, or vectors from an upstream Riemannian step) and you need posteriors, entropy, batch diagnostics, online `partial_fit`, or a `StreamingSession` loop, use Nimbus SDK for the head and reports.
- Composing both? Typical pattern: geometry or CSP for features, then `NimbusLDA`/`NimbusQDA`/`NimbusSoftmax` (or `NimbusSTS` when you need state) for inference, as in the repo's `examples/e2e_mne_csp_nimbus.py`.
- Cued calibration with a label budget? Use Nimbus's active-learning module: it ranks unlabeled trials by BALD (mutual information against the posterior) and stops the calibration block without needing labels via a posterior-stability rule. sklearn LDA/QDA (no `partial_fit`, no posterior to BALD against) and pyRiemann (out of scope) both leave this to bespoke code.
Minimal SDK example (synthetic data)
This matches the fit → clf.model_ → BCIData → predict_batch path used in our EEG → CSP → NimbusLDA end-to-end example; here features are random so you can run it without MNE. BCIData expects shape (n_features, n_samples, n_trials); we duplicate one time slice so each trial has at least two time samples (same validation pattern as in e2e_mne_csp_nimbus.py).
```python
import numpy as np
from nimbus_bci import NimbusLDA, predict_batch
from nimbus_bci.data import BCIData, BCIMetadata

rng = np.random.default_rng(0)
X_train = rng.standard_normal((80, 8))
y_train = rng.integers(0, 2, size=80)
X_test = rng.standard_normal((20, 8))

clf = NimbusLDA()
clf.fit(X_train, y_train)

n_feat, n_trials = X_test.shape[1], X_test.shape[0]
# BCIData expects (n_features, n_samples, n_trials) with at least 2 time samples per trial
X3 = np.zeros((n_feat, 2, n_trials), dtype=np.float64)
X3[:, 0, :] = X_test.T
X3[:, 1, :] = X_test.T

meta = BCIMetadata(
    sampling_rate=250.0,
    paradigm="motor_imagery",
    feature_type="csp",
    n_features=n_feat,
    n_classes=2,
    temporal_aggregation="mean",
)

bci = BCIData(X3, meta, labels=None)  # pass labels for ECE/MCE on held-out data
res = predict_batch(clf.model_, bci, rng_seed=0)
print("mean entropy (bits):", res.mean_entropy)
print("confidence (first trial):", float(res.confidences[0]))
```
Different tools, different assumptions
A useful comparison starts with what each stack assumes about the problem.
| Topic | scikit-learn | pyRiemann | ✨ Nimbus SDK |
|---|---|---|---|
| Active learning (cued-calibration time) | No native ranking; reach for external libraries like modAL or scikit-activeml and roll your own posterior sampling — and sklearn's LDA/QDA cannot support BALD honestly because they only expose a point estimate, not a posterior | Out of scope | Native ranking, streaming, and stopping helpers aimed squarely at shrinking cued-calibration time — cheap strategies (entropy, margin, least-confidence), honest BALD on LDA, QDA, and Softmax thanks to the underlying conjugate and variational posteriors, and a label-free posterior-stability stopping rule that also works for the latent-state head |
| Primary focus | General-purpose ML estimators | Riemannian geometry on covariance features | Uncertainty-aware BCI inference, streaming updates, diagnostics |
| Typical input | Euclidean feature matrix | Covariance matrices / tangent vectors | Euclidean features from CSP, band power, embeddings, or tangent vectors |
| LDA / QDA `predict_proba` | Plug-in Gaussian from MLE estimators | Usually replaced by MDM or tangent classifiers | Multivariate Student's t posterior predictive: heavier tails, parameter uncertainty folded into the probability |
| Online updates | partial_fit on SGD and a few others; none for LDA/QDA | Mostly offline covariance fitting | partial_fit on every Nimbus head (LDA/QDA via NIW conjugate update) |
| Latent state / session drift | No first-class classifier with persistent state | Out of scope | NimbusSTS — EKF latent state that persists across trials and sessions |
| BCI diagnostics & rejection | Roll your own; calibration via wrappers | Geometry distances + downstream model | predict_batch returns a BatchResult with entropy, calibration, and per-trial diagnostics in one call; evaluate_rejection_policy turns confidence into accept rate and ITR |
| Real-time chunk loop | Pipeline is offline; no chunk contract | Not aimed at chunk loop | StreamingSession / StreamingSessionSTS validated against BCIMetadata |
Posterior predictive vs plug-in LDA/QDA — what actually changes?
NimbusLDA and NimbusQDA are the same model family as scikit-learn’s LinearDiscriminantAnalysis and QuadraticDiscriminantAnalysis — Gaussian class-conditionals, shared vs per-class covariance. The differences are on the inference side, and they are concrete in the API:
- `predict_proba` is a multivariate Student's t posterior predictive, not a plug-in Gaussian: heavier tails, with parameter uncertainty folded into the probability. Sklearn LDA/QDA do not give you this without a custom implementation.
- Closed-form `partial_fit` for LDA/QDA via Normal-Inverse-Wishart conjugate updates. Sklearn LDA/QDA have no `partial_fit` at all.
- `NimbusSoftmax.predict_samples` draws labels from the posterior over weights (Polya-Gamma VB). `LogisticRegression.predict_proba` returns a single point.
- `NimbusSTS` carries a persistent latent state across trials and sessions (EKF, `propagate_state`, `get_latent_state`/`set_latent_state`, `reset_state`). There is no scikit-learn class with the same shape.
- BALD-driven active learning: because Nimbus heads expose a posterior over parameters, query strategies like BALD (mutual information under the parameter posterior) are well-defined. Sklearn's LDA/QDA expose only point estimates, so any "BALD-like" score requires approximations that no longer match BALD's definition.
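To make the inference-side contrast concrete, here is a from-scratch numpy sketch of the Normal-Inverse-Wishart conjugate update and its Student's t posterior predictive, the textbook machinery behind a QDA-style Bayesian head. The function names and prior values are illustrative, not the SDK's API.

```python
import math
import numpy as np

def niw_update(X, mu0, kappa0, nu0, psi0):
    """Closed-form Normal-Inverse-Wishart posterior from trials X of shape (n, d)."""
    n, d = X.shape
    xbar = X.mean(axis=0)
    S = (X - xbar).T @ (X - xbar)
    kappa_n, nu_n = kappa0 + n, nu0 + n
    mu_n = (kappa0 * mu0 + n * xbar) / kappa_n
    diff = (xbar - mu0)[:, None]
    psi_n = psi0 + S + (kappa0 * n / kappa_n) * (diff @ diff.T)
    return mu_n, kappa_n, nu_n, psi_n

def predictive_logpdf(x, mu_n, kappa_n, nu_n, psi_n):
    """NIW posterior predictive = multivariate Student's t log density at x."""
    d = x.shape[0]
    df = nu_n - d + 1
    scale = psi_n * (kappa_n + 1) / (kappa_n * df)
    maha = (x - mu_n) @ np.linalg.solve(scale, x - mu_n)
    _, logdet = np.linalg.slogdet(scale)
    return (math.lgamma(0.5 * (df + d)) - math.lgamma(0.5 * df)
            - 0.5 * (d * math.log(df * math.pi) + logdet)
            - 0.5 * (df + d) * math.log1p(maha / df))

# Two synthetic classes; classify one point by comparing predictive densities.
rng = np.random.default_rng(0)
d = 2
prior = (np.zeros(d), 1.0, d + 2.0, np.eye(d))  # mu0, kappa0, nu0, psi0
post0 = niw_update(rng.normal(0.0, 1.0, (40, d)), *prior)
post1 = niw_update(rng.normal(3.0, 1.0, (40, d)), *prior)
x = np.zeros(d)
log_p = np.array([predictive_logpdf(x, *post0), predictive_logpdf(x, *post1)])
probs = np.exp(log_p - log_p.max())
probs /= probs.sum()
print(probs)
```

The same update, run one trial at a time, is what makes a closed-form `partial_fit` possible: the posterior after n trials plus one more trial equals the posterior after n + 1 trials.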
When the gap is largest
Few trials per class, noisy covariances, session-to-session drift, online or streaming calibration, and any flow where you act on a confidence threshold or a rejection policy.
When the gap is smallest
Plenty of clean i.i.d. offline data — boundaries often agree closely with scikit-learn. Even there, the tails of predict_proba differ (Student’s t vs Gaussian), which matters if a rejection policy reads those probabilities.
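A two-line scipy check shows how large the tail gap can be. The df=5 here is an arbitrary illustrative value; the real posterior-predictive degrees of freedom depend on the trial count and prior.

```python
from scipy.stats import norm, t

# Probability mass beyond 4 standard units under each tail model
p_gauss = norm.sf(4.0)          # plug-in Gaussian
p_student = t.sf(4.0, df=5)     # Student's t with illustrative df
print(f"Gaussian: {p_gauss:.1e}, Student-t: {p_student:.1e}")
```

At df=5 the Student's t puts roughly two orders of magnitude more mass beyond 4 standard units than the Gaussian, which is precisely the regime a rejection threshold probes.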
Why Nimbus is not just “LDA in NumPy”
A standard LDA baseline answers a narrow question: given a fitted model and a feature vector, which class is most likely?
That is useful, but BCI practice usually needs more:
- Uncertainty: a model should expose when it is unsure, not only which class has the highest score.
- Calibration: a BCI system often needs threshold sweeps, rejection logic, and confidence checks before closing the loop.
- Online updates: new trials may arrive during a session, and the model should support calibration-style adaptation.
- Streaming semantics: batch prediction is not enough when the deployment target is a chunked EEG loop.
- BCI reports: metrics such as entropy and ITR are part of the product workflow, not afterthoughts.
- Label-efficient calibration: picking which trial to label next and when to stop the calibration block matters as much as the classifier itself; cued-calibration time is a major user-facing pain point in practical BCI.
Nimbus SDK is built around those needs. NimbusLDA, NimbusQDA, NimbusSoftmax, and NimbusSTS expose model state and prediction outputs that map directly to BCI workflows rather than generic tabular classification.
That positioning is slightly different from our classifier-specific guides. Bayesian CSP and Motor Imagery focuses on the motor imagery and CSP path. Decoding P300 ERPs with Bayesian QDA goes deeper on ERP detection. Beyond Binary: Multi-Class BCI Decoding with Bayesian Softmax and NimbusSoftmax explains multi-class decoding. This post explains why the SDK layer exists at all and how those model choices fit into a coherent BCI workflow.
Where scikit-learn still fits
scikit-learn remains a strong choice when the goal is a familiar baseline or a quick frequentist comparison.
Use it when:
- You need a simple offline reference model.
- You do not need uncertainty diagnostics beyond basic probability estimates.
- You are comparing conventional classifiers across a fixed dataset.
- You are using calibration wrappers such as Platt scaling or isotonic regression around existing estimators.
The important point is not that scikit-learn is weak. The point is that it is general-purpose. It does not try to define a BCI session model, a streaming prediction report, or a diagnostic layer that connects entropy, rejection, and ITR.
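For that baseline role, a calibrated sklearn LDA takes only a few lines. Note what you get: recalibrated point-estimate probabilities, not a posterior over parameters. The synthetic data below is just to make the snippet self-contained.

```python
import numpy as np
from sklearn.calibration import CalibratedClassifierCV
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)
# Two Gaussian blobs standing in for a feature matrix
X = np.vstack([rng.normal(0.0, 1.0, (100, 4)), rng.normal(1.5, 1.0, (100, 4))])
y = np.repeat([0, 1], 100)

# Platt-style sigmoid calibration wrapped around a plain LDA
clf = CalibratedClassifierCV(LinearDiscriminantAnalysis(), method="sigmoid", cv=3)
clf.fit(X, y)
proba = clf.predict_proba(X[:5])
print(proba)
```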
Where pyRiemann fits
pyRiemann addresses a different part of the EEG stack. It is valuable when the representation itself is covariance-based and the geometry matters.
Use pyRiemann when:
- Your pipeline is centered on trial covariance matrices.
- You want Riemannian distances, tangent-space projections, or MDM-style classifiers.
- The main modeling question is about geometry on the SPD manifold.
That does not make pyRiemann and Nimbus competitors in every pipeline. In many workflows, they compose naturally: pyRiemann can produce geometry-aware feature vectors, and Nimbus can serve as the uncertainty-aware decision layer on top — particularly when you need the system to reason about confidence under real-world variability. For the broader probabilistic framing, see What Is Active Inference? A Practical Primer for BCI Engineers.
For the preprocessing side of that story, see EEG Foundation Models in Practice: What REVE Brings to BCI Preprocessing, which covers learned EEG representations before the classifier head.
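To show the composition concretely without depending on pyRiemann, here is a simplified numpy sketch of the tangent-space step that produces Euclidean feature vectors from trial covariances. pyRiemann's `TangentSpace` additionally uses the Fréchet (geometric) mean as reference and sqrt(2) weighting on off-diagonal terms; the Euclidean mean below is a stand-in for brevity.

```python
import numpy as np

def spd_logm(M):
    """Matrix logarithm of a symmetric positive-definite matrix via eigendecomposition."""
    w, V = np.linalg.eigh(M)
    return (V * np.log(w)) @ V.T

def spd_invsqrt(M):
    """Inverse matrix square root of a symmetric positive-definite matrix."""
    w, V = np.linalg.eigh(M)
    return (V * (w ** -0.5)) @ V.T

def tangent_vectors(covs, C_ref):
    """Project SPD matrices to the tangent space at C_ref (upper triangle vectorized)."""
    P = spd_invsqrt(C_ref)
    iu = np.triu_indices(C_ref.shape[0])
    return np.array([spd_logm(P @ C @ P)[iu] for C in covs])

# Toy EEG: 30 trials, 4 channels, 128 time samples
rng = np.random.default_rng(0)
trials = rng.standard_normal((30, 4, 128))
covs = np.array([x @ x.T / x.shape[1] for x in trials])
C_ref = covs.mean(axis=0)  # Euclidean mean as a stand-in for the Fréchet mean
feats = tangent_vectors(covs, C_ref)
print(feats.shape)  # 4 channels -> 10 upper-triangle entries per trial
```

The resulting `feats` matrix is exactly the kind of Euclidean input the classifier head expects, whichever library produced it.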
A practical composition pattern
For many BCI teams, the right architecture is not “choose one library forever.” It is:
- Use signal processing or geometry tools to build a feature representation.
- Feed the resulting Euclidean feature matrix into a Nimbus classifier head.
- Use `predict_batch` or streaming APIs to collect posterior probabilities, entropy, calibration diagnostics, and ITR-oriented reports.
This becomes especially important once you account for neural drift: the same decoder that looks good offline may degrade silently during long sessions unless the model can adapt and expose uncertainty.
That gives the pipeline a clean separation of responsibilities:
- Feature extraction can come from CSP, band power, DWT, embeddings, or Riemannian geometry.
- Decision inference can be handled by `NimbusLDA`, `NimbusQDA`, `NimbusSoftmax`, or `NimbusSTS`.
- BCI diagnostics can be handled consistently through Nimbus report outputs.
What Nimbus adds after fit
In a generic classifier workflow, fit usually produces a point-estimate model. Prediction returns a class label and perhaps a probability.
Nimbus makes the fitted state explicit and turns it into a reusable BCI report. A single predict_batch call on a fitted model and a BCIData object returns a BatchResult with:
- Posteriors and per-trial confidences (max of posterior).
- Entropy per trial and a mean entropy summary.
- For LDA / QDA heads, Mahalanobis distances and outlier scores (zeros for softmax / STS — use entropy and confidence there).
- ECE / MCE when `BCIData` carries labels.
- Latency per trial and class balance across trials.
For confidence-gated decisions, evaluate_rejection_policy sweeps thresholds and returns accept rate, accuracy on accepted, Wolpaw ITR on the accepted subset, and an effective ITR heuristic. assess_trial_quality adds a per-trial gate (low confidence / high uncertainty / outlier / NaN). Persistence is parameter-only via nimbus_save / nimbus_load (.npz snapshot of the NimbusModel).
The distinction matters: a BCI loop needs to know when a prediction is usable, not only what the top class is. For the broader argument behind this design choice, read Active Inference vs. Deep Learning for BCI: Why Uncertainty Quantification Changes Everything.
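The bookkeeping behind such a confidence-gated sweep is easy to sketch. This is not the SDK's `evaluate_rejection_policy`, just the standard accept-rate / accuracy-on-accepted / Wolpaw-ITR computation on synthetic confidences, reported in bits per selection rather than per minute.

```python
import numpy as np

def wolpaw_bits(p, n_classes):
    """Wolpaw ITR in bits per selection for accuracy p over n_classes."""
    if p >= 1.0:
        return float(np.log2(n_classes))
    if p <= 1.0 / n_classes:
        return 0.0
    return float(np.log2(n_classes) + p * np.log2(p)
                 + (1 - p) * np.log2((1 - p) / (n_classes - 1)))

def sweep_rejection(conf, correct, thresholds, n_classes=2):
    """For each confidence threshold: (threshold, accept rate, accuracy on accepted, ITR)."""
    rows = []
    for thr in thresholds:
        keep = conf >= thr
        if keep.sum() == 0:
            rows.append((thr, 0.0, float("nan"), 0.0))
            continue
        acc = float(correct[keep].mean())
        rows.append((thr, float(keep.mean()), acc, wolpaw_bits(acc, n_classes)))
    return rows

rng = np.random.default_rng(0)
conf = rng.uniform(0.5, 1.0, 200)
correct = rng.uniform(size=200) < conf   # confident trials are right more often
for thr, rate, acc, bits in sweep_rejection(conf, correct, [0.5, 0.7, 0.9]):
    print(f"thr={thr:.1f} accept={rate:.2f} acc={acc:.2f} ITR={bits:.2f} bits/selection")
```

Raising the threshold trades accept rate for accuracy on the accepted subset; the ITR column is what tells you whether the trade was worth it.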


Top charts: BNCI 2014-004. Bottom charts: Zhou 2016.
Supporting: Zhou 2016 (n=4 MOABB subjects)
We also plotted the same S3 scenario from the checked-in nimbusbench/results/zhou2016/scenario3_online_update.csv (small n, so treat this as a qualitative replication). The trend matches the main dataset: nimbus_partial_fit remains substantially faster than sklearn_refit on the final eval-round snapshot per subject.
These latency numbers are from our checked-in benchmark CSV (reproducible via `notebooks/s3_update_latency_head_vs_sklearn.ipynb`):

- `nimbus_partial_fit`: mean `mean_update_sec` across subjects ≈ 87 µs (median ≈ 83 µs)
- `nimbus_refit`: ≈ 145 µs (median ≈ 143 µs)
- `sklearn_refit`: ≈ 835 µs (median ≈ 827 µs)
- Ratio `sklearn_refit`/`nimbus_partial_fit`: about 9.7× on average across subjects (7.5×–10.2× per subject on this run)
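The asymmetry is structural: a conjugate update touches only per-class sufficient statistics, so its cost per trial is independent of how many trials came before, while a refit rescans everything. A minimal Welford-style sketch (not the SDK's internals) shows the incremental statistics matching a full recompute exactly:

```python
import numpy as np

class RunningGaussian:
    """Per-class sufficient statistics with O(d^2) cost per incoming trial."""
    def __init__(self, d):
        self.n, self.mean, self.scatter = 0, np.zeros(d), np.zeros((d, d))

    def update(self, x):
        self.n += 1
        delta = x - self.mean                        # deviation from old mean
        self.mean += delta / self.n
        self.scatter += np.outer(delta, x - self.mean)  # Welford-style scatter update

    @property
    def cov(self):
        return self.scatter / max(self.n - 1, 1)

rng = np.random.default_rng(0)
X = rng.standard_normal((500, 8))
rg = RunningGaussian(8)
for x in X:
    rg.update(x)

# Incremental result matches a full refit over all 500 trials
assert np.allclose(rg.mean, X.mean(axis=0))
assert np.allclose(rg.cov, np.cov(X.T))
```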


Active learning during calibration
Cued calibration time is one of the biggest pain points in practical BCI. After fit, Nimbus’s active-learning module turns any fitted Bayesian head into a calibration loop, not just a classifier:
- A pool ranker picks the next unlabeled trial to label, using BALD (mutual information against the posterior) or cheaper signals like entropy, margin, or least-confidence as fallbacks.
- A streaming variant decides whether to ask the user to label the current trial under a label budget.
- A label-free stopping rule ends the calibration block once successive posterior predictives stop drifting — and it works for the latent-state head too.
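One way to realize such a label-free stop (a hedged sketch, not the SDK's rule): freeze a probe set of unlabeled trials and stop once successive posterior predictives on that probe set stop moving.

```python
import numpy as np

def posterior_stability_stop(prob_history, window=3, eps=0.01):
    """prob_history: list of (n_probe, n_classes) predictive arrays, one per
    calibration step. Stop when each of the last `window` steps moved the probe
    predictions by less than eps in mean total-variation distance."""
    if len(prob_history) < window + 1:
        return False
    recent = prob_history[-(window + 1):]
    moves = [0.5 * np.abs(a - b).sum(axis=1).mean()
             for a, b in zip(recent[:-1], recent[1:])]
    return max(moves) < eps

# Simulated calibration: predictions converge geometrically toward a target.
target = np.array([[0.9, 0.1], [0.2, 0.8]])
history = [target + 0.5 ** t * np.array([[-0.4, 0.4], [0.3, -0.3]])
           for t in range(12)]
stopped_at = next(t for t in range(1, 13) if posterior_stability_stop(history[:t]))
print("stop after step", stopped_at)
```

Because the rule only watches the predictive distribution, it applies unchanged to the latent-state head.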
💡 Honest BALD requires a real posterior over parameters. A `predict_proba`-only wrapper around an MLE classifier cannot produce a BALD score that means what BALD is supposed to mean, which is the structural reason teams pick Nimbus when calibration time is the binding constraint.
The same posteriors that drive predict_proba also drive the ranker and the stopping rule, so there is no second model to maintain. See the SDK docs for the full API and a worked end-to-end recipe.
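The BALD score itself is small enough to sketch: given class-probability draws from the parameter posterior, it is the gap between the entropy of the averaged predictive and the average per-draw entropy. This is a generic numpy sketch, not the SDK's ranker.

```python
import numpy as np

def entropy(p, axis=-1):
    """Shannon entropy in bits along the last axis."""
    p = np.clip(p, 1e-12, 1.0)
    return -(p * np.log2(p)).sum(axis=axis)

def bald_scores(prob_samples):
    """prob_samples: (n_posterior_draws, n_trials, n_classes) class probabilities,
    one set per parameter draw. Returns mutual information per trial."""
    mean_p = prob_samples.mean(axis=0)
    return entropy(mean_p) - entropy(prob_samples).mean(axis=0)

rng = np.random.default_rng(0)
# Trial A: every posterior draw agrees on 50/50 -> BALD ~ 0 (aleatoric only)
agree = np.tile([0.5, 0.5], (64, 1, 1))
# Trial B: draws disagree confidently -> high BALD (epistemic uncertainty)
disagree = np.where(rng.uniform(size=(64, 1, 1)) < 0.5, [0.95, 0.05], [0.05, 0.95])
scores = bald_scores(np.concatenate([agree, disagree], axis=1))
print(scores)  # trial B scores well above trial A
```

Both trials have near-maximal predictive entropy, but only trial B has high BALD; that distinction is what makes BALD pick trials whose labels would actually move the posterior.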
When Nimbus SDK is the wrong tool
Nimbus SDK is not meant to replace every machine learning library.
It is probably the wrong tool if:
- You only need a quick frequentist baseline.
- You do not need uncertainty, calibration, ITR, or streaming-oriented reports.
- Your whole method is Riemannian geometry and you only need a distance classifier.
- You are not working with BCI-style decision loops or EEG-derived feature streams.
In those cases, scikit-learn or pyRiemann may be simpler and more direct.
Further reading
If you want to go deeper after this comparison, start with Uncertainty Quantification in BCI: Why Confidence Scores Matter as Much as Accuracy, then move to Choosing the Right Bayesian Classifier for Your BCI Pipeline and From EEG to Action: Building Your First Real-Time BCI Pipeline with Nimbus Studio.
The takeaway
Nimbus SDK is best understood as a BCI decision layer.
It sits after preprocessing and feature extraction, where a system must turn uncertain neural evidence into calibrated decisions. That is why the SDK emphasizes Bayesian heads, online updates, streaming sessions, entropy, ITR, and diagnostic reports.
For offline baselines, use scikit-learn. For covariance geometry, use pyRiemann. For uncertainty-aware BCI inference, label-efficient calibration loops, and real-time chunk APIs that have to survive long sessions, use ✨Nimbus SDK.