Advanced: Using the Concomitant Formulation
The concomitant formulation provides a more robust approach to both regression and classification by jointly estimating the model coefficients and the noise level (σ). This is particularly useful when variance is not homogeneous across your samples.
What Is the Concomitant Formulation?
Passing --p-concomitant (instead of --p-no-concomitant) switches from the standard formulation (R1) to the concomitant formulation (R3):
Mathematical Formulation
Standard Formulation (R1):
min over β:  ||y - X·β||² + λ·||β||₁  s.t. C·β = 0
The noise level σ is not estimated; it is treated as fixed and is implicitly absorbed into the choice of λ.
Concomitant Formulation (R3):
min over β and σ > 0:  ||y - X·β||² / σ + e·σ + λ·||β||₁  s.t. C·β = 0
Here σ is a scalar noise level estimated jointly with β, and e is the weight of the penalty on σ; the example output in this tutorial reports e = 20.0.
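For a fixed β, the σ-part of the R3 objective can be minimized in closed form, which shows what e controls. This is a standard calculation on the formula above, not something taken from the q2-classo documentation; write r = ||y - X·β||₂ and assume r > 0:

```latex
\begin{aligned}
f(\sigma) &= \frac{r^2}{\sigma} + e\,\sigma, \qquad \sigma > 0,\\
f'(\sigma) &= -\frac{r^2}{\sigma^2} + e = 0
  \;\Longrightarrow\; \hat\sigma = \frac{r}{\sqrt{e}},
  \qquad f''(\hat\sigma) = \frac{2r^2}{\hat\sigma^3} > 0,\\
f(\hat\sigma) &= r\sqrt{e} + r\sqrt{e} = 2\sqrt{e}\,\lVert y - X\beta \rVert_2,\\
\text{so R3 reduces to}\quad
  &\min_{\beta}\; 2\sqrt{e}\,\lVert y - X\beta \rVert_2 + \lambda \lVert \beta \rVert_1
  \quad \text{s.t. } C\beta = 0.
\end{aligned}
```

A larger e therefore yields a smaller noise estimate σ̂ for the same residuals. Because the profiled loss is the unsquared residual norm, rescaling y by a constant rescales the fitted β and σ̂ by the same constant with λ unchanged, which is not the case for R1.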
Key Differences
| Aspect | Standard (R1) | Concomitant (R3) |
|---|---|---|
| Noise level | Fixed | Jointly estimated |
| Heteroscedasticity | Assumed equal variance | Adaptive to varying noise |
| Uncertainty estimates | May be inaccurate | More reliable |
| Computation | Faster | Slightly slower |
| Output label | “Formulation: R1” | “Formulation: R3 (concomitant with e = 20.0)” |
When to Use the Concomitant Formulation
Use --p-concomitant when:
Your residual plots show increasing variance with predicted values (heteroscedasticity)
You have samples from different environments with potentially different noise levels
You need more reliable confidence intervals for your predictions
You want robust feature selection across diverse data
Computation time is not a critical constraint
Regression with the Concomitant Formulation
Log-Contrast with Concomitant
Replace the training step with:
qiime classo regress \
--i-features data/regress-xtraining_lc.qza \
--i-c data/ccovariates_lc.qza \
--i-weights data/wcovariates_lc.qza \
--m-y-file data/atacama-selected-covariates-veg.tsv \
--m-y-column average-soil-temperature \
--p-concomitant \
--p-stabsel \
--p-cv \
--p-path \
--p-lamfixed \
--p-stabsel-threshold 0.5 \
--p-cv-seed 1 \
--p-no-cv-one-se \
--o-result data/regresstaxa_lc_concomitant.qza
Then proceed with prediction and visualization using regresstaxa_lc_concomitant.qza.
TRAC with Concomitant
Replace the training step with:
qiime classo regress \
--i-features data/regress-xtraining_trac.qza \
--i-c data/ccovariates_trac.qza \
--i-weights data/wcovariates_trac.qza \
--m-y-file data/atacama-selected-covariates-veg.tsv \
--m-y-column average-soil-temperature \
--p-concomitant \
--p-stabsel \
--p-cv \
--p-path \
--p-lamfixed \
--p-stabsel-threshold 0.5 \
--p-cv-seed 1 \
--p-no-cv-one-se \
--o-result data/regresstaxa_trac_concomitant.qza
Then proceed with prediction and visualization using regresstaxa_trac_concomitant.qza.
Classification with the Concomitant Formulation
Log-Contrast with Concomitant
Replace the training step with:
qiime classo classify \
--i-features data/classify-xtraining_lc.qza \
--i-c data/ccovariates_lc.qza \
--i-weights data/wcovariates_lc.qza \
--m-y-file data/atacama-selected-covariates-veg.tsv \
--m-y-column vegetation \
--p-no-huber \
--p-concomitant \
--p-stabsel \
--p-cv \
--p-path \
--p-lamfixed \
--p-stabsel-threshold 0.5 \
--p-cv-seed 42 \
--p-no-cv-one-se \
--o-result data/classifytaxa_lc_concomitant.qza
Then proceed with prediction and visualization using classifytaxa_lc_concomitant.qza.
TRAC with Concomitant
Replace the training step with:
qiime classo classify \
--i-features data/classify-xtraining_trac.qza \
--i-c data/ccovariates_trac.qza \
--i-weights data/wcovariates_trac.qza \
--m-y-file data/atacama-selected-covariates-veg.tsv \
--m-y-column vegetation \
--p-no-huber \
--p-concomitant \
--p-stabsel \
--p-cv \
--p-path \
--p-lamfixed \
--p-stabsel-threshold 0.5 \
--p-cv-seed 42 \
--p-no-cv-one-se \
--o-result data/classifytaxa_trac_concomitant.qza
Then proceed with prediction and visualization using classifytaxa_trac_concomitant.qza.
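The four training commands above differ only in the action (regress or classify), the suffix of the input artifacts (lc or trac), the response column, the CV seed, and the Huber setting. A minimal POSIX shell sketch that builds each command from those differences follows. It assumes the data/*.qza inputs from the earlier tutorial steps exist and that boolean options take QIIME 2's flag form (--p-concomitant, --p-no-huber), like --p-stabsel and --p-no-cv-one-se in the commands above. The helper names run and train_concomitant and the DRY_RUN switch are hypothetical conveniences, not part of q2-classo.

```shell
#!/bin/sh
# Sketch: build the concomitant training commands of this section from the
# few settings that differ between them. Assumes q2-classo is installed and
# the data/*.qza inputs from the earlier tutorial steps exist.
# run, train_concomitant and DRY_RUN are hypothetical helper names.

# Run a command, or only print it when DRY_RUN=1 (useful for checking the
# exact flags before a long stability-selection run).
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# train_concomitant ACTION KIND
#   ACTION: regress or classify
#   KIND:   lc (log-contrast) or trac
train_concomitant() {
  action=$1
  kind=$2
  case $action in
    regress)
      column=average-soil-temperature; seed=1; result=regresstaxa; extra="" ;;
    classify)
      column=vegetation; seed=42; result=classifytaxa; extra="--p-no-huber" ;;
    *)
      echo "unknown action: $action" >&2; return 1 ;;
  esac
  # $extra is left unquoted on purpose so that an empty value adds no argument.
  run qiime classo "$action" \
    --i-features "data/${action}-xtraining_${kind}.qza" \
    --i-c "data/ccovariates_${kind}.qza" \
    --i-weights "data/wcovariates_${kind}.qza" \
    --m-y-file data/atacama-selected-covariates-veg.tsv \
    --m-y-column "$column" \
    $extra \
    --p-concomitant \
    --p-stabsel \
    --p-cv \
    --p-path \
    --p-lamfixed \
    --p-stabsel-threshold 0.5 \
    --p-cv-seed "$seed" \
    --p-no-cv-one-se \
    --o-result "data/${result}_${kind}_concomitant.qza"
}
```

For example, `DRY_RUN=1 train_concomitant classify trac` prints the TRAC classification command without running it, and `for kind in lc trac; do train_concomitant regress "$kind"; train_concomitant classify "$kind"; done` trains all four models with the same output names as the commands above.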
Interpreting Results with the Concomitant Formulation
When you run your pipeline with --p-concomitant, the output displays:
Formulation: R3 (concomitant with e = 20.0)
This indicates that:
The model is using the concomitant formulation (R3)
The penalty weight on the noise level σ is e = 20.0
Coefficients and predictions are more robust to heteroscedasticity
Tips for Optimal Results
Compare Both Formulations: Run your analysis with both --p-no-concomitant and --p-concomitant to see which produces better results for your data
Examine Residuals: Plot residuals vs. fitted values before deciding:
If variance is constant → Standard formulation is sufficient
If variance increases/decreases with fitted values → Use concomitant
Cross-Validation Performance: Compare CV scores between formulations; concomitant often performs better on heteroscedastic data
Computational Budget: Concomitant requires ~1.5-2x more computation time, so consider this for large datasets
Parameter Tuning: The default value of e (reported as 20.0 in this example) works well for most cases, but can be adjusted if needed
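The first tip can be scripted. A minimal sketch for the regression case, assuming the regression inputs from the earlier steps and QIIME 2's flag form for booleans (--p-no-concomitant for R1, --p-concomitant for R3); the helper names run and compare_formulations, the DRY_RUN switch, and the `_standard` output name are hypothetical:

```shell
#!/bin/sh
# Sketch: fit the same log-contrast or TRAC regression once per formulation
# so the two results can be compared. Assumes the data/*.qza inputs from the
# earlier steps exist. The "_standard" output name is hypothetical.

# Run a command, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# compare_formulations KIND   (KIND: lc or trac)
compare_formulations() {
  kind=$1
  for flag in --p-no-concomitant --p-concomitant; do
    # Label each result with the formulation it was fitted with (R1 or R3).
    case $flag in
      --p-no-concomitant) label=standard ;;
      *) label=concomitant ;;
    esac
    run qiime classo regress \
      --i-features "data/regress-xtraining_${kind}.qza" \
      --i-c "data/ccovariates_${kind}.qza" \
      --i-weights "data/wcovariates_${kind}.qza" \
      --m-y-file data/atacama-selected-covariates-veg.tsv \
      --m-y-column average-soil-temperature \
      "$flag" \
      --p-stabsel \
      --p-cv \
      --p-path \
      --p-lamfixed \
      --p-stabsel-threshold 0.5 \
      --p-cv-seed 1 \
      --p-no-cv-one-se \
      --o-result "data/regresstaxa_${kind}_${label}.qza" || return 1
  done
}
```

Running `compare_formulations lc` (or `trac`) writes data/regresstaxa_lc_standard.qza and data/regresstaxa_lc_concomitant.qza. The second name matches the concomitant result from the regression section, so both artifacts can go through the same prediction and visualization steps before you compare their CV scores.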