Regression: Predicting Temperature#

This tutorial demonstrates how to use log-contrast regression to predict average soil temperature from microbial community data. The regression model identifies which taxa are most predictive of temperature variations.

Step 1: Train the Regression Model#

Use log-contrast regression with stability selection to identify the most stable predictive features:

# Log-contrast regression with stability selection
qiime classo regress \
    --i-features data/regress-xtraining.qza \
    --i-c data/ccovariates.qza \
    --i-weights data/wcovariates.qza \
    --m-y-file data/atacama-selected-covariates-veg.tsv \
    --m-y-column average-soil-temperature \
    --p-concomitant False \
    --p-stabsel \
    --p-cv \
    --p-path \
    --p-lamfixed \
    --p-stabsel-threshold 0.5 \
    --p-cv-seed 1 \
    --p-no-cv-one-se \
    --o-result data/regresstaxa.qza

Key parameters:

  • --p-stabsel: Enable stability selection for robust feature selection

  • --p-stabsel-threshold 0.5: Features selected in >50% of subsamples

  • --p-cv: Use cross-validation for model selection

  • --p-concomitant: Include concomitant variables in the model

Step 2: Make Predictions#

Apply the trained model to test data:

qiime classo predict \
    --i-features data/regress-xtest.qza \
    --i-problem data/regresstaxa.qza \
    --o-predictions data/regress-predictions.qza

Step 3: Visualize Results#

Generate a comprehensive summary of the regression results:

qiime classo summarize \
    --i-problem data/regresstaxa.qza \
    --i-taxa data/classification.qza \
    --i-predictions data/regress-predictions.qza \
    --o-visualization data/regresstaxa_R1.qzv

The visualization will show selected taxa, model performance metrics, and prediction accuracy.

Explanation:

  • Generates an interactive QIIME 2 visualization of the estimated network.

  • The output .qzv file can be viewed using QIIME 2 View.