Classification: Predicting Vegetation

This tutorial demonstrates how to predict vegetation presence using log-contrast classification.

Overview

Log-contrast classification predicts categorical outcomes (such as vegetation presence/absence) from compositional data. It fits a regularized classifier under log-contrast constraints on the coefficients, which accounts for the compositional nature of microbiome data.
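As a rough sketch of the idea, the fitted problem can be written as a constrained, L1-regularized classification problem. The notation here is an assumption for illustration and is not taken from this tutorial: x_i is the vector of log-transformed abundances for sample i, y_i ∈ {−1, 1} is the class label, ℓ is a generic convex classification loss, λ is the regularization strength, and C is the constraint matrix. The exact loss and any feature weighting are defined by the underlying solver.

```latex
\hat{\beta} \;=\; \arg\min_{\beta \in \mathbb{R}^{p}}
  \sum_{i=1}^{n} \ell\!\left( y_i \, x_i^{\top} \beta \right)
  \;+\; \lambda \, \lVert \beta \rVert_{1}
\qquad \text{subject to} \qquad C\beta = 0
```

With C equal to a single row of ones, the constraint reduces to the coefficients summing to zero, which is the classic log-contrast condition for compositional data.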

Step 1: Train the Classification Model

First, we train a log-contrast classifier using cross-validation to predict vegetation presence:

# Log-contrast classification with cross-validation
qiime classo classify \
    --i-features data/classify-xtraining.qza \
    --i-c data/ccovariates.qza \
    --i-weights data/wcovariates.qza \
    --m-y-file data/atacama-selected-covariates-veg.tsv \
    --m-y-column vegetation \
    --p-no-huber \
    --p-stabsel \
    --p-cv \
    --p-path \
    --p-lamfixed \
    --p-stabsel-threshold 0.5 \
    --p-cv-seed 42 \
    --p-no-cv-one-se \
    --o-result data/classifytaxa.qza
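
Before running the command, it can help to confirm that every input file is in place. The sketch below uses a hypothetical helper, check_inputs, which is not part of QIIME 2 or this plugin; the paths are the ones used in the command above.

```shell
# Hypothetical helper (not part of QIIME 2): succeed only if every given file exists.
check_inputs() {
    status=0
    for f in "$@"; do
        if [ ! -f "$f" ]; then
            echo "missing input: $f" >&2
            status=1
        fi
    done
    return "$status"
}

# Paths taken from the classify command above.
check_inputs \
    data/classify-xtraining.qza \
    data/ccovariates.qza \
    data/wcovariates.qza \
    data/atacama-selected-covariates-veg.tsv \
    || echo "Resolve the missing inputs before running qiime classo classify."
```

The helper reports each missing path on standard error, so one run lists all problems at once.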

Parameters explained:

  • --i-features: Training feature table

  • --i-c: C matrix for log-contrast constraints

  • --i-weights: Feature weights

  • --m-y-column vegetation: Target variable (vegetation presence/absence)

  • --p-huber: Whether to use Huber loss for robustness (disabled in this example)

  • --p-stabsel: Enable stability selection for feature selection

  • --p-cv: Perform cross-validation

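  • --p-stabsel-threshold 0.5: Stability selection threshold

  • --p-path: Compute the solution path over the regularization parameter

  • --p-lamfixed: Also fit the model at a fixed regularization value

  • --p-cv-seed 42: Random seed for cross-validation, for reproducibility

  • --p-no-cv-one-se: Disable the one-standard-error rule in cross-validation

  • --m-y-file: Metadata file that contains the target column

  • --o-result: Output artifact holding the trained model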
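
Since --m-y-column must name a column that exists in the metadata file, a quick look at that column can catch typos early. This is a minimal sketch with a hypothetical helper, column_values, which is not part of QIIME 2. It assumes a tab-separated file whose first line is the header, and it skips lines starting with # (such as a #q2:types directive).

```shell
# Hypothetical helper (not part of QIIME 2): print the distinct values of a
# named column in a tab-separated file, or fail if the column is absent.
column_values() {
    values=$(awk -F '\t' -v col="$2" '
        NR == 1 {
            for (i = 1; i <= NF; i++) if ($i == col) idx = i
            if (!idx) { print "column not found: " col > "/dev/stderr"; exit 1 }
            next
        }
        /^#/ { next }   # skip comment or directive lines such as #q2:types
        { print $idx }
    ' "$1") || return 1
    printf '%s\n' "$values" | sort -u
}

# File and column names taken from the classify command above.
column_values data/atacama-selected-covariates-veg.tsv vegetation \
    || echo "Could not read the vegetation column; check the metadata file."
```

For a two-class problem, the output should list exactly two distinct values.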

Step 2: Make Predictions

Apply the trained model to test data:

qiime classo predict \
    --i-features data/classify-xtest.qza \
    --i-problem data/classifytaxa.qza \
    --o-predictions data/classify-predictions.qza
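
To inspect the predictions outside QIIME 2, the artifact can be unpacked with the standard qiime tools export command. The wrapper function export_artifact and the output directory name below are hypothetical, and the layout of the exported files depends on the plugin's format, so treat this as a sketch.

```shell
# Hypothetical wrapper: export a QIIME 2 artifact to a directory,
# with a readable message when the QIIME 2 environment is not active.
export_artifact() {
    if ! command -v qiime >/dev/null 2>&1; then
        echo "qiime not found on PATH; activate your QIIME 2 environment first" >&2
        return 127
    fi
    qiime tools export --input-path "$1" --output-path "$2"
}

# Output directory name is an assumption chosen for this example.
export_artifact data/classify-predictions.qza data/classify-predictions-export \
    || echo "Export skipped."
```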

Step 3: Generate Summary Visualization

Create a comprehensive summary of the classification results:

qiime classo summarize \
    --i-problem data/classifytaxa.qza \
    --i-taxa data/classification.qza \
    --i-predictions data/classify-predictions.qza \
    --o-visualization data/classifytaxa_C1.qzv

Explanation:

  • Generates an interactive QIIME 2 visualization of the classification results.

  • The output .qzv file can be viewed using QIIME 2 View.
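
As an alternative to uploading the file to QIIME 2 View, the visualization can be opened locally with the standard qiime tools view command. The function name open_visualization is hypothetical; the sketch only adds a file check so that a wrong path fails with a clear message.

```shell
# Hypothetical wrapper: open a .qzv file locally if it exists.
open_visualization() {
    if [ ! -f "$1" ]; then
        echo "visualization not found: $1" >&2
        return 1
    fi
    # Opens the visualization in a local browser session.
    qiime tools view "$1"
}

open_visualization data/classifytaxa_C1.qzv \
    || echo "Run the summarize step first, then try again."
```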