Data Preparation#

This guide demonstrates how to prepare microbiome data for log-contrast analysis using q2-classo. We’ll walk through transforming features, adding taxonomic information and covariates, and splitting data for machine learning tasks.

In

Data Transformation#

Transform count data using CLR transformation:

qiime classo transform-features \
     --p-transformation clr \
     --p-coef 0.5 \
     --i-features data/atacama-counts.qza \
     --o-x data/xclr

Add Taxonomic Information#

Incorporate taxonomic classifications and compute adaptive weights:

qiime classo add-taxa \
    --i-features data/xclr.qza  \
    --i-taxa data/classification.qza \
    --o-x data/xtaxa \
    --o-aweights data/wtaxa

Add Covariates#

Include environmental metadata with custom weights for each covariate:

qiime classo add-covariates \
    --i-features data/xtaxa.qza \
    --i-weights data/wtaxa.qza \
    --m-covariates-file data/atacama-selected-covariates-veg.tsv \
    --p-to-add ph average-soil-relative-humidity elevation average-soil-temperature vegetation \
    --p-w-to-add 1. 0.1 0.1 0.1 1 \
    --o-new-features data/xcovariates \
    --o-new-c data/ccovariates \
    --o-new-w data/wcovariates

Split Data for Regression Analysis#

Create training and test sets for continuous target prediction:

qiime sample-classifier split-table \
    --i-table data/xcovariates.qza \
    --m-metadata-file data/atacama-selected-covariates-veg.tsv \
    --m-metadata-column average-soil-temperature \
    --p-test-size 0.2 \
    --p-random-state 42 \
    --p-stratify False \
    --o-training-table data/regress-xtraining \
    --o-test-table data/regress-xtest \
    --o-training-targets data/regress-training-targets.qza \
    --o-test-targets data/regress-test-targets.qza

Split Data for Classification Analysis#

Create training and test sets for categorical target prediction:

qiime sample-classifier split-table \
    --i-table data/xcovariates.qza \
    --m-metadata-file data/atacama-selected-covariates-veg.tsv \
    --m-metadata-column vegetation \
    --p-test-size 0.2 \
    --p-random-state 42 \
    --p-stratify False \
    --o-training-table data/classify-xtraining \
    --o-test-table data/classify-xtest \
    --o-training-targets data/classify-training-targets.qza \
    --o-test-targets data/classify-test-targets.qza

Your data is now prepared for log-contrast modeling with both regression and classification targets.