Data Preparation#

We’ll use the Atacama soil microbiome dataset [6], which contains:

  • \(N = 49\) samples from Atacama Desert soil

  • \(p = 13\) microbial taxa (ASVs)

  • \(q = 5\) environmental covariates: pH, elevation, temperature, humidity, and vegetation

For more details about this dataset, see Data Overview.

Data Transformation#

Microbiome data represents relative abundances constrained to sum to a constant.

# Transform compositional data using mCLR transformation
qiime gglasso transform-features \
     --p-transformation mclr \
     --p-add-metadata False \
     --p-scale-metadata False \
     --i-table data/atacama-counts.qza \
     --i-taxonomy data/classification.qza \
     --m-sample-metadata-file data/selected-atacama-sample-metadata.tsv \
     --o-transformed-table data/atacama-table-mclr.qza

This transformation:

  • Converts compositional data to unconstrained space

  • Handles zeros without adding pseudo-counts

  • Preserves the relative information between taxa

Note: You can also use standard CLR transformation with --p-transformation clr.

Create an input correlation#

qiime gglasso calculate-covariance \
     --p-method scaled \
     --i-table data/atacama-table-mclr.qza \
     --o-covariance-matrix data/atacama-table-corr.qza

This method:

  • Calculate a scaled covariance, also known as the Pearson correlation.

  • One can also use a simple covaraince with --p-method unscaled

Note: Input for the graphical lasso problem must be a positive semi-definite matrix.