SparCC
SparCC is a network inference tool that was specifically designed to be robust
to the compositional nature of sequencing data. The method is described in
Friedman J, Alm EJ (2012) Inferring Correlation Networks from Genomic Survey Data.
PLoS Comput Biol 8(9): e1002687.
Step 1 - Compute correlations
SparCC is a Python program that is run from the command line.
Please choose the input file "arctic_soils_filtered.txt".
Compute the compositionality-robust correlations as the median over 10 iterations
as follows:
python SparCC.py arctic_soils_filtered.txt -i 10 --cor_file=arctic_soils_sparcc.txt > sparcc.log
where -i gives the number of iterations over which the correlations are averaged.
SparCC averages its results over several estimates of the true fractions, which it infers
from the counts using the Dirichlet distribution.
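The resampling idea can be sketched as follows (a minimal illustration on toy data, not SparCC's actual code; `estimate_fractions` is a hypothetical name):

```python
import numpy as np

def estimate_fractions(counts, rng):
    """Draw one estimate of the true taxon fractions per sample.

    Each row of counts is one sample; the fractions are drawn from
    the Dirichlet posterior obtained by adding a pseudocount of 1
    to the observed counts (i.e. a uniform prior).
    """
    return np.array([rng.dirichlet(row + 1) for row in counts])

# toy data: 4 samples x 3 taxa
rng = np.random.default_rng(0)
counts = np.array([[10, 5, 0],
                   [3, 8, 2],
                   [0, 1, 12],
                   [6, 6, 6]])

# SparCC computes correlations on each such draw and reports the
# median over the -i iterations (10 in the command above)
draws = [estimate_fractions(counts, rng) for _ in range(10)]
```

Each draw is one plausible fraction table consistent with the observed counts; averaging correlations over many draws smooths out the sampling noise in low counts.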
Step 2 - Compute bootstraps
You can then generate bootstraps from the input data using the following command:
python MakeBootstraps.py arctic_soils_filtered.txt -n 100 -o Resamplings/boot
where Resamplings is a directory and boot is the prefix of all resampled data sets.
You then have to launch SparCC on each of the resampled data sets, which is best done
in a script. As an example, the following bash script generates 10 correlation matrices
from the first 10 resampled data sets:
mkdir -p Bootstraps
for i in 0 1 2 3 4 5 6 7 8 9
do
python SparCC.py Resamplings/boot_$i.txt -i 10 --cor_file=Bootstraps/sim_cor_$i.txt >> sparcc.log
done
where Bootstraps is the directory into which correlation matrices computed from the resampled data
matrices will be written.
To compute meaningful p-values, many more than 10 bootstraps are needed: with n bootstrap
matrices, the smallest achievable p-value is 1/n. Precomputed bootstrap correlations for
100 iterations can be downloaded from the tutorial page.
Step 3 - Compute p-values
Once the bootstrapped correlation matrices have been computed, the p-values can be generated
with the following command:
python PseudoPvals.py arctic_soils_sparcc.txt Bootstraps/sim_cor 10 -o pvals_two_sided.txt -t 'two_sided' >> sparcc.log
where 10 is the number of bootstrap correlation matrices computed in the previous step
(set it to 100 if you use all resampled data sets).
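Conceptually, the two-sided pseudo p-value for each pair of taxa is the fraction of bootstrap correlation matrices whose entry is at least as extreme in magnitude as the observed correlation. A minimal sketch on toy matrices (not the actual PseudoPvals.py code; its exact tie and sign handling may differ):

```python
import numpy as np

def pseudo_pvals_two_sided(obs_cor, boot_cors):
    """Two-sided pseudo p-values from bootstrapped correlations.

    For each matrix entry, count how often the bootstrap correlation
    is at least as large in magnitude as the observed one.
    """
    boot = np.stack(boot_cors)
    return (np.abs(boot) >= np.abs(obs_cor)).mean(axis=0)

# toy example: an observed 2x2 correlation matrix
# and 10 bootstrap correlation matrices
rng = np.random.default_rng(1)
obs = np.array([[1.0, 0.8],
                [0.8, 1.0]])
boots = [np.array([[1.0, c], [c, 1.0]]) for c in rng.uniform(-1, 1, 10)]
pvals = pseudo_pvals_two_sided(obs, boots)
```

With n bootstrap matrices, the resulting p-values are multiples of 1/n, which is why 10 bootstraps are too few for a meaningful significance cut-off.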
Step 4 - Visualization
Note that you need to threshold the p-value matrix at the desired cut-off and to
convert it into a network using a script of your own. For example, the simple R script
below performs this task. In the script, the p-value matrix is converted into a matrix
of significances (-log10 of the p-values). Do you know why? The reason is that
graph.adjacency treats non-zero entries as edges, so the significant pairs (p-values
close to zero) have to be mapped to large values before thresholding.
# load R graph library igraph
library(igraph)
path="pvals_two_sided.txt"
pvals=read.table(path,header=TRUE,sep="\t")
# the first column holds the taxon names; keep the numeric part only
pvals.mat=pvals[,2:ncol(pvals)]
rownames(pvals.mat)=pvals[,1]
# set p-values of 0 to a non-zero, small p-value so we can take the logarithm
pvals.mat[pvals.mat==0]=0.000000001
# convert into significance
sig.mat=-1*log10(pvals.mat)
# remove all edges with significance below 1 (i.e. p-value above 0.1)
sig.mat[sig.mat<1]=0
sig.mat=as.matrix(sig.mat)
# binarize, since graph.adjacency interprets entries as edge counts
sig.mat[sig.mat>0]=1
# clear the diagonal to avoid self-loops
diag(sig.mat)=0
# convert adjacency matrix into a graph
sparcc.graph=graph.adjacency(sig.mat,mode="undirected")
# display the graph with the Fruchterman-Reingold layout
# (layout.spring has been removed from recent igraph versions)
layout=layout_with_fr(sparcc.graph)
plot(sparcc.graph, layout=layout)