This is a tutorial illustrating the methods presented in Sweeny et al. 2023 (https://journals.asm.org/doi/10.1128/msystems.00040-23)
We provide an outline of applying generalised linear mixed-effects models to relative abundance tables of metabarcoding data as a means to account for complex covariates that are present in wildlife microbiota datasets. The below from the manuscript illustrates the model formulation and output.
For computing efficiency, an anonymised subset of one of the pilot datasets used in the companion manuscript is used in this tutorial and can be downloaded from the ‘Data’ folder on github (https://github.com/arsweeny/microbiome-glmm/tree/main/Data)
library(tidyverse) # data manipulation and ggplot
library(janitor) # data cleaning
library(magrittr) # data manipulation
library(reshape2) # data manipulation
library(phyloseq) # output format of sequence processing
library(microbiome) # useful exploration and visualisation functions
library(MCMCglmm) # implementation of GLMMs
library(ALDEx2) # CLR transformations
library(plotrix) # std error functions
library(pals) # palettes
library(MetBrewer) # palettes
library(ggregplot) # package my friend made with lots of handy funcions - https://github.com/gfalbery/ggregplot
Set-up some palettes for later use
MicroColours <- met.brewer("Renoir",12, "discrete")
MicroColoursRamp <- colorRampPalette(MicroColours)
MicroColoursExpand <- MicroColoursRamp(n=20)
MicroColoursCat <- list(
Age = MicroColours[c(3,4)],
Sex= MicroColours[c(1,5)],
Season=MicroColours[c(2,8)]
) #%>% unlist()
AmyTheme <- theme_bw(base_size=14) +
theme(
axis.title = element_text(colour="black"),
axis.title.x = element_text(angle=45, hjust=1)
)
Mods <- list() # Model list for later
We will use the phyloseq object representing the feature abundance table, metadata, and taxonomy table for exploration of the data and analysis presented in the first section of the results (alpha and beta diversity).
We will use the csv object representing the phyloseq objected ‘melted’ into one dataframe for the mixed modelling approach outlined in Section 3.1
path <- "Data/"
df <- read_csv(paste0(path, "GLMMMTutorialData.csv")) # csv file
ps <- readRDS(paste0(path,"PhyloseqGLMM.rds")) # phyloseq object
This data was previously filtered to includes ASVs with a total abundance of 250 reads, so we have no singletons and a total of 794 ASVs and 30 samples we will proceed with.
ps
## phyloseq-class experiment-level object
## otu_table() OTU Table: [ 794 taxa and 30 samples ]
## sample_data() Sample Data: [ 30 samples by 3 sample variables ]
## tax_table() Taxonomy Table: [ 794 taxa by 6 taxonomic ranks ]
summarize_phyloseq(ps)
## [[1]]
## [1] "1] Min. number of reads = 20375"
##
## [[2]]
## [1] "2] Max. number of reads = 46860"
##
## [[3]]
## [1] "3] Total number of reads = 1093264"
##
## [[4]]
## [1] "4] Average number of reads = 36442.1333333333"
##
## [[5]]
## [1] "5] Median number of reads = 36980.5"
##
## [[6]]
## [1] "7] Sparsity = 0.606591099916037"
##
## [[7]]
## [1] "6] Any OTU sum to 1 or less? NO"
##
## [[8]]
## [1] "8] Number of singletons = 0"
##
## [[9]]
## [1] "9] Percent of OTUs that are singletons \n (i.e. exactly one read detected across all samples)0"
##
## [[10]]
## [1] "10] Number of sample variables are: 3"
##
## [[11]]
## [1] "age_class" "sex" "side"
Sample variables include a mix of age classes (15 lamb and 15 adult), males and females, and individuals from the east and west side of the study system. Samples are row names.
sample_data(ps) %>% as.data.frame() %>% head
## age_class sex side
## S57 Adult Female East
## S61 Lamb Female West
## S48 Lamb Male West
## S42 Adult Female East
## S50 Lamb Female East
## S34 Lamb Female West
By far community dominated by Firmicutes and Bacteroidetes
psPhy <- tax_glom(ps, taxrank = "phylum") # aggregates to the phylum level
psMelt <- psmelt(psPhy) # melts into a combined df for plots
SampleDepth <- psMelt %>%
group_by(Sample) %>%
summarise(Depth=sum(Abundance)) %>% as.data.frame() # summarise reads per sample
RelAbundance <- full_join(psMelt, SampleDepth) %>%
group_by(Sample, phylum) %>%
summarise(RelAbundance=Abundance/Depth) %>% as.data.frame() # summarise proportion of each phyla for each sample
require(plotrix)
RelAbundanceMean <- RelAbundance %>%
dplyr::select(-Sample) %>% group_by(phylum) %>%
summarise_each(funs(mean, sd, std.error)) %>%
arrange(desc(mean)) # summarise proportion of phyla across all samples
RelAbundanceMean
## # A tibble: 16 × 4
## phylum mean sd std.error
## <chr> <dbl> <dbl> <dbl>
## 1 Firmicutes 0.471 0.0940 0.0172
## 2 Bacteroidetes 0.380 0.0993 0.0181
## 3 Proteobacteria 0.0546 0.112 0.0204
## 4 Fibrobacteres 0.0320 0.0234 0.00427
## 5 Verrucomicrobia 0.0175 0.0366 0.00667
## 6 Cyanobacteria 0.0153 0.0116 0.00213
## 7 Spirochaetes 0.00853 0.00665 0.00121
## 8 Epsilonbacteraeota 0.00639 0.0122 0.00223
## 9 Tenericutes 0.00464 0.00292 0.000533
## 10 Lentisphaerae 0.00241 0.00207 0.000378
## 11 Euryarchaeota 0.00220 0.00334 0.000610
## 12 Kiritimatiellaeota 0.00173 0.00205 0.000375
## 13 Planctomycetes 0.00147 0.00120 0.000219
## 14 Actinobacteria 0.00139 0.00276 0.000504
## 15 Fusobacteria 0.000475 0.00260 0.000475
## 16 Elusimicrobia 0.000396 0.000949 0.000173
ggplot(psMelt, aes(x = Sample, y = Abundance, fill = phylum)) +
geom_bar(stat = "identity", position = "fill", colour="transparent") +
scale_fill_manual(values=MicroColoursExpand) +
facet_grid(~age_class, scales="free_x") +
theme_bw(base_size = 14) + # labs(subtitle = "B") +
guides(fill=guide_legend(ncol=2)) +
theme(axis.text.x = element_text(angle=45, hjust=1, size=6),
strip.background = element_rect(fill="white"))
Significant differences in diversity between age classes
# alpha metrics
ps.even <- evenness(ps, index = "all")
ps.meta <- meta(ps)
ps.meta$simpson <- ps.even$simpson
hist(ps.meta$simpson)
# test for normality
shapiro.test(ps.meta$simpson) # p=0.3525 - data not violating normality
##
## Shapiro-Wilk normality test
##
## data: ps.meta$simpson
## W = 0.96222, p-value = 0.3525
qqnorm(ps.meta$simpson) # looks okay
# compare between host groups
SimpsonTestAge<- t.test(ps.meta$simpson[ps.meta$age_class=="Lamb"], ps.meta$simpson[ps.meta$age_class=="Adult"])
SimpsonTestAge # p = 0.0197
##
## Welch Two Sample t-test
##
## data: ps.meta$simpson[ps.meta$age_class == "Lamb"] and ps.meta$simpson[ps.meta$age_class == "Adult"]
## t = -2.4932, df = 24.811, p-value = 0.0197
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -0.14879071 -0.01414539
## sample estimates:
## mean of x mean of y
## 0.1917007 0.2731688
require(ggpubr)
alpha.ps <- ps.meta %>%
ggviolin(., x="age_class", y="simpson",
add = "boxplot", fill = "age_class",
palette = c(MicroColoursCat$Sex),
alpha=0.6, add.params = list(alpha=0.6),
legend = "top") +
stat_compare_means(method = "t.test",
aes(label=..p.signif..),
label.x=1.5, label.y=0.4,
size=7)
alpha.ps
ps.rel <- microbiome::transform(ps, "compositional")
bx.ord_pcoa_bray <- ordinate(ps.rel, "PCoA", "bray")
metadf.bx <- data.frame(sample_data(ps.rel))
bray_ps.bxn <- phyloseq::distance(physeq = ps.rel, method = "bray")
set.seed(999)
# Adonis test
library(vegan)
adonis.test <- adonis(bray_ps.bxn ~ age_class, data = metadf.bx)
adonis.test<-clean_names(adonis.test$aov.tab)
df | sums_of_sqs | mean_sqs | f_model | r2 | pr_f | |
---|---|---|---|---|---|---|
age_class | 1 | 0.7568348 | 0.7568348 | 4.145245 | 0.1289536 | 0.001 |
Residuals | 28 | 5.1122134 | 0.1825791 | NA | 0.8710464 | NA |
Total | 29 | 5.8690482 | NA | NA | 1.0000000 | NA |
beta.ps <- plot_ordination(ps.rel,
bx.ord_pcoa_bray,
color="age_class"
#label = "sample_id",
) +
geom_point(aes(shape = age_class), size= 4) +
scale_colour_manual(values=MicroColoursCat$Sex)+
theme_bw() +
theme(plot.title = element_text(hjust = 0, size = 12),
legend.position = "top")+
stat_ellipse()+
guides(shape="none")
beta.ps
Mixed models here will all be carried out in MCMCglmm, but your favourite package/ program should work just fine with some adaptation as well. Feel free to contact me about syntax switches between algorithms.
mf <- 20 # multiplier for model iterations and chains
resp<-"abundance"
fixed<- c("age_class")
mcmcF<- as.formula(paste(resp, "~", paste(fixed)))
Prior3 <- list(R = list(V = diag(1), nu = 0.002),
G = list(G1 = list(V = diag(1), nu = 0.002, alpha.mu = rep(0,1), alpha.V = diag(1)*100),
G2 = list(V = diag(1), nu = 0.002, alpha.mu = rep(0,1), alpha.V = diag(1)*100),
G3 = list(V = diag(1), nu = 0.002, alpha.mu = rep(0,1), alpha.V = diag(1)*100)))
# to run the model
Mods[[1]]<- MCMCglmm(fixed= mcmcF,
random= ~ sample + asv + age_class:asv,
prior=Prior3,
data=df,
verbose=T, pr=T, pl=T, # will save posteriors for random effect levels & model predictions for each iteration
family = "poisson",
nitt = 13000*mf,
thin = 10*mf,burnin=3000*mf)
Notes: * verbose = T will print progress * pr = T saves posterior distributions for all random effect levels (see section 4) * pl = T saves latent variables (model predicted variables for every data point & iteration, useful for testing for excess zeroes) * Note this did take my laptop ~ 90 minutes to run, so not a fast model!
View proportion variance associated with each random term
PropVarPois <- MCMCRep(Mods[[1]], scale="link") %>% # function from ggregplot to calculate proportion variance for each random effect
as.data.frame() %>%
mutate_at(c("Mode", "lHPD", "uHPD"), as.numeric)
PropVarPois
## Component Mode lHPD uHPD
## 1 sample 0.01203397 0.007792082 0.02416657
## 2 asv 0.17223235 0.146959862 0.21102106
## 3 age_class:asv 0.21607760 0.193830355 0.24683450
## 4 units 0.53964912 0.518421149 0.55962454
Variance components will not add to 1 here because there is some variance attributable to the poisson process. We’ll calculate that and visualise the variance breakdown for this model.
PropVarPois %>% pull(Mode) %>% sum -> TotalVar
1- TotalVar -> PoissonVar
PropVarPois[5, ] <- c("poisson", PoissonVar, NA, NA)
PropVarPois %<>%
mutate_at(c("Mode", "lHPD", "uHPD"), as.numeric)
PropVarPois%>%
mutate(Component=fct_relevel(Component, "age_class:asv", "asv", "sample","poisson", "units")) %>%
ggplot(aes(x=1, y=as.numeric(Mode), fill=Component)) +
geom_bar(stat="identity", colour="transparent") +
scale_x_continuous(labels=NULL) +
scale_fill_manual(values=MicroColours[c(2,4,6,8,10)]) +
theme_bw(base_size = 14) +
#theme(legend.position = "top") +
theme(axis.ticks.x = element_blank(),
legend.text = element_text(size=11),
legend.title = element_text(size=12)) +
labs(x='Poisson model', y='Proportion Variance')
Let’s pull out the posterior distributions relevant to our host age groups of interest
Solutions<-Mods[[1]]$Sol[, 1:dim(Mods[[1]]$Z)[2]]
SolutionsDF<- Solutions %>% as.data.frame()
SolKeepAge <- SolutionsDF %>% dplyr::select(contains("age_class:asv")) %>% colnames()
SolutionsAge<- SolutionsDF %>% dplyr::select(SolKeepAge)
SolutionsAdult <- SolutionsAge %>%
gather(key="level", value="posterior") %>% as.data.frame() %>%
separate(level, into=c("term", "age_class", "asv"), sep="[.]")
Here’s a function I wrote that you can apply to posterior distributions for each ASV in both age groups to calculate differences between groups (see Sections 2 & 3 in manuscript)
RanefPostComp<- function(df, comps, taxa, var){
require(tidyverse)
require(MCMCglmm)
not_all_na <- function(x) any(!is.na(x))
df$group <- df[, var]
df$taxa <- df[, taxa]
df<- df %>% filter(term!="(Intercept)") %>%
mutate(taxa=as.factor(taxa))
LevelComps<-list()
Levels=levels(df[, "taxa"])
for(x in 1:length(Levels)){
post1 <- df %>% subset(taxa==Levels[x] & group== comps[1]) %>% pull(posterior)
post2 <- df %>% subset(taxa==Levels[x] & group== comps[2]) %>% pull(posterior)
if(length(post2>0 ) & length(post1>0)){
postComp <- post2-post1 %>% as.matrix()
}
else{
postComp <- rep(NA, 1000)
}
LevelComps[[x]] <-postComp
}
LevelCompsFin<- bind_cols(LevelComps) %>% as.matrix()
colnames(LevelCompsFin) <- paste(Levels, comps[[2]], sep=".")
LevelCompsFin %>% as.data.frame() %>% select_if(not_all_na) -> df
df1<-apply(df,2, function(a) mean(a)[1])
df1l<-apply(df,2,function(a) HPDinterval(as.mcmc(a))[1])
df1u<-apply(df,2,function(a) HPDinterval(as.mcmc(a))[2])
df1v<-apply(df,2,function(a) var(a)[1])
df2 <- cbind(df1, df1l, df1u, df1v)
colnames(df2) <- c("mean", "lower", "upper", "var")
df2 <- df2 %>% as.data.frame() %>%
rownames_to_column(var="term") %>%
mutate(taxa=sapply(str_split(term, "[[.]]"), '[', 1),
effect=var,
nonZeroCI= ifelse(lower<0&upper<0, "Y",
ifelse(lower>0&upper>0, "Y", "N")))
colnames(df2)[which(names(df2) == "taxa")] <- paste(taxa)
return(df2)
}
Let’s apply this function to the age-related posterior distributions from the model. This will take a little while to run, especially for datasets larger than this test data! (Or just read them in below)
ASVByAge <- RanefPostComp(SolutionsAdult,
comps=c("Lamb", "Adult"),
taxa="asv",
var="age_class")
rm(var) # to avoid conflict later with another function
Let’s prep the taxonomy rank effects for visualisation with some extra information
Taxo <- df %>% filter(!duplicated(asv)) %>%
dplyr::select(asv, phylum, fam) %>% as.data.frame() %>%
rename(family=fam) # MCMCglmm get's a little cranky if a variable in the df is called family so was renamed previously
Prev <- df %>% group_by(asv) %>%
summarise(prevASV=length(abundance[abundance>0])/ 30) %>% # n samples
arrange(-prevASV)
TaxoDf <- left_join(Prev, Taxo)
TaxAge <- merge(ASVByAge, TaxoDf, by="asv", all.x=TRUE)
Identify taxa with the strongest effects (positive & negative shifts)
TaxAge %>% group_by(phylum) %>% na.omit() %>%
summarise(nASV=length(asv)) %>% filter(nASV>1) %>%
pull(phylum) %>% as.character() -> keepPhy # retain Phylum with > 1 asv for visualisation
TopAgeASV<- TaxAge %>%
filter(nonZeroCI=="Y") # pull out taxa effects with CI not spanning zero
Top50 <- TaxAge %>% arrange(-mean) %>%
filter(phylum %in% keepPhy) %>%
dplyr::select(asv, phylum, lower, upper, mean) %>%
#filter(lower>0) %>% #make sure errors don't span zero
top_n(50) %>%
mutate(group="real")
Bottom50 <- TaxAge %>% arrange(-mean) %>%
filter(phylum %in% keepPhy) %>%
dplyr::select(asv, phylum, lower, upper, mean) %>%
#filter(upper<0.1) %>%
top_n(-50) %>%
mutate(group="real")
Create some filler categories for the axis breaks for creating a panel with highest ranked taxa (see Fig 2B & 2D)
DummiesAge<-paste0("dummy", rep(1:25))
GroupAge<-rep("dummy", 25)
DummyAge<- cbind(DummiesAge, GroupAge) %>% as.data.frame()
colnames(DummyAge)<- c("asv", "group")
select50<- bind_rows(Top50, DummyAge, Bottom50)
select_x_age<- select50 %>% pull(asv)
MicroColoursPhy <- MicroColours
PlotPal <- data.frame(keepPhy, MicroColoursPhy[c(1,3,4:10,12)])
colnames(PlotPal) <- c("phylum", "colour")
AgeTopBottom<-
select50 %>% filter(!is.na(phylum)) %>%
ggplot(aes(x=reorder(asv, mean), y=mean, colour=phylum))+
geom_point(position=position_dodge(w=0.4), size=1.5)+
geom_errorbar(position=position_dodge(w=0.5),
aes(ymin=lower,ymax=upper),size=0.3,width=0.2)+
geom_segment(aes(y=0, yend=0, x=120, xend=70), colour="black", size=0.7)+
geom_segment(aes(y=0, yend=0, x=70, xend=50), linetype="dashed", colour="black", size=0.7)+
geom_segment(aes(y=0, yend=0, x=50, xend=0), colour="black", size=0.7)+
#geom_hline(aes(yintercept=0, linetype=group))+
theme_classic(base_size = 14)+labs(x=NULL)+
scale_colour_manual(values=PlotPal %>%
filter(phylum %in% select50$phylum) %>%
mutate(colour=as.character(colour)) %>%
pull(colour),
breaks=unique(as.character(select50$phylum)) %>%
sort()) +
scale_x_discrete(limits=rev(select_x_age))+
#geom_vline(aes(xintercept=50.5, colour="grey"), size=0.2) +
coord_flip()+
theme(strip.text.x = element_text(size = 0.8),
strip.background =element_rect(fill="white"),
axis.text.y = element_blank(),
axis.ticks.y=element_blank())+
labs(subtitle = "B")
TaxAge %>% na.omit() %>% subset(phylum %in% keepPhy) -> SinaDFAge
AgeSina<-
ggplot() +
geom_violin(data=SinaDFAge,
aes(y=mean, x=phylum, fill=phylum),
colour="transparent", alpha=0.2) +
ggforce::geom_sina(data=SinaDFAge,aes(y=mean, x=phylum, size=1/var, colour=phylum), alpha=0.3) +
scale_size(range=c(0.15, 3), name="Inverse Variance") +
scale_colour_manual(values=PlotPal %>%
filter(phylum %in% SinaDFAge$phylum) %>%
mutate(colour=as.character(colour)) %>%
pull(colour),
breaks=unique(as.character(SinaDFAge$phylum)) %>%
sort()) +
scale_fill_manual(values=PlotPal %>%
filter(phylum %in% SinaDFAge$phylum) %>%
mutate(colour=as.character(colour)) %>%
pull(colour),
breaks=unique(as.character(SinaDFAge$phylum)) %>%
sort()) +
geom_hline(yintercept=0, linetype="dashed")+
theme_bw(base_size = 14) +
#coord_flip() +theme_ipsum(base_size = 10) +
theme(axis.text.x = element_text(angle=45, hjust=1),
axis.title.x = element_blank(),
legend.position = "top")+ #+facet_wrap(~phylum) +
guides(colour=FALSE, fill=FALSE)
AgeSinaRect <- AgeSina+
geom_rect(data=Top50,
aes(xmin=0.75, xmax=10.25, ymin=min(mean-0.15), ymax=max(mean+0.25)),
fill="transparent", color="grey60",size=0.5) +
geom_rect(data=Bottom50,
aes(xmin=0.75, xmax=10.25, ymin=min(mean-0.30), ymax=max(mean+0.25)),
fill="transparent", color="grey60",size=0.5)+
labs(subtitle = "A")
We can now visualise a plot comparable (but with fewer ASVs) to Figure 2 in the manuscript, where the highest ranked positive and negative shifts are highlighted by the rectangle in panel A & panel B forest plot.
As in the manuscript, although there is a lot of variation within phlya, the majority of highly positive shifts (increases into adulthood) are found in the Bacteroidetes, and the majority of the highly negative shifts (decreases into adulthood) are found in the Firmicutes.
require(patchwork)
AgeSinaRect + AgeTopBottom
The CLR takes the form of the log-ratio of the count for a given taxon in a given sample and the geometric mean of the counts of all taxa in that sample
This transformation has several desirable properties:
As a log-ratio it is approximately Gaussian in distribution, allowing the use of a Gaussian linear mixed model
Secondly, the mean of the CLR of all taxa in any sample will be zero, allowing for simplification of our random effects structure (i.e. removal of random effects not dealing with taxa)
CLR is increasingly used for microbiome analyses for it’s explicit acknowledgement of read abundances as compositional (Gloor et al 2017)
Below we use the package ‘ADLEx2’ (Fernandes et al 2013) to calculate a centered log-ratio for each ASV in each sample.
In addition to CLR values, ALDEx2 estimates technical variation for each ASV from a probability distribution using Monte-Carlo instances drawn from the Drichlet distribution, which maintains the compositional nature of the data.
This variance for each asv:sample level can be incorporated as measurement error in the MCMCglmm syntax, comparable to the ‘asv:sample’ term in the Poisson models (3A).
Further details of CLR models can be found in ESM section 2.
df_t <- df %>%
dplyr::select(asv:abundance) %>%
spread(sample, abundance) # spread data to wide format
df_m <-df_t %>%
column_to_rownames("asv") %>%
as.matrix() # turn into a matrix
The below will create a transformed dataframe, or can read in the CLR format data in the next code chunk
n_sample<-length(unique(df$sample)) #30
x<-aldex.clr(df_m,as.character(seq(1,n_sample)), mc.samples=128, denom="all", verbose=F) # run ALDEx2
y<-melt(getMonteCarloInstances (x)) # extract the data
colnames(y)<-c("asv","mcmc_instance","clr","sample") # tidy for df
z<-aggregate(y$clr,by=list(y$asv,y$sample),FUN=mean) # take mean across monte-carlo samples
meas_var<-aggregate(y$clr,by=list(y$asv,y$sample),FUN=var) # calculate variance across monte-carlo samples
z$sd<-meas_var$x # add variances to data frame
colnames(z)<-c("asv","sample","clr","meas_var") # correct dimensions
mean(z$clr) # will be very close to zero
meta<- df %>%
dplyr::select(sample, asv, phylum, abundance, id, age_class, sex)
CLR <- merge(z, meta, by=c("asv", "sample"), all.x=TRUE) #looks good
CLR <- read_csv("Data/GLMMMTutorialDatasetCLR.csv")
Each ASV per sample ends up with a CLR value taking into account the abundance of all other ASVs in the sample and a distribution of possible values, from which error around the CLR estimate is obtained
asvs<- paste0("ASV", seq(1:50)) %>% sample(., size=5)
meanCLR <- z %>% mutate(meanCLR=clr) %>%
dplyr::select(asv, meanCLR, sample) %>%
filter(asv %in% asvs) %>%
mutate(sample_asv=paste(asv, sample, sep="-"))
y_t<-y %>%
mutate(sample_asv=paste(asv, sample, sep="-")) %>%
filter(asv %in% asvs)
ridges <- merge(y_t %>% dplyr::select(-c("asv", "sample")), meanCLR, by=c("sample_asv"),
all.x=TRUE, all.y=FALSE) %>%
arrange(meanCLR, mcmc_instance) %>%
mutate(order=row_number())
require(ggridges)
clr_posteriors <- ridges %>%
ggplot(aes(x=clr, y=reorder(sample_asv,-meanCLR), group=sample_asv)) +
scale_x_continuous(limits=c(-15,15))+
geom_density_ridges(scale=5, size=0.25)+
theme_minimal() +
geom_vline(xintercept = 0, linetype="dashed")+
theme(axis.text.y = element_blank()) + labs(y="asv", x="clr")
clr_posteriors
mf <- 20 # multiplier for model iterations and chains
Prior2 <- list(R = list(V = diag(1), nu = 0.002),
G = list(G1 = list(V = diag(1), nu = 0.002, alpha.mu = rep(0,1), alpha.V = diag(1)*100),
G2 = list(V = diag(1), nu = 0.002, alpha.mu = rep(0,1), alpha.V = diag(1)*100)))
Mods[[2]] <-MCMCglmm(clr~ 1,
random=~ age_class:asv + asv, # note sample does not need to be included here
mev=CLR$meas_var, # measurement error variance
data=CLR,
prior=Prior2,
verbose=T,
pr=F, pl=F,
nitt= 13000*mf,
thin=10*mf, burnin=3000*mf)
Notes: * this also takes a fair bit of time (~40 min for me) but will be more efficient for more complex random effect structures * pr & pl are set to F for the purposes of this tutorial, but if you want the posteriors/ predictions saved, set to T
PropVarCLR <- MCMCRep(Mods[[2]]) %>%
as.data.frame() %>%
mutate_at(c("Mode", "lHPD", "uHPD"), as.numeric)
PropVarCLR[3,1] <- "mev"
We can compare these variance components to the those of the Poisson model and see how well they match.
We can see below that the variation in community composition associated with age class is about 20% in each model.
PropVarAll <- bind_rows(PropVarPois %>%
mutate(Model="Poisson"),
PropVarCLR %>%
mutate(Model="Gaussian"))
PropVarAll%>%
mutate(Component=fct_relevel(Component, "age_class:asv", "asv", "sample","poisson", "mev", "units")) %>%
ggplot(aes(x=Model, y=as.numeric(Mode), fill=Component)) +
geom_bar(stat="identity", colour="transparent") +
scale_fill_manual(values=MicroColours[c(2,4,6,8,10,12)]) +
theme_bw(base_size = 14) +
#theme(legend.position = "top") +
theme_bw(base_size = 14) +
labs(x='Model', y='Proportion Variance')
mf <- 20 # multiplier for model iterations and chains
Prior9 <- list(R = list(V = diag(1), nu = 0.002),
G = list(G1 = list(V = diag(1), nu = 0.002, alpha.mu = rep(0,1), alpha.V = diag(1)*100),
G2 = list(V = diag(1), nu = 0.002, alpha.mu = rep(0,1), alpha.V = diag(1)*100),
G3 = list(V = diag(1), nu = 0.002, alpha.mu = rep(0,1), alpha.V = diag(1)*100),
G4 = list(V = diag(1), nu = 0.002, alpha.mu = rep(0,1), alpha.V = diag(1)*100),
G5 = list(V = diag(1), nu = 0.002, alpha.mu = rep(0,1), alpha.V = diag(1)*100),
G6 = list(V = diag(1), nu = 0.002, alpha.mu = rep(0,1), alpha.V = diag(1)*100),
G7 = list(V = diag(1), nu = 0.002, alpha.mu = rep(0,1), alpha.V = diag(1)*100),
G8 = list(V = diag(1), nu = 0.002, alpha.mu = rep(0,1), alpha.V = diag(1)*100),
G9 = list(V = diag(1), nu = 0.002, alpha.mu = rep(0,1), alpha.V = diag(1)*100)))
Mods[[3]]<- MCMCglmm(fixed= mcmcF,
random= ~ sample + asv + age_class:asv +
phylum + phylum:sample + age_class:phylum +
fam + fam:sample + age_class:fam,
prior=Prior9,
data=df,
verbose=T, pr=F, pl=F, # will save posteriors for random effect levels & model predictions for each iteration
family = "poisson",
nitt = 13000*mf,
thin = 10*mf,burnin=3000*mf)
View proportion variance breakdown with added taxonomic levels
PropVarTaxa <- MCMCRep(Mods[[3]], scale="link") %>% # function from ggregplot to calculate proportion variance for each random effect
as.data.frame() %>%
mutate_at(c("Mode", "lHPD", "uHPD"), as.numeric)
PropVarTaxa
## Component Mode lHPD uHPD
## 1 sample 0.0269881669 1.445599e-02 0.05075494
## 2 asv 0.1506133772 1.273715e-01 0.18110125
## 3 age_class:asv 0.1566505516 1.303556e-01 0.18020695
## 4 phylum 0.0002892060 1.428179e-07 0.04702793
## 5 phylum:sample 0.0154113850 6.350466e-03 0.02726147
## 6 age_class:phylum 0.0002328383 6.401795e-07 0.03563875
## 7 fam 0.0003373888 6.033218e-09 0.09344710
## 8 fam:sample 0.0371861831 2.851100e-02 0.05019781
## 9 age_class:fam 0.0339000125 1.513097e-02 0.10228162
## 10 units 0.4514965291 4.091250e-01 0.48510608
Variance components will not add to 1 here because there is some variance attributable to the poisson process. We’ll calculate that and visualise the variance breakdown for this model.
PropVarTaxa %>% pull(Mode) %>% sum -> TotalVar
1- TotalVar -> TaxasonVar
PropVarTaxa[11, ] <- c("poisson", TaxasonVar, NA, NA)
PropVarTaxa %<>%
mutate_at(c("Mode", "lHPD", "uHPD"), as.numeric)
PropVarTaxa %>%
ggplot(aes(x=1, y=as.numeric(Mode), fill=Component)) +
geom_bar(stat="identity", colour="transparent") +
scale_x_continuous(labels=NULL) +
scale_fill_manual(values=MicroColours) +
theme_bw(base_size = 14) +
#theme(legend.position = "top") +
theme(axis.ticks.x = element_blank(),
legend.text = element_text(size=11),
legend.title = element_text(size=12)) +
labs(x='Nested taxonomy model', y='Proportion Variance')
Most variance is coming from the asv level (age_class:asv), followed by the family (age_class:fam) and finally the phylum (age_class:phylum) level. You can visualise the output of this model following code in part 3 of this tutorial to create a plot comparable to Figure S4 in this manuscript.