Introduction

We want to calibrate the parameters that will be used to model the simulations of coverage variations. Specifically, we are analyzing the coverage variation within individuals. Such variation will consist on a multiplier selection, coming from a Gamma distribution with a hyper prior for the alpha shape, sampled from a Log. Normal distribution. Mean and standard deviation parameters of the Log. Normal distribution will be modified to study, afterwards, the resulting coverage distributions.

Scenario

Sampling of Gamma alpha-shapes

I’ll start with Log. Normal Distribution parameters of:

  • 1.4
  • 1

Also, they will be sampled within the (1,9) quantiles so we wouldn’t have too low/high values.

probs=seq(0,1,length.out = 1000)
lims= qlnorm(probs,meanLN,sdLN) # To avoid Inf
plot(probs,lims,
     xlab="Probabilities",ylab="Values",pch=16,
     main=paste0("Quantile function\nLog. Normal distribution (",meanLN,",",sdLN,")"),
     col=ifelse(((probs> 0.1 & probs <= 0.9)),"red", "black"), axes=F)
axis(1); axis(2)

Sample the alpha-shapes for all the replicates.

for (i in 1: nReplicates){
  a=rlnorm(1,meanlog = meanLN,sdlog = sdLN)
  lims= qlnorm(c(0.1,0.9),meanLN,sdLN) # To avoid Inf
  while ( a < lims[1] && a > lims[2]){
    a=rlnorm(1,meanlog = meanLN,sdlog = sdLN)
  }
  alphaShapes[i]=a
}

x=seq(0,5,length.out = 100000)
plot(x,
     dgamma(
       x,
       alphaShapes[1],
       alphaShapes[1]),
     type="l", col="darkred",lwd=2,
     ylim=c(0,5),
     main="Gamma distribution sampled",
     ylab="dgamma(x, alpha,alpha)", xlab="Value", axes=F)

for (i in 2:nReplicates){
  points(x,dgamma(
       x,
       alphaShapes[i],
       alphaShapes[i]),
       type="l", col=cols[i],
       lwd=2)
}
axis(1);axis(2)

Multipliers computation and sampling

for (i in 1:nReplicates){
  multipliers[[i]]=rgamma(nInds, alphaShapes[i],alphaShapes[i])
  for (j in 1:nInds){
    multiplier=multipliers[[i]][j]
    data[[i]][j,]=data[[i]][j,]*multiplier
  }
}

Analysis

maxX=max(unlist(lapply(data, max)))
histData=list(data.frame())
densityData=list(data.frame())
for (i in 1:nReplicates){
  h=hist(data[[i]], breaks = 100, plot=F)
  histData[[i]]=h
  d=density(data[[i]])
  densityData[[i]]=d
}

maxY=max(unlist(lapply(densityData,function(x){max(x$y)})))
plot(densityData[[1]]$x,densityData[[1]]$y,
     col=cols[1], lwd=2,  type="l",
     main="Coverage distribution",
     xlab="Depth of coverage", ylab="Density",
     xlim=c(0,maxX), ylim=c(0,maxY),axes=F)
for (i in 2:nReplicates){
  points(densityData[[i]]$x,densityData[[i]]$y,col=cols[i], lwd=2, pch=16, type="l")  
}

axis(1);axis(2)

Replicate details - Histogram and overlapping density plots

Coverage distribution

Heatmap

Coverage distribution

Heatmap

Coverage distribution

Heatmap

Coverage distribution

Heatmap

Coverage distribution

Heatmap

Coverage distribution

Heatmap

Coverage distribution

Heatmap

Coverage distribution

Heatmap

Coverage distribution

Heatmap

Coverage distribution

Heatmap

Coverage distribution

Heatmap

Coverage distribution

Heatmap

Coverage distribution

Heatmap

Coverage distribution

Heatmap

Coverage distribution

Heatmap

Coverage distribution

Heatmap

Coverage distribution

Heatmap

Coverage distribution

Heatmap

Coverage distribution

Heatmap

Coverage distribution

Heatmap