bigPint 1.2.2
Information about the dataMetrics
object can be found in the article Data metrics object. If a researcher is using the bigPint
package to plot RNA-seq data, then many will create the dataMetrics
object using popular RNA-seq packages such as edgeR (Robinson, McCarthy, and Smyth 2010), DESeq2 (Love, Huber, and Anders 2014), and limma (Ritchie et al. 2015). These R packages will output several interesting quantitative variables for each gene in the dataset, such as log fold change, p-value, and FDR. These quantitative variables can be incorporated into the dataMetrics
object and determine by thresholds what subset of genes will be superimposed onto plots.
The following example shows how to create the dataMetrics
object called soybean_ir_sub_metrics
, which was shown in the article Data metrics object (Lauter and Graham 2016). The dataset from which it is created (soybean_ir_sub
) contains only two treatment groups, N and P. In this case, the edgeR (Robinson, McCarthy, and Smyth 2010) package was primarily followed.
library(bigPint)
library(edgeR)
library(data.table)
data(soybean_ir_sub)
data = soybean_ir_sub
rownames(data) = data[,1]
y = DGEList(counts=data[,-1])
group = c(1,1,1,2,2,2)
y = DGEList(counts=y, group=group)
Group = factor(c(rep("N",3), rep("P",3)))
design <- model.matrix(~0+Group, data=y$samples)
colnames(design) <- levels(Group)
y <- estimateDisp(y, design)
fit <- glmFit(y, design)
soybean_ir_sub_metrics <- list()
for (i in 1:(ncol(fit)-1)){
for (j in (i+1):ncol(fit)){
contrast=rep(0,ncol(fit))
contrast[i]=1
contrast[j]=-1
lrt <- glmLRT(fit, contrast=contrast)
lrt <- topTags(lrt, n = nrow(y[[1]]))[[1]]
setDT(lrt, keep.rownames = TRUE)[]
colnames(lrt)[1] = "ID"
lrt <- as.data.frame(lrt)
soybean_ir_sub_metrics[[paste0(colnames(fit)[i], "_", colnames(fit)[j])]] <- lrt
}
}
We can indeed examine the generated soybean_ir_sub_metrics
object as follows:
str(soybean_ir_sub_metrics, strict.width = "wrap")
## List of 1
## $ N_P:'data.frame': 5604 obs. of 6 variables:
## ..$ ID : chr [1:5604] "Glyma.19G168700.Wm82.a2.v1" "Glyma.13G293500.Wm82.a2.v1"
## "Glyma.05G188700.Wm82.a2.v1" "Glyma.13G173100.Wm82.a2.v1" ...
## ..$ logFC : num [1:5604] -5.92 2.99 -3.51 -3.91 -3.51 ...
## ..$ logCPM: num [1:5604] 7.52 8.08 8.83 8.27 10.19 ...
## ..$ LR : num [1:5604] 266 171 167 157 154 ...
## ..$ PValue: num [1:5604] 9.18e-60 3.65e-39 2.73e-38 6.04e-36 2.58e-35 ...
## ..$ FDR : num [1:5604] 5.14e-56 1.02e-35 5.09e-35 8.46e-33 2.89e-32 ...
And verify that it contains one list element:
names(soybean_ir_sub_metrics)
## [1] "N_P"
The following example shows how to create the dataMetrics
object called soybean_cn_sub_metrics
, which was shown in the article Data metrics object). The dataset from which it is created (soybean_cn_sub
) contains three treatment groups, S1, S2, and S3 (Brown and Hudson 2015). In this case, the edgeR (Robinson, McCarthy, and Smyth 2010) package was primarily followed.
library(edgeR)
library(data.table)
data(soybean_cn_sub)
data = soybean_cn_sub
rownames(data) = data[,1]
y = DGEList(counts=data[,-1])
group = c(1,1,1,2,2,2,3,3,3)
y = DGEList(counts=y, group=group)
Group = factor(c(rep("S1",3), rep("S2",3), rep("S3",3)))
design <- model.matrix(~0+Group, data=y$samples)
colnames(design) <- levels(Group)
y <- estimateDisp(y, design)
fit <- glmFit(y, design)
soybean_cn_sub_metrics <- list()
for (i in 1:(ncol(fit)-1)){
for (j in (i+1):ncol(fit)){
contrast=rep(0,ncol(fit))
contrast[i]=1
contrast[j]=-1
lrt <- glmLRT(fit, contrast=contrast)
lrt <- topTags(lrt, n = nrow(y[[1]]))[[1]]
setDT(lrt, keep.rownames = TRUE)[]
colnames(lrt)[1] = "ID"
lrt <- as.data.frame(lrt)
soybean_cn_sub_metrics[[paste0(colnames(fit)[i], "_", colnames(fit)[j])]] <- lrt
}
}
We can indeed examine the generated soybean_cn_sub_metrics
object as follows:
str(soybean_cn_sub_metrics, strict.width = "wrap")
## List of 3
## $ S1_S2:'data.frame': 7332 obs. of 6 variables:
## ..$ ID : chr [1:7332] "Glyma18g00690.1" "Glyma08g22380.1" "Glyma20g30460.1"
## "Glyma07g09700.1" ...
## ..$ logFC : num [1:7332] 3.14 2.62 2.5 2.37 2.74 ...
## ..$ logCPM: num [1:7332] 8.04 8.05 8.14 8.19 7.93 ...
## ..$ LR : num [1:7332] 29.3 24.8 24 22.7 21 ...
## ..$ PValue: num [1:7332] 6.08e-08 6.50e-07 9.82e-07 1.94e-06 4.57e-06 ...
## ..$ FDR : num [1:7332] 0.000446 0.002384 0.0024 0.003557 0.006697 ...
## $ S1_S3:'data.frame': 7332 obs. of 6 variables:
## ..$ ID : chr [1:7332] "Glyma08g22380.1" "Glyma08g19290.1" "Glyma20g30460.1"
## "Glyma18g00690.1" ...
## ..$ logFC : num [1:7332] 3.18 3.32 2.44 2.46 3.03 ...
## ..$ logCPM: num [1:7332] 8.05 8.14 8.14 8.04 7.85 ...
## ..$ LR : num [1:7332] 30.4 24.7 23.3 22.5 22.3 ...
## ..$ PValue: num [1:7332] 3.60e-08 6.53e-07 1.42e-06 2.09e-06 2.30e-06 ...
## ..$ FDR : num [1:7332] 0.000264 0.002393 0.003378 0.003378 0.003378 ...
## $ S2_S3:'data.frame': 7332 obs. of 6 variables:
## ..$ ID : chr [1:7332] "Glyma08g14670.3" "Glyma08g14670.2" "Glyma08g19290.1"
## "Glyma06g46701.1" ...
## ..$ logFC : num [1:7332] 2.66 2.72 2.78 2.19 2.29 ...
## ..$ logCPM: num [1:7332] 7.86 7.82 8.14 7.63 7.84 ...
## ..$ LR : num [1:7332] 16.1 16 14.3 11.8 10.5 ...
## ..$ PValue: num [1:7332] 6.01e-05 6.30e-05 1.57e-04 5.81e-04 1.21e-03 ...
## ..$ FDR : num [1:7332] 0.231 0.231 0.385 1 1 ...
And verify that it contains three list element:
names(soybean_cn_sub_metrics)
## [1] "S1_S2" "S1_S3" "S2_S3"
Brown, Anne V., and Karen A. Hudson. 2015. “Developmental Profiling of Gene Expression in Soybean Trifoliate Leaves and Cotyledons.” BMC Plant Biology 15 (1). BioMed Central:169.
Lauter, AN Moran, and MA Graham. 2016. “NCBI Sra Bioproject Accession: PRJNA318409.”
Love, Michael I., Wolfgang Huber, and Simon Anders. 2014. “Moderated Estimation of Fold Change and Dispersion for Rna-Seq Data with Deseq2.” Genome Biology 15 (12). BioMed Central:550.
Ritchie, Matthew E., Belinda Phipson, Di Wu, Yifang Hu, Charity W. Law, Wei Shi, and Gordon K. Smyth. 2015. “Limma Powers Differential Expression Analyses for Rna-Sequencing and Microarray Studies.” Nucleic Acids Research 43 (7). Oxford University Press:e47–e47.
Robinson, Mark D., Davis J. McCarthy, and Gordon K. Smyth. 2010. “EdgeR: A Bioconductor Package for Differential Expression Analysis of Digital Gene Expression Data.” Bioinformatics 26 (1). Oxford University Press:139–40.