GFM: A Simple Transcriptomics Data Example

Wei Liu


Load real data

First, we load the ‘GFM’ package and the real data, which can be downloaded ‘here’ (extraction code: ‘sppy’). The data are stored in ‘.Rdata’ format and include a gene expression matrix ‘X’ with 3460 rows (cells) and 2000 columns (genes), a vector ‘group’ that assigns the variables to two groups, a vector ‘type’ that gives the variable type of each group (‘gaussian’ and ‘poisson’), and a vector ‘y’ of cell cluster labels annotated by experts. We compare the performance of ‘GFM’ and ‘LFM’ in downstream clustering analysis, using the expert-annotated clusters ‘y’ as the benchmark.

library(GFM)
# ls()  # check the variables after loading the .Rdata file
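
The layout of these inputs can be illustrated with simulated data. The following is a minimal sketch, not the real data set: the sizes and the ‘_toy’ names are made up, and it assumes that ‘group’ holds one integer label per column of the matrix while ‘type’ holds one type string per group.

```r
# Toy stand-in for the real data (all names and sizes are illustrative).
set.seed(2023)
n <- 50        # number of cells (rows)
p_gauss <- 4   # number of continuous variables
p_pois <- 6    # number of count variables

# Continuous block followed by a count block, bound column-wise.
X_gauss <- matrix(rnorm(n * p_gauss), n, p_gauss)
X_pois <- matrix(rpois(n * p_pois, lambda = 3), n, p_pois)
X_toy <- cbind(X_gauss, X_pois)

# Assumed convention: one group label per column, one type per group.
group_toy <- c(rep(1, p_gauss), rep(2, p_pois))
type_toy <- c("gaussian", "poisson")

dim(X_toy)        # rows are cells, columns are variables
table(group_toy)  # number of variables in each group
```

Checking that ‘length(group)’ equals ‘ncol(X)’ before fitting is a cheap way to catch a mismatched input.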

Fit GFM model

We fit the GFM model with q = 15 factors using the ‘gfm’ function.

q <- 15  # number of factors
gfm1 <- gfm(X, group, type, q = q, output = FALSE)

Compare with LFM in downstream analysis

We cluster the cells with a Gaussian mixture model (‘Mclust’ with G = 7 components) fitted to the factors extracted by GFM, and compute the adjusted Rand index (ARI) against the expert-annotated cluster labels.

library(mclust)  # provides Mclust() and adjustedRandIndex()
hH <- gfm1$hH    # estimated factor matrix from GFM
gmm1 <- Mclust(hH, G = 7)
ARI_gfm <- adjustedRandIndex(gmm1$classification, y)
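
For reference, the ARI can be computed directly from the contingency table of two labelings. The helper below, ‘ari_by_hand’, is a hypothetical name used only for illustration; it is a minimal base R sketch of the standard Hubert-Arabie formula and is not taken from the ‘mclust’ source.

```r
# Hypothetical helper: adjusted Rand index from a contingency table.
ari_by_hand <- function(a, b) {
  tab <- table(a, b)
  choose2 <- function(n) n * (n - 1) / 2       # number of pairs among n items

  pairs_both <- sum(choose2(tab))               # pairs together in both labelings
  pairs_a <- sum(choose2(rowSums(tab)))         # pairs together in labeling a
  pairs_b <- sum(choose2(colSums(tab)))         # pairs together in labeling b

  expected <- pairs_a * pairs_b / choose2(sum(tab))  # agreement expected by chance
  maximum <- (pairs_a + pairs_b) / 2                 # largest attainable agreement
  (pairs_both - expected) / (maximum - expected)
}

# Identical partitions, up to a relabeling of the clusters, give ARI = 1.
truth <- c(1, 1, 1, 2, 2, 2)
relabeled <- c("b", "b", "b", "a", "a", "a")
ari_by_hand(truth, relabeled)
```

Because the index depends only on which cells are grouped together, the cluster numbers returned by ‘Mclust’ do not need to match the coding of ‘y’.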

We fit a linear factor model (LFM) using the same number of factors.

fac <- Factorm(X, q = q)  # same number of factors as GFM
hH_lfm <- fac$hH          # estimated factor matrix from LFM
gmm2 <- Mclust(hH_lfm, G = 7)
ARI_lfm <- adjustedRandIndex(gmm2$classification, y)
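
To give an idea of what a linear factor model extracts, the sketch below estimates factors by principal components on simulated data. It assumes a PCA-type estimator with the normalization H'H/n = I; this is an illustration in base R and may differ in detail from what ‘Factorm’ does internally. All names ending in ‘_sim’ or ‘_pca’ are made up for the example.

```r
# Simulate data from a linear factor model: X = H B' + noise.
set.seed(1)
n_sim <- 100
p_sim <- 20
q_sim <- 3
H_sim <- matrix(rnorm(n_sim * q_sim), n_sim, q_sim)
B_sim <- matrix(rnorm(p_sim * q_sim), p_sim, q_sim)
X_sim <- H_sim %*% t(B_sim) +
  matrix(rnorm(n_sim * p_sim, sd = 0.1), n_sim, p_sim)

# Center the columns, then keep the leading q singular vectors.
Xc <- scale(X_sim, center = TRUE, scale = FALSE)
sv <- svd(Xc, nu = q_sim, nv = q_sim)

# Factors scaled so that crossprod(hH_pca) / n_sim is the identity matrix.
hH_pca <- sqrt(n_sim) * sv$u
# Matching loadings, so that hH_pca %*% t(hB_pca) is the rank-q fit of Xc.
hB_pca <- sv$v %*% diag(sv$d[1:q_sim], q_sim) / sqrt(n_sim)

dim(hH_pca)  # one row per observation, one column per factor
```

The resulting factor matrix plays the same role as ‘hH_lfm’ above: it is the low-dimensional input passed to the clustering step.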

We compare the two ARIs visually with a bar chart.

library(ggplot2)
df1 <- data.frame(ARI = c(ARI_gfm, ARI_lfm),
                  Method = factor(c("GFM", "LFM")))
# One bar per method, with bar height equal to the ARI
ggplot(data = df1, aes(x = Method, y = ARI, fill = Method)) +
  geom_bar(position = "dodge", stat = "identity", width = 0.5)