LUCIDus: Latent Unknown Clustering with Integrated Data

Cheng Peng


About LUCIDus

The LUCIDus package aims to give researchers in the genetic epidemiology community an integrative R tool for jointly estimating latent (unknown) clusters or subgroups from multi-omics data and phenotypic traits.

This package implements the statistical method proposed in the research paper “Latent Unknown Clustering Integrating Multi-omics Data with Phenotypic Traits” (LUCID).¹ By improving subtype classification, LUCID can support better diagnosis and prognosis, and it could help enable efficient targeted treatment and personalized medicine.

LUCIDus Functions and Model Fitting

Three main functions, est_lucid(), sem_lucid(), and tune_lucid(), are currently available for model fitting and feature selection. Model outputs can be summarized with summary_lucid() and visualized with plot_lucid(), and predictions can be made with pred_lucid().

Here are the descriptions of LUCIDus functions:

Function          Description
est_lucid()       Estimates latent clusters from multi-omics data, with or without the outcome of interest, and produces an IntClust object
sem_lucid()       Estimates latent clusters with the supplemented E-M algorithm for statistical inference
tune_lucid()      Runs a grid search, using parallel computing, over the three tuning parameters and selects the combination with the minimum model BIC
summary_lucid()   Summarizes the results of integrative clustering from an IntClust object
plot_lucid()      Produces a Sankey diagram of the integrative clustering results from an IntClust object
pred_lucid()      Predicts latent clusters and outcomes from an IntClust object and new data
def_initial()     Defines initial values of model parameters for est_lucid(), sem_lucid(), and tune_lucid() fitting
def_tune()        Defines selection options and tuning parameters for est_lucid(), sem_lucid(), and tune_lucid() fitting
def_tol()         Defines tolerance settings for est_lucid(), sem_lucid(), and tune_lucid() fitting
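The selection rule behind tune_lucid() is a grid search over three tuning parameters that keeps the combination with the smallest BIC, where BIC = -2 log L + k log n. The following is a minimal sketch of that rule only. It is written in Python for illustration and is not the LUCIDus implementation. The parameter names p1, p2, p3 and the functions grid_search_bic and toy_fit are hypothetical. toy_fit stands in for fitting a penalized LUCID model at one penalty combination. Counting k as the number of nonzero estimates is also an assumption.

```python
import math
from concurrent.futures import ThreadPoolExecutor
from itertools import product


def bic(loglik, n_params, n_obs):
    """Bayesian information criterion: -2 log L + k log n (smaller is better)."""
    return -2.0 * loglik + n_params * math.log(n_obs)


def grid_search_bic(fit, grid_1, grid_2, grid_3, n_obs, max_workers=4):
    """Score every combination of three tuning parameters by BIC.

    `fit` is a hypothetical callable: fit(p1, p2, p3) -> (log_likelihood,
    number_of_nonzero_parameters). Returns (best_params, best_bic, all_scores).
    """
    combos = list(product(grid_1, grid_2, grid_3))

    def score(params):
        loglik, k = fit(*params)
        return bic(loglik, k, n_obs)

    # Evaluate the grid concurrently; threads keep the sketch self-contained,
    # whereas CPU-bound model fits would normally run in separate processes.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        scores = list(pool.map(score, combos))  # map preserves input order

    all_scores = dict(zip(combos, scores))
    best = min(all_scores, key=all_scores.get)
    return best, all_scores[best], all_scores


def toy_fit(p1, p2, p3):
    """Hypothetical stand-in for a penalized fit: larger penalties zero out
    more parameters but also lower the log-likelihood."""
    total = p1 + p2 + p3
    n_nonzero = max(1, 20 - round(10 * total))
    loglik = -100.0 - 15.0 * total ** 2
    return loglik, n_nonzero


if __name__ == "__main__":
    grid = [0.0, 0.1, 0.2]
    best, best_bic, _ = grid_search_bic(toy_fit, grid, grid, grid, n_obs=500)
    print("selected penalties:", best, "BIC:", round(best_bic, 2))
```

The log n term charges each retained parameter. Heavier penalties are therefore selected only when the parameters they remove cost less in BIC than the log-likelihood they lose.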


Example: simulated data with 10 genetic features (5 causal) and 4 biomarkers (2 causal).
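The simulation setting is stated, but the data-generating process is not. One plausible way to produce data of that shape is sketched below in Python. The sketch assumes a LUCID-style structure in which genetic features G influence a latent cluster X, and X in turn shifts the biomarkers Z and the outcome Y. Several details are illustrative assumptions, not the settings used for the LUCIDus example:

- the two-cluster setup
- the allele frequency
- all effect sizes
- the function name simulate_lucid_like

```python
import math
import random


def simulate_lucid_like(n=1000, n_genetic=10, n_causal_g=5,
                        n_biomarkers=4, n_causal_z=2, seed=1):
    """Simulate G -> X -> (Z, Y) data with two latent clusters (X = 0 or 1).

    Every numeric setting below is an illustrative assumption, not a value
    taken from LUCIDus or the LUCID paper.
    """
    rng = random.Random(seed)
    maf = 0.3           # assumed minor allele frequency of every genetic feature
    beta_g = 0.5        # assumed log-odds effect of each causal genetic feature
    intercept = -1.5    # assumed baseline log-odds of belonging to cluster X = 1
    mu_shift = 1.5      # assumed mean shift of causal biomarkers when X = 1
    gamma = (0.0, 1.0)  # assumed outcome mean for X = 0 and X = 1

    data = {"G": [], "Z": [], "Y": [], "X": []}
    for _ in range(n):
        # Genotypes coded 0/1/2, drawn as Binomial(2, maf).
        g = [sum(rng.random() < maf for _ in range(2)) for _ in range(n_genetic)]
        # Only the first n_causal_g features enter the logistic cluster model.
        eta = intercept + beta_g * sum(g[:n_causal_g])
        x = 1 if rng.random() < 1.0 / (1.0 + math.exp(-eta)) else 0
        # Only the first n_causal_z biomarkers differ in mean between clusters.
        z = [rng.gauss(mu_shift * x if j < n_causal_z else 0.0, 1.0)
             for j in range(n_biomarkers)]
        # Continuous outcome whose mean depends on the latent cluster.
        y = rng.gauss(gamma[x], 1.0)
        data["G"].append(g)
        data["Z"].append(z)
        data["Y"].append(y)
        data["X"].append(x)
    return data
```

In a LUCID-style analysis only G, Z, and Y would be observed. The true labels X are returned here so that an estimated clustering can be compared against them.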

Final Remarks


  1. Under development, citation coming soon

  2. Initial parameter values and stopping criteria can be specified with def_initial() and def_tol(), respectively

  3. Note that tuning parameters in est_lucid() are defined with def_tune()

  4. Supported by the National Cancer Institute at the National Institutes of Health Grant P01 CA196569