The Open Source Directory of LucaOne

Welcome to the LucaOne project repository. You can explore our model, datasets, and downstream task materials below:

1) LucaOne: The model code and training scripts. The training scripts are in src/training_v2.0/.
2) TrainedCheckPoint: The trained model checkpoint of LucaOne.
3) LucaOneApp: A project for inferring the representation matrix or vector of a nucleic acid or protein sequence using LucaOne.
4) LucaOneTasks: The 8 downstream validation tasks and a general-purpose implementation for downstream tasks.
5) PreTrainingDataset: The pre-training dataset of LucaOne.
6) DownstreamTasksDataset: The datasets for the 8 downstream validation tasks.
7) DownstreamTasksTrainedModels: Trained models for all downstream tasks, based on LucaOne embeddings.
8) Others: Supplementary materials, including t-SNE results, statistics, central dogma analysis, and taxonomy tree data (ncbi_lineages_2023-04-24.csv.gz).