Welcome to the LucaOne project repository. You can explore our model, datasets, and downstream task materials below:
1) LucaOne
The model code and model training scripts. Training scripts are in src/training_v2.0/.
2) LucaOneApp
A project for inferring the representation matrix or vector of a nucleic acid or protein sequence using LucaOne.
3) LucaOneTasks
The 8 downstream validation tasks and a general-purpose implementation for downstream tasks.
4) Others
Supplementary materials, including t-SNE plots, statistics, central dogma analysis, and taxonomy tree data (ncbi_lineages_2023-04-24.csv.gz).
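The taxonomy tree data is distributed as a gzip-compressed CSV, so it can be inspected without unpacking it first. The snippet below is a minimal sketch using only the Python standard library; the helper name preview_gzipped_csv is hypothetical and not part of this repository, and the sketch assumes the file is UTF-8 encoded with a header row. It makes no assumption about which columns the file contains.

```python
import csv
import gzip
from itertools import islice


def preview_gzipped_csv(path, n_rows=5):
    """Return (header, rows) for the first n_rows data rows of a gzipped CSV.

    Hypothetical helper for illustration only; assumes UTF-8 text and
    that the first line of the file is a header row.
    """
    # "rt" opens the compressed stream in text mode, so csv can read it directly.
    with gzip.open(path, mode="rt", encoding="utf-8", newline="") as handle:
        reader = csv.reader(handle)
        header = next(reader, [])           # empty list if the file is empty
        rows = list(islice(reader, n_rows))  # read only the first n_rows rows
    return header, rows


# Example call (requires the file to be present locally):
# header, rows = preview_gzipped_csv("ncbi_lineages_2023-04-24.csv.gz")
```

Reading only the first few rows keeps memory use small, which helps when checking the column layout of a large file before loading it in full.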