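Invited Review for the Special Topic "Artificial Intelligence and Biomedicine": Technological Frontiers and Application Progress of Biological Large Models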

石金龙, 张哲, 戴安琳, 林恺, 何昆仑

doi:10.20059/j.cnki.pps.2025.04.1009

Progress in Physiological Sciences (生理科学进展), 2025, Vol. 56, Issue 3: 235-242


Invited Review


(1 Medical Innovation Research Department, Chinese PLA General Hospital, Beijing 100853, China; 2 National Engineering Research Center of Medical Big Data Application Technology, Beijing 100853, China; 3 Medical Engineering Laboratory, Beijing 100853, China; 4 School of Computer Science and Technology, Dalian University of Technology, Dalian 116024, China)

△ kunlunhe@plagh.org

Accepted date: 2025-03-12

Online published: 2025-06-25

Funding

Supported by the National Major Scientific Research Instrument Development Project (61927807)

Abstract

The rapid accumulation of biological big data, dominated by omics data such as genomics, transcriptomics, and proteomics, together with the rapid advance of artificial intelligence technologies represented by deep learning, has given rise to many kinds of biological large models. Complex deep-learning architectures, very large parameter counts and compute requirements, and massive pre-training corpora are the main characteristics of large-model technology. The types of pre-training data and the parameter count determine, to a certain extent, how capable a model is, while different model architectures support different kinds of downstream tasks. Over the past two years, many general-purpose and specialized large models have emerged for application scenarios that include the analysis and mining of omics data such as DNA, RNA, and protein sequences and single-cell expression atlases; structure prediction of biomacromolecules; novel drug design; and the elucidation of functional mechanisms. These models have shown great potential in biomedical research and translational applications. Taking into account the characteristics of different types of biological data and the needs of research and application, this paper outlines the features of biological data and the technical methods used to train biological large models on them. It then reviews the application progress of existing large models in biomedical research and in disease diagnosis and treatment, with the aim of offering new ideas for improving model capabilities and broadening their range of application.

Key words: biological large models; attention mechanism; sequence analysis; structure prediction; functional interpretation; synthetic design

Cite this article

石金龙, 张哲, 戴安琳, 林恺, 何昆仑. Invited Review for the Special Topic "Artificial Intelligence and Biomedicine": Technological Frontiers and Application Progress of Biological Large Models[J]. Progress in Physiological Sciences (生理科学进展), 2025, 56(3): 235-242. DOI: 10.20059/j.cnki.pps.2025.04.1009
