Chuang Ma et al., Plant Physiology, 2024

Published: 2024-04-29

Title: PEA-m6A: an ensemble learning framework for accurately predicting N⁶-methyladenosine modifications in plants

Authors: Minggui Song^#, Jiawen Zhao^#, Chujun Zhang, Chengchao Jia, Jing Yang, Haonan Zhao, Jingjing Zhai, Beilei Lei, Shiheng Tao, Siqi Chen, Ran Su, Chuang Ma^*

Abstract: N⁶-methyladenosine (m⁶A), the most prevalent modification in eukaryotic mRNAs, is involved in gene expression regulation and many RNA metabolism processes. Accurate prediction of m⁶A modification is important for understanding its molecular mechanisms in different biological contexts. However, most existing models have a limited range of application and are species-centric. Here we present PEA-m6A, a unified, modularized and parameterized framework that can streamline m⁶A-Seq data analysis for predicting m⁶A-modified regions in plant genomes. The PEA-m6A framework builds ensemble-learning-based m⁶A prediction models with statistic-based and deep-learning-driven features, achieving superior performance with an improvement of 6.7% to 23.3% in the area under the precision-recall curve compared with the state-of-the-art regional-scale m⁶A predictor WeakRM in 12 plant species. Notably, PEA-m6A can leverage knowledge from pretrained models via transfer learning, which improves the prediction accuracy of m⁶A modifications in small-sample training tasks. PEA-m6A also has a strong capability for generalization, making it suitable for within- and cross-species m⁶A prediction. Overall, this study presents a promising m⁶A prediction tool, PEA-m6A, with outstanding performance in terms of accuracy, flexibility, transferability, and generalization ability. PEA-m6A has been packaged using Galaxy and Docker technologies for ease of use and is publicly available at https://github.com/cma2015/PEA-m6A.
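The abstract reports its comparison with WeakRM as the area under the precision-recall curve. As a rough illustration of what that metric measures, the sketch below computes it with the step-wise "average precision" sum on made-up labels and scores. It is not code from PEA-m6A: the function name `average_precision`, the toy data, and the assumption that all scores are distinct are illustrative only, and the paper may estimate the area differently.

```python
from typing import Sequence


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Step-wise area under the precision-recall curve (average precision).

    Hypothetical helper for illustration; assumes labels are 0/1 and that
    the scores contain no ties.
    """
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("at least one positive label is required")

    # Rank the regions from the highest to the lowest predicted score.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)

    true_positives = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            true_positives += 1
            # Precision at the cutoff where this positive is recovered.
            precision_sum += true_positives / rank

    # Each positive contributes an equal recall step of 1 / n_pos.
    return precision_sum / n_pos


# Toy example: 1 = m6A-modified region, 0 = unmodified region (made-up values).
toy_labels = [1, 0, 1, 1, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
toy_ap = average_precision(toy_labels, toy_scores)
print(f"average precision on toy data: {toy_ap:.4f}")
```

A model that ranks every modified region above every unmodified one scores 1.0 under this definition, so a higher value means fewer false positives at each level of recall.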

Article link: https://doi.org/10.1093/plphys/kiae120