Molecule mining

2018-09-06 14:21:32     Category: Cheminformatics

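This page describes data mining for molecules. Because molecules can be represented by molecular graphs, the topic is closely related to graph mining and structured data mining. The main problem is how to represent molecules so that data instances can be told apart. One approach is to use chemical similarity measures, which have a long tradition in cheminformatics.

The typical way to compute chemical similarity is to use chemical fingerprints, but this discards the underlying information about molecular topology. Mining the molecular graphs directly avoids this problem. The same holds for the inverse QSAR problem, which is preferable for vectorial mappings.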
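The loss of topological information can be illustrated with a deliberately crude example. The sketch below is only an illustration under stated assumptions: molecules are toy heavy-atom graphs (atom labels plus bond index pairs), `composition_fingerprint` and `bond_profile` are hypothetical helpers introduced here, and real chemical fingerprints encode far more than element counts. The two isomers receive the same fingerprint, while a view that looks at the graph's bonds tells them apart.

```python
from collections import Counter

# Toy heavy-atom graphs: (atom labels, bonds as index pairs). Hydrogens are omitted.
ETHANOL = (["C", "C", "O"], [(0, 1), (1, 2)])          # C-C-O
DIMETHYL_ETHER = (["C", "O", "C"], [(0, 1), (1, 2)])   # C-O-C

def composition_fingerprint(molecule):
    """Hypothetical, very crude fingerprint: element counts only, no topology."""
    labels, _ = molecule
    return Counter(labels)

def bond_profile(molecule):
    """Graph-aware view: count each bond by the sorted labels of its two atoms."""
    labels, bonds = molecule
    return Counter(tuple(sorted((labels[a], labels[b]))) for a, b in bonds)

# The isomers share a fingerprint but differ once bonds are taken into account.
same_fingerprint = composition_fingerprint(ETHANOL) == composition_fingerprint(DIMETHYL_ETHER)
same_bonds = bond_profile(ETHANOL) == bond_profile(DIMETHYL_ETHER)
print(same_fingerprint, same_bonds)  # prints "True False"
```

Any similarity measure computed from the fingerprints alone would treat these two molecules as identical, which is the problem that mining the graph directly is meant to avoid.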

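Contents

  • 1 Coding(molecule_i, molecule_j≠i)
    • 1.1 Kernel methods
    • 1.2 Maximum common graph methods
  • 2 Coding(molecule_i)
    • 2.1 Molecular query methods
    • 2.2 Methods based on special architectures of neural networks
  • 3 See also
  • 4 References
    • 4.1 Further reading
  • 5 See also
  • 6 External links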

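Coding(molecule_i, molecule_j≠i)

Kernel methods

  • Marginalized graph kernel[1]
  • Optimal assignment kernel[2][3][4]
  • Pharmacophore kernel[5]
  • C++ (and R) implementation combining:
    • the marginalized graph kernel between labeled graphs[1]
    • extensions of the marginalized kernel[6]
    • Tanimoto kernels[7]
    • graph kernels based on tree patterns[8]
    • kernels based on pharmacophores for the 3D structure of molecules[5]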
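As a rough illustration of how a kernel of this family compares two molecular graphs, the sketch below computes a Tanimoto coefficient over sets of features extracted from each graph. It is a minimal sketch under assumptions, not the implementation from any cited work: molecules are toy heavy-atom graphs, and `walk_features` (label sequences of walks with up to `max_bonds` bonds) is a hypothetical feature extractor chosen here for brevity.

```python
def walk_features(molecule, max_bonds=2):
    """Set of atom-label sequences of all walks with 0..max_bonds bonds (assumed feature map)."""
    labels, bonds = molecule
    neighbours = {i: [] for i in range(len(labels))}
    for a, b in bonds:
        neighbours[a].append(b)
        neighbours[b].append(a)
    # Each entry is (index of the last atom, label sequence so far).
    walks = [(i, (labels[i],)) for i in range(len(labels))]
    features = {seq for _, seq in walks}
    for _ in range(max_bonds):
        walks = [(j, seq + (labels[j],)) for i, seq in walks for j in neighbours[i]]
        features.update(seq for _, seq in walks)
    return features

def tanimoto_kernel(features_a, features_b):
    """Tanimoto coefficient: shared features divided by all distinct features."""
    union = len(features_a | features_b)
    # Returning 1.0 for two empty feature sets is a convention assumed here.
    return len(features_a & features_b) / union if union else 1.0

ethanol = (["C", "C", "O"], [(0, 1), (1, 2)])          # C-C-O
dimethyl_ether = (["C", "O", "C"], [(0, 1), (1, 2)])   # C-O-C

similarity = tanimoto_kernel(walk_features(ethanol), walk_features(dimethyl_ether))
print(similarity)  # 6 shared features out of 10 distinct ones
```

Because the features are derived from walks on the graph rather than from element counts, the two isomers no longer look identical.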

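Maximum common graph methods

  • MCS-HSCS[9] (Highest Scoring Common Substructure (HSCS) ranking strategy for a single MCS)
  • Small Molecule Subgraph Detector (SMSD)[10]: a Java-based software library for calculating the maximum common subgraph (MCS) between small molecules. The MCS provides a measure of similarity or distance between two molecules. It is also used to screen drug-like compounds by finding hit molecules that share a common subgraph (substructure).[11]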
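The idea of deriving a similarity from a maximum common subgraph can be sketched as follows. This is a brute-force illustration for very small graphs and makes no claim about how SMSD or MCS-HSCS work internally: `mcs_size` and `mcs_similarity` are hypothetical helpers, bond orders and hydrogens are ignored, the common subgraph is induced but not required to be connected, and the Tanimoto-style ratio of atom counts is one common choice assumed here.

```python
def mcs_size(mol_a, mol_b):
    """Atom count of a maximum common induced subgraph, found by exhaustive backtracking."""
    labels_a, bonds_a = mol_a[0], {frozenset(b) for b in mol_a[1]}
    labels_b, bonds_b = mol_b[0], {frozenset(b) for b in mol_b[1]}
    best = 0

    def compatible(mapping, i, j):
        # Labels must match, and bonds to already-mapped atoms must agree on both sides.
        if labels_a[i] != labels_b[j]:
            return False
        return all(
            (frozenset((i, p)) in bonds_a) == (frozenset((j, q)) in bonds_b)
            for p, q in mapping.items()
        )

    def extend(i, mapping):
        nonlocal best
        best = max(best, len(mapping))
        # Stop at the end, or when mapping every remaining atom could not beat `best`.
        if i == len(labels_a) or len(mapping) + len(labels_a) - i <= best:
            return
        for j in range(len(labels_b)):
            if j not in mapping.values() and compatible(mapping, i, j):
                mapping[i] = j
                extend(i + 1, mapping)
                del mapping[i]
        extend(i + 1, mapping)  # also try leaving atom i unmapped

    extend(0, {})
    return best

def mcs_similarity(mol_a, mol_b):
    """Shared atoms divided by total distinct atoms; assumes non-empty molecules."""
    shared = mcs_size(mol_a, mol_b)
    return shared / (len(mol_a[0]) + len(mol_b[0]) - shared)

ethanol = (["C", "C", "O"], [(0, 1), (1, 2)])                 # C-C-O
propanol = (["C", "C", "C", "O"], [(0, 1), (1, 2), (2, 3)])   # C-C-C-O
dimethyl_ether = (["C", "O", "C"], [(0, 1), (1, 2)])          # C-O-C

print(mcs_similarity(ethanol, propanol))        # 0.75
print(mcs_similarity(ethanol, dimethyl_ether))  # 0.5
```

The exhaustive search grows exponentially with molecule size, so it is suitable only for toy inputs; dedicated tools use far more efficient algorithms.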

Coding(molecule_i)

Molecular query methods

  • Warmr[12][13]
  • AGM[14][15]
  • PolyFARM[16]
  • FSG[17][18]
  • MolFea[19]
  • MoFa/MoSS[20][21][22]
  • Gaston[23]
  • LAZAR[24]
  • ParMol[25] (contains MoFa, FFSM, gSpan, and Gaston)
  • optimized gSpan[26][27]
  • SMIREP[28]
  • DMax[29]
  • SAm/AIm/RHC[30]
  • AFGen[31]
  • gRed[32]
  • G-Hash[33]
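Judging by the titles of the cited references, several of the tools listed above mine substructures that occur frequently across a set of molecules. The sketch below shows that idea in a drastically simplified form and does not reproduce any of the listed algorithms: it is restricted to linear path fragments of at most three atoms, molecules are toy heavy-atom graphs, and `path_fragments` and `frequent_fragments` are hypothetical helpers.

```python
from collections import Counter

def path_fragments(molecule, max_atoms=3):
    """Label sequences of all simple paths with up to max_atoms atoms, direction-independent."""
    labels, bonds = molecule
    neighbours = {i: set() for i in range(len(labels))}
    for a, b in bonds:
        neighbours[a].add(b)
        neighbours[b].add(a)
    fragments = set()

    def grow(path):
        seq = tuple(labels[i] for i in path)
        fragments.add(min(seq, seq[::-1]))  # a path and its reverse are one fragment
        if len(path) < max_atoms:
            for nxt in neighbours[path[-1]]:
                if nxt not in path:
                    grow(path + [nxt])

    for start in range(len(labels)):
        grow([start])
    return fragments

def frequent_fragments(molecules, min_support):
    """Fragments contained in at least min_support molecules, with their support counts."""
    support = Counter()
    for molecule in molecules:
        support.update(path_fragments(molecule))  # each molecule counts once per fragment
    return {frag: n for frag, n in support.items() if n >= min_support}

molecules = [
    (["C", "C", "O"], [(0, 1), (1, 2)]),               # ethanol, C-C-O
    (["C", "C", "C", "O"], [(0, 1), (1, 2), (2, 3)]),  # propanol, C-C-C-O
    (["C", "O", "C"], [(0, 1), (1, 2)]),               # dimethyl ether, C-O-C
]

print(frequent_fragments(molecules, min_support=3))
```

With `min_support=3` only the fragments C, O, and C-O survive; lowering the threshold to 2 also admits C-C and C-C-O, which occur in the two alcohols.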

Methods based on special architectures of neural networks

  • BPZ[34][35]
  • ChemNet[36]
  • CCS[37][38]
  • MolNet[39]
  • Graph machines[40]

See also

  • Molecular Query Language
  • Chemical graph theory

References

  1. H. Kashima, K. Tsuda, A. Inokuchi, Marginalized Kernels Between Labeled Graphs, The 20th International Conference on Machine Learning (ICML2003), 2003.
  2. H. Fröhlich, J. K. Wegner, A. Zell, Optimal Assignment Kernels For Attributed Molecular Graphs, The 22nd International Conference on Machine Learning (ICML 2005), Omnipress, Madison, WI, USA, 2005, 225-232.
  3. H. Fröhlich, J. K. Wegner, A. Zell, Kernel Functions for Attributed Molecular Graphs - A New Similarity Based Approach To ADME Prediction in Classification and Regression, QSAR Comb. Sci., 2006, 25, 317-326. doi:10.1002/qsar.200510135
  4. H. Fröhlich, J. K. Wegner, A. Zell, Assignment Kernels For Chemical Compounds, International Joint Conference on Neural Networks 2005 (IJCNN'05), 2005, 913-918.
  5. P. Mahé, L. Ralaivola, V. Stoven, J.-P. Vert, The pharmacophore kernel for virtual screening with support vector machines, J. Chem. Inf. Model., 2006, 46, 2003-2014. doi:10.1021/ci060138m
  6. P. Mahé, N. Ueda, T. Akutsu, J.-L. Perret, J.-P. Vert, Extensions of marginalized graph kernels, Proceedings of the 21st ICML, 2004, 552-559.
  7. L. Ralaivola, S. J. Swamidass, H. Saigo, P. Baldi, Graph kernels for chemical informatics, Neural Networks, 2005, 18, 1093-1110. doi:10.1016/j.neunet.2005.07.009
  8. P. Mahé, J.-P. Vert, Graph kernels based on tree patterns for molecules, Machine Learning, 2009, 75 (1), 3-35. ISSN 0885-6125. doi:10.1007/s10994-008-5086-2
  9. J. K. Wegner, H. Fröhlich, H. Mielenz, A. Zell, Data and Graph Mining in Chemical Space for ADME and Activity Data Sets, QSAR Comb. Sci., 2006, 25, 205-220. doi:10.1002/qsar.200510009
  10. S. A. Rahman, M. Bashton, G. L. Holliday, R. Schrader, J. M. Thornton, Small Molecule Subgraph Detector (SMSD) toolkit, Journal of Cheminformatics, 2009, 1:12. doi:10.1186/1758-2946-1-12
  11. http://www.ebi.ac.uk/thornton-srv/software/SMSD/
  12. R. D. King, A. Srinivasan, L. Dehaspe, Warmr: a data mining tool for chemical data, J. Comput.-Aid. Mol. Des., 2001, 15, 173-181. doi:10.1023/A:1008171016861
  13. L. Dehaspe, H. Toivonen, R. D. King, Finding frequent substructures in chemical compounds, 4th International Conference on Knowledge Discovery and Data Mining, AAAI Press, 1998, 30-36.
  14. A. Inokuchi, T. Washio, T. Okada, H. Motoda, Applying the Apriori-based Graph Mining Method to Mutagenesis Data Analysis, Journal of Computer Aided Chemistry, 2001, 2, 87-92.
  15. A. Inokuchi, T. Washio, K. Nishimura, H. Motoda, A Fast Algorithm for Mining Frequent Connected Subgraphs, IBM Research, Tokyo Research Laboratory, 2002.
  16. A. Clare, R. D. King, Data mining the yeast genome in a lazy functional language, Practical Aspects of Declarative Languages (PADL2003), 2003.
  17. M. Kuramochi, G. Karypis, An Efficient Algorithm for Discovering Frequent Subgraphs, IEEE Transactions on Knowledge and Data Engineering, 2004, 16(9), 1038-1051.
  18. M. Deshpande, M. Kuramochi, N. Wale, G. Karypis, Frequent Substructure-Based Approaches for Classifying Chemical Compounds, IEEE Transactions on Knowledge and Data Engineering, 2005, 17(8), 1036-1050.
  19. C. Helma, T. Cramer, S. Kramer, L. de Raedt, Data Mining and Machine Learning Techniques for the Identification of Mutagenicity Inducing Substructures and Structure Activity Relationships of Noncongeneric Compounds, J. Chem. Inf. Comput. Sci., 2004, 44, 1402-1411. doi:10.1021/ci034254q
  20. T. Meinl, C. Borgelt, M. R. Berthold, Discriminative Closed Fragment Mining and Perfect Extensions in MoFa, Proceedings of the Second Starting AI Researchers Symposium (STAIRS 2004), 2004.
  21. T. Meinl, C. Borgelt, M. R. Berthold, M. Philippsen, Mining Fragments with Fuzzy Chains in Molecular Databases, Second International Workshop on Mining Graphs, Trees and Sequences (MGTS2004), 2004.
  22. T. Meinl, M. R. Berthold, Hybrid Fragment Mining with MoFa and FSG, Proceedings of the 2004 IEEE Conference on Systems, Man & Cybernetics (SMC2004), 2004.
  23. S. Nijssen, J. N. Kok, Frequent Graph Mining and its Application to Molecular Databases, Proceedings of the 2004 IEEE Conference on Systems, Man & Cybernetics (SMC2004), 2004.
  24. C. Helma, Predictive Toxicology, CRC Press, 2005.
  25. M. Wörlein, Extension and parallelization of a graph-mining-algorithm, Friedrich-Alexander-Universität, 2006.
  26. K. Jahn, S. Kramer, Optimizing gSpan for Molecular Datasets, Proceedings of the Third International Workshop on Mining Graphs, Trees and Sequences (MGTS-2005), 2005.
  27. X. Yan, J. Han, gSpan: Graph-Based Substructure Pattern Mining, Proceedings of the 2002 IEEE International Conference on Data Mining (ICDM 2002), IEEE Computer Society, 2002, 721-724.
  28. A. Karwath, L. De Raedt, SMIREP: predicting chemical activity from SMILES, J. Chem. Inf. Model., 2006, 46, 2432-2444. doi:10.1021/ci060159g
  29. H. Ando, L. Dehaspe, W. Luyten, E. Craenenbroeck, H. Vandecasteele, L. Meervelt, Discovering H-Bonding Rules in Crystals with Inductive Logic Programming, Mol. Pharm., 2006, 3, 665-674. doi:10.1021/mp060034z
  30. P. Mazzatorta, L. Tran, B. Schilter, M. Grigorov, Integration of Structure-Activity Relationship and Artificial Intelligence Systems To Improve in Silico Prediction of Ames Test Mutagenicity, J. Chem. Inf. Model., 2006, ASAP alert. doi:10.1021/ci600411v
  31. N. Wale, G. Karypis, Comparison of Descriptor Spaces for Chemical Compound Retrieval and Classification, ICDM, 2006, 678-689.
  32. A. Gago Alonso, J. E. Medina Pagola, J. A. Carrasco-Ochoa, J. F. Martínez-Trinidad, Mining Frequent Connected Subgraphs Reducing the Number of Candidates, Proc. of ECML-PKDD, 2008, 365-376.
  33. X. Wang, J. Huan, A. Smalter, G. Lushington, Application of Kernel Functions for Accurate Similarity Search in Large Chemical Databases, BMC Bioinformatics, 2010, 11 (Suppl 3), S8.
  34. I. I. Baskin, V. A. Palyulin, N. S. Zefirov, A methodology for searching direct correlations between structures and properties of organic compounds by using computational neural networks, Doklady Akademii Nauk SSSR, 1993, 333 (2), 176-179.
  35. I. I. Baskin, V. A. Palyulin, N. S. Zefirov, A Neural Device for Searching Direct Correlations between Structures and Properties of Organic Compounds, J. Chem. Inf. Comput. Sci., 1997, 37 (4), 715-721. doi:10.1021/ci940128y
  36. D. B. Kireev, ChemNet: A Novel Neural Network Based Method for Graph/Property Mapping, J. Chem. Inf. Comput. Sci., 1995, 35 (2), 175-180. doi:10.1021/ci00024a001
  37. A. M. Bianucci, A. Micheli, A. Sperduti, A. Starita, Application of Cascade Correlation Networks for Structures to Chemistry, Applied Intelligence, 2000, 12 (1-2), 117-146. doi:10.1023/A:1008368105614
  38. A. Micheli, A. Sperduti, A. Starita, A. M. Bianucci, Analysis of the Internal Representations Developed by Neural Networks for Structures Applied to Quantitative Structure-Activity Relationship Studies of Benzodiazepines, J. Chem. Inf. Comput. Sci., 2001, 41 (1), 202-218. PMID 11206375. doi:10.1021/ci9903399
  39. O. Ivanciuc, Molecular Structure Encoding into Artificial Neural Networks Topology, Roumanian Chemical Quarterly Reviews, 2001, 8, 197-220.
  40. A. Goulon, T. Picot, A. Duprat, G. Dreyfus, Predicting activities without computing descriptors: Graph machines for QSAR, SAR and QSAR in Environmental Research, 2007, 18 (1-2), 141-153. PMID 17365965. doi:10.1080/10629360601054313

Further reading

  • Schölkopf, B., K. Tsuda and J. P. Vert: Kernel Methods in Computational Biology, MIT Press, Cambridge, MA, 2004.
  • R.O. Duda, P.E. Hart, D.G. Stork, Pattern Classification, John Wiley & Sons, 2001. ISBN 0-471-05669-3
  • Gusfield, D., Algorithms on Strings, Trees, and Sequences: Computer Science and Computational Biology, Cambridge University Press, 1997. ISBN 0-521-58519-8
  • R. Todeschini, V. Consonni, Handbook of Molecular Descriptors, Wiley-VCH, 2000. ISBN 3-527-29913-0

See also

  • Quantitative structure-activity relationship (QSAR)
  • ADME
  • Partition coefficient

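External links

  • Small Molecule Subgraph Detector (SMSD): a Java-based software library for calculating the maximum common subgraph (MCS) between small molecules
  • 5th International Workshop on Mining and Learning with Graphs, 2007
  • Overview for 2006
  • Molecule mining (basic chemical expert systems)
  • ParMol and master's thesis documentation - Java - open source - distributed mining - benchmark algorithm library
  • TU München - Kramer group
  • Molecule mining (advanced chemical expert systems)
  • DMax Chemistry Assistant - commercial software
  • AFGen - software for generating fragment-based descriptors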
