Novel Methodology for Improving the Generalization Capability of Chemo-Informatics Deep Learning Models

Sandjakoska, Ljubinka; Madevska Bogdanova, Ana; Pejov, Ljupcho

Novel Methodology for Improving the Generalization Capability of Chemo-Informatics Deep Learning Models

Date Issued

2022-10

Author(s)

Sandjakoska, Ljubinka

Madevska Bogdanova, Ana

Pejov, Ljupcho

Abstract

In the last decade, the research community has implemented various applications of deep learning concepts to solve quite advanced tasks in chemistry, ranging from computational chemistry to materials and drug design and even chemical synthesis problems at both laboratory and industrial – grades. Because of the advantages as a high-performance prediction tool in molecular simulations, deep learning is becoming far more than just a temporary trend. Instead, it is foreseen as a tool that will be essential to employ throughout tackling a range of different issues in chemical sciences in the nearest future. In this paper, we propose a novel methodology for regularization of deep neural networks used in chemo-informatics. The methodology consists of four blocks: Class of initial conditions; Orthogonalization, Activation and Standardization. Three graph-based architectures are developed: deep tensor neural network, directed acyclic graph and convolutional graph model. Graph-based models are more convenient for modeling molecules since the molecules and their features are often naturally represented by graphs. Several experiments are obtained on datasets from MoleculeNet aggregator: QM7, QM8, QM9, ToxCast, Tox21, ClinTox, BBBP and SIDER, for predicting geometric, energetic, electronic and thermodynamic properties on small molecules. The obtained results outperform some of the published references and give directions for further improvement. As a particular example, in one of the architectures, we have reduced mean absolute error by more than 12 times compared to conventional regression models, and more than 3 times in comparison to deep networks where the proposed methodology is not implemented.

Subjects

Deep learning Regular...