AUTOMATION OF CHEMICAL FORMULA COMPARISON

UDC 004.89

N.A. Vayngolts, G.A. Vereshchak, D.M. Korobkin, S.A. Fomenkov

To establish the novelty of a patented technology, a patent office expert must compare the patent application with existing patents and make sure that the invention has no analogues. When analyzing patents in chemical classes, the expert has to compare chemical formulas that can be given in different formats: MOL, InChI, SMILES, structural formula, or molecular fingerprint. This paper describes the development of software that automates three procedures: conversion between the various representations of a chemical formula, comparison of the chemical formulas in the patent application with those in existing patents, and identification of analogous patents based on the results of that comparison. Chemical formulas are compared by calculating the similarity of their molecular fingerprints using the Tanimoto coefficient. The similarity coefficient of two patents is calculated from the maximum values of the Tanimoto coefficient over the set of compared chemical compounds from the patents. The software is written in Java using the Spring Framework, the H2 database, and the Chemistry Development Kit (CDK). In testing, the software showed high recall and precision of patent search based on chemical formulas, with minimal information loss and noise.
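
The fingerprint comparison described in the abstract can be illustrated with a minimal sketch. It uses only java.util.BitSet as a stand-in for a molecular fingerprint and does not call CDK. The class name TanimotoSketch, the helper bits, and in particular the aggregation in patentSimilarity (averaging, over the compounds of the application, the maximum Tanimoto value found among the compounds of the patent) are assumptions made for illustration; the abstract states only that the patent coefficient is based on the maximum Tanimoto values.

```java
import java.util.BitSet;
import java.util.List;

final class TanimotoSketch {

    // Tanimoto coefficient of two bit fingerprints:
    // (bits set in both) / (bits set in at least one).
    static double tanimoto(BitSet a, BitSet b) {
        BitSet both = (BitSet) a.clone();
        both.and(b);
        BitSet either = (BitSet) a.clone();
        either.or(b);
        int union = either.cardinality();
        // Assumption: two empty fingerprints are scored as 0.0.
        return union == 0 ? 0.0 : (double) both.cardinality() / union;
    }

    // Hypothetical patent-level score: for each compound of the application,
    // take the best Tanimoto match among the patent's compounds, then average.
    // The averaging step is an assumption, not taken from the paper.
    static double patentSimilarity(List<BitSet> application, List<BitSet> patent) {
        if (application.isEmpty() || patent.isEmpty()) {
            return 0.0;
        }
        double sum = 0.0;
        for (BitSet query : application) {
            double best = 0.0;
            for (BitSet candidate : patent) {
                best = Math.max(best, tanimoto(query, candidate));
            }
            sum += best;
        }
        return sum / application.size();
    }

    // Illustration helper: builds a toy fingerprint from set bit positions.
    static BitSet bits(int... positions) {
        BitSet set = new BitSet();
        for (int p : positions) {
            set.set(p);
        }
        return set;
    }
}
```

For example, the toy fingerprints bits(1, 2, 3, 4) and bits(3, 4, 5, 6) share two bits out of six set overall, so their Tanimoto coefficient is 2/6. In a real pipeline the bit sets would come from a fingerprinting library rather than being written by hand.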

Keywords: chemical formula, SMILES, InChI, MDL Molfile, molecular fingerprint, patent database analysis, Tanimoto coefficient.

Full text:
VayngoltsSoatori_4_18_1.pdf