Molecular descriptor

Molecular descriptors play a fundamental role in chemistry, pharmaceutical sciences, environmental protection policy, and health research, as well as in quality control. They are the means by which molecules, thought of as real bodies, are transformed into numbers, allowing mathematical treatment of the chemical information contained in the molecule. Todeschini and Consonni defined a molecular descriptor as follows:

"The molecular descriptor is the final result of a logic and mathematical procedure which transforms chemical information encoded within a symbolic representation of a molecule into a useful number or the result of some standardized experiment."[1]

Under this definition, molecular descriptors fall into two main categories: experimental measurements, such as log P, molar refractivity, dipole moment, polarizability and, in general, additive physico-chemical properties; and theoretical molecular descriptors, which are derived from a symbolic representation of the molecule and can be further classified according to the type of molecular representation used.[2]

The main classes of theoretical molecular descriptors are:
  1. 0D-descriptors (e.g. constitutional descriptors, count descriptors)
  2. 1D-descriptors (e.g. lists of structural fragments, fingerprints)
  3. 2D-descriptors (e.g. graph invariants)
  4. 3D-descriptors (e.g. 3D-MoRSE descriptors, WHIM descriptors, GETAWAY descriptors, quantum-chemical descriptors, and size, steric, surface and volume descriptors)
  5. 4D-descriptors (e.g. those derived from GRID or CoMFA methods, and VolSurf)

The spread of artificial intelligence and machine learning into computational chemistry has also led to various attempts to discover new descriptors or to identify the most predictive ones among a set of candidates.[3][4]
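
As an illustration of 0D-descriptors, the following Python sketch derives a few constitutional descriptors (total atom count, heavy-atom count, carbon count and molecular weight) from the molecular formula alone, without any information about connectivity or geometry. It assumes a flat formula string with no parentheses, charges or isotopes, and uses a small illustrative table of standard atomic weights; the descriptor names nAtoms, nHeavy, nC and MW are hypothetical labels chosen for this example, not the names used by any particular software.

```python
import re
from collections import Counter

# Standard atomic weights (g/mol) for a few common elements; illustrative subset only.
ATOMIC_WEIGHTS = {"H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999, "S": 32.06, "Cl": 35.45}

def parse_formula(formula):
    """Count atoms in a flat formula such as 'C2H6O' (assumes no parentheses or charges)."""
    counts = Counter()
    for symbol, n in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[symbol] += int(n) if n else 1
    return counts

def constitutional_descriptors(formula):
    """Return a few 0D descriptors (hypothetical names): atom counts and molecular weight."""
    counts = parse_formula(formula)
    n_atoms = sum(counts.values())
    n_heavy = n_atoms - counts.get("H", 0)  # heavy atoms = all non-hydrogen atoms
    mw = sum(ATOMIC_WEIGHTS[el] * n for el, n in counts.items())
    return {"nAtoms": n_atoms, "nHeavy": n_heavy, "nC": counts.get("C", 0), "MW": round(mw, 3)}

print(constitutional_descriptors("C2H6O"))  # ethanol
```

For ethanol this gives 9 atoms, 3 heavy atoms, 2 carbon atoms and a molecular weight of about 46.07 g/mol. Because only the formula is used, every isomer of a given formula receives exactly the same values.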

Invariance properties of molecular descriptors

The invariance properties of molecular descriptors can be defined as the ability of the calculation algorithm to give a descriptor value that is independent of the particular characteristics of the molecular representation, such as atom numbering or labeling, the spatial reference frame, or the molecular conformation. Invariance to atom numbering or labeling is regarded as a minimal basic requirement for any descriptor.[citation needed]
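
Invariance to atom numbering can be checked directly for a 2D-descriptor. The Python sketch below uses the Wiener index (the sum of topological distances between all pairs of atoms in the hydrogen-depleted molecular graph) as an example of a graph invariant. It computes the index for 2-methylbutane and recomputes it under every possible renumbering of the five carbon atoms. The bond-list representation and the particular starting numbering are assumptions made for this example.

```python
from collections import deque
from itertools import permutations

def wiener_index(n_atoms, bonds):
    """Sum of shortest-path (topological) distances over all unordered atom pairs."""
    adj = {i: set() for i in range(n_atoms)}
    for a, b in bonds:
        adj[a].add(b)
        adj[b].add(a)
    total = 0
    for src in range(n_atoms):
        # Breadth-first search gives the topological distance from src to every atom.
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        # Count each unordered pair only once.
        total += sum(d for atom, d in dist.items() if atom > src)
    return total

def renumber(bonds, mapping):
    """Apply a new atom numbering: old index i becomes mapping[i]."""
    return [(mapping[a], mapping[b]) for a, b in bonds]

# Hydrogen-depleted graph of 2-methylbutane: chain 0-1-2-3 with a methyl carbon (4) on atom 1.
bonds = [(0, 1), (1, 2), (2, 3), (1, 4)]
values = {wiener_index(5, renumber(bonds, p)) for p in permutations(range(5))}
print(values)  # {18}
```

All 120 numberings give the same value, 18. By contrast, a quantity such as "the number of neighbours of atom 0" changes with the numbering and would therefore not qualify as a descriptor.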

Two other important invariance properties are translational invariance and rotational invariance: the descriptor value must not change under any translation or rotation of the molecule in the chosen reference frame. These two properties are required of 3D-descriptors.[citation needed]
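
Translational and rotational invariance can be illustrated in the same way for a simple 3D size descriptor. The Python sketch below uses the unweighted radius of gyration (the root-mean-square distance of the atoms from their centroid), which depends only on the internal geometry of the molecule, and contrasts it with the mean x-coordinate of the atoms, which depends on the reference frame. The three-atom coordinates are hypothetical toy values, and a rotation about the z-axis stands in for a general rotation.

```python
import math

def centroid(coords):
    """Arithmetic mean of the atomic positions."""
    n = len(coords)
    return tuple(sum(p[k] for p in coords) / n for k in range(3))

def radius_of_gyration(coords):
    """Unweighted radius of gyration: RMS distance of the atoms from their centroid."""
    cx, cy, cz = centroid(coords)
    sq = [(x - cx) ** 2 + (y - cy) ** 2 + (z - cz) ** 2 for x, y, z in coords]
    return math.sqrt(sum(sq) / len(coords))

def mean_x(coords):
    """Frame-dependent quantity, used here only as a counterexample."""
    return centroid(coords)[0]

def rotate_z(coords, theta):
    """Rotate all atoms by theta radians about the z-axis."""
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y, s * x + c * y, z) for x, y, z in coords]

def translate(coords, dx, dy, dz):
    """Shift all atoms by the same vector."""
    return [(x + dx, y + dy, z + dz) for x, y, z in coords]

# Hypothetical toy coordinates (in ångström) for a bent three-atom molecule.
coords = [(0.0, 0.0, 0.0), (0.96, 0.0, 0.0), (-0.24, 0.93, 0.0)]
moved = translate(rotate_z(coords, 0.7), 5.0, -2.0, 1.5)

print(math.isclose(radius_of_gyration(coords), radius_of_gyration(moved)))  # True
print(mean_x(coords), mean_x(moved))  # the two values differ
```

The radius of gyration is unchanged because rotations and translations preserve every distance from an atom to the centroid, whereas the mean x-coordinate simply follows the molecule through the reference frame.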

Degeneracy of molecular descriptors

Degeneracy refers to the tendency of a descriptor to take the same value for different molecules; a descriptor with low degeneracy is able to avoid such coincidences. Descriptors can show no degeneracy at all, or low, intermediate, or high degeneracy. For example, the number of atoms in a molecule and the molecular weight are highly degenerate descriptors, while 3D-descriptors usually show low or no degeneracy at all.[citation needed]
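
The difference in degeneracy can be made concrete with the three structural isomers of pentane (C5H12). They share the same molecular formula, so the atom count and the molecular weight cannot tell them apart, whereas a topological descriptor such as the Wiener index takes a different value for each. The Python sketch below represents each isomer by its formula and its hydrogen-depleted graph (with an arbitrary atom numbering, assumed for the example) and counts how many distinct values each descriptor takes across the three isomers.

```python
from collections import deque

def wiener_index(n_atoms, bonds):
    """Sum of topological distances over all unordered pairs of atoms."""
    adj = {i: [] for i in range(n_atoms)}
    for a, b in bonds:
        adj[a].append(b)
        adj[b].append(a)
    total = 0
    for src in range(n_atoms):
        # Breadth-first search from src yields topological distances.
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(d for atom, d in dist.items() if atom > src)
    return total

# Formula and hydrogen-depleted graph (carbon atoms only) of each C5H12 isomer.
ISOMERS = {
    "n-pentane": ({"C": 5, "H": 12}, [(0, 1), (1, 2), (2, 3), (3, 4)]),
    "2-methylbutane": ({"C": 5, "H": 12}, [(0, 1), (1, 2), (2, 3), (1, 4)]),
    "2,2-dimethylpropane": ({"C": 5, "H": 12}, [(0, 1), (0, 2), (0, 3), (0, 4)]),
}

atom_counts = {name: sum(formula.values()) for name, (formula, _) in ISOMERS.items()}
wiener = {name: wiener_index(formula["C"], bonds) for name, (formula, bonds) in ISOMERS.items()}

print(len(set(atom_counts.values())), len(set(wiener.values())))  # 1 3
print(wiener)  # {'n-pentane': 20, '2-methylbutane': 18, '2,2-dimethylpropane': 16}
```

This does not make the Wiener index non-degenerate in general: for larger molecules, many different structures share the same Wiener index value.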

Basic requirements for optimal descriptors

  1. Should have structural interpretation
  2. Should have good correlation with at least one property
  3. Should preferably discriminate among isomers
  4. Should be possible to apply to local structure
  5. Should be possible to generalize to "higher" descriptors
  6. Should be simple
  7. Should not be based on experimental properties
  8. Should not be trivially related to other descriptors
  9. Should be possible to construct efficiently
  10. Should use familiar structural concepts
  11. Should change gradually with gradual changes in structures
  12. Should have the correct size dependence, if related to the molecule size

Software for molecular descriptor calculation

The following table lists a selection of commercial and free descriptor calculation tools.

Name | Number of descriptors | Fingerprints | CLI | GUI | KNIME | Comments | License | Website
--- | --- | --- | --- | --- | --- | --- | --- | ---
alvaDesc[5][6] | 5799 | Yes | Yes | Yes | Yes | Available for Windows, Linux and macOS | Proprietary, commercial | https://www.alvascience.com/alvadesc/
Dragon[7] | 5270 | Yes | Yes | Yes | Yes | Discontinued | Proprietary, commercial | https://chm.kode-solutions.net/products_dragon.php
Mordred[8] | 1826 | No | Yes | No | No | Based on RDKit | Free open source | https://github.com/mordred-descriptor
PaDEL-descriptor[9] | 1875 | Yes | Yes | Yes | Yes | Based on CDK | Free open source | http://www.yapcwsoft.com/dd/padeldescriptor/

References

  1. ^ Todeschini, Roberto; Consonni, Viviana (2000). Handbook of Molecular Descriptors. Methods and Principles in Medicinal Chemistry. Wiley. doi:10.1002/9783527613106. ISBN 978-3-527-29913-3.
  2. ^ Mauri, Andrea; Consonni, Viviana; Todeschini, Roberto (2017). "Molecular Descriptors". Handbook of Computational Chemistry. Springer International Publishing. pp. 2065–2093. doi:10.1007/978-3-319-27282-5_51. ISBN 978-3-319-27282-5.
  3. ^ Mueller, Tim; Kusne, Aaron Gilad; Ramprasad, Rampi (2016-04-01). "Machine Learning in Materials Science". In Parrill, Abby L.; Lipkowitz, Kenny B. (eds.). Reviews in Computational Chemistry. Vol. 29 (1st ed.). Wiley. pp. 186–273. doi:10.1002/9781119148739.ch4. ISBN 978-1-119-10393-6.
  4. ^ Ghiringhelli, Luca M.; Vybiral, Jan; Levchenko, Sergey V.; Draxl, Claudia; Scheffler, Matthias (2015-03-10). "Big Data of Materials Science: Critical Role of the Descriptor". Physical Review Letters. 114 (10). 105503. arXiv:1411.7437. Bibcode:2015PhRvL.114j5503G. doi:10.1103/PhysRevLett.114.105503. PMID 25815947.
  5. ^ Mauri, Andrea (2020). "alvaDesc: A Tool to Calculate and Analyze Molecular Descriptors and Fingerprints". Methods in Pharmacology and Toxicology. New York, NY: Springer US. pp. 801–820. doi:10.1007/978-1-0716-0150-1_32. ISBN 978-1-0716-0149-5. ISSN 1557-2153. S2CID 213896490.
  6. ^ Mauri, Andrea; Bertola, Matteo (2022). "Alvascience: A New Software Suite for the QSAR Workflow Applied to the Blood–Brain Barrier Permeability". International Journal of Molecular Sciences. 23 (21): 12882. doi:10.3390/ijms232112882. PMC 9655980. PMID 36361669.
  7. ^ Mauri, A.; Consonni, V.; Pavan, M.; Todeschini, R. (2006). "Dragon software: An easy approach to molecular descriptor calculations". MATCH Communications in Mathematical and in Computer Chemistry. 56 (2): 237–248.
  8. ^ Moriwaki, H.; Tian, Y. S.; Kawashita, N.; Takagi, T. (2018). "Mordred: A molecular descriptor calculator". Journal of Cheminformatics. 10 (1): 1–14. doi:10.1186/s13321-018-0258-y.
  9. ^ Yap, C. W. (2011). "PaDEL-descriptor: An open source software to calculate molecular descriptors and fingerprints". Journal of Computational Chemistry. doi:10.1002/jcc.21707.

Further reading

  • Roberto Todeschini and Viviana Consonni, Molecular Descriptors for Chemoinformatics (2 volumes), Wiley-VCH, 2009.
  • Mati Karelson, Molecular Descriptors in QSAR/QSPR, John Wiley & Sons, 2000.
  • James Devillers and Alexandru T. Balaban (Eds.), Topological indices and related descriptors in QSAR and QSPR. Taylor & Francis, 2000.
  • Lemont Kier and Lowell Hall, Molecular structure description. Academic Press, 1999.
  • Alexandru T. Balaban (Ed.), From chemical topology to three-dimensional geometry. Plenum Press, 1997.