Cosian Documentation

Summary :


I) User Guide

II) CoSiAn

III) Employed Software

IV) References


User guide :


To use Cosian, you can either launch a similarity analysis directly from the webserver page, or go to the download section and download the scripts to run Cosian on your personal computer under a Linux distribution.

Webserver use :

To use Cosian, you will need to provide :
-> A "Reference" structure, formatted in MOLFILE v3000. This file must contain only ONE structure.
-> A "Bank" of molecules to screen, formatted in MOLFILE v3000. This file can contain many different structures.
-> A "Similarity Combination" : Tanimoto, Shape, RMSD, Hybrid or Fast. The Hybrid combination performs better than the other combinations and is therefore the default.
-> Optionally, if you wish to perform molecular superimposition, tick the "Superimposition option" and select the threshold that the "Consensus" similarity score must exceed.
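
The input requirements above can be checked locally before submitting. The following is a minimal sketch in Python and is not part of Cosian. It assumes that each V3000 record has a counts line ending in "V3000" and is closed by an "M  END" line, as in the MOLFILE format. The function names and the example record are hypothetical.

```python
def count_v3000_structures(text):
    """Return (number of structures, number of V3000 counts lines)."""
    lines = [line.rstrip() for line in text.splitlines()]
    # Every molfile record is terminated by a line reading exactly "M  END".
    n_structures = sum(1 for line in lines if line == "M  END")
    # The counts line of a V3000 record ends with the version tag "V3000".
    n_v3000 = sum(1 for line in lines if line.endswith(" V3000"))
    return n_structures, n_v3000


def check_reference(text):
    """Raise ValueError unless text holds exactly one V3000 structure."""
    n_structures, n_v3000 = count_v3000_structures(text)
    if n_structures != 1:
        raise ValueError(f"expected 1 structure, found {n_structures}")
    if n_v3000 != n_structures:
        raise ValueError("the structure is not in MOLFILE v3000 format")


# Hypothetical single-atom record, used only to exercise the checks.
EXAMPLE = "\n".join([
    "",
    "  demo",
    "",
    "  0  0  0     0  0            999 V3000",
    "M  V30 BEGIN CTAB",
    "M  V30 COUNTS 1 0 0 0 0",
    "M  V30 BEGIN ATOM",
    "M  V30 1 C 0 0 0 0",
    "M  V30 END ATOM",
    "M  V30 END CTAB",
    "M  END",
])

if __name__ == "__main__":
    check_reference(EXAMPLE)  # passes silently for a valid reference
    print(count_v3000_structures(EXAMPLE))
```

The same counting function can be applied to a "Bank" file, where any number of structures is accepted as long as each one is in v3000 format.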

Result Analysis :

-> Your results are displayed in a table. The Consensus score is the final score given by Cosian.
-> In the Zscore visualisation, similarity scores are transformed into z-scores, which show how many standard deviations each metric score lies from the mean.
-> You can click on a molecule name to display the molecule in the JSmol viewer. You can also visualize superimposed structures.
-> On the result page, you can download your results in .csv or .mol2 format.
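
To illustrate the z-score transformation mentioned above, here is a minimal sketch in Python. It assumes the population standard deviation computed over all scores of one metric. Whether Cosian uses the population or the sample standard deviation is not stated here, so this is an illustration only, and the function name is hypothetical.

```python
from statistics import mean, pstdev


def z_scores(scores):
    """Convert raw similarity scores of one metric into z-scores."""
    mu = mean(scores)
    sigma = pstdev(scores)  # population standard deviation (assumption)
    if sigma == 0:
        # All scores are identical, so none deviates from the mean.
        return [0.0 for _ in scores]
    return [(s - mu) / sigma for s in scores]


if __name__ == "__main__":
    # Hypothetical scores for three molecules of a bank.
    print(z_scores([0.2, 0.4, 0.6]))
```

A positive z-score means that the molecule scores above the average of the bank for that metric, and a negative one means that it scores below.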

CoSiAn :


Among the different drug development strategies, virtual docking of a compound in a receptor and similarity analysis between different molecules are widely used in the early stages of drug development. The two methods rely on different assumptions : for virtual docking, the assumption is that finding a molecule with good affinity for its receptor may lead to the desired activity. For similarity analysis, the assumption is that similar molecules should have similar activity with respect to the receptor. In both cases, the results include false positives. This encouraged us to develop Cosian (Combinatorial Similarity Analysis), a pipeline that aims to improve conventional ligand-based similarity search by combining five different algorithms, each acting as a different similarity detector, to best represent the similarity landscape. Cosian was built and tested using the DUD-E database, and showed an overall increase in performance compared to the use of a single software.



Software :

ISIDA Fragmentor [1]

ISIDA Fragmentor was developed by the Laboratoire de Chémoinformatique, Chimie de la Matière Complexe, Université de Strasbourg, France. ISIDA Fragmentor is part of the ISIDA project, which aims to develop tools for descriptor calculation, navigation in chemical space, Quantitative Structure-Activity Relationship (QSAR) modeling and virtual screening.
Based on a series of graph algorithms, ISIDA Fragmentor is able to generate molecular fragment descriptors (of a user-defined size) from an SDF file containing molecular structures.
Each generated fragment is associated with its number of occurrences in each molecule, which allows the generation of database-dependent fragment descriptors.
Please find the link to the Laboratoire de Chémoinformatique, Strasbourg and to the ISIDA project.
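
To give an idea of what a fragment descriptor with occurrence counts looks like, here is a toy sketch in Python. It is not the ISIDA Fragmentor algorithm: it only enumerates linear atom paths of a user-defined size in a small molecular graph and counts them. The function name, the input representation (a list of atom symbols and a list of bonded index pairs) and the fragment naming are assumptions made for this illustration.

```python
from collections import Counter


def count_path_fragments(atoms, bonds, min_len=2, max_len=3):
    """Count linear atom-path fragments containing min_len..max_len atoms."""
    adjacency = {i: [] for i in range(len(atoms))}
    for a, b in bonds:
        adjacency[a].append(b)
        adjacency[b].append(a)

    counts = Counter()

    def extend(path):
        if len(path) >= min_len:
            symbols = [atoms[i] for i in path]
            # A path and its reverse are the same fragment: keep one spelling.
            key = min("-".join(symbols), "-".join(reversed(symbols)))
            counts[key] += 1
        if len(path) == max_len:
            return
        for neighbour in adjacency[path[-1]]:
            if neighbour not in path:
                extend(path + [neighbour])

    for start in range(len(atoms)):
        extend([start])

    # Every path was walked once from each end, so halve the counts.
    return Counter({key: n // 2 for key, n in counts.items()})


if __name__ == "__main__":
    # Ethanol heavy atoms: C-C-O
    print(count_path_fragments(["C", "C", "O"], [(0, 1), (1, 2)]))
```

Applied to every molecule of a file, such counts form one descriptor column per fragment, which is why the resulting descriptor set depends on the database being processed.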

OpenBabel [2]

OpenBabel, a free and open source project, is one of the most widely used programs for molecular format interconversion and file formatting.
Among its many functions, OpenBabel can compute fingerprints from a molecular structure. Many different fingerprint types are available, such as Extended Connectivity Fingerprint (ECFP) 2, 3 & 4 and MACCS fingerprints. In all cases, the generated output is a fixed-length bit string.
For similarity analysis purposes, converting the molecular structure to a bit string allows the calculation of the well-known Tanimoto coefficient, that is, the number of bits set in both fingerprints divided by the number of bits set in at least one of them.
OpenBabel is also well documented. If you wish to know more about it, use this link : Open Babel
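
The Tanimoto coefficient between two fixed-length bit strings can be sketched as follows in Python. This is a standalone illustration that does not call OpenBabel. The function name and the example fingerprints are hypothetical.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient between two equal-length strings of '0'/'1'."""
    if len(bits_a) != len(bits_b):
        raise ValueError("fingerprints must have the same length")
    a = int(bits_a, 2)
    b = int(bits_b, 2)
    common = bin(a & b).count("1")   # bits set in both fingerprints
    either = bin(a | b).count("1")   # bits set in at least one fingerprint
    # Two empty fingerprints share nothing: return 0.0 by convention here.
    return common / either if either else 0.0


if __name__ == "__main__":
    print(tanimoto("110100", "100110"))  # 2 shared bits out of 4 set -> 0.5
```

The coefficient ranges from 0 (no bit in common) to 1 (identical fingerprints).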

SHAFTS [3]

Abstract :

We developed a novel approach called SHAFTS (SHApe-FeaTure Similarity) for 3D molecular similarity calculation and ligand-based virtual screening. SHAFTS adopts a hybrid similarity metric combined with molecular shape and colored (labeled) chemistry groups annotated by pharmacophore features for 3D similarity calculation and ranking, which is designed to integrate the strength of pharmacophore matching and volumetric overlay approaches. A feature triplet hashing method is used for fast molecular alignment poses enumeration, and the optimal superposition between the target and the query molecules can be prioritized by calculating corresponding “hybrid similarities”. SHAFTS is suitable for large-scale virtual screening with single or multiple bioactive compounds as the query “templates” regardless of whether corresponding experimentally determined conformations are available. Two public test sets (DUD and Jain’s sets) including active and decoy molecules from a panel of useful drug targets were adopted to evaluate the virtual screening performance. SHAFTS outperformed several other widely used virtual screening methods in terms of enrichment of known active compounds as well as novel chemotypes, thereby indicating its robustness in hit compounds identification and potential of scaffold hopping in virtual screening.

-> Here is a link to the publication

LSalign [4]

Abstract :

Motivation :
Sequence-order independent structural comparison, also called structural alignment, of small ligand molecules is often needed for computer-aided virtual drug screening. Although many ligand structure alignment programs are proposed, most of them build the alignments based on rigid-body shape comparison which cannot provide atom-specific alignment information nor allow structural variation; both abilities are critical to efficient high-throughput virtual screening.

Results :
We propose a novel ligand comparison algorithm, LS-align, to generate fast and accurate atom-level structural alignments of ligand molecules, through an iterative heuristic search of the target function that combines inter-atom distance with mass and chemical bond comparisons. LS-align contains two modules of Rigid-LS-align and Flexi-LS-align, designed for rigid-body and flexible alignments, respectively, where a ligand-size independent, statistics-based scoring function is developed to evaluate the similarity of ligand molecules relative to random ligand pairs. Large-scale benchmark tests are performed on prioritizing chemical ligands of 102 protein targets involving 1 415 871 candidate compounds from the DUD-E (Database of Useful Decoys: Enhanced) database, where LS-align achieves an average enrichment factor (EF) of 22.0 at the 1% cutoff and the AUC score of 0.75, which are significantly higher than other state-of-the-art methods. Detailed data analyses show that the advanced performance is mainly attributed to the design of the target function that combines structural and chemical information to enhance the sensitivity of recognizing subtle difference of ligand molecules and the introduces of structural flexibility that help capture the conformational changes induced by the ligand–receptor binding interactions. These data demonstrate a new avenue to improve the virtual screening efficiency through the development of sensitive ligand structural alignments.

Availability and implementation

http://zhanglab.ccmb.med.umich.edu/LS-align/

-> Here is a link to the publication

LigCSRre [5]

Abstract :

The wwLigCSRre web server performs ligand-based screening using a 3D molecular similarity engine. Its aim is to provide an online versatile facility to assist the exploration of the chemical similarity of families of compounds, or to propose some scaffold hopping from a query compound. The service allows the user to screen several chemically diversified focused banks, such as Kinase-, CNS-, GPCR-, Ion-channel-, Antibacterial-, Anticancer- and Analgesic-focused libraries. The server also provides the possibility to screen the DrugBank and DSSTOX/Carcinogenic compounds databases. User banks can also been downloaded. The 3D similarity search combines both geometrical (3D) and physicochemical information. Starting from one 3D ligand molecule as query, the screening of such databases can lead to unraveled compound scaffold as hits or help to optimize previously identified hit molecules in a SAR (Structure activity relationship) project. wwLigCSRre can be accessed at http://bioserv.rpbs.univ-paris-diderot.fr/wwLigCSRre.html.

Please find a link to the related publication -> here <-

Bibliography :


[1] A. Varnek, D. Fourches, D. Horvath, O. Klimchuk, C. Gaudin, P. Vayer, V. Solov'ev, F. Hoonakker, I.V. Tetko, G. Marcou. ISIDA - Platform for Virtual Screening Based on Fragment and Pharmacophoric Descriptors. Current Computer-Aided Drug Design (2008) 4:191.

[2] Noel M O'Boyle, Michael Banck, Craig A James, Chris Morley, Tim Vandermeersch and Geoffrey R Hutchison. Open Babel: An open chemical toolbox. Journal of Cheminformatics (2011) 3:33. https://doi.org/10.1186/1758-2946-3-33

[3] Xiaofeng Liu, Hualiang Jiang, Honglin Li. SHAFTS: A Hybrid Approach for 3D Molecular Similarity Calculation. 1. Method and Assessment of Virtual Screening. J. Chem. Inf. Model. (2011) 51(9):2372–2385. DOI: 10.1021/ci200060s

[4] Jun Hu, Zi Liu, Dong-Jun Yu and Yang Zhang. LS-align: an atom-level, flexible ligand structural alignment algorithm for high-throughput virtual screening. Bioinformatics (2018) 34(13):2209–2218.

[5] Quintus F, Sperandio O, Grynberg J, Petitjean M, Tuffery P. Ligand scaffold hopping combining 3D maximal substructure search and molecular similarity. BMC Bioinformatics (2009) 10:245. DOI: 10.1186/1471-2105-10-245