Article ID: 1620 - Last Modified: October 12, 2011
Starting from a set of query molecules, I would like to find the most similar (nearest-neighbor) compounds in a larger database, based on 2D fingerprints. How should I approach this task?
This feature is available from the Canvas user interface at Applications → Similarity/Distance Screen.
For a larger database of compounds, you can also run the screen from the command line with the following set of commands. It is a good idea to test the workflow on a small example set of database compounds before scaling up to the full set.
- Generate fingerprints for your database of compounds. For example, with an SD file for the database, use the following command:
$SCHRODINGER/utilities/canvasFPGen -isd database.sdf -o database.fp -fptype molprint2D -atomtype 5 -minpath 2 -path 2 -scaling 0
- Generate fingerprints for your query molecules, using the same fingerprint settings as for the database so that the two sets are comparable. For example, with the query molecules in an SD file, use the following command:
$SCHRODINGER/utilities/canvasFPGen -isd query.sdf -o query.fp -fptype molprint2D -atomtype 5 -minpath 2 -path 2 -scaling 0
- Create the similarity matrix:
$SCHRODINGER/utilities/canvasFPMatrix -ifp database.fp -ifp2 query.fp -ocsv output_fpscreen.csv -metric tanimoto -filter cutoff > output.log
In this command, replace cutoff with a numeric value between 0 and 1. You may need to experiment with this value to filter out the least similar structures: for example, start with a higher value and decrease it if you need more structures.
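If you expect to rerun the screen with several cutoff values, it can help to generate the three command lines from a script. The following Python sketch only assembles and prints the commands shown above; it does not run them. The function name build_commands, its parameters, and the installation path /opt/schrodinger are illustrative assumptions and are not part of Canvas.

```python
import os

# Fingerprint settings copied verbatim from the canvasFPGen commands above.
FP_OPTIONS = ["-fptype", "molprint2D", "-atomtype", "5",
              "-minpath", "2", "-path", "2", "-scaling", "0"]


def build_commands(database_sdf, query_sdf, cutoff, schrodinger,
                   out_csv="output_fpscreen.csv"):
    """Return the three command lines (as argument lists) for one screening run.

    build_commands is a hypothetical helper, not a Canvas utility.
    """
    if not 0.0 <= cutoff <= 1.0:
        raise ValueError("cutoff should be between 0 and 1")
    fpgen = f"{schrodinger}/utilities/canvasFPGen"
    fpmatrix = f"{schrodinger}/utilities/canvasFPMatrix"
    return [
        # Step 1: fingerprints for the database.
        [fpgen, "-isd", database_sdf, "-o", "database.fp"] + FP_OPTIONS,
        # Step 2: fingerprints for the queries, with identical settings.
        [fpgen, "-isd", query_sdf, "-o", "query.fp"] + FP_OPTIONS,
        # Step 3: similarity matrix, filtered at the chosen cutoff.
        [fpmatrix, "-ifp", "database.fp", "-ifp2", "query.fp",
         "-ocsv", out_csv, "-metric", "tanimoto", "-filter", str(cutoff)],
    ]


if __name__ == "__main__":
    # /opt/schrodinger is a placeholder used only when SCHRODINGER is unset.
    root = os.environ.get("SCHRODINGER", "/opt/schrodinger")
    # Dry run: print the commands for a fairly strict starting cutoff.
    for argv in build_commands("database.sdf", "query.sdf", 0.8, root):
        print(" ".join(argv))
```

To execute rather than print, each argument list could be passed to subprocess.run, with stdout directed to output.log for the final step. Because the fingerprint files do not depend on the cutoff, only the third command needs to be repeated when you change it.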
The output is written to the CSV file output_fpscreen.csv. You can import this file into a spreadsheet and sort it by similarity to identify the compounds most similar to each query molecule.
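As an alternative to a spreadsheet, the ranking can be done with a short script. The Python sketch below is a minimal example under an assumed file layout, which this article does not confirm: a header row whose first cell is a corner label followed by the query names, then one row per database compound with its name in the first column and one similarity value per query. Blank cells are treated as pairs removed by the filter. Check your own output_fpscreen.csv and adjust the parsing if the layout differs. The function name top_hits is hypothetical.

```python
import csv
import io


def top_hits(csv_file, n=5):
    """Return {query_name: [(database_name, similarity), ...]}, best n per query.

    ASSUMED layout (not confirmed by the article): header row is
    "<corner>,query1,query2,..." and each later row is "db_name,sim1,sim2,...".
    Blank or non-numeric cells are skipped.
    """
    reader = csv.reader(csv_file)
    header = next(reader)
    queries = header[1:]
    hits = {q: [] for q in queries}
    for row in reader:
        if not row:
            continue
        db_name = row[0]
        for q, cell in zip(queries, row[1:]):
            try:
                hits[q].append((db_name, float(cell)))
            except ValueError:
                continue  # blank or non-numeric cell
    # Sort each query's hits from most to least similar and keep the top n.
    return {q: sorted(pairs, key=lambda p: p[1], reverse=True)[:n]
            for q, pairs in hits.items()}


if __name__ == "__main__":
    # Tiny made-up matrix standing in for output_fpscreen.csv.
    demo = io.StringIO(",q1,q2\nmolA,0.91,0.10\nmolB,0.35,\nmolC,0.72,0.88\n")
    for query, ranked in top_hits(demo, n=2).items():
        print(query, ranked)
```

For a real run, open the file with open("output_fpscreen.csv", newline="") and pass the file object to top_hits. If your file has queries in rows and database compounds in columns, the roles of the two axes in the result are simply swapped.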