Article ID: 1620 - Last Modified: October 12, 2011
Starting with a set of query molecules, I would like to find nearest-neighbor, similar compounds based on 2D fingerprints, from a larger database of compounds. How should I approach this task?
This feature is available from the Canvas user interface at Applications → Similarity/Distance Screen.
For a larger database of compounds, you can also run the task from the command line with the commands below. It is a good idea to test the procedure on a small example set of compounds before scaling up to the full database.
- Generate fingerprints for your database of compounds. For example, with an SD file for the database, use the following command:
$SCHRODINGER/utilities/canvasFPGen -isd database.sdf -o database.fp -fptype molprint2D -atomtype 5 -minpath 2 -path 2 -scaling 0
- Generate fingerprints for your query molecules. For example, with the query molecules in an SD file, use the following command:
$SCHRODINGER/utilities/canvasFPGen -isd query.sdf -o query.fp -fptype molprint2D -atomtype 5 -minpath 2 -path 2 -scaling 0
- Create the similarity matrix:
$SCHRODINGER/utilities/canvasFPMatrix -ifp database.fp -ifp2 query.fp -ocsv output_fpscreen.csv -metric tanimoto -filter cutoff > output.log
Replace cutoff in this command with a value between 0 and 1 to filter out the least similar structures. You may need to experiment with this value: for example, start with a higher value and decrease it if you need more structures.
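The three commands above can be chained in a small shell script. The following is a minimal sketch that uses only the options shown above. The function name run_screen, the variable names, and the cutoff of 0.7 in the example call are illustrative assumptions, not values recommended for Canvas.

```shell
#!/bin/sh
# Sketch: fingerprint a database and a query set, then screen by similarity.
# The function name, variable names, and example cutoff are assumptions.

# Fingerprint options copied from the commands above; both SD files use the same set.
FP_OPTS="-fptype molprint2D -atomtype 5 -minpath 2 -path 2 -scaling 0"

run_screen() {
    db_sdf=$1      # SD file with the database compounds
    query_sdf=$2   # SD file with the query molecules
    cutoff=$3      # similarity cutoff, a value between 0 and 1

    # Steps 1 and 2: generate fingerprints; stop if either command fails.
    "$SCHRODINGER/utilities/canvasFPGen" -isd "$db_sdf" -o database.fp $FP_OPTS || return 1
    "$SCHRODINGER/utilities/canvasFPGen" -isd "$query_sdf" -o query.fp $FP_OPTS || return 1

    # Step 3: create the similarity matrix, filtered by the cutoff.
    "$SCHRODINGER/utilities/canvasFPMatrix" -ifp database.fp -ifp2 query.fp \
        -ocsv output_fpscreen.csv -metric tanimoto -filter "$cutoff" > output.log
}

# Example call (commented out so that sourcing this file runs nothing):
# run_screen database.sdf query.sdf 0.7
```

To try a lower cutoff, call run_screen again with a smaller third argument. The fingerprint files are regenerated on each call in this sketch; for a large database you may prefer to run only the final command again.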
The output is written to the CSV file output_fpscreen.csv. You can import this file into a spreadsheet and sort it to identify the compounds most similar to each query molecule.
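If you prefer to stay on the command line, the sorting can also be done with standard Unix tools. The sketch below defines a hypothetical helper, top_hits, that prints the header row followed by the rows with the largest values in a chosen column. It assumes the file has exactly one header row and no commas inside fields. The column layout of output_fpscreen.csv is not described here, so inspect the file first to find the column that holds the similarity values you want to rank by.

```shell
#!/bin/sh
# Sketch: list the N rows with the highest value in one column of a CSV file.
# Assumes one header row and no commas inside fields; check your file first.
top_hits() {
    csv=$1   # CSV file to read
    col=$2   # 1-based index of the column that holds the similarity values
    n=$3     # number of rows to print

    # Keep the header row as the first line of output.
    head -n 1 "$csv"

    # Sort the data rows numerically on the chosen column, largest first,
    # and keep the first N. LC_ALL=C makes the decimal point a period.
    tail -n +2 "$csv" | LC_ALL=C sort -t, -k"$col","$col"nr | head -n "$n"
}
```

For example, if the similarity values turned out to be in the third column, top_hits output_fpscreen.csv 3 10 would list the ten highest-scoring rows.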
If you need additional help, please email us at help@schrodinger.com.

