I would like to select a set of 100K-200K new compounds (from vendor libraries) that are diverse with respect to each other and also diverse with respect to the compounds we already have. How should I do this?

You can do this using Canvas. You need to do diversity-based compound selection, using your existing library compounds to initialize the diversity-selection algorithm.

You can run canvasDBCS from the command line to do the selection. To see the options for this command, run it with no arguments.


You must use the "sphere" diversity selection algorithm (this is the default) and use the -ifp2 option to initialize the algorithm with your existing library compounds. The -ifp2 option overrides the -init option.
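To make the idea concrete, here is a minimal Python sketch of sphere-exclusion selection seeded with an existing library. It is an illustration only and not the code canvasDBCS runs: the function names, the Tanimoto distance, and the representation of fingerprints as sets of "on" bits are all assumptions made for this example. A candidate is kept only if it lies outside the exclusion sphere of every existing compound and of every candidate kept so far.

```python
from typing import FrozenSet, List, Sequence

# Assumption for this sketch: a fingerprint is the set of its "on" bit positions.
Fingerprint = FrozenSet[int]


def tanimoto_distance(a: Fingerprint, b: Fingerprint) -> float:
    """Return 1 - Tanimoto similarity for two bit-set fingerprints."""
    if not a and not b:
        return 0.0  # two empty fingerprints are treated as identical
    shared = len(a & b)
    return 1.0 - shared / (len(a) + len(b) - shared)


def sphere_exclusion(
    candidates: Sequence[Fingerprint],
    existing: Sequence[Fingerprint],
    radius: float,
) -> List[int]:
    """Return indices of candidates that are at least `radius` away from
    every existing compound and from every previously picked candidate."""
    centers = list(existing)  # the existing library seeds the sphere centers
    picked: List[int] = []
    for index, fp in enumerate(candidates):
        if all(tanimoto_distance(fp, center) >= radius for center in centers):
            centers.append(fp)  # the new pick excludes its own neighbors too
            picked.append(index)
    return picked


# Toy data (hypothetical): one in-house compound and four vendor compounds.
existing = [frozenset({1, 2, 3, 4})]
candidates = [
    frozenset({1, 2, 3, 5}),  # distance 0.4 from the in-house compound
    frozenset({10, 11, 12}),  # shares no bits with anything seen so far
    frozenset({10, 11, 13}),  # distance 0.5 from the previous candidate
    frozenset({20, 21}),      # shares no bits with anything seen so far
]

picked = sphere_exclusion(candidates, existing, radius=0.7)
print(picked)  # [1, 3]
```

This naive loop compares every candidate with every center, so it is far too slow for vendor libraries of realistic size. It is meant only to show why seeding the centers with the existing library makes the picks diverse with respect to both sets.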

You can also do this using the Canvas GUI in Suite 2010 and later. Load all the relevant compounds into the project, and then create a View containing only your existing library compounds. Generate fingerprints for all compounds. Then, under Applications → Diversity Analysis, set "Initialization method" to "existing structures" and select the View for your existing library compounds.

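You can adjust the "exclusion sphere size" GUI option (equivalent to the -d option on the command line) to control how many compounds are selected: a larger sphere excludes more neighbors and so returns fewer compounds, and a smaller sphere returns more.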
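The effect of the sphere size on the number of selected compounds can be seen in a toy example. The sketch below is not Canvas code: it uses plain integers in place of fingerprints and absolute difference in place of a fingerprint distance, and the function name count_selected is hypothetical. In practice you would rerun the selection with a few values of the sphere size until the count falls in your target range (here, 100K-200K).

```python
from typing import Dict, List, Sequence


def count_selected(values: Sequence[int], seeds: Sequence[int], radius: int) -> int:
    """Count how many values a greedy sphere-exclusion pass keeps when each
    kept value must be at least `radius` away from all seeds and earlier picks."""
    centers: List[int] = list(seeds)
    kept = 0
    for value in values:
        if all(abs(value - center) >= radius for center in centers):
            centers.append(value)
            kept += 1
    return kept


# Toy data (hypothetical): 99 candidates on a line, one existing compound at 0.
values = list(range(1, 100))
seeds = [0]

# Scan a few radii and record how many candidates survive at each one.
counts: Dict[int, int] = {r: count_selected(values, seeds, r) for r in (1, 5, 10, 25)}
for radius, kept in counts.items():
    print(f"radius={radius:>2}  selected={kept}")
```

On this data the count falls steadily as the radius grows. With real fingerprints the trend is the same, although a greedy selection is not guaranteed to be strictly monotonic in the radius, so it is worth checking the actual count after each run.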

