The increase in diversity achieved by Canvas Hole-Filling is further confirmed by a Bemis-Murcko [3] scaffold analysis of the collections used for the analyses shown in Figures 2b and 2c. When Hole-Filling is used, the combined library of 2000 compounds contains a total of 2256 unique Bemis-Murcko scaffolds, compared with only 1775 unique scaffolds when compounds are chosen randomly.
Merely adding diverse compounds to a library does not guarantee that holes in that library are filled. For example, the chosen compounds may simply occupy regions of chemical space that surround the library and fail to fill holes in its interior. To determine whether the desired behavior is occurring, a fingerprint-based Kohonen map [4] can be built from the reference library plus all 5000 compounds in the pool. This establishes the limits of a hypothetical chemical space and the maximum possible coverage that would result from adding every available compound. As shown in Figure 3a, the combined collection of 6000 compounds occupies 233 of 256 cells, which corresponds to 91% coverage of the chemical space. When this same map is applied to just the 1000 compounds in the reference library, the coverage is only 28% (72/256 cells), as illustrated in Figure 3b. By contrast, applying the map to the reference library plus the 1000 hole-filling compounds results in 88% coverage (225/256 cells), as shown in Figure 3c. Thus, choosing only 20% of the pool fills 153 of the 161 cells that the full pool could fill, that is, nearly every hole that could be filled, including interior ones.
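The coverage figures quoted above follow from simple cell counting, and the arithmetic can be sketched in a few lines of Python. This is only an illustrative sketch, not part of Canvas: the function names (`occupied_cells`, `coverage`, `fraction_of_holes_filled`) are hypothetical, and the code assumes that each compound has already been assigned to a map cell by some other tool. It also relies on the fact that the same map is applied to all three collections, so the occupied cells of the reference library are a subset of those of the larger collections.

```python
from typing import Hashable, Iterable, Set

TOTAL_CELLS = 256  # number of cells in the Kohonen map described in the text


def occupied_cells(assignments: Iterable[Hashable]) -> Set[Hashable]:
    """Return the distinct map cells that hold at least one compound.

    `assignments` holds one cell identifier per compound (hypothetical input;
    how compounds are mapped to cells is outside the scope of this sketch).
    """
    return set(assignments)


def coverage(n_occupied: int, n_cells: int = TOTAL_CELLS) -> float:
    """Fraction of map cells that contain one or more compounds."""
    return n_occupied / n_cells


def fraction_of_holes_filled(n_reference: int, n_augmented: int, n_maximum: int) -> float:
    """Fraction of fillable holes that the added compounds actually fill.

    Assumes all three counts come from the same map, so the occupied sets are
    nested: reference <= augmented <= maximum.
    """
    fillable = n_maximum - n_reference   # cells empty in the reference but reachable by the pool
    filled = n_augmented - n_reference   # cells newly occupied after adding compounds
    return filled / fillable


if __name__ == "__main__":
    # Occupied-cell counts taken from Figure 3a, 3b, and 3c.
    counts = {
        "reference + full pool": 233,
        "reference only": 72,
        "reference + hole-filling": 225,
    }
    for label, n in counts.items():
        print(f"{label}: {n}/{TOTAL_CELLS} cells = {coverage(n):.0%}")
    print(f"holes filled: {fraction_of_holes_filled(72, 225, 233):.0%}")
```

Measuring filled holes against the 161 cells that the pool can reach, rather than against all 256 cells, separates the performance of the selection method from the limits of the pool itself.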

Figure 3: Use of Kohonen maps to verify whether holes in a library are actually being filled. Red cells represent units of chemical space that contain one or more compounds, whereas white cells represent units of chemical space that are empty. (a) A map built from the reference library of 1000 compounds plus all 5000 compounds in the pool; (b) a map of only the reference library; (c) a map of the reference library plus 1000 hole-filling compounds.
Visualizing the Effects of Structure on Potency
Lead identification is typically followed by synthetic efforts that explore the chemical space around those leads, with the aim of improving potency and/or pharmacokinetic profiles and elucidating structure-activity relationships. If sufficient data are available, it may be possible to develop a predictive QSAR model that can help identify additional high-potency compounds. Models that allow chemists to visualize favorable and unfavorable structural characteristics are particularly useful in this regard, because they provide specific clues about how to modify an existing lead to improve potency. Canvas offers a versatile new tool for this purpose as well, which combines the unique qualities of Canvas fingerprints [5] with the power of kernel-based partial least-squares regression, or KPLS [6].
