Knowledge Base

Article ID: 980 - Last Modified: May 31, 2011

When should I use more bits to generate a fingerprint? There are options for 10-bit, 32-bit, and 64-bit address spaces. What is the advantage of running with more bits?

Using a small address space in a hashed fingerprint can result in many collisions. One serious consequence of collisions is an artificial increase in the average similarity between molecules. This increase is misleading because it is entirely due to unrelated fragments that happen to be setting the same bits in the fingerprint.
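The effect can be illustrated with a minimal sketch. It is not the Canvas hashing scheme: the SHA-256 based `fold` function, the fragment names, and the count of 250 fragments per molecule are all assumptions made for illustration. The two hypothetical molecules share no fragments, so any nonzero Tanimoto similarity is purely a collision artifact.

```python
import hashlib


def fold(fragment: str, address_bits: int) -> int:
    """Map a fragment string to a bit position in a 2**address_bits space.

    Assumption: a generic SHA-256 based hash, not the hash Canvas uses.
    """
    digest = hashlib.sha256(fragment.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % (2 ** address_bits)


def fingerprint(fragments, address_bits: int) -> set:
    """Return the set of 'on' bit positions for a collection of fragments."""
    return {fold(f, address_bits) for f in fragments}


def tanimoto(a: set, b: set) -> float:
    """Tanimoto similarity: shared on bits divided by all distinct on bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


# Two hypothetical molecules with no fragments in common.
mol_a = [f"A-frag-{i}" for i in range(250)]
mol_b = [f"B-frag-{i}" for i in range(250)]

similarity = {
    bits: tanimoto(fingerprint(mol_a, bits), fingerprint(mol_b, bits))
    for bits in (10, 32, 64)
}
for bits, value in similarity.items():
    print(f"{bits}-bit address space: Tanimoto = {value:.4f}")
```

With the 10-bit address space the unrelated fragments are expected to share a noticeable number of bits, while the larger address spaces should report a similarity at or very near zero.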

The conventional 10-bit address space provides only 2^10 = 1024 bits, so every feature must be mapped onto one of 1024 positions. With a 32-bit address space, linear fingerprints exhibit about one collision for every few thousand molecules fingerprinted. This is a considerable improvement over the conventional approach, where several colliding fragments typically arise from fingerprinting a single molecule. A 64-bit address space effectively eliminates collisions altogether for any reasonable number of molecules. You should therefore consider using a larger address space whenever collisions are a concern.
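A rough birthday-style estimate shows why the address space size matters so much. The sketch below assumes that features are hashed uniformly and independently, which may not hold for the actual Canvas hash, and the feature count is an assumed value. Linear fingerprints of real molecules can produce considerably more features, so the absolute rates will differ from those quoted above; the point is how the estimate scales with the number of bits.

```python
def expected_colliding_pairs(n_features: int, address_bits: int) -> float:
    """Expected number of feature pairs that land on the same bit.

    Assumes uniform, independent hashing: n * (n - 1) / 2 pairs, each
    colliding with probability 1 / 2**address_bits.
    """
    slots = 2 ** address_bits
    return n_features * (n_features - 1) / (2 * slots)


n_features = 250  # assumed number of distinct features in one molecule
for bits in (10, 32, 64):
    pairs = expected_colliding_pairs(n_features, bits)
    print(f"{bits}-bit: about {pairs:.3g} colliding pairs per molecule")
```

Each additional address bit halves the expected number of colliding pairs, so moving from 10 to 32 bits reduces it by a factor of 2^22.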

Because Canvas stores only the indices of the "on" bits, the storage requirements are not as large as might be expected. The conventional 1024-bit fingerprint is usually stored as a complete set of on and off bits, which takes 128 bytes. A 32-bit address space would need 512 MB of storage per molecule if all 2^32 bits were stored. If only the on bits are stored, and a molecule has, for example, 250 features, the fingerprint can be stored in 1000 bytes (250 indices of 4 bytes each), which is only about an order of magnitude more than the conventional 1024-bit fingerprint. (For a 64-bit address space, the storage is doubled.) Using a larger address space therefore takes more disk space. However, it does not, in general, significantly increase the time taken to generate or use the fingerprints. The exception is the use of fingerprints as a substructure query index (pre-filter), because there are fewer on bits to compare.
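The storage figures above can be reproduced with a short calculation. This is a sketch of the arithmetic only; it assumes each on-bit index is stored as a fixed-width integer as wide as the address space (4 bytes for 32 bits, 8 bytes for 64 bits), and the function names are hypothetical rather than part of Canvas.

```python
def dense_bytes(address_bits: int) -> int:
    """Bytes needed to store every bit of a 2**address_bits fingerprint."""
    return (2 ** address_bits) // 8


def sparse_bytes(n_on_bits: int, address_bits: int) -> int:
    """Bytes needed to store only the indices of the on bits.

    Assumption: one fixed-width integer per index, wide enough to hold
    any position in the address space.
    """
    bytes_per_index = (address_bits + 7) // 8
    return n_on_bits * bytes_per_index


n_on = 250  # example feature count used in the text
print("10-bit, all bits stored:", dense_bytes(10), "bytes")
print("32-bit, all bits stored:", dense_bytes(32) // 1024 ** 2, "MB")
print("32-bit, on bits only:   ", sparse_bytes(n_on, 32), "bytes")
print("64-bit, on bits only:   ", sparse_bytes(n_on, 64), "bytes")
```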

Keywords: Canvas
