Schrödinger's KNIME Extensions: Integrating our leading scientific programs through data pipelining
Many Schrödinger customers know Dr. Eyrich as the Prime
project leader. In addition to his involvement with Prime and his role
as Director of Strategic Software Development, Dr. Eyrich has also
overseen the creation of Schrödinger KNIME Extensions, which is now
available in beta. Here, Dr. Eyrich discusses the functionality users
will see in this first release of Schrödinger KNIME Extensions.
We are excited to announce the
recent beta release of Schrödinger KNIME Extensions, which allow
researchers to rapidly build and execute complex scientific workflows
using the KNIME interface. The Schrödinger nodes build upon the
existing KNIME infrastructure, which already contains a variety of useful
nodes for database reading/writing, statistical analysis,
data mining, model building, reporting, and more. With close to 100
nodes covering many core functionalities within the Schrödinger Suite,
as well as pre-packaged workflows that incorporate these nodes, we are
confident that the first release of Schrödinger KNIME Extensions provides a solid foundation
for researchers needing to rapidly prototype, validate, and deploy
robust computational workflows.
KNIME
has established itself as the leading open-source data pipelining tool,
and provides an ideal platform for researchers looking for a way to
combine best-of-breed technologies, both commercial and academic. KNIME
is developed at the University of Konstanz, Germany by the Chair for
Bioinformatics and Information Mining. Under the direction of Dr.
Michael Berthold, who uses KNIME extensively for research, a large
number of new data analysis methods developed at the University have
been integrated in KNIME.
Independent
of the Schrödinger Extensions, KNIME already includes over 100
processing nodes for data manipulation and mining. The KNIME
distribution also incorporates the complete set of analysis models from
the well-known Weka data-mining environment and includes plug-ins for running R scripts,
which gives users access to a vast library of statistical
routines. Results can be visualized with KNIME nodes
that provide interactive scatter plots, parallel coordinates,
and more. KNIME is based on the Eclipse platform, providing a modular
API that is easily extensible.
Additionally,
custom nodes and data types can be developed and integrated into KNIME
with modest effort, thus enabling the incorporation
of in-house applications and academic packages. Depending on the
complexity of the node or data type, this can be done in hours or even
minutes.
With
Schrödinger KNIME Extensions, the number of available nodes is almost
doubled, bringing a wealth of ligand- and structure-based chemistry
functionality. The current beta release of the Schrödinger KNIME
Extensions includes nodes for programs like Glide, Prime, Phase, MacroModel, and Jaguar.
This allows for a wide array of calculations to be performed, ranging
from rapid conformational searches to more detailed molecular mechanics
and quantum mechanics calculations. Complex workflows can be
constructed to bring molecules through a series of different programs
that compute energetic and structural quantities, which can then easily
be combined to build models and improve the accuracy of predictions.
Furthermore, a set of key functionalities from our upcoming
cheminformatics product, Canvas, supports the computation of many types of
fingerprints, similarity/diversity analysis, clustering, and more.
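For readers less familiar with fingerprint-based analysis, the idea behind similarity and clustering can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Canvas implementation or the code behind any KNIME node: fingerprints are modeled as sets of "on" bit positions, similarity is the widely used Tanimoto coefficient, and the clustering is a simple greedy "leader" scheme. The function names, molecule names, and bit positions are all hypothetical.

```python
from typing import Dict, FrozenSet, List

# A fingerprint is modeled here as the set of "on" bit positions (an assumption
# made for illustration; real fingerprint formats vary).
Fingerprint = FrozenSet[int]


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto similarity: shared bits divided by bits set in either fingerprint."""
    if not a and not b:
        return 1.0  # two empty fingerprints are treated as identical
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def leader_cluster(fps: Dict[str, Fingerprint], threshold: float) -> List[List[str]]:
    """Greedy leader clustering.

    Each molecule joins the first cluster whose leader (its first member) is at
    least `threshold` similar; otherwise it starts a new cluster.
    """
    clusters: List[List[str]] = []
    for name, fp in fps.items():
        for cluster in clusters:
            if tanimoto(fp, fps[cluster[0]]) >= threshold:
                cluster.append(name)
                break
        else:
            clusters.append([name])
    return clusters


# Toy data: made-up bit positions, not output from any real fingerprint method.
toy = {
    "mol_a": frozenset({1, 2, 3, 4}),
    "mol_b": frozenset({1, 2, 3, 5}),
    "mol_c": frozenset({10, 11, 12}),
    "mol_d": frozenset({10, 11, 13}),
}

clusters = leader_cluster(toy, threshold=0.5)
# One representative per cluster: here, simply the cluster leader.
representatives = [cluster[0] for cluster in clusters]
print(clusters)
print(representatives)
```

In a workflow, the representative of each cluster is the kind of result that would typically be passed on to a table or viewer for inspection. The leader scheme was chosen here only because it is short; production tools generally offer more sophisticated clustering methods.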
Schrödinger KNIME nodes support practical considerations as well. Nodes that incorporate LigPrep and Epik
functionality can convert molecules from 1D or 2D virtual libraries
into 3D structures suitable for both ligand- and structure-based
studies. Many nodes exist for protein preparation as well, including
the protein assignment code to optimize hydrogen bond networks. A
number of fragment-based nodes allow users to perform rule-based
fragmentation of existing molecules, or join fragments into molecules.
The QikProp
node computes ADME properties that can then be used for filtering, data
analysis, and model building. Additionally, a variety of visualization
tools are available to better explore molecules of interest and
associated data.
The
above is just a brief listing of what we already have available. Most
importantly, we are working hard to add much more functionality soon,
helping to make KNIME the premier modular data exploration tool.
Schrödinger
provides support for both the Schrödinger KNIME Extensions and
the KNIME platform itself. The KNIME developers also provide
support through a KNIME community
that contains useful information and discussion forums. We hope to see
many other groups, academic and commercial alike, create nodes to
expose the functionality of their programs to the KNIME community.
KNIME
can be downloaded with the recently released update of the Schrödinger
2007 Suite. Take a look at what we have and let us know what you think
or how you envision using it in your everyday work. Your feedback is
always appreciated and helps us expand the product’s capabilities in
the directions that are most useful for your research.
At
top, the KNIME interface containing a workflow that clusters molecules
based on calculated fingerprints. One of the end results of the
workflow is the interactive table populated with representative
structures that were output from the clustering job. Below, a close-up
on several nodes of a KNIME workflow that uses Schrödinger Extensions.
The drag-and-drop interface of the KNIME platform makes it relatively
easy and intuitive to create new workflows.