Schrödinger

– January 2011 Newsletters

Using KNIME for Workflow Automation: Questions and Answers
Dr. Jean-Christophe Mozziconacci, Schrödinger Applications Scientist

Schrödinger KNIME Extensions allow scientists to prototype, validate, automate, and deploy multi-step workflows. In this installment of KNIME Questions and Answers, KNIME expert and Schrödinger Applications Scientist Jean-Christophe Mozziconacci talks about how KNIME handles data between steps, and how to use loops within KNIME.

--

Q. It makes sense to me that I can build a KNIME workflow by using the output from one node as input to another. But how does KNIME handle the data between steps?

A. KNIME nodes usually accept input and report output in the form of a table, though there are exceptions for specific connectors (e.g., the model connector). Within Schrödinger workflows, there are usually one or more columns that contain structure files, with one structure file per row. Additional columns may contain structure-related properties.

In practice, the format of the data tables that KNIME uses is something you’ll seldom have to worry about. However, there are a couple of important things to bear in mind.

In these data tables, each cell with a structure file contains an entire structure file, including structure-related properties. If you ever need to put a newly calculated property into a separate column, the node Extract Maestro Properties will allow you to do this.

Additionally, the properties in structure files and the properties in other columns are not automatically kept in sync. If you want the structure files to reflect property information from the KNIME data table, you can use the node Set Maestro Properties.
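
The relationship between embedded properties and table columns can be pictured with a small Python sketch. This is a plain-Python model, not the KNIME or Schrödinger API: the dictionary layout, the property name r_user_score, and the functions extract_properties and set_properties are hypothetical stand-ins for what the two nodes do conceptually.

```python
import copy

# Hypothetical model of one table row: a structure cell that carries its own
# embedded properties, plus an ordinary table column alongside it.
row = {
    "structure": {"title": "ligand_1", "properties": {"r_user_score": -7.2}},
    "score_column": None,
}

def extract_properties(row, prop, column):
    """Copy an embedded structure property into a table column."""
    new_row = copy.deepcopy(row)
    new_row[column] = new_row["structure"]["properties"][prop]
    return new_row

def set_properties(row, column, prop):
    """Write a table column value back into the structure's properties."""
    new_row = copy.deepcopy(row)
    new_row["structure"]["properties"][prop] = new_row[column]
    return new_row

extracted = extract_properties(row, "r_user_score", "score_column")

# Editing the column does not touch the embedded property...
extracted["score_column"] = -8.0
stale = extracted["structure"]["properties"]["r_user_score"]

# ...until the value is explicitly written back.
synced = set_properties(extracted, "score_column", "r_user_score")
```

In this model the column and the embedded property diverge after the edit, which is the situation the second node is meant to resolve.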

--

Q. Can the output of any KNIME node be used as the input to any other KNIME node?

A. Any given node requires that the input table(s) have columns of the expected type. This helps to prevent calculation failures in the middle of your workflow. In other words, it may be necessary to check and possibly convert input table column types before connecting nodes.

In practice, this is rarely a problem: as long as each node receives the proper input, the workflow will run. For example, a Jaguar optimization on a SMILES string would fail, so the Jaguar node requires a Maestro format (i.e., 3-dimensional) structure file as input and does not accept SMILES strings.

To restate things a little more technically, KNIME uses strict column types in the data table, making it possible for nodes to reliably handle the input data. Column types frequently used for generic data manipulation include integers, strings, and doubles (i.e., double precision). Column types used specifically for molecular modeling include SMILES strings, SDF structure files, and Maestro format files.

It’s important to keep column types in mind when merging results (i.e., concatenating tables) or when you can’t find a column that you expect. If the column is not properly typed, it may not be accepted as input to a specific node.
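
The effect of strict column typing can be sketched in a few lines of Python. The Column class and the find_input_column function below are hypothetical illustrations, not part of KNIME; they only show why a node that needs a Maestro column rejects a table that offers SMILES.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Column:
    name: str
    type: str  # e.g. "SMILES", "SDF", "Maestro", "Integer", "Double", "String"

def find_input_column(columns, required_type):
    """Return the first column of the required type, or raise if none exists."""
    for column in columns:
        if column.type == required_type:
            return column
    available = ", ".join(f"{c.name} ({c.type})" for c in columns)
    raise TypeError(f"no {required_type} column; table has: {available}")

smiles_table = [Column("ID", "String"), Column("Structure", "SMILES")]
maestro_table = [Column("ID", "String"), Column("Structure", "Maestro")]

# A node that needs Maestro input accepts the second table but not the first.
accepted = find_input_column(maestro_table, "Maestro")
try:
    find_input_column(smiles_table, "Maestro")
    rejected = False
except TypeError:
    rejected = True
```

Failing at connection time, as in this sketch, is cheaper than discovering the mismatch halfway through a long calculation.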

--

Q. What are some examples of the column types that Schrödinger nodes use?

A. In addition to the usual cell types like text, integer, double, SDF, and SMILES, specific cell types have been created to handle modeling-specific and Schrödinger-specific objects:

Cell types for structures: Maestro format files (also called Maestro connection tables, or Maestro CTs), and protein sequences and alignments all have their own cell types.

Maestro data-type cells may contain either a single discrete structure or multiple structures (i.e., a “grouped CT”). Examples of grouped CTs might be a set of conformations for the same molecule, different Glide poses for the same ligand, or the tautomers and ionization states that LigPrep enumerated for a single compound.

Cell types for other types of modeling results: Glide grids, Phase pharmacophore hypotheses, and one-dimensional Canvas ligand fingerprints all have their own cell types.

--

Q. My structure files are in SDF format, but the node I want to use requires a Maestro format file. Is it possible to convert from one type of data to another?

A. Yes, the KNIME node repository has a node called Molecule-to-MAE, found under Schrödinger > Converters. Additionally, many nodes are available to perform the following conversions.

Convert basic types: The nodes Double to Integer, String to Number, and Number to String can all be found in the KNIME Workbench’s node repository under Data Manipulation > Column > Convert & Replace.

Turn a text cell into a structure type: These operations are handled by the String-To-Type node and the Molecule Type Cast node. They are found in the node repository under Schrödinger > Converters and Chemistry > Translators, respectively.

Convert structure types: Many such nodes are available under Schrödinger > Converters, including Molecule-to-MAE, MAE-to-Pdb, MAE-to-SD, MAE-to-Smiles, MAE-to-mol2, and SD-to-smiles. The Openbabel node is found under Chemistry > Translators, and several CDK conversion nodes are found under Chemistry > CDK > Translators.

Group and ungroup structures: The Group MAE and Ungroup MAE nodes are available to collapse structure files into grouped CTs, or expand them back into single-CT format. They are found under Schrödinger > Tools > Data Manipulation.

Convert Canvas objects: Nodes for converting many types of Canvas results (e.g., Convert Fingerprint to Bitvector) are available under Schrödinger > Cheminformatics > Utilities and Converters.
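
As a rough illustration of grouping and ungrouping, the Python sketch below collapses rows that share a key into one row holding several structures, then expands them again. It is an assumption-laden model: structures are plain strings, the "ligand" key column is invented, and group_structures and ungroup_structures are hypothetical names rather than the behavior of the actual nodes.

```python
from itertools import groupby

def group_structures(rows, key):
    """Collapse rows sharing a key into one row holding a list of structures."""
    ordered = sorted(rows, key=lambda r: r[key])
    return [
        {key: k, "structures": [r["structure"] for r in members]}
        for k, members in groupby(ordered, key=lambda r: r[key])
    ]

def ungroup_structures(grouped_rows, key):
    """Expand each grouped row back into one row per structure."""
    return [
        {key: g[key], "structure": s}
        for g in grouped_rows
        for s in g["structures"]
    ]

rows = [
    {"ligand": "A", "structure": "A_pose1"},
    {"ligand": "B", "structure": "B_pose1"},
    {"ligand": "A", "structure": "A_pose2"},
]
grouped = group_structures(rows, "ligand")   # one row per ligand
flat = ungroup_structures(grouped, "ligand")  # one row per structure again
```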

--

Q. I want to dock multiple compounds. Do I always need to use loops if I want to perform the same operation on multiple structures?

A. If you’re used to Python scripting, you might expect that a loop would be required to perform the same calculation on each of your input structures (like a multiple minimization or multi-ligand docking job). However, in most cases you wouldn’t actually need a loop to do so, because the nodes will sequentially operate on each row in the input table. For example, the Glide Multiple Ligand Docking and Phase Database Query nodes will operate on all rows in the input without requiring any special setup.
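
For readers coming from Python, the idea can be pictured as a function that already contains the per-row iteration. The sketch below is only an analogy: dock_ligand is a placeholder with a made-up score, and docking_node is a hypothetical name, not a Schrödinger or KNIME call.

```python
def dock_ligand(ligand):
    """Placeholder for a per-ligand calculation; returns a made-up score."""
    return -float(len(ligand))

def docking_node(table):
    """Apply the calculation to every row and return the output table."""
    return [
        {"ligand": row["ligand"], "score": dock_ligand(row["ligand"])}
        for row in table
    ]

# The caller hands over the whole table; no explicit loop is needed here.
input_table = [{"ligand": "CCO"}, {"ligand": "c1ccccc1"}]
output_table = docking_node(input_table)
```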

--

Q. Are there any scenarios where I would need to use loops? How can I do that?

A. Looping is still necessary for scenarios that are more complex than multi-ligand minimization, docking, and so on. Most loop-related nodes are available under the Loop support category in the node repository. Pre-built protocols can be found in the Meta category (e.g., the Loop x Times and Cross Validation nodes).

A loop typically begins with one of the start nodes and ends with the Loop End node. There are a variety of nodes available for starting loops, and the start node you choose will affect the way the loop is executed. One simple example is the node Counting Loop Start, which runs a loop a specified number of times. Another example is Schrödinger’s Row Iterator Loop Start, which operates on one row at a time.
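
The difference between the two start nodes can be sketched with Python generators. This is a conceptual model only: the function names mirror the node names for readability, but the code is hypothetical and says nothing about how KNIME implements loops internally.

```python
def counting_loop_start(table, times):
    """Yield the whole input table once per iteration."""
    for iteration in range(times):
        yield iteration, table

def row_iterator_loop_start(table):
    """Yield a one-row table per iteration."""
    for iteration, row in enumerate(table):
        yield iteration, [row]

def loop_end(iterations, body):
    """Run the loop body on each iteration and concatenate the results."""
    collected = []
    for iteration, chunk in iterations:
        for row in body(chunk):
            collected.append({**row, "iteration": iteration})
    return collected

def body(chunk):
    # Placeholder loop body: tag each row with the size of the chunk it saw.
    return [{**row, "chunk_size": len(chunk)} for row in chunk]

table = [{"name": "a"}, {"name": "b"}, {"name": "c"}]
counted = loop_end(counting_loop_start(table, times=2), body)
per_row = loop_end(row_iterator_loop_start(table), body)
```

In this model the counting loop sees all three rows on each of its two passes, while the row iterator runs three passes that each see a single row.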

--

Q. What are some example workflows that use loops?

A. For an example workflow that uses loops, see the workflow Docking and post-processing – Loop Over Docking Parameters. This workflow docks a given ligand set using a variety of different docking parameters, allowing you to compare the best-scoring results from different docking runs. It uses the node Table Row to Variable Loop Start to loop over docking parameters that are specified earlier in the workflow. Note that the loop takes place within a metanode, which is why it’s not immediately visible in the workflow.
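
The logic of looping over docking parameters can be outlined in Python as follows. Everything here is illustrative: run_docking is a placeholder with an invented scoring formula, and the parameter names precision and scaling are assumptions chosen for the example, not the settings used in the actual workflow.

```python
def run_docking(ligands, precision, scaling):
    """Placeholder docking run; the scoring formula is invented for illustration."""
    base = {"SP": -6.0, "XP": -7.0}[precision]
    return [
        {"ligand": lig, "score": base * scaling - 0.1 * i}
        for i, lig in enumerate(ligands)
    ]

# Each row of the parameter table drives one loop iteration.
parameter_table = [
    {"precision": "SP", "scaling": 0.8},
    {"precision": "SP", "scaling": 1.0},
    {"precision": "XP", "scaling": 1.0},
]
ligands = ["lig_1", "lig_2", "lig_3"]

best_per_run = []
for params in parameter_table:
    poses = run_docking(ligands, **params)
    best = min(poses, key=lambda p: p["score"])  # lower score is better here
    best_per_run.append({**params, **best})
```

The resulting table has one row per parameter set, which makes it straightforward to compare the best-scoring result of each run side by side.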

If you want to iterate over all the rows in the input data (like grouped Maestro CTs), you can use the node Row Iterator Loop Start. This is found in the node repository under Schrödinger > Tools > Utilities. Possible uses of this node might be to evaluate RMSDs for groups of conformers, or to extract diverse compounds from each cluster. For examples that use this node, see the workflows General Tools – Group Looper and Cheminformatics – Cluster by Fingerprint 1-4.

Figure: A workflow that runs a docking calculation using various sets of parameters. Blue nodes mark the beginning and end of the loop. This example is very similar to the workflow Docking and post-processing – Loop Over Docking Parameters, which embeds the loop within a “metanode.”

Copyright © 2005-2013 Schrödinger, LLC