LONI Pipeline 2011 Specifications

This document outlines some of the LONI Pipeline foci for development efforts and specifies features that need to be developed in 2011.

Priority 5 Action Items

Web-service invocation ( Priority: 5 )

We need to extend the current command-line-driven Pipeline tool invocation to include web-service-based (WSDL) resource utilization and module communication.

Pipeline Logs/Management/Viz of SW Tool Usage (Priority 4.5)

The Pipeline (PL) should track and disseminate accurate PL usage statistics to the community, so that tool developers can obtain, process and report usage statistics for their own tools. We need a quick way to provide accurate, real-time usage statistics and load patterns for all software tools currently run through the Pipeline (whether these tools are available in the PL Library or designed and executed independently by individual users). Note that this can't be done at the system level, because many system commands are misleading (e.g., for java -jar MyJar.jar, the system would report a Java invocation rather than a MyJar.jar run), so it may have to be done at the PL Server level. Examples of summary statistics to record and present to users:
  • (Total) CPU time: (daily, monthly)
  • Usage counts: (number of module instances executed, daily, monthly)
  • Number of different users (user count): (number of different users who use a module, daily, monthly).
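
The three summary statistics above could be aggregated on the server roughly as follows. This is only a minimal sketch: the class UsageStatsSketch, its methods and the idea of an in-memory execution log are assumptions made for illustration, not part of the existing Pipeline server. A daily summary uses the same date for both bounds; a monthly summary uses the first and last day of the month.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch; none of these names come from the Pipeline code base.
class UsageStatsSketch {

    // One finished module instance, as the server might log it.
    static final class Execution {
        final String module;
        final String user;
        final LocalDate day;
        final double cpuSeconds;

        Execution(String module, String user, LocalDate day, double cpuSeconds) {
            this.module = module;
            this.user = user;
            this.day = day;
            this.cpuSeconds = cpuSeconds;
        }
    }

    // Aggregated figures for one module over a date range.
    static final class Summary {
        final double cpuSeconds;   // total CPU time
        final int instanceCount;   // number of module instances executed
        final int userCount;       // number of different users

        Summary(double cpuSeconds, int instanceCount, int userCount) {
            this.cpuSeconds = cpuSeconds;
            this.instanceCount = instanceCount;
            this.userCount = userCount;
        }
    }

    private final List<Execution> log = new ArrayList<>();

    // Record the module name the server launched, not the system-level command.
    void record(String module, String user, LocalDate day, double cpuSeconds) {
        log.add(new Execution(module, user, day, cpuSeconds));
    }

    // Summarize one module between two dates, both inclusive.
    Summary summarize(String module, LocalDate from, LocalDate to) {
        double cpu = 0;
        int count = 0;
        Set<String> users = new HashSet<>();
        for (Execution e : log) {
            boolean inRange = !e.day.isBefore(from) && !e.day.isAfter(to);
            if (e.module.equals(module) && inRange) {
                cpu += e.cpuSeconds;
                count++;
                users.add(e.user);
            }
        }
        return new Summary(cpu, count, users.size());
    }
}
```

Keying the log on the module name known to the server, rather than on the process name, is what sidesteps the java -jar problem described above.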

Priority 4 Action Items

Pipeline consumption of web-services as modules in heterogeneous workflows, as well as pipeline as a web-service itself.

Web service-based modules

Additional Database Access Examples

We need to extend the PL IDA/XNAT DB interface to enable data retrieval from (and storage of derived data into) external databases (imaging and non-imaging). See the XCEDE framework, which facilitates common imaging DB communication.
  • Facilitate general informatics and genomics research
  • Examples of Informatics DBs: PDB, SCOP, GenBank
  • (Smartlines) Informatics data formats (processing and conversion?): gcg, embl, swissprot, fasta, ncbi, genbank, nbrf, codata, strider, clustal, phylip, acedb, msf, ig, staden, text, raw, asis

Distributed Pipeline Server (DPS)

We need to improve and distribute V.2 of the Distributed Pipeline Server (DPS) in Winter 2011. DPS V2:
  • New features
    • Auto-Updater ( Priority: 4 )
    • Classes of tools (e.g., neuroimaging, informatics, genomics, astronomy, etc.) ( Priority: 1 )
  • Improvements
    • Support for multiple Linux platforms (Fedora, Ubuntu, RedHat, etc.) ( Priority: 2 )
    • Reduce the list of requirements ( Priority: 1 )

Meta-data augmentation ( Priority: 5 )

Provide a mechanism for storing the derived meta-data, e.g., volume/shape measures.
  • Background: See the Kepler (PDF1 and PDF2) and Taverna (PDF1 and PDF2) approaches for metadata augmentation
  • Output Stream and Error Stream Readers - Allow Pipeline to access and use the content of a module's output/error stream in conditionals as well as in other kinds of modules. Enable metadata as an output parameter.
  • All meta-data provided at the beginning of the workflow (CSV or XML) needs to be accessible to all modules in the workflow by value and by reference
  • The following functions may be useful:
    • Get Value
      • String getMetaDataValue(SubjectID, FieldName)
      • String getMetaDataValue(FieldName), where the data-source-based iterator index supplies the Subject ID
    • Get Name
      • String[] getMetaDataName(SubjectID, Value)
      • String[] getMetaDataName(Value)
  • Reference a meta-data value or name using the following syntax
    • subject[$index].${variable[${index[$entry]}]}
    • Meta-data values can be of String or String[] types
    • We want to be able to reference by value or name each meta-data entry
    • Do we need Pipeline macros for this? Macros could be Excel-style functions that users write themselves, for instance to compute subject[$index1 + $index2].
  • Use case 1: Compute the mean age of all subjects in Group1 (say, Normals) using an external module that averages a list of numbers, then pass the mean-age value to the next module for processing along with all imaging and meta-data.
  • Finally, the complete meta-data-augmentation feature will allow parsing ASCII output results files, extracting specific formatted values and passing these downstream or saving them as additional meta-data for the initial dataset.
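
The lookup functions listed above, together with use case 1, can be sketched as follows. This is an illustration under stated assumptions: the class MetaDataSketch, the put and meanOf helpers and the nested-map storage are hypothetical, and only the names getMetaDataValue and getMetaDataName come from this specification. The averaging is done inline here purely for brevity; the use case itself calls for an external module.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a per-workflow meta-data table.
class MetaDataSketch {

    // subject ID -> (field name -> value), kept in insertion order
    private final Map<String, Map<String, String>> table = new LinkedHashMap<>();

    // Assumed helper for loading rows, e.g. from a parsed CSV file.
    void put(String subjectId, String fieldName, String value) {
        table.computeIfAbsent(subjectId, k -> new LinkedHashMap<>()).put(fieldName, value);
    }

    // Get Value: returns null when the subject or field is unknown.
    String getMetaDataValue(String subjectId, String fieldName) {
        Map<String, String> row = table.get(subjectId);
        return row == null ? null : row.get(fieldName);
    }

    // Get Name: all field names of one subject that hold the given value.
    String[] getMetaDataName(String subjectId, String value) {
        List<String> names = new ArrayList<>();
        Map<String, String> row = table.get(subjectId);
        if (row != null) {
            for (Map.Entry<String, String> e : row.entrySet()) {
                if (e.getValue().equals(value)) {
                    names.add(e.getKey());
                }
            }
        }
        return names.toArray(new String[0]);
    }

    // Use case 1: mean of a numeric field over all subjects in one group.
    // Returns NaN when no subject matches.
    double meanOf(String field, String groupField, String group) {
        double sum = 0;
        int n = 0;
        for (Map<String, String> row : table.values()) {
            String v = row.get(field);
            if (group.equals(row.get(groupField)) && v != null) {
                sum += Double.parseDouble(v);
                n++;
            }
        }
        return n == 0 ? Double.NaN : sum / n;
    }
}
```

The overloads without a SubjectID would delegate to these methods, using whatever subject the data-source iterator currently points at.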

Improvements for Annotations ( Priority: 4 )

  • Hide/show all annotations
  • Collapse/Expand all and individual annotations
  • Allow images to be attached to annotations

Grid Interfaces

  • Improve the Pipeline DRMAA interface to be fully functional for multiple schedulers such as GridWay ( Priority: 4 )
  • SGE external scheduler. SGE specs defined in workflow-properties. ( Priority: 1 )
  • Globus plug-in interface ( Priority: 2 )

Priority 3 Action Items

Graphical Programming Environment ( Priority: 3 )

Finalize/generalize the Loops/Iterators
  • Finalize the module-group looping mechanism
  • Nice ( simplified ) user interface for conditional modules and iterators

Social Pipeline Networking (SPN)

Extend the Pipeline environment to allow social user networking and open or controlled exchange of ideas, protocols and discussions within the PL environment. See this PDF spec (March 2011).

Priority 2 Action Items

Workflow-Diff (Priority 2)

Design new PL functionality that enables the comparison of two workflows at both high and low levels. This may work at the module or workflow definition level, much as various IDEs provide mechanisms for comparing two files (typically versions of the same file).
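
A module-level comparison could start from something like the sketch below. It assumes, purely for illustration, that each workflow has already been reduced to a map from module ID to a textual module definition; the class WorkflowDiffSketch and that representation are hypothetical and not an existing Pipeline API.

```java
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical sketch of a high-level (module-level) workflow diff.
class WorkflowDiffSketch {

    // Returns module ID -> "ADDED", "REMOVED" or "CHANGED", sorted by ID.
    // Modules that are identical in both workflows are omitted.
    static Map<String, String> diff(Map<String, String> oldWorkflow,
                                    Map<String, String> newWorkflow) {
        Map<String, String> result = new TreeMap<>();
        Set<String> ids = new TreeSet<>(oldWorkflow.keySet());
        ids.addAll(newWorkflow.keySet());
        for (String id : ids) {
            boolean inOld = oldWorkflow.containsKey(id);
            boolean inNew = newWorkflow.containsKey(id);
            if (!inOld) {
                result.put(id, "ADDED");
            } else if (!inNew) {
                result.put(id, "REMOVED");
            } else if (!Objects.equals(oldWorkflow.get(id), newWorkflow.get(id))) {
                result.put(id, "CHANGED");
            }
        }
        return result;
    }
}
```

A low-level diff would then only need to look inside the modules reported as CHANGED, for example by running a line-based text comparison on the two definitions.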

Scripting ( Priority: 2 )

  • (bash) scripts and makefiles with conditionals and iterations ( Student project )

Peer-to-peer (P2P/S2S) communication (Priority 2)

  • Workflow execution on multiple servers / Web-services (WSDL)? ( Priority: ? ) ( GLOBUS may resolve the need for S2S implementation within PL )

Pipeline Library Manager ( Priority: 2 )

Design, implement and distribute a new Java-based Pipeline Library Manager with a graphical user interface for managing the Pipeline module and workflow definitions on remote Pipeline servers or local user clients.

Pipeline Library Navigator

Support Functionality for Genomics and Informatics Research

The Pipeline is increasingly utilized beyond its initial neuroimaging application domain. One rapidly growing new domain of Pipeline applications is genomics sequencing and informatics analysis.

BIRN Capability Integration ( Priority: 1 )

Other

Improvements

  • Add a rudimentary mechanism (exec2pipe.java) that allows the skeletonization of the pipeline module wrapper (*.pipe) defining the basic interface to an existing executable (exec.exe). This can use man-page-like or "--help" interfaces to deduce the most likely invocation protocol for an executable and generate the pipeline module definition, which will subsequently require user fine-tuning.
  • Add comments and descriptions wherever they are missing to help new users understand Pipeline (this could include wizards, tips and/or auto-corrections when something is done wrong and Pipeline is able to fix it automatically)
  • Collaborative Workflows: User groups (friends), sticky-notes/messages, optional joint workflow control
  • Primary: Random() functionality (e.g., randomly permute subjects in groups, or RNG)
  • Start workflows at user specified days/times
  • Result Validation: Need to think of a schema for generating a movie for all subjects, showing axial/sagittal/coronal slices throughout the brain for validation of automated results. See SIG-FLOW 11/04/10 Notes.
  • LONI Viewer improvements:
    • LONI_Previewer (svn+ssh://svn.loni.ucla.edu/cvs/loni_dev_src/LONI_PL_Previewer) for thumbnail-like quick, efficient and dynamic navigation of results from large number of subjects (need a link to a project).
    • Expand the LONI Viewer to allow reading PDF, HTML, TXT, TSV, CSV, XLS and other file types.
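
For the Random() item in the list above, one possible shape is a seeded permutation of the subjects in a group, so that a randomized workflow can be re-run reproducibly. The class and method names below are hypothetical; the sketch relies only on the standard java.util.Collections.shuffle and java.util.Random.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical sketch of a reproducible random permutation of subjects.
class SubjectPermutationSketch {

    // Returns a shuffled copy and leaves the input list untouched.
    // The same seed always produces the same order for the same input.
    static List<String> permute(List<String> subjects, long seed) {
        List<String> copy = new ArrayList<>(subjects);
        Collections.shuffle(copy, new Random(seed));
        return copy;
    }
}
```

Storing the seed alongside the workflow would let a later run reproduce exactly the same subject order.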

Student Projects 2011

  • Student Projects 2011-2012
  • Student Projects 2011
  • Other Projects: PL Webstart Project
  • LONI_Previewer (svn+ssh://svn.loni.ucla.edu/cvs/loni_dev_src/LONI_PL_Previewer) for thumbnail-like quick, efficient and dynamic navigation of results from large number of subjects (need a link to a project)
  • (bash) scripts and makefiles with conditionals and iterations

Features

  • Cancelled status for each instance.
  • Persistence for Client
  • Workflow toolbars
  • The OK button in dialogs should compare changes first and mark the workflow as modified only if something changed.
  • Pages in execution logs.
  • When a connection is lost and then re-established by a different user, workflows from the previous user still reconnect.
  • Scheduled workflow submission
  • Send email and IM from Server to Clients.
  • Live support directly from Pipeline, which allows Pipeline users to chat with admins
  • Run without Validation functionality in some cases.
  • Validation speed improvement
  • Sharing workflow statuses and operations between workflow friends
  • Connection order from a module to its children ( i.e., if a module has 5 outputs and all outputs go to one input of a child module, there is no way to specify an order for these inputs )
  • Validation with sudo
  • Transformation with multiple parameters
  • Transformations that allow a parameter to take the same value as another parameter.
  • Conditional Transformations ( if file doesn't end with extension, then append extension )
  • Define env variables prior to execution ( for non array jobs )
  • Server Status panel with charts
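
The Conditional Transformations item above (append an extension only when the file name does not already end with it) amounts to a small string rule. A minimal sketch follows; the class and method names are hypothetical and not an existing Pipeline transformation.

```java
// Hypothetical sketch of a conditional "append extension" transformation.
class ConditionalTransformSketch {

    // Appends the extension unless the path already ends with it.
    // The extension may be given with or without the leading dot.
    static String ensureExtension(String path, String extension) {
        String ext = extension.startsWith(".") ? extension : "." + extension;
        return path.endsWith(ext) ? path : path + ext;
    }
}
```

For example, ensureExtension("brain", "nii") yields "brain.nii", while ensureExtension("brain.nii", ".nii") returns its input unchanged.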

Bugs

  • Newly created while loops do not show parameters in conditions tab
  • Newly created modules do not switch to looping conditions tab
  • Suggestions Box does not disappear when switching tabs [ fixed, but now, when the suggestion box is opened and closed and the tab is then changed, the suggestion box somehow appears again ].
  • Suggestions Box does not appear in the right place.
  • Rename does not apply to an already existing condition
  • Outputs for repeated instances do not exist.
  • Outputs are the same for all instances of while loop
  • Mickey Mouse look for modules with repeat until
  • For loop groups copy loses icons
  • For loop groups reconnection shows as a ModuleGroup
  • Restart Module of Repeat Until doesn't remove instances created from previous execution
  • Pause after Restart of a module with Repeat Until leaves the module in "Waiting" mode
  • Cutting a module that has a smartline before it and then pasting puts the smartline and data sink together.
  • After Pause, there are some background status changes.
  • When restarting a failed module that has cancelled child modules, successful completion of the module doesn't start the cancelled children ( needs to be verified )

Research of Alternatives

  • Research alternative approaches, functionalities, and novel technological uses (e.g., Geneious)
  • Yahoo's S4

Completed Actions

Java Web-Start Interface ( Priority: 3 ) Completed 04/30/2011

Provide a Java web-start-based alternative to the Pipeline Applet-based functionality utilized by the LONI Try-It-Now server. Project specification is available here.

Module-Skeletonization (V.5.2 05/05/2011)

Generate a new tool/script for automated skeletonization of module descriptions using help/man pages of different types of tools.
  • Example script and instructions: /ifs/ccb/CCB_SW_Tools/Pipeline/AutoModuleDefinition_PerlScript_2011/
  • Some known issues to resolve (documented in the ReadMe and Script files) are:
    • Handle gracefully /ifs/tmp/{user} pointers
    • Why is there a reference at the end of 2 *.pipe files (is one a template for a generic Pipeline XML)?
    • May need some work on formatting the final description in the GUI window
    • Needs validation (using existing module definitions, e.g., AIR tools (-help))
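
The help-page parsing step behind such skeletonization can be illustrated with a small heuristic. The sketch below is written in Java rather than Perl and is not the delivered script; the class name, the regular expression and the assumed help layout (a flag, an optional <argument>, at least two spaces, then a description) are all assumptions. It only extracts candidate parameters and deliberately does not emit a *.pipe definition.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: extract candidate parameters from "--help" style text.
class HelpSkeletonSketch {

    // Matches lines such as "  -o <file>   output file"
    // or "  --verbose    print progress".
    private static final Pattern OPTION_LINE = Pattern.compile(
            "^[ \\t]*(--?[A-Za-z][\\w-]*)(?:[ =]<(\\w+)>)?[ \\t]{2,}(\\S.*)$");

    // Returns one {flag, argumentName, description} triple per recognized line.
    // argumentName is "" for flags that take no argument.
    static List<String[]> parse(String helpText) {
        List<String[]> options = new ArrayList<>();
        for (String line : helpText.split("\\R")) {
            Matcher m = OPTION_LINE.matcher(line);
            if (m.matches()) {
                String arg = m.group(2) == null ? "" : m.group(2);
                options.add(new String[] {m.group(1), arg, m.group(3).trim()});
            }
        }
        return options;
    }
}
```

Lines that do not fit the assumed layout are silently skipped, which is one reason the generated skeleton would still need validation against existing module definitions and manual fine-tuning.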