Pipeline General Notes


Here you will find some brief notes about Pipeline that may help you if you are building a new pipeline. These notes go over general terminology as well as the general idea of what Pipeline is meant to do.

Topics covered on this page
Introduction
Making Lists
Environment
Modules and Variables
Connecting to Servers
Building a pipeline
Troubleshooting and error prevention

Introduction

    Pipeline provides users with a simple graphical environment for constructing analyses of data. Once learned, the intuitive interface allows users to quickly construct data processing pipelines. One of several advantages of a pipeline is that, once built, it can be used on whatever dataset you may have. It can also be easily shared between collaborators, and the graphical nature of the program allows others to understand your pipeline with minimal explanation. Constructing a pipeline is a fairly simple process: declare some global variables (covered under Modules and Variables), drag the modules you need into the workspace, and link them together. You can learn more specifics about the Pipeline program at the LONI website, or go to Pipeline's support page at http://www.loni.ucla.edu/twiki/bin/view/Pipeline/WebHome
 

Making Lists

    Before getting into the Pipeline program itself, it is important to discuss the idea of lists. A list is exactly what it sounds like: a list of something, in this case a list of files and their paths. Lists are important to the Pipeline program because the user cannot program a loop into it; the files to be processed need to be explicitly stated in a list file. The extension of this file is .list. This extension must be used because it signals to the program that the file is a list file.
    A list can be a list of anything: files, paths, paths to files, subject IDs. For example, a list can look like this:

           

    Notice that the path to each file must be explicitly stated. This MUST be done in every list that is made.
    A list can be created in one of two ways: 1) type the list in manually (if the list is short), or 2) use the "ls" command and redirect its output to a .list file (when the list is long). The second option is the more commonly used.
    The "ls" command lists everything in the current directory. Typing "ls ${PWD}/*" lists the complete path to each of the items in the current directory. For example, if you were in the directory /loni/edevel/leonard and typed "ls ${PWD}/*/*air", the terminal would show a list similar to the example list above. To save this output to a file, type "ls ${PWD}/*/*air > filename.list", where 'filename' is whatever you want to name your file. The list file is created in the directory where you ran the command.
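    The "ls" approach can be tried out safely in a scratch directory first. The sketch below is only an illustration: the directory layout (one sub-directory per subject, each holding a scan.air file) and the name subjects.list are made up, and only the "ls ${PWD}/*/*air > filename.list" pattern comes from the notes above.

```shell
# Hypothetical layout: one sub-directory per subject, each holding an .air file.
workdir=$(mktemp -d)
mkdir -p "$workdir/subj01" "$workdir/subj02" "$workdir/subj03"
touch "$workdir/subj01/scan.air" "$workdir/subj02/scan.air" "$workdir/subj03/scan.air"

cd "$workdir" || exit 1

# ${PWD} expands to the current directory, so every line is a complete path.
ls "${PWD}"/*/*air > subjects.list

# Inspect the result: one path per line, and the number of lines.
cat subjects.list
wc -l < subjects.list
```

    Because the list is written to the current directory and the pattern only matches files one level down, the list file does not end up listing itself.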
    The easiest way to learn to make lists is to practice. Since this is an extremely important part of the pipeline it's best if you get comfortable making lists as soon as possible. It is not complicated once you get the hang of it.
 

Environment

 
   There are two important areas in the Pipeline program window (the environment). The first is the big blue window (the workspace) that takes up most of the Pipeline window. This is where your pipeline will be constructed. The second is the column on the left, which is reserved for an organized listing of defined modules and pipelines.
   Many modules have already been defined in the left column and are organized by the groups that developed them. Modules are the equivalent of functions in scripting. To add a module to a pipeline, click and drag the desired module into the workspace. The module then appears in the workspace as a box labeled with the name of the module. You can move the module box around the workspace by left clicking and dragging it to the desired location. To remove a module, right click it and select delete.

 
 

Modules and Variables

    First of all, it may be helpful to define what a module actually is. In most cases, a module is like a function that is called in a script: a single operation applied to the data. A pipeline is then simply a collection of one or more modules. To continue the analogy, modules are the functions and the pipeline is the script that calls them.
    There is a little more to the story, however. The Pipeline program allows the user to make a pipeline into a module, which can then be used in another pipeline. A module is therefore not necessarily a single function; it can also be a collection of functions.
    Double clicking a module brings up a window where the user can set options for the function(s) it calls. Specific inputs or outputs can also be named here. Make sure to check these options, as the default settings may not be what you want.
 
 
On the left is an example of what a module looks like after it has been dragged into the workspace. The input(s) to a module are always located on the square nodes at the top, and the output is located at the node on the bottom of the module. The input to a module can be defined in one of three ways: 1) as the output of another module (by connecting that module's output to the input of your module), 2) with a global variable, or 3) by giving the input a specific path and file name.

 

    Global variables are declared by entering them into the global variables table. This table may be accessed by right clicking anywhere in a blank part in the workspace and selecting "Global variables". These variables can point to files, paths, or a list. A list is a file that is created by the user that points to the needed files. The file must end with the extension .list for the Pipeline program to understand that it is a list file.
    To add a new variable, simply press add on the table. Enter in the name and value of that variable. To edit an existing variable, click the checkbox next to the desired variable, and click edit. To delete, do the same, but click delete instead of edit.
    It is also possible to manipulate the contents of a list in the global variables. The point of this is to give new names to output files, or to use shortcuts so that fewer lists need to be made.
        + (string)     = adds a string
                ex. y2008_mr_rf_brainonly + .mnc will output y2008_mr_rf_brainonly.mnc
        - (string)     = subtracts a string
                ex. y2008_mr_rf_brainonly.mnc - .mnc will output y2008_mr_rf_brainonly
        ${listfile:f}  = takes the list file and returns only the file
        ${listfile:d}  = takes the list file and returns only the directory
        ${listfile:b}  = takes the list file and returns only the base name (no directory or extensions)
        ${listfile:e}  = takes the list file and returns only the extension
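    For readers who think in shell terms, the operations above can be mimicked on a single list entry with standard tools. This is only an analogy, not how Pipeline works internally: the directory in the example path is made up, ":f" is assumed to mean the file name without its directory, "- string" is assumed to remove a trailing string as in the example above, and ":e" is assumed to return the extension without the leading dot (the notes do not say either way).

```shell
# Hypothetical list entry; the directory is made up for illustration.
entry="/data/study/y2008_mr_rf_brainonly.mnc"

file=$(basename "$entry")   # like :f, the file name without the directory
dir=$(dirname "$entry")     # like :d, the directory only
base=${file%%.*}            # like :b, no directory and no extensions
ext=${file##*.}             # like :e, the extension (assumed without the dot)

# "+ string" appends a string; "- string" is assumed to strip a trailing string.
added="${base}.mnc"         # y2008_mr_rf_brainonly + .mnc
removed=${file%.mnc}        # y2008_mr_rf_brainonly.mnc - .mnc

printf '%s\n' "$file" "$dir" "$base" "$ext" "$added" "$removed"
```

    A typical use inside Pipeline would be to name an output after the base name of the matching input, so that one input list can drive the naming of several outputs.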
 
 
 

Connecting to servers

 
    To connect to a server, click on "Server" on the toolbar and click "login". The default server is inire, but you may connect to other servers such as armiger as well. After entering your login and password you will be logged into the server. You will notice that a new tab (Modules @ inire.loni.ucla.edu) in the left menu will appear after you have logged in. 
 
     This new tab will have a list of modules basically identical to the ones you find in the "System Modules List" tab. Modules dragged from the "Inire" tab will be executed on the inire server; modules dragged from the "System Modules List" tab will be executed locally. When building a pipeline, it is important to pull all modules from only ONE of these tabs. Otherwise the pipeline will swap data between the local machine and the server, which takes a lot of time.

 

Building a pipeline


    Building a pipeline is very easy! If you are starting with a new (blank) pipeline, start off by double clicking the module you want in order to place it in the workspace. After that, simply click and drag the rest of the modules as needed for your analyses. Again, when building a pipeline it is better to take all the modules from the same list of modules, preferably the server's list, because some modules take a very long time to run locally.
    After you get the modules onto the workspace, you can link them. Just click the output node of one module and the input node of another module and a line will be drawn between them symbolizing the connection.

    In this example, the input for the "Flip volume" module would be defined in the Global variables table. The output from that module then becomes the 1st input of the "Mask volume" module, which has 2 input nodes. The 2nd input can come from another module OR from a global variable. The final output from the "Mask volume" module can be named by double clicking the output node and entering a name. If there is more than one input, the output must be named via a list to avoid overwriting.
 
 
After the Global variables are set and the module options are defined, the pipeline is ready to be executed. Click on "Execution" on the toolbar and first click "Validation". This checks that the paths and files in your global variables actually exist before the pipeline is run. If everything checks out, click "Run". If you are running the pipeline with a list you may get a warning about lists; just click OK to begin running the pipeline.

Troubleshooting and error prevention

    There are several areas in which errors may occur. To save a lot of headaches, take these precautionary steps:
        1) Check your lists!
                a) use the "wc" command in a terminal to check that you have the same number of lines in each of your list files
                b) use the "xdiff" command between two lists to make sure that they are in the same order
                c) make sure the paths and filenames are correct (Validation doesn't always catch the errors)
        2) Be sure when building your pipeline to build from the Modules @ <server> tab. If you build locally or from two different tabs by accident,
                it will make the runtime of your pipeline extremely long.
        3) Open a server window and type "top" every once in a while to track the progress of your pipeline (it will show up as "pipeline").
        4) Check your lists again!
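
    The list checks in step 1 can be scripted. The sketch below is one possible way to do it, under stated assumptions: the file names input.list and mask.list and the files they point to are made up and created in a scratch directory so the example runs on its own, and "paste" is used here as a plain-terminal stand-in for comparing the order of two lists side by side (it is not the "xdiff" tool named above).

```shell
# Hypothetical lists created in a scratch directory so the sketch runs on its own.
checkdir=$(mktemp -d)
touch "$checkdir/s1.mnc" "$checkdir/s2.mnc" "$checkdir/s1_mask.mnc" "$checkdir/s2_mask.mnc"
printf '%s\n' "$checkdir/s1.mnc" "$checkdir/s2.mnc" > "$checkdir/input.list"
printf '%s\n' "$checkdir/s1_mask.mnc" "$checkdir/s2_mask.mnc" > "$checkdir/mask.list"

# 1a) Both lists should have the same number of lines.
n_input=$(wc -l < "$checkdir/input.list" | tr -d ' ')
n_mask=$(wc -l < "$checkdir/mask.list" | tr -d ' ')
if [ "$n_input" -eq "$n_mask" ]; then
    echo "line counts match: $n_input"
else
    echo "line count mismatch: $n_input vs $n_mask"
fi

# 1b) Print the lists side by side to check the order by eye.
paste "$checkdir/input.list" "$checkdir/mask.list"

# 1c) Report every listed path that does not exist.
missing=0
while IFS= read -r path; do
    if [ ! -e "$path" ]; then
        echo "missing: $path"
        missing=$((missing + 1))
    fi
done < "$checkdir/input.list"
echo "missing paths: $missing"
```

    Running the existence check on every list before each run catches the spelling and path errors that Validation may miss.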

    Sometimes, even with all the preventive measures, the pipeline will break. The first step is always to check your lists again. This is a slow and meticulous process, but more than half the time the cause is a user error in the lists (spelling error, path error, order error). Also check that you are outputting files where you expect.
    If the pipeline runs to completion but the results are not what you expected, make sure that the correct options are selected for each module.
    If you are running processes that output large files (e.g. thickness mncs), be sure to know the storage capacity of the server you are running on and adjust your input load accordingly. You can always split your lists into smaller lists. If you do this, make sure to check your lists again before running.
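
    Splitting a long list does not have to be done by hand. As a rough sketch, the standard "split" command can cut a list into chunks of a fixed number of lines. The list contents, the chunk size of 2, and the subjects_part_ prefix are all made up for the example. The pieces are renamed afterwards because Pipeline only recognizes list files by their .list extension.

```shell
# Hypothetical list of 5 entries, split into chunks of at most 2 lines each.
splitdir=$(mktemp -d)
cd "$splitdir" || exit 1
printf '/data/study/subj%02d/thickness.mnc\n' 1 2 3 4 5 > subjects.list

# split writes subjects_part_aa, subjects_part_ab, ...
split -l 2 subjects.list subjects_part_

# Add the .list extension back so each piece is recognized as a list file.
for part in subjects_part_*; do
    mv "$part" "$part.list"
done

# Confirm that no line was lost across the pieces.
cat subjects_part_*.list | wc -l
```

    If several lists are used together in one pipeline, split each of them with the same chunk size so that the pieces still line up entry for entry.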