People: Rob, Daniel, Jintao, Steve, Hongqiang, Bill
Minutes for follow-up meetings
Suggested plan for outlining a Common Data Model (by August 25)
  • Identify data sources
    • GeneNetwork (searchable API)
      • accepts a full list as input and can operate on it
    • GEO (text dump, array by array)
      • upload of data: MAGE-ML is supported, but MINiML (XML) looks like it may be more comprehensive
        • create an account
        • submit platform (type of array used)
        • submit samples (one dataset per microarray; samples can span subjects, etc.; one experimenter would likely submit many of these per experiment)
        • submit series (links together all samples and explains the experiment)
        • submit raw data (if you wish)
      • download of data
        • querying is simple, but the returned data is dense, difficult to understand, and requires additional filtering by the user
    • GNF (Genomics Institute of the Novartis Research Foundation)
      • can return a list of genes
      • symatlas.gnf.org
        • ptpn2 example: the query is cumbersome, but it may return a list of values for several structures (including some in the brain). We are not sure where the data comes from or how it is added to the database. They appear to use the BioPerl object model.
    • ArrayExpress (may be a good place to look for APIs, etc.)
      • Public repository for MIAME-compliant data, following MGED recommendations; data is submitted with a tool called MIAMExpress or as MAGE-ML
      • investigate their MAGE Java API
      • For queries, they appear to offer simple HTTP access to their repository. However, all the data is returned, so it seems you need to download the data yourself. This database also seems more specialized by tissue type (mostly human).
    • NIH Blueprint Microarray Consortium (*** release expected in 6 months; Stan Nelson, Dietrich Stepsan):
      • have their own relational databases, but will deposit the data in GEO upon publication
  • Investigate whether these APIs already exist
  • Investigate whether any wrappers already exist for the desired data sources
  • Develop two use-case scenarios for a potential microarray user: one simple, one complex
  • Outline a Common Data Model and present it to the DF and Atlasing groups, Vadim, and Ilya
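The GEO submission steps above imply a three-level hierarchy: a platform describes the array type, each sample is one microarray dataset on one platform, and a series links samples into an experiment. The sketch below models that hierarchy with plain dataclasses as a possible starting point for the Common Data Model. GEO does use GPL/GSM/GSE accession prefixes for platforms, samples, and series, but the class names, field names, and helper methods here are our own illustrative assumptions, not GEO's MINiML schema.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Platform:
    """Type of array used (GEO platform accessions start with 'GPL')."""
    accession: str
    title: str


@dataclass
class Sample:
    """One dataset per microarray (GEO sample accessions start with 'GSM')."""
    accession: str
    platform: Platform
    # Hypothetical field: probe ID -> expression value for this microarray.
    values: Dict[str, float] = field(default_factory=dict)
    # Raw data is optional when submitting, so it may be absent.
    raw_data_path: Optional[str] = None


@dataclass
class Series:
    """Links samples together and describes the experiment ('GSE')."""
    accession: str
    summary: str
    samples: List[Sample] = field(default_factory=list)

    def platforms(self) -> List[str]:
        """Distinct platform accessions used in the series, in order first seen."""
        seen: List[str] = []
        for s in self.samples:
            if s.platform.accession not in seen:
                seen.append(s.platform.accession)
        return seen

    def probe_values(self, probe_id: str) -> Dict[str, float]:
        """Value of one probe in every sample that measured it."""
        return {s.accession: s.values[probe_id]
                for s in self.samples if probe_id in s.values}
```

Usage, with made-up placeholder accessions: build `Platform("GPL-demo", "demo array")`, attach two `Sample` objects to it, wrap them in a `Series`, and call `probe_values("probe_a")` to get one value per sample. Keeping the platform as an object reference (rather than a bare string) makes it easy to check that samples in one series share an array type.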


Data to evaluate:
  • GeneNetwork: data in the GN database (switch format?)
    • provides real-time data that can be visualized in MBAT
    • would also like to use this to provide annotation data to MBAT (in real time?)
    • visualization requires a loaded atlas (recommend one for the user?)
  • Desmond Smith datasets:
    • 1st dataset: large voxels
      • currently stored at UCSD:
        • potential to access with mediator?
        • format appropriate for all atlasing efforts?
        • extensible format?
        • worth trying to reformat?
      • accessible through SA
    • 2nd dataset: smaller voxels within a single plane
      • to be expanded to other planes
      • current plane can be visualized in MBAT (file)
      • they have a web engine, but it is not queryable
      • what is the potential for adding this to a mediator-accessible DB (they may appreciate this as the dataset expands)?
  • Barlow/Zapala data:
    • currently stored at UCSD; same questions as for the 1st Desmond Smith dataset
    • accessible through SA
  • Local data from Daniel Sforza
    • currently able to load it locally (annotation is local, but should come from GN)
    • current interface for visualizing local data is a bit cumbersome
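Several of the questions above (extensible format? mediator access? expansion to other planes?) come down to how voxel-level measurements are represented. The following is a minimal sketch of one possible source-neutral record, assuming each value can be keyed by dataset, gene, and integer voxel grid indices. The `VoxelValue` type, its fields, and the helper functions are hypothetical and are not taken from the UCSD storage format of the Desmond Smith or Barlow/Zapala data.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class VoxelValue:
    """One expression measurement in one voxel (hypothetical common record)."""
    dataset: str                 # label for the source dataset
    gene: str                    # gene symbol; annotation ideally from GN
    voxel: Tuple[int, int, int]  # grid indices (x, y, z); physical units kept elsewhere
    value: float


def in_plane(records: Iterable[VoxelValue], axis: int, index: int) -> List[VoxelValue]:
    """Keep records lying in one plane, e.g. axis=2, index=5 selects z == 5.

    A single-plane dataset is just the case where every record shares one
    plane index, so adding further planes needs no change to the record type.
    """
    if axis not in (0, 1, 2):
        raise ValueError("axis must be 0, 1, or 2")
    return [r for r in records if r.voxel[axis] == index]


def genes(records: Iterable[VoxelValue]) -> List[str]:
    """Sorted distinct gene symbols present in the records."""
    return sorted({r.gene for r in records})
```

The design choice here is to keep the record flat (one row per gene per voxel), which maps directly onto a relational table that a mediator could query, at the cost of repeating the dataset and gene labels on every row.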

Suggestions from Michael Miller
  • use the MAGE v1 object model, and code against it using the associated MAGEstk
  • EBI ArrayExpress folks have created a useful "simple" version of the format (MAGE-TAB) to accommodate those not technically equipped to provide the MAGE v1 format

  • best programmer resources for MAGE v1:
    • MAGEstk
    • AndroMDA
    • ArrayExpress MAGE tools
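MAGE-TAB is a set of tab-delimited files (IDF, SDRF, ADF, and data matrices), which is what makes it approachable without MAGEstk. As a rough illustration, the sketch below reads an SDRF-style table with Python's standard `csv` module and groups sample names by array design. This is a swapped-in plain-text approach, not MAGEstk or any ArrayExpress tool. The column names `Source Name` and `Array Design REF` follow SDRF conventions as we understand them, and skipping `#` comment lines is an assumption; both should be checked against the MAGE-TAB specification.

```python
import csv
import io
from typing import Dict, List


def read_sdrf(text: str) -> List[Dict[str, str]]:
    """Parse a tab-delimited SDRF-style table into one dict per row.

    Minimal sketch: drops blank lines and lines starting with '#', treats the
    first remaining line as the header, and assumes column names are unique
    (real SDRF files can repeat headers; a dict-based reader keeps only the
    last value for a repeated name).
    """
    lines = [ln for ln in text.splitlines()
             if ln.strip() and not ln.startswith("#")]
    reader = csv.DictReader(io.StringIO("\n".join(lines)), delimiter="\t")
    return [dict(row) for row in reader]


def samples_by_array(rows: List[Dict[str, str]]) -> Dict[str, List[str]]:
    """Group 'Source Name' values under their 'Array Design REF'."""
    grouped: Dict[str, List[str]] = {}
    for row in rows:
        grouped.setdefault(row["Array Design REF"], []).append(row["Source Name"])
    return grouped
```

For a table with columns `Source Name`, `Characteristics[organism]`, and `Array Design REF`, `samples_by_array(read_sdrf(text))` returns a mapping from each (placeholder) array design ID to the list of sources hybridized on it, which lines up with the platform/sample split in GEO.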

Other items to evaluate:
Stanford group mapping (MAGE-ML_to_SMD.xls) at http://www.mged.org/Workgroups/MAGE/mage.html