- 20 Apr 2007

Working Page to standardize some of the MAGE entries across Mouse BIRN or BIRN test beds

  1. the identifier prefix: identifies where the data come from and from which source (accession number)
    • Is there a BIRN standard we should adopt? One option, from 2004, is to construct MAGE object identifiers in the following LSID-like manner: <authority>:[<namespace>]:<object>[:<revision>]
    • Recommendation: Jeff G?
  2. database reference code: specifies whether the raw data come from another database or whether we point to data held in another database.
  3. acceptable data file formats
    • Currently the BIRN Microarray DB accepts CSV format; people may expect Excel, XML, or some other format for PROCESSED data (after normalization).
    • The file format may depend on whether we decide to accept data earlier in the workflow, e.g. a .CEL file (see Daniel Sforza's presentation).
    • Recommendation: .CSV, .XML (later), .CEL (Affymetrix CEL file)
  4. log entry information
    • After upload, the provenance information should be stored; possible examples: IP address, error messages.
    • Recommendations: We need input on what other types of information should be kept in this area
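The LSID-like identifier pattern from item 1 can be built and checked with a few lines of Python. This is a minimal sketch, assuming that no individual part ever contains a colon; the `birn.org` authority and `MouseBIRN` namespace used below are hypothetical.

```python
from typing import NamedTuple, Optional


class MageIdentifier(NamedTuple):
    """Parts of an identifier of the form <authority>:[<namespace>]:<object>[:<revision>]."""
    authority: str                  # e.g. a domain name owned by the data provider
    namespace: str                  # optional; may be the empty string
    object_id: str
    revision: Optional[str] = None  # optional trailing revision

    def __str__(self) -> str:
        parts = [self.authority, self.namespace, self.object_id]
        if self.revision is not None:
            parts.append(self.revision)
        return ":".join(parts)


def parse_identifier(text: str) -> MageIdentifier:
    """Split an identifier string into its parts; raise ValueError if malformed.

    Assumes no part contains a colon itself."""
    parts = text.split(":")
    if len(parts) not in (3, 4):
        raise ValueError(f"expected 3 or 4 colon-separated parts: {text!r}")
    authority, namespace, object_id = parts[:3]
    if not authority or not object_id:
        raise ValueError(f"authority and object are required: {text!r}")
    revision = parts[3] if len(parts) == 4 else None
    return MageIdentifier(authority, namespace, object_id, revision)
```

For example, `str(MageIdentifier("birn.org", "MouseBIRN", "Sample1"))` gives `birn.org:MouseBIRN:Sample1`, and `parse_identifier("birn.org::Sample1:2")` returns an empty namespace and revision `"2"`.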

Ontology Entries

Click here for the entries that need to be mapped to BIRNLex.

  1. Information about who is providing the data and where they are from
    • Role (Submitter/Experimenter)
    • Principal Investigator
    • Institute name, url
    • Funder and Grant #
    • Contact Information (phone, email)
    • Other?
  2. File information
    • File name
    • File location on other DB (url, uri, or urn)
    • File format
    • Date file generated (need this?)
    • Other?
  3. Experiment Information
    • Experiment name
    • Experiment description
    • Array Chip Type
    • Annotation or Reference to an external annotation database
    • Normalization or Summarization
    • Publicly accessible?
    • Publication Status (published, unpublished, submitted, in press, confidential)
  4. Experiment details
    • IRB information or human subject information
    • Group of Interest (normal, control, diseased, pre-diseased)
    • Treatment methods
    • Treatment chemical/compound
    • Tissue or structures information
  5. Subject(s) information
    • Species (preferred name and scientific name)
    • Strains
    • Developmental Stage (embryonic stages and adult stages)
    • Age/Age Unit
    • Sex (male, female, hermaphrodite, other)
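Once the list above is settled, a submission form could check it automatically. The sketch below uses hypothetical field keys; which fields are mandatory is still an open question on this page, so `REQUIRED_FIELDS` is only a placeholder. The allowed values are taken from the Publication Status and Sex items above.

```python
# Hypothetical field keys derived from the checklist above; not a BIRN schema.
REQUIRED_FIELDS = {
    "submitter": ["role", "principal_investigator", "institute", "contact_email"],
    "file": ["file_name", "file_format"],
    "experiment": ["experiment_name", "array_chip_type", "publication_status"],
    "subject": ["species", "strain", "sex"],
}

# Controlled vocabularies copied from the list above.
ALLOWED_VALUES = {
    "publication_status": {"published", "unpublished", "submitted", "in press", "confidential"},
    "sex": {"male", "female", "hermaphrodite", "other"},
}


def check_submission(record: dict) -> list[str]:
    """Return a list of problems; an empty list means the record passes."""
    problems = []
    for section, fields in REQUIRED_FIELDS.items():
        values = record.get(section, {})
        for name in fields:
            if not values.get(name):
                problems.append(f"missing {section}.{name}")
            elif name in ALLOWED_VALUES and values[name] not in ALLOWED_VALUES[name]:
                problems.append(f"unexpected value for {section}.{name}: {values[name]!r}")
    return problems
```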

Ontology references:



MAGE-TAB

MAGE-TAB is a simplified version of MAGE-ML and is proposed to be part of MAGEv2. The data are arranged in tabular format instead of the XML used by MAGE-ML. It consists of four types of files: Investigation Description Format (IDF), Array Design Format (ADF), Sample and Data Relationship Format (SDRF), and raw and processed data files.

Examples and Use Cases

Investigation Description Format (IDF) Example:

| Investigation Title | Chronic Mouse Model of Parkinson's Disease |
| Experimental Design | Parkinson_Disease_Design |
| Experimental Factor Name | genetic_modification_design |
| Experimental Factor Type | genetic modification |
| Experimental Factor Term Source REF | BIRNLEX |
| Person Last Name | Sforza |
| Person First Name | Daniel |
| Person Email | sforza@ucla.edu |
| Person Phone | xxx-xxx-xxxx |
| Person Address | |
| Person Affiliation | Laboratory of Neuro Imaging |
| Person Roles | Experimenter |
| Person Roles Term Source REF | |
| Quality Control Type | |
| Quality Control Term Source REF | |
| Replicate Type | |
| Replicate Term Source REF | |
| Date of Experiment | yyyy-mm-dd |
| Public Release Date | yyyy-mm-dd |
| PubMedID | |
| Publication Author List | |
| Publication Status | |
| Experiment Description | |
| Protocol Name | |
| Protocol Type | |
| Protocol Description | |
| Protocol Parameters | |
| Protocol Term Source REF | |
| SDRF File | e-external_file.csv |
| Term Source Name | BIRNLEX |
| Term Source File | http://fireball.drexelmed.edu/birnlex/ |
| Term Source Version | 1.2 |
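An IDF file is tab-delimited and row-oriented: the first cell of each row is the field name and the remaining cells hold its values (one column per person, protocol, and so on). A minimal reader might look like this, assuming the file has already been read into a string. Note that a repeated field name simply overwrites the earlier row, which is a simplification.

```python
import csv
import io


def read_idf(text: str) -> dict[str, list[str]]:
    """Parse tab-delimited IDF text into {field name: [values...]}.

    Blank rows and rows starting with '#' (treated here as comments) are
    skipped; trailing empty cells are dropped."""
    fields: dict[str, list[str]] = {}
    for row in csv.reader(io.StringIO(text), delimiter="\t"):
        if not row or not row[0].strip() or row[0].startswith("#"):
            continue
        values = [cell.strip() for cell in row[1:]]
        while values and not values[-1]:
            values.pop()  # drop trailing empty cells
        fields[row[0].strip()] = values
    return fields
```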

Sample and Data Relationship Format (SDRF) Example
| Source Name | Hybridization Name | Array Design REF | Term Source REF | Array Data File | Derived Array Data File |
| Sample 1 | Experiment 1 | A-AFFY-MOE430 | AFFYMETRIX_TERM | Data1.CEL | AFFY_data1.CHP |

| Sample Name | Characteristics [Organism] | Characteristics [Organism Part] | Characteristics [Organism Part] | Characteristics [Developmental Stage] | Protocol REF | Parameter Value [Compound] | Extract Name | Labeled Extract Name | Label | Hybridization Name |
| M6_FCX | Mouse | Brain | Frontal Cortex | Adult | | MPTP | M6_FCX_Ext | M6_FCX_lab_biotin | Biotin | M6_CTx_hyb |
| C6_FCX | Mouse | Brain | Frontal Cortex | Adult | | Saline | C6_FCX_Ext | C6_FCX_lab_biotin | Biotin | C6_CTX_hyb |
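SDRF is column-oriented: one header row, then one row per sample-to-data path. Because headers can repeat (the example has two Characteristics [Organism Part] columns), a plain dict per row would lose data. This sketch keeps each row as an ordered list of (header, value) pairs. It assumes tab-delimited input and does not interpret the node/attribute column ordering that the MAGE-TAB specification defines.

```python
import csv
import io


def read_sdrf(text: str) -> list[list[tuple[str, str]]]:
    """Parse tab-delimited SDRF text into rows of (column header, value) pairs."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    header = [h.strip() for h in next(reader)]
    rows = []
    for cells in reader:
        if not any(c.strip() for c in cells):
            continue  # skip blank lines
        cells = cells + [""] * (len(header) - len(cells))  # pad short rows
        rows.append(list(zip(header, (c.strip() for c in cells))))
    return rows


def values_for(row: list[tuple[str, str]], column: str) -> list[str]:
    """All non-empty values in a row under the given header, in column order."""
    return [value for header, value in row if header == column and value]
```

Empty cells, such as the blank Protocol REF in the example rows, are kept as empty strings and filtered out by `values_for`.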



Tab2MAGE

This is a software package supported by the ArrayExpress curators that generates MAGE-ML from a Tab2MAGE spreadsheet document. The spreadsheet consists of three sections, separated by one or more blank lines: Experiment, Protocol and Hybridization.
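The three-section layout can be split mechanically on blank lines. This minimal sketch assumes the spreadsheet has been exported as tab-delimited text and that the sections appear in the order listed. Unlike the real Tab2MAGE tool might, it does not inspect section headers.

```python
def split_sections(text: str) -> list[list[str]]:
    """Split exported spreadsheet text into blocks separated by blank lines.

    A line containing only whitespace or tabs (an empty spreadsheet row)
    counts as blank."""
    sections: list[list[str]] = []
    current: list[str] = []
    for line in text.splitlines():
        if line.strip():
            current.append(line)
        elif current:
            sections.append(current)
            current = []
    if current:
        sections.append(current)
    return sections


def label_sections(text: str) -> dict[str, list[str]]:
    """Name the blocks, assuming the order Experiment, Protocol, Hybridization."""
    return dict(zip(("Experiment", "Protocol", "Hybridization"), split_sections(text)))
```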



GEO SOFTmatrix and GEOarchive

Daniel submitted a microarray dataset to GEO using the SOFTmatrix and GEOarchive metadata formats and found them much clearer, thanks to their examples and annotations for large Affymetrix datasets. GEO's format may make it easier for researchers to organize their information and to integrate it with the database entries.

Example
  • Click here for Daniel's submitted file, which also includes the mapping between GEOarchive metadata and MAGE-TAB metadata.
  • Link to GEO site
  • Metadata mapping between GEOarchive and MAGE-TAB.
    | GEOarchive | MAGE-TAB |
    | SERIES | IDF |
    | title | Investigation Title |
    | summary | Experiment Description |
    | overall design | Experimental Design |
    | type | N/A |
    | contributor[] | Person Last Name, Person First Name, Person Mid Initial |
    | PLATFORM | |
    | title | Investigation Title |
    | technology | Protocol Contact |
    | distribution | N/A |
    | organism | Characteristics[] in SDRF section |
    | manufacturer | Term Source |
    | manufacture protocol | Term Source File |
    | SAMPLES | SDRF |
    | Sample name | Sample Name |
    | title | Hybridization Name |
    | CEL file [Affymetrix submissions only] | Array Data File |
    | EXP file [Affymetrix submissions only] | Array Data File |
    | source name | Source Name, Material Type |
    | organism | Characteristics[organism] |
    | characteristics[] | Characteristics[] (e.g. Characteristics[Strain], Characteristics[Gender], Characteristics[Age]) |
    | molecule | Extract Name, Material Type |
    | label | Label |
    | description | Description |
    | platform | reference to Term Source Name in IDF file |
    | PROTOCOLS | |
    | growth protocol | Protocol Name, Protocol Type, Protocol Description |
    | treatment protocol | Investigation Title |
    | extract protocol | Same as Above |
    | EXP file [Affymetrix submissions only] | Same as Above |
    | label protocol | Same as Above |
    | hyb protocol | Same as Above |
    | scan protocol | Same as Above |
    | data processing | Normalization Name |
    | value definition | N/A |
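Part of the SAMPLES mapping can be encoded as a lookup table for converting GEOarchive sample records into SDRF columns. This is a sketch with two simplifications. First, only the first MAGE-TAB column of multi-column rows is used (for example, "source name" goes to Source Name but not to Material Type). Second, it assumes the GEOarchive headers are spelled exactly as in the table. Unmapped fields are collected rather than dropped.

```python
# Subset of the SAMPLES rows above: GEOarchive field -> primary SDRF column.
GEO_SAMPLE_TO_SDRF = {
    "Sample name": "Sample Name",
    "title": "Hybridization Name",
    "CEL file": "Array Data File",
    "EXP file": "Array Data File",
    "source name": "Source Name",
    "organism": "Characteristics[organism]",
    "molecule": "Extract Name",
    "label": "Label",
    "description": "Description",
}


def to_sdrf_columns(geo_sample: dict[str, str]) -> dict[str, list[str]]:
    """Translate one GEOarchive sample record into SDRF columns.

    Several GEO fields can feed the same column (CEL and EXP files both go to
    Array Data File), so each column holds a list. Fields without a mapping
    are kept under '<unmapped>' as 'field=value' strings."""
    out: dict[str, list[str]] = {}
    for geo_field, value in geo_sample.items():
        column = GEO_SAMPLE_TO_SDRF.get(geo_field, "<unmapped>")
        entry = value if column != "<unmapped>" else f"{geo_field}={value}"
        out.setdefault(column, []).append(entry)
    return out
```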



MAGE-ML

Example XML broken down by MAGE-ML package

Audit and Security
The Audit and Security package contains classes that describe contact information for an individual or an organization, as well as the security group they belong to and the role they play.

Classes Item Detail Example
Audit and Security
Contact Organization name  
    affiliation  
    roles (ontology term)  
  Person name (last, first)  
    affiliation  
    roles (ontology term)  
    phone  
    email  
    fax  
Property Set IRB number  
    exp date  
    grant number  
  Public rec? yes or no  
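The Contact and Property Set rows above can be sketched as Python dataclasses. The class and field names are hypothetical: they mirror the table rows and are not taken from the MAGE-ML schema. `person_to_idf` shows how a Person could fill the IDF "Person ..." fields from the MAGE-TAB example. The ";" separator for multiple roles is an assumption.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Organization:
    name: str
    affiliation: Optional["Organization"] = None     # parent organization, if any
    roles: list[str] = field(default_factory=list)   # ontology terms


@dataclass
class Person:
    last_name: str
    first_name: str
    affiliation: Optional[Organization] = None
    roles: list[str] = field(default_factory=list)   # ontology terms, e.g. "Experimenter"
    phone: str = ""
    email: str = ""
    fax: str = ""


@dataclass
class PropertySet:
    irb_number: str = ""
    exp_date: str = ""        # 'exp date' from the table; exact meaning still open
    grant_number: str = ""
    public_record: bool = False


def person_to_idf(person: Person) -> dict[str, str]:
    """Fill the IDF 'Person ...' fields used in the MAGE-TAB example."""
    return {
        "Person Last Name": person.last_name,
        "Person First Name": person.first_name,
        "Person Email": person.email,
        "Person Phone": person.phone,
        "Person Affiliation": person.affiliation.name if person.affiliation else "",
        "Person Roles": ";".join(person.roles),  # separator is an assumption
    }
```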

Description
The Description package contains classes that describe the external databases or annotations being referenced.

Classes Item Detail Example
Annotation and Description
  Databases    
  Database Entries    
  Ontology Entries    

Array
The Array package contains classes that describe individual arrays, including detailed information on relevant manufacturing processes. It also contains references to LIMS data that may hold information about the BioMaterials used.

Classes Item Detail Example
Array
ArrayManufacture ID unique identifier for each manufacturer; use the manufacturer accession number as prefix
  Person (Same as Contact->Person)  
  Arrays contain multiple arrays
Array ID unique identifier for each array chip
    other array properties should be automatically loaded based on the array chip model
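The identifier and auto-loading rules above could work roughly as follows. This is a sketch only. `make_array` and the registry layout are hypothetical, and nesting the array ID under the ArrayManufacture ID is one possible way to keep array IDs unique, not a settled rule. The registry of per-model properties is assumed to be maintained separately.

```python
from dataclasses import dataclass, field


@dataclass
class ArrayRecord:
    array_id: str
    model: str
    properties: dict = field(default_factory=dict)


def make_array(manufacture_id: str, serial: str, model: str,
               model_registry: dict[str, dict]) -> ArrayRecord:
    """Create an array record.

    The array ID is nested under the ArrayManufacture ID (itself assumed to be
    prefixed with the manufacturer accession number), and the remaining
    properties are copied from a per-model registry."""
    if model not in model_registry:
        raise KeyError(f"unknown array chip model: {model}")
    return ArrayRecord(
        array_id=f"{manufacture_id}:{serial}",
        model=model,
        properties=dict(model_registry[model]),  # copy so records do not share state
    )
```

For example, with a hypothetical manufacture ID `AFFX`, `make_array("AFFX", "0001", "A-AFFY-MOE430", {"A-AFFY-MOE430": {"manufacturer": "Affymetrix"}})` yields the array ID `AFFX:0001`.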

ArrayDesign
The ArrayDesign package contains classes that describe a microarray design. PhysicalArrayDesign describes the design used to manufacture a physical array.

Classes Item Detail Remarks and Example
Array Design
Physical Array Design Composite Group    
  PhysicalArrayDesign    
  DesignElementGroup FeatureGroup Contains multiple Features
    Species ontology term

DesignElement
The DesignElement package contains classes that describe the purpose of each element on an Array. A Feature describes an intended location on the Array; design elements can also be specified as Reporters or CompositeSequences.

Classes Item Detail Example
Design Element
Feature Controlled Features    
  Control Type    

Experiment
The Experiment package contains a collection of BioAssays that are related through an ExperimentDesign. The ExperimentDesign describes and collects the ExperimentalFactors and the related BioAssay information.

Classes Item Detail Example
Experiment
Experimental Factors      
  Factors    
  Providers    
ExperimentDesign

BioSequence
BioSequence represents a DNA, RNA or protein sequence. It can be represented as a clone, a gene or a sequence.

Classes Item Detail Example
BioSequence      

BioMaterial
The BioMaterial package describes the biological material used and how it was created, through the BioSource, BioSample and LabeledExtract classes. A BioMaterial can be related to other BioMaterials through a DAG (directed acyclic graph). BioSamples are the products of treatments of interest and can themselves be the sources of other BioSamples. BioSource holds information about the raw starting material. LabeledExtracts are special BioSamples that carry detectable Compounds.

Classes Item Detail Example
BioMaterial
  BioSource    
  BioSample    
  Other BioMaterial    
  ActionMeasurement    
  LabeledExtract
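Because BioMaterials form a DAG, a loader can reject cycles and order materials so that each one comes after the materials it was made from. The sketch below uses the standard-library `graphlib.TopologicalSorter` (Python 3.9+). The input format, a mapping from each material to the set of its inputs, is an assumption; the names come from the SDRF example.

```python
from graphlib import CycleError, TopologicalSorter


def creation_order(derived_from: dict[str, set[str]]) -> list[str]:
    """Order BioMaterials so every material follows the ones it was made from.

    `derived_from` maps a material to the set of materials it was created from;
    a BioSource has no inputs and may appear only as a value.
    Raises ValueError if the graph contains a cycle."""
    try:
        return list(TopologicalSorter(derived_from).static_order())
    except CycleError as err:
        # err.args[1] holds the nodes that form the cycle
        raise ValueError(f"BioMaterial graph is not acyclic: {err.args[1]}") from err
```

For the chain M6_FCX → M6_FCX_Ext → M6_FCX_lab_biotin, `creation_order({"M6_FCX_Ext": {"M6_FCX"}, "M6_FCX_lab_biotin": {"M6_FCX_Ext"}})` returns the three names in that order. A pooled BioSample would simply list several inputs.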

BioAssay
BioAssay represents both physical and computational groupings of arrays and BioMaterials. A PhysicalBioAssay is created by the BioAssayCreation event (e.g. a Hybridization). A MeasuredBioAssay results from directly processing the information in a PhysicalBioAssay through the FeatureExtraction event. A DerivedBioAssay is created by the Transformation BioEvent from one or more MeasuredBioAssays or DerivedBioAssays.

Classes Item Detail Example
BioAssay DerivedBioAssay    
  MeasuredBioAssay    
  PhysicalBioAssay    
  BioAssayCreation    
  Hybridization    
  BioAssayTreatment    
  FeatureExtraction    
  Image    
  Image Acquisition    
  Channel    
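The provenance chain described above (Physical, then Measured via FeatureExtraction, then Derived via Transformation) can be modelled with three small classes. This is a hypothetical sketch, not the MAGE-OM class definitions. It enforces the rule that a DerivedBioAssay needs at least one Measured or Derived source, and it traces any assay back to its hybridizations.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class PhysicalBioAssay:
    name: str                 # created by a BioAssayCreation event, e.g. a hybridization


@dataclass
class MeasuredBioAssay:
    name: str
    source: PhysicalBioAssay  # produced by the FeatureExtraction event


@dataclass
class DerivedBioAssay:
    name: str
    sources: list             # MeasuredBioAssays and/or DerivedBioAssays

    def __post_init__(self) -> None:
        # A Transformation needs at least one Measured or Derived input.
        if not self.sources:
            raise ValueError("a DerivedBioAssay needs at least one source")
        for src in self.sources:
            if not isinstance(src, (MeasuredBioAssay, DerivedBioAssay)):
                raise TypeError(f"invalid source type: {type(src).__name__}")


AnyBioAssay = Union[PhysicalBioAssay, MeasuredBioAssay, DerivedBioAssay]


def physical_origins(assay: AnyBioAssay) -> set[str]:
    """Names of the PhysicalBioAssays an assay ultimately comes from."""
    if isinstance(assay, PhysicalBioAssay):
        return {assay.name}
    if isinstance(assay, MeasuredBioAssay):
        return physical_origins(assay.source)
    return set().union(*(physical_origins(src) for src in assay.sources))
```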

BioAssayData
BioAssayData describes gene expression data; its BioDataValues contain the actual values, organized along the DesignElement, BioAssay and QuantitationType dimensions listed below.

Classes Item Detail Example
BioAssayData
  DesignElements
  BioAssays
  QuantitationTypes
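The three dimensions can be pictured as a cube of values. The class below is a hypothetical in-memory sketch of that idea, not the MAGE-ML encoding. It uses a dict keyed by (design element, bioassay, quantitation type) and stores `None` for missing values.

```python
from itertools import product


class ExpressionCube:
    """Values indexed by (design element, bioassay, quantitation type); None = missing."""

    def __init__(self, design_elements, bioassays, quantitation_types):
        self.axes = (list(design_elements), list(bioassays), list(quantitation_types))
        # Pre-create every cell so that unknown coordinates can be rejected.
        self._values = {key: None for key in product(*self.axes)}

    def put(self, element, assay, qtype, value):
        key = (element, assay, qtype)
        if key not in self._values:
            raise KeyError(f"not on the cube's axes: {key}")
        self._values[key] = value

    def get(self, element, assay, qtype):
        return self._values[(element, assay, qtype)]

    def assay_column(self, assay, qtype):
        """Values for one bioassay and quantitation type, in design-element order."""
        return [self._values[(element, assay, qtype)] for element in self.axes[0]]
```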

Protocol
Classes Item Detail Example
Protocol      

QuantitationType
QuantitationType describes the types of data reported in microarray experiments. It can be a Standard or a Specialized type. The standard types include MeasuredSignal, DerivedSignal, Ratio, PresentAbsent, PValue, Error and ExpectedValue. Specialized types are user-defined.

Classes Item Detail Example
QuantitationType      
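The standard/specialized split maps naturally onto an enumeration plus a fallback. This sketch covers only the standard types named on this page (MAGE may define others). Any other name is treated as a user-defined specialized type and returned unchanged.

```python
from enum import Enum
from typing import Union


class StandardQuantitationType(Enum):
    MEASURED_SIGNAL = "MeasuredSignal"
    DERIVED_SIGNAL = "DerivedSignal"
    RATIO = "Ratio"
    PRESENT_ABSENT = "PresentAbsent"
    P_VALUE = "PValue"
    ERROR = "Error"
    EXPECTED_VALUE = "ExpectedValue"


def classify(name: str) -> Union[StandardQuantitationType, str]:
    """Return the standard type for a known name, else the name itself
    (treated as a user-defined specialized type)."""
    try:
        return StandardQuantitationType(name)
    except ValueError:
        return name
```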



Case Reference


Working page to determine the minimum amount of metadata a microarray data submitter should provide

| Attachment | Size | Date | Who | Comment |
| GEO_metadata.xls | 41.0 K | 17 May 2007 - 22:17 | QueenieNg | Daniel's MPTP data in GEOArchive format |
| Illumina33TissuesTab2MBEC1_v2.xls | 64.5 K | 11 May 2007 - 23:21 | QueenieNg | Rob Williams' Illumina33Tissues data in Tab2MAGE format |
| concept_review.xls | 60.0 K | 17 May 2007 - 22:13 | QueenieNg | Concept Terms for BIRN MAGE entries |
| MPTP_ML.xml | 88.6 K | 16 May 2007 - 16:53 | QueenieNg | Daniel's MPTP data in MAGE-ML Format |
| illumina33.xml | 497.0 K | 06 Jun 2007 - 16:48 | QueenieNg | Rob Williams' Illumina33Tissues data in MAGE-ML format |