- 20 Apr 2007
Working Page to standardize some of the MAGE entries across Mouse BIRN or BIRN test beds
- Identifier prefix: identifies where the data come from and their source (accession number)
- Is there a BIRN standard we should adopt?
It was recommended in 2004 to construct MAGE object identifiers in the following LSID-like manner: <authority>:[<namespace>]:<object>[:<revision>]
- Recommendation: Jeff G?
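The identifier pattern above can be sketched as a pair of build/parse helpers. This is a minimal illustration, not a BIRN or MGED reference implementation: the function names, the `MageId` tuple, and the authority `birn.example.org` are hypothetical, and the sketch assumes no component contains a colon. It produces the LSID-like form only, not a full `urn:lsid:` URN.

```python
from typing import NamedTuple, Optional


class MageId(NamedTuple):
    authority: str
    namespace: Optional[str]
    obj: str
    revision: Optional[str] = None


def build_mage_id(authority: str, obj: str,
                  namespace: Optional[str] = None,
                  revision: Optional[str] = None) -> str:
    """Build <authority>:[<namespace>]:<object>[:<revision>].

    An empty namespace keeps its colon placeholder; the revision and its
    colon are omitted entirely when no revision is given.
    """
    parts = [authority, namespace or "", obj]
    if revision:
        parts.append(revision)
    for part in parts:
        if ":" in part:
            raise ValueError(f"component may not contain ':': {part!r}")
    return ":".join(parts)


def parse_mage_id(identifier: str) -> MageId:
    """Split an identifier back into its components."""
    parts = identifier.split(":")
    if len(parts) == 3:
        authority, namespace, obj = parts
        revision = None
    elif len(parts) == 4:
        authority, namespace, obj, revision = parts
    else:
        raise ValueError(f"expected 3 or 4 ':'-separated fields: {identifier!r}")
    if not authority or not obj:
        raise ValueError("authority and object are required")
    return MageId(authority, namespace or None, obj, revision)
```

For example, `build_mage_id("birn.example.org", "M6_FCX")` yields `birn.example.org::M6_FCX`, keeping the empty namespace slot so the object is always the third field.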
- Database reference code: specifies whether the raw data come from another DB or whether we point to data held in another DB.
- Acceptable data file formats
- Currently the BIRN Microarray DB accepts CSV; submitters may expect Excel, XML, or some other format for PROCESSED data (after normalization).
- The file format may depend on whether we decide to accept data earlier in the workflow, e.g. a .CEL file (see Daniel Sforza's presentation).
- Recommendation: .CSV, .XML (later), .CEL (Affymetrix CEL file)
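A minimal sketch of how an upload step might enforce the recommended formats, assuming the policy is "CSV and CEL now, XML later". The function name, the three return states, and the rule that a processed-data CSV needs a header plus at least one data row are all hypothetical; .CEL files are accepted by extension only and are not parsed.

```python
import csv
import io
from pathlib import PurePath

# Hypothetical policy mirroring the recommendation: CSV now, XML later,
# plus Affymetrix .CEL raw files.
ACCEPTED_NOW = {".csv", ".cel"}
ACCEPTED_LATER = {".xml"}


def check_upload(filename: str, content: bytes) -> str:
    """Return 'accepted' or 'deferred', or raise ValueError for a rejected file."""
    ext = PurePath(filename).suffix.lower()  # case-insensitive: .CSV == .csv
    if ext in ACCEPTED_LATER:
        return "deferred"
    if ext not in ACCEPTED_NOW:
        raise ValueError(f"unsupported file format: {ext or '(none)'}")
    if ext == ".csv":
        # Processed (normalized) data: require a header row and one data row.
        rows = list(csv.reader(io.StringIO(content.decode("utf-8"))))
        if len(rows) < 2:
            raise ValueError("CSV must contain a header row and at least one data row")
    return "accepted"
```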
- Log entry information
- After upload, provenance information should be stored; possible examples include the IP address and any error messages.
- Recommendations: We need input on what other types of information should be kept in this area.
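One way to hold the provenance record discussed above is a small dataclass serialized to JSON. The schema here (`filename`, `submitter`, `ip_address`, a UTC timestamp, and a list of error messages) is a hypothetical starting point, since the page is still collecting input on what should be logged.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from typing import List


@dataclass
class UploadLogEntry:
    """One provenance record written after an upload (hypothetical schema)."""
    filename: str
    submitter: str
    ip_address: str
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat())
    errors: List[str] = field(default_factory=list)

    @property
    def succeeded(self) -> bool:
        # An upload with no recorded errors is treated as successful.
        return not self.errors

    def to_json(self) -> str:
        # asdict() serializes the fields only, not the 'succeeded' property.
        return json.dumps(asdict(self), sort_keys=True)
```

Adding a new kind of information later only means adding a field with a default, so older log records remain readable.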
Ontology Entries
The entries below need to be mapped to BIRNLex.
- Information about who is providing the data and where they are from
- Role (Submitter/Experimenter)
- Principal Investigator
- Institute name, URL
- Funder and Grant #
- Contact Information (phone, email)
- Other?
- File information
- File name
- File location on other DB (URL, URI, or URN)
- File format
- Date file generated (need this?)
- Other?
- Experiment Information
- Experiment name
- Experiment description
- Array Chip Type
- Annotation or Reference to an external annotation database
- Normalization or Summarization
- Publicly accessible?
- Publication Status (published, unpublished, submitted, in press, confidential)
- Experiment details
- IRB information or human subject information
- Group of Interest (normal, control, diseased, pre-diseased)
- Treatment methods
- Treatment chemical/compound
- Tissue or structure information
- Subject(s) information
- Species (preferred name and scientific name)
- Strains
- Developmental Stage (embryonic stages and adult stages)
- Age/Age Unit
- Sex (male, female, hermaphrodite, other)
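The checklist above could drive a simple completeness check at submission time. This is a sketch under assumptions: which fields are mandatory is still undecided on this page, so `REQUIRED_FIELDS` is a hypothetical subset and the snake_case field names are illustrative. The allowed publication statuses come from the list above.

```python
from typing import Dict, List

# Hypothetical minimal subset of the checklist; the real required set is
# still an open question.
REQUIRED_FIELDS: Dict[str, List[str]] = {
    "submitter": ["role", "principal_investigator", "institute_name", "contact_email"],
    "file": ["file_name", "file_format"],
    "experiment": ["experiment_name", "array_chip_type", "publication_status"],
    "subject": ["species", "strain", "developmental_stage", "sex"],
}

PUBLICATION_STATUSES = {"published", "unpublished", "submitted", "in press", "confidential"}


def missing_fields(submission: Dict[str, Dict[str, str]]) -> List[str]:
    """Return 'section.field' names that are absent or blank, plus bad statuses."""
    missing = []
    for section, fields in REQUIRED_FIELDS.items():
        values = submission.get(section, {})
        for name in fields:
            if not str(values.get(name, "")).strip():
                missing.append(f"{section}.{name}")
    status = submission.get("experiment", {}).get("publication_status")
    if status and status not in PUBLICATION_STATUSES:
        missing.append("experiment.publication_status (invalid value)")
    return missing
```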
Ontology references:
MAGE-TAB
MAGE-TAB is a simplified version of MAGE-ML and is proposed to be part of MAGEv2. Data are arranged in tabular format instead of the XML used by MAGE-ML. MAGE-TAB consists of four types of files: Investigation Description Format (IDF), Array Design Format (ADF), Sample and Data Relationship Format (SDRF), and raw and processed data files.
Examples and Use Cases
Investigation Description Format (IDF) Example:
| Investigation Title | Chronic Mouse Model of Parkinson's Disease |
| Experimental Design | Parkinson_Disease_Design |
| Experimental Factor Name | genetic_modification_design |
| Experimental Factor Type | genetic modification |
| Experimental Factor Term Source REF | BIRNLEX |
| |
| Person Last Name | Sforza |
| Person First Name | Daniel |
| Person Email | sforza@ucla.edu |
| Person Phone | xxx-xxx-xxxx |
| Person Address | |
| Person Affiliation | Laboratory of Neuro Imaging |
| Person Roles | Experimenter |
| Person Roles Term Source REF | |
| |
| Quality Control Type | |
| Quality Control Term Source REF | |
| Replicate Type | |
| Replicate Term Source REF | |
| Date of Experiment | yyyy-mm-dd |
| Public Release Date | yyyy-mm-dd |
| |
| PubMedID | |
| Publication Author List | |
| Publication Status | |
| Experiment Description | |
| |
| Protocol Name | |
| Protocol Type | |
| Protocol Description | |
| Protocol Parameters | |
| Protocol Term Source REF | |
| |
| SDRF File | e-external_file.csv |
| Term Source Name | BIRNLEX |
| Term Source File | http://fireball.drexelmed.edu/birnlex/ |
| Term Source Version | 1.2 |
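MAGE-TAB files are tab-delimited text; the pipes above are only the page's table layout. In an IDF, each row starts with a field name and the following cells hold its values (for example, one column per person). A minimal reader, assuming UTF-8 text already in memory, might look like this; the function name is hypothetical, and a field that appears twice keeps only its last row.

```python
import csv
import io
from typing import Dict, List


def parse_idf(text: str) -> Dict[str, List[str]]:
    """Parse a tab-delimited IDF: first cell is the field name, the rest are values.

    Blank separator rows are skipped and trailing empty cells are dropped.
    """
    idf: Dict[str, List[str]] = {}
    for row in csv.reader(io.StringIO(text), delimiter="\t"):
        if not row or not row[0].strip():
            continue  # blank separator line between IDF blocks
        values = [cell.strip() for cell in row[1:]]
        while values and not values[-1]:
            values.pop()  # drop trailing empty cells
        idf[row[0].strip()] = values
    return idf
```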
Sample and Data Relationship Format (SDRF) Example
| Source Name | Hybridization Name | Array Design REF | TERM SOURCE REF | Array Data File | Derived Array Data File |
| Sample 1 | Experiment 1 | A-AFFY-MOE430 | AFFYMETRIX_TERM | Data1.CEL | AFFY_data1.CHP |
| Sample Name | Characteristics [Organism] | Characteristics [Organism Part] | Characteristics [Organism Part] | Characteristics [Developmental Stage] | Protocol REF | Parameter Value [Compound] | Extract Name | Labeled Extract Name | Label | Hybridization Name |
| M6_FCX | Mouse | Brain | Frontal Cortex | Adult | | MPTP | M6_FCX_Ext | M6_FCX_lab_biotin | Biotin | M6_CTx_hyb |
| C6_FCX | Mouse | Brain | Frontal Cortex | Adult | | Saline | C6_FCX_Ext | C6_FCX_lab_biotin | Biotin | C6_CTX_hyb |
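Unlike the IDF, an SDRF is column-oriented: one header row, then one row per sample-to-data path. Headers can repeat, as with the two `Characteristics [Organism Part]` columns above, so a dictionary keyed by header would silently lose values. This sketch (hypothetical function names, tab-delimited input assumed) keeps ordered (header, value) pairs instead.

```python
import csv
import io
import re
from typing import List, Tuple

Row = List[Tuple[str, str]]

# Matches headers such as "Characteristics [Organism]" or "Characteristics[Organism]".
_BRACKET = re.compile(r"^(?P<name>[^\[]+?)\s*\[(?P<qual>[^\]]+)\]$")


def parse_sdrf(text: str) -> List[Row]:
    """Read a tab-delimited SDRF into ordered (header, value) pairs per row.

    Pairs stay in column order because SDRF headers may repeat; rows that are
    shorter than the header are truncated by zip().
    """
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    header = next(reader)
    return [list(zip(header, row)) for row in reader if any(c.strip() for c in row)]


def characteristics(row: Row) -> List[Tuple[str, str]]:
    """Return (category, value) pairs from the 'Characteristics [...]' columns."""
    out = []
    for column, value in row:
        match = _BRACKET.match(column.strip())
        if match and match.group("name").strip() == "Characteristics" and value:
            out.append((match.group("qual"), value))
    return out
```

Applied to the C6_FCX row, `characteristics` would return Organism = Mouse followed by both Organism Part values, Brain and Frontal Cortex, in column order.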
Tab2MAGE
Tab2MAGE is a software package supported by the ArrayExpress curators that generates MAGE-ML from a Tab2MAGE spreadsheet document. The spreadsheet consists of three sections, separated by one or more blank lines: Experiment, Protocol, and Hybridization.
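The blank-line section layout can be illustrated by splitting an exported spreadsheet into blocks. This is not Tab2MAGE's own parser: the function name is hypothetical, and the sketch assumes the three blocks appear in the order listed (Experiment, Protocol, Hybridization) and that a row containing only tabs counts as blank.

```python
from typing import Dict, List

SECTION_ORDER = ["Experiment", "Protocol", "Hybridization"]


def split_sections(text: str) -> Dict[str, List[str]]:
    """Split a tab-delimited export into blank-line-separated blocks.

    Runs of several blank lines count as one separator. Assumes the blocks
    appear in SECTION_ORDER; real Tab2MAGE input may identify them differently.
    """
    blocks: List[List[str]] = []
    current: List[str] = []
    for line in text.splitlines():
        if not line.strip():  # blank, or only tabs/spaces from empty cells
            if current:
                blocks.append(current)
                current = []
        else:
            current.append(line)
    if current:
        blocks.append(current)
    if len(blocks) != len(SECTION_ORDER):
        raise ValueError(f"expected {len(SECTION_ORDER)} sections, found {len(blocks)}")
    return dict(zip(SECTION_ORDER, blocks))
```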
GEO SOFTmatrix and GEOarchive
Daniel submitted a microarray dataset to GEO using the SOFTmatrix and GEOarchive metadata formats and found them much clearer, thanks to the examples and annotations provided for large Affymetrix datasets. GEO's format may make it easier for researchers to organize their information and to integrate it with the database entries.
Example
- Daniel's submitted file, which also includes the mapping between GEOarchive metadata and MAGE-TAB metadata.
- Link to GEO site
- Metadata mapping between GEOarchive and MAGE-TAB.
| GEOarchive | MAGE-TAB |
| SERIES | IDF |
| title | Investigation Title |
| summary | Experiment Description |
| overall design | Experimental Design |
| type | N/A |
| contributor[] | Person Last Name, Person First Name, Person Mid Initial |
| |
| PLATFORM | |
| title | Investigation Title |
| technology | Protocol Contact |
| distribution | N/A |
| organism | Characteristics[] in SDRF section |
| manufacturer | Term Source |
| manufacture protocol | Term Source File |
| |
| SAMPLES | SDRF |
| Sample name | Sample Name |
| title | Hybridization Name |
| CEL file [Affymetrix submissions only] | Array Data File |
| EXP file [Affymetrix submissions only] | Array Data File |
| source name | Source Name, Material Type |
| organism | Characteristics[organism] |
| characteristics[] | Characteristics[] (e.g. Characteristics[Strain], Characteristics[Gender], Characteristics[Age]) |
| molecule | Extract Name, Material Type |
| label | Label |
| description | Description |
| platform | reference to Term Source Name in IDF file |
| |
| PROTOCOLS | |
| growth protocol | Protocol Name, Protocol Type, Protocol Description |
| treatment protocol | Investigation Title |
| extract protocol | Same as Above |
| EXP file [Affymetrix submissions only] | Same as Above |
| label protocol | Same as Above |
| hyb protocol | Same as Above |
| scan protocol | Same as Above |
| data processing | Normalization Name |
| value definition | N/A |
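Part of the mapping table can be expressed as lookup dictionaries that rename GEOarchive fields to MAGE-TAB headers. This sketch covers only one-to-one rows from the SERIES and SAMPLES blocks; rows that map to two MAGE-TAB columns (such as source name) are left out. The function name and the assumption that sample characteristics arrive as a separate category-to-value dictionary are hypothetical.

```python
from typing import Dict, Optional

# One-to-one rows from the SAMPLES block (GEOarchive field -> SDRF column).
SAMPLE_FIELD_MAP: Dict[str, Optional[str]] = {
    "Sample name": "Sample Name",
    "title": "Hybridization Name",
    "CEL file": "Array Data File",  # Affymetrix submissions only
    "organism": "Characteristics[organism]",
    "label": "Label",
    "description": "Description",
}

# Rows from the SERIES block (GEOarchive field -> IDF field); None marks N/A.
SERIES_FIELD_MAP: Dict[str, Optional[str]] = {
    "title": "Investigation Title",
    "summary": "Experiment Description",
    "overall design": "Experimental Design",
    "type": None,
}


def geo_to_magetab(record: Dict[str, str],
                   field_map: Dict[str, Optional[str]],
                   characteristics: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Rename GEOarchive fields to MAGE-TAB headers, dropping unmapped or N/A fields."""
    out: Dict[str, str] = {}
    for geo_field, value in record.items():
        target = field_map.get(geo_field)
        if target is not None:
            out[target] = value
    # characteristics[] in GEOarchive becomes one Characteristics[...] column each.
    for category, value in (characteristics or {}).items():
        out[f"Characteristics[{category}]"] = value
    return out
```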
MAGE-ML
Example XML broken down by MAGE-ML package
Audit and Security
The Audit and Security package contains classes that describe contact information for an individual or an organization, as well as the security group each belongs to and the role each plays.
| Classes | Item | Detail | Example |
| Audit and Security |
| Contact | Organization | name | |
| | | affiliation | |
| | | roles (ontology term) | |
| | Person | name (last, first) | |
| | | affiliation | |
| | | roles (ontology term) | |
| | | phone | |
| | | email | |
| | | fax | |
| Property Set | IRB | number | |
| | | exp date | |
| | | grant number | |
| | Public rec? | yes or no | |
| |
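The rows of the table above could be modeled with plain dataclasses. These are simplified Python stand-ins for the page's Contact, Person, Organization, and IRB property-set rows, not the MAGE-OM class definitions; the snake_case field names, `IrbPropertySet`, and `contact_line` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Organization:
    name: str
    roles: List[str] = field(default_factory=list)  # ontology terms
    affiliation: Optional["Organization"] = None    # e.g. a parent institution


@dataclass
class Person:
    last_name: str
    first_name: str
    affiliation: Optional[Organization] = None
    roles: List[str] = field(default_factory=list)  # ontology terms, e.g. "Experimenter"
    phone: str = ""
    email: str = ""
    fax: str = ""


@dataclass
class IrbPropertySet:
    """Hypothetical container for the 'Property Set' rows."""
    number: str
    expiration_date: str  # yyyy-mm-dd
    grant_number: str = ""
    public_record: bool = False


def contact_line(person: Person) -> str:
    """Render a one-line contact summary."""
    org = person.affiliation.name if person.affiliation else "no affiliation"
    roles = ", ".join(person.roles) or "no role"
    return f"{person.last_name}, {person.first_name} ({org}; {roles})"
```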
Description
The Description package contains classes that describe the external databases or annotations being referenced.
Array
The Array package contains classes that describe individual arrays, including detailed information on the relevant manufacturing processes. It also contains references to LIMS data that may hold information about the BioMaterials used.
| Classes | Item | Detail | Example |
| Array |
| ArrayManufacture | ID | unique identifier for each manufacturer; use the manufacturer accession number as the prefix | |
| | Person | (Same as Contact->Person) | |
| | Arrays | contains multiple arrays | |
| Array | ID | unique identifier for each array chip |
| | | other array properties should be automatically loaded based on the array chip model |
| |
ArrayDesign
The ArrayDesign package contains classes that describe a microarray design. PhysicalArrayDesign describes the design used to manufacture a physical array.
| Classes | Item | Detail | Remarks and Example |
| Array Design |
| Physical Array Design | Composite Group | | |
| | PhysicalArrayDesign | | |
| | DesignElementGroup | FeatureGroup | Contains multiple Features |
| | | Species | ontology term |
| |
DesignElement
The DesignElement package contains classes that describe the purpose of each element on an Array. A Feature describes an intended location on the Array. Design elements can also be specified as Reporters or CompositeSequences.
Experiment
The Experiment package contains a collection of BioAssays that are related through an ExperimentalDesign. The ExperimentalDesign describes and collects the ExperimentalFactors and the BioAssay information.
BioSequence
BioSequence represents a DNA, RNA, or protein sequence. It can be represented as a clone, a gene, or a sequence.
BioMaterial
The BioMaterial package describes the biological material used and how it was created, through the BioSource, BioSample, and LabeledExtract classes. A BioMaterial can be related to other BioMaterials through a DAG (directed acyclic graph). BioSamples are products of treatments of interest and can serve as the sources of other BioSamples. BioSource holds the raw material information. LabeledExtracts are special BioSamples that carry detectable Compounds.
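The derivation DAG can be illustrated with a small graph walk that traces any BioMaterial back to its BioSources and rejects cycles. The `BioMaterial` dataclass, its `kind` strings, and `sources_of` are hypothetical simplifications, not the MAGE-OM classes.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class BioMaterial:
    name: str
    kind: str  # "BioSource", "BioSample" or "LabeledExtract"
    derived_from: List["BioMaterial"] = field(default_factory=list)
    treatment: str = ""  # e.g. the compound applied to produce this material


def sources_of(material: BioMaterial) -> List[str]:
    """Return the names of the BioSources a material derives from.

    Raises ValueError if the derivation graph contains a cycle, because the
    relationships are required to form a DAG. Shared ancestors are visited once.
    """
    result: List[str] = []
    done: Set[int] = set()

    def visit(m: BioMaterial, path: Set[int]) -> None:
        if id(m) in path:  # m is its own ancestor on the current path
            raise ValueError(f"cycle detected at {m.name}")
        if id(m) in done:
            return
        if m.kind == "BioSource":
            result.append(m.name)
        for parent in m.derived_from:
            visit(parent, path | {id(m)})
        done.add(id(m))

    visit(material, set())
    return result
```

Using the SDRF names above, a chain BioSource → `M6_FCX` (treated with MPTP) → `M6_FCX_Ext` → `M6_FCX_lab_biotin` resolves back to the single source animal.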
BioAssay
BioAssay represents both physical and computational groupings of arrays and BioMaterials. A PhysicalBioAssay is a BioAssay created by the BioAssayCreation event. A MeasuredBioAssay is produced directly from the information in a PhysicalBioAssay by the FeatureExtraction event. A DerivedBioAssay is created by the Transformation BioEvent from one or more MeasuredBioAssays or DerivedBioAssays.
| Classes | Item | Detail | Example |
| BioAssay | DerivedBioAssay | | |
| | MeasuredBioAssay | | |
| | PhysicalBioAssay | | |
| | BioAssayCreation | | |
| | Hybridization | | |
| | BioAssayTreatment | | |
| | FeatureExtraction | | |
| | Image | | |
| | Image Acquisition | | |
| | Channel | | |
| |
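The three BioAssay kinds form a processing chain, which the sketch below models as simplified dataclasses plus two event functions. The class names follow the description above, but the fields, `feature_extraction`, `transform`, and `physical_origins` are hypothetical and far simpler than MAGE-OM. The type check in `transform` enforces that only Measured or Derived BioAssays feed a Transformation.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class PhysicalBioAssay:
    name: str  # created by a BioAssayCreation event, e.g. a Hybridization


@dataclass
class MeasuredBioAssay:
    name: str
    source: PhysicalBioAssay  # produced by the FeatureExtraction event


@dataclass
class DerivedBioAssay:
    name: str
    inputs: List[Union[MeasuredBioAssay, "DerivedBioAssay"]] = field(default_factory=list)
    transformation: str = ""  # description of the Transformation event


AnyBioAssay = Union[PhysicalBioAssay, MeasuredBioAssay, DerivedBioAssay]


def feature_extraction(physical: PhysicalBioAssay, name: str) -> MeasuredBioAssay:
    return MeasuredBioAssay(name, physical)


def transform(name: str, inputs: List[AnyBioAssay], description: str) -> DerivedBioAssay:
    if not inputs:
        raise ValueError("a DerivedBioAssay needs at least one input")
    for bio_assay in inputs:
        if not isinstance(bio_assay, (MeasuredBioAssay, DerivedBioAssay)):
            raise TypeError(f"{type(bio_assay).__name__} cannot feed a Transformation")
    return DerivedBioAssay(name, list(inputs), description)


def physical_origins(assay: AnyBioAssay) -> List[str]:
    """List the PhysicalBioAssays an assay ultimately comes from, without duplicates."""
    if isinstance(assay, PhysicalBioAssay):
        return [assay.name]
    if isinstance(assay, MeasuredBioAssay):
        return [assay.source.name]
    names: List[str] = []
    for parent in assay.inputs:
        for name in physical_origins(parent):
            if name not in names:
                names.append(name)
    return names
```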
BioAssayData
BioAssayData describes gene expression data. BioDataValues contain the actual values.
Protocol
QuantitationType
QuantitationType describes the kinds of data communicated in microarray experiments. A QuantitationType is either a standard or a specialized type. The standard types include MeasuredSignal, DerivedSignal, Ratio, PresentAbsent, PValue, Error, and ExpectedValue; specialized types are user-defined.
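The standard/specialized split can be sketched as an enum for the standard types named above plus a separate class for user-defined ones. The enum lists only the types mentioned on this page, which may not be the complete MAGE set; `resolve_quantitation_type` and the idea of resolving by column label are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Union


class StandardQuantitationType(Enum):
    """Standard types named on this page (not necessarily the full MAGE list)."""
    MEASURED_SIGNAL = "MeasuredSignal"
    DERIVED_SIGNAL = "DerivedSignal"
    RATIO = "Ratio"
    PRESENT_ABSENT = "PresentAbsent"
    P_VALUE = "PValue"
    ERROR = "Error"
    EXPECTED_VALUE = "ExpectedValue"


@dataclass(frozen=True)
class SpecializedQuantitationType:
    """A user-defined type, e.g. a lab-specific quality score."""
    name: str
    description: str = ""


QuantitationType = Union[StandardQuantitationType, SpecializedQuantitationType]


def resolve_quantitation_type(label: str) -> QuantitationType:
    """Map a label to a standard type, otherwise treat it as specialized."""
    try:
        return StandardQuantitationType(label)  # lookup by enum value
    except ValueError:
        return SpecializedQuantitationType(label)
```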
Case Reference
Working Page to look at the Minimum Amount of Metadata a Microarray Data Submitter should provide