"MEEBO" MOUSE GENOME SET

Naming Conventions for "Oligo_ID"

Each oligo is assigned a unique identifier in the "Oligo_ID" column of the "Probes" worksheet tab in the Excel document.

General Format: mXXnnnnnn

Where:
"m" is an oligo derived from the MEEBO set.
"XX" is a two-letter code specifying the group to which the oligo belongs (see below).
"nnnnnn" is a 6-digit number indicating the order of the oligo. Plate 001, Row A, Column 1 is 000001.

Empty wells are designated as "EMPTY".

Two-letter "XX" Group Codes:
CA - control, antisense; probes recognizing the antisense strand of selected mouse and doping transcripts
CD - control, doping; probes recognizing spike-in transcripts from Methanococcus and B. subtilis (Stanford) and from commercial suppliers
CM - control, mismatch; anchored and distributed mismatched versions of selected spike-in transcripts plus distributed and anchored mismatch for 5 positive control mouse genes
CN - control, negative; randomized 70mers, selected not to recognize mouse sequences
CP - control, positive; Ubiquitin C probe as a corner-placed PMT aid, assuming that sector widths are 28 or 29 spots (192 replicates), plus 10 mouse "housekeeping" genes, 20 copies of each, based on Vandesompele et al., Genome Biol. 2002 3(7):RESEARCH0034
CT - control, tiling; series of probes designed to recognize sequences at varying distances from the 3' end plus 11 mouse genes and selected spike-in transcripts
MA - mouse alternative exonic; alternative spliced/skipped exons collected through extensive curation of 5 published datasets by Max Diehn, Ash Alizadeh, Jean Yang, and Catherine Foo
MC - mouse constitutive exonic; Rockefeller MouSDB3 constitutive exons and locuslink2ucsc constitutive exons
MO - mouse other; includes syntenic orthologs of human loci exhibiting cis-antisense transcription based on Yelin et al., Nat Biotech 2003, microRNA tagged templates, mitochondrial genes, BCR and TCR genes, rRNA genes
MR - mouse mRNAs; mRNA-derived 70mers which may span >1 exon
TG - transgenes; recognize sequences used in transgenic constructs
VI - virus; sequences from selected viruses that infect mice (sequences from the ViroChip)

Naming Conventions for "Sequence_ID"

Some oligo sequences (especially the "CP" class of positive controls) are replicated within the set. To identify instances where the same sequence is used for multiple oligos, a sequence identifier is listed in the "Sequence_ID" column of the "Probes" worksheet tab in this document. The sequence ID can be used in place of the oligo ID when the goal is to generate an aggregate measure over a series of replicates (e.g., the average Cy5 intensity for the 192 probes with sequence mSQ000001).

General Format: mSQnnnnnn

Where:
"m" is an oligo derived from the MEEBO set.
"SQ" designates a Sequence_ID.
"nnnnnn" is a 6-digit number indicating the order of the oligo; this number is the same as the number from the Oligo_ID that represents the first instance of the sequence.

Empty wells are designated as "EMPTY".

Naming Conventions for "Description"

Briefly, there are three main categories of 70mers as described in the "Description" field:

I. Positive controls (PC): these include genes expressed at a range of abundances that are very robust for normalization, PMT adjustment, and use as bright gridding aids ("landing lights" strategically placed at the top of each sector), etc. These include:
1. Tiled mouse genes (n=11, an improved set with a range of sizes and expression levels within "Universal Reference" RNAs from Stratagene)
2. Normalization mouse genes (n=10 genes, 20 copies of each, based on http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=pubmed&dopt=Abstract&list_uids=12184808 )
3. Distributed and anchored mismatch controls for 5 mouse genes selected in #2 above
4. Ubiquitin C as a corner-placed PMT aid, assuming that sector widths are 28 or 29 spots
5. others

The information within the square brackets describes these more specifically:
PC Details: [Positive Control Type(s)]

II. MEEBOChip Core Collection (MCC): This is the main backbone of the array; it covers over 25k mouse genes and includes:
1. Rockefeller MouSDB3 constitutive exons/islands (oligo names start with 'scl' followed by a number >0)
2. locuslink2ucsc constitutive exons/islands (oligo names start with 'scl0' followed by a number >0)
3. mRNA-derived 70mers which may span intron/exon boundaries (oligo names start with 'scl00' followed by a number >0)
4. an unprecedented cohort of alternative spliced/skipped exons collected through extensive curation of 5 published datasets by Max Diehn, myself, Jean Yang, and Catherine Foo (oligo names start with 'scl000' followed by a number >0)
5. syntenic orthologs of human loci exhibiting cis-antisense transcription based on Yelin et al., Nat Biotech 2003 (oligo names start with 'scl0000' followed by a number >0)
6. microRNA Tagged Templates (these are complicated and will need to be explained in detail)
7. Transgenes/Cassettes (e.g., GFP, YFP, beta-Gal, etc.)
8. mitochondrial genes
9. BCR and TCR genes
10. select murine viruses from the ViroChip
11. rRNA genes
12. others

The specific annotations for each 70mer in the MCC are detailed within the Comment field and are caret (^) delimited as follows:
MCC Details: [LLID (060504)^Design type^cluster^Long Oligoname^output0set^worstxhyb^oligo_3_marg]

LLID refers to LocusLink ID, which can be readily linked with gene annotation data through a variety of tools, including BatchSource.

III. Doped Controls (DC): This is the largest collection of 70mers to detect doped/spiked species that are either commercially available now or will soon be readily available. These include:
1. MJDC from Stanford (192)
2. Affymetrix spikes (4 genes)
3. Stratagene spikes
4. Ambion spikes (replication positive controls included)
5. B. subtilis spikes
6. anchored and distributed mismatched versions of 5 genes (4 MJDC, 1 Ambion)
7. tiled MJDC (4 genes) and Ambion (1 gene) and Affymetrix (4 genes)
8. antisense versions of all spikes
9. others

The specific annotations for each 70mer in the DC are detailed within the Comment field and are caret (^) delimited as follows:
Doping Control Details: [Category^RNA Length^oligo3mag^Design^RNA Source^RNA Species^Organism^Pick70Name]

In addition, roughly 100 random sequences and ~1 empty well per plate have been allotted as additional negative controls.
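
The naming conventions above can be applied programmatically. The following Python is a minimal sketch, not part of the MEEBO distribution: the function names (parse_oligo_id, split_details, mean_by_sequence_id) are hypothetical, the field lists are copied from the bracketed templates above (with "LLID (060504)" shortened to "LLID"), and it assumes each annotation is available as a single bracketed, caret-delimited string. The actual cell contents in the Excel document may differ and should be checked before relying on this.

```python
import re
from collections import defaultdict
from statistics import mean

# Two-letter group codes as listed in this document.
GROUP_CODES = {
    "CA": "control, antisense", "CD": "control, doping",
    "CM": "control, mismatch", "CN": "control, negative",
    "CP": "control, positive", "CT": "control, tiling",
    "MA": "mouse alternative exonic", "MC": "mouse constitutive exonic",
    "MO": "mouse other", "MR": "mouse mRNAs",
    "TG": "transgenes", "VI": "virus",
}

# Field names taken from the "MCC Details" and "Doping Control Details" templates.
MCC_FIELDS = ["LLID", "Design type", "cluster", "Long Oligoname",
              "output0set", "worstxhyb", "oligo_3_marg"]
DC_FIELDS = ["Category", "RNA Length", "oligo3mag", "Design",
             "RNA Source", "RNA Species", "Organism", "Pick70Name"]

# General format mXXnnnnnn: "m", two-letter group code, 6-digit order number.
OLIGO_ID_RE = re.compile(r"^m([A-Z]{2})(\d{6})$")


def parse_oligo_id(oligo_id):
    """Return (group_code, order_number), or None for an "EMPTY" well."""
    if oligo_id == "EMPTY":
        return None
    match = OLIGO_ID_RE.match(oligo_id)
    if match is None or match.group(1) not in GROUP_CODES:
        raise ValueError(f"not a recognized Oligo_ID: {oligo_id!r}")
    return match.group(1), int(match.group(2))


def split_details(bracketed, field_names):
    """Split an assumed '[a^b^c]' annotation string into a dict by field name."""
    values = bracketed.strip().strip("[]").split("^")
    if len(values) != len(field_names):
        raise ValueError(f"expected {len(field_names)} fields, got {len(values)}")
    return dict(zip(field_names, values))


def mean_by_sequence_id(rows):
    """Average intensities over replicates sharing a Sequence_ID.

    rows is an iterable of (sequence_id, intensity) pairs; "EMPTY" wells
    are skipped.
    """
    groups = defaultdict(list)
    for sequence_id, intensity in rows:
        if sequence_id != "EMPTY":
            groups[sequence_id].append(intensity)
    return {seq_id: mean(values) for seq_id, values in groups.items()}
```

For example, parse_oligo_id("mCP000001") yields the group code "CP" and order number 1, and mean_by_sequence_id can collapse the replicate probes sharing one Sequence_ID into a single average intensity, as suggested in the "Sequence_ID" section.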