TranscriptionDetector is a tool for finding probes that measure significantly expressed loci in a genomic array experiment. This software was developed in Harmen Bussemaker's Lab at Columbia University by Gabor Halasz and Marinus van Batenburg, with the source code ported from Perl to C by Xiang-Jun Lu. TranscriptionDetector was originally created to analyze the transcriptome of Drosophila melanogaster. That study, a collaboration with Kevin White's Lab at Yale, is described in our 2004 Science paper. The method was later elaborated and presented in Halasz et al., Genome Biol. 2006 July 19; 7(7):R59.
The TranscriptionDetector algorithm requires that a set of negative control probes (NCPs) be included on each microarray used in the study. The significance of a data probe is evaluated relative to these NCPs, which represent non-specific binding. The algorithm consists of the following four steps (see also the figure below):
- Correcting each probe's signal intensity by taking into account the effect of probe sequence on non-specific hybridization. This correction is determined by parameters obtained after applying a position-dependent sequence model (Mei et al. 2003) to the NCPs.
- Computing each probe's experiment-specific p-value by comparing the probe signal intensity to the NCP signal intensity distribution.
- Non-parametrically combining all experiment-specific p-values for each probe into a single confidence score.
- Applying the FDR procedure (Benjamini & Hochberg, 1995) to these confidence scores.
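Steps 2 and 4 above can be illustrated with a minimal Python sketch for a single experiment. It is not the TranscriptionDetector implementation: it skips the sequence correction (step 1) and the combination across experiments (step 3), and it assumes that a p-value is the fraction of NCP signals at or above the probe signal, with a +1 pseudo-count so that no p-value is exactly zero. The function names `empirical_p_values` and `benjamini_hochberg` are hypothetical.

```python
from bisect import bisect_left


def empirical_p_values(probe_signals, ncp_signals):
    """p-value of each probe against the NCP signal distribution.

    Assumption (not taken from the program): p is the fraction of NCP
    signals >= the probe signal, with a +1 pseudo-count.
    """
    ncp_sorted = sorted(ncp_signals)
    n = len(ncp_sorted)
    p_values = []
    for signal in probe_signals:
        # Number of NCPs whose signal is at least as high as this probe's.
        n_at_or_above = n - bisect_left(ncp_sorted, signal)
        p_values.append((n_at_or_above + 1) / (n + 1))
    return p_values


def benjamini_hochberg(p_values, fdr=0.01):
    """Indices of the p-values declared significant by the
    Benjamini-Hochberg step-up procedure at the given FDR."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    largest_passing_rank = 0
    for rank, idx in enumerate(order, start=1):
        # Step-up rule: keep the largest rank k with p_(k) <= k * fdr / m.
        if p_values[idx] <= rank * fdr / m:
            largest_passing_rank = rank
    return sorted(order[:largest_passing_rank])
```

Because the procedure is step-up, a probe whose own p-value misses its threshold is still reported when a probe ranked after it passes.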
The TranscriptionDetector program has been designed to be easy to use. To get command-line help, type TranscriptionDetector -help. Since the C source code is available, you can consult it for the technical details of the underlying algorithm and make changes as necessary.
TranscriptionDetector can be run as shown in the following example:
TranscriptionDetector -expr=log_EPNEPRO -s=probeVsId_EPNEPRO \
    -ncp_p=RO.idlist -ncp_s=RO.idlist -fdr=0.05 -o=sample.out
The following command-line parameters are recognized:
- Required parameters
- -expression=file_name – tab-delimited expression file for all probes (including negative control probes)
- -sequence=file_name – tab-delimited probe sequence file for all probes (including negative control probes)
- -ncp_seq_file=name – file listing negative control probe IDs used for SEQUENCE CORRECTION, one per line (mandatory unless the -noseq_correction option is used)
- -ncp_pvalue_file=name – file listing negative control probe IDs used for P-VALUE CALCULATION, one per line (usually the same file as given to -ncp_seq_file)
- -output=file_name – name of the output file
- Optional parameters
- -fdr_value=float – threshold for FDR (default: 0.01)
- -directory=string – output directory (default: ".", i.e., current directory)
- -noseq_correction – switch for omitting sequence correction (see -ncp_seq_file=name)
- -exclude_ncp – switch for excluding negative controls in the output significant probe list
- -verbose – switch for verbose output
- -help – this help message
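When the program is driven from a script, the command line can be assembled from these parameters. The following is a minimal Python sketch, not part of the distribution. It assumes that the full option names are accepted exactly as listed above (the example earlier uses shortened forms such as -expr and -o), and the helper name `build_command` is hypothetical.

```python
import shlex


def build_command(expression, sequence, ncp_seq_file, ncp_pvalue_file, output,
                  fdr_value=None, directory=None, verbose=False):
    """Return the argument list for one TranscriptionDetector run.

    Assumption: the long option names are accepted as written in the
    parameter list; only options documented there are emitted.
    """
    args = [
        "TranscriptionDetector",
        f"-expression={expression}",
        f"-sequence={sequence}",
        f"-ncp_seq_file={ncp_seq_file}",
        f"-ncp_pvalue_file={ncp_pvalue_file}",
        f"-output={output}",
    ]
    # Optional parameters are added only when the caller sets them, so
    # the program's own defaults apply otherwise.
    if fdr_value is not None:
        args.append(f"-fdr_value={fdr_value}")
    if directory is not None:
        args.append(f"-directory={directory}")
    if verbose:
        args.append("-verbose")
    return args


# Build the command for the bundled example files and show it as a
# shell-quoted string; nothing is executed here.
example = build_command("log_EPNEPRO", "probeVsId_EPNEPRO",
                        "RO.idlist", "RO.idlist", "sample.out",
                        fdr_value=0.05)
print(shlex.join(example))
```

The returned list can be passed to `subprocess.run(example, check=True)` once TranscriptionDetector is on the search path. Passing a list rather than a single string avoids shell quoting problems with unusual file names.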
You must accept our license terms and register before you can download and use the TranscriptionDetector program.
Download the software
Please register first (see above). You can then download either the source code or a pre-compiled binary distribution for Linux, Cygwin (Windows), Mac OS X, or SunOS. Since TranscriptionDetector makes use of some Numerical Recipes (NR) ANSI C routines, you must have a valid NR license installed on your computer if you want to compile TranscriptionDetector from source yourself. If, however, you only install a pre-compiled distribution of TranscriptionDetector, you do not need an NR license to run it.
Installing TranscriptionDetector is easy – you just need to perform the following simple steps:
tar zxvf TranscriptionDetector_os_verNum.tar.gz
The os_verNum part of the tarball name is specific to your selection. For example, for the Linux binary distribution, the current version of the tarball is called TranscriptionDetector_Linux_v1.0.tar.gz, and for the source code, it is named TranscriptionDetector_src_v1.0.tar.gz.
This will create a directory named TranscriptionDetector under your current working directory, which contains three subdirectories (bin, doc, and examples), plus src if you have downloaded the source code distribution.
Normally, you would install TranscriptionDetector directly under your home directory, although any location will be just fine.
Skip to the next step if you have downloaded a pre-compiled binary version.
Edit the Makefile using your favorite editor (e.g., vi or emacs): you need to change the value of the macro NRC_RECIPES (the 6th line) to the location of your NR ANSI C routines (2nd edition). On my computer, it is set to /home/lux/src/recipes_c-ansi/recipes.
make patch: applies patches to the original NR ANSI C routines to make them usable with TranscriptionDetector.
make: generates the executable file TranscriptionDetector and moves it into the bin directory.
To run TranscriptionDetector, you need to set up the environment variable TRANSCRIPTIONDETECTOR and add $TRANSCRIPTIONDETECTOR/bin to your command line search path. The Perl script TranscriptionDetector_setup is used to automate this process. It checks for your shell (bash, csh/tcsh), and produces the settings to be included into your login script. For example, on my system running SuSE Linux, it produces the following:
for your 'bash' shell, please add the following into ~/.bashrc:
--------------------------------------------------------------
export TRANSCRIPTIONDETECTOR=/home/lux/TranscriptionDetector
export PATH=/home/lux/TranscriptionDetector/bin:$PATH
--------------------------------------------------------------
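After adding these settings and starting a new shell, the two requirements described above (the TRANSCRIPTIONDETECTOR variable and its bin directory on the search path) can be checked with a short script. The following Python sketch is only an illustration and is unrelated to the TranscriptionDetector_setup script. The function name `check_setup` is hypothetical.

```python
import os


def check_setup(env):
    """Return a list of problems found in the given environment mapping.

    An empty list means TRANSCRIPTIONDETECTOR is set and its bin
    subdirectory appears in PATH.
    """
    problems = []
    root = env.get("TRANSCRIPTIONDETECTOR")
    if not root:
        problems.append("TRANSCRIPTIONDETECTOR is not set")
        return problems
    bin_dir = os.path.normpath(os.path.join(root, "bin"))
    # Normalize PATH entries so that trailing slashes do not matter.
    entries = [os.path.normpath(p)
               for p in env.get("PATH", "").split(os.pathsep) if p]
    if bin_dir not in entries:
        problems.append(f"{bin_dir} is not in PATH")
    return problems


# Check the environment of the current process.
for problem in check_setup(os.environ):
    print("setup problem:", problem)
```

The function takes the environment as a plain mapping, so it can be tried on a hand-written dictionary as well as on `os.environ`.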
- For command line help, type TranscriptionDetector -h
- The examples directory contains a sample case for you to verify your TranscriptionDetector installation and get familiar with the program. It contains the following files:
- log_EPNEPRO – sample tab-delimited expression file for all probes, including negative control probes (NCPs) (input file).
- probeVsId_EPNEPRO – sample tab-delimited probe sequence file for all probes, including NCPs (input file).
- RO.idlist – sample NCP file listing the IDs used both for sequence correction and for p-value calculations (input file).
- sample.out – output file containing a list of probe IDs that are expressed above background.
- used_options – a detailed listing of command line options for the current run so that the results can be exactly reproduced.
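One way to use these files for verification is to rerun the example and compare the resulting list of probe IDs with the distributed sample.out. The sketch below is a hedged Python illustration. It assumes that each non-blank output line starts with a probe ID, which the description above does not state explicitly, so check the actual file layout first. The function names and the file name my_run.out are hypothetical.

```python
def parse_probe_ids(lines):
    """Collect probe IDs from an iterable of text lines.

    Assumption: one record per line, with the probe ID as the first
    whitespace-delimited field. Blank lines are ignored.
    """
    ids = set()
    for line in lines:
        fields = line.split()
        if fields:
            ids.add(fields[0])
    return ids


def compare_runs(reference_lines, new_lines):
    """Return (missing, extra): IDs only in the reference output and
    IDs only in the new output, each as a sorted list."""
    reference = parse_probe_ids(reference_lines)
    new = parse_probe_ids(new_lines)
    return sorted(reference - new), sorted(new - reference)


# Typical use, with my_run.out standing in for your own output file:
#
#   with open("examples/sample.out") as ref, open("my_run.out") as new:
#       missing, extra = compare_runs(ref, new)
#   print(len(missing), "missing;", len(extra), "extra")
```

Two empty lists indicate that the rerun reproduced the reference set of significant probes. Comparing sets rather than raw files tolerates differences in line order.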
- Halasz G, van Batenburg MF, Perusse J, Hua S, Lu XJ, White KP & Bussemaker HJ (2006). "Detecting Transcriptionally Active Regions Using Genomic Tiling Arrays" Genome Biology, 7:R59.
- Stolc V, Gauhar Z, Mason C, Halasz G, van Batenburg MF, Rifkin SA, Hua S, Herreman T, Tongprasit W, Barbano PE, Bussemaker HJ & White KP (2004). "A gene expression map for the euchromatic genome of Drosophila melanogaster." Science, 306(5696), 655-660.
- Mei R, Hubbell E, Bekiranov S, Mittmann M, Christians FC, Shen MM, Lu G, Fang J, Liu WM, Ryder T, Kaplan P, Kulp D & Webster TA (2003). "Probe selection for high-density oligonucleotide arrays." Proc Natl Acad Sci USA, 100(20), 11237-42.
- Bailey TL & Gribskov M (1998). "Combining evidence using p-values: application to sequence homology searches." Bioinformatics, 14(1), 48-54.
- Benjamini Y & Hochberg Y (1995). "Controlling the false discovery rate: a practical and powerful approach to multiple testing." J. R. Statist. Soc. B, 57(1), 289-300.