A collection of software packages and publications for ChIP-seq and other NGS analysis
(Most of this content was adapted from the SEQanswers forum, with thanks to
SEQanswers members sci_guy and ECO, who organized the list of tools, and from
the Bioinformatics NGS virtual issue.)
Integrated solutions
# Galaxy - Interactive and reproducible genomics. A job web portal. Paper
link
# PIQA - Pipeline for Illumina G1 Genome Analyzer
Data Quality Assessment. Paper
link
# SHORE - SHORE, for Short Read, is a mapping and analysis
pipeline for short DNA sequences produced on an Illumina Genome Analyzer. A
suite created by the 1001 Genomes project. Source for POSIX. Paper link
# ShortRead - A Bioconductor
package for input, quality assessment, and exploration of high throughput
sequence data. Paper
link
ChIP-Seq and other counting-related NGS analysis
# BS-Seq - The source code and data used by the paper
"Shotgun Bisulphite Sequencing of the Arabidopsis Genome Reveals DNA
Methylation Patterning". POSIX. Paper
link
# chipseq - A Bioconductor package for analyzing ChIP-seq data.
# ChIPmeta - Hierarchical hidden
Markov model with application to joint analysis of ChIP-chip and ChIP-seq
data. Paper
link
# ChIPSeq - Program used by the paper
“Genome-Wide Mapping of in Vivo
Protein-DNA Interactions”. Paper link
# ChIPDiff - An HMM approach to
genome-wide identification of differential histone modification sites from
ChIP-seq data. Paper
link
# CisGenome - An integrated software system for analyzing
ChIP-chip and ChIP-seq data. Paper
link
# CNV-Seq - CNV-seq, a new method to detect copy number
variation using high-throughput sequencing. Perl/R. Paper link
# FindPeaks - Performs analysis of ChIP-Seq experiments. It uses a
naive algorithm for identifying regions of high coverage, which represent
Chromatin Immunoprecipitation enrichment of sequence fragments, indicating
the location of a bound protein of interest. Java/OS independent. The latest
versions are available as part of the Vancouver Short Read
Analysis Package. Paper
link
# F-seq - A feature density estimator for high-throughput
sequence tags. Paper
link
# MACS - Model-based Analysis for ChIP-Seq. MACS empirically
models the length of the sequenced ChIP fragments, which tends to be
shorter than sonication or library construction size estimates, and uses it
to improve the spatial resolution of predicted binding sites. MACS also
uses a dynamic Poisson distribution to effectively capture local biases in
the genome sequence, allowing for more sensitive and robust prediction. Paper
link
# PeakSeq - Systematic Scoring of ChIP-Seq Experiments
Relative to Controls. A two-pass approach for scoring ChIP-Seq data
relative to controls. The first pass identifies putative binding sites and
compensates for variation in the mappability of sequences across the
genome. The second pass filters out sites that are not significantly
enriched compared to the normalized input DNA and computes a precise
enrichment and significance. C/Perl. Paper
link
# QuEST - Quantitative Enrichment of Sequence Tags. From the
2008 publication Genome-wide analysis of
transcription factor binding sites based on ChIP-Seq data. (C++). Paper
link
# SICER - A clustering
approach for identification of enriched domains from histone modification
ChIP-Seq data. Paper
link
# SISSRs - Site Identification from Short Sequence Reads. BED
file input.
Perl. Paper link
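The MACS entry above mentions a dynamic Poisson distribution that captures local biases. A minimal Python sketch of that idea follows, assuming the local rate is the largest of a genome-wide background rate and rates estimated from control tags in windows around the candidate site. The function names, the window sizes, and the input layout are illustrative assumptions, not the MACS code or its interface.

```python
import math

def poisson_sf(k, lam):
    """P(X >= k) for X ~ Poisson(lam), computed as 1 minus the lower tail."""
    if k <= 0:
        return 1.0
    term = math.exp(-lam)      # P(X = 0)
    cdf = term
    for i in range(1, k):
        term *= lam / i        # P(X = i) derived from P(X = i - 1)
        cdf += term
    return max(0.0, 1.0 - cdf)

def local_lambda(genome_rate, control_counts, window):
    """Expected tag count in a candidate window of `window` bp.

    genome_rate    -- background tags per bp over the whole genome
    control_counts -- {window_size_bp: control tag count around the site}
                      (hypothetical layout chosen for this sketch)
    """
    rates = [genome_rate * window]
    for size, count in control_counts.items():
        # Scale each control window's count down to the candidate window size.
        rates.append(count * window / size)
    # Taking the maximum makes the test conservative where the control is dense.
    return max(rates)

# Example: a 200 bp candidate window holding 12 ChIP tags.
lam = local_lambda(0.001, {1000: 4, 5000: 10, 10000: 12}, 200)
p_value = poisson_sf(12, lam)
```

Because the rate is taken as a maximum, a site sitting in a region where the control is locally enriched needs more ChIP tags to reach the same significance. The simple summation in `poisson_sf` is only suitable for small rates; a production tool would use a numerically safer tail computation.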
Align/Assemble to a reference
# Bowtie - Ultrafast, memory-efficient short read aligner. It
aligns short DNA sequences (reads) to the human genome at a rate of 25
million reads per hour on a typical workstation with 2 gigabytes of memory.
Uses a Burrows-Wheeler-Transformed (BWT) index. Linux, Windows, and Mac OS
X. Paper link
# ELAND - Efficient Large-Scale Alignment of Nucleotide
Databases. Whole genome alignments to a reference genome allowing up to 2
errors per match. Written by Illumina
author Anthony J. Cox for the Solexa 1G machine.
# Exonerate - Various forms of pairwise alignment (including
Smith-Waterman-Gotoh) of DNA/protein
against a reference. C for POSIX. Paper link
# GMAP - GMAP (Genomic Mapping and Alignment Program) for
mRNA and EST Sequences. C/Perl for Unix. Paper
link
# MAQ - Mapping and Assembly with Qualities (renamed from MAPASS2). Particularly
designed for Illumina, with preliminary functions
to handle ABI SOLiD data. Features
extensive supporting tools for DIP/SNP detection, etc. C++ source. Paper link
# MUMmer - MUMmer is a modular system for the rapid whole
genome alignment of finished or draft sequence. Released as a package
providing an efficient suffix tree library, seed-and-extend alignment, SNP
detection, repeat detection, and
visualization tools. POSIX OS required. Paper link
# PASS - Supports Illumina, SOLiD and Roche-FLX data
formats and allows the user to finely tune the sensitivity of the
alignments. Uses a spaced-seed initial filter, then a Needleman-Wunsch (NW)
dynamic programming step, followed by a Smith-Waterman-like (SW-like) local
alignment. Win/Linux. Paper
link
# ProbeMatch - Rapid alignment of
oligonucleotides to a genome, allowing both gaps and mismatches. Paper
link
# Pyro-Align - Multiple Sequence Alignment System for Pyrosequencing Reads. Paper
link Free text
# RMAP - Assembles 20-64 bp Illumina reads to a FASTA reference genome. POSIX OS
required. Paper link
# SeqMap - Supports up to 5 mismatches/INDELs. Highly
tunable. Builds available for most operating systems. Paper
link
# SHRiMP - Assembles to a reference sequence. Developed with
Applied Biosystems' colourspace genomic
representation in mind. POSIX. Paper
link
# Slider - An application for the Illumina Sequence Analyzer
output that uses the probability files instead of the sequence files as an
input for alignment to a reference sequence or a set of reference sequences.
Paper
link.
# SOAP
- SOAP (Short Oligonucleotide Alignment Program). A program for efficient
gapped and ungapped alignment of short oligonucleotides onto reference
sequences. The updated version uses a BWT. Can call SNPs and INDELs. C++,
POSIX. Paper
link.
# SSAHA - SSAHA (Sequence Search and Alignment by Hashing
Algorithm) is a tool for rapidly finding near exact matches in DNA or protein databases using a hash table. C++ for Linux/Alpha.
Paper link
# SOCS - Aligns SOLiD data. SOCS is built on an iterative
variation of the Rabin-Karp string search algorithm, which uses hashing to
reduce the set of possible matches, drastically
increasing search speed. Paper
link
# SWIFT - The SWIFT suite is a software collection for fast
index-based sequence comparison. It contains: SWIFT — fast local alignment
search, guaranteeing to find epsilon-matches between two sequences. SWIFT
BALSAM — a very fast program to find semiglobal non-gapped alignments based
on k-mer seeds. Paper link
# Vmatch -
A versatile software tool for efficiently solving large scale sequence
matching tasks. Vmatch subsumes the software tool REPuter, but is much more
general, with a very flexible user interface, and improved space and time
requirements. Essentially a large string matching toolbox. POSIX.
# ZOOM - ZOOM (Zillions Of Oligos Mapped) is designed to map
millions of short reads, generated by next-generation sequencing technology,
back to the reference genomes, and carry out post-analysis. ZOOM is
developed to be highly accurate, flexible, and user-friendly with speed
being a critical priority. Commercial. Supports Illumina and SOLiD data. Paper
link
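Several aligners in this section rely on hashing to narrow down candidate match positions; the SOCS entry, for example, names the Rabin-Karp string search. As a rough illustration, here is a minimal Python sketch of plain exact-match Rabin-Karp over the DNA alphabet. It is not the iterative variation SOCS uses, and it handles neither mismatches nor SOLiD colour space. The function name and the `base` and `mod` defaults are assumptions made for this sketch.

```python
def rabin_karp(text, pattern, base=4, mod=1_000_000_007):
    """Return the start offsets of exact occurrences of pattern in text.

    Both strings must contain only A, C, G and T.
    """
    code = {"A": 0, "C": 1, "G": 2, "T": 3}
    m, n = len(pattern), len(text)
    if m == 0 or m > n:
        return []
    # Weight of the leftmost base in a window: base^(m-1) mod `mod`.
    high = pow(base, m - 1, mod)
    p_hash = t_hash = 0
    for i in range(m):
        p_hash = (p_hash * base + code[pattern[i]]) % mod
        t_hash = (t_hash * base + code[text[i]]) % mod
    hits = []
    for i in range(n - m + 1):
        # The hash comparison is a cheap filter; the string comparison
        # rules out hash collisions.
        if t_hash == p_hash and text[i:i + m] == pattern:
            hits.append(i)
        if i + m < n:
            # Roll the window: drop text[i], append text[i + m].
            t_hash = ((t_hash - code[text[i]] * high) * base
                      + code[text[i + m]]) % mod
    return hits

# Example: locate a short read in a toy reference sequence.
positions = rabin_karp("ACGTACGTTACG", "ACG")
```

The rolling update keeps the cost per position constant regardless of read length, which is why the hash works as a fast pre-filter before any full comparison.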
Genome Annotation/Genome Browser/Alignment Viewer/Assembly Database
# EagleView - An information-rich genome assembler viewer.
EagleView can display a dozen different types of information including base quality and flowgram signal. Paper
link
# LookSeq - LookSeq is a web-based application for alignment
visualization, browsing and analysis of genome sequence data. LookSeq
supports multiple sequencing technologies, alignment sources, and viewing
modes; low or high-depth read pileups; and easy visualization of putative
single nucleotide and structural variation. Paper
link
# MapView - MapView: visualization of short reads alignment on
desktop computer. Linux. Paper
link
# rtracklayer - A Bioconductor package providing R interface to genome browsers and their annotation tracks. Paper
link
# SAM - Sequence Assembly Manager. Whole Genome Assembly
(WGA) Management and Visualization Tool. It provides a generic platform for
manipulating, analyzing and viewing WGA data, regardless of input type. MySQL
backend and Perl-CGI web-based frontend/Linux. Paper
link
# XMatchView - A visual tool for analyzing cross_match alignments.
Developed by Rene Warren and Steven Jones at Canada's Michael Smith Genome
Sciences Centre. Python/Win or Linux.