Mate-pair sequencing has been used since the early days of genome sequencing to help with final assembly. The technique creates libraries with much larger inserts than the standard fragment library prep, up to 10Kb or more. However, making these libraries has never been easy and the methods used have not changed a great deal since the days of the HGP.
A few groups have been successful in creating mate-pair libraries but my experience has not been so good. My lab has only tried once and did not get great results, and many of the groups I work with have found the preps difficult and the sequencing results less than encouraging. It looks like mate-pair is a technique where “green fingers” are required. This is not a situation I like, as I am a firm believer that as long as a protocol has been carefully put together then anyone should be able to follow it.
How does mate-pair work: High-molecular-weight genomic DNA is fragmented to an approximate size range, usually 3, 5 or 10Kb. The fragments are then end-repaired with biotinylated nucleotides and gel size-selected more specifically. Fragments are then circularized by ligation, and purified by biotin:streptavidin cleanup. These circularized fragments are then subjected to a second round of fragmentation (to 300-500bp), followed by a biotin:streptavidin cleanup to remove all but the ligated ends of the original molecules. This DNA is then used for a standard fragment library prep and mate-pair sequencing. The length of the initial gel size-selection determines the mate-pair gap size expected during alignment of the final sequence reads. The orientation of these reads also acts as a useful QC. Two major problems are the amount of DNA required, usually 10ug or more, and the creation of chimeric molecules during ligation that produce artifactual structural variations.
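The gap-size and orientation QC described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: I assume an Illumina-style library where true mate-pairs align pointing outward (reverse-forward) and short-insert paired-end carry-over aligns pointing inward, and the names `ReadPair` and `classify_pair`, the 5Kb expected gap, the 50% tolerance and the 1Kb paired-end cut-off are all hypothetical choices rather than values from any kit or aligner.

```python
from dataclasses import dataclass


@dataclass
class ReadPair:
    """Two reads aligned to the same chromosome (hypothetical minimal record)."""
    pos1: int          # leftmost aligned base of read 1
    strand1: str       # '+' or '-'
    pos2: int          # leftmost aligned base of read 2
    strand2: str       # '+' or '-'
    read_len: int = 100


def classify_pair(pair, expected_gap=5000, tolerance=0.5, pe_max=1000):
    """Label a pair as 'mate_pair', 'paired_end_contaminant' or 'discordant'.

    Assumption: real mate-pairs point outward ('-' on the left, '+' on the
    right) and span roughly the gel-selected size; inward-facing pairs with
    a short span are unsheared-junction-free paired-end carry-over.
    """
    # Order the two reads by their position on the reference.
    (lpos, lstrand), (rpos, rstrand) = sorted(
        [(pair.pos1, pair.strand1), (pair.pos2, pair.strand2)]
    )
    span = rpos + pair.read_len - lpos  # outer distance covered by the pair

    if lstrand == rstrand:
        # Same-strand pairs fit neither library type.
        return "discordant"

    if lstrand == "-" and rstrand == "+":
        # Outward-facing: accept only if the span matches the size selection.
        low = expected_gap * (1 - tolerance)
        high = expected_gap * (1 + tolerance)
        return "mate_pair" if low <= span <= high else "discordant"

    # Inward-facing: short spans look like standard fragment-library reads.
    return "paired_end_contaminant" if span <= pe_max else "discordant"


if __name__ == "__main__":
    examples = [
        ReadPair(1000, "-", 5900, "+"),   # outward, ~5Kb span
        ReadPair(1000, "+", 1300, "-"),   # inward, ~400bp span
        ReadPair(1000, "+", 6000, "+"),   # same strand
    ]
    for p in examples:
        print(p, classify_pair(p))
```

Pairs labelled discordant here are not necessarily errors: they are exactly where real structural variants and ligation chimeras both end up, which is why the chimera rate matters so much.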
Jan Korbel’s group at EMBL have been successfully using mate-pair sequencing for their analysis of structural variation in Cancer (Paired-End Mapping Reveals Extensive Structural Variation in the Human Genome). They have been using a pretty standard protocol. They fragment DNA using a Hydroshear (see the bottom of this post for more details), cut gels for 5Kb libraries, and try to minimise PCR to keep diversity high. They are processing 8 samples at a time and it takes 5 days but the tedious and careful handling pays off in some very nice results. The biggest drawback is the requirement for 10ug of DNA.
Newer methods offer some promise for simpler mate-pair prep: At least two new products have been released that may help the rest of us produce reliable mate-pair libraries. Lucigen have developed a new BAC/Fosmid vector system for the creation of high-quality libraries with large inserts. Illumina have released a new Nextera-based mate-pair kit.
Lucigen pNGS: Lucigen's novel protocol for making clone-free libraries of 40-300kb insert size could have a dramatic impact on the de novo assembly of complex genomes. They have released the pNGS system, which includes a di-tagged vector containing the PCR amplification sites required to produce sequence-ready libraries. This vector lacks the promoter for lacZ and has transcriptional terminators which, according to Lucigen, “result in higher stability cloned inserts”.
The system works by using random DNA fragmentation to produce inserts of 40-300kb that are gel-purified and cloned into the pNGS vector. The amplified BACs or Fosmids are then digested with 4bp cutters to leave just the ends of the original insert fragment and the vector sequence. This digested molecule is self-ligated to create a circular template for NGS PCR amplification producing the final sequencing library.
Nextera mate-pair: The new protocol from Illumina makes use of the Nextera product Illumina acquired when they bought Epicentre in 2010. The protocol is not quite what I expected as it uses a mix of Nextera and TruSeq reagents rather than relying on Nextera alone. Illumina have a TechNote on their website (technote_nextera_matepair_data_processing) that I'd recommend interested readers take a look at. They show comparison data from paired-end vs mate-pair+paired-end sequencing of the Human genome and show a modest but important increase in coverage statistics that is most obvious in repeat regions of the genome.
Nextera mate-pair offers gel-free or gel-based size selection. The gel-free option allows a lower DNA input of just 1ug but generates a broader final mate-pair size distribution of 2-15kb, which may make analysis harder as you cannot simply discard mate-pairs using insert size as a QC. The gel-based protocol requires 4ug of DNA but the user has control over the final mate-pair size distribution as normal.
Transposomes are loaded with biotinylated oligos and are mixed with DNA at a much lower ratio than in the standard Nextera kits, which allows much larger fragment sizes to be produced (2-15kb). The mate-pair library is then size-selected and circularized as in standard protocols (this is still a blunt-ended ligation, which is less efficient than the “sticky” one used in the Lucigen kit). Physical shearing breaks up the large circular molecules and the biotinylated oligos added by in vitro transposition allow capture of only the mate-pair regions. These are purified and used as the template for a standard TruSeq library prep.
I like Nextera and we have been using it in capture projects and the new XT formulation. My ideas about what Nextera mate-pair might look like made use of two sets of transposomes loaded with different sequences. The first would create the large fragments, incorporate biotin and leave compatible ends for a "sticky" ligation. The second would be a standard Nextera prep to produce the final library, which could be streptavidin purified before PCR.
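A rough bit of arithmetic shows why shearing a circle and keeping only the biotinylated junction is so hungry for DNA. The sketch below is a back-of-envelope model of my own, not anything from Illumina's protocol: it assumes each circle is sheared into equal pieces and that exactly one piece carries the junction, and `junction_fraction`, `junction_mass_ug` and the `circularization_efficiency` parameter are hypothetical names introduced purely for illustration.

```python
def junction_fraction(circle_bp, shear_bp):
    """Approximate fraction of sheared DNA mass that carries the junction.

    Assumption: a circle of circle_bp is cut into pieces of about shear_bp,
    and only one of those pieces contains the biotinylated junction.
    """
    if circle_bp <= 0 or shear_bp <= 0:
        raise ValueError("sizes must be positive")
    if shear_bp >= circle_bp:
        # The whole circle ends up in a single piece.
        return 1.0
    return shear_bp / circle_bp


def junction_mass_ug(input_ug, circle_bp, shear_bp, circularization_efficiency=1.0):
    """Upper-bound estimate of DNA (ug) left after streptavidin capture.

    circularization_efficiency is an assumed fraction (0-1) of size-selected
    molecules that actually circularize; all other losses are ignored.
    """
    if not 0.0 <= circularization_efficiency <= 1.0:
        raise ValueError("circularization_efficiency must be between 0 and 1")
    return input_ug * circularization_efficiency * junction_fraction(circle_bp, shear_bp)


if __name__ == "__main__":
    # Illustrative numbers only: 4ug input, 5kb circles, 400bp shear size.
    for efficiency in (1.0, 0.5, 0.1):
        mass = junction_mass_ug(4.0, 5_000, 400, efficiency)
        print(f"efficiency {efficiency:.1f}: at most {mass:.3f} ug captured")
```

Even with perfect circularization, less than a tenth of the mass of a 5kb circle can survive capture under this model, and the fraction shrinks as the inserts get longer.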
Will structural-variants in Cancer be easier to detect: We’ve done very little mate-pair in my lab because of the sample requirements, so I'm hoping that these new developments will mean more users request the protocol and are able to make use of the additional structural variation data. For now many people seem to be happy with getting 80% or more of the variants from standard fragment libraries. However, protocols that allow generation of multiple mate-pair sizes that can be indexed for sequencing are likely to allow identification of important, and so far difficult to identify, rearrangements in Cancer genomes. Being able to run a single pooled sample that contains tumour:normal at ~350bp, ~3Kb, ~10kb & ~40kb inserts should give very high resolution copy number and high-quality structural variation data. This may also be achievable with far fewer reads than are used today, with the bonus that significantly less DNA is used in the prep.
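The "far fewer reads" argument rests on physical (span) coverage: every pair tells you something about the whole stretch of genome between its two reads, not just the bases it sequences. Here is a minimal sketch of that arithmetic. The genome size, read length and number of pairs per library are illustrative assumptions of mine, and the function names are hypothetical.

```python
def physical_coverage(n_pairs, insert_bp, genome_bp):
    """Average number of pairs spanning any given position in the genome."""
    return n_pairs * insert_bp / genome_bp


def sequence_coverage(n_pairs, read_len, genome_bp):
    """Average number of sequenced bases covering any given position."""
    return n_pairs * 2 * read_len / genome_bp


# Assumed values, for illustration only.
GENOME_BP = 3_100_000_000        # approximate human genome size
PAIRS_PER_LIBRARY = 10_000_000   # read pairs allotted to each indexed library
READ_LEN = 100                   # bases per read

# Insert sizes taken from the pooled design described above.
INSERTS = {"~350bp": 350, "~3Kb": 3_000, "~10kb": 10_000, "~40kb": 40_000}

if __name__ == "__main__":
    seq_cov = sequence_coverage(PAIRS_PER_LIBRARY, READ_LEN, GENOME_BP)
    print(f"sequence coverage per library: {seq_cov:.2f}x")
    for name, insert_bp in INSERTS.items():
        phys = physical_coverage(PAIRS_PER_LIBRARY, insert_bp, GENOME_BP)
        print(f"{name:>7} library: {phys:.1f}x physical coverage")
```

For the same number of reads, physical coverage scales linearly with insert size, so the long-insert libraries in a pool contribute most of the power to span a breakpoint while the short-insert library pins down its position.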
Hydroshear vs Covaris vs Sonication vs enzymes: There are many ways to chop DNA into fragments but only a few will reliably give the larger ones required for successful mate-pair library preparation. Most of us are using Covaris or Bioruptor to produce 300-500bp fragments. These instruments can also generate longer bits of DNA but they are inefficient, as most of the DNA ends up outside the range required.
The Hydroshear is a very clever piece of kit from Digilab that uses a pretty simple mechanism to break DNA into relatively tight fragment distributions. DNA is pushed through a tight contraction in a tube by a syringe. As the sample moves through the contraction the flow rate increases dramatically, which stretches the DNA until it snaps. The process is repeated over several cycles until an equilibrium is reached. The flow-rate and the size of the contraction determine the final fragment size. The resulting size distribution is much tighter than the smear of DNA usually seen on a gel after sonication, e.g. 1.5-3Kb.
I'm writing a post on DNA fragmentation methods and will go into more detail there.