Saturday 25 June 2011

What sort of user are you?

Back in April, Tales of the Genomic Repairman discussed five core facility stereotypes: the Core of ill-repute, the High $ Hooka core, the Waiting by the phone for the core to call core, the High maintenance core and the Easy core next door. I hope my users see the core I run as an easy core next door.

I thought I’d follow up after a conversation around Genomic Repairman’s blog at a recent Illumina user meeting in Chantilly, where six core heads were at the same table at dinner. We all agreed that those stereotypes exist, but the same goes for users. Anyway, here is what we came up with...

The “the deadline is always tomorrow” user always professes that their experiment needs to happen now; the data are needed either for a paper (Nature, Cell or Science, of course) or for a looming grant deadline. If the core can’t deliver then this guy’s career is over. You meet them six weeks later in the pub after work, ask if the paper was accepted, and they say, “sorry, I’ve not had time to look at the data yet”! We want to help when we can, and my lab can run an array project in three days if samples are ready to roll. But whilst we all know priorities get changed, you can't pull this too many times. Ever heard of the boy who cried wolf?

The “how can I do it cheaper” user always wants to cut corners to bring the cost of the experiment down. If you say four replicates they say “three”; when you say three they say “two, you can do stats on two can’t you?” Their experiments never quite seem to generate results that can be validated.

The “I didn’t quite follow the protocol” user brings their samples along but casually lets you know that they had to skip a clean-up or amplification step to get the samples in before heading off for the weekend. You can bet they’ll be disappointed when the job does not work and they are still asked to pay. This user is often unhappy with the quality of the work from the core.

The “I want to use this method that was just published last week” user can often lead to an exciting new application in the core. However, this kind of user can also come back the week after and say, “forget about the last method, this one’s really great”. It can be difficult to know when they are coming in with the next big thing rather than a fad. This user is enthusiastic but can take up a lot of time; however, most cores relish the conversations and the opportunity to work on new ideas.

The " " silent user. Most of us round the table thought many users were too quiet. Whilst they seem happy with the services offered, and with the quality and turnaround of data, their feedback is limited. We want these users to talk to us more: tell us what they like and don’t like. They have the opportunity to shape how our cores develop with their comments.

The “thanks a lot” user. This user always says thanks when they get their results. Even better, they always acknowledge the work the core did in the paper and sometimes include core staff as authors if it is warranted. The core is always happy to see this person back again and will always try to give that little extra. Service for this guy comes with a smile.

A last comment: users of core facilities should speak with the staff and managers. If you have a core that’s not so easy going then talk to them about it. Constructive criticism should be taken on board and, who knows, your Core of ill-repute could soon be as easy going as you’d always hoped for.

Thursday 23 June 2011

MiSeq vs Ion: How a little bug got involved in a big battle

A lot has been said about the sequencing of E. coli from the recent outbreak in Germany. More than five isolates have been sequenced on almost every sequencing platform: HiSeq, 454, Ion and MiSeq. Interestingly, I am not aware of a SOLiD or a Sanger genome. I’d recommend GenomeWeb’s coverage as a good starter if you’re interested in finding out more.

Just this week Illumina made available a slide deck and data from their analysis of E. coli K12 MG1655 sequenced on HiSeq (PE100) and MiSeq (PE150).

The first 8 slides are the HiSeq vs MiSeq comparison:

Throughput seems to have gone up to 1.5Gb, which is a 50% increase over the initial specs, so growth seems to be as fast on MiSeq as it has been on the GA and HiSeq platforms, albeit with just one data point so far. And there is still a long way to go before it gets to 25Gb; see my first blog.

Libraries for the comparison were made using TruSeq and not the Nextera kits from the recent Epicentre purchase, which I was a little surprised about. It would have been great to see a multiplexed run on the two platforms for a TruSeq and Nextera comparison in the same data set.

Essentially MiSeq outperforms HiSeq on quality by a tiny bit. Other than that the datasets are pretty much the same in all respects. The HiSeq run has the characteristic intensity fluctuation at about 75bp where laser power is adjusted mid-run. Both runs have a stepped profile in Mean Qscore, which I guess is a function of alignment getting better until about 20bp then declining with read quality. In this presentation Illumina do not show the Mean Qscore at 150bp for MiSeq.

Illumina subsampled the HiSeq run for a de novo assembly comparison and the results were strikingly similar in all respects, other than that MiSeq gave 11 contigs where HiSeq gave 12. I am not an assembly whizz so can’t think why this would be the case, nor whether it would matter a great deal.

The next 7 slides are a comparison of MiSeq to Ion Torrent:

Again, I am not a bioinformatician so can’t realistically comment on the fairness of this comparison, and others are getting the data for a more impartial assessment.

The Ion data were all from the 314 chip, which was specced at 10Mb, and the average run yield was 11-24Mb from the three sites that have generated data. So it looks like Ion is increasing the yield as expected. However, it is very clear that MiSeq has a huge advantage over the Ion platform with respect to yield, as it gave 1.7Gb from this run. Quality on Ion was lower, but without the stepped profile seen in MiSeq and HiSeq, averaging out at about Q19 on Ion versus Q31 on MiSeq.
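
To put that yield gap in rough numbers, using only the figures quoted above:

    # Rough ratio of MiSeq to Ion 314 yield, from the numbers quoted above.
    miseq_gb = 1.7
    ion_mb_low, ion_mb_high = 11, 24
    low_fold = miseq_gb * 1000 / ion_mb_high    # against the best Ion run
    high_fold = miseq_gb * 1000 / ion_mb_low    # against the weakest Ion run
    print(f"~{low_fold:.0f}x to ~{high_fold:.0f}x more data per run")  # ~71x to ~155x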

The comparison is an interesting one, although it will probably be obsolete by the time I hit post due to the rapid developments from Ion. I did hear the Broad have now obtained over 290Mb of data from a 316 chip, so it looks like their roadmap is on track.

It remains to be seen which platform is going to win this particular VHS vs Betamax battle (showing my age here). Another open question is whether these platforms will ultimately replace systems like HiSeq, especially when whole genome sequencing can be purchased for $4,000 with a 60-day turnaround. This could drop to $2,500 next year if the 1Tb runs from Illumina work outside their development labs.

A HiSeq costs $750,000 to buy, including a cBot; a MiSeq is $125,000. If yield is not your primary concern then MiSeq may well turn out to be the kind of instrument that finally democratises sequencing.

MiSeq vs HiSeq flowcell

So here it is. I was given one of these at AGBT, and there is a picture in the MiSeq brochure. I thought I’d share an image for anyone wondering what they look like.



You can see it is very different from a HiSeq flowcell in that it is encased in a plastic housing. You can also see two inlet ports just above the ‘um’ in Illumina. These allow the much shorter, and therefore faster, fluidics to operate. The flowcell lane is bent back on itself, which is difficult to see in this picture.

I believe that only a portion of the MiSeq lane is imaged right now, but I am not exactly sure how much, and this is key to working out how much data MiSeq might ultimately yield. A side-by-side comparison to a HiSeq flowcell shows that there is about one third of the surface area of a HiSeq flowcell lane. Cluster density is being reported as the same on both platforms, so if the clusters per mm2 are the same then a fully imaged MiSeq lane should generate about one third of the yield of a HiSeq lane.
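
A minimal sketch of that back-of-the-envelope logic; the lane area and cluster density values below are placeholder assumptions, and only the one-third area ratio comes from my comparison above:

    # Toy model: yield scales with imaged area when cluster density and
    # read length are held constant. All input numbers are illustrative.
    def lane_yield_gb(area_mm2, clusters_per_mm2, read_length_bp, paired=True):
        reads = area_mm2 * clusters_per_mm2 * (2 if paired else 1)
        return reads * read_length_bp / 1e9

    hiseq_area_mm2 = 600.0        # placeholder imaged area for a HiSeq lane
    density = 500_000             # placeholder clusters per mm2, same on both
    hiseq = lane_yield_gb(hiseq_area_mm2, density, 100)
    miseq_full = lane_yield_gb(hiseq_area_mm2 / 3.0, density, 100)
    print(round(miseq_full / hiseq, 2))   # ~0.33: the area ratio alone drives the estimate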



However, you could not access the increased imaging area without spending more time on imaging. If Illumina and users are willing to increase the imaged area then yields will go up. MiSeq only images one surface right now; imaging both surfaces would generate a two-fold increase but would double imaging time. If MiSeq is currently imaging one ‘surface’ from a possible two and one ‘tile’ from a possible three, then it may be possible to increase output by six-fold. Of course I am not sure if this is possible and it relies on my assumptions about how MiSeq imaging works.


MiSeq is sold with very fast run times as a unique selling point, and a single-end 36bp run takes about four hours. If both surfaces can be imaged and three tiles rather than one are possible, then the SE36 run time would go up to 6-8 hours but yield would jump from 0.3Gb to around 1.8Gb. For paired-end 150bp runs this time would go from 1 to 2 days and yield would rocket from 1.5Gb to 9Gb. Run costs would not change.
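
Written out as a quick sketch, under my assumption that one surface and one tile are imaged today and that the extra yield comes only from imaging more of the flowcell:

    # Hypothetical scaling of MiSeq yield if two surfaces and three tiles
    # were imaged instead of one of each (my assumption about current imaging).
    surfaces, tiles = 2, 3
    scale = surfaces * tiles                      # six-fold more imaged area
    current_yield_gb = {"SE36": 0.3, "PE150": 1.5}
    for run, gb in current_yield_gb.items():
        print(f"{run}: {gb} Gb now -> ~{gb * scale:.1f} Gb fully imaged")
    # Imaging time also scales, but it is only part of the total run time,
    # which is why an SE36 run might go from ~4 to ~6-8 hours rather than 4 to 24.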

I’d quite like my MiSeq to be shipped with a dial in the software that allows me to maximise data in the time I have available. If a run is going on overnight or over the weekend, for instance, then why not let it run for longer and generate more data? This would be essentially for free.

More food for thought.

And ‘Yes’ that is James Watson’s signature on a HiSeq flowcell.

Monday 20 June 2011

Why do the Chinese get such a good deal from Illumina?

Because they order 128 the day a new instrument is announced and have deep pockets.

I have been following developments at BGI over the past three or so years and was as stunned as anyone else at their announcement of an order for 128 HiSeq 2000s in January of last year. 128 instruments was a lot then and is still a lot today: it makes BGI certainly the largest sequencing centre in the world, with significantly more capacity than any country in the world except the USA and UK.

BGI has 137 HiSeq 2000s, as well as 1x 454, 27x SOLiD and 1x Ion Torrent, according to the Google map of next-gen sequencers. The US has 712 instruments and the UK 132.

Using the latest 600Gb run chemistry they can put out around 50Tb of data per week.
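
That figure roughly checks out on the back of an envelope; the run length below is my assumption rather than anything from BGI:

    # Rough sanity check of ~50Tb per week from 137 HiSeq 2000s.
    instruments = 137
    gb_per_run = 600              # Gb per run with the 600Gb chemistry
    run_days = 11                 # approximate PE100 run duration (my assumption)
    tb_per_week = instruments * gb_per_run / 1000 * 7 / run_days
    print(f"~{tb_per_week:.0f} Tb/week")   # ~52 Tb/week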

A recent Newsweek article profiles what is going on today over in Shenzhen. Note that BGI is nowhere near Beijing. It also quotes a price of just $500,000 for the HiSeq 2000. I thought $700,000 was nearer the mark, so this represents about a 30% discount. Pretty good.

Now all anyone else needs to do to realise a similar discount is spend $64M in a single purchase order. Reagents not included!
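
For what it’s worth, the arithmetic behind those two figures:

    # The discount and the size of the original 128-instrument order.
    list_price, bgi_price, instruments = 700_000, 500_000, 128
    print(f"discount ~{1 - bgi_price / list_price:.0%}")          # ~29%
    print(f"order total ${instruments * bgi_price / 1e6:.0f}M")   # $64M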

Roll on HiSeq version 2... ;-)

Thursday 2 June 2011

The cost of maintaining those toys in the lab

I have just renewed several service contracts, and this time of year always leaves me feeling a bit cold about how much money has just been spent. For my lab the bill is well over $200,000. That is nearly 50 genomes at today's prices, and people are only just publishing that many in a single paper!

At about 8-15% of the instrument purchase price, a service contract is often seen as expensive. But I don't think there are many who choose to take a chance on their microarray scanner, real-time PCR instrument or sequencers - next-gen or otherwise. The impact of downtime is significant. The cost of an engineer visit can be very high and include very expensive travel costs. And parts can be insanely expensive: over $50,000 for a laser, for instance.
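
To make that percentage concrete, here is what 8-15% looks like in dollars, using the MiSeq and HiSeq prices mentioned above plus a mid-range figure purely for illustration:

    # Annual cost of an 8-15% service contract at a few example list prices.
    for price in (125_000, 250_000, 750_000):
        low, high = 0.08 * price, 0.15 * price
        print(f"${price:,} instrument -> ${low:,.0f} to ${high:,.0f} per year")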

Two things amuse me about the whole service contract business. First are the names used for contracts: Gold, Silver, Advantage, Prestige, etc. are all very aspirational and suggest we get something truly valuable. Second is how every company professes to make absolutely no profit. Companies are certainly not aiming to lose money here, and I guess it is not as easy to set margins as on a consumable item. However, I do not believe this business is unprofitable.

Some bits of kit seem to plod along with nothing other than an annual preventative maintenance visit. Others seem to need constant engineer visits to keep them working. The cost of the contract often reflects this, although I don't know if anyone has ever tried to compare cost against reliability.

Many labs, mine included, try to recover some of the maintenance costs in charges made to users. However, this can add huge amounts of money to the cost of accessing some little-used bits of equipment, and it is one of the major factors when I decide whether to keep a system going for another year. Sometimes a system needs to be internally subsidised. But commercial service providers can be good value; who still runs low-volume Sanger sequencing in-house, for instance?
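
A tiny worked example of why cost recovery hits little-used kit hardest; the contract cost and sample numbers are invented for illustration:

    # Per-sample surcharge needed to recover a fixed annual contract cost.
    annual_contract = 20_000          # illustrative contract cost
    for samples_per_year in (2000, 200, 20):
        surcharge = annual_contract / samples_per_year
        print(f"{samples_per_year:>5} samples/year -> ${surcharge:,.0f} added per sample")
    # 2,000 samples: $10 each; 200: $100; 20: $1,000 - low usage gets expensive fast.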

Recently, when I have spoken to people running similar labs, the words 'risk management' keep coming up. It seems that more of us are considering not getting contracts on every item in the lab, but nearly all agree the big-ticket items need to be covered. I'd worry a little, though, if my centrifuge, bioanalyser or PCR machine broke and I could not run my HiSeq! I guess the crunch in the economy hits us all, and anywhere we can make a saving needs to be explored.

At the end of the day, service contracts are just insurance policies, and ones that most of us would not be without. However, I don't know of any large institution that has looked into this in any depth to see if the insurance is worth paying. Perhaps a more holistic view could be taken, with no service contracts but a big contingency pot for when things do go wrong. What could the MRC or NIH save, or would it cost almost as much to administer?