Friday 6 May 2016

BaseSpace updated: no more "free" bioinformatics

I've been a big fan of Illumina's BaseSpace since it was launched in late 2011. It was the first truly simple to access and free to use cloud-based analysis infrastructure for NGS data. I've used it a lot - primarily for run monitoring, but also for RNA-seq and Exome QC analysis using the BaseSpace Apps. But the free-for-all analysis Smörgåsbord is ending, and users will be looking carefully at the costs to determine if BaseSpace offers real value when compared to their internal infrastructure.

My lab has been using BaseSpace a lot, primarily for run tracking and sharing some data with users, but more and more for quality control analysis of NGS libraries we've made and sequenced for users. At one point my lab was the largest user of BaseSpace in the world (we were early adopters) and we've an awful lot of data now sitting on Illumina's space in the Amazon Cloud - 70TB to be precise! The problem with BaseSpace being free is that I've never deleted any data (although this was only possible from Sept 2014). As such my storage requirements are far higher than they need to be, and I suspect many other labs have the same problem. This has created an increasing headache for Illumina, but compared to the price of a flowcell the storage is actually negligible: a PE150 HiSeq 4000 run costs about $15,000 in consumables, and generates 750Gb of data that costs $270 a year to store on AWS.

The cost of BaseSpace to users: Pricing was first discussed in the release of iCredits in 2013 but it was last year that Illumina announced the start of the transition to a full commercial launch of BaseSpace.

Whilst they are still providing a basic BaseSpace account for free, this will only include 1TB of storage space, or about 66% of a PE150 run on a HiSeq 4000. Using the free account you can still monitor sequencing runs, share data and access BaseSpace Apps. But the storage limit means that after every single flowcell HiSeq users will need to tidy up and get rid of the data before their next flowcell starts. This probably makes BaseSpace Basic fine for smaller labs, e.g. 100+ MiniSeq runs, 50+ MiSeq 2x300 runs (although these don't appear to be working properly anyway so you could store 250 2x75 runs instead), or about 8 high-output NextSeq 2x150 runs.

BaseSpace Professional and Enterprise accounts allow users to purchase more storage, and Illumina will pass through Amazon Web Services prices (currently $360/TB/yr for storage under 50TB) at cost. Probably the feature I'd most like, and one Illumina spoke to us about five or six years ago, is "fleet monitoring", where BaseSpace gives us the ability to keep an eye on our 7 Illumina sequencers and is promised to deliver "statistical information on sequencing runs over time". But this won't be available unless we sign up to the Enterprise account.
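To put the pass-through price in context, here is a rough sketch of the storage arithmetic using only figures quoted in this post. It assumes a single flat rate of $360/TB/yr (the post only gives the price for storage under 50TB, so applying it to 70TB is an approximation), and it treats one 750Gb run as 0.75TB of storage. The function names are made up for illustration and are not part of any BaseSpace or AWS API.

```python
# Back-of-envelope storage costs at the rate quoted in the post.
# Assumption: one flat rate applies at every volume; the post only
# gives the price for storage under 50TB.
RATE_USD_PER_TB_YEAR = 360.0


def annual_storage_cost(terabytes, rate=RATE_USD_PER_TB_YEAR):
    """Return the yearly storage cost in USD for a given volume in TB."""
    if terabytes < 0:
        raise ValueError("storage size cannot be negative")
    return terabytes * rate


def share_of(cost, reference):
    """Return cost as a fraction of a reference amount (e.g. a run's consumables)."""
    if reference <= 0:
        raise ValueError("reference amount must be positive")
    return cost / reference


if __name__ == "__main__":
    run_cost = annual_storage_cost(0.75)      # one PE150 HiSeq 4000 run
    archive_cost = annual_storage_cost(70.0)  # the lab's undeleted archive
    print(f"One run:  ${run_cost:,.0f} per year")
    print(f"70TB:     ${archive_cost:,.0f} per year")
    print(f"Run storage vs $15,000 consumables: {share_of(run_cost, 15000):.1%}")
```

On these assumptions a year of storage for one run is under 2% of what the run costs in consumables, which is the sense in which the storage is negligible next to the flowcell.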

Free use of Illumina's BaseSpace apps ended officially last week (March 2016). Now almost all* compute charges will also be at AWS retail rates. The thing that has proven hardest to work out is how much what we do with BaseSpace will cost us going forwards. Although 99% of the data we generate is processed and delivered to users on our own internal hardware (supported very amiably by our wonderful Bioinformatics core), and this is all automated using GenoLogics Clarity LIMS, we still regularly make use of BaseSpace for some lightweight analysis.

The impact on our use of BaseSpace: Illumina showed the power of BaseSpace with the first "Genome in a day" in Feb 2012, and in April 2014 it launched the RNA-seq apps TopHat and Cufflinks, which were a real milestone for me. I spoke at Illumina's 2014 AGBT workshop about my lab's use of these tools for QC analysis of RNA-seq projects where my lab had made the libraries and run the sequencing. This allowed us to go much further than saying the sequencing was of high yield and quality. Our users liked the additional feedback as it meant they could get straight into downstream analysis and interpretation more quickly.

Why are Illumina doing this: How much storage and compute Illumina are subsidising is unclear, but it is likely to be big (although a huge chunk could simply be people not deleting data). In the note to customers highlighting the upcoming changes they say "due to the tremendous growth of our customer base, we are unable to continue to provide free unlimited storage and compute". However, that same tremendous growth has resulted in tremendous profits from the large install base and significant reagent pull-through from all those customers. Personally I'd like to have seen costs rolled into SBS reagents, or better still library prep kits.

I hope BaseSpace continues to grow and I hope users don't leave in droves. My 70+TB are costing Illumina around $25,000 per year so I can understand them not wanting to pick up the tab - but this represents about 1% of my annual spend on consumables and support so I can't help but see this change as more than a little avaricious.

* Not all compute is charged for: Bcl2fastq conversion and demultiplexing remains free.

1 comment:

  1. "50+ MiSeq 2x300 runs (although these don't appear to be working properly anyway so you could store 250 2x75 runs instead)"

    Appear not to be working? Could you elaborate?

