Tuesday, September 29, 2009

How to manage 2PB+

September 30, 2009 – I dialed in to a wikibon.org Peer Incite teleconference yesterday. I like these things because they typically feature end users, often representing very large IT facilities.

In yesterday’s meeting, that would be the California Institute of Technology (Caltech) which, among many other things, is the academic home of NASA’s Jet Propulsion Laboratory (JPL).

One facility at Caltech hosts 2.3PB to 2.5PB of data, according to Eugean Hacopians, a senior systems engineer at Caltech and the speaker on the wikibon.org Peer Incite gathering. Since the facility’s files are very small, that 2.5PB translates into about two trillion files, according to Hacopians. Translated another way, the facility has about five million files per TB of storage capacity. (The facility is Caltech’s Infrared Processing and Analysis Center, or IPAC, where the applications are primarily astronomy imaging related to space projects.)
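A quick back-of-the-envelope check of those file counts is worth doing, because the two figures as reported do not line up with each other. The sketch below assumes decimal units (1PB = 1,000TB, 1TB = 10^12 bytes) and uses the upper 2.5PB capacity figure; the variable and function names are mine, for illustration only.

```python
# Sanity check of the reported file counts.
# Assumptions (not from the talk): decimal units, upper capacity figure of 2.5PB.
TB_PER_PB = 1_000
BYTES_PER_TB = 10**12

capacity_tb = 2.5 * TB_PER_PB                 # 2,500 TB
reported_total_files = 2_000_000_000_000      # "about two trillion files"
reported_files_per_tb = 5_000_000             # "about five million files per TB"


def density_from_total(total_files, tb):
    """Files per TB implied by a total file count."""
    return total_files / tb


def total_from_density(files_per_tb, tb):
    """Total file count implied by a per-TB density."""
    return files_per_tb * tb


implied_density = density_from_total(reported_total_files, capacity_tb)
implied_total = total_from_density(reported_files_per_tb, capacity_tb)

# Average file size under each figure, in bytes.
avg_size_from_total = capacity_tb * BYTES_PER_TB / reported_total_files
avg_size_from_density = BYTES_PER_TB / reported_files_per_tb

print(f"Two trillion files over 2.5PB -> {implied_density:,.0f} files per TB")
print(f"Five million files per TB     -> {implied_total:,.0f} files in total")
print(f"Average file size (from total):   {avg_size_from_total:,.0f} bytes")
print(f"Average file size (from density): {avg_size_from_density:,.0f} bytes")
```

Two trillion files across 2.5PB works out to roughly 800 million files per TB, while five million files per TB works out to about 12.5 billion files in total. Either way the file count is enormous, but the two numbers differ by a factor of 160, so treat them as order-of-magnitude indications rather than exact counts.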

You can listen to Hacopians’ hour-long chat here, but a few things jumped out at me while I was listening.

Instead of using a traditional SAN, Hacopians uses what he refers to as building blocks. In the file-serving area (as opposed to its compute servers and database servers), a typical building block consists of a Sun server (with two 4Gbps Fibre Channel HBAs and one 4Gbps Fibre Channel switch from QLogic) attached to SATA-based disk subsystems. About 99% of the capacity is on SATABeast disk arrays from Nexsan, and up to three SATABeast arrays can be attached to each file server.

“A large, shared SAN would have created more hassles and headaches than the building block approach,” says Hacopians. “A SAN would have introduced 3x to 5x more complexity.” That’s in part because the Caltech facility has a lot of different projects (with 10 to 14 projects going on simultaneously), which poses problems from a charge-back and accounting standpoint, according to Hacopians.

To cut energy costs (which are high when you have more than 2,500 spinning disks), Hacopians’ Caltech facility takes advantage of Nexsan’s autoMAID (massive array of idle disks) technology, which offers three levels of disk spin-down modes. (Caltech uses two of the three modes to balance performance against energy savings.)

Although some IT sites are leery of disk spin-down technology, Hacopians says that Caltech has not had any negative issues with Nexsan’s autoMAID technology.

So how many storage pros does it take to manage almost 2.5PB of storage capacity in a building block architecture? Until about two years ago, Hacopians managed about 1.5PB on his own. Today, he has help from two other people, each spending about one-fourth of their time on storage management.

1 comment:

melanb said...

That is a great way to address the problem, but it seems like "just one side of the moon". Imagine a space where you have "intelligent storage" that handles information based on rules applied to metadata, and that is web-services oriented, geographically dispersed, and multitenant, with many embedded functions like dedup, compression, spindown, ... and sold as low-cost infrastructure. Call it a "storage internal cloud". Now take a step forward and imagine being able to connect your "internal cloud" to other "federated external clouds" used as IaaS, so you can mix CAPEX and OPEX as you need. Third step: imagine adding internal virtual servers on top and extending them to use external servers as IaaS too. More powerful? A dream? It is a reality.
Bruno