Cambridge Research Institute deploys HP Converged Infrastructure to cope with big data

Cancer Research UK arm will have to cope with 750TB of data in the next 18 months

Cambridge Research Institute (CRI), the part of Cancer Research UK that conducts research, has deployed HP's Converged Infrastructure to help it cope with an anticipated increase in workload over the next 18 months.

The CRI is funded by Cancer Research UK and works with the University of Cambridge and Addenbrooke's Hospital.

Peter MacCallum, head of IT and scientific computing at CRI, told Computing that the CRI deployed HP Converged Infrastructure to handle the growing volume of data being transferred into the institute.

"We are increasingly using technologies such as high resolution imaging, multi-dimensioning imaging and in particular DNA sequencing. These technologies generate several terabytes (TB) per week.

"Our biggest problem is managing and handling the data volumes. We're looking at projects within the next 18 months which will bring three-quarters of a petabyte of data into the institute, in addition to the 300TB we currently generate a year. We were in need of a multi-petabyte management system," he said.

The institute has also deployed a new storage system.

MacCallum said that the institute moved its main storage away from a storage area network (SAN) based infrastructure three years ago.

"We had an Apple Xserve infrastructure in place but Apple doesn't support that anymore and there were issues with interoperability with other technologies and a cost issue based on using fibre switches. We wanted a more modular and scalable technology," he said.

The CRI then started using network-attached storage arrays and individual islands of Linux file systems of between 20TB and 40TB each.

But this meant it had to manage multiple individual file systems, which was becoming a growing management overhead.

The CRI tendered for a new storage system and, after a four-month process, selected HP's IBRIX X9720 network storage system to add to its HP Converged Infrastructure.

"We looked at four or five of the main players but decided on HP's system because it was competitive in price.

"We trusted the engineering quality because we have worked with HP before and it fulfilled our requirements of being scalable and cost-effective," said MacCallum.

MacCallum explained that the IBRIX infrastructure gives CRI a single scalable storage space that is easier to manage.

"The IBRIX infrastructure gives us a single scalable storage space where we can put all our data, particularly the data we're going to keep long term.

"We wanted to keep the management overhead as low as possible by using a single storage space. Technically we're able to scale well beyond 16PB," he said.

He added that the technology works in a similar way to a virtualised storage system, but at a much lower cost than a SAN-based infrastructure.

"All the data will be kept in one place in one file system, which is easier to manage. We can turn over the hardware so its disc becomes denser, and as technology changes can move the data onto new hardware without taking the file system offline.

"This is crucial as it is similar to virtualising the storage layer but without the costs that are associated with the SAN-based approach," he said.

Two HP MSL8096 tape libraries provide backup for the CRI's datacentres, and to reduce power consumption, legacy servers have been replaced with an HP ProLiant BladeSystem c7000 enclosure and HP ProLiant BL490c G6 and G7 servers.