Why Harte Hanks moved from Cloudera's Hadoop to MapR for its new marketing services platform

Head of partner tools Donna Belanger explains big data developments at the marketing services firm

Harte Hanks is a multinational marketing services firm offering customer analytics, database marketing and customer relationship database services. It provides a multi-dimensional view of its customers' clients by pulling data from multiple sources including social media, email transaction history and others.

Donna Belanger is head of partner tools at Harte Hanks, a relatively new role that recognises that the firm is "not a software company; we are an enabler of marketing process through technology and data".

It is Belanger's job to manage the software and outsourcing partnerships that enable Harte Hanks to stay ahead of the game. Most recently this has meant embracing big data technologies to facilitate the processing of large volumes of data from multiple sources.

One element of this was the adoption last year of Splice Machine, a relational database that uses Apache HBase as its storage layer; another was Cloudera's Hadoop distribution [1], chosen to store and crunch the streams of incoming data. Based on this combination, the company was able to save money on Oracle RAC by distributing its systems across clusters of commodity servers, as Rob Fuller, managing director of the product innovation centre at Harte Hanks, explained to Computing last year.

Since then things have moved on and Harte Hanks is due to launch its new "data refinery" platform - what Belanger describes as a "decision engine" - later this month. But while Splice Machine remains a key element of the new engine, Cloudera has been displaced by rival Hadoop distribution MapR.

"Working with MapR is giving us a lot more capabilities to handle large data files, get into that big data space, and still keep the hardware costs in a manageable range for our customers," Belanger says, adding that the new platform will also integrate analytics software from Actian and Apache Drill.

The final contract with MapR was signed in April this year, after some initial pilots. Belanger explains the company's reasons for switching away from Cloudera, one of which is, ironically, that as market leader Cloudera finds itself more thinly spread.

"MapR have been very attentive and really supportive whereas with Cloudera our feeling was we weren't going to be as important to them," she says. "Cloudera is the market leader and they're working with other partners but we want to make sure that we are on the latest and greatest."

However, the requirement that really tipped the balance in favour of MapR was multi-tenancy. Each customer has its own space on the Harte Hanks platform and needs to be able to scale up or down individually according to demand, and MapR was able to demonstrate early on that it could accommodate this cost-effectively.

"Our ability to see that and have confidence in the multi-tenancy capability came from MapR. It certainly sounds like Cloudera are doing stuff in that area but we weren't able to see it in action," she says.

"Also it's the resiliency. If a node goes down everything is replicated on other nodes and the way that works within MapR is more designed for this multi-tenancy capability. It's much more straightforward."

The re-engineering required to get Splice Machine working with MapR-DB instead of its native HBase was one of the factors weighed up before the move, but Belanger says that on balance it was worthwhile.

"Our engineers will have a little bit more of a learning curve because it isn't the traditional HBase, it's MapR's version of HBase, but the feeling was the benefits we would get from that and the security that it affords our customers outweigh that additional knowledge requirement," she says.

The important thing is that customers, who typically use Cognos, Unica or Tableau to connect via ODBC connectors to Oracle through Splice Machine, don't notice any change in the service - except in a positive sense.

"It really gives us a migration path for our existing companies that should almost be transparent to them. The only difference will be the speed of their queries and the campaigns that run on their tools. As they add in additional types of data and other formats it's going to give them that flexibility that they need," says Belanger.

[1] Update: Cloudera has asked us to make clear that Harte Hanks was not a subscriber to the paid-for version of its Hadoop distribution, and was using the community version instead.