Data warehouse switchover pays off for Morgan Stanley
IT chief reveals how firm used parallel simulation to minimise risk
Global financial services firm Morgan Stanley has completed a five-year migration project to a Teradata data warehouse.
The migration made use of parallel simulation to limit the risks involved in switching over.
Stephen Kuster, executive director of finance IT at Morgan Stanley, told delegates at Teradata's Partner User Conference in San Diego about the move earlier this week.
Kuster said that the company's old data warehouse comprised 31,000 databases, 350,000 tables and 100 terabytes of data, with 2,800 users accessing it across 35 different applications.
The warehouse was used for the company's daily profit and loss accounting, the monitoring of trading activities and the valuation of its books. It essentially collected data from across the whole firm and ran 4,000 extract, transform and load (ETL) jobs every day.
An ETL job extracts data from a source (E), transforms it to fit operational needs (T) and loads (L) it into the end target. Morgan Stanley estimates that it loads approximately 1.5 billion records every day.
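In code terms, a daily ETL job boils down to the three steps described above. The sketch below is purely illustrative: the input file, field names and transformation are hypothetical stand-ins, not Morgan Stanley's actual pipeline.

```python
# Illustrative ETL sketch only -- the source file, fields and target
# here are hypothetical, not Morgan Stanley's actual pipeline.
import csv

def extract(path):
    """E: pull raw trade records from a source (a CSV file here)."""
    with open(path, newline="") as f:
        return list(csv.DictReader(f))

def transform(rows):
    """T: reshape records to fit the warehouse schema."""
    return [
        {"trade_id": r["id"],
         "notional_usd": float(r["amount"]) * float(r["fx_rate"])}
        for r in rows
        if r.get("status") == "settled"   # drop unsettled trades
    ]

def load(rows, warehouse_table):
    """L: append the cleaned records to the end target."""
    warehouse_table.extend(rows)

if __name__ == "__main__":
    target = []                           # stand-in for the target table
    load(transform(extract("trades.csv")), target)  # hypothetical input file
    print(f"loaded {len(target)} records")
```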
"We needed a new solution because we could not keep up with the capacity. Scalability was a real problem for us, we could have thrown money at more hardware, but we couldn't be sure that it would work across the whole system," said Kuster.
"Resiliency was also a concern. We had 52 servers running this operation, and if we had an outage it would take up to an hour to fail over. So with 52 servers you are looking at potentially 52 hours of outages. That's not sustainable," he added.
Kuster decided to migrate to Teradata's Enterprise Data Warehousing solution, which boasts 99.999 per cent availability and enabled Morgan Stanley to shrink its footprint from 350,000 tables to 1,200 through deduplication.
However, the switch to the new system had to happen in a single weekend, so Kuster opted to trial it in parallel with the old one for three months first.
"We had to do this in a big bang and that was really scary: 35 applications, 2,800 users and 31,000 databases killed in one weekend," explained Kuster.
"The only way you are going to do this and be sure of what you are doing and what problems you will face is if you run the new system. I tried to estimate things, but we would never be sure unless we were running it," he added.
"So we did that, we ran the new system in parallel to the old one for three months. We did application simulations, where requests were coming into production, and we set up ways to mimic the traffic in the new Teradata warehouse in real time.
"We did this also for our middleware service and the ETL loads. So we were eventually running the whole thing in parallel."
Kuster explained to delegates that this took away the element of surprise and gave his team time to familiarise themselves with the system's capabilities.
"Teradata's workload management tool was new to us, where it allows us to prioritise work, figure out who goes first, and get some complicated rules in there to do different things," said Kuster.
"We now have some very complex rules to manage our workload, but I wouldn't have wanted to figure that out on the first day of production. There is no way. The three-month simulation gave us time to do this."
Running the simulation for three months also allowed Morgan Stanley to compare performance between the two systems and see where it had improved.
"The ability to compare SQL query logs was very helpful. Every time I ran something in the simulator I knew the real performance number and I was able to compare that against the other system. We were able to do a head-to-head competition and in the new warehouse 74 per cent of SQLs were faster," said Kuster.