Friday, February 12, 2010

New buzz for the industry: MDM solutions. So what is it?

We have had so many MDM-like terms before: master file, master data, customer master file, product master file, etc. So what's new with MDM's inception? Is it any different from before?

Yes, MDM is a new discipline around the maintenance of the old customer/product master file. Electronic data processing (EDP) has evolved from the 1950s to the 2000s, and it was about time the maintenance procedure for the master file changed too; hence the buzzword "Master Data Management". Companies have invested billions of dollars in creating MDM solutions. The following companies offer MDM solutions as a product -

1. Oracle Siebel UCM/Oracle CDH
2. IBM MDM through InfoSphere
3. SAP MDM
4. Informatica Siperian (Informatica acquired Siperian in January 2010)
5. SAS DataFlux
6. Initiate Systems
7. D&B Purisma
8. Tibco
9. VisionWare
10. Sun
11. Talend MDM (open source MDM)

MDM is a concept for keeping all packaged, enterprise, and analytical applications up to date with a single view of key entities, mainly Customer and Product. Beware: MDM is a concept, not a framework. An MDM solution can be built with home-grown techniques or with an MDM product offering.

So what is MDM, anyway?
MDM is a technique for generating a "golden record" of an organization's essential entities such as Customer, Product, Account, etc. Some people argue that a data warehouse already contains a "single version of the truth" for these entities. Well said, but the truth of the matter is that data warehouses are flattened out from third normal form, whereas an MDM data store is still in third normal form and can be leveraged by transactional systems. A transactional record gets updated frequently and cannot show up in the analytical/data warehouse layer immediately; MDM data stores get updated immediately through real-time processing, which helps the business gain confidence in its decisions. MDM employs data cleansing, matching, de-duplication, and survivorship techniques to uniquely identify the "golden record", along with data governance procedures and the roles and responsibilities of data stewards.
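To make matching and survivorship concrete, below is a minimal Java sketch of how a "golden record" could be derived. It is purely illustrative: the CustomerRecord class, the normalized name-plus-zip match rule, and the "most recently updated wins" survivorship rule are my own simplified assumptions, not any vendor's implementation.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy match/survivorship demo for a customer "golden record".
// CustomerRecord and both rules are hypothetical simplifications.
public class GoldenRecordDemo {

    static class CustomerRecord {
        String name, zip, phone;
        long updatedMillis; // last-update timestamp drives survivorship

        CustomerRecord(String name, String zip, String phone, long updatedMillis) {
            this.name = name; this.zip = zip; this.phone = phone;
            this.updatedMillis = updatedMillis;
        }
    }

    // Matching: records that agree on a cleansed name + zip key are duplicates.
    static String matchKey(CustomerRecord r) {
        return r.name.trim().toLowerCase().replaceAll("\\s+", " ") + "|" + r.zip.trim();
    }

    // Survivorship: the most recently updated record wins its cluster.
    static CustomerRecord survive(List<CustomerRecord> cluster) {
        CustomerRecord golden = cluster.get(0);
        for (CustomerRecord r : cluster)
            if (r.updatedMillis > golden.updatedMillis) golden = r;
        return golden;
    }

    public static void main(String[] args) {
        List<CustomerRecord> input = new ArrayList<CustomerRecord>();
        input.add(new CustomerRecord("John  Smith", "02139", null, 1000L));
        input.add(new CustomerRecord("john smith", "02139", "555-0101", 2000L));

        // Cluster duplicates by match key, then pick one survivor per cluster.
        Map<String, List<CustomerRecord>> clusters = new HashMap<String, List<CustomerRecord>>();
        for (CustomerRecord r : input) {
            String key = matchKey(r);
            if (!clusters.containsKey(key)) clusters.put(key, new ArrayList<CustomerRecord>());
            clusters.get(key).add(r);
        }
        for (List<CustomerRecord> cluster : clusters.values()) {
            CustomerRecord g = survive(cluster);
            System.out.println("golden record: " + g.name + " / " + g.phone);
        }
    }
}
```

A real MDM hub would layer fuzzy matching, steward-configurable survivorship rules per attribute, and governance workflow on top of this bare-bones match/merge loop.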

For 2010, Gartner predicts that 70% of Global 2000 companies will be MDM-ready, and Forrester projects $6 billion in spending on MDM implementations by 2010.

Thursday, February 11, 2010

ELT - Extract, Load and Transform. When to use it?

We heard so much about ELT around 2005-06; now the buzzword has faded. So what exactly is ELT? Why can't I just use traditional ETL? Well, ELT can do pretty much the same transformations as ETL. It is for you to decide why, when, where and how to do them.

Mostly, if I/O is a concern then ELT is your answer. ELT does not create memory/process/swap overhead on the hosting server, since all calculations are pushed down to the database. Talend's offerings for ELT-based processing are a must-watch. Also, ELT is mostly relied on for third-normal-form to third-normal-form data loading. If you have an enterprise data warehouse with batch/real-time loading, then ELT is not your choice; the answer is always YES to ETL for data warehouse loading.

Talend offers open source ETL and ELT. An interesting feature of Talend is that you can use both ELT and ETL objects in the same mapping you are designing. Also, Talend lets you choose between Java and Perl as the generated code type, with a single click before creating a mapping.

A typical ELT mapping/job contains a source query object (E), then creation of a staging table followed by loading it (L), and finally a transformation (T) such as an aggregation that loads the target table.
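For illustration, here is roughly what such a job looks like in plain JDBC. This is a sketch, not Talend's generated code; the connection URL and the orders_src, stg_orders, and daily_sales tables are hypothetical. The key point is that both the load and the transform run as SQL pushed down to the database, so the host running the job does almost no I/O:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

// Minimal ELT sketch over JDBC; all heavy lifting happens in the database.
// Connection details and table names (orders_src, stg_orders, daily_sales)
// are hypothetical placeholders.
public class EltJob {
    public static void main(String[] args) throws Exception {
        try (Connection con = DriverManager.getConnection(
                 "jdbc:postgresql://dwhost/dw", "etl_user", "secret");
             Statement st = con.createStatement()) {

            // (E) + (L): extract from the source and load a staging table in
            // one push-down statement; no rows ever travel through this client.
            st.execute("CREATE TABLE stg_orders AS "
                     + "SELECT order_id, customer_id, amount, order_date "
                     + "FROM orders_src");

            // (T): transform inside the database (a simple aggregation here)
            // and load the result into the target table.
            st.execute("INSERT INTO daily_sales (order_date, total_amount) "
                     + "SELECT order_date, SUM(amount) "
                     + "FROM stg_orders GROUP BY order_date");

            // Drop the staging table once the target load has succeeded.
            st.execute("DROP TABLE stg_orders");
        }
    }
}
```

Because every statement executes inside the database, this pattern assumes the source and staging/target tables live in (or are reachable from) the same database, which is exactly the 3NF-to-3NF situation where ELT shines.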

The old-school method of doing ELT was a stored procedure that created temp tables, loaded them, and performed some calculations before loading the target tables. InfoPump and Microsoft DTS were other old-school ways of doing ELT.

With falling hardware costs, ELT seems like an almost obsolete concept now.

Disclaimer: The above are my personal opinions based on my observation and do not represent my employer's view in any way.

Competitively priced offerings from expressor-software

Booyah! I watched expressor-software's deep-dive webinar today, hosted by two of their highly knowledgeable staff: the Director of Field Engineering (DFE) and the Marketing Director (MD). After the webinar I had tons of questions about the framework I would perhaps need to design for a large EDW if I used expressor-software.
The DFE spoke about scalability, stress testing, the semantic repository, and operational metadata; it felt like an amazing kick-off. The following features fascinated me right away -
1. Semantic repository based on a rules engine. What that means is that a data steward who creates business rules can log them into the semantic engine, and a developer can pick them up during the development phase. Amazing, isn't it?
2. Re-usability - Often on financial and banking projects we hear developers' frustrations: "Do I have to recreate the business rule? Phew! The tool could have offered something like a shared container (IBM DataStage term) or a mapplet (Informatica lingo)." expressor-software's thrust is on business-rule re-usability.
3. Pipeline Partitioning - Ascential introduced pipeline partitioning into DataStage in the early 2000s, at version 5, and Informatica added pipeline partitioning to its offerings later. For a startup tool, this is great to have in its 2nd version already! Good visioning, expressor-software! (See the sketch after this list for the basic idea.)
4. Pricing Model - Channel-based pricing. In their model, one channel gives you sequential processing; for multiprocessing, multiple channels need to be bought. An excellent price chart with 6-month, 1-year, and 2-year subscriptions. It feels like renting a tool at a Redbox.
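To show what pipeline partitioning buys you, here is a minimal, purely illustrative thread-based sketch: two stages run concurrently, connected by a bounded queue, so the transform starts consuming rows before the extract has finished. This is my own toy example, not how DataStage, Informatica, or expressor-software implement it internally:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Toy thread-based pipeline: an extract stage and a transform stage run
// concurrently, linked by a bounded queue, so rows flow downstream while
// the source is still being read.
public class PipelineDemo {
    private static final String EOF = "__EOF__"; // poison pill ends the pipeline

    public static void main(String[] args) throws InterruptedException {
        final BlockingQueue<String> link = new ArrayBlockingQueue<String>(100);

        Thread reader = new Thread(new Runnable() {      // stage 1: extract
            public void run() {
                try {
                    for (int i = 0; i < 5; i++) link.put("row-" + i);
                    link.put(EOF);
                } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
            }
        });

        Thread transformer = new Thread(new Runnable() { // stage 2: transform
            public void run() {
                try {
                    for (String row = link.take(); !row.equals(EOF); row = link.take()) {
                        System.out.println(row.toUpperCase()); // stand-in transformation
                    }
                } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
            }
        });

        reader.start(); transformer.start();
        reader.join(); transformer.join();
    }
}
```

Process-based engines such as DataStage run each stage as a separate OS process connected by pipes rather than threads inside one process, but the data-flow principle is the same.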

What I really look forward to from expressor-software is -
1. Although every tool can and should be configured with an enterprise scheduler in this fast-paced data integration world, every tool needs its own workflow designer. I'm waiting for expressor-software to come out with its own workflow-based designer, so developers can call it a fully packaged enterprise tool and operators can receive notifications at wee and fiesta hours. :)

Disclaimer: The above are my personal opinions based on my observation and do not represent my employer's view in any way.

Data integration market getting more crowded

Too many offerings, too many vendors. Small, big, questions of financial viability. Confused? With open source offerings in ETL, ELT, MDM, etc., the market seems more crowded than ever before. The following are the vendor offerings out in the stiff data integration market.
Top picks out in the market -
1. Informatica
2. IBM DataStage
3. Ab Initio
4. Oracle
5. SAP BusinessObjects Data Integrator
6. Microsoft SSIS

Top Challengers-
1. SAS/DataFlux
2. iWay Software
3. Talend
4. Syncsort
5. Pervasive Software
6. Pitney Bowes Business Insight

Competitive price vendors -
1. Informatica Cloud edition
2. expressor-software (channel-based pricing, i.e., you buy channels for parallel processing)

Open Source vendors -
1. Talend
2. Apatar
3. Pentaho Kettle
4. SnapLogic

Cloud/SaaS Based vendors -
1. Informatica
2. SnapLogic
Both were started by entrepreneur Gaurav Dhillon.

Top Pipeline Partitioning vendors -
1. IBM DataStage (process-based)
2. Informatica (multi-thread based)
3. expressor-software (hybrid)

Top Message oriented/Real-time processing vendors -
1. IBM DataStage
2. Informatica

Best salesforce.com Integration vendors -
1. Informatica (both Cloud edition based and adaptor based support)
2. IBM DataStage (Adaptor/Pack based support)


Top ELT architecture based vendors -
1. Talend (excellent ELT-based objects to drag and drop in the designer)
2. IBM DataStage (provides options to create tables before loading)

Disclaimer: The above are my personal opinions based on my observation and do not represent my employer's view in any way.