Wednesday, March 3, 2010

Informatica vs DataStage


Note: The following is based on differences between Informatica 8.6.1 and IBM DataStage 8.0.1. Informatica's term for a source-to-target pipeline is a mapping; IBM DataStage's is a job.

Pipeline Partitioning –

Pipeline partitioning is IBM's main thrust with DataStage. Data can be split across multiple partitions, processed in parallel and then re-collected. IBM DataStage lets you choose the partitioning in a job design based on the logic of the processing, instead of defaulting the whole pipeline flow to one partition type. IBM DataStage offers 7 different partitioning types for multi-processing.

Informatica offers partitioning as dynamic partitioning, which sets a default for the workflow rather than at every stage/object level in a mapping. Informatica also offers other partitioning choices at the workflow level.
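To make the "partition, process, re-collect" idea concrete, here is a minimal Python sketch. It is not DataStage or Informatica code and uses no product API; the function names (`hash_partition`, `round_robin_partition`, `run_and_collect`) and the customer/amount rows are hypothetical. It contrasts two common partitioning methods, hash and round robin, to show why choosing the partition type based on the processing logic matters: a per-key aggregation is complete inside each partition only when rows are hash-partitioned on that key.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor

# Hypothetical helpers illustrating partition -> process -> re-collect.
# Rows are plain dicts such as {"customer": "A", "amount": 10}.

def hash_partition(rows, key, n):
    """Route every row with the same key value to the same partition."""
    parts = [[] for _ in range(n)]
    for row in rows:
        # crc32 is stable across runs, unlike Python's built-in str hash
        parts[zlib.crc32(str(row[key]).encode()) % n].append(row)
    return parts

def round_robin_partition(rows, n):
    """Deal rows out evenly, ignoring their content."""
    parts = [[] for _ in range(n)]
    for i, row in enumerate(rows):
        parts[i % n].append(row)
    return parts

def total_by_customer(part):
    """The per-partition processing step: sum amounts per customer."""
    totals = {}
    for row in part:
        totals[row["customer"]] = totals.get(row["customer"], 0) + row["amount"]
    return totals

def run_and_collect(parts):
    """Process partitions in parallel, then re-collect the partial results.

    Returns {customer: [partial totals]}, with one entry per partition
    in which that customer appeared.
    """
    with ThreadPoolExecutor(max_workers=len(parts)) as pool:
        partials = list(pool.map(total_by_customer, parts))
    collected = {}
    for partial in partials:
        for customer, amount in partial.items():
            collected.setdefault(customer, []).append(amount)
    return collected

rows = [
    {"customer": "A", "amount": 10},
    {"customer": "A", "amount": 7},
    {"customer": "B", "amount": 5},
    {"customer": "B", "amount": 3},
]
print(run_and_collect(round_robin_partition(rows, 2)))  # {'A': [10, 7], 'B': [5, 3]}
print(run_and_collect(hash_partition(rows, "customer", 2)))
```

With round robin, customer A's total is split across two partitions and would need a second aggregation after collection; with hash partitioning on `customer`, each partition already holds complete totals (A: 17, B: 8). In DataStage itself this choice is made in the job design rather than in code like this.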

Designer/Monitoring GUIs –

Informatica provides access to development and monitoring through 4 GUIs: PowerCenter Designer, Repository Manager, Workflow Manager and Workflow Monitor.

IBM DataStage supports development and monitoring of its jobs through 3 GUIs: IBM DataStage Designer (for development), Job Sequence Designer (for workflow design) and Director (for monitoring).

Version Control –

Informatica offers instant version control through its repository server, managed with the “Repository Manager” GUI console. A work-in-progress mapping cannot be opened until it has been saved and checked back into the repository.

Version control was offered as a component up to Ascential DataStage 7.5.x. After IBM acquired Ascential and integrated DataStage into IBM Information Server (at DataStage version 8.0.1), support for version control as a component was discontinued.

Repository based flow –

Informatica offers a step-by-step approach to creating a data integration solution. Each object created while mapping a source to a target is saved into the repository project folder, categorized as Sources, Targets, Transformations, Mappings, Mapplets, User-defined functions, Business Components, Cubes and Dimensions. Each object can be shared and dropped into a mapping by cross-functional development teams, which increases reusability. Projects are folder based and viewable from one another.

IBM DataStage offers a project-based integration solution in which projects are not viewable from one another, and every project requires role-based access. The step-by-step work of mapping a source to a target is captured in a job. To share objects within a job, separate objects called containers (local or shared) need to be created.

Creating a source-target pipeline –

Within Informatica's PowerCenter Designer, a source definition is first created using the “Source Analyzer”, which imports the metadata; then a target definition is created using the “Target Designer”; then a transformation is created using the “Transformation Developer”; and finally the source, transformation and target are mapped together using the “Mapping Designer”.

IBM DataStage lets you drag and drop functionality, i.e. a stage, onto a single canvas to build a source-to-target job. Within the “DataStage Designer”, both source and target metadata need to be imported, after which you work with the variety of stages offered, such as database stages and transformation stages.

The biggest difference between the two vendors in this area is that Informatica forces you to be organized through a step-by-step design process, while IBM DataStage leaves organization up to you and gives you the flexibility to drag and drop objects based on the logic flow.

Code Generation and Compilation –

Informatica's thrust is auto-generated code. A mapping is created by dropping in a source, transformation and target, and it does not need to be compiled.

IBM DataStage requires a job to be compiled before it can run. Changing business requirements therefore affect change control management for IBM DataStage jobs, since every change requires re-compilation.

Reusability –

Informatica offers easy reusability through Mapplets and Worklets, for reusing parts of mappings and workflows respectively.

IBM DataStage offers reusability within a job through containers (local and shared). To reuse a Job Sequence (workflow), you need to make a copy, then compile and run it.

Change Data Capture (CDC) –

Informatica offers CDC through a separate edition, the Real-Time Edition. In IBM DataStage, CDC is a drag-and-drop object within the Designer.

Data Encryption/Masking –

With IBM DataStage, data masking or encryption needs to be done before the data reaches the DataStage server. Informatica offers this within PowerCenter Designer as a separate transformation, the “Data Masking Transformation”.

Variety of Transformations –

Informatica offers about 30 general transformations for processing incoming data.

IBM DataStage offers about 40 data transformation stages/objects.

Impact Analysis –

Informatica offers data lineage and impact analysis through a separate edition, the Advanced Edition.

In IBM DataStage, dependency or impact analysis can be performed in the Designer by right-clicking on a job.

Real-Time Integration –

Within the Designer, IBM DataStage offers out-of-the-box real-time solutions for WISD, XML, Web Services, WebSphere MQ and Java-based services.

Informatica offers SOA/real-time integration through the Real-Time Edition.

Monitoring –

The Informatica Workflow Monitor offers different levels of run statistics. Tracing is offered at 4 levels: Normal, Terse, Verbose Initialization and Verbose Data. These tracing levels control how much detail is reported on source/target rows, caching and transformation statistics for each mapping.

IBM DataStage offers operational statistics through DataStage Director. Start, elapsed and end times can be viewed within the Director GUI, and row statistics for every processing stage/object can be obtained through the Monitor option within the Director.

6 comments:

  1. I have a couple of corrections or clarifications to Informatica part.

    First Re-usability: In addition to high level re-usability of mapplets/worklets, there are the re-usable transformations and session tasks and the user defined functions.

    Second CDC: The CDC comes bundled with RT Edition, but can also be purchased separately. And when installed, it is just another source object. And for DS CDC, it is as well a separate license for true CDC.

    Third Impact Analysis: In Informatica you get impact analysis called View Dependencies by right clicking on a source, target, mapping or any repository object. The impact analysis that comes with Advanced Edition, called Metadata Manager, brings impact analysis across source and target databases and reporting systems.

  2. Anonymous : I have a couple of corrections or clarifications to Informatica part.

    Aditya : Thank you for the comments. You have provided more detailed information beyond the high-level view given here, rather than corrections or clarifications as you put it.

  3. Anonymous : And for DS CDC, it is as well a separate license for true CDC.

    Aditya : Bundled DataStage CDC has the capabilities of a true CDC at row-level. Real-Time CDC comes as a separate licence with DataStage through Data-Mirror. Real-Time CDC in Informatica is now offered through Power-Exchange.

  4. Hi there! glad to drop by your page and found these very interesting and informative stuff. Thanks for sharing, keep it up!
    - data integration

  5. Thanks for all your comments that motivated me to become a professional technical blogger. Please check my professional site that provides quick insights, comparisons of data integration tools.
    Please check - www.quickdatainsights.com

  6. Good one Aditya. Found it really useful.
