
DataStage Tips n Tricks

DataStage Administrator
  • Add and delete projects
  • Issue DataStage Engine commands directly from the selected project
  • View or set project properties (e.g. allow users to run Cleanup Resources and Clear Status File from the Job menu of DataStage Director)
  • Set user permissions by role (Production Manager, Developer and Operator)
--> Production Manager - full access to all areas of the project; can create and manipulate protected projects.
(If a project is created with the 'protected' property, only users with the Production Manager role can add or remove objects. Other users can still run jobs, set job properties and set default values for job parameters.)

--> Developer - full access to all areas of the project
--> Operator - permission to run and manage DataStage jobs
  • Server-side tracing (trace activity on the server to help diagnose project problems)
  • Schedule jobs using the Windows NT Schedule service
  • Performance tuning
--> Set the memory cache size for reading and writing hashed files

--> In-process row buffering (connected active stages pass data through buffers rather than row by row)
--> Inter-process row buffering (SMP) (each active stage runs in a separate process, so on an SMP system the stages can run simultaneously on separate processors)
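To picture what "passing data via buffers rather than row by row" means, here is a toy model in Python. It is only an illustration under stated assumptions: in DataStage you switch buffering on in the Administrator or in job properties, not by writing code, and the stage names and the `buffer_rows` size below are hypothetical. Two threads stand in for two connected active stages, joined by a bounded FIFO buffer.

```python
"""Toy model of row buffering between two connected stages (hypothetical)."""
import queue
import threading
from typing import Callable, Iterable, List

_END = object()  # sentinel marking the end of the row stream


def run_buffered(rows: Iterable[dict],
                 transform: Callable[[dict], dict],
                 buffer_rows: int = 4) -> List[dict]:
    """Run an upstream 'reader' and a downstream 'transformer' concurrently.

    The stages share a bounded buffer, so the reader keeps producing rows
    while the transformer is busy instead of handing over one row and
    waiting for it to be processed.
    """
    link: "queue.Queue[object]" = queue.Queue(maxsize=buffer_rows)
    output: List[dict] = []

    def reader() -> None:
        for row in rows:
            link.put(row)          # blocks only when the buffer is full
        link.put(_END)

    def transformer() -> None:
        while True:
            row = link.get()       # blocks only when the buffer is empty
            if row is _END:
                break
            output.append(transform(row))

    threads = [threading.Thread(target=reader),
               threading.Thread(target=transformer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return output
```

For example, `run_buffered([{"qty": 1}, {"qty": 2}], lambda r: {"qty": r["qty"] * 2})` returns the doubled rows in their original order, because a single FIFO link preserves ordering. Inter-process buffering would follow the same idea with separate processes instead of threads.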
DataStage Manager
  • View and manage the contents of the Repository

  • Create and delete categories, and move items between categories

  • Import and export components and objects in the Repository

  • The export utility creates an ASCII text file

  • Usage analysis tool (shows how modifying a particular object would affect the DataStage project as a whole)

  • Reporting Assistant lets you generate reports at various levels within a project
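The idea behind usage analysis can be sketched as a walk over "who uses what" links. The Python below is a hypothetical model, not how the Manager tool is implemented: it assumes the Repository can be described as a map from each object to the objects it uses, and the object names in the example are made up. Changing one object affects everything that uses it directly or indirectly.

```python
from collections import deque
from typing import Dict, List, Set


def affected_by(uses: Dict[str, List[str]], changed: str) -> Set[str]:
    """Return every object that directly or indirectly uses `changed`."""
    # Invert "A uses B" links into "B is used by A".
    used_by: Dict[str, List[str]] = {}
    for obj, deps in uses.items():
        for dep in deps:
            used_by.setdefault(dep, []).append(obj)

    # Breadth-first walk outward from the changed object.
    affected: Set[str] = set()
    pending = deque([changed])
    while pending:
        current = pending.popleft()
        for user in used_by.get(current, []):
            if user not in affected:
                affected.add(user)
                pending.append(user)
    return affected
```

With a hypothetical Repository where job `LoadCustomers` uses table definition `CustomerTableDef` and sequence `NightlyLoad` runs `LoadCustomers`, changing `CustomerTableDef` flags both the job and the sequence.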
DataStage Director
  • Used to validate, run, schedule and monitor jobs

  •  Gather statistics as the job runs
DataStage Designer

  • Used to develop the processes for extracting, cleansing, transforming, integrating and loading data

  •  Stages (Each stage describes a particular database or process)

  • Three basic types of stages (Built-in, Plug-in and Job Sequence stages)

 i) Built-in stages - used for the ETL process

 ii) Plug-in stages - additional stages that perform specialized tasks the built-in stages do not support

 iii) Job Sequence stages - used to define sequences of activities to run

  • Three basic types of jobs (Server jobs, Parallel jobs and Mainframe jobs)

 i) Server and parallel jobs are compiled and run on the DataStage server; parallel jobs additionally support parallel processing on SMP, MPP and cluster systems.

 ii) Mainframe jobs are compiled and run on the mainframe.

  • Server job stages (active or passive)

 i) Active stages provide mechanisms for combining data streams, aggregating data and converting data from one data type to another (e.g. Aggregator, Transformer).

 ii) Passive stages handle access to databases and files for extracting or writing data (e.g. ODBC, Hashed File, Sequential File).

  • Reusable elements (server shared containers and parallel shared containers)
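The active/passive split can be modelled as a tiny pipeline. The Python below is a hypothetical sketch, not DataStage code: the function names mirror the stage types only for illustration, and an in-memory CSV string stands in for a sequential file. The passive stages only read or write rows. The active stages change the data: the transformer converts a column's type and the aggregator groups and sums.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, Iterable, Iterator, Tuple


def sequential_file_source(text: str) -> Iterator[Dict[str, str]]:
    """Passive stage: only reads rows from (in-memory) CSV text."""
    yield from csv.DictReader(io.StringIO(text))


def transformer(rows: Iterable[Dict[str, str]]) -> Iterator[Tuple[str, float]]:
    """Active stage: converts the 'amount' column from text to a number."""
    for row in rows:
        yield row["region"], float(row["amount"])


def aggregator(rows: Iterable[Tuple[str, float]]) -> Dict[str, float]:
    """Active stage: sums the amounts for each region."""
    totals: Dict[str, float] = defaultdict(float)
    for region, amount in rows:
        totals[region] += amount
    return dict(totals)


def sequential_file_target(totals: Dict[str, float]) -> str:
    """Passive stage: only writes the rows it receives, as CSV text."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["region", "total"])
    for region, total in sorted(totals.items()):
        writer.writerow([region, total])
    return out.getvalue()
```

Chaining them as `sequential_file_target(aggregator(transformer(sequential_file_source(text))))` mirrors a simple Sequential File → Transformer → Aggregator → Sequential File job.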
Job Sequences (To specify a sequence of DataStage jobs to be executed and actions to take depending on results)
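The control flow of a job sequence (run jobs in order and choose the next action from each result) can be sketched as follows. The model is deliberately simplified and hypothetical: each job is a Python callable that returns `True` for a successful run, and only two outcomes are modelled. On success the sequence continues. On failure a handler runs and the sequence stops. Real sequences offer more trigger types than these two.

```python
from typing import Callable, List, Tuple

# Hypothetical stand-in for a DataStage job: returns True if it finished OK.
Job = Callable[[], bool]


def run_sequence(jobs: List[Tuple[str, Job]],
                 on_failure: Callable[[str], None]) -> List[str]:
    """Run jobs in order; at the first failure, call on_failure and stop."""
    completed: List[str] = []
    for name, job in jobs:
        if job():
            completed.append(name)   # success: continue to the next job
        else:
            on_failure(name)         # failure: take the failure path
            break
    return completed
```

For example, if a hypothetical `Transform` job fails between `Extract` and `Load`, only `Extract` is reported as completed, the handler receives `"Transform"`, and `Load` never runs.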
