Arangen, Inc. Far Out
The Real-Time Data Integration Provider 
 
 

GLOSSARY of DATABASE TERMS

 

            ARANGEN

            3GL

            B2B

            B2C

            B2P

            BI

            CDI

            CICS

            CORBA

            CRM

            CRP

            CSS

            CWM

            DBMS

            DCOM

            DSS

            EAI

            EII

            EIS

            ERP

            ETL

            GIOP

            IIOP

            JDBC

            METADATA

            MDM

            MOF

            MRP

            ODBC

            OMG

            ORB

            OLTP

            RPC

            SFA

            SIC

            SOA

            SOAP

            SQL

            STP

            UML

            XMI

            XML

 

ARANGEN (Middle English) to put into proper order or into a correct or suitable sequence, relationship, or adjustment. [Merriam-Webster’s Ninth New Collegiate Dictionary, 1986]

 

3GL    A third generation language (3GL) is a programming language designed to be easier for a human to understand, including things like named variables. A fragment might be:

 

let b = c + 2 * d

 

Fortran, ALGOL and COBOL are early examples of this sort of language. Most "modern" languages (BASIC, C, C++, Delphi, and Java, as well as COBOL, Fortran, and ALGOL themselves) are third generation. Most 3GLs support structured programming.

 

B2B    Business-to-business electronic commerce (B2B) typically takes the form of automated processes between trading partners and is performed in much higher volumes than business-to-consumer (B2C) applications. For example, a company that makes chicken feed would sell it to a chicken farm, another company, rather than directly to consumers. An example of a B2C transaction would be a consumer buying grain-fed chickens at a grocery store. B2B can also encompass marketing activities between businesses, not just the final transactions that result from marketing. The term is also used to identify sales transactions between businesses. For example, a company selling photocopiers would likely be a B2B sales organisation rather than a B2C sales organisation.

 

B2C    Business-to-consumer (B2C), also business-to-customer, describes activities of commercial organizations serving the end consumer with products and/or services.

 

B2P     Business-to-Partner (B2P) describes the activities of commercial organizations providing access to their on-line resources for their partners.

 

BI        Business intelligence (BI) treats intelligence as information valued for its currency and relevance: expert information, knowledge, and technologies that support the efficient management of organizational and individual business. In this sense, business intelligence is a broad category of applications and technologies for gathering, providing access to, and analyzing data to help enterprise users make better business decisions. The term implies comprehensive knowledge of all the factors that affect a business. Effective, good-quality business decisions require in-depth knowledge of factors such as customers, competitors, business partners, the economic environment, and internal operations. Business intelligence supports these kinds of decisions.

 

CDI     Customer Data Integration (CDI) is the combination of the technology, processes and services needed to create and maintain an accurate, timely and complete representation of a customer across multiple channels, business lines, and enterprises, typically where there are multiple sources of associated data in multiple application systems and databases. CDI is commonly used in Master Data Management, and enables access to information describing everything known about a customer, including all attributes and cross references, along with the critical definition and identification necessary to uniquely differentiate one similar customer from another. Customer Data Integration relies heavily on the standardization of data and overall data quality. Therefore, large corporations and those with large amounts of data often set up data governance teams to manage the CDI process.

 

CICS   (Customer Information Control System) is a transaction server that runs primarily on IBM mainframe systems under z/OS or z/VSE. CICS is available for other operating systems, notably i5/OS, OS/2, and as the closely related IBM TXSeries software on AIX, Windows, and Linux, among others. The z/OS implementation is by far the most popular and significant.

 

CICS is a transaction processing system designed for both online and batch activity. On large IBM zSeries and System z9 servers, CICS easily supports thousands of transactions per second, making it a mainstay of enterprise computing. CICS applications can be written in numerous programming languages, including COBOL, PL/I, C, C++, Assembler, REXX, and Java.

 

CORBA In computing, Common Object Request Broker Architecture (CORBA) is a standard for software componentry, created and controlled by the Object Management Group (OMG). It defines APIs, communication protocol, and object/service information models to enable heterogeneous applications written in various languages running on various platforms to interoperate. CORBA therefore provides platform and location transparency for sharing well-defined objects across a distributed computing platform.

 

In a general sense CORBA “wraps” code written in some language into a bundle containing additional information on the capabilities of the code inside, and how to call it. The resulting wrapped objects can then be called from other programs (or CORBA objects) over the network. In this sense, CORBA can be considered as a machine-readable documentation format, similar to a header file but with considerably more information.

 

CORBA uses an interface definition language (IDL) to specify the interfaces that objects will present to the world. CORBA then specifies a “mapping” from IDL to a specific implementation language like C++ or Java. This mapping precisely describes how the CORBA data types are to be used in both client and server implementations. Standard mappings exist for Ada, C, C++, Lisp, Smalltalk, Java, and Python. There are also non-standard mappings for Perl and Tcl implemented by ORBs written for those languages.

 

CRM   The generally accepted purpose of Customer Relationship Management (CRM) is to enable organizations to better manage their customers through the introduction of reliable processes and procedures for interacting with those customers.

 

In today's competitive business environment, a successful CRM strategy cannot be implemented merely by installing and integrating a software package designed to support CRM processes. A holistic approach to CRM is vital for an effective and efficient CRM policy. This approach includes training employees, modifying business processes based on customers' needs, and adopting relevant IT systems (software and possibly hardware) and/or IT services that enable the organization to follow its CRM strategy. CRM services can even replace the acquisition of additional hardware or CRM software licences.

 

The term CRM is used to describe either the software or the whole business strategy (or lack of one) oriented toward customer needs. The second description is the correct one: the main misconception about CRM is that it is only software rather than a whole business strategy.

 

Major areas of CRM focus on automated service processes, personal information gathering and processing, and self-service. CRM attempts to integrate and automate the various customer-serving processes within a company.

 

Architecture of CRM

 

There are three parts to the application architecture of CRM:

 

* operational - automation of the basic business processes (marketing, sales, service)

* analytical - support for analyzing customer behavior, implemented with business-intelligence-style technology

* collaborative - ensures contact with customers (phone, email, fax, web, SMS, post, in person)

 

Operational CRM

 

Operational CRM means supporting the so-called "front office" business processes, which include customer contact (sales, marketing and service). Tasks resulting from these processes are forwarded to the employees responsible for them, together with the information necessary for carrying out the tasks. Interfaces to back-end applications are provided, and activities with customers are documented for further reference.

 

Operational CRM provides the following benefits:

 

* Delivers personalized and efficient marketing, sales, and service through multi-channel collaboration

* Enables a 360-degree view of your customer while you are interacting with them

* Gives salespeople and service engineers access to the complete history of all customer interactions with your company, regardless of the touch point

 

According to Gartner Group, the operational part of CRM typically involves three general areas of business:

 

* Sales force automation (SFA): SFA automates some of the company's critical sales and sales force management functions, for example, lead/account management, contact management, quote management, forecasting, sales administration, keeping track of customer preferences, buying habits, and demographics, as well as sales staff performance. SFA tools are designed to improve field sales productivity. Key infrastructure requirements of SFA are mobile synchronization and integrated product configuration.

 

* Customer service and support (CSS): CSS automates some service requests, complaints, product returns, and information requests. The traditional internal help desk and traditional inbound call-center support for customer inquiries have now evolved into the "customer interaction center" (CIC), using multiple channels (Web, phone/fax, face-to-face, kiosk, etc). Key infrastructure requirements of CSS include computer telephony integration (CTI), which provides high-volume processing capability, and reliability.

 

* Enterprise marketing automation (EMA): EMA provides information about the business environment, including competitors, industry trends, and macroenvironmental variables. It is the execution side of campaign and lead management. The intent of EMA applications is to improve marketing campaign efficiencies. Functions such as demographic analysis, variable segmentation, and predictive modeling occur on the analytical (Business Intelligence) side.

 

Integrated CRM software is often also known as "front office solutions." This is because they deal directly with the customer.

 

Many call centers use CRM software to store all of their customers' details. When a customer calls, the system can be used to retrieve and store information relevant to the customer. By serving the customer quickly and efficiently, and by keeping all information on a customer in one place, a company aims to make cost savings and also to encourage new customers.

 

CRM solutions can also be used to allow customers to perform their own service via a variety of communication channels. For example, you might be able to check your bank balance via your WAP phone without ever having to talk to a person, saving money for the company, and saving you time.

 

Analytical CRM

 

In analytical CRM, data gathered within operational CRM are analyzed to segment customers or to identify cross- and up-selling potential. Data collection and analysis is viewed as a continuing and iterative process. Ideally, business decisions are refined over time, based on feedback from earlier analysis and decisions. Business Intelligence offers some more functionality as separate application software.

 

Collaborative CRM

 

Collaborative CRM facilitates interactions with customers through all channels (personal, letter, fax, phone, web, e-mail) and supports co-ordination of employee teams and channels. It is a solution that brings people, processes and data together so companies can better serve and retain their customers. The data and activities can be structured, unstructured, conversational, and/or transactional in nature.

 

Collaborative CRM provides the following benefits:

 

* Enables efficient productive customer interactions across all communications channels

* Enables web collaboration to reduce customer service costs

* Integrates call centers enabling multi-channel personal customer interaction

* Provides an integrated view of the customer during interactions at the transaction level

 

Improving customer service

 

CRM systems are intended to improve customer service. Proponents say they can do so by facilitating communication in several ways:

 

* Provide product information, product use information, and technical assistance on web sites that are accessible 24 hours a day, 7 days a week.

* Help to identify potential problems quickly, before they occur.

* Provide a user-friendly mechanism for registering customer complaints (complaints that are not registered with the company cannot be resolved, and are a major source of customer dissatisfaction).

* Provide a fast mechanism for handling problems and complaints (complaints that are resolved quickly can increase customer satisfaction).

* Provide a fast mechanism for correcting service deficiencies (correct the problem before other customers experience the same dissatisfaction).

* Identify how each individual customer defines quality, and then design a service strategy for each customer based on these individual requirements and expectations.

* Use internet cookies to track customer interests and personalize product offerings accordingly.

* Use the Internet to engage in collaborative customization or real-time customization

* Provide a fast mechanism for managing and scheduling followup sales calls to assess post-purchase cognitive dissonance, repurchase probabilities, repurchase times, and repurchase frequencies.

* Provide a fast mechanism for managing and scheduling maintenance, repair, and on-going support (improve efficiency and effectiveness).

* Provide a mechanism to track all points of contact between a customer and the company, and do it in an integrated way so that all sources and types of contact are included, and all users of the system see the same view of the customer (reduces confusion).

* The CRM can be integrated into other cross-functional systems and thereby provide accounting and production information to customers when they want it.

 

Improving customer relationships

 

CRM systems are also claimed to improve customer relationships. Proponents say this is so because:

 

* CRM technology can track customer interests, needs, and buying habits as they progress through their life cycles, and tailor the marketing effort accordingly. This way customers get exactly what they want as they change.

* The technology can track customer product use as the product progresses through its life cycle, and tailor the service strategy accordingly. This way customers get what they need as the product ages.

* In industrial markets, the technology can be used to micro-segment the buying centre and help coordinate the conflicting and changing purchase criteria of its members.

* When any of the technology-driven improvements in customer service (mentioned above) contribute to long-term customer satisfaction, they can ensure repeat purchases, improve customer relationships, increase customer loyalty, decrease customer turnover, decrease marketing costs (associated with customer acquisition and customer “training”), increase sales revenue, and thereby increase profit margins.

 

Technical functionality

 

A CRM solution is characterised by the following functionality:

 

* scalability - the ability to be used on a large scale, and to be reliably expanded to whatever scale is necessary.

* multiple communication channels - the ability to interface with users via many different devices (phone, WAP, internet, etc)

* workflow - the ability to trigger a process in the backoffice system, e. g. Email Response, ...

* assignment - the ability to assign requests (Service Requests, Sales Opportunities) to a person or group.

* database - the centralised storage (in a data warehouse) of all information relevant to customer interaction

* customer privacy considerations, e.g. data encryption and the destruction of records to ensure that they are not stolen or abused

 

Privacy and ethical concerns

 

CRM systems are not, however, universally considered good. Some feel that they invade customer privacy and enable coercive sales techniques because of the information companies now hold on customers - see persuasion technology. CRM does not necessarily imply gathering new data; it can be used merely to make "better use" of data the corporation already has. In most cases, however, CRM systems are used to collect new data.

 

Some argue that the most basic privacy concern is the centralised database itself, and that CRMs built this way are inherently privacy-invasive. See the commercial version of the debate over the carceral state, e.g. Total Information Awareness program of the United States federal government.

 

Setting up a framework for CRM

 

* When you start setting up your CRM segment for your business, first decide which profile aspects are relevant to your business. Which information will give you the keys to serving your customers in the best way possible? If you look at your financial history for this information, what would you have liked to know about your customers in the past? What would the effects have been? And what information is not useful? Being able to eliminate unwanted information is a big part of implementing your CRM systems.

* When designing your CRM's structure, always remember who your primary customers are. You want to keep more extensive information on them because they are your high-margin customers. You can keep less extensive details on the clients you identify as “low-margin”.

 

CRM in Business

 

In this day and age, internet sites, and e-mail in particular, are touted as less expensive communication methods than traditional ones such as telephone calls. This type of service can be very helpful, but it is useless if you are having trouble reaching your customers. Some major companies have determined that the majority of clients trust other means of communication, like the telephone, more than they trust e-mail. Clients, however, are not the ones to blame: what often matters is connecting with consumers on a personal level, so that they feel cherished as customers. It is up to companies to focus on reaching every customer and developing a relationship.

 

CRM software can run your entire business, from prospect and client contact tools to billing history and bulk email management. The CRM system allows you to maintain all customer records in one centralized location that is accessible to your entire organization through password administration. Front office systems are set up to collect data from the customers for processing into the data warehouse. The data warehouse is a back office system used to fulfill and support customer orders. All customer information is stored in the data warehouse. Back office CRM makes it possible for a company to follow sales, orders, and cancellations. Special regressions of this data can be very beneficial for the marketing division of a firm.

 

CRP    Capacity Requirements Planning (CRP) is a computerized technique for projecting resource requirements for critical work stations. It is a tool for:

* Determining the capacity that is available and required.

* Alleviating bottleneck work centers.

* Helping planners make the right decisions on scheduling before problems develop.

It verifies that you have sufficient capacity available to meet the capacity requirements for MRP plans.
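
As a rough illustration of the capacity check described above, the following sketch compares required hours against available hours per work center and flags any center loaded beyond 100%. The work center names, the hour figures, and the `capacity_report` function are hypothetical, and available hours are assumed to be positive; a real CRP system would derive these inputs from MRP plans and routings.

```python
def capacity_report(available_hours, required_hours):
    """Return the load ratio per work center and the overloaded centers.

    Both arguments map a work center name to hours for one planning period.
    Available hours are assumed to be greater than zero.
    """
    report = {}
    for center, available in available_hours.items():
        required = required_hours.get(center, 0.0)
        report[center] = required / available
    # A ratio above 1.0 means the plan asks for more hours than exist.
    bottlenecks = [center for center, ratio in report.items() if ratio > 1.0]
    return report, bottlenecks


# Hypothetical figures for one planning period.
available = {"cutting": 80.0, "welding": 40.0, "painting": 60.0}
required = {"cutting": 60.0, "welding": 50.0, "painting": 60.0}

report, bottlenecks = capacity_report(available, required)
for center, ratio in report.items():
    print(f"{center}: {ratio:.0%} loaded")
print("Bottlenecks:", bottlenecks)
```

In this example only the welding center exceeds its available capacity, so a planner would reschedule or add capacity there before the plan is released.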

 

CSS    Customer service and support (CSS): CSS automates some service requests, complaints, product returns, and information requests. The traditional internal help desk and traditional inbound call-center support for customer inquiries have now evolved into the "customer interaction center" (CIC), using multiple channels (Web, phone/fax, face-to-face, kiosk, etc). Key infrastructure requirements of CSS include computer telephony integration (CTI), which provides high-volume processing capability, and reliability.

 

CWM  The Common Warehouse Metamodel (CWM) is a specification for modeling metadata for relational, non-relational, multidimensional systems, and most other objects found in a data warehousing environment. In addition, CWM models enable users to trace the lineage of data – CWM provides objects that describe where the data came from and when and how the data was created. Instances of the metamodel are exchanged via XMI (XML Metadata Interchange) documents.

 

DBMS A database management system (DBMS) is a computer program (or more typically, a suite of them) designed to manage a database (a large set of structured data), and run operations on the data requested by numerous clients. Typical examples of DBMS use include accounting, human resources and customer support systems. Originally found only in large organizations with the computer hardware needed to support large data sets, DBMSs have more recently emerged as a fairly standard part of any company back office.

 

DBMSs are found at the heart of most database applications. Sometimes DBMSs are built around a private multitasking kernel with built-in networking support, although nowadays these functions are left to the operating system.

 

DCOM Distributed Component Object Model (DCOM) is a Microsoft proprietary technology for software components distributed across several networked computers to communicate with each other. It extends Microsoft's COM, and provides the communication substrate under Microsoft's COM+ application server infrastructure. It has been deprecated in favor of Microsoft .NET.

 

The addition of the "D" to COM was due to extensive use of DCE/RPC - more specifically Microsoft's enhanced version, known as MSRPC.

 

In terms of the extensions it added to COM, DCOM had to solve the problems of

 

* Marshalling - serializing and deserializing the arguments and return values of method calls "over the wire".

* Distributed garbage collection - ensuring that references held by clients of interfaces are released when, for example, the client process crashed, or the network connection was lost.
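
The marshalling step can be sketched in a few lines. This is only an illustration of the idea: DCOM itself marshals calls with NDR over DCE/RPC, whereas the sketch below uses JSON as a stand-in wire format, and the `Calculator` class and helper function names are hypothetical. Distributed garbage collection is not covered.

```python
import json


def marshal_call(method: str, *args) -> bytes:
    """Serialize a method name and its arguments into bytes for the wire."""
    return json.dumps({"method": method, "args": list(args)}).encode("utf-8")


def unmarshal_call(payload: bytes):
    """Rebuild the method name and argument list on the receiving side."""
    request = json.loads(payload.decode("utf-8"))
    return request["method"], request["args"]


class Calculator:
    """Hypothetical server-side object; not a real COM interface."""

    def add(self, a, b):
        return a + b


def dispatch(target, payload: bytes) -> bytes:
    """Unmarshal the request, invoke the method, and marshal the return value."""
    method, args = unmarshal_call(payload)
    result = getattr(target, method)(*args)
    return json.dumps({"result": result}).encode("utf-8")


# Client side: arguments go "over the wire" as bytes, and so does the reply.
wire_request = marshal_call("add", 2, 3)
wire_reply = dispatch(Calculator(), wire_request)
print(json.loads(wire_reply.decode("utf-8"))["result"])
```

The caller never touches the `Calculator` object directly; it only sees bytes going out and bytes coming back, which is what lets the two sides live in different processes or on different machines.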

 

One of the key factors in solving these problems is the use of DCE/RPC as the underlying RPC mechanism behind DCOM. DCE/RPC has strictly defined rules regarding marshalling and who is responsible for freeing memory.

 

DCOM was a major competitor to CORBA. Proponents of both of these technologies saw them as one day becoming the model for code and service-reuse over the Internet.

 

Ironically, however, the difficulties involved in getting either of these technologies to work over Internet firewalls, and on unknown and insecure machines, meant that normal HTTP requests in combination with web browsers won out over both of them. This happened despite Microsoft's attempts to add an extra transport - Network Computing Architecture, Connection-based, over HTTP, aka ncacn_http - to DCE/RPC, which was made available seamlessly to DCOM services.

 

Alternate versions and implementations

 

The Open Group have a DCOM implementation called COMsource. The source code is available for COMsource, along with full and complete documentation, sufficient to use and also sufficient to implement an interoperable version of DCOM. According to that documentation, COMsource comes directly from the Windows NT 4.0 source code, and even includes the source code for a Windows NT Registry Service.

 

The Wine Team are also implementing DCOM. They are doing so for binary interoperability purposes, and are not currently interested in the networking side of DCOM, which is provided by MSRPC. They are restricted to implementing NDR (Network Data Representation) through Microsoft's API, but are committed to making it as compatible as possible with MSRPC.

 

DSS    Decision support systems are a class of computerized information systems that support decision making activities.

 

EAI     Enterprise Application Integration (EAI) is the use of software and architectural principles to bring together (integrate) a set of enterprise computer applications. It is an area of computer systems architecture that gained wide recognition from about 2004 onwards. EAI is related to middleware technologies such as message-oriented middleware (MOM), and data representation technologies such as XML. Newer EAI technologies involve using web services as part of service-oriented architecture as a means of integration. Enterprise Application Integration tends to be data centric. In coming years it will come to include content integration and business processes.

 

Without integration, enterprise computing often takes the form of islands of automation, where the value of individual systems is not maximised because they are working in partial or full isolation. However if integration is carried out without following a structured EAI approach, many point-to-point connections grow up across an organisation. Dependencies are added on an ad-hoc basis, resulting in a tangled unmaintainable mess, commonly referred to as spaghetti, a comparison to the programming equivalent known as spaghetti code.

 

The number of connections needed to fully mesh n applications point-to-point is given by (n * (n-1)) / 2. Thus for 10 applications to be fully integrated point-to-point,

(10 * 9) / 2, or 45 point-to-point connections are needed.
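
The arithmetic above can be checked with a short sketch. The function names are illustrative only; the second function assumes a hub or bus topology in which each application needs a single connection.

```python
def point_to_point_links(n: int) -> int:
    """Links needed to connect every pair of n applications directly."""
    return n * (n - 1) // 2


def bus_links(n: int) -> int:
    """Links needed when each application connects only to a shared bus."""
    return n


for n in (2, 5, 10, 50):
    print(
        f"{n:>3} applications: "
        f"{point_to_point_links(n):>5} point-to-point, "
        f"{bus_links(n):>3} via a bus"
    )
```

The point-to-point count grows quadratically with the number of applications, while the bus count grows linearly, which is why ad-hoc integration becomes hard to maintain as systems are added.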

 

Current thinking is that the best approach to EAI is to use an Enterprise service bus (ESB) to connect numerous separate systems together. Other approaches have been explored, connecting at the database level or at the user-interface level. However, the ESB approach has generally been adopted as the strategic winner. Individual applications can publish messages to the bus, and also subscribe to receive certain messages from the bus.

 

With a bus, each application requires only one connection, which is to the bus. Attending to EAI involves looking at the system of systems. Such message bus approaches can be extremely scalable, and also highly evolvable.
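
The publish and subscribe pattern described above can be sketched as a toy in-process bus. This is a minimal illustration under simplifying assumptions, not how any particular ESB product works: the `MessageBus` class, the `order.created` topic, and the inbox lists are all hypothetical, and delivery is synchronous with no persistence, routing, or transformation.

```python
from collections import defaultdict
from typing import Callable, Dict, List

Handler = Callable[[dict], None]


class MessageBus:
    """Toy bus: applications publish to topics and subscribe to them."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        """Register a handler to receive every message published on topic."""
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, message: dict) -> int:
        """Deliver message to each subscriber of topic; return the delivery count."""
        handlers = self._subscribers.get(topic, [])
        for handler in handlers:
            handler(message)
        return len(handlers)


bus = MessageBus()

# Two hypothetical applications, each represented by an inbox list.
billing_inbox: List[dict] = []
shipping_inbox: List[dict] = []
bus.subscribe("order.created", billing_inbox.append)
bus.subscribe("order.created", shipping_inbox.append)

# The order system publishes once and knows nothing about the receivers.
delivered = bus.publish("order.created", {"order_id": 1001, "total": 59.90})
print(delivered, billing_inbox, shipping_inbox)
```

Adding a third consumer means one more `subscribe` call; the publishing application does not change.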

 

EAI is not just about sharing data between applications. EAI focuses on sharing both business data and business process.

 

EII      Enterprise Information Integration (EII) describes the process of using data abstraction to address the data access challenges associated with data heterogeneity and data contextualization. Data is the foundation upon which the "Information Age" and critical components such as the burgeoning Web 2.0 and a future Semantic Web are being built. Uniform data access and uniform information representation are critical aspects of this journey.

Data takes many forms within an enterprise, but it is safe to identify the following forms as most dominant:

 

* SQL - as result of the prominence of Relational Databases in modern business applications

* Non SQL Data - most dominant in legacy mainframe environments with a variety of proprietary storage, indexing, and data access methods.

 

Irrespective of data form, the issue of data access is pivotal en route to producing Information; hence the emergence of standardized Data Access APIs such as ODBC, JDBC, OLE DB, and more recently ADO.NET.

 

Standardization of Data Access APIs and the emergence of XML as a universal representation format, collectively provide a foundation for Information creation, persistence, and dissemination. It is this capability expressed via a software offering that describes an EII product.

 

Product Characteristics

 

An EII product offers virtualization of heterogeneous data where data takes the form of SQL, XML, Data-returning Web services, and other URI-referencable resources. Such SQL data is typically accessible via ODBC, JDBC, ADO.NET, OLE DB. XML is generally URI based, and is thus accessible via WebDAV.

 

EII, Virtual Database, and Universal Server products are more alike than different. In all cases, they provide single, homogeneous data representations and/or access points (SQL, ODBC, JDBC, ADO.NET, XML, or Web Services) for disparate data sources. For instance, a single JDBC or ODBC or XML resource URI could provide access to data in several relational database tables, each associated with a different database engine, from a different database vendor, and associated with a myriad of enterprise applications.
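
The idea of one access point over several stores can be approximated with a small sketch. It is a stand-in only: both stores here are SQLite databases attached to a single connection, whereas the products described above span different engines and vendors. The `customers` and `invoices` tables and their contents are hypothetical.

```python
import sqlite3

# One connection acts as the single access point.
conn = sqlite3.connect(":memory:")
# A second, separate in-memory database stands in for another data source.
conn.execute("ATTACH DATABASE ':memory:' AS billing")

conn.execute("CREATE TABLE main.customers (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE billing.invoices (customer_id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO main.customers VALUES (?, ?)",
    [(1, "Acme"), (2, "Globex")],
)
conn.executemany(
    "INSERT INTO billing.invoices VALUES (?, ?)",
    [(1, 100.0), (1, 50.0), (2, 75.0)],
)

# The client issues one query; the join spans both underlying databases.
rows = conn.execute(
    """
    SELECT c.name, SUM(i.amount) AS total
    FROM main.customers AS c
    JOIN billing.invoices AS i ON i.customer_id = c.id
    GROUP BY c.name
    ORDER BY c.name
    """
).fetchall()
print(rows)
```

The client code sees a single connection and a single SQL dialect, regardless of where each table physically lives.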

 

EII products enable loose coupling between client applications and services that consume homogeneous data and the heterogeneous data stores behind them. Such client applications and services include Desktop Productivity Tools (Spreadsheets, Word processors, Presentation Software, etc.), Development Environments and Frameworks (J2EE, .NET, Mono, SOAP or RESTful Web services, etc.), business intelligence (BI), business activity monitoring (BAM) software, enterprise resource planning (ERP), Customer Relationship Management (CRM), Business Process Management (BPM and/or BPEL) Software, and Web Content Management.


 

Utilization Mechanics

 

The steps that follow are common across all EII product offerings. Naturally, the implementation specifics will differ on a per vendor basis.

 

1. Determine shape and form of information to be processed

2. Identify associated data sources

3. Create EII product references (data source linking process) for respective data sources

4. Process information - for instance via a discrete or composite Web Service, Dynamic HTML/XHTML/XML Web Page, XML transformation (e.g. RSS/Atom/RDF feed) etc.

 

EIS      An Executive Information System (EIS) is a computer-based system intended to facilitate and support the information and decision making needs of senior executives by providing easy access to both internal and external information relevant to meeting the strategic goals of the organization. It is commonly considered as a specialized form of Decision Support System (DSS).

 

The emphasis of EIS is on graphical displays and easy-to-use user interfaces. They offer strong reporting and drill-down capabilities. In general, EIS are enterprise-wide DSS that help top-level executives analyze, compare, and highlight trends in important variables so that they can monitor performance and identify opportunities and problems. EIS and data warehousing technologies are converging in the marketplace.

 

ERP    Enterprise resource planning systems (ERPs) are management information systems that integrate and automate many of the business practices associated with the operations or production aspects of a company.

 

Overview

 

Enterprise resource planning is a term derived from manufacturing resource planning (MRP II) that followed material requirements planning (MRP). ERP systems typically handle the manufacturing, logistics, distribution, inventory, shipping, invoicing, and accounting for a company. Enterprise Resource Planning or ERP software can aid in the control of many business activities, like sales, delivery, billing, production, inventory management, and human resources management.

 

ERPs are often called back office systems, indicating that customers and the general public are not directly involved. This is contrasted with front office systems like customer relationship management (CRM) systems that deal directly with the customers, or the eBusiness systems such as eCommerce, eGovernment, eTelecom, and eFinance, or supplier relationship management (SRM) systems that deal with the suppliers.

 

ERPs are cross-functional and enterprise-wide. All functional departments that are involved in operations or production are integrated in one system. In addition to manufacturing, warehousing, logistics, and Information Technology, this would include accounting, human resources, marketing, and strategic management.

 

In the early days of business computing, companies used to write their own software to control their business processes. This is an expensive approach. Since many of these processes occur in common across various types of businesses, common reusable software may provide cost-effective alternatives to custom software. Thus some ERP software caters to a wide range of industries from service sectors like software vendors and hospitals to manufacturing industries and even to government departments.

 

Implementation

 

Because of their wide scope of application within the firm, ERP software systems rely on some of the largest bodies of software ever written. Implementing such a complex and huge software system in a company usually involves an army of analysts, programmers, and users, and often comprises a very expensive project in itself for bigger companies, especially transnationals.

 

Enterprise resource planning systems are often closely tied to supply chain management and logistics automation systems. Supply chain management software can extend the ERP system to include links with suppliers.

 

To implement ERP systems, companies often seek the help of an ERP vendor or of third-party consulting companies. Consulting in ERP involves two levels, namely business consulting and technical consulting. A business consultant studies an organization's current business processes and matches them to the corresponding processes in the ERP system, thus 'configuring' the ERP system to the organization's needs. Technical consulting often involves programming. Most ERP vendors allow their software to be modified to suit the business needs of their customers.

 

ETL    Extract, transform, and load (ETL) is a process in data warehousing that involves

 

* extracting data from outside sources,

* transforming it to fit business needs, and ultimately

* loading it into the data warehouse.

 

ETL is important because it is the way data actually gets loaded into the warehouse. This entry assumes that data is always loaded into a data warehouse, although the term ETL can in fact refer to a process that loads any database.

 

Extract

 

The first part of an ETL process is to extract the data from the source systems. Most data warehousing projects consolidate data from different source systems. Each separate system may also use a different data organization or format. Common data source formats are relational databases and flat files, but other source formats exist. Extraction converts the data into records and columns (also known as fields).


 

Transform

 

The transform phase applies a series of rules or functions to the extracted data to derive the data to be loaded. Some data sources will require very little manipulation of data. However, in other cases any combination of the following transformation types may be required:

 

* Select only certain columns to load (or if you prefer, null columns not to load)

* Translate coded values (e.g. the source system stores M for male and F for female, but the warehouse stores 1 for male and 2 for female)

* Derive a new calculated value (e.g. sale_amount = qty * unit_price)

* Join together data from multiple sources (e.g. lookup, merge, etc)

* Summarize multiple rows of data (e.g. total sales for each region)

* Generate a surrogate key value

* Transpose / cross tabulate (turn multiple columns into multiple rows or vice versa)
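
As a rough illustration of several of the transformations listed above (column selection, code translation, a derived value, a surrogate key, and a summary), here is a minimal Python sketch. The source rows, the field names, and the gender code mapping are invented for the example and are not taken from any particular ETL tool.

```python
from collections import defaultdict
from itertools import count

# Hypothetical extracted records; field names are illustrative only.
source_rows = [
    {"cust": "A17", "gender": "M", "region": "West", "qty": 2, "unit_price": 5.0, "notes": "x"},
    {"cust": "B02", "gender": "F", "region": "East", "qty": 1, "unit_price": 20.0, "notes": "y"},
    {"cust": "C33", "gender": "F", "region": "West", "qty": 4, "unit_price": 2.5, "notes": "z"},
]

GENDER_CODES = {"M": 1, "F": 2}   # translate coded values
surrogate_keys = count(start=1)   # generate surrogate key values


def transform(row):
    """Apply row-level rules and return the record to be loaded."""
    return {
        "sale_key": next(surrogate_keys),
        "cust": row["cust"],                             # selected column
        "gender": GENDER_CODES[row["gender"]],           # translated code
        "region": row["region"],
        "sale_amount": row["qty"] * row["unit_price"],   # derived value
        # "notes" is deliberately not selected for loading
    }


transformed = [transform(r) for r in source_rows]

# Summarize multiple rows: total sales for each region.
sales_by_region = defaultdict(float)
for rec in transformed:
    sales_by_region[rec["region"]] += rec["sale_amount"]
```

Joins and transposition are left out to keep the sketch short; in practice these rules are usually configured in an ETL tool rather than hand-coded.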

 

Load

 

The load phase loads the data into the data warehouse. Depending on the requirements of the organization, this process ranges widely. Some data warehouses merely overwrite old information with new data. More complex systems can maintain a history and audit trail of all changes to the data.
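
The two loading styles mentioned above can be sketched side by side. This is only an illustration using Python's built-in sqlite3 module with an in-memory database; the table names, columns, and the load_seq counter are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer_current (cust TEXT PRIMARY KEY, city TEXT)")
conn.execute("CREATE TABLE customer_history (cust TEXT, city TEXT, load_seq INTEGER)")


def load(batch, load_seq):
    """Load one batch: overwrite the current table, append to the history."""
    with conn:  # commit on success, roll back on error
        for cust, city in batch:
            # Overwrite style: the old value is replaced by the new one.
            conn.execute(
                "INSERT OR REPLACE INTO customer_current (cust, city) VALUES (?, ?)",
                (cust, city),
            )
            # History style: every loaded version is kept.
            conn.execute(
                "INSERT INTO customer_history (cust, city, load_seq) VALUES (?, ?, ?)",
                (cust, city, load_seq),
            )


load([("A17", "Austin")], load_seq=1)
load([("A17", "Boston")], load_seq=2)

current = conn.execute("SELECT cust, city FROM customer_current").fetchall()
history = conn.execute(
    "SELECT cust, city, load_seq FROM customer_history ORDER BY load_seq"
).fetchall()
```

After the two loads, the current table holds only the latest city for the customer, while the history table retains both versions.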

 

Challenges

 

ETL processes can be quite complex, and significant problems can occur. Improperly designed ETL systems or an unexpected change in format of one of the source systems can cause serious problems in the ETL process potentially destroying or corrupting significant amounts of data in the target system. An additional difficulty is making sure the data being uploaded is relatively consistent. Since multiple source databases all have different update cycles (some may be updated every few minutes, while others may take days or weeks), an ETL system may be required to hold back certain data until all sources are synchronized.
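
One simple way to hold data back until all sources are synchronized is to load only up to the point that every source has reached. The following Python sketch assumes each source reports the latest period it has delivered; the source names and period numbers are hypothetical.

```python
# Hypothetical "last delivered period" markers per source (e.g. day numbers).
last_refresh = {"orders_db": 12, "billing_db": 10, "crm_export": 11}

# Hypothetical extracted records tagged with the period they belong to.
pending = [
    {"source": "orders_db", "period": 10, "value": 100},
    {"source": "orders_db", "period": 12, "value": 140},
    {"source": "crm_export", "period": 11, "value": 7},
    {"source": "billing_db", "period": 10, "value": 55},
]

# Only periods that every source has already delivered are safe to load.
watermark = min(last_refresh.values())

ready = [r for r in pending if r["period"] <= watermark]
held_back = [r for r in pending if r["period"] > watermark]
```

Records in held_back would be retried on a later run, once the slowest source has caught up.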

 

Tools

 

While an ETL process can be created using almost any programming language, creating one from scratch is quite complex. Increasingly, companies are buying ETL tools to help in the creation of ETL processes.

 

A good ETL tool must be able to communicate with the many different relational databases and read the various file formats used throughout an organization. ETL tools have started to migrate into Enterprise Application Integration, or even Enterprise Service Bus, systems that now cover much more than just the extraction, transformation, and loading of data. Many ETL vendors now have data profiling, data quality, and metadata capabilities.

 

GIOP  In distributed computing, GIOP (General Inter-ORB Protocol) is the abstract protocol by which object request brokers (ORBs) communicate. Standards associated with the protocol are maintained by the Object Management Group (OMG).

 

IIOP    The Internet Inter-ORB Protocol (IIOP) is the implementation of GIOP for TCP/IP.

 

JDBC  Java Database Connectivity, or JDBC, is an API for the Java programming language that defines how a client may access a database. (To be strictly correct, JDBC is not an acronym.) It provides methods for querying and updating data in a database. JDBC is oriented towards relational databases.

 

JDBC allows multiple implementations to exist and be used by the same application. The API provides a mechanism for dynamically loading the correct Java packages and registering them with the JDBC Driver Manager. The DriverManager is used as a connection factory for creating JDBC connections.

 

JDBC connections support creating and executing statements. These statements may be update statements such as SQL INSERT, UPDATE and DELETE or they may be query statements using the SELECT statement. Additionally, stored procedures may be invoked through a statement. Statements are one of the following types:

 

* Statement - the statement is sent to the database server each time it is executed.

* PreparedStatement - the statement is compiled on the database server allowing it to be executed multiple times in an efficient manner.

* CallableStatement - used for executing stored procedures on the database.

 

Update statements such as INSERT, UPDATE and DELETE return an update count that indicates how many rows were affected in the database. These statements do not return any other information.

 

Query statements return a JDBC row result set, which is used to iterate over the rows returned. Individual columns in a row are retrieved either by name or by column number. There may be any number of rows in the result set. The result set has metadata that describes the names of the columns and their types.
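
JDBC itself is a Java API. The sketch below uses Python's built-in sqlite3 module purely as an analogy for the same ideas: a parameterized statement reused with different values, an update count, and a result set that carries column metadata. It is not JDBC code, and the table and column names are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE item (id INTEGER PRIMARY KEY, name TEXT, qty INTEGER)")

# Parameterized statement reused with different values
# (comparable in spirit to a JDBC PreparedStatement).
insert_sql = "INSERT INTO item (name, qty) VALUES (?, ?)"
conn.executemany(insert_sql, [("bolt", 10), ("nut", 25), ("washer", 0)])

# An update statement reports only how many rows it affected.
cur = conn.execute("UPDATE item SET qty = qty + 5 WHERE qty < ?", (20,))
update_count = cur.rowcount

# A query returns rows plus metadata describing the columns.
cur = conn.execute("SELECT name, qty FROM item ORDER BY id")
column_names = [d[0] for d in cur.description]
rows = cur.fetchall()
conn.commit()
```

Here the update touches the two rows whose quantity was below 20, and the query result exposes its column names alongside the row data.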

 

There is an extension to the basic JDBC API that allows for scrollable result sets and cursor support, among other things. Refer to the Sun documentation [2] for more details.

 

METADATA (Greek: meta- + Latin: data "information"), literally "data about data", is information that describes another set of data. A common example is a library catalog card, which contains data about the contents and location of a book: it is data about the data in the book referred to by the card. Other common contents of metadata include the source or author of the described dataset, how it should be accessed, and its limitations.

 

Other machine-generated data about data, such as the inverted index created by a free-text search engine, is generally not considered metadata. Another important type of data about data is the links or relationships among data. Some metadata schemes attempt to embrace this concept (such as Dublin Core element link). Since metadata is also data, it is possible to have "metadata of the metadata of data".

 

The metadata which is embedded with content is called embedded metadata. A data repository typically stores the metadata detached from the data.

 

MDM  Master Data Management (MDM), also known as Reference Data Management, is a discipline in Information Technology (IT) that focuses on the management of reference or master data that is shared by several disparate IT systems and groups. MDM is required to warrant consistent computing between diverse system architectures and business functions.

 

Large companies often have IT systems that are used by diverse business functions (e.g., finance, sales, R&D, etc.) and span across multiple countries. These diverse systems usually need to share key data that is relevant to the parent company (e.g., products, customers, and suppliers). It is critical for the company to consistently use these shared data elements through various IT systems.

 

MDM also becomes important when two or more companies want to share data across corporate boundaries. In this case, MDM becomes an industry issue, as in the finance industry with its required STP (Straight Through Processing) or T+1 settlement.

 

In the Y computing model, MDM is one of three computing types (OLTP transactional computing (typically ERP), DSS (Decision Support Systems) and MDM). These types range from operational reporting to EIS (Executive Information Systems). Master data management is not only required to coordinate different ERP systems, but also necessary to supply meta-data for aggregating and integrating transactional data. This use of MDM is necessary for Data Warehouse projects typically incorporated in Decision Support Systems. For this reason, MDM systems sometimes provide a meta-data abstraction layer. This design provides an entity relationship (ER)-scheme for systems that use the master data.

 

MOF   The Meta-Object Facility (MOF) is an Object Management Group (OMG) standard. MOF originated in the Unified Modeling Language (UML); the OMG was in need of a meta-modeling architecture to define the UML. MOF is designed as a four-layered architecture. It provides a meta-meta model at the top layer, also known as the M3 layer. This M3-model is the language used by MOF to build meta-models, called M2-models. The most prominent example of a layer 2 MOF model is the UML meta-model, the model that describes the UML itself. These M2-models describe elements of the M1-layer, and thus M1-models. These would be, for example, models written in UML. The last layer is the M0-layer or data layer. It is used to describe application data, whose elements are thus instances of M1-models.

 

Beyond the M3-model, MOF describes the means to create and manipulate (meta-)models by defining CORBA interfaces that describe those operations. Because of the similarities between the MOF M3-model and UML structure models, MOF meta-models are usually modeled as UML class diagrams. A supporting standard of MOF is XMI, which defines an XML-based exchange format for models on the M3-, M2-, or M1-Layer.

 

MOF is a closed meta-modelling architecture; it defines an M3-model, which is a model (or instance) of itself. MOF is a strict meta-modelling architecture; every model element on every layer is strictly an instance of a model element of the layer above. MOF only provides a means to define the structure, or abstract syntax, of a language or of data.

 

Simplified, MOF uses the notion of classes, as known from object orientation, to define concepts (model elements) on a meta-layer. These classes (concepts) can then be instantiated through objects (instances) of the model layer below. Due to the fact that an element on the M2 layer is an object (instance of an M3 model element) as well as a class (it is an M2 layer concept) the notion of a clabject is used. Clabject is a merge of the words class and object.
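
The layering and the clabject idea can be loosely illustrated in Python, where a class is itself an object created by a metaclass. This is only an analogy, not an implementation of MOF; the names MetaClass and Customer and the mapping of Python constructs onto the M3 to M0 layers are assumptions made for the example.

```python
# M3 analogue: Python's built-in `type`, which is an instance of itself.
# M2 analogue: a hypothetical metaclass that defines what a "model class" is.
class MetaClass(type):
    """Every class created with this metaclass records its attribute names."""

    def __new__(mcls, name, bases, namespace):
        cls = super().__new__(mcls, name, bases, namespace)
        cls.declared = [k for k in namespace if not k.startswith("__")]
        return cls


# M1 analogue: a model element, created as an instance of the layer above.
class Customer(metaclass=MetaClass):
    name = None
    city = None


# M0 analogue: application data, an instance of the M1 element.
alice = Customer()
alice.name = "Alice"

# MetaClass behaves like a clabject: it is an object (an instance of `type`)
# and at the same time a class (Customer is an instance of it).
is_object_of_layer_above = isinstance(MetaClass, type)
is_class_for_layer_below = isinstance(Customer, MetaClass)
```

The self-describing top layer has a counterpart too: in Python, type(type) is type, which mirrors the way the M3-model is a model of itself.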

 

MRP   Material Requirements Planning (MRP) is a software-based production planning and inventory control system used to manage manufacturing processes. Although it is not common nowadays, it is possible to conduct MRP by hand as well.

 

An MRP system is intended to simultaneously meet three objectives:


    * Ensure materials and products are available for production and delivery to customers.

    * Maintain the lowest possible level of inventory.

    * Plan manufacturing activities, delivery schedules and purchasing activities.

 

The scope of MRP in manufacturing

 

All manufacturing organizations, whatever it is they produce, face the same daily practical problem - that customers want products to be available in a shorter time than it takes to make them. This means that some level of planning is required.

 

Companies need to control the types and quantities of materials they purchase, plan which products are to be produced and in what quantities and ensure that they are able to meet current and future customer demand, all at the lowest possible cost. Making a bad decision in any of these areas will lose the company money. A few examples are given below:

 

* If a company purchases insufficient quantities of an item used in manufacturing, or the wrong item, they may be unable to meet contracts to supply products by the agreed date.

 

* If a company purchases excessive quantities of an item, money is being wasted - the excess quantity ties up cash while it remains as stock and may never even be used at all. This is a particularly severe problem for food manufacturers and companies with very short product life cycles. However, some purchased items have a minimum order quantity that must be met, so purchasing an excess is sometimes unavoidable.

 

* Beginning production of an order at the wrong time can mean customer deadlines being missed.

 

MRP is used by many organizations as a tool to deal with these problems. The questions it provides answers for are: WHAT items are required, HOW MANY are required and WHEN are they required by. This applies to items that are bought in and to sub-assemblies that go into more complex items.
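
The WHAT, HOW MANY and WHEN questions can be illustrated with a small calculation. The Python sketch below explodes a bill of materials, subtracts stock on hand, and offsets each start date by a lead time. The product structure, stock levels, and lead times are invented, and real MRP systems handle much more (lot sizing, scheduled receipts, multiple demands).

```python
# Hypothetical bill of materials: parent -> {component: quantity per parent}.
BOM = {
    "bicycle": {"frame": 1, "wheel": 2},
    "wheel": {"rim": 1, "spoke": 32},
}
ON_HAND = {"bicycle": 0, "frame": 3, "wheel": 4, "rim": 10, "spoke": 100}
LEAD_TIME_DAYS = {"bicycle": 2, "frame": 5, "wheel": 3, "rim": 7, "spoke": 4}


def plan(item, gross_qty, due_day, stock, orders):
    """Record WHAT to order, HOW MANY, and WHEN to start, recursively."""
    available = stock.get(item, 0)
    used = min(available, gross_qty)
    stock[item] = available - used
    net_qty = gross_qty - used
    if net_qty == 0:
        return
    start_day = due_day - LEAD_TIME_DAYS[item]
    orders.append((item, net_qty, start_day))
    # Components must be available when work on the parent starts.
    for component, per_parent in BOM.get(item, {}).items():
        plan(component, net_qty * per_parent, start_day, stock, orders)


orders = []
plan("bicycle", 10, due_day=30, stock=dict(ON_HAND), orders=orders)
```

Each entry in orders is an (item, quantity, start day) tuple. For ten bicycles due on day 30, the sketch plans sixteen wheels to be started on day 25, because four wheels are already in stock.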

 

ODBC Open Database Connectivity (ODBC) is a standard software API specification for using database management systems (DBMS). ODBC is designed to be independent of programming language, database system and operating system.

 

ODBC is an API specification for using SQL queries to access data. An implementation of ODBC will contain one or more applications, a core ODBC library, and one or more "database drivers". The core library is independent of the applications and DBMSes, and acts as an "interpreter" between the applications and the database drivers. The DBMS-specific details are contained in the database drivers. Thus, it is possible to write applications that use standard types and features without concern for the specifics of each DBMS that might be used. Likewise, database driver implementors need only know how to attach to the core library. This makes ODBC modular.
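
The modular arrangement described above (application, core library, drivers) can be sketched conceptually. The Python below is not the ODBC API; every class and function name is hypothetical and the drivers return canned data, so it only shows how a core layer can route the same call to different data-source-specific drivers.

```python
class SalesDbDriver:
    """Hypothetical driver: knows the details of one data source."""

    def execute(self, sql):
        return [("widget", 3)] if sql.startswith("SELECT") else []


class SpreadsheetDriver:
    """Hypothetical driver for a non-relational source."""

    def execute(self, sql):
        return [("A1", "total")] if sql.startswith("SELECT") else []


class DriverManager:
    """Stand-in for the core library: routes calls to the named driver."""

    def __init__(self):
        self._drivers = {}

    def register(self, source_name, driver):
        self._drivers[source_name] = driver

    def execute(self, source_name, sql):
        return self._drivers[source_name].execute(sql)


manager = DriverManager()
manager.register("sales", SalesDbDriver())
manager.register("budget_sheet", SpreadsheetDriver())


def application_query(source_name):
    # The application sees one interface, whatever the data source.
    return manager.execute(source_name, "SELECT * FROM items")
```

Adding support for another data source means registering one more driver; the application code and the core layer stay unchanged.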

 

To write ODBC code that exploits DBMS-specific features requires more advanced programming. An application must use introspection, calling ODBC metadata functions that return information about supported features, available types, syntax, limits, isolation levels, driver capabilities and other information.

 

ODBC is the foremost example of ubiquitous data access because there are hundreds of ODBC drivers for a large variety of data sources. ODBC is available for a variety of operating systems and there are drivers for non-relational data such as spreadsheets, text and XML files. Because ODBC dates back to the early 1990s, it offers connectivity to a wider variety of data sources than other data access APIs. There are more drivers for ODBC than drivers or providers for newer APIs such as OLE DB, JDBC and ADO.NET.

 

Despite the benefits of ubiquitous connectivity and platform independence, ODBC has certain drawbacks. Administering a large number of client machines can involve a diversity of drivers and DLLs. This complexity can increase system administration overhead. Large organizations with thousands of PCs have often turned to ODBC server technology to simplify the administration problem.

 

The layered architecture of ODBC can introduce a minor performance penalty. The overhead of executing an additional layer of code is generally insignificant compared to network latency and other factors that influence query performance. Driver architecture is also a consideration. Many first-generation ODBC drivers operated with database client libraries supplied by a DBMS vendor. An ODBC driver for Oracle, for example, would use Oracle's network library (SQL*Net, Oracle Net) and OCI (Oracle Call Interface) client library. Similarly, a driver for Sybase or Microsoft SQL Server would use a vendor-supplied network library to emit Tabular Data Stream (TDS) packets. Those earlier drivers have been largely supplanted by wire protocol drivers that do not use database client libraries. The newer type of driver communicates using protocols such as TDS, TNS (Oracle Transparent Network Substrate), and DRDA without needing database client libraries.

 

Differences between drivers and driver maturity are also important issues. Newer ODBC drivers are often less stable than drivers that have been in production for years. Years of testing and deployment mean a driver is less likely to contain bugs.

 

To use DBMS-specific features with ODBC, a developer must understand adaptive programming techniques such as introspection and writing interoperable SQL statements. Even when using adaptive techniques, however, some advanced DBMS features might not be available with ODBC. The ODBC 3.x API is well-suited to traditional SQL applications such as OLTP but it has not evolved to support richer types introduced by SQL:1999 and SQL:2003.

 

Developers needing features or types not accessible with ODBC can use other SQL APIs. When platform independence is not a goal, developers can use proprietary APIs. If creating portable, platform-independent code is a goal, developers can use the JDBC API.

 

OMG  Object Management Group (OMG) is a consortium aimed at setting standards in object-oriented programming as well as system modeling. In 1989, this consortium, which included Hewlett-Packard Company, IBM Corporation, Apple Computer Inc. and Sun Microsystems Inc., mobilised to create a cross-compatible distributed object standard. The goal was a common binary object with methods and data that work using all types of development environments on all types of platforms. Using a committee of organisations, OMG set out to create the first Common Object Request Broker Architecture (CORBA) standard which appeared in 1991. As of March 2003, the latest standard is CORBA 3.0.

 

ORB   In distributed computing, an object request broker (ORB) is a piece of middleware software that allows programmers to make program calls from one computer to another, via a network. An important special case of this is client-server computing, where a client program calls a server program over a network. ORBs handle the transformation of in-process data structures to the byte sequence which is transmitted over the network (and the reverse transformation). This is called marshalling or serialization. Some ORBs, such as CORBA-compliant systems, use an Interface Description Language (IDL) to describe the data which is to be transmitted on remote calls. Before object-oriented programming became mainstream, a similar technology called RPC (Remote Procedure Call) was popular.
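
Marshalling can be illustrated with Python's standard struct module. The wire layout below is made up for the example and is not the encoding used by CORBA or any particular ORB; it simply shows in-process values being turned into bytes and back.

```python
import struct

# Hypothetical wire layout, big-endian ("network order"):
#   4-byte signed account id, 8-byte float amount,
#   2-byte length prefix, then the UTF-8 bytes of a memo string.
HEADER = struct.Struct("!idH")


def marshal(account_id, amount, memo):
    """Turn in-process values into a byte sequence."""
    memo_bytes = memo.encode("utf-8")
    return HEADER.pack(account_id, amount, len(memo_bytes)) + memo_bytes


def unmarshal(data):
    """Reverse transformation: bytes back into in-process values."""
    account_id, amount, memo_len = HEADER.unpack_from(data, 0)
    start = HEADER.size
    memo = data[start:start + memo_len].decode("utf-8")
    return account_id, amount, memo


wire = marshal(42, 19.5, "refund")
```

Both sides must agree on the layout in advance, which is the role an interface description plays in systems that use one.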

 

In addition to marshalling data, ORBs often expose many more features, such as distributed transactions, directory services or realtime scheduling.

 

OLTP  Online Transaction Processing (OLTP) is a form of transaction processing conducted via computer network. Some applications of OLTP include electronic banking, order processing, employee time clock systems, e-commerce, and eTrading.

 

In large applications, efficient OLTP may depend on sophisticated transaction management software (such as CICS) and/or database optimization tactics to facilitate the processing of large numbers of concurrent updates to an OLTP-oriented database.
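
The basic unit that such software manages is the transaction: a group of updates that must all succeed or all be undone. A minimal sketch using Python's built-in sqlite3 module follows; the account table and the transfer function are assumptions for illustration, and a real OLTP system would also have to handle many concurrent users.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (id TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [("alice", 100), ("bob", 20)])
conn.commit()


def transfer(src, dst, amount):
    """Move funds as one transaction: both updates apply, or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE id = ?", (amount, dst)
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE id = ?", (amount, src)
            )
        return True
    except sqlite3.IntegrityError:
        return False  # overdraft rejected by the CHECK constraint


ok = transfer("alice", "bob", 30)
rejected = transfer("bob", "alice", 500)
balances = dict(conn.execute("SELECT id, balance FROM account"))
```

The second transfer fails on its second update, so the credit already applied by its first update is rolled back and the balances stay as they were after the first transfer.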

 

For even more demanding decentralized database systems, OLTP brokering programs can distribute transaction processing among multiple computers on a network. OLTP is often integrated into service-oriented architecture and Web services.

 

The term Online Transaction Processing is somewhat ambiguous: some understand "transaction" as a reference to computer or database transactions, while others (such as the Transaction Processing Performance Council) define it in terms of business or commercial transactions.

 

RPC    A remote procedure call (RPC) is a protocol that allows a computer program running on one host to cause code to be executed on another host without the programmer needing to explicitly code for this. When the code in question is written using object-oriented principles, RPC is sometimes referred to as remote invocation or remote method invocation.

 

RPC is an easy and popular paradigm for implementing the client-server model of distributed computing. An RPC is initiated by the caller (client) sending a request message to a remote system (the server) to execute a certain procedure using arguments supplied. A result message is returned to the caller. There are many variations and subtleties in various implementations, resulting in a variety of different (incompatible) RPC protocols.
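
The request and result messages can be sketched in a few lines of Python. To keep the example self-contained, the "network" is just a direct function call, and the JSON message format and procedure names are invented; real RPC protocols each define their own formats.

```python
import json

# Server side: procedures that remote callers are allowed to invoke.
PROCEDURES = {
    "add": lambda a, b: a + b,
    "upper": lambda s: s.upper(),
}


def server_handle(request_bytes):
    """Decode a request message, run the procedure, encode the result."""
    request = json.loads(request_bytes.decode("utf-8"))
    try:
        result = PROCEDURES[request["method"]](*request["params"])
        response = {"result": result, "error": None}
    except Exception as exc:  # report the failure instead of crashing
        response = {"result": None, "error": repr(exc)}
    return json.dumps(response).encode("utf-8")


def remote_call(method, *params, transport=server_handle):
    """Client stub: looks like a local call, but exchanges messages."""
    request = json.dumps({"method": method, "params": list(params)}).encode("utf-8")
    response = json.loads(transport(request).decode("utf-8"))
    if response["error"] is not None:
        raise RuntimeError(response["error"])
    return response["result"]
```

A caller writes remote_call("add", 2, 3) much as it would call a local function; swapping the transport argument for a function that sends bytes over a socket would not change the calling code.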

 

SFA    Sales force automation (SFA) systems, also called sales force management systems, are information systems used in marketing and management that automate some sales and sales force management functions. They are frequently combined with a marketing information system, in which case they are often called customer relationship management systems.

 

Advantages to sales people

 

Proponents claim that sales force automation systems can improve the productivity of sales personnel. Here are some examples:

 

* Rather than write out sales reports, activity reports, and/or call sheets, sales people can fill in prepared e-forms. This saves time.

* Rather than printing out reports and taking them to the sales manager, sales people can use the company intranet to transmit the information. This saves time.

* Rather than waiting for paper-based product inventory data, sales prospect lists, and sales support information, they will have access to the information when they need it. This could be useful in the field when answering prospects’ questions and objections.

* The additional tools could help improve sales staff morale if they reduce the amount of record keeping and/or increase the rate of closing. This could contribute to a virtuous spiral of beneficial and cumulative effects.

* These sales force systems can be used as an effective and efficient training device. They provide sales staff with product information and sales technique training without them having to waste time at seminars.

* Better communication and co-operation between sales personnel facilitates successful team selling.

* More and better qualified sales leads could be automatically generated by the software.

* This technology increases the sales person’s ratio of selling time to non-selling time. Non-selling time includes activities like report writing, travel time, internal meetings, training, and seminars.

 

Advantages to the sales manager

 

Sales force automation systems can also affect sales management. Here are some examples:

 

* The sales manager, rather than gathering all the call sheets from various sales people and tabulating the results, will have the results automatically presented in easy-to-understand tables, charts, or graphs. This saves time for the manager.

* Activity reports, information requests, orders booked, and other sales information will be sent to the sales manager more frequently, allowing him/her to respond more directly with advice, product in-stock verifications, and price discount authorizations. This gives management more hands-on control of the sales process if they wish to use it.

* The sales manager can configure the system to automatically analyze the information using sophisticated statistical techniques, and present the results in a user-friendly way. This gives the sales manager information that is more useful in:

o Providing current and useful sales support materials to their sales staff

o Providing marketing research data: demographic, psychographic, behavioural, product acceptance, product problems, detecting trends

o Providing market research data: industry dynamics, new competitors, new products from competitors, new promotional campaigns from competitors, macro-environmental scanning, detecting trends

o Co-ordinating with other parts of the firm, particularly marketing, production, and finance

o Identifying your most profitable customers, and your problem customers

o Tracking the productivity of their sales force by combining a number of performance measures such as: revenue per sales person, revenue per territory, margin by product category, margin by customer segment, margin by customer, number of calls per day, time spent per contact, revenue per call, cost per call, entertainment cost per call, ratio of orders to calls, revenue as a percentage of sales quota, number of new customers per period, number of lost customers per period, cost of customer acquisition as a percentage of expected lifetime value of customer, percentage of goods returned, number of customer complaints, and number of overdue accounts. More complex models like the PAIRS model (by Parasuraman and Day) and the Call Plan model (by Lodish) can also be used.
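
A few of the performance measures named above are simple ratios. The Python sketch below computes four of them from made-up activity figures for one sales person; the numbers and variable names are purely illustrative.

```python
# Hypothetical activity figures for one sales person over one period.
calls = 40
orders = 10
revenue = 25_000.0
call_costs = 3_000.0
quota = 20_000.0

revenue_per_call = revenue / calls
cost_per_call = call_costs / calls
orders_to_calls = orders / calls
percent_of_quota = 100 * revenue / quota
```

With these figures, revenue per call is 625.0, cost per call is 75.0, the ratio of orders to calls is 0.25, and revenue stands at 125.0 percent of quota.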

 

Advantages to the marketing manager

 

Sales force automation is also claimed to be useful for the marketing manager. It gives the marketing manager information that is useful in:

 

* Understanding the economic structure of your industry

* Identifying segments within your market

* Identifying your target market

* Identifying your best customers

* Doing marketing research to develop profiles (demographic, psychographic, and behavioural) of your core customers

* Understanding your competitors and their products

* Developing new products

* Establishing environmental scanning mechanisms to detect opportunities and threats

* Understanding your company's strengths and weaknesses

* Auditing your customers' experience of your brand in full

* Developing marketing strategies for each of your products using the marketing mix variables of price, product, distribution, and promotion

* Co-ordinating the sales function with other parts of the promotional mix (such as advertising, sales promotion, public relations, and publicity)

* Creating a sustainable competitive advantage

* Understanding where you want your brands to be in the future, and providing an empirical basis for writing marketing plans on a regular basis to help you get there

* Providing input into feedback systems to help you monitor and adjust the process

 

Strategic advantages

 

Sales force automation systems can also create competitive advantage. Here are some examples:

 

* As mentioned above, productivity will increase. Sales staff will use their time more efficiently and more effectively. The sales manager will also become more efficient and more effective (see above). This increased productivity can create a competitive advantage in three ways: it can reduce costs, it can increase sales revenue, and it can increase market share.

* Field sales staff will send their information more frequently. Typically information will be sent to management after every sales call (rather than once a week). This provides management with current information, information that they will be able to use while it is still valuable. Management response time will be greatly reduced. The company will become more alert and more agile.

* These systems could increase customer satisfaction if they are used with wisdom. If the information obtained and analyzed with the system is used to create a product that matches or exceeds customer expectations, and the sales staff use the system to service customers more expertly and diligently, then customers should be satisfied with the company. This will provide a competitive advantage because customer satisfaction leads to increased customer loyalty, reduced customer acquisition costs, reduced price elasticity of demand, and increased profit margins.

 

Disadvantages

 

Detractors claim that sales force management systems:

 

* are difficult to work with

* require additional work inputting data

* dehumanize a process that should be personal

* require continuous maintenance, information updating, and system upgrading

* are costly

* are difficult to integrate with other management information systems

 

Encouraging use

 

For all the reasons stated above, many organisations have found it difficult to persuade sales people to enter data into the system. For this reason, many have questioned the value of the investment. Recent developments have embedded sales process systems that give something back to the seller within the CRM screens. Because these systems help sales people plan and structure their selling in the most effective way, they give them a reason to use the CRM.

 

SGML The Standard Generalized Markup Language (SGML) is a metalanguage in which one can define markup languages for documents. SGML is a descendant of IBM's Generalized Markup Language (GML), developed in the 1960s by Charles Goldfarb, Edward Mosher and Raymond Lorie (whose surname initials also happen to be GML). SGML should not be confused with the Geography Markup Language (GML) developed by the Open GIS Consortium, or with the Game Maker scripting language, GML.

 

SGML provides a variety of markup syntaxes that can be used for many applications. By changing the SGML Declaration, one need not even use "angle brackets", although they are the norm in the so-called reference concrete syntax.

 

SGML was originally designed to enable the sharing of machine-readable documents in large projects in government, legal and the aerospace industry, which have to remain readable for several decades—a very long time in information technology. It has also been used extensively in the printing and publishing industries, but its complexity has prevented its widespread application for small-scale general-purpose use.

 

SGML is an ISO standard: "ISO 8879:1986 Information processing—Text and office systems—Standard Generalized Markup Language (SGML)".

 

SIC (code) The Standard Industrial Classification was a United States government system for classifying industries by a four-digit code. Established in the 1930s, it was supplanted by the six-digit North American Industry Classification System in 1997.

 

SOA    In computing, the term Service-Oriented Architecture (SOA) expresses a software architectural concept that defines the use of services to support the requirements of software users. In a SOA environment, nodes on a network[1] make resources available to other participants in the network as independent services that the participants access in a standardized way. Most definitions of SOA identify the use of Web services (i.e. using SOAP or REST) in its implementation. However, one can implement SOA using any service-based technology. The OASIS SOA Reference Model Technical Committee is working on defining SOA independent of any specific technologies.

 

Unlike traditional point-to-point architectures, SOAs comprise loosely coupled, highly interoperable application services. These services interoperate based on a formal definition independent of the underlying platform and programming language (e.g., WSDL). The interface definition encapsulates (hides) the vendor and language-specific implementation. A SOA is independent of development technology (such as Java and .NET). The software components become very reusable because the interface is defined in a standards-compliant manner. So, for example, a C# (C Sharp) service could be used by a Java application.


 

SOA provides a methodology and framework for documenting enterprise capabilities and can support integration and consolidation activities.

 

SOAP is a protocol for exchanging XML-based messages over a computer network, normally using HTTP. SOAP forms the foundation layer of the web services stack, providing a basic messaging framework that more abstract layers can build on. SOAP facilitates the Service-Oriented architectural pattern.

 

There are several different types of messaging patterns in SOAP, but by far the most common is the Remote Procedure Call (RPC) pattern, where one network node (the client) sends a request message to another node (the server), and the server immediately sends a response message to the client.

 

SOAP originally stood for Simple Object Access Protocol, but the expanded name was dropped in Version 1.2 of the SOAP specification. Originally designed by Dave Winer, Don Box, Bob Atkinson, and Mohsen Al-Ghosein in 1998 with backing from Microsoft (where Atkinson and Al-Ghosein worked at the time), the SOAP specification is currently maintained by the XML Protocol Working Group of the World Wide Web Consortium.

 

Transport methods

 

HTTP was chosen as the primary application layer protocol for SOAP since it works well with today's Internet infrastructure; specifically, HTTP traffic is normally allowed through network firewalls. This is a major advantage over other distributed protocols such as GIOP/IIOP or DCOM, which are normally filtered by firewalls.

 

XML was chosen as the standard message format because of its widespread acceptance by major corporations and open source development efforts. Additionally, a wide variety of freely available tools significantly ease the transition to a SOAP-based implementation.

 

The somewhat lengthy syntax of XML can be both a benefit and a drawback. Its format is easy for humans to read, but parsing it can be complex and can slow down processing. By contrast, CORBA, GIOP and DCOM use much shorter, binary message formats. On the other hand, hardware appliances are available to accelerate processing of XML messages. Binary XML (the use of the word "XML" is controversial here) is also being explored as a means of reducing the processing and bandwidth demands of XML.
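
The size trade-off described above can be illustrated with a rough sketch. The record (an order with an id, a quantity, and a price) and its field layout are assumptions made up for this example, and the binary layout is a generic fixed-width packing, not the actual wire format of GIOP or DCOM.

```python
import struct
import xml.etree.ElementTree as ET

# Hypothetical record, used only for illustration.
order_id, quantity, price = 1001, 3, 19.99

# Text encoding: an XML element whose children name each field.
order = ET.Element("order")
ET.SubElement(order, "id").text = str(order_id)
ET.SubElement(order, "quantity").text = str(quantity)
ET.SubElement(order, "price").text = str(price)
xml_bytes = ET.tostring(order, encoding="utf-8")

# Binary encoding: two 32-bit unsigned integers and one 64-bit float,
# big-endian, with no field names carried in the payload.
binary_bytes = struct.pack(">IId", order_id, quantity, price)

print(xml_bytes.decode("utf-8"))
print(len(xml_bytes), "bytes as XML;", len(binary_bytes), "bytes as packed binary")
```

The XML form is larger because every value is wrapped in named tags, but it can be read without knowing the layout in advance; the packed form is compact but meaningless without the format string.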

 

Structure of a SOAP message

 

A SOAP message is contained in an envelope. Within this envelope are two additional sections: the header and the body of the message. SOAP messages use XML namespaces.

 

The header contains relevant information about the message. For example, a header can contain the date the message is sent, or authentication information. The header is optional but, if present, must be the first child element of the envelope.
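
As a minimal sketch of this structure, the following Python builds an envelope with a header and a body using the standard library. The SOAP 1.2 envelope namespace is the one published by the W3C; the application namespace, the element names (Sent, GetPrice, Symbol), and the sample values are hypothetical and exist only for this illustration.

```python
import xml.etree.ElementTree as ET

# Namespace of the SOAP 1.2 envelope.
SOAP_NS = "http://www.w3.org/2003/05/soap-envelope"
# Hypothetical application namespace, assumed for this example only.
APP_NS = "http://example.com/stock"

ET.register_namespace("soap", SOAP_NS)


def build_envelope(symbol, sent):
    """Build an Envelope containing an optional Header and a Body."""
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")

    # Header: optional, but when present it comes first in the envelope.
    header = ET.SubElement(envelope, f"{{{SOAP_NS}}}Header")
    ET.SubElement(header, f"{{{APP_NS}}}Sent").text = sent

    # Body: carries the application payload, here an RPC-style request.
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    request = ET.SubElement(body, f"{{{APP_NS}}}GetPrice")
    ET.SubElement(request, f"{{{APP_NS}}}Symbol").text = symbol
    return envelope


envelope = build_envelope("ACME", "2006-01-01T12:00:00Z")
print(ET.tostring(envelope, encoding="unicode"))
```

In the RPC pattern described earlier, a client would send a message like this to the server, typically as the body of an HTTP POST, and receive a response envelope of the same shape.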

 

SQL    (short for Structured Query Language) is the most popular computer language used to create, modify and retrieve data from relational database management systems. The language has evolved beyond its original purpose to support object-relational database management systems. It is an ANSI/ISO standard.
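
The three activities named above (create, modify, retrieve) can be sketched with Python's built-in sqlite3 module and an in-memory database. The customer table and its rows are hypothetical and serve only as an illustration; other database systems accept similar statements but differ in dialect details.

```python
import sqlite3

# In-memory database: nothing is written to disk.
conn = sqlite3.connect(":memory:")

# Create: define a table.
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")

# Modify: insert rows, then update one of them.
conn.executemany(
    "INSERT INTO customer (name, city) VALUES (?, ?)",
    [("Ada", "London"), ("Grace", "New York")],
)
conn.execute("UPDATE customer SET city = ? WHERE name = ?", ("Arlington", "Grace"))

# Retrieve: query the rows back in a defined order.
rows = conn.execute("SELECT name, city FROM customer ORDER BY name").fetchall()
print(rows)

conn.close()
```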

 

STP     Straight Through Processing (STP) enables the entire trade process for capital markets and payments transactions to be conducted electronically without the need for re-keying or manual intervention, subject to legal and regulatory restrictions. The concept has also been transferred into other asset classes including energy (oil, gas) trading.

 

Presently, the entire trade lifecycle, from initiation to settlement, is a complex labyrinth of manual processes, taking several days. STP is at least 'same-day' or faster, ideally minutes or even seconds. To minimise settlement risk, the goal is for the execution of a trade and its settlement and clearing to occur simultaneously. However, for this to be achieved, multiple market participants must realise high levels of STP. In particular, transaction data would need to be made available on a just-in-time basis, which is a considerably harder goal for the financial services community to achieve than the application of STP alone. After all, STP itself is merely an efficient application of computer-based technology to transaction processing.

 

Historically, STP solutions were needed to help financial markets firms meet the move to one-day trade settlement of equities transactions, as well as to meet the global demand that had resulted from the explosive growth of online trading. Now the concepts of STP are applied to reduce systemic and operational risk and to improve certainty of settlement and minimize operational costs.

 

When fully realized, STP will provide asset managers, broker/dealers, custodians and other financial services players with tremendous benefits, including greatly shortened processing cycles, reduced settlement risk and lower operating costs. Some industry analysts believe that STP is not an achievable goal, in the sense that firms are unlikely to find that the cost/benefit justifies reaching 100% automation. Instead they promote the idea of improving levels of internal STP within a firm while encouraging groups of firms to work together to improve the quality of the automation of transaction information between themselves, either bilaterally or as a community of users (external STP).

 

UML   Unified Modeling Language (UML) is a non-proprietary, object modeling and specification language used in software engineering.

 

UML is not restricted to modeling software. As a graphical notation, UML can be used for modeling hardware (engineering systems) and is commonly used for business process modeling, systems engineering modeling, and representing organizational structure.

 

UML was designed to be used to specify, visualize, construct, and document the artifacts of an object-oriented software-intensive system under development. It represents an integrated compilation of best engineering practices that have proven to be successful in modeling large, complex systems, especially at the architectural level.

 

XMI    The XML Metadata Interchange (XMI) is an OMG standard for exchanging metadata information via Extensible Markup Language (XML). It can be used for any metadata whose metamodel can be expressed in Meta-Object Facility (MOF). The most common use of XMI is as an interchange format for UML models, although it can also be used for serialization of models of other languages (metamodels).

 

In the OMG vision of modeling, data is split into abstract models and concrete models. The abstract models represent the semantic information, whereas the concrete models represent visual diagrams. Abstract models are instances of arbitrary MOF-based modeling languages such as UML. For diagrams, the Diagram Interchange (DI, XMI[DI]) standard is used. At the moment there are severe incompatibilities between different modeling tool vendors' implementations of XMI, even for the interchange of abstract model data. The usage of Diagram Interchange is almost nonexistent. As a result, exchanging files between UML modeling tools using XMI is rarely possible.

 

XML   The Extensible Markup Language (XML) is a W3C-recommended general-purpose markup language for creating special-purpose markup languages, capable of describing many different kinds of data. It is a simplified subset of SGML. Its primary purpose is to facilitate the sharing of data across different systems, particularly systems connected via the Internet. Languages based on XML (for example, RDF/XML, RSS, MathML, XHTML, SVG, and cXML) are defined in a formal way, allowing programs to modify and validate documents in these languages without prior knowledge of their form.

 

Features of XML

 

XML provides a text-based means to describe and apply a tree-based structure to information. At its base level, all information manifests as text, interspersed with markup that indicates the information's separation into a hierarchy of character data, container-like elements, and attributes of those elements. In this respect, it is similar to the LISP programming language's S-expressions, which describe tree structures wherein each node may have its own property list.
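
A short sketch can make the hierarchy concrete. The document below (a library containing one book) and the outline helper are invented for this example; the code uses Python's standard xml.etree.ElementTree parser to walk the tree and list each element with its attributes and character data.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: elements, attributes, and character data.
document = """<library>
  <book id="b1" lang="en">
    <title>Example Title</title>
    <author>A. Writer</author>
  </book>
</library>"""


def outline(element, depth=0):
    """Return one indented line per element, depth-first."""
    attrs = " ".join(f'{key}="{value}"' for key, value in element.attrib.items())
    text = (element.text or "").strip()
    label = element.tag
    if attrs:
        label += f" [{attrs}]"
    if text:
        label += f": {text}"
    lines = ["  " * depth + label]
    for child in element:
        lines.extend(outline(child, depth + 1))
    return lines


root = ET.fromstring(document)
tree_lines = outline(root)
print("\n".join(tree_lines))
```

Each line of the printed outline corresponds to one node in the tree: container elements such as library and book hold other elements, while title and author hold only character data.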

 

The fundamental unit in XML is the character, as defined by the Universal Character Set. Characters are combined in certain allowable combinations to form an XML document. The document consists of one or more entities, each of which is typically some portion of the document's characters, encoded as a series of bits and stored in a text file.

 

The ubiquity of text file authoring software (word processors) facilitates rapid XML document authoring and maintenance, whereas prior to the advent of XML, there were very few data description languages that were general-purpose, Internet protocol-friendly, and very easy to learn and author. In fact, most data interchange formats were proprietary, special-purpose, "binary" formats (based foremost on bit sequences rather than characters) that could not be easily shared by different software applications or across different computing platforms, much less authored and maintained in common text editors.

 

By leaving the names, allowable hierarchy, and meanings of the elements and attributes open and definable by a customizable schema, XML provides a syntactic foundation for the creation of custom, XML-based markup languages. The general syntax of such languages is rigid — documents must adhere to the general rules of XML, assuring that all XML-aware software can at least read (parse) and understand the relative arrangement of information within them. The schema merely supplements the syntax rules with a set of constraints. Schemas typically restrict element and attribute names and their allowable containment hierarchies, such as only allowing an element named 'birthday' to contain 1 element named 'month' and 1 element named 'day', each of which has to contain only character data. The constraints in a schema may also include data type assignments that affect how information is processed; for example, the 'month' element's character data may be defined as being a month according to a particular schema language's conventions, perhaps meaning that it must not only be formatted a certain way, but also must not be processed as if it were some other type of data.
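
The two layers described above, general XML syntax and schema constraints, can be sketched as follows. This is not a real schema language: the check_birthday function is a hand-written stand-in, assumed for illustration, that applies the 'birthday' rules from the text, and the numeric ranges for 'month' and 'day' are an assumed data type convention. Python's standard library can check well-formedness but does not validate against schema languages.

```python
import xml.etree.ElementTree as ET


def check_birthday(xml_text):
    """Return a list of violations; an empty list means the document passes."""
    # Layer 1: general XML syntax. A document that is not well-formed
    # cannot be parsed at all.
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        return [f"not well-formed: {exc}"]

    problems = []
    if root.tag != "birthday":
        problems.append("root element must be 'birthday'")

    # Layer 2: containment rule, exactly one 'month' and one 'day'.
    if sorted(child.tag for child in root) != ["day", "month"]:
        problems.append("'birthday' must contain exactly one 'month' and one 'day'")

    # Children may hold character data only, no nested elements.
    for child in root:
        if len(child) > 0:
            problems.append(f"'{child.tag}' must contain only character data")

    # Layer 3: data type rule (assumed convention: decimal number in range).
    for tag, upper in (("month", 12), ("day", 31)):
        for element in root.findall(tag):
            text = (element.text or "").strip()
            if not (text.isdecimal() and 1 <= int(text) <= upper):
                problems.append(f"'{tag}' must be a number from 1 to {upper}")
    return problems


print(check_birthday("<birthday><month>7</month><day>14</day></birthday>"))
print(check_birthday("<birthday><month>13</month><day>14</day></birthday>"))
```

A document with a month of 13 is still well-formed XML and parses without error; it is only the additional constraints that reject it.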

 

In this way, XML contrasts with HTML, which has an inflexible, single-purpose vocabulary of elements and attributes that, in general, cannot be repurposed. With XML, it is much easier to write software that accesses the document's information, since the data structures are expressed in a formal, relatively simple way.

 

XML makes no prohibitions on how it is used. Although XML is fundamentally text-based, software quickly emerged to abstract it into other, richer formats, largely through the use of datatype-oriented schemas and object-oriented programming paradigms (in which the document is manipulated as an object). Such software might only treat XML as serialized text when it needs to transmit data over a network, and some software doesn't even do that much. Such uses have led to "binary XML", the relaxed restrictions of XML 1.1, and other proposals that run counter to XML's original spirit and thus garner an amount of criticism.

 

Strengths and weaknesses

 

Some features of XML that make it well-suited for data transfer are:

 

* its simultaneously human- and machine-readable format;

* its support for Unicode, allowing almost any information in any human language to be communicated;

* the ability to represent the most general computer science data structures: records, lists and trees;

* the self-documenting format that describes structure and field names as well as specific values;

* the strict syntax and parsing requirements that allow the necessary parsing algorithms to remain simple, efficient, and consistent.

 

XML is also heavily used as a format for document storage and processing, both online and offline, and offers several benefits:

 

* its robust, logically-verifiable format is based on international standards;

* the hierarchical structure is suitable for most (but not all) types of documents;

* it manifests as plain text files, unencumbered by licenses or restrictions;

* it is platform-independent, thus relatively immune to changes in technology;

* it and its predecessor, SGML, have been in use since 1986, so there is extensive experience and software available.

 

For certain applications, XML also has the following weaknesses:

 

* Its syntax is fairly verbose and partially redundant. This can hurt human readability and application efficiency, and yields higher storage costs. It can also make XML difficult to apply in cases where bandwidth is limited, though compression can reduce the problem in some cases. This is particularly true for multimedia applications running on cell phones and PDAs which want to use XML to describe images and video.

* Parsers should be designed to recursively handle arbitrarily nested data structures and must perform additional checks to detect improperly formatted or differently ordered syntax or data (this is because the markup is descriptive and partially redundant, as noted above). This adds significant overhead for most basic uses of XML, particularly where resources may be scarce, for example in embedded systems. Furthermore, additional security considerations arise when XML input comes from untrustworthy sources, since resource exhaustion or stack overflows are possible.

* Some consider the syntax to contain a number of obscure, unnecessary features born of its legacy of SGML compatibility. However, an effort to settle on a subset called "Minimal XML" led to the discovery that there was no consensus on which features were in fact obscure or unnecessary.

* The basic parsing requirements do not support a very wide array of data types, so interpretation sometimes involves additional work in order to process the desired data from a document. For example, there is no provision in XML for mandating that "3.14159" is a floating-point number rather than a seven-character string. XML schema languages add this functionality.

* Modeling overlapping (non-hierarchical) data structures requires extra effort.

* Mapping XML to the relational or object oriented paradigms is often cumbersome.

* Some have argued that XML can be used for data storage only if the volume of data is low, but this is only true given particular assumptions about architecture, data, implementation, and other issues.
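
The data type limitation listed above can be seen in a small sketch. The measurement document is hypothetical; the point is that a basic parser returns "3.14159" as character data, and the application (or a schema-aware layer) has to decide that it is a number.

```python
import xml.etree.ElementTree as ET

# Hypothetical document holding a single numeric-looking value.
root = ET.fromstring("<measurement><value>3.14159</value></measurement>")

# The parser returns character data only: a seven-character string.
raw = root.findtext("value")
print(type(raw).__name__, len(raw))

# Interpreting it as a floating-point number is the application's job.
value = float(raw)
print(type(value).__name__)
```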

 
