
Monday, May 4, 2009

Should countries like the United States push for agenda items like freedom of expression, intellectual property protection, and cybercrime prevention?

Freedom of expression is one of the firmly stated and affirmed rights in the US Constitution, so I take the question to be about promoting it on a global scale. Freedom of expression touches highly sensitive areas such as religion, for which people have a soft corner. Considering the current world situation, I think pushing agendas that touch on the moral, cultural and religious values of countries outside the US will have little effect and may cause adverse effects.

Intellectual property rights (IPR) is a very important area where the US should definitely push its agenda, considering the need to support and encourage innovation and growth. As the court put it in Rockwell Graphic Sys., Inc. v. DEV Indus., Inc., 925 F.2d 174, 180 (7th Cir. 1991), "the future of the nation depends in no small part on the efficiency of industry, and the efficiency of industry depends in no small part on the protection of intellectual property." However, the recent move by the US to retain control over Internet governance has raised concerns among other participating countries. The question is: if we stand for freedom of expression, isn't it better to pass control to a global body such as the UN to oversee ICANN? I think there is a possibility that, in the future, an IPR claim over the Internet's origin and development will be set up in favor of the US. As long as no such IPR exists over the Internet as a whole, I strongly oppose policy making and control by any single entity rather than by a global entity like the UN.

Regarding cybercrime protection, yes: the US Criminal Division is doing a great job in addressing cyber crimes such as hacking, IP theft and computer virus/worm proliferation. Since cybercrime is a common issue around the world, the US should work hand in hand with other technologically advanced countries to build stringent cybercrime protection laws.

I agree that the "US should lead by example". In my opinion, it is healthy to be proactive in pushing "some" of these agendas, particularly against what is condemned globally.

(c) Deepesh Joseph, May 2007.

What measures need to be taken to make sure that offshore outsourcing does not adversely affect American jobs?

Offshore outsourcing is a highly debated topic. It helps to reflect first on the reasons that have led to offshore outsourcing -

1. Highly skilled workforce at low cost - Most developing countries offer education that is highly specialized and focused on the international job market. The difference in value between the dollar and the local currency creates a favorable situation for companies.

2. Current restrictions and huge backlogs in H-1B visas and employment-based Green Cards - If the U.S. does not take the necessary steps to provide more opportunities for highly skilled workers to migrate to the U.S., companies will continue to outsource.

3. Inadequate number of high-tech workers in the U.S. - Research shows that there are about 190,000 unfilled IT jobs in the US today due to a shortage of qualified high-tech workers. Companies resort to outsourcing to fill immediate staffing gaps.

To keep the U.S. economy from being adversely affected, the following steps may be taken -

1. Focus on STEM (Science, Technology, Engineering and Maths) education - Children should be encouraged to pursue these fields right from school. Provide more funding and scholarships to students who pursue these career paths. This will eventually lead to a larger pool of highly skilled workers at reasonable rates.

2. Increase the number of immigrant job visas, which will reduce outsourcing of skilled labor.

3. Establish legislative measures to regulate outsourcing, so that companies are required to fill a certain percentage of positions with U.S. natives if sufficient skills are available.

(c) Deepesh Joseph, Aug 2007

What is IT?

We have seen so much about IT, and all of us have been a part of it by some means or other. We all know what IT refers to and what its applications are. I have been trying to define IT, or Information Technology, for some time. This is purely a personal effort, and each one of us may have a different opinion. I think it will be fun and thought-provoking if we each try to make a personal definition of what IT is. Here goes my definition -

"IT is an evolving discipline which acts as a PLANNER, ENABLER, SUPPLIER, INTEGRATOR and ENHANCER of the necessary TECHNOLOGY which provides the basic framework for ACQUIRING, STORING, MANIPULATING, RETRIEVING and MANAGING information so that the general motto of IT is accomplished."

The motto of IT, in my view, is to get the right information to the right person at the right place, at the right time, in the right form and at the right cost.

(c) Deepesh Joseph, 2001


Information Management and career levels

Deepesh Joseph
August 2006

Information Management (IM) refers to any activity that leads to effective collection, storage, retrieval and analysis of information by focusing on business strategy and by continuously improving business competitiveness and adaptability. The ultimate goal of any IM job is to contribute to the combined effort to meet the required business strategy. The business strategy can be, say, “making Government more approachable and transparent to the citizens”. To meet this goal, the Federal Government has to implement scalable Information Management solutions that make strategic use of IT as an enabling medium. This is a typical situation where we require professional IM expertise and skills. Depending upon the nature of the business and the complexity of the IM project, there exist various professional fields of expertise within the IM field. The IM field is very diverse and it is hard to cover all the skill positions. A typical map and guide to the IM field is depicted in Figure (a).

Figure (a) shows a typical IM field in which a business strategy gets converted into an IM solution. As shown in the figure, the skills cloud has 4 major levels. All the skill sets work in a synergistic manner towards the common business goal. I shall briefly describe each skill position.

Level 4: Level 4 is the Chief Officers' level where the business strategy is mapped to IT strategy and architecture. On the top, we have

4.1 CIO or Chief Information Officer, who is responsible for providing the technology vision and leadership to conceive, build and implement multiple IT projects on time and within budget. The CIO should have technology skills as well as business skills to effectively understand the business vision and then utilize IT wisely and cost-effectively to realize that vision. So the CIO is a strategist as well as a technologist.

4.2 CTO or Chief Technology Officer is a more technology-oriented job which requires knowledge of existing and emerging technologies and the ability to advise on technology adoption. The CTO should be capable of creating an organizational vision for new technologies and of overseeing and managing the firm's technological operations and infrastructure.

Level 3: Level 3 consists of the IT managers, architects and specialists who work on building feasible architectures, specifications and project management methodologies.

3.1 IT Manager is responsible for managing individuals involved in the activities of developing, designing, debugging, installing, modifying, testing, analyzing or maintaining applications programs and other IT systems. IT manager ensures effective coordination and technical support between programmers, developers, testers, interfacing system groups, and end-users and adherence to quality standards. The position requires extensive experience in managing IT projects and systems.

3.1.1 Project Manager applies or ensures the application of a standard, defined methodology to the project, and provides the leadership skills necessary to control the execution of the project tasks, as defined in the methodology.

3.2 IT consultants/specialists are either outsourced or in house manpower who have specialized skills in various aspects of system architecture development and business process improvement. Well known IT consultant job titles are -

3.2.1 ERP consultant
3.2.2 Business Process consultant
3.2.3 Security Consultant
3.2.4 QA specialist
3.2.5 Hardware specialist

3.3 Enterprise Architect is a strategic person who is passionate about delivering business-orientated IT outcomes by adopting an architectural approach. This position requires the ability to articulate strategies, ideas, positions and recommendations in written, visual and verbal forms, using tools such as Enterprise Architect frameworks, UML/Object Oriented Design and architecture tools.

3.4 Chief Applications Architect is a highly skilled position whose major responsibilities include developing and implementing standards and procedures for data, data modeling, data architecture and software applications. This position plans, analyzes and develops requirements for application development and enhancement of all critical applications.

Level 2: Level 2 consists of the team leads and administrators who do the hands-on work in various senior system development and administration jobs, as follows.

2.1 Team Leader is usually the senior member of a software development team who performs high-level system/database designs, software testing and some amount of Quality Assurance. He should have excellent communication skills and team management skills.

2.2 Sr. Systems Analyst acts as a liaison between business people who have a business problem and technology people who know how to create solutions. In doing so he manages the organization's documentation of business requirements, including use cases and other system analysis documents, to ensure that the project team develops solutions that meet the business goal.

2.3 Administrators manage and maintain specific areas of IT infrastructure such as servers and other hardware, databases and networks. Well known job titles are System Administrator, DBA and Network Admin.

Level 1: Level 1 consists of the engineering and analyst team that performs the basic day-to-day system development tasks. At this level all of the business strategy is in the form of clear system specifications such as DFDs, ERDs, system flows, program logic flows and detailed hardware/network configurations. The jobs at this level require knowledge of basic software engineering methodologies and of hardware/network technologies, configurations and troubleshooting. It is essential to have skills in system analysis and design techniques, programming, software testing and hardware/network configuration and maintenance. Well known job titles are as follows.

1.1 Hardware Engineer is responsible for configuring, troubleshooting and servicing hardware/network. Well known job titles are -

1.1.1 Hardware service Engineer
1.1.2 Network Engineer

1.2 Software Engineer is responsible for developing software based on the system specifications and acceptable programming techniques and standards so that the developed IM solution will satisfy the business goal. Major job titles are -

1.2.1 Software Engineer/Developer
1.2.2 Web Developer
1.2.3 Web designer
1.2.4 Software Tester

(c) Deepesh Joseph, 2006


Compare and contrast data warehouse principles

Operational vs. Informational system : Operational systems are based on current data and thus support the current, day-to-day functioning of a business. Informational systems, on the other hand, are based on historical information and thus support complex data mining/analysis and decision making. The core difference lies in the way queries are run to process and analyze data. Simple, planned, real-time queries that act on small data sets are run on an operational system, whereas an informational system executes large, complex queries that act on bulk amounts of data.

Operational database vs. Data warehouse : An operational database is the database we use to implement an operational system, and a data warehouse is one of the database designs we can opt for to implement an informational system. An operational database is less complex in design (typically relational and in 3rd normal form) and supports storage and processing of current/operational data, whereas data warehouses are designed to support effective storage and processing of large volumes of historical data. Operational databases are relational in most cases, whereas data warehouses follow multidimensional data models such as the star or snowflake schema.

Data warehouse vs. Data mart : A data warehouse comprises the complete domain of data pertaining to an enterprise, whereas a data mart is a scaled-down version of a data warehouse that is confined to a particular data domain (e.g. sales). In other words, a data mart is a subset of a data warehouse.

OLTP vs. OLAP : OLTP or Online Transaction Processing systems are those Information Management systems that are based on operational databases and thus support current business transactions such as sales order processing. OLAP or Online Analytical Processing systems are those based on data warehouses or similar solutions that are designed to support ad hoc query analysis or data mining on historical data. OLTP supports current business functions and flows, whereas OLAP supports decision making based on complex data analysis.
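
To make the OLTP/OLAP contrast concrete, here is a minimal sketch using Python's built-in sqlite3 module. The sales_order table, its columns and the sample rows are hypothetical and chosen only for illustration; a real OLAP workload would run against a separate warehouse holding far more history.

```python
import sqlite3


def build_orders_db():
    """Create an in-memory database with a small, hypothetical sales_order table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sales_order ("
        "order_id INTEGER PRIMARY KEY, region TEXT, order_year INTEGER, amount REAL)"
    )
    rows = [
        (1, "EAST", 2007, 100.0),
        (2, "EAST", 2008, 150.0),
        (3, "WEST", 2008, 200.0),
        (4, "WEST", 2008, 50.0),
    ]
    conn.executemany("INSERT INTO sales_order VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return conn


def oltp_lookup(conn, order_id):
    """OLTP-style access: a simple, planned query that touches one current row."""
    return conn.execute(
        "SELECT region, amount FROM sales_order WHERE order_id = ?", (order_id,)
    ).fetchone()


def olap_summary(conn):
    """OLAP-style access: an aggregate over all stored history, grouped by dimensions."""
    return conn.execute(
        "SELECT region, order_year, SUM(amount) FROM sales_order "
        "GROUP BY region, order_year ORDER BY region, order_year"
    ).fetchall()


conn = build_orders_db()
print(oltp_lookup(conn, 3))
print(olap_summary(conn))
```

The lookup reads a single row by key, while the summary scans and groups every row, which is the difference in query shape described above.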

Ad hoc queries vs. Data Mining : Ad hoc queries are written to answer a specific, known question, so the user must already know what to ask. Data mining, on the other hand, explores the data based on user/process inputs to produce data patterns that were not originally known to the business.

Star schema vs. Snowflake schema : Both star and snowflake schemas are data warehouse design models that support multidimensional data modeling. The star schema is the simplest data warehouse data model; it consists of one or more fact tables in the middle and many dimension tables connected to the fact table, thus creating the shape of a star. The snowflake data model is similar to the star schema, except that the dimension tables are normalized. A snowflake schema is usually used when we convert an already normalized transactional database into a data warehouse, where all dimension tables are already normalized.
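
The two models can be sketched with Python's built-in sqlite3 module as follows. All table and column names (fact_sales, dim_product, dim_date, dim_category, dim_product_sf) and the sample rows are hypothetical; the point is only that the star version keeps category_name inside the product dimension, while the snowflake version normalizes it into its own table at the price of one more join.

```python
import sqlite3


def build_star_schema():
    """Build a tiny star schema: one fact table surrounded by dimension tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE dim_product (
            product_id INTEGER PRIMARY KEY, product_name TEXT, category_name TEXT);
        CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, year INTEGER, quarter INTEGER);
        CREATE TABLE fact_sales (
            product_id INTEGER REFERENCES dim_product(product_id),
            date_id INTEGER REFERENCES dim_date(date_id),
            units_sold INTEGER);
        INSERT INTO dim_product VALUES
            (1, 'Laptop', 'Hardware'), (2, 'Mouse', 'Hardware'), (3, 'Editor', 'Software');
        INSERT INTO dim_date VALUES (10, 2008, 4), (11, 2009, 1);
        INSERT INTO fact_sales VALUES (1, 10, 5), (2, 10, 20), (3, 11, 7), (1, 11, 3);
        """
    )
    return conn


def units_by_category_star(conn):
    """Star schema query: the fact table joins directly to a de-normalized dimension."""
    return conn.execute(
        "SELECT p.category_name, SUM(f.units_sold) FROM fact_sales f "
        "JOIN dim_product p ON f.product_id = p.product_id "
        "GROUP BY p.category_name ORDER BY p.category_name"
    ).fetchall()


def snowflake_product_dimension(conn):
    """Normalize the product dimension by moving category_name into its own table."""
    conn.executescript(
        """
        CREATE TABLE dim_category (category_id INTEGER PRIMARY KEY, category_name TEXT UNIQUE);
        INSERT INTO dim_category (category_name)
            SELECT DISTINCT category_name FROM dim_product ORDER BY category_name;
        CREATE TABLE dim_product_sf AS
            SELECT p.product_id, p.product_name, c.category_id
            FROM dim_product p JOIN dim_category c ON p.category_name = c.category_name;
        """
    )


def units_by_category_snowflake(conn):
    """Snowflake schema query: same answer, but with an extra join to dim_category."""
    return conn.execute(
        "SELECT c.category_name, SUM(f.units_sold) FROM fact_sales f "
        "JOIN dim_product_sf p ON f.product_id = p.product_id "
        "JOIN dim_category c ON p.category_id = c.category_id "
        "GROUP BY c.category_name ORDER BY c.category_name"
    ).fetchall()


conn = build_star_schema()
snowflake_product_dimension(conn)
print(units_by_category_star(conn))
print(units_by_category_snowflake(conn))
```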

(c) Deepesh Joseph, 2001

Compare and contrast distributed database principles

a. Homogeneous vs. heterogeneous distributed database system

Homogeneous database systems involve similar databases distributed over the network (on separate machines). An example of a homogeneous database system is an enterprise’s nation-wide ERP system which comprises distributed databases, all of which are Oracle. Heterogeneous database systems, on the other hand, are distributed database systems that include at least one database of a different type. An example of a heterogeneous database system is an enterprise-wide intranet application built on databases such as MS SQL Server and DB2, which belong to the same integrated database application.

b. Autonomous vs. Non-autonomous distributed database system

Autonomous and non-autonomous distributed databases are both sub-types of homogeneous databases. Autonomous distributed databases are independent databases (with separate data residing in each database) that function independently but are integrated by the controlling application software. Non-autonomous distributed databases are homogeneous databases where data is distributed across homogeneous nodes and is controlled by the DBMS at each node. An example of an autonomous distributed database system is a set of Oracle-based data marts which manage data pertaining to sales, distribution and inventory. An example of a non-autonomous distributed database system is an Oracle-based global sales database which is partitioned across multiple databases.

c. Federated vs. Unfederated

Federated database systems are collections of heterogeneous database systems which are integrated to function as a single system. Each constituent database system is autonomous, and control can be exercised over each local database component of the federation. Unfederated database systems are collections of homogeneous database systems which are generally non-autonomous by nature and employ centralized control. An example of a federated database system would be an extended heterogeneous distributed database system that spans multiple database vendors and multiple enterprise departments. An example of an unfederated database system would be an extended homogeneous distributed database system that spans a global enterprise function.

(c) Deepesh Joseph, 2001

Pros and Cons of normalizing data into 3NF form

Deepesh Joseph
February 2009

The main purpose of normalization is to reduce data redundancy and avoid inconsistent data. Normalization leads to separation of unrelated entities into separate entities. In effect, normalization leads to clean database design. Since we do not store redundant data, we save storage space and save resources to maintain (update, delete) redundant data.

But there are instances where we do not need to fully normalize data. The example provided in the question is an excellent one to explain why we allow de-normalized data. Looking at the customer address data, it is desirable to design city, state, country and postal code as separate entities, since they can be represented by their own unique identifiers (city_id, state_id, country_id, zip_id) and since multiple customers may belong to the same country, state, city and zip. Suppose we did design these as separate entities and tried to retrieve data for the following problem --

"Generate a report of all customers belonging to 'US' who reside in 'FLORIDA's 'TAMPA' city in the '33601' postal code."

The query would be something like --

SELECT c.customer_first_name, c.customer_middle_name, c.customer_last_name
FROM customer c, customer_address ca, address_city act, address_zip az, address_state ast, address_country ac
WHERE c.customer_id = ca.customer_id AND ca.city_id = act.city_id AND ca.zip_id = az.zip_id
AND ca.state_id = ast.state_id AND ca.country_id = ac.country_id AND ac.country_name = 'US'
AND ast.state_name = 'FLORIDA' AND act.city_name = 'TAMPA' AND az.zip_code = '33601'

Notice the joins (c.customer_id = ca.customer_id, ca.city_id = act.city_id, etc.) required in the SQL to retrieve the required information. SQL joins can be very expensive when there is a huge amount of data, say a terabyte of data within the customer table. The four additional joins are going to be expensive and can lead to unacceptable system response time.

If we de-normalize data and allow country, state, city and zip data to reside within customer_address table, then we could rewrite the above query as --

SELECT c.customer_first_name, c.customer_middle_name, c.customer_last_name
FROM customer c, customer_address ca
WHERE c.customer_id = ca.customer_id AND ca.country_name = 'US'
AND ca.state_name = 'FLORIDA' AND ca.city_name = 'TAMPA' AND ca.zip_code = '33601'

After de-normalization, the query would run much faster. So, in effect, normalizing data into 3NF form is not always practical.
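
A minimal sketch of this comparison, using Python's built-in sqlite3 module on a toy data set. The sample rows, the assumption that the address keys live in customer_address, and the table name customer_address_flat (used here only so that both designs can coexist in one database) are illustrative assumptions, not part of the original question. With two rows it cannot demonstrate the performance difference; it only shows that both designs answer the same report, with four fewer joins in the de-normalized case.

```python
import sqlite3

NORMALIZED_QUERY = """
SELECT c.customer_first_name, c.customer_last_name
FROM customer c, customer_address ca, address_city act, address_zip az,
     address_state ast, address_country ac
WHERE c.customer_id = ca.customer_id AND ca.city_id = act.city_id
  AND ca.zip_id = az.zip_id AND ca.state_id = ast.state_id
  AND ca.country_id = ac.country_id AND ac.country_name = 'US'
  AND ast.state_name = 'FLORIDA' AND act.city_name = 'TAMPA'
  AND az.zip_code = '33601'
"""

DENORMALIZED_QUERY = """
SELECT c.customer_first_name, c.customer_last_name
FROM customer c, customer_address_flat ca
WHERE c.customer_id = ca.customer_id AND ca.country_name = 'US'
  AND ca.state_name = 'FLORIDA' AND ca.city_name = 'TAMPA'
  AND ca.zip_code = '33601'
"""


def build_address_db():
    """Create both the normalized (3NF) and the de-normalized address designs."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE customer (
            customer_id INTEGER PRIMARY KEY, customer_first_name TEXT, customer_last_name TEXT);
        CREATE TABLE address_country (country_id INTEGER PRIMARY KEY, country_name TEXT);
        CREATE TABLE address_state (state_id INTEGER PRIMARY KEY, state_name TEXT);
        CREATE TABLE address_city (city_id INTEGER PRIMARY KEY, city_name TEXT);
        CREATE TABLE address_zip (zip_id INTEGER PRIMARY KEY, zip_code TEXT);
        -- Normalized: the address row stores only foreign keys.
        CREATE TABLE customer_address (
            customer_id INTEGER, city_id INTEGER, zip_id INTEGER,
            state_id INTEGER, country_id INTEGER);
        -- De-normalized: the address row repeats the names themselves.
        CREATE TABLE customer_address_flat (
            customer_id INTEGER, city_name TEXT, zip_code TEXT,
            state_name TEXT, country_name TEXT);
        INSERT INTO customer VALUES (1, 'Ann', 'Lee'), (2, 'Bob', 'Ray');
        INSERT INTO address_country VALUES (1, 'US');
        INSERT INTO address_state VALUES (1, 'FLORIDA'), (2, 'TEXAS');
        INSERT INTO address_city VALUES (1, 'TAMPA'), (2, 'AUSTIN');
        INSERT INTO address_zip VALUES (1, '33601'), (2, '73301');
        INSERT INTO customer_address VALUES (1, 1, 1, 1, 1), (2, 2, 2, 2, 1);
        INSERT INTO customer_address_flat VALUES
            (1, 'TAMPA', '33601', 'FLORIDA', 'US'), (2, 'AUSTIN', '73301', 'TEXAS', 'US');
        """
    )
    return conn


conn = build_address_db()
normalized_rows = conn.execute(NORMALIZED_QUERY).fetchall()
denormalized_rows = conn.execute(DENORMALIZED_QUERY).fetchall()
print(normalized_rows)
print(denormalized_rows)
```

The trade-off is visible in the table definitions: the flat table repeats 'US', 'FLORIDA' and so on for every customer, which is exactly the redundancy that 3NF removes.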

(c) Deepesh Joseph, 2001

Saturday, May 2, 2009

Current usage and future of XML Database Management Systems

Deepesh Joseph
April, 2009


The purpose of this document is to present a management report that analyzes the current usage of XML-based Database Management Systems, their impact and future trends in their usage.

Present Landscape

Database Management Systems (DBMS) have been evolving constantly ever since the first DBMS based on Codd’s relational concepts was installed and used. We have seen a steady pace of innovation in DBMS technology and usage, from hierarchical databases to object/relational databases and from centralized to decentralized database systems. DBMS evolution has also been supported by innovative ways of collecting, storing, processing and retrieving data, ranging from basic genealogy to complex forensic data (Hoffer J.A). These innovations show how closely DBMS is linked to advancements in the general technology landscape, overall systems architecture and the type and nature of Information Management (IM). For example, object-oriented systems development has supported the development of object-oriented DBMS. Similarly, distributed systems architecture has led to distributed DBMS setups. Also, advanced needs for managing complex information have led to the development of various sophisticated spatial/clinical/genetic databases.

The current landscape of DBMS is a collection of the above-mentioned trends, which have continued to evolve over the past 30 years. Different data management domains call for DBMS based on specific technologies, viz. relational versus object-oriented. Within each specialized DBMS technology domain, there exist a number of vendor products that compete to provide efficient database management. For example, Oracle and DB2 compete with each other in the relational database domain. Competition is also prevalent in the way DBMS product licenses are offered, as with MySQL, PostgreSQL (IdeaByte) and recently Sybase (product: Sybase ASE 15 Express), which announced that it would offer the product free on Linux.

General Overview

XML (Extensible Markup Language, developed by W3C) based DBMS is one such innovative usage of DBMS, prompted by the pervasive usage of web-based database applications and the related need to manage frequent storage and retrieval of semi-structured data in document format (i.e. as web pages). This need is in line with what was described in the above section about evolving and specific IM needs. XML, in its most basic sense, is used to create, store and transport either data-centric data (such as a SOAP request and response) or document-centric data (such as XHTML documents). In either case, XML provides an ordered way to arrange data in nested tags, which can be easily read and processed using an XML query language (e.g. XQuery). Even though XML was originally designed as a markup language for documents (XHTML, or Extensible HTML, being one application of it), technologists and database vendors realized its importance in storing and retrieving semi-structured data efficiently (Obasanjo, D).

Strictly speaking, XML is not a database, but an efficient medium to represent and transport data across multiple systems. XML is also not a DBMS in the strict sense, but it can provide some basic DBMS features via XML documents, XML query languages, programming interfaces etc. (Bourret, R.): XML documents, optionally described by a DTD, store the data, which is queried and accessed through an XML query language (e.g. XQuery). This is the most basic model of an XML DBMS and forms the basis of modern native XML DBMSs such as Sedna. Sedna is a DBMS that supports some traditional DBMS features such as update and query languages, query optimization, fine-grained concurrency control, various indexing techniques, recovery and security. Sedna also supports the W3C XQuery language, which can be used for complex data management operations such as XML data querying, XML data transformation and even business logic computation. Another type of XML DBMS is a conventional DBMS such as DB2 or Oracle that supports XML-based data storage and retrieval through special storage and data management features. For example, recent releases of Oracle provide a native XML data type that can be used to store XML data. DB2 supports XQuery-based data management, and data can be exported from or imported into the database in XML format.

As we saw from the above analysis, the core usage of an XML DBMS is based on the need to handle vast amounts of document-centric or XHTML-centric data. This is the basic feature that distinguishes an XML DBMS from current DBMS technology. If we have a web-based database application that requires heavy processing of documents/objects (storage/access/search of web pages, music/video files, directory/phone book type data etc.) and for which structured document/data storage is not very relevant, then we could potentially reap benefits by using a native XML DBMS. Whereas, if we plan to implement a heavily transaction-oriented web application which involves atomic transactions, such as bank transactions, we should use a traditional DBMS such as DB2 or Oracle (provided these support native XML transactions).
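
The basic model described above (data held in an XML document and retrieved with a path-style query) can be sketched in Python. This is an illustration only: the Python standard library has no XQuery engine, so the sketch uses the limited XPath subset of xml.etree.ElementTree as a stand-in, and the customers document and the names_in_city helper are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical document playing the role of the stored XML data.
CUSTOMERS_XML = """
<customers>
  <customer id="1"><name>Ann Lee</name><city>TAMPA</city></customer>
  <customer id="2"><name>Bob Ray</name><city>AUSTIN</city></customer>
  <customer id="3"><name>Cy Poe</name><city>TAMPA</city></customer>
</customers>
"""


def names_in_city(xml_text, city):
    """Return the names of all customers whose <city> child equals the given city.

    The predicate [city='...'] is part of ElementTree's XPath subset; the city
    value is assumed not to contain quote characters.
    """
    root = ET.fromstring(xml_text)
    matches = root.findall(f"./customer[city='{city}']/name")
    return [element.text for element in matches]


print(names_in_city(CUSTOMERS_XML, "TAMPA"))
```

A native XML DBMS adds what this sketch lacks, such as indexing, concurrency control, recovery and a full query language.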

Impact and Future Directions

Degree of disruption:

XML DBMS has not brought any significant disruption to the current DBMS market or its usage. It has been developed and used as an add-on tool to support a specific IM need, mainly in web-based database applications. The name ‘XML DBMS’ sounds like a misnomer, since neither XML nor a native XML DBMS provides all the features of a full-blown DBMS. Since XML and its query language conform to W3C standards, they can be easily integrated with all popular relational/object-oriented DBMS as an add-on feature.


The costs associated with implementing an XML DBMS depend on the type of solution that is sought. If we plan to use a native XML DBMS, most of them are free/open source, which brings the licensing cost down to zero. Most present-day DBMS such as Oracle, DB2 etc. come with native XML support, so no extra cost is incurred if we are already using one of these DBMS.


XML DBMS provides maximum benefit when used for driving heavy document-centric web applications, as we saw in the ‘General Overview’ section. XML DBMS provides the most cost-effective way to store and process document data, since very little effort is required to present user data: the underlying data format for the transport and presentation layers is the same, i.e. XML or a derivative such as XHTML.

IT infrastructure changes:

Since an XML DBMS is used to support the strategy of building cost-effective XML data management, most of the supporting system architecture is usually already in place: XML documents (that follow a specific XML schema), an XML parser/extractor and query tool (supported by almost all web scripting languages and native XML DBMSs) and native XML support in the underlying relational DBMS. For example, if we plan to implement an XML DBMS for a web-based application built on LAMP (Linux/Apache/MySQL/PHP), all the supporting technology (DTD/XML schema support, XQuery/XPath-based query language support, SAX/DOM-based programming interface support, native XML support within MySQL etc.) is inherent to the underlying technology infrastructure. The most critical factor that drives selection of an XML DBMS is thus the specific need to support an XML-centric application architecture.
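
As an illustration of the SAX and DOM programming interfaces mentioned above, here is a small sketch using Python's standard xml.dom.minidom and xml.sax modules (Python rather than PHP, purely for illustration). The orders document and the function names are hypothetical. DOM loads the whole document into a tree before it is queried, while SAX streams parse events and keeps only the state the handler chooses to keep.

```python
import xml.sax
from xml.dom import minidom

# Hypothetical XML payload used by both parsers.
ORDERS_XML = "<orders><order id='1' total='10.5'/><order id='2' total='4.5'/></orders>"


def total_with_dom(xml_text):
    """DOM: parse the whole document into a tree, then walk the <order> nodes."""
    document = minidom.parseString(xml_text)
    orders = document.getElementsByTagName("order")
    return sum(float(node.getAttribute("total")) for node in orders)


class OrderTotalHandler(xml.sax.ContentHandler):
    """SAX: react to start-element events and keep only a running total."""

    def __init__(self):
        super().__init__()
        self.total = 0.0

    def startElement(self, name, attrs):
        if name == "order":
            self.total += float(attrs["total"])


def total_with_sax(xml_text):
    """Stream the document through the handler without building a tree."""
    handler = OrderTotalHandler()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.total


print(total_with_dom(ORDERS_XML))
print(total_with_sax(ORDERS_XML))
```

DOM is convenient for small documents that need random access; SAX is usually preferred for large documents because memory use does not grow with document size.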

Skills required:

The basic skill required to implement and manage XML DBMS is knowledge of XML, XML schema, XML query language, XML parsers/extractors, XML programming interfaces, usage of native XML functions (if using relational based DBMS) and knowledge of native XML DBMS (if native XML DBMS such as Sedna drives the database application).

Future directions in usage of XML DBMS:

XML technology in general is being widely accepted as a standard medium of data transport between disparate systems. The standards are W3C compliant, and XML query tools and APIs are constantly improved to be interoperable with a wide range of relational, object-oriented and non-relational databases. This scenario supports the wide acceptance of XML-based databases for powering systems which are less web-centric in nature. To illustrate the potential of XML DBMS, a sample system is shown in Figure 1. The system leverages XML DBMS technology to build and manage a Pattern Base Management System (PBMS) which enables users to store and retrieve patterns, just like data.

Figure 1. XML DBMS based PBMS

The idea of using an XML-based DBMS originated with the concept that patterns are “compact and rich semantic representations of data”, which can be effectively represented in XML schema. The figure shows how data is extracted from data sources via XML and then fed into the underlying relational DBMS (or a native XML DBMS) through an appropriate XML query tool (in this case, a Pattern Definition/Query/Manipulation Language or PD/Q/ML).


XML - Extensible Markup Language, developed by the World Wide Web Consortium (W3C) to deal with the shortcomings of HTML.
IM - Information Management
SOAP - Simple Object Access Protocol, used as a medium to communicate between two systems, e.g. an application and a web service.
Native XML DBMS - Pure XML-based DBMS without underlying relational or any other traditional DBMS support
XQuery - An XML query language
XHTML - Extensible HTML
SAX - Simple API for XML
DOM - Document Object Model
DTD - Document Type Definition


1. Hoffer J.A (March 20, 2006). Modern Database Management. Prentice Hall 8th edition.

2. (2009). Database Management System trends - Retrieved April 18, 2009 from

3. IdeaByte. (February 13, 2003). IT Trends 2003: Database Management Systems - Retrieved April 18, 2009 from

4. (January, 2009). Database Technology Trends Behind The Scenes of Database Evolution - Sybase ASE 15 for Linux - FREE? - Retrieved April 19, 2009 from

5. Feuerlicht G. (n.d.). Recent Trends in Database Technology. Retrieved April 19, 2009 from

6. Bourret R. (September, 2005). XML and Databases. Retrieved April 19, 2009 from

7. Obasanjo, D. (n.d.). An Exploration of XML in DBMS. Retrieved April 19, 2009 from

8. (2009). About Sedna (XML DBMS). Retrieved April 20, 2009 from

9. (n.d.). XML-based Pattern Base Management system. Retrieved April 19, 2009 from

10. Jiang H. (n.d.). XParent: An Efficient RDBMS-Based XML Database System. Retrieved April 19, 2009 from

11. Zhou A. (n.d.). VXMLR: A Visual XML-Relational Database System. Retrieved April 20, 2009 from

(c) Deepesh Joseph, 2001