Wednesday, September 17, 2008

SaaS Data Integration

Data integration is the problem of gathering data, perhaps from many different applications, so that it can be analyzed as a whole. Mike Pittaro, co-founder of SnapLogic, spoke at the September meeting of the SDForum Business Intelligence SIG on "Enhancing SaaS Applications Through Data Integration with SnapLogic".

The big players in data integration are Informatica and Ascential (now IBM Information Integration), which sell large, expensive and complex products. Because of the cost, these products are often not used, particularly for one-off projects, which are common. Mike helped found SnapLogic in 2005 to bring a new perspective to data integration. SnapLogic is an open source framework and therefore both affordable and extensible by its users.

He showed us the complexity of data integration. It involves dealing with many different access protocols and multiple ways of getting the data, and each type of data has its own metadata format to describe it. He contrasted this with the World Wide Web, where huge amounts of data are pulled back and forth every day without interoperability problems. There are almost 200 million web sites and billions of users, yet the World Wide Web is completely decentralized, with a heterogeneous model that allows for different operating systems, servers, client software applications and frameworks, all of which are compatible and interoperable.

The World Wide Web is based on open standards and protocols and an architectural principle called REST, which stands for REpresentational State Transfer. REST deals with data resources that are exchanged in standardized representations, with each resource identified by a unique identifier such as a URL.
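
To make the REST idea concrete, here is a minimal sketch in Python. It is not SnapLogic code and does not touch the network: the URL, the customer record and the get function are all made up for illustration. It only shows one resource, named by one identifier, being handed back in two standard representations.

```python
import csv
import io
import json

# Hypothetical in-memory "web": each URL identifies exactly one resource.
RESOURCES = {
    "http://example.com/customers/42": {"id": 42, "name": "Ada", "city": "London"},
}


def to_json(resource):
    """Render a resource as a JSON document with keys in sorted order."""
    return json.dumps(resource, sort_keys=True)


def to_csv(resource):
    """Render a resource as a CSV header line followed by one data line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=sorted(resource), lineterminator="\n")
    writer.writeheader()
    writer.writerow(resource)
    return buffer.getvalue()


# Standard media types mapped to the function that produces that representation.
REPRESENTATIONS = {"application/json": to_json, "text/csv": to_csv}


def get(url, media_type="application/json"):
    """Return (status, body) for the resource at url in the requested media type."""
    if url not in RESOURCES:
        return 404, ""  # no resource has this identifier
    if media_type not in REPRESENTATIONS:
        return 406, ""  # resource exists, but not in this representation
    return 200, REPRESENTATIONS[media_type](RESOURCES[url])
```

The client only needs to know the identifier and a media type it understands. How the resource is stored behind the URL stays hidden, which is what lets very different systems interoperate.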

SnapLogic builds on this by turning data sources into standard web resources. With SnapLogic you configure a server to extract data from a data source like a file or database and transform the data into the form you want. The server presents the data source as a standard web resource with a URL. These servers are the building blocks for a data integration application.
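
The general pattern of extracting from a source, transforming, and publishing the result at a URL can be sketched with nothing but the Python standard library. This is an illustration of the idea only, not SnapLogic's API or configuration format. The sample data, the /orders path, the port and the function names are all assumptions made for the example.

```python
import csv
import io
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical data source; in practice this could be a file or a database query.
SOURCE_CSV = "name,amount\nacme,100\nglobex,250\n"


def extract(text):
    """Read CSV text into a list of dicts, one per data row."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Reshape the rows: rename a field, upper-case names, parse amounts as ints."""
    return [{"customer": r["name"].upper(), "amount": int(r["amount"])} for r in rows]


class ResourceHandler(BaseHTTPRequestHandler):
    """Serves the transformed data source as a JSON resource at /orders."""

    def do_GET(self):
        if self.path != "/orders":
            self.send_error(404)
            return
        body = json.dumps(transform(extract(SOURCE_CSV))).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8000):
    """Publish the resource on localhost; blocks until interrupted."""
    HTTPServer(("localhost", port), ResourceHandler).serve_forever()
```

Calling serve() and then fetching http://localhost:8000/orders with any web client returns the transformed rows as JSON. A second server of the same shape could read that URL as its own data source, which is the sense in which such servers act as building blocks.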
