Saturday, October 11, 2008

It's Called Risk, Have You Heard Of It?

Senator Phil Gramm famously called us a "nation of whiners" and he may be right. (Note: while I try to keep this blog about technology, the financial system seems to be so badly broken that it is worthy of a comment or two.) I recently ran across a blog post on a financial site called "Our Timid Government Is Killing Us" by Michael Kao, CEO of Akanthos Capital Management. In it he complains about four things that the Government has not done to help resolve the financial crisis. I want to concentrate on one of them here: "Problem No. 2: Lehman's bankruptcy has severely eroded confidence between counterparties."

The problem is this. Over the last few weeks, financial institutions have become unwilling to trust one another, and with good cause. The issue is Credit Default Swaps. This is a 60 Trillion dollar market (that is Trillion with a capital T) where financial institutions like banks and hedge funds (the parties) buy and sell insurance policies on bonds. The "This American Life" radio program and podcast has a very good and understandable explanation of the market and how it came to be.

There are two important things to understand about the credit default swap market. Firstly, it is completely unregulated. Senator Phil Gramm tacked a clause keeping the market unregulated onto an appropriations bill in 2000, which the Senate approved by 95 votes to 0. Secondly, the market is not transparent; that is, the various parties to the market do not know what any other player's position is. Note that these two features are exactly what the parties in the market wanted. There had been a great outcry for reducing financial regulation in the preceding years.

Lack of transparency was not a problem until Lehman Brothers went bankrupt. They were a big player in the credit default swap market, and now all their credit default swap insurance policies are frozen by the bankruptcy. Anyone who sold a credit default swap policy and then laid off the bet by buying an equivalent credit default swap from Lehman Brothers is now on the hook to pay off the insurance policy on the bond, without the compensation of being able to get Lehman Brothers to make good on their policy.

The lack of transparency means that nobody knows for sure about anyone else in the market. That is, anyone could go bankrupt tomorrow because they bought credit default swaps from a bankrupt company like Lehman Brothers and so cannot make good on the credit default swaps that they themselves have sold. Already AIG has needed a huge injection of government money to stay afloat, and others may be suffering as well. But no one knows what positions anyone else holds, so everyone is conserving their cash rather than lending it out, lest they lose it when a counterparty goes bankrupt. Thus are the credit markets constipated.

A final problem is that, because the market is unregulated, there are no capital requirements to back up a bet. I can sell a credit default swap insurance policy based on nothing but my good name. I immediately get a large sum of money, which I can register as a profit. It is only later that I have to worry about the bond that I have insured actually defaulting (what are the chances of that?). This is how the market got to be 60 Trillion dollars in size.
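To make the mechanics concrete, here is a toy Python sketch with entirely made-up numbers. It is not a pricing model; it just shows why, with no capital requirement, the seller's booked profit and the market's notional size can grow with nothing behind them.

```python
# A toy illustration with invented numbers (not a pricing model): why
# unbacked credit default swaps balloon. The seller books the premium as
# immediate profit and holds no capital against the contingent payout.

notional = 10_000_000      # face value of the bond being "insured"
premium_rate = 0.02        # hypothetical premium: 2% of notional, up front

cash = 0.0
contingent_liability = 0.0

# Sell the same kind of policy ten times; nothing stops us, because no
# regulator checks that we can actually cover the payouts.
for _ in range(10):
    cash += notional * premium_rate   # booked as profit today
    contingent_liability += notional  # owed only if the bond defaults

print(f"cash booked as profit: ${cash:,.0f}")
print(f"owed if bonds default: ${contingent_liability:,.0f}")
# cash booked as profit: $2,000,000
# owed if bonds default: $100,000,000
```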

The underlying issue is this. There is a large risk in trading in unregulated markets. The risk is made larger if the market is not transparent, because if one of the parties to the market goes bust, nobody knows what their position is worth. These risks were not recognized in the credit default swap market, and policies were sold at far too low a price to cover them. If the market were regulated, like other markets are, these risks would not be there and the market could deal with routine events like the bankruptcy of a player.

Finally, there is a risk to the nation in allowing an unregulated market to balloon to the size that the credit default swap market has reached. Bankruptcies happen all the time. The fact that the bankruptcy of one player has caused the entire financial marketplace to go into a swoon is bad for the nation. The players in the credit default swap market asked for an unregulated market and they got what they asked for. Now the risk of having an unregulated market has shown itself and, as Senator Gramm tells them, they should deal with it and stop whining.

I am not an apologist; I am a technologist who is interested in how things work.

Thursday, October 02, 2008

Articulate UML Modeling

Last week, Leon Starr spoke to the SDForum SAM SIG on "Articulate UML Modeling". Leon is an avid modeler and has been using UML for modeling software systems since it was first defined. He believes in building executable models and I applaud him for that. The very act of making something executable ensures that it is in some sense complete and free from many definitional errors. Executing the model allows it to be tested.
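To illustrate what I mean by an executable model (my own sketch, not Leon's, and nothing to do with UML tooling), here is a trivial state machine written as data plus a tiny interpreter in Python. Because it runs, gaps in the definition show up immediately as errors.

```python
# A minimal sketch of the "executable model" idea: the model is plain
# data plus a tiny interpreter, so it can be run and tested long before
# any production code exists. The door example is invented.

# States and transitions for a hypothetical door, declared as data.
transitions = {
    ("closed", "open_cmd"): "open",
    ("open", "close_cmd"): "closed",
    ("closed", "lock_cmd"): "locked",
    ("locked", "unlock_cmd"): "closed",
}

def run(model, start, events):
    """Execute the model: feed it events and return the state it reaches."""
    state = start
    for event in events:
        if (state, event) not in model:
            # Executing the model surfaces definitional gaps immediately.
            raise ValueError(f"no transition for {event!r} in state {state!r}")
        state = model[(state, event)]
    return state

# Run a scenario against the model to test it.
assert run(transitions, "closed", ["open_cmd", "close_cmd", "lock_cmd"]) == "locked"
```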

There are several advantages to building models rather than programs. A big part of many projects is extracting requirements. Unlike a program, a model can describe requirements in a way that a non-technical user can understand and appreciate, so the user can provide feedback. Another advantage of a model is that it does not arbitrarily constrain the order in which things are done. Essentially, a model is asynchronous and captures opportunities for concurrency in its implementation. This struck a chord with me as I am going to speak to the SAM SIG in October on "Models and Patterns for Concurrency".

The other part of the talk that interested me was Leon's attack on building models as controllers. For example, he gave the example of a laser cutting machine. A common way of modeling this is as a laser cutter controller that interprets patterns. He prefers to see it modeled as patterns that describe how they are cut by the laser cutter. Leon's experience is with modeling software to manage physical systems, like the air traffic control example that he used to illustrate his talk. His approach is certainly useful for the understanding and analysis of physical systems; however, I have seen the problem argued both ways. The issue is worth a separate post.

Monday, September 29, 2008

42 Revisited

Last week TechCrunch had a post on the State of The Blogosphere: The More You Post, The Higher You Rank. One statistic is that the top 100 bloggers post on average 310 times a month, which sounds quite exhausting. As you know, I post 42 times a year. I am going to promise to my faithful reader that I will stick to my pace. You will not get an unreadable avalanche of overlapping verbiage from this blog.

If I have not posted much recently, it is because I have spent a lot of time reading blog posts on the financial crisis. It is very entertaining to see these extraordinary events unfold around us. Who would have thought that George W. Bush would be known to future generations as the President who nationalized the American financial services industry?

Wednesday, September 17, 2008

SaaS Data Integration

Data integration is the problem of gathering data, perhaps from many different applications, for the purpose of doing some analysis of the data as a whole. Mike Pittaro, Co-Founder of SnapLogic, spoke to the SDForum Business Intelligence SIG September meeting on "Enhancing SaaS Applications Through Data Integration with SnapLogic".

The big players in data integration are Informatica and Ascential (now IBM Information Integration), who sell large, expensive and complex products. Because of the cost, these products are often not used, particularly for one-off projects, which are common. Mike helped found SnapLogic in 2005 to bring a new perspective to data integration. SnapLogic is an open source framework, and therefore both affordable and extensible by its users.

He showed us the complexity of data integration. It involves dealing with many different access protocols, multiple ways of getting the data, and a separate metadata format describing each type of data. This he contrasted with the World Wide Web, where huge amounts of data are pulled back and forth every day without interoperability problems. There are almost 200 million web sites and billions of users, yet the World Wide Web is completely decentralized, with a heterogeneous model that allows for different operating systems, servers, client applications and frameworks, all of them compatible and interoperable.

The World Wide Web is based on open standards and protocols and an architectural principle called REST, which stands for REpresentational State Transfer. REST deals in data resources, transferred in standardized representations, with each resource identified by a unique identifier such as a URL.

SnapLogic builds on this by turning data sources into standard web resources. With SnapLogic you configure a server to extract data from a data source, like a file or database, and transform the data into the form you want. The server presents the data source as a standard web resource with a URL. These servers are the building blocks for a data integration application.
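To show what that buys you, here is a sketch of the consuming side in Python. The host name, path and format parameter are all hypothetical, not SnapLogic's actual interface; the point is that once a data source is a web resource, any HTTP client can be the next block in the pipeline.

```python
# A sketch of consuming a data source that has been exposed as a web
# resource. The URL and the ?format=csv parameter are invented for
# illustration; substitute whatever the integration server publishes.

import csv
import urllib.request

url = "http://integration.example.com/feeds/customers?format=csv"

# Fetch the resource like any other web page.
with urllib.request.urlopen(url) as response:
    text = response.read().decode("utf-8")

# Because the payload is a standard representation (CSV here), ordinary
# client-side tools can parse it with no custom connector.
for row in csv.DictReader(text.splitlines()):
    print(row)
```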

Thursday, September 04, 2008

Chrome

On Tuesday, Google announced their new browser Chrome. Although it has generated huge discussions in various forums and an astonishing adoption rate, I am not going to rush to use it. In fact, I think I will wait until it is out of beta before considering whether to adopt it. That should give me many years before I have to even think about making a change!

Wednesday, September 03, 2008

A Tale of Two Search Engines

At the SDForum Software Architecture and Modeling (SAM) SIG last week, John D. Mitchel, Mad Scientist at MarkMail and previously Chief Architect of Krugle, talked about the architectures of the search engines that he has built for these two companies.

Krugle is a search engine for code and all related programming artifacts. The public engine indexes all the open source software repositories on the web. This system was built a couple of years ago with cheap, off-the-shelf commodity hardware, open source software and Network Attached Storage (NAS). In total it has about 150 computer systems in its clusters. The major software components are Lucene (search engine), Nutch (web crawling, etc.) and Hadoop (distributed file system and map-reduce). These are all Open Source projects written in Java and sponsored by the Apache Software Foundation. Krugle sells an enterprise edition that can index and make available all the source code in an enterprise.
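For anyone unfamiliar with map-reduce, here is a toy sketch of the programming model in Python, squeezed into a single process. Hadoop runs the same phases distributed across a cluster of machines; this shows only the idea, not Hadoop's API.

```python
# A single-process sketch of the map-reduce model: word counting.
# Hadoop distributes the same three phases across a cluster.

from collections import defaultdict

documents = ["grep the logs", "index the logs", "search the index"]

# Map phase: each document independently emits (key, value) pairs,
# which is why this step can run in parallel across machines.
pairs = [(word, 1) for doc in documents for word in doc.split()]

# Shuffle phase: group all values by key.
grouped = defaultdict(list)
for word, count in pairs:
    grouped[word].append(count)

# Reduce phase: combine the values for each key.
counts = {word: sum(values) for word, values in grouped.items()}
print(counts)  # {'grep': 1, 'the': 3, 'logs': 2, 'index': 2, 'search': 1}
```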

MarkMail is a search engine for email. It indexes all public mailing lists and is a technology demonstrator for the MarkLogic XML Content Server. MarkMail is built with newer, more capable hardware. It uses a Storage Area Network (SAN) for storage, which offers higher performance than NAS at a greater cost. The MarkMail search system is built on about 60 computer systems in its clusters.

Saturday, August 16, 2008

Windows Woes

For years it seemed like a good idea: Microsoft produced the software and many vendors sold compatible hardware. Competition kept the hardware innovation flowing and prices low. Then Microsoft turned into a big bloated monopoly that could not create a decent product if it tried. Moreover, Microsoft is not really in control; it is itself hostage to other interests. The result is a horrible user experience. Here are a couple of my recent experiences.

A few months ago I bought a new video card so that I could use the digital input on my monitor. Installing the card was a breeze, and the digital input makes the monitor noticeably sharper. The only problem was that the sound had stopped working. After a couple of hours of scratching my head and vigorous Googling, the problem turned out to have been caused by Hollywood.

The connection between a computer and its display can use HDMI, a digital interconnect standard that transmits both video and audio. This allows a PC to connect to a digital television as well as a simple display. It also allows the video and audio content to be encrypted so that you cannot steal it from your own computer. This was mandated by Hollywood, and Microsoft meekly acquiesced so that it could provide media center software that would display Hollywood movies in high definition.

So after the video card installation, the Windows software assumed that I was going to use the digital audio output on the video card and ignored all other audio output devices. This even though my display does not have any speakers. I had to go into the BIOS and change some low level sound settings so that Windows would allow me to select the sound settings that I had been using before installing the video card. Any time you have to go into the BIOS to change settings, the user experience loses.

More recently my brother and his family came to visit during a tour of California. He wanted to unload all the pictures on his camera's flash card and write them to a CD, as the flash card was full. I suggested the easy way out, visiting Fry's Electronics and buying another flash card, but that seemed more trouble. In practice it would have been much easier.

We downloaded the flash card to my PC. The first difficulty is that you are presented with a list of 6 competing programs that all want to download your pictures. Which one should I use? I know that in practice they are all going to put the pictures in some ridiculous place where you can never find them again (that is the subject of another tirade). I chose the first in the list, which happened to match the brand of the digital camera.

The next problem came when we went into Windows Explorer so that we could drag the pictures to the CD ROM folder. Every time we went into the folder where the pictures were, Explorer exited, saying that it had an unexpected fault. I knew exactly what the problem was because I had seen it before. There were some movie files taken with the digital camera, and Windows has a problem with these movie (.avi) files. For some reason, Explorer tries to open every file in a folder when it enters the folder, even though I had set it to just list the files and not display thumbnails.

The fix was to open a DOS window, navigate to the folder with the files and rename them so that Windows would not think they were media files. I added the extension .tmp to each .avi file by laborious typing. Then it was possible to do the intuitive drag and drop with Explorer to make a CD ROM. Any time you have to resort to using a DOS window to do a straightforward function in Windows, usability has gone out the window.
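If it happens again, I will script the workaround instead of typing each rename by hand. A small Python sketch, with a made-up folder path:

```python
# The same workaround, scripted: add a .tmp extension to every .avi file
# in a folder so that Explorer stops treating them as media files.
# The folder path is invented; point it at wherever the pictures landed.

from pathlib import Path

folder = Path(r"C:\photos\holiday")

for avi in folder.glob("*.avi"):
    # holiday.avi -> holiday.avi.tmp; strip the .tmp to rename them back
    avi.rename(avi.with_name(avi.name + ".tmp"))
```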

I could go on (as I have in the past); there have been more problems. However, with each problem the Apple alternative looks better. Apple is by no means perfect, but the Apple OS is built on a better foundation, and the changes that it makes when it comes out with a new version are both useful and innovative.

Thursday, July 17, 2008

A Gentle Introduction to R

We were given a gentle introduction to the R statistical programming language and its application in Business Intelligence at the July meeting of the SDForum Business Intelligence SIG. The speakers were Jim Porzac (Senior Director of Analytics at Responsys) and Michael Driscoll (Principal at Dataspora). Jim has posted the presentation here.

R is an Open Source project that uses the GNU license. It has a growing user base with a strong support community and a user group (called the UseR Group - try Googling that). There are now almost 1500 packages for the language, supporting various statistical techniques and specialized application areas. Packages include Bayesian, Econometrics, Genetics, Machine Learning, Natural Language Processing, Pharmacokinetics and Psychometrics, which gives some idea of the range of subjects and techniques that R covers.

Jim did most of the talking, introducing the language and showing us some examples of its use. One example is his data quality package, which he uses on each new dataset that he receives for analysis at Responsys. Another example showed off R's reporting capabilities, while a third showed the sophisticated graphs and plots used for customer segmentation analysis. Michael showed us how he used R to do some interesting and very practical analyses of baseball statistics.

The audience probed R's strengths and weaknesses. R has the connectivity to get data for analysis from databases and other sources. R also has excellent graphing and reporting capabilities. Currently R works by reading data into memory, where it is manipulated, which limits the maximum size of data set that can be analyzed to the multi-gigabyte range.

One person asked for a comparison with SAS. R has the advantages of being free, with an enthusiastic user base that keeps it on the cutting edge. Also, R is a more coherent language than SAS, which is a collection of libraries, each of which may be very good but which do not necessarily make a whole.

Jim and Michael are starting a Bay Area chapter of the UseR Group. If you are interested, contact Jim Porzac at Responsys.

Wednesday, July 09, 2008

Social Search

The SDForum Search SIG pulled together an A-List panel for their July meeting on Social Search. Moderator Safa Rashtchy hosted Bret Taylor of FriendFeed, Ari Steinberg of Facebook, Jason Calacanis of Mahalo and Jeremie Miller of Wikia Search. Of the panelists, Jason Calacanis had the most to say, was arguably the most interesting and was definitely the most opinionated. He also recorded the event with the camera in his MacBook Air. Valleywag has a better and more concise video excerpt of Jason in action, shaded by their desire to capture controversy.

Facebook and FriendFeed are working on automated search within their social networks, while Mahalo and Wikia Search are working on improving general search by using people to curate the results. Mahalo is paying people, while Wikia Search is trying to use the Wikipedia model of free community involvement.

Most of the audience questions to the panel were about their business models and monetization. I tried to get into technicalities by asking a question about search quality; there was a question on privacy; and one audience member argued that none of the panelists' companies were doing social search as he defined it.

Saturday, June 21, 2008

Master Data Management - What, Why, How, Who?

I got two interesting things out of Ramon Chen's talk on Master Data Management (MDM) at the SDForum Business Intelligence SIG June meeting. Ramon is VP of Product Marketing at Siperian. The first is the notion of Data Governance and, as part of governance, the emerging role of the Data Steward. The second is that the big enterprise software vendors are circling.

Large organizations, companies and governments collect vast amounts of information, and Data Governance is the process of looking after that data. First is the problem of cataloging all the data that the organization has. Next, there may be different versions of the same data that need to be reconciled, and the quality of the data needs to be ensured. Finally, there is the question of deciding who has access to different parts of the data and ensuring that it is correctly secured. A Data Steward is a person who is responsible for some part of the data.

Ramon had some specific examples of problems with data. One is in the medical field, where gifts to doctors are highly regulated. The problem is in identifying a specific doctor, particularly where a father and son with similar names share a practice, which is not uncommon. Another problem is security. Siperian has implemented security down to the cell level to ensure that each user can only see the data that they are allowed to look at.
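To see why the doctor example is hard, here is a toy sketch in Python of the matching problem. This is my own illustration with invented records, not Siperian's algorithm: name similarity alone cannot tell a father and son apart, so the match has to bring in other attributes.

```python
# A toy sketch of the identity resolution problem, with invented records.
# Name similarity alone scores a father and son as nearly identical, so
# the match must bring in another attribute such as date of birth.

from difflib import SequenceMatcher

father = {"name": "John A. Smith, MD",    "dob": "1948-03-02"}
son    = {"name": "John A. Smith Jr, MD", "dob": "1975-07-19"}

def similarity(a, b):
    """Crude string similarity in [0, 1]."""
    return SequenceMatcher(None, a, b).ratio()

name_score = similarity(father["name"], son["name"])
print(f"name similarity: {name_score:.2f}")  # roughly 0.92: a near match

# The tie breaker: treat records as the same person only when the names
# match closely AND the birth dates agree.
same_person = name_score > 0.9 and father["dob"] == son["dob"]
print(f"same person: {same_person}")         # False: father and son
```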

Ramon also described how MDM software vendors are being consolidated by data providers and the big enterprise software vendors. For example, Purisma, which presented to the BI SIG a couple of years ago, was bought by Dun & Bradstreet last year. IBM has been particularly active in buying small MDM-related software vendors; however, SAP, Microsoft and Oracle have also bought companies in this area recently.