Ari Steinberg and Charlie Cheever of Facebook gave a fascinating talk about the design of the Facebook platform when they spoke on Tuesday to the SDForum Web Services SIG. As might be expected, the presentation drew a large audience, including renowned Facebook fanboy Dave Maclure.
The Facebook platform allows outside developers to build applications and offer them to Facebook users. These applications can reach into Facebook data to do their work, so security and privacy concerns are paramount. Also, with Facebook approaching 50 million users and reportedly 85,000 applications, performance and scaling are big concerns.
Ari spoke first, on the Facebook API. After looking at other Web 2.0 APIs and finding them wanting, the Facebook team decided on a query language approach. This gives applications much greater flexibility, and it can also be more efficient, because only the data that is actually needed gets fetched. The language, FBQL, is basically a simplified and restricted version of SQL.
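To make the "restricted SQL" idea concrete, here is a toy evaluator in Python that accepts exactly one statement shape and returns only the columns the caller asks for. It is not FBQL: the grammar, the `user` table, its columns and the `run_query` function are all hypothetical, chosen only to show how a small query language both limits what can be asked and avoids shipping unneeded data.

```python
import re

# Hypothetical schema: each table lists the columns it exposes plus its rows.
# "email" is stored but deliberately not queryable, showing how a restricted
# language can enforce privacy rules.
TABLES = {
    "user": {
        "columns": {"uid", "name", "birthday"},
        "rows": [
            {"uid": 1, "name": "Alice", "birthday": "1980-01-01", "email": "alice@example.com"},
            {"uid": 2, "name": "Bob", "birthday": "1975-06-15", "email": "bob@example.com"},
        ],
    }
}

# The only accepted shape: SELECT col[, col...] FROM table WHERE col = integer
QUERY_RE = re.compile(
    r"^\s*SELECT\s+(?P<fields>\w+(?:\s*,\s*\w+)*)\s+"
    r"FROM\s+(?P<table>\w+)\s+"
    r"WHERE\s+(?P<key>\w+)\s*=\s*(?P<value>\d+)\s*$",
    re.IGNORECASE,
)


def run_query(query):
    """Evaluate one restricted query and return only the requested columns."""
    match = QUERY_RE.match(query)
    if match is None:
        raise ValueError(f"unsupported query: {query!r}")
    table = TABLES.get(match.group("table").lower())
    if table is None:
        raise ValueError(f"unknown table: {match.group('table')!r}")
    fields = [f.strip().lower() for f in match.group("fields").split(",")]
    key = match.group("key").lower()
    value = int(match.group("value"))
    # Reject any column the table does not expose, including in the WHERE clause.
    for column in fields + [key]:
        if column not in table["columns"]:
            raise ValueError(f"column not queryable: {column!r}")
    return [
        {f: row[f] for f in fields}  # projection: nothing unrequested is returned
        for row in table["rows"]
        if row[key] == value
    ]
```

For example, `run_query("SELECT name FROM user WHERE uid = 1")` returns `[{'name': 'Alice'}]`, while asking for `email` or sending a `DELETE` raises `ValueError`.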
The linguistic approach is also used in Facebook Markup Language (FBML), the language that applications and users use to define and customize their pages. As Charlie told us, it is much easier to validate and sanitize markup and style sheets from a parse tree than with regular expressions or any other technique. The result is also much less likely to have the security loopholes that seem to plague so many Web 2.0 sites.
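A minimal sketch of parser-based sanitizing, using Python's standard `html.parser` module. This is not FBML's sanitizer: the allowed tags, the allowed attributes and the `sanitize` function are assumptions for illustration. `HTMLParser` is a tokenizing parser rather than a full tree builder, but it shows the point Charlie made: decisions are made on recognized tags and attributes, and the output is rebuilt from them, instead of regular expressions being run over raw text.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical whitelist; anything not listed is dropped.
ALLOWED_TAGS = {"b", "i", "p", "a", "ul", "li"}
ALLOWED_ATTRS = {"a": {"href"}}
DROP_CONTENT_TAGS = {"script", "style"}  # remove these elements and their contents


class WhitelistSanitizer(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0  # > 0 while inside a <script> or <style> element

    def handle_starttag(self, tag, attrs):
        if tag in DROP_CONTENT_TAGS:
            self.skip_depth += 1
            return
        if self.skip_depth or tag not in ALLOWED_TAGS:
            return
        kept = []
        for name, value in attrs:
            if value is None or name not in ALLOWED_ATTRS.get(tag, set()):
                continue  # drops event handlers such as onclick
            if name == "href" and not value.lower().startswith(("http://", "https://")):
                continue  # drops javascript: and other non-web URLs
            kept.append(f' {name}="{escape(value, quote=True)}"')
        self.out.append(f"<{tag}{''.join(kept)}>")

    def handle_endtag(self, tag):
        if tag in DROP_CONTENT_TAGS:
            self.skip_depth = max(0, self.skip_depth - 1)
            return
        if not self.skip_depth and tag in ALLOWED_TAGS:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(escape(data, quote=False))  # re-escape text on output


def sanitize(markup):
    parser = WhitelistSanitizer()
    parser.feed(markup)
    parser.close()
    return "".join(parser.out)
```

For instance, `sanitize('<p onclick="evil()">Hi <b>there</b><script>alert(1)</script></p>')` returns `'<p>Hi <b>there</b></p>'`: the event handler attribute and the script are gone because they were never on the whitelist, not because a pattern happened to match them.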
Thursday, October 25, 2007
Sunday, October 21, 2007
OLAP with Mondrian
I got three things out of Julian Hyde's talk at the SDForum SIG last week. Julian is the lead developer of the Open Source Mondrian project, and his talk was entitled "Building scalable OLAP applications with Mondrian and Pentaho".
The first thing is the nature of Mondrian. While most OLAP servers are database-like servers that persistently store the OLAP cubes, Mondrian is a cache. At the back end, Mondrian uses JDBC to fetch persistent data from a database. At the front end, Mondrian accepts queries in the MDX language and replies to them with data cubes. In the middle, Mondrian caches data cubes in memory so that it can respond to queries faster. Mondrian is a component: it is packaged as a single Java .jar file, and its schema is defined by an XML file. Mondrian needs both a back-end database and a front end, either something like JPivot that can display OLAP cubes or an XML/A driver that can relay MDX queries to and from a remote front end.
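The "OLAP server as a cache" idea can be sketched as a read-through cache of aggregated cells. This toy is in Python, whereas Mondrian is Java and far more elaborate; the fact table, `sum_sales_from_db` (a stand-in for a SQL aggregate issued over JDBC) and `ReadThroughCellCache` are hypothetical names, not Mondrian classes.

```python
from typing import Callable, Dict, Tuple

CellKey = Tuple[int, str]  # (year, region); region "All" means every region

# Hypothetical fact table standing in for the relational back end: (year, region, sales)
FACT_ROWS = [
    (2006, "West", 100.0),
    (2006, "East", 80.0),
    (2007, "West", 120.0),
]


def sum_sales_from_db(key: CellKey) -> float:
    """Stand-in for an aggregate SQL query sent to the database."""
    year, region = key
    return sum(
        sales for y, r, sales in FACT_ROWS
        if y == year and (region == "All" or r == region)
    )


class ReadThroughCellCache:
    """Toy in-memory cache of aggregated cells between queries and the database."""

    def __init__(self, fetch: Callable[[CellKey], float]):
        self._fetch = fetch
        self._cells: Dict[CellKey, float] = {}
        self.db_queries = 0  # how many times the back end had to be consulted

    def get(self, key: CellKey) -> float:
        if key not in self._cells:  # cache miss: aggregate from the database once
            self.db_queries += 1
            self._cells[key] = self._fetch(key)
        return self._cells[key]


cache = ReadThroughCellCache(sum_sales_from_db)
print(cache.get((2006, "All")))  # 180.0, computed from the fact rows
print(cache.get((2006, "All")))  # 180.0 again, served from memory
print(cache.db_queries)          # 1
```

The cache owns no data of its own; everything it holds can be rebuilt from the database, which is what lets it sit in front of any JDBC source.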
This brings us to the second thing. While there have been several attempts to create a standard interface to OLAP servers, none has really succeeded. Julian has spearheaded a new Open Source initiative called olap4j. The concept is that olap4j will be the JDBC of OLAP: it is based on JDBC and uses the MDX language to express OLAP queries. MDX is the universal language of OLAP, just as SQL is the universal language of relational databases. The only curiosity is that MDX is owned by Microsoft, so other implementations of MDX, like Mondrian, have to scramble to match each Microsoft release. I think it is about time that MDX was managed by an independent standards committee.
Finally, Julian talked about a "real-time" feature that he recently added to Mondrian. This is a cache control API that allows you to invalidate parts of the Mondrian cache when new data is loaded into the underlying database. Once part of the cache has been invalidated, new queries fetch the new data and display up-to-date results. Initially the example looked complicated, and the audience reacted with some skepticism. Looking back at the example in the presentation, it seems less forbidding; however, the long-term goal should be to manage the OLAP cache automatically from an ETL tool. When pressed, Julian told us that the API is being used, for example, in applications for securities analysis.
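Partial cache invalidation can be illustrated with a toy cache that flushes only the cells falling in a given region. This is not Mondrian's cache control API, whose classes and calls differ; `FlushableCellCache`, `flush_region` and the fact rows are hypothetical, and a plain predicate over cell keys stands in for however Mondrian actually describes a region.

```python
from typing import Callable, Dict, List, Tuple

CellKey = Tuple[int, str]  # (year, region); region "All" aggregates every region

# Hypothetical fact table: (year, region, sales). An ETL job may append rows.
fact_rows: List[Tuple[int, str, float]] = [
    (2006, "West", 100.0),
    (2006, "East", 80.0),
    (2007, "West", 120.0),
]


def sum_sales(key: CellKey) -> float:
    """Stand-in for an aggregate SQL query against the fact table."""
    year, region = key
    return sum(s for y, r, s in fact_rows
               if y == year and (region == "All" or r == region))


class FlushableCellCache:
    def __init__(self, fetch: Callable[[CellKey], float]):
        self._fetch = fetch
        self._cells: Dict[CellKey, float] = {}

    def get(self, key: CellKey) -> float:
        if key not in self._cells:  # miss: go to the database
            self._cells[key] = self._fetch(key)
        return self._cells[key]

    def flush_region(self, in_region: Callable[[CellKey], bool]) -> int:
        """Drop every cached cell whose key is in the region; return how many were dropped."""
        stale = [key for key in self._cells if in_region(key)]
        for key in stale:
            del self._cells[key]
        return len(stale)


cache = FlushableCellCache(sum_sales)
cache.get((2006, "All"))                # 180.0, now cached
print(cache.get((2007, "All")))         # 120.0, now cached

fact_rows.append((2007, "East", 50.0))  # new data loaded into the database
print(cache.get((2007, "All")))         # still 120.0: the cache is stale

dropped = cache.flush_region(lambda key: key[0] == 2007)  # invalidate year 2007 only
print(dropped)                          # 1
print(cache.get((2007, "All")))         # 170.0, refetched and up to date
```

Flushing only the 2007 region keeps the 2006 cells warm. An ETL tool that knows which slice it just loaded could make the `flush_region` call itself, which is the kind of automation argued for above.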
All in all, it was a satisfyingly informative evening.
Thursday, October 04, 2007
The Future of Music Revisited
Michael Arrington has a great post on TechCrunch, "The Inevitable March of Recorded Music Towards Free". Most interesting are both the sheer number of responses and the highly negative tone of some of them. I wrote about this subject a couple of years ago, under the title "The Future of Music", partly inspired by an article in Wired about MySpace.
I completely agree with Michael Arrington, and I am surprised at the reaction of some performers. I would expect a performer to want to escape the clutches of their record company, which seems to feel that it has the right to keep 90% of its artists' earnings. The fact that groups like Radiohead are escaping from their contracts and plan to go free shows the way forward.
It is worth remembering that the recording business is relatively recent. Big recording companies have existed for only 60 to 70 years. They came into their position of power by controlling the means of production, first records and then CDs. Now that they are no longer needed to reproduce music, they are irrelevant and should just fade away.