Author Archives: Ted Leung

Surge 2011

Last week I was in Baltimore attending OmniTI’s Surge Conference. I can’t remember exactly when I first met OmniTI CEO Theo Schlossnagle, but it was at an ApacheCon after he had delivered one of his 3 hour tutorials on Scalable Internet Architectures, back in the early 2000’s. Theo’s been at this scalability business for a long time, and I was sad to have missed the first Surge, which was held last year.

Talks

Ben Fried, Google’s CIO started the conference (and one of the major themes) with a “disaster porn” talk. He described a system that he built in a previous life, for a major wall street company. The system had to be very scalable to accommodate the needs of traders. One day, the system started failing, and ended up costing his employer a significant amount of money. In the ensuing effort to get the system working again, he ended up with all the people from the various specializations (development, operations, networking, etc) all stuck in a very large room with a lot of whiteboards. It turned out that no one really understood how the entire system worked, and that issues at the boundaries of the specialties were causing many of the problems. The way that they had scaled up their organization was to specialize, but that specialization caused them to lose an end to end view of the system. Their organization of their people had led to some of the problems they were experiencing, and was impeding their ability to solve the problems. The quote that I most remember was “specialization is an industrial age notion and needs to be discounted in spaces where we operate at the boundary of the known versus unknown”. The lessons that Fried learned on that project have influenced the way that Google works (Site Reliability Engineers as an example), and are similar to the ideas being espoused by the “DevOps” movement. His description of the solution was to “reward and recognize generalist skill and end to end knowledge”. There was a pretty lively Q&A around this notion of generalists.

Mark Imbriaco’s talk was titled “Anatomy of a Failure” in the program, but he actually presented a very detailed account of how Heroku responds to incidents. My background isn’t in operations, so I found this to be pretty interesting and useful. I particularly liked the idea of playbooks to be followed when incidents occur, and that alert messages actually contain links to the necessary playbooks. The best quote from Mark’s talk was probably “Automation is also a great way to distribute failure across an entire system”.

Raymond Blum presented the third of three Google talks that were shoe horned into a single session. He described the kind of problems involved in doing backups at Google scale. Backup is one of those problems that needs to be solved, but is mostly unglamourous. Unless you are Google, that is. Blum talked about how they actually read their backup tapes to be sure that they work, their strategy of backing up to data centers in different geographies, and clever usage of map reduce to parallelize the backup and restore process. He cited the Gmail outage earlier this year as a way of grasping the scale of the problem of backing up a service like GMail, much less all of Google. One way to know if a talk succeeds is if it provokes thoughts. Based on my conversations with other attendees, this one succeeded.

David Pacheco and Bryan Cantrill talked about “Realtime Cloud Analytics with Node.js”. This work is an analog of the work that they did on the analytics for the “Fishworks”/Sun Storage 7000 products, except instead of measuring a storage appliance, they are doing analytics for Joyent’s cloud offering. This is basically a system which talks to DTrace on every machine, and then reports the requested metrics to an analytics service once a second. The most interesting part of the talk was listening to two guys who are hard core C programmers / kernel developers walk us through their decision to write the system in Javascript on Node.js instead of using C. They also discussed the areas where they expected there to be performance problems, and were surprised when those problems never appeared. When it came time for the demo, it was quite funny to see one of the inventors of DTrace being publicly nervous about running DTrace on every machine in the Joyent public cloud. “Automation is also a great way to distribute failure across an entire system”. But everything was fine, and people were impressed with the analytics.

Fellow ASF member Geir Magnusson’s talk was named “When Business Models Attack”. The title alludes to the two systems that Geir described, both of which are designed specifically to handle extreme numbers of users. Geir was the VP of Platform and Architecture at Gilt Groupe, and one description of their model is that every day at Noon is Black Friday. So the Gilt system has to count on handling peak numbers of users every day at a particular time. Geir’s new employer, Function(x), also has a business model that depends on large numbers of users. The challenge is to design systems that will handle big usage spikes as a matter of course, not as a rarity. One of architectures that Geir described involved writing data into a Riak cluster in order to absorb the write traffic, and then using a Node.js based process to do a “write-behind” of that data into a relational database.

Takeaways

There were several technology themes that I encountered during the course of the 2 days:

Many of the talks that I attended involved the use of some kind of messaging system (most frequently RabbitMQ). Messaging is an important component in connecting systems that are operating a different rates, which is frequently the case in systems operating at high scale.
Many people are using Amazon EC2, and liking it, but there were a lot of jokes about the reliability of EC2.
I was surprised by how many people appear to be using Node.js. This is not a Javascript or dynamic language oriented community. There’s an inclination towards C, systems programming, and systems administration. Hardly an audience where you’d expect to see lots of Node usage, but I think that it’s notable that Node is finding some uptake.

One thing that I especially liked about Surge was the focus on learning from failure, otherwise known as a “fascination with disaster porn”. Most of the time you only hear about things that worked, but hearing about what didn’t work is at least as instructive, and in some case more instructive. This is something that (thus far) is unique to Surge.

W3C Web and TV Workshop

Thanks, Steve

OSCON 2011

1 Reply

I’m sitting on an Amtrak train at the end of July, which can only mean that OSCON has just finished up.

This year OSCON had something new, a pair of extra conferences, one on Data and one on Java, that overlapped the usual OSCON tutorial days. This year, I’m going to break the talk coverage down by conference.

OSCON Data

For purely technical content, OSCON Data was the winner for me. There were two talks which stuck out.

The first was Tom Wilkie’s talk on Castle. I first became aware of Castle at South by Southwest, when a persistent Manu Marchal found me on Twitter and arranged a meeting to explain the technology that Acunu was building to accelerate Cassandra and similar types of storage engines. Earlier this year the Acunu team published some papers describing their work. Those papers are still on the desk in my office, so I figured I could get the overview of the paper content by attending the talk. It’s a pleasure to see a talk focused on fundamental technology work on data structures and algorithms, and Tom’s talk delivered that. Castle is the open source version of their write optimized in kernel storage system. I’m looking foward to hearing field reports.

The other stand out talk was Josh Patterson’s talk Lumberyard: Time series indexing at scale, Time series data is growing in importance, and it’s great to see people working on this problem. Lumberyard uses the iSAX work that was done at UC Riverside. In addition to the time series functionality, Josh demonstrated how a number of seeming unrelated problems like image recognition, could be converted into time series problems which could then be solved with something like Lumberyard. I definitely learned something new.

OSCON Java

I only attended one talk at OSCON Java, and then only briefly. The talks was a keynote on JavaFX. JavaFX was originally positioned as a competitor to Flash/Flex, and then expanded to act as a new GUI framework for desktop Java. In the meantime, the world has moved. The iPad is casting doubts on the value proposition of Flash, which is an established and broadly adopted technology. Microsoft is backing away from Silverlight, and focusing on HTML5. Even if Oracle wins its lawsuit with Google it’s hard to see how JavaFX would be relevant for the device world. It’s hard to see how there is any room for JavaFX.

OSCON

Ariel Waldman was given a keynote slot to talk about Hacking Space Exploration. Space exploration is a topic that resonates with many people that come to OSCON, and given the recent end of the U.S. Space Shuttle program, it was encouraging to hear about all the avenues which are available to those who are interested in space exploration. My middle daughter wants to go to Mars someday, and I’m definitely going to be showing her the video of this talk.

I ended up in Gabe Zicherman’s talk on Gamification by accident. Late last year we sent someone from our group to a gamification workshop and Zicherman was among the speakers. I figured that via summarized analysis I had heard all that I needed to know. Apparently not. The material that was presented was thought provoking, and Zicherman is an effective and entertaining speaker.

Personal information is in my blood, from my time working on Chandler and well before that. When I was at Strata last year, I briefly heard about the Locker project, but I didn’t really get a good sense of what was going on there. Jeremie Miller partially talked about and partially demo’ed the Locker project, and there was also a hackathon during one of the BOF slots. It is very early days still, but if you are interested in getting involved, bootstrapping instructions are here.

Sometimes you find emerging technology in the strangest places. I went to Awakening The Maker Ethic in K-12 Students because it was the only talk in that slot that seemed interesting, because I have kids, and because we’ve pursued an unusual path for their education. In addition to amassing a long reading list on education and technology and education, I learned something interesting about open hardware, Arduino in particular. Arduino has been around for a while, but I’ve really only heard it mentioned in the context of communities that would be the logical extensions of the model rocketeers, the hams, and the heathkit crowd. During this talk I learned that Arduino has crossed over into communities that are focused more on craft and art. Arduino makes it possible for these people to make smart craft items. Some examples of such items are this “reverse geocache” wedding gift box, and a bicycling jacket with turn signal lights embedded in it.

Hallway

I try not to comment on the hallway track, but there’s one conversation that i had which I think merits an mention. Earlier in the year, some mutual friends introduced me to David Eaves, who has been consulting with Mozilla to develop some metrics to help Mozilla improve the way that it manages its community. In April, David previewed some of his work in a blog post. At the ASF various individuals have done small experiments around community metrics, but as far as I can tell David’s work is defining the state of the art in this area. I would love to see the work that he has been doing duplicated in JIRA and Github. This is the kind of work that should be a talk at OSCON, and it’s a shame that we’ll have to wait until next year to hear that talk. In the meantime, read the post, get on the mailing list, and help bring open source community tools into the 21st century.

Meta

I’ve been coming to OSCON since 2003 (I did miss one year), and I always look forward to it. This year OSCON was tough for me. I had a very difficult time finding sessions to go to, especially in OSCON proper. Of the four sessions I found blog worthy, two were about open source, and I only ended up in two of them by accident – I almost went to nothing during those sessions. Ordinarily, that wouldn’t be a problem, because there’s always the hallway track. This year a lot of people that I normally expect to see at OSCON were not there. I did have a good hallway track, but not nearly as rich as normal. As an example, some of us went to celebrate Duncan Davidson’s birthday on Thursday night. In the past, there have been enough people to take over the Vault in Portland. This year, we only needed a single table. For me, that’s a serious impact because of the way that I approach the technology field. When I was a graduate student, my advisor lamented about a bygone era of DARPA funding in which you would go to DARPA and say, “I’m going to do X”. Sometimes you came back with X, but sometimes you came back with Y, but as along as Y was interesting there wouldn’t be a problem. During my era, DARPA apparently got much more serious about you doing what you said you would do. In that bygone era, DARPA funded people. In my era, they funded topics. Topics are important to me, but given the choice between topics and people, I will pick the people every time. I always tell people that the value of OSCON is that it’s the one place where you can get any substantial number of the open source communities together under one roof. I hope that keeps happening.

Google I/O 2011

4 Replies

Google I/O has a different feel than many of the conferences that I attend. Like Apple’s WWDC, there is a distinctly vendor partisan tone to the entire show — having the show in the same location as WWDC probably reinforces that. Unlike WWDC, the web focused portion of Google I/O helps to blunt that feeling, and the fact that lots of things are open or being open sourced also helps with the partisan feeling.

I’m going to split this writeup into two parts, the two keynotes, and the rest of the talks.

Android Keynote

The first keynote was the Android keynote and opened with a recap of Android’s marketplace accomplishments over the last year. The tone was decidedly less combative towards Apple than last year. There weren’t many platform technology announcements. There was the expected discussion of features for the next version of Android, but I didn’t really see much that was new. There was a very nice head tracking demo that involved front facing cameras and OpenGL – I believe this will be a platform feature, which is cool. Much was made of Music and Movies, but this is mostly an end user and business development story. The ability to buy/stream without a cable is nice, but as long as devices need to be plugged in to recharge (which in my case is every day), I don’t find this to be as compelling as those who were clapping loudly. What I did find interesting was the creation of a Council that will specify how quickly devices will be updated to a particular release of Android, and how long a device will be supported. This is pretty much an admission that fragmentation is real and a problem that needs addressing. I hope that it works.

The most interesting announcement during the Android keynote was the open accessories initiative. This is in direct contrast to Apple’s tight control over the iOS device connector. Google’s initiative is based on the open source Arduino hardware platform, and they showed some cool integration with an exercise bike, control over a home made labyrinth board, and some very interesting home automation work. As part of the home automation stuff, they showed an NFC enabled CD package being swiped against a home audio device, which then caused the CD to be loaded into the Google music service. This is cool, but I don’t know if CD’s will be around long enough for NFC enabled packaging to become pervasive. I’m very curious to see how the accessories initiative will play out, especially versus the iOS device connector. If this were to take off, Apple could build support for the specs into future iOS devices, although they would have to swallow their pride first. This will be very interesting to watch.

Chrome Keynote

Day two’s keynote was about Chrome, and the open web, although the focus was on Google’s contributions. Adoption of Chrome is going really nicely – 160M users. There was a demonstration of adding speech input by adding a single attribute to an element (done via the Chrome Developer Tools). Performance got several segments. The obligatory Javascript performance slide when up, showing a 20x improvement since 2008, and the speaker said he hoped to stop showing such slides, claiming that the bottlenecks in the system were now in other parts of the browser. This was a perfect segue to show hardware accelerated CSS transforms as well as hardware accelerated Canvas and WebGL.

I’ve been curious whether the Chrome web store is really a good idea or not, and we got some statistics to ponder. Apparently people spend twice as much time in applications when they are obtained via the web store, and people perform 2.5x the number of transactions. I wish there were some more information on these stats. Of course this is all before in-app purchasing, which was announced, along with a very small 5% cut for Google.

Of course, no discussion of an app store should be without a killer app, so Google brought Rovio onto the stage to announce that Angry Birds is now available for the web, although it’s called Angry Birds for Chrome, and has special levels just for Chrome users. Apparently Chrome’s implementation of Open Web technologies has advanced to the point where doing a no compromises version of Angry Birds is possible. Another indication of how far the Open Web has come is “3D dreams of Black“, which is a cool interactive media piece that is part film, part 3d virtual world. I’m keeping a pretty close eye on the whole HTML5 space, but this piece really shows how the next generation of the web is coming together as a medium all its own.

The final portion of the keynote was about ChromeOS and the notebooks or “Chromebook”s that run it. A lot of the content in this section was a repeat of content from Google’s Chrome Update event in December, but there were a few new things. Google has been hard at work solving some of the usage problems discovered during the CR-48 beta. This includes the trackpad (which was awful), Movies and Music, local file storage, and offline access. The big news for I/O is that Google has decided that ChromeOS is ready to be installed on laptops which will be sold as “Chromebooks”. Samsung and Acer have signed up to manufacture the devices. Google will also rent Chromebooks to businesses ($28/mo per user) and schools ($20/mo per user). This is latest round of the network computer vision, and it’s going to be interesting to see whether the windows of technology readiness and user mindset are overlapping or not. The Chrome team appears to have the best marketing team at Google, and in their classic style, they’ve produced a video which they hope will persuade people of the Chromebook value proposition.

Talks

On to the talks.

“Make the Web Faster” by Richard Rabbat, Joshua Marantz, and Håkon Wium Lie was a double header talk covering mod_pagespeed and WebP. mod_pagespeed is a module for the Apache HTTP server, which speeds up web pages by using filters to rewrite pages and resources before they are delivered to the client. These rewrites are derived from the rules tested by the client side Page Speed tool. The other half of the talk was about WebP which is a new format for images. Microsoft also proposed a new web image format several years ago, but it didn’t go anywhere.

Nick Pelly and Jeff Hamilton presented “How to NFC”. The NFC landscape is complicated and there are lots of options because of hardware types and capabilities. The examples that were shown were reasonably straightforward, but the whole time I found myself thinking that NFC is way more complicated than it should be. Having written device drivers in a previous life, I shouldn’t be surprised, but I still am. It seems obvious to me that the concept of NFC is a great one. The technical end of thing seems tractable, if annoying. The business model issues are still unclear to me. I hope that it all comes together.

I really enjoyed Eric Bidelman and Arne Roomann-Kurrik’s HTML5 Showcase. They showed some neat demos of things that you can do in HTML5. I particularly liked this one using 3D CSS. They also did some entertaining stuff with a command line interface. All of the source code to their demos is available – the link is in the slides.

I wasn’t able to get to Paul Irish’s talk on the Chrome Developer Tools at JSConf – there was quite a bit of Twitter buzz about it. I wasn’t too worried because I knew that the talk would be given again at Google I/O. For this version Paul teamed up with Pavel Feldman. There are a lot of really cool features going into the Chrome Developer tools. My favorite new features are the live editing of CSS and Javascript, revisions, saving modified resources, and remote debugging. The slide deck has pointers to the rest of the new features. If they go much further, they are going to turn the Developer Tools into an IDE (which they said they didn’t want to do).

Ray Cromwell and Phillip Rogers did a talk titled “Kick-ass Game Programming with Google Web Toolkit”, which was a talk about ForPlay, which is a library for writing games that they developed on top of GWT. This is the library that Rovio used to do Angry Birds for Chrome. If you implement your game using GWT, ForPlay can compile your game into an HTML5 version, an Android native app version, a Flash version, and a desktop Java version. They also showed a cool feature where you could modify the code of the game in Eclipse, save it, and then switch to a running instance of the Java version of the game, and see the changes reflected instantly.

Postscript

Google has an undeniably large footprint in the mobile and open web spaces. I/O is a good way to keep abreast of what is happening at the Googleplex.

NodeConf 2011

4 Replies

Although I was definitely interested in JSConf (writeup), Nodeconf was the part of the week that I was really looking forward to. I’ve written a few small prototypes using Node and some networking / web swiss army knife code, so I was really curious to see what people are doing with Node, whether they were running into the same issues that I was, and overall just get a sense of the community.

Talks

Ryan Dahl’s keynote covered the plans for the next version of Node. The next release is focused on Windows, and the majority of the time was spent on the details of how one might implement Node on Windows. Since I’m not a Windows user, that means an entire release with nothing for me (besides bug fixes). At the same time, Ryan acknowledged the need for some kind of multiple Node on a single machine facility, which would appear in a subsequent. I can see the wisdom of making sure that the Windows implementation works well before tackling clustering or whatever it ends up being called. This is the third time I’ve heard Ryan speak, and this week is the first time I’ve spent any time talking with him directly. Despite all the hype swirling around Node, Ryan is quiet, humble, and focused on making a really good piece of software.

Guillermo Rauch talked about Socket.io, giving an overview of features and talking about what is coming next. Realtime apps and devices are a big part of my interest in Node, and Socket.io is providing an important piece of functionality towards that goal.

Henrik Joreteg’s talk was about Building Realtime Single Page applications, again in the sweet spot of my interest in Node. Henrik has built a framework called Capsule which combines Socket.io and Backbone.js to do real time synchronization of model states between the client and server. I’m not sure I believe the scalability story as far as the single root model, but there’s definitely some interesting stuff in there.

Brendan Eich talked about Mozilla’s SpiderNode project, where they’ve taken Mozilla’s SpiderMonkey Javascript Engine and implemented V8’s API around it as a veneer (V8Monkey) and then plugged that into Node. There are lots of reasons why this might be interesting. Brendan listed some of the reasons in his post. For me, it means a chance to see how some proposed JS.Next features might ease some of the pain of writing large programs in a completely callback oriented style. The generator examples Brendan showed are interesting, and I’d be interested in seeing some larger examples. Pythonistas will rightly claim that the combination of generators and callbacks is a been there / done that idea, but I am happy to see some recognition that callbacks cause pain. There are some other benefits of SpiderMonkey in Node such as access to a new debugging API that is in the works, and (at the moment) the ability to switch runtimes between V8 and SpiderMonkey via a command line switch. I would be fine if Mozilla decided to really take a run at making a “production quality” SpiderNode. Things are still early during this cycle of server side JavaScript, and I think we should be encouraging experimentation rather than consolidation.

One of the things that I’ve enjoyed the most during my brief time with Node is npm, the package management system. npm went 1.0 shortly before NodeConf, so Isaac Schleuter, the primary author of npm, described the changes. When I started using Node I knew that big changes were in the works for npm, so I was using a mix of npm managed packages and linking stuff into the Node search path directly. Now I’m using npm. When I work in Python I’m always using a virtualenv and pip, but I don’t like the fact that those two systems are loosely coupled. I find that npm is doing exactly what I want and I’m both happy and impressed.

I’ve been using Matt Ranney’s node_redis in several of my projects, it has been a good piece of code, so I was interested to hear what he had to say about debugging large node clusters. Most of what he described was pretty standard stuff for working in clustered environments. He did present a trick for using the REPL on a remote system to aid in debugging, but this is a trick that other dynamic language communities have been doing for some time.

Felix Geisendorfer’s talk was titled “How to test Asynchronous Code”. Unfortunately his main points were 1. No I/O (which takes out the asynchrony 2. TDD and 3. Discipline. He admitted in his talk that he was really advocating unit testing and mocking. While this is good and useful, it’s not really serious testing against the asynchronous aspects of the code, and I don’t really know of any way to do good testing of the non-determinism introduced by asynchrony. Felix released several pieces of code, including a test framework, a test runner, and some faking/mocking code.

Charlie Robbins from Nodejitsu talked about Node.js in production, and described some techniques that Nodejitsu uses to manage their hosted Node environment. Many of these techniques are embodied in Haibu, which is the system that Nodejitsu uses to manage their installation. Charlie pushed the button to publish the github repository for Haibu at the end of his talk.

Issues with Node

The last talk of the day was a panel of various Node committers and relevant folks from the broader Node community depending on the question. There were two of the audience questions that I wanted to cover.

The first was what kind of applications is Node.js not good for. The consensus of the panel was you wouldn’t want to use Node for applications involving lots of numeric computation, especially decimal or floating point, and that longer running computations were a bad fit as well. Several people also said that databases (as in implementing a database) were a problem space that Node would be bad at. Despite the hype surrounding Node on Twitter and in the blogosphere, I think that the core members of the Node community are pretty realistic about what Node is good for an where it could be usefully applied.

The second issue had to do with Joyent’s publication of a trademark policy for Node. One of the big Node events in the last year was Joyent’s hiring of Ryan Dahl, and subsequently a few other Node contributors. Joyent is basing its Platform as a Service offering on Node, and is mixing its Node committers with some top notch systems people who used to be at Sun, including some of the founding members of the DTrace team. Joyent has also taken over “ownership” of the Node.js codebase from Ryan Dahl, and that, in combination with the trademark policy is causing concern in the broader Node community.

All things being equal, I would prefer to see Node.js in the hands of a foundation. At the same time, I understand Joyent’s desire to try and make money from Node. I know a number of people at Joyent personally, and I have no reason to suspect their motives. However, with the backdrop of Oracle’s acquisition of Sun, and the way that Oracle is handling Sun’s open source projects, I think that it’s perfectly reasonable to have questions about Joyent or any other company “owning” an open source project. Let’s look at the ways that an open source project is controlled. There’s 1) licensing 2) intellectual property/patents 3) trademarks 4) governance. Now, taking them one at a time:

Licensing – Node.JS is licensed under the MIT license. There are no viral/reciprocal terms to prevent forking (or taking a fork private). Unfortunately, there are no patent provisions in the MIT license. This applies to #2 below. The MIT license is one of the most liberal licenses around – it’s hard to see anything nefarious in its selection, and forking as a nuclear option in the case of bad behavior by Joyent or an acquirer is not a problem. This is the same whether Node is at a foundation or at Joyent.
Intellectual Property – Code which is contributed to Node is governed by the Node Contributor License Agreement, which appears to be partially derived from the Apache Individual and Corporate Contributor license agreements (Joyent’s provision of an on-line form is something that I wish the ASF would adopt – we are living in the 21st century after all). Contributed IP is licensed to Node, but the copyright is not assigned as in the case of the FSF. Since all contributors retain their rights to their contributions, the IP should be clean. The only hitch would be if Joyent’s contributions were not licensed back on these terms as well, but given the use of the MIT license for the entire codebase, I don’ think that’s the case. As far as I can tell, there isn’t much difference between having Node at a foundation or having it at Joyent.
Trademark – Trademark law is misunderstood by lots of people, and the decision to obtain a trademark can be a controversial one for an open source project. Whether or not Node.js should have been trademarked is a separate discussion. Given that there will be a trademark for Node.js, what is the difference between having Node at a foundation or at Joyent? Trademark law says that you have to defend your trademark or risk losing it. That applies to foundations as well as for profit companies. The ASF has sent cease and desist letters to companies which are misusing Apache trademarks. The requirement to defend the mark does not change between a non-profit and a for-profit. Joyent’s policy is actually more liberal than the ASF trademark policy. The only difference between a foundation and a company would be the decision to provide a license for use of the trademark as opposed to disallowing a use altogether. If a company or other organization is misusing the Node.js trademark, they will have to either obtain a license or stop using the mark. That’s the same regardless of who owns the mark. What may be different is whether or not a license is granted or usage is forbidden. In the event of acquisition by a company unfriendly to the community, the community would lose the trademarks – see the Hudson/Jenkins situation to see what that scenario looks like.
Governance – Node.js is run on a “benevolent dictator for life” model of governance. Python and Perl are examples of community/foundation based open source projects which have this model of governance. The risk here is that Ryan Dahl is an employee of Joyent, and could be instructed to do things a certain way, which I consider unlikely. I suppose that a foundation you could try to write additional policy about removal of the dictator in catastrophic scenarios, but I’m not aware of any projects that have such a policy. The threat of forking is the other balance to a dictator gone rogue, and aside from the loss of the trademark, there are no substantial roadblocks to a fork if one became necessary.

To riff on the 2010 Web 2.0 Summit, these are the four “points of control” for open source projects. As I said, my first choice would have been a foundation, and for now I can live with the situation as it is, but I am also not a startup trying to use the Node name to help gain visibility.

Final thoughts

On the whole, I was really pleased with Nodeconf. I did pick up some useful information, but more importantly I got some sense of the community / ecosystem, which is really important. While the core engine of Node.js is important, it’s the growth and flourishing of the community and ecosystem that matter the most. As with most things Node, we are still in the early days but thing seem promising.

The best collections of JSConf/NodeConf slides seem to be in gists rather than Lanyrd, so here’s a link to the most up to date one that I could find.

Update: corrected misspelling of Henrik Joreteg’s name. And incorrectly calling Matt Ranney Mark.

JSConf 2011

3 Replies

Last year when I attended JSConf I had some ideas about the importance of Javascript. I was concerned in a generic way about building “richer” applications in the browser and Javascript’s role in building those applications. Additionally, I was interested in the possibility of using Javascript on the server, and was starting to learn about Node.js.

A year later, I have some more refined ideas. The fragmentation of mobile platforms means that open web technologies are the only way to deliver applications across the spectrum of telephones, tables, televisions and what have you, without incurring the pain of multi platform development. The types of applications that are most interesting to me are highly interactive with low latency user interfaces – note that I am intentionally avoiding the use of the word “native”. Demand for these applications is going to raise the bar on the skill sets of web developers. I think that we will see more applications where the bulk of the interface and logic are in the browser, and where the server becomes a REST API endpoint. The architecture of “New Twitter” is in this vein. API endpoints have far less of a need for HTML templating and server side MVC frameworks. But those low latency applications are going mean that servers are doing more asynchronous delivery of data, whether that is via existing Comet like techniques or via Websockets (once it finally stabilizes). Backend systems are going to partition into parts that do asynchronous delivery of data, and other parts which run highly computationally intensive jobs.

I’ll save the discussion of the server parts for my Nodeconf writeup, but now I’m ready to report on JSConf.

Talks

Here are some of the talks that I found interesting or entertaining.

Former OSAF colleague Adam Christian talked about Jellyfish, which is a tool for executing Javascript in a variety of environments from Node to desktop browsers to mobile browsers. One great application for Jellyfish is testing, and Jellyfish sprang out of the work that Adam and others did on Windmill.

It’s been a while since I looked at Bespin/Skywriter/Ace, and I was pleased to see that it seems to be progressing quite nicely. I particularly liked the Github support.

I enjoyed Mary Rose Cook’s account of how writing a 2D platform game using Javascript cause her to have a falling in love like experience with programming. It’s nice to be reminded of the sheer fun and art of making something using code.

Unfortunately I missed Andrew Dupont’s talk on extending built-ins. The talk was widely acclaimed on Twitter, and fortunately the slides are available. More on this (perhaps) once I get some time to read the slide deck.

Mark Headd showed some cool telephony apps built using Node.js including simple control of a web browser via cell phone voice commands or text messages. The code that he used is available, and uses Asterisk, Tropos, Couchbase, and a few other pieces of technology.

Dethe Elze showed of Waterbear, which is a Scratch-like environment running in the browser. It’s not solely targeted at Javascript, which I have mixed feelings about. My girls have done a bunch of Scratch programming, so I am glad to see that environment coming to languages that are more widely used.

The big topics

There were four talks in the areas that am really concerned about, and I missed one of them, which was Rebecca Murphey’s talk on Modern Javascript, which appeared to be derived from some blog posts that she has written on the topic. I think that the problems she is pointing out – ability to modularize, dependency management, and intentional interoperability are going to be major impediments to building large applications in the browser, never mind on the server.

Dave Herman from Mozilla did a presentation on a module system for the next version of Javascript (which people refer to as JS.next). The design looks reasonable to me, and you can actually play with it in Narcissus, Mozilla’s meta circular Javascript interpreter, which is a testbed for JS.next ideas. One thing that’s possible with the design is to run different module environments in the same page, which Dave demonstrated by running Javascript, Coffeescript, and Scheme syntaxed code in different parts of a page.

The last two talks of the conference were also focused on the topic of JS.next.

Jeremy Askenas was scheduled to talk about Coffeescript, but he asked Brendan Eich to join him and talk about some of the new features that have been approved or proposed for JS.next. Many of these ideas look similar to ideas that are in Coffeescript. Jeremy then went on to try and explain what he’s trying to do in Coffeescript, and encouraged people to experiment with their own language extensions. He and Brendan are calling programs like the Coffeescript compiler, “transpilers” – compilers which compile into Javascript. I’ve written some Coffeescript code just to get a feel for it, and parts of the experience reminded me of the days when C++ programs went through CFront, which then translated them into C which was then compiled. I didn’t care for that experience then, and I didn’t care for it this time, although the fact that most of what Coffeescript does is pure syntax means that the generated code is easy to associate back to the original Coffeescript. There appears to be considerable angst around Coffeescript, at least in the Javascript community. Summarizing that angst and my own experience with Coffeescript is enough for a separate post. Instead I’ll just say that I like many of the language ideas in Coffeescript, but I’d prefer not to see Coffeescript code in libraries used by the general Javascript community. If individuals or organizations choose to adopt Coffeescript, that’s fine by me, but having Coffeescript go into the wild in library code means that pressure will build to adapt Javascript libraries to be Coffeescript friendly, which will be detrimental to efforts to move to JS.next.

The last talk was given by Alex Russell, and included a triple head fake where Alex was ostensibly to talk about feature detection, although only after a too long comedic delay involving Dojo project lead Pete Higgins. A few minutes into the content on feature detection, Alex “threw up his hands”, and pulled out the real topic of his talk, which is the work that he’s been doing on Traceur, which is Google’s transpiler for experimenting with JS.next features. Alex then left the stage and a member of the Traceur team gave the rest of the talk. I am all in favor of cleverness to make a talk interesting, but I would have to say that the triple head fake didn’t add anything to the presentation. Instead, it dissipated the energy from the Brendan / Jeremy talk, and used up time that could have been used to better motivate the technical details that were shown. The Traceur talk ended up being less energetic and less focused than the talk before it, which is a shame because the content was important. While improving the syntax of JS.next is important, it’s even more important to fix the problems that prevent large scale code reuse and interoperability. The examples being given in the Traceur talk were those kinds of examples, but they were buried by a lack of energy, and the display of the inner workings of the transpiler.

I am glad to see that the people working on JS.next are trying to implement their ideas to the point where they could be used in large Javascript programs. I would much rather that the ECMAScript committee had actual implementation reports to base their decisions on, rather than designing features on paper in a committee (update: I am not meaning to imply that TC39 is designing by committee — see the comment thread for more on that. ). It is going to be several more years before any of these features get standardized, so in the meantime we’ll be working with the Javascript that we have, or in some lucky cases, with the recently approved ECMAScript 5.

Final Thoughts

If your interests are different than mine, here is a list of pointers to all the slides (I hope someone will help these links make it onto the Lanyrd coverage page for JSConf 2011.

JSConf is very well organized, there are lots of social events, and there are lots of nice touches. I did feel that last year’s program was stronger than this years. There are lots of reasons for why this might be the case, including what happened in Javascript in 2010/11, who was able to submit a talk, a change in my focus and interests. Chris Williams has a very well reasoned description of how he selects speakers for JSConf. In general I really agree with what he’s trying to do. One thing that might help is to keep all the sessions to 30 minutes, which would allow more speakers, and also reduce the loss if a talk doesn’t live up to expectations.

On the whole, I definitely got a lot out the conference, and as far as I can tell if you want to know what is happening or about to happen in the Javascript world, JSConf is the place to be.

South by Southwest Interactive 2011

Leave a reply

Back in 2006, Julie made the trek to Austin for South By Southwest Interactive (SXSWi) because she was organizing a panel. This year, I finally got a chance to go. In recent years, I’ve been to a lot of conferences. Many of them have been O’Reilly conferences, and the rest have been conferences organized by various open source communities. What almost all of them have in common is that they are developer centric. What is intriguing about SXSWi, to use John Gruber’s words, is that it is a conference where both developers and designers are welcome (As are a whole pile of people working in the social media space). One of the reasons that I decided to go this year was to try to get some perspective from a different population of people.

SXSWi is a very large conference with this year’s attendance at around 14000 people. There are conferences which are bigger (Oracle OpenWorld, JavaOne in its heyday, or ComicCon San Diego), but not many. If you mix in the Film conference, which runs at the same time, you have a lot of people in Austin. Any way you slice it, it’s a huge conference. According to “old-timers” that I spoke to, the scale is new, and I would say it’s the source of almost all of the problems that I had with the conference.

Talks

Common wisdom in recent years is that SXSWi is more about the networking than the panel / talk content. I did find a number of interesting talks.

I’ve been loosely aware of Jane McGonigal’s work on games for quite some time, but I’ve never actually been able to hear her speak until now. Gamification is a big topic in some circles right now. I think that Jane’s approach to gaming is deeper and has much longer term impact than just incorporating some of the types of game mechanics that are currently in vogue. I also really appreciated the scientific evidence that she presented about games. I’m looking forward to reading her book “Reality Is Broken: Why Games Make Us Better and How They Can Change the World”.

I had no idea who Felicia Day was when I got to SXSWi. Like all conferences, I did my real planning for each day of SXSWi the night before, doing the usual research on speakers that I was unfamiliar with. Felicia’s story resonated with me because she was homeschooled (like my daughters), went on to be very successful academically and then went into the entertainment business. She is among the leaders in bringing original video content to the internet instead of going through the traditional channels of broadcast television or movie studios. It’s a path that seems more and more likely to widen (witness Netflix’s licensing of “House of Cards”, or Google’s acquisition of Next New Networks). I learned all of that before I sat in the keynote. By the time that I left the keynote, I found myself charmed by her humility and down to earthness, and impressed by the way that she has built a real relationship with her fans in such a way that she can rally them for support when needed.

For the last year or so I’ve been seeing reviews for “The Power of Pull: How Small Moves, Smartly Made, Can Set Big Things in Motion” by John Hagel, John Seely Brown and Lang Davison. It sounded like the authors have found an interesting structuring for some of the changes that I’ve observed by being in the middle of open source software, blogging, and so forth. I still haven’t gotten around to reading that book (the stack is tall – well actually, the directory on the iPad is full), but I was glad for the chance to hear John Hagel talk about shaping strategies, his theory on how to make big changes by leveraging the resources of an entire market or ecosystem rather than taking on all the risk in a solo fashion. His talk was on the last day of the conference, and I was wiped out by then, so I need a refresher and some additional think time on his ideas.

Much to my surprise, there were a number of really interesting talks on the algorithmic side of Data Science/Big Data. Many of these talks were banished to the AT&T Conference center at UT Austin, which was really far from the Austin Convention Center and very inconvenient to get to. I wasn’t able to make it to many of these talks due to this – having venues so far away – the AT&T Center, the Sheraton, and the Hyatt – pretty much dooms the talks that get assigned to those venues. It’s not a total loss, since these days it’s pretty easy to find the speakers of the talks and contact them for more information. But that’s a much higher friction effort than going to their talk, having a chance to talk to them afterwards or over dinner, and going from there. I did really enjoy the talk Machines Trading Stocks on News. I am not a financial services guy, and there was no algorithmic heavy lifting on display, but the talk still provided a really interesting look at the issues around analyzing semistructured data and then acting on it. As usual, the financial guys are quietly doing some seriously sophisticated stuff, while the internet startup guys get all the attention. In a related vein, I also went to How to Personalize Without Being Creepy which had a good discussion of the state of the art of integrating personalization into products. There was not statistical machine learning on display, but the product issues around personalization are at least as important as the particulars of personalization technology.

One of the nice things about having such a huge conference is that you get some talks from interesting vectors. Our middle daughter has decided that she wants to go to Mars when she grows up. Now it’s quite some time between now and then, but just in case, I stopped into the talk on Participatory Space Exploration and collected a bunch of references that she can go chase. I was also able to chat with the folks from NASA afterwards and pick up some good age appropriate pointers.

There were some interesting sounding talks that I wasn’t able to get into because the rooms were full. And as I’ve mentioned there were also some talks that I wasn’ t able to go to because they were located too far away. As a first time SXSWi attendee but a veteran tech conference attendee and speaker, I’d say that SXSWi is groaning under its own scale at this point. It’s affecting the talks, the “evening track” and pretty much everything else. This is definitely a case of bigger is not better.

Party Scene

I am used to conferences with an active “evening track”, and of course, this includes parties. SXSWi is like no other event that I’ve been to. The sheer number of parties, both public and private is staggering. I’ve never had to wait in line to get into parties before, and there are very few VIP lists, whereas at SXSWi both lines and VIP lists seem to be the order of the day. Part of that is due to the scale, and I’m sure that part of that is SXSW’s reputation as a party or euphemistically, networking, conference. The other issue that I had with the parties is that the atmosphere at many of them just wasn’t conducive to meeting people. I went to several parties where the music was so loud that my ears were ringing within a short time. It’s great that there was good music (a benefit of SXSW), and lots of free sponsor alcohol, but that isn’t really my style.

Despite all that, I did have some good party experiences. I accidentally/serendipitously met a group of folks who are responsible for social media presences at big brands in the entertainment sector, so I got some good insight in to the kind of problems that they face and the back channel on business arrangements with some of the bigger social networks. I definitely got some serious schooling on how to use Foursquare. At another party, I got ground’s eye view on what parts of Microsoft’s Azure PaaS offering is real, and how much is not. I’m not planning to be an Azure user any time soon, but it’s always nice to know what is hype and what is reality. I also really enjoyed the ARM party. It was a great chance to see what people are doing with ARM processors – these days. This video that I saw at the TI table made me realize just how close we are to seeing some pretty cool stuff. Nikon USA and Vimeo sponsored a fun party at an abandoned power plant. The music was really loud, but the light was cool and I made some decent pictures.

Other activities

There are activities of all kinds going on during SXSW. I wasn’t able to do a lot of them because they conflicted with sessions, but I was able to go on a pair of photowalks, which was kind of fun. The photowalk with Trey Ratcliff was pretty fun. As usual, scale was an issue, because we pretty much clogged up streets and venues wherever we went. I’ve started to put some of those photos up on Flickr, but I decided to finish this post rather than finish the post production on the pictures.

App Round Up

One of the things that makes SXSWi is that you have a large group of people who are willing to try a new technology or application. It’s conventional wisdom that SXSWi provided launching pads for Twitter and Foursquare, so now every startup is trying to get you to try their application during the week of the conference. While by no means foolproof or definitive, this is a unique opportunity to observe how people might use a piece of technology.

Before flying down to Austin, I downloaded a bunch of new apps on my iPhone and iPad – so many that I had to make a SXSW folder. I had no preconceived notions about which of these new apps I was going to use.

There were also two web applications that I ended up using quite a bit: Lanyrd’s SXSW guide, and Plancast. Lanyrd launched last year as kind of a directory for conferences, and I’ve been using it to keep track of my conference schedule for a good number of months. For SXSWi, they created a SXSW specific part of the site that included all the panels, along with useful information like the Twitter handles and bios of the speakers. Although SXSW itself had a web application with the schedule, I found that Lanyrd worked better for the way that I wanted to use the schedule. This is despite the face that SXSW had an iPhone app while Lanyrd’s app has yet to ship. With Lanryd covering the sessions, I used Plancast (and along the way Eventbrite) to manage the parties. Plancast had all the parties in their system, including the Alaska direct flight from Seattle to Austin that I was on. Many of the parties were using Eventbrite to limit attendance, and while I had used Eventbrite here and there in the past, this finally got me to actually create an account there and use it. Eventbrite and Plancast integrate in a nice way, and it all worked pretty well for me.

Of all the ballyhooed applications that I downloaded, I really only ended up using two. There were a huge number of group chat/small group broadcast applications competing for attention. The one that I ended up using was GroupMe, mostly because the people I wanted to keep up with were using it. Beyond the simple group chat/broadcast functionality, it has some other nice features like voice conference calling that I didn’t really make use of during SXSW. Oddly enough, I first started using Twitter when I was working with a distributed team, and I always wished that Twitter had some kind of group facility. It’s nice that GroupMe and its competitors exist, but I also can’t help feeling like Twitter missed an opportunity here. Facebook’s acquisition of Beluga suggests as much.

The other application that I ended up using was Hashable. Hashable’s marketing describes it as “A fun and useful way to track your relationships”. I’d describe my usage of it as a way to exchange business cards moderately quickly using Twitter handles. A lot of my Hashable use centered around using my Belkin Mini Surge Protector Dual USB Charger to multiply the power outlets at the back of the ballrooms. I’ve made a lot of friends with that little device. In any case, I used Hashable as a quick way to swap information with my new power strip friends. While I used it, I’m ambivalent about it. I like that it can be keyed off of either email address or Twitter handle – I always used Twitter handle. My official business cards don’t have a space for the handle, which is annoying here in the 21st century. However, the profile that it records is not that detailed, so any business card information that is going to a new contact isn’t that detailed. It seems obvious to me that there ought to be some kind of connection to LinkedIn, but there’s no space for that. So I couldn’t really use Hashable as a replacement for a business card because all the information isn’t there. It’s also more clumsy to take notes about a #justmet on the iPhone keyboard than to write on the back of a card. The difficulty of typing on the iPhone keyboard also makes it time consuming and kind of antisocial to use. In a world where everyone used Hashable, and phones were NFC equipped, you can imagine a more streamlined exchange, but even then, the right app would have to be open on the phone. Long term, that’s an interface issue that phones are going to run into. Selecting the right functionality at the right time is getting to be harder and harder – pages of folders of apps means that everything gets on the screen, but it doesn’t mean that accessing them is fast.

In a similar vein, there were QR codes plastered all over pamphlets, flyers, and posters, but as @larrywright asked me on Twitter, I didn’t see very many people scanning them. Maybe people were scanning all that literature in their rooms after being out till 2am. There’s still an interface problem there.

In addition to all the hot new applications, there were the “old” standby’s, Foursquare and Twitter.

I am a purpose driven Foursquare user. I use Foursquare when I want people to know where I am. I’ve never really been into the gamification aspects of Foursquare, but I figured that SXSWi was the place to give that aspect of Foursquare more of a try. Foursquare rolled out a truckload of badges for SXSWi, and sometimes it seemed like you could check into every individual square foot of the Austin Convention Center and surrounding areas. So I did do a lot more checking in, mostly because there were more places to check in, and secondarily because I was trying to rack up some points. Not that the points ever turned into any tangible value for me. But as has been true at other conferences, the combination of checking on Foursquare and posting those checkins to Twitter did in fact result in some people actually tracking me down and visiting.

If you only allowed me one application, it would still be Twitter. If I wanted to know what was happening, Twitter was the first place I looked. Live commentary on the talks was there. I ended up coordinating several serendipitous meetings with people from Twitter. Twitter clients with push notifications made things both easy and timely. While I’m very unhappy with Twitter’s recent decree on new Twitter clients, the service is still without equal for the things that I use it for.

One word on hardware. There were lots of iPad 2’s floating around. I’m not going to do a commentary on that product here. For a conference like SXSWi, the iPad is the machine of choice. After the first day, I locked my laptop in the hotel safe. I would be physically much more worn out if I had hauled that laptop around. The iPad did everything that I needed it to do, even when I forgot to charge it one night.

Interesting Tech

While SXSWi is not a hard core technology conference, I did manage to see some very interesting technology. I’ve already mentioned the TI OMAP5 product line at the ARM party. I took a tour of the exhibit floor with Julie Steele from O’Reilly, and one of the interesting things that we saw was an iPhone app called Neer. Neer is an application that let’s you set to-do’s based on location. This is sort of an interesting idea, but the more interesting point came out after I asked about Neer’s impact on the phone’s battery life. I had tried an application called Future Checkin, which would monitor your location and and check you into places on Foursquare, because I was so bad about remembering to check in. It turned out that this destroyed the battery life on my phone, so I stopped using it. When I asked the Neer folks how they dealt with this, they told me that they use the phone’s accelerometer to detect when the phone is actually moving, and they only ping the GPS when they know you are moving, thus saving a bunch of battery life. This is a clever use of multiple sensors to get the job done, and I suspect that we’re really only at the beginning of seeing how the various sensors in mobile devices will be put to use. It turns out that the people working on Neer are part of a Qualcomm lab that is focused on driving the usage of mobile devices. I’d say they are doing their job.

The other thing that Julie and I stumbled upon was 3taps, which is trying to build a Data Commons. The whole issue of data openness, provenence, governance, and so forth is going to be a big issue in the next several years, and I expect to see lots of attempts to figure this stuff out.

The last interesting piece of technology that I learned about is comes from Acunu. The Acunu folks have developed a new low-level data store for NoSQL storage engines, particularly engines like Cassandra. The performance gains are quite impressive. The engine will be open source and should be available in a few months.

In conclusion

SXSWi is a huge conference and it took a lot out of me, more than any other conference that I’ve been to. While I definitely got some value out of the conference, I’m not sure that the value I got corresponded to the amount of energy that I had to put in. Some of that is my own fault. If I were coming back to SXSWi, here are some things that I would do:

Work harder at being organized about the schedule and setting up meetings with people prior to the conference
Skip many of the parties and try to organize get togethers with people outside of the parties
Eat reasonably – SXSW has no official lunch or dinner breaks – this makes it to easy to go too long without eating which leads to problems.
Always sit at the back of the room and make friends over the power outlets

Lanyrd is collecting various types of coverage of the conference whether that is slide decks, writeups, or audio recordings.

I like the idea of SXSWi, and I like the niche that it occupies, but I think that scale has overtaken the conference and is detracting from the value of it. Long time attendees told me that repeatedly when I asked. I would love to see some alternatives to SXSWi, so that we don’t have to put our eggs all in one basket.

Strata 2011

11 Replies

I spent three days last week at O’Reilly’s Strata Conference. This is the first year of the conference, which is focused on topics around data. The tag line of the conference was “Making Data Work”, but the focus of the content was on “Big Data”.

The state of the data field

Big Data as a term is kind of undefined in a “I’ll know it when I see it” kind of way. As an example,I saw tweets asking how much data one needed to have in order to qualify as having a Big Data problem. Whatever the complete meaning is, if one exists, there is a huge amount of interest in this area. O’Reilly planned for 1200 people, but actual attendance was 1400, and due to the level of interest, there will be another Strata in September 2011, this time in New York. Another term that was used frequently was data science, or more often data scientists, people who have a set of skill that make them well suited to dealing with data problems. These skills include programming, statistics, machine learning, and data visualization, and depending on who you ask, there will be additions or subtractions from that list. Moreover, this skill set is in high demand. There was a very full job board, and many presentations ended with the words “we’re hiring”. And as one might suspect, the venture capitalists are sniffing around — at the venture capital panel, one person said that he believed there was a 10-25 year run in data problems and the surrounding ecosystem.

The Strata community is a multi disciplinary community. There were talks on infrastructure for supporting big data (Hadoop, Cassandra, Esper, custom systems), algorithms for machine learning (although not as many as I would have liked), the business and ethics of possessing large data sets, and all kinds of visualizations. In the executive summit, there were also a number of presentations from traditional business intelligence, analytics, and data warehousing folks. It is very unusual to have all these communities in one place and talking to each other. One side effect of this, especially for a first time conference, is that it is difficult to assess the quality of speakers and talks. There were a number of talks which had good looking abstracts, but did not live up to those aspirations in the actual presentation. I suspect that it is going to take several iterations to identify the the best speakers and the right areas – par for a new conference in a multidisciplinary field.

General Observations

I did my graduate work in object databases, which is a mix of systems, databases, and programming languages. I also did a minor in AI, although it was in the days before machine learning really became statistically oriented. I’m looking forward to going a bit deeper into all these areas as I look around in the space.

One theme that appeared in many talks was the importance of good, clean data. In fact, Bob Page from eBay showed a chart comparing 5 different learning algorithms, and it was clear that having a lot of data made up for differences in the algorithms, making the quality and volume of the data more important than the details of the algorithms being used. That’s not to say that algorithms are unimportant, just that high quality data is more important. It seems obvious that having access to good data is really important.

Another theme that appeared in many talks was the combination of algorithms and humans. I remember this being said repeatedly in the panel on predicting the future. I think that there’s a great opportunity in figuring out how to make the algorithm and human collaboration work as pleasantly and efficiently as possible.

There were two talks that at least touched on building data science teams, and on Twitter it seemed that LinkedIn was viewed as having one of the best data science teams in the industry. Not to take anything away from the great job that the LinkedIn folks are doing, or the importance of helping people find good jobs, but I hope that in a few years, we are looking up to data science teams from healthcare, energy, and education.

It amused me to see tweets and have discussions on the power of Python as a tool in this space. With libraries like numpy, scipy, nltk, and scikits.learn, along with an interactive interpreter loop, Python is well suited for data science/big data tasks. It’s interesting to note that tools like R and Incanter have similar properties.

There were two areas that I am particularly interested in, and which I felt were somewhat under represented. The issue of doing analysis in low latency / “realtime” scenarios, and the notion of “personal analytics” (analytics around a single person’s data). I hope that we’ll see more on these topics in the future.

The talks

As is the case nowadays, the proceedings from the conference are available online in the form of slide decks, and in some cases video. Material will probably continue to show up over the course of the next week or so. Below are some of the talks I found noteworthy.

Day 1

I spent the tutorial day in the Executive Summit, looking for interesting problems or approaches that companies are taking with their data efforts. There were two talks that stood out to me. The first was Bob Page’s talk Building the Data Driven Organization, which was really about eBay. Bob shared from eBay’s experience over the last 10 years. Probably the most interesting thing he described was an internal social network like tool, which allowed people to discover and then bookmark analytics reports from other people.

Marilyn and Terence Craig presented Retail: Lessons Learned from the First Data-Driven Business and Future Directions, which was exactly how it sounded. It’s conventional wisdom among Internet people that retail as we know it is dead. I came away from this talk being impressed by the problems that retail logistics presents, and by how retail’s problems are starting to look like Internet problems. Or is that vice versa?

Day 2

The conference proper started with the usual slew of keynotes. I’ve been to enough O’Reilly conferences to know that some proportion of the keynotes are given in exchange for sponsorships, but some of the keynotes were egregiously commercial. The Microsoft keynote included a promotional video, and the EnterpriseDB keynote on Day 3 was a bald faced sales pitch. I understand that the sponsors want to get value for the money they paid (I helped sponsor several conferences during my time at Sun). The sponsors should look at the twitter chatter during their keynotes to realize that these advertising keynotes hurt them far more than they help them. Before Strata, I didn’t really know anything about EnterpriseDB except that they had something to do with Postgres. Now I all I know is that they wasted a bunch of my time during a keynote spot.

Day 2 was a little bit light on memorable talks. I went to Generating Dynamic Social Networks from Large Scale Unstructured Data which was in the vendor presentation track. Although I didn’t learn much about the actual techniques and technologies that were used, I did at least gain some appreciation for the issues involved. The panel Real World Applications Panel: Machine Learning and Decision Support only had two panelists. Jonathan Seidman and Robert Lancaster from Orbitz described how they use learning for sort optimization, intelligent caching, and personalization/segmentation, and Alasdair Allan from the University of Exeter described the use of learning and multiagent systems to control networks telescopes at observatories around the world. The telescope control left me with a vaguely SkyNet ish feeling. Matthew Russell has written a book called Mining the Social Web. I grabbed his code off of github and it looked interesting, so I dropped into his talk Unleashing Twitter Data for Fun and Insight. He’s also written 21 Recipes for Mining Twitter, and the code for that is on github as well.

Day 3

Day 3 produced a reprieve on the keynote front. Despite the aforementioned horrible EnterpriseDB keynote, there were 3 very good talks. LinkedIn’s keynote on Innovating Data Teams was good. They presented some data science on the Strata attendees and described how they recruited and organized their data team. They did launch a product, LinkedIn Skills, but it was done in such a way as to show off the data science relevant aspects of the product.

Scott Yara from EMC did a keynote called Your Data Rules the World. This is how a sponsor keynote should be done. No EMC products were promoted, and Scott did a great job of demonstrating a future filled with data, right down to still and video footage of him being stopped for a traffic violation. The keynote provoked you to really thing about where all this is heading, and what some of the big issues are going to be. I know that EMC make storage and other products. But more than that, I know that they employ Product Management people who have been thinking deeply about a future that is swimming with data.

The final keynote was titled Can Big Data Fix Healthcare?. Carol McCall has been working on data oriented healthcare solutions for quite some time now, and her talk was inspirational and gave me some hope that improvements can happen.

Day 3 was the day of the Where’s the Money in Big Data? panel, where a bunch of venture capitalists talked about how they see the market and where it might be headed. It was also the day of two really good sessions. In Present Tense: The Challenges and Trade-offs in Building a Web-scale Real-time Analytics System, Ben Black described Fast-IP’s journey to build a web-scale real-time analytics system. It was an honest story of attempts and failures as well as the technical lessons that they learned after each attempt. This was the most detailed technical talk I attended, although the terms distributed lower dimensional cuboid and word-aligned bitmap index were tossed around, but not covered in detail. It’s worth noting that Fast-IP’s system and Twitter’s Analytics system, Rainbird, are both based, to varying degrees, on Cassandra.

I ended up spending an extra night in San Jose so that I could stay for Predicting the Future: Anticipating the World with Data, which was in the last session block of the conference. I think that it was worth it. This was a panel format, but each panelist was well prepared. Recorded Future is building a search engine that uses the past to predict the future. They didn’t give out much of their secret sauce, but they did say that they have built a temporally based index as opposed to a keyword based one. Unfortunately their system is domain specific, with finance and geopolitics being the initial domains. Palantir Technologies is trying to predict terrorist attacks. In the abstract, this means predicting in the face of an adaptive adversary, and in contexts like this, the key is to stop thinking in terms of machine learning and start thinking in terms of game theory. It seems like there’s a pile of interesting stuff in that last statement. Finally, Rion Snow from Twitter took us through a number of academic papers where people have successfully made predictions about box office revenue, the stock market, and the flu, just from analyzing information available via Twitter. I had seen almost all of the papers before, but it was nice to feel that I hadn’t missed any of the important results.

Talks I missed but had twitter buzz

You can’t go to every talk at a conference (nor should you, probably), but here are some talks that I missed, but which had a lot of buzz on Twitter. MAD Skills: A Magnetic, Agile and Deep Approach to Scalable Analytics – the hotness of this talk seemed related more to the DataWrangler tool (for cleansing data) than the MAD library (scalable analytics engine running inside Postgres) itself. Big Data, Lean Startup: Data Science on a Shoestring seemed like it had a lot of just good commonsense about running in a startup in addition to know how to do data science without doing overkill. Joseph Turian’s New Developments in Large Data Techniques looked like a great talk. His slides are available online, as well as the papers that he referenced. It seemed like the demos were the topic of excitement in Data Journalism: Applied Interfaces, given jointly by folks from ReadWriteWeb, The Guardian, and The New York Times. Rainbird is Twitter’s analytics system, which was described in Real-time Analytics at Twitter. Notable news on that one is that Twitter will be open sourcing Rainbird once the requisite version of Cassandra is released.

Evening activities

There were events both evenings of the show, which made for very long days. On Day 1 there was a showcase of various startup companies, and on Day 2, there was a “science fair”. In all honesty, the experience was pretty much the same both nights. Walk your way around some tables/pedestals, and talk to people who are working on stuff that you might think is cool. The highlights for me were:

Jeremie Miller’s Locker project (github) (written on Node.js)
Impure, a visual tool for building data visualizations – here are some Strata/Twitter visualizations built with Impure
DrawnToScale, a big data database being developed here in Seattle

Links

Here is a bunch of miscellaneous interesting links from the conference:

Data BootCamp slides
The book Elements of Statistical Learning (in PDF form, too)
The paper The Sensemaking Process and Leverage Points for Analyst Technology as Identified Through Cognitive Task Analysis, recommended by Mark Madsen during his keynote
The book The Fourth Paradigm: Data-Intensive Scientific Discovery (in PDF form, too), recommended by Werner Vogels during his keynote
The $3M Heritage Health Prize to develop an algorithm to predict and prevent unnecessary hospitalizations
Public domain government information
All data covered by the Guardian since 2009
40 Fascinating blogs for the Ultimate Statistics Geek
A List of Data Markets

Tweet Mining

Finally, no conference on data should be without it’s own Twitter exhaust. So I’ll leave you with some analysis and visualizations done on the tweets from Strata.

Chirpstory’s compilation of all the tweets.
Confluential’s analysis
Impure generated visualizations

Update: Thanks to bear for a typo correction.

Blogaversary 2011

1 Reply

I’m not that good at remembering my Blogaversary — it’s been two years since I remembered last. You can thank the OmniGroup’s wonderful OmniFocus for reminding me in time this year. Almost everything that I wrote describing my 6 year blogaversary is still true today. In fact, I’m doing more traveling than I was when I wrote that. In the past much of my travel has been for conferences, but last year, I did a lot of traveling for other meetings. I’m expecting that I’ll be at fewer conferences this year than last year. I’ve started using Simon Willison’s excellent Lanyrd to manage my conference tracking. My list for this year will give you some hints about some of the stuff that I am looking at. One thing that is difference since I’ve been at Disney is that I am seeing lots of interesting stuff, but much of it is covered by Non Disclosure Agreements. Needless to say, I don’t write about any of that.

Here’s to another year of blogging, tweeting, and whatever else is coming down the path.