Radically Open Cultural Heritage Data at SXSW Interactive 2012

April 11, 2012


Posted by Adrian Stevenson

I had the privilege of attending the annual South by Southwest Interactive, Film and Music conference (SXSW) a few weeks ago in Austin, Texas. I was there as part of the ‘Radically Open Cultural Heritage Data on the Web’ Interactive panel session, along with Jon Voss from Historypin, Julie Allinson from the University of York digital library, and Rachel Frick from the Council on Library and Information Resources (CLIR). We were delighted to see that Mashable.com named it as one of its ‘22 SXSW Panels You Can’t Miss This Year’.

All of our panelists covered themes and issues addressed by the Discovery initiative, including the importance of open licenses, and the need for machine-readable data via APIs to facilitate the easy transfer, aggregation and link-up of library, archive and museum content.

Jon gave some background on the ‘Linked Open Data in Libraries, Archives and Museums’ (LODLAM) efforts around the world, talking about how the first International LODLAM Summit, held in San Francisco last year, helped galvanise the LODLAM community. Jon also covered some recent work Historypin are doing to allow users to dig into archival records.

Julie then covered some of the technical aspects of publishing Linked Data through the lens of the OpenArt Discovery project, which recently released the ‘London Art World 1660-1735’ data. She mentioned some of the benefits of the Linked Data approach, and explained how they’ve been linking to VIAF for names and Geonames for location.
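To make the linking idea concrete, here is a minimal sketch of how a local record might be tied to VIAF and Geonames. It is not taken from the OpenArt project: the local URI, the placeholder identifiers, the helper names `link_record` and `to_ntriples`, and the choice of `owl:sameAs` and `foaf:based_near` as predicates are all assumptions made purely for illustration.

```python
# Sketch only: the predicates and identifiers below are illustrative
# assumptions, not the modelling used by the OpenArt project.
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
FOAF_BASED_NEAR = "http://xmlns.com/foaf/0.1/based_near"


def link_record(local_uri, viaf_id=None, geonames_id=None):
    """Return (subject, predicate, object) triples linking a local
    record to external authority URIs (hypothetical helper)."""
    triples = []
    if viaf_id is not None:
        # Name authority link
        triples.append((local_uri, OWL_SAME_AS,
                        f"http://viaf.org/viaf/{viaf_id}"))
    if geonames_id is not None:
        # Place link
        triples.append((local_uri, FOAF_BASED_NEAR,
                        f"http://sws.geonames.org/{geonames_id}/"))
    return triples


def to_ntriples(triples):
    """Serialise URI-only triples as N-Triples lines."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)


# Made-up local URI and placeholder identifiers.
triples = link_record("http://example.org/person/1",
                      viaf_id="0000000", geonames_id="0000000")
print(to_ntriples(triples))
```

Once a record carries links like these, anyone consuming the data can follow them out to the shared authority and merge in whatever else is known about the same person or place.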

I gave a quick overview of the LOCAH and Linking Lives projects, before giving a heads up to the World War One Discovery project. LOCAH has been making archival records from the Archives Hub national service available as Linked Data, and Linking Lives is a continuation project that’s using Linked Data from a variety of sources to create an interface based around the names of people in the Archives Hub. After attempting to crystallise what I see as the key benefits of Linked Data, I finished up by focusing on particular challenges we’ve met on our projects.

Rachel considered how open data might affect policies, procedures and the organisational structure of the library world. She talked about the Digital Public Library of America (DPLA), a growing initiative started in October 2010. The DPLA vision is to have an “open distributed network of comprehensive online resources that draw on the nation’s living history from libraries, universities, archives and museums to educate, inform, and empower everyone in current and future generations”. After outlining how the DPLA is aiming to achieve this vision, she explained how interested parties can get involved.

There’s an audio recording of the panel on our session page, as well as recordings of all sessions mentioned below on their respective SXSW pages. I’ve also included the slides for our session at the bottom of this post.

Not surprisingly, there were plenty of other great sessions at SXSW. I’ve picked a few highlights that I thought would be of interest to readers of this blog.

Probably of most relevance to Discovery was the lightning-fast ‘Open APIs: What’s Hot and What’s Not’ session from John Musser, founder of Programmableweb.com, who gave us what he sees as the eight hottest API trends. He mentioned that the REST style of software architecture is rapidly growing in popularity, being regarded as easier to use than other API technologies such as SOAP (see image below). JSON is very popular, with 60% of APIs now supporting it. It was also noted that one in five APIs don’t support XML.

Hot API Protocols and Styles from John Musser of Programmableweb.com

Musser suggested that APIs need to be supported, with hackathons and funded prizes being a good way to get people interested. He noted that the hottest trend right now is that VCs are providing significant funding to incentivise people to use their APIs, Twilio being one of the first to do this. He also mentioned that your API documentation needs to be live if you’re to get interest and maintain use. Invisible mashups are also hot, with operating systems such as Apple’s OS cited as examples. Musser suggested the overall meta-trend is that APIs are now ubiquitous. John has now made his slides available on SlideShare.

The many users of laptops amongst us will have been interested to hear about the ‘Future of Wireless Power’.  The session didn’t go into great detail, but the message was very much “it’s not a new technology, and it’ll be here very soon”. Expect wireless power functionality in mobile devices in the next few years, using the Qi standard.

Some very interesting folks from MIT gave the thought-provoking ‘MIT Media Lab: Making Connections’ session. Joi Ito, Director of the MIT Media Lab, explained how it’s all about the importance of connecting people, stating that “we’re now beyond the cognitive limits of individuals, and are in an era where we rely on networks to make progress”. He suggested that traditional roadmaps are outmoded, and that we should throw them away and embrace serendipity if we’re to make real progress in technology. Ito mentioned that MIT has put significant funding into undirected research and an ‘anti-disciplinary’ approach. He said that we now have as much agility in hardware as in software, and that the agile software mentality is being applied to hardware development. He pointed to a number of projects that are embracing these ideas: idcubed, affectiva, sourcemap and formlabs.

In the ‘Data Visualization and the Future of Research’ session, Josh Greenberg talked about ‘macroscopy’, which is essentially about how research is starting to be done at large scale. Josh suggested that ‘big data’ and computation are now very important for doing science, with macroscopy being the application of big data to research. He referred to the ‘Fourth Paradigm’ book, which presents the idea that research is now about data-intensive discovery. Lee Dirks from Microsoft gave us a look at some new open source tools they’ve been developing for data visualisation, including Layerscape, which allows users to explore and discover data, and Chronozoom, which looked useful for navigating through historical big data. Lee mentioned Chronozoom was good for rich data sources such as archive and museum data, demoing it using resources relating to the Industrial Revolution.

So that was about it for the sessions I was able to get to as part of the SXSW Interactive conference. It was a really amazing event, and I’d highly recommend it to anyone as a great way to meet some of the top people in the technology sector, and of course, hear some great sessions.

The slides from our session:

Developers’ entries help us explore new possibilities in discovery

September 15, 2011

It really was a tough call to pinpoint a clear winner for the #discodev competition. After we gave people a bit more time, using some of the August lull to work on applications, we ended up with a really good array of entries, demonstrating a wide range of possibilities. A key judging criterion (obviously) concerns the usability of the application. But judging aside, I am personally less concerned with how usable a rapidly developed application is (and some of these applications have worked very effectively with complex and often dense datasets) than with how much they get me thinking about potential use cases and benefits.

To a large degree, the Discovery programme is about identifying the potential, and where appropriate finding ways to build on someone’s seed of an idea. Applications such as Yogesh Patel’s experiment with Archives Hub linked data might only scratch the surface of the dataset, but they still prompt us to think about the great potential that exists. Along with What’s About, it hints at the potential of combining historic and contemporary geospatial data to provide new routes through to content: to explore the world of ‘exploration’ spatially, as opposed to through the linear and hierarchical structure of the archival description. I think the archival community especially is hungry for examples to help us get past some of our entrenched thinking about what discovery interfaces look like. Along with initiatives such as Historypin and OCLC’s MapFast, these applications give us something tangible to react to, and a way to explore ideas around discovering library, archival, or museum data geospatially.

We’re also learning more about the potential for Linked Data. The entry from Mathieu D’Aquin, Discobro, complements the research and development activity of the JISC-funded LOCAH project perfectly in this regard. These are projects that enable the archival community to see how EAD rendered as linked data can become more embedded within the wider web of data; and instantly (it seems to me) we’re forced beyond the finding aid and document-centric mindset, and into thinking about our descriptions as data that needs to be interlinkable to be found and used. It is remarkable how well Discobro works. My own search for the Stanley Kubrick archives in the Archives Hub using the bookmarklet immediately provided multiple links out to DBpedia entries on Kubrick’s life, cinematography, and films. All this is achieved not through a manual mashing of data, but through an automatic ‘meshing’ that can scale (which is perhaps one of the most heady promises of Linked Data).

Will Linked Data be The Way Forward? The jury’s still out, but applications such as Discobro and others help us understand in much more tangible terms what benefits might be delivered.

And some applications demonstrated benefits that we can work on delivering much more immediately. For me the standout here is the Open URL Router Recommender developed by Dimitrios Sferopoulos and Sheila Fraser at EDINA. My brain’s whirring with the possibility of how we can build this functionality into article search services at the local or national level (for example, embedding it into the newly designed Zetoc, which will be launched later this year). The use case for recommender functions is already proven, although we have more to learn about such functions in academic and teaching contexts. What EDINA have demonstrated is what you can achieve through the network effect of gathering data centrally: patterns and relationships between articles emerge that are not readily available through other means. It’s simple, and the data’s already there waiting to be exploited. As a result we can provide routes through to discovery based on communities of use and disciplinary context, and not on descriptive metadata alone.
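To give a feel for how patterns can emerge from centrally gathered usage data, here is a rough sketch of a co-occurrence recommender. This is not EDINA’s implementation: the session data, the article identifiers and the function names `build_cooccurrence` and `recommend` are invented for illustration, and it assumes that requests can be grouped into per-user sessions.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_cooccurrence(sessions):
    """Count how often each pair of articles is requested in the
    same session (hypothetical grouping of routed requests)."""
    co = defaultdict(Counter)
    for articles in sessions:
        # De-duplicate within a session so repeats don't inflate counts.
        for a, b in combinations(sorted(set(articles)), 2):
            co[a][b] += 1
            co[b][a] += 1
    return co


def recommend(co, article, n=3):
    """Return up to n articles most often seen alongside `article`,
    breaking ties alphabetically so the result is deterministic."""
    neighbours = co.get(article, Counter())
    ranked = sorted(neighbours.items(), key=lambda kv: (-kv[1], kv[0]))
    return [other for other, _ in ranked[:n]]


# Invented sessions: each list is the set of articles one user requested.
sessions = [["A", "B", "C"], ["A", "B"], ["B", "D"], ["A", "C"]]
co = build_cooccurrence(sessions)
print(recommend(co, "A"))
print(recommend(co, "B"))
```

Nothing here looks at descriptive metadata: the suggestions come entirely from what people requested together, which is why pooling data from many institutions matters.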

Neeta Patel’s simple visualisation of the MOSAIC circulation data demonstrates something similar. Through my involvement with the SALT and Copac Collections Management projects, we know that libraries are already using their circulation data (if they collect it) to inform collection management decisions, but that often this work involves scrutinising spreadsheets and figures. Visual views of the data can really help support such analysis, and give that at-a-glance overview that can often tell a whole story.

There’s obviously a lot more that could be said about these entries (I wish I could touch on them all) and hopefully we’ll hear some views from my Discovery cohorts. I’m now interested in seeing what conversations open up as a result, and what practical work we can carry forward through new collaborations.