Radically Open Cultural Heritage Data at SXSW Interactive 2012

April 11, 2012


Posted by Adrian Stevenson

I had the privilege of attending the annual South by Southwest Interactive, Film and Music conference (SXSW) a few weeks ago in Austin, Texas. I was there as part of the ‘Radically Open Cultural Heritage Data on the Web’ Interactive panel session, along with Jon Voss from Historypin, Julie Allinson from the University of York digital library, and Rachel Frick from the Council on Library and Information Resources (CLIR). We were delighted to see that Mashable.com voted it one of the ’22 SXSW Panels You Can’t Miss This Year’.

All of our panelists covered themes and issues addressed by the Discovery initiative, including the importance of open licenses and the need for machine-readable data via APIs to facilitate the easy transfer, aggregation and linking of library, archive and museum content.

Jon gave some background on the ‘Linked Open Data in Libraries, Archives and Museums’ (LOD-LAM) efforts around the world, talking about how the first International LODLAM Summit held in San Francisco last year helped galvanise the LODLAM community. Jon also covered some recent work Historypin are doing to allow users to dig into archival records.

Julie then covered some of the technical aspects of publishing Linked Data through the lens of the OpenArt Discovery project, which recently released the ‘London Art World 1660-1735’ data. She mentioned some of the benefits of the Linked Data approach, and explained how they’ve been linking to VIAF for names and Geonames for location.
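The pattern Julie described, publishing records whose names and places point at shared authority URIs rather than bare strings, can be sketched as a small JSON-LD fragment. The record below is invented for illustration (the artwork, its URI and the VIAF identifier are hypothetical; the Geonames URI is the one commonly used for London), but it shows the basic move of replacing literals with links:

```python
import json

# A hypothetical local record in JSON-LD. Instead of storing the creator
# and place as plain text, it links them to VIAF and Geonames URIs, which
# is what lets independently published datasets join up.
record = {
    "@context": {"dc": "http://purl.org/dc/terms/"},
    "@id": "http://example.org/artworks/42",          # invented local URI
    "dc:title": "Portrait of an Unknown Lady",        # invented title
    # Creator as a VIAF authority link (identifier is illustrative only)
    "dc:creator": {"@id": "http://viaf.org/viaf/00000000"},
    # Place of creation as a Geonames link (2643743 = London)
    "dc:spatial": {"@id": "http://sws.geonames.org/2643743/"},
}

print(json.dumps(record, indent=2))
```

Any consumer that also uses VIAF and Geonames identifiers can then merge this record with its own data on the shared URIs, with no string matching on names or place spellings.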

I gave a quick overview of the LOCAH and Linking Lives projects, before giving a heads-up on the World War One Discovery project. LOCAH has been making archival records from the Archives Hub national service available as Linked Data, and Linking Lives is a continuation project that’s using Linked Data from a variety of sources to create an interface based around the names of people in the Archives Hub. After attempting to crystallise what I see as the key benefits of Linked Data, I finished up by focusing on particular challenges we’ve met on our projects.

Rachel considered how open data might affect policies, procedures and the organisational structure of the library world. She talked about the Digital Public Library of America, a growing initiative started in October 2010. The DPLA vision is to have an “open distributed network of comprehensive online resources that draw on the nation’s living history from libraries, universities, archives and museums to educate, inform, and empower everyone in current and future generations”. After outlining how the DPLA is aiming to achieve this vision, she explained how interested parties can get involved.

There’s an audio recording of the panel on our session page, as well as recordings of all sessions mentioned below on their respective SXSW pages. I’ve also included the slides for our session at the bottom of this post.

Not surprisingly, there were plenty of other great sessions at SXSW. I’ve picked a few highlights that I thought would be of interest to readers of this blog.

Probably of most relevance to Discovery was the lightning-fast ‘Open APIs: What’s Hot and What’s Not’ session from John Musser, founder of Programmableweb.com, who gave us what he sees as the eight hottest API trends. He mentioned that the REST style of software architecture is rapidly growing in popularity, being regarded as easier to use than other API technologies such as SOAP (see image below). JSON is very popular, with 60% of APIs now supporting it, and it was also noted that one in five APIs don’t support XML.
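Part of why JSON has overtaken XML in API responses is visible in a few lines of code: JSON maps directly onto native data structures, while XML needs explicit tree navigation and type conversion. The payloads below are invented for illustration, not from any real API:

```python
import json
import xml.etree.ElementTree as ET

# The same (invented) API response in both formats.
json_payload = '{"artist": {"name": "Godfrey Kneller", "born": 1646}}'
xml_payload = '<artist><name>Godfrey Kneller</name><born>1646</born></artist>'

# JSON parses straight into dicts, lists, strings and numbers.
artist = json.loads(json_payload)["artist"]
print(artist["name"], artist["born"])

# XML requires walking the element tree and converting types by hand.
root = ET.fromstring(xml_payload)
print(root.findtext("name"), int(root.findtext("born")))
```

Both snippets use only the Python standard library; the point is simply how much less ceremony the JSON path involves for a typical consumer.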

The rise of REST – ‘Hot API Protocols and Styles’ from John Musser of Programmableweb.com at SXSW 2012

Musser suggested that APIs need to be supported, with hackathons and funded prizes being a good way to get people interested. He noted that the hottest trend right now is that VCs are providing significant funding to incentivise people to use their APIs, Twilio being one of the first to do this. He also mentioned that your API documentation needs to be live if you’re to attract interest and maintain use. Invisible mashups are also hot, with operating systems such as Apple’s OS cited as examples. Musser suggests the overall meta-trend is that APIs are now ubiquitous. John’s now made his slides available on Slideshare.

The many laptop users amongst us will have been interested to hear about the ‘Future of Wireless Power’ session. It didn’t go into great detail, but the message was very much “it’s not a new technology, and it’ll be here very soon”. Expect wireless power functionality in mobile devices in the next few years, using the Qi standard.

Some very interesting folks from MIT gave the thought-provoking ‘MIT Media Lab: Making Connections’ session. Joi Ito, Director of the MIT Media Lab, explained how it’s all about the importance of connecting people, stating that “we’re now beyond the cognitive limits of individuals, and are in an era where we rely on networks to make progress”. He suggested that traditional roadmaps are outmoded, and that we should throw them away and embrace serendipity if we’re to make real progress in technology. Ito mentioned that MIT has put significant funding into undirected research and an ‘anti-disciplinary’ approach. He said that we now have much agility in hardware as well as software, and that the agile software mentality is being applied to hardware development. He pointed to a number of projects that are embracing these ideas – idcubed, affectiva, sourcemap and formlabs.

Josh Greenberg talked about ‘macroscopy’ in the ‘Data Visualization and the Future of Research’ session, which is essentially about how research is starting to be done at large scale. Josh suggested that ‘big data’ and computation are now very important for doing science, with macroscopy being the application of big data to research. He referred to the ‘Fourth Paradigm’ book, which presents the idea that research is now about data-intensive discovery. Lee Dirks from Microsoft gave us a look at some new open source tools they’ve been developing for data visualisation, including Layerscape, which allows users to explore and discover data, and Chronozoom, which looked useful for navigating through historical big data. Lee mentioned Chronozoom was good for rich data sources such as archive and museum data, demoing it using resources relating to the Industrial Revolution.

So that was about it for the sessions I was able to get to as part of the SXSW Interactive conference. It was a really amazing event, and I’d highly recommend it to anyone as a great way to meet some of the top people in the technology sector, and of course, hear some great sessions.

The slides from our session:

The Case for Discovery – reflections on a presentation to RLUK members

November 27, 2011

David Kay – david.kay@sero.co.uk – 27 November 2011

I was pleased to have the opportunity to talk about the Discovery initiative at the RLUK members meeting last week (#rlukmm11). This blog post picks up key points raised in the Twitter stream (thanks especially to Simon Bains, Mike Mertens, Tracey Stanley and Owen Stephens) and links them to my concluding suggestions.

The presentation mixed an update on progress to date (because a lot has happened in the six months since Discovery was ‘named’) with a focus on emerging business cases for further investment of valuable institutional and collective effort in this space, leading to some collective considerations for RLUK.

My suggestion is that the ‘business case’ for investment in resource description and discovery hinges on opportunities for gains in economy (back of house), efficiency (relating to both library and patron workflows) and effectiveness (better supporting 21st century modes of research and learning). I’ve set out 10 benefit cases drawn from recent institutional and consortium projects in a recent Discovery blogpost. However, as pointed out in questions, these ‘business arguments’ need to be sharpened to identify ROI and how it will be measured – member suggestions will be most welcome!

In addition, I proposed ‘expression’ as an essential part of the top-line business case. However good the service offered locally by discovery layers and globally by the likes of Google, there is a gap between the way records are currently discovered and the style of connected navigation that could be offered through more complete, consistent and connected classification geared to academic enquiry. This is about taking the value we already provide in ‘cataloguing’ and making that work for us in the web of scholarly data within and beyond our institutional controls – across libraries, archives and museums, as well as learning assets, OERs and research data.

Making relevant ‘stuff’ (the library catalogue and the rest, within and beyond the academy) discoverable as linked open data is an obvious way to support this approach – but my key point is about a business requirement (truly joined up expression of scholarly data and metadata) rather than a technology. I suggested that, of all the things to be done to enact that transformation, senior managers should concentrate on the key enablers – metadata licensing, use of common identifiers and authorities across all types of records, service sustainability and measurement – whilst ensuring appropriate staff are skilled in the mechanics.

Comments on the Twitter stream suggested that there is very little distance between this Discovery proposition and what Paul Ayris set out in the recommendations emerging from the shared cataloguing working group. Owen Stephens tweeted that perhaps this represents the major Discovery use case from an RLUK perspective – though we definitely need the Discovery programme to exemplify more cases. Both presentations indicated a long-term objective geared to serving teaching, learning and research, whilst offering economies and efficiencies along the way. However, the five-year horizon is very distant, and I would therefore emphasise the complementary short-term opportunities and stepping stones listed at the end of my presentation.

  1. Liberate Copac – publish Copac as open data and potentially as Linked Open Data; the first cut may only involve limited authorities but would still enable the potential to be tested alongside the likes of the Archives Hub
  2. Animate Knowledge Base Plus – play a leading role in the collective population of this shared subscription and licence dataset, which may be of significant assistance in future licensing work with JISC Collections
  3. Review scope of other RLUK initiatives – establish whether common authorities and open licensing may be priority components in the shared cataloguing and special collections work
  4. Assess the wider curatorial landscape – identify where RLUK could be taking collective steps of this type in areas such as learning assets and research data
  5. ‘Understand’ e-books in this context – whilst the metadata supply chain and workflows remain extremely uncertain, alignment with this direction of travel will be essential (and in 5 years may be a lost opportunity)
  6. Consider action on identifiers, authorities and access points – all of the above raise the challenge of collectively adopting key reference points, presumably including name, place and subject; a working group specifically focused on this and looking beyond libraries may be of value

My personal observation is that these represent immediate and low cost collective opportunities to assess and develop metadata infrastructure in anticipation of the roles that RLUK might play in a changing knowledge and service environment, both within the academy and in the wider UK context.

And last but not least, thanks again to RLUK for the chance to attend a very stimulating event.

Making resources discoverable … is there a business case?

November 11, 2011

David Kay – @serodavid – david.kay@sero.co.uk – 4 November 2011

> Crying ‘Wolf’

It seems obvious that there would be a self-evident business case for making learning, teaching and research resources discoverable.

However, it is arguable that this is a ‘crying wolf’ scenario. Library, archive and museum services have been at this since time immemorial, and therefore the idea of a further push (better indexing, open licensing) for a special reason (the evolving information ecosystem) may be somewhat unappealing – especially in a period of austerity.

> So … what makes a business case?

In these times it may no longer be sufficient to argue a case on the grounds of service improvement and fulfilment of approved mission (e.g. the university’s library strategic plan).

It is arguable that if the library (or archive or museum) has a signed off plan, then changes in the mode of discovery and the underlying handling of metadata are solely tactical issues within that plan and its budget envelope. In reality, that depends on how good and recent the plan is! Indeed, faced with the twin pressures of the student as customer and institutional financial priorities, the stated service mission may not necessarily be the ideal foundation for a compelling business case.

When faced with the opportunities presented by new models for resource discovery and utilisation, the enabling services (not just institutional libraries, archives and museums but also the keepers of repositories, VLEs and OERs) need to weigh the following factors:

  • Institutional – Demands for step changes in efficiency and economy;
  • Users – Requirements of undergraduates, researchers and BCE partners;
  • Professional – Service improvement, enhancing local assets alongside wider resources;
  • Global – Alignment with prevalent technologies and wider developments in the knowledge ecosystem

All these facets – not one or another – need to be considered in a compelling business case for new modes of discovery, and presumably in any tenable strategic plan.

> Did the 2011 Discovery projects find the business case grail?

The eight projects supported under the first phase of the JISC RDTF Discovery programme were experimental, exploring ways and means of developing new services based on more discoverable metadata, alternative formats (including Linked Data) and open licensing. Common technical and professional challenges had to be addressed ahead of any assessment of the business case specific to the host institution.

Nevertheless, the projects identified several benefit cases worthy of evaluation. Not all of the suggested benefits will be appealing, let alone persuasive, in every library, archive or museum setting. Indeed, they are more specific to circumstances and vision than to curatorial domain.

The synthesis of project findings, undertaken by Mimas, found that the projects had proposed around 15 business case ‘arguments’. As operational scenarios solidify and mature, we can reasonably expect there will be more where these came from, all of which might be combined to present business cases for the service, for the institution and, not least, for the user. [It should be noted that the Discovery projects did not address the business case relating to global drivers, as the projects were predicated on this being a ‘given’ factor.]

> Paint a business case by numbers? A personal Top Ten

Following some discussion of the list of 15 business case arguments with colleagues (thanks especially to Mike Mertens of RLUK for his feedback), here is my personal ‘Top Ten’. You can find the rest plus links to the relevant projects at http://discovery.ac.uk.

Institutional Level – Serving strategic institutional objectives, especially in support of a more effective learning and more efficient research infrastructure.

1 – Fulfilling institutional policy commitment to Open Data provides a strong basis for this work
2 – Contributing proactively to wider strategic directions such as personalization, user co-creation and integrated resource discovery
3 – Following the likes of Google in opening data to serendipitous development is low cost and may yield unknown benefits

Practitioner Benefits (Librarians, Archivists, Curators) – More economic and effective ways of ensuring the collection is well described.

4 – Making better use of limited professional time by embedding records improvement in core workflows and / or by automating separately
5 – Providing more efficient mechanisms to generate more effective indexing and access points, based on standard and shared authorities

General User Benefits – Making the collection more discoverable, more accessible and linked to other relevant knowledge assets.

6 – Amplifying the impact of the collection by broadening the scope for discovery, achieving greater utilisation and enabling downstream discovery of relevant ‘linked’ resources
7 – Using open metadata to provide a richer user experience and create opportunities for a variety of interfaces

Researcher Benefits – Contributing to the research ecosystem, within and beyond the institution.

8 – Cultivating the international research ecosystem by minimising duplication of effort and avoiding knowledge silos
9 – Evolving scholarship by enabling participation of a wider community in testing, refining and building on research results
10 – Surfacing the unpredictable connections required by interdisciplinary research

> And finally…

Increasing numbers of managers and practitioners are involved in demonstrating the business case for enacting the principles endorsed by Discovery. What cuts it for you? Is it purely a cost metric or a measure of user satisfaction? Which of these arguments and what others would you put forward?