Twelve Themes and a few more

September 28, 2012

All 17 projects funded by JISC in Phase 2 of the Discovery programme met in Birmingham today to share updates and ideas as they wind down their efforts. It was a very stimulating meeting, not least because the shared Discovery dialogue has developed significantly during 2012. The Phase 1 projects undertook some very useful experiments, but the Phase 2 projects have taken things up a notch.

Here, in very raw form, are the recurrent themes that I recorded as takeaways from the session.

A – Data and access points

  • Time and Place are priority access points
  • URIs offer an effective base level linking strategy
  • Collection level descriptions have potential as finding aids across domains
  • User generated content, such as annotations, has a place at the table

B – People

  • Community is a vital driver – open communities maintain momentum; specialist enthusiasms and ways of working provide strong use cases
  • For embedding new metadata practice, start where the workers are – add-ins to Calm and MODS demonstrate that
  • More IT experience / skills are required on the ground

C – The way the web works

  • Aggregators crawl, they don’t query … OAI-PMH, robots, etc. (a harvesting sketch follows this list)
  • Google’s strength shouts ‘Do it my way’ – and we should take heed (but we do need both/and)
  • Currency of data is important – there may be a tension with time lags associated with crawling
  • Aggregators need to know what is where in order to build or add value … so don’t we need a registry?
  • No man is an island – it’s a collaborative world, with requirements to interact with complementary services such as DBpedia, Europeana, Historypin, Pleiades, UKAT and VIAF
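As a rough illustration of the ‘crawl, don’t query’ point, the sketch below shows how an aggregator might harvest Dublin Core records over OAI-PMH rather than querying each source live. The endpoint URL is hypothetical, and resumption tokens, deleted records and incremental harvesting are left out for brevity:

```python
# Minimal sketch: an aggregator harvesting Dublin Core records over OAI-PMH.
# The endpoint URL is hypothetical; real harvesters also handle resumption
# tokens, deleted records and incremental (from/until) harvesting.
import requests
import xml.etree.ElementTree as ET

OAI = "{http://www.openarchives.org/OAI/2.0/}"
DC = "{http://purl.org/dc/elements/1.1/}"

def harvest(endpoint):
    """Yield (identifier, title) pairs from a ListRecords response."""
    params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
    response = requests.get(endpoint, params=params, timeout=30)
    root = ET.fromstring(response.content)
    for record in root.iter(OAI + "record"):
        header = record.find(OAI + "header")
        identifier = header.findtext(OAI + "identifier")
        title = record.findtext(".//" + DC + "title")
        yield identifier, title

if __name__ == "__main__":
    # Hypothetical repository endpoint, used purely for illustration.
    for oai_id, title in harvest("https://example.org/oai"):
        print(oai_id, "-", title)
```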

D – Tools and technology

  • There is opportunity / obligation to leverage expert authority data and vocabularies – examples as above and more, such as Victoria County History, …
  • Commonly used software tools include Drupal, Solr/Lucene, Elasticsearch, JavaScript and Twitter Bootstrap
  • JSON and RDF are strong format choices amongst the developers
  • Beware SPARQL endpoints and triple stores, especially in terms of performance (see the sketch after this list)
  • APIs are essential – but of little use without both documentation and example code
  • OSS tools have been built by several projects … but how do we leverage them (e.g. BibSoup, Alicat)?
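On the SPARQL performance caveat, a common defensive pattern is to keep queries bounded and to set client-side timeouts rather than trusting the endpoint to stay responsive. The sketch below is illustrative only – it uses the public DBpedia endpoint and an arbitrary query purely as an example, not as a recommendation of any particular service:

```python
# Rough sketch: querying a SPARQL endpoint defensively - bounded result set,
# explicit timeout, and JSON results rather than RDF/XML.
import requests

ENDPOINT = "https://dbpedia.org/sparql"  # public endpoint, used only as an example
QUERY = """
SELECT ?person ?name WHERE {
  ?person a <http://xmlns.com/foaf/0.1/Person> ;
          <http://xmlns.com/foaf/0.1/name> ?name .
} LIMIT 10
"""

response = requests.get(
    ENDPOINT,
    params={"query": QUERY, "format": "application/sparql-results+json"},
    timeout=10,  # don't let a slow triple store stall the whole pipeline
)
response.raise_for_status()
for binding in response.json()["results"]["bindings"]:
    print(binding["person"]["value"], "-", binding["name"]["value"])
```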

Radically Open Cultural Heritage Data at SXSW Interactive 2012

April 11, 2012


Posted by Adrian Stevenson

I had the privilege of attending the annual South by Southwest Interactive, Film and Music conference (SXSW) a few weeks ago in Austin, Texas. I was there as part of the ‘Radically Open Cultural Heritage Data on the Web’ Interactive panel session, along with Jon Voss from Historypin, Julie Allinson from the University of York digital library, and Rachel Frick from the Council on Library and Information Resources (CLIR). We were delighted to see that Mashable.com voted it one of ‘22 SXSW Panels You Can’t Miss This Year’.

All of our panelists covered themes and issues addressed by the Discovery initiative, including the importance of open licences and the need for machine-readable data via APIs to facilitate the easy transfer, aggregation and link-up of library, archive and museum content.

Jon gave some background on the ‘Linked Open Data in Libraries, Archives and Museums’ (LOD-LAM) efforts around the world, talking about how the first International LODLAM Summit held in San Francisco last year helped galvanise the LODLAM community. Jon also covered some recent work Historypin are doing to allow users to dig into archival records.

Julie then covered some of the technical aspects of publishing Linked Data through the lens of the OpenArt Discovery project, which recently released the ‘London Art World 1660-1735’ data. She mentioned some of the benefits of the Linked Data approach, and explained how they’ve been linking to VIAF for names and Geonames for location.
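To illustrate what such linking can look like in practice (this is a generic sketch, not the OpenArt data model), a record can simply assert links from its own URIs to the corresponding VIAF and GeoNames URIs. The example below uses rdflib, with made-up local URIs and placeholder identifiers:

```python
# Minimal sketch of linking a local record to VIAF and GeoNames URIs.
# The local base URI, the chosen properties and the external identifiers
# are all placeholders - none of this is taken from the OpenArt dataset.
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import OWL, RDF

FOAF = Namespace("http://xmlns.com/foaf/0.1/")
DATA = Namespace("http://example.org/openart/")  # hypothetical base URI

g = Graph()
painter = URIRef(DATA["person/godfrey-kneller"])

g.add((painter, RDF.type, FOAF.Person))
g.add((painter, FOAF.name, Literal("Godfrey Kneller")))
# Link the person to an external authority record (placeholder VIAF id)...
g.add((painter, OWL.sameAs, URIRef("http://viaf.org/viaf/123456")))
# ...and to a place via its GeoNames URI (placeholder GeoNames id).
g.add((painter, FOAF.based_near, URIRef("http://sws.geonames.org/0000000/")))

print(g.serialize(format="turtle"))
```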

I gave a quick overview of the LOCAH and Linking Lives projects, before giving a heads up to the World War One Discovery project. LOCAH has been making archival records from the Archives Hub national service available as Linked Data, and Linking Lives is a continuation project that’s using Linked Data from a variety of sources to create an interface based around the names of people in the Archives Hub. After attempting to crystallise what I see as the key benefits of Linked Data, I finished up by focusing on particular challenges we’ve met on our projects.

Rachel considered how open data might affect policies, procedures and the organisational structure of the library world. She talked about the Digital Public Library of America (DPLA), a growing initiative started in October 2010. The DPLA vision is to have an “open distributed network of comprehensive online resources that draw on the nation’s living history from libraries, universities, archives and museums to educate, inform, and empower everyone in current and future generations”. After outlining how the DPLA is aiming to achieve this vision, she explained how interested parties can get involved.

There’s an audio recording of the panel on our session page, as well as recordings of all sessions mentioned below on their respective SXSW pages. I’ve also included the slides for our session at the bottom of this post.

Not surprisingly, there were plenty of other great sessions at SXSW. I’ve picked a few highlights that I thought would be of interest to readers of this blog.

Probably of most relevance to Discovery was the lightning-fast ‘Open APIs: What’s Hot and What’s Not’ session from John Musser, founder of Programmableweb.com, who gave us what he sees as the eight hottest API trends. He mentioned that the REST style of software architecture is rapidly growing in popularity, being regarded as easier to use than other API technologies such as SOAP (see image below). JSON is very popular, with 60% of APIs now supporting it. It was also noted that one in five APIs don’t support XML.


The rise of REST – ‘Hot API Protocols and Styles’ from John Musser of Programmableweb.com at SXSW 2012

Musser suggested that APIs need to be supported, with hackathons and funded prizes being a good way to get people interested. He noted that the hottest trend right now is that VCs are providing significant funding to incentivise people to use their APIs, Twilio being one of the first to do this. He also mentioned that your API documentation needs to be live if you’re to gain interest and maintain use. Invisible mashups are also hot, with operating systems such as Apple’s OS cited as examples. Musser suggests the overall meta-trend is that APIs are now ubiquitous. John has now made his slides available on SlideShare.
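To make the REST-plus-JSON trend concrete, the snippet below shows the sort of call consumers have come to expect – a single HTTP GET returning JSON, with no SOAP envelope or WSDL to negotiate. The URL and parameters are placeholders, not a real API:

```python
# The shape of a typical REST + JSON call: one HTTP GET, a JSON body back.
# Compare this with constructing and parsing a SOAP XML envelope.
import requests

# Placeholder URL - stands in for any JSON-speaking REST API.
response = requests.get(
    "https://api.example.org/v1/records",
    params={"q": "archives", "page": 1},
    timeout=10,
)
response.raise_for_status()
for record in response.json().get("results", []):
    print(record.get("title"))
```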

The many users of laptops amongst us will have been interested to hear about the ‘Future of Wireless Power’.  The session didn’t go into great detail, but the message was very much “it’s not a new technology, and it’ll be here very soon”. Expect wireless power functionality in mobile devices in the next few years, using the Qi standard.

Some very interesting folks from MIT gave the thought-provoking ‘MIT Media Lab: Making Connections’ session. Joi Ito, Director of the MIT Media Lab, explained how it’s all about the importance of connecting people, stating that “we’re now beyond the cognitive limits of individuals, and are in an era where we rely on networks to make progress”. He suggested that traditional roadmaps are outmoded, and that we should throw them away and embrace serendipity if we’re to make real progress in technology. Ito mentioned that MIT has put significant funding into undirected research and an ‘anti-disciplinary’ approach. He said that we now have much agility in hardware as well as software, and that the agile software mentality is being applied to hardware development. He pointed to a number of projects that are embracing these ideas – idcubed, Affectiva, Sourcemap and Formlabs.

Josh Greenberg talked about ‘macroscopy’ in the ‘Data Visualization and the Future of Research’ session, which is essentially about how research is starting to be done at large scale. Josh suggested that ‘big data’ and computation are now very important for doing science, with macroscopy being the application of big data to research. He referred to the ‘Fourth Paradigm’ book, which presents the idea that research is now about data-intensive discovery. Lee Dirks from Microsoft gave us a look at some new open source tools they’ve been developing for data visualisation, including Layerscape, which allows users to explore and discover data, and Chronozoom, which looked useful for navigating through historical big data. Lee mentioned Chronozoom was good for rich data sources such as archive and museum data, demoing it using resources relating to the Industrial Revolution.

So that was about it for the sessions I was able to get to as part of the SXSW Interactive conference. It was a really amazing event, and I’d highly recommend it to anyone as a great way to meet some of the top people in the technology sector, and of course, hear some great sessions.

The slides from our session:



The Digital Public Library of America. Highlights from Robert Darnton’s recent talk

January 24, 2012

I was fortunate to be among those attending Robert Darnton’s talk on the Digital Public Library of America initiative last week. Harvard Professor and Director of Harvard Library, Darnton is a pivotal figure behind the DPLA, and his talk – most concurred – was both provocative and inspirational. More than a description of the DPLA initiative, Darnton framed his talk with key issues and questions for us to reflect upon. How can we provide a context where as much knowledge as possible is freely available to all? Where we can leverage the internet to change the emerging patterns of locked-down and monopolised chains of supply and demand? And as Professor David Baker highlighted in his introduction of Darnton, there is much alignment here with the broader and more aspirational ethos of Discovery: a striving to support new marketplaces, new patterns of demand, new business models – all in the ideal pursuit of the Public Good. Arguably naïve aspirations, but certainly the tenor in the room was one of consensus, a collective pleasure at being both challenged and inspired. Like Discovery, the DPLA is a vision, a movement, tackling these grand challenges, but also striving to make practical inroads along the way.

The remainder of this post attempts to capture Darnton’s key points, and also highlight some of the interesting themes emerging in the Q&A session that followed.

————-

 “He who receives ideas from me, receives instruction himself without lessening mine; as he who lights his taper at mine receives light without darkening me” Thomas Jefferson

 

To frame his talk, Darnton invoked this oft-cited tenet of Thomas Jefferson – that the spread of knowledge benefits all. He aptly applied this principle to the internet, and specifically to the principles of Open Access for the Public Good and the assumption that one citizen’s benefit does not diminish another’s. But of course, he cautioned, this does not mean information is free, and we face a challenging time where, even as more knowledge is being produced, an ever smaller proportion of it is being made openly available to the public. To illustrate this, he pointed to how academic journal prices have increased at four times the rate of inflation, and these rates are expected to keep rising even as universities and libraries face increasing cutbacks. We need to ask: how can that increase in price be sustained? Health care may be a Public Good, but information about health is monopolised by those who will push it as far as the market will bear.

Darnton acknowledged that publishers will reply by deprecating the naiveté of the Jeffersonian framing of the issue. And, he conceded, journal suppliers clearly add value; it’s fair they should benefit – but how much? Publishers often invoke the concept of a ‘marketplace of ideas’, but in a free marketplace the best will survive. For Darnton, we are not currently operating in a free marketplace, as demand is simply not flexible – publishers create niche journals, territorialise, and then crush the competition.

The questions remain, then: how can we provide a context where as much knowledge as possible is freely available to all? Where we can leverage the internet to change these locked-down and monopolised chains of supply and demand? The remainder of Darnton’s talk outlined the approaches being taken by the DPLA initiative. It’s early days, he acknowledged, but significant inroads are already being made.

So what is DPLA? A brief overview

Darnton addressed, relatively briefly, the scope and content of the DPLA, the costs, the legal issues being tackled, technical approaches, and governance.

Scope and content: Like Discovery, the DPLA is not to be One Big Database – instead, the approach is to establish a distributed system aggregating collections from many institutions. The vision is to provide one-click access to many different resource types, with the initial focus on producing a resource that gives full-text access to books in the public domain, e.g. from HathiTrust, the Internet Archive, and U.S. and international research libraries. He also carefully highlighted that the DPLA vision is being openly and deliberately defined in a manner that makes the service distinct from those offered by public libraries, for instance excluding everything from the last 5-10 years (with a moving wall each year as more content comes into the public domain).

The key tactic to maximise impact and reduce costs will be to aggregate collections that already exist, so when it opens it will likely only contain a stock of public domain items, and will grow as fast as funding allows. To achieve this, it will be designed to be as interoperable as possible with other digital libraries (for example, an agreement has already been made with Europeana). So far funding has been dedicated to building this technical architecture, but there is also a strong concentration on ongoing digitisation and on collaboratively funding such initiatives.

In terms of legal issues, Darnton anticipates that the DPLA will be butting heads against copyright legislation – he clearly has strong personal views in this area (e.g. referring to the Google Books project as a ‘great idea gone wrong’, with Google’s failure to pursue making the content available under Fair Use), but he was careful to distinguish these views from any DPLA policy in this regard. As the DPLA will be not-for-profit, he suggested that it might stand a good chance of invoking the Fair Use defence in the case of orphan works, for example, though he also acknowledged this is difficult and new territory. Other open models referenced included a Scandinavian-style licence for public digital access to all books. He also stated that he sees the potential for private partnerships in developing value-added monetised services such as apps – while keeping the basic open access principles of the DPLA untouched.

The technical work of DPLA is still very much in progress, with a launch date of April 2013 for a technical prototype along with 6 winning ideas from a beta sprint competition. More information will be released soon.

In terms of governance, a committee has been convened and has only just started to study options for running DPLA.

Some questions from UK stakeholders

The Q&A session was kicked off by Martin Hall, Vice-Chancellor of Salford University, who commented that in many ways there is much to be hopeful for in the UK in terms of the Open agenda. Open Access is going strongly in the UK, with 120 open access repositories and, he stated, a government that seems to ‘get it’, largely because of a fascination with forces in the open market. As a result there is a clause in new policy about making public datasets available ‘openly’. This is quite an extraordinary statement, Hall commented, given the implications for public health, etc., and possibly indicates a step change. It all perhaps contributes to the quiet revolution occurring around Open Access.

Darnton responded by highlighting that in the USA they may have open access repositories, but there is a low compliance rate in terms of depositing (and of course this is an issue in the UK too). Harvard, however, has recently mandated deposit; where compliance was less than 4% before, it is now over 50%, and the repository “is bulging with new knowledge.”

In addition, Darnton reminded the group that while the government might be behind ‘Open’, we still face opposition from the private sector. A lot of vested interests feel threatened by open access, and there is always a danger of vested interest groups capturing the attention of the government. It’s good, he said, to see hard business reasons being argued as well as cultural ones, but we need to be very careful.

Building on this issue, Eric Thomas, Vice-Chancellor of Bristol University, raised the issue of demonstrating public value – how do we achieve this? He noted that the focus of Darnton’s talk was on the supply side, but what about demand? To what extent is the DPLA looking at ways to demonstrate public value, i.e. ‘this is what is happening now that couldn’t happen before…’?

In his response, Darnton referred to a number of grassroots approaches that are addressing this ‘demand’ side of the equation, including a roving Winnebago ‘roadshow’ to get communities participating in curating and digitising local artefacts. In short, the DPLA is not about a website, but an organic, living entity… This approach, he later commented, was about encouraging participation from the top down and the bottom up.

Alistair Dunning from JISC posed the question of what will stop people from going to Google. Darnton was keen to point out that while he critiqued Google’s approach to the million-books copyright situation, the DPLA was in no way about ‘competing’ with Google. People must and will use Google, and the DPLA will open its metadata and indexes to ensure they are discoverable by search engines. The DPLA would highly value a collaborative partnership with Google.

Peter Burnhill from EDINA raised the critical question of licensing. Making data ‘open’ through APIs can allow people to do ‘unimaginable things’; what will the licensing provision for the DPLA be? CC0? Darnton acknowledged that this was still a matter of debate in terms of policy decisions – and especially around content. He agreed that there were unthought-of possibilities in terms of apps using the DPLA, and they want to add value by taking this approach (and presumably consider sustainability options moving forward). In short, the content would be open access, and the metadata likely openly licensed, but reuse of the content itself *could* be commercialised in order to sustain the DPLA.

In a later comment, Caroline Brazier from the British Library expressed admiration for the vision, the energy and the drive. She explained that from the BL perspective ‘we’re there for anybody who wants to do research’, and highlighted that the British Library and the community more broadly have a huge amount to do to push on with advocacy, particularly around copyright issues. This forces institutions of all sizes to rethink their roles in this environment – there are no barriers here, she suggested: we can do things differently. We need to think individually about what we do uniquely. What do we do? What do we invest in? What do we stop doing? Funding will be precious, and we really need to maximise the possibility of getting funding.

Darnton agreed, and stated that there is a role for any library that has something unique to make it available (and of course, the British Library is the pinnacle of this). The U.S. has many independent research libraries (the Huntington, the Newberry, etc.) and the DPLA very much wants to make room for them; it wants to reach out to these research libraries, which may be open-minded but whose collections remain behind closed doors as far as the broader public is concerned.

The final question (and perhaps one of the most thought-provoking) came from Graham Taylor of the Publishers Association. He stated that he concurred with much of what Darnton had to say (perhaps surprising, he suggested, given his role) but he did comment that throughout the afternoon he had “not heard anything good about publishers.” So, he asked, where do publishers fit? In many regards, publishers are the risk-takers, the ones who work to protect intellectual property and get all works out there – including those that pose ‘risk’ because they are not guaranteed blockbusters.

Darnton strongly agreed that publishers do add value, but, he explained, what he’s attacking is excessive, monopolistic commercial practice to such an extent that it is damaging the world of knowledge. He was struck by Taylor’s comment on risk-taking, though, for indeed publishing is a very risky business. But sometimes the way risk is dealt with is unfortunate, with that emphasis on the blockbuster as opposed to a quality, sound backlist. So what can be done about this risk-taking and sharing the burden? Later this year, he said, Harvard would be hosting a conference exploring business opportunities in open access publishing. If publishers are gatekeepers of quality, how can open access be used to the benefit of publishing, and so alleviate that risk-taking and raise quality?


The Case for Discovery – reflections on a presentation to RLUK members

November 27, 2011

David Kay – david.kay@sero.co.uk – 27 November 2011

I was pleased to have the opportunity to talk about the Discovery initiative at the RLUK members meeting last week (#rlukmm11). This blog post picks up key points raised in the Twitter stream (thanks especially to Simon Bains, Mike Mertens, Tracey Stanley and Owen Stephens) and links them to my concluding suggestions.

The presentation mixed an update on progress to date (because a lot has happened in the six months since Discovery was ‘named’) with a focus on emerging business cases for further investment of valuable institutional and collective effort in this space, leading to some collective considerations for RLUK.

My suggestion is that the ‘business case’ for investment in resource description and discovery hinges on opportunities for gains in economy (back of house), efficiency (relating to both library and patron workflows) and effectiveness (better supporting 21st century modes of research and learning). I’ve set out 10 benefit cases drawn from recent institutional and consortium projects in a recent Discovery blogpost. However, as pointed out in questions, these ‘business arguments’ need to be sharpened to identify ROI and how it will be measured – member suggestions will be most welcome!

In addition, I proposed ‘expression’ as an essential part of the top-line business case. However good the service offered by local discovery layers and globally by the likes of Google, there is a gap between the way records are currently discovered and the style of connected navigation that could be offered through more complete, consistent and connected classification geared to academic enquiry. This is about taking the value we already provide in ‘cataloguing’ and making that work for us in the web of scholarly data within and beyond our institutional controls – across libraries, archives and museums, plus resources such as learning assets, OERs and research data.

Making relevant ‘stuff’ (the library catalogue and the rest, within and beyond the academy) discoverable as linked open data is an obvious way to support this approach – but my key point is about a business requirement (truly joined up expression of scholarly data and metadata) rather than a technology. I suggested that, of all the things to be done to enact that transformation, senior managers should concentrate on the key enablers – metadata licensing, use of common identifiers and authorities across all types of records, service sustainability and measurement – whilst ensuring appropriate staff are skilled in the mechanics.

Comments on the Twitter stream suggested that there is very little distance between this Discovery proposition and what Paul Ayris set out in the recommendations emerging from the shared cataloguing working group. Owen Stephens tweeted that perhaps this represents the major Discovery use case from an RLUK perspective – though we definitely need the Discovery programme to exemplify more cases. Both these presentations indicated a long-term objective geared to serving teaching, learning and research, whilst offering economies and efficiencies along the way. However, the five-year horizon is very distant – and therefore I would emphasise the complementary short-term opportunities and stepping stones listed at the end of my presentation.

  1. Liberate Copac – publish Copac as open data and potentially as Linked Open Data; the first cut may only involve limited authorities but would still enable the potential to be tested alongside services such as the Archives Hub
  2. Animate Knowledge Base Plus – play a leading role in the collective population of this shared subscription and licence dataset, which may be of significant assistance in future licensing work with JISC Collections
  3. Review scope of other RLUK initiatives – establish whether common authorities and open licensing may be priority components in work such as the shared cataloguing and special collections initiatives
  4. Assess the wider curatorial landscape – identify where RLUK could be taking collective steps of this type in areas such as learning assets and research data
  5. ‘Understand’ e-books in this context – whilst the metadata supply chain and workflows remain extremely uncertain, alignment with this direction of travel will be essential (and in 5 years may be a lost opportunity)
  6. Consider action on identifiers, authorities and access points – all of the above raise the challenge of collectively adopting key reference points, presumably including name, place and subject; a working group specifically focused on this and looking beyond libraries may be of value

My personal observation is that these represent immediate and low cost collective opportunities to assess and develop metadata infrastructure in anticipation of the roles that RLUK might play in a changing knowledge and service environment, both within the academy and in the wider UK context.

And last but not least, thanks again to RLUK for the chance to attend a very stimulating event.


Making resources discoverable … is there a business case?

November 11, 2011

David Kay – @serodavid – david.kay@sero.co.uk – 4 November 2011

> Crying ‘Wolf’

It might seem that there is a self-evident business case for making learning, teaching and research resources discoverable.

However, it is arguable that this is a ‘crying wolf’ scenario. Library, archive and museum services have been at this since time immemorial, and therefore the idea of a further push (better indexing, open licensing) for a special reason (the evolving information ecosystem) may be somewhat unappealing – especially in a period of austerity.

> So … what makes a business case?

In these times it may no longer be sufficient to argue a case on the grounds of service improvement and fulfilment of approved mission (e.g. the university’s library strategic plan).

It is arguable that if the library (or archive or museum) has a signed off plan, then changes in the mode of discovery and the underlying handling of metadata are solely tactical issues within that plan and its budget envelope. In reality, that depends on how good and recent the plan is! Indeed, faced with the twin pressures of the student as customer and institutional financial priorities, the stated service mission may not necessarily be the ideal foundation for a compelling business case.

When faced with the opportunities presented by new models for resource discovery and utilisation, the enabling services (not just institutional libraries, archives and museums but also the keepers of repositories, VLEs and OERs) need to weigh the following factors:

  • Institutional – Demands for step changes in efficiency and economy;
  • Users – Requirements of undergraduates, researchers and BCE partners;
  • Professional – Service improvement, enhancing local assets alongside wider resources;
  • Global – Alignment with prevalent technologies and wider developments in the knowledge ecosystem

All these facets – not one or another – need to be considered in a compelling business case for new modes of discovery, and presumably in any tenable strategic plan.

> Did the 2011 Discovery projects find the business case grail?

The eight projects supported under the first phase of the JISC RDTF Discovery programme were experimental, exploring ways and means of developing new services based on more discoverable metadata, alternative formats (including Linked Data) and open licensing. Common technical and professional challenges had to be addressed ahead of any assessment of the business case specific to the host institution.

Nevertheless, the projects identified several benefit cases worthy of evaluation. Not all of the suggested benefits will be appealing, let alone persuasive, in every library, archive or museum setting. Indeed, they are more specific to circumstances and vision than to curatorial domain.

The synthesis of project findings, undertaken by Mimas, found that the projects had proposed around 15 business case ‘arguments’. As operational scenarios solidify and mature, we can reasonably expect there will be more where these came from, all of which might be combined to present business cases for the service, for the institution and, not least, for the user. [It should be noted that the Discovery projects did not address the business case relating to global drivers, as the projects were predicated on this being a ‘given’ factor.]

> Paint a business case by numbers? A personal Top Ten

Following some discussion of the list of 15 business case arguments with colleagues (thanks especially to Mike Mertens of RLUK for his feedback), here is my personal ‘Top Ten’. You can find the rest plus links to the relevant projects at http://discovery.ac.uk.

Institutional Level – Serving strategic institutional objectives, especially in support of a more effective learning and more efficient research infrastructure.

1 – Fulfilling institutional policy commitment to Open Data provides a strong basis for this work
2 – Contributing proactively to wider strategic directions such as personalization, user co-creation and integrated resource discovery
3 – Following the example of Google in opening data to serendipitous development is low cost and may yield unknown benefits

Practitioner Benefits (Librarians, Archivists, Curators) – More economic and effective ways of ensuring the collection is well described.

4 – Making better use of limited professional time by embedding records improvement in core workflows and / or by automating separately
5 – Providing more efficient mechanisms to generate more effective indexing and access points, based on standard and shared authorities

General User Benefits – Making the collection more discoverable, more accessible and linked to other relevant knowledge assets.

6 – Amplifying the impact of the collection by broadening the scope for discovery, achieving greater utilisation and enabling downstream discovery of relevant ‘linked’ resources
7 – Using open metadata to provide a richer user experience and create opportunities for a variety of interfaces

Researcher Benefits – Contributing to the research ecosystem, within and beyond the institution.

8 – Cultivating the international research ecosystem by minimising duplication of effort and avoiding knowledge silos
9 – Evolving scholarship by enabling participation of a wider community in testing, refining and building on research results
10 – Surfacing the unpredictable connections required by interdisciplinary research

> And finally…

Increasing numbers of managers and practitioners are involved in demonstrating the business case for enacting the principles endorsed by Discovery. What cuts it for you? Is it purely a cost metric or a measure of user satisfaction? Which of these arguments and what others would you put forward?