The Digital Public Library of America. Highlights from Robert Darnton’s recent talk

January 24, 2012

I was fortunate to be among those attending Robert Darnton’s talk on the Digital Public Library of America initiative last week. Harvard Professor and Director of Harvard Library, Darnton is a pivotal figure behind DPLA and his talk – most concurred – was both provocative and inspirational. Rather than simply describing the DPLA initiative, Darnton framed his talk with key issues and questions for us to reflect upon. How can we provide a context where as much knowledge as possible is freely available to all, and where we can leverage the internet to change the emerging patterns of locked down and monopolised chains of supply and demand? And as Professor David Baker highlighted in his introduction of Darnton, there is much alignment here with the broader and more aspirational ethos of Discovery: a striving to support new marketplaces, new patterns of demand, new business models – all in the ideal pursuit of the Public Good. Arguably naïve aspirations, but certainly the tenor in the room was one of consensus, a collective pleasure at being both challenged and inspired. Like Discovery, the DPLA is a vision, a movement, tackling these grand challenges, but also striving to make practical inroads along the way.

The remainder of this post attempts to capture Darnton’s key points, and also highlight some of the interesting themes emerging in the Q&A session that followed.


“He who receives an idea from me, receives instruction himself without lessening mine; as he who lights his taper at mine, receives light without darkening me.” – Thomas Jefferson


To frame his talk, Darnton invoked this oft-cited tenet of Thomas Jefferson – that the spread of knowledge benefits all. He aptly applied it to the internet and specifically to the principles of Open Access for the Public Good, with the assumption that one citizen’s benefit does not diminish another’s. But of course, he cautioned, this does not mean information is free, and we face a challenging time where, even as more knowledge is being produced, an ever smaller proportion of it is being made openly available to the public. To illustrate this, he pointed to how academic journal prices have risen at four times the rate of inflation, and these prices are expected to keep rising even as universities and libraries face increasing cutbacks. We need to ask, how can that increase in price be sustained? Health care may be a Public Good, but information about health is monopolised by those who will push prices as far as the market will bear.

Darnton acknowledged that publishers will reply by deprecating the naïveté of the Jeffersonian framing of the issue. And, he conceded, journal suppliers clearly add value; it’s fair they should benefit – but how much? Publishers often invoke the concept of a ‘marketplace of ideas’. But in a free marketplace, the best will survive. For Darnton, we are not currently operating in a free marketplace, as demand is simply not flexible – publishers create niche journals, territorialise, and then crush the competition.

The questions remain, then: how can we provide a context where as much knowledge as possible is freely available to all, and where we can leverage the internet to change these locked down and monopolised chains of supply and demand? The remainder of Darnton’s talk outlined the approaches being taken by the DPLA initiative. It’s early days, he acknowledged, but significant inroads are already being made.

So what is DPLA? A brief overview

Darnton addressed (relatively briefly) the scope and content of DPLA, the costs, the legal issues being tackled, technical approaches, and governance.

Scope and content: Like Discovery, the DPLA is not to be One Big Database – instead, the approach is to establish a distributed system aggregating collections from many institutions. Their vision is to provide one-click access to many different resource types, with the initial focus on producing a resource that gives full-text access to books in the public domain, e.g. from HathiTrust, the Internet Archive, and U.S. and international research libraries. Darnton also carefully highlighted that the DPLA vision is being openly and deliberately defined in a manner that makes the service distinct from those offered by public libraries, for instance by excluding everything from the last 5-10 years (with a wall that moves annually as more content becomes available in the Public Domain).

The key tactic to maximise impact and reduce costs will be to aggregate collections that already exist, and so when it opens, it will likely only contain a stock of public domain items, and will grow as fast as funding commits. To achieve this, it will be designed in a way that as much as possible makes it interoperable with other Digital Libraries (for example, an agreement has already been made with Europeana). So far funding has been dedicated to building this technical architecture, but there is also a strong concentration on ongoing digitisation and collaboratively funding such initiatives.

In terms of legal issues, Darnton anticipates that DPLA will be butting heads against copyright legislation. He clearly has strong personal views in this area (e.g. referring to the Google Books project as a ‘great idea gone wrong’, given Google’s failure to pursue making the content available under Fair Use), but he was careful to distinguish these views from any DPLA policy in this regard. As DPLA will be not-for-profit, he suggested that it might stand a good chance of invoking the Fair Use defence in the case of orphan works, for example, though he acknowledged this is difficult and new territory. Other open models referenced included a Scandinavian-style licence for public digital access to all books. He also stated that he sees the potential for private partnerships in developing value-added monetised services such as apps – while keeping the basic open access principles of the DPLA untouched.

The technical work of DPLA is still very much in progress, with a launch date of April 2013 for a technical prototype along with 6 winning ideas from a beta sprint competition. More information will be released soon.

In terms of governance, a committee has been convened and has only just started to study options for running DPLA.

Some questions from UK stakeholders

The Q&A session was kicked off by Martin Hall, VC of Salford University, who commented that in many ways there is much to be hopeful for in the UK in terms of the Open agenda. Open Access is going strong in the UK, with 120 open access repositories and, he stated, a government that seems to ‘get it’, largely because of a fascination with forces in the open market. As a result there is a clause in new policy about making public datasets ‘openly’ available. This is quite an extraordinary statement, Hall commented, given the implications for public health, etc., and it possibly indicates a step change. It all perhaps contributes to the quiet revolution occurring around Open Access.

Darnton responded by highlighting that in the USA they may have open access repositories, but that there is a low compliance rate in terms of depositing (and of course this is an issue in the UK too). But Harvard has recently mandated the deposit; and while there was less than 4% before, there is now over 50% compliance, and the repository “is bulging with new knowledge.”

In addition, Darnton reminded the group, while the government might be behind ‘Open,’ we still face opposition from the private sector. A lot of vested interests feel threatened by open access, and there is always a danger of vested interest groups capturing the attention of the government. It’s good, he said, to see hard business reasons being argued as well as cultural ones, but we need to be very careful.

Building on this point, Eric Thomas, Vice Chancellor of Bristol University, raised the issue of demonstrating public value – how do we achieve this? He noted that the focus of Darnton’s talk was on the supply side, but what about demand? To what extent is DPLA looking at ways to demonstrate public value, i.e. ‘this is what is happening now that couldn’t happen before…’?

In his response, Darnton referred to a number of grassroots approaches that are addressing this ‘demand’ side of the equation, including a roving Winnebago ‘roadshow’ to get communities participating in curating and digitising local artefacts. In short, DPLA is not about a website, but an organic, living entity… This approach, he later commented, was about encouraging participation from the top down and bottom up.

Alistair Dunning from JISC posed the question of what will ‘stop people from going to Google?’ Darnton was keen to point out that while he critiqued Google’s approach to the million books copyright situation, DPLA was in no way about ‘competing’ with Google. People must and will use Google, and DPLA will open its metadata and indexes to ensure they are discoverable by search engines. DPLA would highly value a collaborative partnership with Google.

Peter Burnhill from EDINA raised the critical question of licensing. Making data ‘open’ through APIs can allow people to do ‘unimaginable things’; what will the licensing provision for DPLA be? CC-0? Darnton acknowledged that this was still a matter of debate in terms of policy decisions – and especially around content. He agreed that there were unthought-of possibilities in terms of apps using DPLA, and they want to add value by taking this approach (and presumably consider sustainability options moving forward). In short, the content would be open access, and metadata likely openly licensed, but in terms of reuse of the content itself, this *could* be commercialised in order to sustain the DPLA.

In a later comment, Caroline Brazier from the British Library expressed admiration for the vision, the energy and the drive. She explained that from the BL perspective ‘we’re there for anybody who wants to do research’. She highlighted how the British Library and the community more broadly have a huge amount to do to push on with advocacy, particularly around copyright issues. This forces institutions of all sizes to rethink their roles in this environment – there are no barriers here, she suggested: we can do things differently. We need to think individually about what we do uniquely. What do we do? What do we invest in? What do we stop doing? Funding will be precious, and we really need to maximise the possibility to get funding.

Darnton agreed, and stated that there is a role for any library that has something unique and can make it available (and of course, the British Library is the pinnacle of this). The U.S. has many independent research libraries (the Huntington, Newberry, etc.) and DPLA very much wants to make room for them; it wants to reach out to these research libraries, which may be open-minded but are behind closed doors as far as the broader public is concerned.

The final question (and perhaps one of the most thought-provoking) came from Graham Taylor from the Publishers Association. He stated that he concurred with much of what Darnton had to say (perhaps surprising, he suggested, given his role) but he did comment that throughout the afternoon he had “not heard anything good about publishers.” So, he asked, where do publishers fit? In many regards, publishers are the risk-takers, the ones who work to protect intellectual property, and get all works out there – including those that pose ‘risk’ because they are not guaranteed blockbusters.

Darnton strongly agreed that publishers do add value, but, he explained, what he’s attacking are monopolistic commercial practices that are excessive to the point of damaging the world of knowledge. He was struck by Taylor’s comment on risk-taking, though, for indeed publishing is a very risky business. But sometimes the way risk is dealt with is unfortunate, with that emphasis on the blockbuster as opposed to a quality, sound backlist. So what can be done about this risk-taking and sharing the burden? Later this year, he said, Harvard would be hosting a conference that explores business opportunities in publishing in open access. If publishers are gatekeepers of quality, how can open access be used to the benefit of publishing, and so alleviate that risk-taking and raise quality?

Five Reasons To Be Cheerful

January 20, 2012

Five reasons to be cheerful about the Discovery Service Projects

David Kay, working with the Mimas Discovery team

So, what’s new? Another year, another round of projects – the second phase of the Discovery initiative.

Whilst it would be naïve to trumpet progress or to estimate distance travelled at this stage, I confess to being enthused by the discussions taking place at the kick off workshop in Birmingham on 11 January. You’ll find initial introductions to all the projects mentioned in this post here.

The meeting brought together 10 of the 11 projects linked to the JISC 13/11 call for Discovery Services, the Cambridge / Lincoln CLOCK project being the only absentee. So let’s start right there for the first of five observations in this post …

The CLOCK collaboration emerged directly from a fruitful dialogue about the practical value of open catalogue data in Phase 1 (check out the COMET and JEROME precursors). Likewise the Open Bibliography project, championed by the inventive Mark MacGillivray, continues powerful work started in the JISC Expo programme with 30m openly licensed records already in the bag – check out their demonstrator.

Observation 1 – Thinking shared and experience gained within the Discovery initiative is maturing into a powerful community tool.

And lest anyone should suggest that all the running is being made by libraries, up steps the AIM25 archival consortium with ‘Step Change’, working to apply the linked data based indexing productivity endorsed by archivists in Phase 1 to the widely used CALM cataloguing application. Meanwhile, in the world of museums, Contextual Wrappers 2 (led by the Cambridge Fitzwilliam Museum and Collections Trust, working with Knowledge Integration) plans to extend its collection descriptions model across the HE Museums sector, informed by a grounded ‘market’ survey. We should also highlight the efforts of Search25 (the M25 library consortium) and ServiceCore (the OU project harvesting dozens of Open Access repositories) to ensure their services address community needs.

Observation 2 – Responding to practitioner and community opinion is at the heart of Discovery aggregator thinking.

Discovery is not about a single model that fits all. However, the growing interest in Linked Open Data as an approach with a future is significant. This ranges from the Bodleian recognizing it as a vehicle for breaking down the silos that divide their own collections (the Digital.Bodleian project) to museums across the North East using linked data and supporting vocabularies in the Cutting Edge project to enable cross-searching of collections to meet the needs of very different types of users from schools to researchers. AIM25 Step Change shares the same confidence.

Observation 3 – There is a measured expectation that linked data can yield practical value for highly focused local services, as well as delivering in grand ‘web scale’ settings.

It is particularly interesting how the value of place and other geographic information is being leveraged in a variety of ways within the linked data model. Pelagios 2, involving Southampton and the OU with a range of international partners, is linking data to place to assist in cataloguing, annotation, search and visualization of ancient objects. Fast forward a couple of millennia and the DiscoverEDINA project is using an automated Geotagger to expose place metadata embedded in digital media files. The links of AIM25 Step Change to Historypin address the same theme.

Observation 4 – The adoption of common vocabularies seems essential to making the most of key access points across the ‘web of data’ – and place looks like the early candidate for generating critical mass.

The afternoon sessions focused on the objectives of the Discovery initiative under the themes ‘Terms of Use’, ‘Data’ and ‘Interfaces’ and the underlying quest for service sustainability. On behalf of the Mimas-led Discovery team, Owen Stephens set out 12 practical measures of quality implementation, whilst recognizing that no single project will address every measure.

Terms of Use

1 – Adopting open licensing

2 – Requiring clear reasonable terms and conditions

Data

3 – Using easily understood data models

4 – Deploying persistent identifiers

5 – Establishing data relationships by re-using authoritative identifiers

Interfaces

6 – Providing clear mechanisms for accessing APIs

7 – Documenting APIs

8 – Adopting widely understood data formats

Sustainability

9 – Ensuring data is sustainable

10 – Ensuring services are supported

11 – Using your own APIs

12 – Collecting data to measure use

Observation 5 – Whilst there is still much work to be done, Discovery is moving from abstract principles to tangible measures of practical implementation.
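
Since no single project is expected to address every measure, one simple way to use the list is as a checklist that reports which measures a project covers and which it leaves open. The sketch below is only an illustration: the `coverage` function and the example project are hypothetical and not part of any Discovery tooling; only the twelve measure names come from the list above.

```python
# The 12 measures as listed in the post, keyed by their number.
MEASURES = {
    1: "Adopting open licensing",
    2: "Requiring clear reasonable terms and conditions",
    3: "Using easily understood data models",
    4: "Deploying persistent identifiers",
    5: "Establishing data relationships by re-using authoritative identifiers",
    6: "Providing clear mechanisms for accessing APIs",
    7: "Documenting APIs",
    8: "Adopting widely understood data formats",
    9: "Ensuring data is sustainable",
    10: "Ensuring services are supported",
    11: "Using your own APIs",
    12: "Collecting data to measure use",
}


def coverage(addressed):
    """Split the measures into those a project addresses and those it does not.

    `addressed` is a set of measure numbers (hypothetical input format).
    """
    unknown = set(addressed) - set(MEASURES)
    if unknown:
        raise ValueError(f"Unknown measure numbers: {sorted(unknown)}")
    met = [MEASURES[n] for n in sorted(addressed)]
    missing = [MEASURES[n] for n in sorted(set(MEASURES) - set(addressed))]
    return met, missing


# Invented example: a project that addresses measures 1, 4 and 7 only.
met, missing = coverage({1, 4, 7})
print(f"{len(met)} of {len(MEASURES)} measures addressed")
for name in missing:
    print("not yet addressed:", name)
```

A project team could run something like this against its own plan to see at a glance where the gaps are, without implying that every gap needs closing.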

As you can tell, I think the plans and ambitions of these Phase 2 projects are indicative of healthy developments and increasing maturity in the wider Discovery initiative. And this is where the Discovery team led by Mimas has a vital role in supporting practical implementation beyond these institutions through case studies, guidance materials and targeted workshops … watch this space!

w/c 16 Jan 2012 – Discovery News Round-up

January 16, 2012

This is the first of my regular round-ups of what’s happening in the world of resource discovery. Twice a month I’ll be sharing what I’ve found during my internet travels and also highlighting things that have caught my eye under the #UKDiscovery Twitter hashtag. You can also see the latest tweets from that hashtag compiled into an eye-pleasing PDF format (created via the FiveFilters PDF newspaper maker).

Firstly, I want to share the output of the JISC Activity Data Synthesis project which I was involved with last year. The project website was published in November 2011, which already seems like a lifetime ago, but hopefully the collective wisdom gathered together there will be useful for some time to come.

[Screenshot: JISC Activity Data website]

The JISC Activity Data programme was a collection of nine projects which, although not directly part of the Discovery initiative, covered some relevant terrain – particularly around issues of licensing, metadata and open data. Other strong themes that emerged during the course of the programme were ‘big data’ (particularly so for the Exposing VLE Activity Data project), data storage and data visualisation. If you’re interested in getting to grips with data visualisation then the online talk that Tony Hirst kindly did for us as part of our virtual exchange sessions is well worth a watch. Five of the projects were focused on library activity data so they are worth exploring if that’s the domain you’re involved with: AEIOU, LIDP, RISE, SALT and OpenURL.

Now onto the highlights of things I’ve come across over the past few weeks:

Discovery exemplars

January 13, 2012

The Discovery programme is, in many ways, a slippery beast. It is not building one specific thing, but it is rather advocating a range of approaches that, if taken by libraries, museums and archives, should lead to better resource discovery services. This can make it difficult to explain. This is compounded by the fact we are learning as we go so messages are starting simple and high level and getting gradually richer and more granular as we learn more. Despite this, persuading people to adopt new approaches to licensing, technology and institutional processes is the key to achieving the aims of the Discovery programme. To help cope with this contradiction we resolved to build two exemplar services that show what is possible if the Discovery principles are adopted by collection owners and service builders.

We have now funded these projects and the work is starting to get underway.

EDINA are building Shakespeare’s Registry, an aggregation of online sources of digital resources relating to William Shakespeare, covering performance, interpretative and contextual resources in order to demonstrate the value and principles of metadata aggregation as part of the JISC/RLUK Discovery initiative.

Mimas are working with King’s College London to develop an API to enable people to explore content about World War One. They will work with other partners to develop two innovative interfaces built on top of the API. More detail on the background and intentions of this exemplar can be found on the dedicated blog.

The projects will comply with the Discovery open metadata and technical principles. As well as developing useful resources they will learn valuable lessons about how best to go about building resources that comply with the principles. Both aggregations will use APIs to aggregate the content and will do so via open interfaces rather than negotiating special access to the content. Both projects will focus on encouraging others to build on top of the APIs they develop rather than focusing on their own vision for an interface. This also means that both projects need to take an open approach to the metadata they aggregate and adopt suitable Creative Commons or Open Data Commons licences.
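
To make the aggregation idea a little more concrete, here is a minimal sketch of pulling metadata records from several sources through their open interfaces and keeping only those carrying an open licence. Everything in it is an assumption for illustration: the `Record` shape, the `aggregate` function, the licence labels treated as open, and the two stand-in sources are invented, and neither exemplar project is known to work this way.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List

# Licence labels treated as "open" in this sketch (an assumption, not a Discovery list).
OPEN_LICENCES = {"CC0", "CC-BY", "ODC-PDDL", "ODC-BY"}


@dataclass(frozen=True)
class Record:
    source: str
    identifier: str
    title: str
    licence: str


def aggregate(sources: Dict[str, Callable[[], Iterable[dict]]]) -> List[Record]:
    """Collect records from each source and keep only openly licensed ones."""
    records = []
    for name, fetch in sources.items():
        for raw in fetch():
            licence = raw.get("licence", "")
            if licence not in OPEN_LICENCES:
                continue  # skip anything without a recognised open licence
            records.append(Record(name, raw["id"], raw["title"], licence))
    return records


# Stand-ins for real API calls; the source names and data are invented.
def fetch_archive_a():
    return [
        {"id": "a1", "title": "Playbill, 1850", "licence": "CC0"},
        {"id": "a2", "title": "Prompt book", "licence": "All rights reserved"},
    ]


def fetch_museum_b():
    return [{"id": "b1", "title": "Costume design", "licence": "CC-BY"}]


records = aggregate({"archive_a": fetch_archive_a, "museum_b": fetch_museum_b})
for r in records:
    print(r.source, r.identifier, r.title, r.licence)
```

Passing each source in as a plain callable keeps the aggregator independent of how any one collection exposes its data, which is in the spirit of using open interfaces rather than special access arrangements.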

So, both projects have a lot on their plate and have challenging timescales. Both are scheduled to deliver the exemplar by July 2012. They both have the potential to be rich and interesting resources and will definitely learn useful lessons. We will update you on their progress via this blog.