The Digital Public Library of America. Highlights from Robert Darnton’s recent talk

January 24, 2012

I was fortunate to be among those attending Robert Darnton’s talk on the Digital Public Library of America initiative last week. Harvard Professor and Director of the Harvard Library, Darnton is a pivotal figure behind the DPLA, and his talk – most concurred – was both provocative and inspirational. More than a description of the DPLA initiative, Darnton framed his talk with key issues and questions for us to reflect upon. How can we provide a context where as much knowledge as possible is freely available to all? How can we leverage the internet to change the emerging patterns of locked-down and monopolised chains of supply and demand? And as Professor David Baker highlighted in his introduction of Darnton, there is much alignment here with the broader and more aspirational ethos of Discovery: a striving to support new marketplaces, new patterns of demand, new business models – all in the ideal pursuit of the Public Good. Arguably naïve aspirations, but the tenor in the room was certainly one of consensus, a collective pleasure at being both challenged and inspired. Like Discovery, the DPLA is a vision, a movement, tackling these grand challenges while also striving to make practical inroads along the way.

The remainder of this post attempts to capture Darnton’s key points, and also highlight some of the interesting themes emerging in the Q&A session that followed.


 “He who receives ideas from me, receives instruction himself without lessening mine; as he who lights his taper at mine receives light without darkening me” Thomas Jefferson


To frame his talk, Darnton invoked this oft-cited tenet of Thomas Jefferson – that the spread of knowledge benefits all. He aptly applied it to the internet, and specifically to the principles of Open Access for the Public Good and the assumption that one citizen’s benefit does not diminish another’s. But of course, he cautioned, this does not mean information is free, and we face a challenging time where, even as more knowledge is being produced, an ever smaller proportion of it is being made openly available to the public. To illustrate this, he pointed to how the cost of academic journals has increased at four times the rate of inflation, and these rates are expected to continue rising even as universities and libraries face increasing cutbacks. We need to ask: how can that increase in price be sustained? Health care may be a Public Good, but information about health is monopolised by those who will push it as far as the market will bear.

Darnton acknowledged that publishers will reply by deprecating the naiveté of the Jeffersonian framing of the issue. And, he conceded, journal suppliers clearly add value; it’s fair they should benefit – but how much? Publishers often invoke the concept of a ‘marketplace of ideas’, in which the best will survive. But for Darnton, we are not currently operating in a free marketplace, as demand is simply not elastic – publishers create niche journals, territorialise, and then crush the competition.

The questions remain, then: how can we provide a context where as much knowledge as possible is freely available to all? How can we leverage the internet to change these locked-down and monopolised chains of supply and demand? The remainder of Darnton’s talk outlined the approaches being taken by the DPLA initiative. It’s early days, he acknowledged, but significant inroads are already being made.

So what is DPLA? A brief overview

Darnton addressed (in relative brief) the scope and content of DPLA, the costs, the legal issues being tackled, technical approaches, and governance.

Scope and content: Like Discovery, the DPLA is not to be One Big Database – instead, the approach is to establish a distributed system aggregating collections from many institutions. The vision is to provide one-click access to many different resource types, with the initial focus on producing a resource that gives full-text access to books in the public domain, e.g. from the Hathi Trust, the Internet Archive, and U.S. and international research libraries. He also carefully highlighted that the DPLA vision is being openly and deliberately defined in a manner that makes the service distinct from those offered by public libraries, for instance excluding everything from the last 5-10 years (with a wall that moves annually as more content enters the Public Domain).

The key tactic to maximise impact and reduce costs will be to aggregate collections that already exist, so when it opens it will likely contain only a stock of public domain items, and will grow as fast as funding allows. To achieve this, it will be designed to be as interoperable as possible with other digital libraries (for example, an agreement has already been made with Europeana). So far funding has been dedicated to building this technical architecture, but there is also a strong focus on ongoing digitisation and on collaboratively funding such initiatives.

In terms of legal issues, Darnton anticipates that the DPLA will be butting heads with copyright legislation – he clearly has strong personal views in this area (e.g. referring to the Google Books project as a ‘great idea gone wrong’, with Google failing to pursue making the content available under Fair Use), but he was careful to distinguish these views from any DPLA policy in this regard. As the DPLA will be not-for-profit, he suggested that it might stand a good chance of invoking the Fair Use defence in the case of orphan works, for example, though he acknowledged this is difficult and new territory. Other open models referenced included a Scandinavian-style licence for public digital access to all books. He also sees the potential for private partnerships in developing value-added monetised services such as apps – while keeping the basic open access principles of the DPLA untouched.

The technical work of DPLA is still very much in progress, with a launch date of April 2013 for a technical prototype along with 6 winning ideas from a beta sprint competition. More information will be released soon.

In terms of governance, a committee has been convened and has only just started to study options for running DPLA.

Some questions from UK stakeholders

The Q&A session was kicked off by Martin Hall, VC of Salford University, who commented that in many ways there is much to be hopeful for in the UK in terms of the Open agenda. Open Access is going strong in the UK, with 120 open access repositories, and, he stated, a government that seems to ‘get it’, largely because of a fascination with forces in the open market. As a result there is a clause in new policy about making public datasets openly available. This is quite an extraordinary statement, Hall commented, given the implications for public health, etc., and possibly indicates a step change. It all perhaps contributes to the quiet revolution occurring around Open Access.

Darnton responded by highlighting that in the USA they may have open access repositories, but there is a low compliance rate in terms of depositing (and of course this is an issue in the UK too). Harvard, however, has recently mandated deposit; compliance, previously under 4%, is now over 50%, and the repository “is bulging with new knowledge.”

In addition, Darnton reminded the group, while the government might be behind ‘Open’, we still face opposition from the private sector. A lot of vested interests feel threatened by open access, and there is always a danger of vested interest groups capturing the attention of the government. It’s good to see hard business reasons being argued as well as cultural ones, he said, but we need to be very careful.

Building on this issue, Eric Thomas, Vice Chancellor of Bristol University, raised the issue of demonstrating public value – how do we achieve this? He noted that the focus of Darnton’s talk was on the supply side, but what about demand? To what extent is the DPLA looking at ways to demonstrate public value, i.e. ‘this is what is happening now that couldn’t happen before…’?

In his response, Darnton referred to a number of grassroots approaches that are addressing this ‘demand’ side of the equation, including a roving Winnebago ‘roadshow’ to get communities participating in curating and digitising local artefacts. In short, the DPLA is not about a website, but an organic, living entity… This approach, he later commented, was about encouraging participation from the top down and the bottom up.

Alistair Dunning from JISC posed the question of what will stop people from going to Google. Darnton was keen to point out that while he critiqued Google’s approach to the million-books copyright situation, the DPLA is in no way about ‘competing’ with Google. People must and will use Google, and the DPLA will open its metadata and indexes to ensure they are discoverable by search engines. The DPLA would highly value a collaborative partnership with Google.

Peter Burnhill from EDINA raised the critical question of licensing. Making data ‘open’ through APIs can allow people to do ‘unimaginable things’; what will the licensing provision for the DPLA be? CC0? Darnton acknowledged that this was still a matter of debate in terms of policy decisions – especially around content. He agreed that there were unthought-of possibilities for apps using the DPLA, and they want to add value by taking this approach (and presumably consider sustainability options moving forward). In short, the content would be open access, and the metadata likely openly licensed, but reuse of the content itself *could* be commercialised in order to sustain the DPLA.

In a later comment, Caroline Brazier from the British Library expressed admiration for the vision, the energy and the drive. She explained that from the BL perspective ‘we’re there for anybody who wants to do research’, and highlighted that the British Library and the community more broadly have a huge amount to do to push on with advocacy, particularly around copyright issues. This forces institutions of all sizes to rethink their roles in this environment – there are no barriers here, she suggested: we can do things differently. We need to think individually about what we do uniquely. What do we do? What do we invest in? What do we stop doing? Funding will be precious, and we really need to maximise the possibility of getting it.

Darnton agreed, and stated that there is a role for any library that has something unique to make it available (and of course, the British Library is the pinnacle of this). The U.S. has many independent research libraries (the Huntington, the Newberry, etc.) and the DPLA very much wants to make room for them; it wants to reach out to these research libraries, which may be open-minded but remain largely behind closed doors as far as the broader public is concerned.

The final (and perhaps one of the most thought-provoking) questions came from Graham Taylor of the Publishers Association. He stated that he concurred with much of what Darnton had to say (perhaps surprisingly, he suggested, given his role), but he did comment that throughout the afternoon he had “not heard anything good about publishers.” So, he asked, where do publishers fit? In many regards, publishers are the risk-takers, the ones who work to protect intellectual property and get all works out there – including those that pose a ‘risk’ because they are not guaranteed blockbusters.

Darnton strongly agreed that publishers do add value, but, he explained, what he’s attacking is excessive, monopolistic commercial practice, to such an extent that it is damaging the world of knowledge. He was struck by Taylor’s comment on risk-taking, though, for indeed publishing is a very risky business. But sometimes the way risk is dealt with is unfortunate, with the emphasis on the blockbuster as opposed to a quality, sound backlist. So what can be done about this risk-taking and sharing the burden? Later this year, he said, Harvard would be hosting a conference exploring business opportunities in open access publishing. If publishers are gatekeepers of quality, how can open access be used to the benefit of publishing, and so alleviate that risk-taking and raise quality?

The Case for Discovery – reflections on a presentation to RLUK members

November 27, 2011

David Kay – 27 November 2011

I was pleased to have the opportunity to talk about the Discovery initiative at the RLUK members meeting last week (#rlukmm11). This blog post picks up key points raised in the Twitter stream (thanks especially to Simon Bains, Mike Mertens, Tracey Stanley and Owen Stephens) and links them to my concluding suggestions.

The presentation mixed an update on progress to date (because a lot has happened in the six months since Discovery was ‘named’) with a focus on emerging business cases for further investment of valuable institutional and collective effort in this space, leading to some collective considerations for RLUK.

My suggestion is that the ‘business case’ for investment in resource description and discovery hinges on opportunities for gains in economy (back of house), efficiency (relating to both library and patron workflows) and effectiveness (better supporting 21st century modes of research and learning). I’ve set out 10 benefit cases drawn from recent institutional and consortium projects in a recent Discovery blogpost. However, as pointed out in questions, these ‘business arguments’ need to be sharpened to identify ROI and how it will be measured – member suggestions will be most welcome!

In addition, I proposed ‘expression’ as an essential part of the top-line business case. However good the service offered by local discovery layers, and globally by the likes of Google, there is a gap between the way records are currently discovered and the style of connected navigation that could be offered through more complete, consistent and connected classification geared to academic enquiry. This is about taking the value we already provide in ‘cataloguing’ and making that work for us in the web of scholarly data within and beyond our institutional controls – across libraries, archives and museums, as well as learning assets, OERs and research data.

Making relevant ‘stuff’ (the library catalogue and the rest, within and beyond the academy) discoverable as linked open data is an obvious way to support this approach – but my key point is about a business requirement (truly joined up expression of scholarly data and metadata) rather than a technology. I suggested that, of all the things to be done to enact that transformation, senior managers should concentrate on the key enablers – metadata licensing, use of common identifiers and authorities across all types of records, service sustainability and measurement – whilst ensuring appropriate staff are skilled in the mechanics.
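To make the idea of ‘common identifiers and authorities across all types of records’ concrete, here is a minimal sketch (not Discovery or Copac code – the example.org, VIAF and LCSH URIs below are illustrative placeholders) of one catalogue record expressed as linked open data triples and serialised by hand as Turtle:

```python
# A minimal sketch of expressing one catalogue record as linked open data.
# All URIs below are illustrative placeholders, not real authority records.

record = {
    "uri": "<http://example.org/book/42>",
    "triples": [
        ("dct:title", '"An Example Monograph"'),
        # Shared authority URIs are the key enabler: the same (hypothetical)
        # VIAF URI could link this record to archive and museum records
        # about the same person, across institutional boundaries.
        ("dct:creator", "<http://viaf.org/viaf/00000000>"),
        ("dct:subject", "<http://id.loc.gov/authorities/subjects/sh00000000>"),
    ],
}

prefixes = "@prefix dct: <http://purl.org/dc/terms/> .\n"
body = "".join(
    f'{record["uri"]} {p} {o} .\n' for p, o in record["triples"]
)
turtle = prefixes + body
print(turtle)
```

The point of the sketch is the business requirement, not the technology: once records from different institutions reuse the same creator and subject URIs, ‘connected navigation’ across collections falls out of ordinary joins on those identifiers.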

Comments on the Twitter stream suggested that there is very little distance between this Discovery proposition and what Paul Ayris set out in the recommendations emerging from the shared cataloguing working group. Owen Stephens tweeted that perhaps this represents the major Discovery use case from an RLUK perspective – though we definitely need the Discovery programme to exemplify more cases. Both these presentations indicated a long-term objective geared to serving teaching, learning and research, whilst offering economies and efficiencies along the way. However, the five-year horizon is very distant – and therefore I would emphasise the complementary short-term opportunities and stepping stones listed at the end of my presentation.

  1. Liberate Copac – publish Copac as open data and potentially as Linked Open Data; the first cut may only involve limited authorities but would still enable the potential to be tested alongside the likes of the Archives Hub
  2. Animate Knowledge Base Plus – play a leading role in the collective population of this shared subscription and licence dataset, which may be of significant assistance in future licensing work with JISC Collections
  3. Review scope of other RLUK initiatives – establish whether common authorities and open licensing should be priority components in, for example, the shared cataloguing and special collections work
  4. Assess the wider curatorial landscape – identify where RLUK could be taking collective steps of this type in areas such as learning assets and research data
  5. ‘Understand’ e-books in this context – whilst the metadata supply chain and workflows remain extremely uncertain, alignment with this direction of travel will be essential (and in 5 years may be a lost opportunity)
  6. Consider action on identifiers, authorities and access points – all of the above raise the challenge of collectively adopting key reference points, presumably including name, place and subject; a working group specifically focused on this and looking beyond libraries may be of value

My personal observation is that these represent immediate and low cost collective opportunities to assess and develop metadata infrastructure in anticipation of the roles that RLUK might play in a changing knowledge and service environment, both within the academy and in the wider UK context.

And last but not least, thanks again to RLUK for the chance to attend a very stimulating event.

Discovery conference, May 26th. An overview of the day’s discussions

July 1, 2011

Here is a summary of the main ideas and themes from the presentations and discussions at the Discovery Conference at the Wellcome Institute, London, on 26 May 2011. It’s based on notes taken at the time, and is therefore by necessity to some extent selective, but I’ve tried to be comprehensive and true to the spirit of the day. I’ve included references to some of the key twitter themes as these help to highlight issues of interest to the community.

Jane Plenderleith of Glenaffric Ltd (and member of the Discovery Communications Team)

Opening Address from David Baker, JISC

Our starting point was the RDTF Vision Statement of 2009. Since then there’s been some discussion about scope, suggesting that the vision should not be limited to UK HE. Following some heated discussion at the 2010 JISC conference, the vision is about opening access for all. But we have to start somewhere, hence the focus on UK HE. In our definition of the future state of the art, it’s important not to try to project too far forward, so the focus is on what we aim to achieve by 2012.

We are aiming for integrated seamless access, focusing on UK HE in the first instance, with a thorough and open aggregated layer, designed to work with all search engines, through a diverse range of personalised and innovative discovery services. Increasing efficiencies is clearly important for sector leaders and managers – the potential of open data to address this priority needs to be emphasised.

At the moment we are in Phase 1 of this process, focusing on open data. More detail about the call from JISC for moving into Phase 2 will follow. Phase 1 achievements include:

- Excellent Open Bibliographic Data Guide
- Projects
- Metadata Guidelines
- Newsletter (people can sign up, keep in touch, feed back, and engage in an interchange of ideas and experiences)

There was a successful event in April, with good engagement, proposals for further work, and suggestions for a ‘Call to Arms’. This has resulted in the Statement with eight Open Metadata Principles in the pack for today’s event.

Eight projects have already been funded, focusing on a broad and appropriate range of issues, providing a test-bed for the Phase 1 work, and giving us a good platform on which to build Phases 2 and 3.

We are working on making metadata open and easier to use, distilling advice and guidance from the eight projects. Key stakeholder engagement is vital to this process. This new phase of work is under the brand ‘Discovery’. RDTF was a clunky old name, but when boiled down to the essence, it’s about developing a metadata discovery ecology for UK education and research.

Engaging stakeholders and developing critical mass is key. With the community we want to explore what open data makes possible. Since the first event in Manchester on 18 April many people have signed up to the Statement of Principles. Today’s event is about using this momentum to move the open data agenda forward.

Part 1: The Demand Side – User Expectations in Teaching, Learning and Research

Keynote 1: Stuart Lee

Stuart was addressing the conference (by filmed videolink) wearing two hats, one as a researcher in the humanities and one as an IT service manager at the University of Oxford.

He started with a historical overview of his data usage techniques: ‘When I was writing my PhD thesis, I had to produce a glossary. The normal method at the time was using a card system, which took a long time (a friend took one and a half years). But I was trained to use a text analysis tool, so it took me three hours. Later I was asked to produce a monograph of my thesis, but instead I made a PDF and put my thesis on the web. I didn’t know at the time that this was in fact open publishing, but it had far more impact than if I had published in book form. It’s been downloaded thousands of times, and made my reputation in this field.’

Stuart went on to reflect on how researchers in the humanities work, and what open data might mean for them. Researchers in the humanities never really finish. Projects have a long life span and are often revisited. We work in an iterative cycle; our research is unbounded and incomplete. We don’t just publish and move on to the next thing. We tend to work alone, in our own way, not in teams in labs. Print is a very important medium for us. We use primary and secondary resources, and we find stuff through browsing catalogues.

In a nutshell, we just want to find ‘useful stuff’. Modern researchers are less worried about provenance, they are more concerned with usefulness. Many collections that we use are built by other academics working in our field.

We use tools to edit, analyse and compare data. We need to organise material so we can quickly find it. We have to present our work in a particular way – present an argument, combine primary and secondary resources. Citation is important, particularly of recognised names in the field. The material we produce has to be safeguarded, archived, so we can come back to it and others can use it. We want it to be available for a long long time. It does not go out of date like science stuff does.

So what opportunities does open data present for researchers in the humanities? We are very interested in open data and the Discovery agenda. We can now achieve the previously impossible – find relevant resources quickly, deal with mass quantities of data (example: corpus linguistics), achieve low cost distribution (example: iTunesU). Storage is no longer a problem, we can search across data silos from our desk and take advantage of cross-searching possibilities.

Perhaps we undervalue serendipity when we are looking for resources. If you are scanning books in the library, you find useful stuff on either side of the one you are looking for. If you are browsing data on a keyword search this throws up lots of possibilities.

There are a lot of chances for collaboration using online tools. We work very much in our own sub domain, with international connections in our field. We need better bibliographic tools like Mendeley.

Inevitably, open data poses some problems and challenges. Who is a researcher? Increasingly libraries have to incorporate meeting the needs of people beyond the usual HE sphere including the public and corporate bodies. There’s a lack of awareness about what is available. There’s a need for better standardisation. Text analysis tools haven’t advanced much in 20 years, and training undergraduates in their use is still necessary. There is still a problem with accessing data when we don’t know its provenance.

We need to break free of the stranglehold of academic publishers – we in the humanities are every bit as fed up about this as people working in the sciences. The system we have at present is unsustainable. We need to make metadata open to make it easier to find things. There are more challenges relating to the analysis of data, and preserving knowledge. We need support for adopting open content, both top down and bottom up.

Stuart ended with some comments on the changing nature of the library itself, the concept being no longer of a physical building, but a whole plethora of bodies holding information and making it available in what Stuart called the ‘cl**d’ (he doesn’t care for the word).

Keynote 2: Peter Murray Rust

PMR’s focus was researchers in the STEM field, and he was provocative from the outset. How many practising scientists are in the room? None. That’s the problem – scientists have no use for university libraries and repositories.

There are global and domain solutions to resource discovery. We have the technical solutions – what we need to make this happen is political will. For example, only those universities which have mandated publishing work in repositories (such as Ghent, Queensland, and to some extent Southampton) actually use them.

By comparison, look at the Open Street Map project (an open information resource for global maps). People have really contributed to this. They even held mapping parties. Example: after the earthquake they created a digital map of Haiti in two days for the rescue services. That’s the power of crowd sourcing. But there is no sense of the power of this in JISC – their strategy is to rely on publishers getting the stuff for us. But publishers, says PMR, produce garbage (this remark aroused amused assent from most of the people in the room).

PMR continued in this provocative vein: ‘It is quite simple for us to produce our own discovery data. For example, I have an interest in UK theses, so I went to EThOS with a simple and fundamental question – trying to access all Chemistry theses published in the UK in 2010. But they are scattered over different repositories, not searchable, and not available in any integrated way. In France they have the SUDOC catalogue, with 9 million bibliographic references. If there is one clear message from today, it’s “do what the French do”.’

It is technically trivial to turn documents into PDF, but this is an appalling way of managing data. PDF is like turning a cow into a hamburger: you can’t turn the hamburger back into a cow. (The twittersphere took up this comment and retweeted it many times.)

Another example of where it doesn’t work: I put 2,000 objects into the Cambridge DSpace, but then I couldn’t get them out. I had to write some bash code to get my own objects back out again.

More provocation: We are paralysed by the absurdity of copyright. I know people who delight in not doing anything because of copyright. Any small interaction that is not automatic kills open data. Google just goes and does it.

PMR’s solution to these problems was to build his own repository – a graduate student did this in a year, which now costs about 0.25 FTE to maintain each year. Some funding was secured under JISC Expo to make open bibliographic data available. We have ‘liberated’ 25 million bibliographic references. It’s important to aim outwards not inwards. Example: PubMed is funded in the UK by Wellcome Trust. This organisation has done more than the whole of UK HE to push the open data agenda forward.

For PMR, what would really make this work is support from the major research funders. Wellcome, RCUK, Cancer Research UK. But they are not here today. If the funders were to mandate that all the work they fund is published openly, and state that if you don’t publish your data you won’t get another grant, this would have a serious impact. All that would then be required would be to manage the bibliography, and that’s easy. Open data just requires political will and management to make it happen.

Research Conversation

The opening keynotes gave rise to a lively debate about open data for research, with comments and questions from lots of people. The tweet wall was also animated, echoing key points and making further suggestions and generating ideas. Here’s a summary of the main questions and comments from this session:

What is the value of open data to researchers? What’s the value of a map to geographers? It’s a vital resource – we need to know who is doing what, with links to everything, with that we have the complete spine of scholarship. Bibliography is the map of scholarship. There are also management uses for data about published papers.

PMR said that data are complicated, diverse, and domain dependent. Every discipline is different and has its own views on what data is. It will take 25 years to sort what scientific data actually is on a technical level.

How important is provenance? Researchers care about provenance and how something came about. We need to exercise critical appraisal when assessing the construction of information sources. But while provenance is important, it is also incredibly difficult. In the first instance what’s important is that the data is available.

What do we have to say to the funders to make them listen? Funders want the work they fund to be widely used, discovered, read, computed, built on.

The tweet wall at this point was alive with comments about IPR and copyright risks.

Is there the same ethos of collaboration and openness for museum data? Museums are protective of what is effectively their life’s work. There are copyright worries about the protection of intellectual capital. Providing an open record to a world where it might be challenged or used in a context for which it was not intended is quite challenging for museums. But it was also noted that there are people in museums who do want to share.

Should publicly funded research institutes make their data openly available? PMR praised organisations like BAS and NERC which are dedicated to maintaining data and making it publicly available. He noted that in academic communities this practice is variable. Some researchers would die rather than make their data available, while others are doing this quite freely. In some places there is an embargo on publication for five years, in case people might find out what they are doing. Issues relating to university ethics and data storage policies were mentioned.

It was suggested that what is needed in the sector is strong leadership promoting open data. There’s a particular problem with senior academic managers, working in a factional REF-dominated culture of competition. In industry, competitors manage to work together on issues of common interest, while still maintaining competition.

David Baker summarised the key issue: it is becoming apparent that the political and legal challenges to open data are more difficult than the technical.

Keynote 3: Drew Whitworth

Drew’s focus was the role of open data for teaching and learning in a variety of formal and informal contexts. A key theme was information as a resource in the environment – it does not diffuse itself evenly, it can be controlled, polluted, degraded. Drawing on Rose Luckin’s 2010 work ‘An ecology of resources’, Drew noted that an ecology evolves in a dynamic way. When you use resources you transform them into something else. This can be a problem – if we transfer resources into pollution, we are not using them in a sustainable way. Sustainable development means you meet current needs without damaging the process of meeting needs in the future. How are you using information now? Are you developing resources that will lead to enhanced resources in the future? We need to use resources now to build resources for the future.

In his book Information Obesity (2009) Drew presented the argument that while, logically, information is a good thing and we need it, a lot of information can be a bad thing (why do we talk about information ‘overload’, not ‘abundance’?). It is the same with food – it is possible to have too much food or the wrong kind of food. Fitness means eating smaller amounts of the right kind of food. We are under pressure to consume, and this works for information as well as food. Obesity is not just about over-consumption. It depends on individuals, and purpose. Athletes process lots of calories. Some of us can process lots of information. But we don’t want to turn learners into information-processing machines.

Drew described the JISC-funded MOSI-ALONG project which was trying to connect museum artefacts of local relevance with real people and stories from the community.

In summary: we have to remember that learning and information processing happen all the time through communities. If we don’t look after our information environment it will become polluted. Environments are healthiest when they are diverse. We need to look after these environments and protect them against storms and natural disaster. It falls upon people in workplaces, and business leaders, to make sure the information environment on which we depend is sustainable. Looking after these environments is everyone’s responsibility.

Teaching and Learning Conversation

Does the UK discovery ecotecture need to concern itself with usability or are we simply aiming to get the stuff out there? We definitely need a usability strategy. Otherwise people can just shove data in and it’s unusable.

We also need to be aware of our filtering strategies. We are programmed to filter sensory information all the time, and we have known tendencies to filter out information that challenges our primary beliefs. You want to give help and guidance; you have a mental model of the data and some organising principles; but you need to build in some flexibility in case your mental models do not match those of your data users. This is key to effective use of these resources for learning and teaching. We need to guide and help, but not fix and control.

There is a danger in the paradigm of respected provenance: we need to be wary of gate-keeping, and think about filtering throughout the chain of use. But from a metadata standpoint, if we try to predict how users will use data, we tie ourselves in knots. For the Discovery initiative there is a sense that we just have to get the content out there, and communities of practice can start to repurpose it for their needs.

Usability and discovery are different but related. The challenge is that we are used to usability in HCI terms: making it easy for people to navigate and use a system.

But how do we channel that thinking about flexible usability while still making it possible for people to uncover the complexity of the data?

Whatever usability criteria there are need to be continuously reviewed in the light of how people are using the data. Any organising principle can become too restrictive. Scaffolding learning is a good principle – but when the job is done the scaffolding comes down. The challenge is finding a way to use scaffolding for information retrieval, then take it down so people can find things for themselves.

We need a discussion about the nature of infrastructure, so the scaffolding notion is useful. If we apply it immediately: we have processes that generate metadata, and much of it is context-bound. We are moving towards just-in-time metadata that is generated from processes. You might need the scaffold for 10 seconds or 10 months.

The elephant in the room is VLEs. People who are populating VLEs are not putting together temporary scaffolding; it’s rather more permanent. There are competing approaches to describing resources, and we need to take this into account. The problem with VLEs for learners is that the second they leave the institution they no longer have access.

Information is a resource in a context. What else is necessary in order to turn information into learning? It’s not really possible to say what turns information into an educational resource: the quality of the teacher, the motivation of the student, the relevance in context. You cannot reduce education to a science; it is unpredictable, conversational, context-specific.

There has to be redundancy to make our ecology healthy and diverse. Funders make decisions. JISC has a pretty good approach, undertaking consultation and collaboration before setting priorities. For others, funding is based more on political expediency, and this is worrying. There is a need to prioritise developments, but let’s do this in the right way and leave some room for flexibility.

At this point in the proceedings there was a welcome break for lunch.

Part 2: The supply side: Opportunities to expand access and visibility

Summing up the morning, David Baker said:

Seamless access, flexible delivery, personalisation – if we can put these three together, there is a very exciting future.

The afternoon session was chaired by Nick Poole. HE/FE says ‘we need this’; politicians say ‘just do it’; we say ‘do it, but do it well’. The session presented examples of people who have just done it.

Veronica Adamson: The Art of the Possible, Special Collections

Key points from the discussion:

Special Collections may be the key that unlocks the potential of open data for many people – it resonates, there is an understanding, and there are examples of where libraries, archives and museums (LAM) can really work together.

Having a business case is essential – LAM managers need to be able to make the case for open data on the basis of efficiency savings, improving the quality of learning and teaching, enhancing research output, widening participation, and raising the profile of the institution.

Collections experts may not be the best people to make the business case to managers. It’s not just about listing benefits; it’s about weighing costs against benefits.

There is some interesting work by Keith Walker at IoE looking at learning trails in museums.

Peter Burnhill: Aggregation as Tactic

Aggregation means combining different sources of data, treating the machine as a user.

There is a purpose beyond discovery.

Do we need to decouple the metadata layer from the presentation layer? This is a techie question but it’s important.

Supply and demand – maybe we are all middle folk. We are adding value by bringing different streams of data together, making them more amenable for access. Aggregation is an intervention with some purpose.
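Burnhill’s point about the machine as user can be sketched in code: an aggregator adds value by normalising heterogeneous records into one shared schema that another program can consume. The field names, sources and records below are entirely hypothetical, purely for illustration – a sketch of the idea, not any system discussed at the event.

```python
def normalise(record, source):
    """Map a source-specific record onto a minimal shared schema."""
    return {
        # Different suppliers use different field names ("title" vs "name").
        "title": record.get("title") or record.get("name", ""),
        "identifier": record.get("id", ""),
        "source": source,
    }

def aggregate(*streams):
    """Merge several (source_name, records) streams into one de-duplicated list."""
    seen = set()
    merged = []
    for source, records in streams:
        for rec in records:
            norm = normalise(rec, source)
            key = (norm["source"], norm["identifier"])
            if key not in seen:       # the intervention: one stream, no repeats
                seen.add(key)
                merged.append(norm)
    return merged

# Made-up records from two imagined suppliers:
library = [{"id": "L1", "title": "Special Collections Guide"}]
museum = [{"id": "M7", "name": "Local Artefacts"}]
combined = aggregate(("library", library), ("museum", museum))
```

The aggregator here is the “middle folk” role in miniature: neither supplier changes, but the combined stream is more amenable to machine access than either source alone.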

There was some activity on the tweet wall at this point relating to important parallels between the discovery agenda and OER in terms of aggregation as a tactic.


The final session was a conference discussion about the scope of projects which might catalyse collaboration, focusing on events, celebrations and anniversaries in 2012. The debate continues.