w/c 24 December – Discovery news roundup

December 31, 2012

UK and JISC Discovery Project news:

Europeana news:

Other news from Europe and beyond:

Event reports:

  • The University of Leicester’s live debate, ‘Museums in the information age: Evolution or extinction?’, which took place at the Science Museum, is available to listen to online. A Guardian article covering the debate notes the importance of digital discovery: “[…] some digital resources produced by museums quickly become disposable if not easily discoverable by potential users.”
  • The Culture Hack event, which took place at the Google Campus, centred on envisaging new ways for London’s schoolchildren to interact with and be inspired by the city’s cultural heritage. Martin Belam’s blogpost provides a good overview of the day and highlights a thought-provoking comment from the British Library’s Nora McGregor: “It’s about teaching metadata to children.”

Call for contributions:

w/c 12 November – Discovery news roundup

November 15, 2012

One particularly interesting thing I noticed this past month was that tweets about open data, linked data and metadata were starting to come thick and fast from people within my network who sit well outside the library and cultural data domains. In particular, tweets from attendees of Lasa’s Charity Digital Summit and the ‘Nesta in Manchester’ event about innovation seemed to include a rich vein of comment about all things open. Perhaps this is an indicator that open data’s tipping point is approaching?

Some highlights from the world of resource discovery and open data in recent weeks:

Updates from a couple of large-scale projects in Europe and the US:

News from Discovery partner organisations:

Please note that places are still available for the second of our free Discovery Licensing Clinics on 30 November in London. It is an opportunity for managers and decision makers from libraries, archives and museums to get practical advice on open data licensing from our assembled team of experts.

Update on the activities of the Phase 2 Discovery projects

October 19, 2012

Latest news from the Phase 2 Discovery projects:

Last month all of the Phase 2 Discovery projects met to share information as they approach the end of their projects. Several of the projects have published their final blogposts which share their key lessons learnt and project outputs:

w/c 8 October – Discovery News Roundup

October 12, 2012

In recent weeks the Discovery Team have been finalising and releasing a whole suite of online materials which reflect our continued focus on the business case for Discovery. In September we released a collection of eight videos containing the reflections of UK academic library directors on topics such as the key issues and challenges for resource discovery, the value of making special collections visible and the potential of collaboration.

The videos were launched in our latest Discovery newsletter, along with our Case Study collection and Guidance Materials which aim to highlight and support current real-world practices relating to the Discovery Open Metadata Principles and Technical Principles within museums, libraries and archives.

The work of the Discovery programme has informed the latest animation from the OER IPR Support Project: ‘Open Data Licensing’ which you can view below. A key aspect of the Discovery programme’s approach is “establishing clarity of understanding around licensing and open data” so it’s good to see such a complex issue described in an accessible way – it doesn’t remove any of the inherent complexity but it breaks down and clarifies that complexity, which is an important initial step towards enabling action.

Some highlights from the wider world of resource discovery and open data:

  • In September, Europeana continued to set the pace in cultural data aggregation by opening up metadata for more than 20 million cultural objects for free use under the Creative Commons CC0 Public Domain Dedication licence. Their release represents the largest one-time dedication of cultural data to the public domain using the CC0 waiver and opens up the possibility of innovative apps, games, web services and portals being developed. The move also ‘holds the potential to bring together data […] from other sectors, such as tourism and broadcasting’. As Jill Cousins, Executive Director of Europeana, said: “This move is a significant step forward for open data and an important cultural shift for the network of museums, libraries and galleries who have created Europeana”. EC Vice President Neelie Kroes referred to the Europeana release as a ‘treasure trove of cultural heritage’.
  • At the Healthcare Efficiency Through Technology Expo this week, Garry Coleman talked about the NHS Information Centre’s plans for a large-scale open data release, involving millions of rows of data being made available under an Open Government Licence. This release reflects the wider importance of transparency as a motivator for open data, particularly within governmental and publicly funded organisations. It could also be a watershed moment for the release of anonymised sensitive data, which could further open up the way for the, arguably much less contentious, sharing of open metadata that our sector is working towards.
  • I mentioned Cooper-Hewitt Labs’ Director, Seb Chan, in my digest last month and his latest blogpost about being ‘of the web’ rather than ‘on the web’ is another interesting read. They are embracing the porosity of the internet and working with websites such as Behance to surface their collections and associated information out in the wild. In doing so they are finding creative ways to tackle potential showstoppers such as control over branding and retaining attribution. Their approach enables them to keep their expertise focussed on activities that are within their own domain and offers up an interesting blueprint for externally located engagement and visibility.
  • Rewired State are running an ‘Open Science’ hack day event in partnership with the Wellcome Trust in December.
  • The Open Data Network have launched the Open Data Showroom website which looks like it will become a very useful ‘at a glance’ resource for finding interesting sources of, and uses for, open data.
  • Leigh Dodds’ blogpost identifies a simple model for exploring the sustainability of open data curation projects such as legislation.gov.uk.
  • A significant release of legislative open data was announced this week on the Open Knowledge Foundation website, which reported on the release of US Congress legislative data going back to 1973.
  • The latest Arts Council Digital R&D podcast focuses on how organisations can use digital technology to open up archives, collections and data. It includes news from the V&A and the British Museum and considers the impact of projects such as Google’s Art Project.
  • And staying with Google, this week saw the launch of the Google Cultural Institute which aims to “preserve and promote culture online”. The Cultural Institute website presents curated cultural artefacts in online galleries, together with search and browse facilities. The individual artefacts retain their attribution to the holding organisation and, in some cases, the associated metadata can also be viewed. It’s not immediately obvious how open the underlying data is but it appears to be a walled garden at the moment.

w/c 27 August – Discovery News Roundup

August 31, 2012

Earlier this month OCLC announced their recommendation that member institutions use the Open Data Commons Attribution (ODC-BY) License when releasing WorldCat-derived library catalogue data. You can read David Kay’s response to that announcement here on the Discovery blog. And last week there was news that OCLC and Europeana are collaborating on a project developing ‘semantic similarity’ that will improve the experience of searching aggregated metadata by identifying items that are near duplicates or related to each other. The wider significance of this project is that it will feed into the Europeana Data Model and will provide “opportunities to develop new data services for third parties.”
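To make the idea of spotting near duplicates in aggregated metadata a little more concrete, here is a minimal sketch in Python. It is not the method used by OCLC or Europeana, whose approach isn't described in the announcement; it simply assumes that each record has a title and compares titles by the overlap of their words (Jaccard similarity). The record structure, the identifiers, the sample titles and the 0.6 threshold are all hypothetical.

```python
import re


def tokens(text):
    """Lowercase a string and return its alphanumeric words as a set."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Jaccard similarity of two token sets: shared words / all words."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def near_duplicates(records, threshold=0.6):
    """Return (id, id, score) for each pair of records whose titles
    score at or above the (hypothetical) threshold."""
    token_sets = [(r["id"], tokens(r["title"])) for r in records]
    pairs = []
    for i in range(len(token_sets)):
        for j in range(i + 1, len(token_sets)):
            id_a, set_a = token_sets[i]
            id_b, set_b = token_sets[j]
            score = jaccard(set_a, set_b)
            if score >= threshold:
                pairs.append((id_a, id_b, score))
    return pairs


# Hypothetical records, as they might arrive from two different contributors.
records = [
    {"id": "a1", "title": "The Complete Works of William Shakespeare"},
    {"id": "b7", "title": "Complete works of William Shakespeare, the"},
    {"id": "c3", "title": "A History of the English Language"},
]

for id_a, id_b, score in near_duplicates(records):
    print(id_a, id_b, round(score, 2))
```

Comparing every pair of records like this would not scale to an aggregation the size of Europeana, and word overlap ignores meaning entirely, so treat it only as an illustration of the underlying question: how alike do two records need to be before we present them to the user as related?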

It’s long been asserted that content is king, and more recently that context is queen, but now Associated Press are investing in marrying the two in order to speed up the distribution of content. Clearly the Associated Press business model is based on syndication rather than aggregation but Melody K. Smith’s assertion that “[f]indability only works when a proper taxonomy is in place” seems worth some thought with regard to its relevance for our sectors.

TechCrunch’s article pitching Mendeley’s open API against Elsevier’s closed API is flawed but it’s worth reading for the comments it provoked, particularly from Elsevier’s Director of Platform Integration, Ale De Vries. You can read more about the growth of Mendeley’s API service on the Guardian Technology blog – it’s interesting to note that their future plans involve developing their API service into a multi-directional dataflow that will allow applications built on their API to talk to each other and to upload data to Mendeley.

Seb Chan’s candid blogpost reflecting on the Cooper-Hewitt Design Center’s experience of openly releasing their collection metadata is a useful and timely reminder that a) issues around the quality of released metadata need to be addressed if we want anyone to use the data we’re releasing and b) “collection metadata [has value as a tool for discovery] but it is not the collection itself.” Seb’s point about museum collections being no match for the comprehensiveness of libraries and archives highlights the importance of open metadata, which enables cross-institutional aggregation, and of work such as the OCLC and Europeana ‘semantic similarity’ project I mentioned above. In an ideal world it will also enable the public permeability that Seb touches on by connecting our collections with the boundless ‘amateur web’ corpus.

News from Discovery projects

w/c 16 July 2012 – Discovery News Roundup

July 20, 2012

The past few weeks have seen some fairly significant announcements within the UK, in Europe and beyond, regarding linked data, APIs and global discovery services. Below are some of the highlights:

  • The European Library’s new union search portal was launched at the LIBER Conference and opens up online access “to more than 200 million records from the collections of national and research libraries across Europe” – UK contributors to the initiative are: British Library, Wellcome Library, Bodleian Libraries, University College London and the National Library of Wales. The European Library has made an OpenSearch API available for developers on a non-commercial use basis but unfortunately it is only available to member libraries.

“One other thing that came up is that the situation of libraries differs greatly from other cultural institutions. This [is] because they are often not the owners of their metadata, but buy this from a commercial company. This means that open data is often not discussed in the library world because they argue that it is not their choice to make. As a result the librarians remain invisible in the discussion about how [to] provide service in a digital age.”

w/c 11 June 2012 – Discovery News Roundup

June 12, 2012

Although I’ve been away over recent weeks, activity within the world of open metadata has continued unabated – here is my digest of activity from within the Discovery programme and from further afield.

Joy Palmer’s talk at the Joint Content and Discovery Programme contained a wealth of information about the current open metadata landscape, including links to a still relevant 2010 Economist article on the ‘data deluge’ (see also their report on the potential, and conversely the problem, of ‘superabundant data’). I’d argue that the increased quantity of data isn’t necessarily creating lots of new information management issues but it certainly makes those issues more visible and more pressing as soon as we move from passively collecting data to wanting to actively exploit the potential of that data.

Last month OCLC released the Virtual International Authority File (VIAF) dataset under an open data licence, together with their guidance on attribution. OCLC has also recently launched their WorldShare Management Service which provides libraries with “a new approach to managing library services cooperatively, including integrated acquisitions, cataloging, circulation, resource sharing, license management and patron administration, as well as a next-gen discovery tool for library users.” (emphasis is mine).

America’s National Institutes of Health (NIH) presented a showcase of the National Library of Medicine APIs that are available to developers. A recording of the live webcast is available to view online. The NIH have clearly decided to move beyond the more commonly found ‘build it and they will come’ approach and are actively engaging the developer community to help them understand what APIs are available. More recently they ran a two-day Health Datapalooza event which brought together NLM data experts and developers. The event was livestreamed and you can view the archived video online.

Closer to home, discussion of data in The Guardian has made it out of their Data Store pages and into the pages of their Culture Professionals Network blog. Patrick Hussey has written a three-part, wide-ranging exploration of data within the arts and culture sector which argues that it is time to open up performance paradata and look at ways of making their shared data count. Patrick’s main focus is on open data rather than open metadata but the series is very thought-provoking and in his second article he points to the work of The National Archives in creating an open API database of legislation in the shape of: http://www.legislation.gov.uk/

The BBC Connected Studio project is an open collaboration initiative that kicked off in May and is initially focused on developing new approaches to personalisation, using DevCSI-style hackspace gatherings to bring together digital talent from outside the BBC. Later this year the focus shifts to “connected platforms and big data”, which could mean some interesting developments that the MLA sectors might benefit from, and opportunities for MLA developers to get involved by responding to the Connected Studio call for participants.

The BBC Online team have managed to communicate their search and discovery strategy very clearly in the second of the videos included within this Connected Studio blogpost.


The Imperial War Museum is heading an international partnership of organisations in the run-up to the beginning of a four-year programme of activities to commemorate the First World War Centenary: “Through the partnership, colleagues from a variety of sectors [including museums, archives, libraries, universities and colleges, special interest groups and broadcasters] have the opportunity to communicate with each other, share and combine resources, cooperate and co-develop products and services that complement each other […]”. It will be interesting to see whether any developments similar to the Will’s World Discovery aggregation project emerge as a result of such a broad collaborative partnership.

Discovery Licensing Clinic

May 23, 2012


photo credit: Ed Bremner

The first Discovery Licensing Clinic brought together representatives from a number of different libraries, archives and museums to spend a day considering practical responses to the Discovery open licensing principles and getting practical guidance from the assembled experts. It was an opportunity to identify issues and discuss the range of tactics that institutions might adopt in scoping metadata releases and making the associated licensing decisions.

Our panel of experts on the day consisted of Francis Davey (Barrister), Naomi Korn (Copyright Consultant), Paul Miller (Cloud of Data) and Chris Banks (University Librarian & Director, Library, Special Collections & Museums, University of Aberdeen).

Chris Banks has written a blogpost reflecting on the day and her presentation slides can be viewed below:

The issues around licensing open metadata do represent a significant hurdle for institutions but none of those issues are insurmountable. Our hope is that licensing clinics such as this one, and the ones we plan to run in the future, will give managers and decision makers the knowledge they need to progress the open metadata agenda within their organisation.

Highlights from the Content and Discovery Joint Programme event

May 22, 2012

On the 23rd April colleagues from projects across the Discovery, JISC Content, JISC OER and Emerging Opportunities programmes gathered in Birmingham to share knowledge and identify shared challenges and key agendas that need to be progressed going forward. As is often the way with these types of events the discussions that took place over a day and a half were as useful to those running the event as they were for the delegates attending. The notes below represent just a handful of my highlights.

Joy Palmer presented on behalf of the Discovery Programme and gave a compelling overview of the challenges and aspirations we share around the discovery of content. She highlighted how, as the RDTF work was translated into the Discovery initiative, it became clear that we needed to talk in terms of an ecosystem as opposed to an ‘infrastructure’ because the latter suggested that the initiative was aiming to impose an overarching infrastructure model over the entire museums, libraries and archives (and JISC) discovery space.

“To a large degree, what today is about is determining to what degree we can operate as a healthy and thriving ecosystem, where components of our content or applications interact as a system, linked together by the flow of data and transactions.”

But as Joy stated, this is not to oversimplify matters. Her talk touched on the many apparently competing theories about how to enable discovery in the dataspace, highlighting the complexity we’re all confronting as we make decisions about the discovery and use of our data: Big Data and The Cloud, Paradata, Linked Data, Microdata, and the ‘return’ of Structured Data.

But in terms of our shared goals to have our content discoverable or useable via the web, she explained it is the tactic of opening up data that is relevant to us all, even if our challenges in achieving ‘openness’ differ.

The slides from Joy’s presentation are available to view on Slideshare:

Discovery: Towards a (meta)data ecology for education and research


In the afternoon I facilitated Andy McGregor and David Kay’s session on business cases where the participants obligingly contributed to David’s mapping exercises.

There were some interesting discussions around the participants’ experience of writing business cases, including useful suggestions for getting the most out of building a business case:

  • Predicting and measuring benefit are key challenges to overcome but we can do that by using the data at our disposal to create a convincing narrative. However it’s not about manipulating that data and making up stories retrospectively, we need to put energy into building robust analytics that help communicate our story clearly and convincingly.
  • Filling out a business case template shouldn’t be an activity that only happens in order to secure funding or other resources – it can be very useful to reiterate the process throughout the course of the project in order to track any changes in the course of the project.

The following links may be useful if you are interested in building robust business cases:

In the plenary session on day two the conversations centred around a number of discussion points:

  • Terms such as ‘microdata’ (machine-readable semantic tagging of webpage content) and ‘paradata’ (usage analytics or contextual information about data/metadata) were new to some of the participants and this prompted a discussion around the seemingly unavoidable challenge of jargon that we face within the Discovery arena. One suggestion was that instead of working to define a stronger vocabulary that is understood by all, perhaps we should be identifying stronger metaphors which everyone can relate to; metaphors that communicate the vision of what we are working towards and help everyone understand how they can get involved with delivering that vision within their own context.
  • We should be stepping outside of the sector to see the potential for emerging areas of activity (e.g. paradata). Looking to those sectors who are ahead of the game saves the library, museum and archives sectors having to try and work from a blank page. We also need to identify where our sectors are ahead and recognise how those advantages leave us well positioned to make significant progress.
  • Projects would benefit from a system of ‘evaluation buddies’ from within their programme to help uncover evidence of project impact and then share this evidence, together with highlighting any awards and recognition won by projects. This will help institutions build their internal business cases for bidding to run and then embed JISC projects in the future. There was also the suggestion that JISC could usefully build a collection of the major use cases (in a similar way to the Open Bibliographic Data Guide) together with short case studies that demonstrate the institutional impact.
  • Across the two days there were mentions of ‘microdata’ (machine-readable semantic tagging of webpage content), ‘big data’ (i.e. high volume) and ‘heavy data’ (data which ‘stretches current infrastructure or tools due to its size or bulk’), but the argument was made that the primary objective should be to produce ‘simple data’ (data that is both simple to produce and simple to consume).
  • There was recognition that aggregation is an art not a science and that current data standards are a) opinion, not fact and b) open to interpretation. High quality data is key to producing usable datasets but there was a question about how that quality can be defined. One suggestion was that data clean-up is a highly specialist service that should be decoupled, as per the government’s view with regard to open data.

Some key takeaway points for the Discovery programme:

  • Information about the Discovery programme, its projects and the underlying principles should be in a format that is ‘reframeable’, making it easy for interested parties to access information on their terms and cascade that information to their own audience or stakeholders.
  • Identifying and highlighting the tangible benefits of the Discovery Principles enables supporters of those principles to embark on fruitful conversations with colleagues in their institutions.
  • There is huge benefit in sharing the learning and challenges from within, and without, the Discovery programme. An ongoing process of synthesis, re-synthesis and distillation will extract maximum value from the activity taking place across the Discovery initiative.
  • The quality of metadata is key to the success of Discovery initiatives – we need to explore how high quality metadata is defined and ensured.

Community Feedback

April 10, 2012

“Anything that helps people to make more meaningful use of resources is a good thing”

Veronica Adamson and Jane Plenderleith report on recent interviews.

Since March, we’ve carried out a series of interviews with leaders and managers in the library, archive and museum (LAM) community about what open data means for their users and communities. Discussions focused on benefits, issues and challenges for institutions, collections and users in this space. Some interesting and thought-provoking views have emerged, providing much food for thought on the development of the RDTF vision.

Here are some key points emerging from our discussions:

  • Supporting open data – the LAM community is keen that resources are available to a wide community of users and contribute as much as possible to the furthering of knowledge
  • Simplifying access – there is strong support for systems which help users easily to discover resources and avoid the confusion caused by a multiplicity of disparate datasets
  • Communication – to these ends, LAM professionals need accessible language, and clear evidence of the benefit of open data aggregation, aligned with institutional priorities
  • Local examples – networks of libraries, museums and archives are already sharing data and developing local solutions to metadata challenges relating to standards, purpose and nomenclature
  • High quality aggregation – we need to move beyond small-scale initiatives providing partial answers, which then sit on websites gathering digital dust
  • Special Collections as the archives of the future – as more and more published material is available digitally, the role of the library is as custodian of unique collections, so data relating to these collections is an invaluable national resource.

Our thanks to all those who have been involved in this process so far. If you’d like to share your thoughts, aspirations, plans or reservations about these matters in our forthcoming round of interviews, please get in touch via info@discovery.ac.uk.