w/c 24 December – Discovery news roundup

December 31, 2012

UK and JISC Discovery Project news:

Europeana news:

Other news from Europe and beyond:

Event reports:

  • The University of Leicester’s live debate, ‘Museums in the information age: Evolution or extinction?’, which took place at the Science Museum, is available to listen to online. A Guardian article covering the debate notes the importance of digital discovery: “[…] some digital resources produced by museums quickly become disposable if not easily discoverable by potential users.”
  • The Culture Hack event, which took place at the Google Campus, centred on envisaging new ways for London’s schoolchildren to interact with and be inspired by the city’s cultural heritage. Martin Belam’s blogpost provides a good overview of the day and highlights a thought-provoking comment from the British Library’s Nora McGregor: “It’s about teaching metadata to children.”

Call for contributions:

w/c 12 November – Discovery news roundup

November 15, 2012

One particularly interesting thing I noticed this past month was that tweets about open data, linked data and metadata were starting to come thick and fast from people within my network who sit well outside the library and cultural data domains. In particular, tweets from attendees of Lasa’s Charity Digital Summit and the ‘Nesta in Manchester’ event about innovation seemed to contain a rich vein of all things open. Perhaps an indicator that open data’s tipping point is approaching?

Some highlights from the world of resource discovery and open data in recent weeks:

Updates from a couple of large-scale projects in Europe and the US:

News from Discovery partner organisations:

Please note that places are still available for the second of our free Discovery Licensing Clinics on 30 November in London. It is an opportunity for managers and decision makers from libraries, archives and museums to get practical advice on open data licensing from our assembled team of experts.

Some preliminary highlights from the Discovery programme

November 15, 2012

The Discovery programme is nearing its completion date of December 2012. Most of the projects have finished or are wrapping up. Our efforts are now directed towards gathering together all that we have learned and produced in the programme.

The programme has covered a lot of ground, so pulling everything together will take us some time. While that happens, I thought it might be worth listing a selection of preliminary highlights of the programme. This blogpost is based on a talk I gave at the RLUK conference, so the focus is on libraries and archives rather than museums.

Future approaches to Discovery

It is not yet clear what the future holds for resource discovery, and it is unlikely that there will be just one approach for libraries, museums and archives: the future is likely to be plural. While Discovery has not produced firm answers about that future, we have experimented with a range of approaches and identified those that are promising.

These approaches are recorded on the Discovery case studies and guidance site. They can be used to inform future plans in libraries, museums and archives; where an approach seems promising enough, it can be emulated, or the tools that have been developed can be reused. We are planning to produce a toolkit that gathers all of these tools in one place.

What is clear is that we are not alone in experimenting with these kinds of approaches. This is a global movement, with many and diverse institutions exploring similar ideas. The case studies and guidance recognise this by including explorations of the approaches of the Wellcome Trust, the Rijksmuseum and the Victoria and Albert Museum.

Innovative cataloguing

Resource discovery starts with cataloguing. The focus of the programme was not on cataloguing, but a couple of innovative approaches have emerged from the projects.

The Institute of Education decided to explore new ways of cataloguing their collection. This involves creating basic records in Drupal, enriching those records with professional cataloguer input, and then exporting them into the LMS. Written out like that it may sound a roundabout way of doing things, but it was 3.5 times quicker, and therefore cheaper, than the current approaches, and it allows the cataloguer to concentrate on enriching the record by adding index terms. Full figures are available on their blog. They also developed lightweight ways to catalogue uncatalogued material, which offers a significant saving in researcher time when using the material; there is more detail on this on their blog.

The second exploration of catalogues focused on the collection as a whole. The Copac Collections Management project used the Copac data to create a tool that allows librarians to analyse their collections and decide which items can be removed and which are rare and need to be retained. The tool has been trialled by a number of libraries; during its trial, the University of Manchester found the tool to be 86% more effective than manually checking the collection. Details of how this figure was arrived at can be found in the case study.

Greater impact through linking

Linking items in collections with relevant items in other collections offers the possibility of richer resource discovery services and supports new and emerging research interests. Linked data is an intriguing option for enabling this. I don’t think the Discovery programme has come up with a definitive answer on whether linked data is the future for libraries, museums and archives, but I think the evidence is fairly strong that it will be part of that future.

The programme included a number of projects experimenting with linked data for libraries and archives, and there is work to be done to gather all of these together. However, there are some headlines we can report now:

  • The use case in archives seems strong, as linking resources by place and person should be useful to researchers and students.
  • The Step Change project worked with Axiell to update CALM so that archives can create linked data records from within CALM. This functionality will be included in the next update and has the potential to benefit the large number of archives that use CALM. The linked data creation functionality is also available as a stand-alone tool called Alicat.
  • Cambridge were able to create linked data records for 2.3 million books in a project that cost just under £40,000.
  • The Archives Hub project Linking Lives has worked to use people as hooks for exploring archive collections. This uses linked data, and the model they have developed is being reused internationally.
  • The Pelagios project has created a way of using linked data to identify ancient places in archive collections, and a vibrant community is growing around their approach.
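
The linking idea behind several of these projects can be illustrated with a small sketch. The code below is not taken from any Discovery project, and all URIs in it are hypothetical examples; it shows how an archive record might be linked to a person (via a VIAF-style identifier) and to an ancient place (via a Pleiades-style identifier) by serialising N-Triples, the simplest RDF syntax:

```python
# A minimal sketch of expressing an archive record as linked data: the
# record is linked to a person and a place through shared URIs, which is
# what makes cross-collection linking possible. All URIs are hypothetical.

def triple(subject, predicate, obj):
    """Serialise one triple in N-Triples syntax."""
    return f"<{subject}> <{predicate}> <{obj}> ."

record = "http://example.org/archive/fonds/123"
triples = [
    triple(record, "http://purl.org/dc/terms/creator",
           "http://viaf.org/viaf/95155190"),           # person, VIAF-style URI
    triple(record, "http://purl.org/dc/terms/spatial",
           "http://pleiades.stoa.org/places/423025"),  # place, Pleiades-style URI
]

print("\n".join(triples))
```

Because two collections that describe the same person or place use the same URI, an aggregator can join their records without any prior coordination between the institutions.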

Of course the Discovery programme is not alone in investigating linked data. The Library of Congress, OCLC, The British Library, Europeana and the DPLA are all using or investigating some form of linked data technology in pursuing their aims.

Linked data is not the only option for bringing different collections together and allowing people to use them in new ways. This can also be done with APIs, and there are two Discovery exemplar projects doing just this, for Shakespeare and for WW1. Work on both is still underway, but they are looking promising and offer some very interesting lessons on how to aggregate collections to enable new forms of resource discovery and research.
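
As a rough sketch of the aggregation pattern such projects follow (the field names, source labels and sample records below are illustrative, not taken from either exemplar project), an aggregator merges item metadata fetched from several collection APIs into a single index that can then power search across all of them:

```python
# A hedged sketch of API-based aggregation: each source exposes item
# metadata, and the aggregator merges the responses into one index,
# recording which collection each item came from.

def aggregate(*collections):
    """Merge item lists from several (source, items) pairs into one
    index keyed by a source-qualified identifier."""
    index = {}
    for source, items in collections:
        for item in items:
            key = f"{source}:{item['id']}"
            index[key] = {**item, "source": source}
    return index

# Sample responses as they might come back from two collection APIs.
shakespeare_items = [{"id": "q1", "title": "First Quarto of Hamlet"}]
ww1_items = [{"id": "d7", "title": "Trench map, Somme, 1916"}]

index = aggregate(("shakespeare", shakespeare_items), ("ww1", ww1_items))
for key in sorted(index):
    print(key, "->", index[key]["title"])
```

Prefixing each identifier with its source avoids clashes between collections that happen to use the same local ids.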

Enhanced shared services

We already have many shared services that help people discover these resources. Throughout the programme we have worked with those services to enhance them in order to help realise the Resource Discovery Taskforce vision. All of the ways the services have been developed are worth a separate post, so for now I will just list the services that have been developed in the programme:

Business case

These are challenging economic times, so it was important to address the business case for libraries, museums and archives investing effort in improving resource discovery. The results of this work can be seen in the business case section of the Discovery guidance. We worked with senior managers from libraries, museums and archives throughout the programme to ensure that what we were doing addressed their needs. As part of this work we produced a series of videos in which a selection of senior managers talk about their needs, challenges and predictions; they make for interesting viewing.

What’s next?

We are in the process of reviewing the Discovery programme and the Resource Discovery Taskforce vision that kicked it all off. The review will produce a set of recommendations on what we should do next, which will be available in January. We will be looking to pull all of the outputs from the programme into a form that makes it easy for people to learn from the programme and to use what has been produced. We are also putting together an event for 2013 that will bring together people from around the world who are working on resource discovery challenges, to see what we can learn from each other. More information on all of these things to follow soon.

Update on the activities of the Phase 2 Discovery projects

October 19, 2012

Latest news from the Phase 2 Discovery projects:

Last month all of the Phase 2 Discovery projects met to share information as they approach the end of their projects. Several of the projects have published their final blogposts which share their key lessons learnt and project outputs:

w/c 8 October – Discovery News Roundup

October 12, 2012

In recent weeks the Discovery Team have been finalising and releasing a whole suite of online materials which reflect our continued focus on the business case for Discovery. In September we released a collection of eight videos containing the reflections of UK academic library directors on topics such as the key issues and challenges for resource discovery, the value of making special collections visible and the potential of collaboration.

The videos were launched in our latest Discovery newsletter, along with our Case Study collection and Guidance Materials which aim to highlight and support current real-world practices relating to the Discovery Open Metadata Principles and Technical Principles within museums, libraries and archives.

The work of the Discovery programme has informed the latest animation from the OER IPR Support Project: ‘Open Data Licensing’ which you can view below. A key aspect of the Discovery programme’s approach is “establishing clarity of understanding around licensing and open data” so it’s good to see such a complex issue described in an accessible way – it doesn’t remove any of the inherent complexity but it breaks down and clarifies that complexity, which is an important initial step towards enabling action.

Some highlights from the wider world of resource discovery and open data:

  • In September, Europeana continued to set the pace in cultural data aggregation by opening up metadata for more than 20 million cultural objects for free use under the Creative Commons CC0 Public Domain Dedication. The release represents the largest one-time dedication of cultural data to the public domain using the CC0 waiver and opens up the possibility of innovative apps, games, web services and portals being developed. The move also ‘holds the potential to bring together data […] from other sectors, such as tourism and broadcasting’. As Jill Cousins, Executive Director of Europeana, said: “This move is a significant step forward for open data and an important cultural shift for the network of museums, libraries and galleries who have created Europeana”. EC Vice President Neelie Kroes referred to the Europeana release as a ‘treasure trove of cultural heritage’.
  • At the Healthcare Efficiency Through Technology Expo this week, Garry Coleman talked about the NHS Information Centre’s plans for a large-scale open data release, involving millions of rows of data being made available under an Open Government Licence. The release reflects the wider importance of transparency as a motivator for open data, particularly within governmental and publicly funded organisations. It could also be a watershed moment for the release of anonymised sensitive data, which could in turn open up the way for the arguably much less contentious sharing of open metadata that our sector is working towards.
  • I mentioned Cooper-Hewitt Labs’ Director, Seb Chan, in my digest last month and his latest blogpost about being ‘of the web’ rather than ‘on the web’ is another interesting read. They are embracing the porosity of the internet and working with websites such as Behance to surface their collections and associated information out in the wild. In doing so they are finding creative ways to tackle potential showstoppers such as control over branding and retaining attribution. Their approach enables them to keep their expertise focussed on activities that are within their own domain and offers up an interesting blueprint for externally located engagement and visibility.
  • Rewired State are running an ‘Open Science’ hack day event in partnership with the Wellcome Trust in December.
  • The Open Data Network have launched the Open Data Showroom website which looks like it will become a very useful ‘at a glance’ resource for finding interesting sources of, and uses for, open data.
  • Leigh Dodds’ blogpost identifies a simple model for exploring the sustainability of open data curation projects such as legislation.gov.uk.
  • A significant release of legislative open data was announced this week on the Open Knowledge Foundation website, which reported on the release of US Congress legislative data going back to 1973.
  • The latest Arts Council Digital R&D podcast focuses on how organisations can use digital technology to open up archives, collections and data. It includes news from the V&A and the British Museum and considers the impact of projects such as Google’s Art Project.
  • And staying with Google, this week saw the launch of the Google Cultural Institute which aims to “preserve and promote culture online”. The Cultural Institute website presents curated cultural artefacts in online galleries, together with search and browse facilities. The individual artefacts retain their attribution to the holding organisation and, in some cases, the associated metadata can also be viewed. It’s not immediately obvious how open the underlying data is but it appears to be a walled garden at the moment.

Twelve Themes and a few more

September 28, 2012

All 17 projects funded by JISC in Phase 2 of the Discovery programme met in Birmingham today to share updates and ideas as they wind down their efforts. It was a very stimulating meeting, not least because the shared Discovery dialogue seems to have developed significantly during 2012. The Phase 1 projects undertook some very useful experiments, but the Phase 2 projects have taken things up a notch.

Here, in very raw form, are the recurrent themes I recorded as takeaways from the session:

A – Data and access points

  • Time and Place are priority access points
  • URIs offer an effective base level linking strategy
  • Collection level descriptions have potential as finding aids across domains
  • User generated content, such as annotations, has a place at the table

B – People

  • Community is a vital driver – open communities maintain momentum; specialist enthusiasms and ways of working provide strong use cases
  • For embedding new metadata practice, start where the workers are – add-ins to Calm and MODS demonstrate that
  • More IT experience / skills are required on the ground

C – The way the web works

  • Aggregators crawl, they don’t query … OAI-PMH, robots, etc.
  • Google’s strength shouts ‘Do it my way’ – and we should take heed (but we do need both/and)
  • Currency of data is important – there may be a tension with time lags associated with crawling
  • Aggregators need to know what is where to build or add value, so … we don’t need a registry?
  • No man is an island – it’s a collaborative world, with requirements to interact with complementary services such as DBpedia, Europeana, Google, Historypin, Pleiades, UKAT and VIAF
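
The ‘crawl, don’t query’ point can be made concrete with a minimal OAI-PMH harvesting sketch. The endpoint URL below is a hypothetical example, but the request parameters and response shape follow the OAI-PMH 2.0 specification:

```python
# A minimal sketch of an OAI-PMH harvester: build the ListRecords request
# an aggregator would crawl, then pull record identifiers out of the XML
# response. Parsed here from an embedded sample rather than a live fetch.
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

def list_records_url(base_url, metadata_prefix="oai_dc"):
    """Build the URL an aggregator would crawl."""
    return base_url + "?" + urlencode(
        {"verb": "ListRecords", "metadataPrefix": metadata_prefix})

def record_identifiers(response_xml):
    """Extract record identifiers from a ListRecords response."""
    root = ET.fromstring(response_xml)
    return [el.text for el in
            root.findall(".//oai:record/oai:header/oai:identifier", OAI_NS)]

# A tiny sample response, shaped as a repository would return it.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><header><identifier>oai:example.org:item/1</identifier></header></record>
    <record><header><identifier>oai:example.org:item/2</identifier></header></record>
  </ListRecords>
</OAI-PMH>"""

url = list_records_url("http://example.org/oai")
ids = record_identifiers(SAMPLE)
```

Because the aggregator pulls whole record batches on its own schedule rather than querying on demand, currency of data depends on how often it crawls, which is exactly the time-lag tension noted above.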

D – Tools and technology

  • There is opportunity / obligation to leverage expert authority data and vocabularies – examples as above and more, such as Victoria County History, …
  • Commonly used software tools include Drupal, Solr/Lucene, Elasticsearch, JavaScript and Twitter Bootstrap
  • JSON and RDF are strong format choices amongst the developers
  • Beware SPARQL endpoints and triple stores, especially in terms of performance
  • APIs are essential – but little use without both documentation and example code
  • OSS tools have been built by several projects … but how do we leverage them (e.g. BibSoup, Alicat)?
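
The observation that JSON and RDF are both strong format choices is worth a small illustration: a record served as ordinary JSON can, with a JSON-LD `@context`, also be read as RDF. The item and creator URIs below are hypothetical examples; the vocabulary terms are Dublin Core:

```python
# A sketch of a JSON-LD record: plain JSON for developers, but the
# @context maps each key to an RDF vocabulary term, so the same payload
# doubles as linked data. URIs are hypothetical examples.
import json

record = {
    "@context": {"title": "http://purl.org/dc/terms/title",
                 "creator": "http://purl.org/dc/terms/creator"},
    "@id": "http://example.org/items/42",
    "title": "Parish register, 1837-1901",
    "creator": "http://viaf.org/viaf/12345",
}

serialised = json.dumps(record, indent=2)   # what an API would return
round_tripped = json.loads(serialised)      # what a client would parse
```

A developer who has never heard of RDF can ignore the `@context` entirely and treat this as an ordinary JSON document, which is much of the appeal.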

w/c 27 August – Discovery News Roundup

August 31, 2012

Earlier this month OCLC announced their recommendation that member institutions use the Open Data Commons Attribution (ODC-BY) License when releasing WorldCat-derived library catalogue data. You can read David Kay’s response to that announcement here on the Discovery blog. And last week there was news that OCLC and Europeana are collaborating on a project developing ‘semantic similarity’ that will improve the experience of searching aggregated metadata by identifying items that are near duplicates or related to each other. The wider significance of this project is that it will feed into the Europeana Data Model and will provide “opportunities to develop new data services for third parties.”

It’s long been asserted that content is king, and more recently that context is queen, but now Associated Press are investing in marrying the two in order to speed up the distribution of content. Clearly the Associated Press business model is based on syndication rather than aggregation, but Melody K. Smith’s assertion that “[f]indability only works when a proper taxonomy is in place” seems worth some thought with regard to its relevance for our sectors.

TechCrunch’s article pitching Mendeley’s open API against Elsevier’s closed API is flawed but it’s worth reading for the comments it provoked, particularly from Elsevier’s Director of Platform Integration, Ale De Vries. You can read more about the growth of Mendeley’s API service on the Guardian Technology blog – it’s interesting to note that their future plans involve developing their API service into a multi-directional dataflow that will allow applications built on their API to talk to each other and to upload data to Mendeley.

Seb Chan’s candid blogpost reflecting on the Cooper-Hewitt Design Center’s experience of openly releasing their collection metadata is a useful and timely reminder that a) issues around the quality of released metadata need to be addressed if we want anyone to use the data we’re releasing, and b) “collection metadata [has value as a tool for discovery] but it is not the collection itself.” Seb’s point that museum collections are no match for the comprehensiveness of libraries and archives highlights the importance both of open metadata, which enables cross-institutional aggregation, and of the OCLC and Europeana ‘semantic similarity’ project I mentioned above. In an ideal world it will also enable the public permeability that Seb touches on, connecting our collections with the boundless ‘amateur web’ corpus.

News from Discovery projects