w/c 11 June 2012 – Discovery News Roundup

June 12, 2012

Although I’ve been away over recent weeks, activity within the world of open metadata has continued unabated – here is my digest of activity from within the Discovery programme and from further afield.

Joy Palmer’s talk at the Joint Content and Discovery Programme contained a wealth of information about the current open metadata landscape, including links to a still relevant 2010 Economist article on the ‘data deluge’ (see also their report on the potential, and conversely the problem, of ‘superabundant data’). I’d argue that the increased quantity of data isn’t necessarily creating lots of new information management issues but it certainly makes those issues more visible and more pressing as soon as we move from passively collecting data to wanting to actively exploit the potential of that data.

Last month OCLC released the Virtual International Authority File (VIAF) dataset under an open data licence, together with their guidance on attribution. OCLC has also recently launched their WorldShare Management Service which provides libraries with “a new approach to managing library services cooperatively, including integrated acquisitions, cataloging, circulation, resource sharing, license management and patron administration, as well as a next-gen discovery tool for library users.” (emphasis is mine).

America’s National Institutes of Health (NIH) presented a showcase of the National Library of Medicine APIs that are available to developers. A recording of the live webcast is available to view online. The NIH has clearly decided to move beyond the more commonly found ‘build it and they will come’ approach and is actively engaging the developer community to help them understand what APIs are available. More recently it ran a two-day Health Datapalooza event which brought together NLM data experts and developers. The event was livestreamed and you can view the archived video online.

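For readers who want a concrete starting point, one of the NLM-hosted API families is the NCBI E-utilities. As a minimal sketch (assuming the standard esearch endpoint; the specific services covered in the showcase aren’t listed here), the following Python composes a PubMed search URL without sending it:

```python
from urllib.parse import urlencode

# Base endpoint for NCBI E-utilities, one of the NLM-hosted API families.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

def build_pubmed_search_url(term: str, retmax: int = 20) -> str:
    """Compose an esearch query URL for PubMed without sending it."""
    params = urlencode({
        "db": "pubmed",      # search the PubMed database
        "term": term,        # free-text query
        "retmax": retmax,    # cap the number of returned IDs
        "retmode": "json",   # ask for a JSON rather than XML response
    })
    return f"{EUTILS_BASE}?{params}"

url = build_pubmed_search_url("open metadata")
print(url)
```

Building the URL separately keeps the query reproducible; it can then be fetched with any HTTP client.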
Closer to home, discussion of data in The Guardian has made it out of their Data Store pages and into the pages of their Culture Professionals Network blog. Patrick Hussey has written a three-part, wide-ranging exploration of data within the arts and culture sector which argues that it is time to open up performance paradata and look at ways of making their shared data count. Patrick’s main focus is on open data rather than open metadata but the series is very thought-provoking, and in his second article he points to the work of The National Archives in creating an open API for UK legislation in the shape of: http://www.legislation.gov.uk/

The BBC Connected Studio project is an open collaboration initiative that kicked off in May and is initially focused on developing new approaches to personalisation, using DevCSI-style hackspace gatherings to bring together digital talent from outside the BBC. Later this year the focus shifts to “connected platforms and big data”, which could mean some interesting developments that the MLA sectors might benefit from, and opportunities for MLA developers to get involved by responding to the Connected Studio call for participants.

The BBC Online team have managed to communicate their search and discovery strategy very clearly in the second of the videos included within this Connected Studio blogpost.


The Imperial War Museum is heading an international partnership of organisations in the run-up to a four-year programme of activities to commemorate the First World War Centenary: “Through the partnership, colleagues from a variety of sectors [including museums, archives, libraries, universities and colleges, special interest groups and broadcasters] have the opportunity to communicate with each other, share and combine resources, cooperate and co-develop products and services that complement each other […]”. It will be interesting to see whether any developments similar to the Will’s World Discovery aggregation project emerge as a result of such a broad collaborative partnership.

Discovery Licensing Clinic

May 23, 2012


photo credit: Ed Bremner

The first Discovery Licensing Clinic brought together representatives from a number of different libraries, archives and museums to spend a day considering practical responses to the Discovery open licensing principles and getting practical guidance from the assembled experts. It was an opportunity to identify issues and discuss the range of tactics that institutions might adopt in scoping metadata releases and making the associated licensing decisions.

Our panel of experts on the day consisted of Francis Davey (Barrister), Naomi Korn (Copyright Consultant), Paul Miller (Cloud of Data) and Chris Banks (University Librarian & Director, Library, Special Collections & Museums, University of Aberdeen).

Chris Banks has written a blogpost reflecting on the day and her presentation slides can be viewed below:

The issues around licensing open metadata do represent a significant hurdle for institutions but none of those issues are insurmountable. Our hope is that licensing clinics such as this one, and the ones we plan to run in the future, will give managers and decision makers the knowledge they need to progress the open metadata agenda within their organisation.

Highlights from the Content and Discovery Joint Programme event

May 22, 2012

On the 23rd April, colleagues from projects across the Discovery, JISC Content, JISC OER and Emerging Opportunities programmes gathered in Birmingham to share knowledge and identify the shared challenges and key agendas that need to be progressed. As is often the way with these kinds of events, the discussions that took place over a day and a half were as useful to those running the event as they were to the delegates attending. The notes below represent just a handful of my highlights.

Joy Palmer presented on behalf of the Discovery Programme and gave a compelling overview of the challenges and aspirations we share around the discovery of content. She highlighted how, as the RDTF work was translated into the Discovery initiative, it became clear that we needed to talk in terms of an ecosystem as opposed to an ‘infrastructure’ because the latter suggested that the initiative was aiming to impose an overarching infrastructure model over the entire museums, libraries and archives (and JISC) discovery space.

“To a large degree, what today is about is determining to what degree we can operate as a healthy and thriving ecosystem, where components of our content or applications interact as a system, linked together by the flow of data and transactions.”

But as Joy stated, this is not to oversimplify matters. Her talk touched on the many apparently competing theories about how to enable discovery in the dataspace, highlighting the complexity we’re all confronting as we make decisions about the discovery and use of our data: Big Data and The Cloud, Paradata, Linked Data, Microdata, and the ‘return’ of Structured Data.

But in terms of our shared goals to have our content discoverable or useable via the web, she explained it is the tactic of opening up data that is relevant to us all, even if our challenges in achieving ‘openness’ differ.

The slides from Joy’s presentation are available to view on Slideshare:

Discovery: Towards a (meta)data ecology for education and research


In the afternoon I facilitated Andy McGregor and David Kay’s session on business cases where the participants obligingly contributed to David’s mapping exercises.

There were some interesting discussions around the participants’ experience of writing business cases, including useful suggestions for getting the most out of building a business case:

  • Predicting and measuring benefit are key challenges to overcome, but we can do that by using the data at our disposal to create a convincing narrative. However, it’s not about manipulating that data and making up stories retrospectively; we need to put energy into building robust analytics that help communicate our story clearly and convincingly.
  • Filling out a business case template shouldn’t be an activity that only happens in order to secure funding or other resources – it can be very useful to revisit the process throughout the project in order to track any changes of direction.

The following links may be useful if you are interested in building robust business cases:

In the plenary session on day two the conversations centred around a number of discussion points:

  • Terms such as ‘microdata’ (machine-readable semantic tagging of webpage content) and ‘paradata’ (usage analytics or contextual information about data/metadata) were new to some of the participants and this prompted a discussion around the seemingly unavoidable challenge of jargon that we face within the Discovery arena. One suggestion was that instead of working to define a stronger vocabulary that is understood by all, perhaps we should be identifying stronger metaphors which everyone can relate to; metaphors that communicate the vision of what we are working towards and help everyone understand how they can get involved with delivering that vision within their own context.
  • We should be stepping outside of the sector to see the potential for emerging areas of activity (e.g. paradata). Looking to those sectors who are ahead of the game saves the library, museum and archives sectors having to try and work from a blank page. We also need to identify where our sectors are ahead and recognise how those advantages leave us well positioned to make significant progress.
  • Projects would benefit from a system of ‘evaluation buddies’ from within their programme to help uncover evidence of project impact and then share this evidence, together with highlighting any awards and recognition won by projects. This will help institutions build their internal business cases for bidding to run and then embed JISC projects in the future. There was also the suggestion that JISC could usefully build a collection of the major use cases (in a similar way to the Open Bibliographic Data Guide) together with short case studies that demonstrate the institutional impact.
  • Across the two days there were mentions of ‘microdata’ (machine-readable semantic tagging of webpage content), ‘big data’ (i.e. high volume) and ‘heavy data’ (data which ‘stretches current infrastructure or tools due to its size or bulk’), but the argument was made that the primary objective should be to produce ‘simple data’ (data that is both simple to produce and simple to consume).
  • There was recognition that aggregation is an art not a science and that current data standards are a) opinion, not fact and b) open to interpretation. High quality data is key to producing usable datasets but there was a question about how that quality can be defined. One suggestion was that data clean-up is a highly specialist service that should be decoupled, as per the government’s view with regard to open data.

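Since ‘microdata’ was new to several participants, a minimal sketch may make it less abstract. Using only Python’s standard library, the following extracts itemprop name/value pairs from the kind of schema.org markup a catalogue page might embed (the record below is invented for illustration):

```python
from html.parser import HTMLParser

class MicrodataParser(HTMLParser):
    """Collect itemprop name/value pairs from HTML microdata markup."""
    def __init__(self):
        super().__init__()
        self.props = {}
        self._current = None  # itemprop whose text content we are inside

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "itemprop" in attrs:
            self._current = attrs["itemprop"]

    def handle_data(self, data):
        if self._current and data.strip():
            self.props[self._current] = data.strip()
            self._current = None

# A catalogue record marked up with schema.org microdata (illustrative only)
html = """
<div itemscope itemtype="http://schema.org/Book">
  <span itemprop="name">A Midsummer Night's Dream</span>
  by <span itemprop="author">William Shakespeare</span>
</div>
"""

parser = MicrodataParser()
parser.feed(html)
print(parser.props)
```

The same page remains perfectly readable to humans; the itemprop attributes are what make it machine-readable.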
Some key takeaway points for the Discovery programme:

  • Information about the Discovery programme, its projects and the underlying principles should be in a format that is ‘reframeable’, making it easy for interested parties to access information on their terms and cascade that information to their own audience or stakeholders.
  • Identifying and highlighting the tangible benefits of the Discovery Principles enables supporters of those principles to embark on fruitful conversations with colleagues in their institutions.
  • There is huge benefit in sharing the learning and challenges from within, and without, the Discovery programme.  An ongoing process of synthesis, re-synthesis and distillation will extract maximum value from the activity taking place across the Discovery initiative.
  • The quality of metadata is key to the success of Discovery initiatives – we need to explore how high quality metadata is defined and ensured.

Radically Open Cultural Heritage Data at SXSW Interactive 2012

April 11, 2012


Posted by Adrian Stevenson

I had the privilege of attending the annual South by Southwest Interactive, Film and Music conference (SXSW) a few weeks ago in Austin, Texas. I was there as part of the ‘Radically Open Cultural Heritage Data on the Web’ Interactive panel session, along with Jon Voss from Historypin, Julie Allinson from the University of York digital library, and Rachel Frick from the Council on Library and Information Resources (CLIR). We were delighted to see that Mashable.com voted it one of ’22 SXSW Panels You Can’t Miss This Year’.

All of our panelists covered themes and issues addressed by the Discovery initiative, including the importance of open licenses, and the need for machine readable data via APIs to facilitate the easy transfer, aggregation and link-up of library, archives and museum content.

Jon gave some background on the ‘Linked Open Data in Libraries, Archives and Museums’ (LOD-LAM) efforts around the world, talking about how the first International LODLAM Summit held in San Francisco last year helped galvanise the LODLAM community. Jon also covered some recent work Historypin are doing to allow users to dig into archival records.

Julie then covered some of the technical aspects of publishing Linked Data through the lens of the OpenArt Discovery project, which recently released the ‘London Art World 1660-1735’ data. She mentioned some of the benefits of the Linked Data approach, and explained how they’ve been linking to VIAF for names and Geonames for location.

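The linking pattern Julie described – asserting links from local records out to VIAF for names and Geonames for locations – can be sketched in a few lines. This is an illustrative hand-rolled example, not taken from the OpenArt data, and the subject and target URIs are placeholders:

```python
# Standard predicate URIs for equivalence and location links
SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
BASED_NEAR = "http://xmlns.com/foaf/0.1/based_near"

# (subject, predicate, object) statements linking a local person record
# out to placeholder VIAF and Geonames identifiers
triples = [
    ("http://example.org/person/p1", SAME_AS, "http://viaf.org/viaf/00000000"),
    ("http://example.org/person/p1", BASED_NEAR, "http://sws.geonames.org/0000000/"),
]

def serialise_ntriples(triples):
    """Render (subject, predicate, object) tuples as N-Triples lines."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)

print(serialise_ntriples(triples))
```

Once a record points at a shared identifier like a VIAF URI, anyone else linking to the same URI is, in effect, linking to the same person – which is where the aggregation benefit comes from.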
I gave a quick overview of the LOCAH and Linking Lives projects, before giving a heads up to the World War One Discovery project. LOCAH has been making archival records from the Archives Hub national service available as Linked Data, and Linking Lives is a continuation project that’s using Linked Data from a variety of sources to create an interface based around the names of people in the Archives Hub. After attempting to crystallise what I see are the key benefits of Linked Data, I finished up by focusing on particular challenges we’ve met on our projects.

Rachel considered how open data might affect policies, procedures and the organisational structure of the library world. She talked about the Digital Public Library of America, a growing initiative started in Oct 2010. The DPLA vision is to have an “open distributed network of comprehensive online resources that draw on the nation’s living history from libraries, universities, archives and museums to educate, inform, and empower everyone in current and future generations”. After outlining how the DPLA is aiming to achieve this vision, she explained how interested parties can get involved.

There’s an audio recording of the panel on our session page, as well as recordings of all sessions mentioned below on their respective SXSW pages. I’ve also included the slides for our session at the bottom of this post.

Not surprisingly, there were plenty of other great sessions at SXSW. I’ve picked a few highlights that I thought would be of interest to readers of this blog.

Probably of most relevance to Discovery was the lightning-fast ‘Open APIs: What’s Hot and What’s Not’ session from John Musser, founder of Programmableweb.com, who gave us what he sees as the eight hottest API trends. He mentioned that the REST style of software architecture is rapidly growing in popularity, being regarded as easier to use than other API technologies such as SOAP (see image below). JSON is very popular, with 60% of APIs now supporting it. It was also noted that one in five APIs don’t support XML.

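The appeal of JSON that Musser highlighted is easy to demonstrate: a JSON response maps directly onto native data structures with no extra tooling. A small Python sketch, using an invented response body:

```python
import json

# An invented response body of the sort a REST API returning JSON might send;
# the stdlib json module turns it straight into native Python structures.
payload = '{"results": [{"title": "Open APIs", "year": 2012}], "count": 1}'

data = json.loads(payload)
titles = [item["title"] for item in data["results"]]
print(titles)  # a plain Python list, no schema or envelope handling needed
```

Compare that with a SOAP response, where the consumer must also deal with an XML envelope and, typically, a WSDL-described schema.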

The rise of REST – ‘Hot API Protocols and Styles’ from John Musser of Programmableweb.com at SXSW 2012

Musser suggested that APIs need to be supported, with hackathons and funded prizes being a good way to get people interested. He noted that the hottest trend right now is that VCs are providing significant funding to incentivise people to use their APIs, Twilio being one of the first to do this. He also mentioned that your API documentation needs to be live if you’re to gain interest and maintain use. Invisible mashups are also hot, with operating systems such as Apple’s OS cited as examples. Musser suggests the overall meta-trend is that APIs are now ubiquitous. John has now made his slides available on Slideshare.

The many users of laptops amongst us will have been interested to hear about the ‘Future of Wireless Power’.  The session didn’t go into great detail, but the message was very much “it’s not a new technology, and it’ll be here very soon”. Expect wireless power functionality in mobile devices in the next few years, using the Qi standard.

Some very interesting folks from MIT gave the thought-provoking ‘MIT Media Lab: Making Connections’ session. Joi Ito, Director of the MIT Media Lab, explained how it’s all about the importance of connecting people, stating that “we’re now beyond the cognitive limits of individuals, and are in an era where we rely on networks to make progress”. He suggested that traditional roadmaps are outmoded, and that we should throw them away and embrace serendipity if we’re to make real progress in technology. Ito mentioned that MIT has put significant funding into undirected research and an ‘anti-disciplinary’ approach. He said that we now have much agility in hardware as well as software, and that the agile software mentality is being applied to hardware development. He pointed to a number of projects that are embracing these ideas – idcubed, affectiva, sourcemap and formlabs.

Josh Greenberg talked about ‘macroscopy’ in the ‘Data Visualization and the Future of Research’ session, which is essentially about how research is starting to be done at large scale. Josh suggested that ‘big data’ and computation are now very important for doing science, with macroscopy being the application of big data to research. He referred to the ‘Fourth Paradigm’ book which presents the idea that research is now about data-intensive discovery. Lee Dirks from Microsoft gave us a look at some new open source tools they’ve been developing for data visualisation, including Layerscape, which allows users to explore and discover data, and Chronozoom, which looked useful for navigating through historical big data. Lee mentioned Chronozoom was good for rich data sources such as archive & museum data, demoing it using resources relating to the Industrial Revolution.

So that was about it for the sessions I was able to get to as part of the SXSW Interactive conference. It was a really amazing event, and I’d highly recommend it to anyone as a great way to meet some of the top people in the technology sector, and of course, hear some great sessions.

The slides from our session:

Community Feedback

April 10, 2012

“Anything that helps people to make more meaningful use of resources is a good thing”

Veronica Adamson and Jane Plenderleith report on recent interviews.

Since March, we’ve carried out a series of interviews with leaders and managers in the library, archive and museum (LAM) community about what open data means for their users and communities. Discussions focused on benefits, issues and challenges for institutions, collections and users in this space. Some interesting and thought-provoking views have emerged, providing much food for thought on the development of the RDTF vision.

Here are some key points emerging from our discussions:

  • Supporting open data – the LAM community is keen that resources are available to a wide community of users and contribute as much as possible to the furthering of knowledge
  • Simplifying access – there is strong support for systems which help users to discover resources easily and avoid the confusion caused by a multiplicity of disparate datasets
  • Communication – to these ends, LAM professionals need accessible language, and clear evidence of the benefit of open data aggregation, aligned with institutional priorities
  • Local examples – networks of libraries, museums and archives are already sharing data and developing local solutions to metadata challenges relating to standards, purpose and nomenclature
  • High quality aggregation – we need to move beyond small-scale initiatives providing partial answers, which then sit on websites gathering digital dust
  • Special Collections as the archives of the future – as more and more published material is available digitally, the role of the library is as custodian of unique collections, so data relating to these collections is an invaluable national resource.

Our thanks to all those who have been involved in this process so far. If you’d like to share your thoughts, aspirations, plans or reservations about these matters in our forthcoming round of interviews, please get in touch via info@discovery.ac.uk.

Warwick workshop prioritises resource discovery

March 29, 2012

In January 2012, JISC and SCONUL convened a workshop for Library Directors and Senior Managers to review the evolving requirements for institutional Library Management Systems (LMS), referenced as Domain 3 in the 2009 SCONUL report to HEFCE.  Entitled ‘The Squeezed Middle’, the workshop focused on the key service developments impacting the LMS footprint, given evolving approaches in Resource Discovery (Domain 2) and shared service developments in the management of subscription resources (Domain 1).

After considering a business modelling framework presented by Lorcan Dempsey and a number of future scenarios set in the year 2020, the workshop reviewed a catalogue of over 60 potential library service and institutional knowledge management objectives. The group evaluated them in terms of desirability, feasibility and their potential to act as drivers of mission-critical change.

It was striking that the Discovery agenda represented a very high proportion of the items ranked as high priority looking to 2020. It was also noted that above-campus initiatives (such as shared cataloguing and records improvement) and services (such as resource discovery aggregations) can act as catalysts for reviewing workflows (both user and librarian) and reappraising library team skills.

The highest ranked Discovery related targets were as follows:

  • 31 – Provide 1-stop search across all asset types
  • 32 – Publish open linked catalogue metadata
  • 33 – Expose the collection to other search mechanisms
  • 34 – Emphasise exposure of special collections
  • 35 – Integrate LMS & VLE resources, including reading lists
  • 43 – Curate local learning resources, including OERs
  • 44 – Drive the value of reading lists

Medium priority Discovery related targets were:

  • 36 – Provide recommender and associated ‘social’ services
  • 45 – Curate institutional research data
  • 46 – Expose the institutional repository
  • 47 – Expose the university archives

The headline priorities included:

  • Provide 1-stop search across the range of Teaching, Learning and Research asset types that are authored and collected within institutions
  • Integrate reading lists effectively with the discovery of and access to library, VLE and repository resources
  • Establish sustainable curation, workflow management and exposure for all digital scholarly assets – including local learning resources, OERs and research data
  • Though not on the original list, delegates added the potential for a persistent personal interface to assets, typically through bookmarking; the metaphor of a personal e-shelf was regarded as attractive.

Other challenges such as re-thinking the user access points for resource discovery or collaboration on adoption of widely used authorities and vocabularies were regarded as less critical, though not unimportant. The abandonment of the traditional LMS OPAC received a low vote on the basis that this will be an outcome of success in these broader ambitions. Whilst enhancing the discoverability of university museum assets received a low average vote, it was highly scored by those institutions with their own museum collection.

So Discovery featured highly for library management both as an end in itself and as a catalyst for changing processes and practice, relationships and responsibilities. However, we must also reflect on whether this professional and user-centred aspiration relates to a destination at which we will one day arrive or perhaps may be better viewed as an essential element in the continuous evolution of the academy.

w/c 12 Mar 2012 – Discovery News Roundup

March 16, 2012

Here’s my round up of news from the world of Discovery and beyond over the past couple of weeks. As with my previous posts, many of the items were gleaned from the #ukdiscovery twitter hashtag which you can dip into whenever you like by opening up this FiveFilters ‘newspaper’ pdf.

First of all, some news from the Discovery initiative – There is an opportunity to attend the free Licensing Clinic that the Discovery project is running on Wednesday 9th May in Birmingham. This practical roundtable event is aimed at managers and decision makers in libraries, archives and museums and there will be the following experts on hand to help guide you through your institution’s particular open metadata licensing challenges: Francis Davey (Barrister), Naomi Korn (Copyright Consultant), Paul Miller (Cloud of Data). Please note that places at this event are strictly limited to 15 delegates so you’re advised to book sooner rather than later and you can do that by signing up via the Eventbrite registration page.

In recent weeks I’ve seen a few articles relating to the need for skills development in the area of ‘data wrangling’/’data management’:

Those articles left me wondering whether there are specific skills needed for dealing with and managing open metadata which we should be identifying and highlighting. On a related note, I saw a short conversation about Linked Data on Twitter that I think a lot of people will relate to, and which could equally be applied to any of the areas touched on by the Discovery initiative. To summarise, the main point of the conversation was that people have no trouble understanding what terms such as Linked Data mean while they are being explained, but that the knowledge is hard to retain and quickly loses definition when you walk away and/or try to explain it to anyone else.

Resources such as the Open Metadata Handbook are undoubtedly a useful touchstone people can keep returning to when they need a refresher but what else needs to be in place to ensure that knowledge about open metadata is discovered, shared and becomes embedded within staff skillsets?

One of the aims of the Discovery initiative is to raise awareness of open metadata and if you’d like to help us do that then you can either:

Some other links of interest from the wider world of data:

Lastly, I’ve started exploring how I can use Delicious to share other items of interest that I pick up during my travels across the webosphere – To that end I’ve started using Packrati.us to auto-bookmark my Twitter favourites and shared hyperlinks in Delicious and have also created a #UKDiscovery ‘stack’ where I’ve started sharing any of my bookmarks that seem particularly pertinent to the Discovery initiative.

Open Data – The Missing Link?

March 12, 2012

Ken Chad positions Discovery in the context of global and national thinking

In March 2011 the first issue of Google’s Think Quarterly[1] online magazine was dedicated to data. Nigel Shadbolt of the University of Southampton writes that one of the key responses to the 21st century demand for information is open data. The data.gov.uk website and the influence of Shadbolt alongside Sir Tim Berners-Lee has positioned UK government as one of the leaders in open data[2].

However, despite the increased recognition of Shadbolt’s argument that “open data provides a platform on which innovation and value can flourish”, more needs to be done. This is certainly the case with libraries, museums and archives. Discovery Chair, Prof. David Baker, emphasises that by opening up more data for reuse “we can better serve UK educators and researchers to excel in their work by increasing access to, and visibility of, relevant content”.

If we are to achieve the ambition of the Discovery initiative for a sustainable ‘metadata ecology’, two broad issues need to be addressed. The first is around making a clear business case. Key figures like Shadbolt and Berners-Lee have done much to clarify and advocate the broader business case especially for government data. However more remains to be done to help heads of libraries, museums and archives articulate the particular business case for their organisations – as Discovery is undertaking to do.

Secondly, a commitment to licensing open metadata will be vital. It is encouraging that this is central to a number of current projects in libraries, museums and archives with the British Library[3] amongst those leading the way. At the same time Discovery is providing case studies and tools such as the Open Bibliographic Data Guide to support managers, practitioners and developers.


  1. ^ http://thinkquarterly.co.uk/
  2. ^ http://www.guardian.co.uk/news/datablog/2010/jan/21/timbernerslee-government-data
  3. ^ http://www.bl.uk/bibliographic/datafree.html

w/c 27 Feb 2012 – Discovery News Roundup

March 4, 2012

Here’s my round up of news from the world of Discovery and beyond over the past few weeks. As with previous posts, many of the items were gleaned from the #ukdiscovery twitter hashtag which you can dip into whenever you like by opening up this FiveFilters ‘newspaper’ pdf [update: URL fixed].

Last week the Discovery team published Issue 6 of the Discovery Newsletter which included the following articles among others:

  • an article on how the Copac Collections Management Tool project is aiming to help collections managers.
  • an introduction to ‘Will’s World’ – one of the JISC-funded large-scale exemplar projects.
  • an invitation for supply chain organisations such as system vendors and publishers to engage with the Discovery initiative.

If you’d like to receive future newsletters by email you simply need to drop us a line at rdtf-discovery@sero.co.uk and you’ll be added to the distribution list.

It was interesting to read Harvard’s announcement of the changes they will be undergoing in order to unify their 73 (!) libraries. Much of the announcement concentrated on structural changes but this sentence caught my eye and it seems to suggest that some game-changing LIS developments could be in the offing: “The changes will position the Library to lead in scholarly communication and open access, to design next generation search and discovery services, and to accelerate digitization and digital preservation.”

Of course Harvard’s Library Lab team are already involved in designing next generation search and discovery services as part of the Digital Public Library of America (DPLA) Beta Sprint initiative – the scale of the data they’re dealing with is pretty impressive but it was the live demo of their “pre-alpha” ShelfLife/LibraryCloud system that took my breath away and got me thinking about new possibilities for discovery interfaces.

When I first read this short blogpost from the Louie B. Nunn Center for Oral History at the University of Kentucky, I initially dismissed it as not quite newsworthy enough to include in this digest … but I kept thinking about the story after I had clicked away from it. It seems to me that the ‘Oral History Metadata Synchronizer’ (OHMS) tool that they’ve developed with their digital library division has huge potential for improving the visibility of audio collections and connecting them to other relevant resources. The story of how the Nunn Center have used OHMS to preserve and share interviews with survivors of the Haiti earthquake is a moving reminder that metadata is (at the risk of getting poetic and misty-eyed) more than sterile information, and the discovery it enables is human as much as it is digital.

Staying on the subject of audio collections, the Music Library Association is working on a final version of their Music Discovery Requirements document and is currently inviting thoughts and suggestions. This presentation by Nara Newcomer provides useful background on the aims of the Music Discovery Requirements document.

The Discovery programme is particularly focused on the business case for adopting open metadata so it was interesting to read this white paper from Nielsen which reports on the effect of supplying (or not supplying) metadata within the book industry. One of the key conclusions reads: “Overall we see clear indications that supplying a set of full enhanced metadata for product records helps to maximise sales, and that this relationship between enhanced metadata and sales is even stronger for the online retail sector.” Of course, UK university libraries are not in the business of book retail, and this report could simply serve to make publishers more commercially protective over the metadata they create, but all the same it is good to have some high-profile research published in this area. It’s a pity that they don’t separate out enhanced metadata from the provision of cover images in their analysis – from research I’ve been involved in previously, I suspect there might be some interesting findings that remain hidden by the approach they’ve taken.

Europeana have published data for 2.4 million items under an open metadata licence as part of their Linked Open Data pilot. The data is provided by eight national libraries and a number of cultural heritage organisations (including some from the UK), and there’s also a convincing animation on the ‘what and why’ of linked data which, pleasingly, keeps the end user at the forefront of the discussion. Europeana have also launched the ‘European Library Standards Handbook’, their guide for libraries providing content to data aggregators – it includes a legal overview as well as a technical guide. If you are interested in linked open data then you might want to follow the University of Bristol’s JISC-funded ‘Bricolage’ project, which will be publishing catalogue metadata from their Penguin Archive and Geology Museum collections.
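For readers who haven’t met linked data in practice: the basic mechanics are that each item gets its own HTTP URI, and a client asks for a machine-readable representation of it via content negotiation. Here’s a minimal sketch of that request pattern in Python – note that the item URI below is purely hypothetical; Europeana’s actual URIs and supported formats are documented on their Linked Open Data pilot pages.

```python
# Sketch of linked-data content negotiation: the same URI serves a web page
# to a browser and RDF to a client that asks for it via the Accept header.
# NOTE: the URI below is a made-up placeholder, not a real Europeana URI.
from urllib.request import Request

item_uri = "http://example.org/item/12345"  # hypothetical linked-data URI

# Ask for RDF/XML rather than the default HTML representation.
req = Request(item_uri, headers={"Accept": "application/rdf+xml"})

print(req.get_header("Accept"))  # application/rdf+xml
```

In a real harvest you would pass `req` to `urllib.request.urlopen()` and parse the returned RDF, but the request shape above is the essential idea: one identifier, multiple representations.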

Earlier this week I found myself having one of those ‘am I the only person not at this event?’ moments as my Twitterstream gradually filled up with all manner of interesting and diverting tweets from the OCLC EMEA Regional Council Annual Meeting.  Owen Stephens captured some of the knowledge that was shared around the topic of APIs in his blogposts written on the day. One of the sessions that seemed to be particularly well received was Alison Cullingford’s presentation on recent survey findings from the RLUK Unique and Distinct Collections project so it will be interesting to read the report when it is published. The meeting also brought news that an open data commons licence is being considered for WorldCat:

WorldCat: open data commons licence is being considered and will be discussed with OCLC membership through Global Council #EMEARC

— Simon Bains (@simonjbains) February 29, 2012

I won’t pretend to be an expert, but the guides that the Archives Hub have added to their website look very useful for anyone interested in accessing Archives Hub data via their SRU and OAI-PMH interfaces.
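To give a flavour of how simple the OAI-PMH side of this can be: a harvest is just a series of HTTP GET requests built from a verb and a couple of parameters. The sketch below constructs such a request URL – the endpoint shown is a placeholder, not the Archives Hub’s real one (their guides give the actual base URL and supported metadata formats).

```python
# Sketch of building an OAI-PMH ListRecords request URL.
# NOTE: the base URL used in the example call is a placeholder; consult the
# Archives Hub guides for the real endpoint.
from urllib.parse import urlencode

def build_listrecords_url(base_url, metadata_prefix="oai_dc",
                          resumption_token=None):
    """Build a ListRecords request; oai_dc (simple Dublin Core) is the
    metadata format every OAI-PMH repository must support."""
    params = {"verb": "ListRecords"}
    if resumption_token:
        # On follow-up requests a resumptionToken replaces all other
        # arguments, letting the client page through a large result set.
        params["resumptionToken"] = resumption_token
    else:
        params["metadataPrefix"] = metadata_prefix
    return base_url + "?" + urlencode(params)

url = build_listrecords_url("https://example.org/oai")
print(url)  # https://example.org/oai?verb=ListRecords&metadataPrefix=oai_dc
```

Fetching that URL returns XML records which can be parsed with any standard XML library; the resumptionToken in each response drives the next request until the full set has been harvested.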

I’ll finish up by sharing some interesting news in the wider world of open data and metadata:

  • The JISC Managing Research Data Programme is doing some heavy lifting in terms of building a registry of metadata standards (for UK university research datasets) – I’m sure they would be pleased to hear from you if you have any insights you’d like to share with them.
  • The Government’s call for input to their consultation on “open standards for software interoperability, data and document formats” is ongoing and it doesn’t close until 3 May so there’s plenty of time left to think about what the direct and indirect supply chain ripples might be.
  • In my last news digest I mentioned that ‘big data’ suddenly seemed to be everywhere. This week Nick Edouard’s reflective post over on the BuzzData blog struck a chord with me, particularly his point that “Open-data initiatives are good for many reasons, not least because they can radically improve internal data-sharing.” The discussion around open data often tends towards a leap-of-faith/altruistic model, but keeping focused on the ‘what’s in it for us?’ question seems a surer way of securing the internal resources needed to release data in the first place.

In closing, a couple of blogposts I’ve read recently have got me thinking about the importance of identifying a vision that other people can quickly understand and get behind:

I think that the Discovery vision packs a similar punch, but perhaps it could be more emotive: “[Our vision] is about making resources more discoverable both by people and machines.” Is that a vision which speaks to you? Have you found the words to succinctly describe your institution’s vision for resource discovery? Please do share your thoughts in the comments below.

w/c 6 Feb 2012 – Discovery News Round-up

February 9, 2012

Here’s my round-up of news from the world of Discovery and beyond over the past few weeks. Many of the items were gleaned from the #ukdiscovery Twitter hashtag, which you can dip into whenever you like by opening up this FiveFilters ‘newspaper’ PDF that I generated.

Last week Joy Palmer shared plans for the next phase of guidance materials and workshops here on the Discovery blog and is looking for your feedback on the outlined approach so please do wade in and let us know what you think. And bonus points for anyone who can suggest a better title for the event than ‘Un’developer hands-on development event. The best I can come up with is ‘Can’t Code, Won’t Code’ so the field is wide open.

The National Information Standards Organization (NISO) are currently inviting public comment on the working group recommendations that have come out of the joint NISO and NFAIS (the National Federation of Advanced Information Services) project to develop Recommended Practice on Online Supplemental Journal Article Materials. The main aim of the project is to improve the ‘discoverability and findability’ of journal supplemental materials for librarians and would-be readers by establishing and maintaining links to the related article. The comment period runs until 29th February and, although the recommendations are aimed mainly at publishers, they are also interested in feedback from the wider scholarly community. [via @simonhodson99]

One of the key NISO/NFAIS recommendations is around consistency and, interestingly, this was also one of the key discussion points raised during recent focus groups run by the JISC/AHRC-funded Open Access e-Books research project (OAPEN-UK). So far the project have heard from humanities and social sciences (HSS) monograph publishers, authors/readers and institutional representatives and next week they are running focus groups for research funders, e-book aggregators and learned societies. Incidentally, if you are interested in taking part in one of those focus groups then further details can be found on their Events page. [via @publishersrcly]

A couple of weeks ago it seemed to be ‘Big Data’ week on my twitter stream – all and sundry were tweeting about it and it wasn’t just the data geeks any more. It certainly seemed to suggest, as reported in this Museum Geek post, that “the era of Big Data has begun” but it struck me that the conversation around big data seems to be moving on from mostly logistical or functional discussions about gathering, storing, sharing and making use of data to a realisation that generating and circulating more data doesn’t solve anything on its own (see GigaOm’s article which likens it to virtual landfill via @paulmiller). In the world of building websites there’s a saying that ‘content is king’ but in the world of data it would appear that ‘content + context = king and queen’. Which had me pondering whether the Discovery initiative could usefully consider establishing Open Paradata Guidelines to sit alongside our Open Metadata Principles. And coming from a humanities background myself I found Michael Kramer’s assertion that “data is always already meta-data” an interesting point to mull over.

The Data Catalogs website, which was launched last summer, aims to be “the most comprehensive list of open data catalogs in the world”. I’m sure it’s relatively early days yet but there are already 212 catalogues listed and the list of experts involved in the website is impressive. It looks like it will grow into a useful centralised resource, particularly if a more advanced search is added, but I noticed that not all of the entries state what their metadata licence is – it seems to me that there’s an opportunity to improve consistency and clarity by making that a mandatory field. What did impress/surprise me, though, is that any visitor to the website can improve a record simply by clicking on the ‘Please help improve this page by adding more information’ link at the bottom of the record and editing the fields that appear [via @rufuspollock]. If you are interested in the issues around licensing open data then Naomi Korn and Professor Charles Oppenheim’s practical guide is worth a read.

And finally, a few items of interest from the wider world of Discovery:

  • This article about book mashups on the Programmable Web ‘API News’ blog got me thinking about countless possibilities for making library, museum and gallery collections more visible and connected in new ways. Then this morning someone tweeted about the strangely hypnotic Flight Radar website and I wondered if one day I might find myself gazing at a map that shows books flying overhead as they wend their way from place to place as inter-library loans.
  • March is looking set to be Culture Hack Month, with events taking place on both sides of the Pennines. Hack for Culture takes place on the 3rd and 4th March in Liverpool and is bringing interested parties together “to explore the possibilities offered by joint experimentation with a wide variety of hidden cultural data sets”. The 24-hour CultureCode Hack takes place towards the end of March in Newcastle and will give cultural and arts organisations with open data the opportunity to work with developers and designers to create something new. You can take a peek at the hacks that were developed at the Culture Hack North event in Leeds last year to get an idea of what can be produced in such a short amount of time.