Discovery

A metadata ecology for UK education & research

Emerging bibliographic tools and technologies

October 19, 2011

On the 5th October 2011 I attended a workshop on ‘emerging bibliographic tools’ organised by JISC. The idea of the workshop was to bring together a small group of people with experience of a wide variety of tools used to transform, publish, and otherwise manipulate bibliographic data.

The day kicked off (after introductions) with simply capturing the whole range of activity, formats and tools that the attendees thought were relevant to exploiting bibliographic data. The nature of this session made it rather a whistle-stop tour of technology and terminology, including:

  • Linked Data and RDF
  • NoSQL and related tools such as CouchDB, MongoDB (document stores) and Redis (a key-value store)
  • Big data (defined as ‘data bigger than you’re used to handling’) and Hadoop/MapReduce
  • Identifiers – the challenges of finding and exploiting appropriate ones such as DOI, ISBN, AuthorClaim and ORCID
  • Automatic metadata creation from full text resources
  • Visualisation tools – from Google Charts to R
  • Ontologies and representations – from MARC to BibJSON to RIS to BibTeX to Bibliographic Ontology to Schema.org
  • ‘Data reconciliation’ tools such as Google Refine and the Stanford Data Wrangler
  • Indexing technologies: Solr/Lucene, SolrMARC, Sphinx
  • Code libraries for MARC: PyMARC, ruby-marc, MARC::Record, MARC4J
  • Spidering/Web crawling technology: CrystalEye, PubCrawler, Nutch
  • … and more

However, there was also time to discuss some aspects in more detail, going beyond just the tech, and starting to talk about the skills required to manipulate bibliographic data, and potential developments that might support those working with data, such as identifier lookups, visualisations, and data transformation services.

After lunch we picked up on these latter points, looking for the opportunities, challenges and gaps that existed. The morning discussion had highlighted the incredible range of relevant technologies, and one of the challenges identified in the afternoon was keeping on top of existing and new initiatives, with mentoring and online community support identified as opportunities.

In the morning a healthcare metaphor was introduced with some discussion of a ‘Data Doctor’ role for organisations – someone with the technical skills, domain knowledge, and data expertise, who would be responsible for ensuring that the organisation’s data was in ‘good health’ (see also ‘data scientist’). In the afternoon, this concept was expanded with the idea of a ‘data health check’ service: somewhere you could load data to identify possible problems and, crucially, get suggested workflows and resources for improving the data.
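To make the ‘data health check’ idea a little more concrete, here is a minimal sketch in Python of the kind of test such a service might run. It is only an illustration, not something that was presented at the workshop: the record layout (a plain dictionary with `title`, `author` and `isbn` keys) and the function names are my own assumptions. The ISBN-10 and ISBN-13 check-digit rules are the standard ones.

```python
import re


def isbn10_ok(isbn: str) -> bool:
    """Standard ISBN-10 rule: weights 10..1, total divisible by 11 ('X' = 10)."""
    if not re.fullmatch(r"\d{9}[\dX]", isbn):
        return False
    total = sum((10 - i) * (10 if ch == "X" else int(ch))
                for i, ch in enumerate(isbn))
    return total % 11 == 0


def isbn13_ok(isbn: str) -> bool:
    """Standard ISBN-13 rule: alternating weights 1 and 3, total divisible by 10."""
    if not re.fullmatch(r"\d{13}", isbn):
        return False
    total = sum(int(ch) * (1 if i % 2 == 0 else 3)
                for i, ch in enumerate(isbn))
    return total % 10 == 0


def health_check(record: dict) -> list:
    """Return a list of problems found in one record.

    The field names used here are hypothetical, chosen for illustration.
    """
    problems = []
    for field in ("title", "author"):
        if not str(record.get(field, "")).strip():
            problems.append(f"missing {field}")
    # Normalise the ISBN: drop hyphens and spaces, upper-case a trailing 'x'.
    isbn = str(record.get("isbn", "")).replace("-", "").replace(" ", "").upper()
    if not isbn:
        problems.append("missing isbn")
    elif not (isbn10_ok(isbn) or isbn13_ok(isbn)):
        problems.append("isbn fails check digit")
    return problems
```

Calling `health_check` on each record in a file gives a simple list of problems per record. A real service would presumably go further and attach suggested fixes or resources to each problem, which is the part the discussion identified as crucial.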

Perhaps the most crucial issues identified in the afternoon were around skills and sustainability. As we see an increasing need to manipulate data and publish it simultaneously in multiple formats to serve different audiences and needs, we need to find staff with appropriate skills, and ensure managers understand the business case for this work and the skills needed to support it.
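As a rough illustration of what publishing the same data in multiple formats can involve, the sketch below renders one record as BibTeX, RIS and JSON using only the Python standard library. The record, its keys and the function names are made up for the example, and the output covers only a handful of fields for a book; real records would need escaping and many more fields.

```python
import json


def to_bibtex(rec: dict) -> str:
    """Render a minimal BibTeX @book entry."""
    fields = ",\n".join(f"  {k} = {{{rec[k]}}}" for k in ("author", "title", "year"))
    return f"@book{{{rec['key']},\n{fields}\n}}"


def to_ris(rec: dict) -> str:
    """Render a minimal RIS record (two-letter tag, two spaces, hyphen, space)."""
    lines = [
        "TY  - BOOK",
        f"AU  - {rec['author']}",
        f"TI  - {rec['title']}",
        f"PY  - {rec['year']}",
        "ER  - ",
    ]
    return "\n".join(lines)


def publish_all(rec: dict) -> dict:
    """Produce every supported serialisation from the one source record."""
    return {
        "bibtex": to_bibtex(rec),
        "ris": to_ris(rec),
        "json": json.dumps(rec, sort_keys=True),
    }


# A made-up record, purely for illustration.
record = {"key": "doe2011", "author": "Doe, Jane",
          "title": "Example Title", "year": "2011"}

for name, text in publish_all(record).items():
    print(f"--- {name} ---")
    print(text)
```

Even at this scale the skills question is visible: each output format has its own conventions, and someone has to know them well enough to keep every serialisation in step with the source data.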

At times, the range and scope of the technologies, tools and issues identified by the workshop were overwhelming, as acronyms and jargon flew freely around the room. However, the opportunities opened up by new ways of working with bibliographic (and other) data are exciting, and I strongly believe that we can take advantage of these to produce richer expressions of our data than ever before.

The technologies and tools identified by the workshop will form the basis of a short guide which will be published by the Discovery initiative.

Tagged: marc, metadata, tools, ukdiscovery
Posted by ostephens

