“Telling DSpace Stories” is a community-led initiative aimed at introducing project leaders and their ideas to one another while providing details about DSpace implementations for the community and beyond. The following interview includes personal observations that may not represent the opinions and views of Cornell University Library or the DSpace Project.
Manage Metadata (Diane Hillmann and Jon Phipps): Review of: DRAFT Principles for Evaluating Metadata Standards
Metadata standards are a huge topic and evaluation a difficult task, one I’ve been involved in for quite a while. So I was pretty excited when I saw the link for “DRAFT Principles for Evaluating Metadata Standards”, but after reading it? Not so much. If we’re talking about “principles” in the sense of ‘stating-the-obvious-as-a-first-step’, well, okay—but I’m still not very excited. I do note that the earlier version link uses the title ‘draft checklist’, and I certainly think that’s a bit more honest than ‘draft principles’ for this effort. But even taken as a draft, the text manages to use lots of terms without defining them—not a good thing in an environment where semantics is so important. Let’s start with a review of the document itself, then maybe I can suggest some alternative paths forward.
First off, I have a problem with the preamble: “These principles are intended for use by libraries, archives and museum (LAM) communities for the development, maintenance, governance, selection, use and assessment of metadata standards. They apply to metadata structures (field lists, property definitions, etc.), but can also be used with content standards and value vocabularies”. Those tasks (“development, maintenance, governance, selection, use and assessment”) are pretty all-encompassing, yet the connection between those tasks and the overall “evaluation” is unclear. And, of course, without definitions, it’s difficult to understand how ‘evaluation’ relates to ‘assessment’ in this context—are they the same thing?
Moving on to the second part, about what kinds of metadata standards might be evaluated, we have a very general term, ‘metadata structures’, with what look to be examples of such structures (field lists, property definitions, etc.). Some would argue (including me) that a field list is not a structure without a notion of connections between the fields; and although property definitions may be part of a ‘structure’ (as I understand it, at least), they are not a structure, per se. And what is meant by the term ‘content standards’, and how is that different from ‘metadata structures’? The term ‘value vocabularies’ goes by many names, and is not something that can go without a definition. I say this as an author/co-author of a lot of papers that use this term, and we always define it within the context of the paper for just that reason.
There are many more places in the text where fuzziness in terminology is a problem (maybe not a problem for a checklist, but certainly for principles). Some examples:
1. What is meant by ’network’? There are many different kinds, and if you mean to refer to the Internet, for goodness sakes say so. ‘Things’ rather than ‘strings’ is good, but it will take a while to make it happen in legacy data, which we’ll be dealing with for some time, most likely forever. Prospectively created data is a bit easier, but still not a cakewalk — if the ‘network’ is the global Internet, then “leveraging ‘by-reference’ models” present yet-to-be-solved problems of network latency, caching, provenance, security, persistence, and most importantly: stability. Metadata models for both properties and controlled values are an essential part of LAM systems and simply saying that metadata is “most efficient when connected with the broader network” doesn’t necessarily make it so.
2. ‘Open’ can mean many things. Are we talking specific kinds of licenses, or the lack of a license? What kind of re-use are you talking about? Extension? Wholesale adoption with namespace substitution? How does semantic mapping fit into this? (In lieu of a definition, see the paper at (1) below)
3. This principle seems to imply that “metadata creation” is the sole province of human practitioners and seriously muddies the meaning of the word creation by drawing a distinction between passive system-created metadata and human-created metadata. Metadata is metadata, and standards apply regardless. What do you mean by ‘benefit user communities’? Whose communities? What is meant by ‘value’ in this context? How would metadata practitioners ‘dictate the level of description provided based on the situation at hand’?
4. As an evaluative ‘principle’ this seems overly vague. How would you evaluate a metadata standard’s ability to ‘easily’ support ‘emerging’ research? What is meant by ‘exchange/access methods’ and what do they have to do with metadata standards for new kinds of research?
5. I agree totally with the sentence “Metadata standards are only as valuable and current as their communities of practice,” but the one following makes little sense to me. “ … metadata in LAM institutions have been very stable over the last 40 years …” Really? It could easily be argued that the reason for that perceived stability is the continual inability of implementers to “be a driving force for change” within a governance model that has at the same time been resistant to change. The existence of the DCMI usage board, MARBI, the various boards advising the RDA Steering Committee, all speak to the involvement of ‘implementers’. Yet there’s an implication in this ‘principle’ that stability is liable to no longer be the case and that implementers ‘driving’ will somehow make that inevitable lack of stability palatable. I would submit that stability of the standard should be the guiding principle rather than the democracy of its governance.
6. “Extensible, embeddable, and interoperable” sounds good, but each is more complex than this triumvirate seems. Interoperability in particular is something that we should all keep in mind, but although admirable, interoperability rarely succeeds in practice because of the practical incompatibility of different models. DC, MARC21, BibFrame, RDA, and Schema.org are examples of this — despite their ‘modularity’ they generally can’t simply be used as ‘modules’ because of differences in the thinking behind the model and their respective audiences.
I would also argue that ‘lite style implementations’ make sense only if ‘lite’ means a dumbed-down core that can be mapped to by more detailed metadata. But stressing the ‘lite implementations’ as a specified part of an overall standard gives too much power to the creator of the standard, rather than the creator of the data. Instead we should encourage the use of application profiles, so that the particular choices and usages of the creating entity are well documented, and others can use the data in full or in part according to their needs. I predict that lossy data transfer will be less acceptable in reality than it is in the abstract, and would suggest that dumb data is more expensive over the longer term (and certainly doesn’t support ‘new research methods’ at all). “Incorporation into local systems” really can only be accomplished by building local systems that adhere to their own local metadata model and are able to map that model in/out to more global models. Extensible and embeddable are very different from interoperable in that context.
7. The last section, after the inarguable first sentence, describes what the DCMI ‘dumb-down’ principle defined nearly twenty years ago, and that strategy still makes sense in a lot of situations. But ‘graceful degradation’ and ‘supporting new and unexpected uses’ requires smart data to start with. ‘Lite’ implementation choices (as in #6 above) preclude either of those options, IMO, and ‘adding value’ of any kind (much less by using ‘ontological inferencing’) is in no way easily achievable.
I intend to be present at the session in Boston [9:00-10:00 Boston Conference and Exhibition Center, 107AB] and since I’ve asked most of my questions here I intend not to talk much. Let’s see how successful I can be at that!
It may well be that a document this short and generalized isn’t yet ready to be a useful tool for metadata practitioners (especially without definitions!). That doesn’t mean that the topics that it’s trying to address aren’t important, just that the comprehensive goals in the preamble are not yet being met in this document.
There are efforts going on in other arenas–the NISO Bibliography Roadmap work, for instance–that should have an important impact on many of these issues, which suggests that it might be wise for the Committee to pause and take another look around. Maybe a good glossary would be an important step?
Dunsire, Gordon, et al. “A Reconsideration of Mapping in a Semantic World”, paper presented at International Conference on Dublin Core and Metadata Applications, The Hague, 2011. Available at: dcpapers.dublincore.org/pubs/article/view/3622/1848
I’ve had such a great time hanging together with all of you – staff at our Partner institutions; professionals from all corners of the library, archive, and museum worlds; and especially my OCLC colleagues.
But at the end of December, I’m going to retire and fly solo. Well, actually I’ll have a co-pilot. My husband, Ted, and I are dramatically downsizing, moving into an RV, and hitting the road. We plan to travel for the next while — we refer to it as two weeks, two months, or two years (to find out which it will be, follow our blog at twotwotwo.net).
I’ve learned a lot from all of you and hope I’ve contributed in a useful manner to the conversations about digitization, LAM collaboration, archiving born-digital content, data curation, and support for researchers.
OCLC Research will be filling the vacant senior program officer position with someone with experience specifically in research information management. If this sounds like you (or someone you know), check out the posting. It’s a fantastic job doing meaningful work with wonderful people.
In the meantime, I’ll be busy finishing up a few projects at work (and a few thousand at home).
If we meet up on the road somewhere, let’s hang together!
Ricky

About Ricky Erway
Ricky Erway, Senior Program Officer at OCLC Research, works with staff from the OCLC Research Library Partnership on projects ranging from managing born-digital archives to research data curation.
This week's listserv conversation that deserves highlighting was kicked off by Cecily Walker from Vancouver Public Library. As part of a skills gap analysis, she came to the Islandora listserv with the question "what are the skills needed for a successful Islandora implementation?" with an emphasis on front- and back-end development. It's a great question for both those looking to hire new staff to work with Islandora, and users of Islandora who want to develop their skills. And it elicited some very useful responses:
From Kelsey Williamson:
This may not be as technical an answer as you might be hoping for, but as someone who was thrown in the deep end, now working from inside the gap, I thought I might weigh in. I came from a pretty basic library background with some experience managing CMSs and creating simple webpages and I understood fundamentally what xml was and how it worked. I've had to learn how to work with xpath and xslt in order to manage and troubleshoot metadata creation/edits, as well as solr searches. I'm currently in the process of training someone to work with me and I've found it really difficult to explain the abstract - how the "ecosystem" works, fedora datastreams, rdf, etc.
But, what I've found most challenging so far in my case is that there is a separation of knowledge between technical (server set up, installations, etc) and metadata skillsets. I have someone helping me with the back-end aspects who is super smart and competent, but doesn't really work with metadata standards or any other library stuff for that matter. I don't have server access- so it's challenging to work through and communicate metadata issues/solr configuration problems. There is definitely a disconnect going on there (guess that's a soft skills issue).
You also might check out some presentations from the 2015 conference - http://islandora.ca/camps/conference2015/schedule - there are some presentations geared towards beginners that might illuminate required skills a little more clearly.
From Diego Pino:
I would start with metadata understanding. My experience tells me that you should always manage the most valuable asset for your end users (who most of the time are librarians and/or other preservation professionals). Given how the world is moving, XML and XSLT (XPath, etc.) are not enough today. RDF is a must; Linked Data and SPARQL also.
Under the same understanding, end users are the most important asset (yeah, it's not really metadata!), Solr knowledge is also a need. From how things work, to what/how stuff is stored in an index. Basically because users do two basic actions: Search + Look. (plus ingesting of course) and if those things don't work you get frustration.
Since Islandora is based on Drupal, and Drupal is PHP, and since learning a new CMS is far less challenging than learning to code in a new language, some basic/medium PHP knowledge is also required.
I don't think there are many people out there who manage the whole stack per se. Fedora is not that easy to learn without getting your hands into installing it, which means a mix of hard and soft skills: basic Unix management, some services knowledge (Apache 2, Tomcat, the already-mentioned Solr), installing stuff, etc., plus being able to learn independently (research), which leads to good communication skills to interact with the community forums. Most people dismiss the importance of reading, failing, and retrying. I have learned a lot from just browsing through the code, watching how the different modules interact, changed, and evolved over time, and even more by making stuff fail.
And finally, internal communication skills. Like being able to ask end users for their needs, helping them to find solutions and transferring those needs to technical actions and back.
Repositories are as complex as the information you want to preserve, and I personally think many soft skills lead to better adoption of hard skills, not the other way around.
Not a formal definition, but if I were looking for someone, I would put as much effort as possible into finding someone flexible enough and communicative enough to jump into technical learning, who also has some love for preservation.
And finally, a detailed look at a wide variety of skills from Donald Moses:

Using the Islandora VM to learn about Drupal and Fedora
- Downloading and running the Islandora VM so that you can test, model and explore locally before you apply changes to a production environment.
- Troubleshooting using various logfiles (Drupal, Fedora, Solr, GSearch).
- Ability to navigate the Drupal CMS.
- Applying Drupal's roles and permissions.
- Extending Drupal by downloading and configuring Drupal modules and libraries.
- Configuring Islandora through the Drupal UI (modules, helper utilities, viewers).
- Using Git to help manage code/documentation/workflow.
- MODS Guidelines - http://www.loc.gov/standards/mods/userguide/generalapp.html
- Oxygen XML Editor - especially useful when developing or working with a new schema. I typically create the ideal record I want to have in that tool then recreate those elements in the XML Form Builder. I use Oxygen to validate XML, to run XML through transformation scenarios, to grab XPATHs of elements.
- XML Form Builder - Create/edit/troubleshoot forms.
- Solr - understand how to configure the Islandora Solr options. More importantly be able to have a snoop through Solr to get an understanding of how your content is getting indexed. eg. http://localhost:8080/solr/collection1/select?q=PID%3Aislandora%5C%3Aroo...
- Learning about and applying simple patterns for searching the triplestore http://localhost:8080/fedora/risearch
- Having an awareness of the work being done in the community (what modules have been or are being developed).
- Reporting bugs or other issues.
- Helping with the work that needs to be done (and there's a lot of it).
- Participating in Interest Groups.
- Willingness to keep learning.
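The Solr item in the list above can be explored from a script as well as a browser. As a minimal sketch (the host, port, and `collection1` core name are assumptions based on a stock Islandora install; adjust for your site), the kind of select query shown earlier can be built and inspected in Python:

```python
from urllib.parse import urlencode

# Build a Solr select query like the example above: fetch the document
# whose PID field matches an escaped Islandora PID. The backslash in the
# query escapes the ':' inside the PID so Solr doesn't treat it as a
# field separator; urlencode then percent-encodes both characters.
base = "http://localhost:8080/solr/collection1/select"
params = {
    "q": r"PID:islandora\:root",  # the PID here is just an example
    "wt": "json",                 # ask Solr for a JSON response
    "rows": 1,                    # only need the one matching document
}
query_url = base + "?" + urlencode(params)
print(query_url)
```

Pasting the printed URL into a browser (or fetching it with `urllib.request`) is a quick way to "have a snoop through Solr" and see exactly which fields got indexed for an object.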
This post was written by Arianit Dobroshi
Kosovo is ranked 35th in the 2015 Open Knowledge Global Open Data Index, down from 31st place in last year’s measurement, and marked as 43% open. In South East Europe, Albania ranked 37th and Macedonia 69th. Others, such as Serbia and Bosnia, have not been scored since they did not submit all datasets to this year’s Index.
The drop in ranking is due to three reasons:
Firstly, the Government did not make any advancements in open data during 2015 on the datasets the Index covers.
Secondly, there are still copyright notices that mark content on public websites as protected when it is actually in the public domain. This is done despite the Law on Copyright, which places documents the Government produces for the purpose of informing the public into the public domain.
Thirdly, Kosovo ranks particularly low in the four new datasets that were added to the Index this year. Kosovo is completely missing two datasets: location data and weather forecasts. The Hydro-meteorology Institute publishes the required weather data on its website, but only for a few days before it is removed.
While the government of Kosovo does a fine job collecting the data in question, the data is not made publicly available on what I suspect are, for the most part, pure organizational reasons rather than political or technical ones. While this is unfortunate, the good news is that improvement would be quite easy. The requirements are modest to start with and Kosovo has already met most of the publishing criteria.
Let’s divide it into the index criteria:
Kosovo public data is in digital form; this is hardly a demanding requirement today, since keeping it otherwise would be more burdensome for public officials.
Open License: Kosovo copyright law is rather good in placing public documents in the public domain for free use, including for commercial purposes, with the exception of the national map which is protected and commercialized by the Cadastre Agency. The Law on Access to Public Documents is another matter, however, since it stops use of documents accessed through this law “for commercial or propaganda purposes”. On machine readability, it starts getting tricky as many of the public websites were not developed with this requirement in mind.
On bulk availability Kosovo again misses out. Here the country would probably score worse if it was not for some of the datasets being shared in simple Excel format. We have quite a bit of spreadsheets and not enough databases. In cases of good databases, the public can’t access them in bulk.
We score well on up-to-date provision, but once again the requirements are not very high here. Pollution data, for example, are only required on a yearly basis.
Where do we go from here?
The Kosovo Open Government Partnership Action Plan (in Albanian) has an action item dedicated to Open Data. An Open Data Portal has been set up this year, though the datasets provided are rather limited at this point. The Office of the Prime Minister has started consultations on amending the Law on Access to Information to include the EU Directive on the re-use of public sector information which will provide a strong legal basis for open public data by positively committing the country to an open PSI policy.
Fresh devotion to open government data needs to be found once again. Not that the outcome will be a panacea for Kosovo’s problems, but it will strengthen transparency and evidence-based policy making and offer a strong base for other good governance efforts in Kosovo.
Post-war Kosovo was lucky to establish an administration without the burden of legacy systems. Yet, from policy decision making to day-to-day administration, the lack of capacity and ability to adapt is beginning to be an impediment to the country’s progress.
Journal of Web Librarianship: Digital Library Acceptance Model and Its Social Construction: Conceptualization and Development
Journal of Web Librarianship: A Review of "New Content in Digital Repositories: The Changing Research Landscape"
Journal of Web Librarianship: The Evolving Impact of the Invisible Web: Exploring Economic and Political Ramifications
Today I found the following resources and bookmarked them on Delicious.
- Open Broadcaster Software Free, open source software for live streaming and recording
The xID product, including xISBN, xISSN, and xOCLCNum, will be retired March 15, 2016.
Libraries have a tradition of serving their communities, so our Washington Office staff—spearheaded by Office of Information Technology Policy’s (OITP) Carrie Russell—chose to serve our Washington, D.C. community by donating holiday shoebox gifts to homeless families through a wonderful organization called So Others Might Eat (SOME).
The holiday shoebox drive takes place each year, but this is the first year ALA Washington identified this charity as our way to serve. The way it works is that every shoebox is filled with essentials, such as soap, shampoo, deodorant, shaving sets, underwear, socks, and hats, scarves and gloves, among other items. Each shoebox is then wrapped with shiny holiday paper and labeled whether it is for a man, woman, girl or boy, and then delivered by SOME to needy families throughout the D.C. community.
Every member of the Washington Office participated in this effort. Some brought in toiletries, others hats, scarves and gloves, and others brought additional clothing and stuffed toys to donate to families in need. Carrie even knit several hats! We spread out everything in our conference room and had two group efforts: one to sort everything so each shoebox had all the essentials, and another to wrap all the shoeboxes. Then we delivered the holiday shoeboxes, clothing and toys to SOME headquarters.
As you can see from the picture that shows the stacks of shoeboxes that have come in from many organizations and individuals, hundreds and hundreds of families will be touched by SOME’s holiday deliveries.
As we left, we turned and thanked SOME staff members for giving us the opportunity to serve.
DPLA: DPLA Workshop: Introduction to DPLA’s Application Programming Interface, February 11, 2016, 3:30 PM Eastern
We’re pleased to invite our extended community to attend a free DPLA workshop webinar — An Introduction to DPLA’s Application Programming Interface — taking place on February 11, 2016 at 3:30PM Eastern.
This webinar, led by DPLA Technology Specialist Mark Breedlove, will introduce the fundamentals of distributed web application architecture to an uninitiated audience, with a special focus on the DPLA’s Application Programming Interface, or API. We will cover what a web application is, what an API is, how web applications on different sites communicate with one another, and why. This webinar should interest those who have been hearing about “APIs” and “web applications,” but do not fully understand what these terms mean; or those who understand generally, but want to get a better sense of what the DPLA’s API can do and what role it plays at a high level. We aim to have participants leave the webinar feeling confident in their ability to discuss web APIs and strategies for their implementation.
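As a small taste of what the workshop covers: a web API request is, at bottom, just a URL with parameters. The sketch below builds a DPLA item-search request in Python; the `api.dp.la/v2/items` endpoint and `api_key` parameter follow DPLA's published API, but `YOUR_API_KEY` is a placeholder you would request from DPLA, and the search term is just an example.

```python
from urllib.parse import urlencode

# A DPLA API call is an ordinary HTTP GET. The endpoint and parameter
# names follow DPLA's v2 API; YOUR_API_KEY is a placeholder for the key
# you obtain by registering with DPLA.
base = "https://api.dp.la/v2/items"
params = {
    "q": "cooking",            # free-text search term (example)
    "page_size": 5,            # how many records to return
    "api_key": "YOUR_API_KEY", # placeholder credential
}
url = base + "?" + urlencode(params)
print(url)
# Fetching this URL (e.g. with urllib.request.urlopen) returns a JSON
# document whose "docs" list holds the matching item records.
```

This is the same request a browser, a harvesting script, or another web application would make, which is exactly the "how web applications communicate with one another" question the webinar addresses.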
DPLA Workshops are online learning opportunities highlighting subjects central to our community, such as education, metadata, technology, copyright, and more. These events are open to the public (registration required). To hear about all upcoming workshop announcements, sign up for our mailing list.
Lucas123 at Slashdot reports:
Hard disk drive per-gigabyte pricing has remained relatively stagnant over the past three years, and prices are expected to be completely flat over at least the next two, allowing SSDs to significantly close the cost gap, according to a new report. The report, from DRAMeXchange, stated that this marks the fourth straight quarter that the SSD price decline has exceeded 10%. ... However, through 2017, the per-gigabyte price of HDDs is expected to remain flat: 6 cents per gigabyte. Consumer SSDs were on average selling for 99 cents a gigabyte in 2012. From 2013 to 2015, the price dropped from 68 cents to 39 cents per gig, meaning the average 1TB SSD sells for about $390 today. Next year, SSD prices will decline to 24 cents per gig and in 2017, they're expected to drop to 17 cents per gig.

In October, in a post based on Robert Fontana's excellent slides from the Library of Congress Storage Architecture workshop, I wrote:
dividing the $/GB for flash by the $/GB for hard disk gives us the cost ratio and the following table:
- Year: Cost Ratio
- 2008: 12.2
- 2009: 13.1
- 2010: 17.7
- 2011: 11.6
- 2012: 7.8
- 2013: 8.7
- 2014: 8.4
- 2015: 6.5
- 2016: 4.0
- 2017: 2.8
- Flash is inherently a more valuable medium than hard disk, so will command a price premium even if manufacturing costs per GB were equivalent.
- Flash supply is constrained by fab capacity, which is inadequate to displace much of the hard disk volume.
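The recent ratios follow directly from the per-gigabyte prices quoted above: 39 cents per GB of flash against 6 cents per GB of disk gives the 6.5 figure for 2015, and the projected 24-cent and 17-cent flash prices against a flat 6-cent disk price give 2016 and 2017. A quick check in Python, using the DRAMeXchange figures from the quote:

```python
# Per-gigabyte prices in dollars, from the quoted DRAMeXchange figures.
hdd_per_gb = 0.06  # HDD $/GB, expected to stay flat through 2017
ssd_per_gb = {2015: 0.39, 2016: 0.24, 2017: 0.17}

# Divide flash $/GB by disk $/GB to reproduce the cost-ratio table.
for year, ssd in sorted(ssd_per_gb.items()):
    ratio = ssd / hdd_per_gb
    print(f"{year}: {ratio:.1f}")
# prints 6.5, 4.0, and 2.8 for 2015-2017, matching the table above
```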
A major reauthorization bill overhauling K-12 education policy–the Every Student Succeeds Act (ESSA)–has been signed into law by President Obama and in a significant victory for ALA’s decade of advocacy efforts, it includes provisions favorable to libraries.
ALA President Sari Feldman praised all ALA members, crediting their unified, collective, high-impact messages to their Members of Congress for the favorable provisions for school libraries specifically included in the reauthorization legislation.
AASL President Leslie Preddy said, “For school-age students, ESSA represents an historic new chapter in federal support of education, one that will ensure effective school library programs are there to help them learn how to use new technology tools, develop critical thinking, and the reading and research skills essential to achievement in science, math and all other ‘STEM’ fields.”
“School libraries and school librarians are really recognized as critical education partners in this bill,” Feldman said in an Education Week article posted this week.
As noted in a previous District Dispatch article, the new bill authorizes the Innovative Approaches to Literacy program that allows the education secretary to “award grants, contracts, or cooperative agreements, on a competitive basis” to promote literacy programs in low-income areas, including “developing and enhancing effective school library programs.”
Those funds can go toward library resources and providing professional development for school librarians. States and districts can also use Title II funds for “supporting the instructional services provided by effective school library programs.” And the bill encourages local education agencies to assist schools in developing effective school library programs, in part to help students gain digital skills.
As Feldman notes in the Education Week article, “It’s very clear that as libraries are called out by the federal government in this legislation and there’s opportunity to apply for funding around effective school libraries, it will also strengthen state mandates around libraries.”
ESSA replaces No Child Left Behind, the 2002 signature domestic initiative of President George W. Bush that heightened Washington’s role in local classrooms. It sends significant power back to states and local districts while maintaining limited federal oversight of education.
This will move the focus for library advocacy efforts to the local level in coming days, but for now, ALA members deserve to savor the achievement wrought by their long-term efforts.
The post Significant victory for libraries as President signs ESSA into law appeared first on District Dispatch.