The British Parliament is celebrating the 800th anniversary of Magna Carta:
On Thursday 5 February 2015, the four surviving original copies of Magna Carta were displayed in the Houses of Parliament – bringing together the documents that established the principle of the rule of law in the place where law is made in the UK today. The closing speech of the ceremony in the House of Lords was given by Sir Tim Berners-Lee, who is reported to have said:
I invented the acronym LOCKSS more than a decade and a half ago. Thank you, Sir Tim!
On October 24, 2014 Linus Torvalds added overlayfs to release 3.18 of the Linux kernel. Various Linux distributions have implemented various versions of overlayfs for some time, but now it is an official part of Linux. Overlayfs is a simplified implementation of union mounts, which allow a set of file systems to be superimposed on a single mount point. This is useful in many ways, for example to make a read-only file system such as a CD-ROM appear to be writable by mounting a read-write file system "on top" of it.
Other Unix-like systems have had union mounts for a long time. BSD systems first implemented them in 4.4BSD-Lite two decades ago. The concept traces back five years earlier to my paper for the Summer 1990 USENIX Conference, Evolving the Vnode Interface, which describes a prototype implementation of "stackable vnodes". Among other things, it could implement union mounts as shown in the paper's Figure 10:
This use of stackable vnodes was in part inspired by work at Sun two years earlier on the Translucent File Service, a user-level NFS service by David Hendricks that implemented a restricted version of union mounts. All I did was prototype the concept, and like many of my prototypes it served mainly to discover that the problem was harder than I initially thought. It took others another five years to deploy it in SunOS and BSD. Because they weren't hamstrung by legacy code and semantics, by far the most elegant and sophisticated implementation was the one developed around the same time by Rob Pike and the Plan 9 team. Instead of being a bolt-on addition, union mounting was fundamental to the way Plan 9 worked.
About five years later Erez Zadok at Stony Brook led the FiST project, a major development of stackable file systems including two successive major releases of unionfs, a unioning file system for Linux.
About the same time I tried to use OpenBSD's implementation of union mounts early in the boot sequence to construct the root directory by mounting a RAM file system over a read-only root file system on a CD, but gave up on encountering deadlocks.
In 2009 Valerie Aurora published a truly excellent series of articles going into great detail about the difficult architectural and implementation issues that arise when implementing union mounts in Unix kernels. It includes the following statement, with which I concur:
"The consensus at the 2009 Linux file systems workshop was that stackable file systems are conceptually elegant, but difficult or impossible to implement in a maintainable manner with the current VFS structure. My own experience writing a stacked file system (an in-kernel chunkfs prototype) leads me to agree with these criticisms."
Note that my original paper was only incidentally about union mounts; it was a critique of the then-current VFS structure, and a suggestion that stackable vnodes might be a better way to go. It was such a seductive suggestion that it took nearly two decades to refute it! My apologies for pointing down a blind alley.
The overlayfs implementation in 3.18 is minimal:
"Overlayfs allows one, usually read-write, directory tree to be overlaid onto another, read-only directory tree. All modifications go to the upper, writable layer."
But given the architectural issues, doing one thing really well has a lot to recommend it over doing many things fairly well. This is, after all, the use case from my paper.
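The two-layer behavior described above can be illustrated with a toy model. This is only a sketch in Python, not kernel code: the class name Overlay, its methods, and the use of dictionaries to stand in for directory trees are all assumptions made for illustration. It shows reads falling through to the read-only lower layer, every modification landing in the upper layer, and a "whiteout" marker in the upper layer hiding a deleted lower file.

```python
class Overlay:
    """Toy model: a writable upper layer over a read-only lower layer.

    Dictionaries map file names to contents. All names here are
    hypothetical; this is not how the kernel stores anything.
    """

    _WHITEOUT = object()  # upper-layer marker meaning "this name was deleted"

    def __init__(self, lower):
        self.lower = dict(lower)  # never modified after construction
        self.upper = {}           # every change is recorded here

    def read(self, name):
        # The upper layer wins; otherwise fall through to the lower layer.
        if name in self.upper:
            value = self.upper[name]
            if value is self._WHITEOUT:
                raise FileNotFoundError(name)
            return value
        if name in self.lower:
            return self.lower[name]
        raise FileNotFoundError(name)

    def write(self, name, data):
        # Writes never touch the lower layer.
        self.upper[name] = data

    def remove(self, name):
        self.read(name)  # raises FileNotFoundError if the name is not visible
        self.upper[name] = self._WHITEOUT

    def listdir(self):
        # Merge both layers, hiding names that were whited out.
        names = set(self.lower) | set(self.upper)
        return sorted(n for n in names if self.upper.get(n) is not self._WHITEOUT)


# Usage: a read-only "CD-ROM" that appears writable.
cdrom = {"README": "v1", "bin/tool": "binary"}
fs = Overlay(cdrom)
fs.write("README", "v2")       # shadows the lower copy
fs.write("notes.txt", "new")   # exists only in the upper layer
fs.remove("bin/tool")          # hidden by a whiteout, still present below
```

Even this toy hints at where the real difficulty lies: renames, hard links, directory merging and concurrent access have no equivalent here.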
It took a quarter of a century, but the idea has finally been accepted. And, even though I had to build a custom 3.18 kernel to do so, I am using it on a Raspberry Pi serving as part of the CLOCKSS Archive.
Thank you, Linus! And everyone else who worked on the idea during all that time!
References (date order):
- David Hendricks, The Translucent File Service, pp. 87-93, Proceedings of the Autumn 1988 EUUG Conference, Vienna, Austria, October 1988.
- David S. H. Rosenthal, Evolving the Vnode Interface, Proceedings of the Summer 1990 USENIX Conference, Anaheim, 1990.
- Rob Pike, Dave Presotto, Sean Dorward, Bob Flandrena, Ken Thompson, Howard Trickey & Phil Winterbottom, Plan 9 from Bell Labs, Computing Systems Vol. 8, No. 3, Summer 1995.
- Jan-Simon Pendry & Marshall Kirk McKusick, Union Mounts in 4.4BSD-Lite. Proceedings of the USENIX Technical Conference on UNIX and Advanced Computing Systems: pp. 25–33 December 1995.
- David S. H. Rosenthal, A Digital Preservation Network Appliance Based on OpenBSD, BSDCon, 2003.
- Charles P. Wright, Jay Dave, Puja Gupta, Harikesavan Krishnan, Erez Zadok, and Mohammad Nayyer Zubair, Versatility and Unix Semantics in a Fan-Out Unification File System, Stony Brook University Technical Report FSL-04-01b, November 2004.
- Valerie Aurora, Unioning file systems: Architecture, features, and design choices, lwn.net, March 2009.
- Valerie Aurora, Union file systems: Implementations, part I, lwn.net, March 2009.
- Valerie Aurora, Unioning file systems: Implementations, part 2, lwn.net, April 2009.
- Miklos Szeredi, overlay filesystem, October 2014.
The UNT Libraries have made use of the ARK identifier specification for a number of years and have used these identifiers throughout our infrastructure on a number of levels. This post gives a little background about where, when, and why we assign our ARK identifiers, and a little about how.
Terminology
The first thing we need to do is get some terminology out of the way so that we can talk about the parts consistently. This is taken from the ARK documentation:
    http://example.org/ark:/12025/654xz321/s3/f8.05v.tiff
    \________________/ \__/ \___/ \______/ \____________/
      (replaceable)      |    |      |       Qualifier
            |     ARK Label   |      |    (NMA-supported)
            |                 |      |
    Name Mapping Authority    |    Name (NAA-assigned)
          (NMA)               |
           Name Assigning Authority Number (NAAN)
The ARK syntax can be summarized:
    [http://NMA/]ark:/NAAN/Name[Qualifier]
For the UNT Libraries we were assigned a Name Assigning Authority Number (NAAN) of 67531, so all of our identifiers start like this: ark:/67531/
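To make the parts concrete, here is a small sketch that splits an ARK into the components named above. The function name parse_ark and the regular expression are my own illustration, not part of the ARK specification or of our systems. It assumes the Name ends at the first "/" or "." and that everything after that is the Qualifier, which is a simplification.

```python
import re

# Hypothetical pattern following the summary [http://NMA/]ark:/NAAN/Name[Qualifier].
# Assumption: the Name runs up to the first "/" or "."; the rest is the Qualifier.
ARK_PATTERN = re.compile(
    r"^(?:(?P<nma>https?://[^/]+)/)?"  # optional, replaceable Name Mapping Authority
    r"ark:/(?P<naan>[^/]+)"            # ARK label, then the NAAN
    r"/(?P<name>[^/.]+)"               # the NAA-assigned Name
    r"(?P<qualifier>[/.].*)?$"         # optional Qualifier
)


def parse_ark(identifier):
    """Return a dict with nma, naan, name and qualifier (None when absent)."""
    match = ARK_PATTERN.match(identifier)
    if match is None:
        raise ValueError(f"not an ARK: {identifier!r}")
    return match.groupdict()


example = parse_ark("http://example.org/ark:/12025/654xz321/s3/f8.05v.tiff")
local = parse_ark("ark:/67531/metapth123")
```

Because the NMA part is optional, the same function accepts both the full URL form and the bare ark:/67531/... form.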
We mint Names for our ARKs locally with a home-grown system called a “Number Server”. This Python Web service receives a request for a new number, assigns that number a prefix based on which instance we pull from, and returns the new Name.
Namespaces
We have four different namespaces that we use for minting identifiers: metapth, metadc, metarkv, and coda. Additionally we have a metatest namespace, which we use when we need to test things out, but it isn’t used that often. Finally, we have a historic namespace, metacrs, that is no longer used. Here is the breakdown of how we use these namespaces.
We try to assign all items that end up on The Portal to Texas History with Names from the metapth namespace whenever possible. We assign all other public facing digital objects the metadc namespace. This means that the UNT Digital Library and The Gateway to Oklahoma History both share Names from the metadc namespace. The metarkv namespace is used for “archive only” objects that go directly into our archival repository system; these include large Web archiving datasets. The coda namespace is used within our archival repository called Coda. As was stated earlier, the metatest namespace is only used for testing and these items are thrown away after processing.
Name assignment
We assign Names in our systems programmatically; this is always done as part of our digital item ingest process. We tend to process items in batches: most often we process several hundred items at a time, and sometimes several thousand. Items are processed in parallel, so there is no logical order to how the Names are assigned to objects. They are in the order that they were processed but may have no logical order past that.
We also don’t assume that our Names are continuous. If you have the identifiers metapth123 and metapth125, we don’t assume that there is an item metapth124; it may be there, but it also may never have been assigned. When we first started with these systems we would get worked up if we assigned several hundred or a few thousand identifiers and then had to delete those items. Now this isn’t an issue at all, but that took some time to get over.
Another assumption that can’t be made in our systems is about adjacency: if Newspaper Vol. 1 Issue 2 has an identifier of metapth333, there is no guarantee that Newspaper Vol. 1 Issue 3 will have metapth334. It might, but it isn’t guaranteed. Items can also be shared between systems, and membership in the Portal, the UNT Digital Library, or the Gateway is noted in the descriptive metadata. Therefore you can’t say all metapth* identifiers are Portal or all metadc* identifiers are not the Portal; you have to look them up based on the metadata.
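The minting behavior described above (a short prefix plus a number, handed out to workers running in parallel) can be sketched in a few lines. This is not the code of our Number Server; the class NameMinter, its methods, and the in-memory counter are assumptions for illustration only. A real service would also need to persist the counter so that numbers survive a restart.

```python
import itertools
import threading


class NameMinter:
    """Hypothetical in-memory minter: one instance per namespace prefix."""

    def __init__(self, prefix, start=1):
        self.prefix = prefix
        self._counter = itertools.count(start)  # only ever moves forward
        self._lock = threading.Lock()           # parallel workers share one counter

    def mint(self):
        """Return the next Name, for example 'metatest1'."""
        with self._lock:
            number = next(self._counter)
        return f"{self.prefix}{number}"


def to_ark(naan, name):
    """Combine a NAAN and a minted Name into an ARK string."""
    return f"ark:/{naan}/{name}"


minter = NameMinter("metatest")
first = minter.mint()
second = minter.mint()
ark = to_ark("67531", first)
```

Because a worker can fail after minting, gaps in the sequence are expected, which matches the point above about not assuming the Names are continuous.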
Once a number is assigned it is never assigned again. This sounds like a silly thing to say but it is important to remember: we don’t try to save identifiers, or reuse them as if we will run out of them.
Level of assignment
We currently assign an ARK identifier at the level of the intellectual object. So for example, a newspaper issue gets an ARK, a photograph gets an ARK, and a book, a map, a report, an audio recording, or a video recording each gets an ARK. The sub-parts of an item are not given further unique identifiers because the way that we tend to interface with them is in the form of formatted URLs such as those described here or from other URL based patterns such as the URLs we use to retrieve items from Coda.
http://coda.library.unt.edu/bag/ark:/67531/codanaf8/manifest-md5.txt
http://coda.library.unt.edu/bag/ark:/67531/codanaf8/coda_directives.py
http://coda.library.unt.edu/bag/ark:/67531/codanaf8/bagit.txt
http://coda.library.unt.edu/bag/ark:/67531/codanaf8/bag-info.txt
http://coda.library.unt.edu/bag/ark:/67531/codanaf8/0=untl_aip_1.0
http://coda.library.unt.edu/bag/ark:/67531/codanaf8/data/data/01_data/queries.xlsx
http://coda.library.unt.edu/bag/ark:/67531/codanaf8/data/data/01_data/README.txt
http://coda.library.unt.edu/bag/ark:/67531/codanaf8/data/metadata.xml
http://coda.library.unt.edu/bag/ark:/67531/codanaf8/data/metadata/ba3ce7a1-0e3b-44cb-8b41-5d9d1b0438fe.jhove.xml
http://coda.library.unt.edu/bag/ark:/67531/codanaf8/data/metadata/7fe68777-54a2-4c71-95b2-aa33204ae84b.jhove.xml
http://coda.library.unt.edu/bag/ark:/67531/codanaf8/data/metadc498968.aip.mets.xml
Lessons Learned
Things I would do again.
- I would most likely use just an incrementing counter for assigning identifiers. Name minters such as Noid are also an option but I like the numbers with a short prefix.
- I would not use a prefix such as UNT, to stay away from branding as much as possible. Even metapth is way too branded (see below).
- I would only have one namespace for non-archival items. Two namespaces for production data just invite someone to screw up (usually me) and then suddenly the reason for having one namespace over the other is meaningless. Just manage one namespace and move on.
- I would not have a six or seven character prefix. metapth and metadc came as baggage from our first system; we decided that the 30k identifiers we had already minted had set our path. Now, after 1,077,975 identifiers in those namespaces, it seems a little silly that the first 3% of our items would still have such an effect on us today.
- I would not brand our namespaces so closely to our system names, such as metapth, metadc, and the legacy metacrs; people read too much into the naming convention. This is a big reason for opaque Names in the first place, and is pretty important.
- I would probably pad my identifiers out to eight digits. While you can’t rely on the ARKs to be generated in a given order, once they are assigned it is helpful to be able to sort by them and have a consistent order. metapth1, metapth100, metapth100000 don’t always sort nicely like metapth00000001, metapth00000100, metapth00100000 do. But then again, longer runs of zeros are harder to transcribe, and I had a tough time just writing this example. Maybe I wouldn’t do this.
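The sorting problem in the last item can be worked around without padding. The sketch below is illustrative only: the helper names name_sort_key and pad_name are made up, and both assume a Name is a lowercase prefix followed by digits, which is not true of every Name we mint. A plain string sort compares character by character, so a key that splits out the number as an integer is needed.

```python
import re

# Assumption: a Name is a run of lowercase letters followed by a run of digits.
NAME_PATTERN = re.compile(r"([a-z]+)(\d+)")


def split_name(name):
    """Split 'metapth100' into ('metapth', 100)."""
    match = NAME_PATTERN.fullmatch(name)
    if match is None:
        raise ValueError(f"unexpected Name format: {name!r}")
    prefix, digits = match.groups()
    return prefix, int(digits)


def name_sort_key(name):
    """Sort key that orders by prefix, then numerically by the counter."""
    return split_name(name)


def pad_name(name, width=8):
    """Return the Name with its number zero-padded to the given width."""
    prefix, number = split_name(name)
    return f"{prefix}{number:0{width}d}"


names = ["metapth9", "metapth100", "metapth10"]
plain_order = sorted(names)                       # character-by-character order
numeric_order = sorted(names, key=name_sort_key)  # order by the number itself
padded = [pad_name(n) for n in numeric_order]
```

Sorting with a key keeps the stored identifiers short and easy to transcribe, while padding makes any plain sort behave, so the trade-off mentioned above still applies.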
I don’t think any of this post applies only to ARK identifiers as most identifier schemes at some level have to have a decision made about how you are going to mint unique names for things. So hopefully this is useful to others.
If you have any specific questions for me let me know on twitter.
Last updated February 7, 2015. Created by Peter Murray on February 7, 2015.
From the release announcement
I'm pleased to announce the release of Hydra 9.0.0! This is the first release of the Hydra gem for Fedora 4 and represents almost a year of effort. In addition to working with Fedora 4, Hydra 9 includes many improvements and bug fixes. Especially notable is the ability to add RDF properties on repository objects themselves (no need for datastreams) and large-file streaming support.
Semantic enrichment is an active area of development for many publishers. Our enrichment processes are based on the use of different Knowledge Models (e.g., an ontology or thesaurus) which provide the terms required to describe different subject disciplines.
The CrossRef Taxonomy Interest Group is a collaboration among publishers, and sponsored by CrossRef, to share the Knowledge Models they are using, creating opportunities for standardization, collaboration and interoperability. Please join the webinar to get an introduction to the work this group is doing, use cases for the information collected and learn how your organization can contribute to the project.
Christian Kohl - Director Information and Publishing Technology, De Gruyter
Graham McCann - Head of Content and Platform Management, IOP Publishing
The webinar will take place on Tuesday, March 3rd at 11 am ET.
- "https.port=$HTTP_PORT" \
- "$JETTY_BASE/etc/jetty-ssl.xml" \
Updated February 3, 2015
Academy of Medical and Health Research
Agrivita, Journal of Agricultural Science (AJAS)
Eurasian Scientific and Industrial Chamber, Ltd.
Hitte Journal of Science and Education
Institute of Mathematical Problems of Biology of RAS (IMPB RAS)
MIM Research Group
Tomsk State University
Universitas Pendidikan Indonesia (UPI)
Amasya Universitesi Egitim Fakultesi Dergisi
Hikmet Yurdu Dusunce-Yorum Sosyal Bilimler Arastirma Dergisi
Necatibey Faculty of Education Electronics Journal of Science and Mathematics Education
Optimum Journal of Economics and Management Sciences
Last update January 26, 2015
Escola Bahiana de Medicine e Saude Publica
Escola Superior de Educacao de Paula Frassinetti
Lundh Research Foundation
ABRACICON: Academia Brasileira de Ciencias Contabeis
Canakkale Arastirmalari Turk Yilligi
Chinese Journal of Plant Ecology
Eskisehir Osmangazi University Journal of Social Sciences
Geological Society of India
Instituto do Zootecnia
Journal of Social Studies Education Research
Journal Press India
Kahramanmaras Sutcu Imam Universitesi Tip Fakultesi Dergisi
Nitte Management Review
Sanat Tasarim Dergisi
Sociedade Brasileira de Virologia
The Apicultural Society of Korea
The East Asian Society of Dietary Life
The Korea Society of Aesthetics and Science of Art
Turkish History Education Journal
Updated February 3, 2015
Total no. participating publishers & societies: 5772
Total no. voting members: 3058
% of non-profit publishers: 57%
Total no. participating libraries: 1926
No. journals covered: 37,687
No. DOIs registered to date: 72,062,095
No. DOIs deposited in previous month: 471,657
No. DOIs retrieved (matched references) in previous month: 41,726,414
DOI resolutions (end-user clicks) in previous month: 134,057,984
We're all pretty excited about catching up with everyone at Code4Lib in Portland, Oregon next week. Karen Coombs, George Campbell and I will be going, along with Bruce Washburn and a couple of our other OCLC colleagues. Stop us and fill us in on what's new with you - we're anxious to hear about the projects you've been working on and what you'll be doing next. Or ask us about Developer House, our API Explorer, or whatever you'd like to know about OCLC Web services.
Last week, U.S. Senator Jack Reed (D-RI) joined Senate Appropriations Committee Chairman Thad Cochran (R-MS) in introducing the SKILLS Act (S.312). Key improvements to the program include expanding professional development to include digital literacy, reading and writing instruction across all grade levels; focusing on coordination and shared planning time between teachers and librarians; and ensuring that books and materials are appropriate for students with special learning needs, including English learners.
The legislation would expand federal investment in school libraries so they can continue to offer students the tools they need to develop the critical thinking, digital, and research skills necessary for success in the twenty-first century.
“Effective school library programs are essential for educational success. Multiple education and library studies have produced clear evidence that school libraries staffed by qualified librarians have a positive impact on student academic achievement. Knowing how to find and use information are essential skills for college, careers, and life in general,” said Senator Reed, a member of the Senate Appropriations Committee, in a statement.
“Absent a clear federal investment, the libraries in some school districts will languish with outdated materials and technology, or cease to exist at all, cutting students off from a vital information hub that connects them to the tools they need to develop the critical thinking and research skills necessary for success,” Senator Reed continued. “This is a true equity issue, which is why I will continue to fight to sustain our federal investment in this area and why renewing and strengthening the school library program is so critical.”
“School libraries should be an integral part of our educational system,” said Chairman Cochran. “This bipartisan legislation is intended to ensure that school libraries are better equipped to offer students the reading, research and digital skills resources they need to succeed.”
The bipartisan SKILLS Act would further amend the Elementary and Secondary Education Act by requiring state and school district plans to address the development of effective school library programs to help students gain digital literacy skills, master the knowledge and skills in the challenging academic content standards adopted by the state, and graduate from high school ready for college and careers. Additionally, the legislation would broaden the focus of training, professional development and recruitment activities to include school librarians.
The American Library Association (ALA) last week sent comments (pdf) to the U.S. Senate Committee on Health, Education, Labor, and Pensions (HELP) Chairman Sen. Lamar Alexander and member Sen. Patty Murray on the discussion draft to reauthorize the Elementary and Secondary Education Act.
The post Sens. Reed and Cochran introduce school library bill appeared first on District Dispatch.
Library of Congress: The Signal: Conservation Documentation Metadata at MoMA – An NDSR Project Update
The following is a guest post by Peggy Griesinger, National Digital Stewardship Resident at the Museum of Modern Art.
As the National Digital Stewardship Resident at the Museum of Modern Art I have had the opportunity to work with MoMA’s newly launched digital repository for time-based media. Specifically, I have been tasked with updating and standardizing the Media Conservation department’s documentation practices. Their documentation needs are somewhat unique in the museum world, as they work with time-based media artworks that are transferred from physical formats such as VHS and U-matic tape to a variety of digital formats, each encoded in different ways. Recording these processes of digitization and migration is a huge concern for media conservators in order to ensure that the digital objects they store are authentic representations of the original works they processed.
It is my job to find a way of recording this information that adheres to standards and can be leveraged for indexing, searching and browsing. The main goal of this project is to integrate the metadata into the faceted browsing system that already exists in the repository. This would mean that, for example, a user could narrow down a results set to all artworks digitized using a particular make and model of a playback device. This would be hugely helpful in the event that an error were discovered with that playback device, making all objects digitized using it potentially invalid. We need the “process history metadata” (which records the technical details of tools used in the digitization or migration of digital objects) to be easily accessible and dynamic so that the conservators can make use of it in innovative and viable ways.
The first phase of this project involved doing in-depth research into existing standards that might be able to solve our documentation needs. Specifically, I needed to find a standardized way to describe – in technical detail – the process of digitizing and migrating various iterations of a time-based media work, or what we call the process history of an object. This work was complicated by the fact that I had little technical knowledge of time-based media. This meant that I not only had to research and understand a variety of metadata standards but I also had to simultaneously learn the technical language being used to express them.
Fortunately, my education in audiovisual technology developed naturally through my extensive interviews and collaborations with the media conservators at MoMA. In order to decide upon a metadata standard to use, I needed to learn very specifically the type of information the conservators wanted to express with this metadata, and how that information would be most effectively structured. This involved choosing artworks from the collection and going over, in great detail, how these objects were assessed, processed, and, if necessary, digitized. After selecting a few standards (namely PBCore, PREMIS, and reVTMD) I thought were worth pursuing in detail, I mapped this information into XML to see if the standards could, in fact, adequately express the information.
Before making a final decision on which standard or combination of standards to use, I organized a metadata experts meeting to get feedback on my project. The discussion at this meeting was immensely helpful in allowing me to understand my project in the wider scope of the metadata world. I also found it extremely helpful to get feedback from experts in the field who did not have much exposure to the project itself, so that they could catch any potential problems or errors that I might not be able to see from having worked so closely with the material for so long.
One important point that was brought up at the meeting was the need to develop detailed use cases for the process history metadata in the repository. I talked with the media conservators at MoMA to see what intended uses they had for this information. To get an idea of the specific types of uses they foresee for this metadata, we can look at the use case for accessing process history metadata. This seems simple on the surface, but we had a number of questions to answer: How do users navigate to this information? Is it accessed at the artwork level (including all copies and versions of an artwork) or at the file level? How is it displayed? Is every element displayed, or only select elements? Where is this information situated in our current system? The discussions I had with the media conservators and our digital repository manager allowed us to answer these questions and create clear and concise use cases.
Developing use cases was simplified by two things:
1) we already had a custom-designed digital repository into which this metadata would be ingested and
2) we had a very clear idea of the structure and content of this metadata.
This meant we were very aware of what we had to work with, and what our potential limitations were. It was therefore easy for us to know which use cases would be simple fixes and which would require developing entirely new functionalities and user interfaces in the repository. Because we had a good idea of how simple or complex each use case would be, we could assign levels of desirability to each use case to ensure the most important and achievable use cases were implemented first.
The next step for this project will be to bring these use cases, as well as wireframes we have developed to reflect them, to the company responsible for developing our digital repository system. Through conversation with them we will begin the process of integrating process history metadata into the existing repository system.
As I pass the halfway point of my residency, I can look back on the work I have done with pride and look forward to the work still to come with excitement. I cannot wait to see this metadata fully implemented into MoMA’s time-based media digital repository as a dynamic resource for conservators to use and explore. Hopefully the tools we are in the process of creating will be useful to other institutions looking to make their documentation more accessible and interactive.
This semester, I have the exciting opportunity to work as an intern among the hum of computers and maze of cubicles at Indiana University’s Digital Library Program! My main projects include migrating two existing digital collections from TEI P4 to TEI P5 using XSLT. If you are familiar with XML and TEI, feel free to skim a bit! Otherwise, I’ve included short explanations of each and links to follow for more information.
Texts for digital archives and libraries are frequently marked up in a language called eXtensible Markup Language (XML), which looks and acts similarly to HTML. Marking up the texts allows them to be human- and machine-readable, displayed, and searched in different ways than if they were simply plain text.
The Text Encoding Initiative (TEI) Consortium “develops and maintains a standard for the representation of texts in digital form” (i.e. guidelines). Basically, if you wanted to encode a poem in XML, you would follow the TEI guidelines to mark up each line, stanza, etc. in order to make it machine-readable and cohesive with the collection and standard. In 2007, the TEI consortium unveiled an updated form of TEI called TEI P5, to replace the older P4 version.
However, many digital collections still operate under the TEI P4 guidelines and must be migrated over to P5 moving forward. Here is where XSLT and I come in.
eXtensible Stylesheet Language (XSL) Transformations are used to convert an XML document to another text document, such as (new) XML, HTML or text. In my case, I’m migrating from one type of XML document to another type of XML document, and the tool in between, making it happen, is XSLT.
Many utilize custom XSLT to transform an XML representation of a text into HTML to be displayed on a webpage. The process is similar to using CSS to transform basic HTML into a stylized webpage. When working with digital collections, or even moving from XML to PDF, XSLT is an invaluable tool to have handy. Learning it can be a bit of an undertaking, though, especially adding to an already full work week.
I have free time, sign me up!
Here are some helpful tips I have been given (and discovered) in the month I’ve been learning XSLT to get you started:
- Register for a tutorial.
Lynda.com, YouTube, and Oracle provide tutorials to get your feet wet and see what XSLT actually looks like. Before registering for anything with a price, first see if your institution offers free tutorials. Indiana University offers an IT Training Workshop on XSLT each semester.
- Keep W3Schools bookmarked.
Their XSLT page acts as a self-guided tutorial, providing examples, function lists, and function implementations. I access it nearly every day because it is clear and concise, especially for beginners.
- Google is your best friend.
If you don’t know how to do something, Google it! Odds are someone before you didn’t have your exact problem, but they did have one like it. Looking over another’s code on StackOverflow can give you hints to new functions and expose you to more use possibilities. This goes for learning every coding and markup language!
- Create or obtain a set of XML documents and practice!
A helpful aspect of using Oxygen Editor (the most common software used to encode in XML) for your transformations is that you can see the results instantly, or at least see your errors. If you have one or more XML documents, figure out how to transform them to HTML and view them in your browser. If you need to go from XML to XML, create a document with recipes and simply change the tags. The more you work with XSLT, the simpler it becomes, and you will feel confident moving on to larger projects.
- Find a guru at your institution.
Nick Homenda, Digital Projects Librarian, is mine at IU. For my internship, he has built a series of increasingly difficult exercises, where I can dabble in and get accustomed to XSLT before creating the migration documents. When I feel like I’m spinning in circles, he usually explains a simpler way to get the desired result. Google is an unmatched resource for lines of code, but sometimes talking it out can make learning less intimidating.
Note: If textbooks are more your style, Mastering XSLT by Chuck White lays a solid foundation for the language. This is a great resource for users who already know how to program, especially in Java and the C varieties. White makes many comparisons between them, which can help strengthen understanding.
If you have found another helpful resource for learning and applying XSLT, especially an online practice site, please share it! Tell us about projects you have done utilizing XSLT at your institution!
This is a cross-post from the Open Knowledge Switzerland blog, see the original here.
It has been a big year for us in Switzerland. An openness culture spreading among civil administration, NGOs, SMEs, backed by the efforts of makers, supporters and activists throughout the country, has seen the projects initiated over the past three years go from strength to strength – and establish open data in the public eye.
Here are the highlights of what is keeping us busy – and information on how you can get involved in helping us drive Open Knowledge forward, no matter where you are based. Check out our Storify recap, or German- and French-language blogs for further coverage.
To see the Events Calendar for 2015, scroll on down.
2014 in review
#sports
Our hackdays went global, with Milan joining Basel and Sierre for a weekend of team spirit and data wrangling. The projects which resulted ranged from the highly applicable to the ludicrously inventive, and led us to demand better from elite sport. The event was a starting point for the Open Knowledge Sports Working Group, aiming to “build bridges between sport experts and data scientists, coaches and communities”. We’re right behind you, Rowland Jack!
#international
The international highlight of the year was a chance for a sizeable group of our members to meet, interact and make stuff with the Open Knowledge community at OK Festival Berlin. Unforgettable! Later in the year, the Global Open Data Index got journalists knocking on our doorstep. However, the recently opened timetable data is not as open as some would like to think – leading us to continue making useful apps with our own open Transport API, and the issuing of a statement in Errata.
#community
The yearly Opendata.ch conference attracted yet again a big crowd of participants to hear talks, participate in hands-on workshops, and launch exciting projects (e.g. Lobbywatch). We got some fantastic press in the media, with the public encouraged to think of the mountains of data as a national treasure. At our annual association meeting we welcomed three new Directors, and tightened up with the Wikimedia community inviting us to develop open data together.
#science
CERN’s launch of an open data portal made headlines around the world. We were excited and more than a little dazzled by what we found when we dug in – and could hardly imagine a better boost for the upcoming initiative OpenResearchData.ch. Improving data access and research transparency is, indeed, the future of science. Swiss public institutions like the National Science Foundation are taking note, and together we are making a stand to make sure scientific knowledge stays open and accessible on the Internet we designed for it.
#politics
Swiss openness in politics was waymarked in 2014 with a motion regarding Open Procurement Data passing through parliament, legal provisions for opening weather data, the City of Zürich and Canton of St.Gallen voting in commitments to transparency, and fresh support for accountability and open principles throughout the country. This means more work and new responsibility for people in our movement to get out there and answer tough questions. The encouragement and leadership on an international level is helping us enormously to work towards national data transparency, step by step.
#government
The Swiss Open Government Data Portal launched at OKCon 2013 has 1’850 datasets published on it as of January 2015, now including data from cantons and communes as well as the federal government. New portals are opening up on a cantonal and city level, more people are working on related projects and using the data in their applications to interact with government. With Open Government Data Strategy confirmed by the Swiss Federal Council in April, and established as one of the six priorities of the federal E-Government action plan, the project is only bound to pick up more steam in the years ahead.
#finance
With Open Budget visualisations now deployed for the canton of Berne and six municipalities – including the City of Zurich, which has officially joined our association – the finance interest group is quickly proving that it’s not all talk. Spending data remains a big challenge, and we look forward to continuing the fight for financial transparency. This cause is being boosted by interest and support from the next generation, such as the 29 student teams participating in a recent Open Data Management and Visualization course at the University of Berne.
#apps/#apis
We may be fast, but our community is faster. Many new open data apps and APIs have been released and enhanced by our community, such as WindUndWetter.ch and SwissMetNet API, based on just-opened national weather data resulting from a partial revision of the Federal Act on Meteorology and Climatology. Talk about “hold your horses”: a city waste removal schedule app led to intense debate with officials over open data policy, the results making waves in the press and open data developers leading by doing.
#culture
An OpenGLAM Working Group started over the summer, and quickly formed into a dedicated organising committee of our first hackathon in the new year. Towards this at least a dozen Swiss heritage institutions are providing content, data, and expertise. We look forward to international participants virtually and on-location, and your open culture data!
What’s coming up in 2015
Even if we do half the things we did in ‘14, a big year is in store for our association. Chances are that it will be even bigger: this is the year when the elections of the Federal Council are happening for the first time since our founding. It is an important opportunity to put open data in the spotlight of public service. And we are going to be busy running multiple flagship projects at the same time in all the areas mentioned.
Here are the main events coming up – we will try to update this as new dates come in, but give us a shout if we are missing something:
- 21. January: Open Finance and Participatory Budgeting, Bern
- 3. February: FlashHack with OpenCorporates, Zurich
- 6. February: Data Canvas Visualization Challenge, Lift15, Geneva
- 21. February: International Open Data Day
- 27. & 28. February: Open Cultural Data Hackathon, Bern
- 05. & 06. June: Open Research Data Hackdays, Lausanne & Basel
- 01. July: Opendata.ch Conference 2015, Bern
- 04. & 05. September: Election Hackdays 2015, Lausanne & Zurich
So, happy new year! We hope you are resolved to make more of open data in 2015. The hardest part may be taking the first step, and we are here for sport and support.
There is lots going on, and the easiest way to get started is to take part in one of the events. Start with your own neighbourhood: what kind of data would you like to have about your town? What decisions are you making that could benefit from having a first-hand, statistically significant, visually impressive, and above all, honest and critical look at the issue?
Lots is happening online and offline, and if you express interest in a topic you’re passionate about, people are generally quick to respond with invitations and links. To stay on top of things we urge you to join our mailing list, follow us on social media, and check out the maker wiki and forum. Find something you are passionate about, and jump right in! Reach out if you have any questions or comments.
I thought I might take a break to post an amusing photo of something I wrote out today:
The photo is a first draft of a database schema for a revamp of how LibraryThing will do library subjects. All told, it has 26 tables. Gulp.
About eight of the tables do what a good cataloging system would do:
- Distinguishes the various subject systems (LCSH, Medical Subjects, etc.)
- Preserves the semantic richness of subject cataloging, including the stuff that never makes it into library systems.
- Breaks subjects into their facets (e.g., “Man-woman relationships — Fiction” has two subject facets).
Most of the tables, however, satisfy LibraryThing’s unusual core commitments: to let users do their own thing, like their own little library, but also to let them benefit from and participate in the data and contributions of others.(1) So it:
- Links to subjects from various “levels,” including book-level, edition-level, ISBN-level and work-level.
- Allows members to use their own data, or “inherit” subjects from other levels.
- Allows for members to “play librarian,” improving good data and suppressing bad data.(2)
- Allows for real-time, fully reversible aliasing of subjects and subject facets.
The last is perhaps the hardest. Nine years ago (!) I compared LibraryThing to the “Ship of Theseus,” a ship which is “preserved” although its components are continually changed. The same goes for much of its data, although “shifting sands” might be a better analogy. Accounting for this makes for some interesting database structures, and interesting programming. Not every system at LibraryThing does this perfectly. But I hope this structure will help us do that better for subjects.(3)
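To make the inheritance and aliasing ideas a little more concrete, here is a minimal Python sketch. It is not LibraryThing's schema or code: the class `SubjectAliases`, the function `subjects_for`, the order of the levels and the sample subjects are all assumptions made for illustration. The point it tries to show is that an alias can be stored as its own row and looked up at read time, so removing the row restores the original subject without touching any book data.

```python
# Hypothetical sketch only; names and level order are assumptions, not LibraryThing's design.

LEVELS = ("book", "edition", "isbn", "work")  # assumed order, most specific first


class SubjectAliases:
    """Aliases kept as separate rows, so every alias can be undone."""

    def __init__(self):
        self._alias_of = {}  # aliased subject -> subject it points to

    def alias(self, subject, canonical):
        # Walk the chain from `canonical`; refuse an alias that would loop back.
        node = canonical
        while True:
            if node == subject:
                raise ValueError("alias would create a cycle")
            if node not in self._alias_of:
                break
            node = self._alias_of[node]
        self._alias_of[subject] = canonical

    def unalias(self, subject):
        # Reversal is just deleting the alias row; nothing else was rewritten.
        self._alias_of.pop(subject, None)

    def resolve(self, subject):
        # Follow aliases at read time until a non-aliased subject is reached.
        while subject in self._alias_of:
            subject = self._alias_of[subject]
        return subject


def subjects_for(records, aliases):
    """Use the most specific level that has subjects, inheriting otherwise."""
    for level in LEVELS:
        if records.get(level):
            return [aliases.resolve(s) for s in records[level]]
    return []


aliases = SubjectAliases()
aliases.alias("Cookery", "Cooking")
records = {"book": [], "work": ["Cookery"]}  # no book-level data, so inherit from the work
print(subjects_for(records, aliases))
aliases.unalias("Cookery")
print(subjects_for(records, aliases))
```

Resolving aliases on read rather than rewriting stored subjects is only one possible design; it trades a little lookup work for the ability to reverse any alias cleanly.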
Weird as all this is, I think it’s the way things are going. At present most libraries maintain their own data, which, while generally copied from another library, is fundamentally siloed. Like an evolving species, library records descend from each other; they aren’t dynamically linked. The data inside the records are siloed as well, trapped in a non-relational model. The profession that invented metadata, and indeed invented sharing metadata, is, at least as far as its catalogs go, far behind.
Eventually that will end. It may end in a “Library Goodreads,” every library sharing the same data, with global changes possible, but reserved for special catalogers. But my bet is on a more LibraryThing-like future, where library systems will both respect local cataloging choices and, if they like, benefit instantly from improvements made elsewhere in the system.
When that future arrives, we got the schema!
1. I’m betting another ten tables are added before the system is complete.
2. The system doesn’t presume whether changes will be made unilaterally, or voted on. Voting, like much else, exists in a separate system, even if it ends up looking like part of the subject system.
3. This is a long-term project. Our first steps are much more modest–the tables have an order-of-use, not shown. First off we’re going to duplicate the current system, but with appropriate character sets and segmentation by thesaurus and language.
I think there are two keys to why I was a successful electrical engineer, when I did not (initially) succeed as a computer scientist, despite being more interested in the latter, to begin with, and despite wanting to pursue the latter now.
The first key: invisible struggle, no displays of fallibility
I went to the University of Virginia as an undergrad. I transferred into the Engineering School a year in, which put me approximately one semester behind my peers. I chose Electrical Engineering (EE) instead of Computer Science (CS), even though it was a CS major who convinced me to switch. You see, I fell for a lot of the misconceptions laid out in Unlocking the Clubhouse: despite evidence to the contrary (I earned a high enough grade in the class to be hired as a teaching assistant for CS 101 in my second semester of college), I didn’t believe I could compete* with people who had been programming for their whole lives; and I vastly over-estimated how many people really fell into that bucket.
Also, because nobody told me that programming is hard for everyone when they start, I didn’t think CS was a field where I could be successful. I didn’t see everyone around me struggling, the way I did in my first EE class (which, to be fair, was pretty hellish).
I think that points to an important difference between my EE and CS education: I saw other EEs fail as often as I failed. Although it sounds that way, this isn’t me being modest, or feeling like an impostor, or anything else; I worked very hard and did very well. But I also know that a large part of my success comes from my peers and me taking time outside of class to teach each other the things the faculty didn’t see fit to impart; the homework assignments were too hard for us to do otherwise, because we were (deliberately?) not taught how to solve our homework problems in class. This is a common experience in engineering and, apparently, in physics — this Medium article does a fantastic job of explaining what’s wrong with teachers refusing to teach (though it comes with a trigger warning: it was written in the aftermath of a professor sexually harassing students).
So, one key to how my EE experience differs from CS is that I got to see my peers struggling, and it got me past my initial concern that they had all been tearing apart VCRs and putting them back together since the age of 10. (It was a very specific fear: I remember, it was VCRs, specifically, not watches or robots or anything else. Perhaps that points to a lack of imagination on my part.)
In CS, all of the assigned work was individual, and the focus on the school’s Honor Code meant that we were afraid to work together. I saw other CS students in the computer lab, but I didn’t know they were struggling as hard as I was. Even after working as a TA and helping people through their struggles, it took me more than a decade to internalize the fact that CS, like most things, is hard for beginners.
So, key one: In CS I kept believing the “everyone has been programming forever” lie, combined with the “I am not naturally good at this, and other people are” lie. In EE it was actively disproven, pretty much immediately.
The second key: starting with ‘hello world’
But there was one other key to my success as an electrical engineering student: I took the “intro to EE for non EEs” course that they were piloting at the time—even though, unfortunately (for them), most of my colleagues didn’t join me in taking it. In that class we got an introduction to the broader field, with short descriptions of the various sub-fields of EE and beginner-level introductions to concepts we would later be taking in-depth classes on. The portion of the class dealing with information theory and signal processing gave me the background to understand several really difficult subjects when they were introduced (poorly) in 300-level classes, and that confidence (bolstered by the experience of explaining it to some of my peers) ultimately led me to double-specialize in “Communications” (by which I mean wireless engineering, signal processing, etc.), along with “Computers/Digital” (processor and chip design, etc.).
I would probably not have become a wireless engineer without that experience.
CS, on the other hand, had nothing like that. CS 101 was “Hey, here’s how you program really simple stuff in C++. Also, ignore half of what you’re typing.” It wasn’t “Here are the sub-fields of computer science,” or “Here are introductory-level explanations of some of the important stuff we’ll talk about later,” both of which would have been better.
CS 101 should be an introduction to the field of computer science and computer programming, not a first programming course. It should consist of a little Boolean logic, maybe some control flow (i.e. loops), and some basic information about data structures; then, “here’s what an algorithm is”; then, some high-level information about computer networks; then, maybe slip in something about software testing and/or version control; and, finally, it should definitely include an exploration of the differences between web programming, DevOps, middleware, and math-heavy CS research. Not only would that class help people understand the field and how they might like to be part of it; it would also improve interview questions, later on. (Seriously, front-end developers don’t need to know how to implement QuickSort!)
There are lots of important changes we should be making to the way CS is taught, but when we’re looking at how to find and retain students for a four-year major, I think adding a high-level class before beginning programming would help tremendously. It’s certainly better than the then-popular (and, I sincerely hope, now-outmoded) practice of making the second programming course into a “weeding” class—a course so hard that half the students quit or fail, then change majors. And I think that, in the process of designing the intro course’s curriculum, the CS faculty might find themselves rethinking the whole major. So, yes, you could say I’m proposing a band-aid, and I agree; but it might also be a first step to structural change.
*In an environment where grades are issued on a curve, education is a competition. Assignments and tests were so hard at UVA’s engineering school that one time I got 38% on a midterm, and that translated to an A. (back)
John Miedema: Lila is cognitive writing technology built on top of software like Evernote. Key differences.
Writers everywhere benefit from content management software like Evernote. Evernote can collect data from multiple devices and locations and organize it into a single writing repository. Evernote is beautiful software. For the last few years, I have been using Google Drive to collect notes. Recently I tried Evernote again, and I am impressed enough to switch. Notebooks, tags, collaboration, web clipping, related searches. All very nice.
Lila is cognitive writing technology built on top of software like Evernote. Here are some key differences between the products:
1. Evernote users read long-form content manually, decide if it is relevant, and then write notes to integrate it into their project. Lila will pre-read content for users and embed relevant notes (slips) in the context of the user’s writing. This will save the writer lots of reading and evaluation time.
2. Evernote users get “related searches” from a very limited number of web sources. Lila will perform open web searches for related content.
3. Evernote users can visualize a limited number of connections between notes. I am yet to get any utility out of this. Lila will use natural language search to generate a vast number of connections between notes, allowing a user to quickly understand complex relationships between notes.
4. Evernote users can use tags to construct a hierarchical organization of content. Notebooks can only have one sub-level of categorization, essentially chapters, but many writers need additional levels of classification. Tags can be ordered hierarchically and if you prefix them with a number they will sort in a linear order. You can use tags for hierarchical classification but it creates problems.
- If you want both categories and tags, you will have to use a naming convention to split tags into two types.
- Numbering tags causes them to lose type-ahead look-up functionality, i.e., you have to start by typing the number. This is a problem because the numbers can be expected to change often.
- If you decide to insert a category in the middle of two tags, you have to manually re-number all the tags below.
- Tags are shared between Notebooks. Maybe that works for tags? Not for hierarchical sectioning of a single work.
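The renumbering problem is easy to see in a small Python sketch. This is an illustration only, not anything Evernote provides: the function `insert_category` and the "NN name" tag format are assumptions. It inserts one category into a list of number-prefixed tags and counts how many existing tags would have to be renamed by hand.

```python
# Hypothetical sketch; the "NN name" tag convention is an assumed naming scheme.

def insert_category(tags, position, name):
    """Insert `name` at `position` among number-prefixed tags.

    Returns the renumbered tag list and the count of old tags that changed.
    """
    names = [t.split(" ", 1)[1] for t in tags]  # strip the old numeric prefixes
    names.insert(position, name)
    renumbered = [f"{i:02d} {n}" for i, n in enumerate(names, start=1)]
    renamed = sum(1 for t in tags if t not in renumbered)
    return renumbered, renamed


tags = ["01 Introduction", "02 Methods", "03 Results", "04 Conclusion"]
new_tags, renamed = insert_category(tags, 1, "Background")
print(new_tags)
print(renamed)
```

Every tag after the insertion point gets a new number, which is the manual work described in the list above and also why type-ahead by number is fragile.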
None of these problems are technically insurmountable. I hope Evernote comes out with enhancements soon. I would like to build Lila on top of Evernote. Lila has something to add. To be cognitive means an inherent ability to automate hierarchical classification. Lila will be able to suggest hierarchical views, different ways of understanding the data, different choices for what could be a table of contents.
Last night [some days ago now], the Arbitration Committee for English Wikipedia reached a final decision on the case: https://en.wikipedia.org/wiki/Wikipedia:Arbitration/Requests/Case/GamerGate#Final_decision. The Committee chose to issue one complete site ban for a male editor, citing a pattern of disruptive behavior that included more than 20 lesser sanctions since 2006. No other Wikipedia editors received site-wide bans.
We can confirm that in addition to a single site-wide ban, the Committee issued and endorsed nearly 150 warnings, sanctions, or topic bans to other editors from various sides of the case. We can clarify that of the eleven Committee-issued topic bans, only one was applied to an editor who identifies as female. All of the sanctioned editors have the right to appeal in the future: over the years, the Committee has approved appeals when it found that sanctions were no longer necessary.
Some reporting portrayed this case as a referendum on Gamergate itself, or as a purge of women or feminist voices from Wikipedia. That mischaracterizes the case, the role of the volunteer Arbitration Committee, and the nature of their findings.
The Committee does not consider the content of articles; it focuses only on the behavior of editors. This decision was also not a purge. Only one user has been removed from Wikipedia. Finally, it is not intended as a referendum on Gamergate (what is right, what is wrong, and its place in broader discourse) and should not be understood that way. That discussion may be necessary, but it is better suited for another forum.
Wikipedia is an encyclopedia. It is also the largest free knowledge resource in human history — and it is written by people from all over the world, often from very different backgrounds, who may hold differing points of view. This is made possible thanks to a fundamental principle of mutual respect: respectful discourse, and respect for difference and diversity.
The Wikimedia Foundation offers resources for programs and outreach with our partners across the global Wikimedia movement to engage people who have been underrepresented in traditional encyclopedias. These include women, people of color, people from the Global South, immigrant communities, and members of the LGBTQ community. They are invaluable contributors to our community and partners in our mission.
For Wikipedia to represent the sum of all knowledge, it has to be a place where people can collaborate and disagree constructively even on difficult topics. It has to be a place that is welcoming for all voices. This is essential to ensuring people are free to focus on being creative and constructive, and contributing to this remarkable collective human achievement.
For more on our stance on this issue, please see a blog post we released this week: https://blog.wikimedia.org/2015/01/27/civility-wikipedia-gamergate/.
I am sorry for jumping the gun on this, and I am deeply sorry for mischaracterizing the situation. I hope that this post can help to rectify any damage I may have unwittingly caused. Mea culpa.