Open Knowledge Foundation: Open Knowledge Belgium: Bringing Together Open Communities, Policy Makers & Industry
Open Knowledge Belgium to host The Second Edition of Open Belgium in Namur on Feb 23rd, 2015! Register Today!
On 23 February, Open Knowledge Belgium is hosting the second edition of Open Belgium, an event expected to attract over 200 people, coming together to learn and discuss the growing open knowledge movement in Belgium. This year Open Knowledge Belgium is hosting the conference, together with our Walloon colleagues and partners, at the Palais des Congrès in Namur.
The jam-packed programme is not to be missed! With over 35 speakers, the objective of the day is to unpack challenges, explore opportunities and learn about technological developments as they relate to Open Data and Open Knowledge. The event presents an ideal opportunity to exchange best practices with national and international experts. The conference program includes:
The conference will open with a panel discussion on the state-of-play of open data and open knowledge in Belgium, followed by a series of keynote talks and eight participatory workshops!
A panel discussion on Open data in Belgium, with representatives from the federal and regional governments.
A Series of Keynotes
- Jörgen Gren of DG Connect on the future of Open Data in Europe
- Dimitri Brosens of the Institute of Nature and Forests (INBO) on becoming an open research institute
- Thomas Hermine (Nextride) and Antoine Patris (TEC) on how opening up Walloon public transport data offers new opportunities and economic value.
Eight Participatory Workshops:
Following the keynotes, participants will have the opportunity to participate in eight workshops focused on specific themes and organised by national and international experts.
- Open Transport, from data source to journey planner (moderated by Pieter Colpaert)
- Open Culture, tackling barriers with benefits (Barbara Dierickx)
- Open Tools, using tools to release the full Open Data potential (Philippe Duchesne)
- Open Tourism, the importance of framing the scheme online efforts (Raf Buyle)
- OpenStreetMap, the importance of working with communities (Ben Abelshausen)
- Open Science, going beyond open access (Gwen Franck)
- Local Open Data efforts in Belgium (Wouter Degadt)
- Emerging Open Data business models (Tanguy De LESTRE).
Open Knowledge Belgium will close the day with networking drinks on a rooftop terrace overlooking the city of Namur.
Practical information and registration
- Date and Location: Monday, February 23, 2015 in [Namur Palais des Congrès](http://2015.openbelgium.be/practical/)
- Admission: € 130 – [Register online](http://2015.openbelgium.be/registrations/)
- Contact the organisers: firstname.lastname@example.org
Code4Lib is an annual, volunteer-organized conference focused on the intersection of technology and cultural heritage. DPLA is participating heavily in Code4Lib 2015, taking place on February 9 – 12 in Portland, Oregon. Here’s a handy guide detailing some of the key places they’ll be and how you can connect with them.
- Monday, February 9 (9 AM – noon): Tom Johnson (DPLA Metadata and Platform Architect) will lead a Linked Data Workshop with Karen Estlund (University of Oregon).
- Monday, February 9 (1:30 – 4:30 PM): Tom Johnson, Mark Matienzo (DPLA Director of Technology), Mark Breedlove (DPLA Technology Specialist), Audrey Altman (DPLA Technology Specialist), Gretchen Gueguen (DPLA Data Services Coordinator), and Amy Rudersdorf (DPLA Assistant Director for Content) will lead an introductory workshop on the DPLA API.
- Wednesday, February 11 (4:30 PM): Audrey Altman, Mark Breedlove, and Gretchen Gueguen will present on DPLA’s new ingestion system. The presentation is entitled, “Heiðrún: DPLA’s Metadata Harvesting, Mapping and Enhancement System.”
Beyond these formal opportunities to connect, these folks are eager to chat and answer questions about timely topics including the Community Reps application, DPLAfest 2015, and DPLA’s recent work upgrading its ingestion system.
In addition to staff participation, and in keeping with DPLA’s broader commitment to diversity, DPLA has also supported Code4Lib 2015 by helping to sponsor one of the Code4Lib 2015 Diversity Scholarships as part of the Code4Lib community.
Questions about where specific DPLA staffers will be at Code4Lib 2015? Drop one of us a line!
Backblaze now have over 41K drives ranging from 1.5TB to 6TB spinning. Their data for a year consists of 365 daily tables each with one row for each spinning drive, so there is a lot of it, over 12M records. The 4TB disk generation looks good:
We like every one of the 4 TB drives we bought this year. For the price, you get a lot of storage, and the drive failure rates have been really low. The Seagate Desktop HDD.15 has had the best price, and we have a LOT of them. Over 12 thousand of them. The failure rate is a nice low 2.6% per year. Low price and reliability is good for business.
The HGST drives, while priced a little higher, have an even lower failure rate, at 1.4%. It’s not enough of a difference to be a big factor in our purchasing, but when there’s a good price, we grab some. We have over 12 thousand of these drives.
It's too soon to tell about the 6TB generation:
Currently we have 270 of the Western Digital Red 6 TB drives. The failure rate is 3.1%, but there have been only 3 failures. ... We have just 45 of the Seagate 6 TB SATA 3.5 drives, although more are on order. They’ve only been running a few months, and none have failed so far.
What grabbed all the attention was the 3TB generation:
The HGST Deskstar 5K3000 3 TB drives have proven to be very reliable, but expensive relative to other models (including similar 4 TB drives by HGST). The Western Digital Red 3 TB drives annual failure rate of 7.6% is a bit high but acceptable. The Seagate Barracuda 7200.14 3 TB drives are another story.
Their 1163 Seagate 3TB drives with an average age of 2.2 years had an annual failure rate (AFR) over 40% in 2014. Backblaze's economics mean that they can live with a reasonably high failure rate:
Double the reliability is only worth 1/10th of 1 percent cost increase. ...
Replacing one drive takes about 15 minutes of work. If we have 30,000 drives and 2 percent fail, it takes 150 hours to replace those. In other words, one employee for one month of 8 hour days. Getting the failure rate down to 1 percent means you save 2 weeks of employee salary - maybe $5,000 total? The 30,000 drives costs you $4m.
The $5k/$4m means the Hitachis are worth 1/10th of 1 per cent higher cost to us. ACTUALLY we pay even more than that for them, but not more than a few dollars per drive (maybe 2 or 3 percent more).
Moral of the story: design for failure and buy the cheapest components you can. :-)
40% AFR is really high, but labor to replace the failed drives would still have cost less than $8/drive. The cost isn't the interesting aspect of this story. The drives would have failed at some point anyway, incurring the replacement labor cost. The 40% AFR just meant the labor cost, and the capital cost of new drives, was incurred earlier than expected, reducing the return on the investment in purchasing those drives.
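The arithmetic in the quoted passage, and the sub-$8 labor figure, can be checked with a short sketch. The fleet size, swap time, fleet cost and the "maybe $5,000" saving come from the quote; the per-swap labor cost is derived from those numbers and is an assumption about how such a figure could be reached, not something Backblaze states.

```python
# Back-of-envelope check of the drive replacement economics quoted above.
# Inputs are the figures from the quote; the per-swap labor cost is derived
# from them and is an assumption, not a number Backblaze publishes.

FLEET_SIZE = 30_000          # drives
FLEET_COST = 4_000_000       # dollars
MINUTES_PER_SWAP = 15
SAVING_FROM_HALVING = 5_000  # dollars ("maybe $5,000" for two weeks of salary)


def replacement_hours(drives, afr, minutes_per_swap=MINUTES_PER_SWAP):
    """Hours of labor per year to swap the drives that fail at rate `afr`."""
    return drives * afr * minutes_per_swap / 60


def labor_cost_per_swap():
    """Dollars per swap implied by the quote: going from 2% to 1% AFR
    avoids 1% of the fleet's swaps and saves about $5,000."""
    swaps_avoided = FLEET_SIZE * 0.01
    return SAVING_FROM_HALVING / swaps_avoided


def yearly_labor_per_drive(afr):
    """Expected replacement labor, in dollars, per drive in service per year."""
    return afr * labor_cost_per_swap()


if __name__ == "__main__":
    print(f"hours at 2% AFR: {replacement_hours(FLEET_SIZE, 0.02):.0f}")
    print(f"premium worth paying: {SAVING_FROM_HALVING / FLEET_COST:.3%}")
    print(f"labor per swap: ${labor_cost_per_swap():.2f}")
    print(f"labor per drive-year at 40% AFR: ${yearly_labor_per_drive(0.40):.2f}")
```

Under these assumptions labor comes to roughly $17 per swap, so even a 40% AFR costs under $7 of labor per drive per year, which is consistent with the "less than $8/drive" estimate.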
Alas, there is a long history of high failure rates among particular batches of drives. An experience similar to Backblaze's at Facebook is related here, with an AFR over 60%. My first experience of this was nearly 30 years ago in the early days of Sun Microsystems. Manufacturing defects, software bugs, mishandling by distributors, vibration resonance: there are many causes for these correlated failures. It is the correlated failures that make the interesting connection with the Self-Repairing Disk Arrays paper.
The first thing to note about the paper is that Paris et al are not dealing with Backblaze-scale arrays:
These solutions are not difficult to implement in installations that have trained personnel on site round-the-clock. When this is not the case, disk repairs will have to wait until a technician can service the failed disk. There are two major disadvantages to this solution. First, it introduces an additional delay, which will have a detrimental effect on the reliability of the storage system. Second, the cost of the service call is likely to exceed that of the equipment being replaced.
The first problem with the paper is that there has been a technological solution to this problem for a decade, since Data Robotics (now Drobo) introduced the Drobo. I've been using them ever since. They are available in configurations from 4 to 12 slots and in all cases when a drive fails the light by the slot flashes red. All that is needed is to pull out the failed drive and push in a replacement disk the same size or bigger. The Drobo's firmware handles hot-swapping and recovers the failed drive's data with no human intervention. No technician and much less than 15 minutes per drive needed.
The second problem is that although the paper's failure model is based on 2013 failure data from Backblaze, it appears to assume that the failures are uncorrelated. The fact that errors in storage systems are correlated has been known since at least the work of Talagala at Berkeley in 1999. Correlated failures such as those of the 3TB Seagate drives at Backblaze in 2014 would invalidate the paper's claim that:
we have shown that several complete two-dimensional disk arrays with n parity disks, n(n – 1)/2 data disks, and less than n(n + 1)/2 data disks could achieve a 99.999 percent probability of not losing data over four years.
A 99.999 percent probability would mean that only 1 in 100,000 arrays would lose data in 4 years. But the very next year's data from their data source would probably have caused most of the arrays to lose data. When designing reliable storage, the failure model needs to be pessimistic, not average. And it needs to consider correlated failures, which is admittedly very hard to do.
[Thanks to Geneva Henry, University Librarian and Vice Provost for Libraries at the George Washington University, for contributing this guest blog post.]
While many may think of the scholarly record as the products surrounding scholarly works that are eventually disseminated, usually through publications, there is another aspect to the scholarly record that people at academic institutions – especially administrators – care about. This can be thought of as the campus scholarly record that frames the identity of an institution. In considering this perspective, there is an even more compelling reason to consider how the many activities surrounding scholarly dissemination are captured and managed. The libraries at academic institutions are arguably the obvious leaders to assume responsibility for managing these resources; libraries have been the stewards of the scholarly record for a very long time. But librarians must now recognize the changing nature of the elements of that record and take a proactive role in its capture and preservation. Moreover, they have a responsibility to the many campus stakeholders who have an interest in these resources for differing and sometimes conflicting purposes.
Research activities and early dissemination of findings have changed with the proliferation of social media and the Web. Scholars can exchange information via blog posts, Twitter messages, Facebook posts and every other means of social media available, with feedback from colleagues helping to refine the final formal publication. The traditional methods of peer review are now being further enhanced through web-based prepublications and blogs where reviewers from anywhere can provide less formal feedback to authors. For an increasing number of scholars, social media is the new preprint. Data is posted and shared, comments are exchanged, methods are presented and questioned, revisions happen and the process can continue, even after the “formal” publication has been released in a more traditional form. This requires librarians to think about how they’re preserving their websites and social media outputs that now need to be part of the scholarly record as well as the overall campus record of scholarship.
The campus is full of stakeholders who have an interest in this new, constantly evolving record. Some would like all of this information fully exposed to publicize the work being done, while others feel that there are limits to how much should be made available for everyone to view. Systems such as VIVO and Elements provide platforms that will highlight faculty activities to provide more visibility into the research activities on campus. Sponsored research offices want insights into what people are doing so that they can match research opportunities with relevant researchers and help with identifying partners at other institutions. Media relations staff want to identify experts as media inquiries come in related to current issues happening in the world. Academic departments are interested in showcasing the scholarly record of their faculty in order to attract more graduate students and new faculty to their departments. Promotion and tenure committees want a full understanding of all of the activities of faculty members, including their service activities; increasingly, social media is blurring the line between scholarship and service as one feeds into the other.
Faculty members, the source of creating these resources, are understandably confused. Their attitudes and perceptions range from excited to worried, from protective to open. Their activities on social media do not always relate cleanly to a single scholarly record and will often be mixed with personal, non-scholarly information they may not want the world to see (e.g. pictures of their dinner, political commentaries, stories of their family vacation). This mixed landscape helps to fuel the legal concerns of an institution’s general counsel and the image consciousness of the public relations folks who are cautious about what might end up in the public with the exercising of academic freedom.
Circling back, now, to the library as the logical keeper of the academic record, it is important to realize that there is a vast range of stakeholders that the records serve. These stakeholders become partners with the library in helping to determine what information will be kept, what will be exposed and what needs to remain in restricted access. Partnerships with campus IT units that manage security and authoritative feeds from enterprise systems are critical. Sometimes some stakeholders will ask that exposed information be “redacted” from its online availability and librarians must be able to intelligently communicate the limits of successfully removing this from the world wide web.
The change in the scholarly record raises many questions and will continue to present challenges for libraries and academic institutions. As faculty change institutions, who will be responsible for managing their record of scholarship that is disseminated through social media so that it is preserved long-term? Constantly changing methods for communicating and sharing knowledge will require a scholarly record that can readily accommodate innovations. What will the scholarly record of the future be and what should be captured? While we don’t have a crystal ball to help with this prediction, we do have a good barometer surrounding us in our libraries every day: study your students and how they communicate.
In the two years since the launch of FundRef we have been helping participating publishers with their implementations and listening to their feedback. As is often the case with new services, we have found that some of our original assumptions now need tweaking, and so the FundRef Advisory Group (made up of representatives from a dozen or so publishers and funding agencies) has been discussing the next phase of FundRef. I'd like to share some of our findings and proposals for improving the service.
When CrossRef launched FundRef, the FundRef Registry - the openly available taxonomy of standardised funder names that is central to the project - contained around 4,000 international funders. In the past 24 months this has doubled to over 8,000, thanks to input from funders and publishers and the ongoing work of the team at Elsevier who curate and update the list. There are over 170,000 articles with properly identified funders. Unfortunately, there are also over 400,000 articles with a funder name that hasn't been matched to the Registry and doesn't have a Funder Identifier. While a number of publishers are routinely supplying Funder IDs in all of their deposits, some are only managing to supply Funder IDs in as little as 30% of cases. Funder IDs are critical to FundRef - they allow us to collate and display the data accurately. Analysis shows that the deposits we are receiving without IDs fall into roughly three categories:
- Funder names that are in the Registry but have not been matched to an ID
- Entries into the funder name field that are clearly grant numbers, program names, or free-form text that has been entered or extracted incorrectly
- Funders that are not yet listed in the Registry.
At the outset we expected most of the deposits with no IDs to be a result of the third of these use cases. What we are finding, however, is that the vast majority are a result of the first two. Delving into this a little more and talking to publishers about their processes and experiences, we have identified the following reasons:
- Where authors are asked to input funding data on submission or acceptance of their paper, the margin for error appears to be quite high. They are not used to being asked for this data, and so very clear instructions are needed to stress its importance and ensure that they understand what it is they are being asked for. Authors should be strongly encouraged to pick their funding sources from the list in the FundRef Registry, but presenting a list of 8000+ funder names in a navigable, straightforward way is not without its challenges. Back in 2013 CrossRef worked with UI experts to develop a widget that publishers and their vendor partners could use - either outright or as a guideline - for collecting data from authors. Two years down the line we are reviewing this UI to see how we can further encourage authors to select the canonical funder name and only enter a free-form funder if it is genuinely missing from the Registry. Even with the most intuitive of interfaces, however, some authors will copy and paste an alternative name, or enter a program name instead of a funding body. Editorial and production staff should be aware of FundRef requirements and incorporate this metadata into their routine reviews.
- Some publishers have opted to extract funding data from manuscripts instead of asking authors to supply it in a form. This is perfectly acceptable - after all, the information is usually right there in the paper's acknowledgements. However, this process also needs to be accompanied by a certain amount of QA. We are seeing instances of grant numbers being extracted instead of funders, funder names that are concatenated into a single field, and funder names that are 100% accurate but have simply not been matched with IDs ahead of deposit. (In the CrossRef database we currently have 16,989 FundRef deposits that contain the name "National Natural Science Foundation of China" but have no accompanying ID. These are clearly slipping through the QA net.)
So what are we going to do to try and improve things?
Firstly, we are undertaking a review of our own UI and talking with vendors about changes that might encourage better data input by authors. We are also going to find out more about what processes are being undertaken by the publishers that are depositing consistently accurate data, and share these with the publishing community as a set of best practices. Whether publishers are asking authors or extracting the data from manuscripts, an element of QA seems to be critical to ensure the integrity of the data being deposited.
Secondly, we are going to start on some data "tidying" tasks at our end. Traditionally, CrossRef has not altered or corrected any of the data that publishers deposit: we provide error reports and ask that they make the corrections themselves. But with FundRef there seem to be a few quick wins - those 16,000 instances of the National Natural Science Foundation of China could easily and without ambiguity be matched to the correct FundRef ID (http://dx.doi.org/10.13039/501100001809), along with other names that have some very obvious minor discrepancies ("&" in place of "and", "US" instead of "U.S."). Cleaning up these deposits and adding the Funder IDs should result in a significant increase in the amount of FundRef data that is retrievable through FundRef Search and our Search API (and by extension, CHORUS Search).
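The kind of unambiguous matching described here could look something like the sketch below. It is only an illustration: the normalisation rules, the function names and the one-entry registry are assumptions made for the example, not CrossRef's actual clean-up process. The single Funder ID shown is the one quoted above.

```python
import re

# Hypothetical one-entry slice of the Registry: canonical name -> Funder ID.
REGISTRY = {
    "National Natural Science Foundation of China":
        "http://dx.doi.org/10.13039/501100001809",
}


def normalize(name):
    """Reduce a funder name to a comparison key that ignores case,
    '&' versus 'and', full stops and commas, and extra whitespace."""
    key = name.casefold()
    key = key.replace("&", " and ")
    key = re.sub(r"[.,]", "", key)   # "U.S." and "US" both become "us"
    return re.sub(r"\s+", " ", key).strip()


# Index the Registry by normalised key so that lookups are exact matches.
INDEX = {normalize(name): funder_id for name, funder_id in REGISTRY.items()}


def match_funder(deposited_name):
    """Return the Funder ID for a deposited name, or None if the name does
    not match a Registry entry after normalisation."""
    return INDEX.get(normalize(deposited_name))


if __name__ == "__main__":
    print(match_funder("national natural science foundation of China."))
    print(match_funder("Grant no. 61272345"))
```

Anything that still fails to match after normalisation, such as a grant number entered in the funder name field, is returned as `None` so that it can be left for manual review rather than guessed at.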
We are also asking publishers to continue to review their own processes and procedures to see where improvements can be made, as the success of FundRef ultimately depends on the quality of the data that is fed into it.
The LITA Blog features original content by LITA members on technologies and trends relevant to librarians. The writers represent a variety of perspectives, from library students to public, academic, and special librarians.
The blog also delivers announcements about LITA programming, conferences, and other events, and serves as a place for LITA committees to share information back with the community if they so choose.
Sharing on the LITA blog ensures a broad audience for your content. Four recent LITA blog posts (authored by Brianna Marshall, Michael Rodriguez, Bryan Brown, and John Klima) have been picked up by American Libraries Direct – and most posts have been viewed hundreds of times and shared dozens of times on social media. John Klima’s post on 3D printers has been shared 40 times from the LITA Twitter account and another 40 times directly from the blog (a cumulative record), Bryan Brown’s post on MOOCs has been viewed over 800 times (also a record as of this writing), and Michael Rodriguez’s post on web accessibility was shared over 60 times direct from the blog (another record).
Anyone can write a guest post for the LITA Blog, even non-LITA members, as long as the topic is relevant. Would you like to write a guest post or share posts reflecting the interests of your committee or interest group? Contact blog editor Brianna Marshall at briannahmarshall(at)gmail(dot)com or Mark Beatty at mbeatty(at)ala(dot)org.
Peter Murray: Thursday Threads: Web Time Travel, Fake Engine Noise, The Tech Behind Delivering Pictures of Behinds
In this week’s DLTJ Thursday Threads: the introduction of a web service that points you to old copies of web pages, dispelling illusions of engine noise, and admiring the technical architecture of Amazon Web Services that gives us the power to witness Kim Kardashian’s back side.
Feel free to send this to others you think might be interested in the topics. If you find these threads interesting and useful, you might want to add the Thursday Threads RSS Feed to your feed reader or subscribe to e-mail delivery using the form to the right. If you would like a more raw and immediate version of these types of stories, watch my Pinboard bookmarks (or subscribe to its feed in your feed reader). Items posted there are also sent out as tweets; you can follow me on Twitter. Comments and tips, as always, are welcome.
Introducing the Memento Web Time Travel Service
The Time Travel service helps you find versions of a page that existed at some time in the past. These prior versions of web pages are named Mementos. Mementos can be found in web archives or in systems that support versioning such as wikis and revision control systems.
When you enter the web address of a page and a time in the past, the Time Travel service tries to find a Memento for that page as it existed around the time of your choice. This will work for addresses of pages that currently exist on the web but also for those that have meanwhile vanished.
- About the Time Travel Service, Last updated: 19-Jan-2015
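A lookup of this kind can be sketched in a few lines. The Memento protocol (RFC 7089) has the client send the desired time in an `Accept-Datetime` header to a TimeGate, which then points to the closest Memento. The TimeGate base URL below is an assumption about the Time Travel service's endpoint and should be checked against its documentation, and the helper name is hypothetical. The sketch only builds the request; it sends nothing.

```python
from datetime import datetime, timezone
from email.utils import format_datetime

# Assumed TimeGate endpoint for the Time Travel service; verify before use.
TIMEGATE_BASE = "http://timetravel.mementoweb.org/timegate/"


def build_timegate_request(page_url, when):
    """Return (url, headers) for a Memento TimeGate lookup of `page_url`.

    `when` must be a timezone-aware datetime; it is converted to UTC and
    rendered in the HTTP date format that Accept-Datetime expects.
    """
    if when.tzinfo is None:
        raise ValueError("when must be timezone-aware")
    when_utc = when.astimezone(timezone.utc)
    headers = {"Accept-Datetime": format_datetime(when_utc, usegmt=True)}
    return TIMEGATE_BASE + page_url, headers


url, headers = build_timegate_request(
    "http://example.com/", datetime(2010, 1, 19, 12, 0, tzinfo=timezone.utc)
)
print(url)
print(headers["Accept-Datetime"])
```

Sending this with any HTTP client that follows redirects should land on the archived copy closest to the requested time, if one exists.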
The folks at Los Alamos National Laboratory have been working on web-time-travel for years. What started with browser plugins has now become a web service that can be used to find old copies of web pages found in caches throughout the world. Thought the Internet Archive’s Wayback Machine was the only game in town? Check out the Memento Time Travel service.
America’s best-selling cars and trucks are built on lies: The rise of fake engine noise
Stomp on the gas in a new Ford Mustang or F-150 and you’ll hear a meaty, throaty rumble — the same style of roar that Americans have associated with auto power and performance for decades.
It’s a sham. The engine growl in some of America’s best-selling cars and trucks is actually a finely tuned bit of lip-syncing, boosted through special pipes or digitally faked altogether. And it’s driving car enthusiasts insane.
- America’s best-selling cars and trucks are built on lies: The rise of fake engine noise, by Drew Harwell, The Washington Post, 21-Jan-2014
I knew they were adding “engine noise” to the all-electric Prius car because it was so quiet that it could startle people, but I didn’t know it was happening to so-called “muscle cars”.
A look at Amazon’s world-class data-center ecosystem
Amazon VP and Distinguished Engineer James Hamilton shares what makes the company’s armada of data centers run smoothly.
- A look at Amazon’s world-class data-center ecosystem, by Michael Kassner, TechRepublic, 8-Dec-2014
Among the geek community, there must be some awe at how Amazon seems to create infinitely big data centers that can be used for everything from powering Netflix to this humble blog. Amazon is also notoriously secret about how it does things. This article provides a glimpse into how Amazon Web Services achieves the scale that it does.
How PAPER Magazine’s web engineers scaled their back-end for Kim Kardashian
On November 11th 2014, the art-and-nightlife magazine PAPER “broke the Internet” when it put a Jean-Paul Goude photograph of a well-oiled, mostly-nude Kim Kardashian on its cover and posted the same nude photos of Kim Kardashian to its website (NSFW). It linked together all of these things—and other articles, too—under the “#breaktheinternet” hashtag. There was one part of the Internet that PAPER didn’t want to break: The part that was serving up millions of copies of Kardashian’s nudes over the web.
Hosting that butt is an impressive feat. You can’t just put Kim Kardashian nudes on the Internet and walk away; that would be like putting up a tent in the middle of a hurricane. Your web server would melt. You need to plan.
- How PAPER Magazine’s web engineers scaled their back-end for Kim Kardashian (SFW), by Paul Ford, Medium, 21-Jan-2015
Speaking of how Amazon can seemingly scale to infinite levels, this article tells the story of how one online publisher ramped up their server capacity to meet the demands of users flocking to see Kim Kardashian’s rear end. (And who said the internet wasn’t a valuable tool…)
Today I found the following resources and bookmarked them:
- Greenfoot Teach and learn Java programming
- Blockly Games Blockly Games is a series of educational games that teach programming. It is designed for children who have not had prior experience with computer programming. By the end of these games, players are ready to use conventional text-based languages.
- Blockly Blockly is a library for building visual programming editors
On December 10th, we held our second Evolving Scholarly Record Workshop at George Washington University in Washington, DC (you can read Ricky Erway’s summary of the first workshop, starting here). Many thanks to Geneva Henry and all the staff at GWU for hosting us in the fabulous International Brotherhood of Teamsters Labor History Research Center.
This workshop, and others, build on the framework presented in the OCLC Research report, The Evolving Scholarly Record.
At George Washington Gelman library attending OCLC workshop #esrworkshop
— Martha Kyrillidou (@kyrillidou) December 10, 2014
Our first speaker, Brian Lavoie (OCLC Research), presented the ESR Framework and put it into context. What is considered part of the record is constantly expanding – for example, blogs and social media, which would previously not have been included. The evolution of how scholarship is recorded makes it challenging to organize the record in consistent and reliable ways. The ecosystem of stakeholders is evolving as well. It became clear to Brian and others involved in discussions around the problem space that a framework was necessary in order to support strategic discussions across stakeholders and across domains.
Formats shift (print to dig), boundaries blur (from books to data sets), characteristics change (static works to dynamic) #esrworkshop
— Keith Webster (@CMKeithW) December 10, 2014
In addition to traditional scholarly outcomes, there are two additional areas of focus, process and aftermath.
Process is what leads up to the publication of the outcomes – in the framework, process is composed of method, evidence and discussion (important because outcomes usually consolidate thanks to discussions with peers). Anchoring outcomes in process will help reproducibility. Scholarly activities continue in aftermath: discussion (including post publication reviews and commentaries), revision (enhancement, clarification), re-use (including repackaging for other audiences).
In the stakeholder ecosystem, the traditional roles (create, fix, collect, use) are being reconfigured. For example, in addition to libraries, service providers like Portico and JSTOR are now important in the collect role. Social media and social storage services, which are entirely outside the academy, are now part of create and use. New platforms, like figshare, are taking on the roles of fix and collect. The takeaway here? The roles are constant, but the configurations of the stakeholders beneath them are changing.
How does the traditional "scholarly record" fit into today's consumer-producer model of web content? Not well, it seems. #esrworkshop
— Scott W. H. Young (@hei_scott) December 10, 2014
Our second speaker, Herbert van de Sompel (Los Alamos National Laboratory) gave perspective from the network point of view. His talk was a modified reprise of his presentation at the June OCLC / DANS workshop in Amsterdam, which Ricky nicely summarized in a previous posting. Herbert will also be speaking at our workshop coming up in March, so if you’d like to catch him in action, sign up for that session.
Our third speaker was Geneva Henry (George Washington University), who represented the view from the campus. We will be posting her viewpoint in a separate blog post later this week, but her remarks touched on the various campus stakeholders in the scholarly record: scholars, media relations, promotion and tenure committees, the office of research, the library.
Look to your students: How are they communicating? asks Henry. They’re future scholars; don’t expect drastic behavior changes #esrworkshop
— Mark Newton (@libmark) December 10, 2014
Daniel Hook (Digital Science) shared his “view from the platform.” (Digital Science is the parent company of several platform services, such as FigShare, AltMetrics, Symplectic Elements, and Overleaf.) Daniel stressed the importance of transparency and reproducibility in research – there is a need for a demonstrable pay-off for investors in research. There is a delicate balance to be reached between collaboration and competition in research. We are in an era of increased collaboration and the “fourth age of research” is marked by international collaboration. Who “owns” research, and the scholarly record? Individual researchers? Their institutions? Evaluation of research increasingly calls for demonstrating impact of research. Identifiers are glue – identifiers for projects, for researchers, for institutions. The future will be in dynamically making assertions of value and impact across institutions, and in building confidence in those assertions.
Funders keen to assess impact of research they funded at a macro-level – have you influenced policy? The economy? #esrworkshop
— Keith Webster (@CMKeithW) December 10, 2014
Finally, Clifford Lynch (Coalition for Networked Information) gave some additional remarks, highlighting stress points. Potentially, the scholarly record is huge, especially with an expanded range of media and channels. The minutes of science are being recorded every minute, year in, year out. Selection issues are challenging, to say the least. Is it sensible to consider keeping everything? Cliff called for hard questions to be asked, and for studies to be done. Some formats seem to be overlooked, video for example.
We concluded the meeting with a number of break-out sessions that took up focused topics. The groups came back with tons of notes, and also some possible “next steps” or actions that could be taken to move us forward. Those included:
- Promulgating name identifiers and persistent IDs for use by other stakeholders
- Focusing on research centers and subject/disciplinary repositories to see what kinds of relationships are needed
- Mining user studies/reviews to pull out research needs/methods/trends/gaps and find touch-points to the library
- Following the money in the ESR ecosystem to see whether there are disconnects between shareholder interests and scholar value
- Pursuing with publishers whether they will collect the appropriate contextual processes and aftermaths
- Investigating funding, ROI, and financial tradeoffs
- Getting involved during the grant planning processes so that materials flow to the right places instead of needing to be rescued after the fact
Thanks to all of our participants, but particularly to our hosts, our speakers, our notetakers and those who helped record the event on Twitter. We’re looking forward to another productive workshop in Chicago (in March) and then expect to culminate the three workshops at the ESR workshop in San Francisco (in June) where we’ll focus on how we can collaboratively move things forward to do our best to ensure stewardship of the scholarly record now and into the future.
I have 19 pages of notes from 5 breakout sessions. We'll boil down discussions and talking points for next steps soon #esrworkshop
— Merrilee Proffitt (@MerrileeIAm) December 10, 2014
Posted by Merrilee Proffitt
New vacancy listings are posted weekly on Wednesday at approximately 12 noon Central Time. They appear under New This Week and under the appropriate regional listing. Postings remain on the LITA Job Site for a minimum of four weeks.
New This Week
Visit the LITA Job Site for more available jobs and for information on submitting a job posting.
So far in this series, I’ve talked about the pros and cons of Agile, and reviewed the methodology’s core values. Today I want to move beyond the “what” and into more of the “how.” I’ll start by looking at user stories.
A user story is the basic unit of Agile development. User stories should be written by the business, not by the development team. They should clearly state the business value that the project is expected to create, as well as the user that will benefit. The focus should be on the problem being solved, not the software being built. This not only increases efficiency, but also provides flexibility for the development team: how they solve the problem is up to them.
There’s a generally accepted template for writing user stories: “As a [user type], I want to [specific functionality] so that [tangible benefit].” I’m not crazy about using this convention because it seems contrived to me, but it does make it easier to understand the priorities of Agile development: a feature exists to provide a benefit for a specific user or user group. If you can’t express functionality in this manner, then it is either superfluous or a technical requirement (there’s a separate document for those, which is written during and after development, not before).
A great user story should follow the INVEST model: user stories should be Independent, Negotiable, Valuable, Estimable, Small, and Testable (you can read about this in more detail in the links provided below). The main thing to remember, though, is that we’re really just trying to create software where every component can be proven to solve a specific problem for a specific user. It all comes back to justifying programming effort in terms of the value it provides once it’s been released into the wild. Let’s look at some examples, based on developing a tool to keep track of tasks:
- “As a task list creator, I can see all of my tasks together.” This story is too vague, and will result in developers guessing about the true purpose of this feature.
- “As a task list creator, I can see all of my tasks together so I can download them to MS Excel.” This one is too specific. MS Excel is a technical requirement, and should not be part of the user story text. The real need is for a downloadable version of the task list; limiting it to one specific format at this point may lead to problems later on.
- “As a task list creator, I can see all of my tasks together so I can download them.” This is better, but it still doesn’t answer the question of value. Why do I need to download the tasks? This looks ok, but in reality I have created a false dependency between two separate features.
- “As a task list creator, I can download a task list so I can share it with project stakeholders.” Now we’re getting somewhere! The user needs to share the list with other members of the team, which is why she needs a downloadable version.
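For readers who think better in code, here is a minimal sketch of the template and of the two mistakes in the examples above (a missing benefit, and a technology named in the story text). The `UserStory` class, the `problems` method, and the `TECH_TERMS` list are all hypothetical names made up for this illustration; they are not part of any Agile tool, and a mechanical check like this is no substitute for talking the story through with the team.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical list of technology names that belong in the technical
# requirements rather than in the story text. Extend it to taste.
TECH_TERMS = ("excel", "pdf", "csv")


@dataclass(frozen=True)
class UserStory:
    """One user story: who benefits, what they need, and why it matters."""
    user_type: str
    functionality: str
    benefit: str = ""

    def render(self) -> str:
        """Fill in the 'As a ..., I want to ... so that ...' template."""
        text = f"As a {self.user_type}, I want to {self.functionality}"
        if self.benefit.strip():
            text += f" so that {self.benefit}"
        return text + "."

    def problems(self) -> List[str]:
        """Return simple warnings: a missing benefit or a named technology."""
        found = []
        if not self.benefit.strip():
            found.append("no benefit stated")
        story_text = f"{self.functionality} {self.benefit}".lower()
        for term in TECH_TERMS:
            if term in story_text:
                found.append(f"names a technology: {term}")
        return found


if __name__ == "__main__":
    stories = [
        UserStory("task list creator", "see all of my tasks together"),
        UserStory("task list creator", "see all of my tasks together",
                  "I can download them to MS Excel"),
        UserStory("task list creator", "download a task list",
                  "I can share it with project stakeholders"),
    ]
    for story in stories:
        print(story.render())
        print("  warnings:", story.problems() or "none")
```

Note that the third example, “so I can download them,” would pass both checks even though it hides a false dependency. Catching that kind of problem still takes a conversation, not a script.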
User story writing is iterative and investigative. At this point, I could argue that downloading, just like display, is an unnecessary step, and that the real feature is for some mechanism that allows all project members to see the task list, and the true need is for the team to work on the list together. That’s where the value-add is. Everything else is bells and whistles. Maybe downloading is the most efficient way to share the list, but that decision is part of the development process and should be documented in the technical requirements. Maybe there are other reasons to add a download feature; those belong on separate stories.
As a business-side stakeholder with an engineering background, my first attempts at creating user stories did not go well. I like to tinker and get my hands dirty, so trying to keep design out of my user stories proved difficult; it’s easy for me to get bogged down in technical details and focus on fixing what’s in front of me, rather than asking whether it’s even necessary in the first place. Any time design creeps into product requirements, it adds a layer of abstraction that makes it harder for a development team to understand what it is you really want. It took me a while to learn that lesson (you could argue that I’m still learning it). Besides, when a product owner gets involved in designing software, it’s hard to avoid creating an attachment to the specific design, regardless of whether it meets user needs or not. It’s best to stay out of that process altogether (no matter how much fun it may be) and maintain the focus on the user.
Writing user stories can be frustrating, especially if you’re new to Agile, but they are a great way to discover the true user needs that should drive your software development project. If you want to learn more about user stories, you can go here, here, or here. I’ll be back next month to talk about prioritization and scheduling.
What’s your experience with user stories? Do you have any tips on writing a great user story?
Boston, MA The Sixth Annual VIVO Conference will be held August 12-14, 2015 at the Hyatt Regency Cambridge, overlooking Boston. The VIVO Conference creates a unique opportunity for people from across the country and around the world to come together to explore ways to use semantic technologies and linked open data to promote scholarly collaboration and research discovery.
We have started sharing projects created at our Developer House in December, and this week we’re happy to share another. This project and post come to you from Bill Jones and Steelsen Smith.
(View from the Podium)
OLA SuperConference 2015 was last week. I had the good opportunity to attend as well as to present.
On February 1st, I gave a presentation at the American Library Association Midwinter Conference in Chicago, Illinois as part of the ALA Masters Series called Mechanic Institutes, Hackerspaces, Makerspaces, TechShops, Incubators, Accelerators, and Centers of Social Enterprise. Where do Libraries fit in? :: http://librarian.newjackalmanac.ca/2015/02/hackerspaces-makerspaces-fab-labs.html
But after inspection, it looks like the RSS feed that is generated by Feedburner has been updated in such a way that I - using feedly - needed to re-subscribe. Now, I'm not sure who is at fault for this: Feedburner, feedly, or myself for using a third party to distribute a perfectly good RSS feed.
I don't follow my reading statistics very closely but I do know that the traffic to this site is largely driven by Twitter and Facebook -- much more than hits from, say, other blogs. And yet, I'm disturbed that the 118 readers using the now defunct feedly rss feed will not know about what I'm writing now. I'm sad because while I've always had a small audience for this blog - I have always been very proud and humbled that I had this readership because attention is a gift.
This blog post is not solely for the attention of women. Everyone can benefit from challenging themselves in the STEM field. STEM stands for Science, Technology, Engineering and Math. Though there is debate on whether there is a general shortage of Americans in STEM fields, women and minorities are heavily underrepresented in these areas of study. It goes without saying that all members of the public should invest in educating themselves in at least one of the STEM fields.
STEM is versatile
There is nothing wrong with participating in the humanities and liberal arts. In fact, we need those facets of our cultural identity. The addition of STEM skills can greatly enhance those areas of knowledge and make you a more diverse, dynamic and competitive employee. You can be a teacher, but you can also be an algebra or biology teacher for K-12 students. You can be a librarian, but you can also be a systems librarian, medical reference librarian, web and data librarian etc. Organization, communication and creative skills are not lost in the traditionally analytical STEM fields. Fashion designers and crafters can use computer-aided design (CAD) to create graphic representations of a product. CAD can also be used to create 3D designs. Imagine a 22nd Century museum filled with 3D printed sculptures as a representation of early 21st Century art.
Early and mid-career professionals
You’re not stuck with a job title/description. Most employers would be more than happy if you inquired about being the technology liaison for your department. Having background knowledge in your area, as well as tech knowledge, could place you in a dynamic position to recommend training, updates or software to improve your organization’s management. It is also never too late to learn. In fact, having spent decades in an industry, you’re more experienced. Experience brings with it a treasure of potential.
Youth and pre-professionals
According to the National Girls Collaborative Project, at the K-12 level girls are holding their ground with boys in the math and science curriculum. The reality is that, contrary to previously held beliefs, girls and young women are entering the STEM fields. The drop-off occurs at the post-secondary level. Women account for more than half of the population at most American universities. However, women earn the fewest university degrees in math, engineering and computer science (approx 18%). Hispanics, blacks and Native Americans combined account for 13% of science and engineering degrees earned in 2011. Though there is an abundance of articles listing the reasons why women are dropping like flies from STEM education, many of them boil down to options. Either there aren’t any or there aren’t enough. Girls are discouraged at a young age, so they opt to shift toward other areas of study to which they believe their skills are better suited. There is also the unfortunate tale of the post-graduate woman who is forced to trade her dream career in order to care for her children (or is marginalized for employment because she has children).
A little encouragement
As a female and a minority, the statistics mirror my personal narrative. I was a late arrival to the information technology field because, when I was younger, I was terrible at math. I unfortunately assumed, like many other girls, that I would never be able to work in the sciences because I wasn’t “naturally” talented. I also didn’t receive my first personal computer until I was in middle school (late 90s to early 2000s). Back then a computer would easily set you back a couple grand. The few computer courses I took championed boys as future programmers and technicians. I thought that boys were naturally great with technology without realizing that it was a symptom of their environment. It would have been great if someone pulled me aside and explained that natural talent is a myth. That if I was willing to work diligently, I could be as good as the boys.
Organizations like STEMinist.com exist to remind everyone that women and minorities are capable of holding positions in STEM fields. This is by no means an endorsement of STEMinist; I just thought the addition of STEM as another frontier for feminism should be recognized. If you type “women in STEM” into a search engine, you’ll be inundated with other organizations that are adding to the conversation. No matter where your views fall on the concept of feminism, or females and minorities in the sciences, we all have a role to play in encouraging them to pursue their interests as a career.
Are you or do you know a woman/minority who is contributing to science and technology? No matter how small you believe the contribution to be, leave a comment in hopes of encouraging someone.
There are so many ways to get involved with the Digital Public Library of America, each of which contributes enormously to our mission of connecting people with our shared cultural heritage. Obviously we have our crucial hubs and institutional partners, who work closely with us to bring their content to the world. If you’re a software developer, you can build upon our code, write your own, and create apps that help to spread that content far and wide. And if you want to provide financial support, that’s easy too.
But I’m often asked about more general ways that one can advance DPLA’s goals, especially in your local community, and perhaps if you’re not necessarily a librarian, archivist, museum professional, or technologist.
Our Community Reps program exists precisely for that reason. DPLA staff and partners can only get the word out so far each year, and we need people from all walks of life who appreciate what we do to step forward and act as our representatives across the country and around the globe. We currently have another round of applications, and I ask you to strongly consider joining the program.
Two hundred people, from all 50 states and from five foreign countries, have already done so. They orchestrate public meetings, hold special events, go into classrooms, host hackathons, and use their creativity and voices to let others know about DPLA and what everyone can do with it. I know this first-hand through my interactions with Reps in virtual meetings, on our special Reps email list, and from meeting many in person as I travel across the country. Reps are a critical part of the DPLA family, and they often have innovative ideas that make a huge difference to our efforts.
We’re incredibly fortunate to have so many like-minded and energetic people join us and the expanding DPLA community through the Reps program. It’s a great chance to be a part of our mission. Apply today!