Planet Code4Lib - http://planet.code4lib.org
Updated: 1 day 13 hours ago

Patrick Hochstenbach: Homework assignment #6 Sketchbookskool #BootKamp

Sat, 2015-06-20 12:48
Filed under: Sketchbook Tagged: art, drawing, portrait, sketch, sketchbook, sketchbookskool

Patrick Hochstenbach: Homework assignment #5 Sketchbookskool #BootKamp

Sat, 2015-06-20 12:46
Filed under: Sketchbook Tagged: art, inspiration, picasso, sketch, sketchbook, sketchbookskool

State Library of Denmark: Dubious guesses, counted correctly

Fri, 2015-06-19 21:55

We do have a bit of a performance challenge with heavy faceting on large result sets in our Solr-based Net Archive Search. The usual query speed is < 2 seconds, but if the user requests aggregations based on large result sets, such as all resources from a whole year, processing time jumps to minutes. To get an idea of how bad it is, here's a chart of response times when faceting on a field with 640M unique values.

Faceting performance for field links with 600M unique values on a 900GB / 250M document index

Yes, the 80M hits query does take 16 minutes! As outlined in Heuristically correct top-X facets, it seems possible to use sampling to determine the top-X terms of the facet result and then fine count only those terms. The first version of heuristically correct top-X facets has now been implemented (download the latest Sparse faceting WAR to try it out), so time for evaluation.
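
Conceptually, the heuristic works in two phases: guess the candidate top-X terms from a sample, then fine count only those candidates against the full result set. Below is a minimal Python sketch of the idea; it is an illustration only, not the actual Solr implementation (which, as described further down, samples evenly spread chunks of all documents in the index rather than slicing the result set), and the docs/hits structures are hypothetical toy data:

from collections import Counter

def heuristic_top_x(docs, hits, x, fraction=0.01):
    # docs: dict mapping doc id -> list of facet terms (toy data)
    # hits: list of doc ids in the result set
    # Phase 1: count terms in a small sample to guess the candidate top-X terms.
    step = max(1, int(1 / fraction))
    sample_counts = Counter()
    for doc_id in hits[::step]:
        sample_counts.update(docs[doc_id])
    candidates = {term for term, _ in sample_counts.most_common(x)}

    # Phase 2: fine count only the candidate terms over the full result set,
    # so the returned counts are exact; only the term selection is heuristic.
    exact = Counter()
    for doc_id in hits:
        exact.update(t for t in docs[doc_id] if t in candidates)
    return exact.most_common(x)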

Three facet fields

For this small-scale evaluation we use just a single 900GB shard with 250M documents, generated from harvested web resources. The three fields of interest are

  • domain, with 1 value/document and 1.1M unique values. Of these, 230K are only referenced by a single document. The most popular domains are referenced by 4M documents.
    Intuitively, domain seems fitting for sampling, with relatively few unique values, not too many single-instance values and a high number of popular domains.
  • url, with 1 value/document and 200M unique values. Of these, 185M are only referenced by a single document. The most popular urls are referenced by 65K documents.
    Contrary to domain, url seems more problematic to sample, with relatively many unique values, a great deal of single-instance values and not very many popular urls.
  • links, with 10 values/document and 600M unique values. Of these, 420M are only referenced by a single document. The most popular links are referenced by 8M documents.
    In between domain and url is links, with relatively many unique values, but only 10% of the 6 billion references being to single-instance values, and a high number of popular links.
Methodology

Caveat lector: This test should not be seen as authoritative, but rather as an indicator of trade-offs. It was done on a heavily loaded machine, so real-world performance should be better. However, the relative differences in speed should not be too far off (tested ad hoc at a time when the machine was not under heavy load).

11 very popular terms were extracted from the general text field and used as query terms, to simulate queries that are heavy in terms of the number of hits.

Term          Hits
og            77M
a             54M
10            50M
to            45M
ikke          40M
søg           33M
denne         25M
også          22M
under         18M
telefon       10M
indkøbskurv    7M

The top 25 terms were requested with facet.limit=25, and sampling was performed by using only part of the result set to update the facet counters. The sampling was controlled by two options (a request sketch follows the list):

  • fraction (facet.sparse.heuristic.fraction=0.xx): How much of the total number of documents to sample. If fraction is 0.01, this means 1% or 0.01*250M = 2.5M documents. Note that these are all the documents, not only the ones in the result set!
  • chunks (facet.sparse.heuristic.sample.chunks=xxx): How many chunks to split the sampling in. If chunks is 10 and fraction is 0.01, the 2.5M sample documents will be checked by visiting the first 250K, skipping ahead, visiting another 250K etc. 10 times.
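
For reference, a sampled facet request using these two options might look something like the sketch below. The host, core name (netarchive) and the surrounding standard Solr parameters are assumptions; only facet.limit and the two facet.sparse.heuristic.* options are taken from the text, and the exact parameter set the sparse faceting WAR expects may differ:

import requests

params = {
    "q": "og",                                       # one of the heavy test terms
    "rows": 0,                                       # only the facet counts are needed
    "facet": "true",
    "facet.field": "domain",
    "facet.limit": 25,
    "facet.sparse.heuristic.fraction": 0.01,         # sample 1% of all documents
    "facet.sparse.heuristic.sample.chunks": 100000,  # spread the sample over 100K chunks
    "wt": "json",
}
resp = requests.get("http://localhost:8983/solr/netarchive/select", params=params)
flat = resp.json()["facet_counts"]["facet_fields"]["domain"]
top_terms = list(zip(flat[::2], flat[1::2]))         # Solr returns [term, count, term, count, ...]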

To get a measure of validity, a full count was performed for each facet field with each search term. The results from the sampled runs were then compared to the full count by counting the number of correct terms from the top down to the first error. Example: If the fully counted result is

  • a (100)
  • b (80)
  • c (50)
  • d (20)
  • e (20)

and the sample result is

  • a (100)
  • b (80)
  • c (50)
  • e (20)
  • f (18)

then the score would be 3. Note that the counts themselves are guaranteed to be correct. Only the terms are unreliable.
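
The validity score is straightforward to compute. A small sketch, taking the fully counted and sampled results as ordered lists of terms:

def validity_score(full_terms, sampled_terms):
    # Count matching terms from the top until the first discrepancy.
    score = 0
    for full, sampled in zip(full_terms, sampled_terms):
        if full != sampled:
            break
        score += 1
    return score

# The example from the text: a, b and c agree, then d vs e differs, giving a score of 3.
assert validity_score(list("abcde"), list("abcef")) == 3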

Measurements

Facet field domain (1.1M unique values, 1 value/document)

First we sample using half of all documents (sample fraction 0.5), for varying numbers of chunks: c10 means 10 chunks, c10K means 10,000 chunks. As facet.limit=25, the highest possible validity score is 25. (In the original post, scores below 10 are marked red and scores from 10-19 are marked purple.)

Term          Hits  c10  c100  c1K  c10K  c100K
og            77M    19     9   25    25     25
a             54M    20     4   25    25     25
10            50M    20     5   25    25     25
to            45M    18    14   25    25     25
ikke          40M    16    15   25    25     25
søg           33M    16    15   23    25     24
denne         25M    17    18   23    24     25
også          22M    17    12   25    25     25
under         18M     4    12   23    23     25
telefon       10M    16     8   23    23     25
indkøbskurv    7M     8     2   16    21     25

Heuristic faceting for field domain with 50% sampling

Looking at this, it seems that c1K (1,000 chunks) is good, except for the last term indkøbskurv, and c10K (10,000 chunks) is really good. Alas, sampling with half the data is nearly as much work as a full count.

Looking at a sample fraction of 0.01 (1% of total size) is more interesting:

Term          Hits  c10  c100  c1K  c10K  c100K
og            77M     4     9   24    23     25
a             54M     4     4   23    24     25
10            50M     3     4   23    25     20
to            45M     0     0   24    24     24
ikke          40M     5    13   25    24     25
søg           33M     0     0   20    21     25
denne         25M     0     0   18    22     23
også          22M     6    12   23    25     25
under         18M     3     4   22    23     24
telefon       10M     5     7   12    12     25
indkøbskurv    7M     0     1    4    16     23

Heuristic faceting for field domain with 1% sampling

Here it seems that c10K is good and c100K is really good, using only 1% of the documents for sampling. If we were only interested in the top-10 terms, the over-provisioned call for top-25 would yield valid results for both c10K and c100K. If we want all top-25 terms to be correct, over-provisioning to top-50 or thereabouts should work.

The results are viable, even with a 1% sample size, provided that the number of chunks is high enough. So how fast is it to perform heuristic faceting, as opposed to full count?

Faceting performance for field domain with 1% sampling

The blue line represents standard full-count faceting, with no sampling. It grows linearly with result size, with the worst case being 14 seconds. Sample-based counting (all the other lines) also grows linearly, but with the worst case at 2 seconds. Furthermore, the speed differences between the chunk counts are so small that choosing 100K chunks, and thereby the best chance of getting viable results, is not a problem.

In short: Heuristic faceting on the domain field for large result sets is 4-7 times faster than standard counting, with a high degree of viability.

Facet field url (200M unique values, 1 value/document)

Heuristic faceting for field url with 1% sampling

Faceting performance for field url with 1% sampling

The speed-up is a modest 2-4 times for the url field, but worse, the viability is low, even when using 100,000 chunks. Raising the minimum result set size for heuristic faceting to 20M hits could conceivably work, but the url field still seems a poor fit. Considering that the url field does not have very many recurring values, this is not too surprising.

Facet field links (600M unique values, 10 values/document)

Heuristic faceting for field links with 1% sampling

Faceting performance for field links with 1% sampling

The heuristic viability of the links field is just as good as with the domain field: As long as the number of chunks is above 1,000, sampling with 1% yields great results. The performance is 10-30 times that of standard counting. This means that the links field is an exceptionally good fit for heuristic faceting.

Removing the full count from the chart above reveals that worst-case in this setup is 22 seconds. Not bad for a result set of 77M documents, each with 10 references to any of 600M values:

Faceting performance for field links with 1% sampling, no baseline shown

Summary

Heuristically correct faceting for large result sets allows us to reduce the runtime of our heaviest queries by an order of magnitude. Viability and relative performance are heavily dictated by the term count distribution of the concrete fields (the url field was a poor fit) and by cardinality. Anyone considering heuristic faceting should test viability on their own corpus before enabling it.

Word of caution

Heuristic faceting as part of Solr sparse faceting is very new and not tested in production. It is also somewhat rough around the edges; simple features such as automatic over-provisioning have not been implemented yet.


Nicole Engard: Bookmarks for June 19, 2015

Fri, 2015-06-19 20:30

Today I found the following resources and bookmarked them on Delicious.

  • explainshell.com Write down a command-line to see the help text that matches each argument

Digest powered by RSS Digest

The post Bookmarks for June 19, 2015 appeared first on What I Learned Today....

Related posts:

  1. Herding Cattle
  2. Live Ink
  3. Car 5.0

Terry Reese: MarcEdit User-Group Meeting @ALA, June 26, 2015

Fri, 2015-06-19 20:24
Logistics

Time: 6:00 – 7:30 pm, Friday, June 26, 2015
Place: Marriott Marquis (map)
Room: Pacific H, capacity: 30

Description:

The MarcEdit user community is large and diverse and, honestly, I get to meet far too few community members.  This meeting has been put together to give members of the community a chance to come together and talk about the development road map, hear about the work to port MarcEdit to the Mac, and give me an opportunity to hear from the community.  I'll talk about future work and areas of potential partnership, as well as hear from you about what you'd like to see in the program to make your metadata lives a little easier.  If this sounds interesting to you, I really hope to see you there.

Acknowledgements:

A *big* thank you to John Chapman and OCLC for allowing this to happen.  As folks might guess, finding space at ALA can be a challenging and expensive endeavor, so when I originally broached the idea with OCLC, I had pretty low expectations.  But they truly went above and beyond any reasonable expectation, working with the hotel and ALA so this meeting could take place.  And while they didn't ask for it, they have my personal thanks and gratitude.  If you can attend the event, or heck, wish you could have but your schedule made it impossible, make sure you let OCLC know that this was appreciated.

District Dispatch: IMLS announces next immigration webinar in series

Fri, 2015-06-19 20:15

 

Photo Credit: U.S. Citizenship and Immigration Services

On July 2, 2015, the Institute of Museum and Library Services (IMLS) and U.S. Citizenship and Immigration Services (USCIS) will host a free webinar for public librarians on the topic of immigration and U.S. citizenship. Join in to learn more about what resources are available to assist libraries that provide immigrant and adult education services. The webinar, Overview of myE-Verify, will explore a new online service for the general public. Representatives will be on hand to discuss how members of the public can use the service to:

  • Confirm their work eligibility with Self Check
  • Create a myE-Verify account
  • Protect their Social Security number in E-Verify with Self Lock
  • Access myResources, a multimedia resource center to learn about their rights and their employer’s responsibilities.

Webinar Details:
Date: July 2, 2015
Time: 2:00 – 3:00 p.m. EDT
Click here to register

This series was developed as part of a partnership between IMLS and USCIS to ensure that librarians have the necessary tools and knowledge to refer their patrons to accurate and reliable sources of information on immigration-related topics. To find out more about the partnership and the webinar series, visit the Serving New Americans page of the IMLS website or on the USCIS website.

The post IMLS announces next immigration webinar in series appeared first on District Dispatch.

Harvard Library Innovation Lab: Link roundup June 19, 2015

Fri, 2015-06-19 20:04

More rounded up than ever

Spot the Ball: Women’s World Cup 2015 – NYTimes.com

Fun and interactive method for displaying images

Toby Glanville’s brilliant images of workers in the late 90s

“I think perhaps that a real portrait is one that suggests to the viewer that the subject portrayed is alive”

The construction of the Statue of Liberty – Google Cultural Institute

Love the windowpane slider at the bottom.

The Humans Who Dream Of Companies That Won’t Need Us | Fast Company | Business + Innovation

An army of accountant-robots is coming for you

Giphoscopes from Officina K | The Public Domain Review

Giphoscopes are hand cranked animated gifs

FOSS4Lib Recent Releases: Fedora Repository - 3.8.1

Fri, 2015-06-19 17:29

Last updated June 19, 2015. Created by Peter Murray on June 19, 2015.

Package: Fedora Repository
Release Date: Wednesday, June 17, 2015

David Rosenthal: EE380 talk on eBay storage

Fri, 2015-06-19 15:00
Russ McElroy & Farid Yavari gave a talk to Stanford's EE380 course describing how eBay's approach to storage (YouTube) is driven by their Total Cost of Ownership (TCO) model. As shown in this screengrab, by taking all the cost elements into account, they can justify the higher capital cost of flash media in much the same way, but with much more realistic data and across a broader span of applications, that Ian Adams, Ethan Miller and I did in our 2011 paper Using Storage Class Memory for Archives with DAWN, a Durable Array of Wimpy Nodes.

We were inspired by a 2009 paper, FAWN: A Fast Array of Wimpy Nodes, in which David Andersen and his co-authors from CMU showed that a network of large numbers of small CPUs coupled with modest amounts of flash memory could process key-value queries at the same speed as the networks of beefy servers used by, for example, Google, but using two orders of magnitude less power.

As this McElroy slide shows, power cost is important and it varies over a 3x range (a problem for Kaminska's thesis about the importance of 21 Inc's bitcoin mining hardware). He specifically mentions the need to get the computation close to the data, with ARM processors in the storage fabric. In this way the amount of data to be moved can be significantly reduced, and thus the capital cost, since as he reports the cost of the network hardware is 25% of the cost of the rack, and it burns a lot of power.

At present, eBay relies on tiering, moving data to less expensive storage such as consumer hard drives when it hasn't been accessed in some time. As I wrote last year:
Fundamentally, tiering like most storage architectures suffers from the idea that in order to do anything with data you need to move it from the storage medium to some compute engine. Thus an obsession with I/O bandwidth rather than what the application really wants, which is query processing rate. By moving computation to the data on the storage medium, rather than moving data to the computation, architectures like DAWN and Seagate's and WD's Ethernet-connected hard disks show how to avoid the need to tier and thus the need to be right in your predictions about how users will access the data.

That post was in part about Facebook's use of tiering, which works well because Facebook has highly predictable data access patterns. McElroy's talk suggests that eBay's data accesses are somewhat predictable, but much less so than Facebook's. This makes his implication that tiering isn't a good long-term approach plausible.

District Dispatch: 3D and IP at VT

Fri, 2015-06-19 14:49

From Wikipedia

3D printers are finding their way into an ever-growing number of libraries, schools, museums and universities. Together, these institutions facilitate creative learning through 3D modeling, scanning and printing. Virginia Tech is one of the latest universities within this far-reaching “creative learning community” to build a makerspace replete with cutting-edge 3D technology.

The facility, located in the Resource Center at the Virginia Tech Northern Virginia Center in Fairfax, VA, is the brainchild of Kenneth Wong, Associate Dean of the Graduate School and Director of the Northern Virginia Center. Under the leadership of Associate Dean Wong and the day-to-day management of an expert team of library professionals led by Coordinator Debbie Cash, the facility is open to the public, giving all people the chance to bring their imaginations to life.

On Tuesday, as part of the Obama Administration’s “Week of Making,” the Northern Virginia Center held a 3D Printing Day. Dean Wong kindly invited me to kick off the day with a talk on 3D printing and intellectual property (IP). I spent the hour talking about the copyright, trademark and patent implications of 3D printing. Over the course of the ALA Office for Information Technology Policy’s work on 3D printing, we have exhorted library professionals to be undaunted by the specter of IP infringement. Our message has been a positive one: If we’re proactive in figuring out where our rights begin and end as users and providers of 3D printers, we can set the direction of the public policy that coalesces around 3D printing technology in the coming years.

From Flickr

Makerspaces like the one at Virginia Tech underscore just how imperative it is that we get out ahead of rights holders in setting the bounds of our 3D printing IP rights. Using the printers and scanners at Virginia Tech, Dean Wong built models of the human hand in an effort to design better prosthetics, and other users have created figurines, sculptures, and more. The sort of creativity this facility enables inspires hope for the future of connected learning. To ensure that such creativity can continue unfettered, we must be fearless in our approach to intellectual property.

IP law is not simply a series of stipulations that hamstring our ability to be creative; in the context of 3D printing, it represents something of a blank slate. Copyright, patent and trademark have yet to be interpreted in the context of 3D printing. As a result, we, as 3D printing leaders, can influence the efforts of lawmakers, regulators and the courts as they work to create frameworks for the use of this technology.

The ALA Office for Information Technology Policy’s 3D Printing Task Force works to do just that. The task force is dedicated to advancing policies that will allow Dean Wong and his Virginia Tech colleagues – and other “making enthusiasts” – to democratize creation and empower people of all ages to solve personal and community problems through 3D printing.

I would like to thank Kenneth Wong and Debbie Cash for the opportunity to speak at Virginia Tech, and to congratulate the Virginia Tech Northern Virginia Center for getting a state-of-the-art makerspace up and running. ALA wishes Dean Wong and his team all the best in their efforts to unlock opportunities for all through 3D printing, scanning and design.

The post 3D and IP at VT appeared first on District Dispatch.

Open Knowledge Foundation: What should we include in the Global Open Data Index? From reference data to civil society audit.

Thu, 2015-06-18 13:05

Three years ago we decided to begin to systematically track the state of open data around the world. We wanted to know which countries were the strongest and which national governments were lagging behind in releasing the key datasets as open data so that we could better understand the gaps and work with our global community to advocate for these to be addressed.

In order to do this, we created the Global Open Data Index, which was a global civil society collaboration to map the state of open data in countries around the world. The result was more than just a benchmark. Governments started to use the Index as a reference to inform their priorities on open data. Civil society actors began to use it as a tool to teach newcomers about open data and as advocacy mechanism to encourage governments to improve their performance in releasing key datasets.

Three years on we want the Global Open Data Index to become much more than a measurement tool. We would like it to become a civil society audit of the data revolution. As a tool driven by campaigners, researchers and advocacy organisations, it can help us, as a movement, determine the topics and issues we want to promote and to track progress on them together. This will mean going beyond a “baseline” of reference datasets which are widely held to be important. We would like the Index to include more datasets which are critical for democratic accountability but which may be more ambitious than what is made available by many governments today.

The 10 datasets we have now and their score in France

To do this, we are today opening a consultation on what themes and datasets civil society think should be included in the Global Open Data Index. We want you to help us decide on the priority datasets that we should be tracking and advocating to have opened up. We want to work with our global network to collaboratively determine the datasets that are most important to obtaining progress on different issues – from democratic accountability, to stronger action on climate change, to tackling tax avoidance and tax evasion.

Drawing inspiration from our chapter Open Knowledge Belgium's work running their own local open data census, we decided to conduct a public consultation. This public consultation will be divided into two parts:

Crowdsourced Survey – Using WikiSurvey, a platform inspired by kitten wars (and, as we all know, anything inspired by viral kittens cannot be bad), we want to know which datasets you think are most important. The platform is simple: of each pair of datasets presented, just choose the one you see as the higher priority to include in the Global Open Data Index. Can't find a dataset that you think is important? Add your own idea to the pool. There is no vote limit, so vote as much as you want and shape the index. SUBMIT YOUR DATA NOW

Our Wiki Survey

 

Focused consultation with civil society organisations – This survey will be sent to a group of NGOs working on a variety of issues to find out which specific datasets they think are needed and how they can be used. We will add ideas from the survey to the general pool as they come in. Want to answer the survey as well? You can find it here.

This public consultation will be open for the next 10 days and will close on June 28th. At the end of the process we will analyse the results and share them with you.

We hope that this new process that we are starting today will lead to an even better index. If you have thoughts about the process, please share them with us in our new forum topic: https://discuss.okfn.org/c/open-data-index

Peter Murray: Thursday Threads: Let’s Encrypt is coming, Businesses want you coming to the office, OR2015 Summary

Thu, 2015-06-18 10:47

This week’s threads:

Funding for my current position at LYRASIS runs out at the end of June, so I am looking for new opportunities and challenges for my skills. Check out my resume/c.v. and please let me know of job opportunities in library technology, open source, and/or community engagement.

Feel free to send this to others you think might be interested in the topics. If you find these threads interesting and useful, you might want to add the Thursday Threads RSS Feed to your feed reader or subscribe to e-mail delivery using the form to the right. If you would like a more raw and immediate version of these types of stories, watch my Pinboard bookmarks (or subscribe to its feed in your feed reader). Items posted to Pinboard are also sent out as tweets; you can follow me on Twitter. Comments and tips, as always, are welcome.

Let’s Encrypt Launch Schedule

Let’s Encrypt has reached a point where we’re ready to announce our launch schedule.

  • First certificate: Week of July 27, 2015
  • General availability: Week of September 14, 2015
Let’s Encrypt Launch Schedule, by Josh Aas, 16-Jun-2015

As you might recall from an earlier edition of DLTJ Thursday Threads, the Let's Encrypt initiative will allow anyone who has a domain name to get an encryption certificate at no cost. Not only that, but the effort is also building software to automatically create, update, install, and securely configure those certificates. This will make it very easy for small sites — like libraries, archives, and museums — to use HTTPS-encrypted connections. There has been a great deal of talk within the library patron privacy community about how best to make this happen, including a proposal by Eric Hellman for a “Digital Library Privacy Pledge” that would encourage libraries to adopt encrypted web connections across all of their services. Keep your eye out for more about “Let’s Encrypt.”

Five trends that are reshaping your office

But lots of companies wrestling with how to get people to show their face at work, in an era where telecommuting is increasingly popular, are trying to lure them back rather than mandate it. While organizations have long embraced the benefits of “hoteling,” where employees reserve desks for themselves rather than getting a dedicated space to work every day, many are taking that concept even further, adding concierge-like staff and other perks to give workers more reasons to come onsite.

Five trends that are reshaping your office, by Jena McGregor, Washington Post, 15-Jun-2015

I’m not sure this applies to many of our offices, but it is useful to know that these things are happening. As someone who has worked remotely for the past five years, I don’t know if these kinds of perks from my employer would get me to come into an office more. It is hard to beat face-to-face interaction for its power to convey information and build community. We are using tools like Slack to reproduce that kind of interaction as best we can, and the tools are getting better at making it easier for remote teams to form cohesion and effectively get work done.

Open Repositories 2015 Summary

Technology take away from #OR2015 is the Hydra-Fedora 4 stack is shaping up to be very impressive; plus ambitious plans from #DSpace camp

— Open Repository (@OpenRepository) June 11, 2015

Tweet from @OpenRepository, as quoted by Hardly Pottinger in his 2015 Recap

That tweet is a summary of what happened at Open Repositories 2015 last week, and Hardly’s summary matches what I heard about the conference activities from afar. The keynote from Google Scholar’s Anurag Acharya on pitfalls and best practices for indexing repository content was a big hit. His slides are online, as are a collection of tweets curated by Eileen Clancy, and I highly recommend that software developers and repository users look over these do’s and don’ts for their own systems.


DuraSpace News: Amy Brand Named New Director of the MIT Press

Thu, 2015-06-18 00:00

From Peter Dizikes, MIT News Office

Cambridge, MA: The MIT Press has named Amy Brand PhD ’89, an executive with a wide array of experience in academic publishing and communications, as its new director. She will begin in this position on July 20.

DuraSpace News: VIVO 2015 Update: Early Registration Deadline is Tomorrow; Conference Program Released!

Thu, 2015-06-18 00:00

Early Bird Registration for the VIVO 2015 Conference Ends Friday... Don't Delay! Registration is open and the lowest registration rate is only available through tomorrow, June 19th. Register online today.

The $375 Early Bird registration rate is only available through June 19th.

DuraSpace News: Ten Years After: Open Repositories Conferences Fulfill the Promise of “Open”

Thu, 2015-06-18 00:00

From Carol Minton Morris, DuraSpace; OR Steering Committee

Ranti Junus: The awesome things @ Michigan State University

Wed, 2015-06-17 23:28

Some time ago I read about going out and learning about your own surroundings. Sorry, I’m completely blank on the actual source and whether I read it in one of those motivational emails, tweets, websites, or image memes. The point is, we should not stay inside our own bubble.

How much do we actually know about the awesome services or initiatives available in our own library or within other units on campus? I only know a little, to be honest. Many times I found out about a cool collection in the library because somebody mentioned it, a local newspaper wrote about it, or it appeared in the newsletter sent to library supporters. Kinda embarrassing, but, hey, better late than never. Same thing with many initiatives happening around campus. With so many units established on campus, I am sure I miss many of them. But I would like to highlight several:

First, MSU Libraries is gathering text and data aimed at digital humanities (DH) projects, either through our own digital collections or by collaborating with vendors. It all started with a request from a research faculty member wanting to work on a topic that required Congressional data. This collaboration prompted our Digital Humanities librarians to pursue other text and data collections that we could offer to our users (and, in some cases, to the public).

MSU Libraries Digital Humanities text and data collections

Another one that I’d like to highlight is Enviro-weather, a weather-based tool for Michigan agriculture’s pest, natural resource, and production management decisions. This is a collaborative project between the Michigan Climatological Resources Program and the MSU Integrated Pest Management Program. Each yellow dot on the map represents an agriculture station. If you hover over a dot with your cursor, you’ll see the latest weather data pulled from the weather stations positioned around the state. Click on the dot and you’ll see more complete information on the area. You could, of course, go further and get the raw data itself by going to their Enviro-Weather Automated Weather Station Network site.

Michigan State University Enviro-Weather tool

The Geographic Information System (GIS) unit on campus created cool and useful GIS-based applications to showcase the MSU campus. My two favorite applications are below:

The Historical Imagery application provides aerial photography of the MSU campus from 1938 to 2010 (I hope they add more for later years). While interacting with the application, I, of course, couldn’t resist checking the area where the current MSU Libraries building is located. By moving the slider slowly, I could see the changes from an empty lot to its current structure. Not all images are available; sometimes you get an empty section due to image unavailability. Still, it’s really cool to see the changes that happened during the last 60 years or so.

Michigan State University GIS Historical Imagery. The round construction in middle is the Spartan Stadium.

The Environmental Stewardship application (which, unfortunately, requires Adobe Flash Player 11 or higher) allows one to check energy consumption and/or waste reduction efforts around campus. You can pick a building and generate a report based on the data for the current or a past fiscal year. The information is available for the public to see and download, owing to MSU’s status as a public, land-grant university; the application allows the public to inspect and interact with the information themselves.

Michigan State University Environmental Stewardship map

There are more great projects and initiatives around campus like the ones I highlighted above. It would be nice if I could do a “cool stuff on campus” search on the university website instead of relying on serendipity. But, hey, I probably should go around and ask instead. :-)

Terry Reese: Working with the Clipboard on OSX

Wed, 2015-06-17 22:23

Coming from the Windows and Linux world, the object that data is copied to and pasted from is called the Clipboard.  Not so in OSX.  In OSX, this is referred to as the NSPasteboard.  Should you need to get string data on and off of it, use the following:

 

// The type declared for plain string data on the pasteboard.
private static string[] pboardTypes = new string[] { "NSStringPboardType" };

// Place a string on the general (system-wide) pasteboard.
public void SetClipboardText(string text)
{
    NSPasteboard.GeneralPasteboard.DeclareTypes(pboardTypes, null);
    NSPasteboard.GeneralPasteboard.SetStringForType(text, pboardTypes[0]);
}

// Read a string back from the general pasteboard.
public string GetClipboardText()
{
    return NSPasteboard.GeneralPasteboard.GetStringForType(pboardTypes[0]);
}

–tr

LITA: Jobs in Information Technology: June 17, 2015

Wed, 2015-06-17 20:17

New vacancy listings are posted weekly on Wednesday at approximately 12 noon Central Time. They appear under New This Week and under the appropriate regional listing. Postings remain on the LITA Job Site for a minimum of four weeks.

New This Week

Special Collections Audiovisual Archivist, University of Arkansas, Fayetteville, AR

Systems and Technology Librarian, Catawba College, Salisbury, NC

Visit the LITA Job Site for more available jobs and for information on submitting a job posting.

Zotero: New Outreach Coordinator

Wed, 2015-06-17 19:15

We’re happy to announce that Alyssa Fahringer has joined the Zotero team as our new outreach coordinator. Alyssa is currently a Ph.D. student in George Mason’s Department of History and Art History studying U.S. history, women and gender, and digital history. She has years of experience working closely with university library communities, and of course she’s an active researcher herself. Take it away, Alyssa!

I am excited to begin as the outreach coordinator for Zotero! I have worked as an intern at the Library of Virginia and in the Collections Department of Hillman Library at the University of Pittsburgh. I have also been employed as a public librarian. When I complete the Ph.D. program I am hoping to work as a digital public historian. My goal as outreach coordinator is to make Zotero more accessible and user-friendly by updating existing documentation, creating new documentation for improved functionalities, and managing Zotero’s social media presence. I am looking forward to working with the Zotero community and increasing public awareness of Zotero.
