Uncategorized

Querying the Linked Data Cloud for African Countries (Quirk #1)

Posted in Uncategorized on June 14th, 2013 by admin

I’m currently embarking on an EDA project to build a graph of currently relevant data about the African continent, its countries, and its cities.  Other projects may have done this already, but my goal is to go through the process first: discover resources along the way, hopefully run into some challenges with the data that I can learn to work around, and then look for people with better solutions.

The motivation for this work is that MIT has made having a positive impact in Africa a high priority.  At the MIT Libraries, we hold a great deal of research output, some of which applies specifically to issues and topics in African countries.  The question is whether that research is actually being seen by the people in those areas who would benefit most from it.  The first step is to build a reliable data-set representing the continent, its countries, and its cities, along with metadata about each.

Here’s one data quirk, and an example of the type of thing one has to deal with when using linked data in general.

The query below returns 99 countries in Africa, but there are actually only 54-56 at present*. Among the results are things I don’t want back, such as entities that are no longer relevant to modern geopolitical questions, e.g.

{
 "Country": { "type": "literal", "xml:lang": "en", "value": "Roman Empire" }
 }

Here’s the actual query, for the curious. I’m running it on FactForge, which integrates roughly eight LOD data-sets, including DBpedia, GeoNames, NYT, the CIA World Factbook, etc.

PREFIX ff: <http://factforge.net/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dbpedia: <http://dbpedia.org/resource/>
PREFIX pext: <http://www.ontotext.com/proton/protonext#>
PREFIX ptop: <http://www.ontotext.com/proton/protontop#>
SELECT DISTINCT ?Country
WHERE {
  ?Coun ff:preferredLabel ?Country ;
        rdf:type pext:Country ;
        ptop:subRegionOf dbpedia:Africa .
  FILTER ( LANG(?Country) = "en" )
}

* Easily distinguishing the current representation of the world from the historical one, and extracting a subset of either, would require properties indicating current relevance, e.g. (in shorthand) dbpedia:Roman_Empire dbpedia:current "false"^^xsd:boolean .
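Until a property like that exists, one stopgap is to filter on the client side. The Python sketch below reads a response in the W3C SPARQL 1.1 JSON results format (the same binding shape as the Roman Empire snippet above), keeps English labels just as the query's FILTER does, and then drops anything in a hand-maintained exclusion set. The helper names `country_labels` and `drop_historical`, the sample response, and the exclusion set are all hypothetical illustrations, not FactForge output.

```python
import json


def country_labels(results_json: str, var: str = "Country") -> list[str]:
    """Extract English literal values bound to `var` from SPARQL JSON results."""
    doc = json.loads(results_json)
    labels = []
    for binding in doc["results"]["bindings"]:
        term = binding.get(var)
        # Mirror FILTER ( LANG(?Country) = "en" ): keep English literals only.
        if term and term.get("type") == "literal" and term.get("xml:lang") == "en":
            labels.append(term["value"])
    return labels


def drop_historical(labels: list[str], historical: set[str]) -> list[str]:
    """Remove labels found in a hand-maintained set of non-current entities."""
    return [label for label in labels if label not in historical]


# Made-up response in the SPARQL JSON results shape (not real FactForge output).
SAMPLE = json.dumps({
    "head": {"vars": ["Country"]},
    "results": {"bindings": [
        {"Country": {"type": "literal", "xml:lang": "en", "value": "Kenya"}},
        {"Country": {"type": "literal", "xml:lang": "en", "value": "Roman Empire"}},
    ]},
})

labels = country_labels(SAMPLE)
print(drop_historical(labels, {"Roman Empire"}))  # ['Kenya']
```

A hand-kept exclusion list obviously doesn't scale; it only papers over the missing "current" flag until the data itself carries one.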

Parsing MARC/XML in R

Posted in Uncategorized on March 6th, 2013 by admin

After a bit of fighting with R’s XPath configuration, I was able to write a simple but fairly reliable function that pulls all instances of a field/subfield value into a data frame…
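The R function itself isn’t reproduced here. As a rough analogue, here is a Python sketch using only the standard library’s ElementTree. It assumes MARCXML in the MARC 21 slim namespace and returns (record index, value) pairs, which could be handed to a data-frame library if one is available. The function name `subfield_values` and the sample records are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace used by MARCXML (MARC 21 slim schema).
MARC_NS = {"marc": "http://www.loc.gov/MARC21/slim"}


def subfield_values(marcxml: str, tag: str, code: str) -> list[tuple[int, str]]:
    """Return (record index, value) for every subfield `code` of every field `tag`."""
    root = ET.fromstring(marcxml)
    # Accept either a <collection> of records or a single <record>.
    if root.tag.endswith("collection"):
        records = root.findall("marc:record", MARC_NS)
    else:
        records = [root]
    rows = []
    for i, record in enumerate(records):
        for field in record.findall(f"marc:datafield[@tag='{tag}']", MARC_NS):
            for sub in field.findall(f"marc:subfield[@code='{code}']", MARC_NS):
                rows.append((i, sub.text or ""))
    return rows


# Made-up two-record collection for illustration.
SAMPLE = """<collection xmlns="http://www.loc.gov/MARC21/slim">
  <record>
    <datafield tag="650" ind1=" " ind2="0"><subfield code="a">Linked data.</subfield></datafield>
    <datafield tag="650" ind1=" " ind2="0"><subfield code="a">Semantic Web.</subfield></datafield>
  </record>
  <record>
    <datafield tag="650" ind1=" " ind2="0"><subfield code="a">Libraries.</subfield></datafield>
  </record>
</collection>"""

print(subfield_values(SAMPLE, "650", "a"))
```

Passing the namespace map to `findall` avoids most of the XPath namespace fiddling, since the default MARCXML namespace otherwise makes unprefixed paths match nothing.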

On MOOCs, Libraries, and Disruptive Models

Posted in Uncategorized on March 6th, 2013 by admin
“Institutions of higher learning must move, as the historian Walter Russell Mead puts it, from a model of ‘time served’ to a model of ‘stuff learned.’ Because increasingly the world does not care what you know. Everything is on Google. The world only cares, and will only pay for, what you can do with what you know. And therefore it will not pay for a C+ in chemistry, just because your state college considers that a passing grade and was willing to give you a diploma that says so. We’re moving to a more competency-based world where there will be less interest in how you acquired the competency — in an online course, at a four-year-college or in a company-administered class — and more demand to prove that you mastered the competency.”

One might apply this same disruptive model to libraries.  Libraries that remain purveyors of the same historical services, moving at the same historically slow and cautious pace, now face Google, linked open library catalogs, and data-science start-ups that are quickly subsuming the information-management tasks libraries have historically considered their core roles.  To remain relevant, we need to place more emphasis on integration, experimentation, and openness, in pursuit of collaboration with our users.

“Openness is a survival instinct.”

Recording An Event Happening 3000 Miles Away at 3AM (#SWIB12)

Posted in Uncategorized on November 27th, 2012 by admin

Boston to Cologne

In case anyone is curious, here are the details of how I’m archiving, for future reference, the Flash-based conference stream from Semantic Web in Libraries (#SWIB12), happening in Cologne, Germany.  The conference organizers generously set up a video feed so that Semantic Web professionals who were unable to travel to the event can still benefit from the great presentations there.

I’m using a tool called RTMPDump, which takes a stream URL and a Flash Player URL and writes the contents of the stream to a local .flv file.

Following the instructions here, I was able to install the tool without issue (OS/X Mountain Lion) and test the configuration for the SWIB12 stream.  The example command didn’t work out of the gate: some command-line options were order-dependent and needed particular quoting, but with a little help from the man page, I got a command that worked.

$ rtmpdump --live -r rtmp://62.113.221.5/servicevideotv-live/livestream -W "http://www.blitzvideoserver06.de/blitzvideoplayer6.swf" --flv "/Users/sands/Movies/SWIB12_day2_1.flv"

-r:  The actual RTMP stream URL

-W: The URL of the Flash Player

--flv: The local file to output to

--live: Specifies that this is a live stream rather than a static remote video file.  (I’m guessing this helps with closing out the file when the stream is interrupted.)

Once the stream is up and running (which means waiting until 3am EST), I can run the command above and get a clear readout of the connection, parameters, and capture progress…

Now all I have to do is make sure I’m awake when the stream drops for the lunch break, so I can restart the capture once it’s back online and archive the second half of the day.  O.o
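Restarting the capture by hand could also be scripted. Below is a minimal Python sketch, assuming `rtmpdump` is on the PATH and accepts the same options as the command above. It reruns the capture a fixed number of times, waits between attempts, and writes each attempt to a new numbered .flv file so an earlier part is never overwritten. The helpers `capture`, `part_path`, and `rtmpdump_command`, plus the attempt and delay defaults, are hypothetical choices, not what was used for the actual SWIB12 recording.

```python
import subprocess
import time
from pathlib import Path


def part_path(base: Path, part: int) -> Path:
    """SWIB12_day2.flv -> SWIB12_day2_1.flv, SWIB12_day2_2.flv, ..."""
    return base.with_name(f"{base.stem}_{part}{base.suffix}")


def rtmpdump_command(stream_url: str, player_url: str, out_file: Path) -> list[str]:
    """Argument list equivalent to the shell command above (no shell quoting needed)."""
    return ["rtmpdump", "--live", "-r", stream_url, "-W", player_url, "--flv", str(out_file)]


def capture(stream_url: str, player_url: str, base: Path,
            attempts: int = 10, retry_delay: float = 60.0) -> None:
    """Rerun rtmpdump up to `attempts` times, one numbered output file per attempt."""
    for part in range(1, attempts + 1):
        out_file = part_path(base, part)
        result = subprocess.run(rtmpdump_command(stream_url, player_url, out_file))
        print(f"{out_file.name}: rtmpdump exited with code {result.returncode}")
        if part < attempts:
            time.sleep(retry_delay)  # wait for the stream to come back (e.g. after lunch)
```

For example, calling `capture` with the stream and player URLs from the command above and `Path("/Users/sands/Movies/SWIB12_day2.flv")` would write SWIB12_day2_1.flv, SWIB12_day2_2.flv, and so on. Passing the arguments as a list sidesteps the shell-quoting trouble entirely. Numbering every attempt keeps the logic simple, at the cost of possibly leaving empty or tiny files from attempts made while the stream was down.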

Arlington Tornado/Microburst – July 18th, 2012

Posted in Uncategorized on July 19th, 2012 by admin

I spent the night driving around the damage on my motorcycle taking long-exposure shots of the aftermath.

The full set can be found here:  http://www.flickr.com/photos/sandsfish/sets/72157630651535854/

More about the event can be found here and here.

MIT VIVO Ontology Mapping – Grants and Publications

Posted in Uncategorized on September 15th, 2011 by admin

My colleague Sean Thomas and I are currently evaluating the open-source Semantic Web application VIVO (http://vivoweb.org/) for MIT.  Our latest investigation is how best to model grant data and relate it to other entities in the system, specifically authors and publications.

The goal is a better grasp of compliance with NIH (National Institutes of Health) publication requirements, specifically those involving PubMed and PubMed Central.

Taking some cues from how the University of Florida and Cornell have modeled this type of data, and sketching our own pseudo-ontological mappings for entities they haven’t mapped, we’ve come up with a first draft of how we imagine this will look (below).  I’m sure this will evolve, and I haven’t double-checked it for accuracy against the VIVO Core ontology, but it is at least a start…

NIH Grant Modeling for MIT VIVO

UTC Clock Hack for OS/X 10.6

Posted in Uncategorized on May 26th, 2010 by admin

If you work on open source projects or in distributed work environments where meetings are scheduled and publicized in UTC, you’ll find yourself doing the time zone conversion repeatedly.

If you run OS/X, you can use the World Clock widget to keep track easily.  The problem is that the World Clock does not include UTC as an option by default.  Fortunately, it is easy to add.

I used this article to make the modifications and wasn’t sure the method would work on OS/X 10.6, but it worked without a single issue.  There is some iffy information in the comments of that article, so here is how I proceeded successfully:

1.)  Crack open the WorldClock widget

Applications and widgets in OS/X are not single files, as they initially appear, but bundles: special container folders that hold other files.

  • Navigate to /Library/Widgets
  • Right-click (or Control-click) on the WorldClock.wdgt file
  • Click Show Package Contents

This will open a new Finder window displaying the contents of the World Clock widget.

2.)  Edit the WorldClock.js file

If you’re not used to doing these things, this isn’t as intimidating as it might seem.  (Make a backup copy if you’re disaster-prone.)

  • Find the line with the text:  var Europe = [
  • Find the end of this block, which is terminated by ];
  • On the line before it, add:  {city:'UTC', offset:0, timezone:'UTC'}
  • Be sure to add a comma to the end of the previous line, since it is no longer the last item.
  • Save the file.  (You may have to authenticate to do this, since it is technically a system file.)

Modified WorldClock.js
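The edit in step 2 can also be scripted. Here is a Python sketch that makes several assumptions: the array opens with the exact text `var Europe = [`, it closes with the first `];` after that, and the entry key is `offset` like the neighboring entries. It appends the UTC entry as the last item, adds the comma to the previous line, and leaves the text unchanged if a UTC entry is already present. The function name `add_utc_to_europe` is hypothetical, and the sample text is a made-up excerpt, not the real contents of Apple’s WorldClock.js. Back up the file before running anything like this against it.

```python
# Entry to add; assumes the file's other entries use the key "offset".
UTC_ENTRY = "{city:'UTC', offset:0, timezone:'UTC'}"


def add_utc_to_europe(js_source: str, entry: str = UTC_ENTRY) -> str:
    """Return js_source with `entry` appended as the last item of the Europe array."""
    start = js_source.index("var Europe = [")
    end = js_source.index("];", start)  # first closing "];" after the opener
    body = js_source[start:end].rstrip()
    if "city:'UTC'" in body:
        return js_source  # already patched; leave the text unchanged
    # The previous last item needs a trailing comma now (unless the array is empty).
    separator = "" if body.endswith((",", "[")) else ","
    return js_source[:start] + body + separator + "\n" + entry + "\n" + js_source[end:]


# Made-up excerpt in the same shape as the file's city lists.
SAMPLE = """var Europe = [
{city:'London', offset:0, timezone:'Europe/London'},
{city:'Lisbon', offset:0, timezone:'Europe/Lisbon'}
];
var Asia = [
];
"""

patched = add_utc_to_europe(SAMPLE)
print(patched)
```

Making the function a no-op on already-patched text means it is safe to run twice.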

3.)  Edit the localizedStrings.js file

There is a language localization file for all strings that appear in the widget’s interface.

  • Find the directory that matches the language you are using (I modified English.lproj/localizedStrings.js)
  • At the bottom of the file, add the line:  localizedCityNames['UTC'] = 'UTC';
  • Save the file.

Modified localizedStrings.js

4.)  Add the World Clock widget to your Dashboard & Configure

The widget will have to be initialized with the new changes, so…

  • If you have an old one, remove it.
  • Add the widget to your Dashboard again.
  • Click the “i” in the bottom-right corner to configure it.
  • Choose Country:  Europe, City:  UTC

And there you go, a clock reading the UTC time…

How to deploy a Maven Parent POM

Posted in Uncategorized on May 14th, 2010 by admin

mvn -N deploy

Maven help defines the “-N” option as:

-N,--non-recursive    Do not recurse into sub-projects

Running this command in the same directory as the parent POM (pom.xml) deploys just the parent, preventing Maven from traversing all of the sub-projects it references.

(This all presumes that you have the correct prerequisites in place for the deploy phase, such as proper ~/.m2/settings.xml configuration and <distributionManagement> configuration present in said pom.xml.)

Thanks to Elliot Metsger of the DSpace community for this tip.

SIMILE Longwell Command Execution

Posted in Uncategorized on April 30th, 2010 by admin

Here’s my high-level analysis of how Longwell commands are executed via HTTP calls.  The calls most obvious to the end user are those triggered by mouse clicks in the browser, but a large number of these command calls are made within Longwell itself, as a sort of local loop-back call that happens after the user triggers something via the UI (e.g. expanding a facet or changing the current view).

Longwell command diagram

Please comment if you find any errors in the interpretation of the call structure.