Multi-Mode Visualizations and How To Evaluate Them

5th Feb. ’14

I recently had a discussion with some colleagues about high levels of visual complexity in data visualizations.  It started around a few questions:

Can a graph (in the network diagram sense) have axes?

Can a network visualization represent qualitative and/or quantitative information?

Is it still a network visualization if it includes other components outside of vertices and edges?

Examples to consider were this combination of a network and x/y axis and the early version of an influence network timeline I am developing.

Some quick notes on these:  The former uses the axes as a coordinate system to identify nodes without overlaying labels on an already dense layout.  (I actually quite like this method, though it would probably be even more effective if it were interactive, with mouse-over identification, etc.)  It is also important to point out that it does not imply additional qualitative or quantitative properties, other than a simple label.  The latter is technically a path graph, whose qualities are constrained much more than those of an average network graph.  So these are, in some ways, exceptions, though good fodder for the conversation.

Those questions continue to be debated and I think they are good ones.  Our conversation turned at one point into a question of whether visualizations with a large amount of complexity, or unconventional visual structures, are really worth using.

One problem with the standards used to evaluate visualizations is that they assume a single audience.  I agree that if you are creating a visualization for a single audience (i.e. “everyone everywhere that might see this”), having to explain in long form how the visual mechanisms work is kind of ridiculous, and not very effective (the bounce rate is probably very high at the mere sight of a paragraph, never mind after the first few sentences).

However, there are different tiers of audiences for any given visualization design.  A high-level view might be fine when the structure or point is obvious, and further detail (whether it is simply there, or enabled through interactivity) might help explain why the higher level looks the way it does.  Maybe there’s an even more refined analytical level or series of levels beyond that.  (The raw data would be the basement floor for any visualization.)  There are certainly ways to provide different sets of information depending on the interest level of the viewer.

But also, I think too many visualizations are discarded as too complex or opaque when they might be extremely useful to a small set of people.  I don’t think we should write off a particular form because the general public can’t make sense of it in a short amount of time.  These debates typically take place in public, but a visualization should be valued by taking into account who the audience is, what the utility of the visualization is, the qualities and nature of what is represented, and how broad or narrow the underlying data and phenomena are.  An extremely technical visualization that would boggle the mind of the unfamiliar may have extreme utility and efficiency for someone intimately familiar with the task(s) and data at hand.

What do you think are other criteria to evaluate data visualizations, and how does one provide multi-faceted utility in a single form?

 

More food for thought on my “Graphs && Networks” and “Data Viz” Pinterest boards.

Posted in Data, Data Visualization, Uncategorized

A Random Walk of Linked Data

19th Nov. ’13

Random Walking (Wikipedia)

I’ve been building an algorithm (and more development and open-sourcing to come) to do a random walk of the linked data in Wikipedia via DBpedia’s SPARQL endpoint.  It’s fascinating to watch it crawl, and how close some seemingly disparate concepts are on the graph.  Here’s a run from today.  Hello World to Betty Boop!  :)

 

frontal-lobe:RubyRDF sands$ ./linked_data_random_walker.rb "Hello World"

Seeding with Hello World…

Found Hello World, Hello world program

 

Pulling http://dbpedia.org/resource/Hello_world_program...

Pulling http://dbpedia.org/resource/Hello,_world...

Pulling http://dbpedia.org/resource/Hello_world_program...

Pulling http://dbpedia.org/resource/List_of_Yoku_Wakaru_Gendai_Mah%C5%8D_episodes...

Pulling http://dbpedia.org/resource/Deus_ex_machina...

Pulling http://dbpedia.org/resource/Category:Latin_literary_phrases...

Pulling http://dbpedia.org/resource/Gradus_ad_Parnassum...

Pulling http://dbpedia.org/resource/Alfred_Mann_(musicologist)...

Pulling http://dbpedia.org/resource/Musicology...

Pulling http://dbpedia.org/resource/Category:Aesthetics...

Pulling http://dbpedia.org/resource/The_arts_and_politics...

Pulling http://dbpedia.org/resource/Category:Arts...

Pulling http://dbpedia.org/resource/Category:Arts-related_lists...

Pulling http://dbpedia.org/resource/Index_of_articles_related_to_sound_art...

Pulling http://dbpedia.org/resource/List_of_topics_related_to_Sound_Art...

Pulling http://dbpedia.org/resource/Category:Arts-related_lists...

Pulling http://dbpedia.org/resource/Category:Science_fiction_lists...

Pulling http://dbpedia.org/resource/List_of_science_fiction_television_programs,_H...

Pulling http://dbpedia.org/resource/Category:Science_fiction_lists...

Pulling http://dbpedia.org/resource/List_of_science_fiction_television_programs,_U...

Pulling http://dbpedia.org/resource/Category:Science_fiction_lists...

Pulling http://dbpedia.org/resource/List_of_science_fiction_television_programs,_G...

Pulling http://dbpedia.org/resource/Category:Science_fiction_lists...

Pulling http://dbpedia.org/resource/Category:Star_Trek_lists...

Pulling http://dbpedia.org/resource/Star_Trek_crossovers...

Pulling http://dbpedia.org/resource/Category:Star_Trek_characters...

Pulling http://dbpedia.org/resource/Category:Starfleet_officers...

Pulling http://dbpedia.org/resource/Jenna_D’Sora...

Pulling http://dbpedia.org/resource/Category:Fictional_characters_introduced_in_1991...

Pulling http://dbpedia.org/resource/Klim_Dokachin...

Pulling http://dbpedia.org/resource/Category:Fictional_extraterrestrial_characters...

Pulling http://dbpedia.org/resource/Shran...

Pulling http://dbpedia.org/resource/Category:Fictional_extraterrestrial_characters...

Pulling http://dbpedia.org/resource/Nero_(Star_Trek)...

Pulling http://dbpedia.org/resource/Category:Romulans...

Pulling http://dbpedia.org/resource/Category:Star_Trek_races...

Pulling http://dbpedia.org/resource/Gorn_(Star_Trek)...

Pulling http://dbpedia.org/resource/Category:Fictional_warrior_races...

Pulling http://dbpedia.org/resource/Claymore_(manga)...

Pulling http://dbpedia.org/resource/Category:Adventure_anime_and_manga...

Pulling http://dbpedia.org/resource/Kurokami_Captured...

Pulling http://dbpedia.org/resource/Category:Manga_series...

Pulling http://dbpedia.org/resource/Et_Cetera...

Pulling http://dbpedia.org/resource/Tow_Nakazaki...

Pulling http://dbpedia.org/resource/Et_Cetera...

Pulling http://dbpedia.org/resource/Tokyopop...

Pulling http://dbpedia.org/resource/D.N.Angel...

Pulling http://dbpedia.org/resource/D%E2%80%A2N%E2%80%A2Angel...

Pulling http://dbpedia.org/resource/D.N.Angel...

Pulling http://dbpedia.org/resource/Monthly_Asuka...

Pulling http://dbpedia.org/resource/Neon_Genesis_Evangelion:_Angelic_Days...

Pulling http://dbpedia.org/resource/Romantic_comedy_film...

Pulling http://dbpedia.org/resource/Double_Inconstancy...

Pulling http://dbpedia.org/resource/France...

Pulling http://dbpedia.org/resource/Lessard-et-le-Ch%C3%AAne...

Pulling http://dbpedia.org/resource/France...

Pulling http://dbpedia.org/resource/Amen....

Pulling http://dbpedia.org/resource/Category:French_films...

Pulling http://dbpedia.org/resource/The_Last_Billionaire...

Pulling http://dbpedia.org/resource/Category:1935_films...

Pulling http://dbpedia.org/resource/Betty_Boop_with_Henry,_the_Funniest_Living_Ameraican...
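As a rough illustration of the kind of walk logged above, here is a minimal Python sketch. It is not the Ruby script from this post: the function names, the `neighbors` callback, and the shape of the SPARQL query are all assumptions made for the example. The log revisits resources (for example `Category:Science_fiction_lists`), so the sketch does not keep a visited set either.

```python
import random
from typing import Callable, List, Optional, Sequence

# Hypothetical query shape: every IRI the current resource points at.
# The real walker may well follow incoming links too.
NEIGHBOR_QUERY = """
SELECT DISTINCT ?next WHERE {{
  <{uri}> ?p ?next .
  FILTER(isIRI(?next))
}} LIMIT 200
"""


def build_neighbor_query(uri: str) -> str:
    """Fill the (assumed) neighbor query template for one resource URI."""
    return NEIGHBOR_QUERY.format(uri=uri)


def random_walk(
    seed: str,
    neighbors: Callable[[str], Sequence[str]],
    steps: int,
    rng: Optional[random.Random] = None,
) -> List[str]:
    """Walk up to `steps` hops from `seed`, picking a uniformly random neighbor each hop."""
    rng = rng or random.Random()
    path = [seed]
    current = seed
    for _ in range(steps):
        options = neighbors(current)
        if not options:  # dead end: nothing to follow
            break
        current = rng.choice(options)
        path.append(current)
    return path


# Toy in-memory graph (made up) so the sketch runs without a network connection.
TOY_GRAPH = {"A": ["B", "C"], "B": ["A", "C"], "C": ["A"]}

for uri in random_walk("A", lambda u: TOY_GRAPH.get(u, []), steps=5):
    print(f"Pulling {uri}...")
```

To walk DBpedia itself, `neighbors` would send `build_neighbor_query(uri)` to the SPARQL endpoint and return the `?next` bindings; that part is left out so the sketch stays runnable offline.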

Posted in Data, Linked Data, Programming, Semantic Web

Querying the UDFR with SPARQL

6th Oct. ’13

The UDFR is the “Unified Digital Format Registry”, a semantic registry for digital preservation. It is an extremely useful tool for specifying file formats and being able to address them unambiguously across domains and repositories. While working with Helen Bailey, an MIT Library Fellow for Digital Curation and Preservation, we needed a better sense of the way data was stored in the UDFR, how best to query it, and how we might integrate it into our own work.

Since the “Start Here” link on http://udfr.org/ drops you unceremoniously into an unintuitive interface, I thought it would be worth writing up some of the basic steps and queries I used to get to the meat of this service.

“Start Here” puts you into the root of their OntoWiki instance, which only offers a list of Knowledge Base objects, and a list of classes of resources available. One can click through these classes in a sort of hierarchical navigation, but it leaves a lot to be desired in terms of UX and conveying the structure of the data model.

Personally, I would much rather simply query my way through the triple-store to understand how classes link to one another, and how they look in their actual data representation. This allows one, in a way, to see how these resources might be used in a local data store and how they might be integrated with one’s own resources. The query interface is a bit hidden, but if you click on the “Extras” in the menu at the top-left of the page, and select “Queries”, you will be directed to an interface that will allow you to compose SPARQL queries, and get the results back in a number of formats.

The predefined namespaces in this interface are the first clue to how the registry describes file formats and their related entities. Here they are, converted to SPARQL PREFIX format if you want to use them in other endpoints:

 PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
 PREFIX owl: <http://www.w3.org/2002/07/owl#>
 PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
 PREFIX foaf: <http://xmlns.com/foaf/0.1/>
 PREFIX sioc: <http://rdfs.org/sioc/ns#>
 PREFIX sysont: <http://ns.ontowiki.net/SysOnt/>
 PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
 PREFIX text: <http://purl.org/NET/mediatypes/text/>
 PREFIX multipart: <http://purl.org/NET/mediatypes/multipart/>
 PREFIX mime: <http://purl.org/NET/mediatypes/>
 PREFIX id: <http://reference.data.gov.uk/id/>
 PREFIX madsrdf: <http://www.loc.gov/mads/rdf/v1#>
 PREFIX dcam: <http://purl.org/dc/dcam/>
 PREFIX msg: <http://purl.org/NET/mediatypes/message/>
 PREFIX audio: <http://purl.org/NET/mediatypes/audio/>
 PREFIX app: <http://purl.org/NET/mediatypes/application/>
 PREFIX dct: <http://purl.org/dc/terms/>
 PREFIX model: <http://purl.org/NET/mediatypes/model/>
 PREFIX video: <http://purl.org/NET/mediatypes/video/>
 PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
 PREFIX wot: <http://xmlns.com/wot/0.1/>
 PREFIX dc: <http://purl.org/dc/elements/1.1/>
 PREFIX vann: <http://purl.org/vocab/vann/>
 PREFIX ns: <http://www.w3.org/2006/vcard/ns#>
 PREFIX vs: <http://www.w3.org/2003/06/sw-vocab-status/ns#>
 PREFIX image: <http://purl.org/NET/mediatypes/image/>
 PREFIX udfrs: <http://udfr.org/onto#>
 PREFIX sioct: <http://rdfs.org/sioc/types#>
 PREFIX sysconf: <http://localhost/OntoWiki/Config/>
 PREFIX ns0: <http://udfr.org/profile/>
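If you keep namespaces like these in a script rather than pasting them by hand, a small helper can generate the PREFIX block for you. This is only a sketch: the function names are made up for the example, and the dictionary holds just two of the namespaces listed above.

```python
from typing import Dict


def to_prefix_lines(namespaces: Dict[str, str]) -> str:
    """Render a prefix -> IRI mapping as SPARQL PREFIX declarations, one per line."""
    return "\n".join(f"PREFIX {prefix}: <{iri}>" for prefix, iri in namespaces.items())


def with_prefixes(namespaces: Dict[str, str], query: str) -> str:
    """Put the PREFIX declarations in front of a query body."""
    return to_prefix_lines(namespaces) + "\n" + query


# Two namespaces copied from the list above.
UDFR_NAMESPACES = {
    "udfrs": "http://udfr.org/onto#",
    "skos": "http://www.w3.org/2004/02/skos/core#",
}

print(with_prefixes(UDFR_NAMESPACES, "SELECT DISTINCT * WHERE { ?fileformat a udfrs:FileFormat . }"))
```

With the prefixes declared, a class can be written as `udfrs:FileFormat` instead of the full `<http://udfr.org/onto#FileFormat>` IRI.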

You can also go directly to their SPARQL endpoint, which isn’t terribly well advertised on the site:

http://udfr.org/ontowiki/sparql/

Following are a few queries I performed to interrogate the structure of the UDFR triple-store. Their interface will automatically append “LIMIT 20” to the query if no LIMIT clause exists, but adding something like “LIMIT 400” works just lovely. I will omit this from my examples.

# Retrieve a list of all classes of objects

SELECT DISTINCT * WHERE { [] a ?type . }

# Retrieve a list of all file format URIs

SELECT DISTINCT * WHERE { ?fileformat a <http://udfr.org/onto#FileFormat> . }

# Retrieve a list of datatype properties used to describe resources

SELECT DISTINCT * WHERE { ?s a <http://www.w3.org/2002/07/owl#DatatypeProperty> . }

# Retrieve a list of data sources

SELECT DISTINCT * WHERE { ?s a <http://udfr.org/profile/DataSourceProfile> . }

# e.g.

# http://udfr.org/profile/PRONOM
# http://udfr.org/profile/MIME

# ^^ I’m disappointed that these don’t dereference…

# Retrieve a list of person profiles

SELECT DISTINCT * WHERE { ?s a <http://udfr.org/profile/IndividualProfile> . }

# Retrieve a list of all SKOS Concepts in the registry

SELECT * WHERE { ?concept a <http://www.w3.org/2004/02/skos/core#Concept> . }
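To run queries like these from a script instead of the web form, one option is to build a GET URL against the endpoint. The sketch below assumes the endpoint accepts the standard SPARQL Protocol `query` parameter, which I have not verified for this OntoWiki instance; the helper names and the default limit of 400 are my own choices.

```python
import re
from urllib.parse import urlencode

ENDPOINT = "http://udfr.org/ontowiki/sparql/"  # endpoint URL given in this post


def ensure_limit(query: str, limit: int = 400) -> str:
    """Append a LIMIT clause unless the query already has one."""
    if re.search(r"\bLIMIT\s+\d+", query, flags=re.IGNORECASE):
        return query
    return f"{query.rstrip()} LIMIT {limit}"


def query_url(query: str, endpoint: str = ENDPOINT, limit: int = 400) -> str:
    """Build a GET URL carrying the query in the (assumed) `query` parameter."""
    return endpoint + "?" + urlencode({"query": ensure_limit(query, limit)})


print(query_url("SELECT DISTINCT * WHERE { [] a ?type . }"))
```

Setting the limit explicitly sidesteps the interface's default of 20 rows; the response format would still need to be negotiated with whatever options the endpoint supports.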

Hopefully this helps you get a jump-start on navigating and using this resource, either as a way to learn more about querying with SPARQL by example, or to understand the UDFR better.

Posted in Uncategorized

Word Counts in the MIT Open Access Collection

29th Sep. ’13

I’ve been setting up a data mining framework for the MIT Open Access collection, and testing some initial simple / naive analysis runs. Below are the results of a word count algorithm run aggregating over the entire content contained in the OA collection. (It is severely truncated because WordPress freaked out when I handed it the whole list… because it’s awesome.) Clearly there are a bunch of additions to go into the stop-words list, and a few interesting blips to investigate.

In the coming months, I’ll be doing a number of EDA projects, discovery interfaces, and complex data objects, graphs, etc. Watch this space for details.

0000, 118739
data, 102927
using, 90690
time, 88965
model, 87614
will, 86509
between, 83933
figure, 72830
other, 71166
used, 66665
number, 60452
then, 60077
results, 58688
when, 57849
phys, 55129
cell, 53733
university, 52594
publisher, 52411
function, 51740
analysis, 51676
system, 51500
cells, 51493
state, 49607
high, 49290
first, 48682
section, 48373
2009, 48312
based, 47998
energy, 47963
author, 47323
agreement, 47125
given, 46486
over, 45379
same, 45173
shown, 44474
after, 43440
different, 42725
case, 42213
under, 42157
2010, 41450
however, 38996
2008, 37723
order, 37502
table, 36521
value, 36363
rate, 36062
values, 35830
large, 35773
information, 35627
thus, 35037
while, 34587
2007, 34150
well, 33992
three, 33968
signal, 33591
since, 33233
single, 32920
because, 32812
most, 32721
2000, 32577
null, 32412
manuscript, 31927
distribution, 31920
mass, 31765
physical, 31679
article, 31637
control, 31630
field, 31607
2006, 31349
terms, 30759
observed, 30547
within, 30256
during, 29765
through, 29714
2011, 29693
surface, 29662
work, 29557
models, 29308
events, 29275
level, 29033
research, 28843
similar, 28807
systems, 28729
small, 28725
point, 28582
journal, 28548
study, 28448
protein, 28375
show, 28357
structure, 28316
example, 28267
following, 28111
algorithm, 28075
form, 27731
total, 27418
second, 27113
phase, 26824
above, 26817
2005, 26719
publication, 26705
effect, 26594
power, 26544
line, 26522
those, 26423
without, 26413
available, 26405
expression, 26382
process, 26294
institute, 26146
region, 26070
therefore, 25732
does, 25722
result, 25618
subject, 25527
here, 25523
found, 25349
states, 25314
shows, 25258
problem, 24955
review, 24868
method, 24743
conditions, 24713
network, 24606
mean, 24452
parameters, 24407
range, 24330
space, 24264
theory, 24080
paper, 24052
sample, 23810
could, 23730
2004, 23693
current, 23686
error, 23674
size, 23645
license, 23634
average, 23612
physics, 23150
described, 23143
approach, 23135
methods, 23064
articles, 23034
type, 23022
general, 22909
gene, 22762
massachusetts, 22680
effects, 22572
technology, 22560
frequency, 22550
human, 22526
measured, 22395
higher, 22364
further, 22269
ieee, 22150
many, 22123
note, 22045
obtained, 21896
probability, 21762
tion, 21744
scale, 21667
possible, 21637
science, 21539
even, 21234
temperature, 21142
ctcf_known1, 21124
lower, 21005
date, 20978
performance, 20869
policy, 20853
linear, 20805
factor, 20760
genes, 20624
matrix, 20553
2003, 20531
solution, 20344
initial, 20312
including, 20292
very, 20247
density, 20230
response, 20150
associated, 20108
flow, 19972
defined, 19925
present, 19909
cross, 19781
ratio, 19658
background, 19593
respectively, 19552
must, 19537
before, 19473
standard, 19467
consider, 19248
changes, 19182
relative, 19179
design, 19152
specific, 19119
part, 19094
studies, 19034
change, 18936
2012, 18931
authors, 18925
potential, 18814
corresponding, 18755
like, 18691
group, 18687
experiments, 18670
test, 18620
version, 18442
compared, 18436
length, 18246
2002, 18217
lett, 18182
significant, 18095
measurements, 17905
source, 17858
functions, 17816
additional, 17718
less, 17613
experimental, 17506
important, 17501
long, 17481
binding, 17442
cambridge, 17412
expected, 17275
prime, 17270
provide, 17236
page, 17168
light, 17084
sequence, 17057
channel, 17036
regions, 16959
properties, 16944
vector, 16943
particular, 16926
multiple, 16926
image, 16893
required, 16891
either, 16882
samples, 16835
constant, 16794
activity, 16785
follows, 16740
equation, 16722
increase, 16692
chem, 16684
levels, 16584
several, 16506
known, 16457
parameter, 16442
volume, 16400
whether, 16284
points, 16282
term, 16248
related, 16183
department, 16163
limit, 16121
local, 16070
effective, 16055
limited, 16052
least, 15863
right, 15859
below, 15844
need, 15840
independent, 15802
random, 15800
zero, 15724
published, 15685
bound, 15650
four, 15590
across, 15567
final, 15552
layer, 15552
complex, 15509
along, 15469
noise, 15382
positive, 15304
input, 15302
although, 15273
2001, 15224
much, 15210
fact, 15185
full, 15182
rights, 15175
performed, 15155
consistent, 15136
theorem, 15130
lines, 15029
free, 14966
applied, 14936
cost, 14904
optimal, 14884
find, 14874
water, 14832
obtain, 14774
growth, 14716
termination, 14696
larger, 14682
2500, 14547
event, 14539
uncertainty, 14521
times, 14504
pubmed, 14488
step, 14481
addition, 14448
maximum, 14386
proof, 14313
particle, 14284
condition, 14281
open, 14263
estimate, 14166
proteins, 14146
previous, 14118
target, 14108
individual, 14066
cases, 14063
mice, 14012
center, 13952
distance, 13933
respect, 13871
nature, 13859
left, 13836
behavior, 13764
dependent, 13632
wave, 13597
measurement, 13569
interaction, 13562
proc, 13562
production, 13561
interactions, 13534
make, 13531
apply, 13477
simple, 13468
quantum, 13454
determined, 13427
national, 13422
simulation, 13420
previously, 13387
fixed, 13379
role, 13269
difference, 13261
color, 13210
1000, 13200
node, 13180
measure, 13162
area, 13088
genome, 13056
electron, 12973
resolution, 12971
every, 12936
output, 12869
rates, 12819
networks, 12811
mode, 12793
real, 12760
engineering, 12743
period, 12718
domain, 12594
induced, 12566
next, 12558
nodes, 12522
society, 12522
calculated, 12515
american, 12483
position, 12457
site, 12375
them, 12357
near, 12326
comparison, 12276
factors, 12267
d8cwct, 12255
online, 12248
features, 12246
spin, 12208
dimensional, 12199
variables, 12197
experiment, 12191
boundary, 12173
being, 12146
presented, 12138
development, 12126
global, 12121
provided, 12110
dynamics, 12110
access, 12107
optical, 12053
search, 12020
resulting, 11996
loss, 11976
likely, 11911
components, 11902
generated, 11872
processes, 11861
various, 11852
efficiency, 11836
might, 11782
best, 11779
decay, 11767
made, 11754
future, 11740
sites, 11684
spectrum, 11668
contrast, 11627
graph, 11600
correlation, 11552
images, 11486
negative, 11471
increased, 11453
velocity, 11448
component, 11443
1999, 11442
fraction, 11426
assume, 11404
presence, 11384
variable, 11382
prior, 11370
functional, 11347
width, 11345
reported, 11319
among, 11296
hence, 11272
take, 11263
directly, 11232
significantly, 11211
observations, 11169
selection, 11162
reduced, 11148
transition, 11118
upper, 11112
another, 11106
action, 11048
account, 11007
cancer, 11002
applications, 10978
formation, 10930
distributions, 10912
sets, 10912
increases, 10910
structures, 10908
learning, 10893
lemma, 10890
support, 10880
direction, 10872
1998, 10864
include, 10831
statistical, 10777
object, 10720
materials, 10711
strong, 10693
3333, 10631
copyright, 10623
provides, 10622
good, 10548
equilibrium, 10488
italy, 10485
particles, 10480
side, 10474
estimated, 10471
molecular, 10468
flux, 10457
spatial, 10454
evidence, 10451
path, 10444
direct, 10424
class, 10411
rather, 10409
entity, 10378
basis, 10367
derived, 10358
link, 10351
algorithms, 10349
together, 10336
biol, 10332
product, 10293
edge, 10277
errors, 10269
detection, 10221
original, 10203
dynamic, 10187
accepted, 10187
normal, 10168
smaller, 10159
differences, 10158
peak, 10144
main, 10137
journals, 10122
determine, 10097
program, 10087
require, 10086
finally, 10065
neurons, 10065
hand, 10046
processing, 10035
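For reference, a naive word count of this kind can be sketched in a few lines of Python. This is not the framework that produced the list above; the tokenizer and the tiny stop-word set are assumptions. Every term in the list has at least four characters, so the sketch guesses at a minimum token length of four.

```python
import re
from collections import Counter
from typing import Iterable, Set

# Illustrative stop words only; a real list would be much longer.
STOP_WORDS = {"using", "will", "between", "other", "then", "when"}

# Assumed tokenizer: runs of four or more letters, digits, or underscores.
TOKEN = re.compile(r"[a-z0-9_]{4,}")


def count_words(texts: Iterable[str], stop_words: Set[str] = STOP_WORDS) -> Counter:
    """Aggregate lowercase token counts over every text in the collection."""
    counts: Counter = Counter()
    for text in texts:
        counts.update(t for t in TOKEN.findall(text.lower()) if t not in stop_words)
    return counts


def as_csv(counts: Counter, top: int = 10) -> str:
    """Format the most common words as 'word, count' lines, like the list above."""
    return "\n".join(f"{word}, {n}" for word, n in counts.most_common(top))


docs = ["Data model data; the cell using data.", "Model of the cell"]
print(as_csv(count_words(docs)))
```

Growing `STOP_WORDS` is where most of the cleanup would happen; extraction artifacts such as `tion` or `0000` would also need their own filtering.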

Posted in Data, MIT, Programming, Uncategorized

Rotational Navigation Around Africa in Processing

24th Aug. ’13

Using some work I’ve been doing on 3D navigation around the Cartesian mouse space, I wrote up this Processing sketch today to allow rotation of a country SVG from Wikipedia, in this case configured for Africa.

Africa Navigation

// http://dbpedia.org/page/Africa
// http://commons.wikimedia.org/wiki/File:Blank_Map-Africa.svg
PShape africaSVG;
float xPos, yPos;          // accumulated rotation (radians) about the X and Y axes
PVector xCenter, yCenter;  // mouse position projected onto the horizontal / vertical center lines

void setup() {
  size(displayWidth, displayHeight, OPENGL);
  africaSVG = loadShape("http://upload.wikimedia.org/wikipedia/commons/6/66/Blank_Map-Africa.svg");
  // africaSVG = loadShape("Blank_Map-Africa.svg");
  africaSVG.scale(0.5);
  africaSVG.disableStyle();
  shapeMode(CENTER);
}

void draw() {
  background(30);

  xCenter = new PVector(mouseX, height/2);
  yCenter = new PVector(width/2, mouseY);

  drawCartesianAxes();

  pushMatrix();
  // Within 20px of the center on both axes: hold the rotation and draw a frame
  if (mouseX > width/2-20 && mouseX < width/2+20 && mouseY > height/2-20 && mouseY < height/2+20) {
    drawCenteredBox();
  }
  else {
    // Vertical distance from the center line sets the rotation speed about X
    if (mouseY > height/2) {
      xPos += map(xCenter.dist(new PVector(mouseX, mouseY)), 0, height/2, 0.0001, 0.05);
    }
    else {
      xPos += -1*map(xCenter.dist(new PVector(mouseX, mouseY)), 0, height/2, 0.0001, 0.05);
    }

    // Horizontal distance from the center line sets the rotation speed about Y
    if (mouseX > width/2) {
      yPos += map(yCenter.dist(new PVector(mouseX, mouseY)), 0, width/2, 0.0001, 0.05);
    }
    else {
      yPos += -1*map(yCenter.dist(new PVector(mouseX, mouseY)), 0, width/2, 0.0001, 0.05);
    }
  }
  popMatrix();

  // Draw the map rotated about the center of the window
  pushMatrix();
  translate(width/2, height/2);
  rotateX(xPos);
  rotateY(yPos);

  stroke(150, 100);
  strokeWeight(2);
  fill(200);
  shape(africaSVG, 0, 0);
  popMatrix();
}

void drawCartesianAxes() {
  pushMatrix();
  translate(0, 0, 150);
  pushStyle();
  stroke(150, 100);
  strokeWeight(1);
  line(0, height/2, width, height/2);
  line(width/2, 0, width/2, height);

  stroke(150, 100);
  strokeWeight(10);
  fill(200, 0, 0);

  // Markers showing the mouse position along each axis
  point(xCenter.x, xCenter.y);
  point(yCenter.x, yCenter.y);
  popStyle();
  popMatrix();
}

void drawCenteredBox() {
  noFill();
  stroke(0, 0, 200);
  strokeWeight(2);
  rect(10, 10, width-20, height-20);
}

Posted in Uncategorized

Building Out A Data View Of Africa

14th Aug. ’13

In continuing to build a framework for further linked data-driven exploration of Africa, I ran a list of African country names against the DBpedia Lookup service today.  The following is a (CSV) result of the original country name used, the DBpedia/Wikipedia name for the country, and the DBpedia URI for each…

Algeria, Algeria, http://dbpedia.org/resource/Algeria
Angola, Angola, http://dbpedia.org/resource/Angola
Benin, Benin, http://dbpedia.org/resource/Benin
Botswana, Botswana, http://dbpedia.org/resource/Botswana
Burkina Faso, Burkina Faso, http://dbpedia.org/resource/Burkina_Faso
Burundi, Burundi, http://dbpedia.org/resource/Burundi
Cameroon, Cameroon, http://dbpedia.org/resource/Cameroon
Cape Verde, Cape Verde, http://dbpedia.org/resource/Cape_Verde
Central African Republic, Central African Republic, http://dbpedia.org/resource/Central_African_Republic
Chad, Chad, http://dbpedia.org/resource/Chad
Comoros, Comoros, http://dbpedia.org/resource/Comoros
Congo (DRC), Democratic Republic of the Congo, http://dbpedia.org/resource/Democratic_Republic_of_the_Congo
Congo (Republic), Republic of the Congo, http://dbpedia.org/resource/Republic_of_the_Congo
Cote d’Ivoire, Côte d’Ivoire, http://dbpedia.org/resource/Côte_d’Ivoire
Djibouti, Djibouti, http://dbpedia.org/resource/Djibouti
Egypt, Egypt, http://dbpedia.org/resource/Egypt
Equatorial Guinea, Equatorial Guinea, http://dbpedia.org/resource/Equatorial_Guinea
Eritrea, Eritrea, http://dbpedia.org/resource/Eritrea
Ethiopia, Ethiopia, http://dbpedia.org/resource/Ethiopia
Gabon, Gabon, http://dbpedia.org/resource/Gabon
Gambia, The Gambia, http://dbpedia.org/resource/The_Gambia
Ghana, Ghana, http://dbpedia.org/resource/Ghana
Guinea, Papua New Guinea, http://dbpedia.org/resource/Papua_New_Guinea
Guinea-Bissau, Guinea-Bissau, http://dbpedia.org/resource/Guinea-Bissau
Kenya, Kenya, http://dbpedia.org/resource/Kenya
Lesotho, Lesotho, http://dbpedia.org/resource/Lesotho
Liberia, Liberia, http://dbpedia.org/resource/Liberia
Libya, Libya, http://dbpedia.org/resource/Libya
Madagascar, Madagascar, http://dbpedia.org/resource/Madagascar
Malawi, Malawi, http://dbpedia.org/resource/Malawi
Mali, Mali, http://dbpedia.org/resource/Mali
Mauritania, Mauritania, http://dbpedia.org/resource/Mauritania
Mauritius, Mauritius, http://dbpedia.org/resource/Mauritius
Mayotte, Mayotte, http://dbpedia.org/resource/Mayotte
Morocco, Morocco, http://dbpedia.org/resource/Morocco
Mozambique, Mozambique, http://dbpedia.org/resource/Mozambique
Namibia, Namibia, http://dbpedia.org/resource/Namibia
Niger, Niger, http://dbpedia.org/resource/Niger
Nigeria, Nigeria, http://dbpedia.org/resource/Nigeria
Rwanda, Rwanda, http://dbpedia.org/resource/Rwanda
Sao Tome and Principe, São Tomé and Príncipe, http://dbpedia.org/resource/São_Tomé_and_Príncipe
Senegal, Senegal, http://dbpedia.org/resource/Senegal
Seychelles, Seychelles, http://dbpedia.org/resource/Seychelles
Sierra Leone, Sierra Leone, http://dbpedia.org/resource/Sierra_Leone
Somalia, Somalia, http://dbpedia.org/resource/Somalia
South Africa, South Africa, http://dbpedia.org/resource/South_Africa
Sudan, Sudan, http://dbpedia.org/resource/Sudan
Swaziland, Swaziland, http://dbpedia.org/resource/Swaziland
Tanzania, Tanzania, http://dbpedia.org/resource/Tanzania
Togo, Togo, http://dbpedia.org/resource/Togo
Tunisia, Tunisia, http://dbpedia.org/resource/Tunisia
Uganda, Uganda, http://dbpedia.org/resource/Uganda
Western Sahara, Western Sahara, http://dbpedia.org/resource/Western_Sahara
Zambia, Zambia, http://dbpedia.org/resource/Zambia
Zimbabwe, Zimbabwe, http://dbpedia.org/resource/Zimbabwe
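Since a lookup service returns its best guess rather than a guaranteed match, it can help to flag rows where the returned name differs from the name that was submitted. The Python sketch below works on lines in the same three-column format as the list above; the function names are mine, and the comparison (case-insensitive, accents stripped) is just one reasonable choice.

```python
import unicodedata
from typing import Iterable, List, Tuple

Row = Tuple[str, str, str]  # (name submitted, name returned, DBpedia URI)


def fold(name: str) -> str:
    """Casefold and strip accents so 'Côte' and 'Cote' compare equal."""
    decomposed = unicodedata.normalize("NFKD", name)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch)).casefold().strip()


def parse_row(line: str) -> Row:
    """Split one 'submitted, returned, uri' line into its three fields."""
    submitted, returned, uri = (part.strip() for part in line.split(",", 2))
    return submitted, returned, uri


def rows_to_review(lines: Iterable[str]) -> List[Row]:
    """Return the rows whose returned name does not match the submitted one."""
    flagged = []
    for line in lines:
        row = parse_row(line)
        if fold(row[0]) != fold(row[1]):
            flagged.append(row)
    return flagged


sample = [
    "Algeria, Algeria, http://dbpedia.org/resource/Algeria",
    "Gambia, The Gambia, http://dbpedia.org/resource/The_Gambia",
    "Guinea, Papua New Guinea, http://dbpedia.org/resource/Papua_New_Guinea",
]
for submitted, returned, uri in rows_to_review(sample):
    print(f"check: {submitted} -> {returned} ({uri})")
```

Some flagged rows will be harmless naming differences, such as Gambia and The Gambia, while others, such as Guinea resolving to Papua New Guinea, point at a different country and need a manual fix.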

Posted in Africa, Data, Programming, Semantic Web

From Open Access To Metabolic Chains – Semantic Analysis & Linked Open Data

1st Aug. ’13

One of the side-projects I’ve been working on at MIT recently is a web API for the full text content of the Open Access collection, which holds over 10,000 articles across a range of disciplines.  The potential here for data mining is immense, but our UI does not expose a way to get to the full text of the article other than by downloading the PDF.

Behind the scenes, for search indexing purposes, we run tools over the PDF content to produce a less-than-perfect but reasonable plain-text extraction of the files.  These files are stored in the repository’s data structure, but not exposed.  My recent work has been to produce URLs for the full text content of each of these open access items so that subsequent analysis, beyond human consumption, can be performed.  Hopefully, this leads to some valuable insight into the output of MIT, and perhaps enables a story to emerge from the content that could not otherwise be culled by manual browsing and research.  It is a kind of auto-research, and one of the areas where I see the most potential for libraries.

In one of my initial EDA forays, I ran full text content through a word frequency tool I wrote.  It is an extremely simple occurrence counter that hands you back the most frequently used words in a text, with a cutoff that drops words occurring too rarely to matter.

Once I gathered the most frequently occurring words for the article, the next step was to run them through an entity extractor.  In this case, it was WikiMeta, though there are a number of other alternatives in this space, including OpenCalais.

This provided me with a link to further data about the individual words and concepts used in the article.  One that caught my eye, for whatever reason, was Acetylation, and I decided to follow this trail further into the adjacent data.  The immediate connection was to its Wikipedia article.

Acetylation

It is a simple matter to translate the URL of the Wikipedia article to the DBpedia page, which lists the structured linked open data extracted from the Wikipedia page.  Upon inspection, I found that there was a property used in the DBpedia ontology called dbpprop:metabolism, which was used as a linked descriptor for this particular entity.
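The URL translation mentioned here can be sketched in Python as follows. It relies on DBpedia resource names mirroring English Wikipedia article titles; the function name and the `html_view` flag are my own, and other language editions are deliberately not handled.

```python
from urllib.parse import urlparse


def wikipedia_to_dbpedia(url: str, html_view: bool = False) -> str:
    """Map an English Wikipedia article URL to the matching DBpedia URL.

    With html_view=True, return the human-readable /page/ address
    instead of the /resource/ identifier.
    """
    parsed = urlparse(url)
    if parsed.netloc != "en.wikipedia.org" or not parsed.path.startswith("/wiki/"):
        raise ValueError(f"not an English Wikipedia article URL: {url}")
    title = parsed.path[len("/wiki/"):]
    kind = "page" if html_view else "resource"
    return f"http://dbpedia.org/{kind}/{title}"


print(wikipedia_to_dbpedia("http://en.wikipedia.org/wiki/Acetylation"))
print(wikipedia_to_dbpedia("http://en.wikipedia.org/wiki/Liver", html_view=True))
```

The `/resource/` form is the identifier to use in SPARQL queries, while `/page/` is the browsable view where properties like the metabolism link show up.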

metabolism in dbpedia

What is significant about this is that it specifies another node that has additional metadata about the result of the metabolism.  This is only the first step in a potential chain of chemicals and anatomical structures that could be related to the topic of this paper, as seen below when you follow the trail to the Liver.

Liver on DBpedia

Are these all explicitly related to what was being discussed in the original article?  No, not necessarily, but as a first pass, it provides a much greater context for understanding and exploration, as well as other potential inbound links to this article.  It shows how linked data allows for one starting point, such as a single word used in a text, to branch out in hundreds of directions, all of which may situate the original content in a wealth of knowledge that could provide the insight necessary for that next critical leap in thought.

Posted in Linked Data, MIT, Semantic Web

EYEO Roots in the MIT Libraries

29th Jul. ’13

Working in a research library, specifically at MIT, can be very distracting.  When making changes to interfaces, or working on data modeling tasks, you inevitably have to stare at amazing content; more amazing content than you could ever digest, and it becomes a habit to make a quick decision about whether to capture it for later, or simply pretend you didn’t see it and complete the task at hand.

I recently realized just how many papers we have from the incredible people associated with the EYEO Festival, and I started curating a list of resources that those involved in EYEO would be interested in.  Below is undoubtedly an incomplete list, but it includes some great works that were surely the seeds for some of the things happening in this community today.

 

Casey Reas
Behavioral Kinetic Sculpture
http://dspace.mit.edu/handle/1721.1/62356

 

Ben Fry
Organic Information Design
http://hdl.handle.net/1721.1/9042

 

Ben Fry
Computational Information Design
http://hdl.handle.net/1721.1/26913

 

Golan Levin
Painterly Interfaces for Audiovisual Performance
http://hdl.handle.net/1721.1/61848

 

John Rothenberg
Duration, Density, and Evolutionary Form: Application of Biological Principles to Architectural Surface
http://hdl.handle.net/1721.1/62977

 

John Rothenberg
Indeterminate Liberal Form: Public Space in Sprawl
http://hdl.handle.net/1721.1/39319

 

John Underkoffler
The I/O Bulb and the Luminous Room
http://hdl.handle.net/1721.1/29145

 

John Underkoffler
Toward Accurate Computation of Optically Reconstructed Holograms
http://hdl.handle.net/1721.1/13914

 

Fernanda Viégas
Data Portraits
http://hdl.handle.net/1721.1/60252

 

Fernanda Viégas
Collections: Adapting the Display of Personal Objects for Different Audiences
http://hdl.handle.net/1721.1/62944

 

Fernanda Viégas
Revealing Individual and Collective Pasts: Visualizations of Online Social Archives
http://hdl.handle.net/1721.1/33880

 

John Maeda
MAS.110 Fundamentals of Computational Media Design, Spring 2003
http://hdl.handle.net/1721.1/49531

 

Evelyn Eastmond
New Tools To Enable Children To Manipulate Images Through Computer Programming
http://hdl.handle.net/1721.1/37049

 

Justin Manor
Cinema Fabriqué: A Gestural Environment for Realtime Video Performance
http://hdl.handle.net/1721.1/61862

 

Heather Knight
Real-Time Social Touch Gesture Recognition for Sensate Robots
http://hdl.handle.net/1721.1/59473

 

Heather Knight
An Architecture for Sensate Robots: Real-Time Social-Gesture Recognition Using a Full Body Array of Touch Sensors
http://hdl.handle.net/1721.1/46036

 

Lisa Strausfeld
Embodying Virtual Space to Enhance the Understanding of Information
http://hdl.handle.net/1721.1/62624

 

Ayah Bdeir
<random> Search
http://hdl.handle.net/1721.1/36152

 

Posted in MIT

Connecting MIT Research To The Global Community

25th
Jun. × ’13

At the MIT Libraries, we hold a large body of research content (MIT theses, working papers, technical reports, etc.), and much of it is being made open access.  The unanimous vote by the MIT faculty to adopt an open access policy granting MIT a non-exclusive license to their scholarly articles marked an important milestone in making scholarly literature freely available to the world.

 

The policy has served as a model for other research institutions to adopt similar practices. Additionally, the existence of institutional repositories like DSpace@MIT to archive and disseminate this research has facilitated rapid institution-local distribution of this open research.  These repositories contain searchable full-text and quality metadata to improve discoverability, and are routinely harvested by major search engines.  What many seek to understand now is just how much impact these open repositories of peer-reviewed research are having.

 

Two readily observable metrics of success are collection growth and usage.  For the MIT Open Access Article collection, some measure of success can be claimed in both areas.  After 3.5 years of availability, the collection boasts nearly 10,000 articles and over 1M downloads from nearly all of the world’s countries.

 

What warrants further consideration, however, is the apparent disconnect between the downloads originating from regions of the world affected by tremendous social, political, environmental, educational and health issues and the scholarly materials directly applicable to those great challenges, which often specifically reference those geographies.  There are, of course, some easy explanations for this: network availability, the difficulty of translating scholarly articles into practical applications, language barriers, etc.  That said, if we are to adequately support the mission of MIT in addressing the world’s great challenges, the onus is on us to understand the practical limitations of our current passive approach to making this scholarship openly available.  To put a finer point on it, this goes beyond search engine optimization; it is about basic discoverability by those for whom the information is potentially impactful at the most basic human level, either directly or by proxy.

 

To that end, what we’re trying to accomplish is the construction of a framework of social and environmental problems, linked to data about the geographical areas that are greatly affected by those issues.  By linking data about the problems to the data about the places where they occur, one gains the benefit of other potentially valuable knowledge about those places, including the people and organizations who are active in solving those problems.

Our first area of focus is Africa.  It is an area in particular need of scrutiny because of the challenges of infrastructure, access to technology, and significant public health issues.  We are beginning to experiment with ways to programmatically link concepts in the linked open data cloud with countries and cities mentioned in MIT’s research, and to analyze the match between these areas of the globe and the areas where access to this research is coming from.
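
As a rough illustration of that last comparison, here is a minimal Python sketch.  It is not our actual pipeline: the article identifiers, place lists and download counts are invented sample data, and the function names (mention_counts, access_gap) are hypothetical.  It assumes the place names have already been extracted from each article and that download counts are available per country of origin.

```python
from collections import Counter

# Hypothetical sample data: article id -> countries mentioned in the text.
ARTICLE_PLACES = {
    "article-1": ["Kenya", "Uganda"],
    "article-2": ["Kenya"],
    "article-3": ["Ghana"],
}

# Hypothetical sample data: downloads by country of origin.
DOWNLOADS_BY_COUNTRY = {"Kenya": 12, "United States": 950, "Ghana": 3}


def mention_counts(article_places):
    """Count how many articles mention each country (once per article)."""
    counts = Counter()
    for places in article_places.values():
        counts.update(set(places))
    return counts


def access_gap(article_places, downloads):
    """Return (country, articles, downloads, downloads per article) rows.

    Rows are sorted so that countries with the fewest downloads per
    mentioning article come first; ties are broken by country name.
    """
    rows = []
    for country, n_articles in mention_counts(article_places).items():
        n_downloads = downloads.get(country, 0)
        rows.append((country, n_articles, n_downloads, n_downloads / n_articles))
    rows.sort(key=lambda row: (row[3], row[0]))
    return rows


GAP = access_gap(ARTICLE_PLACES, DOWNLOADS_BY_COUNTRY)

for country, n_articles, n_downloads, ratio in GAP:
    print(f"{country}: {n_articles} article(s), {n_downloads} download(s), {ratio:.1f} per article")
```

Countries near the top of the output are written about in the research but rarely reach it, which is the kind of mismatch we want to surface.  A real version would need to normalize place names against linked data identifiers rather than compare raw strings.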

 

The goal is to connect the research and researchers with the people they can help.  

Open access to research is only the first, albeit very important, step.  What follows is what is incumbent upon us: to ensure that the people who can benefit from this open research are aware that it exists, are able to access it, and can leverage it to improve their lives.

 

Here is a discussion with David Weinberger at the 2013 LODLAM conference about our efforts in this space…

http://youtu.be/xkyE2qcOx_E

@sean_m_thomas / @sandsfish

sthomas@mit.edu / sands@mit.edu

opinions are our own

 

Posted in Uncategorized