Experimenting with Linked Data
This week I was able to poke around with SPARQL queries and get a bit closer to understanding how to deal with some of the data on my research subject that exists ~in the wild~ in the Wikidata universe. I was able to generate the nifty visualization above, which displays "Anglican Bishops of Winchester who have a date of death before January 1st, 1600." What I noticed immediately, though, is that it has two problems: one with the visualization method (due to how the data is structured) and one with the precision of the information (also partly due to the data, but in a different way). The query is also a Frankenstein's monster of code stitched together from several of the example queries, so I'm not sure I could recreate it from scratch, which probably means I don't understand SPARQL well enough yet.
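For reference, a query of this general shape can be sketched fairly compactly. This is a minimal sketch, not the query behind the visualization. It wraps the SPARQL in a small Python helper and uses the standard Wikidata properties P39 ("position held") and P570 ("date of death"). The position's item ID is left as a parameter because I haven't confirmed the Q-number for "Anglican Bishop of Winchester" here. Both function names are hypothetical helpers of my own.

```python
from urllib.parse import urlencode

WDQS_ENDPOINT = "https://query.wikidata.org/sparql"

def bishops_died_before(position_qid: str, cutoff: str = "1600-01-01") -> str:
    """Build a SPARQL query for holders of a position who died before `cutoff`.

    `position_qid` is whatever Q-number you look up for the position;
    no real ID is assumed here.
    """
    return f"""
SELECT ?person ?personLabel ?dod WHERE {{
  ?person wdt:P39 wd:{position_qid} ;   # P39 = position held
          wdt:P570 ?dod .               # P570 = date of death
  FILTER(?dod < "{cutoff}T00:00:00Z"^^xsd:dateTime)
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en". }}
}}
ORDER BY ?dod
"""

def query_url(query: str) -> str:
    """Build a GET URL for the Wikidata Query Service, asking for JSON results."""
    return WDQS_ENDPOINT + "?" + urlencode({"query": query, "format": "json"})
```

The string returned by `bishops_died_before(...)` can also be pasted straight into the Wikidata Query Service interface. The `wd:`, `wdt:`, `wikibase:`, `bd:` and `xsd:` prefixes are predefined there, which is why the query doesn't declare them.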
Problems with data presentation:
Hearkening back to discussions our class has been having about data visualization, influenced by Kieran Healy's Data Visualization (2019), and in conversation with Lemercier and Zalc's points about the quantifiability of humanities data, this visualization struck me as "off" for a few reasons. Chief among them: so many people seem to conveniently die on January 1st! I won't go too deeply into the problems of quantifying time on a linear axis. Underwood raises this while citing Cartographies of Time by Daniel Rosenberg and Anthony Grafton (2010), and it is a fascinating rabbit hole in its own right. Even so, we can see that the Wikidata entry itself is wrong, based on the data presented on the Wikipedia page. I'm leaving it untouched for now to show the example. One of my results, [John Watson (bishop)](https://en.wikipedia.org/wiki/John_Watson_(bishop)), has a Wikidata date of death of just "1584," the same year given in the biography template box on his Wikipedia page. Further down in the Wikipedia text, however, a clarification states that he died on 23 January 1583/4. That detail is not recorded anywhere in the date data of his "date of death" statement in Wikidata.
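The January 1st pile-up is most likely a precision artifact rather than a real cluster of deaths. The query service returns a year-only date like Watson's "1584" as a full timestamp, 1584-01-01T00:00:00Z, and records the actual precision separately (9 = year, 10 = month, 11 = day). In SPARQL, that precision can be reached through `p:P570/psv:P570` and `wikibase:timePrecision`. Below is a minimal sketch of rendering a date only as precisely as it was recorded, assuming the timestamp and precision code have already been fetched. `display_date` is a hypothetical helper.

```python
# Wikidata time precision codes (subset): 9 = year, 10 = month, 11 = day.
YEAR, MONTH, DAY = 9, 10, 11

def display_date(time_value: str, precision: int) -> str:
    """Render a Wikidata-style timestamp only to the precision it actually carries.

    Accepts values with or without a leading sign, e.g. "+1584-01-01T00:00:00Z"
    or "1584-01-01T00:00:00Z". Assumes CE dates: a leading "-" for BCE is dropped.
    """
    date_part = time_value.lstrip("+-").split("T")[0]
    year, month, day = date_part.split("-")
    if precision <= YEAR:
        # Coarser precisions (decade, century, ...) would need their own
        # handling; this sketch simply falls back to the stored year.
        return year
    if precision == MONTH:
        return f"{year}-{month}"
    return f"{year}-{month}-{day}"
```

Any plotting step would need the same check before placing a year-only date on a day-level axis. Otherwise "1584" silently becomes New Year's Day 1584.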
(My) Problems with RDF’s controlled vocabulary structure
Wikipedia (as a source of contemporary as well as historical information) has an up-to-date entry on the office of the Bishop of Winchester, including historical records of many of the people who held that title. It also has an entry for the Diocese of Winchester as a unit of ecclesiastical governance. It does not, however, have an entry on the bishopric as an economic unit, or as a historical bishopric within the Catholic Church. As mentioned above, within the Wikidata universe various bishops have a "position held" of "Anglican Bishop of Winchester." There are also ways to work out how that position links to other bishops: through a link to "diocesan bishop," which is apparently used for many of the bishops I'm interested in to describe their role within the Catholic Church's hierarchy.

However, in no universe would Swithun be categorized as an Anglican bishop. Swithun lived from the early 800s to 862*, and the "Anglican" Church was not established until 1534. In that year, Henry VIII broke with the Pope in Rome and, with the Act of Supremacy, made himself (as the English monarch) the separate and Supreme Head of the Church of England. So linking data becomes a problem for someone like Swithun. He was indeed a bishop, and he did indeed head an episcopal see in Winchester, in what would eventually become the Anglican Church. For additional linguistic confusion, he was also part of a tradition of Christianity uniquely shaped by the insular development of the Anglo-Saxons, which made it a particularly Anglo-ican church. Yet he would not have identified with the title of "Anglican Bishop." The line of data continuity, however, works best as a continuous through-line of bishops, each "replaced by" (P1366) the next, historical nomenclature and periodicity be damned. This is especially true for the machine-readable side of Wikidata's identifier codes.
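That through-line is easy for a machine to follow precisely because it never looks at labels like "Anglican." The sketch below is minimal and assumes the "replaced by" links have already been pulled out of Wikidata into a plain dictionary. On Wikidata, P1366 often appears as a qualifier on a "position held" statement rather than directly on the person, so that is worth checking. The function name and the item names in the usage note are hypothetical placeholders.

```python
def succession(start: str, replaced_by: dict[str, str]) -> list[str]:
    """Follow 'replaced by' (P1366) links from `start` until the chain ends."""
    chain = [start]
    seen = {start}
    while chain[-1] in replaced_by:
        nxt = replaced_by[chain[-1]]
        if nxt in seen:
            # Guard against accidental cycles in hand-edited data.
            break
        chain.append(nxt)
        seen.add(nxt)
    return chain
```

For example, `succession("A", {"A": "B", "B": "C"})` returns `["A", "B", "C"]`, whatever each holder's position happens to be called.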
I spent several hours playing around with nested query strings that should, by my reckoning, have returned "any person with the role of bishop (or any of its subclasses) from England." Even so, I couldn't produce a list, because the geographic question of what constituted "England" was another can of worms that I couldn't untangle.
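The "(or any of its subclasses)" part is what SPARQL's `wdt:P279*` property path is for: it follows "subclass of" (P279) zero or more times. To make the idea concrete outside the query service, here is a minimal sketch of the same walk over a small hand-made graph. The class names and the `subclasses_of` helper are hypothetical. Depending on how a position item is modelled, the real query may need `wdt:P31/wdt:P279*` (instance of, then subclass of) instead.

```python
from collections import deque

def subclasses_of(root: str, subclass_of: dict[str, list[str]]) -> set[str]:
    """Return `root` plus every class that reaches it via 'subclass of' edges.

    Mirrors the zero-or-more path wdt:P279* in SPARQL. `subclass_of` maps
    each class to its direct parent classes.
    """
    # Invert child -> parents into parent -> children so we can walk downward.
    children: dict[str, list[str]] = {}
    for child, parents in subclass_of.items():
        for parent in parents:
            children.setdefault(parent, []).append(child)

    found = {root}
    queue = deque([root])
    while queue:
        for child in children.get(queue.popleft(), []):
            if child not in found:
                found.add(child)
                queue.append(child)
    return found
```

Because the path is "zero or more," the root class itself is always included, just as `?x wdt:P279* wd:…` also matches the item itself.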
Returning to Swithun as an example: he is not listed as having anything to do with "England," the "Kingdom of England," or anything similar, since his "country of citizenship" (P27) statement is "Kingdom of Wessex." This shows the inherent problem with my earlier point. Swithun probably would have felt himself to be more of a "citizen" of the Kingdom of Wessex than of something called England (which didn't exist yet). Unfortunately, I couldn't find a good way to express, within Wikidata's SPARQL and RDF vocabulary universe, that the Kingdom of Wessex was eventually subsumed by England, which to my mind means the two should be related in some way. In fact, the current Wikipedia page for the Kingdom of Wessex notes that it is "Today part of" the United Kingdom, but this isn't referenced anywhere in the Wikidata item for that page! So how we're to reconcile the rejection of periodicity with functionally usable linked data is still unclear to me. It's almost as if the rigor of vocabulary control that the RDF structure needs in order to work makes it human-readable but perhaps not human-composable without extensive training.
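What I was looking for amounts to a reachability question: does a person's recorded country lead, by successor links, to England? The sketch below is minimal and assumes successor edges like these existed in a usable form. The edges are hand-entered for illustration and are not pulled from Wikidata; whether Wikidata records them is exactly the problem. `eventually_part_of` is a hypothetical helper.

```python
# Hand-entered, illustrative successor-state edges (NOT taken from Wikidata).
SUCCESSORS: dict[str, list[str]] = {
    "Kingdom of Wessex": ["Kingdom of England"],
    "Kingdom of England": ["Kingdom of Great Britain"],
}

def eventually_part_of(country: str, target: str,
                       successor: dict[str, list[str]]) -> bool:
    """True if `country` is `target` or reaches it by following successor links."""
    stack: list[str] = [country]
    seen: set[str] = set()
    while stack:
        current = stack.pop()
        if current == target:
            return True
        if current in seen:
            continue  # already explored; avoids looping on cyclic data
        seen.add(current)
        stack.extend(successor.get(current, []))
    return False
```

With these edges, `eventually_part_of("Kingdom of Wessex", "Kingdom of England", SUCCESSORS)` returns `True`. Without them, Swithun simply never matches a filter for England.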