A dumbed-down semantic triple taxonomy
Been working with semantic triples and entity-relationship models recently, and have a lot of feelings about what a kind of meta-taxonomy or type system might look like.
I feel like there's so much in common between RDF entity definitions and packages/namespaces and types (both abstract and concrete). And there are probably some really interesting extensions of that to multiple dispatch, where a package defines a function that means something semantically, and its inputs also mean something semantically in a way that can enable really transparent use of e.g. a `predict` function of a library that takes a model and a particular input. I just see a lot of echoes between a bunch of concepts here.
Despite personally wishing for some kind of elegant, strongly-typed unicorn, it seems to me that, in practice, triples often amount to throwing unvalidated strings into three columns of a table (`subject`, `predicate`, `object` or whatever) and calling it a day.
(I tend to think of "predicate" and "relationship" somewhat interchangeably—maybe that's a problem.)
(This is also all about representing triples in a regular-degular relational database, because that's all my brain is good for these days.)
But I noticed at least one pattern where we offload all the things we know about an object/entity, including "metadata", into a few fairly consistent classes of triple based on predicate/relationship type. All it requires is that entities are represented by some kind of unique identifier that we can find across a bunch of different tables/triple stores/whatever. Here are the categories:
- `is_a`: This tells us the type of the entity. For example, whether it's a person, or a location, or an item, or whatever. Notably, this is only for "concrete" entity types, and is not used to describe entity hierarchies (like "cat is-a animal"). Why not? Well, if the actual entity identifiers stored in the tables (e.g. UUIDs, integers) are a different type than our entity type identifiers (e.g. strings, maybe something enum-like?), then we can't have a single table that contains all `is_a`s in our whole universe, because columns have to be consistently typed. But if we have multiple `is_a` tables, then we need to distinguish which is which, either by giving them different names (which gets us into wonky table name parsing) or by trying to make type-differentiated tables (e.g. we have multiple `is_a` tables, but the `subject` and `object` columns [or whatever] have different, semantically meaningful types like `int` but also like `http://www.w3.org/People` or whatever). This would be cool but isn't super well supported by anything that I know of.
- `has_X`: This tells us that the entity has a particular "attribute". If we were modeling our entity as a struct, a `has_X` would just be telling us what the value of a field/attribute is. Notably, the `subject` column is an entity identifier (e.g. UUID) and the `object` field is a value of some kind, like an int or float or date (or string, that's fine). The value is a terminal node; it has no further attributes itself.
- All others. Any other kind of relationship type is between two entities; both the `subject` and the `object` column contain entity identifiers (e.g. UUIDs).
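
To make the three categories a bit more concrete, here's a rough sketch of how they might land in a plain relational database, using Python's built-in `sqlite3`. Everything specific here is my own assumption for illustration: the table names (`is_a`, `has_age`, `relationship`), the one-table-per-attribute layout for `has_X`, the allowed type names, the helper functions `new_entity` and `describe`, and the choice of UUID strings as entity identifiers. It's one possible layout, not the layout.

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- is_a: entity id -> concrete type name.
    -- The object is a type label (a string), not another entity id.
    CREATE TABLE is_a (
        subject TEXT NOT NULL,
        object  TEXT NOT NULL CHECK (object IN ('person', 'location', 'item'))
    );

    -- has_X: assumed here to be one table per attribute, so the object
    -- column can be declared with a single value type. The value is terminal.
    CREATE TABLE has_age (
        subject TEXT NOT NULL,
        object  INTEGER NOT NULL
    );

    -- All others: both subject and object are entity ids.
    CREATE TABLE relationship (
        subject   TEXT NOT NULL,
        predicate TEXT NOT NULL,
        object    TEXT NOT NULL
    );
""")


def new_entity(type_name):
    """Create an entity id and record its concrete type in is_a."""
    entity_id = str(uuid.uuid4())
    conn.execute("INSERT INTO is_a VALUES (?, ?)", (entity_id, type_name))
    return entity_id


def describe(entity_id):
    """Collect what the three kinds of table say about one entity."""
    type_row = conn.execute(
        "SELECT object FROM is_a WHERE subject = ?", (entity_id,)
    ).fetchone()
    age_row = conn.execute(
        "SELECT object FROM has_age WHERE subject = ?", (entity_id,)
    ).fetchone()
    relations = conn.execute(
        "SELECT predicate, object FROM relationship WHERE subject = ?",
        (entity_id,),
    ).fetchall()
    return {
        "type": type_row[0] if type_row else None,
        "age": age_row[0] if age_row else None,
        "relations": relations,
    }


alice = new_entity("person")
paris = new_entity("location")
conn.execute("INSERT INTO has_age VALUES (?, ?)", (alice, 34))
conn.execute(
    "INSERT INTO relationship VALUES (?, ?, ?)", (alice, "lives_in", paris)
)
```

Calling `describe(alice)` pulls the type from `is_a`, the terminal value from `has_age`, and the entity-to-entity edges from `relationship`. The `CHECK` constraint is about as close as this sketch gets to the "enum-like" type identifiers mentioned above; it validates the type label, but nothing here validates that the ids in `relationship` actually exist.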