A dumbed-down semantic triple taxonomy

⊕: May 12, 2025, 12:00 PM
Δ: Jul 6, 2025, 10:45 PM
⌗: EntityRelationship

Been working with semantic triples and entity-relationship models recently, and have a lot of feelings about what a kind of meta-taxonomy or type system might look like.

I feel like there's so much in common between RDF entity definitions, packages/namespaces, and types (both abstract and concrete). There are probably some really interesting extensions of that to multiple dispatch too: a package defines a function that means something semantically, and its inputs also mean something semantically, in a way that could enable really transparent use of e.g. a library's predict function that takes a model and a particular input. I just see a lot of echoes between a bunch of concepts here.

Despite personally wishing for some kind of elegant, strongly-typed unicorn, it seems to me that, in practice, triples often amount to throwing unvalidated strings into three columns of a table (subject, predicate, object or whatever) and calling it a day.

(I tend to think of "predicate" and "relationship" somewhat interchangeably—maybe that's a problem.)

(This is also all about representing triples in a regular-degular relational database, because that's all my brain is good for these days.)

But I noticed at least one pattern where we offload all the things we know about an object/entity (including "metadata") into a few fairly consistent classes of triple, based on predicate/relationship type. All it requires is that entities are represented by some kind of unique identifier that we can find across a bunch of different tables/triple stores/whatever. Here are the categories:

is_a: This tells us the type of the entity: whether it's a person, a location, an item, or whatever. Notably, this is only for "concrete" entity types, and is not used to describe entity hierarchies (like "cat is-a animal"). Why not? Well, if the entity identifiers stored in our tables (e.g. UUIDs, integers) are a different type than our entity type identifiers (e.g. strings, maybe something enum-like?), then we can't have a single table that contains every is_a in our whole universe, because each column has to be consistently typed. An entity row has an entity identifier as its subject, while a hierarchy row like "cat is-a animal" has a type identifier there. But if we have multiple is_a tables, then we need to tell them apart, either by giving them different names (which gets us into wonky table-name parsing) or by making type-differentiated tables: multiple is_a tables whose subject and object columns (or whatever) have different, semantically meaningful types, like int but also like http://www.w3.org/People or whatever. That would be cool, but it isn't super well supported by anything that I know of.
has_X: This tells us that the entity has a particular "attribute". If we were modeling our entity as a struct, a has_X would just tell us the value of one field/attribute. Notably, the subject column is an entity identifier (e.g. UUID) and the object column is a value of some kind, like an int or float or date (or string, that's fine). The value is a terminal node: it has no further attributes itself.
All others: Any other kind of relationship is between two entities, so both the subject and the object columns contain entity identifiers (e.g. UUIDs).
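
Here's a minimal sketch of how those three classes might look in a plain relational database, using SQLite through Python's built-in sqlite3 module. The table names (has_name, has_age, lives_in), the example entities, and the describe helper are all made up for illustration, and storing UUIDs as text is just one assumption among several possible ones.

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- is_a: entity id -> concrete entity type (a string label)
    CREATE TABLE is_a (subject TEXT NOT NULL, object TEXT NOT NULL);
    -- has_X: entity id -> terminal value, one table per attribute
    CREATE TABLE has_name (subject TEXT NOT NULL, object TEXT NOT NULL);
    CREATE TABLE has_age (subject TEXT NOT NULL, object INTEGER NOT NULL);
    -- all others: entity id -> entity id
    CREATE TABLE lives_in (subject TEXT NOT NULL, object TEXT NOT NULL);
""")

# Hypothetical entities, identified only by UUIDs (stored as text here).
alice = str(uuid.uuid4())
paris = str(uuid.uuid4())

conn.executemany(
    "INSERT INTO is_a VALUES (?, ?)",
    [(alice, "person"), (paris, "location")],
)
conn.executemany(
    "INSERT INTO has_name VALUES (?, ?)",
    [(alice, "Alice"), (paris, "Paris")],
)
conn.execute("INSERT INTO has_age VALUES (?, ?)", (alice, 34))
conn.execute("INSERT INTO lives_in VALUES (?, ?)", (alice, paris))


def describe(entity_id):
    """Collect an entity's type and attributes into a dict, struct-style."""
    (entity_type,) = conn.execute(
        "SELECT object FROM is_a WHERE subject = ?", (entity_id,)
    ).fetchone()
    record = {"type": entity_type}
    # Attribute names come from a fixed tuple, so building the table name is safe here.
    for attr in ("name", "age"):
        row = conn.execute(
            f"SELECT object FROM has_{attr} WHERE subject = ?", (entity_id,)
        ).fetchone()
        if row is not None:
            record[attr] = row[0]
    return record


alice_record = describe(alice)

# Entity-to-entity relationships are followed by joining on identifiers.
home = conn.execute(
    "SELECT n.object FROM lives_in AS l "
    "JOIN has_name AS n ON n.subject = l.object "
    "WHERE l.subject = ?",
    (alice,),
).fetchone()[0]
```

Each table keeps consistently typed columns: is_a maps an identifier to a type label, each has_X table maps an identifier to a terminal value, and lives_in maps an identifier to another identifier.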