RE: [ANN] RDF Delta : change logging and dataset replication. from Stian Soiland-Reyes on 2018-06-18 (semantic-web@w3.org from June 2018)

From: Stian Soiland-Reyes <soiland-reyes@manchester.ac.uk>
Date: Mon, 18 Jun 2018 12:46:21 +0000
To: Reto Gmür <reto@factsmission.com>, Andy Seaborne <andy@apache.org>, Semantic Web <semantic-web@w3.org>
Message-ID: <D5780135E58FC940BDB87E7D49991018D6D382FC@MBXP13.ds.man.ac.uk>

From what I get in https://afs.github.io/rdf-delta/rdf-patch.html#blank-nodes it assumes a "system identifier" that survives multiple patches. This is kind of like an I-know-it's-a-bnode-and-so-should-you skolemization (but where the end result is still a bnode).



I see why you raise this, as there would be challenges if you had federated systems that used RDF patches, as you would need to keep track of which "system" a patch picked its identifiers from. Yes, that could go into "H id" as in https://afs.github.io/rdf-delta/rdf-patch-logs.html





I think a select-pattern-based system that would work with isomorphic graphs would be more general (e.g. such patches could be applied to a variety of stores), but probably harder for an RDF store to generate from a simple transaction log. It could also be more computationally expensive to apply.



As this is an RDF Patch update, we don't need any kind of selection, just to deal with known triples separately.





Perhaps it could work by adding an S(elect) operation and E(xists) within a transaction?



Suggested format:





TX .
S _:1 .
E <http://example.com/person1> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://schema.org/Person> .
E <http://example.com/person1> <http://schema.org/affiliation> _:1 .
E _:1 <http://schema.org/url> <http://example.com/org1> .
D _:1 <http://schema.org/name> "Fred's Fish House" .
A _:1 <http://schema.org/name> "Fred's Soup House" .
TC .



(Using schema.org as an example, as it relies a lot on bnodes.)



Here we (S)elect _:1 as a blank node ID to be bound within this transaction (_:1 is no longer a system identifier).



To restrict which bnode we are talking about, the store would need to match all of the E(xists) statements. Any non-selected _: identifiers there are NOT free variables; they are still interpreted as "system identifiers". You can, however, add multiple S(elections).



Here the transaction would fail if any of the E triples/quads fails to exist, or if they give multiple bindings for _:1. I don't think it would be appropriate for the RDF Patch format to do wildcard bnode selections, e.g. "Delete all bnodes that are organizations".



It is not a requirement that every selected bnode is used in A/D, although it would be silly if none of them were used. (This permits you to use intermediate bnodes in the E selection.)
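
To make the matching rule above concrete, here is a minimal Python sketch of how a store might resolve the S(elected) labels against the E(xists) statements and then apply D/A. This is only an illustration of the idea, not part of RDF Patch or RDF Delta: the function names (match, find_bindings, apply_transaction), the representation of triples as tuples of strings, and the store-internal label "_:b42" are all assumptions made for the example.

```python
# Hypothetical sketch: terms are plain strings, and blank nodes held by the
# store are strings starting with "_:" (standing in for system identifiers).

def match(pattern, triple, binding, selected):
    """Extend `binding` so that `pattern` equals `triple`, or return None."""
    candidate = dict(binding)
    for term, value in zip(pattern, triple):
        if term in selected:
            # A selected label may only bind to a blank node, and consistently.
            if not value.startswith("_:") or candidate.setdefault(term, value) != value:
                return None
        elif term != value:
            # Everything else (IRIs, literals, non-selected bnodes) must match exactly.
            return None
    return candidate

def find_bindings(graph, selected, exists):
    """Return every assignment of the selected labels satisfying all E patterns."""
    solutions = [{}]
    for pattern in exists:
        extended = []
        for binding in solutions:
            for triple in graph:
                candidate = match(pattern, triple, binding, selected)
                if candidate is not None and candidate not in extended:
                    extended.append(candidate)
        solutions = extended
    return solutions

def apply_transaction(graph, selected, exists, deletes, adds):
    """Apply D/A only if the E patterns bind every selected label exactly once."""
    solutions = find_bindings(graph, selected, exists)
    if len(solutions) != 1 or set(solutions[0]) != set(selected):
        raise ValueError("transaction aborted: found %d bindings" % len(solutions))
    binding = solutions[0]
    result = set(graph)
    for triple in deletes:
        result.discard(tuple(binding.get(t, t) for t in triple))
    for triple in adds:
        result.add(tuple(binding.get(t, t) for t in triple))
    return result

RDF_TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"
P1, ORG1 = "<http://example.com/person1>", "<http://example.com/org1>"
PERSON, AFFILIATION = "<http://schema.org/Person>", "<http://schema.org/affiliation>"
URL, NAME = "<http://schema.org/url>", "<http://schema.org/name>"
ORG_NODE = "_:b42"  # hypothetical store-internal blank node label

graph = {
    (P1, RDF_TYPE, PERSON),
    (P1, AFFILIATION, ORG_NODE),
    (ORG_NODE, URL, ORG1),
    (ORG_NODE, NAME, '"Fred\'s Fish House"'),
}
selected = {"_:1"}
exists = [(P1, RDF_TYPE, PERSON), (P1, AFFILIATION, "_:1"), ("_:1", URL, ORG1)]
deletes = [("_:1", NAME, '"Fred\'s Fish House"')]
adds = [("_:1", NAME, '"Fred\'s Soup House"')]

patched = apply_transaction(graph, selected, exists, deletes, adds)
```

The sketch returns a new set rather than mutating the graph, so a failed match leaves the original data untouched, which is the behaviour one would want from a transaction.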





It would have to be inside a transaction because such patches are not necessarily idempotent, e.g. the A/D operations might be doing something that breaks the E query and so you can't run it again.



My proposal would presumably be fairly simple to translate to SPARQL updates.
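
As a rough illustration of that translation, the following Python sketch renders the S/E/D/A parts of such a transaction as a single SPARQL DELETE/INSERT/WHERE request. The function name to_sparql_update and the variable naming scheme (?sel1, ?sel2, ...) are made up for this example. Two caveats: a plain SPARQL Update applies to every match and does not fail on zero or multiple bindings, so the "exactly one binding" check is not captured here, and the sketch simply rejects non-selected blank node labels, since SPARQL has no portable way to refer to a store's system identifiers.

```python
def to_sparql_update(selected, exists, deletes, adds):
    """Render S/E/D/A as one SPARQL Update string (hypothetical helper)."""
    # Each selected blank node label becomes a SPARQL variable.
    variables = {label: "?sel%d" % i
                 for i, label in enumerate(sorted(selected), start=1)}

    def render(triple):
        terms = []
        for term in triple:
            if term in variables:
                terms.append(variables[term])
            elif term.startswith("_:"):
                # System identifiers cannot be expressed portably in SPARQL.
                raise ValueError("non-selected blank node: %s" % term)
            else:
                terms.append(term)
        return "  " + " ".join(terms) + " ."

    def block(triples):
        return "\n".join(render(t) for t in triples)

    # Restrict the variables to blank nodes, as S(elect) only picks bnodes.
    filters = "\n".join("  FILTER(isBlank(%s))" % v for v in variables.values())
    return "DELETE {\n%s\n}\nINSERT {\n%s\n}\nWHERE {\n%s\n%s\n}" % (
        block(deletes), block(adds), block(exists), filters)

NAME = "<http://schema.org/name>"
update = to_sparql_update(
    selected={"_:1"},
    exists=[
        ("<http://example.com/person1>", "<http://schema.org/affiliation>", "_:1"),
        ("_:1", "<http://schema.org/url>", "<http://example.com/org1>"),
    ],
    deletes=[("_:1", NAME, '"Fred\'s Fish House"')],
    adds=[("_:1", NAME, '"Fred\'s Soup House"')],
)
```

A store wanting the stricter fail-on-ambiguity behaviour would presumably need to run the WHERE pattern as a SELECT first and check the number of solutions before issuing the update.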



--
Stian Soiland-Reyes, eScience Lab
School of Computer Science, The University of Manchester
http://orcid.org/0000-0001-9842-9718



From: Reto Gmür<mailto:reto@factsmission.com>
Sent: 18 June 2018 06:55
To: Andy Seaborne<mailto:andy@apache.org>; Semantic Web<mailto:semantic-web@w3.org>
Subject: RE: [ANN] RDF Delta : change logging and dataset replication.



Hi Andy

I'm curious: does this system rely on persistent blanknode ids or can it generate SPARQL Update statements that can be applied to any isomorphic graph?

Cheers,
Reto

> -----Original Message-----
> From: Andy Seaborne <andy@apache.org>
> Sent: Friday, June 15, 2018 6:36 PM
> To: Semantic Web <semantic-web@w3.org>
> Subject: [ANN] RDF Delta : change logging and dataset replication.
>
> RDF Delta is a system for recording and publishing changes to RDF Datasets. It
> can be used to create replicas.
>
> https://afs.github.io/rdf-delta/
>
> It is built on top of patches and logs which record the changes made to the data.
>
> https://afs.github.io/rdf-delta/rdf-patch.html
>
> One use case is running multiple sync'ed Apache Jena Fuseki servers, for high
> availability or for a request-scalable publishing solution:
>
> https://afs.github.io/rdf-delta/ha-fuseki.html
>
> The current version is 0.4.0.
>
>      Andy

Received on Monday, 18 June 2018 12:46:48 UTC