webarch frbr frir#

This document: a collection of quotations from WEBARCH, FRIR, and RDF 1.2 on the correspondence between WEBARCH and FRBR

Central document:

Authors:

  1. Acknowledgments. This document was authored by the W3C Technical Architecture Group which included the following participants: Tim Berners-Lee (co-Chair, W3C), Tim Bray (Antarctica Systems), Dan Connolly (W3C), Paul Cotton (Microsoft Corporation), Roy Fielding (Day Software), Mario Jeckle (Daimler Chrysler), Chris Lilley (W3C), Noah Mendelsohn (IBM), David Orchard (BEA Systems), Norman Walsh (Sun Microsystems), and Stuart Williams (co-Chair, Hewlett-Packard). (https://www.w3.org/TR/webarch/#acks)

three architectural bases:

  • Identification

  • Interaction

  • Formats

Running example: http://weather.example.com/oaxaca

First figure in the text:

Note: Some URI schemes (such as the “ftp” URI scheme specification) use the term “designate” where this document uses “identify.” (https://www.w3.org/TR/webarch/#uri-benefits)

Terminology:

  • Instead of “resource” we use the word “entity” in what follows. “Entity” is a legitimate synonym for “resource”:

1.2 Resources and Statements. Any IRI or literal denotes something in the world (the “universe of discourse”). These things are called resources. Anything can be a resource, including physical things, documents, abstract concepts, numbers and strings; the term is synonymous with “entity” as it is used in RDF 1.2 Semantics [RDF12-SEMANTICS]. The resource denoted by an IRI is called its referent […] (https://www.w3.org/TR/rdf12-concepts/#resources-and-statements)

What is a resource?#

2.2. URI/Resource Relationships. By design a URI identifies one resource. We do not limit the scope of what might be a resource. The term “resource” is used in a general sense for whatever might be identified by a URI. (https://www.w3.org/TR/webarch/#id-resources)

It is conventional on the hypertext Web to describe Web pages, images, product catalogs, etc. as “resources”. The distinguishing characteristic of these resources is that all of their essential characteristics can be conveyed in a message. We identify this set as “information resources.”

This document is an example of an information resource. It consists of words and punctuation symbols and graphics and other artifacts that can be encoded, with varying degrees of fidelity, into a sequence of bits. There is nothing about the essential information content of this document that cannot in principle be transfered in a message. In the case of this document, the message payload is the representation of this document.

However, our use of the term resource is intentionally more broad. Other things, such as cars and dogs (and, if you’ve printed this document on physical sheets of paper, the artifact that you are holding in your hand), are resources too. They are not information resources, however, because their essence is not information.

2.2.1. URI collision. Suppose, for example, that one organization makes use of a URI to refer to the movie The Sting, and another organization uses the same URI to refer to a discussion forum about The Sting. To a third party, aware of both organizations, this collision creates confusion about what the URI identifies, undermining the value of the URI. If one wanted to talk about the creation date of the resource identified by the URI, for instance, it would not be clear whether this meant “when the movie was created” or “when the discussion forum about the movie was created.” (https://www.w3.org/TR/webarch/#URI-collision)

Dereferencing through interaction#

A lot of technical text … brief summary of https://www.w3.org/TR/webarch/#dereference-details:

The following interaction process is called dereferencing or retrieval of a URI:

  • our browser interprets a URI

    • in particular, it strips off the fragment identifier:

      Interpretation of the fragment identifier is performed solely by the agent that dereferences a URI; the fragment identifier is not passed to other systems during the process of retrieval. (https://www.w3.org/TR/webarch/#internet-media-type)

  • our browser contacts a server and passes it the URI

  • the server constructs a representation in a particular format

  • the server sends the representation back to our browser as a message

  • our browser interprets the representation

  • depending on the format of the representation, our browser may then additionally identify a fragment.

    The semantics of a fragment identifier are defined by the set of representations that might result from a retrieval action on the primary resource. The fragment’s format and resolution are therefore dependent on the type of a potentially retrieved representation. (https://www.w3.org/TR/webarch/#internet-media-type)
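The first step of this process can be sketched in a few lines of Python. This is only an illustration under assumptions: the fragment `#tomorrow` is hypothetical and does not appear in WEBARCH, and the helper name `split_for_retrieval` is made up for these notes. The sketch shows that the agent separates the fragment before contacting the server and keeps it for later interpretation against the retrieved representation.

```python
from urllib.parse import urldefrag


def split_for_retrieval(uri):
    """Split a URI into the part sent to the server and the fragment kept by the agent.

    Hypothetical helper for illustration; it performs no network access.
    """
    request_uri, fragment = urldefrag(uri)
    return request_uri, fragment


# "#tomorrow" is an assumed fragment, added to the running example for illustration.
request_uri, fragment = split_for_retrieval("http://weather.example.com/oaxaca#tomorrow")
print(request_uri)  # the URI handed to the server
print(fragment)     # interpreted only by the agent, after retrieval
```

How the fragment is resolved afterwards depends on the media type of the representation that comes back, so the sketch deliberately stops before that step.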

Assuming that a representation has been successfully retrieved, the expressive power of the representation’s format will affect how precisely the representation provider communicates resource state. If the representation communicates the state of the resource inaccurately, this inaccuracy or ambiguity may lead to confusion among users about what the resource is. If different users reach different conclusions about what the resource is, they may interpret this as a URI collision (§2.2.1). Some communities, such as the ones developing the Semantic Web, seek to provide a framework for accurately communicating the semantics of a resource in a machine readable way. (https://www.w3.org/TR/webarch/#dereference-details)

The IRI owner can establish the intended referent by means of a specification or other document that explains what is denoted. […] A good way of communicating the intended referent is to set up the IRI so that it dereferences [WEBARCH] to such a document. Such a document can, in fact, be an RDF document that describes the denoted resource by means of RDF statements. […] Perhaps the most important characteristic of IRIs in web architecture is that they can be dereferenced, and hence serve as starting points for interactions with a remote server. (https://www.w3.org/TR/rdf11-concepts/#referents)

To summarize: a URI is a sign that is connected to two entities at once:

  1. It “identifies” any thing in the world (digital or analog, concrete or abstract, existing or merely conceivable);

  2. it “dereferences”, in the course of a technical interaction process (here the HTTP protocol), to a representation of the identified thing.

What is a representation?#

A representation is data that encodes information about resource state. Representations do not necessarily describe the resource, or portray a likeness of the resource, or represent the resource in other senses of the word “represent”. (https://www.w3.org/TR/webarch/#internet-media-type)

What is clear: a representation is a digital entity that is adequately described, in particular, by its format.

What is open: “A representation consists of data that encode information about a particular state of an entity”: what might that mean?

FRIR#

FRIR is a transfer of FRBR to digital resources … and it has a particular direction: FRIR interprets HTTP interactions by taking FRBR as given.

Hint

Paper: [MLG+12]

Source: James P. McCusker, Timothy Lebo, Alvaro Graves, Dominic Difranzo, Paulo Pinheiro, and Deborah L. McGuinness: Functional Requirements for Information Resource Provenance on the Web. https://www.researchgate.net/publication/262369023_Functional_Requirements_for_Information_Resource_Provenance_on_the_Web June 2012, DOI:10.1007/978-3-642-34222-6_5, Conference: Proceedings of the 4th international conference on Provenance and Annotation of Data and Processes

Functional Requirements for Information Resources [Footnote 5: http://purl.org/twc/pub/mccusker2012parallel] (FRIR) extends the use of frbr:Work, frbr:Expression, frbr:Manifestation, and frbr:Item to electronic resources, and therefore any information resource. Within electronic resources, a frbr:Work remains a distinct intellectual or artistic creation. A frbr:Work corresponds to the Resource or Referent in the semiotic framework discussed above, and is identified by a URL, as was shown in Figure 2. Taken together, frbr:Expression, frbr:Manifestation, and frbr:Item are all aspects of the Representation, and are each Referents in their own rights. Inasmuch as they can be identified or symbolized, they have symbols that identify them. frbr:Expression corresponds to a specific set of content regardless of its serialization. For instance, two files would have the same frbr:Expression if they are the same picture stored in two different formats (e.g., JPG and PNG). Similarly, a spreadsheet stored in both CSV and Excel would still have the same frbr:Expression. frbr:Manifestations correspond to a specific bit pattern. If a file is an exact copy of another file, they have the same frbr:Manifestation. An frbr:Item is a specific copy of information stored somewhere or transmitted through a communication link. If a copy of the frbr:Item is made, it results in a new frbr:Item. (p. 56f)

[…]

Conventional message digests such as MD5 or SHA-1 produce identifiers where the probability of creating the same identifier using different data is vanishingly small. This corresponds very closely to our definition of frbr:Manifestation for electronic resources, so we make it possible to identify frbr:Manifestations using message digests. (p. 57)

Similarly, a number of content digests have been developed for RDF graphs, spreadsheets, images, and XML documents that provide the same digest hash regardless of any particular serialization. We use this to computationally identify frbr:Expressions. (p. 57)
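The two kinds of digest can be contrasted with a small Python sketch. It is not the algorithm used by FRIR: SHA-256 stands in for the MD5 and SHA-1 digests named in the quotation, and `expression_digest` is a hypothetical, deliberately simple content digest for tabular data (parse the table, then hash a canonical JSON form of the rows). The sample wind table is invented for illustration.

```python
import csv
import hashlib
import io
import json


def manifestation_digest(payload: bytes) -> str:
    """Digest of the exact bit pattern (assumption: SHA-256 instead of MD5/SHA-1)."""
    return hashlib.sha256(payload).hexdigest()


def parse_table(text: str, delimiter: str) -> list:
    """Parse delimiter-separated text into a list of rows."""
    return [row for row in csv.reader(io.StringIO(text), delimiter=delimiter)]


def expression_digest(rows: list) -> str:
    """Hypothetical content digest: hash a canonical form of the parsed rows,
    so the result does not depend on the delimiter used in the file."""
    canonical = json.dumps(rows, ensure_ascii=False, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# The same invented table in two serializations.
csv_text = "day,wind_kn\nMon,12\nTue,8\n"
tsv_text = "day\twind_kn\nMon\t12\nTue\t8\n"

same_manifestation = (
    manifestation_digest(csv_text.encode("utf-8"))
    == manifestation_digest(tsv_text.encode("utf-8"))
)
same_expression = (
    expression_digest(parse_table(csv_text, ","))
    == expression_digest(parse_table(tsv_text, "\t"))
)
print(same_manifestation, same_expression)
```

The byte-level digests differ because the bit patterns differ, while the content digests agree because both files parse to the same rows. A real content digest would also have to settle questions this sketch ignores, such as row order and value normalization.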

Backlink: https://www.researchgate.net/figure/Relating-URIs-Resources-and-Representations-using-FRIR-FRBR-and-the-semiotic_fig3_262369023

Ontology: http://purl.org/twc/ontology/frir.owl | https://raw.githubusercontent.com/timrdf/csv2rdf4lod-automation/master/doc/ontology/frir.owl

Application to the example https://www.windfinder.com/forecast/st_leon_lake:

  • Work: a complex collection of information

    • e.g. a weather forecast tailored specifically to the needs of water sports enthusiasts, here for the lake at St. Leon-Rot

  • Expression: a linguistic and typographic rendering of a work, e.g. a weather forecast as a dashboard in DE or EN

    • Translations into DE and EN, or values given in knots or km/h, produce different expressions of the same work.

  • Manifestation: a file that is received as the message payload resulting from HTTP content negotiation.

    • Different formats (pdf, html, Markdown, csv, xls etc.) produce different manifestations of the same expression.

  • Item

    • The file that a web server built for me, cached on its disk, and then sent to me, and the file that I then received and stored in my reference manager, are two different items of the same manifestation.
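The relation between Item and Manifestation in this example can be sketched as a small Python data structure. All names here (`Item`, `location`, `manifestation_id`) and the payload are hypothetical and chosen only for illustration; they are not taken from the FRIR ontology. The assumption is that a manifestation is identified by a digest of the bit pattern, as FRIR suggests, with SHA-256 as the digest.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    """A specific copy of some bytes at a specific place (hypothetical model)."""
    location: str
    payload: bytes

    @property
    def manifestation_id(self) -> str:
        # Assumption: the manifestation is identified by a digest of the bit pattern.
        return hashlib.sha256(self.payload).hexdigest()


# Invented payload; both copies carry exactly the same bytes.
server_copy = Item(location="server-cache", payload=b"<html>forecast</html>")
local_copy = Item(location="reference-manager", payload=b"<html>forecast</html>")

print(server_copy == local_copy)                                    # two distinct items
print(server_copy.manifestation_id == local_copy.manifestation_id)  # one manifestation
```

The two copies compare as different items because they live in different places, yet they share one manifestation identifier because their bytes are identical.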

To summarize:

  • In WEBARCH a representation represents an entity, but WEBARCH gives no indication of a distinction between Expression, Manifestation, and Item.

  • FRIR interprets webarch:representations from the perspective of FRBR.

The representation of that resource is the content that comes from dereferencing the URL, and is composed of an frbr:Expression, frbr:Manifestation, and frbr:Item. (p. 59)

Problem: the transfer of FRBR to FRIR is plausible but not yet entirely watertight. In particular, the FRIR correspondence above gives us a workable understanding of frir:Manifestation, but not yet a positive understanding of frir:Expression.

frbr:Expression corresponds to a specific set of content regardless of its serialization. For instance, two files would have the same frbr:Expression if they are the same picture stored in two different formats (e.g., JPG and PNG). [MLG+12], (p. 56)

A “set of content regardless of its serialization”: what could that be? We are therefore looking for a positive description, characterization, or definition of a digital expression (for which one can then, as FRIR proposes, compute a content digest). The RDF spec provides the distinction we are looking for.

Abstract and concrete RDF graph#

Source: https://www.w3.org/TR/rdf11-concepts/

In the RDF 1.1 spec we find the definition of an (RDF) document:

1.8 RDF Documents and Syntaxes. An RDF document is a document that encodes an RDF graph or RDF dataset in a concrete RDF syntax, such as Turtle [TURTLE], RDFa [RDFA-PRIMER], JSON-LD [JSON-LD], or TriG [TRIG]. RDF documents enable the exchange of RDF graphs and RDF datasets between systems. (https://www.w3.org/TR/rdf11-concepts/#rdf-documents)

A concrete RDF syntax may offer many different ways to encode the same RDF graph or RDF dataset, for example through the use of namespace prefixes, relative IRIs, blank node identifiers, and different ordering of statements. While these aspects can have great effect on the convenience of working with the RDF document, they are not significant for its meaning.

It seems natural to look in this spec for something like an abstract RDF syntax as well. We find it:

Abstract. The Resource Description Framework (RDF) is a framework for representing information in the Web. This document defines an abstract syntax (a data model) which serves to link all RDF-based languages and specifications. The abstract syntax has two key data structures: RDF graphs are […]. RDF datasets are […] (https://www.w3.org/TR/rdf11-concepts/#abstract)

The core structure of the abstract syntax is a set of triples, each consisting of a subject, a predicate and an object. A set of such triples is called an RDF graph. (https://www.w3.org/TR/rdf11-concepts/#data-model)

Anyone who knows FRBR will recognize here the distinction between Expression and Manifestation that we were looking for:

  • An RDF document is essentially described by a concrete syntax, that is, by a format, and thus corresponds to the Manifestation aspect of a webarch:representation

  • An RDF graph (resp. an RDF dataset) is introduced in RDF as a data model; it is a set of RDF triples (resp. RDF graphs) that conform to an abstract syntax, and thus corresponds to the Expression aspect of a webarch:representation
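The distinction can be made concrete with a toy Python sketch. It is not a real RDF parser: `parse_simple_ntriples` is a hypothetical helper that handles only a restricted N-Triples-like subset (one triple per line, IRIs only, no literals or blank nodes), and the `example.org` vocabulary IRIs are invented. The point is only that two different documents (concrete syntax) can encode the same set of triples (abstract syntax).

```python
def parse_simple_ntriples(document: str) -> frozenset:
    """Toy parser for a restricted N-Triples-like subset: one triple per line,
    three whitespace-free terms followed by a final '.'; '#' lines are comments."""
    triples = set()
    for line in document.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        if not line.endswith("."):
            raise ValueError(f"missing final '.': {line!r}")
        subject, predicate, obj = line[:-1].split()
        triples.add((subject, predicate, obj))
    return frozenset(triples)


# Two different documents: statement order, spacing, and a comment differ.
doc_a = """
<http://weather.example.com/oaxaca> <http://example.org/vocab#forecastFor> <http://example.org/place/Oaxaca> .
<http://weather.example.com/oaxaca> <http://example.org/vocab#unit> <http://example.org/unit/knot> .
"""
doc_b = """
# same statements, different order and spacing
<http://weather.example.com/oaxaca>   <http://example.org/vocab#unit>   <http://example.org/unit/knot> .
<http://weather.example.com/oaxaca> <http://example.org/vocab#forecastFor> <http://example.org/place/Oaxaca> .
"""

graph_a = parse_simple_ntriples(doc_a)
graph_b = parse_simple_ntriples(doc_b)
print(doc_a == doc_b)      # the documents differ
print(graph_a == graph_b)  # the graphs are the same set of triples
```

In FRIR terms, the two documents would be different manifestations, while the shared set of triples plays the role of the expression. Blank nodes make graph comparison harder in real RDF, which this sketch sidesteps by excluding them.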

If RDF is a data model, then an RDF graph can be understood as a model, here a model of a frbr:Work. The original frbr:WEMI now looks like this to us:

  • Work – Work

  • Expression – model as an rdf:Dataset of rdf:Graphs

  • Manifestation – a message, e.g. as an rdf11:rdfDocument, the result of a SPARQL query, etc.

Could one also use this comparison to recast the corresponding FRBR terms from frbr:wemi?

Addendum: the term data model also has two meanings:

Overview. The term data model can refer to two distinct but closely related concepts. Sometimes it refers to an abstract formalization of the objects and relationships found in a particular application domain: for example the customers, products, and orders found in a manufacturing organization. At other times it refers to the set of concepts used in defining such formalizations: for example concepts such as entities, attributes, relations, or tables. So the “data model” of a banking application may be defined using the entity–relationship “data model”. This article uses the term in both senses. (https://en.wikipedia.org/wiki/Data_model#Overview)

The RDF 1.1 spec presents a data model in the second sense.

Fragment Identifiers#

  1. Fragment Identifiers. RDF uses IRIs, which may include fragment identifiers, as resource identifiers. The semantics of fragment identifiers is defined in RFC 3986 [RFC3986]: They identify a secondary resource that is usually a part of, view of, defined in, or described in the primary resource, and the precise semantics depend on the set of representations that might result from a retrieval action on the primary resource.

This section discusses the handling of fragment identifiers in representations that encode RDF graphs. In RDF-bearing representations of a primary resource <foo>, the secondary resource identified by a fragment bar is the resource denoted by the full IRI <foo#bar> in the RDF graph. Since IRIs in RDF graphs can denote anything, this can be something external to the representation, or even external to the web. In this way, the RDF-bearing representation acts as an intermediary between the web-accessible primary resource, and some set of possibly non-web or abstract entities that the RDF graph may describe.
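A minimal Python sketch of this reading follows, under stated assumptions: the graph is an in-memory set of triples standing in for an RDF-bearing representation that would normally be retrieved over HTTP, and the fragment `#city`, the `example.org` vocabulary, and the helper name `resolve` are all hypothetical.

```python
from urllib.parse import urldefrag

PRIMARY = "http://weather.example.com/oaxaca"

# Stand-in for the RDF graph encoded in a representation of PRIMARY (invented data).
GRAPH = {
    (PRIMARY, "http://example.org/vocab#describes", PRIMARY + "#city"),
    (PRIMARY + "#city", "http://example.org/vocab#name", "Oaxaca"),
}


def resolve(full_iri: str, graphs_by_primary: dict) -> set:
    """Look up the primary resource's graph, then return the statements whose
    subject is the full IRI, fragment included."""
    primary, _fragment = urldefrag(full_iri)
    graph = graphs_by_primary[primary]  # stands in for a retrieval action
    return {triple for triple in graph if triple[0] == full_iri}


about_city = resolve(PRIMARY + "#city", {PRIMARY: GRAPH})
print(about_city)
```

Only the primary IRI is used for the lookup that stands in for retrieval, while the full IRI selects the statements about the secondary resource, which here is a city and therefore something outside the web.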