Re: [dpub identifiers] Please review updated Identifiers TF wiki from AUDRAIN LUC on 2015-04-08 (public-digipub-ig@w3.org from April 2015)

From: AUDRAIN LUC <LAUDRAIN@hachette-livre.fr>
Date: Wed, 8 Apr 2015 18:41:35 +0200
To: Ivan Herman <ivan@w3.org>, Bill Kasdorf <bkasdorf@apexcovantage.com>
CC: "Stein, Ayla" <astein@illinois.edu>, Thierry Michel <tmichel@w3.org>, W3C Digital Publishing IG <public-digipub-ig@w3.org>
Message-ID: <D14B246D.4798D%laudrain@hachette-livre.fr>
Dear Ivan,

In EPUB3 files, HTML content is tagged with empty anchors like:
"Ils n’en veulent pas, ils n’en <a id="page_182"/>veulent pas, elle lâche
dans un soupir en attrapant encore une lettre."

This means that a new paper page starts at the word "veulent".
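
As a rough illustration only (not part of our production tooling), such
markers could be located with a few lines of Python. This sketch assumes
the content file is well-formed XHTML and that the ids follow the "page_"
prefix convention of the sample above; the function name find_page_breaks
is hypothetical, and the sample sentence is shortened.

```python
import xml.etree.ElementTree as ET

XHTML = "{http://www.w3.org/1999/xhtml}"

# Shortened version of the sample sentence, wrapped in a <p> so it parses.
SAMPLE = (
    '<p xmlns="http://www.w3.org/1999/xhtml">Ils n\u2019en veulent pas, '
    'ils n\u2019en <a id="page_182"/>veulent pas, elle l\u00e2che dans un soupir.</p>'
)

def find_page_breaks(xhtml, prefix="page_"):
    """Return (print page number, first word of that page) pairs."""
    root = ET.fromstring(xhtml)
    breaks = []
    for a in root.iter(XHTML + "a"):
        anchor_id = a.get("id", "")
        # Keep only empty anchors (no text, no children) with the assumed prefix.
        if anchor_id.startswith(prefix) and not a.text and len(a) == 0:
            words = (a.tail or "").split()
            breaks.append((anchor_id[len(prefix):], words[0] if words else ""))
    return breaks

print(find_page_breaks(SAMPLE))  # [('182', 'veulent')]
```

The text that follows an empty element is its "tail" in ElementTree, which
is why the first word of the new print page can be read from a.tail.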

In parallel, the EPUB3 nav document contains an ordered list of navigation
points in a <nav epub:type="page-list"> element:
<li>
   <a href="chap22.html#page_182">Page 182</a>
</li>
Then the label of this paper page 182 is "Page 182".
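
To make the link between the two files concrete, here is a minimal Python
sketch that reads a page-list and maps each label to its target file and
anchor. It assumes a well-formed XHTML nav document using the standard
XHTML and EPUB (http://www.idpf.org/2007/ops) namespaces; the surrounding
html/body/ol wrapper and the function name read_page_list are my own
illustration, only the <li> entry comes from the example above.

```python
import xml.etree.ElementTree as ET

XHTML = "{http://www.w3.org/1999/xhtml}"
EPUB_TYPE = "{http://www.idpf.org/2007/ops}type"

# Illustrative nav document; only the <li> entry is taken from the example.
NAV_SAMPLE = """\
<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:epub="http://www.idpf.org/2007/ops">
  <body>
    <nav epub:type="page-list">
      <ol>
        <li><a href="chap22.html#page_182">Page 182</a></li>
      </ol>
    </nav>
  </body>
</html>
"""

def read_page_list(nav_xhtml):
    """Map each page label to a (content file, anchor id) pair."""
    root = ET.fromstring(nav_xhtml)
    pages = {}
    for nav in root.iter(XHTML + "nav"):
        # epub:type may hold several space-separated values.
        if "page-list" not in nav.get(EPUB_TYPE, "").split():
            continue
        for a in nav.iter(XHTML + "a"):
            label = "".join(a.itertext()).strip()
            target, _, fragment = a.get("href", "").partition("#")
            pages[label] = (target, fragment)
    return pages

print(read_page_list(NAV_SAMPLE))  # {'Page 182': ('chap22.html', 'page_182')}
```

The fragment returned here ("page_182") is the id of the empty anchor in
chap22.html, so a reading system can jump from the label to the exact word
where the print page starts.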


In terms of workflow, as a matter of good practice, we produce a new EPUB
file as soon as text corrections have been inserted in the reprinted book.

Best,
Luc




Luc Audrain
Hachette Livre
Direction Innovation et Technologie Numérique
11, rue Paul Bert, 92240 Malakoff
Fixe : +33 (0) 1 4123 6370
Mobile : +33 (0) 6 48 38 21 41





On 08/04/2015 18:27, "Ivan Herman" <ivan@w3.org> wrote:

>In spite of being a W3C digerati, ie, the worst possible sort:-), I do
>understand...
>
>It is good that this came up: it must be recorded as a requirement...
>
>But how does it work, eg, in EDUPUB? Does it mean that the, say, HTML
>file contains some non-visible spans with ID-s, where the ID somehow
>reflects the page number of the print version? And what happens if there
>is a new printed version (but no new digital version)?
>
>Ivan
>
>
>> On 08 Apr 2015, at 16:55 , Bill Kasdorf <bkasdorf@apexcovantage.com>
>>wrote:
>> 
>> This issue is mainly pertinent to publications originally published in
>>print and only later provided in digital form. There are of course
>>millions of such publications in libraries, which is the main domain of
>>the HathiTrust.
>> 
>> The reason this is important is that there are four primary use cases
>>characteristic of this "print is the version of record" situation:
>> 
>> --The indexes in print books typically (though not universally) point
>>to arbitrary points in the content: the print page breaks.
>> --Cross-references in the text of print books typically refer to print
>>page breaks.
>> --Citations in the literature (very important in scholarship) point to
>>print page breaks.
>> --The accessibility community strongly advocates the recording of print
>>page breaks in digital versions of print publications, particularly
>>textbooks, so that when the teacher says "turn to page 53" the
>>print-disabled user can find that spot (as can any user of the digital
>>version).
>> 
>> While most W3C folks would argue that this is a relic of print-based
>>publishing (and it is), and would argue that these should be replaced
>>with real links to meaningful points in the content, not to something as
>>arbitrary as a print page break (which is indisputably better), it
>>unfortunately happens to be a real need when we are in this transitional
>>phase; and all of those millions of old books, and the citations to
>>their pages, do actually exist. So it really does turn out to be useful
>>to have "markers" in a digital file designating where the print page
>>breaks are--accompanied, btw, with an ability to designate _which_ print
>>edition the markers refer to.
>> 
>> As distasteful as that is to digerati like us. ;-)
>> 
>> And btw, in the context of EPUB-WEB, for these very reasons (especially
>>the accessibility issue), providing such print page break markers is
>>recommended in the EDUPUB spec, which provides a recommended syntax for
>>the marker. It doesn't attempt to contain the page with a
>>start-and-end-tag pair, because you run into well-formedness issues;
>>instead, it just provides an empty element that says, in effect, "page
>>53 in the print book starts here."
>> 
>> --Bill K
>> 
>> -----Original Message-----
>> From: Ivan Herman [mailto:ivan@w3.org]
>> Sent: Wednesday, April 08, 2015 4:30 AM
>> To: Stein, Ayla
>> Cc: Thierry Michel; Bill Kasdorf; W3C Digital Publishing IG
>> Subject: Re: [dpub identifiers] Please review updated Identifiers TF
>>wiki
>> 
>> Thank you Ayla.
>> 
>> Without going into the details of the proposal, the question it raises
>>to me, as part of the EPUB-WEB discussion, is what is the role (if any)
>>of an identifier that identifies a *page*. Indeed, depending on the
>>style of the online document, a page is
>> 
>> * a very ephemeral entity and thereby it is not really a suitable
>>target for an identifier (a flowing book, whose pagination is based on
>>user interaction, is the obvious example)
>> * a fixed entity, ie, for fixed layout document
>> 
>> it strikes me that an identifier approach for an EPUB-WEB document
>>needs to cover the second item, too. AFAIK, CFI can do that only if the
>>fixed layout document is organized in terms of a series of separate
>>files within the package, but that may not cover all the cases (e.g., if
>>a presentation slide show is stored as a portable document, and the
>>'pagination' is the result of a javascript running on one single source).
>> 
>> Whether the approach taken by the HathiTrust document is the right one
>>(as far as I could understand from a cursory look it assigns a UDDI type
>>URN to each page, which is then combined with the identifier of a
>>'volume') is a different question. I am not sure this is a general
>>solution but I guess the more general questions are certainly valid!
>> 
>> Thanks again
>> 
>> Ivan
>> 
>> 
>>> On 07 Apr 2015, at 20:21 , Stein, Ayla <astein@illinois.edu> wrote:
>>> 
>>> Matt's comment about content version reminded me of some ongoing work
>>>at the HathiTrust Research Center. One of the problems they're looking
>>>into is identifying an object at a specific point in time. Their
>>>initial proposal document discusses several different issues regarding
>>>identifiers in HTRC and can be accessed here:
>>>https://www.ideals.illinois.edu/handle/2142/73147. I've also added it
>>>as an attachment to this email.
>>> 
>>> I know there's also been some work on a prototype for identifying
>>>versions, but the draft of that document is not yet available for
>>>circulation. While these aren't necessarily solutions that can be
>>>implemented here, I think it's of interest and relevance to this
>>>discussion.
>>> 
>>> Thanks,
>>> 
>>> Ayla
>>> 
>>> -----Original Message-----
>>> From: Ivan Herman [mailto:ivan@w3.org]
>>> Sent: Tuesday, March 24, 2015 3:32 AM
>>> To: Thierry Michel
>>> Cc: Bill Kasdorf; W3C Digital Publishing IG
>>> Subject: Re: [dpub identifiers] Please review updated Identifiers TF
>>> wiki
>>> 
>>> 
>>>> On 24 Mar 2015, at 09:30 , Ivan Herman <ivan@w3.org> wrote:
>>>> 
>>>> I have added the media fragment URI to the wiki with few examples.
>>>>Thierry, if you want to add something, please do at:
>>> 
>>> Sorry, pushed the send button too soon:
>>> 
>>> https://www.w3.org/dpub/IG/wiki/Task_Forces/identifiers#W3C.E2.80.99s_
>>> Media_Fragment
>>> 
>>> Thanks
>>> 
>>> ivan
>>> 
>>>> 
>>>> 
>>>>> On 23 Mar 2015, at 08:20 , Thierry MICHEL <tmichel@w3.org> wrote:
>>>>> 
>>>>> Bill,
>>>>> 
>>>>> I would also suggest Media Fragments URI 1.0 It specifies the syntax
>>>>> for constructing media fragment URIs and explains how to handle them
>>>>>when used over the HTTP protocol.
>>>>> 
>>>>> http://www.w3.org/TR/2012/REC-media-frags-20120925/
>>>>> a W3C Recommendation 25 September 2012.
>>>>> 
>>>>> Best,
>>>>> 
>>>>> thierry.
>>>>> 
>>>>> On 22/03/2015 17:51, Bill Kasdorf wrote:
>>>>>> Thanks to Tzviya, we have some substantive content for review on
>>>>>> the Identifiers TF wiki at [1].
>>>>>> 
>>>>>> This initial draft of background information gives brief
>>>>>> descriptions, links, discussion, and examples of three possible
>>>>>> options for consideration as the basis for our initial work on a
>>>>>>Fragment Identifier:
>>>>>> 
>>>>>> --EPUB CFI
>>>>>> 
>>>>>> --W3C Packaging for the Web Fragment Identifiers
>>>>>> 
>>>>>> --The Open Annotations Fragment Selector
>>>>>> 
>>>>>> In addition, there's a placeholder for XPath, and we need to
>>>>>> collect suggestions for other relevant specs or technologies to
>>>>>> take into account, e.g. XPointer.
>>>>>> 
>>>>>> Please take a look at this before the Monday IG call and suggest
>>>>>> any others we should add. Feel free to add a placeholder (ideally
>>>>>> with a
>>>>>> link) if you aren't prepared to add the prose.
>>>>>> 
>>>>>> And although we now have a good list of participants in this TF,
>>>>>> please add your name if you'd like to participate as well. We will
>>>>>> discuss next steps on the call Monday, which will probably involve
>>>>>> a TF conference call later this week if we can find a time that
>>>>>>works for everybody.
>>>>>> 
>>>>>> --Bill K
>>>>>> 
>>>>>> [1]
>>>>>> https://www.w3.org/dpub/IG/wiki/Task_Forces/identifiers#Background
>>>>>> 
>>>>>> Bill Kasdorf
>>>>>> 
>>>>>> Vice President, Apex Content Solutions
>>>>>> 
>>>>>> Apex CoVantage
>>>>>> 
>>>>>> W: +1 734-904-6252
>>>>>> 
>>>>>> M: +1 734-904-6252
>>>>>> 
>>>>>> @BillKasdorf <http://twitter.com/#!/BillKasdorf> //
>>>>>> 
>>>>>> _bkasdorf@apexcovantage.com_
>>>>>> 
>>>>>> ISNI: 0000 0001 1649 0786__
>>>>>> 
>>>>>> https://orcid.org/0000-0001-7002-4786
>>>>>> <https://orcid.org/0000-0001-7002-4786?lang=en>
>>>>>> 
>>>>>> www.apexcovantage.com <http://www.apexcovantage.com/>
>>>>>> 
>>>>> 
>>>> 
>>>> 
>>>> ----
>>>> Ivan Herman, W3C
>>>> Digital Publishing Activity Lead
>>>> Home: http://www.w3.org/People/Ivan/
>>>> mobile: +31-641044153
>>>> ORCID ID: http://orcid.org/0000-0003-0782-2704
>>>> 
>>>> 
>>>> 
>>>> 
>>> 
>>> 
>>> ----
>>> Ivan Herman, W3C
>>> Digital Publishing Activity Lead
>>> Home: http://www.w3.org/People/Ivan/
>>> mobile: +31-641044153
>>> ORCID ID: http://orcid.org/0000-0003-0782-2704
>>> 
>>> 
>>> 
>>> 
>>> <IdentifiersProposal.pdf>
>> 
>> 
>> ----
>> Ivan Herman, W3C
>> Digital Publishing Activity Lead
>> Home: http://www.w3.org/People/Ivan/
>> mobile: +31-641044153
>> ORCID ID: http://orcid.org/0000-0003-0782-2704
>> 
>> 
>> 
>> 
>> 
>
>
>----
>Ivan Herman, W3C
>Digital Publishing Activity Lead
>Home: http://www.w3.org/People/Ivan/
>mobile: +31-641044153
>ORCID ID: http://orcid.org/0000-0003-0782-2704
>
>
>
>
Received on Wednesday, 8 April 2015 16:42:14 UTC