Re: In-band text track captions and subtitles from Bob Lund on 2014-06-16 (public-html@w3.org from June 2014)

From: Bob Lund <B.Lund@CableLabs.com>
Date: Mon, 16 Jun 2014 21:23:53 +0000
To: Silvia Pfeiffer <silviapfeiffer1@gmail.com>
CC: "public-html@w3.org" <public-html@w3.org>
Message-ID: <CFC45D5C.418F4%b.lund@cablelabs.com>
On 6/15/14, 5:32 PM, "Silvia Pfeiffer" <silviapfeiffer1@gmail.com> wrote:

>[Replying only on the HTML WG to avoid cross-posting.]
>
>
>
>On Wed, Jun 11, 2014 at 7:26 AM, Bob Lund <B.Lund@cablelabs.com> wrote:
>> In-band Tracks CG and HTML WG members,
>>
>> "Sourcing In-band Media Resource Tracks from Media Containers into
>>HTML" [1]
>> defines a method for using DataCue to expose MPEG-2 Transport Stream
>> captions (CEA 708 [2]) and subtitles (SCTE 27 [3]). This same approach
>>could
>> be used for exposing Text Track Cues for other media containers that
>>don't
>> use VTTCue. Discussion during development of the definition raised some
>> questions about TextTrack and DataCues that might benefit from
>>discussion in
>> these groups.
>>
>> - DataCue is currently defined in W3C HTML5 CR [4] for use on metadata
>>text
>> tracks. Does text need to be added to [4] to clarify that DataCue can be
>> used for non-metadata text tracks?
>
>DataCue could be defined on text tracks of any kind

Then the spec language needs to be changed to reflect this.

> - in fact, we have
>already stopped throwing errors when this happens:
>https://www.w3.org/Bugs/Public/show_bug.cgi?id=25261 .
>
>
>> - The sourcing spec [1] defines DataCue.data to contain the CEA 708 or
>>SCTE
>> 27 data. [2] and [3], respectively, define the rendering behavior
>>required
>> for these formats. Should there be a clarification in HTML specs that
>> DataCue can be rendered by the UA as long as a rendering specification
>>is
>> referenced?
>
>It would be possible to source CEA708 captions into DataCue objects
>and have the kind=captions and the UA render the captions according to
>[2].

Good, that seems right to me, too.

> This would expose the cue content to JS, but without JS
>developers being able to make use of the CEA708 rendering capabilities
>of the browser. In my opinion in this case the browser should expose
>CEA708Cue objects and the rendering abilities instead.

While this might be desirable, there are several factors that need to be
taken into account:

1) What is most critical for accessibility and regulatory reasons is that
a mechanism exist for 708 captions to be rendered with controls for
caption tracks to be showing or hidden/disabled.

2) The only "spec" for 708 rendering is CEA708. So there is no defined set
of higher level rendering capabilities. The 708 to VTT spec [6] could be
used but I think that needs broader consensus before it becomes the
"standard" 708 cue representation. This can happen but it can be done as
a second phase of this work.

3) JS might want access to the cue for non-rendering purposes, e.g.
searching content based on keywords/phrases. Exposing the raw 708 data
suffices for this.

4) It's presumed that there is some intermediate, higher level form the
captions take, prior to rendering. This needn't be the case, for example
if the UA contains a hardware 708 rendering capability.

IMO, we should expose 708 data as proposed - service blocks, either via
the DataCue or a 708Cue, ASAP. We can work on a more semantically rich cue
format if we want in parallel with that.
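
As a rough illustration of what "service blocks" would mean for script, here
is a TypeScript sketch that splits a cue's raw bytes into service blocks. It
assumes the cue data holds back-to-back CEA-708 service blocks, each starting
with a one-byte header (3-bit service number, 5-bit block size), with one
extra header byte when the service number is 7. The names ServiceBlock and
splitServiceBlocks are made up for this example, and the byte layout should
be checked against CEA-708 itself before anyone relies on it.

```typescript
// Hypothetical shape for one parsed service block (not taken from any spec).
interface ServiceBlock {
  serviceNumber: number; // caption service this block belongs to
  payload: Uint8Array;   // block data, still in raw 708 caption coding
}

// Split a byte buffer into service blocks.
// ASSUMED layout: header byte = service_number (high 3 bits) | block_size
// (low 5 bits); service_number 7 means the next byte carries a 6-bit
// extended service number.
function splitServiceBlocks(data: Uint8Array): ServiceBlock[] {
  const blocks: ServiceBlock[] = [];
  let pos = 0;
  while (pos < data.length) {
    const header = data[pos++];
    let serviceNumber = header >> 5;
    const blockSize = header & 0x1f;
    // Service number 0 is treated here as padding: stop parsing.
    if (serviceNumber === 0) break;
    if (serviceNumber === 7) {
      if (pos >= data.length) break; // truncated extended header
      serviceNumber = data[pos++] & 0x3f;
    }
    if (pos + blockSize > data.length) break; // truncated block, drop it
    blocks.push({ serviceNumber, payload: data.slice(pos, pos + blockSize) });
    pos += blockSize;
  }
  return blocks;
}
```

The payload is left undecoded on purpose: a UA with its own 708 renderer
never needs more than this, and script that wants to search caption text
would still have to interpret the 708 caption coding inside each payload.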

>
>
>> - There may be the implication that since DataCue is currently
>>specified for
>> use with metadata text tracks, then "captions" and "subtitles" text
>>tracks
>> that use DataCue will never be rendered by the UA. Is language needed in
>> HTML to clarify that non-metadata TextTracks using DataCue should be
>> rendered according to @mode state?
>
>I don't think there is anything unclear about the DataCue and its
>rendering abilities. The spec already says:
>"The rules for updating the text track rendering for a DataCue simply
>state that there is no rendering, even when the cues are in showing
>mode and the text track kind is one of subtitles or captions or
>descriptions or chapters."
>
>This just means that mode=showing will "overlay the cues as
>appropriate", which in the case of DataCue means: showing nothing.

I don't see this in either the latest HTML5 CR or ED. However, if
there is consensus and spec language that precludes rendering tracks that
expose data via DataCue, then we could define a format specific cue, e.g.
CEA708Cue that exposes the same data, i.e. service blocks binary data.


>
>
>> - The question arose whether it is ever the case where "captions",
>> "subtitles", "descriptions" and "chapters" text tracks would NOT be
>>rendered
>> by the UA. The existing definition for UA behavior seems to imply that
>>the
>> UA must render these types of text tracks when TextTrack.mode is set to
>> "showing" [5]. Does the HTML spec language need to be more explicit?
>
>I do wonder what to do with CEA708 captions while browsers don't
>convert them to WebVTT to expose as VTTCue, and while they don't have
>rendering implemented for them,

I think that tracks that don't have rendering implemented are, by
definition, metadata text tracks.

A UA that supports MPEG-2 TS media resources should be capable of rendering
708 captions. It may expose cue data as "service blocks", through DataCue.
If DataCue is objectionable for some reason, then we should define a
CEA708 Cue with a "service_block" attribute.
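
To show how little separates the two options, here is a TypeScript sketch of
both cue shapes side by side. Nothing here is a defined interface: the
CEA708CueLike type and its service_block attribute only mirror the wording
above, DataCueLike is a cut-down stand-in for DataCue, and serviceBlockBytes
is an illustrative helper.

```typescript
// Timing fields that every text track cue carries (subset of TextTrackCue).
interface CueTiming {
  startTime: number;
  endTime: number;
}

// Cut-down stand-in for DataCue: opaque bytes in `data`.
interface DataCueLike extends CueTiming {
  data: ArrayBuffer;
}

// HYPOTHETICAL format-specific cue; the attribute name follows the message,
// no such interface is specified anywhere.
interface CEA708CueLike extends CueTiming {
  service_block: ArrayBuffer;
}

// Return the raw service block bytes from whichever cue shape is supplied.
function serviceBlockBytes(cue: DataCueLike | CEA708CueLike): Uint8Array {
  const buffer = "service_block" in cue ? cue.service_block : cue.data;
  return new Uint8Array(buffer);
}
```

Either way script ends up with the same bytes; the difference is only in
what the cue type tells the UA and the page about rendering.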

>but while they are able to parse
>CEA708 chunks and throw them to JavaScript. Would it make sense to use
>kind=captions but with cues being exposed as DataCue to indicate to
>the JS developer that they have to do the rendering manually?

It seems metadata tracks exist for this purpose. The
"inBandMetadataTrackDispatchType" can be set to identify the data as 708
caption "service blocks".
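
A page that wants to do its own rendering could then pick out those tracks
with something like the TypeScript sketch below. The dispatch type string is
invented for the example, since nothing in this thread says what value a UA
would report for 708 service blocks. TrackLike is a minimal stand-in for
TextTrack so that the sketch also runs outside a browser.

```typescript
// Minimal structural view of the TextTrack properties used here.
interface TrackLike {
  kind: string;
  inBandMetadataTrackDispatchType: string;
}

// HYPOTHETICAL value: the real string would come from the sourcing spec
// or from the UA, not from this example.
const CEA708_DISPATCH_TYPE = "x-example-cea708-service-blocks";

// Return the metadata tracks whose dispatch type marks them as carrying
// 708 service blocks. Accepts anything array-like, e.g. video.textTracks.
function find708MetadataTracks<T extends TrackLike>(
  tracks: ArrayLike<T>,
  dispatchType: string = CEA708_DISPATCH_TYPE
): T[] {
  const matches: T[] = [];
  for (let i = 0; i < tracks.length; i++) {
    const track = tracks[i];
    if (track.kind === "metadata" &&
        track.inBandMetadataTrackDispatchType === dispatchType) {
      matches.push(track);
    }
  }
  return matches;
}
```

In a browser the same call would take video.textTracks, and the page would
then listen for cue changes on the returned tracks and draw the captions
itself.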

>
>
>> - Is it OK to have a "captions" or "subtitles" text track that
>>does not
>> define a cue format, i.e. is only rendered by the UA?
>
>I think that once the browser implements rendering, a specific cue
>format should be defined, too.
>
>
>> A couple of alternatives to the use of DataCue for "captions" and
>> "subtitles" text tracks were discussed.
>>
>> Alternative #1: Format specific "captions" and "subtitles" cues. A
>>CEA708Cue
>> and SCTE27Cue could be defined that derive from DataCue. These format
>> specific cues would have a @data attribute that would contain the raw
>>CEA708
>> and SCTE27 data. Is there any advantage to such a format specific cue
>> definition over direct use of DataCue?
>
>Since the browser actually renders a CEA708Cue, it would most
>certainly have more properties parsed out from the CEA708 format than
>just the plain data.

This is implementation specific. A smart TV might use an embedded 708
rendering capability that takes as input the 708 caption coding data.

> It would, for example, know the Window and the
>Pen attributes. A proper cue object should expose these properties
>properly.

What constitutes "proper"? CEA708 is the only "standard" that exists today
so exposing cues using that syntax should be considered "proper". Your 708
to WebVTT mapping definition would enable exposing 708 as a VTTCue. But, I
think broader consensus on using this is needed. There are no other
alternatives that I am aware of.

>
>> Alternative #2: Translate MPEG-2 "captions" and "subtitles" to WebVTT
>>and use
>> a derivative of VTTCue (derivative is necessary as you'd still want to
>>make
>> the raw, binary cue data available). CEA 708 captions could be exposed
>>as a
>> VTTCue derivative according to [6]. SCTE 27 subtitles are images and no
>> mapping to VTTCue is defined (or possible?). DVB subtitles [7] also
>>mostly
>> use the image alternative and would need a mapping to WebVTT.
>
>I don't actually mind this.

I think this is a possibility but IMO it's a longer term solution. We need
a 708 captions solution before this longer term solution would be
available. IMO, a reasonable one is to point to the CEA708 spec for
rendering requirements and expose 708 "service block" data via DataCue or
something similar, e.g. CEA708Cue.

Thanks,
Bob

>Cheers,
>Silvia.
>
>> Are there any other points to consider on this topic?
>>
>> Thanks,
>> Bob Lund
>>
>> [1] http://rawgit.com/w3c/HTMLSourcingInbandTracks/master/index.html
>> [2] Good explanation http://en.wikipedia.org/wiki/CEA-708. Non-free spec
>> http://www.ce.org/Standards/Standard-Listings/R4-3-Television-Data-Systems-Subcommittee/CEA-708-D.aspx
>> [3] http://www.scte.org/documents/pdf/standards/SCTE_27_2011.pdf
>> [4] http://www.w3.org/TR/html5/embedded-content-0.html#guidelines-for-exposing-cues-in-various-formats-as-text-track-cues
>> [5] http://www.w3.org/TR/html5/embedded-content-0.html#text-track-model
>> [6] https://dvcs.w3.org/hg/text-tracks/raw-file/default/608toVTT/608toVTT.html
>> [7] http://www.etsi.org/deliver/etsi_en/300700_300799/300743/01.03.01_60/en_300743v010301p.pdf
>>
>>
>>
Received on Monday, 16 June 2014 21:24:18 UTC