Agenda for October 19, 2017 ALTO board teleconference.

  1. Find and tell a non-offensive, maybe self-deprecating joke before the meeting begins and/or after it ends. [All]
  2. Review action items. [Frederick leads discussion]
  3. Vote on new board member (CV and application sent by email). [Frederick leads discussion]
  4. Report on ALTO version 4.0, the ALTO XML schema README, and its CC-BY-SA 4.0 license. [Joachim, Evelien, Clemens, and Stefan lead discussion.]
  5. Report on registration of MIME type for ALTO. See issue 40. [Jean Philippe and Frederick lead discussion.]
  6. Report on ALTO reference examples; see issue 1. [Stefan or Christian leads discussion. (Stefan raised this issue at the last board meeting. He cannot attend this board meeting, but his colleague, the person who originally added the issue to GitHub, will attend. Christian Clausner may report on it.)]
  7. Report on issue 32. [Art leads discussion. (Art volunteered to report on this issue during the last board meeting.)]
  8. Report on progress with processing history change requests. See issue 39. [Clemens leads discussion]
  9. Report (if needed) on integration of ALTO with the International Image Interoperability Framework (IIIF); see issue 45. [Jean Philippe and Clemens lead discussion]

View all open action items here.

Attending members


Ashok reports on the Toronto IIIF meeting. He reports that IIIF will not define text granularity itself but will instead defer to ALTO. The IIIF text granularity group also discussed APIs for accessing and interacting with text. As a result of the IIIF discussions, Ashok is of the opinion that ALTO should give renewed attention to API access to text. Art agrees with Ashok's assessment of IIIF's direction with respect to text and APIs. He further adds that IIIF is beginning to be concerned about overhead, so whatever API, access, or interaction mechanism is provided must be lightweight and should be provided by a group that deals with text, namely ALTO. IIIF wants access to glyphs but does not want to develop the access/interaction mechanism itself.

Access to text by IIIF (or others) is discussed in issue 45.

Clemens argues that the schema constitutes an API and therefore no additional API needs to be developed. Clemens also mentions that it is possible to generate class files from a schema. Both Art and Ashok agree.

Frederick suggests holding an off-line meeting for those interested in this issue (Ashok, Art, Clemens, Jean-Philippe, and perhaps others). In Jean-Philippe's absence, Clemens agrees to publish a Doodle poll to find a suitable time for the off-line meeting.

Frederick reminds the board to send new ALTO use cases to Nate Trail so that he can add them to the Library of Congress's list of ALTO use cases. Raju has written to colleagues asking them for examples of ALTO used for less common languages and for permission to use the examples; he has not yet received replies but expects to have them by the next board meeting.

Clemens remarks that he has tried to recruit someone experienced with hOCR as per issue 23 (these attempts led to Ashok joining the board). Further, he remarks that his library (or German libraries? unclear - Clemens, please correct) is in negotiation with a colleague who will take over the role of hOCR editor from Thomas Breuel; if there is an opportunity to do so, Clemens will ask the new hOCR editor to join the ALTO board.

Board members unanimously approved Ralph Marschall's application to become an ALTO board member.

Discussion of schema issue 42. Evelien reports that there is indeed a problem with multiple shape elements in one text line and proposes a fix for it. The fix breaks backward compatibility and, by the ALTO board's convention, mandates a new major version, version 4. Joachim briefly reviewed the fix; according to Christian, Stefan has not yet reviewed it. Evelien, Joachim, and Stefan will review the proposed fix before the next meeting and discuss it then. This fix will be part of ALTO version 4, along with the new Creative Commons license and the glyph changes proposed in issue 26.

Frederick reports that the ALTO MIME type IETF draft RFC failed the IETF RFC checker rather miserably. Frederick promises that he and Jean-Philippe will continue to work on it and try to submit it to the IETF before the next meeting.

Christian reports on PRImA's use of ALTO: it would be helpful to have an ALTO XML file with associated image(s), either real world ALTO files or artificially generated ALTO files and images. Frederick, Raju, Evelien, and Clemens will ask the Library of Congress, Singapore National Library Board, the Netherlands, and the IMPACT Centre of Competence respectively for examples of real world use of ALTO. These example ALTO files and images will be added to the ALTO Github repository under board issue 32.

Frederick asks how to manage examples of ALTO files that use different versions of ALTO. Christian suggests putting files of different versions in different folders, one folder per version.

Clemens reports on issue 39, ALTO processing history. Some of the proposed processing history changes are included in the ALTO version 4 draft schema. Frederick notes that the README for version 4 must be updated, too. Clemens notes that, except for the processing history vocabulary (names), the processing history changes are ready and, pending board approval, will be included in ALTO version 4.

Evelien and Clemens will email the board asking it to vote ACCEPT or REJECT on the changes proposed for issue 42 and issue 39, respectively. The changes for the Creative Commons license and the glyphs (issue 26) have already been approved.

A long discussion ensued about processing history. Those interested can request the board meeting recording. Clemens mentions that W3C standards such as PROV are being developed. Ashok suggests that a few examples of processing history would be useful; Art agrees. Clemens says that the offline meetings about processing history also identified the need for a common vocabulary; however, no common vocabulary was developed. Many board members agree that examples of processing history would be useful. Clemens suggests that the description of an ALTO file's provenance should be developed further; he thinks that the processing history will not be able to describe complex processing histories. Jukka says that provenance might also be captured in METS as PREMIS DIGIPROV events. Clemens says that opening another schema issue on processing history is not necessary at this point, but that a future face-to-face board meeting would be a good place to discuss where ALTO processing history ends and some other processing history mechanism begins.