Agenda for September 20, 2017 ALTO board teleconference.

  1. Review action items. [Frederick leads discussion]
  2. Choose next open issue(s) for focus. See open issues. [Frederick leads discussion. ALL participate!]
  3. Report on ALTO version 3.2, the ALTO XML schema README and what must be added to it for version 3.2, and its CC-BY-SA 4.0 license. [Joachim leads discussion.]
  4. Report on registration of MIME type for ALTO. See issue 40. [Jean Philippe leads discussion.]
  5. Report on progress with processing history change requests. See issue 39. [Clemens leads discussion]
  6. Report (if needed) on integration of ALTO with the International Image Interoperability Framework (IIIF). See issue 45. [Jean Philippe and Clemens lead discussion]

View all open action items here.

Minutes

Frederick asks Raju if Siang Hock will continue on the ALTO board. Raju said that he would ask when he returns to Singapore (Raju is in India at present).

Raju said that he will give some attention to action item #4 for the next board meeting. Jean Philippe reports that he has attended a recent IIIF teleconference. He also says that it would be a good time to update use cases for IIIF and ALTO (see issue 45). Jean Philippe says he is writing an ALTO+IIIF use case for illustrations with Glen Robson, a IIIF guru. Jean Philippe mentions the IIIF Working Meeting in Toronto, October 11-13.

Frederick reports that the IETF draft document for an ALTO MIME type was reviewed at the Dresden face-to-face meeting. It is ready to be submitted to IETF. Jean Philippe and Frederick will do this before the next board meeting.

At the face-to-face meeting we also reviewed the README for ALTO schema 3.2. It's ready to go. Nate volunteers to work with Joachim to publish schema version 3.2 before the next board meeting.

The board reviewed open issues in the GitHub repository. Evelien says we should give attention to issue 42, a bug reported by an ALTO user. The bug fix would break backward compatibility. Stefan reminds the board that it had a long discussion about changes: every schema change that breaks backward compatibility will require a new major schema version. Jukka wonders if issue 42 also affects glyphs. Evelien will look at other elements to see if issue 42 affects them too.

The overdue fix for issue 42 will be added to the next schema release and will break backward compatibility, so ALTO schema version 3.2 will not be released under that number. The erstwhile version 3.2 will be renamed to version 4.0 and will incorporate a fix for issue 42 (the changes already agreed to for version 3.2 will be retained in version 4.0). Evelien will investigate the changes needed to fix issue 42, see what other schema elements may be affected, and propose schema changes in time for the next meeting; Stefan and Joachim will review the changes. The erstwhile schema version 3.2 README will be changed to accommodate the changes necessary for issue 42.

Several board members participated in a lively and lengthy discussion about reference examples for ALTO. Stefan will add an issue to GitHub to address reference examples for ALTO or revitalize issue 1; he will report on progress at the next ALTO board meeting.

Art volunteered to investigate and report on issue 32 at the next board meeting.

Ashok is interested in issue 23, OCR confidence calculation, and will assume the role of champion for the issue.

Stefan asks if the board will add new features and if it has a future direction in mind for ALTO. Frederick reports that new features have been added to ALTO as a result of external requests and defect reports.

The board held a short discussion about text direction and languages. Frederick said that ALTO has been developed with European languages in mind. Raju reported that Singapore National Library had no significant issues using ALTO for Malay, Chinese, and English; he does not anticipate problems for other languages of interest to Singapore. Raju said that there may be problems with non-spacing characters, but, because Singapore has not yet done much with such languages, he cannot say much about problems with ALTO. Ashok reported briefly on Google's experience with OCR and languages with mixed text, for example, Arabic and English. He mentioned as well the issue of compounding characters, that is, a character plus diacritic marks which readers perceive as a single character. Unicode handles such characters just fine but the OCR representation is not well-defined. Ashok mentions problems with Japanese characters (ruby). Raju said that the Thai language does not have a notion of a word, that is, there is no space between words in a line of a Thai language sentence. Ashok and Raju will add one or more issues to cover the issues raised during this discussion.
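
To illustrate the compounding-character point, here is a minimal Python sketch using the standard unicodedata module. It was not presented at the meeting; the sample string and the helper name count_combining are assumptions chosen for illustration. It shows how one perceived character can be stored as two code points (a base letter plus a non-spacing mark) or as one precomposed code point, which is one reason an OCR representation may need to state which form it uses.

```python
import unicodedata


def count_combining(text: str) -> int:
    """Count code points with a non-zero canonical combining class.

    Hypothetical helper for illustration; not part of ALTO or any OCR tool.
    """
    return sum(1 for ch in text if unicodedata.combining(ch) != 0)


# Assumed sample: 'e' followed by U+0301 COMBINING ACUTE ACCENT.
decomposed = "e\u0301"

# NFC normalization composes the pair into the single code point U+00E9.
composed = unicodedata.normalize("NFC", decomposed)

print(len(decomposed), count_combining(decomposed))
print(len(composed), count_combining(composed))
```

Both strings are displayed as the same character, yet their lengths differ, so any per-character data attached by OCR software depends on which form was chosen.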