Agenda for March 5, 2015 ALTO board teleconference.
View all open action items here.
Attending members
Minutes
Frederick and Evelien discussed the implementation of a wiki for the ALTO GitHub repository. The wiki can be viewed here.
Attending board members (Evelien, Jean Philippe, and Frederick) abandoned the draft agenda in order to discuss an email Jean Philippe had sent to all board members about ePub support for languages, text direction, etc. The discussion focused on CSS Writing Modes. It must be possible to encode text direction and language at the word level. ePub is based on HTML. In ePub and HTML it is possible to encode text direction and language at the page level, text block level, and word level. CSS vocabulary can define reading order and text direction.
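For illustration only (not shown during the call): a minimal Python sketch of how language and direction can be set at text block level and overridden at word level in HTML, using the standard `lang` and `dir` attributes. The helper names `span` and `block` and the sample words are assumptions made for this example.

```python
from html import escape

def span(word, lang=None, direction=None):
    """Wrap one word in a <span>, adding lang/dir only when they are given."""
    attrs = ""
    if lang:
        attrs += f' lang="{escape(lang)}"'
    if direction:
        attrs += f' dir="{escape(direction)}"'
    # Words without an override inherit lang/dir from the enclosing block.
    return f"<span{attrs}>{escape(word)}</span>" if attrs else escape(word)

def block(words, lang, direction):
    """Build a <p> with block-level lang/dir; each word may override them."""
    inner = " ".join(span(w, l, d) for (w, l, d) in words)
    return f'<p lang="{escape(lang)}" dir="{escape(direction)}">{inner}</p>'

# A right-to-left Hebrew block containing one left-to-right English word.
html_block = block(
    [("שלום", None, None), ("ALTO", "en", "ltr")],
    lang="he",
    direction="rtl",
)
print(html_block)
```

Page-level settings would go on the `html` element in the same way. Vertical text is handled separately by the CSS `writing-mode` property, which this sketch does not cover.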
Evelien encountered 'rotation' in the context of text blocks and wondered whether the text within the block was rotated or the block itself was rotated.
Frederick asks if it is possible to add capabilities similar to ePub and CSS Writing Modes to ALTO and maintain backward compatibility. For example, Jean Philippe says that it is quite common to include Western words in the text of Japanese newspapers. Further research is needed.
Jean Philippe says the BnF is now creating ePub from ALTO XML files. It's difficult because the OCR is often not accurate enough and requires lots of manual corrections, and because ALTO XML files are page-based while ePub is not. Jean Philippe showed an example of how the BnF is creating ePub from ALTO XML. It's relatively easy for books with a single level, like novels, but more difficult for more complex documents like textbooks.
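For illustration only: a minimal Python sketch of the simplest part of such a conversion, turning each ALTO `TextBlock` on one page into an HTML paragraph. This is not the BnF workflow, which was not documented in the meeting. The function names and the sample document are assumptions. The sketch ignores hyphenation (`HYP`, `SUBS_CONTENT`), reading order, and paragraphs that continue across pages, which is where the page-based structure of ALTO becomes a problem.

```python
import xml.etree.ElementTree as ET
from html import escape

def local(tag):
    """Strip the XML namespace so the sketch works across ALTO versions."""
    return tag.rsplit("}", 1)[-1]

def alto_page_to_html(alto_xml):
    """Turn each TextBlock into one <p>, joining the String CONTENT values."""
    root = ET.fromstring(alto_xml)
    paragraphs = []
    for el in root.iter():
        if local(el.tag) != "TextBlock":
            continue
        # Collect the words of every TextLine inside this block, in file order.
        words = [s.get("CONTENT", "") for s in el.iter() if local(s.tag) == "String"]
        if words:
            paragraphs.append("<p>" + escape(" ".join(words)) + "</p>")
    return "\n".join(paragraphs)

# Hypothetical one-block ALTO page used only to exercise the function.
SAMPLE = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v2#">
  <Layout><Page><PrintSpace>
    <TextBlock ID="b1">
      <TextLine><String CONTENT="Hello"/><SP/><String CONTENT="world"/></TextLine>
    </TextBlock>
  </PrintSpace></Page></Layout>
</alto>"""

print(alto_page_to_html(SAMPLE))
```

A real conversion would also have to merge blocks that belong to the same paragraph across page files and map headings and other structural levels, which matches the point that novels are easier than textbooks.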
Evelien asks how BnF corrects ALTO files. Jean Philippe explains that BnF delivers ALTO XML files plus images to service providers and gets in return ePub files plus corrected ALTO XML files. ALTO text correction is quite expensive according to Jean Philippe.