Agenda for September 25, 2014 ALTO board teleconference.
- Review action items [Frederick leads discussion]
- Discuss release of ALTO schema v3.0. Review comments received about schema v3.0 (if any) including the recent question from Lars Svensson (Deutsche Nationalbibliothek). [Everyone]
- Report on September 10 ALTO editorial board face-to-face meeting in London: Agenda and summary of items discussed.
[Markus leads discussion]
- Review Github directory structure: Create a repository for general questions (such as the one from Lars Svensson) about ALTO? [Frederick leads discussion]
- Review migration from PBWorks to Github. It appears that all issues have been transferred to Github. Discussion? [Frederick leads discussion]
- Brief report on effort to recruit new board members. Question for Siang Hock: Can you represent Asian languages (instead of recruiting a new board member for this)? [Frederick leads discussion]
- Reminder to send URLs to digital collections which use ALTO XML to Nate so that he can update the use case examples at ALTO Implementers [Frederick harangues everyone]
Action items
- [Action 2014-09-25]
Frederick will add an explanation of what's in the Github repositories 'board' and 'altoxml.github.io'.
- [Action 2014-07-03]
Frederick will contact UT Austin about broken link on Library of Congress's ALTO implementer's webpage.
- [Action 2014-07-03]
Jean Philippe and Jukka will create draft schema version 2.2 and create a README for it. They will send the README to the ALTO board for comment.
- [Action 2014-07-03]
Nate will make the corresponding updates to the Library of Congress ALTO webpages on July 8.
- [Action 2014-07-03]
Frederick will announce the draft schema release on the ALTO listserv and distribute the README.
- [Action 2014-07-03]
Brian and Evelien will send hi-res logos for CDNC and KB respectively to Jean Philippe. Jukka will send a hi-res logo for Finland.
- [Action 2014-04-24]
Jean Philippe will create a branch of the v2.1 ALTO schema on Github. The branch will be used to implement the needed named types for ALTO dialects in schema v2.2.
- [Action 2014-04-24]
Jean Philippe will draft a design for the ALTO poster at WLIC 2014.
- [Action 2014-04-24] Jukka will create a short document showing how to migrate open change proposals from PBWorks to Github.
- [Action 2014-02-20]
Everyone will copy the open proposals that they champion from PBWorks to Github. The open proposals will become "issues" in Github.
- [Action 2013-12-12] Brian will create text for an ALTO tools and software webpage.
- [Action 2013-11-03] Frederick will recruit a board member from an OCR software development team, perhaps Tesseract.
- [Action 2013-11-03] Frederick will recruit a board member from an Asian language country.
- [Action 2013-09-19] Everyone to send ALTO use cases to Nate. Nate will add new use cases to the ALTO implementers webpage.
- [Action 2013-04-11] Frederick will draft a change proposal for "normalized" coordinates.
Attending members
- Evelien Ket
- Frederick Zarndt
- Joachim Bauer
- Nate Trail
Minutes
Markus's London Meeting Notes from 10-September-2014
Attendees
- Joachim Bauer
- Jean Philippe Moreux
- Jukka Kervinen
- Markus Enders
-
Github PBWorks migration: Not all board members are seen as board members on the Github page. Reason: Some board member profiles are not public. Each member has to set her/his own profile to public.
-
Discussion about legacy data on PBWorks: Use cases such as the Block Type extension, BNF’s TYPE proposal, and the OCR QA proposal are rejected or obsolete and should not be moved to GitHub.
All other use cases (open or closed) have been moved.
-
PBWorks had a Tag Samples folder. Sample/V1 can be deleted and will not be moved. Sample/V2 has been moved to GitHub.
-
Board meeting minutes and agendas are not yet on Github.
-
The Miscellaneous folder contains internal documents such as the description of the change request process and the membership criteria. These items are not yet on GitHub, but should be moved to the “Board” repository in Github.
-
Member proposals do/can contain private information that we do not wish to make publicly available. Either (1) delete member proposals, or (2) get a private Github repository so that information can be hidden.
Suggestion was to delete member proposals.
-
Old comments and minutes need to be moved. Comments to be deleted (notes are not clear about this).
-
Use the 'board' repository as a miscellaneous container; it is currently unused.
-
Discussion about Action Items and Meetings: Jukka proposed to document Action Items more formally.
Currently Action Items are “just” bullet points in the Meeting Notes and are sent via Email; the Board Notes are available under altoxml.github.io as HTML.
Action Items are sent around via email prior to the next board meeting.
Proposal: A Board Meeting would be considered as a Milestone. Action Items are assigned to the milestone as well as to certain board members.
Advantage: Tracking action items would be easier; long term documentation; less risk of losing action items.
The biggest concern with the proposal is that it would add a lot of administrative overhead and that board notes/action items would not be viewable externally unless the data is copied.
No decision was made.
-
Outstanding Use Cases: Only two of the outstanding use cases were discussed. The four board members that met would like to propose the following to the board:
- Glyph: Accept the proposed solution as outlined in current use case (https://github.com/altoxml/schema/issues/26), with the following changes:
Rename the "variance" attribute to "variant". We discussed whether “alternative” (already available on word level) should be used, but "alternative" seems to have a different meaning; it is used to store manually corrected words.
Glyph should get bounding box coordinates, same as words, textline etc. Bounding Box coordinates are optional.
We may want to introduce a “variant” attribute on word level as well, but this wasn’t covered by this use case. In theory we should submit a new use case, but we may instead want to change the current use case.
This was therefore not discussed in detail. We would like to hear other board members’ opinions.
-
Shape: Accept the proposed solution as outlined in current use case (https://github.com/altoxml/schema/issues/22) with the following changes:
(1) Proposed values all need to be float values; (2) Add a description to the element that makes it clear when to use a shape (an exact description of the shape) and when to use a bounding box (may want to create separate documentation);
(3) Do not allow RECTANGLE in shape as this would then become a bounding box; (4) Shape should also be added to Glyph.
-
We also started to discuss the ROTATION attribute on BlockType and its impact on underlying elements. This is currently not defined, but the schema documentation should make clear whether:
(1) The ROTATION applies to all embedded elements, which means that all embedded elements are ROTATED as well; this would have an impact on the coordinates of all underlying elements.
(2) Or the ROTATION applies only to the Block; all embedded elements are unaffected.
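To illustrate what the two ROTATION readings above would mean for a consumer of the coordinates, here is a minimal sketch in Python. It is not taken from the schema or from the meeting notes. The function names, the choice of the block centre as the rotation pivot, and the sign convention of the angle are all assumptions made only for illustration; the schema documentation would have to define them.

```python
import math


def rotate_point(x, y, cx, cy, degrees):
    """Rotate the point (x, y) about the pivot (cx, cy) by `degrees`.

    Assumption: the standard 2D rotation matrix is used. Which visual
    direction this corresponds to on the page depends on the axis
    orientation, which this sketch does not try to settle.
    """
    theta = math.radians(degrees)
    dx, dy = x - cx, y - cy
    return (
        cx + dx * math.cos(theta) - dy * math.sin(theta),
        cy + dx * math.sin(theta) + dy * math.cos(theta),
    )


def effective_position(child_xy, block_center, rotation, inherit):
    """Return the position of an embedded element under either reading.

    inherit=True  models reading (1): the block's ROTATION also applies
                  to embedded elements, so their coordinates are rotated
                  about the (assumed) block centre.
    inherit=False models reading (2): only the block is rotated and the
                  embedded element's coordinates are used unchanged.
    """
    if inherit:
        return rotate_point(child_xy[0], child_xy[1],
                            block_center[0], block_center[1], rotation)
    return child_xy


# Hypothetical values: a word 10 units to the right of the block centre,
# inside a block with ROTATION="90".
word = (110.0, 100.0)
center = (100.0, 100.0)
print(effective_position(word, center, 90.0, inherit=True))
print(effective_position(word, center, 90.0, inherit=False))
```

Under reading (1) the word ends up roughly 10 units away from the centre along the other axis, while under reading (2) it stays where it was stored. The same file would therefore be rendered differently depending on which reading a tool implements, which is why the notes ask for the documentation to be explicit.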
Joachim reported on the face-to-face ALTO board meeting in London. He also reported that he migrated a couple of missing issues from PBWorks to Github.
Markus's notes from the meeting are included above.
Joachim reported that there was some confusion about the repositories 'board' and 'altoxml.github.io'. Frederick explained that Jukka created the 'board' repository
intending for it to contain board agendas and minutes. Frederick later created the 'altoxml.github.io' repository so that board agendas
and minutes could be viewed as webpages (the name 'altoxml.github.io' follows Github conventions for webpages). Joachim suggested that an explanation
of what's in these 2 repositories is needed (a to-do item for Frederick was added).
Joachim reported that members think that the original change requests from IMPACT.