Documentation for encoding texts in XML

Table of Contents

I: Downloads

II: Encoding the Text

Part 1: Pre-Encoding Instructions

  1. Before you start you'll need:
    • a photocopy of the original text (please do not mark)
    • a hard copy of the electronic version of the text
    • a copy of the electronic version of the text
  2. On the printout of the electronic text:
    • Note exactly where page breaks occur.
    • Proof the text again: compare the hard copy of the digitized text against the photocopy of the original, and note any changes on the printout.
    • Make any changes you noted to the electronic edition in jEdit.
  3. When you're ready to encode the text:
    • Open the text in jEdit.
    • Open Template.xml (for articles) or letterTemplate.xml (for letters) in jEdit.
    NOTE: ignore any error messages at this point.
  4. Save a copy of the template file under a new name so you don't alter the original template file by mistake.
    • All letters should be named in the same format:
      1. First, use the prefix "let" for letter. This is followed by a period.
      2. Next, designate the author of the letter: "mac." for MacGreevy and "yeats." for Yeats. Place this directly after "let."
      3. Render the date the letter was written in yyyy-mm-dd format. For example, a letter dated 15 March 1926 would be "1926-03-15". If the date is unknown, use the circa date listed in the file.
      4. Finally, add the file extension ".xml".
      For example, a letter from Yeats circa 15 March 1926 would be named: let.yeats.1926-03-15.xml
    • Filenames for articles also follow a naming convention.
      1. Names should have no more than eight letters/numbers.
      2. The first three letters indicate the type of document that is being encoded:
        • articles: art
        • letters: let
        • poems: pms
        • non-fiction prose: nfp
        • fiction prose: fip
        • monographs: mon
        • official documents: ofd
        • interviews: int
      3. The second three letters indicate the name of the journal or text, for example:
        • The Father Mathew Record: fmr
        • Capuchin Annual: cpa
      4. The third part of the name is numerical and contains the number of the article in the series, or the version of a poem being encoded, etc.
      For example, the fourth article from the Irish Times would be art.irt.0004.xml.
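If you want to double-check file names, the two conventions above can be sketched in Python. This is an optional, hypothetical helper, not part of the project's jEdit workflow: the function names are invented for illustration, and the four-digit zero padding of the article number is an assumption taken only from the art.irt.0004.xml example.

```python
from datetime import date

# Three-letter document-type prefixes from the naming convention above.
DOC_TYPE_PREFIXES = {"art", "let", "pms", "nfp", "fip", "mon", "ofd", "int"}


def letter_filename(author: str, written: date) -> str:
    """Build let.<author>.<yyyy-mm-dd>.xml (hypothetical helper).

    author must be "mac" (MacGreevy) or "yeats" (Yeats); for undated
    letters, pass the circa date recorded in the file.
    """
    if author not in ("mac", "yeats"):
        raise ValueError('author must be "mac" or "yeats"')
    # date.isoformat() yields the yyyy-mm-dd form, e.g. 1926-03-15
    return f"let.{author}.{written.isoformat()}.xml"


def article_filename(prefix: str, source: str, number: int, width: int = 4) -> str:
    """Build <prefix>.<source>.<number>.xml, e.g. art.irt.0004.xml (hypothetical helper).

    width (the zero padding of the number) is an assumption based on the
    Irish Times example; adjust it if the series uses another width.
    """
    if prefix not in DOC_TYPE_PREFIXES:
        raise ValueError(f"unknown document-type prefix: {prefix}")
    if len(source) != 3:
        raise ValueError("the journal/text code should be three letters, e.g. fmr")
    return f"{prefix}.{source}.{number:0{width}d}.xml"
```

For example, letter_filename("yeats", date(1926, 3, 15)) gives let.yeats.1926-03-15.xml, and article_filename("art", "irt", 4) gives art.irt.0004.xml.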

Part 2: Encoding the header

Header encoding in letters

Within the <teiHeader>, look for any text that ends with double question marks (??). This must be replaced with text unique to this document instance. In particular, you will need to enter text within the following tags:

  1. The first instance of (??) occurs in the DOCTYPE declaration in the ENTITY reference created for images of the pages of the text. (If there are no images associated with the text, delete the ENTITY reference and move on to the next step.)
    Later on, in the body of the text, <figure> tags will be used to create links to images (this portion of the encoding is discussed in the sections on figures in the <head> and the <p> tag). However, before you encode the links in the text, references to the images must be added to the DOCTYPE declaration. The template contains the beginnings of one such declaration, which you can complete as follows:
    1. Within the DOCTYPE declaration locate the following ENTITY reference:
      <!ENTITY ?? SYSTEM "??" NDATA JPEG>
    2. The first set of question marks should be replaced with the alphanumeric string you want to use in the <figure> tags later on. This will be like a nickname for the image. For example, if you wanted to call the first image "image1" your ENTITY declaration would read:
      <!ENTITY image1 SYSTEM "??" NDATA JPEG>
    3. Next, replace the other set of question marks with the actual filename of the jpeg image file. Be sure to include the .jpg file extension and do this within the quotation marks. So if your file was named "m1.jpg" your ENTITY declaration should now read:
      <!ENTITY image1 SYSTEM "m1.jpg" NDATA JPEG>
    4. Repeat these steps, adding a separate entity declaration for each image in the document. Be sure to keep all subsequent declarations before the closing square bracket and angle bracket (i.e., ]> ).
    5. You have now finished the ENTITY reference in the DOCTYPE declaration. You will need to use these references to create links to the images in the document when you are encoding the body of the text.
  2. <tei2>: in the attribute field id add the id number of the text, which is the file name minus the .xml. For example the file let.mac.1925-12-03.xml would be:
    let.mac.1925-12-03
  3. <title>: enter the names of the author and recipient and the date in the empty spaces in the title. All titles will follow the format of this example:
    <title type="main">Letter from <persName reg="George Yeats" type="sender">George Yeats</persName> to <persName reg="Thomas MacGreevy" type="recipient">Thomas MacGreevy</persName>. c. 15 March 1926</title>
    <title type="version">A Machine-Readable Version</title><author>George Yeats</author>
    Be sure to add the regularized form of the names in the reg attribute of the <persName> tags of the <title> as well.
  4. <respStmt>: in the <name> tag following the <resp> tag that reads "Creation of machine readable text by:", enter:
    • Susan Schreibman if the letter is authored by MacGreevy, or
    • Ann Saddlemyer if the letter is authored by George Yeats.
    Enter your own name in the <name> tag of the third <resp> statement, which reads "Header creation and markup by:".
  5. <date>: on the day you finish markup, enter that day's date. Inside the tag write out the date, day first, then month, then year. In the value attribute write the date in the yyyy-mm-dd format:
    <publicationStmt><date value="2001-10-08">8 October 2001</date>
  6. <extent>: this will be completed as the very last step. See part 5: Final Encoding for more information.
  7. <idno>: enter the ID Number of the document which is the same as the full file name with the file-type extension, for example:

    let.mac.1932-04-16.xml
  8. <note>: place the abstract for the letter here. The type attribute of this should already have the value of "abstract." [Remember that later on when you regularize and encode the content of the document you will have to return to the abstract to add this encoding as well.]
  9. Make sure that entity references are encoded so that they can be viewed in all browsers. The format for all entity references is "&#xxxx;" with the "x"s representing the numerical code (i.e. "&#0233;" for "é").
    • If you are using a PC, you can open the Character Map program (found under "All Programs/Accessories/System Tools" in the Windows Start menu) to find these codes. The code appears in the bottom right corner after the label "Keystroke:". Drop any ALT or + signs and use only the 4-digit code.
    • If you are using a Macintosh or don't have the character map program, you can find character codes at http://www.ramsch.org/martin/uni/fmi-hp/iso8859-1.html. The codes are written in the third column.
    • Em and en dashes create a special difficulty, because their Character Map keystroke codes (Alt+0151 and Alt+0150) are not their Unicode values. For these characters, use the Unicode codes instead: &#8212; for an em dash "—" (used to denote a break in the flow of the sentence; used most often) and &#8211; for an en dash "–" (used to introduce a range or list; used infrequently).
  10. Encode all of the following in <bibl> inside <sourceDesc>. The template file contains the following text:
    <bibl>
    <title>Letter from AUTHOR?? to RECIPIENT??</title>
    <date value="1964-07-18??">DATE??</date>
    <author>AUTHOR??</author>
    <orgName type="archive">LIBRARY HOLDING ORIGINAL COPY??</orgName>.
    DESCRIBE THE WRITING HERE??.
    <num type="ms">MANUSCRIPT NUMBER??</num>
    </bibl>
    • <title>: Delete the words "AUTHOR??" and "RECIPIENT??" and replace with the proper names. So, for example, a letter from Yeats to MacGreevy should look like this:
      <title>Letter from George Yeats to Thomas MacGreevy</title>
    • <date>: Enter date in the dd Month yyyy format, and the value of the date in the yyyy-mm-dd format:
      <date value="1928-03-15">15 March 1928</date>
    • <author>: The author of the letter should be placed here again:
      <author>George Yeats</author>
    • <orgName>: fill in the name of the institution holding the original copies. For MacGreevy's letters this is "National Library of Ireland"; for Yeats's letters this is "Trinity College, Dublin". Be sure to use these regularized forms.
      <orgName type="archive">Trinity College, Dublin</orgName>
    • After <orgName>, but before <num>, enter a description of how the letter was created. Choose from the following:
      • Autograph letter signed.
      • Autograph letter unsigned.
      • Typewritten letter signed.
      • Typewritten letter with autograph annotations signed.
      • Typewritten letter unsigned.
      • Typewritten letter with autograph annotations unsigned.
    • <num>: Write the original manuscript number in this field. All TCD manuscripts begin with "TCD MS 8104/" and are immediately followed by the document number. [Note that in the typed versions of these letters, the manuscript numbers appear as "TCD8104/xx." In the .xml version we are adding the "MS" as well as spaces after "TCD" and "MS".] All National Library manuscripts use "NLI MS 30,859" and have no document number.
      Document numbers for the Yeats letters are circled in the upper right-hand corner of the microfilm copies. This number should be placed directly after the "/". So for the example that we have been using, this element would look like:
      <num type="ms">TCD MS 8104/57</num>
  11. <item> in <keywords>: leave this element empty at the moment. These keywords will be added by project editors.
  12. Finally, make sure you have not missed anything. Search for ?? (in jEdit you can open the search dialog box by pressing ctrl+f) and make any changes you missed.
  13. Be sure to save the file!
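Several of the header steps above are mechanical and can be double-checked with a short script. The Python sketch below is a hypothetical helper, not part of the project's templates: it builds an image ENTITY declaration, the <tei2> id, and the <date> element, produces four-digit numeric character references, and lists lines that still contain ??. Because ord() returns the Unicode code point, it gives 8212 for an em dash rather than the Character Map keystroke code.

```python
from datetime import date

MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]


def entity_decl(nickname: str, jpeg_file: str) -> str:
    """ENTITY declaration for one page image, as used in the DOCTYPE."""
    return f'<!ENTITY {nickname} SYSTEM "{jpeg_file}" NDATA JPEG>'


def document_id(filename: str) -> str:
    """The <tei2> id: the file name minus the .xml extension."""
    return filename[:-len(".xml")] if filename.endswith(".xml") else filename


def date_element(d: date) -> str:
    """<date> with "d Month yyyy" content and a yyyy-mm-dd value attribute."""
    # A fixed month list avoids locale-dependent month names from strftime.
    return f'<date value="{d.isoformat()}">{d.day} {MONTHS[d.month - 1]} {d.year}</date>'


def char_ref(ch: str) -> str:
    """Numeric character reference padded to four digits, e.g. é -> &#0233;."""
    return f"&#{ord(ch):04d};"


def leftover_placeholders(xml_text: str) -> list:
    """1-based numbers of lines that still contain the ?? placeholder."""
    return [n for n, line in enumerate(xml_text.splitlines(), start=1) if "??" in line]
```

For a letter with several page images, calling entity_decl(f"image{i}", name) for each file name in turn produces one declaration per image, ready to paste before the closing ]>.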

Header encoding in articles

Within the <teiHeader> section, look for any text that ends with double question marks (??). These must be replaced with text unique to this document instance. In particular, you will need to enter text within the following tags:

  1. <tei2>: in the attribute field id, add the filename of the text minus the ".xml" extension. For example, rev.it.023.xml would be rev.it.023.
  2. <title>: If the text is NOT a book review, enter the title in the very first pair of <title> tags. No matter how the title appears typographically in the text, type it in contemporary format, i.e. with articles, prepositions and conjunctions in lower case, and nouns, verbs, adjectives, etc. capitalized. Do not include any full stops in the title (but do include question and exclamation marks).

    If the text IS a book review, please use the following format:
    <title type="main">Review of <title rend="italic" level="M">El Artista Adolescente</title></title>
    <title type="version">by <persName reg="James Joyce">James Joyce</persName></title>
  3. <name>: enter your name in the <name> element of the third <resp> tag:
    <resp>Header creation and markup by:
    <name>YOUR NAME HERE??</name> </resp>
    as you will be creating the header and encoding this text.
  4. <date>: on the day you finish markup, enter the date in dd Month yyyy format. Be sure to encode the value of that date in yyyy-mm-dd format in the value attribute as well. So if you finished encoding on December 10, 2004, it would look like:
    <date value="2004-12-10">10 December 2004</date>
  5. <extent>: this will be completed as the very last step. See Part 5: Final Encoding for more information.
  6. <idno>: enter the filename of the document which is the same as the full file name with the file-type extension, for example:
    rev.fmr.003.xml
  7. all fields in <biblStruct> inside <sourceDesc>:
    • <title>: This element appears twice in <biblStruct>. In the first place, enter the title of the article; in the second, enter the title of the periodical where the article was published.
    • <date>: the date the article was originally published in dd-mm-yyyy form. Again remember to encode the value attribute with the date in yyyy-mm-dd format.
    • <biblScope>: the page or range of pages in the periodical where the article originally appeared.
  8. <item> in <keywords>: The following keywords, grouped by subject, nationality, date and text class, are used in the <item> elements of the <keywords> list in the <textClass> section of the TEI Header:
    • By subject:
      • Art
      • Architecture
      • Biography
      • Catholicism
      • Critical Method
      • Dance
      • Education
      • Film
      • Folklore
      • Great War
      • History
      • Journals
      • Literature
      • Music
      • Mythology
      • Opera
      • Politics
      • Sport
      • Theatre
      • Travel
    • By nationality:
      • American
      • Dutch
      • English
      • French
      • German
      • Irish
      • Italian
      • Scottish
      • Spanish
      • Swiss
    • By date:
      • 1100-1199
      • 1200-1299
      • 1300-1399
      • 1400-1499
      • 1500-1599
      • 1600-1699
      • 1700-1799
      • 1800-1899
      • 1900-1999
    • By text class:
      • Article
      • Book
      • ArtReview
      • BookReview
      • Criticism
      • DanceReview
      • FilmReview
      • MusicReview
      • Letter-to-the-Editor
      • Obituary
      • Interview
      • TheatreReview
    Assign one keyword from each of these categories to separate <item> elements. The template already includes <item> tags with the proper type attributes to delineate each category.
  9. <classCode>: after reading the article, you should be able to classify it as one of the following categories (typed exactly as it appears here):
    • Article
    • Book
    • ArtReview
    • BookReview
    • FilmReview
    • MusicReview
    • Letter-to-the-Editor
    • Obituary
    • TheatreReview
  10. Finally, make sure you have not missed anything. Search for ?? (in jEdit you can open the search dialog box by pressing ctrl+f) and make any changes you missed.
  11. Be sure to save the file!
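The title rule in step 2 can be sketched as a small Python function for checking your typing. The set of minor words below is a hypothetical, incomplete list (the guide names only the categories: articles, prepositions and conjunctions), and always capitalizing the first word is an assumption; check every result against the original.

```python
# Hypothetical, incomplete list of minor words kept in lower case; extend as needed.
MINOR_WORDS = {"a", "an", "the", "and", "but", "or", "nor", "for", "so", "yet",
               "at", "by", "in", "of", "on", "to", "up", "with", "from", "into"}


def format_title(raw: str) -> str:
    """Apply the header title rule (sketch): minor words in lower case,
    other words capitalized, full stops removed, ? and ! kept."""
    text = raw.replace(".", "")  # drop full stops only
    result = []
    for i, word in enumerate(text.split()):
        if word.isupper():
            # Undo typographic ALL CAPS before applying the rule.
            word = word.lower()
        if i > 0 and word.lower() in MINOR_WORDS:
            result.append(word.lower())
        else:
            # Capitalize the first letter but keep the rest, e.g. MacGreevy.
            result.append(word[:1].upper() + word[1:])
    return " ".join(result)
```

For instance, a heading printed as "HOW DOES SHE STAND?" comes out as How Does She Stand?, which is the form the header title expects.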

Part 3: Encoding the Body

Structural Units of the Text in Letters

After you've finished all the markup in the <teiHeader>, within the <text> section, you will need to enter information in the following sections:

  1. <div0>: all of the text of the letter will be nested inside these tags. Each subsection is contained in a <div1> tag (there can be several <div1> tags). If there is further subdivision, within the <div1> tag, nest a <div2> tag, and so forth.

    In all letters, everything except the first <pb> tag (which is discussed later) will be in a <div1> tag inside the <div0> tag. This includes the head, opener, text, and closer tags, as well as any postscripts. The template is set up this way. However, enclosures (another letter or a poem, for example) need to be placed in a new <div1> section after the close of the <div1> that contained the letter. The type attribute of this new <div1> should be encoded as "enclosure".

    In the following example, MacGreevy enclosed a poem in a letter to Yeats. The poem was placed in a new <div1> and encoded as lines of verse (see the section on Quotes in Part 4: Regularization and Content Encoding for more information on line groups and poems):
    <div0>
    <pb n="1" />
    <div1>
    <head type="main">Letter from...
    ...letter text...
    <closer>My love to you all at <placeName>82</placeName> &#8212;&#38; thanks again <lb />
    <signed><persName>Tom</persName>.</signed>
    </closer>
    <p type="postscript">The enclosed suggested by the <persName>Picasso</persName>&#38; a white-washed wall in a little restaurant about a fortnight ago.</p>
    </div1>
    <div1 type="enclosure">
    <head>Grian&#0244;n</head>
    <lg>
    <l>The end of Love, Love's ultimate good</l>
    <l>Is the end of Love,</l>
    ...
  2. <head>: Within the <head> tags, enter the author, recipient and date values in place of the terms AUTHOR??, RECIPIENT??, and DATE??. Be sure to fill in the value attribute of the <date> tag in yyyy-mm-dd form.
  3. <p>: Next, enter the text of the letter indicating the following:
    • Paragraphs: Insert <p></p> tags after the </head> tag but before the </div0> tag. Copy and paste the letter, paragraph by paragraph, between <p></p> tags. Make sure that each separate paragraph is tagged with <p> at the beginning and </p> at the end. Each <p> tag should be numbered with the n (number) attribute. For example:
      <p n="1">Every genuine work of art belongs...
    • Special characters: Make sure any special characters that appear in the original text are reproduced in the electronic text. See the section on entity references in the section on Header Encoding in Letters for more information.
    • Page numbers: Pages of the original should be indicated in the electronic version with a <pb/> tag. The first page break goes directly after the opening <div0> tag. The <pb/> tag is an empty element: you cannot type any content into it, and it takes only attributes. Use the n attribute to fill in the page number, e.g. <pb n="5"/>. Each time a new page begins, insert a new <pb/> tag and number it accordingly. For most letters, the page break will be followed by a <figure> element linking to the image of the new page.
    • Figures: Images of each letter page should be referenced immediately after page breaks (but before closing </p> tags). Other images that are to be linked to the text should be added in the same way.
  4. <closer>: enter any closing remarks inside the <closer> tag. If the author has used more than one line, use a <lb/> tag to mark the line break.
    • Within this element, enter the author's name in the <signed> tag exactly as it was signed. Be sure to put it within <persName> tags and fill in the regularized form:
      <closer>Regards to you all ever.<lb/>
      <signed>
      <persName reg="Thomas MacGreevy">Tom</persName>
      </signed>
      </closer>
    • Any postscripts, enclosures, or envelopes should go within a new <div1> tag after the <div1> that contained the body of the letter, but before the close of the <div0> tag. This information should be encoded within <p> tags following all other encoding guidelines, and the value "postscript", "enclosure", or "envelope" should be entered in the type attribute of the new <div1> tag.
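When pasting a long letter, the paragraph markup in step 3 can be generated rather than typed. The sketch below is a hypothetical Python helper: it assumes paragraphs in the proofed text are separated by blank lines, joins lines wrapped within a paragraph, escapes &, < and > as numeric references (matching the &#38; used in the example above), and numbers each <p> with the n attribute. It does not add <pb/>, <figure>, or any content encoding.

```python
import re


def number_paragraphs(plain_text: str) -> str:
    """Wrap blank-line-separated paragraphs in numbered <p n="..."> tags (hypothetical helper)."""
    out = []
    for chunk in re.split(r"\n\s*\n", plain_text):
        words = chunk.split()  # also joins lines wrapped inside one paragraph
        if not words:
            continue  # skip empty chunks, e.g. extra blank lines
        # Escape markup characters as numeric references; & must go first.
        body = (" ".join(words)
                .replace("&", "&#38;")
                .replace("<", "&#60;")
                .replace(">", "&#62;"))
        out.append(f'<p n="{len(out) + 1}">{body}</p>')
    return "\n".join(out)
```

Paste the output between the opening and closing tags of the letter's <div1>, then proof it against the printout as usual.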
Final Steps
  1. Proof your markup in jEdit.
  2. Validate the file: In jEdit, either save your document or use the "Parse" button in the Structure Browser to the left of your screen. When you save or parse your document, any errors will appear in the error window at the bottom of the screen.
  3. Save the file (you should get no error messages at this point).
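Parsing in jEdit remains the real validation step, since it checks the file against the DTD. As an optional extra check, Python's standard xml.etree.ElementTree module can report basic well-formedness errors such as an unclosed or mismatched tag. This sketch does not validate against the DTD, and it may report an error for any named entity that is defined only in the external DTD.

```python
import xml.etree.ElementTree as ET
from typing import Optional, Union


def well_formedness_error(xml_data: Union[str, bytes]) -> Optional[str]:
    """Return the parser's error message, or None if the markup is well-formed."""
    try:
        ET.fromstring(xml_data)
    except ET.ParseError as err:
        # The message includes the line and column of the first problem.
        return str(err)
    return None
```

To check a saved file, pass its raw bytes, e.g. the result of reading it with open(path, "rb"), so that the parser honours the file's own encoding declaration.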

Structural Units of the Text in Articles

After you've finished all the markup in the <teiHeader>, within the <body> section, you will need to enter information in the following sections:

  1. <div0>: Encode the value of the type attribute for the <div0> tag. This should be the same as the content of the <classCode> tag.

    If you are encoding an article with subsections, nest <div> tags. For example, the entire article is contained in <div0>. Each subsection is contained in a <div1> tag (there can be several <div1> tags). If there is further subdivision, within the <div1> tag, nest a <div2> tag, and so forth.
  2. <milestone>: The <milestone> tag is used to indicate typographical separations of sections of a text. For example, if a row of asterisks (*) is used to separate two sections of a book, this would be an instance in which a <milestone> tag would be used.

    The <milestone> tag must be placed inside of a <p> or <quote> tag, or else you will get a parsing error.

    The <milestone> tag has an attribute, unit, that is used to indicate the type of division made. In most cases, the unit would simply be "section". <milestone> is an empty element, so close it with />; you may follow it with a series of asterisks or symbols to visibly mark the division. Example:
    <milestone unit="section"/> * * *
  3. <head>: Within the <head> tags, type in the title of the article. You do not need to add <title> tags within the <head> tags. Using the rend attribute, indicate how the title looked in the original document, e.g.
    <head rend="case(allcaps)">How Does She Stand?</head>
    In this case, include all punctuation in the original text.

    Acceptable attribute values for the rend attribute:
    • italic
    • bold
    • underline
    • case(allcaps)
    • case(smallcap)
    • doublequotes
    • singlequotes
    • autograph
    • typescript
    The <head> tag can take one of three type attributes: "main", "sub" and "version". If there is a subtitle, enter it in a new <title> element with the type attribute "sub". Type attributes of "version" are only used in the <teiHeader> to denote the title of the electronic version.
    <title type="main">Current Art Notes</title>
    <title type="sub">Great Britain, etc. </title>
    <title type="sub">Our Plates</title>
    <title type="version">A Machine-Readable Version</title>
  4. <bibl>: If the article is a book review, and if there is NO TITLE for the book review (just bibliographic details concerning the book being reviewed), then encode the bibliographic details within the <bibl> tag. All the separate items (title, name of author, etc) should be encoded accordingly. For example:
    <head>
    <bibl>
    <title rend="italic" level="M">The Farm by <placeName>Lough Gur</placeName>.</title>
    By <persName reg="Lady Mary Carbery">Mary Carbery</persName>.
    (Longmans. 10s 6d).
    </bibl>
    </head>
    However, if there IS a separate title to the review, followed by bibliographic details of the book being reviewed, encode the title of the review in the <head> tag, and the bibliographic details (such as title, author, price, etc) within a <bibl> tag after the <head> tag. For example:
    <head>New Book on the Irish Countryside</head>
    <bibl>
    <title rend="italic" level="M">The Farm by <placeName>Lough Gur</placeName>.</title>
    By <persName reg="Lady Mary Carbery">Mary Carbery</persName>.
    (Longmans. 10s 6d).
    </bibl>
  5. All other information:
    1. Paragraphs: Insert <p></p> tags after the </head> tag but before the </div0> tag. Copy and paste the article, paragraph by paragraph, between the <p></p> tags. Make sure that each separate paragraph is tagged with <p> at the beginning and </p> at the end. Each <p> tag should be numbered with the n (number) attribute. For example:
      <p n="1">Every genuine work of art belongs . . .
    2. Author of text: If MacGreevy signs the article and the name appears at the top of the text, encode it directly after the <head> tag:
      <byline>Thomas McGreevy</byline>
      If instead the name appears at the end of the article, encode it thus after the last set of <p></p> tags:
      <closer>
      <signed>
      <persName reg="Thomas MacGreevy">Thomas McGreevy</persName>
      </signed>
      </closer>
    3. Special characters: Make sure any special characters that appear in the original text are reproduced in the electronic text. See the section on special characters in Header Encoding in Letters for more information.
    4. Page numbers: Pages of the original publication should be indicated in the electronic version with a <pb/> tag. The first page break goes directly after the opening <div0> tag. The <pb/> tag is an empty element: you cannot type any content into it, and it takes only attributes. Use the n attribute to fill in the page number, e.g. <pb n="5" />. Each time a new page begins, insert a new <pb/> tag and number it accordingly.
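Step 1 requires the <div0> type to match the <classCode>. A hypothetical Python check for that rule, which also confirms the class code is one of the values listed under <classCode> in the article header instructions, might look like the sketch below. It assumes the file is already well-formed and that element names carry no namespace, as in the template.

```python
import xml.etree.ElementTree as ET

# Values allowed in <classCode>, as listed in the article header instructions.
CLASS_CODES = {"Article", "Book", "ArtReview", "BookReview", "FilmReview",
               "MusicReview", "Letter-to-the-Editor", "Obituary", "TheatreReview"}


def class_code_problems(xml_text: str) -> list:
    """Return problems with <classCode> and the <div0> type (empty list if none)."""
    root = ET.fromstring(xml_text)
    class_code = root.find(".//classCode")
    div0 = root.find(".//div0")
    if class_code is None or div0 is None:
        return ["missing <classCode> or <div0>"]
    code = (class_code.text or "").strip()
    problems = []
    if code not in CLASS_CODES:
        problems.append(f"unexpected classCode: {code!r}")
    if div0.get("type") != code:
        problems.append(f"div0 type {div0.get('type')!r} does not match classCode {code!r}")
    return problems
```

An empty list means both rules are satisfied; otherwise each message names the value to correct.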
Final Steps
  1. Proof your markup in jEdit.
  2. Validate the file: In jEdit, either save your document or use the "Parse" button in the Structure Browser to the left of your screen. When you save or parse your document, any errors will appear in the error window at the bottom of the screen.
  3. Save the file (you should get no error messages at this point).

Part 4: Regularization and Content Encoding of All Texts

Now that you've encoded the structural elements of the body of the text, you'll fill in more detail in the text of both letters and articles.

On the printout of the digital text, mark all of the following with a highlighter:

Many of these elements are typographically different in the original. To indicate this difference, you must first encode the text according to its function (i.e., use the <foreign> tag if it is a foreign word or the <title> tag for a title), and then use the rend attribute to supply a value indicating how the text was rendered in the original. Use one of the following values (always in lower case letters) for this attribute:

So, for example, an italicized title would appear as:

<title rend="italic" level="a">A Nation Once Again</title>.
  1. Proper Names: When you come across a name, look it up in Who's Who in the Archive, the Thomas MacGreevy Archive's database of regularized names. (If a name does not appear on the regularised list, write it on a Post-it note together with the id of the article, and attach the note to the article.) Copy the form of the name which appears in green in the name database. In the XML document, put the name in a <persName> tag, and paste the regularized form of the name in the reg attribute field. DO NOT RETYPE THE NAME. Note the following:
    1. All honorifics, such as Dr, Mr, Miss, Sir, etc, go inside the <persName> tags. However, any extraneous punctuation or letters (such as 's) go outside it. For example, if the sentence reads:
      Mr MacGreevy's latest book . . .
      encode it:
      <persName>Mr MacGreevy</persName>'s latest book.
    2. If a name belongs to a "not real" person, encode it as such using the type attribute:
      <persName type="mythological">Apollo</persName>
      <persName type="fictional">Leopold Bloom</persName>
      <persName type="operatic">Violetta</persName>
      <persName type="biblical">Moses</persName>
    3. If the name stands for multiple individuals (such as a family name or the last name of a married couple), put that name in a <persName>, but do not regularise it:
      <persName>Medici</persName>
  2. Referred Strings: If someone or something is not referred to by its proper name, but referred to instead by a phrase like "The Great Man," "the emerald isle" or "the school" this phrase goes into an <rs> tag with both a type and a reg attribute. For example:
    <rs type="person" reg="James Joyce" >The Great Man</rs>
    <rs type="place" reg="Ireland">the emerald isle</rs>
    <rs type="organisation" reg="National University of Ireland">the university</rs>
    If the phrase refers to more than one person, put the phrase in an <rs> tag but leave the reg field empty:
    <rs type="person">Three Kings</rs>
  3. Place names are indicated using the <placeName> tags. These should only include geographical entities such as Dublin, Munster, England, Pleasant Street, etc., NOT hotels, restaurants, specific street numbers, etc. Place names DO NOT get a reg attribute. For example:
    A friend who remembers a Red Cross Sale in <placeName>Dublin</placeName>...
    Going up <placeName>Westmoreland Street</placeName>...
  4. Organisational Names such as clubs, schools, or professional organizations should be enclosed in the <orgName> tag (do not apply this tag to churches, however). Just as with the <persName> tag, use the attribute reg to include the regularized form (a list of regularised organisation names can be found in the name database). For example:
    <orgName reg="Holy Cross College, Dublin">Holy Cross</orgName>
  5. Quotes: Quotations from other literary sources should be marked up in the <quote> tag if they are indented (if the quote forms part of the paragraph, do not encode it). We are only encoding quotes from other texts. Do not encode spoken speech.

    If the quote contains lines of verse, all lines should go into an <lg> element. Each line then goes into a <l> element within the <lg>.

    If it is unclear from the text whether the quote is spoken or comes from a text, do not encode it. When encoding a quote, do not put quote marks within the element. If the quote was indented in the original text, encode it as <quote rend="block">.
  6. Titles: Any titles of creative works such as paintings, sculptures, journals, books, music, or operas, should be put within a <title> element.
    1. If titles of texts were originally in quote marks, for example "The Irish Statesman", retain the quote marks in the digital copy as in the original, but keep the quotation marks outside the <title> tag. For example:
      "<title>The Irish Statesman</title>"
    2. If in the original text the title is emphasised in some way – such as italics, bold, all caps, etc. – you should also indicate this in your markup via the rend attribute.
    3. If the text is a published text, and falls into the category of literature, fiction, poetry, articles, short stories, journals, etc., decide if it is one of the following:
      • a journal (indicated by the attribute "j")
      • a serial (indicated by the attribute "s")
      • a monograph (any book-length work) indicated by "m"
      • an article, poem, short story or any smaller unit of a book-length work (indicated by "a")
      • an unpublished text (indicated by "u")
      These should be encoded using the level attribute, for example:
      <title level="j">The New Review</title>
      Titles of other types of creative work, such as paintings, sculpture, musical compositions, etc, do not take a level attribute.
    4. If a title of a text is given by a shortened name, for example, The Annual for The Capuchin Annual, encode it as follows:
      <title type="uniform">The Capuchin Annual</title> <title type="given">The Annual</title>
      This encoding is somewhat different from the other examples: here you must actually add text to the article you are encoding. But don't worry; when we display the files, we will "turn off" all <title type="uniform"> elements, so the user will only see the text within the <title type="given"> element, while the software will still be able to search on the full title.
  7. Foreign Words: Indicating a foreign word or phrase requires a few steps:
    1. Go back to the <teiHeader> and look for the <langUsage> tag within the <profileDesc> section. Add another <language> element after the one indicating that English is being used.
    2. Type the name of the language within the new <language> element.
    3. Add the proper ISO 639 language code to the id attribute of the new element. Search for ISO 639 on the web, or go to http://www.ics.uci.edu/pub/ietf/http/related/iso639.txt to find these codes. Note that all language IDs MUST be in lowercase.
    4. When you're done, the <langUsage> section should look something like this:
      <langUsage>
      <language id="en">English</language>
      <language id="it">Italian</language>
      </langUsage>
    5. Within the body of the text, enclose the foreign word or phrase within the <foreign> tag. Add a value to the lang attribute. When finished, your tag should look something like this:
      <foreign lang="it">il danno e la vergogna</foreign>
  8. Indicating Typos: Any typos in the original text must be retained in the electronic edition. To indicate them, use the <sic> tag and put the correct spelling, punctuation, etc. in its corr attribute. After you encode the <sic> tag, delete any "[sic]" from the text itself. For example:
    There was a <sic corr="time">timt</sic>
  9. Emphasized Text: If text is highlighted simply because it is emphasised, such as:
    He was really sick that day.
    Encode it as follows:
    He was <emph rend="italic">really</emph> sick that day.
  10. Highlighted Text: If text is highlighted but you can't determine why, encode it within a <hi> element and choose the appropriate value for the rend attribute. This tag is usually used only to indicate house styles, such as the small capitals that open the following article:
    <hi rend="case(smallcaps)">Forty</hi> years ago, the name of <placeName>Dublin</placeName>...
  11. Notes: The <note> tag is used to indicate any citation or footnote information included in the original text. Insert the <note> tag directly into the body of the text where the note (or its superscript number) appears. When typing the citation into the <note> tag, use the citation format of the original document, and be sure to tag proper names such as personal names, titles, publishers, and places of publication. For example:
    On MacGreevy's only visit to America in 1954, Stevens presented him with an inscribed copy of The Auroras of Autumn, a collection which included the 1948 poem he had written in honor of his Irish friend, "Our Stars Come from Ireland." <note n="6" place="end">See Letters of Wallace Stevens (New York: Knopf, 1966). </note>
    • Use the attribute n (number) to indicate the numbering system in the original article. Use the attribute place to indicate where the note originally appeared. Legal values for place are:
      1. foot – note appears at foot of page
      2. end – note appears at end of chapter or volume
    • Inline notes (such as citations that appear in parentheses within the body of the text) should be put in a <bibl> tag. Any parentheses should be kept outside of the <bibl> tag. For example the text:
      Must be content
      To cry to myself
      'Thalassa!' (Ten Thousand Leaping Swords; CP 29).
      would be encoded as
      <l>Must be content</l>
      <l>To cry to myself</l>
      <l>'<placeName rend="italic">Thalassa</placeName>!'</l>
      (<bibl>
      <title level="A" rend="italic">Ten Thousand Leaping Swords</title>;
      <title type="uniform">Collected Poems of <persName reg="Thomas MacGreevy">Thomas MacGreevy</persName></title>
      <title level="M" rend="italic" type="given">CP</title> 29
      </bibl>).
    • For more information on writing original notes for articles or letters, see Appendix B: Directions for Writing Notes for the Thomas MacGreevy Archive.
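The rules for the n and place attributes can be checked mechanically. The Python sketch below is a hypothetical helper; skipping notes with type="context" (the editorial notes described in Appendix B) is an assumption of this sketch, since the guide does not say whether those notes carry n or place. The second note in the sample is deliberately wrong.

```python
import xml.etree.ElementTree as ET

ALLOWED_PLACES = {"foot", "end"}  # the legal place values listed above

SAMPLE = ('<p>Stevens presented him with an inscribed copy.'
          '<note n="6" place="end">See Letters of Wallace Stevens.</note>'
          '<note place="margin">Unnumbered note.</note></p>')

def check_notes(root):
    """Report notes that lack n or use a place value outside ALLOWED_PLACES."""
    problems = []
    for i, note in enumerate(root.iter("note"), start=1):
        if note.get("type") == "context":
            continue  # assumption: editorial context notes are not checked here
        if not note.get("n"):
            problems.append(f"note {i}: missing n attribute")
        if note.get("place") not in ALLOWED_PLACES:
            problems.append(f"note {i}: place={note.get('place')!r} is not foot or end")
    return problems

for problem in check_notes(ET.fromstring(SAMPLE)):
    print(problem)
# note 2: missing n attribute
# note 2: place='margin' is not foot or end
```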
  12. Abbreviations: The <abbr> tag is used to specify any term in the text that appears in an abbreviated form (except for personal or organizational names). Use the expan attribute to write out the expanded or full form of the abbreviated term. For example,
    <abbr expan="watermark">wm</abbr>
    Abbreviations that include a superscript character should include the word "superscript" in the rend attribute:
    "113r"
    should be encoded as
    113<abbr expan="recto" rend="superscript">r</abbr>
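Since every abbreviation needs an expan value, a small report can list the expansions and flag any <abbr> that is missing one. This Python sketch is hypothetical; the unexpanded "ff" in the sample is a deliberate error included to show the check.

```python
import xml.etree.ElementTree as ET

SAMPLE = ('<p><abbr expan="watermark">wm</abbr> '
          '113<abbr expan="recto" rend="superscript">r</abbr> '
          '<abbr>ff</abbr></p>')

def abbreviation_report(root):
    """Return ({abbreviation: expansion}, [abbreviations missing expan])."""
    glossary, missing = {}, []
    for abbr in root.iter("abbr"):
        expan = abbr.get("expan")
        if expan:
            glossary[abbr.text] = expan
        else:
            missing.append(abbr.text)
    return glossary, missing

glossary, missing = abbreviation_report(ET.fromstring(SAMPLE))
print(glossary)  # {'wm': 'watermark', 'r': 'recto'}
print(missing)   # ['ff']
```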
  13. Additions and Deletions: The <add> and <del> tags are most often used when encoding manuscripts. <add> is used when the author adds text to the original document; <del> is used when the author deletes text. The place attribute of <add> records where the addition is made:
    • inline – addition is made within the line of text. For example:
      "The night came sulkily down."
      would be encoded as:
      The night <add place="inline">came</add> sulkily down.
    • supralinear – addition is made above the line. For example:
      And
      Then night pressed
      would be encoded as:
      <add place="supralinear">And</add> <del rend="overstrike">Then</del> night pressed
    • intralinear – addition is made below the line. For example:
      Octavius
      R.T.T.
      would be encoded as:
      <del rend="overstrike"><persName>Octavius</persName></del>
      <add place="intralinear"><persName>R.T.T.</persName></add>
    • left – addition is made in the left margin of the page. For example:
      When     You and I were there
      would be encoded as:
      <add place="left">When</add> You and I were there
    • Other values include:
      • right – addition is made in the right margin of the page
      • top – addition is made in the top margin of the page
      • bottom – addition is made in the bottom margin of the page
      • opposite – addition is made on the opposite (facing) page
      • verso – addition is made on the verso (reverse side) of the sheet
      • mixed – addition is made in more than one of the places above
  14. The <del> tag is used to indicate text that has been deleted or struck out from a manuscript or typescript. The manner of deletion should be indicated in the rend attribute, with values such as:
    • subpunction – dots below the line indicate matter to be deleted.
      Ex.: close
    • overstrike – lines through the text indicate matter to be deleted.
      Ex.: close
    • erasure – material to be deleted has been erased, but remains legible enough to transcribe.
    • bracketed – brackets around the material indicate that it is spurious or superfluous.
      Ex.: [close]
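A typo in a place or rend value is easy to miss, so the lists in items 13 and 14 can be turned into a check. This Python sketch is hypothetical; the sets simply copy the values listed in this guide, and the last <add> and <del> in the sample use deliberately invalid values.

```python
import xml.etree.ElementTree as ET

# Values copied from the lists in items 13 and 14 (this project's conventions).
ADD_PLACES = {"inline", "supralinear", "intralinear", "left", "right",
              "top", "bottom", "opposite", "verso", "mixed"}
DEL_RENDS = {"subpunction", "overstrike", "erasure", "bracketed"}

SAMPLE = ('<l><add place="supralinear">And</add> '
          '<del rend="overstrike">Then</del> night pressed '
          '<add place="above">still</add> <del rend="strike">dark</del></l>')

def check_add_del(root):
    """Report <add> place values and <del> rend values that are not in the lists."""
    problems = []
    for add in root.iter("add"):
        if add.get("place") not in ADD_PLACES:
            problems.append(f"<add place={add.get('place')!r}> is not a listed value")
    for deletion in root.iter("del"):
        if deletion.get("rend") not in DEL_RENDS:
            problems.append(f"<del rend={deletion.get('rend')!r}> is not a listed value")
    return problems

for problem in check_add_del(ET.fromstring(SAMPLE)):
    print(problem)
# <add place='above'> is not a listed value
# <del rend='strike'> is not a listed value
```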
  15. Sometimes there will be instances in which text has been added and then deleted. Since the <del> tag has no place attribute, you should nest the <del> tag in the <add> tag and enter the appropriate place attribute in the <add> tag:
    When     You and I were there
    would be encoded as:
    <add place="left"><del rend="overstrike">When</del></add> You and I were there.
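To see what the add/del encodings above amount to, the Python sketch below rebuilds the text as it stands after revision: deleted text is dropped and added text is kept, including the case where a deletion is nested inside an addition. The helper is hypothetical and only illustrates the logic; it is not part of the archive's tools.

```python
import xml.etree.ElementTree as ET

def final_reading(elem):
    """Text after revision: <del> content is dropped, <add> content is kept."""
    parts = [elem.text or ""]
    for child in elem:
        if child.tag != "del":
            parts.append(final_reading(child))
        parts.append(child.tail or "")  # the tail lies outside the deletion, so keep it
    return "".join(parts)

def normalized(elem):
    """Collapse the extra spaces left behind by removed deletions."""
    return " ".join(final_reading(elem).split())

examples = [
    '<l><add place="supralinear">And</add> <del rend="overstrike">Then</del> night pressed</l>',
    '<p><add place="left"><del rend="overstrike">When</del></add> You and I were there.</p>',
]
for snippet in examples:
    print(normalized(ET.fromstring(snippet)))
# And night pressed
# You and I were there.
```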

Part 5: Final Encoding

Final Header Encoding

After you save your document for the last time, find the document icon using Windows Explorer. Right-click the file and select Properties from the menu that appears. Note the value given for Size. Go back into the .xml document and, under <extent> in the header, enter the size of the file in kb, e.g., 5 kb.
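If you prefer not to look the size up by hand, a short Python sketch can produce the <extent> text. It assumes 1 kb = 1024 bytes and rounds up to a whole number, which may differ slightly from the decimal value shown in the Properties dialog; the script name extent.py and the function names are hypothetical.

```python
import math
import os
import sys

def kb_from_bytes(n_bytes):
    """Round a byte count up to whole kilobytes, assuming 1 kb = 1024 bytes."""
    return math.ceil(n_bytes / 1024)

def extent_text(path):
    """Text for the <extent> element, e.g. '5 kb'."""
    return f"{kb_from_bytes(os.path.getsize(path))} kb"

if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python extent.py let.yeats.1926-03-15.xml
    print(extent_text(sys.argv[1]))
```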

Viewing in a Browser

Open the document with a browser like Mozilla Firefox or Internet Explorer and check to see how your encoding is displayed. Be especially careful not to have any run-on encoding, such as

Forty years ago, the name ofDublin

This occurs when you forget to leave a space between a tag and the text next to it, as in this encoding (note the missing space before <placeName>):

<hi rend="case(smallcaps)">Forty</hi> years ago, the name of<placeName>Dublin</placeName>
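Run-on encoding can also be caught before you open the browser. The Python sketch below is a heuristic, not a validator: it flags any line where a letter touches an opening tag or a closing tag touches a letter. A tag that legitimately sits inside a word would also be flagged, so check each hit by eye. The function name is hypothetical.

```python
import re

# A letter directly before an opening tag, or a closing tag directly before a letter.
RUN_ON = re.compile(r"[A-Za-z]<[A-Za-z]|</[A-Za-z][^>]*>[A-Za-z]")

def find_run_ons(xml_text):
    """Return (line number, line) pairs that may need a space added."""
    return [(n, line.strip())
            for n, line in enumerate(xml_text.splitlines(), start=1)
            if RUN_ON.search(line)]

BAD = '<hi rend="case(smallcaps)">Forty</hi> years ago, the name of<placeName>Dublin</placeName>'
GOOD = BAD.replace("of<placeName>", "of <placeName>")

print(find_run_ons(BAD))   # flags line 1
print(find_run_ons(GOOD))  # []
```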

Appendix A: Installing jEdit

Before you begin encoding your text, you'll need to install and set up jEdit:

  1. First, check that you have the Java Runtime Environment on your system; jEdit will not run without it. In the "Start" menu, choose "All Programs" and look for Java. If you do not have it, you can download it at http://java.sun.com/javase/downloads/index.jsp.
  2. The "latest stable version" of jEdit can be downloaded for free from http://www.jedit.org/index.php?page=download. Select "Windows-based installer" and choose any download location. When your system's pop-up window opens, choose "open" and your system will take you through the installation procedure.
  3. Once jEdit is installed, you will need to install plugins (additional software that adds functionality to jEdit) to parse the XML documents. To install plugins:
    1. Select "Plugins" from the jEdit menu bar.
    2. Select "Plugin Manager" from the drop-down menu.
    3. Select "Install Plugins." In the open window, be certain "Install in user plugin directory" is selected and select the following plugins:
      • File Management
        • Buffer Tabs
      • HTML and XML
        • select all options
      • Project Management
        • JDiff
      • Support
        • CommonControls
        • ErrorList
        • Sidekick
      • Text
        • SpellCheck
        • TextTools
      • Visual
        • Docker
    4. Select "Install."
    After installation, jEdit must be restarted for the plugins to function.
  4. Once you have installed your plugins, you will need to "dock" them. Docking lets you keep the different utilities you have "plugged in" to jEdit open on your screen. The following instructions will allow you to dock the plugins that are most useful for text encoding:
    1. From the Utilities menu select Global Options. In the opened window, choose "docking" from the menu to the left.
    2. On the left side of the new window, select Structure Browser and select "left" from the pull down menu that has the default category "floating." Dock "XMLInsert" on the "right." We also want to dock the Error List on the "bottom" (these settings can be changed if you have other preferences).
    3. Open the newly docked plugins (XMLInsert and the Error List) by clicking on the name of the plugin in the side or bottom gutter of your main window. To close them, click on the corresponding "X".
    4. To be certain that jEdit is parsing your document, choose Plugins from the menu, then Sidekick from the drop-down menu, and finally check Parse on Keystroke. Later, when you have parsing errors (and we all have errors at some point), these errors will appear in the Error List at the bottom of the screen.

jEdit is now set up and you’re ready to start encoding.

Appendix B: Directions for Writing Notes for the Thomas MacGreevy Archive

There are two different types of notes in the Thomas MacGreevy Archive: those that convey biographical information about a person and those that apply to the context of a specific article or letter. Each type of note requires a slightly different writing style and must be integrated with the text in a different way.

Biographical Notes

These are notes about a person's life that do not have anything to do with the context of the specific article or letter where they are mentioned. The information contained in the note may be pertinent to events that occurred before or after the immediate context of the letter. As much as possible they should adhere to the following format:

Name (date of birth - date of death), Nationality occupation. Note.

So as an example, the entry for Lennox Robinson would be:

Lennox Robinson (1886-1958), Irish playwright, producer, manager of the Abbey Theatre. Also known as Tinche and Lynx. In 1907, after seeing an Abbey production at the Cork Opera House, Robinson began to write plays. In 1909 Robinson was appointed producer of plays and manager of the Abbey by WB Yeats and Lady Gregory. In 1915 Robinson was hired by the Carnegie United Kingdom Trust to act as part-time Organising Librarian for Newcastle West and Rathkeale. Robinson met MacGreevy in 1919, and the following year, when he was hired as Secretary to the newly established Irish Advisory Committee of the CUKT, recommended MacGreevy to the Committee as Assistant Secretary. On 8 September 1930, Robinson married Dolly Travers Smith.
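The opening of each biographical note follows a fixed pattern, so it can be checked with a regular expression. This Python sketch assumes both dates are four-digit years joined by a hyphen and that the description contains no periods before its final one; entries with unknown or approximate dates, or with abbreviations such as "St.", will not match. The function name is hypothetical.

```python
import re

# Assumed pattern: "Name (yyyy-yyyy), description. Further note text..."
BIO_HEAD = re.compile(
    r"^(?P<name>[^(]+?) \((?P<born>\d{4})-(?P<died>\d{4})\), (?P<desc>[^.]+)\.")

def parse_bio_head(entry):
    """Return the name, dates, and description, or None if the format is not followed."""
    match = BIO_HEAD.match(entry)
    return match.groupdict() if match else None

entry = ("Lennox Robinson (1886-1958), Irish playwright, producer, manager of "
         "the Abbey Theatre. Also known as Tinche and Lynx.")
print(parse_bio_head(entry))
print(parse_bio_head("Lennox Robinson, Irish playwright."))  # None
```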

These biographical notes will be separate from the particular text and will apply across all the texts in the Archive. Therefore, if a note contains biographical particulars that are really only pertinent to the context of the letter (such as "Robinson wrote The Big House which was performed on January 15, 1925, attended by George Yeats..."), those particulars should go into a context note.

Biographical notes should be placed in the "Note" field of the standard entry in the Who's Who in the MacGreevy Archive database. Directions for writing a new entry for Who's Who can be found in the Regularization Guide.

Contextual Notes

Context notes relate specifically to the immediate text. There is no set format for these notes, as they will contain various types of information. These notes will be embedded within the text in <note> tags (see the Notes section of the encoding instructions). Context note tags must have the attribute value type="context" to be displayed properly.