Documentation for encoding texts in XML
Table of Contents
I: Downloads
- MacGreevy Encoding Folder
This folder (right click on the link above and choose 'Save Link As')
is the environment you'll work in when encoding documents for the
Thomas MacGreevy Archive. It includes:
- The MacGreevy DTD,
macgX.dtd, and two accompanying entity files,
macgX.ent and macgXent.dtd, are stored in the
'dtd' folder.
- MacGreevy XML Templates are used to encode
new texts. Each template includes header information that is
constant across all documents, and indicates where information
that differs from text to text should be filled in. Templates are
located in the 'documents' folder. To begin a new document, open
template.xml (for articles) or letterTemplate.xml
(for letters). Save the template using the name assigned to the
document you are encoding (document names are located on the hard
copy).
- The XSLT
file resides in the 'stylesheets' folder. XSLT is used
to transform the XML / TEI
document to HTML for
display in a web browser.
- Example documents are saved in the 'examples'
folder for reference.
- jEdit
jEdit is a free, open source text editor that is well suited to XML
encoding. jEdit parses the text you are encoding, checking it against a
DTD, which helps to prevent encoding mistakes. After downloading
jEdit, follow the instructions for installing jEdit
to set up the editor before beginning to encode.
II: Encoding the Text
Part 1: Pre-Encoding Instructions
- Before you start you'll need:
- a photocopy of the original text (please do not mark)
- a hard copy of the electronic version of the text
- a copy of the electronic version of the text
- On the printout of the electronic text:
- Note exactly where page breaks occur;
- Proof the text again: Compare the hard copy of the digitized text against the photocopy of the original and note any changes on the printout
- Make any changes you noted to the electronic edition in jEdit.
- When you're ready to encode the text:
- Open the text in jEdit.
- Open template.xml (for articles) or letterTemplate.xml (for letters) in jEdit.
NOTE: ignore any error messages at this point.
- Save a copy of the template file under a new name so you don't alter the original template file by mistake.
- All letters should be named in the same format:
- First, use the prefix "let" for letter. This is followed by a period.
- Next, designate the author of the letter: "mac." for MacGreevy and "yeats." for Yeats. Place this directly after "let."
- Render the date the letter was written in yyyy-mm-dd format. For example, a letter dated 15 March 1926 would be "1926-03-15". If the date is unknown, use the circa date listed in the file.
- Finally, add the file extension ".xml".
For example, a letter from Yeats circa 15 March 1926 would be named: let.yeats.1926-03-15.xml
- Filenames for articles also follow a naming convention.
- Names should have no more than eight letters/numbers.
- The first three letters indicate the type of document that is being encoded:
- articles: art
- letters: let
- poems: pms
- non-fiction prose: nfp
- fiction prose: fip
- monographs: mon
- official documents: ofd
- interview: int
- The second three letters indicate the name of the journal or text, for example:
- The Father Mathew Record: fmr
- Capuchin Annual: cpa
- The third part of the name is numerical and contains the number of
the article in the series, or the version of a poem being encoded,
etc.
For example, the fourth article from the Irish Times would be
art.irt.0004.xml.
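As a cross-check on the naming rules in Part 1, the two conventions can be sketched in a few lines of Python. This is only an illustrative helper, not part of the Archive's toolkit: the function names are hypothetical, and the default four-digit padding is an assumption taken from the art.irt.0004.xml example. Some later examples in this documentation use the prefix "rev", which is not in the list above, so the set of type codes may need extending.

```python
from datetime import date

# Author keys named in the letter-naming rules; other correspondents would
# need their own key (an assumption, the guidelines list only these two).
LETTER_AUTHORS = {"mac", "yeats"}

# Three-letter document-type prefixes listed in the guidelines.
DOCUMENT_TYPES = {"art", "let", "pms", "nfp", "fip", "mon", "ofd", "int"}


def letter_filename(author: str, written: date) -> str:
    """Build a letter filename such as let.yeats.1926-03-15.xml."""
    if author not in LETTER_AUTHORS:
        raise ValueError(f"unknown author key: {author!r}")
    # date.isoformat() yields the yyyy-mm-dd form the guidelines ask for.
    return f"let.{author}.{written.isoformat()}.xml"


def article_filename(doc_type: str, source: str, number: int, width: int = 4) -> str:
    """Build an article-style filename such as art.irt.0004.xml.

    `width` (zero-padding of the number) is an assumption based on the
    example in the text; adjust it if the project uses another width.
    """
    if doc_type not in DOCUMENT_TYPES:
        raise ValueError(f"unknown document type: {doc_type!r}")
    return f"{doc_type}.{source}.{number:0{width}d}.xml"


if __name__ == "__main__":
    print(letter_filename("yeats", date(1926, 3, 15)))
    print(article_filename("art", "irt", 4))
```

Running the script prints the two example names used above, which makes it easy to compare a name you have typed by hand against the convention.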
Part 2: Encoding the header
Header encoding in letters
Within the <teiHeader>, look for any text that ends
with double question marks (??). This must be replaced with text unique to
this document instance. In particular, you will need to enter text within the
following tags:
- The first instance of (??) occurs in the DOCTYPE declaration in the
ENTITY reference created for images of the pages of the text. (If there
are no images associated with the text, delete the ENTITY reference and
move on to the next step.)
Later on, in the body of the text, <figure> tags will be used to
create links to images (this portion of the encoding is discussed in the
sections on figures in the <head> and the <p> tag). However,
before encoding the links in the text, references to the images must be
added to the DOCTYPE declaration. The template contains the beginnings
of one such declaration, which you can complete as follows:
- Within the DOCTYPE declaration locate the following ENTITY reference:
<!ENTITY ?? SYSTEM "??" NDATA JPEG>
- The first set of question marks should be replaced with the
alphanumeric string you want to use in the <figure> tags later on.
This will be like a nickname for the image. For example, if you wanted
to call the first image "image1" your ENTITY declaration would read:
<!ENTITY image1 SYSTEM "??" NDATA JPEG>
- Next, replace the other set of question marks with the actual
filename of the jpeg image file. Be sure to include the .jpg file
extension and do this within the quotation marks. So if your file
was named "m1.jpg" your ENTITY declaration should now read:
<!ENTITY image1 SYSTEM "m1.jpg" NDATA JPEG>
- Repeat these steps, adding a separate entity declaration
for each image you add to the document. Be sure to keep subsequent
declarations before the closing square bracket and angle bracket
(i.e., ]> ).
- You have now finished the ENTITY reference in the DOCTYPE
declaration. You will need to use these references to create links to
the images in the document when you are encoding the body of the text.
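If a text has many page images, the ENTITY declarations described above can be generated rather than typed. The following is a minimal sketch in Python, assuming images are nicknamed image1, image2, and so on in page order; the function names are hypothetical and the nickname scheme is only the one used in the example above, not a project requirement.

```python
def entity_declaration(name: str, filename: str) -> str:
    """Return one ENTITY declaration for a page image."""
    # The guidelines ask for the .jpg extension to be included.
    if not filename.lower().endswith(".jpg"):
        raise ValueError("image filename should include the .jpg extension")
    return f'<!ENTITY {name} SYSTEM "{filename}" NDATA JPEG>'


def entity_declarations(filenames: list[str]) -> list[str]:
    """Nickname the images image1, image2, ... in the order given."""
    return [
        entity_declaration(f"image{index}", filename)
        for index, filename in enumerate(filenames, start=1)
    ]


if __name__ == "__main__":
    # Paste the printed lines before the closing ]> of the DOCTYPE declaration.
    for line in entity_declarations(["m1.jpg", "m2.jpg"]):
        print(line)
```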
- <tei2>: in the attribute field id
add the id number of the text, which is the file name minus the .xml.
For example the file let.mac.1925-12-03.xml would be:
let.mac.1925-12-03
- <title>: enter the names of the author and
recipient and the date in the empty spaces in the title. All titles
will follow the format of this example:
<title type="main">Letter from <persName reg="George Yeats" type="sender">George Yeats</persName> to <persName reg="Thomas MacGreevy" type="recipient">Thomas MacGreevy</persName>. c. 15 March 1926</title>
<title type="version">A Machine-Readable Version</title><author>George Yeats</author>
Be sure to add the regularized form of the names in the reg
attribute of the <persName> tags of the
<title> as well.
- <respStmt>: in the <name> tag following
the <resp> tag that reads "Creation of machine readable text
by:", enter:
- Susan Schreibman if the letter is authored by MacGreevy or
- Ann Saddlemyer if the letter is authored by George Yeats.
Enter your name in the <name> tag of the third <resp>
statement, which reads "Header creation and markup by:".
- <date>: on the day you finish markup, enter
that day's date. Inside the tag write out the date, day first, then
month, then year. In the value attribute write the date in the
yyyy-mm-dd format:
<publicationStmt><date
value="2001-10-08">8 October 2001</date>
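The two date forms described above (the written form inside the tag and the yyyy-mm-dd form in the value attribute) can both be derived from a single date, which avoids the two drifting apart. A minimal sketch in Python, assuming a hypothetical helper named date_element; it is not part of the template or of jEdit.

```python
from datetime import date

# Month names are spelled out explicitly so the result does not depend on
# the computer's locale settings.
MONTHS = (
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
)


def date_element(day: date) -> str:
    """Return a <date> tag with a yyyy-mm-dd value and 'd Month yyyy' text."""
    written = f"{day.day} {MONTHS[day.month - 1]} {day.year}"
    return f'<date value="{day.isoformat()}">{written}</date>'


if __name__ == "__main__":
    print(date_element(date(2001, 10, 8)))
```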
- <extent>: this will be completed as the
very last step. See Part 5: Final Encoding
for more information.
- <idno>: enter the ID Number of the document
which is the same as the full file name with the file-type extension,
for example:
let.mac.1932-04-16.xml
- <note>: place the abstract for the letter
here. The type attribute of this should already have
the value of "abstract." [Remember that later on when you
regularize and encode the content of the document you will have to
return to the abstract to add this encoding as well.]
- Make sure that entity references are encoded so that they can be
viewed in all browsers. The format for all entity references is
"&#xxxx;" with the "x"s representing the numerical code (e.g.
"&#0233;" for "é").
- If you are using a PC you can open up the Character Map
program on your computer (found under "All Programs/Accessories/System
Tools" in the Windows Start menu) to find these codes. The proper
code is found in the bottom right corner after the label "Keystroke:".
You can omit any ALT or + signs and use only the 4-digit code.
- If you are using a Macintosh or don't have the character map
program, you can find character codes at
http://www.ramsch.org/martin/uni/fmi-hp/iso8859-1.html.
The codes are written in the third column.
- Em and en dashes create a special difficulty. For these
characters use the Unicode decimal codes: "&#8212;" for an EM dash "—"
[used to denote a break in the flow of the sentence; used most often]
and "&#8211;" for an EN dash "–" [used to introduce a range
or list; used infrequently].
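The numeric codes for special characters can also be looked up programmatically instead of through the Character Map or a code table. The sketch below, in Python, assumes the four-digit "&#xxxx;" form described above; the function names are hypothetical, and the zero-padding is an assumption made to match the 4-digit codes the Character Map reports.

```python
def numeric_reference(char: str) -> str:
    """Return the decimal character reference for one character."""
    if len(char) != 1:
        raise ValueError("expected a single character")
    # ord() gives the character's decimal code; pad it to four digits.
    return f"&#{ord(char):04d};"


def encode_non_ascii(text: str) -> str:
    """Replace every non-ASCII character in text with its reference."""
    return "".join(
        char if ord(char) < 128 else numeric_reference(char) for char in text
    )


if __name__ == "__main__":
    print(numeric_reference("\u00e9"))      # e with acute accent
    print(numeric_reference("\u2014"))      # em dash
    print(encode_non_ascii("caf\u00e9 \u2013 bar"))
```

Only characters outside the plain ASCII range are converted, so existing markup and ordinary text pass through unchanged.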
- Encode all of the following in <bibl> inside <sourceDesc>.
The template file contains the following text:
<bibl>
<title>Letter from AUTHOR?? to RECIPIENT??</title>
<date value="1964-07-18??">DATE??</date>
<author>AUTHOR??</author>
<orgName type="archive">LIBRARY HOLDING ORIGINAL COPY??</orgName>.
DESCRIBE THE WRITING HERE??.
<num type="ms">MANUSCRIPT NUMBER??</num>
</bibl>
- <title>: Delete the words "AUTHOR??" and
"RECIPIENT??" and replace with the proper names. So, for example, a
letter from Yeats to MacGreevy should look like this:
<title>Letter from George Yeats to Thomas
MacGreevy</title>
- <date>: Enter date in the dd Month
yyyy format, and the value of the date in the yyyy-mm-dd
format:
<date value="1928-03-15">15 March 1928</date>
- <author>: The author of the letter should
be placed here again:
<author>George Yeats</author>
- <orgName>: fill in the name of the
institution holding the original copies. For MacGreevy's letters
this is "National Library of Ireland." For Yeats's letters this is
"Trinity College, Dublin." Be sure to use these regularized forms.
<orgName type="archive">Trinity College,
Dublin</orgName>
- After <orgName>, but before <num>, enter a
description of how the letter was created. Choose from the
following:
- Autograph letter signed.
- Autograph letter unsigned.
- Typewritten letter signed.
- Typewritten letter with autograph annotations signed.
- Typewritten letter unsigned.
- Typewritten letter with autograph annotations unsigned.
- <num>: Write the original manuscript
number in this field. All TCD manuscripts begin with "TCD MS 8104/"
and are immediately followed by the document number. [Note
that in the typed versions of these letters, the manuscript numbers
appear as "TCD8104/xx." In the .xml version we are adding
the "MS" as well as spaces after "TCD" and "MS".] All
National Library manuscripts use "NLI MS 30,859" and have no
document number.
Document numbers for the Yeats letters are circled in the upper
right-hand corner of the microfilm copies. This number should be
placed directly after the "/". So for the example that we have
been using, this element would look like:
<num type="ms">TCD MS 8104/57</num>
- <item> in <keywords>:
leave this element empty at the moment. These keywords will be added
by project editors.
- Finally, make sure you have not missed anything. Search for ??
(in jEdit you can open the search dialog box by pressing ctrl+f) and
make any changes you missed.
- Be sure to save the file!
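The final search for leftover ?? placeholders can also be done outside jEdit. The following Python sketch lists every line that still contains the marker; the function names are hypothetical, and the sketch assumes the file is saved as UTF-8.

```python
from pathlib import Path


def find_placeholders(text: str, marker: str = "??") -> list[tuple[int, str]]:
    """Return (line number, line) pairs for lines still containing marker."""
    return [
        (number, line.strip())
        for number, line in enumerate(text.splitlines(), start=1)
        if marker in line
    ]


def check_file(path: str) -> list[tuple[int, str]]:
    """Scan a saved XML file for unfilled template placeholders."""
    return find_placeholders(Path(path).read_text(encoding="utf-8"))


if __name__ == "__main__":
    sample = (
        "<title>Letter from AUTHOR?? to RECIPIENT??</title>\n"
        "<author>George Yeats</author>\n"
        '<num type="ms">MANUSCRIPT NUMBER??</num>'
    )
    for number, line in find_placeholders(sample):
        print(f"line {number}: {line}")
```

An empty result means no placeholders remain; it does not replace validating the file against the DTD.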
Header encoding in articles
Within the <teiHeader> section, look for any text that ends
with double question marks (??). These must be replaced with text unique
to this document instance. In particular, you will need to enter text
within the following tags:
- <tei2>: in the attribute field
id add the filename of the text, minus
the ".xml." For example, rev.it.023.xml would be
rev.it.023.
- <title>: If the text is NOT a book review
enter the title in the very first pair of <title> tags. No
matter how the title appears in the text typographically, type it
in contemporary format, i.e. articles, prepositions and conjunctions
in lower case, nouns, verbs, adjectives, etc. in upper case. Do not
include any full stops in the title (but do include question and
exclamation marks).
If the text IS a book review, please use the following format:
<title type="main">Review of <title
rend="italic" level="m">El Artista Adolescente</title></title>
<title type="version">by <persName reg="James Joyce">James
Joyce</persName></title>
- <name>: enter your name in the <name>
element of the third <resp> tag:
<resp>Header creation and markup by:
<name>YOUR NAME HERE??</name> </resp>
as you will be creating the header and encoding this text.
- <date>: on the day you finish markup, enter
the date in the dd Month yyyy format. Be sure to encode the
value of that date in the yyyy-mm-dd format in the
value attribute as well. So if you
finished encoding on December 10, 2004, it would look like:
<date value="2004-12-10">10 December 2004</date>
- <extent>: this will be
completed as the very last step. See Part 5:
Final Encoding for more information.
- <idno>: enter the filename of
the document which is the same as the full file name with the
file-type extension, for example:
rev.fmr.003.xml
- all fields in <biblStruct>
inside <sourceDesc>:
- <title>: This element
appears twice in <biblStruct>.
In the first place enter the title of the article, in the second
enter the title of the periodical where the article was published
- <date>: the date the article
was originally published in dd Month yyyy form. Again
remember to encode the value
attribute with the date in yyyy-mm-dd format.
- <biblScope>: the page or
range of pages in the periodical where the article originally
appeared.
- <item> in
<keywords>: The following
keywords are used in the <item>
elements of the <keywords> list in
the <textClass> section of the
TEI Header, grouped by subject, nationality, date and text class:
- By subject:
- Art
- Architecture
- Biography
- Catholicism
- Critical Method
- Dance
- Education
- Film
- Folklore
- Great War
- History
- Journals
- Literature
- Music
- Mythology
- Opera
- Politics
- Sport
- Theatre
- Travel
- By nationality:
- American
- Dutch
- English
- French
- German
- Irish
- Italian
- Scottish
- Spanish
- Swiss
- By date:
- 1100-1199
- 1200-1299
- 1300-1399
- 1400-1499
- 1500-1599
- 1600-1699
- 1700-1799
- 1800-1899
- 1900-1999
- By text class:
- Article
- Book
- ArtReview
- BookReview
- Criticism
- DanceReview
- FilmReview
- MusicReview
- Letter-to-the-Editor
- Obituary
- Interview
- TheatreReview
Assign one keyword from each of these categories to separate
<item> elements. The template already
includes <item> tags with the proper
type attributes to delineate each
category.
- <classCode>: after reading the
article, you should be able to classify it as one of the following
categories (typed exactly as it appears here):
- Article
- Book
- ArtReview
- BookReview
- FilmReview
- MusicReview
- Letter-to-the-Editor
- Obituary
- TheatreReview
- Finally, make sure you have not missed anything. Search for ?? (in
jEdit you can open the search dialog box by pressing ctrl+f) and make
any changes you missed.
- Be sure to save the file!
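Because the <classCode> value must be typed exactly as it appears in the list above, a small check can catch differences in spacing or capitalisation before validation. This is a sketch in Python under that assumption; the function name is hypothetical and the list is copied from the <classCode> categories in this section.

```python
# Categories exactly as listed for <classCode> in this section.
CLASS_CODES = (
    "Article", "Book", "ArtReview", "BookReview", "FilmReview",
    "MusicReview", "Letter-to-the-Editor", "Obituary", "TheatreReview",
)


def check_class_code(value: str) -> str:
    """Return value if it is a listed category, else raise with a hint."""
    if value in CLASS_CODES:
        return value
    # Compare again ignoring case, spaces and hyphens to suggest a fix.
    folded = {code.lower().replace("-", ""): code for code in CLASS_CODES}
    key = value.lower().replace(" ", "").replace("-", "")
    hint = folded.get(key)
    if hint is not None:
        raise ValueError(f"{value!r} is not listed; did you mean {hint!r}?")
    raise ValueError(f"{value!r} is not a listed category")


if __name__ == "__main__":
    print(check_class_code("BookReview"))
    try:
        check_class_code("book review")
    except ValueError as error:
        print(error)
```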
Part 3: Encoding the Body
Structural Units of the Text in Letters
After you've finished all the markup in the <teiHeader>,
within the <text> section, you will need
to enter information in the following sections:
- <div0>: all of the text of the letter
will be nested inside these tags. Each subsection is contained in a
<div1> tag (there can be several <div1> tags.) If there is further subdivision,
within the <div1> tag, nest a
<div2> tag, and so forth.
In all letters everything except the first <pb>
tag (which is discussed later) will be in a <div1>
tag inside the <div0> tag. This
includes the head, opener, text, and closer tags, as well as any
postscripts. The template is set up this way. However, enclosures
(of another letter or a poem, for example) will need to be placed in a
new <div1> section after the closing
of the previous <div1> that contained
the letter. The type attribute of this new
<div1> should be encoded as
"enclosure."
So in the following example, MacGreevy enclosed a poem to Yeats. The poem
was placed in a new <div1> and encoded
as lines of verse (see the section on Quotes in Part 4:
Regularization and Content Encoding for more information on line
groups and poems):
<div0>
<pb n="1" />
<div1>
<head type="main">Letter from...
...letter text...
<closer>My love to you all at <placeName>82</placeName> —& thanks again <lb />
<signed><persName>Tom</persName>.</signed>
</closer>
<p type="postscript">The enclosed suggested by the <persName>Picasso</persName>& a
white-washed wall in a little restaurant about a fortnight ago.</p>
</div1>
<div1 type="enclosure">
<head>Grianôn</head>
<lg>
<l>The end of Love, Love's ultimate good</l>
<l>Is the end of Love,</l>
...
- <head>: Within the
<head> tags, enter the author,
recipient and date values in place of the terms AUTHOR??,
RECIPIENT??, and DATE??. Be sure to fill in the
value attribute of the
<date> tag in yyyy-mm-dd
form.
- <p>: Next, enter the text of the
letter indicating the following:
- <closer>: enter any closing remarks
inside the <closer> tag. If the author
has used more than one line, use a <lb/>
tag to mark the line break.
- Within this element enter the author's name in the
<signed> tag exactly as they have
signed it. Be sure to put it within <persName>
tags and fill in the regularized form:
<closer>Regards to you all ever.<lb/>
<signed>
<persName reg="Thomas MacGreevy">Tom</persName>
</signed>
</closer>
- Any postscripts, enclosures, or
envelopes should go within a new <div1>
tag after the <div1> that contained
the body of the letter, but before the close of the <div0>
tag. This information should be encoded within <p> tags
following all other encoding guidelines, and the value "postscript", "enclosure", or
"envelope" should be entered in the type attribute of
the new <div1> tag.
Final Steps
- Proof your markup in jEdit.
- Validate the file: In jEdit, either save your document or use the
"Parse" button in the Structure Browser to the left of your screen. When
you save or parse your document, any errors will appear in the error
window at the bottom of the screen.
- Save the file (you should get no error messages at this point).
Structural Units of the Text in Articles
After you've finished all the markup in the <teiHeader>,
within the <body> section, you will need
to enter information in the following sections:
- <div0>: Encode the value of the
type attribute for the
<div0> tag. This should be the same
as the content of the <classCode> tag.
If you are encoding an article with subsections, nest
<div> tags. For example, the entire
article is contained in <div0>. Each
subsection is contained in a <div1>
tag (there can be several <div1>
tags.) If there is further subdivision, within the
<div1> tag, nest a
<div2> tag, and so forth.
- <milestone>: The
<milestone> tag is used to indicate
typographical separations of sections of a text. For example, if a row
of asterisks (*) is used to separate two sections of a book, this would
be an instance in which a <milestone>
tag would be used.
The <milestone> tag must be placed
inside of a <p> or
<quote> tag, or else you will get a
parsing error.
The <milestone> tag has an attribute,
unit, that may be used to determine the
type of division made. In most cases, the unit would be simply called
"section." You may follow the <milestone>
tag with a series of asterisks or symbols to visibly mark the division.
Example:
<milestone unit="section" /> * * *
- <head>: Within the
<head> tags, type in the title of
the article. You do not need to add <title>
tags within the <head> tags. Using
the rend attribute, indicate how the
title looked in the original document, e.g.
<head rend="case(allcaps)">How Does She
Stand?</head>
In this case, include all punctuation in the original text.
Acceptable attribute values for the rend attribute:
- italic
- bold
- underline
- case(allcaps)
- case(smallcap)
- doublequotes
- singlequotes
- autograph
- typescript
The <head> tag can take one of three
type attributes, "main," "sub" and "version."
If there is a subtitle enter it in a new <title>
element with the type attribute "sub".
Type attributes of "version" are only used
in the <teiHeader> to denote the title
of the electronic version.
<title type="main">Current Art Notes</title>
<title type="sub">Great Britain, etc. </title>
<title type="sub">Our Plates</title>
<title type="version">A Machine-Readable Version</title>
- <bibl>: If the article is a book review,
and if there is NO TITLE for the book review (just bibliographic details
concerning the book being reviewed), then encode the bibliographic details
within the <bibl> tag. All the separate
items (title, name of author, etc) should be encoded accordingly. For example:
<head>
<bibl>
<title rend="italic" level="m">The
Farm by <placeName>Lough Gur</placeName>.</title>
By <persName reg="Lady Mary Carbery">Mary
Carbery</persName>.
(Longmans. 10s 6d).
</bibl>
</head>
However, if there IS a separate title to the review, followed by bibliographic
details of the book being reviewed, encode the title of the review in the
<head> tag, and the bibliographic details
(such as title, author, price, etc) within a
<bibl> tag after the
<head> tag. For example:
<head>New Book on the Irish Countryside</head>
<bibl>
<title rend="italic" level="m">The Farm by
<placeName>Lough Gur</placeName>.</title>
By <persName reg="Lady Mary Carbery">Mary
Carbery</persName>.
(Longmans. 10s 6d).
</bibl>
- All other information:
- Paragraphs: Insert <p></p> tags after
the </head> tag but before the </div0> tag. Copy and paste
the article, paragraph by paragraph, between the <p></p>
tags. Make sure that each separate paragraph is tagged with <p>
at the beginning and </p> at the end. Each <p> tag should
be numbered with the n (number)
attribute. For example:
<p n="1">Every genuine work of art belongs . .
- Author of text: If MacGreevy signs the article,
and if the name appears at the top of the text, encode it directly
after the <head> tag:
<byline>Thomas McGreevy </byline>
If instead the name appears at the end of the article, encode it
thus after the last set of <p></p> tags:
<closer>
<signed>
<persName reg="Thomas MacGreevy">Thomas McGreevy</persName>
</signed>
</closer>
- Special characters: Make sure any special
characters which appear in the original text are reproduced in
the electronic text. See the section on special characters in
Header Encoding in Letters for more
information.
- Page numbers of the original publication
should be indicated in the electronic version. The <pb/> tag for the first page of
the article occurs directly after the initial <div0>
tag. The <pb/> tag is an empty element; in other words, you
cannot type any content into it. It will only take an attribute.
Use the n attribute to fill in the
page number, e.g. <pb n="5" />.
Each time a new page occurs, insert a new
<pb/> tag and indicate the
page number accordingly.
Final Steps
- Proof your markup in jEdit.
- Validate the file: In jEdit, either save your document or use the
"Parse" button in the Structure Browser to the left of your screen. When
you save or parse your document, any errors will appear in the error
window at the bottom of the screen.
- Save the file (you should get no error messages at this point).
Part 4: Regularization and Content Encoding of All Texts
Now that you've encoded the structural elements of the body
of the text, you'll fill in more detail in the text of both
letters and articles.
On the printout of the digital text, mark all of the
following with a highlighter:
Many of these elements are typographically different in the
original. To indicate this difference, you must first encode the
text according to its function (i.e., use the
<foreign> tag if it is a foreign
word or the <title> tag for a
title), and then use the rend attribute
to supply a value indicating how the text was rendered in the original.
Use one of the following values (always in lower case letters) for
this attribute:
- italic
- bold
- underline
- case(allcaps)
- case(smallcap)
- doublequotes
- singlequotes
So, for example an italicized title would appear as:
<title rend="italic" level="a">A Nation Once
Again</title>.
- Proper Names: When you come across a name, look
it up in Who's Who in the Archive,
the Thomas MacGreevy Archive's database of regularized names. (If you
come across a name that does not appear on the regularised list,
please write it [them] on a post-it and attach the list of names to
the article. Please also put the id of the article on the note.) Copy
the form of the name which appears in green in the name database. In
the xml document, put the name in a <persName>
tag, and paste the regularized form of the name in the
reg attribute field. DO NOT RETYPE THE
NAME. Note the following:
- All honorifics, such as Dr, Mr, Miss, Sir, etc, go inside the
<persName> tags. However, any
extraneous punctuation or letters (such as 's) go outside
it. For example, if the sentence reads:
Mr MacGreevy's latest book . . .
encode it:
<persName>Mr MacGreevy</persName>'s
latest book.
- If a name belongs to a "not real" person, encode it as such
using the type attribute:
<persName type="mythological">Apollo</persName>
<persName type="fictional">Leopold Bloom</persName>
<persName type="operatic">Violetta</persName>
<persName type="biblical">Moses</persName>
- If the name stands for multiple individuals (such as a family
name or the last name of a married couple), put that name in a
<persName>, but do not regularise
it:
<persName>Medici</persName>
- Referred Strings: If someone or something is not referred
to by its proper name, but referred to instead by a phrase like "The Great Man,"
"the emerald isle" or "the university," this phrase goes into an
<rs> tag with both a type
and a reg attribute. For example:
<rs type="person" reg="James Joyce" >The Great Man</rs>
<rs type="place" reg="Ireland">the emerald isle</rs>
<rs type="organisation" reg="National University of Ireland">the university</rs>
If the phrase refers to more than one person, put the phrase in an
<rs> tag but leave the
reg field empty:
<rs type="person">Three Kings</rs>
- Place names are indicated using the <placeName>
tags. These should only include geographical entities such as Dublin, Munster,
England, Pleasant Street, etc. NOT hotels, restaurants, specific street numbers,
etc. Place names DO NOT get a reg attribute.
For example:
A friend who remembers a Red Cross Sale in
<placeName>Dublin</placeName>...
Going up <placeName>Westmoreland Street</placeName>...
- Organisational Names such as clubs, schools, or
professional organizations should be enclosed in the <orgName>
tag (do not apply this tag to churches, however). Just as with the
<persName> tag, use the attribute
reg to include the regularized form (a list
of regularised organisation names can be found in the name
database). For example:
<orgName reg="Holy Cross College, Dublin">Holy Cross</orgName>
- Quotes: Quotations from other literary sources should
be marked up in the <quote> tag if they are
indented (if the quote forms part of the paragraph, do not encode it). We are
only encoding quotes from other texts. Do not encode spoken speech.
If the quote contains lines of verse, all lines should go into an
<lg> element. Each line then goes into a
<l> element within the
<lg>.
If it is unclear from the text if the quote is spoken or from a text, do not encode.
When encoding this text, do not put quote marks within the element. If the quote was
indented in the original text, then encode it as <quote
rend="block">.
- Titles: Any titles of creative works such as paintings,
sculptures, journals, books, music, or operas, should be put within a
<title> element.
- If titles of texts were originally in quote marks, for example, "The Irish
Statesman," retain the quote marks on the digital copy as in the original, but
keep the quotations marks outside the <title>
tag. For example:
"<title>The Irish Statesman</title>"
- If in the original text the title is emphasised in some way – such as
italics, bold, all caps, etc. – you should also indicate this in your
markup via the rend attribute.
- If the text is a published text, and falls into the category of literature,
fiction, poetry, articles, short stories, journals, etc., decide if it is one
of the following:
- a journal (indicated by the attribute "j")
- a serial (indicated by the attribute "s")
- a monograph (any book-length work) indicated by "m"
- an article, poem, short story or any smaller unit of a book-length
work (indicated by "a")
- an unpublished text (indicated by "u")
These should be encoded using the level attribute, for example:
<title level="j">The New Review</title>
Titles of other types of creative work, such as paintings, sculpture, musical
compositions, etc, do not take a level attribute.
- If a title of a text is given by a shortened name, for example, The
Annual for The Capuchin Annual, encode it as follows:
<title type="uniform">The Capuchin Annual</title>
<title type="given">The Annual</title>
This encoding differs somewhat from the other examples. In this case,
you must actually add text into the article you are encoding. But don't
worry, when we display the files we will "turn off" all the elements
<title type="uniform"> so the user will
only see the text which appears within the <title
type="given"> element. But the software will be able to search on
the full title.
- Foreign Words: Indicating a foreign word or phrase
requires a few steps:
- Go back to the <teiHeader> and look
for the <langUsage> tag within the
<profileDesc> section. Add another
<language> tag after the one indicating that English is being used.
- Type the name of the language within the new
<language> tag.
- Add the proper ISO639 language code to the id
attribute of the new tag. Search for ISO639 on the web, or just go to
http://www.ics.uci.edu/pub/ietf/http/related/iso639.txt
to find these codes. Note that all language IDs MUST be in lowercase.
- When you're done, the <langUsage>
section should look something like this:
<langUsage>
<language id="en">English</language>
<language id="it">Italian</language>
</langUsage>
- Within the body of the text, enclose the foreign word or phrase within
the <foreign> tag. Add a value to the
lang attribute. When finished, your tag
should look something like this:
<foreign lang="it">il danno e la
vergogna</foreign>
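Since every lang value used in a <foreign> tag needs a matching <language> entry in the header, the two can be compared automatically. The sketch below uses Python's standard xml.etree.ElementTree module on a small well-formed fragment; the function name and the <doc> wrapper element are hypothetical, and a full Archive file that relies on named entities from the project's entity files may not parse this way without further setup.

```python
import xml.etree.ElementTree as ET


def undeclared_languages(document: str) -> set[str]:
    """Return lang codes used in <foreign> but missing from <language>."""
    root = ET.fromstring(document)
    # Codes declared in the header, e.g. <language id="it">Italian</language>.
    declared = {language.get("id") for language in root.iter("language")}
    # Codes used in the body, e.g. <foreign lang="it">...</foreign>.
    used = {
        foreign.get("lang")
        for foreign in root.iter("foreign")
        if foreign.get("lang")
    }
    return used - declared


if __name__ == "__main__":
    sample = (
        "<doc><langUsage>"
        '<language id="en">English</language>'
        '<language id="it">Italian</language>'
        "</langUsage>"
        '<p><foreign lang="it">il danno e la vergogna</foreign> '
        '<foreign lang="fr">mot juste</foreign></p></doc>'
    )
    print(undeclared_languages(sample))
```

In the sample, the French phrase is reported because no <language id="fr"> entry has been added to the header.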
- Indicating Typos: If there are any typos in the
original text, they must be retained in the electronic edition. To indicate
this, use the <sic> tag. Within the
<sic> tag, under the
corr attribute, put the correct spelling,
punctuation, etc. After you encode the <sic>
tag, delete the text [Sic] from the text itself if it appears. For
example,
There was a <sic corr="time">timt</sic>
- Emphasized Text: If text is highlighted simply because
it is emphasised, such as:
He was really sick that day.
Encode it as follows:
He was <emph rend="italic">really</emph>
sick that day.
- Highlighted Text: If for some reason text is highlighted,
but you can't figure out why, encode it within a <hi>
element and choose the appropriate value for the rend
attribute. This tag is usually only used to indicate house styles, such as
the opening sentence from the following article:
<hi rend="case(smallcap)">Forty</hi> years ago,
the name of <placeName>Dublin</placeName>...
- Notes: The <note>
tag is used to indicate any citation or footnote information included in the
original text. Insert the <note> tag
directly into the body of the text where the note (or its superscript number)
appears. When typing the citation into the
<note> tag, use whatever citation format
is used in the original document, being sure to tag such proper names
as personal names, titles, publisher, and place of publication. For example:
On MacGreevy's only visit to America in 1954, Stevens
presented him with an inscribed copy of The Auroras of Autumn, a collection
which included the 1948 poem he had written in honor of his Irish friend,
"Our Stars Come from Ireland." <note n="6" place="end">See Letters of
Wallace Stevens (New York: Knopf, 1966). </note>
- Abbreviations: The <abbr> tag is
used to specify any term in the text that appears in an abbreviated form (except for
personal or organizational names). Use the expan attribute
to write out the expanded or full form of the abbreviated term. For example,
<abbr expan="watermark">wm</abbr>
Abbreviations that include a superscript character should include the word "superscript"
in the rend attribute:
"113r"
should be encoded as
113<abbr expan="recto" rend="superscript">r</abbr>
- Additions and Deletions: The <add>
and <del> tags are most often used when encoding manuscripts.
<add> is used when the author adds text to the original
document; <del> is used when the author deletes text.
The place attribute is used to record where the
addition is made:
- inline addition is made within the line of text. For example:
"The night came sulkily down."
would be encoded as:
The night <add place="inline">came</add>
sulkily down.
- supralinear addition is made above the line. For example:
And
Then night pressed
would be encoded as:
<add place="supralinear">And</add><del
rend="overstrike">Then</del> night pressed
- intralinear addition is made below the line. For example:
Octavius
R.T.T.
would be encoded as:
<del rend="overstrike"><persName>Octavius</persName></del>
<add place="intralinear"><persName>R.T.T.</persName></add>
- left addition is made in left margin of page. For example:
When You and I were there
would be encoded as:
<add place="left">When</add> You and I were there
- Other values include:
- right addition is made in right margin of page
- top addition is made in top margin of page
- bottom addition is made in bottom margin of page
- opposite addition is made on opposite (facing) page
- verso addition is made on verso (opposite side) of
sheet
- mixed addition is made in some combination of the
other places
- The <del> tag is used to indicate text that has
been deleted or struck out from a manuscript or typescript. The way the deletion
is marked should be indicated in the rend attribute with
values such as:
- subpunction (dots below the line indicate matter to be deleted).
Ex.: close
- overstrike (lines through the text indicate matter to be deleted).
Ex.: close
- erasure (material to be deleted has been erased, but remains
legible enough to transcribe)
- bracketed (brackets around the material indicate that it is
spurious or superfluous).
Ex.: [close]
- Sometimes there will be instances in which text has been added and then deleted.
Since the <del> tag has no place
attribute, you should nest the <del> tag in the
<add> tag and enter the appropriate place
attribute in the <add> tag:
When You and I were there
would be encoded as:
<add place="left"><del rend="overstrike">When</del></add>
you and I were there.
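If you want to double-check the place values on your <add> tags before the final steps, a short script can do it. The following is only a minimal sketch in Python, not part of the MacGreevy Encoding Folder: the function name check_additions and the wrapping <p> element are assumptions made for illustration, and the list of allowed values is simply copied from this guide as written above.

```python
import xml.etree.ElementTree as ET

# Values of the place attribute, copied from the list in this guide.
PLACE_VALUES = {
    "inline", "supralinear", "intralinear", "left", "right",
    "top", "bottom", "opposite", "verso", "mixed",
}

def check_additions(xml_text):
    """Return a list of problems found in the <add> elements of a well-formed fragment."""
    root = ET.fromstring(xml_text)
    problems = []
    for add in root.iter("add"):
        place = add.get("place")
        if place is None:
            problems.append("<add> has no place attribute")
        elif place not in PLACE_VALUES:
            problems.append("<add> has unlisted place value: " + place)
    return problems

# Hypothetical sample: <p> is only a wrapper so the fragment has a single root.
sample = (
    '<p><add place="left"><del rend="overstrike">When</del></add> '
    'You and I were there. <add place="margin">x</add></p>'
)
print(check_additions(sample))
```

The first <add> in the sample uses a value from the list and passes; the second uses "margin", which this guide does not list, so it is reported. jEdit's parser remains the real check against the DTD; this sketch only looks at the place attribute.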
Part 5: Final Encoding
Final Header Encoding
After you save your document for the last time, find the document icon using Windows Explorer.
Using your mouse, right click on the file and select Properties from the menu that pops up. Note
the value given for the file's size. Go back into the .xml
document and under <extent> in the header, put the size of the
file in kb, e.g., 5 kb.
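If you would rather not read the size from the Properties window, the same number can be worked out with a few lines of Python. This is a rough sketch under stated assumptions: the function name extent_in_kb is made up for illustration, 1 kb is taken to be 1024 bytes, and the result is rounded up to a whole number, which may differ slightly from what Windows Explorer displays.

```python
import os

def extent_in_kb(path):
    """Return the size of the file at path as a string such as '5 kb'."""
    size_bytes = os.path.getsize(path)
    # Assumption: 1 kb = 1024 bytes, rounded up; an empty file is reported as 1 kb.
    kb = max(1, -(-size_bytes // 1024))
    return f"{kb} kb"

# Example call (hypothetical file name):
# print(extent_in_kb("documents/mydocument.xml"))
```

The string it returns is what you would type under <extent> in the header.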
Viewing in a Browser
Open the document with a browser like Mozilla Firefox or Internet Explorer and check to see how
your encoding is displayed. Be especially careful not to have any run-on encoding, such as
Forty years ago, the name ofDublin
This occurs when you forget to leave a space between a tag and text:
<hi rend="case(smallcaps)">Forty</hi> years ago, the name
of<placeName>Dublin</placeName>
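Run-on encoding can be hard to spot by eye in a long document. The sketch below, in Python, is one possible way to list candidates; it is not part of the Archive's tools. The name find_run_ons and the regular expression are assumptions for illustration. It flags any place where a letter or digit sits directly against an opening tag, or directly after a closing tag, so it will also flag intentional cases such as 113<abbr expan="recto" rend="superscript">r</abbr>. It works line by line and will miss tags that are split across lines. Treat the results as places to check, not as errors.

```python
import re

# A word character directly before an opening tag, or directly after a closing tag.
RUN_ON = re.compile(r"\w<[A-Za-z][^>]*>|</[^>]+>\w")

def find_run_ons(xml_text):
    """Return (line number, matched text) pairs for possible run-on encoding."""
    hits = []
    for number, line in enumerate(xml_text.splitlines(), start=1):
        for match in RUN_ON.finditer(line):
            hits.append((number, match.group(0)))
    return hits

sample = (
    '<hi rend="case(smallcaps)">Forty</hi> years ago, the name\n'
    'of<placeName>Dublin</placeName>'
)
print(find_run_ons(sample))  # prints [(2, 'f<placeName>')]
```

Here the second line is reported because "of" touches the <placeName> tag, which is the mistake described above.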
Appendix A: Installing jEdit
Before you begin encoding your text, you'll need to install and set up jEdit:
- First you must check that you have the Java Runtime Environment on your system.
jEdit will not run unless you have this program. In the "Start" menu, choose
"All Programs" and look for the program. If you do not have it you can download
it at
http://java.sun.com/javase/downloads/index.jsp.
- The "latest stable version" of jEdit can be downloaded for free from
http://www.jedit.org/index.php?page=download.
Select "Windows-based installer" and any download location. Then when your
system's pop-up window opens choose "open" and your system will take you
through the installation procedure.
- Once jEdit is installed, you will need to install plugins (additional
software that adds functionality to jEdit) to parse the XML documents. To
install plugins:
- Select "Plugins" from the jEdit menu in the toolbar.
- Select "Plugin Manager" from the drop-down box.
- Select "Install Plugins." In the open window, be certain "Install
in user plugin directory" is selected and select the following plugins:
- File Management
- HTML and XML
- Project Management
- Support
- CommonControls
- ErrorList
- Sidekick
- Text
- Visual
- Select "Install."
After installation, jEdit must be restarted for the plugins to function.
- Once you have downloaded your plugins, you will need to "dock" them.
Docking your plugins allows you to keep open on your screen the different
utilities you have "plugged in" to jEdit. The following instructions will
allow you to dock the plugins that will be most useful to text encoding:
- From the Utilities menu select Global Options. In the opened
window, choose "docking" from the menu to the left.
- On the left side of the new window, select Structure Browser and
select "left" from the pull down menu that has the default category
"floating." Dock "XMLInsert" on the "right." We also want to dock the
Error List on the "bottom" (these settings can be changed if you have
other preferences).
- Open the newly docked plugins (XMLInsert and the Error List) by
clicking on the name of the plugin in the side or bottom gutter of
your main window. To close them, click on the corresponding "X".
- To be certain that jEdit is parsing your document choose Plugins
from the menu, then Sidekick from the drop-down menu, and finally
check Parse on Keystroke. Later, when you have parsing errors (and we
all have errors some time) these errors will appear in the ErrorList
at the bottom of the screen.
jEdit is now set up and you’re ready to start encoding.
Appendix B: Directions for Writing Notes for the Thomas MacGreevy
Archive
There are two different types of notes in the Thomas MacGreevy Archive,
those that convey biographical information about a person and those that apply to
the context of a specific article or letter. Each type of note requires a slightly
different writing style and must be integrated with the text in different ways.
Biographical Notes
These are notes about a person's life that do not have anything to do with the
context of the specific article or letter where they are mentioned. The information
contained in the note may be pertinent to events that occurred before or after the
immediate context of the letter. As much as possible they should adhere to the
following format:
Name (date of birth - date of death),
Nationality occupation. Note.
So as an example, the entry for Lennox Robinson would be:
Lennox Robinson (1886-1958), Irish playwright,
producer, manager of the Abbey Theatre. Also known as Tinche and Lynx. In 1897,
after seeing an Abbey production at the Cork Opera House, Robinson began to write
plays. In 1909 Robinson was appointed producer of plays and manager of the Abbey
by WB Yeats and Lady Gregory. In 1915 Robinson was hired by the Carnegie United
Kingdom Trust to act as part-time Organising Librarian for Newcastle West and
Rathkeale. Robinson met MacGreevy in 1919, and the following year, when he was
hired as Secretary to the newly-established Irish Advisory Committee of the CUKT,
recommended MacGreevy to the Committee as Assistant Secretary. On 8 September
1930, Robinson married Dolly Travers Smith.
These biographical notes will be separate from the particular text and will
apply across all the texts in the Archive. Therefore, if a note contains
biographical particulars that are really only pertinent to the context of the
letter (such as "Robinson wrote The Big House which was performed on January 15,
1925, attended by George Yeats..."), those particulars should go into a context
note.
Biographical notes should be placed in the "Note" field of the standard entry
in the Who's Who in the MacGreevy Archive database.
Directions for writing a new entry for Who's Who can be found in the Regularization
Guide.
Contextual Notes
Context notes are more particularly related to the immediate text. There is no
particular format for these notes as they will contain various types of
information. These notes will be embedded within the text in
<note> tags (see encoding documentation section
on <note>). Context note tags must have the
attribute value type="context" to be properly
displayed.