A Celebration of Women Writers


Transcribing and Proofreading:

Summary: Working as a transcriber-proofreader


Step 1. If you want to work as a transcriber-proofreader, email me at celebration.women@gmail.com and send me your NAME and ADDRESS. Tell me the name and edition of the book on which you are thinking of working. I'll let you know if it is one that we would host on the Celebration of Women Writers site.


Step 2. When you get your text, transcribe it by typing or scanning it in. If possible, spell-check it. If you are comfortable doing so, you can add HTML formatting.

Goals when transcribing and proofing are (1) to respect the author's intentions and (2) to achieve consistency, completeness, and correctness in your transcription.

Respecting the Author's Intentions

We tend to think of authors as the people who create books. But the appearance of a particular edition is also the result of many decisions by the printer or compositor. In general, readers are more interested in the author's intentions than in a compositor's conventions. In creating on-line editions for the Celebration, our goal is to respect the intentions of the original writer, rather than to recreate every production decision in the original text.

The significance and meaning intended by the original author may be revealed not only by the wording and spelling of the original text, but also by the use of capitalization, italics, punctuation, accents, breaks between paragraphs (in prose) and breaks between lines (in poetry). All of these can be important in communicating the meaning of the author, and we try to recreate them in our on-line editions.

In most cases, authors have little control over mundane production details of the printer's trade such as the layout of the text across pages. These may include the choice of character sets or font sizes, the use of running headers and footers on pages, the placement of line breaks (except in poetry), page breaks, and page numbers. The placement of illustrations may also reflect printing constraints rather than the author's preference. Therefore, we generally ignore these details when creating on-line editions.

Consistency, Completeness, and Correctness

As transcribers and proofreaders, we must strive for consistency, completeness, and correctness.

Consistency in transcribing makes your work understandable, and makes it possible to turn it into displayable HTML with a minimum of work. How you transcribe features of the layout and formatting of your text should remain consistent throughout your submission. For example, if you use tabs to indicate the start of paragraphs, do so throughout; don't mix tabs with invisible indents set via the margins.

Completeness means that you've transcribed all the relevant sections of the text that you were given, and checked to ensure that you haven't missed any pages, paragraphs, sentences, lines, or words. Keep in mind the discussion of author's intent vs. compositor's layout, above: your submission should recreate all of the author's text, but not layout details like running headers and page numbers.

Correctness means that you've respected the wording, spelling, capitalization, italics, bolding, punctuation, etc. of the original text, and recreated it accurately in your submission. The original text should be the final authority when making decisions about correctness, even if you find it a bit strange, or your spelling checker program complains about it. This is especially important with British and American spellings. It is also crucial when you are transcribing a text where the characters speak in some sort of dialect or accent.

Transcribing:

You can transcribe a section of text into electronic form either by scanning it with an optical character recognition (OCR) program, or by typing it in. Scanning tends to be faster than typing for most people, so if you have access to a scanner, you may want to try using it. But with 16-20 page segments, even typing in a page every night will allow you to accomplish a lot in a reasonable time.

Be aware that scanning a page as text with an OCR program and scanning a page as a graphical image are not the same thing: you need to ensure that the files you are creating and sending me are text files, not image files.

If illustrations accompany the text, and the illustrations as well as the text are in the public domain, and can be put on-line, I will scan them in and put them up. Volunteers can ignore them. The apparent size, coloration, and location of illustrations may vary from the original book due to the processes of scanning and displaying them. In cases where a large illustration caused a paragraph to be split in the original text, I often choose to place the illustration after the rejoined paragraph in the on-line version. In rare cases, where illustrations in the original text were so far from their related content that their significance was lost, I have chosen to reorganize them and place them closer to relevant paragraphs in the on-line text (cf. A Childhood in Brittany).

If there are footnotes in the text, it's easiest for me if you transcribe them at the end of your text file, and treat them there as if they were normal paragraphs. You can precede each with a comment such as "[footnote 1 from page 135]" to let me know where they appeared in the original text. I prefer this to having you format them using footnote features unique to your text editor, because text editors store and convert footnotes in different ways. There's no guarantee that what your program does will survive conversion to a format I can read.

Spell-Checking:

Many OCR programs have spell-checking facilities built in. Many text editors will also offer you spell-checking features. My advice is that you use them–but don't rely on them. Auto-correction features in scanners can make some truly disastrous substitutions, and incorrect 'corrections' are harder to find than uncorrected errors when proofreading. Spell-checkers that show you possible errors and ask you to correct them can be very helpful, but you should always compare their findings to the original text you were sent, rather than relying on the words suggested by your spell-checker. The original text is the final authority on how something should be spelled.

In the section on proofreading, I list characteristic errors for both typed and scanned text. I also indicate how different types of errors are likely to be detected: by spell-checking, by proofreading, and by proofreading against the original. Some errors can be detected by spell-checking, but many types of errors cannot be. Please look at the examples. I hope they'll both educate people about what to watch out for, and motivate people to proofread carefully comparing to the original text.

Formatting:

You can indicate italics and accents either by using HTML commands, or by using special formatting commands in your word processing program. In both cases, there are issues to be aware of.

You can save me time by setting up your text in HTML and proofreading it. However, it's important that all volunteers use a standard set of HTML encodings. For this reason, I would ask that you do NOT automatically generate an HTML version of your file. Programs that support conversion to HTML do so in a variety of ways. I have to do a lot of painful clean-up work to restandardize files that have been auto-converted to HTML.

If you are comfortable putting in HTML codes yourself, terrific! It's really not difficult. The table below is a quick guide to some of the HTML I use to indicate paragraph separations, different font styles (italic, bold, etc.) and the most common diacriticals. With these, you can transcribe most book sections that I send out. If you're interested in learning more about HTML, try A Beginner's Guide to HTML. Unicode code charts for representing special characters like accents and diacriticals are also available on-line.

Standard HTML Diacriticals

FORMAT | HTML SOURCE | APPEARANCE
Paragraphs | <P> | This should appear at the beginning of the paragraph.
Italics | <I>italics</I> | italics
Bold | <B>bold</B> | bold
Acute Accent | &eacute; &Eacute; &aacute; | é É á
Grave Accent | &egrave; | è
Circumflex | &ecirc; | ê
Umlaut | &euml; | ë
Ring Accent | &aring; | å
Tilde | &ntilde; | ñ
Cedilla | &ccedil; | ç
Pound sign | &pound; | £

If you're using these character sequences, be sure to include the final semi-colon at the end of the sequence! Some browsers will display the sequence appropriately without the semi-colon, but others will not. We hope our pages will display correctly in as many browsers as possible.
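For volunteers who type accented characters directly, the entities in the table can also be produced mechanically. The following is a minimal Python sketch under stated assumptions, not part of the Celebration's own workflow: the function name to_entities is hypothetical, and the entity names come from Python's standard html.entities table. Dashes are deliberately written as numeric entities, to match the hyphen convention described in the next section.

```python
from html.entities import codepoint2name

# Assumption: dashes are kept numeric (&#8211;, &#8212;) to follow the
# convention in "The Trouble With Hyphens".
NUMERIC_ONLY = {0x2013, 0x2014}


def to_entities(text: str) -> str:
    """Replace each non-ASCII character with an HTML character entity."""
    pieces = []
    for ch in text:
        code = ord(ch)
        if code < 128:
            pieces.append(ch)  # plain ASCII passes through unchanged
        elif code in codepoint2name and code not in NUMERIC_ONLY:
            pieces.append(f"&{codepoint2name[code]};")  # e.g. &eacute;
        else:
            pieces.append(f"&#{code};")  # numeric fallback
    return "".join(pieces)


if __name__ == "__main__":
    print(to_entities("caf\u00e9"))  # prints caf&eacute;
```

Every entity the sketch emits ends with a semi-colon. The output should still be proofread against the original pages, since the conversion cannot tell whether an accent was typed correctly in the first place.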

If you are not using HTML formatting to indicate italics and accents, you can still indicate these things by using special formatting commands in your word processing program. If you use word-processor-specific formatting, it will ultimately have to be replaced with HTML commands to create the same effects in HTML. Not all programs do the same things with special sequences when saving to RTF format, so check to be sure that any special characters you are using in your file still appear in the RTF version of the file that you send to me.

The Trouble With Hyphens

Unfortunately, the HTML world hasn't really agreed on a standard way to represent hyphens and dashes. For the texts we're putting on-line, we need three hyphen or dash options. The three options need to be both semantically distinguishable (internal to my files) and visually distinguishable (to the reader of the page.)

The convention I'm following uses numeric Unicode character entities that are consistent with the &ndash; and &mdash; symbolic character entities for HTML. The numeric Unicode entities are supported by some browsers that don't support the symbolic HTML entities yet.

The distinctions appear more clearly in some fonts and font sizes than in others. With my current computer and browser, I find that Times New Roman, font size 12, is fairly good. Trebuchet font makes less of a distinction between dashes and hyphens, but is particularly good for distinguishing between ones, l's, zeros, and O's. You may need to experiment a bit to find what works well for you.

Type | Character sequence | Appears as | Example
Hyphen | - | - | trap-door
Medium Dash | &#8211; | – | fall down–heaven knows
Long Dash | &#8212; | — | she preceded C—

"Not expecting you, miss, I have no proper room prepared; indeed, the only tolerable room I can put you in is the room with the trap-door–if you would not object to it," said Mrs. Condiment, as with a candle in her hand she preceded C— along the gloomy hall and then opened a door that led into a narrow passage. "Now, my dear, take care of yourself, for this bolt slides very easily, and if, while you happened to be walking across this place, you were to push the bolt back, the trap-door would drop and you fall down–heaven knows where!" [Adapted from The Hidden Hand (1859) by E.D.E.N. Southworth (1819-1899)]

If you aren't comfortable using the character sequences shown above, you'll still need to distinguish between different lengths of dashes. Text editors encode medium and long dashes in a variety of ways, some of which don't survive file conversion, so I encourage you to transcribe a medium dash as two short hyphens --, and a long dash as four short hyphens ----. These are easy sequences for me to search for and replace as needed.
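To show why the two-hyphen and four-hyphen sequences are easy to search for and replace, here is a small Python sketch. It is an illustration only, not the script actually used for the Celebration; the function name encode_dashes is hypothetical. It handles the four-hyphen run first, so that a long dash is never misread as two medium dashes, and it leaves single hyphens and runs of any other length untouched for a human to check.

```python
import re


def encode_dashes(text: str) -> str:
    """Turn '----' into a long dash entity and '--' into a medium dash entity."""
    # Long dash first: exactly four hyphens, not part of a longer run.
    text = re.sub(r"(?<!-)----(?!-)", "&#8212;", text)
    # Medium dash: exactly two hyphens, not part of a longer run.
    text = re.sub(r"(?<!-)--(?!-)", "&#8211;", text)
    return text


if __name__ == "__main__":
    print(encode_dashes("the trap-door--if you would not object"))
    print(encode_dashes("she preceded C---- along the gloomy hall"))
```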

White Space and Punctuation

Spacing in punctuation can be tricky, especially if you are using a scanner. Sometimes it's hard to decide whether the original text uses extra whitespace or not. If in doubt, the important thing is to choose and use a consistent convention. I generally assume that extra whitespace does not appear around punctuation. For example, I don't put spaces before periods, question marks, or exclamation points at the end of sentences. Similarly, when words or sentences appear in quotation marks, I don't add extra whitespace. If "this" appears in double quotes, it is not transcribed as " this ".

One place where additional white space may legitimately occur is within words which are commonly connected in current usage, but weren't always connected in the past. Examples are "any body" vs. "anybody", and "would n't" vs. "wouldn't". Older conventions don't always match modern conventions. Again, it's important to match the original text, and to be consistent throughout (unless, of course, the original text is itself inconsistent, in which case its idiosyncrasies should be respected).

Some text editors use special punctuation characters which don't display in HTML. If these are used in submissions, I have to do extra work to replace them. Smart quotes are the most frequent nuisance. Please use straight quotes, rather than smart quotes (most text editors will allow you to turn this feature off). To reproduce an ellipsis (three periods in a row), please use three separate period characters, not a special "three-in-one period" character. You'll save me time and effort!
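If your editor has already inserted smart quotes or single-character ellipses, they can be converted back before you send the file. The Python sketch below is one possible way to do it; the names PLAIN_EQUIVALENTS and plain_punctuation are hypothetical, and the list covers only the characters mentioned above.

```python
# Map each "special" punctuation character to its plain equivalent.
PLAIN_EQUIVALENTS = {
    "\u2018": "'",    # left single smart quote
    "\u2019": "'",    # right single smart quote or apostrophe
    "\u201c": '"',    # left double smart quote
    "\u201d": '"',    # right double smart quote
    "\u2026": "...",  # single-character ellipsis
}

_TABLE = str.maketrans(PLAIN_EQUIVALENTS)


def plain_punctuation(text: str) -> str:
    """Replace smart quotes and the ellipsis character with plain characters."""
    return text.translate(_TABLE)


if __name__ == "__main__":
    print(plain_punctuation("\u201cWait\u2026\u201d she said."))
```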


Step 3. Finally, go through the entire text and proofread it against the original pages.

A well-done proofreading pass is critical. It can make all the difference between producing a mish-mash of errors, rough edges and idiosyncrasies, and producing something that we all can be proud of, and happy to see on-line.

Whether you're scanning or typing, there will be errors when you enter the material. You must go back and read through it a second time, carefully comparing your version against the original text, before you send it to me. This is still essential if you are working with a scanner, since scanners can introduce some very strange errors into the text. Careful proofreading is essential in finding and correcting mistakes. I expect you to do it.

If you can control the font you see when you are proofing, pick a font in which similar characters are easily distinguished, and a font size that is easy to read. It is essential that 1 (the number one), l (lower case consonant l) and I (upper case vowel I) are distinguishable. You should also be able to distinguish between 0 (the number zero), O (capital letter) and o (small letter). I also tend to use a larger font size than I would normally use for reading, since it makes it easier to catch errors. Another useful trick is to display the text in one font when you transcribe it, and another when you proofread, e.g. Times New Roman and Trebuchet.

When comparing my transcription to the original copy, I generally use a pointer or a marker to hold my position in one while I look at the other. If you are comfortable doing so, you can proof on-screen. Some people prefer printing out a paper copy of the to-be-proofed text, with the same line width as the original pages, and going through with red ink to compare each paragraph, phrase, word and punctuation mark.

Certain types of errors tend to occur more often in hand-typed text than in scanned text, and vice versa. But don't assume that because a particular type of error occurs more frequently in another type of transcription, it won't occur in what you're doing. Hopefully the examples below will both give you an idea of errors to watch for and motivate you to proofread carefully!

Please don't try to short-cut, by relying on your scanner or spell-checker instead of proofreading the entire text yourself. With both scanning and typing, it's essential to compare the text you're producing to the original, phrase by phrase, and word by word if necessary. The intelligence of a human reader IS needed to catch errors, whichever transcription method you use.

Characteristic Errors In Hand-Typed Text:
Type of Error | Example/Note | Detected by
Transposed Letters | e.g. "thsi" for "this" | Spell-checking or Proofreading
Transposed Letters | e.g. "form" for "from"; "dog" for "god" | Proofreading
Missing Letters | e.g. "eample" for "example" | Spell-checking or Proofreading
Missing Letters | e.g. "fog" for "frog"; "kettle" for "kettles"; "though" for "thought" | Proofreading
Missing Words | e.g. "And Tiny Tim, who did die...." | Proofreading AGAINST THE ORIGINAL
Missing Phrases, Sentences, or Paragraphs | may not interfere with the obvious sense of the transcription | Proofreading AGAINST THE ORIGINAL
Substituting Words | e.g. "a" for "the" | Proofreading AGAINST THE ORIGINAL
British vs. American Spellings | e.g. "honour" for "honor" or vice versa. | Proofreading AGAINST THE ORIGINAL

Scanners make it easier to avoid some types of errors, but they can actually introduce other types of errors. Also, using automatic spell-checking and correction (i.e. relying on your scanner to 'decide for you' as to the correct spelling of a questionable word) is not recommended. Plausible-looking errors are more likely to be missed when you proof than obviously mis-scanned words.

Characteristic Errors In Scanned Text:
Type of Error | Example/Note | Detected by
Mis-transcribed Letters | e.g. "be" for "he" (or vice versa), "tile" for "the", "not" for "now", "lying" for "dying", "tartar" for "tartan" | Proofreading
Mis-capitalization of Letters | "S" for "s"; "W" for "w" (and vice versa) | Proofreading AGAINST THE ORIGINAL
Missing Letters (especially at end of lines) | e.g. "stil" for "still" | Spell-checking or Proofreading
Missing Letters (especially at end of lines) | e.g. "though" for "thought" | Proofreading
Missing Lines (especially at ends of pages) | sometimes but not always interferes with the apparent sense of the text | Proofreading AGAINST THE ORIGINAL
Missing Pages | sometimes but not always interferes with the apparent sense of the text | Proofreading AGAINST THE ORIGINAL
British vs. American Spellings (beware spell-checkers) | e.g. "honour" for "honor" or vice versa. | Proofreading AGAINST THE ORIGINAL
Similar Characters: 1 (number one), l (lower-case consonant), I (capitalized vowel) | e.g. "al1" for "all" | Spell-checking or Proofreading with distinguishable fonts
Similar Characters: 0 (number zero), O (upper-case vowel), o (lower-case vowel) | e.g. "1OOOO" for "10000" | Spell-checking or Proofreading with distinguishable fonts
Garbage Punctuation | e.g. periods in the middle of sentences, extra punctuation at ends of sentences | Proofreading
Extraneous Spaces (around words or punctuation) | e.g. " this " for "this"; "Why ?" for "Why?" | Proofreading AGAINST THE ORIGINAL
Missing Spaces (especially older spacing conventions) | e.g. "anybody" for "any body"; "shouldn't" for "should n't" | Proofreading AGAINST THE ORIGINAL
Single Letters | e.g. "S U R PRISE" for "SURPRISE" | Proofreading
Broken Words (across line breaks) | e.g. "de- sire" should be recombined to become "desire" | Spell-checking or Proofreading
Broken Words (across line breaks) | e.g. "Wall- ace" should be recombined to become "Wallace", not split into "Wall ace" | Proofreading
Broken Paragraphs (across page breaks) | sometimes but not always apparent from the sense of the text | Proofreading AGAINST THE ORIGINAL
Combined Paragraphs (often single line paragraphs) | sometimes but not always apparent from the sense of the text | Proofreading AGAINST THE ORIGINAL

If I can detect which method you were using to transcribe your text, from the kind of errors I find in it, there are too many errors!
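A few of the scanning errors listed above leave mechanical traces that a short script can point out before you proofread. The Python sketch below is a hypothetical helper, not a substitute for reading against the original: the CHECKS patterns and the flag_lines name are assumptions made for illustration. It will also flag legitimate text such as "1st", or spacing the original really does use, so every finding needs a human decision.

```python
import re

# Each check pairs a label with a pattern for one kind of suspicious text.
CHECKS = [
    # A "word" mixing letters and digits, e.g. "al1" or "1OOOO".
    ("digit inside a word", re.compile(r"\b(?=\w*[A-Za-z])(?=\w*\d)\w+\b")),
    # Whitespace directly before closing punctuation, e.g. "Why ?".
    ("space before punctuation", re.compile(r"\s[.,;:?!]")),
    # Capitals separated by single spaces, e.g. "S U R PRISE".
    ("spaced-out capitals", re.compile(r"\b(?:[A-Z] ){2,}[A-Z]")),
]


def flag_lines(text: str) -> list[tuple[int, str, str]]:
    """Return (line number, label, matched text) for each suspicious spot."""
    findings = []
    for number, line in enumerate(text.splitlines(), start=1):
        for label, pattern in CHECKS:
            for match in pattern.finditer(line):
                findings.append((number, label, match.group()))
    return findings


if __name__ == "__main__":
    for finding in flag_lines("He ate al1 of it.\nWhy ?"):
        print(finding)
```

Errors that produce real words, such as "be" for "he" or "tartar" for "tartan", leave no such trace, which is why proofreading against the original remains essential.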


Step 4. After you've entered and proofread your text, save your file in RTF format (Rich Text Format or Interchange Format) or plain ASCII text, and email it to me.

PLEASE DO NOT SEND ME FILES IN OTHER FILE FORMATS!

Each text editing program (and sometimes each version) has its own format for saving information. If you send me a file in a format that my programs can't read, I may be able to convert it to something I can read... or I may not.

Luckily, most text editors will let you choose which format to use for saving files. Two common standards are RTF (rich text format), which will preserve things like font sizes and fancy characters, and ASCII text format (or text only), which does not retain this formatting information. Microsoft Word lets you "Save As" in either "RTF" format or "Text without Line Breaks"; WordPerfect offers options including "RTF export", and I think a text-only option. Any of these should be fine. I recommend that you do NOT use "Text only with line breaks" as this will change wrapped lines to lines with a carriage return at the end, throughout your file. This may make it impossible for me to find the ends of paragraphs automatically.

Since I'm turning the files into HTML, I know it sounds reasonable for you to convert files to HTML ... but programs that support conversion to HTML don't do so according to a single standard. I have to do a lot of clean-up work if files are auto-converted to HTML. So I much prefer to receive an RTF version.

You can also make my work easier if you give the file a name that will help to uniquely identify it for me. A helpful naming convention is for you to indicate the book, and the first page number of the text you worked on. For example, if you typed in pages 181-196 of Aurora Leigh, name the file "Aurora181.rtf" rather than "aurora.rtf". This makes it much easier for me to keep track of the files as they come in.
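The naming convention can be written down as a small rule. The Python sketch below is only an illustration: the function name section_filename is hypothetical, and taking the first word of the title is an assumption drawn from the "Aurora181.rtf" example, so a title beginning with "The" or "A" would need a more distinctive word chosen by hand.

```python
import re


def section_filename(title: str, first_page: int, extension: str = "rtf") -> str:
    """Build a file name from the book title and the first page transcribed."""
    # Assumption: the first word of the title identifies the book.
    first_word = re.sub(r"[^A-Za-z0-9]", "", title.split()[0])
    return f"{first_word}{first_page}.{extension}"


if __name__ == "__main__":
    print(section_filename("Aurora Leigh", 181))  # prints Aurora181.rtf
```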

Once that's done, you can email the file to me at celebration.women@gmail.com. The best way to send it is to attach the file to the email message. If your mailer doesn't allow you to attach files, you may be able to include the file inside the message. (Be warned that formatting and layout information may be lost this way.)

If you enjoyed doing your selection, and would like to work on another, PLEASE TELL ME, in your email, that you'd like more. If you have a preference for a particular book or type of book, mention it! I won't assume that you want to work on another text unless you say so. People's situations change enough that even if you had lots of time last month, you might be much too busy this month.

As I get the sections back, I'll add HTML formatting and do a final proof pass on the sections. When I link your text in on-line, I'll add your name to the list of contributors to the book. If you would like to add a brief dedication (one or two sentences) with your name, let me know what you'd like to have said.


Contacting Us:

That's all there is to it. Thank you for your interest in putting books by women on-line! I appreciate your help. Please email me at celebration.women@gmail.com if you have any questions, or if you would like to work on a text for the Celebration of Women Writers. If there are issues or questions you're concerned about, or you're generally interested in electronic books and digital libraries, you may also find the BookPeople Mailing List to be of interest.


Editor: Mary Mark Ockerbloom