The Online Books Page

HOW TO PUT BOOKS ONLINE

A Guide for Beginners

(This guide is adapted from a guide written for the Celebration of Women Writers. We're also looking for interested transcribers for that project.)

We're looking for volunteers to help put books online. It's not difficult for one person to enter their favorite text, and once one person has done this, it becomes available to millions of Internet readers. Here's how to go about it.

Find a suitable text, and tell us about it

For The Online Books Page, we're looking for complete, English-language books in any subject. You can choose a book from this list of requests, or choose another book that you would like to transcribe.

Any text you choose must either not be copyrighted, or be approved for free online use by the copyright holder. In the United States, any work published before 1923 is no longer copyrighted. (In other countries, copyright usually lasts at least 50 years after the author's death, but laws vary.) Note that revised texts, translations, and other derivative works can get a new copyright from the date of their creation. Check the copyright information (usually on the back of the title page) to see what copyrights are claimed. For more details on copyrights and permissions, see this page.

If there's a particular title you know you want to transcribe, but you're having trouble finding a copy to work from, see this page for some suggestions on where to find copies.

Nowadays you can often find page images of books at mass digitization sites like Google Books and the Internet Archive. These projects put hundreds of books online every day, but only in page image form (and unproofread computer-generated scanned text). So it can still be quite useful to transcribe a clean text copy of an important book.

If you'd rather not start out with a whole book, but would like to try something smaller first, you might want to try one of the distributed book-posting projects on the Net. For instance, the Distributed Proofreaders project lets you proofread individual pages of a previously scanned text. When all the pages have been proofread twice, the book is posted to Project Gutenberg, a widely distributed etext collection that gets indexed on The Online Books Page. To get more information about Distributed Proofreaders, or to join the project, see their site.

Make an electronic copy of your material

You then need to get your text into electronic form. You can do this by scanning and running an optical character recognition (OCR) program, or just typing it in. Or, if you are the author of the material, and already have the book text in word processor files, you may be able to just save it as plain text, HTML, or some other suitable format for the Web.

If you need to input text from a print copy, and you have access to a scanner, you'll probably want to use it, since scanning is significantly faster than typing for most people. Flatbed scanners are available in many schools, libraries, and workplaces, and can be bought for as little as a few hundred dollars. Many scanners come with optical character recognition (OCR) software, which is quite accurate once you've adjusted your settings properly. (For best results, lay your book flat on the scanner, and close the top lid as much as possible. Then experiment with the brightness level until you find a level that gets all of the letters and little of the other stray marks found in books.) To give you some idea of time, it took me about 3 hours to scan in all of E. Nesbit's Five Children and It using a Silverscan II with OmniPage Professional software.

Now that storage and bandwidth are relatively cheap, some etext projects produce page images without a text transcription. This can take less time to do than a full text transcription, but the resulting book might not be as easy for readers to use. (Since there's no actual text produced, it can't be searched, or reformatted for people with disabilities; and it takes longer to download.) It will also take up more space than a transcription. Still, page images can be useful for readers who want to see just what a book looked like. Some projects produce both a transcription and page images. Even if you don't do this, you might be interested in scanning images of illustrations or other important material that isn't as easily conveyed in textual form.

You can also type the work in if you prefer, or if a good scanner is not available. The time required depends on your typing speed, and typing is generally considerably slower than scanning. But it can be done by anyone with a computer, without any extra equipment.

If the text includes Greek or other non-Roman characters, we suggest using Unicode as the encoding for these characters. (It's possible to encode Unicode characters as HTML entities, using only standard ASCII characters.) Unicode is more portable, and is more likely to remain readable in the long term, than specialized 8-bit encodings. If you need fonts for Unicode characters, here are some pointers. (I'm told that recent editions of Windows and MacOS do have at least one font that covers most Unicode characters.)
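As a rough illustration of the parenthetical point above, here is a minimal Python sketch that rewrites non-ASCII characters as numeric HTML character references, so the resulting file contains only ASCII. The function name to_ascii_html and the sample Greek word are our own, chosen for illustration. Note that this does not escape characters such as & or <, which HTML treats specially.

```python
import html

def to_ascii_html(text: str) -> str:
    """Replace each non-ASCII character with a numeric HTML reference."""
    # "xmlcharrefreplace" is a standard Python error handler that turns
    # any character the ASCII codec cannot encode into &#NNN; form.
    return text.encode("ascii", errors="xmlcharrefreplace").decode("ascii")

# The Greek word "logos", written with escapes so this file stays ASCII.
sample = "\u03bb\u03cc\u03b3\u03bf\u03c2"
encoded = to_ascii_html(sample)
print(encoded)  # &#955;&#972;&#947;&#959;&#962;

# A browser (or html.unescape) turns the references back into characters.
assert html.unescape(encoded) == sample
```

Plain ASCII text passes through unchanged, so it should be safe to run a whole chapter through a function like this.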

Check it for accuracy

Errors can -- and inevitably do -- creep into a text, whether you've typed or scanned it in. So you'll want to proofread the electronic copy, or have someone else proofread it, before submitting it. Even if you've only produced page images, you'll want to double-check that you have all the pages in the right order, and that they're all legible.

When academics or professional publishers prepare a research-quality text, they usually have it proofread at least twice, by different people, each carefully comparing the new text with the source text. If you're just planning on supplying the text informally to Internet readers, you don't have to be that rigorous. You should, however, go through the entire text at least once, with the original book handy to check consistency. With scanned works, it may be sufficient just to read the electronic text through at a reasonable speed, checking the book whenever something looks strange and making corrections as needed. Also run the text through a spelling checker for good measure. Errors in a typed text are often less obvious than those in a scanned text, so you may want to be more careful to compare the two texts as you go along. (The proofreading process can be a pleasant opportunity to read or re-read the book yourself.)
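If you happen to have two electronic copies of the same text (say, one typed and one scanned, or two proofreaders' versions), a program can point out where they disagree, which tells you which passages to check against the book. The following is only a sketch under that assumption, using Python's standard difflib module; the function name and the sample lines are hypothetical.

```python
import difflib

def differing_lines(first, second):
    """Return (line number in first, lines from first, lines from second)
    for every stretch where the two copies disagree."""
    matcher = difflib.SequenceMatcher(a=first, b=second, autojunk=False)
    mismatches = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # Report 1-based line numbers, as a text editor would show them.
            mismatches.append((i1 + 1, first[i1:i2], second[j1:j2]))
    return mismatches

typed = ["It was the best of times,", "it was the worst of times,"]
scanned = ["It was the best of times,", "it was the worst of tirnes,"]

for line_number, ours, theirs in differing_lines(typed, scanned):
    print(line_number, ours, theirs)
```

A comparison like this only shows where the copies differ; it cannot say which one is right, so the printed book remains the final authority.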

Occasionally, you (or your spelling checker) will come across something that looks like an error in the original source text. We recommend being very cautious about correcting any "errors" in the original book. Writers throughout history have used many spellings and idioms that are not familiar to modern American readers or spell-checking programs. Text, particularly dialogue, can intentionally involve non-standard usage or mechanics. For editions meant for research, many scholars prefer that no changes whatsoever be made in the electronic version of a text, or at least that any changes be explicitly noted. If you want your electronic text to be used for scholarly research, or for preservation, Marc Demarest's essay The Responsible Preparation of Electronic Literary Texts describes what many serious scholars look for in electronic versions of previously published books.

If you mean to prepare texts for a casual reader, you needn't be as picky. To us, corrections of obvious typographical or printing errors, or shifts in line breaks (particularly those that split a word) can be useful if done with care. There can also be good reasons to prepare an electronic version of a text that does not exactly match any previous print edition. Choose the policy that makes the most sense to you. In any case, it's a good idea to include some brief transcriber's notes at the start or end of the text, explaining what you've done and giving publication information on the source text(s) you used.
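For those who decide to rejoin words that were split across line breaks, the change can be partly automated. The sketch below is one possible approach in Python, not a recommended standard: the function name and the pattern are our own assumptions. It only joins a word when lowercase letters sit on both sides of the hyphen, and it will wrongly join genuine hyphenated compounds (such as "well-known") that happened to break at the hyphen, so the result still needs to be checked by eye.

```python
import re

# Assumed pattern: lowercase letter, hyphen, newline, then the rest of the
# word (plus any trailing punctuation) and one optional following space.
SPLIT_WORD = re.compile(r"([a-z])-\n([a-z]\S*) ?")

def rejoin_split_words(text: str) -> str:
    """Move the second half of a split word up to join the first half."""
    return SPLIT_WORD.sub(r"\1\2\n", text)

before = "a careful transcrip-\ntion of the book"
after = rejoin_split_words(before)
print(after)
```

Here the two lines become "a careful transcription" and "of the book". If you make this kind of change, the transcriber's notes are a good place to say so.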

Publish it!

Once you have your text entered and proofread, "publishing" it on the Internet is easy.

If you already have space on a Web site, you can just place the work there, and tell us how to get to it. We can then include a link in the listings of The Online Books Page. It may also qualify for listing in one of our special exhibits.

Or, you can submit the text to one of many book archives on the Net. (There are a number of archives that are looking for all sorts of texts. We can help you find a suitable one.) Then we'll just link to the copy in those archives. To see examples of some of the texts and archives out there, see the list of archives.

For text formats, plain vanilla text or HTML (the hypertext markup language of the Web) are the formats of choice. Just about everybody can read and store plain ASCII text, so this is the most portable format. HTML lets you mark up the text in interesting ways, such as adding accents and italics or including hypertext links to related material, but it is not widely recognized outside the Web, and is not supported by some text archives. Other formats are less useful, but it may be possible to convert some of them to plain text. Many academic projects now prepare texts in more detailed formats, like TEI (an SGML format). Since these formats are not so widely readable, many of these projects also provide translations into HTML or plain text.

That's how it works. Please write us if you have any questions, or if you would like to start working on a book.



Copyright 1995-2007 by Mary Mark Ockerbloom (celebration.women@gmail.com) and John Mark Ockerbloom (onlinebooks@pobox.upenn.edu)