Scanning a book start to finish with MODI
With my Adobe Acrobat trial long expired, I’m using Microsoft Office Document Imaging (MODI) as my multipage-searchable-digital-book maker. I had never actually done a whole book with MODI, just a trial chapter once to compare to Acrobat. I also cut the binding off the book for reasons I’ll go into later (another new experience). In the end, MODI gave me everything Acrobat did, minus bookmarks and plus ink comments, for the low price of free.
Summary:
- Why I removed the binding
- How to manually remove the binding
- What to do with the loose pages
- Scanning folder preparation
- BookPilot or Microsoft Office Document Scanning?
- Making a multipage TIFF - XnView
- Running OCR
- Making huge files much smaller
- The finished product
I’ll start by saying this book was a 1000+ page behemoth with color 7” x 9” pages. I didn’t have a chance to stop by Kinko’s, and since I wasn’t feeding the pages through an automatic document feeder (glue, frayed edges, and crinkles don’t work well with those), the edges didn’t need to be perfect, so I did it myself.
Why I removed the binding
No, I don’t own an ADF scanner and it was even a new book. But I’ve scanned a couple of books the bound way and large books are a pain, even with a special book scanner. Reason #1 for removing the binding was to ease the (literal) pain in the neck that lifting and turning a seven-pound book 1000 times creates.
Reason #2 is that I plan on following a post on our forum suggesting to take the current chapter with you to class. That way I have a hardcopy, the copy on my tablet, and my notes with me in class so I can follow along in the chapter during the lecture without having to switch back and forth.
That’s nice, you say, but why scan it in then? Because then I can have the entire book with me at all times. I don’t have to remember to bring the hardcopy with me. I don’t even have to know what chapter we’re studying to have the right chapter with me. The hardcopy is only a convenience. Also, if by chance my battery is dead or charging, I have something to read and study (no loss in productivity).
One last bonus from this method is that I get fewer weird looks the less I whip out the digital copy in class. Just a personal perk, since it’s really hard to explain why you would scan in an entire book.
How to manually remove the binding
This was a lot of trial and error. From what worked and what didn’t, maybe you can find your own method.

Idea #1 - The big knife
I started with a huge kitchen knife. Actually two knives, since I had one straight edge and one ridged. The idea came from thinking a book is just paper and sharp blades are usually pretty good at cutting paper, so I tried sawing at the book binding with a sharp blade.
Well, it cut through the cover easily enough. Cardboard it can do. The pages weren’t too fond of the knife idea though and ended up either guiding the blade crooked through the book or stopping the blade altogether. Maybe I’m just weak, but I wasn’t able to get either knife to cut through more than about 10% of what it needed to. Didn’t seem the safest thing to do, anyway. But it did get the cover off, which led to idea #2.
Idea #2 - Removing the sticky stuff from the edge
Sounds like a sane idea to me. With the cardboard part of the spine removed using the knife, the rubbery glue was exposed and begged me to try pulling it off. One tug must have had it laughing because it sure wasn’t going anywhere.
OK, well, what if I heat it up? Maybe then it’d give some. I pulled out the hairdryer and aimed it at the glue for a couple seconds, tried my fingernail at it again, and all I got was melted glue on my nail. Technically, I guess this would work eventually but it would take way more time than I wanted to give it (I’m talking hours more time).
Idea #3 - Pulling the pages off the binding in chunks
This idea actually worked best for me. I did this before with my physics book because it was very, very used and the pages practically fell off the binding. But the book in this situation was new and not very, very used so I had no idea if the pages would come off so easily. Well, once the first couple pages were off and started, everything came out easily by peeling 20-30 pages at a time.
Unfortunately, my big knife idea made cuts about a quarter inch from the glue so that when I tried to pull/peel the pages, all that would happen is it would instead rip at that little cut at the top and bottom. I had to pull from the <0.25” strip that was still securely attached to the glue.
Idea #4 - Scissors
Since I didn’t want pages with cuts going through the edge and strips hanging off from my knife experiment, I finally broke down and pulled out the scissors. I cut about 20 pages at a time along the knife line and then pulled the remaining quarter inch strip out as outlined in idea #3 to make room for the scissors.
Idea #5 - Kinko’s/Copy Shop
They cut bindings off books. I think that would have been easier, but I was lazy and didn’t feel like driving over there.
What to do with the loose pages
At first I was afraid the pages from this book wouldn’t fit in a standard three hole binder. It’s shorter than a normal piece of paper and the holes went all the way to the edge with only three millimeters to spare on each side, so I started exploring other options.
I first thought to try for a binder with two rings instead of three since I had a two-hole punch (meant for putting holes at the top of the page). No luck at my local Office Depot. Then I tried to find one of those presentation covers or binders with the metal holders/rings on top, again with no luck.
I didn’t feel like searching store to store for the perfect binder/presentation cover, so instead I picked up some of those two-inch loose binder rings that come in a little box.
When I looked at the pages again, I decided 3 mm was enough clearance for me and ended up using a three-hole punch and putting the pages in a normal binder.
My suggestion is to get Kinko’s to punch the holes since it takes a while and that way you’ll get straight holes and the pages will be in line. One note of caution: don’t put the holes in the paper until after the page is scanned. If you punch the holes first, you’ll have black circles on the scanned pages and that would get annoying.
Scanning folder preparation
To make assembling the book later in the process easier, it’s best to make a separate folder for every section of the book. If you want to separate it by chapter, make one for every chapter. If you just want Part I and Part II, go for it. Because MODI doesn’t have any way of organizing a multipage file (like PDFs with bookmarks) other than sequential pages, I went ahead and prepared for one multipage file per chapter/appendix.
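If you would rather not create every folder by hand, they can be made in one go. This is only a minimal Python sketch, not part of the workflow described here; the section names are made-up placeholders, and the numeric prefix is just an assumption to keep the folders sorted in book order.

```python
import tempfile
from pathlib import Path


def make_section_folders(root, sections):
    """Create one numbered subfolder per book section and return the paths."""
    root = Path(root)
    created = []
    for index, name in enumerate(sections, start=1):
        # The "01_", "02_" prefix keeps folders in book order when sorted by name.
        folder = root / f"{index:02d}_{name}"
        folder.mkdir(parents=True, exist_ok=True)
        created.append(folder)
    return created


if __name__ == "__main__":
    # Hypothetical section names; a temporary directory stands in for the
    # real scan folder so this demo leaves nothing behind.
    with tempfile.TemporaryDirectory() as scan_root:
        sections = ["front_matter", "chapter_01", "appendix_a"]
        for folder in make_section_folders(scan_root, sections):
            print(folder.name)
```

Pointing the function at your real scan folder instead of the temporary directory would leave the folders in place for the scanner software to save into.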
SPACE WARNING If you scan the entire book, create the multipage TIFF (Tagged Image File Format) files, and then go back and delete/remove the original scans, you need to have a lot of free hard drive space. I topped out at 14 GB. TIFF files are big, but don’t worry because later you’ll be shrinking the files down to under 120 MB for a 1000 page book. Just know how much space you have to work with during the process.
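For a rough idea of where that space goes, here is a back-of-the-envelope sketch in Python. It assumes uncompressed 24-bit color scans (3 bytes per pixel), which the post does not state, and it assumes the peak is reached when both the single-page scans and the multipage copies are on disk at once.

```python
def uncompressed_scan_bytes(width_in, height_in, dpi, bytes_per_pixel=3):
    """Estimate the raw size of one scanned page, ignoring headers and compression."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px * bytes_per_pixel


if __name__ == "__main__":
    per_page = uncompressed_scan_bytes(7, 9, 200)  # 7" x 9" page at 200 dpi
    book = per_page * 1000                         # 1000-page book
    peak = book * 2                                # single pages plus multipage copies
    print(f"per page: {per_page / 1e6:.2f} MB")
    print(f"whole book: {book / 1e9:.2f} GB")
    print(f"peak with both copies: {peak / 1e9:.2f} GB")
```

Under those assumptions the peak comes out at about 15 GB, which is in the same ballpark as the 14 GB mentioned above. Real files will differ depending on compression and color depth.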
BookPilot or Microsoft Office Document Scanning?
MODI gives you the option of importing a scan from Microsoft Office Document Scanning and making a multipage TIFF on the spot. I experimented with this, but the scans came out very light and I was never able to find a way to fix the scan quality.
For those using ADF scanners, however, I recommend the Microsoft scanner as it handles ADF scanners well.
After playing with the settings for about an hour to no avail, I gave up and went back to the trusty BookPilot that came with my OpticBook 3600. I went into settings and moved the brightness down slightly and contrast up slightly and the color scans became really crisp and saturated. I set the resolution at 200 dpi since TIFF files were already huge and I didn’t want them any bigger by setting it at 300 dpi. After navigating to the correct folder to put the scanned files and changing the file type to .tiff, I was ready to scan.
For those wondering, I timed myself at 11-14 seconds a scan, averaging 12.5 seconds a page. To estimate the amount of time it will take to scan a book with this particular scanner at 200 dpi, plug it into this equation:
(# pages) x (15 seconds) / (3600 seconds per hour) = (# hours to scan the book)
ex. 1000 pages = 4.17 hours
I added a couple of seconds to the 12.5-second average (rounding up to 15) to make up for little things that slow you down, such as changing folders and forgetting which side of the page you were on ^_^. This doesn’t include bathroom breaks.
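The same estimate as a tiny Python function, in case you want to plug in your own numbers. The 15-second default is the padded figure from above; the function name is just an illustrative choice, and your scanner and resolution will likely give a different per-page time.

```python
def scan_hours(pages, seconds_per_page=15):
    """Estimate hours needed to scan a book, one page per scan."""
    return pages * seconds_per_page / 3600


if __name__ == "__main__":
    print(round(scan_hours(1000), 2))        # the 1000-page example above
    print(round(scan_hours(1000, 12.5), 2))  # using the unpadded average instead
```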
Five episodes of Desperate Housewives (via iTunes…it’s really not a bad show) later, the book was in the computer.
Making a multipage TIFF - XnView
Because I used BookPilot instead of Microsoft’s scanner, instead of having one multipage TIFF file, I had a folder full of single page TIFF files. That’s not very useful or desirable. The whole reason for using the TIFF format is so I can have one chapter in a single file.
Now, you can do this the hard way and use MODI to import the chapter page by page. As far as I know, MODI doesn’t offer an easy way to combine multiple files into one file. Instead, I decided to go for a third-party software solution.
XnView is a free image program and as described on Pricelessware.org:
XnView is a utility for viewing and converting graphic files. Features : Import about 360 graphic file formats. Export about 40 graphic file formats. Multipage TIFF, Animated GIF, Animated ICO support. Resize, Copy/Cut/Crop Adjust brightness, contrast… Modify number of colors, Apply filters (blur, average, emboss, …), Apply effects (lens, wave, …), Fullscreen mode, Slide show, Picture browser, Batch convert, Thumbnail create, Screen capture, Contact Sheet create, Multi-page file create (TIFF, DCX, LDF), TWAIN support (Windows only), Print support (Windows only), Drag and Drop support (Windows only), 36 languages support (Windows only), And many many other things.
This program did the trick easier than I expected. When the program opens, completely ignore whatever’s shown on the screen and just hop up to the “Tools” menu. Go down to “Create Multi-page…” and click.
From here it’s simple. Click “Add,” add your files, name the new file you want to create and where you want it to go, and click “Create.”

100 pages takes about a minute or two to process so don’t worry if nothing happens at first.
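For anyone who would rather script this step, here is a rough Python sketch. It is an alternative to XnView, not what was used above. It sorts the single-page file names in natural order (so page 2 comes before page 10) and then uses the third-party Pillow library, assumed to be installed, to write them into one multipage TIFF. The file names and folder layout are hypothetical.

```python
import re


def natural_key(name):
    """Sort key that compares digit runs as numbers: page2 sorts before page10."""
    parts = re.split(r"(\d+)", str(name))
    return [int(part) if part.isdigit() else part.lower() for part in parts]


def combine_to_multipage_tiff(page_paths, out_path):
    """Write the given single-page images, in order, into one multipage TIFF."""
    from PIL import Image  # third-party (Pillow); imported here so sorting works without it

    if not page_paths:
        raise ValueError("no pages to combine")
    pages = [Image.open(path) for path in page_paths]
    try:
        # save_all plus append_images is Pillow's way of writing several frames.
        pages[0].save(out_path, save_all=True, append_images=pages[1:])
    finally:
        for page in pages:
            page.close()


if __name__ == "__main__":
    names = ["page10.tif", "page2.tif", "page1.tif"]  # hypothetical scan names
    print(sorted(names, key=natural_key))
    # With real scans on disk (hypothetical paths):
    # combine_to_multipage_tiff(sorted(paths, key=natural_key), "chapter_01.tif")
```

Whether you use this or XnView, it is worth flipping through the result once to confirm the pages landed in the right order before running OCR.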
Running OCR (optical character recognition)
You should now have your humongous multipage TIFF files and should be ready to make them searchable. MODI really shines on this part and it’s insanely easy to do.

The default settings should work for you so try that first. All you should have to worry about is if you want to run it on one page or all the pages.
Once OCR is performed, the only difference you should notice is a little OCR icon on the thumbnails and that now you can click and drag across the page to select text. The page itself looks exactly the same.
For a 100 page file, mine took around five to ten minutes to run. The good news is you can run it on as many files as you’d like at one time (or as many as your computer can handle). I had five or so going at once and it saved some time since I could then walk away from the computer for a while and let it do its thing.
Making huge files much smaller
It’s not exactly the best situation to have 5 GB occupied by one book. If you leave the book in TIFF format, that’s probably about what you have at this point in the process. In comes the MDI (Microsoft Document Imaging Format).
Simply clicking “File—>Save as…” and changing the file format to MDI, then clicking “Save” is all you need to do. Your file is instantly MUCH smaller.
After making sure the files are working and that they’re how you want them, feel free to toss the huge TIFF file. I personally keep the individual scans on an external hard drive just in case something happened I didn’t think of until the end of the semester, but you can toss those too if you’d like.
The finished product
You now should have your book scanned, searchable, and organized. MODI allows for easy mark-up with Journal-like ink, text boxes, images, and highlighting. You can also select text and copy the text into Word or whatever you like, or export the entire book to Word if that’s what works best for you.

The program has a very customizable view and the book is easy to read. Don’t miss the “Reading View,” a nice full page viewer for those who like to maximize screen space.
More MODI and scanning articles can be found under the scanning category of our site.
Enjoy!
If I have books next semester, I might just scan them in!
January 12th, 2006 at 1:40 am
Wow! What a great post Tracy! THANKS!
January 12th, 2006 at 8:42 am
This updated process, then, is taken to be a replacement of the Adobe/PDF Annotator method described last year? Is it actually better or just cheaper? The OCR step has me puzzled. When I run OCR from other programs on tiffs that contain lots of images, graphs, and mathematics I get lots of weird junk rendering the whole thing unusable. Obviously this is somehow not an issue with MODI. I can’t stand having to correct and edit OCR’d docs. I think it was mentioned a while back that you just overlay an imperfect OCR’d doc with the MDI or PDF in order to have something searchable but this seems kludgy and memory intensive.
At any rate, Tracy, do you do all this on a desktop or all on your M200? I would think that a powerful desktop would make this task more bearable since it has to be done in the home study anyway rather than in a mobile situation.
Also, for someone who has a library of books I would have to arrive at a better solution than a three ring binder for shelving the thing, given that I may end up with dozens of such books. There are ways of bradding or clamping 2 or 3 inches of loose pages such as what is used in catalogs and ponderous tomes, but I will have to research. I have found many useful things at library supply houses like http://www.highsmith.com for this. The idea is to rebind the book in a way that allows storage just like it was a regular book, but can be unbound in a jiffy.
My first project may just be my 1300 page calculus book that weighs in around 15 pounds and costs 140 dollars. This takes serious planning and commitment to pull off. I am getting squeamish thinking about it!
January 12th, 2006 at 9:21 am
Great article!
I work off a Sony Vaio TR series (which is really small to begin with) and while not exactly a tablet, is small enough for me to carry everywhere. My bother has always been all the textbooks that I’ve had to carry around in addition and have always wanted textbook publishers to make digital editions, but they’re so rare to find.
I’ve tried scanning textbooks myself, but have found that often the time spent doing it (In your case 4 hours) was sometimes just too much and not worth the hassle. If only there were an ADF that could flip the double sided pages of books and make it much more efficient during scanning. Then, also, you lose the resale value of your book by having to rip off the binding making it completely unsellable.
I’m really disappointed in textbook publishers for not embracing digital technology to offer digital editions, especially to college students who often have to have two or more texts per class. But they likely make more money off a book than a CD.
January 12th, 2006 at 11:32 am
Will - MODI shines in places Acrobat doesn’t in that it has great mark-up and tablet support. Acrobat is great for those who want to bookmark and insert links into their PDFs. Personally, I can live without bookmarks to save $130, or however much Acrobat is.
When MODI runs OCR, it puts a text underlay (is that a word?) on the page, so yes, you don’t see the actual text but you can select it. I think this is the only way to go with OCR at the moment for textbooks simply because of the size of the project. If it were pure text and image inserts, it would bug the daylights out of me for 0.1% of the words to be wrong or missing. It would be hard to read. However, if only 99.9% of the book is searchable, I can definitely live with that. The few paragraphs and sentences I’ll need to copy/paste into another program can easily be fixed if need be, but the 99.9% statistic gives me pretty good odds that the one sentence out of the whole book that I want at the moment will be correct, or only have one error. I can live with that too.
It does take up extra space, but I couldn’t see using the textbooks without looking at the original image. For the time being, it’s the best solution I have.
For book storage without using a binder, looking into a rebinding service is a good idea. Also, putting some of those loose rings through the book keeps it at its original size, with just the rings holding it together.
January 12th, 2006 at 12:06 pm
Hi Tracy,
To reiterate everyone else, thanks for the great post–this is super helpful.
I have a few questions about the utility of MODI.
Is the mdi format as ubiquitous as pdf? More important, will it be? One advantage of pdf is that it is a standard and nearly everybody uses it. This makes it easy, for example, to read my books on my friend’s computer, because I don’t have to convert the file format first. Another benefit is that I can be pretty confident that, if I put in all this effort converting my books to pdfs, it won’t be a waste of time (because everybody won’t be switching over to something much better anytime soon). Or, if I do have to switch, everybody else will be in the same boat, which means that programs that do this will be readily available. Since I have MODI on my tablet with Office 2003 and not on my lab computer with Office 2000, I believe it must be fairly new, so I was wondering how confident you are in MODI being a good long term solution (not that adobe is the world’s greatest company, but I can feel confident that pdfs are here to stay for a good chunk of my lifetime anyway). I guess that keeping the TIFFs also is one way to take care of this, but as you’ve mentioned, they are huge. Jpegs are much smaller, but I’ve heard they aren’t very good at storing text. Does anyone know if this is true or why (to my eye at least, they don’t look too bad)?
You also mentioned that MODI had better markup features than Acrobat. Does this extend to the highlighting features that you and Eric Mack sort of figured out on Acrobat? I agree with you both that it would be extremely useful to be able to pull out all of the annotations and annotated text and figures to a separate file, or at least be able to look at them all at once (even better if you could sort by highlighter color, etc.). Have you tried this, or is it still a pipe dream?
January 12th, 2006 at 6:11 pm
In response to Chris - but might be of interest to others. The Visioneer Strobe XP 450 (Xerox and Fujitsu also have something similar) is ideal for this kind of operation. It’s ADF only, and the interface lets you scan a batch of pages (up to 25 or so) and then flip the batch and scan the other side, after which it delivers the whole lot with the sides in the right order. Mine came with Paperport 9, which lets you make multipage TIFFS/PDF/JPGs. Another older, but still useful, program is Imaging Pro for Windows, which comes with a Flow batch utility that also does page collation. I use the Strobe mainly to feed my paperless filing cabinet, but after reading Tracy’s article I’m planning to turn a couple of Chinese and Japanese textbooks into MODI format Tablet fodder - the MODI annotation system will be just right for practising writing characters.
January 13th, 2006 at 5:30 am
Tracy,
A couple of questions, because I’ve scanned several books lately in preparation for teaching my class at UAB:
My download of Office 12 got rid of MODI–did yours? If not, why?
I thought you had Adobe Acrobat, full version, why not just scan it immediately to there in the first place?
The reason I ask . . . In my scanning of books, I use 300 dpi and black/white setting with Adobe. The scans aren’t always perfect, and I’d like something that did a better job. Without MODI, I can’t compare. What do you think?
January 16th, 2006 at 4:20 pm
Tracy: have I got a tip for you! And anyone else struggling with removing book bindings . . .
1) Make friends with a woodworker (or otherwise get access to a woodworking shop).
2) Ask your friendly woodworker to cut off the edge of the book where the binding is for you. (Note: for best results, I recommend using a bandsaw for this, but a scrollsaw, table saw, or even a jig saw will work. If you do this yourself, PLEASE be sure to ask how to SAFELY operate the machine.)
3) You’re done; elapsed time: oh, 20s per book; results (using a cheapo bandsaw): very clean.
I’ve done this myself with dozens of magazines (with consistently great results) and it should work just fine (maybe a little slower) for books also.
January 18th, 2006 at 3:23 pm
Philip - I actually don’t have the Office 12 beta, so I wouldn’t know. I also didn’t have the full version of Acrobat 7 (until today, so expect more Acrobat articles!! I did have Acrobat 5 for a while though, but it wasn’t quite the same experience).
I also don’t scan directly to Acrobat because I like how smooth and easy BookPilot is with 500+ pages. For 1-10 pages, I can easily see using Acrobat, but I don’t even have to touch the mouse/pen to scan with BookPilot.
January 19th, 2006 at 12:54 am
Oh also, my version of Office XP has MODI, so it was at least out for the last two releases. I have a feeling Microsoft would not make their own file format obsolete, but I’m going to experiment with creating a PDF from an MDI just in case.
January 19th, 2006 at 12:56 am
Thank you Tracy for this excellent article!
February 14th, 2006 at 6:24 am
Thanks Tracy for the tips and article.
I’m new to Tablet PC but I’ve started to scan my course docs the way you describe. Works fine for me. I’m using MODS to scan from a CanoScan flatbed scanner. A little slow, but on my first try I did 125 pages in about 50 minutes…
Great site! I come every day since I got my Tablet (2 weeks ago).
Keep up the good work!
Richard
February 24th, 2006 at 7:59 pm
This was working beautifully until I tried to make the file smaller by switching from TIFF to MDI - then I found out (via much time on the Microsoft Support Site shudder) that the MDI format is only supported by MODI in Office 2003, not MODI in Office XP Pro (or 2000 for that matter) - which is what I have. Any work arounds/suggestions?
Thanks!
May 24th, 2006 at 12:13 am
Having just previously posted about a problem with not having the MDI extension in Office XP I wanted to report that I did notice something peculiar. After using MODI to OCR scan the pages (which can be found on the file menu, there is no icon in my version) and resaving, a 23,663 KB document becomes only 4,005 KB. This is small enough for me - just thought you might like to know!
May 24th, 2006 at 12:52 am
Thanks for posting information about XnView. This allows me to scan directly to TIFFs at work with much greater ease. It used to be very difficult to work with TIFFs, but now it takes mere moments to get documents in their final form (OCR’d MDI, or PDF).
February 22nd, 2007 at 7:19 pm
I’m also confused about the OCR part: what program are you using to do the OCR, and how good is the output?
April 6th, 2008 at 12:34 pm
Hi, You can simply take any size book, either soft bound or hard bound, and have them cut the binding to 1 mm or so. I just did it today on a 500 page soft cover book. It cost me $1.75. I am planning to scan that as PDF using a Fujitsu scanner which can scan 15 pages a minute. You can bind it back with spiral or glue; it would cost another $3 for spiral. Hope this is helpful for folks who want to cut the binding.
April 23rd, 2008 at 2:47 pm
sadasivam eniasivam, you mentioned having “them” cut the binding. Who is “them”? I’m interested in getting this done.
August 9th, 2008 at 7:32 pm