Wednesday, October 16, 2024

Converting PDFs to Editable Text

Or something like that.  We have about 58 (could be 57) PDF files of addresses given at Carnegie Hall in the 19th and 20th NYSEC seasons.  I have long wanted to convert them to something more readable and/or editable.  As the texts currently exist, they appear to be carbon copies of typed scripts.  The paper is yellowed with age, and it also seems to be thin onionskin.  The carbon paper had, no doubt, already been used several times, hence the slightly blurred type.  Some scans show folded pages that obscure the text, and many are tilted, to a greater or lesser degree, from the horizontal.  To be honest (if also a tad biased), these texts are just not comfortable to read.

I tried to convert the PDFs to Word files, so that they might be a bit more readable.  This was the result:


Not only was this even harder to read, it would also have been unusable if we wanted to consider print publication.

So I tried another method to create readable text:  dictation.  Reading aloud to MS Word and speaking the punctuation and formatting commands clearly, I managed to dictate an entire address into Word.  This was much better.  It took a while to figure out the command "lingo" (Word wants to hear "new line" rather than "paragraph").  The process was somewhat tiring, and some proofreading and reformatting would still be needed, but the outcome was readable.

Then my grandson asked, "Why don't you just retype it?"  So I tried that, too.  Although I'm not a trained typist, I have gained some speed over the years, and I managed to retype about 4 pages in an hour.  To my mind, retyping was the easier and more productive (less tiring) way to get a readable text, one more attractive to the contemporary reader.  Even more important, retyping into a word processing program such as Word allows for easier copyediting and reformatting for print and online publication.

With that bit of "research" done, the project had to sit on the back burner while other projects took priority.  Eventually I was asked to talk about one of those projects at a Sunday gathering of the Ethical Society of Austin.  I presented "The Big Dig:  Delving into the History of Thought in Ethical Culture."  Some of my talk was about online repositories of digital books and journals, but most of it was about the Bibliography of Ethical Culture.  I tossed out tidbits of (to me) fascinating information that I had been discovering as I worked on the Bibliography, and I took every opportunity to highlight points at which many hands could make the work light.  The good news?  Three new partners in the work!

For now, the exciting news is that one of those partners, an excellent typist, will help pull the Carnegie Hall Project off the back burner and get to work on giving us new typescripts to work with.  Stay tuned.
