Tim2000
Status: Dr. Seuss
Posts: 4
« on: December 27, 2011, 08:05:43 PM »
I downloaded several MOBI books from openlibrary.org. I found lots of misspellings, strange characters, and odd formatting. I am able to read the books, but it gets annoying having to interpret what the author meant to say. For example, instead of "and then the cow walks into the barn" I see "and then the cow /*s into the b .a..*".
Could this be a Kindle 4 problem (I have the latest $79 Kindle)?
Could this be a problem with Open Library when they converted the source material to digital?
Could this be a problem with the MOBI format? I also tried sending the books to my Amazon account and then to my Kindle in AZW format, and I still get the funky characters and odd formatting.
I tried PDF, and it looks fine, but reading a PDF on a Kindle for any length of time is a pain: you have to scroll to see the rest of the page, or zoom out until the text is very small and hard to read.
The same issue occurs when I download MOBI books from archive.org.
Has anyone encountered this issue? If so, how did you solve it?
NOTE: I never have this problem when I download books from Project Gutenberg.
« Last Edit: December 27, 2011, 08:28:41 PM by Tim2000 »
jmiked
« Reply #1 on: December 27, 2011, 08:33:43 PM »
It's a function of the Optical Character Recognition used to convert from a scanned image to an ebook format.
It's not a MOBI problem or a Kindle problem; it's a conversion problem from the OCR software. I see it in ePub books as well. I rarely see a book that doesn't have it to some degree, including books from the Big Six publishers.
Project Gutenberg books do have these problems too, but they are a bit more carefully edited than some.
Mike
"The most exciting phrase to hear in science, the one that heralds new discoveries, is not 'Eureka!' (I've found it!), but 'That's funny...'" - Isaac Asimov
Tim2000
« Reply #2 on: December 27, 2011, 08:45:15 PM »
This makes sense now; it's what I suspected. It appears that Project Gutenberg had people actually edit the books after they were scanned. That must be a laborious project, and maybe it's why their e-book library is so small compared to archive.org and openlibrary.org. Those libraries appear to have just scanned the books en masse, with little or no editing. I guess I can't complain; I just have to accept it as a technology limitation. The books are readable to some degree, and it's better to have them as e-books than not at all. Thanks for the quick reply.
PhillyGuy
Status: Madeleine L'Engle
Wynnewood Pennsylvania USA
Posts: 69
« Reply #3 on: December 27, 2011, 09:17:16 PM »
Quote from: Tim2000
It appears that with Project Gutenberg, they had people actually edit the books after they were scanned. A laborious project I am sure . . .

To volunteer, go here: http://www.pgdp.net/c/ I have done a little of the proofreading, although I found it tedious. You can choose from the different books they are working on, so hopefully you'll find something of interest. More experienced and skilled volunteers double- and triple-check what beginners do.
jmiked
« Reply #4 on: December 27, 2011, 09:19:01 PM »
Quote from: Tim2000
I guess I cannot complain, just have to accept it as a technology limitation. The books are readable to some degree, better to have them as an e-book than not at all.

I don't see it as a technology problem; I see it as a problem with people not caring whether they produce a quality product. I'd rather not have the book at all than have one riddled with errors. But that's just me.
Mike
kindlegrl81
« Reply #5 on: December 27, 2011, 10:26:38 PM »
If the PDF downloads correctly, I would use calibre to convert the PDF to .mobi and then put it on the Kindle. Calibre is free, and I use it to convert all my PDFs, with great success so far.
Ann in Arlington
Inmate # 65
Global Moderator
Status: Shakespeare
Arlington, VA
Posts: 32249
Go Nats!
« Reply #6 on: December 28, 2011, 06:16:44 AM »
Definitely an OCR issue, and one of the reasons I have gotten pretty selective about where I get books. . . even if they're free. Project Gutenberg has a quality product. manyreads and Feedbooks are usually pretty good, but you'll get the odd one that's not great.
MOST of Amazon's are in decent shape, but not all. . . If you find one that's so bad it's a pain to read, report it to Amazon; they'll take it down and make the publisher fix it before they make it available again. This is one reason to use the sample feature and/or at least open a book as soon as you buy it, since you have 7 days to return it.
With PDFs, you might get conversion problems as well, even if you do the conversion yourself, if the PDF is basically just a scan.
Ann Von Hagel Arlington, VA 
Tim2000
« Reply #7 on: December 28, 2011, 08:06:58 AM »
Quote from: PhillyGuy
To volunteer, go here: http://www.pgdp.net/c/ I have done a little of the proofreading, although I found it tedious. You can select from different books they are working on to hopefully find something of interest. More experienced and skilled volunteers double and triple check what beginners do.

I had no idea this was being done by volunteers. With regard to the error-ridden MOBIs and such, all I can say is: 1. They are volunteers; and 2. It's free! What more can you expect from something that doesn't cost anything? I appreciate the valiant efforts of everyone who proofreads these books.
BTackitt
« Reply #8 on: December 28, 2011, 08:16:25 AM »
The volunteers are at Project Gutenberg, not Open Library. They are WHY PG is a better place to get books. Each book goes through (I think) 3 layers of proofreaders at PG.
Ann in Arlington
« Reply #9 on: December 28, 2011, 09:29:29 AM »
What Bev said. . . Open Library is, I think, scanned and uploaded with, at best, a cursory inspection. . . probably just to make sure the pages are there, but no proofing. At Project Gutenberg, people upload things (sometimes typed in rather than scanned) and then they're proofed several times. It's been around for a LOOONNNNGGGG time. . . since around 1971, when eBook readers, tablets, even laptops and other 'personal computers' were still the stuff of Star Trek. The first thing uploaded was the Declaration of Independence, and the rest, as they say, is history.
Bigal-sa
« Reply #10 on: December 29, 2011, 11:47:29 PM »
All I can say is that it's a real pain in the butt to proofread an OCRed book. Apart from the obvious errors, you have to try to make the formatting follow the paper copy as well. And on top of everything, you can't (or rather shouldn't) fix the grammatical and spelling errors either!
It's an incredibly time-consuming process.
HappyGuy
« Reply #11 on: December 30, 2011, 06:19:18 AM »
I did some volunteer work with Distributed Proofreaders, which uploads to Project Gutenberg. It was fun! And yes, material went through several iterations before being sent on to PG. We used an excellent program, ABBYY FineReader, which lets you see the OCR'd text and the scanned image at the same time, so each word was compared to the original. OCR is nowhere close to perfect yet. I just wish some of the big publishing houses would take as much care with their eBooks.
"From the lips of infants and children you have ordained praise..." Psalms 8:2
wdeen
Status: Madeleine L'Engle
Florida
Posts: 61
« Reply #12 on: December 30, 2011, 08:17:53 AM »
It's a common problem with the format when converting for the reading device. PDF is a pain; it's like reading a report instead of enjoying a book. PDF is my last resort if I really want to read something. You shouldn't experience this problem when downloading files formatted for Kindle, especially from Amazon.