Linguist Forum

Specializations => Computational Linguistics => Topic started by: mojobadshah on July 25, 2019, 07:34:53 AM

Title: Best OCR scan app for IPA?
Post by: mojobadshah on July 25, 2019, 07:34:53 AM
Where can I find "the" best OCR scan app or coding script for scanning morpholinguistic texts (etymologies)?  What apps are the universities using to OCR scan linguistic texts/lexicographies? Or has anyone sorted out which IPA symbols, punctuation, diacritics, etc. systematically scan as errors?
Title: Re: Best OCR scan app for IPA?
Post by: panini on July 25, 2019, 08:19:10 AM
None at all. OCR technologies on the market operate on the basis of a "language", and no language uses IPA. This is annoying, and I would be happy if someone knew the technology well enough to create a Unicode-scoped OCR routine.
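
A rough sketch of what a "Unicode-scoped" check could look like, applied to OCR output after the fact rather than inside the recognizer itself. The code point ranges and the extra-letter set are assumptions chosen for illustration, not a complete IPA inventory, and the function names are hypothetical.

```python
import unicodedata

# Assumed scope: Unicode blocks that hold most IPA letters and diacritics.
IPA_RANGES = [
    (0x0250, 0x02AF),  # IPA Extensions
    (0x02B0, 0x02FF),  # Spacing Modifier Letters (stress and length marks)
    (0x0300, 0x036F),  # Combining Diacritical Marks
    (0x1D00, 0x1DBF),  # Phonetic Extensions and Phonetic Extensions Supplement
]
# IPA letters that live outside those blocks (illustrative, not exhaustive).
EXTRA_LETTERS = set("æçðøħŋœβθχ")
# Plain lowercase Latin letters plus a few delimiters used around transcriptions.
ALLOWED_ASCII = set("abcdefghijklmnopqrstuvwxyz []/().")


def in_scope(ch):
    """Return True if the character falls inside the assumed IPA scope."""
    if ch in ALLOWED_ASCII or ch in EXTRA_LETTERS:
        return True
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in IPA_RANGES)


def out_of_scope(text):
    """List (index, character, Unicode name) for each character outside the scope."""
    return [
        (i, ch, unicodedata.name(ch, "UNKNOWN"))
        for i, ch in enumerate(text)
        if not in_scope(ch)
    ]
```

Running out_of_scope over OCR output flags digits, capitals, and other characters that should not occur in a transcription. Precomposed letters such as "ã" would also be flagged unless the text is first decomposed with unicodedata.normalize("NFD", text).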
Title: Re: Best OCR scan app for IPA?
Post by: Daniel on July 25, 2019, 10:24:05 AM
I use OCR every day for a variety of languages, but never IPA for the reasons panini listed. I wouldn't trust it anyway, given the slight contrasts between symbols, which then vary with fonts. (It's hard enough for me to figure out what some symbols are supposed to be, especially when non-standard symbols are being used.)

If you want to take on this project seriously, though, you could look into customizing OCR software:
In short, I don't know how reliable this would be, given that the details are important, and there is no easy way to verify that the results are correct (as would be the case with, for example, reading English paragraphs and noting either spelling errors or words that don't make sense in context; unless you know all of these transcriptions already, you wouldn't know if they're right or wrong). In this case I'd advise typing them all out by hand. It might even take more time to correct imperfect OCR than to type the text! My guess is also that very few data sets are long enough that solving IPA OCR would actually save you time over just typing them out by hand.
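
If a few pages are typed out by hand anyway, they can double as a check on OCR output for the same pages, which also gets at the original question of which symbols systematically scan as errors. A minimal sketch, assuming a hand-typed string and the OCR string for the same passage are already available; the function name is hypothetical, and the alignment comes from Python's standard difflib rather than any OCR-specific tool.

```python
from collections import Counter
from difflib import SequenceMatcher


def confusion_counts(truth, ocr):
    """Count (expected, scanned) mismatches between hand-typed text and OCR output."""
    counts = Counter()
    # autojunk=False keeps frequent characters from being treated as noise.
    matcher = SequenceMatcher(None, truth, ocr, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # An empty string on either side marks an insertion or a deletion.
            counts[(truth[i1:i2], ocr[j1:j2])] += 1
    return counts
```

For example, confusion_counts("ʃip", "fip") records one instance of "ʃ" being read as "f". Adding up the counters over many passages and sorting with most_common() would show which substitutions recur, though the alignment is only a heuristic and adjacent errors can be merged into a single multi-character pair.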