![]() ![]() Slow : Slow OCR performance and provide best OCR accuracy.Fast : provides moderate OCR processing speed and accuracy.Rapid : high speed OCR performance and provide normal OCR accuracy.You can set the different performance level to the OCRProcessor using “Performance” enumeration. ![]() Tesseract works best with text when at least 300 dots per inch (DPI) are used, so it is beneficial to resize images. Asprise Java C VB.NET Python OCR library offers a royalty-free API that converts images (in formats like JPEG, PNG, TIFF, PDF, etc.) into editable document formats Word, XML, searchable PDF, etc.) by extracting text and barcode information. In addition, rotated images, and skewed images can also affect the accuracy and readability of the OCR process.This is especially useful in low-resolution scans. It ensures that optical character recognition works on the highest-quality image, by improving the OCR accuracy. Use CCITT Group 4 or JBIG2 (lossless) compression for monochrome images.Use (zip) lossless compression for color or gray-scale images.You can improve the accuracy of the OCR process by choosing the correct compression method when converting scanned paper to a TIFF image and then to a PDF document. NET PDF supports OCR by using the Tesseract open-source engine. ( "Save changes in PDF document.Tesseract is an optical character recognition engine, one of the most accurate OCR engines at present. ' add recognized OCR page to the PDF document ' for each image in image collection For Each image As Vintasoft.Imaging. ' specify that image must be place over textĭocumentBuilder.PageCreationMode = .PdfPageCreationMode.ImageOverText ' specify that the best image compression must be calculated automaticallyĭocumentBuilder.ImageCompression = .Auto ' create a PDF document builder Dim documentBuilder As New. ' create the Tesseract OCR engine Using tesseractOcr As New. PdfDocument(searchablePdfFilename, .Pdf_14) Refer the below code snippet for the same. You can perform OCR on PDF document with the help of OCRProcessor Class. ' create a searchable PDF document Using document As New . PDF supports OCR only in Windows Forms, WPF, ASP.NET and ASP.NET MVC platforms. ' add pages from image-only PDF document into image collection ' create an image collection Using images As New Vintasoft.Imaging. Resolution, searchablePdfFilename As String) OcrLanguage, imageOnlyPdfFilename As String, ocrResolution As Vintasoft.Imaging. Public Shared Sub ConvertImageOnlyPdfToSearchablePdf(ocrLanguage As. ''' A filename of destination searchable PDF file. ''' A filename of source image-only PDF file. ''' ''' Converts an image-only PDF document to a searchable PDF document. clear and dispose images in image collection add recognized OCR page to the PDF document OcrPage page = tesseractOcr.Recognize(image) for each image in image collection foreach (Vintasoft.Imaging. specify that image must be place over textĭocumentBuilder.PageCreationMode = .PdfPageCreationMode.ImageOverText specify that the best image compression must be calculated automaticallyĭocumentBuilder.ImageCompression = .Auto create the Tesseract OCR engine using (. PdfDocument(searchablePdfFilename, .Pdf_14)) create a searchable PDF document using (. add pages from image-only PDF document into image collection ImageCollection images = new Vintasoft.Imaging. create an image collection using (Vintasoft.Imaging. ByteScout PDF Extractor SDK source code samples (VB.NET) - GitHub - bytescout/pdf-extractor-sdk-samples-vb-net: ByteScout PDF Extractor SDK source code. public static void ConvertImageOnlyPdfToSearchablePdf( / A filename of destination searchable PDF file. / A filename of source image-only PDF file. / /// Converts an image-only PDF document to a searchable PDF document. ![]()
0 Comments
Leave a Reply. |
Details
AuthorWrite something about yourself. No need to be fancy, just an overview. ArchivesCategories |