Literature Survey on Multilingual OCR-Based PDF Search System
Sannapu reddy shanmukha reddy, Shaik saleef, Ch Sukumar
Text segmentation, Text Extraction, image-based, Document processing, OCR
In the digital era, a vast amount of information is stored as PDF documents, many of which contain scanned images rather than machine-readable text. Traditional search mechanisms cannot retrieve content from such image-based PDFs, especially when the documents are written in multiple languages. To address this limitation, this project proposes a Multilingual OCR-Based PDF Search System that enables efficient text search across PDFs containing Unicode text, scanned images, or both, in multiple languages.
The proposed system integrates Optical Character Recognition (OCR) technology to extract textual content from image-based PDF files. It supports multiple languages, including English and selected Indian languages such as Telugu and Urdu, by leveraging language-specific OCR models. The extracted text is processed, indexed, and stored to enable fast and accurate keyword-based search across large document collections.
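The indexing and keyword-search step described above can be illustrated with a minimal sketch. It assumes the OCR stage has already produced plain text per page; the function names (`tokenize`, `build_index`, `search`), the `(doc_id, page_no)` keys, and the sample pages are hypothetical and not taken from the paper. Tokens are split on whitespace rather than with a word regex, because combining vowel signs in scripts such as Telugu would otherwise break words apart.

```python
import unicodedata
from collections import defaultdict


def tokenize(text):
    """Split on whitespace, normalise to NFC, casefold, and trim punctuation."""
    tokens = []
    for raw in unicodedata.normalize("NFC", text).casefold().split():
        # Trim leading/trailing punctuation only; combining marks inside
        # Indic-script words are left untouched.
        start, end = 0, len(raw)
        while start < end and unicodedata.category(raw[start]).startswith("P"):
            start += 1
        while end > start and unicodedata.category(raw[end - 1]).startswith("P"):
            end -= 1
        if start < end:
            tokens.append(raw[start:end])
    return tokens


def build_index(pages):
    """Build an inverted index from {(doc_id, page_no): extracted_text}."""
    index = defaultdict(set)
    for location, text in pages.items():
        for token in tokenize(text):
            index[token].add(location)
    return index


def search(index, query):
    """Return the locations that contain every query term (AND semantics)."""
    terms = tokenize(query)
    if not terms:
        return set()
    results = set(index.get(terms[0], set()))
    for term in terms[1:]:
        results &= index.get(term, set())
    return results


# Hypothetical OCR output for three pages (illustrative data only).
pages = {
    ("report.pdf", 1): "Land records, 1998.",
    ("report.pdf", 2): "తెలుగు పత్రం",
    ("notice.pdf", 1): "Urdu: اردو دستاویز and land survey",
}
index = build_index(pages)
```

An in-memory dictionary is used here only to keep the sketch self-contained; a large document collection would more plausibly rely on a dedicated search engine or database index.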
The system allows users to upload PDFs, select the desired language, and perform searches to retrieve relevant documents and highlighted text segments. By combining OCR, text preprocessing, and indexing techniques, the proposed solution improves accessibility, document management, and information retrieval from multilingual PDF archives. This system is particularly useful for applications such as digital libraries, historical document preservation, government records, and academic repositories, where multilingual scanned documents are widely used.
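Returning highlighted text segments for a keyword can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the function name `highlight_snippets`, the `[[` and `]]` markers, and the `context` window size are all hypothetical choices.

```python
import re


def highlight_snippets(text, keyword, context=20, open_mark="[[", close_mark="]]"):
    """Return one snippet per case-insensitive match, with the match wrapped
    in markers and up to `context` characters of surrounding text."""
    snippets = []
    if not keyword:
        return snippets
    # re.escape treats the keyword as literal text, not as a regex pattern.
    for match in re.finditer(re.escape(keyword), text, flags=re.IGNORECASE):
        start = max(0, match.start() - context)
        end = min(len(text), match.end() + context)
        snippets.append(
            text[start:match.start()]
            + open_mark + match.group(0) + close_mark
            + text[match.end():end]
        )
    return snippets


sample = "The land survey was completed. Land records are archived."
print(highlight_snippets(sample, "land", context=4))
```

Matching on the original text, rather than on a casefolded copy, keeps the character offsets valid so the markers wrap exactly the text the user sees.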
"Literature Survey on Multilingual OCR-Based PDF Search System", IJEDR - INTERNATIONAL JOURNAL OF ENGINEERING DEVELOPMENT AND RESEARCH (www.IJEDR.org), ISSN: 2321-9939, Vol. 14, Issue 1, page no. 857-861, January-2026, Available: https://rjwave.org/IJEDR/papers/IJEDR2601235.pdf
Volume 14, Issue 1, January-2026
Pages: 857-861
Paper Reg. ID: IJEDR_303901
Published Paper Id: IJEDR2601235
Research Area: Other area not in list
Country: Bangalore, Karnataka, India
ISSN: 2321-9939 | IMPACT FACTOR: 9.37 Calculated By Google Scholar | ESTD YEAR: 2013
Publisher: IJEDR (IJ Publication) Janvi Wave