Paper Title

Literature Survey on Multilingual OCR-Based PDF Search System

Authors

Sannapu Reddy Shanmukha Reddy, Shaik Saleef, Ch Sukumar

Keywords

Text segmentation, text extraction, image-based, document processing, OCR

Abstract

A vast amount of information is stored as PDF documents, many of which contain scanned images rather than machine-readable text. Conventional search mechanisms cannot retrieve content from such image-based PDFs, especially when the documents are written in multiple languages. To address this limitation, this project proposes a Multilingual OCR-Based PDF Search System that searches text across PDFs containing both Unicode text and scanned images in multiple languages. The system uses Optical Character Recognition (OCR) to extract text from image-based PDF files, and it supports English and selected Indian languages, such as Telugu and Urdu, through language-specific OCR models. The extracted text is preprocessed, indexed, and stored to enable fast and accurate keyword-based search across large document collections. Users can upload PDFs, select the desired language, and run searches that return the relevant documents with the matching text segments highlighted. By combining OCR, text preprocessing, and indexing, the proposed solution improves accessibility, document management, and information retrieval for multilingual PDF archives. It is particularly useful for digital libraries, historical document preservation, government records, and academic repositories, where multilingual scanned documents are widely used.
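
The indexing and keyword-search stage described in the abstract can be illustrated with a minimal sketch. It assumes the page text has already been produced by an OCR engine (the abstract does not name one), and every name in it (`tokenize`, `PageIndex`, the language codes "eng" and "tel", the file names) is hypothetical rather than taken from the paper. It shows one possible design, a per-language inverted index, not the authors' implementation.

```python
import unicodedata
from collections import defaultdict


def tokenize(text):
    """Split on whitespace, trim surrounding punctuation, and case-fold."""
    # Splitting on whitespace keeps combining marks (for example Telugu vowel
    # signs) attached to their base letters; a simple word-character regex
    # can break such words apart.
    tokens = []
    for raw in unicodedata.normalize("NFC", text).split():
        start, end = 0, len(raw)
        # Trim leading/trailing punctuation and symbols (Unicode P* and S*).
        while start < end and unicodedata.category(raw[start])[0] in "PS":
            start += 1
        while end > start and unicodedata.category(raw[end - 1])[0] in "PS":
            end -= 1
        if start < end:
            tokens.append(raw[start:end].casefold())
    return tokens


class PageIndex:
    """Inverted index mapping (language, token) to a set of (document, page)."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add_page(self, doc_id, page_no, lang, text):
        # `text` is assumed to be OCR output (or embedded Unicode text) for one page.
        for token in tokenize(text):
            self.postings[(lang, token)].add((doc_id, page_no))

    def search(self, query, lang):
        """Return sorted (doc_id, page_no) pairs that contain every query token."""
        tokens = tokenize(query)
        if not tokens:
            return []
        hits = set.intersection(
            *(self.postings.get((lang, t), set()) for t in tokens)
        )
        return sorted(hits)


if __name__ == "__main__":
    index = PageIndex()
    index.add_page("records.pdf", 1, "eng", "Land records, 1950.")
    index.add_page("bhasha.pdf", 2, "tel", "తెలుగు భాష")
    print(index.search("land records", "eng"))
    print(index.search("తెలుగు", "tel"))
```

Keying the postings by language mirrors the abstract's step of having the user select a language before searching; a production system would more likely delegate indexing to a dedicated search engine.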

How To Cite

"Literature Survey on Multilingual OCR-Based PDF Search System", IJEDR - INTERNATIONAL JOURNAL OF ENGINEERING DEVELOPMENT AND RESEARCH (www.IJEDR.org), ISSN: 2321-9939, Vol. 14, Issue 1, page no. 857-861, January 2026, Available: https://rjwave.org/IJEDR/papers/IJEDR2601235.pdf

Issue

Volume 14 Issue 1, January-2026

Pages : 857-861

Other Publication Details

Paper Reg. ID: IJEDR_303901

Published Paper Id: IJEDR2601235

Research Area: Other area not in list

Country: Bangalore, Karnataka, India

Published Paper PDF: https://rjwave.org/IJEDR/papers/IJEDR2601235

Published Paper URL: https://rjwave.org/IJEDR/viewpaperforall?paper=IJEDR2601235

About Publisher

ISSN: 2321-9939 | IMPACT FACTOR: 9.37 Calculated By Google Scholar | ESTD YEAR: 2013

An international, peer-reviewed, refereed, open-access scholarly journal; multidisciplinary, monthly, and multilingual.

Publisher: IJEDR (IJ Publication) Janvi Wave
