
  Published Paper Details:

  Paper Title

A Hybrid Vision-Language Framework for Intelligent Invoice Information Extraction Using Donut and Gemini Models

  Authors

  Anubhav Mathur,  Anuj Singh Tomar,  Vaibhav Verma,  Suraj Prakash Chauhan,  Sanjeev Kumar Pathak

  Keywords

Donut model, Gemini AI, Invoice Extraction, Document Intelligence, Vision-Language Models

  Abstract


This study explores Intelligent Invoice Information Extraction in the context of recent progress in Vision-Language Models (VLMs). Conventional OCR-based pipelines frequently suffer from recognition errors, domain-specific limitations, weak generalization, and poor performance on invoices with varied layouts. To address these challenges, we present a Hybrid Vision-Language Framework that combines the OCR-free Donut document transformer with the Gemini multimodal large language model. The framework enables structured extraction of key financial fields from invoices that follow many different templates. Donut performs the visual encoding and sequence generation without relying on OCR, whereas Gemini provides higher-level reasoning, validation, and semantic refinement of the extracted information. The objective is high-precision identification of invoice numbers, dates, vendor details, tax components, itemized records, and total amounts. A detailed review of the literature indicates that only a few existing systems use hybrid VLM architectures that fuse OCR-free models with multimodal reasoning models for invoice extraction. Extensive empirical evaluations on custom datasets and standard benchmarks demonstrate substantial performance gains over traditional OCR-based and transformer-based baselines. The key contributions of this work are a scalable system architecture, an analysis of hybrid reasoning effectiveness, comprehensive experimental results, and practical insights for organizations aiming to automate financial processing workflows. The findings suggest that hybrid VLM frameworks offer a significant advancement for Intelligent Document Processing (IDP), reducing manual effort while improving generalization to previously unseen invoice formats.
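
The two-stage flow described in the abstract (an OCR-free extractor followed by a multimodal refinement step) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the names `run_pipeline`, `stub_extract`, `stub_refine`, and `REQUIRED_FIELDS` are hypothetical, the two stages are injected as plain callables rather than real Donut or Gemini calls, and the field names are only an assumed encoding of the targets the abstract lists.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

# Assumed field names, based on the extraction targets named in the abstract.
REQUIRED_FIELDS = ["invoice_number", "date", "vendor", "tax", "items", "total"]

# Stage 1 (Donut's role in the abstract): invoice image -> raw structured fields.
Extractor = Callable[[bytes], Dict[str, Any]]
# Stage 2 (Gemini's role in the abstract): image + raw fields -> refined fields.
Refiner = Callable[[bytes, Dict[str, Any]], Dict[str, Any]]


@dataclass
class ExtractionResult:
    fields: Dict[str, Any]
    missing: List[str] = field(default_factory=list)


def run_pipeline(image: bytes, extract: Extractor, refine: Refiner) -> ExtractionResult:
    """Run extraction, then refinement, then report required fields left empty."""
    raw = extract(image)
    refined = refine(image, raw)
    missing = [name for name in REQUIRED_FIELDS if refined.get(name) in (None, "", [])]
    return ExtractionResult(fields=refined, missing=missing)


# Hypothetical stand-ins so the sketch runs without any model or network access.
def stub_extract(image: bytes) -> Dict[str, Any]:
    return {
        "invoice_number": "INV-001",
        "date": "2025-12-01",
        "vendor": "Acme Supplies",
        "items": [{"description": "Widget", "amount": "118.00"}],
        "total": "118.00",  # raw string, as a sequence-generation model might emit
    }


def stub_refine(image: bytes, raw: Dict[str, Any]) -> Dict[str, Any]:
    refined = dict(raw)
    refined["total"] = float(raw["total"])  # example of a semantic normalization
    return refined


if __name__ == "__main__":
    result = run_pipeline(b"", stub_extract, stub_refine)
    print(result.fields["total"], result.missing)
```

In a real system the two stubs would be replaced by calls into the actual models. Keeping the stages as injected callables is one way to let the validation step be tested independently of either model.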

  IJCRT's Publication Details

  Unique Identification Number - IJCRT2512402

  Paper ID - 298331

  Page Number(s) - d525-d537

  Published in - Volume 13 | Issue 12 | December 2025

  DOI (Digital Object Identifier) -    https://doi.org/10.56975/ijcrt.v13i12.298331

  Publisher Name - IJCRT | www.ijcrt.org | ISSN : 2320-2882

  E-ISSN Number - 2320-2882

  Cite this article

  Anubhav Mathur, Anuj Singh Tomar, Vaibhav Verma, Suraj Prakash Chauhan, Sanjeev Kumar Pathak, "A Hybrid Vision-Language Framework for Intelligent Invoice Information Extraction Using Donut and Gemini Models", International Journal of Creative Research Thoughts (IJCRT), ISSN: 2320-2882, Volume 13, Issue 12, pp. d525-d537, December 2025. Available at: http://www.ijcrt.org/papers/IJCRT2512402.pdf

ISSN and 7.97 Impact Factor Details


ISSN: 2320-2882
Impact Factor: 7.97 and ISSN APPROVED
Journal Starting Year (ESTD) : 2013