MAVREN: A multilayered adaptive framework for deploying vision-language models on resource-constrained unmanned aerial vehicles for autonomous search and rescue

Md Tahmid Rashid; Md Jawad Siddique; Abdus Shaqur

MAVREN: A multilayered adaptive framework for deploying vision-language models on resource-constrained unmanned aerial vehicles for autonomous search and rescue

Files

1.docx (14.71 KB)

Date

2025-12-22

Authors

Md Tahmid Rashid

Md Jawad Siddique

Abdus Shaqur

Publisher

Engineering Applications of Artificial Intelligence

Abstract

Abstract Unmanned Aerial Vehicles (UAVs) have become indispensable in autonomous search and rescue (SAR) missions, where the ability to interpret complex visual scenes in real time is critical. When equipped with artificial intelligence (AI)-empowered Vision-Language Models (VLMs), UAVs can provide rich contextual insights, interpret their findings, and even suggest next steps, but their deployment on resource-constrained UAV platforms is vastly limited by high computational demands, energy constraints, and strict latency requirements. This paper introduces MAVREN, a multilayered adaptive scheduler for VLM execution in resource-constrained UAV networks for autonomous SAR operations. Evaluations conducted on NVIDIA Jetson Orin NX using state-of-the-art VLMs such as Large Language and Vision Assistant (LLaVA) 1.6 and Vision-Language Alignment (VILA) 7B demonstrate that MAVREN achieves up to 26.11% higher throughput, 23% lower energy consumption, 13.51% reduced latency, and a 7% gain in detection accuracy compared to baseline schedulers across indoor, outdoor, and multi-UAV SAR scenarios. This is achieved through the integration of a visual encoder for lightweight feature extraction, a block floating-point quantizer for precision-efficient representation, a bit-wise computation engine for fast arithmetic execution, and a branch-and-bound optimizer for dynamic central processing unit (CPU) scheduling. These tightly coupled components allow MAVREN to optimize the energy–latency–accuracy trade-off, making it a deployable solution for vision-language reasoning in real-world SAR missions. Our findings demonstrate MAVREN’s capability to deliver rapid, energy-efficient inference, advancing the deployment of computationally intensive VLMs on resource-constrained UAV platforms.

Keywords

MAVREN, Multilayered adaptive framework, Vision-Language Models (VLMs), Resource-constrained systems

Citation

Rashid, M. T., Siddique, M. J., & Shaqur, A. (2025). MAVREN: A multilayered adaptive framework for deploying vision-language models on resource-constrained unmanned aerial vehicles for autonomous search and rescue. Engineering Applications of Artificial Intelligence, 162, 112498.

URI

http://dspace.uttarauniversity.edu.bd:4000/handle/123456789/1346

Collections

Journal Articles

Full item page

MAVREN: A multilayered adaptive framework for deploying vision-language models on resource-constrained unmanned aerial vehicles for autonomous search and rescue

Files

Date

Authors

Journal Title

Journal ISSN

Volume Title

Publisher

Abstract

Description

Keywords

Citation

URI

Collections

Endorsement

Review

Supplemented By

Referenced By