MacRAG: Compress, Slice, and Scale-up for Multi-Scale Adaptive Context RAG

Long-context (LC) Large Language Models (LLMs) combined with Retrieval-Augmented Generation (RAG) hold strong potential for complex multi-hop and large-document tasks. However, existing RAG systems often suffer from imprecise retrieval, incomplete context coverage under constrained context windows, and fragmented information caused by suboptimal context construction. We introduce Multi-scale Adaptive Context RAG (MacRAG), a hierarchical retrieval framework that compresses and partitions documents into coarse-to-fine granularities, then adaptively merges relevant contexts through chunk- and document-level expansions in real time. By starting from the finest-level retrieval and progressively incorporating higher-level and broader context, MacRAG constructs effective query-specific long contexts, optimizing both precision and coverage. Evaluations on the challenging LongBench expansions of HotpotQA, 2WikiMultihopQA, and MuSiQue confirm that MacRAG consistently surpasses baseline RAG pipelines on single- and multi-step generation with Llama-3.1-8B, Gemini-1.5-pro, and GPT-4o. Our results establish MacRAG as an efficient, scalable solution for real-world long-context, multi-hop reasoning. Our code is available at this https URL.
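To make the coarse-to-fine construction concrete, below is a minimal sketch of how a retrieval pipeline in the spirit of the abstract might assemble a query-specific long context: retrieve at the finest chunk level, then expand each hit to its parent slice and a document-level summary until a context budget is filled. All names (Chunk, retrieve_fine, get_parent_text, get_doc_summary, the character budget) are hypothetical illustrations based only on the abstract, not the authors' released implementation.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Chunk:
    doc_id: str          # source document
    parent_id: str       # coarser slice this fine chunk was cut from
    text: str
    score: float = 0.0   # retrieval score filled in at query time

def build_context(query: str,
                  retrieve_fine: Callable[[str, int], List[Chunk]],
                  get_parent_text: Callable[[str], str],
                  get_doc_summary: Callable[[str], str],
                  top_k: int = 8,
                  budget_chars: int = 8000) -> str:
    """Coarse-to-fine context assembly: start from finest-level hits, then
    progressively widen (fine chunk -> parent slice -> document summary),
    stopping once the character budget for the LLM context is filled."""
    hits = retrieve_fine(query, top_k)                 # finest-granularity retrieval
    hits.sort(key=lambda c: c.score, reverse=True)

    pieces: List[str] = []
    seen_parents, seen_docs = set(), set()
    used = 0
    for c in hits:
        # Chunk-level expansion: include the surrounding parent slice once per parent.
        if c.parent_id in seen_parents:
            continue
        seen_parents.add(c.parent_id)
        text = get_parent_text(c.parent_id)
        # Document-level expansion: prepend a coarse summary of the source doc once.
        if c.doc_id not in seen_docs:
            seen_docs.add(c.doc_id)
            text = get_doc_summary(c.doc_id) + "\n" + text
        if used + len(text) > budget_chars:
            break
        pieces.append(text)
        used += len(text)
    return "\n\n".join(pieces)
```

In this sketch the budget is counted in characters for simplicity; a real system would count tokens and could make the expansion depth adaptive per query, as the abstract's "adaptive" merging suggests.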
@article{lim2025_2505.06569,
  title   = {MacRAG: Compress, Slice, and Scale-up for Multi-Scale Adaptive Context RAG},
  author  = {Woosang Lim and Zekun Li and Gyuwan Kim and Sungyoung Ji and HyeonJung Kim and Kyuri Choi and Jin Hyuk Lim and Kyungpyo Park and William Yang Wang},
  journal = {arXiv preprint arXiv:2505.06569},
  year    = {2025}
}