v1v2 (latest)

A Parallel CPU-GPU Framework for Batching Heuristic Operations in Depth-First Heuristic Search

16 July 2025

Ehsan Futuhi

Nathan R. Sturtevant

ArXiv (abs)PDF HTML Github (119★)

Main:7 Pages

4 Figures

Bibliography:2 Pages

8 Tables

Appendix:4 Pages

Abstract

The rapid advancement of GPU technology has unlocked powerful parallel processing capabilities, creating new opportunities to enhance classic search algorithms. This hardware has been exploited in best-first search algorithms with neural network-based heuristics by creating batched versions of A* and Weighted A* that delay heuristic evaluation until sufficiently many states can be evaluated in parallel on the GPU. But, research has not addressed how depth-first algorithms like IDA* or Budgeted Tree Search (BTS) can have their heuristic computations batched. This is more complicated in a tree search, because progress in the search tree is blocked until heuristic evaluations are complete. In this paper we show that GPU parallelization of heuristics can be effectively performed when the tree search is parallelized on the CPU while heuristic evaluations are parallelized on the GPU. We develop a parallelized cost-bounded depth-first search (CB-DFS) framework that can be applied to both IDA* and BTS, significantly improving their performance. We demonstrate the strength of the approach on the 3x3 Rubik's Cube and the 4x4 sliding tile puzzle (STP) with both classifier-based and regression-based heuristics.

View on arXiv

Comments on this paper