Legal NLP remains underdeveloped in regions such as India because structured datasets are scarce. We introduce IndianBailJudgments-1200, a new benchmark dataset of 1,200 Indian court judgments on bail decisions, annotated across more than 20 attributes, including bail outcome, IPC sections, crime type, and legal reasoning. Annotations were generated with a prompt-engineered GPT-4o pipeline and verified for consistency. The resource supports a wide range of legal NLP tasks, such as outcome prediction, summarization, and fairness analysis, and is the first publicly available dataset focused specifically on Indian bail jurisprudence.
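To make the outcome-prediction and fairness-analysis use cases concrete, the sketch below shows one way a per-judgment record might be represented and consumed. It is a minimal illustration under stated assumptions, not the dataset's actual schema or loader: only four of the annotated attributes named above are modeled, and the field names (`judgment_text`, `bail_outcome`, `ipc_sections`, `crime_type`, `legal_reasoning`), the label value `"granted"`, and the helper functions are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


# Hypothetical record layout; the released dataset's field names and label
# values may differ. Only attributes named in the abstract are modeled.
@dataclass
class BailJudgment:
    judgment_text: str
    bail_outcome: str  # assumed values such as "granted" / "rejected"
    ipc_sections: List[str] = field(default_factory=list)
    crime_type: str = ""
    legal_reasoning: str = ""


def is_granted(record: BailJudgment) -> bool:
    # Case-insensitive match on the assumed "granted" label.
    return record.bail_outcome.strip().lower() == "granted"


def to_outcome_examples(records: List[BailJudgment]) -> List[Tuple[str, int]]:
    """Map records to (text, label) pairs for binary outcome prediction.

    Label 1 means bail granted, 0 means any other outcome (assumed encoding).
    """
    return [(r.judgment_text, 1 if is_granted(r) else 0) for r in records]


def grant_rate_by(records: List[BailJudgment], attr: str) -> Dict[str, float]:
    """Fraction of granted outcomes for each value of a categorical attribute.

    A simple starting point for fairness-style comparisons across groups.
    """
    counts: Dict[str, Tuple[int, int]] = {}
    for r in records:
        key = getattr(r, attr)
        granted, total = counts.get(key, (0, 0))
        counts[key] = (granted + int(is_granted(r)), total + 1)
    return {k: g / t for k, (g, t) in counts.items()}
```

For example, `grant_rate_by(records, "crime_type")` returns a per-crime-type grant rate, and `to_outcome_examples(records)` yields inputs for a text classifier. Keeping the record as a plain dataclass makes it straightforward to populate from whatever serialization (JSON, CSV) the released files actually use.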
@article{deshmukh2025_2507.02506,
  title={IndianBailJudgments-1200: A Multi-Attribute Dataset for Legal NLP on Indian Bail Orders},
  author={Sneha Deshmukh and Prathmesh Kamble},
  journal={arXiv preprint arXiv:2507.02506},
  year={2025}
}