Language Models Enable Simple Systems for Generating Structured Views of Heterogeneous Data Lakes

Proceedings of the VLDB Endowment (PVLDB), 2023
19 April 2023
Simran Arora
Brandon Yang
Sabri Eyuboglu
A. Narayan
Andrew Hojel
Immanuel Trummer
Christopher Ré
    SyDa

Papers citing "Language Models Enable Simple Systems for Generating Structured Views of Heterogeneous Data Lakes"

50 / 87 papers shown
BookRAG: A Hierarchical Structure-aware Index-based Approach for Retrieval-Augmented Generation on Complex Documents
Shu Wang
Yingli Zhou
Yixiang Fang
175
0
0
03 Dec 2025
SRE-Llama -- Fine-Tuned Meta's Llama LLM, Federated Learning, Blockchain and NFT Enabled Site Reliability Engineering(SRE) Platform for Communication and Networking Software Services
International Conference on Blockchain Computing and Applications (BCCA), 2025
Eranga Bandara
Safdar H. Bouk
Sachin Shetty
Ravi Mukkamala
A. Rahman
Peter Foytik
Ross Gore
Xueping Liang
Ng Wee Keong
Kasun De Zoysa
80
1
0
11 Nov 2025
Structured RAG for Answering Aggregative Questions
Omri Koshorek
Niv Granot
Aviv Alloni
Shahar Admati
Roee Hendel
Ido Weiss
Alan Arazi
Shay-Nitzan Cohen
Yonatan Belinkov
RALM
255
0
0
11 Nov 2025
Cortex AISQL: A Production SQL Engine for Unstructured Data
Paweł Liskowski
Bowei Chen
Paritosh Aggarwal
Benjamin Han
Boxin Jiang
...
Jay Tayade
Weicheng Zhao
Anupam Datta
Nathan Wiegand
Dimitris Tsirogiannis
137
2
0
10 Nov 2025
Attention and Compression is all you need for Controllably Efficient Language Models
Jatin Prakash
N. Jethani
Rajesh Ranganath
MQVLM
467
0
0
07 Nov 2025
Relational Deep Dive: Error-Aware Queries Over Unstructured Data
Daren Chao
Kaiwen Chen
Naiqing Guan
Nick Koudas
102
0
0
04 Nov 2025
AGRAG: Advanced Graph-based Retrieval-Augmented Generation for LLMs
Y. Wang
Haoyang Li
Fei Teng
Lei Chen
LRM
105
0
0
02 Nov 2025
FlashEVA: Accelerating LLM inference via Efficient Attention
Juan Gabriel Kostelec
Qinghai Guo
164
0
0
01 Nov 2025
Standardization of Psychiatric Diagnoses -- Role of Fine-tuned LLM Consortium and OpenAI-gpt-oss Reasoning LLM Enabled Decision Support System
Eranga Bandara
Ross Gore
Atmaram Yarlagadda
Anita H. Clayton
Preston Samuel
Christopher Rhea
Sachin Shetty
AI4MH
183
3
0
29 Oct 2025
TEXT2DB: Integration-Aware Information Extraction with Large Language Model Agents
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Yizhu Jiao
S. Li
Sizhe Zhou
Heng Ji
Jiawei Han
137
9
0
28 Oct 2025
Agentsway -- Software Development Methodology for AI Agents-based Teams
Eranga Bandara
Ross Gore
Xueping Liang
Sachini Rajapakse
Isurunima Kularathne
...
Amin Hass
Ng Wee Keong
Kasun De Zoysa
Aruna Withanage
Nilaan Loganathan
LLMAG, AI4TS, AIFin
326
4
0
26 Oct 2025
Model Context Contracts - MCP-Enabled Framework to Integrate LLMs With Blockchain Smart Contracts
Eranga Bandara
Sachin Shetty
Ravi Mukkamala
Ross Gore
Peter Foytik
...
Xueping Liang
Ng Wee Keong
Kasun De Zoysa
Aruna Withanage
Nilaan Loganathan
80
3
0
21 Oct 2025
Implementing Semantic Join Operators Efficiently
Immanuel Trummer
105
0
0
09 Oct 2025
LLM/Agent-as-Data-Analyst: A Survey
Zirui Tang
Weizheng Wang
Z. Zhou
Yang Jiao
Bangrui Xu
...
Conghui He
Bin Wang
Xiaoyang Wang
Fan Wu
236
6
0
28 Sep 2025
ScaleDoc: Scaling LLM-based Predicates over Large Document Collections
Hengrui Zhang
Yulong Hui
Yihao Liu
Huanchen Zhang
OffRL
109
0
0
16 Sep 2025
A Survey on Retrieval And Structuring Augmented Generation with Large Language Models
Pengcheng Jiang
Siru Ouyang
Yizhu Jiao
Ming Zhong
Runchu Tian
Jiawei Han
RALM, KELM
208
5
0
12 Sep 2025
Cut Costs, Not Accuracy: LLM-Powered Data Processing with Guarantees
Sepanta Zeighami
Shreya Shankar
Aditya G. Parameswaran
126
3
0
02 Sep 2025
A Survey on Open Dataset Search in the LLM Era: Retrospectives and Perspectives
Pengyue Li
Sheng Wang
Hua Dai
Zhiyu Zoey Chen
Z. Bao
Brian D. Davison
88
0
0
31 Aug 2025
ST-Raptor: LLM-Powered Semi-Structured Table Question Answering
Zirui Tang
Boyu Niu
Xuanhe Zhou
Boxiu Li
Wei Zhou
Jiannan Wang
Guoliang Li
Xinyi Zhang
Fan Wu
LMTD
228
2
0
25 Aug 2025
Jet-Nemotron: Efficient Language Model with Post Neural Architecture Search
Yuxian Gu
Qinghao Hu
Shang Yang
Haocheng Xi
Junyu Chen
Song Han
Han Cai
246
11
0
21 Aug 2025
LoopServe: An Adaptive Dual-phase LLM Inference Acceleration System for Multi-Turn Dialogues
Haoyang Li
Zhanchao Xu
Yiming Li
Xuejia Chen
Darian Li
...
Cheng Deng
Jun Wang
Qing Li
Lei Chen
Mingxuan Yuan
233
1
0
18 Jul 2025
Instruction Tuning with and without Context: Behavioral Shifts and Downstream Impact
Hyunji Lee
Seunghyun Yoon
Yunjae Won
Hanseok Oh
Geewook Kim
Trung H. Bui
Franck Dernoncourt
Elias Stengel-Eskin
Mohit Bansal
Minjoon Seo
LRM
246
2
0
18 Jun 2025
MesaNet: Sequence Modeling by Locally Optimal Test-Time Training
J. Oswald
Nino Scherrer
Seijin Kobayashi
Luca Versari
Songlin Yang
...
Guillaume Lajoie
Charlotte Frenkel
Razvan Pascanu
Blaise Agüera y Arcas
João Sacramento
311
14
0
05 Jun 2025
Blending Complementary Memory Systems in Hybrid Quadratic-Linear Transformers
Kazuki Irie
Morris Yau
Samuel J. Gershman
221
6
0
31 May 2025
Towards Scalable Schema Mapping using Large Language Models
Christopher Buss
Mahdis Safari
Arash Termehchy
Stefan Lee
David Maier
148
4
0
30 May 2025
ATLAS: Learning to Optimally Memorize the Context at Test Time
Ali Behrouz
Zeman Li
Praneeth Kacham
Majid Daliri
Yuan Deng
Peilin Zhong
Meisam Razaviyayn
Vahab Mirrokni
523
24
0
29 May 2025
SQUiD: Synthesizing Relational Databases from Unstructured Text
Mushtari Sadia
Zhenning Yang
Yunming Xiao
Ang Chen
Amrita Roy Chowdhury
SyDa
251
1
0
25 May 2025
How Does Sequence Modeling Architecture Influence Base Capabilities of Pre-trained Language Models? Exploring Key Architecture Design Principles to Avoid Base Capabilities Degradation
Xin Lu
Yanyan Zhao
Si Wei
Shijin Wang
Bing Qin
Ting Liu
216
0
0
24 May 2025
Efficient LLM Serving on Hybrid Real-time and Best-effort Requests
Wan Borui
Zhao Juntao
Jiang Chenyu
Guo Chuanxiong
Wu Chuan
VLM
297
7
0
13 Apr 2025
Simplifying Data Integration: SLM-Driven Systems for Unified Semantic Queries Across Heterogeneous Databases
IEEE International Conference on Data Engineering (ICDE), 2025
Teng Lin
301
3
0
08 Apr 2025
LLM-Aided Customizable Profiling of Code Data Based On Programming Language Concepts
Pankaj Thorat
Adnan Qidwai
Adrija Dhar
Aishwariya Chakraborty
Anand Eswaran
Hima Patel
Praveen Jayachandran
230
1
0
19 Mar 2025
Minions: Cost-efficient Collaboration Between On-device and Cloud Language Models
A. Narayan
D. Biderman
Sabri Eyuboglu
Avner May
Scott W. Linderman
James Zou
Christopher Ré
261
12
0
21 Feb 2025
MoM: Linear Sequence Modeling with Mixture-of-Memories
Jusen Du
Weigao Sun
Disen Lan
Jiaxi Hu
Yu Cheng
KELM
555
15
0
19 Feb 2025
Graph-based Retrieval Augmented Generation for Dynamic Few-shot Text Classification
Yubo Wang
Haoyang Li
Fei Teng
Lei Chen
491
3
0
17 Feb 2025
CodeMonkeys: Scaling Test-Time Compute for Software Engineering
Ryan Ehrlich
Bradley Brown
Jordan Juravsky
Ronald Clark
Christopher Ré
Azalia Mirhoseini
312
26
0
24 Jan 2025
Mind the Data Gap: Bridging LLMs to Enterprise Data Integration
Moe Kayali
Fabian Wenz
Nesime Tatbul
Çağatay Demiralp
207
6
0
31 Dec 2024
The Design of an LLM-powered Unstructured Analytics System
Eric Anderson
Jonathan Fritz
Austin Lee
Bohou Li
Mark Lindblad
...
Mehul A. Shah
Benjamin Sowell
Dan Tecuci
Vinayak Thapliyal
Matt Welsh
280
28
0
31 Dec 2024
Smoothie: Label Free Language Model Routing
Neural Information Processing Systems (NeurIPS), 2024
Neel Guha
Mayee F. Chen
Trevor Chow
Ishan S. Khare
Christopher Ré
260
22
0
06 Dec 2024
Unlocking State-Tracking in Linear RNNs Through Negative Eigenvalues
International Conference on Learning Representations (ICLR), 2024
Riccardo Grazzi
Julien N. Siems
Jörg Franke
Arber Zela
Katharina Eggensperger
Massimiliano Pontil
747
45
0
19 Nov 2024
DocETL: Agentic Query Rewriting and Evaluation for Complex Document Processing
Proceedings of the VLDB Endowment (PVLDB), 2024
Shreya Shankar
Tristan Chambers
Eugene Wu
Aditya G. Parameswaran
LLMAG
358
28
0
16 Oct 2024
Reward-Robust RLHF in LLMs
Yuzi Yan
Xingzhou Lou
Jialian Li
Yiping Zhang
Jian Xie
Chao Yu
Yu Wang
Dong Yan
Yuan Shen
368
17
0
18 Sep 2024
Large Language Models are Pattern Matchers: Editing Semi-Structured and Structured Documents with ChatGPT
Irene Weber
KELM, AI4MH
206
1
0
12 Sep 2024
Longhorn: State Space Models are Amortized Online Learners
Bo Liu
Rui Wang
Lemeng Wu
Yihao Feng
Peter Stone
Qian Liu
422
29
0
19 Jul 2024
A Declarative System for Optimizing AI Workloads
Chunwei Liu
Matthew Russo
Michael Cafarella
Lei Cao
Peter Baille Chen
Zui Chen
Michael Franklin
Tim Kraska
Samuel Madden
Gerardo Vitagliano
237
45
0
23 May 2024
Chameleon: Foundation Models for Fairness-aware Multi-modal Data Augmentation to Enhance Coverage of Minorities
Mahdi Erfanian
H. V. Jagadish
Abolfazl Asudeh
174
7
0
02 Feb 2024
Gated Linear Attention Transformers with Hardware-Efficient Training
Aaron Courville
Bailin Wang
Songlin Yang
Yikang Shen
Yoon Kim
443
300
0
11 Dec 2023
Jellyfish: A Large Language Model for Data Preprocessing
Haochen Zhang
Yuyang Dong
Chuan Xiao
Masafumi Oyamada
516
36
0
04 Dec 2023
SEED: Domain-Specific Data Curation With Large Language Models
Zui Chen
Lei Cao
Samuel Madden
Tim Kraska
Zeyuan Shang
Ju Fan
Nan Tang
Zihui Gu
Chunwei Liu
Michael Cafarella
270
13
0
01 Oct 2023
Generative Benchmark Creation for Table Union Search
Koyena Pal
Aamod Khatiwada
Roee Shraga
Renée J. Miller
170
2
0
07 Aug 2023
TPTU: Large Language Model-based AI Agents for Task Planning and Tool Usage
Jingqing Ruan
Yihong Chen
Bin Zhang
Zhiwei Xu
Tianpeng Bao
...
Shiwei Shi
Hangyu Mao
Ziyue Li
Xingyu Zeng
Rui Zhao
LLMAG, LM&Ro
341
53
0
07 Aug 2023