Restaurant

AI-Powered Data Exploration: Interacting with Apache Iceberg via Spark and LLMs

Pratik Patel

Lead Developer Advocate

Azul


This presentation explores how integrating LLMs with Apache Spark and Apache Iceberg can provide an intuitive chat interface for data interaction. We’ll show how this combination lets users query massive datasets and extract insights from them using natural language. At Azul, we have gathered a large volume of log data over the years on downloads of our free, open source JVM, all stored in Apache Iceberg. In this session we’ll explore the potential of combining Iceberg, Spark and LLMs:

Natural Language Querying: By leveraging LLMs, we can turn natural language questions into Spark operations that query the underlying Iceberg dataset. This removes the need for users to write complex SQL or PySpark code, making data exploration accessible and easy.

AI-Enhanced Insight Generation: The integration allows LLMs not only to retrieve data but also to generate summaries, identify patterns, and perform trend analysis directly from the structured information stored in Iceberg tables.

Integrated Solution: How we’ve built a solution that stacks Iceberg, Spark and GenAI to interact with the download data.
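
The natural language querying idea above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the solution presented in the talk: the table name, the column list, and the helpers `build_prompt`, `extract_select` and `ask` are all hypothetical, and the LLM and the Spark executor are passed in as plain callables so the sketch runs without either installed.

```python
import re
from typing import Any, Callable

# Hypothetical Iceberg table and columns; the real schema is not given in the abstract.
TABLE = "demo.downloads.events"
COLUMNS = "event_date DATE, jvm_version STRING, os STRING, country STRING"

FENCE = "`" * 3  # markdown code fence that LLM replies are often wrapped in
FORBIDDEN = re.compile(
    r"\b(insert|update|delete|merge|drop|alter|create|truncate)\b", re.IGNORECASE
)


def build_prompt(question: str) -> str:
    """Give the model the schema so it can write SQL against the right table."""
    return (
        "Translate the question into a single Spark SQL SELECT statement.\n"
        f"Table: {TABLE}\n"
        f"Columns: {COLUMNS}\n"
        f"Question: {question}\n"
        "SQL:"
    )


def extract_select(reply: str) -> str:
    """Strip fence lines from the reply and accept only one read-only SELECT."""
    lines = [
        ln.strip()
        for ln in reply.strip().splitlines()
        if not ln.strip().startswith(FENCE)
    ]
    sql = " ".join(lines).strip().rstrip(";").strip()
    if not sql.lower().startswith("select"):
        raise ValueError("expected a SELECT statement")
    if ";" in sql or FORBIDDEN.search(sql):
        raise ValueError("statement is not a single read-only query")
    return sql


def ask(
    question: str,
    llm: Callable[[str], str],
    run_sql: Callable[[str], Any],
) -> Any:
    """Question -> prompt -> LLM reply -> validated SQL -> query result."""
    sql = extract_select(llm(build_prompt(question)))
    return run_sql(sql)
```

With a live Spark session configured with an Iceberg catalog, `run_sql` could be something like `lambda q: spark.sql(q).collect()`, and `llm` a thin wrapper around whichever model client is in use. The keyword check is deliberately conservative: it rejects anything that is not a single SELECT, at the cost of occasionally refusing a harmless query.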

#iceberg #big data #Spark #LLM #GenAI #Analytics

Pratik Patel

Biography

Pratik Patel is a Java Champion and developer advocate at Azul Systems, and has written 3 books on programming (Java, Cloud and OSS). He is an all-around software and hardware nerd with experience in the healthcare, telecom, financial services, and startup sectors. He's also a co-organizer of the Atlanta Java User Group, conference chairperson for Devnexus, a frequent speaker at tech events, and a master builder of nachos.