15–19 Jun 2026
UC Irvine
America/New_York timezone

The Ghost Hunter Open Data Challenge for ML-driven PMT Waveform Analysis and Event Reconstruction

18 Jun 2026, 14:25
20m
The Interdisciplinary Science and Engineering Building (UC Irvine)

419 Physical Sciences Quad, Irvine, CA 92697
Public Datasets and Challenges
Datasets Ecosystem: Public Datasets and Open Data Challenges

Speaker

Benda Xu (Tsinghua University)

Description

Next-generation large-scale liquid scintillator detectors (JUNO, KamLAND2, SNO+ and JNE) rely on tens of thousands of photomultiplier tubes (PMTs) to capture single optical photons. Pushing the energy resolution to the physical limit requires extracting sub-nanosecond timing and precise charge from heavily piled-up, noisy PMT waveforms. However, the lack of standardized, publicly accessible datasets with perfect ground-truth labels has hindered the rapid iteration of novel machine-learning architectures in this domain.

We introduce the Ghost Hunter Open Data Challenge, a comprehensive benchmark dataset and outreach framework designed to bridge the gap between AI developers and neutrino physics. Generated by a fast, standalone toy-detector simulation decoupled from official experimental frameworks, the dataset provides massive PMT waveform tensors paired with exact microphysical labels (vertex, kinetic energy, photon time-of-flight, and true hit times).
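
As an illustration of what a waveform paired with exact hit-time labels can look like, the following is a minimal sketch, not the Ghost Hunter simulation itself. The pulse shape, time constants, noise level, sampling step and function names (single_pe_pulse, simulate_waveform) are all assumptions made for this example.

```python
import numpy as np

def single_pe_pulse(t, tau_rise=2.0, tau_fall=8.0):
    """Hypothetical single-photoelectron pulse shape (times in ns).

    A difference of two exponentials, zero before t = 0. The time
    constants are illustrative, not taken from any real PMT.
    """
    t = np.asarray(t, dtype=float)
    tp = np.clip(t, 0.0, None)  # avoid overflow for negative times
    shape = np.exp(-tp / tau_fall) - np.exp(-tp / tau_rise)
    return np.where(t > 0, shape, 0.0)

def simulate_waveform(rng, n_samples=1000, dt=1.0, mean_pe=3.0, noise_sigma=0.02):
    """Return (waveform, true_hit_times) for one toy PMT channel.

    Assumed model: Poisson number of photoelectrons, exponentially
    distributed hit times after a fixed 200 ns offset, Gaussian noise.
    """
    n_pe = rng.poisson(mean_pe)
    hit_times = np.sort(rng.exponential(scale=20.0, size=n_pe) + 200.0)
    t = np.arange(n_samples) * dt
    waveform = np.zeros(n_samples)
    for t_hit in hit_times:
        # Overlapping pulses add linearly, which produces pile-up.
        waveform += single_pe_pulse(t - t_hit)
    waveform += rng.normal(0.0, noise_sigma, size=n_samples)
    return waveform, hit_times

rng = np.random.default_rng(seed=0)
waveform, true_hits = simulate_waveform(rng)
```

In this toy setup the array true_hits plays the role of the ground-truth label that a reconstruction algorithm would try to recover from waveform alone.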

Originating as an educational competition in 2019 and scaling to a nationwide undergraduate challenge with over 60 participating teams, the Ghost Hunter framework has successfully crowd-sourced diverse analytical methods, ranging from heuristic deconvolution and topological algorithms to deep Convolutional Neural Networks and Metropolis-Hastings sampling.

For NPML 2026, we propose to elevate this framework into a community-wide Open Data Challenge. We will present the dataset structure, physics-driven loss functions such as the unbinned time-dependent Poisson likelihood, and an open-source evaluation platform. By providing a standard dataset, we aim to lower the barrier for computer science researchers entering neutrino physics, foster cross-disciplinary ML collaborations, and incubate next-generation differentiable reconstruction algorithms.
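
For readers unfamiliar with the unbinned time-dependent Poisson likelihood, the sketch below shows its standard form for an inhomogeneous Poisson process: the log-likelihood of hit times t_i in a window [0, T] is the sum of log λ(t_i) minus the integral of λ(t) over the window. The intensity model used here (an exponential scintillation tail starting at t0 plus a constant dark rate) and all function and parameter names are assumptions for illustration, not the challenge's actual loss definition.

```python
import numpy as np

def intensity(t, mu, t0, tau, dark_rate):
    """Assumed photon-arrival intensity lambda(t) in hits per ns.

    mu: expected signal photoelectrons, t0: event time, tau: decay
    constant, dark_rate: constant background rate (must be > 0 so
    that log(lambda) stays finite before t0).
    """
    t = np.asarray(t, dtype=float)
    dt = np.clip(t - t0, 0.0, None)
    signal = np.where(t >= t0, mu / tau * np.exp(-dt / tau), 0.0)
    return signal + dark_rate

def expected_count(window, mu, t0, tau, dark_rate):
    """Closed-form integral of lambda(t) over [0, window], for 0 <= t0 <= window."""
    return mu * (1.0 - np.exp(-(window - t0) / tau)) + dark_rate * window

def neg_log_likelihood(hit_times, window, mu, t0, tau, dark_rate):
    """Unbinned negative log-likelihood: integral of lambda minus sum of log lambda(t_i)."""
    hit_times = np.asarray(hit_times, dtype=float)
    log_terms = np.log(intensity(hit_times, mu, t0, tau, dark_rate))
    return expected_count(window, mu, t0, tau, dark_rate) - np.sum(log_terms)

# Example: hits clustered just after 50 ns favor t0 = 50 over t0 = 20.
hits = [51.0, 52.0, 55.0, 60.0]
nll_near = neg_log_likelihood(hits, 100.0, mu=4.0, t0=50.0, tau=5.0, dark_rate=1e-3)
nll_far = neg_log_likelihood(hits, 100.0, mu=4.0, t0=20.0, tau=5.0, dark_rate=1e-3)
```

Because no time binning is involved, each hit contributes its exact arrival time, which is what makes this form attractive when sub-nanosecond timing matters. The function is also smooth in mu and tau, so it can serve as a loss for gradient-based fitting in an autodiff framework.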

Contribution type: Standard talk (20 min + 5 min Q&A)

Author

Benda Xu (Tsinghua University)

Co-author

Mr Shengqi Chen (Tsinghua University)
