Description
Next-generation large-scale liquid scintillator detectors (JUNO, KamLAND2, SNO+, and JNE) rely on tens of thousands of photomultiplier tubes (PMTs) to capture single optical photons. Pushing the energy resolution to the physical limit requires extracting sub-nanosecond timing and precise charge from heavily piled-up, noisy PMT waveforms. However, the lack of standardized, publicly accessible datasets with perfect ground-truth labels has hindered the rapid iteration of novel machine learning architectures in this domain.
We introduce the Ghost Hunter Open Data Challenge, a comprehensive benchmark dataset and outreach framework designed to bridge the gap between AI developers and neutrino physics. Generated by a fast, standalone toy-detector simulation decoupled from official experimental frameworks, the dataset provides massive PMT waveform tensors paired with exact microphysical labels (vertex, kinetic energy, photon time-of-flight, and true hit times).
Originating as an educational competition in 2019 and scaling to a nationwide undergraduate challenge with over 60 participating teams, the Ghost Hunter framework has successfully crowd-sourced diverse analytical methods, ranging from heuristic deconvolution and topological algorithms to deep Convolutional Neural Networks and Metropolis-Hastings sampling.
For NPML 2026, we propose to elevate this framework into a community-wide Open Data Challenge. We will present the dataset structure, physics-driven loss functions such as the unbinned time-dependent Poisson likelihood, and an open-source evaluation platform. By providing a standard dataset, we aim to lower the barrier to entry for computer science researchers entering neutrino physics, foster cross-disciplinary ML collaborations, and incubate next-generation differentiable reconstruction algorithms.
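As a rough illustration of what an unbinned time-dependent Poisson likelihood looks like, the sketch below evaluates the standard log-likelihood of an inhomogeneous Poisson process, log L = Σ_i log λ(t_i) − ∫ λ(t) dt, for a list of photoelectron hit times on one PMT. It is not the challenge's actual loss definition: the single-exponential pulse shape, the flat dark-noise term, and every function and parameter name (`intensity`, `expected_count`, `log_likelihood`, `mu`, `t0`, `tau`, `dark_rate`, `window`) are assumptions made for this example.

```python
import math


def intensity(t, mu, t0, tau, dark_rate):
    """Expected hit rate at time t (hits per ns).

    Assumed model: an exponential scintillation pulse carrying mu expected
    photoelectrons, starting at t0 with decay time tau, plus flat dark noise.
    """
    signal = (mu / tau) * math.exp(-(t - t0) / tau) if t >= t0 else 0.0
    return signal + dark_rate


def expected_count(mu, t0, tau, dark_rate, window):
    """Closed-form integral of the intensity over the readout window [0, window]."""
    if not 0.0 <= t0 <= window:
        raise ValueError("this sketch assumes 0 <= t0 <= window")
    signal = mu * (1.0 - math.exp(-(window - t0) / tau))
    return signal + dark_rate * window


def log_likelihood(hit_times, mu, t0, tau, dark_rate, window):
    """Unbinned log-likelihood: sum of log-intensities at the hits minus the expected count.

    dark_rate must be positive, otherwise a hit before t0 has zero intensity
    and math.log raises ValueError.
    """
    log_terms = sum(math.log(intensity(t, mu, t0, tau, dark_rate)) for t in hit_times)
    return log_terms - expected_count(mu, t0, tau, dark_rate, window)


if __name__ == "__main__":
    hits = [12.0, 15.0, 20.0, 31.0]  # illustrative hit times in ns
    for trial_t0 in (10.0, 25.0):
        ll = log_likelihood(hits, mu=4.0, t0=trial_t0, tau=20.0, dark_rate=0.001, window=100.0)
        print(f"t0 = {trial_t0:5.1f} ns  log L = {ll:.3f}")
```

Because no time binning is involved, every hit contributes at its exact time, which is what makes this form of loss sensitive to sub-nanosecond timing. The negative of this quantity can be minimized over `mu` and `t0` by any optimizer or sampler.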
| Contribution types | Standard talk (20 min + 5 min Q&A) |
|---|---|