CIDeR-ML General Meeting

America/Los_Angeles
Description

https://u-tokyo-ac-jp.zoom.us/j/83932834349

Minutes:

Quick recap

The meeting covered updates on cosmic calibration and on machine learning models for PMT waveform prediction. Ka Ming presented progress on cosmic calibration, highlighting improvements in charge distribution and visibility tuning. Junjie shared updates on the SirenTV model, including performance checks and training improvements. The team discussed problems with the current training approach, particularly around the rising edge of waveforms, and agreed to revisit the method to address systematic errors. The group also scheduled a future Japan workshop for late April, with participants confirming their availability.

Next steps

  • Patrick Kavang, Omar: Fill out the poll for the late March/early April Japan meeting preferences as soon as possible.
  • Ka Ming: Upload the presentation slides to the Indico page.
  • Ka Ming: Make a list of outstanding tasks/questions and provide instructions/recipes for the pre-training sample and related software pipeline; coordinate with Patrick to potentially assign these tasks to the new postdocs.
  • Junjie: Investigate the discrepancy in dataset size (15M vs expected 20M events) and follow up with Sam as needed.
  • Junjie: Estimate the expected visibility bias due to photon statistics and compare with observed bias pattern across PMT IDs.
  • Junjie: Revisit the loss function and masking/supervision approach around T0 (rising edge) in waveform modeling, as discussed with Kazu.
  • Kazu: Check resource availability and assist Junjie in obtaining access to appropriate GPU nodes for training.
  • Kazu: Check with absent team members (e.g., Sam, Catherine) and Patrick about availability for a possible April 27th meeting, and confirm if Patrick can host.
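
For the photon-statistics estimate in Junjie's action item, one possible starting point is plain binomial counting. This is a minimal sketch that assumes the visibility is estimated as the fraction of simulated photons detected by a PMT. The function name and the example numbers are hypothetical and were not given in the meeting. It gives the scatter expected from counting statistics alone, which could then be compared with the observed per-PMT pattern.

```python
import math


def relative_visibility_error(n_photons: int, visibility: float) -> float:
    """Relative standard error of a visibility estimated as k / n_photons,
    where k ~ Binomial(n_photons, visibility). Assumption: pure counting
    statistics, with no detector or simulation systematics."""
    if n_photons <= 0 or not 0.0 < visibility <= 1.0:
        raise ValueError("need n_photons > 0 and 0 < visibility <= 1")
    return math.sqrt((1.0 - visibility) / (n_photons * visibility))


# Hypothetical example: 10,000 photons per source point, 1% visibility.
example_error = relative_visibility_error(10_000, 0.01)  # roughly 0.1
```

PMTs with lower visibility collect fewer photons, so under this assumption their relative scatter grows roughly as one over the square root of the expected photon count.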

Summary

WCTE Cosmic Calibration and Track Visibility

Ka Ming presented an update on cosmic calibration work for WCTE detectors, focusing on issues with charge tuning and visibility for muon tracks. He found that the current tuning process artificially reduces visibility for upward and horizontal tracks, which degrades accuracy in the top portion of the detector. He demonstrated improvements in the calibration using a larger dataset of cosmic muon tracks longer than 1 meter and traveling in all directions, showing that the tuned model performs better than the pre-trained model in most cases. However, using reconstructed track parameters instead of true parameters leads to significantly worse performance, which suggests that the reconstruction (both fiTQun and ML) needs to improve before the method is applied to WCTE data. Ka Ming discussed plans to improve reconstruction performance and noted a lack of experts in this area. Patrick suggested exploring improvements to the pre-training samples and photon bomb simulations, and Ka Ming explained the challenges involved.

SirenTV Data Loader and Training Updates

Junjie presented updates on the SirenTV model, which aims to predict PMT waveforms using photon source and PMT positions as input. He described the model's architecture and recent changes, including an expanded dataset and model updates to address new challenges. On data loading, he explained that the old data loader was inefficient; the updated version uses multiprocessing for CPU loading, reducing transfer time from 4 seconds to 1 second with 4 workers. He also described enabling distributed training, which should ideally make training about n times faster with n GPUs. After 12 epochs, the loss had decreased to 10-94s(?), and the average bias in visibility and waveform had reached 5-6%. Junjie noted that the bias on the rise time was still far from the target, but that training was still in its early stages. He concluded with performance checks, including comparisons between predicted and actual visibilities, which showed good overall agreement but some discrepancies that may improve with further training or a different normalization method.
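
The timing figures quoted above can be checked with a few lines of arithmetic. This is a minimal sketch. The 4 s, 1 s and 4-worker numbers come from the minutes, while the 95% parallel fraction used in the Amdahl's law example is a hypothetical value chosen only to show how a serial component would cap the "n times faster with n GPUs" expectation.

```python
def speedup(t_before: float, t_after: float) -> float:
    """Observed speedup from a before/after timing pair."""
    return t_before / t_after


def parallel_efficiency(observed_speedup: float, n_workers: int) -> float:
    """Fraction of ideal linear scaling achieved (1.0 means perfect)."""
    return observed_speedup / n_workers


def amdahl_speedup(parallel_fraction: float, n_units: int) -> float:
    """Amdahl's law: speedup when only part of the work parallelizes."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_units)


# Data loader figures from the minutes: 4 s -> 1 s with 4 workers.
loader_speedup = speedup(4.0, 1.0)                          # 4.0
loader_efficiency = parallel_efficiency(loader_speedup, 4)  # 1.0

# Distributed training: ideal case versus a hypothetical 95% parallel fraction.
ideal_8_gpu = amdahl_speedup(1.0, 8)      # 8.0
partial_8_gpu = amdahl_speedup(0.95, 8)   # about 5.9
```

The reported loader numbers correspond to linear scaling across the 4 workers. Whether multi-GPU training scales as well depends on how much of each step (data transfer, gradient synchronization) stays serial.
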
Kazu and Junjie examined the statistical patterns in the photon visibility plots and identified potential geometric effects that may be affecting the results. The team agreed to revisit the approach for modeling the rising edge, since the current method likely produces its largest errors there, which is not ideal. Junjie mentioned difficulties accessing GPU nodes for training, and Kazu agreed to help find available resources.
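
One way to experiment with the supervision around the rising edge is a per-sample weighted loss. This is a hedged sketch and not the current SirenTV loss, which the minutes do not describe. The function name, the window half-width, and the edge weight are hypothetical parameters. An edge weight of 0 masks the window around T0 completely, 1 recovers a plain mean squared error, and a value above 1 emphasizes the rising edge.

```python
from typing import Sequence


def edge_weighted_mse(
    pred: Sequence[float],
    target: Sequence[float],
    t0_index: int,
    half_width: int,
    edge_weight: float,
) -> float:
    """Weighted mean squared error over waveform samples.

    Samples within `half_width` bins of `t0_index` get weight `edge_weight`;
    all other samples get weight 1. The result is normalized by the sum of
    the weights so that different settings stay comparable.
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    weighted_sum = 0.0
    weight_total = 0.0
    for i, (p, t) in enumerate(zip(pred, target)):
        w = edge_weight if abs(i - t0_index) <= half_width else 1.0
        weighted_sum += w * (p - t) ** 2
        weight_total += w
    if weight_total == 0.0:
        raise ValueError("all samples are masked out")
    return weighted_sum / weight_total


# Toy waveform with a single mismatched sample at the assumed T0 bin (index 2).
pred = [0.0, 0.0, 1.0, 0.0]
target = [0.0, 0.0, 0.0, 0.0]
plain = edge_weighted_mse(pred, target, 2, 0, 1.0)       # 0.25
masked = edge_weighted_mse(pred, target, 2, 0, 0.0)      # 0.0
emphasized = edge_weighted_mse(pred, target, 2, 0, 3.0)  # 0.5
```

Comparing the masked and emphasized settings on held-out waveforms would show how much of the total error sits in the window around T0.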