Artificial Intelligence (AI) Seminar

Finn O'Shea (SLAC) - Cohort Organized Learning (CoOL): Getting Robots to Cluster your Data for You

America/Los_Angeles
41/2-2162 - Sonoma (SLAC)

Description

SLAC already has, or is starting to create, enormous scientific datasets. A key problem with scientific data is extracting the data scientists want without bringing along the irrelevant data, which is usually far more numerous. This typically requires organizing the data in some way, and one way to do that is to cluster it. Clustering may also reveal things about the data that the user did not expect. Classical clustering algorithms suffer from the curse of dimensionality and may require significant preprocessing of, for example, time series data. Cohort Organized Learning (CoOL) aims to sidestep these issues by using neural networks to cluster directly in high dimensions, on data of any type appropriate for a neural network. In this presentation, I motivate and describe how CoOL works, show it clustering some benchmark datasets, explain how to evaluate convergence during training without labels, and show how to get a sense of the stability of the clusters it generates. I will end with a discussion of potential future work.

Zoom: https://stanford.zoom.us/j/92002426906?pwd=3aO4cROZa0RNckj7tmQyXZxk0v9GOZ.1