We look forward to MLonMB on November 10, 2021, from 9am to 3:30pm at the UCSC Hay Barn (coffee and fruit will be available from 8am to 9am). Our final requests before the meeting are described below. They cover your dietary requirements, parking, and a few other items.
Logistical info
This Google doc describes logistics such as parking and the COVID protocol. We will update it as questions come in.
Please fill out this form with your dietary information and any other relevant details. We request this information by October 20th.
Introductory Slide
We are asking each participant to provide a single slide introducing themselves. At a minimum, please provide a photo, a brief bio, and a few interesting tidbits related to your attendance at the meeting.
Here is an example from one of the organizers.
You can copy that Google Slide and edit it, or create your own PDF with any software. We will compile everyone's slides into a single deck and share it with all attendees.
We request this slide by November 1st. Send it to cjhangen@ucsc.edu with subject “MLonMB slide”.
Agenda
A draft agenda is below. It emphasizes interaction, i.e., meeting each other and holding informal discussions.
Talks
Time | Activity | Participants/Notes
8-9am | Coffee/fruit | All
9-9:05am | Welcome (refer to the attendee intro slide packet; intro to idea boards) | All
9:05-9:30am | Introductory Bingo | All
9:30-10:30am | Data Talks (10 x 6 minutes): who I am and represent; data I have; why I obtain it; what I aim to do with it | Capture on idea boards
10:30-10:48am | Break |
10:48am-12pm | ML Talks (12 x 6 minutes): who I am and represent; technical expertise; what it is useful for; data I'd love to work on | All
12-1pm | Lunch; review idea boards | All; Organizers
1-2pm | Results from idea boards; break into smaller groups | Capture on idea boards focused on "areas of interest"
2-3pm | Idea group discussion / walk-around? |
3:15-3:30pm | Wrap-up + future engagement | Avoid Friday!
3:30pm-end | Collaboration building; hike (for those who want); otherwise at site |
When: 9am PT, noon ET, 6pm CET
How: http://MLclub.net
Who:
- Helena Dominguez-Sanchez (ICE)
- Sandra M. Faber (UCSC)
- Josh Peek (STScI)
- Simon D.M. White (MPA)
Title: “Galaxy Morphology in the machine learning era: revolution or incremental science?”
When: 9am PT, noon ET, 5pm CET
How: http://MLclub.net
Who:
- Gary Bernstein (UPenn)
- Olivier Ilbert (LAM, Marseille)
- Alex Malz (Ruhr U., Bochum)
- Emmanuel Bertin (LAP, Paris)
Title: “Will Machine Learning solve photometric redshifts?”
Abstract: An online debate on the challenges of photometric redshift estimation and the role Machine Learning can play in wide-field surveys.
When: 9am PT, noon ET, 6pm CET
How: http://MLclub.net
Who:
- Julia Kempe (NYU Center for Data Science)
- David Spergel (Flatiron Institute)
- Alex Szalay (Johns Hopkins Univ.)
- J. Xavier Prochaska (UC Santa Cruz, AAII)
Title: “How should ML penetrate the natural sciences? Do we need ML institutes?”
Abstract: An online debate on the role of ML institutes / Data Science centers in research and education.
Save the dates:
March 23, 9am – 12pm.
March 26, 9am – 12pm.
Check back with us. Details forthcoming!
When/where: March 18, 2020 at 12pm PT; Zoom only — https://ucsc.zoom.us/j/562952785
Title: Deep Learning for Predicting Domain Prices
Speaker: Jason Ansel, Distinguished Engineer at GoDaddy
Learn how GoDaddy uses neural networks to predict the price of a domain name in the aftermarket. GoDaddy Domain Appraisals (GoValue) is available to millions of GoDaddy customers and provides estimated values to help both buyers and sellers price domain names more effectively. GoValue is 1.25x better than human experts at predicting past domain name sale prices.
This talk will explain the hybrid recurrent neural networks behind GoValue. It will discuss some of the practical aspects of scaling and deploying a sophisticated machine learning system. Finally, we will dive into recent research at GoDaddy that created a new neural network structure for outputting tighter prediction intervals than preexisting techniques.
Try GoDaddy Domain Appraisals for yourself:
https://www.godaddy.com/domain-value-appraisal
When/where: E2-215 at 12pm
Presenter: Jaehoon Lee (Google Brain)
Title: Everything you wanted to know about batch size (in neural net training) but were afraid to ask
Abstract: Recent hardware developments have made unprecedented amounts of data parallelism available for accelerating neural network training. Among the simplest ways to harness next-generation accelerators is to increase the batch size in standard mini-batch neural network training algorithms. In this work, we aim to experimentally characterize the effects of increasing the batch size on training time, as measured by the number of steps necessary to reach a goal out-of-sample error. Eventually, increasing the batch size will no longer reduce the number of training steps required, but the exact relationship between the batch size and how many training steps are necessary is of critical importance to practitioners, researchers, and hardware designers alike. We study how this relationship varies with the training algorithm, model, and data set, and find extremely large variation between workloads. Along the way, we reconcile disagreements in the literature on whether batch size affects model quality. Finally, we discuss the implications of our results for efforts to train neural networks much faster in the future.
Reference: https://arxiv.org/abs/1811.03600
When/where: E2-215 at 12pm
Presenter: David Haan (PBSE)
Title: LURE (Learning UnRealized Events): Finding New (or Equivalent) Driver Mutation Events using Supervised Machine Learning
Cancer is a genetic disease typically resulting from an accumulation of mutations. Mutations in normal cells generally lead to repair or cell suicide. In cancer cells, the mutations accumulate, leading to uncontrolled growth known as a tumor. There are two broadly defined types of mutations: driver mutations and passenger mutations. Tumors contain around 2-5 driver mutations, which cause and accelerate cancer, and about 10-200 passenger mutations, which are accidental byproducts of thwarted DNA repair mechanisms. Driver mutations define the tumor and its subtype, and they are therapeutic targets.
The Cancer Genome Atlas (TCGA) is a publicly accessible atlas of cancer-related data from the National Cancer Institute (NCI). It is a comprehensive analysis of 9000 patients and 33 cancer subtypes, cataloging mutation data, DNA, mRNA, methylation, and protein expression. In particular, the TCGA study of Papillary Thyroid Carcinoma identified two subtypes: one harboring mutations in BRAF, and the other more RAS-like, with mutations in KRAS, NRAS, or HRAS. The study identified driver mutations, whether BRAF or H/K/NRAS, in about 95% of the samples, leaving about 5% with no known driver mutations. Here we present a tool, "Learning UnRealized Events" (LURE), designed to identify driver mutations in those samples without known driver mutations.
No formal presentation this week. Attendees are encouraged to bring ~5 minutes of material to discuss, such as their own research or a paper that has excited them.
Pizza will be provided.
Speaker: Majid Moghadam (CS)
Title: Tactical Decision Making for Autonomous Driving Using Deep Reinforcement Learning
Abstract: Following recent advances in AI, autonomous driving has gained considerable attention in both academia and industry. The classical paradigm for autonomous driving is a hierarchical architecture of perception, planning, and control, but recent progress in deep learning points to AI-based approaches as an alternative solution to the problem. Companies are pushing hard to produce the first fully autonomous self-driving cars. Approaches ranging from end-to-end deep learning techniques to multi-layer hierarchical architectures are being taken to achieve this goal. In most of these approaches, advanced driver-assistance systems (ADAS) play a pivotal role in enhancing driving intelligence. Our work focuses mainly on the decision-making layer of ADAS. High-level decision making is a critical ADAS feature that involves several challenges, such as uncertainty in other drivers' behaviors and the trade-off between safety and agility. In this work, we develop a novel simulation environment that emulates these challenges and train a deep reinforcement learning agent that yields consistent performance in a variety of dynamic and uncertain traffic scenarios.
Pizza will be provided.