3/12/12: Tamara Sipes (UCSD ECE)


Multivariate Time Series Classification Using Temporal Metafeature Abstractions

Tamara B. Sipes, Ph.D.
Space Plasma Physics Lab, UCSD
SciberQuest, Inc.

Extraction of knowledge from massive and complex data sets poses a major obstacle to scientific progress, even more so when the data is in the form of time series. We demonstrate a new approach to the classification of multivariate time series data that combines an innovative feature extraction technique with a specialized data mining algorithm. The technique extracts global features and metafeatures in order to capture the necessary time-lapse information. The features are then used to create a static, intermediate data set that includes all the important time-varying information and is suitable for analysis using standard supervised data mining techniques. The viability of the new algorithm, called MineTool-TS, is demonstrated through its application to the problem of automatic detection of flux transfer events in spacecraft data and to the mining of simulation data. The technique has also been successfully applied to a variety of medical, biomedical, environmental and space physics data.
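
As a rough illustration of turning time series into a static data set, the following minimal Python sketch summarizes each channel of a multivariate series with a few global features. It is not MineTool-TS: the feature choice (mean, standard deviation, minimum, maximum, least-squares slope), the channel names, and the function names `global_features` and `to_static_row` are assumptions made for this example.

```python
from statistics import mean, pstdev


def global_features(series):
    """Summarize one channel (a list of floats) with a few global features."""
    n = len(series)
    t_mean = (n - 1) / 2  # mean of the time indices 0..n-1
    s_mean = mean(series)
    # Least-squares slope of value against time index (0.0 for a single sample).
    denom = sum((t - t_mean) ** 2 for t in range(n))
    slope = (
        sum((t - t_mean) * (x - s_mean) for t, x in enumerate(series)) / denom
        if denom
        else 0.0
    )
    return {
        "mean": s_mean,
        "std": pstdev(series),
        "min": min(series),
        "max": max(series),
        "slope": slope,
    }


def to_static_row(channels):
    """Flatten per-channel features into one row of a static data set."""
    row = {}
    for name, series in channels.items():
        for feat, value in global_features(series).items():
            row[f"{name}_{feat}"] = value
    return row


# Hypothetical two-channel recording; each channel becomes five columns.
example = {"bx": [1.0, 2.0, 3.0, 4.0], "bz": [0.5, 0.5, 0.5, 0.5]}
row = to_static_row(example)
```

Each labeled recording would yield one such row, and the resulting table could then be passed to any standard supervised learner.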

Tamara B. Sipes, Ph.D., is a researcher at the Space Physics lab at UCSD specializing in data mining, predictive modeling and computational algorithms. She has substantial industry and research experience in building highly innovative predictive models and creative solutions to complex problems. Her data mining expertise has been applied to a variety of data, industries and areas, including space physics, biotechnology, scientific, financial, robotic, educational and signal processing data. Her work has led to several patent applications and published research in the areas of data mining and learning technologies. In addition to her position at SciberQuest, Inc., Dr. Sipes is also an instructor at the University of California San Diego Extension, where she created and teaches several data mining courses.

Besides her appointment at UCSD, Dr. Sipes is Vice President of Analytics at SciberQuest, Inc., a company specializing in providing advanced solutions to the most complex computational and data analysis challenges facing the scientific world. These include the development of self-adaptive algorithms for modeling of complex, multi-scale problems; computational infrastructure for NASA's magnetospheric virtual observatory; specialized scientific simulations for the Air Force, including a first-ever multi-resolution, 3D, parallel, object-oriented, electromagnetic PIC (particle-in-cell) code capable of handling complex boundaries; and advanced data mining methods for predictive analytics of static, time series, image and simulation data. Dr. Sipes earned her Ph.D. in Computer Science from Vanderbilt University.

3/5/12: Doug White (UC Irvine Anthropology)

Networks, Causality and Evolution of Cooperation


Abstract: The 1927 Menger theorem proves the equivalence of a maximal k-cohesive set of nodes in a network to a maximal set of nodes in which all pairs of nodes are at least k-connected. We call these equivalent concepts "stru-cohesion." It has a record of powerful causal predictions in social networks. This talk will present six types of examples, and end with a discussion of how stru-cohesion outperforms existing rules for the evolution of cooperation in human groups.
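
To make the equivalence concrete, the sketch below checks Menger's theorem by brute force on a small graph: for a non-adjacent pair of nodes, the size of the smallest set of other nodes whose removal disconnects them equals the maximum number of paths between them that share no interior node. The function names and the example graph are assumptions for this illustration, and the exhaustive search is only practical for tiny graphs.

```python
from itertools import combinations


def connected(adj, s, t, removed=frozenset()):
    """Return True if t is reachable from s after deleting `removed` nodes."""
    seen, stack = {s}, [s]
    while stack:
        u = stack.pop()
        if u == t:
            return True
        for v in adj[u]:
            if v not in seen and v not in removed:
                seen.add(v)
                stack.append(v)
    return False


def min_separator_size(adj, s, t):
    """Size of the smallest node set (excluding s, t) that disconnects s and t."""
    others = [v for v in adj if v not in (s, t)]
    for k in range(len(others) + 1):
        for cut in combinations(others, k):
            if not connected(adj, s, t, frozenset(cut)):
                return k
    return None  # s and t are adjacent: no node set separates them


def simple_paths(adj, s, t, path=None):
    """Yield every simple path from s to t as a list of nodes."""
    path = [s] if path is None else path
    if s == t:
        yield list(path)
        return
    for v in adj[s]:
        if v not in path:
            path.append(v)
            yield from simple_paths(adj, v, t, path)
            path.pop()


def max_independent_paths(adj, s, t):
    """Largest number of s-t paths that share no interior node."""
    interiors = [frozenset(p[1:-1]) for p in simple_paths(adj, s, t)]
    for k in range(len(interiors), 0, -1):
        for combo in combinations(interiors, k):
            if all(a.isdisjoint(b) for a, b in combinations(combo, 2)):
                return k
    return 0


# Hypothetical undirected graph given as an adjacency list.
adj = {
    "a": ["b", "c", "d"],
    "b": ["a", "c", "e"],
    "c": ["a", "b", "e"],
    "d": ["a", "e"],
    "e": ["b", "c", "d"],
}
k_sep = min_separator_size(adj, "a", "e")
k_paths = max_independent_paths(adj, "a", "e")
```

For the pair ("a", "e") both quantities come out equal, as the theorem requires. A library routine based on maximum flow would be the practical choice for larger networks.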


Bio: Doug White is a mathematical anthropology faculty member at UCI and complexity scientist on the external faculty at SFI. He designed one of the major cross-cultural databases in the social sciences (SCCS), founded the prototype for the Kinsources datasite for the study of community social structure, leads a causality research group at SFI working on evolutionary causality, and co-authored algorithms for statistical entailment analysis, regular equivalence network role analysis, and stru-cohesion. He has published numerous articles and a number of books on social networks, mathematical sociology, the comparative network study of human kinship communities, world system economic networks, and the dynamics of global inter-urban networks. He founded the eScholarship World Cultures and Structure and Dynamics ejournals and continues to edit the latter.

http://eclectic.ss.uci.edu/~drwhite/



2/27/12: Lorenzo Torresani (Dartmouth)


Learning a Compact Image Code for Efficient Recognition of Novel Classes

Lorenzo Torresani

Assistant Professor of Computer Science
Dartmouth College

http://www.cs.dartmouth.edu/~lorenzo/home.html

Abstract: In this talk I will discuss methods enabling efficient object-class recognition in large image collections. We are specifically interested in scenarios where the classes to be recognized are not known in advance. The motivating application is "object-class search by example" where a user provides at query time a small set of training images defining an arbitrary novel category and the system must retrieve images belonging to this class from a large database. This application scenario poses challenging requirements on the system design: the object classifier must be learned efficiently at query time from few examples; recognition must have low computational cost with respect to the database size; finally, compact image descriptors must be used to allow storage of large collections in memory.

We propose to address these requirements by learning a compact image code optimized to yield good categorization accuracy with linear (i.e., efficient) classifiers: even when the representation is compressed to less than 300 bytes per image, linear classifiers trained on our descriptor yield accuracy matching the state-of-the-art but at orders of magnitude lower computational cost.



2/20/12: No seminar--Presidents' day


2/13/12: Lilia Iakoucheva (UCSD Psychiatry): Prediction of protein post-translational modifications using machine learning approaches

Prediction of protein post-translational modifications using machine learning approaches


Abstract: I will describe how machine learning approaches can help biologists predict the sites of post-translational modifications in proteins, using two examples: phosphorylation and ubiquitination. I will also briefly summarize the ongoing systems biology projects in my lab.


Bio: Dr. Lilia Iakoucheva received her PhD degree from the Institute of Immunology, Moscow, Russia. After completing postdoctoral training in protein biochemistry and protein structure/intrinsic disorder analysis, she joined The Rockefeller University (New York, NY) as a Research Assistant Professor, and then the faculty of the UCSD Department of Psychiatry as an Assistant Professor. Dr. Iakoucheva is applying her experience in protein structure and protein-protein interaction analysis to the investigation of psychiatric disorders. Her research focuses on understanding the molecular basis of psychiatric diseases using systems biology approaches. Dr. Iakoucheva has been the principal investigator on research grants from NSF, NCI, NICHD, and NIMH.


http://psychiatry.ucsd.edu/faculty/lIakoucheva.html

2/6/12: Joseph Barr: Risk Scoring and Future Directions (ID Analytics, Inc.)

Risk Scoring and Future Directions

Part 1: ID Analytics' main business is scoring applications (for credit/services) for risks including identity/authenticity & credit. By definition an application is a vector of identity elements (SSN, Name, Address, Phone, DOB, more), a vector known as “SNAPD”, as well as additional fields. ID Analytics processes the data, extracts pertinent features and calculates a risk score on the fly. The entire process has sub-second latency. At the basis of our analytics is the ID Network – a virtual graph with SNAPD-vectors as nodes. One can envision making a connection between two nodes if they share some identity element. The weight of the edge is the strength of the connection. As one can imagine, various graphical parameters are the predominant inputs to our risk models. At the time I write this, the ID Network has 1.5 billion nodes (corresponding to the number of transactions); this of course means that the graph is too large to be stored in memory, and needless to say, how we do it is a trade secret, but I will indicate some principles behind the ideas.
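
The graph construction described in Part 1 can be illustrated with a toy sketch. This is not ID Analytics' implementation, which the abstract describes as a trade secret; the records, the field names, and the choice of edge weight as the count of shared identity elements are all assumptions made for illustration.

```python
from itertools import combinations

FIELDS = ("ssn", "name", "address", "phone", "dob")  # the SNAPD elements


def build_network(applications):
    """Return {(i, j): weight} for every pair of applications sharing a field.

    The weight is the number of identity elements the two applications have
    in common, a hypothetical stand-in for "strength of the connection".
    """
    edges = {}
    for (i, a), (j, b) in combinations(enumerate(applications), 2):
        shared = sum(
            1 for f in FIELDS if a.get(f) is not None and a.get(f) == b.get(f)
        )
        if shared:
            edges[(i, j)] = shared
    return edges


# Made-up applications; indices in the list serve as node identifiers.
applications = [
    {"ssn": "111", "name": "A. Smith", "address": "1 Elm St",
     "phone": "555-0100", "dob": "1980-01-01"},
    {"ssn": "111", "name": "A. Smith", "address": "9 Oak Ave",
     "phone": "555-0199", "dob": "1980-01-01"},
    {"ssn": "222", "name": "B. Jones", "address": "9 Oak Ave",
     "phone": "555-0142", "dob": "1975-06-30"},
]
edges = build_network(applications)
```

Comparing all pairs is quadratic in the number of applications, so a graph of the size mentioned above would need an indexed or otherwise implicit representation rather than this direct approach.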

Part 2: The risk that ID Analytics scores falls under the more general rubric of consumer behavior. We are interested in the spatial / temporal aspects of our network and how it relates to macroeconomic and social data including demographics, geography, housing, census, interest rates, unemployment, federal deficit, foreign balance of trade and whatnot. Under certain conditions, we will make our data available to an outside organization to participate in publishable research.

Introducing id:a labs, a research-oriented organization which promotes collaborations with academia and other research institutions.
 
Bio: Joseph Barr is the Chief Scientist at ID Analytics (www.idanalytics.com). After a few years in academia (as assistant professor at California Lutheran University), he has spent the past 16+ years in industry as a risk & consumer behavior (analytics) professional. He was awarded a Ph.D. in mathematics from the University of New Mexico in 1991 for his work on graph colorings. His current interests include the application of statistics, machine learning and combinatorial algorithms to risk management and consumer behavior.

http://www.linkedin.com/in/barranalytics

1/30/12: Vaclav Petricek: Data-driven Matchmaking at Scale (eHarmony, Inc.)

Data-driven Matchmaking at Scale


Note start time is 12:15pm, because of the 11am talk by Andrew Ng; see http://www.cs.ucsd.edu/node/2096


Abstract: Nearly 5% of all US marriages are created by eHarmony. I will talk about the tech that stands behind this and how eHarmony is different from a typical dating site. I will describe the three main components of eHarmony's approach. First, I will discuss the models for predicting deep psychological compatibility. I will then show how we use large-scale machine learning to learn models of affinity based on user behavior, demographics, interests, etc., and show some insights into what makes a match more likely to succeed. Finally, I will demonstrate how we use graph optimization to choose the best matches to deliver every single day.

eHarmony iPad app demo:
http://www.youtube.com/watch?v=zQE-ILMmqDs


Bio: Vaclav Petricek is a Principal Data Scientist at Santa Monica-based eHarmony, where he is responsible for optimization and machine learning for eHarmony's core matchmaking algorithms. He also runs a series of invited ML talks at eHarmony, part of the Los Angeles Machine Learning Meetup. Prior to eHarmony, Vaclav was a Visiting Researcher at University College London, where his research spanned recommender systems, social networks, web structure and online auctions. Prior to that, he worked at several Czech internet startups. Vaclav earned his PhD in Computer Science and a Masters in Distributed Systems from Charles University in Prague.

http://www.occamslab.com/petricek/

1/23/12: Lars Kai Hansen (Technical University of Denmark)

Learning from small samples in high dimensions


Abstract: I will discuss recent progress in coping with variance inflation in high-dimensional unsupervised learning (PCA and kPCA). Small-sample, high-dimensional principal component analysis (PCA) suffers from variance inflation and lack of generalizability. It has earlier been pointed out that a simple leave-one-out variance renormalization scheme can cure the problem. We have generalized the cure in two directions: first, we propose a computationally less intensive approximate leave-one-out estimator; second, we show that variance inflation is also present in kernel principal component analysis (kPCA), and we provide a non-parametric renormalization scheme which can quite efficiently restore generalizability in kPCA. As for PCA, our analysis also suggests a simplified approximate expression. Finally, I present evidence that these ideas may also be relevant for high-dimensional supervised learning with support vector machines.

Reference: A Cure for Variance Inflation in High Dimensional Kernel Principal Component Analysis by T. J. Abrahamsen and L.K. Hansen, Journal of Machine Learning Research 12:2027-2044 (2011).


Bio: Professor Lars Kai Hansen is the director of the THOR Center for Neuroinformatics and the Head of the Section for Cognitive Systems at DTU Informatics at the Technical University of Denmark.


http://www.imm.dtu.dk/~lkh

1/16/12: No seminar--MLK day


1/9/12: Dhruv Batra: Focused Inference in Markov Random Fields with Local Primal-Dual Gaps (Toyota Technological Institute at Chicago)

Focused Inference in Markov Random Fields with Local Primal-Dual Gaps

A large number of problems in computer vision, computational biology and robotics can be formulated as the search for the most probable state under a discrete probabilistic model -- known as the MAP inference problem in Markov Random Fields (MRFs).
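
To make the MAP inference problem concrete, here is a minimal sketch that finds the most probable labeling of a tiny chain-structured MRF by exhaustive search. Minimizing the energy E is the same as maximizing a probability proportional to exp(-E). The energies, the Potts-style pairwise term, and the function names are assumptions for this example; the search is exponential in the number of nodes and only defines the quantity that practical algorithms approximate or compute faster.

```python
from itertools import product


def energy(labels, unary, pairwise_weight):
    """Energy of a labeling on a chain: unary costs plus a Potts penalty."""
    e = sum(unary[i][l] for i, l in enumerate(labels))
    # Potts term: pay `pairwise_weight` whenever neighbors disagree.
    e += sum(pairwise_weight for a, b in zip(labels, labels[1:]) if a != b)
    return e


def map_brute_force(unary, pairwise_weight):
    """Return the minimum-energy (most probable) labeling and its energy."""
    n_labels = len(unary[0])
    best = min(
        product(range(n_labels), repeat=len(unary)),
        key=lambda labels: energy(labels, unary, pairwise_weight),
    )
    return best, energy(best, unary, pairwise_weight)


# Three nodes, two labels; unary[i][l] is the cost of giving node i label l.
unary = [[0.0, 2.0], [1.2, 1.0], [0.0, 2.0]]
labels, e = map_brute_force(unary, pairwise_weight=1.0)
```

In this example the middle node slightly prefers label 1 on its own, but the pairwise term makes the all-zeros labeling the MAP state.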

While a lot of progress has been made on the "static" version of this problem, a number of situations require dynamic inference algorithms that must adapt and reorder computation to focus on "important" parts of the problem. In this talk I will describe one measure for identifying such important parts of the problem -- called Local Primal-Dual Gaps (LPDG). LPDG is based on complementary slackness conditions in the Primal-Dual pair of Linear Programs (LP) in the LP relaxation of MAP inference. We have found LPDG to be useful in a number of situations -- speeding up message-passing algorithms by re-ordering message computations (Tarlow et al. ICML '11), speeding up alpha-expansion by re-ordering label sweeps (Batra & Kohli CVPR '11) and adaptive tightening of the standard LP relaxation by choosing important constraints to add (Batra et al. AISTATS '11).

Time permitting, I will also talk about our recent work on the M-Best-Mode problem, which involves extracting not just the most probable solution, but also a /diverse/ set of top M most probable solutions in discrete graphical models. 

The talk is meant to be accessible to a broad audience. No background in MRFs or discrete optimization is assumed. 

Joint work with Pushmeet Kohli (MSRC), Vladimir Kolmogorov (IST), Sebastian Nowozin (MSRC), Greg Shakhnarovich (TTIC), Daniel Tarlow (UToronto) and Payman Yadollahpour (TTIC).


Bio: Dhruv Batra is a Research Assistant Professor at Toyota Technological Institute at Chicago (TTIC), a philanthropically endowed academic computer science institute affiliated with the University of Chicago. He received his M.S. and Ph.D. degrees from Carnegie Mellon University in 2007 and 2010 respectively, advised by Tsuhan Chen. In the past, he has held visiting positions at Cornell University and MIT. 

His research interests include machine learning, computer vision and applications of combinatorial optimization algorithms to learning and vision tasks. Specifically, he is interested in structured prediction, MAP inference in MRFs, max-margin methods, co-segmentation in multiple images, and interactive 3D modelling. 
http://ttic.uchicago.edu/~dbatra/