Monday, January 14, 2013

New Challenges in Computer Science Research




Yesterday afternoon at the 2012 Computer Science Faculty Summit, there was a round of lightning talks addressing some of the research problems faced by Google across several domains. The talks pointed out some of the biggest challenges emerging from increasing digital interaction, which is this year’s Faculty Summit theme.

Research Scientist Vivek Kwatra kicked things off with a talk about video stabilization on YouTube. The popularity of mobile devices with cameras has led to an explosion in the amount of video people capture, which can often be shaky. Vivek and his team have found algorithmic approaches to make casual videos look more professional by simulating professional camera moves. Their stabilization technology vastly improves the quality of amateur footage.

Next, Ed Chi (Research Scientist) talked about social media, focusing on the experimental circle model that characterizes Google+. Ed is particularly interested in how social interaction on the web can be designed to mimic live communication. Circles on Google+ allow users to manage their audience and share content in a targeted fashion, which reflects face-to-face interaction. Ed discussed how, from an HCI perspective, the challenge going forward is to consider the trinity of social media: context, audience, and content.

John Wilkes, Principal Software Engineer, talked about cluster management at Google and the challenges of building a new cluster manager, that is, an operating system for a fleet of machines. Everything at Google is big, and a consequence of operating at such tremendous scale is that machines are bound to fail. John’s team is working to make things easier for internal users, which increases our ability to respond to more system requests. There are several hard problems in this domain, such as handling configuration, making it as easy as possible to run a binary, increasing failure tolerance, and helping internal users understand their own needs as well as the behavior and performance of their systems in our complicated distributed environment.

Research Scientist and coffee connoisseur Alon Halevy took to the podium to confirm that he did indeed author an empirical book on coffee, and also talked with attendees about structured data on the web. Structured data consists of hundreds of millions of (relatively small) tables of data, and Alon’s work is focused on enabling data enthusiasts to discover and visualize those data sets. Great possibilities open up when people start combining data sets in meaningful ways, which inspired the creation of Fusion Tables. An example is a map made in the aftermath of the 2011 earthquake and tsunami in Japan that shows natural disaster data alongside the locations of the world’s nuclear plants. Moving forward, Alon’s team will continue to think about interesting things that can be done with data, and the techniques needed to distinguish good data from bad data.

To wrap up the session, Praveen Paritosh did a brief but deep dive into the Knowledge Graph, an intelligent model that understands real-world entities and their relationships to one another (things, not strings), which launched earlier this year.

The Google Faculty Summit continued today with more talks and breakout sessions centered on our theme of digital interaction. Check back for additional blog posts in the coming days.

Machine Learning Book for Students and Researchers




Our machine learning book, Foundations of Machine Learning, is now published! The book, with authors from both Google Research and academia, covers a large variety of fundamental machine learning topics in depth, including the theoretical basis of many learning algorithms and key aspects of their applications. The material originated in a machine learning graduate course, "Foundations of Machine Learning", taught by Mehryar Mohri over the past seven years, and has benefited considerably from comments and suggestions from students and colleagues at Google.

The book can serve as a textbook for both graduate students and advanced undergraduate students, and as a reference manual for researchers in machine learning, statistics, and many other related areas. As a supplement, it includes introductory material on topics such as linear algebra and optimization and other useful conceptual tools, as well as a large number of exercises at the end of each chapter, with full solutions provided online.

Better table search through Machine Learning and Knowledge




The Web offers a trove of structured data in the form of tables. Organizing this collection of information and helping users find the most useful tables is a key mission of Table Search from Google Research. While we are still a long way from the perfect table search, we recently took a few steps forward by revamping how we determine which tables are "good" (ones that contain meaningful structured data) and which are "bad" (for example, a table that holds the layout of a Web page). In particular, we switched from a rule-based system to a machine learning classifier that can tease out subtleties from the table features and enables rapid iterations on quality improvements. This new classifier is a support vector machine (SVM) that makes use of multiple kernel functions, which are automatically combined and optimized using training examples. Several of these kernel combining techniques were in fact studied and developed within Google Research [1,2].
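To give a feel for what "combining kernels using training examples" can look like, here is a minimal sketch in plain Python. It is not the Table Search classifier: it uses a simple heuristic in the spirit of [1], weighting each base kernel in proportion to its centered alignment with the labels. The two table features, the toy data, and all function names are hypothetical and chosen only for illustration.

```python
import math

def linear_kernel(xs):
    """Gram matrix of dot products between feature vectors."""
    return [[sum(a * b for a, b in zip(u, v)) for v in xs] for u in xs]

def rbf_kernel(xs, gamma=1.0):
    """Gram matrix of Gaussian (RBF) similarities."""
    return [[math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))
             for v in xs] for u in xs]

def center(K):
    """Center a Gram matrix: subtract row and column means, add back the grand mean."""
    n = len(K)
    row_means = [sum(row) / n for row in K]
    col_means = [sum(K[i][j] for i in range(n)) / n for j in range(n)]
    grand_mean = sum(row_means) / n
    return [[K[i][j] - row_means[i] - col_means[j] + grand_mean
             for j in range(n)] for i in range(n)]

def frobenius(A, B):
    """Frobenius inner product of two matrices of the same shape."""
    return sum(a * b for ra, rb in zip(A, B) for a, b in zip(ra, rb))

def centered_alignment(K, labels):
    """Alignment between the centered kernel and the centered label kernel y y^T."""
    Kc = center(K)
    Yc = center([[yi * yj for yj in labels] for yi in labels])
    denom = math.sqrt(frobenius(Kc, Kc) * frobenius(Yc, Yc))
    return frobenius(Kc, Yc) / denom if denom > 0 else 0.0

def combine_kernels(kernels, labels):
    """Weight each base kernel by its (non-negative) alignment, then sum them."""
    scores = [max(centered_alignment(K, labels), 0.0) for K in kernels]
    total = sum(scores)
    if total == 0:
        weights = [1.0 / len(kernels)] * len(kernels)  # fall back to uniform weights
    else:
        weights = [s / total for s in scores]
    n = len(labels)
    combined = [[sum(w * K[i][j] for w, K in zip(weights, kernels))
                 for j in range(n)] for i in range(n)]
    return weights, combined

# Hypothetical features per table: (fraction of numeric cells, header consistency).
features = [(0.9, 0.8), (0.8, 0.9), (0.7, 0.7), (0.1, 0.2), (0.2, 0.1), (0.1, 0.1)]
labels = [1, 1, 1, -1, -1, -1]  # +1 = "good" table, -1 = "bad" table

base_kernels = [linear_kernel(features), rbf_kernel(features, gamma=2.0)]
weights, combined = combine_kernels(base_kernels, labels)
print("kernel weights:", weights)
```

The combined Gram matrix could then be handed to any SVM implementation that accepts a precomputed kernel. The techniques in [1,2] learn the weights by solving an optimization problem rather than by this simple proportional rule.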

We are also able to achieve a better understanding of the tables by leveraging the Knowledge Graph. In particular, we improved our algorithms for identifying the context and topics of each table, the entities represented in the table and the properties they have. This knowledge not only helps our classifier make a better decision on the quality of the table, but also enables better matching of the table to the user query.

Finally, you will notice that we added an easy way for our users to import Web tables found through Table Search into their Google Drive account as Fusion Tables. Now that we can better identify good tables, the import feature enables our users to further explore the data. Once in Fusion Tables, the data can be visualized, updated, and accessed programmatically using the Fusion Tables API.

These enhancements are just the start. We are continually improving the quality of Table Search and adding features to it.

Stay tuned for more from Boulos Harb, Afshin Rostamizadeh, Fei Wu, Cong Yu and the rest of the Structured Data Team.


[1] Algorithms for Learning Kernels Based on Centered Alignment
[2] Generalization Bounds for Learning Kernels