In this talk I will show how complex video analysis can be performed by exploiting the internal redundancy within a single video. Composing internal chunks of the visual data makes it possible to draw sophisticated inferences about dynamic scenes and events we have never seen before, in a totally unsupervised way, with no prior examples or training data.
I will show the power of this approach through a variety of example problems (as time permits), including:
1. Segmentation of unconstrained videos and images.
2. Unsupervised discovery of new video categories.
3. Detection of complex objects and actions (with no prior examples or training).
Michal Irani is a Professor at the Weizmann Institute of Science, in the Department of Computer Science and Applied Mathematics. She received a B.Sc. degree in Mathematics and Computer Science from the Hebrew University of Jerusalem, and M.Sc. and Ph.D. degrees in Computer Science from the same institution. During 1993-1996 she was a member of the Vision Technologies Laboratory at the Sarnoff Research Center (Princeton). She joined the Weizmann Institute in 1997. Michal's research interests center around computer vision, image processing, and video information analysis. Michal's prizes and honors include the David Sarnoff Research Center Technical Achievement Award (1994), the Yigal Allon three-year Fellowship for Outstanding Young Scientists (1998), the Morris L. Levinson Prize in Mathematics (2003), and the Maria Petrou Prize (awarded by the IAPR) for outstanding contributions to the fields of Computer Vision and Pattern Recognition (2016). She received the ECCV Best Paper Award in 2000 and in 2002, and was awarded the Honorable Mention for the Marr Prize in 2001 and in 2005.
IBM Watson tantalized the world when it beat two grand champions at the game of Jeopardy in 2011. At the time, Watson was able to understand natural language and answer sophisticated trivia questions. Watson has since been deployed in numerous commercial applications, including in the healthcare, education, financial, and IoT domains. It has also been "given" additional senses, including the ability to understand spoken language, to speak, and to see. In this talk I will focus on Watson's seeing technologies. In particular, I will highlight capabilities in the areas of image tagging, medical imaging, security, video analytics, and augmented reality. I will describe the underlying technology as well as some of the novel use cases developed by IBM, by business partners, and by other entrepreneurs.
Dr. Aya Soffer is Director of Cognitive Analytics and Solutions Research at IBM. In this role, Dr. Soffer directs the worldwide research strategy and works with IBM's product groups and customers to drive research innovation into the market. Her focus is on applications and solutions leveraging cognitive technology that interacts naturally with people to extend what either humans or machines could do on their own. In her 16 years at IBM, Dr. Soffer has led several strategic initiatives that resulted in both breakthrough technology and contributions to IBM products and solutions, most notably IBM Watson. These include search technology, social analytics solutions, and multimedia analytics solutions in the areas of healthcare, security, telco, and commerce.
At the IMVC 2015 and IMVC 2016 meetings I gave technical talks about Machine Learning (Wavelet Decomposition of Random Forests) and Deep Learning (Deep Learning on Manifolds). This year I have been asked to give an overview talk and share my perspective on the massively disruptive Artificial Intelligence (AI) wave. Some of the topics that will be discussed are:
Shai serves as the Head of AI at WIX and is an associate visiting professor at the School of Mathematics, Tel Aviv University.
Deep learning plays a key role in the race to autonomous vehicles. Although remarkable progress has been made, the vast majority of both existing theories and technologies have yet to transition to real-world scenarios, which introduce a huge variety of road, weather and lighting conditions as well as deviations of driver behavior.
Nexar is building the world's largest open vehicle-to-vehicle (V2V) network by turning smartphones into connected AI dash-cams. Combining deep learning with millions of crowdsourced driving miles collected by our users, Nexar's technology provides a new, safer driving experience with the potential to save the lives of the 1.3 million people who die on the roads every year.
In this talk, we will share our journey to make the roads safer. We will examine some of the challenges we face, from a real-time collision avoidance system to learning autonomous driving policies for a safer driving experience. We will also share some of our joint research with the Berkeley Deep-Drive (BDD) Industry Consortium and present the Nexar challenge, in which we open some of our deep learning challenges to the outside world and invite aspiring researchers to test their chops, win prizes, and join our mission to free the world of car accidents.
Ilan Kadar is the Director of Deep Learning at Nexar. Ilan leads the deep learning team and the effort to apply Nexar's large-scale datasets of real-world driving environments to automotive applications. Prior to Nexar, Ilan led the deep learning group at Cortica, where he was responsible for building the company's machine vision technology. Ilan received his BSc, MSc, and PhD degrees in computer science from the Ben-Gurion University of the Negev, Israel, in 2006, 2008, and 2012, respectively (summa cum laude). His research thesis focused on machine learning algorithms for scene recognition and image retrieval, while employing insights from behavioral and psychophysical experiments. His work was published in leading conferences and journals in the area of machine vision, and he was awarded the best research project award at IMVC 2013, the Intel award for excellent Israeli PhD students in 2012, and the Friedman award for outstanding PhD students in 2012.
A revolution has started…
Robots have traditionally been confined to factories, behind steel fences, for the safety of the workers around them. That is changing: thanks to advanced perception capabilities, robots are now getting ready to be part of our day-to-day lives.
For a researcher, joining this revolution is not a simple task. There is a steep learning curve to get into robotics. It is complicated, and it involves working with many devices and different technologies. What if creating perceptive robots were easy and fun? Just plug and play. What would you do?
Join us to hear more about Intel® Euclid™ - an innovative all-in-one perception device that allows beginners and professionals to use advanced vision capabilities in their robotic projects.
Amit Moran leads the Robotics Innovation team as part of Intel's RealSense Innovation Lab in Israel. Over the past 5.5 years in the Innovation Lab, Amit has led several novel experiences and usages built on Intel's RealSense™ program, inventing, designing, and prototyping innovative concepts for interacting with our computing devices. Currently, Amit leads an R&D team focused on 3D vision in robotics. Amit holds an M.Sc. degree in Software Engineering from the Institut National des Sciences Appliquées (INSA) de Lyon, France. He is a co-author of several patents in the perceptual computing (NUI) domain.
Reconstruction of a full dynamic scene using a monocular moving camera requires reconstruction of both static and dynamic objects. We present a generic approach for simultaneous estimation of shape, location and velocity for static and moving objects using graph optimization. The main challenge is the unknown scale of each moving object; this is handled with a good initial depth estimation. For validation we created a dynamic simulation of an urban environment. On this simulation our method yields 13% velocity error when the camera is mounted on a car, and 2.8% velocity error when it is mounted on a drone.
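To make the role of the initial depth estimate concrete, here is a minimal, self-contained sketch (a toy setup of my own, not the authors' graph-optimization system): a single point moving with constant velocity is observed by a monocular camera with known, metric ego-motion, and its start position and velocity are recovered by nonlinear least squares over reprojection errors, initialized by back-projecting the first observation at a coarse depth guess. The camera path, noise level, and SciPy solver are all assumptions for illustration.

```python
# Toy sketch (not the authors' system): recover a moving point's constant
# velocity from monocular observations, given known camera ego-motion and a
# coarse initial depth guess to anchor the weakly constrained object scale.
import numpy as np
from scipy.optimize import least_squares

rng = np.random.default_rng(0)
times = np.arange(10, dtype=float)

# Known camera positions (metric ego-motion); the slight curve keeps the
# moving-object scale observable in this toy configuration.
cam_pos = np.stack([np.array([0.2 * t, 0.0, 0.03 * t * t]) for t in times])

# Ground-truth moving point: start position and constant velocity (m, m/s).
p0_true, v_true = np.array([1.0, 0.5, 10.0]), np.array([0.3, 0.0, -0.2])

def project(p_world, cam):
    # Unit-focal pinhole projection; camera axes aligned with world axes.
    rel = p_world - cam
    return rel[:2] / rel[2]

obs = np.array([project(p0_true + t * v_true, c) for t, c in zip(times, cam_pos)])
obs += 0.001 * rng.standard_normal(obs.shape)          # observation noise

def residuals(params):
    p0, v = params[:3], params[3:]
    pred = np.array([project(p0 + t * v, c) for t, c in zip(times, cam_pos)])
    return (pred - obs).ravel()

# Initialize by back-projecting the first observation at a coarse depth guess
# (the "good initial depth estimation" that handles the unknown scale).
depth_guess = 8.0
p0_init = cam_pos[0] + np.append(obs[0], 1.0) * depth_guess
sol = least_squares(residuals, np.concatenate([p0_init, np.zeros(3)]))
print("estimated velocity:", np.round(sol.x[3:], 2), "true:", v_true)
```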
I received my B.Sc. in computer engineering from the Hebrew University, specializing in electro-optics and microelectronics. In 2012 I received my M.Sc. in applied physics, specializing in nanotechnology. I currently work at Rafael's Jerusalem site as a computer vision algorithm developer.
The task of category independent foreground segmentation in images is challenging for a machine learning system, because it needs to learn the general concept of an object, even for object categories that it hasn’t seen during training. In the case of foreground segmentation in videos, the problem is compounded by the fact that the object as well as the background change appearance throughout the video. We propose a method for learning the general concept of object appearance in videos, based on deep neural networks. Apart from learning the object appearance for each frame, our system learns the temporal changes between frames in a video, which represent the object motion, and thus leverages the temporal information available in videos. By learning a category-independent object segmentation, we are able to perform unsupervised video object segmentation. In addition, in the case of semi-supervised video segmentation (where one frame from the video is annotated) we further train our system to recognize a specific object which appears in the video. In both scenarios, our system compares favorably against the state of the art.
Furthermore, we demonstrate a novel use case for video object segmentation, by implementing a mobile application where a user captures a video of an object, and our system is able to segment the object and display it in an AR setting.
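As a rough illustration of combining per-frame appearance with temporal change, here is a minimal sketch (a toy PyTorch architecture of my own, not Visualead's actual network): a tiny two-stream fully convolutional net takes the current frame and its difference from the previous frame (a crude stand-in for motion), and predicts a class-agnostic per-pixel foreground mask.

```python
# Minimal two-stream sketch: appearance (current frame) + motion (frame
# difference) -> per-pixel foreground logits. Assumed toy architecture.
import torch
import torch.nn as nn

class TwoStreamSegNet(nn.Module):
    def __init__(self):
        super().__init__()
        def stream(in_ch):
            return nn.Sequential(
                nn.Conv2d(in_ch, 16, 3, padding=1), nn.ReLU(),
                nn.Conv2d(16, 16, 3, padding=1), nn.ReLU())
        self.appearance = stream(3)        # current RGB frame
        self.motion = stream(3)            # temporal change between frames
        self.head = nn.Conv2d(32, 1, 1)    # per-pixel foreground logit

    def forward(self, frame_t, frame_prev):
        app = self.appearance(frame_t)
        mot = self.motion(frame_t - frame_prev)
        return self.head(torch.cat([app, mot], dim=1))

model = TwoStreamSegNet()
frame_t, frame_prev = torch.rand(1, 3, 64, 64), torch.rand(1, 3, 64, 64)
mask = model(frame_t, frame_prev).sigmoid() > 0.5     # binary foreground mask
print(mask.shape)                                      # torch.Size([1, 1, 64, 64])
```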
Gilad is Visualead's Senior Computer Vision Researcher, applying deep learning research to innovative scenarios. Gilad holds a B.Sc. in Electrical Engineering and Physics, and an M.Sc. in EE from Tel Aviv University. He specialized as a computer vision researcher in the VISICS lab at KU Leuven (KUL), where he worked on implementing advanced algorithms for action recognition from videos. Gilad has published several research papers in leading computer vision conferences on the topics of image segmentation and action recognition.
Eddie joined Visualead as an accomplished computer vision engineer and quickly grew to lead Visualead's research vision and efforts. Always working on the Next Big Thing, the highly academic research team is responsible for the company's innovation and next generation of disruptive technologies. Prior to Visualead, Eddie worked at the smartphone dual-lens camera company CorePhotonics. In the IDF, he took part in the prestigious Psagot program and later served as an academic officer in Unit 8200, where his team won the Israel Defense Award, among others. He holds two B.Sc. degrees from the Technion, in EE and Physics.
Trax was built to revolutionize the retail industry using cutting-edge computer vision techniques. Our challenges include fine-grained visual recognition of densely arranged items in store displays.
In this presentation, we will introduce a novel CNN architecture which learns both local visual features and neighboring class representations, and extends the softmax function to define a probabilistic graphical model.
At test time, the graph is extracted from a context-enhanced detector, which converges to the correct localizations by iteratively reducing the attention regions and refining the object-proposal granularity.
The detected items are then classified using the marginal distributions of the graph's joint probability, integrating local and spatial features.
We will further demonstrate how the algorithms can be optimized by dynamic programming techniques and parallel pipeline design.
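To make the classification-by-marginals step concrete, here is a toy illustration (an assumed simplification of my own, not the Trax architecture): a row of adjacent shelf items is labeled by combining per-item local scores with a neighboring-class compatibility term in a chain graphical model, and each item takes the class with the highest marginal probability.

```python
# Toy chain graphical model: local CNN scores (unary) + neighbor-class
# compatibility (pairwise); per-item labels are taken from exact marginals
# computed with the forward-backward algorithm. Numbers are illustrative.
import numpy as np

def chain_marginals(unary, pairwise):
    """Marginals of p(y) proportional to prod_i unary[i, y_i] * prod_i pairwise[y_i, y_i+1].
    unary: (n, K) local scores, pairwise: (K, K) neighbor compatibility."""
    n, K = unary.shape
    fwd, bwd = np.zeros((n, K)), np.zeros((n, K))
    fwd[0], bwd[-1] = unary[0], 1.0
    for i in range(1, n):
        fwd[i] = unary[i] * (fwd[i - 1] @ pairwise)
        fwd[i] /= fwd[i].sum()                    # normalize for stability
    for i in range(n - 2, -1, -1):
        bwd[i] = pairwise @ (unary[i + 1] * bwd[i + 1])
        bwd[i] /= bwd[i].sum()
    marg = fwd * bwd
    return marg / marg.sum(axis=1, keepdims=True)

# Local scores for 4 adjacent items over 3 classes; item 1 is ambiguous alone.
unary = np.array([[0.8, 0.1, 0.1],
                  [0.4, 0.4, 0.2],
                  [0.7, 0.2, 0.1],
                  [0.1, 0.1, 0.8]])
# Same-class neighbors are more likely (items of one brand are shelved together).
pairwise = np.array([[0.6, 0.2, 0.2],
                     [0.2, 0.6, 0.2],
                     [0.2, 0.2, 0.6]])
print(chain_marginals(unary, pairwise).argmax(axis=1))   # context-aware labels
```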
Eran Goldman is a Computer Vision Researcher at Trax Image Recognition, developing machine learning algorithms for fine-grained image recognition.
His team is focused on the development of a scalable recognition system using deep learning technology.
Eran is working toward an M.Sc. thesis under the supervision of Prof. Jacob Goldberger at Bar-Ilan University. His research deals with contextual models for object classification, implementing conditional random fields as deep neural networks.
Rare diseases affect one in every ten people. Many of these diseases are characterized by observable traits of the affected individuals - a 'phenotype'. In many cases, this phenotype is especially noticeable in the facial features of the patients, Down syndrome being one example. Most such conditions, however, have subtle facial patterns and are harder to diagnose. FDNA has developed a unique computer-vision-based solution called 'Face2Gene' - an AI engine that suggests syndromes and genes based on facial images of the patient. FDNA will present its solution, the challenges involved, and research aspects of phenotype analysis.
Yair Hanani is the algorithms team leader at FDNA. He was also one of the first algorithm developers at FDNA and has wide experience in the fields of computer vision, machine learning, and deep learning. He holds a B.Sc. in Biomedical Engineering from Tel Aviv University and an M.Sc. in Electrical Engineering from Tel Aviv University, specializing in computer vision under the supervision of Prof. Lior Wolf.
Correctly matching feature points in a pair of images is an important pre-processing step for many computer vision applications. In this talk we propose an efficient method for estimating the number of correct matches without explicitly computing them. In addition, our method estimates the region of overlap between the images. To this end, we propose to analyze the set of matches using the spatial order of the features, as projected to the x-axis of the image. The set of features in each image is thus represented by a sequence. This reduces the analysis of the matching problem to the analysis of the permutation between the sequences.
Using the Kendall distance metric between permutations and natural assumptions on the distributions of the correct and incorrect matches, we show how to estimate the above-mentioned values. We demonstrate the usefulness of our method in two applications: (i) a new halting condition for RANSAC-based epipolar geometry estimation methods, which considerably reduces the running time, and (ii) discarding spatially unrelated image pairs in the Structure-from-Motion pipeline.
Authors: Lior Talker, Yael Moses, Ilan Shimshoni
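The core statistic behind the method is easy to illustrate. The sketch below (an assumed simplification, using SciPy's kendalltau; the synthetic points and match fractions are my own) represents each image's matched features by their left-to-right order along the x-axis and measures how consistent the two orderings are. Correct matches largely preserve this order, so the consistency reflects the fraction of correct matches; the paper's estimator additionally models the distributions of correct and incorrect matches to turn such a statistic into an estimate of the actual count and the overlap region.

```python
# Sketch: order-consistency of putative matches via Kendall's tau on the
# x-axis orderings of matched feature points in the two images.
import numpy as np
from scipy.stats import kendalltau

def order_consistency(pts1, pts2):
    """pts1[i] in image 1 is matched to pts2[i] in image 2 (N x 2 arrays)."""
    rank1 = np.argsort(np.argsort(pts1[:, 0]))    # rank of each match along x, image 1
    rank2 = np.argsort(np.argsort(pts2[:, 0]))    # rank along x, image 2
    tau, _ = kendalltau(rank1, rank2)
    return tau

rng = np.random.default_rng(1)
pts1 = rng.uniform(0, 640, size=(100, 2))
good = pts1 + [30.0, 5.0] + rng.normal(0, 1, size=(100, 2))   # order-preserving matches
bad = rng.uniform(0, 640, size=(100, 2))                       # random (wrong) matches
mixed = np.where(rng.random((100, 1)) < 0.6, good, bad)        # ~60% correct matches
for name, pts2 in [("all correct", good), ("~60% correct", mixed), ("all wrong", bad)]:
    print(name, "tau =", round(order_consistency(pts1, pts2), 2))
```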
Ilan Shimshoni received his B.Sc. in mathematics from the Hebrew University in Jerusalem, his M.Sc. in computer science from the Weizmann Institute of Science, and his Ph.D. in computer science from the University of Illinois at Urbana Champaign (UIUC).
Ilan was a postdoctoral fellow at the Faculty of Computer Science at the Technion from 1995 to 1998, and was a member of the Faculty of Industrial Engineering and Management from 1998 to 2005. He joined the Department of Information Systems (IS) at Haifa University in October 2005.
His research areas are computer vision, computer graphics, and their applications, for example to archaeology. He is an associate editor of IEEE PAMI.
While augmented reality (AR) has been steadily gaining popularity in recent years, the majority of AR solutions are either mixed reality (MR), where graphics are not attached to specific elements in the scene, or marker-based AR, where graphics are anchored to a planar object visible in the scene. These solutions are often suitable for limited scenarios, but typical use-cases in the industrial domain require precisely anchored, marker-free AR. Our work in the Cognitive Vision and Augmented Reality group at IBM Research in Haifa focuses on research in AR for technical training, field maintenance, IoT, and other use-cases. In this talk I will present the group's AR platform and review some of the real-world installations in the industrial domain.
Yochay Tzur leads the 3D vision and augmented reality research at IBM Research - Haifa. He joined IBM in 2013, with over 10 years of hands-on experience at companies such as Rafael and CSR-Zoran. At IBM, Yochay has been leading the development of AR projects, from prototype stage to realization at the customer's site. In his current role, he is responsible for the development of the complete AR offering of the lab. Yochay received his B.Sc. and M.Sc. degrees in computer science from the Technion and is the author of several patents.
This talk will introduce the listeners to the world of capsule endoscopy and its computer vision and machine learning challenges. One application of this pioneering technology is as a screening tool for detecting polyps, a precursor to colorectal cancer.
We will go over some of the technological innovations that are part of our research. The highlight of the talk will be a novel, general-purpose classification algorithm aimed at directly optimizing the image classification task. The algorithm handles hundreds of millions of healthy examples, allows seamless transitions between penalty functions, and supports grouping of examples.
Mammography breast cancer screening over the past three decades has doubled the rate of early breast cancer detection and is credited with decreasing mortality from breast cancer. Expert screening with current computer-aided detection (CAD) tools has an estimated sensitivity of 68%-87% and a specificity of 75%-91%. We present a mammography detection algorithm based upon machine learning techniques utilizing both public and proprietary training datasets. We used a committee of diversely filtered, shallow, Theano-based convolutional neural networks, pre-trained on ImageNet data and fine-tuned using false-color-enhanced, contrast-limited adaptive histogram equalized (CLAHE) mammographic images. Our results yield an accuracy of over 98%.
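As an illustration of the image-enhancement step, here is one plausible reading (an assumption on my part, since the abstract does not spell out the exact recipe): a false-color image is built by applying CLAHE with several different clip limits to a grayscale mammogram and stacking the results as the three input channels expected by an ImageNet-pretrained network.

```python
# Hypothetical false-color CLAHE preprocessing: three differently enhanced
# copies of the grayscale mammogram become the R, G, B input channels.
import cv2
import numpy as np

def false_color_clahe(gray, clip_limits=(1.0, 2.0, 4.0), tile=(8, 8)):
    """gray: uint8 single-channel mammogram. Returns an 8-bit 3-channel image."""
    channels = [cv2.createCLAHE(clipLimit=c, tileGridSize=tile).apply(gray)
                for c in clip_limits]
    return np.dstack(channels)

gray = (np.random.rand(512, 512) * 255).astype(np.uint8)   # stand-in for a mammogram
rgb = false_color_clahe(gray)
print(rgb.shape, rgb.dtype)     # (512, 512, 3) uint8, ready for a pretrained CNN
```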
Eldad Elnekave, MD, is a US and Israeli board-certified radiologist and has served as the Chief Medical Officer of Zebra Medical Vision since its founding. He completed his interventional radiology training at Memorial Sloan Kettering Cancer Center and practices clinically as the Director of the Interventional Oncology clinic at the Davidoff Oncology Center. Dr. Elnekave envisions a foreseeable future in which machine learning algorithms will contribute substantial, even critical, insight to every radiologic examination, from X-ray to CT and MRI.
Breathing monitors have become the cornerstone of a wide variety of commercial and personal safety applications, ranging from elderly care to baby monitoring. Many such monitors exist on the market, some with vital-signs monitoring capabilities, but none of them remote. We present a simple yet efficient real-time method for extracting the subject's breathing rhythm. Points of interest are detected on the subject's body, and the corresponding optical flow is estimated and tracked using the well-known Lucas-Kanade algorithm on a frame-by-frame basis. A generalized likelihood ratio test is then applied to each of the many interest points to detect which of them are moving in a harmonic fashion. Finally, a spectral estimation algorithm based on Pisarenko harmonic decomposition tracks the harmonic frequency in real time, and a maximum-likelihood fusion algorithm optimally estimates the breathing rate using all points considered. The results show a maximal error of 1 BPM between the true breathing rate and the algorithm's calculated rate, based on experiments with two babies and three adults.
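The spectral-estimation step can be sketched in a few lines (an assumed simplification of the processing of a single tracked point, not the full multi-point fusion pipeline; the sampling rate and signal are synthetic): Pisarenko harmonic decomposition estimates the frequency of a single sinusoid in noise from a 3x3 autocorrelation matrix.

```python
# Pisarenko harmonic decomposition for a single sinusoid in white noise,
# applied to a synthetic breathing-like displacement signal.
import numpy as np

def pisarenko_frequency(x, fs):
    """Estimate the dominant harmonic frequency (Hz) of signal x sampled at fs Hz."""
    x = x - x.mean()
    # Autocorrelation at lags 0, 1, 2 -> 3x3 Toeplitz matrix (one real sinusoid).
    r = [np.dot(x[:len(x) - k], x[k:]) / (len(x) - k) for k in range(3)]
    R = np.array([[r[0], r[1], r[2]],
                  [r[1], r[0], r[1]],
                  [r[2], r[1], r[0]]])
    w, V = np.linalg.eigh(R)
    a = V[:, 0]                          # eigenvector of the smallest eigenvalue
    roots = np.roots(a)                  # roots lie near exp(+-j*omega)
    return abs(np.angle(roots[0])) * fs / (2 * np.pi)

fs = 4.0                                          # tracked displacement, downsampled to 4 Hz
t = np.arange(0, 60, 1 / fs)                      # one minute of tracking
signal = np.sin(2 * np.pi * 0.25 * t)             # 0.25 Hz = 15 breaths per minute
signal = signal + 0.1 * np.random.randn(len(t))   # tracking noise
print(round(pisarenko_frequency(signal, fs) * 60, 1), "breaths per minute")   # ~15 BPM
```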
Nir is a researcher in the Essence Group CTO's office, responsible for bringing future technologies to life for the company. Nir's research interests are statistical signal processing, namely estimation and detection theory, and machine learning applied to problems in computer vision and radar remote sensing. Nir is also a Ph.D. student at Ben-Gurion University, researching radar remote sensing and the classification and recognition of miniature drones.
Over the past 20 years, digital wireless communications has become an essential technology for many industries, and a primary driver for the electronics industry. Today, computer vision is showing signs of following a similar trajectory. Once used only in low-volume applications such as manufacturing inspection, vision is now becoming an essential technology for a wide range of mass-market devices, from cars to drones to mobile phones. This presentation examines the motivations for incorporating vision into diverse products, presents case studies that illuminate the current state of vision technology in high-volume products, and explores critical challenges to ubiquitous deployment of visual intelligence.
Ananth Kandhadai is a Senior Director of Engineering in the Multimedia R&D group at Qualcomm Technologies Inc. His research interests include speech coding and enhancement, image processing for cameras, model-based computer vision, deep learning for object detection and tracking, power-constrained system design, and vision hardware acceleration. He has over two decades of experience working on speech codec standardization, camera system design, and computer vision, involving fundamental research as well as commercial implementation. He currently leads computer vision technology development for Qualcomm's Snapdragon SoCs, including the technology roadmap, use cases, algorithms, system design, architecture, hardware, software development, and customer product delivery.
This talk discusses some of the history of graphical models and neural networks and speculates on the future of both fields with examples from the particular problem of image restoration.
Yair Weiss is a Professor at the School of Computer Science and Engineering at the Hebrew University and is currently serving as the Head of the School. His research interests include machine learning, human vision, and computer vision. He served as the program chair of NIPS 2004 and the general chair of NIPS 2005, and will be a program chair of ECCV 2018.
Together with his students and colleagues he has received best paper awards at UAI, NIPS, ECCV and CVPR.
William Joy is the founder and CEO of Video++. William previously founded Venvy, one of the first interactive video services, with funding from the Harvard Innovation Lab and Harvard Veritas Venture. William has since been focusing on developing new video technology, specializing in video recognition. He was named one of the "30 notable entrepreneurs under the age of 30" by Forbes in 2015; at the age of 21, he was the youngest recipient of this title. He is also a recipient of the Red Herring Award as one of "Asia's Top 100" companies.
Video++ is China's largest in-video technology company, holding the No. 1 market share for its in-video operating system (Video OS), which offers automated video recognition, ad matching, and interactive video features that enable platforms and publishers to engage and monetize their user traffic. Serving over 11,767 video platforms, Video++ has created value for over 11.6 million videos, with over 10.1 billion service requests per month.