April 8, 2024
Pavilion 10, EXPO Tel Aviv
Blue White Robotics and Tel-Aviv University
Amazon Prime Video Sports
Tel Aviv University
Weizmann Institute of Science
Bar-Ilan University and OriginAI
Galilee Medical Center, Nahariya, Israel
Max Stern Yezreel Valley College
New York University (NYU)
Ben-Gurion University of the Negev
Ben-Gurion University of the Negev (BGU)
Tel Aviv University & Google Research
Technion – Israel Institute of Technology
Tel Aviv University
Product and AI Consultant
The Australian National University and Technion, Israel Institute of Technology
Hebrew University of Jerusalem
Gentex Technologies Israel
NVIDIA and Technion
Conflu3nce ltd and Conflu3nce Health AI (CHAI)
R&D Lab Group Manager, General Motors
Gershon Celniker is an R&D Lab Group Manager at GM, previously a Principal Data Scientist at Verint and Check Point and Chief Data Scientist at Wiser. He holds a BSc from the Technion and an MSc from the Hebrew University in bioinformatics and machine learning applications, with extensive academic experience as a CS research fellow at the Weizmann Institute and Tel-Aviv University. Currently, his main research interests lie in the design of AI and CV algorithms and their applications in the automotive industry.
Understanding and modeling gaze patterns in the automotive environment
Intuitively, when a person is relaxed and has no task to perform, they tend to look at salient objects in the field of view (bottom-up). As tasks are introduced and workload increases, gaze behavior usually becomes more task-oriented (top-down), and a shift from salient-object-oriented gaze patterns to important-object-oriented gaze patterns can be observed. In the automotive environment, this shift between gaze pattern types and its linkage to the driver's or passengers' states suggests that modeling a gaze pattern can lead to an understanding of one's state, and vice versa.
Gaze patterns were modeled by training both deep learning networks and statistical models. Deep learning networks were trained to digest effectively larger datasets, and statistical models were selected for their simplicity and explainability. A set of experiments was conducted both in real-world setups and in simulated environments. The real-world experiments took place in Israel and the USA while modeling the behavior of drivers and passengers. Overall, our results supported our assumptions and can be divided into two types: prediction of expected gaze patterns given the environment and establishing a linkage between gaze patterns and the driver’s and passenger’s state.
Roy Orfaig specializes in the fields of AI, computer vision, and robotics for autonomous vehicles. He has been serving as a lecturer and an advisor for master's research at Tel-Aviv University, focusing on perception, localization and mapping applications for autonomous robots within the Department of Electrical Engineering.
Furthermore, he is an AI Tech Lead at Blue White Robotics, a startup that pioneers cutting-edge autonomous tractors for smart farming. Before joining Blue White Robotics, he held various key roles and gained extensive experience at top companies such as Applied Materials, Elta (within the autonomous ground robotics group), and Brodmann17. He holds an M.Sc. in Electrical Engineering from Ben-Gurion University.
CLRMatchNet: Enhancing Curved Lane Detection with Deep Matching Process
Lane detection is crucial for autonomous driving, furnishing indispensable data for safe navigation. Modern algorithms employ anchor-based detectors, followed by a label assignment process that categorizes training detections as either positive or negative instances. However, existing methods may be limited and not necessarily optimal, as they rely on predefined classical cost functions with few calibration parameters.
Our research introduces MatchNet, a deep learning-based approach aimed at optimizing the label assignment process. Interwoven into a SOTA lane detection network such as CLRNet, MatchNet replaces the conventional label assignment process with a submodule network. This integration yields significant enhancements, particularly in challenging scenarios such as curve detection (+1.82%), shadows (+1.1%), and non-visible lane markings (+0.88%). Notably, this method boosts lane detection confidence level, enabling a 3% increase in the confidence threshold.
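For readers unfamiliar with label assignment, the classic cost-based scheme that MatchNet replaces can be sketched in a few lines of Python. The cost weights, toy values, and the function name `assign_labels` are illustrative assumptions; CLRNet's actual assignment cost also includes geometric terms such as angle and start-point distances.

```python
# Hedged sketch of classic cost-based label assignment: each anchor gets a
# hand-crafted cost against one ground-truth lane, and the k lowest-cost
# anchors become positive training examples. MatchNet's contribution is to
# replace this fixed rule with a learned submodule network.

def assign_labels(cls_costs, reg_costs, k=2, w_cls=1.0, w_reg=3.0):
    """Combine per-anchor classification and regression costs with fixed
    weights, then mark the k cheapest anchors as positives."""
    costs = [w_cls * c + w_reg * r for c, r in zip(cls_costs, reg_costs)]
    order = sorted(range(len(costs)), key=lambda i: costs[i])
    positives = set(order[:k])
    return ["pos" if i in positives else "neg" for i in range(len(costs))]

print(assign_labels(cls_costs=[0.2, 0.9, 0.1, 0.7],
                    reg_costs=[0.1, 0.8, 0.9, 0.2]))  # → ['pos', 'neg', 'neg', 'pos']
```

Note how anchor 2, despite its excellent classification cost, is rejected because the fixed weights punish its poor regression cost; a learned assignment can calibrate such trade-offs per scenario.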
AI Research Team Lead, Theator
Omri Bar leads the artificial intelligence research group at Theator, an AI-focused healthcare start-up building the next generation of AI in surgery.
Over the past ten years, Omri has gathered extensive experience in applying machine learning in various medical start-ups focusing mainly on Computer Vision and Natural Language Processing.
Omri holds an MSc in Electrical and Computer Engineering and a BSc in Biomedical Engineering.
Novel applications of deep video action recognition: How data-driven video understanding powers better surgical practice
Video plays an increasingly important role in our lives, motivating the development of novel machine learning methods for video understanding. However, using machine learning to extract information from video is inherently complex, as both spatial and temporal information must be processed. In this talk, we will review several deep video action recognition methods and discuss challenges that arise in developing and deploying AI-based models for video understanding in novel settings. Using surgical videos, we will demonstrate how such models operate in real-time, and how they are unlocking the immense potential of large-scale data to drive better patient care in the surgical domain.
Applied Scientist, Amazon Prime Video Sports
Shachar is an applied scientist at Amazon, Prime Video Sports division. Specializing in computer vision and deep learning, she holds an M.Sc. in Electrical Engineering from Tel Aviv University, with research focused on Light Field photography.
Sports, Computer Vision and AI
American Football, a complex sport of strategy, is the most popular sport in the US, engaging tens of millions of fans every week. Amazon owns exclusive broadcast rights for Thursday Night Football (TNF) and is working to create a unique viewing experience, presenting new analytic features and enhanced graphics to help fans get more out of the game. This lecture will give a peek into the new features of the 2023 season, covering two ML-powered features that are based on player tracking data collected at low latency in the NFL venues.
Miri Kenig is a physicist studying the ability of generative AI to learn, generalize, and explore quantum reality. Miri holds a BSc and MSc (with honors) in physics. As part of her Ph.D. at the School of Physics & Astronomy at Tel Aviv University, she developed a deep learning algorithm capable of learning quantum processes from examples only. Her research was published in the physics journal Physical Review A (PRA), presented at physics conferences (IPS 2023), and covered recently by Ynet Science. Miri is currently working on further developing this approach to analyze and explore poorly understood physical phenomena.
Exploring and analyzing quantum dynamics with generative AI
In this talk, I will show that generative models can learn the dynamics of interacting quantum particles on disordered chains, a general scenario underlying a wide range of physical problems, from many-body quantum physics to quantum computation. Our algorithm learns complex quantum correlations from unlabeled examples and can then generate new physically valid instances with tunable physical parameters. This enables post-training exploration of the problem space, revealing underlying physical phenomena and accelerating the learning of more complex problems. These results suggest a general framework for generative AI in physical analysis and discovery.
Postdoctoral Fellow, Weizmann Institute of Science
Alona Strugatski-Faktor is a Postdoctoral fellow at the Weizmann Institute of Science. Alona's research focuses on cognitive capabilities of AI models and visual scene interpretation.
She is specifically interested in combining human vision research with state-of-the-art AI models. Alona holds a B.Sc. in Physics and Electrical Engineering from the Technion,
an M.Sc. in Electrical Engineering from Tel Aviv University and a PhD in Mathematics and Computer Science from the Weizmann Institute of Science.
Why Do Visual-Language Models Struggle with Scene Structure Extraction
Despite the huge breakthrough in vision-language models, they are still far from achieving human-level scene understanding and have several fundamental limitations.
We show that these models are unable to perform simple tasks such as answering questions about the locations of objects and the relations between them. We propose a model that can
naturally answer such questions and achieve scene understanding even for complex scenes. It does this by using an iterative, goal-driven approach that resembles
human vision: in each iteration, the model focuses its attention on the relevant parts of the scene, and thus iteratively builds a complex understanding of the scene.
PhD student, Bar-Ilan University and OriginAI
Yochai Yemini is a PhD student at Bar-Ilan University, under the supervision of Prof. Sharon Gannot and Dr. Ethan Fetaya. He is also a deep learning researcher at OriginAI. His areas of interest include computer vision, speech processing and their intersection, and his current research focuses on deep learning methods for audio-visual tasks.
LipVoicer: Generating Speech from Silent Videos Guided by Lip Reading
In the lip-to-speech task, the objective is to accurately generate the missing speech for a soundless video of a person talking. It is required, e.g., when the speech signal is completely obfuscated by background noises. In this talk, I will present LipVoicer, a novel approach for producing high-quality speech for in-the-wild silent videos. LipVoicer leverages the transcription of the speech we wish to generate as predicted by a lip-reading model, and a diffusion model conditioned on the video to generate mel-spectrograms. LipVoicer achieves exceptional results, and the generated speech sounds natural and synchronized to the lip motion.
Ward van der Tempel invented Switching Pixels® Active Event Sensor (SPAES) technology. He has over 15 years of experience in CMOS analog and digital sensor design and sensor fabrication process. He was co-founder and Product Director of Spectricity, developing miniature spectrometer solutions. Before this, he co-founded Optrima (merged with SoftKinetic, then acquired by Sony) to bring to market 3D Time-of-Flight technology. After the acquisition by Sony, Ward was Head of Technology at Sony DepthSensing Solutions, driving its 3D time-of-flight (ToF) development. Ward holds an MSc. Eng. degree and a Ph.D. in Electrical Engineering, both from the Vrije Universiteit Brussel, Belgium.
Low-Power, Low-Latency Perception for XR
XR devices immerse users in augmented realities, seamlessly merging digital and physical realms. Achieving this demands advanced perception technology that is resilient across environments, low in power usage, and has minimal latency. Yet, existing solutions struggle to meet this demanding combination of requirements, even with Apple’s Vision Pro setting a new standard for XR glasses’ 3D perception.
VoxelSensors’ Active Event Sensors (AES) enable robust, low-power, low-latency 3D sensing using laser triangulation. This innovation enhances SLAM, odometry, gesture recognition, and tracking, potentially revolutionizing augmented reality experiences. Ward will outline this groundbreaking approach and perspectives in XR 3D perception.
PhD student, Tel-Aviv University
Yotam Nitzan is a Ph.D. student at Tel-Aviv University, advised by Daniel Cohen-Or, and a Research Intern at Adobe working with Eli Shechtman. Previously he was a Student Researcher at Google Research working with Kfir Aberman. His research is focused on forming and leveraging deep generative priors for downstream tasks. Previously, he received an M.Sc. in computer science from Tel-Aviv University and a B.Sc. in applied mathematics from Bar-Ilan University.
Domain Expansion of Image Generators
Can one inject new concepts into a trained generative model, while respecting its existing structure and knowledge? We propose a new task - domain expansion - to address this. Given a pretrained generator and novel (but related) domains, we expand the generator to jointly model all domains, old and new, harmoniously. Interestingly, we find that the latent space offers unused, "dormant" directions, which do not affect the output. This provides an opportunity: By "repurposing" these directions, we can represent new domains without perturbing the original representation. In fact, we find that pretrained generators have the capacity to add hundreds of new domains!
Computer Vision Researcher, Foresight Autonomous
Omri is a computer vision researcher on the R&D team of Foresight Autonomous.
His main research topics are sensor pose estimation and 3D reconstruction. He holds a BSc in Computer Science from Ben-Gurion University.
Consistent Pixel Matching between different cameras using individual temporal updates
Matching points across images from different cameras is commonly used in vision systems for a variety of purposes, such as 3D reconstruction and field calibration. Systems doing so over time are often designed to achieve high consistency, meaning that matches may change but do not jitter between solutions or errors. The presented method estimates consistent matches in a dynamic environment using dense optical flow between the sequential images of each individual camera.
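The individual temporal update can be illustrated with a minimal Python sketch. The constant toy flows and point coordinates are assumptions, standing in for dense optical-flow fields estimated per camera; the idea is that once a cross-camera match is established, each camera advances its own point with its own flow instead of re-matching from scratch, which keeps the match from jittering between alternative solutions.

```python
# Each camera propagates its side of an established match using its own
# dense optical flow, so the cross-camera correspondence stays consistent
# over time without repeated global matching.

def propagate(point, flow):
    """Advance a point (x, y) by the flow vector sampled at that point."""
    x, y = point
    dx, dy = flow(x, y)
    return (x + dx, y + dy)

# Toy dense flows: camera A's content drifts right, camera B's drifts down.
flow_a = lambda x, y: (1.0, 0.0)
flow_b = lambda x, y: (0.0, 1.0)

# Initial cross-camera match (found once, e.g., by descriptor matching).
match_a, match_b = (10.0, 20.0), (55.0, 20.0)

# Each camera updates its point independently at every frame.
for _ in range(3):
    match_a = propagate(match_a, flow_a)
    match_b = propagate(match_b, flow_b)
print(match_a, match_b)  # → (13.0, 20.0) (55.0, 23.0)
```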
Sapir holds a B.Sc. in Physics and Electrical Engineering from Tel Aviv University. She is currently pursuing her M.Sc. in Electrical Engineering at TAU, focusing on detection for autonomous driving under the supervision of Prof. Ben-Zion Bobrovsky and Roy Orfaig. Additionally, Sapir works as an Algorithm Engineer at Samsung R&D Center in the field of image processing.
MD, Dept. of Orthopedics B and Spine Surgery, Galilee Medical Center, Nahariya, Israel
I am Hamza Murad, currently a resident in Orthopedic Surgery at Galilee Medical Center. My academic journey includes a solid biology education at Technion Institute of Technology, accompanied by medical studies at Hebrew University in Jerusalem. Proficient in Python programming, I am particularly drawn to the application of unsupervised techniques in skeletal radiology. By merging my medical expertise with technical skills, I am eager to offer a distinctive viewpoint at the upcoming Computer Vision Conference, where the fusion of medical imaging and technology takes center stage.
Clustering-based Detection of Occult Osteoporotic Fractures using Machine Learning and CT Scans
Osteoporotic vertebral compression fractures (VCFs) in the elderly pose significant quality-of-life challenges. A fraction of VCFs are occult and cannot be distinguished from normal vertebrae by traditional imaging such as X-rays and CT scans; Tc99 bone scans are usually utilized to identify occult fractures. We propose a data-driven solution employing machine learning and computer vision to identify unique radiological patterns in occult VCFs. Our method, using only CT scan data of 24 vertebrae, successfully segments vertebrae into clusters, revealing distinct volume ratios that distinguish normal vertebrae from those with occult fractures. Importantly, we identified that vertebral posterior element volumes aid occult fracture identification and may play a role in the pathology of VCFs. This approach highlights the potential of machine learning in enhancing skeletal condition diagnosis, bridging inter-modality gaps.
Ph.D., Senior Lecturer at the Department of Information Systems, Max Stern Yezreel Valley College
I am Dr. Loai Abdallah, and I have honed my expertise in data analysis and artificial intelligence over 15 years. I hold a senior lecturer position at the Department of Information Systems at the Max Stern Yezreel Valley College. My main research focuses on data mining and big data. In addition to my academic pursuits, I am the founder and CEO of xBiDa, a company at the forefront of AI for big data and computer vision.
Senior Machine Learning Engineer, Sightful
Elad Levi is a machine learning engineer at Sightful, a startup that is creating the first AR laptop. His work focuses on leveraging multimodal inputs (in particular vision and language) in order to build a novel AR operating system. Elad received a PhD in mathematics from the Hebrew University. His thesis was in the field of model theory, with applications to combinatorics problems.
Democratizing Large Language Models
Large language models (LLMs) have emerged as a breakthrough technology, exhibiting remarkable performance across a wide range of tasks. Until recently, the development of LLMs seemed constrained by high barriers, resulting in a few companies dominating the field. However, recent advancements in the field have significantly lowered these barriers, enabling the development of high-quality LLMs with a limited amount of effort and computation resources.
In this tutorial, we will explore the challenges involved in building LLMs, the developments that allow building such high-performance custom models with a small amount of resources, and the new possibilities this unlocks, including multimodal extension and expanded context windows.
MSc student, Faculty of Electrical and Computer Engineering, Technion
Eyal is currently pursuing his MSc in Electrical and Computer Engineering at the Technion, mentored jointly by Dr. Moti Freiman (Faculty of Biomedical Engineering) and Prof. Israel Cohen (Faculty of Electrical and Computer Engineering). His ongoing research is centered on creating deep-learning models with physical constraints for motion correction in medical imaging. Alongside his academic work, Eyal serves as an AI Research Intern at GE Research. He brings several years of industrial experience as an algorithm and computer vision engineer to his role. He earned his B.Sc in Electrical and Computer Engineering from the Technion.
Free-breathing myocardial T1 mapping with Physically-Constrained Motion Correction
T1 mapping is a quantitative MRI technique that has emerged as a valuable tool in the diagnosis of diffuse myocardial diseases. However, prevailing approaches have relied heavily on breath-hold sequences to eliminate respiratory motion artifacts. This limitation hinders accessibility and effectiveness for patients who cannot tolerate breath-holding. We address this limitation by introducing PCMC-T1, a physically-constrained deep-learning model that accounts for the signal decay along the longitudinal relaxation axis for motion correction in free-breathing T1 mapping. PCMC-T1 demonstrated superior results compared to baseline methods using a 5-fold experimental setup on a publicly available dataset of 210 patients.
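For context, the signal decay along the longitudinal relaxation axis that PCMC-T1 exploits is commonly described by a three-parameter inversion-recovery model (notation ours, not necessarily the paper's exact formulation):

```latex
S(t_i) = A - B\, e^{-t_i / T_1^*}, \qquad
T_1 = T_1^* \left( \frac{B}{A} - 1 \right)
```

where the $t_i$ are the inversion times and the second expression is the Look-Locker correction recovering $T_1$ from the apparent $T_1^*$. A physical constraint of this kind encourages the motion-corrected frames to remain consistent with the decay curve rather than with arbitrary deformations.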
Assistant Professor and Faculty Fellow, New York University (NYU)
Ravid Shwartz-Ziv is currently a CDS Assistant Professor and Faculty Fellow at the NYU Center for Data Science. He collaborates with Prof. Yann LeCun, focusing on neural networks, information theory, and self-supervised learning. Ravid's research aims to dissect the complexities of deep neural networks to enhance their efficiency and effectiveness. He is particularly intrigued by what defines a 'good' representation in machine learning and explores its impact on various applications. His work also delves into data compression and its implications for machine learning, as well as investigating the essential components for effective learning and the dynamics of training algorithms.
Decoding the Information Bottleneck in Self-Supervised Learning: Pathway to Optimal Representations and Semantic Alignment
Deep Neural Networks (DNNs) have excelled in many fields, largely due to their proficiency in supervised learning tasks. However, the dependence on vast labeled data becomes a constraint when such data is scarce.
Self-Supervised Learning (SSL), a promising approach, harnesses unlabeled data to derive meaningful representations. Yet, how SSL filters irrelevant information without explicit labels remains unclear.
In this talk, we aim to unravel the enigma of SSL using the lens of Information Theory, with a spotlight on the Information Bottleneck principle. This principle, while providing a sound understanding of the balance between compressing and preserving relevant features in supervised learning, presents a puzzle when applied to SSL due to the absence of labels during training.
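As a reminder, the Information Bottleneck objective for a representation $Z$ of an input $X$ with label $Y$ can be stated compactly (standard formulation; notation ours):

```latex
\min_{p(z \mid x)} \; I(X; Z) - \beta \, I(Z; Y)
```

where $\beta$ trades off compression of the input against preservation of label-relevant information. The puzzle for SSL is then immediate: without labels, the $I(Z;Y)$ term has no direct empirical handle during training.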
We will delve into the concept of 'optimal representation' in SSL, its relationship with data augmentations, optimization methods, and downstream tasks, and how SSL training learns and achieves optimal representations.
Our discussion unveils our pioneering discoveries, demonstrating how SSL training naturally leads to the creation of optimal, compact representations that correlate with semantic labels. Remarkably, SSL seems to orchestrate an alignment of learned representations with semantic classes across multiple hierarchical levels, an alignment that intensifies during training and grows more defined deeper into the network.
Considering these insights and their implications for class set performance, we conclude our talk by applying our analysis to devise more robust SSL-based information algorithms. These enhancements in transfer learning could lead to more efficient learning systems, particularly in data-scarce environments.
Joint work with Yann LeCun, Ido Ben Shaul, and Tomer Galanti.
Computer Vision Algorithm Developer, Ben-Gurion University of the Negev
Meitar Ronen is currently an algorithm developer at General Motors. She earned her MSc in Computer Science from Ben-Gurion University where she did her research at the Vision, Inference and Learning lab, under Dr. Oren Freifeld's supervision. Her research interests include unsupervised learning, probabilistic models, Bayesian nonparametrics, and Deep Learning. She has authored or co-authored papers published in CVPR 2022, ICCV 2019, and EDM 2019. During her MSc, Meitar won the national VATAT scholarship for outstanding women in the Hi-Tech fields.
Deep Clustering with an Unknown Number of Clusters
Deep Learning (DL) has shown great promise in the unsupervised task of clustering. That said, while in classical (i.e., non-deep) clustering the benefits of the nonparametric approach are well known, most deep-clustering methods are parametric: namely, they require a predefined and fixed number of clusters, denoted by K. When K is unknown, however, using model-selection criteria to choose its optimal value might become computationally expensive, especially in DL as the training process would have to be repeated numerous times. In this work, we bridge this gap by introducing an effective deep-clustering method that does not require knowing the value of K as it infers it during the learning. Using a split/merge framework, a dynamic architecture that adapts to the changing K, and a novel loss, our proposed method outperforms existing nonparametric methods (both classical and deep ones). While the very few existing deep nonparametric methods lack scalability, we demonstrate ours by being the first to report the performance of such a method on ImageNet. We also demonstrate the importance of inferring K by showing how methods that fix it deteriorate in performance when their assumed K value gets further from the ground-truth one, especially on imbalanced datasets.
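To convey the flavor of inferring K rather than fixing it, here is a deliberately simplified 1D Python sketch. The gap-based split criterion and its threshold are stand-in assumptions for illustration only; they are not the principled Bayesian split/merge decisions, dynamic architecture, or loss used by the actual method.

```python
# Toy nonparametric clustering: start with one cluster and recursively
# split at the largest gap whenever that gap dwarfs the typical gap inside
# the cluster. The number of clusters K emerges from the data.

def split_clusters(values, gap_factor=3.0):
    """Return a list of 1D clusters; K is inferred, not fixed in advance."""
    vals = sorted(values)
    if len(vals) < 3:
        return [vals]
    gaps = [vals[i + 1] - vals[i] for i in range(len(vals) - 1)]
    i = max(range(len(gaps)), key=lambda j: gaps[j])
    others = gaps[:i] + gaps[i + 1:]
    typical = sum(others) / len(others)
    if typical == 0 or gaps[i] < gap_factor * typical:
        return [vals]  # no convincing split: keep the cluster whole
    return (split_clusters(vals[:i + 1], gap_factor)
            + split_clusters(vals[i + 1:], gap_factor))

data = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2, 10.0, 10.1]
print(len(split_clusters(data)))  # → 3
```

The deep method replaces this ad-hoc rule with learned features and probabilistically justified split/merge moves, but the underlying question is the same: does the data support more clusters than the model currently has?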
CTO and VP Software Engineering, Edison Data, GE HealthCare
Dr. Ruth Bergman serves as the Chief Technology Officer at Edison Data within GE HealthCare's Science and Technology Organization. Her team's focus lies in establishing a unified data fabric to ensure consistency in data aggregation, normalization, and exchange across healthcare devices and applications. This comprehensive data fabric incorporates diverse patient data, spanning medical images, waveforms, labs, pathology, and genomic profiles. This integrated data accelerates analytics and machine learning efforts, accessible via open-standard Application Programming Interfaces (APIs). Dr. Bergman's prior achievements include spearheading the development of Graffiti, the first FDA Cleared Clinical Virtual Assistant, and a cloud-based collaboration tool for clinicians, both aimed at enhancing patient care and preventing sepsis-related deterioration. Her expansive experience encompasses leadership roles at GE Global Research and Hewlett Packard Labs Israel, underpinned by a profound technology background encompassing machine learning, artificial intelligence, computer vision, and algorithms. Dr. Bergman holds a PhD in Electrical Engineering and Computer Science from MIT, along with a wealth of patents and academic publications.
Navigating the AI Landscape in Healthcare: Striking the Balance Between Uncertainty and Risk
Risk management is paramount in all enterprises, particularly in healthcare, where patient safety takes precedence. Flaws in design or product functionality can result in treatment delays, patient harm, and reputational damage. AI, exemplified by models like ChatGPT and DALL-E, has the potential to revolutionize digital healthcare, enabling more patient-focused care and informed interactions. However, AI's inherent uncertainty and the risk of generating inaccurate or misleading outputs pose challenges. This discussion explores the delicate balance between leveraging AI's capabilities and safeguarding patient safety, highlighting the need for responsible implementation in healthcare settings.
Senior Researcher, General Motors
Michael Baltaxe is a senior researcher at General Motors. He works on machine learning and computer vision projects in the automotive field, especially focusing on scene understanding using multiple viewing sensors and 3D point clouds. His research strives to improve machine perception in complex scenarios by harnessing multi-modal data gathered in efficient manners. Previously, Michael held algorithm development positions at Microsoft and Orbotech. He holds an M.Sc. in Computer Science from the Technion.
Polarimetric Imaging for Perception
Autonomous driving and advanced driver-assistance systems rely on sensors and algorithms to perform appropriate actions. Typically, the sensors include color cameras, radar, lidar and ultrasonic sensors. Strikingly however, although light polarization is a fundamental property of light, it is seldom harnessed for perception tasks. Here, we analyze the potential for improvement when using an RGB-polarimetric camera for the tasks of monocular depth estimation and free space detection, as compared to using a standard RGB-only camera. We show that quantifiable improvement can be achieved using state-of-the-art neural networks, with minimum architectural changes. Additionally, we introduce an open dataset with RGB-polarimetric images, lidar scans, GNSS / IMU readings and free space segmentations that can be used by the community for new research.
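For readers new to polarimetric imaging, the per-pixel features typically derived from a division-of-focal-plane RGB-polarimetric sensor (which samples 0/45/90/135-degree polarizer orientations) follow the standard Stokes formalism. The intensity values and function name below are illustrative assumptions; the dataset's actual preprocessing may differ.

```python
# Per-pixel Stokes parameters and polarization features from four
# polarizer-orientation intensity samples (standard formulas).
import math

def polarization_features(i0, i45, i90, i135):
    """Return Stokes (s0, s1, s2), degree and angle of linear polarization."""
    s0 = 0.5 * (i0 + i45 + i90 + i135)   # total intensity
    s1 = i0 - i90                         # horizontal vs. vertical component
    s2 = i45 - i135                       # diagonal component
    dolp = math.sqrt(s1**2 + s2**2) / s0  # degree of linear polarization
    aolp = 0.5 * math.atan2(s2, s1)       # angle of linear polarization (rad)
    return s0, s1, s2, dolp, aolp

s0, s1, s2, dolp, aolp = polarization_features(0.9, 0.5, 0.1, 0.5)
print(f"DoLP={dolp:.2f}, AoLP={math.degrees(aolp):.1f} deg")  # → DoLP=0.80, AoLP=0.0 deg
```

DoLP and AoLP are exactly the kinds of extra channels that can be fed, with minimal architectural changes, alongside RGB into depth-estimation or free-space networks.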
Prof. Tammy Riklin Raviv leads the Biomedical Image Computing group at the School of Electrical and Computer Engineering of Ben-Gurion University of the Negev (BGU). Her research interests focus on the development of Deep Learning and Computer Vision algorithms mainly for Medical and Microscopy Imaging Analysis and Computational Neuroscience.
She currently serves as an Associate Editor at the IEEE Transactions on Medical Imaging (TMI) journal and at the IEEE Bio Imaging and Signal Processing (BISP) committee. She also serves on the steering committee of the IEEE International Symposium on Biomedical Imaging (ISBI) 2024 and at the organizing committee Medical Image Computing and Computer Aided Intervention (MICCAI) 2024.
She holds a B.Sc. in Physics and an M.Sc. in Computer Science, both from the Hebrew University in Jerusalem and a PhD from the School of Electrical Engineering of Tel-Aviv University. Prior to establishing her research group at BGU (2012) she was a research fellow and a post-doctorate associate at the Computer Science and Artificial Intelligence lab. (CSAIL), MIT, at Harvard Medical School, and the Broad Institute of MIT and Harvard.
A brief reminder on classical computer vision
Shila Ofek-Koifman is a Director for Language Technologies in IBM Research AI. Shila manages the AI Language & Media area in the Haifa Research Lab and co-leads Research AI's strategy in the area of Natural Language Understanding, as well as aspects of the AI-Driven Customer Care strategy, including research on natural language generation, document understanding, summarization, neural information retrieval, conversation, and large language models. Shila works closely with the Watson products, and under her leadership, her teams deliver differentiating research technologies into the products. Shila has received multiple IBM awards for her research work and contributions to the business, including an IBM Corporate award and the "Best of IBM" award.
What’s next in Multimodal Learning for Enterprises
Two mostly separate fields of machine learning -- computer vision and natural language processing -- have gradually become closer in recent years.
Advancements in each field have greatly influenced the other, driven in part by the abundance of weakly annotated data in the form of image-text pairs. These advancements brought focus to multimodal vision-language (VL) models, which jointly process images and free text.
At IBM Research, we have focused our research on multimodal learning: its limitations, its applications, and the adaptation of these models to the world of business documents.
In this talk, we will cover our latest work in the VL field, with topics such as Foundation Models for Expert Task Applications, Understanding Structured Vision and Language Concepts, and more.
Co-founder & CEO, BRIA AI
Dr. Yair Adato, Co-founder & CEO of BRIA AI, is a visionary in his field. He holds a PhD in Computer Vision from Ben-Gurion University and has conducted joint research with Harvard University. With 67 patents in machine learning and AI, Dr. Adato boasts a remarkable innovation record.
Before leading BRIA, Dr. Adato served as CTO at Trax Retail, where he was pivotal in propelling the company from startup to unicorn status. Beyond BRIA, he offers valuable advisory guidance to prominent firms such as Sparx, Vicomi, Tasq, DataGen, and Anima.
Solving the big problems for Visual Generative AI
The biggest and most challenging problems when using this amazing technology in a commercial setting are not necessarily algorithmic in nature. This talk suggests the biggest problems are related to training data and responsible AI, to model accessibility by the community, and lastly, to how to use visual generative AI to create a commercial impact. Specifically, we will focus on solving one hard problem: attribution and transparency of visual generative AI.
PhD Candidate, Tel Aviv University & Google Research
Hila is a PhD candidate at Tel-Aviv University, advised by Prof. Lior Wolf. Her research focuses on constructing faithful explainable AI algorithms for classifiers and generative models, and leveraging explanations to promote model accuracy and robustness.
Attend-and-Excite: Attention-Based Semantic Guidance for Text-to-Image Diffusion Models
Can a diffusion process be corrected after taking a wrong turn? We present Attend-and-Excite, a novel method that guides a text-to-image diffusion model to attend to all subjects in a text prompt and strengthen — or excite — their activations, encouraging the generation of all subjects in the prompt.
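The core of the guidance can be sketched on toy attention maps. The map values and function name are made up for illustration; the actual method additionally smooths the cross-attention maps and updates the diffusion latent by gradient descent on this loss at selected denoising steps.

```python
# Simplified sketch of the Attend-and-Excite loss: for each subject token,
# take the maximum value of its cross-attention map; the loss penalizes the
# most neglected subject, so minimizing it "excites" that subject.

def attend_and_excite_loss(attention_maps):
    """attention_maps: one 2D list of attention values per subject token."""
    per_token_max = [max(max(row) for row in amap) for amap in attention_maps]
    # 1 - max attention is high when a subject is being ignored.
    return max(1.0 - m for m in per_token_max)

# Token "cat" is well attended; token "dog" is neglected and drives the loss.
cat_map = [[0.1, 0.8], [0.2, 0.1]]
dog_map = [[0.05, 0.1], [0.1, 0.02]]
print(attend_and_excite_loss([cat_map, dog_map]))  # → 0.9
```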
VR for screenless laptops
Dori Peleg is Sightful's Sr. Director of Algorithms. He has a PhD in Electrical Engineering from the Technion – Israel Institute of Technology and was a lecturer for a graduate optimization course. Dori's technical expertise is in machine learning and optimization. He has led AI and algorithms teams for 15 years at companies such as Cortica, Given Imaging, Medtronic, and Sightful. At Medtronic, the world's largest medical device company, he was a technical and Bakken fellow and led AI for the Gastrointestinal division. He also initiated and led Medtronic's AI conference and mentorship program.
PhD, School of Computer Science, Ariel University
Assaf is a senior lecturer at Ariel University, leading the Computer Vision and Deep Learning lab. His research stands on the theoretical-practical line, improving core elements of deep learning by proposing adaptive solutions for optimization, data normalization, and regularization. These improvements aim to enhance accuracy, robustness, and efficiency in both natural and medical computer vision, addressing their significant challenges. Assaf holds a BSc in biomedical engineering from Ben Gurion University and MSc/PhD in biomedical signal and image processing from Technion. He completed a postdoc at Stanford University and received the Young Investigator Award from NCI-NIH for his exceptional contributions to medical imaging.
Leveraging the Triple Exponential Moving Average for Fast-Adaptive Moment Estimation
To enhance deep network performance, the precision and efficiency of optimizers in recognizing gradient trends are crucial. Existing optimizers primarily rely on first-order Exponential Moving Averages, resulting in noticeable delays and suboptimal performance. We introduce the Fast-Adaptive Moment Estimation (FAME) optimizer. FAME leverages a higher-order Triple Exponential Moving Average (TEMA, inspired by the financial domain) to improve gradient trend identification. Here, TEMA actively influences optimization dynamics, unlike its passive role in finance. FAME excels in identifying gradient trends accurately, reducing lag and offering smoother responses to fluctuations compared to first-order methods. Results showed FAME’s superiority. It minimizes noisy trend fluctuations, enhances robustness, and boosts accuracy in significantly fewer training epochs than existing optimizers.
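As a sketch of the TEMA signal FAME borrows from technical analysis (the smoothing constant and the way TEMA is folded into the optimizer's moment estimates are illustrative, not the paper's exact hyperparameters):

```python
import numpy as np

def tema(signal, beta=0.9):
    """Triple Exponential Moving Average: TEMA = 3*EMA1 - 3*EMA2 + EMA3,
    where EMA2 smooths EMA1 and EMA3 smooths EMA2. Compared with a plain
    first-order EMA, TEMA tracks trend changes with much less lag."""
    def ema(x):
        out = np.empty_like(x, dtype=float)
        acc = x[0]  # initialize with the first sample
        for i, v in enumerate(x):
            acc = beta * acc + (1.0 - beta) * v
            out[i] = acc
        return out
    e1 = ema(np.asarray(signal, dtype=float))
    e2 = ema(e1)  # EMA of the EMA
    e3 = ema(e2)  # EMA of the EMA of the EMA
    return 3.0 * e1 - 3.0 * e2 + e3
```

On a long linear trend the three lags cancel exactly (the EMAs lag by L, 2L, and 3L, and 3L − 3·2L + 3L = 0), which is the lag-reduction property the optimizer exploits for gradient trends.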
Head of AI Research, Wix.com
Dr. Eli Brosh is the Head of AI Research at Wix, where he is working on future-looking technologies for website building using language and vision-based models. His research focuses on applying deep learning models and utilizing multimodal inputs for graphic design systems and layout generation. Prior to Wix, Eli held leadership positions in top companies in the fields of visual driving analytics and medical diagnostics. Eli holds a PhD in Computer Science from Columbia University and is the author of more than 30 publications and patents.
Generative AI in graphic design: challenges and opportunities
In this talk, we focus on the layout generation process, an essential ingredient of graphic design applications. We discuss the main challenges in the field, describe the different solution approaches, and introduce our recently proposed method, DLT, which consists of a novel joint discrete-continuous diffusion process, and highlight its effectiveness for conditioned layout generation.
Senior Deep Learning Team Lead, Mobileye
Gal Kaplun is a research scientist and senior team leader at Mobileye, specializing in applied Deep Learning and Computer Vision. He recently completed his Ph.D. in Computer Science at Harvard University's ML Foundations Group, under the guidance of Prof. Boaz Barak. Prior to that, he earned his BSc in Computer Science and Mathematics from the Hebrew University of Jerusalem, graduating summa cum laude. Gal has published papers in top Deep Learning conferences such as ICLR, NeurIPS, and ICML, focusing on the theoretical foundations and empirical understanding of DNNs.
Less is More
In this talk, I will introduce SubTuning, a novel parameter-efficient finetuning method for neural networks that selectively trains a subset of the layers. This approach is based on the observation that the utility of layers in a pretrained model varies when adapting to a target task, influenced by factors such as model architecture, pretraining tasks, and data volume. By leveraging this observation, SubTuning carefully chooses the layers to finetune, providing a flexible method that outperforms conventional methods in scenarios with scarce data while also enabling efficient inference in multi-task settings.
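A toy sketch of the underlying idea (the greedy criterion, names, and interface here are illustrative, not the paper's exact algorithm): measure the validation gain of finetuning each layer in isolation, then unfreeze only the most useful subset.

```python
def select_layers_to_finetune(layer_gains, k=2):
    """SubTuning-style selection sketch: rank layers by the validation
    improvement obtained when finetuning each layer in isolation
    (measured beforehand), then keep only the top-k for finetuning."""
    ranked = sorted(layer_gains, key=layer_gains.get, reverse=True)
    return set(ranked[:k])

def apply_freeze_plan(layer_names, trainable):
    """Map each layer name to a trainable flag (True = finetune,
    False = keep frozen at its pretrained weights)."""
    return {name: (name in trainable) for name in layer_names}
```

Only the selected layers receive gradient updates; the rest stay at their pretrained values, which is what enables parameter sharing across tasks at inference time.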
Efrat Shimron is an assistant professor at the Technion, with dual affiliation to the departments of Electrical and Computer Engineering and Biomedical Engineering. She was previously a postdoctoral fellow at UC Berkeley. Her research spans the development of Compressed Sensing and AI algorithms for medical imaging, focusing on magnetic resonance imaging (MRI). She also investigates topics of bias in AI models; her work on identifying “data crimes” in medical AI was published in the Proceedings of the National Academy of Sciences (PNAS) journal. Efrat recently received several career awards, including MIT’s Rising Star in Electrical Engineering and Computer Science award.
“Data Crimes: The Risk in Naive Training of Medical AI Algorithms”
In contrast to the computer vision field, where large open-access databases are abundant, the medical AI field suffers from data scarcity. Specifically, datasets of raw magnetic resonance imaging (MRI) measurements are small and quite limited. This poses a challenge for training AI algorithms for certain tasks, e.g. image reconstruction from MRI measurements. A common workaround is to download non-raw datasets that were published for other tasks, such as tumor segmentation, and use them for synthesizing “raw” MRI data and training reconstruction algorithms. Nevertheless, this could lead to biased results. In this talk I will describe how the bias emerges from such naïve workflows, and how it leads to fantastic, overly-optimistic results, which are too good to be true. Moreover, I will show that algorithms trained on synthesized data could later fail in clinical settings and miss important details. Next, I will introduce a new framework, titled “k-band”, which our team developed to address this challenge. The k-band framework enables training MRI reconstruction algorithms using only limited data, in a self-supervised manner, and hence reduces the need for massive datasets.
Co-Founder and CEO, Leo AI
Dr. Maor Farid (Co-Founder & CEO) is an AI and Chaos Theory scientist and lecturer at the Technion. He previously served as a Fulbright postdoctoral fellow at MIT and as Israel's representative at Harvard’s leadership program. During his military service in the IDF, he served as a researcher and commander in the Brakim excellence program, the Israeli Prime Minister's Office, and Unit 8200 (Captain), and was acknowledged as a Distinguished Scientist (top 3 scientists in the IDF). He completed his Ph.D. with the highest honors as the youngest graduate at the Technion, at the age of 24. Dr. Farid is the recipient of some of the most prestigious academic awards, including Israel's National Academy Award and Israel's Ministry of Science and Technology award for groundbreaking research. He is also the founder of the Center of Israeli Scholars at MIT (ScienceAbroad, NGO) and of "Learn to Succeed", an NGO for empowering at-risk youth, and the author of a best-selling book of the same name. Dr. Farid is a member of the Forbes 30 Under 30 list.
GenAI & the Next Industrial Revolution - How will humanity engineer the future?
In today's landscape, engineering design remains mostly manual, posing significant challenges in translating market and product requirements into engineering concepts, technical specifications, and 3D computational (CAD) models. This labor-intensive process results in extended Time to Market (TTM), often causing organizations to lag behind their competitors. While the potential of Generative Artificial Intelligence (GenAI) is promising, existing generally-trained Large Language Models (LLMs) and Deep Learning models struggle to comprehend the complexity of engineering systems. A tailored engineering-specific solution is essential.
Director of Autonomous Mapping & Strategy, Nexar Inc.
Matan Friedmann is leading the autonomous mapping team in the AI Automotive group at Nexar, a startup company dedicated to creating a network of connected vehicles for the future of mobility. His team focuses on the creation of high-definition road maps using crowd-sourced vision datasets from AI-powered dashcams. These maps are built as a precise and scalable solution for smart driving platforms. Matan received a BSc in physics, an MBA, and an MSc in astrophysics from Tel Aviv University, where he published several papers in leading journals in the fields of microlensing exoplanets and supernovae in the early universe.
Scalable HD-Map Creation from Crowd-Sourced Vision
Creating and updating HD maps for autonomous driving is critical yet prohibitively expensive at scale. This presentation delves into our scalable solution, relying on crowd-sourced vision data from AI-powered dashcams, while addressing mixed-fleet localization accuracy challenges, and without the use of expensive lidars. Utilizing deep learning in computer vision and structure-from-motion components, we generate precise 3D point clouds with dense 3D representations of various road assets, and provide high levels of asset localization. Join us as we explore the technical intricacies of this approach, offering insights into its potential to revolutionize autonomous navigation.
Co-Founder & CEO, DailyRobotics
An experienced entrepreneur who loves to lead early-stage teams building new technology and business in the fields of AI, Robotics, and Autonomous Systems. Adham believes that the current way we do agriculture is unsustainable in terms of soil disruption, and that robotics, with AI advancing rapidly, can play a significant role.
· Adham is the former CEO and Co-Founder of Imagry. He led the company from 0 to 70 employees and raised more than $20M in venture funding. Imagry is the first company to receive a license to operate an autonomous public transportation bus in Israel.
· Adham has a B.Sc. in Biomedical Engineering and studied towards an M.Sc. in Mechanical Engineering at Tel Aviv University.
· Holds multiple patents in the fields of AI, computer vision, and deep learning.
Zero Shot Learning in Farming
The impressive generalization capabilities of large neural network models (as lately seen in DALL-E 2/3, Stable Diffusion, ChatGPT, etc.) revolve around the ability to integrate enormous quantities of training data. To enable robots to perform multiple complex tasks in unstructured environments such as farms, and to learn new tasks with minimal effort, we need to learn from diverse prior datasets in the real world. However, collecting a large amount of data from demonstrations, or even with randomized exploration, can be challenging for the robot. It needs to generalize to unseen farms, recognize visual and dynamical similarities across scenes, and learn a representation of visual observations that is robust to distractors like weather conditions and obstacles. Since such factors can be hard to model and transfer from simulated environments, we tackle these problems by building a multi-modal learning algorithm that combines language, visuals, and actions.
Robotic navigation and decision-making (motion planning) have been approached both as a problem of 3D reconstruction (perception) and planning, and as an end-to-end learning problem. The first method requires hand engineering that is difficult to scale from one environment to another. End-to-end learning is a large black box that is uncontrollable and cannot be debugged, leading to unpredictable development cycles.
Our proposed approach integrates learning and planning, and can utilize side information such as schematic roadmaps, object descriptions, text instructions, satellite maps and GPS coordinates as a planning heuristic, without relying on them being accurate.
Our target is to use an image-based learned controller and a goal-directed heuristic to navigate to goals a few kilometers away and execute novel tasks upon reaching them in previously unseen environments, without performing any explicit geometric reconstruction, utilizing only a topological representation of the environment.
The resulting method should be robust to unreliable maps, GPS, and commands since the low-level controller (Decision-making Model) ultimately makes decisions based on egocentric image observations.
Algorithm Manager, Applied Materials
I hold a BSc and MSc in Physics with research in the field of Superconductivity.
I have 13 years of experience in the semiconductor industry at leading companies like Intel, Nova, and Applied Materials, filling various positions in data science and algorithm development.
Co-inventor of 10 patents in this field in machine and deep learning.
Currently, I lead an algorithm group developing deep learning solutions for next-generation, industry-specific anomaly detection and segmentation.
Generating the Perfect Reference: Anomaly Detection Via Fusion of Stochastic and Deterministic Learning
Defect detection in the semiconductor industry is an extreme case of anomaly detection, comprising very small anomalies that are often well-harmonized with the background pattern.
Although reference samples are very similar, none is perfect for comparison due to production variation.
We propose to generate this perfect reference: a generated image counterpart that is identical to the input sample everywhere except the defective area.
We are using a novel fusion of stochastic and deterministic learning to train a conditional generative deep VAE model.
We demonstrate perfect reference generation for the MVTec dataset and for silicon-manufacturing Scanning Electron Microscope (SEM) images, achieving industry SOTA results.
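Once such a perfect reference exists, detection itself becomes straightforward. A minimal sketch, assuming the generator (a conditional VAE in the talk) is trained separately; names and the threshold are illustrative:

```python
import numpy as np

def anomaly_map(sample, reference, threshold=0.5):
    """Given an input image and its generated 'perfect reference'
    (identical except in defective regions), the anomaly map is simply
    the per-pixel residual, thresholded into a binary defect mask."""
    residual = np.abs(sample.astype(float) - reference.astype(float))
    return residual > threshold

sample = np.zeros((4, 4)); sample[1, 2] = 3.0  # one bright defect pixel
reference = np.zeros((4, 4))                   # defect-free counterpart
mask = anomaly_map(sample, reference)
```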
Co-Founder & Co-CEO, SagivTech
Chen Sagiv earned her PhD in Applied Mathematics from Tel Aviv University, focusing on variational methods and Gabor analysis.
After working as an algorithms developer, she became a parallel entrepreneur and co-founded SagivTech, a computer vision projects company; DeePathology, which applies AI to computational pathology; and SurgeonAI, which works on bringing AI to the OR.
Chen is also a co-founder of IMVC.
Chen is passionate about bringing technology to healthcare, promoting math education for at-risk youth, and dogs.
Introduction to Transformers
Transformers are neural networks that learn context from relationships in sequential data using a mechanism called attention.
The modern transformer was proposed in the 2017 paper 'Attention Is All You Need' by Ashish Vaswani et al. of the Google Brain team.
While transformer models are basically large encoder/decoder blocks that process data, it is their attention mechanism that allows them to detect patterns in the data.
In this session, a brief introduction to the foundations of transformers will be given.
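As a minimal illustration of the attention mechanism described above, here is scaled dot-product attention in NumPy (a didactic single-head sketch, without masking or learned projections):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Core transformer operation: each query attends to all keys, and
    the output is an attention-weighted mixture of the values.
    Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # (n_queries, n_keys) similarities
    # numerically stable row-wise softmax
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V, weights
```

Each row of `weights` is a probability distribution over the sequence, which is exactly the "context from relationships" the session refers to.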
Morris Alper is a PhD student at the School of Electrical Engineering, Tel Aviv University (TAU). Under the mentorship of Dr. Hadar Averbuch-Elor, he is researching multimodal learning – machine learning applied to tasks involving vision and language. He received his MSc with honors from TAU (Computer Science), and his BSc from MIT (Mathematics and Linguistics).
Most humans use visual imagination to understand and reason about language, but models such as BERT reason about language using knowledge acquired during text-only pretraining. In this work, we investigate whether vision-and-language pretraining can improve performance on text-only tasks that involve implicit visual reasoning, focusing primarily on zero-shot probing methods. We propose task suites for testing this as well as a novel zero-shot knowledge probing method for multimodal models like CLIP. Our results indicate that exposure to images during pretraining affords inherent visual reasoning knowledge that is reflected in language-only tasks that require implicit visual reasoning. Our findings bear importance in the broader context of multimodal learning, providing principled guidelines for the choice of text encoders used in such contexts.
With over two decades of experience in the tech industry, Shira has held prominent product and AI leadership roles at companies like Microsoft and monday.com. As a co-founder of LeadWith, a non-profit organization dedicated to empowering women in the tech field, she is passionate about promoting diversity and inclusivity in the industry. Shira’s contributions have been recognized by being selected for the esteemed 40 under 40 list by Globes. Currently, she works as an independent consultant and speaker, leveraging her expertise to mentor product managers and impart knowledge through her own academy. Additionally, her influential podcast serves as a platform for engaging discussions on all aspects of product management.
Artificial Intelligence, Real Biases: Examining Gender Biases in AI
Have you ever wondered why when you ask Midjourney for pictures of drivers, only images of men appear? Or why Siri's default voice is female?
The lecture explores the existence of gender biases in artificial intelligence and how they impact our understanding of reality.
As AI continues to permeate our lives, it's crucial to recognize that it is not always neutral. From Google Translate to cutting-edge generative AI platforms, this lecture will examine the gender biases present in various technologies. Through engaging examples, attendees will gain a deeper understanding of the issue and learn about available tools to address and create a more equitable future.
Research Fellow, The Australian National University and Technion – Israel Institute of Technology
Dr. Yizhak Ben-Shabat (Itzik) is a Research Fellow at the Australian National University (ANU) and Technion – Israel Institute of Technology. With expertise in 3D computer vision, machine learning, and geometric algorithms, Itzik's research focuses on applying deep learning methods to 3D point clouds for tasks like 3D reconstruction, classification, detection, and action recognition. Besides his research role, Itzik is the founder and host of The Talking Papers Podcast—a groundbreaking platform for disseminating research and supporting early career academics and PhD students. Itzik earned his Ph.D. in 2019 from the Technion and later served as a Research Fellow at the ARC Centre of Excellence for Robotic Vision (ACRV).
Full details, publications, and code are available on his personal website: www.itzikbs.com.
Octree Guided Unoriented Surface Reconstruction
We address the problem of surface reconstruction from unoriented point clouds. Implicit neural representations (INRs) have become popular for this task, but when information relating to the inside versus outside of a shape is not available, optimization relies on heuristics and regularizers to recover the surface. These methods can be slow to converge and easily get stuck in local minima. We propose a two-step approach, OG-INR, where we (1) construct an octree and label what is inside and outside, and (2) optimize for a continuous and high-fidelity shape using an INR that is initially guided by the octree's labelling. To solve for our labelling, we propose an energy function over the discrete structure and provide an efficient move-making algorithm that explores many possible labellings. Our results show that the exploration by the move-making algorithm avoids many of the bad local minima reached by purely gradient-descent-optimized methods.
VP of Data Science, Zefr
Or Levi is an AI Researcher and VP of Data Science at Zefr. He holds an M.Sc. (Magna Cum Laude) in Information Retrieval from the Technion, the Israel Institute of Technology. Or’s strongest passion is using AI for social impact, which led him to develop innovative AI to fight the spread of misinformation online. The technology was named among CB Insights’ International Game Changers – with potential to transform society and economies for the better. Or’s work has been presented in leading AI conferences and covered by international media.
Detecting AI-Generated Fakes with Machine Vision
With the meteoric rise of AI image generators, fake images of public figures - such as 'Trump's arrest' - have recently become viral sensations. The risks of synthetic media being utilized to spread misinformation and undermine democracy were brought to the public’s attention, raising an interesting question: can we use AI to catch AI-generated images before they become the next viral hit? Zefr, the global leader in brand suitability, is introducing advanced vision models to detect AI-generated images and counter misinformation. The talk will cover real-world examples, the challenges of detecting fakes, and practical tips for training and deploying specialized vision models at scale.
Research Engineer, Meta AI
Adam is a Research Engineer at Meta AI Research (formerly Facebook AI Research) and a PhD student under Prof. Lior Wolf at Tel-Aviv University. He holds a BSc in computer science and mathematics from Bar-Ilan University, and an MSc in computer science from Tel-Aviv University. His research is focused on advancing generative models in image, audio, and video domains, with recent achievements in large-scale foundational generative models for images and videos.
Text to Dynamic Visual Worlds: Advancements in Video and 4D Scene Generation
In this talk, we present two methods for Text-to-Video generation: i) Make-A-Video (MAV), video generation from textual prompts, and ii) Make-A-Video3D (MAV3D), generation of three-dimensional dynamic scenes from text descriptions. MAV introduces a paradigm for directly translating the tremendous recent progress in Text-to-Image generation to Text-to-Video. MAV3D leverages a 4D dynamic Neural Radiance Field (NeRF) optimized for scene appearance, density, and motion consistency through the MAV model. The dynamic video output generated from the provided text can be viewed from any camera location and angle, and can be composited into any 3D environment. Both methods rely only on text-image pairs and unlabeled videos. To the best of our knowledge, MAV3D is the first to generate 3D dynamic scenes given a text description.
PhD Candidate, Hebrew University of Jerusalem
Bella is a PhD candidate at the Hebrew University under the supervision of Prof. Leo Joskowicz. Prior to that, she worked on medical imaging algorithms as a scientist for 7 years at Philips. Bella is also the founder and organizer of the Machine Learning for Medical Imaging (MLMI) and Haifa Machine Learning meetups.
We present a new method for partial annotation of MR images that uses a small set of consecutive annotated slices from each scan, with an annotation effort equal to a few fully annotated cases. Training is performed using only the annotated blocks, incorporating information about slices outside the structure, and modifying a batch loss function. For fetal body segmentation of in-distribution data, the use of partial annotations decreased the standard deviation of Dice scores by 22% and 27.5% for the FIESTA and TRUFI sequences, respectively. For TRUFI out-of-distribution data, the method increased average Dice scores from 0.84 to 0.9.
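As an illustrative sketch of the batch-loss modification (not the authors' exact loss), a Dice-style loss can be restricted to annotated slices so that partial labels never penalize the model:

```python
import numpy as np

def masked_dice_loss(pred, target, annotated_mask, eps=1e-6):
    """Dice loss computed only over annotated slices: slices outside the
    annotated block contribute nothing to the loss. All arrays are
    (slices, H, W); annotated_mask is a (slices,) boolean vector."""
    pred = pred[annotated_mask]      # keep only annotated slices
    target = target[annotated_mask]
    inter = (pred * target).sum()
    dice = (2.0 * inter + eps) / (pred.sum() + target.sum() + eps)
    return 1.0 - dice
```

Predictions on unannotated slices are free to be anything; only the annotated block drives the gradient.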
Deep Learning Engineer, Deci
Kate is a researcher and an engineer with a degree in Computer Science passionate about the field of Computer Vision. She moved to Israel 6 years ago to be in the center of the vibrant Israeli AI community and now she's working at Deci focusing on making SOTA vision applications faster. She has vast experience with edge devices and her day-to-day work lies in scaling and applying Deci's proprietary Neural Architecture Search algorithm, contributing to open-source, and inspecting funny training curves along the way.
Object detection has been dominated by deep learning architectures, notably the YOLO (You Only Look Once) family. This talk introduces a novel YOLO-based architecture, YOLO-NAS, developed via a proprietary neural architecture search (NAS) algorithm, AutoNAC. By optimizing accuracy and efficiency, YOLO-NAS redefines state-of-the-art object detection, paving the way for increased precision and performance in applications like autonomous vehicles, robotics, and video analytics.
Algorithm Team Leader, WSC Sports
Amos Bercovich is an Algorithm Team Leader at WSC Sports, where he and his team research and develop end-to-end real-time solutions for generating sports content from live sports broadcasts automatically, using deep learning for video, image, and audio analysis. Before joining WSC Sports, Amos worked at Cortica as an Algorithm Developer where he focused on developing image recognition applications. He has acquired his B.Sc. and M.Sc. degrees at the Ben-Gurion University of the Negev, with a thesis in the field of Computer Vision in collaboration with the Agricultural Research Organization.
Zero-Shot Event Retrieval in Sports Broadcasting
WSC Sports is developing an AI platform for generating automatic sports highlights. To make the storytelling of our content more compelling, our system also adds short transitions to the highlight video, such as team lineup graphics, close-ups of reactions, etc.
Unlike regular plays in the game, these transitions tend to change from one broadcaster to another, and over time. Although plain supervised algorithms can recognize and classify these transitions, keeping their performance at a high level can be quite costly and difficult. In this presentation, we will present our Concept Detector, a framework for creating concepts: a set of textual and visual queries combined with a set of rules to retrieve those events.
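One illustrative way to realize such concept queries (the embedding model and threshold rule are assumptions, not WSC Sports' implementation) is zero-shot retrieval in a shared image-text embedding space:

```python
import numpy as np

def retrieve_events(frame_embeds, query_embeds, threshold=0.8):
    """Zero-shot retrieval sketch: frames and textual/visual queries live
    in a shared embedding space (e.g. produced by a CLIP-style model,
    trained elsewhere). A frame matches the concept if its cosine
    similarity to ANY of the concept's queries clears a rule threshold."""
    def norm(x):
        return x / np.linalg.norm(x, axis=-1, keepdims=True)
    sims = norm(frame_embeds) @ norm(query_embeds).T  # (frames, queries)
    return np.where(sims.max(axis=1) >= threshold)[0]
```

Because concepts are defined by queries plus rules rather than labels, adding a new transition type needs no retraining, only a new set of queries.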
INTL Senior Algorithm Developer, Gentex Technologies Israel
Avrech Ben-David is a Senior Algorithm Researcher at Gentex Technologies Israel (GTI). Avrech is part of GTI’s AI center-of-excellence developing Deep-Learning based machine vision algorithms for Driver and In-Cabin monitoring. Avrech’s interests and publications span various topics in optimization, graph NN, reinforcement learning, DNN accelerator architecture, text-to-speech styling, and human-machine interaction. Avrech holds a BSc and MSc in Electrical and Computer Engineering from the Technion, IIT.
Solving 3D Human Pose Ambiguities with Quadratic Programming
3D human pose estimation (HPE) is a fundamental task in human-computer interaction. Monocular 3D HPE is a challenging task due to a lack of in-the-wild annotated data, high computational load, and limited access to depth observations. Despite their success on 2D HPE, end-to-end DNN approaches to 3D HPE hardly generalize to in-the-wild scenes with multiple self-occlusions. Recent approaches suggested optimizing 3D humanoid model parameters to minimize a 2D objective; however, as they optimize in 2D, they suffer from depth ambiguities.
We propose a two-stage depth-based solution to monocular 3D HPE. We start by using a deep neural network to predict 2D body-joint locations and to classify joints as occluded or visible. Then, having valid depth for the visible joints, we solve a Quadratically Constrained Quadratic Program enforcing skeletal and temporal-continuity constraints, thereby solving the self-occlusion problem. We demonstrate our method's effectiveness on Gentex’s in-cabin 180-degree fisheye depth camera and show that it can reconstruct reliable 3D human pose in complex situations.
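The full QCQP couples all bones and temporal terms; as a much-simplified, hypothetical illustration of a single skeletal constraint (the function and interface are ours, not Gentex's method), one occluded joint can be corrected by projecting it onto the sphere of known bone length around its parent:

```python
import numpy as np

def enforce_bone_length(parent, child, bone_length):
    """Project an estimated child joint onto the sphere of radius
    `bone_length` around its parent -- a simplified stand-in for one
    quadratic constraint of the program described above. Useful when an
    occluded joint's depth guess violates the known skeleton."""
    v = np.asarray(child, float) - np.asarray(parent, float)
    n = np.linalg.norm(v)
    if n < 1e-9:  # degenerate estimate: pick an arbitrary direction
        v, n = np.array([0.0, 0.0, 1.0]), 1.0
    return np.asarray(parent, float) + v * (bone_length / n)
```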
Principal Data Scientist, Microsoft
Zvi Figov is a data scientist with over 20 years of experience in various computer vision fields. He currently works in the Azure Video Indexer group at Microsoft. He holds a BSc and MSc in computer science and mathematics from Bar-Ilan University. Zvi has vast experience in computer vision applications, including deep learning, object detection and tracking. Since joining the Video Indexer group 4 years ago Zvi has also been working on creating solutions based on multimodality analysis, combining vision, audio and NLP.
Person tracker for Media and Entertainment videos
Azure Video Indexer is an analytical tool to generate insights from videos while indexing them. Person tracking is a crucial aspect of video analysis and plays a significant role in Azure Video Indexer. However, it poses several algorithmic and computational challenges, particularly in real-world scenarios such as the media industry. These challenges include the need for efficient and scalable algorithms, handling multiple camera switches, dealing with different angles and poses, occlusions and more.
In this talk, I will present our novel pipeline for person tracking, with significant improvements for media and entertainment videos. Our approach addresses the above-mentioned challenges and significantly reduces the computational cost and runtime required for person tracking. It combines neural network models with a novel tracking algorithm, all running fast and efficiently on a CPU.
Ph.D. Candidate at Weizmann Institute of Science and AI Researcher at IBM
Sivan works as an AI Researcher at IBM. She is a Ph.D. candidate at the Weizmann Institute of Science under the supervision of Prof. Shimon Ullman. Her papers have been published in top AI conferences, including CVPR, NeurIPS, ICCV, and AAAI. Sivan's research focuses on the fields of weakly supervised learning and Multi-Modal image-text learning.
Teaching Structured Vision&Language Concepts to Vision & Language Models
Vision and Language (VL) models have demonstrated remarkable zero-shot performance in a variety of tasks. However, some aspects of complex language understanding still remain a challenge. We introduce the collective notion of Structured Vision&Language Concepts (SVLC) which includes object attributes, relations, and states which are present in the text and visible in the image. Recent studies have shown that even the best VL models struggle with SVLC. A possible way to fix this issue is by collecting dedicated datasets for teaching each SVLC type, which might be expensive and time-consuming. Instead, we propose a more elegant data-driven approach for enhancing VL models' understanding of SVLCs that makes more effective use of existing VL pre-training datasets and does not require any additional data. While automatic understanding of image structure remains largely unsolved, language structure is much better modeled and understood, allowing for its effective use in teaching VL models. We propose various techniques based on language structure understanding that can be used to manipulate the textual part of off-the-shelf paired VL datasets. VL models trained with the updated data exhibit a significant improvement of up to 15% in their SVLC understanding with only a mild degradation in their zero-shot capabilities, both when training from scratch and when fine-tuning a pre-trained model.
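As a toy illustration of this language-structure-based text manipulation (the word list and rule below are ours; the paper uses richer parsing and also rewrites relations and states), an attribute word can be swapped to turn a matching caption into a hard negative for contrastive training:

```python
import re

# hypothetical attribute-swap table for generating hard negatives
COLOR_SWAPS = {"red": "blue", "blue": "red", "green": "yellow"}

def make_svlc_negative(caption):
    """Swap one attribute word (here, a color) so the caption no longer
    matches its paired image, yielding a hard negative that teaches the
    model to attend to the attribute."""
    for word, repl in COLOR_SWAPS.items():
        if re.search(rf"\b{word}\b", caption):
            return re.sub(rf"\b{word}\b", repl, caption, count=1)
    return None  # no manipulable attribute found in this caption
```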
PhD, Principal Data Scientist, Salt Security
With a Ph.D. in Applied Mathematics from the Technion, Mike Erlihson is a recognized data scientist (DS) and machine learning (ML) expert. He currently serves as a Principal DS at Salt Security, leading the development of statistical and ML techniques for efficient API attack detection.
In 2020, Mike founded the community project DEEPNIGHTLEARNERS, aimed at making Deep Learning (DL) papers more accessible to a broader audience, publishing reviews in both Hebrew and English. As an author, he penned "DL in Hebrew" and has been an influential educator and lecturer, including roles at Ben-Gurion University and the Israel Tech Challenge.
Generate Any Visual Data with Text2Image Diffusion Models
The talk discusses visual data generation and manipulation using pretrained Text2Image (T2I) diffusion models. These models are capable of synthesizing realistic images from textual descriptions by combining recent advances in natural language processing and computer vision. In this presentation we discuss how pretrained T2I models can be used to generate visual content of different types, such as video and 3D models. Additionally, T2I diffusion models can be leveraged to manipulate visual data (image/video personalization, image/video editing, image generation with desired visual characteristics, etc.). The presentation defines four broad approaches to utilizing T2I diffusion models to create and manipulate visual data of different types.
Or Litany is a senior researcher at Nvidia and an assistant professor at the Technion where he leads the visual computing and AI lab. His research focuses on semantic scene understanding and spatiotemporal content generation.
Data driven simulation for autonomous driving
Simulation is a critical tool for ensuring the safety of autonomous driving. However, traditional simulation methods can be labor-intensive and struggle to scale.
In this talk, I will discuss an innovative neural simulation approach that learns to simulate driving scenarios from data. Specifically, I will focus on my latest research in three key areas: scene reconstruction in both appearance and geometry, motion generation of humans and vehicles, and LiDAR view synthesis.
MS, CEO and Co-Founder, Conflu3nce ltd and Conflu3nce Health AI (CHAI)
Tami Ellison is co-founder of conflu3nce - a Jerusalem-based health technology start-up. She leads the company’s early disease detection initiatives, applying her patented technologies to transform image intelligence for both humans and machines. An accomplished photographer with exhibitions in Israel and the US, her multiplexed, figure-ground visual illusions bring a Gestalt-based understanding of how images/image parts interact with one another. A C-level consultant, working with public and private entities for over 25 years, she holds a thesis research MS from UIC’s Laboratory for Cell, Molecular and Developmental Biology, investigating developmental model systems, expression patterns, and systems-level regulation/control mechanisms.
Deep Learning for ALL: Enhancing Image Inputs - Building Knowledge Outputs
Globally, an estimated 40M diagnostic reading errors occur annually; approximately 62% can be attributed to cognitive/perceptual issues associated with complacency, underreading, and search satisfaction. AI expert systems are critical to address the exponential growth in the volume of medical images generated and help alleviate workforce and workflow inefficiencies. But outsourcing clinical decision-making can exacerbate existing errors and introduce FN/FP reporting issues. We will present image enhancement methods that transform early disease detection capabilities, applying a "Deep Learning for ALL" approach that advances pixel-level "Image Intelligence" for both humans and AI and cooperatively promotes knowledge-building, pattern recognition, and attribute extraction.
Tomer Weiss is a PhD student at the Computer Science faculty at the Technion, where he is working under the guidance of Prof. Alex Bronstein. His primary research focuses on harnessing deep learning methodologies for inverse design in computational imaging and cheminformatics. He holds an MSc with honors in Computer Science from the Technion and a BSc in Mathematics and Computer Science from Ben-Gurion University.
This talk introduces the concept of joint optimization in computational imaging using deep learning, demonstrating its practical benefits for improved performance. By showcasing real-world examples from Magnetic Resonance Imaging (MRI) and Multiple Input Multiple Output (MIMO) radar imaging, we reveal how this approach can positively impact the end performance. Join us for an exploration into the world of computational imaging, where simple yet effective techniques can make a meaningful difference.
VP of AI and Algorithms Voyage81, ODDITY
Topaz Gilad is an R&D manager specializing in AI, machine learning, and computer vision, leading production-oriented innovative research. With experience in large companies as well as startups, in various industries, from space imaging and semiconductor microscopy to sports tech, wellness, beauty, and self-care industry, she has developed methodologies to scale up while improving quality, delivery, and teamwork. Currently VP of AI and Algorithms at Voyage81, ODDITY, which excels in computer vision deep learning algorithms in both RGB and hyper-spectral domains. Previously head of AI at Pixellot, a leading AI-automated sports production company. Topaz is also an advocate for women in tech.
From Cost-Sensitive Classification to Regression: Unlock the True Potential of Your Labels!
Many of the tasks we face as data scientists or machine-learning researchers relate to categorization in one way or the other. In the words of David Mumford: "The world is continuous, but the mind is discrete." We often define categories when breaking down a real-world problem into an ML-based solution. However, actual target values may be continuous or at least ordered. This is something to consider and even leverage in the design of your ML model.
Using case studies from real-world data domains, we will see how acknowledging the inner relations of our target labels can boost the knowledge we provide in the training phase, better model the world, reduce overconfidence, and improve robustness. From classical concepts to state-of-the-art, this talk will walk you through regression-based approaches for what may seem like classification problems. Unlock the true potential of your labels and boost your classifiers!
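As one minimal illustration of "acknowledging the inner relations of our target labels," the sketch below builds distance-aware soft targets for an ordinal classification problem, so that classes adjacent to the true class are penalized less than distant ones. The function and its exponential weighting are illustrative assumptions, not a specific method from the talk.

```python
import math

def ordinal_soft_targets(num_classes, true_idx, temperature=1.0):
    """Soft targets that encode label order.

    Instead of a one-hot vector, each class receives probability mass
    that decays with its distance from the true class, injecting the
    ordering of the labels into the training signal.
    """
    logits = [-abs(i - true_idx) / temperature for i in range(num_classes)]
    m = max(logits)                       # stabilize the softmax
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

# For 5 ordered classes with true class 2, the target distribution
# peaks at class 2 and decays symmetrically with distance.
t = ordinal_soft_targets(5, 2)
```

Training a classifier against such targets (e.g., with cross-entropy) discourages confident predictions far from the true label, one simple bridge between classification and regression-style supervision.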
VP Research Lightricks
Ofir Bibi, VP Research at Lightricks, has led the company's research department of 40+ researchers for more than seven and a half years. Ofir specializes in bringing core technologies and ML solutions into products and has vast experience in building products and processes by utilizing data to the furthest extent. His main research focus is on Machine Learning, Statistical Signal Processing, and Optimization.
Taming the Wild Generative Beast for the Everyday Creator
As the use of generative AI becomes more prevalent in the tech industry, it can be difficult to understand how to effectively implement this new technology in your company's products. In this talk, Ofir Bibi will provide a general overview of how we implemented generative AI at Lightricks and showcase our latest developments in image transformation technology. He will also share thoughts on the impact that generative AI will have on the industry.
Co-founder and CTO Visual Layer
Amir Alush is the co-founder and CTO of Visual Layer, a company dedicated to improving the quality of image datasets used in AI model development. He holds a Ph.D. in Computer Vision and Machine Learning from Bar-Ilan University and has extensive experience in the field, including roles at Quris.AI and Brodmann17. His work focuses on AI system design, deep learning, and computer vision. His recent project fastdup (co-authored with Dr. Danny Bickson) showcases his commitment to practical, data-driven solutions in AI.
From Raw Data to Refined Datasets: Introducing VL Datasets for Reliable AI Model Development
Generative AI has revolutionized various domains, including art and design. However, the success of generative models heavily relies on high-quality, extensive image datasets for effective training. Whether you're an AI enthusiast, researcher or a student, you've likely encountered challenges associated with untidy image datasets in Generative AI or other visual data-focused AI applications.
These challenges, including issues like duplicated images, mislabeled data, and outliers, can severely impact model reliability, waste computational resources and storage, and demand significant manual cleanup efforts. In our research on LAION-1B, we uncovered quality issues in approximately 105 million images: more than 90 million images were identified as duplicates, over 7 million were blurry or of low quality, and more than 6 million were deemed outliers.
To address these challenges, we have released a set of refined versions of popular visual datasets, namely LAION-1B and ImageNet-21K. We have named these refined datasets VL Datasets, and they are freely accessible through the visuallayer Python SDK or the free user-friendly VL Profiler UI. By utilizing VL Datasets, AI practitioners and researchers can enhance the development of more robust and reliable AI models.
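As a toy illustration of how duplicated images like those described above can be flagged, the sketch below compares image embeddings by cosine similarity. The function names, the brute-force scan, and the threshold are assumptions for illustration only; they do not reflect the actual implementation behind VL Datasets or fastdup, which must scale far beyond a pairwise loop.

```python
import math

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def find_near_duplicates(embeddings, threshold=0.95):
    """Return index pairs whose embeddings are nearly identical.

    O(n^2) scan for clarity; billion-image pipelines would use an
    approximate nearest-neighbor index instead.
    """
    pairs = []
    for i in range(len(embeddings)):
        for j in range(i + 1, len(embeddings)):
            if cosine(embeddings[i], embeddings[j]) >= threshold:
                pairs.append((i, j))
    return pairs

# Two nearly identical embeddings and one unrelated one.
embs = [[1.0, 0.0], [0.999, 0.01], [0.0, 1.0]]
pairs = find_near_duplicates(embs)
```

Flagged pairs can then be deduplicated before training, recovering the compute and storage otherwise wasted on redundant images.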
Senior Manager, Data Science & Research DoubleVerify
A seasoned AI leader with over a decade of experience solving complex industry problems.
Auto Labeling of Data Sets
Connecting traditional algorithms to ensure a successful lazy-labeling strategy.
Computer Vision Perception Group Manager Innoviz Technologies
Eyal is an Algorithm Engineer with over 10 years of experience in leading productization of Computer Vision technologies.
Following his experience building and leading research and development algorithm teams, Eyal now leads the Perception Group at Innoviz Technologies, overseeing Innoviz's AI Perception and Calibration solutions, from data aspects through algorithms to deployment.
Prior to Innoviz, Eyal held key positions in several startup companies, from early-stage core teams to late-stage, leading computer vision projects from conception to productization for applications such as autonomous drones, security, and industrial automation.
Eyal holds a BSc in Electrical Engineering from Ben-Gurion University and an MSc in Electrical Engineering from Tel-Aviv University.
Using AI for 3D perception in L3 Autonomous driving
In order to improve safety in autonomous driving, it is important to use multiple sensors. In this talk we will discuss how to perform perception from 3D data acquired by the Innoviz lidar. Specifically, we will describe how to identify obstacles or moving objects and perform detection in 3D.
We will show how combining AI with classical techniques provides both high accuracy and robustness to diverse conditions and scenarios.
AI Research Scientist IBM
Amit works as an AI Research Scientist at IBM. He graduated with a PhD from the Technion under the supervision of Prof. Alex Bronstein. His papers have been published in top AI conferences such as CVPR, NeurIPS, ECCV, and AAAI. Amit researches few-shot learning, multi-modal image-text learning, and currently multi-modal image-language foundation models.
Attention based change detection using transformers
Amit will discuss "FETA: Towards Specializing Foundation Models for Expert Task Applications," a NeurIPS 2022 main conference paper. While Foundation Models (FMs) have demonstrated unprecedented capabilities, they still have poor out-of-the-box performance on expert tasks. We offer an automatic system and method to adapt FMs to expert data using raw documents only, without requiring any annotations. FETA can be easily used on any document. We also propose a benchmark built around the task of teaching FMs to understand technical documentation. Our FETA benchmark focuses on text-to-image and image-to-text retrieval in public car manuals and sales catalogue brochures.
Senior Research Scientist OriginAI
Rami Ben-Ari is a senior research scientist and technical leader at OriginAI, an AI research center in Israel. He maintains close collaboration with several universities in Israel, supervising graduate students, and serves as an adjunct professor at Bar-Ilan University. Rami has published over 50 papers and patents and has organized various workshops and challenges in CV & ML.
His research interests cover deep learning methods in computer vision and particularly, image retrieval, multimodal learning and generative models.
He holds a PhD in Applied Mathematics from Tel-Aviv University, specializing in computer vision.
Enhancing Image Retrieval: Novel Approaches and Scenarios
Image retrieval is essential for managing, searching, and making sense of the ever-growing volume of visual data across diverse fields and applications. In this talk I will present several of our research works in interactive image retrieval, including a new architecture that leverages few-shot learning, a greedy active learning approach for image retrieval, a new method that combines textual and visual search, and finally a system that leverages emerging chat capabilities for the benefit of image retrieval.
AI Team Leader 4M Analytics
Rebecca Hojsteen is AI team leader at 4M Analytics, a company specializing in providing cutting-edge AI solutions for mapping underground infrastructures on a large scale.
Prior to joining 4M Analytics, Rebecca was an expert in computer vision algorithm development at RTC-vision and Samsung.
She holds an MSc in biomedical engineering from the Technion and an ME in electrical engineering from Supelec in Paris.
How to process engineering records for infrastructure mapping
In today's construction industry, precise knowledge of underground infrastructures is of paramount importance in project planning. However, the current scenario lacks a reliable and up-to-date map of underground infrastructure.
We have developed a groundbreaking mapping technology based on the processing of multiple sources, including engineering records which are extremely accurate and rich in information.
Engineering records need to be geolocated, extracted, and digitized. This is particularly challenging due to the very high variability in documents and the density of the sketches.
In this talk, we will present the innovative solution we have developed to process this data at scale.
Sr. Director of EyeQ Deep Learning Frameworks Mobileye
Shiri Manor is a Senior Director at Mobileye with over 20 years of experience in software engineering and management. Leading a dynamic team at Mobileye, Shiri Manor spearheads efforts to enable and enhance deep learning algorithms on Mobileye's embedded car hardware. With a focus on innovation, the team crafts cutting-edge tools utilizing open-source technology for streamlined deployment of deep learning networks, incorporating advanced optimization techniques.
Shiri Manor's professional journey includes being a Software Group Engineering Manager at Intel, where she played an instrumental role in designing and developing computer vision SDKs and Intel OpenCL products. She holds a BSc in computer science with honors and an MSc in the same field from the Technion.
How to enable running DL Networks in the car
Deep learning networks have emerged as a prominent technology for the accurate detection and identification of objects on the road, empowering the way to fully autonomous cars. In the pursuit of cost-effective and power-efficient solutions, this presentation delves into the challenges and strategies associated with optimizing deep learning networks for real-time execution in the context of vehicular environments.
Given the stringent constraints of cost-efficiency and low power consumption, a key challenge lies in achieving high performance without compromising accuracy.
Several optimization techniques that contribute to network efficiency are elucidated, with a focus on minimizing computational redundancy and encouraging resource-efficient execution.
Through a compelling case study, the effectiveness of these optimization techniques is demonstrated in the real-world scenario of running a transformer network within an automotive system.
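As a small illustration of one family of optimization techniques used when deploying networks on cost- and power-constrained hardware, the sketch below performs symmetric per-tensor int8 weight quantization. This is a generic, simplified sketch of a well-known technique, not a description of Mobileye's actual toolchain.

```python
def quantize_int8(weights):
    """Symmetric per-tensor int8 quantization.

    Maps float weights to integers in [-127, 127] with a single
    scale factor, shrinking memory footprint roughly 4x versus
    float32 and enabling cheap integer arithmetic on embedded HW.
    """
    scale = max(abs(w) for w in weights) / 127.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from the int8 codes."""
    return [qi * scale for qi in q]

w = [0.5, -1.27, 0.02, 1.0]
q, s = quantize_int8(w)
w_hat = dequantize(q, s)  # close to w, up to quantization error
```

The worst-case round-trip error is bounded by half the scale, which is why accuracy often survives quantization when the weight range is well behaved; per-channel scales and calibration data tighten this further in practice.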
Master's student Weizmann Institute of Science
Tom Sharon received a B.Sc. degree in science, summa cum laude, focusing on mathematics and physics, from The Open University, Israel, in 2021. Presently, she is completing her M.Sc. in mathematics and computer science at the Weizmann Institute of Science, Rehovot, Israel, under the supervision of Prof. Yonina Eldar.
Her research interests include the application of deep-learning and computer vision methods to physics challenges, including medical applications. Her work focuses on electromagnetic and acoustic signals for medical imaging, using deep-learning methods such as model-based neural networks, and on solving inverse scattering problems for quantitative imaging.
Her awards include a scholarship for excellence for master's degree students in high-tech fields.
Real-Time Model Based Quantitative Radar
Ultrasound and radar signals are beneficial for medical imaging due to their non-invasive and low-cost nature. Quantitative medical imaging can display various physical properties of the scanned medium, in contrast to traditional imaging techniques. This broadens the scope of medical applications, including fast stroke imaging. However, current quantitative imaging techniques are time-consuming and tend to converge to local minima. We propose a neural network based on the physical model of wave propagation to achieve real-time multiple quantitative imaging for complex and realistic scenarios, using data from only eight elements, demonstrated for diverse transmission setups using either radar or ultrasound signals.
PhD candidate Technion
Maya is a PhD candidate at the Technion. Working under the supervision of Dr. Moti Freiman, she specializes in the development of innovative algorithms for medical imaging. Maya holds both a BSc and an MSc in Computer Science, having graduated Magna Cum Laude. With a diverse background in software engineering and machine learning, she has had the opportunity to lead engineering teams in both the IDF and the private tech sector. Before assuming her current role at Voyantis, Maya served as an Algorithms Architect at Gett. Her current research efforts focus on leveraging DWI-MRI to improve breast cancer treatment outcomes.
Integrating Radiomics and Physiological Decomposition of DWI
We introduce PD-DWI, a machine-learning model for early prediction of pathological complete response (pCR) in breast cancer patients undergoing neoadjuvant chemotherapy (NAC). Leveraging decomposed diffusion-weighted MRI (DWI) and clinical data, our model outperforms conventional methods in the BMMR2 challenge, achieving an area under the curve (AUC) of 0.8849 versus 0.8397. PD-DWI has the potential to enhance pCR prediction accuracy, reduce MRI acquisition times, and eliminate the need for contrast agents.
94, Yigal Alon St.
Tel Aviv 6109202