15th Israel Machine Vision Conference (IMVC) 2026

Monday | April 27, 2026

Pavilion 10, EXPO Tel Aviv

Agenda

IMVC 2026 will feature presentations by leading researchers in AI, with a focus on image and video processing, computer vision, machine learning, and deep learning. Attendees can expect visionary insights and cutting-edge developments from both academia and industry, showcasing the latest trends in artificial intelligence and its applications.

 

Exhibition

IMVC is the premier platform for companies shaping the future of AI. Discover the latest advancements in machine vision and machine learning, and connect with experts, entrepreneurs, developers, and engineers. Join us in Tel Aviv to forge collaborations, explore ongoing trends, and witness new applications in the field.

Topics

Core Computer Vision & AI Technologies | Advanced AI Methodologies | Emerging Technologies & Applications | Specialized Application Domains | Technical Foundations & Optimization | Cutting-Edge Research Areas

...and many more

 

Keynote Speaker

Eyal Enav

Nvidia

Bio:

Ayellet Tal is a professor and the Alfred and Marion Bär Chair in Engineering at the Technion's Department of Electrical and Computer Engineering. She holds a Ph.D. in Computer Science from Princeton University and a B.Sc degree (Summa cum Laude) in Mathematics and Computer Science from Tel Aviv University. Among Prof. Tal’s accomplishments are the Rechler Prize for Excellence in Research, the Henry Taub Prize for Academic Excellence, and the Milton and Lillian Edwards Academic Lectureship. Prof. Tal has chaired several conferences on computer graphics, shape modeling, and computer vision, including the upcoming ICCV.

Title:

Build Vision AI Agents With NVIDIA Cosmos Reason VLM and Video Analytics Blueprint

Abstract:

A point cloud, which is a set of 3D positions, is a simple, efficient, and versatile representation of 3D data. Given a point cloud and a viewpoint, which points are visible from that viewpoint? Since points themselves do not occlude one another, the real question becomes: which points would be visible if the surface they were sampled from were known? In this talk we will explore why point visibility is important, not only in computer vision but also beyond, how it can be determined, and in particular, how it can be addressed within optimization or deep learning frameworks.

Iris Barshack

Professor, Sheba Medical Center & Ariel University

Bio:

Title:

Abstract:

Idan Bassouk

Aidoc

Bio:

Title:

Abstract:

Ilan Tsarfaty

Professor, Tel Aviv University

Bio:

Title:

Abstract:

Tal Drori

IBM Research

Bio:

Title:

Abstract:

Speakers

Ofir Lindenbaum

Assistant Professor, Bar-Ilan University

Bio:

Ofir Lindenbaum is a Senior Lecturer (Assistant Professor) in the Faculty of Engineering at Bar-Ilan University. He completed a postdoctoral fellowship at Yale University in the Applied Mathematics Program with Prof. Ronald Coifman and Prof. Yuval Kluger, and earned his Ph.D. in Electrical Engineering from Tel Aviv University under Prof. Arie Yeredor and Prof. Amir Averbuch. His research develops interpretable, efficient machine learning methods for scientific discovery, focusing on high-dimensional tabular data, multimodal learning, sparsification, optimization, and representation learning, aiming to design principled, reliable, and data-efficient algorithms for real-world scientific data.  

Title:

COPER: Correlation-based Permutations for Multi-View Clustering

Abstract:

Combining data from multiple sources often leads to better insights, yet many existing multi-view clustering methods are tailored to specific domains or require complex, multi-stage pipelines. We present a practical end-to-end deep learning framework that works across diverse data types, including images and tabular data. Our approach learns unified representations that capture shared structure across sources and enables consistent grouping without manual labels. The method is scalable, robust to noise, and supported by both theoretical insights and extensive experiments on ten benchmark datasets, demonstrating strong and reliable performance across varied real-world settings.

Michal Holtzman Gazit

AI Research Scientist, Earth Dynamics AI

Bio:

Michal Holtzman Gazit is a Computer Vision and AI Researcher at Earth Dynamics AI with over 25 years of expertise in image processing, computer vision, and deep learning. Her career evolved from a foundation in medical imaging to sophisticated 2D and 3D structural analysis. She holds a BSc and MSc in Electrical Engineering and a PhD in Computer Science from the Technion, and performed postdoctoral research in inverse problems at the University of British Columbia. Michal specializes in leading the transition of advanced research to production-ready systems. Currently, she develops Geoscience Foundation Models for mineral exploration, utilizing AI to decode Earth’s 3D structures and revolutionize resource discovery.

Title:

Multi-Modal Geologic Intelligence: 3D Inversion and Map Synthesis via Generative Foundation Models

Abstract:

The integration of generative foundation models into geoscientific workflows represents a transformative shift in solving complex inverse problems. We explore advanced architectures for map synthesis via Conditional Flow Matching and volumetric inversion via 3D VAEs, leveraging magnetic, gravity, and drilling data. By constraining multi-modal generative priors with physical laws, we synthesize high-fidelity geologic insights from sparse, unorganized measurements. This approach accelerates mineral exploration by significantly reducing the cost and uncertainty of targeting subsurface anomalies. This synergy of cross-modal generative processes and potential field theory defines a new era of geologic intelligence.

Lorina Dascal

Principal Image Processing Specialist, Abbott Laboratories

Bio:

Lorina Dascal is a principal computer vision and image processing specialist at Abbott Labs. Her research interests include deep learning for image/video understanding, 3D medical shapes, multimodal fusion of imaging, and neural partial differential equations in vision. She has authored 14 published papers and has earned 11 patents. She holds a PhD in Applied Mathematics from Tel Aviv University and was a postdoctoral fellow and research assistant in the Computer Science Department at the Technion.

Title:

Automatic 3D Surface Reconstruction of the Left Atrium from Unorganized Contours

Abstract:

ICE (intracardiac echocardiography) is a valuable tool in cardiac catheterization and electrophysiology (EP) procedures, assisting physicians in visualizing anatomical details and in monitoring procedures like catheter ablation, septal defect closure, left atrial appendage occlusion, and valve implantation. Our aim is to automatically create an accurate three-dimensional surface model of the left atrium from automatically segmented boundaries of ICE images. We propose a modified Poisson reconstruction method with additional geometric constraints, which enables the creation of accurate, highly detailed, and computationally efficient surfaces from diverse sets of unorganized and sparse contours.

Hana Bezalel

Tel Aviv University

Bio:

Hana Bezalel holds an M.Sc. from Tel Aviv University, supervised by Hadar Averbuch-Elor. Her CVPR 2025 publication focuses on relative, in-the-wild pose estimation in extreme settings. Previously Lead Computer Vision Engineer at Rafael, she currently serves as an Algorithm Developer at Mobileye, where her work centers on geometric computer vision and spatial understanding.

Title:

Extreme Rotations Estimation In The Wild

Abstract:

We present a technique and benchmark dataset for estimating the relative 3D orientation between a pair of Internet images captured in an extreme setting, where the images have limited or non-overlapping fields of view. Prior work targeting extreme rotation estimation assumes constrained 3D environments and emulates perspective images by cropping regions from panoramic views. However, real images captured in the wild are highly diverse, exhibiting variation in both appearance and camera intrinsics. In this work, we propose a Transformer-based method for estimating relative rotations in extreme real-world settings, and contribute the ExtremeLandmarkPairs dataset, assembled from scene-level Internet photo collections. Our evaluation demonstrates that our approach succeeds in estimating the relative rotations in a wide variety of extreme-view Internet image pairs, outperforming various baselines, including dedicated rotation estimation techniques and contemporary 3D reconstruction methods.

Or Kozlovsky

Senior AI Applied Researcher, Bluewhite Robotics

Bio:

Or Kozlovsky is a Senior AI Applied Researcher at Bluewhite Robotics and was recently a Student Researcher at Google. His work and research focus on generative AI, spatial AI, and real-time computer vision in both 2D and 3D domains. He has a strong record of bridging cutting-edge research with real-world computer vision applications across a broad range of areas, including medical, space, entertainment, and robotics.

Currently, Or is an M.Sc. student at Tel Aviv University under the supervision of Prof. Amit Bermano, and holds dual B.Sc. degrees in Electrical Engineering and Economics from the Technion. 

Title:

BINA: Bootstrapped Intelligence for Novel Adaptation

Abstract:

Robotic systems in real-world environments face conditions unseen during development, and while foundation models promise better generalization, integrating them under real-time onboard constraints remains challenging. We introduce BINA, a deployment-driven framework for online perceptual adaptation. BINA leverages online sparse supervision from a VLM to incrementally distil semantic knowledge into an onboard perception module. Beyond single-robot learning, BINA supports fleet-level knowledge aggregation, enabling scalable adaptation to new environments. Demonstrated on off-road traversability estimation, BINA rapidly converges from zero prior knowledge through operator-guided driving. Although demonstrated on traversability, BINA is task-agnostic and applicable to other perception and autonomy tasks.

Moshe Mandel

AI Researcher, The Hebrew University of Jerusalem

Bio:

Moshe Mandel earned an MSc in Computer Science from the Hebrew University of Jerusalem (HUJI) under the supervision of Dr. Yossi Adi. His research bridges the audio and visual domains, grounded in deep learning methodologies. His work centers on creative generative AI at the intersection of machine learning and artistic expression.

Title:

Latent Space JAM: Layout-Guided Video Generation

Abstract:

Controlling video generation is commonly achieved through training-based methods such as fine-tuning or adding control modules, which require extra optimization, data, or architectural changes. We propose a training-free approach that leverages pretrained video diffusion models to control object layout and motion using coarse spatio-temporal layouts. Our method operates in two passes: first, it steers spatial placement and temporal evolution through prompt-guided cross-attention to produce a coarse visual guide; second, this guide conditions the same model to generate a high-quality, layout-consistent video. The approach enables structured, coordinated motion with strong temporal consistency, supporting complex trajectories and multi-object interactions without additional training.
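For readers who want a concrete feel for the steering step, the short NumPy sketch below is an editorial illustration (not the authors' code) of how a cross-attention map can be biased by a coarse layout box: attention to a chosen prompt token is boosted inside the box and suppressed outside it, producing the kind of coarse visual guide the first pass generates. The grid size, boost factor, and random embeddings are all assumptions.

import numpy as np

def cross_attention(queries, keys):
    # queries: (H*W, d) spatial latent tokens; keys: (T, d) prompt token embeddings.
    logits = queries @ keys.T / np.sqrt(queries.shape[1])
    logits -= logits.max(axis=1, keepdims=True)
    weights = np.exp(logits)
    return weights / weights.sum(axis=1, keepdims=True)

def steer_with_layout(attn, box, token_idx, grid=(16, 16), strength=4.0):
    # Boost attention to `token_idx` inside the layout box, suppress it elsewhere,
    # then renormalise; this stands in for the prompt-guided steering pass.
    h, w = grid
    attn = attn.reshape(h, w, -1).copy()
    y0, y1, x0, x1 = box
    mask = np.zeros((h, w), dtype=bool)
    mask[y0:y1, x0:x1] = True
    attn[mask, token_idx] *= strength
    attn[~mask, token_idx] /= strength
    attn /= attn.sum(axis=-1, keepdims=True)
    return attn.reshape(h * w, -1)

rng = np.random.default_rng(0)
spatial_tokens = rng.normal(size=(16 * 16, 64))   # one frame's latent tokens (toy)
prompt_tokens = rng.normal(size=(8, 64))          # prompt embeddings (toy)
attn = cross_attention(spatial_tokens, prompt_tokens)
guided = steer_with_layout(attn, box=(2, 8, 2, 8), token_idx=3)
print(guided.shape, float(guided[:, 3].reshape(16, 16)[2:8, 2:8].mean()))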

Asaf Joseph

R&D, Lightricks

Bio:

Asaf Joseph holds an M.Sc. in Computer Science from HUJI, completed under the supervision of Prof. Shmuel Peleg. He currently works at Lightricks; his main research interest is video generation.

 

Title:

Latent Space JAM: Layout-Guided Video Generation

Abstract:

Controlling video generation is commonly achieved through training-based methods such as fine-tuning or adding control modules, which require extra optimization, data, or architectural changes. We propose a training-free approach that leverages pretrained video diffusion models to control object layout and motion using coarse spatio-temporal layouts. Our method operates in two passes: first, it steers spatial placement and temporal evolution through prompt-guided cross-attention to produce a coarse visual guide; second, this guide conditions the same model to generate a high-quality, layout-consistent video. The approach enables structured, coordinated motion with strong temporal consistency, supporting complex trajectories and multi-object interactions without additional training.

Zvi Stein

Sr. Algorithm Developer, Align Technology, Inc.

Bio:

Zvi Stein is an Algorithms Engineer at Align Technology, working on computer vision and 3D geometry pipelines for multi-view scanning. His work focuses on surface reconstruction, mesh refinement, and performance-critical implementations with GPU acceleration. He has experience building end-to-end systems, from image-based inference to real-time processing and quality evaluation, aimed at improving surface accuracy and robustness in challenging acquisition conditions.

Title:

Mesh Refinement from Multi-View RGB Using Image-Predicted Surface Normals

Abstract:

Accurate surface refinement in regions with fine geometric detail remains challenging in practical 3D acquisition pipelines, where reconstructed meshes are often limited by scan resolution and noise. Although many scanning systems capture high-resolution multi-view RGB imagery, exploiting these images for metric geometry refinement is difficult due to scale ambiguity and perspective effects inherent to wide field-of-view 2D projections.

We present a geometry-refinement pipeline that converts multi-view RGB observations into a consistent surface normal field and integrates it to deform an initial mesh toward a refined surface. The central approach is to use image-predicted surface normals as the primary refinement signal, providing scale-consistent geometric constraints that are not directly available from intensity values alone. Input views are selected and scored based on geometric visibility and viewpoint diversity to ensure robust coverage and stable convergence across the surface. To mitigate projection-induced distortions, images are undistorted and re-parameterized into locally aligned patches, with corresponding rotations applied to the predicted normals.

A U-Net model trained from scratch predicts normal maps on a dedicated network surface, while deformation is applied on a separate, explicitly upsampled sampling surface designed to absorb high-frequency detail beyond the resolution of the original reconstruction; an additional simplified surface supports efficient view selection and scoring. The refined normal field is fused by solving a Poisson formulation to recover metrically consistent vertex displacements. Experimental results demonstrate improved reconstruction fidelity in high-curvature and detail-critical regions, recovering subtle structures that are commonly smoothed or missing in scan-resolution-limited meshes.
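As a minimal illustration of the final fusion step (an editorial sketch, not the speaker's pipeline), the 1D toy below recovers a height profile from noisy gradient observations by a least-squares Poisson-style solve with a small screening term that anchors the solution to the coarse initial surface; the grid size, noise level, and weight are arbitrary assumptions.

import numpy as np

n = 32
x = np.linspace(0.0, 1.0, n)
true_height = 0.05 * np.sin(8 * np.pi * x)                          # fine detail to recover
rng = np.random.default_rng(1)
noisy_grad = np.gradient(true_height, x) + rng.normal(0, 0.05, n)   # stands in for predicted normals

# Finite-difference operator D plus a screening term anchoring the result to the
# coarse initial surface (here simply zero height).
dx = x[1] - x[0]
D = np.zeros((n - 1, n))
for i in range(n - 1):
    D[i, i], D[i, i + 1] = -1.0 / dx, 1.0 / dx
lam = 1e-2
A = np.vstack([D, lam * np.eye(n)])
b = np.concatenate([0.5 * (noisy_grad[:-1] + noisy_grad[1:]), np.zeros(n)])

height, *_ = np.linalg.lstsq(A, b, rcond=None)
print("RMS error vs. ground truth:", float(np.sqrt(np.mean((height - true_height) ** 2))))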

Rafael Ben Ari

Software Tech Lead, Senior AI Engineer, General Motors

Bio:

Rafael is a Software Tech Lead with a passion for AI Engineering at General Motors, where his focus is on making AI systems reliable, efficient, and production-ready. His work centers on integrating AI into core systems, and optimizing the tradeoffs between cost, latency, and result quality, using engineering to give AI exactly the context it needs to perform well. Rafael builds agents, sandboxes, benchmarks, and metrics to deeply understand system behavior and design AI solutions that actually work in the real world.

Title:

Introduction to Multi-Agent Architecture Patterns

Abstract:

A practical 15-minute technical session exploring the core principles of agentic workflows and how to move beyond single-agent limitations into coordinated, multi-agent AI systems. Attendees will learn key orchestration patterns, when to apply each, task decomposition strategies, overhead considerations, and an overview of leading frameworks. Participants will leave with a clear blueprint for designing robust multi-agent workflows and applying the right architecture for their AI applications.
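As a framework-agnostic sketch of the orchestrator/worker pattern the session covers, the plain-Python example below decomposes a goal, routes sub-tasks to specialist workers, and aggregates the results; the task split and the two worker roles are invented for illustration, and in a real system each worker would wrap an LLM- or vision-backed agent.

from dataclasses import dataclass
from typing import Callable

@dataclass
class Task:
    role: str
    payload: str

def vision_worker(task: Task) -> str:
    # Placeholder for a perception agent (detection, tracking, captioning, ...).
    return f"[vision] processed: {task.payload}"

def text_worker(task: Task) -> str:
    # Placeholder for a language agent (summarisation, reasoning, ...).
    return f"[text] summarised: {task.payload}"

WORKERS: dict[str, Callable[[Task], str]] = {"vision": vision_worker, "text": text_worker}

def orchestrate(goal: str) -> list[str]:
    # 1) decompose the goal, 2) route each sub-task to a specialist,
    # 3) aggregate results; the coordination overhead lives in steps 1 and 3.
    tasks = [Task("vision", goal), Task("text", goal)]
    return [WORKERS[t.role](t) for t in tasks]

print(orchestrate("clip of a red car overtaking a truck"))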

Igal Dmitriev

Senior Algorithm Engineer, WSC Sports

Bio:

Igal Dmitriev is a Senior Algorithm Engineer at WSC Sports, where he builds computer-vision and vision-language systems that turn broadcast sports video into dense, reliable player metadata. He specializes in deep learning, transformers, detection, and lightweight inference pipelines for production. Previously, he developed vision algorithms at UVeYe and Healthy.io, working on video analytics and 3D reconstruction. Igal holds an M.Sc. (with

Title:

Unified Sports Perception: Single-Pass Extraction of Dense Player Metadata via Lightweight VLMs

Abstract:

Automated sports analytics often requires separate detection, OCR, and captioning models, while standard VLMs struggle to link jersey numbers to the correct player. We present a Unified Perception Engine built on Florence-2, fine-tuned to produce hierarchical, structured outputs (scene context plus player bounding boxes, orientation, and OCR) in a single pass. We use parameter-efficient fine-tuning (frozen DaViT encoder, trainable decoder), custom token weighting, and a 21-token vocabulary expansion. Trained on 47K semi-automatically annotated frames with human verification, the model achieves 97.1% precision / 91.4% recall for player-metadata association, 89.3% jersey-number exact match, and 3.2× faster inference than a three-model ensemble.
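For readers unfamiliar with the parameter-efficient recipe mentioned above (frozen vision encoder, trainable decoder), the PyTorch snippet below shows the general pattern on a toy encoder-decoder; it is an editorial sketch, not the speakers' Florence-2 training code, and the layer sizes, dummy batch, and learning rate are assumptions.

import torch
import torch.nn as nn

# Toy stand-ins: the conv stack plays the role of the frozen DaViT encoder,
# the linear head plays the role of the trainable text decoder.
encoder = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                        nn.AdaptiveAvgPool2d(1), nn.Flatten())
decoder = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 1000))

for p in encoder.parameters():          # freeze the encoder
    p.requires_grad = False

optimizer = torch.optim.AdamW(decoder.parameters(), lr=1e-4)
loss_fn = nn.CrossEntropyLoss()

images = torch.randn(4, 3, 64, 64)      # dummy image batch
targets = torch.randint(0, 1000, (4,))  # dummy "token" targets

with torch.no_grad():                   # no gradients flow into the encoder
    features = encoder(images)
loss = loss_fn(decoder(features), targets)
loss.backward()
optimizer.step()
print("trainable parameters:", sum(p.numel() for p in decoder.parameters()))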

Ofir Liba

Algorithm Team Lead, WSC Sports

Bio:

Ofir Liba is an Algorithm Team Lead at WSC Sports, where he spearheads the development of multimodal systems combining Natural Language Processing (NLP) and Computer Vision. With a focus on production-grade AI, he oversees the creation of robust detection models and scalable inference pipelines. Previously, he developed vision algorithms at Samsung. Ofir holds both a B.Sc. and an M.Sc. in Electrical Engineering from Tel Aviv University.

Title:

Unified Sports Perception: Single-Pass Extraction of Dense Player Metadata via Lightweight VLMs

Abstract:

Automated sports analytics often requires separate detection, OCR, and captioning models, while standard VLMs struggle to link jersey numbers to the correct player. We present a Unified Perception Engine built on Florence-2, fine-tuned to produce hierarchical, structured outputs (scene context plus player bounding boxes, orientation, and OCR) in a single pass. We use parameter-efficient fine-tuning (frozen DaViT encoder, trainable decoder), custom token weighting, and a 21-token vocabulary expansion. Trained on 47K semi-automatically annotated frames with human verification, the model achieves 97.1% precision / 91.4% recall for player-metadata association, 89.3% jersey-number exact match, and 3.2× faster inference than a three-model ensemble.

Sagie Benaim

Assistant Professor (Senior Lecturer), The Hebrew University of Jerusalem

Bio:

Sagie Benaim is an Assistant Professor (Senior Lecturer) at the School of Computer Science and Engineering at the Hebrew University of Jerusalem. Previously, he was a postdoc at Copenhagen University, working with Prof. Serge Belongie and as a member of the Pioneer Center for AI. Prior to that, he completed his PhD at Tel Aviv University in the Deep Learning Lab under the supervision of Prof. Lior Wolf. His research interests lie in computer vision, machine learning, and computer graphics, with a particular focus on generative models, neural-based signal representations, and inverse graphics.

Title:

Retrieval-Augmented 3D Vision: Analysis and Generation in the Long Tail

Abstract:

While 2D foundation models scale remarkably, 3D vision remains constrained by data scarcity. Current methods struggle with "long-tail" distributions, often resulting in hallucinated geometry. I will present Retrieval-Augmented 3D, a framework that decouples reasoning from memorization by actively querying external databases. I will discuss two applications: RAD, utilizing uncertainty-aware retrieval to correct monocular depth, and MV-RAG, which leverages retrieved images to guide multi-view diffusion. I will conclude by demonstrating how this framework enables open-world 3D generation and understanding in the long tail.

Abraham Pelz

Algorithm Engineer, Corephotonics (a Samsung Company)

Bio:

Abraham (Avi) Pelz received his B.Sc. and M.Sc. degrees in Electrical and Electronics Engineering from Tel Aviv University. Since 2018, he has been an Algorithm Researcher at Corephotonics, specializing in vision algorithm proof-of-concept. His research encompasses challenging data scenarios, including self-supervision, domain adaptation, neural uncertainty estimation, and data scaling. Avi is the corresponding author of: Kim, Jaeseong, Abraham Pelz, Michael Scherer and David Mendlovic. “On the Effectiveness of Sparse Linear Polarization Pixels for Face Anti-Spoofing.” IEEE Sensors Journal 25 (2025)

Title:

Data Efficiency Estimation from Tiny Datasets: Sparse Polarization Biometric Case Study

Abstract:

Developing robust AI systems requires massive datasets, a significant hurdle for emerging technologies. This talk explores how to evaluate data utility early in the R&D cycle, drawing from our recent paper, "On the Effectiveness of Sparse Linear Polarization Pixels for Face Anti-Spoofing." We not only show that sparse linear polarization is highly effective for face anti-spoofing (tenfold error reduction relative to RGB), but also demonstrate that less informative representations require exponentially more training data to reach given specifications—a trend predictable using tiny datasets. This talk provides a practical case study for comparing physical representations early in research, helping teams identify the most promising technologies before hitting the "data bottleneck."
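To make the "predictable from tiny datasets" claim concrete, here is a minimal sketch of the kind of extrapolation involved: fit a power-law error-versus-data curve on a few small subsets and solve for the data needed to reach a target error. This is an editorial illustration with made-up numbers, not the paper's data or exact procedure.

import numpy as np

subset_sizes = np.array([100, 200, 400, 800])   # tiny training subsets (illustrative)
errors = np.array([0.30, 0.22, 0.16, 0.12])     # measured validation errors (illustrative)

# Assume error ~ a * n^(-b), i.e. log(error) = log(a) - b * log(n).
slope, intercept = np.polyfit(np.log(subset_sizes), np.log(errors), 1)
b, log_a = -slope, intercept

target_error = 0.02
needed = np.exp((log_a - np.log(target_error)) / b)
print(f"fitted exponent b = {b:.2f}; roughly {needed:,.0f} samples to reach {target_error:.0%} error")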

Noa Cahan

Computer Vision Researcher, Tel Aviv University

Bio:

Noa Cahan is a PhD candidate in Electrical Engineering at Tel Aviv University, advised by Prof. Hayit Greenspan. She holds both a BSc and an MSc in Electrical Engineering from Tel Aviv University. Noa's research focuses on deep learning and computer vision in medical imaging, with a particular interest in integrating diverse data modalities such as imaging, free text, and structured tabular data for medical prognosis, as well as developing cross-modal translation models using generative AI. Noa has been awarded ISF research grants, and her work has been published in leading journals such as Scientific Reports and NPJ Digital Medicine, and presented at top conferences including MICCAI and NeurIPS. Prior to her PhD, Noa worked at Amazon and Qualcomm.

Title:

Leveraging Diffusion Models towards PE Early Diagnosis using CXRs

Abstract:

Patients with respiratory issues in emergency rooms typically undergo chest X-rays (CXR), which are accessible and low-cost but provide limited low-resolution imaging. Higher-risk patients are referred for more detailed and expensive CT or Computed Tomography Pulmonary Angiography (CTPA) scans, which involve higher radiation. The study focuses on detecting Pulmonary Embolism (PE), usually invisible in CXR but detectable in CTPA.

By leveraging paired CXR–CTPA data, we investigate two complementary diffusion-based strategies that transfer diagnostic knowledge from the high-fidelity CTPA modality to the widely available CXR domain. In the first, a conditional diffusion model is trained to generate 3D CTPA-like representations directly from 2D CXRs, enriching the initial imaging with high-resolution vascular cues and improving PE detection performance from 69% to 80% AUC. In addition, we introduce a latent-space diffusion prior that performs cross-modal knowledge distillation, generating CTPA-informed classifier embeddings from CXR embeddings without explicit image synthesis, enabling state-of-the-art PE classification using CXR alone.

Together, these approaches demonstrate that diffusion models can act as powerful cross-modal bridges, either through image generation or embedding-level supervision, substantially enhancing early PE diagnosis from CXRs while reducing reliance on expensive, high-radiation imaging. Although not a replacement for clinical CTPA, this framework highlights a scalable and generalizable pathway for augmenting low-cost imaging with high-level diagnostic insight.

Our contributions through these works are as follows: (1) the first true CXR→CTPA diffusion pipeline with diagnostic validation; (2) a novel 1D-diffusion prior for CXR→CTPA embedding distillation; (3) state-of-the-art CXR-based PE classification; (4) a modality-agnostic framework extendable to other cross-modal imaging tasks, facilitating wider access to advanced diagnostic tools.

 

Ran Itay

Algorithm Developer, Applied Materials

Bio:

Ran Itay is an algorithm developer at Applied Materials, working in the Process Diagnostics and Control business unit. He holds a Ph.D. in physics from the Department of Particle Physics and Astrophysics at the Weizmann Institute of Science in Israel. Ran began working in machine learning during his postdoctoral research at the Stanford Linear Accelerator Center (SLAC) in California, USA, where he led the deep learning group in the MicroBooNE experiment, in the field of neutrino physics. In his current role, Ran focuses on developing deep learning and classical algorithmic solutions across various domains, including metrology, defect detection, and physical simulation.

Title:

Warp and Render: A Dual-Network Framework for Geometry-Controlled Simulation in Semiconductor Process Diagnostics

Abstract:

SEM images used in semiconductor manufacturing pose significant challenges for vision models because labels are scarce, geometric accuracy must be maintained at sub-pixel levels, and the domain gap from natural images is substantial. To address these limitations, we introduce Warp and Render, a dual-network framework that separates geometric structure from visual appearance and enables controlled, design-guided simulation. A deformation-prediction network aligns a reference layout to the observed image, and a rendering network generates realistic SEM-like appearance from the aligned geometry. The approach generalizes across diverse pattern types and imaging conditions, remains effective in low-data regimes, and preserves strong geometric consistency, supporting high-accuracy industrial SEM-image analysis.

Tom Hirshberg

Data Scientist, Microsoft

Bio:

Tom Hirshberg is a data scientist at Microsoft in the Edge AI group, where she develops multimodal AI systems for large-scale video understanding. Previously, she was a research intern at Microsoft in Redmond, focusing on optimization and control methods for autonomous robotic systems.

Tom holds a BSc and an MSc (cum laude) in Computer Science from the Technion. During her studies, she was part of the algorithm team that developed the Technion’s first student-built autonomous Formula race car. Her master’s thesis explored acoustic-based indoor localization for drones, bridging signal processing, machine learning, and robotics.

Title:

Object Detection and Tracking in Live Streams Using Textual and Visual Detailed Descriptions

Abstract:

In the live video analysis domain, everything must happen quickly, efficiently and accurately. While traditional object detection systems rely on predefined classes, modern applications require flexibility to describe, detect, and track any object in live video streams. This brings algorithmic and computational challenges, especially for edge devices, like handling detailed attributes (e.g., “a red vintage car”), integrating specialized trackers, and managing high camera loads efficiently.

This lecture presents an algorithm for detecting and tracking objects in live video streams using detailed textual description, image examples or both. Our approach is already successfully implemented in Microsoft’s Azure AI Video Indexer.

Eli Schwartz

Research Manager, IBM Research

Bio:

Dr. Eli Schwartz is Research Manager of Multimodal AI at IBM Research. His research focuses on vision-language foundation models and learning with limited data. Eli earned his PhD from Tel Aviv University and has authored more than 30 papers and more than 10 patents. Before joining IBM Research, he co-founded Inka Robotics, working on autonomous robotics, and worked at Microsoft developing computer vision algorithms for AR/VR.

Title:

Adaptive Resolution Processing in Vision-Language Models

Abstract:

Modern vision-language models face a fundamental accuracy-efficiency trade-off with high-resolution inputs. This talk presents four approaches to adaptive resolution across two architectural paradigms. For contrastive encoders, WAVECLIP enables coarse-to-fine processing via wavelet tokenization with early exits, while CLIMP uses Mamba architectures for natural variable-resolution support. For decoder-based VLMs, CARES predicts minimal sufficient resolution with a lightweight preprocessor (80% compute reduction), while ZoomCall trains models to selectively fetch high-resolution crops via tool-calling and reinforcement learning. These complementary strategies—progressive refinement, learned preprocessing, and agent-based reasoning—enable dynamic accuracy-efficiency trade-offs within deployed models.

Or Greenberg

AI Researcher, The Hebrew University of Jerusalem

Bio:

I am a PhD candidate at The Hebrew University of Jerusalem, advised by Prof. Dani Lischinski, and a Senior Researcher at General Motors. My research focuses on image and video generation and manipulation, with a particular interest in adverse viewing conditions and out-of-distribution (OOD) concepts, primarily related to automotive scenarios.

Title:

Seed-to-Seed: Unpaired Image Translation in Diffusion Seed Space

Abstract:

We introduce Seed-to-Seed Translation (StS), a framework optimizing unpaired image-to-image translation through two primary contributions. First, we provide an in-depth analysis of the space of inverted latents ("seeds"), denoted "seed-space", demonstrating that it encodes critical semantic features for discriminative tasks. Second, we leverage these features through a novel hybrid mechanism that combines a GAN with a diffusion model to perform unpaired seed-to-seed translation (i.e., image translation in the seed space) before the diffusion sampling steps start.

We show that our method outperforms existing GAN and diffusion-based baselines in complex automotive scene synthesis, while establishing a novel paradigm for latent-based image manipulation.

Iftach Klapp

Researcher, Volcani Institute

Bio:

Dr. Iftach Klapp holds a Ph.D. in Electrical Engineering. Following a short postdoctoral training, he joined the Volcani Institute, where he founded the Agro‑Optics and Sensing Laboratory, dedicated to developing advanced electro-optical sensing systems for agriculture. The lab studies interactions between sensors, objects, and the environment, creating optical systems and embedding physical models into data processing to ensure accurate sensing in dynamic conditions. Its approach integrates physical modeling with inverse‑problem methods, including Physically Aware Convolutional Neural Networks, to extract reliable, meaningful information from sensor data. Prior to his PhD studies, he worked for six years in the Automatic Optical Inspection industry as an opto-mechanical R&D engineer.

Title:

Affordable Thermal Imaging: Overcoming Accuracy and Resolution Limits with AI

Abstract:

Plant temperature serves as a critical indicator of crop health, especially for identifying water stress that triggers stomatal closure and canopy heating. Although radiometric thermal IR cameras can detect such stress at an early stage, their high cost (>$20,000) restricts their use in agriculture. More affordable uncooled thermal cameras (~$4,000) present a promising alternative but suffer from drift, non-uniformity, limited accuracy (±5 °C), and low spatial resolution. To overcome these limitations, we developed deep-learning methods for non-uniformity correction (NUC) and super-resolution (SR), enhancing image resolution by factors of ×2 and ×4. In field experiments using a low-cost FLIR TAU2 alongside a scientific-grade FLIR A655Sc mounted on the same drone, our end-to-end system achieved real-time processing (<1 s per frame) with high fidelity, reducing root mean square error to ~0.5 °C. The derived Crop Water Stress Index (CWSI) closely matched reference measurements, with deviations of only ~1.4–1.9%, demonstrating that this approach enables precise, affordable, and scalable agricultural monitoring for water management.

Amit Bleiweiss

Senior Data Scientist, NVIDIA

Bio:

Amit Bleiweiss is a senior data scientist at NVIDIA, where he focuses on large language models and generative AI for healthcare and life sciences. He has 25 years of experience in applied AI, with over 50 patents and publications in the domain. Amit received his degrees from the University of California at Berkeley and the Hebrew University of Jerusalem, where he specialized in machine learning.

Title:

Advancing AI in Radiology Research with NVIDIA Clara Open Medical Imaging Models

Abstract:

This session presents the Clara Medical Open Models, a suite of pre-trained deep learning models designed to advance research in medical image analysis. We will present the architectural principles, dataset curation strategies, and benchmarking protocols that underpin these models, emphasizing explainability, reproducibility, and domain generalization. Case studies will illustrate their application across diverse imaging modalities, including CT, MRI, and digital pathology. Through this exploration, the session will highlight how open, standardized model repositories accelerate scientific discovery and enable robust evaluation frameworks in healthcare research.

Omri Hirsch

Ben-Gurion University of the Negev

Bio:

Omri Hirsch is an M.Sc. student in Computer Science at Ben-Gurion University of the Negev, conducting research in Computer Vision and Machine Learning in the Vision, Inference, and Learning (VIL) group under the supervision of Prof. Oren Freifeld. His research focuses on efficient geometric learning and joint image alignment, and he is the first author of FastJAM, recently accepted to NeurIPS 2025. He has previously worked on medical imaging in collaboration with Dr. Yonatan Winetraub’s lab at Stanford University. Omri is a recipient of competitive scholarships for outstanding M.Sc. students in AI and Data Science for two consecutive years.

Title:

FastJAM: a Fast Joint Alignment Model for Images

Abstract:

Joint Alignment (JA) aims to align a collection of images into a shared coordinate frame such that semantically corresponding features coincide spatially. Despite its importance in many vision applications, existing JA methods often rely on heavy optimization, large-capacity models, and extensive hyperparameters, leading to long training and limited scalability.

In this talk, we present FastJAM, a fast joint alignment framework that reframes JA as a graph-based problem over sparse keypoints. FastJAM leverages pairwise correspondences and a graph neural network to efficiently predict per-image transformations, achieving state-of-the-art alignment quality while reducing runtime from minutes or hours to just seconds.
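As a toy illustration of why sparse pairwise correspondences pin down a joint alignment (an editorial sketch, not FastJAM itself, which predicts richer transformations with a graph neural network), the example below recovers per-image 2D translations from keypoint correspondences with a single least-squares solve, up to a global gauge.

import numpy as np

rng = np.random.default_rng(0)
n_images, n_points = 5, 20
canonical = rng.uniform(0, 1, size=(n_points, 2))            # shared "atlas" layout
true_shifts = rng.normal(0, 0.3, size=(n_images, 2))
keypoints = canonical[None] + true_shifts[:, None, :]         # each image is a shifted copy

# Each image pair contributes one relative-shift observation t_j - t_i;
# a gauge constraint sum(t_i) = 0 removes the global translation ambiguity.
rows, rhs = [], []
for i in range(n_images):
    for j in range(i + 1, n_images):
        rel = (keypoints[j] - keypoints[i]).mean(axis=0)
        row = np.zeros(n_images)
        row[i], row[j] = -1.0, 1.0
        rows.append(row)
        rhs.append(rel)
rows.append(np.ones(n_images))
rhs.append(np.zeros(2))

est, *_ = np.linalg.lstsq(np.array(rows), np.array(rhs), rcond=None)
print("max shift error:", float(np.abs(est - (true_shifts - true_shifts.mean(0))).max()))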

Natalya Segal

PhD Candidate, Bar-Ilan University

Bio:

Natalya Segal is a data scientist and biomedical AI researcher with experience leading data science teams and developing AI/ML systems across multiple domains. She is an inventor on multiple granted U.S. patents recognized and adopted by leading technology companies. As a PhD candidate at Bar-Ilan University, she is pioneering a contactless, affordable brain-computer interface (BCI) that uses remote optical sensing and deep learning to decode internal speech. Her work advances optical neural decoding and cortical monitoring. She holds an MSc in Electrical Engineering and a BSc in Mathematics and Computer Science.

Title:

Adapting Long-Video Masked Autoencoders to High-Speed Brain Imaging

Abstract:

Self-supervised video foundation models have recently shown strong transferability across natural video tasks, yet their applicability to domains with radically different spatiotemporal statistics remains largely unexplored. We investigate whether long-video masked autoencoders (LV-MAE), originally designed for low-frame-rate semantic natural videos, can be adapted to high-speed coherent imaging that lacks semantic structure. We apply LV-MAE to speckle-pattern video recordings captured at 1000 fps from the scalp overlying language-related cortex during silent speech tasks. Despite extreme differences in resolution, modality, and temporal scale, LV-MAE learns transferable representations that enable accurate downstream classification with minimal labeled data. Using leave-one-subject-out evaluation with one-minute subject-specific calibration, the proposed approach achieves strong cross-subject performance on millisecond-scale inputs. These results suggest that masked video representation learning can generalize beyond natural video, enabling efficient learning in specialized high-speed imaging domains.

Shachar Shmueli

Researcher, Ben-Gurion University of the Negev

Bio:

Shachar Shmueli is an Electrical Engineering Master’s student at Ben-Gurion University, specializing in the security of generative AI. His research focuses on developing robust attacks and defenses for diffusion models. Alongside his studies, he works as a Data Engineer at the startup Octup.

Title:

Black-box Adversarial Attack on Stable Diffusion Models

Abstract:

We present a black-box adversarial attack on diffusion models using genetic algorithms to evolve adversarial prompts. Our method injects evolved code strings into input prompts, modifying generated images to match target semantic content. Operating in a black-box setting with only image outputs, we use CLIP embeddings to measure semantic similarity. Evaluated on Stable Diffusion across multiple categories, our attack successfully manipulates image generation while maintaining perceptual quality. Multi-classifier evaluation demonstrates significant classification changes with minimal degradation, revealing vulnerabilities in diffusion model robustness.
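The attack loop itself is classic evolutionary search. The stripped-down sketch below is an editorial illustration of its structure only: the real fitness (generate an image with Stable Diffusion, embed it with CLIP, and score similarity to the target concept) is replaced by a cheap stand-in function, and the alphabet, population size, and mutation scheme are assumptions.

import random
import string

ALPHABET = string.ascii_lowercase + " "
TARGET = "zebra"                         # toy surrogate for the target concept

def fitness(suffix: str) -> float:
    # Placeholder for: CLIP_similarity(StableDiffusion(prompt + suffix), target_text)
    return sum(c in TARGET for c in set(suffix)) - 0.01 * len(suffix)

def mutate(s: str) -> str:
    i = random.randrange(len(s))
    return s[:i] + random.choice(ALPHABET) + s[i + 1:]

def crossover(a: str, b: str) -> str:
    cut = random.randrange(1, len(a))
    return a[:cut] + b[cut:]

random.seed(0)
population = ["".join(random.choice(ALPHABET) for _ in range(12)) for _ in range(30)]
for _ in range(50):
    population.sort(key=fitness, reverse=True)
    parents = population[:10]
    children = [mutate(crossover(random.choice(parents), random.choice(parents)))
                for _ in range(20)]
    population = parents + children

best = max(population, key=fitness)
print("best evolved suffix:", repr(best), "| fitness:", round(fitness(best), 3))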

Meir Yossef Levi

PhD Student, Technion - Israel Institute of Technology

Bio:

Meir Yossef Levi (Yossi Levi) is in the final stages of his Ph.D. at the Technion, advised by Prof. Guy Gilboa, after receiving both his B.Sc. and M.Sc. in Electrical Engineering from the Technion. His research focuses on multimodal representation learning, with a particular interest in understanding the latent geometry of vision-language models and its implications. His recent work centers on the representation of foundation models, with several papers accepted to ICML and ICLR on this topic. Prior to this, he studied robust classification in 3D vision, with publications at ICCV and 3DV.

Title:

The Geometry and Likelihood Structure of CLIP Embeddings

Abstract:

The talk mainly covers two papers to be presented at ICML 2025. I will present our recent work analyzing the geometry of CLIP’s latent space from both geometric and probabilistic perspectives. We show that the embedding space is better characterized by two distinct, shifted ellipsoids, rather than a shared hypersphere. This finding challenges common assumptions about CLIP’s latent structure. Building on this double-ellipsoid perspective, we introduce a new measure called conformity, which captures how closely a sample aligns with its Modality Mean. Finally, I will introduce Whitened CLIP (W-CLIP) — a simple, linear transformation of the latent space into an isotropic space. It enables the use of embedding norms as a surrogate for likelihood approximation. This approach supports a wide range of applications, including domain shift detection and the identification of generative artifacts.
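To make the whitening idea tangible, the small NumPy sketch below (an editorial illustration, not the W-CLIP implementation) whitens an anisotropic embedding cloud and shows that a sample's squared norm in the whitened space equals its Mahalanobis distance, so it behaves like a simple Gaussian likelihood surrogate; the synthetic 16-dimensional vectors stand in for real CLIP embeddings.

import numpy as np

rng = np.random.default_rng(0)
mean_true = rng.normal(size=16)
mixing = rng.normal(size=(16, 16))
embeddings = rng.normal(size=(5000, 16)) @ mixing.T + mean_true   # anisotropic "CLIP" cloud

mu = embeddings.mean(axis=0)
cov = np.cov(embeddings, rowvar=False)
eigval, eigvec = np.linalg.eigh(cov)
W = eigvec @ np.diag(eigval ** -0.5) @ eigvec.T                   # whitening matrix

def score(x):
    # Squared norm in whitened space = Mahalanobis distance to the embedding cloud.
    z = (x - mu) @ W
    return float(z @ z)

in_dist = embeddings[0]
outlier = mu + 10.0 * rng.normal(size=16)
print("in-distribution score:", round(score(in_dist), 1), "| outlier score:", round(score(outlier), 1))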

Dvir Samuel

Senior Research Scientist, OriginAI

Bio:

Dvir Samuel is a researcher at OriginAI. He holds a PhD from Bar-Ilan University, and his research focuses on long-tail and few-shot learning, as well as generative models, with a particular emphasis on diffusion- and flow-matching–based methods for image and video editing and personalization. At OriginAI, he develops scalable and practical methods to advance generative AI across images, videos, and 3D content.

Title:

OmnimatteZero: Training-Free Omnimatte Using Pre-Trained Video Diffusion Models

Abstract:

In Omnimatte, one aims to decompose a given video into semantically meaningful layers, including the background and individual objects along with their associated effects, such as shadows and reflections. Existing methods often require extensive training or costly self-supervised optimization. In this paper, we present OmnimatteZero, a training-free approach that leverages off-the-shelf pre-trained video diffusion models for omnimatte. It can remove objects from videos, extract individual object layers along with their effects, and composite those objects onto new videos. These are accomplished by adapting zero-shot image inpainting techniques for video object removal, a task they fail to handle effectively out-of-the-box. To overcome this, we introduce temporal and spatial attention guidance modules that steer the diffusion process for accurate object removal and temporally consistent background reconstruction. We further show that self-attention maps capture information about the object and its footprints and use them to inpaint the object's effects, leaving a clean background. Additionally, through simple latent arithmetic, object layers can be isolated and recombined seamlessly with new video layers to produce new videos. Evaluations show that OmnimatteZero not only achieves superior performance in terms of background reconstruction but also sets a new record for the fastest Omnimatte approach, achieving real-time performance with minimal frame runtime. 

Dori Gaton

Co-Founder & CEO, Data Compass AI

Bio:

Dori Gaton works at the intersection of AI R&D and delivery: consulting, leading projects, and shipping AI systems used in real workflows. He co-founded Data Compass AI seven years ago and leads it as CEO, supporting teams with full-scale execution or targeted advisory. He enjoys tackling the hard parts, such as domain shift, noisy labels, and ambiguous ground truth, across computer vision, multimodal models, and trustworthy AI. At Proofig AI, where he serves as Chief AI Officer (CAIO), he has worked with the team for the past five years on production systems that help safeguard research integrity.

Title:

The Scientific Image Arms Race: How We Detect AI Generated Figures in the Wild

Abstract:

Generative AI is lowering the barrier to producing plausible scientific figures, creating new risks for research integrity. We present a production-oriented approach to detecting AI-generated images in academic papers, focusing on microscopy, a domain with biological and imaging “rules,” domain-specific textures, and noise. In an internal survey, domain experts found that distinguishing real vs. generated imagery is very challenging, even side-by-side. We describe a deep learning classifier trained on real literature images and synthetic images generated at scale via image-to-image and image-to-text-to-image pipelines, with publication-like compression and figure distortions. We close with lessons on the generator-detector arms race and limited explainability.

Irit Chelly

Senior Applied Researcher, Wix

Bio:

Irit Chelly is a PhD graduate from the Computer Science Department at Ben-Gurion University, where she also earned her M.Sc., under the supervision of Prof. Oren Freifeld and Dr. Ari Pakman in the Vision, Inference, and Learning group. Her research focuses on probabilistic clustering using non-parametric Bayesian models and unsupervised learning. Her previous projects involved spatial transformations, dimensionality reduction in video analysis, and generative models. Irit won the national-level Aloni PhD Scholarship from Israel’s Ministry of Technology and Science, as well as the BGU Hi-Tech Scholarship for outstanding PhD students, and received annual awards and instructor rank for excellence in teaching core Computer Science courses.

Title:

Consistent Amortized Clustering via Generative Flow Networks

Abstract:

Neural models for amortized probabilistic clustering yield samples of cluster labels given a set-structured input, while avoiding lengthy Markov chain runs and the need for explicit data likelihoods. Existing methods that label each data point sequentially, like the Neural Clustering Process, often lead to cluster assignments that are highly dependent on the data order. Alternatively, methods that sequentially create full clusters do not provide assignment probabilities. In this paper, we introduce GFNCP, a novel framework for amortized clustering. GFNCP is formulated as a Generative Flow Network with a shared energy-based parametrization of policy and reward. We show that the flow matching conditions are equivalent to consistency of the clustering posterior under marginalization, which in turn implies order invariance. GFNCP also outperforms existing methods in clustering performance on both synthetic and real-world data. The talk is based on [Chelly et al., AISTATS '25].

Or Levi

Senior VP, Data Science, Zefr

Bio:

Or Levi is an AI Researcher and Senior VP of Data Science at Zefr. He holds an M.Sc. (Magna Cum Laude) in Information Retrieval from the Technion, the Israel Institute of Technology. Or’s strongest passion is using AI for social impact, which led him to develop innovative AI to fight the spread of misinformation online. His work has been presented at leading AI conferences and covered by international media.

Title:

When AI Agents Should Ask for Help - Building Reliable Human–AI Systems

Abstract:

Large Language Models (LLMs) and AI Agents are increasingly deployed in high-stakes human–AI systems such as video content moderation. Yet a fundamental limitation remains: they tend to respond with confidence even when they are wrong, creating significant real-world risks.

The central challenge in deploying LLMs and Agents is not maximizing autonomy, but enabling systems to recognize when an Agent should not be trusted. To address this, we introduce a trust-aware framework for human–AI collaboration in which a judge model predicts whether the LLM output should be trusted or escalated to a human.

Our approach relies on LLM Performance Predictors (LPPs) derived directly from LLM outputs, capturing confidence signals, self-reported uncertainty, and indicators of missing evidence or ambiguous decision rules. Evaluated on a large-scale multimodal moderation benchmark, our method improves performance while reducing unnecessary human intervention. These results suggest that reliable AI systems are built not by replacing humans, but by enabling models to know when to ask for human judgment.
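As a concrete, editorial illustration of the judge idea (not the speakers' LPP features or model), the sketch below trains a lightweight classifier over signals derived from the model's own outputs and escalates low-trust cases to a human; the synthetic features, labels, and trust threshold are assumptions.

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 2000
confidence = rng.uniform(0, 1, n)             # self-reported confidence
missing_evidence = rng.integers(0, 2, n)      # model flagged missing evidence
answer_length = rng.normal(120, 40, n)        # tokens in the answer
# Synthetic ground truth: answers tend to be correct when confident and grounded.
correct = (rng.uniform(0, 1, n) < 0.25 + 0.6 * confidence - 0.3 * missing_evidence).astype(int)

X = np.column_stack([confidence, missing_evidence, answer_length])
judge = LogisticRegression(max_iter=1000).fit(X, correct)

p_trust = judge.predict_proba(X)[:, 1]
escalate = p_trust < 0.7                      # route low-trust cases to a human
print(f"escalation rate: {escalate.mean():.1%}")
if (~escalate).any():
    print(f"accuracy on auto-approved answers: {correct[~escalate].mean():.1%}")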

Or Bachar

Data Scientist, Zefr

Bio:

Or Bachar is a Data Scientist at Zefr and an M.Sc. student in Machine Learning and Data Science at Reichman University, focusing on reliable machine learning and computer vision systems, with an emphasis on model uncertainty and human–AI collaboration.  

Title:

When AI Agents Should Ask for Help - Building Reliable Human–AI Systems

Abstract:

Large Language Models (LLMs) and AI Agents are increasingly deployed in high-stakes human–AI systems such as video content moderation. Yet a fundamental limitation remains: they tend to respond with confidence even when they are wrong, creating significant real-world risks.

The central challenge in deploying LLMs and Agents is not maximizing autonomy, but enabling systems to recognize when an Agent should not be trusted. To address this, we introduce a trust-aware framework for human–AI collaboration in which a judge model predicts whether the LLM output should be trusted or escalated to a human.

Our approach relies on LLM Performance Predictors (LPPs) derived directly from LLM outputs, capturing confidence signals, self-reported uncertainty, and indicators of missing evidence or ambiguous decision rules. Evaluated on a large-scale multimodal moderation benchmark, our method improves performance while reducing unnecessary human intervention. These results suggest that reliable AI systems are built not by replacing humans, but by enabling models to know when to ask for human judgment.

Ofir Bibi

VP of Research, Lightricks

Bio:

Ofir Bibi is the VP of Research at Lightricks, where he leads the development of generative video applications. His team built LTX-2, a state-of-the-art open-source video generation model, and now develops the application layers that bring it to market. Ofir drives both the technical innovation and open-source community strategy that position Lightricks at the forefront of generative media, bridging fundamental ML research with products that reach millions of users.

Title:

Building Production Video Applications with LTX-2 and IC-LoRA

Abstract:

This talk explores practical strategies for building generative video applications using specialized fine-tunes of LTX-2. We demonstrate how IC-LoRA enables efficient task-specific adaptations, injecting reference material and extending the capabilities of the base model. From camera control to character consistency, we'll cover the technical considerations for using this method, training and data requirements, as well as inference-time challenges. This lecture is for you if you're interested in video generation, open-source video models, and diverse creative use cases.

Gilad Erlich-Shemesh

Senior Algorithms Researcher, Trigo

Bio:

Gilad is a Senior Algorithm Researcher on Trigo’s interdisciplinary research team, where, over the past six years, he has built production computer-vision systems for physical retail, such as autonomous checkout and theft detection, and develops advanced, real-time, in-the-wild AI algorithms used by millions of shoppers worldwide.
Gilad’s background spans applied computer vision, signal processing, multi-view geometry, and optimization. He holds BSc degrees in Electrical Engineering and in Physics from the Technion.

Title:

Efficient Multi-Camera Multi-Person Tracking in Sparse CCTV Layouts for Loss Prevention in Retail Spaces

Abstract:

Retail shrinkage remains a major operational challenge, yet detecting cross-camera events in real stores is far from trivial. In this talk, we present a real-time multi-camera, multi-person tracking system designed specifically for sparse, pre-existing CCTV layouts, where camera placement is uncontrolled, viewpoints vary dramatically, privacy constraints imply blurred faces, and compute resources are limited.
Our approach combines calibrated visual re-identification, compact online person representations, and a learned spatiotemporal prior that models zone-to-zone transitions in an unsupervised manner, enabling robust and scalable cross-camera tracking under real-world “AI-in-the-wild” conditions.

Roni Goldshmidt

Senior Machine Learning Engineer, Nexar

Bio:

Roni Goldshmidt is a Senior Machine Learning Engineer at Nexar, where he leads research and development of large-scale, real-world computer vision and multimodal models for driving safety. His work focuses on ego-centric collision prediction, risk modeling, and learning from large-scale dashcam data. Roni has hands-on experience building and deploying deep learning systems in production, bridging research and real-world applications in mobility and automotive AI.

Title:

BADAS: Context-Aware Ego-Centric Collision Prediction Using Real-World Dashcam Data

Abstract:

BADAS-1.5 is Nexar’s latest V-JEPA collision prediction model, trained on large-scale real-world dashcam data. In this talk, we present the design and evolution of BADAS-1.5, focusing on how improved labeling strategies, temporal context modeling, and risk-aware training lead to a better tradeoff between recall and false positive rate.
We will share practical insights from large-scale evaluation, common failure modes, and lessons learned from deploying collision prediction models in real-world driving environments.

Roi Pony

Research Scientist, IBM Research

Bio:

Roi is a Research Scientist at IBM Research, Vision & Learning Technologies Group, where he focuses on multimodal embeddings, retrieval-augmented generation (RAG), vision-language models (VLMs), and LLMs. With experience spanning classical computer vision to modern deep learning, he brings a broad perspective to AI research. He holds an M.Sc. and B.Sc. in Electrical Engineering, both from the Technion – Israel Institute of Technology.

Title:

Real-World Multi-Modal RAG: Innovative Benchmarking and Efficient Visual Document Retrieval

Abstract:

Enterprise documents carry critical information in their visual layout, not just their text. As RAG systems evolve to handle these multi-modal documents, new challenges emerge around evaluation, retrieval quality, and production-scale efficiency.
In this talk, I will present our team's recent work across three directions. First, building realistic benchmarks for multi-modal RAG that reflect real enterprise needs. Second, training vision-language based document retrievers that capture layout and visual semantics beyond text extraction. Third, our findings on redundancy in multi-vector document representations and how this insight enables significantly more efficient retrieval at query time.
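To illustrate the redundancy observation (an editorial sketch, not the team's method), the example below builds a multi-vector document in which half the patch vectors are near-duplicates, greedily prunes them, and shows that a late-interaction MaxSim score barely moves while the index shrinks; the synthetic vectors and the 0.98 cosine threshold are assumptions.

import numpy as np

rng = np.random.default_rng(0)
base = rng.normal(size=(100, 64))
doc = np.vstack([base, base + 0.01 * rng.normal(size=(100, 64))])   # 100 near-duplicate patches
doc /= np.linalg.norm(doc, axis=1, keepdims=True)
query = rng.normal(size=(16, 64))
query /= np.linalg.norm(query, axis=1, keepdims=True)

def maxsim(q, d):
    # Late-interaction score: each query vector keeps its best-matching document vector.
    return float((q @ d.T).max(axis=1).sum())

def prune(d, thresh=0.98):
    # Greedily drop vectors that are nearly identical to an already-kept one.
    kept = []
    for v in d:
        if not kept or float(np.max(np.stack(kept) @ v)) < thresh:
            kept.append(v)
    return np.stack(kept)

pruned = prune(doc)
print(f"{len(doc)} -> {len(pruned)} vectors | "
      f"MaxSim {maxsim(query, doc):.3f} vs {maxsim(query, pruned):.3f}")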

Ori Lifschitz

Head of Computer Vision, Skana Robotics

Bio:

Ori holds a BSc in Electrical Engineering from Ben-Gurion University of the Negev (Israel) and an MSc in Marine Technologies from the Hatter Department of Marine Technologies at the University of Haifa (Israel), graduating on the Dean’s Honor List. During his MSc, he published two papers, including a NeurIPS 2025 paper co-authored with Prof. Tali Treibitz and Dr. Dan Rosenbaum, in which he addressed refraction-induced water-surface distortions using unsupervised, physics-constrained deep learning. In industry, Ori reduced training-data requirements for a semantic-segmentation model serving thousands of client API calls and owned deep-learning pipelines end to end—from research to deployment. He now heads Computer Vision at Skana Robotics, developing robust, real-world perception systems for marine environments.

Title:

Looking Into the Water by Unsupervised Learning of the Surface Shape

Abstract:

We address the problem of looking into the water from the air, where we seek to remove image distortions caused by refractions at the water surface. Our approach is based on modeling the different water surface structures at various points in time, assuming the underlying image is constant. To this end, we propose a model that consists of two neural-field networks. The first network predicts the height of the water surface at each spatial position and time, and the second network predicts the image color at each position. Using both networks, we reconstruct the observed sequence of images and can therefore use unsupervised training. We show that using implicit neural representations with periodic activation functions (SIREN) leads to effective modeling of the surface height spatio-temporal signal and its derivative, as required for image reconstruction. Using both simulated and real data we show that our method outperforms the latest unsupervised image restoration approach. In addition, it provides an estimate of the water surface.
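The method represents the water surface with implicit neural fields that use periodic (SIREN) activations, so both the height field and its spatial derivative stay well-behaved. The tiny PyTorch sketch below fits such a network to a 1D signal and differentiates it with respect to the input coordinate; it is an editorial illustration, and the architecture, frequency factor, and training schedule are assumptions rather than the paper's settings.

import torch
import torch.nn as nn

class SineLayer(nn.Module):
    # Linear layer followed by a scaled sine activation, the SIREN building block.
    def __init__(self, in_features, out_features, w0=30.0):
        super().__init__()
        self.linear = nn.Linear(in_features, out_features)
        self.w0 = w0
    def forward(self, x):
        return torch.sin(self.w0 * self.linear(x))

siren = nn.Sequential(SineLayer(1, 64), SineLayer(64, 64), nn.Linear(64, 1))

t = torch.linspace(-1, 1, 256).unsqueeze(1)
target = 0.1 * torch.sin(6 * torch.pi * t) + 0.02 * torch.sin(25 * torch.pi * t)

optimizer = torch.optim.Adam(siren.parameters(), lr=1e-4)
for _ in range(2000):
    optimizer.zero_grad()
    loss = ((siren(t) - target) ** 2).mean()
    loss.backward()
    optimizer.step()

# The fitted field is differentiable in its input coordinate, as needed when the
# surface slope enters the refraction model.
t_query = t.clone().requires_grad_(True)
slope = torch.autograd.grad(siren(t_query).sum(), t_query)[0]
print("final MSE:", float(loss), "| mean |slope|:", float(slope.abs().mean()))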

A Taste from IMVC 2025