Computer Vision Tutorial

Improving AI models' ability to explain their predictions

In high-stakes settings like medical diagnostics, users often want to know what led a computer vision model to make a certain prediction, so they can determine whether to trust its output. Concept ...

Microsoft Builds A Compact AI Model That Decides When To Think

Microsoft's Phi-4-reasoning-vision-15B uses careful data curation and selective reasoning to compete with models trained on ...

EurekAlert!

Ateneo machine learning lab opens doors to industry partners, collaborators

The Ateneo Laboratory for Intelligent Visual Environments (ALIVE) is eager to co-develop machine learning solutions with ...

OpenAI launches GPT-5.4 with computer vision, tool use enhancements

OpenAI Group PBC today launched a new large language model that it says is more adept at automating work tasks than its earlier algorithms. GPT-5.4 is available in ChatGPT, the Codex programming tool ...

IEEE

A Review of Computer Vision for Railways

Abstract: Modern railways continue to strive for remote and automated methods to improve the visual inspection procedures for their assets. In some cases, these inspections provide new information ...

GitHub

A framework to enable multimodal models to operate a computer.

Using the same inputs and outputs as a human operator, the model views the screen and decides on a series of mouse and keyboard actions to reach an objective. Released Nov 2023, the Self-Operating ...

Vertebrate paleontology has a numbers problem. Computer vision can help

How many fossils does it take to accurately train an image-based AI algorithm? According to a new study co-authored by Bruce ...

IEEE

Deep Learning for Computer Vision: Recent Breakthroughs and Emerging Trends

Abstract: Significant strides have been made in applying deep learning to computer vision, which has changed the way that computers process and respond to visual data. The authors of this study ...

Geeky Gadgets

Gemini 3 Agentic Vision Proves Image Analysis Needs Reasoning Not Guesswork

What if you could transform complex images into actionable insights with just a few clicks? That’s exactly what Google Gemini 3’s Agentic Vision promises to deliver, an innovative way to analyze, ...
