
OpenAI Codex, OpenCV Conference, Foundations of Computer Vision Book, Wildlife Species Detection Challenge
Vision Geek Newsletter #12
Hello there,
Welcome to the next edition of the Vision Geek Newsletter, covering essential news and learning resources in Computer Vision and Machine Learning, read by 1,000+ computer vision practitioners.
Let’s get into it.
OpenAI Codex
OpenAI has introduced Codex, a powerful AI coding agent that converts natural language into code, streamlining programming tasks and enhancing developer productivity. It’s a cloud-based software engineering agent that can work on many tasks in parallel, powered by codex-1, a version of OpenAI’s o3 model optimized for software engineering.
It is not a full-fledged IDE like Cursor; it is designed mainly to work with your GitHub repos. It can perform tasks like writing features, fixing bugs, answering codebase questions, and running tests, with each task taking up to 30 minutes. Once it completes an assigned coding task, it can create a PR that a human reviews before merging into the codebase, keeping the human in control.
Codex-1 posts strong results on OpenAI’s internal benchmark (75%) and on SWE-bench (72.1%). The agents run in isolated, containerized environments, providing better security and autonomy. Codex is available to ChatGPT Pro, Team, and Enterprise users today, and will be available to Plus users soon.
OpenCV Conference
OpenCV recently hosted its first-ever conference, the “OpenCV-SID Conference on Computer Vision & AI (OSCCA)”, as part of Display Week 2025 (the premier international event for electronic display technologies), in partnership with SID (the Society for Information Display).
It was a one-day, in-person event featuring talks from industry experts such as Gary Bradski (founder of OpenCV), Satya Mallick (CEO of OpenCV), Monica Song (Product Manager at Google AI Frameworks), and Joseph Nelson (CEO of Roboflow), on topics ranging from the Keras deep learning framework to state-of-the-art augmented-reality gaming and the latest advances in single-shot detection.
Unfortunately, neither recordings nor slides from the talks have been shared publicly online. Hopefully material from the talks surfaces in other forms in the near future.
Foundations of Computer Vision Book
“Foundations of Computer Vision” is a comprehensive textbook authored by Antonio Torralba, Phillip Isola, and William T. Freeman, published by the MIT Press on April 16, 2024. Designed for students, educators, and practitioners, the book offers an in-depth exploration of computer vision, integrating both classical methods and contemporary deep learning advancements.
It incorporates the latest deep learning techniques, offering insights into how modern approaches have transformed computer vision. Beyond algorithms, the text addresses the relationship between machine vision and human perception, emphasizing the interdisciplinary nature of the field.
Its coverage includes transformers, diffusion models, statistical image models, fairness, ethics, and research methodologies, topics not commonly found in other textbooks. Concepts are presented in concise chapters with extensive illustrations, examples, and exercises to facilitate intuitive learning.
For those interested in a structured, modern exploration of computer vision, this textbook serves as a valuable resource, bridging traditional concepts with the latest advancements in the field. That said, it’s hard to cover everything in a single book: applications such as shape analysis, object tracking, human pose estimation, and face recognition are not covered in depth.
Fun Fact
It took more than a decade to complete this book (work began in November 2010), largely because the field developed so rapidly after writing started, above all due to the explosion of deep learning.
Wildlife Species Detection Challenge
The Cup-ybara Challenge 2025 is an 8-week global AI competition hosted by Tryolabs (as part of its “AI for Good” initiative) on the Kaggle platform. It tasks participants with building machine-learning models to automatically identify native Uruguayan wildlife species in short camera-trap videos.
Tryolabs describes it as “an open competition to develop cutting-edge AI models for automated wildlife species detection”, aimed at speeding up a traditionally slow, manual monitoring process. The contest connects the data science community with conservation: the winning models will be shared with local NGO AMBÁ to be put into real-world use in Uruguay’s forests and reserves.
The dataset consists of real-world footage collected by camera traps across Uruguayan ecosystems, split into unlabeled training data (train), labeled test data for public leaderboard evaluation (test), and held-out private test data for final scoring (private_test).
Each video is exactly 15 seconds long, encoded in .mp4 format, and recorded using motion-triggered camera traps deployed in natural, uncontrolled environments.
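The 15-second clips are short enough that uniform frame sampling is a reasonable starting point for a frame-level classifier. Below is a minimal sketch using OpenCV; the file name is hypothetical and not taken from the official starter code.

```python
import cv2  # pip install opencv-python

def sample_frames(video_path, num_frames=8):
    """Uniformly sample num_frames RGB frames from a short clip."""
    cap = cv2.VideoCapture(video_path)
    total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    # Evenly spaced frame indices across the ~15-second clip.
    step = max(total - 1, 1) / max(num_frames - 1, 1)
    frames = []
    for i in range(num_frames):
        cap.set(cv2.CAP_PROP_POS_FRAMES, round(i * step))
        ok, frame = cap.read()
        if ok:
            # OpenCV decodes to BGR; most classifiers expect RGB.
            frames.append(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    cap.release()
    return frames

# Hypothetical path; the real file layout comes from the Kaggle dataset.
clip_frames = sample_frames("train/clip_0001.mp4")
```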
Competition start date: Monday, April 21st, 2025
Submissions close date: Sunday, June 15th, 2025
During the competition, participants may collaborate in teams, iterate on models, and discuss ideas on the Kaggle forums. Kaggle enforces its standard competition rules: for example, no external data or manual re-labeling beyond what is provided, and submissions must follow a specific CSV format produced via the provided BaseModel class.
Submissions are evaluated by a weighted F1 score on the held-out test set: per-class F1 scores are averaged with each species class weighted by its support (number of true instances), so the metric reflects multi-class classification quality in proportion to how often each species appears. Weighted F1 is a common choice for Kaggle’s multi-category tasks.
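For a quick intuition check, the same notion of weighted F1 can be computed with scikit-learn; the species labels below are made up for illustration.

```python
from sklearn.metrics import f1_score

# Toy ground-truth and predicted species labels (illustrative only).
y_true = ["capybara", "capybara", "margay", "fox", "fox", "fox"]
y_pred = ["capybara", "margay", "margay", "fox", "fox", "capybara"]

# "weighted" averages per-class F1 scores, weighting each class by its
# support (number of true instances in y_true).
print(f1_score(y_true, y_pred, average="weighted"))  # ≈ 0.678
```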
The challenge is open to anyone worldwide. Participants need to register for free on Kaggle and accept the competition rules. Monetary prizes will be awarded based on an evaluation performed on a private dataset after the main competition ends (1st place: $500, 2nd place: $300, 3rd place: $100, honorable mention: $100).
Prizes will be paid using Buy Me a Coffee. In summary, Cup-ybara aligns AI and wildlife conservation, inviting participants to leverage modern video classification techniques to help protect biodiversity.