Nvidia Jetson Orin Nano Super, PaliGemma 2, Segment Anything Model (SAM) 2.1
Vision Geek Newsletter #10
Hello there,
Welcome to the next edition of Vision Geek Newsletter, covering the essential news in Computer Vision and Machine Learning, read by 1000+ Computer Vision practitioners.
Let’s get into it.
Nvidia Jetson Orin Nano Super
Nvidia has announced a “super” update to its existing edge device, the Jetson Orin Nano, renaming it the “Jetson Orin Nano Super” and dropping the price from $499 to just $249 (8GB variant). Nvidia claims it is “The World’s Most Affordable Generative AI Computer”.
There is no hardware change. Through a software update alone, Nvidia has introduced a new 25W power mode that boosts performance across the board:
1.7x higher generative AI model performance.
67 Sparse TOPs, a significant increase from the previous 40 Sparse TOPs.
102 GB/s of memory bandwidth, a significant leap from the previous 65 GB/s memory bandwidth.
1.7 GHz of CPU clock speed, up from 1.5 GHz.
1020 MHz of GPU clock speed, up from 635 MHz.
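The spec bumps above roughly line up with the advertised 1.7x figure. A quick back-of-the-envelope check in Python, using only the numbers Nvidia published:

```python
# Speedup ratios implied by the published spec changes
# (Jetson Orin Nano -> Jetson Orin Nano Super).
specs = {
    # metric: (before, after)
    "sparse_tops": (40, 67),
    "memory_bandwidth_gbps": (65, 102),
    "cpu_clock_ghz": (1.5, 1.7),
    "gpu_clock_mhz": (635, 1020),
}

for metric, (old, new) in specs.items():
    print(f"{metric}: {new / old:.2f}x")
# Sparse TOPs and memory bandwidth improve ~1.6-1.7x, consistent with
# the claimed 1.7x generative AI performance uplift.
```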
This compact yet powerful system can effortlessly handle a wide range of LLMs, VLMs, and Vision Transformers (ViTs), from smaller models to those with up to 8B parameters, such as the Llama-3.1-8B model.
At this price point and performance level, the Orin Nano Super is a great choice as a baseline edge device for modern AI workloads. Though it is advertised as a generative AI computer, it can handle all forms of ML models.
PaliGemma 2
Google has released PaliGemma 2, the next iteration of PaliGemma, its open-source vision-language model in the Gemma family.
Like its predecessor, PaliGemma 2 uses the same powerful SigLIP vision encoder (following the PaLI-3 recipe), but upgrades the text decoder to the latest Gemma 2.
What’s New
Scalable performance: Optimize performance for any task with PaliGemma 2's multiple model sizes (3B, 10B, 28B parameters) and resolutions (224px, 448px, 896px), whereas the previous version had only one variant (3B).
Long captioning: PaliGemma 2 generates detailed, contextually relevant captions for images, going beyond simple object identification to describe actions, emotions, and the overall narrative of the scene.
Expanding to new horizons: Leading performance on chemical formula recognition, music score recognition, spatial reasoning, and chest X-ray report generation, as detailed in the technical report.
PaliGemma 2 is distributed under the Gemma license, which allows for redistribution, commercial use, fine-tuning and creation of model derivatives. The pre-trained models have been designed for easy fine-tuning on custom datasets for specific tasks. Model files are available on Kaggle and HuggingFace. Get started with HuggingFace Transformers or Keras. (Docs | Notebooks | Demo)
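As a starting point, here is a minimal captioning sketch via Hugging Face Transformers. The model id and the `<image>` prompt prefix are assumptions based on the published checkpoints; it requires a recent transformers release with PaliGemma 2 support and accepting the Gemma license on Hugging Face.

```python
# Hedged sketch: caption an image with a PaliGemma 2 pretrained checkpoint.
# MODEL_ID below is an assumption (smallest size/resolution variant).
MODEL_ID = "google/paligemma2-3b-pt-224"

def caption_image(image_path: str, prompt: str = "<image>caption en") -> str:
    # Imports live inside the function so this sketch can be loaded
    # without torch/transformers installed.
    import torch
    from PIL import Image
    from transformers import AutoProcessor, PaliGemmaForConditionalGeneration

    model = PaliGemmaForConditionalGeneration.from_pretrained(
        MODEL_ID, torch_dtype=torch.bfloat16, device_map="auto"
    )
    processor = AutoProcessor.from_pretrained(MODEL_ID)

    image = Image.open(image_path).convert("RGB")
    inputs = processor(text=prompt, images=image, return_tensors="pt").to(model.device)

    with torch.inference_mode():
        output = model.generate(**inputs, max_new_tokens=64)

    # Strip the prompt tokens and decode only the generated caption.
    generated = output[0][inputs["input_ids"].shape[-1]:]
    return processor.decode(generated, skip_special_tokens=True)
```

Swap in a fine-tuned or larger variant (e.g. 10B at 448px) by changing `MODEL_ID`; higher resolutions help on dense tasks like OCR at the cost of speed.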
Segment Anything Model (SAM) 2.1
Meta’s Fundamental AI Research (FAIR) team has released an update to SAM 2 featuring updated checkpoints with better accuracy.
What’s New
Additional data augmentation techniques to simulate the presence of visually similar objects and small objects where SAM 2 previously struggled.
Improved occlusion handling capability by training the model on longer sequences of frames and making some tweaks to positional encoding of spatial and object pointer memory (updated paper).
SAM 2 Developer Suite, a package of open source code to make it easier than ever to build with SAM 2. This release includes training code for fine-tuning SAM 2 with your own data.
Front-end and back-end code for the web demo.
Model sizes, the number of model variants, and inference speed remain largely unchanged.
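Since the checkpoints are drop-in replacements, existing SAM 2 code should work with the 2.1 weights. A hedged sketch of single-point image prompting with Meta's open-source `sam2` package; the config and checkpoint paths are assumptions matching the released 2.1 file names, which must be downloaded from the SAM 2 repository first:

```python
# Hedged sketch: prompt a SAM 2.1 checkpoint with one foreground point.
# CONFIG and CHECKPOINT paths are assumptions based on the 2.1 release names.
CONFIG = "configs/sam2.1/sam2.1_hiera_l.yaml"
CHECKPOINT = "checkpoints/sam2.1_hiera_large.pt"

def segment_at_point(image, point_xy, config=CONFIG, checkpoint=CHECKPOINT):
    # Imports live inside the function so the sketch loads without sam2 installed.
    import numpy as np
    from sam2.build_sam import build_sam2
    from sam2.sam2_image_predictor import SAM2ImagePredictor

    predictor = SAM2ImagePredictor(build_sam2(config, checkpoint))
    predictor.set_image(image)  # HxWx3 uint8 RGB array

    # A single positive point prompt; label 1 marks foreground.
    masks, scores, _ = predictor.predict(
        point_coords=np.array([point_xy]),
        point_labels=np.array([1]),
        multimask_output=True,
    )
    return masks[scores.argmax()]  # keep the highest-scoring candidate mask
```

For video, the same package exposes a video predictor that propagates prompts across frames, which is where the improved occlusion handling in 2.1 matters most.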