Nvidia Jetson Orin Nano Super, PaliGemma 2, Segment Anything Model (SAM) 2.1

Vision Geek Newsletter #10

Arun Ponnusamy
Dec 31, 2024

Hello there,

Welcome to the next edition of the Vision Geek Newsletter, covering essential news in Computer Vision and Machine Learning, read by 1,000+ Computer Vision practitioners.

Let’s get into it.

Nvidia Jetson Orin Nano Super

Nvidia has announced a “super” update to its existing edge device, the Jetson Orin Nano, renaming the updated device the “Jetson Orin Nano Super” and dropping the price from $499 to just $249 (8GB variant), with the claim of being “The World’s Most Affordable Generative AI Computer”.

There is no hardware change. Through a pure software update, Nvidia has introduced a new 25W power mode that lifts performance across the board:

  • 1.7x higher generative AI model performance.

  • 67 sparse TOPS, up from the previous 40 sparse TOPS.

  • 102 GB/s of memory bandwidth, up from the previous 65 GB/s.

  • 1.7 GHz CPU clock speed, up from 1.5 GHz.

  • 1020 MHz GPU clock speed, up from 635 MHz.
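
Switching into the new mode goes through the stock nvpmodel utility that ships with JetPack. Below is a minimal Python sketch that queries and sets the power mode; note that the index of the 25W mode varies by device and JetPack release, so the hard-coded mode number is an assumption to verify against /etc/nvpmodel.conf.

```python
# Query and switch Jetson power modes via the stock nvpmodel utility.
# Assumption: the index of the new 25W mode is device/JetPack dependent,
# so check /etc/nvpmodel.conf before hard-coding a mode number.
import subprocess

def query_power_mode() -> str:
    """Return the currently active power mode (runs `sudo nvpmodel -q`)."""
    result = subprocess.run(
        ["sudo", "nvpmodel", "-q"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()

def set_power_mode(mode_id: int) -> None:
    """Activate the power mode with the given index (runs `sudo nvpmodel -m <id>`)."""
    subprocess.run(["sudo", "nvpmodel", "-m", str(mode_id)], check=True)

if __name__ == "__main__":
    print(query_power_mode())
    set_power_mode(2)  # hypothetical index for the 25W mode -- check your config
```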

This compact yet powerful system can effortlessly handle a wide range of LLMs, VLMs, and Vision Transformers (ViTs), from smaller models to those with up to 8B parameters, such as the Llama-3.1-8B model.
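
The memory bandwidth jump is the headline number for generative AI: single-stream LLM decoding is largely memory-bandwidth bound, since every generated token has to stream the full set of weights once. A back-of-envelope sketch using the figures above (the 4-bit weight quantization is my assumption for illustration, not an Nvidia figure):

```python
# Rough upper bound on single-stream decode speed: each generated token
# streams all model weights once, so tokens/s <= bandwidth / model size.
# Assumes 4-bit (0.5 byte/param) quantized weights for illustration.
PARAMS = 8e9             # e.g. Llama-3.1-8B
BYTES_PER_PARAM = 0.5    # INT4 weights (assumption)
model_bytes = PARAMS * BYTES_PER_PARAM

for label, bandwidth in [("Orin Nano, 65 GB/s", 65e9),
                         ("Orin Nano Super, 102 GB/s", 102e9)]:
    print(f"{label}: <= {bandwidth / model_bytes:.1f} tokens/s")

# Orin Nano, 65 GB/s: <= 16.2 tokens/s
# Orin Nano Super, 102 GB/s: <= 25.5 tokens/s
```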

At this price point and performance level, the Orin Nano Super can be a great choice as a baseline edge device for modern AI workloads. Though it is advertised as a generative AI computer, it can handle all forms of ML models.

PaliGemma 2

Google has released the next update to PaliGemma, the open-source vision-language model in the Gemma family. Like its predecessor (which was built on the PaLI-3 recipe), PaliGemma 2 keeps the powerful SigLIP image encoder, but upgrades the text decoder to the latest Gemma 2.

What’s New

  • Scalable performance: Optimize performance for any task with PaliGemma 2's multiple model sizes (3B, 10B, 28B parameters) and resolutions (224px, 448px, 896px), whereas the previous version had only one variant (3B).

  • Long captioning: PaliGemma 2 generates detailed, contextually relevant captions for images, going beyond simple object identification to describe actions, emotions, and the overall narrative of the scene.

  • Expanding to new horizons: Leading performance on chemical formula recognition, music score recognition, spatial reasoning, and chest X-ray report generation, as detailed in the technical report.

PaliGemma 2 is distributed under the Gemma license, which allows for redistribution, commercial use, fine-tuning and creation of model derivatives. The pre-trained models have been designed for easy fine-tuning on custom datasets for specific tasks. Model files are available on Kaggle and HuggingFace. Get started with HuggingFace Transformers or Keras. (Docs | Notebooks | Demo)
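
For a quick start, a minimal inference sketch with HuggingFace Transformers could look like the following; the checkpoint id and task-prefix prompt follow the public model cards, so treat both as examples to verify against the docs linked above.

```python
# Minimal PaliGemma 2 inference sketch with HuggingFace Transformers.
# Assumes a recent transformers release and the google/paligemma2-3b-pt-224
# checkpoint (gated on the Hub; accept the Gemma license and log in first).
import torch
from PIL import Image
from transformers import AutoProcessor, PaliGemmaForConditionalGeneration

model_id = "google/paligemma2-3b-pt-224"
model = PaliGemmaForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_id)

image = Image.open("example.jpg")   # any local test image
prompt = "<image>caption en"        # pretrained checkpoints expect task-prefix prompts

inputs = processor(text=prompt, images=image, return_tensors="pt").to(model.device)
with torch.inference_mode():
    output = model.generate(**inputs, max_new_tokens=50)
print(processor.decode(output[0], skip_special_tokens=True))
```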

Segment Anything Model (SAM) 2.1

Meta’s Fundamental AI Research (FAIR) team has released an update to SAM 2 featuring updated checkpoints with better accuracy.

What’s New

  • Additional data augmentation techniques to simulate the presence of visually similar objects and small objects where SAM 2 previously struggled.

  • Improved occlusion handling capability by training the model on longer sequences of frames and making some tweaks to positional encoding of spatial and object pointer memory (updated paper).

  • SAM 2 Developer Suite, a package of open source code to make it easier than ever to build with SAM 2. This release includes training code for fine-tuning SAM 2 with your own data.

  • Front-end and back-end code for the web demo.

Not much has changed in terms of model sizes, the number of model variants, or inference speed.
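
For reference, the updated checkpoints load through the same interface as SAM 2. Here is a minimal image-segmentation sketch with the sam2 package; the checkpoint id matches the published 2.1 releases, and the point prompt is purely illustrative.

```python
# Minimal SAM 2.1 image segmentation sketch using Meta's `sam2` package
# (installed from https://github.com/facebookresearch/sam2).
# The checkpoint id and the point prompt below are illustrative.
import numpy as np
import torch
from PIL import Image
from sam2.sam2_image_predictor import SAM2ImagePredictor

predictor = SAM2ImagePredictor.from_pretrained("facebook/sam2.1-hiera-small")

image = np.array(Image.open("example.jpg").convert("RGB"))
with torch.inference_mode():
    predictor.set_image(image)
    # One positive click (x, y) on the object of interest.
    masks, scores, _ = predictor.predict(
        point_coords=np.array([[500, 375]]),
        point_labels=np.array([1]),
        multimask_output=True,
    )
print(masks.shape, scores)  # (3, H, W) candidate masks with confidence scores
```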

