Meta SAM2; FastHTML; Google Imagen 3; Grok 2

Vision Geek Newsletter #9

Aug 25, 2024

Meta has released the second version of its Segment Anything Model (SAM), with significant improvements over the first version released last year. SAM2 can now segment and track selected objects in videos, which is made possible by a memory bank added to the architecture. SAM2 is 6x faster than SAM1, and the model weights are much smaller. It comes in four variants: tiny, small, base plus and large. Meta has also released both the model weights and the dataset used for training. (github | blog | paper | demo)
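To give a feel for what a memory bank does, here is a toy sketch in plain Python. It is not Meta's implementation: the class name, the `max_recent` parameter and the string "features" are all made up for illustration. The idea it shows is that the frames a user prompted stay available, while memories of ordinary frames live in a fixed-size first-in, first-out queue.

```python
from collections import deque


class ToyMemoryBank:
    """Toy stand-in for a video memory bank (hypothetical, not SAM2 code)."""

    def __init__(self, max_recent=3):
        # Frames the user prompted (clicked on) are kept for the whole clip.
        self.prompted = []
        # Memories of other frames go into a FIFO queue; the oldest drops out.
        self.recent = deque(maxlen=max_recent)

    def add(self, frame_idx, feature, prompted=False):
        entry = (frame_idx, feature)
        if prompted:
            self.prompted.append(entry)
        else:
            self.recent.append(entry)

    def context(self):
        # What the model would look at when segmenting the next frame.
        return self.prompted + list(self.recent)


bank = ToyMemoryBank(max_recent=3)
bank.add(0, "mask features for frame 0", prompted=True)
for idx in range(1, 6):
    bank.add(idx, f"mask features for frame {idx}")

# Frames 1 and 2 have been evicted; the prompted frame 0 is still there.
print([idx for idx, _ in bank.context()])
```

In the real model the stored entries are learned feature maps and the current frame attends to them, but the bookkeeping idea is similar: a bounded history plus the frames the user actually annotated.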

FastHTML

Jeremy Howard and the team at Answer.AI have released a new library for building modern, interactive web apps in pure Python with just a few lines of code. FastHTML can be used for everything from collaborative games to multi-modal UIs. It is a potential alternative to existing libraries like Streamlit, Gradio and Mesop. (docs | blog | github)

Google Imagen 3

Google announced its latest image generation model, “Imagen 3”, at Google I/O a few months back. It is now being released in select countries through the ImageFX lab web interface and Google Cloud Vertex AI Studio. Imagen 3 seems capable of creating stunning photorealistic images from text prompts. It competes with models like DALL-E 3, FLUX.1 and Midjourney, and comes with better safety guardrails.

Grok 2 with Vision

xAI, Elon Musk's AI company closely tied to X (formerly Twitter), has released Grok 2 with vision capability. Grok started off as an LLM with just text input and output. The latest version expands its capabilities in vision and text understanding, and also integrates real-time information from the 𝕏 platform. It is currently in beta. 𝕏 Premium and Premium+ users will have access to two new models in the X app: Grok-2 and Grok-2 mini. xAI is also working with Black Forest Labs to experiment with their uncensored image generation model, FLUX.1.

Mark’s Open Source Vision

Meta has been making significant contributions to the open source AI community lately with state-of-the-art foundational models like SAM2 and Llama 3.1, which compete with commercial models from companies like Google and OpenAI. If you have been wondering why Meta is open sourcing these models, Meta’s CEO Mark Zuckerberg shares his vision for open source in this Bloomberg interview.
