
Google Introduces PaliGemma 2 Family of Open Source AI Vision-Language Models


Google launched the successor to its PaliGemma artificial intelligence (AI) vision-language model on Thursday. Dubbed PaliGemma 2, the family of AI models improves upon the capabilities of the older generation. The Mountain View-based tech giant said the vision-language model can see, understand, and interact with visual input such as images and other visual assets. It is built using the Gemma 2 small language models (SLMs), which were released in August. Interestingly, the tech giant claimed that the model can analyse emotions in the uploaded images.

Google PaliGemma AI Model

In a blog post, the tech giant detailed the new PaliGemma 2 AI model. While Google has several vision-language models, PaliGemma was the first such model in the Gemma family. Vision models differ from typical large language models (LLMs) in that they have additional encoders that can analyse visual content and convert it into a familiar data form. This way, vision models can technically “see” and understand the external world.

One benefit of a smaller vision model is that it can be used for a wide range of applications, as smaller models are optimised for speed and accuracy. With PaliGemma 2 being open-sourced, developers can build its capabilities into their apps.

PaliGemma 2 comes in three different parameter sizes of 3 billion, 10 billion, and 28 billion. It is also available in three input resolutions of 224, 448, and 896 pixels. Due to this, the tech giant claims that it is easy to optimise the AI model's performance for a wide range of tasks. Google says it generates detailed, contextually relevant captions for images. It can not only identify objects but also describe actions, emotions, and the overall narrative of the scene.
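The size and resolution options above can be treated as a small grid to choose from. The following is a minimal sketch, assuming every size is offered at every resolution and that checkpoints follow a `google/paligemma2-<size>b-pt-<resolution>` naming pattern; both the pattern and the `pick_variant` helper are illustrative assumptions, not something stated in the announcement.

```python
from itertools import product

# Sizes (billions of parameters) and input resolutions (pixels) from the article.
SIZES_B = (3, 10, 28)
RESOLUTIONS_PX = (224, 448, 896)


def checkpoint_name(size_b, resolution_px):
    # ASSUMPTION: this naming pattern is illustrative; check the model hub
    # for the exact identifiers before relying on it.
    return f"google/paligemma2-{size_b}b-pt-{resolution_px}"


def list_variants():
    # ASSUMPTION: every size is paired with every resolution.
    return [checkpoint_name(s, r) for s, r in product(SIZES_B, RESOLUTIONS_PX)]


def pick_variant(max_params_b, min_resolution_px):
    """Hypothetical helper: largest size within the parameter budget,
    at the lowest resolution that still meets the minimum."""
    sizes = [s for s in SIZES_B if s <= max_params_b]
    resolutions = [r for r in RESOLUTIONS_PX if r >= min_resolution_px]
    if not sizes or not resolutions:
        return None
    return checkpoint_name(max(sizes), min(resolutions))


if __name__ == "__main__":
    for name in list_variants():
        print(name)
```

A smaller size at a lower resolution is typically the cheaper choice, while tasks with fine visual detail may justify a higher resolution.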

Google highlighted that the tool can be used for chemical formula recognition, music score recognition, spatial reasoning, and chest X-ray report generation. The company has also published a paper on the online preprint repository arXiv.

Developers and AI enthusiasts can download the PaliGemma 2 model and its code on Hugging Face and Kaggle. The AI model supports frameworks such as Hugging Face Transformers, Keras, PyTorch, JAX, and Gemma.cpp.
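As a rough illustration of the Hugging Face Transformers route, the sketch below captions a local image. It assumes the `transformers`, `torch`, and `Pillow` packages are installed, that the checkpoint name `google/paligemma2-3b-pt-224` and the `caption en` prompt are valid for the model you download, and that you have accepted the model's licence on the hub; none of these details come from the announcement itself.

```python
# ASSUMPTION: checkpoint name and task prompt are illustrative, not from the article.
MODEL_ID = "google/paligemma2-3b-pt-224"
PROMPT = "caption en"


def caption_image(image_path, model_id=MODEL_ID, prompt=PROMPT, max_new_tokens=30):
    """Return a short caption for the image at image_path."""
    # Heavy dependencies are imported lazily so this file can be loaded
    # without them; calling the function downloads the model weights.
    import torch
    from PIL import Image
    from transformers import AutoProcessor, PaliGemmaForConditionalGeneration

    processor = AutoProcessor.from_pretrained(model_id)
    model = PaliGemmaForConditionalGeneration.from_pretrained(model_id).eval()

    image = Image.open(image_path).convert("RGB")
    inputs = processor(text=prompt, images=image, return_tensors="pt")
    prompt_len = inputs["input_ids"].shape[-1]

    with torch.inference_mode():
        output = model.generate(
            **inputs, max_new_tokens=max_new_tokens, do_sample=False
        )

    # Drop the prompt tokens so only the generated caption is decoded.
    return processor.decode(output[0][prompt_len:], skip_special_tokens=True)
```

Calling `caption_image("photo.jpg")` with a path to your own image would run the model on CPU by default; larger variants generally need a GPU and more memory.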


