
Meta AI Releases New Quantized Versions of Llama 3.2 (1B & 3B)

Posted on: October 25, 2024 at 05:02 AM

Llama 3.2 Quantized

Meta AI has released new quantized versions of the Llama 3.2 (1B & 3B) models, with 2-4x faster inference and a 56% reduction in model size.


Introduction

Meta AI has recently released new quantized versions of the Llama 3.2 (1B & 3B) models. These quantized models are designed to be more efficient and lightweight, with a 56% reduction in model size and 2-4x increases in inference speed compared to the original BF16 models.
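Meta's release quantizes weights to low-bit integers (its recipe uses 4-bit weights via SpinQuant or QLoRA, which this post covers only at a high level). As a minimal illustration of why quantization shrinks a model, here is a symmetric int8 sketch in NumPy; it is not Meta's scheme, just the basic idea of storing weights as small integers plus a scale:

```python
import numpy as np

def quantize_int8(weights: np.ndarray) -> tuple[np.ndarray, float]:
    """Symmetric per-tensor int8 quantization: w ≈ scale * q."""
    scale = float(np.max(np.abs(weights))) / 127.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.standard_normal((1024, 1024)).astype(np.float32)  # toy weight matrix
q, scale = quantize_int8(w)

# int8 storage is 4x smaller than float32 (2x smaller than BF16);
# 4-bit schemes like Meta's shrink the weights further still.
print(w.nbytes / q.nbytes)  # → 4.0
```

The round-trip error is bounded by half a quantization step (`scale / 2`), which is why accuracy can stay close to the full-precision baseline when the scheme is tuned per channel or per block.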

Key takeaways

Accuracy

Llama 3.2 1B QLoRA delivers accuracy competitive with Llama 3.2 1B BF16 while significantly improving inference speed on Android phones.

[Figure: Llama 3.2 1B QLoRA benchmark comparison]

Similar improvements were observed for the 3B model.
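QLoRA pairs a frozen, quantized base model with small trainable low-rank adapters. The sketch below illustrates the idea in NumPy with int8 and made-up dimensions (Meta's recipe uses 4-bit quantization and different ranks); it is not the release's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 4096, 16  # hidden size and LoRA rank (illustrative values only)

W = rng.standard_normal((d, d)).astype(np.float32)  # frozen base weight

# The "Q" in QLoRA: store the frozen base in low precision (int8 here)
scale = float(np.abs(W).max()) / 127.0
Wq = np.round(W / scale).astype(np.int8)

# Trainable low-rank adapters: effective weight = dequant(Wq) + B @ A
A = (rng.standard_normal((r, d)) * 0.01).astype(np.float32)
B = np.zeros((d, r), dtype=np.float32)  # B starts at zero, so the update starts at zero

x = rng.standard_normal(d).astype(np.float32)
y = (Wq.astype(np.float32) * scale) @ x + B @ (A @ x)

# Only the adapters are trained: 2*d*r parameters vs d*d in the base
print((A.size + B.size) / W.size)  # → 0.0078125, under 1% of the base parameters
```

Because only the adapters are updated during fine-tuning, the quantized base never needs to be stored or trained in full precision, which is what makes quantization-aware fine-tuning of this kind cheap enough to recover most of the BF16 model's accuracy.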

Safety

Because these models are small enough to run on mobile devices, they can operate entirely offline, and user data never leaves the device. This protects user privacy and data security.

Running locally also opens up numerous possibilities for on-device AI capabilities.

Availability

The models can be downloaded from Hugging Face or the Llama website.
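On Hugging Face, the checkpoints can be fetched with `huggingface_hub`. The repo id below is an assumption for illustration; check the Hub for the exact name Meta publishes, and note the repos are gated, so you must accept Meta's license on the model page and authenticate first:

```python
from huggingface_hub import snapshot_download

def download_quantized_llama(token: str) -> str:
    """Download the quantized 1B checkpoint; returns the local directory path."""
    return snapshot_download(
        repo_id="meta-llama/Llama-3.2-1B-Instruct-QLORA_INT4_EO8",  # assumed repo id
        token=token,  # gated repo: accept the license on Hugging Face first
    )
```

Usage: `path = download_quantized_llama(token="hf_...")`, then point your on-device runtime at the downloaded files.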

Conclusion

The release of the quantized versions of Llama 3.2 (1B & 3B) models is a significant milestone for Meta AI. These models are more efficient and lightweight, with a 56% reduction in model size and 2-4x increases in inference speed compared to the original BF16 models. This will enable developers to deploy AI models more easily on resource-constrained devices, opening up new possibilities for AI applications in various industries.

Read more about the release here.