In this lesson, you will learn key information about the Jamba models, their unique architecture, and the advantages that come with it. Let's dive in. The main motivation for developing the Jamba model is to improve the efficiency of large language models without compromising output quality.

The transformer architecture is used in almost all mainstream LLMs today. It keeps all of the context information using the attention mechanism. Training time grows quadratically with context length. Inference time per step grows linearly with context length even with a KV cache, and inference memory grows linearly as well. This computational inefficiency takes a heavy toll, especially with long contexts, and it puts a limit on the context length.

The Mamba architecture was published in December 2023 to address the computational inefficiency of transformers. It compresses the context into an efficient state, similar to a recurrent neural network, while still running computation in parallel using a selection mechanism; a more detailed discussion will come in the next lesson. The Mamba architecture enables linear training-time growth with context length, compared to quadratic growth for the transformer. Inference time per step is constant, compared to linear growth for the transformer. And inference memory stays constant, compared to linear memory growth for the transformer. This much-improved computational efficiency lets the Mamba architecture natively process long context lengths.

The downside of a pure Mamba model is that model robustness and output quality suffer at large scale. To get the best of both worlds, we optimized a hybrid of the transformer and Mamba architectures to create the Jamba model, achieving high output quality as well as high computational efficiency, with high throughput and a low memory footprint. Mixture of experts was also used in the Jamba model to further improve throughput, efficiency, and quality.

The latest iteration is the Jamba 1.5 model family, which consists of two models: Jamba 1.5 Large and Jamba 1.5 Mini. Jamba 1.5 Large has 94 billion active parameters and 398 billion total parameters, and Jamba 1.5 Mini has 12 billion active parameters and 52 billion total parameters. Both models have a very long effective context length of 256K tokens.

The Jamba models are also equipped with key features developers need to build enterprise GenAI applications, such as tool calling, structured JSON output, documents as input objects, and streaming. Jamba models are also multilingual, supporting nine languages: English, Spanish, French, Portuguese, Italian, Dutch, German, Arabic, and Hebrew. Jamba models are released as open-weight models on Hugging Face and are also widely available across different platforms and frameworks, including AWS, GCP, Azure, NVIDIA NIM, Databricks, Snowflake Cortex, LangChain, LlamaIndex, and more. We can also deploy the Jamba models into your private cloud or on-premises environment.

If you look under the hood, a single Jamba block consists of eight layers: one transformer layer, three Mamba layers, and four Mamba + MoE layers. This composition optimizes both model output quality and model efficiency. The improved efficiency of the Jamba model becomes quite clear when you compare KV cache memory footprints at long context lengths, as the quick sketch below illustrates.
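To make that comparison concrete, here is a minimal back-of-envelope sketch in Python. The layer count, KV-head count, head dimension, and data type in it are illustrative assumptions, not the actual Jamba 1.5 configurations; the only detail taken from this lesson is that just one layer in each eight-layer Jamba block is a transformer (attention) layer, and only attention layers need a KV cache.

# Rough KV cache size: 2 (keys and values) x attention layers x tokens x KV heads x head dim x bytes.
BYTES_PER_VALUE = 2  # fp16 / bf16

def kv_cache_bytes(num_attention_layers, context_len, num_kv_heads, head_dim):
    return 2 * num_attention_layers * context_len * num_kv_heads * head_dim * BYTES_PER_VALUE

# Hypothetical 32-layer model with 8 KV heads of dimension 128 at a 256K-token context.
depth, context_len, kv_heads, head_dim = 32, 256_000, 8, 128

# Pure transformer: every layer keeps a KV cache.
pure_transformer = kv_cache_bytes(depth, context_len, kv_heads, head_dim)

# Jamba-style stack: each 8-layer block has 1 transformer, 3 Mamba, and 4 Mamba + MoE layers,
# so only one layer in eight contributes to the KV cache.
jamba_style = kv_cache_bytes(depth // 8, context_len, kv_heads, head_dim)

print(f"pure transformer KV cache: {pure_transformer / 2**30:.1f} GiB")  # ~31 GiB
print(f"Jamba-style KV cache:      {jamba_style / 2**30:.1f} GiB")       # ~4 GiB
# Both caches grow linearly with context_len, but the Jamba-style stack starts from a
# roughly 8x smaller base, and the Mamba layers themselves keep only a fixed-size state.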
Because the Mamba layers compress the context, the Jamba models' KV cache is only a fraction of that of other transformer-based LLMs with a similar or even smaller number of parameters. While being a lot more efficient, the Jamba models also excel at long-context evaluation benchmarks.

RULER is a recent benchmark developed by NVIDIA specifically to evaluate large language models' performance at long context. RULER evaluates LLMs on different task categories, including multiple-needles-in-a-haystack retrieval, multi-hop tracing, word extraction, and question answering, at different context lengths. Both Jamba models do very well across different context lengths, with Jamba 1.5 topping the leaderboard.

Great. You have now learned all the key information about the Jamba model. Next, you will learn about the evolution that led to the development of the Jamba model and the science behind its efficiency gains. See you in the next lesson.