Google’s Gemini Ultra is the first model to outperform human experts on MMLU – Google.
Highlights:
- Gemini outperforms GPT-4V in several performance categories according to Google’s benchmarks.
- There will be three versions of Gemini: Ultra, Pro, and Nano.
- The first Pixel device to use Gemini via Gemini Nano will be the Pixel 8 Pro.
- On December 13th, Gemini Pro will be accessible via the Gemini API in Google AI Studio.
The launch follows the recent turmoil at OpenAI, which included Sam Altman’s firing and subsequent rehiring. Google seems to have sensed an opening: a few weeks later, the company unveiled a new AI model that it says is more capable than GPT-4V.
Google unveiled Gemini as the future of its AI efforts. The model powers Bard starting today and will eventually underpin all of Google’s AI products. Gemini 1.0 comes in three sizes (Ultra, Pro, and Nano) so that, like the rest of Google’s products, it can be widely available.
Google’s Gemini Explained
Google is calling Gemini “the most capable and general model we’ve ever built.” Although it is being released in three sizes, this single model will power Google’s entire stack of AI products.
- Gemini Ultra: Google’s largest and most capable model, built for highly complex tasks.
- Gemini Pro: Google’s best model for scaling across a wide variety of tasks.
- Gemini Nano: Google’s most efficient model for on-device tasks.
Some of the performance figures Google is highlighting for Gemini are remarkable, but if there’s one thing I’ve learned about electronics, it’s that manufacturer benchmarks aren’t always reliable. That said, after seeing Gemini in action, it is hard to doubt its capabilities. @rowancheung shared a video of Gemini in action on X (Twitter), and the results are astonishing.
“Gemini Ultra is the first model to outperform human experts on MMLU (massive multitask language understanding), which uses a combination of 57 subjects such as math, physics, history, law, medicine and ethics for testing both world knowledge and problem-solving abilities.” – Google
Let’s Talk About Gemini’s Performance, Shall We?
Google is promoting Gemini as the most capable AI model it has ever built. If these benchmarks hold up under independent testing, Gemini will be the best model on the market, at least until OpenAI releases GPT-5. The great rule of the modern economy is that when businesses compete to build the best products, customers almost always come out on top.
Gemini ought to push OpenAI to keep innovating, but many people are clearly worried about reckless research that disregards safety, including CEOs like Satya Nadella, who have compared AI to atomic energy.
In most of the benchmarks Google showed, Gemini Ultra outperformed GPT-4V, at times by more than 4 percentage points. The complete set of benchmarks is below.
Table: Gemini Ultra vs. GPT-4V Benchmark Comparison
Capability | Benchmark | Description | Gemini Ultra | GPT-4V |
---|---|---|---|---|
General | MMLU | Representation of questions in 57 subjects (incl. STEM, humanities, and others) | 90.0% CoT@32* | 86.4% 5-shot* (reported) |
Reasoning | Big-Bench Hard | Diverse set of challenging tasks requiring multi-step reasoning | 83.6% 3-shot | 83.1% 3-shot (API) |
Reasoning | DROP | Reading comprehension (F1 score) | 82.4 Variable shots | 80.9 3-shot (reported) |
Reasoning | HellaSwag | Commonsense reasoning for everyday tasks | 87.8% 10-shot* | 95.3% 10-shot* (reported) |
Math | GSM8K | Basic arithmetic manipulations (incl. grade-school math problems) | 94.4% maj1@32 | 92.0% 5-shot CoT (reported) |
Math | MATH | Challenging math problems (incl. algebra, geometry, pre-calculus, and others) | 53.2% 4-shot | 52.9% 4-shot (API) |
Code | HumanEval | Python code generation | 74.4% 0-shot (IT)* | 67.0% 0-shot* (reported) |
Code | Natural2Code | Python code generation on a new held-out, HumanEval-like dataset not leaked on the web | 74.9% 0-shot | 73.9% 0-shot (API) |
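To put the “more than 4 points” claim in perspective, here is a minimal Python sketch that computes the gap between the two columns of the table above. The scores are hand-transcribed from Google’s table, and the names `SCORES`, `point_gaps`, and `summarize` are hypothetical helpers made up for this illustration. Keep in mind that the two models were often evaluated with different prompting setups (for example, CoT@32 vs. 5-shot on MMLU), so these gaps are rough rather than strictly like-for-like.

```python
# Hypothetical helper: scores hand-transcribed from Google's Gemini Ultra vs. GPT-4V table.
# Tuples are (Gemini Ultra, GPT-4V). Values are percentages, except DROP (an F1 score).
# Caution: prompting setups differ between the columns (e.g. CoT@32 vs. 5-shot on MMLU),
# so these gaps are a rough comparison, not a like-for-like one.
SCORES = {
    "MMLU": (90.0, 86.4),
    "Big-Bench Hard": (83.6, 83.1),
    "DROP": (82.4, 80.9),
    "HellaSwag": (87.8, 95.3),
    "GSM8K": (94.4, 92.0),
    "MATH": (53.2, 52.9),
    "HumanEval": (74.4, 67.0),
    "Natural2Code": (74.9, 73.9),
}


def point_gaps(scores):
    """Return {benchmark: Gemini Ultra score minus GPT-4V score}, rounded to one decimal."""
    return {name: round(gemini - gpt, 1) for name, (gemini, gpt) in scores.items()}


def summarize(scores, threshold=4.0):
    """Return the gaps, the benchmarks Gemini Ultra leads, and leads above `threshold` points."""
    gaps = point_gaps(scores)
    wins = [name for name, gap in gaps.items() if gap > 0]
    big_wins = [name for name, gap in gaps.items() if gap > threshold]
    return gaps, wins, big_wins


if __name__ == "__main__":
    gaps, wins, big_wins = summarize(SCORES)
    for name, gap in gaps.items():
        print(f"{name:15} {gap:+.1f} points")
    print(f"Gemini Ultra ahead on {len(wins)} of {len(gaps)} benchmarks")
    print("Leads of more than 4 points:", ", ".join(big_wins))
```

Run as-is, the sketch shows Gemini Ultra ahead on 7 of the 8 listed benchmarks, with HumanEval (+7.4) as the only lead above 4 points; GPT-4V still leads HellaSwag by 7.5 points.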
Impressive as these numbers are, the typical consumer probably won’t give them much thought. I find it more interesting that Google is building Gemini Nano, its model for on-device tasks, into the Pixel 8 Pro. Many manufacturers are starting to bring on-device AI to their products; NVIDIA’s TensorRT-LLM, for example, lets large language models run locally on RTX PCs. In my opinion, this is the more promising direction for AI.
Perfecting Every Bit, but Still a Long Road Ahead
The shift from text-only models to multimodal ones is probably going to be a major one. However, the underlying problems with AI models trained to find patterns in enormous amounts of real-world data have not changed. You can get them to respond to increasingly complex prompts with increasingly sophisticated answers, but you can never be sure they are giving you the right answer rather than merely a plausible one. When using Google’s chatbot, keep in mind its own warning: “Bard may display inaccurate info, including about people, so double-check its responses.”
According to Google’s technical report (PDF), Gemini has a wide range of capabilities.
It can accurately predict that a hexagon will be the next shape in a series of shapes that include a triangle, square, and pentagon. When asked to identify the connection between images of the moon and a hand holding a golf ball, it correctly notes that two golf balls were struck by Apollo astronauts on the moon in 1971.
Google also demonstrated how Gemini could analyze a handwritten physics problem with a simple sketch, identify the student’s mistake, and explain the correction. A longer demonstration video showed Gemini identifying other videos, hand puppets, sleight-of-hand tricks, and a blue duck. However, none of the demos were conducted in real time, and it’s unclear how often Gemini fails at tasks like these.
Final Verdict
Ever since Star Trek first aired more than five decades ago, we have all imagined one of the greatest, and most likely feasible, future uses for LLMs like these: a universal translator. ChatGPT can already translate, but producing translations takes a while. AI models are now available that can translate voice acting into a different language while preserving the original performer’s voice. As an avid fan of anime, Japanese dramas, and Korean dramas, I would love a world in which I could turn on my TV and instantly hear the original actors’ voices in English. That future draws closer as these tech giants compete to improve AI.