We tried Google’s Gemini AI chatbot and found it’s more capable but still prone to hallucinations

Google has come a long way with its generative artificial intelligence (AI) offerings. A year ago, when the tech giant first unveiled its AI assistant, Bard, its demo stumbled when it made a factual error in answering a question about the James Webb Space Telescope. Since then, the company has improved the chatbot’s responses, added a feedback mechanism to check the sources behind its answers, and more. But the biggest upgrade came when it changed the underlying large language model (LLM), moving the chatbot from Pathways Language Model 2 (PaLM 2) to Gemini in December 2023.

The company described Gemini as its most powerful language model to date. It also added AI image generation to the chatbot, made it multimodal, and even renamed the assistant Gemini. But how big a leap is this for AI chatbots? Can it now compete with Microsoft Copilot, which is based on GPT-4 and offers similar capabilities? And what about AI hallucinations (a phenomenon where an AI responds with false or non-existent information masquerading as fact)? We decided to find out.

Gemini can currently be accessed in several ways. Gemini Advanced is a paid subscription, part of the Google One AI Premium plan, which costs Rs. 1,950 a month. Gemini also has an Android app, though it is not available in India right now. The Google Pixel 8 Pro additionally ships with the on-device Gemini Nano model. For our testing, we used Google’s Gemini Pro-powered web portal, which is available in over 230 countries and territories and free to use.

Google Gemini’s generative capabilities

The user interface of the website remains the same, but the name has changed from Bard to Gemini. If you’re signed in with your Google Account, the AI will greet you by name and ask, “How can I help you today?” Below that are some useful quick prompts that highlight the different things it can do.

First, we asked it to write an email to test its basic generative skills. Keeping in mind that it is layoff season, we prompted it to “Draft an empathetic email from a manager to an employee informing them that they’ve been laid off.” To compare its performance, we asked Copilot to do the same. We found Gemini’s response quite dull and robotic, cramming too much information into the opening. The first paragraph read, “With great regret, I am writing to inform you of a difficult decision we have had to make at (company name). Due to (reason), we need to make reductions in our workforce, and your position will be terminated effective (date).”

In comparison, Copilot’s response seemed more sympathetic and humane. It read, “I hope this email finds you well. I regret to inform you that recent organizational changes have required us to make some difficult decisions. Unfortunately, your position at (Company Name) has been impacted, and I am very sad to inform you that you have been laid off.”

This should come as no surprise, as Google’s chatbots have long been a bit stiff when it comes to formal content creation, and that has not changed with the Gemini model. When it comes to informal content, however, Gemini starts to show its capabilities.

We asked both chatbots to “Write a short note to my mother informing her that I have been fired from my job. Be sure to express my feelings of sadness and grief.”

Copilot took this literally and wrote, “The suddenness of it all has me feeling a mix of emotions – sadness, grief and a touch of uncertainty about what will happen next.” But Gemini understood the underlying message and did a much better job of conveying the emotion. It read, “I know you’re always worried about job security, and I wish I could tell you this was different. It was nothing I did wrong; it was (the reasons). I’m trying to stay positive, but it’s hard not to get discouraged. However, I don’t want you to worry too much. I’m not panicking, and I’m already starting to think about my next steps.”

It was very well written, and one could be forgiven for thinking a real human had written it. We ran some more tests, and each time Gemini’s responses were much more balanced than those of the GPT-4-powered Copilot.

Accuracy of Google Gemini

The next test checked the accuracy of the AI model. Many people use AI to find answers to questions, and we wanted to do the same. It handled static general-knowledge questions well. Even when we posed some grey-area questions, its answers stuck to the facts and did not include any extraneous information. When questions were controversial, it refused to answer them, which is what you would expect from a responsible AI chatbot.

The problem came when we asked it a popular trick question that has troubled older generations of chatbots. When we asked, “Countries in Africa that start with the letter K”, it confidently replied, “As of today, February 12, 2024, there are no countries in Africa that start with the letter “K”. While circumstances may change over time, this fact has remained consistent for some time.” Kenya, of course, exists. For those unaware, the glitch likely occurs because the model’s training data contained this prompt paired with that same incorrect response. Interestingly, this particular glitch is also present in Copilot and ChatGPT, and despite regular reports, it has not been fixed in any of these chatbots.

However, the problem of AI hallucinations did not end there. We found several other incorrect Gemini responses. When we asked, “Based on reviews, tell me the pros and cons of the iPhone 15 Pro”, it replied, “The iPhone 15 Pro has not been officially announced yet”. In reality, Apple’s smartphone launched in September last year. In comparison, Copilot performed better on technology-related questions.

Google Gemini’s assistant functions

Another capability most AI chatbots boast is their set of helpful assistant features. They can brainstorm an idea, create an itinerary for a trip, compare your options, and even chat with you. We started by asking both to create a budget itinerary for a five-day trip to Goa, including things people can do there. Since the author was in Goa recently, it was easy to test. While Gemini did a good job of highlighting all the popular destinations, its answers were not detailed and not much different from any travel website. One upside is that the chatbot probably won’t suggest anything wrong.

On the other hand, we were impressed by Copilot’s detailed response, which included hidden gems and even names of local dishes everyone should try. We repeated the test with different variations, but the results remained consistent.

Next, we asked, “I live in India. Should I buy a subscription to Amazon Prime Video or Netflix?” Gemini’s response was exhaustive and covered various parameters, including depth of content, pricing, features and benefits. Although it did not directly recommend one, it listed why a user might choose either option. Copilot’s reply was much the same.

Finally, we spent time chatting with Gemini. The test lasted a few hours, and we evaluated the chatbot’s ability to be engaging, entertaining, informative, and relevant. Gemini performed quite well on all these parameters. It can tell you a joke, share lesser-known facts, give you advice, and even play word- and picture-based games with you. We also tested its memory, and it could recall the conversation even after we had been texting for an hour. The only thing it cannot do is reply to messages one line at a time, as a human friend would.

Image creation capabilities of Google Gemini

In our testing, we learned several interesting things about Gemini’s image-generation capabilities. For example, all generated images have a resolution of 1536×1536, which cannot be changed. The chatbot refuses any request to create images of real-life people, which potentially reduces the risk of deepfakes (AI-generated images of real people that look authentic).

In terms of quality, Gemini did a faithful job of sticking to the prompt. It can generate images in a particular style, such as postmodern, realistic or symbolic, and can also mimic the styles of popular artists from history. However, there are many restrictions, and if you ask for something too specific, you will probably find Gemini denying your request. Compared with Copilot, Gemini generated images faster, stayed more consistent with prompts, and offered a wider range of styles. Still, it cannot compare to dedicated image-generation AI models like DALL-E and Midjourney.

Google Gemini: Bottom line

Overall, we found Gemini to be quite capable in most categories. As someone who has used AI chatbots occasionally since they became available, the author can say with confidence that the Gemini Pro model has made the chatbot better at understanding natural language and grasping the context of questions. The free version is a reliable companion if you need it to generate ideas, write informal notes, plan a trip, or even produce basic images. However, it should not be relied on as a research tool or for formal writing, two areas where it still struggles.

By comparison, Copilot is better at formal writing, itinerary creation, conversation (albeit with a shorter memory) and comparisons. Gemini takes the crown in image generation, informal content creation and user engagement. Considering this is the first iteration of the Gemini LLM, as opposed to the fourth iteration of GPT, we are curious to see how the tech giant further improves its AI assistant.

Affiliate links may be automatically generated – see our ethics statement for details.

