Grok 2 tested: Is it better than ChatGPT in the real world?

Calvin Wankhede / Android Authority

When Elon Musk took over Twitter, he made the controversial decision to rebrand the platform as X. The name change was part of his grand vision for an all-in-one or “super” app. Not long after, OpenAI’s ChatGPT kickstarted an industry-wide AI arms race. Musk, who was a founding member and initial investor in OpenAI, criticized the company for abandoning its non-profit roots and announced his own chatbot, Grok, to compete with ChatGPT. Both AI platforms have received major updates since then, so in this article, let’s take a closer look at how Grok-2 performs vs ChatGPT in the real world and which one you should use.

What is Grok AI and what can it do?

Grok is an AI chatbot similar to OpenAI’s ChatGPT and Google’s Gemini. Developed by xAI, a startup founded by Elon Musk, Grok relies on a family of language models of the same name. The latest model is available in two sizes: Grok-2 and Grok-2 mini. The latter delivers faster responses at the expense of accuracy, while xAI claims that the larger model can even match ChatGPT. We’ll test those claims in a later section.

Grok-2 gets its real-time information from tweets shared on X, unlike rivals such as ChatGPT and Gemini, which rely on a search engine like Bing or Google. Given that the social media platform is often used to share breaking news, Grok can potentially generate more useful responses about current affairs and recent events. However, this strategy can just as easily backfire and cause the chatbot to regurgitate fake news and other low-quality X posts. Even before Elon Musk took over the platform, Twitter famously struggled to draw a line between credible sources and misinformation.

It’s important to note that Grok’s focus on X posts doesn’t extend to its underlying training dataset. Like any other language model, it also has knowledge of older events and broader topics. This broader training explains why Grok-2 can generate code. In fact, one of xAI’s goals is to develop a model capable of advanced mathematical reasoning. To achieve this goal, Elon Musk enticed talent from Google and OpenAI, which likely helped the startup raise billions in funding to date.

Like many other chatbots, Grok can also create AI-generated images. Unlike ChatGPT, which uses OpenAI’s in-house DALL-E 3, Grok relies on the relatively newer FLUX.1 model for image generation. The latter comes courtesy of another artificial intelligence startup, Black Forest Labs. FLUX.1’s biggest advantage over competitors like Midjourney and Stable Diffusion is that it can accurately handle intricate human anatomy like fingers.

How to access and use Grok AI

Grok is one of the few AI chatbots that cannot be used for free: you’ll need either an X Premium ($8 per month) or X Premium Plus ($16 per month) subscription. Even though Grok AI is developed by xAI, a completely independent startup, you can only access the chatbot via X (formerly Twitter). This is perhaps not that surprising, since Elon Musk is at the helm of both companies and the chatbot relies on posts published to the social media platform.

We don’t know if xAI will drop the subscription requirement in the future, but AI chatbots have notoriously high computational costs. Grok competitors like ChatGPT and Gemini can only offer their services for free because of heavy funding from cloud providers like Microsoft, Amazon, and Google.

Grok-2 vs ChatGPT: What’s the difference?

Before we get into the real-world comparisons, there are a couple of big philosophical differences between Grok and ChatGPT we should get out of the way first. Elon Musk created xAI and Grok in direct response to OpenAI’s handling of ChatGPT. Shortly after the chatbot’s release in late 2022, he tweeted, “The danger of training AI to be woke – in other words, lie – is deadly.”

With Grok, Musk aims to build a “maximum truth-seeking AI” that does not align with a particular ideology. On the other hand, most AI giants like OpenAI and Google devote copious resources to building guardrails for their respective AI models. This can be viewed as a form of censorship, but experts believe that such guardrails are necessary to prevent AI from being used for unethical or illegal purposes.

So how does Grok-2 perform in the real world? According to xAI’s blog post, it ranks above OpenAI’s latest GPT-4o and Anthropic’s Claude 3.5 Sonnet in various benchmarks. But benchmarks rarely reflect the way you or I would use an AI chatbot, so let’s take a look at some real-world comparisons of Grok vs ChatGPT.

Prompt: “What is the general consensus on the Tesla Cybertruck from the perspective of both, the general public and car enthusiasts?”

The Tesla Cybertruck is one of Elon Musk’s more controversial moves in recent memory, so this prompt should give us a good sense of Grok’s truthfulness and ability to pick relevant tweets. Luckily, the chatbot lists any tweets it references at the bottom. In this instance, many of the tweets Grok picked came from self-proclaimed Tesla investors. A smaller proportion of tweets admittedly went in the opposite direction and criticized the Cybertruck.

Overall, both Grok and ChatGPT delivered a balanced answer to a nuanced question without much censoring from either side. If you value personal reviews, you may even prefer Grok’s answer as it describes the ownership experience more thoroughly than ChatGPT. For example, the X chatbot highlighted the Cybertruck’s odd design choices “like the lack of physical controls for common functions and issues like inadequate mirrors for towing.”

ChatGPT’s response wasn’t as detailed, likely because it read a more limited selection of sources. It only criticized the Cybertruck’s design as unsafe for pedestrians and remained mostly neutral otherwise.

Prompt: “What’s the likelihood of rain over the next week in Mumbai?”

Chatbots that rely on large language models (LLMs) are known to hallucinate, so asking about the weather or using vague terms like “the next week” can easily trip them up. Thankfully, both ChatGPT and Grok can access the internet for real-time information, which at least points them in the right direction.

I picked this prompt because official rain forecasts are far from accurate where I live. So instead, I mostly rely on a handful of weather experts who post live updates on X/Twitter. Unsurprisingly then, Grok could easily deliver a comprehensive forecast based on various tweets corroborating each other. The list of sources even included my city’s meteorological department. ChatGPT delivered a similar forecast too, but it consulted generic weather websites and didn’t go into nearly as much detail.

Prompt: “Generate a photorealistic image of Steve Jobs wearing his iconic turtleneck hoodie, using a modern Samsung Android phone”

If you’re big on AI image generators, both ChatGPT and Grok include one. However, ChatGPT’s DALL-E model has plenty of restrictions that prevent it from generating unsafe or privacy-infringing images. For example, it outright refused to generate the image I requested in the above prompt, which limits its usefulness considerably. Grok delivered an image that captured Steve Jobs’ likeness but mistook the turtleneck for a hoodie. Still, it proves that Grok doesn’t censor its output to nearly the same degree as ChatGPT.

Prompt: “Is the Amex Platinum worth its high annual fee?”

This is another experiential question that gives Grok a slight edge, thanks to several differing perspectives on X. While both chatbots’ responses were similar, Grok offered some additional insights like the card’s welcome bonus potentially outweighing the annual fee.

It’s ultimately the clickable tweets at the bottom that help solidify Grok’s lead over ChatGPT. If I have little knowledge about a product or service, the X results are invaluable as they instantly give me both positive and negative reviews from real people.

Verdict: Is Grok better than ChatGPT?

In my time testing Grok, I found that it tends to pick a seemingly random mix of tweets to use as sources. In the Cybertruck prompt above, it could have weighed the opinions of established car reviewers or journalists instead of relatively unknown Tesla investors or influencers. With the way Grok is currently set up, I fear that X users with large followings and engagement may skew the chatbot’s output.

Even so, there is no denying that Grok is currently at least as good as direct rivals like ChatGPT. As the weather example above shows, I may even prefer using the chatbot instead of scrolling through a list on X (formerly Twitter). Still, it’s a double-edged sword: Grok should ideally warn users of potential biases due to its over-reliance on just a handful of X posts.

That aside, the only major downside to Grok vs ChatGPT is that you still need to pay for an X Premium subscription. With mounting hardware and infrastructure costs, xAI can only sustain growing user demand if it can find investors with deep pockets. Having said that, if you already pay for an X subscription, Grok is a surprisingly competitive value-add that you should consider adding to your rotation as a ChatGPT alternative.