What are Large Language Models (LLMs)?

A deep dive into LLMs
2024-03-21 · 4 min read · Fetch.ai

When OpenAI released ChatGPT in November 2022, it sent shockwaves around the world by bringing the transformative potential of Large Language Models (LLMs) to the forefront of public consciousness.

Since then, interest in LLMs and artificial intelligence has soared, with this new technology already sparking a seismic shift across multiple fields. From medicine to education, and law to technology, everyone is racing to discover how LLMs can make their lives easier and more productive, and this is just the beginning!

Let's dive into how they work, what they do, and how they'll evolve in the future. 

Key Takeaways

  • LLMs are generative AI models trained on massive amounts of textual data, able to understand and generate text and carry out a variety of other tasks.

  • They are already widely used across a range of areas, including content creation, research, education, customer service, programming, translation, and more.

  • In the future, we will see LLMs become ever more customizable, specialised, and multimodal.

How Do Large Language Models Work?

There are several technologies that contribute to the magic of LLMs. Let's take a closer look at what's involved.

LLMs leverage deep learning and vast amounts of textual data to generate accurate outputs. User queries are passed through layers of neural networks, each with parameters that are adjusted during training to produce the best results. Results are further improved by an attention mechanism, which focuses the model on the most relevant parts of the input. LLMs are typically based on the 'transformer' architecture, a framework that excels at handling sequential data such as text.

Machine Learning & Deep Learning

LLMs rely heavily on machine learning and deep learning. Machine learning is the process of training a model to make predictions from data - in the case of LLMs, a huge amount of text gathered from a variety of sources. Which data is included or omitted can have a significant effect on the outputs users receive: for example, LLMs trained on data scraped from the internet can reproduce the biases and misinformation present on the web.

Add to that the innovation of deep learning and LLMs become far more powerful. Deep learning is a branch of machine learning that uses many-layered neural networks to learn patterns from data with minimal human intervention. Given an input, a deep learning model uses the probabilities learned from its large training data sets to predict the most likely output, as sketched below.
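
To make that idea concrete, here is a minimal sketch in Python of next-token prediction with a made-up four-word vocabulary. The words and scores are purely illustrative, not taken from any real model.

    import numpy as np

    # Toy vocabulary and hypothetical raw scores (logits) a model might assign
    # to each candidate next token after a prompt like "the cat sat on the".
    vocab = ["cat", "dog", "mat", "the"]
    logits = np.array([2.1, 0.3, 3.0, 0.5])   # illustrative values only

    # Softmax turns the raw scores into a probability distribution over the vocabulary.
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()

    print(dict(zip(vocab, probs.round(3))))                        # probability of each candidate
    print("most likely next token:", vocab[int(probs.argmax())])   # -> "mat"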

Neural Networks

LLM queries are passed through layers of neural networks: model architectures loosely inspired by the networks of neurons in the human brain. Huge numbers of 'nodes' (artificial neurons) are organised into layers that pass and transform information layer-to-layer to produce an accurate output, as in the simplified example below.
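
As a rough illustration of information flowing through layers, here is a tiny two-layer network in Python. Real LLMs contain billions of parameters and many more layers; the random weights below are placeholders rather than anything learned.

    import numpy as np

    rng = np.random.default_rng(0)

    def layer(x, weights, bias):
        # One layer: a linear transform followed by a ReLU non-linearity.
        return np.maximum(0, x @ weights + bias)

    x = rng.normal(size=4)                           # a 4-dimensional input vector
    W1, b1 = rng.normal(size=(4, 8)), np.zeros(8)    # first layer: 4 inputs -> 8 nodes
    W2, b2 = rng.normal(size=(8, 2)), np.zeros(2)    # second layer: 8 nodes -> 2 outputs

    hidden = layer(x, W1, b1)        # the first layer transforms the input
    output = layer(hidden, W2, b2)   # the second layer transforms the hidden representation
    print(output)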

Transformer Models

LLMs use a kind of neural network called a transformer model to help them better understand context. Transformers use a mathematical technique called self-attention, which detects the subtle clues in a sentence that show how its different elements relate to one another. It helps the model understand how the end of a sentence connects to its beginning, how each sentence fits into the overall structure, and how sentences and paragraphs relate to one another.
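
The core of that technique is scaled dot-product attention. The sketch below, in plain NumPy with the learned query/key/value projections omitted for brevity, shows the basic computation: every position in the sequence scores every other position, and each output is a weighted mix of the whole sequence.

    import numpy as np

    def self_attention(X):
        # X holds one vector per token, shape (sequence_length, d).
        # A real transformer first projects X into queries, keys, and values
        # with learned weight matrices; this sketch uses X directly.
        d = X.shape[-1]
        scores = X @ X.T / np.sqrt(d)                    # how strongly each token attends to every other token
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)   # softmax: each row sums to 1
        return weights @ X                               # each output mixes information from all positions

    tokens = np.random.default_rng(1).normal(size=(5, 8))   # 5 toy token vectors of dimension 8
    print(self_attention(tokens).shape)                      # -> (5, 8)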

So What Can LLMs Do?

LLMs were initially designed with the goal of mastering human language, which means they excel at tasks involving the comprehension and production of text. But that's not all they can do: LLMs are proving useful in many new ways.

Text Generation

LLMs excel at crafting texts and passages to a standard reaching or even surpassing human abilities. Whether it's writing a poem, drafting a rental contract, or creating a piece of content marketing, it's likely that an LLM can produce something pretty close to the real thing...in seconds.
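
For developers who want to try this, the simplest route is an existing model rather than building one. As a minimal sketch, the Hugging Face transformers pipeline can generate a continuation from a prompt using a small open model such as GPT-2; the prompt and settings here are just examples.

    from transformers import pipeline

    # Load a small, freely available model for local experimentation.
    generator = pipeline("text-generation", model="gpt2")

    # Ask the model to continue a prompt; larger hosted LLMs follow the same pattern.
    result = generator("Write a short poem about the sea:", max_new_tokens=40)
    print(result[0]["generated_text"])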

Content Summarisation

Walls of text could be a thing of the past, as LLMs can quickly understand and summarise even the densest of passages. Meeting notes, scientific papers, and news articles can all be fed into an LLM and re-constituted into an easily digestible form, saving time and effort.

DNA Research

Some researchers argue that the way DNA sequences fit together is a lot like the building blocks of language. That makes DNA research a natural area for LLMs to excel in, and specialist LLMs are already being deployed in the field, unlocking great potential for scientific breakthroughs.

AI Assistants & Chatbots

LLMs' natural language mastery and access to information make them ideal chatbots and AI assistants. They can understand almost any request that can be formulated in text, and their broad knowledge and human-like language abilities allow them to provide helpful, efficient responses.

Code Generation

LLM coding ability is advancing rapidly, with models already able to generate large amounts of working code in seconds. This helps speed up the work of human developers by streamlining the programming process and removing some of the more tedious aspects of the task.

Sentiment Analysis

Sentiment analysis is a form of consumer research that analyses open sources such as customer reviews, social media posts, and news articles to gauge public opinion. Feeding these sources into an LLM makes the process significantly more comprehensive and efficient than manual methods.
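
As a hedged illustration of what this looks like in practice, here is an off-the-shelf sentiment model from the Hugging Face transformers library classifying a couple of invented reviews.

    from transformers import pipeline

    # Downloads a default pretrained sentiment model on first use.
    classifier = pipeline("sentiment-analysis")

    reviews = [
        "Absolutely love this product, it exceeded my expectations!",
        "Shipping took three weeks and the box arrived damaged.",
    ]

    for review, result in zip(reviews, classifier(reviews)):
        print(f"{result['label']:>8}  ({result['score']:.2f})  {review}")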

Language Translation

LLMs are able to grasp nuance and context far better than previous digital translators. They can even handle complex patterns, homonyms, and compound words that would usually confound older systems.

Advantages & Limitations of LLMs

A technology with this much transformative potential comes with risks as well as opportunities. Below, we outline some of the most important reasons for optimism and caution regarding LLMs.

Language Comprehension

One of the key strengths of LLMs is their ability to grasp natural human language. When querying an LLM, users have a remarkable level of freedom to phrase the prompt however they see fit, as if talking to a real human. This is far removed from the strict rules of syntax that have characterised human-computer interaction in the past.

LLMs are usually able to understand even the most convoluted, badly-formulated questions and give accurate, well-reasoned answers at breakneck speed. This is hugely advantageous in many areas, from education and research to customer service and beyond.

Data Bias and Hallucinations

LLMs are only as good as the data they are trained on: faulty or biased data will produce faulty or biased results. This is especially true for LLMs trained on data from online sources, where a large portion of the content is user-generated.

In other cases, LLMs may 'hallucinate' information that never existed in the first place. These nonsensical or incorrect outputs are presented to the user in the same form as correct information and can be very difficult to detect, which makes them a serious problem when they go unnoticed.

Ethics and Privacy

LLMs aren't without the potential for ethical problems. For example, LLM training data sets are truly massive, and they can sometimes include personal information that can later be surfaced by a simple text prompt.

LLMs are also considered 'black boxes': how they reach their conclusions and produce their outputs can't be explicitly explained. Because their outputs depend on billions of learned parameters, we can't fully trace why we get the answers we do. That could become a problem when LLMs are used for highly consequential decisions, in law for example.

Best LLMs

Here are some of the most significant LLMs available right now:

  • ChatGPT (GPT-3.5/GPT-4), OpenAI, exact parameter counts undisclosed (GPT-4 is rumoured to be around 1.76T parameters).

  • Gemini, Google, 1.8B/3.25B parameters (Gemini Nano).

  • LLaMA-2, Meta, 7B/13B/70B parameters.

  • BERT, Google, 110M/340M parameters.

  • Command, Cohere, 6B/52B parameters.

  • BLOOM, BigScience, 176B parameters.

  • Falcon, TII, 1.3B/7.5B/40B/180B parameters.

  • Claude, Anthropic, parameter count undisclosed (rumoured around 500B).

  • Grok-1, xAI, 314B parameters.

  • StableLM, Stability AI, 3B/7B parameters.

Future Steps for LLM Development

With the multitude of LLMs on the market all competing to become the next big thing, there's no doubt that LLMs will continue to evolve as we move into the future. It's difficult to predict where this will take us, but we can already see a few areas where breakthroughs may be on the horizon.

Specialisation

Currently, most LLMs are general-purpose, trained on broad datasets with a very large scope. Going forward, we will see more specialist LLMs, explicitly tailored to excel in their respective fields, for example legal LLMs designed to construct legal documents, or medical LLMs able to provide swift, accurate diagnoses.

Multimodal LLMs

Most current LLMs are limited to text input and output, with some limited support for images. Future development will integrate more modes of information, such as images, video, audio, and other sensory inputs, alongside text.

Customization

Future LLMs will be tailored to their users: over time they will learn what each user expects from them and how best to serve them. To achieve this, personalization methods that allow continuous learning are being developed to improve the user experience.

How Can Developers Build Their Own LLMs?

You may be wondering whether it's possible to create your own LLM. It's very difficult, because of the significant computational resources and expertise required, but it is possible. The general steps are:

  • Obtain and prepare data. Gather the textual data you want to train your model on. Clean it by removing irrelevant or low-quality content and organise it into a consistent format. Generally, the more data the better - as long as it's good data.

  • Design your model. Choose the architecture for your model. There are different architectures to choose from, such as Recurrent Neural Networks (RNNs), Long Short-Term Memory networks (LSTMs), or Transformers, each with different strengths.

  • Train your model. Split the data into training, validation, and test sets, then feed the training data to your model. Monitor performance and continually tweak the model (see the simplified training-loop sketch after this list).

  • Evaluate your model. Test your model's performance by assessing various metrics, such as accuracy, mathematical reasoning, problem-solving, and language skills. Fine-tune based on the results.

  • Deploy. When you're satisfied with your LLM, you're ready to put it to use. 
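
To make the training step more concrete, below is a heavily simplified sketch of a next-token-prediction training loop in PyTorch. The tiny embedding-plus-linear 'model' and the random token IDs are placeholders standing in for a real Transformer and a real dataset; only the overall shape of the loop (forward pass, loss, backpropagation, optimizer step) carries over to a genuine LLM.

    import torch
    import torch.nn as nn

    vocab_size, d_model, seq_len, batch = 1000, 64, 32, 8

    # Placeholder model: an embedding followed by a linear layer over the vocabulary.
    model = nn.Sequential(nn.Embedding(vocab_size, d_model), nn.Linear(d_model, vocab_size))
    optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)
    loss_fn = nn.CrossEntropyLoss()

    for step in range(100):
        tokens = torch.randint(0, vocab_size, (batch, seq_len + 1))  # stand-in for real training text
        inputs, targets = tokens[:, :-1], tokens[:, 1:]              # learn to predict the next token
        logits = model(inputs)                                       # (batch, seq_len, vocab_size)
        loss = loss_fn(logits.reshape(-1, vocab_size), targets.reshape(-1))
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        if step % 20 == 0:
            print(f"step {step}: loss {loss.item():.3f}")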

Frequently Asked Questions about LLMs

Here are some of the most commonly asked questions regarding LLMs... 

What are LLMs in AI?

Large language models (LLMs) are models that have been trained on extremely large datasets, enabling them to understand natural language, generate text and other types of content, and carry out a wide variety of tasks. 

Is ChatGPT an LLM?

Yes. ChatGPT is a chatbot application built on OpenAI's GPT series of LLMs, and it is the world's most widely used LLM product at the time of writing. After launching in November 2022, it became one of the fastest-adopted consumer applications in history and currently has around 180 million users.

What is the difference between NLP and LLM?

Natural Language Processing (NLP) is the broader field of enabling computers to recognize, understand, and generate human language. A large language model is one specific approach within that field: a model trained on massive amounts of textual data in order to understand and create text and other types of content.

Liked this article? You might also like our latest blog on artificial general intelligence.

