What is a foundation model?
Foundation models (FMs), also often known as general-purpose AI, are a type of AI technology that is trained on large datasets and capable of carrying out a wide range of general tasks and operations. 1
FMs can help businesses and individuals to improve communication, analyse data and automate tasks. They underpin the chatbot ChatGPT, image generation tools such as Midjourney, and generative AI features in productivity software.
The type of data an FM is exposed to during training determines its 'mode' (or modality).
- Large language models (LLMs) are FMs which are trained on text data.
- Image generation models are FMs trained on image data (in addition to text data).
- Multi-modal FMs are FMs trained on several different types of data, such as text, images and audio.
Developing and training FMs
There are several steps involved in developing, training and deploying an FM.
Pre-training
In the pre-training stage, training data is collated from different sources and usually examined so that harmful or irrelevant data can be removed. The data commonly comes from publicly available sources gathered by web crawling, from open datasets and/or from proprietary data.
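The cleaning step described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how any particular developer does it: the blocklist, the minimum-length threshold and the example documents are all hypothetical, and real pipelines typically rely on far more sophisticated methods such as trained quality classifiers and near-duplicate detection.

```python
import hashlib
import re

# Hypothetical blocklist standing in for a real harmful-content filter.
BLOCKLIST = {"spamword", "malware"}
MIN_WORDS = 5  # assumed threshold for discarding very short, low-value documents


def normalise(text: str) -> str:
    """Lower-case and collapse whitespace so trivially different copies hash the same."""
    return re.sub(r"\s+", " ", text.lower()).strip()


def clean_corpus(documents: list[str]) -> list[str]:
    """Drop exact duplicates, very short documents and documents with blocklisted words."""
    seen_hashes = set()
    kept = []
    for doc in documents:
        norm = normalise(doc)
        digest = hashlib.sha256(norm.encode("utf-8")).hexdigest()
        if digest in seen_hashes:
            continue  # duplicate of a document already kept
        words = norm.split()
        if len(words) < MIN_WORDS:
            continue  # too short to be useful
        if BLOCKLIST.intersection(words):
            continue  # contains a blocklisted term
        seen_hashes.add(digest)
        kept.append(doc)
    return kept


corpus = [
    "The cat sat quietly on the warm mat.",
    "the cat   sat quietly on the warm mat.",  # duplicate after normalisation
    "Too short.",
    "Download this malware now for free today.",
    "Foundation models are trained on large datasets.",
]
print(clean_corpus(corpus))
# → ['The cat sat quietly on the warm mat.', 'Foundation models are trained on large datasets.']
```

Hashing a normalised copy of each document is a cheap way to catch exact duplicates; it will not catch documents that differ by more than whitespace and capitalisation.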
The datasets are then tokenised, which involves dividing the data into billions of small 'tokens'. In an LLM, each token may represent a word or part of a word, whereas in an image generation model a token may represent a small component of an image (such as a pixel or a patch of pixels). Through a mechanism called 'self-attention', the model can weigh the importance of each token relative to the others and learn the probable relationships between them, such as those between the words 'cat', 'kitty' and 'feline'.
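To make tokenisation and self-attention more concrete, the Python sketch below splits text into subword tokens using a tiny, hypothetical vocabulary and a greedy longest-match rule, then applies scaled dot-product self-attention to toy embeddings. It assumes randomly initialised embeddings and, for brevity, uses the embeddings directly as queries, keys and values. Real models learn both the embeddings and separate query, key and value projections, which is what allows related tokens such as 'cat' and 'kitty' to attend strongly to one another; with random embeddings the printed weights only show the mechanics, namely that each token's attention weights over the sequence sum to 1.

```python
import math
import random

# Hypothetical toy subword vocabulary; real tokenisers (e.g. byte-pair encoding)
# learn vocabularies of tens of thousands of pieces from the training data.
VOCAB = ["kit", "ty", "cat", "fel", "ine", "s", " "]


def tokenise(text: str) -> list[str]:
    """Greedy longest-match split of text into vocabulary pieces."""
    tokens, i = [], 0
    while i < len(text):
        # Pick the longest vocabulary piece that matches at position i.
        match = max((p for p in VOCAB if text.startswith(p, i)), key=len, default=None)
        if match is None:
            raise ValueError(f"no vocabulary piece matches at position {i}")
        tokens.append(match)
        i += len(match)
    return tokens


def softmax(xs: list[float]) -> list[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(embeddings: list[list[float]]):
    """Scaled dot-product self-attention with queries = keys = values = embeddings."""
    d = len(embeddings[0])
    weights, outputs = [], []
    for q in embeddings:
        # Score this token against every token in the sequence (including itself).
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in embeddings]
        w = softmax(scores)
        weights.append(w)
        # Output is the attention-weighted average of all token vectors.
        outputs.append([sum(w_j * v[i] for w_j, v in zip(w, embeddings)) for i in range(d)])
    return weights, outputs


random.seed(0)
tokens = tokenise("cat kitty feline")
print(tokens)  # → ['cat', ' ', 'kit', 'ty', ' ', 'fel', 'ine']

# Hypothetical random 4-dimensional embeddings; in a real model these are learned.
table = {piece: [random.gauss(0, 1) for _ in range(4)] for piece in VOCAB}
weights, outputs = self_attention([table[t] for t in tokens])
for t, row in zip(tokens, weights):
    print(repr(t), [round(x, 2) for x in row])
```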
The model learns to produce accurate outputs during training by using the patterns in the datasets to adjust its parameters. Sometimes referred to as "weights", parameters are the numerical values that define the connections within the model; they are learned during training rather than set by hand.
Once trained, FMs can apply patterns learned on one task to another, a process known as transfer learning.
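The idea of learning parameters, and then reusing them on a related task, can be illustrated with a deliberately tiny model that has just two parameters (a weight and a bias). The sketch below assumes made-up linear tasks and plain gradient descent on squared error; real FMs have billions of parameters and are typically trained with more elaborate optimisers, but the principle of repeatedly nudging parameters to reduce error is the same. Starting the second task from the "pre-trained" parameters is one simple form of transfer learning (fine-tuning from a pre-trained starting point).

```python
def fit(data, weight=0.0, bias=0.0, steps=2000, lr=0.01):
    """Gradient descent on mean squared error for the toy model y = weight * x + bias."""
    n = len(data)
    for _ in range(steps):
        grad_w = sum(2 * (weight * x + bias - y) * x for x, y in data) / n
        grad_b = sum(2 * (weight * x + bias - y) for x, y in data) / n
        # Nudge each parameter in the direction that reduces the error.
        weight, bias = weight - lr * grad_w, bias - lr * grad_b
    return weight, bias


def mse(params, data):
    weight, bias = params
    return sum((weight * x + bias - y) ** 2 for x, y in data) / len(data)


# "Pre-training": plenty of data from the rule y = 2x + 1 (an assumed toy task).
task_a = [(x, 2 * x + 1) for x in range(-5, 6)]
pretrained = fit(task_a)
print([round(p, 3) for p in pretrained])  # → [2.0, 1.0]

# A related new task with only three examples and a small training budget.
task_b = [(x, 2 * x + 1.5) for x in (-1, 0, 1)]
from_scratch = fit(task_b, steps=20)
transferred = fit(task_b, *pretrained, steps=20)  # start from the learned parameters
print(f"from scratch: {mse(from_scratch, task_b):.3f}")
print(f"transferred:  {mse(transferred, task_b):.3f}")
```

With the same small training budget, the run that starts from the pre-trained parameters ends with a much lower error on the new task than the run that starts from zero, because most of what it needs (the slope of 2) was already learned.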
Fine-tuning
Fine-tuning is an optional next phase in which a pre-trained model is given additional capabilities or improvements through further training on specific datasets. The main types are:
- Alignment: fine-tuning a model to align its behaviour with the expectations or preferences of human users, for example to enable it to create music that matches certain moods. To reduce biased, false or harmful outputs, a technique known as reinforcement learning from human feedback (RLHF) is commonly used. RLHF trains the model to make decisions that maximise "rewards". The reward function relies on humans indicating which responses they prefer, to distinguish between wanted and unwanted behaviour. These preference ratings are used to build a reward model that automatically estimates how highly a human would rate any given response to a prompt. The reward model is then used to further train the (language) model, steering it towards the responses most likely to earn the greatest reward and so towards human preferences. The human feedback used in RLHF can come from paid contractors or directly from users. Alignment is also used to teach the model to ‘speak like a machine’ so as not to mislead users: examples of human-machine conversations from existing chatbots can be collated and used to fine-tune a pre-trained model to add this capability.
- Domain- or task-specific: fine-tuning a model for a specific domain or task using smaller, specialised datasets. For instance, a dataset of legal documents could improve a model's ability to draft legal documents or provide legal advice.
- Synthetic data: fine-tuning a model using artificially generated data, such as data from simulations, real data that has been artificially extended, or new datasets created by existing AI models. While developers benefit from synthetic data being cheaper to acquire at large scale than human-generated data, there is a potential risk of 'model collapse', where defects in data generated by existing FMs pollute the synthetic data and degrade the models trained on it.
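The reward-model step of RLHF can be sketched as follows. This is a simplified illustration built on assumptions: each response is reduced to a hypothetical two-number feature vector, the reward model is linear, and it is fitted with the Bradley-Terry pairwise loss, a common choice in which the probability that a human prefers one response over another is the sigmoid of the difference in their rewards. Full RLHF then uses reinforcement learning to update the language model itself; this sketch stops at a simpler use of the reward model, picking the highest-scoring response from a set of candidates (best-of-n selection).

```python
import math

# Hypothetical features for each candidate response: [helpfulness, rudeness].
# A real reward model is a neural network that reads the prompt and response text.
RESPONSES = {
    "helpful and polite": [0.9, 0.0],
    "helpful but rude": [0.8, 0.9],
    "unhelpful but polite": [0.1, 0.0],
    "unhelpful and rude": [0.1, 0.8],
}

# Made-up human feedback: (preferred response, rejected response) pairs.
PREFERENCES = [
    ("helpful and polite", "helpful but rude"),
    ("helpful and polite", "unhelpful but polite"),
    ("unhelpful but polite", "unhelpful and rude"),
    ("helpful but rude", "unhelpful and rude"),
]


def reward(params, features):
    """Linear reward: a weighted sum of the response's features."""
    return sum(p * f for p, f in zip(params, features))


def train_reward_model(pairs, epochs=500, lr=0.5):
    """Fit the reward model so preferred responses score higher than rejected ones."""
    params = [0.0, 0.0]
    for _ in range(epochs):
        for chosen, rejected in pairs:
            # For a linear model, reward(chosen) - reward(rejected) = reward(diff).
            diff = [c - r for c, r in zip(RESPONSES[chosen], RESPONSES[rejected])]
            prob_chosen = 1 / (1 + math.exp(-reward(params, diff)))
            # Gradient ascent on log(prob_chosen) raises the chosen response's reward.
            params = [p + lr * (1 - prob_chosen) * d for p, d in zip(params, diff)]
    return params


params = train_reward_model(PREFERENCES)

# Best-of-n selection: score candidate responses and keep the highest-reward one.
candidates = ["helpful but rude", "unhelpful but polite", "helpful and polite"]
best = max(candidates, key=lambda name: reward(params, RESPONSES[name]))
print(best)  # → helpful and polite
```

The learned weights end up rewarding helpfulness and penalising rudeness, purely from which response in each pair the (hypothetical) humans preferred.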
Inference
At the inference stage, a user feeds new inputs into the trained model, which uses its learned parameters to make a prediction. Inference therefore refers to the process of a model making predictions from new data, and it tests how well the model can apply what it learned during training.
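Inference can be illustrated with a toy next-word predictor. In the sketch below, which is a simplification under stated assumptions rather than how an LLM works internally, word-pair counts gathered from a made-up corpus stand in for learned parameters. At inference time those counts are fixed, and the model repeatedly predicts the most likely next word for a new prompt (greedy decoding). Real LLMs instead compute a probability for every token in their vocabulary from billions of parameters, and often sample from that distribution rather than always choosing the top token.

```python
from collections import Counter, defaultdict

# "Training": count which word follows which in a tiny, made-up corpus.
# These counts play the role of the model's parameters in this toy example.
corpus = "the dog sat on the mat . the cat sat on the mat . the cat lay on the mat ."
counts = defaultdict(Counter)
words = corpus.split()
for current, following in zip(words, words[1:]):
    counts[current][following] += 1


def generate(prompt: str, max_new_words: int = 10) -> str:
    """Inference: repeatedly predict the most likely next word (greedy decoding)."""
    output = prompt.split()
    for _ in range(max_new_words):
        candidates = counts.get(output[-1])
        if not candidates:
            break  # the model never saw this word during training, so it stops
        output.append(candidates.most_common(1)[0][0])
        if output[-1] == ".":
            break  # end of sentence
    return " ".join(output)


print(generate("the dog"))  # → the dog sat on the mat .
```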
Open vs closed source models
FM developers may choose to develop and release an FM as either open source or closed source.
An open-source FM can be shared widely and used free of charge, subject to the terms of its licence (which may, for example, prohibit commercial use). A licensee may be given the FM's code, model architecture, training data and potentially even its trained weights and biases, allowing them to replicate the training process and/or fine-tune the FM without having to go through the pre-training process themselves.
A closed-source FM is developed privately, and access to it is usually restricted and controlled by the company that developed it. Rather than being released externally, such models are likely to be used for the company’s own initiatives and operations.
Evaluation methods – testing the performance of FMs
FM developers typically evaluate their own FMs to analyse their capabilities or to identify false or inaccurate outputs. Different evaluation methods include:
- Evaluating against static datasets of input-output pairs, which assess performance against a wide range of criteria such as accuracy, multitask ability and robustness.
- Model-based evaluation, where one or more other models are used to evaluate the FM.
- Using human raters who are asked or paid to carry out model-specific evaluation tasks. This is considered the gold standard for evaluation, but it can make comparisons across models and papers more difficult because the tasks are tailored to each model and raters may adopt different evaluation methods.
- Red teaming, in which experts deliberately use misleading or adversarial questions to identify faults.
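Evaluation against a static dataset of input-output pairs can be as simple as scoring exact matches. The sketch below assumes a made-up three-question benchmark and a hypothetical lookup-table "model" in place of a real FM; real benchmarks contain many more items and often use more forgiving scoring than exact match.

```python
from typing import Callable

# A tiny static benchmark of (input, expected output) pairs; the questions are
# made up for illustration, not taken from any real evaluation dataset.
BENCHMARK = [
    ("What is 2 + 2?", "4"),
    ("What is the capital of France?", "Paris"),
    ("Spell 'cat' backwards.", "tac"),
]


def exact_match_accuracy(model: Callable[[str], str], dataset: list[tuple[str, str]]) -> float:
    """Fraction of inputs whose output matches the reference after trimming and lower-casing."""
    correct = sum(
        model(prompt).strip().lower() == expected.strip().lower()
        for prompt, expected in dataset
    )
    return correct / len(dataset)


def toy_model(prompt: str) -> str:
    """Hypothetical stand-in for an FM: answers from a fixed lookup table."""
    answers = {"What is 2 + 2?": "4", "What is the capital of France?": " paris "}
    return answers.get(prompt, "I don't know")


print(f"accuracy: {exact_match_accuracy(toy_model, BENCHMARK):.2f}")  # → accuracy: 0.67
```

Because the harness accepts any function that maps a prompt to an answer, the same fixed inputs can be used to score different models, which is what makes static datasets convenient for comparison.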
How do businesses access FMs in downstream markets?
There are already many areas where FMs are, or will likely become, incorporated into downstream markets. Downstream businesses can access FMs by:
- Creating and developing an FM in-house to support the business's needs and objectives. Whilst this option offers businesses full control over the FM, the development process is technical, costly and time-consuming, which makes it unfeasible for many businesses.
- Collaborating with an established third-party FM provider to develop an existing FM. This may allow the business to fine-tune the FM with its own data and, in turn, take ownership of the fine-tuned FM. This is a cheaper option but still requires businesses to invest money, time and expertise in the FM's development.
- Purchasing application programming interface (API) access to an FM and FM deployment tools owned by a third party. This option is often much cheaper and faster to implement than developing an FM in-house. However, businesses will generally not be able to tailor the FM to their needs and will be reliant on a third-party product.
- Offering a third-party FM plug-in to enhance services and extend functionality. For example, a business may opt to provide a plug-in that allows users to access an FM-based service such as ChatGPT. This is a very accessible option for businesses seeking to reap the benefits of incorporating an FM into their products and services without the cost, expertise and time required by the other options.
1 This summary contains public sector information licensed under the Open Government Licence v3.0, specifically the Competition & Markets Authority's AI Foundation Models Initial Report dated 18 September 2023.