Generative AI and large language models (LLMs)¶
This module is designed to give an overview of, and develop our skills in, computational tools which may aid our mathematical investigations. It would, therefore, be remiss not to discuss the relatively new kids on the block: AI (artificial intelligence) and ML (machine learning).
AI/ML have developed at an incredible pace in recent years and are now influencing our lives in countless ways, from self-driving cars to disease detection and more.
While the development of the AI algorithms themselves is a highly mathematical endeavour, they have also had a significant impact on mathematical and statistical research more generally, seeing use in 'predicting' chaos, in ecological modelling and in computational algebra.
Where you might be most interested in the use of AI, as a student navigating a mathematics degree, is as a teaching and learning aid.
IMPORTANT¶
It should be said from the outset that unauthorised use of AI when completing assessments is forbidden by the university regulations. This means that, unless otherwise explicitly stated in the assessment instructions (see tutorial 4), you should not use any AI tool such as ChatGPT (see below).
Generally, good practice is to use such tools only in a responsible way, to enhance your understanding of a topic, not as a means to get an answer to a problem. In fact, as we will see, ChatGPT often performs poorly on specific problems but can perform well at regurgitating theory.
This being said, employers will increasingly expect students to be aware of AI's abilities and shortcomings, and to be effective users of it. This section of the module is designed, in part, to address this.
What is AI?¶
Before launching into a discussion on the pros and cons of popular AI tools we should mention what AI is. This isn't a course in AI, but we can point out the basics for anybody unfamiliar with what it is.
AI is a broad and rather nebulous term covering a range of technologies where an algorithm has been "trained" to be able to perform certain tasks.
Often the underlying algorithm is a neural network, which is essentially a function (with inputs and outputs) that operates on the inputs via layers of neurons. Neurons simply perform a weighted sum of their inputs; the "learning" part comes in determining what the weights need to be for a specific set of "training data". An optimisation step sets these weights by finding the minimum of some loss function between the network output and the training data. Other elements are added into the design of the network (number of layers, activation functions, biases etc.) but this is the general idea. Once the network is trained, a "test" step follows to make sure that it is performing correctly. See here for a "worked example".
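To make this concrete, here is a minimal sketch of the idea for a single neuron with one weight and a mean-squared loss; the data, learning rate and number of steps below are illustrative choices of ours, and a real network would have many layers of such neurons with activation functions between them.

import numpy as np

# Toy "training data": inputs x with targets y = 3x, so the ideal weight is 3.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 3.0 * x

w = 0.0      # the single weight, initially untrained
lr = 0.05    # learning rate

for step in range(200):
    out = w * x                        # the "neuron": a weighted sum of the input
    grad = np.mean(2 * (out - y) * x)  # gradient of the mean-squared loss w.r.t. w
    w -= lr * grad                     # one gradient-descent (optimisation) step

print(f"learned weight: {w:.4f}")      # approximately 3, as hoped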
Generative AI is a class of AI models capable of creating completely new outputs with features similar to the training data: for instance, chatbots, which can converse with a user (see below), or text-to-image generators (DALL-E, Stable Diffusion).
What is an LLM?¶
ChatGPT, Bard and Copilot are examples of large language models (LLMs). These are AI models trained on very large data sets of text, designed to emulate human language by producing output as "natural language". The idea is to provide a user-friendly "chatbot" interface to a back catalogue of human knowledge.
The LLM creates a response based on weighted probabilities; it essentially provides you with the answer that is "most likely" to be correct or helpful. This means the LLM can give incorrect or unhelpful answers whenever these are deemed the "least worst" response available, and responses like 'I don't know the answer' are down-weighted significantly. This is a particular concern in fields like mathematics, where precision is crucial and an incorrect answer is of little use.
This means that the model lacks true 'reasoning' capabilities. While it may appear to present a logical sequence of arguments, this is still a result of the 'least worst' mindset, and the model's responses are essentially educated guesses based on the patterns observed in the training data. If the specific example provided is not present in the training data, the model's accuracy becomes uncertain.
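As a caricature of this "weighted probabilities" idea (real LLMs are vastly more complicated, and the candidate words and scores below are entirely made up for illustration):

import numpy as np

# Made-up scores the model might assign to candidate next words after
# the prompt "The derivative of sin(x) is".
words  = ["cos(x)", "sin(x)", "-cos(x)", "I don't know"]
scores = np.array([4.0, 1.5, 1.0, -3.0])   # 'I don't know' is heavily down-weighted

# Softmax turns the scores into a probability distribution.
probs = np.exp(scores) / np.sum(np.exp(scores))
for word, p in zip(words, probs):
    print(f"{word}: {p:.3f}")

# The response is sampled from this distribution, so a plausible-but-wrong
# answer is occasionally chosen over the correct one.
rng = np.random.default_rng()
print("chosen:", rng.choice(words, p=probs))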
The best use case for us is perhaps to use LLMs to summarise a "textbook" concept, rather than expecting them to complete worked examples for us. For instance, we might ask one to explain what a neural network is (try this as homework!), or to remind us of a simple mathematical definition (see below). Keep in mind that you should cross-check anything produced by a chatbot in this context, as it can simply make things up; see for instance the controversy about made-up references, or this dodgy proof. We'll take a look at some more examples below.
It is worth noting at this point in the discussion that the AI is constantly being exposed to new training data and plugged into other computational engines (e.g. Wolfram), so in some circumstances it will be able to query a symbolic computing package to perform mathematical manipulations. This, in some sense, indicates that there are usually more appropriate computational tools for many advanced mathematical applications (e.g. Mathematica, Maple, or a specific computational module/library such as GAP or LAPACK).
Examples¶
In this section we will generate some examples using ChatGPT to demonstrate where it can be helpful and where it does a bad job.
We show the examples via screenshots. These tools are constantly being re-trained, so there is no guarantee that the prompts used below will give the same responses in a few months' time (try this!). We've tried to pick examples that demonstrate the way LLMs work, rather than specific topics that always perform well or badly.
First the bad.
The bad¶
As we mentioned earlier, specific examples are generally not handled well by LLMs. Here we ask a question related to the applied part of this course: "What is the Lagrange interpolating quadratic through (0,0) and (1,2).", in other words the polynomial of degree $\leq 2$ through these two points. Note that there is no unique quadratic (the Lagrange polynomial) through two points, as that would need three pieces of data. Here's how ChatGPT handled the query:
First of all, it is aware of Lagrange interpolating polynomials (don't worry if you aren't! We'll cover these in more detail in Chapter 5) and has given us the correct formula for the linear case, so it has realised that linear is the best we can do with two points. So far so good! Unfortunately, it then goes wrong when simplifying the result down. Clearly we should find $y=2x$ as the line through $(0,0)$ and $(1,2)$!
It gets worse! You would hope it would be possible to point out to the chatbot that it has made a mistake. Here we check it has the correct points and then that the polynomial it has arrived at goes through the required points:
It has doubled down and attempted to gaslight us into accepting that $2=0$! To us as mathematicians this is quite heinous, but it is indicative of the way these LLMs work. They aren't always able to recognise logical errors and will regularly give incorrect guesses. We should say that the same mistake is made if we ask for the linear interpolant rather than the quadratic; it isn't that we have thrown it off with a bad prompt (though this can happen!).
For this reason, if you do use something like ChatGPT, make sure you have the ability to verify the result it has given you, and do not blindly accept it.
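For example, the result above takes seconds to verify with standard tools (here scipy; sympy or pen and paper would do equally well):

from scipy.interpolate import lagrange

poly = lagrange([0, 1], [0, 2])  # interpolating polynomial through (0,0) and (1,2)
print(poly)                      # 2 x, i.e. y = 2x
print(poly(0), poly(1))          # 0.0 2.0 -- it passes through both points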
A further example is given here, demonstrating the inefficiency of using an LLM for simple mathematics. Suppose we are posed with finding a quadratic with a root at $x=1$ and a turning point (of any type) at $x=2.$ We should quickly realise that, by symmetry, the other root of the quadratic is at $x=3,$ and therefore the quadratics are $\pm(x-1)(x-3)$ (up to an overall scaling; you can check the turning point at $x=2$ for yourself if you like). Hopefully you agree that, from a mathematical reasoning point of view, this was a quick problem to solve once you spot the right approach. Take a look at what ChatGPT did:
Here ChatGPT has really mangled the methods: it recognises that $(x-1)$ should be a factor, but then tries to combine this with another method altogether and ends up with a cubic! Again the reasoning has let it down; moreover, unless it has seen the convenient "trick" before, it will always attempt something more convoluted. Think of it this way: would you use your calculator to compute $1+1$? No, because it's not efficient! Part of developing good mathematical skills is using them efficiently to build up more advanced results and concepts. If we blindly use technology to prop up our skills we will not be able to make progress.
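That said, verification is again cheap: a few lines of sympy confirm the by-symmetry answer from above.

import sympy as sp

x = sp.symbols('x')
q = (x - 1) * (x - 3)              # candidate quadratic (up to overall scaling)

print(sp.solve(q, x))              # roots: [1, 3] -- includes x = 1 as required
print(sp.solve(sp.diff(q, x), x))  # stationary point: [2] -- the turning point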
The not bad¶
A common use case for AI is actually producing code. Some platforms embed a kind of auto-complete system to help, but ChatGPT can produce Python code outright. In the same way as we have already discussed, we should be prepared to critically evaluate the code, as we would any other AI output.
Here is a reasonable example where we asked for some code to evaluate the trapezium rule for numerical integration:
def trapezoidal_rule_integral(f, a, b, n):
    """
    Compute the definite integral of a function using the trapezoidal rule.
    Parameters:
    - f: The function to be integrated.
    - a: The lower limit of integration.
    - b: The upper limit of integration.
    - n: The number of subintervals.
    Returns:
    The approximate value of the definite integral.
    """
    h = (b - a) / n
    result = 0.5 * (f(a) + f(b))
    for i in range(1, n):
        result += f(a + i * h)
    result *= h
    return result

# Example usage: Compute the indefinite integral of x^2 from 0 to 1
def function_to_integrate(x):
    return x**2

lower_limit = 0
upper_limit = 1
num_subintervals = 1000

result = trapezoidal_rule_integral(function_to_integrate, lower_limit, upper_limit, num_subintervals)
print(f"The approximate integral is: {result}")
The approximate integral is: 0.33333349999999995
The code looks reasonable and obeys a number of the good practice principles that we have discussed earlier. There is even a validation example provided linking back to our discussion on computational thinking.
However, we know from our good-practice discussion that Python can perform numerical operations like these much more quickly via vectorisation with numpy arrays. Adding "using numpy arrays" to the prompt yields the following code:
import numpy as np

def trapezoidal_rule_integral(f, a, b, n):
    """
    Compute the definite integral of a function using the trapezoidal rule.
    Parameters:
    - f: The function to be integrated.
    - a: The lower limit of integration.
    - b: The upper limit of integration.
    - n: The number of subintervals.
    Returns:
    The approximate value of the definite integral.
    """
    x_values = np.linspace(a, b, n + 1)
    h = (b - a) / n
    result = 0.5 * (f(a) + f(b)) + np.sum(f(x_values[1:-1]))
    result *= h
    return result

# Example usage: Compute the indefinite integral of x^2 from 0 to 1 using NumPy arrays
def function_to_integrate(x):
    return x**2

lower_limit = 0
upper_limit = 1
num_subintervals = 1000

result = trapezoidal_rule_integral(function_to_integrate, lower_limit, upper_limit, num_subintervals)
print(f"The approximate integral is: {result}")
The approximate integral is: 0.33333349999999995
We get the same result, but now much more efficiently. The key here is knowing the best prompt to give ChatGPT; there is significant danger in using it naively and expecting good-quality code.
We should also give a warning about using LLMs for debugging. We have discussed that LLMs cannot truly reason, and so they cannot be relied upon to find logical mistakes in code.
Here we give the debugging example from the computational thinking notebook:
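(The exchange is shown in the screenshot; for readers of these notes, the buggy code was along the following lines. This is a reconstruction rather than a verbatim copy: a midpoint rule in which the factor of 0.5 in the midpoints has been deleted, leaving a redundant pair of parentheses and no numpy import.)

# Reconstruction of the kind of buggy code given to ChatGPT (not verbatim):
def midpoint_rule_integral(f, a, b, n):
    x = np.linspace(a, b, n + 1)   # bug 1: numpy is used but never imported
    midpoints = (x[:-1] + x[1:])   # bug 2: the factor of 0.5 has been removed,
                                   # leaving a redundant pair of parentheses
    return np.sum(f(midpoints)) * (b - a) / n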
The response is surprisingly bad: there are too many parentheses (because we just removed 0.5*) and it assumes this must be the bug. Given it could give us the trapezium rule (above), it should have no trouble with the midpoint rule, but its educated guesses aren't good enough. It did, at least, call us out for not using import numpy as np.
We can persist and try to resolve its concerns:
Now it assumes the issue is that we're not providing a numpy-compatible (vectorised) function, and so gives us the loop version. The factor of $1/2$ is now present, but it hasn't really debugged our code; we've just been given an alternative (sub-optimal) version.
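For the record, restoring the missing factor (and the import) is all the debugging that was actually required; a minimal vectorised fix looks something like:

import numpy as np

def midpoint_rule_integral(f, a, b, n):
    x = np.linspace(a, b, n + 1)
    midpoints = 0.5 * (x[:-1] + x[1:])   # the factor of 1/2 restored
    return np.sum(f(midpoints)) * (b - a) / n

print(midpoint_rule_integral(lambda x: x**2, 0, 1, 1000))  # approximately 1/3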
The good¶
Arguably the best use of an LLM is to revise/refresh simple bookwork results, where the chatbot will just parrot a standard definition widely found in multiple sources. In this sense we use it like any other reference material: an internet search, a textbook, or the lecture notes for your modules. You may even notice that the wording of a standard definition from an LLM closely follows some prominent source like Wikipedia, because this is in the training data. Importantly for your studies, you should always consider the lecture notes or course materials the definitive source; think of these as your training data!
Here are a few examples of potentially helpful interactions with ChatGPT:
The first two examples are quite standard. The third one, on Fourier series, is textbook stuff too; however, when you learn Fourier series as a mathematician you will learn how the coefficients are obtained by these formulas and which properties of the Fourier basis functions make this work. This is really very valuable for a good understanding of Fourier series, and the ChatGPT definition is missing it. One could give further prompts to get this information, but you need to know what is missing, so just be wary.
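For reference, the missing ingredient is orthogonality. For a $2\pi$-periodic function $f$, the coefficients in $f(x) \sim \frac{a_0}{2} + \sum_{n\geq 1}\left(a_n \cos nx + b_n \sin nx\right)$ are

$$a_n = \frac{1}{\pi}\int_{-\pi}^{\pi} f(x)\cos(nx)\,dx, \qquad b_n = \frac{1}{\pi}\int_{-\pi}^{\pi} f(x)\sin(nx)\,dx,$$

and these formulas drop out precisely because the basis functions are orthogonal, e.g. $\int_{-\pi}^{\pi}\cos(mx)\cos(nx)\,dx = \pi\delta_{mn}$ for $m,n\geq 1$. A definition that quotes the formulas without this property has skipped the mathematics.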
Report writing
If you read about LLMs more widely, you may notice that they are often used for language correction, proof-reading, or to improve/edit text generally. For mathematicians this can be useful when we come to write journal articles or reports; however, as we mentioned at the beginning, the university forbids unauthorised use, and language correction should only be used where authorised and acknowledged. The same reasoning applies here as to the mathematical use of AI: if we are unable to write effectively ourselves, how do we know the AI is performing well?