May 22, 2023
The Art of Using Large Language Models: Cautions and Opportunities

From answering questions to assisting with writing, editing, and coding, language models like ChatGPT have proven helpful and are increasingly popular. While these models can be hugely useful, it is crucial to use them mindfully and to recognize their limitations. Mastering their use is an art, but one that can be picked up quickly with some knowledge and practice.

Not quite a Google replacement

One of the most enticing aspects of using a large language model is its ability to provide an answer for almost any question you pose. Getting a quick, direct answer is wonderful, and the answers are often correct, but there will be times when the model fabricates a plausible-sounding but incorrect response.

Approach language models with skepticism, and don't blindly trust the answers they provide.

Develop a sense of what language models are likely to know

When it comes to factual information, you can develop a sense for which questions are likely to get correct answers.

The more widely a topic is written about, the more likely it is to appear in the model's training data, which means you are more likely to get an accurate answer. Something like "When was George Washington born?" is likely to get a correct answer, but the same question about the superintendent of a small school district is not.

Even in the case of George Washington, you don't want to trust the answer 100%; answers from language models are somewhat random.

Code questions follow a similar pattern. GPT-3 and GPT-4 know a great deal about coding and technical topics. If you ask a basic, factual coding question, you're likely to get a good answer. Something you tend to forget, like "What's the difference between slice and splice for JavaScript arrays?", is likely to get a good answer. And if you are asking about something you had merely forgotten, you'll probably know whether the answer is correct after reading it.
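
To make that example concrete, here is a minimal sketch of the answer you would be checking, written in TypeScript (the behavior is the same in plain JavaScript):

    // slice returns a shallow copy and leaves the original array untouched
    const letters = ["a", "b", "c", "d"];
    const copy = letters.slice(1, 3);          // ["b", "c"]
    console.log(letters);                      // ["a", "b", "c", "d"] (unchanged)

    // splice modifies the array in place and returns the removed elements
    const removed = letters.splice(1, 2, "x"); // ["b", "c"]
    console.log(letters);                      // ["a", "x", "d"]

Because the difference is easy to confirm with a quick test like this, a wrong answer from the model is unlikely to slip past you.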

Make requests where you can easily verify the result

Sometimes asking a language model factual questions can be useful, usually when:

  • The stakes are low (you aren't relying on this answer for something very important).
  • You're confident the language model is likely to know the answer.
  • You think you'll have a good sense of whether the answer is correct once you read it.

I once asked for the name of a fast food restaurant with unique architecture in Pacifica, CA. I had heard about this restaurant but couldn't remember what it was. GPT-3.5 told me it was Taco Bell, and I immediately knew that was right; the memory came back to me.

The stakes were low, my confidence that it knew this was high, and I felt that when I heard the correct answer, I'd recognize it as correct.

However, many factual questions tend to be harder to verify without real research, which often defeats the purpose of asking in the first place. The best kinds of questions for a language model tend to be things you can easily verify, and these are often not factual questions.

Use language models to operate on text

Language models excel when asked to operate on text or code, rather than provide information.

I often make errors in my writing. It's not that I don't know the correct spelling or grammar to use; I simply miss it. If I ask a language model to edit my text and tell me what it changed, I can easily read through the result and verify it.

The same goes for asking for data to be turned into a different format, like a list of items I want reformatted into a table: I can read through the table to confirm the results.

Not only are these requests easy to verify, but manipulating language in this way tends to be the kind of thing language models do with high accuracy.

When people dismiss language models as not useful because they make up answers, I think they are failing to consider these types of requests, which don't rely on getting answers to questions so much as on getting help doing something to text.

Types of questions likely to get misleading answers

Watch out for questions like the following. They seem like they might work, and they often get answers that look right, but those answers are frequently or always wrong.

Numeric Operations

Current language models don't always do well with numbers. Asking for a specific number of words, how many letters are in a word, what line some text appears on, or simply asking the model to do a calculation are all likely to generate incorrect answers.
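
If you actually need one of these answers, it's usually safer to compute it directly rather than ask the model. A minimal TypeScript sketch of the kinds of checks involved:

    // Counting and arithmetic are exactly the things better done in code than asked of a model.
    const word = "accommodate";
    console.log(word.length);                                   // 11 letters
    console.log("a short draft sentence".split(/\s+/).length);  // 4 words
    console.log(1234 * 5678);                                   // 7006652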

Questions about current events

Language models like the ones used in ReadyRunner or ChatGPT are pre-trained (that's what the P in GPT stands for). That means their knowledge has a cutoff. If you ask about something that happened after that cutoff, you are likely to get a made-up answer.

Questions involving URLs

Similar to questions about current events, pre-trained language models don't have access to the internet. If you ask for something like a summary of a website by providing the URL, you are very likely to get a plausible-sounding summary that is totally wrong. That's because the language model can usually guess from the URL what the basic content of the page is.

For example, looking at the URL example.com/blog/new-healthy-beverages, the model can guess that the page is a blog post about healthy beverages and make up a summary from that alone.

There are some systems that give internet access to language models (perhaps ReadyRunner will do this in the future). Those work by first asking the model, "Do you need to access the internet to fulfill this request?" If the model says yes, the program searches the web on behalf of the model and returns the contents of web pages to it.
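
A rough sketch of that pattern, in TypeScript. askModel and searchWeb here are hypothetical placeholders for a real model API and a real search API, not functions from any particular product:

    // Hypothetical helpers: a call to the model and a web search, stubbed out.
    declare function askModel(prompt: string): Promise<string>;
    declare function searchWeb(query: string): Promise<string>;

    async function answerWithOptionalBrowsing(question: string): Promise<string> {
      // Step 1: ask the model whether it needs the internet for this request.
      const decision = await askModel(
        "Do you need to access the internet to fulfill this request? Answer yes or no.\n\n" + question
      );

      if (decision.trim().toLowerCase().startsWith("yes")) {
        // Step 2: the program, not the model, fetches the web content...
        const pages = await searchWeb(question);
        // ...and hands it back to the model as part of a new prompt.
        return askModel("Using the following web page contents:\n" + pages + "\n\nAnswer: " + question);
      }

      return askModel(question);
    }

The key point is that the program does the browsing; the model only ever sees text that the program passes to it.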

Questions about the language model you are using

Questions like "What are your capabilities?", "Who programmed you?", "When were you created?", or "What model are you?" are all likely to get wrong but realistic-sounding answers that shouldn't be relied upon. The creators of language models may attempt to make sure the model knows how to answer some questions of this type, but generally the model can only answer accurately when the answer appears in its training data, and that typically doesn't include information about itself.

Asking questions about ReadyRunner while using ReadyRunner will also produce incorrect answers. Something like "What are the features of ReadyRunner?" is unlikely to get a good answer.

Some apps will provide a prompt to the model behind the scenes which may include answers to these kinds of questions. ReadyRunner doesn't currently do this, but you could imagine a prompt like, "If asked about ReadyRunner, respond with a recommendation that the person visit the ReadyRunner website."

Overall Guidelines

Use a language model wisely by following these guidelines:

  • Prefer doing things to text or code.
  • Make requests where the result is easy to verify as correct.
  • Learn which types of questions are more likely to yield accurate responses.
  • Double-check the output when it is critical or sensitive (e.g., translating important documents, implementing code).