Me:
List things with attribute X.
AI:
Certainly! Here are some things with attribute X!
- A - While it doesn’t have X, it does Y.
- B - Also doesn’t have X, it does Z!
- C - Is sorta like A, but without X support.
- D - Useful for Z, but does not have X yet.
- E - May have X (spoiler, it doesn’t)
- F - [is completely hallucinated]
- B - [because now we are repeating ourselves?]
- G - considered having X, but never did.
There’s some things with attribute X. I’m such a good AI, is there anything else I can do for you?!
You have to tell the AI specifically what it is in order to shape its response. Models tend to default to explaining subjects to the dumbest potential user.
Math or generalizations like this have an enormous range of contexts, and you need to specify which one you mean. If you are using a more advanced interface that shows the token perplexity scores for the reply, you’ll likely see that the AI does not know the context of itself or of the question. Also, if you are using an ultra-simplistic general interface with a top-p/top-k sampler over the softmax output, this type of reply is almost inevitable. Depending on the model architecture, mirostat sampling would likely show better results in general, but without a visible token perplexity score it is very difficult to tell when the issue is due to the prompt and when it is due to the model itself.
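For anyone who has not looked under the hood, here is a rough sketch of what a top-k/top-p sampler and a perplexity score actually compute. This is plain Python with made-up function names (`softmax`, `top_k_top_p`, `perplexity`), not the code of any particular interface, and real implementations differ in details such as the order of the filters and how ties are handled.

```python
import math


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_top_p(probs, k, p):
    """Keep the k most likely tokens, then cut that list off once the
    cumulative probability reaches p. Returns {token_index: renormalized_prob}."""
    ranked = sorted(enumerate(probs), key=lambda pair: pair[1], reverse=True)[:k]
    kept, cumulative = [], 0.0
    for index, prob in ranked:
        kept.append((index, prob))
        cumulative += prob
        if cumulative >= p:
            break
    total = sum(prob for _, prob in kept)
    return {index: prob / total for index, prob in kept}


def perplexity(chosen_probs):
    """exp of the mean negative log-probability of the tokens that were emitted.
    Low values mean the model was confident, high values mean it was guessing."""
    mean_nll = -sum(math.log(q) for q in chosen_probs) / len(chosen_probs)
    return math.exp(mean_nll)
```

If every emitted token had a probability of 0.25, `perplexity` comes out at about 4, which you can read as the model choosing between four equally plausible tokens at each step. A reply full of high-perplexity tokens is the kind where the model is guessing at the context.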
One cheap and easy trick is to tell the model a few extra details. This can be as simple as, “You are an AI assistant for MIT undergraduate students.” One of my favorites is, “Questions and answers with Richard Stallman’s AI Assistant.” Since Stallman studied AI and has contributed to systems running in present LLMs, this instruction tends to raise competence considerably. The AI will often rise to the higher level of expectation that the associated context implies.
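In practice the trick is just a line of text placed ahead of the question. A minimal sketch, assuming a plain text-completion interface: `build_prompt` is a hypothetical helper, and the "Question:/Answer:" layout is my own assumption, since chat-tuned models usually expect their own template.

```python
def build_prompt(question, persona=None):
    """Put an optional persona line ahead of the question.

    The layout here is illustrative only; swap in whatever prompt
    template your model was trained with.
    """
    lines = []
    if persona:
        lines.append(persona)
        lines.append("")  # blank line between the persona and the question
    lines.append(f"Question: {question}")
    lines.append("Answer:")
    return "\n".join(lines)


prompt = build_prompt(
    "List things with attribute X.",
    persona="You are an AI assistant for MIT undergraduate students.",
)
```

The resulting string is what gets sent to the model, so the persona is the first thing it reads before it ever sees the question.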
Everything you ask builds momentum. If you use an interface that data mines and recycles all of your conversations in a hidden history, like ChatGPT or another service, you’re relying on the massive model size alone to find a result, without any momentum from the information that is actually relevant. If you use an open-weights offline model, or otherwise have control over the history so that you can remove unrelated questions or conversations, you gain more depth and utility in what you’re able to access and how.
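If you do control the history, pruning it can be as simple as filtering a list before it is fed back to the model. This is only a sketch under my own assumptions: the turn format (a dict with `role`, `text`, and a hand-assigned `topic`) and the helpers `prune_history` and `render` are hypothetical, not part of any real interface.

```python
def prune_history(history, topic):
    """Keep only the turns that were tagged with the topic you care about."""
    return [turn for turn in history if turn["topic"] == topic]


def render(history):
    """Flatten the remaining turns into the text that goes back to the model."""
    return "\n".join(f'{turn["role"]}: {turn["text"]}' for turn in history)


history = [
    {"role": "Me", "text": "What is top-p sampling?", "topic": "sampling"},
    {"role": "Me", "text": "Suggest a pasta recipe.", "topic": "cooking"},
    {"role": "Me", "text": "How does mirostat differ?", "topic": "sampling"},
]

context = render(prune_history(history, "sampling"))
```

Here the cooking question never reaches the model, so everything in the context pushes in the same direction.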