April 6, 2023

So, before we get to the topic of the day, we first need to address the chimera in the room.

That last entry. Halfway through a wall of text, a wacky tennis RPF fugue just morphed into a raving political manifesto, and then descended into a Hieronymus Bosch (or Pieter Bruegel the Elder) painting of everybody nuking everybody (chuckles).

Fear not, I’m not a budding nuclear terrorist, I promise. That was just a possible future timeline. (Is that supposed to be reassuring?) You probably did land yourself on the FBI and/or CIA's watch list by reading it, though.

Anyway, back to business.

I really busted my mouse hand—my thumb and my wrist, to be specific—with all the clicking and scrolling I had to do, both to write the stuff that will lay the groundwork for the fic, and to research the players and other tennis facts. The fact that I am new to following tennis and have so much to learn makes it worse. It was getting difficult to even hold a pen, let alone the mouse, so I had to take a break from working on the fic for the past two days. Not yet sure whether I will have to take a few more days off.

So instead, I used that free time on something else that required less clicking and scrolling, yet is equally important to my project: I read articles and Twitter discussions about the current state of work on enhancing ChatGPT's context window and long-term memory capabilities, as well as potential future developments in this field, and then typed up this journal entry.

The concepts of ChatGPT’s context window and (nonexistent) long-term memory capabilities are at the core of my experiment. They are inextricably intertwined with the development (or delay) of my crack fic. I suppose I should have talked more at length about this topic from the beginning, since without this context, my wild notes would probably make far less sense to someone unfamiliar with ChatGPT than they do in my head.

If I did, you would already have a better understanding of why I am busy wasting my time jotting down delirious lab journal entries and building dumpster fires instead of cranking out canned fics with my little robot friend. Oh well. Better late than never?

When I first began playing around with ChatGPT, I didn’t know anything about this specific model or Generative Large Language Models (GLLMs) in general. I just plunged headlong into the deep end, intrigued by the Natural Language Processing (NLP) and generative aspect of this model.

As ChatGPT can hallucinate situations and dream up vivid details to fill in descriptions when prompted to do so, I thought it was perfect for creating the tennis RPF work I want to read. What better use can there be for an NLP GLLM than to create a fanfic with the tags Alternate Universe - Canon Divergence, Butterfly effect, and Slow Build?

My ideal fic would be one that artfully combines GRRM’s style of writing in A Song of Ice and Fire (ASOIAF), the ASOIAF mood, enough details and quirks (real and imaginary) of the players, past events, and real world politics. I thought I could just gorge ChatGPT with carefully curated details on people and events that inform the very specific perceptions of the characters and the world they inhabit that I have in my head.

Ideally, the more information I feed into an NLP GLLM, the better it will be able to generate characters and events that closely resemble the hazy yet decidedly particular and palpable visions I have. Once the AI is trained with enough information and prompts, it will be able to generate the kind of vivid, quality story I would want to read, instead of some cold, clinical, and brain-dead template. It would be like historical fiction, like The Crown, or Wolf Hall, but on steroids, on-demand, and custom-made.

The devil’s in the details, and I’ve just unlocked Pandora’s box, or uncorked King Solomon’s genie bottle, I thought.

Alas, ChatGPT is, in its current state, far from the always remembering, always context-aware magic box I imagined.

At present, ChatGPT is not able to learn from individual conversations with users. While ChatGPT is able to remember what the user has said earlier in the conversation, there is a limit to how much information it can retain. From a human's perspective, the chat tab is still open, and all the conversation history is right there; it couldn't possibly be easier to recall what was said in the past! However, that's not how ChatGPT sees things. There is a limit to how far from the current conversation the model is able to reference. Any information beyond that is not stored, and therefore not factored into the generation of a response.

This maximum recall length is called the context window. There are multiple purported versions of the limit of ChatGPT’s context window going around. I've seen people say it’s 1000-1500 words (or 2048 tokens), 3000 words (or 4000 tokens), 25,000 words, and 64,000 words. What’s worse, some of them might be referring to the GPT-3.5 model, and some GPT-4, and they do not specify which. So I’m just confused now and have probably made you just as confused.
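For what it's worth, some of those numbers can at least be sanity-checked against each other. Here is a back-of-the-envelope sketch in Python. It assumes the commonly quoted rule of thumb that one token is roughly 0.75 English words; that ratio is an approximation, not an exact spec, and the function names are made up for illustration.

```python
# Assumption: 1 token is roughly 0.75 English words (a rule of thumb only).
WORDS_PER_TOKEN = 0.75


def tokens_to_words(tokens: int) -> int:
    """Estimate how many English words fit in a given number of tokens."""
    return round(tokens * WORDS_PER_TOKEN)


def words_to_tokens(words: int) -> int:
    """Estimate how many tokens a given number of English words takes up."""
    return round(words / WORDS_PER_TOKEN)


# The token figures quoted above, converted to words.
for tokens in (2048, 4000):
    print(f"{tokens} tokens is about {tokens_to_words(tokens)} words")

# The word figures quoted above, converted to tokens.
for words in (25_000, 64_000):
    print(f"{words} words is about {words_to_tokens(words)} tokens")
```

Under that assumption, 2048 tokens works out to roughly 1500 words and 4000 tokens to roughly 3000 words, so those two claims are at least internally consistent. It does not settle which model each figure refers to.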

ChatGPT only holds the conversation history until the context window is full. This is one cause of the 'hallucinations' that crop up when you chat with it for too long: it "forgets" the earlier conversation as it falls outside the context window. Suddenly the conversation makes no sense, because key information is gone.
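To make the "forgetting" concrete, here is a toy sketch in Python of a sliding context window. It is only an illustration under loud assumptions: how ChatGPT actually trims its history is not something I know, the word-counting `count_tokens` is a crude stand-in for a real tokenizer, and `visible_history` is a hypothetical helper, not part of any real API.

```python
def count_tokens(text: str) -> int:
    # Crude stand-in for a real tokenizer: one word counts as one token.
    return len(text.split())


def visible_history(messages: list[str], window: int) -> list[str]:
    """Return the most recent messages that fit within `window` tokens."""
    kept: list[str] = []
    used = 0
    for message in reversed(messages):  # walk from newest to oldest
        cost = count_tokens(message)
        if used + cost > window:
            break  # this message and everything older falls outside the window
        kept.append(message)
        used += cost
    return list(reversed(kept))  # restore chronological order


chat = [
    "We are writing a tennis fanfic",            # 6 "tokens"
    "Here is a tabloid article about a player",  # 8 "tokens"
    "Here is another article I just sent",       # 7 "tokens"
]

# With a 16-token window, only the two newest messages fit (8 + 7 = 15).
for message in visible_history(chat, window=16):
    print(message)
```

In this toy run the very first message, the one saying we are writing a fanfic, is the one that drops out. All the model would "see" is a pile of articles with no instruction about what they are for.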

As I said, I didn’t know anything about ChatGPT’s specs going in, which means I found out about the context window the hard way. I thought each session could be trained and personalized to fit the needs of the user through conversation.

I thought I was doing constructive work: naming a session FanficGPT; teaching it to employ the technique 'show, don't tell'; to avoid 'thought verbs' as much as possible; to avoid writing any 'in the next episode' type of preview or synopsis; to keep characters and plot points consistent from chapter to chapter; to give each character a unique voice; to ensure viewpoint characters are interconnected and impactful; to limit POVs to characters whom we can root for or sympathize/empathize with, or at the very least, characters who have some redeeming quality; to reveal information through characters; to avoid writing from the POV of someone who would give away a plot twist; to stay in a third person limited POV; to avoid a third person omniscient POV as much as possible unless appropriate for maximum poetic or dramatic effect, like in the last paragraph of the chapter Daenerys X in A Game of Thrones, ...

(Exempli gratia:

As they went their separate ways, Iga couldn't help but feel a sense of anticipation for the future. She knew their friendship was just beginning, and she was eager to see where it would take them. Little by little, their connection was growing stronger, but neither of them could predict the twists and turns that lay ahead.

versus

As Daenerys Targaryen rose to her feet, her black hissed, pale smoke venting from its mouth and nostrils. The other two pulled away from her breasts and added their voices to the call, translucent wings unfolding and stirring the air, and for the first time in hundreds of years, the night came alive with the music of dragons.)

The horrible realization came when I was getting FanficGPT to read a dizzying array of tabloid news articles on players, and painstakingly teaching it to understand how the articles inspired my headcanons for my characters: it had forgotten we were working on a fanfic and, without skipping a beat, went on to pretend that the task at hand had always been to summarize the article I had just sent.

(Insert song: Accept – Metal Heart)

To say the name “context window” is an understatement would be an understatement. It felt more akin to the event horizon of a black hole. Individual sessions can't be permanently personalized and won't enduringly learn from conversations with users. So, there was no point in me naming the session "FanficGPT", since it was never a permanently individualizable instance of a chatbot worthy of a name to begin with. ChatGPT is in some ways like "H.M.", the tragic but legendary patient whose hippocampus damage, the result of brain surgery, prevented him from forming new long-term memories from that day on.

Oww. I had thought writing this entry would be easier on my mouse hand than researching the players and continuing to write that crazy summary. Nope. Will most likely need to take today off as well, yes. Ow.

Long story short, the whole shebang of ChatGPT’s context window and its lack of a long-term memory is why I am bogged down writing outlines and summaries and reading technical articles for dummies on vector embeddings and Twitter discussions on ChatGPT prompt compression techniques, instead of churning out one chapter per hour or something (that won’t be happening anytime soon).

I originally planned to write about the current and future workarounds/hacks/Band-Aids that can give ChatGPT something like long-term memory and thus, in effect, an expanded context window, such as the ChatGPT Retrieval Plugin and a Weaviate plugin that is apparently in the works and looks like a more end-user-friendly version based on the former. I also wanted to get into how solutions along those lines might move ChatGPT a teeny-tiny bit in the direction of becoming the genie in a bottle I had imagined. But my hand is so sore and I just can’t anymore. So I will save that for a later date. Toodle-oo.
