Geek Alert

A few thoughts on Sora

February 16, 2024 Film Industry, Geek Alert, WGA

Yesterday, OpenAI announced Sora, a new product that generates realistic video from text prompts.¹ The examples are remarkable.

A TV writer friend texted me to ask “is it time to be petrified?”

I wrote back:

I don’t think you need to be petrified. It’s very impressive at creating video in a way that’s like how Dall-E does images. A huge achievement. For pre-viz? Mood reels? Incredible. We’ll see stuff coming out of it used in commercials first.

For longer, narrative stuff, there’s a real challenge moving from text generation (gpt-4 putting together something that looks like a script) to “filming” that script with these tools to resemble anything like our movies and television.

Writers, directors, actors and crew have a sense of why they’re doing what they’re doing, and what makes sense in this fictitious reality they’re creating. I don’t think you can do that without consciousness, without self-awareness, and if/when AI gets there, stuff like Sora will be the least of our concerns.

With a night to sleep on it, I think there are a few larger, more immediate concerns. Writers (and humans in general) should be aware of but not petrified by some of the implications of this technology beyond the obvious ones like deepfakes and disinformation.

Video as input. Like image generators, this technology can work off of a text prompt. But you can also feed it video and have it change things. Do you want A Few Good Men, but with Muppets? Done. Need to replace Kevin Spacey in a movie? No need to reshoot anything. Just let Sora do it.
Remake vs. refresh. Similarly, any existing film or television episode could be “redone” with this technology. In some cases, that could mean a restoration or visual effects refresh, like George Lucas did with Star Wars. Or it could be what we’d consider a remake, where the original writer gets paid. What’s the difference between a refresh and a remake, and who decides?
Animation vs. live action. How do we define the video material that comes out of Sora? It can look like live action, but wasn’t filmed with cameras. It can look like animation, but it didn’t come out of an animation process. This matters because while the WGA represents writers of both live action and animation, studios are not currently required to use WGA writers in animation. We can’t let this technology be used as an end-run around WGA (and other guild) jurisdiction.
Reality engines. In a second paper, OpenAI notes that Sora could point to “general purpose simulators of the physical world.” The implications go far beyond any disruptive effects on Hollywood, and are worth a closer look.

It seems like a long way to go from videos of cute paper craft turtles to The Matrix, but it’s worth taking the progress they’ve made here seriously. In generating video, Sora does a few things that are really difficult, and resemble human developmental milestones.

Like all models, Sora is predictive, making guesses about what just happened and what happens next. But it feels different because it’s doing this in a 3D space that largely tracks with our lived experience. It remembers objects, even if they’re not on screen at the moment, and recognizes interactions between objects, such as paintbrushes leaving marks on the canvas.²

Sora makes mistakes, but the results are surprisingly good for a system that wasn’t explicitly trained to do anything other than generate video. Those capabilities could be used to do other things. In a jargon-heavy paragraph, OpenAI notes:

Sora is also able to simulate artificial processes — one example is video games. Sora can simultaneously control the player in Minecraft with a basic policy while also rendering the world and its dynamics in high fidelity. These capabilities can be elicited zero-shot by prompting Sora with captions mentioning “Minecraft.”

Sora “gets” Minecraft because it’s ingested countless hours of Minecraft videos. If it’s able to create a simulation of the game that is indistinguishable from the original, is there really a difference? If it’s able to create a convincing simulation of reality based on the endless video it scrapes, what are the implications for “our” reality?

These are questions for philosophers, sure, but we’re all going to be faced with them sooner than we’d like. Sora and its descendants are going to have an impact beyond the cool video they generate.

Sora is a great name, btw. It doesn’t mean anything, and doesn’t have any specific connotation, yet feels like something that should exist. ↩
Not to dive too deeply into theories of human consciousness, but the ability to internally model reality and predict things feels like table stakes. ↩

How ChatGPT resets the levels

December 22, 2022 Follow Up, Geek Alert, Hive Mind, Meta

On a recent Scriptnotes, I showed off ChatGPT to Craig and Scriptnotes guest Rian Johnson. We asked it to write up a plot summary for a future Knives Out movie set in space.

We got an answer similar to this query I just ran:¹

In the far-flung future, wealthy space entrepreneur Harrison Thorne is found dead in his luxurious space station, the victim of an apparent suicide. However, Detective Benoit Blanc, a renowned investigator known for solving even the most complex cases, has a hunch that foul play is at work. As he begins to investigate, he finds that every member of Thorne’s eccentric and contentious family has a motive for murder.

As the detective delves deeper into the mystery, he must navigate the tangled web of lies and deceit spun by the suspects, all while trying to stay one step ahead of the cunning killer. With the help of Thorne’s loyal assistant Marta, Blanc must unravel the truth before the killer strikes again.

As the body count rises and the tension mounts, it becomes clear that no one is above suspicion in this high-stakes game of space-age crime and deception. Will Detective Blanc be able to solve the case before it’s too late, or will the killer get away with murder in the cold vacuum of space?

None of us were blown away by the plotting. It felt like it was copy-pasting ideas from the first movie, with a lot of generic pitch filler. Still, we all agreed that the sentence-by-sentence writing was surprisingly good. It didn’t feel like it was written by a bot.

When we asked it to write a scene, it was functional but weightless. Here’s one based on the summary above:²

INT. SPACE STATION – DAY

Detective BENOIT BLANC, a middle-aged man with a southern drawl, walks through the sterile corridors of the space station, his face etched with determination. He approaches a door labeled “Harrison Thorne – CEO.”

BLANC

(to himself)

Time to get to work.

He opens the door and enters the office, which is spacious and well-appointed. A young woman, MARTA, sits at a desk, typing away on a computer. She looks up as Blanc enters.

MARTA

(stands up)

Can I help you?

BLANC

(flashes badge)

Detective Benoit Blanc, ma’am. I’m here to investigate Mr. Thorne’s death.

MARTA

(nods)

I see. Well, Mr. Thorne was a good man. I can’t believe he’s gone.

BLANC

(nods sympathetically)

I understand. Is there anything you can tell me about his final days, or any unusual circumstances leading up to his death?

MARTA

(hesitates)

I’m not sure. I mean, he was under a lot of stress with the business, but he always seemed to handle it well.

BLANC

(nods)

I see. Well, thank you for your time. I may have more questions for you later.

MARTA

(nods)

Of course. I’ll do whatever I can to help.

BLANC

(smiles)

Thank you, ma’am. I’ll be in touch.

Blanc exits the office, deep in thought. The investigation has only just begun, and he has a feeling that it won’t be an easy one.

To be generous, it’s not nonsense. Unlike earlier attempts at algorithmically-generated screenplays, you can follow what happens, and characters do feel like they’re talking to each other. But there’s no nuance or character motivation despite the endless parentheticals.

It’s not a scene, but a simulacrum of one.

Listener JR, who makes his living using tools like these, wrote in with an interesting perspective:

Language models are built on “training data,” which is the text you feed into a learning process to produce the output. For very sophisticated models, the training data is vast: for something like ChatGPT, it includes something like all the text you can scrape off of the last twenty years of the Internet, or so.

But this means ChatGPT is about as smart as the average writer on the Internet has been over the past twenty years — and indeed, the models that comprise GPT drag the results toward the average, not the extraordinary, because the average has much nicer statistical properties than the extraordinary for companies that seek to produce a marketable, scalable product from their models, which requires the ability to tweak, diagnose, and defend what you’re selling.

ChatGPT isn’t optimized for breathtaking originality because that’s not its goal. It’s meant to be conversational and approachable, not wild and daring. It’s giving you hamburgers rather than Japanese wagyu sando with white truffle shavings because 90% of the time, that’s what you want.

JR continues:

Ultimately what these models mean is that with the click of a button you can now be just as good as the average writer who posts content to the Internet, and so the old “average” is now the new “zero.” If you wrote at the average level of the Internet in 2022 you now write at the zero level.

That’s a big idea, so let’s turn it over in our hands a bit.

Let’s imagine we can pick something that represents the average quality of English writing on the internet. It might be an article about holiday lobster prices or a piece of Peloton marketing copy.

Could any English speaker write at that level? No.

You probably don’t have to think that hard to come up with several people in your life who can’t write up to the standards of “Yahoo News article” or “Peloton blog post.”

Many folks don’t write, won’t write, can’t write.

In fact, if we want to consider it “median” writing quality rather than “average,” then approximately half the English speakers on the internet couldn’t hit this level of sophistication.

Only now, with tools like ChatGPT, they can. It’s very good at taking awkward, ungrammatical writing and making something decent out of it.

I fed ChatGPT this text —

if youre stove won’t light here’s what you do 1) check that that the gas is on turn the knob do you hear it 2) lisen if it’s clicking that’s the igniter. if it’s not clickign you could light it with a match 3) lift off the burner cover cuz sometimes that the issue 4) try another burner

— with the instruction to “Rewrite this as a good blog post.” Here’s the result:

If Your Stove Won’t Light: A Step-by-Step Guide

If you’re trying to cook a meal and your stove won’t light, it can be frustrating and time-consuming. Here are some steps you can try to troubleshoot the issue:

Check that the gas is on. Turn the knob and listen for the sound of gas flowing. If you don’t hear it, the gas may be turned off or there may be a problem with the gas line.

Listen for the igniter. If you hear a clicking sound, that means the igniter is trying to light the gas. If you don’t hear a clicking sound, you can try lighting the burner with a match.

Lift off the burner cover. Sometimes, food or debris can accumulate underneath the burner cover and prevent the gas from lighting. Removing the cover may resolve the issue.

Try another burner. If all else fails, it’s possible that there is a problem with a specific burner on your stove. Try using a different burner to see if it lights properly.

By following these steps, you should be able to get your stove working again in no time. If the problem persists, it may be a good idea to call a professional for assistance.

Imagine you’re a technician asked to document how a system works. You could describe the steps and let ChatGPT do the writing. Would you still need to proofread it, to make sure it’s not hallucinating something wrong? Absolutely. But most reluctant writers are competent readers. They should be able to spot when it’s gone off track.

Where I think these tools have even more benefit is for people who don’t feel confident in English. Type in your native language and let the AI do its work, translating not just the words but the intent.

Which brings us back to JR’s thesis, that a tool like ChatGPT resets the levels. In a world in which everyone can write at the old “average” level, what happens?

I’m not sure. There’s not a great comparison in recent human history. Calculators enabled ordinary people to do a lot more number-crunching, but were they transformative? (For the human calculators of Hidden Figures, perhaps so.)

A world filled with AI-polished text might push us in unexpected directions. To err is human, and we might find language incorporating mistakes to porve a hooman wrote it. Or we might use these tools like Instagram filters to make us look like someone we aspire to be.

Personally, I don’t think screenwriters are particularly endangered, at least not in the short term. Future tools may be trained and tuned for scriptwriting, but they’ll be working off a corpus of all the screenplays that have come before. And that’s how it’s always been.

JR notes:

All writers are competing against the cumulative past of our species and its semi-automatic weapons of creative destruction, and are competing to stand out from an average defined over the past couple of decades or so.[…] Writers should only worry about automated writers as much as they are already worried about the agglomerated capacity and output of our species up until the present moment, which, eh, we can do better.

The prompt is “Give me a plot summary for a Knives Out set in space.” ↩
Prompt: “Write me a scene from this movie where Blanc first meets Marta, in screenplay format.” ↩

Getting Apple Pay to work on my M1 MacBook

April 29, 2021 Geek Alert, How-To

I love my 2020 MacBook Air and have had very few issues with it, except that I couldn’t get it to work with Apple Pay. Transactions would fail, and it wasn’t clear why. For months I just lived with it and used alternate ways of paying for things.

But man, I missed the convenience of Apple Pay, which worked fine on my much older machine. (This was my first Mac with Touch ID, which is relevant.)

Last night, while trying to buy two t-shirts at Cotton Bureau, I was determined to find a solution to my Apple Pay woes.

Checking the Wallet & Apple Pay pane in System Preferences showed my credit cards were there but disabled. It pointed me to this page at Apple:

If Apple Pay on your Mac is disabled because security settings were modified

Had my security settings been modified? No idea.

That page pointed me to this one:

About Startup Security Utility

It seemed that maybe I needed to boot into Recovery Mode and use the Startup Security Utility to set my Mac to a higher security threshold.

The article said you get to Recovery Mode by booting while holding down Command-R. But on the new M1 Macs, you hold down the power (Touch ID) button instead. This tripped me up for a few minutes.

In Recovery Mode, it was clear that all my system settings were fine.

Here’s what ultimately worked: Deleting the credit cards in Wallet & Apple Pay, then re-adding them. After that, transactions worked great.

My hunch is that the credit cards were imported when I set up this Mac but not properly authorized. Apple should change the language of the warnings or link to a better explanation.

In the meantime, hopefully this writeup has saved you some hassle.

Getting the Razer Tartarus Pro keyboard to work with macOS Catalina and beyond

November 7, 2020 Geek Alert, How-To, Tools

As I’ve written before, I use this weird keyboard which has helped greatly with my carpal tunnel issues.

While this odd keyboard is great for normal typing, certain key combinations are unwieldy. So for the past decade I’ve used an external gaming keypad to the left of my keyboard with custom macros set up for some common commands:

Select All
Undo
Copy
Cut
Paste
Paste and Match Style
Pasteboard History (which is part of Better Touch Tool)

When my beloved Logitech keypad crapped out, I switched to the well-reviewed Razer Tartarus Pro. It’s nicely built! Unfortunately, it doesn’t offer drivers for the current macOS.

After a lot of googling, I’ve cobbled together a solution. So in the interest of sharing what I’ve learned — and remembering how I got this to work in the first place — let me walk through the steps.

Note that this doesn’t do half of what a proper driver could accomplish, particularly for gaming. So please, Razer, make one! But if you want to use a gaming pad like the Tartarus Pro for keyboard shortcuts, this does the trick.

How to make the Tartarus Pro work on macOS Catalina

It’s important to understand that macOS sees the Tartarus Pro as a plain old keyboard. So if you plug it in and hit the 08 key, you’ll see it type a ‘w’.

Luckily, there’s software that can recognize that and do something useful instead.

Better Touch Tool is best known for getting random mice and trackpads to work, but it does a nice job on keyboards as well. (I’m using the 3.5 Alpha version.)

[Image: Better Touch Tool setup]

Let’s look at the Select All shortcut. You’ll notice the “Assigned Action” is ⌘A. Now direct your attention to the righthand sidebar. That’s where all the real work happens.

1) For the moment, ignore the “Click here to record a shortcut” section. We’ll come back to that.

2) You want the shortcut Enabled, so check the box.

3) You should put a note in this field for clarity.

4) The HUD overlay is surprisingly helpful. It shows what’s happening, like that you just hit “copy.” I find the Title text to be too large, so I use the Subtitle instead.

[Image: HUD display shows Copy]

5) For Trigger Conditions, you want to choose “Works on keyboards with the same type as used for recording.” Yes, this is a ridiculously long label.

6) You want it to Trigger on Key Down.

7) You don’t want it to repeat.

You’ll do these steps for each key on the gaming keypad you want to remap. Here’s my setup.

I also set key 20, the spacebar, to Undo.

In theory, you’re done! For a few weeks, this worked great. And then it started having issues. When encountering password fields, my normal keyboard would start triggering keyboard shortcuts. I had to restart Better Touch Tool multiple times per day.

Basically, the app kept getting my normal weird keyboard confused with my special weird gaming keypad. I needed to call in the big guns.

Enter Karabiner

I’d long heard of Karabiner-Elements, a public domain app that can remap any key and do really impressive things. But it’s intimidating as hell.

Here’s what I wanted Karabiner to do: remap the keys of the Tartarus Pro to seldom-used keystrokes so I could then set those as triggers for Better Touch Tool.

Looking through their user forums, I couldn’t find any perfect matches for this use case, but luckily @bradcurtis had built a set of custom mappings (a “complex modification” in Karabiner speak) for a similar purpose.
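
For a sense of what one of these mappings looks like inside, here is a rough Python sketch that builds a single rule in the general shape Karabiner-Elements uses for complex modifications. It is not @bradcurtis's actual rule set. The function name `remap_rule`, the title and description strings, and the vendor ID (5426, which I believe is Razer's USB vendor ID, but check it in the Karabiner Event Viewer) are assumptions for illustration. The spacebar to Left_Shift-Left_Option-s mapping and the product ID 580 come from this post.

```python
import json


def remap_rule(description, from_key, to_key, to_modifiers, vendor_id, product_id):
    """Build one complex-modification rule as a plain dict (hypothetical helper)."""
    return {
        "description": description,
        "manipulators": [
            {
                "type": "basic",
                # The key the keypad actually sends.
                "from": {"key_code": from_key},
                # The seldom-used keystroke to emit instead.
                "to": [{"key_code": to_key, "modifiers": to_modifiers}],
                # Only apply the remap to events from the matching device.
                "conditions": [
                    {
                        "type": "device_if",
                        "identifiers": [
                            {"vendor_id": vendor_id, "product_id": product_id}
                        ],
                    }
                ],
            }
        ],
    }


# Assumed values: vendor_id 5426 should be verified in the Event Viewer.
example = {
    "title": "Tartarus Pro remaps (illustrative)",
    "rules": [
        remap_rule(
            description="Key 20 (spacebar) to Left_Shift-Left_Option-s",
            from_key="spacebar",
            to_key="s",
            to_modifiers=["left_shift", "left_option"],
            vendor_id=5426,
            product_id=580,
        )
    ],
}

print(json.dumps(example, indent=2))
```

The `device_if` condition is the important part: it is what keeps the remap from firing on the regular keyboard. The imported mappings bundle many rules of roughly this kind, one per keypad key.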

Installing them is odd. Here’s how you do it.

Install Karabiner-Elements. You’ll have to give it a ton of permissions in System Preferences.
In Karabiner-Elements Preferences, choose Complex Modifications and then Add Rule.
On the next screen, choose “Import More Rules from the internet.”
Either search for “Tartarus” or follow this link
Choose the Import button. It’ll ask you whether you want to open the link in Karabiner-Elements. You do.
Click the button to “Enable All”

If you have the Tartarus v2 like @bradcurtis, you’re done! All of the keys should be mapped to new, less-common keystrokes. But if you have the Tartarus Pro like I do, you need to modify the settings you just imported to change the product ID. This is where it gets frustratingly user-hostile, because it requires you to modify a JSON file in an external editor.

Navigate to ~/.config/karabiner/karabiner.json — the easiest way to do this is by choosing Go > Go to Folder… in the Finder.
Open this file in a plain text editor (I use TextMate).
Find and replace 555 (the product ID for the v2) with 580 (the product ID for the Pro).¹
Save this file and restart Karabiner-Elements.
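
A blanket find-and-replace on 555 could, in principle, touch an unrelated number elsewhere in the file. As a more targeted alternative, here is a minimal Python sketch that only changes values stored under a `product_id` key. The function names `swap_product_id` and `patch_config` are made up for this example, and it assumes the file is ordinary JSON with the product ID stored as a number. It also writes a `.bak` copy first, which is a precaution added here rather than part of the steps above.

```python
import json
from pathlib import Path

OLD_PRODUCT_ID = 555  # Tartarus v2, per the steps above
NEW_PRODUCT_ID = 580  # Tartarus Pro, per the steps above


def swap_product_id(node, old=OLD_PRODUCT_ID, new=NEW_PRODUCT_ID):
    """Recursively replace "product_id": old with new. Returns how many changed."""
    changed = 0
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "product_id" and value == old:
                node[key] = new
                changed += 1
            else:
                changed += swap_product_id(value, old, new)
    elif isinstance(node, list):
        for item in node:
            changed += swap_product_id(item, old, new)
    return changed


def patch_config(path, old=OLD_PRODUCT_ID, new=NEW_PRODUCT_ID):
    """Patch a Karabiner config file in place, keeping a .bak copy of the original."""
    path = Path(path).expanduser()
    original = path.read_text(encoding="utf-8")
    config = json.loads(original)
    changed = swap_product_id(config, old, new)
    if changed:
        # Keep the untouched file next to the patched one.
        path.with_name(path.name + ".bak").write_text(original, encoding="utf-8")
        path.write_text(json.dumps(config, indent=4) + "\n", encoding="utf-8")
    return changed
```

Calling `patch_config("~/.config/karabiner/karabiner.json")` returns the number of entries it updated. You would still restart Karabiner-Elements afterward.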

After doing this, and mapping these new keystrokes to Better Touch Tool, I’m back up to full speed.²

Again, almost no one on Earth will ever need or want to do any of this. But if you’re the one person who needs this solution, I hope it helps. Please pay it forward by documenting something you’ve discovered.

If you’re looking for a different product ID, open the Event Viewer in Karabiner-Elements and choose Devices. ↩
I added one additional modification, converting key 20 (which is coded as “spacebar”) to Left_Shift-Left_Option-s. ↩