OpenClaw Just Updated: How To Use AI Agents Like a Pro (Full Tutorial)
Transcript
OpenClaw has been shipping at an absurd pace. Since the last video, where we covered the install, the team has pushed a stack of updates: new skills, new integrations, a whole new memory system. Most of the reviews you'll find on YouTube just read the changelog out loud. I actually ran the best of them on real production tasks, the kind I do every day. By the end of this video, you'll know which of these are actually worth using day-to-day.
Quick context for anyone who landed here cold: OpenClaw is an open-source autonomous agent. You give it a task. It plans the steps, calls tools, runs them, checks the output, and keeps iterating until the job is actually done. Think of it as the engine your personal AI assistant runs on. We covered the install in the previous video.
Link in the description below if you need it. Cool. Let's get into it. First one's video.
Open Claw now has a built-in tool called video generate. You ask the agent for a clip. It makes a clip right inside the response. No external service, no copy paste, no separate dashboard.
Under the hood, you get four providers. xAI's Grok Imagine video does a 5-second clip in about 30 seconds. Looks decent. Costs almost nothing.
That's your social and quick-demo lane. Runway Gen-4 takes 1 to 3 minutes. Looks premium. Runs $2 to $5 per 10-second clip.
That's the production lane. Wan from Alibaba runs locally if you've got a 12 GB GPU or better. Free if you own the hardware; quality lands in the middle. That's the no-API path. And ComfyUI is there if you've already got your own video workflow on your own machine; quality depends entirely on what you've built.
Quick note on wiring these up, because it applies to every provider in this video, not just the video ones. You pick the lane you want, head to that provider's site, and generate an API key. I'll show it on Grok since it's the cheapest to start with.
Create the key, copy it, open your terminal, and paste the export command with your key inline. On a Mac, it's one line. Hit enter. Done.
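Before launching the agent, it's worth confirming the key is actually visible to new processes. A minimal sanity check, assuming the provider reads a variable named `XAI_API_KEY` (the exact name varies by provider, so check their docs):

```python
import os

# Variable name is illustrative; each provider documents its own.
KEY_NAME = "XAI_API_KEY"

def key_status(name: str) -> str:
    """Report whether the key is exported, without ever printing its value."""
    return "set" if os.environ.get(name, "") else "missing"

print(f"{KEY_NAME}: {key_status(KEY_NAME)}")
```

If this prints "missing", the export went into a different shell session than the one running the agent.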
Same flow for every other provider we hit in this video. Here's the honest part: if you need one video a week, don't use this. Open Kling or Runway directly.
The UI is better, and on a single clip the price is basically a wash. OpenClaw earns its place when video is part of a pipeline: automated content. The agent reads a news feed every morning, writes a summary in the channel's tone, generates a 5-second B-roll clip on the topic, and drops the whole package into a draft. Doing that with Runway in the loop means a human has to sit there and copy-paste prompts.
With video generate, it's literally one step inside a longer script, and the agent runs the whole thing while you're asleep. Image-to-video on local files: the agent has access to your file system. A prompt like "take every photo from the trip folder, animate each one for three seconds, stitch them into a reel" runs end to end.
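To make the trip-folder example concrete, here's a rough sketch of the kind of plan the agent could build under the hood. The ffmpeg flags and the clips/ layout are assumptions for illustration, not OpenClaw's actual implementation, and this sketch only assembles the commands rather than running them:

```python
from pathlib import Path

def plan_reel(folder: str, seconds: int = 3) -> list[str]:
    """Build the ffmpeg commands to animate each photo, then stitch the clips."""
    commands = []
    for i, photo in enumerate(sorted(Path(folder).glob("*.jpg"))):
        # Loop a still image for `seconds` seconds to produce a short clip.
        commands.append(
            f'ffmpeg -loop 1 -t {seconds} -i "{photo}" clips/clip_{i:03d}.mp4'
        )
    # Final step: concatenate every clip into one reel.
    commands.append("ffmpeg -f concat -safe 0 -i clips.txt -c copy reel.mp4")
    return commands
```

The point isn't these exact flags; it's that the whole loop lives in one script the agent can run unattended against your local files.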
Runway doesn't read your folders. Nano Banana doesn't read your folders. This one's only possible because the agent lives on your machine. Images are next.
And the shift here is the bigger one. OpenClaw could already do text-to-image: type a prompt, get a picture from scratch. What's new is reference editing.
You hand the agent an image you already have, tell it what to change, and it generates a new version based on yours. That one shift is what unlocks most of the workflows that weren't really possible before. Three providers are worth knowing here. The new default is OpenAI's GPT Image 2.
And the nice touch is that it runs through Codex OAuth. If you're already logged into Codex, you don't even need a separate OpenAI API key. xAI's Grok Imagine Image and its Pro variant both do reference editing too, so there's a real alternative if you prefer that side of the ecosystem. And OpenRouter now covers image generation under the same single key it already uses for text.
One credential; swap providers without rewiring anything. Same caveat as with video: if you need one image, just open Nano Banana or ChatGPT. They're built for that exact job.
The reason image editing earns its place inside OpenClaw is, again, the agent's access to your files. The cleanest example is brand work. Drop a logo into the chat. Ask the agent for five color variants for a deck.
Original, blue, red, monochrome, whatever you need. Twenty seconds later, all five are sitting in the conversation ready to download. The same shape works for product shots, headshots, anything you need a quick set of variations on. It also helps when you don't have anything to start with yet. You're just thinking out loud.
Sketching a post, brainstorming a campaign, talking through how a section should look. The agent generates a preview while you're still figuring out the idea. Don't like it? Ask for it minimalist.
Ask for it darker. Done. Same shape works for mockups. Screenshot a website.
Ask for a dark mode redraw and you've got something to send your designer in under a minute. Speaking of cutting out repetitive work, here's a tool that does the same thing for the processes you keep reexplaining to your team. Have you ever had someone on your team message you like, "Wait, how do I do that again?" And you realize you've explained the exact same process three times this week. Yeah, that's what I call expensive busy work.
Luckily, today's sponsor, Scribe, is here to fix that. Scribe records your workflow in real time and automatically builds a step-by-step guide that anyone can follow. No extra effort required. I'll show you.
I just hit record in the Scribe extension and I'm just doing the process once like I normally would and boom. Scribes already captured every step with screenshots and text automatically added. I didn't have to paste anything into a Word doc or manually screenshot every single click. It just did it.
And here's where it gets even better. I can customize it, add my branding, hide sensitive details, and then share it anywhere in seconds. I can send a link, download a PDF, embed it in Notion, whatever fits my workflow. And if I need to update the guide later, I just edit it once and everyone with access gets the latest version automatically.
No more wait, is this the old version? Oh, and this is my favorite part. The guide me feature pulls your Scribe straight into the browser with interactive walkthroughs. It literally shows you exactly where to click and walks you through each step as you go.
So yeah, when we're talking about AI subscriptions that are actually worth the money, this one gives you serious time savings. Take back your time and skip the busy work. Head to scribe.house/master or click the link in the description to get started for free. You can also unlock even more powerful features with Scribe Pro.
Try Scribe for free. Let's get back to OpenClaw. Third one is audio. Same idea as the last two.
Generation moves inside the agent. There is a music generate tool, and the agent now produces tracks straight from a prompt without leaving the conversation. Three providers under the hood. Google Lyria is the polished instrumental option.
Orchestral, ambient, the cinematic stuff. MiniMax handles vocals better and costs less. ComfyUI is there for anyone running custom audio models locally. One small detail that's easy to miss but matters in production:
Optional parameters like duration_seconds aren't supported by every provider, and OpenClaw handles that gracefully instead of crashing the run. The most obvious case is background music inside a video pipeline. The agent makes a B-roll clip, picks a track that fits the mood, lays it under the video, and the finished thing comes out the other end. You don't open a separate music site.
You don't download anything. Nothing lands on a timeline. And there's one nice detail worth flagging. You can hand the agent a reference track or a voice memo and it generates something close in vibe.
Useful when a brand has a recognizable sound and you don't want to describe it in words every single time. The numbers: 1 to 3 minutes per track. It's async, so the agent runs it in the background and pings you when it's ready. Cost lands in the 5-to-20-cent range per track depending on provider.
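One implementation detail from above, the graceful handling of unsupported options like duration, can be pictured as a capability filter in front of each provider call. This is a hypothetical sketch, not OpenClaw's actual code; the provider names and the supported-parameter table are illustrative:

```python
# Which optional parameters each provider accepts (illustrative values).
SUPPORTED_PARAMS = {
    "lyria":   {"prompt"},                         # no duration control
    "minimax": {"prompt", "duration_seconds"},
}

def build_request(provider: str, **params) -> dict:
    """Drop options the provider doesn't understand instead of failing the run."""
    allowed = SUPPORTED_PARAMS.get(provider, {"prompt"})
    return {k: v for k, v in params.items() if k in allowed}

# duration_seconds is silently dropped for a provider that lacks it:
print(build_request("lyria", prompt="calm ambient", duration_seconds=30))
```

Filtering instead of raising is what lets one prompt fan out across several providers without per-provider error handling in the pipeline script.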
Fourth one: text-to-speech. TTS isn't new in OpenClaw. Voice wake and talk mode have been in there for a while, ElevenLabs plus a system fallback. What's new in the last few weeks is the upgrade around it.
The provider menu got a lot wider. Voice personas landed. The controls moved from one global setting down to per-agent and per-chat overrides. The provider list is where most of the surface area is. ElevenLabs got bumped to v3,
the top-tier voice. Azure Speech is in. Xiaomi's MiMo is in. Inworld and Volcengine are in. Gradium and Gemini TTS are in. And if you want everything local on Apple Silicon, there's an MLX option that runs on device.
The tool surface itself is straightforward. /tts generates speech ad hoc; /tts latest revoices the most recent message. And underneath that, you can pin a default voice per agent through agents.list.tts, or override it for a single conversation through messages. That last bit is the part that actually changes day-to-day behavior.
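The per-agent default plus per-chat override behaves like a simple two-layer merge. The field names here are assumptions for illustration, not OpenClaw's actual schema:

```python
def resolve_tts(agent_cfg: dict, chat_cfg: dict) -> dict:
    """Chat-level settings win; anything unset falls back to the agent default."""
    return {**agent_cfg.get("tts", {}), **chat_cfg.get("tts", {})}

agent_default = {"tts": {"provider": "elevenlabs", "voice": "narrator"}}
chat_override = {"tts": {"voice": "promo"}}

# The provider stays from the agent default; the voice comes from the chat.
print(resolve_tts(agent_default, chat_override))
```

That merge order is why pinning a voice per agent is a set-and-forget move: one conversation can deviate without touching the default.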
Where this earns its place is, again, inside a pipeline. The most useful flow is a full video where the agent writes the script, generates the B-roll, picks the music, and now also lays down the voiceover with whichever voice you've pinned to that agent. By the time you check on it, there's a draft cut sitting in your output folder. Same logic for podcasts and audiobooks at scale, and multilingual content gets a lot cheaper.
The same script routed through MiMo or Volcengine covers Asian-language voices that ElevenLabs doesn't handle as cleanly. Numbers depend heavily on which provider you pin. ElevenLabs v3 is cents per minute and sounds human enough for narration you'd actually publish. MLX local on a decent Mac is effectively free after the model download, but slower. Azure, Gemini TTS, and the rest sit somewhere in between.
Fifth one is the update that makes people raise an eyebrow when I show it. OpenClaw can now join your Google Meet calls, not as a recording bot sitting in the corner; the agent is actually a participant. It listens, takes notes in real time, and at the end of the call you get a clean summary, decisions, and action items dropped into wherever you keep that stuff.
Setup is a one-time OAuth into your Google account. After that, you can either invite the agent to a specific meeting or let it auto-join everything on your calendar. Both work. Auto-join is the one that changes behavior.
You stop thinking about whether the agent is in the room. It just is. The use cases stack up fast. Client calls become searchable.
The agent files the transcript and summary alongside the project folder, so two weeks later when someone says, "Didn't we agree on X?", the answer is one query away. Internal standups stop needing a dedicated notetaker. And the one that surprised me: async meetings actually start working.
If you can't make a call, the agent attends, takes notes, and you read the summary in 2 minutes instead of watching a recording at 2x. The honest caveat, live transcription is only as good as the audio. On a clean call with decent mics, the summary is genuinely usable. On a noisy call with three people on phone audio, you'll still get a transcript, but you'll want to skim it rather than trust the action items blindly.
The cost is whatever your LLM call costs to summarize at the end, a few cents for a typical hour-long meeting. Quick aside before the next feature, because the bridge here is too clean to skip. OpenClaw is an autonomous agent for dev work and routine automation.
Same idea, different domain. We built AI Master, our content production pipeline, around a producer agent that runs a channel (not just YouTube, by the way) the way OpenClaw runs your dev tasks. The producer agent is a full team member, not a chatbot.
It has the channel context, niche, audience, tone, and live access to the analytics. So it knows which hooks pulled and which didn't, and uses that to push you toward videos that actually have a shot at going viral. It writes the scripts: structure, hooks, beats, CTA placement, pulled from what's actually worked on your channel, not a generic template.
Sitting next to it is an agent designer. You brief it the way you brief a human. It understands references in the spec, pulls the asset back, and drops it in the chat. The whole video you're watching right now, research, script, structure, runs through it.
If you want to use the same production agents and run them yourself, the link is in the description below. If you'd rather skip the build, we do that too. The same engine you just heard about runs on our own channels and our clients'. Strategy, scripts, generation, production, publishing, the whole pipeline already wired up.
You're not getting a pile of tools. You're getting a process that produces content. Second link in the description. Drop a request and we'll take it from there.
Sixth one is the one I've been waiting for. OpenClaw already had memory: memory.md, project files, dreaming in beta since October. The agent wasn't starting from zero.
What active memory changes is the quality of that recall: a structured, queryable layer on top of what was already there. The way it works in practice, the agent writes distilled notes to itself as you go, not the full transcript. The compressed version: "the user prefers short Slack-style replies;
the marmalade recipe lives in recipes/marmalade." On every new session, it pulls back what's relevant to the current task instead of treating each chat as day one. The cleanest demo is the one that lands hardest. So on screen here is a side-by-side.
Same prompt, two agents. Default config on the left: "Remind me what we decided about the pricing page last month." And the agent has nothing.
It asks you to paste the context. Active memory on the right, same question, and it pulls back the actual decision, the date, and the file where it lives. That's the whole pitch in 8 seconds. Where this changes day-to-day work is long-running projects.
Anything you come back to weekly, a content calendar, a client account, an ongoing build, stops requiring a re-onboarding ritual. The agent already knows the players, the constraints, the things you tried that didn't work. And after a couple of weeks of use, it stops giving generic advice. It picks up your stack, your tone, the kind of answer that actually lands, not because you toggled a setting, but as a side effect of the memory accumulating.
One thing to flag honestly: active memory isn't free. It runs as a sub-agent that fires on every request to decide what to pull and what to write back, and that adds roughly 20 to 30% on top of your token usage per turn. On cheap models, you won't feel it. On Opus or another premium model where you're already burning a few cents a turn, the bill adds up.
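The overhead is easy to sanity-check with back-of-the-envelope numbers. The per-turn base prices below are illustrative; only the 20-to-30% overhead figure comes from above:

```python
def turn_cost_cents(base_cents: float, overhead: float = 0.25) -> float:
    """Per-turn cost once the memory sub-agent adds its token overhead."""
    return base_cents * (1 + overhead)

# Illustrative per-turn base prices:
cheap = turn_cost_cents(0.1)    # cheap model: overhead is negligible
premium = turn_cost_cents(5.0)  # premium model: ~1.25 extra cents per turn
print(f"cheap: {cheap:.3f}c/turn, premium: {premium:.2f}c/turn")
```

Over a 100-turn week on the premium model that's roughly an extra $1.25, noticeable, but cheap if the recall saves you even one re-onboarding session.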
Worth it for project work where the recall actually saves you time. Probably not worth it for one-off chats; turn it off for those. Last one on the list, and the name is the first thing people push back on.
It's not marketing, it's literal. Dreaming is what the agent does when you're not using it. Between sessions, in the background, OpenClaw goes back through recent memory and reorganizes it. It connects things that happened in different chats, drops what's no longer relevant, and promotes patterns that keep showing up into something more durable. The honest framing: it's the same trick humans run while sleeping.
You don't store every detail of the day. You consolidate. Dreaming is the agent doing that on its own schedule. So the memory it carries forward is structured, not just a pile of notes that grows forever.
What it looks like from the user side is mostly absence. You don't see dreaming run. What you notice is that two weeks in, the agent's recall gets sharper rather than noisier. It surfaces the right thing faster, and it stops repeating advice you already pushed back on.
If active memory is the storage layer, dreaming is what keeps the storage layer from rotting. This is the one I'd flag as still maturing. If you're testing Open Claw on a single afternoon, you won't see dreaming do anything. Set it up on a project you'll come back to for a month and it earns its name.
One thing I have to address before we wrap. In the previous video, I recommended Claude Max at $200 a month as the cleanest way to run OpenClaw on Opus. After April 4th, that recommendation is dead. Short version:
Until April 4th, you could pull the OAuth token out of a Claude Pro or Max subscription and feed it to OpenClaw, effectively unlimited Opus for a flat fee. About 60% of active OpenClaw sessions ran that way. Anthropic closed it. On February 20th, the terms of service banned subscription tokens in third-party tools. On April 4th, they stopped accepting them.
Done. For context, Peter Steinberger, the original creator of OpenClaw, now at OpenAI, framed it publicly as, quote, "First, they cloned the popular features into a closed harness. Then they blocked open source." A month earlier, Anthropic had cut Claude Code off from outside coding harnesses like OpenCode. So this was move number two.
Take that for what it's worth. It's the creator's read, not mine. Quick checklist. What to use today, what to enable and forget, and what to skip.
Use today: reference image editing, music generation, and active memory. If you do real project work, point them at your providers and start using them in your daily flow. Keep active memory off for cheap one-off chats where the 20-to-30% token tax doesn't pay back. Enable and let it bake:
dreaming. Five minutes to flip on in settings, then walk away. Come back in a month. The TTS upgrade: pin a provider per agent and forget it.
Conditional: video generation, only if video is part of a pipeline. For one-offs, use Kling or Runway directly. The Google Meet plugin, only if you actually take meetings. If you do, it's the highest-leverage one on the list.
Skip for now: dreaming on a fresh install with no project history. The mechanic is real, but it needs material to consolidate. Don't judge it on day one.
And by the way, the research, the script, the structure you just watched, that runs through AI Master, the production pipeline our team uses daily. If you want the same setup for your own channel, not just YouTube, link is in the description below. If this was useful, subscribe and I'll see you in the next one.