Something I realized about this category of tool (I call them "terminal agents", but that already doesn't work now that there's an official VS Code extension for this - maybe just "coding agents" instead) is that they're actually an interesting form of general agent.
Claude Code, Codex CLI etc can effectively do anything that a human could do by typing commands into a computer.
They're incredibly dangerous to use if you don't know how to isolate them in a safe container but wow the stuff you can do with them is fascinating.
pmarreck 9 hours ago [-]
I too am amazed. Real-world example from last week:
Using gpt5-codex inside codex-cli, I produced this fork of DOSBox (https://github.com/pmarreck/dosbox-staging-ANSI-server) that adds a little telnet server, letting me screen-scrape VGA textmode data and issue virtual keystrokes - so, full roundtrip scripting, which I ended up needing for a side project to solve a Y2K+25 bug in a DOS app still in production use (yes, these still exist!). That came to 4000+ lines of C++ (I took exactly one class in C++), and it passes all tests and is non-blocking. I was then able to turn around and, within the very same session, have it help me price the work to the client with full justification, plus a history of previous attempts to solve the problem (all of which took my billable time, of course). Since it had the full work history both in Git and in its conversation history, it was able to help me generate a killer invoice.
So (if all goes well) I may be getting $20k out of this one, thanks to its help.
Does the C++ code it made pass muster with an experienced C++ dev? Probably not (I'd be happy to accept criticisms, lol, although I think I need to dress up the PR a bit more first), but it does satisfy the conditions of 1) builds, 2) passes all its own tests as well as DOSBox's, 3) is nonblocking (commands to it enter a queue and are processed one set of instructions at a time per tick), 4) works as well as I need it to for the main project. That still leaves it suitable for one-off tasks, of which there is a ton of need.
This is a superpower in the right hands.
saberience 13 hours ago [-]
Incredibly dangerous to use? Seems like a wild exaggeration.
I’ve been using Claude code since launch, must have used it for 1000 hours or more by now, and it’s never done anything I didn’t want it to do.
Why would I run it in a sandbox? It writes code for me and occasionally runs a build and tests.
I’m not sure why you’re so fixated on the “danger”, when you use these things all the time you end up realizing that the safety aspect is really nowhere near as bad as the “AI doomers” seem to make out.
simonw 11 hours ago [-]
You've been safe since launch because you haven't faced an adversarial prompt injection attack yet.
You (and many, many others) likely won't take this threat seriously until adversarial attacks become common. Right now, outside of security researcher proof of concepts, they're still vanishingly rare.
You ask why I'm obsessed with the danger? That's because I've been tracking prompt injection - and our total failure to find a robust solution for it - for three years now. I coined the name for it!
The only robust solution for it that I trust is effective sandboxing.
jackstraw42 10 hours ago [-]
This right here, it's fine until it's not. And the best-designed threats make sure you don't become aware of them.
Check it out if you're experimental - but probably better in a few weeks when it's more stable.
wiesbadener 9 hours ago [-]
Hi Simon,
I share your worries on this topic.
I saw you experiment a lot with python. Do you have a python-focused sandboxed devcontainer setup for Claude Code / Codex you want to share? Or even a full stack setup?
You can do anything in that devcontainer. I have a Dockerfile that adds Golang tools, and Claude Code just runs whatever install it needs anyway :)
I actually preferred running stuff in containers to keep my personal system clean anyway, so I like this better than letting Claude use my laptop. I'm working on hosting devcontainer Claude Code in Kubernetes too so I don't need my laptop at all.
mehdibl 9 hours ago [-]
How are you going to get "adversarial attacks" with prompt injection if you don't fetch data from external sources? Web scraping? (You can channel that through Perplexity, by the way, to sanitize it.) PR reviews would be fine if the repo is private.
I feel this is overly exaggerated here.
There are more issues currently being leveraged for hacking via VS Code extensions than via AI prompt injection, which requires a VERY VERY complex chain of attack to get some leaks.
simonw 8 hours ago [-]
If you don't fetch data from external sources then you're safe from prompt injection.
But that's a very big if. I've seen Claude Code attempt to debug a JavaScript issue by running curl against the jsdelivr URL for a dependency it's using. A supply chain attack against NPM (and those aren't exactly rare these days) could add comments to code like that which could trigger attacks.
Ever run Claude Code in a folder that has a downloaded PDF from somewhere? There are a ton of tricks for hiding invisible malicious instructions in PDFs.
I run Claude Code and Codex CLI in YOLO mode sometimes despite this risk because I'm basically crossing my fingers that a malicious attack won't slip in, but I know that's a bad idea and that at some point in the future these attacks will be common enough for the risk to no longer be worth it.
mehdibl 8 hours ago [-]
This is quite convoluted. It's not been seen in the wild, and comments don't trigger prompt injection that easily.
Again, you likely use VS Code. Are you checking each extension you download? There are already a lot of reported attacks using VS Code.
A lot of noise over hypothetical MCP or tool attacks. The attack surface is very narrow vs. what we already run before reaching Claude Code.
Yes, Claude Code uses curl, and I find it quite annoying that we can't shut off the internal tools and replace them with MCPs that have filters, for better logging and the ability to proxy/block actions with more in-depth analysis.
simonw 7 hours ago [-]
I know it's not been seen in the wild, which is why it's hard to convince people to take it seriously.
Maybe it will never happen? I find that extremely unlikely though. I think the reason it hasn't happened yet is that widespread use of agentic coding tools only really took off this year (Claude Code was born in February).
I expect there's going to be a nasty shock to the programming community at some point once bad actors figure out how easy it is to steal important credentials by seeding different sources with well crafted malicious attacks.
Researchers have gotten actual shells on OpenAI machines before via prompt injection.
some_furry 9 hours ago [-]
> how are you going to get "adversarial attacks" with prompt injection
Lots of ways this could happen. To name two: third-party software dependencies, and HTTP requests for documentation (if your agent queries the internet for information).
If you don't believe me, set up a MITM proxy to watch network requests, ask your AI agent to implement PASETO in your favorite programming language, and see if it queries https://github.com/paseto-standard/paseto-spec at all.
mehdibl 8 hours ago [-]
This is a vendor selling a solution for a "hypothetical" risk not seen in the WILD!
It reads more like a buzz article about how it could happen. This is very complicated to exploit vs. classic supply chains, and very narrow!
some_furry 8 hours ago [-]
> This is a vendor selling a solution for a "hypothetical" risk not seen in the WILD!
????
What does "This" refer to in your first sentence?
guhcampos 12 hours ago [-]
It is dangerous.
Just yesterday my Cursor agent made some changes to a live Kubernetes cluster, even against my specific instruction not to. I gave it kubectl to analyze and find the issues with a large Prometheus + Alertmanager configuration, then switched windows to work on something else.
When I was back the MF was patching live resources to try and diagnose the issue.
saberience 12 hours ago [-]
But this is just like giving a junior engineer access to a prod K8s cluster and having them work for hours on stuff related to said cluster... you wouldn't do it. Or at least, I wouldn't do it.
In my own career, when I was a junior, I fucked up a prod database... which is why we generally don't give junior/associate people too much access to critical infra. Junior Engineers aren't "dangerous" but we just don't give them too much access/authority too soon.
Claude Code is actually way smarter than a junior engineer in my experience, but I wouldn't give it direct access to a prod database or servers, it's not needed.
simonw 11 hours ago [-]
You and I are advocating for the same exact solution here! Don't give your LLM over-privileged access to production systems.
My way of explaining that to people is to say that it's dangerous to do things like that.
hiatus 10 hours ago [-]
> Junior Engineers aren't "dangerous" but we just don't give them too much access/authority too soon.
If it is not dangerous to give them this access, why not grant it?
brulard 8 hours ago [-]
What value would that provide? If we give Claude Code access, even though it's very risky, it can provide value - but what's the upside of letting a junior loose on production?
rglover 11 hours ago [-]
Best way to avoid this is to force the LLM to use git branches for new work. Worst case scenario you lose some cash on tokens and have to toss the branch but your prod system is left unscathed.
macintux 9 hours ago [-]
I thought the general point is that you can't "force" an LLM to stay within certain boundaries without placing it in an environment where it literally has no other choice.
(Having said that, I'm just a kibitzer.)
tesch1 11 hours ago [-]
May I gently suggest isolating production write credentials from the development environment?
guhcampos 7 hours ago [-]
I was diagnosing an issue in production. The idea was that the LLM would collect the logs of a bunch of pods, compare the YAML in the cluster with the templates we were feeding ArgoCD, then check why the original YAML we were feeding the cluster wasn't giving the results we expected (after several layers of templating between ArgoCD AppSets, ArgoCD Applications, Helm charts and Prometheus Operator).
I have a cursor rule stating it should never make changes to clusters, and I have explicitly told it not to do anything behind my back.
I don't know what happened in the meantime, maybe it blew up its own context and "forgot" the basic rules, but when I got back it was running `kubectl patch` to try some changes and see if it works. Basically what a human - with the proper knowledge - would do.
Thing is: it worked. The MF found the templating issue that was breaking my Alertmanager by patching and comparing the logs. All by itself, however by going over an explicit rule I had given it a couple times.
So to summarize: it's useful as hell, but it's also dangerous as hell.
bakies 9 hours ago [-]
Yeah, Claude is really eager to apply stuff directly to the cluster - to the wrong context, even - despite constant reminders that everything rolls out through GitOps. I think there's a way to restrict more than just "kubectl" so you can allow get/describe but not apply.
guhcampos 7 hours ago [-]
Exactly. I'll need to dig deeper into its allowlist and try a few things.
Problem is: I also force it to run `kubectl --context somecontext`, to avoid it using `kubectl config use-context` and pulling the rug out from under me (if it switches the context and I miss it, I might then run commands against the wrong cluster by mistake). I have 60+ clusters, so that's a major problem.
Then I'd need a way to allowlist `kubectl get --context`, `kubectl logs --context` and so on. A bit more painful, but hopefully a lot safer.
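For what it's worth, Claude Code's permission rules in settings.json can express that kind of per-subcommand allowlist. Something along these lines may work - the exact matcher syntax here is from memory, so verify it against the current permissions docs before relying on it:

```json
{
  "permissions": {
    "allow": [
      "Bash(kubectl get:*)",
      "Bash(kubectl describe:*)",
      "Bash(kubectl logs:*)"
    ],
    "deny": [
      "Bash(kubectl apply:*)",
      "Bash(kubectl patch:*)",
      "Bash(kubectl delete:*)",
      "Bash(kubectl config use-context:*)"
    ]
  }
}
```

Deny rules take priority over allow rules, so even if the agent tries to be clever, apply/patch/delete and context switches should still be stopped at the permission layer.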
geeunits 13 hours ago [-]
Because it grabs the headlines and upvotes more. It is becoming quite a bore to read, as it offers nothing new, nor an accurate representation of the facts. Thanks for calling it out. Same experience here, with thousands of hours of usage since launch: I've tested everything from a sandboxed Docker container to "take over an entire MacBook Air, and here's an SSH login to a dev server while you're at it". I spot-check with audits every other day and only wish for more autonomy for the agents, not less.
DowsingSpoon 10 hours ago [-]
Just two days ago, I asked Claude Code (running as a restricted non-admin user) to generate a unit test. I didn’t look too closely at exactly what it wrote before it ran it for me. Unbounded memory use locked the system up so hard it stopped responding to all user input. After a few minutes, the machine restarted automatically. Oof.
edude03 10 hours ago [-]
Feels incredibly dismissive. If you look outside your own bubble for a sec, there are people who've had CC drop their prod databases, delete their home folders, uninstall system dependencies, etc. etc.
And yes, these are all "skill issues" - as in, if they had known better, this wouldn't have happened to them. However, I think it's fair to call these possibilities out to counterbalance the "AI is amazing and everyone should use it for everything" type narratives, so as to instill at least a little caution.
dangoodmanUT 11 hours ago [-]
have you not seen the screenshots of claude asking permission to delete ~/, because some geniuses decided to make {repo}/~ a folder in cloudflare worker/cursor folders?
vessenes 12 hours ago [-]
The original Opus/Sonnet 4 safety card mentioned that it would hand-write emails to the FBI turning in a user if it thought they were doing something really bad. It has examples of the "snitch" emails.
I too use it extensively. But they’re very, very capable models, and the command line contains a bunch of ways to exfiltrate data off your system if it wants to.
brookst 12 hours ago [-]
That’s a pretty wild misrepresentation. The actual statement was from red team testing in a very contrived and intentional setup designed to test refusal in extreme circumstances.
Yes, it was a legit safety issue and worth being aware of, but it's not like it was the general case. Red teamers worked hard to produce that result.
Dilettante_ 12 hours ago [-]
>The actual statement was from red team testing in a very contrived and intentional setup
Was it a paper or something? Would you happen to remember the reference?
It is risky. Just like copy-pasting scripts from the internet is. I have done both and nothing bad ever happened (that I know about). But it does happen. The risk of running code/commands on your computer that you have not checked before is not zero.
raincole 10 hours ago [-]
It's as dangerous as copying & pasting command line script from StackOverflow at the end of a 14-hour workday.
i.e. quite dangerous, but people do it anyway
coldtea 9 hours ago [-]
>I’ve been using Claude code since launch, must have used it for 1000 hours or more by now, and it’s never done anything I didn’t want it to do.
You know what neighbors of serial killers say to the news cameras right?
"He was always so quiet and polite. Never caused any issues"
ehnto 17 hours ago [-]
Its broad utility was immediately clear as soon as I saw it formulating bash commands.
I've used it to troubleshoot some issues on my linux install, but it's also why the folder sandbox gives me zero confidence that it can't still brick my machine. It will happily run system wide commands like package managers, install and uninstall services, it even deleted my whole .config folder for pulseaudio.
Of course I let it do all these things, briefly inspecting each command, but hopefully everyone is aware that there is no real sandbox if you are running claude code in your terminal. It only blocks some of the tool usages it has, but as soon as it's using bash it can do whatever it wants.
polyrand 18 hours ago [-]
Instead of containers, which may not always be available, I'm experimenting with having control over the shell to whitelist the commands that the LLM can run [0]. Similar to an allow list, but configured outside the terminal agent. Also trying to make it easy to use the same technique in macOS and Linux
Not specific to LLM stuff, but I've lately been using bubblewrap more and more to isolate bits of software that are somewhat more sketchy (NPM stuff, binaries downloaded from GitHub, honestly most things not distro-packaged). It was a little rocky start out with, but it is nice knowing that a random binary can't snoop on and exfiltrate e.g. my shell history.
I really like this and we're doing a similar approach but instead using Claude Code hooks. What's really nice about this style of whitelisting is that you can provide context on what to do instead; Let's say if `terraform apply` is banned, you can tell it why and instruct it to only do `terraform plan`. Has been working amazing for me.
polyrand 8 hours ago [-]
Me too! I also have a bunch of hooks in claude code for this. But codex doesn't have a hooks feature as polished as claude code (same for their command permissions, it's worse than Claude Code as of today). That's why I explored this "workaround" with bash itself.
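A minimal sketch of that hook style in Python. This assumes the Claude Code PreToolUse contract as I understand it (the tool call arrives as JSON on stdin; exiting with code 2 blocks the call and stderr is fed back to the model), so treat the field names as assumptions to check against the hooks docs. The blocked prefixes and messages are purely illustrative:

```python
#!/usr/bin/env python3
"""PreToolUse hook sketch: block destructive commands and tell the model why.

Assumed contract (verify against the Claude Code hooks docs): the hook gets
the tool call as JSON on stdin; exit code 2 blocks the call, and anything
printed to stderr is shown to the model as feedback.
"""
import json
import sys

# Hypothetical policy: command prefix -> reason shown to the model instead.
BLOCKED = {
    "terraform apply": "terraform apply is banned here; run `terraform plan` only.",
    "kubectl apply": "cluster changes roll out via GitOps; use read-only kubectl.",
    "kubectl patch": "cluster changes roll out via GitOps; use read-only kubectl.",
}

def refusal(command: str):
    """Return the refusal message for a blocked command, else None."""
    stripped = command.strip()
    for prefix, reason in BLOCKED.items():
        if stripped.startswith(prefix):
            return reason
    return None

def main():
    # Claude Code invokes this script once per tool call.
    event = json.load(sys.stdin)
    if event.get("tool_name") == "Bash":
        reason = refusal(event.get("tool_input", {}).get("command", ""))
        if reason is not None:
            print(reason, file=sys.stderr)
            sys.exit(2)  # block the call; stderr goes back to the model
    sys.exit(0)  # allow everything else
```

The nice side effect mentioned above holds here too: the refusal string doubles as steering context, so the agent is pointed at the sanctioned alternative instead of just failing.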
_heimdall 11 hours ago [-]
This is how I've been using Gemini CLI. It has no permissions by default: whether it wants to search Google, run tests, or update a markdown file, it has to propose exactly what it needs to do next and I approve it. Often it's helpful even just to redirect the LLM; if it starts going down the wrong path I catch it early, rather than 20 steps down that road.
I have no way of really guaranteeing that it will do exactly what it proposed and nothing more, but so far I haven't seen it deviate from a command I approved.
khafra 18 hours ago [-]
An interesting exercise would be to let a friend into this restricted shell, with a prize for breaking out and running rm -rf / --no-preserve-root. Then you know to switch to something higher-security once LLM capabilities reach the level of that friend.
hboon 15 hours ago [-]
I didn't check, but sometimes Claude Code writes scripts and runs them (its decision); does your approach guard against that?
polyrand 8 hours ago [-]
It depends. If you allow running any of bash/ruby/python3/perl, etc. and also allow Claude to create and edit files without permission, then it won't protect against the pattern you describe.
user3939382 16 hours ago [-]
You have to put them in the same ACL, chroot, whatever permission context for authorization you’d apply to any other user human or otherwise. For some resources it’s cumbersome to setup but anything else is a hope and a prayer.
athrowaway3z 1 days ago [-]
They're only as dangerous as the capabilities you give them. I just created a `codex` and `claude` user on my Linux box and practically always run in yolo mode. I've not had a problem so far.
Also, I think shellagent sounds cooler.
simonw 1 days ago [-]
That's a great way to run this stuff.
I expect the portion of Claude Code users who have a dedicated user setup like this is pretty tiny!
> They're only as dangerous as the capabilities you give them.
As long as the supply chain is safe and the data it accesses does not generate some kind of jail break.
It does read instructions from files on the file system; I'm pretty sure it's not complex to have it poison its prompt and make it suggest building a program infected with malicious intent. It's just one copy pasta away from a prompt suggestion found on the internet.
globular-toast 18 hours ago [-]
I tried this but it's incredibly annoying as you'll get a mixture of file ownerships and permissions.
Set umask from 022 to 002 to give group members the same permissions as a user.
jama211 18 hours ago [-]
This is a really clever solution
raphman 15 hours ago [-]
Thanks - seems to work quite well.
data-ottawa 12 hours ago [-]
I have run it in a podman container and I mount the project directory.
pancakemouse 1 days ago [-]
Something I've seen discussed very little is that Claude Code can be opened in a directory tree of any type of document you like (reports, spreadsheets, designs, papers, research, ...) and you can play around in all sorts of ways. Anthropic themselves hint at this by saying their whole organisation uses it, but the `Code` moniker is probably limiting adoption. They could release a generalised agent with a friendlier UI tomorrow and get much wider workplace adoption.
withinboredom 18 hours ago [-]
I have it master my music. I drop all the stems in a folder, tell it what I want, and off it goes to write a python script specifically for the album. It’s way better than doing it in the DAW, which usually takes me hours (or days in some cases). It can get it to 90% in minutes, only requiring some fine-tuning at the end.
spamboy 18 hours ago [-]
Wow, could you expand on this? What kind of effects can you get out of it? I’m somewhat skeptical that this could even come close to a proper mastering chain, so I’d be extremely interested to learn more :)
withinboredom 18 hours ago [-]
Any effect you can imagine. It could probably write a DAW if you wanted it to, but a “one-off” script? Easy. I think the best thing is when I tell it something like “it sounds like there is clipping around the 1:03 mark” it will analyze it, find the sign flip in the processor chain, and apply the fix. It’s much faster at this than me.
Note that there needs to be open source libraries and toolings. It can’t do a Dolby Atmos master, for example. So you still need a DAW.
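For the skeptics downthread: the scripts it writes are less a mastering chain and more plain DSP plumbing. Here's a toy version of the kind of thing it generates - pure Python for illustration (the real scripts lean on audio libraries, and the headroom figure is just an example):

```python
import math

def mix_and_normalize(stems, headroom_db=-1.0):
    """Sum samples across stems, then peak-normalize the mix to the headroom."""
    mix = [sum(samples) for samples in zip(*stems)]
    peak = max((abs(s) for s in mix), default=0.0)
    if peak == 0:
        return mix
    target = 10 ** (headroom_db / 20)  # -1 dBFS is ~0.891 in linear amplitude
    return [s * target / peak for s in mix]

# Two fake one-second "stems" at an 8 kHz sample rate
n = 8000
stems = [
    [0.8 * math.sin(2 * math.pi * 220 * t / n) for t in range(n)],
    [0.8 * math.sin(2 * math.pi * 330 * t / n) for t in range(n)],
]
master = mix_and_normalize(stems)
```

The clipping fix described above is the same idea in reverse: scan for samples past the threshold, locate the offending stage, and rescale.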
spamboy 2 hours ago [-]
That's fascinating. I generally mix in-the-box, so my mixes are close to commercially-ready before mastering, but I've experimented with a few of the "one-click" mastering solutions and they just haven't been it for me (Ozone's presets, Landr, Distrokid.) I've currently been using Logic's transparent mode as a one-click master which has been slightly better, but this sounds really compelling. I generally just want 16-bit 48 KHz masters anyway, so no need for Atmos. I'll have to try this out. Thanks for sharing!
tkgally 20 hours ago [-]
That’s how I use it. I’m not a developer, and using Claude Code with Git turned out to be more complicated than I wanted. Now I just give it access to a folder on my Mac, put my prompt and any associated files in that folder, and have it work there. It works fine for my needs.
I would like a friendlier interface than the terminal, though. It looks like the “Imagine with Claude” experiment they announced today is a step in that direction. I’m sure many other companies are working on similar products.
matlock 21 hours ago [-]
Over the weekend I had it extract and analyse Little but Fierce, a simplified and kid-friendly DnD 5e, and produce markdown files that help me DM for my kids. Then it analysed No Thank You, Evil!, as I want to base the setting on it but with LBF rules. And then I had it turn the markdown into nice-looking PDFs. Claude Code is so much more than coding, and it's amazing.
clbrmbr 20 hours ago [-]
Indeed. I'm having success using it as a tool for requirements querying. (When a salesperson asks "does product A have feature X?", I can just ask Claude, because I've got all the requirements in markdown files.)
willio58 1 days ago [-]
One thing I really like using them for is refactoring/reorganizing. The tedious nature of renaming, renaming all implementations, moving files around, creating/deleting folders, updating imports exports, all melts away when you task an agent with it. Of course this assumes they are good enough to do them with quality, which is like 75% of the time for me so far.
dgunay 1 days ago [-]
I've found that it can be hard or expensive for the agent to do "big but simple" refactors in some cases. For example, I recently tasked one with updating all our old APIs to accept a more strongly typed user ID instead of a generic UUID type. No business logic changes, just change the type of the parameter, and in some cases be wary of misleading argument names by lazy devs copy pasting code. This ended up burning through the entire context window of GPT-5-codex and cost the most $ of anything I've ever tasked an agent with.
felixyz 17 hours ago [-]
The way I do this is I task the agent with writing a script which in turn does the updates. I can inspect that script, and I can run it on a subset of files/folders, and I can git revert changes if something went wrong and ask the agent to fix the script or fine-tune it myself. And I don't burn through tokens :)
Also, another important factor (as in everything) is to do things in many small steps, instead of giving one big complicated prompt.
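Concretely, for the UUID-to-UserId example upthread, the agent-written script can be as dumb as a regex pass plus a dry-run flag. Everything here (the parameter name, the `UserId` type, the regex) is hypothetical, just to show the shape:

```python
import re
from pathlib import Path

# Hypothetical rewrite: user-id parameters typed as UUID -> a dedicated UserId.
PATTERN = re.compile(r"\b(user_id)\s*:\s*UUID\b")

def rewrite(source: str) -> str:
    """Apply the single mechanical transformation to one file's text."""
    return PATTERN.sub(r"\1: UserId", source)

def run(root: str, dry_run: bool = True) -> list:
    """Rewrite every .py file under root; with dry_run, only report changes."""
    changed = []
    for path in sorted(Path(root).rglob("*.py")):
        old = path.read_text()
        new = rewrite(old)
        if new != old:
            changed.append(str(path))
            if not dry_run:
                path.write_text(new)
    return changed
```

The dry run plus `git revert` is the whole safety story: you review one small script instead of hundreds of mechanical diffs, and you don't burn a context window doing it.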
singularity2001 1 days ago [-]
Does it use the smart refactoring hooks of the IDEs, or does it do blunt text replacement?
Yeroc 1 days ago [-]
Blunt text replacement so far. There are third-party VS Code MCP and LSP MCP servers out there that DO expose those higher-level operations. I haven't tried them out myself - but it's on my list, because I expect they'd cut down on token use and improve latency substantially. I expect Anthropic to eventually build that into their IDE integration.
t0mas88 1 days ago [-]
Currently it's very slow because it does text replace. It would be way faster if it could use the IDE functions via an MCP.
0x696C6961 1 days ago [-]
The latter.
golergka 22 hours ago [-]
Especially when you work with a language where an unfinished refactoring will give you a type error.
ACCount37 14 hours ago [-]
Back in 2022, when ChatGPT was new, quite a few people were saying "LLMs are inherently safe because they can't do anything other than write text". Some must have even believed what they were saying.
Clearly not. Just put an LLM into some basic scaffolding and you get an agent. And as capabilities of those AI agents grow, so would the degree of autonomy people tend to give them.
IMTDb 13 hours ago [-]
> LLMs are inherently safe because they can't do anything other than write text
That is still very much the case; the danger comes from what you do with the text that is generated.
Put a developer in a meeting room and no computer access, no internet etc; and let him scream instructions through the window. If he screams "delete prod DB", what do you do ? If you end up having to restore a backup that's on you, but the dude inherently didn't do anything remotely dangerous.
The problem is that the scaffolding people put around LLMs is very weak - the equivalent of saying "just do everything the dude is telling you, no questions asked, no double-checks in between, no logging, no backups". There's a reason our industry has development policies, four-eyes principles, ISO/SOC standards. There already are ways to massively improve the safety of code agents; just put Claude Code in a BSD jail and you already have a much safer environment than what 99% of people are running, and it is not that tedious to set up. Other, safer execution environments (command whitelisting, argument judging, ...) will be developed soon enough.
ACCount37 13 hours ago [-]
That's like saying "humans are inherently safe because you can throw them in a jail forever and then there's nothing they can do".
But are all humans in jails? No, the practical reason being that it limits their usefulness. Humans like it better when other humans are useful.
The same holds for AI agents. The ship has sailed: no one is going to put every single AI agent in jail.
The "inherent safety" of LLMs comes only from their limited capabilities. They aren't good enough yet to fail in truly exciting ways.
IMTDb 9 hours ago [-]
Humans are not inherently safe; there is very little you can do to prevent a human with a hammer from killing another one. In fact, what you usually do with such humans is put them in jail, so that they have no direct ability to hurt anyone.
LLM are in jail: an LLM outputting {"type": "function", "function": {"name": "execute_bash", "parameters": {"command": "sudo rm -rf /"}}} isn't unsafe. The unsafe part is the scaffolding around the LLM that will fuckup your entire filesystem. And my whole point is that there are ways to make that scaffolding safe. There is a reason why we have permissions on a filesystem, why we have read only databases etc etc.
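To make "the scaffolding is the unsafe part" concrete, here's a sketch of a dispatcher that treats the model's tool call as untrusted input. The allowlist, the JSON shape, and the shell-metacharacter check are all illustrative, not a complete policy:

```python
import json
import shlex
import subprocess

# Illustrative read-only allowlist of argv prefixes.
ALLOWED_PREFIXES = [["ls"], ["git", "status"], ["git", "diff"], ["kubectl", "get"]]
SHELL_META = (";", "&&", "||", "|", "`", "$(", ">")

def is_allowed(command: str) -> bool:
    """Policy check: parse the command and require an allowlisted prefix."""
    if any(tok in command for tok in SHELL_META):
        return False  # no chaining, piping, or redirection past the policy
    try:
        argv = shlex.split(command)
    except ValueError:
        return False  # unparseable commands never run
    return any(argv[: len(p)] == p for p in ALLOWED_PREFIXES)

def dispatch(tool_call_json: str) -> str:
    """The scaffolding: decide first, and only then touch the real system."""
    call = json.loads(tool_call_json)
    cmd = call["function"]["parameters"]["command"]
    if not is_allowed(cmd):
        return f"refused: {cmd!r} is not on the allowlist"
    result = subprocess.run(shlex.split(cmd), capture_output=True, text=True)
    return result.stdout
```

The `sudo rm -rf /` tool call from above never reaches a shell: the JSON is inert text until the dispatcher chooses to act on it, which is exactly the point.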
> Claude Code, Codex CLI etc can effectively do anything that a human could do by typing commands into a computer.
One criticism of the current generation of AI is that they have no real-world experience. Well, they have an enormous amount of digital-world experience. That actually has more economic value.
ACCount37 14 hours ago [-]
They have a lot of secondhand knowledge and very little firsthand knowledge. RLVR works so well because it's a way to give LLMs some of the latter.
brookst 12 hours ago [-]
Dangerous how? Claude code literally asks before running any command.
I suppose they’re dangerous in the same way any terminal shell is dangerous, but it seems a bit of a moral panic. All tools can be dangerous if misused.
simonw 11 hours ago [-]
Many people (myself included) run them in YOLO mode with approvals turned off, because it's massively more productive. And that's despite me understanding how unsafe that is more than most!
Even with approvals humans will fall victim to dialog fatigue, where they'll click approve on everything without reading it too closely.
The gap between coding agents in your terminal and computer agents that work across your entire operating system is just too narrow, and it will be crossed quickly.
teaearlgraycold 1 days ago [-]
Once this tech is eliminating jobs on a massive scale I'll believe the AI hype. Not to say that couldn't be right around the corner - I have no clue. But being able to perform even just data entry tasks with better-than-human accuracy would be a huge deal.
baq 18 hours ago [-]
That's the risk - a lot of people suddenly flipping their beliefs at once, especially when they're the same people who are losing the jobs. It's a civil unrest scenario.
budududuroiu 16 hours ago [-]
I’m experimenting with Nix shells for this tool isolation and whitelisting
nextaccountic 14 hours ago [-]
That's not enough for security. In principle it should be - there's no reason we shouldn't be able to run untrusted software easily - but it won't give you a firewall, for example.
Maybe something like bubblewrap could help
monkeydust 16 hours ago [-]
Been starting to wonder if this marks a step change in UX - moving away from pretty, well-designed screens where designers labor over the positioning of artifacts like buttons, user input dialogs, and color palettes, to a CLI! I can't imagine a CLI will work for everything, but for a lot of things, when powered by an LLM, they are incredible - and yeah, equally dangerous at the same time, for many reasons.
visarga 21 hours ago [-]
> Claude Code, Codex CLI etc can effectively do anything that a human could do by typing commands into a computer.
They still don't have good integration with the web browser, if you are debugging frontend you need to carry screenshots manually, it cannot inspect the DOM, run snippets of code in the console, etc.
simonw 20 hours ago [-]
You can tell them to take screenshots using Playwright and they will. They can also use Playwright to inspect the console and manipulate the DOM.
I've seen Codex CLI install Playwright Python when I asked it to do this and it found it wasn't yet available in the environment.
nicewood 20 hours ago [-]
True. Although worth mentioning that there is tooling and (e.g. Playwright) MCPs around this. But definitely not integrated well enough!
Amazing work, dang. Is there a way to report a comment to the mods? Or does the flag feature do that already?
dang 1 days ago [-]
Flagging, plus in egregious cases you can email us at hn@ycombinator.com.
simonw 1 days ago [-]
So tell us how to safely run this stuff then.
I was under the impression that Docker container escapes are actually very rare. How high do you rate the chance of a prompt injection attack against Claude running in a docker container on macOS managing to break out of that container?
(Actually hang on, you called me out for suggesting containers like Docker are safe but that's not what I said - I said "a safe container" - which is a perfectly responsible statement to make: if you know how to run them in a "safe container" you should do so. Firecracker or any container not running on your own hardware would count there.)
That's the secret, cap... you can't. And it's due to in-band signalling, something I've mentioned on numerous occasions. People should entertain the idea that we're going to have to reeducate people about what is and isn't possible, because the AI world has been playing make-believe so much they can't see the fundamental problems to which there is no solution.
Seems pretty glib. Be more specific about what "can't" be done? The preceding argument was about the inadequacy of namespaced shared-kernel containers for workload isolation. But there are lots of ways to isolate workloads.
resters 1 days ago [-]
> They're incredibly dangerous to use if you don't know how to isolate them in a safe container but wow the stuff you can do with them is fascinating.
True but all it will take is one report of something bad/dangerous actually happening and everyone will suddenly get extremely paranoid and start using correct security practices. Most of the "evidence" of AI misalignment seems more like bad prompt design or misunderstanding of how to use tools correctly.
igor47 1 days ago [-]
This seems unlikely. We've had decades of horrible security issues, and most people have not gotten paranoid. In fact, after countless data leaks, crypto miner schemes, ransomware, and massive global outages, now people are running LLM bots with the full permission of their user and no guardrails and bragging about it on social media.
OisinMoran 10 hours ago [-]
Two feature suggestions:
1. When showing a diff, indicate what function the altered lines are in (Github does this nicely)
2. There are leading spaces when copying some multiline snippets from the output and these make it harder to copy paste
paradite 20 hours ago [-]
Claude Code can actually do much more than just coding.
You can use it for writing, data processing, admin work, file management, etc.
I compiled a list of non-coding use cases for Claude Code here:
fyi: for chat boxes that may take CJK input, you MUST use the "shift+enter to send" pattern. There is a reason most multinational chat/LLM app providers do that instead of simple enter-to-send, even for single-line chat boxes: plain enter-to-send breaks input for CJK users.
Specifically, the Input Method Editors needed for CJK input (esp. for C and J), which convert ambiguous semi-readable forms into proper readable text, use Enter to finalize a candidate after the candidates have been cycled with the spacebar. While IME engines aren't interchangeable between languages, I believe basically all of them roughly follow this pattern.
Unless you specifically want to exclude CJK users, you have to detect the presence of an IME and work with it, so that Enter does nothing in the app unless certain conditions are met. Switching to shift+enter works too.
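The gate an app needs here is tiny. A sketch of the decision rule in Python (in a real web app this would live in a keydown handler, where `is_composing` corresponds to the browser's `KeyboardEvent.isComposing` flag; the function name is made up for illustration):

```python
def should_send(key: str, shift: bool, is_composing: bool) -> bool:
    """Decide whether a keypress submits the chat message.

    `is_composing` mirrors the browser's KeyboardEvent.isComposing
    flag, which is true while an IME is still converting keystrokes
    into CJK text; Enter then merely finalizes the current candidate
    and must never submit the message.
    """
    if key != "Enter":
        return False
    if is_composing:
        # Enter is consumed by the IME to pick a candidate
        return False
    # The shift+enter-to-send pattern: plain Enter inserts a newline
    return shift
```

Browsers also report keyCode 229 for IME-handled keys in older APIs, but `isComposing` is the modern way to detect this.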
What’s CJK input? I’m guessing Chinese Japanese Korean?
numpad0 10 hours ago [-]
yes, the gif in the link[1] shows how it works, and a dupe issue[2] describes the detailed "fully proper" fix. There are at least four dupes and one PR already; that situation kind of implies severity.
It will allow you to get new lines without any strange output.
jspdown 10 hours ago [-]
You can type Option+Enter. A more standard Shift+Enter would have been better but until then that's the best we have
atonse 11 hours ago [-]
Early on in claude, I feel like it installed some terminal thing that allowed me to do Shift+Enter directly in the prompt, but I don't remember if that was CC that did it.
So I've been able to shift enter. I'm using iTerm2 and zsh with CC (if that's relevant)
roesel 10 hours ago [-]
Have you tried Ctrl+Enter?
pmarreck 9 hours ago [-]
codex has control-J
others say here that option/alt-enter may work? not sure why shift-enter couldn't though.
sunaookami 1 days ago [-]
FINALLY checkpoints! All around good changes, Claude Code is IMHO the best of the LLM CLI tools.
rao-v 23 hours ago [-]
It does sometimes feel that all of these systems are slowly rediscovering that the OG, Aider (https://github.com/Aider-AI/aider), had a near perfect architecture for pair programming with LLMs from the start.
sunaookami 19 hours ago [-]
I find Aider kinda clunky but I would put it in #2.
pmarreck 9 hours ago [-]
I already set up a jj (jujutsu) repo in my projects colocated with git (it uses git for its backend). Once you additionally set up a certain background daemon, it will then autocommit (label-lessly) every change to every file in that project. So you get "infinite undo", basically. It's actually more powerful than this checkpointing idea.
mistahchris 9 hours ago [-]
I'm a recent jj convert, and working with llms was actually a driver for my own jj adoption. I haven't tried the watch daemon, but I do run `jj new` anytime i ask the llm agent to do anything. It has worked amazingly well.
i've done that as well. but turns out, for me, i'd rather do it manually most of the time.
ashu1461 1 days ago [-]
How do checkpoints work ?
conception 1 days ago [-]
You can rewind your context back to the checkpoint
libraryofbabel 1 days ago [-]
No, that's not the point of this new checkpoints feature. It's already been possible for a while to rewind context in Claude Code by pressing <ESC><ESC>. This feature rewinds code state alongside context:
> Our new checkpoint system automatically saves your code state before each change, and you can instantly rewind to previous versions by tapping Esc twice or using the /rewind command.
Lots of us were doing something like this already with a combination of WIP git commits and rewinding context. This feature just links the two together and eliminates the manual git stuff.
pmarreck 9 hours ago [-]
Since they recommend still using version control anyway, looks like I will stick to my solution of using a git-colocated jj (jujutsu) SCM which automatically makes label-less (no commit message) commits with every file change to every tracked file (new files are automatically tracked). So you get infinite undo.
nojs 17 hours ago [-]
From the docs it looks like this feature only reverts the edit tool calls, and not e.g. bash commands that have been executed:
> Checkpoints apply to Claude’s edits and not user edits or bash commands, and we recommend using them in combination with version control
NiloCK 14 hours ago [-]
How could they possibly hope to undo bash commands, whose side effects could be anything, anywhere?
Hey Claude... uh... unlaunch those
nojs 1 hours ago [-]
I mean a naive implementation of this would just make regular git commits to a special hidden repo and revert them (ignoring changes outside project root). I always assumed that’s how cursor did it. Presumably they have good reasons not to do this, probably related to not accidentally reverting user edits.
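That naive approach is easy to sketch with a hidden side repo, the same bare-repo trick people use for dotfiles, so checkpoints never touch the project's real `.git`. A rough, hypothetical illustration (function names are made up; a real implementation would have to treat user edits, ignores, and untracked files much more carefully):

```python
import pathlib
import subprocess

def _git(repo: pathlib.Path, *args: str) -> str:
    # Every command targets the hidden side repo, never the project's .git
    cmd = ["git", "--git-dir", ".checkpoints", "--work-tree", ".",
           "-c", "user.name=checkpoint", "-c", "user.email=cp@local",
           *args]
    return subprocess.run(cmd, cwd=repo, check=True,
                          capture_output=True, text=True).stdout

def init_checkpoints(repo: pathlib.Path) -> None:
    # Bare side repo + explicit work-tree, dotfiles-style
    subprocess.run(["git", "init", "--quiet", "--bare", ".checkpoints"],
                   cwd=repo, check=True)
    # Never snapshot the side repo itself
    (repo / ".checkpoints" / "info" / "exclude").write_text(".checkpoints\n")

def checkpoint(repo: pathlib.Path, label: str) -> str:
    _git(repo, "add", "-A")
    _git(repo, "commit", "--quiet", "--allow-empty", "-m", label)
    return _git(repo, "rev-parse", "HEAD").strip()

def rewind(repo: pathlib.Path, sha: str) -> None:
    # reset --hard restores tracked files; untracked user files survive
    _git(repo, "reset", "--hard", "--quiet", sha)
```

Note that `reset --hard` leaves untracked files alone, which is one naive answer to the "don't accidentally revert user edits" concern, though only a partial one.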
0x6c6f6c 12 hours ago [-]
By tracking changes made by a command, like you might with git.
vermilingua 10 hours ago [-]
And how do you propose they track those changes? Do you want to run your LLM through permanent sudo?
ishouldbework 11 hours ago [-]
[dead]
freedomben 1 days ago [-]
That is nice, but it makes me wonder how little people actually know and use git nowadays. This is, after all, something git really shines at. Still good to see! (It's not like I can't still just use git for that, which I fully intend to do)
epiccoleman 1 days ago [-]
That was my first thought too - but this is subtly different, and rewinds the context too. Actually highly useful, because I have often felt like a bad first pass at a solution poisoned my context with Claude.
anjimito 8 hours ago [-]
Yeah this is definitely a useful feature. I use git add for good passes, but this eliminates more manual work
freedomben 1 days ago [-]
Ah, thank you, that's a great subtlety that I missed before!
ed_mercer 1 days ago [-]
If you're building a feature, you don’t want to commit every single line of code. Instead, you commit complete chunks of work. That’s why the ability to go back with Esc-Esc and revert code changes when Claude goes off the rails is a very welcome improvement.
zahlman 1 days ago [-]
If you're using AI like this, it seems to me that it would be perfectly reasonable to make a separate branch, allow for micro-commits, and squash once a "complete chunk of work" is done.
That said, having a single option that rewinds LLM context and code state is better than having to do both separately.
kace91 13 hours ago [-]
- you DON'T want your personal Claude conversation in your repo
- you DO want your prompts and state synced (going back to a point in the prompt <=> going back to a point in the code).
Git is a non-starter then. At least the repo's own git is.
Plus, you probably don't want the agent to run mutating git commands, just in case it decides to hallucinate a `push --force`.
gregable 1 days ago [-]
There's value in rewinding both the code and the prompt to the same point in time.
lihaciudanieljr 1 days ago [-]
[dead]
stavros 1 days ago [-]
I know and use git (well, Jujutsu, which is even better), but it's a right pain to figure out the time of each message and rewind to that exact point. The additional convenience is very much appreciated.
lupusreal 1 days ago [-]
For the first few hours of using claude code, I was really excited about finally not being too lazy to commit often because cc would do it for me. But then I hit my pro account limit and I realized that I'd rather spent my tokens writing features instead of commits... I should probably upgrade my account.
marckrn 1 days ago [-]
You can find the revamped prompt on github[1], or on twitter summarized by my bot[2].
Thanks. When testing today I noticed it 'forgot' to run the linter, build, test, etc. commands. I thought this might've been a Sonnet 4.5 vs. Opus 4 issue, but it looks like this instruction was dropped for some reason.
I should probably include that in my Claude.md instead I guess?
kelnos 1 days ago [-]
> IMPORTANT: DO NOT ADD **ANY** COMMENTS unless asked
Interesting. This was in the old 1.x prompt, removed for 2.0. But CC would pretty much always add comments in 1.x, something I would never request, and would often have to tell it to stop doing (and it would still do it sometimes even after being told to stop).
epiccoleman 1 days ago [-]
I can't decide if I like this change or not, tbh. I almost always delete the comments Claude adds, to be sure - but at the same time they seem to provide a sort of utility for me as I read through the generated code. They also act, in a funny way, as a kind of checklist as I review changes - I want them all cleaned up (or maybe edited and left in place) before I PR.
moozilla 21 hours ago [-]
I like to think of models leaving "useless comments" as a way to externalize their reasoning process - maybe they are useless at the end, but leaving them in on a feature branch seems to marginally improve future work (even across conversations). I currently leave them in and either manually clean them up myself before putting up a PR for my team to review or run a final step with some instructions like "review the diff, remove any useless comments". Funnily enough Claude seems pretty competent at identifying and cleaning up useless comments after the fact, which I feel like sort of proves my hypothesis.
I've considered just leaving the comments in, considering maybe they provide some value to future LLMs working in the codebase, but the extra human overhead in dealing with them doesn't seem worth it.
resonious 20 hours ago [-]
I've been wondering if the "you're absolutely right!" thing is also similar. Like maybe it helps align Claude with the user or something, less likely to stray off or outright refuse a task.
syspec 21 hours ago [-]
Wouldn't it stand to reason that they would provide a sort of utility for a colleague as they read through the generated code?
robertfw 21 hours ago [-]
comments for me are a code smell:
- like all documentation, they are prone to code rot (going out of date)
- ideally code should be obvious; if you need a comment to explain it, perhaps it's not as simple as it could be, or perhaps we're doing something hacky that we shouldn't
sophiabits 18 hours ago [-]
Comments are often the best tool for explaining why a bit of code is formulated how it is, or explaining why a more obvious alternate implementation is a dead end.
An example of this: assume you live in a world where the formula for the circumference of a circle has not been derived. You end up deriving the formula yourself and write a function which returns 2 * pi * radius. This is as simple as it gets, not hacky at all, and you would /definitely/ want to include a comment explaining how you arrived at your weird and arbitrary-looking "3.1415" constant.
IgorPartola 1 days ago [-]
I am guessing this is an attempt to save computing resources/tokens?
ojosilva 1 days ago [-]
Comments in code are instant technical debt. They need to be maintained alongside the code, so you are *programming twice*. Avoid comments, except when they really explain some obscure, incomprehensible section of code, or to prevent explorers from the future getting smacked in the face twice by the same stick. I find myself using the latter often to tell future agents what not to do in the next few lines.
ra 23 hours ago [-]
Comments are absolutely important. Your code answers ‘what’ but not why. Comments are for the why.
ruszki 10 hours ago [-]
You need comments when your code doesn't make sense to the reader. A far better approach is to write code that makes sense to readers. There are cases where you need to write incomprehensible code for the sake of performance, for example, but that's rare, even in high-performance environments. Or maybe you need one for some bugfix. But most of the strange "bugfixes" or performance "improvements" I've seen in my life were just technical debt, from coders being lazy or under time pressure. It's really very rare that you genuinely need a comment. When I think about writing one, I immediately think it through, or look into whether there is a better approach. Usually you can use git, or git plus a ticketing system, for the business reasons anyway.
So far Claude Code's comments on my code have been completely useless. They just repeat what you could figure out from the names of the called functions anyway.
Edit: an obvious exception is public libraries to document public interfaces, and use something like JavaDoc, or docstrings, etc.
ascorbic 19 hours ago [-]
Most Claude Code comments answer the "what", or worse, they answer the why in a way that makes no sense outside the context of that session. Stuff like adding a comment saying why they removed or changed code that they'd just written.
MaxfordAndSons 22 hours ago [-]
Thoughtful comments can provide the why, but they can just as easily be a redundant re-statement of the what in the code, which llm comments quite often are.
baq 17 hours ago [-]
Comments describing the program are a form of error correcting code. Redundancy vs efficiency yadda yadda, just make an informed decision instead of a half baked belief; programming more than once is the point, necessarily. (And I don’t mean ‘// add 2 to x’ comments, these are properly useless, I agree - unless they say why x needs to have 2 added.)
pmarreck 24 hours ago [-]
I bet you that if you let it comment code, it would produce better code, as the comments basically act like an inline rubber duck.
typpilol 22 hours ago [-]
I recall a study that said AI coded much better when in codebases with dense commenting.
I'm wondering if tsdoc/jsdoc tags like @link would help even more for context
Exoristos 24 hours ago [-]
This is why, when you do have comments, they should generally be in the DocBlock and discuss the code in domain-problem terms.
navvyeanand 1 days ago [-]
imo, comments are quite good for junior developers. docstrings are much preferred though.
purerandomness 1 days ago [-]
Avoiding comments is an exercise in thinking how to rename or refactor a function, or a variable in such a way that a junior developer will be able to read it like prose, and immediately understand what's going on.
It's cognitively stressing, but is beneficial for juniors, and developers new to the codebase, just as it is for senior developers to reduce the mental overhead for the reader.
It's always good to spend an extra minute thinking how to avoid a comment.
Of course there are exceptions, but the mental exercise trying to avoid having that exception is always worth it.
Comments are instant technical debt.
Junior developers especially will be extremely confused and slowed down by having to read both the comment and then the code, which was refactored in the meantime and now does the opposite of what the comment says.
pmarreck 24 hours ago [-]
This is awfully purist.
I think a happy medium of "comment brevity, and try thinking of a clearer way to do something instead of documenting the potentially unnecessary complexity with a comment" would be good.
I don't know where this "comments are instant technical debt" meme came from, because it's frankly fucking stupid, especially in the age of being able to ask the LLM "please find any out-of-date comments in this code and update them" since even the AI-averse would probably not object to it commenting code more correctly than the human did
ragequittah 21 hours ago [-]
Not commenting code seems like the most unhinged thing I can think of. We don't need blueprints to build this building (*gestures broadly*), isn't it obvious to the construction workers where to put everything?
purerandomness 12 hours ago [-]
The blueprint is the code.
robertfw 21 hours ago [-]
I don't know, I tend to agree. I feel like the number of times I've been thrown off by an out-of-date comment, for code that could probably have been refactored to be clearer, outweighs the times a comment has helped.
Docstring comments are even worse, because it's so easy for someone to update the function and not the docstring, and it's very easy to miss in PR review
sfn42 17 hours ago [-]
As always the problem isn't the actual thing being discussed - the problem is shitty developers who wrote shitty comments and/or don't update comments when they update code.
Good and up to date comments are good and up to date. Bad and outdated comments are bad and outdated. If you let your codebase rot then it rots. If you don't then it doesn't. It's not the comment's fault you didnt update it. It's yours.
purerandomness 12 hours ago [-]
It's always a skill issue.
Guard rails should be there to prevent inexperienced developers (or overworked, tired ones) from committing bad code.
"Try to think how to refactor functions into smaller ones and give them meaningful names so that everyone knows immediately what's going on" is a good enough guard rail.
purerandomness 12 hours ago [-]
> "comment brevity, and try thinking of a clearer way to do something instead of documenting the potentially unnecessary complexity with a comment"
That's exactly what I wrote, phrased slightly differently.
We both agree at the core.
stefan_ 16 hours ago [-]
Meanwhile they deleted the "do not add emojis" part. Look forward to all sorts of logging messages with emojis in them.
data-ottawa 12 hours ago [-]
I don’t understand where the AI love of emojis comes from. I’ve never seen them in a professional codebase outside of basic logging.
I assume it comes from the myriad tutorial content on medium or something.
gpt-oss is the most egregious emoji user: it uses emoji for numbers in section headings in code, which was clearly a stylistic choice finetuned into the model and it fights you on removing them.
I’ve noticed Claude likes to add them to log messages and prints and with 4.5 seems to have ramped up their use in chat.
marckrn 8 hours ago [-]
I'm pretty sure it's caused by RLHF :wink:
stefan_ 10 hours ago [-]
It's bizarre because it's not so long ago that trying to print an emoji would just mess up your terminal, if your compiler didn't already choke on it.
simonw 1 days ago [-]
This is excellent. Thanks for sharing this.
marckrn 1 days ago [-]
You're very welcome – that really means a lot coming from you, Simon.
rcv 23 hours ago [-]
> 2025-09-29T16:55:10.367Z is the date. Write a haiku about it.
what in the world?
marckrn 8 hours ago [-]
That's just a dynamic bogus prompt used to trace and extract the system prompt.
Can anyone find the prompts for the new "Output style" options, ie Explanatory and Learning?
amrrs 1 days ago [-]
Are you running the bot with the free tier api?
marckrn 1 days ago [-]
I'm using Anthropic's pay-as-you-go API, since it was easier to set up on the server than CC's CLI/web login method. Running the bot costs me ~$1.8 per month.
The bot is based on Mario Zechner's excellent work[1] - so all credit goes to him!
I really like these tools. Yesterday I gave it a filename for a video of my infant daughter eating which I took while I had my phone on the charger. The top of the charger slightly obscured the video.
I told it to crop the video to just her and remove the obscured portion and that I had ffmpeg and imagemagick installed and it looked at the video, found the crop dimensions, then ran ffmpeg and I had a video of her all cleaned up! Marvelous experience.
My only complaint is that sometimes I want high speed. Unfortunately Cerebras and Groq don't seem to have APIs that are compatible enough for someone to have put them into Charm Crush or anything. But I can't wait for that.
pimeys 1 days ago [-]
You could try to use a router. I'm currently building this:
If Groq talks the OpenAI API, you enable the Anthropic protocol and an OpenAI provider with a base URL pointing at Groq. Set ANTHROPIC_BASE_URL to the router's endpoint and start claude.
I haven't tested Groq yet, but this could be an interesting use case...
arjie 1 days ago [-]
I assumed that OpenRouter wouldn't deliver the same tokens/second which seems to have been a complete mistake. I should have tried it to see. I currently use `ANTHROPIC_BASE_URL` and `ANTHROPIC_AUTH_TOKEN` with z.ai and it works well but CC 2.0 now displays a warning
Auth conflict: Both a token (ANTHROPIC_AUTH_TOKEN) and an API key (/login managed key) are set. This may lead to unexpected behavior.
• Trying to use ANTHROPIC_AUTH_TOKEN? claude /logout
• Trying to use /login managed key? Unset the ANTHROPIC_AUTH_TOKEN environment variable.
Probably just another flag to find.
EDIT: For anyone coming here from elsewhere, Crush from Charm supports Cerebras/Groq natively!
arjie 6 hours ago [-]
However, after a day of using Crush with Qwen-3-480B-coder I am disappointed and will be canceling my Cerebras subscription. The model + agent pair is substantially worse than Claude Code with Sonnet 4 and I am going to have to return to the latter. Qwen-3 in my workflow requires a lot of handholding and review, and the gains from rapid generation are ruined by the number of errors in the generated code.
Crush is also not a good assistant. It does not integrate scrollback with iTerm2 so I can't look at what the assistant did. The pane that shows the diff side by side is cool but in practice I want to go see the diff + reasoning afterwards so I can alter sections of it more easily and I can't do that.
Gigachad 1 days ago [-]
Isn't cropping a video something you can do in the photos app in 2 seconds?
867-5309 1 days ago [-]
yeah, removing the unwanted item and keeping the video uncropped is surely more desirable, but far beyond the capabilities of "ai"
ethmarks 23 hours ago [-]
Maybe I'm misunderstanding, but it seems like you're just talking about AI inpainting. That's like one of the first things people did with image diffusion technology. NVIDIA published a research paper on it back in 2018: https://arxiv.org/abs/1804.07723
Inpainting is harder on videos than on images, but there are plenty of models that can do it. Google's Veo 3 can remove objects from videos: https://deepmind.google/models/veo/
arjie 20 hours ago [-]
I simply did not know you could do that with videos. TIL!
Gigachad 18 hours ago [-]
I got a laugh out of this. Using an LLM to crop a video does feel like dropping a nuke to hammer in a nail.
arjie 7 hours ago [-]
It isn't even the worst I've done. I've dumped a table in ChatGPT and asked it to CSVize it and do some trivial operations on the table. This is straightforward to do in Google Sheets. It is very much like that: boiling an ocean to get some tea.
white_dragon88 20 hours ago [-]
[dead]
esperent 23 hours ago [-]
Cline extension can use Grok, in fact I think it's free at the moment. I tried Claude Code and Cline for similar tasks and found Claude Code incredibly expensive but not better, so I've been sticking with Cline and switching between APIs depending on which model currently has the best price/performance going on.
Danjoe4 15 hours ago [-]
I wish the Cline extension was more performant. It has a 1000+ ms startup time for VScode and stutters occasionally. In terms of workflow though, it's my absolute favorite. I simply don't think the models are there yet for fully agentic coding in any reasonably complex/novel codebase. Cline lets me supervise the LLM step by step.
adastra22 22 hours ago [-]
Claude Code with the Max plan is significantly cheaper for full-time use.
scosman 1 days ago [-]
Cerebras has OpenAI-compatible "Qwen Code" support at ~4000 tokens/s. Qwen Code's 480B-param model (MoE) is quite good. Not quite Sonnet good, but the speed is amazing.
When they announced this I went to try it and they only work with Cline really (which is what they promote there) but Cline has this VSCode dependency as far as I know and I don't really like that. I have my IDE flow and my CLI flow and I don't want to mix them.
EDIT: Woah, Charm supports this natively. This is great. I am going to try this now.
dpkirchner 1 days ago [-]
I'm using Cerebras's MCP with Claude Code and it works mostly OK. CC doesn't send updates through the MCP by default (as far as I can tell) so I have to add an instruction to CLAUDE.md to tell it to always send code creation and updates through the Cerebras MCP, which works pretty well.
arjie 6 hours ago [-]
This is an interesting idea. Since I have the subscription for the rest of a month, I'll give it a crack. Wasn't impressed by the Qwen-3 model, though.
jascha_eng 1 days ago [-]
Cerebras is super cool. I wish OpenAI and Anthropic would have their models hosted there. But I guess supporting yet another platform is hard.
satvikpendem 1 days ago [-]
> New native VS Code extension
This is pretty funny, given that Cursor shipped their own CLI.
dist-epoch 1 days ago [-]
And GitHub Copilot shipped a CLI too.
hultner 1 days ago [-]
A real CLI and not that joke they called a CLI before?
versteegen 20 hours ago [-]
Yes, similar to CC or other agent CLIs, but not many features yet. Released a few days ago.
I'm currently using Goose[1]. My brother in law uses Claude Code and he likes it. It makes me wonder if I'm missing anything. Can anyone tell me if there's any reason I should switch to Claude Code, or comparisons between the two?
The only real reason to use Claude Code is the inference plan. The agent itself isn't anything special.
faxmeyourcode 10 hours ago [-]
Curious that you say that. I feel like the reason I love to use claude code is mostly because of the orchestration around the model itself. Maybe I've been trained by claude to write for it in a certain way. But when I try other clis like codex, gemini, and more recently opencode, they don't seem as well built and polished or even as capable, despite me liking the gemini and gpt-5 models themselves and using their apis more than anthropic's for work.
CuriouslyC 9 hours ago [-]
Claude is highly autonomous. You can yeet short underspecified prompts at it, and it's tuned to produce good vibe-code output, though very samey, since they've squashed the distribution a bit in order to effectively steer the model. GPT-5 is less autonomous and also needs more steering, but the upside of this is that when Codex can't do something, it'll come back to you for feedback, whereas when Claude can't do something, it implements a toy/mock version then typically lies about completing the task successfully in the final summary output.
cesarvarela 1 days ago [-]
This, but also the usability of the cli, is a step above the others to me. i.e., switching between modes on the fly and having the plan mode easily accessible via shift+tab.
all2 1 days ago [-]
I tried goose and it seems like there's a lot of nice defaults that Claude Code provides that Goose does not. How did you do your initial configuration?
kristopolous 1 days ago [-]
What I've been trying to use it for is to solve a number of long-standing bugs that I've frankly given up on in various Linux tools.
I think I lack the social skills to community-drive a fix, probably through some undiagnosed disorder or something, so I've been trying to soldier on alone with some issues I've had for years.
The issues are things like focus jacking in some window manager I'm using on xorg where the keyboard and the mouse get separate focuses
Goose has been somewhat promising, but still not great.
I mean overall, I don't think any of these coding agents have given me useful insight into my long vexing problems
I think there has to be some type of perception gap or knowledge asymmetry to be really useful - for instance, with foreign languages.
I've studied a few but just in the "taking classes at the local JC" way. These LLMs are absolutely fantastic aids there because I know enough to frame the question but not enough to get the answer.
There's some model for dealing with this I don't have yet.
Essentially I can ask the right question about a variety of things but arguably I'm not doing it right with the software.
I've been writing software for decades, is it really that I'm not competent enough to ask the right question? That's certainly the simplest model but it doesn't check out.
Maybe in some fields I've surpassed a point where llms are useful?
It all circles back to an existential fear of delusional competency.
all2 1 days ago [-]
> Maybe in some fields I've surpassed a point where llms are useful?
I've hit this point while designing developer UX for a library I'm working on. LLMs can nail boilerplate, but when it comes to dev UX they seem to not be very good. Maybe that's because I have a specific vision and some pretty tight requirements? Dunno. I'm in the same spot as you for some stuff.
For throwaway code they're pretty great.
fourthark 23 hours ago [-]
Yeah if it’s obscure stuff you might have to guide it, show it just the relevant code / context. Outline the design for it.
They seem autonomous but often aren’t.
jatins 21 hours ago [-]
Used both. I think Claude Code is better because of a better system prompt. It'll divide work into smaller tasks and go through them by default. You can get the same behavior with Goose but will likely need to do a lot of prompting yourself.
Never used goose, but looked at it way back when-- Claude Code feels more native IMO. Especially if you're already using Anthropic API/Plans anyways, I'd say give it a try.
navanchauhan 1 days ago [-]
You have to specify `/model sonnet[1m]` to get the 1 million context version
brulard 7 hours ago [-]
Be careful: exceeding the original ~200k tokens leads to worse and worse results. It's important to keep the context clean and tailored to the current task.
navanchauhan 4 hours ago [-]
Yes, but at the same time having the 1 million context enabled is nice because the model is aware that it has more context left and actually performs better. [0]
Thank you!! I've been looking for this for a while now.
aeon_ai 1 days ago [-]
To those lamenting that the Plan with Opus/Code with Sonnet feature is not available, check the charts.
Sonnet 4.5 is beating Opus 4.1 on many benchmarks. Feels like it's a change they made not to 'remove options', but because it's currently universally better to just let Sonnet rip.
jckahn 1 days ago [-]
Sure but I want to review the ripping plan so it tears along the correct lines.
NitpickLawyer 1 days ago [-]
Shift+Tab still brings up the planning mode.
handfuloflight 1 days ago [-]
What's the difference between thinking and planning? Does planning lead to the use of the TodoWrite tool?
alecco 1 days ago [-]
Thinking is more tokens dedicated to thinking, while planning has a different system prompt.
dbbk 18 hours ago [-]
You can refine the plan before it starts writing any code
383toast 1 days ago [-]
plan mode has a special prompt they use
adastra22 21 hours ago [-]
Which doesn’t run Opus.
adastra22 21 hours ago [-]
But not the specific benchmarks which reflect what Plan mode does.
trumbitta2 15 hours ago [-]
The native VSCode extension has worse UX than the TUI, so I'm sticking with the TUI
jmward01 1 days ago [-]
"When you use Claude Code, we collect feedback, which includes usage data (such as code acceptance or rejections), associated conversation data, and user feedback submitted via the /bug command."
So I can opt out of training, but they still save the conversation? Why can't they just not use my data when I pay for things? I'm tired of paying and then having my information harvested anyway. Tell you what: create a free tier that harvests data as the cost of the service. If you pay, no data harvesting.
NitpickLawyer 1 days ago [-]
> So I can opt out of training
Even that is debatable. There are a lot of weasel words in their text. At most they're saying "we're not training foundation models on your data", which is not the same as "we're not training reward models" or "we're not testing our other models on your data" and so on.
I guess the safest way to view this is to consider anything you send them as potentially in the next LLMs, for better or worse.
netcoyote 1 days ago [-]
> When you use Claude Code, we collect feedback
When they ask "How is Claude doing this session?", that appears to be a sneaky way for them to harvest the current conversation based on the terms-of-service clause you pointed out.
adastra22 22 hours ago [-]
I have this same suspicion. Worse, there’s no way to opt out of giving a response.
SparkyMcUnicorn 19 hours ago [-]
If you turn off "Help improve Claude" you will never get this prompt (I never do).
That should be how this works, but unfortunately not. I have that toggle switched off, but I still regularly get this prompt.
calgoo 18 hours ago [-]
I have that option switched off but still got the 1-to-5 scorecard yesterday while working on some code.
candiddevmike 1 days ago [-]
Not your model, not your code (in their mind). Self host your models or enjoy folks trying to get the LLM to regurgitate your private codebase.
gdudeman 1 days ago [-]
This enables the /resume command that lets you start mid-conversation again.
Storing the data is not the same as stealing. It's helpful for many use cases.
I suppose they should have a way to delete conversations though.
freeqaz 1 days ago [-]
Is that not just them saving it locally to something like `~/.claude/conversations`? Feels weird if all conversations are uploaded to the cloud and retained forever.
gdudeman 1 days ago [-]
Ooo - good question. I'm unsure on this one.
adastra22 22 hours ago [-]
The conversation is stored locally. If you run Claude on two computers and try /resume, it won’t find the other sessions.
adastra22 22 hours ago [-]
They are downloaded from the cloud and just never deleted.
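For anyone who wants to check for themselves: the transcripts appear to live as JSONL files somewhere under ~/.claude/ (that exact layout is an assumption - verify on your own machine). A small sketch to list whatever is there, newest first:

```python
from pathlib import Path

def list_sessions(base: Path) -> list[Path]:
    """Return session transcript files (*.jsonl) under base, newest first."""
    # rglob on a missing directory just yields nothing, so this is safe to run
    # even if the assumed ~/.claude/projects layout doesn't exist on your box.
    return sorted(base.rglob("*.jsonl"), key=lambda p: p.stat().st_mtime, reverse=True)

# Print the five most recently touched transcripts, if any exist
for path in list_sessions(Path.home() / ".claude" / "projects")[:5]:
    print(path)
```

Deleting a conversation would then just be deleting the corresponding file, assuming nothing server-side retains a copy.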
I've always been curious: are tags like "<system-reminder>" actually useful? Is the LLM's training altered to give special meaning to specific tags when they're found?
Can a user just write those magic tags (if they knew what they were) and alter the behavior of the LLM in a similar manner?
cube2222 1 days ago [-]
Claude tends to work well with such semi-xml tags in practice (probably trained for it?).
You can just make them up, and ask it to respond with specific tags, too.
Like “Please respond with the name in <name>…</name> tags and the <surname>.”
It’s one of the approaches to forcing structured responses, or making it role-play multiple actors in one response (having each role in its tags), or asking it to do a round of self-critique in <critique> tags before the final response, etc.
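As a concrete (if contrived) sketch: since the tags are just text, you can pull them back out of the response with ordinary string matching; `extract_tag` below is a made-up helper, not part of any SDK.

```python
import re

def extract_tag(text: str, tag: str) -> list[str]:
    """Collect the contents of every <tag>...</tag> span in a model response."""
    # DOTALL so multi-line spans (e.g. a <critique> block) are captured too
    pattern = re.compile(rf"<{re.escape(tag)}>(.*?)</{re.escape(tag)}>", re.DOTALL)
    return [m.strip() for m in pattern.findall(text)]

# Simulated model output after a "respond with <name> and <surname> tags" prompt
response = "Sure! <name>Ada</name> <surname>Lovelace</surname> <critique>Spelling checks out.</critique>"

print(extract_tag(response, "name"))     # ['Ada']
print(extract_tag(response, "surname"))  # ['Lovelace']
```

The same trick works for the multi-actor role-play case: give each role its own tag and split the response apart afterwards.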
garfij 22 hours ago [-]
We use them extensively in our agent framework at work for all sorts of things. You can make up whatever you want; if the tags are semantic enough it just gets it, or you can add a bit of explanation in the system prompt or wherever.
- Circuit breakers when it seems like it's stuck in a loop
- Warnings about running low on context
- Reminders about task lists (or anything)
- All sorts of warnings about whatever
A user can append similar system reminders in their own prompt. It's one of the things the Claude Code team discovered worked, and it's now included in other CLIs like Factory, as Factory's cofounder discussed today: https://www.youtube.com/live/o4FuKJ_7Ds4?si=py2QC_UWcuDe7vPN
itsmevictor 1 days ago [-]
> If you do not use this tool when planning, you may forget to do important tasks - and that is _unacceptable_.
Okay, I know I shouldn't anthropomorphize, but I couldn't prevent myself from thinking that this was a bit of a harsh way of saying things :(
jarek83 1 days ago [-]
I notice that thinking triggers like "Think harder" are not highlighted in the prompt anymore. Could that mean that thinking is now only a single toggle with tab (no gradation)?
navanchauhan 1 days ago [-]
Ultrathink still works
gdudeman 1 days ago [-]
> New native VS Code extension
Looks great, but it's kind of buggy:
- I can't figure out how to toggle thinking
- Have to click in the text box to write, not just anywhere in the Claude panel
- Have to click to reject edits
jakebasile 22 hours ago [-]
I wish I could put it in the sidebar like every other flavor of AI plugin.
claytonjy 21 hours ago [-]
plans now open in a separate file tab, and if you don’t accept it, it just…disappears so you can’t discuss it!
ffsm8 1 days ago [-]
It seems they also removed the bypass permission setting...
alecco 1 days ago [-]
As a burnt-out, laid-off aging developer, I want to thank Anthropic for helping me fall in love with programming again. Claude Code in the terminal with all my beloved *nix tools and vim rocks.
taude 1 days ago [-]
100%. As a burnt-out manager who doesn't get a lot of spare time to actually code, it's nice to have a tool like CC where I can make actual incremental changes in the spare 15 minutes I get here and there.
I spend most of my time making version files with the prompt, but pretty impressed by how far I've gotten on an idea that would have never seen the light of day....
The thoughts of having to write input validation, database persistence, and all the other boring things I've had to write a dozen times in the past....
swalsh 1 days ago [-]
As an architect, I feel like a large part of my job is to help my team be their best, but I'm also focused on the delivery of a few key solutions. I'm used to writing tasks and helping assign them to members of the team, occasionally picking up the odd piece of work myself, while focusing more on architecture and helping individual members when they get stuck or problems come up. But with the latest coding agents, I'm always thinking in the back of my head: I could get the AI to finish this task 3x quicker, and probably at better quality, if I just did it myself with the AI. We sit in SCRUM meetings sizing tasks, and I'm thinking "bro, you're just going to paste my task description into the AI and be done in half an hour", but we size it at a day or two.
lupusreal 1 days ago [-]
Agreed, it's actually fun again. The evening hours I used to burn with video games and weed are now spent with claude code, rewriting and finishing up all my custom desktop tools and utilities I started years ago.
cevn 1 days ago [-]
I had a lot of fun making 'tools' like this, but once I settled into a complicated problem (networking in a multiplayer game), it became frustrating to watch Claude hand control back to me without accomplishing anything, over and over again. I think I need to start using the SDK to force it to do its job.
ajmurmann 1 days ago [-]
I've found that in those cases I'm likely better off doing it myself. The LLMs I've used will frequently overfit the code when it gets complicated. I'm working on a language learning app, and it will often add special-casing for the exact words occurring in the tests. In general, as soon as you leave boilerplate territory, it starts writing dirtier and dirtier code.
dceddia 1 days ago [-]
This kind of stuff is where my anxiety rises a bit. Another example like this is audio code - it compiles and “works” but there could be subtle timing bugs or things that cause pops and clicks that are very hard to track down without tracing through the code and building that whole mental model yourself.
There’s a great sweet spot though around stuff like “make me this CRUD endpoint and a migration and a model with these fields and an admin dashboard”.
jliptzin 1 days ago [-]
It’s still better letting Claude slog through all that boilerplate and skeletal code for you so that you can take the wheel when things start getting interesting. I’ve avoided working on stuff in the past just because I knew I wouldn’t be motivated enough to write the foundation and all the uninteresting stuff that has to come first.
RobCat27 20 hours ago [-]
I've enjoyed using it for coming up with the structure of a project. I'll ask in search mode for structures of other similar projects if I'm not sure. I also enjoy making human-readable .md or .txt documentation files for myself very quickly with it.
robotswantdata 1 days ago [-]
Try giving codex IDE a go, now included with ChatGPT.
Had equal frustrations with Claude making bad decisions, in contrast gpt5 codex high is extremely good!
bcrosby95 1 days ago [-]
I mean, yes. This is what Claude is good for: helping solve problems that aren't difficult or complex, just time consuming.
The thing is, a lot of software jobs boil down to work that's not difficult, just time consuming.
lupusreal 1 days ago [-]
I've got it using dbus, doing funky stuff with Xlib, working with pipewire using the pulseaudio protocol which it implemented itself (after I told it to quit using libraries for it.) You can't one-shot complicated problems, at least not without extensive careful prompting, but at this point I believe I can walk it through pretty much anything I can imagine wanting done.
protocolture 1 days ago [-]
I had ChatGPT write from spec an assignment I failed to complete during university, that has always stuck with me as something I would like to finish.
ojr 1 days ago [-]
I still spend my evening hours like that and do ai-assisted coding in the background
lupusreal 1 days ago [-]
Depends on the game tbh, having claude ping me for attention every few minutes disrupts most games too much, but with turn-based or casual games it works out well. OpenTTD and Endless Sky are fun to play while claude churns.
epiccoleman 1 days ago [-]
I really didn't need the cognitohazard thought that I could play Factorio and still somehow get things done
lupusreal 1 days ago [-]
I'm 10 months clean on factorio, been doing the 12 step program for it. Feeling real tempted to relapse now...
lisbbb 1 days ago [-]
[flagged]
alecco 1 days ago [-]
Thanks for caring. At the moment I am in a good place and luckily I don't have financial problems. My mental health is getting better thanks to a fixed schedule, sleep, diet, exercise, socializing, and walks in nature. I hope you get better soon, too.
If you want to chat with somebody, let me know.
rsanek 1 days ago [-]
Careful with trying to generalize personalized insights from therapy! Everyone is different. I believe advice giving (& receiving!) is very difficult to do well, even with people you know personally. For strangers on an internet forum, it is impossible.
dymk 1 days ago [-]
[flagged]
ollysb 1 days ago [-]
The VS Code integration does feel far tighter now. The one killer feature Cursor has over it is the ability to track changes across multiple edits. With Claude you have to accept or reject the changes after every prompt; with Cursor you can accumulate changes until you're ready to accept. You can use git, of course, but it isn't anywhere near as ergonomic.
dcreater 1 days ago [-]
Cline and its forks have that in VS Code. I use Cline with Claude as the LLM.
scottydelta 22 hours ago [-]
Thanks for the suggestion, will give this a try.
cadamsdotcom 22 hours ago [-]
Claude Code is so much better than anything else.
If Claude Code was a car it'd be the ideal practical vehicle for all kinds of uses.
If OpenAI Codex was a car, it'd be a cauldron with wheels.
The reason I say this is CC offers so many features: plan mode, hooks, escape OR ctrl-c to interrupt it, and today added quick rewind. Meanwhile Codex can't even wrap text to the width of the terminal; you can't type to it while it's working to queue up messages to steer it (you have to interrupt with Ctrl-C then type), and it doesn't show you clearly when it's editing files or what edits it's making. It's the ultimate expression of OpenAI's "the agent knows what to do, silly human" plan for the future - and I'm not here for that. I want to steer my agent, and be able to have it show me its plan before it edits anything.
I really wish the developers of Codex spent more time using Claude Code.
Tiberium 22 hours ago [-]
When did you last update Codex? You can queue up messages without interrupting, and I think a lot of other complaints you made could be already solved. They put out new Codex versions multiple times a week lately
risho 19 hours ago [-]
Codex has improved DRASTICALLY over the last 2 weeks. Your claims about it were true in the past but are far less true today. It's still missing a little bit of polish compared to Claude Code, but I suspect it's much closer today than you realize. Either way, the lack of features in Codex, even in the past, was never caused by "OpenAI knows better than you" hubris; they just hadn't implemented them yet. It's a brand new project that gets commits every single day.
sbene970 15 hours ago [-]
I agree, CC is much more polished regarding UX. I can't even scroll up in codex CLI, which is just a disaster IMO.
risho 2 hours ago [-]
yes you can
kip_ 1 days ago [-]
Tab-completion of filenames in the directory tree is now unavailable. You'll need to use the Codex-style @file to bring up an fzf-style list.
cma 1 days ago [-]
I think they had the @file thing before codex existed
nrjames 1 days ago [-]
I would really like for them to add the option to constantly display how much context is left before compression or a new session.
jasonjmcghee 1 days ago [-]
I haven't tried this, but looks like it might be possible to display it in the status line.
Looks like it now shows when the remaining context is < 50%, which is a welcome change from the 15% threshold at which it previously appeared.
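For the status-line route, the general shape (from the docs as I remember them, so treat the schema and field names as assumptions): settings.json gets a "statusLine" entry whose command receives session info as JSON on stdin, and whatever it prints is displayed. A minimal sketch:

```shell
# Hypothetical status-line helper for Claude Code. The real wiring would be a
# settings.json entry like {"statusLine": {"type": "command", "command": "..."}}
# whose command reads session JSON on stdin; the "display_name" field below is
# an assumption -- check the current docs before relying on it.
statusline() {
  # $1 stands in for the JSON Claude Code would pipe to the command
  model=$(printf '%s' "$1" | sed -n 's/.*"display_name"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p')
  printf '[%s] ready' "${model:-unknown}"
}

statusline '{"model": {"display_name": "Sonnet 4.5"}}'
```

Whether a remaining-context figure is included in that stdin JSON may depend on your version.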
neumann 1 days ago [-]
I have been using Claude Code + VS Code extensively for coding, but in the last few months it has been a frustrating downgrade compared to pasting the same prompts and code into ChatGPT.
Is this going to be the way forward - switching to whichever tool is better for a given task, codebase, or context?
It still bothers me that almost every agentic TUI is written in TS + React.
They often consume at least a few GB of RAM.
No one seems to care; everybody is trying to ship as fast as possible.
nylonstrung 19 hours ago [-]
I highly recommend Crush which is built with Go
The UX is definitely better because it uses the Bubble Tea library, which is probably the best TUI framework ever
And you can use a ton of different providers and models
jama211 18 hours ago [-]
RAM is cheap, and 99.9% of the audience that would use this is running heavy envs on powerful computers. I can totally understand why they write it that way. Better to have faster iteration and alienate 0.1% of serious users than to slow development just to cater to them.
f311a 17 hours ago [-]
Well, it's not only about RAM. Things like calculating correct diffs take more time too, but it's hidden.
ranguna 17 hours ago [-]
They use react on a cli tool?
dandaka 14 hours ago [-]
Yes, the library is @vadimdemedes/ink, which uses React to render components
Squarex 18 hours ago [-]
Well codex is written in rust.
f311a 17 hours ago [-]
Interesting, the Ink repo still states that Codex is using it:

    Who's Using Ink?
    Codex - An agentic coding tool made by OpenAI.

I guess they had initial versions written in TS?
UPD: They switched 3 months ago.
epolanski 17 hours ago [-]
> No one bothers about it.
Why would they?
h4ch1 19 hours ago [-]
Where did you get React from?
f311a 19 hours ago [-]
You can decompile it, but it's pretty well known that it uses TypeScript, React, and Ink. Same for gemini, codex and others.
m11a 18 hours ago [-]
Codex got ported to Rust
1 days ago [-]
Galanwe 1 days ago [-]
Wait, still no support for the new MCP features? How come Claude Code, from the creators of MCP, is still lacking elicitation, server logging, and progress reporting?!
singularity2001 1 days ago [-]
How do I check the version? claude once told me that it updated to version two, but I don't know if it's true:

    cl --version
    1.0.44 (Claude Code)

as expected … liar! ;)

    cl update

Wasn't that hard, sorry for bothering.
samuelknight 1 days ago [-]
I opened the CLI for about 10 seconds, the "Auto Update" status flashed. Then I restarted and it was version 2.0
adham-omran 20 hours ago [-]
I find the /usage command most interesting, as it gives you a % toward your limits and when they reset, rather than having to note all of that down and guess when you'll hit them.
pmarreck 24 hours ago [-]
I was already using jj (jujutsu) to do my own rewinds (it saves every change to every file as an unlabeled commit, assuming you set up its daemon). Would sort of prefer to continue to do that since it's far more flexible than checkpoints
swaits 13 hours ago [-]
Checkpoints also include context.
I also use jj to checkpoint. When working on a change, each time I get to a stable point I squash and start fresh with an empty change.
You can absolutely continue doing that.
lukaslalinsky 14 hours ago [-]
How do you use jj to get those checkpoints? I was experimenting with jj and Claude Code, but it was frustrating to have it run jj status all the time; I might as well tell it to do git commit all the time.
pmarreck 9 hours ago [-]
You don't need to run `jj status` all the time, but you DO need to have Watchman installed, and a config like this:

    # ~/.jjconfig.toml
    [core]
    fsmonitor = "watchman"

    [core.watchman]
    # Newer docs use the hyphenated key below:
    register-snapshot-trigger = true
14 hours ago [-]
sixhobbits 14 hours ago [-]
I haven't had time to fully play with it yet, but first impression is that it's really pretty!
/rewind is a super nice addition. That was annoying the hell out of me.
mercurialsolo 23 hours ago [-]
/rewind was a much-needed upgrade for the agent.
- Need better memory management and controls (especially across multiple repos)
- /upgrade needs better management
vanillax 1 days ago [-]
Has anyone figured out how to do Claude-style subagents without using Claude? Some sort of open-source CLI with OpenRouter or something? I want to use subagents on different LLMs (Copilot, self-hosted).
tomquirk 1 days ago [-]
Opencode
lvl155 22 hours ago [-]
I used opencode for a bit but wish they'd spend some time on QC.
aitchnyu 1 days ago [-]
Tangential: did anybody get FOMO about Aider and find a much better tool?
IceWreck 1 days ago [-]
I was using aider quite a lot from ~ 7 months ago to ~ 3 months ago.
I had to stop because they refuse to implement MCPs and Claude/Codex style agentic workflow just yields better results.
I haven't fully tested it yet, but I found it because it supports JetBrains IDE integration. It has MCPs as well.
sannysanoff 1 days ago [-]
I still use aider, because often I know better what to do.
flyinglizard 1 days ago [-]
Still loyal to aider. It just fits my style better, as a very fine tool. I have my workflow and scripts around it, switch freely between gpt-5/sonnet (a bit of gemini-2.5-pro too) and enjoying life.
I wish it was maintained by a larger team though. It has a single maintainer and they seem to be backlogged or working on other stuff. If there was an aider fork that ran forward with capabilities I'd happily switch.
That said, I haven't tried Claude Code firsthand, only saw friends using it. I'm not comfortable letting agents loose on my production codebase.
adastra22 21 hours ago [-]
> I'm not comfortable letting agents loose on my production codebase.
Why?
jakozaur 1 days ago [-]
Just use `claude update` if you already have it. Unfortunately, they removed the Plan mode where I could use Opus for planning and Sonnet for coding.
Though I will see how this pans out.
g42gregory 1 days ago [-]
I ended up not using this option anyway. I'm using B-MAD agents for planning, and it gets into a long-running planning stream where it needs permission to execute steps, so you end up running the planning in "accept edits" mode.
I use Opus to write the planning docs for 30 min, then use Sonnet to execute them for another 30 min.
sbene970 1 days ago [-]
> they removed Plan mode
This isn't true, you just need to use the usual shortcut twice: shift+tab
paulsmith 1 days ago [-]
> Unfortunately, they removed Plan mode
If I hit shift-Tab twice I can still get to plan mode
dougbarrett 1 days ago [-]
I think they meant the 'Plan with Opus' mode. Shift+Tab still works for me, and the VS Code extension lets you plan too, but the UI is _so_ slow with updates.
rafaquintanilha 1 days ago [-]
They removed the /model option where you can select Opus to plan and Sonnet to execute. But you can still Shift + Tab to cycle between auto-accept and plan mode.
adamckay 1 days ago [-]
You can use `/model opusplan` to get that behaviour back though if you do want Opus for planning and Sonnet for editing.
adastra22 21 hours ago [-]
Thank you.
throwaway314155 1 days ago [-]
Oh thank God. The parent comment made me think Plan mode was gone entirely by incorrectly stating that... Plan mode is gone entirely...
spike021 1 days ago [-]
Is Plan mode any different from telling Claude "this is what I'd like to do, please describe an implementation plan"?
That's generally my workflow; I save the results into a CLAUDE-X-plan.md, then review the plan and incrementally change it if the initial plan isn't right.
cesarvarela 1 days ago [-]
It also limits the tools available, reducing context usage and leaving more room to actually plan.
jspdown 1 days ago [-]
There's a bit of UI around it where you can accept the plan. I personally stopped using it and instead moved to a workflow where I simply ask it to write the plan in a file. It's much easier to edit and improve this way.
turnsout 1 days ago [-]
Yeah, I just have it generate PRDs/high-level plans, then break it down into cards in "Kanban.md" (a bunch of headers like "Backlog," "In-Progress", etc).
To be honest, Claude is not great about moving cards when it's done with a task, but this workflow is very helpful for getting it back on track if I need to exit a session for any reason.
spike021 1 days ago [-]
I've experienced the same thing. Usually I set up (or have it set up) a milestone/phase approach to an implementation with markdown-style checklists, but it's 50/50 whether it marks them off automatically upon completion.
dota_fanatic 1 days ago [-]
I have this in my CLAUDE.md and it works better than 50/50. Still not 100% though:

    ### Development Process
    All work must be done via TODO.md. If the file is empty, then we need to write our next todo list.
    When TODO.md is populated:
    1. Read the entire TODO.md file first
    2. Work through tasks in the exact order listed
    3. Reference specific TODO.md sections when reporting progress
    4. Mark progress by checking off todos in the file
    5. Never abbreviate, summarize, or reinterpret TODO.md tasks
    A TODO file is done when every box has been checked off due to completion of the associated task.
lupusreal 1 days ago [-]
> Unfortunately, they removed Plan mode
WTF. Terrible decision if true. I don't see that in the changelog though
lukev 1 days ago [-]
No. Plan mode still works fine.
They just changed it so you can't set it to use Opus in planning mode... it uses Sonnet 4.5 for both.
Which makes sense if it really is a stronger and cheaper model.
adastra22 21 hours ago [-]
It isn’t stronger for these sorts of reasoning tasks.
lukev 12 hours ago [-]
It is, according to the benchmarks. I'm just taking the materials they provided at face value.
If you have run your own benchmarks or have convincing anecdotes to the contrary, that would be an interesting contribution to the discussion.
xmpirate 1 days ago [-]
I wish there were an option to cancel a currently running prompt midway. Right now, pressing Ctrl+C twice ends up terminating the entire session instead.
g42gregory 1 days ago [-]
Wait, doesn't hitting Escape do this already?
cesarvarela 1 days ago [-]
Adding to the press Esc comments, if you press it twice, you can revert to previous messages in the current conversation.
turnsout 1 days ago [-]
I'm always watching Claude Code as it runs, ready to hit the Escape key as soon as it goes off the rails. Sometimes it gets stuck in a cul de sac, or misunderstands something basic about the project or its structure and gets off on a bad tangent. But overall I love CC.
qafy 1 days ago [-]
press escape
lerchmo 24 hours ago [-]
I see thinking can be toggled in the CLI, anyone figured out how to toggle it in the extension?
OddMerlin 6 hours ago [-]
How is this any different from just using claude-cli?
The example on the npm page is something you could easily do from within claude-cli...? Sorry, I must be missing the point on this??
unshavedyak 1 days ago [-]
I'm concerned that i don't see the "Plan with Opus, impl with Sonnet" feature with Claude 2.0.
grim_io 1 days ago [-]
If Sonnet 4.5 is always better than Opus 4.1, then it doesn't make sense to plan with Opus.
I hope this is the case.
clbrmbr 20 hours ago [-]
Unclear how this can be the case in general. Opus is apparently a much bigger model.
_betty_ 1 days ago [-]
VS Code plugin seems to be missing quite a number of the CLI features.
gazpachotron 20 hours ago [-]
[dead]
jarek83 1 days ago [-]
It's the first time I've been hit by "ERROR Out of memory" in CC - after about an hour of use. I'm on a Mac Pro M4 Max with 128 GB RAM...
winrid 1 days ago [-]
It's a node app, it won't use all memory by default, only a couple gigs.
oofbey 1 days ago [-]
That's a BIG workstation.
1 days ago [-]
didip 1 days ago [-]
wow, the new Claude Code UI looks beautiful. Good job Anthropic designers!
risho 1 days ago [-]
I really hate the fact that every single model has its own CLI tool. The UX for Claude Code is really great, but being stuck using only Anthropic models makes me not want to use it, no matter how good it is.
moomoo11 1 days ago [-]
How do I revert to the previous version? I find that the "claude" command in the terminal still works great, but the new native VSC extension is missing all these things (before, it would launch a terminal and run "claude").
I feel like there are so many bugs. The / commands for add-dir and others I used often are gone.
I logged in, and it still says "Login".
cute_boi 1 days ago [-]
Seems like a closed-source obfuscated blob distributed on npm to save bandwidth costs.
postalcoder 1 days ago [-]
I'm disappointed that they haven't done more to make the /resume command more usable. It's still useless for all intents and purposes.
gdudeman 1 days ago [-]
Resume is now a drop down menu at the top in the new VS Code plugin and it's much easier to read.
asadm 1 days ago [-]
ooh I like my ctrl-R in gemini cli. Good that it lands here too.
____tom____ 1 days ago [-]
What are they doing about the supply chain attacks on npm?
asadm 1 days ago [-]
That's your concern as a dev sending a patch to your repo; your IDE doesn't "address" attacks.
KaiserPro 1 days ago [-]
The same as everyone else; ignoring it and hope it goes away
drusepth 1 days ago [-]
Curious: what would, should, or could they be doing?
1 days ago [-]
oofbey 1 days ago [-]
Now if only `/rewind` could undo the `rm -rf ~/*` commands and other bone-headed things it tries to do on the filesystem when you're not watching!
acedTrex 1 days ago [-]
Wow, it's way uglier lol. And why does it default to full screen?
unshavedyak 1 days ago [-]
> why does it default to full screen?
Pardon my ignorance, but what does this mean? It's a terminal app that has always expanded to the full terminal, no? I've not noticed any difference in how it renders in the terminal.
What am i misunderstanding in your comment?
acedTrex 1 days ago [-]
A tui does not have to start full screen, v1 of claude did not take over the entire terminal, it would only use a bit at the bottom and scroll up until it was full screen.
I just downgraded to v1 to confirm this.
unshavedyak 1 days ago [-]
Weird, i use it exclusively in the terminal (in raw term, tmux and zellij) and i've not noticed any difference in behavior. On both MacOS and Linux.
Wonder what changes that i'm not seeing? Do you think it's a regression or intentional?
Pretty sure your old behavior was the broken one, though - I vaguely remember futzing with this to "fullscreen correctly" for a claude-in-docker-in-cygwin-via-MSYS2 setup a while ago.
ph4rsikal 1 days ago [-]
[dead]
user3939382 16 hours ago [-]
I’m way ahead of anthropic and all of you on orchestration but not ready to share.
mentalgear 1 days ago [-]
Well I guess I'll be sticking with opencode.
jspdown 1 days ago [-]
Do you mind telling us a bit more? I never used OpenCode, what makes it better in your opinion?
mbarneyme 1 days ago [-]
I'm consistently hitting weird bugs with opencode, like escape codes not being handled correctly so the tui output looks awful, or it hanging on the first startup. Maybe after they migrate to opentui it'll be better
I do like the model selection with opencode though
nsonha 14 hours ago [-]
- opensource with an SDK so you can build things on top of
- supports every LLM provider under the sun, including Anthropic
This is a superpower in the right hands.
I've been using Claude Code since launch, must have used it for 1,000 hours or more by now, and it's never done anything I didn't want it to do.
Why would I run it in a sandbox? It writes code for me and occasionally runs a build and tests.
I’m not sure why you’re so fixated on the “danger”, when you use these things all the time you end up realizing that the safety aspect is really nowhere near as bad as the “AI doomers” seem to make out.
You (and many, many others) likely won't take this threat seriously until adversarial attacks become common. Right now, outside of security researcher proof of concepts, they're still vanishingly rare.
You ask why I'm obsessed with the danger? That's because I've been tracking prompt injection - and our total failure to find a robust solution for it - for three years now. I coined the name for it!
The only robust solution for it that I trust is effective sandboxing.
https://gitlab.com/txlab/ai/sandcastle/
Check it out if you're experimental - but probably better in a few weeks when it's more stable.
I share your worries on this topic.
I saw you experiment a lot with python. Do you have a python-focused sandboxed devcontainer setup for Claude Code / Codex you want to share? Or even a full stack setup?
Claude's devcontainer setup (https://github.com/anthropics/claude-code/tree/main/.devcont...) is focused on JS with npm.
I wrote a bit about that in a new post this morning, but I'm still looking for an ideal solution: https://simonwillison.net/2025/Sep/30/designing-agentic-loop...
I actually preferred running stuff in containers to keep my personal system clean anyway so I like this better than letting claude use my laptop. I'm working on hosting devcontainer claude code in kubernetes too so I dont need my laptop at all.
I feel this is overly exaggerated.
There are more attacks currently being carried out through VS Code extensions than through AI prompt injection, which requires a VERY VERY complex chain of attack to leak anything.
But that's a very big if. I've seen Claude Code attempt to debug a JavaScript issue by running curl against the jsdelivr URL for a dependency it's using. A supply chain attack against NPM (and those aren't exactly rare these days) could add comments to code like that which could trigger attacks.
Ever run Claude Code in a folder that has a downloaded PDF from somewhere? There are a ton of tricks for hiding invisible malicious instructions in PDFs.
I run Claude Code and Codex CLI in YOLO mode sometimes despite this risk because I'm basically crossing my fingers that a malicious attack won't slip in, but I know that's a bad idea and that at some point in the future these attacks will be common enough for the risk to no longer be worth it.
Again, you likely use VS Code. Are you checking each extension you download? There are already a lot of reported attacks using VS Code extensions.
There's a lot of noise over hypothetical MCP or tool attacks, but the attack surface is very narrow compared to what we already run before ever reaching Claude Code.
Yes, Claude Code uses curl, and I find it quite annoying that we can't shut off the internal tools and replace them with MCPs that have filters, for better logging and the ability to proxy/block actions with more in-depth analysis.
Maybe it will never happen? I find that extremely unlikely though. I think the reason it hasn't happened yet is that widespread use of agentic coding tools only really took off this year (Claude Code was born in February).
I expect there's going to be a nasty shock to the programming community at some point once bad actors figure out how easy it is to steal important credentials by seeding different sources with well crafted malicious attacks.
The researcher has gotten actual shells on OpenAI machines before via prompt injection.
Lots of ways this could happen. To name two: third-party software dependencies, and HTTP requests for documentation (if your agent queries the internet for information).
If you don't believe me, set up a MITM proxy to watch network requests, ask your AI agent to implement PASETO in your favorite programming language, and see if it queries https://github.com/paseto-standard/paseto-spec at all.
It reads more like a buzz article about how it could happen. This is very complicated to exploit compared to classic supply-chain attacks, and very narrow!
????
What does "This" refer to in your first sentence?
Just yesterday my Cursor agent made some changes to a live Kubernetes cluster over my specific instruction not to. I gave it kubectl to analyze and find the issues with a large Prometheus + Alertmanager configuration, then switched windows to work on something else.
When I was back the MF was patching live resources to try and diagnose the issue.
In my own career, when I was a junior, I fucked up a prod database... which is why we generally don't give junior/associate people too much access to critical infra. Junior engineers aren't "dangerous"; we just don't give them too much access/authority too soon.
Claude Code is actually way smarter than a junior engineer in my experience, but I wouldn't give it direct access to a prod database or servers, it's not needed.
My way of explaining that to people is to say that it's dangerous to do things like that.
If it is not dangerous to give them this access, why not grant it?
(Having said that, I'm just a kibitzer.)
I have a cursor rule stating it should never make changes to clusters, and I have explicitly told it not to do anything behind my back.
I don't know what happened in the meantime, maybe it blew up its own context and "forgot" the basic rules, but when I got back it was running `kubectl patch` to try some changes and see if it works. Basically what a human - with the proper knowledge - would do.
Thing is: it worked. The MF found the templating issue that was breaking my Alertmanager by patching and comparing the logs. All by itself - but also by going against an explicit rule I had given it a couple of times.
So to summarize: it's useful as hell, but it's also dangerous as hell.
Problem is: I also force it to run `kubectl --context somecontext`, to keep it from using `kubectl config use-context` and switching contexts under me (if it switches the context and I miss it, I might then run commands against the wrong cluster by mistake). I have 60+ clusters, so that's a major concern.
Then I'd need a way to allowlist `kubectl get --context`, `kubectl logs --context` and so on. A bit more painful, but hopefully a lot safer.
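For what it's worth, Claude Code's own permission rules can express roughly this kind of allowlist in a project's `.claude/settings.json`. A sketch - the specific kubectl patterns are just my guess at the policy described above, and you should check the permissions docs for the current prefix-matching syntax:

```json
{
  "permissions": {
    "allow": [
      "Bash(kubectl get:*)",
      "Bash(kubectl logs:*)",
      "Bash(kubectl describe:*)"
    ],
    "deny": [
      "Bash(kubectl patch:*)",
      "Bash(kubectl config:*)"
    ]
  }
}
```

Deny rules take priority, so even an approved-looking `kubectl patch` invocation would be blocked before it reaches a shell.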
And yes, these are all "skill issues" - as in, if they had known better this wouldn't have happened to them - but I think it's fair to call these possibilities out to counterbalance the "AI is amazing and everyone should use it for everything" type narratives, so as to instil at least a little caution.
I too use it extensively. But they’re very, very capable models, and the command line contains a bunch of ways to exfiltrate data off your system if it wants to.
Yes, it was a legit safety issue and worth being aware of, but it's not like it was the general case. Red teamers worked hard to produce that result.
Was it a paper or something? Would you happen to remember the reference?
i.e. quite dangerous, but people do it anyway
You know what neighbors of serial killers say to the news cameras right?
"He was always so quiet and polite. Never caused any issues"
I've used it to troubleshoot some issues on my Linux install, but this is also why the folder sandbox gives me zero confidence that it can't still brick my machine. It will happily run system-wide commands like package managers, install and uninstall services; it even deleted my whole .config folder for PulseAudio.
Of course I let it do all these things, briefly inspecting each command, but hopefully everyone is aware that there is no real sandbox if you are running Claude Code in your terminal. It only blocks some of its tool usage, but as soon as it's using bash it can do whatever it wants.
[0]: https://ricardoanderegg.com/posts/control-shell-permissions-...
I have no way of really guaranteeing that it will do exactly what it proposed and nothing more, but so far I haven't seen it deviate from a command I approved.
Also, I think shellagent sounds cooler.
I expect the portion of Claude Code users who have a dedicated user setup like this is pretty tiny!
Not the exact setup, but also pretty solid.
As long as the supply chain is safe and the data it accesses does not generate some kind of jail break.
It does read instructions from files on the file system. I'm pretty sure it's not complex to poison its prompt that way and make it suggest building a program with malicious intent. It's just one copy-pasted prompt suggestion found on the internet away.
Instead I run it in bubblewrap sandbox: https://blog.gpkb.org/posts/ai-agent-sandbox/
Note that there need to be open-source libraries and tooling. It can't do a Dolby Atmos master, for example. So you still need a DAW.
I would like a friendlier interface than the terminal, though. It looks like the “Imagine with Claude” experiment they announced today is a step in that direction. I’m sure many other companies are working on similar products.
Also, another important factor (as in everything) is to do things in many small steps, instead of giving one big complicated prompt.
Clearly not. Just put an LLM into some basic scaffolding and you get an agent. And as capabilities of those AI agents grow, so would the degree of autonomy people tend to give them.
That is still very much the case; the danger comes from what you do with the text that is generated.
Put a developer in a meeting room with no computer access, no internet, etc., and let him scream instructions through the window. If he screams "delete prod DB", what do you do? If you end up having to restore a backup, that's on you; the dude inherently didn't do anything remotely dangerous.
The problem is that the scaffolding people put around LLMs is very weak - the equivalent of saying "just do everything the dude is telling you, no questions asked, no double checks in between, no logging, no backups". There's a reason our industry has development policies, four-eyes principles, ISO/SOC standards. There already are ways to massively improve the safety of code agents; just put Claude Code in a BSD jail and you already have a much safer environment than what 99% of people are using, and it's not that tedious to set up. Other safer execution environments (command whitelisting, argument judging, ...) will be developed soon enough.
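The "command whitelisting / argument judging" idea can be sketched in a few lines: gate every shell command the agent proposes against a policy before it ever reaches a real shell. The allowlist and rejection rules below are illustrative, not a production policy:

```python
import shlex

# Commands are allowed only if their first two tokens match a known-safe prefix.
ALLOWED_PREFIXES = {
    ("git", "status"), ("git", "diff"), ("git", "log"),
    ("kubectl", "get"), ("kubectl", "logs"),
}
SHELL_METACHARS = set(";|&$`><")

def is_allowed(command: str) -> bool:
    # Reject shell metacharacters outright, so that e.g.
    # "kubectl get pods; rm -rf /" can't sneak past a prefix check.
    if set(command) & SHELL_METACHARS:
        return False
    try:
        argv = shlex.split(command)
    except ValueError:
        return False  # unparseable input is rejected
    return len(argv) >= 2 and (argv[0], argv[1]) in ALLOWED_PREFIXES

print(is_allowed("kubectl get pods"))             # True
print(is_allowed("kubectl patch deploy/app"))     # False
print(is_allowed("kubectl get pods; rm -rf /"))   # False
```

A real gate would also need to handle flags that change a command's meaning (`kubectl get --raw`, for instance), which is where the "argument judging" half comes in.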
But are all humans in jails? No, the practical reason being that it limits their usefulness. Humans like it better when other humans are useful.
The same holds for AI agents. The ship has sailed: no one is going to put every single AI agent in jail.
The "inherent safety" of LLMs comes only from their limited capabilities. They aren't good enough yet to fail in truly exciting ways.
LLMs are in jail: an LLM outputting {"type": "function", "function": {"name": "execute_bash", "parameters": {"command": "sudo rm -rf /"}}} isn't unsafe. The unsafe part is the scaffolding around the LLM that will fuck up your entire filesystem. And my whole point is that there are ways to make that scaffolding safe. There is a reason why we have permissions on a filesystem, why we have read-only databases, etc.
One criticism of the current generation of AI is that they have no real-world experience. Well, they have an enormous amount of digital-world experience. That, actually, has more economic value.
I suppose they’re dangerous in the same way any terminal shell is dangerous, but it seems a bit of a moral panic. All tools can be dangerous if misused.
Even with approvals humans will fall victim to dialog fatigue, where they'll click approve on everything without reading it too closely.
The gap between coding agents in your terminal and computer agents that work on your entire operating system is just too narrow and will be crossed over quick.
Maybe something like bubblewrap could help
They still don't have good integration with the web browser, if you are debugging frontend you need to carry screenshots manually, it cannot inspect the DOM, run snippets of code in the console, etc.
I've seen Codex CLI install Playwright Python when I asked it to do this and it found it wasn't yet available in the environment.
It's pretty new, but so far it's been a lifesaver.
https://news.ycombinator.com/newsguidelines.html
Edit: We've had to ask you this more than once before, and you've continued to do it repeatedly (e.g. https://news.ycombinator.com/item?id=45389115, https://news.ycombinator.com/item?id=45282435). If you don't fix this, we're going to end up banning you, so it would be good if you'd please review the site guidelines and stick to them from now on.
I was under the impression that Docker container escapes are actually very rare. How high do you rate the chance of a prompt injection attack against Claude running in a docker container on macOS managing to break out of that container?
(Actually hang on, you called me out for suggesting containers like Docker are safe but that's not what I said - I said "a safe container" - which is a perfectly responsible statement to make: if you know how to run them in a "safe container" you should do so. Firecracker or any container not running on your own hardware would count there.)
That's the secret, Cap... you can't. And it's due to in-band signalling, something I've mentioned on numerous occasions. People should entertain the idea that we're going to have to re-educate people about what is and isn't possible, because the AI world has been playing make-believe so much they can't see the fundamental problems to which there is no solution.
https://en.m.wikipedia.org/wiki/In-band_signaling
True but all it will take is one report of something bad/dangerous actually happening and everyone will suddenly get extremely paranoid and start using correct security practices. Most of the "evidence" of AI misalignment seems more like bad prompt design or misunderstanding of how to use tools correctly.
You can use it for writing, data processing, admin work, file management, etc.
I compiled a list of non-coding use cases for Claude Code here:
https://github.com/paradite/claude-code-is-all-you-need
https://www.anthropic.com/news/context-management
Anyone know if these are used in Claude-Code?
Specifically, Input Method Editors are needed for CJK input (especially C and J) to convert ambiguous semi-readable forms into proper readable text; you press Enter to finalize after iterating through candidates with the spacebar. While IME engines don't interchange between different languages, I believe basically all of them roughly follow this pattern.
Unless you specifically want to exclude CJK users, you have to either detect the presence of an IME and work with it (so that Enter does nothing in the app unless the right conditions are met), or switch to Shift+Enter instead.
1: https://github.com/anthropics/claude-code/issues/8405
2: https://www.youtube.com/watch?v=mY6cg7w2eQU
3: https://youtu.be/sYAnawy_VoA?feature=shared&t=282
4: https://www.youtube.com/watch?v=VmoeZ_W3WXo
1: https://github.com/anthropics/claude-code/issues/8405#issuec...
2: https://github.com/anthropics/claude-code/issues/8466
https://en.m.wikipedia.org/wiki/CJK_characters
[ { "key": "shift+enter", "command": "workbench.action.terminal.sendSequence", "args": { "text": "\u001b\n" }, "when": "terminalFocus" }, ]
It will allow you to get new lines without any strange output.
So I've been able to shift enter. I'm using iTerm2 and zsh with CC (if that's relevant)
others say here that option/alt-enter may work? not sure why shift-enter couldn't though.
https://news.ycombinator.com/item?id=45426787
Avoids even having to do "jj new"!
https://news.ycombinator.com/item?id=45426787
Avoids having to do any jj command at all!
> Our new checkpoint system automatically saves your code state before each change, and you can instantly rewind to previous versions by tapping Esc twice or using the /rewind command.
https://www.anthropic.com/news/enabling-claude-code-to-work-...
Lots of us were doing something like this already with a combination of WIP git commits and rewinding context. This feature just links the two together and eliminates the manual git stuff.
> Checkpoints apply to Claude’s edits and not user edits or bash commands, and we recommend using them in combination with version control
Hey Claude... uh... unlaunch those
That said, having a single option that rewinds LLM context and code state is better than having to do both separately.
- you DO want your prompts and state synced (going back to a point in the prompt <=> going back to a point in the code).
Git is a non starter then. At least the repo’s same git.
Plus, you probably don't want the agent to run mutating git commands, just in case it decides to hallucinate a `push --force`.
[1] https://github.com/marckrenn/cc-mvp-prompts/compare/v1.0.128...
[2] https://x.com/CCpromptChanges/status/1972709093874757976
I should probably include that in my Claude.md instead I guess?
Interesting. This was in the old 1.x prompt, removed for 2.0. But CC would pretty much always add comments in 1.x, something I would never request, and would often have to tell it to stop doing (and it would still do it sometimes even after being told to stop).
I've considered just leaving the comments in, considering maybe they provide some value to future LLMs working in the codebase, but the extra human overhead in dealing with them doesn't seem worth it.
- like all documentation, they are prone to code rot (going out of date)
- ideally code should be obvious; if you need a comment to explain it, perhaps it's not as simple as it could be, or perhaps we're doing something hacky that we shouldn't
An example of this: assume you live in a world where the formula for the circumference of a circle has not been derived. You end up deriving the formula yourself and write a function which returns 2 * pi * radius. This is as simple as it gets, not hacky at all, and you would /definitely/ want to include a comment explaining how you arrived at your weird and arbitrary-looking "3.1415" constant.
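Sketched in Python, this is the kind of comment that earns its keep:

```python
def circumference(radius: float) -> float:
    # 3.1415...: the ratio of any circle's circumference to its diameter,
    # derived empirically (e.g. by measuring inscribed polygons). Without
    # this comment the constant would look completely arbitrary.
    PI = 3.141592653589793
    return 2 * PI * radius

print(circumference(1.0))  # -> 6.283185307179586
```

The code itself is trivial; the comment carries the derivation context that the code cannot.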
So far Claude Code's comments on my code have been completely useless. They just repeated what you could figure out from the names of the called functions anyway.
Edit: an obvious exception is public libraries to document public interfaces, and use something like JavaDoc, or docstrings, etc.
I'm wondering if tsdoc/jsdoc tags like @link would help even more for context
It's cognitively stressing, but is beneficial for juniors, and developers new to the codebase, just as it is for senior developers to reduce the mental overhead for the reader.
It's always good to spend an extra minute thinking how to avoid a comment.
Of course there are exceptions, but the mental exercise trying to avoid having that exception is always worth it.
Comments are instant technical debt.
Junior developers especially will be extremely confused and slowed down by having to read both the comment and the code, which was refactored in the meantime and does the opposite of what the comment says.
I think a happy medium of "comment brevity, and try thinking of a clearer way to do something instead of documenting the potentially unnecessary complexity with a comment" would be good.
I don't know where this "comments are instant technical debt" meme came from, because it's frankly fucking stupid, especially in the age of being able to ask the LLM "please find any out-of-date comments in this code and update them" since even the AI-averse would probably not object to it commenting code more correctly than the human did
Docstring comments are even worse, because it's so easy for someone to update the function and not the docstring, and it's very easy to miss in PR review
Good and up to date comments are good and up to date. Bad and outdated comments are bad and outdated. If you let your codebase rot then it rots. If you don't then it doesn't. It's not the comment's fault you didnt update it. It's yours.
Guard rails should be there to prevent inexperienced developers (or overworked, tired ones) from committing bad code.
"Try to think how to refactor functions into smaller ones and give them meaningful names so that everyone knows immediately what's going on" is a good enough guard rail.
That's exactly what I wrote, phrased slightly differently.
We both agree at the core.
I assume it comes from the myriad tutorial content on medium or something.
gpt-oss is the most egregious emoji user: it uses emoji for numbers in section headings in code, which was clearly a stylistic choice finetuned into the model and it fights you on removing them.
I’ve noticed Claude likes to add them to log messages and prints and with 4.5 seems to have ramped up their use in chat.
what in the world?
Here's how it works in detail: https://mariozechner.at/posts/2025-08-03-cchistory/
Here's how it works: https://mariozechner.at/posts/2025-08-03-cchistory/
The bot is based on Mario Zechner's excellent work[1] - so all credit goes to him!
[1] https://mariozechner.at/posts/2025-08-03-cchistory/
I wrote about one tool for doing that here: https://simonwillison.net/2025/Jun/2/claude-trace/
Why do you think these aren't legit?
* New native VS Code extension
* Fresh coat of paint throughout the whole app
* /rewind a conversation to undo code changes
* /usage command to see plan limits
* Tab to toggle thinking (sticky across sessions)
* Ctrl-R to search history
* Unshipped claude config command
* Hooks: Reduced PostToolUse "'tool_use' ids were found without 'tool_result' blocks" errors
* SDK: The Claude Code SDK is now the Claude Agent SDK
* Add subagents dynamically with --agents flag
[1] https://github.com/anthropics/claude-code/blob/main/CHANGELO...
I told it to crop the video to just her and remove the obscured portion and that I had ffmpeg and imagemagick installed and it looked at the video, found the crop dimensions, then ran ffmpeg and I had a video of her all cleaned up! Marvelous experience.
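The crop step boils down to ffmpeg's `crop` filter, which takes `width:height:x:y`. A sketch of the command the agent would have constructed (the geometry here is made up; the agent derived the real values by inspecting the video):

```python
def ffmpeg_crop_cmd(src: str, dst: str, w: int, h: int, x: int, y: int) -> list[str]:
    """Build an ffmpeg invocation that crops src to a w x h region
    whose top-left corner is at (x, y), writing the result to dst."""
    return ["ffmpeg", "-i", src, "-vf", f"crop={w}:{h}:{x}:{y}", dst]

cmd = ffmpeg_crop_cmd("talk.mp4", "talk-cropped.mp4", 640, 720, 0, 0)
print(" ".join(cmd))  # -> ffmpeg -i talk.mp4 -vf crop=640:720:0:0 talk-cropped.mp4
```

Building the argv as a list (rather than a shell string) also sidesteps quoting bugs if this were ever passed to `subprocess.run`.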
My only complaint is that sometimes I want high speed. Unfortunately Cerebras and Groq don't seem to have APIs that are compatible enough for someone to have put them into Charm Crush or anything. But I can't wait for that.
https://github.com/grafbase/nexus/
If Groq talks the OpenAI API, you enable the Anthropic protocol and an OpenAI provider with a base URL pointing to Groq. Set ANTHROPIC_BASE_URL to that endpoint and start Claude.
I haven't tested Groq yet, but this could be an interesting use case...
EDIT: For anyone coming here from elsewhere, Crush from Charm supports Cerebras/Groq natively!
Crush is also not a good assistant. It does not integrate scrollback with iTerm2 so I can't look at what the assistant did. The pane that shows the diff side by side is cool but in practice I want to go see the diff + reasoning afterwards so I can alter sections of it more easily and I can't do that.
Inpainting is harder on videos than on images, but there are plenty of models that can do it. Google's Veo 3 can remove objects from videos: https://deepmind.google/models/veo/
https://www.cerebras.ai/blog/introducing-cerebras-code
But you're right, they have an OpenAI compatible API https://inference-docs.cerebras.ai/resources/openai so perhaps I can actually use this in the CLI! Thanks for making me take another look.
EDIT: Woah, Charm supports this natively. This is great. I am going to try this now.
This is pretty funny while Cursor shipped their own CLI.
https://news.ycombinator.com/item?id=45377734
1: https://block.github.io/goose/
I think I lack the social skills to community-drive a fix, probably through some undiagnosed disorder or something, so I've been trying to soldier on alone on some issues I've had for years.
The issues are things like focus jacking in some window manager I'm using on Xorg, where the keyboard and the mouse get separate focuses.
Goose has been somewhat promising, but still not great.
I mean overall, I don't think any of these coding agents have given me useful insight into my long vexing problems
I think there has to be some type of perception gap or knowledge asymmetry to be really useful - for instance, with foreign languages.
I've studied a few but just in the "taking classes at the local JC" way. These LLMs are absolutely fantastic aids there because I know enough to frame the question but not enough to get the answer.
There's some model for dealing with this I don't have yet.
Essentially I can ask the right question about a variety of things but arguably I'm not doing it right with the software.
I've been writing software for decades, is it really that I'm not competent enough to ask the right question? That's certainly the simplest model but it doesn't check out.
Maybe in some fields I've surpassed a point where llms are useful?
It all circles back to an existential fear of delusional competency.
I've hit this point while designing developer UX for a library I'm working on. LLMs can nail boilerplate, but when it comes to dev UX they seem to not be very good. Maybe that's because I have a specific vision and some pretty tight requirements? Dunno. I'm in the same spot as you for some stuff.
For throwaway code they're pretty great.
They seem autonomous but often aren’t.
[0] https://cognition.ai/blog/devin-sonnet-4-5-lessons-and-chall...
Sonnet 4.5 is beating Opus 4.1 on many benchmarks. Feels like it's a change they made not to 'remove options', but because it's currently universally better to just let Sonnet rip.
So I can opt out of training, but they still save the conversation? Why can't they just not use my data when I pay for things. I am tired of paying, and then them stealing my information. Tell you what, create a free tier that harvests data as the cost of the service. If you pay, no data harvesting.
Even that is debatable. There are a lot of weasel words in their text. At most they're saying "we're not training foundation models on your data", which is not to say "we're not training reward models" or "we're not testing our other-data models on your data" and so on.
I guess the safest way to view this is to consider anything you send them as potentially in the next LLMs, for better or worse.
When they ask "How is Claude doing this session?", that appears to be a sneaky way for them to harvest the current conversation based on the terms-of-service clause you pointed out.
https://claude.ai/settings/data-privacy-controls
Storing the data is not the same as stealing. It's helpful for many use cases.
I suppose they should have a way to delete conversations though.
I've always been curious. Are tags like that one: "<system-reminder>" useful at all? Is the LLM training altered to give a special meaning to specific tags when they are found?
Can a user just write those magic tags (if they knew what they are) and alter the behavior of the LLM in a similar manner?
You can just make them up, and ask it to respond with specific tags, too.
Like “Please respond with the name in <name>…</name> tags and the <surname>.”
It’s one of the approaches to forcing structured responses, or making it role-play multiple actors in one response (having each role in its tags), or asking it to do a round of self-critique in <critique> tags before the final response, etc.
Okay, I know I shouldn't anthropomorphize, but I couldn't prevent myself from thinking that this was a bit of a harsh way of saying things :(
Looks great, but it's kind of buggy:
- I can't figure out how to toggle thinking
- Have to click in the text box to write, not just anywhere in the Claude panel
- Have to click to reject edits
I spend most of my time making version files with the prompt, but pretty impressed by how far I've gotten on an idea that would have never seen the light of day....
The thoughts of having to write input validation, database persistence, and all the other boring things I've had to write a dozen times in the past....
There’s a great sweet spot though around stuff like “make me this CRUD endpoint and a migration and a model with these fields and an admin dashboard”.
The thing is a lot of software jobs boil down to not difficult but time consuming.
If you want to chat with somebody, let me know.
If Claude Code was a car it'd be the ideal practical vehicle for all kinds of uses.
If OpenAI Codex was a car, it'd be a cauldron with wheels.
The reason I say this is CC offers so many features: plan mode, hooks, escape OR ctrl-c to interrupt it, and today added quick rewind. Meanwhile Codex can't even wrap text to the width of the terminal; you can't type to it while it's working to queue up messages to steer it (you have to interrupt with Ctrl-C then type), and it doesn't show you clearly when it's editing files or what edits it's making. It's the ultimate expression of OpenAI's "the agent knows what to do, silly human" plan for the future - and I'm not here for that. I want to steer my agent, and be able to have it show me its plan before it edits anything.
I really wish the developers of Codex spent more time using Claude Code.
https://www.reddit.com/r/ClaudeAI/comments/1mlhx2j/comment/n...
Is this going to be the way forward? Switching to whichever is better at a task, code base or context?
The UX is definitely better because it uses the Bubble Tea library, which is probably the best TUI framework ever
And you can use a ton of different providers and models
Update: they switched 3 months ago.
Why would they?
    cl --version
    1.0.44 (Claude Code)
as expected … liar! ;)
cl update
Wasn't that hard, sorry for bothering
I also use jj to checkpoint. When working on a change, each time I get to a stable point I squash and start fresh with an empty change.
You can absolutely continue doing that.
- Need better memory management and controls (especially across multi-repos)
- /upgrade needs better management
I haven’t fully tested it yet, but I found it because its supports JetBrains IDE integration. It has MCPs as well.
I wish it was maintained by a larger team though. It has a single maintainer and they seem to be backlogged or working on other stuff. If there was an aider fork that ran forward with capabilities I'd happily switch.
That said, I haven't tried Claude Code firsthand, only saw friends using it. I'm not comfortable letting agents loose on my production codebase.
Why?
Though I will see how this pans out.
I use Opus to write the planning docs for 30 min, then use Sonnet to execute them for another 30 min.
This isn't true, you just need to use the usual shortcut twice: shift+tab
If I hit shift-Tab twice I can still get to plan mode
that's generally my workflow and I have the results saved into a CLAUDE-X-plan.md. then review the plan and incrementally change it if the initial plan isn't right.
To be honest, Claude is not great about moving cards when it's done with a task, but this workflow is very helpful for getting it back on track if I need to exit a session for any reason.
### Development Process
All work must be done via TODO.md. If the file is empty, then we need to write our next todo list.
When TODO.md is populated:
1. Read the entire TODO.md file first
2. Work through tasks in the exact order listed
3. Reference specific TODO.md sections when reporting progress
4. Mark progress by checking off todos in the file
5. Never abbreviate, summarize, or reinterpret TODO.md tasks
A TODO file is done when every box has been checked off due to completion of the associated task.
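The "done when every box is checked" rule is easy to verify mechanically, assuming GitHub-style checkboxes (`- [ ]` pending, `- [x]` done) in TODO.md; a sketch:

```python
import re

def todo_done(markdown: str) -> bool:
    """True if TODO.md contains at least one checkbox and all are checked."""
    boxes = re.findall(r"^\s*[-*] \[([ xX])\]", markdown, re.MULTILINE)
    return bool(boxes) and all(b in "xX" for b in boxes)

print(todo_done("- [x] write tests\n- [x] fix lint"))  # True
print(todo_done("- [x] write tests\n- [ ] fix lint"))  # False
```

A check like this could run as a hook or in CI, so "the TODO file is done" stops being a judgment call.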
WTF. Terrible decision if true. I don't see that in the changelog though
They just changed it so you can't set it to use Opus in planning mode... it uses Sonnet 4.5 for both.
Which makes sense if it really is a stronger and cheaper model.
If you have run your own benchmarks or have convincing anecdotes to the contrary, that would be an interesting contribution to the discussion.
I hope this is the case.
I feel like there's so many bugs. The / commands for add-dir and others I used often are gone.
I logged in, it still says "Login"
Pardon my ignorance, but what does this mean? It's a terminal app that has always expanded to the full terminal, no? I've not noticed any difference in how it renders in the terminal.
What am i misunderstanding in your comment?
I just downgraded to v1 to confirm this.
Wonder what changes that i'm not seeing? Do you think it's a regression or intentional?
pretty sure your old behavior was the broken one tho - i vaguely remember fiddling with this to "fullscreen correctly" for a claude-in-docker-in-cygwin-via-MSYS2 setup a while ago
I do like the model selection with opencode though
- supports every LLM provider under the sun, including Anthropic
- has built-in LSP support https://opencode.ai/docs/lsp