

Software developer and data geek with 18+ years delivering web, mobile, and defense systems that ship to production. My focus is on analytics platforms, APIs and developer tooling. My open-source work ranges from compilers and automation frameworks to GIS data products. I weave AI-assisted workflows into day-to-day engineering to accelerate delivery and quality.
Last month, Andrej Karpathy tweeted about "vibe coding": giving in to the vibes, letting AI write the code, barely touching the keyboard. It got 4.5 million views. Andrew Ng called it "a bad name for a deeply intellectual exercise." Reddit and Hacker News split into camps. The dev community is still divided on it, and I'm not a fan of the name either; it carries the connotation of coding while high.
I'd been using Copilot since 2022, but only trusted it for autocomplete: tab-completing boilerplate, filling in obvious patterns. Sometime around mid-2024 the models got good enough that I started copying whole functions out of Codex and dropping them into my projects. Then I started describing features in Aider (a terminal-based take on Cursor) and iterating on the output. With early versions there were a ton of iterations, and it was common for agents to make silly mistakes like dumping code from an unrelated framework into the project. It was exactly as the memes would suggest: 20% coding, 80% debugging some silly assumption the bot made. With time it got better: fewer corrections, fewer iterations.
By the time I started building my Obsidian plugin, I already had a few months of semi-vibe-coding under my belt. With Aider's help, I had refactored some of the Investomation code I discussed in my Smarter, Fast, Better post (the bots are actually surprisingly good at refactoring, thanks to their ability to pattern off existing code). Back then I was using Gemini, which is notorious for hallucinations, even in the Pro version. Whatever the cause, there was definitely a bit of "vibe" to this type of coding. It felt both more productive and more error-prone. I could now paint with bigger brush strokes, but there was also more collateral damage: good enough for a throwaway project, but not something you could trust with complex logic.
AI has a weird way of failing. It will recommend a novel approach, then fail to implement it correctly, or miss a crucial detail in a way no human would. Earlier versions were even worse. Sometimes you'd ask it to rewrite a small portion of the code and it would rewrite the entire module, making old bugs irrelevant but introducing new ones in the process. Regenerating code felt like rolling dice, not like actually writing it. The "vibe" part was both a blessing and a curse: you couldn't really control the outcome, not reliably at least. And with the earlier systems, the more you tried to control for it, the worse it got, both because of context window limitations and because simply telling a GPT-style model not to focus on something brought its attention to it. These agents are built on attention; telling them not to do something is like telling a kid not to think about candy.
Organization is my Achilles heel. When I was promoted to Senior Engineering Manager and given six direct reports, I knew I'd be in Peter principle territory. I was ill-equipped for the role. I enjoy solving challenging architectural problems, not filling out TPS reports and tracking the workload of six other people. Every organization system I tried failed me. Traditional filing systems require commitment before you see benefits; there is an upkeep cost. Forgetting to use one is like letting go of a full water bucket halfway out of the well: it resets your progress. Even GTD, one of the simplest and most intuitive systems, requires commitment to a filing cabinet and an inbox basket. In the digital age, that's not practical.
I wanted to build something different: a system that self-organizes algorithmically, the way a search engine does, rather than one that depends on the human to do the filing. I had one in my head, and vibe coding presented the perfect opportunity to build it:
Even if the project goes nowhere, the days of regular programming are numbered. These systems will only get better, and in a couple of years I'll be learning this anyway, only no longer on my terms.
This felt like the perfect project. It's a clean slate, no legacy code to break, every feature is new. And even if the output is messy, that's fine. The alternative isn't clean hand-written code, it's no plugin at all. Worst case, I end up with something functional but ugly and get real practice with a skillset I need to develop. That's a win either way.
The plugin needed a lot of boilerplate: settings tabs, file I/O wrappers, modal dialogs, command registration. AI cranked through all of it in minutes. The output was never clean (functions were longer than they needed to be, logic was duplicated across files, performance sucked), but it worked. And "works but messy" beats "doesn't exist" when I'm building something for myself; I could always tidy it later. The build loop felt noticeably faster: I'd describe a feature, get a rough version, tweak the prompt, get a better version. What used to be a weekend of reading Obsidian API docs and writing TypeScript by hand became an afternoon of iterating. But the last 20% always needed me. The AI would get the intent right and botch the details: a modal that opens but doesn't populate its fields, a search that returns results but drops some. Every feature shipped only after I sat down and finished what the AI started.
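To give a sense of what "boilerplate" means here, the sketch below mirrors the command-registration shape from Obsidian's plugin API. It's a hedged illustration, not code from the actual plugin: the real base class comes from the `obsidian` package, so a tiny `PluginStub` stands in for it to keep the example self-contained, and `MyPlugin`, `MySettings`, and the `show-greeting` command are made-up names.

```typescript
// Minimal stub standing in for Obsidian's Plugin base class, so this
// sketch runs outside the app. In a real plugin you'd extend Plugin
// from the 'obsidian' package instead.
interface Command { id: string; name: string; callback: () => void; }

class PluginStub {
  commands: Command[] = [];
  addCommand(cmd: Command): void { this.commands.push(cmd); }
}

// Hypothetical settings object, the kind of thing a settings tab edits.
interface MySettings { greeting: string; }
const DEFAULT_SETTINGS: MySettings = { greeting: 'hello' };

// The shape the AI cranks out over and over: settings plus command registration.
class MyPlugin extends PluginStub {
  settings: MySettings = { ...DEFAULT_SETTINGS };

  onload(): void {
    this.addCommand({
      id: 'show-greeting',
      name: 'Show greeting',
      callback: () => console.log(this.settings.greeting),
    });
  }
}

const plugin = new MyPlugin();
plugin.onload();
plugin.commands[0].callback(); // prints "hello"
```

Repetitive and shallow, which is exactly why the AI handles it well: there are thousands of public plugins following this template to pattern-match against.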
AI leans on its training data. Anything with a lot of online examples (standard Obsidian plugin patterns, common TypeScript idioms, API calls with well-documented endpoints), the AI nailed. But when I needed something specific to my plugin's architecture, it floundered. There was no Stack Overflow thread to pattern-match against, so it guessed, and the guesses were bad. The other thing I kept running into was organization. The AI would pile everything into one giant file, 800 lines of mixed concerns, like a junior dev who hasn't learned why separation matters. I'd have to explain where logic should live, which module owns which responsibility, how the pieces connect. Left to its own devices, it would draw boundaries that made no intuitive sense, splitting by arbitrary criteria instead of by domain.
AI still gets complex implementations wrong. It fails to respect interfaces, wires up functions incorrectly, and mixes incompatible implementations. This is especially true with third-party libraries: it's not uncommon for it to hallucinate some React into your Svelte project, call a Leaflet method that doesn't exist, or invert latitude and longitude. The more complex the project, the higher the error rate, in large part due to context window limits.
State management bugs are common, and so are race conditions (although ironically, it's surprisingly good at fixing race conditions once I call them out). If you point out its flaws, it will acknowledge the poor decision and enthusiastically offer a solution, but you need expertise to identify what's wrong in the first place. Attention is the name of the game.
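The classic race here is overlapping async requests resolving out of order, and the classic fix the AI will happily apply once prompted is a "latest request wins" token. The sketch below is illustrative only; `runSearch`, `latestToken`, and `currentResult` are made-up names, not code from the plugin.

```typescript
// "Latest request wins" guard: stamp each request with a token and
// discard any response whose token is no longer the newest.
let latestToken = 0;
let currentResult: string | null = null;

async function runSearch(
  query: string,
  fetcher: (q: string) => Promise<string>,
): Promise<void> {
  const token = ++latestToken;         // stamp this request
  const result = await fetcher(query); // may resolve after a newer request
  if (token !== latestToken) return;   // superseded by a newer request: drop it
  currentResult = result;
}

// Demo: the older query resolves last, but its stale result is discarded.
const delayed = (ms: number) => (q: string) =>
  new Promise<string>(resolve => setTimeout(() => resolve(q), ms));

Promise.all([
  runSearch('old', delayed(50)), // fired first, resolves second
  runSearch('new', delayed(10)), // fired second, resolves first
]).then(() => {
  console.log(currentResult); // 'new', the slow 'old' response was dropped
});
```

Without the token check, the slow `'old'` response would land last and clobber the newer state, which is exactly the bug that shows up in AI-generated search and autocomplete code.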
Its architecture is often a house of cards. Each layer of generated code adds assumptions the AI made that I didn't validate, and those assumptions compound. By the time I realize the foundation is wrong, I've built three floors on top of it. I've learned this the hard way more than once since I started experimenting last summer.
Vibe coding, even in its current crappy condition, lets developers iterate faster; it's a force multiplier. It's to modern programming what modern programming is to writing assembly. The latter skill is still in demand, but it won't be the default mode of operation for long. Just as it doesn't make sense to hand-write a sort outside of academia, it no longer makes sense to hand-write CRUD logic. But if you decide to vibe-code your kernel, you're in for a world of hurt (at least for now). The developers who dismiss it aren't wrong about quality; most of them use AI daily, and their actual objection is to shipping unreviewed code, which is fair. But dismissing the approach misses the bigger shift: the default skill is moving from writing code to evaluating it, knowing when AI output is solid and when it needs to be thrown away.
After a few months of this, the pattern is pretty clear:
I fly DJI drones, and the parallel is hard to ignore: a beginner can get technically flawless aerial footage on day one, but "sharp and stable" doesn't mean "worth watching." The skill was never in the flying; it's in knowing which shots to keep and which to cut. Vibe coding is the same shift. It's not the death of programming skill, it's a migration of where that skill gets applied. Karpathy just gave it a catchy name.
If you want to follow the plugin's development, the repo is here: https://github.com/atsepkov/obsidian-plus.