
I had an AI identity crisis at a hackathon, so I made it everyone’s problem

Wikimedians deciphering the code I put in my logo at the Wikimedia Northwestern Europe Hackathon

Summary

I recently upgraded a five-year-old tool called Depictor in two days using AI. Something I’d struggled with for years, done in a weekend. That triggered a small identity crisis and led me to organize a workshop at the Wikimedia Hackathon in Milan around how we as a community should deal with the paradigm shift of AI.

This article is a write-up of those sessions and my thinking around them. It covers the good (lower barriers for non-coders, accessibility wins), the bad (scrapers hammering our servers, slop PRs drowning maintainers), and the hard questions about digital sovereignty, whether coding loses something when it stops being social, and where individual ethics end and community standards begin. The bottom line: we need to stick to our values, be more specific about the language we use, and most importantly remember that trust, not article count or technical innovation, is what the Wikimedia movement has to protect.

This article will take you about 20 minutes to read (~4,000 words).

Where it all started

Six weeks ago I went to the Wikimedia Northwestern Europe Hackathon in Arnhem, the Netherlands. I had a blast. I designed the logo, which included a hidden code. Of course people tried uploading the logo to their LLM of choice, like ChatGPT or Gemini, to figure it out that way, but that didn’t work. So at 1 am participants were in the lobby trying to decipher the code, and together they cracked it. You know, actual humans, doing actual work.

But something else also happened during that hackathon. To explain what happened we first need to go back in time, to 2021, when I made a tool called Depictor. It lets you easily add structured “depicts” statements to files on Wikimedia Commons. It’s a very popular tool: almost three million edits have been made with it. One user has done almost half of those edits. Our community of volunteers never ceases to amaze me.

There were a couple of issues with Depictor though. Back in 2021 I wrote the code using a JavaScript framework called Vue, specifically Vue 2. Vue 2 reached end of life at the end of 2023, which meant I needed to upgrade the code to Vue 3. I made several attempts, but it was a lot of work and I never finished it. So at the start of the hackathon, the codebase was basically the same as it had been five years earlier.

At the hackathon I met a fellow Wikimedian, and together we worked on Depictor. In two days, we upgraded all the code from Vue 2 to Vue 3. And not just that: we also upgraded the outdated PHP version, converted the entire codebase to TypeScript, and even fixed a few bugs and added some small new features.
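To give a sense of what that migration involves: this is not Depictor’s actual code, just a minimal sketch of the best-known breaking change, in the TypeScript we converted to. Vue 3 replaces the single global Vue instance with explicit app instances.

    // Vue 2 bootstrapped one global instance:
    //   import Vue from 'vue';
    //   new Vue({ render: (h) => h(App) }).$mount('#app');

    // Vue 3 creates an explicit app instead:
    import { createApp } from 'vue';
    import App from './App.vue';

    createApp(App).mount('#app');

Multiply that kind of change by every component, plugin and build step, and you can see why my earlier attempts kept stalling.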

Two days for something I’d struggled with for two or three years. How is that possible? Well, you can probably guess. The answer, of course, is AI.

I’d done some things with AI before, a few projects and small snippets of code, but I’d never taken a big application with a deprecated codebase and turned it into something functional. I really wasn’t expecting it to be possible in two days. Or maybe I knew it was possible in theory, but I hadn’t experienced it myself.

This led to a small identity crisis. Because if an AI can do my work, what is my job? What is my purpose? I started coding when I was five years old. It’s how I earn my living. So, what do I do now that a machine can do my work as well?

Paradigm shift

Of course I’m not the only one who has identified AI as a threat, or at least a fundamental change in how we as a community work. Wikimedian Christophe Henner put this well in an essay he wrote earlier this year, framing AI not as a tool but as a paradigm shift.

Writer and entrepreneur Anil Dash also wrote an essay in which he distinguishes between two kinds of coders: people who code because it’s their job and want to earn money with it, and people who write code because they like it, because they enjoy the craft. I firmly believe most Wikimedia developers fall into that last category. Dash also mentions the role of grief. A kind of labor and craft we were proud of as developers has seemingly been taken from us: the satisfaction of figuring something out yourself. Now a machine can do it. And that’s, well, just pretty fucked up.

Meaningful applications

I’ve had many thoughts since the events of the Arnhem hackathon. I don’t think you can say AI and LLMs are either good or bad. It’s all much more nuanced, like every important societal and technological change.

People who are technical but can’t code are now able to do things they couldn’t before. My wife works as a digital project lead and can now just ask an LLM to write her some Python code or a complicated Excel formula to analyze complex data instead of finding a developer to do it for her. AI is empowering a whole group of people who can now do things they could never do before.

For some people the implications might be even bigger. I recently came across an article about stand-up comedian JT, who has almost 100,000 followers on TikTok. He has cerebral palsy and has difficulty talking, so his videos need to be subtitled, a job that used to take him 3 to 4 hours, until a volunteer vibe-coded an app that uses a speech-to-text model to cut that down to 15 minutes. A very meaningful application of technology.
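I don’t know what that app actually looks like under the hood, but as a hypothetical sketch of the idea: with a hosted speech-to-text model like OpenAI’s Whisper, generating timestamped subtitles comes down to a few lines. This assumes the openai Node package and an API key; the function name is mine.

    // Hypothetical sketch: transcribe a video's audio track into
    // SubRip subtitles using OpenAI's Whisper API. Not JT's actual app.
    import fs from 'node:fs';
    import OpenAI from 'openai';

    const client = new OpenAI(); // reads OPENAI_API_KEY from the environment

    async function subtitles(audioFile: string): Promise<string> {
      const result = await client.audio.transcriptions.create({
        file: fs.createReadStream(audioFile),
        model: 'whisper-1',
        response_format: 'srt', // timestamped SubRip subtitles
      });
      // With a non-JSON response_format the API returns a plain string.
      return result as unknown as string;
    }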

Darker sides

But obviously there is also a darker side to AI, and this is also a reason I wanted to organize a workshop during the International Hackathon in Milan this May. After the Arnhem hackathon both Lucas Werkmeister and Taavi Väänänen wrote recaps. They both mentioned the mixed feelings they had about people proudly telling other Wikimedians about the projects they ‘vibe-coded’.

Because all of those LLMs used for vibe-coding are trained on the hard work of Wikimedians, who never saw a penny or any kind of attribution from the big tech companies building those models. They just pointed their scrapers at the Wikimedia sites and got it all for free. This causes real technical problems too: the enormous amount of scraper traffic hitting the Wikimedia servers has forced the engineers to take drastic measures. We felt this ourselves at the hackathon in Arnhem: we couldn’t run some tools because rate limits had kicked in.
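There’s little a tool author can do about the scrapers themselves, but the episode is a good reminder of what a well-behaved API client looks like: identify yourself and back off when the servers push back. A minimal sketch; the tool name and contact address are placeholders.

    async function politeFetch(url: string, maxRetries = 3): Promise<Response> {
      for (let attempt = 0; ; attempt++) {
        const res = await fetch(url, {
          headers: { 'User-Agent': 'ExampleTool/1.0 (maintainer@example.org)' },
        });
        // On HTTP 429, honour Retry-After if the server sent a delay in
        // seconds, otherwise back off exponentially; after maxRetries
        // attempts, return the rate-limited response to the caller.
        if (res.status !== 429 || attempt >= maxRetries) return res;
        const seconds = Number(res.headers.get('retry-after')) || 2 ** attempt;
        await new Promise((resolve) => setTimeout(resolve, seconds * 1000));
      }
    }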

We have this wonderful vision as Wikimedians, imagining a world in which every single human being can freely share in the sum of all knowledge. That vision speaks of humans. It says nothing about AI scrapers slurping up all our data.

We should also mention the issue of sloppification. Many open source projects are currently overrun by quickly generated slop pull requests and patches, forcing already overworked volunteer maintainers to spend their precious time reviewing low-effort submissions instead of working on valuable changes to the projects.

And let’s not even get started about the amount of energy data centers need, or the questionable political biases and government ties of some of these models.

We’ve been here before

One thing I realized when thinking about this subject is that we as humanity have been here many times before, with technology taking over work previously done by humans.

A particularly apt example took place almost thirty years ago, in May 1997, when a computer called Deep Blue won a chess match against world champion Garry Kasparov.

Deep Blue co-creator Feng-hsiung Hsu wrote a 2002 book about building the machine that beat Kasparov. Two observations from this book feel particularly apt right now.

Man as performer vs man as toolmaker

The first is the reframing of the debate. In the media the debate was framed as ‘human vs machine’. Can a machine beat a human? Can a computer play better chess than a human?

Hsu says we shouldn’t see it like this. People tend to forget that there are actual people who built the computer that beat Garry Kasparov. It’s not man versus machine; it’s man as performer, Garry Kasparov performing the act of playing chess, versus man as toolmaker, the people who built Deep Blue.

You could say we’re now in a similar situation. We as developers are the performers. And the people building today’s LLMs, training models on the content we wrote, are the toolmakers taking over from us. It’s not really man versus machine; it’s one group of humans, with a different set of tools, taking over the work of another group of humans. That also ties back to the scraping: the toolmakers built their tools on top of what we, the performers, made, just like Deep Blue was developed using countless grandmaster games for reference.

Productive coexistence

The second observation is actually from the foreword to the 2022 edition, by computer scientist Jon Kleinberg.

Kleinberg identifies three challenges: how to anticipate consequences, how to make sure we can explain the decisions AI systems make, and how to ensure productive coexistence between the behavior of AI and the behavior of human beings. I especially loved that last point, so I decided to use that as the basis for the workshops I held during the Wikimedia Hackathon in Milan on May 1st.

The workshop

The workshop I held in Milan was attended by around 50 to 70 people. So many people tried to fit into the room that we were in danger of violating fire regulations, so I had to give it twice. In the end that was probably for the better: both sessions tackled the same question, but each explored some points more broadly than the other.

I asked the group to answer the following question using the 1-2-4-All workshop method:

What kind of community do we want to be in five years, and what does “productive coexistence” with AI systems look like for us?

During the second workshop I was kindly informed that my question was biased: asking what “productive coexistence with AI” would look like for us already implies that we will be using AI systems, which might nudge people toward a position that is too positive rather than neutral. Fortunately Wikimedians are very smart and critical, so I hope that compensated for it somewhat.

Both sessions resulted in dozens of subjects and talking points (good luck deciphering my handwriting). I’ll try to summarize them here. The points came from many people. I hope I’ve represented them fairly.

What we talk about when we talk about AI

One point that kept coming back again and again was that we desperately need better words, statistics and comparisons when we talk about AI. “AI” has become a blanket term, just like ‘big data’ or ‘web 2.0’ before it. Big tech companies have poured marketing money into framing AI as an indispensable technology everyone needs to invest in. That, of course, serves their own interest: if we don’t buy into the hype, the bubble will burst and all the billions of investment money will have been for nothing. There have been many signs this will happen sooner or later. Five years (the horizon I gave in the workshop question) seems like a plausible window.

AI is supposed to improve productivity and efficiency, but studies suggest those gains are vastly overrated. We badly need actual statistics on productivity and efficiency gains, and concrete metrics on the specific AI-based tools we use or plan to use.

With any new technology we make comparisons to the ones that already exist. And when talking about the advantages and disadvantages, we try to figure out whether things are really ‘new’ or just a different flavour. Are agents the same as bots? What is the difference between generating AI slop code and writing sloppy code? We complain now about people who vibe-code and don’t understand what their code does. But isn’t that the same as people in the 1980s complaining that you could only understand a computer if you had soldered it yourself?

One thing that is clear is that AI is not neutral. Technology never is. Neil Postman wrote about this in his seminal 1985 work Amusing Ourselves to Death, about the influence of entertainment and television on society.

So maybe we need new comparisons and language. I liked the suggestion of comparing the software stack to the Eiffel Tower (appropriately, Wikimania this year is happening in Paris): we’ve built layer upon layer, and now we’re somewhere near the top. We started with machine language and assembly, then higher-level languages like C and C++, and from there things got more and more abstract, until today we can talk to our machines in natural language. But now we’re looking down and asking ourselves whether we still have the right foundation.

Another, darker comparison was to the nuclear bomb: a technology that we developed and used, and afterwards collectively decided never to use again, even though both the fear and the possibility of it happening are still with us.

Digital sovereignty

The Wikimedia movement has always been uniquely positioned as one of the few large-scale online projects that consistently chooses open source solutions instead of relying on the products of big tech. Especially with the current political situation in the U.S., there has been talk (outside of the Wikimedia movement as well) of becoming ‘digitally autonomous’ or gaining ‘digital sovereignty’. And this need is urgent: recent research has shown that nearly the entire Dutch government relies on American cloud services, and the Netherlands is currently in the process of moving its national digital identity service to an American company. I don’t think I need to explain to a Wikimedia audience why that is a bad thing.

It’s obvious that the same trend might also happen with AI systems. What happens if there’s a monopolist or duopolist in the AI space? OpenAI might be bought by Microsoft, and the remaining survivors will raise prices. You can already see this phenomenon with Anthropic banning the use of popular tools because its terms of service don’t allow it.

I think most workshop attendees agreed that we can’t have that as the Wikimedia movement. We can’t have vendor lock-in with big tech companies. We need to keep our own independent tech stack.

The question is whether we as a movement want to invest the large sums of money needed to train our own models. The answer is probably no, but then the question becomes: who do we partner with? Some of the open-weight models (which you can run on your own hardware), like Qwen, Gemma, Mistral and DeepSeek, are quite good. However, their makers rarely publish the training data, which is essential for understanding what kind of bias a model might have.

It’s an open secret why that training data is rarely published: it was collected by dubious means. The frontier models (ChatGPT, Claude, Gemini, etc.) are trained on a dataset that is basically the whole internet (including pirated content), because their makers believe (or at least tell us) that it’s the only way to deliver the expected quality.

Whether that’s true or not is an open question, but it is clear that people’s expectations of how good AI models can be are based on the frontier models they already know. Of course this is a problem many open source projects face (related to Jakob’s Law). But the gap between the frontier models and the ‘completely open’ models seems especially big.
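To make ‘run on your own hardware’ concrete: a minimal sketch, assuming a local Ollama server with an open-weight model already pulled (for example with ollama pull qwen2.5). Nothing here leaves your own machine.

    // Ask a locally hosted open-weight model a question through Ollama's
    // HTTP API (it listens on port 11434 by default). The model is an example.
    async function askLocalModel(prompt: string): Promise<string> {
      const res = await fetch('http://localhost:11434/api/generate', {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({ model: 'qwen2.5', prompt, stream: false }),
      });
      const data = await res.json();
      return data.response; // the model's full reply as one string
    }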

Coding as a social activity

I mentioned earlier that a positive side of AI is that technical people who can’t code might now be able to participate more in our community. The line between ‘techie’ and ‘non-techie’ is becoming blurrier, and I think that’s a good thing.

Onboarding might get easier, especially for junior developers. Instead of reading vast heaps of (sometimes outdated) documentation and figuring things out by trial and error, people can now use an LLM to generate context-specific examples.

But again, there are also concerns. Being able to ask an LLM for help might mean less interaction and less learning from seeing how other people get things done. The cliché of the solitary coder is sometimes true, but the hundreds of developers walking around at a hackathon like this prove otherwise. Developers are social people too, and coding can be a social activity, whether in person at a conference or online through code reviews and discussions. Removing many of the incentives to do that (because you can just ask an LLM) might make people less social and less eager to talk to other coders.

Senior developers can assess the quality of LLM-generated code because of their experience. But if you have little experience, how do you do that? And how do you create a solid understanding of technology if all you ever do is work together with an AI chatbot?

So, to put it bluntly: is AI making us dumber? There is some research on this. It seems that, in general, passive consumption of text, LLM-generated or not, makes you dumber: if you’re just copy-pasting stuff and not learning from it, you’re not getting smarter. On the other hand, other studies indicate that AI-driven tutoring might outperform active learning classes.

So once again, nuance and ‘it depends’ are the appropriate answers. Because ‘AI’ is such a blanket term, we just can’t assume every use of it is bad or good.

The human backbone

Just as we can’t say that AI is good or bad, there’s no way of categorically saying when to use it and when not to. We need to be specific about where we use AI, how, and in what context and setting. Comparing vibe-coded prototypes to critical production code is comparing apples to oranges.

This human backbone, as it was called in one session, or “human in the loop”, as we called it in the other, is super important. You, as a coder, but most importantly as a human, keep responsibility for the code you’ve written. If you want to write sloppy, awful code, that’s fine, but it’s your responsibility, and we’re not going to judge it more leniently because you used a fancy AI tool.

The main sticking point is of course where your own individual considerations about AI use end and community or organizational (e.g. Wikimedia Foundation) policies and rules begin. Is it okay to vibe-code an app to learn a new library? How about putting a vibe-coded tool on Toolforge? Is it okay to use an LLM to check for obvious bugs in a patch to MediaWiki core? How about letting it upgrade the code to use new PHP features? How about letting it write an entire class? A plugin? What about letting an AI agent write all code reviews? Or maintain and deploy AI-written code to MediaWiki automatically?

I’m sure there’s some point among those examples where you’ll think: ‘no, that’s not okay at all’. But the bar will be different for everyone, and we don’t have common standards and etiquette yet for what’s okay and what isn’t.

We do have a universal code of conduct. That code of conduct was written with humans in mind, but maybe we also need another one, for robots and automated processes. We do have some rules on what you can and cannot scrape, but maybe those need to be expanded.

And maybe we need to rethink the current universal code of conduct. One of the founding principles of the code is that ‘all who participate in Wikimedia projects and spaces (…) strive towards accuracy and verifiability in all its work [and are] part of a global community that will avoid bias and prejudice’. Given the issues LLMs have with exactly those things (accuracy, verifiability and bias), doesn’t using LLMs and other AI technologies violate the code of conduct? And how do we verify whether someone is human? I hope we’re not heading towards a situation where everyone needs to send a copy of their passport to the Wikimedia Foundation before they can edit.

There are some interesting new developments. Most of the major language editions of Wikipedia have a policy for LLM-generated or LLM-assisted content. On Meta-Wiki a request for comments is taking place for a global AI policy for contributing to the projects. Most of the discussion around the policy proposals seems pretty divided. But at least it’s a first step in making it clearer where we want to be heading as a community.

One tendency prevalent in most of the policy proposals is a call for more transparency. If content is LLM-assisted in any way, whether it’s code, text or another form of media, it should be clear what its origins are and how it was made, just as we ask for proper citations in articles and for attribution and version history in code.
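What that transparency could look like for code is still an open question. One lightweight option would be a Git commit trailer; the trailer below is hypothetical, not an existing Wikimedia convention.

    Upgrade Depictor frontend from Vue 2 to Vue 3

    Migrate components to Vue 3 and convert the codebase to TypeScript.

    # (hypothetical trailer, not an agreed convention; lines starting
    # with '#' are stripped from commit messages by Git)
    Assisted-by: <name and version of the LLM or coding agent used>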

Violations and equality

Violations of our legal terms might also happen through code. One loophole for avoiding open source licenses like the GPL is making a ‘clean room implementation’ of existing code, and LLM-generated code can come uncomfortably close to that. Feeding such code back into our codebases might be a violation of those very licenses.

Apart from the legal implications there is also the issue of equality and accessibility. A $20 subscription to a frontier coding agent might be affordable for people living in Western countries, but in a country like Pakistan that’s two days’ wages. As a movement, I hope we strive toward a world where more people get the chance to contribute, not one where we create haves and have-nots.

Wrapping up

Thinking back to the two workshops I organized and reading these notes again, I’m reminded of the excellent keynote that philosopher and writer Maxim Februari gave during the Dutch Wikicon in November 2025. He argued that in a post-truth world, Wikipedia (and hence the Wikimedia movement) has one very valuable quality: trust.

When it becomes unclear what’s true and what isn’t, people fall back on the sources they trust, and over the last 25 years we’ve clearly earned people’s trust. Instead of focusing on growing the number of articles, maybe we should revisit old ones, rethink old procedures, and ask how we can keep the trust people have in our work.

One thing I learned from this workshop is that it’s more important than ever to stick to our values, while also thinking about how we put them into action in the present and future.

With the hype and FOMO around AI breathing down our necks, it might feel like the right move is to go all in on these new technologies. But I think in the long run it’s better to err on the side of caution and conservatism. One approach would be a ‘no-unless’ policy: we don’t use AI technologies unless there is a clear and well-defined exception, such as translation, accessibility features, and maybe even coding assistance or security-issue detection, as long as the goals, intentions and policies are very clear and transparent.

Whatever shape it takes, it should make things better for us as humans, not for big tech or robots. Maybe that is what ‘productive coexistence’ means.


Full transparency: I used several AI technologies to help set up the workshop and write this article. Call it fancy spell check.

All faults and omissions are my own. The jury’s still out on whether this article and the workshop would have been better if I hadn’t used these tools at all. Time will tell, I guess.

I’d once again like to thank all the people who attended the workshop, as well as those I spoke to before and after, who gave me valuable insights and new ideas on this subject.
