WikiWorkshop 2025 Recap
2025-05-23

Most of my research efforts have been on YouTube and TikTok in recent years, but I still try to find time for Wikipedia (as more than just a volunteer). So I was looking forward to this year's WikiWorkshop, an annual [virtual, these days] conference dedicated to Wikipedia/Wikimedia-related research.

The format is unusual: everyone submits a 7-minute video two weeks before the conference. On the day, organizers play the video and the authors are present for Q&A afterwards. It feels like a cross between a lightning talk and a poster session. Cons: producing the video takes a little bit of effort, and I imagine some people who have never done so before could struggle technically; 7 minutes is also really short to present a research paper. Pros: no live technical difficulties with screen-sharing or muted audio; presentations are generally tighter and cleaner; nobody suffers from stage fright; maximized information-per-session. I wouldn't want every conference to go that way, but I'm glad it exists.

So I tried to turn my paper, approaching 10,000 words, into a 7-minute video. The idea of the paper: Wikipedia's size and importance continue to increase while its labor pool continues to decrease. As a result, the community has had to find ways to scale labor and make compromises to its founding principles of openness/low barriers to participation. I analyzed 20 years of block logs (records of when users had their ability to edit revoked) to see how that need for scaling/compromises might manifest itself. I'll leave it at that and put my 7-minute video below. Stay tuned for the full paper.

This was my first time attending the WikiWorkshop, so I wasn't sure what level of work to expect. Would it be all academics presenting their research, Wikipedia volunteers discussing practical challenges, volunteer developers showing off new tools, or the Wikimedia Foundation giving updates on products? (Most large wiki events are some blend of these.) Almost entirely the first, as it turns out -- the overall level of work was about what I'd expect at any academic conference. So let me share some of the presentations that stood out!

Linguistic Difference and Content Diversity

Perhaps my favorite pair of presentations were in the "Linguistic Difference and Content Diversity" session, complementing each other on the subject of translation. For context, Wikipedia is available in hundreds of different languages, each written more or less independently of one another. The English Wikipedia is the oldest, largest, and most popular of the bunch, which means a lot of people treat it with some degree of primacy. When it comes to translation, this means it's often treated as a standard against which other language versions are measured, and its articles are frequently translated into other languages. The paper "Translation Imbalances Between Wikipedia Language Editions" by Adam Wight, Simulo, Kavitha Appakayala, Abhishek Bhardwaj, and Nathaly Toledo measured "translation flows" using a tool which helps Wikipedia editors translate from one Wikipedia language into another. They found that 73% of all translations come from the English Wikipedia. According to the authors, a large number was expected, but 73% is disproportionately large. Why?

Seems like a potential "winner take most" effect to me, where being the biggest gives you not just technical but reputational and structural advantages, granting you a disproportionate amount of overall attention/benefit. Another factor, which I think the authors or another attendee also touched upon, is that the English Wikipedia is broadly not a fan of the translation tool in particular and machine translation in general, placing heavy restrictions and heightened scrutiny on translated work.

In "Measuring Cross-Lingual Information Gaps in English Wikipedia: A Case Study of LGBT People Portrayals," Farhan Samir, Chan Young Park, Zining Wang, Anjalie Field, Vered Shwartz, and Yulia Tsvetkov challenge the common perception of the English Wikipedia's version of any given article as the "complete" or "authoritative" version, and that other languages typically have a "subset" of the English Wikipedia version. To do so, they developed a tool called InfoGap, a pipeline where articles are decomposed into discrete facts that can then be compared across languages. Using the article on Brittney Griner as an example, they found that only 39% of the English version of the article was present in the French version, and the French version included a range of facts that weren't in the English version. Comparing English to Russian version of the Tim Cook biography, they found, for example, a fact in the Russian version (but not in the English version) about Cook's offer to double employees' donations to the Ukraine war effort.

The English Wikipedia has slowly tightened its standards for quality, especially for high-visibility articles like those on Griner or Cook, and there are a range of procedural and normative differences between various language versions, so I wonder how many of these "gaps" could be chalked up to diverse rulesets and practices. Using the Cook article as a test, at least, I went to the Russian Wikipedia and found the line highlighted by the authors -- it had two solid references. Checked the talk page and its archives -- no mention of Ukraine. No mention of Ukraine in the edit summaries in the article's history, either. Seems like a straightforward case of a fact that was observed by one set of editors and not by another -- a solid case for the authors.

I'm intrigued by this InfoGap tool for other reasons, though. If it's effective in breaking a Wikipedia article down into constituent facts in a range of languages and comparing them, it could also be useful to identify cases of systemic bias. Take all of the articles on a subject like a war or a political issue, build out datasets of facts present/missing in each language version of each article, and then run them through some other model -- maybe even some basic topic modeling -- and see if patterns emerge. Do the same thing with the same articles as they existed, say, 6 months before that war or a year before an issue was politicized, and look to see how it's changed. There's potential there to find clues for influence operations, systemic bias, or even just changing tides of public/journalistic attitudes on a subject. I tracked down the paper on arXiv and the GitHub repository -- going to have to make some time to explore further.
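To make that concrete, here's the kind of quick-and-dirty pass I have in mind: collect the facts present in one language version but missing from another across a set of related articles, then run basic topic modeling over them to see whether the "gaps" cluster around particular themes. Everything here is hypothetical -- the input file, its format, and the parameters are stand-ins, not anything InfoGap actually produces:

```python
# Hypothetical sketch: topic-model the "missing facts" across many articles
# to look for systematic patterns. The input format is invented for illustration.
import json
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import NMF

# Imagined input: one record per article, listing facts present in one language
# version but absent from another (e.g. produced by an InfoGap-style pass).
with open("missing_facts.json") as f:
    records = json.load(f)  # [{"article": ..., "missing_facts": [...]}, ...]

docs = [" ".join(r["missing_facts"]) for r in records]

vectorizer = TfidfVectorizer(max_features=5000, stop_words="english")
X = vectorizer.fit_transform(docs)

nmf = NMF(n_components=10, random_state=0)  # 10 "gap themes", chosen arbitrarily
nmf.fit(X)

terms = vectorizer.get_feature_names_out()
for topic_idx, weights in enumerate(nmf.components_):
    top = [terms[i] for i in weights.argsort()[-8:][::-1]]
    print(f"Gap theme {topic_idx}: {', '.join(top)}")
```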

Community Health and Participation

Next session: "Community health and participation." A stand-out here was Nicole Schwitter's "From meetups to adminship: The offline path to leadership on the German-language Wikipedia." Basically, does attending offline meetup events with other Wikipedians increase the likelihood that someone will run for adminship (a [declining] user group with permissions to delete/protect pages and block users). Answer: nope. I love a nice, clean negative results paper, especially when it contradicts an optimistic belief held by exactly the audience you're presenting it to. :)

Also in this session was the WMF's report on administrators. Rather than link the extended abstract, I'll just link the big report it's based on: Wikipedia Administrator Recruitment, Retention, and Attrition. It's a really useful survey-based report about, again, one of Wikipedia's key declining resources: administrators. Too much to cover here, but it's well worth a read. Credit to Eli Asikin-Garmager, Yu-Ming Liou, Claudia Lo, Carolina Myrick, Bethany Gerdemann, Daisy Chen, and Diego Saez-Trumper.

AI and LLMs

The last session I went to was "AI and LLMs," which is tricky for Wikipedia. My experience is primarily with the English Wikipedia, which seems to have developed a severe allergy to all things AI. Most AI-related tool use is prohibited or heavily stigmatized. The situation on Wikimedia Commons is similar, though less hostile. I don't have a good sense of how much other wiki communities have embraced it, but it necessarily affects the way I listen to research about possible uses for LLMs on Wikipedia. Heather Ford and Michael Davis have a WMF-funded project on the "Implications of generative AI for knowledge integrity on Wikipedia." It's a thoughtful report, and I found it most interesting where it dealt with the many ways AI/LLMs affect Wikipedia beyond people putting AI-generated stuff on Wikipedia, e.g. "genAI’s wholesale extraction of Wikimedia content for training without regard for open licensing conditions reflects a larger risk to the ongoing sustainability of open knowledge. Second, the growth in genAI tools functional solely in large languages may worsen the disparity between different language versions."

Also in that session was a nice methodology from Rona Aviram and Omer Benjakob via "Are we headed for another AI winter? Investigating the evolution of artificial intelligence as seen through Wikipedia’s archives." It's the latest application of a method they've used before, analyzing a subject through the development of content on Wikipedia. In other words, this is about what Wikipedia tells us about the history of AI, not what AI can do for Wikipedia. They look at section-level changes over time, editing activity, and other data and construct a narrative about the trajectory of public discourse about AI. This looks like the relevant GitHub repository for the scripts involved. Again, this will be worth looking into as a useful tool/method in the future.
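I haven't gone through their scripts yet, but the raw material for this kind of analysis is readily available: the MediaWiki API exposes full revision histories. A minimal sketch (mine, not theirs) that pulls revision timestamps for one article and counts edits per year -- the article title and the per-year aggregation are just illustrative choices:

```python
# Minimal sketch (not the authors' code): pull an article's revision history
# from the MediaWiki API and count edits per year.
from collections import Counter

import requests

API = "https://en.wikipedia.org/w/api.php"
HEADERS = {"User-Agent": "revision-history-sketch/0.1 (example script)"}
params = {
    "action": "query",
    "prop": "revisions",
    "titles": "Artificial intelligence",  # illustrative choice of article
    "rvprop": "timestamp|size",
    "rvlimit": "500",
    "format": "json",
}

edits_per_year = Counter()
while True:
    data = requests.get(API, params=params, headers=HEADERS, timeout=30).json()
    page = next(iter(data["query"]["pages"].values()))
    for rev in page.get("revisions", []):
        edits_per_year[rev["timestamp"][:4]] += 1  # year prefix of ISO timestamp
    if "continue" not in data:
        break
    params.update(data["continue"])  # follow the API's continuation token

for year in sorted(edits_per_year):
    print(year, edits_per_year[year])
```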

The Rest

These are just a few of the talks in just the sessions I was able to attend. Check out the full schedule for other papers/abstracts. Overall it was a nice event. Between sessions there were "ask me anything" interviews, a town hall, research awards, and even live music (which could've been silly, but was actually kind of delightful). Kudos to Ugne Daniele. Below is the video for my presentation, for anyone interested.