As of writing this blog post, I have been involved in the organization of two hackathons and I've participated in a good number of them myself. Hackathons often come with prizes, and awarding those prizes requires deciding which projects are of the highest quality. This is usually done in one of two ways: either professional judges review every project and hand-pick the best ones, or the projects are released to the participants, who then vote on which projects should win.
My first hackathon, MelonJam 5, used a judging system to determine winners but included a community vote prize for the game voted first place overall on the itch.io jam page. The jam allowed any submitter to rate any other submission from one to five stars in categories such as gameplay, theming, and polish. On the surface, this seems like a great way to let the community participate in the democratic process of selecting winners, but if you think about it for more than five minutes, you start to discover all kinds of issues.
The vast majority of traditional hackathons use a professional judging system. Oftentimes the judges are representatives sent by sponsors or the organizers themselves. Peer voting as a judging system began to rise in popularity mainly within Hack Club-hosted global satellite hackathons such as Counterspell, Scrapyard, and Daydream. Hack Club has also been known to use peer voting to determine project quality in several programs such as Summer of Making.
The fundamental flaws of the peer voting system stem from the fact that participants cannot judge projects fairly due to personal biases and conflicts of interest.
Every participant, even those who attend an event just for the experience, wants to win a prize. When given a vote, some participants will vote fairly, but the desire to win can take others down a more nefarious path. Participants can and will "trade votes" with other teams to boost their own chances of winning prizes, something I have personally witnessed at Scrapyard Silicon Valley. In other cases, like MakerHacks 1, I have seen participants in the Open Prototype (traditional hackathon) track beg robotics participants (who were allowed to vote in the open prototype voting form) to vote for their project, making the judging less about project quality and more about advertising.
While advertising and vote trading can significantly influence the results of a ballot-judged hackathon, the potential for nefarious influence increases dramatically when the voting system asks participants to rate every project on a scale rather than cast ballots, as in itch.io game jams. Properly testing, rating, and reviewing a project takes time, which makes it impractical for a participant to rate every project. However, it is extremely easy for a nefarious actor to spam one-star ratings across many projects to drag down their average ratings and make their own project look better in comparison. This spamming of low ratings is known as "review bombing," and even a small percentage of review bombers can distort honest ratings to the point where they become effectively useless for judging.
Some may be quick to point out that review bombing affects every project in the event equally. This is not the case: some projects receive fewer ratings than others due to incompatible platforms, a steeper learning curve, or plain randomness, and a review bomb hits a project with few ratings far harder because each malicious rating carries more weight in its average.
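To make the math concrete, here is a minimal sketch with made-up numbers showing how the same handful of one-star ratings barely moves a heavily rated project but craters one with only a few honest votes:

```python
def average_with_bomb(honest_ratings, bomb_count=3, bomb_score=1):
    """Average rating after bomb_count malicious low ratings are added."""
    all_ratings = honest_ratings + [bomb_score] * bomb_count
    return sum(all_ratings) / len(all_ratings)

popular = [4] * 40  # 40 honest four-star ratings
obscure = [4] * 5   # only 5 honest four-star ratings

print(round(average_with_bomb(popular), 2))  # 3.79 -- barely moves
print(round(average_with_bomb(obscure), 2))  # 2.88 -- drops over a full star
```

Three malicious votes cost the obscure project more than a full star while the popular project loses almost nothing, which is exactly why low-visibility projects are the real victims of review bombing.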
Even when a participant is trying to vote ethically, it is effectively impossible for them to cast a completely fair vote. In large-scale events, a participant may have dozens or even hundreds of projects to choose from, and even if they wanted to, they simply would not have the time to carefully review every one. Soft barriers such as a missing thumbnail image or tutorial, and hard barriers like platform incompatibility, can block a participant's attempt to test and rate a project. The limited time available ultimately means votes are determined by first impressions instead of overall functionality. Cognitive biases such as the familiarity and availability heuristics push participants toward projects made by teams working near them. All of these factors combined lead to biased results caused not by nefarious intent but by the imperfection of human nature.
Professional judges are affected by many of the same cognitive biases as participants, but the effect is mitigated significantly by a judge's exposure to far more projects. While judges may favor a certain type or style of project, they have no personal interest in manipulating the results.
A judge's entire role is to judge projects, so they can review more projects and spend more time on each, allowing them to judge more fairly. Professional judges also have far more knowledge about their industry, so they can more accurately determine which projects are of the highest quality. It is also nearly impossible to nefariously influence the outcome of a professionally judged hackathon. Professional judging is not a perfect system, but it is far less biased and unfair than peer-voted results.
The best and most obvious solution, the one that the majority of hackathons have implemented, is simply not to use peer voting. This was and still is the case in MelonJam and in essentially every hackathon not affiliated with Hack Club.
In extremely large-scale events such as the GMTK Game Jam, the number of submissions makes a pure judging system impractical, and purely voted results would suffer from all the issues mentioned above. Instead, GMTK uses a hybrid system: a round of community voting determines the top 100 games in the jam, and the judge then plays those 100 games to pick the 20 winners.
When peer voting is absolutely required, actions can be taken to improve the fairness of the process. These include, but are not limited to: requiring votes to include a written review of the project, allowing organizers to disqualify projects that are asset or template rips, and automatically detecting and removing review bombs (a rough sketch of which follows below).
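None of the events mentioned here publish their moderation tooling, so the following is purely a hypothetical sketch of what automatic review bomb detection could look like: flag raters who cast many votes with a suspiciously low average score, then drop their ratings before computing results. The thresholds are arbitrary placeholders.

```python
from collections import defaultdict

def flag_review_bombers(ratings, min_votes=5, max_avg=1.5):
    """Flag raters who cast many votes with a suspiciously low average score.

    `ratings` is a list of (rater_id, project_id, score) tuples.
    The thresholds are placeholders; a real system would tune them and
    likely compare each rater against the per-project consensus instead.
    """
    scores_by_rater = defaultdict(list)
    for rater, _project, score in ratings:
        scores_by_rater[rater].append(score)

    flagged = set()
    for rater, scores in scores_by_rater.items():
        if len(scores) >= min_votes and sum(scores) / len(scores) <= max_avg:
            flagged.add(rater)
    return flagged

def remove_flagged(ratings, flagged):
    """Drop every rating cast by a flagged rater before averaging."""
    return [r for r in ratings if r[0] not in flagged]
```

A filter this crude would obviously catch some honest harsh critics too, which is why any automatic removal should feed into a human review step rather than silently delete votes.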
However, in smaller events such as Daydream Global and smaller satellite hackathons, there really is no excuse not to involve judges in some part of the process other than "we don't want to". MelonJam is judged entirely by current and former members of the Cupertino Game Development Club. Each officer is assigned a set of games to judge and asked to pick three favorites; the selected projects then enter a final discussion with all judges, where the winners are chosen. Daydream Global could easily have been professionally judged with even a small team of judges (say, eight judges reviewing 25 games each), bypassing all of the inevitable problems of peer voting. Instead, we got a single category of judge prizes, with the vast majority of prizes determined by a fundamentally flawed system.
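For scale, here is a back-of-the-envelope sketch of how a small panel could cover an event that size by splitting submissions into per-judge batches. The game and judge names are placeholders, and this is not how any of these events actually assigns work:

```python
import random

def assign_games(games, judges, seed=0):
    """Randomly split the submission list into roughly equal per-judge batches."""
    rng = random.Random(seed)
    shuffled = games[:]
    rng.shuffle(shuffled)
    # Deal the shuffled list out round-robin style, one batch per judge.
    return {judge: shuffled[i::len(judges)] for i, judge in enumerate(judges)}

# e.g. ~200 submissions split across 8 judges -> about 25 games each
games = [f"game_{n}" for n in range(200)]
judges = [f"judge_{n}" for n in range(8)]
batches = assign_games(games, judges)
print({judge: len(batch) for judge, batch in batches.items()})
```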
Professional judging may not be a perfect system, but it's the best system we have for determining the highest quality projects.