Building a Commenting System for Static Sites - Part 1.

The Initial Idea: Brainstorming and Sketching out a Solution.


Project Page

This article is part of a project. You can see its current status, as well as find other information and articles about this project at its project page.

tl;dr

I'm making a commenting system for this site that will use statically compiled files. If the end result turns out well, I will open source it so other people can use it as well.

°l||l°l||l°l||l°l||l°l||l°l||l°l||l°l||l°l||l°l||l°

Ok, cool cool. Um, why? 😩

Maybe it's just me, but a lot of commenting systems just feel gross. They usually feel weird and unnecessarily social media-y. I'll be more specific later, but in essence, I don't think they offer a good user experience. If someone is going to read something I have to say, I want them to have a good experience doing so. Speaking for myself, I don't feel like leaving comments on sites that use Disqus, Facebook, or similar things.

Also, there are technical reasons. For one, this site doesn't use javascript™(*), which pretty much means server-side rendering or... generating once at compile time??? 🤔

Probably the only commenting system that I think is executed really well is Hacker News. I don't know though. I don't really spend a lot of time surfing the net. But if you do, and you have strong feelings about this, please please let me know if there's a way of doing this that you really like. Maybe do it after reading the article though. Ok, here we go.

What's Wrong with Existing "Commenting Technology"?

There are a few different categories of "commenting technology", for lack of a better word. I'll do my best to briefly discuss each one, and what I found lacking.

Professional Blogging Platforms

I don't really know what professional means here, but I think it means that providing an online venue for long-form writing is their main focus. There are two subcategories, platforms where the author is a "thought leader" (like Medium) and platforms where the author is an actual writer (like Substack), and each presents different issues for me.

Options Like "Medium"

Medium et al. have good typographic defaults and they're pretty well optimized for reading, which is good. What I don't like though is the social media-y feel, and the arbitrage relationship they create between an author and their audience. Somehow, despite being literally designed to just read stuff, Medium still feels spammy.

I think this is usually what happens when a technology provider tries to elevate its importance beyond the value people actually want to get from it (RIP WeWork). When I'm on these sites, there's this dull feeling in my head, probably from the collective consciousness, listlessly sighing, "I literally just want to read this article".

Do I have super good examples of what I mean? No, not really, sorry. If I see a modal informing me of some engagement I need to do in order to really consummate my experience, I've already begun mentally filtering it out. But even without specific examples of this spammy feeling, do I plan on writing comments there? Hell no. And probably neither do you. And that's exactly my point.

(I don't know, maybe it's just me? Maybe I should write a separate article about how perfectly well designed products can be hurt by their own success, when their low barrier to entry and wider user adoption attracts a lot of low effort content?)

Places for Actual Writers

Like Substack!

Hey, do you know what I think is a cool idea? Someone produces something or provides a service that you like, and then you pay them for it. I am not even being snarky. I sincerely believe that maybe 60% of our problems in the technologized parts of the world would be fixed just by paying for stuff. Think of it like a carbon tax on free stuff that's bad for us.

Anyway, so why don't I use something like this? Well, mainly it's because I'm not really a professional writer, and therefore I don't think this technology is a good fit for what I have to offer.

Do you want to read an in-depth study, sharing the results of tests that I conducted, comparing different brands of sponges (from different countries) for bacteria content, because I'm convinced there's a cartel of sponge manufacturers conspiring against us? Maybe!

Do you want to pay for it? Probably not.

Other Reasons

Finally, I have an intense interest in maintaining control over the technology of this site, and Medium/Substack are other sites altogether. To be clear, I'm fine outsourcing the edges to things that people have built well once so they can be reused. I don't have a problem letting Cloudflare manage my SSL certs or rate limit requests to my website, because that stuff lives on the edges and I'm mostly concerned with producing good content.

That being said, comments are literally just words about other words. Surely it can't be that difficult to figure out how to get those words to live together in the same DNS zone?

It's like I want to embed something next to the main words. Ok, read on.

Social Media-y Stuff (AKA way 🆒)

This should be obvious, so I won't spend a lot of time on this section. This option is a big no-no for me. If I had to describe an ethos... it just doesn't seem cool to me. Additionally, setting aside protecting my readers' privacy and offering a first-rate user experience, there's a big performance penalty as well. A lot of these Disqus-like services have ad tech baked into their products, and one result is that the site feels much less responsive.

A blog/personal site should be about content first and foremost.

Hipstery Jamstack Stuff

There are A LOT of contenders here, which really should give me pause before making my own. Going through what I don't like about each one would also take too long. Basically, I'd prefer at this stage not to host anything, I'm trying to wean myself off of AWS, and I don't want to pay someone else for this.

Just writing that list made me realize how much I may come to regret this, as literally every single item in it is a perfectly reasonable compromise for getting built-in comments on your website.

I do want the storage format to be files, maybe, and I really want full control of the user experience. I also fully expect my feelings on this to change.

Design Considerations

Building a commenting system for some random person on the internet is not the same thing as building Facebook or Twitter, so it makes sense to value different things.

Some things I'm optimizing for/around are:

  • Cost - it should cost no more than hosting my static site, which is free
  • Low throughput - I would be happy with maybe dozens of comments/week
  • Simplicity - about as difficult as maintaining the rest of the content

Bearing this in mind, I decided to try building something that stores comments as files (CSVs, to be exact - wait 🙊... what? On purpose? Why?) colocated in the same directory as the pages they're commenting on.
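Here's a minimal Python sketch of what one of these comment files could look like. The column names (`id`, `parent_id`, `author`, `created_at`, `body`) and the `Comment` type are hypothetical, since the schema isn't settled; the point is that the standard `csv` module quotes commas, quotes, and line breaks inside a comment body, so free-form text survives the round trip.

```python
import csv
import io
from dataclasses import astuple, dataclass

# Hypothetical schema: the post doesn't define the columns yet.
FIELDS = ("id", "parent_id", "author", "created_at", "body")
HEADER = ",".join(FIELDS) + "\n"


@dataclass
class Comment:
    id: str
    parent_id: str   # "" for a top-level comment
    author: str
    created_at: str  # ISO 8601 timestamp
    body: str


def to_csv_row(comment: Comment) -> str:
    """Serialize one comment as a single CSV record, ending in a newline."""
    buf = io.StringIO()
    # QUOTE_MINIMAL (the default) quotes fields containing commas, quotes,
    # or line breaks, so arbitrary comment text stays in one record.
    csv.writer(buf, lineterminator="\n").writerow(astuple(comment))
    return buf.getvalue()


def read_comments(text: str) -> list[Comment]:
    """Parse a whole comments file (header row + records) into Comments."""
    return [Comment(**row) for row in csv.DictReader(io.StringIO(text))]
```

A reply and a top-level comment can live in the same file, and the file stays greppable line by line for the simple cases.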

Why Files?

This may sound way dumb, but hear me out. Honestly, how many comments do I expect to receive per article? Maybe like 100 if my wildest fantasies came true. I'm pretty much expecting someone to only say something if they really want to.

So, scale in this case is not a reason not to store comments as files. Also, if this technology ends up becoming a thing other people use, I imagine they'll be more or less in the same boat. If I ever ran into trouble because of scale, that's a good problem to have, so I'm not worried about the worst-case scenario. Thousands of comments would be a bit much, so at that point I guess I could change the design a little and move the files out of version control.

Text files fit well with a bunch of other tooling that I already use for my statically generated site. Want to review a comment? It's sort of like reviewing a pull request. This format is really portable too. You can grep it, sort it, awk it, etc. For each of those use cases there may be another technology that's better at that one thing, but it's hard for me to think of anything else that's sort of ok at everything I want. So I think the real risk here comes from some overlooked dealbreaker, rather than from whether text files are up to the task of representing, er, text...

An analogy I like is people all working on a shared code base, except instead of code it's ideas. The content of my blog is the "code base", I'm the maintainer, and the commenters are contributors. In that sense, I just want to leverage the same tools and workflows, since both code and ideas are communicated here as text. Is that a bad idea? Seriously, I'm not sure, but let's find out.

Challenges Around Moderation

There definitely needs to be some way of separating acceptable from unacceptable content. Obviously, I don't want gross stuff on my personal website. Not sure what else to add there, but that leads to the question of how.

Most of the time this is solved by having someone identify themself, i.e. authenticate, before they comment. Unfortunately, at that point we've officially opened Pandora's box and things aren't so simple anymore, not to mention the privacy concerns it brings up.

Note: if you have a good idea for how to provide user aliases, moderate comments, and control spam without requiring some kind of login, please let me know.

I have 3 ideas for identifying users so far:

  1. Let the user pick an alias + password, or email them a nonce, and store a session cookie
  2. In the article, generate href attributes for mailto: links. When a user clicks "Reply to Article" or "Reply to [...]" your email client opens up, with everything pre-filled. I can then write some serverless code to handle the logic of receiving the emails and associating the comment with the right article/comment from the pre-populated data.
  3. Integrate with an OAuth provider
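Option 2 is the least conventional, so here's a rough Python sketch of generating those `mailto:` links. The inbox address, the `[comment] <path>#<parent_id>` subject convention, and the helper names are all hypothetical; the idea is only that whatever the link pre-fills, the serverless email handler can parse back out to find the right article or parent comment.

```python
from urllib.parse import quote

# Hypothetical inbox that a serverless email handler would receive mail for.
COMMENTS_INBOX = "comments@example.com"
SUBJECT_PREFIX = "[comment] "


def reply_mailto(article_path: str, parent_id: str = "") -> str:
    """Build an href that opens the mail client with the reply target pre-filled."""
    target = f"{article_path}#{parent_id}" if parent_id else article_path
    subject = SUBJECT_PREFIX + target
    body = "(Write your comment here.)"
    # Header values in mailto URIs are percent-encoded (RFC 6068);
    # safe="" also encodes "/" and "#" so they can't break the URI.
    return (
        f"mailto:{COMMENTS_INBOX}"
        f"?subject={quote(subject, safe='')}"
        f"&body={quote(body, safe='')}"
    )


def parse_subject(subject: str) -> tuple[str, str]:
    """Inverse of the subject convention: returns (article_path, parent_id)."""
    if not subject.startswith(SUBJECT_PREFIX):
        raise ValueError("not a comment reply")
    path, _, parent_id = subject[len(SUBJECT_PREFIX):].partition("#")
    return path, parent_id
```

"Reply to Article" would call `reply_mailto(path)`, and "Reply to [...]" would pass the parent comment's ID as well.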

In any case, how authentication is done doesn't matter to the rest of the system.

System Architecture (sketch)

Here's a high-level overview of the flow of events:

  1. User fills out comment form
  2. On the next page, user authenticates somehow
  3. Comment gets added to the queue, and user is redirected
  4. A cron job checks the queue every few minutes
  5. Any comments waiting get added to a PR
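To make steps 3 to 5 concrete, here's an in-memory Python sketch. The list-backed `CommentQueue` is a stand-in for whatever actually stores the queue, and `comments.csv` as the per-article filename is an assumption; the cron step drains everything waiting and groups the new rows by the file they'll be appended to, so one PR can carry a whole batch.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class PendingComment:
    article_path: str  # e.g. "posts/comments-part-1" (hypothetical layout)
    csv_row: str       # already-serialized CSV record ending in "\n"


@dataclass
class CommentQueue:
    """In-memory stand-in for a real queue (an assumption for illustration)."""
    items: list[PendingComment] = field(default_factory=list)

    def enqueue(self, comment: PendingComment) -> None:
        # Step 3: the form handler adds the comment and redirects the user.
        self.items.append(comment)

    def drain(self) -> list[PendingComment]:
        drained, self.items = self.items, []
        return drained


def run_cron(queue: CommentQueue) -> dict[str, str]:
    """Steps 4-5: drain the queue and batch new rows per comments file.

    Returns {csv_path: text_to_append}; a real job would turn this into a PR.
    """
    batches: dict[str, list[str]] = defaultdict(list)
    for pending in queue.drain():
        batches[f"{pending.article_path}/comments.csv"].append(pending.csv_row)
    return {path: "".join(rows) for path, rows in batches.items()}
```

Rows keep their submission order within each file, and a run with an empty queue produces no changes, so the cron job can skip opening a PR.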

There's obviously a lot of variability in that flow, but that's more or less a reasonable way to do this. Here's a picture I drew the other day.

Sketch of flow described above
Not quite literally 'back of the envelope' but very close in spirit

I think the most difficult decision will be how to elegantly integrate the user authentication. Fortunately, that step can be abstracted away from the rest of this system.
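One way to keep that step swappable is to have the rest of the pipeline depend on a small interface. This Python sketch uses a `typing.Protocol`; `Identity`, `AliasPasswordAuth`, and `accept_comment` are hypothetical names, and the alias/password check is a toy (plain-text comparison) standing in for option 1. An email- or OAuth-based authenticator would just be another class with the same `authenticate` method.

```python
from dataclasses import dataclass
from typing import Optional, Protocol


@dataclass(frozen=True)
class Identity:
    alias: str


class Authenticator(Protocol):
    """Anything that can turn an incoming request into an identity, or refuse."""

    def authenticate(self, request: dict) -> Optional[Identity]: ...


class AliasPasswordAuth:
    """Toy stand-in for option 1; real code would compare password hashes."""

    def __init__(self, users: dict[str, str]) -> None:
        self._users = users

    def authenticate(self, request: dict) -> Optional[Identity]:
        alias = request.get("alias", "")
        if alias and self._users.get(alias) == request.get("password"):
            return Identity(alias)
        return None


def accept_comment(auth: Authenticator, request: dict, queue: list) -> bool:
    """The rest of the pipeline only ever sees the Authenticator interface."""
    identity = auth.authenticate(request)
    if identity is None:
        return False
    queue.append((identity.alias, request["body"]))
    return True
```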

Where Are We with This? 🐌

First, let me be clear that I'm trying to write these design journals in real time, so this project is literally a few days old. With that out of the way, I've begun prototyping a few little pieces here and there: a user interface, a serverless form submission handler using Cloudflare Workers, and the bits and pieces required to create the PR using GitHub's REST API.

Tech Stack

So far I'm using Cloudflare to build all of this. They have really nice abstractions and very generous free limits. Cloudflare Workers handle the form submission, cron job, and PR creation, and I haven't decided yet what to do for authentication. I'm also using Cloudflare's KV store right now for the comment queue, just because it's super easy, though once I validate the prototype I'd like to explore using their Durable Objects or R2 for this instead.

One key constraint is that Workers on the free plan are limited to 10ms of CPU time per request. That concerns me for the pull request step: time spent waiting on the network doesn't count toward the limit, but decoding the file contents, appending the latest comments, and re-encoding everything to send it back does, and for a big file that could easily hit the limit.

What's Next?

Misc.

I'll put little bits and pieces here as they occur to me, before I publish the next article.

Wait. Why CSV?

I know everyone wants everything to be JSON or gRPC, but there are legitimate reasons why I think CSV is the way to go here. If you think otherwise, please let me know. Here's an excerpt from my notes, verbatim, on updating the comments in different file formats, i.e. step #5 from above:

Shootout CSV vs TOML vs JSON

CSV Steps:
1. Get file using API
2. Append new comment as b64 string to “content”
3. Create a new blob, etc…

JSON Steps:
1. Get file using API
2. Decode content from b64 to json
3. Append comment to end of comments array in json obj
4. Encode json obj as b64
5. Create a new blob, etc…

TOML Steps:
1. Get file using API
2. Decode content from b64 to utf8 string
3. Parse utf8 string as TOML
4. Append comment to TOML comments field
5. Turn TOML into utf8 string
6. Encode utf8 string into base64
7. Create a new blob, etc…
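The CSV and JSON columns of that shootout can be sketched in Python like this, working on already-decoded text (the function names are hypothetical). TOML is left out because Python's standard library can only read it (`tomllib`), not write it, which is itself a hint at the extra work.

```python
import json


def append_csv(text: str, row: str) -> str:
    """CSV: the new record is just more text at the end; nothing is parsed."""
    if text and not text.endswith("\n"):
        text += "\n"  # make sure the new row starts on its own line
    return text + row


def append_json(text: str, comment: dict) -> str:
    """JSON: parse the whole document, mutate it, and serialize all of it again."""
    doc = json.loads(text)
    doc.setdefault("comments", []).append(comment)
    return json.dumps(doc, indent=2) + "\n"
```

The CSV version's work grows only with the size of the new row, while the JSON version has to touch every existing comment on each append.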

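So in other words, CSV takes far fewer steps, because you can just append what you need to the end of the file without parsing anything. You don't even need to decode it: you can write a base64 string of the new row straight onto the end of the base64 string you get from GitHub, as long as the existing file's length in bytes is a multiple of three (otherwise its base64 ends in "=" padding and the two strings won't concatenate cleanly).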
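Here's a minimal Python sketch of that append (the `append_row_b64` name is hypothetical). It only takes the no-decode shortcut when the existing base64 has no `=` padding, because a padded string can't simply be extended; otherwise it decodes, appends the bytes, and re-encodes.

```python
import base64


def append_row_b64(existing_b64: str, row: str) -> str:
    """Append one CSV row to base64-encoded file content; returns new base64.

    With no "=" padding, the existing byte length is a multiple of 3, so the
    4-character groups line up and plain string concatenation is valid base64
    for the combined bytes.
    """
    # Drop line breaks, since API responses may wrap long base64 strings.
    compact = "".join(existing_b64.split())
    row_b64 = base64.b64encode(row.encode("utf-8")).decode("ascii")
    if not compact.endswith("="):
        return compact + row_b64
    # Padded input: concatenating would leave "=" mid-string, so fall back
    # to decoding, appending the raw bytes, and re-encoding.
    data = base64.b64decode(compact)
    return base64.b64encode(data + row.encode("utf-8")).decode("ascii")
```

In the aligned case the function never decodes anything, which is the CPU-time saving the notes are after.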

Big Optimization To Make, or, Why CSV Part 2

Additionally, there's a pretty big optimization I want to make that really wouldn't be easy without CSV. Right now I'm creating the PR fairly naively by:

  1. Getting the file
  2. Updating the file
  3. Opening a PR

(Note: this is simplified, but more or less that's the spirit of it.)
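Here's roughly what those three steps look like against GitHub's REST API, as a Python sketch. The `request` callable is a hypothetical stand-in for an authenticated HTTP client (in a Worker it would wrap `fetch`), and the branch name, commit message, and PR title are placeholders. A PR also needs a head branch, so the sketch first creates one from the tip of the base branch.

```python
import base64
from typing import Any, Callable, Optional

# Hypothetical transport: (method, path, json_body) -> parsed JSON response.
Request = Callable[[str, str, Optional[dict]], dict[str, Any]]


def open_comment_pr(request: Request, owner: str, repo: str, path: str,
                    new_rows: str, base: str = "main",
                    branch: str = "new-comments") -> int:
    """Append CSV rows to one file on a fresh branch and open a PR; returns its number."""
    api = f"/repos/{owner}/{repo}"

    # Prerequisite: create the PR's head branch from the base branch's tip.
    base_sha = request("GET", f"{api}/git/ref/heads/{base}", None)["object"]["sha"]
    request("POST", f"{api}/git/refs",
            {"ref": f"refs/heads/{branch}", "sha": base_sha})

    # 1. Get the file: "content" is base64, "sha" identifies the blob being replaced.
    current = request("GET", f"{api}/contents/{path}?ref={branch}", None)

    # 2. Update the file with the new rows appended.
    updated = base64.b64decode(current["content"]) + new_rows.encode("utf-8")
    request("PUT", f"{api}/contents/{path}", {
        "message": "Add new comments",
        "content": base64.b64encode(updated).decode("ascii"),
        "sha": current["sha"],
        "branch": branch,
    })

    # 3. Open the PR from the new branch into the base branch.
    pr = request("POST", f"{api}/pulls",
                 {"title": "New comments", "head": branch, "base": base})
    return pr["number"]
```

Step 1 is the expensive round trip: the whole file comes down just so a few bytes can be added to its end.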

The slow part of this is getting the file. The thing is, I don't really want the file; I just want to stick this comment at the end of whatever is already there. With that in mind, I'm pretty sure I can just write the latest comments as blobs straight to the GitHub repo, then create a new tree through the API that basically says "this blob is appended to that file," and never even have to pull the file.

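I have yet to recreate this locally using the low-level git commands, so I can't say for sure that it's possible. Everything in git is literally just objects (blobs, trees, and commits), but a blob stores a file's complete contents and a tree entry just points at one blob, so there may be no way to say "append" without sending the whole updated file.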
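For the curious, git derives a blob's ID by hashing a short header plus the file's entire contents (in the default SHA-1 object format). Here's a minimal Python sketch with a hypothetical `git_blob_sha` helper; it should match what `git hash-object` prints for the same bytes.

```python
import hashlib


def git_blob_sha(content: bytes) -> str:
    """Object ID git assigns to a blob: SHA-1 of "blob <size>\\0" + full contents."""
    header = f"blob {len(content)}\0".encode("ascii")
    return hashlib.sha1(header + content).hexdigest()
```

Appending even one row changes the ID, i.e. it produces a brand-new blob that a tree entry would then have to point at.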

Anyway, with this in mind, I think it's clearer why a non-bracketed file format is the way to go. Maybe it's easy to append to JSON using some mixture of GitHub's low-level API, but I don't know yet. I'll let y'all know when I learn.

°l||l°l||l°l||l°l||l°l||l°l||l°l||l°l||l°l||l°l||l°