I sometimes get asked what my process is for researching and writing Substack posts. I always enjoy reading other people’s discussions of their writing process, so I thought it would be worthwhile to explain mine.
First, some background. In school I was a competent but unspectacular writer (my verbal SAT score was a full 100 points lower than my math score). Like most millennials, I’ve spent the majority of my life in a sea of text (reading online, writing emails, instant messages, gaming chats, and so on), but “serious” writing has never been particularly compelling to me, and major writing projects always took a lot of willpower to do. Prior to Construction Physics, I had done very little writing in my adult life: I spent most of my career as an engineer, which required little to no writing, and while every so often I would start a blog or write a forum post, I never maintained it as a habit. I’m not like Scott Alexander or Freddie deBoer, writers who can crank out huge amounts of high-quality text like it's a bodily function. Writing does not come particularly naturally to me, and my process reflects that.
There are, broadly, two types of posts I write (with significant overlap). The first are explanation-driven posts, where I try to explain some given topic. More often than not, the impetus is to explain something to myself: I feel like I don’t understand a topic, and use writing as a way to work through an understanding of it. I started Construction Physics as a project to try and understand why construction productivity never seemed to increase, and though its scope has expanded to include infrastructure and industrial technology, learning how something works is still a major driver. A deficiency that I’ve turned into an advantage is that I get confused easily, and need a very clear, step-by-step explanation of something before I understand it. This often results in a written explanation that other people find useful. “How the Gas Turbine Conquered the Electric Power Industry” and “What Happened to the US Machine Tool Industry?” are recent examples of this sort of post.
In some explanatory posts, the impetus for writing is that while I understand something, I get the impression that most other people don’t. “On Klein on Construction,” “On Yglesias on Manufactured Homes,” and “How Valuable are Building Methods that use Fewer Materials” are examples. But even these posts generally require me to research a topic further before I can write about it. For “On Yglesias on Manufactured Homes,” for instance, I needed to better understand things such as how much manufactured home mortgage lending occurs and the rules for treating manufactured homes as property.
The second type of post I write is exploratory, rather than explanatory. I find some dataset or collection of information that I think will be interesting to explore, and I write up what I find while exploring it. “When Did New York Start Building Slowly?” and “The Worst US Bridges are Getting Fixed” are examples of this type of post. I started with a dataset (the CTBUH’s skyscraper database and the National Bridge Inventory, respectively) and just poked around to see if there was anything interesting to find.
These categories will often overlap: I’ll be interested in understanding something, and do so by exploring some dataset (or multiple datasets). The “Does Construction Ever Get Cheaper” post is an example of this, where I started with the question and was able to work my way towards an answer by finding several good historical construction cost indexes.
Step 1: Research
Posts often require quite a bit of research and reading. As time has gone on and I’ve increasingly moved into topics that I don’t have a background in, the research burden has risen. Ideally, if I’ve chosen a narrow enough topic, I can simply read everything of relevance ever written about it. This might seem impossible to do in a short amount of time, but surprisingly little gets written about a lot of the topics I’m interested in, and it's often more feasible than you might think. Some of my best posts, such as the story of titanium, the history of mobile homes, and building the Empire State Building, were ones where I was able to read or skim essentially every major text on the topic. In cases where I can’t read every major source, I try to get to the point where I’m hitting diminishing returns and each new source just covers material that I’ve already read elsewhere.
The sorts of sources I use vary. In some cases, there will be a major work or two (a book, or a PhD dissertation) that covers the topic very thoroughly, and I can just read those and a small number of supplementary sources. The “Why Did We Wait So Long For Wind Power” series is an example of this type of post: the lion’s share of the information came from a single book. (If a source is definitive and interesting, I’ll sometimes turn it into a book review post.) But more often than not I need to piece together an explanation from many separate sources. The gas turbine post and the Rise of Steel posts required lots of individual sources. Sometimes I’ll misjudge the “size” of an explanation and find myself faced with an enormous literature that I struggle to pare down. The Birth of the Grid series is an example of this (often a multipart post is an indication that I’ve misjudged the scope of my topic and have gotten myself into trouble). The ideal topic is one where the existing literature is extensive enough that there’s value in synthesizing it, but not so extensive that I can’t read all or most of it in a short amount of time.
There’s no particular magic to the research process. Just lots and lots of searching on Google/Google Scholar/Google Books for relevant topics, keywords and questions until I feel like I’ve cracked open the literature and found the major relevant works. I’ll often do a lot of searching before doing much actual reading, by opening up dozens of new tabs for anything that looks like it might be relevant and sifting through them (and then doing more searches as I find new relevant considerations, new keywords and search terms, or am pointed to new sources that I haven’t yet seen).
I can usually tell once I’ve “broken in” to the literature and found most of the relevant references, and I will keep hammering on searches until I do. While I don’t have much natural inclination towards writing, I do have one for internet research, and it takes little willpower for me to research a topic as deeply as I feel like I need to go. Despite the internet being ubiquitous for decades, very few people have internalized that they truly have access to all the information in the world, and that if an answer to a question exists, it's available somewhere on the internet if you're willing to look hard enough: either it’s in some electronic document, or in a book that modern e-commerce has made trivial to acquire, or in the mind of an expert whose email you can probably find. I will keep pushing on a topic until I feel like I've exhausted my options, and it's rare that I learn about a potentially useful book or reference and am not able to get it.
For access to sources, I have an alumni login for my college library, which allows me to access most academic papers, dissertations and the like. If that doesn’t work, I’ll turn to Sci-Hub, Libgen, Internet Archive and the like. I like having physical copies of books (and often I’ll want some book that's out of print), so I also buy a lot of books on Amazon. Early on I would hem and haw about purchasing a book for a post, but now I basically buy any book I think might be useful, which amounts to several hundred books per year.
Step 2: Reading and thinking
At the end of the research process, I end up with a big pile of relevant sources. I do not read particularly fast, but I am very good at skimming: I can hunt through a large text quickly, ignore most of the information that’s not relevant to my purposes, and home in on the important bits. And because I often revisit topics (or use one post as a springboard into further, similar posts) I often find that sources cover material I'm already familiar with and can skip. With the US Steel post, for instance, I was already very familiar with the steel industry because of my previous posts on the topic, which made researching easier.
As I’m reading, I’m highlighting anything that is relevant to the topic, or that just seems interesting. Sometimes I will copy and paste these bits into a big scratch document, but often I’ll just mentally keep track of what the interesting bits are and what document they’re in. As I’m doing this, I’m also generating thoughts and ideas in response to what I’m reading: relationships to things I’ve read elsewhere or written about previously, organizing concepts that some detail is an example of, themes I see repeated, and so on. Like the highlights, sometimes I’ll put these down into the scratch document, but just as often I simply keep track of them in my head. I tend to get some of my best ideas at night right before bed (I find being tired makes the associations come a little more freely), which I write down by sending myself an email from my phone. I’m something of a slow thinker, so it often takes a few days of the research marinating in my head before I have any interesting ideas about it (one reason it takes me about a week to write a post).
If I’m writing about a topic that I don't have much direct experience with, I'll sometimes supplement reading with a discussion with an expert or experts, who can redirect me if I'm off-track or point me to useful literature and proper keywords. These folks are also often gracious enough to read through a draft of a post and point out any mistakes I might have made.
Step 3: Compression and structure
After I’ve read through the various sources on a topic, I have a collection of facts and details I think are interesting and relevant, and a collection of thoughts and ideas that the reading has inspired, either in a scratchpad document or just in my head. The next step in the process is to compress this mass of facts, details and ideas, by finding a simple structure that summarizes and explains it. This may be a concise history of the topic covering the bits that I think are relevant, it may be a simple explanation of the patterns in the data, and so on. Here I find it helpful to imagine a conversation: I’ve just read an enormous amount on a topic (or made an enormous number of graphs, if it's a data-driven, exploratory post). How would I explain the basic ideas in what I’ve read to an intelligent non-expert? The fact that I get easily confused is once again beneficial here – it’s very obvious to me if some explanation doesn’t make sense or feels incomplete.
This typically results in a short document, around 800 words in length, that lays out the basic structure of the post, but doesn’t include much if any detail. I put a lot of emphasis on getting this structure correct: with the right structure, everything else will fall into place nicely (and a good structure will make ideas clear even if the prose isn’t especially strong). But no amount of writing talent or finely crafted prose can rescue a bad structure.
Step 4: Draft
Once I have the structure, I’ll then flesh it out into a full draft, adding specific details and examples, turning placeholder explanations into full explanations, and so on. This takes the 800-word structure and turns it into a 2500-4500-word (on average) first draft. If my topic is too large for a single post, it becomes obvious at this stage, and I think about where it makes sense to split it into multiple posts. At this stage I am NOT trying to make the prose sound good; I am not thinking about phrasing, or word choice, or anything that’s typically associated with “good writing.” I’m ONLY trying to get all the relevant ideas and details onto the page, in the right order. I try to make sure that any given idea is backed up with specific examples and evidence; flitting from idea to idea and building up long chains of reasoning without bothering to support them with examples and evidence along the way is a good way to be completely full of shit.
Here again, I’ll make use of a fictional conversation with an intelligent non-expert: if I was explaining this particular aspect of the idea to someone, how would I support my point in a convincing way? What would I say, and in what order would I say it? In general I find "If you were going to explain this to someone standing in front of you, what would you say? Ok, write that." a very useful writing algorithm, both because it makes it easy to decide what to say, and because it makes it clear whether an explanation is good or not.
At this stage, whether I’ve done a good job capturing the essence of a topic with the structure I created in step 3 will become obvious. A good structure is a strong structure that can support the “weight” of lots and lots of relevant detail. Any fact that feels germane should have an obvious place to slot in, and adding a new detail shouldn’t disrupt anything that’s already there. If the structure is correct, there won’t be an issue if I find a new source with new information late in the process. If I keep running into facts, details, and ideas that don’t “fit,” it’s an indicator that there’s something wrong with the basic structure and I need to rework it. This doesn’t happen especially often now, but is something I struggled with early on.
Step 5: Editing and posting
Once the draft is written, I go back through and edit it. This is when I try and make things “sound good”: cleaning up phrasing, tweaking word choice, rearranging sentences and paragraph structure, and so on. Ideally I would do multiple editing passes, but often I only have time for one. Once I've done my edits, it goes to the IFP editor for a second round of editing, and then gets posted to Substack. I don’t have a buffer of posts, so when they're done is when they go out.
Tools and miscellanea
The tools I use are not especially interesting. Most of my work is done in a web browser. I use Microsoft Edge, because it allows me to stack tabs vertically, and organize them into groups (last time I looked, other browsers didn't have this option, though Safari may now). I only mention browser choice at all because I regard vertical, grouped tabs as absolutely critical for getting any serious work done. I typically have 100+ tabs open at any given time, a few dozen for each topic in the research stage, and the only reason it's not more is that eventually my computer runs out of memory and can’t open any more PDFs. It feels like there should be a way of working effectively that doesn’t require so many open tabs, but I find it very useful to have every reference open in front of me so I can quickly jump between them.
I use Zotero for a reference manager - every important reference gets added to my Zotero library, and each topic I write about gets its own tag. This isn’t especially useful for writing the post itself (since I basically just keep every source open in its own browser tab until the post is done) but it's very useful if I revisit a topic. I spent a long time resisting Zotero because I don’t like the way it stores PDFs, but I can’t imagine not using it now.
For note-taking and scratchpad work I use Obsidian. I’ve tried quite a few different tools for this (Evernote, Roam, Notion, OneNote, even text editors like Notepad++), and have landed on Obsidian because it has the basic functionality I want: snappy performance, can handle text and images, can quickly and easily navigate between different documents, decent search, and (most importantly) it isn’t a piece of shit. Each post gets its own scratch document in Obsidian, which I use to hold any facts and thoughts I feel need keeping track of. I also use Obsidian to write the first short structure document. I don’t use any of Obsidian’s cross-tagging or “tools for thought” features (and I suspect that most people who invest a lot in using them are sort of wasting their time).
For writing the draft, I use Google Docs. I’ve tried a few other word processors (such as Scrivener) but found them worse than Google Docs for internet writing. With internet writing it's very useful to write in a browser so I can easily move images around; if I want to put images into a document and then paste the whole thing into the Substack editor, it’s very easy if the writing is already in the browser, but very hard if I’m writing offline in something like Scrivener or Word. For data analysis, I almost always just use Excel (even for things I probably shouldn’t), though I'm now making occasional use of ChatGPT + Python for tasks where Excel would struggle.
I will sometimes use ChatGPT for research, but not often. If I need to learn the basics of some well-understood topic, ChatGPT is sometimes helpful, but often it has a hard time. For instance, on the gas turbine post, I needed to learn the basic physics of how various engine cycles worked. ChatGPT was helpful for this, but it still struggled with many of the questions I wanted answered (it had a very hard time giving a good explanation of the relationship between compression ratio and compressor efficiency, for instance). And for more complex topics where there's not an already-existing, correct answer, it does even worse. It's not useless, but right now it doesn't seem appreciably better than just basic search.
I have not found ChatGPT to be especially useful for pointing me towards useful references. For a while I was trying Elicit as a literature search tool, but it was never especially useful for the sorts of things I research (it seems much more medicine and biology focused), and it has steadily gotten worse as they’ve commercialized it.
This writing process is not particularly fast. A good writing day is one where I produce around 2000 words, though I’m generally happy with anything over 1500. I can generally write (or do writing-like tasks such as editing) for about 4 hours a day; empirically, words produced after that inevitably end up getting tossed out and aren’t useful. There are sort of conflicting trends at work here - over time I’ve gotten better at squeezing out more words in a day, but I’ve also increasingly been writing about topics outside my experience and expertise, which slows me down (since it requires consulting sources as I go along). So the 2000 words a day level has remained roughly constant. This means a long post might take 4-6 days to write and edit, not including the time spent researching.
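As a rough check on these numbers, the drafting time implied by the figures in this post can be worked out in a few lines of Python. This is only a minimal sketch: the function name `drafting_days` is made up for illustration, and it assumes a constant daily output and ignores research and editing time entirely.

```python
def drafting_days(draft_words: int, words_per_day: int) -> float:
    """Days needed to draft a post, assuming a steady daily word count."""
    return draft_words / words_per_day


# Figures quoted in the post: average first drafts of 2500-4500 words,
# and a daily output between 1500 (acceptable) and 2000 (good) words.
fastest = drafting_days(2500, 2000)  # short draft on a good day
slowest = drafting_days(4500, 1500)  # long draft at the slower pace

print(f"Drafting alone: {fastest:.2f} to {slowest:.2f} days")
```

By this estimate an average draft takes somewhere between one and three days of pure drafting. The 4-6 days quoted for a long post presumably covers longer-than-average drafts plus editing, though that breakdown is an inference rather than something stated above.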
I used to struggle with working on more than one writing project at a time - a single topic would occupy my thoughts to such an extent that there wasn’t room for anything else. Over time I’ve gotten better at this (partially because switching to vertical, grouped tabs reduced the mental overhead of keeping track of sources), and I can now have several different ideas simmering simultaneously. But it's still best if I can sort of "live" in a topic for a while and let it completely occupy my thoughts before I write a post on it. Once the draft is done, the topic often seeps out of my head as I turn my attention to a new one. I’ll sometimes disappoint people who ask me about a topic I wrote about many months ago and discover that my knowledge of it isn’t encyclopedic.
Conclusion
This is the process I use to write every post on this Substack (as well as any outside writing I do). I would not necessarily recommend it to anyone else - I think of it mostly as an algorithm for mechanically generating a steady stream of writing that takes maximum advantage of my strengths (such as internet research) and minimizes the amount of actual “writing” required. At every step, I have the luxury of ignoring almost all aspects of the writing process: while I’m writing the structure I don’t need to think about details or prose, while I’m writing the draft I don’t need to think about structure, word choice, or phrasing, while I’m editing I don’t need to think about anything except phrasing and clarity, etc. Different people with different strengths might be better served by a very different process. But for me, it seems to work well.
Thanks Brian! That was very interesting. As a retired engineer I appreciate your focus on building a strong structure to propel subsequent work.
Thank you Brian for sharing this. I was struggling with the idea of why writing was taking so much of my time. After reading this post, it turns out it should.