Managing Intermediate Artifacts

Moving on to larger projects

9 May 2019 — 7 min

In my last post, I talked about small public projects. When a project gets bigger, however, the workflows I presented quickly become a pain. As I showed in the first post of the series, we have reached a size where typechecking, testing and linting have slowed to a crawl, to the point that the language intelligence of my IDE keeps crashing when I switch branches.

As I also showed in the first post, another problem is that the developer tools often do duplicate work, which both makes things slower and opens the door for bugs. We compile serverside code for production with one set of tools and run it in local testing with another; jest does its own thing to run the code, and webpack does its own thing when bundling code for the web.

Now I want to define some goals I would like to achieve, as well as some rules.

- First off, I would like to start with a clean slate.
  - This implies that we only care about TypeScript code, which will be important in a sec.
- There should be as little difference as possible between running code for local development and running code in production.
- Running code in local development should be convenient.
  - It should be fast.
  - It should involve as little boilerplate as possible.
- It should support code that targets the web as well as node!
- It should make it easy and convenient to organize code.
  - Specifically, it should support deep imports such as `import X from "deep/within/other/modules"`.
  - (Yes, I absolutely believe that small, public libraries should only support a single entry point and hide their internal structure! But this use case is different.)
- It should have strong rules in place to enforce best practices.

After extensively exploring the problem space, I think it will become necessary to rethink some of the conveniences that I came to rely on coming from small projects. I think it is necessary to explicitly manage intermediate artifacts.

I think this will come with some significant advantages for local development as well as for running code in production. But it comes with one significant disadvantage: most development workflows are no longer self-contained, but rely on other steps.

# `tsc` to compile files

The first step here would be to use `tsc` explicitly to transpile to code that runs natively in node 8 and modern browsers, with one very important twist: the code will use native ES modules instead of commonjs! To make this work in node, I propose to use the `esm` module to load those natively.

I am very wary of using such require hooks in production, but I really want to give this one a shot. Apart from this one, we already use source-map-support.

Using `tsc` in `--watch` mode, combined with `esm`, would mean the following:

- We would run the exact same code in local development as we do in production!
- We wouldn’t need any webpack loader at all. Webpack/rollup can consume native ES modules, so we would also run the same code on the web as we do on the server.
- Things would be fast: since we don’t need any transpiling require hook or webpack loader anymore, hot reloads should actually get faster.
- BUT: we would need to have `tsc --watch` running in the background at all times, which is an inconvenience.

Now that we have decided to actually have `tsc` emit something, and given that we will deal with TS files only, we can use project references, which will hopefully significantly reduce the resource usage and startup time of the IDE.
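
As a rough sketch, a per-package `tsconfig.json` for this setup might look like the following. The concrete values are assumptions rather than a validated configuration: `es2017` is picked as a target that node 8 should be able to run, and `../some-other-package` is a hypothetical sibling package.

```json
{
  // Hypothetical per-package config; tsc accepts comments in tsconfig.json.
  "compilerOptions": {
    "target": "es2017",          // assumption: recent enough for node 8
    "module": "esnext",          // keep import/export statements in the output
    "moduleResolution": "node",
    "composite": true,           // needed so other projects can reference this one
    "declaration": true,
    "declarationMap": true,
    "sourceMap": true
  },
  "references": [
    { "path": "../some-other-package" } // hypothetical dependency in the monorepo
  ]
}
```

With something like this in place, `tsc --build --watch` would keep the emitted files up to date, and `node -r esm index.js` would load them through the `esm` require hook.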

# Code Organization

We currently use path mapping, which needs to be set up separately for tsc and jest, plus a custom require hook using tsconfig-paths, which I had to patch myself BTW because it was both horribly slow and buggy.

After some time, I came to the conclusion that relying on path mapping was not a good idea. Apart from the problems with tsconfig-paths itself and the need to set it up correctly, it was also a source of problems because the code behaved differently in local development than in production.

So far, we also used npm packages which were published to a private registry, which in itself has caused us a lot of problems every now and then. Instead of consuming code via npm, we decided to just put the whole monorepo (a more fitting name would be code dump) into a docker image, to make us independent from an npm registry.

However, I still think using npm packages, or more specifically node_modules, has its merits.

So we established that we want to use the exact same code in production as in local development, and that we don’t want to rely on path mapping anymore. We would also like to have both convenient import paths and deep import paths. One of the reasons path mapping caused problems was that we had src and dist folders, which would allow deep import paths in local development but fail in non-obvious ways when running in production.

My proposal here, which I would still have to validate with a running example, is to remove the src/dist folders, and have tsc emit its artifacts right in the root folder. You would end up with a structure like this:

| some-package
+- README.md (maybe)
+- package.json
+- tsconfig.json
+- .eslintrc.js (maybe)
+- index.ts
+- index.js
+- index.js.map
+- index.d.ts
\- index.d.ts.map (not quite sure if these can be inlined?)

Yes, this does look very untidy. At least in vscode, the IDE can be configured to hide all the output artifacts if a corresponding .ts file exists. Not sure about other editors.

I think there are ways to organize things differently, for example by moving the package.json file into a different folder that would be the output folder with the intermediate artifacts, separate from the source files. But I think that would be more confusing than beneficial.

Also note that since we will not rely on publishing to an npm registry anymore, the package.json is free to define arbitrary names, such as this:

{
  "private": true,
  "name": "~components"
}

yarn workspaces or pnpm would make sure that a deep import such as `import X from "~components/Button"` finds the correct file.
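
To illustrate why emitting next to the sources makes this work: workspace tools link each package folder into node_modules under its package name, so the part of the specifier after the name is simply a path inside that folder. The following is a simplified sketch of that lookup, not node's actual resolution algorithm; `splitSpecifier`, `resolveDeepImport` and the `/repo` path are made up for illustration.

```typescript
// Hypothetical helper: splits a bare import specifier into the package name
// and the path inside that package. Scoped names ("@scope/name") span two segments.
export function splitSpecifier(specifier: string): { packageName: string; subPath: string } {
  const segments = specifier.split("/");
  const nameLength = specifier.startsWith("@") ? 2 : 1;
  return {
    packageName: segments.slice(0, nameLength).join("/"),
    subPath: segments.slice(nameLength).join("/"),
  };
}

// Simplified lookup: the real algorithm also walks up parent directories,
// tries several extensions and honours the "main" field of package.json.
export function resolveDeepImport(specifier: string, nodeModulesDir: string): string {
  const { packageName, subPath } = splitSpecifier(specifier);
  const file = subPath === "" ? "index.js" : `${subPath}.js`;
  return [nodeModulesDir, packageName, file].join("/");
}

// Because Button.js is emitted right next to Button.ts in the package root,
// the deep import lands on a real file in development and in production alike.
const resolved = resolveDeepImport("~components/Button", "/repo/node_modules");
console.log(resolved);
```

With separate src and dist folders, the same specifier would have to point into one of the two, which is where the mismatch between development and production came from.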

# Digression: Code Generators

One other thing that has been causing me a lot of concern recently is how to deal with other intermediate artifacts, such as code created via code generation. We use intl-codegen and apollo codegen to produce code that depends on other source files. I have written the former myself, and I’m not quite sure how happy I am with the latter.

We have multiple problems with the way we use these tools currently.

- The generated files are currently committed to git, and cause a lot of churn and merge conflicts.
- The files can get out of sync, since developers are not forced to re-generate and commit them.
- Generating these files can break either the typechecking or, far worse, the code itself in unpredictable ways. This is both inconvenient when CI builds suddenly turn red, and dangerous when things are shipped to production.
- Translators often mess up the MessageFormat syntax, which will only break when a developer runs the codegen.

To solve this problem, I think it would be a good idea to `.gitignore` these files and instead integrate their generation with a file watcher running in the background.
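
A minimal sketch of what such a watcher could look like, using node's built-in `fs.watch`. The `CodegenTask` shape and the function names are hypothetical, and neither intl-codegen nor apollo is actually invoked here; each task's `run` stands in for whatever starts the real generator.

```typescript
import { watch, FSWatcher } from "fs";

// Hypothetical description of one code generator and the inputs it reacts to.
export interface CodegenTask {
  name: string;
  // Returns true when a change to `file` should trigger this generator.
  // It should not match the generated output itself, or the watcher would loop.
  matches: (file: string) => boolean;
  run: () => void;
}

// Pure helper: which generators need to re-run for a changed file?
export function tasksFor(file: string, tasks: CodegenTask[]): CodegenTask[] {
  return tasks.filter((task) => task.matches(file));
}

// Watches `dir` and re-runs the matching generators on every change.
// Assumption: `recursive: true` is not supported on every platform and node
// version, so a real setup might prefer a dedicated watcher library.
export function watchAndGenerate(dir: string, tasks: CodegenTask[]): FSWatcher {
  return watch(dir, { recursive: true }, (_event, filename) => {
    if (!filename) return;
    for (const task of tasksFor(filename.toString(), tasks)) {
      task.run();
    }
  });
}
```

Keeping the matching logic in a pure function makes it possible to test which inputs trigger which generator without touching the file system.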

For intl-codegen, this should be easy and straightforward, but apollo is more complex, since it relies on a graphql schema, which itself depends on running your code first. In this case, I propose to actually commit the schema, but write an automated test that runs the schema creation on CI and fails when the cached schema file differs.
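
The schema check could be sketched roughly like this. The helper names are hypothetical, and how the fresh schema text is produced is left open, since that depends on the server code.

```typescript
// Normalizes line endings and surrounding whitespace so that purely
// cosmetic differences do not fail the check.
export function normalizeSchema(schema: string): string {
  return schema.replace(/\r\n/g, "\n").trim();
}

// True when the committed schema matches the freshly generated one.
export function isSchemaInSync(committed: string, generated: string): boolean {
  return normalizeSchema(committed) === normalizeSchema(generated);
}

// Hypothetical wiring in a jest test; `buildSchemaText` stands for whatever
// produces the schema text from the running code and is not a real API:
//
// test("committed schema is up to date", () => {
//   const committed = fs.readFileSync("schema.graphql", "utf8");
//   expect(isSchemaInSync(committed, buildSchemaText())).toBe(true);
// });
```

A failing test then tells the developer to re-generate and commit the schema, instead of a stale file silently reaching production.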

# Conclusion

I think the proposal shown here would solve quite a few problems while introducing only minimal inconveniences. I would really love to explore this further.