Managing Intermediate Artifacts
Moving on to larger projects
In my last post, I talked about small public projects. When a project gets bigger, however, the workflows I presented there quickly become a pain. As I showed in the first post of the series, we have reached a size where typechecking, testing and linting have slowed to a crawl, to the point that the language intelligence of my IDE keeps crashing when I switch branches.
As I also showed in the first post, another problem is that the developer tools often do duplicate work, which both makes things slower and opens the door for bugs. We use one set of tools to compile server-side code for production, which differs from the way we run the code in local testing; jest does its own thing to run the code, and webpack does yet another thing when bundling code for the web.
Now I want to define some goals I would like to achieve, as well as some rules:
- First off, I would like to start with a clean slate.
  - This implies that we only care about TypeScript code, which will become important in a moment.
- There should be as little difference as possible between running code for local development and running code in production.
- Running code in local development should be convenient:
  - It should be fast.
  - It should involve as little boilerplate as possible.
- It should support code that targets `web` as well as `node`!
- It should make it easier and more convenient to organize code.
  - Specifically, it should support deep imports such as `import X from "deep/within/other/modules"`.
  - (Yes, I absolutely believe that small, public libraries should only support a single entry point and hide their internal structure! But this use case is different.)
- It should have strong rules in place to enforce best practices.
After extensively exploring the problem space, I think it will be necessary to rethink some of the conveniences I came to rely on in small projects. In particular, I think it is necessary to explicitly manage intermediate artifacts.
I think this will come with some significant advantages for local development as well as for running code in production. But it comes with one significant disadvantage: most development workflows are no longer self-contained, but rely on other steps.
# Using `tsc` to compile files
The first step would be to use `tsc` explicitly to transpile to code that runs natively in node 8 and modern browsers, with one very important twist: the code will use native `esm` modules instead of `commonjs`! To make this work in node, I propose to use the `esm` module so that node can load those directly. I am very wary of using such require hooks in production, but I really want to give this one a shot. Apart from this one, we already use `source-map-support`.
Using `tsc` in `--watch` mode, combined with `esm`, would mean the following:
- We would run the exact same code in local development as in production!
- We wouldn't need any webpack loader at all. Webpack and rollup can consume native `esm` modules, so we would also run the same code on the web as we do on the server.
- Things would be fast: since we no longer need a transpiling require hook or webpack loader, hot reloads should actually get faster.
- BUT: we would need to have `tsc --watch` running in the background at all times, which is an inconvenience.
Now that we have decided to actually have `tsc` emit something, combined with the fact that we will deal with TS files only, we can use project references, which will hopefully significantly reduce the resource usage and startup time of the IDE.
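With project references, `tsc --build` builds every referenced project before the projects that reference it. The sketch below is a simplified model of that ordering, not tsc's actual implementation; the package names and the graph are made up for illustration, with each entry listing the projects named in that package's `references`.

```typescript
// Hypothetical reference graph: project -> projects it references.
type ReferenceGraph = Record<string, string[]>;

// Depth-first topological sort: a project is appended only after all of
// its references, so the result is a valid build order.
function buildOrder(graph: ReferenceGraph): string[] {
  const order: string[] = [];
  const state = new Map<string, "visiting" | "done">();

  function visit(project: string, path: string[]): void {
    const current = state.get(project);
    if (current === "done") return;
    if (current === "visiting") {
      // tsc --build also refuses circular project references.
      throw new Error(`circular reference: ${[...path, project].join(" -> ")}`);
    }
    state.set(project, "visiting");
    for (const dep of graph[project] ?? []) {
      visit(dep, [...path, project]);
    }
    state.set(project, "done");
    order.push(project);
  }

  for (const project of Object.keys(graph)) {
    visit(project, []);
  }
  return order;
}

const graph: ReferenceGraph = {
  "~server": ["~components", "~utils"],
  "~web": ["~components"],
  "~components": ["~utils"],
  "~utils": [],
};

console.log(buildOrder(graph).join(" -> "));
// ~utils -> ~components -> ~server -> ~web
```

The point for the IDE is the same as for the build: each project only needs the emitted `.d.ts` files of its references, not their full sources.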
# Code Organization
We currently use path mapping, which needs to be set up separately for `tsc` and `jest`, plus a custom require hook using `tsconfig-paths`, which I had to patch myself, by the way, because it was both horribly slow and buggy.
After some time, I came to the conclusion that relying on path mapping was not a good idea. Apart from the problems with `tsconfig-paths` itself and the need to set it up correctly, it was also a source of problems because the code behaved differently in local development than in production.
So far, we have also used npm packages published to a private registry, which in itself has caused us a lot of problems every now and then. Instead of consuming code via npm, we decided to just put the whole monorepo (a more fitting name would be code dump) into a docker image, to make us independent of an npm registry.
However, I still think using npm packages, or more specifically `node_modules`, has its merits.
So we established that we want to use the exact same code in production as in
local development, and that we don’t want to rely on path mapping anymore. And
we would like to have both convenient import paths and deep import paths.
One of the reasons path mapping caused problems was the fact that we had `src` and `dist` folders, which would allow deep import paths in local development but fail in non-obvious ways when running in production.
My proposal here, which I would still have to validate with a running example, is to remove the `src`/`dist` folders and have `tsc` emit its artifacts right in the root folder. You would end up with a structure like this:
```
| some-package
+- README.md (maybe)
+- package.json
+- tsconfig.json
+- .eslintrc.js (maybe)
+- index.ts
+- index.js
+- index.js.map
+- index.d.ts
\- index.d.ts.map (not quite sure if these can be inlined?)
```
Yes, this does look very untidy. At least in `vscode`, the IDE can be configured to hide all the output artifacts if a corresponding `.ts` file exists. Not sure about other editors.
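In vscode this is done through the `files.exclude` setting, whose patterns can carry a `when` condition that checks for a sibling file. The rule itself fits in a few lines; the sketch below assumes `sourceMap`, `declaration` and `declarationMap` are enabled, and `isBuildArtifact` is just an illustrative name:

```typescript
// Output suffixes tsc produces for `index.ts` when sourceMap, declaration
// and declarationMap are enabled (assumed settings).
const OUTPUT_SUFFIXES = [".d.ts.map", ".d.ts", ".js.map", ".js"];

// A file counts as a build artifact if replacing its output suffix with
// ".ts" names a source file in the same folder.
function isBuildArtifact(fileName: string, siblings: ReadonlySet<string>): boolean {
  for (const suffix of OUTPUT_SUFFIXES) {
    if (fileName.endsWith(suffix)) {
      const source = fileName.slice(0, -suffix.length) + ".ts";
      return siblings.has(source);
    }
  }
  return false;
}

const folder = new Set([
  "README.md",
  "package.json",
  "tsconfig.json",
  "index.ts",
  "index.js",
  "index.js.map",
  "index.d.ts",
  "index.d.ts.map",
  "globals.d.ts", // handwritten declarations, no globals.ts next to it
]);

// Only the files a developer actually edits remain visible.
const visible = Array.from(folder).filter((file) => !isBuildArtifact(file, folder));
console.log(visible.join(", "));
// README.md, package.json, tsconfig.json, index.ts, globals.d.ts
```

Checking for the sibling `.ts` file, rather than hiding every `.js` or `.d.ts` file, keeps handwritten declaration files like `globals.d.ts` visible.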
I think there are ways to organize things differently, for example by moving the `package.json` file into a separate output folder that holds the intermediate artifacts, apart from the source files. But I think that would be more confusing than beneficial.
Also note that since we will not rely on publishing to an npm registry anymore, the `package.json` is free to define arbitrary names, such as this:
```json
{
  "private": true,
  "name": "~components"
}
```
`yarn` workspaces or `pnpm` would make sure that a deep import such as `import X from "~components/Button"` would find the correct file.
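Since `tsc` would emit `Button.js` right next to `Button.ts`, no `src` or `dist` segment ends up in the import path. Below is a simplified model of the lookup, assuming the workspace tool has linked the package into a single `node_modules` folder under its `package.json` name. Real Node resolution also walks up parent `node_modules` folders, honors `main`/`exports` and tries `.json`/`.node`; `splitSpecifier` and `candidateFiles` are hypothetical helpers.

```typescript
// Hypothetical helpers modelling how a deep import such as
// "~components/Button" maps onto files inside node_modules.
interface Specifier {
  packageName: string; // e.g. "~components" or "@scope/name"
  subpath: string; // path inside the package, "" for a bare import
}

function splitSpecifier(specifier: string): Specifier {
  const segments = specifier.split("/");
  // Scoped package names ("@scope/name") span two segments.
  const nameSegments = specifier.startsWith("@") ? 2 : 1;
  return {
    packageName: segments.slice(0, nameSegments).join("/"),
    subpath: segments.slice(nameSegments).join("/"),
  };
}

// Files tried in order: the path itself, with ".js" appended, then as a
// directory with index.js (JavaScript-only subset of Node's lookup).
function candidateFiles(specifier: string, nodeModules = "node_modules"): string[] {
  const { packageName, subpath } = splitSpecifier(specifier);
  const target =
    subpath === "" ? `${nodeModules}/${packageName}` : `${nodeModules}/${packageName}/${subpath}`;
  return [target, `${target}.js`, `${target}/index.js`];
}

console.log(candidateFiles("~components/Button").join("\n"));
// node_modules/~components/Button
// node_modules/~components/Button.js
// node_modules/~components/Button/index.js
```

With the flat layout, the second candidate is exactly the file `tsc` emitted, in local development and in production alike.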
# Digression: Code Generators
One other thing that has been causing me a lot of concern recently is how to deal with other intermediate artifacts, such as code created via code generation. We use intl-codegen and apollo codegen to produce code that depends on other source files. I have written the former myself, and I'm not quite sure how happy I am with the latter.
We have multiple problems with the way we use these tools currently.
- The generated files are currently committed to git, and cause a lot of churn and merge conflicts.
- The files can get out of sync, since developers are not forced to re-generate and commit them.
- Generating these files can break either the typechecking or, far worse, the code itself in unpredictable ways, which is both inconvenient when CI builds suddenly turn red and dangerous when things are shipped to production.
- Translators often mess up the `MessageFormat` syntax, which only breaks when a developer runs the codegen.
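One way to catch broken translations before anyone runs the codegen would be a cheap syntax check on CI. The sketch below is hypothetical and deliberately incomplete: it only checks that braces are balanced, using a simplified version of ICU's apostrophe quoting, and does not validate argument names, types or plural selectors. A real check could instead call the same MessageFormat parser the codegen uses.

```typescript
// Returns a description of the first brace problem, or null if none found.
// Simplified quoting: '' is a literal apostrophe, and an apostrophe directly
// before { or } starts a quoted literal that ends at the next apostrophe.
function findBraceError(message: string): string | null {
  let depth = 0;
  let i = 0;
  while (i < message.length) {
    const ch = message[i];
    if (ch === "'") {
      const next = message[i + 1];
      if (next === "'") {
        i += 2; // escaped literal apostrophe
        continue;
      }
      if (next === "{" || next === "}") {
        const end = message.indexOf("'", i + 1);
        // Treated as an error here to be strict about translator typos.
        if (end === -1) return `unterminated quote at ${i}`;
        i = end + 1;
        continue;
      }
      // Any other apostrophe (as in "it's") is plain text.
    } else if (ch === "{") {
      depth++;
    } else if (ch === "}") {
      if (depth === 0) return `unexpected "}" at ${i}`;
      depth--;
    }
    i++;
  }
  return depth === 0 ? null : `${depth} unclosed "{"`;
}

const samples = [
  "Hello {name}!",
  "{count, plural, one {# item} other {# items}}",
  "Hello {name!", // missing closing brace
];
for (const sample of samples) {
  console.log(JSON.stringify(sample), "->", findBraceError(sample) ?? "ok");
}
```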
To solve this problem, I think it would be a good idea to `.gitignore` the generated files and instead integrate them better with a file watcher running in the background.
For intl-codegen, this should be easy and straightforward, but apollo is more complex, since it relies on a graphql schema, which itself depends on running your code first. In this case, I propose to actually commit the schema, but write an automated test that runs the schema creation on CI and fails when the cached schema file differs.
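The comparison at the heart of that test could look roughly like this. It is a sketch: reading the committed file and producing the fresh schema (for example by printing it from the running code) are left out, and `checkCachedSchema`/`firstDifference` are illustrative names. Normalizing line endings and trailing whitespace avoids failing on purely cosmetic differences.

```typescript
// Normalizes line endings and trailing whitespace so that the check only
// fails on real schema changes.
function normalize(schema: string): string {
  return schema
    .replace(/\r\n/g, "\n")
    .split("\n")
    .map((line) => line.replace(/\s+$/, ""))
    .join("\n")
    .trim();
}

// Returns the 1-based line number of the first difference, or null.
function firstDifference(committed: string, generated: string): number | null {
  const a = normalize(committed).split("\n");
  const b = normalize(generated).split("\n");
  for (let i = 0; i < Math.max(a.length, b.length); i++) {
    if (a[i] !== b[i]) return i + 1;
  }
  return null;
}

// Throws (failing the CI job) when the committed schema is stale.
function checkCachedSchema(committed: string, generated: string): void {
  const line = firstDifference(committed, generated);
  if (line !== null) {
    throw new Error(
      `Committed GraphQL schema is out of date (first difference on line ${line}). ` +
        "Regenerate it and commit the result.",
    );
  }
}
```

Pointing at the first differing line makes the CI failure actionable without having to diff the files by hand.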
# Conclusion
I think the proposal shown here would solve quite a few problems while introducing only minimal inconveniences. I would really love to explore this further.