Expanding a Tiny(mce) home

For an open source project as old and well-known as TinyMCE the primary repository is more than just a collection of code; it’s home base, the first place to look for help and the foundation for everyone who forks or contributes to the project.

So you can imagine that making perhaps the biggest change this repository has ever seen was quite daunting. What started as a simple proposal became an evolving background priority task that has taken 5 months for me to become confident it is stable.

As TinyMCE transitioned to a modern codebase through 2017 and 2018 many external library dependencies were added from previously closed source projects. This multi-repository development style worked well enough in small teams, but as we grew the team on TinyMCE 5 we hit the pain points harder and harder slowing the team down significantly. It’s a common story in “I switched to a monorepo” blog posts.

As TinyMCE 5 wound through beta and release candidate at the end of 2018 I decided enough was enough. In consultation with Spocke the decision was made to bring our 22 library projects together alongside TinyMCE in a monorepo. I volunteered to take this on as a pet project, expecting the scope of changes in TinyMCE 5 to take priority and it would be a long slow burn.

I don’t want to create another tedious “how to monorepo” article, but I do want to give a high-level overview of why and how we did it for posterity and conversation.

Based on my expectation of delays I started with the best decision of the entire project; I did not build the monorepo by hand. I used a script to do the heavy lifting and import both the existing code and templates of new files. This script could rebuild my monorepo fork based on the master branch of TinyMCE and 22 source repositories in just 2 minutes. This gave me freedom to progress, experiment and iterate in my own little sandbox while also keeping it up to date.

Early on we decided to convert the TinyMCE repository rather than start a new one. The popularity of the project meant we could not break the master branch; the whole monorepo update for our developers had to come down with a single ​git pull. Next, we settled on Lerna as the basis for our monorepo. It is fairly well known and seemed to be strongly recommended.

Side note: the decision to use modules instead of the Lerna default packages is a whole other tale, one I tried to cover in the TinyMCE readme. It’s quite possible we will eventually drop Lerna, a yarn workspace gives us most of the benefits I was looking for, and there are definitely stories of people outgrowing Lerna. But for now it’s working well.

I quickly abandoned the low-level management functions and settled on Yarn workspaces instead, but Lerna’s help with publishing independently versioned modules is essential as we wished to continue publishing the libraries even after their source code was merged into the monorepo.

The base setup of my monorepo script was fairly simple:

  • yarn config set workspaces-experimental true
  • mkdir -p modules/tinymce && mv .gitignore * modules/tinymce
  • create new .gitignore
  • lerna init
  • Update lerna.json
    • "version": "independent",
      "npmClient": "yarn",
      "useWorkspaces": true
  • Update the default package.json created by lerna:
    • "workspaces": ["modules/*"]

 

From there I needed to start adding the package repositories into the monorepo. There are many and varied resources to help with creating a monorepo but the conversion I had planned was more complicated than most articles I found on the subject.

My initial attempt followed the simplest possible approach, git subtree:
git subtree add -P modules/name ../name master

This retained the commit log history, but it didn’t show file diffs in each commit (I guess due to the file location changes?). I’m not sure what the use case for this command is.

My second attempt was with Lerna’s import command. This filtered every commit, making it very very slow (8867 commits across 22 repositories) and the resulting git history structure did not impress me.

After digging through more articles, and finding the right words to search for, I came across a detailed approach to merging git repositories without losing file history by Eric Lee (via his Stack Overflow post).

This technique checks out the source repository master, moves all files into a subfolder, then merges that into the monorepo master. SeemsGood. I only needed to make small adjustments to this process, I can post the script if there are requests but most of the details are specific to TinyMCE and our internal git server.

Once the repositories are imported my script uses sed and a few other tricks to adjust common tooling so it works inside the monorepo:

  • Move all devDependencies to the monorepo root to avoid diverging versions
  • Switch all packages to TypeScript project references and build mode
  • Switch all packages to LGPL, matching TinyMCE (most of the source projects were Apache 2.0 licensed)
  • ./node_modules/.bin/cmd and npx cmd no longer work
    • This was a simple fix, we now use yarn cmd
  • load-grunt-tasks found no tasks to load because it required the task modules be in a local node_modules but the monorepo moved all of those to the root. So I had to get creative:
    • require('load-grunt-tasks')(grunt, {
        requireResolution: true,
        config: "../../package.json",
        pattern: ['grunt-*', 'rollup']
      });
  • grunt.loadNpmTasks() also found no tasks to load
    • This was deleted, replaced by pattern additions to the load-grunt-tasks config.

From here I made constant tweaks to the import script and began finding issues I could fix in the source repositories instead of patching with my script:

  • Repositories with CRLF in their files (TinyMCE standardises on LF). The way git merge is configured by the script it was performing renames + line ending conversions but only committed the rename. This did not happen often so was easy to fix when it came up.
  • In late 2018 we built a more modern API for our AJAX library, jax, but only deployed it to premium plugins. We decided that rather than standardise the monorepo on the old API, we would completely replace the open source jax with this new code.
  • I took the opportunity to use BFG Repo-Cleaner to strip out binary history from two projects before they were imported. This brought the monorepo git pull size down from 23MB to 11MB.
  • We use webpack a lot in development, and for a long time have relied on awesome-typescript-loader. We are big fans of atl, and we still use it for the demo environment, but testing in the monorepo was just too slow without support for TypeScript project references. So we switched testing to ts-loader, which does support them, via a seemingly innocuous commit message 😉
  • The way our test framework imported tests, when applied to the whole monorepo, produced a 19MB JavaScript file and a fun TypeScript error:
    TS2563: The containing function or module body is too large for control flow analysis.
    This one turned out to be easy to fix and sped up all projects, so in it went to the open source project months ago.
  • I sneakily deployed a monorepo-specific change to resource loading in tests. The test framework added resource aliasing for yarn workspaces, with an otherwise pointless local alias, except I switched TinyMCE tests to make use of the local alias allowing all tests to run inside the monorepo without patching from the script.

I poked and prodded at this on and off for a couple of months until I had my monorepo dev environment working, the tests all passed, and I could start thinking about versioning / publishing. I’m the sort of person who doesn’t totally trust documentation; I want to mess around and explore the commands to see what they do.

This required extreme caution. I had imported live NPM module source code so a rogue npm publish could have dire consequences. After spending some time logged out of NPM to be safe I hit upon Lerna’s publishConfig setting which let me constrain all publishing to a specific NPM registry.

I used a local Sonatype Nexus Docker container along with a fork of TinyMCE to give me complete freedom to publish builds and explore the publish styles Lerna offers while playing in my sandbox.

We publish to NPM from CI very regularly, so at first glance from-package seemed like the best path forward to match how we had been developing. After some discussion we decided to switch to automated lerna publish patch. Independent versioning on 22 packages will potentially create hundreds of version tags a year but we love automation and cleaning up a tag mess can be scripted. We do still use from-package to account for manually updating minor and major versions, but we are hoping to explore conventional commits to later automate the entire release process.

As a final step I leveraged lerna changed to create a new CI script that only runs tests on changed packages. This reduces CI build times and further improves the iteration speed of developing in the monorepo.

Five months in the planning, and after a week of internal testing, nearly 9000 new commits and the TinyMCE monorepo are now live. I’ve had a lot of fun building it and I hope it works as well for our contributors as it has already done for our own development.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.