Comparing Cypress and Puppeteer
An exercise in anger management
Note: I actually wrote most of this post 2 months ago when I did a deep dive comparing cypress and puppeteer. Unfortunately, I cannot give a clear recommendation for either. You will have to make up your own mind, but I hope I can help a bit by presenting what I learned in this experiment, so here goes…
# Cypress API
First off, one of the main selling points of cypress is that it is a very convenient all-in-one solution for e2e testing. It comes with its own way to structure your tests and with its own test runner.
The tests themselves are also written in a very particular way, which can seem strange and unintuitive at times.
Part of the confusion comes from the fact that cypress hides that every interaction with the website is by definition asynchronous.
Cypress test code might frequently look like this:
cy.get("#A").click();
cy.get("#B").should("be.visible");
First of all, I personally dislike the usage of jQuery for selectors and basically everything else. Second, it uses chai-jQuery in the background, and the way it does that is horrible: cypress’ `should` method basically takes the chai chainable as a string. So long, type checking; hello, typos!
I have also seen `.then()` used inside some tests. But guess what: while you can return a Promise from that callback, the return value of `.then()` itself is not a Promise. You can’t `await` it.
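A small illustration of the pitfall (the selector is hypothetical):

```js
// Cypress only enqueues a command here; nothing runs yet.
cy.get("#A").then(($a) => {
  // You can return a Promise from this callback, and cypress will wait for it...
  return new Promise((resolve) => setTimeout(resolve, 100));
});

// ...but the chain itself is NOT a Promise, so this does not work:
// await cy.get("#A");
```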
It is only now, as I write this blog post, that I begin to understand how cypress actually works. Essentially, you just can’t think of the test code you write as real code that is executed. Rather, the code you write just defines a list of commands and assertions, and cypress is free to run and re-run them however it sees fit. Wow, mind blown.
This is extremely unintuitive for someone who is used to writing imperative async code. You have no control over what is actually run when, and in which context.
# Puppeteer API
Puppeteer is very different! To start with, puppeteer is not a testing framework. It is an imperative, async API to automate a browser. Everything else, such as using it to build e2e tests, is up to you.
In this sense, the puppeteer API makes a lot more sense. You know exactly
what runs when, and you have control over the context
your code runs in.
You have code which is evaluated inside the browser frame, and your normal
testing code runs outside the browser.
The distinction, at least to me, is very clear and just makes sense. However, there are also some limitations and pitfalls to be aware of. Puppeteer has so-called page functions which are evaluated in the context of the page/frame, but which are defined in your code just like normal JS functions. The catch: they can’t reference any values from their containing scope! You have to explicitly pass everything as additional parameters. This can also lead to surprising and unintuitive errors when using code coverage. At least there are well-documented workarounds for this, and with time you will get used to spotting and treating page functions differently.
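A fragment to illustrate, assuming an async context and a puppeteer `page` (`page.evaluate(pageFunction, ...args)` is the real API; the variable names are made up):

```js
const searchTerm = "puppeteer"; // lives in Node, not in the browser

// BROKEN: the page function is serialized and executed inside the browser,
// so it cannot close over `searchTerm` from this scope:
// await page.evaluate(() => document.title.includes(searchTerm));

// WORKS: pass the value explicitly as an extra argument:
const found = await page.evaluate(
  (term) => document.title.includes(term),
  searchTerm
);
```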
There are two more pain points with the puppeteer API. The first one is that the API is too async. You know you can extend native Promises to offer a conveniently chainable API, right? Maybe I will blog about that separately.
The other annoyance is that the API feels a bit inconsistent at times. Up until `<1.20`, some convenience methods like `page.select(selector, ...options)` and `page.click(selector)` were only available on the `Page` object. It would make a lot more sense to provide such helpers on a generic `Parent` or `Container` type, which could be used to scope everything to a DOM subtree, such as a modal dialog, because right now such scoping is a huge pain.
Combining such lazy chainable Promises with a `Container`-focused API, I could imagine an API like this:

```js
await page
  .$("#my-modal")
  .$(".some-other-container")
  .click(".nested-child");
```
As you can see, it is absolutely possible to chain methods onto a custom Promise type and just await the final result.
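Here is a minimal sketch of how such a wrapper could look. This is hypothetical code, not part of puppeteer; it only builds on the real `page.$`, `elementHandle.$` and `elementHandle.click` APIs, and omits error handling:

```js
// Lazily chains async element lookups; only resolves when awaited.
class ChainableHandle {
  constructor(promise) {
    this.promise = promise;
  }

  // elementHandle.$() is puppeteer's scoped query method.
  $(selector) {
    return new ChainableHandle(this.promise.then((el) => el.$(selector)));
  }

  async click(selector) {
    const parent = await this.promise;
    const child = await parent.$(selector);
    await child.click();
  }

  // Being "thenable" means the wrapper itself can be awaited.
  then(onFulfilled, onRejected) {
    return this.promise.then(onFulfilled, onRejected);
  }
}

// Usage, assuming `page` is a puppeteer Page:
// await new ChainableHandle(page.$("#my-modal"))
//   .$(".some-other-container")
//   .click(".nested-child");
```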
But with great power also comes great responsibility. You definitely have more control with puppeteer, but you also have to take care to use it correctly. While cypress automatically retries commands until it hits a timeout, with puppeteer I had to insert explicit `.waitForSelector` or `.waitForNavigation` calls quite frequently.
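For example (the selectors are hypothetical; `waitForSelector` and `waitForNavigation` are real puppeteer APIs):

```js
// Wait until the dialog is actually visible before interacting with it.
await page.click("#open-settings");
await page.waitForSelector("#settings-dialog", { visible: true });

// For navigations, start waiting *before* triggering the click, otherwise
// the navigation may already be finished before the listener is attached.
await Promise.all([page.waitForNavigation(), page.click("#submit")]);
```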
While this might be tedious and inconvenient, it also makes sense. And it kind
of highlights that you should actually optimize your app for more instantaneous
interactions :-)
# Running Non-Headless
One big issue I had with puppeteer was the fact that it behaves like a different browser depending on whether you run it headless or with a real browser window.
One problem was language. Headless puppeteer apparently has no language at all. I don’t really know what `Accept-Language` header it provides, but express’ `.acceptsLanguages()` turns it into `["*"]`, which revealed a bug in a library that I maintain, which I then promptly fixed.
Anyhow, headless puppeteer has no language by default, and setting it needs to be done via the startup parameter `--lang=en`. That, on the other hand, does not work with non-headless puppeteer, which instead uses the `LANG` environment variable. This took me some time to figure out, so as a recommendation: set an explicit language via both the startup parameter and the environment variable.
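A minimal launch sketch, assuming puppeteer’s `launch()` options (`args` and `env` are real options; the locale values are just examples):

```js
const puppeteer = require("puppeteer");

(async () => {
  const browser = await puppeteer.launch({
    args: ["--lang=en"], // picked up by headless mode
    env: { ...process.env, LANG: "en_US.UTF-8" }, // picked up by non-headless mode
  });
  // ... run your tests ...
  await browser.close();
})();
```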
# Working with multiple tabs
One thing that puppeteer supports which is not possible in cypress is running multiple `Page` objects / tabs in parallel. Well, kind of.
In headless puppeteer, you can do that mostly without problems. But when
running in non-headless mode, only the currently focused foreground tab will
actually do anything. Background tabs will just hang indefinitely.
To make it work, you will have to call `page.bringToFront()` every time you want to switch focus between pages. And of course, make sure that your testing framework of choice does not run multiple tests in parallel.
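A sketch of what that looks like, assuming an already-launched `browser` (the URLs are made up; `newPage`, `bringToFront` and `goto` are real APIs):

```js
// In non-headless mode only the focused tab makes progress,
// so bring each page to the front before using it.
const pageA = await browser.newPage();
const pageB = await browser.newPage();

await pageA.bringToFront();
await pageA.goto("https://example.com/login");

await pageB.bringToFront();
await pageB.goto("https://example.com/admin");
```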
This has also caused me a lot of headaches. Depending on what you want to test, and what testing tools you are using, it might not be worth the hassle to use multiple pages.
So essentially, when your use case is to run E2E tests, you should try to work with only one page object / tab.
# Integrating with Jest
Speaking of tools: since we use Jest for all of our other tests, I thought it would be a good idea to stick with it, since most engineers are already familiar with it. So I went ahead and set up jest-puppeteer, which was a bit tedious but otherwise quite straightforward.
Since we have other limitations around testing a website running in a separate process, and the tests not being independent of each other in the first place, I went with running jest with `--runInBand` anyway.
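For reference, the basic setup boils down to very little (a sketch, assuming jest-puppeteer’s documented preset):

```js
// jest.config.js
module.exports = {
  preset: "jest-puppeteer",
};
```

Combined with invoking jest as `jest --runInBand`, this avoids parallel worker processes entirely.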
But coming back to what I just said about being limited to only one tab that has
focus, I’m not quite sure how non-headless mode would actually work with the
normal way that jest splits up tests into multiple worker processes.
Oh, and I also filed a bug with expect-puppeteer, which fails to work when using a different `Page` instance.
Another really severe bug I found was in jest itself, which just ignores thrown errors in async `beforeAll`. Wow!
What I also noticed is that sometimes the stack traces of errors are just swallowed somehow. Not sure why, but I get hit by the infamous `Node is either not visible or not an HTMLElement` quite often, without knowing which command, selector or element is responsible, because the error has no stack trace. This makes it a nightmare to debug, especially when things run fine 90% of the time locally but fail all the time on CI.
But the problem of unreliable and flaky tests can happen with any tool. Both the website you are testing and the test code itself need to be written in a way that either minimizes random failure or explicitly accounts for it.
# Conclusion
It has definitely been a bumpy ride, but I learned a lot. In the end I am still quite disappointed with the current state of the tools. I am also still not very confident in all of this, considering that it took quite some time to get tests to pass on CI that were successful locally.
- To summarize, puppeteer definitely has the more intuitive imperative async API. IMO, a declarative test syntax such as cucumber can make a lot of sense, but not when you are writing JS code and really expect things to be imperative.
- Puppeteer is a lot less opinionated, so you can use whatever test runner and assertion library you want. Which of course means that you have to invest time into that. Also, I am not quite happy with jest-puppeteer and expect-puppeteer, so I might recommend to just roll your own.
- Puppeteer also forces you to be more explicit, especially around the different cases of `waitForXXX`. While this might be more tedious at first, I think in the end it’s a good thing to think about this, and to optimize the app under test itself to avoid long wait times.
- Really think about whether you want to use multiple tabs. It might not be worth the hassle. Here, too, the app we are testing is the problem, because it loses state when you refresh the page or open the same URL in a different tab.
- The debugging experience, at least with the other tools I use puppeteer with, is horrible. As I said in the previous section, I had to struggle a lot with errors that had absolutely no context, which makes it impossible to debug.
- In the end, the choice is yours. I might still prefer cypress when the goal is to write E2E tests. The visual test runner that you can pause, and in which you can inspect the real DOM, is really convenient for writing and debugging test cases. The automatic video recordings also add incredible value for tests run on CI.
- I just wish cypress had made better technology choices. I mean, it largely uses jQuery and is itself still largely written in coffeescript, FFS. Also, it does not yet have any predictable release schedule. I hope electron’s move to a predictable release schedule will propagate to all the projects that depend on it.
- Oh, and another note on cypress: its pricing for its premium service is based on the number of test recordings, which they define as “[…] each time the it() function is called […]”. This definition is totally broken, because you can easily game it by just putting everything into one giant `it` function, which runs contrary to the general notion of keeping your test cases as small as possible. A much better metric would be total runtime or something like that, since they save all your video recordings, which scale with the runtime of your tests.