Comparing Cypress and Puppeteer
An exercise in anger management
Note: I actually wrote most of this post 2 months ago when I did a deep dive comparing cypress and puppeteer. Unfortunately, I cannot give a clear recommendation for either. You will have to make up your own mind, but I hope I can help a bit by presenting what I learned in this experiment, so here goes…
# Cypress API
First off, one of the main selling points of cypress is that it is a very convenient all-in-one solution for e2e testing. It comes with its own way to structure your tests and with its own test runner.
The tests themselves are also written in a very particular way, which can seem strange and unintuitive at times.
Part of the confusion comes from the fact that cypress hides that every interaction with the website is by definition asynchronous.
Cypress test code might frequently look like this:
cy.get("#A").click();
cy.get("#B").should("be.visible");
First of all, I personally dislike the usage of jQuery for selectors and basically everything else. Second, it uses chai-jQuery in the background, and the way it does that is horrible: cypress’ `should` method basically takes the chai chainable as a string. So long, type checking; hello, typos!
I have also seen `.then()` used inside some tests. But guess what: while you can return a Promise from that callback, the return value of `.then()` itself is not a Promise. You can’t `await` it.
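A small illustration of the pitfall (the selector is hypothetical):

```js
// Cypress only enqueues a command here; nothing runs yet.
cy.get("#A").then(($a) => {
  // You can return a Promise from this callback, and cypress will wait for it...
  return new Promise((resolve) => setTimeout(resolve, 100));
});

// ...but the chain itself is NOT a Promise, so this does not work:
// await cy.get("#A");
```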
It is only now, as I write this blog post, that I begin to understand how cypress actually works. Essentially, you just can’t think of the test code you write as real code that is executed. Rather, the code you write just defines a list of commands and assertions, and cypress is free to run and re-run them however it sees fit. Wow, mind blown.
This is extremely unintuitive for someone who is used to writing imperative async code. You have no control over what is actually run when, and in which context.
# Puppeteer API
Puppeteer is very different! To start with, puppeteer is not a testing framework. It is an imperative, async API to automate a browser. Everything else, such as using it to build e2e tests, is up to you.
In this sense, the puppeteer API makes a lot more sense. You know exactly
what runs when, and you have control over the context
your code runs in.
You have code which is evaluated inside the browser frame, and your normal
testing code runs outside the browser.
The distinction, at least to me, is very clear and just makes sense. However, there are also some limitations and pitfalls to be aware of. Puppeteer has so-called page functions which are evaluated in the context of the page/frame, but which are defined in your code just like normal JS functions. The catch: they can’t reference any values from their containing scope! You have to explicitly pass everything as additional parameters. This can also lead to surprising and unintuitive errors when using code coverage. At least there are well-documented workarounds for this, and with time you will get used to spotting and treating page functions differently.
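A fragment to illustrate, assuming an async context and a puppeteer `page` (`page.evaluate(pageFunction, ...args)` is the real API; the variable names are made up):

```js
const searchTerm = "puppeteer"; // lives in Node, not in the browser

// BROKEN: the page function is serialized and executed inside the browser,
// so it cannot close over `searchTerm` from this scope:
// await page.evaluate(() => document.title.includes(searchTerm));

// WORKS: pass the value explicitly as an extra argument:
const found = await page.evaluate(
  (term) => document.title.includes(term),
  searchTerm
);
```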
There are two more pain points with the puppeteer API. The first one is that the API is too async. You know you can extend native Promises to offer a conveniently chainable API, right? Maybe I will blog about that separately.
The other annoyance is that the API feels a bit inconsistent at times. Up until `<1.20`, some convenience methods like `page.select(selector, ...options)` and `page.click(selector)` were only available on the `Page` object. It would make a lot more sense to provide such helpers on a generic `Parent` or `Container` type, which could be used to scope everything to a DOM subtree, such as a modal dialog, because right now such scoping is a huge pain.
Combining such lazy chainable Promises with a `Container`-focused API, I could imagine an API like this:

```js
await page
  .$("#my-modal")
  .$(".some-other-container")
  .click(".nested-child");
```
As you can see, it is absolutely possible to chain methods onto a custom Promise type and just await the final result.
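Here is a minimal sketch of how such a wrapper could look. This is hypothetical code, not part of puppeteer; it only builds on the real `page.$`, `elementHandle.$` and `elementHandle.click` APIs, and omits error handling:

```js
// Lazily chains async element lookups; only resolves when awaited.
class ChainableHandle {
  constructor(promise) {
    this.promise = promise;
  }

  // elementHandle.$() is puppeteer's scoped query method.
  $(selector) {
    return new ChainableHandle(this.promise.then((el) => el.$(selector)));
  }

  async click(selector) {
    const parent = await this.promise;
    const child = await parent.$(selector);
    await child.click();
  }

  // Being "thenable" means the wrapper itself can be awaited.
  then(onFulfilled, onRejected) {
    return this.promise.then(onFulfilled, onRejected);
  }
}

// Usage, assuming `page` is a puppeteer Page:
// await new ChainableHandle(page.$("#my-modal"))
//   .$(".some-other-container")
//   .click(".nested-child");
```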
But with great power also comes great responsibility. You definitely have more control with puppeteer, but you also have to take care to use it correctly. While cypress automatically retries commands until it hits a timeout, with puppeteer I had to insert explicit `.waitForSelector` or `.waitForNavigation` calls quite frequently.
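For example (the selectors are hypothetical; `waitForSelector` and `waitForNavigation` are real puppeteer APIs):

```js
// Wait until the dialog is actually visible before interacting with it.
await page.click("#open-settings");
await page.waitForSelector("#settings-dialog", { visible: true });

// For navigations, start waiting *before* triggering the click, otherwise
// the navigation may already be finished before the listener is attached.
await Promise.all([page.waitForNavigation(), page.click("#submit")]);
```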
While this might be tedious and inconvenient, it also makes sense. And it kind
of highlights that you should actually optimize your app for more instantaneous
interactions :-)
# Running Non-Headless
One big issue I had with puppeteer was the fact that it behaves like a different browser depending on whether you run it headless or with a real browser window.
One problem was language. Headless puppeteer apparently has no language at all. I don’t really know what `Accept-Language` header it provides, but express’ `.acceptsLanguages()` turns it into `["*"]`, which revealed a bug in a library that I maintain, which I then promptly fixed.
Anyhow, headless puppeteer has no language by default, and setting it needs to be done via the startup parameter `--lang=en`. That, on the other hand, does not work with non-headless puppeteer, which instead uses the `LANG` environment variable. This took me some time to figure out, so as a recommendation: set an explicit language via both the startup parameter and the environment variable.
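A minimal launch sketch, assuming puppeteer’s `launch()` options (`args` and `env` are real options; the locale values are just examples):

```js
const puppeteer = require("puppeteer");

(async () => {
  const browser = await puppeteer.launch({
    args: ["--lang=en"], // picked up by headless mode
    env: { ...process.env, LANG: "en_US.UTF-8" }, // picked up by non-headless mode
  });
  // ... run your tests ...
  await browser.close();
})();
```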
# Working with multiple tabs
One thing that puppeteer supports which is not possible in cypress is running multiple `Page` objects / tabs in parallel. Well, kind of.
In headless puppeteer, you can do that mostly without problems. But when
running in non-headless mode, only the currently focused foreground tab will
actually do anything. Background tabs will just hang indefinitely.
To make it work, you will have to call `page.bringToFront()` every time you want to switch focus between pages. And of course, make sure that your testing framework of choice does not run multiple tests in parallel.
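A sketch of what that looks like, assuming an already-launched `browser` (the URLs are made up; `newPage`, `bringToFront` and `goto` are real APIs):

```js
// In non-headless mode only the focused tab makes progress,
// so bring each page to the front before using it.
const pageA = await browser.newPage();
const pageB = await browser.newPage();

await pageA.bringToFront();
await pageA.goto("https://example.com/login");

await pageB.bringToFront();
await pageB.goto("https://example.com/admin");
```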
This has also caused me a lot of headaches. Depending on what you want to test, and what testing tools you are using, it might not be worth the hassle to use multiple pages.
So essentially, when your use case is to run E2E tests, you should try to work with only one page object / tab.
# Integrating with Jest
Speaking of tools: since we use Jest for all of our other tests, I thought it would be a good idea to stick with it, since most engineers are already familiar with it. So I went ahead and set up jest-puppeteer, which was a bit tedious but otherwise quite straightforward.
Since we have other limitations around testing a website running in a separate process, and the tests not being independent of each other in the first place, I went with running jest with `--runInBand` anyway.
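For reference, the basic setup boils down to very little (a sketch, assuming jest-puppeteer’s documented preset):

```js
// jest.config.js
module.exports = {
  preset: "jest-puppeteer",
};
```

Combined with invoking jest as `jest --runInBand`, this avoids parallel worker processes entirely.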
But coming back to what I just said about being limited to only one tab that has
focus, I’m not quite sure how non-headless mode would actually work with the
normal way that jest splits up tests into multiple worker processes.
Oh, and I also filed a bug with expect-puppeteer, which fails to work when using a different `Page` instance.
Another really severe bug I found was in jest itself, which just ignores thrown errors in async `beforeAll`. Wow!
What I also noticed is that sometimes the stack traces of errors are just swallowed somehow. Not sure why, but I get hit by the infamous `Node is either not visible or not an HTMLElement` quite often, without knowing which command, selector or element is responsible, because the error has no stack trace. This makes it a nightmare to debug, especially when things run fine 90% of the time locally but fail all the time on CI.
But the problem of unreliable and flaky tests can happen with any tool. Both the website you are testing and the test code itself need to be written in a way that either minimizes random failure or explicitly accounts for it.
# Conclusion
It has definitely been a bumpy ride, but I learned a lot. In the end I am still quite disappointed with the current state of the tools. I am also still not very confident in all of this, considering that it took quite some time to get tests to pass on CI that were successful locally.
- To summarize, puppeteer definitely has the more intuitive imperative async API. IMO, a declarative test syntax such as cucumber can make a lot of sense, but not when you are writing JS code and really expect things to be imperative.
- Puppeteer is a lot less opinionated, so you can use whatever test runner and assertion library you want. Which of course means that you have to invest time into that. Also, I am not quite happy with jest-puppeteer and expect-puppeteer, so I might recommend to just roll your own.
- Puppeteer also forces you to be more explicit, especially around the different cases of `waitForXXX`. While this might be more tedious at first, I think in the end it’s a good thing to think about this, and to optimize the app under test itself to avoid long wait times.
- Really think about whether you want to use multiple tabs. It might not be worth the hassle. Here, too, the app we are testing is the problem, because it loses state when you refresh the page or open the same URL in a different tab.
- The debugging experience, at least with the other tools I use puppeteer with, is horrible. As I said in the previous section, I had to struggle a lot with errors that had absolutely no context, which makes it impossible to debug.
- In the end, the choice is yours. I might still prefer cypress when the goal is to write E2E tests. The visual test runner that you can pause, and in which you can inspect the real DOM, is really convenient for writing and debugging test cases. The automatic video recordings also add incredible value for tests run on CI.
- I just wish cypress had made better technology choices. I mean, it largely uses jQuery and is itself still largely written in coffeescript, FFS. Also, it does not yet have any predictable release schedule. I hope electron’s move to a predictable release schedule will propagate to all the projects that depend on it.
- Oh, and another note on cypress: its pricing for its premium service is based on the number of test recordings, which they define as “[…] each time the it() function is called […]”. This definition is totally broken, because you can easily game it by just putting everything into one giant `it` function, which runs contrary to the general notion of keeping your test cases as small as possible. A much better metric would be total runtime or something like that, since they save all your video recordings, which scale with the runtime of your tests.