Using ReactPHP to consume data from an HTTP API

This is the first of two blog posts detailing how to build a middleware that leverages ReactPHP to consume data from an API, normalize it, and then push that data into Drupal through JSON:API. In this example, we will be grabbing data from the PokéAPI. In the second blog post, we will take the data and create Pokémon nodes on a Drupal site.

The purpose of the middleware is to keep this logic out of the Drupal codebase. Drupal has an amazingly robust Migration API for consuming and processing data to create content. Every Drupal site I have ever worked on (since Drupal 8’s beta days) has used the migration system to import content from CSVs or from JSON served by remote APIs. However, that means you have to be a Drupal developer to understand it, or go through the learning curve if you are not. A generic middleware means that any team can control and maintain the code. All you need to know is some PHP.

Why ReactPHP and not XYZ?

But why ReactPHP? ReactPHP is a low-level library for event-driven programming, built around its event loop. If you’re familiar with Rust, it is similar to the Tokio runtime. Node.js, by comparison, has an event loop built into its runtime.

Consuming data from a remote API is time-consuming, especially if the process is synchronous:

  • Make a request to the API
  • Wait for the request to finish
  • Parse the response
  • Handle the response data
  • Repeat

A scraper sends a lot of HTTP requests and handles a lot of responses. Often you enter the API at a collection of resources and then must fetch additional information about each of those resources. With ReactPHP we can process each request in a non-blocking fashion, so we are not waiting on one response before sending the next request, which speeds up the entire process.
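To put rough numbers on that speed-up, here is a back-of-the-envelope model in plain PHP (no ReactPHP involved). It assumes each request is nothing but waiting on the network, uses made-up latencies, and ignores parsing time and rate limits. Run one after another, the waits add up; with several requests in flight, they overlap.

```php
<?php

/**
 * Sequential: each request waits for the previous one, so latencies add up.
 *
 * @param int[] $latencies Hypothetical per-request latencies in milliseconds.
 */
function sequentialTime(array $latencies): int
{
    return array_sum($latencies);
}

/**
 * Concurrent: up to $limit requests are in flight at once. Requests start in
 * order, each on whichever slot becomes free first.
 *
 * @param int[] $latencies Hypothetical per-request latencies in milliseconds.
 */
function concurrentTime(array $latencies, int $limit): int
{
    if ($limit < 1) {
        throw new InvalidArgumentException('The limit must be at least 1.');
    }

    $slots = new SplMinHeap();
    for ($i = 0; $i < $limit; $i++) {
        $slots->insert(0); // every slot is free at t = 0
    }

    $finish = 0;
    foreach ($latencies as $latency) {
        $start = $slots->extract(); // earliest time a slot is free
        $end = $start + $latency;
        $slots->insert($end);       // that slot is busy until $end
        $finish = max($finish, $end);
    }

    return $finish;
}

$latencies = [100, 200, 150, 50]; // made-up numbers for illustration
echo sequentialTime($latencies), " ms sequential\n";           // 500 ms sequential
echo concurrentTime($latencies, 2), " ms with 2 in flight\n";  // 250 ms with 2 in flight
```

With a limit of 1, the concurrent model degrades to the sequential total, which is a handy sanity check.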

What does the scraper do?

The scraper will:

  • Go through the paginated collection and collect links to each Pokemon resource
  • Fetch the Pokemon resource
  • Fetch the Pokemon’s species resource
  • Normalize the data (we don’t want all of the raw data)
  • Write the normalized data to a JSON file

We could fetch the data and immediately send it to the Drupal API. But that would create one huge process that could fail if either of the APIs goes down. It also puts all concerns into one process. In this blog post, we are dumping the normalized data into a JSON file. What if we instead streamed it to Apache Kafka and had our data pusher consume it from there?

Let’s build it!

I was pretty impressed to see that the entire script is only about 100 lines of code, one-third of which is code to normalize the data about each Pokemon. One thing to keep in mind when building with ReactPHP is that you will find yourself using a lot of closures as callbacks to be executed by the event loop.

The Pokemon resource on the PokeAPI returns a collection of Pokemon resource identifiers. What do I mean by that? It’s an array of objects specifying the resource name and a URL for retrieving it.
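Here is what such a page looks like once decoded, as a small sketch. The JSON below is an abbreviated, hand-written sample in the shape of PokeAPI’s paginated list responses (`next`, `previous`, and `results` of name/URL pairs); a real response also includes a `count` and many more results.

```php
<?php

// Abbreviated, hand-written sample of one page of the Pokemon collection.
$json = <<<'JSON'
{
  "next": "https://pokeapi.co/api/v2/pokemon?offset=2&limit=2",
  "previous": null,
  "results": [
    {"name": "bulbasaur", "url": "https://pokeapi.co/api/v2/pokemon/1/"},
    {"name": "ivysaur", "url": "https://pokeapi.co/api/v2/pokemon/2/"}
  ]
}
JSON;

$page = json_decode($json, true, 512, JSON_THROW_ON_ERROR);

// Each result is only an identifier: a name plus the URL of the full resource.
$urls = array_column($page['results'], 'url');

foreach ($page['results'] as $identifier) {
    printf("%s -> %s\n", $identifier['name'], $identifier['url']);
}
```

Those URLs are what the scraper has to request next, one HTTP call per Pokemon.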

We will have to iterate through each page of the collection and fetch each Pokemon resource.

Use a class to wrangle your closures

I first started writing this in a single file and found myself writing some ridiculous closures, with use statements that I had to keep extending to give them access to the loop and the HTTP client, amongst other variables.

It is much easier to use a class. Your closures can then access the class properties and things are much more manageable.

No more trying to manage passing things around. They’re just accessible in the class.
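The difference is easy to see side by side. In this plain-PHP sketch (the `PokemonQueue` class and its methods are hypothetical names, not part of the project), the standalone closure must import every variable with `use`, and by reference if it modifies it, while the closure created inside a method is bound to `$this` and reaches the properties directly.

```php
<?php

// Without a class: every outer variable must be imported with `use`,
// and anything the closure modifies must be imported by reference.
$baseUrl = 'https://pokeapi.co/api/v2';
$queued = [];
$enqueue = function (string $name) use ($baseUrl, &$queued): void {
    $queued[] = "{$baseUrl}/pokemon/{$name}";
};
$enqueue('bulbasaur');

// With a class: closures created inside a method are bound to $this,
// so they can read and write properties without any `use` list.
final class PokemonQueue
{
    /** @var string[] */
    private array $queued = [];

    public function __construct(private string $baseUrl)
    {
    }

    public function enqueuer(): Closure
    {
        return function (string $name): void {
            $this->queued[] = "{$this->baseUrl}/pokemon/{$name}";
        };
    }

    /** @return string[] */
    public function queued(): array
    {
        return $this->queued;
    }
}

$queue = new PokemonQueue('https://pokeapi.co/api/v2');
$queue->enqueuer()('bulbasaur');
```

As more dependencies (the loop, the HTTP client, an output stream) are added, only the class version stays the same size.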

In this example, we will create a class that contains all of our event loop logic. It will enter the PokeAPI at the Pokemon collection endpoint and go through each page until all Pokemon have been processed.

Create the project and install dependencies

Before we can do anything, we need to set up the project with Composer and get our dependencies. We will need the following packages:

  • react/event-loop: This is the main library. Our other dependencies require it, but I like to be explicit about my dependencies.
  • react/http: This library provides an HTTP client that allows for concurrent, asynchronous requests.
  • clue/ndjson-react: Since we’re working in an asynchronous environment and processing a lot of data, we should stream our data to the artifact file. The NDJSON (newline-delimited JSON) format makes this a lot easier, because each record is a standalone JSON document on its own line.

Edit the generated composer.json file to register a PSR-4 autoload namespace for our class.

💪 Now we can write some code.

The execution script

First, we will write the script that will be executed to scrape the API. That way you can give it a few tries along the way and do some experimentation. I know I am the kind of person who starts reading a blog, writes some code from it, and then ends up running a ton of experiments, so I want to make sure you can do the same.

Create a PHP file in the root of your project. It does a few things for us:

  • Loads the autoloader generated by Composer
  • Creates an event loop
  • Instantiates our PokemonMaster class
  • Runs the API scraper

It’s really simple, but that is because all of our logic is in our class.

When using ReactPHP, your application revolves around creating an event loop and running it. You should always save calling the loop’s run() method for the last line of your code. The PHP script will keep executing as long as the event loop has ticks queued for execution. If you run the loop before anything is registered, nothing will happen.
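To make those two rules concrete, here is a deliberately tiny, single-threaded stand-in for an event loop. `ToyLoop` is a hypothetical class, loosely modeled on the `futureTick()`/`run()` pair of ReactPHP’s loop but with no timers or I/O, so it is not a substitute for react/event-loop. It shows that `run()` keeps going while ticks are queued (including ticks queued by other ticks) and returns immediately when nothing is registered.

```php
<?php

// A toy tick queue, NOT ReactPHP: it only models the "run until nothing is queued" rule.
final class ToyLoop
{
    private SplQueue $ticks;

    public function __construct()
    {
        $this->ticks = new SplQueue();
    }

    public function futureTick(callable $callback): void
    {
        $this->ticks->enqueue($callback);
    }

    // Runs ticks until the queue is empty, including ticks added by other ticks.
    public function run(): void
    {
        while (!$this->ticks->isEmpty()) {
            ($this->ticks->dequeue())();
        }
    }
}

$log = [];
$loop = new ToyLoop();

$loop->run(); // nothing registered yet, so this returns immediately
$log[] = 'run() returned with nothing to do';

$loop->futureTick(function () use ($loop, &$log): void {
    $log[] = 'first tick';
    $loop->futureTick(function () use (&$log): void {
        $log[] = 'tick queued by the first tick';
    });
});
$loop->run(); // the last line: processes ticks until none are left

print_r($log);
```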

You’ll be able to scrape the API by running the script with the PHP CLI.

The PokemonMaster: asynchronous and concurrent data fetching

As you saw from the execution script, we need a class that requires the event loop and has a method that runs the scraper. Create a directory for the class and the class file inside of it.

Let’s create the base scaffolding of the code. We’ll define our class and its constructor, along with our entry point method. The constructor takes a loop as its parameter and sets it as a property. We then also construct a new browser object to act as our API client.

Our entry point method will add our first tick to the event loop. That way things get rolling once the loop runs. That tick kicks off a series of requests to go through the PokeAPI and the list of Pokemon. Since the list of Pokemon is paginated, we will want to put that logic in its own method. That way we can call the method for each page of results.

Like every API should, the PokeAPI returns `next` and `previous` links for collection resources.

That means we can just keep calling our page-fetching method, starting with the initial API URL, and continue following the `next` links until the API tells us there is nothing else to process (the `next` link is null).
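Stripped of ReactPHP, the "follow `next` until it is null" logic looks like the blocking sketch below, which is roughly what the plain loop-with-Guzzle approach amounts to. `collectResourceUrls` is a hypothetical helper, and `$fetchPage` stands in for an HTTP call that returns the decoded JSON page; two fake pages keep it runnable offline. It uses a `while` loop, whereas the post calls its page method again for each page.

```php
<?php

/**
 * Follows `next` links from $startUrl until a page reports `next: null`,
 * collecting every resource URL along the way (blocking, one page at a time).
 *
 * @param callable(string): array $fetchPage Stand-in for an HTTP GET + json_decode.
 * @return string[]
 */
function collectResourceUrls(string $startUrl, callable $fetchPage): array
{
    $urls = [];
    $pageUrl = $startUrl;
    while ($pageUrl !== null) {
        $page = $fetchPage($pageUrl);
        foreach ($page['results'] as $identifier) {
            $urls[] = $identifier['url'];
        }
        $pageUrl = $page['next']; // null on the last page ends the loop
    }
    return $urls;
}

// Two fake pages keyed by URL, so the sketch runs without network access.
$pages = [
    'https://pokeapi.co/api/v2/pokemon?limit=2' => [
        'next' => 'https://pokeapi.co/api/v2/pokemon?offset=2&limit=2',
        'results' => [
            ['name' => 'bulbasaur', 'url' => 'https://pokeapi.co/api/v2/pokemon/1/'],
            ['name' => 'ivysaur', 'url' => 'https://pokeapi.co/api/v2/pokemon/2/'],
        ],
    ],
    'https://pokeapi.co/api/v2/pokemon?offset=2&limit=2' => [
        'next' => null,
        'results' => [
            ['name' => 'venusaur', 'url' => 'https://pokeapi.co/api/v2/pokemon/3/'],
        ],
    ],
];

$urls = collectResourceUrls(
    'https://pokeapi.co/api/v2/pokemon?limit=2',
    fn (string $url): array => $pages[$url],
);
```

Every page here waits for the previous one, and every Pokemon would wait its turn too; that is the part the event loop changes.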

At this point, we have not really gained much by using ReactPHP over a loop with Guzzle. But that will change once we start processing the results. If there is a `next` link, we add a new tick to fetch the next page, and then we process the current results. We then need to loop through each of the results and retrieve the Pokemon resource.

To do this, we will create a method that fetches a single Pokemon resource.

We call this method for each result. It adds a tick to the event loop to queue an HTTP request for the Pokemon resource. This is where ReactPHP’s non-blocking operations, and the ReactPHP HTTP library’s concurrent requests, become beneficial.

  • The API client leverages promises, so our script is not blocked waiting for a request to complete before other ticks are handled.
  • We are adding a new tick for the new Pokemon collection page before we process the results of the response.
  • Each result is added as its own tick to the event loop, to be processed after the ticks already on the loop.

This allows us to begin having concurrent requests and processing without being blocked on each individual HTTP request (it took me a few tries to understand that, and even to write it). That is one reason I left in the debugging statements: I find it cool to watch the order of things.
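You can watch that ordering without any network traffic in the simulation below. It removes the HTTP requests entirely and uses an `SplQueue` as a stand-in for the event loop’s tick queue, which is an assumption: ReactPHP’s loop is more involved, and real responses arrive whenever the network delivers them. The page contents are hypothetical. Because each page queues the next page before its own results, page 3 is requested while page 1’s Pokemon are still being handled.

```php
<?php

// Hypothetical pages: Pokemon names plus the number of the next page (null = last page).
$pages = [
    1 => ['results' => ['bulbasaur', 'ivysaur'], 'next' => 2],
    2 => ['results' => ['venusaur', 'charmander'], 'next' => 3],
    3 => ['results' => ['charmeleon', 'charizard'], 'next' => null],
];

$ticks = new SplQueue(); // stand-in for the event loop's tick queue
$log = [];

$fetchPage = function (int $number) use (&$fetchPage, $pages, $ticks, &$log): void {
    $log[] = "page {$number}";
    $page = $pages[$number];

    // 1. Queue the next page first, so paging keeps moving...
    if ($page['next'] !== null) {
        $next = $page['next'];
        $ticks->enqueue(fn () => $fetchPage($next));
    }

    // 2. ...then queue one tick per Pokemon on this page.
    foreach ($page['results'] as $name) {
        $ticks->enqueue(function () use ($name, &$log): void {
            $log[] = $name;
        });
    }
};

$ticks->enqueue(fn () => $fetchPage(1)); // the first tick, like our entry point
while (!$ticks->isEmpty()) {
    ($ticks->dequeue())();
}

echo implode("\n", $log), "\n";
// page 1, page 2, bulbasaur, ivysaur, page 3, venusaur, charmander, charmeleon, charizard
```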

The PokemonMaster: streaming normalized data to an artifact

Great! We have data. Now what? We can write the data to a JSON file for later parsing. If we wanted to, we could keep a class property holding an array of the data to be processed and write it all at the end. But that would use a lot of memory, and honestly is not as fun. We can use streams to write our data to a JSON file, which uses less memory and also suits our script’s concurrent nature.

That’s where the NDJSON format comes in handy. Instead of requiring a root array in our JSON file, we can essentially just append new rows to the file.
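The format itself is simple enough to show in plain PHP. This is a blocking, dependency-free sketch, not clue/ndjson-react’s Encoder; `writeNdjsonRecord` and `readNdjsonRecords` are hypothetical helpers, and `php://memory` stands in for the artifact file. Each record is one `json_encode()` call plus a newline, so records can be appended in whatever order they finish, and a reader can process the file one line at a time.

```php
<?php

/**
 * Appends one record to an NDJSON stream: one JSON document per line.
 *
 * @param resource $stream
 */
function writeNdjsonRecord($stream, array $record): void
{
    fwrite($stream, json_encode($record, JSON_THROW_ON_ERROR) . "\n");
}

/**
 * Reads NDJSON back one line at a time, so the whole file never sits in memory.
 *
 * @param resource $stream
 */
function readNdjsonRecords($stream): Generator
{
    while (($line = fgets($stream)) !== false) {
        $line = trim($line);
        if ($line !== '') {
            yield json_decode($line, true, 512, JSON_THROW_ON_ERROR);
        }
    }
}

$stream = fopen('php://memory', 'w+');
writeNdjsonRecord($stream, ['name' => 'bulbasaur', 'order' => 1]);
writeNdjsonRecord($stream, ['name' => 'ivysaur', 'order' => 2]);
// The stream now holds:
// {"name":"bulbasaur","order":1}
// {"name":"ivysaur","order":2}

rewind($stream);
$records = iterator_to_array(readNdjsonRecords($stream));
```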

In the constructor, we will create a writable stream and pass it to the NDJSON library’s Encoder. We’ll set the encoder as a class property so that we can access it in our closures.

Hat tip to Christian Lück (the ReactPHP maintainer) for showing me the option!

Now we can update our Pokemon-fetching method to write data to our JSON stream. Instead of saving the entire payload, we will only save a couple of its properties.
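Normalizing is just picking fields out of the decoded payload before it is written. In this sketch, `normalizePokemon` is a hypothetical helper and the set of kept fields is an arbitrary choice, not necessarily the ones the post saves; the payload is a trimmed, hand-written sample of a Pokemon resource.

```php
<?php

/**
 * Keeps only a few fields from a decoded Pokemon resource.
 * Which fields to keep is a hypothetical choice for this sketch.
 */
function normalizePokemon(array $pokemon): array
{
    return [
        'id' => $pokemon['id'],
        'name' => $pokemon['name'],
        'order' => $pokemon['order'],
        // `types` is a list of {slot, type: {name, url}} entries; keep only the names.
        'types' => array_map(
            fn (array $entry): string => $entry['type']['name'],
            $pokemon['types'],
        ),
    ];
}

// Trimmed, hand-written payload; the real resource has many more fields.
$payload = [
    'id' => 1,
    'name' => 'bulbasaur',
    'order' => 1,
    'height' => 7,
    'types' => [
        ['slot' => 1, 'type' => ['name' => 'grass', 'url' => 'https://pokeapi.co/api/v2/type/12/']],
        ['slot' => 2, 'type' => ['name' => 'poison', 'url' => 'https://pokeapi.co/api/v2/type/4/']],
    ],
];

$normalized = normalizePokemon($payload);
print_r($normalized);
```

Dropping fields at this stage keeps each NDJSON line small and gives the Drupal side a stable shape to consume.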

That’s it!

The completed file

Your completed file should look like the following:

Now we can give it a try! Run the script:

Now check your created JSON file. Your results will vary, but here is an example:

You may notice that the order field is not sequential. The Pokemon collection resource returns results sorted by the order property in ascending order, so the fact that we have out-of-sequence records means our script worked asynchronously, with concurrent requests!
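If you would rather check this than eyeball it, a couple of small plain-PHP helpers can read the `order` values back in file sequence and test whether they ascend. The helper names and the three NDJSON lines are hypothetical; your own artifact will differ from run to run.

```php
<?php

/**
 * Returns the `order` values of NDJSON records in the sequence they were written.
 *
 * @return int[]
 */
function ordersInFileSequence(string $ndjson): array
{
    $orders = [];
    foreach (explode("\n", trim($ndjson)) as $line) {
        $orders[] = json_decode($line, true, 512, JSON_THROW_ON_ERROR)['order'];
    }
    return $orders;
}

/** True when the values are already in ascending order. */
function isAscending(array $values): bool
{
    $sorted = $values;
    sort($sorted);
    return $sorted === $values;
}

// Hypothetical artifact contents; a real run will produce its own ordering.
$ndjson = <<<'NDJSON'
{"name":"ivysaur","order":2}
{"name":"bulbasaur","order":1}
{"name":"venusaur","order":3}
NDJSON;

$orders = ordersInFileSequence($ndjson);
var_dump(isAscending($orders)); // bool(false): records were written out of sequence
```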

But, there is more!

You may have noticed that there isn’t much data. We also need to fetch the Species resource for each Pokemon to get things like its name, description, and more. This blog post was already complicated, and I wanted to provide a simpler example here.
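As a starting point, here is a sketch of normalizing a decoded Species resource, assuming its `names` and `flavor_text_entries` lists (each entry carries a `language` object with a `name`). `normalizeSpecies` is a hypothetical helper, taking the first matching flavor text entry is an arbitrary choice, and the sample data is hand-written apart from the English and French names.

```php
<?php

/**
 * Picks a display name and one description for the given language from a
 * decoded Pokemon species resource. Using the first matching flavor text
 * entry is an arbitrary assumption for this sketch.
 */
function normalizeSpecies(array $species, string $language = 'en'): array
{
    $name = null;
    foreach ($species['names'] as $entry) {
        if ($entry['language']['name'] === $language) {
            $name = $entry['name'];
            break;
        }
    }

    $description = null;
    foreach ($species['flavor_text_entries'] as $entry) {
        if ($entry['language']['name'] === $language) {
            // Flavor text contains hard line breaks and form feeds; collapse them to spaces.
            $description = preg_replace('/\s+/u', ' ', $entry['flavor_text']);
            break;
        }
    }

    return ['name' => $name, 'description' => $description];
}

// Hand-written sample in the shape of a species resource (heavily trimmed).
$species = [
    'names' => [
        ['name' => 'Bulbizarre', 'language' => ['name' => 'fr']],
        ['name' => 'Bulbasaur', 'language' => ['name' => 'en']],
    ],
    'flavor_text_entries' => [
        ['flavor_text' => "Texte d'exemple.", 'language' => ['name' => 'fr']],
        ['flavor_text' => "Sample flavor\ntext with hard\fline breaks.", 'language' => ['name' => 'en']],
    ],
];

print_r(normalizeSpecies($species));
// name: Bulbasaur, description: Sample flavor text with hard line breaks.
```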

I have put code for a more robust example on GitHub: https://github.com/mglaman/pokeapi-middleware

I even kept the Git history, so you can see what it looked like as just a bunch of functions: https://github.com/mglaman/pokeapi-middleware/commit/6e29b68b1ce49d1f84…

😱

Open source developer, working with Drupal and building Drupal Commerce.
