almach | spoiler-free cagematch listings with fq

Every so often, I decide I want to watch all the matches a wrestler has had that meet some criteria (e.g. in a particular promotion or against a particular opponent), and usually, I have a bit more fun if I don't go into every match knowing who's going to win and how many minutes it's going to take them to do it. Cagematch.net is great for listing out all the matches I'm interested in, but it's pretty rough for avoiding spoilers: the "Matches" view on a wrestler's page is comprehensive but includes the results, and the "Matchguide" view is spoiler-free but doesn't include every match.

Fortunately, we can make the computer give us the info we want and nothing more. We're going to use fq for this, which is a very general tool for parsing, transforming, and making queries over structured data formats (if you're familiar with jq, it's basically "jq but for everything instead of just for JSON").

I recently subscribed to RevPro's streaming service because I wanted to watch more Safire Reed matches, so to start this process off I did a Cagematch query for a list of all of Safire's matches in RevPro and downloaded the page (as matches.html). A typical entry in the table looks something like this (I've added some indentation and removed the href attributes from the a tags to make it easier to see the structure):

<span class="MatchCard">
  <a>Safire Reed</a> defeats <a>Anita Vaughan</a> (10:33)
</span>

<div class="MatchEventLine">
  <a>RevPro Live In London 91</a> - Online Stream @ 229 The Venue in London, England, UK
</div>

The easiest useful thing we can do is to just grab the name of the event. If we run fq with just . as the command, it'll show us how it parsed the entire input: fq -d html '.' matches.html (maybe pipe it to less, since this is quite a bit of output). If we look for "MatchEventLine" in there, we'll find that the entries we're interested in look something like this:

{
  "#text": "- Online Stream @ 229 The Venue in London, England, UK",
  "@class": "MatchEventLine",
  "a": {
    "#text": "RevPro Live In London 91",
    "@href": "https://www.cagematch.net/?id=1&nr=381436"
  }
}

fq has a grep_by function that recursively finds all values that match a given condition. We'll use that to get all the "MatchEventLine" objects:

fq -d html 'grep_by(."@class"=="MatchEventLine")' matches.html

From here, we want to narrow our focus down to the a element, and get just its inner "#text" (at this point I'm adding -r to tell fq to produce raw output, since I don't need the name of each event to be wrapped in double quotes):

fq -d html -r 'grep_by(."@class"=="MatchEventLine").a."#text"' matches.html

If you're wondering why "@class" and "#text" are in double quotes but a isn't, it's because @ and # aren't allowed in an unquoted key name, so any key containing them has to be quoted. You could write "a" instead of a in the middle of the chain of selectors, but you don't have to.

This will give us our list of all the event names and nothing else (Cagematch's default sorting is newest-to-oldest, so if we want to watch in order, we should start at the bottom):

RevPro Live In London 106
RevPro Live In Coventry
RevPro Live In London 105
RevPro Live In London 101
RevPro Raw Deal 2025
...
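
For anyone who'd like to see the idea outside of fq, here's a rough Python sketch of what a recursive search like grep_by is doing. It's an approximation of the behavior, not fq's actual implementation: find_matching, is_event_line, and the sample tree are made up for illustration, with the tree hand-written to look like the parsed entry shown earlier.

```python
def find_matching(node, predicate):
    """Yield every nested value for which predicate(value) is true."""
    try:
        if predicate(node):
            yield node
    except (KeyError, TypeError, AttributeError):
        # The predicate may not make sense for this kind of value
        # (e.g. asking a string for its "@class"); treat that as "no match".
        pass
    if isinstance(node, dict):
        for child in node.values():
            yield from find_matching(child, predicate)
    elif isinstance(node, list):
        for child in node:
            yield from find_matching(child, predicate)


# Hand-written stand-in for a small piece of the parsed page (an assumption).
tree = {
    "td": {
        "div": {
            "#text": "- Online Stream @ 229 The Venue in London, England, UK",
            "@class": "MatchEventLine",
            "a": {
                "#text": "RevPro Live In London 91",
                "@href": "https://www.cagematch.net/?id=1&nr=381436",
            },
        }
    }
}


def is_event_line(value):
    return isinstance(value, dict) and value.get("@class") == "MatchEventLine"


# Same shape as the fq query: find the matching objects, then take .a."#text".
event_names = [m["a"]["#text"] for m in find_matching(tree, is_event_line)]
print(event_names)
```

The search visits every nested value once, so it doesn't matter how deeply the div is buried in the table markup.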

Tada! Now we can go spend the whole day watching indie wrestling instead of whatever else we were supposed to be doing. If you're looking for a rec, Safire vs Kanji from Live In London 97 is available for free (I haven't actually watched it yet, but their match at Live In London 78 has one of my favorite final three-minute stretches of any match ever, so this one is probably awesome too).

Or, we can keep tinkering....

It would also be neat to keep the info about who else was in the match, without giving away who won it. This is a little bit trickier than just extracting the names of all the participants, because the names appear in the order "X defeats Y" (or e.g. "X defeats Y and Z" for a multi-way match): to hide the result from ourselves, we should also normalize or scramble the ordering somehow. I guess you could randomize it if you wanted, but the easiest thing to do is probably just to alphabetize it. So this is what we want to do:

1. Get each table entry that contains a <span class="MatchCard"> and a <div class="MatchEventLine">
2. Extract the event name from the MatchEventLine (just like we already did)
3. Extract all the wrestler names from the MatchCard, and sort them alphabetically
4. Output something like "Event A: Wrestler 1, Wrestler 2, Wrestler 3" for each match

With the way the HTML is structured, the two elements we're interested in always appear together (inside a <td> tag, but we don't need that outer enclosing tag), so we can just look for one of them with grep_by(.div."@class"=="MatchEventLine"). (We have a .div in here because we're looking for the element that has that div inside it, unlike earlier when we were looking for the div itself.) However, if we directly tack on a .div.a."#text" to get the name of the event, we'll no longer have access to the outer element that also contains the MatchCard span.

This will get long enough to be a bit unwieldy to work with as a single terminal command, so I'm going to put the commands in a file (call it cagematch.fq or something) and use -f cagematch.fq in place of writing out the instructions directly on the command line. This will also let us use linebreaks: yay!

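fq has vaguely shell-like piping, and lets you bind variables with value as $name, both of which we'll be using. First, we'll pipe the output of our first command through to one that binds the event name to a variable called $event. The binding doesn't change what flows down the pipe: whatever comes next still receives the output of the first command, unmodified. (On its own, the line below is incomplete, because an as binding has to be followed by a | and the rest of the pipeline; we'll add that in the next step.)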

grep_by(.div."@class"=="MatchEventLine") | .div.a."#text" as $event

We don't need anything else from the MatchEventLine div, so we can narrow our focus down to just the MatchCard span now. All we want from there is the list of names of the participants in the match; conveniently, each of those names is hyperlinked (for a loose definition of "each", which we'll scrutinize shortly), so we can get the a elements and grab their #text (and no, I haven't forgotten about sorting it, just wait):

grep_by(.div."@class"=="MatchEventLine") | .div.a."#text" as $event
    | grep_by(."@class"=="MatchCard").a | map(."#text")

However, some of the "names" that will show up in the resulting lists aren't the names of individual participants, but rather the names of tag teams or factions that they're in; a match like Medusa Complex (Charli Evans & Millie McKenzie) vs Cut Throat Collective (Lizzy Evo & Safire Reed) only has four wrestlers in it, but Medusa Complex and Cut Throat Collective will show up in the list of "participants". We can filter these out by examining the URL and keeping only the links to individuals' pages: Cagematch pages for individual wrestlers have id=2 in the query string, whereas tag teams have id=28 and larger factions have id=29. Let's replace the plain old map(."#text") with a command that excludes tag team and faction links:

grep_by(.div."@class"=="MatchEventLine") | .div.a."#text" as $event
    | grep_by(."@class"=="MatchCard").a
    | map(select(."@href" | (contains("id=28") or contains("id=29")) | not)."#text")
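
The fq filter above works by substring match on the href. As a cross-check, here's a Python sketch of a slightly stricter variant that parses the query string and keeps only links whose id is exactly 2. This is an alternative take and not what the fq command does: is_individual is a hypothetical helper, and the names and nr values in the sample data are placeholders, not real Cagematch entries.

```python
from urllib.parse import parse_qs, urlparse


def is_individual(href):
    """Return True when the link's query string has id exactly equal to 2."""
    ids = parse_qs(urlparse(href).query).get("id", [])
    return ids == ["2"]


# Placeholder names and nr values, made up for illustration only.
links = [
    {"#text": "Tag Team X", "@href": "https://www.cagematch.net/?id=28&nr=1"},
    {"#text": "Wrestler A", "@href": "https://www.cagematch.net/?id=2&nr=2"},
    {"#text": "Wrestler B", "@href": "https://www.cagematch.net/?id=2&nr=28"},
    {"#text": "Faction Y", "@href": "https://www.cagematch.net/?id=29&nr=3"},
]

individuals = [link["#text"] for link in links if is_individual(link["@href"])]
print(individuals)
```

The trade-off: the fq version drops two known kinds of link and keeps everything else, while this version keeps one known kind and drops everything else. They'd only disagree if some other type of link turned up inside a MatchCard.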

Plus, in the rare instance where the wrestler whose page you pulled the results from was the only wrestler in the match who has a Cagematch page, trying to map after .a will fail, because their name will be the only hyperlinked one and .a will produce a single element instead of an array. To account for that, we can pipe through the command arrays // [.], which basically says "if the input is already an array, pass it through, otherwise put it inside of one":

grep_by(.div."@class"=="MatchEventLine") | .div.a."#text" as $event
    | grep_by(."@class"=="MatchCard").a | arrays // [.]
    | map(select(."@href" | (contains("id=28") or contains("id=29")) | not)."#text")
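
The same "wrap it unless it's already a list" idea, sketched in Python in case the // syntax is unfamiliar. as_list is a hypothetical helper name, and the sample values just mimic the shapes described above (one link object versus a list of them).

```python
def as_list(value):
    """Pass lists through unchanged; wrap anything else in a one-item list."""
    return value if isinstance(value, list) else [value]


# One hyperlinked name: the parsed "a" is a single object.
single = {"#text": "Safire Reed"}
# Several hyperlinked names: the parsed "a" is already a list of objects.
several = [{"#text": "Safire Reed"}, {"#text": "Anita Vaughan"}]

print(as_list(single))
print(as_list(several))
```

After this step, everything downstream can assume it's working with a list, which is what map needs.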

Finally, for each of these lists of participants, we want to sort it, and then output the name of the event, followed by a ":", followed by the list of participants (comma-separated):

grep_by(.div."@class"=="MatchEventLine") | .div.a."#text" as $event
    | grep_by(."@class"=="MatchCard").a | arrays // [.]
    | map(select(."@href" | (contains("id=28") or contains("id=29")) | not)."#text")
    | sort | $event + ": " + join(", ")

Now we can run this code:

fq -d html -r -f cagematch.fq matches.html

And we have what we want!

RevPro Live In London 106: Amira, Anita Vaughan, Kanji, Mercedez Blaze, Rhio, Safire Reed
RevPro Live In Coventry: Alex Windsor, Safire Reed
RevPro Live In London 105: Alex Windsor, Anita Vaughan, Nina Samuels, Safire Reed
RevPro Live In London 101: Nina Samuels, Safire Reed
RevPro Raw Deal 2025: Myla Grace, Safire Reed
...
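
The last step (sort the names, then glue them onto the event name) is small enough to sketch in Python too. spoiler_free_line is a hypothetical helper; the sample input is the Live In London 91 entry from the HTML snippet near the top, with the names given in "winner first" order.

```python
def spoiler_free_line(event, names):
    """Format one match as 'Event: Name 1, Name 2' with the names alphabetized."""
    return f"{event}: {', '.join(sorted(names))}"


# Names arrive in "X defeats Y" order; sorting hides who was listed first.
line = spoiler_free_line("RevPro Live In London 91", ["Safire Reed", "Anita Vaughan"])
print(line)
```

Sorting compares the full name strings, so the order is effectively by first name, which is fine here since the only goal is an order that doesn't depend on the result.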

A couple final notes:

- Some matches also have a <span class="MatchType"> that indicates it was a title match or a no-disqualification match or some other special and exciting type of thing. Including that in the output as well would be a fun little addition to make.
- If there are more than 100 matches returned by your Cagematch query, the results will be paginated and you'll have to download each page separately. It's probably possible to pass in all the files at once with the --slurp option or something, but I haven't tried it.