Every so often, I decide I want to watch all the matches a wrestler has had that meet some criteria (e.g. in a particular promotion or against a particular opponent), and usually, I have a bit more fun if I don't go into every match knowing who's going to win and how many minutes it's going to take them to do it. Cagematch.net is great for listing out all the matches I'm interested in, but it's pretty rough for keeping the results hidden: the "Matches" view on a wrestler's page is comprehensive but includes the results, and the "Matchguide" view is spoiler-free but doesn't include every match.
Fortunately, we can make the computer give us the info we want and
nothing more. We're going to use fq for this, which is a very
general tool for parsing, transforming, and making queries over
structured data formats (if you're familiar with jq, it's
basically "jq but for everything instead of just for
JSON").
I recently subscribed to RevPro's streaming service because I wanted
to watch more Safire Reed matches, so to start this process off I did a
Cagematch query for a
list of all of Safire's matches in RevPro and downloaded the page
(as matches.html). A typical entry in the table looks
something like this (I've added some indentation and removed the
href attributes from the a tags to make it
easier to see the structure):
<span class="MatchCard">
<a>Safire Reed</a> defeats <a>Anita Vaughan</a> (10:33)
</span>
<div class="MatchEventLine">
<a>RevPro Live In London 91</a> - Online Stream @ 229 The Venue in London, England, UK
</div>
The easiest useful thing we can do is to just grab the name of the
event. If we run fq with just . as the
command, it'll show us how it parsed the entire input:
fq -d html '.' matches.html (maybe pipe it to
less, since this is quite a bit of output). If we look for
"MatchEventLine" in there, we'll find that the entries we're interested
in look something like this:
{
"#text": "- Online Stream @ 229 The Venue in London, England, UK",
"@class": "MatchEventLine",
"a": {
"#text": "RevPro Live In London 91",
"@href": "https://www.cagematch.net/?id=1&nr=381436"
}
}
fq has a grep_by function that recursively
finds all values that match a given condition. We'll use that to get all
the "MatchEventLine" objects:
fq -d html 'grep_by(."@class"=="MatchEventLine")' matches.html
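If the "recursively find" part feels a bit magic, here's a rough Python sketch of the idea as I understand it: walk every value in the nested structure and keep the ones that pass a test. This is not fq's actual implementation, and the page dict below is a made-up, heavily trimmed stand-in for the real decoded output.

```python
from typing import Any, Callable, Iterator


def grep_by(node: Any, pred: Callable[[Any], bool]) -> Iterator[Any]:
    """Yield every value in a nested dict/list structure that satisfies pred."""
    if pred(node):
        yield node
    # Keep descending, whether or not this node matched.
    if isinstance(node, dict):
        for child in node.values():
            yield from grep_by(child, pred)
    elif isinstance(node, list):
        for child in node:
            yield from grep_by(child, pred)


def is_event_line(node: Any) -> bool:
    """Rough equivalent of the condition ."@class"=="MatchEventLine"."""
    return isinstance(node, dict) and node.get("@class") == "MatchEventLine"


# Hypothetical stand-in for the decoded page; the real nesting is much deeper.
page = {
    "html": {"body": {"table": {"tr": [
        {"td": {
            "span": {"@class": "MatchCard"},
            "div": {
                "@class": "MatchEventLine",
                "#text": "- Online Stream @ 229 The Venue in London, England, UK",
                "a": {"#text": "RevPro Live In London 91"},
            },
        }},
    ]}}}
}

event_names = [match["a"]["#text"] for match in grep_by(page, is_event_line)]
print(event_names)
```

The isinstance check in is_event_line is only there because plain Python would blow up when asking a string or a list for its "@class".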
From here, we want to narrow our focus down to the a
element, and get just its inner "#text" (at this point I'm
adding -r to tell fq to produce raw output,
since I don't need the name of each event to be wrapped in double
quotes):
fq -d html -r 'grep_by(."@class"=="MatchEventLine").a."#text"' matches.html
If you're wondering why "@class" and
"#text" are in double quotes but a isn't, it's
because @ and # are special characters, so keys
that contain them have to be quoted, whereas a is a plain
identifier. You could write "a" instead of a in the
middle of the chain of selectors, but you don't have to.
This will give us our list of all the event names and nothing else (Cagematch's default sorting is newest-to-oldest, so if we want to watch in order, we should start at the bottom):
RevPro Live In London 106
RevPro Live In Coventry
RevPro Live In London 105
RevPro Live In London 101
RevPro Raw Deal 2025
...
Tada! Now we can go spend the whole day watching indie wrestling instead of whatever else we were supposed to be doing. If you're looking for a rec, Safire vs Kanji from Live In London 97 is available for free (I haven't actually watched it yet, but their match at Live In London 78 has one of my favorite final three-minute stretches of any match ever, so this one is probably awesome too).
Or, we can keep tinkering....
It would also be neat to keep the info about who else was in the match, without giving away who won it. This is a little bit trickier than just extracting the names of all the participants, because the names appear in the order "X defeats Y" (or e.g. "X defeats Y and Z" for a multi-way match): to hide the result from ourselves, we should also normalize or scramble the ordering somehow. I guess you could randomize it if you wanted, but the easiest thing to do is probably just to alphabetize it. So this is what we want to do:
- Get each table entry that contains a <span class="MatchCard"> and a <div class="MatchEventLine">
- Extract the event name from the MatchEventLine (just like we already did)
- Extract all the wrestler names from the MatchCard, and sort them alphabetically
- Output something like "Event A: Wrestler 1, Wrestler 2, Wrestler 3" for each match
With the way the HTML is structured, the two elements we're
interested in always appear together (inside a <td>
tag, but we don't need that outer enclosing tag), so we can just look
for one of them with
grep_by(.div."@class"=="MatchEventLine"). (We have a
.div in here because we're looking for the element that
has that div inside it, unlike earlier when we were looking for
the div itself.) However, if we directly tack on a
.div.a."#text" to get the name of the event, we'll no
longer have access to the outer element that also contains the MatchCard
span.
This will get long enough to be a bit unwieldy to work with as a
single terminal command, so I'm going to put the commands in a file
(call it cagematch.fq or something) and use
-f cagematch.fq in place of writing out the instructions
directly on the command line. This will also let us use linebreaks:
yay!
fq has vaguely shell-like piping, and lets you bind
variables with value as $name, both of which we'll be
using. First, we'll pipe the output of our first command through to one
that binds the event name to a variable called $event
(which passes the output of the first command right on through without
modifying it):
grep_by(.div."@class"=="MatchEventLine") | .div.a."#text" as $event
We don't need anything else from the MatchEventLine div, so we can
narrow our focus down to just the MatchCard span now. All we want from
there is the list of names of the participants in the match;
conveniently, each of those names is hyperlinked (for a loose
definition of "each", which we'll scrutinize shortly), so we can
get the a elements and grab their #text
(and no, I haven't forgotten about sorting it, just
wait):
grep_by(.div."@class"=="MatchEventLine") | .div.a."#text" as $event
| grep_by(."@class"=="MatchCard").a | map(."#text")
However, some of the "names" that will show up in the resulting lists
aren't the names of individual participants, but rather the names of tag
teams or factions that they're in; a match like Medusa Complex (Charli
Evans & Millie McKenzie) vs Cut Throat Collective (Lizzy Evo &
Safire Reed) only has four wrestlers in it, but Medusa Complex and Cut
Throat Collective will show up in the list of "participants". We can
filter these out by examining the URL and selecting only the ones that
link to individuals' pages: Cagematch pages for individual wrestlers
have id=2 in the query string, whereas tag teams have
id=28 and larger factions have id=29. Let's
replace the plain old map(."#text") with a command that
excludes the latter:
grep_by(.div."@class"=="MatchEventLine") | .div.a."#text" as $event
| grep_by(."@class"=="MatchCard").a
| map(select(."@href" | (contains("id=28") or contains("id=29")) | not)."#text")
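In case the select-inside-map bit is hard to read, here's roughly the same filter as a Python sketch: keep the link text unless the URL marks a tag team (id=28) or a faction (id=29). The individual_names helper is my own name for it, and the nr values in the example links are invented placeholders, not real Cagematch IDs.

```python
def individual_names(links):
    """Return the text of every link that is not a tag-team or faction link."""
    return [
        link["#text"]
        for link in links
        # Same substring checks as contains("id=28") / contains("id=29").
        if "id=28" not in link["@href"] and "id=29" not in link["@href"]
    ]


# Hypothetical links for the tag match mentioned above; nr values are made up.
links = [
    {"#text": "Medusa Complex", "@href": "?id=28&nr=1"},
    {"#text": "Charli Evans", "@href": "?id=2&nr=2"},
    {"#text": "Millie McKenzie", "@href": "?id=2&nr=3"},
    {"#text": "Cut Throat Collective", "@href": "?id=29&nr=4"},
    {"#text": "Lizzy Evo", "@href": "?id=2&nr=5"},
    {"#text": "Safire Reed", "@href": "?id=2&nr=6"},
]

print(individual_names(links))
```

Like the fq version, this excludes the team and faction links rather than checking for id=2 directly, so any other kind of link would still slip through.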
Plus, in the rare instance where the wrestler whose page you pulled
the results from was the only wrestler in the match who has a
Cagematch page, trying to map after .a will
fail, because their name will be the only hyperlinked one and
.a will produce a single element instead of an array. To
account for that, we can pipe through the command
arrays // [.], which basically says "if the input is
already an array, pass it through, otherwise put it inside of one":
grep_by(.div."@class"=="MatchEventLine") | .div.a."#text" as $event
| grep_by(."@class"=="MatchCard").a | arrays // [.]
| map(select(."@href" | (contains("id=28") or contains("id=29")) | not)."#text")
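The "array or single element" normalization is a pretty common trick, so here's a tiny Python sketch of what arrays // [.] is doing for us. The as_list name is just something I picked for illustration.

```python
def as_list(value):
    """Pass a list through untouched; wrap anything else in a one-element list."""
    return value if isinstance(value, list) else [value]


# One hyperlinked participant: we get a lone object, so it gets wrapped.
single = {"#text": "Safire Reed"}
print(as_list(single))

# Several hyperlinked participants: already a list, so nothing changes.
several = [{"#text": "Safire Reed"}, {"#text": "Anita Vaughan"}]
print(as_list(several))
```

Either way, whatever comes next can assume it's getting a list, which is all map wants.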
Finally, for each of these lists of participants, we want to sort it, and then output the name of the event, followed by a ":", followed by the list of participants (comma-separated):
grep_by(.div."@class"=="MatchEventLine") | .div.a."#text" as $event
| grep_by(."@class"=="MatchCard").a | arrays // [.]
| map(select(."@href" | (contains("id=28") or contains("id=29")) | not)."#text")
| sort | $event + ": " + join(", ")
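That last line is doing two small jobs at once, so here's a Python sketch of just that step: alphabetize the names so their order no longer says who won, then glue them onto the event name. The spoiler_free_line helper is a name I made up, and the example uses the match shown near the top of this post.

```python
def spoiler_free_line(event, names):
    """Format one match as 'Event: Name A, Name B' with the names alphabetized."""
    return event + ": " + ", ".join(sorted(names))


# On the page the winner is listed first; sorting throws that ordering away.
line = spoiler_free_line(
    "RevPro Live In London 91",
    ["Safire Reed", "Anita Vaughan"],
)
print(line)  # RevPro Live In London 91: Anita Vaughan, Safire Reed
```

Back in fq land, all of that lives in the last line of cagematch.fq.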
Now we can run this code:
fq -d html -r -f cagematch.fq matches.html
And we have what we want!
RevPro Live In London 106: Amira, Anita Vaughan, Kanji, Mercedez Blaze, Rhio, Safire Reed
RevPro Live In Coventry: Alex Windsor, Safire Reed
RevPro Live In London 105: Alex Windsor, Anita Vaughan, Nina Samuels, Safire Reed
RevPro Live In London 101: Nina Samuels, Safire Reed
RevPro Raw Deal 2025: Myla Grace, Safire Reed
...
A couple final notes:
- Some matches also have a <span class="MatchType"> that indicates it was a title match or a no-disqualification match or some other special and exciting type of thing. Including that in the output as well would be a fun little addition to make.
- If there are more than 100 matches returned by your Cagematch query, the results will be paginated and you'll have to download each page separately. It's probably possible to pass in all the files at once with the --slurp option or something, but I haven't tried it.