By David Zimdars (Zimdarsdavid) on Saturday, September 03, 2011 - 10:25 am:
Clayton, I'm not sure you are reading table 24 right in ace vs. ace. This table uses Schirmer's SUPR prediction method, so he has tried to correct for player skill using a Bayesian analysis rather than just using win/loss for a particular matchup. On this table you cross reference row vs. column, so Schirmer predicts Hydran wins 5.3/10 games vs. Fed, for example (ace vs. ace). I was referring to table 10 in the overall report.
By Clayton Krueger (Krieg) on Saturday, September 03, 2011 - 11:53 am:
How I read table 24 is that the Hydran has a winning percentage against 13 out of 17 opponents when both ships are flown by equally skilled players. Of course this is according to Schirmer and his particular statistical analysis. My limited experience tends to agree with many of his judgements, but not all.
By Peter D Bakija (Bakija) on Saturday, September 03, 2011 - 12:01 pm:
Dave wrote:
>>But the really unfortunate kicker is that the aces can point out that when ace vs. ace games are considered, the Fed is more even. And then this justifies the preservation of a pool of marks and maroons who stumble into playing the Fed - providing a neat and tidy group of easy kills for the aces.>>
Uh, wha?
Who is trying to "justify the preservation of a pool of marks and maroons"?
>> Now, I'm not saying those experts who say they prefer some other method than overall win loss method to justify maintaining status quo on the Fed were motivated to preserve their personal pool of easier games. Rather, I'm just pointing out that the comparison of the statistics from the two pools point to that unintended consequence.>>
You seem to be misunderstanding what is going on here. *I'm* (for example) not trying to defend the idea that the Fed is fine as it is (you'll note that I also am one of the strongest advocates of giving the ship a G-rack). But for the past 10 years or so, whenever someone says "The Fed is weak. It needs an upgrade." and this sentiment makes its way up to the Powers That Be (i.e. the game designers), they responded with "The Fed is perfectly within the bounds of acceptable win/loss statistics. It doesn't need an upgrade". Which are their own collected statistics. Not Schirmer's. As the Powers That Be don't pay attention to Schirmer's statistics and ratings. When I say "The Fed is totally average, in terms of win/loss", I'm referring to what the game designers tell us.
I tend to look at the Ace vs Ace statistics (again, not 'cause they are Ace vs Ace games, but 'cause they are games between players of roughly equivalent skill who are reasonably good at the game). If you look at the list of All Games and factor in players of equivalent skill, the Fed comes out at 45% wins, and 12th overall. Again, not awesome, but within what the game designers feel is reasonable (i.e. 50% +/- 5%, which is what they indicated was acceptable many years ago when they first published overall win/loss statistics in, like, an issue of Star Fleet Times). So by looking at the Schirmer statistics, and then finding the ones that seem to match what the designers are looking at (i.e. that according to their own collected statistics, the Fed is acceptable), no one is trying to maintain the status quo. We are simply illustrating what the game designers are seeing.
By Peter D Bakija (Bakija) on Saturday, September 03, 2011 - 12:10 pm:
Clayton wrote:
>>How I read table 24 is that the Hydran has a winning percentage against 13 out of 17 opponents when both ships are flown by equally skilled players.>>
Ah. I see what you are looking at. Table 24 of the Ace vs Ace data is the RPS chart. It is listing the expected chance of Ship X beating Ship Y in a given match. Like when we all discuss a given match up and say "Yeah, the Kzinti is 6-4 vs the Fed." or whatever.
Looking at the Hydran, it is advantaged (i.e. 60%+) in 8 matches, disadvantaged (i.e. less than 50%) in 2 matches, and about even in 8 matches.
On this, the Fed is certainly behind the curve, being disadvantaged in, like, 9 matches (although I find that it is significantly disadvantaged vs the Selt of all things to be deeply dubious...).
By David Zimdars (Zimdarsdavid) on Saturday, September 03, 2011 - 12:11 pm:
Clayton, OK, I see how you are using the table. I mistakenly thought you were saying that the WBS, for example, would win 15 out of 17 matches as an expectation value. Rather, you were tabulating that the WBS has an RPS winning percentage advantage in 15 out of 17 matches. Pardon. I think, if, in fact, the expected win percentages of all ships were, say, all within a percent of each other, such a comparison would be kind of random. But, since the WBS has a .67 SUPR win percentage prediction vs. the FED (and many such other wide spreads) - it is a good measure that the WBS is consistently good vs. many ships, and the LDR is consistently bad vs. many ships.
By Clayton Krueger (Krieg) on Saturday, September 03, 2011 - 12:49 pm:
Yeah, winning margins of less than 10% (i.e. 54% vs 46% or closer) are fairly even matchups in my opinion. It would be interesting to redo the list with all RPS margins of less than 10% counted as ties and only those of 10% or more included in a win/loss table, and then see how the ships shake out. I'm busy now; I'll try to get to it later.
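The re-tally proposed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `classify` and `tally` are made up here, predictions are written as whole percentages, and the only values used are the two quoted earlier in this thread (HYD 5.3/10 vs FED and WBS .67 vs FED), not the full table 24.

```python
def classify(pct, band=4):
    """Label a predicted win percentage (0-100) as 'win', 'tie' or 'loss'.

    Anything within `band` points of 50 (46-54 by default) counts as a tie,
    following the rule of thumb in the post above.
    """
    if abs(pct - 50) <= band:
        return "tie"
    return "win" if pct > 50 else "loss"


def tally(predictions, band=4):
    """Count wins, ties and losses for one ship's row of predictions.

    `predictions` maps an opponent name to a predicted win percentage.
    """
    counts = {"win": 0, "tie": 0, "loss": 0}
    for pct in predictions.values():
        counts[classify(pct, band)] += 1
    return counts


# The only two predictions quoted in this thread, both against the FED.
print(classify(53))  # HYD vs FED -> tie
print(classify(67))  # WBS vs FED -> win
```

Feeding `tally` a full row of table 24 (one entry per opponent) would give that ship's win/tie/loss record under the 46-54 tie band; widening or narrowing `band` shows how sensitive the ranking is to where "even" is drawn.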
By Alan Trevor (Thyrm) on Saturday, September 03, 2011 - 01:39 pm:
I don't play Tournament SFB. I much prefer floating maps and to be blunt, I'm not very good playing on fixed maps. So I don't often post in this topic. But I did want to caution against relying too much on Ace vs. Ace battles as a general surrogate for battles between players of "equal skill". I am familiar with several wargames that, at least in my experience, are balanced in favor of one side in a battle between inexperienced players, but balanced very differently in a battle between veteran players. While the specifics vary with the individual game, the general reason seems to be that one side possesses obvious "brute force" advantages while the other side possesses more subtle advantages that require more experience to take advantage of. So the Gauls win in a game between two novices while the Romans win in a game between two veterans.
Does this happen in Tournament SFB? I don't know, but I at least wanted to raise the issue.
By William T Wilson (Sheap) on Saturday, September 03, 2011 - 01:55 pm:
Definitely it happens in tournament SFB. The old Andromedan is the obvious example, as it was considered to be not real strong when it was new, but when some aces figured out how to play it, it became nearly unbeatable.
In a more modern context, the Archaeo-Tholian is another "aces only" ship. Plasma, taken as a whole, plays better for experienced players than new ones.
The Fed is completely the opposite. Attempting to employ actual tactics in the Fed doesn't help much. The Fed's game has always been, and always will be, to close to the optimal range and push the fire button. The ace's main advantage is a better assessment of just what that optimal range is (newbies will often aim for R4, and R4 is often worse than R2 or R8).
Brute force vs. subtle advantages definitely at work in those examples.
Despite all that, the game has to be balanced at the top, because that is where it matters the most. Average and weak players will decide the game more often based on a blunder, rules problem or a simple difference in skill, as there is a lot more room for skill differences between non aces. However, ace players will exploit a ship's advantage to the maximum. Any imbalance shows much more at the top.
Additionally, ace vs ace games, which often occur in Hat and RAT tournaments, are more important (because of the tournament they are more likely playing in, not because of the players themselves).
By Ken Lin (Old_School) on Saturday, September 03, 2011 - 02:31 pm:
"Attempting to employ actual tactics in the Fed doesn't help much."
I have to disagree with this one. It takes a lot of skill and tactical execution to play the Fed successfully. While the objective may be stated in deceptively simple terms (get to the optimal range and press the button), how you get there is extremely important (where on the map? when during the turn? how many drones did you have to shoot? do you have an escape route?). Against D&D for example, a medium skilled Fed may get to his optimal range, but have some of his P1s not available because he had to shoot some drones. A less skilled Fed may not even get the optimal shot, because of too many drones in the way. A more skilled Fed may be able to run around the SP drones, and get to the optimal shot with all P1s available. The tactical implications make playing the Fed fun for me, especially in matchups that most people say are really bad for the Fed like WAX and BP.
It's kind of similar to the LYR: if you look at the LYR you tend to think "how boring, it's essentially a 6 P1 4 Disr ship", when actually, the underlying tactical variations are really interesting. Essentially, I guess I'm making the statement that a simple-looking ship has a wealth of tactics behind it, which makes the game great.
And on another note, in the Fed I usually aim for R4.
By Ken Lin (Old_School) on Saturday, September 03, 2011 - 02:36 pm:
"Despite all that, the game has to be balanced at the top, because that is where it matters the most."
This one I agree with 100%. Using Sheap's old Andro example, it doesn't matter if most of the field doesn't know how to exploit the Andro correctly - it only takes a few people who know how to utilize the ship to the fullest, and if it's not balanced (at the top), it will ruin your tournament.
By Clayton Krueger (Krieg) on Saturday, September 03, 2011 - 03:48 pm:
For what it's worth, listed below are the tournament ships with the number of matchups in which each has a winning margin of 10% or more, based on Schirmer's table #24, with the consideration that winning percentages of 54% or less are more or less ties:
WBS 12w
ZIN/ISC/HYD 10w
ORI 9w
KLI 8w
WAX 7w
AND/GRN/LYR 6w
FED/SEL 5w
RKR 4w
RFH/THA 3w
RKE/THN/LDR 2w
In my first list I credited the LYR with 4w, which is incorrect. The LYR actually has 6w total, even including winning percentages less than 55%. I also mentioned the GRN had 9 'losses', but again that was incorrect. The GRN has 3 ties of 50%, so its complete record according to table 24 is 8w/6l/3t, with 6 of those wins being by a margin equal to or greater than 10%. But any way you slice it, at least according to Schirmer, BP and the Web are fairly weak.
By Peter D Bakija (Bakija) on Saturday, September 03, 2011 - 04:28 pm:
Well, to be fair here, the issue is not "winning percentages" it is "advantageous match ups".
Going with the assumption that any match up between 4.6 and 5.4 is essentially "even" (i.e. any match where a ship has a 46%-54% chance of victory is safe to assume is a wash), what you are looking at is what games does a ship have an advantage or a disadvantage from the get go, based on the two ships involved.
Ignoring the Andro, the Fed, for example, is disadvantaged vs the GRN, ISC, KLI, LYR, KR, SEL, WAX, and GBS. And a lot of those are incredibly marginal disadvantages (i.e. they are still in the 4.0-4.5 zone). It is advantaged over the LDR, ORI, THN, and THA (also a fairly marginal advantage against everyone but the Tholians, against which the Fed is a slam dunk, generally speaking). And essentially even against the HYD, RFH, RKE, and ZIN.
When you compare these results to conventional wisdom, most folks seem to think that the Fed is totally hosed vs the RFH and ZIN (which I tend to agree with). And that the most disadvantaged fight for the Fed, according to that chart, is the SEL indicates that there is something amiss with the figuring, for my money (i.e. the math is based on a small number of games that tended to have a high degree of skill variance; if you go back a few pages and look at the "FED vs X" chart, you discover that the FED vs SEL data is based on a total of 4 games, of which the SEL won 3 of them. Which could be the result of any number of things other than the SEL actually being significantly advantaged over the FED).
>>But any way you slice it, at least according to Schirmer, BP and the Web are fairly weak.>>
It is not that this chart indicates that they are weak. It is that it indicates that they have few advantaged fights.
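The small-sample worry about the FED vs SEL number (4 games, 3 of them SEL wins) can be made concrete. What follows is a rough check, assuming each game is an independent coin flip in a truly even matchup, which is a simplification that ignores player skill entirely. The function name `prob_at_least` is made up for this sketch.

```python
from math import comb


def prob_at_least(k, n, p=0.5):
    """Probability of at least k wins in n independent games,
    each won with probability p (binomial upper tail)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Chance that one given side wins 3 or more of 4 games in a dead-even matchup.
print(prob_at_least(3, 4))  # 0.3125
```

Under those assumptions, a perfectly even matchup would still hand one particular ship a 3-1 or 4-0 record about 31% of the time, so a 3-1 result on its own says very little about which ship is actually advantaged.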
By David Cheng (Davec) on Sunday, September 04, 2011 - 10:54 pm:
David Z:
One phenomenon I don't think you're taking into consideration is the guy who says "I'll play one game to help the SFB tournament get its Ace Card".
That guy often chooses Fed, because it is often a faster game, and the systems are relatively simple for a guy who doesn't play a lot of SFB, and doesn't want to spend 6+ hours in the SFB tournament.
That guy also doesn't mind losing, because he's just taking one for the good of the tournament. He wants his game over quickly so he can go back to playing other games he likes more.
Since that not-insignificant population of tournament players is not really playing to win, they are pulling the Fed win numbers down.
I am confident this is a real phenomenon, and I suggest you factor it into your thinking when assessing the Fed's reported win percentages.
... And, we'd love to see you (and everyone else posting on this section of the BBS) at Council of Five Nations in October.
-DC
By David Zimdars (Zimdarsdavid) on Monday, September 05, 2011 - 12:09 am:
Hi DC,
I'm not sure to what extent I would expect the phenomenon of the "devil may care Fed" to dominate Robert Schirmer's data, as I think most of the data is net-kill and online RAT games. Such a motivation might be more likely in face to face games, I suppose.
By Peter D Bakija (Bakija) on Monday, September 05, 2011 - 09:16 am:
A good portion of the FtF game data that Schirmer has is from Council from the last 10+ years ('cause DC has religiously kept records and sent them to Schirmer every year, IIRC), which has, historically speaking, a high percentage of "I'll take the Fed and play a game or two, and maybe I'll get a jackpot..." relative to serious Fed players.
I don't know that there are enough of these games to highly swing stats, but they might.
By Robert Schirmer (Rwschirmer) on Monday, September 05, 2011 - 02:53 pm:
Some comments.
1. Dave C has been excellent about sending me the CoFN data, and it does represent a big part of the FtF data in recent years. Within the 349 CoFN matches I have, the FED played in 66 games (19%) and won 26 times (39% win rate). I don't know how many matches represent casual vs. competitive players, although I could probably take a guess by going off whether or not the player is a regular participant in tournaments.
2. The data set does not include Demo games.
3. On the more technical side, with regard to some of Dave Z's earlier comments, yes, one could choose other priors - they're always present in statistical analysis and need to be addressed somehow. Any particular choice has its advantages and disadvantages, and this is one reason I show different cuts at the data (raw stats, Ace only, etc.). My own thinking is that it is the ship balance prior, more than the player skill prior, that I could justify making tighter. One way or another though, getting enough data to make the priors relatively insignificant to the results is a key objective.
v/r
Robert
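To illustrate the point about priors mattering less as data accumulates, here is a toy Beta-binomial calculation. It is not the SUPR method used in the report; the symmetric Beta prior and its strength of 10 pseudo-wins and 10 pseudo-losses are arbitrary assumptions chosen for this sketch, and `shrunk_rate` is a made-up name. The inputs are two records quoted in this thread: the SEL's 3 wins in 4 games vs the FED, and the FED's 26 wins in 66 CoFN games.

```python
def shrunk_rate(wins, games, prior_strength=10.0):
    """Posterior mean win rate under a symmetric
    Beta(prior_strength, prior_strength) prior centered on 50%."""
    return (wins + prior_strength) / (games + 2 * prior_strength)


# Records quoted in the thread; the prior strength is an assumption.
for label, wins, games in [("SEL vs FED", 3, 4), ("FED at CoFN", 26, 66)]:
    raw = wins / games
    print(f"{label}: raw {raw:.3f}, shrunk {shrunk_rate(wins, games):.3f}")
```

With this particular prior, the 4-game record is pulled most of the way back toward 50% (from 75% to roughly 54%), while the 66-game record moves by only a few points (from about 39% to about 42%). A tighter or looser prior changes the exact numbers, but the pattern of small samples being dominated by the prior stays the same.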
By Andy Vancil (Andy) on Monday, September 05, 2011 - 03:39 pm:
Regardless of the data, a simple analysis of the Fed's systems relative to other, similar TCs, shows that it comes up short. It has less power, an inferior turn mode, and no tertiary weapon system. The only thing it has going for it is the photon crunch factor. And that is balanced by the photon whiff factor.
By David Zimdars (Zimdarsdavid) on Monday, September 05, 2011 - 07:33 pm:
Hi Robert,
Thanks for the nice post.
I was wondering about the player prior distribution, because a casual examination of the player ratings suggests the top player ratings are capped around 3200, and seem to be dropping with time, getting closer to 3000 and below. The prior distribution would suggest a few percent well above 3000, and we don't see that anymore.
Another thing I've wondered is whether it would be possible to weight more recent games more heavily. It would seem to me that certain player scores could inflate, say, if they played only newbies, and then as the newbies got better, stopped playing them and played the next batch of newbies. As the old newbies dropped their rust and played better, said player's score might float up as the intermediates started beating the new newbies. Is this phenomenon possible?
-Dave
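One simple way to weight recent games more heavily is an exponential decay on game age. The sketch below only illustrates that idea and is not something the ratings are known to do; the two-year half-life, the function name `weighted_win_rate`, and the sample record are all hypothetical.

```python
def weighted_win_rate(games, half_life_years=2.0):
    """Win rate in which each game's weight halves every `half_life_years`.

    `games` is a list of (age_in_years, won) pairs, where `won` is a bool.
    """
    total = 0.0
    won_total = 0.0
    for age, won in games:
        weight = 0.5 ** (age / half_life_years)
        total += weight
        if won:
            won_total += weight
    if total == 0:
        raise ValueError("no games to weight")
    return won_total / total


# Hypothetical record: two old wins, then two recent losses.
record = [(6.0, True), (5.0, True), (0.5, False), (0.0, False)]
print(weighted_win_rate(record))
```

The unweighted rate for that made-up record is 50%, while the decayed rate is much lower because the old wins count for little. The same mechanism would let an inflated score built on early games drift back down as newer results come in.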
By Marcus J. Giegerich (Marcusg) on Tuesday, September 06, 2011 - 10:42 am:
Sorry, I have not been able to follow this discussion due to the kitchen project from Hell this weekend, but I saw David Z post this:
"My point was that an entirely reasonable argument can be made that the Fed is the worst ship. In the overall statistics it absolutely is the worst ship."
Hell no. Heck no. Golly gee no! As somebody that flies the Fed often, I can attest that it has a bunch of poor match-ups and poor dice can flat out kill it. But by no means is it worse than the LDR, Selt, or Rom King Eagle. Heck, I'd probably put it above the Neo Tholian as well. To hell with statistics, I've played the ship enough and played against it enough to know what its deal is. It may need help, but I can't agree in good conscience that it is the worst TC.
And heed Dave Cheng, because I've seen the "take one for the team" players enter his Council of 5 Nations tournaments as Feds. In fact, I can't recall a Co5N tourney when that didn't happen. So those statistics are skewed when you consider that.
That being said, I totally am on board with a fix and I agree that the flagship of the game should at least be good and not mediocre.
By David Zimdars (Zimdarsdavid) on Tuesday, September 06, 2011 - 12:34 pm:
Now hey wait a minute.
The statistics as presented in Robert Schirmer's overall matchup report show that the FED has the worst raw win/loss percentage. I still say, without special knowledge to believe that the sample set is somehow flawed, it is still entirely reasonable to conclude that it is the overall worst ship. The statistics as presented on the black and white page absolutely support that statement.
Robert didn't include an asterisk by the results of the Fed to indicate he had special knowledge that the players at Co5N really weren't trying to win with the Fed or weren't playing seriously - and therefore an argument should be made that the sample is not representative. Only attendees/organizers of Co5N would know that - I wouldn't think Robert would. Although Robert seems to actually give this some credence in his previous post.
It sure looks like Robert put an *awful* lot of effort into his Tournament RPS effort. Doesn't it undermine that effort to incorporate tournament results which are known to have a high likelihood of not being serious?
Now frankly, I think ordinary net kill games have a lot of "what the heck" and "on a lark" games too. So I'm not sure the distribution of player intent ought to be any worse for any of the other empires.
Furthermore, as I understand it, Robert only includes results from players who have played 4 or more games. This should exclude a lot of one hit wonders - and I would think go a long way of culling Co5N bad apples.
Ordinarily, I would expect that an ace player, even when roped into playing a game, would play a decent game. But what seems to be implied here is that some are "taking one for the team" by de facto throwing a game (either intentionally or by lack of effort). It certainly doesn't do any favors to the project of evaluating the RPS matchups to submit such games for statistical evaluation when there is reasonable suspicion that the results are flawed.
After all, this was supposed to be a tournament RPS report - the intent, I thought, was to compare the tournament RPS?
Did Co5N have a special rule along the lines of "if you want to blow off a game be sure to take the Fed and only the Fed"? Or did the "I don't care" crowd take some other ships too? What is to stop the same argument being made for some other ship?
So, which is it?
a) the Fed apologists didn't really mean to throw the baby out with the bath water - and it is just speculation that a portion of Co5N Fed players were de facto throwing games;
or
b) some or all Co5N data ought not to be included in the RPS report.
?
FWIW I am *really* *really* impressed and awed with the amount of quality work Robert Schirmer has put into his analysis and report. Top flight master's thesis level stuff. We all ought to be in awe of his dedication to create this analysis and to take the time to maintain it. I am always super impressed by how SFB attracts such high talent individuals.
I just am a little disappointed I needed to have the secret decoder ring to utilize the results.
Stuff happens, I guess, it is just a game.
By Del Bristol (Djb1701) on Tuesday, September 06, 2011 - 01:42 pm:
I have been playing forever and agree with just about everyone out here- the Fed needs an upgrade. This discussion, although interesting, unfortunately is not solving the problem. We all think it needs a G-rack, probably with some limitations to the load it has. I think it has been proposed for like a decade and yet we can't get the "Powers that be" to listen to the "Players of the Game". I would be willing to playtest it if the "PTB" are open to the idea and if they can give us an outline for the evidence they would need to endorse the change. If they won't, we all might as well get back to doing and talking about something else as this discussion isn't going to help the game we all love get any better.
Anyone got any ideas on how we can get the ship fixed? It really is kind of ridiculous how the flagship of the game, the ship that new players want to play is near the bottom of the barrel, if not the bottom. It is the frigging Enterprise after all!
By Ted Fay (Catwhoeatsphoto) on Tuesday, September 06, 2011 - 02:41 pm:
We need to fix the TCC *and* we need a TCF. Something that adds to the flavor of the tourney bouts.
For fixing the TCC, personally I'm in favor of the G rack with four ADDs and two type IM drones, no reloads.
By Clayton Krueger (Krieg) on Tuesday, September 06, 2011 - 03:10 pm:
I propose that we agree, as much as we can agree on anything, on what subtle changes are to be made to the Fed and present them to TPTB. I know that it's out of the question to change the phaser suite, so instead I propose 2 modifications to push the Fed into the upper half of tournament ships:
1) a limited G rack
2) 32 warp
I don't think the G rack alone would do the trick. And I don't think 32 warp would make the ship unbeatable. Increased warp power would be a generalized upgrade, helping the ship in every facet of the game.
The 'Enterprise' should be the best at something, so why not warp power. How many times did the Enterprise's warp engines bail out the crew in the original TV series? I don't mean to be cheesy here but keeping the ship true to TOS is important to TPTB in my opinion.
By Peter D Bakija (Bakija) on Tuesday, September 06, 2011 - 03:59 pm:
David wrote:
>>Furthermore, as I understand it, Robert only includes results from players who have played 4 or more games.>>
I don't know that this is true for basic results. It is true for Net Kill rankings, which are different than win/loss statistics. But I'm pretty sure that the overall win/loss statistics are based on "I got data on this game." and nothing else.
>> This should exclude a lot of one hit wonders - and I would think go a long way of culling Co5N bad apples.>>
"Bad apples" is certainly not a term that applies here. What Dave is talking about is people who show up to a tournament at a convention to play a game or two to support the effort and 'cause they like SFB in general, but aren't necessarily invested in the tournament. These are generally pretty casual players who are willing to help out and support the event, and register and play a game or two, and often take the Fed as games are straight forward, not real fast, and they have a not impossible chance of winning just by being lucky.
>>Ordinarily, I would expect that an ace player, even when roped into playing a game, would play a decent game.>>
No one is talking about "ace" players here.
>> But what seems to be implied here is that some are "taking one for the team" by de facto throwing a game (either intentionally or by lack of effort).>>
I'm not quite sure where you see that this is being implied. But that isn't remotely what anyone is talking about here.
To be clear, what is being talked about here is:
A) People who play SFB very casually.
and
B) Who agree to play in the tournament for a couple games, just to support the effort, 'cause they like the game, but aren't necessarily that invested in it.
and
C) Often take the Fed for many reasons that are already detailed here.
>> It certainly doesn't do any favors to the project of evaluating the RPS matchups to submit such games for statistical evaluation when there is reasonable suspicion that the results are flawed.>>
They are games played by people who are actually trying to win, even if they aren't necessarily particularly invested in the tournament. There is no reason at all to exclude them from the data collection process. But given that they make up a not insignificant portion of the data, they might be skewing it a bit.
By Peter D Bakija (Bakija) on Tuesday, September 06, 2011 - 04:02 pm:
Clayton wrote:
>>1) a limited G rack
2) 32 warp >>
I don't think that 32 warp will fly, honestly. I can see going up to 40 power (add 2xAWR, presumably), but I can't see the warp engines getting enlarged. As they specifically removed 32 box warp engines from the tournament by design (see: Gorn, TFH), and the only 32 warp ship in the game is the Selt, which is a weird outlier and, well, not particularly good. I suspect that historical 30 warp is a non negotiable state.