Archive through September 08, 2011

By Peter D Bakija (Bakija) on Tuesday, September 06, 2011 - 04:08 pm

David also wrote:
>>I just am a little disappointed I needed to have the secret decoder ring to utilize the results. >>

I don't think there is a need to have a secret decoder ring.

Just looking at the basic, non ace "win-loss" statistics, the Fed has a 41% win ratio. Which, yes, is the worst in the pack. When you look at the same chart and look at the "with players of equal skill", it bumps up to 45%, which puts it at 12th out of 18, above two Romulans, both Tholians, and both Lyrans.

Ignoring the factor that Dave brought up (which is mostly washed out in the "equal players" tweak), and just looking at the non ace, all games ever statistics, the Fed comes out 12th of 18. Which is a little worse than average. Without any decoder rings.

By Ted Fay (Catwhoeatsphoto) on Tuesday, September 06, 2011 - 04:39 pm

Another thing. TPTB won't go with two changes at once. We need to propose one change and then playtest the you-know-what out of it.

By David Zimdars (Zimdarsdavid) on Tuesday, September 06, 2011 - 05:55 pm

Hi Peter,

I was not trying to suggest that any casual or non-serious players had any negative intent; rather I meant that their *data points* were "bad apples" in the data set (not the players themselves, but including their win-loss records). I apologize for this poor choice of words; I meant the data - not the players.

As far as "decoder ring", again, I apologize for the tone of my rhetoric. I meant that I was unaware that the inclusion of some subset of results (Co5N) was creating a special case for the Fed. That is, these data points may be skewing the data in a manner significant enough that experts in the game aware of the nature of these games would call into question the predictive power of the probabilities calculated from said data. I was hoping that the conditions between all tournament ship games were substantially equal in random distribution in player mindset, skill level, temperament - whatever, that comparisons could be made without such caveats arising.

You may be correct that the ace-vs-ace data culls the non-serious Co5N Fed players. On the other hand, there is no guarantee of that.

-Dave

By Peter D Bakija (Bakija) on Tuesday, September 06, 2011 - 06:38 pm

Dave wrote:
>>As far as "decoder ring", again, I apologize for the tone of my rhetoric. I meant that I was unaware that the inclusion of some subset of results (Co5N) was creating a special case for the Fed.>>

Well, to be fair, I don't know that it is actually creating a special case for the Fed, so much as it represents a not insignificant factor that might affect the Fed numbers as a whole--that the Fed is probably inordinately played by newer/less committed players. While the Co5N results might or might not have a significant impact on the data as a whole, it is certainly possible that the simple reality that the Fed often gets played by newer/less experienced/less invested players (and conversely, also gets played less by more experienced players) probably does have an impact (i.e. the Co5N data is a symptom, not the cause, of larger issues that affect the Fed numbers).

>>I was hoping that the conditions between all tournament ship games were substantially equal in random distribution in player mindset, skill level, temperament - whatever, that comparisons could be made without such caveats arising.>>

It is possible that it is still the case. But I suspect that the Fed has factors that affect its results that come simply from the Fed being the Fed.

>>You may be correct that the ace-vs-ace data culls the non-serious Co5N Fed players. On the other hand, there is no guarantee of that.>>

Oh, certainly not (although what I meant was that the "vs players of equal skill" adjustment on the non ace game table which pushes the Fed up to 45% is what probably fixes this). But just looking at the data as presented, with the "all players" list, the Fed comes out at 12 out of 18 (yeah, using the raw numbers, the Fed comes out with a 41% win rating, but as the list is posted in order of "assuming equal skill" results, I suspect that is what the designer intended as the pertinent data point). With the "ace vs ace" players, the Fed comes out at 10 out of 18. Without any other info. In both cases, the Fed is, at worst, a little below average. Not spectacular, but not the worst.

Yes, it is true, that looking at unadjusted, raw win percentage from all players at all, the Fed comes out dead last. But then, the person doing the data crunching didn't think that was as significant as the adjusted 45% number, so it seems reasonable to look at that one as the important one.

By Andy Vancil (Andy) on Tuesday, September 06, 2011 - 07:03 pm

While I think the Fed is terrible, part of me is perfectly OK with the Fed being weak.

The main issue is the photon variability, which can ruin the game, either way. In many ways, the Fed would be a better ship for the tournament if it was limited to 12-point OLs, and had 5 photons and a little more power, or even 4 photons + 2 plasma-F. But then it wouldn't be a Fed.

Outside of duels, photons are OK as a weapon. In more casual play, they are OK because usually they hit somewhere around average, and it all works out. (At least, until you start using narrow salvos.) But for the tourney, the luck factor is just too large.

By David Cheng (Davec) on Tuesday, September 06, 2011 - 09:21 pm

Lots of Council prep to do, so I'll have to keep this brief.

1. The "play one and done" phenomenon is certainly not limited to Council. I attended the SFB tournament at TotalCon for years. There were several guys there, for years, who played a game or two to help the tournament, but with no intention or desire to play in the finals bracket.

2. One of the several reasons that the Fed is the most popular "play one and done" ship is because it is the classic Star Trek ship - you get to fly the closest thing to the Enterprise there is in the tournament.

3. The suggestion of "drop the people who have played 4 or fewer games" will not solve the problem. See point #1 above. There is a TotalCon guy, I think his name was John (don't remember his last name), who played one or two games every year for several years. Always flew Fed. Not a serious player, not playing to win, but accumulated well over four games. There are probably many other players like John out there in the data pool.

4. There are plenty of "one and done" tournament games in non-Fed ships too.

5. I agree with Peter. The "players with equal skill" percentage is far more salient than the overall win percentage, in my mind.

6. Who helped push the Fed upgrade more than Council of Five Nations? We openly welcomed players to take the Fed-with-G-Rack for the years we were not sanctioned. We're back to offering Ace Cards again this year, with Petrick on hand to keep everything kosher, so that won't apply this year. But we take full credit for doing at least our fair share to push for the upgrade and provide game results data to back it up.

Council is only about a month away. More in a day or two.

Link: SFB at Council

By Robert Schirmer (Rwschirmer) on Tuesday, September 06, 2011 - 10:04 pm

Hi Dave Z,

Catching up again here. Thanks for the kind words. To answer your questions:

1. I chose a very broad distribution for the player prior in order to represent my ignorance about the correct ratings in the absence of match results. It's rather ironic given the focus of this thread, but I ballparked the width using the chance of a rookie hitting the jackpot with the FED at range 8, plus a safety factor. Anyway, now that a great deal of match data has been accumulated, it is not surprising or problematic that the posterior rating distribution is narrower than the prior.

As I mentioned before, it is the ship prior that I could really justify making narrower. Specifically, I can look at the SSDs, count boxes, look at weapon tables, etc. and conclude that the TCs should be close to balanced even absent actual match data. I didn't bother to pursue this option though, so I'm using the same weak/broad prior I used for the players, and the match results dominate the RPS estimates when the match count >> 2 (for better or worse).

3. It would in theory be possible for a player to inflate his rating to some extent by carefully selecting who they play, e.g. only each new batch of supposed rookies. I don't think anyone has actually tried this. There would also be some practical difficulties with this plan.

4. I considered putting rating as a function of time into the model. It would not be especially hard to do, but many additional choices would need to be made and justified as to the details (to what extent do you decorrelate the rating across time and/or number of matches? etc.). In any event, it is certainly possible, but I did not do it.

5. The report pdf shows only players with 4+ matches. The underlying database and analysis includes every match I have on record.

6. Sampling errors are always possible. I haven't investigated correlations such as the FED dataset possibly being heavily influenced by casual/lark play vs. highly competitive play and so forth. And of course it's unclear how I could even assess many such factors.

I can say that I maintained additional cuts on the data way back, basically excluding various blocks of more casual tournament results from the analysis (e.g. keep RAT, remove NK). The overall results did not appear to change much at the time, but I have not carried that piece forward in a rigorous way.

In any event, as you previously pointed out, were I to muck around with the data more, excluding some things while including others, it would get more difficult to justify and explain what I was doing and what it meant to the results.

Of course none of the above moves the ball at all on modifying/not modifying the FED. For what it's worth, the data do seem to show the FED is near the middle of the pack, at least in the hands of skilled players.

v/r

Robert

By Ken Lin (Old_School) on Tuesday, September 06, 2011 - 10:53 pm

Good post, thanks Robert. Echoes some of what I was going to post, but it means more coming from the one doing the modeling and collecting the data.

Cheers,
Ken

By David Zimdars (Zimdarsdavid) on Tuesday, September 06, 2011 - 11:36 pm

Hi Robert,

Thanks for the thoughtful post.

I brought up the score "inflation" idea not because I thought it was being done intentionally, but because it might be a by-product of the natural evolution of the player base.

I am not entirely certain of the Bayesian mathematics, so perhaps you can comment on this line of reasoning?

Consider an overall score (SFBTRv2) such as:

Sir_Starfurry 3301 MPE Win% 81% (I'm using Mr. Starfurry as a concrete example, with *no* implications other than a mathematical curiosity). Repeat - this is just mathematics.

If Sir_Starfurry played, on average, players with a 2000 rating, he'd have a difference of 1300, and interpolating table 1 an MPE Win % of ~96%.

But he doesn't, ergo the average rating he played is much higher than 2000.

He in fact has an (astounding!) 81% win record, and interpolating table 1, that is +550 (approximately), so his average opponent is at roughly 3301-550=2751, call it 2750.

Only 25 players have ratings above 2750. This suggests that maybe 50% of Sir_Starfurry's games are restricted to these 25 top players.

Or another way of putting it, had he played a distribution of players around 2000, his rating would only be ~2550.
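The arithmetic above can be sketched in a few lines. Table 1 is not reproduced in this thread, so the sketch substitutes a logistic curve whose `SCALE` is calibrated to the single pair quoted above (+550 points for an 81% win rate). The curve shape, `SCALE`, and the function names are assumptions for illustration, not the SFBTR model itself.

```python
import math

def win_prob(diff, scale):
    """Assumed logistic curve: chance of beating an opponent rated `diff` points lower."""
    return 1.0 / (1.0 + 10.0 ** (-diff / scale))

def diff_for_win(p, scale):
    """Inverse of win_prob: the rating gap that corresponds to win probability p."""
    return scale * math.log10(p / (1.0 - p))

# Hypothetical scale, chosen so that +550 points maps to 81% (the pair quoted
# in the post). The real table 1 may follow a different curve.
SCALE = 550.0 / math.log10(0.81 / 0.19)

rating = 3301
gap = diff_for_win(0.81, SCALE)              # 550 by construction
avg_opponent = rating - gap                  # implied average opponent rating
rating_vs_2000 = 2000 + gap                  # rating implied by 81% against a 2000 field
p_vs_2000 = win_prob(rating - 2000, SCALE)   # win rate the curve predicts for a 1301-point gap

print(round(avg_opponent), round(rating_vs_2000), round(p_vs_2000, 3))
```

Under this assumed curve a 1301-point gap predicts a win rate in the mid-to-high 90s, the same region as the ~96% read off table 1, and the implied average opponent lands at about 2750.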

Now, Sir_Starfurry is not currently active. And probably most of the players in his distribution are not active. But it stands to reason those who are, are maintaining a 65% win percentage or better against all comers. Is this buoying the score?

Or, is there some other reason the scores seem to be higher than a casual glance comparing actual win% with table 1 would suggest?

-Dave

By David Zimdars (Zimdarsdavid) on Wednesday, September 07, 2011 - 12:18 am

Council of 5 Nations

I would also like to point out that Co5N sounds like a fantastic event, and that Dave Cheng should be saluted for organizing and promoting this SFB event for many years. SFB deserves great events like Co5N and great proponents like Dave Cheng.

I have held out slight hope of attending this year, but work and family conflicts make it look grim. However, I'm hoping to pencil in attendance for next year's event.

-Dave

By William T Wilson (Sheap) on Wednesday, September 07, 2011 - 12:19 am

ELO-type rating systems are prone to rating inflation. Whenever a game is played, the winner typically gains more points than the loser loses. This isn't totally unreasonable, as experienced players are typically better and every game played increases the experience of both players in it.

The main factor keeping this in check is that players retire and are replaced with new ones who generally have lower scores.
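As a rough illustration of when the winner gains more than the loser loses, here is a minimal sketch of the textbook Elo update. It is not the SFB rating system, and the K-factors are hypothetical. With equal K-factors the exchange is zero-sum; points only enter or leave the pool when the two players carry different K-factors (many implementations give newer players a larger one).

```python
def expected(r_a, r_b):
    """Textbook Elo expected score for A against B (400-point logistic)."""
    return 1.0 / (1.0 + 10.0 ** ((r_b - r_a) / 400.0))

def elo_update(r_win, r_lose, k_win, k_lose):
    """Return (points gained by winner, points lost by loser) for one decisive game."""
    e_win = expected(r_win, r_lose)
    gain = k_win * (1.0 - e_win)    # winner scored 1, expected e_win
    loss = k_lose * (1.0 - e_win)   # loser scored 0, expected 1 - e_win
    return gain, loss

# Hypothetical K-factors, for illustration only.
print(elo_update(2000, 2000, 32, 32))   # equal K: gain equals loss
print(elo_update(2000, 2000, 32, 16))   # winner has larger K: gain exceeds loss
print(elo_update(2000, 2000, 16, 32))   # loser has larger K: loss exceeds gain
```

In this sketch the pool inflates when high-K players win and deflates when they lose, so the net drift depends on who carries the larger K-factor and how often they win.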

By Andrew J. Koch (Droid) on Wednesday, September 07, 2011 - 07:56 am

Sir Starfurry attends Council every year.

By Jean Sexton (Jsexton) on Wednesday, September 07, 2011 - 08:49 am

Guys, this is NOT the topic to discuss Co5N. Please take it to the appropriate thread.

Thanks,

Jean
WebMom

By David Zimdars (Zimdarsdavid) on Wednesday, September 07, 2011 - 10:00 am

Several have mentioned the Fed Photon Torpedo's poor accuracy at range as a problem in tournament duels.

The severe feedback damage at range 1-0 (and range 1 is unique to the photon torpedo) is also a particularly unfortunate cross to bear; and often turns out to be a very high price to pay for a 100% chance to hit.

Consider, in an initial overloaded engagement between a Fed and a Kli, the Kli can equal the Fed damage when feedback is included (both will trade around 100 pts); so the Fed really has to shoot at R2, where on average he will do 25+ more than the Kli. If the Fed survives to fire a second volley, this is often in a range 1 knife fight (or under weasels); and then he will once again suffer feedback onto weakened shields. The extreme feedback damage makes trading a R1 volley vs. the Fed not nearly the terrible proposition it may seem at first.

Consider, just for hypothetical purposes of discussion, if the Fed were allowed to de-rate the odds of a photon torpedo hitting to 1-5 at range 1-0 for the purposes of avoiding feedback damage. I think this would actually make it far less attractive for an opponent to crawl up on top of the Fed and stay there on the re-load turn.

In a related (but fanciful conjecture regarding any tournament play), I wonder if the Fed CA/CC "in the wild" isn't balanced against dueling with other CAs (such as the D7) because he can narrow salvo the photons. It is a logical Fed doctrine in the wild (although boring and not conducive to the tournament) to cruise to range 8 oblique on imp32, fire an alpha narrow salvo strike with photons and phasers, and either a) finish off the roasted enemy hulk; or b) disengage after the miss. The narrow salvo does kinda make the Fed special in these duels, as he's either guaranteed a major victory or at worst a minor loss. Dueling a Fed cruiser one on one in the wild (with narrow salvo and disengagement) is a sucker's bet for any Klingon D7 captain (which, is maybe why you tend to see D7s in groups of 3, which are more than enough to toast a Fed CA guaranteed). If the Fed has a reason not to disengage, or is in fleets, or there is lots of EW, things are different. But none of this translates to the tournament (and maybe this is a bit of the reason the Fed feels a little unspecial for it).

By Ted Fay (Catwhoeatsphoto) on Wednesday, September 07, 2011 - 11:04 am

Not to mention in the wild the Feds usually come in their own squadrons or fleets. In which case you have enough photons that statistically the damage evens out, and the extreme range of the weapon relative to damage at that range becomes an ugly fact of life for the opponent.
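The "statistically the damage evens out" point can be put in numbers with a small sketch that treats each photon as an independent hit-or-miss roll. The hit chance `P_HIT` and damage per hit `DMG` below are placeholder values chosen for illustration, not figures taken from this thread; the point is only how the relative spread shrinks as the number of torpedoes grows.

```python
import math

def volley_stats(n_torps, p_hit, dmg_per_hit):
    """Mean, standard deviation, and relative spread (sd / mean) of total damage,
    treating each torpedo as an independent hit-or-miss roll."""
    mean = n_torps * p_hit * dmg_per_hit
    sd = dmg_per_hit * math.sqrt(n_torps * p_hit * (1.0 - p_hit))
    return mean, sd, sd / mean

P_HIT = 0.5   # hypothetical hit chance per torpedo
DMG = 16      # hypothetical damage per hit

for n in (4, 12):   # one ship's volley vs. a three-ship squadron's
    mean, sd, rel = volley_stats(n, P_HIT, DMG)
    print(n, mean, round(sd, 1), round(rel, 2))
```

Under these assumptions the relative spread falls with the square root of the torpedo count: a 12-photon squadron volley is about 1.7 times steadier, relative to its average, than a single ship's 4-photon volley.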

By Marcus J. Giegerich (Marcusg) on Wednesday, September 07, 2011 - 12:50 pm

Dave Z said "Consider, in an initial overloaded engagement between a Fed and a Kli, the Kli can equal the Fed damage when feedback is included (both will trade around 100 pts); so the Fed really has to shoot at R2, where on average he will do 25+ more than the Kli."

True, the range 0-1 feedback thing is clearly an issue. But also keep in mind that a Fed can survive a 100 point alpha through a full shield better than a Klingon can :)

Also keep in mind that a Klingon arming full OLs is very likely to be at a power disadvantage to the Fed for that turn.

By Peter D Bakija (Bakija) on Wednesday, September 07, 2011 - 01:01 pm

Marcus wrote:
>>Also keep in mind that a Klingon arming full OLs is very likely to be at a power disadvantage to the Fed for that turn.>>

This is a very significant issue when fighting the Fed--on T2 when the initial exchange happens, the Fed generally will have 8 power in heavy weapons. To be able to partially compete with the Fed, a disruptor ship will have 16 power in heavy weapons (and still be doing less damage).

Yes, there is a lot to be said for being able to shoot the next turn, and the Fed has a lot of trouble reloading, and the disruptor ship generally has something else (see: Drones) to help balance the equation (unless you are a Tholian. Man. Tholians are *totally* hosed vs the Fed. Well, at least the Fed gets two slam dunk matchups :-). But on that initial exchange, the Fed has a lot more power to use for things like speed, tractors, and HET than a comparable disruptor opponent. Which is significant.

By David Zimdars (Zimdarsdavid) on Wednesday, September 07, 2011 - 02:04 pm

I kinda like Clayton's 32 warp and G-Rack idea for the Fed. It certainly would accentuate the Fed's turn 1 and turn 2 ability to close in and get his one definitive R4 strike while moving fast, while providing a shade more hope of surviving until the 2nd volley if he still misses with 3 or 4 torps. On the other hand, the photon lottery would still randomize the results. Swinging the Fed from 45% to 55% wouldn't hurt anything (heck, the WBS is at 55% right now, and the WYN are mostly iconic because the WBS and WAX are at 53-57% or so). Well, I suppose one or two more aces might get knocked out of the tournament after having been successfully stuffed with photon torpedoes by a newbie - but that seems to be the iconic tournament appeal of the Fed anyway. Even at 55% it won't be consistent enough to string together a tournament's worth of wins. A few more aces would play it, for sure, but I suspect most aces would still be turned off by the inconsistency.

By William T Wilson (Sheap) on Wednesday, September 07, 2011 - 04:53 pm

32 warp from engines is just never going to happen. Even the ships that have 32 warp in the real game don't in the tournament, except the Seltorian. An extra +2 AWR might be barely possible, and would have the same effect on the overall performance of the ship (maybe even better, as they'd partially pad the shuttles).

Tholians need to make the Fed take firepower-reducing mizia damage before exchanging alphas. The ATC is so good at this, and the Fed so slow and unwieldy, that I think the matchup is closer to even, but I haven't played the matchup very often so am not really sure.

By David Zimdars (Zimdarsdavid) on Wednesday, September 07, 2011 - 04:56 pm

Why does the Selt have 32 warp?

Can the Selt HET at speed 28?

By Andrew J. Koch (Droid) on Wednesday, September 07, 2011 - 05:01 pm

Yes it can HET at 28.

By Peter D Bakija (Bakija) on Wednesday, September 07, 2011 - 05:42 pm

William wrote:
>>32 warp from engines is just never going to happen. Even the ships that have 32 warp in the real game don't in the tournament, except the Seltorian.>>

Yep. The original TC design specified that all (MC1) ships would have 30 warp. As such, the Gorn lost 2 warp (and gained 2 APR), the TFH lost 2 warp (and gained 2 of something. Or possibly lost something. I don't remember what the power set up of the actual FH is. But it did have 32 warp and the TC has 30). I'm not sure how the Selt managed to fall through the cracks and maintain the 32 warp, but it was a late addition, and came in around the same time as the various other "expansion" ships (LDR, TKR, TKE, GBS, ATC), and was definitely a standout for having 32 warp.

>>Tholians need to make the Fed take firepower-reducing mizia damage before exchanging alphas. The ATC is so good at this, and the Fed so slow and unwieldy, that I think the matchup is closer to even, but I haven't played the matchup very often so am not really sure.>>

Heh. I played a quick, sloppy game vs Ken the other day; my Neo vs his Fed. And in the R3 alpha exchange, I hit with 3 of 4 disruptors, and he hit with 1 of 4 photons. And he still won (by virtue of a timely tractor I couldn't fight letting him land a suicide on my down shield that I couldn't do anything about. Which was very tricksy and required a whole lot of things to have luckily lined up for him and him to see it and do it, but still). My initial view of the match up (Neo/Archeo vs Fed) is basically that the Tholian can't do anything significant at all with a web to compromise the Fed's initial strike. And at any range that fire is exchanged, the Fed is just going to do more damage more consistently for less power. All the Fed has to do is to just move forward at moderate speeds, and eventually, it will get a R4 or so shot, and just take it and probably win in the long run. That is a really simplified view, but essentially the issue. Without seeking weapons or anything to distract the Fed's phasers, and needing to spend a lot of power on guns to compete, and the web being mostly little more than a minor inconvenience, this game is super rough for the Tholian. Which the win/loss data very clearly reflects. On the upside, the Fed isn't that good, so the Tholian is unlikely to run into them that often :-)

By Clayton Krueger (Krieg) on Wednesday, September 07, 2011 - 06:10 pm

'Why does the Selt have 32 warp?' I suppose it has 32 warp because its main tactic is a R4 (or R2) oblique alpha shot, has a D turn mode, and needs a fast getaway to avoid a knife fight and to create space for its reload turn. :)

By David Zimdars (Zimdarsdavid) on Wednesday, September 07, 2011 - 07:03 pm

Clayton, that sounds familiar. :)

By Robert Schirmer (Rwschirmer) on Thursday, September 08, 2011 - 12:02 pm

Hi Dave Z,

Responding to your earlier post. Your reasoning on win rate vs. opponent rating is fine, although some care needs to be taken with relating the average opponent rating to the win % and rating difference when the opposition consists of many players with different ratings.
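The caveat about averaging can be shown with a small sketch: because the win-probability curve is not a straight line, the win rate against a mixed field is not the win rate against that field's average rating. The logistic curve with a 400-point scale and the two example fields are assumptions for illustration only, not the model's actual table.

```python
def win_prob(diff, scale=400.0):
    """Assumed logistic curve: chance of beating an opponent rated `diff` points lower."""
    return 1.0 / (1.0 + 10.0 ** (-diff / scale))

def mean_win_prob(rating, opponents, scale=400.0):
    """Average expected win rate over a list of opponent ratings."""
    return sum(win_prob(rating - r, scale) for r in opponents) / len(opponents)

rating = 3300
mixed = [2000, 3500]      # hypothetical field: one weak and one strong opponent, average 2750
uniform = [2750, 2750]    # hypothetical field: every opponent at the same average

print(round(mean_win_prob(rating, mixed), 2), round(mean_win_prob(rating, uniform), 2))
```

Both fields average 2750, yet under this curve the mixed field yields roughly 0.62 and the uniform field roughly 0.96, so backing an "average opponent rating" out of a win percentage is only a first approximation.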

In any case you are essentially correct about the possibility of rating inflation/deflation.

A player's rating is calculated as a single representative number across their entire match history to date. Therefore, a player could play some people, accumulate a record, then stop playing, and subsequently still have their rating move around as the ratings of their opponents change (assume nobody else stopped playing). As a concrete example, if the opponents were newbies when originally played, and then they get way better over time, the rating of the retired player would inaccurately float upward over time since he is getting credit, roughly speaking, for his opponent's current skill level.
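A toy version of this floating effect is sketched below. It is a deliberately simplified stand-in (a maximum-likelihood fit of one player's rating against fixed opponent ratings on an assumed 400-point logistic curve), not the Bayesian model used for the actual report, and the match record and ratings are made up for illustration.

```python
def win_prob(diff, scale=400.0):
    """Assumed logistic curve: chance of beating an opponent rated `diff` points lower."""
    return 1.0 / (1.0 + 10.0 ** (-diff / scale))

def fit_rating(results, lo=0.0, hi=6000.0, scale=400.0):
    """Rating at which expected wins equal actual wins, found by bisection.
    `results` is a list of (opponent_rating, won) pairs and must contain
    at least one win and one loss."""
    wins = sum(1 for _, won in results if won)
    for _ in range(60):
        mid = (lo + hi) / 2.0
        exp_wins = sum(win_prob(mid - opp, scale) for opp, _ in results)
        if exp_wins < wins:
            lo = mid    # rating too low to explain the record
        else:
            hi = mid
    return (lo + hi) / 2.0

# Hypothetical retired player: 3 wins and 1 loss against four newbies.
record = [True, True, True, False]
then = fit_rating([(1800, w) for w in record])   # opponents as rated when the games were played
now = fit_rating([(2400, w) for w in record])    # same games, opponents re-rated after improving

print(round(then), round(now))
```

The match record never changes, but the fitted rating rises by the same 600 points the opponents gained, which is the upward float described above.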

More generally, the lack of a time component in the model can be a source of rating error, and a significant one if the following conditions hold: the ratings assigned to some set of a player's opponents are significantly in error, i.e. they do not reflect the opponent's skill at the time the match was played due to skill change over time; the matches with inaccurate ratings are a significant fraction of the total matches played; and the inaccurate rating assignments are significantly biased to one side or the other (mostly overestimated or mostly underestimated, and across wins and losses).

v/r

Robert
