EBU National Grading Scheme

EBU National Grading Scheme How accurate is it likely to be?

#141 Vampyr

Group: Advanced Members
Posts: 10,611
Joined: 2009-September-15
Gender:Female
Location:London

Posted 2012-April-14, 16:04

gnasher, on 2012-April-12, 04:45, said:

For the "strong newcomer" problem in general, I think it would be OK to make a subjective decision in a case where a new member had previously played in another country, and perhaps also where a player has reappeared after an absence of several years.

Alternatively, what about waiting until they'd played enough times to produce a reliable grade, then retrospectively applying that to the games they played earlier? Or are there technical obstacles to that?

I had assumed they were doing the latter, but in any case it seems obvious that it is right to do one or the other.

I know not with what weapons World War III will be fought, but World War IV will be fought with sticks and stones -- Albert Einstein

#142 gnasher

Andy Bowles

Group: Advanced Members
Posts: 11,993
Joined: 2007-May-03
Gender:Male
Location:London, UK

Posted 2012-April-15, 04:30

weejonnie, on 2012-April-14, 14:26, said:

(One point - if we play in the same direction as a pair - is the SOpp the strength of the pair or the average strength of the two players?

I'm pretty sure it's the average of the two players' grades. Apart from anything else, they don't have enough data to use partnership grades.

Edit: This is probably confirmed by the NGS Guide:

NGS Guide said:

The strength of a field is the average current grade of all the players in an event at the start of play of that event

The guide makes a distinction between "strength of field" and "strength of opponents", but I think the intended meaning is that "strength of opponents" is also based on player-grades.

... that would still not be conclusive proof, before someone wants to explain that to me as well as if I was a 5 year-old. - gwnn

#143 Vampyr

Group: Advanced Members
Posts: 10,611
Joined: 2009-September-15
Gender:Female
Location:London

Posted 2012-April-15, 10:42

gnasher, on 2012-April-15, 04:30, said:

The guide makes a distinction between "strength of field" and "strength of opponents", but I think the intended meaning is that "strength of opponents" is also based on player-grades.

Apparently your grade for a session is based on the "average strength" of the field. Would it be more accurate to calculate each board against the opponents you played it against, or would this have the same end result since you are matchpointing against the field in question?

I know not with what weapons World War III will be fought, but World War IV will be fought with sticks and stones -- Albert Einstein

#144 gnasher

Andy Bowles

Group: Advanced Members
Posts: 11,993
Joined: 2007-May-03
Gender:Male
Location:London, UK

Posted 2012-April-15, 11:20

Vampyr, on 2012-April-15, 10:42, said:

In a two-winner Mitchell it's based on the average strength of the people who sit the same way as you do. That sounds right to me.

In a one-winner Mitchell (ie one with arrow-switches), it appears to be based on the average strength of the field. That also sounds right to me: if we accept that arrow-switching allows us to produce a winner by comparing the scores of the entire field, it also allows us to produce ratings by comparing against the entire field.

This is the relevant part of the Guide:

NGS Guide said:

So the SOpp factor will vary depending on type of movement:
For Swiss Events, take the average current grade of your opponents for each match.
For 1 Winner Movements, take the average current grade of all the other pairs.
For 2 Winner Movements, take the average current grade of all the other pairs sitting in the same direction as you.
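The three rules quoted from the Guide can be sketched in a few lines of Python. This is only an illustration of the rules as quoted above, not the EBU's code: the function name, the movement labels and the data layout (a dict of pair grades, a dict of seating directions, a list of opponents met) are assumptions made for the example.

```python
from statistics import mean
from typing import Dict, List, Optional


def strength_of_opponents(
    movement: str,
    me: str,
    pair_grades: Dict[str, float],
    directions: Optional[Dict[str, str]] = None,
    opponents_met: Optional[List[str]] = None,
) -> float:
    """Average current grade used as SOpp, depending on the movement type.

    All names here are hypothetical; only the three rules come from the Guide.
    """
    if movement == "swiss":
        # Swiss: average grade of the pairs actually played against.
        if not opponents_met:
            raise ValueError("Swiss events need the list of opponents met")
        return mean(pair_grades[p] for p in opponents_met)
    if movement == "one_winner":
        # One-winner (arrow-switched): average grade of all the other pairs.
        return mean(g for p, g in pair_grades.items() if p != me)
    if movement == "two_winner":
        # Two-winner: average grade of the other pairs sitting the same way.
        if directions is None:
            raise ValueError("Two-winner movements need seating directions")
        return mean(
            g
            for p, g in pair_grades.items()
            if p != me and directions[p] == directions[me]
        )
    raise ValueError(f"unknown movement type: {movement}")


# Made-up grades for four pairs, two sitting each way.
pair_grades = {"P1": 55.0, "P2": 50.0, "P3": 60.0, "P4": 45.0}
directions = {"P1": "NS", "P2": "EW", "P3": "NS", "P4": "EW"}

print(strength_of_opponents("one_winner", "P1", pair_grades))
print(strength_of_opponents("two_winner", "P1", pair_grades, directions=directions))
print(strength_of_opponents("swiss", "P1", pair_grades, opponents_met=["P2", "P4"]))
```

Each pair's grade is assumed here to be the average of its two players' grades, as suggested earlier in the thread.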

... that would still not be conclusive proof, before someone wants to explain that to me as well as if I was a 5 year-old. - gwnn

#145 broze

Group: Advanced Members
Posts: 1,006
Joined: 2011-March-08
Gender:Male
Location:UK

Posted 2012-October-17, 20:06

Time for me to weigh in. I'm not going to comment too much on what I think of the accuracy of the NGS since I haven't looked through the maths too much or even got an established grade myself. However, having looked up players that I know or have partnered I was surprised at some (just some) of the ratings. The problem cases I've observed often fall into the category of player A in this example:

Player A and B are partners; they play together regularly and do quite well, both acquiring ratings of 57.00 or so. However, player A is completely dependent on player B and when A plays in a pickup partnership with C he does terribly, there is no "gelling" (to use the term in the NGS guide) and they almost always come in last. Player A then plays more and more with player B, getting his grade back up to 56/57 and then plays with C again with the same disastrous result.

The problem is that player C is not a bad bridge player but only gets to play club bridge with player A. Because A is rated well whenever they play together, player C's rating suffers greatly when they do badly. Even if C manages to save a few boards and not come last, his rating still suffers unfairly because player A just can't cope with him. Is this example relevant or helpful? And with it all in mind, would it not be possible for the NGS to take into account the partnership grade? That is, the Player A - Player C partnership grade would be very low, so when they do better than their partnership grade predicts, this would cancel out their bad results to some extent. I hope this is clear and that I haven't grossly misunderstood or overlooked something. It could be the case that the maths in place already tackles this issue, in which case I haven't understood it correctly.

On another note, I am delighted that the EBU have instituted the NGS. I think it can only be a good thing. I am very interested in the fluctuation of my grade and may even play more bridge as a result. Certainly it gives me more motivation when playing with weaker partners to know that my good play will have a positive, tangible effect even if there's no way we'll win or get into the masterpoints.

I hope to see the NGS developed further, and a few tweaks here and there wouldn't go amiss. For example, when looking down my list of sessions it would be nice for there to be a column or icon to identify the form of scoring used for a particular session, so that when a change to my grade is not registered I can ascertain the reason. Also it would be good to be able to click on a session and see the ranking of each player that played, instead of having to look them all up. Seeing as the information is easily available anyway, it seems to me that this would not be unfair or intrusive to institute (especially as people who are uncomfortable with others being able to see their ranking can opt out, as I understand it).

Thanks to the EBU for creating this and I hope they continue to develop its use.

'In an infinite universe, the one thing sentient life cannot afford to have is a sense of proportion.' - Douglas Adams

#146 barmar

Group: Admin
Posts: 21,910
Joined: 2004-August-21
Gender:Male

Posted 2012-October-17, 20:52

For a system like this to work well, you need lots of mixing and matching of players. The worst case is if a pair only plays together -- you can't tell how much each contributes to the partnership's results.

#147 Vampyr

Group: Advanced Members
Posts: 10,611
Joined: 2009-September-15
Gender:Female
Location:London

Posted 2012-October-17, 21:50

barmar, on 2012-October-17, 20:52, said:

Yes, I have always thought that rating of pairs instead of players would be more meaningful; but anyway it is just a bit of fun.

There is one thing I would like to know about it; does anyone know how it works for longer events. A player with a 61% rating is expected to score 61% in a session in order to keep from dropping -- are they also expected to score 61% in a four-session event? Actually there is another thing I want to know -- in Swiss Pairs, is every match scored as a session?

I know not with what weapons World War III will be fought, but World War IV will be fought with sticks and stones -- Albert Einstein

#148 gnasher

Andy Bowles

Group: Advanced Members
Posts: 11,993
Joined: 2007-May-03
Gender:Male
Location:London, UK

Posted 2012-October-18, 01:57

Vampyr, on 2012-October-17, 21:50, said:

There is one thing I would like to know about it; does anyone know how it works for longer events. A player with a 61% rating is expected to score 61% in a session in order to keep from dropping -- are they also expected to score 61% in a four-session event?

This is discussed in the Guide, at the bottom of page 6.

The grade score is factored by the number of boards involved. In a multi-session event, the sessions are treated as having occurred simultaneously, on the first day of the event. Scoring 65% and 57% in two 24-board sessions will have the same effect as scoring 61% in each.
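A quick way to see why 65% and 57% over two 24-board sessions come out the same as 61% in each: if the sessions are pooled and weighted by boards played, only the board-weighted average matters. The sketch below checks just that arithmetic. It is not the NGS update itself, and the function name is made up for the example.

```python
from typing import List, Tuple


def pooled_percentage(sessions: List[Tuple[int, float]]) -> float:
    """Board-weighted average percentage over (boards, percentage) sessions."""
    total_boards = sum(boards for boards, _ in sessions)
    if total_boards == 0:
        raise ValueError("no boards played")
    return sum(boards * pct for boards, pct in sessions) / total_boards


uneven = pooled_percentage([(24, 65.0), (24, 57.0)])
steady = pooled_percentage([(24, 61.0), (24, 61.0)])
print(uneven, steady)  # both 61.0
```

With sessions of unequal length the longer session simply counts for more, which is what "factored by the number of boards" suggests.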

Quote

Actually there is another thing I want to know -- in Swiss Pairs, is every match scored as a session?

No, it's calculated as a single session. However, it shouldn't make any difference whether you treat it as one session or as seven simultaneous sessions, or even as 56 simultaneous 1-board sessions.

The main difference between Swiss pairs and a normal pairs is the way that the "Strength of Opponents" is calculated (explained on page 11 and summarised at the top of page 12). In a normal non-arrow-switched pairs, it's the average strength of the pairs who sit the same way as you. In a Swiss pairs, it's the average strength of the pairs you played against. Neither of these is perfect, because in both movements your expected result is dependent on both the people you compare with and the people you play against.

This post has been edited by gnasher: 2012-October-18, 02:00

... that would still not be conclusive proof, before someone wants to explain that to me as well as if I was a 5 year-old. - gwnn

#149 Vampyr

Group: Advanced Members
Posts: 10,611
Joined: 2009-September-15
Gender:Female
Location:London

Posted 2012-October-18, 14:52

gnasher, on 2012-October-18, 01:57, said:

[info]

Thanks.

I know not with what weapons World War III will be fought, but World War IV will be fought with sticks and stones -- Albert Einstein

#150 ash1968

Group: Members
Posts: 11
Joined: 2011-November-03

Posted 2012-October-18, 17:00

In reading this topic the one thing that stood out to me was the non-defensive attitude to the discussion from the developers of the scheme. The responses to questions raised were genuine, thoughtful and constructive. Very different from most authorities. Well done.

Cheers, Stephen

#151 TimG

Group: Advanced Members
Posts: 3,972
Joined: 2004-July-25
Gender:Male
Location:Maine, USA

Posted 2012-October-19, 12:52

broze, on 2012-October-17, 20:06, said:

Player A and B are partners; they play together regularly and do quite well, both acquiring ratings of 57.00 or so. However, player A is completely dependent on player B and when A plays in a pickup partnership with C he does terribly, there is no "gelling" (to use the term in the NGS guide) and they almost always come in last. Player A then plays more and more with player B, getting his grade back up to 56/57 and then plays with C again with the same disastrous result.

I do not know the details of the EBU National Grading Scheme, but in general a dynamic rating system should work this way. . .

When A plays with C and his rating drops, then goes back to play with B and has the same results that previously had the partnership at 57, A's rating will not go all the way back up to 57. A's rating will end somewhere below 57 and B's rating will end up somewhere above 57.

I don't think the scenario where A plays like a 57 player with B, but plays like a last place player with C is realistic. Maybe A unsuccessfully plays a wild and gambling game when he plays with C? If that is truly the case and A, B, and C seldom play with other players, then the rating system could give some odd results for these players. But, as A, B, and C play with other players, these oddities should be lessened.

#152 gnasher

Andy Bowles

Group: Advanced Members
Posts: 11,993
Joined: 2007-May-03
Gender:Male
Location:London, UK

Posted 2012-October-20, 03:54

TimG, on 2012-October-19, 12:52, said:

The EBU's system does do that. The contribution of a single result to each player's grade is:

0.5 * [own grade] – 0.5 * [partner's grade] + [score adjusted for strength of field/opponents]

If A has dropped to 55, B is still rated 57, and they score 57, the contributions from this game will be:

A : 56
B: 58
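For anyone who wants to check the arithmetic, here is the formula above as a small Python function. The function name is invented for the example, and how these per-result contributions are then blended into a running grade is not shown because it isn't described in this thread.

```python
def result_contribution(own_grade: float, partner_grade: float, adjusted_score: float) -> float:
    """Contribution of one result to a player's grade, per the formula above.

    adjusted_score is the pair's score already adjusted for the strength of
    the field/opponents.
    """
    return 0.5 * own_grade - 0.5 * partner_grade + adjusted_score


# A has dropped to 55, B is still 57, and together they score 57.
print(result_contribution(55.0, 57.0, 57.0))  # A: 56.0
print(result_contribution(57.0, 55.0, 57.0))  # B: 58.0
```

The two contributions always average to the adjusted score, so the lower-graded partner gets less credit for the same result and the higher-graded partner gets more.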

Curiously, last night somebody told me that this characteristic was a flaw in the scheme.

... that would still not be conclusive proof, before someone wants to explain that to me as well as if I was a 5 year-old. - gwnn

#153 mike777

Group: Advanced Members
Posts: 17,586
Joined: 2003-October-07
Gender:Male

Posted 2012-October-20, 04:08

I don't get any of this.

what is the point?

so far no one says any point.....

If you have a claim then say so
prove why it matters
prove why we care
prove your method best

so far all I see is a lot of crap

#154 TimG

Group: Advanced Members
Posts: 3,972
Joined: 2004-July-25
Gender:Male
Location:Maine, USA

Posted 2012-October-20, 06:37

gnasher, on 2012-October-20, 03:54, said:

The EBU's system does do that. The contribution of a single result to each player's grade is:

0.5 * [own grade] – 0.5 * [partner's grade] + [score adjusted for strength of field/opponents]

If A has dropped to 55, B is still rated 57, and they score 57, the contributions from this game will be:

A : 56
B: 58

Curiously, last night somebody told me that this characteristic was a flaw in the scheme.

I think there is probably need for some modification to reflect partnership experience. In general, a pickup partnership of two 57 players should not be expected to perform as well as an experienced partnership of two 57 players. Of course if the two 57s in the pickup partnership earned their ratings by playing mostly in pickup partnerships while the experienced pair earned their 57 ratings almost exclusively playing with each other. . .

Anyway, it would seem to me that the partnership aspect of things is more complicated than assigning a rating based upon the arithmetic mean of two ratings. A case can be made for having partnership ratings only rather than individual ratings.

#155 awm

Group: Advanced Members
Posts: 8,624
Joined: 2005-February-09
Gender:Male
Location:Zurich, Switzerland

Posted 2012-October-21, 14:28

There are a lot of issues with these ratings schemes, some of which are hard to resolve. The "experienced partnership" issue is one problem; presumably I could raise my rating by playing mostly in experienced partnerships (which tend to do better than the sum of our ratings) rather than pickup partnerships (presumably vice versa). There are also some issues with non-linearity of expected score... it's very hard to get scores above 80 or below 20 for example regardless of how well you play and how weak/strong the field is. And of course players who have few regular partners or play mostly in team events are hard to rate.

However, the right approach might be to compare this system to what preceded it (just master points) instead of to an "ideal system" of some sort, and I think there it comes out fairly well.

Adam W. Meyerson
a.k.a. Appeal Without Merit

#156 mikestar13

Group: Full Members
Posts: 649
Joined: 2010-October-27
Gender:Male
Location:San Bernardino, CA USA

Posted 2012-October-21, 16:21

My take is that ratings are of little or no value in differentiating skill levels of advanced and expert players, though somewhat useful for lesser players. I don't need a number to tell me that Meckwell or Fantunes are better than Joe Schmo and his brother Moe, and the best way to rate Meckwell vs. Fantunes is to have them play each other. In baseball, the World Series is won by the team that wins 4 out of 7 on the baseball field, not by the team with the better stats. If you are a good enough player to know that even good rating systems such as the EBU's are deeply flawed, you are too good to need them anyway. For the rest of us (in ACBL terms, substitute your own country's "currency"), 500 Master Points and a couple of dollars will buy you a cup of coffee (I've played in the days when it was 300 and a buck, some inflation, huh?).

#157 mr1303

Admirer of Walter the Walrus

Group: Advanced Members
Posts: 2,566
Joined: 2003-November-14
Gender:Male
Location:Ulaanbaatar, Mongolia
Interests:Bridge, surfing, water skiing, cricket, golf. Generally being outside really.

Posted 2012-October-22, 08:13

Is it important? No, not really.

Is it interesting? Yes. I check my placing at least once a week.

Does it show any significant information? No, not really. My highest ranking was an ace of hearts, when I had little to do except play bridge and I was full of energy when I played. Now I have a job, my ranking has gone down a little.

#158 mchristie

Group: Members
Posts: 7
Joined: 2012-March-12
Gender:Male
Location:London

Posted 2012-October-28, 06:08

Thanks for the positive feedback, to those who have given it, and thanks to gnasher for reading the Guide so fully and explaining it here.
Some more thoughts from the developers...

Improved data on web pages? Yes it'll come sometime, but is dependent on other unrelated EBU software changes.

More emphasis on partnership grades? Yes, this is again a display (and search) problem. I'd like to show all partnerships where at least one player belongs to county X, and maybe all partnerships (with over 300 graded boards in the last 3 years) for player P. There are boring software constraints that make these more work to implement than they should be.

Non-linearity of partnership performance... How well should we expect a 64/48 partnership to perform, if not half way between the two grades? I doubt anyone knows, and doubt we could ever get enough data to test any theory. I wouldn't worry about pairs who are so strong/weak their score in a session should be over 70% or under 30%; they are rare and will probably get a realistic grade when they play in a session more suited to their strength.

Pick-up partnerships play worse than regular ones. There is now enough data to quantify this, if we had time to analyse it. An ad-hoc rule that reduced the expected score by x% for a partnership that had played at most 1 previous session in the last 3 years would work and not be hard to implement.
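To make the pick-up rule concrete, a rough sketch follows. It is hypothetical throughout: the scheme has no such rule, the value of x has not been measured, and reading "x%" as x percentage points off the expected score is an assumption. The baseline expectation is taken as half way between the two grades, as discussed above, ignoring the strength-of-opponents adjustment.

```python
def expected_pair_score(
    grade_a: float,
    grade_b: float,
    sessions_together_last_3_years: int,
    pickup_penalty: float,
) -> float:
    """Expected score of a pair, with a hypothetical pick-up reduction.

    pickup_penalty is the unknown "x", in percentage points (an assumption).
    """
    baseline = (grade_a + grade_b) / 2
    # "At most 1 previous session in the last 3 years" counts as a pick-up pair.
    if sessions_together_last_3_years <= 1:
        return baseline - pickup_penalty
    return baseline


# Two 57-graded players, with an illustrative (not measured) x of 2 points.
print(expected_pair_score(57.0, 57.0, 0, pickup_penalty=2.0))   # pick-up pair
print(expected_pair_score(57.0, 57.0, 40, pickup_penalty=2.0))  # regular pair
```

Lowering the expected score means a pick-up pair that scores at its plain average would gain grade rather than stand still, which is the effect the rule is after.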

The real issue with the A,B,C problem mentioned recently is when A stops playing with C, and FOREVER AFTER will have a lower grade than B. This feels wrong for a system that is supposed to reflect current playing strength, and would need another ad-hoc rule to fix it. The rule would be "if you nearly always play with B your grade is pulled towards B's grade".

All ad-hoc solutions have downsides, and currently our principle is to keep the scheme as mathematically clean as practically possible, so no ad-hoc fixes in the foreseeable future.

That's all for now folks.

Mike Christie

#159 scarletv

Group: Full Members
Posts: 320
Joined: 2009-April-27
Gender:Female
Location:Germany, Bavaria

Posted 2012-October-29, 05:04

mgoetze, on 2012-March-06, 10:41, said:

Zelandakh, on 2012-March-06, 06:19, said:

To mgoetze: do you have a link to the German version, please?

http://vu2109-rails....hoster.de/frame

This link is not working?

#160 MickyB

Group: Advanced Members
Posts: 3,290
Joined: 2004-May-03
Gender:Male
Location:London, England

Posted 2012-October-29, 05:43

mchristie, on 2012-October-28, 06:08, said:

The real issue with the A,B,C problem mentioned recently is when A stops playing with C, and FOREVER AFTER will have a lower grade than B. This feels wrong for a system that is supposed to reflect current playing strength, and would need another ad-hoc rule to fix it. The rule would be "if you nearly always play with B your grade is pulled towards B's grade".

I don't think this would be an improvement. The fundamental problem is that there is limited data to differentiate A from B. I don't think that [tending towards] ignoring the data that we do have is an improvement; better to just hope that A and B occasionally play with other partners, so that the amount of data we have will increase.

I like the idea of reducing the par score for pick-up partnerships.
