tnevolin, on 2016-August-27, 13:37, said:
One piece of good news and one piece of bad news. I found a large game archive on the BBO vugraph page. It is quite extensive: 400k+ boards. Unfortunately, they were all entered by hand, and there are a lot of typos and mistakes. Some of them are easily detectable, like an impossible deal. Some of them are very hard to detect: for example, one vugraph I found had the N-S and E-W hands swapped. A very nasty error. When such errors enter the analysis, they pollute the results. I am trying to devise empirical rules to filter such mistakes out. Can anybody propose such a rule? I would prefer to filter out too much, to make sure the remaining observations are clean, rather than leave incorrect entries in. Thank you.
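For the easy case, an impossible deal, a check along these lines may help. This is a minimal sketch, assuming each hand is stored as a string with the suits in spade-heart-diamond-club order separated by dots and "T" for the ten (similar to PBN). `parse_hand` and `is_possible_deal` are hypothetical names, not part of any BBO tool.

```python
RANKS = "AKQJT98765432"
SUITS = "SHDC"  # assumed suit order within each hand string


def parse_hand(hand):
    """Turn a hand such as 'AKQ2.JT9.876.543' into a list of (suit, rank) cards."""
    holdings = hand.split(".")
    if len(holdings) != 4:
        raise ValueError(f"expected 4 suits, got {len(holdings)}: {hand!r}")
    return [(suit, rank) for suit, holding in zip(SUITS, holdings) for rank in holding]


def is_possible_deal(hands):
    """True only if there are 4 hands of 13 valid cards each and no card appears twice."""
    if len(hands) != 4:
        return False
    seen = []
    for hand in hands:
        try:
            cards = parse_hand(hand)
        except ValueError:
            return False
        if len(cards) != 13 or any(rank not in RANKS for _, rank in cards):
            return False
        seen.extend(cards)
    # 4 x 13 = 52 cards; fewer distinct cards means a duplicate (and hence a missing card).
    return len(set(seen)) == 52
```

A deal where the same card was typed into two hands fails the final check, even though every hand still has 13 cards.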
Hi Tim,
It's hard to give general rules without probing the actual data/errors a bit, but
for your purposes, maybe it is OK to remove all "weird" results,
regardless of whether they were hand-typing mistakes or actually happened at the table? (sh*t does happen)
If so, some rules I can think of...
Trump contracts:
- if the declaring side has a combined 6 or fewer trumps => SKIP THE DEAL
- if the declaring side has a combined 7 trumps and is playing a major-suit contract higher than 4H/4S, or a minor-suit contract higher than 3C/3D => SKIP
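The two trump-contract rules could be coded roughly as follows. This is only a sketch, assuming the same dot-separated hand strings (S.H.D.C order), the contract level as an integer 1-7 and the strain as one of "S", "H", "D", "C"; `suit_length` and `skip_trump_contract` are hypothetical helpers.

```python
SUITS = "SHDC"  # assumed suit order within each hand string
MAJORS = {"S", "H"}
MINORS = {"D", "C"}


def suit_length(hand, suit):
    """Cards held in `suit` for a hand such as 'AKQ2.JT9.876.543' (S.H.D.C order)."""
    return len(hand.split(".")[SUITS.index(suit)])


def skip_trump_contract(declarer, dummy, level, strain):
    """Return True if the deal should be skipped under the trump-fit rules."""
    trumps = suit_length(declarer, strain) + suit_length(dummy, strain)
    if trumps <= 6:
        return True  # combined 6 or fewer trumps
    if trumps == 7:
        if strain in MAJORS and level > 4:
            return True  # 7-card major fit above 4H/4S
        if strain in MINORS and level > 3:
            return True  # 7-card minor fit above 3C/3D
    return False
```

For example, `skip_trump_contract("AKQ.AKQJ.AKQ.AKQ", "6543.876.876.876", 5, "S")` returns True: the side holds 3 + 4 = 7 spades and is above 4S.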
NT contracts:
- if the contract is 2NT or higher and the declaring side has 19 hcp or less => SKIP
- Perhaps also require, for example, that for 3NT the declaring side has either 22+ hcp or a 6+ card suit? => Otherwise, SKIP
- and ditto, with higher requirements for 4NT and higher contracts...
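The NT rules above, using the standard 4-3-2-1 point count, might look like this. Assumptions: a "6+ suit" is read as either hand of the declaring side holding six or more cards in one suit, and since no concrete thresholds are given for 4NT and higher, the sketch applies only the 2NT+ rule to those. `hcp`, `longest_suit` and `skip_nt_contract` are hypothetical names.

```python
HCP = {"A": 4, "K": 3, "Q": 2, "J": 1}


def hcp(hand):
    """High-card points (A=4, K=3, Q=2, J=1) of a hand such as 'AKQ2.JT9.876.543'."""
    return sum(HCP.get(card, 0) for card in hand.replace(".", ""))


def longest_suit(hand):
    """Length of the longest suit in the hand."""
    return max(len(holding) for holding in hand.split("."))


def skip_nt_contract(declarer, dummy, level):
    """Return True if the deal should be skipped under the NT rules."""
    points = hcp(declarer) + hcp(dummy)
    if level >= 2 and points <= 19:
        return True  # 2NT or higher on 19 hcp or less
    # Assumption: "6+ suit" = either hand of the declaring side holds 6+ cards in a suit.
    has_long_suit = max(longest_suit(declarer), longest_suit(dummy)) >= 6
    if level == 3 and points < 22 and not has_long_suit:
        return True  # 3NT without 22+ hcp or a long suit
    # No thresholds were given for 4NT and higher, so only the 2NT+ rule covers them here.
    return False
```

At 3NT, a side with 20 hcp and no suit longer than four cards is skipped, while the same 20 hcp with a six-card suit is kept.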
Any contract that goes down 4 or more is probably not relevant to your analysis? => SKIP
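To apply the down-4 rule, the number of undertricks follows from the level and the tricks declarer took, since a contract at level N needs N + 6 tricks. This assumes the archive lets you recover declarer's trick count; `undertricks` and `skip_big_set` are hypothetical names.

```python
def undertricks(level, tricks_taken):
    """Tricks by which a contract failed (0 if it made): a level-N contract needs N + 6 tricks."""
    if not (1 <= level <= 7 and 0 <= tricks_taken <= 13):
        raise ValueError(f"bad level/tricks: {level}, {tricks_taken}")
    return max(0, level + 6 - tricks_taken)


def skip_big_set(level, tricks_taken, threshold=4):
    """True if the contract went down `threshold` or more (4 by default)."""
    return undertricks(level, tricks_taken) >= threshold
```

For example, `undertricks(4, 6)` is 4, so a 4-level contract that took only six tricks would be skipped.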
Not sure if this is of any help to you?
If not, maybe you can give some more detail about the errors/issues you found, and any ideas you have,
and we might come up with better suggestions...