
险峰 张

December 20

Exposing the Chain of Interests Behind Analyst Rankings: Where Do the Best Analysts Come From? (Repost)

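On the night of November 21, at the Shangri-La Hotel in Futian, Shenzhen, the curtain rose once again on Vanity Fair: the award ceremony for the 6th "New Fortune Best Analysts" was held there, and 133 "New Fortune Best Analysts" across 31 research fields made their dazzling entrance. What is a little puzzling, though, is this: in 2008, a year in which analysts collectively got it wrong, where did the best analysts come from?
"Of course we vote for the analysts we have a good relationship with," a Shanghai fund manager who took part in the voting told this reporter. "But a good relationship alone is not enough. The analyst has to bring us benefits, that is, help us make big money or lose less."
"The best analysts have become the best analysts at serving funds, the best analysts for securities firms and listed companies, and the one thing they are not is the best analysts for retail investors," a person familiar with the matter concluded. Every industry insider this reporter interviewed agreed with this view.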
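"Whoever pays you, you speak for"
At the end of 2007, brokerage analysts were all very optimistic about the coming year. Most saw the index at 7,000 to 8,000 points; nobody saw it below 4,000.
"At 6,000 points things really were not clear, but once the index fell below 5,000, everyone gradually understood that the bear market had arrived. It is just that nobody said so," a Shenzhen analyst told this reporter.
The best analysts saw it too, of course. Early this year, the head of the research institute at a well-known Shanghai brokerage said at an internal company meeting that the index might fall below 3,000 points within the year. At the same time, the firm's public strategy report stated that the A-share market was almost certain not to turn bearish in the next several years, that high market valuations would persist for quite some time, and that the Shanghai Composite Index was expected to fluctuate between 5,000 and 8,000 points in 2008.
"The funds had not unloaded their holdings yet, so how could he talk the market down?" an industry insider said, getting straight to the point. It is understood that the brokerage's income from fund trading-commission allocations has reached 400 to 500 million yuan a year.
"Whoever pays you, you speak for and serve. It has always been this way," a Shenzhen fund manager said.
Analysts do not serve only funds. When brokerages and listed companies dangle benefits, the analyst becomes their analyst too. A real-estate analyst at a well-known Beijing brokerage had always prided himself on being "very independent." In the first half of 2007, when many funds held heavy positions in real-estate stocks, he was firmly bearish on the sector and even downgraded his rating on the entire industry, winning widespread praise.
But when his brokerage became the lead underwriter for Vanke's secondary offering, he changed his view. He said publicly that Vanke's fair valuation was 40 yuan, that in the long run Vanke had considerable room to grow and was growing fairly fast, and that the company's valuation would rise accordingly as it developed. Sadly, Vanke stayed above 40 yuan for only two days and did not close above 40 yuan on either of them. The underwriting fee that Vanke paid the lead underwriter on its 10-billion-yuan offering was between 200 and 300 million yuan. In the face of 200 to 300 million yuan, all independence went up in smoke.
Some say that analysts' reports are free and should not be judged too harshly. But while retail investors were trading on reports with a 100-yuan target price for China Shenhua and a 200-yuan target price for Ping An of China, institutions were selling heavily. China's retail investors have paid far too dearly, in blood and tears, for these free reports.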
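The price of telling the truth
"Why not tell the truth? Because Chinese people like to hear good news. Xu Xiaonian got it right, didn't he? Look at what happened to him," a Shenzhen analyst told this reporter.
In 2001, after he accurately predicted that China's stock market would fall to 1,000 points, the windows of Xu Xiaonian's home were smashed, and he received countless intimidating and threatening phone calls. His elderly mother had to remind her grown child: "Child, you must learn to protect yourself." Xu Xiaonian finally did learn to protect himself, and so in October 2007 we no longer heard his voice.
Still, some were not afraid. In November 2007, Li Daxiao, former chief analyst at Dongguan Securities, wrote in an article titled "A Long-Term Valuation Top for A-Shares Has Formed" that A-share market capitalization had grown by leaps and bounds, but that market valuations had also reached the highest level in the world. According to data published by the Shanghai and Shenzhen stock exchanges after the close on October 17, the average price-to-earnings ratio was 70 times and the Shenzhen market's average was 73 times, far above normal levels. The momentum to push valuations higher was no longer sufficient, and a long-term valuation top for A-shares might already have formed.
In light of this, Li Daxiao argued that the investment strategy for 2008 should shift from offense to defense, that protecting the gains of the bull market was an important issue, that investors who contribute greatly to society in particular should take care to protect themselves, and that investors with a lower tolerance for risk should also pay close attention.
Li Daxiao paid a price as well: he was forced to leave the city where he had lived for more than 20 years and move to Shenzhen. His only demand of his new employer was to be allowed to tell the truth. "If telling the truth costs this much for someone of my standing, young analysts who tell the truth will probably find it hard to survive in this industry," Li said with feeling.
Wang Shujuan, a real-estate industry researcher at Orient Securities, is one example. In September 2007 she published the "Real Estate Sector 2007 Interim Report Review," formally cutting the industry's investment rating from "neutral" to "bearish." She argued that the sector's total market capitalization had reached about 1.4 trillion yuan, up about 250% from the start of the year; that the average P/B had reached about 9 times, a marked rise in valuation; that first-tier and some second-tier blue chips had P/B ratios in the teens to twenties and P/E ratios generally above 50 to 60 times; and that the signs of a bubble were obvious.
Wang's report enraged the fund managers who were holding large positions in real-estate stocks at the time. They banded together to blacklist her, refusing her reports and turning down her roadshows. In the end Wang left the industry. "She had a lot of industry research experience and an independent research spirit, but she failed to get along with clients. Our research house has to survive too, so there was nothing to be done," a former colleague explained.

From the above, one small conclusion: when you read analyst reports in China, most of the time you should read them in reverse.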
December 17

Optimizing regular expressions in Java

By Cristian Mocanu, JavaWorld.com, 09/04/07
If you've struggled with regular expressions that took hours to match when you needed them to complete in seconds, this article is for you. Java developer Cristian Mocanu explains where and why the regex pattern-matching engine tends to stall, then shows you how to make the most of backtracking rather than getting lost in it, how to optimize greedy and reluctant quantifiers, and why possessive quantifiers, independent grouping, and lookarounds are your friends.
Writing a regular expression is more than a skill -- it's an art.
-- Jeffrey Friedl

In this article I introduce some of the common weaknesses in regular expressions using the default java.util.regex package. I explain why backtracking is both the foundation of pattern matching with regular expressions and a frequent bottleneck in application code, why you should exercise caution when using greedy and reluctant quantifiers, and why it is essential to benchmark your regex optimizations. I then introduce several techniques for optimizing regular expressions, and discuss what happens when I run my new expressions through the Java pattern-matching engine.
For the purpose of this article I assume that you already have some experience using regular expressions and are most interested in learning how to optimize them in Java code. Topics covered include simple and automated optimization techniques as well as how to optimize greedy and reluctant quantifiers using possessive quantifiers, independent grouping, and lookarounds. See the Resources section for an introduction to regular expressions in Java.
Notation
I use double quotes ("") to delimit regular expressions and input strings, X, Y, Z to denote regular sub-expressions or a portion of a regular expression, and a, b, c, d (et-cetera) to denote single characters.
The Java pattern-matching engine and backtracking
The java.util.regex package uses a type of pattern-matching engine called a Nondeterministic Finite Automaton, or NFA. It's called nondeterministic because while trying to match a regular expression on a given string, each character in the input string might be checked several times against different parts of the regular expression. This is a widely used type of engine also found in .NET, PHP, Perl, Python, and Ruby. It puts great power into the hands of the programmer, offering a wide range of quantifiers and other special constructs such as lookarounds, which I'll discuss later in the article.
At heart, the NFA uses backtracking. Usually there isn't only one way to apply a regular expression on a given string, so the pattern-matching engine will try to exhaust all possibilities until it declares failure. To better understand the NFA and backtracking, consider the following example:
The regular expression is "sc(ored|ared|oring)x"
The input string is "scared"
First, the engine will look for "sc" and find it immediately as the first two characters in the input string. It will then try to match "ored" starting from the third character in the input string. That won't match, so it will go back to the third character and try "ared". This will match, so it will go forward and try to match "x". Finding no match there, it will go back again to the third character and search for "oring". This won't match either, and so it will go back to the second character in the input string and try to search for another "sc". Upon reaching the end of the input string it will declare failure.
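The walk-through above can be reproduced with a few lines of Java. This is only a minimal sketch: the class and method names are made up for illustration, and the second input ("scaredx") is an added assumption to show a successful match for contrast.

```java
import java.util.regex.Pattern;

class BacktrackDemo {
    // The expression from the example above, compiled once.
    static final Pattern SCORE = Pattern.compile("sc(ored|ared|oring)x");

    // Returns true if the expression matches anywhere in the input.
    static boolean found(String input) {
        return SCORE.matcher(input).find();
    }

    public static void main(String[] args) {
        // "ared" matches, but there is no "x" after it, so every
        // alternative is tried and the search finally fails.
        System.out.println(found("scared"));  // false
        // Hypothetical contrast input: the trailing "x" lets "ared" succeed.
        System.out.println(found("scaredx")); // true
    }
}
```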
Optimization tips for backtracking
With the above example you've seen how the NFA uses backtracking for pattern matching, and you've also discovered one of the problems with backtracking. Even in the simple example above the engine had to backtrack several times while trying to match the input string to the regular expression. It's easy to imagine what could happen to your application performance if backtracking got out of hand. An important part of optimizing a regular expression is minimizing the amount of backtracking that it does.

The Java pattern-matching engine has several optimizations at its disposal and can apply them automatically. I will discuss some of them later in the article. Unfortunately you can't rely on the engine to optimize your regular expressions all the time. In the above example, the regular expression is actually matched pretty fast, but in many cases the expression is too complex and the input string too large for the engine to optimize.
Because of backtracking, regular expressions encountered in real-world application scenarios can sometimes take hours to completely match. Worse, it takes much longer for the engine to declare that a regular expression did not match an input string than it does to find a successful match. This is an important fact to remember. Whenever you want to test the speed of a regular expression, test it mostly on strings that it does not match. Among those, especially use strings that almost match, because those take the longest to complete.
Now let's consider some of the ways you can optimize your regular expressions for backtracking.
Simple ways to optimize regular expressions
Later in the article I'll get into the more involved ways you can optimize regular expressions in Java. To start, though, here are a few simple optimizations that could save you time:
If you will use a regular expression more than once in your program, be sure to compile the pattern using Pattern.compile() instead of the more direct Pattern.matches(). Not compiling the regular expression can be costly if Pattern.matches() is used over and over again with the same expression, for example in a loop, because the matches() method will re-compile the expression every time it is used. Also remember that you can re-use the Matcher object for different input strings by calling the method reset().
Beware of alternation. Regular expressions like "(X|Y|Z)" have a reputation for being slow, so watch out for them. First of all, the order of alternation counts, so place the more common options in the front so they can be matched faster. Also, try to extract common patterns; for example, instead of "(abcd|abef)" use "ab(cd|ef)". The latter is faster because the NFA will try to match ab and won't try any of the alternatives if it doesn't find it. (In this case there are only two alternatives. If there were many alternatives the gains in speed would be more impressive.) Alternation really can slow down your programs. The expression ".*(abcd|efgh|ijkl).*" was three times slower in my test than using three calls to String.indexOf(), one for each alternative in the regular expression.
Capturing groups incur a small time penalty each time you use them. If you don't really need to capture the text inside a group, always use non-capturing groups. For example, use "(?:X)" instead of "(X)".
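The three tips above could be combined roughly as follows. This is a sketch under stated assumptions: the class name, method name, and sample inputs are hypothetical. It compiles the pattern once, reuses one Matcher through reset(), factors the common prefix out of the alternation, and uses a non-capturing group.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ReuseDemo {
    // Compiled once and shared. The common prefix "ab" is factored out of
    // the alternation, and the group is non-capturing because its text is
    // never read back.
    private static final Pattern AB_WORD = Pattern.compile("ab(?:cd|ef)");

    // Counts the inputs that match the whole pattern, reusing one Matcher.
    static int countMatches(String[] inputs) {
        Matcher m = AB_WORD.matcher("");
        int count = 0;
        for (String input : inputs) {
            // reset(CharSequence) points the same Matcher at a new input.
            if (m.reset(input).matches()) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String[] inputs = {"abcd", "abef", "abxx", "xabcd"};
        System.out.println(countMatches(inputs)); // 2
    }
}
```

Note that a Matcher is not safe for use by several threads at once, so a shared Pattern with one Matcher per thread is the usual arrangement.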
Let the engine do the work for you
As I mentioned before, the java.util.regex engine can optimize a regular expression several ways when it is compiled. For example, if the regular expression contains a string that must be present in the input string (or else the whole expression won't match), the engine can sometimes search that string first and report a failure if it doesn't find a match, without checking the entire regular expression.

Another very useful way to automatically optimize a regular expression is to have the engine check the length of the input string against the expected length according to the regular expression. For example, the expression "\d{100}" is internally optimized such that if the input string is not 100 characters in length, the engine will report a failure without evaluating the entire regular expression.
Using benchmarks
After you have identified a possible improvement of a regular expression, even if you are certain that it will improve the speed, make a benchmark and compare the results against the previous expression. If the engine was able to internally optimize the previous expression better than the new one, it could lead to unexpected performance penalties.
For instance, the Java regex engine was not able to optimize the expression ".*abc.*". I expected it would search for "abc" in the input string and report a failure very quickly, but it didn't. On the same input string, using "String.indexOf("abc")" was three times faster than my improved regular expression. It seems that the engine can optimize this expression only when the known string is right at its beginning or at a predetermined position inside it. For example, if I re-write the expression as ".{100}abc.*" the engine will match it more than ten times faster. Why? Because now the mandatory string "abc" is at a known position inside the string (there should be exactly one hundred characters before it).
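A comparison like the one described here might be set up as in the sketch below. The class name, helper methods, input size, and round count are all assumptions made for illustration. Timings vary by JVM and machine, so the program prints whatever it measures and no particular ratio should be expected.

```java
import java.util.regex.Pattern;

class RegexBenchmark {
    static final Pattern CONTAINS_ABC = Pattern.compile(".*abc.*");

    static boolean viaRegex(String input) {
        return CONTAINS_ABC.matcher(input).matches();
    }

    static boolean viaIndexOf(String input) {
        return input.indexOf("abc") >= 0;
    }

    // Builds n 'x' characters, optionally followed by "abc".
    static String sample(int n, boolean withAbc) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < n; i++) {
            sb.append('x');
        }
        if (withAbc) {
            sb.append("abc");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // A non-matching input, since failures are the slow case.
        String input = sample(100000, false);
        int rounds = 200;
        int hits = 0; // consumed below so the loops are not optimized away

        // Warm-up so the JIT has compiled both paths before timing.
        for (int i = 0; i < rounds; i++) {
            if (viaRegex(input)) hits++;
            if (viaIndexOf(input)) hits++;
        }

        long t0 = System.nanoTime();
        for (int i = 0; i < rounds; i++) {
            if (viaRegex(input)) hits++;
        }
        long t1 = System.nanoTime();
        for (int i = 0; i < rounds; i++) {
            if (viaIndexOf(input)) hits++;
        }
        long t2 = System.nanoTime();

        System.out.println("regex:   " + (t1 - t0) / 1000000 + " ms");
        System.out.println("indexOf: " + (t2 - t1) / 1000000 + " ms");
        System.out.println("hits: " + hits);
    }
}
```

For serious measurements a dedicated harness is a better choice than hand-rolled timing loops; the point here is only that both variants are run on the same non-matching input and compared.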

Whenever you write complex regular expressions, try to find a way to write them such that the regex engine will be able to recognize and optimize for these particular situations. For instance, don't hide mandatory strings inside groupings or alternations because the engine won't be able to recognize them. When possible, it is also helpful to specify the lengths of the input strings that you want to match, as shown in the example above.
Optimizing greedy and reluctant quantifiers
You have some basic ideas of how to optimize your regular expressions, as well as some of the ways you can let the regex engine do the work for you. Now let's talk about optimizing greedy and reluctant quantifiers. A greedy quantifier such as "*" or "+" will first try to match as many characters as possible from an input string, even if this means that the input string will not have sufficient characters left in it to match the rest of the regular expression. If this happens, the greedy quantifier will backtrack, returning characters until an overall match is found or until there are no more characters. A reluctant (or lazy) quantifier, on the other hand, will first try to match as few characters in the input string as possible.
So for example, say you want to optimize a sub-expression like ".*a". If the character a is located near the end of the input string it is better to use the greedy quantifier "*". If the character is located near the beginning of the input string it would be better to use the reluctant quantifier "*?" and change the sub-expression to ".*?a". Generally, I've noticed that the lazy quantifier is a little faster than its greedy counterpart.
Another tip is to be specific when writing a regular expression. Use general sub-constructs like ".*" sparingly because they can backtrack a lot, especially when the rest of the expression can't match the input string. For example, if you want to retrieve everything between two occurrences of the character a in an input string, instead of using "a(.*)a", it's much better to use "a([^a]*)a".
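The sketch below shows what the three variants capture on one sample string. The class name, helper method, and input are hypothetical. Note that the greedy form does more than run slower: it also captures a different span, because it stretches to the last a in the input.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuantifierDemo {
    // Returns the text captured by group 1 of the first match, or null.
    static String firstGroup(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String input = "xaBBBaCCCa";
        // Greedy: ".*" runs to the end, then backs off to the last 'a'.
        System.out.println(firstGroup("a(.*)a", input));     // BBBaCCC
        // Reluctant: ".*?" grows one character at a time to the next 'a'.
        System.out.println(firstGroup("a(.*?)a", input));    // BBB
        // Specific: "[^a]*" can never run past an 'a' in the first place.
        System.out.println(firstGroup("a([^a]*)a", input));  // BBB
    }
}
```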
Possessive quantifiers and independent grouping
Possessive quantifiers and independent grouping are the most useful operators for optimizing regular expressions. Use them whenever you can to dramatically improve the execution time of your expressions. Possessive quantifiers are denoted by the extra "+" sign, such as in the expression "X?+", "X*+", "X++". The notation for an independent grouping is "(?>X)".
I have successfully used both possessive quantifiers and independent grouping to reduce the execution time of regular expressions from a few minutes to a few seconds. Both operators disable the backtracking behavior of the pattern-matching engine for the group to which they are applied. They will try to match their expression as any greedy quantifier would, but if they are able to match it, they will not give back what they have matched, even if this causes the overall regular expression to fail.
The difference between them is subtle. You can see it best by comparing the possessive quantifier "(X)*+" and the independent grouping "(?>X)*". In the former case, the possessive quantifier will disable backtracking for both the X sub-expression and the "*" quantifier. In the latter case, only backtracking for the X sub-expression will be disabled, while the "*" operator, being outside the group, is not affected by the independent grouping and is free to backtrack.
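The difference can be made visible with a small experiment. In this sketch X is simply the character a, a trailing a is appended to each expression, and the class and method names are made up for illustration. The possessive form keeps every a it has consumed, so nothing is left for the trailing a. In the independent-group form the "*" outside the group can still give one repetition back.

```java
import java.util.regex.Pattern;

class AtomicVsPossessive {
    // True if the regex matches the entire input.
    static boolean matchesAll(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    public static void main(String[] args) {
        // Possessive: "(a)*+" consumes all three 'a's and never gives one
        // back, so the trailing "a" cannot match.
        System.out.println(matchesAll("(a)*+a", "aaa"));  // false
        // Independent group: only the inside of "(?>a)" is locked; the
        // outer "*" backs off one repetition and the trailing "a" matches.
        System.out.println(matchesAll("(?>a)*a", "aaa")); // true
        // Plain greedy quantifier for reference.
        System.out.println(matchesAll("(a)*a", "aaa"));   // true
    }
}
```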
How would you optimize this regular expression?
Now let's consider an optimization example. Say you're trying to match the sub-expression "[^a]*a" on a long input string containing only the character b repeated many times. This expression will fail because the input string does not contain any instances of the character a. Because the pattern engine doesn't know this, it will try to match the expression "[^a]*". Because "*" is a greedy quantifier, it will grab all the characters until the end of the input string, and then it will backtrack, giving back one character at a time in the search for a match.
The expression will fail only when it can't backtrack anymore, which can take some time. Worse, because the "[^a]*" grabbed all characters that weren't a, even backtracking is useless.

The solution is to change the expression "[^a]*a" to "[^a]*+a" using the possessive quantifier "*+". This new expression fails faster because once it has tried to match all the characters that are not a it doesn't backtrack; instead it fails right there.
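A minimal sketch of the two expressions side by side follows; the class name, helper names, and input length are assumptions. Both give the same answers. The only difference is how much work the engine does before reporting failure on the all-b input.

```java
import java.util.regex.Pattern;

class PossessiveDemo {
    static final Pattern GREEDY = Pattern.compile("[^a]*a");
    static final Pattern POSSESSIVE = Pattern.compile("[^a]*+a");

    static boolean greedyFinds(String input) {
        return GREEDY.matcher(input).find();
    }

    static boolean possessiveFinds(String input) {
        return POSSESSIVE.matcher(input).find();
    }

    // Builds a string of n copies of the character c.
    static String repeat(char c, int n) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < n; i++) {
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String onlyBs = repeat('b', 2000);
        // Neither expression can match: there is no 'a' in the input.
        System.out.println(greedyFinds(onlyBs));      // false
        System.out.println(possessiveFinds(onlyBs));  // false
        // When an 'a' is present, both succeed in the same way.
        System.out.println(greedyFinds("bbba"));      // true
        System.out.println(possessiveFinds("bbba"));  // true
    }
}
```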
Lookaround constructs
If you want to write a regular expression that matches any character except some, you could easily write something like "[^abc]*" which means: Match any characters except a or b or c. But what if you wanted it to match strings like "cab" or "cba", but not "abc"?
For this you could use the lookaround constructs. The java.util.regex package has four of them:
Positive lookahead: "(?=X)"
Negative lookahead: "(?!X)"
Positive lookbehind: "(?<=X)"
Negative lookbehind: "(?<!X)"
The word positive in this case means that you want the expression to match, while the word negative means that you don't want the expression to match. Lookahead means that you want to search to the right of your current position in the input string. Lookbehind means that you want to search to the left. Remember that the lookaround constructs only peek forward or backward; they don't actually change the current position in the input string. That said, you could use something like "((?!abc).)*" using the negative lookahead operator "?!" to match any sequence of characters but not "abc" in the given order.
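As a quick check of the negative-lookahead expression, the sketch below applies it with matches(), so the whole input has to be consumed. The class name, method name, and the extra inputs are hypothetical.

```java
import java.util.regex.Pattern;

class LookaheadDemo {
    // Each character is consumed only if "abc" does not start at it.
    static final Pattern NO_ABC = Pattern.compile("((?!abc).)*");

    // True if the whole input can be consumed without running into "abc".
    static boolean avoidsAbc(String input) {
        return NO_ABC.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(avoidsAbc("cab"));     // true
        System.out.println(avoidsAbc("cba"));     // true
        System.out.println(avoidsAbc("abc"));     // false
        System.out.println(avoidsAbc("xxabcxx")); // false
    }
}
```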
Lookarounds in practice
Lookaround constructs help you to be more specific when writing regular expressions, which can have a big effect on matching performance. Listing 1 shows a very common example: using a regular expression to match HTML fields.
Listing 1. Matching HTML fields
Regular expression: "<img.*src=(\S*)/>"
Input string 1: "<img border=1 src=image.jpg />"
Input string 2: "<img src=src=src=src= .... many src= ... src=src="
With the regular expression in Listing 1, the goal is to match the contents of the "src" attribute from an HTML image tag. I deliberately simplified the expression, assuming that there will be no other attributes after "src", so that I can focus on its performance aspects.
Why not be lazy?
You might be thinking that I could have used the reluctant quantifier ".*?" to optimize the regular expression in Listing 1. In fact, "<img.*?src=(.*)/>" would easily match the first-encountered "src=". This solution works for cases where the regular expression matches. If it didn't match the input string, however, it would start to backtrack and would take just as long to match as the greedy quantifier. Remember to always test your regular expressions using non-matching strings first!

The expression is fast enough when matching input string 1, but it takes a very long time to declare failure in its attempt to match input string 2 (the time grows much faster than the length of the input string). It fails because there is no "/>" at the end of the input string. To optimize this expression, look at the first ".*" construct. It is supposed to match any attributes that come before "src" but is too generic and it matches too much. In fact, the construct should only match any attributes except "src".
The rewritten expression "<img((?!src=).)*src=(\S*)/>" will handle a large, non-matching string almost a hundred times faster than the previous one!
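The two expressions can be tried side by side as in the sketch below, where the class and helper names are hypothetical. Two points are worth flagging. First, input string 1 as printed has a space before "/>", which "\S*" cannot cross, so this sketch uses a variant of that input without the space. Second, the rewritten expression adds a capturing group around the lookahead, so the "src" value moves from group 1 to group 2.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ImgSrcDemo {
    static final Pattern ORIGINAL =
            Pattern.compile("<img.*src=(\\S*)/>");
    static final Pattern REWRITTEN =
            Pattern.compile("<img((?!src=).)*src=(\\S*)/>");

    // Returns the src value found by the original expression, or null.
    static String srcOriginal(String html) {
        Matcher m = ORIGINAL.matcher(html);
        return m.find() ? m.group(1) : null;
    }

    // Returns the src value found by the rewritten expression, or null.
    // Group 1 is the lookahead-guarded character group, so src is group 2.
    static String srcRewritten(String html) {
        Matcher m = REWRITTEN.matcher(html);
        return m.find() ? m.group(2) : null;
    }

    // Builds a non-matching input: "<img " followed by n copies of "src=".
    static String manySrc(int n) {
        StringBuilder sb = new StringBuilder("<img ");
        for (int i = 0; i < n; i++) {
            sb.append("src=");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Assumed variant of input string 1, without the space before "/>".
        String tag = "<img border=1 src=image.jpg/>";
        System.out.println(srcOriginal(tag));   // image.jpg
        System.out.println(srcRewritten(tag));  // image.jpg

        // Both fail on the non-matching input; the rewritten expression
        // gives up after trying only the first "src=".
        String bad = manySrc(500);
        System.out.println(srcOriginal(bad));   // null
        System.out.println(srcRewritten(bad));  // null
    }
}
```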
A note about the StackOverflowError
Sometimes the regex Pattern class will throw a StackOverflowError. This is a manifestation of the known bug #5050507, which has been in the java.util.regex package since Java 1.4. The bug is here to stay because it has "won't fix" status. This error occurs because the Pattern class compiles a regular expression into a small program which is then executed to find a match. This program is used recursively, and sometimes when too many recursive calls are made this error occurs. See the description of the bug for more details. It seems it's triggered mostly by the use of alternations.
If you encounter this error, try to rewrite the regular expression or split it into several sub-expressions and run them separately. The latter technique can sometimes even improve performance.
In conclusion
Regular expressions shouldn't take hours to match, especially for applications that only have seconds to spare. In this article I've introduced some of the weak points of the java.util.regex package and shown you how to work around them. Simple bottlenecks like backtracking just require a little finesse whereas culprits like greedy and reluctant quantifiers require more careful consideration. In some cases you can replace them completely, in others you simply have to "lookaround" them. Either way, you've learned some good tricks for coaxing speed out of your regular expressions.
 
September 12

An article by the deputy director of the Guangdong Provincial Department of Health

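The Liangjun Jiasu (armillarisin A) case is not over, and now melamine has arrived! When will big-headed babies and kidney-stone babies stop appearing in mainland China? Kind people, do every single thing with your integrity and your conscience! Heaven will remember you!
The terrible melamine, the poor Sanlu babies
A friend asked me what melamine is. Do you remember that in 2007, not so long ago, a Chinese company exporting cat and dog food to the United States caused an uproar in Sino-US relations with the pet food incident? What was the culprit? Melamine!
Food production requires checking protein content, but measuring protein content directly is technically complicated and costly, which makes it unsuitable for wide use. So the industry commonly uses the Kjeldahl method, which estimates protein content indirectly from the nitrogen content of the food. In other words, the higher the nitrogen content of a food, the higher its protein content appears. That is where melamine comes in handy. (For details, consult specialist technical books.)
Why melamine? The key is that its nitrogen content is very high, its production process is simple, and its cost is very low, which gives adulterators and counterfeiters a huge profit motive. By one estimate, raising the protein reading of plant protein powder or feed by one percentage point with melamine costs only one fifth of what real protein raw material would cost. So "increasing" the product's apparent protein content is the main reason for adding melamine. Melamine is a white crystalline powder with little smell or taste, so it is hard to detect once mixed in, and that is a secondary reason adulterators and counterfeiters think they can get away with it.
The 1994 International Chemical Safety Manual (《国际化学品安全手册》) indicates that long-term or repeated intake of large amounts of melamine may affect the kidneys and bladder and lead to stones. Why do we still do this?
I simply cannot understand it! We are producing milk powder, not producing nitrogen. Even fed to livestock, it has no nutritional value! It is neither a filler nor a substitute; it is waste, a culprit behind kidney stones!
It is said that when the Americans found the melamine, they could not make sense of it. They did not know why anyone would add the stuff and thought it was contamination from rat poison. I remember that US news reports at the time all suspected that Chinese grain warehouses were poorly supervised, leading to rat-poison contamination. Eventually some Chinese in the know could not hold back and quietly told the Americans the secret of adding melamine to food. Only then did the American academic community, full of experts as it is, suddenly see the light and understand this elaborate, high-tech counterfeiting process.
Note that in this Sanlu milk powder incident, the "contaminated" products were all the cheapest infant formula, at 18 yuan a bag. Clearly, Sanlu adopted a strategy of dumping at low prices in order to capture the rural milk powder market, the last juicy piece of meat. But selling milk powder at 18 yuan a bag does not even cover costs, so wouldn't mass production lose money? So, to cut costs, Sanlu added cheap soy protein powder to the milk powder in place of milk. Soy protein powder is no big deal in itself, but this time it happened to contain melamine, the high-tech stuff for faking protein, and the result was the Sanlu milk powder incident that shook the whole country. Of course, this high-tech stuff has surely been added to adult milk powder as well; because adults metabolize far better than infants, there will naturally be no poisoning cases apart from particular patients. Also, if you want to know how widely melamine is used in China's food and feed industries: "protein essence" (蛋白精) is probably this very stuff!
In fact, counterfeiting is also a kind of creation. There are now counterfeit products more advanced than melamine that can "withstand water-washing assays" and "resist ammonia-nitrogen reactions." In short, not even the most high-tech testing can detect that this is fake protein. Living in such a fake society, do you feel safe?
One article asks: the Sanlu milk powder incident reflects, from one angle, China's serious food safety problem. What is there left that we can safely put in our stomachs? The black hand of melamine started in the cattle and sheep feed market and spread until, today, it has finally reached infant formula. I think hundreds of millions of Chinese have, without knowing it, long been eating pork, beef, and chicken raised on melamine and drinking adult milk powder laced with melamine for many years, and have all been contaminated by melamine without knowing it. Has anyone studied the long-term effects of melamine on human health? I think certainly not, because nobody would have imagined that hundreds of millions of people in one country would actually eat a raw material of the plastics industry that has nothing whatsoever to do with food.
When will big-headed babies and kidney-stone babies stop appearing in mainland China? Kind people, do every single thing with your integrity and your conscience! Heaven will remember you!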

I have never hesitated to assume the worst malice of the Chinese

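Shijiazhuang officials' preliminary finding: melamine was added during raw milk collection???
Whose country is this? Clearly, not ours.
A finding reached this quickly: is it meant to show off the government's "rapid response"?
Lu Xun said: "I have never hesitated to assume the worst malice of the Chinese."
Political interests and demands have suppressed the desire to investigate further who is behind this, and the naked pursuit of money has harmed so many babies.
They are our future. They are still beset by poverty now, and 18-yuan milk powder is already a considerable burden for them, yet now they must also suffer in body and mind.
Bosses of Sanlu, anyone with a shred of conscience, please wake up!
Yes, perhaps this is me assuming the worst malice, only perhaps; I just hope it is not reality.
For our motherland we are full of hope, yet powerless. Honorable officials, please have mercy; think of it as building up good karma!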
August 21

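A-Bian really is something

He is being brought to account, but getting hold of that much money must feel great, and being the crown prince must feel even better. Chen Chih-chung can live in a mansion without working, easily support his wife and child, and still hold a big pile of cash. I just wonder whether the United States will send him back. Who knows what tricks the DPP can still play now. Let's wait and see!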
August 20

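Just one thing to say: well done, women's volleyball team!

They fought hard. Whatever their strength, at least they gave it their all!
Brazil's men's football team had a miserable time. Their midfield really could not compete with Argentina's, they barely managed a decent shot, and they were dispatched at a leisurely stroll. I have to say, Messi is just too good. Barring a major injury, sooner or later he will be FIFA World Player of the Year!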
August 19

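A perfect show

Yesterday I saw no tragedy. All I saw was a perfectly staged show, and of course cowardice too, with the weakness of human nature on full display. It is not that failure is not allowed, and it is not that withdrawing is not allowed, but frankly anyone with sense can see through an excuse like this. Come off it, take your rest and stop acting. Inflammation does not flare up like that, and with training like this he supposedly still ran 12.80 seconds? Keep spinning it, go ascend your altar, OK?
One more thing: Sun Haiping, a grown man, crying like that, are you not ashamed? Look at Emmons; even he was not in the state you were. A word for you: idiot!
Negative comments online have all been deleted without exception. That is how things are in this country. Fine, I'll just play mute!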