A bug in HtmlCssSelector.CreateSelector?

Mar 27, 2011 at 1:14 AM

Hi, Ivony,

 

If the HTML looks like this: "<div class='p-img'>...</div>", calling domdocument.find("div.p-img") throws an exception:

 

   at Ivony.Web.Html.HtmlCssSelector.CreateSelector(String expression) in C:\svn\CTP(20100827)\Ivony.Web.Html\HtmlCssSelector.cs:line 83
   at Ivony.Web.Html.HtmlCssSelector.<.ctor>b__0(String e) in C:\svn\CTP(20100827)\Ivony.Web.Html\HtmlCssSelector.cs:line 68
   at System.Linq.Enumerable.WhereSelectArrayIterator`2.MoveNext()
   at System.Linq.Buffer`1..ctor(IEnumerable`1 source)
   at System.Linq.Enumerable.ToArray[TSource](IEnumerable`1 source)
   at Ivony.Web.Html.HtmlCssSelector..ctor(String[] expressions) in C:\svn\CTP(20100827)\Ivony.Web.Html\HtmlCssSelector.cs:line 68
   at Ivony.Web.Html.HtmlCssSelector.Create(String[] expressions) in C:\svn\CTP(20100827)\Ivony.Web.Html\HtmlCssSelector.cs:line 43
   at Ivony.Web.Html.HtmlCssSelector.Create(String expression) in C:\svn\CTP(20100827)\Ivony.Web.Html\HtmlCssSelector.cs:line 33
   at Ivony.Web.Html.ElementExtensions.Find(IHtmlContainer container, String expression) in C:\svn\CTP(20100827)\Ivony.Web.Html\ElementExtensions.cs:line 259
   at Ivony.Web.Html.ElementExtensions.FindSingle(IHtmlContainer container, String expression) in C:\svn\CTP(20100827)\Ivony.Web.Html\ElementExtensions.cs:line 272
   at fengyj.me.Find4Me.Spider.buy360.buy360Spider.Search(String keyWords) in E:\My Files\Documents\Visual Studio 2010\Projects\fengyj.me\Find4Me\Find4Me.Spider\buy360\buy360Spider.cs:line 30
   at SpiderTest.Frm360buy.btnSearch_Click(Object sender, EventArgs e) in E:\My Files\Documents\Visual Studio 2010\Projects\fengyj.me\Test\Find4MeTest\SpiderTest\Frm360buy.cs:line 26
   at System.Windows.Forms.Control.OnClick(EventArgs e)
   at System.Windows.Forms.Button.OnClick(EventArgs e)
   at System.Windows.Forms.Button.OnMouseUp(MouseEventArgs mevent)
   at System.Windows.Forms.Control.WmMouseUp(Message& m, MouseButtons button, Int32 clicks)
   at System.Windows.Forms.Control.WndProc(Message& m)
   at System.Windows.Forms.ButtonBase.WndProc(Message& m)
   at System.Windows.Forms.Button.WndProc(Message& m)
   at System.Windows.Forms.Control.ControlNativeWindow.OnMessage(Message& m)
   at System.Windows.Forms.Control.ControlNativeWindow.WndProc(Message& m)
   at System.Windows.Forms.NativeWindow.DebuggableCallback(IntPtr hWnd, Int32 msg, IntPtr wparam, IntPtr lparam)
   at System.Windows.Forms.UnsafeNativeMethods.DispatchMessageW(MSG& msg)
   at System.Windows.Forms.Application.ComponentManager.System.Windows.Forms.UnsafeNativeMethods.IMsoComponentManager.FPushMessageLoop(IntPtr dwComponentID, Int32 reason, Int32 pvLoopData)
   at System.Windows.Forms.Application.ThreadContext.RunMessageLoopInner(Int32 reason, ApplicationContext context)
   at System.Windows.Forms.Application.ThreadContext.RunMessageLoop(Int32 reason, ApplicationContext context)
   at System.Windows.Forms.Application.Run(Form mainForm)
   at SpiderTest.Program.Main() in E:\My Files\Documents\Visual Studio 2010\Projects\fengyj.me\Test\Find4MeTest\SpiderTest\Program.cs:line 19
   at System.AppDomain._nExecuteAssembly(RuntimeAssembly assembly, String[] args)
   at System.AppDomain.ExecuteAssembly(String assemblyFile, Evidence assemblySecurity, String[] args)
   at Microsoft.VisualStudio.HostingProcess.HostProc.RunUsersAssembly()
   at System.Threading.ThreadHelper.ThreadStart_Context(Object state)
   at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean ignoreSyncCtx)
   at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state)
   at System.Threading.ThreadHelper.ThreadStart()

Coordinator
Mar 27, 2011 at 3:15 AM

Solution:
This is a known problem. A simple workaround:
Find ("div [class=p-img]")

Causes:
Whether the character "-" is allowed as part of a CSS class selector has not been clearly defined.

Relevant code:
The "Regulars.cs" file in the "Ivony.Web.Html" or "Ivony.Html" project:

    public static readonly string elementExpressionPattern = string.Format( @"(?<elementSelector>(?<name>\w+)?((#(?<identity>\w+))|(\.(?<class>\w+)))?(?<attributeSelector>{0})*(?<pseudoClassSelector>{1})*)", attributeExpressionPatternNoGroup, pseudoClassPatternNoGroup );
    public static readonly string elementExpressionPatternNoGroup = string.Format( @"((\w+)?((#(\w+))|(\.(\w+)))?({0})*({1})*)", attributeExpressionPatternNoGroup, pseudoClassPatternNoGroup );

 

(?<class>\w+)

=>

(?<class>[\w-]+)
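
A minimal sketch of what this change does, using simplified stand-in patterns rather than the full elementExpressionPattern. The attribute and pseudo-class parts are left out, and the ^...$ anchors, the ClassSelectorDemo class and its member names are assumptions made for illustration, not code from the library:

```csharp
using System.Text.RegularExpressions;

// Hypothetical demo class; not part of Ivony.Web.Html.
public static class ClassSelectorDemo
{
    // Stand-in for the current behavior: the class name is matched with \w+ only.
    public const string OriginalPattern = @"^(?<name>\w+)?(\.(?<class>\w+))?$";

    // Stand-in for the suggested fix: "-" is also accepted inside the class name.
    public const string PatchedPattern = @"^(?<name>\w+)?(\.(?<class>[\w-]+))?$";

    // True when the whole selector text is accepted by the pattern.
    public static bool Accepts(string pattern, string selector)
    {
        return Regex.IsMatch(selector, pattern);
    }

    // The captured class name, or an empty string when nothing was captured.
    public static string ClassOf(string pattern, string selector)
    {
        Match match = Regex.Match(selector, pattern);
        return match.Success ? match.Groups["class"].Value : string.Empty;
    }
}
```

With these stand-ins, Accepts(OriginalPattern, "div.p-img") is false because \w+ stops at the "-", while Accepts(PatchedPattern, "div.p-img") is true and ClassOf(PatchedPattern, "div.p-img") gives "p-img".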


This reply was translated from Simplified Chinese with Google Translate; if the wording is not smooth, please bear with me.

Mar 27, 2011 at 4:50 AM

"Find ("div [class=p-img]")"

It seems the space between "div" and "[class" should be removed.

:)