lxml – Extractors for XML and HTML data.

class data_extractor.lxml.AttrCSSExtractor(expr: str, attr: str)

Bases: CSSExtractor

Use a CSS selector to extract attribute values from subelements of XML or HTML data.

Before extracting, parse the XML or HTML text into a data_extractor.lxml.Element object.

Parameters:
  • expr (str) – CSS Selector Expression.

  • attr (str) – Target attribute name.

extract(element: _Element) → List[str]

Extract the attribute values of subelements from XML or HTML data.

Parameters:

element (data_extractor.lxml.Element) – Target.

Returns:

List of str, extracted result.

Return type:

list

Raises:

ExprError – CSS Selector Expression Error.

extract_first(element: Any, default: Any = sentinel) → Any

Extract the first item (data or subelement) from the result of the extract method call.

Parameters:
  • element (Any) – The target data node element.

  • default (Any, optional) – Default value when not found. Default: data_extractor.utils.sentinel.

Returns:

Data or subelement.

Return type:

Any

Raises:

ExtractError – Raised when the extractor fails to extract the expected data.

class data_extractor.lxml.CSSExtractor(expr: str)

Bases: AbstractSimpleExtractor

Use a CSS selector to extract subelements from XML or HTML data.

Before extracting, parse the XML or HTML text into a data_extractor.lxml.Element object.

Parameters:

expr (str) – CSS Selector Expression.

extract(element: _Element) → List[_Element]

Extract subelements from XML or HTML data.

Parameters:

element (data_extractor.lxml.Element) – Target.

Returns:

List of data_extractor.lxml.Element objects, extracted result.

Return type:

list

extract_first(element: Any, default: Any = sentinel) → Any

Extract the first item (data or subelement) from the result of the extract method call.

Parameters:
  • element (Any) – The target data node element.

  • default (Any, optional) – Default value when not found. Default: data_extractor.utils.sentinel.

Returns:

Data or subelement.

Return type:

Any

Raises:

ExtractError – Raised when the extractor fails to extract the expected data.

data_extractor.lxml.Element

alias of _Element

class data_extractor.lxml.TextCSSExtractor(expr: str)

Bases: CSSExtractor

Use a CSS selector to extract the text of subelements of XML or HTML data.

Before extracting, parse the XML or HTML text into a data_extractor.lxml.Element object.

Parameters:

expr (str) – CSS Selector Expression.

extract(element: _Element) → List[str]

Extract the text of subelements from XML or HTML data.

Parameters:

element (data_extractor.lxml.Element) – Target.

Returns:

List of str, extracted result.

Return type:

list

Raises:

ExprError – CSS Selector Expression Error.

extract_first(element: Any, default: Any = sentinel) → Any

Extract the first item (data or subelement) from the result of the extract method call.

Parameters:
  • element (Any) – The target data node element.

  • default (Any, optional) – Default value when not found. Default: data_extractor.utils.sentinel.

Returns:

Data or subelement.

Return type:

Any

Raises:

ExtractError – Raised when the extractor fails to extract the expected data.

class data_extractor.lxml.XPathExtractor(expr: str)

Bases: AbstractSimpleExtractor

Use XPath to extract data from XML or HTML.

Before extracting, parse the XML or HTML text into a data_extractor.lxml.Element object.

Parameters:

expr (str) – XPath Expression.

extract(element: _Element) → List[_Element] | List[str]

Extract subelements or data from XML or HTML data.

Parameters:

element (data_extractor.lxml.Element) – Target.

Returns:

List of data_extractor.lxml.Element objects, List of str, or str.

Return type:

list

Raises:

data_extractor.exceptions.ExprError – XPath Expression Error.

extract_first(element: Any, default: Any = sentinel) → Any

Extract the first item (data or subelement) from the result of the extract method call.

Parameters:
  • element (Any) – The target data node element.

  • default (Any, optional) – Default value when not found. Default: data_extractor.utils.sentinel.

Returns:

Data or subelement.

Return type:

Any

Raises:

ExtractError – Raised when the extractor fails to extract the expected data.