lxml – Extractors for XML or HTML data.

class data_extractor.lxml.AttrCSSExtractor(expr: str, attr: str)

Bases: data_extractor.lxml.CSSExtractor

Use a CSS selector to extract the attribute values of subelements from XML or HTML data.

Before extracting, parse the XML or HTML text into a data_extractor.lxml.Element object.

Parameters
  • expr (str) – CSS Selector Expression.

  • attr (str) – Target attribute name.

extract(element: lxml.etree._Element) → List[str]

Extract the attribute values of subelements from XML or HTML data.

Parameters

element (data_extractor.lxml.Element) – Target.

Returns

List of str, extracted result.

Return type

list

Raises

ExprError – CSS Selector Expression Error.
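To make the return shape concrete, here is a minimal sketch that mimics this method's contract using only the Python standard library. It is not the library's implementation: `attr_values` is a hypothetical helper, and ElementTree's limited path syntax (`.//a`) stands in for a real CSS selector such as `a`. Skipping elements that lack the attribute is an assumption of the sketch, not documented behaviour.

```python
import xml.etree.ElementTree as ET
from typing import List


def attr_values(element: ET.Element, path: str, attr: str) -> List[str]:
    """Hypothetical stand-in: collect `attr` from every subelement matching `path`."""
    # Assumption: subelements without the attribute are skipped rather than
    # reported as None.
    return [sub.attrib[attr] for sub in element.iterfind(path) if attr in sub.attrib]


root = ET.fromstring(
    "<ul>"
    "<li><a href='/first'>First</a></li>"
    "<li><a href='/second'>Second</a></li>"
    "<li><a>No link</a></li>"
    "</ul>"
)

# One string per matching subelement that carries the attribute.
hrefs = attr_values(root, ".//a", "href")
```

With the real class, the equivalent call would presumably pair a CSS selector expression with the attribute name, as in AttrCSSExtractor(expr, attr), and pass the parsed element to extract.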

extract_first(element: Any, default: Any = sentinel) → Any

Extract the first item (data or subelement) from the result of the extract method.

Parameters
  • element (Any) – The target data node element.

  • default (Any, optional) – Default value when not found. Default: data_extractor.utils.sentinel.

Returns

Data or subelement.

Return type

Any

Raises

ExtractError – Raised when the extractor extracts wrong data.
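The sentinel default can be illustrated with a small pure-Python sketch. Everything in it is hypothetical (`_SENTINEL`, `NotFound`, `first_or_default`); it only models one plausible reading of the contract above, namely that an empty result with no default supplied is reported as an error, while an explicit default, including None, is returned as given.

```python
from typing import Any, List

# Hypothetical stand-in for data_extractor.utils.sentinel: a unique object
# that callers cannot pass by accident, so None remains a usable default.
_SENTINEL = object()


class NotFound(Exception):
    """Hypothetical stand-in for ExtractError."""


def first_or_default(results: List[Any], default: Any = _SENTINEL) -> Any:
    """Return the first item of `results`, or `default` when it is empty."""
    if results:
        return results[0]
    if default is _SENTINEL:
        # Assumption: with no default supplied, a miss is an error.
        raise NotFound("nothing extracted")
    return default


first = first_or_default(["a", "b"])
fallback = first_or_default([], default=None)
```

Using a dedicated sentinel object instead of None as the parameter default is what lets a caller distinguish "no default given" from "the default is None".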

class data_extractor.lxml.CSSExtractor(expr: str)

Bases: data_extractor.core.AbstractSimpleExtractor

Use a CSS selector to extract subelements from XML or HTML data.

Before extracting, parse the XML or HTML text into a data_extractor.lxml.Element object.

Parameters

expr (str) – CSS Selector Expression.

extract(element: lxml.etree._Element) → List[lxml.etree._Element]

Extract subelements from XML or HTML data.

Parameters

element (data_extractor.lxml.Element) – Target.

Returns

List of data_extractor.lxml.Element objects, extracted result.

Return type

list

extract_first(element: Any, default: Any = sentinel) → Any

Extract the first item (data or subelement) from the result of the extract method.

Parameters
  • element (Any) – The target data node element.

  • default (Any, optional) – Default value when not found. Default: data_extractor.utils.sentinel.

Returns

Data or subelement.

Return type

Any

Raises

ExtractError – Raised when the extractor extracts wrong data.

data_extractor.lxml.Element

alias of lxml.etree._Element

class data_extractor.lxml.TextCSSExtractor(expr: str)

Bases: data_extractor.lxml.CSSExtractor

Use a CSS selector to extract the text of subelements from XML or HTML data.

Before extracting, parse the XML or HTML text into a data_extractor.lxml.Element object.

Parameters

expr (str) – CSS Selector Expression.

extract(element: lxml.etree._Element) → List[str]

Extract the text of subelements from XML or HTML data.

Parameters

element (data_extractor.lxml.Element) – Target.

Returns

List of str, extracted result.

Return type

list

Raises

ExprError – CSS Selector Expression Error.
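What counts as an element's "text" matters when subelements contain nested markup. The sketch below uses only the standard library's ElementTree, whose `.text` attribute follows the same convention as lxml's: it holds the text before the first child element. `leading_texts` is a hypothetical helper, `.//p` stands in for a CSS selector such as `p`, and which of the two readings shown here TextCSSExtractor follows is not stated in this reference.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional


def leading_texts(element: ET.Element, path: str) -> List[Optional[str]]:
    """Hypothetical stand-in: the `.text` of every subelement matching `path`."""
    return [sub.text for sub in element.iterfind(path)]


root = ET.fromstring(
    "<div>"
    "<p>Plain paragraph</p>"
    "<p>Lead <b>bold</b> tail</p>"
    "</div>"
)

# `.text` stops at the first child element.
texts = leading_texts(root, ".//p")

# itertext() walks the whole subtree, including nested elements and tails.
full_texts = ["".join(p.itertext()) for p in root.iterfind(".//p")]
```

If the full text content is needed, checking the result on a sample with nested markup is a cheap way to confirm which behaviour applies.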

extract_first(element: Any, default: Any = sentinel) → Any

Extract the first item (data or subelement) from the result of the extract method.

Parameters
  • element (Any) – The target data node element.

  • default (Any, optional) – Default value when not found. Default: data_extractor.utils.sentinel.

Returns

Data or subelement.

Return type

Any

Raises

ExtractError – Raised when the extractor extracts wrong data.

class data_extractor.lxml.XPathExtractor(expr: str)

Bases: data_extractor.core.AbstractSimpleExtractor

Use an XPath expression to extract data from XML or HTML.

Before extracting, parse the XML or HTML text into a data_extractor.lxml.Element object.

Parameters

expr (str) – XPath Expression.

extract(element: lxml.etree._Element) → Union[List[lxml.etree._Element], List[str]]

Extract subelements or data from XML or HTML data.

Parameters

element (data_extractor.lxml.Element) – Target.

Returns

List of data_extractor.lxml.Element objects, List of str, or str.

Return type

list

Raises

data_extractor.exceptions.ExprError – XPath Expression Error.
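The result type depends on what the expression selects: elements for node selections, strings for text or attribute selections. The following sketch illustrates the two shapes with the standard library's ElementTree, which supports only a subset of XPath. It is a stand-in rather than the library's behaviour: the sample document and variable names are made up, and the strings are derived from matched elements because ElementTree cannot select text or attributes directly.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    "<catalog>"
    "<book id='b1'><title>First</title></book>"
    "<book id='b2'><title>Second</title></book>"
    "</catalog>"
)

# An expression that selects nodes yields a list of elements.
books = root.findall(".//book[@id='b2']")

# Full XPath can select strings directly (for example with text() or @id);
# ElementTree cannot, so this sketch reads them from the matched elements.
titles = [title.text for title in root.findall(".//book/title")]
```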

extract_first(element: Any, default: Any = sentinel) → Any

Extract the first item (data or subelement) from the result of the extract method.

Parameters
  • element (Any) – The target data node element.

  • default (Any, optional) – Default value when not found. Default: data_extractor.utils.sentinel.

Returns

Data or subelement.

Return type

Any

Raises

ExtractError – Raised when the extractor extracts wrong data.