XPath - BioHackersNet

http://www.w3.org/TR/xpath

XML문서의 Parsing방법가운데 한가지. path-like string을 이용하여 문서를 검색하는 언어. 일종의 XML을 위한 SQL이다. XSLT에 주로 활용된다.

XPath expression은 특정 node, 혹은 groups of nodes에 초점을 맞추며, 질의 결과는 다음의 네가지 종류가 있다.

a collection of nodes
a boolean value
a floating-point number
a string

파일시스템의 path와 유사하다고 생각하면 이해하기 쉽다. 루트노드('/')밑으로 해당노드들을 '/'로 연결한다. 물론 파일시스템path보다 다양한 기능들을 포함하고 있다.

다음과같은 XML문서를 예로들면

<?xml version="1.0" encoding="UTF-8"?>
<shiptypes name="Unitied Federation of Planets">
    <ship name="USS Enterprise">
        <class>Sovereign</class>
        <captain>Jean-Luc Picard</captain>
        <registry-code>NCC-1701-E</registry-code>
    </ship>
    <ship name="USS Voyager">
        <class>Intrepid</class>
        <captain>Kathryn Janeway</captain>
        <registry-code>NCC-74656</registry-code>
    </ship>
</shiptypes>

이 문서에서의 XPath는 다음과 같이 표현된다.

'ship/captain'                       # all captain elements (relative location path)
'/shiptypes/ship/captain'            # absolute path
'ship/captain/text()'                # text function
'ship[2]/captain/text()'             # second ship element
'ship[class="Intrepid"]'             # ship elements that have a child class with "Intrepid"
'ship[class="Intrepid"]/@name        # name attribute 
'ship[@name="USS Enterprise"]'       # name attribute
'/shiptypes//captain'                # all matching elements beneath the root
'ship[@name="USS Voyager"]/../@name' # name attribute of parents element

Python XPath 예제

   1 # -*- coding: euc-kr -*- 
   2 import urllib2 
   3 from xml.dom.ext.reader import PyExpat 
   4 from xml.xpath import Evaluate 
   5 
   6 doc = PyExpat.Reader().fromStream( 
   7         urllib2.urlopen('http://www.kma.go.kr/weather/xml/current.xml')) 
   8 
   9 path = u'weather/local[text()="서울"]/@desc' 
  10 for node in Evaluate(path, doc.documentElement): 
  11     print node.value

CategoryLanguage