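A program that traverses web servers ([[HTTP]]) and collects the large amount of information found on each site. Instead of a person following every link on a page by hand, the program parses the data it retrieves, extracts the [[URL]]s it contains, and moves on to those [[URL]]s to fetch more information. Because it wanders around the web it is also called a WWW wanderer, and because "web" suggests a net or a spider's web it is also called a WWW spider. (It is also known as a WebCrawler.)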
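
The fetch, extract, follow loop described above can be sketched roughly as below. This is a minimal illustration, not how any particular robot is implemented: the names `LinkExtractor`, `crawl`, and `FAKE_SITE` are made up for this example, and the pages come from an in-memory dictionary instead of real HTTP requests so the sketch runs without network access.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url, fetch, max_pages=100):
    """Breadth-first crawl; fetch(url) returns HTML text, or None on failure."""
    seen = {start_url}
    queue = deque([start_url])
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue  # page could not be retrieved; skip it
        visited.append(url)
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            absolute = urljoin(url, href)  # resolve relative links
            if absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return visited


# Hypothetical three-page site standing in for real HTTP responses.
FAKE_SITE = {
    "http://example.com/": '<a href="/a.html">A</a> <a href="b.html">B</a>',
    "http://example.com/a.html": '<a href="/">home</a>',
    "http://example.com/b.html": '<a href="/missing.html">gone</a>',
}

print(crawl("http://example.com/", FAKE_SITE.get))
```

The `seen` set keeps the robot from revisiting a page, and `max_pages` is a crude safety limit; a real robot would also need error handling, request delays, and a robots.txt check.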

The main kinds are as follows.

|| '''Main function''' || '''WebRobot''' ||
|| Resource discovery and database building || [[Google]]bot and other search-engine robots ||
|| Link checking || MOM spider, EIT Link Verifier Robot ||
|| Web site mirroring || HTMLgobble, GetUrl, WebCopy, WebFetcher ||
|| Monitoring web growth || W4 ||

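The problem with web robots is that they are dumb. In particular, a robot thrown together without much thought can bring down a web server or otherwise damage other systems. The robots.txt file exists to prevent this, but compliance is voluntary and it seems to be ignored more often than not. If nothing else works, IpToBlock is the only option left.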

The format of the robots.txt file is shown below. SeeAlso [[http://www.robotstxt.org/wc/norobots.html|robots.txt standard]]
{{{
# --> comment
User-agent: --> name of the robot; enter at least '*'
Disallow: --> URL the robot must not visit; a full path or a partial path
}}}
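
From the robot's side, honoring these rules might look like the sketch below, which uses Python's standard `urllib.robotparser` module. The robots.txt content, the `ExampleBot` agent name, and the `allowed` helper are assumptions made up for illustration; a real robot would download the file from the target site first.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that keeps every robot out of /private/.
ROBOTS_TXT = """\
# keep all robots out of /private/
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(url, agent="ExampleBot"):
    """Return True if the parsed rules permit `agent` to fetch `url`."""
    return parser.can_fetch(agent, url)


print(allowed("http://example.com/index.html"))
print(allowed("http://example.com/private/data.html"))
```

Nothing forces a robot to run a check like this, which is why server-side filtering is still needed for robots that ignore the file.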

[Perky]'s crawler filtering rules in httpd.conf --[[yong27]], 2004-04-26
{{{
BrowserMatch "HTTrack" bad_bot
BrowserMatch "Download Ninja 2.0" bad_bot
BrowserMatch "JBH Agent 2.0" bad_bot
BrowserMatch "EmailCollector/1.0" spam_bot
BrowserMatch "EmailSiphon" spam_bot
BrowserMatch "EmailWolf 1.00" spam_bot
BrowserMatch "ExtractorPro" spam_bot
BrowserMatch "Crescent Internet ToolPak HTTP OLE Control v.1.0" spam_bot
BrowserMatch "Mozilla/2\.0 \(compatible; NEWT ActiveX; Win32\)" spam_bot
BrowserMatch "CherryPicker/1.0" spam_bot
BrowserMatch "CherryPickerSE/1.0" spam_bot
BrowserMatch "CherryPickerElite/1.0" spam_bot
BrowserMatch "NICErsPRO" spam_bot
BrowserMatch "WebBandit/2.1" spam_bot
BrowserMatch "WebBandit/3.50" spam_bot
BrowserMatch "webbandit/4.00.0" spam_bot
BrowserMatch "WebEMailExtractor/1.0B" spam_bot
BrowserMatch "autoemailspider" spam_bot
}}}
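
BrowserMatch treats its first argument as a case-sensitive regular expression matched against the User-Agent header and sets the named environment variable on a match. The excerpt above only tags requests; whatever directive actually refuses tagged requests is not shown in the source. The tagging step alone can be mimicked in Python as a rough sketch. The `RULES` list copies three of the patterns above, and `classify` is a hypothetical helper, not part of Apache.

```python
import re

# (pattern, label) pairs mirroring a few of the BrowserMatch rules above.
RULES = [
    (r"HTTrack", "bad_bot"),
    (r"EmailSiphon", "spam_bot"),
    (r"WebBandit/3\.50", "spam_bot"),
]


def classify(user_agent):
    """Return the set of labels whose pattern occurs in the User-Agent."""
    labels = set()
    for pattern, label in RULES:
        # re.search finds the pattern anywhere in the string, case-sensitively.
        if re.search(pattern, user_agent):
            labels.add(label)
    return labels


print(classify("Mozilla/4.5 (compatible; HTTrack 3.0x; Windows 98)"))
print(classify("Mozilla/5.0 (X11; Linux x86_64)"))
```

Since the User-Agent header is supplied by the client, this kind of filter only stops robots that identify themselves honestly.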

Major robots
|| [[Google]]bot || 64.68.*.* ||

Related material
 * http://user.chollian.net/~jolra/study/robotagent.html