:mod:`robotparser` --- Parser for robots.txt
=============================================

.. module:: robotparser
   :synopsis: Loads a robots.txt file and answers questions about
              fetchability of other URLs.
.. sectionauthor:: Skip Montanaro <skip@pobox.com>


.. index::
   single: WWW
   single: World Wide Web
   single: URL
   single: robots.txt

.. note::
   The :mod:`robotparser` module has been renamed :mod:`urllib.robotparser` in
   Python 3.0.
   The :term:`2to3` tool will automatically adapt imports when converting
   your sources to 3.0.

This module provides a single class, :class:`RobotFileParser`, which answers
questions about whether or not a particular user agent can fetch a URL on the
Web site that published the :file:`robots.txt` file.  For more details on the
structure of :file:`robots.txt` files, see http://www.robotstxt.org/orig.html.


.. class:: RobotFileParser()

   This class provides a set of methods to read, parse and answer questions
   about a single :file:`robots.txt` file.


   .. method:: set_url(url)

      Sets the URL referring to a :file:`robots.txt` file.


   .. method:: read()

      Reads the :file:`robots.txt` URL and feeds it to the parser.


   .. method:: parse(lines)

      Parses the lines argument.


   .. method:: can_fetch(useragent, url)

      Returns ``True`` if the *useragent* is allowed to fetch the *url*
      according to the rules contained in the parsed :file:`robots.txt`
      file.


   .. method:: mtime()

      Returns the time the ``robots.txt`` file was last fetched.  This is
      useful for long-running web spiders that need to check for new
      ``robots.txt`` files periodically.


   .. method:: modified()

      Sets the time the ``robots.txt`` file was last fetched to the current
      time.

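
A long-running spider might combine :meth:`RobotFileParser.mtime` and
:meth:`RobotFileParser.modified` roughly as sketched below.  This is only an
illustration under stated assumptions: ``MAX_AGE`` and ``is_stale`` are
hypothetical names that are not part of the module, and the fallback import
assumes the Python 3.0 rename to :mod:`urllib.robotparser`. ::

   import time

   try:
       import robotparser                  # Python 2 module name
   except ImportError:
       from urllib import robotparser      # Python 3 module name

   MAX_AGE = 3600  # hypothetical refresh interval, in seconds

   def is_stale(rp, max_age=MAX_AGE):
       """Return True if rp's rules were fetched over max_age seconds ago."""
       return time.time() - rp.mtime() > max_age

   rp = robotparser.RobotFileParser()
   # No fetch has been recorded yet; CPython initialises the timestamp
   # to 0, so the parser counts as stale.
   stale_before = is_stale(rp)
   rp.modified()                 # record "fetched just now"
   stale_after = is_stale(rp)

In a real spider, a stale result would presumably trigger a call to
``rp.read()`` followed by ``rp.modified()``.
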

The following example demonstrates basic use of the RobotFileParser class. ::

   >>> import robotparser
   >>> rp = robotparser.RobotFileParser()
   >>> rp.set_url("http://www.musi-cal.com/robots.txt")
   >>> rp.read()
   >>> rp.can_fetch("*", "http://www.musi-cal.com/cgi-bin/search?city=San+Francisco")
   False
   >>> rp.can_fetch("*", "http://www.musi-cal.com/")
   True
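
The ``parse()`` method can also be exercised without network access.  The
following is a minimal sketch rather than part of the original example: the
rules in ``RULES`` and the ``www.example.com`` URLs are made up for
illustration, and the fallback import assumes the Python 3.0 rename to
:mod:`urllib.robotparser`. ::

   try:
       import robotparser                  # Python 2 module name
   except ImportError:
       from urllib import robotparser      # Python 3 module name

   # Hypothetical robots.txt content, given as a list of lines.
   RULES = [
       "User-agent: *",
       "Disallow: /cgi-bin/",
   ]

   rp = robotparser.RobotFileParser()
   rp.parse(RULES)

   # Paths under /cgi-bin/ are disallowed for every user agent.
   blocked = rp.can_fetch("*", "http://www.example.com/cgi-bin/search")
   # Any other path remains fetchable.
   allowed = rp.can_fetch("*", "http://www.example.com/")

Passing the rules as a list of lines mirrors what ``read()`` feeds to the
parser after downloading the file, which makes this form convenient for
testing rule sets offline.
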